The Future of Predictions
My chatbot read 511 Nieman Lab predictions so you don’t have to.
We’ve got a bot for you to try. It’s a prototype of a RAG — prototype meaning it’s imperfect, RAG as in Retrieval-Augmented Generation, which we’ve talked about before. It’ll be fun, I promise.
And more than fun — and perhaps more terrifyingly — it points once more to how reading, and consuming, information will likely change.
I spent a few days this week building a chatbot that answers questions about perspectives and trends among the past three years of Nieman Lab predictions. Why those? There were 210 smart, insightful predictions for 2026 alone, and Gina and I don’t have time to read them all (though we applaud anyone who does). A tool to search and summarize predictions would offer real value, but we wanted to do more: We wanted to ask broader questions about trends and changes in predictions, something that a large language model is ideally suited to do.
So we built it and asked it about AI strategy. It gave us a 700-word report outlining how the predictions shifted, “from tentative experiments to existential necessity”:
About 23.5% of all predictions across the three years addressed AI strategy questions — making it one of the most discussed topics in the entire dataset. The conversation grew significantly over time: 19.4% in 2024, 19.7% in 2025, and 29% in 2026 — a 10 percentage point jump in the final year, suggesting AI moved from emerging concern to central preoccupation.
It had sections on how the conversation moved from “should we?” to “how fast?”; diverging opinions over whether AI adoption would be better for big or small newsrooms; emerging political dimensions; and “a fundamental rethinking of what journalism products even are,” which it dubbed “from articles to agents.” It even threw in a “what’s missing” section at the end:
Conspicuously absent across all three years: detailed discussion of governance structures, cross-newsroom standards, or collective action. Sam Guzik mentioned that “widespread adoption will likely lead to the development of industry standards and best practices,” but offered no vision of how. The predictions largely assumed individual newsrooms would navigate AI strategy alone, each making independent choices about tools, ethics, and business models.
The report included a dozen examples with links to the specific articles — answering our original questions about the predictions. Then it offered follow-up topics to explore.
The bot can surface trends, shifts, and contradictions across years, topics, and authors. In testing, it provided detailed summaries of how opinions on AI, local news, and influencers evolved, with examples from individual predictions, without us ever clicking on an article.
We shared it with Josh Benton and Laura Hazard Owen of Nieman Lab to give it a try. They called it “cool” (we’re flattered) and quickly found its limitations. For Laura, the bot was helpful for answering some of her own questions on the predictions — most intriguing for her was seeing how many fewer predictions about politics there were in 2026 than 2024 — but she sometimes had to rephrase a question to get a solid answer. Josh, who began the predictions in 2011, quickly flagged that it was over-generously categorizing them, throwing off the stats it provided.
We made some tweaks based on their feedback: adding instructions on asking clarifying questions, improving how the bot provides source links, and clarifying the detail it provides about when it runs a database query to count things versus analyzing language.
That’s just a baby version of the kind of robust testing a developer team would do before a public product launch. We’re now letting you give it a whirl, knowing the bot makes occasional errors. A question phrased in a way we haven’t addressed yet could cause it to glitch. It should do its math right, but we can’t promise that (though we’ve included the ability to download a CSV file of the resulting links, to help check its work). Occasionally it fails to populate links properly. It can miss things. In short, it’s a prototype.
The point of sharing it isn’t to show what a great tool we built; it’s to show that tools like this can be built by people like us, however imperfect they are to begin with. The point is also to note that it’s both easier than you think — and simultaneously harder than it seems to get something robust enough for the public. The gap between a serviceable prototype and public-ready technology is still quite wide.
In other words, while we think RAGs hold a lot of potential for journalism, both for newsgathering and in public-facing applications, they’re not some magic wand that can be built and deployed with high accuracy for pennies — at least not yet.
Regardless, this represents a level of prototyping I couldn’t have dreamed of a year ago. I’m not a technologist. Like Gina, I vibecode. I have some data journalism skills, but they stop at fairly simple Python analysis and web scraping. Yet I built a chatbot that synthesizes insights across three years and hundreds of articles in less than a week and for about $15 (between subscriptions and API calls). It’s not perfect, and I wouldn’t release it to the public any time soon, but it shows three things:
How LLMs can synthesize information across source material at speed and scale, beyond what a human can easily do;
How our standard news outputs (articles) can be repackaged with LLMs to create additional value and more personalized user experience;
How the barriers to entry and costs to building prototypes and learning how AI tools work under-the-hood are collapsing.
Most critically, it shows that you can build it yourself.
This can get a bit nerdy, so bear with me; it’s worth understanding even if you don’t plan to build bots.
Some background: We chose the Nieman predictions for this experiment because they tend to be relatively short, clearly written (thank you, journalists), and have a single core idea, making them good candidates for building a simple RAG from scratch. But 511 of them is still too many for an LLM to read for every question, so they require some infrastructure for good querying. We also chose them because they’re a dataset we know somewhat well, having read a number of them ourselves. Plus our audience is journalists, and we figure you all might enjoy chatting with a database of news predictions. I know you’re dying to ask about Gina’s AI predictions.
Go ahead and try it. I’ll wait. Then I’ll tell you about some things I learned, challenges I faced, and why you should care.
There’s a chance you were told to come back later; sorry about that. The reason is the first challenge: cost.
Cost
We’re using Claude’s latest mid-tier model, Sonnet 4.5, for parsing text and providing insights. It’s doing an impressive job at spotting trends and even tracking things that are missing — when I asked it about the kinds of people writing about AI strategy, it gave me a section on gender and geography gaps, without being asked.
But that model is 3x more expensive than Claude’s cheapest, fastest model, Haiku. That means each prompt asking for a large trend costs around three to five cents. We put an initial $10 of API credit into the bot, which gets us a couple hundred queries at current rates. (You can thank us for our generosity later.) We tested some ways to lower the cost, but the quality of the insights the bot delivered dropped dramatically.
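The budget math above is simple enough to sketch. This is illustrative arithmetic only, using the per-query figures quoted in this post, not Anthropic’s actual pricing schedule:

```python
# Back-of-envelope: how many queries a $10 budget buys at three to five
# cents per query (the figures quoted above). Illustrative only.
budget_usd = 10.00
cost_per_query_low, cost_per_query_high = 0.03, 0.05

queries_high = int(budget_usd / cost_per_query_low)   # optimistic case
queries_low = int(budget_usd / cost_per_query_high)   # pessimistic case

print(f"${budget_usd:.2f} buys roughly {queries_low}-{queries_high} queries")
```

That range — a couple hundred queries, give or take — is why the bot can genuinely run out of budget.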
That API query when someone asks a question is most of the cost of the bot. Unlike a webpage or news app, where the marginal cost per user is fractions of a penny, every chatbot answer is generated fresh, so the cost scales linearly with usage. That cost per request increases if we add data or have the LLM examine more text — and eventually it will hit the limit on how much text it can consider at once (what’s called a context window), much as a human can only hold so much in mind.
We can expect it to get cheaper — prices have dropped dramatically year over year, and there are engineering tricks to reduce costs — saving common answers, routing simpler questions to cheaper models (we built some of that into this bot). There are also open-source models you can run locally for a higher upfront cost, but near-zero per-query cost (those also have energy use and privacy benefits). But each API query will always cost something.
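The two engineering tricks mentioned above — saving common answers and routing simpler questions to a cheaper model — can be sketched in a few lines. This is a hypothetical sketch, not our bot’s actual logic: the routing heuristic and the `call_model` function are placeholders, and the model names stand in for whatever tiers you’d actually use.

```python
import hashlib

# Sketch of two cost-saving tricks: caching repeated questions and
# routing simple ones to a cheaper model. The heuristic and call_model
# are illustrative placeholders, not production logic.

_cache: dict[str, str] = {}

def route_model(question: str) -> str:
    # Crude heuristic: short, count-style questions go to the cheap
    # model; open-ended trend questions go to the mid-tier model.
    if len(question) < 80 and question.lower().startswith(("how many", "which", "who")):
        return "cheap-fast-model"
    return "mid-tier-model"

def answer(question: str, call_model) -> str:
    # Normalize the question so trivial rephrasings hit the cache.
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    if key in _cache:                      # cache hit: zero marginal cost
        return _cache[key]
    _cache[key] = call_model(route_model(question), question)
    return _cache[key]
```

Real routers are usually smarter than a keyword check — some use a cheap model call just to classify the question — but the principle is the same: only pay mid-tier prices when the question demands it.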
Size Limitations
To save on cost and context window space, our bot does not reread each article every time. Instead, it selects articles by running each question through a series of filters.
We’ve put the full text of 511 Nieman predictions into a structured dataset (it’s a small dataset, so we’re just talking about a regular spreadsheet). We could have done more, they go back to 2011 after all, but three years was enough for a balance of basic trends and low cost (see above). Each article is a row, with columns for headline, author, date, link, plus columns we created by having an LLM analyze each article once: keywords, topics, tone, core points, and summary.
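A row in that spreadsheet might look something like this. The field names are our paraphrase of the columns described above, not the actual headers, and the sample row is invented for illustration:

```python
import csv
import io

# One row per prediction. The first five fields come from the article
# itself; the last five were generated once per article by an LLM pass.
# Field names paraphrase the columns described above; the sample row
# is invented, not a real Nieman prediction.
FIELDS = ["headline", "author", "date", "link", "full_text",
          "keywords", "topics", "tone", "core_points", "summary"]

sample = {
    "headline": "AI will reshape local news",
    "author": "Jane Doe",
    "date": "2026-01-01",
    "link": "https://www.niemanlab.org/...",
    "full_text": "(full article text here)",
    "keywords": "AI; local news",
    "topics": "AI strategy",
    "tone": "optimistic",
    "core_points": "Small newsrooms can adopt AI tooling cheaply.",
    "summary": "Argues AI lowers costs for local outlets.",
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(sample)
```

The key design point: the LLM-generated columns (keywords, topics, tone, core points, summary) are computed once, up front, so every later question can filter against them cheaply instead of rereading 511 full articles.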
The chatbot can query that database. For many questions, it generates a decent answer from just those columns — without ever reading the full article text. If it’s a quantification question across the database (ex: “How many articles were about AI strategy each year?”) it will pull an answer by querying the data, rather than analyzing language. For more complex trend questions (ex: “How did opinions on AI change over time?”) the bot first filters predictions using those columns, ranks which are most responsive, then pulls a relevant section of 2,000 characters from up to 15 articles to create its report.
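The two-path logic above — database query for counts, filter-rank-excerpt for trends — can be sketched roughly like this. The keyword check and the overlap scoring are crude stand-ins for whatever the real bot does; only the 2,000-character excerpt and the 15-article cap come from the text:

```python
# Sketch of the two answering paths described above. The routing check
# and the scoring are illustrative stand-ins; the excerpt size and
# article cap are the figures from the text.

MAX_ARTICLES = 15
EXCERPT_CHARS = 2000

def is_count_question(question: str) -> bool:
    # Quantification questions get a database query, not language analysis.
    return any(w in question.lower() for w in ("how many", "count", "number of"))

def answer_question(question: str, rows: list[dict]) -> dict:
    terms = set(question.lower().split())
    def score(row: dict) -> int:
        # Rank using only the cheap metadata columns, never the full text.
        # (Naive substring matching; a real system would do better.)
        meta = " ".join([row["keywords"], row["topics"], row["summary"]]).lower()
        return sum(t in meta for t in terms)
    matches = sorted((r for r in rows if score(r) > 0), key=score, reverse=True)
    if is_count_question(question):
        return {"mode": "query", "count": len(matches)}
    # Trend questions: pull an excerpt from only the top-ranked articles.
    excerpts = [r["full_text"][:EXCERPT_CHARS] for r in matches[:MAX_ARTICLES]]
    return {"mode": "synthesize", "excerpts": excerpts}
```

The count path returns numbers straight from the filtered data; the synthesize path is what gets handed to the LLM, along with the question, to write its report.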
This setup is fairly accurate for getting the big ideas from our dataset of Nieman predictions. But there are clear drawbacks. We’re not actually reading the full text of the articles with each question, even though we have it, or using the dataset to its fullest potential — we’re asking it to filter through summaries and categories before it pulls in text samples. And the chatbot can miss that additional context, which means it can be directionally right without having all the specifics nailed.
This is a challenge with RAGs. Improving them comes down to a process called “chunking” — breaking articles into smaller pieces and tagging each piece with what it’s about, so the LLM can find the right chunk without reading everything. As humans, we do a version of this subconsciously when we research. But without proper guardrails, there’s clear potential for the original message to get distorted and the nuance to be lost when we supercharge that process with AI.
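A minimal version of the chunking idea looks like this. The sizes here are illustrative assumptions; real pipelines typically split on paragraph or sentence boundaries and tag each chunk with topics (often via an embedding or an LLM pass), rather than cutting at fixed character counts:

```python
# Minimal chunking sketch: split an article into overlapping pieces so
# retrieval can find the relevant passage without reading everything.
# Sizes are illustrative, not tuned values.

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # overlap keeps ideas from being cut in half
    return chunks
```

The overlap is the guardrail: without it, a sentence sliced across two chunks can lose its meaning in both, which is exactly the kind of distortion the paragraph above warns about.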
We didn’t spend a huge amount of time on the user interface either, so sometimes it thinks you’re asking a follow-up on your previous question, rather than a fresh one. But hey! This was a couple of hours of work.
Potential
The takeaway: I think you, too, should build a RAG. Yes, you, the non-technical journalist who reads our Substack because you are trying to learn about AI, but are intimidated by the idea of coding (we see you, you’re welcome here).
A lot of good (or “more accurate and specific”) uses of LLMs right now amount to systems thinking — and journalists are really good at this when we apply it to social, political, and economic systems. Going through the process of building, and troubleshooting, a basic RAG can teach you how to apply that mindset to structuring a knowledge base for information synthesis by an LLM, and how the knowledge you are already sitting on in your newsroom could be valuable in new ways in the AI-information age.
And you can try it without writing code.
I asked Claude Code to plan, build, and test this with me — I served as a project manager while the AI wrote the code. It’s not something I’d release as a branded tool without a real developer (like any of the good people from the News Product Alliance) to help us verify accuracy, reduce cost, review security, and tell me what else I’ve missed. But the barriers to building prototypes have dropped an astonishing amount.
This exercise got me thinking more about how articles become structured data, how the ways they are classified affect the quality of responses, and the cost and context limitations of commercial LLMs, even on a small dataset. These are things everyone in journalism should be learning. RAGs and other AI tools are only becoming more common — if public information is at the core of what we do, then we all need to learn about their potential and limitations. You can try now, or keep an eye out for a future post that breaks down how to try this.
The utility of this kind of document querying is clear for journalists — that’s why Google’s NotebookLM has been adopted by a number of newsrooms, and why Pinpoint was valuable before it. (There’s also the fact that the free tier of NotebookLM comes with Google accounts.) And, if you’ve never used it, NotebookLM is a great place to start getting familiar with how LLMs retrieve information from a set of records.
If this utility is clear for us, it’s clear for the rest of the world too; we’re not the only people drowning in information. If this is the way we’re likely to be combing through news and research, why wouldn’t everyone else be doing the same thing — including to our precious prose? If this kind of querying becomes the preferred way of consuming information, do articles become a premium product? Or extinct? Or maybe tools like this could actually drive more readership if they surface articles that readers might not otherwise have found?
Whatever the answers are, it seems essential to at least understand how these systems work. So your first homework assignment is this: What would you make a small RAG out of? A set of court cases? SEC filings? Articles from your own newsroom? A batch of records with a few hundred items or pages is a good starting point.
Or maybe all the content from this Substack.


