OpenEvidence, which is valued at $425 million, is taking on one of AI’s big engineering challenges: large language models whose training is stuck in the past.
By Katie Jennings, Forbes Staff
One of the limitations of large language models is that their training is frozen in time. If you ask OpenAI’s viral chatbot ChatGPT if Covid vaccines work against the most common variant circulating in 2023, it responds: “As an AI language model, I don’t have access to real-time data or information beyond my last update in September 2021.”
A tremendous amount has changed since then – there are new Covid strains, new vaccine and drug approvals, and tens of thousands of new scientific studies. In order for chatbots to be useful in a medical setting, they are going to need access to the latest research. Armed with $32 million in capital, nearly a dozen employees who hold PhDs or are PhD candidates, and a supercomputer in the Nevada desert, Daniel Nadler has been working to solve this knowledge cutoff problem with his new startup OpenEvidence.
Constantly retraining machine learning models requires huge amounts of costly computing power, but there is another option. It’s a technical and engineering challenge that involves “marrying these language models with a real-time firehose of clinical documents,” says OpenEvidence founder Nadler, 40. Essentially, that means granting the AI access to a new pool of data right before it answers the question, a process computer scientists call “retrieval augmented generation.” If you ask OpenEvidence’s chatbot the question about vaccines and the new Covid variant, it responds that “specific studies on this variant are limited” and includes information from studies published in February and May 2023 with citations. The main difference, says Nadler, is that his model “can answer with an open book, as opposed to a closed book.”
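For readers who want to see the "open book" idea in concrete terms, here is a minimal sketch of retrieval augmented generation. It is an illustration only, not OpenEvidence's system: the `Document` class, the word-overlap scoring, the prompt wording and the three toy documents are all invented for this example. A production retriever would likely use something far more sophisticated than counting shared words.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str      # hypothetical identifier used for citations
    published: str   # ISO date string
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, pool: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents that share the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(d.text)), d) for d in pool]
    scored = [(score, d) for score, d in scored if score > 0]  # drop unrelated docs
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(question: str, docs: list[Document]) -> str:
    """Put the retrieved passages ahead of the question, tagged for citation."""
    context = "\n".join(f"[{d.doc_id}, {d.published}] {d.text}" for d in docs)
    return (
        "Answer using only these sources and cite them:\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy document pool (placeholder text, not real studies).
pool = [
    Document("study-A", "2023-02-01", "Vaccine effectiveness against a new covid variant in adults"),
    Document("study-B", "2023-05-01", "Booster response to the covid variant circulating in 2023"),
    Document("study-C", "2019-06-01", "Knee replacement recovery times"),
]

question = "Do covid vaccines work against the 2023 variant?"
docs = retrieve(question, pool)
prompt = build_prompt(question, docs)
print(prompt)
```

The language model itself is never retrained in this sketch. Only the text handed to it changes, which is why new documents can become usable as soon as they are added to the pool.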
This isn’t Nadler’s first time as the founder of an AI startup. He sold his previous company, Kensho Technologies, to S&P Global for $550 million (plus $150 million in stock) in 2018. Kensho is an AI-powered tool for Wall Street traders that analyzes millions of market data points to help identify arbitrage opportunities.
During the Covid pandemic, as the number of scientific studies about Covid-19 ballooned from zero to tens of thousands in the span of a few months, Nadler saw that healthcare providers were facing a similar problem to traders: how to separate credible, actionable information from the noise. He soon learned that wasn’t just true of Covid studies, but of the medical field more broadly, as around two scientific papers are published every minute. “The fundamental construct of the problem was identical,” says Nadler. “An information overload and a need to triage that information and a need to use computers to do so.”
U.S. Venture Capital Funding In Healthcare Artificial Intelligence/Machine Learning Startups
Venture capital investors have poured more than $46 billion into U.S. healthcare-focused artificial intelligence and machine learning startups over the past decade, according to data from PitchBook. Investment peaked at $13.4 billion in 2021 and was down to $10.3 billion in 2022. Startups have raised $3 billion across 205 deals so far this year.
Nadler founded OpenEvidence in November 2021. After investing $5 million of his own money, he says he closed a $27 million Series B funding round from outside investors in July 2022, valuing the startup at $425 million. He opened the round to former Kensho investors, including billionaire venture capitalist Jim Breyer, billionaire Vista Equity Partners cofounder Brian Sheth and investment banker Ken Moelis, among others. In March, OpenEvidence was selected to participate in a Mayo Clinic Platform accelerator. Since then, Nadler says more than 10,000 clinicians have signed up for early access, which is what’s driving him to come out of stealth now.
Nadler says OpenEvidence is trying to take on UpToDate, the big incumbent database from Netherlands-based global data company Wolters Kluwer that is used by two million healthcare workers worldwide. The clinical solutions in Wolters Kluwer’s health division, which includes UpToDate, generated more than $900 million in revenue in 2022. UpToDate relies on more than 7,000 human experts to write and edit the entries around medical topics, according to Wolters Kluwer Health spokesperson Suzanne Moran. “Topics in UpToDate are revised when important new information is published,” Moran said in a statement. Editors review more than 420 peer-reviewed journals.
Where Nadler sees AI having an advantage over the human-edited entries is that OpenEvidence is interactive rather than a static page of text, meaning users can tailor their questions to precise patient scenarios and ask follow-ups rather than having to read through huge chunks of text. It can also scan tens of thousands of journals compared to hundreds. The document pool that OpenEvidence is retrieving information from includes more than 35 million journal articles. Nadler says it sifts through the National Library of Medicine, which includes more than 31,000 peer-reviewed journals, multiple times a day. He says there is around a 24-hour lag time to process the new journal articles and get them into the retrieval pool.
All that data poses one potential logjam for Nadler’s goals, though: not all journal articles are created equal when it comes to the quality of what they publish. The scientific community has a ranking system known as impact factor, which means journals that are more highly cited are more important on a relative basis compared to journals with fewer citations. The OpenEvidence models factor this in when retrieving information from the pool of new journal articles. “You have evidence weighted answers,” says Nadler, meaning the “quality of the input source” is taken into account.
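One way to picture an "evidence weighted" ranking is to scale each article's relevance to the question by a measure of its journal's standing. The sketch below is an assumption for illustration: the formula, the `evidence_weighted_score` function and the numbers are invented here, and the article does not say how OpenEvidence actually combines these signals.

```python
import math


def evidence_weighted_score(relevance: float, impact_factor: float) -> float:
    """Scale relevance by log(1 + impact factor).

    The logarithm is a hypothetical design choice: it lets journal standing
    matter without allowing a very high impact factor to swamp relevance.
    """
    return relevance * math.log1p(impact_factor)


# Made-up candidates: the second is slightly more relevant to the question,
# but the first appears in a far more highly cited journal.
candidates = [
    {"title": "Article in a high-impact journal", "relevance": 0.70, "impact_factor": 50.0},
    {"title": "Article in a low-impact journal", "relevance": 0.80, "impact_factor": 2.0},
]

ranked = sorted(
    candidates,
    key=lambda c: evidence_weighted_score(c["relevance"], c["impact_factor"]),
    reverse=True,
)

for c in ranked:
    print(c["title"])
```

Under these invented numbers the high-impact article ranks first despite its lower relevance, which is the trade-off a weighting scheme of this kind has to tune.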
Every large language model behaves differently, but the general idea is that they compose answers by predicting the next most likely word in a sentence. The models tend to get an answer wrong when “many different completions [are] equally probable,” says Uri Alon, a postdoctoral researcher at the Language Technologies Institute at Carnegie Mellon University, who is not affiliated with OpenEvidence.
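The situation Alon describes can be put in numbers with entropy, a standard measure of how spread out a probability distribution is. The sketch below compares two hypothetical next-word distributions; the words and probabilities are invented for illustration and do not come from any real model.

```python
import math


def entropy_bits(probs) -> float:
    """Shannon entropy in bits: higher means the probabilities are more evenly spread."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-word probabilities for a well-documented fact...
confident = {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01}
# ...and for a detail the model has no training data on.
uncertain = {"1962": 0.25, "1971": 0.25, "1980": 0.25, "1985": 0.25}

confident_entropy = entropy_bits(confident.values())
uncertain_entropy = entropy_bits(uncertain.values())

print(f"confident: {confident_entropy:.2f} bits")
print(f"uncertain: {uncertain_entropy:.2f} bits")
```

In the second case every completion is equally likely, so whichever word the model picks is effectively a guess.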
If you take a model that’s been trained on the internet and ask it about a famous person, it is likely to get biographical information correct. But if you ask about a regular person that it doesn’t have training data on, it might generate an incorrect response, known as a “hallucination.” Now, if you provide the model with a pool of information, including that regular person’s biographical data, it would be much more likely to get it right. “Some approaches allow you not only to generate an answer that is consistent with those documents that you retrieve,” says Alon. “But also pull the exact sentence or exact paragraph that says so.”
This is the approach that OpenEvidence is taking by providing citations to the journal articles that it is pulling from. However, Alon cautions that while retrieval augmented systems may help reduce hallucinations, nothing is bulletproof. These models will always be fallible, much as humans are. “If you gave a human a bunch of documents or paragraphs, let the human read it and then answer questions, and also ask the human to tell you where their answer came from in those documents, even humans would make mistakes,” he says.
Right now, OpenEvidence is free to use for early adopters who are licensed medical professionals. Part of the rationale for this is the amount of computing power – and expense – it takes to run the queries. Antonio Forte, a professor of plastic surgery at Mayo Clinic who is on OpenEvidence’s medical advisory board, says he uses UpToDate on a regular basis. Forte says the biggest difference using OpenEvidence over the past few weeks has been the time savings. Rather than having to read through the equivalent of a book chapter, he can get an answer “within 30 seconds, not within 10 minutes.”
The hope is that other healthcare workers will have a similar reaction to Forte. Nadler says he hasn’t decided on a revenue model yet. He is debating between subscription-based and ad-based, but is leaning towards a hybrid, an ad-based model with a subscription upsell. But one thing is for sure. OpenEvidence will not become a chatbot for the average patient. “That is not a technical problem. That is a regulatory and ethical problem,” says Nadler, which is why he wants to create a tool to help doctors and nurses but have them still rely on their human judgment. “[There] is a very firm limit to any conceivable harm that could come from the usage of the technology to a patient, because it is always being intermediated by a professional.”
Additional reporting by Kenrick Cai