As more organizations turn to generative artificial intelligence (genAI) tools to transform massive amounts of unstructured data and other assets into usable information, being able to find the most relevant content during the AI generation process is critical.
Retrieval-augmented generation, or “RAG” for short, is a technique that does just that: it creates a more customized genAI model that can return more accurate and specific responses to queries.
Large language models (LLMs), a type of deep-learning model, are the basis of genAI technology; they’re pre-trained on vast amounts of unlabeled or unstructured data that, by the time a model is available for use, can be outdated and not specific to any one task.
LLMs can consist of a neural network with billions, or even a trillion or more, parameters. RAG optimizes the output of an LLM by referencing an external knowledge base beyond the model’s training data. In other words, RAG enables genAI to find and use relevant external information, often from an organization’s proprietary data sources or other content to which it’s directed.
It not only amplifies an LLM’s knowledge base “but also significantly improves the accuracy and contextuality of its outputs,” Microsoft explained in a blog.
RAG is essentially a design pattern that uses search functionality to retrieve pertinent data and add it to the prompt of a genAI model to better ground the generative output with factual and new information.
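The pattern can be sketched in a few lines of code. The snippet below is an illustrative toy, not a production implementation: it scores documents by simple keyword overlap (a real system would use vector embeddings and a search index), and the knowledge base, `retrieve`, and `build_prompt` names are all invented for the example.

```python
import re

# Toy illustration of the RAG design pattern: retrieve the most relevant
# document from a knowledge base, then add it to the prompt sent to a
# genAI model so the answer is grounded in that retrieved data.
KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "The Pro plan includes unlimited seats and priority support.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    q = tokens(query)
    return max(docs, key=lambda d: len(q & tokens(d)))

def build_prompt(query: str, context: str) -> str:
    """Ground the model's answer in the retrieved context."""
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

query = "What is the refund policy?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE))
print(prompt)  # the prompt now contains the 30-day returns document
```

In a real deployment, the final prompt would be sent to an LLM API, and the retrieval step would run against the organization’s own indexed content rather than a hard-coded list.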
“RAG can be used for both retrieving public internet data as well as for retrieving data from private knowledge bases,” according to Gartner Research.
Patrick Lewis, a natural language processing research scientist with start-up Cohere, originally coined the term RAG in a paper published in 2020. Lewis pointed out that LLMs cannot easily expand or revise their memory, and they can’t straightforwardly provide insight into their predictions, leading to “hallucinations.”
Just last week, Slack unveiled AI-based tools for businesses and cited RAG as one way the company hopes to reduce hallucinations in genAI results.
In addition to Cohere, more than a half dozen vendors provide native or stand-alone solutions for developers to build RAG-based applications for an LLM. They include Vectara, OpenAI, Microsoft Azure Search, Google Vertex AI, LangChain, LlamaIndex and Databricks.
“More and more the solutions around RAG — and enabling people to use that more effectively — are going to focus on tying into the right data that has business value as opposed to just the raw productivity improvements,” said Rick Villars, IDC group vice president of worldwide research.
With RAG, organizations can maximize the chances of producing accurate results based on factual inputs, said Avivah Litan, distinguished vice president analyst at Gartner. It also minimizes the chances of hallucinations, since outputs are grounded with retrieved data.
RAG also allows workers to find, summarize, and use the information they're looking for faster by applying the power of third-party LLMs to an organization’s own data. It can also help protect the organization from liability incurred when copyrighted or other IP-protected materials are incorporated into LLM responses.
“This possibility is greatly reduced, because the prompt responses can be grounded in enterprise data,” Litan said.
One way to get better access to business information using RAG is with a vector database and graph technologies that can tap into proprietary data and allow an organization to truly dig into the business value, Villars said.
A vector database stores, indexes, and manages massive quantities of high-dimensional vector data efficiently; as a result, companies are spending money to develop them or to add vector search capabilities to their existing SQL or NoSQL databases in support of genAI use cases and applications.
By 2026, more than 30% of enterprises are expected to adopt vector databases to ground their foundation models with relevant business data, according to Gartner Research. Gartner lists vector databases as a “critical enabler” enterprise technology for 2024.
Popular uses for vector databases include product recommendations, similarity search, fraud detection, and generative-AI-powered question-and-answer applications, according to Gartner.
Vector databases can and often do serve as the backbone of RAG systems. The databases store and manage data typically derived from text, images, or sounds, which are converted into mathematical vectors.
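A rough sketch of what that retrieval looks like under the hood: content is mapped to numeric vectors, and a query returns the stored item whose vector is closest, commonly measured by cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions, with a specialized index for speed.

```python
import math

# Hand-made 3-D "embeddings" standing in for vectors produced by an
# embedding model and stored in a vector database.
VECTORS = {
    "refund policy": [0.9, 0.1, 0.0],
    "support hours": [0.1, 0.9, 0.1],
    "pricing plans": [0.0, 0.2, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec: list[float]) -> str:
    """Return the stored item whose vector is closest to the query."""
    return max(VECTORS, key=lambda k: cosine(query_vec, VECTORS[k]))

print(nearest([0.8, 0.2, 0.1]))  # prints "refund policy"
```

A vector database performs essentially this comparison at scale, using approximate nearest-neighbor indexes rather than the brute-force scan shown here.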
“The other part of that is back to app modernization,” Villars said. “One of the biggest legacy install bases companies have today are old client-server apps and even early mobile and cloud apps built on Java. We have to modernize those to make them part of this AI story.”