Integrating operational data with LLMs

September 7, 2023
Using LLMs with custom data

Artificial Intelligence (AI) is rapidly transforming how we interact with data and information. One of the most significant advancements in this field has been the development of Large Language Models (LLMs) like OpenAI’s ChatGPT. To fully leverage these technologies, however, it’s crucial to learn how to integrate them with the knowledge that is specific to your business. This blog post explores the essential aspects of this integration, including one that is often overlooked, and how the RAW platform can help you accomplish it.

The Importance of AI Integration

The integration of AI into business processes and applications is no longer a nice-to-have but an actual business necessity. AI services, especially those powered by LLMs, offer unparalleled capabilities in processing and understanding natural language, making them invaluable for a wide range of applications, from simple chatbots to advanced data analysis or even recommendation systems. Companies require these advanced capabilities to remain competitive going forward.

The Key is to Integrate Real-Time Operational Data

In the context of LLMs, we often see discussions on how to train LLMs on internal company documents - e.g., PDFs, Word documents, or PowerPoint presentations. These documents undoubtedly contain invaluable information, but this "human-made data" is just a tiny subset of a company's knowledge base.

In fact, we argue that the most valuable data in your company is not in PDFs or Word documents. The "beating heart" of your company is inside its operational systems - e.g., CRM systems, HR systems, the relational databases running your services, or even log files in a data lake. We need to be able to tap into that knowledge in real time and make it visible to the LLM, because that's where the real-time view of your business actually lives. That is where novel insights are to be found: the insights that never made it into a quarterly report or presentation.

Your First Steps to Become an AI-Powered Company

There is no well-defined path to becoming an AI-powered company, that is, a company that successfully integrates LLMs into its day-to-day activities for a competitive edge. After all, the field is new and rapidly changing. However, some patterns are starting to emerge, and companies need to start somewhere. Briefly, a typical path might be:

  • As a first step, create an internal chatbot service. Being an internal service, it presents somewhat reduced risks. Make sure this chatbot taps not just into unstructured data (PDFs, Word docs) but also experiments with real-time operational data (see the sketch after this list).
  • Then, as you gain experience, start progressively deploying AI-powered features in your analytics stack, e.g., use LLM services to improve your own data discovery or data analysis.
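To illustrate that first step, here is a minimal sketch of a single chatbot turn that grounds the LLM in live operational data rather than static documents. The internal endpoint URL and the call_llm helper are assumptions made for the sake of the example; you would wire in your own data product and LLM client.

```python
# A sketch of one chatbot turn grounded in real-time operational data.
# The endpoint URL and call_llm() are hypothetical placeholders.
import json
import requests

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to your LLM provider (OpenAI, SquirroGPT, ...)."""
    raise NotImplementedError

def answer(question: str) -> str:
    # 1. Fetch fresh operational data at question time, not from a stale report.
    live = requests.get(
        "https://internal.example.com/api/v1/open_tickets",  # hypothetical data product
        timeout=10,
    ).json()
    # 2. Put the data into the prompt so the answer reflects the current state.
    prompt = (
        "You are an internal assistant. Answer using only the data below.\n"
        f"Data: {json.dumps(live)}\n\n"
        f"Question: {question}"
    )
    # 3. Delegate the natural-language part to the LLM.
    return call_llm(prompt)
```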

Every company's path towards AI is different. We do, however, strongly recommend bringing real-time operational data into the design, because that's where the main competitive advantages will reside. For instance, even an internal chatbot benefits immensely from real-time operational data: by feeding these bots with data from databases, data lakes, or web services, companies can significantly enhance the responsiveness and relevance of their answers. Instead of serving stale data from a report generated many quarters ago, you provide real-time insights based on the latest activity or pressing issues.

How to link operational data with AIs

Normally, LLMs are either trained on static documents or, more commonly nowadays, combined with techniques such as retrieval-augmented generation (RAG).
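For intuition, here is a minimal sketch of the RAG pattern: retrieve the documents most relevant to a question, then hand them to the LLM as context. The toy keyword-overlap scoring and sample documents below are illustrative stand-ins for the vector embeddings and vector databases used in real systems.

```python
# A toy sketch of retrieval-augmented generation (RAG). Real systems
# replace the keyword-overlap scoring with vector embeddings and a
# vector database; the documents here are illustrative.
documents = [
    "Q2 2023 report: revenue grew 12% year over year.",
    "Support policy: enterprise tickets are answered within 4 hours.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank documents by how many words they share with the question.
    words = set(question.lower().split())
    return sorted(documents,
                  key=lambda d: -len(words & set(d.lower().split())))[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The resulting prompt is then sent to the LLM of your choice.
print(build_prompt("How fast are enterprise support tickets answered?"))
```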

However, there's a better way to link operational data. Unlike documents, operational data needs additional "context". A simple value such as "sales: 4000" is meaningless on its own: sales of what product, over what period? That's why linking operational data with an LLM is best done by creating a data product. This data product, in practice, is a REST API that reads data from one or more operational systems and provides the additional context - in the form of metadata - so that the LLM can interpret the data correctly. Because the REST API serves data directly from the source, the AI system always consumes the latest information. Platforms like RAW make this process very simple to develop and operate.
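To make this concrete, here is a minimal sketch of such a data product, assuming a hypothetical sales table in an operational SQLite database and using Python with Flask; the endpoint, fields, and metadata schema are illustrative rather than RAW-specific.

```python
# A minimal sketch of a "data product" REST API: it reads live data from
# an operational database and wraps it in metadata so an LLM can
# interpret the values correctly. Table, fields, and metadata schema
# are illustrative assumptions.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "operational.db"  # hypothetical operational database

@app.get("/api/v1/sales")
def sales_by_product():
    period = request.args.get("period", "2023-Q3")
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT product, SUM(amount) FROM sales WHERE period = ? GROUP BY product",
        (period,),
    ).fetchall()
    conn.close()
    # The metadata turns a bare figure like "sales: 4000" into something
    # the LLM can interpret: what is measured, for which period, in what unit.
    return jsonify({
        "metadata": {
            "description": "Total sales per product, read live from the orders system",
            "period": period,
            "currency": "USD",
            "freshness": "computed from the source at request time",
        },
        "data": [{"product": p, "total_sales": t} for p, t in rows],
    })
```

An LLM that calls this endpoint receives both the numbers and the context it needs to phrase a correct answer.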

But what should these APIs look like? Many of the principles behind data sharing with external parties also apply to feeding data in real time to LLMs, so it’s worth reading about how to do so here. In fact, it’s vital to understand the general principles of building and hosting APIs, such as data security, compliance, and scalability, and to employ best practices in metadata management. And again, RAW handles most of these tasks automatically for you, which makes deployment extremely simple.
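As one small example of those practices, a data product should never be exposed without authentication. Here is a sketch of an API-key check in Flask; the header name and key store are illustrative, and a production system would use a secrets manager, per-consumer keys, and audit logging instead.

```python
# A minimal sketch of securing data-product endpoints with an API key.
# Header name and key store are illustrative only.
from flask import Flask, abort, request

app = Flask(__name__)
VALID_KEYS = {"llm-gateway-key"}  # hypothetical key store

@app.before_request
def require_api_key():
    # Reject any caller that does not present a known key.
    if request.headers.get("X-API-Key") not in VALID_KEYS:
        abort(401)
```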

What about sensitive data? Hallucinations?

So far, we have discussed OpenAI's ChatGPT, the juggernaut of LLMs. There are two very valid concerns when using that system. The first is sensitive data: after all, for the LLM to interpret the operational data, it needs to have access to it, and in many situations that is a no-go. Then there's the risk of hallucinations: how can we be sure we are getting the right results? Even if the LLM's responses come from real datasets, how can we be sure it truly understood the question and is fetching the right "value"? And questions are sometimes ambiguous...

Here we may look at solutions beyond ChatGPT as well. One of our favorite alternatives is SquirroGPT, a system by Squirro. SquirroGPT is an enterprise-grade, evidence-based chat service that allows companies to leverage the power of AI while maintaining data security and compliance. Systems like SquirroGPT give you the answer but also explain where the data came from. This is invaluable: it's like reading a document and verifying that its references and footnotes come from trustworthy sources. As a reader and interpreter of the final results, you are better able to judge the answer: did it read the right dataset? Did it understand what you were asking?

Finally, RAW provides out-of-the-box integration for building ChatGPT plugins, as well as integration with SquirroGPT. The choice of LLM is yours!

Next steps!

RAW stands out as a comprehensive solution for LLM integration, offering a straightforward approach to building AI-powered APIs. It provides native integration with LLMs and ready-to-use templates, making the integration process seamless. This comes on top of RAW's standout feature: rapid API development. Users can build APIs in minutes, which is crucial in today’s fast-paced business environment, and this speed does not come at the cost of security, as RAW ensures secure access.

Want to give it a try? Sign up now to get started for free!
