SemDB: Solving the Challenges of Graph RAG

Matt Furnari
11/21/2024

In the beginning there was keyword search

Eventually word embeddings came along and we got Vector Databases and Retrieval Augmented Generation (RAG). They were good for writing blog posts about topics that sounded smart, but didn’t actually work well in the real world. Fast forward a few years and some VC-hungry individuals bolted Graph Databases onto the Vector Databases, and Graph RAG was born.

It’s still great for blog posts. Still doesn’t work well in the real world. 

Enter SemDB.ai. 

SemDB is an abbreviation for Semantic Database. It’s a database of “semantics” – a database of meaning. SemDB strives to go beyond mathematical tricks and triples. It stores “meaning”. It allows us to index, retrieve, and act upon data by its meaning – not just its cosine similarity.

Behind the scenes, SemDB uses Ontology-Guided Augmented Retrieval (OGAR): a leap forward that enables faster, more cost-effective, and more scalable solutions for real-world applications.

In this post we will focus on a few shortcomings of the Graph RAG approach and how SemDB solves them. For an overview of both Graph RAG and some of its problems, take a look at the article “Graph RAG Has Awesome Potential, But Currently Has Serious Flaws” by Troyusrex on Generative AI.

Advantages of Graph RAG

Graph RAG is a huge advance over traditional Vector search.
  • Enhanced Contextual Understanding: By leveraging graph structures, Graph RAG can capture complex relationships between entities, leading to more accurate and context-aware information retrieval. This is particularly useful for tasks requiring deep understanding and reasoning.
  • Improved Retrieval Precision: Graph RAG can improve retrieval precision by using graph-based indexing and retrieval methods. This ensures that the most relevant information is retrieved, even if it is buried within a large dataset.
  • Mitigation of Hallucination: Traditional language models sometimes generate "hallucinated" information, which is not accurate or relevant. Graph RAG helps mitigate this issue by referencing structured knowledge bases, ensuring the generated content is grounded in factual data.
  • Domain-Specific Knowledge: Graph RAG can be tailored to specific domains by incorporating domain-specific knowledge graphs, making it highly effective for specialized applications such as legal research, medical diagnostics, and technical documentation.

Problems with Graph RAG

But real-world Graph RAG applications have several significant problems:
  • Speed: Graph RAG is horrendously slow for real-world applications, often taking minutes to respond.
  • Cost: Data preparation can cost many thousands of dollars for moderately sized datasets.
  • Scalability: The reliance on clustered communities makes scaling challenging.
  • Accuracy: Testing has shown little increase in search accuracy compared to traditional RAG.

SemDB to the Rescue

If the progression has been

Keyword Search → Vector Search (RAG) → Graph Search (Graph RAG)

Then let’s skip ahead a few progressions and get to the end:
Keyword Search → Vector Search (RAG) → Graph Search (Graph RAG) → ??? → OGAR (Ontology-Guided Augmented Retrieval)
You gotta admit, it’s an awesome acronym, right? OGAR…. Grrr

Vector Search and Graph RAG attempt to allow us to search by meaning. Before the arrival of ChatGPT, scientists used to think about questions like “How do we represent meaning? What does it mean ‘to mean’?” There is a rich history of meaning representation that goes beyond word embeddings (vectors) and triples (graphs). Unfortunately, it’s now easier to outsource every task to a multi-hundred-gigabyte neural network than it is to write code. When all you have is an LLM, everything looks like a prompt engineering task.

In contrast to Graph RAG, Semantic Database (SemDB) is designed to handle complexity effortlessly. Its ontology-driven framework and Local Understanding solve the problems of Graph RAG.

Local Understanding

As I previously mentioned, not everything needs to be outsourced to ChatGPT. SemDB is able to understand somewhere around 80-90% of sentence inputs without the use of an LLM. That means it can do 80-90% of the processing work without paying a per-token fee.

One of the greatest challenges with traditional Graph RAG systems is the prohibitively high cost of entity extraction, driven by heavy reliance on LLMs. Each data chunk and cluster requires multiple LLM calls, quickly adding up to tens of thousands of dollars for large datasets. SemDB, however, does most of this work locally, without involving Big Brother OpenAI.
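
To make Local Understanding concrete, here is a minimal sketch of what a local-first pipeline can look like. Everything in it is an illustrative assumption rather than SemDB’s actual API: a hand-built vocabulary (local_vocabulary) resolves most sentences on the spot, and only the leftovers are escalated to a per-token LLM call (extract_with_llm).

```python
# Hypothetical sketch of "Local Understanding": resolve most sentences against a
# local, organization-specific vocabulary and escalate only the remainder to an LLM.
# The names below are illustrative assumptions, not SemDB's real implementation.
import re

# Toy organization-specific vocabulary: surface forms -> canonical concepts.
local_vocabulary = {
    "remote patient monitoring": "Concept:RemotePatientMonitoring",
    "rpm": "Concept:RemotePatientMonitoring",
    "cpt code": "Concept:BillingCode",
}

def extract_locally(sentence: str):
    """Return (entities, resolved); resolved=False means 'send it to the LLM'."""
    text = sentence.lower()
    entities = [concept for term, concept in local_vocabulary.items()
                if re.search(rf"\b{re.escape(term)}\b", text)]
    return entities, bool(entities)

def extract_with_llm(sentence: str):
    """Placeholder for the expensive fallback that pays a per-token fee."""
    raise NotImplementedError("escalate only the sentences we can't parse locally")

def process(sentences):
    extracted, escalated = [], []
    for s in sentences:
        entities, resolved = extract_locally(s)
        if resolved:
            extracted.append((s, entities))   # handled locally, no API cost
        else:
            escalated.append(s)               # the only sentences that cost money
    return extracted, escalated
```

In terms of this sketch, the 80-90% figure above is the fraction of sentences that never reach the LLM fallback.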

Why is that important?
  • Cost: Fewer LLM calls mean less $$$.
  • Accuracy: Local Understanding allows for Organization Specific vocabularies.
  • Speed: Local Understanding means local processing… and that’s fast.
  • Security: Not every piece of data needs to be sent to our AI overlords so that they may use it to train their next models.
  • Note: OpenAI and Google both super-duper promise not to ever use your data to train their models. Seriously, they pinky-swore and everything.

Cost Advantages of Local Understanding

With Local Understanding, SemDB significantly reduces the dependency on costly LLM calls, allowing organizations to process larger datasets at a fraction of the price:
  • Reduced External LLM Calls:
      • Traditional systems require one LLM call per data chunk and one per cluster. SemDB’s Local Understanding handles these tasks algorithmically, bypassing the need for external calls entirely.
      • This approach slashes costs, making large-scale projects financially viable.
  • Scalable Data Extraction:
      • Because Local Understanding operates within the organization’s infrastructure, there is no incremental cost for scaling. SemDB can handle datasets with millions of entities without ballooning expenses.
      • For example, where traditional methods might cost $60,000 for a million records, SemDB achieves the same results at a fraction of the cost, with no ceiling on dataset size or complexity (see the rough cost sketch after this list).
  • Optimized Processing for Domain-Specific Graphs:
      • By tailoring its Local Understanding capabilities to the specific needs of the organization, SemDB enables the creation of more complex, richly detailed graphs without incurring additional costs.
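
The cost claim above is easy to sanity-check with back-of-the-envelope arithmetic. Every number in the sketch below (chunks per record, cluster count, price per call, and the 85% local-understanding rate) is an assumption chosen for illustration, not a measured figure:

```python
# Back-of-the-envelope cost model for entity extraction (illustrative numbers only).
records = 1_000_000
chunks_per_record = 2        # assumption: each record splits into ~2 chunks
clusters = 50_000            # assumption: community summaries required by Graph RAG
cost_per_llm_call = 0.025    # assumption: average $ per extraction/summarization call

# Traditional Graph RAG: one LLM call per chunk plus one per cluster.
graph_rag_calls = records * chunks_per_record + clusters
graph_rag_cost = graph_rag_calls * cost_per_llm_call

# Local-first: assume ~85% of chunks are handled locally at negligible marginal cost.
local_fraction = 0.85
local_first_calls = records * chunks_per_record * (1 - local_fraction)
local_first_cost = local_first_calls * cost_per_llm_call

print(f"Graph RAG:   {graph_rag_calls:,.0f} calls ~ ${graph_rag_cost:,.0f}")
print(f"Local-first: {local_first_calls:,.0f} calls ~ ${local_first_cost:,.0f}")
```

Under these assumptions the traditional approach lands in the tens of thousands of dollars, in the same ballpark as the $60,000 figure cited above, while the local-first path pays only for the escalated minority of chunks.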

Beyond Cost Savings: Enabling Richer Graphs

SemDB’s ability to extract more data for less cost doesn’t just save money—it also empowers organizations to build bigger, more detailed, and more accurate graphs:
  • Incorporating Nuanced Relationships: Local Understanding allows SemDB to detect subtle, domain-specific relationships that external systems might overlook, enriching the knowledge graph with deeper insights.
  • Expanding Data Coverage: By lowering costs, organizations can afford to process larger datasets, capturing more entities and relationships that drive value.
  • Iterative Improvement: SemDB’s architecture allows for ongoing refinement of graphs as new data becomes available, further enhancing accuracy and depth.
  • Organization-Specific Vocabularies: Every company has its own lingo, vocabulary, and internal speak that the LLMs don’t fully understand. SemDB is able to capture that meaning, store it, and operate upon it like any other semantic nugget (a rough sketch of what such an entry might look like follows below).
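
As a rough illustration of what an organization-specific vocabulary entry might look like, the sketch below maps internal lingo to a canonical concept with ontology parents and typed relations, so retrieval can operate on meaning rather than surface strings. The ConceptEntry schema and the resolve helper are guesses for exposition, not SemDB’s storage format:

```python
# Hypothetical shape of an organization-specific vocabulary entry.
# The dataclass and its fields are illustrative, not SemDB's actual schema.
from dataclasses import dataclass, field

@dataclass
class ConceptEntry:
    canonical: str                                            # canonical concept name
    surface_forms: list[str]                                  # internal lingo that resolves to it
    is_a: list[str] = field(default_factory=list)             # ontology parents
    relations: dict[str, str] = field(default_factory=dict)   # typed edges to other concepts

vocabulary = [
    ConceptEntry(
        canonical="FeedingFrenzy",
        surface_forms=["ff", "the frenzy"],
        is_a=["SoftwareProduct"],
        relations={"built_on": "SemDB"},
    ),
]

def resolve(term: str, vocab=vocabulary):
    """Map internal lingo to its canonical concept, if the vocabulary knows it."""
    t = term.lower()
    for entry in vocab:
        if t == entry.canonical.lower() or t in (s.lower() for s in entry.surface_forms):
            return entry
    return None

print(resolve("the frenzy").canonical)   # -> FeedingFrenzy
```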

Conclusion

At Intelligence Factory we use SemDB as the backbone of our applications. It allows us to build complex graphs for various domains. Honestly, our customers don’t care one bit about the advantages of Ontologies over Graphs. Some projects we’ve built on SemDB:
  • HIPAA Compliant Chat Bots: That don’t hallucinate or give dieting advice to anorexics.
  • Sales Tools: To mine thousands of conversations for missed opportunities.
What’s most important, however, is that you can take advantage of these technologies with our consumer-focused products: FeedingFrenzy.ai and SemDB.ai. Both are built on this infrastructure and offer features that make running your business easier. For the more technical side of things, feel free to check out Buffa.ly.
