Introducing RAG 2.0: Agentic RAG + Knowledge Graphs (FREE Template)
by Cole Medin • Comprehensive analysis and insights
đź“‹ Table of Contents
Introduction
Introduction to Advanced RAG Strategies: From Limitations to Synergistic Solutions
In the evolving landscape of retrieval-augmented generation (RAG) systems, traditional approaches often fall short in delivering the flexibility required for complex, real-world AI applications. As analyzed in the video, the speaker delves into innovative enhancements that address these shortcomings, drawing from extensive experimentation with various RAG techniques. This section synthesizes the key insights from the presentation, highlighting the transition from basic RAG implementations to more sophisticated, agent-driven architectures that integrate multiple knowledge retrieval mechanisms.
Traditional "naive" or "vanilla" RAG, as termed in the video and supporting resources such as the Weaviate article on Agentic RAG, operates on a rigid pipeline: documents are chunked, embedded into vectors, stored in a vector database, and retrieved based on semantic similarity to a user's query. This context is then force-fed into a large language model (LLM) for generation. While effective for straightforward queries, this method is inherently inflexible. It lacks the capacity for iterative refinement, multi-source exploration, or adaptive decision-making, limiting its utility in dynamic scenarios where queries may require relational insights or combined data representations.
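The rigid pipeline described above can be sketched in a few lines of plain Python. The toy `embed` function below is just a bag-of-words stand-in for a real embedding model (such as OpenAI's text-embedding-3-small); everything else mirrors the chunk, embed, store, retrieve, inject flow:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. "Ingest": chunk documents and store (chunk, vector) pairs.
chunks = [
    "Google's AI initiatives include Gemini and DeepMind research.",
    "Microsoft partners with OpenAI and hosts its models on Azure.",
]
store = [(c, embed(c)) for c in chunks]

# 2. "Retrieve": embed the query, rank chunks by similarity, take top-k.
def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(store, key=lambda cv: cosine(qv, cv[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# 3. The top chunk is force-fed into the LLM prompt: no reasoning, no retries.
context = retrieve("What are Google's AI initiatives?")
prompt = f"Context: {context[0]}\n\nAnswer the user's question."
```

The inflexibility is visible in step 3: whatever comes back from the single similarity search is the only context the LLM ever sees.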
"naive rag is almost never enough is because it is extremely inflexible."
To overcome these limitations, the video introduces two advanced, complementary strategies: Agentic RAG and hybrid knowledge retrieval using knowledge graphs. These approaches, when combined, enable the construction of more intelligent and flexible AI systems capable of handling a broader spectrum of queries with enhanced reasoning and precision.
- Agentic RAG: This strategy empowers an AI agent with the autonomy to reason about knowledge exploration. Rather than passively receiving pre-retrieved context, the agent dynamically selects tools, formulates queries, and iterates on searches. For instance, the speaker demonstrates an agent that chooses between vector searches and graph traversals based on the nature of the query, allowing for adaptive retrieval. This is practically implemented by customizing the agent's system prompt (e.g., in a `prompts.py` file) to define logic for tool selection, as seen in the provided GitHub repository.
- Hybrid Knowledge Retrieval: By integrating vector databases (e.g., PostgreSQL with pgvector for semantic similarity searches) and knowledge graphs (e.g., Neo4j with Graphiti for relational queries), this approach leverages the strengths of both paradigms. Vector databases excel in entity-specific lookups, such as "What are Google's AI initiatives?", while knowledge graphs handle relational queries like "How are Microsoft and OpenAI related?". The video's demonstration shows a hybrid system ingesting the same data into both structures via a script like `ingestion.py`, enabling the agent to answer diverse queries more effectively.
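To make "same data, two representations" concrete, here is a minimal illustration of one fact landing in both stores. The field names and edge labels are illustrative, not the template's actual schema:

```python
fact = "Microsoft has an exclusive cloud partnership with OpenAI, hosted on Azure."

# Vector-store view: the raw text chunk plus its embedding (stubbed here;
# a real embedding from text-embedding-3-small would be 1536-dimensional).
vector_row = {"content": fact, "embedding": [0.12, -0.03, 0.44]}

# Knowledge-graph view: the same fact decomposed into entities and edges,
# the kind of structure an LLM extractor (as Graphiti uses) would produce.
graph_edges = [
    ("Microsoft", "PARTNERS_WITH", "OpenAI"),
    ("OpenAI", "HOSTED_ON", "Azure"),
]
```

The vector row answers "what did the document say about Microsoft?"; the edges answer "how is Microsoft connected to Azure?" without any similarity search at all.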
These strategies are synergistic, as evidenced by the speaker's custom agent template, which combines them to create a "powerhouse" system. As the speaker emphasizes:
"The two strategies that I keep going back to time and time again are agentic rag and knowledge graphs."
In practical applications, this synergy allows for temporal tracking of knowledge changes, complex relationship analysis, and query-specific tool selection, making it ideal for production-ready RAG systems. The subsequent sections of this article will explore the implementation details, technical stack, and setup processes drawn from the video, providing a comprehensive guide to building such systems.
The Conceptual Leap: From Naive Pre-Processing to Agentic Reasoning
Building on the introduction's overview of traditional Retrieval-Augmented Generation (RAG) limitations and the promise of Agentic RAG as a more intelligent strategy, this section delves into the core conceptual shift that defines Agentic RAG. In the video, the presenter provides a detailed analysis of this evolution, emphasizing how it transforms static data retrieval into a dynamic, adaptive process. As I synthesize the content, it's clear that this leap enables AI systems to handle complex queries with greater precision and flexibility, particularly when integrated with hybrid knowledge sources.
Defining Naive RAG: The Inflexible Foundation
Naive RAG, also referred to as vanilla or classic RAG in the video, operates as an inflexible pre-processing step in the retrieval pipeline. Here, documents are chunked, embedded into vector representations using models like OpenAI's text-embedding-3-small, and stored in a vector database such as PostgreSQL with the pgvector extension. When a user query arrives, it is similarly embedded and matched against the database to retrieve relevant chunks, which are then forcibly injected as context into the large language model's (LLM) prompt. The presenter illustrates this through a diagram from a Weaviate article (linked below), highlighting the rigidity: the system commits to a single retrieval path without options for refinement or alternative strategies. This approach, while straightforward, restricts the AI's ability to adapt to nuanced queries, often leading to suboptimal responses when relational or multifaceted information is required.
Introducing Agentic RAG: Empowering Dynamic Tool Selection
In contrast, Agentic RAG represents a paradigm shift by equipping an AI agent with a suite of tools—such as vector search, knowledge graph traversal, or even web search—and granting it the autonomy to reason about which tool or combination to employ based on the user's query. The presenter demonstrates this in a command-line interface where the agent analyzes queries like "How are OpenAI and Microsoft related?" and selects a knowledge graph tool for relational insights, or defaults to vector search for entity-specific details like Google's AI initiatives. This reasoning-driven process allows the agent to formulate queries dynamically, refine searches iteratively, or blend multiple retrieval methods, resulting in more contextually relevant and accurate outputs. As an advanced strategy, it addresses the inflexibility of naive RAG by treating retrieval as an integral part of the agent's decision-making loop rather than a detached pre-step.
The Core Implementation: System Prompt as the Reasoning Engine
At the heart of this Agentic RAG implementation lies the agent's system prompt, which serves as the blueprint for tool selection logic. In the video, the speaker explains that this prompt, customizable in a file like `prompts.py` within the provided template, defines conditional rules for tool usage. For instance, it might instruct the agent to invoke a knowledge graph tool only for queries involving relationships between entities (e.g., companies like Amazon and Anthropic), while relying on vector search for isolated facts. This customization ensures the agent aligns its reasoning with the specific structure of the knowledge base, such as combining vector databases for semantic similarity and knowledge graphs for relational navigation. Practically, developers can iterate on this prompt to optimize for their dataset, enabling applications like hybrid retrieval systems that evolve with user needs.
"A gentic rag is all about giving the agent the ability to reason about how it explores the knowledge base instead of always force-feeding that context as kind of a pre-processing step."
This quote encapsulates the essence of the conceptual leap, as analyzed in the presenter's discussion. By embedding reasoning into the retrieval process, Agentic RAG not only enhances accuracy but also supports scalable, production-ready systems. For further reading on this comparison, refer to the Weaviate article referenced in the video: What is Agentic RAG?.
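As a concrete illustration, such a rule set might look like the following hypothetical excerpt of a `prompts.py` file. The wording and tool names are illustrative, not copied from the template:

```python
# prompts.py -- hypothetical tool-selection rules for the agent's system prompt.
SYSTEM_PROMPT = """You are a research assistant with two retrieval tools.

Tool-selection rules:
- vector_search: use for questions about a single entity or topic,
  e.g. "What are Google's AI initiatives?"
- graph_search: use ONLY when the question involves relationships
  between two or more entities, e.g. "How are Microsoft and OpenAI related?"
- For questions that need both facts and relationships, call vector_search
  first, then graph_search, and synthesize the results.
"""
```

Because the rules live in plain text rather than code, tuning the agent's retrieval behavior is an editing exercise, not a refactor.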
Agentic RAG in Action: A Hybrid Retrieval System with Vector DBs and Knowledge Graphs
In building on the foundational concepts of Agentic RAG as a dynamic, reasoning-driven process—distinct from the static pipelines of traditional RAG, as outlined in prior sections—this segment delves into a practical implementation showcased in the video. The presenter, Cole Medin, demonstrates an agentic system that integrates vector databases and knowledge graphs, highlighting their synergistic potential through a real-world case study. As the website author analyzing this content, I synthesize the key elements of Medin's template to illustrate how such a hybrid approach enhances query handling in AI agents.
Case Study: AI Initiatives of Major Tech Companies
Medin constructs a case study centered on the AI initiatives of prominent tech firms, including Google, Microsoft, OpenAI, Amazon, and Anthropic. The core dataset, derived from markdown documents detailing these initiatives, is ingested into both a vector database (using PostgreSQL with the pgvector extension via Neon) and a knowledge graph (powered by Neo4j and the Graphiti library). This dual representation allows the agent to access the same information in structurally diverse formats: dense vector embeddings for semantic similarity and a graph-based structure for entities and relationships.
As Medin explains in the video, "We're storing the same data in a vector database and then also in a knowledge graph representing it very differently." This setup, available in the GitHub repository, includes an ingestion script (`ingestion.py`) that automates the population of both stores from source documents, ensuring consistency while leveraging each system's strengths.
Contrasting Strengths: Semantic Search vs. Relational Queries
The hybrid system capitalizes on the complementary capabilities of vector databases and knowledge graphs. Vector databases, such as the pgvector-enabled PostgreSQL instance, excel in semantic similarity searches, making them ideal for queries focused on individual entities. For instance, a query like "What are Google's AI initiatives?" benefits from efficient retrieval of contextually relevant document chunks via cosine similarity on embeddings.
In contrast, knowledge graphs shine in handling relational queries that involve connections between multiple entities. Medin illustrates this with examples like Amazon's investment in Anthropic or Microsoft's partnership with OpenAI, where the graph's nodes (termed "episodes" in Graphiti) and edges capture intricate relationships, such as infrastructure dependencies (e.g., OpenAI's exclusive use of Azure).
Medin notes the trade-offs: populating a knowledge graph is computationally intensive, requiring an LLM to extract entities and relationships from documents, which can take minutes compared to seconds for vector ingestion. However, this upfront cost unlocks advanced query capabilities unattainable with vectors alone, enabling traversal of relational paths for deeper insights.
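The relational queries described above reduce to path-finding over entities and edges. A minimal in-memory sketch with toy triples from the case study (this is not Graphiti's actual API or schema):

```python
from collections import deque

# Toy knowledge graph: (subject, relation, object) triples.
triples = [
    ("Microsoft", "partners_with", "OpenAI"),
    ("OpenAI", "hosted_on", "Azure"),
    ("Amazon", "invests_in", "Anthropic"),
    ("Anthropic", "runs_on", "AWS"),
]

# Undirected adjacency map so paths can be traversed in either direction.
adj = {}
for s, r, o in triples:
    adj.setdefault(s, []).append((r, o))
    adj.setdefault(o, []).append((f"inverse_{r}", s))

def relate(a: str, b: str):
    """Breadth-first search for a relation path between two entities."""
    queue, seen = deque([(a, [])]), {a}
    while queue:
        node, path = queue.popleft()
        if node == b:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"{node} -[{rel}]-> {nxt}"]))
    return None  # entities are not connected in the graph

print(relate("Microsoft", "Azure"))
# ['Microsoft -[partners_with]-> OpenAI', 'OpenAI -[hosted_on]-> Azure']
```

No embedding similarity is involved: the multi-hop connection between Microsoft and Azure falls out of the graph structure, which is exactly the kind of answer a pure vector search struggles to assemble.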
Demonstrating Agentic Decision-Making in the Demo
The video's command-line demo vividly showcases the agent's intelligence in tool selection, embodying the agentic principles discussed earlier. Hosted via a FastAPI endpoint with Pydantic AI as the agent framework, the system allows real-time interaction where the agent reasons over its tools based on the query type.
- For the query "What are the AI initiatives for Google?", the agent selects the vector search tool, retrieving and synthesizing relevant chunks to provide a comprehensive response on Google's efforts.
- For "How are OpenAI and Microsoft related?", it invokes the graph search tool, querying the knowledge graph to detail their partnership, including Azure's role as the sole hosting provider.
- In a hybrid scenario, such as "What are the initiatives for Microsoft? How does that relate to Anthropic? Use both search types," the agent combines both, first using vector search for entity-specific details and then graph traversal for relational analysis.
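The agent's choices in these three cases can be approximated by a simple heuristic: count how many known entities the query mentions. This is a toy stand-in for the real system, which delegates the decision to the LLM via the system prompt:

```python
KNOWN_ENTITIES = {"google", "microsoft", "openai", "amazon", "anthropic"}

def pick_tools(query: str) -> list[str]:
    # Count distinct known entities mentioned in the query.
    q = query.lower()
    hits = [e for e in KNOWN_ENTITIES if e in q]
    if len(hits) >= 2:
        # Relationship question: the knowledge graph is the right tool;
        # add vector search too if the query also asks for entity details.
        if "initiative" in q:
            return ["graph_search", "vector_search"]
        return ["graph_search"]
    return ["vector_search"]

print(pick_tools("What are the AI initiatives for Google?"))  # ['vector_search']
print(pick_tools("How are OpenAI and Microsoft related?"))    # ['graph_search']
```

The LLM-driven version is strictly more flexible (it can rewrite sub-queries, retry, and chain tools), but the underlying routing intuition is the same.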
This decision-making is guided by a configurable system prompt in `prompts.py`, where users can define rules like using the graph only for multi-entity questions. As Medin demonstrates, the agent's logs reveal its reasoning, such as formulating sub-queries (e.g., "OpenAI Microsoft relationship") and selecting tools dynamically. This flexibility, as per the Weaviate article referenced in the video, extends beyond single-source retrieval to support complex, multi-step explorations.
"That is what agentic rag is all about... giving the agent the ability to reason about how it explores the knowledge base."
By analyzing Medin's implementation, it's evident that this hybrid agentic RAG system not only addresses the inflexibility of traditional methods but also paves the way for more sophisticated AI applications, such as temporal tracking of knowledge changes via graphs. For those implementing this, the template supports various LLM providers (e.g., OpenAI, Ollama) and includes setup for local Neo4j via the Local AI Package, making it accessible for experimentation.
Practical Implementation: A Toolkit for Building Your Own Agentic System
Building on the conceptual foundations of Agentic RAG outlined earlier and the concrete example of a hybrid retrieval system that integrates vector databases and knowledge graphs, this section delves into the practical aspects of constructing such a system. As the website author analyzing the video content, I synthesize the speaker's guidance to provide a technical reference for developers. The speaker demonstrates a reusable template that enables agents to dynamically select retrieval strategies, and here I outline the tools, setup procedures, and methodologies to replicate and extend this approach. This toolkit emphasizes precision in implementation, leveraging open-source libraries and cloud services for scalability and efficiency.
Core Technology Stack
The speaker's hybrid agentic system relies on a carefully selected stack that balances ease of use, performance, and extensibility. At its heart is Pydantic AI, a framework for building structured AI agents that handle dynamic reasoning and tool integration. For the knowledge graph component, Graphiti serves as the abstraction layer, interfacing with Neo4j as the underlying graph database engine. This combination allows for relational querying and temporal tracking of entities, where nodes (referred to as "episodes" in Graphiti) represent evolving knowledge states.
On the vector database side, Postgres is extended with PGVector to enable semantic similarity searches. The speaker recommends hosting this via Neon, a managed Postgres platform, which simplifies deployment and scaling. Additional components include FastAPI for creating a responsive API endpoint with streaming capabilities, and support for multiple LLM providers to ensure flexibility in model selection.
- Pydantic AI: Core agent framework for reasoning-driven workflows.
- Graphiti and Neo4j: For constructing and querying knowledge graphs with relational depth.
- Postgres with PGVector (via Neon): Handles vector embeddings for semantic retrieval.
- FastAPI: Provides the backend API for agent interactions.
The choice of LLM provider also intersects with industry partnerships, such as OpenAI's exclusive hosting on Microsoft Azure and Anthropic's infrastructure on AWS, which influence model availability and embedding choices.
Setup Process and Data Ingestion
To operationalize the system, the speaker provides a streamlined setup process in the GitHub repository (linked below). Prerequisites include Python, a Postgres instance (e.g., via Neon), a Neo4j database, and an LLM API key. The process begins with creating a virtual environment, installing dependencies, and configuring environment variables in a `.env` file. This includes database URLs, Neo4j credentials, and LLM settings, allowing for providers like OpenAI, Ollama, or Gemini.
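A hypothetical `.env` layout is shown below. The variable names are illustrative; check the repository's example environment file for the exact keys the template expects:

```ini
# Illustrative .env -- variable names are assumptions, not the template's keys.
DATABASE_URL=postgresql://user:password@host:5432/dbname   # Neon Postgres
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
LLM_PROVIDER=openai                     # or ollama, gemini
LLM_API_KEY=sk-...
EMBEDDING_MODEL=text-embedding-3-small
```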
A key script, `ingestion.py`, automates the population of both the vector database and knowledge graph from source documents. Documents are placed in a `documents/` folder, typically in Markdown format. Running `python ingestion.py --clean` initializes connections, chunks the documents, generates embeddings, and inserts data into Postgres via PGVector for vector storage. Simultaneously, it uses an LLM to extract entities and relationships, populating the Neo4j graph through Graphiti.
This dual ingestion is computationally intensive for the graph due to LLM calls for entity resolution, often taking minutes compared to seconds for vector insertion. The speaker notes that the same source data is represented differently in each store, enabling the agent to choose based on query needs—semantic similarity for single-entity lookups and relational traversal for multi-entity questions.
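The chunking step in an ingestion pipeline like this is commonly a fixed-size split with overlap, so context is not lost at chunk boundaries. A minimal sketch; the sizes here are illustrative and the template's actual parameters and strategy may differ:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters, overlapping by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "Microsoft's AI strategy centers on its OpenAI partnership. " * 20
pieces = chunk_text(doc)
# Adjacent chunks share a 50-character overlap:
assert pieces[0][-50:] == pieces[1][:50]
```

Each resulting chunk is then embedded and inserted as one row in the vector store, while the graph side consumes the documents through LLM-based entity extraction instead.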
For Neo4j setup, options include the speaker's Local AI Package for local deployment or Neo4j Desktop for a graphical interface. Once configured, the agent API is launched with `python -m agent.api`, and interactions occur via a CLI tool (`python cli.py`) or direct HTTP requests to endpoints like `/chat` or `/chat/stream`.
"We're storing the same data in a vector database and then also in a knowledge graph representing it very differently."
Structured AI-Assisted Coding Methodology
The speaker advocates a disciplined approach to development using AI assistants like Claude Code, contrasting it with unstructured "vibe coding." This methodology involves creating `planning.md` and `task.md` files to guide the process. In Claude Code's plan mode (accessed via Shift+Tab twice), developers collaborate with the AI to outline project architecture, components, and a granular task list. A global rules file, `CLAUDE.md`, defines behaviors such as MCP server usage for external documentation and database management.
Once planned, exiting plan mode allows the AI to execute tasks autonomously, potentially running for 35 minutes or more while coding, testing, and iterating. The speaker incorporates examples from prior projects to inform best practices, ensuring robust outputs. This agentic coding mirrors the system's own reasoning, producing a production-ready template with unit tests and extensibility.
"Cloud code really does stand above other AI coding assistants right now just because of how agentic it is."
Practical Tips and Considerations
For optimal performance, use OpenAI's `text-embedding-3-small` model (1536 dimensions) for embeddings, adjusting SQL schemas if switching models. Configure separate LLMs for reasoning (e.g., GPT-4o-mini) and embeddings to leverage provider strengths—useful when one service lacks embedding support.
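The schema dependency exists because a pgvector column is declared with a fixed dimension. A hypothetical table (not the template's exact schema) makes this concrete:

```sql
-- Hypothetical chunks table; vector(1536) matches text-embedding-3-small.
-- A model with a different output size (e.g. 3072 dims) needs vector(3072).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    content    text NOT NULL,
    embedding  vector(1536)
);

-- Cosine-distance similarity search over the stored embeddings.
SELECT content FROM chunks ORDER BY embedding <=> $1 LIMIT 5;
```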
Neon is recommended for its generous free tier and expertise (founded by a 20+ year Postgres contributor), making it ideal for prototyping. Tweak the agent's system prompt in `prompts.py` to define tool selection logic, such as using graphs for relational queries. For local setups, integrate Ollama to run everything offline.
- Start with Neon's free tier: Sign up for $100 credit.
- Access the full template: GitHub Repository.
- Local Neo4j via: Local AI Package.
- Graphiti library: GitHub.
This toolkit empowers developers to build flexible, reasoning-driven systems, extending the hybrid approach to diverse applications while maintaining technical precision.
Conclusion
In synthesizing the insights from the video, the speaker effectively demonstrates how Agentic RAG represents a pivotal evolution from naive retrieval-augmented generation approaches, enabling the development of more intelligent and adaptable AI applications. This progression shifts AI systems from rigid, predefined pipelines to dynamic architectures where agents can reason over queries, select appropriate retrieval strategies, and integrate diverse knowledge sources for enhanced accuracy and relevance.
The presenter emphasizes that the real strength of these agentic systems emerges through hybrid retrieval paradigms, such as integrating vector databases for semantic similarity searches with knowledge graphs for relational traversals. This combination empowers agents to dynamically assess query requirements—opting for vector-based lookups for entity-specific information or graph explorations for uncovering interconnections—thereby optimizing response quality in complex scenarios like analyzing corporate AI partnerships.
Moreover, the video underscores the practicality of implementing such advanced setups today, leveraging a structured development methodology alongside accessible tools. By employing Claude Code for agentic coding assistance, Neon for scalable PostgreSQL with pgvector extensions, and Neo4j for robust knowledge graph management, developers can efficiently prototype and deploy these architectures. As the speaker illustrates through the provided template, this ecosystem facilitates rapid iteration, from ingestion pipelines to real-time query handling, making sophisticated AI systems attainable even for those starting with basic setups.
For those looking to extend this work, the GitHub repository offers a complete, production-ready implementation: Agentic RAG with Knowledge Graphs Template. Additional resources include the Weaviate article on Agentic RAG (link) and Neon's platform for database needs (sign up with $100 credit).
📚 Resources & Links
The following resources were referenced in the original video: