Agentic AI - Chapter 2

RAG, the Memory and Context

You didn’t come this far to stop, go Next but here is quick summary

RECAP..

  • The Problem RAG Solves Training cutoff · No private knowledge · Hallucination · No citations · No memory · Inject at query time · Fresh · Grounded · Auditable

    What RAG Actually Is LLM + Retrieved Context + Grounding = Response · Ingestion Pipeline · Query Pipeline · Shared Index · Offline vs Real-time · Separation of concerns

    Embeddings & Vector Space Text → Numbers · Semantic similarity · Multi-dimensional · Cosine similarity · Euclidean distance · Context-dependent · Embedding model · 1024 dimensions

    Vector Databases Pinecone · Index · Namespace · Record · Vector + Metadata + Text · ANN search · pgvector · Elasticsearch · Hybrid search · Similarity matching

    Chunking Strategy 300–600 tokens · Too large = diluted · Too small = fragmented · Recursive splitter · Chunk overlap · Logical unit · POC first · Re-index cost

    Retrieval & Top-K Top-K = 5–10 · Too low = incomplete · Too high = noise · Similarity score filter · Cosine similarity · Dynamic parameter · No re-index needed · Tune first

    Metadata Filtering Key-value tags · doc_name · time_period · department · Filter before search · Narrow search space · Faster · Cheaper · Higher precision · Library catalog

    Re-Ranking Cohere Rerank · Top-K → Top-N · Contextual relevance · Second pass · Post-retrieval · Optional layer · Milliseconds · Add only when needed

    Grounding Rules System prompt · Only use retrieved data · No hallucination · Cite sources · Acknowledge unknown · Output format · Compliance · Trust · Behavioral constraint

    Full Architecture 5 layers · Source → Ingest → Store → Retrieve → Generate · Tune per layer · Debug by layer · Cost per query · Token economics · POC before scale