Retrieval Augmented Generation (RAG) Visualization

An educational walkthrough of how RAG works in practice


1. Text Chunking

In RAG systems, documents are broken into smaller chunks for more precise retrieval. Try different chunking strategies:

  • Fixed-size chunks: Character-based chunking with an adjustable size. Simple to implement, but may break semantic units mid-sentence.
  • Sliding window: Fixed-size chunks that overlap by a set number of characters, preserving context across chunk boundaries.
  • Sentence-based: Groups sentences together, preserving semantic meaning but creating variable-sized chunks.
  • Paragraph-based: Uses natural paragraph breaks to create chunks, maintaining complete thoughts.

The choice of chunking strategy significantly impacts retrieval quality. Semantic-preserving methods (sentence/paragraph) often perform better for complex queries.

The solar system consists of the Sun and everything that orbits around it. This includes eight planets: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Earth is our home planet, with vast oceans covering most of its surface. It has one natural satellite called the Moon. Mars, often called the Red Planet, has captured our imagination for centuries. Scientists believe Mars once had flowing water on its surface. Jupiter is the largest planet in our solar system, with a distinctive Great Red Spot and numerous moons.
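The strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a production splitter; the function names and the two-sentence sample are our own, and the sentence splitter uses a simple punctuation heuristic rather than a real sentence tokenizer:

```python
import re

SAMPLE = ("The solar system consists of the Sun and everything that orbits "
          "around it. This includes eight planets: Mercury, Venus, Earth, "
          "Mars, Jupiter, Saturn, Uranus, and Neptune.")

def fixed_size_chunks(text, size=50):
    """Fixed-size: cut every `size` characters, ignoring semantics."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sliding_window_chunks(text, size=50, overlap=10):
    """Sliding window: fixed-size chunks that share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def sentence_chunks(text, sentences_per_chunk=2):
    """Sentence-based: group whole sentences; chunk sizes vary."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]
```

Note how the sliding window advances by `size - overlap` characters, so consecutive chunks repeat the last `overlap` characters of the previous one.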

2. Vector Embedding

Each text chunk is converted into a high-dimensional vector (embedding) using a language model.

Real embeddings typically have 768-1536 dimensions, where each dimension represents abstract semantic features.

Below is a simplified representation showing only 10 dimensions per chunk. For educational purposes, we've labeled these dimensions with semantic categories, though in real embedding models, dimensions don't have human-interpretable labels.

Note: In this demo, words related to specific topics (like "Mars" or "red") influence certain dimensions more than others. For example, planet names have higher values in the "Celestial Bodies" dimension.
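A toy version of this keyword-driven embedding can be sketched as follows. The dimension labels and keyword weights below are invented for illustration; real embedding models learn dense, uninterpretable dimensions from data rather than using a lookup table:

```python
# Hypothetical labeled dimensions, mirroring the demo's 10-dim display.
DIMENSIONS = ["Celestial Bodies", "Water", "Color", "Size", "Exploration",
              "Atmosphere", "Life", "Distance", "Temperature", "History"]

# Toy keyword -> dimension weights (illustrative values only).
KEYWORDS = {
    "mars":    {"Celestial Bodies": 0.9, "Color": 0.4, "Exploration": 0.3},
    "red":     {"Color": 0.8},
    "water":   {"Water": 0.9},
    "jupiter": {"Celestial Bodies": 0.9, "Size": 0.7},
    "moon":    {"Celestial Bodies": 0.7},
}

def embed(text):
    """Map text to a 10-dimensional vector by summing keyword weights."""
    vec = [0.0] * len(DIMENSIONS)
    for word in text.lower().split():
        word = word.strip(".,:;!?")
        for dim, weight in KEYWORDS.get(word, {}).items():
            vec[DIMENSIONS.index(dim)] += weight
    return vec
```

With this scheme, "Mars is red" lights up both the "Celestial Bodies" and "Color" dimensions, matching the behavior described above.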

3. Vector Similarity Visualization

When a query is entered, it's also converted to an embedding vector.

The system calculates similarity between the query vector and all chunk vectors using cosine similarity.
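Cosine similarity is the dot product of the two vectors divided by the product of their lengths, giving a value between -1 and 1. A minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||); ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors score 0.0, and opposite directions score -1.0; magnitude does not matter, only direction.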

The visualization below shows how the query embedding overlaps with each chunk embedding:

  • Query column: Shows the query's embedding values across dimensions
  • Chunk column: Shows the chunk's embedding values across dimensions
  • Overlap column: Shows the product of query and chunk values for each dimension

Positive products (green) contribute positively to similarity, while negative products (red) reduce similarity. The width of the bar indicates the magnitude of contribution.
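The overlap column is just the element-wise product of the two vectors; summing those products gives the dot product in the numerator of cosine similarity. A one-function sketch (the function name is ours):

```python
def overlap_contributions(query_vec, chunk_vec):
    """Per-dimension products: positive values add to the dot product
    (green in the visualization), negative values subtract (red)."""
    return [q * c for q, c in zip(query_vec, chunk_vec)]
```

For example, a dimension where the query is positive but the chunk is negative yields a negative product and pulls the similarity score down.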

4. Retrieval Results

Chunks are ranked by similarity score, and the most relevant ones are retrieved.

These chunks provide the context needed to answer the query accurately.
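The ranking step can be sketched as scoring every chunk and keeping the top k. Function names here are illustrative, and the cosine helper is inlined to keep the sketch self-contained:

```python
import math

def _cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_vec, chunks, chunk_vecs, k=2):
    """Rank chunks by cosine similarity to the query; return the top k
    as (chunk_text, score) pairs, best first."""
    scored = [(_cos(query_vec, vec), text)
              for vec, text in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:k]]
```

Real systems use approximate nearest-neighbor indexes to avoid scoring every chunk, but the ranking principle is the same.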

Results:

5. LLM Prompt Construction

In a RAG system, the retrieved chunks are combined with the original query to create a prompt for the language model.

This prompt provides the LLM with the necessary context to generate an accurate and relevant response.
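Prompt assembly is usually simple string templating. The template wording below is one common pattern, not the only one; real systems vary in how they label context and instruct the model:

```python
def build_prompt(query, retrieved_chunks):
    """Combine retrieved chunks and the user query into a single prompt.
    The instruction wording is illustrative, not a standard."""
    context = "\n\n".join(f"[Chunk {i + 1}] {chunk}"
                          for i, chunk in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
```

Grounding the model in retrieved context like this is what lets RAG answer questions about material outside the model's training data.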

Generated LLM Prompt: