1. Text Chunking
In RAG systems, documents are broken into smaller chunks to enable more precise retrieval. Common chunking strategies include:
- Fixed-size chunks: Character-based chunking with an adjustable size. Simple to implement, but may split semantic units mid-sentence or mid-word.
- Sliding window: Fixed-size chunks that overlap with their neighbors, which helps preserve context across chunk boundaries.
- Sentence-based: Groups sentences together, preserving semantic meaning but creating variable-sized chunks.
- Paragraph-based: Uses natural paragraph breaks to create chunks, maintaining complete thoughts.
The choice of chunking strategy significantly impacts retrieval quality. Semantic-preserving methods (sentence/paragraph) often perform better for complex queries.
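The sketch below illustrates these four strategies using only plain Python and the standard library. The function names, default sizes, and the regex-based sentence splitter are illustrative choices; a production system would more likely chunk by tokens or use a dedicated text-splitting library.

```python
import re

def fixed_size_chunks(text, size=50):
    """Fixed-size chunks: split every `size` characters, ignoring semantics."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sliding_window_chunks(text, size=50, overlap=10):
    """Sliding window: fixed-size chunks that share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def sentence_chunks(text, sentences_per_chunk=2):
    """Sentence-based: group whole sentences, so chunk sizes vary."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

def paragraph_chunks(text):
    """Paragraph-based: split on blank lines, keeping complete thoughts."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```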
2. Vector Embedding
Each text chunk is converted into a high-dimensional vector (embedding) using a language model.
Real embeddings typically have 768-1536 dimensions, where each dimension represents abstract semantic features.
Below is a simplified representation showing only 10 dimensions per chunk. For educational purposes, we've labeled these dimensions with semantic categories, though in real embedding models, dimensions don't have human-interpretable labels.
Note: In this demo, words related to specific topics (like "Mars" or "red") influence certain dimensions more than others. For example, planet names have higher values in the "Celestial Bodies" dimension.
Chunk | Dim 1 (Celestial Bodies) | Dim 2 (Planetary Features) | Dim 3 (Size/Scale) | Dim 4 (Position/Order) | Dim 5 (Composition) | Dim 6 (Surface Features) | Dim 7 (Satellites) | Dim 8 (Color/Appearance) | Dim 9 (Scientific Interest) | Dim 10 (Historical Significance) |
---|---|---|---|---|---|---|---|---|---|---|
The solar system consists of the Sun and everythin | 0.83 | 0.33 | 0.17 | 0.17 | 0.56 | 0.94 | 1.00 | 0.78 | 0.61 | 0.61 |
g that orbits around it. This includes eight plane | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 |
ts: Mercury, Venus, Earth, Mars, Jupiter, Saturn, | 1.00 | 1.00 | 0.73 | 0.47 | 0.38 | 0.20 | 0.07 | 0.12 | 0.25 | 0.34 |
Uranus, and Neptune. Earth is our home planet, wit | 0.13 | 1.00 | 0.54 | 0.79 | 0.63 | 0.46 | 0.42 | 0.25 | 0.08 | 0.08 |
h vast oceans covering most of its surface. It has | -0.69 | 0.37 | 0.16 | -0.05 | -0.26 | 1.00 | -0.69 | -0.90 | -0.48 | 0.79 |
one natural satellite called the Moon. Mars, ofte | 0.22 | 0.28 | 1.00 | 0.82 | 0.64 | 0.46 | 0.70 | 0.34 | 0.16 | 0.16 |
n called the Red Planet, has captured our imaginat | 0.23 | 0.72 | 0.23 | 0.79 | 0.65 | 0.51 | 0.51 | 1.00 | 0.79 | 0.65 |
ion for centuries. Scientists believe Mars once ha | 0.87 | 1.00 | 0.87 | 0.74 | 0.61 | 0.48 | 0.35 | 0.74 | 0.61 | 0.48 |
d flowing water on its surface. Jupiter is the lar | 0.40 | 0.91 | 0.79 | 0.66 | 0.53 | 1.00 | 0.28 | 0.28 | 0.79 | 0.91 |
gest planet in our solar system, with a distinctiv | 0.48 | 0.62 | 0.28 | 0.52 | 0.76 | 1.00 | 0.90 | 0.79 | 0.69 | 0.59 |
e Great Red Spot and numerous moons. | 0.39 | 0.47 | 0.54 | 0.47 | 0.39 | 0.31 | 0.39 | 1.00 | 0.85 | 0.77 |
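As a rough sketch of how real embeddings are produced, the snippet below assumes the sentence-transformers package and the all-MiniLM-L6-v2 model (384 dimensions); any embedding model with a similar interface would work. The chunk strings are the fixed-size chunks from the table above.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

chunks = [
    "The solar system consists of the Sun and everythin",
    "g that orbits around it. This includes eight plane",
    # ... remaining fixed-size chunks from the table above
]

# One vector per chunk. Each dimension is an abstract learned feature,
# not a human-interpretable category like the labels used in this demo.
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (num_chunks, 384)
```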
3. Vector Similarity Visualization
When a query is entered, it's also converted to an embedding vector.
The system calculates similarity between the query vector and all chunk vectors using cosine similarity.
The visualization below shows how the query embedding overlaps with each chunk embedding:
- Query column: Shows the query's embedding values across dimensions
- Chunk column: Shows the chunk's embedding values across dimensions
- Overlap column: Shows the product of query and chunk values for each dimension
Positive products (green) contribute positively to similarity, while negative products (red) reduce it. The width of each bar indicates the magnitude of that dimension's contribution. Summing the per-dimension products gives the dot product of the two vectors; dividing by the product of their lengths yields the cosine similarity.
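The sketch below reproduces this computation with NumPy: the per-dimension products correspond to the Overlap column, and their normalized sum is the cosine similarity. The chunk vector is taken from the table above; the query vector is illustrative.

```python
import numpy as np

def cosine_similarity(query_vec, chunk_vec):
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(chunk_vec, dtype=float)
    overlap = q * c  # per-dimension products shown in the Overlap column
    score = overlap.sum() / (np.linalg.norm(q) * np.linalg.norm(c))
    return score, overlap

# Chunk vector: the "n called the Red Planet..." row from the table above.
query = [0.9, 0.6, 0.1, 0.2, 0.3, 0.2, 0.1, 0.8, 0.4, 0.3]   # illustrative
chunk = [0.23, 0.72, 0.23, 0.79, 0.65, 0.51, 0.51, 1.00, 0.79, 0.65]
score, per_dim = cosine_similarity(query, chunk)
print(round(float(score), 3))
```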
4. Retrieval Results
Chunks are ranked by similarity score, and the most relevant ones are retrieved.
These chunks provide the context needed to answer the query accurately.
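A minimal sketch of this ranking step, assuming the query and chunk embeddings are NumPy arrays such as those produced in the embedding sketch above (the function name and the value of k are illustrative):

```python
import numpy as np

def retrieve_top_k(query_vec, chunk_vecs, chunk_texts, k=3):
    q = np.asarray(query_vec, dtype=float)
    scored = []
    for vec, text in zip(chunk_vecs, chunk_texts):
        v = np.asarray(vec, dtype=float)
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, text))
    # Rank by cosine similarity, highest first, and keep the k best chunks.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```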
5. LLM Prompt Construction
In a RAG system, the retrieved chunks are combined with the original query to create a prompt for the language model.
This prompt provides the LLM with the necessary context to generate an accurate and relevant response.
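A hedged sketch of this assembly step follows. The template wording is one common pattern rather than a prescribed format, and the example query and retrieved chunks are illustrative (the chunks are rows from the table above).

```python
def build_prompt(query_text, retrieved_chunks):
    """Combine retrieved chunks and the user query into an LLM prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query_text}\n"
        "Answer:"
    )

prompt = build_prompt(
    "Why is Mars called the Red Planet?",
    ["n called the Red Planet, has captured our imaginat",
     "ion for centuries. Scientists believe Mars once ha"],
)
print(prompt)
```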