Devntra
Case Study: AI / Developer Tools

CodeAncestry

Ask your Git history anything. Why did this function change? Who refactored this module? CodeAncestry answers in seconds, searching by meaning rather than by keyword.

RAG Pipeline · Semantic Search · Git Intelligence · Snowflake Cortex
The Product

The Invisible History Problem

Every large repository carries thousands of commits, yet engineers spend hours running git log --grep and reading through diffs to understand why a decision was made. CodeAncestry was built to turn that entire history into a queryable semantic knowledge base: connect your GitHub repo, then ask questions as if you were talking to the engineer who wrote every line of code.

Hybrid Query Modes

CodeAncestry classifies each query as Temporal ("what changed last month"), Semantic ("why was authentication refactored"), or Hybrid, then automatically routes it to the matching retrieval strategy.
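The source does not say how queries are classified, so the following is only a minimal sketch of the routing idea, assuming a keyword heuristic. The names `QueryMode`, `classify_query`, and both cue lists are hypothetical; a production system might ask an LLM to classify the question instead.

```python
import re
from enum import Enum


class QueryMode(Enum):
    TEMPORAL = "temporal"
    SEMANTIC = "semantic"
    HYBRID = "hybrid"


# Hypothetical cue lists: words that suggest a time window...
TEMPORAL_CUES = re.compile(
    r"\b(last|past|since|before|after|between|yesterday|today|week|month|year|"
    r"recent(ly)?|\d{4}-\d{2}(-\d{2})?)\b",
    re.IGNORECASE,
)
# ...and words that suggest the user wants reasons or intent.
SEMANTIC_CUES = re.compile(
    r"\b(why|how|who|reason|refactor(ed)?|purpose|explain)\b",
    re.IGNORECASE,
)


def classify_query(question: str) -> QueryMode:
    """Route a question to a retrieval strategy based on simple keyword cues."""
    has_time = bool(TEMPORAL_CUES.search(question))
    has_meaning = bool(SEMANTIC_CUES.search(question))
    if has_time and has_meaning:
        return QueryMode.HYBRID
    if has_time:
        return QueryMode.TEMPORAL
    # Default to semantic search when no explicit time window is mentioned.
    return QueryMode.SEMANTIC
```

A question that names both a time window and a reason, such as "why was the login flow refactored last year", lands in Hybrid, which would let the retriever filter commits by date before ranking them by similarity.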

Contributor Analytics

Visual commit timelines with relevance scoring, contributor dashboards, and Recharts-powered analytics show not just what changed, but who shaped the codebase and when.
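One way the timeline and contributor views could be fed is sketched below, assuming each retrieved commit carries an author, a commit date, and a relevance score from vector search. The `Commit` dataclass and both functions are hypothetical. The timeline rows are flat dictionaries keyed by month, the shape a Recharts line or bar chart consumes as its `data` array (one `dataKey` per author).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Commit:
    sha: str
    author: str
    committed: date
    relevance: float  # hypothetical similarity score from retrieval, 0..1


def contributor_timeline(commits: list[Commit]) -> list[dict]:
    """Bucket commits by month and author into rows a chart can plot.

    Each row looks like {"month": "2024-03", "alice": 2, "bob": 1}.
    """
    buckets: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for c in commits:
        month = c.committed.strftime("%Y-%m")
        buckets[month][c.author] += 1
    # Sort chronologically; "YYYY-MM" strings sort in date order.
    return [{"month": m, **dict(counts)} for m, counts in sorted(buckets.items())]


def top_contributors(commits: list[Commit], k: int = 3) -> list[tuple[str, float]]:
    """Rank authors by the summed relevance of their matching commits."""
    scores: dict[str, float] = defaultdict(float)
    for c in commits:
        scores[c.author] += c.relevance
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Summing relevance rather than counting commits means an author with a few highly relevant commits can outrank one with many tangential ones, which fits questions like "who shaped this module".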

The Architecture

Snowflake-Powered RAG

Devntra designed a full RAG pipeline: Gemini AI analyzes and summarizes every commit, Snowflake Cortex generates 768-dimension vector embeddings, VECTOR_COSINE_SIMILARITY retrieves the commits most relevant to a natural-language question, and Mistral-7B generates the answer from them.
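The retrieve-then-generate step can be expressed as a single Snowflake SQL statement, sketched here as a Python helper that a FastAPI backend might call. `SNOWFLAKE.CORTEX.EMBED_TEXT_768`, `VECTOR_COSINE_SIMILARITY`, and `SNOWFLAKE.CORTEX.COMPLETE` are real Snowflake functions; the `commits` table, its `sha`/`author`/`summary`/`embedding` columns, the `snowflake-arctic-embed-m` model choice, and the prompt wording are assumptions. The question must be embedded with the same model used for the commits, or the similarity scores are meaningless.

```python
EMBED_MODEL = "snowflake-arctic-embed-m"  # assumption: any 768-dim Cortex model
LLM_MODEL = "mistral-7b"


def build_answer_query(question: str, k: int = 3) -> tuple[str, dict]:
    """Return SQL plus bind parameters for one retrieve-then-generate round trip.

    Assumes a hypothetical table commits(sha, author, summary, embedding VECTOR(FLOAT, 768)).
    """
    if not 1 <= k <= 50:
        raise ValueError("k must be between 1 and 50")
    # Raw f-string: '\n' reaches Snowflake as an escape inside its string literals,
    # and %(question)s is left for the driver to bind.
    sql = rf"""
WITH q AS (
    SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('{EMBED_MODEL}', %(question)s) AS v
),
top_commits AS (
    SELECT c.sha, c.author, c.summary,
           VECTOR_COSINE_SIMILARITY(c.embedding, q.v) AS score
    FROM commits c CROSS JOIN q
    ORDER BY score DESC
    LIMIT {int(k)}
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    '{LLM_MODEL}',
    'Answer using only these commits and cite their SHAs.\n'
    || LISTAGG(sha || ' (' || author || '): ' || summary, '\n')
           WITHIN GROUP (ORDER BY score DESC)
    || '\nQuestion: ' || %(question)s
) AS answer
FROM top_commits
"""
    return sql, {"question": question}
```

With the Snowflake Python connector, whose default paramstyle is `pyformat` and matches the `%(question)s` placeholders, the pair would be passed as `cursor.execute(sql, params)`. Keeping retrieval and generation in one statement avoids shipping commit text back to the application between steps.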

Snowflake Cortex Vector DB

All commits are embedded with EMBED_TEXT_768 and stored as vectors in Snowflake. Queries run VECTOR_COSINE_SIMILARITY at warehouse scale, and every answer cites the commits it was drawn from.
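`VECTOR_COSINE_SIMILARITY` scores two vectors by the cosine of the angle between them: their dot product divided by the product of their lengths, ranging from -1 to 1. The pure-Python reference below reproduces that formula for small local tests. It illustrates the math rather than Snowflake's implementation, and `top_k` with its SHA-keyed corpus is a hypothetical stand-in for the warehouse query.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vectors' L2 norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def top_k(
    query: Sequence[float], corpus: dict[str, Sequence[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k commit SHAs whose embeddings are most similar to the query."""
    scored = [(sha, cosine_similarity(query, vec)) for sha, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, a short commit summary and a long one are compared on direction alone, which is why it is a common default for text embeddings.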

Voice-First Interface

Ask questions through your microphone and hear the AI's answers spoken back, making repository exploration as natural as talking with a colleague who knows every commit by heart.

GitHub OAuth Integration

GitHub OAuth connects CodeAncestry directly to any repository. All secrets are managed through 1Password, and JWTs secure every API route.
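The source states that JWTs guard every API route but not which signing algorithm is used. Assuming HS256 with a shared secret (for example, one injected from 1Password at deploy time), the check each route performs is sketched below with only the standard library. `sign_hs256` and `verify_hs256` are hypothetical helpers; a real service would more likely rely on a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token: header.payload.signature, signed with HMAC-SHA256."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check the signature and expiry; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        # Pin the algorithm so a token claiming "none" is never accepted.
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if exp is not None and current >= exp:
        raise ValueError("token expired")
    return claims
```

In FastAPI, a check like this would typically live in a dependency declared with `Depends` on each protected route, so a missing or invalid token is rejected before the handler runs.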

Tech stack: React 18 · FastAPI · Snowflake · Gemini · TypeScript · Recharts
Impact & Results
Full RAG Pipeline

Query classification → embedding → vector similarity → LLM response, all powered by Snowflake Cortex.

3 Query Modes

Natural-language questions are automatically classified into Temporal, Semantic, or Hybrid search modes.

Voice Input & Output

Ask questions by voice and hear the AI-generated answers read aloud.

“I asked CodeAncestry why a critical API endpoint was refactored six months ago. It pulled up three relevant commits, explained the reasoning, and cited the exact engineers involved — in under three seconds. That used to take me an hour of digging through git blame.”
Engineering Manager, CodeAncestry Beta User

Ready to build your AI search product?

Let's discuss how our RAG pipeline expertise can make any dataset semantically queryable.
