CodeSense AI

AI-powered code analysis platform that indexes GitHub repositories and provides intelligent code summaries, commit analysis, and meeting transcription capabilities.

Project Overview

CodeSense AI is a Next.js application that helps development teams understand and search their codebases using AI. It connects to GitHub repositories, uses Google Gemini to summarize each source file and to generate semantic embeddings of those summaries, and answers natural language questions about code functionality. The platform also includes meeting transcription with issue extraction and team collaboration features.

Target Users: Development teams, code reviewers, and engineers who need to quickly understand large codebases or track changes across repositories.

Core Features

Code Analysis & Search

GitHub Repository Indexing: Complete codebase ingestion using LangChain's GithubRepoLoader
AI-Powered Summaries: Automatic file-level summaries generated by Google Gemini
Vector Embeddings: 768-dimensional embeddings stored in PostgreSQL with pgvector for semantic search
Natural Language Q&A: Query codebase with questions and get AI-generated answers with file references

Commit Analysis

Automated Commit Polling: Fetches recent commits from linked GitHub repositories
AI Diff Summarization: Generates four-bullet summaries of each commit's code changes using Gemini
Commit History Tracking: Stores commit metadata with AI-generated insights
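Because each poll returns recent commits again, only commits that are not yet stored need a Gemini summary. The sketch below shows that filtering step, assuming commits are identified by SHA; `PolledCommit` and `filterUnprocessedCommits` are hypothetical names, not taken from the codebase.

```typescript
// Hypothetical shape of a commit returned by a GitHub API poll.
interface PolledCommit {
  sha: string;
  message: string;
  authorName: string;
  date: string; // ISO 8601 timestamp
}

// Return only commits whose SHA is not already stored for the project,
// preserving the order in which GitHub returned them.
function filterUnprocessedCommits(
  polled: PolledCommit[],
  storedShas: Iterable<string>,
): PolledCommit[] {
  const seen = new Set(storedShas);
  return polled.filter((commit) => !seen.has(commit.sha));
}
```

The surviving commits would then be summarized and saved, so repeated polls never re-summarize the same change.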

Meeting Processing

Audio Transcription: Process meeting recordings using AssemblyAI
Issue Extraction: Automatically identifies and extracts action items from meetings
Chapter Summaries: Generates time-stamped summaries of meeting segments

Team Collaboration

Multi-User Projects: Many-to-many relationship between users and projects
Project Management: Create, archive, and manage multiple GitHub repositories
Team Invitations: Invite team members to collaborate on projects
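The many-to-many join and the soft delete used when archiving combine into one visibility rule: a user sees projects they are linked to that have no `deletedAt` timestamp. The sketch below expresses that rule over plain arrays; the field names (such as `githubUrl`) are assumptions, and the real app presumably applies the same filter in a database query instead.

```typescript
// Hypothetical in-memory mirrors of the Project and UserToProject models.
interface Project {
  id: string;
  name: string;
  githubUrl: string;
  deletedAt: Date | null; // set when a project is archived (soft delete)
}

interface UserToProject {
  userId: string;
  projectId: string;
}

// Projects a user can see: linked through the join table and not archived.
function visibleProjects(
  userId: string,
  projects: Project[],
  memberships: UserToProject[],
): Project[] {
  const memberOf = new Set(
    memberships.filter((m) => m.userId === userId).map((m) => m.projectId),
  );
  return projects.filter((p) => memberOf.has(p.id) && p.deletedAt === null);
}
```

In Prisma this would roughly correspond to a `findMany` with a `some` relation filter on the join table plus `deletedAt: null`; the relation field name depends on the schema.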

Tech Stack

Frontend

Next.js 15: App Router with React 19
TypeScript: Full type safety
Tailwind CSS: Utility-first styling
Radix UI: Accessible component primitives
React Query: Client-side data fetching, caching, and server-state management

Backend

Node.js: Server runtime
TRPC: Type-safe API layer
Prisma: Database ORM with PostgreSQL
Clerk: Authentication and user management

Database & Storage

PostgreSQL: Primary database with pgvector extension
AWS S3: File storage for audio recordings
Vector Search: Semantic similarity using pgvector

AI & External Services

Google Gemini: Code summarization and embeddings (gemini-2.5-flash, gemini-embedding-001)
AssemblyAI: Audio transcription and meeting analysis
GitHub API: Repository access and commit polling
LangChain: Document processing and repository loading

Development Tools

ESLint: Code linting
Prettier: Code formatting
TypeScript: Static type checking
Prisma Studio: Database management UI

Architecture Overview

Request Flow

Authentication: Clerk middleware protects routes and provides user context
API Layer: TRPC handles type-safe client-server communication
Business Logic: Server procedures coordinate database operations and external API calls
Data Layer: Prisma ORM manages PostgreSQL operations with raw SQL for vectors
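Prisma has no native vector type, which is why embeddings go through raw SQL. A minimal sketch of preparing an embedding for such a query, assuming pgvector's text input format (`[x1,x2,...]`); the table and column names in the SQL string are hypothetical placeholders for whatever `prisma/schema.prisma` defines.

```typescript
const EMBEDDING_DIMENSIONS = 768;

// Serialize an embedding into pgvector's text input format, e.g. "[0.1,0.2,0.3]".
// The literal is passed as a query parameter and cast with ::vector in SQL.
function toPgVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(
      `Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`,
    );
  }
  if (!embedding.every(Number.isFinite)) {
    throw new Error("Embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}

// Hypothetical table and column names; $1 is the literal, $2 the row id.
const UPDATE_EMBEDDING_SQL =
  'UPDATE "SourceCodeEmbedding" SET "summaryEmbedding" = $1::vector WHERE "id" = $2';
```

With Prisma, the literal would typically be interpolated into a `$executeRaw` tagged template followed by `::vector`, so it is still sent as a bound parameter rather than concatenated into the SQL text.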

Data Flow

Repository Indexing: GitHub → LangChain Loader → Gemini Summarization → Embedding Generation → PostgreSQL
Query Processing: User Question → Vector Similarity Search → Relevant Files → AI Response Generation
Meeting Processing: Audio Upload → S3 Storage → AssemblyAI Transcription → Issue Extraction → Database Storage

Key Services Interaction

AIService (google.services.ts): Centralized AI operations for summarization and embeddings
GithubLoader (github-loader.ts): Repository ingestion and processing pipeline
Assembly Service (assembly.ts): Audio processing and transcription
TRPC Routers: API endpoints organized by domain (projects, commits, questions)

Project Structure

codesense-ai/
├── prisma/
│   ├── schema.prisma             # Database schema with pgvector support
│   └── migrations/               # Database migration files
├── src/
│   ├── app/                      # Next.js App Router pages
│   │   ├── (protected)/          # Authenticated routes
│   │   │   ├── dashboard/        # Main project dashboard
│   │   │   ├── qa/               # Saved questions interface
│   │   │   └── meetings/         # Meeting management
│   │   ├── api/                  # TRPC API routes
│   │   └── layout.tsx            # Root layout with providers
│   ├── components/
│   │   ├── ui/                   # Radix UI component library
│   │   └── createProjectModal.tsx
│   ├── config/
│   │   ├── aiclient.config.ts    # AI service initialization
│   │   └── google.config.ts      # Google API configuration
│   ├── lib/
│   │   ├── gemini.ts             # AI service wrappers
│   │   ├── github-loader.ts      # Repository indexing
│   │   ├── github.ts             # GitHub API integration
│   │   ├── assembly.ts           # Audio processing
│   │   └── s3Client.ts           # AWS S3 integration
│   ├── server/
│   │   ├── api/
│   │   │   ├── root.ts           # Main TRPC router
│   │   │   └── routers/
│   │   │       └── project.ts    # Project-related procedures
│   │   └── db.ts                 # Prisma client
│   ├── services/
│   │   └── google.services.ts    # AI service implementation
│   ├── trpc/                     # TRPC configuration
│   └── types/
│       └── types.ts              # TypeScript type definitions
├── .env.example                  # Environment variables template
├── docker-compose.yml            # PostgreSQL development setup
└── start-database.sh             # Database initialization script

Setup Instructions

Prerequisites

Node.js 20.17+ or 22.9+ (LTS recommended; npm 11 no longer supports Node.js 18)
npm 11.6.1+
PostgreSQL with pgvector extension
Google Generative AI API key
AssemblyAI API key (meeting transcription)
AWS S3 bucket and credentials (meeting audio storage)
Clerk authentication keys
(Optional) GitHub personal access token for private repos

Local Development Setup

Clone and Install

git clone <repository-url>
cd codesense-ai
npm install

Environment Configuration

cp .env.example .env

Configure these variables in .env:

# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/codesense-ai"

# AI Services
GOOGLE_GENERATIVE_AI_API_KEY="your-gemini-api-key"
ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="your-clerk-publishable-key"
CLERK_SECRET_KEY="your-clerk-secret-key"
NEXT_PUBLIC_CLERK_SIGN_IN_URL="/sign-in"
NEXT_PUBLIC_CLERK_SIGN_IN_FALLBACK_REDIRECT_URL="/dashboard"
NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL="/dashboard"

# Optional
GITHUB_TOKEN="your-github-token-for-private-repos"

Database Setup

# Start PostgreSQL (using Docker Compose)
docker-compose up -d

# Apply migrations and generate client
npm run db:generate

# (Alternative) Push schema directly
npm run db:push

Start Development Server

npm run dev

Visit http://localhost:3000

Common Setup Issues

pgvector Extension: Ensure PostgreSQL has the vector extension enabled: CREATE EXTENSION IF NOT EXISTS vector;
API Keys: Verify all required API keys are properly set in environment variables
Database Connection: Check DATABASE_URL format and PostgreSQL service status
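A missing or placeholder key usually surfaces later as an opaque API error. A small startup check like the one below, a hypothetical helper rather than part of the codebase, fails fast instead; it treats the `your-...` placeholder values from the configuration example as unset and leaves `GITHUB_TOKEN` optional.

```typescript
// Variables the setup section lists as required; GITHUB_TOKEN is optional.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

// Return required variables that are missing, blank, or still set to a
// "your-..." placeholder copied from .env.example.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value.startsWith("your-");
  });
}
```

For example, calling `findMissingEnvVars(process.env)` during startup and exiting when the result is non-empty turns a vague runtime failure into a clear list of what to fix.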

Key Workflows

Creating a Project

User clicks "Create Project" in dashboard
Modal captures project name and GitHub URL
TRPC mutation project.createProject creates database record
Follow-up processing, which currently runs synchronously and blocks the API response:
- indexGithubRepo loads repository files
- Each file processed: AI summary → embedding generation → storage
- pollCommits fetches recent commit history
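GitHub API calls for indexing and commit polling need the repository's owner and name rather than the raw URL entered in the modal. A sketch of that parsing step, assuming only `https://github.com/<owner>/<repo>` URLs are accepted; `parseGithubUrl` and `RepoRef` are hypothetical names.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

// Parse https://github.com/owner/repo (optional trailing ".git" or "/")
// into owner/repo; returns null for any other shape.
function parseGithubUrl(input: string): RepoRef | null {
  const match = /^https?:\/\/(?:www\.)?github\.com\/([\w.-]+)\/([\w.-]+?)(?:\.git)?\/?$/i.exec(
    input.trim(),
  );
  if (!match) return null;
  const owner = match[1];
  const repo = match[2];
  if (!owner || !repo) return null;
  return { owner, repo };
}
```

URLs that point inside a repository (such as `/tree/main`) are rejected rather than guessed at, so only a clean repository URL is stored with the project.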

Repository Indexing Process

File Loading: LangChain GithubRepoLoader fetches repository contents
Content Processing: Each file truncated to 10,000 characters for context limits
AI Summarization: Gemini generates purpose-focused summaries
Embedding Generation: 768-dimensional vectors created from summaries
Storage: Raw SQL inserts vectors into PostgreSQL pgvector columns
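The indexing steps above map onto a simple per-file loop. This sketch injects the Gemini calls as functions so it runs without network access; the `LoadedFile` shape mirrors LangChain's `Document` (`pageContent` plus `metadata.source`, assuming the loader stores the file path there), and the names `indexFiles` and `AiFns` are hypothetical. The storage step is left out.

```typescript
const MAX_FILE_CHARS = 10_000;

// Minimal stand-in for a LangChain Document produced by the repo loader.
interface LoadedFile {
  pageContent: string;
  metadata: { source: string };
}

interface IndexedFile {
  fileName: string;
  summary: string;
  embedding: number[];
}

// In the app these would wrap Gemini summarization and embedding calls.
interface AiFns {
  summarize: (fileName: string, code: string) => Promise<string>;
  embed: (text: string) => Promise<number[]>;
}

// Process files one at a time: truncate, summarize, then embed the summary.
async function indexFiles(files: LoadedFile[], ai: AiFns): Promise<IndexedFile[]> {
  const results: IndexedFile[] = [];
  for (const file of files) {
    const code = file.pageContent.slice(0, MAX_FILE_CHARS);
    const summary = await ai.summarize(file.metadata.source, code);
    const embedding = await ai.embed(summary);
    results.push({ fileName: file.metadata.source, summary, embedding });
  }
  return results;
}
```

Processing files one at a time keeps request rates predictable, but total indexing time grows linearly with the number of files, which is the blocking behavior listed under Scalability Concerns.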

Querying Codebase

User submits question via Q&A interface
System performs vector similarity search on stored embeddings
Top relevant files retrieved based on cosine similarity
Gemini generates contextual answer using retrieved code
Response saved with file references for future access
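The similarity search runs inside PostgreSQL, where pgvector's `<=>` operator returns cosine distance (so similarity is `1 - distance`). The in-memory version below shows the same ranking logic for clarity; the `k` and `minScore` defaults are illustrative rather than the app's actual settings, and the SQL in the comment uses hypothetical table and column names.

```typescript
interface EmbeddedFile {
  fileName: string;
  embedding: number[];
}

interface ScoredFile {
  fileName: string;
  similarity: number;
}

// Cosine similarity = dot(a, b) / (|a| * |b|); returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0; // lengths match, so ?? only satisfies strict indexing
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory equivalent of a pgvector query such as (names hypothetical):
//   SELECT "fileName", 1 - ("summaryEmbedding" <=> $1::vector) AS similarity
//   FROM "SourceCodeEmbedding"
//   WHERE "projectId" = $2 AND 1 - ("summaryEmbedding" <=> $1::vector) > 0.5
//   ORDER BY similarity DESC LIMIT 10;
function topKSimilar(
  query: number[],
  files: EmbeddedFile[],
  k = 10,
  minScore = 0.5,
): ScoredFile[] {
  return files
    .map((f) => ({ fileName: f.fileName, similarity: cosineSimilarity(query, f.embedding) }))
    .filter((r) => r.similarity > minScore)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```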

Meeting Processing

User uploads audio file through dashboard uploader
File stored in AWS S3, meeting record created with PROCESSING status
AssemblyAI transcribes audio and generates chapter summaries
Issues and action items extracted from transcript
Meeting status updated to COMPLETED, results displayed
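The workflow does not spell out how issues are derived from the transcript. One plausible approach, assumed here, maps each AssemblyAI auto-chapter (which reports `start` and `end` in milliseconds along with `gist`, `headline`, and `summary`) to an issue with readable timestamps; `ExtractedIssue` and `chaptersToIssues` are hypothetical, not the actual Issue model.

```typescript
// Subset of an AssemblyAI auto-chapter; start/end are milliseconds.
interface Chapter {
  start: number;
  end: number;
  gist: string;
  headline: string;
  summary: string;
}

// Hypothetical issue shape with human-readable "mm:ss" timestamps.
interface ExtractedIssue {
  start: string;
  end: string;
  gist: string;
  headline: string;
  summary: string;
}

// Format milliseconds as zero-padded mm:ss (minutes may exceed 59).
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
}

function chaptersToIssues(chapters: Chapter[]): ExtractedIssue[] {
  return chapters.map((c) => ({
    start: formatTimestamp(c.start),
    end: formatTimestamp(c.end),
    gist: c.gist,
    headline: c.headline,
    summary: c.summary,
  }));
}
```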

API Overview

TRPC Procedures (`src/server/api/routers/project.ts`)

Mutations

createProject: Create new project and trigger repository indexing
archiveProject: Soft-delete project (sets deletedAt timestamp)
saveQuestion: Store Q&A pairs with file references
createMeeting: Initialize meeting record for audio processing

Queries

getProjects: Fetch user's projects with commit polling
getProjectById: Retrieve single project with all related data
getCommits: Get commit history for a project
getQuestions: Retrieve saved Q&A pairs
getMeetings: Fetch meeting records and processing status

Database Models

User: Clerk integration with credit system (150 free credits)
Project: GitHub repository linking with soft deletion
UserToProject: Many-to-many team collaboration
SourceCodeEmbedding: Vector embeddings with pgvector (768 dimensions)
Commit: GitHub commit data with AI summaries
Question: Q&A pairs with JSON file references
Meeting: Audio meeting records with processing status
Issue: Extracted meeting action items
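Because Question stores file references as JSON, Prisma returns them as untyped JSON values. A runtime guard such as the one below restores type safety when reading them back; the `FileReference` field names are assumptions, since the exact JSON shape is not documented here.

```typescript
// Hypothetical element shape for a Question's JSON file references.
interface FileReference {
  fileName: string;
  sourceCode: string;
  summary: string;
}

const FILE_REFERENCE_FIELDS = ["fileName", "sourceCode", "summary"] as const;

// True if the value is an object whose expected fields are all strings.
function isFileReference(value: unknown): value is FileReference {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return FILE_REFERENCE_FIELDS.every((field) => typeof record[field] === "string");
}

// Validate a JSON column value before treating it as FileReference[].
function isFileReferenceArray(value: unknown): value is FileReference[] {
  return Array.isArray(value) && value.every(isFileReference);
}
```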

Limitations / Known Gaps

Technical Limitations

No CI/CD Pipeline: No automated testing or deployment workflows
Missing Test Suite: No unit or integration tests implemented
Single AI Provider: Only Google Gemini supported (no fallback options)
File Size Limits: Repository files truncated at 10,000 characters
Database Constraints: No connection pooling configuration for scalability

Feature Gaps

No Real-time Updates: Commit polling is manual, not webhook-based
Limited Search: Only semantic search, no exact text search
No Code Highlighting: Code references displayed without syntax highlighting
Missing Export: No way to export summaries or analysis results
No Analytics: No usage metrics or insights dashboard

Scalability Concerns

Synchronous Processing: Repository indexing blocks API responses
Memory Usage: Large repositories may cause memory pressure
Rate Limiting: No protection against API rate limits
Database Performance: No query optimization for large datasets

Future Improvements

Immediate Priorities

Test Suite: Implement unit tests for core services and integration tests for API endpoints
Error Handling: Add comprehensive error boundaries and retry logic
Performance: Implement background job processing for repository indexing
Monitoring: Add logging and error tracking (e.g., Sentry)
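Retry logic is listed as a priority but not yet present. A minimal sketch of exponential backoff that could wrap Gemini, AssemblyAI, or GitHub calls; `withRetry` and its defaults are hypothetical, and a production version would likely retry only transient failures (for example HTTP 429 or 5xx responses) and add random jitter.

```typescript
// Retry an async operation, waiting baseDelayMs, 2x, 4x, ... between attempts.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```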

Feature Enhancements

Webhook Integration: Real-time GitHub webhook processing instead of polling
Multi-Provider AI: Support for OpenAI, Anthropic, and other AI providers
Advanced Search: Combine semantic and exact text search with filtering
Code Visualization: Interactive code graphs and dependency analysis
Team Analytics: Usage metrics, collaboration insights, and activity tracking

Infrastructure Improvements

CI/CD Pipeline: GitHub Actions for automated testing and deployment
Container Orchestration: Kubernetes for production (Docker Compose is already used for the local PostgreSQL setup)
Database Optimization: Connection pooling, query optimization, and caching layers
CDN Integration: Asset delivery and file processing optimization
Security: Input validation, rate limiting, and security headers

Development Commands

# Development
npm run dev              # Start development server with Turbopack
npm run build            # Build for production
npm run start            # Start production server
npm run preview          # Build and preview production

# Database
npm run db:generate      # Create migrations and generate client
npm run db:migrate       # Deploy migrations to database
npm run db:push          # Push schema changes directly
npm run db:studio        # Open Prisma Studio

# Code Quality
npm run lint             # Run ESLint
npm run lint:fix         # Fix ESLint issues
npm run typecheck        # Run TypeScript type checking
npm run format:check     # Check Prettier formatting
npm run format:write     # Apply Prettier formatting
npm run check            # Run lint + typecheck

License

No license specified. Contact repository maintainers for usage permissions.
