kartikey2004-git/CodeSense-AI

CodeSense AI

AI-powered code analysis platform that indexes GitHub repositories and provides intelligent code summaries, commit analysis, and meeting transcription capabilities.

Project Overview

CodeSense AI is a Next.js application that helps development teams understand and search their codebases using AI. It connects to GitHub repositories, generates semantic embeddings of source code using Google Gemini, and enables natural language queries about code functionality. The platform also includes meeting transcription with issue extraction and team collaboration features.

Target Users: Development teams, code reviewers, and engineers who need to quickly understand large codebases or track changes across repositories.

Core Features

Code Analysis & Search

  • GitHub Repository Indexing: Complete codebase ingestion using LangChain's GithubRepoLoader
  • AI-Powered Summaries: Automatic file-level summaries generated by Google Gemini
  • Vector Embeddings: 768-dimensional embeddings stored in PostgreSQL with pgvector for semantic search
  • Natural Language Q&A: Query codebase with questions and get AI-generated answers with file references

Commit Analysis

  • Automated Commit Polling: Fetches recent commits from linked GitHub repositories
  • AI Diff Summarization: Generates four-bullet summaries of code changes using Gemini
  • Commit History Tracking: Stores commit metadata with AI-generated insights
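
As a rough illustration of the polling step, the sketch below keeps only commits that have not been stored yet, so each diff is summarized once. The `PolledCommit` shape and the `selectUnprocessedCommits` helper are hypothetical names for this example and are not taken from the repository.

```typescript
// Hypothetical shape for a polled commit; field names are illustrative only.
interface PolledCommit {
  commitHash: string;
  commitMessage: string;
  commitDate: string; // ISO 8601 timestamp
}

/**
 * Return the fetched commits whose hashes are not already stored,
 * ordered newest first. The input array is not mutated.
 */
function selectUnprocessedCommits(
  fetched: PolledCommit[],
  storedHashes: string[],
): PolledCommit[] {
  const seen = new Set(storedHashes);
  return fetched
    .filter((commit) => !seen.has(commit.commitHash))
    .sort((a, b) => Date.parse(b.commitDate) - Date.parse(a.commitDate));
}
```

Only the commits returned here would need a diff fetch and a Gemini summary, which keeps repeated polls cheap.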

Meeting Processing

  • Audio Transcription: Process meeting recordings using AssemblyAI
  • Issue Extraction: Automatically identifies and extracts action items from meetings
  • Chapter Summaries: Generates time-stamped summaries of meeting segments

Team Collaboration

  • Multi-User Projects: Many-to-many relationship between users and projects
  • Project Management: Create, archive, and manage multiple GitHub repositories
  • Team Invitations: Invite team members to collaborate on projects

Tech Stack

Frontend

  • Next.js 15: App Router with React 19
  • TypeScript: Full type safety
  • Tailwind CSS: Utility-first styling
  • Radix UI: Accessible component primitives
  • React Query: Server-state fetching and caching on the client

Backend

  • Node.js: Server runtime
  • TRPC: Type-safe API layer
  • Prisma: Database ORM with PostgreSQL
  • Clerk: Authentication and user management

Database & Storage

  • PostgreSQL: Primary database with pgvector extension
  • AWS S3: File storage for audio recordings
  • Vector Search: Semantic similarity using pgvector

AI & External Services

  • Google Gemini: Code summarization and embeddings (gemini-2.5-flash, gemini-embedding-001)
  • AssemblyAI: Audio transcription and meeting analysis
  • GitHub API: Repository access and commit polling
  • LangChain: Document processing and repository loading

Development Tools

  • ESLint: Code linting
  • Prettier: Code formatting
  • TypeScript: Static type checking
  • Prisma Studio: Database management UI

Architecture Overview

Request Flow

  1. Authentication: Clerk middleware protects routes and provides user context
  2. API Layer: TRPC handles type-safe client-server communication
  3. Business Logic: Server procedures coordinate database operations and external API calls
  4. Data Layer: Prisma ORM manages PostgreSQL operations with raw SQL for vectors

Data Flow

  1. Repository Indexing: GitHub → LangChain Loader → Gemini Summarization → Embedding Generation → PostgreSQL
  2. Query Processing: User Question → Vector Similarity Search → Relevant Files → AI Response Generation
  3. Meeting Processing: Audio Upload → S3 Storage → AssemblyAI Transcription → Issue Extraction → Database Storage

Key Services Interaction

  • AIService (google.services.ts): Centralized AI operations for summarization and embeddings
  • GithubLoader (github-loader.ts): Repository ingestion and processing pipeline
  • Assembly Service (assembly.ts): Audio processing and transcription
  • TRPC Routers: API endpoints organized by domain (projects, commits, questions)

Project Structure

codesense-ai/
├── prisma/
│   ├── schema.prisma          # Database schema with pgvector support
│   └── migrations/            # Database migration files
├── src/
│   ├── app/                   # Next.js App Router pages
│   │   ├── (protected)/       # Authenticated routes
│   │   │   ├── dashboard/     # Main project dashboard
│   │   │   ├── qa/           # Saved questions interface
│   │   │   └── meetings/      # Meeting management
│   │   ├── api/              # TRPC API routes
│   │   └── layout.tsx        # Root layout with providers
│   ├── components/
│   │   ├── ui/               # Radix UI component library
│   │   └── createProjectModal.tsx
│   ├── config/
│   │   ├── aiclient.config.ts    # AI service initialization
│   │   └── google.config.ts      # Google API configuration
│   ├── lib/
│   │   ├── gemini.ts             # AI service wrappers
│   │   ├── github-loader.ts      # Repository indexing
│   │   ├── github.ts             # GitHub API integration
│   │   ├── assembly.ts           # Audio processing
│   │   └── s3Client.ts           # AWS S3 integration
│   ├── server/
│   │   ├── api/
│   │   │   ├── root.ts           # Main TRPC router
│   │   │   └── routers/
│   │   │       └── project.ts    # Project-related procedures
│   │   └── db.ts                 # Prisma client
│   ├── services/
│   │   └── google.services.ts   # AI service implementation
│   ├── trpc/                    # TRPC configuration
│   └── types/
│       └── types.ts             # TypeScript type definitions
├── .env.example                 # Environment variables template
├── docker-compose.yml           # PostgreSQL development setup
└── start-database.sh           # Database initialization script

Setup Instructions

Prerequisites

  • Node.js 20.17+ (LTS recommended; npm 11 does not run on Node.js 18)
  • npm 11.6.1+
  • PostgreSQL with pgvector extension
  • Google Generative AI API key
  • Clerk authentication keys
  • (Optional) GitHub personal access token for private repos

Local Development Setup

  1. Clone and Install
git clone <repository-url>
cd codesense-ai
npm install
  2. Environment Configuration
cp .env.example .env

Configure these variables in .env:

# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/codesense-ai"

# AI Services
GOOGLE_GENERATIVE_AI_API_KEY="your-gemini-api-key"
ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="your-clerk-publishable-key"
CLERK_SECRET_KEY="your-clerk-secret-key"
NEXT_PUBLIC_CLERK_SIGN_IN_URL="/sign-in"
NEXT_PUBLIC_CLERK_SIGN_IN_FALLBACK_REDIRECT_URL="/dashboard"
NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL="/dashboard"

# Optional
GITHUB_TOKEN="your-github-token-for-private-repos"
  3. Database Setup
# Start PostgreSQL (using Docker Compose)
docker-compose up -d

# Create and apply migrations, then generate the Prisma client
npm run db:generate

# (Alternative) Push schema directly
npm run db:push
  4. Start Development Server
npm run dev

Visit http://localhost:3000

Common Setup Issues

  • pgvector Extension: Ensure PostgreSQL has the vector extension enabled: CREATE EXTENSION IF NOT EXISTS vector;
  • API Keys: Verify all required API keys are properly set in environment variables
  • Database Connection: Check DATABASE_URL format and PostgreSQL service status
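
Most of the issues above can be caught before the server starts. The following is a minimal sketch of such a check. The required-variable list is an assumption drawn from the non-optional entries in the `.env` listing, and `findMissingEnvVars` and `isPostgresUrl` are hypothetical helpers rather than part of the project.

```typescript
// Assumed to be required, based on the non-optional entries in the .env listing.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

/** Return the names of required variables that are unset or blank. */
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

/** Loose format check: scheme must be postgres:// or postgresql:// with no whitespace. */
function isPostgresUrl(value: string): boolean {
  return /^postgres(ql)?:\/\/\S+$/.test(value);
}
```

In a Node.js entry point this could be called as `findMissingEnvVars(process.env)`, failing fast with the list of missing names.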

Key Workflows

Creating a Project

  1. User clicks "Create Project" in dashboard
  2. Modal captures project name and GitHub URL
  3. TRPC mutation project.createProject creates database record
  4. Indexing steps triggered by the mutation (these currently run synchronously; see Scalability Concerns):
    • indexGithubRepo loads repository files
    • Each file processed: AI summary → embedding generation → storage
    • pollCommits fetches recent commit history
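
Before a project record is created, the GitHub URL captured by the modal has to be turned into an owner and repository name. The sketch below shows one way to validate and split it. `parseGithubUrl` is a hypothetical helper and the accepted URL forms are an assumption, not the project's actual validation.

```typescript
interface GithubRepoRef {
  owner: string;
  repo: string;
}

/**
 * Parse URLs such as https://github.com/<owner>/<repo>, with an optional
 * ".git" suffix or trailing slash. Returns null for anything else.
 */
function parseGithubUrl(url: string): GithubRepoRef | null {
  const match = /^https?:\/\/github\.com\/([^\/\s]+)\/([^\/\s#?]+?)(?:\.git)?\/?$/.exec(
    url.trim(),
  );
  if (!match) return null;
  // Both capture groups are mandatory in the pattern, so they are defined here.
  return { owner: match[1]!, repo: match[2]! };
}
```

Rejecting malformed URLs at this point avoids creating a project record that indexing can never populate.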

Repository Indexing Process

  1. File Loading: LangChain GithubRepoLoader fetches repository contents
  2. Content Processing: Each file truncated to 10,000 characters for context limits
  3. AI Summarization: Gemini generates purpose-focused summaries
  4. Embedding Generation: 768-dimensional vectors created from summaries
  5. Storage: Raw SQL inserts vectors into PostgreSQL pgvector columns
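
Steps 2 and 5 can be sketched as two small pure functions. The helper names `truncateForSummary` and `toVectorLiteral` are hypothetical. The 10,000-character limit and the 768 dimensions come from the steps above, and the bracketed text form is the literal syntax pgvector accepts for a vector value.

```typescript
// Hypothetical helpers; names are illustrative, not taken from the repository.
const MAX_FILE_CHARS = 10_000; // truncation limit from step 2
const EMBEDDING_DIMS = 768; // embedding size from step 4

/** Truncate file content so the summarization prompt stays within context limits. */
function truncateForSummary(content: string, limit: number = MAX_FILE_CHARS): string {
  return content.length > limit ? content.slice(0, limit) : content;
}

/** Serialize an embedding into pgvector's text form, e.g. "[0.1,0.2,0.3]". */
function toVectorLiteral(embedding: number[], dims: number = EMBEDDING_DIMS): string {
  if (embedding.length !== dims) {
    throw new Error(`Expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

The resulting string can be passed as a parameter to a raw SQL statement and cast with `::vector`, which is why this step bypasses the ORM's typed API.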

Querying Codebase

  1. User submits question via Q&A interface
  2. System performs vector similarity search on stored embeddings
  3. Top relevant files retrieved based on cosine similarity
  4. Gemini generates contextual answer using retrieved code
  5. Response saved with file references for future access
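
To make steps 2 and 3 concrete, here is an in-memory sketch of ranking files by cosine similarity. In the application this ranking would happen inside PostgreSQL through pgvector, so the code below only illustrates the math. `EmbeddedFile`, `cosineSimilarity`, and `topKSimilar` are hypothetical names.

```typescript
interface EmbeddedFile {
  fileName: string;
  embedding: number[];
}

/** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!; // index is in range for both arrays (lengths checked above)
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Return the k files most similar to the query embedding, best match first. */
function topKSimilar(query: number[], files: EmbeddedFile[], k: number): EmbeddedFile[] {
  return files
    .map((file) => ({ file, score: cosineSimilarity(query, file.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.file);
}
```

pgvector exposes cosine distance through its `<=>` operator, and similarity is one minus that distance, so ordering by ascending distance in SQL matches the descending-similarity order used here.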

Meeting Processing

  1. User uploads audio file through dashboard uploader
  2. File stored in AWS S3, meeting record created with PROCESSING status
  3. AssemblyAI transcribes audio and generates chapter summaries
  4. Issues and action items extracted from transcript
  5. Meeting status updated to COMPLETED, results displayed
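
Steps 3 and 4 can be pictured as mapping transcript chapters to time-stamped issue records. In the sketch below, the `Chapter` fields (`start` and `end` in milliseconds, plus `gist`, `headline`, `summary`) are modeled on AssemblyAI's auto-chapters output and should be treated as an assumption here. `IssueDraft`, `formatTimestamp`, and `chaptersToIssues` are hypothetical names.

```typescript
// Assumed chapter shape; start and end are offsets in milliseconds.
interface Chapter {
  start: number;
  end: number;
  gist: string;
  headline: string;
  summary: string;
}

// Hypothetical record shape for an extracted issue.
interface IssueDraft {
  start: string; // "mm:ss"
  end: string; // "mm:ss"
  gist: string;
  headline: string;
  summary: string;
}

/** Format a millisecond offset as "mm:ss" (minutes are not wrapped into hours). */
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const pad = (value: number): string => (value < 10 ? `0${value}` : `${value}`);
  return `${pad(Math.floor(totalSeconds / 60))}:${pad(totalSeconds % 60)}`;
}

/** Convert transcript chapters into issue drafts with readable timestamps. */
function chaptersToIssues(chapters: Chapter[]): IssueDraft[] {
  return chapters.map((chapter) => ({
    start: formatTimestamp(chapter.start),
    end: formatTimestamp(chapter.end),
    gist: chapter.gist,
    headline: chapter.headline,
    summary: chapter.summary,
  }));
}
```

Once the drafts are stored, the meeting record can move from PROCESSING to COMPLETED as described in step 5.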

API Overview

TRPC Procedures (src/server/api/routers/project.ts)

Mutations

  • createProject: Create new project and trigger repository indexing
  • archiveProject: Soft-delete project (sets deletedAt timestamp)
  • saveQuestion: Store Q&A pairs with file references
  • createMeeting: Initialize meeting record for audio processing
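
The soft-delete behavior of archiveProject can be illustrated without a database. The sketch below assumes a nullable `deletedAt` field, as the description suggests. `ProjectRecord`, `archive`, and `activeProjects` are hypothetical names for this example.

```typescript
interface ProjectRecord {
  id: string;
  name: string;
  deletedAt: Date | null; // null means the project is active
}

/** Mark a project as archived without removing it; already archived projects keep their timestamp. */
function archive(project: ProjectRecord, now: Date = new Date()): ProjectRecord {
  return { ...project, deletedAt: project.deletedAt ?? now };
}

/** Filter out archived projects, as a listing query would. */
function activeProjects(projects: ProjectRecord[]): ProjectRecord[] {
  return projects.filter((project) => project.deletedAt === null);
}
```

Because rows are never removed, related commits, questions, and meetings stay intact, and every listing query needs the equivalent of the `deletedAt === null` filter.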

Queries

  • getProjects: Fetch user's projects with commit polling
  • getProjectById: Retrieve single project with all related data
  • getCommits: Get commit history for a project
  • getQuestions: Retrieve saved Q&A pairs
  • getMeetings: Fetch meeting records and processing status

Database Models

  • User: Clerk integration with credit system (150 free credits)
  • Project: GitHub repository linking with soft deletion
  • UserToProject: Many-to-many team collaboration
  • SourceCodeEmbedding: Vector embeddings with pgvector (768 dimensions)
  • Commit: GitHub commit data with AI summaries
  • Question: Q&A pairs with JSON file references
  • Meeting: Audio meeting records with processing status
  • Issue: Extracted meeting action items

Limitations / Known Gaps

Technical Limitations

  • No CI/CD Pipeline: No automated testing or deployment workflows
  • Missing Test Suite: No unit or integration tests implemented
  • Single AI Provider: Only Google Gemini supported (no fallback options)
  • File Size Limits: Repository files truncated at 10,000 characters
  • Database Constraints: No connection pooling configuration for scalability

Feature Gaps

  • No Real-time Updates: Commits are fetched by polling when the application requests them, not pushed by GitHub webhooks
  • Limited Search: Only semantic search, no exact text search
  • No Code Highlighting: Code references displayed without syntax highlighting
  • Missing Export: No way to export summaries or analysis results
  • No Analytics: No usage metrics or insights dashboard

Scalability Concerns

  • Synchronous Processing: Repository indexing blocks API responses
  • Memory Usage: Large repositories may cause memory pressure
  • Rate Limiting: No protection against API rate limits
  • Database Performance: No query optimization for large datasets

Future Improvements

Immediate Priorities

  1. Test Suite: Implement unit tests for core services and integration tests for API endpoints
  2. Error Handling: Add comprehensive error boundaries and retry logic
  3. Performance: Implement background job processing for repository indexing
  4. Monitoring: Add logging and error tracking (e.g., Sentry)
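
For the retry logic in item 2, one common approach is exponential backoff with a cap. The sketch below only computes the delay schedule and leaves the actual retry loop to the caller. `backoffDelayMs`, `backoffSchedule`, and the default values (500 ms base, 30 s cap) are assumptions for illustration, not settings from the project.

```typescript
/**
 * Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
 * Defaults are illustrative only.
 */
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 30_000): number {
  if (attempt < 0) throw new Error("attempt must be non-negative");
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

/** Full list of delays for a given number of retries. */
function backoffSchedule(retries: number, baseMs: number = 500, capMs: number = 30_000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs, capMs));
  }
  return delays;
}
```

A caller would wait `backoffDelayMs(n)` after the n-th failed call to Gemini, GitHub, or AssemblyAI before trying again, and give up once the schedule is exhausted.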

Feature Enhancements

  1. Webhook Integration: Real-time GitHub webhook processing instead of polling
  2. Multi-Provider AI: Support for OpenAI, Anthropic, and other AI providers
  3. Advanced Search: Combine semantic and exact text search with filtering
  4. Code Visualization: Interactive code graphs and dependency analysis
  5. Team Analytics: Usage metrics, collaboration insights, and activity tracking

Infrastructure Improvements

  1. CI/CD Pipeline: GitHub Actions for automated testing and deployment
  2. Container Orchestration: Docker Compose for development, Kubernetes for production
  3. Database Optimization: Connection pooling, query optimization, and caching layers
  4. CDN Integration: Asset delivery and file processing optimization
  5. Security: Input validation, rate limiting, and security headers

Development Commands

# Development
npm run dev              # Start development server with Turbopack
npm run build            # Build for production
npm run start            # Start production server
npm run preview          # Build and preview production

# Database
npm run db:generate      # Create migrations and generate client
npm run db:migrate       # Deploy migrations to database
npm run db:push          # Push schema changes directly
npm run db:studio        # Open Prisma Studio

# Code Quality
npm run lint             # Run ESLint
npm run lint:fix         # Fix ESLint issues
npm run typecheck        # Run TypeScript type checking
npm run format:check     # Check Prettier formatting
npm run format:write     # Apply Prettier formatting
npm run check            # Run lint + typecheck

License

No license specified. Contact repository maintainers for usage permissions.
