AI-powered code analysis platform that indexes GitHub repositories and provides intelligent code summaries, commit analysis, and meeting transcription capabilities.
CodeSense AI is a Next.js application that helps development teams understand and search their codebases using AI. It connects to GitHub repositories, generates semantic embeddings of source code using Google Gemini, and enables natural language queries about code functionality. The platform also includes meeting transcription with issue extraction and team collaboration features.
Target Users: Development teams, code reviewers, and engineers who need to quickly understand large codebases or track changes across repositories.
- GitHub Repository Indexing: Complete codebase ingestion using LangChain's GithubRepoLoader
- AI-Powered Summaries: Automatic file-level summaries generated by Google Gemini
- Vector Embeddings: 768-dimensional embeddings stored in PostgreSQL with pgvector for semantic search
- Natural Language Q&A: Query codebase with questions and get AI-generated answers with file references
- Automated Commit Polling: Fetches recent commits from linked GitHub repositories
- AI Diff Summarization: Generates four-bullet summaries of code changes using Gemini
- Commit History Tracking: Stores commit metadata with AI-generated insights
- Audio Transcription: Process meeting recordings using AssemblyAI
- Issue Extraction: Automatically identifies and extracts action items from meetings
- Chapter Summaries: Generates time-stamped summaries of meeting segments
- Multi-User Projects: Many-to-many relationship between users and projects
- Project Management: Create, archive, and manage multiple GitHub repositories
- Team Invitations: Invite team members to collaborate on projects
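The commit polling and diff summarization features listed above imply a step that decides which fetched commits still need an AI summary. A minimal sketch, assuming commits are identified by their hash and that the already-stored hashes come from the Commit table; `CommitInfo`, `selectUnprocessedCommits`, and the default `limit` of 10 are hypothetical names and values, not taken from the project's code.

```typescript
// Hypothetical shape of a commit returned by the GitHub API wrapper.
interface CommitInfo {
  hash: string;
  message: string;
  authorName: string;
  date: string; // ISO 8601 timestamp
}

/**
 * Returns fetched commits whose hash is not yet stored, newest first,
 * capped at `limit` so each poll sends a bounded number of diffs to the AI.
 */
function selectUnprocessedCommits(
  fetched: CommitInfo[],
  storedHashes: Iterable<string>,
  limit = 10, // hypothetical cap
): CommitInfo[] {
  const stored = new Set(storedHashes);
  return [...fetched]
    .sort((a, b) => Date.parse(b.date) - Date.parse(a.date)) // newest first
    .filter((commit) => !stored.has(commit.hash)) // skip already-summarized commits
    .slice(0, limit);
}

// Usage: "c3" was summarized on a previous poll, so only "b2" and "a1" are new.
const fetchedCommits: CommitInfo[] = [
  { hash: "a1", message: "Fix login bug", authorName: "ana", date: "2024-01-01T00:00:00Z" },
  { hash: "b2", message: "Add search", authorName: "ben", date: "2024-01-03T00:00:00Z" },
  { hash: "c3", message: "Update docs", authorName: "cy", date: "2024-01-02T00:00:00Z" },
];
console.log(selectUnprocessedCommits(fetchedCommits, ["c3"]).map((c) => c.hash)); // [ 'b2', 'a1' ]
```

Filtering before capping means a poll always returns up to `limit` genuinely new commits rather than re-examining the same recent window.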
- Next.js 15: App Router with React 19
- TypeScript: Full type safety
- Tailwind CSS: Utility-first styling
- Radix UI: Accessible component primitives
- React Query: Client-side state management and caching
- Node.js: Server runtime
- TRPC: Type-safe API layer
- Prisma: Database ORM with PostgreSQL
- Clerk: Authentication and user management
- PostgreSQL: Primary database with pgvector extension
- AWS S3: File storage for audio recordings
- Vector Search: Semantic similarity using pgvector
- Google Gemini: Code summarization and embeddings (gemini-2.5-flash, gemini-embedding-001)
- AssemblyAI: Audio transcription and meeting analysis
- GitHub API: Repository access and commit polling
- LangChain: Document processing and repository loading
- ESLint: Code linting
- Prettier: Code formatting
- TypeScript: Static type checking
- Prisma Studio: Database management UI
- Authentication: Clerk middleware protects routes and provides user context
- API Layer: TRPC handles type-safe client-server communication
- Business Logic: Server procedures coordinate database operations and external API calls
- Data Layer: Prisma ORM manages PostgreSQL operations with raw SQL for vectors
- Repository Indexing: GitHub → LangChain Loader → Gemini Summarization → Embedding Generation → PostgreSQL
- Query Processing: User Question → Vector Similarity Search → Relevant Files → AI Response Generation
- Meeting Processing: Audio Upload → S3 Storage → AssemblyAI Transcription → Issue Extraction → Database Storage
- AIService (google.services.ts): Centralized AI operations for summarization and embeddings
- GithubLoader (github-loader.ts): Repository ingestion and processing pipeline
- Assembly Service (assembly.ts): Audio processing and transcription
- TRPC Routers: API endpoints organized by domain (projects, commits, questions)
```
codesense-ai/
├── prisma/
│   ├── schema.prisma # Database schema with pgvector support
│   └── migrations/ # Database migration files
├── src/
│   ├── app/ # Next.js App Router pages
│   │   ├── (protected)/ # Authenticated routes
│   │   │   ├── dashboard/ # Main project dashboard
│   │   │   ├── qa/ # Saved questions interface
│   │   │   └── meetings/ # Meeting management
│   │   ├── api/ # TRPC API routes
│   │   └── layout.tsx # Root layout with providers
│   ├── components/
│   │   ├── ui/ # Radix UI component library
│   │   └── createProjectModal.tsx
│   ├── config/
│   │   ├── aiclient.config.ts # AI service initialization
│   │   └── google.config.ts # Google API configuration
│   ├── lib/
│   │   ├── gemini.ts # AI service wrappers
│   │   ├── github-loader.ts # Repository indexing
│   │   ├── github.ts # GitHub API integration
│   │   ├── assembly.ts # Audio processing
│   │   └── s3Client.ts # AWS S3 integration
│   ├── server/
│   │   ├── api/
│   │   │   ├── root.ts # Main TRPC router
│   │   │   └── routers/
│   │   │       └── project.ts # Project-related procedures
│   │   └── db.ts # Prisma client
│   ├── services/
│   │   └── google.services.ts # AI service implementation
│   ├── trpc/ # TRPC configuration
│   └── types/
│       └── types.ts # TypeScript type definitions
├── .env.example # Environment variables template
├── docker-compose.yml # PostgreSQL development setup
└── start-database.sh # Database initialization script
```
- Node.js 18+ (LTS recommended)
- npm 11.6.1+
- PostgreSQL with pgvector extension
- Google Generative AI API key
- Clerk authentication keys
- (Optional) GitHub personal access token for private repos
- Clone and Install

```bash
git clone <repository-url>
cd codesense-ai
npm install
```

- Environment Configuration

```bash
cp .env.example .env
```

Configure these variables in .env:

```env
# Database
DATABASE_URL="postgresql://postgres:password@localhost:5432/codesense-ai"
# AI Services
GOOGLE_GENERATIVE_AI_API_KEY="your-gemini-api-key"
ASSEMBLYAI_API_KEY="your-assemblyai-key"
# Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY="your-clerk-publishable-key"
CLERK_SECRET_KEY="your-clerk-secret-key"
NEXT_PUBLIC_CLERK_SIGN_IN_URL="/sign-in"
NEXT_PUBLIC_CLERK_SIGN_IN_FALLBACK_REDIRECT_URL="/dashboard"
NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL="/dashboard"
# Optional
GITHUB_TOKEN="your-github-token-for-private-repos"
```

- Database Setup

```bash
# Start PostgreSQL (using Docker Compose)
docker-compose up -d
# Apply migrations and generate client
npm run db:generate
# (Alternative) Push schema directly
npm run db:push
```

- Start Development Server

```bash
npm run dev
```

Visit http://localhost:3000
- pgvector Extension: Ensure PostgreSQL has the vector extension enabled:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

- API Keys: Verify all required API keys are properly set in environment variables
- Database Connection: Check DATABASE_URL format and PostgreSQL service status
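For the API key and database connection checks above, a startup guard can fail fast with a readable message. A sketch, assuming the variables shown in the .env example (excluding the optional GITHUB_TOKEN and the Clerk redirect URLs) form the required set, and treating values that still begin with the template prefix `your-` as unset; `findMissingEnvVars` and `isPostgresUrl` are hypothetical helpers.

```typescript
// Assumed required set, taken from the .env example; GITHUB_TOKEN is optional.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL",
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

/** Names of required variables that are unset, blank, or still template placeholders. */
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name]?.trim();
    // Heuristic: the .env.example placeholders start with "your-".
    return !value || value.startsWith("your-");
  });
}

/** True when DATABASE_URL parses as a postgres:// or postgresql:// URL. */
function isPostgresUrl(url: string): boolean {
  try {
    const { protocol } = new URL(url);
    return protocol === "postgresql:" || protocol === "postgres:";
  } catch {
    return false; // not a parseable URL at all
  }
}

// Usage at startup (in Node, pass process.env):
// const missing = findMissingEnvVars(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```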
- User clicks "Create Project" in the dashboard
- Modal captures the project name and GitHub URL
- TRPC mutation project.createProject creates the database record, then runs:
  - indexGithubRepo: loads repository files; each file is processed as AI summary → embedding generation → storage
  - pollCommits: fetches recent commit history
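Both indexGithubRepo and pollCommits need the repository owner and name from the URL captured in the modal. A minimal sketch of that parsing step, assuming only github.com URLs are accepted and that a trailing `.git` or extra path segments (for example `/tree/main`) should be ignored; `parseGithubUrl`, `tryParseUrl`, and `RepoRef` are hypothetical names.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

/** Returns a URL object, or null if the input is not an absolute URL. */
function tryParseUrl(input: string): URL | null {
  try {
    return new URL(input.trim());
  } catch {
    return null;
  }
}

/** Hypothetical helper: extracts owner/repo from a GitHub repository URL. */
function parseGithubUrl(input: string): RepoRef | null {
  const url = tryParseUrl(input);
  if (!url) return null;
  // URL lowercases the hostname, so a simple comparison is enough.
  if (url.hostname !== "github.com" && url.hostname !== "www.github.com") return null;
  // "/owner/repo/tree/main" -> ["owner", "repo", "tree", "main"]; only the first two matter.
  const [owner, rawRepo] = url.pathname.split("/").filter(Boolean);
  if (!owner || !rawRepo) return null;
  const repo = rawRepo.replace(/\.git$/, ""); // allow clone URLs
  return repo ? { owner, repo } : null;
}

console.log(parseGithubUrl("https://github.com/vercel/next.js")); // { owner: 'vercel', repo: 'next.js' }
```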
- File Loading: LangChain GithubRepoLoader fetches repository contents
- Content Processing: Each file truncated to 10,000 characters for context limits
- AI Summarization: Gemini generates purpose-focused summaries
- Embedding Generation: 768-dimensional vectors created from summaries
- Storage: Raw SQL inserts vectors into PostgreSQL pgvector columns
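The truncation and storage steps above can be illustrated with two small helpers. Because the vectors are written with raw SQL, the embedding is typically serialized to pgvector's text input format (`[x1,x2,...]`) and cast with `::vector` inside the query. A sketch, assuming 768-dimensional embeddings as stated above; `truncateForSummary`, `toPgVectorLiteral`, and the table and column names in the closing comment are hypothetical.

```typescript
const MAX_FILE_CHARS = 10_000; // truncation limit from the indexing steps above
const EMBEDDING_DIMENSIONS = 768; // embedding size stored in pgvector

/** Cuts file content to the character budget before it is sent for summarization. */
function truncateForSummary(content: string, maxChars: number = MAX_FILE_CHARS): string {
  return content.length > maxChars ? content.slice(0, maxChars) : content;
}

/** Serializes an embedding into pgvector's text input format, e.g. "[0.1,0.2,0.3]". */
function toPgVectorLiteral(embedding: number[], dims: number = EMBEDDING_DIMENSIONS): string {
  if (embedding.length !== dims) {
    throw new Error(`Expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error("Embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}

// With Prisma, the literal would then be bound as a parameter and cast, e.g.
// (hypothetical table/column names):
// await db.$executeRaw`
//   UPDATE "SourceCodeEmbedding"
//   SET "summaryEmbedding" = ${toPgVectorLiteral(embedding)}::vector
//   WHERE "id" = ${id}`;
```

Validating the length before the insert turns a dimension mismatch into a clear application error instead of a database error at write time.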
- User submits question via Q&A interface
- System performs vector similarity search on stored embeddings
- Top relevant files retrieved based on cosine similarity
- Gemini generates contextual answer using retrieved code
- Response saved with file references for future access
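The retrieval step can be illustrated in memory. In the database it would be a pgvector query using the cosine-distance operator `<=>` (similarity = 1 - distance); the sketch below computes the same ranking in TypeScript. `topKFiles`, the default `k` of 10, and the 0.5 score cutoff are hypothetical choices, not values taken from the project.

```typescript
interface EmbeddedFile {
  fileName: string;
  embedding: number[];
}

interface ScoredFile {
  fileName: string;
  score: number;
}

/** Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: no meaningful direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Ranks files by similarity to the question embedding and keeps the best k above a cutoff. */
function topKFiles(query: number[], files: EmbeddedFile[], k = 10, minScore = 0.5): ScoredFile[] {
  return files
    .map((file) => ({ fileName: file.fileName, score: cosineSimilarity(query, file.embedding) }))
    .filter((result) => result.score > minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Roughly equivalent SQL (hypothetical table/column names):
// SELECT "fileName", 1 - ("summaryEmbedding" <=> $1::vector) AS similarity
// FROM "SourceCodeEmbedding"
// WHERE "projectId" = $2 AND 1 - ("summaryEmbedding" <=> $1::vector) > 0.5
// ORDER BY similarity DESC
// LIMIT 10;
```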
- User uploads audio file through dashboard uploader
- File stored in AWS S3, meeting record created with PROCESSING status
- AssemblyAI transcribes audio and generates chapter summaries
- Issues and action items extracted from transcript
- Meeting status updated to COMPLETED, results displayed
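AssemblyAI's auto-chapters output reports each chapter's `start` and `end` in milliseconds along with `gist`, `headline`, and `summary` fields. A sketch of turning those into the time-stamped chapter summaries described above, assuming timestamps are displayed as mm:ss (h:mm:ss past an hour); `ChapterSummary`, `formatTimestamp`, and `toChapterSummaries` are hypothetical names, and the sketch does not attempt to reproduce how the project derives Issue rows from the transcript.

```typescript
// Subset of a chapter object returned by AssemblyAI when auto chapters are enabled.
interface AssemblyChapter {
  gist: string;
  headline: string;
  summary: string;
  start: number; // milliseconds from the start of the audio
  end: number; // milliseconds from the start of the audio
}

// Hypothetical shape for a stored, time-stamped summary.
interface ChapterSummary {
  start: string;
  end: string;
  gist: string;
  headline: string;
  summary: string;
}

/** Formats milliseconds as mm:ss, or h:mm:ss once the meeting passes an hour. */
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const mmss = `${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
  return hours > 0 ? `${hours}:${mmss}` : mmss;
}

function toChapterSummaries(chapters: AssemblyChapter[]): ChapterSummary[] {
  return chapters.map((chapter) => ({
    start: formatTimestamp(chapter.start),
    end: formatTimestamp(chapter.end),
    gist: chapter.gist,
    headline: chapter.headline,
    summary: chapter.summary,
  }));
}

console.log(formatTimestamp(3_725_000)); // 1:02:05
```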
Mutations
- createProject: Create new project and trigger repository indexing
- archiveProject: Soft-delete project (sets deletedAt timestamp)
- saveQuestion: Store Q&A pairs with file references
- createMeeting: Initialize meeting record for audio processing

Queries
- getProjects: Fetch user's projects with commit polling
- getProjectById: Retrieve single project with all related data
- getCommits: Get commit history for a project
- getQuestions: Retrieve saved Q&A pairs
- getMeetings: Fetch meeting records and processing status
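Because archiveProject is a soft delete, every project query has to exclude rows whose deletedAt is set, and getProjects must also respect the many-to-many UserToProject link. A minimal in-memory sketch of those two rules; the types and function names are hypothetical stand-ins for the Prisma models and procedures, and the Prisma filter in the closing comment assumes a relation field named `userToProjects`.

```typescript
interface Project {
  id: string;
  name: string;
  githubUrl: string;
  deletedAt: Date | null; // null = active, set = archived
}

interface UserToProject {
  userId: string;
  projectId: string;
}

/** Soft delete: returns a copy with deletedAt set instead of removing the row. */
function archiveProject(project: Project, now: Date = new Date()): Project {
  return { ...project, deletedAt: now };
}

/** Active projects the user is linked to through UserToProject. */
function getProjectsForUser(
  userId: string,
  projects: Project[],
  links: UserToProject[],
): Project[] {
  const memberOf = new Set(
    links.filter((link) => link.userId === userId).map((link) => link.projectId),
  );
  return projects.filter((p) => memberOf.has(p.id) && p.deletedAt === null);
}

// The equivalent Prisma filter might look like (relation name assumed):
// db.project.findMany({ where: { userToProjects: { some: { userId } }, deletedAt: null } });
```

Keeping archived rows lets commits, questions, and meetings that reference the project stay intact.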
- User: Clerk integration with credit system (150 free credits)
- Project: GitHub repository linking with soft deletion
- UserToProject: Many-to-many team collaboration
- SourceCodeEmbedding: Vector embeddings with pgvector (768 dimensions)
- Commit: GitHub commit data with AI summaries
- Question: Q&A pairs with JSON file references
- Meeting: Audio meeting records with processing status
- Issue: Extracted meeting action items
- No CI/CD Pipeline: No automated testing or deployment workflows
- Missing Test Suite: No unit or integration tests implemented
- Single AI Provider: Only Google Gemini supported (no fallback options)
- File Size Limits: Repository files truncated at 10,000 characters
- Database Constraints: No connection pooling configuration for scalability
- No Real-time Updates: Commit polling is manual, not webhook-based
- Limited Search: Only semantic search, no exact text search
- No Code Highlighting: Code references displayed without syntax highlighting
- Missing Export: No way to export summaries or analysis results
- No Analytics: No usage metrics or insights dashboard
- Synchronous Processing: Repository indexing blocks API responses
- Memory Usage: Large repositories may cause memory pressure
- Rate Limiting: No protection against API rate limits
- Database Performance: No query optimization for large datasets
- Test Suite: Implement unit tests for core services and integration tests for API endpoints
- Error Handling: Add comprehensive error boundaries and retry logic
- Performance: Implement background job processing for repository indexing
- Monitoring: Add logging and error tracking (e.g., Sentry)
- Webhook Integration: Real-time GitHub webhook processing instead of polling
- Multi-Provider AI: Support for OpenAI, Anthropic, and other AI providers
- Advanced Search: Combine semantic and exact text search with filtering
- Code Visualization: Interactive code graphs and dependency analysis
- Team Analytics: Usage metrics, collaboration insights, and activity tracking
- CI/CD Pipeline: GitHub Actions for automated testing and deployment
- Container Orchestration: Docker Compose for development, Kubernetes for production
- Database Optimization: Connection pooling, query optimization, and caching layers
- CDN Integration: Asset delivery and file processing optimization
- Security: Input validation, rate limiting, and security headers
```bash
# Development
npm run dev # Start development server with Turbo
npm run build # Build for production
npm run start # Start production server
npm run preview # Build and preview production
# Database
npm run db:generate # Create migrations and generate client
npm run db:migrate # Deploy migrations to database
npm run db:push # Push schema changes directly
npm run db:studio # Open Prisma Studio
# Code Quality
npm run lint # Run ESLint
npm run lint:fix # Fix ESLint issues
npm run typecheck # Run TypeScript type checking
npm run format:check # Check Prettier formatting
npm run format:write # Apply Prettier formatting
npm run check # Run lint + typecheck
```

No license specified. Contact repository maintainers for usage permissions.