lensoncode/java-deepeval-demo

AWS Bedrock + Spring AI Demo with Deepeval Integration

A complete Spring Boot demo application showing how to integrate AWS Bedrock (Claude 3.5 Sonnet) with Spring AI for chat/RAG functionality, and capture evaluation data for analysis with Deepeval (Python).

Perfect for Java developers exploring LLM integration and evaluation workflows.

Features

  • AWS Bedrock Integration: Claude 3.5 Sonnet via Spring AI (swappable to Titan/Llama)
  • Simple RAG Pipeline: In-memory context retrieval from a handbook
  • REST API: Clean endpoints for chat and evaluation capture
  • Evaluation Support: NDJSON dataset generation for Deepeval metrics
  • Docker Ready: Containerized deployment with docker-compose
  • Production Patterns: Configuration management, logging, validation

Prerequisites

  • Java 21 (JDK 21+)
  • Maven 3.9+
  • AWS Account with Bedrock access enabled
  • AWS Credentials configured (see below)
  • Docker (optional, for containerized deployment)
  • curl + jq (for testing)

AWS Setup

  1. Enable Bedrock Access in your AWS account:

    • Navigate to AWS Bedrock console
    • Request access to Anthropic Claude 3.5 Sonnet model
    • Wait for approval (usually instant in most regions)
  2. Configure IAM Permissions: Your IAM user/role needs these permissions:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream"
          ],
          "Resource": "arn:aws:bedrock:*::foundation-model/*"
        }
      ]
    }
  3. Set up AWS Credentials (choose one method):

    Option A: Environment Variables

    export AWS_ACCESS_KEY_ID=your_access_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export AWS_REGION=eu-central-1

    Option B: AWS CLI Profile

    aws configure
    # Or use a named profile:
    export AWS_PROFILE=my-profile

    Option C: IAM Role (for EC2/ECS/Lambda)

    • No explicit credentials needed; the AWS SDK uses the role attached to the instance, task, or function

Quick Start

1. Clone and Build

git clone <your-repo-url>
cd java-deepeval-demo

# Build with Maven
mvn clean install

2. Configure Environment

Set these environment variables in your shell:

# Required
export AWS_REGION=eu-central-1
export AWS_PROFILE=default  # or use AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY

# Optional: Override default model
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20240620-v1:0

3. Run Locally

# Using Maven
mvn spring-boot:run

# Or using Makefile
make run

The application starts on http://localhost:8080

4. Test the API

Ask a question:

curl -s http://localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"Who is the CEO?"}' | jq

# Or use Makefile
make ask

Expected response:

{
  "answer": "Alice Smith is the CEO of TechCorp Solutions. She joined the company in 2018 after leading product development at a Fortune 500 tech company.",
  "contexts": [
    "Company Overview\nOur company, TechCorp Solutions, was founded in 2015...",
    "..."
  ]
}

Capture evaluation data:

curl -s http://localhost:8080/eval-capture \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Who is the CEO?",
    "answer": "Alice Smith is the CEO of TechCorp Solutions.",
    "contexts": ["Company Overview\nThe CEO is Alice Smith..."],
    "expected": "Alice Smith"
  }' | jq

# Or use Makefile
make capture

Expected response:

{
  "ok": true,
  "count": 1
}

Open evaluation/eval_dataset.json to see the captured data; each call appends one record as a single JSON line.

Docker Deployment

Build and Run with Docker Compose

# Build image
make docker

# Start container (with AWS credentials mounted)
make up

# Check logs
make logs

# Test endpoints
make ask
make capture

# Stop container
make down

The docker-compose setup:

  • Mounts ~/.aws for credential access
  • Persists evaluation/ directory for dataset storage
  • Exposes port 8080
  • Includes health checks

Configuration

Application Properties

Edit src/main/resources/application.yml or use environment variables:

| Property | Environment Variable | Default | Description |
| --- | --- | --- | --- |
| spring.ai.bedrock.aws.region | AWS_REGION | eu-central-1 | AWS region for Bedrock |
| spring.ai.bedrock.anthropic.claude.chat.model | BEDROCK_MODEL_ID | anthropic.claude-3-5-sonnet-20240620-v1:0 | Bedrock model ID |
| evaluation.output-dir | EVAL_OUTPUT_DIR | evaluation | Directory for eval data |
| evaluation.dataset-file | EVAL_DATASET_FILE | eval_dataset.json | Eval dataset filename |

Switching Models

To use different Bedrock models:

Amazon Titan:

export BEDROCK_MODEL_ID=amazon.titan-text-express-v1

Meta Llama:

export BEDROCK_MODEL_ID=meta.llama3-70b-instruct-v1:0

Note: Model availability varies by region. Check the AWS Bedrock console to see which models are enabled in yours.

Project Structure

java-deepeval-demo/
├── pom.xml                          # Maven configuration with Spring AI BOM
├── Dockerfile                       # Multi-stage Docker build
├── docker-compose.yml               # Container orchestration
├── Makefile                         # Helper commands
├── README.md                        # This file
│
├── src/main/java/com/example/deepevaldemo/
│   ├── App.java                     # Spring Boot main class
│   ├── config/
│   │   └── BedrockConfig.java       # Spring AI ChatModel bean
│   ├── web/
│   │   └── AskController.java       # REST endpoints
│   ├── service/
│   │   ├── ChatService.java         # Chat orchestration + RAG
│   │   └── EvaluationService.java   # Eval data capture
│   └── rag/
│       └── ContextRetriever.java    # Simple in-memory retrieval
│
├── src/main/resources/
│   ├── application.yml              # Spring configuration
│   └── handbook.txt                 # Demo knowledge base
│
└── evaluation/
    ├── .gitkeep
    ├── README.md                    # Deepeval integration guide
    └── eval_dataset.json            # Generated NDJSON (runtime)

API Reference

POST /ask

Submit a question and receive an AI-generated answer with context.

Request:

{
  "question": "What are the pricing tiers?"
}

Response:

{
  "answer": "We offer three pricing tiers: Starter at $99/month, Professional at $299/month, and Enterprise with custom pricing.",
  "contexts": [
    "Pricing and Plans\nWe offer three pricing tiers...",
    "..."
  ]
}

Validation:

  • question is required and cannot be blank
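
For callers that prefer a script over curl, the request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the repository: the helper names `build_ask_request` and `ask` are hypothetical, and the base URL assumes the default local address from the Quick Start.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: default address from the Quick Start


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request."""
    if not question or not question.strip():
        # Mirrors the documented rule: question is required and cannot be blank.
        raise ValueError("question is required and cannot be blank")
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the question and return the parsed JSON body (answer + contexts)."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the application running, `ask("What are the pricing tiers?")["answer"]` would return the generated answer text. Rejecting a blank question on the client side simply avoids a round trip that the server would refuse anyway.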

POST /eval-capture

Capture evaluation data for Deepeval analysis.

Request:

{
  "question": "Who is the CEO?",
  "answer": "Alice Smith is the CEO of TechCorp Solutions.",
  "contexts": ["Company Overview\nThe CEO is Alice Smith..."],
  "expected": "Alice Smith"
}

Response:

{
  "ok": true,
  "count": 5
}

Validation:

  • All fields are required
  • contexts must be a non-empty array
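
The two validation rules above can be checked before a record is ever sent. The function below is an illustrative sketch of those rules only; `validate_eval_record` is a hypothetical name, and the real server-side validation in the Java application may differ in detail (for example in its error messages).

```python
REQUIRED_FIELDS = ("question", "answer", "contexts", "expected")


def validate_eval_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes both rules."""
    problems = []
    # Rule 1: all fields are required (missing, null, or empty string counts as absent).
    for field in REQUIRED_FIELDS:
        if field not in record or record[field] in (None, ""):
            problems.append(f"{field} is required")
    # Rule 2: contexts must be a non-empty array.
    contexts = record.get("contexts")
    if contexts is not None and (not isinstance(contexts, list) or len(contexts) == 0):
        problems.append("contexts must be a non-empty array")
    return problems
```

Returning a list rather than raising on the first failure lets a batch script report every problem in a record at once.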

How Evaluation Works

The /eval-capture endpoint stores evaluation records as newline-delimited JSON (NDJSON) in evaluation/eval_dataset.json.
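
To make the storage format concrete, here is a small Python sketch of appending to and reading back an NDJSON file. It illustrates the format only and is not the Java implementation; the function names are hypothetical, and returning the record count is an assumption modelled on the `count` field in the endpoint's response.

```python
import json
from pathlib import Path


def read_records(path: Path) -> list[dict]:
    """Parse every non-blank line of the file as a separate JSON document."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def append_record(path: Path, record: dict) -> int:
    """Append one record as a single JSON line and return the new record count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # json.dumps escapes embedded newlines (e.g. inside contexts),
        # so each record always occupies exactly one physical line.
        f.write(json.dumps(record) + "\n")
    return len(read_records(path))
```

Because records are only ever appended, the file as a whole is not a single valid JSON document despite its .json extension; it has to be read line by line.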

Evaluation Workflow

  1. Capture Data: Use your Java app to generate questions/answers and capture them
  2. Review Dataset: Check evaluation/eval_dataset.json for completeness
  3. Run Deepeval (Python): Analyze with metrics such as faithfulness, answer relevancy, and contextual precision
  4. Iterate: Improve prompts, context retrieval, or model based on results

Deepeval Integration

The full Python evaluation script will be provided in the accompanying Medium article. Here's a preview:

# pip install deepeval
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase
import json

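# Load captured dataset (NDJSON: one JSON record per line)
test_cases = []
with open('evaluation/eval_dataset.json', 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        test_cases.append(LLMTestCase(
            input=record['question'],
            actual_output=record['answer'],
            expected_output=record['expected'],
            retrieval_context=record['contexts']
        ))

# Define metrics. Both are LLM-judged: by default Deepeval calls an OpenAI
# model (needs OPENAI_API_KEY) unless you pass model=... to each metric.
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    FaithfulnessMetric(threshold=0.8)
]

# Run evaluation
results = evaluate(test_cases=test_cases, metrics=metrics)
print(results)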

See evaluation/README.md for detailed documentation.

Makefile Commands

make help       # Show all available commands
make run        # Run locally with Maven
make clean      # Clean build artifacts
make test       # Run tests
make docker     # Build Docker image
make up         # Start with docker-compose
make down       # Stop docker-compose
make logs       # View container logs
make ask        # Sample curl to /ask
make capture    # Sample curl to /eval-capture

Troubleshooting

Common Issues

"No credentials found"

  • Verify AWS credentials: aws sts get-caller-identity
  • Check environment variables: echo $AWS_PROFILE
  • Ensure ~/.aws/credentials exists
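
The checks above can be bundled into a quick diagnostic. The sketch below is a deliberately simplified stand-in for the AWS SDK's credential lookup, which consults more sources than this; `describe_credential_source` is a hypothetical helper, and the option letters refer to the AWS Setup section.

```python
import os
from pathlib import Path


def describe_credential_source(env: dict, home: Path) -> str:
    """Report which of the three documented credential options appears to be configured."""
    # Option A: explicit keys in the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables (Option A)"
    # Option B: shared files written by `aws configure`, selected via AWS_PROFILE.
    aws_dir = home / ".aws"
    if (aws_dir / "credentials").is_file() or (aws_dir / "config").is_file():
        profile = env.get("AWS_PROFILE", "default")
        return f"shared AWS files, profile '{profile}' (Option B)"
    # Nothing local: only an attached IAM role could still provide credentials.
    return "none found locally; only an IAM role (Option C) could supply credentials"
```

A typical call is `describe_credential_source(dict(os.environ), Path.home())`. It does not verify that the credentials are valid; `aws sts get-caller-identity` remains the authoritative check.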

"Access denied to model"

  • Request model access in AWS Bedrock console
  • Verify IAM permissions include bedrock:InvokeModel
  • Check if model is available in your region

"Model not found"

  • Verify model ID format: anthropic.claude-3-5-sonnet-20240620-v1:0
  • List available models: aws bedrock list-foundation-models --region eu-central-1

Docker: "Permission denied"

  • On Linux, ensure user is in docker group: sudo usermod -aG docker $USER
  • Or run with sudo: sudo make up

Debug Logging

Enable debug logs in application.yml:

logging:
  level:
    com.example.deepevaldemo: DEBUG
    org.springframework.ai: DEBUG

Performance Considerations

Context Retrieval

The current implementation uses simple token overlap for retrieval. For production:

  • Use vector embeddings (OpenAI, Cohere, SentenceTransformers)
  • Integrate vector databases (Pinecone, Weaviate, Milvus)
  • Implement semantic search with cosine similarity
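
To illustrate the difference between the two approaches, the sketch below shows token-overlap scoring next to the cosine similarity that embedding-based search relies on. It is a Python illustration of the ideas, not a port of ContextRetriever.java: the tokenization rule, the `top_k` parameter, and all function names are assumptions.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, sections: list[str], top_k: int = 2) -> list[str]:
    """Rank sections by how many distinct tokens they share with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(s)), i, s) for i, s in enumerate(sections)]
    scored = [item for item in scored if item[0] > 0]   # drop sections with no overlap
    scored.sort(key=lambda item: (-item[0], item[1]))   # best score first, stable on ties
    return [s for _, _, s in scored[:top_k]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Token overlap only matches literal words, so a question about "the boss" would miss a section that says "CEO". With embeddings, `retrieve` would instead rank sections by `cosine_similarity` between the question vector and each section vector, which captures that kind of paraphrase.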

Cost Management

AWS Bedrock charges per token:

  • Claude 3.5 Sonnet: ~$0.003 per 1K input tokens, ~$0.015 per 1K output tokens
  • Monitor usage in AWS Cost Explorer
  • Set up billing alarms
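
A rough budget can be worked out from the approximate rates above. The calculation below is a sketch that hard-codes those two figures; actual prices vary by region and change over time, and the traffic numbers in the example are invented for illustration.

```python
# Approximate Claude 3.5 Sonnet rates quoted above (USD per 1K tokens).
INPUT_USD_PER_1K = 0.003
OUTPUT_USD_PER_1K = 0.015


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the Bedrock charge for a given number of input and output tokens."""
    return (input_tokens / 1000) * INPUT_USD_PER_1K + (output_tokens / 1000) * OUTPUT_USD_PER_1K


# Hypothetical workload: 1,000 requests, each with 1,500 input and 200 output tokens.
total = estimate_cost_usd(input_tokens=1_000 * 1_500, output_tokens=1_000 * 200)
print(f"Estimated cost: ${total:.2f}")
```

For this workload the input side costs about $4.50 and the output side about $3.00. In a RAG setup the retrieved context is sent with every question, so the number of input tokens grows with the amount of handbook text included in each prompt.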

Scaling

For production deployments:

  • Use Spring Boot Actuator for monitoring
  • Configure connection pooling for Bedrock API
  • Implement caching for frequent queries
  • Consider async processing for evaluation capture
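
As an illustration of the caching point, the sketch below memoizes answers by normalized question text. It is a language-neutral idea shown in Python with hypothetical names; in the Java service itself, Spring's cache abstraction would be the more natural fit.

```python
from functools import lru_cache
from typing import Callable


def normalize(question: str) -> str:
    """Collapse case and whitespace so trivially different phrasings share a cache entry."""
    return " ".join(question.lower().split())


class CachedAnswers:
    """Wrap an answer function so repeated questions skip the model call."""

    def __init__(self, answer_fn: Callable[[str], str], max_size: int = 256):
        # lru_cache evicts the least recently used entry once max_size is reached.
        self._cached = lru_cache(maxsize=max_size)(answer_fn)

    def ask(self, question: str) -> str:
        return self._cached(normalize(question))
```

Cached answers are only safe to reuse while the handbook, prompt, and model stay the same, so the cache should be cleared whenever any of them changes.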

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues:

  • Check the troubleshooting section above
  • Review AWS Bedrock logs in CloudWatch
  • Open an issue on GitHub

Built with Spring Boot 3.3.x, Spring AI, and AWS Bedrock