AWS Bedrock + Spring AI Demo with Deepeval Integration

A complete Spring Boot demo application showing how to integrate AWS Bedrock (Claude 3.5 Sonnet) with Spring AI for chat/RAG functionality, and capture evaluation data for analysis with Deepeval (Python).

Perfect for Java developers exploring LLM integration and evaluation workflows.

Features

- AWS Bedrock Integration: Claude 3.5 Sonnet via Spring AI (swappable to Titan/Llama)
- Simple RAG Pipeline: In-memory context retrieval from a handbook
- REST API: Clean endpoints for chat and evaluation capture
- Evaluation Support: NDJSON dataset generation for Deepeval metrics
- Docker Ready: Containerized deployment with docker-compose
- Production Patterns: Configuration management, logging, validation

Prerequisites

- Java 21 (JDK 21+)
- Maven 3.9+
- AWS Account with Bedrock access enabled
- AWS Credentials configured (see below)
- Docker (optional, for containerized deployment)
- curl + jq (for testing)

AWS Setup

Enable Bedrock Access in your AWS account:
- Navigate to the AWS Bedrock console
- Request access to the Anthropic Claude 3.5 Sonnet model
- Wait for approval (usually instant in most regions)

Configure IAM Permissions: Your IAM user/role needs these permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}

Set up AWS Credentials (choose one method):

Option A: Environment Variables

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=eu-central-1

Option B: AWS CLI Profile

aws configure
# Or use a named profile:
export AWS_PROFILE=my-profile

Option C: IAM Role (for EC2/ECS/Lambda)

No explicit credentials are needed; the application picks up the role attached to the instance, task, or function.

Quick Start

1. Clone and Build

git clone <your-repo-url>
cd java-deepeval-demo

# Build with Maven
mvn clean install

2. Configure Environment

Set the following environment variables in your shell:

# Required
export AWS_REGION=eu-central-1
export AWS_PROFILE=default  # or use AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY

# Optional: Override default model
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20240620-v1:0

3. Run Locally

# Using Maven
mvn spring-boot:run

# Or using Makefile
make run

The application starts on http://localhost:8080

4. Test the API

Ask a question:

curl -s http://localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"Who is the CEO?"}' | jq

# Or use Makefile
make ask

Expected response:

{
  "answer": "Alice Smith is the CEO of TechCorp Solutions. She joined the company in 2018 after leading product development at a Fortune 500 tech company.",
  "contexts": [
    "Company Overview\nOur company, TechCorp Solutions, was founded in 2015...",
    "..."
  ]
}

Capture evaluation data:

curl -s http://localhost:8080/eval-capture \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Who is the CEO?",
    "answer": "Alice Smith is the CEO of TechCorp Solutions.",
    "contexts": ["Company Overview\nThe CEO is Alice Smith..."],
    "expected": "Alice Smith"
  }' | jq

# Or use Makefile
make capture

Expected response:

{
  "ok": true,
  "count": 1
}

Check evaluation/eval_dataset.json to see the captured data.

Docker Deployment

Build and Run with Docker Compose

# Build image
make docker

# Start container (with AWS credentials mounted)
make up

# Check logs
make logs

# Test endpoints
make ask
make capture

# Stop container
make down

The docker-compose setup:

- Mounts ~/.aws for credential access
- Persists the evaluation/ directory for dataset storage
- Exposes port 8080
- Includes health checks

Configuration

Application Properties

Edit src/main/resources/application.yml or use environment variables:

| Property | Environment Variable | Default | Description |
| --- | --- | --- | --- |
| `spring.ai.bedrock.aws.region` | `AWS_REGION` | `eu-central-1` | AWS region for Bedrock |
| `spring.ai.bedrock.anthropic.claude.chat.model` | `BEDROCK_MODEL_ID` | `anthropic.claude-3-5-sonnet-20240620-v1:0` | Bedrock model ID |
| `evaluation.output-dir` | `EVAL_OUTPUT_DIR` | `evaluation` | Directory for eval data |
| `evaluation.dataset-file` | `EVAL_DATASET_FILE` | `eval_dataset.json` | Eval dataset filename |

Switching Models

To use different Bedrock models:

Amazon Titan:

export BEDROCK_MODEL_ID=amazon.titan-text-express-v1

Meta Llama:

export BEDROCK_MODEL_ID=meta.llama3-70b-instruct-v1:0

Note: Model availability varies by region. Check the AWS Bedrock console for the models available in yours.

Project Structure

bedrock-deepeval-demo/
├── pom.xml                          # Maven configuration with Spring AI BOM
├── Dockerfile                       # Multi-stage Docker build
├── docker-compose.yml               # Container orchestration
├── Makefile                         # Helper commands
├── README.md                        # This file
│
├── src/main/java/com/example/deepevaldemo/
│   ├── App.java                     # Spring Boot main class
│   ├── config/
│   │   └── BedrockConfig.java       # Spring AI ChatModel bean
│   ├── web/
│   │   └── AskController.java       # REST endpoints
│   ├── service/
│   │   ├── ChatService.java         # Chat orchestration + RAG
│   │   └── EvaluationService.java   # Eval data capture
│   └── rag/
│       └── ContextRetriever.java    # Simple in-memory retrieval
│
├── src/main/resources/
│   ├── application.yml              # Spring configuration
│   └── handbook.txt                 # Demo knowledge base
│
└── evaluation/
    ├── .gitkeep
    ├── README.md                    # Deepeval integration guide
    └── eval_dataset.json            # Generated NDJSON (runtime)

API Reference

POST /ask

Submit a question and receive an AI-generated answer with context.

Request:

{
  "question": "What are the pricing tiers?"
}

Response:

{
  "answer": "We offer three pricing tiers: Starter at $99/month, Professional at $299/month, and Enterprise with custom pricing.",
  "contexts": [
    "Pricing and Plans\nWe offer three pricing tiers...",
    "..."
  ]
}

Validation:

- `question` is required and cannot be blank
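
For callers that prefer a script over curl, the request above can be sketched in Python with only the standard library. This is a minimal client sketch, not part of the repository: the function names `build_ask_request` and `ask` are hypothetical, and the base URL assumes the app is running locally on the default port from the Quick Start.

```python
import json
import urllib.request

# Assumption: the app is running locally on the default port (see Quick Start).
BASE_URL = "http://localhost:8080"


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request for the given question."""
    if not question or not question.strip():
        # Mirrors the documented rule: question is required and cannot be blank.
        raise ValueError("question is required and cannot be blank")
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL) -> dict:
    """Send the question and return the parsed JSON body (answer and contexts)."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("What are the pricing tiers?")` against a running instance should return a dict with the `answer` and `contexts` keys shown in the response above.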

POST /eval-capture

Capture evaluation data for Deepeval analysis.

Request:

{
  "question": "Who is the CEO?",
  "answer": "Alice Smith is the CEO of TechCorp Solutions.",
  "contexts": ["Company Overview\nThe CEO is Alice Smith..."],
  "expected": "Alice Smith"
}

Response:

{
  "ok": true,
  "count": 5
}

Validation:

- All fields are required
- `contexts` must be a non-empty array
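
The two rules above can be checked on the client side before sending a record. The following is a small sketch under the assumption that "required" means present and non-null; the server may apply stricter checks (for example, rejecting blank strings). The helper name `validate_capture_payload` is hypothetical.

```python
REQUIRED_FIELDS = ("question", "answer", "contexts", "expected")


def validate_capture_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passes both rules."""
    errors = []
    for field in REQUIRED_FIELDS:
        # Assumption: a field counts as missing when it is absent or null.
        if field not in payload or payload[field] is None:
            errors.append(f"{field} is required")
    contexts = payload.get("contexts")
    if contexts is not None and (not isinstance(contexts, list) or len(contexts) == 0):
        errors.append("contexts must be a non-empty array")
    return errors
```

A payload like the request example yields an empty list, while one with `"contexts": []` yields a single error message.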

How Evaluation Works

The /eval-capture endpoint stores evaluation records as newline-delimited JSON (NDJSON) in evaluation/eval_dataset.json.
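
To make the NDJSON format concrete, here is a rough Python sketch of appending one record per line and reading the file back. It illustrates the file format only and is not the Java implementation in EvaluationService; the function names are hypothetical, and returning the running line count is an assumption based on the `count` field in the endpoint's response.

```python
import json
from pathlib import Path


def append_record(path: Path, record: dict) -> int:
    """Append one record as a single JSON line and return the total record count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps escapes embedded newlines as \n, so each record stays on one line.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    with path.open("r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def read_records(path: Path) -> list[dict]:
    """Parse every non-blank line of an NDJSON file into a dict."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because every line is a complete JSON document, the file can be appended to without rewriting it and processed line by line, which is how the Deepeval loader further down reads it.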

Evaluation Workflow

1. Capture Data: Use your Java app to generate questions/answers and capture them
2. Review Dataset: Check evaluation/eval_dataset.json for completeness
3. Run Deepeval (Python): Analyze with metrics like faithfulness, relevancy, precision
4. Iterate: Improve prompts, context retrieval, or model based on results

Deepeval Integration

The full Python evaluation script will be provided in the accompanying Medium article. Here's a preview:

# pip install deepeval
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase
import json

# Load captured dataset (NDJSON: one JSON record per line)
test_cases = []
with open('evaluation/eval_dataset.json', 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines so json.loads does not fail on them
        record = json.loads(line)
        test_cases.append(LLMTestCase(
            input=record['question'],
            actual_output=record['answer'],
            expected_output=record['expected'],
            retrieval_context=record['contexts']
        ))

# Define metrics
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    FaithfulnessMetric(threshold=0.8)
]

# Run evaluation
results = evaluate(test_cases, metrics)
print(results)

See evaluation/README.md for detailed documentation.

Makefile Commands

make help       # Show all available commands
make run        # Run locally with Maven
make clean      # Clean build artifacts
make test       # Run tests
make docker     # Build Docker image
make up         # Start with docker-compose
make down       # Stop docker-compose
make logs       # View container logs
make ask        # Sample curl to /ask
make capture    # Sample curl to /eval-capture

Troubleshooting

Common Issues

"No credentials found"

- Verify AWS credentials: aws sts get-caller-identity
- Check environment variables: echo $AWS_PROFILE
- Ensure ~/.aws/credentials exists

"Access denied to model"

- Request model access in the AWS Bedrock console
- Verify IAM permissions include bedrock:InvokeModel
- Check if the model is available in your region

"Model not found"

- Verify model ID format: anthropic.claude-3-5-sonnet-20240620-v1:0
- List available models: aws bedrock list-foundation-models --region eu-central-1

Docker: "Permission denied"

- On Linux, ensure your user is in the docker group: sudo usermod -aG docker $USER (log out and back in for the change to take effect)
- Or run with sudo: sudo make up

Debug Logging

Enable debug logs in application.yml:

logging:
  level:
    com.example.deepevaldemo: DEBUG
    org.springframework.ai: DEBUG

Performance Considerations

Context Retrieval

The current implementation uses simple token overlap for retrieval. For production:

- Use vector embeddings (OpenAI, Cohere, SentenceTransformers)
- Integrate vector databases (Pinecone, Weaviate, Milvus)
- Implement semantic search with cosine similarity
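
For readers unfamiliar with token-overlap retrieval, the idea can be sketched in a few lines of Python. This is an illustrative sketch only: ContextRetriever.java is not reproduced here, so the tokenization rule, the `top_k` parameter, and the function names are all assumptions rather than the repository's actual behavior.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks that share the most tokens with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(chunk)), chunk) for chunk in chunks]
    # Drop chunks with no overlap; the stable sort keeps document order on ties.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

The weakness is visible in the scoring: common words such as "the" count as much as "CEO", and synonyms never match, which is why embedding-based search is the suggested upgrade.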

Cost Management

AWS Bedrock charges per token:

- Claude 3.5 Sonnet: ~$0.003 per 1K input tokens, ~$0.015 per 1K output tokens
- Monitor usage in AWS Cost Explorer
- Set up billing alarms
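
The per-token prices above translate into a simple back-of-the-envelope estimate. The sketch below hard-codes the approximate figures quoted in this section; actual pricing may differ by region and over time, so treat the constants as assumptions to be checked against the AWS pricing page.

```python
# Approximate prices quoted above, in USD per 1K tokens (assumed, may change).
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one or more calls from their token totals."""
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost
```

For example, a request with 1,500 input tokens and 200 output tokens comes to roughly $0.0075, so 10,000 such requests would cost about $75.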

Scaling

For production deployments:

- Use Spring Boot Actuator for monitoring
- Configure connection pooling for Bedrock API
- Implement caching for frequent queries
- Consider async processing for evaluation capture

Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues:

1. Check the troubleshooting section above
2. Review AWS Bedrock logs in CloudWatch
3. Open an issue on GitHub

Built with Spring Boot 3.3.x, Spring AI, and AWS Bedrock
