A complete Spring Boot demo application showing how to integrate AWS Bedrock (Claude 3.5 Sonnet) with Spring AI for chat/RAG functionality, and capture evaluation data for analysis with Deepeval (Python).
Perfect for Java developers exploring LLM integration and evaluation workflows.
- AWS Bedrock Integration: Claude 3.5 Sonnet via Spring AI (swappable to Titan/Llama)
- Simple RAG Pipeline: In-memory context retrieval from a handbook
- REST API: Clean endpoints for chat and evaluation capture
- Evaluation Support: NDJSON dataset generation for Deepeval metrics
- Docker Ready: Containerized deployment with docker-compose
- Production Patterns: Configuration management, logging, validation
- Java 21 (JDK 21+)
- Maven 3.9+
- AWS Account with Bedrock access enabled
- AWS Credentials configured (see below)
- Docker (optional, for containerized deployment)
- curl + jq (for testing)
Enable Bedrock access in your AWS account:
- Navigate to the AWS Bedrock console
- Request access to the Anthropic Claude 3.5 Sonnet model
- Wait for approval (usually instant in most regions)

Configure IAM permissions. Your IAM user/role needs these permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```
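The policy above uses wildcards for region and model. If you prefer least privilege, you can scope `Resource` to the single model this demo calls. The helper below is a hypothetical Python sketch (not part of this repo) that builds such a policy, assuming the standard foundation-model ARN format `arn:aws:bedrock:<region>::foundation-model/<model-id>`.

```python
import json


def bedrock_invoke_policy(region: str, model_id: str) -> dict:
    """Build an IAM policy that allows invoking exactly one Bedrock foundation model."""
    # Foundation-model ARNs leave the account field empty (note the "::")
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": model_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Print a policy for the default region and model used in this README
    policy = bedrock_invoke_policy("eu-central-1", "anthropic.claude-3-5-sonnet-20240620-v1:0")
    print(json.dumps(policy, indent=2))
```

Remember to regenerate the policy if you switch `BEDROCK_MODEL_ID` or `AWS_REGION`, since a single-model ARN will not cover other models.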
Set up AWS credentials (choose one method):

Option A: Environment variables

```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=eu-central-1
```

Option B: AWS CLI profile

```bash
aws configure
# Or use a named profile:
export AWS_PROFILE=my-profile
```

Option C: IAM role (for EC2/ECS/Lambda)
- No explicit credentials needed; the AWS SDK picks up the attached instance, task, or execution role
```bash
git clone <your-repo-url>
cd java-deepeval-demo
# Build with Maven
mvn clean install
```

Create or edit environment variables:

```bash
# Required
export AWS_REGION=eu-central-1
export AWS_PROFILE=default  # or use AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY
# Optional: override the default model
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20240620-v1:0
```

Start the application:

```bash
# Using Maven
mvn spring-boot:run
# Or using the Makefile
make run
```

The application starts on http://localhost:8080.
Ask a question:

```bash
curl -s http://localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"Who is the CEO?"}' | jq
# Or use the Makefile
make ask
```

Expected response:

```json
{
  "answer": "Alice Smith is the CEO of TechCorp Solutions. She joined the company in 2018 after leading product development at a Fortune 500 tech company.",
  "contexts": [
    "Company Overview\nOur company, TechCorp Solutions, was founded in 2015...",
    "..."
  ]
}
```

Capture evaluation data:
```bash
curl -s http://localhost:8080/eval-capture \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Who is the CEO?",
    "answer": "Alice Smith is the CEO of TechCorp Solutions.",
    "contexts": ["Company Overview\nThe CEO is Alice Smith..."],
    "expected": "Alice Smith"
  }' | jq
# Or use the Makefile
make capture
```

Expected response:

```json
{
  "ok": true,
  "count": 1
}
```

Check `evaluation/eval_dataset.json` to see the captured data.
```bash
# Build image
make docker
# Start container (with AWS credentials mounted)
make up
# Check logs
make logs
# Test endpoints
make ask
make capture
# Stop container
make down
```

The docker-compose setup:
- Mounts `~/.aws` for credential access
- Persists the `evaluation/` directory for dataset storage
- Exposes port 8080
- Includes health checks
Edit src/main/resources/application.yml or use environment variables:
| Property | Environment Variable | Default | Description |
|---|---|---|---|
| `spring.ai.bedrock.aws.region` | `AWS_REGION` | `eu-central-1` | AWS region for Bedrock |
| `spring.ai.bedrock.anthropic.claude.chat.model` | `BEDROCK_MODEL_ID` | `anthropic.claude-3-5-sonnet-20240620-v1:0` | Bedrock model ID |
| `evaluation.output-dir` | `EVAL_OUTPUT_DIR` | `evaluation` | Directory for eval data |
| `evaluation.dataset-file` | `EVAL_DATASET_FILE` | `eval_dataset.json` | Eval dataset filename |
To use different Bedrock models:

Amazon Titan:

```bash
export BEDROCK_MODEL_ID=amazon.titan-text-express-v1
```

Meta Llama:

```bash
export BEDROCK_MODEL_ID=meta.llama3-70b-instruct-v1:0
```

Note: Model availability varies by region. Check the AWS Bedrock console.
```
bedrock-deepeval-demo/
├── pom.xml                  # Maven configuration with Spring AI BOM
├── Dockerfile               # Multi-stage Docker build
├── docker-compose.yml       # Container orchestration
├── Makefile                 # Helper commands
├── README.md                # This file
│
├── src/main/java/com/example/deepevaldemo/
│   ├── App.java                       # Spring Boot main class
│   ├── config/
│   │   └── BedrockConfig.java         # Spring AI ChatModel bean
│   ├── web/
│   │   └── AskController.java         # REST endpoints
│   ├── service/
│   │   ├── ChatService.java           # Chat orchestration + RAG
│   │   └── EvaluationService.java     # Eval data capture
│   └── rag/
│       └── ContextRetriever.java      # Simple in-memory retrieval
│
├── src/main/resources/
│   ├── application.yml      # Spring configuration
│   └── handbook.txt         # Demo knowledge base
│
└── evaluation/
    ├── .gitkeep
    ├── README.md            # Deepeval integration guide
    └── eval_dataset.json    # Generated NDJSON (runtime)
```
`POST /ask`: Submit a question and receive an AI-generated answer together with the retrieved contexts.

Request:

```json
{
  "question": "What are the pricing tiers?"
}
```

Response:

```json
{
  "answer": "We offer three pricing tiers: Starter at $99/month, Professional at $299/month, and Enterprise with custom pricing.",
  "contexts": [
    "Pricing and Plans\nWe offer three pricing tiers...",
    "..."
  ]
}
```

Validation:
- `question` is required and cannot be blank
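To call the endpoint from a script rather than curl, here is a minimal client sketch using only the Python standard library (the same language as the evaluation script). `build_ask_request` and `ask` are hypothetical helpers, not part of this repo, and the URL assumes the default local port 8080.

```python
import json
import urllib.request

# Default local address used throughout this README
ASK_URL = "http://localhost:8080/ask"


def build_ask_request(question: str, url: str = ASK_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    if not question or not question.strip():
        # Mirror the server-side rule: question is required and cannot be blank
        raise ValueError("question is required and cannot be blank")
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = ASK_URL, timeout: float = 60.0) -> dict:
    """Send the question and return the parsed JSON body (answer + contexts)."""
    with urllib.request.urlopen(build_ask_request(question, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the application running, `ask("What are the pricing tiers?")["answer"]` returns the generated answer. Keeping request construction separate from sending makes the payload easy to check without a live server.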
`POST /eval-capture`: Capture evaluation data for Deepeval analysis.

Request:

```json
{
  "question": "Who is the CEO?",
  "answer": "Alice Smith is the CEO of TechCorp Solutions.",
  "contexts": ["Company Overview\nThe CEO is Alice Smith..."],
  "expected": "Alice Smith"
}
```

Response:

```json
{
  "ok": true,
  "count": 5
}
```

Validation:
- All fields are required
- `contexts` must be a non-empty array
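The same rules can be checked before posting a record, which avoids round-trips for obviously incomplete data. This is a hypothetical Python sketch that mirrors the rules listed above; treating blank strings as missing is an assumption about the server's validation, not something this README states.

```python
REQUIRED_FIELDS = ("question", "answer", "contexts", "expected")


def validate_eval_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record looks valid."""
    errors = []
    # Rule from the README: all fields are required
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            errors.append(f"{field} is required")
    # Assumption: the text fields must also be non-blank strings
    for field in ("question", "answer", "expected"):
        value = record.get(field)
        if value is not None and (not isinstance(value, str) or not value.strip()):
            errors.append(f"{field} must be a non-blank string")
    # Rule from the README: contexts must be a non-empty array
    contexts = record.get("contexts")
    if contexts is not None and (not isinstance(contexts, list) or len(contexts) == 0):
        errors.append("contexts must be a non-empty array")
    return errors
```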
The /eval-capture endpoint stores evaluation records as newline-delimited JSON (NDJSON) in evaluation/eval_dataset.json.
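NDJSON means each record is serialized as one JSON object on its own line, so the file can be appended to without rewriting it and read back line by line. The sketch below illustrates the format in Python; `append_record` and `load_records` are hypothetical helpers, not the Java `EvaluationService` implementation.

```python
import json
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single JSON line (NDJSON)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps escapes embedded newlines (e.g. "Company Overview\nThe CEO..."),
    # so each record always occupies exactly one line in the file
    line = json.dumps(record, ensure_ascii=False)
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def load_records(path: Path) -> list[dict]:
    """Read every non-empty line back as a JSON object."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Note that the file keeps the `.json` extension even though its content is NDJSON, so tools that expect a single JSON document (for example `jq .` without `-s`, or `json.load`) need to read it line by line instead.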
- Capture data: Use your Java app to generate questions/answers and capture them
- Review dataset: Check `evaluation/eval_dataset.json` for completeness
- Run Deepeval (Python): Analyze with metrics like faithfulness, relevancy, and precision
- Iterate: Improve prompts, context retrieval, or the model based on the results
The full Python evaluation script will be provided in the accompanying Medium article. Here's a preview:
```python
# pip install deepeval
import json

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Load the captured NDJSON dataset (one JSON object per line)
test_cases = []
with open("evaluation/eval_dataset.json", "r", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines instead of failing in json.loads
        record = json.loads(line)
        test_cases.append(LLMTestCase(
            input=record["question"],
            actual_output=record["answer"],
            expected_output=record["expected"],
            retrieval_context=record["contexts"],
        ))

# Define metrics. Both use an LLM as judge; by default deepeval calls OpenAI,
# so set OPENAI_API_KEY or configure another evaluation model.
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    FaithfulnessMetric(threshold=0.8),
]

# Run evaluation
results = evaluate(test_cases=test_cases, metrics=metrics)
print(results)
```

See `evaluation/README.md` for detailed documentation.
```bash
make help     # Show all available commands
make run      # Run locally with Maven
make clean    # Clean build artifacts
make test     # Run tests
make docker   # Build Docker image
make up       # Start with docker-compose
make down     # Stop docker-compose
make logs     # View container logs
make ask      # Sample curl to /ask
make capture  # Sample curl to /eval-capture
```

"No credentials found"
- Verify AWS credentials: `aws sts get-caller-identity`
- Check environment variables: `echo $AWS_PROFILE`
- Ensure `~/.aws/credentials` exists

"Access denied to model"
- Request model access in the AWS Bedrock console
- Verify IAM permissions include `bedrock:InvokeModel`
- Check whether the model is available in your region

"Model not found"
- Verify the model ID format: `anthropic.claude-3-5-sonnet-20240620-v1:0`
- List available models: `aws bedrock list-foundation-models --region eu-central-1`

Docker: "Permission denied"
- On Linux, ensure your user is in the `docker` group: `sudo usermod -aG docker $USER` (log out and back in for the change to take effect)
- Or run with sudo: `sudo make up`
Enable debug logs in `application.yml`:

```yaml
logging:
  level:
    com.example.deepevaldemo: DEBUG
    org.springframework.ai: DEBUG
```

The current implementation uses simple token overlap for retrieval. For production:
- Use vector embeddings (OpenAI, Cohere, SentenceTransformers)
- Integrate vector databases (Pinecone, Weaviate, Milvus)
- Implement semantic search with cosine similarity
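This README describes the built-in retriever only as "simple token overlap". The sketch below is a Python illustration of that idea, not a port of `ContextRetriever.java`; the blank-line chunking rule, the distinct-token scoring, and the top-k cut-off are all assumptions. It also shows the limitation that motivates embeddings: a question phrased with synonyms ("chief executive" instead of "CEO") shares no tokens with the right section.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a real retriever might also drop stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sections(handbook: str) -> list[str]:
    """Split the handbook into chunks on blank lines (hypothetical chunking rule)."""
    return [s.strip() for s in re.split(r"\n\s*\n", handbook) if s.strip()]


def retrieve(question: str, sections: list[str], k: int = 2) -> list[str]:
    """Rank sections by how many distinct question tokens they share; return the top k."""
    q = tokenize(question)
    scored = [(len(q & tokenize(s)), i, s) for i, s in enumerate(sections)]
    # Highest overlap first; ties keep the original handbook order
    scored.sort(key=lambda t: (-t[0], t[1]))
    # Drop sections with no shared tokens at all
    return [s for score, _, s in scored[:k] if score > 0]
```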
AWS Bedrock charges per token:
- Claude 3.5 Sonnet: ~$0.003 per 1K input tokens, ~$0.015 per 1K output tokens
- Monitor usage in AWS Cost Explorer
- Set up billing alarms
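A quick back-of-the-envelope estimate helps before running large evaluation batches. This Python sketch simply applies the approximate per-1K-token prices quoted above; the prices are constants you should update from the current AWS pricing page, and the request sizes in the usage example are hypothetical.

```python
# Approximate Claude 3.5 Sonnet on-demand prices quoted above (USD per 1K tokens).
# Verify against current AWS Bedrock pricing for your region before relying on them.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given total input and output token counts."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
```

For example, 1,000 requests averaging a hypothetical 2,000 input tokens (question plus retrieved contexts) and 300 output tokens gives `estimate_cost(1000 * 2000, 1000 * 300)`, about $10.50 ($6.00 input + $4.50 output). In a RAG setup the retrieved contexts usually dominate input tokens, so sending fewer or shorter contexts is often the cheapest optimization.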
For production deployments:
- Use Spring Boot Actuator for monitoring
- Configure connection pooling for Bedrock API
- Implement caching for frequent queries
- Consider async processing for evaluation capture
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
MIT License - see LICENSE file for details
- "Evaluating AWS Bedrock Responses with Deepeval" on Medium (link TBD)
- Spring AI Bedrock Integration Guide
For issues:
- Check the troubleshooting section above
- Review AWS Bedrock logs in CloudWatch
- Open an issue on GitHub
Built with Spring Boot 3.3.x, Spring AI, and AWS Bedrock