fix(runtime): harden Flask validation Docker readiness and CI#35
fix(runtime): harden Flask validation Docker readiness and CI#35donny-devops wants to merge 8 commits into
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Analysis CompleteGenerated ECC bundle from 8 commits | Confidence: 50% View Pull Request #36Repository Profile
Changed Files (8)
Top hotspots
Top directories
Analysis Depth Readiness (evidence-backed, 29%)ECC Tools uses this to decide whether recommendations should stay at commit-history/setup guidance or expand into CI, security, harness, reference-set, AI-routing, and team backlog work.
Reference Set Readiness (0/7, 0%)
Likely Future Issues (2)
Suggested Follow-up Work (2)
Copy-ready bodies ci: add failure-mode evidence for .github/workflows/ci.yml ## Summary
- Add CI failure-mode evidence for the recently changed workflow or test-runner surface.
## Why
- Backfill CI failure-mode evidence before another workflow or test-runner change lands on the touched surface.
## Touched paths
- `.github/workflows/ci.yml`
## Validation
- Add or update a CI failure fixture, captured failing log, troubleshooting note, workflow dry-run evidence, or regression test for the changed CI/test-runner behavior.
- Run the affected workflow or test-runner entrypoint locally or in CI and record pass/fail evidence.chore: refresh lockfile and validate CI after dependency updates ## Summary
- Refresh the lockfile and rerun CI after the dependency or workflow changes in this PR.
## Why
- Package or workflow changes without a lockfile refresh tend to turn into noisy follow-up fixes after merge.
## Touched paths
- `.github/workflows/ci.yml`
## Validation
- Refresh the lockfile in the same package manager used by the repo.
- Run the repo typecheck / test / CI entrypoints that depend on the updated package graph.Detected Workflows (2)
Generated Instincts (22)
After merging, import with: Files
|
There was a problem hiding this comment.
Code Review
This pull request enhances the application's operational readiness by introducing a database-aware readiness check, centralized JSON error handling, and a more robust container entrypoint that waits for database connectivity. The documentation and environment templates were also updated to reflect new configuration options for Gunicorn and database timeouts. Feedback focuses on improving the robustness of the HTTP error handler where exc.code might be None, optimizing the database wait loop in the entrypoint by moving engine creation outside the retry loop, and improving type safety by replacing noqa suppressions with explicit return type annotations.
| def handle_http_exception(exc: HTTPException): # noqa: ANN202 | ||
| """Return consistent JSON for Flask/Werkzeug HTTP errors.""" | ||
| return ( | ||
| jsonify( | ||
| { | ||
| "error": exc.name, | ||
| "message": exc.description, | ||
| "status_code": exc.code, | ||
| } | ||
| ), | ||
| exc.code, | ||
| ) |
There was a problem hiding this comment.
The exc.code attribute in HTTPException can be None for certain exception types. If it is None, Flask will fail to generate a response from the returned tuple. Providing a default status code (e.g., 500) ensures the error handler is robust. Additionally, providing the return type annotation allows for the removal of the noqa suppression.
| def handle_http_exception(exc: HTTPException): # noqa: ANN202 | |
| """Return consistent JSON for Flask/Werkzeug HTTP errors.""" | |
| return ( | |
| jsonify( | |
| { | |
| "error": exc.name, | |
| "message": exc.description, | |
| "status_code": exc.code, | |
| } | |
| ), | |
| exc.code, | |
| ) | |
| @app.errorhandler(HTTPException) | |
| def handle_http_exception(exc: HTTPException) -> tuple[Response, int]: | |
| """Return consistent JSON for Flask/Werkzeug HTTP errors.""" | |
| code = exc.code or int(HTTPStatus.INTERNAL_SERVER_ERROR) | |
| return ( | |
| jsonify( | |
| { | |
| "error": exc.name, | |
| "message": exc.description, | |
| "status_code": code, | |
| } | |
| ), | |
| code, | |
| ) |
| from flask import Flask, jsonify | ||
| from sqlalchemy import text |
There was a problem hiding this comment.
Import Response from Flask and alias SQLAlchemy's exc module. This allows for proper type hinting and more specific error handling in the routes and error handlers below.
| from flask import Flask, jsonify | |
| from sqlalchemy import text | |
| from flask import Flask, Response, jsonify | |
| from sqlalchemy import exc as sa_exc, text |
| return {"status": "healthy", "service": "docker-flask-postgres-api"} | ||
|
|
||
| @app.get("/ready") | ||
| def ready(): # noqa: ANN202 |
| """Readiness check that verifies database connectivity.""" | ||
| try: | ||
| db.session.execute(text("SELECT 1")) | ||
| except Exception as exc: # noqa: BLE001 |
There was a problem hiding this comment.
Catching a broad Exception is discouraged. It is better to catch sa_exc.SQLAlchemyError to specifically handle database connectivity issues, allowing other unexpected application errors to be caught by the global error handler.
| except Exception as exc: # noqa: BLE001 | |
| except sa_exc.SQLAlchemyError as exc: |
| ) | ||
|
|
||
| @app.errorhandler(Exception) | ||
| def handle_unexpected_exception(exc: Exception): # noqa: ANN202 |
There was a problem hiding this comment.
| : "${DATABASE_URL:?DATABASE_URL must be set}" | ||
| : "${GUNICORN_WORKERS:=2}" | ||
| : "${GUNICORN_THREADS:=4}" | ||
| : "${DB_WAIT_TIMEOUT:=60}" |
There was a problem hiding this comment.
For consistency with other Gunicorn-related environment variables, GUNICORN_TIMEOUT should be initialized with a default value in the script's preamble.
| : "${DATABASE_URL:?DATABASE_URL must be set}" | |
| : "${GUNICORN_WORKERS:=2}" | |
| : "${GUNICORN_THREADS:=4}" | |
| : "${DB_WAIT_TIMEOUT:=60}" | |
| : "${DATABASE_URL:?DATABASE_URL must be set}" | |
| : "${GUNICORN_WORKERS:=2}" | |
| : "${GUNICORN_THREADS:=4}" | |
| : "${GUNICORN_TIMEOUT:=60}" | |
| : "${DB_WAIT_TIMEOUT:=60}" |
| while time.monotonic() < deadline: | ||
| try: | ||
| engine = create_engine(url, pool_pre_ping=True) | ||
| with engine.connect() as conn: | ||
| conn.execute(text("SELECT 1")) | ||
| print("[entrypoint] Database is reachable.") | ||
| sys.exit(0) | ||
| except SQLAlchemyError as exc: | ||
| last_error = str(exc) | ||
| time.sleep(2) |
There was a problem hiding this comment.
Creating a new SQLAlchemy engine inside the loop is inefficient as it sets up a new connection pool on every iteration. The engine should be created once outside the loop. Additionally, pool_pre_ping is unnecessary here because a fresh connection is requested immediately via engine.connect().
| while time.monotonic() < deadline: | |
| try: | |
| engine = create_engine(url, pool_pre_ping=True) | |
| with engine.connect() as conn: | |
| conn.execute(text("SELECT 1")) | |
| print("[entrypoint] Database is reachable.") | |
| sys.exit(0) | |
| except SQLAlchemyError as exc: | |
| last_error = str(exc) | |
| time.sleep(2) | |
| engine = create_engine(url) | |
| while time.monotonic() < deadline: | |
| try: | |
| with engine.connect() as conn: | |
| conn.execute(text("SELECT 1")) | |
| print("[entrypoint] Database is reachable.") | |
| sys.exit(0) | |
| except SQLAlchemyError as exc: | |
| last_error = str(exc) | |
| time.sleep(2) |
| --timeout 60 \ | ||
| --workers "${GUNICORN_WORKERS}" \ | ||
| --threads "${GUNICORN_THREADS}" \ | ||
| --timeout "${GUNICORN_TIMEOUT:-60}" \ |
There was a problem hiding this comment.
Summary
This PR successfully addresses the stated goals of hardening Flask validation, Docker readiness, and CI configuration. The changes include important reliability improvements: Marshmallow validator compatibility fixes, database connectivity verification, entrypoint wait logic, and security-focused port bindings.
Key Improvements
- ✅ Fixed Marshmallow validator compatibility with newer versions by accepting metadata kwargs
- ✅ Added
/readyendpoint with database connectivity verification - ✅ Implemented database wait loop in entrypoint to prevent migration race conditions
- ✅ Improved CI least-privilege model by removing auto-format push permissions
- ✅ Enhanced security by binding Postgres and pgAdmin to localhost
- ✅ Added comprehensive test coverage for new functionality
Critical Feedback
The main concern is the database session handling in the /ready endpoint (see comment on app/__init__.py). Using db.engine.connect() instead of db.session.execute() prevents potential connection pool issues under load and follows best practices for health check endpoints that don't need ORM session features.
The validation tests cover all new features thoroughly, and the CI pipeline properly enforces code quality without over-privileged auto-formatting.
Recommendation
Address the session cleanup issue in the readiness endpoint, then this PR will be ready to merge. The other changes are well-implemented and align with production best practices.
You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.
| condition: service_healthy | ||
| healthcheck: | ||
| test: ["CMD", "curl", "-f", "http://localhost:5000/health"] | ||
| test: ["CMD", "curl", "-f", "http://localhost:5000/ready"] |
There was a problem hiding this comment.
🛑 Logic Error: The healthcheck creates a circular dependency. The API container depends on db being healthy (line 40-41), but this healthcheck tests /ready which verifies database connectivity. If the database becomes temporarily unavailable after startup, this healthcheck will fail and mark the API as unhealthy, even though the API service itself is running fine.
Change the healthcheck back to /health for liveness checks, or accept that the API healthcheck accurately reflects database dependency status. If keeping /ready, document that the API healthcheck intentionally fails when the database is unavailable.
| def ready(): # noqa: ANN202 | ||
| """Readiness check that verifies database connectivity.""" | ||
| try: | ||
| db.session.execute(text("SELECT 1")) |
There was a problem hiding this comment.
🛑 Crash Risk: The database session is not explicitly closed or rolled back after the readiness check query. While Flask-SQLAlchemy typically handles session cleanup, executing raw SQL with db.session.execute() in a health check endpoint without explicit cleanup can lead to connection pool exhaustion under high request load or when the database is slow to respond.
Add db.session.remove() or use a separate engine connection to ensure the session is properly cleaned up after each readiness check.
| db.session.execute(text("SELECT 1")) | |
| try: | |
| with db.engine.connect() as conn: | |
| conn.execute(text("SELECT 1")) | |
| except Exception as exc: # noqa: BLE001 |
| - "5432:5432" | ||
| - "127.0.0.1:5432:5432" | ||
| healthcheck: | ||
| test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-appuser} -d ${POSTGRES_DB:-appdb}"] |
There was a problem hiding this comment.
🛑 Logic Error: The pg_isready healthcheck includes specific user and database parameters but will fail if the environment variables are not properly interpolated. The ${POSTGRES_USER:-appuser} syntax works in the docker-compose context but the actual command string passed to the container may not expand these variables correctly depending on how PostgreSQL processes the healthcheck command.
Test this healthcheck thoroughly to ensure it properly validates with the interpolated username and database, or simplify to pg_isready without parameters if the connection variables are already configured via environment.
Micro-Learning Topic: Race condition (Detected by phrase)Matched on "race condition"A race condition is a flaw that produces an unexpected result when the timing of actions impact other actions. Try a challenge in Secure Code Warrior |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Security | 5 high |
🟢 Metrics 8 complexity · 0 duplication
Metric Results Complexity 8 Duplication 0
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Summary
/readyendpoint with database connectivity checkadminprofileRoot causes / risks fixed
value./healthchecked only app liveness, while Docker readiness needed database verification.contents: writeand auto-pushed formatting commits, which is noisy and over-privileged.Validation expected
ruff check .ruff format --check .pytest --cov=app --cov-report=xml --cov-fail-under=85 -v