fix: prevent SQL injection in query_metrics and natural_language_query endpoints#89
Open
atiliohector33 wants to merge 1 commit into
Open
fix: prevent SQL injection in query_metrics and natural_language_query endpoints#89atiliohector33 wants to merge 1 commit into
atiliohector33 wants to merge 1 commit into
Conversation
1a49f2e to
1ebd79b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🔒 What's the problem?
Two API endpoints were building SQL queries by pasting raw user input directly into SQL strings — no validation, no sanitization.
This is a classic SQL Injection (CWE-89) vulnerability. An attacker who knows how the API works (or just guesses) can manipulate the query that gets sent to ClickHouse and read data they shouldn't see, or worse.
🔍 Vulnerable endpoints
1.
GET /api/analyze/{hash}/query_metrics/The
{hash}in the URL comes straight from the browser and was dropped into the SQL with no checks:That
conditionsstring then gets pasted into four different SQL templates via%(conditions)s.Normal request:
Produces safe SQL:
Attack request:
Produces broken SQL:
2.
POST /api/analyze/natural_language_query/The AI feature lets users pick which tables to query. Table names came from the request body and were interpolated directly into SQL:
Normal request body:
{ "tables_to_query": ["default>>>>>events"], "query": "show me the top 10 slowest queries" }Produces safe SQL:
Attack request body:
{ "tables_to_query": ["default>>>>>' OR 1=1 UNION SELECT name, '', query FROM system.users --"], "query": "anything" }Produces malicious SQL:
🤔 Why not just use parameterized queries?
Good question. The
run_query()function already supports parameters — but parameterization only works for values (strings, numbers, dates). The%(conditions)sslot injects an entire SQL clause, not a single value:The
conditionspattern is used across multiple SQL templates to let different endpoints share the same query structure. Refactoring all templates would be a much larger change. Input validation is the correct minimal fix here.✅ The fix
Fix #1 —
query_metrics: validate that the hash is a plain integernormalized_query_hashin ClickHouse is aUInt64— always a number like14763824958273648. Numbers can't carry SQL syntax. If we confirm it's a valid integer before touching the SQL, we're safe.The attack
' OR 1=1 --fails.isdigit()and gets a400immediately — never reaches the SQL builder.Fix #2 —
natural_language_query: validate identifiers with a regexDatabase and table names in ClickHouse can only contain
[a-zA-Z0-9_]. They can't contain quotes, semicolons, spaces, or any other SQL-special character. We enforce that rule before building the fragment:The attack payload
"default>>>>>' OR 1=1 --"givestable = "' OR 1=1 --"— which fails the regex and returns400 Bad Request.📋 Changes summary
query_metrics(line 79)pkfrom URL pasted into SQL via.format()pk.isdigit()guard + integer castnatural_language_query(lines 273–274)database/tablefrom request body pasted into SQL via f-string^[a-zA-Z0-9_]+$validates both before useFiles changed:
housewatch/api/analyze.py