feat: Implement date_part scalar function #27005
Remove a lot of code that wasn't needed for date_part including iterator creation. We can just map values similar to simple math functions.
const (
	DatePartString = "date_part"
	DatePartTimeString = "date_part_time"
What is date_part_time? I don't see any tests for it.
It's used to create a reference to time since time is an auxiliary field https://github.com/influxdata/influxdb/pull/27005/files#diff-609a7e16be956ed6386e1a4a4efadf600b7d4de7dcfea27330dc692d1e901dc8R930-R944 I'm going to create some ValueMapper tests for this.
@gwossum I can add tests for this but it would likely require exporting
Line 881 in 362217b
davidby-influx
left a comment
Some changes from the first pass. Will review again after changes.
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
@copilot resolve the merge conflicts in this pull request
Done. I merged
a51d744 to d39380f
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated no new comments.
Files not reviewed (1)
- query/internal/internal.pb.go: Generated file
Comments suppressed due to low confidence (1)
query/iterator.gen.go.tmpl:568
- In `ScanAt`, when an aux value falls into the `default` branch and `defaultValue == SkipDefault` (e.g. `fill(none)`), the map entry for that key is left untouched. With the new date_part GROUP BY behavior, non-active dimension aux slots are explicitly set to `nil`, so this path will leave stale values from a previous row in `m[k.Val]` and can incorrectly populate non-active date_part dimension columns. Clearing the key when no fill default is configured avoids that leakage.
default:
// Insert the fill value if one was specified.
if s.defaultValue != SkipDefault {
m[k.Val] = castToType(s.defaultValue, k.Type)
}
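A minimal sketch of the leak described in that comment, and of clearing the key when no fill default is configured. `scanRow`, its parameters, and the map layout are hypothetical stand-ins for illustration; this is not the generated `ScanAt` code.

```go
package main

import "fmt"

// scanRow copies one row's aux values into m, which is reused across rows.
// A nil entry in aux models a non-active dimension slot. hasFill models
// "defaultValue != SkipDefault"; clearStale toggles the suggested fix.
// All names here are hypothetical.
func scanRow(m map[string]interface{}, keys []string, aux []interface{}, fill interface{}, hasFill, clearStale bool) {
	for i, k := range keys {
		switch v := aux[i].(type) {
		case int64:
			m[k] = v
		default:
			if hasFill {
				// Insert the fill value if one was specified.
				m[k] = fill
			} else if clearStale {
				// No fill configured: drop whatever the previous row left behind.
				delete(m, k)
			}
		}
	}
}

func main() {
	keys := []string{"hour"}
	m := map[string]interface{}{}

	scanRow(m, keys, []interface{}{int64(3)}, nil, false, false)
	scanRow(m, keys, []interface{}{nil}, nil, false, false)
	fmt.Println("without clearing:", m["hour"]) // stale value from the first row

	scanRow(m, keys, []interface{}{nil}, nil, false, true)
	_, present := m["hour"]
	fmt.Println("with clearing, key present:", present)
}
```

Whether deleting the key or writing an explicit `nil` is the better fix depends on how downstream code distinguishes "missing" from "null"; the sketch only shows the deletion variant.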
if err := ValidateDatePart(expr.Args); err != nil {
	return err
}
// GROUP BY date_part over a subquery source is not supported: the
It is much easier to just not support GROUP BY for date_part in sub-queries. I could adjust the code to support it but it will add additional complexity that I'm unsure we want.
Code Review —
| # | Location | Divergence |
|---|---|---|
| 6 | query/date_part.go:152 | millisecond/microsecond return only the sub-second part; Postgres returns seconds*1000 + frac (off by up to 59,000 ms). |
| 7 | query/date_part.go:162 | epoch uses t.Unix(), truncating fractional seconds that SQL epoch keeps. |
| 8 | query/date_part.go:163 | isodow returns 0–6 (Sun=6) instead of SQL 1–7 (Sun=7), off by one. date_part_test.go asserts the 0–6 values. |
Worth a deliberate decision: if the intent is SQL compatibility, fix the code and the
tests; if InfluxDB-specific semantics are intended, document it.
🟡 Clustered/distributed only — not reachable in single-node OSS (PLAUSIBLE)
9. `query/point.go:266`: `encodeAux`/`decodeAux` can't serialize the
`DecodedDatePartKey` struct; over the data-node wire codec the grouped value comes back
null and all buckets collapse into one group.
10. `query/iterator.gen.go.tmpl:1486`: the generic `FilterIterator.Next` uses
`EvalBool` with a non-`CallValuer` map, so a `date_part(...)` WHERE predicate reaching
it filters out every point (zero rows). No in-repo callers of `NewFilterIterator`;
reachability uncertain.
11. `tsdb/engine/tsm1/iterator.gen.go.tmpl:289`: `itr.m` is allocated only when
`Condition != nil`, but written whenever `NeedTimeRef`. Safe locally (encoder keeps the
invariant), but a wire-decoded options struct with `NeedTimeRef=true, Condition=nil`
panics on a nil-map write → crashes the query/node.
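Finding 11 comes down to Go's rule that writing to a nil map panics. A minimal sketch of the failure and of one possible guard, allocating the map when either consumer needs it, follows. `iterOptions`, `iter`, and the `"time"` key are hypothetical simplifications, not the tsm1 types.

```go
package main

import "fmt"

// iterOptions models only the two fields named in the finding (hypothetical).
type iterOptions struct {
	HasCondition bool // stands in for Condition != nil
	NeedTimeRef  bool
}

type iter struct {
	m map[string]interface{}
}

// newIter allocates m when either the condition or the time reference needs
// it, so the write in setTimeRef can never hit a nil map.
func newIter(opt iterOptions) *iter {
	itr := &iter{}
	if opt.HasCondition || opt.NeedTimeRef {
		itr.m = make(map[string]interface{})
	}
	return itr
}

// setTimeRef writes the timestamp into the map, as the NeedTimeRef path does.
func (itr *iter) setTimeRef(key string, ts int64) {
	itr.m[key] = ts
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	var unallocated map[string]interface{}
	fmt.Println("nil-map write panics:", panics(func() { unallocated["time"] = int64(1) }))

	itr := newIter(iterOptions{NeedTimeRef: true})
	fmt.Println("guarded write panics:", panics(func() { itr.setTimeRef("time", 1) }))
}
```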
⚡ Efficiency (CONFIRMED)
12. ⭐ DatePartValuer wired into every tsm1 query, date_part or not
tsdb/engine/tsm1/iterator.gen.go.tmpl:235 — unconditionally adds an extra valuer
indirection (an interface call per VarRef/Call lookup, per scanned point) to all
WHERE-filtered queries — the common path. The scanner cursor already gates the
identical wiring behind needDatePart; opt.NeedTimeRef could gate it here too.
This is a per-point CPU regression across the whole engine, not just date_part queries
— arguably higher priority than its category suggests.
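The gating suggested in finding 12 could look roughly like the sketch below: the extra valuer is appended only when the query needs a time reference, so ordinary WHERE-filtered queries keep a single lookup. The `valuer` interface, `timeValuer`, and `buildValuers` are hypothetical simplifications and do not mirror the influxql types.

```go
package main

import "fmt"

// valuer is a simplified, hypothetical lookup interface.
type valuer interface {
	Value(key string) (interface{}, bool)
}

// mapValuer resolves field references from a map.
type mapValuer map[string]interface{}

func (m mapValuer) Value(key string) (interface{}, bool) {
	v, ok := m[key]
	return v, ok
}

// timeValuer stands in for the date_part valuer (hypothetical).
type timeValuer struct{ ts int64 }

func (t timeValuer) Value(key string) (interface{}, bool) {
	if key == "time" {
		return t.ts, true
	}
	return nil, false
}

// buildValuers adds the extra indirection only when the query needs it.
func buildValuers(fields mapValuer, needTimeRef bool, ts int64) []valuer {
	vs := []valuer{fields}
	if needTimeRef {
		vs = append(vs, timeValuer{ts: ts})
	}
	return vs
}

// lookup returns the first value any valuer can resolve.
func lookup(vs []valuer, key string) (interface{}, bool) {
	for _, v := range vs {
		if val, ok := v.Value(key); ok {
			return val, true
		}
	}
	return nil, false
}

func main() {
	fields := mapValuer{"usage": 0.5}
	fmt.Println(len(buildValuers(fields, false, 0))) // 1
	fmt.Println(len(buildValuers(fields, true, 42))) // 2
}
```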
13. ResolveKeys allocates per-point garbage discarded on bucket hit
query/date_part.go:381 — allocates a fresh entries slice + a 9-byte EncodedKey
string per point, but the reduce loop consumes EncodedKey only on bucket creation (map
miss) → K−1 of every K allocations are GC garbage. Return only DimKey, compute
EncodedKey lazily, reuse a scratch buffer.
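One way to read the suggestion in finding 13, as a sketch: key the bucket map by a small comparable value and build the string form only on a map miss, reusing a scratch buffer. `dimKey`, `reducer`, and the comma-separated encoding are hypothetical and do not reproduce the PR's 9-byte `EncodedKey` layout.

```go
package main

import (
	"fmt"
	"strconv"
)

// dimKey is a hypothetical fixed-size, comparable grouping key (one slot per
// grouped date part). Arrays can be map keys without a per-point allocation.
type dimKey [2]int64

type bucket struct {
	encoded string // built once, when the bucket is created
	count   int
}

type reducer struct {
	buckets map[dimKey]*bucket
	scratch []byte // reused across encodes
	encodes int    // how many times the string key was built
}

func newReducer() *reducer {
	return &reducer{buckets: make(map[dimKey]*bucket)}
}

// encode builds the string form of k in the reusable scratch buffer.
func (r *reducer) encode(k dimKey) string {
	r.encodes++
	r.scratch = r.scratch[:0]
	for i, v := range k {
		if i > 0 {
			r.scratch = append(r.scratch, ',')
		}
		r.scratch = strconv.AppendInt(r.scratch, v, 10)
	}
	return string(r.scratch)
}

// add counts one point; the encoded key is computed only on a map miss.
func (r *reducer) add(k dimKey) {
	b, ok := r.buckets[k]
	if !ok {
		b = &bucket{encoded: r.encode(k)}
		r.buckets[k] = b
	}
	b.count++
}

func main() {
	r := newReducer()
	for i := 0; i < 1000; i++ {
		r.add(dimKey{2023, int64(i % 4)})
	}
	fmt.Println(len(r.buckets), r.encodes) // 4 4
}
```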
14. Per-field redundant work in Scan
query/cursor.go:277 — the DatePartDimensionsString lookup, type assertion,
dpd.Expr.String(), and GroupingKeys insert are recomputed per field though invariant
across the field loop; hoist above it.
📋 Convention (CONFIRMED)
15. New date_part tests use raw t.Fatal/t.Error instead of testify
tests/server_test.go:8085 (also 8790, 8863, 9242, 9297) — violates the project's
testify rule. The same functions already use require.NoError(t, err, "init error") for
setup, so it's internally inconsistent too.
Refuted (19, not reported)
Mostly DRY/maintainability "keep-in-sync" observations (parallel DatePartExpr switches,
duplicated AST walkers, LocationOrUTC not reused) where verifiers found no current
observable defect, plus the millisecond/isodow duplicates and a
DatePartValuer{}-in-compileFields "dead code" claim that was refuted (it can fire
with a literal second arg).
Suggested fix order
- 1 (data loss) and 12 (engine-wide perf regression) — highest impact.
- Correctness cluster 2–5.
- Decision on SQL semantics 6–8 (fix code+tests, or document).
- Mechanical cleanups 13–15.
AI review not verified by a human. Take with a grain of salt.
- cleanup tests to use testify only
Implements `date_part(part, expression)`, which extracts a component from a timestamp.

Signature:

- `date_part('<part>', time)`
- `time` VarRef, nothing else
- `int64`, evaluated in the query timezone (`tz(...)`), default UTC

Parts:

| Part | Range |
|---|---|
| `year` | |
| `quarter` | [1, 4] |
| `month` | [1, 12] |
| `week` | [1, 53] |
| `day` | [1, 31] |
| `hour` / `minute` / `second` | [0,23] / [0,59] / [0,59] |
| `millisecond` / `microsecond` / `nanosecond` | |
| `dow` | |
| `isodow` | |
| `doy` | [1, 366] |
| `epoch` | |

`week` is the ISO week and `year` is the calendar year, so the two can disagree at year boundaries. For example 2023-01-01 returns week 52.

Examples:

SELECT rules

- `date_part` aggregate or selector. `date_part`-only selects are rejected.
- `date_part` fields and aliases are allowed, and may nest in expressions such as `date_part('hour', time) + 1`.

GROUP BY date_part rules

- `time()`. Other calls in GROUP BY are rejected.
- `date_part('part', time)` must match a grouped part. A non-grouped part is rejected because it is undefined for the bucket. A non-active grouped part yields null in that series.
- `year`. A field or alias colliding with it is rejected.
- `fill(null)` and `fill(none)` are supported. `fill(previous)`, `fill(linear)`, and `fill(<value>)` are not.

See #27001 for 1.x limitations.