feat: Implement date_part scalar function #27005

Open
devanbenz wants to merge 96 commits into master-1.x from db/76/date_part

devanbenz commented Dec 3, 2025

Implements date_part(part, expression), which extracts a component from a timestamp.

Signature:

  • Exactly 2 args, in order: date_part('<part>', time)
    • arg 1: string literal naming the part (case-insensitive)
    • arg 2: the time VarRef, nothing else
  • Returns int64, evaluated in the query timezone (tz(...)), default UTC

Parts:

| part | value |
|---|---|
| year | calendar year |
| quarter | quarter of year, [1, 4] |
| month | month of year, [1, 12] |
| week | ISO-8601 week of year, [1, 53] |
| day | day of month, [1, 31] |
| hour / minute / second | [0, 23] / [0, 59] / [0, 59] |
| millisecond / microsecond / nanosecond | sub-second component only |
| dow | day of week, Sunday = 0 to Saturday = 6 |
| isodow | day of week, Monday = 0 to Sunday = 6 |
| doy | day of year, [1, 366] |
| epoch | seconds since Unix epoch (whole seconds) |
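
The parts listed above can be read as a mapping from a part name to a component of the timestamp. The following is a minimal Go sketch of that mapping for a subset of the parts. It is not the implementation in this PR: `datePart` and `sample` are hypothetical names, and the sketch assumes only the ranges stated in the description.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// sample is 2025-12-07 13:45:30.250 UTC, which falls on a Sunday.
var sample = time.Date(2025, time.December, 7, 13, 45, 30, 250_000_000, time.UTC)

// datePart is a hypothetical helper covering a subset of the documented parts.
// It is an illustration only, not the function added by this PR.
func datePart(part string, t time.Time) (int64, error) {
	switch strings.ToLower(part) { // part names are case-insensitive
	case "year":
		return int64(t.Year()), nil
	case "quarter":
		return int64((int(t.Month())-1)/3 + 1), nil // [1, 4]
	case "month":
		return int64(t.Month()), nil // [1, 12]
	case "day":
		return int64(t.Day()), nil // [1, 31]
	case "hour":
		return int64(t.Hour()), nil // [0, 23]
	case "millisecond":
		return int64(t.Nanosecond() / 1e6), nil // sub-second component only
	case "dow":
		return int64(t.Weekday()), nil // Sunday = 0 to Saturday = 6
	case "isodow":
		return int64((int(t.Weekday()) + 6) % 7), nil // Monday = 0 to Sunday = 6
	case "doy":
		return int64(t.YearDay()), nil // [1, 366]
	case "epoch":
		return t.Unix(), nil // whole seconds
	}
	return 0, fmt.Errorf("unsupported date part %q", part)
}

func main() {
	for _, p := range []string{"quarter", "DOW", "isodow", "millisecond", "doy"} {
		v, err := datePart(p, sample)
		fmt.Println(p, v, err)
	}
}
```

Passing `t.In(loc)` instead of `t` would be one way to model evaluation in a query timezone; that detail is left out of the sketch.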

week is the ISO week and year is the calendar year, so the two can disagree at
year boundaries. For example 2023-01-01 returns week 52.
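
The boundary case can be checked with Go's standard library, whose `time.Time.ISOWeek` follows ISO-8601. This is a small sketch with hypothetical names (`yearAndISOWeek`, `newYear2023`), independent of the PR's code.

```go
package main

import (
	"fmt"
	"time"
)

// newYear2023 is 2023-01-01 00:00:00 UTC, a Sunday.
var newYear2023 = time.Date(2023, time.January, 1, 0, 0, 0, 0, time.UTC)

// yearAndISOWeek returns the calendar year alongside the ISO-8601 year and week.
func yearAndISOWeek(t time.Time) (calendarYear, isoYear, isoWeek int) {
	isoYear, isoWeek = t.ISOWeek()
	return t.Year(), isoYear, isoWeek
}

func main() {
	y, iy, w := yearAndISOWeek(newYear2023)
	fmt.Println(y, iy, w) // 2023 2022 52
}
```

The calendar year is 2023, but the date belongs to week 52 of ISO year 2022, which is why a query selecting both year and week can show 2023 next to 52.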

Examples:

-- weekdays only
SELECT * FROM some_measurement
WHERE time >= now() - 10d AND time <= now()
  AND date_part('dow', time) != 0 AND date_part('dow', time) != 6

SELECT value, date_part('hour', time) FROM some_measurement

SELECT rules

  • Must be paired with an anchor, meaning a stored field or a non-date_part
    aggregate or selector. date_part-only selects are rejected.
  • Multiple date_part fields and aliases are allowed, and may nest in expressions
    such as date_part('hour', time) + 1.

GROUP BY date_part rules

  • Allowed alongside time(). Other calls in GROUP BY are rejected.
  • Duplicate parts are deduplicated.
  • A SELECTed date_part('part', time) must match a grouped part. A non-grouped
    part is rejected because it is undefined for the bucket. A non-active grouped
    part yields null in that series.
  • Output column is named after the canonical part such as year. A field or alias
    colliding with it is rejected.
  • Resolved from the bucket value, not the row timestamp.
  • fill(null) and fill(none) are supported. fill(previous), fill(linear),
    and fill(<value>) are not.
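
Two of the GROUP BY rules above (deduplication of parts and rejection of colliding column names) can be sketched in Go as follows. Everything here is hypothetical: `canonicalGroupParts` is not a function from this PR, the canonical form is assumed to be the lower-cased part name, and the collision check is assumed to be an exact string match.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalGroupParts is a hypothetical helper. It lower-cases each grouped
// part (assumed here to be the canonical form), drops duplicates while keeping
// first-seen order, and rejects any selected field or alias whose name equals
// an output column named after a grouped part.
func canonicalGroupParts(parts, fields []string) ([]string, error) {
	seen := make(map[string]bool, len(parts))
	var out []string
	for _, p := range parts {
		c := strings.ToLower(p)
		if !seen[c] {
			seen[c] = true
			out = append(out, c)
		}
	}
	for _, f := range fields {
		if seen[f] {
			return nil, fmt.Errorf("field or alias %q collides with a grouped date part", f)
		}
	}
	return out, nil
}

func main() {
	parts, err := canonicalGroupParts([]string{"Year", "year", "month"}, []string{"value"})
	fmt.Println(parts, err) // [year month] <nil>

	_, err = canonicalGroupParts([]string{"year"}, []string{"year"})
	fmt.Println(err != nil) // true
}
```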

See #27001 for 1.x limitations.

@devanbenz devanbenz self-assigned this Dec 4, 2025
@devanbenz devanbenz linked an issue Dec 8, 2025 that may be closed by this pull request
@devanbenz devanbenz marked this pull request as ready for review December 9, 2025 21:50
Comment thread query/cursor.go Outdated
Comment thread query/date_part.go Outdated
Comment thread query/date_part.go

const (
	DatePartString     = "date_part"
	DatePartTimeString = "date_part_time"

Member

What is date_part_time? I don't see any tests for it.

Author

It's used to create a reference to time, since time is an auxiliary field (see https://github.com/influxdata/influxdb/pull/27005/files#diff-609a7e16be956ed6386e1a4a4efadf600b7d4de7dcfea27330dc692d1e901dc8R930-R944). I'm going to create some ValueMapper tests for this.

Author

@gwossum I can add tests for this, but it would likely require exporting the valueMapper struct and testing it. We don't currently have any valueMapper-specific tests. It's basically just a struct filled with maps, so we would likely just be testing Go's map functionality, which may not be worth the effort?

@davidby-influx davidby-influx left a comment

Contributor

Some changes from the first pass. Will review again after changes.

Comment thread query/compile_test.go
Comment thread query/date_part.go
Comment thread query/date_part.go Outdated
Comment thread query/date_part.go Outdated
Comment thread query/date_part.go Outdated
Comment thread tsdb/engine/tsm1/iterator.gen.go Outdated
Comment thread tsdb/engine/tsm1/iterator.gen.go Outdated
Comment thread tsdb/engine/tsm1/iterator.gen.go Outdated
Comment thread tsdb/engine/tsm1/iterator.gen.go Outdated
Comment thread tsdb/engine/tsm1/iterator.gen.go.tmpl Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • query/internal/internal.pb.go: Generated file

Comment thread query/iterator.go
Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • query/internal/internal.pb.go: Generated file

Comment thread query/compile.go

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated no new comments.

Files not reviewed (1)
  • query/internal/internal.pb.go: Generated file

@devanbenz

Author

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Jun 24, 2026

Done — I merged origin/master-1.x into this branch and resolved the conflict. Addressed in commit b786751.

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • query/internal/internal.pb.go: Generated file

Comment thread query/compile.go

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated no new comments.

Files not reviewed (1)
  • query/internal/internal.pb.go: Generated file
Comments suppressed due to low confidence (1)

query/iterator.gen.go.tmpl:568

  • In ScanAt, when an aux value falls into the default branch and defaultValue == SkipDefault (e.g. fill(none)), the map entry for that key is left untouched. With the new date_part GROUP BY behavior, non-active dimension aux slots are explicitly set to nil, so this path will leave stale values from a previous row in m[k.Val] and can incorrectly populate non-active date_part dimension columns. Clearing the key when no fill default is configured avoids that leakage.
		default:
			// Insert the fill value if one was specified.
			if s.defaultValue != SkipDefault {
				m[k.Val] = castToType(s.defaultValue, k.Type)
			}
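
The leakage described in this suppressed comment can be reproduced with a map that is reused across rows. The Go sketch below is a heavily simplified, hypothetical model (`scanRow` is not the generated ScanAt code, and a nil `fillValue` stands in for SkipDefault); it only shows why an untouched key keeps the previous row's value and how deleting the key avoids that.

```go
package main

import "fmt"

// scanRow is a hypothetical, much simplified stand-in for the scan path
// described above. row[i] is the aux value for keys[i]; nil means the slot has
// no value for this row. A nil fillValue models "no fill default configured".
func scanRow(m map[string]interface{}, keys []string, row []interface{}, fillValue interface{}, clearStale bool) {
	for i, k := range keys {
		switch {
		case row[i] != nil:
			m[k] = row[i]
		case fillValue != nil:
			m[k] = fillValue // insert the fill value if one was specified
		case clearStale:
			delete(m, k) // drop whatever a previous row left behind
		}
	}
}

func main() {
	keys := []string{"year"}
	for _, clearStale := range []bool{false, true} {
		m := map[string]interface{}{} // reused across rows, as a scanner would
		scanRow(m, keys, []interface{}{int64(2020)}, nil, clearStale)
		scanRow(m, keys, []interface{}{nil}, nil, clearStale)
		v, ok := m["year"]
		fmt.Println(clearStale, v, ok)
	}
}
```

Without clearing, the second row still reports 2020 for `year`; with clearing, the key is absent.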

@devanbenz devanbenz marked this pull request as ready for review June 24, 2026 20:46
Comment thread query/compile.go
if err := ValidateDatePart(expr.Args); err != nil {
	return err
}
// GROUP BY date_part over a subquery source is not supported: the

Author

It is much easier to not support GROUP BY date_part in subqueries. I could adjust the code to support it, but that would add complexity that I'm unsure we want.

@davidby-influx

davidby-influx commented Jun 25, 2026

Contributor

Code Review — date_part builtin + GROUP BY date_part(...)

Branch: db/76/date_part vs origin/master-1.x
Diff scope: merge-base 66b0dd767660 … HEAD 474ae657fa (24 files, +5,579 / −607)
Review: xhigh, workflow-backed (54 agents; 41 candidates verified → 19 refuted, 15 reported)

Findings are ranked most-severe first. Verdicts are from independent verifier agents.


🔴 Correctness — silently wrong results / data loss (single-node OSS, all CONFIRMED)

1. SELECT … INTO with GROUP BY date_part silently drops all but the last group

In coordinator/statement_executor.go:1415, convertRowToPoints treats the injected
date_part column (e.g. year) as a regular field, and every group row shares the same
tag set and the same representative bucket timestamp (single window, Interval=0).
Nothing blocks the INTO path.

SELECT count(v) INTO target FROM cpu GROUP BY date_part('year',time) writes
{year:2020,count:N1}@t0 and {year:2021,count:N2}@t0 with identical
measurement/tags/timestamp → they collide, last-write-wins, silent data loss,
plus a stray int field named after the part.

This is the worst one — silent data loss.
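
The collision in finding 1 can be illustrated with a toy store keyed by measurement, tag set, and timestamp. This is a hypothetical Go model, not InfluxDB's write path: `pointKey`, `row`, and `write` are invented names, and the counts 10 and 7 stand in for N1 and N2.

```go
package main

import "fmt"

// pointKey is a hypothetical stand-in for what identifies a stored point.
// Field values are deliberately not part of the key.
type pointKey struct {
	measurement string
	tags        string
	timestamp   int64
}

// row is one result row from the grouped query, already converted to fields.
type row map[string]int64

// write merges a row's fields into the point with the same key, so a later row
// overwrites fields of the same name (last write wins).
func write(store map[pointKey]row, key pointKey, r row) {
	p, ok := store[key]
	if !ok {
		p = row{}
		store[key] = p
	}
	for name, v := range r {
		p[name] = v
	}
}

func main() {
	store := map[pointKey]row{}
	// Every group row shares the same measurement, tags, and timestamp.
	key := pointKey{measurement: "target", tags: "", timestamp: 0}
	write(store, key, row{"year": 2020, "count": 10})
	write(store, key, row{"year": 2021, "count": 7})
	fmt.Println(len(store), store[key]) // 1 map[count:7 year:2021]
}
```

Two groups go in and one point comes out, because nothing in the key distinguishes the 2020 row from the 2021 row.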

2. Multi-call GROUP BY date_part merges aggregates across groups, mislabeled

In query/cursor.go:405, multiScannerCursor.scan aligns per-call scanners on
(ts,name,tags) only and writes into one shared map keyed by the single
DatePartDimensionsString; each scanner's date_part value overwrites the others.

SELECT count(v), count(w) FROM cpu GROUP BY date_part('year',time) pairs one field's
count from one year with the other's count from a different year, stamped with
whichever scanner ran last. No compile guard rejects 2+ calls; no test covers it.

3. Raw (non-aggregate) SELECT … GROUP BY date_part does no grouping at all

query/select.go:710 — accepted but takes the aux-cursor branch with no
reduce/DimensionGrouper, so ScanAt hits the plain-int64 arm,
DatePartDimensionsString is never set, and GroupingKeys stays nil.

SELECT value FROM cpu GROUP BY date_part('year',time) returns one flat ungrouped
series with an extra year column — silently diverging from GROUP BY <tag>
semantics, no error.

4. fill(null) fragments date_part series under GROUP BY time(), date_part(...)

query/select.go:571 — filled points use a fixed auxFields slice that never
carries a DecodedDatePartKey, so empty-window rows lose the grouping value.
validateDatePartSelectFields rejects fill(previous/linear/number) but not the
default fill(null).

Empty windows emit null rows with empty GroupingKeys; the emitter's
sameGroupingKeys check then splits them into spurious extra series and fragments the
real ones.

5. Subquery validation bypass

In query/compile.go:1411, validateDatePartAnchor and the wildcard-collision
re-check run only on the outer statement in Prepare, never recursing into subquery
sources.

SELECT max(yr) FROM (SELECT host, date_part('year',time) AS yr FROM cpu) compiles
cleanly even though the equivalent top-level query is rejected — inner query plans as a
tag-only iterator emitting no points, so max(yr) silently returns nothing instead of
erroring. Inner SELECT * colliding with a stored field named year also escapes the
re-check.


🟠 Semantics diverge from SQL — but locked in by the new tests, so possibly intentional (CONFIRMED)

| # | Location | Divergence |
|---|---|---|
| 6 | query/date_part.go:152 | millisecond/microsecond return only the sub-second part; Postgres returns seconds*1000 + frac (off by up to 59,000 ms). |
| 7 | query/date_part.go:162 | epoch uses t.Unix(), truncating fractional seconds that SQL epoch keeps. |
| 8 | query/date_part.go:163 | isodow returns 0–6 (Sun=6) instead of SQL 1–7 (Sun=7), so it is off by one. date_part_test.go asserts the 0–6 values. |

Worth a deliberate decision: if the intent is SQL compatibility, fix the code and the
tests; if InfluxDB-specific semantics are intended, document it.
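
To make the three divergences concrete, the Go sketch below computes both variants side by side for one timestamp. The `pr*` functions follow the behavior the review attributes to this PR, and the `pg*` functions follow the PostgreSQL behavior as the review describes it; all function names and the `sample` value are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// sample is 2025-12-07 13:45:30.250 UTC, which falls on a Sunday.
var sample = time.Date(2025, time.December, 7, 13, 45, 30, 250_000_000, time.UTC)

// Behavior attributed to the PR: sub-second only, truncated epoch, isodow 0-6.
func prMillisecond(t time.Time) int64 { return int64(t.Nanosecond() / 1e6) }
func prEpoch(t time.Time) int64       { return t.Unix() }
func prISODow(t time.Time) int64      { return int64((int(t.Weekday()) + 6) % 7) }

// PostgreSQL-style behavior as described in the review.
func pgMilliseconds(t time.Time) float64 {
	return float64(t.Second())*1000 + float64(t.Nanosecond())/1e6
}

func pgEpoch(t time.Time) float64 {
	return float64(t.Unix()) + float64(t.Nanosecond())/1e9
}

func pgISODow(t time.Time) int64 {
	if t.Weekday() == time.Sunday {
		return 7 // Sunday is 7, Monday is 1
	}
	return int64(t.Weekday())
}

func main() {
	fmt.Println(prMillisecond(sample), pgMilliseconds(sample)) // 250 30250
	fmt.Println(pgEpoch(sample) - float64(prEpoch(sample)))    // 0.25
	fmt.Println(prISODow(sample), pgISODow(sample))            // 6 7
}
```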


🟡 Clustered/distributed only — not reachable in single-node OSS (PLAUSIBLE)

  • 9. In query/point.go:266, encodeAux/decodeAux can't serialize the
    DecodedDatePartKey struct; over the data-node wire codec the grouped value comes back
    null and all buckets collapse into one group.
  • 10. query/iterator.gen.go.tmpl:1486 — the generic FilterIterator.Next uses
    EvalBool with a non-CallValuer map, so a date_part(...) WHERE predicate reaching
    it filters out every point (zero rows). No in-repo callers of NewFilterIterator;
    reachability uncertain.
  • 11. In tsdb/engine/tsm1/iterator.gen.go.tmpl:289, itr.m is allocated only when
    Condition != nil, but written whenever NeedTimeRef. Safe locally (encoder keeps the
    invariant), but a wire-decoded options struct with NeedTimeRef=true, Condition=nil
    panics on a nil-map write → crashes the query/node.
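
Item 11 relies on a basic Go rule: writing to a nil map panics at run time. The sketch below models the mismatch between the allocation condition and the write condition, and one possible guard. The types and functions (`options`, `iterator`, `newIterator`, `writeTimeRef`) are hypothetical simplifications, not the generated tsm1 iterator code.

```go
package main

import "fmt"

// options is a hypothetical, trimmed-down stand-in for iterator options.
type options struct {
	NeedTimeRef  bool
	HasCondition bool
}

type iterator struct {
	m map[string]interface{}
}

// newIterator allocates m when a condition is present. With gateOnTimeRef set,
// it also allocates when NeedTimeRef is true, which is the guard sketched here.
func newIterator(opt options, gateOnTimeRef bool) *iterator {
	itr := &iterator{}
	if opt.HasCondition || (gateOnTimeRef && opt.NeedTimeRef) {
		itr.m = make(map[string]interface{})
	}
	return itr
}

// writeTimeRef stores the time reference whenever NeedTimeRef is set and
// reports whether the write panicked.
func writeTimeRef(itr *iterator, opt options) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	if opt.NeedTimeRef {
		itr.m["time"] = int64(0) // panics if itr.m is nil
	}
	return false
}

func main() {
	opt := options{NeedTimeRef: true, HasCondition: false}
	fmt.Println(writeTimeRef(newIterator(opt, false), opt)) // true
	fmt.Println(writeTimeRef(newIterator(opt, true), opt))  // false
}
```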

⚡ Efficiency (CONFIRMED)

12. ⭐ DatePartValuer wired into every tsm1 query, date_part or not

tsdb/engine/tsm1/iterator.gen.go.tmpl:235 unconditionally adds an extra valuer
indirection (an interface call per VarRef/Call lookup, per scanned point) to all
WHERE-filtered queries, which is the common path. The scanner cursor already gates the
identical wiring behind needDatePart; opt.NeedTimeRef could gate it here too.

This is a per-point CPU regression across the whole engine, not just date_part queries
— arguably higher priority than its category suggests.

13. ResolveKeys allocates per-point garbage discarded on bucket hit

query/date_part.go:381 — allocates a fresh entries slice + a 9-byte EncodedKey
string per point, but the reduce loop consumes EncodedKey only on bucket creation (map
miss) → K−1 of every K allocations are GC garbage. Return only DimKey, compute
EncodedKey lazily, reuse a scratch buffer.
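
The suggestion in finding 13 amounts to looking buckets up by a cheap comparable key and building the string form only on a map miss. The Go sketch below shows that pattern under stated assumptions: `dimKey`, `bucket`, `reducer`, and `encode` are hypothetical, and the 9-byte layout (one part byte plus an 8-byte value) is a guess at what the encoded key might contain.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dimKey is a hypothetical comparable key: one part id plus its value.
type dimKey struct {
	part  uint8
	value int64
}

type bucket struct {
	encoded string
	count   int
}

// encode builds a 9-byte string: 1 byte for the part, 8 bytes for the value.
// The layout is an assumption made for this sketch.
func encode(k dimKey) string {
	var b [9]byte
	b[0] = k.part
	binary.BigEndian.PutUint64(b[1:], uint64(k.value))
	return string(b[:])
}

type reducer struct {
	buckets map[dimKey]*bucket
	encodes int // number of times encode was called
}

// add looks up the bucket by dimKey and encodes the key only on a map miss.
func (r *reducer) add(k dimKey) {
	b, ok := r.buckets[k]
	if !ok {
		r.encodes++
		b = &bucket{encoded: encode(k)}
		r.buckets[k] = b
	}
	b.count++
}

func main() {
	r := &reducer{buckets: map[dimKey]*bucket{}}
	for i := 0; i < 1000; i++ {
		r.add(dimKey{part: 1, value: int64(2020 + i%2)})
	}
	fmt.Println(len(r.buckets), r.encodes) // 2 2
}
```

With 1000 points falling into two buckets, the string key is built twice rather than 1000 times.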

14. Per-field redundant work in Scan

query/cursor.go:277 — the DatePartDimensionsString lookup, type assertion,
dpd.Expr.String(), and GroupingKeys insert are recomputed per field though invariant
across the field loop; hoist above it.


📋 Convention (CONFIRMED)

15. New date_part tests use raw t.Fatal/t.Error instead of testify

tests/server_test.go:8085 (also 8790, 8863, 9242, 9297) — violates the project's
testify rule. The same functions already use require.NoError(t, err, "init error") for
setup, so it's internally inconsistent too.


Refuted (19, not reported)

Mostly DRY/maintainability "keep-in-sync" observations (parallel DatePartExpr switches,
duplicated AST walkers, LocationOrUTC not reused) where verifiers found no current
observable defect, plus the millisecond/isodow duplicates and a
DatePartValuer{}-in-compileFields "dead code" claim that was refuted (it can fire
with a literal second arg).


Suggested fix order

  1. Findings 1 (data loss) and 12 (engine-wide perf regression), which have the highest impact.
  2. The correctness cluster, findings 2–5.
  3. A decision on SQL semantics for findings 6–8 (fix code and tests, or document).
  4. Mechanical cleanups, findings 13–15.

@davidby-influx

Contributor

AI review not verified by a human. Take with a grain of salt.

Labels

1.x, area/influxql, kind/enhancement

Development

Successfully merging this pull request may close these issues.

[1.x] Add date_part scalar function to influxdb

7 participants