Fix strptime failure with non-zero-padded format codes#6
Open
stephen-zhao wants to merge 1 commit intomainfrom
Open
Fix strptime failure with non-zero-padded format codes#6stephen-zhao wants to merge 1 commit intomainfrom
stephen-zhao wants to merge 1 commit intomainfrom
Conversation
On some platforms (notably Linux CPython), strptime does not accept the '-' modifier in format codes like %-d. Since strptime's %d can already parse non-zero-padded values, we normalize format codes by stripping the '-' modifier before passing them to strptime. Fixes #4 https://claude.ai/code/session_0137rpSUUxos1kRYsMtH8jiE
Reviewer's GuideNormalizes datetime format tokens by stripping unsupported '%-' modifiers before calling strptime, and adds regression tests to ensure non-zero-padded date formats are correctly parsed in existing pipelines. Sequence diagram for datetime parsing with format code normalizationsequenceDiagram
actor Client
participant DatetimeExtractor
participant Match
participant DfregexToken
participant Strptime
Client->>DatetimeExtractor: extract_datetime(text, pipeline)
DatetimeExtractor->>Match: finditer on text
loop for each match
DatetimeExtractor->>Match: groupdict()
Match-->>DatetimeExtractor: groups
DatetimeExtractor->>DfregexToken: get df_tokens[datetime_group_num]
DfregexToken-->>DatetimeExtractor: format_code
DatetimeExtractor->>DatetimeExtractor: __normalize_format_code(format_code)
DatetimeExtractor-->>DatetimeExtractor: normalized_format_code
DatetimeExtractor->>Strptime: strptime(datetime_string_value, normalized_format_code)
Strptime-->>DatetimeExtractor: datetime_object or error
end
DatetimeExtractor-->>Client: parsed datetimes
Class diagram for updated DatetimeExtractor format normalizationclassDiagram
class DatetimeExtractor {
+__finditer_with_limit(pattern, text, limit)
+__parse_match_into_maybe_datetime(match, df_tokens)
+__normalize_format_code(format_code) static
}
class DfregexToken {
+value
}
class Match {
+groupdict()
}
DatetimeExtractor ..> DfregexToken : uses
DatetimeExtractor ..> Match : parses
File-Level Changes
Assessment against linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
__normalize_format_codeimplementation currently does a blanketreplace("%-", "%"); if any format token can contain%−in a non-directive context this will silently alter the semantics—consider constraining the replacement (e.g., only when followed by known directive characters) or adding a brief comment explaining why a global replace is safe here. - Since
__normalize_format_codeis logically a pure helper, you might consider making it a module-level function or a@staticmethodwith single underscore naming to avoid name mangling and keep it more easily testable/reusable if other components ever need the same normalization.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `__normalize_format_code` implementation currently does a blanket `replace("%-", "%")`; if any format token can contain `%−` in a non-directive context this will silently alter the semantics—consider constraining the replacement (e.g., only when followed by known directive characters) or adding a brief comment explaining why a global replace is safe here.
- Since `__normalize_format_code` is logically a pure helper, you might consider making it a module-level function or a `@staticmethod` with single underscore naming to avoid name mangling and keep it more easily testable/reusable if other components ever need the same normalization.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
DatetimeExtractorfails to parse datetimes when using non-zero-padded format codes like%-d,%-m, etc. on Linux CPython, wherestrptimedoesn't accept the-modifier.-modifier (e.g.%-d→%d) before passing tostrptime, sincestrptimecan already handle non-zero-padded values with the standard directives.TEST_MINUS_SIGNSandTEST_DATE_LONG_FORMpipelines.Test plan
%-m/%-dextraction works (e.g.1/11/2017with%-d/%m/%Y)%-din long-form dates works (e.g.Wednesday January 5, 2022)https://claude.ai/code/session_0137rpSUUxos1kRYsMtH8jiE
Summary by Sourcery
Handle non-zero-padded datetime format codes in DatetimeExtractor and add regression tests to cover them.
Bug Fixes:
Tests: