Fix strptime failure with non-zero-padded format codes by stephen-zhao · Pull Request #6 · stephen-zhao/datetime_matcher

stephen-zhao · 2026-03-29T05:21:38Z

Summary

Fixes #4 ("Non-zero padding formats"): DatetimeExtractor fails to parse datetimes when using non-zero-padded format codes like %-d, %-m, etc. on Linux CPython, where strptime doesn't accept the - modifier.
Normalizes format codes by stripping the - modifier (e.g. %-d → %d) before passing to strptime, since strptime can already handle non-zero-padded values with the standard directives.
Adds unit tests for non-zero-padded date extraction using TEST_MINUS_SIGNS and TEST_DATE_LONG_FORM pipelines.
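
The normalization described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the function name `normalize_format_code` is hypothetical, and the format string is chosen only to match the example date in the test plan.

```python
from datetime import datetime


def normalize_format_code(format_code: str) -> str:
    """Strip the '-' modifier from directives such as '%-d' (hypothetical helper)."""
    return format_code.replace("%-", "%")


# The standard '%d' and '%m' directives accept values without a leading zero
# when parsing, so the normalized format still matches "1/11/2017".
normalized = normalize_format_code("%-d/%-m/%Y")
parsed = datetime.strptime("1/11/2017", normalized)
print(normalized, parsed.date())
```

The padding modifier only matters when formatting output; for parsing, dropping it should lose nothing.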

Test plan

All 32 tests pass, including 5 new tests covering non-zero-padded format codes
Verified %-m/%-d extraction works (e.g. 1/11/2017 with %-d/%m/%Y)
Verified %-d in long-form dates works (e.g. Wednesday January 5, 2022)
Existing tests unchanged and still passing
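
As a rough sketch of what the verified cases amount to at the strptime level: the inputs below come from the test plan, while the format strings are assumptions about what reaches strptime after normalization (the actual TEST_MINUS_SIGNS and TEST_DATE_LONG_FORM pipeline definitions are not shown here). Weekday and month names are parsed according to the current locale, so an English locale is assumed.

```python
from datetime import datetime

# (input text, assumed normalized format, expected result)
CASES = [
    ("1/11/2017", "%d/%m/%Y", datetime(2017, 11, 1)),
    ("Wednesday January 5, 2022", "%A %B %d, %Y", datetime(2022, 1, 5)),
]


def check_cases(cases):
    """Return one boolean per case: True when strptime yields the expected datetime."""
    return [datetime.strptime(text, fmt) == expected for text, fmt, expected in cases]


results = check_cases(CASES)
print(results)
```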

https://claude.ai/code/session_0137rpSUUxos1kRYsMtH8jiE

Summary by Sourcery

Handle non-zero-padded datetime format codes in DatetimeExtractor and add regression tests to cover them.

Bug Fixes:

Fix datetime parsing failures when using non-zero-padded strptime format codes (e.g. '%-d', '%-m') on platforms that do not support the '-' modifier.

Tests:

Add tests for non-zero-padded date components in filename-based extraction using the TEST_MINUS_SIGNS pipeline.
Add tests for non-zero-padded day components in long-form date strings using the TEST_DATE_LONG_FORM pipeline.

On some platforms (notably Linux CPython), strptime does not accept the '-' modifier in format codes like %-d. Since strptime's %d can already parse non-zero-padded values, we normalize format codes by stripping the '-' modifier before passing them to strptime.

Fixes #4

https://claude.ai/code/session_0137rpSUUxos1kRYsMtH8jiE

sourcery-ai · 2026-03-29T05:21:44Z

Reviewer's Guide

Normalizes datetime format tokens by stripping unsupported '%-' modifiers before calling strptime, and adds regression tests to ensure non-zero-padded date formats are correctly parsed in existing pipelines.

Sequence diagram for datetime parsing with format code normalization

sequenceDiagram
    actor Client
    participant DatetimeExtractor
    participant Match
    participant DfregexToken
    participant Strptime

    Client->>DatetimeExtractor: extract_datetime(text, pipeline)
    DatetimeExtractor->>Match: finditer on text
    loop for each match
        DatetimeExtractor->>Match: groupdict()
        Match-->>DatetimeExtractor: groups
        DatetimeExtractor->>DfregexToken: get df_tokens[datetime_group_num]
        DfregexToken-->>DatetimeExtractor: format_code
        DatetimeExtractor->>DatetimeExtractor: __normalize_format_code(format_code)
        DatetimeExtractor-->>DatetimeExtractor: normalized_format_code
        DatetimeExtractor->>Strptime: strptime(datetime_string_value, normalized_format_code)
        Strptime-->>DatetimeExtractor: datetime_object or error
    end
    DatetimeExtractor-->>Client: parsed datetimes
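
The flow in the diagram can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the library's implementation: `extract_datetimes`, `normalize_format_code`, and the `group_formats` mapping (regex group name to format code) are hypothetical stand-ins for the extractor, its private normalizer, and the df_tokens lookup.

```python
import re
from datetime import datetime
from typing import Dict, Iterator, Optional


def normalize_format_code(format_code: str) -> str:
    # Hypothetical stand-in for the private normalizer shown in the diagram.
    return format_code.replace("%-", "%")


def extract_datetimes(
    text: str, pattern: str, group_formats: Dict[str, str]
) -> Iterator[Optional[datetime]]:
    """Yield one datetime (or None on a parse error) per regex match."""
    for match in re.finditer(pattern, text):
        groups = match.groupdict()
        values, codes = [], []
        for name, code in group_formats.items():
            if groups.get(name) is not None:
                values.append(groups[name])
                codes.append(normalize_format_code(code))
        try:
            # Parse all captured pieces in one strptime call.
            yield datetime.strptime(" ".join(values), " ".join(codes))
        except ValueError:
            yield None


pattern = r"(?P<month>\d{1,2})-(?P<day>\d{1,2})-(?P<year>\d{4})"
group_formats = {"month": "%-m", "day": "%-d", "year": "%Y"}
found = list(extract_datetimes("report_3-7-2021.txt", pattern, group_formats))
print(found)
```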

Class diagram for updated DatetimeExtractor format normalization

classDiagram
    class DatetimeExtractor {
        +__finditer_with_limit(pattern, text, limit)
        +__parse_match_into_maybe_datetime(match, df_tokens)
        +__normalize_format_code(format_code) static
    }

    class DfregexToken {
        +value
    }

    class Match {
        +groupdict()
    }

    DatetimeExtractor ..> DfregexToken : uses
    DatetimeExtractor ..> Match : parses

File-Level Changes

Change	Details	Files
Normalize datetime format codes before passing them to strptime to support non-zero-padded directives on platforms that reject '%-' modifiers.	Introduce a private static helper to strip the '-' modifier from all '%-' sequences in format codes. Use the new normalizer when building the list of datetime format codes from df_tokens in __parse_match_into_maybe_datetime. Preserve existing error handling for problematic tokens while ensuring normalized codes are passed to strptime.	`src/datetime_matcher/datetime_extractor.py`
Add regression tests verifying extraction of non-zero-padded dates in existing pipelines.	Add parametrized tests covering non-zero-padded month/day parsing in TEST_MINUS_SIGNS pipeline. Add parametrized tests covering non-zero-padded day parsing in long-form date strings in TEST_DATE_LONG_FORM pipeline. Assert that only a single datetime is produced per input and that iteration stops afterward to match existing extractor behavior.	`test/test_datetime_extractor.py`

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#4	Ensure DatetimeExtractor can parse datetimes when using non-zero-padded format codes (e.g. '%-d', '%-m') by adjusting the format string before calling strptime so it works on platforms where strptime does not accept the '-' modifier.	✅
#4	Add automated tests that cover extraction of dates using non-zero-padded format codes to prevent regressions.	✅


sourcery-ai

Hey - I've left some high level feedback:

The __normalize_format_code implementation currently does a blanket replace("%-", "%"); if any format token can contain %- in a non-directive context this will silently alter the semantics. Consider constraining the replacement (e.g., only when followed by known directive characters) or adding a brief comment explaining why a global replace is safe here.
Since __normalize_format_code is logically a pure helper, you might consider making it a module-level function or a @staticmethod with single underscore naming to avoid name mangling and keep it more easily testable/reusable if other components ever need the same normalization.



sourcery-ai Bot reviewed Mar 29, 2026

stephen-zhao wants to merge 1 commit into main from claude/fix-github-issue-xSBsq