fix: Apply PREFER_DATES_FROM logic to custom date formats (Fixes #445)#1342

Open
adnan-awan wants to merge 3 commits into master from fix/issue-445-prefer-dates-from-with-date-formats

@adnan-awan

Contributor

Description

Fixes #445 - The settings.PREFER_DATES_FROM setting now correctly applies when date_formats are explicitly specified with 2-digit year formats (%y).

Problem

Previously, when the date_formats parameter was passed to dateparser.parse(), the PREFER_DATES_FROM setting had no effect on how 2-digit years were parsed:

# Before fix:
dateparser.parse('1/15/64', 
                  date_formats=['%m/%d/%y'], 
                  settings={'PREFER_DATES_FROM': 'past'})
# Returned: datetime(2064, 1, 15) ❌ (should be 1964)

dateparser.parse('1/15/64', 
                  settings={'PREFER_DATES_FROM': 'past'})
# Returned: datetime(1964, 1, 15) ✅ (correct)

Root Cause

When date_formats were provided, the parser would directly use Python's datetime.strptime() without applying the PREFER_DATES_FROM logic. This caused ambiguous 2-digit years to be interpreted in the wrong century.
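
The century that strptime() picks for %y comes from a fixed convention, not from any preference. A minimal standard-library illustration follows (no dateparser involved; the variable names are only for this example):

```python
from datetime import datetime

# Python follows the POSIX convention for %y:
# 00-68 map to 2000-2068, 69-99 map to 1969-1999.
parsed_64 = datetime.strptime("1/15/64", "%m/%d/%y")
parsed_69 = datetime.strptime("1/15/69", "%m/%d/%y")

print(parsed_64.year)  # 2064
print(parsed_69.year)  # 1969
```

This is why '1/15/64' came back as 2064 whatever PREFER_DATES_FROM said.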

Solution

Enhanced the parse_with_formats() function in dateparser/date.py to:

  1. Detect when a 2-digit year format (%y) is used in date_formats
  2. Apply the same year adjustment logic as the regular parser based on PREFER_DATES_FROM setting:
    • If parsed date is in the future and PREFER_DATES_FROM='past', subtract 100 years
    • If parsed date is in the past (or equal to the current time) and PREFER_DATES_FROM='future', add 100 years
  3. Only apply this logic for 2-digit year formats, not 4-digit formats (%Y)
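
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: adjust_two_digit_year and its parameters are hypothetical names, the reference time is passed in explicitly, and leap days and timezone-aware values are deliberately ignored.

```python
from datetime import datetime

def adjust_two_digit_year(date_obj, date_format, now, prefer):
    """Hypothetical helper: shift a strptime result by one century."""
    # Step 3: only formats with %y and without %Y are ambiguous.
    if "%y" not in date_format or "%Y" in date_format:
        return date_obj
    # Step 2a: a future date with a 'past' preference moves back.
    if prefer == "past" and date_obj > now:
        return date_obj.replace(year=date_obj.year - 100)
    # Step 2b: a past-or-equal date with a 'future' preference moves forward.
    if prefer == "future" and date_obj <= now:
        return date_obj.replace(year=date_obj.year + 100)
    return date_obj

now = datetime(2026, 1, 1)  # fixed reference so the example is stable

parsed = datetime.strptime("1/15/64", "%m/%d/%y")  # 2064-01-15
result_past = adjust_two_digit_year(parsed, "%m/%d/%y", now, "past")

parsed_20 = datetime.strptime("1/15/20", "%m/%d/%y")  # 2020-01-15
result_future = adjust_two_digit_year(parsed_20, "%m/%d/%y", now, "future")

four_digit = datetime.strptime("1/15/2064", "%m/%d/%Y")
result_four = adjust_two_digit_year(four_digit, "%m/%d/%Y", now, "past")

print(result_past, result_future, result_four)
```

With the fixed reference above, the 'past' case yields 1964-01-15, the 'future' case yields 2120-01-15, and the 4-digit case is returned unchanged.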

Changes

  • dateparser/date.py: Enhanced parse_with_formats() function with PREFER_DATES_FROM logic
  • tests/test_date_parser.py: Added 5 comprehensive test cases covering:
    • PREFER_DATES_FROM='past' with 2-digit year (original issue)
    • PREFER_DATES_FROM='future' with 2-digit year
    • Past date with future preference
    • Future date with past preference
    • 4-digit year format (ensures it's unaffected)

Testing

✅ All 24,056 existing tests pass (8 skipped, 1 xfailed)
✅ 5 new test cases specifically for this fix

Test Coverage

  • Verified PREFER_DATES_FROM='past' now works with date_formats
  • Verified PREFER_DATES_FROM='future' works correctly
  • Verified 4-digit year formats are not affected by this change
  • Verified edge cases with different year values

Before/After

# After fix - Both now return correct result:
dateparser.parse('1/15/64', 
                  date_formats=['%m/%d/%y'], 
                  settings={'PREFER_DATES_FROM': 'past'})
# Returns: datetime(1964, 1, 15) ✅

dateparser.parse('1/15/64', 
                  settings={'PREFER_DATES_FROM': 'past'})
# Returns: datetime(1964, 1, 15) ✅

Code Quality

  • Follows existing code patterns in the codebase
  • Minimal, focused changes with clear intent
  • No breaking changes to existing functionality
  • Consistent with similar logic in dateparser/parser.py

Fixes #445 - settings.PREFER_DATES_FROM setting now correctly applies when
date_formats are explicitly specified with 2-digit year formats (%y).

Previously, when date_formats were provided, the parser would bypass the
PREFER_DATES_FROM logic and directly use strptime(), causing ambiguous
2-digit years to be interpreted in the wrong century.

Changes:
- Enhanced parse_with_formats() to detect 2-digit year formats (%y)
- Apply year adjustment logic based on PREFER_DATES_FROM setting
- Added comprehensive test coverage for date_formats + PREFER_DATES_FROM

Test results:
- All 24,056 existing tests pass
- Added 5 new test cases covering various scenarios
@codecov

codecov Bot commented Jun 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.12%. Comparing base (08c78d3) to head (d496385).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1342      +/-   ##
==========================================
+ Coverage   97.11%   97.12%   +0.01%     
==========================================
  Files         235      235              
  Lines        2909     2924      +15     
==========================================
+ Hits         2825     2840      +15     
  Misses         84       84              

Comment thread dateparser/date.py Outdated
Comment on lines +208 to +210
else:
    # Apply PREFER_DATES_FROM logic for 2-digit year formats (%y)
    if "%y" in date_format and "%Y" not in date_format:

💄 Nit: this could be an elif instead of an else with a nested if.

Comment thread dateparser/date.py Outdated
else:
    # Apply PREFER_DATES_FROM logic for 2-digit year formats (%y)
    if "%y" in date_format and "%Y" not in date_format:
        now = datetime.today()

Sonnet says we are not supposed to be using datetime.today() here:

The NLP path in parser.py:434-436 computes self.now from settings.RELATIVE_BASE and uses that for the comparison. The PR uses datetime.today() unconditionally. This means RELATIVE_BASE is silently ignored in the date_formats path, an inconsistency that would surprise users who set a custom base for testing or time-shifted parsing.

Could you check?
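
To make the concern concrete, here is a small sketch of how the choice of reference time changes the "is this date in the future?" answer. reference_now is a hypothetical helper, not a dateparser function, and it assumes the base is either None or a naive datetime:

```python
from datetime import datetime

def reference_now(relative_base=None):
    """Hypothetical helper: prefer a caller-supplied base over the wall clock."""
    return relative_base if relative_base is not None else datetime.now()

parsed = datetime.strptime("1/15/64", "%m/%d/%y")  # 2064-01-15

# With a base in 1980 the parsed date lies in the future,
# so a 'past' preference would shift it back a century.
is_future_1980 = parsed > reference_now(datetime(1980, 1, 1))

# With a base in 2070 the same date is already in the past,
# so a 'past' preference would leave it alone.
is_future_2070 = parsed > reference_now(datetime(2070, 1, 1))

print(is_future_1980, is_future_2070)  # True False
```

Calling datetime.today() unconditionally corresponds to always taking the wall-clock branch, whatever base the caller supplied.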

Comment thread dateparser/date.py Outdated
if "%y" in date_format and "%Y" not in date_format:
    now = datetime.today()
    if now < date_obj:
        if "past" in settings.PREFER_DATES_FROM:

I see it done elsewhere in the code base, but I wonder why we use in to compare here. The setting is a string, not a set, so maybe == is the right call?
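
For reference, a short illustration of how the two operators differ on a string. The value "not_past" is made up purely to show the difference and is not a dateparser setting:

```python
# On a str, `in` is a substring test, not a membership test.
setting = "past"
substring_match = "past" in setting   # True
exact_match = setting == "past"       # True

# The two diverge for any value that merely contains the word.
made_up = "not_past"
loose = "past" in made_up             # True
strict = made_up == "past"            # False

print(substring_match, exact_match, loose, strict)
```

For the values that appear in this PR ('past' and 'future') both operators give the same answer, so the change is about intent rather than behaviour.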

…closes #445)

When date_formats is provided, parse_with_formats() bypassed the
PREFER_DATES_FROM setting entirely, so 2-digit year formats (%y) always
landed in the century chosen by strptime's fixed pivot (00-68 → 2000-2068,
69-99 → 1969-1999).

Changes:
- Apply PREFER_DATES_FROM logic in parse_with_formats() for 2-digit
  year formats: subtract 100 years when preference is 'past' and the
  parsed date is in the future; add 100 years when preference is
  'future' and the parsed date is in the past
- Respect settings.RELATIVE_BASE as the reference point (matching the
  NLP path in parser.py), falling back to datetime.now(tz=utc) when
  unset
- Use == instead of 'in' for PREFER_DATES_FROM string comparisons
- Use datetime.now(tz=timezone.utc) consistently, replacing datetime.today()

Tests:
- Add 5 parameterised cases covering past/future preference with
  2-digit years, and 4-digit year isolation
- Add dedicated test asserting RELATIVE_BASE is honoured
@adnan-awan adnan-awan requested a review from AdrianAtZyte June 29, 2026 19:13
@AdrianAtZyte

Contributor

From Opus:

Bugs I confirmed by running it

1. Crash on timezone-aware RELATIVE_BASE (regression):
parse('1/15/64', date_formats=['%m/%d/%y'],
      settings={'PREFER_DATES_FROM':'past', 'RELATIVE_BASE': datetime(1980,1,1,tzinfo=timezone.utc)})
# TypeError: can't compare offset-naive and offset-aware datetimes
The NLP path handles the same input fine (returns 1964), and the pre-PR code returned 2064. So the new path is strictly less robust. date_obj from strptime is naive; an aware RELATIVE_BASE breaks the now < date_obj comparison.

2. Crash on Feb 29 century shift to a non-leap year (regression):
parse('2/29/00', date_formats=['%m/%d/%y'], settings={'PREFER_DATES_FROM':'future'})
# ValueError: day is out of range for month   (2000-02-29 → replace(year=2100), and 2100 isn't leap)
parser.py guards this elsewhere with _get_correct_leap_year; the new code calls date_obj.replace(year=...) unguarded. NLP returns None here rather than crashing.

3. The "respects RELATIVE_BASE" claim is overstated. It's only honored for the century pivot. The missing-year branch still uses real datetime.now().year:
parse('1/15', date_formats=['%m/%d'], settings={'RELATIVE_BASE': datetime(1980,1,1)})
# → 2026-01-15, not 1980
Pre-existing, but it contradicts the commit message and is internally inconsistent.
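
Both crashes can be reproduced with the standard library alone, without dateparser. The sketch below only shows the underlying datetime behaviour; the variable names are illustrative:

```python
from datetime import datetime, timezone

# strptime without %z returns a naive datetime.
naive = datetime.strptime("1/15/64", "%m/%d/%y")
aware_base = datetime(1980, 1, 1, tzinfo=timezone.utc)

try:
    aware_base < naive  # ordering naive against aware is not allowed
    tz_error = None
except TypeError as exc:
    tz_error = exc

# 2000 is a leap year, 2100 is not.
leap_day = datetime.strptime("2/29/00", "%m/%d/%y")  # 2000-02-29
try:
    leap_day.replace(year=2100)
    leap_error = None
except ValueError as exc:
    leap_error = exc

print(type(tz_error).__name__, type(leap_error).__name__)  # TypeError ValueError
```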

Design notes

- Logic duplication. This is now a third near-copy of the same century rule. It's a reasonable pragmatic choice (unifying the paths is a big refactor), but the copies will drift. A shared helper (e.g. _apply_century_preference(date_obj, now, prefer)) used by both date.py and parser.py would fix bugs 1 and 2 in one place.
- Time-bomb tests. The parameterized cases use real now(). 1/15/35 → 1935 (past) only holds while 2035 is in the future; it flips after Jan 2035. The RELATIVE_BASE test is the robust pattern — the others should follow it.
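
A rough illustration of the time-bomb point, using a hypothetical stand-in for the 'past' rule (prefers_past is not part of dateparser):

```python
from datetime import datetime

def prefers_past(date_obj, now):
    """Hypothetical stand-in for the 'past' century rule."""
    if date_obj > now:
        return date_obj.replace(year=date_obj.year - 100)
    return date_obj

parsed = datetime.strptime("1/15/35", "%m/%d/%y")  # 2035-01-15

# Pinned reference: the outcome is the same whenever the test runs.
pinned = prefers_past(parsed, datetime(2026, 1, 1))

# Simulating a clock after January 2035: the expectation flips.
later = prefers_past(parsed, datetime(2036, 1, 1))

print(pinned.year, later.year)  # 1935 2035
```

A test that relies on the real clock behaves like the second call once that date has passed.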

Recommendation

The intent and core logic are right and worth merging, but I wouldn't merge as-is: bugs 1 and 2 are crashes the old code didn't have. I'd want naive/aware normalization before the comparison, a leap-year-safe century shift (reuse _get_correct_leap_year), and the parameterized tests pinned to RELATIVE_BASE.

…closes #445)

- Add _apply_century_preference() helper to date.py, mirroring the
  _get_correct_leap_year pattern from parser.py for consistency.
- Helper shifts 2-digit year dates ±100 years based on PREFER_DATES_FROM;
  on ValueError (Feb 29 → non-leap year) it finds the nearest valid leap
  year in the preferred direction using get_next/previous_leap_year.
- parse_with_formats now uses RELATIVE_BASE (falling back to UTC now) for
  both the missing-year branch and the 2-digit-year branch.
- Fix tz-aware RELATIVE_BASE crash: now is normalised to naive before
  comparison, preventing TypeError between offset-naive and offset-aware
  datetimes.
- Add 8 regression tests pinned to RELATIVE_BASE=datetime(2026, 1, 1) to
  avoid time-bomb failures.
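
One way the behaviour described in this commit message could look is sketched below. It is not the PR's actual _apply_century_preference: shift_century is a hypothetical name, dropping tzinfo is an assumed normalisation strategy, and the leap-year search uses calendar.isleap rather than dateparser's own leap-year utilities.

```python
import calendar
from datetime import datetime, timezone

def shift_century(date_obj, now, prefer):
    """Hypothetical sketch of a leap-safe, tz-tolerant century shift."""
    # Assumption: dropping tzinfo is acceptable for comparing
    # against strptime's naive result.
    if now.tzinfo is not None:
        now = now.replace(tzinfo=None)
    if prefer == "past" and date_obj > now:
        step = -1
    elif prefer == "future" and date_obj <= now:
        step = 1
    else:
        return date_obj
    year = date_obj.year + 100 * step
    try:
        return date_obj.replace(year=year)
    except ValueError:
        # Feb 29 landed on a non-leap year: walk to the nearest
        # leap year in the preferred direction.
        while not calendar.isleap(year):
            year += step
        return date_obj.replace(year=year)

aware_base = datetime(1980, 1, 1, tzinfo=timezone.utc)
from_aware = shift_century(datetime(2064, 1, 15), aware_base, "past")

leap_shift = shift_century(datetime(2000, 2, 29), datetime(2026, 1, 1), "future")

print(from_aware.date(), leap_shift.date())
```

Under these assumptions the timezone-aware base no longer raises, and the Feb 29 case resolves to 2104-02-29, the first leap year after 2100. Whether the PR's helper picks that same year is not confirmed by the text above.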

Development

Successfully merging this pull request may close these issues.

settings.PREFER_DATES_FROM=past takes no effect once date_formats specified

2 participants