Skip to content

fix(waterdata): get_ratings — don't abort on one bad feature, don't silently truncate#304

Draft
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/waterdata-ratings-reliability
Draft

fix(waterdata): get_ratings — don't abort on one bad feature, don't silently truncate#304
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/waterdata-ratings-reliability

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

Problem

Two reliability issues in get_ratings:

1. One bad feature aborts the whole batch. The per-feature download loop catches (httpx.HTTPError, ValueError, OSError), but _download_and_parse_raise_for_non_200 raises the module's typed errors — RateLimited (429), ServiceUnavailable (5xx), RuntimeError (other 4xx) — which are all RuntimeError subclasses, not httpx.HTTPError. (A feature missing its assets.data.href also raises LookupError.) So a single rate-limited/failed/malformed feature in a multi-site request escapes the loop and kills the whole call instead of being logged and skipped as intended.

2. Large result sets are silently truncated. _search sent limit verbatim and returned only the first page — no pagination, no clamp — despite the docstring describing limit as a "page size … (capped at 10000)." A query matching more than one page lost the remainder with no indication.

Fix

  1. Broaden the per-feature except to also catch RuntimeError (covers the typed errors) and LookupError.
  2. Clamp the page size to 10,000 and follow the STAC next link until exhausted.

Behavior note for (2): only the case it fixes changes — queries matching more than one page (or callers who set a small limit) now return all matches instead of the first page. Common queries (default limit=10000, fewer matches) return the same as before.

Verification (live API + mock)

  • Pagination (live): _search(bbox=[-95, 40, -92, 42], limit=2) now returns 177 features (previously 2).
  • Resilience (mock): with _download_and_parse patched to raise RateLimited, get_ratings(...) logs Failed to download / parse … and returns {} — skipped, not raised (pre-fix it raised).
  • Happy path (live): get_ratings("USGS-01104475", download_and_parse=False) → 1 feature.
  • ruff clean.

🤖 Generated with Claude Code

Two reliability issues in the ratings getter:

1. The per-feature download loop caught only (httpx.HTTPError, ValueError,
   OSError), but _download_and_parse -> _raise_for_non_200 raises the module's
   typed errors (RateLimited / ServiceUnavailable / RuntimeError, all
   RuntimeError subclasses), and a feature missing its data asset raises
   LookupError. So one rate-limited / failed / malformed feature aborted the
   entire multi-site call instead of being logged and skipped. Broadened the
   except to cover RuntimeError and LookupError.

2. _search sent `limit` verbatim and returned only the first page, silently
   truncating large result sets despite the docstring. It now clamps the page
   size to the service max (10,000) and follows the STAC `next` link until
   exhausted, returning all matching features. (Behavior change for the narrow
   case it fixes: >1-page queries, or small explicit `limit`s, now return all
   matches; common default queries are unchanged.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant