
fix(wqp): preserve leading zeros on code columns (HUCs, parameter codes, FIPS)#311

Draft
thodson-usgs wants to merge 1 commit into DOI-USGS:main from thodson-usgs:fix/wqp-preserve-leading-zeros


@thodson-usgs
Collaborator

Problem

All nine WQP getters read the response with a bare `pd.read_csv(StringIO(text), delimiter=",", low_memory=False)`, which infers code columns as int/float and silently drops their significant leading zeros:

USGS parameter code  "00060"     -> 60
HUC8                 "07090002"  -> 7090002
FIPS / qualifier codes           -> numeric, zeros lost

R dataRetrieval reads these columns as character. (The NWIS RDB path is unaffected: it already pins site_no/parm_cd to str.)
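A minimal reproduction of the problem, using only pandas and a made-up one-row CSV whose column names follow the WQP example in this PR (the data values are illustrative, not from a real WQP response):

```python
from io import StringIO

import pandas as pd

# Made-up one-row CSV; column names follow the WQP example used in this PR.
csv = "USGSpcode,Location_HUCEightDigitCode\n00060,07090002\n"

# The bare call the getters currently make: dtypes are inferred per column.
df = pd.read_csv(StringIO(csv), delimiter=",", low_memory=False)

# Both code columns are inferred as integers, so the leading zeros are gone.
print(df.dtypes)
print(df.iloc[0].tolist())
```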

Fix

Add a `_read_wqp_csv` helper (used by all nine read sites): read the header, then re-read with `dtype=str` for any column whose name is a code/identifier, i.e. it ends with "code" or contains "identifier"/"huc"/"fips" (matched case-insensitively, so names like `Location_HUCEightDigitCode` qualify). This covers both the legacy and WQX3.0 column schemas while leaving value columns (e.g. `ResultMeasureValue`) numeric.

Verification

csv = "Location_HUCEightDigitCode,USGSpcode,ResultMeasureValue\n07090002,00060,1.5\n"
df = _read_wqp_csv(csv)
#   Location_HUCEightDigitCode -> "07090002"  (was np.int64(7090002))
#   USGSpcode                  -> "00060"     (was np.int64(60))
#   ResultMeasureValue         -> 1.5         (float, unchanged)

Added a regression test. The full wqp test suite (15 tests) passes; df.shape/df.size and the derived *DateTime columns are unchanged by the dtype change. ruff reports no issues.

Note: the committed WQX3 fixture (wqp3_results.txt) was itself generated after the zeros had already been dropped (its HUC cell reads 7090002), so the regression test uses a constructed row that actually carries a leading zero.

🤖 Generated with Claude Code

…es, FIPS)

The nine WQP getters read responses with a bare
`pd.read_csv(StringIO(text), delimiter=",", low_memory=False)`, which infers
code columns as int/float and silently drops their significant leading zeros:
the USGS parameter code "00060" became 60, and HUC8 "07090002" became 7090002.
(R dataRetrieval reads these as character.)

Add a `_read_wqp_csv` helper that reads the header, then re-reads with
`dtype=str` for any column whose name is a code/identifier (ends with "code",
or contains "identifier"/"huc"/"fips") — covering both the legacy and WQX3.0
column schemas — while leaving value columns numeric. All nine read sites use
it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>