
geotiff: VRT unknown dataType silently maps to float32 #1783

@brendancol

Description


Describe the bug

parse_vrt() in xrspatial/geotiff/_vrt.py resolves the per-band data type with:

dtype_name = band_elem.get('dataType', 'Float32')
dtype = np.dtype(_DTYPE_MAP.get(dtype_name, np.float32))

at lines 280-281. _DTYPE_MAP (line 119) covers only eight GDAL dtype names: Byte, UInt16, Int16, UInt32, Int32, Float32, Float64, and Int8. Any other valid GDAL dataType (UInt64, Int64, CInt16, CInt32, CFloat32, CFloat64) silently falls back to float32.

For UInt64 and Int64 this means precision loss for values whose magnitude exceeds 2^24 (16,777,216): float32 has a 24-bit significand, so a 64-bit integer raster read as float32 quietly loses its low-order bits. For the complex types, the imaginary component is discarded entirely. The XML declares one type; the array holds another.
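
Both failure modes can be checked with plain NumPy. This only shows what NumPy's own casts do; it assumes, rather than confirms, that the VRT read path ends up applying an equivalent int64-to-float32 or complex-to-float32 conversion. The variable names are illustrative.

```python
import warnings

import numpy as np

# 2**24 + 1 is the smallest positive integer float32 cannot represent exactly,
# because float32 carries only a 24-bit significand.
ints = np.array([2**24, 2**24 + 1, 2**53 + 1], dtype=np.int64)
as_f32 = ints.astype(np.float32)

# Round-trip back to int64: any mismatch is a value whose low bits were lost.
lost = ints != as_f32.astype(np.int64)
print(lost)  # [False  True  True]

# Casting complex to a real dtype keeps only the real part. NumPy emits a
# ComplexWarning here; it is silenced so the loss stays as quiet as in the bug.
z = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    real_only = z.astype(np.float32)
print(real_only)  # [1. 3.]
```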

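The np.float32 fallback in .get(..., np.float32) also dates from when Float32 was the documented default for a missing dataType. A missing attribute and an unrecognized attribute value are different cases, and their defaults should not collapse together: the first can reasonably fall back to Float32, while the second should be rejected.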

Expected behavior

Unknown dataType values should raise a typed ValueError from parse_vrt(), so that a malformed or unsupported VRT is surfaced instead of being silently re-typed.

If a GDAL dataType maps to a native NumPy dtype (e.g. UInt64, Int64), it should be added to _DTYPE_MAP deliberately. Complex types remain unsupported and should raise with a clear message.
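
A minimal sketch of the requested behavior, assuming the lookup is factored into its own helper. resolve_band_dtype, VRTDataTypeError and _COMPLEX_TYPES are hypothetical names, not existing xrspatial code, and the NumPy types in the map are assumptions. The sketch keeps the missing-attribute default separate from the unknown-value case, adds UInt64 and Int64 explicitly, and rejects complex types with their own message.

```python
import xml.etree.ElementTree as ET

import numpy as np


class VRTDataTypeError(ValueError):
    """Hypothetical: a VRTRasterBand declares a dataType we cannot honor."""


# Hypothetical extended map: UInt64/Int64 are added deliberately because
# NumPy has native dtypes for them.
_DTYPE_MAP = {
    'Byte': np.uint8,
    'Int8': np.int8,
    'UInt16': np.uint16,
    'Int16': np.int16,
    'UInt32': np.uint32,
    'Int32': np.int32,
    'UInt64': np.uint64,
    'Int64': np.int64,
    'Float32': np.float32,
    'Float64': np.float64,
}

# Valid GDAL complex types that stay unsupported.
_COMPLEX_TYPES = frozenset({'CInt16', 'CInt32', 'CFloat32', 'CFloat64'})


def resolve_band_dtype(band_elem):
    """Resolve a band's dtype, refusing values it does not recognize."""
    dtype_name = band_elem.get('dataType')
    # A *missing* attribute keeps the existing Float32 default.
    if dtype_name is None:
        return np.dtype(np.float32)
    # A *present* value must be recognized; complex types get their own message.
    if dtype_name in _COMPLEX_TYPES:
        raise VRTDataTypeError(
            f'VRT dataType {dtype_name!r} is a complex type, which is not supported'
        )
    try:
        return np.dtype(_DTYPE_MAP[dtype_name])
    except KeyError:
        raise VRTDataTypeError(
            f'unknown VRT dataType {dtype_name!r}; expected one of {sorted(_DTYPE_MAP)}'
        ) from None


band = ET.fromstring('<VRTRasterBand band="1" dataType="Int64"/>')
print(resolve_band_dtype(band))  # int64
```

Subclassing ValueError lets existing callers keep catching ValueError, while tests can assert on the narrower type.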

Categories

  • Cat 2 (data corruption): silent precision loss for 64-bit integer / complex VRT files
  • Cat 3 (input validation): unknown XML attribute value is accepted instead of rejected

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
