Describe the bug
parse_vrt() in xrspatial/geotiff/_vrt.py resolves the per-band data type with:
dtype_name = band_elem.get('dataType', 'Float32')
dtype = np.dtype(_DTYPE_MAP.get(dtype_name, np.float32))
at lines 280-281. _DTYPE_MAP (line 119) only covers eight GDAL dtype names: Byte, UInt16, Int16, UInt32, Int32, Float32, Float64, Int8. Any other valid GDAL dataType -- UInt64, Int64, CInt16, CInt32, CFloat32, CFloat64 -- silently falls back to float32.
For UInt64 / Int64 this means precision loss for values above 2^24: float32 has a 24-bit significand, so a 64-bit integer raster read as float32 quietly loses its low-order bits. For the complex types it discards the imaginary component entirely. The XML claims one thing; the array holds another.
The .get(..., np.float32) default also dates from when Float32 was documented only as the default for a missing dataType. A missing attribute and an unrecognized attribute value are different cases and should not share one fallback.
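The integer precision loss can be reproduced without a VRT file. The following is a minimal sketch using only the standard library; roundtrip_float32 is a hypothetical helper written for this illustration, not part of xrspatial. It stores an integer as an IEEE 754 32-bit float and reads it back, which is the same narrowing that happens when a 64-bit integer band is read as float32.

```python
import struct


def roundtrip_float32(value: int) -> int:
    """Store an integer as a 32-bit float and read it back (hypothetical helper)."""
    packed = struct.pack("<f", float(value))
    return int(struct.unpack("<f", packed)[0])


exact = 2**24      # still representable in float32
lossy = 2**24 + 1  # needs 25 significant bits, one more than float32 has

print(roundtrip_float32(exact) == exact)  # value survives the round trip
print(roundtrip_float32(lossy) == lossy)  # value does not survive
```

Any Int64 or UInt64 pixel value that needs more than 24 significant bits is affected in the same way.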
Expected behavior
Unknown dataType values should raise a typed ValueError from parse_vrt() so the malformed/unsupported VRT is surfaced rather than silently re-typed.
If a GDAL dataType can be represented natively (e.g. UInt64, Int64), it should be added to _DTYPE_MAP deliberately. Complex types remain unsupported and should raise with a clear message.
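One possible shape for the stricter lookup is sketched below. It is an illustration under assumptions, not the actual patch: resolve_band_dtype, _DTYPE_NAMES and _COMPLEX_NAMES are hypothetical names, and the map holds NumPy dtype strings so that the example runs with the standard library alone (real code would wrap the result in np.dtype). The missing-attribute case keeps the Float32 default, while unknown and complex values raise ValueError.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: GDAL dataType name -> NumPy dtype string.
# UInt64 and Int64 are added deliberately, as proposed above.
_DTYPE_NAMES = {
    "Byte": "uint8", "Int8": "int8",
    "UInt16": "uint16", "Int16": "int16",
    "UInt32": "uint32", "Int32": "int32",
    "UInt64": "uint64", "Int64": "int64",
    "Float32": "float32", "Float64": "float64",
}

# Valid GDAL names that this sketch chooses not to support.
_COMPLEX_NAMES = {"CInt16", "CInt32", "CFloat32", "CFloat64"}


def resolve_band_dtype(band_elem: ET.Element) -> str:
    """Return the dtype string for a band element, or raise ValueError."""
    name = band_elem.get("dataType")
    if name is None:
        # Missing attribute: keep the documented Float32 default.
        return "float32"
    if name in _COMPLEX_NAMES:
        raise ValueError(f"complex VRT dataType {name!r} is not supported")
    if name not in _DTYPE_NAMES:
        raise ValueError(f"unknown VRT dataType {name!r}")
    return _DTYPE_NAMES[name]


band = ET.fromstring('<VRTRasterBand dataType="Int64" band="1"/>')
print(resolve_band_dtype(band))
```

Keeping the complex names in their own set lets the error message distinguish "valid GDAL type, not supported here" from "not a GDAL type at all".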
Categories
- Cat 2 (data corruption): silent precision loss for 64-bit integer / complex VRT files
- Cat 3 (input validation): unknown XML attribute value is accepted instead of rejected