-
Notifications
You must be signed in to change notification settings - Fork 176
Model-centric refactoring to reduce dataset creation #2646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
VeckoTheGecko
wants to merge
45
commits into
Parcels-code:main
Choose a base branch
from
VeckoTheGecko:restructure
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
45 commits
Select commit
Hold shift + click to select a range
f5eb442
Typing
VeckoTheGecko 735e1c0
Restructure to introduce models
VeckoTheGecko e03ccf7
Refactor from_sgrid_conventions to model
VeckoTheGecko f198994
Fix typing
VeckoTheGecko aaf6846
Refactor from_ugrid_conventions to model
VeckoTheGecko 915b0cb
Fix typing
VeckoTheGecko 7787c86
Add FieldSet.models
VeckoTheGecko af1dbf5
Move "time_interval" to model
VeckoTheGecko 035bd3f
Update Model ABC
VeckoTheGecko 9e24a14
Update Field init to take model
VeckoTheGecko b69402a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 915b0b6
Add XGCM adapter
VeckoTheGecko 9685ddf
Remove xgcm constructors
VeckoTheGecko 23bc2d4
Update _transpose_xfield_data_to_tzyx to work with SGRID metadata
VeckoTheGecko 82f2001
Define SGRID data pre-processing
VeckoTheGecko f222d4b
Create grid object within StructuredModel
VeckoTheGecko 108d3b2
Allow for time dimension size 1
VeckoTheGecko 065c96d
Disable assert_all_field_dims_have_axis check
VeckoTheGecko 538477d
New interpolator API
VeckoTheGecko f1799ac
Update interpolators to use new API
VeckoTheGecko 8e121a0
Adding overwrite option to ParticleFile API (#2655)
erikvansebille 6ef510d
Make sure the ProgressBar also works for negative dt (#2659)
erikvansebille e120572
Add tutorial for SCHISM model for lake ontario. (#2660)
fluidnumericsJoe 8783afa
Update installation instructions (#2661)
VeckoTheGecko 66741c4
Renaming vector_interp_method to interp_method (#2662)
erikvansebille 9be378b
Issue: Consistent dimensions order #2374 (#2663)
PeterWolfram 1345f6e
Enable adding of fieldsets
VeckoTheGecko b87fae4
Add assert_compatible_fieldsets
VeckoTheGecko 13644ea
Fix test suite
VeckoTheGecko 5569031
Define how to set interpolators
VeckoTheGecko 54674c6
Fix test suite
VeckoTheGecko f2ef7ce
Merge
VeckoTheGecko 79f53d9
Add task and changes
VeckoTheGecko 5080a2b
LLM instructions
VeckoTheGecko dba1ae3
Update test suite
VeckoTheGecko 234ae3c
Fix test suite
VeckoTheGecko 3b2d271
Enable constant field tests
VeckoTheGecko e836212
Disable reprs
VeckoTheGecko 62db278
Refactor constant field logic to use dedicated model
VeckoTheGecko b304e5a
Fix constant field logic
VeckoTheGecko 7594fac
Enable unstructured tests
VeckoTheGecko 9bf76d7
Update unstructured grid interpolators
VeckoTheGecko fcc41fa
Update unstructured FieldSet ingestion in tests
VeckoTheGecko 131ea70
Update comments
VeckoTheGecko caa3432
Fix test_time1D_field
VeckoTheGecko File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,262 @@ | ||
| # Refactoring Summary: `field.py`, `fieldset.py`, `model.py` | ||
|
|
||
| This document describes the refactoring introduced in commit `69338d87a89763efbb1e3886b470e09992812978` relative to `main`. | ||
|
|
||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| The central change is the introduction of a new `Model` abstraction layer between raw xarray/uxarray data and the `Field`/`FieldSet` objects. Previously, `Field` owned its data and grid directly. Now, `Field` is a thin view over a `Model`, and `FieldSet` is a container of `Model` objects rather than `Field` objects. | ||
|
|
||
| --- | ||
|
|
||
| ## New file: `src/parcels/_core/model.py` | ||
|
|
||
| ### `Model` (abstract base class) | ||
|
|
||
| Abstract class with three required attributes: | ||
|
|
||
| - `data: Any` — the underlying dataset | ||
| - `grid: BaseGrid` — the grid object | ||
| - `field_to_interpolator: dict[str, ScalarInterpolator | VectorInterpolator]` — maps field names to interpolator instances | ||
|
|
||
| Abstract methods: | ||
|
|
||
| - `construct_fields() -> list[Field | VectorField]` — build field objects from this model | ||
| - `scalar_field_names -> list[str]` — names of scalar fields in the data | ||
| - `assert_valid_field_data(field_data)` — validate a single field's data | ||
|
|
||
| Concrete methods on `Model`: | ||
|
|
||
| - `assert_valid_model_data()` — iterates `scalar_field_names` and calls `assert_valid_field_data` on each | ||
| - `time_interval -> TimeInterval | None` — computed from `self.data` | ||
|
|
||
| ### `StructuredModel(Model)` | ||
|
|
||
| For structured (SGRID) grid data backed by `xr.Dataset`. | ||
|
|
||
| Constructor: `StructuredModel(data: xr.Dataset, mesh: Mesh)` | ||
|
|
||
| - Calls `preprocess_sgrid_model_data(data)` to transpose fields to `(t, z, y, x)` order | ||
| - Creates an `XGrid(data, mesh)` grid | ||
| - Initializes `field_to_interpolator = {}` | ||
| - Calls `assert_valid_model_data()` on construction | ||
|
|
||
| `from_sgrid_conventions(cls, ds, mesh=None)` classmethod: | ||
|
|
||
| - Copied/moved from `FieldSet.from_sgrid_conventions` — handles time axis renaming, mesh type inference | ||
| - Sets default interpolator `XLinear()` on all scalar fields after construction | ||
| - Returns a `StructuredModel` instance | ||
|
|
||
| `construct_fields()`: | ||
|
|
||
| - Creates `Field("U", self)`, `Field("V", self)` etc., then wraps them in `VectorField("UV", ...)` if U+V present | ||
| - Uses `XLinear_Velocity()` for A-grids, `CGrid_Velocity()` for C-grids | ||
|
|
||
| ### `UnstructuredModel(Model)` | ||
|
|
||
| For unstructured (UGRID) grid data backed by `ux.UxDataset`. | ||
|
|
||
| Constructor: `UnstructuredModel(data: ux.UxDataset, grid: UxGrid)` | ||
|
|
||
| `from_ugrid_conventions(cls, ds, mesh="spherical")` classmethod: | ||
|
|
||
| - Validates required dimensions (`time`, `zf`, `zc`) | ||
| - Creates `UxGrid`, calls `_discover_ux_U_and_V`, returns instance | ||
|
|
||
| `construct_fields()`: | ||
|
|
||
| - Uses `_select_uxinterpolator(da)` to pick the appropriate interpolator per field | ||
| - Note: interpolator is passed as 3rd arg to `Field(name, model, interp)` — see Field changes below | ||
|
|
||
| ### Helper functions moved from `fieldset.py` to `model.py` | ||
|
|
||
| - `_discover_ux_U_and_V(ds)` — unchanged logic | ||
| - `_select_uxinterpolator(da)` — unchanged logic | ||
| - `_get_mesh_type_from_sgrid_dataset(ds)` — unchanged logic | ||
| - `_is_coordinate_in_degrees(da)` — unchanged logic | ||
| - `_get_time_interval(data)` — logic adjusted: checks `"time" not in data or data["time"].size == 1` (previously checked `data.shape[0] == 1`) | ||
| - `_assert_valid_uxdataarray(data)` — unchanged logic | ||
| - `_assert_has_time_coordinate(da)` — new helper extracted from old `Field.__init__` | ||
|
|
||
| ### New helper in `model.py` | ||
|
|
||
| - `preprocess_sgrid_model_data(ds)` — transposes all non-grid-topology data vars to `(t, z, y, x)` using `_transpose_xfield_data_to_tzyx` | ||
|
|
||
| --- | ||
|
|
||
| ## Changes to `src/parcels/_core/field.py` | ||
|
|
||
| ### `Field.__init__` signature change | ||
|
|
||
| **Before:** | ||
|
|
||
| ```python | ||
| Field(name: str, data: xr.DataArray | ux.UxDataArray, grid: UxGrid | XGrid, interp_method: Callable) | ||
| ``` | ||
|
|
||
| **After:** | ||
|
|
||
| ```python | ||
| Field(name: str, model: Model) | ||
| ``` | ||
|
|
||
| - `data`, `grid`, and `interp_method` are no longer constructor arguments | ||
| - The constructor only sets `self.name`, `self.model`, and `self.igrid = -1` | ||
| - Validation (data type checks, axis checks, time interval extraction) removed from `__init__` | ||
|
|
||
| ### `Field` properties (delegating to model) | ||
|
|
||
| Three new properties proxy into the model: | ||
|
|
||
| ```python | ||
| @property | ||
| def data(self): | ||
| return self.model.data[self.name] | ||
|
|
||
| @property | ||
| def grid(self): | ||
| return self.model.grid | ||
|
|
||
| @property | ||
| def time_interval(self): | ||
| return self.model.time_interval | ||
| ``` | ||
|
|
||
| These preserve backward compatibility for code that reads `field.data`, `field.grid`, `field.time_interval`. | ||
|
|
||
| ### `Field.interp_method` property/setter | ||
|
|
||
| **Before:** stored as `self._interp_method`; validated via `assert_same_function_signature` against `ZeroInterpolator` | ||
|
|
||
| **After:** stored in `self.model.field_to_interpolator[self.name]` | ||
|
|
||
| - Getter raises `AttributeError` (not `KeyError`) if no interpolator is set for this field | ||
| - Setter validates `isinstance(value, ScalarInterpolator)` instead of checking function signature | ||
|
|
||
| ### Interpolator call convention change | ||
|
|
||
| **Before:** `self._interp_method(particle_positions, grid_positions, self)` | ||
|
|
||
| **After:** `self.interp_method.interp(particle_positions, grid_positions, self)` | ||
|
|
||
| Interpolators are now objects with an `.interp(...)` method, not plain callables. | ||
|
|
||
| ### `VectorField` changes | ||
|
|
||
| - `interp_method` parameter type annotation changed from `Callable | None` to `VectorInterpolator | None` | ||
| - Validation changed from `assert_same_function_signature(...)` to `isinstance(interp_method, VectorInterpolator)` | ||
| - Setter similarly validates `isinstance(method, VectorInterpolator)` | ||
| - Call site: `self._interp_method.interp(...)` instead of `self._interp_method(...)` | ||
|
|
||
| ### Removed from `field.py` | ||
|
|
||
| - `_assert_valid_uxdataarray` — moved to `model.py` | ||
| - `_assert_compatible_combination` — removed (validation now handled per-model) | ||
| - `_get_time_interval` — moved to `model.py` | ||
| - Imports: `uxarray`, `xarray`, `Callable`, `TimeInterval`, `ZeroInterpolator`, `ZeroInterpolator_Vector`, `assert_same_function_signature`, `_transpose_xfield_data_to_tzyx`, `assert_all_field_dims_have_axis` | ||
|
|
||
| --- | ||
|
|
||
| ## Changes to `src/parcels/_core/fieldset.py` | ||
|
|
||
| ### `FieldSet.__init__` signature change | ||
|
|
||
| **Before:** `FieldSet(fields: list[Field | VectorField])` | ||
|
|
||
| **After:** `FieldSet(models: list[Model])` | ||
|
|
||
| - Now stores `self.models: list[Model]` | ||
| - Calls `self.reconstruct_fields()` on init to build `self._fields` | ||
| - `assert_compatible_calendars(fields)` call commented out (TODO) | ||
|
|
||
| ### New `FieldSet.fields` property | ||
|
|
||
| `_fields` is now the backing store; `fields` is a lazy property that calls `reconstruct_fields()` if `_fields` is `None`. | ||
|
|
||
| ### New `FieldSet.reconstruct_fields()` method | ||
|
|
||
| Iterates `self.models`, calls `model.construct_fields()` on each, flattens into `self._fields` dict. | ||
|
|
||
| ### `context` renamed to `constants` | ||
|
|
||
| - `self.context` → `self.constants` | ||
| - `add_context(name, value)` → `add_constant(name, value)` | ||
| - `add_constant` now validates that `value` is `float | np.floating | int | np.integer` | ||
| - `__setattr__` override that guarded `context` keys has been **removed** | ||
|
|
||
| ### `__getattr__` updated | ||
|
|
||
| Now checks `self._fields` and `self.constants` (was `self.fields` and `self.context`). | ||
|
|
||
| ### New `FieldSet.__add__` operator | ||
|
|
||
| ```python | ||
| def __add__(self, other: FieldSet) -> FieldSet: | ||
| assert_compatible_fieldsets(self, other) | ||
| combined = FieldSet(self.models + other.models) | ||
| combined.constants = {**self.constants, **other.constants} | ||
| return combined | ||
| ``` | ||
|
|
||
| ### `from_ugrid_conventions` simplified | ||
|
|
||
| **Before:** ~15 lines building grid, discovering U/V, creating Field objects, returning `cls(list(fields.values()))` | ||
|
|
||
| **After:** | ||
|
|
||
| ```python | ||
| model = UnstructuredModel.from_ugrid_conventions(ds, mesh) | ||
| return cls([model]) | ||
| ``` | ||
|
|
||
| ### `from_sgrid_conventions` simplified | ||
|
|
||
| **Before:** ~50 lines handling time axis, xgcm grid creation, field creation | ||
|
|
||
| **After:** | ||
|
|
||
| ```python | ||
| model = StructuredModel.from_sgrid_conventions(ds, mesh) | ||
| return cls([model]) | ||
| ``` | ||
|
|
||
| ### `add_field` constant field creation updated | ||
|
|
||
| The inline `xgcm.Grid(...)` call when adding a constant scalar field is replaced with constructing `XGrid(ds, mesh=mesh)` directly (after attaching SGRID metadata via `sgrid._attach_sgrid_metadata`). | ||
|
|
||
| ### New module-level function: `assert_compatible_fieldsets` | ||
|
|
||
| ```python | ||
| def assert_compatible_fieldsets(left: FieldSet, right: FieldSet) -> None | ||
| ``` | ||
|
|
||
| Raises `ValueError` if the two fieldsets share any field names or constant names. | ||
|
|
||
| ### Removed from `fieldset.py` | ||
|
|
||
| - `xgcm` import | ||
| - `UxGrid` import | ||
| - `_DEFAULT_XGCM_KWARGS` import | ||
| - `logger` import | ||
| - `_ds_rename_using_standard_names` import | ||
| - Most interpolator imports (only `XConstantField` remains) | ||
| - `_discover_ux_U_and_V` — moved to `model.py` | ||
| - `_select_uxinterpolator` — moved to `model.py` | ||
| - `_get_mesh_type_from_sgrid_dataset` — moved to `model.py` | ||
| - `_is_coordinate_in_degrees` — moved to `model.py` | ||
|
|
||
| --- | ||
|
|
||
| ## Summary of architectural intent | ||
|
|
||
| | Concern | Before | After | | ||
| | ----------------------- | -------------------------------------------------- | -------------------------------------------------------- | | ||
| | Data ownership | `Field` (held `self.data`, `self.grid`) | `Model` (holds `self.data`, `self.grid`) | | ||
| | Interpolator storage | `Field._interp_method` (per-field callable) | `Model.field_to_interpolator` (dict of objects) | | ||
| | Interpolator type | Any callable matching `ZeroInterpolator` signature | Instance of `ScalarInterpolator` / `VectorInterpolator` | | ||
| | Interpolator invocation | `interp_method(positions, grid_positions, field)` | `interp_method.interp(positions, grid_positions, field)` | | ||
| | `FieldSet` contents | `list[Field \| VectorField]` | `list[Model]` | | ||
| | Field construction | Done in `FieldSet.from_*` classmethods | Delegated to `Model.construct_fields()` | | ||
| | `context` / `constants` | `fieldset.context` (any type) | `fieldset.constants` (float/int only) | | ||
| | `FieldSet` combination | Not supported | `fieldset_a + fieldset_b` via `__add__` | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really part of this PR, is it? Would have been better to make a separate PR?