Kmonte/explore dataloader cpp and vectorized #679

Draft
kmontemayor2-sc wants to merge 31 commits into main from kmonte/explore-dataloader-cpp-and-vectorized

@kmontemayor2-sc

Collaborator

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

kmontemayor and others added 30 commits June 23, 2026 21:52
… + dispatch

Lifts the existing per-anchor label-remap loop from DistABLPLoader._set_labels
into a module-level _loop_set_labels function, which becomes the reference
oracle for the upcoming vectorized kernel.  _set_labels is rewired to dispatch
to _loop_set_labels (python path, default) or vectorized_set_labels
(vectorized/cpp paths, defined in the next task) based on resolve_collate_impl().

No observable behavior change on the default python path.
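
The dispatch this commit describes can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes resolve_collate_impl() reads the GIGL_COLLATE_IMPL environment variable (named later in this PR) and falls back to "python"; every name ending in `_sketch` is hypothetical, and the two label functions are placeholders.

```python
import os

# Implementation names as listed in the PR text.
COLLATE_IMPLS = ("python", "vectorized", "cpp")


def resolve_collate_impl_sketch() -> str:
    """Assumed behavior: read GIGL_COLLATE_IMPL, defaulting to "python"."""
    impl = os.environ.get("GIGL_COLLATE_IMPL", "python")
    if impl not in COLLATE_IMPLS:
        raise ValueError(f"Unknown collate impl {impl!r}; expected one of {COLLATE_IMPLS}")
    return impl


def loop_set_labels_sketch(labels):
    # Placeholder for the per-anchor reference loop (the oracle).
    return {anchor: list(ids) for anchor, ids in labels.items()}


def vectorized_set_labels_sketch(labels):
    # Placeholder for the vectorized kernel; it must match the loop output.
    return {anchor: list(ids) for anchor, ids in labels.items()}


def pick_set_labels_sketch():
    """Return the label function for the currently selected implementation."""
    if resolve_collate_impl_sketch() == "python":
        return loop_set_labels_sketch
    # Both the vectorized and cpp paths share the vectorized label kernel.
    return vectorized_set_labels_sketch
```

Resolving the implementation at call time, rather than at import time, lets a test flip the selector between runs without reloading the module.
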

Replace the per-anchor Python loop in _loop_set_labels with a fully vectorized
kernel (_remap_one_label_tensor + vectorized_set_labels) that uses
torch.searchsorted and torch.split, keeping peak memory at O(N_anchors*M)
without a Python loop over anchors.  Bit-for-bit equivalence with the loop
oracle is checked by a parameterized property matrix (7 cases) plus a
mandatory 3-mutation check that confirms the test catches multiplicity
loss, ordering regressions, and missing empty-anchor keys.
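
The oracle-versus-kernel check can be illustrated with a standard-library analogue. The sketch below swaps in `bisect` for torch.searchsorted (which would perform all lookups in one call) and assumes a simplified remap: each label is a global node id, mapped to its position in the batch's node list and dropped if absent. The function names and data layout are hypothetical, not the PR's.

```python
from bisect import bisect_left


def loop_remap(batch_nodes, labels_per_anchor):
    """Reference oracle: per-anchor loop with a dict lookup."""
    local_index = {g: i for i, g in enumerate(batch_nodes)}
    return {
        anchor: [local_index[g] for g in ids if g in local_index]
        for anchor, ids in labels_per_anchor.items()
    }


def flat_remap(batch_nodes, labels_per_anchor):
    """Flattened variant: sort once, binary-search every label, regroup."""
    order = sorted(range(len(batch_nodes)), key=batch_nodes.__getitem__)
    sorted_ids = [batch_nodes[i] for i in order]
    # Pre-create every key so anchors with no surviving labels are kept.
    out = {anchor: [] for anchor in labels_per_anchor}
    for anchor, ids in labels_per_anchor.items():
        for g in ids:
            pos = bisect_left(sorted_ids, g)
            if pos < len(sorted_ids) and sorted_ids[pos] == g:
                out[anchor].append(order[pos])  # original position, not sorted
    return out


def deduplicating_mutant(batch_nodes, labels_per_anchor):
    """Deliberately wrong: loses multiplicity and ordering."""
    good = loop_remap(batch_nodes, labels_per_anchor)
    return {anchor: sorted(set(ids)) for anchor, ids in good.items()}


batch_nodes = [40, 10, 30]
labels = {0: [30, 30, 10], 1: [99], 2: []}
assert flat_remap(batch_nodes, labels) == loop_remap(batch_nodes, labels)
assert deduplicating_mutant(batch_nodes, labels) != loop_remap(batch_nodes, labels)
```

The mutant matters as much as the equality check: a test case without duplicate labels would accept the deduplicating version, so the fixture must contain the property each mutation breaks.
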

- ruff format on dist_ablp_neighborloader.py, vectorized_set_labels_test.py,
  dist_ablp_neighborloader_test.py (line-length wrapping)
- Add missing `_: int` rank param to _collect_homogeneous_labels and
  _collect_hetero_labels so mp.spawn's injected rank arg is accepted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add `collect_batches` and `assert_impls_equivalent` to
`tests/test_assets/distributed/collate_equivalence.py` so callers can
exercise any sequence of `COLLATE_IMPLS` against a fake loader factory
and assert that the outputs are identical.  The two helpers manage the
env-var lifecycle (`GIGL_COLLATE_IMPL`) and call `gc.collect()` after each
run to avoid state leaking between runs.

Three new test methods exercise the driver end-to-end with fake
homogeneous / heterogeneous iterators and a deliberately mismatched
loader to confirm the mismatch path raises.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
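
The env-var lifecycle described in this commit might look like the sketch below. The `_sketch` helpers are hypothetical stand-ins, not the signatures of the real `collect_batches` and `assert_impls_equivalent`; the loader factory is assumed to be a zero-argument callable returning an iterable of comparable batches.

```python
import gc
import os
from contextlib import contextmanager

ENV_VAR = "GIGL_COLLATE_IMPL"


@contextmanager
def collate_impl_env(impl):
    """Set the selector for one run, then restore the previous value."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = impl
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous
        gc.collect()  # drop loader state before the next implementation runs


def collect_batches_sketch(loader_factory, impl):
    """Build a fresh loader under `impl` and drain it into a list."""
    with collate_impl_env(impl):
        return list(loader_factory())


def assert_impls_equivalent_sketch(loader_factory, impls):
    """Compare every implementation's batches against the first one's."""
    baseline_impl, *others = impls
    baseline = collect_batches_sketch(loader_factory, baseline_impl)
    for impl in others:
        if collect_batches_sketch(loader_factory, impl) != baseline:
            raise AssertionError(f"{impl!r} diverges from {baseline_impl!r}")
```

Draining the loader inside the context manager is the important detail: a lazily evaluated iterator consumed after the selector is restored would silently run under the wrong implementation.
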
…, re-export annotation

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds collate_equivalence_ablp_test.py with 5 parameterized cases:
- positive_and_negative
- positive_only
- positive_and_negative_label_cap
- positive_with_guaranteed_empty_anchor (ragged-key trap)
- a mutation guard (proves the harness detects deliberate batch divergence)

All tests run the loader end-to-end under mp.spawn and compare all three
COLLATE_IMPLS (python, vectorized, cpp) via assert_impls_equivalent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dispatch

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_dir=in, empty anchor)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…istLoader

Add a workload-agnostic, opt-in timing path to BaseDistLoader.__next__ that
isolates channel-receive wall time from collation wall time, so callers can
attribute per-batch next() cost. Disabled by default; no behavior change when
off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
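
An opt-in timing path of this kind can be sketched as below. The class, its constructor arguments, and the attribute names are hypothetical; the real BaseDistLoader receives from a sampling channel, which is stood in for here by a plain iterable.

```python
import time


class TimedLoaderSketch:
    """Iterator that can split next() cost into receive and collate time."""

    def __init__(self, channel, collate_fn, record_timing=False):
        self._channel = iter(channel)  # stand-in for the receive channel
        self._collate_fn = collate_fn
        self._record_timing = record_timing
        self.recv_seconds = []
        self.collate_seconds = []

    def __iter__(self):
        return self

    def __next__(self):
        if not self._record_timing:
            # Default path: no clock reads, same result as before.
            return self._collate_fn(next(self._channel))
        t0 = time.perf_counter()
        message = next(self._channel)  # StopIteration propagates unchanged
        t1 = time.perf_counter()
        batch = self._collate_fn(message)
        t2 = time.perf_counter()
        self.recv_seconds.append(t1 - t0)
        self.collate_seconds.append(t2 - t1)
        return batch
```

Keeping the two durations in separate lists lets a caller tell whether a slow next() is waiting on upstream sampling or spending its time in collation.
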
Vertex AI worker containers do not inherit the launcher process environment, so
forward GIGL_COLLATE_IMPL (when set) into the worker env list for both the
single-pool and graph-store launch paths. Generic passthrough; no behavior
change when unset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
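
A generic passthrough like the one described could be written as follows. The function name is hypothetical, and the worker env is assumed to be a list of `{"name": ..., "value": ...}` entries; the actual launcher code may represent it differently.

```python
import os


def worker_env_with_passthrough(base_env, names=("GIGL_COLLATE_IMPL",), environ=None):
    """Return a copy of base_env plus any listed variables that are set."""
    environ = os.environ if environ is None else environ
    env = list(base_env)  # copy so the caller's list is not mutated
    for name in names:
        value = environ.get(name)
        if value is not None:  # forward only when set; unset means no change
            env.append({"name": name, "value": value})
    return env
```

Accepting `environ` as a parameter keeps the helper testable without touching the real process environment.
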
…ion absent

Module-level `from gigl_core import collate_core` in _collate_dispatch.py
broke all loader impls (python, vectorized, cpp) in environments whose
installed gigl_core wheel predates the C++ extension. Move the import
inside collate_cpp_homogeneous and collate_cpp_heterogeneous so only the
cpp path requires the extension, and add a unit test proving that
python/vectorized dispatch works when collate_core is not importable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
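
The fix follows the usual deferred-import pattern, sketched below. The module name `hypothetical_collate_ext` and the `collate` function on it are placeholders for the C++ extension, and the python path is reduced to a trivial stand-in.

```python
import importlib


def collate_python_sketch(batch):
    # Stand-in for the pure-Python collation path.
    return sorted(batch)


def collate_cpp_sketch(batch):
    # Import only when the cpp path is actually selected, so environments
    # without the extension can still use the other implementations.
    ext = importlib.import_module("hypothetical_collate_ext")
    return ext.collate(batch)


def collate_sketch(batch, impl="python"):
    if impl == "cpp":
        return collate_cpp_sketch(batch)
    return collate_python_sketch(batch)
```

With a module-level import, the ImportError would fire when the dispatch module is first loaded and take every implementation down with it; deferring it confines the failure to callers that request the cpp path.
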
…emap

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…acle

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add resolve_ablp_label_format to imports in dist_ablp_neighborloader.py
- Modify _set_labels to dispatch on label_format before collate_impl:
  edge_list -> edge_list_set_labels; dict path unchanged (vectorized/loop)
- Update class docstring to document AnchorLabels output under edge_list
- Add test_label_format_edge_list_equivalence: mp.spawn child sets
  GIGL_ABLP_LABEL_FORMAT=edge_list, asserts y_positive is AnchorLabels,
  and verifies .to_dict() matches the dict-format baseline exactly

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
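
The equivalence that test_label_format_edge_list_equivalence checks can be illustrated with a toy container. The internal layout below (parallel anchor and label index tuples plus an anchor count) is an assumption made for illustration; only the existence of `.to_dict()` on AnchorLabels comes from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnchorLabelsSketch:
    """Hypothetical edge-list container: one (anchor, label) pair per entry."""

    anchor_index: Tuple[int, ...]
    label_index: Tuple[int, ...]
    num_anchors: int

    @classmethod
    def from_dict(cls, labels: Dict[int, List[int]]) -> "AnchorLabelsSketch":
        # Assumes anchors are numbered 0..num_anchors-1.
        pairs = [(a, label) for a in sorted(labels) for label in labels[a]]
        return cls(
            tuple(a for a, _ in pairs),
            tuple(label for _, label in pairs),
            len(labels),
        )

    def to_dict(self) -> Dict[int, List[int]]:
        # Start from every anchor so empty anchors keep their key.
        out: Dict[int, List[int]] = {a: [] for a in range(self.num_anchors)}
        for a, label in zip(self.anchor_index, self.label_index):
            out[a].append(label)
        return out


def set_labels_sketch(labels, label_format="dict"):
    """Dispatch on label format first; the dict path is left unchanged."""
    if label_format == "edge_list":
        return AnchorLabelsSketch.from_dict(labels)
    return labels
```

Storing the anchor count separately is what lets the round trip reproduce anchors that have no labels, which an edge list alone cannot represent.
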
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pods

GIGL_COLLATE_IMPL / GIGL_ABLP_LABEL_FORMAT set in the shell that compiles a
pipeline never reached the remote component container: its env is fixed at
compile time and does not inherit the submitter's shell. The launcher's
passthrough therefore had nothing to forward, and workers always used the
default. Copy any set selector onto each component task at compile time; the
launcher then forwards it to the worker pool.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vectorized_set_labels and edge_list_set_labels built anchor_of_entry via
torch.arange (CPU) then selected it with a mask derived from label_tensor; on
GPU that raised 'indices should be either on cpu or on the same device as the
indexed tensor', crashing training on the first batch. CPU-only unit tests
could not catch it. Create the index on label_tensor.device and add a
CUDA-gated regression test for both kernels.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make several comments and docstrings self-contained and generic, and remove a
stray one-off benchmark timing. Comment/docstring-only; no logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet

2 participants