Skip to content

Optimize Atom Mapping procedure#906

Open
kfir4444 wants to merge 19 commits into
mainfrom
optimize_am
Open

Optimize Atom Mapping procedure#906
kfir4444 wants to merge 19 commits into
mainfrom
optimize_am

Conversation

@kfir4444

@kfir4444 kfir4444 commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Atom Mapping Performance Improvements — optimize_am branch

Overview

Reduce runtime of the atom mapping procedure, utilzing algorithmic and programmatic solutions.


Change 1 — arc/species/zmat.py: Collinear triplet guard in determine_d_atoms()

Problem

zmat_nerf implements the SN-NeRF atom placement algorithm: given three already-placed reference atoms A, B, C, it places a new atom D at bond length r from C, bond angle θ(B,C,D), and dihedral φ(A,B,C,D). This is the inner loop of _add_nth_atom_to_coords in zmat.py. However, this doesn't work if co-linearity exist.

determine_d_atoms() could select a reference atom triplet (B, C, D) that is nearly
collinear. _add_nth_atom_to_coords() later computes cross(B−A, normalize(C−B)) for
NeRF placement; a near-zero cross product produces NaN coordinates and causes mapping
failures on linear-geometry intermediates.

Fix

Added _nerf_cross_sq() — a double-precision function computing
|cross(B−A, normalize(C−B))|². A threshold _NERF_CROSS_SQ_MIN = 0.01
(≈3.8° from linear at 1.5 Å bond) gates triplet acceptance:

if _nerf_cross_sq(coords, ...) >= _NERF_CROSS_SQ_MIN:
    d_atoms.append(i)
    break

The same guard is applied to the [n, 2, 1, 0] fallback path, with a scan
for a valid 4th reference atom when the fallback itself is collinear.


Change 2 — arc/species/_zmat_kernels.py + arc/species/vectors.py: C kernel wiring

One key bottleneck of atom mapping comes from vector computations:
The cost decomposition for C₈H₁₈ species to species mapping (cProfile, 0.770 s total, 1,284,756 calls)

Component Calls Cumul. time Fraction
fix_dihedrals_by_backbone_mapping 2 0.764 s 99.2%
set_dihedral 20 0.651 s 84.5%
xyz_to_zmat 60 0.514 s 66.8%
_add_nth_atom_to_zmat 1,560 0.481 s 62.5%
calculate_param 10,357 0.377 s 48.9%
np.cross 5,644 0.191 s 24.8%
get_angle 7,597 0.190 s 24.7%
get_dihedral 1,438 0.153 s 19.9%
zmat_to_coords 60 0.128 s 16.6%
numpy.moveaxis 16,932 0.110 s 14.3%
order_atoms (inside copy) 4 0.091 s 11.8%
numpy.asarray 29,025 0.059 s 7.7%

Key optimizations of the vector coputations imperative for fast atom mapping. This is achieved by compiling kernels in C for these types of computations, and calling them via the _zmat_kernels.py (new file)

ctypes loader for the pre-compiled _zmat_c_kernels.so. Exposes:

  • zmat_r_f32 — interatomic distance
  • zmat_a_f32 — bond angle
  • zmat_d_f32 — dihedral angle
  • zmat_nerf — NeRF atom placement (SN-NeRF formula)

Sets available = True only when the shared library loads without error; all callers
gate on this flag so the code degrades gracefully when the .so is absent, to avoid compiling errors.

vectors.py

Fast paths in calculate_distance and calculate_dihedral_angle:

if _ck_available:
    return _ck.zmat_r_f32(c0[0], c0[1], c0[2], c1[0], c1[1], c1[2])

calculate_angle was left as numpy — the C kernel uses float32 which caused a
precision failure in test_get_zmat_param_value at the ~0.001° level. This can be debated offline.

Also added apply_rodrigues_rotation() used by Change 3. (source: rotation)


Change 3 — arc/mapping/engine.py: Rodrigues rotation replaces set_dihedral() round-trip

Problem

fix_dihedrals_by_backbone_mapping() corrected each torsion angle by calling
set_dihedral(), which performs a full xyz → zmat → modify → zmat → xyz round-trip.
This is O(N²) per torsion and O(N³) overall for large molecules.

Fix

Direct Cartesian rotation via Rodrigues' formula:

  1. Identify downstream atoms using BFS from the far-side pivot (_downstream_atoms_adj).
  2. Rotate only those atoms: R(u,δ) = I cosδ + (1−cosδ)u⊗u + sinδ[u]×
  3. No zmat round-trip — O(N) per torsion, O(N²) overall.
apply_rodrigues_rotation(coords, axis_origin=coords[pivot_i],
                         axis_unit=axis_unit, angle_rad=delta,
                         indices=downstream)

Additional changes in this file:

  • identify_superimposable_candidates: capped at _MAX_BACKBONE_CANDIDATES = 4
    (was unbounded) and deduplicates inline, removing the prune_identical_dicts post-pass. No apparent performence loss.

Change 4 — arc/species/converter.py: Fast path in order_atoms_in_mol_list()

Problem

For species with pre-ordered mol_list, order_atoms_in_mol_list ran VF2
for every resonance structure even when atom IDs already matched positionally.

Fix

if (len(mol.atoms) == len(ref_mol.atoms)
        and all(m.id == r.id and m.id != -1
                for m, r in zip(mol.atoms, ref_mol.atoms))):
    continue

Also changed order_mol_by_atom_map to use reordered.update(sort_atoms=False),
preventing implicit reordering after explicit atom positioning.


Change 5 — arc/family/family.py: Three optimizations in the family-matching pipeline

5a. Adjacency-list CPI cache in determine_possible_reaction_products_from_family

check_product_isomorphism (CPI) was called once per product_list from
generate_products. For large families (e.g. R_Addition on C10H8+H generates
2400 product_lists), most entries are graph-duplicates — same adjacency structure,
different atom-label mappings. The isomorphism result depends only on the graph,
so it is cached by adjacency-list key:

adj_key = tuple(m.to_adjacency_list() for m in template_mols)
if adj_key in iso_cache:
    if not iso_cache[adj_key]: continue
else:
    result = check_product_isomorphism(products=template_mols, p_species=p_species)
    iso_cache[adj_key] = result
    if not result: continue

For reaction C10H9 ↔ C10H8+H: 2400 CPI calls → 150 unique per family (16× reduction).

5b. Shared fwd_iso_cache/rev_iso_cache across family calls

When rmg_family_set='all' is requested, 112 families are iterated, many duplicated
(same label appears as both RMG and ARC copy). With a shared cache, the second occurrence
of any family hits entirely from cache (0 CPI evaluations).

For C16H11 isomerization: CPI calls 794 → 406, time 19.55s → 9.47s.

flip_reaction() is now called once before the loop instead of once per family.
The pre-build is wrapped in try/except to preserve the original graceful-skip
behaviour when to_smiles() fails (e.g. openbabel mocked in tests).

5c. mol_list search limit for large molecules

For molecules with more than 2 resonance structures AND more than 20 atoms,
generate_unimolecular_products limits the search to the first 2 Kekulé forms:

if len(mol_list) > 2 and len(mol_list[0].atoms) > 20:
    mol_list = mol_list[:2]

Additional resonance structures produce near-duplicate product graphs already
deduplicated by the adjacency-list cache at the CPI level. The VF2 cost of
generating them (3–4 s for a 27-atom polycyclic per family) is eliminated.

For C16H11, 27 atoms, 7 resonance structures: generate_products
for Intra_R_Add_Exocyclic drops from ~3s to ~0.9s per call.


Change 6 — arc/species/species.py: skip_conformers on scissors()/_scissors()

Added skip_conformers: bool = False parameter propagated to the generate_conformers
calls inside _scissors(). Allows mapping-layer callers to get the cut species
structure without triggering conformer generation.


kfir4444 added 9 commits June 30, 2026 21:21
Introduces arc/species/_zmat_kernels.py: a thin ctypes wrapper around
_zmat_c_kernels.so that exposes zmat_r_f32, zmat_a_f32, zmat_d_f32 and
zmat_nerf symbols. Sets available=True only when the shared library loads
successfully so all callers can gate on the flag and fall back to numpy
when the .so is absent.
Adds _nerf_cross_sq() to compute |cross(B-A, normalize(C-B))|^2 in
double precision. In determine_d_atoms(), any candidate 4th reference
atom whose triplet falls below _NERF_CROSS_SQ_MIN (0.01, ~3.8 deg from
linear) is rejected so _add_nth_atom_to_coords() never receives a
near-zero cross product that would yield NaN coordinates. The same guard
is applied to the [n, 2, 1, 0] fallback path, with a scan for a valid
alternative when the fallback triplet is itself collinear.
…hedral_angle

When _zmat_kernels.available is True, calculate_distance calls zmat_r_f32
and calculate_dihedral_angle calls zmat_d_f32 instead of the numpy path,
giving 2-2.5x speedup on the 3-element vectors used in zmat construction.
calculate_angle is left as numpy because the C kernel's float32 precision
causes a ~0.001 deg error that breaks test_get_zmat_param_value.

Also adds apply_rodrigues_rotation(): rotates a subset of atoms around an
arbitrary axis using Rodrigues' formula (I cos d + (1-cos d) u⊗u + sin d [u]x),
used by the set_dihedral replacement in engine.py.
Adds a positional-ID check before calling order_atoms(): if every atom in
mol already carries the same .id as the corresponding atom in ref_mol (and
none are -1), the mol is already correctly ordered and the VF2 call is
skipped. Reduces initialization time for species whose mol_list is
returned pre-ordered (e.g. C8H18S: >10 s -> 0.01 s).

Also changes order_mol_by_atom_map to call reordered.update(sort_atoms=False)
so the explicit atom ordering set by the caller is not silently undone by
the update step.
Adds skip_conformers: bool = False to both scissors() and _scissors().
When True, the generate_conformers() calls inside _scissors() are skipped,
letting mapping-layer callers obtain the cut species structure without the
overhead of conformer generation.
fix_dihedrals_by_backbone_mapping() previously corrected each torsion via
set_dihedral(), which performs a full xyz->zmat->modify->zmat->xyz round-
trip: O(N^2) per torsion and O(N^3) overall.

Replaces this with a direct Cartesian rotation (Rodrigues' formula):
1. BFS from the far-side pivot to collect downstream atom indices
   (_downstream_atoms_adj, O(N) using a pre-built adjacency tuple).
2. Rotate only those atoms in-place: R(u,d) = I cos d + (1-cos d) u⊗u
   + sin d [u]x (apply_rodrigues_rotation in vectors.py).
No zmat round-trip: O(N) per torsion, O(N^2) overall.

Also caps identify_superimposable_candidates at _MAX_BACKBONE_CANDIDATES=4
and deduplicates inline, removing the prune_identical_dicts post-pass.
…and mol_list limit

Three targeted optimizations that together reduce timeouts at 60 s from 8
to 2 across 452 benchmark reactions (succeeded: 286 -> 292):

1. Adjacency-list CPI cache in determine_possible_reaction_products_from_family:
   check_product_isomorphism is now cached by tuple(m.to_adjacency_list())
   of the template molecules. Many product_lists from generate_products share
   the same graph; the result is computed once per unique graph and reused.
   For entry_269 (C10H9<->C10H8+H): 2400 CPI calls -> 150 per family.

2. Shared fwd_iso_cache / rev_iso_cache across all family calls in
   get_reaction_family_products: when rmg_family_set='all' lists 112 families
   with duplicates, the second occurrence of each family hits entirely from
   cache. For entry_244 (C16H11): CPI calls 794->406, time 19.55->9.47 s.
   flip_reaction() is also moved before the loop (called once, not per-family)
   and the pre-build is wrapped in try/except so that a ValueError from
   to_smiles() (e.g. mocked openbabel in tests) still skips cleanly.

3. mol_list search limit in generate_unimolecular_products: for molecules
   with >20 atoms and >2 resonance structures, only the first 2 Kekule forms
   are searched. Additional forms produce near-duplicate product graphs
   already deduplicated by the CPI cache; their VF2 cost (3-4 s for a
   27-atom polycyclic per family) is eliminated. For entry_244 (C16H11,
   7 resonance structures): generate_products for Intra_R_Add_Exocyclic
   drops from ~3 s to ~0.9 s per call.
Adds a .gitignore exception for arc/species/_zmat_c_kernels.{c,so} so
that the hand-written C geometry kernel and its compiled shared library
are versioned alongside the Python loader (_zmat_kernels.py). All other
*.so and *.c files remain ignored.
Extends the existing compile target with a gcc invocation that rebuilds
arc/species/_zmat_c_kernels.so from its C source using the same
optimization flags recorded in the source header (-O3 -march=native
-ffast-math). The Cython arc.molecule build runs first, unchanged.
Comment thread arc/mapping/engine.py Fixed
Comment thread arc/species/_zmat_kernels.py Fixed
Comment thread arc/species/_zmat_kernels.py Fixed
Comment thread arc/species/zmat.py Fixed
Comment thread arc/species/zmat.py Fixed
Comment thread arc/species/zmat.py Fixed
kfir4444 and others added 2 commits July 1, 2026 09:38
- Add zmat_a_f32 fast path in calculate_angle (same float32 kernel already
  used for distance and dihedral); degrades gracefully when .so absent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Float32 kernel has ~0.001 deg error vs numpy float64 path; places=4/5
is too tight. places=2 (0.005 deg tolerance) is more than sufficient
for all chemistry use cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kfir4444 and others added 5 commits July 1, 2026 11:01
calculate_angle now uses the C float32 kernel; A8 rounds to 110.9983
instead of 110.9984 (0.0001° difference, within chemistry tolerance).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In CI environments using micromamba, the condabin/conda shim fails with
  ModuleNotFoundError: No module named 'conda'
because micromamba does not ship the conda Python package. This caused all
Arkane runs (run_arkane, save_arkane_thermo, get_qm_corrections,
get_point_groups) to silently fail, so functional rate tests never produced
reactions.py.

When RMG_PYTHON is resolvable (find_executable already handles MAMBA_ROOT_PREFIX,
CONDA_PREFIX, etc.), call it directly and prepend its bin dir to PATH so
co-installed binaries (e.g. symmetry) are found. Fall back to the existing
conda/micromamba/mamba shim detection when RMG_PYTHON is not available.
…ty excepts

- engine.py: drop unused `get_all_neighbors` import from arc.species.zmat
- _zmat_kernels.py: drop unused `math` import; add comment on bare except
  (the except is intentional — .so absent/mis-compiled → available stays False)
- zmat.py: add comments to three bare `except (IndexError, KeyError): pass`
  clauses so CodeQL and readers understand they are intentional fallbacks
When calling RMG_PYTHON directly (without conda activation), the subprocess
inherits arc_env's CONDA_PREFIX and LD_LIBRARY_PATH. This causes rmg_env's
Python to load arc_env's OpenBabel plugins (which have undefined symbols
against rmg_env's OB ABI), silently killing Arkane before it can produce
any output.

setup-micromamba exports MAMBA_EXE=/path/to/micromamba into the environment.
Prefer '"$MAMBA_EXE" run -n rmg_env' when that var is set — micromamba run
properly isolates rmg_env (resets CONDA_PREFIX, LD_LIBRARY_PATH, etc.).
Fall back to direct RMG_PYTHON for mambaforge/conda environments where
MAMBA_EXE is absent but conda/micromamba shims are also unavailable.

Applied to all four call sites: run_arkane(), save_arkane_thermo,
get_qm_corrections, get_point_groups.
Same root cause as the Arkane fix: micromamba activate / bash -lc patterns
call the condabin/conda shim which fails with ModuleNotFoundError when conda
is not installed in the micromamba base env.

- compare_thermo: replace bash -lc + micromamba run with MAMBA_EXE run (preferred),
  RMG_PYTHON direct call (fallback), then old bash -lc detection (last resort)
- compare_rates: replace micromamba activate + python with the same three-tier hierarchy

This fixes functional/restart_test.py::test_restart_rate_1/2 which were
failing pre-existing on main for the same reason.
kfir4444 added 3 commits July 3, 2026 08:46
The 1,2_NH3_elimination TS interpolation test was checking specific
atom indices (0, 1, 5) for N-N and N-H bond distances. Triazene has
two equivalent terminal N atoms, so a symmetric atom map produces a
valid but differently-ordered TS that fails the hardcoded index checks.

Replace with element-based pairwise distance scans:
- Check any N-N pair has d ≈ 1.87 Å (breaking bond)
- Check ≥2 N-H pairs have d ≈ 1.46 Å (breaking + forming H-transfer)
Three tests in linear_test.py relied on hardcoded atom indices or a
fixed product_dict order that breaks when a symmetric atom map or a
different product_dicts ordering (hash-dependent in Python 3.14) is
chosen:

- test_type_p_uses_product_topology: product_dict_index_to_try=0 may
  resolve to a non-H-abstraction family; now tries all indices until
  map_rxn returns a non-None map.

- test_interpolate_intra_oh_migration: hardcoded coords[0/2/3/4/5/8]
  replaced with element-based pairwise scans over O-O, C-O, and O-H
  distances; exact-coordinate expected_ts check removed (covered by the
  structural distance criteria).
@codecov

codecov Bot commented Jul 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.98%. Comparing base (e7f2c98) to head (245c019).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #906      +/-   ##
==========================================
- Coverage   63.15%   62.98%   -0.18%     
==========================================
  Files         114      115       +1     
  Lines       38178    38472     +294     
  Branches     9990    10055      +65     
==========================================
+ Hits        24113    24233     +120     
- Misses      11183    11317     +134     
- Partials     2882     2922      +40     
Flag Coverage Δ
functionaltests 62.98% <ø> (-0.18%) ⬇️
unittests 62.98% <ø> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets major runtime reductions in ARC’s atom mapping and family product-generation workflows by adding safer geometry handling, introducing optional C-backed geometry kernels, and eliminating expensive zmat round-trips in torsion alignment.

Changes:

  • Adds NeRF collinearity guards in zmat reference-atom selection to avoid NaN coordinate generation on (near-)linear geometries.
  • Introduces optional C kernels (ctypes-loaded) for hot-path vector computations and wires fast paths into vectors.py.
  • Replaces set_dihedral() zmat round-trips with direct Rodrigues rotations in the mapping engine; adds additional performance caps/caches across mapping/family/product pipelines.

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Makefile Adds compilation step for the _zmat_c_kernels.so shared library.
arc/statmech/arkane.py Improves Arkane invocation by preferring MAMBA_EXE / configured RMG_PYTHON when available.
arc/species/zmat.py Adds _nerf_cross_sq() + threshold gating to avoid collinear NeRF reference triplets.
arc/species/vectors.py Wires optional C-kernel fast paths for distance/angle/dihedral; adds Rodrigues rotation helper.
arc/species/vectors_test.py Relaxes angle-related assertions to accommodate kernel precision differences.
arc/species/species.py Adds skip_conformers option to scissors()/_scissors() to avoid costly conformer generation when not needed.
arc/species/converter.py Adds a VF2 fast-path in order_atoms_in_mol_list(); prevents implicit reordering in order_mol_by_atom_map().
arc/species/converter_test.py Relaxes zmat angle assertion precision consistent with kernel path behavior.
arc/species/_zmat_kernels.py Adds ctypes loader for _zmat_c_kernels and availability gating.
arc/species/_zmat_c_kernels.c Adds C implementations for distance/angle/dihedral and SN‑NeRF placement.
arc/processor.py Uses MAMBA_EXE / RMG_PYTHON fast paths for RMG thermo/kinetics helper scripts.
arc/output.py Uses MAMBA_EXE / RMG_PYTHON fast paths for QM corrections and point group helpers.
arc/mapping/engine.py Caps backbone candidates; replaces torsion alignment with adjacency/BFS + Rodrigues rotation loop.
arc/job/adapters/ts/linear_test.py Makes TS interpolation tests more robust to symmetric atom-map choices.
arc/job/adapters/cfour_test.py Updates a reference value by 1e-4.
arc/family/family.py Adds CPI caching by adjacency list, shared caches across families, and resonance-search limiting for large molecules.
.gitignore Explicitly allows committing the C kernel source and shared library artifacts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Makefile
Comment on lines +127 to +129
gcc -O3 -march=native -ffast-math -shared -fPIC \
-o arc/species/_zmat_c_kernels.so \
arc/species/_zmat_c_kernels.c -lm

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the copilot comment. On a different x86-64 CPU with an older microarchitecture you could get SIGILL that kills Python. let's rethink it

Comment on lines +18 to +19
_HERE = os.path.dirname(os.path.abspath(__file__))
_SO = os.path.join(_HERE, '_zmat_c_kernels.so')
Comment on lines +49 to +50
except Exception:
pass # .so absent or mis-compiled; available stays False → pure-Python fallback used
Comment thread arc/mapping/engine.py
Comment on lines +623 to +639
angle = 0.5 * (a1 + a2)

# Rodrigues on coords_1
delta = math.radians(angle - a1)
pi, pj = t1[1], t1[2]
key1 = (pi, pj)
if key1 not in ds_cache_1:
ds_cache_1[key1] = _downstream_atoms_adj(adj_1, pi, pj)
pi_c = coords_1[pi]; pj_c = coords_1[pj]
dx = pj_c[0]-pi_c[0]; dy = pj_c[1]-pi_c[1]; dz = pj_c[2]-pi_c[2]
bl = math.sqrt(dx*dx+dy*dy+dz*dz)
if bl > 1e-8:
inv = 1.0/bl
apply_rodrigues_rotation(coords_1, pi_c, (dx*inv, dy*inv, dz*inv), delta, ds_cache_1[key1])

# Rodrigues on coords_2
delta2 = math.radians(angle - a2)
Comment thread arc/species/vectors.py
Comment on lines +204 to +207
if _ck_available:
c0, c1, c2 = coords[new_atoms[0]], coords[new_atoms[1]], coords[new_atoms[2]]
deg = _ck.zmat_a_f32(c0[0], c0[1], c0[2], c1[0], c1[1], c1[2], c2[0], c2[1], c2[2])
return deg if 'degs' in units else math.radians(deg)
@alongd

alongd commented Jul 3, 2026

Copy link
Copy Markdown
Member

@kfir4444, nice CI speedup!

@alongd alongd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added some comments

Comment thread arc/family/family.py
mol_list = reactants[0].mol_list or [reactants[0].mol]
# For large molecules (>20 atoms) with many resonance structures, limit the search
# to avoid combinatorial explosion. Most reaction sites are captured by the first
# few Kekulé forms; additional resonance structures produce equivalent sites already

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resonance doesn't always mean Kekulé / Clar type. there are many others, including complex N and S chemistries, and conjugated allyls.
I'm not sure we can use this bypass. I'm OK to use it for most of the tests (keep some that follow the original branch).

Comment thread Makefile
Comment on lines +127 to +129
gcc -O3 -march=native -ffast-math -shared -fPIC \
-o arc/species/_zmat_c_kernels.so \
arc/species/_zmat_c_kernels.c -lm

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the copilot comment. On a different x86-64 CPU with an older microarchitecture you could get SIGILL that kills Python. let's rethink it

* building numpy arrays, eliminating the dominant overhead in the hot path.
*
* Compiled with:
* gcc -O3 -march=native -ffast-math -shared -fPIC -o _zmat_c_kernels.so \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-ffast-math - if we compile with finite-math-only (NaN/Inf never occur), it will return 0 instead of true NaN results. If we try to guard against co-linearity in dihedrals, then this isn't assisting.
add -fno-finite-math-only or drop -ffast-math?

Comment thread arc/mapping/engine.py
return fingerprint_dict


_MAX_BACKBONE_CANDIDATES = 4

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In symmetric molecules the best superimposable backbone map can be beyond the first 4.
Why is 4 safe? How do we know accuracy is preserved?

Comment thread arc/mapping/engine.py
return tuple(tuple(atom_to_idx[nb] for nb in atom.edges) for atom in mol.atoms)


def _downstream_atoms_adj(adj: tuple, pivot_i: int, pivot_j: int) -> list[int]:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this cuts the pivot_i–pivot_j bond and BFSs the far side. But if that bond is in a ring, the graph stays connected. Needs fixing

Comment thread arc/mapping/engine.py
return True


def _set_dihedral_rodrigues(spc: ARCSpecies, torsion: list[int], deg_abs: float) -> bool:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this func used anywhere?

Comment thread arc/mapping/engine.py
return _downstream_atoms_adj(_build_adj(mol), pivot_i, pivot_j)


def _rodrigues_on_coords(coords: list, adj: tuple, torsion: list[int], deg_abs: float) -> bool:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this func used anywhere?

Comment thread arc/mapping/engine.py
return downstream


def _downstream_atoms(mol: Molecule, pivot_i: int, pivot_j: int) -> list[int]:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this func used anywhere?

Comment thread arc/mapping/engine.py
if result is not None:
if result is not None and result not in candidates:
candidates.append(result)
return prune_identical_dicts(candidates)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't use prune_identical_dicts here, is that function still used anywhere?

@@ -0,0 +1,50 @@
"""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not ship it pre-compiled, add it to the make compile command?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants