A fuzzy matching toolkit for noisy names and transliterations. It combines normalization, phonetic encoding, candidate generation, and ranking to improve recall and precision when matching names that differ by spelling variants or script conversions.
- Name normalization and cleanup utilities
- Phonetic keys for sound-alike matching
- Edit-distance scoring and ranking
- Candidate generation to keep search efficient
- Normalize input strings (case, spacing, punctuation, common variants)
- Generate phonetic keys to capture sound-alike matches
- Create candidates using prefixes, phonetic buckets, or blocking
- Rank results using edit distance and weighted scoring
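The four steps above can be sketched end to end. This is a minimal illustration and not the toolkit's actual API: the function names (`normalize`, `phonetic_key`, `edit_distance`, `build_index`, `rank`) are hypothetical, the phonetic key is a simplified Soundex-style code that only handles Latin letters, and the score is a plain length-normalized Levenshtein similarity.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_marks.lower())
    return " ".join(cleaned.split())

# Soundex-style consonant groups (assumption: Latin-script input only).
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}

def phonetic_key(name: str) -> str:
    """Simplified Soundex-style key: first letter plus up to three digits."""
    letters = [c for c in normalize(name) if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            key += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (key + "000")[:4]

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def build_index(names):
    """Group names into phonetic buckets (the blocking step)."""
    index = defaultdict(list)
    for n in names:
        index[phonetic_key(n)].append(n)
    return index

def rank(query, index):
    """Score only the candidates that share the query's phonetic bucket."""
    q = normalize(query)
    scored = []
    for cand in index.get(phonetic_key(query), []):
        c = normalize(cand)
        similarity = 1.0 - edit_distance(q, c) / max(len(q), len(c), 1)
        scored.append((cand, round(similarity, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

index = build_index(["Mohammed", "Muhammad", "Mohamed", "Martha"])
print(rank("Mohammad", index))
```

In this sketch "Martha" falls into a different phonetic bucket, so it is never compared against the query; only the three sound-alike spellings are scored.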
- src/: core matching and conversion logic
- examples/: sample inputs and usage
- tests/: unit tests (optional)
- data/: sample datasets (optional)
- Provide two names (or a query and a candidate list)
- The library returns the top matches with scores and, optionally, the intermediate signals behind each score
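As a rough picture of that input and output shape, the sketch below takes a query and a candidate list and returns the top matches with scores, plus optional signals. The `Match` class, the `top_matches` function, and its `k` and `with_signals` parameters are hypothetical names chosen for illustration; the similarity comes from the standard library's `difflib.SequenceMatcher` as a stand-in, not from the toolkit's own scorer.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Match:
    candidate: str
    score: float
    signals: dict = field(default_factory=dict)  # empty unless requested

def top_matches(query, candidates, k=3, with_signals=False):
    """Return the k best candidates for the query, highest score first."""
    q = query.casefold().strip()
    results = []
    for cand in candidates:
        c = cand.casefold().strip()
        ratio = SequenceMatcher(None, q, c).ratio()
        signals = {}
        if with_signals:
            # Intermediate values that explain the final score.
            signals = {"ratio": ratio, "same_initial": q[:1] == c[:1]}
        results.append(Match(cand, round(ratio, 3), signals))
    results.sort(key=lambda m: m.score, reverse=True)
    return results[:k]

for match in top_matches("Katherine", ["Catherine", "Kathryn", "Bob"], k=2,
                         with_signals=True):
    print(match.candidate, match.score, match.signals)
```

Comparing just two names is the same call with a one-element candidate list, for example `top_matches("Katherine", ["Catherine"])`.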
- Candidate blocking is used to reduce comparisons
- Scoring can be tuned based on the domain (names, locations, entities)
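One way to picture domain tuning is a weighted sum over per-signal similarities, with a separate weight profile for each domain. The sketch below is an assumption about how such tuning could look: the signal names (`edit`, `phonetic`, `prefix`), the `DOMAIN_WEIGHTS` table, and the weight values are all made up for illustration and are not defaults shipped with this project.

```python
# Illustrative weight profiles; every value here is a placeholder.
DOMAIN_WEIGHTS = {
    "names": {"edit": 0.5, "phonetic": 0.4, "prefix": 0.1},
    "locations": {"edit": 0.7, "phonetic": 0.1, "prefix": 0.2},
}

def weighted_score(signals, domain):
    """Combine per-signal similarities in [0, 1] using the domain's weights."""
    weights = DOMAIN_WEIGHTS[domain]
    total = sum(weights.values())
    # Missing signals count as 0.0; dividing by the total keeps the result in [0, 1].
    return sum(w * signals.get(name, 0.0) for name, w in weights.items()) / total

signals = {"edit": 0.6, "phonetic": 1.0, "prefix": 0.0}
print(weighted_score(signals, "names"))
print(weighted_score(signals, "locations"))
```

With these placeholder weights, the same pair scores higher under the `names` profile than under `locations`, because the phonetic agreement counts for more there.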
Add your preferred license.