gh-149427: Improve performance of re.compile() for patterns with character ranges #149428

Open
eendebakpt wants to merge 3 commits into python:main from eendebakpt:regex_compile_charset_memcpy
@eendebakpt eendebakpt commented May 5, 2026

read-the-docs-community Bot commented May 5, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32753205 | 📁 Comparing 56e25cd against main (f6d16a0)

  🔍 Preview build  

43 files changed · ± 42 modified · - 1 deleted

± Modified

- Deleted

maurycy commented May 5, 2026

I confirm the numbers as of e7ca3ef

| Benchmark | baseline | patched |
|---|---|---|
| compile-charsets-small | 1.24 ms | 584 us: 2.12x faster |
| compile-charsets-big | 745 us | 706 us: 1.06x faster |
| Geometric mean | (ref) | 1.49x faster |
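
For anyone wanting to reproduce a measurement of this kind locally, here is a rough sketch using only the standard library. It is not the benchmark used above: the pattern and the `time_compile` helper are made up for illustration, and `re.purge()` is called before each compile so that the pattern cache does not hide the compile cost.

```python
import re
import timeit


def time_compile(pattern: str, number: int = 200) -> float:
    """Return the mean time in seconds of one uncached re.compile(pattern)."""
    def run() -> None:
        re.purge()           # drop cached patterns so compile does real work
        re.compile(pattern)
    return timeit.timeit(run, number=number) / number


# Hypothetical pattern with several character ranges (an assumption,
# not one of the patterns from the benchmark table above).
pattern = r"[a-zA-Z0-9_][a-fA-F0-9]+[\x00-\x1f]"
print(f"{time_compile(pattern) * 1e6:.1f} us per compile")
```

Absolute numbers from a loop like this will differ from the table, since they depend on the machine and the build; only the baseline/patched ratio on the same machine is meaningful.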

Comment thread Lib/re/_compiler.py Outdated

When `av[0]` and `av[1]` were each accessed only once, the indexing overhead didn't matter much and the meaning was clearer. But now they are used a lot more. Perhaps change to `start, end = av`, `end += 1`, `r = range(start, end)`; the non-fixup path can then use `start` and `end` instead of repeatedly reindexing `av[0]`/`av[1]`.
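
A minimal sketch of what that suggestion could look like, written as a standalone function rather than the actual code in `Lib/re/_compiler.py`. The `mark_range` helper and the bulk slice assignment are assumptions for illustration; it also assumes the range fits inside `charmap`, since a slice assignment past the end would grow the bytearray instead of raising.

```python
def mark_range(charmap: bytearray, av: tuple[int, int]) -> None:
    """Hypothetical helper: set charmap[i] = 1 for each i in the inclusive range av."""
    start, end = av      # unpack once instead of indexing av[0]/av[1] repeatedly
    end += 1             # turn the inclusive upper bound into an exclusive one
    # A fixup path could iterate over range(start, end) here; the non-fixup
    # path reuses start and end directly for a bulk fill.
    charmap[start:end] = b"\x01" * (end - start)


charmap = bytearray(256)
mark_range(charmap, (ord("a"), ord("z")))
print(sum(charmap))  # number of marked entries
```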

3 participants