Skip to content

Move html characeter converting mechanism to regexp-handling#1570

Merged
tompng merged 1 commit into
ruby:masterfrom
tompng:html_chars_regexp_handling
May 12, 2026
Merged

Move html characeter converting mechanism to regexp-handling#1570
tompng merged 1 commit into
ruby:masterfrom
tompng:html_chars_regexp_handling

Conversation

@tompng
Copy link
Copy Markdown
Member

@tompng tompng commented Jan 19, 2026

Text#to_html_characters was a postprocess that converts ascii quotes/marks to multibyte characters. Postprocessing HTML to do that is not a good idea. Convert plain text node with regexp-handling is better.

@tompng tompng temporarily deployed to fork-preview-protection January 19, 2026 17:54 — with GitHub Actions Inactive
Comment thread lib/rdoc/text.rb
h[encoding] = {
:close_dquote => encode_fallback('”', encoding, '"'),
:close_squote => encode_fallback('’', encoding, '\''),
:copyright => encode_fallback('©', encoding, '(c)'),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think html character is ©, not ©, but it's another issue.

@matzbot
Copy link
Copy Markdown
Collaborator

matzbot commented Jan 19, 2026

🚀 Preview deployment available at: https://0733bf1b.rdoc-6cd.pages.dev (commit: d8f23d9)

Comment thread lib/rdoc/markup/to_html.rb Outdated
Comment thread lib/rdoc/markup/to_html.rb Outdated
Comment thread lib/rdoc/markup/to_html.rb Outdated
@insquotes = true
end
end
TO_HTML_CHARACTERS[quote.encoding][type] if type
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could TO_HTML_CHARACTERS[quote.encoding] ever be nil and causes NoMethodError?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TO_HTML_CHARACTERS = Hash.new do |h, encoding| ... end will generate a new hash if there is no key.

Comment thread lib/rdoc/markup/to_html.rb Outdated
Copilot AI review requested due to automatic review settings May 12, 2026 12:30
@tompng tompng force-pushed the html_chars_regexp_handling branch from 6eb654a to d64c2b8 Compare May 12, 2026 12:30
@tompng tompng requested a deployment to fork-preview-protection May 12, 2026 12:30 — with GitHub Actions Waiting
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves “smart punctuation” (quotes/dashes/ellipsis/(c)/(r) conversions) from RDoc::Text’s HTML post-processing into RDoc::Markup::ToHtml’s inline regexp-handling so conversions apply only to plain-text nodes (and avoid rewriting already-generated HTML).

Changes:

  • Removed RDoc::Text#to_html_characters (and related helpers) and stopped post-processing rendered HTML.
  • Added regexp-based handling in RDoc::Markup::ToHtml for HTML-character aliases and quote conversion, scoped to inline/plain-text processing.
  • Updated snippet and crossref behaviors/tests to reflect the new conversion location and boundaries.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
lib/rdoc/text.rb Removes the old HTML post-processing conversion mechanism from RDoc::Text.
lib/rdoc/markup/to_html.rb Introduces encoding-aware character mapping + regexp-handling and quote conversion in the HTML formatter.
lib/rdoc/markup/to_html_snippet.rb Aligns snippet truncation ellipsis and HTML conversion with the new formatter behavior.
test/rdoc/rdoc_text_test.rb Removes tests that were specific to the deleted RDoc::Text conversion API.
test/rdoc/markup/to_html_test.rb Adds coverage for the new formatter-level conversion behavior and encoding fallback.
test/rdoc/markup/to_html_snippet_test.rb Updates expected truncation output to use the new ellipsis behavior.
test/rdoc/markup/to_html_crossref_test.rb Updates expectations where generated crossref text is no longer post-processed into smart quotes.
Comments suppressed due to low confidence (1)

lib/rdoc/text.rb:172

  • RDoc::Text#to_html / #to_html_characters were removed here. If any external callers still depend on these helpers (they previously handled quote/dash/ellipsis substitutions post-render), this is a breaking API change. Consider leaving a compatibility method that forwards to the new regexp-handling implementation (or raising a clearer error/deprecation) to avoid silent behavioral differences across versions.
  ##
  # Wraps +txt+ to +line_len+

  def wrap(txt, line_len = 76)
    res = []

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/rdoc/text.rb
:undef => :replace, :replace => fallback)
end

##
Comment thread lib/rdoc/markup/to_html.rb Outdated
`Text#to_html_characters` was a postprocess that converts ascii quotes/marks to multibyte characters.
Postprocessing HTML to do thaat is not a good idea. Convert plain text node is better.
@tompng tompng force-pushed the html_chars_regexp_handling branch from d64c2b8 to d8f23d9 Compare May 12, 2026 13:15
@tompng tompng temporarily deployed to fork-preview-protection May 12, 2026 13:15 — with GitHub Actions Inactive
@tompng tompng merged commit 4d6b725 into ruby:master May 12, 2026
41 checks passed
@tompng tompng deleted the html_chars_regexp_handling branch May 12, 2026 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants