Merge pull request #50 from digitalmoksha/bw-fix-benchmark
Fix and remove old benchmark code
gjtorikian committed Jun 5, 2024
2 parents 271d7db + 660a9cd commit 36cde72
Showing 9 changed files with 216 additions and 148,499 deletions.
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -3,4 +3,7 @@
"[ruby]": {
"editor.defaultFormatter": "Shopify.ruby-lsp"
},
"[markdown]": {
"editor.defaultFormatter": "esbenp.prettier-vscode"
},
}
8 changes: 2 additions & 6 deletions Gemfile
@@ -28,13 +28,9 @@ group :lint do
end

group :benchmark do
# benchmark stuff
gem "benchmark-ips"
gem "commonmarker"
gem "gemoji"
gem "html-pipeline"
gem "rouge"
gem "sanitize", "~> 6.0"
gem "nokolexbor"
gem "sanitize"
end

gem "ruby-lsp", "~> 0.11", group: :development
124 changes: 102 additions & 22 deletions README.md
@@ -182,38 +182,118 @@ The `element` argument in `handle_element` has the following methods:

## Benchmarks

Running `bundle exec rake benchmark` runs two different benchmarks. Here are the results on my machine.

### Benchmarks for just the sanitization process

Comparing Selma against popular Ruby sanitization gems:

<!-- prettier-ignore-start -->
<details>
<pre>
ruby test/benchmark.rb
ruby test/benchmark.rb
Warming up --------------------------------------
sanitize-document-huge
1.000 i/100ms
selma-document-huge 1.000 i/100ms
sanitize-sm 15.000 i/100ms
selma-sm 126.000 i/100ms
Calculating -------------------------------------
sanitize-sm 155.074 (± 1.9%) i/s - 4.665k in 30.092214s
selma-sm 1.290k (± 1.3%) i/s - 38.808k in 30.085333s

Comparison:
selma-sm: 1290.1 i/s
sanitize-sm: 155.1 i/s - 8.32x slower

input size = 86686 bytes, 0.09 MB

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
sanitize-md 3.000 i/100ms
selma-md 33.000 i/100ms
Calculating -------------------------------------
sanitize-document-huge
0.257 (± 0.0%) i/s - 2.000 in 7.783398s
selma-document-huge 4.602 (± 0.0%) i/s - 23.000 in 5.002870s
sanitize-md 40.321 (± 5.0%) i/s - 1.206k in 30.004711s
selma-md 337.417 (± 1.5%) i/s - 10.131k in 30.032772s

Comparison:
selma-md: 337.4 i/s
sanitize-md: 40.3 i/s - 8.37x slower

input size = 7172510 bytes, 7.17 MB

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
sanitize-document-medium
2.000 i/100ms
selma-document-medium
22.000 i/100ms
sanitize-lg 1.000 i/100ms
selma-lg 1.000 i/100ms
Calculating -------------------------------------
sanitize-document-medium
28.676 (± 3.5%) i/s - 144.000 in 5.024669s
selma-document-medium
121.500 (±22.2%) i/s - 594.000 in 5.135410s
sanitize-lg 0.144 (± 0.0%) i/s - 5.000 in 34.772526s
selma-lg 4.026 (± 0.0%) i/s - 121.000 in 30.067415s

Comparison:
selma-lg: 4.0 i/s
sanitize-lg: 0.1 i/s - 27.99x slower
</pre>
</details>
<!-- prettier-ignore-end -->

### Benchmarks for just the rewriting process

Comparing Selma against popular Ruby HTML parsing gems:

<!-- prettier-ignore-start -->
<details>
<pre>

input size = 25309 bytes, 0.03 MB

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
nokogiri-sm 79.000 i/100ms
nokolexbor-sm 285.000 i/100ms
selma-sm 244.000 i/100ms
Calculating -------------------------------------
nokogiri-sm 807.790 (± 3.1%) i/s - 24.253k in 30.056301s
nokolexbor-sm 2.880k (± 6.4%) i/s - 86.070k in 30.044766s
selma-sm 2.508k (± 1.2%) i/s - 75.396k in 30.068792s

Comparison:
nokolexbor-sm: 2880.3 i/s
selma-sm: 2507.8 i/s - 1.15x slower
nokogiri-sm: 807.8 i/s - 3.57x slower

input size = 86686 bytes, 0.09 MB

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
nokogiri-md 8.000 i/100ms
nokolexbor-md 43.000 i/100ms
selma-md 39.000 i/100ms
Calculating -------------------------------------
nokogiri-md 87.367 (± 3.4%) i/s - 2.624k in 30.061642s
nokolexbor-md 438.782 (± 3.9%) i/s - 13.158k in 30.031163s
selma-md 392.591 (± 3.1%) i/s - 11.778k in 30.031391s

Comparison:
nokolexbor-md: 438.8 i/s
selma-md: 392.6 i/s - 1.12x slower
nokogiri-md: 87.4 i/s - 5.02x slower

input size = 7172510 bytes, 7.17 MB

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
sanitize-document-small
10.000 i/100ms
selma-document-small 20.000 i/100ms
nokogiri-lg 1.000 i/100ms
nokolexbor-lg 1.000 i/100ms
selma-lg 1.000 i/100ms
Calculating -------------------------------------
sanitize-document-small
107.280 (± 0.9%) i/s - 540.000 in 5.033850s
selma-document-small 118.867 (±31.1%) i/s - 540.000 in 5.080726s
nokogiri-lg 0.895 (± 0.0%) i/s - 27.000 in 30.300832s
nokolexbor-lg 2.163 (± 0.0%) i/s - 65.000 in 30.085656s
selma-lg 5.867 (± 0.0%) i/s - 176.000 in 30.006240s

Comparison:
selma-lg: 5.9 i/s
nokolexbor-lg: 2.2 i/s - 2.71x slower
nokogiri-lg: 0.9 i/s - 6.55x slower
</pre>
</details>
<!-- prettier-ignore-end -->
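
To make the numbers above easier to read, here is a minimal sketch of how the "x slower" factors and the "input size" lines can be reproduced from the reported figures. `bytes_to_megabytes` mirrors the helper in `test/benchmark.rb`; `slowdown_factor` is a hypothetical name used only for illustration. It divides the rounded iterations-per-second values printed in the comparison tables, so it approximates what benchmark-ips reports rather than reusing its internal calculation.

```ruby
# frozen_string_literal: true

# Mirrors the helper in test/benchmark.rb: decimal megabytes, two places.
def bytes_to_megabytes(bytes)
  (bytes.to_f / 1_000_000).round(2)
end

# Hypothetical helper (not part of Selma or benchmark-ips): how many times
# slower `other_ips` is than `fastest_ips`, both in iterations per second.
def slowdown_factor(fastest_ips, other_ips)
  (fastest_ips / other_ips.to_f).round(2)
end

# Sanitization, small document: selma-sm vs. sanitize-sm
puts slowdown_factor(1290.1, 155.1)  # → 8.32

# Rewriting, small document: nokolexbor-sm vs. selma-sm and nokogiri-sm
puts slowdown_factor(2880.3, 2507.8) # → 1.15
puts slowdown_factor(2880.3, 807.8)  # → 3.57

# Input sizes as printed in the benchmark output
puts bytes_to_megabytes(86_686)      # → 0.09
puts bytes_to_megabytes(7_172_510)   # → 7.17
```

For the large document the rounded figures are too coarse to recover the reported factor, so prefer the factor that the benchmark output prints directly.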

## Contributing

126 changes: 70 additions & 56 deletions test/benchmark.rb
@@ -1,85 +1,99 @@
# frozen_string_literal: true

require "benchmark/ips"
require "html/pipeline"
require "commonmarker"
require "sanitize"
require "selma"
require_relative "benchmark/selma_config"

REWRITE_INPUT = File.read("test/benchmark/rewrite_benchmark_input.md").freeze
require "sanitize"
require "nokogiri"
require "nokolexbor"

DIR = File.expand_path(File.dirname(__FILE__))

DOCUMENT_SMALL = File.read("#{DIR}/benchmark/html/document-sm.html").encode("UTF-8", invalid: :replace, undef: :replace)
DOCUMENT_MEDIUM = File.read("#{DIR}/benchmark/html/document-md.html").encode("UTF-8", invalid: :replace, undef: :replace)
DOCUMENT_HUGE = File.read("#{DIR}/benchmark/html/document-lg.html").encode("UTF-8", invalid: :replace, undef: :replace)

DOCUMENTS = [
[DOCUMENT_SMALL, "sm"],
[DOCUMENT_MEDIUM, "md"],
[DOCUMENT_HUGE, "lg"],
]

IPS_ARGS = { time: 30, warmup: 10 }

def bytes_to_megabytes(bytes)
(bytes.to_f / 1_000_000).round(2)
end

DIR = File.expand_path(File.dirname(__FILE__))

DOCUMENT_HUGE = File.read("#{DIR}/benchmark/html/document-huge.html").encode("UTF-8", invalid: :replace, undef: :replace)
DOCUMENT_MEDIUM = File.read("#{DIR}/benchmark/html/document-medium.html").encode("UTF-8", invalid: :replace, undef: :replace)
DOCUMENT_SMALL = File.read("#{DIR}/benchmark/html/document-small.html").encode("UTF-8", invalid: :replace, undef: :replace)

FRAGMENT_LARGE = File.read("#{DIR}/benchmark/html/fragment-large.html").encode("UTF-8", invalid: :replace, undef: :replace)
FRAGMENT_SMALL = File.read("#{DIR}/benchmark/html/fragment-small.html").encode("UTF-8", invalid: :replace, undef: :replace)
def print_size(html)
bytes = html.bytesize
mbes = bytes_to_megabytes(bytes)
puts("input size = #{bytes} bytes, #{mbes} MB\n\n")
end

def compare_sanitize
sanitize_config = Sanitize::Config::RELAXED
[[DOCUMENT_HUGE, "huge"], [DOCUMENT_MEDIUM, "medium"], [DOCUMENT_SMALL, "small"]].each do |(html, label)|
DOCUMENTS.each do |(html, label)|
print_size(html)
Benchmark.ips do |x|
x.report("sanitize-document-#{label}") do
Sanitize.document(html, sanitize_config)
x.config(IPS_ARGS)

x.report("sanitize-#{label}") do
Sanitize.document(html, Sanitize::Config::RELAXED)
end

x.report("selma-document-#{label}") do
Selma::HTML.new(html, sanitize: Selma::Sanitizer::Config::RELAXED).rewrite
x.report("selma-#{label}") do
sanitizer = Selma::Sanitizer.new(Selma::Sanitizer::Config::RELAXED)
Selma::Rewriter.new(sanitizer: sanitizer).rewrite(html)
end

x.compare!
end
end
end

def compare_rewriting
bytes = REWRITE_INPUT.bytesize
mbes = bytes_to_megabytes(bytes)
puts("input size = #{bytes} bytes, #{mbes} MB\n\n")

Benchmark.ips do |x|
x.report("html-pipeline") do
context = {
asset_root: "http://your-domain.com/where/your/images/live/icons",
base_url: "http://your-domain.com",
asset_proxy: "https//assets.example.org",
asset_proxy_secret_key: "ssssh-secret",
}
pipeline = HTML::Pipeline.new(
[
HTML::Pipeline::MarkdownFilter,
HTML::Pipeline::SanitizationFilter,
HTML::Pipeline::CamoFilter,
HTML::Pipeline::ImageMaxWidthFilter,
HTML::Pipeline::HttpsFilter,
HTML::Pipeline::MentionFilter,
HTML::Pipeline::EmojiFilter,
HTML::Pipeline::SyntaxHighlightFilter,
],
context.merge(gfm: true),
)
result = pipeline.call(REWRITE_INPUT)
result[:output].to_s
nokogiri_compat = ->(doc) do
doc.css(%(a[href])).each do |node|
node["href"] = node["href"].sub(/^https?:/, "gopher:")
end

x.report("selma") do
html = CommonMarker.render_html(REWRITE_INPUT)
Selma::Rewriter.new(sanitize: SelmaConfig::ALLOWLIST, handlers: [
SelmaConfig::CamoHandler.new,
SelmaConfig::ImageMaxWidthHandler.new,
SelmaConfig::HttpsHandler.new,
SelmaConfig::MentionHandler.new,
SelmaConfig::EmojiHandler.new,
SelmaConfig::SyntaxHighlightHandler.new,
]).rewrite(html)
doc.css("span").each do |node|
node.parent.add_child("<div>#{node.text}</div>")
end

x.compare!
doc.css("img").each(&:remove)

doc.to_html
end

DOCUMENTS.each do |(html, label)|
print_size(html)
Benchmark.ips do |x|
x.config(IPS_ARGS)

x.report("nokogiri-#{label}") do
doc = Nokogiri::HTML.parse(html)

nokogiri_compat.call(doc)
end

x.report("nokolexbor-#{label}") do
doc = Nokolexbor::HTML(html)

nokogiri_compat.call(doc)
end

x.report("selma-#{label}") do
Selma::Rewriter.new(sanitizer: nil, handlers: [
SelmaConfig::HrefHandler.new,
SelmaConfig::SpanHandler.new,
SelmaConfig::ImgHandler.new,
]).rewrite(html)
end

x.compare!
end
end
end

File renamed without changes.
File renamed without changes.
File renamed without changes.