Fix a bug that Stream parser doesn't expand the user-defined entity references for "text" #200

Merged
kou merged 5 commits into ruby:master from fix_stream_text_unnormalize on Aug 21, 2024

naitoh (Contributor) commented Aug 20, 2024

Why?

The Stream parser expands character references and predefined entity references in "text" events, but doesn't expand user-defined entity references. The DOM, Pull, and SAX parsers already expand them.

Change

  • text_stream_unnormalize.rb
$LOAD_PATH.unshift(File.expand_path("lib"))
require 'rexml/document'
require 'rexml/parsers/sax2parser'
require 'rexml/parsers/pullparser'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

xml = <<EOS
<!DOCTYPE foo [
  <!ENTITY la "1234">
  <!ENTITY lala "--&la;--">
  <!ENTITY lalal "&la;&la;">
]><root><la>&la;</la><lala>&lala;</lala><a>&lt;P&gt; &lt;I&gt; &lt;B&gt; Text &lt;/B&gt; &lt;/I&gt;</a><b>test&#8482;</b></root>
EOS

class StListener
  include REXML::StreamListener

  def text(text)
    puts text
  end
end

puts "REXML(DOM)"
REXML::Document.new(xml).elements.each("/root/*") {|element| puts element.text}

puts ""
puts "REXML(Pull)"
parser = REXML::Parsers::PullParser.new(xml)
while parser.has_next?
  event = parser.pull
  case event.event_type
  when :text
    puts event[1]
  end
end

puts ""
puts "REXML(Stream)"
parser = REXML::Parsers::StreamParser.new(xml, StListener.new).parse

puts ""
puts "REXML(SAX)"
sax = REXML::Parsers::SAX2Parser.new(xml)
sax.listen(:characters) {|x| puts x }
sax.parse

Before (master)

$ ruby text_stream_unnormalize.rb
REXML(DOM)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Pull)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Stream)
&la;      #<= This
&lala;    #<= This
<P> <I> <B> Text </B> </I>
test™

REXML(SAX)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

After (this PR)

$ ruby text_stream_unnormalize.rb
REXML(DOM)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Pull)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Stream)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(SAX)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™
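
The before/after difference above can also be checked programmatically instead of by reading `puts` output. The following is a minimal sketch, assuming a listener that records every "text" event; `CollectingListener` and `stream_texts` are hypothetical names introduced for illustration and are not part of REXML. Going by the outputs above, the first recorded value should be `1234` with this PR and `&la;` on master.

```ruby
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical helper (not part of REXML): records each "text" event
# instead of printing it, so the result can be inspected afterwards.
class CollectingListener
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
  end

  def text(text)
    @texts << text
  end
end

# Hypothetical helper: runs the Stream parser and returns the recorded texts.
def stream_texts(xml)
  listener = CollectingListener.new
  REXML::Parsers::StreamParser.new(xml, listener).parse
  listener.texts
end

# One user-defined entity (&la;) and one use of predefined entities (&lt; &gt;).
xml = '<!DOCTYPE foo [<!ENTITY la "1234">]><root><la>&la;</la><a>&lt;P&gt;</a></root>'

texts = stream_texts(xml)
p texts
```

Collecting into an array keeps the listener free of side effects, which makes it easier to reuse in a test case than a listener that prints.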

## Why?

See:
- ruby#187
- ruby#195

## Change
- Supported `REXML::Security.entity_expansion_limit=` in Stream parser
- Supported `REXML::Security.entity_expansion_text_limit=` in Stream parser
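
As a rough illustration of the limit support listed above, the sketch below lowers `REXML::Security.entity_expansion_limit` before running the Stream parser on a document with nested entity references. `NullListener` and the sample document are assumptions made for this example. On a REXML build that includes this change, parsing is expected to abort with an error once the limit is exceeded; on older builds the Stream parser leaves user-defined references unexpanded, so the limit is never reached. The sketch therefore records the outcome rather than assuming one.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener for illustration: ignores every event.
class NullListener
  include REXML::StreamListener
end

# &b; expands to four &a; references, and the text uses &b; three times.
xml = '<!DOCTYPE foo [<!ENTITY a "x"><!ENTITY b "&a;&a;&a;&a;">]>' \
      '<root>&b;&b;&b;</root>'

original = REXML::Security.entity_expansion_limit
outcome = nil
begin
  # Deliberately tiny limit so the nested references exceed it.
  REXML::Security.entity_expansion_limit = 2
  begin
    REXML::Parsers::StreamParser.new(xml, NullListener.new).parse
    outcome = :parsed
  rescue RuntimeError => e
    # REXML::ParseException is a RuntimeError, so both are covered here.
    outcome = :rejected
    warn e.message
  end
ensure
  # The limit is global state, so always restore it.
  REXML::Security.entity_expansion_limit = original
end

puts outcome
```

`REXML::Security.entity_expansion_text_limit=` can be exercised the same way by assigning a small byte size instead of a small count.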
## Why?
Because `StreamParser#entity_expansion_count` was added.

naitoh marked this pull request as ready for review August 20, 2024 23:04
naitoh requested a review from kou August 21, 2024 00:54
kou merged commit 6109e01 into ruby:master Aug 21, 2024
61 checks passed

kou (Member) commented Aug 21, 2024

Thanks.

naitoh deleted the fix_stream_text_unnormalize branch August 21, 2024 07:10