Fix reading HTML via XmlReader API when root element is absent #16

greatvovan · 2016-05-11T15:27:49Z

Consider the following snippet:

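            // Requires: using System.IO; using Sgml; using static System.Console;
            // A fragment with three sibling <p> elements and no single root element.
            string toParse = "<p>one</p><p>two</p><p>three</p>";
            var sr = new StringReader(toParse);
            var NULL = "NULL";

            using (var reader = new SgmlReader()
                {InputStream = sr, DocType = "HTML", IgnoreDtd = true})
            {
                // Dump every node the reader produces until it reports EOF.
                while (!reader.EOF)
                {
                    reader.Read();
                    WriteLine($"[{reader.NodeType,10}] {reader.Name ?? NULL,5}: " +
                        $"{reader.Value ?? NULL}");
                }
            }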

In version 1.8.12, SgmlReader fails to read the whole document because it reaches this section twice:

            if (this.Depth == 1)
            {
                if (this.m_rootCount == 1)
                {
                    // Hmmm, we found another root level tag, soooo, the only
                    // thing we can do to keep this a valid XML document is stop
                    this.m_state = State.Eof;
                    return false;
                }
                this.m_rootCount++;
            }

It only returns the first <p> element and the opening tag of the second.

It worked in the good old version 1.8.6, but I don't have its source code, since the GitHub history starts at version 1.8.7, which is already broken.

This commit fixes the behavior when the user has explicitly set DocType to "HTML".

UweKeim · 2016-05-24T14:37:51Z

Awesome, this is exactly the problem I'm having here, too. Hopefully it gets integrated and published to NuGet soon.

For now I'm wrapping my HTML fragments inside an artificial <div> root tag to fulfil the requirement of a single root. Of course, this only works when reading from the document, not when modifying it.
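
For readers who need the same stopgap, here is a minimal sketch of that wrapping workaround. The class and method names (`FragmentWorkaround`, `WrapInRoot`, `CountElements`) are hypothetical and not part of SgmlReader; the helper is written against the base `XmlReader` type, on the assumption that an `SgmlReader` instance can be passed in unchanged.

```csharp
using System;
using System.IO;
using System.Xml;

static class FragmentWorkaround
{
    // Hypothetical helper: wrap a rootless fragment in one artificial <div>
    // so that the reader sees exactly one root-level element.
    public static string WrapInRoot(string fragment)
    {
        return "<div>" + fragment + "</div>";
    }

    // Count elements with the given name using only the XmlReader API.
    // The artificial <div> is also reported as an element, so callers
    // should ignore or skip it when processing the nodes.
    public static int CountElements(XmlReader reader, string name)
    {
        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element &&
                string.Equals(reader.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                count++;
            }
        }
        return count;
    }
}
```

With SgmlReader, the wrapped string would be passed as `InputStream = new StringReader(FragmentWorkaround.WrapInRoot(toParse))`, keeping `DocType = "HTML"` and `IgnoreDtd = true` as in the snippet at the top of this thread. The wrapper has to be stripped again by hand if the result is written back out, which is why this only helps for reading.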

lovettchris · 2016-09-10T20:41:23Z

I'm worried this change is too HTML-centric. What about SgmlReader over other types of data, like OFX?

greatvovan · 2017-03-28T17:29:41Z

@lovettchris I cannot see how it affects OFX or other types. What prompted this particular concern?

Fix reading HTML via XmlReader API when root element is absent

5315fad
