Crash parsing HTML document #126

Closed
retorquere opened this issue Jun 26, 2019 · 21 comments · Fixed by #181

@retorquere

htmltest is erroring out when I run it:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1111246]

goroutine 1 [running]:
github.com/wjdp/htmltest/htmldoc.(*Document).Parse(0x0)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmldoc/document.go:47 +0x26
github.com/wjdp/htmltest/htmldoc.(*Document).IsHashValid(0x0, 0xc000364b1e, 0x15, 0x127b200)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmldoc/document.go:112 +0x2b
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkInternalHash(0xc0000c5200, 0xc000161770)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:297 +0xad
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkInternal(0xc0000c5200, 0xc000161770)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:271 +0xcb
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkLink(0xc0000c5200, 0xc00010c700, 0xc00012df10)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:85 +0x3f9
github.com/wjdp/htmltest/htmltest.(*HTMLTest).testDocument(0xc0000c5200, 0xc00010c700)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:203 +0x193
github.com/wjdp/htmltest/htmltest.(*HTMLTest).testDocuments(0xc0000c5200)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:182 +0x6a
github.com/wjdp/htmltest/htmltest.Test(0xc000075aa0, 0x1, 0x1, 0x49)
	/home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:142 +0x8fb
main.run(0xc000075aa0, 0xc000075aa0)
	/home/travis/gopath/src/github.com/wjdp/htmltest/main.go:159 +0x1e8
main.main()
	/home/travis/gopath/src/github.com/wjdp/htmltest/main.go:66 +0x298

To Reproduce

Steps to reproduce the behaviour:

  1. Run with config and options …
htmltest --log-level 3

.htmltest.yml

Please copy in your config file

DirectoryPath: "public"
EnforceHTTPS: false
CacheExpires: "6h"
CheckExternal: false
IgnoreDirectoryMissingTrailingSlash: true
IgnoreInternalEmptyHash: true
IgnoreDirs:
- Support

Source files

I haven't been able to narrow it down yet -- my request is for htmltest to print the page it's processing to help narrow it down.

Expected behaviour

print each page as it's being processed

Versions

  • OS: MacOS 10.14.5
  • htmltest: 0.10.3
@retorquere added the bug label on Jun 26, 2019
@wjdp
Owner

wjdp commented Jun 26, 2019

Hey @retorquere you can run htmltest -l0 to log every file as it's tested.

@retorquere
Author

I'm an idiot. Of course I wanted level 0, sorry.

The offending page is at https://gist.github.com/6c955708ecfa70ff55d363c485f9eb1e

@retorquere
Author

No wait -- the log ends at

pull-export/index.html
  DOCTYPE html []
 --- pull-export/index.html --> <nil>
testDocument on test/index.html
panic: runtime error: invalid memory address or nil pointer dereference

so which of these two is likely the culprit? pull-export/index.html or test/index.html?

@retorquere
Author

I have another file on which it consistently crashes, but if I test only that file, it passes.

@wjdp
Owner

wjdp commented Jun 27, 2019

It'll be test/index.html there. The "testDocument on…" debug message is logged first thing when htmltest finishes the previous document and starts the next one, so the crash happened while processing test/index.html.

@retorquere
Author

It crashes on more files now. I've removed test/index.html since, but I still have others. My current run ends with

testDocument on installation/configuration/hidden-preferences/index.html
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1111246]

but it may not be something about that file in particular; if I set DirectoryPath to public/installation/configuration (it's usually set to public), it does not crash.

@retorquere
Author

My site is available as a tarball on https://0x0.st/z2DV.gz

@retorquere
Author

(but that tarball was produced on MacOS, which means that Support and support are deemed to be the same file)

@wjdp
Owner

wjdp commented Jun 27, 2019

Thanks! I'm on holiday next week but will try to have a look at this at some point in July.

@wjdp self-assigned this on Jun 27, 2019
@retorquere
Author

Thanks! Is there anything I can do in the interim to help debugging this?

@retorquere
Author

I've tried this on a linux system and it runs without issue there.

@wjdp
Owner

wjdp commented Jun 27, 2019

Ah, that's very interesting. I'm no expert on osx (only access I have is the Travis test runners). Right now, without looking at code, unfortunately I don't have any ideas.

@retorquere
Author

No issue. When you're back I'd be happy to run an instrumented version that may give more insight.

@retorquere changed the title from "print page being processed when log-level is debug" to "htmltest crashes on MacOS" on Jun 27, 2019
@danyill

danyill commented Apr 26, 2020

I'm seeing the same on Linux on Ubuntu Focal:

node@791983aec7ee:~/antora-base$ ./bin/htmltest -l0 public
htmltest started at 09:18:46 on public
========================================================================
0: DirectoryPath string = public
1: DirectoryIndex string = index.html
2: FilePath string = 
3: FileExtension string = .html
4: CheckDoctype bool = true
5: CheckAnchors bool = true
6: CheckLinks bool = true
7: CheckImages bool = true
8: CheckScripts bool = true
9: CheckMeta bool = true
10: CheckGeneric bool = true
11: CheckExternal bool = true
12: CheckInternal bool = true
13: CheckInternalHash bool = true
14: CheckMailto bool = true
15: CheckTel bool = true
16: CheckFavicon bool = false
17: CheckMetaRefresh bool = true
18: EnforceHTML5 bool = false
19: EnforceHTTPS bool = false
20: IgnoreURLs []interface {} = []
21: IgnoreDirs []interface {} = []
22: IgnoreInternalEmptyHash bool = false
23: IgnoreEmptyHref bool = false
24: IgnoreCanonicalBrokenLinks bool = true
25: IgnoreExternalBrokenLinks bool = false
26: IgnoreAltMissing bool = false
27: IgnoreDirectoryMissingTrailingSlash bool = false
28: IgnoreSSLVerify bool = false
29: IgnoreTagAttribute string = data-proofer-ignore
30: HTTPHeaders map[interface {}]interface {} = map[Accept:*/* Range:bytes=0-0]
31: TestFilesConcurrently bool = false
32: DocumentConcurrencyLimit int = 128
33: HTTPConcurrencyLimit int = 16
34: LogLevel int = 0
35: LogSort string = document
36: ExternalTimeout int = 15
37: StripQueryString bool = true
38: StripQueryExcludes []string = [fonts.googleapis.com]
39: EnableCache bool = true
40: EnableLog bool = true
41: OutputDir string = tmp/.htmltest
42: OutputCacheFile string = refcache.json
43: OutputLogFile string = htmltest.log
44: CacheExpires string = 336h
45: NoRun bool = false
46: VCREnable bool = false
47: Version string = 0.12.1
testDocument on Home/faq.html
Home/faq.html
  DOCTYPE html []
 --- Home/faq.html --> <nil>
  from cache --- Home/faq.html --> https://docs.tpwiki.com/Home/faq.html
  OK --- Home/faq.html --> https://docs.tpwiki.com/Home/faq.html
  from cache --- Home/faq.html --> https://docs.tpwiki.com
  OK --- Home/faq.html --> https://docs.tpwiki.com
  target does not exist --- Home/faq.html --> /oauth2/sign_out
testDocument on Home/index.html
Home/index.html
  DOCTYPE html []
 --- Home/index.html --> <nil>
  from cache --- Home/index.html --> https://docs.tpwiki.com/Home/index.html
  OK --- Home/index.html --> https://docs.tpwiki.com/Home/index.html
  from cache --- Home/index.html --> https://docs.tpwiki.com
  OK --- Home/index.html --> https://docs.tpwiki.com
  target does not exist --- Home/index.html --> /oauth2/sign_out
testDocument on SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html
SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html
  DOCTYPE html []
 --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> <nil>
  from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://docs.tpwiki.com
  OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://docs.tpwiki.com
  target does not exist --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> /oauth2/sign_out
  from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/tree/master
  OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/tree/master
  from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/issues
  OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/issues
  from cache --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/compare/master...master
  OK --- SEL751_Arc_Flash_Protection_Settings/unstable/downloads/Downloads.html --> https://gitlab.tpwiki.com/standard-designs/arc-flash-protection/SEL751_Arc_Flash_Protection_Settings/compare/master...master
testDocument on SEL751_Arc_Flash_Protection_Settings/unstable/setting_guide/Setting_Guide.html
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x51c0d7]

goroutine 1 [running]:
github.com/wjdp/htmltest/htmldoc.(*Document).Parse(0x0)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmldoc/document.go:47 +0x37
github.com/wjdp/htmltest/htmldoc.(*Document).IsHashValid(...)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmldoc/document.go:112
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkInternalHash(0xc0000ce240, 0xc0003210b0)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:325 +0xb0
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkInternal(0xc0000ce240, 0xc0003210b0)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:299 +0x15d
github.com/wjdp/htmltest/htmltest.(*HTMLTest).checkLink(0xc0000ce240, 0xc0000fe480, 0xc0001ed0a0)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/check-link.go:97 +0x5ec
github.com/wjdp/htmltest/htmltest.(*HTMLTest).testDocument(0xc0000ce240, 0xc0000fe480)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:204 +0x18c
github.com/wjdp/htmltest/htmltest.(*HTMLTest).testDocuments(0xc0000ce240)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:183 +0x65
github.com/wjdp/htmltest/htmltest.Test(0xc000013950, 0xc000010018, 0xc0000f9d48, 0x1)
        /home/travis/gopath/src/github.com/wjdp/htmltest/htmltest/htmltest.go:143 +0x89b
main.run(0xc000013950, 0xc000013950)
        /home/travis/gopath/src/github.com/wjdp/htmltest/main.go:159 +0x207
main.main()
        /home/travis/gopath/src/github.com/wjdp/htmltest/main.go:66 +0x268

My system is:

Linux 791983aec7ee 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 x86_64 x86_64 GNU/Linux

running within a Docker container.

Happy to provide further information. This error is highly consistent and always occurs.

@danyill

danyill commented Apr 26, 2020

My directory is also named public, but I also tried public2 and 2xxx, and it crashed on both with the same errors.

@wjdp changed the title from "htmltest crashes on MacOS" to "Crash parsing HTML document" on Jan 19, 2021
@wjdp
Owner

wjdp commented Jan 19, 2021

This seems to be an issue with parsing HTML. I know this issue is very old but @danyill do you have a copy of the files that caused the crash?

@danyill

danyill commented May 14, 2021

@wjdp Sorry for the slow response, time is getting away on me. I have a copy of a very similar file which also crashes on the latest version of htmltest. I can't share it publicly but am happy to provide it to you. What is the easiest way to send it? Can I email it to your commit address? (It's a 1.5 MB file with embedded images.)

@Marshevskyy

Marshevskyy commented Oct 27, 2021

Hi, @wjdp. I was able to replicate this error.

In my case I have two pages, and the first page has an anchor link to the second:
page 1 public/docs/dev/index.html

...
<!DOCTYPE html>
<html>
<title>test</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<body>
  <a href="/docs/hello-configuration/#link">
    <code class="language-text">link</code>
  </a>
</body>
</html>
...

page 2 public/docs/hello-configuration/index.html

...
<!DOCTYPE html>
<html>
<title>test</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<body>
  <h2 id="link" style="position:relative">
    <a href="#link">
    </a>link
  </h2>
</body>
</html>
...

.htmltest.yml:

IgnoreDirectoryMissingTrailingSlash: true
DirectoryPath: "public/"
IgnoreAltMissing: true
OutputDir: "tmp/.htmltest"
OutputCacheFile: "refcache.json"
OutputLogFile: "htmltest.log"
IgnoreDirs:
  - hello

The links are valid: they work fine when I run the site on localhost or on a server.
Is there any workaround?
Please let me know if you need more details.

UPDATED (27.10.21): once I remove DirectoryPath: "public/" from .htmltest.yml, it seems to work.
UPDATED (28.10.21):

  • Found the issue: as you can see, my file is named hello-configuration/index.html, and in the config I have IgnoreDirs -> - hello. This is the cause! Once I rename my file, remove IgnoreDirs, or change the entry from hello to hello/, the error is gone.
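
A minimal sketch of why an entry like hello could catch hello-configuration, assuming IgnoreDirs entries are applied as unanchored regular expressions (that is an assumption, though it fits the behaviour reported above). The ignored helper and the exact directory string passed to it are hypothetical, not htmltest's own code:

```go
package main

import (
	"fmt"
	"regexp"
)

// ignored reports whether dir matches any pattern. Matching is unanchored,
// which is an assumption about how IgnoreDirs entries are applied.
func ignored(patterns []string, dir string) bool {
	for _, p := range patterns {
		if regexp.MustCompile(p).MatchString(dir) {
			return true
		}
	}
	return false
}

func main() {
	dir := "docs/hello-configuration/"
	// "hello" matches anywhere in the path; "hello/" needs the slash right
	// after "hello"; the last pattern also pins the start of the segment.
	for _, p := range []string{"hello", "hello/", `(^|/)hello/`} {
		fmt.Printf("%q ignores %q: %v\n", p, dir, ignored([]string{p}, dir))
	}
}
```

Under that assumption, anchoring the pattern (for example with a trailing slash) keeps it from swallowing sibling directories that merely share a prefix.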

@markmandel
Contributor

Been digging into this while watching tv 😄

I'm quite sure I've narrowed down the culprit:

refDoc, _ := hT.documentStore.ResolveRef(ref)

Debugging shows me that hT.documentStore.ResolveRef(ref) can return a response of (nil, false), but the ok value is never checked.

I'm currently fairly sure I can hit this issue in one of two ways:

  1. Aim htmltest at an HTML page that has links to parent directories.
  2. Aim htmltest at an HTML page that has valid (at least I think they are valid) links to a set of pages ignored via IgnoreDirs.

From there, any call to member functions will panic if they reference internal members.
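
To illustrate the failure mode, here is a small self-contained sketch. The Document and store types are hypothetical stand-ins, not the real htmltest types; only the (value, ok) shape of ResolveRef is taken from the description above. In Go, calling a method on a nil pointer is legal, and the panic only happens once the method touches a field:

```go
package main

import "fmt"

// Document is a hypothetical stand-in for a parsed page.
type Document struct {
	parsed bool
}

// Parse writes to a field, so it dereferences the receiver.
func (d *Document) Parse() {
	d.parsed = true
}

// store is a hypothetical lookup table keyed by path.
type store map[string]*Document

// ResolveRef mirrors the (value, ok) return shape described above.
func (s store) ResolveRef(path string) (*Document, bool) {
	d, ok := s[path]
	return d, ok
}

// parseUnchecked discards ok, calls Parse, and reports whether that panicked.
func parseUnchecked(s store, path string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	doc, _ := s.ResolveRef(path) // ok is ignored, doc may be nil
	doc.Parse()
	return false
}

func main() {
	s := store{"docs/index.html": &Document{}}
	fmt.Println(parseUnchecked(s, "docs/index.html"))    // false
	fmt.Println(parseUnchecked(s, "ignored/index.html")) // true
}
```

This matches the (*Document).Parse(0x0) frame in the stack traces: the receiver is nil by the time Parse runs.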

I'll keep digging, but I wanted to report on my progress in case it spurred someone else to see the correct path through to resolving this issue.

@markmandel
Contributor

markmandel commented Nov 24, 2021

So, an easy enough fix for the panic: check the ok value returned from ResolveRef (my branch is over here):

https://github.com/markmandel/htmltest/blob/1756c2ea506c42270afb56ca7bed6e9194701a2a/htmltest/check-link.go#L336-L343
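
For readers who don't want to open the branch, the idea can be sketched roughly as follows. This is not the code from the linked commit: the types, signatures, and error values are simplified assumptions, with only the names ResolveRef, IsHashValid and checkInternalHash borrowed from the stack trace:

```go
package main

import (
	"errors"
	"fmt"
)

// Document is a hypothetical page holding the anchor ids it defines.
type Document struct {
	hashes map[string]bool
}

// IsHashValid reports whether the page defines the given anchor id.
func (d *Document) IsHashValid(hash string) bool { return d.hashes[hash] }

// DocumentStore is a hypothetical path-to-document index.
type DocumentStore struct {
	byPath map[string]*Document
}

// ResolveRef returns the document for path and whether it was found.
func (s *DocumentStore) ResolveRef(path string) (*Document, bool) {
	d, ok := s.byPath[path]
	return d, ok
}

// Illustrative error values, not htmltest's real messages.
var (
	errTargetMissing = errors.New("target does not exist")
	errHashMissing   = errors.New("hash does not exist")
)

// checkInternalHash bails out before touching doc when the lookup fails.
func checkInternalHash(s *DocumentStore, path, hash string) error {
	doc, ok := s.ResolveRef(path)
	if !ok {
		return errTargetMissing // report instead of dereferencing nil
	}
	if !doc.IsHashValid(hash) {
		return errHashMissing
	}
	return nil
}

func main() {
	s := &DocumentStore{byPath: map[string]*Document{
		"docs/hello-configuration/index.html": {hashes: map[string]bool{"link": true}},
	}}
	fmt.Println(checkInternalHash(s, "docs/hello-configuration/index.html", "link")) // <nil>
	fmt.Println(checkInternalHash(s, "docs/missing/index.html", "link"))             // target does not exist
}
```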

The next issue I run into is that this reference should resolve, but it doesn't because (I assume) the document it points to isn't available in DocumentPathMap, since it matches IgnoreDirs 🤔 Now to work out how that gets populated.

@markmandel
Contributor

Okay! I think I got it working! I had to keep the list of every Document in DocumentStore and add a property saying whether it should be ignored for testing. That way ignored documents can still be checked against as link targets, but are skipped over when testing.

PR coming shortly!

markmandel added a commit to markmandel/htmltest that referenced this issue Nov 24, 2021
This fix includes two things:
1. Check for `ok` value from `ResolveRef` in `checkInternalHash`. If a
value was ignored but was a valid link, it would panic as it was not
found.
2. Change the behaviour of `discoverRecurse` such that it keeps all
found `Document`s, but adds a new `IgnoreTest` attribute such that we
can track if it should be skipped on test, but still referenced in
a test.

Closes wjdp#126
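
The second point of the commit message can be sketched like this. It is a simplified illustration rather than the merged code: the constructor, Add, and Testable are hypothetical helpers, and the ignore patterns are assumed to be unanchored regular expressions. Only the idea of an IgnoreTest flag on each Document comes from the commit message:

```go
package main

import (
	"fmt"
	"regexp"
)

// Document is a hypothetical, trimmed-down page record.
type Document struct {
	Path       string
	IgnoreTest bool // true when the path matched an ignore pattern
}

// DocumentStore keeps every discovered document, ignored or not.
type DocumentStore struct {
	Documents []*Document
	byPath    map[string]*Document
	ignore    []*regexp.Regexp
}

// NewDocumentStore compiles the ignore patterns once, up front.
func NewDocumentStore(ignore ...string) *DocumentStore {
	s := &DocumentStore{byPath: make(map[string]*Document)}
	for _, p := range ignore {
		s.ignore = append(s.ignore, regexp.MustCompile(p))
	}
	return s
}

// Add records a document and flags it, rather than dropping it, when ignored.
func (s *DocumentStore) Add(path string) {
	doc := &Document{Path: path}
	for _, re := range s.ignore {
		if re.MatchString(path) {
			doc.IgnoreTest = true
			break
		}
	}
	s.Documents = append(s.Documents, doc)
	s.byPath[path] = doc
}

// ResolveRef finds link targets, including documents flagged IgnoreTest.
func (s *DocumentStore) ResolveRef(path string) (*Document, bool) {
	doc, ok := s.byPath[path]
	return doc, ok
}

// Testable lists the paths that should actually be tested.
func (s *DocumentStore) Testable() []string {
	var paths []string
	for _, doc := range s.Documents {
		if !doc.IgnoreTest {
			paths = append(paths, doc.Path)
		}
	}
	return paths
}

func main() {
	s := NewDocumentStore("hello")
	s.Add("docs/dev/index.html")
	s.Add("docs/hello-configuration/index.html")

	_, ok := s.ResolveRef("docs/hello-configuration/index.html")
	fmt.Println(ok)           // true: ignored pages still resolve as targets
	fmt.Println(s.Testable()) // [docs/dev/index.html]
}
```

Keeping ignored documents in the store separates "can be linked to" from "should be tested", so a link into an ignored directory no longer looks like a missing target.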
@wjdp wjdp closed this as completed in #181 Mar 28, 2022