"fatal error: runtime: out of memory" with site with large number of links #185

Open
MarkvanMents opened this issue Jan 27, 2022 · 6 comments

@MarkvanMents

Hi Will,
I want to use htmltest for the internal links on my large Hugo-generated website.
Running on Windows 10, it tests 3095 documents in around 3 minutes and has helped me to resolve some missing internal links. Exactly what I want.

When I put it into my Travis job to build the site automatically, however, htmltest fails with fatal error: runtime: out of memory.
It seems that my Travis job is only given 8GB of RAM, whereas under Windows htmltest takes the 16GB available to it (out of 24GB of physical memory).
I tried setting up a 20GB swap file for Travis. This stops the fatal error, but htmltest is still running after 30 minutes and eventually gets killed by Travis.

Is this a known issue? Are there any options I have missed to tell htmltest to use less memory?

Let me know if you would like any more information. I would really like to use your tool if I can as it seems very fast and flexible.

@MarkvanMents
Author

A bit more information on this issue:
I am using Hugo with Docsy to produce a site.
There is a very large number of links and anchors which Hugo/Docsy creates in the sidebar. Since these are generated automatically by Hugo, I don't need to test these. I only want to test the links and anchors which are in the ... section of the file.
I can't see any way to tell htmltest to only look at this part of the file.
If I force htmltest to ignore the Docsy-generated menu items using data-proofer-ignore, it still seems to use the same amount of memory and build all the cross-referencing; it just doesn't report any errors on them. And there is no way to ignore the anchors/ids in any case.

It would be great if there was a way to only test a part of a document (e.g. within the ... tags), but I realise that this is a feature request, so I don't expect it to be added any time soon (if ever).

Thanks for all the work you have put into htmltest. It is a pity that the combination of Hugo and Docsy is producing an unmanageable number of links; otherwise I would have no hesitation in using it.

@wjdp
Owner

wjdp commented Feb 15, 2022

Hey @MarkvanMents, thanks for the follow-up on this. Yes, in the short term it's unlikely I'll be able to have a look at this.
If I do, or someone else can, can you provide either your built site or an example that produces the same effect?
Thanks

@wjdp wjdp added the bug label Feb 15, 2022
@wjdp wjdp changed the title from "fatal error: runtime: out of memory" running on Travis-CI to "fatal error: runtime: out of memory" with site with large number of links Feb 15, 2022
@MarkvanMents
Author

MarkvanMents commented Feb 17, 2022

Hi @wjdp
Thanks for getting back to me, Will.

I have managed to solve this for my case by ignoring all the <aside> tags. I'm busy getting my site finished at present, but will look at making a solution configurable through .htmltest.yml when I have more time. I think I'm an exception and most users will want to continue to test their whole site. If I make a PR for my change I will link it to this issue.

The GitHub source for the site I am building is here: https://github.com/mendix/docs-site-test (work in progress as you can tell from the title)
This uses the Docsy theme of Hugo to build a site on AWS here: http://mendix-new-docs-site.s3-website.eu-central-1.amazonaws.com/

Docsy generates huge HTML files in our case because every file has a sidebar with around 3000 links.
Luckily, the sidebar is within <aside> tags and I have hardcoded this kludge to remove these from the htmlNode structure: MarkvanMents@e84902e.

My hard-coded change at least enables me to run htmltest and reduces the memory used in Windows from 16GB to around 700MB, taking about the same length of time in both cases.

Not sure whether the current design of htmltest would allow a more memory-efficient solution if I had wanted to test links in all the <aside> sections as well. I don't know enough about Go's memory management to know how you would do that. And the great thing about htmltest is the speed, so you wouldn't want to slow down htmltest just to solve these extreme cases.

But at least I can solve my particular case by significantly reducing the size of the parse tree.

@wjdp
Owner

wjdp commented Feb 17, 2022

Looking at the code change you've done, this seems like a good fit to extend the data-proofer-ignore attribute: instead of ignoring late, it could just remove the node (and therefore its children) before parsing the doc. Not quite as extensible as a user-configurable filter, but given the niche nature of this I'm hesitant to suggest adding another config option.

@MarkvanMents
Author

Thanks for that idea - sounds like a simpler solution, can be implemented in the same way, and means that any block can be ignored.
I'll look at making the change this way when I get around to removing the hard-coding.

@MarkvanMents
Author

Hi Will,
Our site is now live with htmltest at https://github.com/mendix/docs.
I have applied the updated version from #188 in this request (mendix/docs#4511) on our production documentation site. It performs the test without running out of memory.
I hope my colleagues will enable me to put it live next week.

Thanks for developing htmltest - it is much faster and more flexible than the homegrown code it is replacing. Hope my PR resolves issues for others as well.
