Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize #at_css and #css initialization #14

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

zyc9012
Copy link
Collaborator

@zyc9012 zyc9012 commented Sep 1, 2024

Before

Nokolexbor at_css:   138765.3 i/s
Nokolexbor    css:    10132.0 i/s

After

Nokolexbor at_css:   202963.7 i/s
Nokolexbor    css:     10209.4 i/s

@zyc9012 zyc9012 requested a review from Freaky September 1, 2024 08:51
@zyc9012 zyc9012 force-pushed the optimize-nl-node-find-init branch 5 times, most recently from 21f99b6 to a3257b6 Compare September 1, 2024 11:50
Copy link

@Freaky Freaky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zyc9012 - I like the look of that micro-benchmark!

Some preliminary tests with our scraper parsers shows an average improvement of about 3%, and a small improvement to memory consumption.

I'm not sure we really need these pthread_key_t variables - they can be shared across objects, they can surely also be shared across threads provided they're protected by a mutex, which we have implicitly in the GIL. Can we drop all the pthread stuff and just have:

static lxb_html_parser_t *html_parser = NULL;
static lxb_css_parser_t *css_parser = NULL;
static lxb_selectors_t *selectors = NULL;

And allocate them on initialization or first use? Letting them leak seems less important if they're just a one-time process-wide allocation.

Comment on lines 1202 to 1203
pthread_key_create(&p_key_css_parser, free_css_parser);
pthread_key_create(&p_key_selectors, free_selectors);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are fallible and can fail to allocate a key, which would lead to undefined behaviour in later calls to pthread_get/setspecific.

css_parser = lxb_css_parser_create();
status = lxb_css_parser_init(css_parser, NULL, NULL);
if (status != LXB_STATUS_OK) {
css_parser = NULL;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the error was specifically LXB_STATUS_ERROR_OBJECT_IS_NULL this will leak the initial allocation plus any others that happened to succeed within the init call.

selectors = lxb_selectors_create();
status = lxb_selectors_init(selectors);
if (status != LXB_STATUS_OK) {
selectors = NULL;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, if the error was anything other than LXB_STATUS_ERROR_INCOMPLETE_OBJECT, the initial allocation will leak, plus any of the dobject allocations that did succeed within the init function.

Comment on lines 401 to 406
/* Destroy Selectors object. */
(void)lxb_selectors_destroy(selectors, true);

/* Destroy resources for CSS Parser. */
(void)lxb_css_parser_destroy(parser, true);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be kept for the error handling cases, and also for the #ifndef HAVE_PTHREAD_H case where the destructor call is not done for us at thread exit.

lxb_status_t status = lxb_html_parser_init(g_parser);
if (html_parser == NULL) {
html_parser = lxb_html_parser_create();
lxb_status_t status = lxb_html_parser_init(html_parser);
if (status != LXB_STATUS_OK) {
nl_raise_lexbor_error(status);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaks the initial allocation unless status was LXB_STATUS_ERROR_OBJECT_IS_NULL. Otherwise it also leaks other allocations performed by lxb_html_parser_init. The parser object also leaks in the non-HAVE_PTHREAD_H case due to reliance on it calling the destructor.

@zyc9012
Copy link
Collaborator Author

zyc9012 commented Sep 4, 2024

Thanks @Freaky. I've removed all pthread stuff and made sure that it's thread-safe on truffleruby.

@zyc9012 zyc9012 requested a review from Freaky September 4, 2024 03:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants