Support multiprocessing on macOS and Windows #88

Merged: 10 commits, Aug 23, 2023

xxyzz commented Aug 18, 2023

I'm also planning to replace the deprecated `pkg_resources` module; I'll do that next week.

This callback function was added to select pages in some namespaces;
this should be done with the `namespace_ids` argument, which is used to
query the SQLite database.
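
As a rough illustration of filtering by namespace in the query itself rather than in a per-page callback, here is a minimal sketch. The `pages` table, its columns, and the helper names `select_pages` and `make_demo_db` are assumptions made for this example and are not taken from the project's code; only the `namespace_ids` argument name comes from the commit message.

```python
import sqlite3


def select_pages(conn, namespace_ids=None):
    """Return (title, namespace_id) rows, optionally limited to some namespaces.

    The `pages` table and its columns are hypothetical, not the project's schema.
    """
    query = "SELECT title, namespace_id FROM pages"
    params = []
    if namespace_ids:
        # One "?" placeholder per id keeps the query parameterized.
        placeholders = ",".join("?" * len(namespace_ids))
        query += f" WHERE namespace_id IN ({placeholders})"
        params = list(namespace_ids)
    query += " ORDER BY title"
    return conn.execute(query, params).fetchall()


def make_demo_db():
    """Build a tiny in-memory database for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, namespace_id INTEGER)")
    # 0, 10 and 828 are the usual MediaWiki ids for main, Template and Module.
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [("dog", 0), ("Template:head", 10), ("Module:utils", 828)],
    )
    return conn
```

Calling `select_pages(make_demo_db(), [10, 828])` would return only the template and module rows, so no Python-side callback is needed to discard the other pages.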

Also fix the "subprocess is still running" resource warning.
The code no longer relies on the Linux fork start method to copy variables.
The multiprocessing code has been moved to wiktextract; unpicklable objects
such as the SQLite connection and the lupa runtime are handled there.
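
One common way to avoid depending on fork, sketched here under assumptions, is to create unpicklable resources inside each worker through a pool initializer, so that only plain values such as a file path cross the process boundary. The names `init_worker`, `page_length`, `process_titles` and the `pages` table are hypothetical and are not the wiktextract implementation.

```python
import multiprocessing
import sqlite3

# Per-process connection; it is created inside each worker and never pickled.
_worker_conn = None


def init_worker(db_path):
    """Pool initializer: open this worker's own SQLite connection."""
    global _worker_conn
    _worker_conn = sqlite3.connect(db_path)


def page_length(title):
    """Worker task: look up one page (hypothetical `pages` table)."""
    row = _worker_conn.execute(
        "SELECT body FROM pages WHERE title = ?", (title,)
    ).fetchone()
    return title, (len(row[0]) if row else None)


def process_titles(db_path, titles, num_processes=2):
    """Run the lookups in a pool using the "spawn" start method."""
    # "spawn" is the default on macOS and Windows; requesting it explicitly
    # makes Linux behave the same way instead of inheriting state via fork.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(
        num_processes, initializer=init_worker, initargs=(db_path,)
    ) as pool:
        return dict(pool.imap_unordered(page_length, titles))
```

With the spawn start method, `process_titles()` should be called from code guarded by `if __name__ == "__main__":`, because each worker re-imports the main module on startup.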
This file mostly tests `process_dump()`, so move it to
`tests/test_dumppaser.py`. Extracting each page is still tested in
wiktextract.
`get_page()` is rather slow (1s per call); caching the requests improves
performance significantly. This reduces the processing time of the Chinese
Wiktionary from 40 minutes to 10 minutes.
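
The caching idea can be sketched with `functools.lru_cache` from the standard library. This is an illustration under assumptions: the `CachedPageStore` class, its `pages` table, and the `query_count` counter are hypothetical, and the real `get_page()` may have a different signature. The cache size of 1000 is the value discussed in this conversation.

```python
import functools
import sqlite3


class CachedPageStore:
    """Hypothetical page store whose lookups are memoized per instance."""

    def __init__(self, conn, cache_size=1000):
        self.conn = conn
        self.query_count = 0  # counts real database hits, for demonstration
        # Wrap the bound method so that each instance (and therefore each
        # worker process) keeps its own least-recently-used cache.
        self.get_page = functools.lru_cache(maxsize=cache_size)(
            self._get_page_uncached
        )

    def _get_page_uncached(self, title, namespace_id):
        self.query_count += 1
        row = self.conn.execute(
            "SELECT body FROM pages WHERE title = ? AND namespace_id = ?",
            (title, namespace_id),
        ).fetchone()
        return row[0] if row else None
```

Repeated calls to `store.get_page(...)` with the same arguments are served from memory, and `store.get_page.cache_info()` reports hits and misses, which can help when deciding whether the cache size is large enough.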

xxyzz commented Aug 21, 2023

4b3d963 makes the code 4 times faster, but I don't know why this simple SQL query is slow.

The English Wiktionary has 40327 template pages and 54524 module
pages (excluding redirects). I think the cache is mostly useful for
caching shared template and module pages used by the pages processed in a
worker process. So the cache size depends on the number of pages, the number
of CPU cores, and the shared templates. If 1000 is not enough, we can increase
it to 10000.

xxyzz commented Aug 22, 2023

Lua errors in Wiktionary Module pages won't raise a `lupa.LuaError` exception because the Lua `pcall` API doesn't propagate errors. But if an error happens in another part of the Lua code in the lua folder, then a4a0bfc will be used.

kristian-clausal merged commit e5296c1 into tatuylonen:main on Aug 23, 2023
2 checks passed
xxyzz deleted the pool branch on August 23, 2023 06:42