Add support for Polish language stemming #2484

Open
asdfMaciej opened this issue Sep 3, 2024 · 1 comment

asdfMaciej commented Sep 3, 2024

Hello! Tantivy is used by multiple projects; I found it as a dependency of ParadeDB. I would love for it to support stemming for the Polish language, as that would enable Polish language support downstream.

I'm not sure whether that's currently possible, though. I saw that tantivy uses rust_stemmers as a dependency for multi-language stemming. However, that repository appears to be unmaintained, with 13 open PRs and the last commit dating from 2021. Also, it appears that a language needs to be added to Snowball before it can be added to rust_stemmers, and Snowball has had an open PR for the Polish language since 2021: snowballstem/snowball#159
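
To make the request concrete, here is a minimal sketch of what stemming would give a search index: different inflected forms of a word collapsing to one indexed term. It is a toy suffix stripper using only the standard library. The suffix list, the `MIN_STEM_CHARS` threshold, and the `toy_stem` name are all hypothetical and chosen for illustration; this is not the Snowball algorithm from the PR above, nor anything tantivy or rust_stemmers ships.

```rust
/// Hypothetical suffix list, for illustration only. This is NOT the
/// Polish Snowball algorithm. Longer suffixes come first so that
/// "ami" is tried before the single-letter endings.
const SUFFIXES: [&str; 6] = ["ami", "ach", "ów", "om", "y", "a"];

/// Assumed lower bound on stem length (in characters, not bytes),
/// so that very short words are left untouched.
const MIN_STEM_CHARS: usize = 3;

/// Lowercases `word` and strips the first matching suffix, provided
/// the remaining stem keeps at least `MIN_STEM_CHARS` characters.
pub fn toy_stem(word: &str) -> String {
    let lower = word.to_lowercase();
    for suffix in SUFFIXES.iter() {
        if let Some(stem) = lower.strip_suffix(*suffix) {
            if stem.chars().count() >= MIN_STEM_CHARS {
                return stem.to_string();
            }
        }
    }
    // No suffix applied: the lowercased word is its own stem.
    lower
}
```

With this sketch, `toy_stem("kotami")`, `toy_stem("kotów")`, and `toy_stem("Kotom")` all reduce to the same term as `toy_stem("kot")`, so a query for one form would match documents containing the others. A real Polish stemmer needs far more rules than a six-entry suffix list, which is why proper Snowball or third-party support matters.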

Sadly, I don't know NLP well enough to contribute - but I hope this write-up comes in handy for someone :)

Thanks for maintaining this project!

asdfMaciej (Author) commented

I've found this crate, which appears to be maintained and tackles exactly this issue: https://github.com/testuj-to/tantivy-stemmers
