Description
I encountered an issue while using the scraper crate in my Rust project. The compiler reports that the future created by an async block is not Send, which means it cannot be safely sent between threads. The root cause is that the Html type from the scraper crate does not implement the Send trait.
The specific error message is as follows:
```
error: future cannot be sent between threads safely
--> docs-to-knowledge/src/lib.rs:29:91
|
29 | async fn fetch_all(&self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
| ^
30 | match self.knowledge_type {
31 | KnowledgeType::CratesIo => crates_io::fetch_docs(&self.repo_path).await,
32 | }
33 | }
|_____^ future created by async block is not `Send`
|
= help: within `tendril::tendril::NonAtomic`, the trait `Sync` is not implemented for `Cell<usize>`, which is required by `{async block@docs-to-knowledge/src/lib.rs:29:91: 33:6}: Send`
= note: if you want to do aliasing and mutation between multiple threads, use `std::sync::RwLock` or `std::sync::atomic::AtomicUsize` instead
note: future is not `Send` as this value is used across an await
--> docs-to-knowledge/src/crates_io.rs:46:61
|
34 | let document = Html::parse_document(&html);
| -------- has type `Html` which is not `Send`
...
46 | let page_html = fetch_page_html(&client, &page_url).await?;
| ^^^^^ await occurs here, with `document` maybe used later
= note: required for the cast from `Pin<Box<{async block@docs-to-knowledge/src/lib.rs:29:91: 33:6}>>` to `Pin<Box<dyn Future<Output = Result<String, Box<dyn std::error::Error + Send + Sync>>> + Send>>`
```
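The issue occurs because Html is not thread-safe internally: it stores its text in tendril buffers whose reference count is tracked by tendril::NonAtomic, a wrapper around Cell<usize>. Since Cell<usize> is not Sync, those buffers, and therefore Html, are not Send. When an Html value is still alive at an await point (here, document is held across the .await? on fetch_page_html), it becomes part of the future's saved state, so the future can only be Send if Html is. The problem is not that document would be accessed by several threads at once; it is that a multi-threaded executor may resume the future on a different thread after the await, moving document along with it. Because this future is required to be Send (see the cast to Pin<Box<dyn Future<...> + Send>> in the last note), compilation fails.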
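A common way around this is to confine the Html value to a block that ends before the first .await and carry only owned data (such as Vec<String>) past it. The sketch below uses only the standard library so it can run on its own, under these assumptions: Document is a hypothetical stand-in for scraper::Html (its Rc field makes it !Send in the same way), fetch_page_html is a stand-in for the real HTTP call, and block_on is a minimal executor in place of a runtime. require_send is a compile-time check that the resulting fetch_all future is Send.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for `scraper::Html`: the `Rc` makes it `!Send`, like `Html`.
struct Document {
    links: Rc<Vec<String>>,
}

impl Document {
    /// Toy "parser": every whitespace-separated token starting with '/' is a link.
    fn parse(html: &str) -> Self {
        let links = html
            .split_whitespace()
            .filter(|tok| tok.starts_with('/'))
            .map(String::from)
            .collect();
        Document { links: Rc::new(links) }
    }

    /// Copies the extracted data out into owned, `Send` values.
    fn links(&self) -> Vec<String> {
        self.links.as_ref().clone()
    }
}

/// Hypothetical stand-in for the real async HTTP fetch.
async fn fetch_page_html(url: &str) -> Result<String, String> {
    Ok(format!("<html>{url}</html>"))
}

async fn fetch_all(index_html: String) -> Result<Vec<String>, String> {
    // Parse and extract inside a block: `document` is dropped at the closing
    // brace, so it is not part of the future's state at the `.await` below.
    let page_urls: Vec<String> = {
        let document = Document::parse(&index_html);
        document.links()
    };

    let mut pages = Vec::new();
    for url in &page_urls {
        pages.push(fetch_page_html(url).await?);
    }
    Ok(pages)
}

/// Compile-time check: only accepts futures that are `Send`.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded executor so the example runs without a runtime crate.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let index = "index: /docs/a /docs/b".to_string();
    let pages = block_on(require_send(fetch_all(index))).unwrap();
    println!("{pages:?}");
}
```

With the real crate, the same shape would be a block that calls Html::parse_document and collects the needed selector results into owned Strings before the loop that awaits fetch_page_html.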
Steps to Reproduce
Create a new Rust project and add the scraper crate as a dependency in the Cargo.toml file.
Use the scraper crate to parse HTML and extract information within an async context.
Use the Html type from the scraper crate across an await point (e.g., store it in a variable and use it after an .await call).
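Build the project and observe the compilation error about the future not being Send. The error appears when the future is required to be Send, for example when it is passed to a multi-threaded executor or, as in the error above, cast to Pin<Box<dyn Future<...> + Send>>.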
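The steps above can be reduced to a standard-library-only sketch. As an assumption for illustration, Document stands in for scraper::Html (its Rc field makes it !Send the same way) and fetch_page_html stands in for the real async HTTP call. The future runs fine on a single-threaded executor; it only fails to compile once something requires it to be Send, which the hypothetical require_send helper models.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for `scraper::Html`: the `Rc` makes it `!Send`.
struct Document(Rc<String>);

/// Hypothetical stand-in for the real async HTTP fetch.
async fn fetch_page_html(url: &str) -> String {
    format!("<p>{url}</p>")
}

/// Mirrors the failing pattern: `document` is still used after the `.await`,
/// so it is stored in the future and the future is `!Send`.
async fn fetch_all(index_html: String) -> usize {
    let document = Document(Rc::new(index_html));
    let page = fetch_page_html("/page").await;
    document.0.len() + page.len()
}

/// Accepts only `Send` futures, as a multi-threaded executor or a cast to
/// `Pin<Box<dyn Future + Send>>` would.
#[allow(dead_code)]
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded executor; it never moves the future to another thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    // Compiles and runs: `block_on` does not require `Send`.
    println!("{}", block_on(fetch_all("<html></html>".to_string()))); // prints 25

    // Uncommenting this line reproduces
    // "error: future cannot be sent between threads safely":
    // let _ = require_send(fetch_all(String::new()));
}
```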
Possible Solutions
Here are a few possible solutions to resolve this issue:
Make the Html type Send and Sync:
Modify the implementation of the Html type in the scraper crate to ensure that it is thread-safe and implements the Send and Sync traits.
This may involve replacing the usage of non-thread-safe types like Cell<usize> with thread-safe alternatives like AtomicUsize or RwLock.
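To illustrate this first proposal, here is a simplified model (not tendril's actual code) of the difference between a Cell<usize> counter and an AtomicUsize counter. Only the atomic one satisfies Send + Sync and can be shared between threads. The names NonAtomicCount, AtomicCount, and count_from_threads are hypothetical. Atomic operations generally cost more than plain Cell updates, so such a change would be a trade-off for single-threaded users.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Simplified model of a non-atomic reference count, similar in spirit to
/// `tendril::NonAtomic` (not its actual implementation).
struct NonAtomicCount(Cell<usize>);

impl NonAtomicCount {
    fn incr(&self) -> usize {
        let n = self.0.get() + 1;
        self.0.set(n);
        n
    }
}

/// The thread-safe alternative: the same counter backed by `AtomicUsize`.
struct AtomicCount(AtomicUsize);

impl AtomicCount {
    fn incr(&self) -> usize {
        // Relaxed is enough for a plain counter; reference counts that free
        // shared data (like `std::sync::Arc`) also use Release/Acquire on decrement.
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }

    fn get(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

/// Compiles only for types that may be moved and shared across threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Increments one shared `AtomicCount` from several threads.
fn count_from_threads(threads: usize, per_thread: usize) -> usize {
    let count = Arc::new(AtomicCount(AtomicUsize::new(0)));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    count.incr();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    count.get()
}

fn main() {
    assert_send_sync::<AtomicCount>();
    // Does not compile: `Cell<usize>` is not `Sync`, so `NonAtomicCount` is not either.
    // assert_send_sync::<NonAtomicCount>();

    // The non-atomic counter still works, but only within one thread.
    let local = NonAtomicCount(Cell::new(0));
    local.incr();

    println!("{}", count_from_threads(4, 1000)); // prints 4000
}
```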
Provide an alternative thread-safe API:
Consider providing an alternative API in the scraper crate that allows for thread-safe usage of the HTML parsing and extraction functionality.
This could involve introducing new types or methods that are designed to be used safely in async contexts and across thread boundaries.
Document the limitations and provide usage guidelines:
Update the documentation of the scraper crate to clearly state the limitations regarding the usage of the Html type in async contexts and across thread boundaries.
Provide guidelines and examples on how to use the scraper crate safely in such scenarios, possibly suggesting alternative approaches or workarounds.
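One workaround such guidelines could describe is to create and drop the Html value entirely on one thread and move only owned data across the boundary. The sketch below does this with std::thread; Document is again a hypothetical stand-in for scraper::Html, and extract_links_on_worker is a hypothetical helper. In an async program, a runtime's blocking-task facility (for example tokio::task::spawn_blocking in Tokio) follows the same shape: its closure and return value must be Send, while values created inside the closure need not be.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for `scraper::Html`: the `Rc` makes it `!Send`.
struct Document {
    links: Rc<Vec<String>>,
}

impl Document {
    /// Toy "parser": every whitespace-separated token starting with '/' is a link.
    fn parse(html: &str) -> Self {
        let links = html
            .split_whitespace()
            .filter(|tok| tok.starts_with('/'))
            .map(String::from)
            .collect();
        Document { links: Rc::new(links) }
    }

    fn links(&self) -> Vec<String> {
        self.links.as_ref().clone()
    }
}

/// Parses on a worker thread. Only the input `String` and the output
/// `Vec<String>` cross the thread boundary; the `!Send` `Document` is
/// created and dropped entirely on the worker.
fn extract_links_on_worker(html: String) -> Vec<String> {
    thread::spawn(move || Document::parse(&html).links())
        .join()
        .expect("parser thread panicked")
}

fn main() {
    let links = extract_links_on_worker("index: /docs/a /docs/b".to_string());
    println!("{links:?}"); // prints ["/docs/a", "/docs/b"]
}
```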
Additional Information
Rust version: 1.77.1 (7cf61ebde 2024-03-27)
Operating system: Arch Linux (GNU/Linux)
scraper version: 0.19.0
If you need any further information or have any questions, please let me know. I would be happy to provide more details or assist in any way possible.
Thank you for your time and consideration.