future created by async block is not Send #181

Closed
JamesClarke7283 opened this issue Apr 9, 2024 · 2 comments

Comments

@JamesClarke7283

Description

I encountered an issue while using the scraper crate in my Rust project. The error message indicates that the future created by the async block is not Send, which means it cannot be safely sent between threads. The issue arises because the Html type from the scraper crate does not implement the Send trait.

The specific error message is as follows:

error: future cannot be sent between threads safely
  --> docs-to-knowledge/src/lib.rs:29:91
   |
29 |     async fn fetch_all(&self) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
   |                                                                                           ^
30 |         match self.knowledge_type {
31 |             KnowledgeType::CratesIo => crates_io::fetch_docs(&self.repo_path).await,
32 |         }
33 |     }
   |_____^ future created by async block is not `Send`
   |
   = help: within `tendril::tendril::NonAtomic`, the trait `Sync` is not implemented for `Cell<usize>`, which is required by `{async block@docs-to-knowledge/src/lib.rs:29:91: 33:6}: Send`
   = note: if you want to do aliasing and mutation between multiple threads, use `std::sync::RwLock` or `std::sync::atomic::AtomicUsize` instead

note: future is not `Send` as this value is used across an await
  --> docs-to-knowledge/src/crates_io.rs:46:61
   |
34 |     let document = Html::parse_document(&html);
   |         -------- has type `Html` which is not `Send`
...
46 |         let page_html = fetch_page_html(&client, &page_url).await?;
   |                                                             ^^^^^ await occurs here, with `document` maybe used later
   = note: required for the cast from `Pin<Box<{async block@docs-to-knowledge/src/lib.rs:29:91: 33:6}>>` to `Pin<Box<dyn Future<Output = Result<String, Box<dyn std::error::Error + Send + Sync>>> + Send>>`

The issue occurs because the Html type contains non-atomic tendrils from the tendril crate (tendril::tendril::NonAtomic), which wrap a Cell<usize>, so Html is not Send. When an Html value is kept alive across an await point (e.g., the .await? call on fetch_page_html), it is stored as part of the future's state. The error shows the future being cast to a dyn Future + Send, which allows an executor to move it to another thread between polls, so every value the future holds across an await must itself be Send.
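One general workaround, independent of anything the crate provides, is to confine the non-Send value to a block that ends before the first await, so it never becomes part of the future's state. The sketch below uses only the standard library and rests on assumptions: Document, fetch_page, and fetch_docs are hypothetical stand-ins (not the scraper API and not the reporter's code), and block_on is a toy executor included only so the sketch can run.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for a non-`Send` document type: the `Rc<Cell<usize>>`
// field makes it neither `Send` nor `Sync`.
struct Document {
    _marker: Rc<Cell<usize>>,
    text: String,
}

impl Document {
    fn parse(html: &str) -> Self {
        Document { _marker: Rc::new(Cell::new(1)), text: html.to_owned() }
    }

    // Toy "extraction": every whitespace-separated token starting with "http".
    fn links(&self) -> Vec<String> {
        self.text
            .split_whitespace()
            .filter(|word| word.starts_with("http"))
            .map(str::to_owned)
            .collect()
    }
}

// Hypothetical async fetch; a real one would perform network I/O.
async fn fetch_page(url: &str) -> String {
    format!("<html>{url}</html>")
}

async fn fetch_docs(html: String) -> Vec<String> {
    // The non-`Send` value lives only inside this block. It is dropped
    // before the first `.await`, so the future only holds `Send` data.
    let urls = {
        let document = Document::parse(&html);
        document.links()
    };
    let mut pages = Vec::new();
    for url in &urls {
        pages.push(fetch_page(url).await);
    }
    pages
}

// Compiles only if the value passed in is `Send`.
fn assert_send<T: Send>(value: T) -> T {
    value
}

// Toy executor: busy-polls on the current thread. Adequate only for
// futures that never wait on an external wakeup.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```

Wrapping the call as assert_send(fetch_docs(input)) turns the Send requirement into a compile-time check. If the inner block is removed so that document stays alive across the loop's await, that call should stop compiling with an error similar to the one in the report.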

Steps to Reproduce

  1. Create a new Rust project and add the scraper crate as a dependency in the Cargo.toml file.
  2. Use the scraper crate to parse HTML and extract information within an async context.
  3. Use the Html type from the scraper crate across an await point (e.g., store it in a variable and use it after an .await call).
  4. Build the project and observe the compilation error reporting that the future is not Send.

Possible Solutions

Here are a few possible solutions to resolve this issue:

  1. Make the Html type Send and Sync:

    • Modify the implementation of the Html type in the scraper crate to ensure that it is thread-safe and implements the Send and Sync traits.
    • This may involve replacing the usage of non-thread-safe types like Cell<usize> with thread-safe alternatives like AtomicUsize or RwLock.
  2. Provide an alternative thread-safe API:

    • Consider providing an alternative API in the scraper crate that allows for thread-safe usage of the HTML parsing and extraction functionality.
    • This could involve introducing new types or methods that are designed to be used safely in async contexts and across thread boundaries.
  3. Document the limitations and provide usage guidelines:

    • Update the documentation of the scraper crate to clearly state the limitations regarding the usage of the Html type in async contexts and across thread boundaries.
    • Provide guidelines and examples on how to use the scraper crate safely in such scenarios, possibly suggesting alternative approaches or workarounds.
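To illustrate the idea behind solution 1, here is a minimal standard-library sketch of swapping a Cell<usize> counter for an AtomicUsize so the containing type can be shared between threads. LocalCount, SharedCount, and count_across_threads are hypothetical names for illustration and do not reflect how scraper or tendril are implemented.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical non-thread-safe counter: `Cell<usize>` is not `Sync`,
// so an `Arc<LocalCount>` could not be moved into a spawned thread.
#[allow(dead_code)]
struct LocalCount(Cell<usize>);

// Hypothetical thread-safe counterpart: `AtomicUsize` is `Send + Sync`,
// and so is any struct made up only of such fields.
struct SharedCount(AtomicUsize);

impl SharedCount {
    fn new() -> Self {
        SharedCount(AtomicUsize::new(0))
    }

    fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    fn get(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }
}

// Increments one shared counter from several threads and returns the total.
fn count_across_threads(threads: usize, per_thread: usize) -> usize {
    let count = Arc::new(SharedCount::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    count.increment();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    count.get()
}
```

Relaxed ordering is enough here because the counter guards no other data and join synchronizes with each thread before the final read. The trade-off is that atomic operations usually cost more than a plain Cell update.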

Additional Information

  • Rust version: 1.77.1 (7cf61ebde 2024-03-27)
  • Operating system: archlinux (GNU/Linux)
  • scraper version: 0.19.0

If you need any further information or have any questions, please let me know. I would be happy to provide more details or assist in any way possible.

Thank you for your time and consideration.

@adamreichold
Member

Please have a look at the existing atomic feature.

@fan-tastic-z

I solved it by adding the atomic feature.
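For readers landing here, a hedged sketch of what that might look like: the feature would be enabled on the scraper dependency in Cargo.toml, and a small compile-time helper can confirm whether a type is Send. The dependency line in the comment assumes the version from the report and standard Cargo feature syntax; assert_send is a hypothetical helper, not part of scraper.

```rust
// Cargo.toml (assumed form, using the version from the report above):
//
//   [dependencies]
//   scraper = { version = "0.19.0", features = ["atomic"] }

/// Compiles only when `T` is `Send`; it does nothing at run time.
fn assert_send<T: Send>() {}

// In a project that depends on scraper, a call such as
// `assert_send::<scraper::Html>()` is expected to compile only once
// `Html` is `Send`, which the comments above report for the `atomic` feature.
```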
