
Enable configuration for per-file timeout #593

Closed
pombredanne opened this issue Jan 31, 2023 · 4 comments

Comments

@pombredanne
Contributor

pombredanne commented Jan 31, 2023

With these two files, part of a larger Boost codebase:
https://raw.githubusercontent.com/boostorg/typeof/develop/include/boost/typeof/vector150.hpp
https://raw.githubusercontent.com/boostorg/typeof/develop/include/boost/typeof/vector200.hpp

... when running a scan_codebase pipeline, we get a timeout at 120s:

INFO Some files failed to scan properly:
INFO Path: codebase/scipy-1.10.0/scipy/_lib/boost/boost/typeof/vector150.hpp
INFO   ERROR: for scanner: copyrights:
INFO   ERROR: Processing interrupted: timeout after 120 seconds.
INFO Path: codebase/scipy-1.10.0/scipy/_lib/boost/boost/typeof/vector200.hpp
INFO   ERROR: for scanner: copyrights:
INFO   ERROR: Processing interrupted: timeout after 120 seconds.
INFO Path: codebase/scipy-1.10.0/scipy/misc/ascent.dat
INFO   ERROR: for scanner: copyrights:
INFO   ERROR: Processing interrupted: timeout after 120 seconds.

But SCANCODEIO_TASK_TIMEOUT defaults to 86400s (24 hours).

The 120s timeout comes from the ScanCode toolkit default.
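To make the two levels concrete: a per-file timeout bounds each individual scanner call, while SCANCODEIO_TASK_TIMEOUT bounds the whole pipeline run, so a file can be interrupted after 120s long before the task limit is reached. Below is a minimal sketch of a per-file limit, assuming a thread-pool based timeout and a hypothetical `scan_file` callable. ScanCode uses its own interruption mechanism, so this is a swapped-in technique for illustration only; the helper name `scan_with_timeout` is also hypothetical.

```python
import concurrent.futures

DEFAULT_FILE_TIMEOUT = 120  # seconds; mirrors the toolkit default mentioned above


def scan_with_timeout(scan_file, path, timeout=DEFAULT_FILE_TIMEOUT):
    """Run a hypothetical scan_file(path) callable with at most `timeout` seconds.

    Returns (result, errors). On timeout, result is None and errors holds a
    message shaped like the log lines in this issue.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(scan_file, path)
    try:
        return future.result(timeout=timeout), []
    except concurrent.futures.TimeoutError:
        return None, [f"Processing interrupted: timeout after {timeout} seconds."]
    finally:
        # Do not block on a worker that is still running. A Python thread
        # cannot be killed, so a real implementation would need a separate
        # process or a signal to actually stop the slow scan.
        executor.shutdown(wait=False)
```

For example, `scan_with_timeout(slow_scan, "vector150.hpp", timeout=0.1)` with a `slow_scan` that sleeps for half a second returns `None` plus the timeout message, independently of any overall task deadline.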

We should either expose this as a setting or avoid having copyright processing be so slow on large files that it times out.
FWIW, here the copyright notice is at the top of the file, and scanning only the first 100 lines completes super fast:

$ head -n 100 vector150.hpp > foo
$ scancode --yaml y.ml --copyright foo
Setup plugins...
Collect file inventory...
Scan files for: copyrights with 1 process(es)...
[####################] 2             
Scanning done.
Summary:        copyrights with 1 process(es)
Errors count:   0
Scan Speed:     20.02 files/sec. 
Initial counts: 1 resource(s): 1 file(s) and 0 directorie(s) 
Final counts:   1 resource(s): 1 file(s) and 0 directorie(s) 
Timings:
  scan_start: 2023-01-31T194140.236620
  scan_end:   2023-01-31T194140.292716
Removing temporary files...done.
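The experiment above suggests an alternative to raising the timeout: when the notice sits near the top, a scanner could look at only the first lines of very large files. Below is a minimal sketch of that idea, assuming a hypothetical line budget `max_lines` and a deliberately naive regex standing in for ScanCode's real copyright detector, which works differently. The helpers `head_lines` and `naive_copyrights` are hypothetical.

```python
import itertools
import re

# Naive stand-in for a real copyright detector; illustration only.
COPYRIGHT_RE = re.compile(r"\bcopyright\b.*", re.IGNORECASE)


def head_lines(path, max_lines=100):
    """Read at most max_lines lines, like `head -n 100`, without loading the whole file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(itertools.islice(f, max_lines))


def naive_copyrights(path, max_lines=100):
    """Return copyright-looking text found in the first max_lines lines of path."""
    found = []
    for line in head_lines(path, max_lines):
        match = COPYRIGHT_RE.search(line)
        if match:
            found.append(match.group(0).strip())
    return found
```

The trade-off is that any notice appearing after the line budget (for example in concatenated or amalgamated files) would be missed, so a budget like this would likely need to be configurable rather than fixed.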

See also aboutcode-org/scancode-toolkit#2726 (comment)

JonoYang added a commit that referenced this issue Feb 1, 2023
    * The environment variable SCANCODEIO_SCAN_FILE_TIMEOUT can be set to control how much time is given to a file when scanning a codebase

Signed-off-by: Jono Yang <jyang@nexb.com>
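A minimal sketch of how a setting like this can be read, assuming a plain `os.environ` lookup with the toolkit's 120-second default as fallback. ScanCode.io's actual settings module may parse it differently, and the helper name `get_scan_file_timeout` is hypothetical.

```python
import os

DEFAULT_SCAN_FILE_TIMEOUT = 120  # seconds, the ScanCode toolkit default


def get_scan_file_timeout(environ=os.environ):
    """Return the per-file scan timeout in seconds from SCANCODEIO_SCAN_FILE_TIMEOUT.

    Falls back to the default when the variable is unset or blank, and raises
    ValueError on a non-integer or non-positive value instead of ignoring it.
    """
    raw = environ.get("SCANCODEIO_SCAN_FILE_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_SCAN_FILE_TIMEOUT
    value = int(raw)  # raises ValueError for values like "abc"
    if value <= 0:
        raise ValueError("SCANCODEIO_SCAN_FILE_TIMEOUT must be a positive integer")
    return value
```

With `SCANCODEIO_SCAN_FILE_TIMEOUT="3600"` in the environment, this returns 3600, giving each file up to an hour instead of two minutes.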
@tdruez
Contributor

tdruez commented Feb 15, 2023

But SCANCODEIO_TASK_TIMEOUT defaults to 86400s.

This applies to the whole pipeline run, not to a single file.
The SCANCODE_TOOLKIT_CLI_OPTIONS setting https://scancodeio.readthedocs.io/en/latest/scancodeio-settings.html#scancode-toolkit-cli-options is available to pass custom values to scancode-toolkit, but it only works for pipelines that call the toolkit executable through a subprocess.
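The option only helps subprocess-based pipelines because it is spliced into a command line. A minimal sketch, assuming the option string is split with `shlex` and appended to a `scancode` invocation. The `build_scancode_command` helper and the exact argument order are hypothetical, and the command is only built here, not run.

```python
import shlex


def build_scancode_command(input_path, output_path, cli_options="", scanners=("--copyright",)):
    """Build an argv list for a scancode subprocess call.

    cli_options is a raw string such as a SCANCODE_TOOLKIT_CLI_OPTIONS value,
    e.g. "--timeout=3600"; shlex.split honors shell-style quoting.
    """
    command = ["scancode", *scanners, *shlex.split(cli_options)]
    command += ["--json-pp", output_path, input_path]
    return command
```

Pipelines that call the toolkit's Python functions in-process never build such a command line, so an option string like this never reaches them, which is the gap described above.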

We need a better way to handle this.

@junyer

junyer commented Feb 15, 2023

FWIW, we ended up adding this to .env-scancodeio when testing the AppImage:

SCANCODE_TOOLKIT_CLI_OPTIONS="--timeout=3600"
SCANCODEIO_SCAN_FILE_TIMEOUT="3600"

tdruez added a commit that referenced this issue Mar 22, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@tdruez
Contributor

tdruez commented Mar 22, 2023

@pombredanne the SCANCODEIO_SCAN_FILE_TIMEOUT setting is now available to control the per-file timeout #644
Thanks @JonoYang for the solution in 0a7c248

@pombredanne I'm keeping the issue open until we discuss the following approach:

We should either expose this as a setting or avoid having copyright processing be so slow on large files that it times out.
FWIW, here the copyright notice is at the top of the file, and scanning only the first 100 lines completes super fast:

@tdruez
Contributor

tdruez commented Jan 8, 2024

To clarify, there are 2 timeout settings available:

tdruez closed this as completed Jan 8, 2024