Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processing stuck in "Queued" state #387

Closed
lucasgonze opened this issue Jan 21, 2022 · 10 comments
Closed

Processing stuck in "Queued" state #387

lucasgonze opened this issue Jan 21, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@lucasgonze
Copy link

lucasgonze commented Jan 21, 2022

Context: 2017 Air running OS X Monterey

Install and run...

  1. git clone https://github.com/nexB/scancode.io.git && cd scancode.io
  2. make envfile
  3. docker-compose build
  4. docker-compose up

Zip up a project to analyze and open the Scancode UI:

  1. git clone git@github.com:magma/magma.git
  2. tar cfz magma.tgz magma
  3. open http://localhost
  4. add inputs magma.tgz
  5. submit for processing as scan_codebase or scan_package
  6. come back four hours later, no change: item is still in "Queued" state. No error reports in the terminal or browser, no feedback of any kind.

image.

Terminal log:
log.txt

@pombredanne
Copy link
Contributor

@lucasgonze Thanks for the report.
This is weird... just to rule out OS-related issues... you are running a macOS Monterey on Intel, correct?

@pombredanne
Copy link
Contributor

Note that this magma repo is big enough with about 1/2 of it that's .git metadata (not that this matters, just a note)

@pombredanne
Copy link
Contributor

Your report having a log is great BTW. Thanks for this.
the key issue seems to be:

worker_1  | 19:50:36 Registering birth of worker 537d425ec5cb4dae9d7c9e3346485fb8
worker_1  | 19:50:36 Worker rq:worker:537d425ec5cb4dae9d7c9e3346485fb8: started, version 1.10.1
worker_1  | 19:50:36 Subscribing to channel rq:pubsub:537d425ec5cb4dae9d7c9e3346485fb8
worker_1  | 19:50:36 *** Listening on default...
worker_1  | 19:50:36 Sent heartbeat to prevent worker timeout. Next one should arrive within 480 seconds.
worker_1  | 19:50:36 Cleaning registries for queue: default
worker_1  | INFO Synchronizing QUEUED and RUNNING Runs with their related Jobs...
web_1     | Operations to perform:
web_1     |   Apply all migrations: admin, auth, authtoken, contenttypes, django_rq, scanpipe, sessions
web_1     | Running migrations:
db_1      | 2022-01-20 19:50:36.080 UTC [72] ERROR:  relation "scanpipe_run" does not exist at character 370
db_1      | 2022-01-20 19:50:36.080 UTC [72] STATEMENT:  SELECT "scanpipe_run"."uuid", "scanpipe_run"."task_id", "scanpipe_run"."task_start_date", "scanpipe_run"."task_end_date", "scanpipe_run"."task_exitcode", "scanpipe_run"."task_output", "scanpipe_run"."project_id", "scanpipe_run"."pipeline_name", "scanpipe_run"."created_date", "scanpipe_run"."scancodeio_version", "scanpipe_run"."description", "scanpipe_run"."log" FROM "scanpipe_run" WHERE ("scanpipe_run"."task_end_date" IS NULL AND "scanpipe_run"."task_id" IS NOT NULL) ORDER BY "scanpipe_run"."created_date" ASC
worker_1  | 19:50:36 Worker rq:worker:537d425ec5cb4dae9d7c9e3346485fb8: found an unhandled exception, quitting...
worker_1  | Traceback (most recent call last):
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
worker_1  |     return self.cursor.execute(sql, params)
worker_1  | psycopg2.errors.UndefinedTable: relation "scanpipe_run" does not exist
worker_1  | LINE 1: ...ipe_run"."description", "scanpipe_run"."log" FROM "scanpipe_...
worker_1  |                                                              ^
worker_1  | 
worker_1  | 
worker_1  | The above exception was the direct cause of the following exception:
worker_1  | 
worker_1  | Traceback (most recent call last):
worker_1  |   File "/usr/local/lib/python3.9/site-packages/rq/worker.py", line 591, in work
worker_1  |     self.run_maintenance_tasks()
worker_1  |   File "/app/scancodeio/worker.py", line 57, in run_maintenance_tasks
worker_1  |     scanpipe_app.sync_runs_and_jobs()
worker_1  |   File "/app/scanpipe/apps.py", line 211, in sync_runs_and_jobs
worker_1  |     if queued_or_running:
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 284, in __bool__
worker_1  |     self._fetch_all()
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
worker_1  |     self._result_cache = list(self._iterable_class(self))
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
worker_1  |     results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
worker_1  |     cursor.execute(sql, params)
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
worker_1  |     return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
worker_1  |     return executor(sql, params, many, context)
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
worker_1  |     return self.cursor.execute(sql, params)
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
worker_1  |     raise dj_exc_value.with_traceback(traceback) from exc_value
worker_1  |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
worker_1  |     return self.cursor.execute(sql, params)
worker_1  | django.db.utils.ProgrammingError: relation "scanpipe_run" does not exist
worker_1  | LINE 1: ...ipe_run"."description", "scanpipe_run"."log" FROM "scanpipe_...
worker_1  |                                                              ^

It feels as if the worker is trying to connect to the DB before the DB was actually created fully (e.g. from running the migrations that create/update the tables schema are applied)

That's really weird

@pombredanne pombredanne added the bug Something isn't working label Jan 21, 2022
@pombredanne
Copy link
Contributor

@lucasgonze can you confirm that the same issue occurs with even a tiny project? Say a scan_package pipeline with this URL from a PyPI package that is a few KB https://files.pythonhosted.org/packages/eb/5b/f82983127b1a1a4db83ee290e00a94a2b08c566fa6c58466e82ed7b0a76f/boolean.py-3.8-py2.py3-none-any.whl (that's a dep of ScanCode FWIW)

@lucasgonze
Copy link
Author

@lucasgonze Thanks for the report. This is weird... just to rule out OS-related issues... you are running a macOS Monterey on Intel, correct?

That's correct. This is an Intel.

image

@lucasgonze
Copy link
Author

It feels as if the worker is trying to connect to the DB before the DB was actually created fully (e.g. from running the migrations that create/update the tables schema are applied)

I think that's the right diagnosis.

After killing the first run and restarting, I got this following:


Attaching to scancodeio_redis_1, scancodeio_db_1, scancodeio_web_1, scancodeio_worker_1, scancodeio_nginx_1
db_1      | The files belonging to this database system will be owned by user "postgres".
db_1      | This user must also own the server process.
db_1      |
db_1      | The database cluster will be initialized with locale "en_US.utf8".
db_1      | The default database encoding has accordingly been set to "UTF8".
db_1      | The default text search configuration will be set to "english".
db_1      |
db_1      | Data page checksums are disabled.
db_1      |
db_1      | fixing permissions on existing directory /var/lib/postgresql/data ... ok
db_1      | creating subdirectories ... ok

Log of the 2nd run attached. This includes all CLI activity from the initial install through the second run. second run log.txt

Race conditions are particularly likely on this 2017 system because the hardware is older and slower.

@lucasgonze
Copy link
Author

Status during the second run is very different:
image

If it's helpful for debugging, I can nuke my first install and run from a clean state.

@lucasgonze
Copy link
Author

At the 21 hour mark, processing completed for one of the packages:
image

@lucasgonze
Copy link
Author

@lucasgonze can you confirm that the same issue occurs with even a tiny project? Say a scan_package pipeline with this URL from a PyPI package that is a few KB https://files.pythonhosted.org/packages/eb/5b/f82983127b1a1a4db83ee290e00a94a2b08c566fa6c58466e82ed7b0a76f/boolean.py-3.8-py2.py3-none-any.whl (that's a dep of ScanCode FWIW)

I'm happy to do that if it's still useful.

pombredanne added a commit that referenced this issue Jan 22, 2022
In some case when using docker compose up, the worker processes may
start before the web process's database migration is completed.
This trigger byzantine errors that are not obvious except by looking in
the logs and will let you start a pipeline that will be queued forever.
To solve this issue we install the wait-for-it script available in
Debian by @vishnubob and as suggested in the Docker documentation.
In the docker-compose.yml, we let the worker wait for the web processing
to be complete when gunicorn exposes port 8000 and web container is
available.

Reported-by: Lucas Gonze @lucasgonze
Reference: #387
Reference: https://docs.docker.com/compose/startup-order/
Reference: https://github.com/vishnubob/wait-for-it
Reference: https://tracker.debian.org/pkg/wait-for-it
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@tdruez
Copy link
Contributor

tdruez commented Apr 5, 2022

Fix merged in main branch ce21929
Thanks @pombredanne !

@tdruez tdruez closed this as completed Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants