[Core] Turn on WAL mode for cluster job table #3863

Open
Michaelvll opened this issue Aug 22, 2024 · 1 comment
Labels
good first issue Good for newcomers P0

Comments

@Michaelvll
Collaborator

Michaelvll commented Aug 22, 2024

For the job table on clusters, we do not turn on SQLite's WAL (write-ahead logging) mode, which causes "database is locked" errors when a user cancels and submits new jobs quickly and frequently:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/skypilot-runtime/lib/python3.10/site-packages/sky/skylet/log_lib.py", line 448, in tail_logs
    for line in _follow_job_logs(log_file,
  File "/root/skypilot-runtime/lib/python3.10/site-packages/sky/skylet/log_lib.py", line 381, in _follow_job_logs
    status = job_lib.get_status_no_lock(job_id)
  File "/root/skypilot-runtime/lib/python3.10/site-packages/sky/skylet/job_lib.py", line 339, in get_status_no_lock
    rows = _CURSOR.execute('SELECT status FROM jobs WHERE job_id=(?)',
sqlite3.OperationalError: database is locked
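
For reference, a minimal sketch of what turning on WAL mode for a SQLite database can look like with Python's standard `sqlite3` module. This is not SkyPilot's actual code: the helper names `connect_with_wal` and `journal_mode`, the `timeout` default, and the database path passed in are all assumptions made for illustration.

```python
import sqlite3


def connect_with_wal(db_path: str, timeout: float = 10.0) -> sqlite3.Connection:
    """Open a connection and switch the database to WAL journaling.

    Hypothetical helper for illustration only. `timeout` is how long
    sqlite3 waits on a locked database before raising OperationalError.
    """
    conn = sqlite3.connect(db_path, timeout=timeout)
    # WAL mode is stored in the database file, so it persists for
    # connections opened later. It applies to on-disk databases only.
    conn.execute('PRAGMA journal_mode=WAL')
    return conn


def journal_mode(conn: sqlite3.Connection) -> str:
    """Return the journal mode currently in effect for this connection."""
    return conn.execute('PRAGMA journal_mode').fetchone()[0]
```

In WAL mode, readers do not block a writer and a writer does not block readers, so a status query like the `SELECT` in the traceback should no longer fail just because another process is in the middle of a write. SQLite still allows only one writer at a time, so keeping a non-zero `timeout` is likely still worthwhile when cancels and submits overlap.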
@wizenheimer
Contributor

Hey @Michaelvll,
Pushed a fix. Please take a look.
