CUDA Error: device-side assert triggered #370

Open
THEWHITEBOY503 opened this issue Jul 26, 2023 · 4 comments

I just got an RTX 3060 today and have been playing with KoboldAI all day. At some point, I tried overclocking my GPU with MSI Afterburner using reasonable settings, and now every time I try to generate, I get this error:

C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [32,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [33,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [34,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [35,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [36,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [37,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [38,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [39,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [40,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [41,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [42,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [43,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [44,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [45,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [46,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [47,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [48,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [49,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [50,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [51,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [52,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [54,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [56,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [57,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [58,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [59,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [61,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [2,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [3,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [4,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [5,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [6,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [7,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [9,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [10,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [11,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [12,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [13,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [14,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [15,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [16,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [17,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [18,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [19,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [20,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [21,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [22,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [23,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [24,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [25,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [26,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [27,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [28,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [29,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [30,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:145: block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[2023-07-25 19:33:50,013] ERROR in app: Exception on /api/v1/generate [POST]
Traceback (most recent call last):
  File "B:\python\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "B:\python\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "B:\python\lib\site-packages\flask_cors\extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "B:\python\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "B:\python\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "aiserver.py", line 843, in g
    return f(*args, **kwargs)
  File "aiserver.py", line 765, in decorated
    response = f(schema, *args, **kwargs)
  File "aiserver.py", line 748, in decorated
    raise e
  File "aiserver.py", line 739, in decorated
    return f(*args, **kwargs)
  File "aiserver.py", line 8443, in post_generate
    return _generate_text(body)
  File "aiserver.py", line 8308, in _generate_text
    genout = apiactionsubmit(body.prompt, use_memory=body.use_memory, use_story=body.use_story, use_world_info=body.use_world_info, use_authors_note=body.use_authors_note)
  File "aiserver.py", line 3572, in apiactionsubmit
    genout = apiactionsubmit_generate(tokens, minimum, maximum)
  File "aiserver.py", line 3463, in apiactionsubmit_generate
    _genout, already_generated = tpool.execute(model.core_generate, txt, set())
  File "B:\python\lib\site-packages\eventlet\tpool.py", line 132, in execute
    six.reraise(c, e, tb)
  File "B:\python\lib\site-packages\six.py", line 719, in reraise
    raise value
  File "B:\python\lib\site-packages\eventlet\tpool.py", line 86, in tworker
    rv = meth(*args, **kwargs)
  File "C:\KoboldAI\modeling\inference_model.py", line 342, in core_generate
    result = self.raw_generate(
  File "C:\KoboldAI\modeling\inference_model.py", line 589, in raw_generate
    result = self._raw_generate(
  File "C:\KoboldAI\modeling\inference_models\hf_torch.py", line 328, in _raw_generate
    genout = self.model.generate(
  File "B:\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\generation\utils.py", line 1572, in generate
    return self.sample(
  File "C:\KoboldAI\modeling\inference_models\hf_torch.py", line 260, in new_sample
    return new_sample.old_sample(self, *args, **kwargs)
  File "B:\python\lib\site-packages\transformers\generation\utils.py", line 2619, in sample
    outputs = self(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 854, in forward
    transformer_outputs = self.transformer(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 689, in forward
    outputs = block(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 309, in forward
    attn_outputs = self.attn(
  File "B:\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "B:\python\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 233, in forward
    k_rot = apply_rotary_pos_emb(k_rot, sin, cos)
  File "B:\python\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 77, in apply_rotary_pos_emb
    sin = torch.repeat_interleave(sin[:, :, None, :], 2, 3)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I reset Afterburner to its default settings and disabled it at Windows startup, updated my GPU drivers, reinstalled the KoboldAI dependencies, and rebooted twice, but have had no luck.
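
The error text above recommends `CUDA_LAUNCH_BLOCKING=1`, so that the failure is reported at the call that launched the kernel rather than at some later API call. A minimal sketch of one way to set it from Python, assuming the variable must be in place before the process that initializes CUDA starts. The helper name `blocking_cuda_env` and the launch command in the comment are hypothetical and not part of KoboldAI:

```python
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the Python
    # traceback points at the operation that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    env = blocking_cuda_env()
    print(env["CUDA_LAUNCH_BLOCKING"])
    # Hypothetical usage: hand `env` to the process that starts the server, e.g.
    # subprocess.run([sys.executable, "aiserver.py"], env=env)
```

Running with this setting is slower, so it is probably best used only while reproducing the error.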

THEWHITEBOY503 (Author) commented Jul 27, 2023

I can get it to work if I set the context below 2048. I also updated KoboldAI to the latest version. However, generation is very slow (much slower than what I was getting before). When I run a prompt, I get the message below before my prompt. I think it has something to do with the device-side assert error, since it talks about the index being out of bounds.
Token indices sequence length is longer than the specified maximum sequence length for this model (1575 > 1024). Running this sequence through the model will result in indexing errors
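
The assertion in the log, `idx_dim >= 0 && idx_dim < index_size`, fires when a lookup index falls outside the table being indexed. A rough sketch of the same check in plain Python, assuming (this is not confirmed in the thread) that the table has one row per token position and 2048 rows in total. `out_of_bounds` is a hypothetical helper for illustration only:

```python
def out_of_bounds(indices, index_size):
    """Return the indices that would fail the check `0 <= idx < index_size`."""
    return [idx for idx in indices if not (0 <= idx < index_size)]


# Assumption: positions 0..2299 for a 2300-token context, table of 2048 rows.
bad = out_of_bounds(range(2300), 2048)
print(len(bad), bad[0])  # 252 2048
```

Under that assumption, every position from 2048 upward would trip the assert, which would be consistent with the error disappearing once the context is set below 2048.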

henk717 (Collaborator) commented Jul 27, 2023

The last error is normal for some architectures, such as GPT-Neo and GPT-J based models. They use the GPT-2 tokenizer, which only supports up to 1024 tokens, while the model itself supports a longer context. So it warns you that it's applying a workaround, but it's fine.

Most models do not support more than 2048 tokens.
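
One way to stay inside a 2048-token limit is to drop the oldest prompt tokens so that the prompt plus the requested output still fits. A minimal sketch of that idea, not KoboldAI's actual logic: `fit_to_context`, `max_new_tokens`, and the default `max_positions=2048` are names and values assumed here for illustration.

```python
def fit_to_context(token_ids, max_new_tokens, max_positions=2048):
    """Keep the most recent tokens so that prompt + generation fits the window."""
    budget = max_positions - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from the end drops the oldest tokens first.
    return list(token_ids[-budget:])


prompt = list(range(2300))  # stand-in for real token ids
trimmed = fit_to_context(prompt, max_new_tokens=80)
print(len(trimmed))  # 1968
```

Prompts that already fit are returned unchanged, since the slice simply covers the whole list.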

THEWHITEBOY503 (Author) commented

> The last error is normal for some architectures, such as GPT-Neo and GPT-J based models. They use the GPT-2 tokenizer, which only supports up to 1024 tokens, while the model itself supports a longer context. So it warns you that it's applying a workaround, but it's fine.

Huh, interesting.

> Most models do not support more than 2048 tokens.

Previously I was running a context of about 2300 tokens and, IIRC, generating around 20 tokens/second. Now I have to keep it below 2048, otherwise it throws an error around 75% of the way in, and I'm getting 2 tokens/second or less.
