Issues: atoma-network/atoma-node-inference
Benchmark slide_assign vs Tensor::cat in the forward pass of flash_attention
#38 opened Aug 5, 2024 by jorgeantonio21
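The benchmark in #38 compares two ways of updating an attention buffer in candle: growing it with Tensor::cat versus writing into a pre-allocated buffer with slice_assign (presumably what the issue's "slide_assign" refers to). The sketch below is a minimal illustration of that comparison, not the repository's implementation; the shapes, buffer sizes, and variable names are assumptions for the example.

```rust
// Minimal sketch of the two buffer-update strategies the issue proposes to benchmark.
// Shapes and sizes are illustrative assumptions, not taken from the repository.
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let (batch, heads, head_dim, max_len) = (1, 8, 64, 16);

    // Strategy 1: concatenate the new key/value block onto the existing cache.
    // This reallocates and copies the whole cache on every step.
    let cache = Tensor::zeros((batch, heads, 4, head_dim), DType::F32, &dev)?;
    let new_kv = Tensor::randn(0f32, 1f32, (batch, heads, 1, head_dim), &dev)?;
    let grown = Tensor::cat(&[&cache, &new_kv], 2)?;

    // Strategy 2: pre-allocate the full buffer once and overwrite a slice in place.
    // slice_assign returns a tensor with the given ranges replaced by `new_kv`.
    let buffer = Tensor::zeros((batch, heads, max_len, head_dim), DType::F32, &dev)?;
    let updated = buffer.slice_assign(&[0..batch, 0..heads, 4..5, 0..head_dim], &new_kv)?;

    println!("cat result: {:?}, slice_assign result: {:?}", grown.dims(), updated.dims());
    Ok(())
}
```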
Address seqlenq_ngroups_swapped option in kv cached flash attention
#37 opened Aug 5, 2024 by jorgeantonio21
Remove unused kernels for paged attention
Labels: cuda, flash-attention2, optimization, paged-attention
#30 opened Jul 30, 2024 by jorgeantonio21
Check determinism of our implementation
Labels: candle, cuda, determinism, llama, paged-attention
#25 opened Jul 24, 2024 by jorgeantonio21
Add multi-gpu support
Labels: candle, cuda, flash-attention2, llama, paged-attention
#24 opened Jul 24, 2024 by jorgeantonio21 · 2 tasks