Allow indexing of large files #1040

Merged on Oct 13, 2023 (2 commits)

Conversation

@rsdy rsdy commented Oct 11, 2023

Allow indexing of large files, and also make a best effort at lazy loading these.

The hypothesis is that files in Git are generally not going to be too large. Because of the iteration logic, we have always read large files from Git in their entirety and then discarded them early in the process. Nothing changes there; we just discard them later.

For files from the file system, the process never reads the contents until it knows that it needs them. This may impact the accuracy of language detection for unindexed files, but otherwise it should be a safe change.
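
To illustrate the lazy-loading idea for file system entries, here is a minimal sketch. It is not the code from this PR: `LazyFile` and `guess_language_from_path` are hypothetical names invented for the example, and it uses only the standard library. The size comes from metadata and the language guess from the extension, so neither touches the file contents; the bytes are read once, on the first call to `contents()`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical wrapper (not a type from the bloop codebase): it keeps only
/// the path around and defers reading the bytes until they are requested.
pub struct LazyFile {
    path: PathBuf,
    contents: Option<Vec<u8>>,
}

impl LazyFile {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self {
            path: path.into(),
            contents: None,
        }
    }

    /// File size taken from metadata; this does not read the contents.
    pub fn size(&self) -> io::Result<u64> {
        Ok(fs::metadata(&self.path)?.len())
    }

    /// Whether the contents have already been read into memory.
    pub fn is_loaded(&self) -> bool {
        self.contents.is_some()
    }

    /// Reads the file on the first call and returns the cached bytes afterwards.
    pub fn contents(&mut self) -> io::Result<&[u8]> {
        if self.contents.is_none() {
            self.contents = Some(fs::read(&self.path)?);
        }
        Ok(self.contents.as_deref().expect("contents were just loaded"))
    }
}

/// Hypothetical extension-only language guess. It never opens the file,
/// which is why it can be less accurate than content-based detection.
pub fn guess_language_from_path(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "rs" => Some("Rust"),
        "py" => Some("Python"),
        "ts" => Some("TypeScript"),
        _ => None,
    }
}
```

A caller could check `size()` and `guess_language_from_path` first and only call `contents()` for files it decides to index, which mirrors the trade-off described above: cheaper traversal in exchange for a weaker language guess on files that are never read.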

@rsdy rsdy merged commit d9d747b into main Oct 13, 2023
3 checks passed
@rsdy rsdy deleted the rsdy/blo-1747-allow-indexing-of-large-files branch October 13, 2023 07:35
Dead4W added a commit to Dead4W/bloop that referenced this pull request Oct 15, 2023
* Use accurate token counts (BloopAI#1024)

* Use accurate token counts

The `total` token count is now computed from a conversation formatted
identically to the one sent to the LLM during response generation.

This should make the counts accurate and prevent token limit errors
entirely.

* Match token counts precisely, and add baseline count

* Calculate `total` as summation of other section token counts

* add message to response if it ended when token limit exceeded, no translations

---------

Co-authored-by: anastasiia <anastasiya1155@gmail.com>

* Send full redirect target to cognito (BloopAI#1034)

* Update Helm chart (BloopAI#1033)

* update helm chart

* remove secret.yaml

* put back secret.yaml

* mandatory Semantic in /search (BloopAI#1039)

* Upgrade qdrant version to 1.6 (BloopAI#1037)

* Upgrade qdrant version

* Update qdrant binaries

* Fix race in credentials polling (BloopAI#1042)

Previously, the assumption here was that this path is locked and safe
whenever a GitHub credential is present in the system. However, when the
user logs out, the `unwrap()` call can panic and take down the task, and
this may cause issues.

* Enable paid features for desktop users (BloopAI#1038)

* Add pro features to default builds

* Check user's status through `User` object

* Adapt webserver layer for paid feature gate

* Just enforce schema right out of the gate

* Fix date parsing logic

* allow paid users sync branches

* Fix clippy features

* Disable branch switching for local repos

* cloud implies pro

* Update dockerfile

---------

Co-authored-by: anastasiia <anastasiya1155@gmail.com>

* Update flake (BloopAI#1044)

* Debug logs for initialisation (BloopAI#1036)

* debug logs for initialisation

* Scope logging

* DB too

* Clippy error

---------

Co-authored-by: rsdy <p@symmetree.dev>

* Allow indexing of large files (BloopAI#1040)

* WIP

* Support indexing large files, but lazy load them from local file system

* bump tantivy to v0.21 (BloopAI#1043)

* bump tantivy to v0.21

* address clippy

* fix broken tests

* bump version to 0.5.6 (BloopAI#1047)

---------

Co-authored-by: calyptobai <111788964+calyptobai@users.noreply.github.com>
Co-authored-by: anastasiia <anastasiya1155@gmail.com>
Co-authored-by: rsdy <rsdy@users.noreply.github.com>
Co-authored-by: Gabriel Gordon-Hall <ggordonhall@gmail.com>
Co-authored-by: rsdy <p@symmetree.dev>
Co-authored-by: akshay <nerdy@peppe.rs>
Co-authored-by: Anastasiia Solop <35258279+anastasiya1155@users.noreply.github.com>
Co-authored-by: Ilya Zedgenizov <izedgenizov@saber.games>