Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to reindex the database and remove events. #49

Merged
merged 33 commits into from
Mar 10, 2020
Merged

Conversation

poljar
Copy link
Collaborator

@poljar poljar commented Mar 6, 2020

This is a bit of a big one and it essentially implements two features.

While developing support to remove events from the database/index it has come to my attention that documents can only be deleted by a field if the field is indexed.
Tantivy has no way to find a document by a field if the field isn't indexed (doh).

This in turn requires a schema change on our part since we really need to delete events by event id, which is unique and guarantees that we're only going to remove this one intended event.

Our current Tantivy schema is lacking some more definitions to support the following use cases:

While this PR doesn't implement all of the things that are needed to support those features it does make the required Tantivy schema changes and bumps the database version.

To allow breaking schema changes without the need to delete the whole database and re-download the complete room history we introduce a RecoveryDatabase, this database acts as a read only database that allows the index to be deleted while keeping the SQLite database in tact.

The RecoveryDatabase can then proceed to create a new index and fetch events out of the database for them to be re-added to the index.

As already mentioned, support to remove events from the database and index is included here as well. Removing an event requires a Tantivy commit, so they need to go through the flow control logic the same way as newly added events.

Some things to consider during review:

  • Do we want all the non-implemented features as well
  • Since re-indexing requires events to be de-serialized and neon-serde can't be used for this; do we want to make this feature optional since it requires serde_json.

This will allow the index to be searched by the sender or timestamp.
@poljar poljar requested a review from a team March 6, 2020 16:34
Copy link
Contributor

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks great to me! 😁

  • Do we want all the non-implemented features as well

Yes, I think at least this batch seems reasonable to add. It seems quite reasonable to want to support these extra query filters, even if we might not add it to Riot right away.

  • Since re-indexing requires events to be de-serialized and neon-serde can't be used for this; do we want to make this feature optional since it requires serde_json.

I would think all clients would want to be able to support re-indexing, so I would say it's fine to be required until there's a problem... I assume it works fine with Riot?

src/error.rs Outdated Show resolved Hide resolved
src/error.rs Outdated Show resolved Hide resolved

transaction.execute("DELETE from events WHERE event_id == ?1", &[&event_id])?;
transaction.execute(
"INSERT OR IGNORE INTO undeleted_events (event_id) VALUES (?1)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does undeleted_events mean "events to be deleted"? If so, "undeleted" is confusing to me as I would assumed it means "restored" or "brought back from the trash".

If my assumption is correct, maybe use pending_deletion_events or something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is events to be deleted.

Copy link
Member

@ara4n ara4n Mar 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to pending_deletion_events

* // reindex the database
* await recovery.reindex();
*/
class SeshatRecovery extends seshat.SeshatRecovery {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have feedback after seeing how you plan to handle the recovery path in Riot, but at an abstract level it seems okay.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit implements support for it in Riot: element-hq/element-web@921d734.

I didn't notice any delays at startup, the thing seems to be reasonably quick but do note that this was using a 3k events large database.

@poljar
Copy link
Collaborator Author

poljar commented Mar 9, 2020

I would think all clients would want to be able to support re-indexing, so I would say it's fine to be required until there's a problem... I assume it works fine with Riot?

It does work fine with Riot, the only thing that worries me is a bit of code/dependency bloat.

poljar and others added 3 commits March 10, 2020 09:19
Co-Authored-By: J. Ryan Stinnett <jryans@gmail.com>
Co-Authored-By: J. Ryan Stinnett <jryans@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants