Add support to reindex the database and remove events. #49

Merged
merged 33 commits into from
Mar 10, 2020
549ec33
database: Allow marking the database for a reindex.
poljar Feb 26, 2020
0ae4f35
dirty: Reindex support.
poljar Feb 29, 2020
ba56da5
index: Index the server timestamp and sender.
poljar Mar 2, 2020
378238f
seshat: Format the repo.
poljar Mar 2, 2020
9cbb560
database: Move the readonly database into a separate module.
poljar Mar 2, 2020
0fc2a66
database: Rename the ReadonlyDatabase to RecoveryDatabase.
poljar Mar 3, 2020
71c0847
database: More work on the recovery db.
poljar Mar 3, 2020
de85c23
events: Fix the event sources.
poljar Mar 3, 2020
af106dd
database: Improve the recovery test.
poljar Mar 3, 2020
46ef3b2
database: Add propper error handling to the recovery database.
poljar Mar 4, 2020
d3f2996
data: Add a v2 database that needs to be reindexed.
poljar Mar 4, 2020
c6f4df4
database: Add a method to delete events.
poljar Feb 19, 2020
7f7d7fe
database: Improve the database recovery test.
poljar Mar 4, 2020
f28220e
data: Add a v1 database with valid json in the event source field.
poljar Mar 4, 2020
af41adc
database: Test the new v1 database upgrade.
poljar Mar 4, 2020
e87fb54
database: Add missing docs to the recovery database.
poljar Mar 4, 2020
ace778c
seshat: Fix some clippy issues.
poljar Mar 4, 2020
ad3ebfc
seshat-node: Add a class to reindex Seshat.
poljar Mar 5, 2020
491be9a
seshat-node: Throw a different error if the database needs to be rein…
poljar Mar 5, 2020
3a3453e
seshat-node: Convert the RangeError into a custom error on the js side.
poljar Mar 5, 2020
bbbe4c7
seshat: Make the event deserialization return a Result.
poljar Mar 6, 2020
18e747d
seshat-node: Improve the reindex test.
poljar Mar 6, 2020
832455f
database: Fix the reindex tests.
poljar Mar 6, 2020
1ccd8d1
seshat-node: Fix the reindexing.
poljar Mar 6, 2020
fe0ff88
travis: Reset the test data before running Javascript tests.
poljar Mar 6, 2020
0029495
database: Remove some unused imports from the tests.
poljar Mar 6, 2020
786fecb
travis: First delete the database folder, then switch folders.
poljar Mar 6, 2020
947654b
seshat-node: Add docs and fix lint issues for the new recovery database.
poljar Mar 6, 2020
6aa353f
seshat-node: Explain our re-index error throwing shenanigans.
poljar Mar 6, 2020
a3dc17e
seshat-node: Fix some more lint errors.
poljar Mar 6, 2020
743d448
Update src/error.rs
poljar Mar 10, 2020
3b2b9b8
Update src/error.rs
poljar Mar 10, 2020
48b5d7d
database: Rename undeleted_events to pending_deletion_events.
poljar Mar 10, 2020
3 changes: 3 additions & 0 deletions .travis.yml
@@ -48,6 +48,9 @@ install:

script:
- cargo test
# Reset our test data before running the Javascript tests
- rm -r data/database
- git reset --hard HEAD
- cd seshat-node
- yarn install
- yarn test --coverage
1 change: 1 addition & 0 deletions Cargo.toml
@@ -28,6 +28,7 @@ pbkdf2 = "0.3.0"
rand = "0.7.3"
zeroize = "1.1.0"
byteorder = "1.3.2"
serde_json = "1.0.48"

[dev-dependencies]
tempfile = "3.1.0"
1 change: 1 addition & 0 deletions data/database/v1_2/.managed.json
@@ -0,0 +1 @@
["5249de4da76b404098f026f248155bf1.store","88b4ad3f5a1e430f9ea03e2ef43c39f7.fast","88b4ad3f5a1e430f9ea03e2ef43c39f7.fieldnorm","88b4ad3f5a1e430f9ea03e2ef43c39f7.pos","c7f9f53e463f4d9297d257449d63b8d0.posidx","9883d1a65e92474983c3022fb99ea0da.posidx","bf6498b5a1e948d090131605048ecbcd.fieldnorm","3b59e3c0f08646efb6b56192a98ab85f.idx","bf6498b5a1e948d090131605048ecbcd.posidx","5249de4da76b404098f026f248155bf1.term","c7f9f53e463f4d9297d257449d63b8d0.fast","9883d1a65e92474983c3022fb99ea0da.pos","d013a4c1859e403093ef8e37d7e88f32.pos","c7f9f53e463f4d9297d257449d63b8d0.term","5249de4da76b404098f026f248155bf1.posidx","18dcd68b373940f6844631d851e6139c.posidx","0112d870f836494da5f4c16699878d93.store","5249de4da76b404098f026f248155bf1.pos","3b59e3c0f08646efb6b56192a98ab85f.pos","bf6498b5a1e948d090131605048ecbcd.idx","d013a4c1859e403093ef8e37d7e88f32.posidx","c7f9f53e463f4d9297d257449d63b8d0.pos","5249de4da76b404098f026f248155bf1.idx","bf6498b5a1e948d090131605048ecbcd.store","9883d1a65e92474983c3022fb99ea0da.fast","c7f9f53e463f4d9297d257449d63b8d0.fieldnorm","88b4ad3f5a1e430f9ea03e2ef43c39f7.posidx","0112d870f836494da5f4c16699878d93.fast","c7f9f53e463f4d9297d257449d63b8d0.store","9883d1a65e92474983c3022fb99ea0da.term","3b59e3c0f08646efb6b56192a98ab85f.fieldnorm","d013a4c1859e403093ef8e37d7e88f32.fieldnorm","d013a4c1859e403093ef8e37d7e88f32.term","88b4ad3f5a1e430f9ea03e2ef43c39f7.idx","bf6498b5a1e948d090131605048ecbcd.pos","5249de4da76b404098f026f248155bf1.fast","5249de4da76b404098f026f248155bf1.fieldnorm","18dcd68b373940f6844631d851e6139c.fieldnorm","c7f9f53e463f4d9297d257449d63b8d0.idx","meta.json","18dcd68b373940f6844631d851e6139c.term","88b4ad3f5a1e430f9ea03e2ef43c39f7.term","3b59e3c0f08646efb6b56192a98ab85f.term","bf6498b5a1e948d090131605048ecbcd.term","bf6498b5a1e948d090131605048ecbcd.fast","9883d1a65e92474983c3022fb99ea0da.fieldnorm","3b59e3c0f08646efb6b56192a98ab85f.fast","18dcd68b373940f6844631d851e6139c.fast","88b4ad3f5a1e430f9ea03e2ef43c39f7.store","9883d1a65e92474983c3022fb99ea0da.idx","d013a4c1859e403093ef8e37d7e88f32.fast","18dcd68b373940f6844631d851e6139c.idx","9883d1a65e92474983c3022fb99ea0da.store","18dcd68b373940f6844631d851e6139c.store","18dcd68b373940f6844631d851e6139c.pos","3b59e3c0f08646efb6b56192a98ab85f.store","d013a4c1859e403093ef8e37d7e88f32.store","d013a4c1859e403093ef8e37d7e88f32.idx","3b59e3c0f08646efb6b56192a98ab85f.posidx"]
Empty file. (3 files)
Binary file not shown. (56 files)
Binary file added data/database/v1_2/events.db
Binary file not shown.
99 changes: 99 additions & 0 deletions data/database/v1_2/meta.json
@@ -0,0 +1,99 @@
{
"segments": [
{
"segment_id": "bf6498b5-a1e9-48d0-9013-1605048ecbcd",
"max_doc": 356,
"deletes": null
},
{
"segment_id": "5249de4d-a76b-4040-98f0-26f248155bf1",
"max_doc": 288,
"deletes": null
},
{
"segment_id": "18dcd68b-3739-40f6-8446-31d851e6139c",
"max_doc": 284,
"deletes": null
},
{
"segment_id": "9883d1a6-5e92-4749-83c3-022fb99ea0da",
"max_doc": 67,
"deletes": null
},
{
"segment_id": "3b59e3c0-f086-46ef-b6b5-6192a98ab85f",
"max_doc": 1,
"deletes": null
},
{
"segment_id": "88b4ad3f-5a1e-430f-9ea0-3e2ef43c39f7",
"max_doc": 1,
"deletes": null
},
{
"segment_id": "c7f9f53e-463f-4d92-97d2-57449d63b8d0",
"max_doc": 1,
"deletes": null
},
{
"segment_id": "d013a4c1-859e-4030-93ef-8e37d7e88f32",
"max_doc": 1,
"deletes": null
}
],
"schema": [
{
"name": "body",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "default"
},
"stored": false
}
},
{
"name": "topic",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "default"
},
"stored": false
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "default"
},
"stored": false
}
},
{
"name": "room_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": false
}
},
{
"name": "event_id",
"type": "text",
"options": {
"indexing": null,
"stored": true
}
}
],
"opstamp": 1007
}
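
The segment list above can be cross-checked with a few lines of JavaScript. This is only a sketch: the `segments` array is copied by hand from this `meta.json` (segment IDs omitted), `totalDocs` is a hypothetical helper rather than anything Seshat provides, and it assumes `max_doc` reflects the number of documents in a segment when `deletes` is null.

```javascript
// Segment sizes copied by hand from data/database/v1_2/meta.json
// (segment IDs omitted for brevity).
const segments = [
  { max_doc: 356, deletes: null },
  { max_doc: 288, deletes: null },
  { max_doc: 284, deletes: null },
  { max_doc: 67, deletes: null },
  { max_doc: 1, deletes: null },
  { max_doc: 1, deletes: null },
  { max_doc: 1, deletes: null },
  { max_doc: 1, deletes: null },
];

// Hypothetical helper: sum max_doc across all segments.
function totalDocs(segs) {
  return segs.reduce((sum, seg) => sum + seg.max_doc, 0);
}

console.log(totalDocs(segments)); // 999
```

The total, 999, equals the `max_doc` of the single segment listed in the v2 fixture's `meta.json`.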
1 change: 1 addition & 0 deletions data/database/v2/.managed.json
@@ -0,0 +1 @@
["b653775a48c44546afa249d5f9dc4848.fast","b653775a48c44546afa249d5f9dc4848.pos","meta.json","b653775a48c44546afa249d5f9dc4848.idx","b653775a48c44546afa249d5f9dc4848.posidx","b653775a48c44546afa249d5f9dc4848.store","b653775a48c44546afa249d5f9dc4848.fieldnorm","b653775a48c44546afa249d5f9dc4848.term"]
Empty file. (2 files)
Binary file not shown. (7 files)
Binary file added data/database/v2/events.db
Binary file not shown.
64 changes: 64 additions & 0 deletions data/database/v2/meta.json
@@ -0,0 +1,64 @@
{
"segments": [
{
"segment_id": "b653775a-48c4-4546-afa2-49d5f9dc4848",
"max_doc": 999,
"deletes": null
}
],
"schema": [
{
"name": "body",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "default"
},
"stored": false
}
},
{
"name": "topic",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "default"
},
"stored": false
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "default"
},
"stored": false
}
},
{
"name": "room_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": false
}
},
{
"name": "event_id",
"type": "text",
"options": {
"indexing": null,
"stored": true
}
}
],
"opstamp": 1000
}
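
One commit in this pull request is titled "index: Index the server timestamp and sender", and neither fixture schema above lists such fields. The sketch below shows one way a schema could be compared against a list of required fields. It is an illustration under assumptions: the names `sender` and `server_timestamp` are hypothetical (the new schema is not part of the diff shown here), `missingFields` is a made-up helper, and how Seshat actually decides that a re-index is needed is not shown on this page and may work differently.

```javascript
// Field names copied from the "schema" array of data/database/v2/meta.json.
const schemaFields = ['body', 'topic', 'name', 'room_id', 'event_id'];

// Hypothetical names for the fields an upgraded index might require.
const requiredFields = ['sender', 'server_timestamp'];

// Hypothetical helper: return the required fields absent from the schema.
function missingFields(schema, required) {
  return required.filter((name) => !schema.includes(name));
}

const missing = missingFields(schemaFields, requiredFields);
console.log(
  missing.length > 0
    ? `missing fields: ${missing.join(', ')}`
    : 'schema is up to date',
);
```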
158 changes: 135 additions & 23 deletions seshat-node/lib/index.js
@@ -76,36 +76,88 @@ const seshat = require('../native');
* @property {number} roomCount The number of rooms the database knows about.
*/

/**
* @typedef recoveryInfo
* @type {Object}
* @property {number} totalEvents The total number of events that the database
* holds.
* @property {number} reindexedEvents The number of events that have been
* reindexed.
* @property {number} done The percentage showing the re-index progress.
*/

/**
* Seshat re-index error.<br>
*
* This error will be thrown if a Seshat database can't be opened because it
* needs to be re-indexed.
*
* The database can be opened as a recovery database with the SeshatRecovery
* class. This class provides a method to re-index the database.
*
*/
class ReindexError extends Error {
/**
* Create a new ReindexError
*/
constructor(...params) {
super(...params);

if (Error.captureStackTrace) {
Error.captureStackTrace(this, ReindexError);
}
this.name = 'ReindexError';
this.message = 'The Seshat database needs to be reindexed.';
}
}

/**
* Seshat database.<br>
*
* A Seshat database can be used to store and index Matrix events. A full-text
* search can be done on the database retrieving events that match a search
* query.
*
* @param {string} path The path where the database should be stored. If a
* database already exists in the given folder the database will be reused.
* @param {object} config Additional configuration for the database.
* @param {string} config.language The language that the database should use
* for indexing. Picking the correct indexing language may improve the search.
* @param {string} config.passphrase The passphrase that should be used to
* encrypt the database. The database is left unencrypted if no passphrase is
* set.
*
* @constructor
*
* @example
* // create a Seshat database in the given folder
* let db = new Seshat("/home/example/database_dir");
* // Add a Matrix event to the database.
* db.addEvent(textEvent, profile);
* // Commit events waiting in the queue to the database.
* await db.commit();
* // Search the database for messages containing the word 'Test'
* let results = await db.search('Test');
*/
class Seshat extends seshat.Seshat {
/**
* Open an existing or create a new Seshat database.
*
* @param {string} path The path where the database should be stored. If a
* database already exists in the given folder the database will be reused.
* @param {object} config Additional configuration for the database.
* @param {string} config.language The language that the database should
* use for indexing. Picking the correct indexing language may improve the
* search.
* @param {string} config.passphrase The passphrase that should be
* used to encrypt the database. The database is left unencrypted if no
* passphrase is set.
*
* @constructor
*
* @example
* // create a Seshat database in the given folder
* let db = new Seshat("/home/example/database_dir");
* // Add a Matrix event to the database.
* db.addEvent(textEvent, profile);
* // Commit events waiting in the queue to the database.
* await db.commit();
* // Search the database for messages containing the word 'Test'
* let results = await db.search('Test');
*/
constructor(path, config = undefined) {
config = config || {};
try {
super(path, config);
} catch (e) {
// The Rust side throws a RangeError, this is a bit silly so convert
// it to a custom error.
if (e.constructor.name === 'RangeError') {
throw new ReindexError();
} else {
throw e;
}
}
}
/**
* Add an event to the database.
*
@@ -393,4 +445,64 @@ class Seshat extends seshat.Seshat {
}
}
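
The error conversion in the `Seshat` constructor can be tried without the native module. The following is a minimal sketch, assuming a stand-in base class: `NativeSeshat` and its `needsReindex` config flag are hypothetical and exist only to trigger the `RangeError`, while the `try`/`catch` around `super()` and the `e.constructor.name` check mirror the diff.

```javascript
// Hypothetical stand-in for the native binding. It throws a RangeError
// when the (made-up) needsReindex flag is set.
class NativeSeshat {
  constructor(path, config) {
    if (config.needsReindex) {
      throw new RangeError('reindex needed');
    }
    this.path = path;
  }
}

// Trimmed copy of the ReindexError class from the diff.
class ReindexError extends Error {
  constructor(...params) {
    super(...params);
    this.name = 'ReindexError';
    this.message = 'The Seshat database needs to be reindexed.';
  }
}

class Seshat extends NativeSeshat {
  constructor(path, config = undefined) {
    config = config || {};
    try {
      super(path, config);
    } catch (e) {
      // Convert the RangeError into the custom error, rethrow the rest.
      if (e.constructor.name === 'RangeError') {
        throw new ReindexError();
      } else {
        throw e;
      }
    }
  }
}

// Small helper for the demo: report how opening the database went.
function tryOpen(config) {
  try {
    new Seshat('/tmp/example_db', config);
    return 'opened';
  } catch (e) {
    return e.name;
  }
}

console.log(tryOpen({})); // opened
console.log(tryOpen({ needsReindex: true })); // ReindexError
```

Callers can then branch on `e instanceof ReindexError` or on `e.name` instead of inspecting a generic `RangeError`.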

module.exports = Seshat;
/**
* Seshat recovery database.<br>
*
* A Seshat recovery database can be used to re-index a Seshat database.
*
* This is needed when the library has been upgraded and the upgrade changed
* the schema of the index.
*
* The recovery database takes the same constructor parameters as the normal
* Seshat database.
*
* @param {string} path The path where the database should be stored. If a
* database already exists in the given folder the database will be reused.
* @param {object} config Additional configuration for the database.
* @param {string} config.language The language that the database should use
* for indexing. Picking the correct indexing language may improve the search.
* @param {string} config.passphrase The passphrase that should be used to
* encrypt the database. The database is left unencrypted if no passphrase is
* set.
*
* @constructor
*
* @example
* // open a Seshat recovery database in the given folder
* let recovery = new SeshatRecovery("/home/example/database_dir");
* // reindex the database
* await recovery.reindex();
*/
class SeshatRecovery extends seshat.SeshatRecovery {
Contributor

I might have feedback after seeing how you plan to handle the recovery path in Riot, but at an abstract level it seems okay.

Collaborator Author

This commit implements support for it in Riot: element-hq/element-web@921d734.

I didn't notice any delays at startup. It seems reasonably quick, but note that this was tested with a database of 3k events.
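
For readers following this thread, the recovery path on the caller's side might look roughly like the sketch below. It is not the Riot implementation: `FakeSeshat`, `FakeSeshatRecovery`, the `needsReindex` set, and `openOrRecover` are hypothetical stand-ins so the snippet runs without the native module. Only the `ReindexError` name, the `reindex()` method, and the constructor shapes are taken from this diff.

```javascript
class ReindexError extends Error {
  constructor() {
    super('The Seshat database needs to be reindexed.');
    this.name = 'ReindexError';
  }
}

// Hypothetical bookkeeping: paths of databases that still need a re-index.
const needsReindex = new Set(['/tmp/old_db']);

// Hypothetical stand-in for Seshat: refuses to open a stale database.
class FakeSeshat {
  constructor(path) {
    if (needsReindex.has(path)) {
      throw new ReindexError();
    }
    this.path = path;
  }
}

// Hypothetical stand-in for SeshatRecovery: reindex() clears the stale mark.
class FakeSeshatRecovery {
  constructor(path) {
    this.path = path;
  }

  async reindex() {
    needsReindex.delete(this.path);
  }
}

// Try a normal open first; on ReindexError, re-index and open again.
async function openOrRecover(path) {
  try {
    return new FakeSeshat(path);
  } catch (e) {
    if (!(e instanceof ReindexError)) {
      throw e;
    }
    const recovery = new FakeSeshatRecovery(path);
    await recovery.reindex();
    return new FakeSeshat(path);
  }
}

openOrRecover('/tmp/old_db').then((db) => console.log('opened', db.path));
```

Whether the real recovery database needs to be shut down before the normal database is reopened is not covered by this sketch.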

/**
* Get info about the re-index status.
*
* @return {recoveryInfo} An object that holds the number of total events,
* re-indexed events and the done percentage.
*/
info() {
return super.info();
}

/**
* Re-index the database.
*
* @return {Promise} A promise that will resolve once the database has
* been re-indexed.
*/
async reindex() {
return new Promise((resolve, reject) => {
super.reindex((err, res) => {
if (err) reject(err);
else resolve(res);
});
});
}
}

module.exports = {
Seshat: Seshat,
SeshatRecovery: SeshatRecovery,
ReindexError: ReindexError,
};
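
The `reindex()` method above wraps a callback-style native call in a `Promise`. The same pattern is shown below in isolation as a sketch: `nativeReindex` and its `shouldFail` parameter are hypothetical stand-ins for the native method, which is assumed to report through a Node-style `(err, res)` callback as in the diff.

```javascript
// Hypothetical stand-in for the native method. It reports its outcome
// asynchronously through a Node-style callback: callback(err, result).
function nativeReindex(shouldFail, callback) {
  setImmediate(() => {
    if (shouldFail) {
      callback(new Error('reindex failed'));
    } else {
      callback(null, 'done');
    }
  });
}

// Wrap the callback API in a Promise, as SeshatRecovery.reindex() does.
function reindex(shouldFail = false) {
  return new Promise((resolve, reject) => {
    nativeReindex(shouldFail, (err, res) => {
      if (err) reject(err);
      else resolve(res);
    });
  });
}

reindex().then((res) => console.log('resolved with', res));
reindex(true).catch((err) => console.log('rejected with', err.message));
```

Node's `util.promisify` performs the same wrapping for functions that take the callback as their last argument.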