Transparent Encryption/Decryption Layer between LevelDB and Filesystem "Block Manager" #5

CMCDragonkai · 2021-10-19T07:21:22Z

Specification

Our current encryption/decryption layer sits on top of LevelDB. This causes problems for indexing #1, because when you want to index something you need to expose keys, and those keys currently have to be unencrypted.

It may also increase the performance of the DB if encryption/decryption operated at the block level rather than at the individual key-value level. It's the equivalent of using full-disk encryption with leveldb on top.

We can't rely on OS-provided full-disk encryption. So we need something that sits in between the key-value DB (like leveldb) and the actual filesystem, and that is executed in JS or C++.
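To make the layering concrete, here is a minimal sketch of a transparent layer that encrypts whole blocks on the way to storage and decrypts them on the way back. Everything in it is an assumption for illustration: `EncryptedBlockStore`, `toyTransform` and the in-memory map standing in for the filesystem are hypothetical names, and the XOR transform is only a placeholder, not real encryption.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// PLACEHOLDER transform, NOT real encryption: XORs each byte with a
// repeating key. A real layer would call an authenticated cipher here.
Bytes toyTransform(const Bytes& in, const Bytes& key) {
  Bytes out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = static_cast<std::uint8_t>(in[i] ^ key[i % key.size()]);
  }
  return out;
}

// Hypothetical "block manager": the map stands in for the filesystem and
// stores whole blocks addressed by block number.
class EncryptedBlockStore {
 public:
  explicit EncryptedBlockStore(Bytes key) : key_(std::move(key)) {
    if (key_.empty()) throw std::invalid_argument("key must not be empty");
  }

  // The DB hands over a plaintext block; only the transformed bytes are stored.
  void writeBlock(std::uint64_t blockNo, const Bytes& plain) {
    disk_[blockNo] = toyTransform(plain, key_);
  }

  // The DB gets the plaintext block back and never sees the stored form.
  Bytes readBlock(std::uint64_t blockNo) const {
    return toyTransform(disk_.at(blockNo), key_);
  }

  // What actually sits "on disk", exposed only for inspection.
  const Bytes& rawBlock(std::uint64_t blockNo) const {
    return disk_.at(blockNo);
  }

 private:
  Bytes key_;
  std::map<std::uint64_t, Bytes> disk_;
};
```

The point of the sketch is that the DB above this layer works with plaintext keys and values, so indexing is unaffected, while the stored blocks are not readable without the key.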

There is level-js, which is an abstract-leveldown compliant store that can be wrapped in levelup. It provides the leveldb-style interface in pure JS and relies on IndexedDB for storage. Currently IndexedDB doesn't exist natively on Node.js, but there are some implementations of it. This seems to give an opportunity to add a transparent encryption/decryption layer in between level-js and IndexedDB.

Additional context

Integrate Automatic Indexing into DB #1 (comment) - RocksDB at the C++ level has provision for incorporating encryption at a low level (not sure if this is still at the key-value level or at the block level of abstraction)
WIP: Integrating Automatic Indexing into DB #2 (comment) - discussion about level architecture in Node.js
level is a library that bundles leveldown, level-js, levelup and encoding-down to create a single batteries-included package
- the difference between leveldown and level-js is that leveldown uses the C++ leveldb library which only works in Node.js while level-js works on IndexedDB which exists in browsers
- if IndexedDB were to exist in Node.js, one could use level-js as an isomorphic library that works on browsers and Node.js, not sure about NativeScript though
  - using IndexedDB could allow us to put a transparent encryption/decryption layer in between IndexedDB and level-js, thus enabling us flexible indexing, and probably better security
  - there are only 2 libraries providing IndexedDB in Node.js:
    - https://github.com/bigeasy/indexeddb
    - https://github.com/dumbmatter/fakeIndexedDB and FIDB persistence dumbmatter/fakeIndexedDB#12 (comment)
      - if fakeIndexedDB becomes realIndexedDB via leveldb, then this should be possible in NS as well, but you are doing something a bit funny: levelup API -> level-js -> transparent encryption/decryption -> IndexedDB -> leveldb or whatever, but this is sort of what happens in Chrome which implements IndexedDB using leveldb
      - see https://news.ycombinator.com/item?id=10188476 for discussion about IndexedDB in node.js
Implement True Snapshot Isolation for LevelDB #4 - this may be impacted if IndexedDB is used
https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20171220_encryption_at_rest.md - CockroachDB has already done this, so we can compare against it

Tasks

- Investigate how level-js uses IndexedDB
- Attempt to implement or find a persistent IndexedDB, perhaps one backed by leveldb or sqlite. It seems like any performant implementation would have to use C++ at some point. There are also a bunch of wrapper libraries, but it's not clear which ones actually perform real persistence
- Integrate this into PK


CMCDragonkai · 2021-10-19T07:24:26Z

Alternative is to instead work on leveldb directly and manipulate the C++ to allow one to plug in encryption/decryption.

CMCDragonkai · 2022-05-10T04:20:37Z

Along with IndexedDB and RocksDB, another option is lmdb, based on the discussion here: 6107d9c#commitcomment-73269633.

The lmdb-js project already supports native encryption at the block level thus ensuring keys and values are encrypted.

CMCDragonkai · 2022-05-26T07:32:38Z

Since we are working at the C++ layer, this should mean we can finally attempt block level encryption. I wonder if we can just bind to node's openssl, since it's already there: https://github.com/nodejs/node-gyp/blob/master/docs/Linking-to-OpenSSL.md

Node's crypto and webcrypto API is likely built on top of the statically linked openssl. If we do the same, we would maintain parity with the crypto implementation, and we avoid bringing our own crypto library. Finally, if we do have to bring our own, we can do so with a native library rather than needing it to be implemented in raw JS or web assembly.

CMCDragonkai · 2022-07-14T16:28:52Z

Relevant PRs:

CMCDragonkai · 2022-10-11T01:17:02Z

When doing this it's worth considering the ability to do incremental key rotation.

This means if the key gets changed instead of re-encrypting EVERYTHING straight away, we can encrypt new values with the new key.

However the old key would have to be kept around to decrypt old values and can only be discarded once all old values are gone and have been re-encrypted.

We can do this in one of 2 ways:

1. Background incremental re-encryption
2. Pull-driven incremental re-encryption: re-encryption only occurs for values that have been read or written to.

One could build 1 on top of 2: a background system can just read every single block. In the case of 2, a reference count has to be kept around for each key.

However js-db doesn't keep around the key on disk. It is expected that one key is provided to the DB in-memory. The persistence of old keys will need to be hooked into through a ref counted system.

How do we identify blocks that are encrypted with a particular key... we may hash the key, and keep the hash around as a "key identifier". Then each block would have a key identifier.

Blocks would need to be large enough to justify keeping these key identifiers around. I imagine we may have something like 16 bit hashes or 8 bit hashes.

Perhaps a counter could also work, but one would need to again remember some aspect of the key that is being used.
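As a rough sketch of the "key identifier per block" idea, the following derives a 16-bit identifier from the key and prepends it to each encrypted block. The names `keyIdentifier`, `tagBlock` and `blockKeyId` are hypothetical, and the FNV-1a hash is only a stand-in: an identifier derived from secret key material should presumably come from a cryptographic hash, and with 16 bits two different keys can collide.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Placeholder hash (FNV-1a, folded to 16 bits). Assumption: a real
// implementation would use a cryptographic hash of the key instead.
std::uint16_t keyIdentifier(const Bytes& key) {
  std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
  for (std::uint8_t b : key) {
    h ^= b;
    h *= 16777619u;  // FNV-1a 32-bit prime
  }
  return static_cast<std::uint16_t>((h >> 16) ^ (h & 0xffffu));
}

// Prepends the 2-byte key identifier (big-endian) to an encrypted payload.
Bytes tagBlock(std::uint16_t keyId, const Bytes& ciphertext) {
  Bytes block;
  block.reserve(ciphertext.size() + 2);
  block.push_back(static_cast<std::uint8_t>(keyId >> 8));
  block.push_back(static_cast<std::uint8_t>(keyId & 0xffu));
  block.insert(block.end(), ciphertext.begin(), ciphertext.end());
  return block;
}

// Reads the key identifier back from a tagged block.
// Returns false if the block is too short to carry a header.
bool blockKeyId(const Bytes& block, std::uint16_t& keyIdOut) {
  if (block.size() < 2) return false;
  keyIdOut = static_cast<std::uint16_t>((block[0] << 8) | block[1]);
  return true;
}
```

With a 2-byte identifier the overhead on a 4096-byte block is about 0.05%, which is why the block size matters more than the exact identifier width.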

Perhaps the db can remember that there are X keys still being used. Imagine:

Key1 - 13 blocks
Key2 - 20 blocks
... etc

Then the user must provide those X keys again. If they don't, then the initial integrity/canary check will fail.
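The reference counting described above could look something like this. It is only a sketch under assumptions: `KeyLedger` and its methods are hypothetical names, key identifiers are assumed to be 16-bit values, and how the ledger itself is persisted and protected is left open.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Tracks how many blocks are still encrypted under each key identifier.
class KeyLedger {
 public:
  // A new block was written under keyId.
  void addBlock(std::uint16_t keyId) { ++counts_[keyId]; }

  // A block was re-encrypted from oldId to newId (background or pull-driven).
  void migrateBlock(std::uint16_t oldId, std::uint16_t newId) {
    if (oldId == newId) return;
    auto it = counts_.find(oldId);
    if (it == counts_.end()) return;  // unknown key: nothing to migrate
    if (--it->second == 0) counts_.erase(it);  // old key can now be discarded
    ++counts_[newId];
  }

  // True while at least one block still needs this key.
  bool inUse(std::uint16_t keyId) const { return counts_.count(keyId) != 0; }

  std::uint64_t blockCount(std::uint16_t keyId) const {
    auto it = counts_.find(keyId);
    return it == counts_.end() ? 0 : it->second;
  }

  std::size_t keysInUse() const { return counts_.size(); }

  // Open-time check: every key that still has blocks must be provided.
  bool canOpen(const std::set<std::uint16_t>& providedKeyIds) const {
    for (const auto& entry : counts_) {
      if (providedKeyIds.count(entry.first) == 0) return false;
    }
    return true;
  }

 private:
  std::map<std::uint16_t, std::uint64_t> counts_;
};
```

Using the example above, with 13 blocks under Key1 and 20 under Key2, the DB would refuse to open with only Key2 until all 13 of Key1's blocks have been migrated, at which point Key1 drops out of the ledger and can be discarded.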

CMCDragonkai · 2023-07-16T12:15:14Z

When integrating our new symmetric crypto routines from sodium native to js-db, we need to consider how to integrate 2 native shared objects (native plugins) to nodeJS together.

I asked ChatGPT about this https://chat.openai.com/share/d09826e1-ebb0-4584-9e89-d379ac7363b8.

This will also be relevant to MatrixAI/Polykey#526.

The key point is to avoid code duplication. We won't want to use the OpenSSL library inside NodeJS, because OpenSSL there is not likely to exist on other platforms, so we must supply our own crypto library which currently is the libsodium provided by sodium-native package (which we most likely need to fork into PK).

CMCDragonkai · 2023-07-16T12:37:24Z

Also see the discussion in MatrixAI/Polykey#526 (comment) for further elaboration on interactions between different shared object native libraries in the same NodeJS process.

CMCDragonkai · 2023-07-16T12:43:02Z

It seems then, that the right thing to do is to require peer dependencies, rather than direct dependencies.

That is, the DB could declare sodium-native as a peer dependency. This sort of implies that sodium-native is the host, and @matrixai/db is the plugin.

Thus requiring that the downstream project have sodium-native as a dependency as well. It's a bit inconvenient, but it would ensure that encryption is necessary.

It's a bit strange. Alternatively if @matrixai/db were to say that sodium-native is a direct dependency, then it can still work without problems as long as the downstream packages didn't bring a different incompatible version of sodium-native.

Given that @matrixai/db is already a native package, it's not really a big deal to add a dependency on another native package.

On top of this, one could argue that it's an optional dependency, because the DB doesn't actually need to have crypto switched on. Right now it's a dependency injection. However we still need to work out how exactly one would dependency inject into the RocksDB environment...? Especially since we would want to avoid having C++ code call JS then call C++, instead C++ should just call C++ directly. So I imagine this would have to be just a runtime boolean switch to turn it on/off.

And thus it would be hardcoded to the sodium-native crypto facilities. No dependency injection possible here. I think though, there is this concept of calling a common interface/header, and being able to substitute for a different library as long as it exposed the same symbols. I see some native projects saying that you can swap out their SSL for different openssl variants. So this must be possible too. Therefore this would be a libsodium based interface.

CMCDragonkai · 2023-07-16T12:54:31Z

So upon further research, I see that it's possible to "dynamically" inject the function pointer into the C/C++ code. This is a different technique to just using the same headers and then using -l shared.so when compiling, because this relies on the dlsym function to resolve to a particular function at runtime.

So imagine that in the C++ code, we wanted to have functions passed in that we would call to do the crypto operation. These would be considered C function pointers. How would we "pass" these in from JS?

Well you could do something like this:

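#include <dlfcn.h>

int main() {
    void* handle = dlopen("mylib.so", RTLD_LAZY);
    if (!handle) return 1;  // dlerror() describes the failure
    // dlsym returns void*, so cast it to the expected function pointer type
    void (*function_in_library)() = (void (*)())dlsym(handle, "function_in_library");
    if (!function_in_library) {
        dlclose(handle);
        return 1;
    }
    function_in_library();  // Indirect call through a function pointer
    dlclose(handle);
    return 0;
}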

Suppose this was called by NodeJS:

    void* handle = dlopen("mylib.so", RTLD_LAZY);

I'm not sure if it is possible to access the void* handle returned by dlopen just by doing require() in JS (or in the case of ESM, the import()), but suppose you got that somehow, you would be able to move that around like an opaque reference pointer via the NAPI interface.

Then subsequently pass that into the C++ side of @matrixai/db.

Then the @matrixai/db could do:

    void (*function_in_library)() = (void (*)())dlsym(handle, "function_in_library");
    function_in_library();  // Indirect call through a function pointer

The handle is still managed by the caller though, in case it wants to call dlclose.

I'm not sure if this is a better method. This sort of allows @matrixai/db to be agnostic to the crypto implementation, and just require someone to pass in the right C function that supports a particular simple interface.
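A minimal sketch of what that "particular simple interface" might look like on the C++ side, independent of how the pointers are obtained (dlsym or otherwise). All names here (`BlockCryptoOps`, `dbWriteBlock`, `dbReadBlock`, the toy provider functions) are hypothetical, and the toy provider is a placeholder, not real encryption. Across a real shared object boundary the function types would presumably be declared `extern "C"`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical interface: transform len bytes from in to out using the
// provider's opaque context. Returns 0 on success.
typedef int (*block_crypto_fn)(const std::uint8_t* in, std::uint8_t* out,
                               std::size_t len, void* ctx);

struct BlockCryptoOps {
  block_crypto_fn encrypt;
  block_crypto_fn decrypt;
  void* ctx;  // opaque state owned by the provider (e.g. key material)
};

// DB side: calls through whatever was injected. A null ops pointer acts as
// the runtime switch for "encryption off" and passes bytes through.
int dbWriteBlock(const BlockCryptoOps* ops, const std::uint8_t* plain,
                 std::uint8_t* stored, std::size_t len) {
  if (ops == nullptr || ops->encrypt == nullptr) {
    std::memcpy(stored, plain, len);
    return 0;
  }
  return ops->encrypt(plain, stored, len, ops->ctx);
}

int dbReadBlock(const BlockCryptoOps* ops, const std::uint8_t* stored,
                std::uint8_t* plain, std::size_t len) {
  if (ops == nullptr || ops->decrypt == nullptr) {
    std::memcpy(plain, stored, len);
    return 0;
  }
  return ops->decrypt(stored, plain, len, ops->ctx);
}

// Provider side, PLACEHOLDER ONLY: adds/subtracts a one-byte pad taken
// from ctx. A real provider would wrap a crypto library's functions.
static int toyEncrypt(const std::uint8_t* in, std::uint8_t* out,
                      std::size_t len, void* ctx) {
  const std::uint8_t pad = *static_cast<const std::uint8_t*>(ctx);
  for (std::size_t i = 0; i < len; ++i) {
    out[i] = static_cast<std::uint8_t>(in[i] + pad);
  }
  return 0;
}

static int toyDecrypt(const std::uint8_t* in, std::uint8_t* out,
                      std::size_t len, void* ctx) {
  const std::uint8_t pad = *static_cast<const std::uint8_t*>(ctx);
  for (std::size_t i = 0; i < len; ++i) {
    out[i] = static_cast<std::uint8_t>(in[i] - pad);
  }
  return 0;
}
```

The DB functions never name a crypto library, so C++ calls C++ directly through the pointers, and swapping the provider only means filling the struct with different functions.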

CMCDragonkai · 2023-07-16T12:56:52Z

There is a slight performance penalty on using the dlsym method. But it's actually quite flexible, because we defer the linking decision. Direct calls require you to use the -l option during compilation/linking to the shared object, so you have to have it available at the point which we are compiling @matrixai/db.

CMCDragonkai · 2023-07-16T13:00:47Z

Note the usage of https://nodejs.org/api/process.html#processdlopenmodule-filename-flags, primarily is about loading exported NAPI/node API functions. But if the shared object just has exposed symbols in general... that should be available to other shared objects right? This requires some experimentation and comparison to ESM async imports/static imports.

CMCDragonkai · 2023-07-16T13:01:48Z

Oh actually it turns out once you switch to ESM, you cannot use require directly (though module.createRequire can recreate it). But you can use process.dlopen.

CMCDragonkai · 2023-07-16T13:03:42Z

Test with different symbols: https://nodejs.org/api/os.html#dlopen-constants

CMCDragonkai added the development Standard development label Oct 19, 2021

CMCDragonkai mentioned this issue Oct 20, 2021

WIP: Integrating Automatic Indexing into DB #2

Closed

8 tasks

CMCDragonkai mentioned this issue Dec 1, 2021

Integrate DB into the level db interface so db levels are the same as db #8

Closed

CMCDragonkai added the design Requires design (architecture, protocol, specification and task list requires further work) label Dec 1, 2021

CMCDragonkai mentioned this issue Feb 16, 2022

Integrate Automatic Indexing into DB #1

Closed

2 tasks

CMCDragonkai mentioned this issue Mar 17, 2022

Upgrade to abstract-level interface #11

Closed

CMCDragonkai referenced this issue May 10, 2022

update level to 7.0.1

6107d9c

CMCDragonkai mentioned this issue May 10, 2022

Introduce Snapshot Isolation OCC to DBTransaction #18

Closed

15 tasks

CMCDragonkai mentioned this issue Jul 14, 2022

Research how to simplify the connection between JS and C++ code (considering both ios and android) #37

Open

CMCDragonkai added the epic Big issue with multiple subissues label Jul 15, 2022

CMCDragonkai added the r&d:polykey:core activity 2 Cross Platform Cryptography for JavaScript Platforms label Jul 24, 2022

CMCDragonkai self-assigned this Jul 10, 2023

CMCDragonkai mentioned this issue Jul 16, 2023

Unifying the TLS libraries between WS, QUIC and Fetch/HTTPS MatrixAI/Polykey#526

Open

CMCDragonkai removed the epic Big issue with multiple subissues label Aug 12, 2024

CMCDragonkai removed their assignment Sep 1, 2024
