diff --git a/docs/specifications/fs-datastore.png b/docs/specifications/fs-datastore.png new file mode 100644 index 00000000000..0a5eaaa87dd Binary files /dev/null and b/docs/specifications/fs-datastore.png differ diff --git a/docs/specifications/ipfs-repo-contents.png b/docs/specifications/ipfs-repo-contents.png new file mode 100644 index 00000000000..cfda9e5d9e3 Binary files /dev/null and b/docs/specifications/ipfs-repo-contents.png differ diff --git a/docs/specifications/keystore.md b/docs/specifications/keystore.md new file mode 100644 index 00000000000..7e588ca9853 --- /dev/null +++ b/docs/specifications/keystore.md @@ -0,0 +1,295 @@ +# ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) Keystore + +**Author(s):** +- [whyrusleeping](github.com/whyrusleeping) +- [Hector Sanjuan](github.com/hsanjuan) + +**Abstract** + +This spec provides definitions and operations for the keystore feature in IPFS. + +# Table of Contents + +- [Goals](#goals) +- [Planned Implementation](#planned-implementation) + - [Key storage](#key-storage) + - [Interface](#interface) + - [Code changes and additions](#code-changes-and-additions) + - [Structures](#structures) + +## Goals + +To have a secure, simple and user-friendly way of storing and managing keys +for use by ipfs, as well as the ability to share these keys and to encrypt, +decrypt, sign and verify data. + +## Planned Implementation + +### Key storage + +Storage layout and format are defined in the [`repository_fs`](repository_fs.md) part of the spec. + +### Interface + +#### ipfs key + +``` +USAGE + ipfs key - Create and list IPNS name keypairs + + ipfs key + + 'ipfs key gen' generates a new keypair for usage with IPNS and 'ipfs name + publish'. + + > ipfs key gen --type=rsa --size=2048 mykey + > ipfs name publish --key=mykey QmSomeHash + + 'ipfs key list' lists the available keys. 
+ + > ipfs key list + self + mykey + + +SUBCOMMANDS + ipfs key export - Export a keypair + ipfs key gen - Create a new keypair + ipfs key import - Import a key and prints imported key id + ipfs key list - List all local keypairs. + ipfs key rename - Rename a keypair. + ipfs key rm ... - Remove a keypair. + ipfs key rotate - Rotates the IPFS identity. + + For more information about each command, use: + 'ipfs key --help' +``` + +#### ipfs crypt + +**NOTE:** as of 2023 Q4, `ipfs crypt` commands are not implemented yet. + +``` + ipfs crypt - Perform cryptographic operations using ipfs keypairs + +SUBCOMMANDS: + + ipfs crypt sign - Generates a signature for the given data with a specified key + ipfs crypt verify - Verify that the given data and signature match + ipfs crypt encrypt - Encrypt the given data + ipfs crypt decrypt - Decrypt the given data + +DESCRIPTION: + + `ipfs crypt` is a command used to perform various cryptographic operations + using ipfs keypairs, including: signing, verifying, encrypting and decrypting. +``` + +#### Some subcommands: + +##### ipfs key Gen + + +``` +USAGE + ipfs key gen - Create a new keypair + +SYNOPSIS + ipfs key gen [--type= | -t] [--size= | -s] + [--ipns-base=] [--] + +ARGUMENTS + + - name of key to create + +OPTIONS + + -t, --type string - type of the key to create: rsa, ed25519. Default: + ed25519. + -s, --size int - size of the key to generate. + --ipns-base string - Encoding used for keys: Can either be a multibase + encoded CID or a base58btc encoded multihash. Takes + {b58mh|base36|k|base32|b...}. Default: base36. +``` + +* * * + +##### Key Send + +``` +USAGE + ipfs key - Create and list IPNS name keypairs + +SYNOPSIS + ipfs key + +DESCRIPTION + + 'ipfs key gen' generates a new keypair for usage with IPNS and 'ipfs name + publish'. + + > ipfs key gen --type=rsa --size=2048 mykey + > ipfs name publish --key=mykey QmSomeHash + + 'ipfs key list' lists the available keys. 
+ + > ipfs key list + self + mykey + + +SUBCOMMANDS + ipfs key export - Export a keypair + ipfs key gen - Create a new keypair + ipfs key import - Import a key and prints imported key id + ipfs key list - List all local keypairs. + ipfs key rename - Rename a keypair. + ipfs key rm ... - Remove a keypair. + ipfs key rotate - Rotates the IPFS identity. + + For more information about each command, use: + 'ipfs key --help' +``` + +##### Comments: + +Ensure that the user knows the implications of sending a key. + +* * * + +##### Crypt Encrypt + +``` + ipfs crypt encrypt - Encrypt the given data with a specified key + +ARGUMENTS: + + data - The filename of the data to be encrypted ("-" for stdin) + +OPTIONS: + + -k, -key string - The name of the key to use for encryption (default: localkey) + -o, -output string - The name of the output file (default: stdout) + -c, -cipher string - The cipher to use for the operation + -m, -mode string - The block cipher mode to use for the operation + +DESCRIPTION: + + 'ipfs crypt encrypt' is a command used to encrypt data so that only holders of a certain + key can read it. +``` + +##### Comments: + +This should probably just operate on raw data and not on DAGs. 
+ +* * * + +##### Other Interface Changes + +We will also need to make additions to support keys in other commands. These changes are as follows: + +- `ipfs add` + - Support for a `-encrypt-key` option, for block encrypting the file being added with the key + - also adds an 'encrypted' node above the root unixfs node + - Support for a `-sign-key` option to attach a signature node above the root unixfs node + +- `ipfs block put` + - Support for a `-encrypt-key` option, for encrypting the block before hashing and storing + +- `ipfs object put` + - Support for a `-encrypt-key` option, for encrypting the object before hashing and storing + +- `ipfs name publish` + - Support for a `-key` option to select which keyspace to publish to + +### Code changes and additions + +This section outlines code organization around this feature. + +#### Keystore package + +The fsrepo carries a `keystore` that can be used to load/store keys. The keystore is implemented following this interface: + +```go +// Keystore provides a key management interface +type Keystore interface { + // Has returns whether or not a key exists in the Keystore + Has(string) (bool, error) + // Put stores a key in the Keystore; if a key with the same name already exists, it returns ErrKeyExists + Put(string, ci.PrivKey) error + // Get retrieves a key from the Keystore if it exists, and returns ErrNoSuchKey + // otherwise. + Get(string) (ci.PrivKey, error) + // Delete removes a key from the Keystore + Delete(string) error + // List returns a list of key identifiers + List() ([]string, error) +} +``` + +Note: Never store passwords as strings: strings cannot be zeroed out after they are used. +Using a byte array allows you to write zeroes over the memory so that the user's password +does not linger in memory. 
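The note about passwords can be illustrated with a minimal Go sketch. The `zero` helper and the sample passphrase are hypothetical and not part of the keystore package. Wiping is best effort only: it clears this one buffer, and any copies made elsewhere (for example by converting to a `string`) are not affected.

```go
package main

import "fmt"

// zero overwrites every byte of b so the secret does not linger in this buffer.
// Hypothetical helper: it is not part of the keystore package.
func zero(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	// Keep the passphrase in a []byte: a Go string is immutable and cannot be wiped.
	passphrase := []byte("example passphrase")

	// ... use passphrase to unlock a key here ...

	// Wipe the buffer as soon as the secret is no longer needed.
	zero(passphrase)
	fmt.Println("first byte after wipe:", passphrase[0])
}
```

In longer functions, `defer zero(passphrase)` right after the secret is read keeps the wipe on every return path.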
+ +#### Unixfs + +- new node types, 'encrypted' and 'signed', probably shouldn't be in unixfs, just understood by it +- if new node types are not unixfs nodes, special consideration must be given to the interop + +- DagReader needs to be able to access keystore to seamlessly stream encrypted data we have keys for + - also needs to be able to verify signatures + +#### Importer + +- DagBuilderHelper needs to be able to encrypt blocks + - Dag Nodes should be generated like normal, then encrypted, and their parents should + link to the hash of the encrypted node +- DagBuilderParams should have extra parameters to accommodate creating a DBH that encrypts the blocks + +#### New 'Encrypt' package + +Should contain code for crypto operations on dags. + +Encryption of dags should work by first generating a symmetric key, and using +that key to encrypt all the data. That key should then be encrypted with the +public key chosen and stored in the Encrypted DAG structure. + +Note: One option is to simply add it to the key interface. 
+ +### Structures +Some tentative mockups (in json) of the new DAG structures for signing and encrypting + +Signed DAG: +``` +{ + "Links" : [ + { + "Name":"@content", + "Hash":"QmTheContent", + } + ], + "Data": protobuf{ + "Type":"Signed DAG", + "Signature": "thesignature", + "PubKeyID": "QmPubKeyHash", + } +} +``` + +Encrypted DAG: +``` +{ + "Links" : [ + { + "Name":"@content", + "Hash":"QmRawEncryptedDag", + } + ], + "Data": protobuf{ + "Type":"Encrypted DAG", + "PubKeyID": "QmPubKeyHash", + "Key": "ephemeral symmetric key, encrypted with public key", + } +} +``` diff --git a/docs/specifications/repository.md b/docs/specifications/repository.md new file mode 100644 index 00000000000..0e7d663d59e --- /dev/null +++ b/docs/specifications/repository.md @@ -0,0 +1,131 @@ +# ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) IPFS Repo Spec + +**Author(s)**: +- [Juan Benet](github.com/jbenet) + +**Abstract** + +This spec defines an IPFS Repo, its contents, and its interface. It does not specify how the repo data is actually stored, as that is done via swappable implementations. + +# Table of Contents + +- [Definition](#definition) +- [Repo Contents](#repo-contents) + - [version](#version) + - [datastore](#datastore) + - [keystore](#keystore) + - [config (state)](#config-state) + - [locks](#locks) + - [datastore\_spec](#datastore_spec) + - [hooks (TODO)](#hooks-todo) +- [Notes](#notes) + +## Definition + +A `repo` is the storage repository of an IPFS node. It is the subsystem that +actually stores the data IPFS nodes use. All IPFS objects are stored +in a repo (similar to git). + +There are many possible repo implementations, depending on the storage media +used. Most commonly, IPFS nodes use an [fs-repo](repository_fs.md). 
+ +Repo Implementations: +- [fs-repo](repository_fs.md) - stored in the os filesystem +- mem-repo - stored in process memory +- s3-repo - stored in amazon s3 + +## Repo Contents + +The Repo stores a collection of [IPLD](https://github.com/ipld/specs#readme) objects that represent: + +- **config** - node configuration and settings +- **datastore** - content stored locally, and indexing data +- **keystore** - cryptographic keys, including node's identity +- **hooks** - scripts to run at predefined times (not yet implemented) + +Note that the IPLD objects a repo stores are divided into: +- **state** (system, control plane) used for the node's internal state +- **content** (userland, data plane) which represent the user's cached and pinned data. + +Additionally, the repo state must determine the following. These need not be IPLD objects, though it is of course encouraged: + +- **version** - the repo version, required for safe migrations +- **locks** - process semaphores for correct concurrent access +- **datastore_spec** - array of mounting points and their properties + +Finally, the repo also stores the blocks with blobs containing binary data. + +![](./ipfs-repo-contents.png) + +### version + +Repo implementations may change over time, thus they MUST include a `version` recognizable across versions. Meaning that a tool MUST be able to read the `version` of a given repo type. + +For example, the `fs-repo` simply includes a `version` file with the version number. This way, the repo contents can evolve over time but the version remains readable the same way across versions. + +### datastore + +IPFS nodes store some IPLD objects locally. These are either (a) **state objects** required for local operation -- such as the `config` and `keys` -- or (b) **content objects** used to represent data locally available. **Content objects** are either _pinned_ (stored until they are unpinned) or _cached_ (stored until the next repo garbage collection). 
+ +The name "datastore" comes from [go-datastore](https://github.com/jbenet/go-datastore), a library for swappable key-value stores. Like its namesake, some repo implementations feature swappable datastores, for example: +- an fs-repo with a leveldb datastore +- an fs-repo with a boltdb datastore +- an fs-repo with a union fs and leveldb datastore +- an fs-repo with an s3 datastore +- an s3-repo with a cached fs and s3 datastore + +This makes it easy to change properties or performance characteristics of a repo without an entirely new implementation. + +### keystore + +A Repo typically holds the keys a node has access to, for signing and for encryption. + +Details on operation and storage of the keystore can be found in [`repository_fs.md`](repository_fs.md) and [`keystore.md`](keystore.md). + +### config (state) + +The node's `config` (configuration) is a tree of variables, used to configure various aspects of operation. For example: +- the set of bootstrap peers IPFS uses to connect to the network +- the Swarm, API, and Gateway network listen addresses +- the Datastore configuration regarding the construction and operation of the on-disk storage system. + +The following properties are mandatory for repo usage: `Addresses`, `Discovery`, `Bootstrap`, `Identity`, `Datastore` and `Keychain`. + +It is recommended that `config` files avoid identifying information, so that they may be re-shared across multiple nodes. + +**CHANGES**: today, implementations like js-ipfs and go-ipfs store the peer-id and private key directly in the config. These will be moved out of the config. + +### locks + +IPFS implementations may use multiple processes, or may disallow multiple processes from using the same repo simultaneously. Others may disallow using the same repo but may allow sharing _datastores_ simultaneously. This synchronization is accomplished via _locks_. 
+ +All repos contain the following standard locks: +- `repo.lock` - prevents concurrent access to the repo. Must be held to _read_ or _write_. + +### datastore_spec + +This file is created according to the Datastore configuration specified in the `config` file. It contains an array with all the mounting points that the repo is using, as well as their properties. The `datastore_spec` file must therefore have the same mounting points as those defined in the Datastore configuration. + +Note that the `Datastore` in config must have a `Spec` property, which defines the structure of the ipfs datastore. It is a composable structure, where each datastore is represented by a JSON object. + +### hooks (TODO) + +Like git, IPFS nodes will allow `hooks`, a set of user-configurable scripts to run at predefined moments in IPFS operations. This makes it easy to customize the behavior of IPFS nodes without changing the implementations themselves. + +## Notes + +#### A Repo uniquely identifies an IPFS Node + +A repository uniquely identifies a node. Running two different IPFS programs with identical repositories -- and thus identical identities -- WILL cause problems. + +Datastores MAY be shared -- with proper synchronization -- though note that sharing datastore access MAY erode privacy. + +#### Repo implementation changes MUST include migrations + +**DO NOT BREAK USERS' DATA.** This is critical. Thus, any changes to a repo's implementation **MUST** be accompanied by a **SAFE** migration tool. + +See https://github.com/jbenet/go-ipfs/issues/537 and https://github.com/jbenet/random-ideas/issues/33 + +#### Repo Versioning + +A repo version is a single incrementing integer. All versions are considered non-compatible. Repos of different versions MUST be run through the appropriate migration tools before use. 
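The versioning rule above (every version mismatch is incompatible and requires a migration) can be sketched as a small check in Go. The function names are hypothetical. Accepting both a bare integer and the `fs-repo/N` form is an assumption based on the two formats the fs-repo spec in this change shows for its `version` file.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion reads the content of a version file: either a bare integer ("2")
// or the prefixed form ("fs-repo/3"). Hypothetical helper for illustration.
func parseVersion(content string) (int, error) {
	s := strings.TrimPrefix(strings.TrimSpace(content), "fs-repo/")
	n, err := strconv.Atoi(s)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid repo version %q", content)
	}
	return n, nil
}

// checkVersion treats every mismatch as incompatible, as the spec requires.
func checkVersion(found, supported int) error {
	if found != supported {
		return fmt.Errorf("repo version %d does not match supported version %d: run a migration first", found, supported)
	}
	return nil
}

func main() {
	v, err := parseVersion("fs-repo/3\n")
	if err != nil {
		panic(err)
	}
	fmt.Println("version:", v)
	fmt.Println("usable by a v3 tool:", checkVersion(v, 3) == nil)
	fmt.Println("usable by a v4 tool:", checkVersion(v, 4) == nil)
}
```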
diff --git a/docs/specifications/repository_fs.md b/docs/specifications/repository_fs.md new file mode 100644 index 00000000000..01e30d1c393 --- /dev/null +++ b/docs/specifications/repository_fs.md @@ -0,0 +1,279 @@ +# ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) fs-repo + +**Author(s)**: +- [Juan Benet](github.com/jbenet) +- [David Dias](github.com/daviddias) +- [Hector Sanjuan](github.com/hsanjuan) + +**Abstract** + +This spec defines `fs-repo` version `1`, its formats, and semantics. + +# Table of Contents + +- [Definition](#definition) +- [Contents](#contents) + - [api](#api) + - [blocks/](#blocks) + - [config](#config) + - [hooks/](#hooks) + - [keystore/](#keystore) + - [datastore/](#datastore) + - [logs/](#logs) + - [repo.lock](#repolock) + - [version](#version) +- [Datastore](#datastore-1) +- [Notes](#notes) + - [Location](#location) + - [blocks/ with an fs-datastore](#blocks-with-an-fs-datastore) + - [Reading without the `repo.lock`](#reading-without-the-repolock) + +## Definition + +`fs-repo` is a filesystem implementation of the IPFS [repo](repository.md). + + +## Contents + +![](./ipfs-repo-contents.png) + +``` +.ipfs/ +├── api <--- running daemon api addr +├── blocks/ <--- objects stored directly on disk +│ └── aa <--- prefix namespacing like git +│ └── aa <--- N tiers +├── config <--- config file (json or toml) +├── hooks/ <--- hook scripts +├── keystore/ <--- cryptographic keys +│ ├── key_b32name <--- private key with base32-encoded name +├── datastore/ <--- datastore +├── logs/ <--- 1 or more files (log rotate) +│ └── events.log <--- can be tailed +├── repo.lock <--- mutex for repo +└── version <--- version file +``` + +### api + +`./api` is a file that exists to denote an API endpoint to listen to. +- It MAY exist even if the endpoint is no longer live (i.e. it is a _stale_ or left-over `./api` file). + +In the presence of an `./api` file, ipfs tools (e.g. 
go-ipfs `ipfs daemon`) MUST attempt to delegate to the endpoint, and MAY remove the file if reasonably certain the file is stale (e.g. the endpoint is local, but no process is live). + +The `./api` file is used in conjunction with the `repo.lock`. Clients may opt to use the api service, or wait until the process holding `repo.lock` exits. The file's content is the api endpoint as a [multiaddr](https://github.com/jbenet/multiaddr). + +``` +> cat .ipfs/api +/ip4/127.0.0.1/tcp/5001 +``` + +Notes: +- The API server must remove the api file before releasing the `repo.lock`. +- It is not enough to use the `config` file, as the API addr of a daemon may + have been overridden via ENV or flag. + +#### api file for remote control + +One use case of the `api` file is to have a repo directory like: + +``` +> tree $IPFS_PATH +/Users/jbenet/.ipfs +└── api + +0 directories, 1 files + +> cat $IPFS_PATH/api +/ip4/1.2.3.4/tcp/5001 +``` + +In go-ipfs, this has the same effect as: + +``` +ipfs --api /ip4/1.2.3.4/tcp/5001 +``` + +That is, it makes ipfs tools use an ipfs node at the given endpoint, instead of the local directory as a repo. + +In this use case, the rest of the `$IPFS_PATH` may be completely empty, and no other information is necessary. Strictly speaking, such a directory is not a _repo_. (TODO: come up with a good name for this). + +### blocks/ + +The `blocks/` component contains the raw data representing all IPFS objects +stored locally, whether pinned or cached. This component is controlled by the +`datastore`. For example, it may be stored within a leveldb instance in +`datastore/`, or it may be stored entirely with independent files, like git. + +In the default case, the user uses fs-datastore for all `/blocks` so the +objects are stored in individual files. 
In other cases, `/blocks` may even be +stored remotely. + +- [blocks/ with an fs-datastore](#blocks-with-an-fs-datastore) + +### config + +The `config` file is a JSON or TOML file that contains the tree of +configuration variables. It MUST only be changed while holding the +`repo.lock`; otherwise edits may be lost. + +### hooks/ + +The `hooks` directory contains executable scripts to be called on specific +events to alter ipfs node behavior. + +Currently available hooks: + +``` +none +``` + +### keystore/ + + +The `keystore` directory holds additional private keys that the node has +access to (the public keys can be derived from them). + +The keystore directory should have `0700` permissions (readable, writable by +the owner only). + +The key files are named as `key_base32encodedNameNoPadding` where `key_` is a +fixed prefix followed by a base32 encoded identifier, **without padding and +downcased**. The identifier usually corresponds to a human-friendly name given +by the user. + +The key files should have `0400` permissions (read-only, by the owner only). + +The `self` key identifier is reserved for the peer's main key, and therefore a key named +`key_onswyzq` is allowed in this folder. + +The key files themselves contain a serialized representation of the keys as +defined in the +[libp2p specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#keys). + +### datastore/ + +The `datastore` directory contains the data for a leveldb instance used to +store operation data for the IPFS node. If the user uses a `boltdb` datastore +instead, the directory will be named `boltdb`. Thus the data files of each +database will not clash. + +TODO: consider whether all should just be named `leveldb/` + +### logs/ + +IPFS implementations put event log files inside the `logs/` directory. The +latest log file is `logs/events`. Older, rotated-out files may also exist, named with the +timestamp of their creation. 
For example: + + + +### repo.lock + +`repo.lock` prevents concurrent access to the repo. Its content SHOULD BE the +PID of the process currently holding the lock. This allows clients to detect +a failed lock and clean up. + +``` +> cat .ipfs/repo.lock +42 +> ps | grep "ipfs daemon" +42 ttys000 79:05.83 ipfs daemon +``` + +**TODO, ADDRESS DISCREPANCY:** the go-ipfs implementation does not currently store the PID in the file, which on some systems causes failures after a crash or a teardown. This SHOULD NOT require any manual intervention -- a present lock should give new processes enough information to recover. Doing this correctly in a portable, safe way, with good UX is very tricky. We must be careful with TOCTTOU bugs, and multiple concurrent processes capable of running at any moment. The goal is for all processes to operate safely, to avoid bothering the user, and for the repo to always remain in a correct, consistent state. + +### version + +The `version` file contains the repo implementation name and version. This format has changed over time: + +``` +# in version 0 +> cat $repo-at-version-0/version +cat: /Users/jbenet/.ipfs/version: No such file or directory + +# in versions 1 and 2 +> cat $repo-at-version-1/version +1 +> cat $repo-at-version-2/version +2 + +# in versions >=3 +> cat $repo-at-version-3/version +fs-repo/3 +``` + +_Any_ fs-repo implementation of _any_ version `>0` MUST be able to read the +`version` file. It MUST NOT change format between versions. The sole exception is version 0, which had no file. + +**TODO: ADDRESS DISCREPANCY:** versions 1 and 2 of the go-ipfs implementation use just the integer number. It SHOULD have used `fs-repo/`. We could either change the spec and always just use the int, or change go-ipfs in version `>3`. We will have to be backwards compatible. + +## Datastore + +Both the `/blocks` and `/datastore` directories are controlled by the +`datastore` component of the repo. 
+ +## Notes + +### Location + +The `fs-repo` can be located anywhere on the filesystem. By default +clients should search for a repo in: + +``` +~/.ipfs +``` + +Users can tell IPFS programs to look elsewhere with the env var: + +``` +IPFS_PATH=/path/to/repo +``` + +### blocks/ with an fs-datastore + +![](fs-datastore.png) + +Each object is stored in its own file. The filename is the hash of the object. +The files are nested in directories whose names are prefixes of the hash, as +in `.git/objects`. + +For example: +```sh +# multihashes +1220fe389b55ea958590769f9046b0f7268bca90a92e4a9f45cbb30930f4bf89269d # sha2 +1114f623e0ec7f8719fb14a18838d2a3ef4e550b5e53 # sha1 + +# locations of the blocks +.ipfs/blocks/1114/f6/23/e0ec7f8719fb14a18838d2a3ef4e550b5e53 +.ipfs/blocks/1220/fe/38/9b55ea958590769f9046b0f7268bca90a92e4a9f45cbb30930f4bf89269d +``` + +**Important Notes:** +- the hashes are encoded in hex, not the usual base58, because some + filesystems are case insensitive. +- the multihash prefix is two bytes, which would waste two directory levels, + thus these are combined into one. +- the git `idx` and `pack` file formats could be used to coalesce objects + +**TODO: ADDRESS DISCREPANCY:** + +the go-ipfs fs-repo in version 2 uses a different `blocks/` dir layout: + +``` +/Users/jbenet/.ipfs/blocks +├── 12200007 +│ └── 12200007d4e3a319cd8c7c9979280e150fc5dbaae1ce54e790f84ae5fd3c3c1a0475.data +├── 1220000f +│ └── 1220000fadd95a98f3a47c1ba54a26c77e15c1a175a975d88cf198cc505a06295b12.data +``` + +We MUST address whether we should change the fs-repo spec to match go-ipfs in version 2, or we should change go-ipfs to match the fs-repo spec (more tiers). We MUST also address whether the levels are a repo version parameter or a config parameter. There are filesystems in which a different fanout will have wildly different performance. These are mostly networked and legacy filesystems. 
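The two `blocks/` layouts discussed in this section can be compared with a short Go sketch. Both functions are hypothetical helpers written for illustration: the split points are inferred from the example paths and the go-ipfs version 2 listing above, and the number of tiers is exactly the open question the TODO raises.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"path"
)

// validate checks that h is a hex-encoded multihash long enough to be split.
func validate(h string) error {
	if _, err := hex.DecodeString(h); err != nil {
		return fmt.Errorf("not a hex multihash: %w", err)
	}
	if len(h) <= 8 {
		return fmt.Errorf("multihash too short: %q", h)
	}
	return nil
}

// specBlockPath follows the spec example: the 2-byte multihash prefix forms
// one directory, followed by two 1-byte tiers, then the rest as the filename.
func specBlockPath(repo, h string) (string, error) {
	if err := validate(h); err != nil {
		return "", err
	}
	return path.Join(repo, "blocks", h[:4], h[4:6], h[6:8], h[8:]), nil
}

// v2BlockPath follows the go-ipfs version 2 listing: one directory named after
// the first 8 hex characters, holding "<full hash>.data".
func v2BlockPath(repo, h string) (string, error) {
	if err := validate(h); err != nil {
		return "", err
	}
	return path.Join(repo, "blocks", h[:8], h+".data"), nil
}

func main() {
	const sha1Hash = "1114f623e0ec7f8719fb14a18838d2a3ef4e550b5e53" // sha1 example from the spec
	for _, layout := range []func(string, string) (string, error){specBlockPath, v2BlockPath} {
		p, err := layout(".ipfs", sha1Hash)
		if err != nil {
			panic(err)
		}
		fmt.Println(p)
	}
}
```

Keeping the layout behind a single function like this would make the fanout easy to turn into a repo-version or config parameter, whichever way the discrepancy is resolved.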
+ +### Reading without the `repo.lock` + +Programs MUST hold the `repo.lock` while reading and writing most files in the +repo. The only two exceptions are: + +- `repo.lock` - so clients may check for it +- `api` - so clients may use the API