Add support for taking snapshot of a column family and creating column family from a given CF snapshot. #3469

Open
sachja opened this issue Feb 6, 2018 · 7 comments

@sachja

sachja commented Feb 6, 2018

When building a distributed key-value store on top of RocksDB, it is useful to be able to migrate a column family from one node to another. For this, it would help to have APIs to:
a. Generate a snapshot of a column family. This could be implemented by flushing the memtable and returning handles to all the SST files of the column family, along with their metadata.
b. Create a column family from a snapshot. The caller transfers the snapshot by copying the files and the metadata, then calls this API to create a column family whose state is identical to the sender's.

Please let us know whether something like this already exists, whether an existing API could be extended, or whether anyone is planning to work on it.

@siying
Contributor

siying commented Feb 9, 2018

We don't have an exact API for that. It would be interesting to have, but we have no near-term plan to add this feature. If you are interested, feel free to contribute the code and we'll be happy to review and merge it.

@siying
Contributor

siying commented Feb 13, 2018

If you don't have further comments, I'm going to close the issue.

@siying siying self-assigned this Feb 13, 2018
@sachja
Author

sachja commented Feb 24, 2018

Hi,
We have started working on support for this. One place where we are stuck is that the column family name and ID may not be the same on the sender and the receiver. It looks like each SST file records the column family ID in its table properties, and that ID is checked during ingestion and possibly during lookup.
Our question is why the column family ID needs to be embedded in the SST file at all, since I assume the manifest file already records which SST files belong to each column family. Are there just a few places where the column family info in the SST file is checked, and could those checks be disabled with an option?

Thanks for your help.

@sachja
Author

sachja commented Feb 24, 2018

@vpallipadi
@snaeni

@vpallipadi

@siying Can you please comment on this change?
Specifically, we ignore the column family ID of the SST file during this import, and the Version sequence number is updated if the imported sequence number is higher.

The idea is to import SST files into a column family on an active DB (which may have other active column families), preserving levels and sequence numbers from the source CF.

We have been testing this change internally for a couple of weeks, and this part seems to be working fine. We still have issues on the source side, where we prepare the SST files for import: (1) with DisableFileDeletions, copying the SST files over, and EnableFileDeletions, as in #3609, and (2) a potential race between DisableFileDeletions and background compactions.

@vpallipadi

@siying Any comments on this change? vpallipadi@50b517f

Let me know if I need to copy anyone else to get initial feedback on the approach.
Thanks.

@siying
Contributor

siying commented May 8, 2018

I'll take a look.

facebook-github-bot pushed a commit that referenced this issue Jul 17, 2019
Summary:
Refresh of the earlier change here - #5135

This is a review request for code change needed for - #3469
"Add support for taking snapshot of a column family and creating column family from a given CF snapshot"

We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality.

(1) ExportColumnFamily() - This API is modelled after CreateCheckpoint() as below.
// Exports all live SST files of a specified Column Family onto export_dir,
// returning SST files information in metadata.
// - SST files will be created as hard links when the directory specified
//   is in the same partition as the db directory, copied otherwise.
// - export_dir should not already exist and will be created by this API.
// - Always triggers a flush.
virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                  const std::string& export_dir,
                                  ExportImportFilesMetaData** metadata);

Internally, the API calls DisableFileDeletions() and GetColumnFamilyMetaData(), parses the
metadata, creates links or copies of all the SST files, calls EnableFileDeletions(), and completes
the call by returning the list of file metadata.
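
The link-or-copy step described above can be illustrated with a small standard-library sketch. This is not RocksDB's implementation: `ExportFilesSketch` and its parameters are hypothetical names introduced here, and the sketch covers only the file placement, assuming file deletions are already disabled and the list of live files is already known.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper, not part of RocksDB. Places each file named in `files`
// (relative to `db_dir`) into a newly created `export_dir`, trying a hard
// link first and falling back to a copy.
// Returns false if export_dir already exists or any file cannot be placed.
bool ExportFilesSketch(const fs::path& db_dir,
                       const std::vector<std::string>& files,
                       const fs::path& export_dir) {
  assert(!export_dir.empty());
  std::error_code ec;
  // Mirror the documented contract: export_dir must not exist beforehand.
  if (fs::exists(export_dir, ec)) return false;
  if (!fs::create_directories(export_dir, ec) || ec) return false;

  for (const std::string& name : files) {
    const fs::path src = db_dir / name;
    const fs::path dst = export_dir / name;
    // Hard links only succeed within a single file system.
    fs::create_hard_link(src, dst, ec);
    if (ec) {
      ec.clear();
      // Fall back to a full copy, e.g. when export_dir is on another partition.
      if (!fs::copy_file(src, dst, ec) || ec) return false;
    }
  }
  return true;
}
```

A hard link costs no extra disk space and is effectively instantaneous, which is presumably why it is preferred whenever both directories share a partition.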

(2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during a CF creation as below.
// CreateColumnFamilyWithImport() will create a new column family with
// column_family_name and import external SST files specified in metadata into
// this column family.
// (1) External SST files can be created using SstFileWriter.
// (2) External SST files can be exported from a particular column family in
//     an existing DB.
// Option in import_options specifies whether the external files are copied or
// moved (default is copy). When option specifies copy, managing files at
// external_file_path is caller's responsibility. When option specifies a
// move, the call ensures that the specified files at external_file_path are
// deleted on successful return and files are not modified on any error
// return.
// On error return, column family handle returned will be nullptr.
// ColumnFamily will be present on successful return and will not be present
// on error return. ColumnFamily may be present on any crash during this call.
virtual Status CreateColumnFamilyWithImport(
    const ColumnFamilyOptions& options, const std::string& column_family_name,
    const ImportColumnFamilyOptions& import_options,
    const ExportImportFilesMetaData& metadata,
    ColumnFamilyHandle** handle);

Internally, this API creates a new CF, parses all the SST files and adds them to the new column family at the same level and with the same sequence number as in the metadata. It also performs safety checks for overlaps between the SST files being imported.
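
To make the overlap check concrete, here is a minimal stand-alone sketch. It rests on assumptions not stated in the commit message: that files on level 0 may overlap while files on any higher level must have disjoint key ranges, and that keys compare bytewise. `SstFileSketch` and `LevelsAreDisjoint` are hypothetical names, not RocksDB types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for per-file metadata.
struct SstFileSketch {
  int level;
  std::string smallest_key;
  std::string largest_key;
};

// Returns true if no two files on the same level >= 1 have overlapping
// [smallest_key, largest_key] ranges. Level 0 is skipped on the assumption
// that L0 files are allowed to overlap.
bool LevelsAreDisjoint(std::vector<SstFileSketch> files) {
  // Sort by level, then by smallest key, so that any overlap within a level
  // shows up between neighbouring entries.
  std::sort(files.begin(), files.end(),
            [](const SstFileSketch& a, const SstFileSketch& b) {
              if (a.level != b.level) return a.level < b.level;
              return a.smallest_key < b.smallest_key;
            });
  for (std::size_t i = 0; i < files.size(); ++i) {
    assert(files[i].smallest_key <= files[i].largest_key);
    if (i == 0) continue;
    const SstFileSketch& prev = files[i - 1];
    const SstFileSketch& cur = files[i];
    if (cur.level > 0 && cur.level == prev.level &&
        cur.smallest_key <= prev.largest_key) {
      return false;
    }
  }
  return true;
}
```

Because the files are sorted by smallest key, comparing each file only with its predecessor on the same level is enough to detect any overlap on that level.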

If the incoming sequence number is higher than the current local sequence number, the local
sequence number is updated to reflect this.
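
The sequence number rule above amounts to taking a maximum. A minimal sketch, assuming each imported file carries its largest sequence number; `SequenceNumberSketch` and `SequenceAfterImport` are hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using SequenceNumberSketch = std::uint64_t;

// Returns the sequence number the destination DB should carry after the
// import: the larger of its current value and the largest imported value.
SequenceNumberSketch SequenceAfterImport(
    SequenceNumberSketch local_last,
    const std::vector<SequenceNumberSketch>& imported_largest) {
  SequenceNumberSketch result = local_last;
  for (SequenceNumberSketch s : imported_largest) {
    result = std::max(result, s);
  }
  // The local sequence number is only ever raised, never lowered.
  assert(result >= local_last);
  return result;
}
```

Raising the local value this way should ensure that writes made after the import receive sequence numbers above every imported entry, so that newer writes take precedence over imported data for the same key.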

Note: as the SST files are being moved across column families, the column family name in an SST
file will no longer match the actual column family on the destination DB. The API does not modify
the column family name or ID in the SST files being imported.
Pull Request resolved: #5495

Differential Revision: D16018881

fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
merryChris pushed a commit to merryChris/rocksdb that referenced this issue Nov 18, 2019