Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SharedTree Engineering Plan #8662

Closed
DLehenbauer opened this issue Jan 5, 2022 · 1 comment
Closed

SharedTree Engineering Plan #8662

DLehenbauer opened this issue Jan 5, 2022 · 1 comment
Assignees
Labels
area: dds Issues related to distributed data structures design-required This issue requires design thought
Milestone

Comments

@DLehenbauer
Copy link
Contributor

DLehenbauer commented Jan 5, 2022

Engineering Plan

January marks the beginning of our engineering work on the converged tree DDS, which builds on Autodesk's pioneering work with the Property Tree DDS and incorporates novel approaches for preserving user intention contributed by the Microsoft Whiteboard team.

Following the Fluid naming conventions, we will adopt the name "SharedTree" for the converged tree DDS going forward (not to be confused with the experimental DDS currently residing in the '/experimental/tree' directory).

Engineering will happen in the main FluidFramework branch. Since we are on a productization path, we will forgo the use of the '/experimental/' directory. However, we will continue to publish packages under "@fluid-experimental/" until we are ready to release.

January

ChangeSet Format (Design)

The changeset format is the most critical aspect of the SharedTree design. Changesets are used as both the wire format for communicating changes between clients as well as the storage format for summarizing the tree to disk (for storage, the changeset is squashed such that it is a snapshot of the current tree state.)

Our highest priority for January is finalizing the changeset format. @yann-achard-MS has identified an algebra of intention preserving data change operations and is working on a proposal to express these operations in an efficient PropertySet-like changeset format.

In general, the design of the changeset format is challenging due to the many constraints:

  • The format must be compact to minimize storage and network costs.
  • The format must be spatially ordered (i.e., organized hierarchically matching the tree structure) for efficient filtering for clients with partial views.
  • However, the format must also preserve temporal order (i.e., the order in which changes were applied) when necessary for intention preservation.
  • The format must support the ability to be split into (roughly) fixed-size chunks to support efficient random/partial access.
  • The format must encode enough information so that it may be split and partially applied to pre-existing chunk boundaries.

@yann-achard-MS, along with support from @ruiterr and @PaulKwMicrosoft, will continue to finalize the changeset format in January. Our key result is to unblock implementation of rebasing and encoding of changsets at the beginning of February.

In-Memory Representation (Design)

The client's In-Memory representation is a key integration point that enables end-to-end testing. Lingering concerns about runtime costs when operating on high density data, such as mesh vertices, have made it difficult to completely close on aspects of the in-memory representation.

To rectify this, @CraigMacomber is joining @PaulKwMicrosoft to drill into our remaining pathological cases. (See Craig's document on tree compression here.) The key result is to unblock implementation of the in-memory representation by the end of January so we can integrate other deliverables as they arrive.

Queued Work

The following work items are also candidates for January but deemed less important than the design items called out above (currently they are unfunded)

Op Algebra Implementation

The op algebra used to represent data changes is sufficiently stable that we should begin the initial implementation and verification soon to bolster our confidence in the correctness of the design (intention preservation, convergence, parity with PropertySet and Whiteboard DDSes, etc.)

Changeset Encode/Decode and Chunking

A significant portion of the work to encode/decode and chunk changesets is independent of the final changeset format and could begin now (and would be informative to the changeset design work with respect to size tradeoffs.)

Of particular interest are the algorithms for splitting changesets into (roughly) fixed-sized blobs and splitting incoming changesets along pre-existing chunk/view boundaries. In both cases, it is desirable to perform the intersection operations without decoding and re-encoding either the source or destination chunk.

@ghost ghost added the triage label Jan 5, 2022
@DLehenbauer DLehenbauer self-assigned this Jan 5, 2022
@tylerbutler tylerbutler added the area: dds Issues related to distributed data structures label Jan 21, 2022
@tylerbutler tylerbutler added this to the January 2022 milestone Jan 21, 2022
@ghost ghost removed the triage label Jan 21, 2022
@tylerbutler tylerbutler added the design-required This issue requires design thought label Jan 21, 2022
@DLehenbauer
Copy link
Contributor Author

Content has been merged into Epic #8273

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: dds Issues related to distributed data structures design-required This issue requires design thought
Projects
None yet
Development

No branches or pull requests

3 participants