Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epic: SharedTree DDS #8273

Closed
DLehenbauer opened this issue Nov 16, 2021 · 10 comments
Closed

Epic: SharedTree DDS #8273

DLehenbauer opened this issue Nov 16, 2021 · 10 comments
Assignees
Labels
area: dds: tree epic An issue that is a parent to several other issues status: stale

Comments

@DLehenbauer
Copy link
Contributor

DLehenbauer commented Nov 16, 2021

SharedTree DDS Roadmap

This epic is deprecated, and has been relocated to roadmap.md.

@ghost ghost added the triage label Nov 16, 2021
@DLehenbauer DLehenbauer self-assigned this Nov 16, 2021
@DLehenbauer DLehenbauer changed the title Epic: Converged Tree DDS Epic: Tree DDS Nov 17, 2021
@tylerbutler tylerbutler added the epic An issue that is a parent to several other issues label Nov 19, 2021
@nmsimons nmsimons added this to the December 2021 milestone Nov 30, 2021
@nmsimons nmsimons changed the title Epic: Tree DDS Epic: Tree DDS Design Nov 30, 2021
@vladsud vladsud modified the milestones: December 2021, January 2022 Jan 10, 2022
@DLehenbauer DLehenbauer changed the title Epic: Tree DDS Design Epic: SharedTree DDS Jan 28, 2022
@ungrokkable
Copy link

Hi @DLehenbauer,
I'm trying to get a better grasp on the serde side of M1. In your feature, do you plan to allow serializing the whole tree or just the changeset itself? Thanks.

@DLehenbauer
Copy link
Contributor Author

@ungrokkable - We've been working backwards toward API, starting with op and storage formats, so the discussion on the the M1 API is just spinning up. I just summarized the direction we're currently exploring in #8989 for the underlying API, with opportunity to ship an example/experimental higher level convenience API on top what #8989 describes in the same timeframe.

@ungrokkable
Copy link

Hi @DLehenbauer, I'm officially requesting a PropertyDDS-like interface for TreeDDS. I realize that #8989 is more for discussing the internal API. Do we have something equivalent for the higher level API? We obviously are flexible and won't require a perfect PropertyDDS interface, but we do want to ensure some of the functionality like binders are there. Thanks, and if I can help with anything, let me know.

@DLehenbauer
Copy link
Contributor Author

For M7/M8, this comment nicely describes what integration with external systems and indexed queries look like at scale.

@DLehenbauer
Copy link
Contributor Author

DLehenbauer commented Feb 15, 2022

@taylorsw04 - Notes on pending key decisions to integrate into plan

  • Data Model
    • Subset of JS, C++, or something else (e.g., Intentional Unified Data Model)
    • Which parts of schema are enforced on read vs. write
  • High-level API: General flavor and how early to start? (See @ungrokkable's request above)
  • Responsibility of low-level engine vs. high-level API
    • Who reifies what and when
    • Who caches what and when
  • Rebase
    • Sandwich vs. Scaffolding
    • Are we maximizing IDs when available?

@taylorsw04
Copy link
Contributor

taylorsw04 commented Feb 15, 2022

My WIP aggregation of the open decisions we have. I plan to build a topology at some point to help focus our efforts, and this is absolutely not complete. I'll keep folding in new information here.

Urgent (either impending or have implications for impending work):

  • "Engine"-side in-memory format
    • How do we implement it such that neither an immutable nor mutable API takes a perf hit?
  • Tree abstraction
    • Is the consumer-facing API different from the underlying representation?
      • Regardless, what is desired for the public API (JS/bespoke/etc.)?
    • If JSON is the abstraction
      • How does this change merge resolution?
      • How does this affect schema?
      • Where can identity appear?
        • How does this affect serialization?
      • How does extensibility work (in WB model, anything can be added everywhere)?
    • If bespoke/WB model is the abstraction
      • All scalars are leaf nodes?
      • Can scalars lack identity?
      • Can traits contain a mix of unboxed scalars and nodes?
      • Always reifying arrays vs. not always doing so?
      • Allow specialized collections (maps/sets)?
        • At leaves only?
        • Would they have identities?
      • Can identities be present but not usable for lookup?
    • Do we expose an immutable or mutable API initially?
    • Where is schema? On-read vs. on-write could impact representation.
  • Reifying
    • Responsibility of low-level engine vs. high-level API
      • Who reifies what and when
      • Who caches what and when
  • Rebasing
    • Do we want to rebase on the server?
    • Should rebasing be temporal onto spatial or spatial onto spatial?
      • If you can only rebase via temporal, how do we rebase the anchors that are inputs to the command? Phantom node approach?
    • Can we better leverage identity here?
    • OT vs. MT/CRDT (aka scaffolding)?
      • If OT, exclusion or inversion?
  • How would this allow delta-only rebasing?
    • If MT, how does it work?
  • How would this allow delta-only rebasing?
  • Can we leverage identities in the scaffolding?
  • Squashing
    • How much can we squash intra-commit states?
    • How much can we squash inter-commit states?
  • How do we want to allow custom/app-specific merge behavior?
    • Rebase handler/conflict handler?
    • Definitely want re-running of commands
      • How can we make this more efficient?
  • Constraints
    • What types do we want?

Lower-pri:

  • Optional sub-transactions
    • Do we want them?
  • Partial checkout
    • How do moves work?
    • How do constraints work?
  • Range semantics
    • "Atomic range": do we want it/when do we want it?

@yann-achard-MS
Copy link
Contributor

@ruiterr - here is a table summarizing the differences between PropertyDDS and SharedTree when it comes to PropertyDDS constraints. Please let me if I got something wrong or forgot something on the PropertyDDS side.

Operation Implicit PropertyDDS Constraint SharedTree Behavior When Constraint Not Met
insert child at object field field is empty inserts before or after the existing content
remove child at object field field is not empty silent no-op
modify element at object field field is not empty conflicts (drops the modify)
insert element at array index none n/a
remove element at array index elem is present silent no-op
modify element at array index index is not empty conflicts (drops the modify)

A couple comments on the above:

  • In all cases, once SharedTree supports explicit constraints, we'll be able replicate the implicit PropertyDDS constraints exactly. In that sense, the third column only represents the default behavior of shared tree.
  • For the "insert child at object field" case, one approach we can use is to craft edits such that the field is guaranteed to be empty by always removing its contents prior to the insert.
  • When it comes to constraints, maps behave like objects in both systems.
  • When it comes to constraints, strings behave like arrays in both systems.

@ruiterr
Copy link
Contributor

ruiterr commented Mar 21, 2022

@yann-achard-MS I think you correctly described the current behaviour of PropertyDDS. At the moment, the conflicting cases are all silently handled. There should actually never be "constraint violations", but in the rebase with respect to the previous operation the CS is rewritten in such a way that it can be applied in a non conflicting way (a duplicate insert is dropped for fields, modification and remove are dropped if there is a parallel remove of the same element). Those cases are regarded as conflicts, but since we currently don't report conclicts to the app, they are just silently resolved.

For the "insert child at object field" case, one approach we can use is to craft edits such that the field is guaranteed to be empty by always removing its contents prior to the insert.

Would this require a slice remove (from beginning to end)? I think it won't be possible via a set remove. Let's assume we have the following situation:

Initial value: A
Modification by client 1: [remove(A, setlike), insert(B, after A)].
Modification by client 2: [remove(A, setlike), insert(C, after A)].

Let's assumed client 1 is sequenced first. After its change is applied, we have
Value: B

If we now rebase the change of client 2, with respect to the change of client 1, the remove(A) will be dropped, because it is a duplicated remove of the same element, the insert will be preserved, so the result of applying this would be
Value: C, B

@yann-achard-MS
Copy link
Contributor

@ruiterr - Thank you for clarifying what happens when the constraints are not met. Here's an attempt at characterizing the difference between the two systems:

Operation Implicit PropertyDDS Constraint SharedTree Behavior When Constraint Not Met PropertyDDS Behavior When Constraint Not Met
insert child at object field field is empty inserts before or after the existing content silent no-op
remove child at object field field is not empty silent no-op silent no-op
modify element at object field field is not empty conflicts (drops the modify) silent no-op
insert element at array index none n/a n/a
remove element at array index element is present silent no-op silent no-op
modify element at array index index is not empty conflicts (drops the modify) silent no-op

Note that there's a key difference between SharedTree conflicts and PropertyDDS's slient no-ops: while the relevant operation is dropped in both cases, the whole transaction is dropped in SharedTree. To replicate the PropertyDDS behavior, a SharedTree client could choose to only perform one change per transaction (but commit several transactions in bulk). This doesn't seem to be a very efficient workaround as it will foster long chains of more complex rebases. It also doesn't mesh will with hierarchical edits, but I'm guessing current usages of PropertyDDS would not care about them.

I think the first row of the table is the only case for which we would need to use SharedTree's constraint system to be able to reproduce the PropertyDDS outcome. How much of an issue is that (given the work-around slice-delete trick)?

More generally, I'm also curious how you feel about having the current users of PropertyDDS/HFDM adopt SharedTree's model of transactions and constraints as opposed to trying to pretend it's not different. If you don't see it as an improvement, then we want to understand why that may be.

Would this require a slice remove (from beginning to end)?

Yes. That's exactly what I had in mind.

@vladsud vladsud modified the milestones: March 2022, June 2022 Jun 3, 2022
taylorsw04 added a commit that referenced this issue Aug 30, 2022
This PR moves the SharedTree roadmap (#8273) into a markdown file and updates some links.
WayneFerrao pushed a commit to WayneFerrao/FluidFramework that referenced this issue Aug 31, 2022
This PR moves the SharedTree roadmap (microsoft#8273) into a markdown file and updates some links.
@microsoft-github-policy-service
Copy link
Contributor

This PR has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: dds: tree epic An issue that is a parent to several other issues status: stale
Projects
None yet
Development

No branches or pull requests

9 participants