Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request (dynamodb-util): Expose premarshall configuration #1533

Closed
russell-dot-js opened this issue Sep 18, 2020 · 11 comments
Closed
Labels
feature-request New feature or enhancement. May require GitHub community feedback. wontfix We have determined that we will not resolve the issue.

Comments

@russell-dot-js
Copy link
Contributor

russell-dot-js commented Sep 18, 2020

Is your feature request related to a problem? Please describe.
The problem, as detailed here, is that the current (v2) DocumentClient, and the upcoming (v3) DocumentClient, will serialize native JS Dates as {}.

We cannot make assumptions about how consumers want to serialize Dates (some want ISO strings, some want numbers, some may want just a date with no time component), but we can expose configuration to make it easy for consumers to instruct the client on how to serialize certain attributes based on type or name, without overstepping the client's responsibility and becoming a full blown ORM.

Describe the solution you'd like
I'd love to hammer out details of the contract here, and as an avid dynamo user, I'm also happy to implement this feature. The rough concept is something along these lines:

premarshall: ({TableName: string, ItemKey: KeyType, unmarshalledItem: any, attributeKey: string, attributeValue: any}) => any;

new DocumentClient({
  premarshall: ({attributeKey: key, attributeValue: value}) => {
    if(value instanceof Date) {
      return value.toIsoString(); // or convert to number depending on use case
    }
    if(value instanceof moment) {
      return value.format();
    }

    if(key === 'someKeyIDontWantInDynamo') {
      return undefined;
    }
    return value;
  }
})

I included TableName in case you want to serialize items differently based on table (especially important if you are writing to multiple tables during a transaction, you might have different rules in place for each one).

Describe alternatives you've considered
I'm not sure if we should cross the line into allowing consumers to change the key name... it would make sense for Put, but is kind of confusing for UpdateItem, where your UpdateExpression should already have your keys clearly defined.

But this is something every dynamo client struggles with - creating a clean, usable client that handles all the unique (and kind of weird) ways Dynamo is used. For example, both DynamoDB Data Mapper and DynamoDB Toolbox provide ways to instruct the client on how to map attributes, but they also come at the cost of having to strictly define your types either as classes with decorators, or via configs, so it's easy to create disconnects between the client configuration and the types you are passing to the DB when using typescript.

Thus, pure DocumentClient tends to be more usable than either of these alternatives, and I want to be careful to not take on too responsibility. However, it might be nice to allow consumers to do something like the following contrived example:

new DocumentClient({
  premarshall: ({unmarshalledItem: item, attributeKey, attributeValue}) => {
    if(['createdAt', 'updatedAt', 'favoriteColor'].includes(attributeKey)) {
      return undefined;
    }

    if(attributeKey === 'hometown') {
      return {
        attributeKey: 'GSI1_Sort',
        attributeValue: `${item.favoriteColor}_${item.hometown}_${item.updatedAt}_{${item.createdAt}`
      };
    }

    return { attributeKey, attributeValue };
  }
})

In this example, the consumer is choosing to cram createdAt, updatedAt, favoriteColor, and hometown in to a GSI's sort key, and remove those individual attributes from the object itself. But this is where it's easy for the client to start to box users in take on too much responsibility, leading to complexity and, in some cases, becoming unusable (for example, in the example above, if I wanted to store all 4 attributes separately, AS WELL AS cram into the GSI1, it would be easy for 3/4, but what about hometown, where I rewrite the attribute name?)

Additional context
IMO it's more important to have this on the way in to dynamo on the way out, because there's always some casting from DB values to runtime values anyway (e.g. converting iso string back to date, stripping database concerns, etc). But some way to inject some casting logic that could be applied to Item, Attributes, Items, etc based on endpoint is a secondary item on my wishlist.

Obviously this is something that consumers can do themselves before calling UpdateItem / PutItem, but it creates additional complexity if, for example, every consumer has to wrap the documentClient with their own functionality to marshall. Or, even worse, have one-off code like such:

documentClient.putItem({
  Key,
  Item: {
     ...value,
     createdAt: value.createdAt.toISOString(),
   }
})

Code like the above becomes highly error prone, because even with typescript, adding a new Date value to typeof T will not break any assumptions and consumers have to jump through additional hoops to keep their DB logic in line with their types, when in reality it should "just work" (without having to cross the line into a full blown ORM).

Also: DynamoDB is rad 👍

@russell-dot-js russell-dot-js added the feature-request New feature or enhancement. May require GitHub community feedback. label Sep 18, 2020
@russell-dot-js
Copy link
Contributor Author

I just realized the example where we pass in attributeKey gets hairy for nested attributes e.g.

{
  metadata: {
    createdAt: new Date(),
  }
}

@trivikr
Copy link
Member

trivikr commented Oct 1, 2020

We like this idea of premarshalling, but want to limit it as follows:

  • premarshalling should only be done on value and not key
  • instead of providing a function, user will provide a configuration which would include typeof value and function to call on the value.

We're brainstorming this internally, and will post updates in this issue.

@trivikr
Copy link
Member

trivikr commented Oct 1, 2020

Exploratory (not production-ready) code to define preMarshallConfig as an array of tuples:

type PreMarshallConfig<T extends object> = [T, (value: T) => NativeAttributeValue][];

interface MockMarshallOptions {
  readonly preMarshallConfig: PreMarshallConfig<any>;
}

const mockMarshall = (data: any, options: MockMarshallOptions): NativeAttributeValue => {
  const { preMarshallConfig } = options;
  const getPreMarsshallFunc = (data: any, preMarshallConfig: PreMarshallConfig<any>) => {
    for (const [objType, objFunc] of preMarshallConfig) {
      if (data instanceof objType) {
          return objFunc;
      }
    }
    return undefined;
  }
  const preMarshallFunction = getPreMarsshallFunc(data, preMarshallConfig);

  if (preMarshallFunction) {
    return preMarshallFunction(data);
  }
  return data;
}

Verified on TypeScript Playground
If we plan to go ahead with this, it needs to be improved by removing usages of <any> and @ts-ignore

@russell-dot-js
Copy link
Contributor Author

Totally cool not to allow changing keys, but I have a couple questions about the above:

  • Why not allow providing a function? This will let users use libraries they're often already using (e.g. joi, yup), and allow the document client to get out of the way - instanceof can get tricky with the prototype chain, forcing users to get their configurations in the right order (or else they get bit), while allowing consumers to pass in a function allows them to do whatever they want on a per-key basis
  • Why does T need to extend object? Why not allow users to store, for example, numbers as strings?
  • How would I store one Date as an iso string, and another as a number? I don't have a use case for this but there probably are some people who are using dates for numeric sorts, just as an example, while storing them as iso elsewhere.
  • I'm assuming with this implementation you're trying to avoid letting users complete complex orchestration, such as building a sort key or z-index from multiple input values. Which makes sense. But I wanted to consider that use case... is it worth considering coming up with an easy, extensible way to say put Country:{country}|Province:{province}|PostalCode:{postalCode} into GSI1_Sort ?

@stang-tgs
Copy link

We are in the process of porting aws-sdk v2 over to v3. If we modified marshall to allow marshalling of typeof data === "object instead of data?.constructor?.name === "Object", our existing code, which uses v2 DocumentClient is easily ported. Here is a snippet example of our ported code, but will no longer work because of the marshalling differences between v2 (documentClient) and v3.

  async put(model: T, overwriteExisting = false) {
    const command = new PutItemCommand({
      Item: marshall(model, { convertEmptyValues: true }), //**********PROBLEMATIC LINE********
      TableName: this._tableName,
      ReturnValues: 'NONE'
    });
    if (!overwriteExisting) {
      command.input.ConditionExpression = `attribute_not_exists(#${this.hashKey})`;
      command.input.ExpressionAttributeNames = { [`#${this.hashKey}`]: this.hashKey };
    }
    await this.dynamoDb.send(command);
    return model;
  }

T in the example above is of a TModelType which is strongly typed with HashKey and optional RangeKey. In order for our code to work WITHOUT a premarshall, along with the strict narrow marshalling, we'd have to recursively convert our models into anonymous objects before putting it into Dynamo, as our models can be nested. As it currently stands, unless we write up a helper function which recursively walks through our model and recursively spreads named classes into anonymous objects, there's no way for us to use v3 with our existing code without a major overhaul. #1865 and #1866 attempts to put v3 on par with v2 so that existing code using v2 can be used with v3.

If #1865 and #1866 will not be addressed, but addressed via #1533, prior to a premarshalling solution, is there any guidance on how we should port our code (other than writing a helper function to recursively walk through our models and change named classes to anonymous objects)? We're just looking for a reasonable migration path from v2 DocumentClient to v3.

Allowing users to specify custom marshalling is a great idea, whether it's for people to write ORM libraries, or just for one off custom serialization (e.g. Dates to a number, or ISO string, etc). However, I also think that you need to provide an easy migration path for existing v2 users who already are using DocumentClient without major code changes to their existing base (e.g., the base narrowness of the aws dynamo library to conform with v2, but allow callers to override marshalling and register their own marshaller based on value types, whether it's via instanceof or constructor.name).

Thanks!

@stang-tgs
Copy link

Just some more thoughts about premarshalling. I think premarshalling may not be as useful as initially thought because the above proposal is only about premarshall. It makes no mention of custom unmarshall. Unmarshalling is a whole other can of worms, since type information is lost, and is only as extensive as the intersection between JSON types and Dynamo types.

Because of the asymmetry, the user of the SDK will still need to have custom logic after reading from Dynamo, if they had custom premarshalling. Because of this, I think the v2 way of how the DocumentClient handled marshalling, albeit not perfect, may be the best compromise. The premarshall is quite useless, since it needs to be symmetrical with the unmarshall to be "correct", and the unmarshall can only be as good as the intersection between JS/TS/JSON and Dynamo types.

Thoughts?

@trivikr
Copy link
Member

trivikr commented Jan 20, 2021

Is there any guidance on how we should port our code (other than writing a helper function to recursively walk through our models and change named classes to anonymous objects)?

Changing named classes to anonymous objects is the recommendation we have as of now. The marshall and premarshall utility methods won't be taking a call on user's behalf.

We'll evaluate exposing new utility methods - say legacyMarshall and legacyUnmarshall which has same behavior as that of JS SDK v2.

@trivikr
Copy link
Member

trivikr commented Jan 22, 2021

Update: Instead of creating new utility methods, we'll be evaluating adding a configuration (say marshallObjects) which would marshall all Objects like v2 does.

@stang-tgs
Copy link

stang-tgs commented Jan 22, 2021

Update: Instead of creating new utility methods, we'll be evaluating adding a configuration (say marshallObjects) which would marshall all Objects like v2 does.

Sounds great! The nice thing about the v3 marshall over the v2 marshalling is that the native js/ts Set is supported vs a special DocumentClient.DynamoDbSet required in v2. (Interestingly, v3 doesn't support marshalling of native js/ts Map, but perhaps understandable, because you don't know whether the original dynamodb map was a js/ts Map or Object. This is the asymmetrical problem I mentioned above.)

Anyway, thank you!

@trivikr trivikr added the wontfix We have determined that we will not resolve the issue. label Feb 10, 2021
@trivikr
Copy link
Member

trivikr commented Feb 10, 2021

We're not planning to add premarshall configuration in util-dynamodb, as the equivalent post-marshall configuration can't work DynamoDB records.

Closing this issue as wontfix.

The option to convert class instances to map was added in #1969, and was released with v3.4.0

@trivikr trivikr closed this as completed Feb 10, 2021
@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature-request New feature or enhancement. May require GitHub community feedback. wontfix We have determined that we will not resolve the issue.
Projects
None yet
Development

No branches or pull requests

3 participants