Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Serializer interface for use in ehcache disk tier and elsewhere;… #9

Open
wants to merge 3 commits into
base: sagars-ehcache-final
Choose a base branch
from

Conversation

peteralfonsi
Copy link
Owner

@peteralfonsi peteralfonsi commented Nov 2, 2023

Description

Added Serializer interface to Sagar's ehcache disk tier implementation. We pass the disk tier a key and a value serializer which serialize objects to byte[].

The value serializer is used outside of ehcache, in the disk tier's put() and get(). This saves us from having to serialize BytesReference objects such that their class is the same before and after serialization, which would have been very hacky and potentially damaged performance in other places in OpenSearch. (BytesReference is an interface and its implementing classes are not always simple to instantiate). Ehcache requires this class constraint, but the rest of OS does not care which type of BytesReference something is, so we can move our serializer outside ehcache for a cleaner implementation.

The key serializer is put into a wrapper that outputs ByteBuffer objects and implements ehcache's serializer interface. The wrapper instance is then passed directly to ehcache. We cannot use byte[] directly as a key, despite ehcache having a bundled serializer for it, because when searching for a key it checks both the equals() method of the bundled serializer and also the equals() method of the keys themselves (this is not documented). This means the key serializer must preserve class before and after serialization, but we don't expect this to be a problem.

The implementation is generic and can be used for any EhcacheDiskCachingTier<K, V> so long as serializer implementations for K and V are provided.

Related Issues

Part of larger tiered caching feature.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    - [N/A] New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
    - [N/A] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
    - [N/A] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… modifies existing disk tier impl to use it in a generic way

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Fixed byte[] key implementation to use ByteBuffer wrapper passed directly to ehcache

Signed-off-by: Peter Alfonsi <petealft@amazon.com>

Added tests for BytesReference serializer, and ehcache disk tier using BytesReference as a value
Comment on lines 46 to 54
/**
* This ehcache disk caching tier uses its value serializer outside ehcache.
* Values are transformed to byte[] outside ehcache and then ehcache uses its bundled byte[] serializer.
* The key serializer you pass to this class produces a byte[]. This serializer is passed to a wrapper which
* implements Ehcache's serializer implementation and produces a BytesBuffer. The wrapper instance is then passed to ehcache.
* This is done because to get keys on a disk tier, ehcache internally checks the equals() method of the serializer,
* but ALSO requires newKey.equals(storedKey) (this isn't documented), which is the case for ByteBuffer but not byte[].
* This limitation means that the key serializer must preserve the class of the key before/after serialization,
* but the value serializer does not have to do this.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This detailed explanation is not needed that too around serialization. Just keep it generic. Any explanation around serialization if needed, keep it short.

new RemovalNotification<>(
event.getKey(),
valueSerializer.deserialize(event.getOldValue()),
RemovalReason.EVICTED
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed it earlier, can you also add TierType.DISK as an argument to this constructor.

event.getKey(),
valueSerializer.deserialize(event.getOldValue()),
RemovalReason.INVALIDATED
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed it earlier, can you also add TierType.DISK as an argument to this constructor.

public class EhCacheDiskCachingTier<K, V> implements DiskCachingTier<K, V> {

// A Cache manager can create many caches.
private final PersistentCacheManager cacheManager;

// Disk cache
private Cache<K, V> cache;
private Cache<K, byte[]> cache;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks confusing. Where we take key as generic K but value as byte[]. Ideally we need to serialize both eventually. Plus it can happen that both key and value maybe of same type like ByteReference, String etc, so having this hybrid doesn't make sense.

Better to keep both byte[] or use generic for value as well and define a valueWrapper like keyWrapper.

Let me know it it make sense.

Peter Alfonsi added 2 commits November 13, 2023 10:14
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants