Signal disk buffering #913

Merged
merged 213 commits into open-telemetry:main on Jul 14, 2023

Conversation

Contributor
@LikeTheSalad commented Jun 9, 2023

Description:

Feature addition - Allows caching signals on disk and sending them later on demand.
This feature is pretty similar to the one already existing in the OpenTelemetry Swift SDK.

Requirements

  • Java 8 compatible (avoiding non-Android-friendly tools).
  • Configurable disk space limit.
  • Configurable disk cache dir.
  • Allows executing write and read operations in parallel.
  • Ensures FIFO order when reading from disk.
  • Prioritizes data writing: deletes the oldest available data when the max disk space limit has been reached.
  • Lightweight.

Existing Issue(s):

Testing:

  • Unit testing.
  • Manual testing.

Documentation:

Javadoc and README.

How it works

You can take a look at the CONTRIBUTING file to get a more detailed overview of how it all works.

Outstanding items:

  • A serialization/deserialization mechanism that relies on the Java proto lib is being used despite its large size. A future PR might be needed to replace the serialization tool with a smaller alternative in order to better comply with the lightweight requirement.

Contributor
@breedx-splk left a comment

Disclaimer: I have not looked at every line of code, but I have looked at the overall design and a bunch of the code and think this is a great start/addition. I do wish it would have been more incremental, but alas, here we are.

I have a number of things I am thinking about iterating on with this, but since this is new and experimental contrib code, at this point I would prefer to merge the big module and do piecemeal issues/changes after.

Thanks @LikeTheSalad !

Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
Member
@jack-berg left a comment

Some minor comments. Definitely easier to grok with the simplified serialization logic.

* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.contrib.disk.buffering.exporters;
Member

Let's put all the public classes in io.opentelemetry.contrib.disk.buffering, and put the remaining stuff in io.opentelemetry.contrib.disk.buffering.internal.*.

In other words, I don't think there's a need for breaking out i.o.c.d.b.exporters and i.o.c.d.b.storage.

Contributor Author

Sounds good to me, I've applied the changes.

import java.io.IOException;

public class DefaultTemporaryFileProvider implements TemporaryFileProvider {
public static final TemporaryFileProvider INSTANCE = new DefaultTemporaryFileProvider();
Member

We prefer exposing singletons via public static methods rather than public static fields:

private static final TemporaryFileProvider INSTANCE = new DefaultTemporaryFileProvider();

public static TemporaryFileProvider getInstance() {
  return INSTANCE;
}

Contributor Author

Thanks, I've applied this change to all the cases where an INSTANCE field was directly accessed.

* provided. FALSE if either of those conditions wasn't met.
* @throws IOException If an unexpected error happens.
*/
boolean exportStoredBatch(long timeout, TimeUnit unit) throws IOException;
Member

Seems like this method has roughly the same semantics as CompletableResultCode {Signal}Exporter#flush().

Can we get rid of this method and instead have users use flush() to force a read from disk and export?

Contributor Author

I think it can work, yeah. My only concern is that the expectations of someone using this method probably won't match what actually happens: I believe flush() suggests that all the unexported signals will be exported right away, whereas for the disk exporter it would mean that the next available batch of signals in the queue will be sent, but not all the batches at once. So I'm wondering if that could cause misunderstandings and possibly invalid issues being created in the repo because of it.
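
For what it's worth, a rough sketch of the semantic gap being discussed; the wrapper class name and timeout below are made up for illustration, only exportStoredBatch(long, TimeUnit) matches the method under review:

import io.opentelemetry.sdk.common.CompletableResultCode;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: a flush() that drains every stored batch, versus
// exportStoredBatch(), which sends a single batch per call.
abstract class DrainingDiskExporterSketch {

  // Same shape as the method under review: exports the next stored batch and
  // returns true if a batch was read and exported.
  abstract boolean exportStoredBatch(long timeout, TimeUnit unit) throws IOException;

  CompletableResultCode flush() {
    try {
      // Keep reading until the disk queue is empty, which is closer to what
      // callers of flush() would likely expect.
      while (exportStoredBatch(5, TimeUnit.SECONDS)) {
        // keep draining
      }
      return CompletableResultCode.ofSuccess();
    } catch (IOException e) {
      return CompletableResultCode.ofFailure();
    }
  }
}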


package io.opentelemetry.contrib.disk.buffering.internal.storage.utils;

public class TimeProvider {
Member

Consider using io.opentelemetry.sdk.common.Clock instead of introducing a new abstraction.

Contributor

Oof, I forgot to make this same observation! Good call jack.

Contributor Author

I just made the change, although I'm not sure the use case here is properly addressed by a Clock implementation. The time format the storage classes use is milliseconds, which is also the format used in the configuration parameters, making it handy to do the math when checking for changes in the state of the cache files based on the user-provided parameters. If we provide nanoseconds as local time, we'd have to make conversions all the time, which would add up to more than one conversion per action, considering that a simple clock like the one used here would provide the current time in nanoseconds by converting the system's current milliseconds to nanos anyway.

It's also worth noting that this is meant to be an internal utility class that makes it possible to change the time for testing purposes, whereas Clock is also handy when we need to give consumers the ability to use their own implementation, which isn't needed here at the moment.

So based on the above, the changes I made are essentially an incorrect implementation of Clock, returning the current time in milliseconds to avoid unnecessary conversions. It's a bit ugly, though it's a simple way to reuse the Clock interface, so I'd like to know your opinion on it.
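
To make the trade-off concrete, a minimal sketch of the kind of millisecond-based Clock described above; the class name and the test hook are hypothetical, not the PR's actual code:

import io.opentelemetry.sdk.common.Clock;

// Hypothetical sketch of a test-friendly Clock whose now() deliberately returns
// milliseconds so the storage math can compare directly against the
// millisecond-based configuration values mentioned above.
final class MillisecondClockSketch implements Clock {

  private volatile long forcedNowMillis = -1; // test hook: pin the current time

  @Override
  public long now() {
    return forcedNowMillis >= 0 ? forcedNowMillis : System.currentTimeMillis();
  }

  @Override
  public long nanoTime() {
    return System.nanoTime();
  }

  void setForcedNowMillis(long millis) {
    this.forcedNowMillis = millis;
  }
}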

plugins {
id("otel.java-conventions")
id("otel.publish-conventions")
id("me.champeau.jmh") version "0.7.1"
Contributor

Both opentelemetry-java and opentelemetry-java-instrumentation use animalsniffer to avoid using APIs that are not available on Android; see if it would be useful in this project.

Contributor Author

Good idea, I've just added it to verify that the code stays within what API level 24 supports.

private LogRecordDataSerializer() {}

static LogRecordDataSerializer get() {
if (instance == null) {
Contributor

Not thread-safe; consider whether you need to initialize this lazily.

Contributor Author

Thanks, I've made the changes
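
A minimal sketch of the eager-initialization alternative being suggested; the actual change in the PR may differ:

// Sketch: eager initialization sidesteps the unsynchronized null check that
// made the lazy getter non-thread-safe.
final class LogRecordDataSerializer {

  private static final LogRecordDataSerializer INSTANCE = new LogRecordDataSerializer();

  private LogRecordDataSerializer() {}

  static LogRecordDataSerializer get() {
    return INSTANCE;
  }
}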

proto.writeDelimitedTo(out);
return out.toByteArray();
} catch (IOException e) {
throw new IllegalArgumentException(e);
Contributor

to me IllegalStateException seems better suited than IllegalArgumentException for wrapping IOException

Contributor Author

Thanks, I don't have a strong opinion on this, I've just changed it
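
For illustration, the resulting pattern looks roughly like this; the class and method names are hypothetical, and writeDelimitedTo comes from the proto lib's MessageLite:

import com.google.protobuf.MessageLite;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Illustrative sketch: an IOException from an in-memory stream signals a broken
// state rather than a bad argument, hence IllegalStateException.
final class ProtoSerializationSketch {

  private ProtoSerializationSketch() {}

  static byte[] toDelimitedBytes(MessageLite proto) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      proto.writeDelimitedTo(out);
      return out.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}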

}
}

private static void copyFile(File from, File to) throws IOException {
Contributor

Files.copy isn't available on android?

Contributor Author

It is, it's just that it was added in API level 26, which is quite high for apps that need to support older devices; we've got a couple of customers who are on level 24. Though now that I think about it, if the core OTel SDK requires library desugaring, then there's probably no need to make this lib safe for older versions. I'll take a deeper look.

Contributor

Thanks, that makes sense. I had a quick peek at the desugared API list and couldn't find Files.copy. I'd just add a comment there explaining why you are not using Files.copy. An alternative would be to implement the copy using FileChannel.transferTo.
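
A sketch of that alternative, using channels from plain FileInputStream/FileOutputStream (the class name is illustrative, not the PR's code):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Illustrative sketch of the suggested alternative: FileChannel.transferTo has
// been available on Android far longer than Files.copy (API 26).
final class FileCopySketch {

  private FileCopySketch() {}

  static void copyFile(File from, File to) throws IOException {
    try (FileChannel in = new FileInputStream(from).getChannel();
        FileChannel out = new FileOutputStream(to).getChannel()) {
      long position = 0;
      long size = in.size();
      // transferTo may move fewer bytes than requested, so loop until done.
      while (position < size) {
        position += in.transferTo(position, size - position, out);
      }
    }
  }
}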

Contributor Author

Sounds good! Thanks for taking the time 👍 Btw I'm planning to make all the requested changes first thing tomorrow, since I'm already EOD for today

Contributor Author

I've added a comment explaining why it was done this way; also, the animalsniffer check will complain if Files.copy is used.


private static void copyFile(File from, File to) throws IOException {
try (InputStream in = new BufferedInputStream(new FileInputStream(from));
OutputStream out = new FileOutputStream(to, false)) {
Contributor

false should be the default (overwrite instead of append)

Contributor Author

Thanks, I've removed it

}

private static void copyFile(File from, File to) throws IOException {
try (InputStream in = new BufferedInputStream(new FileInputStream(from));
Contributor

using BufferedInputStream here feels a bit weird as you are already using your own buffer

Contributor Author

Good point, I've removed it.


public final class Constants {

public static final byte[] NEW_LINE_BYTES =
Contributor

This is only used in a test. Is this a remnant from the JSON-based serializer?

Contributor Author

Nice catch, it is from when we were using the custom json serialization.

Contributor Author

It's removed now.

}

/**
* Attempts to write a line into a writable file.
Contributor

"line" feels a bit misleading; as far as I can tell it's just binary data.

Contributor Author

This is also a leftover from when json was used.

Contributor Author

It's removed now.

expireTimeMillis = createdTimeMillis + configuration.getMaxFileAgeForReadMillis();
originalFileSize = (int) file.length();
temporaryFile = configuration.getTemporaryFileProvider().createTemporaryFile(file.getName());
copyFile(file, temporaryFile);
Contributor

I found the logic of copying the source file to a temp file, reading the temp file, and overwriting the source after each batch hard to follow. Comments might help.
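
For readers following along, a purely illustrative sketch of the flow being described; none of these names match the PR's actual classes, and a fixed byte count stands in for the real batch-boundary logic:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Illustrative sketch of the copy-to-temp / read-a-batch / overwrite-source flow.
final class ReadAndTruncateSketch {

  private ReadAndTruncateSketch() {}

  static byte[] readNextBatch(File source, File temp, int maxBatchBytes) throws IOException {
    // 1. Work on a temporary copy so the source file stays intact if the
    //    process dies while a batch is being read and exported.
    byte[] all = readAllBytes(source);
    try (FileOutputStream copy = new FileOutputStream(temp)) {
      copy.write(all);
    }

    // 2. Take the next batch from the front of the copied contents (FIFO order).
    int consumed = Math.min(maxBatchBytes, all.length);
    byte[] batch = Arrays.copyOfRange(all, 0, consumed);

    // 3. Overwrite the source with the unread remainder so the next call
    //    starts where this one left off.
    try (FileOutputStream out = new FileOutputStream(source)) {
      out.write(all, consumed, all.length - consumed);
    }
    return batch;
  }

  private static byte[] readAllBytes(File file) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (FileInputStream in = new FileInputStream(file)) {
      byte[] chunk = new byte[8192];
      int read;
      while ((read = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
      }
    }
    return buffer.toByteArray();
  }
}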

Member

yes, it was also hard for me. The general design is documented in a markdown file now. Maybe just link to it.

Contributor

Instead of copying the data around, it might be more efficient to alter the file format so that it's possible to tell which chunks have already been handled. Having some kind of header in the data file could also be useful: if files written by an older version of the library can be read by a newer version, a header would allow skipping the data in case the file format has changed.

Contributor Author
@LikeTheSalad commented Jul 12, 2023

@zeitlinger and I also discussed some more optimized approaches, such as having a separate file to keep track of the bytes already read and updating that number instead. The header approach is a good idea as well, especially because it'd keep everything in the same file; however, I'd have to double-check how we could update the header values while the file is being read, to make sure we don't lose the current position if the app gets terminated without a chance to properly close the reader. Ultimately, since there's a lot in this PR already, and a benchmark added for this specific functionality showed that the simple approach is quite decent in terms of performance, I think it would be ideal to add these optimizations in a future PR if needed.

Contributor

Exploring these ideas in the future is fine.

Contributor Author

Thanks, I've added an explanation in the class's doc

return ReadableResult.FAILED;
}
if (hasExpired()) {
close();
Contributor

something else is supposed to delete the expired file?

Contributor Author

A cleanup is done when creating a new file here.

Member
@trask left a comment

@trask merged commit 2b8888d into open-telemetry:main Jul 14, 2023
13 checks passed