Signal disk buffering #913

Merged
merged 213 commits into open-telemetry:main on Jul 14, 2023

Conversation

Contributor
@LikeTheSalad commented Jun 9, 2023

Description:

Feature addition - Allows caching signals on disk and sending them later on demand.
This feature is pretty similar to the one already existing in the OpenTelemetry Swift SDK.

Requirements

  • Java 8 compatible (avoiding non-Android-friendly tools).
  • Configurable disk space limit.
  • Configurable disk cache dir.
  • Allows executing write and read operations in parallel.
  • Ensures FIFO order when reading from disk.
  • Prioritizes data writing: deletes the oldest available data when the max disk space limit has been reached.
  • Lightweight.

Existing Issue(s):

Testing:

  • Unit testing.
  • Manual testing.

Documentation:

Javadoc and README.

How it works

You can take a look at the CONTRIBUTING file to get a more detailed overview of how it all works.

Outstanding items:

  • A serialization/deserialization mechanism that relies on the Java proto lib is being used despite its large size. A future PR might be needed to replace the serialization tool with a smaller alternative in order to better comply with the lightweight requirement.

Contributor
@breedx-splk left a comment

Disclaimer: I have not looked at every line of code, but I have looked at the overall design and a bunch of the code and think this is a great start/addition. I do wish it would have been more incremental, but alas, here we are.

I have a number of things I am thinking about iterating on with this, but since this is new and experimental contrib code, at this point I would prefer to merge the big module and do piecemeal issues/changes after.

Thanks @LikeTheSalad !

Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
Member
@jack-berg left a comment

Some minor comments. Definitely easier to grok with the simplified serialization logic.

* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.contrib.disk.buffering.exporters;
Member

Let's put all the public classes in io.opentelemetry.contrib.disk.buffering, and put the remaining stuff in io.opentelemetry.contrib.disk.buffering.internal.*.

In other words, I don't think there's a need for breaking out i.o.c.d.b.exporters and i.o.c.d.b.storage.

Contributor Author

Sounds good to me, I've applied the changes.

import java.io.IOException;

public class DefaultTemporaryFileProvider implements TemporaryFileProvider {
public static final TemporaryFileProvider INSTANCE = new DefaultTemporaryFileProvider();
Member

We prefer exposing singletons via public static methods rather than public static fields:

private static final TemporaryFileProvider INSTANCE = new DefaultTemporaryFileProvider();

public static TemporaryFileProvider getInstance() {
  return INSTANCE;
}

Contributor Author

Thanks, I've applied this change to all the cases where an INSTANCE field was directly accessed.

* provided. FALSE if either of those conditions wasn't met.
* @throws IOException If an unexpected error happens.
*/
boolean exportStoredBatch(long timeout, TimeUnit unit) throws IOException;
Member

Seems like this method has roughly the same semantics as CompletableResultCode {Signal}Exporter#flush().

Can we get rid of this method and instead have users use flush() to force a read from disk and export?

Contributor Author

I think it can work, yeah. My only concern is that the expectations of someone using this method probably won't match what actually happens: I believe flush() suggests that all the unexported signals will be exported right away, whereas for the disk exporter it would mean that the next available batch of signals in the queue will be sent, but not all the batches at once. So I'm wondering if that could cause misunderstandings and possibly invalid issues being created in the repo because of it.
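
For what it's worth, a rough sketch of the semantic gap being discussed; the wrapper class name and timeout below are made up for illustration, only exportStoredBatch(long, TimeUnit) matches the method under review:

import io.opentelemetry.sdk.common.CompletableResultCode;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: a flush() that drains every stored batch, versus
// exportStoredBatch(), which sends a single batch per call.
abstract class DrainingDiskExporterSketch {

  // Same shape as the method under review: exports the next stored batch and
  // returns true if a batch was read and exported.
  abstract boolean exportStoredBatch(long timeout, TimeUnit unit) throws IOException;

  CompletableResultCode flush() {
    try {
      // Keep reading until the disk queue is empty, which is closer to what
      // callers of flush() would likely expect.
      while (exportStoredBatch(5, TimeUnit.SECONDS)) {
        // keep draining
      }
      return CompletableResultCode.ofSuccess();
    } catch (IOException e) {
      return CompletableResultCode.ofFailure();
    }
  }
}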


package io.opentelemetry.contrib.disk.buffering.internal.storage.utils;

public class TimeProvider {
Member

Consider using io.opentelemetry.sdk.common.Clock instead of introducing a new abstraction.

Contributor

Oof, I forgot to make this same observation! Good call jack.

Contributor Author

I just made the change, although I'm not sure the use case here is properly addressed by a Clock implementation. The time format the storage classes use is milliseconds, which is also the format used in the configuration parameters, making it handy to do the math when checking for changes in the state of the cache files based on the user-provided parameters. If we provide nanoseconds as local time, we'd have to make conversions all the time, which would add up to more than one conversion per action, considering that a simple clock like the one used here would provide the current time in nanoseconds by converting the system's current milliseconds to nanos anyway.

It's also worth noting that this is meant to be an internal utility class that makes it possible to change the time for testing purposes, whereas Clock is also handy when we need to give consumers the ability to use their own implementation, which isn't needed here at the moment.

So based on the above, the changes I made are essentially an incorrect implementation of Clock, returning the current time in milliseconds to avoid unnecessary conversions. It's a bit ugly, though it's a simple way to reuse the Clock interface, so I'd like to know your opinion on it.
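
To make the trade-off concrete, a minimal sketch of the kind of millisecond-based Clock described above; the class name and the test hook are hypothetical, not the PR's actual code:

import io.opentelemetry.sdk.common.Clock;

// Hypothetical sketch of a test-friendly Clock whose now() deliberately returns
// milliseconds so the storage math can compare directly against the
// millisecond-based configuration values mentioned above.
final class MillisecondClockSketch implements Clock {

  private volatile long forcedNowMillis = -1; // test hook: pin the current time

  @Override
  public long now() {
    return forcedNowMillis >= 0 ? forcedNowMillis : System.currentTimeMillis();
  }

  @Override
  public long nanoTime() {
    return System.nanoTime();
  }

  void setForcedNowMillis(long millis) {
    this.forcedNowMillis = millis;
  }
}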

plugins {
id("otel.java-conventions")
id("otel.publish-conventions")
id("me.champeau.jmh") version "0.7.1"
Contributor

Both opentelemetry-java and opentelemetry-java-instrumentation use animalsniffer to avoid using APIs that are not available on Android; see if it would be useful in this project.

Contributor Author

Good idea, I've just added it to verify that the code stays within what API level 24 supports.

private LogRecordDataSerializer() {}

static LogRecordDataSerializer get() {
if (instance == null) {
Contributor

Not thread-safe; consider whether you need to initialize this lazily.

Contributor Author

Thanks, I've made the changes
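
A minimal sketch of the eager-initialization alternative being suggested; the actual change in the PR may differ:

// Sketch: eager initialization sidesteps the unsynchronized null check that
// made the lazy getter non-thread-safe.
final class LogRecordDataSerializer {

  private static final LogRecordDataSerializer INSTANCE = new LogRecordDataSerializer();

  private LogRecordDataSerializer() {}

  static LogRecordDataSerializer get() {
    return INSTANCE;
  }
}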

proto.writeDelimitedTo(out);
return out.toByteArray();
} catch (IOException e) {
throw new IllegalArgumentException(e);
Contributor

to me IllegalStateException seems better suited than IllegalArgumentException for wrapping IOException

Contributor Author

Thanks, I don't have a strong opinion on this, I've just changed it
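
For illustration, the resulting pattern looks roughly like this; the class and method names are hypothetical, and writeDelimitedTo comes from the proto lib's MessageLite:

import com.google.protobuf.MessageLite;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Illustrative sketch: an IOException from an in-memory stream signals a broken
// state rather than a bad argument, hence IllegalStateException.
final class ProtoSerializationSketch {

  private ProtoSerializationSketch() {}

  static byte[] toDelimitedBytes(MessageLite proto) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      proto.writeDelimitedTo(out);
      return out.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}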

}
}

private static void copyFile(File from, File to) throws IOException {
Contributor

Files.copy isn't available on android?

Contributor Author

It is, it's just that it was added in API level 26, which is quite high for apps that need to support older devices; we've got a couple of customers who are on level 24. Though now that I think about it, if the core OTel SDK requires library desugaring, then there's probably no need to make this lib safe for older versions. I'll take a deeper look.

Contributor

Thanks, that makes sense. I had a quick peek at the desugared API list and couldn't find Files.copy. I'd just add a comment there explaining why you are not using Files.copy. An alternative would be to implement the copy using FileChannel.transferTo.
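
A sketch of that alternative, using channels from plain FileInputStream/FileOutputStream (the class name is illustrative, not the PR's code):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Illustrative sketch of the suggested alternative: FileChannel.transferTo has
// been available on Android far longer than Files.copy (API 26).
final class FileCopySketch {

  private FileCopySketch() {}

  static void copyFile(File from, File to) throws IOException {
    try (FileChannel in = new FileInputStream(from).getChannel();
        FileChannel out = new FileOutputStream(to).getChannel()) {
      long position = 0;
      long size = in.size();
      // transferTo may move fewer bytes than requested, so loop until done.
      while (position < size) {
        position += in.transferTo(position, size - position, out);
      }
    }
  }
}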

Contributor Author

Sounds good! Thanks for taking the time 👍 Btw I'm planning to make all the requested changes first thing tomorrow, since I'm already EOD for today

Contributor Author

I've added a comment explaining why it was done this way; also, the animalsniffer check will complain if Files.copy is used.


private static void copyFile(File from, File to) throws IOException {
try (InputStream in = new BufferedInputStream(new FileInputStream(from));
OutputStream out = new FileOutputStream(to, false)) {
Contributor

false should be the default (overwrite instead of append)

Contributor Author

Thanks, I've removed it

}

private static void copyFile(File from, File to) throws IOException {
try (InputStream in = new BufferedInputStream(new FileInputStream(from));
Contributor

using BufferedInputStream here feels a bit weird as you are already using your own buffer

Contributor Author

Good point, I've removed it.


public final class Constants {

public static final byte[] NEW_LINE_BYTES =
Contributor

This is only used in a test. Is this a remnant from the JSON-based serializer?

Contributor Author

Nice catch, it is from when we were using the custom json serialization.

Contributor Author

It's removed now.

}

/**
* Attempts to write a line into a writable file.
Contributor

"line" feels a bit misleading; as far as I can tell it's just binary data.

Contributor Author

This is also a leftover from when json was used.

Contributor Author

It's removed now.

expireTimeMillis = createdTimeMillis + configuration.getMaxFileAgeForReadMillis();
originalFileSize = (int) file.length();
temporaryFile = configuration.getTemporaryFileProvider().createTemporaryFile(file.getName());
copyFile(file, temporaryFile);
Contributor

I found the logic of copying the source file to a temp file, reading the temp file, and overwriting the source after each batch hard to follow. Comments might help.
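
For readers following along, a purely illustrative sketch of the flow being described; none of these names match the PR's actual classes, and a fixed byte count stands in for the real batch-boundary logic:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Illustrative sketch of the copy-to-temp / read-a-batch / overwrite-source flow.
final class ReadAndTruncateSketch {

  private ReadAndTruncateSketch() {}

  static byte[] readNextBatch(File source, File temp, int maxBatchBytes) throws IOException {
    // 1. Work on a temporary copy so the source file stays intact if the
    //    process dies while a batch is being read and exported.
    byte[] all = readAllBytes(source);
    try (FileOutputStream copy = new FileOutputStream(temp)) {
      copy.write(all);
    }

    // 2. Take the next batch from the front of the copied contents (FIFO order).
    int consumed = Math.min(maxBatchBytes, all.length);
    byte[] batch = Arrays.copyOfRange(all, 0, consumed);

    // 3. Overwrite the source with the unread remainder so the next call
    //    starts where this one left off.
    try (FileOutputStream out = new FileOutputStream(source)) {
      out.write(all, consumed, all.length - consumed);
    }
    return batch;
  }

  private static byte[] readAllBytes(File file) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (FileInputStream in = new FileInputStream(file)) {
      byte[] chunk = new byte[8192];
      int read;
      while ((read = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
      }
    }
    return buffer.toByteArray();
  }
}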

Member

yes, it was also hard for me. The general design is documented in a markdown file now. Maybe just link to it.

Contributor

Instead of copying the data around, it might be more efficient to alter the file format so that it's possible to tell which chunks have already been handled. Having some kind of header in the data file could also be useful: if files written by an older version of the library can be read by a newer version, a header would allow skipping the data in case the file format has changed.

Contributor Author
@LikeTheSalad commented Jul 12, 2023

@zeitlinger and I also discussed some more optimized approaches, such as having a separate file to keep track of the bytes already read and updating that number instead. The header approach is a good idea as well, especially because it'd keep everything in the same file; however, I'd have to double-check how we could update the header values while the file is being read, to make sure we don't lose the current position if the app gets terminated without a chance to properly close the reader. Ultimately, since there's a lot in this PR already, and a benchmark added for this specific functionality showed that the simple approach is quite decent in terms of performance, I think it would be ideal to add these optimizations in a future PR if needed.

Contributor

Exploring these ideas in the future is fine.

Contributor Author

Thanks, I've added an explanation in the class's doc

return ReadableResult.FAILED;
}
if (hasExpired()) {
close();
Contributor

something else is supposed to delete the expired file?

Contributor Author

A cleanup is done when creating a new file here.

Member
@trask left a comment

@trask merged commit 2b8888d into open-telemetry:main Jul 14, 2023
13 checks passed