
Our scenario seems to lack framework possibilities - or can't find documentation (sorry then) #1894

Closed
Daniel-Pavic opened this issue Jan 19, 2022 · 8 comments

Daniel-Pavic commented Jan 19, 2022

Dear BenchmarkDotNet team, contributors and friends,

We would like to measure the overall (de-)compression performance on various data with different compression settings.

Therefore our goal is to:

  • Have a specific benchmark class per CompressionType (GZip, Deflate, Brotli, etc.) that contains all possible compression settings for this type.
  • Load all files from the filesystem and provide them via an IEnumerable<byte[]>.
  • Perform the compression benchmark analysis.
  • Use the output data of the compression benchmark analysis as input (parameter?) for a subsequent decompression benchmark analysis, in order to obtain corresponding decompression results per file and compression settings.
  • Perform an aggregation of benchmarks over all files, grouped by CompressionMode (compress/decompress) and CompressionType (GZip, Deflate, Brotli, etc.).
  • Provide CompressionRatio (avg) and CompressionTime (avg) using the above grouping of data.

What I’ve managed to do so far:

  • Providing IEnumerables for files, CompressionTypes, CompressionLevels and other compression settings (see the sketch after this list)
  • Setting categories for CompressionMode (compress/decompress) ... which might be counterproductive, since it makes more sense to perform the decompression on the specific compressed data with the corresponding compression settings
  • Measuring compression performance for all files on a "per-file" basis
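
For reference, here is a simplified sketch of what my per-type setup roughly looks like; the folder path and the chosen compression levels are placeholders, and CompressionLevel.SmallestSize requires .NET 6 or later:

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using BenchmarkDotNet.Attributes;

public class GZipCompressionBenchmarks
{
    // Every file in the folder becomes its own benchmark case.
    [ParamsSource(nameof(FilePaths))]
    public string filePath;

    public IEnumerable<string> FilePaths => Directory.EnumerateFiles("C:/path/to/files");

    [Params(CompressionLevel.Fastest, CompressionLevel.Optimal, CompressionLevel.SmallestSize)]
    public CompressionLevel compressionLevel;

    private byte[] bytes;

    [GlobalSetup]
    public void Setup() => bytes = File.ReadAllBytes(filePath);

    [BenchmarkCategory("compress"), Benchmark]
    public byte[] Compress()
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, compressionLevel))
        {
            gzip.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }
}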

What I’ve NOT managed to achieve so far:

  • (Re)using the compressed data and the corresponding compression settings to perform the decompression benchmarks (ideally without storing all data on the filesystem and reloading it later ... even then I'd need to know which compression settings were used to persist the compressed data)
  • Performing aggregations over all files, grouping the benchmark results by CompressionMode, CompressionType and CompressionSettings

Question:
Is there any chance to achieve my "wishlist" using the (extensible) mechanisms of the framework? If so, where can I find documentation on how to achieve my goals, if any exists? Unfortunately, I couldn't find any documentation regarding my "unmanaged goals".

Any help or reply is greatly appreciated. Thank you in advance!
Best regards, Daniel

timcassell (Collaborator) commented Jan 19, 2022

Something like this? It only does the compression/decompression for a single file, but you already said you have multiple files working, so this should handle the other parts.

Code
using System.Collections.Generic;
using System.IO;
using BenchmarkDotNet.Attributes;

// CompressionType, CompressionSettings, CompressBytes and DecompressBytes
// are placeholders to be filled in with your own types/methods.
public class Benchmark
{
    [Params(/* compression modes here */)]
    public CompressionMode compressionMode;
    [Params(/* compression types here */)]
    public CompressionType compressionType;
    [ParamsSource(nameof(GetCompressionSettings))]
    public CompressionSettings compressionSettings;

    public IEnumerable<CompressionSettings> GetCompressionSettings()
    {
        yield return ...;
        yield return ...;
    }

    private const string folderPath = "C:/path/to/files";
    private const string originalFilePath = folderPath + "/originalFileName";
    private const string extension = "bin"; // placeholder extension for the compressed files
    private string filePath;
    private byte[] bytes;

    [GlobalSetup]
    public void GlobalSetup()
    {
        filePath = folderPath + $"/{compressionType}-{compressionMode}-{compressionSettings}.{extension}";
    }

    // Re-read the original file before each iteration so Compress
    // always starts from the same input.
    [IterationSetup(Target = nameof(Compress))]
    public void CompressSetup()
    {
        bytes = File.ReadAllBytes(originalFilePath);
    }

    [Benchmark]
    public void Compress()
    {
        // Do Compress work
        bytes = CompressBytes(bytes);
    }

    // Persist the compressed result so DecompressSetup can reload it.
    [IterationCleanup(Target = nameof(Compress))]
    public void CompressCleanup()
    {
        File.WriteAllBytes(filePath, bytes);
    }

    [IterationSetup(Target = nameof(Decompress))]
    public void DecompressSetup()
    {
        bytes = File.ReadAllBytes(filePath);
    }

    [Benchmark]
    public void Decompress()
    {
        // Do Decompress work
        bytes = DecompressBytes(bytes);
    }

    [IterationCleanup(Target = nameof(Decompress))]
    public void DecompressCleanup()
    {
        // Anything to do here? I think we're done.
    }
}

Unfortunately, there is no way to add custom data like the compression ratio to the results table. There is an open issue about it: #784.
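
Until that lands, a possible stopgap (just a sketch, not a BDN feature) is to compute the ratio yourself and write it to the console so it at least shows up in the run log. Assuming you add hypothetical originalSize/compressedSize fields to the Benchmark class above and set them inside Compress():

private long originalSize;   // set to the input length inside Compress()
private long compressedSize; // set to the output length inside Compress()

// Runs once after the Compress benchmark finishes for the current parameter combination.
[GlobalCleanup(Target = nameof(Compress))]
public void ReportCompressionRatio()
{
    Console.WriteLine(
        $"{compressionType} {compressionSettings}: ratio = {(double)compressedSize / originalSize:F3}");
}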

Daniel-Pavic (Author) commented Jan 19, 2022

Hi Tim, thank you for your really quick and helpful reply! I had stumbled across [IterationSetup], but kept reading past it due to the following info:

It's not recommended to use this attribute in microbenchmarks because it can spoil the results. However, if you are writing a macrobenchmark (e.g. a benchmark which takes at least 100ms) and you want to prepare some data before each invocation, [IterationSetup] can be useful.

This kept me from taking it into consideration, since a single compress-benchmark invocation takes (depending on file size and CompressionType) from 5 ms (2 kB) to 50 ms (10 kB) to complete, and therefore doesn't really match the above criterion.

Do you think it is nevertheless appropriate to use it? Are there any known alternatives or workarounds for such "microbenchmarks"?

Anyway: thank you for focusing me on this approach again ... maybe I should give it a try despite the known limitations!

timcassell (Collaborator) commented:

It is unfortunately necessary to use it for this type of benchmark. BDN considers anything under 100 ms to be a "microbenchmark", so you can use [Benchmark(OperationsPerInvoke = 100)] and repeat the compression/decompression that many times to increase the runtime and get more accurate results.
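
Applied to the Compress benchmark from the sketch above (CompressBytes is still a placeholder), that could look like this:

private const int OpsPerInvoke = 100;

[Benchmark(OperationsPerInvoke = OpsPerInvoke)]
public void Compress()
{
    // Repeat the work so a single invocation runs long enough to measure reliably;
    // BDN divides the measured time by OperationsPerInvoke, so the reported
    // time is still that of a single compression.
    for (int i = 0; i < OpsPerInvoke; i++)
    {
        CompressBytes(bytes);
    }
}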

timcassell (Collaborator) commented:

You don't need IterationSetup/Cleanup if you want to include reading/writing the file from/to the filesystem as part of the benchmark.
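
In that case the file IO simply moves into the measured method itself; a sketch reusing the paths and the placeholder CompressBytes from above:

[Benchmark]
public void CompressIncludingIO()
{
    // Reading and writing the file is deliberately part of the measurement,
    // so no IterationSetup/IterationCleanup is needed.
    var input = File.ReadAllBytes(originalFilePath);
    File.WriteAllBytes(filePath, CompressBytes(input));
}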

Daniel-Pavic (Author) commented:

That's a great point about the OperationsPerInvoke option! Being new to BDN, it's somewhat hard to connect all the information and possibilities into a suitable approach for a more complex scenario. Your code and replies help me a lot. Thank you very much for helping out! I'm now curious and keen to see the results after recreating my benchmarks!

You don't need IterationSetup/Cleanup if you want to include reading/writing the file from/to the filesystem as part of the benchmark.

I'm actually trying to keep the compressed data in memory (in the benchmark class) in order to apply the corresponding decompression. Let's see... :-)

timcassell (Collaborator) commented:

I'm actually trying to keep the compressed data in memory (in the benchmark class) in order to apply the corresponding decompression. Let's see... :-)

Another option with that in mind is using GlobalSetup to cache both the uncompressed and compressed byte[]s as fields (run the compress method once in GlobalSetup and store the result). Then you don't need IterationSetup at all.
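
A sketch of that caching approach, again with CompressBytes/DecompressBytes as placeholders:

private byte[] uncompressedBytes;
private byte[] compressedBytes;

[GlobalSetup]
public void GlobalSetup()
{
    uncompressedBytes = File.ReadAllBytes(originalFilePath);
    // Compress once, outside the measurement, so the Decompress benchmark
    // gets input produced with the current compression settings.
    compressedBytes = CompressBytes(uncompressedBytes);
}

[Benchmark]
public byte[] Compress() => CompressBytes(uncompressedBytes);

[Benchmark]
public byte[] Decompress() => DecompressBytes(compressedBytes);

Returning the byte[] from the benchmark methods also keeps the results from being optimized away.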

Daniel-Pavic (Author) commented Jan 19, 2022

I guess this won't work for my scenario, since the compression result (the byte[] data) varies per benchmark case due to the different compression settings, so the decompression benchmark might also vary depending on the settings the data was compressed with. A global approach might lead to a "unified" decompression benchmark, since the compressed (global) data would have been created only once, using a single, specific compression-level setting... or did I miss something?

timcassell (Collaborator) commented:

GlobalSetup is not used as a global setup for all benchmarks, but rather as a global setup for each benchmark. It runs only once per benchmark case, that is, once per combination of parameters, so the cached compressed data always matches the current compression settings. IterationSetup, in contrast, can run multiple times per benchmark.
