Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do we choose the zlib version to use? #61852

Closed
carlossanlop opened this issue Nov 19, 2021 · 12 comments
Closed

How do we choose the zlib version to use? #61852

carlossanlop opened this issue Nov 19, 2021 · 12 comments
Labels
area-System.IO.Compression question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@carlossanlop
Copy link
Member

carlossanlop commented Nov 19, 2021

I'm trying to understand how we decide which native zlib library to consume in our compression libraries, but there are a few things that confuse me.

My questions are the following:

  • Why are the *.c and *.h files from the zlib library, as well as its CMakeLists.txt file, located inside the Native Windows folder instead of the Native AnyOS folder? The Unix code also consumes these libraries.
  • Why is the zlib.h file excluded from **NATIVECOMPRESSION_SOURCES in CMakeLists.txt, when we choose based on architecture?
  • Why, at the top of pal_zlib.c, are we hardcoding the zlib\zlib.h file depending if _WIN32 is defined, instead of by architecture?
  • Where is #include <zlib.h> coming from, and why is that used if _WIN32 is not defined?

Below are the details explaining how I arrived to those questions.


.NET consumes two zlib versions:


The code that chooses which libraries to load depending on the architecture is here:

if(${CLR_CMAKE_HOST_ARCH} STREQUAL x86 OR ${CLR_CMAKE_HOST_ARCH} STREQUAL x64)
set(NATIVECOMPRESSION_SOURCES
zlib-intel/adler32.c
zlib-intel/compress.c
zlib-intel/crc_folding.c
zlib-intel/crc32.c
zlib-intel/deflate_medium.c
zlib-intel/deflate_quick.c
zlib-intel/deflate.c
zlib-intel/inffast.c
zlib-intel/inflate.c
zlib-intel/inftrees.c
zlib-intel/match.c
zlib-intel/slide_sse.c
zlib-intel/trees.c
zlib-intel/x86.c
zlib-intel/zutil.c
)
else()
set(NATIVECOMPRESSION_SOURCES
zlib/adler32.c
zlib/compress.c
zlib/crc32.c
zlib/deflate.c
zlib/inffast.c
zlib/inflate.c
zlib/inftrees.c
zlib/trees.c
zlib/zutil.c
)
endif()

Strange thing no. 1: The zlib library *.h and *.c files, and the CMakeLists.txt file arelocated in the Windows folder, instead of the AnyOS folder.


Both of the library folders contain a zlib.h file, which according to the description, is the "interface of the library'. Notice that the zlib.h file is excluded from the CMakeLists.txt file.

Strange thing no. 2: We don't explicitly load the zlib.h file in the CMakeLists.txt file


The public p/invoke entrypoints are declared in runtime/src/libraries/Native/AnyOS/zlib/pal_zlib.h

And they are implemented in runtime/src/libraries/Native/AnyOS/zlib/pal_zlib.c

But in pal_zlib.c file, you'll notice the #include that loads zlib.h hardcodes the one from madler/zlib when _WIN32 is not defined:

#ifdef _WIN32
#define c_static_assert(e) static_assert((e),"")
#include "../../Windows/System.IO.Compression.Native/zlib/zlib.h"
#else
#include "pal_utilities.h"
#include <zlib.h>
#endif

Strange thing no. 3: Why aren't we choosing based on architecture here? And to which file is <zlib.h> pointing?


The entry points for the DLL are indicated in this file, which imports them from pal_zlib.h:

DllImportEntry(CompressionNative_Crc32)
DllImportEntry(CompressionNative_Deflate)
DllImportEntry(CompressionNative_DeflateEnd)
DllImportEntry(CompressionNative_DeflateReset)
DllImportEntry(CompressionNative_DeflateInit2_)
DllImportEntry(CompressionNative_Inflate)
DllImportEntry(CompressionNative_InflateEnd)
DllImportEntry(CompressionNative_InflateReset)
DllImportEntry(CompressionNative_InflateInit2_)


The Unix code consumes the AnyOS pal_zlib.c file as well:

CompressionNative_Crc32
CompressionNative_Deflate
CompressionNative_DeflateEnd
CompressionNative_DeflateReset
CompressionNative_DeflateInit2_
CompressionNative_Inflate
CompressionNative_InflateEnd
CompressionNative_InflateReset
CompressionNative_InflateInit2_

@carlossanlop carlossanlop added question Answer questions and provide assistance, not an issue with source code or documentation. area-System.IO.Compression labels Nov 19, 2021
@carlossanlop carlossanlop added this to the Future milestone Nov 19, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 19, 2021
@ghost
Copy link

ghost commented Nov 19, 2021

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

I'm trying to understand how we decide which zlib library to consume, but there are a few things that confuse me.

My questions are the following:

  • Why are the *.c and *.h files from the zlib library, as well as its CMakeLists.txt file, located inside the Native Windows folder instead of the Native AnyOS folder?
  • Why is the zlib.h file excluded from **NATIVECOMPRESSION_SOURCES in CMakeLists.txt, when we choose based on architecture?
  • Why, at the top of pal_zlib.c, are we hardcoding the zlib\zlib.h file depending if _WIN32 is defined, instead of by architecture?
  • Where is #include <zlib.h> coming from, and why is that used if _WIN32 is not defined?

Below are the details explaining how I arrived to those questions.


.NET consumes two zlib versions:


The code that chooses which libraries to load depending on the architecture is here:

if(${CLR_CMAKE_HOST_ARCH} STREQUAL x86 OR ${CLR_CMAKE_HOST_ARCH} STREQUAL x64)
set(NATIVECOMPRESSION_SOURCES
zlib-intel/adler32.c
zlib-intel/compress.c
zlib-intel/crc_folding.c
zlib-intel/crc32.c
zlib-intel/deflate_medium.c
zlib-intel/deflate_quick.c
zlib-intel/deflate.c
zlib-intel/inffast.c
zlib-intel/inflate.c
zlib-intel/inftrees.c
zlib-intel/match.c
zlib-intel/slide_sse.c
zlib-intel/trees.c
zlib-intel/x86.c
zlib-intel/zutil.c
)
else()
set(NATIVECOMPRESSION_SOURCES
zlib/adler32.c
zlib/compress.c
zlib/crc32.c
zlib/deflate.c
zlib/inffast.c
zlib/inflate.c
zlib/inftrees.c
zlib/trees.c
zlib/zutil.c
)
endif()

Strange thing no. 1: The zlib library *.h and *.c files, and the CMakeLists.txt file arelocated in the Windows folder, instead of the AnyOS folder.


Both of the library folders contain a zlib.h file, which according to the description, is the "interface of the library'. Notice that the zlib.h file is excluded from the CMakeLists.txt file.

Strange thing no. 2: We don't explicitly load the zlib.h file in the CMakeLists.txt file


The public p/invoke entrypoints are declared in runtime/src/libraries/Native/AnyOS/zlib/pal_zlib.h

And they are implemented in runtime/src/libraries/Native/AnyOS/zlib/pal_zlib.c

But in pal_zlib.c file, you'll notice the #include that loads zlib.h hardcodes the one from madler/zlib when _WIN32 is not defined:

#ifdef _WIN32
#define c_static_assert(e) static_assert((e),"")
#include "../../Windows/System.IO.Compression.Native/zlib/zlib.h"
#else
#include "pal_utilities.h"
#include <zlib.h>
#endif

Strange thing no. 3: Why aren't we choosing based on architecture here? And to which file is <zlib.h> pointing?


The entry points for the DLL are indicated in this file, which imports them from pal_zlib.h:

DllImportEntry(CompressionNative_Crc32)
DllImportEntry(CompressionNative_Deflate)
DllImportEntry(CompressionNative_DeflateEnd)
DllImportEntry(CompressionNative_DeflateReset)
DllImportEntry(CompressionNative_DeflateInit2_)
DllImportEntry(CompressionNative_Inflate)
DllImportEntry(CompressionNative_InflateEnd)
DllImportEntry(CompressionNative_InflateReset)
DllImportEntry(CompressionNative_InflateInit2_)

Author: carlossanlop
Assignees: -
Labels:

question, area-System.IO.Compression

Milestone: Future

@jkotas
Copy link
Member

jkotas commented Nov 19, 2021

Why are the *.c and *.h files from the zlib library, as well as its CMakeLists.txt file, located inside the Native Windows folder instead of the Native AnyOS folder? The Unix code also consumes these libraries.

Right, the AnyOS/Unix/Windows split does not really make sense. It would be best to get rid of it and have the library-specific directories directly in the native folder. This was discussed several times. Nobody volunteered to clean it up yet.

@carlossanlop
Copy link
Member Author

@hoyosjs gave me the answers via chat. I'm translating them and will post them in a few moments.

@carlossanlop
Copy link
Member Author

carlossanlop commented Nov 19, 2021

Why are the *.c and *.h files from the zlib library, as well as its CMakeLists.txt file, located inside the Native Windows folder instead of the Native AnyOS folder? The Unix code also consumes these libraries.
Why, at the top of pal_zlib.c, are we hardcoding the zlib\zlib.h file depending if _WIN32 is defined, instead of by architecture?
Where is #include <zlib.h> coming from, and why is that used if _WIN32 is not defined?

3 questions, 1 answer: The reason why the checked-in files are located in the Windows folder is because only Windows consumes those files (hence the _WIN32 ifdef protection). Other OSs consume whatever the system provides (for example, Ubuntu will have whatever was installed via apt-get).

The hardcoded file /zlib/zlib.h is just pointing to the public contract, which should be compatible among versions. I created a diff and the files are almost identical, except for a few different comments and a parameter named differently.

Why is the zlib.h file excluded from **NATIVECOMPRESSION_SOURCES in CMakeLists.txt, when we choose based on architecture?

Only *.c files are included there, those are the ones that get compiled. I didn't notice that initially.

Thanks @hoyosjs

@jkotas do you feel strongly about moving the native libraries to the AnyOS folder? With what I wrote above, I don't feel like that's needed anymore, since only Windows consumes the zlib checked-in libraries. But if it's for correctness, I could open an issue and mark it up for grabs.

@carlossanlop carlossanlop removed the untriaged New issue has not been triaged by the area owner label Nov 19, 2021
@jkotas
Copy link
Member

jkotas commented Nov 19, 2021

@jkotas do you feel strongly about moving the native libraries to the AnyOS folder?

If we are going to do any shuffle here, we should get rid of the AnyOS/Unix/Windows directory layer completely.

Note that there are other non-senses in the current directory structure, for example System.IO.Compression.Native is under Unix, but it is also used on Windows.

@carlossanlop
Copy link
Member Author

If we are going to do any shuffle here, we should get rid of the AnyOS/Unix/Windows directory layer completely.

We might! The zlib folks released zlib-ng, an ARM/ARM64 optimized variant of the zlib library. Same license as the other zlib libraries we already consume, so we can use it.

I'm currently experimenting with this library and plan on collecting perf numbers. If it all looks good, I can take this opportunity to reshuffle the files as you suggest.

Note that there are other non-senses in the current directory structure, for example System.IO.Compression.Native is under Unix, but it is also used on Windows.

Noted! I'll keep it in mind.

Thanks!

@safern
Copy link
Member

safern commented Nov 19, 2021

Note that there are other non-senses in the current directory structure, for example System.IO.Compression.Native is under Unix, but it is also used on Windows.

Same for System.Globalization.Native we've come across this many times but we haven't had the time to remove the OS structure in the directories. Also, I think the System.Globalization.Native shim should move to src/native as we no longer ship that library with the runtime packs, we now statically link it into coreclr and mono and the code is shared between the 2 runtimes.

@jkotas
Copy link
Member

jkotas commented Nov 19, 2021

I think the System.Globalization.Native shim should move to src/native as we no longer ship that library with the runtime packs,

We should treat all shims the same way. I agree that src/native may be the best place for all shims. We are going to have mix of static and dynamic linking of the shims (e.g. we statically link most shims into the single-file coreclr flavor), and this mix is going to change over time.

@safern
Copy link
Member

safern commented Nov 19, 2021

I couldn't find an issue for that discussion, but I did recall having that discussion, do you remember if we have a tracking issue to move shims to src/native?

@jkotas
Copy link
Member

jkotas commented Nov 19, 2021

I am not able to find any tracking issue for it.

@jkotas
Copy link
Member

jkotas commented Nov 20, 2021

Opened #61864

@safern
Copy link
Member

safern commented Nov 20, 2021

Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO.Compression question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

3 participants