Tar format not detected. #464

Closed
dstruck opened this issue Dec 20, 2023 · 1 comment · Fixed by #466


dstruck commented Dec 20, 2023

Concerned file: https://github.com/danielmiessler/SecLists/raw/master/Payloads/Zip-Bombs/r.tar.gz

Expected MIME type: application/x-tar

Returned MIME type: application/octet-stream

Version of the library: v1.4.3

The outer layer is correctly detected as application/gzip. After unpacking the outer layer (gunzip r.tar.gz), the resulting r.tar is detected as application/octet-stream, even though GNU tar can extract it (tar xvf r.tar; Ubuntu).
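To reproduce this without writing r.tar to disk, a small Python sketch can strip the gzip layer and keep only the first 512-byte block. That block holds the header of the first entry, which is what a tar detector inspects. This is not mimetype code, and the function name inner_tar_header is hypothetical. The sketch inflates at most one block through gzip.GzipFile, which seems prudent for a sample taken from a zip-bomb collection.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # ID1/ID2 bytes that start every gzip member (RFC 1952)
TAR_BLOCK = 512           # tar archives are made of 512-byte blocks


def inner_tar_header(gz_bytes: bytes) -> bytes:
    """Hypothetical helper: return the first 512 decompressed bytes of a gzip stream.

    Only the first block is inflated, so the full (possibly huge) payload
    is never materialized in memory.
    """
    if gz_bytes[:2] != GZIP_MAGIC:
        raise ValueError("outer layer is not gzip")
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as gz:
        # BufferedReader semantics: returns 512 bytes unless the stream ends first.
        return gz.read(TAR_BLOCK)
```

The returned block can then be handed to whatever tar check is being tested.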

The file utility on Ubuntu identifies it as a tar archive (r.tar: tar archive).

Apache Tika also detects the file as a Tar archive. If I understand correctly, Tika uses "org.apache.commons.compress.archivers.tar.TarArchiveInputStream" to decide whether a file is a Tar archive. The Commons Compress project documents the different possible Tar headers here: compress/blob/master/src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveEntry.java:

  • C structure for a Tar Entry's header
  • C structure for an old GNU Tar Entry's header
  • C structure for an xstar (Jörg Schilling star) Tar Entry's header
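As a rough illustration of how these header layouts can be told apart, here is a Python sketch. The offsets (magic at byte 257, star's extra "tar\0" marker at byte 508) follow the ustar layout and the Commons Compress constants as I understand them; the function name and the returned labels are made up for illustration. Pre-POSIX (V7) headers carry no magic at all, so a detector that relies only on magic bytes cannot recognize them.

```python
USTAR_MAGIC = b"ustar\x00"     # POSIX ustar magic (6 bytes); version "00" follows
OLDGNU_MAGIC = b"ustar  \x00"  # old GNU: magic "ustar " plus version " \0" (8 bytes)
XSTAR_MAGIC = b"tar\x00"       # star: extra marker in the last 4 bytes of the header

MAGIC_OFFSET = 257
XSTAR_MAGIC_OFFSET = 508


def tar_header_variant(block: bytes) -> str:
    """Hypothetical helper: classify a 512-byte tar header by its magic bytes."""
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes long")
    if block[MAGIC_OFFSET:MAGIC_OFFSET + 8] == OLDGNU_MAGIC:
        return "oldgnu"
    if block[MAGIC_OFFSET:MAGIC_OFFSET + 6] == USTAR_MAGIC:
        # star reuses the ustar magic and adds its own marker at the end.
        if block[XSTAR_MAGIC_OFFSET:XSTAR_MAGIC_OFFSET + 4] == XSTAR_MAGIC:
            return "xstar"
        return "ustar"
    # V7 headers have nothing here, and neither does arbitrary binary data.
    return "no magic"
```

A block that lands in "no magic" is not necessarily invalid; the header checksum is the property that all of these variants share.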

Another project able to detect the file as a Tar archive: https://github.com/trailofbits/polyfile:

A pure Python cleanroom implementation of libmagic, with instrumented parsing from Kaitai struct and an interactive hex viewer

Their Tar detection rules (in libmagic magic format) can be found here: https://github.com/trailofbits/polyfile/blob/master/polyfile/magic_defs/archive

gabriel-vasile added a commit that referenced this issue Jan 3, 2024
Previous detection used the rules from PRONOM. This commit replaces
those rules with the check from github.com/file/file: compute checksum
for header and check if recorded checksum matches.
Fixes #464
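The checksum check described in the commit can be sketched as follows. It is written in Python rather than the Go code from #466, and the names are hypothetical. In the tar format, the 8-byte chksum field at offset 148 holds an octal number equal to the sum of all 512 header bytes (as unsigned values), with the chksum field itself counted as eight spaces. Because V7, GNU and POSIX headers all carry this checksum, the check does not depend on any magic string.

```python
from typing import Optional

CHKSUM_OFFSET = 148
CHKSUM_LEN = 8
TAR_BLOCK = 512


def parse_octal(field: bytes) -> Optional[int]:
    """Parse a NUL/space-padded octal header field; None if it is not octal."""
    digits = field.replace(b"\x00", b" ").strip()
    if not digits or any(c not in b"01234567" for c in digits):
        return None
    return int(digits, 8)


def tar_checksum_ok(block: bytes) -> bool:
    """Hypothetical helper: True if the recorded header checksum matches the contents."""
    if len(block) < TAR_BLOCK:
        return False
    end = CHKSUM_OFFSET + CHKSUM_LEN
    recorded = parse_octal(block[CHKSUM_OFFSET:end])
    if recorded is None:
        return False
    # Unsigned sum of every header byte, counting the chksum field
    # itself as eight ASCII spaces (0x20).
    computed = sum(block[:CHKSUM_OFFSET]) + CHKSUM_LEN * 0x20 + sum(block[end:TAR_BLOCK])
    return computed == recorded
```

This sketch accepts only the unsigned sum. Some historical tar writers summed the bytes as signed chars, and GNU tar accepts either form; a detector that wants to match GNU tar more closely could compute both.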
gabriel-vasile (Owner) commented

Thank you for the detailed issue. Fixed.
