tw4l/brunnhilde

Brunnhilde - Siegfried-based characterization tool for directories and disk images

Version: Brunnhilde 1.9.6

Generates aggregate reports of files in a directory or disk image based on input from Richard Lehane's Siegfried.

For the graphical user interface (GUI) version of Brunnhilde, see Brunnhilde GUI.

Brunnhilde runs Siegfried against a specified directory or disk image, loads the results into a sqlite3 database, and queries the database to generate reports to aid in triage, arrangement, and description of digital archives. The program will also check for viruses unless specified otherwise, and will optionally run bulk_extractor against the given source. Outputs include:

  • report.html: Includes some provenance information on the scan itself, aggregate statistics for the material as a whole (number of files, begin and end dates, number of unique vs. duplicate files, etc.), and detailed reports on content found (file formats, file format versions, MIME types, last modified dates by year, unidentified files, Siegfried warnings/errors, duplicate files, and, optionally, Social Security Numbers found by bulk_extractor).
  • csv_reports folder: Contains CSV results queried from database on file formats, file format versions, MIME types, last modified dates by year, unidentified files, Siegfried warnings and errors, and duplicate files.
  • siegfried.csv: Full CSV output from Siegfried

Optionally, outputs may also include:

  • tree.txt: Tree report of the directory structure of the source directory or of the file system on the disk image (Linux and macOS only).
  • bulk_extractor folder: Contains bulk_extractor outputs (if selected).
  • carved_files folder: Contains files carved from disk images by tsk_recover or HFS Explorer (generated in -d mode; can be deleted at end of process by passing the -r or --removefiles flag to Brunnhilde).
  • dfxml.xml: A fiwalk-generated Digital Forensics XML file describing the volumes, filesystems, and files on a disk (generated in -d mode for non-HFS disk images).
  • logs folder: Contains log files for ClamAV and bulk_extractor (if selected).
  • siegfried.sqlite: SQLite3 database generated from Siegfried CSV (deleted at end of processing by default, but may be retained by using the -k flag).

All outputs are placed into a new directory named after the identifier passed to Brunnhilde as the last argument.
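The load-then-query workflow described above can be illustrated with a minimal Python sketch. It is not Brunnhilde's actual implementation: the table name, the helper functions, and the sample rows are hypothetical, and the column names are only assumed to resemble a Siegfried CSV export that includes an md5 hash column.

```python
import csv
import io
import sqlite3

# Hypothetical sample data. The column names are assumed to resemble a
# Siegfried CSV export with an md5 hash column; check them against your file.
SAMPLE_CSV = """\
filename,filesize,modified,errors,md5,namespace,id,format,version,mime,basis,warning
a/report.pdf,1024,2015-03-01T10:00:00Z,,aaa111,pronom,fmt/18,Acrobat PDF 1.4,1.4,application/pdf,byte match,
a/report_copy.pdf,1024,2016-07-12T09:30:00Z,,aaa111,pronom,fmt/18,Acrobat PDF 1.4,1.4,application/pdf,byte match,
b/notes.txt,12,2016-01-05T08:00:00Z,,bbb222,pronom,x-fmt/111,Plain Text File,,text/plain,text match,
b/mystery.bin,77,2014-11-20T17:45:00Z,,ccc333,pronom,UNKNOWN,,,,,no match
"""

# Subset of columns kept in the (hypothetical) table.
COLUMNS = ["filename", "filesize", "modified", "md5", "id", "format", "mime"]


def load_csv(csv_text):
    """Load selected CSV columns into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE siegfried (filename TEXT, filesize INTEGER, "
        "modified TEXT, md5 TEXT, id TEXT, format TEXT, mime TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[col] for col in COLUMNS) for row in reader]
    conn.executemany("INSERT INTO siegfried VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def format_counts(conn):
    """Count identified files per format, most common first."""
    return conn.execute(
        "SELECT format, id, COUNT(*) FROM siegfried WHERE id != 'UNKNOWN' "
        "GROUP BY format, id ORDER BY COUNT(*) DESC, format"
    ).fetchall()


def duplicate_hashes(conn):
    """Return hashes shared by more than one file."""
    return conn.execute(
        "SELECT md5, COUNT(*) FROM siegfried GROUP BY md5 HAVING COUNT(*) > 1"
    ).fetchall()


def files_by_year(conn):
    """Count files per last-modified year (first four characters of the date)."""
    return conn.execute(
        "SELECT substr(modified, 1, 4) AS year, COUNT(*) FROM siegfried "
        "GROUP BY year ORDER BY year"
    ).fetchall()


def unidentified(conn):
    """List files whose identifier is UNKNOWN."""
    cursor = conn.execute(
        "SELECT filename FROM siegfried WHERE id = 'UNKNOWN' ORDER BY filename"
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = load_csv(SAMPLE_CSV)
    print(format_counts(conn))
    print(duplicate_hashes(conn))
    print(files_by_year(conn))
    print(unidentified(conn))
```

Each query corresponds loosely to one of the report types listed above (formats, duplicates, last modified dates by year, unidentified files). Using SQL aggregation keeps each report to a single statement.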

For the most accurate statistics with Siegfried 1.6+, it is advised to force Siegfried to make single identifications for files with multiple filetypes. This can be accomplished with roy using the following command:

roy build -multi 0  

For a more detailed explanation of how multiple identifications are handled by Siegfried, see richardlehane/siegfried#75.

Installation

Brunnhilde and all of its dependencies are already installed in BitCurator version 1.7.106+. In versions 1.8.0+, a terminal launcher for Brunnhilde is included in the "Forensics and Reporting" folder on the BitCurator desktop.

Brunnhilde minimally requires that Python 2 or 3 and Siegfried are installed on your system to characterize directories of content. Characterizing disk images introduces additional dependencies. For more information, see Dependencies.

sudo pip install brunnhilde

On macOS, you may need to run sudo pip3 install brunnhilde instead.

Once installed, you can call brunnhilde with just brunnhilde.py [arguments].
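As a quick post-install check, the following sketch reports which required commands cannot be found on the PATH. The helper name missing_tools is hypothetical, and the sketch assumes that Siegfried's executable is named sf and that pip has placed brunnhilde.py on the PATH.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the command names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them if your installation differs.
    missing = missing_tools(["sf", "brunnhilde.py"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("Siegfried and Brunnhilde were both found on PATH.")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is sufficient.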

If an older version of Brunnhilde is installed on your system, you can upgrade to the latest version with:

sudo pip install brunnhilde --upgrade

Usage

usage: brunnhilde.py [-h] [-a] [-b] [--ssn_mode SSN_MODE] [--regex REGEX] [-d]
                     [--hfs] [--hfs_resforks] [--hfs_partition HFS_PARTITION]
                     [--hfs_fsroot HFS_FSROOT] [--tsk_imgtype TSK_IMGTYPE]
                     [--tsk_fstype TSK_FSTYPE]
                     [--tsk_sector_offset TSK_SECTOR_OFFSET] [--hash HASH]
                     [-k] [-l] [-n] [-r] [-t] [-v] [-V] [-w] [-z]
                     [--save_assets SAVE_ASSETS] [--load_assets LOAD_ASSETS]
                     [--csv CSV] [--stdin] [-o] [--in-memory-db]
                     source destination [basename]

positional arguments:
  source                Path to source directory or disk image
  destination           Path to destination for reports
  basename              DEPRECATED. Accession number or identifier, used as
                        basename for outputs. Prefer using the new simpler
                        `brunnhilde.py source destination` syntax. The
                        basename argument is retained for API stability and
                        used when provided.

optional arguments:
  -h, --help            show this help message and exit
  -a, --allocated       Instruct tsk_recover to export only allocated files
                        (recovers all files by default)
  -b, --bulkextractor   Run Bulk Extractor on source
  --ssn_mode SSN_MODE   Specify ssn_mode for Bulk Extractor (0, 1, or 2)
  --regex REGEX         Specify path to regex file
  -d, --diskimage       Use disk image instead of dir as input (Linux and
                        macOS only)
  --hfs                 Use for raw disk images of HFS disks
  --hfs_resforks, --resforks
                        HFS option: Extract AppleDouble resource forks from
                        HFS disks
  --hfs_partition HFS_PARTITION
                        HFS option: Specify partition number as integer for
                        unhfs to extract (e.g. --hfs_partition 1)
  --hfs_fsroot HFS_FSROOT
                        HFS option: Specify POSIX path (file or dir) in the
                        HFS file system for unhfs to extract (e.g.
                        --hfs_fsroot /Users/tessa/backup/)
  --tsk_imgtype TSK_IMGTYPE
                        TSK option: Specify format of image type for
                        tsk_recover. See tsk_recover man page for details
  --tsk_fstype TSK_FSTYPE
                        TSK option: Specify file system type for tsk_recover.
                        See tsk_recover man page for details
  --tsk_sector_offset TSK_SECTOR_OFFSET
                        TSK option: Sector offset for particular volume for
                        tsk_recover to recover
  --hash HASH           Specify hash algorithm
  -k, --keepsqlite      Retain Brunnhilde-generated sqlite db after processing
  -l, --largefiles      Enable virus scanning of large files
  -n, --noclam          Skip ClamAV virus scan
  -r, --removefiles     Delete 'carved_files' directory when done (disk image
                        input only)
  -t, --throttle        Pause for 1s between Siegfried scans
  -v, --verbosesf       Log verbose Siegfried output to terminal while
                        processing
  -V, --version         Display Brunnhilde version
  -w, --warnings, --showwarnings
                        Add Siegfried warnings to HTML report
  -z, --scanarchives    Decompress and scan zip, tar, gzip, warc, arc with
                        Siegfried
  --save_assets SAVE_ASSETS
                        DEPRECATED. Non-functional in Brunnhilde 1.9.1+ but
                        retained for API stability
  --load_assets LOAD_ASSETS
                        DEPRECATED. Non-functional in Brunnhilde 1.9.1+ but
                        retained for API stability
  --csv CSV             Path to Siegfried CSV file to read as input
                        (directories only)
  --stdin               Read Siegfried CSV from piped stdin (directories only)
  -o, --overwrite       Overwrite reports directory if it already exists
  --in-memory-db        Use in-memory sqlite database rather than writing it
                        to disk

For file paths containing spaces in directory names, enclose the entire path in single or double quotes or make sure spaces are escaped properly (e.g. CCA\ Finding\ Aid\ Demo\).
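When a command line is assembled by a script rather than typed by hand, the quoting can be left to Python's standard shlex module. The sketch below uses a hypothetical source path and destination name; it only builds and inspects the command string and does not run Brunnhilde.

```python
import shlex

# Hypothetical paths, used for illustration only.
source = "/home/user/CCA Finding Aid Demo"
destination = "reports"

# shlex.quote wraps a value in single quotes only when the shell needs it.
command = "brunnhilde.py {} {}".format(shlex.quote(source), shlex.quote(destination))
print(command)

# shlex.split shows how a POSIX shell would tokenize the result:
# the path with spaces stays a single argument.
print(shlex.split(command))
```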

Brunnhilde will accept absolute or relative paths for source and destination.

Example commands:

Brunnhilde 1.9+

Brunnhilde 1.9.0 introduces a simpler CLI syntax:

brunnhilde.py /directory/to/scan /output/directory/to/create

Or with options:

brunnhilde.py -ndz /home/user/diskimage.dd output_directory

This command scans a disk image (-d), skips the ClamAV virus check (-n), and instructs Siegfried to scan the contents of zip, tar, gzip, warc, and arc archive files (-z).
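For scripted runs, the same option combination can be built as an argument list. This is a sketch only: build_command and its keyword names are hypothetical, while the -n, -d, and -z flags come from the usage text above. Passing the flags separately is assumed to be equivalent to the combined -ndz form, which is how standard command-line parsers treat bundled short options.

```python
def build_command(source, destination, disk_image=False,
                  skip_clamav=False, scan_archives=False):
    """Build an argument list for a Brunnhilde run (hypothetical helper)."""
    args = ["brunnhilde.py"]
    if skip_clamav:
        args.append("-n")  # skip the ClamAV virus scan
    if disk_image:
        args.append("-d")  # treat the source as a disk image
    if scan_archives:
        args.append("-z")  # have Siegfried scan inside archive files
    args.extend([source, destination])
    return args


if __name__ == "__main__":
    cmd = build_command("/home/user/diskimage.dd", "output_directory",
                        disk_image=True, skip_clamav=True, scan_archives=True)
    print(cmd)
    # To actually run it, the list could be handed to subprocess.run(cmd, check=True).
```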

sf -