
Add Metadata to LLVM Bitcode

Hassaan edited this page May 24, 2021 · 7 revisions

This page describes how to use AddMetadata, an LLVM optimizer pass that attaches metadata to specified arguments of a function at each of its call sites.

Use case

The following snippet from the nweb web server reads the client's request (the input to the web server) into buffer, the second argument of the read system call:

...
void web(int fd, int hit)
{
	int j, file_fd, buflen, len;
	long i, ret;
	char * fstr;
	static char buffer[BUFSIZE+1]; /* static so zero filled */

	ret =read(fd,buffer,BUFSIZE); 	/* read Web request in one go */
...

Metadata can be added to the argument buffer of the read function call (above) using the following AddMetadata configuration file:

read, 2, input

The configuration file tells the pass to attach the metadata input to the second argument of each call to read.

The AddMetadata pass operates on the LLVM bitcode of the nweb application. It iterates over all instructions in the bitcode and attaches the configured metadata to the matching call instructions as LLVM metadata. Each annotated function call carries metadata named call-site-metadata.

The value of the attached metadata is an MDTuple instance. Its first element is a unique identifier for the call site. Each remaining element is itself an MDTuple, one for each argument that has metadata configured. The first element of an argument tuple is the index of the argument (as specified in the configuration file). The remaining elements of the argument tuple are the one or more metadata descriptors specified for that argument in the configuration file.

The output LLVM IR snippet of the above example looks as follows:

%14 = call i64 @read(i32 %12, i8* %13, i64 10), !call-site-metadata !2

!2 = !{!"0", !3}
!3 = !{!"2", !"input"}
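The tuple layout shown above can be modeled outside LLVM to make the nesting explicit. The following is a minimal sketch, not code from the pass: the types ArgumentMetadata and CallSiteMetadata and the function render are hypothetical names introduced for illustration, and the output is written as one inline literal rather than as the numbered nodes (!2, !3) that LLVM prints.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one argument tuple: the argument index followed by
// one or more descriptors from the configuration file.
struct ArgumentMetadata {
    unsigned index;                        // e.g. 2 for the second argument
    std::vector<std::string> descriptors;  // e.g. {"input"}
};

// Hypothetical model of the tuple attached under the name call-site-metadata:
// a call-site identifier followed by one tuple per annotated argument.
struct CallSiteMetadata {
    unsigned callSiteId;
    std::vector<ArgumentMetadata> arguments;
};

// Render the nested tuple in an LLVM-IR-like notation, written inline.
std::string render(const CallSiteMetadata &site) {
    std::string out = "!{!\"" + std::to_string(site.callSiteId) + "\"";
    for (const ArgumentMetadata &argument : site.arguments) {
        // An argument tuple only exists if metadata was added for the argument.
        assert(!argument.descriptors.empty() && "argument tuple needs a descriptor");
        out += ", !{!\"" + std::to_string(argument.index) + "\"";
        for (const std::string &descriptor : argument.descriptors) {
            out += ", !\"" + descriptor + "\"";
        }
        out += "}";
    }
    out += "}";
    return out;
}
```

For the example above, a CallSiteMetadata with identifier 0 and a single argument tuple (index 2, descriptor input) renders as !{!"0", !{!"2", !"input"}}, which corresponds to nodes !2 and !3 combined.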

Requirements

  • LLVM - Recommended release 10.0.0
  • Clang - Recommended release 10.0.0
  • CMake - Recommended release 3.13.4 or higher

On Ubuntu 20.04, the requirements can be installed with:

sudo apt-get install -y llvm-10 clang-10 cmake

Build the AddMetadata Pass From Source

  • Clone SPADE repository
  • Execute the command ./bin/build-add-metadata.sh /usr/lib/llvm-10, replacing the argument /usr/lib/llvm-10 with the path to your LLVM installation
  • After a successful build, the shared library for the pass is created at lib/libAddMetadata.so

Using AddMetadata Pass

The pass takes three arguments:

  • -config: (Mandatory) Path to the input configuration file (format described below)
  • -output: (Optional) File to write the output of the pass to. If the value is stdout, the output is written to standard output
  • -debug: (Optional) Print debug information; specifically, parse and print the metadata after each addition

Following is an example command:

$ opt -load lib/libAddMetadata.so -legacy-add-metadata -config input.config -output stdout bitcode.bc -o bitcode_with_metadata.bc

The command above reads the configuration from input.config, writes the output of the pass to standard output, and writes the bitcode with the added metadata to bitcode_with_metadata.bc.


File Formats

Following is a sample input configuration file specified using -config:

# Each line contains 3 comma-separated values
# 1. The first value is the function name. The metadata will be added for all call-sites of this function
# 2. The second value is the argument index of the function call to which the metadata will be added
# 3. The third value is a descriptor of the metadata to identify the semantics of the metadata

# Comments start with '#' and must be at the beginning of the line

# The following line tells the pass to add metadata with descriptor 'input' to all call-sites of the function 'read' for its second parameter
read, 2, input
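A line in this format can be parsed as sketched below. This is not the parser used by the pass; ConfigEntry, trim, and parseConfigLine are hypothetical names. The sketch assumes that blank lines are skipped, that whitespace around each value is insignificant, and that the argument index is a non-negative decimal number.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one configuration line.
struct ConfigEntry {
    std::string function;    // name of the called function, e.g. "read"
    unsigned argumentIndex;  // index of the annotated argument, e.g. 2
    std::string descriptor;  // metadata descriptor, e.g. "input"
};

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string &text) {
    const std::string::size_type begin = text.find_first_not_of(" \t\r");
    if (begin == std::string::npos) return "";
    const std::string::size_type end = text.find_last_not_of(" \t\r");
    assert(end >= begin);  // a non-blank string ends at or after its first non-blank character
    return text.substr(begin, end - begin + 1);
}

// Parse one line into 'entry'. Returns false for comments (a '#' in the
// first column), blank lines, and lines that do not hold three valid values.
bool parseConfigLine(const std::string &line, ConfigEntry &entry) {
    if (line.empty() || line[0] == '#') return false;

    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) fields.push_back(trim(field));

    if (fields.size() != 3) return false;
    if (fields[0].empty() || fields[1].empty() || fields[2].empty()) return false;
    // Accept only a short run of decimal digits so std::stoul cannot throw.
    if (fields[1].size() > 9) return false;
    if (fields[1].find_first_not_of("0123456789") != std::string::npos) return false;

    entry.function = fields[0];
    entry.argumentIndex = static_cast<unsigned>(std::stoul(fields[1]));
    entry.descriptor = fields[2];
    return true;
}
```

Calling parseConfigLine on the line read, 2, input fills the entry with the function read, the argument index 2, and the descriptor input; the comment lines in the sample are rejected.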

Following is a sample output of the pass:

10, read, 2, input

The output above indicates that the descriptor input was added to the second parameter of the function read at the call site identified by the value 10.
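When the output is consumed by another tool, it can be convenient to group the reported additions by call site. The sketch below assumes that the output contains one record per line with four comma-separated values, as in the sample; this is inferred from the single sample line only. Addition, splitFields, and groupByCallSite are hypothetical names and are not part of the pass.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one reported addition at a call site.
struct Addition {
    std::string function;       // e.g. "read"
    std::string argumentIndex;  // kept as text, exactly as reported
    std::string descriptor;     // e.g. "input"
};

// Split a line on commas and strip surrounding whitespace from each value.
static std::vector<std::string> splitFields(const std::string &line) {
    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        const std::string::size_type begin = field.find_first_not_of(" \t\r");
        if (begin == std::string::npos) {
            fields.push_back("");
            continue;
        }
        const std::string::size_type end = field.find_last_not_of(" \t\r");
        assert(end >= begin);  // holds for any value with a non-blank character
        fields.push_back(field.substr(begin, end - begin + 1));
    }
    return fields;
}

// Group additions by call-site identifier (the first value on each line).
// Lines that do not contain exactly four values are skipped.
std::map<std::string, std::vector<Addition>> groupByCallSite(std::istream &report) {
    std::map<std::string, std::vector<Addition>> sites;
    std::string line;
    while (std::getline(report, line)) {
        const std::vector<std::string> fields = splitFields(line);
        if (fields.size() != 4) continue;
        sites[fields[0]].push_back(Addition{fields[1], fields[2], fields[3]});
    }
    return sites;
}
```

Passing a stream that holds the sample line yields a map with the single key 10, whose list contains one addition for the function read.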


Extracting Metadata from LLVM Bitcode in an LLVM Optimizer Pass

The following code snippet shows how to extract the added metadata using a callback mechanism:

#include "llvm/ADT/APInt.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// The callback function, which is called once for each metadata description added for each parameter
static void extraction_metadata_callback(Instruction *instruction, StringRef *functionName, APInt *callSiteNumber, APInt *parameterIndex, StringRef *description){
  // Do your work here
}

// The main function of an LLVM optimizer pass (MyOPTPass stands for your own pass class)
bool MyOPTPass::runOnModule(Module &module){
  // The callback must have exactly this signature
  void (*metadata_callback_func)(Instruction *instruction, StringRef *functionName, APInt *callSiteNumber, APInt *parameterIndex, StringRef *description);
  // Assign your callback function
  metadata_callback_func = &extraction_metadata_callback;
  // The 'extractAllMetadata' function checks each instruction for existing metadata. If found, it calls the callback function.
  extractAllMetadata(module, metadata_callback_func);
  // The module is only inspected here, not modified
  return false;
}

The implementation of extractAllMetadata can be found in AddMetadata.cpp.

