Fuzzing a Compiler

Rohan Padhye edited this page Dec 30, 2021 · 23 revisions

Finding Bugs in Google Closure with Zest

This tutorial walks through generator-based fuzzing of a non-trivial application: the Google Closure Compiler. The Closure Compiler optimizes JavaScript source code; we will therefore write a generator that produces strings containing syntactically valid JavaScript.

In this tutorial, we will use the JQF Maven Plugin to invoke the Zest fuzzing engine via Apache Maven. We will also use Maven to resolve dependencies on JQF and Google Closure. This means no installation and no messing with scripts/classpaths! If you prefer not to use Maven, however, check out the Zest 101 tutorial for examples of command-line usage.

Requirements and Background

You will need Java 8+ and Apache Maven 3.5+.

Although not strictly required, this tutorial assumes some familiarity with writing a Generator<T> and the information displayed on the status screen when running the Zest fuzzing engine. If you feel lost at any point, check out the Zest 101 tutorial.

Step 1: Create pom.xml

Save the following file as pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>examples</groupId>
    <artifactId>zest-tutorial</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!-- Google Closure: we want to test this library -->
        <dependency>
            <groupId>com.google.javascript</groupId>
            <artifactId>closure-compiler</artifactId>
            <version>v20180204</version>
            <scope>test</scope>
        </dependency>

        <!-- JQF: test dependency for @Fuzz annotation -->
        <dependency>
            <groupId>edu.berkeley.cs.jqf</groupId>
            <artifactId>jqf-fuzz</artifactId>
            <!-- confirm the latest version at: https://mvnrepository.com/artifact/edu.berkeley.cs.jqf -->
            <version>1.8</version> 
            <scope>test</scope>
        </dependency>

        <!-- JUnit-QuickCheck: API to write generators -->
        <dependency>
            <groupId>com.pholser</groupId>
            <artifactId>junit-quickcheck-generators</artifactId>
            <version>1.0</version>
            <scope>test</scope>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <!-- The JQF plugin, for invoking the command `mvn jqf:fuzz` -->
            <plugin>
                <groupId>edu.berkeley.cs.jqf</groupId>
                <artifactId>jqf-maven-plugin</artifactId>
                <!-- confirm the latest version at: https://mvnrepository.com/artifact/edu.berkeley.cs.jqf -->
                <version>1.8</version>
            </plugin>
        </plugins>
    </build>
</project>

Step 2: Write a test driver

Create the following file as src/test/java/examples/CompilerTest.java:

package examples;

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

import com.google.javascript.jscomp.CompilationLevel;
import com.google.javascript.jscomp.Compiler;
import com.google.javascript.jscomp.CompilerOptions;
import com.google.javascript.jscomp.Result;
import com.google.javascript.jscomp.SourceFile;
import com.pholser.junit.quickcheck.From;
import edu.berkeley.cs.jqf.fuzz.Fuzz;
import edu.berkeley.cs.jqf.fuzz.JQF;
import org.junit.Before;
import org.junit.runner.RunWith;

import static org.junit.Assume.*;

@RunWith(JQF.class)
public class CompilerTest {

    static {
        // Disable all logging by Closure passes, to speed up fuzzing
        java.util.logging.LogManager.getLogManager().reset();
    }

    // Compiler, options, and predefined JS environment
    private Compiler compiler = new Compiler(new PrintStream(new ByteArrayOutputStream(), false));
    private CompilerOptions options = new CompilerOptions();
    private SourceFile externs = SourceFile.fromCode("externs", "");

    @Before // Runs before tests are executed
    public void initCompiler() {
        // Don't use threads
        compiler.disableThreads();
        // Don't print things
        options.setPrintConfig(false);
        // Enable all safe optimizations
        CompilationLevel.SIMPLE_OPTIMIZATIONS.setOptionsForCompilationLevel(options);
    }

    /** Compiles an input and returns its result */
    private Result compile(SourceFile input) {
        Result result = compiler.compile(externs, input, options);
        assumeTrue(result.success); // Semantic validity check
        return result;
    }

    /** Entry point for fuzzing with default (arbitrary) string generator */
    @Fuzz
    public void testWithString(String code) {
        SourceFile input = SourceFile.fromCode("input", code);
        compile(input); // No assertions; we are looking for unexpected exceptions
    }
}

The method annotated with @Fuzz is the entry point for the fuzzing engine. We have not specified an explicit generator for the input parameter of testWithString(), so junit-quickcheck's default string generator supplies arbitrary strings. The remaining methods are primarily set-up code that initializes the Google Closure Compiler and sets some of its configuration options.
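The driver relies on a convention worth spelling out: an input that fails the assumeTrue() check is treated as invalid, whereas any other exception counts as a failure. The sketch below imitates that bucketing in plain Java, with no JQF or JUnit dependency. The names Outcome, AssumptionFailed, assume, classify, and toyTest are hypothetical and exist only for illustration; in the real driver, JUnit's assumption mechanism plays the role of AssumptionFailed.

```java
import java.util.function.Consumer;

/** Illustrative stand-in for how a fuzzing run might bucket each input. */
class OutcomeSketch {
    enum Outcome { VALID, INVALID, FAILURE }

    /** Hypothetical stand-in for JUnit's assumption-violation exception. */
    static class AssumptionFailed extends RuntimeException {}

    /** Mirrors assumeTrue(): signals "input rejected", not "bug found". */
    static void assume(boolean condition) {
        if (!condition) {
            throw new AssumptionFailed();
        }
    }

    /** Runs the test body on one input and classifies what happened. */
    static Outcome classify(Consumer<String> test, String input) {
        try {
            test.accept(input);
            return Outcome.VALID;      // ran to completion
        } catch (AssumptionFailed e) {
            return Outcome.INVALID;    // e.g. the compiler reported errors
        } catch (RuntimeException e) {
            return Outcome.FAILURE;    // unexpected exception: a potential bug
        }
    }

    /** Toy "compiler": rejects empty input, crashes on input containing "bug". */
    static void toyTest(String code) {
        assume(!code.isEmpty());
        if (code.contains("bug")) {
            throw new IllegalStateException("internal error");
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(OutcomeSketch::toyTest, "x"));
        System.out.println(classify(OutcomeSketch::toyTest, ""));
        System.out.println(classify(OutcomeSketch::toyTest, "bug"));
    }
}
```

This is why compile() needs no assertions: inputs the compiler rejects are simply discarded as invalid, and only unexpected exceptions are reported.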

Compile the test driver:

mvn test-compile

Step 3: Fuzz using default generator -- moderate code coverage

We now fuzz for 5 minutes using the JQF Maven Plugin, as follows:

mvn jqf:fuzz -Dclass=examples.CompilerTest -Dmethod=testWithString -Dtime=5m

You will see a status screen as follows:

Zest: Validity Fuzzing with Parametric Generators
----------------------------------------------
Test name:            examples.CompilerTest#testWithString
Results directory:    /path/to/tutorial/target/fuzz-results/examples.CompilerTest/testWithString
Elapsed time:         5m 0s (max 5m 0s)
Number of executions: 34,312
Valid inputs:         12,414 (36.18%)
Cycles completed:     0
Unique failures:      0
Queue size:           221 (0 favored last cycle)
Current parent input: 120 (favored) {735/740 mutations}
Execution speed:      85/sec now | 114/sec overall
Total coverage:       4,615 (7.04% of map)
Valid coverage:       4,247 (6.48% of map)

You'll probably notice some modest code coverage (we saw ~4,600 branches in 5 minutes), and likely no failures. By default, the string inputs generated for the testWithString method do not have any knowledge of the syntax of JavaScript. Although Zest tries its best to bias input generation towards validity, most of these inputs will be trivially valid (e.g. 1+1 or x). Such inputs cover a small fraction of the code and do not uncover any interesting bugs.

Step 4: Write a custom JavaScript generator

Let's create a new class which extends Generator<String> but only produces syntactically valid JavaScript programs when its generate() method is called. This is a non-trivial generator that mimics the grammar of JavaScript, and recursively generates a program. We will save this file in src/test/java/examples/JavaScriptCodeGenerator.java. The generator code is self-documenting:

package examples;

import java.util.*;
import java.util.function.*;

import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;

/* Generates random strings that are syntactically valid JavaScript */
public class JavaScriptCodeGenerator extends Generator<String> {
    public JavaScriptCodeGenerator() {
        super(String.class); // Register type of generated object
    }

    private GenerationStatus status; // saved state object when generating
    private static final int MAX_IDENTIFIERS = 100;
    private static final int MAX_EXPRESSION_DEPTH = 10;
    private static final int MAX_STATEMENT_DEPTH = 6;
    private Set<String> identifiers; // Stores generated IDs, to promote re-use (reset on every generate() call)
    private int statementDepth; // Keeps track of how deep the AST is at any point
    private int expressionDepth; // Keeps track of how nested an expression is at any point

    private static final String[] UNARY_TOKENS = {
            "!", "++", "--", "~",
            "delete", "new", "typeof"
    };

    private static final String[] BINARY_TOKENS = {
            "!=", "!==", "%", "%=", "&", "&&", "&=", "*", "*=", "+", "+=", ",",
            "-", "-=", "/", "/=", "<", "<<", "<<=", "<=", "=", "==", "===",
            ">", ">=", ">>", ">>=", ">>>", ">>>=", "^", "^=", "|", "|=", "||",
            "in", "instanceof"
    };

    /** Main entry point. Called once per test case. Returns a random JS program. */
    @Override
    public String generate(SourceOfRandomness random, GenerationStatus status) {
        this.status = status; // we save this so that we can pass it on to other generators
        this.identifiers = new HashSet<>();
        this.statementDepth = 0;
        this.expressionDepth = 0;
        return generateStatement(random);
    }

    /** Utility method for generating a random list of items (e.g. statements, arguments, attributes) */
    private static List<String> generateItems(Function<SourceOfRandomness, String> genMethod, SourceOfRandomness random,
                                             int mean) {
        int len = random.nextInt(mean*2); // Generate random number in [0, mean*2) 
        List<String> items = new ArrayList<>(len);
        for (int i = 0; i < len; i++) {
            items.add(genMethod.apply(random));
        }
        return items;
    }

    /** Generates a random JavaScript statement */
    private String generateStatement(SourceOfRandomness random) {
        statementDepth++;
        String result;
        // If depth is too high, then generate only simple statements to prevent infinite recursion
        // If not, generate simple statements after the flip of a coin
        if (statementDepth >= MAX_STATEMENT_DEPTH || random.nextBoolean()) {
            // Choose a random private method from this class, and then call it with `random`
            result = random.choose(Arrays.<Function<SourceOfRandomness, String>>asList(
                    this::generateExpressionStatement,
                    this::generateBreakNode,
                    this::generateContinueNode,
                    this::generateReturnNode,
                    this::generateThrowNode,
                    this::generateVarNode,
                    this::generateEmptyNode
            )).apply(random);
        } else {
            // If depth is low and we won the flip, then generate compound statements
            // (that is, statements that contain other statements)
            result = random.choose(Arrays.<Function<SourceOfRandomness, String>>asList(
                    this::generateIfNode,
                    this::generateForNode,
                    this::generateWhileNode,
                    this::generateNamedFunctionNode,
                    this::generateSwitchNode,
                    this::generateTryNode,
                    this::generateBlock
            )).apply(random);
        }
        statementDepth--; // Reset statement depth when going up the recursive tree
        return result;
    }

    /** Generates a random JavaScript expression using recursive calls */
    private String generateExpression(SourceOfRandomness random) {
        expressionDepth++;
        String result;
        // Choose terminal if nesting depth is too high or based on a random flip of a coin
        if (expressionDepth >= MAX_EXPRESSION_DEPTH || random.nextBoolean()) {
            result = random.choose(Arrays.<Function<SourceOfRandomness, String>>asList(
                    this::generateLiteralNode,
                    this::generateIdentNode
            )).apply(random);
        } else {
            // Otherwise, choose a non-terminal generating function
            result = random.choose(Arrays.<Function<SourceOfRandomness, String>>asList(
                    this::generateBinaryNode,
                    this::generateUnaryNode,
                    this::generateTernaryNode,
                    this::generateCallNode,
                    this::generateFunctionNode,
                    this::generatePropertyNode,
                    this::generateIndexNode,
                    this::generateArrowFunctionNode
            )).apply(random);
        }
        expressionDepth--;
        return "(" + result + ")";
    }

    /** Generates a random binary expression (e.g. A op B) */
    private String generateBinaryNode(SourceOfRandomness random) {
        String token = random.choose(BINARY_TOKENS); // Choose a binary operator at random
        String lhs = generateExpression(random);
        String rhs = generateExpression(random);

        return lhs + " " + token + " " + rhs;
    }

    /** Generates a block of statements delimited by ';' and enclosed by '{' '}' */
    private String generateBlock(SourceOfRandomness random) {
        return "{ " + String.join(";", generateItems(this::generateStatement, random, 4)) + " }";
    }

    private String generateBreakNode(SourceOfRandomness random) {
        return "break";
    }

    private String generateCallNode(SourceOfRandomness random) {
        String func = generateExpression(random);
        String args = String.join(",", generateItems(this::generateExpression, random, 3));

        String call = func + "(" + args + ")";
        if (random.nextBoolean()) {
            return call;
        } else {
            return "new " + call;
        }
    }

    private String generateCaseNode(SourceOfRandomness random) {
        return "case " + generateExpression(random) + ": " +  generateBlock(random);
    }

    private String generateCatchNode(SourceOfRandomness random) {
        return "catch (" + generateIdentNode(random) + ") " +
                generateBlock(random);
    }

    private String generateContinueNode(SourceOfRandomness random) {
        return "continue";
    }

    private String generateEmptyNode(SourceOfRandomness random) {
        return "";
    }

    private String generateExpressionStatement(SourceOfRandomness random) {
        return generateExpression(random);
    }

    private String generateForNode(SourceOfRandomness random) {
        String s = "for(";
        if (random.nextBoolean()) {
            s += generateExpression(random);
        }
        s += ";";
        if (random.nextBoolean()) {
            s += generateExpression(random);
        }
        s += ";";
        if (random.nextBoolean()) {
            s += generateExpression(random);
        }
        s += ")";
        s += generateBlock(random);
        return s;
    }

    private String generateFunctionNode(SourceOfRandomness random) {
        return "function(" + String.join(", ", generateItems(this::generateIdentNode, random, 5)) + ")" + generateBlock(random);
    }

    private String generateNamedFunctionNode(SourceOfRandomness random) {
        return "function " + generateIdentNode(random) + "(" + String.join(", ", generateItems(this::generateIdentNode, random, 5)) + ")" + generateBlock(random);
    }

    private String generateArrowFunctionNode(SourceOfRandomness random) {
        String params = "(" + String.join(", ", generateItems(this::generateIdentNode, random, 3)) + ")";
        if (random.nextBoolean()) {
            return params + " => " + generateBlock(random);
        } else {
            return params + " => " + generateExpression(random);
        }

    }

    private String generateIdentNode(SourceOfRandomness random) {
        // Either generate a new identifier or use an existing one
        String identifier;
        if (identifiers.isEmpty() || (identifiers.size() < MAX_IDENTIFIERS && random.nextBoolean())) {
            identifier = random.nextChar('a', 'z') + "_" + identifiers.size();
            identifiers.add(identifier);
        } else {
            identifier = random.choose(identifiers);
        }

        return identifier;
    }

    private String generateIfNode(SourceOfRandomness random) {
        return "if (" +
                generateExpression(random) + ") " +
                generateBlock(random) +
                (random.nextBoolean() ? generateBlock(random) : "");
    }

    private String generateIndexNode(SourceOfRandomness random) {
        return generateExpression(random) + "[" + generateExpression(random) + "]";
    }

    private String generateObjectProperty(SourceOfRandomness random) {
        return generateIdentNode(random) + ": " + generateExpression(random);
    }

    private String generateLiteralNode(SourceOfRandomness random) {
        // If we are not too deeply nested, then it is okay to generate array/object literals
        if (expressionDepth < MAX_EXPRESSION_DEPTH && random.nextBoolean()) {
            if (random.nextBoolean()) {
                // Array literal
                return "[" + String.join(", ", generateItems(this::generateExpression, random, 3)) + "]";
            } else {
                // Object literal
                return "{" + String.join(", ", generateItems(this::generateObjectProperty, random, 3)) + "}";

            }
        } else {
            // Otherwise, generate primitive literals
            return random.choose(Arrays.<Supplier<String>>asList(
                    () -> String.valueOf(random.nextInt(-10, 1000)), // int literal
                    () -> String.valueOf(random.nextBoolean()),      // bool literal
                    () -> generateStringLiteral(random),
                    () -> "undefined",
                    () -> "null",
                    () -> "this"
            )).get();
        }
    }

    private String generateStringLiteral(SourceOfRandomness random) {
        // Generate an arbitrary string using the default string generator, and quote it
        return '"' + gen().type(String.class).generate(random, status) + '"';
    }

    private String generatePropertyNode(SourceOfRandomness random) {
        return generateExpression(random) + "." + generateIdentNode(random);
    }

    private String generateReturnNode(SourceOfRandomness random) {
        return random.nextBoolean() ? "return" : "return " + generateExpression(random);
    }

    private String generateSwitchNode(SourceOfRandomness random) {
        return "switch(" + generateExpression(random) + ") {"
                + String.join(" ", generateItems(this::generateCaseNode, random, 2)) + "}";
    }

    private String generateTernaryNode(SourceOfRandomness random) {
        return generateExpression(random) + " ? " + generateExpression(random) +
                " : " + generateExpression(random);
    }

    private String generateThrowNode(SourceOfRandomness random) {
        return "throw " + generateExpression(random);
    }

    private String generateTryNode(SourceOfRandomness random) {
        return "try " + generateBlock(random) + generateCatchNode(random);
    }

    private String generateUnaryNode(SourceOfRandomness random) {
        String token = random.choose(UNARY_TOKENS);
        return token + " " + generateExpression(random);
    }

    private String generateVarNode(SourceOfRandomness random) {
        return "var " + generateIdentNode(random);
    }

    private String generateWhileNode(SourceOfRandomness random) {
        return "while (" + generateExpression(random) + ")" + generateBlock(random);
    }
}

Each call to JavaScriptCodeGenerator.generate(random, status) returns a new random JavaScript program. An example of a program generated this way is:

((o_0) => (((o_0) *= (o_0)) < ((i_1) &= ((o_0)((((undefined)[(((i_1, o_0, a_2) => { if ((i_1)) { throw ((false).o_0) } })((y_3)))])((new (null)((true))))))))))

The generator has some interesting properties:

  • It makes heavy use of recursion. This is typical of the QuickCheck style, where a structured input is created by a recursive generation procedure. The choice of which methods to call recursively is often made at random. The generator also tracks the recursion depth so that the recursion stays bounded.
  • It mimics the syntax of JavaScript but it is not context-free. Notice that the method generateIdentNode, which generates identifiers, maintains a set data structure called identifiers. This set keeps track of newly created identifier names, so it can possibly (by the flip of a coin) re-use previously generated identifier names. This helps generate programs that refer to the same variable multiple times.
  • It re-uses other generators. The method generateStringLiteral requests the default junit-quickcheck generator for strings using the API gen().type(String.class), and then uses the default string generator to generate arbitrary strings inside the string literal.
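The first two properties can be shown in isolation with a much smaller generator. The following is a minimal sketch, not the tutorial's generator: it uses java.util.Random as a stand-in for SourceOfRandomness so that it runs without any dependencies, and the names TinyExprGenerator, MAX_DEPTH, and maxNesting are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy expression generator: depth-bounded recursion plus identifier re-use. */
class TinyExprGenerator {
    private static final int MAX_DEPTH = 4; // illustrative bound
    private final List<String> identifiers = new ArrayList<>();
    private int depth = 0;

    /** Generates one parenthesized expression from the given random source. */
    String generate(Random random) {
        identifiers.clear();
        depth = 0;
        return expression(random);
    }

    private String expression(Random random) {
        depth++;
        String result;
        // Terminal if too deep, or on a coin flip; otherwise recurse twice
        if (depth >= MAX_DEPTH || random.nextBoolean()) {
            result = identifier(random);
        } else {
            result = expression(random) + " + " + expression(random);
        }
        depth--;
        return "(" + result + ")";
    }

    private String identifier(Random random) {
        // Either mint a fresh name or re-use one generated earlier
        if (identifiers.isEmpty() || random.nextBoolean()) {
            String fresh = "v_" + identifiers.size();
            identifiers.add(fresh);
            return fresh;
        }
        return identifiers.get(random.nextInt(identifiers.size()));
    }

    /** Returns the deepest parenthesis nesting in s, or -1 if unbalanced. */
    static int maxNesting(String s) {
        int current = 0;
        int max = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') {
                current++;
                max = Math.max(max, current);
            } else if (c == ')') {
                current--;
                if (current < 0) {
                    return -1;
                }
            }
        }
        return current == 0 ? max : -1;
    }

    public static void main(String[] args) {
        System.out.println(new TinyExprGenerator().generate(new Random(42)));
    }
}
```

Because every random decision is drawn from the supplied source, the same seed always yields the same program, and the depth counter guarantees that nesting never exceeds MAX_DEPTH.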

We now modify our test driver to add a new fuzzing entry point. This new test method, which we will call testWithGenerator, is similar to testWithString, but it specifies that its input argument be generated by the JavaScriptCodeGenerator that we just wrote. Modify the file src/test/java/examples/CompilerTest.java as follows:

...
import com.pholser.junit.quickcheck.From;
...
   /* In class CompilerTest... */

    @Fuzz
    public void testWithGenerator(@From(JavaScriptCodeGenerator.class) String code) {
        SourceFile input = SourceFile.fromCode("input", code);
        compile(input);
    }
...

Compile the generator and updated test driver:

mvn test-compile

Step 5: Fuzz using JS generator -- More coverage + New bugs!

Launch the fuzzing engine again but this time specify the method testWithGenerator:

mvn jqf:fuzz -Dclass=examples.CompilerTest -Dmethod=testWithGenerator -Dtime=5m

You should now see considerably more code coverage and also some test failures. After 5 minutes of fuzzing, we saw roughly two-thirds more coverage than in Step 3 (7,686 vs. 4,615 branches), along with 20 unique failures:

Zest: Validity Fuzzing with Parametric Generators
-------------------------------------------------
Test name:            examples.CompilerTest#testWithGenerator
Results directory:    /path/to/tutorial/target/fuzz-results/examples.CompilerTest/testWithGenerator
Elapsed time:         5m 0s (max 5 min 0s)
Number of executions: 16,141
Valid inputs:         4,563 (28.27%)
Cycles completed:     0
Unique failures:      20
Queue size:           559 (0 favored last cycle)
Current parent input: 106 (favored) {121/360 mutations}
Execution speed:      12/sec now | 53/sec overall
Total coverage:       7,686 (11.73% of map)
Valid coverage:       7,570 (11.55% of map)
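The figures on the two status screens can be compared directly. The sketch below recomputes the relative coverage increase and the validity rates from the numbers printed in Step 3 and Step 5; the class and method names are hypothetical helpers, not part of JQF.

```java
/** Recomputes the headline ratios from the two status screens. */
class CoverageComparison {
    /** Percentage by which `after` exceeds `before`. */
    static double percentIncrease(long before, long after) {
        return 100.0 * (after - before) / before;
    }

    /** Share of `part` in `total`, as a percentage. */
    static double percentOf(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        // Total coverage: default string generator vs. JavaScript generator
        System.out.printf("Coverage increase:  %.1f%%%n", percentIncrease(4615, 7686));
        // Valid inputs divided by executions, for each run
        System.out.printf("Validity (default): %.2f%%%n", percentOf(12414, 34312));
        System.out.printf("Validity (JS gen):  %.2f%%%n", percentOf(4563, 16141));
    }
}
```

With these inputs the coverage increase works out to about 66.5%. Note that the JavaScript generator reaches this with fewer executions (16,141 vs. 34,312) and a lower validity rate (28.27% vs. 36.18%) than the default string generator.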

You can reproduce the failures using the mvn jqf:repro goal:

mvn jqf:repro -Dclass=examples.CompilerTest -Dmethod=testWithGenerator -Dinput=target/fuzz-results/examples.CompilerTest/testWithGenerator/failures/id_000000

You might see an error message like this:

Error(23): java.lang.RuntimeException: INTERNAL COMPILER ERROR.
Please report this problem.

Unexpected variable l_0
  Node(NAME l_0): stdin:1:8
while ((l_0)){ while ((l_0)){ if ((l_0)) { break;;var l_0;continue }{ break;var l_0 } } }
  Parent(FOR): stdin:1:0
while ((l_0)){ while ((l_0)){ if ((l_0)) { break;;var l_0;continue }{ break;var l_0 } } }

	at com.google.javascript.jscomp.VarCheck.visit(VarCheck.java:222)
	at com.google.javascript.jscomp.NodeTraversal.traverseBranch(NodeTraversal.java:772)
	at com.google.javascript.jscomp.NodeTraversal.traverseChildren(NodeTraversal.java:843)
	...

We reported this bug and several others to the project maintainers.
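A run can leave many saved failures (20 in the run above), and typing one repro command per file gets tedious. The following sketch prints a mvn jqf:repro command for each file in the failures directory. It assumes the directory layout shown in the repro command above and that failure files all share the id_ prefix; only id_000000 appears in this tutorial, so the naming of the remaining files is an assumption. ReproCommands and its methods are hypothetical helpers, not part of JQF.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Prints one `mvn jqf:repro` command per saved failure. */
class ReproCommands {
    /** Builds the failures path used in this tutorial for a test method. */
    static String failuresDir(String testClass, String method) {
        return "target/fuzz-results/" + testClass + "/" + method + "/failures";
    }

    /** Formats one repro command per failure file name. */
    static List<String> commands(String testClass, String method, List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            result.add("mvn jqf:repro -Dclass=" + testClass + " -Dmethod=" + method
                    + " -Dinput=" + failuresDir(testClass, method) + "/" + name);
        }
        return result;
    }

    /** Lists file names starting with "id_" (assumed naming), sorted. */
    static List<String> listFailures(Path dir) {
        if (!Files.isDirectory(dir)) {
            return new ArrayList<>(); // nothing saved yet
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(n -> n.startsWith("id_"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String cls = "examples.CompilerTest";
        String method = "testWithGenerator";
        List<String> names = listFailures(Paths.get(failuresDir(cls, method)));
        commands(cls, method, names).forEach(System.out::println);
    }
}
```

Run it from the directory containing pom.xml so that the relative target/ path resolves; if no failures directory exists, it prints nothing.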

Sometimes, the exception messages themselves are sufficient to find the bug. If not, you can always modify your test driver -- in this case testWithGenerator() -- to print out the generated input program before invoking the compiler. Lastly, if you'd like to perform step-through debugging in your IDE, you can temporarily modify the @Fuzz annotation on your test driver to provide a file name to repro:

    @Fuzz(repro="target/fuzz-results/examples.CompilerTest/testWithGenerator/failures/id_000000")
    public void testWithGenerator(@From(JavaScriptCodeGenerator.class) String code) {
        SourceFile input = SourceFile.fromCode("input", code);
        compile(input);
    }

Once you have specified the input to repro, you can simply run this method as a normal JUnit test in your IDE (e.g. press the run/debug button in IntelliJ).