
Replace Z3 phantom reference map by doubly-linked list. #355

Merged
merged 15 commits into from
Feb 13, 2024

Conversation

ThomasHaas
Contributor

@ThomasHaas ThomasHaas commented Feb 4, 2024

I played around with the user propagator feature I implemented in #349 and ran into a performance problem related to phantom references. The issue is that the user propagator creates a BooleanFormula wrapper whenever the SMT solver reports a variable assignment, causing a phantom reference to get created.
This quickly generates hundreds of thousands (or even millions) of phantom references, and the HashMap that is used to manage them degrades heavily: 80%(!) of all solving time was spent just accessing the map (4 out of 5 minutes of total solving time!).

In this PR I propose a simple (general-purpose) solution by replacing the map with a doubly-linked list.
Doubly-linked lists provide the necessary O(1) deletion and insertion time while scaling to arbitrary sizes.
Furthermore, the memory footprint of the list should be strictly smaller than that of the previously used IdentityHashMap.

On my test benchmark, the solving time was reduced from 5 minutes down to 70 seconds, and the phantom reference overhead was just 10% rather than 80% of the total time. Interestingly, my profiler reported the 10% to be mostly from ReferenceQueue.poll rather than the actual cleanup code (the profiler might be inaccurate though).
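The core idea can be sketched as phantom references that are their own doubly-linked list nodes (an intrusive list). This is a simplified standalone illustration, not the PR's actual code (the real class is Z3AstReference and differs in detail):

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

// Sketch: each phantom reference carries its own prev/next pointers, so
// insertion and removal are O(1) and the structure shrinks as soon as
// entries are unlinked. Names here are illustrative.
final class AstReference extends PhantomReference<Object> {
  final long ast; // native AST pointer to decRef once the wrapper is collected
  AstReference prev;
  AstReference next;

  AstReference(Object referent, ReferenceQueue<Object> queue, long ast) {
    super(referent, queue);
    this.ast = ast;
  }

  // Link this node directly after a sentinel head: O(1), no hashing, no resize.
  void insertAfter(AstReference head) {
    prev = head;
    next = head.next;
    if (head.next != null) {
      head.next.prev = this;
    }
    head.next = this;
  }

  // Remove this node: O(1), and its memory is reclaimable immediately.
  void unlink() {
    if (prev != null) {
      prev.next = next;
    }
    if (next != null) {
      next.prev = prev;
    }
    prev = null;
    next = null;
  }
}
```

With a sentinel head node, both operations touch only a constant number of pointers, regardless of how many millions of references are alive.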

@ThomasHaas ThomasHaas changed the title Replaced Z3 phantom reference map by doubly-linked list. Replace Z3 phantom reference map by doubly-linked list. Feb 4, 2024
@PhilippWendler
Member

Furthermore, the memory footprint of the list should be strictly smaller than that of the previously used IdentityHashMap.

Really?

The doubly-linked list has 2n references + n longs, so 192 bits per entry. We do not need to count the object header of Z3AstReference objects because we have the PhantomReference objects anyway.

The hash map consists of one array with a size of 2n to 4n references (references for key and value, with overhead for empty cells), so 128 to 256 bits per entry. The hash map also references Long objects, which are free if they happen to exist anyway because they are referenced somewhere else as well, or cost another 192 bits per entry. But I don't know which is the case here.

So if I did not overlook something, the list is not strictly smaller than the map, if the latter happens to be >= 75% full and the Longs are referenced anyway.

But I see a major additional argument: The memory consumption of the list shrinks again once fewer elements exist. IdentityHashMap does not implement any shrinking of the table, so if the program creates a lot of Z3Formula objects and frees them afterwards, the memory consumed by the table is kept occupied until the whole solver context is freed.

Of course, having add/remove be guaranteed O(1) instead of just amortized is also nice.
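The back-of-envelope numbers above can be restated explicitly. This assumes 64-bit reference and long slots (no compressed oops), as in the comment; these are estimates, not measurements:

```java
// Per-entry memory estimates from the discussion above, in bits.
// All numbers are rough estimates under the stated assumptions.
final class FootprintEstimate {
  static final int SLOT = 64; // bits per reference or per long, 64-bit JVM assumed

  // Doubly-linked list: prev + next reference plus one long per entry.
  static final int LIST_BITS_PER_ENTRY = 2 * SLOT + SLOT; // 192

  // IdentityHashMap: key + value slot per entry, table between 50% and
  // 100% full, i.e. 2n to 4n references for n entries.
  static final int MAP_BITS_PER_ENTRY_MIN = 2 * SLOT; // 128
  static final int MAP_BITS_PER_ENTRY_MAX = 4 * SLOT; // 256

  // A boxed Long that is not shared elsewhere: header plus value, roughly.
  static final int BOXED_LONG_BITS = 192;

  public static void main(String[] args) {
    System.out.println(LIST_BITS_PER_ENTRY);                      // 192
    System.out.println(MAP_BITS_PER_ENTRY_MIN);                   // 128
    System.out.println(MAP_BITS_PER_ENTRY_MAX + BOXED_LONG_BITS); // 448
  }
}
```

So depending on table occupancy and Long sharing, the map costs anywhere from 128 to 448 bits per entry, which is why the comparison with the list's fixed 192 bits is not clear-cut.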

src/org/sosy_lab/java_smt/solvers/z3/Z3FormulaCreator.java: three review threads (outdated, resolved)
Comment on lines 931 to 989

// Force clean all ASTs, even those which were not GC'd yet.
// Is a no-op if phantom reference handling is not enabled.
for (long ast : referenceMap.values()) {
Native.decRef(getEnv(), ast);
}
Member

What is the reason for deleting this?

Contributor Author

This was by accident because I misinterpreted the purpose. I added it back.

However, I noticed that if the PhantomReferences are already enqueued, then they will never get removed from the queue after closing (until the context object itself gets collected).
Unfortunately, ReferenceQueue.clear does not exist and one has to poll all entries to clear them.
Should I add this or does this not really matter?
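Since `ReferenceQueue` indeed has no `clear()`, such a cleanup on close would have to poll in a loop. A minimal sketch of what that could look like (illustrative, not the PR's code):

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// ReferenceQueue offers no clear(); the only way to empty it is to poll
// until null is returned. poll() is non-blocking, so this terminates as
// soon as the queue is drained.
final class QueueDraining {
  static void drain(ReferenceQueue<?> queue) {
    Reference<?> ref;
    while ((ref = queue.poll()) != null) {
      // Deliberately do nothing with the reference: after close() the
      // native objects were already freed in bulk, so no per-reference
      // decRef may happen here.
    }
  }
}
```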

Member

You mean those references that have been enqueued before close() is called? These could in principle be cleaned up, but note that there will likely also be references that are enqueued after close() is called, and there is nothing we can do about them, so I do not think we need to bother with the already enqueued ones.

Once the solver context becomes unreachable and is cleaned up by GC, the queue and all PhantomReferences are also cleaned up. So this affects only users who close the context but then keep a reference to it for a longer time.

Contributor Author

See my answer to the bulk deletion point you raised further below.

@ThomasHaas
Contributor Author

Furthermore, the memory footprint of the list should be strictly smaller than that of the previously used IdentityHashMap.

Really?

I was expecting this response :).

So if I did not overlook something, the list is not strictly smaller than the map, if the latter happens to be >= 75% full and the Longs are referenced anyway.

It is true that I accidentally included the boxing of the longs in my calculation, but those boxed values already exist in all cases where the function is used. Nevertheless, the load factor of IdentityHashMap is 66.67% (it resizes if 3*size > 2*capacity), guaranteeing that for each stored object there is at least one free cell, hence the map uses between 3n and 4n references for n entries.
The proposed list is always 3n in memory size and thus smaller. A maximally filled map might be identical in size to the list, but for all practical purposes I would still claim that the list is strictly smaller :).

But I see a major additional argument: The memory consumption of the list shrinks again once fewer elements exist. IdentityHashMap does not implement any shrinking of the table, so if the program creates a lot of Z3Formula objects and frees them afterwards, the memory consumed by the table is kept occupied until the whole solver context is freed.

Good point. This also means the upper bound of 4n references does not even hold in general. The map's memory consumption is unbounded.

Of course, having add/remove be guaranteed O(1) instead of just amortized is also nice.

Yes, it performs A LOT better in extreme cases where the hashmap's operations start to approach linear time.

ThomasHaas and others added 2 commits February 6, 2024 10:43
Co-authored-by: Philipp Wendler <2545335+PhilippWendler@users.noreply.github.com>
@ThomasHaas
Contributor Author

ThomasHaas commented Feb 7, 2024

I noticed that the Z3FormulaCreator is full of calls to clearReferences, pretty much everywhere where objects get created.
How about moving the call to clearReferences to inside storePhantomReference so that all code that generates phantom references also clears them? This would make all existing calls to clearReferences obsolete.
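The suggestion could look roughly like this. Names and signatures are hypothetical, not the real Z3FormulaCreator API; the point is only that every creation of a phantom reference also drains the queue, making the scattered clearReferences() calls redundant:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: storePhantomReference() itself performs the cleanup,
// so every code path that creates phantom references also clears them.
final class PhantomStore {
  static final class AstRef extends PhantomReference<Object> {
    final long ast;
    AstRef(Object referent, ReferenceQueue<Object> q, long ast) {
      super(referent, q);
      this.ast = ast;
    }
  }

  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
  final List<Long> cleaned = new ArrayList<>(); // stands in for Native.decRef calls

  AstRef storePhantomReference(Object wrapper, long ast) {
    AstRef ref = new AstRef(wrapper, queue, ast); // real code also links it into the list
    clearReferences(); // piggy-back the cleanup on every creation
    return ref;        // returned only to keep this sketch testable
  }

  private void clearReferences() {
    Reference<?> ref;
    while ((ref = queue.poll()) != null) {
      cleaned.add(((AstRef) ref).ast); // real code: Native.decRef(env, ast) and unlink
    }
  }
}
```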

… comparison.

Z3 uses internal formula hashing,
thus there should not be multiple pointers to the same boolean constants.

This reduces the number of JNI calls for some simple cases.
Build-dependencies produce the largest artifact, around 160 MB for JavaSMT.
We do not need those artifacts to stay around for longer.
The latest artifact of a repository is kept for longer than this expiration time,
so this change should not affect the reuse of any build-cache.
Member

@kfriedberger kfriedberger left a comment

I could not detect a breaking example or unit test. The code itself looks valid.
However, I am not sure whether we need/want to merge this PR.

Executing a command like
java -cp bin:lib/java/runtime-z3/common.jar:lib/java/runtime-z3/com.microsoft.z3.jar org.sosy_lab.java_smt.example.nqueens_user_propagator.NQueens 11 2
or
java -cp bin:lib/java/runtime-z3/common.jar:lib/java/runtime-z3/com.microsoft.z3.jar org.sosy_lab.java_smt.example.nqueens_user_propagator.NQueens 11 3
showed no speedup; the example runs for about 6.0-6.5s with and without this change.
Perhaps the speedup was related to other optimizations implemented for Z3 or user propagation.

@ThomasHaas
Contributor Author

ThomasHaas commented Feb 13, 2024

I could not detect a breaking example or unit test. The code itself looks valid. However, I am not sure whether we need/want to merge this PR.

Executing a command like java -cp bin:lib/java/runtime-z3/common.jar:lib/java/runtime-z3/com.microsoft.z3.jar org.sosy_lab.java_smt.example.nqueens_user_propagator.NQueens 11 2 or java -cp bin:lib/java/runtime-z3/common.jar:lib/java/runtime-z3/com.microsoft.z3.jar org.sosy_lab.java_smt.example.nqueens_user_propagator.NQueens 11 3 showed no speedup; the example runs for about 60-65s with and without this change. Perhaps the speedup was related to other optimizations implemented for Z3 or user propagation.

Have you enabled phantom references (they are disabled by default)?
Also, I made the user propagator robust(*) against this feature by reusing existing wrappers and creating only half as many wrappers as before.
Even if I disable the optimization, Nqueens (N=13) does not run into performance problems but it still causes the reference map to reach a size of 7+ million entries, meaning it will consume over 1GB of memory until the whole context is dropped (due to the lack of shrinking).

(*) The robustness is only for formulas reported by the user propagator itself. If the user decides to create new formulas from inside the callbacks, they can still end up with millions of phantom references and a possibility of encountering bad map performance.

EDIT: Btw. on what hardware are you running the Nqueens example? My machine runs 11-Queens in 2 seconds rather than 60. I'm surprised by such a big discrepancy.

@kfriedberger
Member

kfriedberger commented Feb 13, 2024

First: I forgot to enable phantom references. Thanks for the hint.

Second: I copied the time numbers and forgot the decimal separator (too late in the night :-) ). My older private laptop requires about 6.0-6.5s for the example. I measured again on a bigger machine just now and get about 2s when executing NQueens of size 11 with method 2 or 3.

For n=12 and method 2, the big machine requires about 20s with map-stored phantom references, and also 20s with the list-based references.
For n=13 and method 2, it's about 230s with map-based references and 228s with list-based references.

Measuring further examples might not bring any benefit on my side. From a theoretical point of view, the space complexity of list and map is quite similar, and so is the general time complexity. The worst-case time complexity is better for the list-based approach. -> I will merge this PR.

@kfriedberger kfriedberger merged commit a24e437 into sosy-lab:master Feb 13, 2024
3 checks passed