Back InitMask by IntervalSet #94450 (Closed)

Conversation

Mark-Simulacrum (Member) commented Feb 28, 2022

On the example below (sourced from #93215), this cuts ~2 GB from total memory usage, with no significant change (or a slight improvement) in instruction counts; I'll run perf across the full suite of benchmarks to evaluate any further need for improvements. The example moves from using roughly 33 GB to 31 GB, which makes sense: the array is 16 × 10,000 × 100,000 bytes = 14.9 GiB, which corresponds to ~1.86 GiB for the initialization bitset at one bit per byte. (The full array is initialized in this case.)

The remaining memory usage is due to the two copies of the array stored in memory: currently all constants we pass into LLVM are duplicated because LLVM seems to require a trailing null byte (which we don't provide), requiring a reallocation by LLVM. See the LLVMConstStringInContext calls if you're interested in more detail. This is likely fixable but will require further work that's largely orthogonal to this PR, and the wins in this PR will remain useful regardless.

const ROWS: usize = 100000;
const COLS: usize = 10000;

static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];

fn main() {}
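For reference, the arithmetic behind the figures above can be sketched as follows (a hedged aside, not part of the PR; the helper names are mine, the constants mirror the example):

```rust
/// Total bytes of the static above: size_of::<u128>() (16) × COLS × ROWS.
fn array_bytes() -> usize {
    16 * 10_000 * 100_000
}

/// The dense InitMask needs one init bit per byte of the allocation.
fn bitset_bytes() -> usize {
    array_bytes() / 8
}

fn main() {
    let gib = |b: usize| b as f64 / (1u64 << 30) as f64;
    println!(
        "array: {:.2} GiB, init bitset: {:.2} GiB",
        gib(array_bytes()),
        gib(bitset_bytes())
    );
    // → array: 14.90 GiB, init bitset: 1.86 GiB
}
```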

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 28, 2022
Mark-Simulacrum (Member, Author) commented

@bors try @rust-timer queue

rust-timer (Collaborator) commented

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 28, 2022
bors (Contributor) commented Feb 28, 2022

⌛ Trying commit 8f368a15c9d3ce6379271799985472132b6e6cfd with merge 05028ceb468ace681da1478ce8b2d521c433e20e...


bors (Contributor) commented Feb 28, 2022

☀️ Try build successful - checks-actions
Build commit: 05028ceb468ace681da1478ce8b2d521c433e20e

rust-timer (Collaborator) commented

Queued 05028ceb468ace681da1478ce8b2d521c433e20e with parent edda7e9, future comparison URL.

This extends the functionality available on IntervalSet to provide more flexible
access needed for the InitMask abstraction (and, likely, more generally useful
as code wishes to migrate between the various Idx-based set abstractions
available in rustc_index).
This avoids trouble with I = usize on 64-bit systems, where previously we would
either assert or truncate when converting to u32.
tmiasko (Contributor) commented Feb 28, 2022

The worst-case scenarios that came up last time in this context have been arrays whose elements contain padding, for example:

pub static TWODARRAY: [[(u64, u8); COLS]; ROWS] = [[(0, 0); COLS]; ROWS];
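A quick check of the layout (a hedged aside, not part of the discussion above): (u64, u8) occupies 16 bytes, 7 of which are padding, so initialized and uninitialized bytes alternate through the entire allocation and long "all-init" runs never form:

```rust
fn main() {
    // 8 bytes of u64 + 1 byte of u8, padded up to the 8-byte alignment.
    let size = std::mem::size_of::<(u64, u8)>();
    let align = std::mem::align_of::<(u64, u8)>();
    println!("size = {size}, align = {align}"); // size = 16, align = 8
    // 9 data bytes and 7 padding bytes per element: the init mask flips
    // from init to uninit within every element, keeping runs short.
}
```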

Mark-Simulacrum (Member, Author) commented

Yeah, we'll do much worse on such arrays; it's a good point. I'm not sure what the right tradeoff is: in practice, for the example you suggest, we still end up with something like the IntervalSet (just without random-insert ability) in the compression routine.

The IntervalSet has up to a 64x increase in memory usage over the bitset approach in the worst case, I think: with every other byte alternating init/uninit, the bitset costs bytes/8 while the IntervalSet costs (bytes/2) × 16, which is 64x larger. I could switch the implementation over to a ChunkedBitSet, which would work better in the all-init and all-uninit cases (with 2048-bit granularity of choice) and so is potentially a more scalable solution; it could easily enough support all the APIs we want, though compression of some kind would likely still be desirable.

It probably makes more sense to prioritize reducing worst-case extra memory (e.g., the every-other-bit case, which a bitset of some kind handles well) rather than best-case memory, where ~all bytes share the same init state.

Some part of me also wants to consider implementing some kind of compression scheme atop IntervalSet, but it's probably both complicated and not worthwhile.
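The 64x worst-case figure above can be sketched numerically (a hedged illustration: it assumes, as the comment does, 16 bytes per stored interval; the allocation size is arbitrary):

```rust
fn main() {
    let bytes: u64 = 1 << 20; // a 1 MiB allocation, for illustration
    let bitset = bytes / 8; // dense bitset: one bit per byte
    let intervals = bytes / 2; // alternating pattern: a run every other byte
    let interval_set = intervals * 16; // assumed cost of one stored interval
    println!(
        "bitset: {bitset} B, interval set: {interval_set} B, ratio: {}",
        interval_set / bitset
    ); // ratio: 64
}
```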

@Mark-Simulacrum Mark-Simulacrum added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 28, 2022
rust-timer (Collaborator) commented

Finished benchmarking commit (05028ceb468ace681da1478ce8b2d521c433e20e): comparison url.

Summary: This benchmark run shows 4 relevant improvements 🎉 but 42 relevant regressions 😿 to instruction counts.

  • Arithmetic mean of relevant regressions: 0.8%
  • Arithmetic mean of relevant improvements: -22.7%
  • Arithmetic mean of all relevant changes: -1.3%
  • Largest improvement in instruction counts: -29.6% on incr-unchanged builds of ctfe-stress-4 check
  • Largest regression in instruction counts: 1.5% on full builds of keccak check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 28, 2022
@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 28, 2022
erikdesjardins (Contributor) commented

currently all constants we pass into LLVM are duplicated because LLVM seems to require a trailing null byte (which we don't provide), requiring a reallocation by LLVM.

This (unfortunately) doesn't seem to be the case: in practice we always pass DontNullTerminate = true to LLVMConstStringInContext.

All constants we pass into LLVM are duplicated because they're interned in a StringMap: https://github.com/llvm/llvm-project/blob/988dae653f4166020f28028cc149dd641c04dae7/llvm/lib/IR/Constants.cpp#L2982-L3002 (see also https://llvm.org/doxygen/classllvm_1_1Constant.html#details)
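A loose Rust analogy for the duplication described above (not LLVM's actual code; the `intern` helper and pool layout are illustrative): interning a constant by its bytes means the map must own its own copy of those bytes, so every distinct constant is stored twice overall:

```rust
use std::collections::HashMap;

/// Intern `bytes`, returning a small id. Repeated lookups of the same
/// bytes return the same id, but the map owns a copy of the key.
fn intern(pool: &mut HashMap<Vec<u8>, u32>, bytes: &[u8]) -> u32 {
    let next = pool.len() as u32;
    // `to_vec()` is the duplication: the pool keeps its own allocation.
    *pool.entry(bytes.to_vec()).or_insert(next)
}

fn main() {
    let mut pool = HashMap::new();
    let big = vec![0u8; 16]; // stand-in for a large constant's bytes
    let a = intern(&mut pool, &big);
    let b = intern(&mut pool, &big);
    assert_eq!(a, b); // same constant maps to one interned entry...
    assert_eq!(pool.len(), 1); // ...but that entry duplicates the data
}
```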

bors (Contributor) commented Mar 7, 2022

☔ The latest upstream changes (presumably #94597) made this pull request unmergeable. Please resolve the merge conflicts.

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 7, 2022
cleanup: remove unused ability to have LLVM null-terminate const strings

(and the copied function in rustc_codegen_gcc)

Noticed this while writing rust-lang#94450 (comment).

r? `@nagisa`
bjorn3 pushed a commit to bjorn3/rustc_codegen_gcc that referenced this pull request Mar 26, 2022
cleanup: remove unused ability to have LLVM null-terminate const strings

(and the copied function in rustc_codegen_gcc)

Noticed this while writing rust-lang/rust#94450 (comment).

r? `@nagisa`