[ERROR MPL-0040] Failed on cluster #5655

Open
oharboe opened this issue Aug 26, 2024 · 18 comments
Labels
mpl Macro Placement

Comments

@oharboe
Collaborator

oharboe commented Aug 26, 2024

13 minutes to reproduce:

untar https://drive.google.com/file/d/18n0z4_Bk9Gscy3RRCiU6FiNvghIb6zIG/view?usp=drive_link

$ time ./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-15340-g7ebef4425
Features included (+) or not (-): +Charts +GPU +GUI +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
HierRTLMP Flow enabled...
rtl_macro_placer -halo_width 20 -halo_height 20 -report_directory .//objects/asap7/BoomTile/base/rtlmp -target_util 0.60
Floorplan Outline: (0.0, 0.0) (2160.73, 2160.73),  Core Outline: (1.026, 1.08) (2159.73, 2159.73)
        Number of std cell instances: 1743858
        Area of std cell instances: 220985.73
        Number of macros: 72
        Area of macros: 691249.12
        Halo width: 20.00
        Halo height: 20.00
        Area of macros with halos: 1292620.62
        Area of std cell instances + Area of macros: 912234.88
        Core area: 4659886.50
        Design Utilization: 0.20
        Core Utilization: 0.06
        Manufacturing Grid: 1

[ERROR MPL-0040] Failed on cluster frontend/bpd/banked_predictors_1/btb
Error: macro_place_util.tcl, 143 MPL-0040

Originally posted by @oharboe in The-OpenROAD-Project/megaboom#97 (comment)
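
The utilization figures in the report above can be cross-checked from the reported areas. The following is a small sanity check in Python, assuming that design utilization is (std cell area + macro area) / core area and that core utilization is std cell area / (core area - macro area). Both formulas are inferred from the printed numbers, not taken from the OpenROAD source.

```python
# Values copied from the rtl_macro_placer report.
std_cell_area = 220985.73
macro_area = 691249.12
macro_area_with_halos = 1292620.62
core_area = 4659886.50

# Assumed formulas (inferred from the printed values, not from the source code).
design_util = (std_cell_area + macro_area) / core_area
core_util = std_cell_area / (core_area - macro_area)

# How much the halos (width and height of 20.00) inflate the macro footprint.
halo_inflation = macro_area_with_halos / macro_area

print(f"Design Utilization: {design_util:.2f}")
print(f"Core Utilization:   {core_util:.2f}")
print(f"Halo inflation:     {halo_inflation:.2f}x")
```

Under these assumptions the two utilization values match the report, and the halos nearly double the area claimed by the macros.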

@oharboe changed the title from "One hour is a painfully long turnaround time for debugging... I will try to start a run to whittle down the problem:" to "[ERROR MPL-0040] Failed on cluster" on Aug 26, 2024
@maliberty
Member

Should this be an OR issue or is it related to the setup of megaboom?

@oharboe
Collaborator Author

oharboe commented Aug 26, 2024

> Should this be an OR issue or is it related to the setup of megaboom?

Unknown. I have the reproduction case this morning, but I don't know anything about what is going on.

Please advise.

@maliberty
Member

If you want OR developers to look at it, then it is best to file it with OR. We don't track megaboom issues.

@maliberty transferred this issue from The-OpenROAD-Project/megaboom on Aug 26, 2024
@maliberty added the mpl Macro Placement label on Aug 26, 2024
@The-OpenROAD-Project deleted a comment from dseynhae on Aug 26, 2024
@maliberty
Member

@AcKoucher please give this high priority (a workaround or a solution)

@oharboe
Collaborator Author

oharboe commented Aug 26, 2024

@AcKoucher @maliberty Found a workaround, tweak initial conditions

@maliberty
Member

> @AcKoucher @maliberty Found a workaround, tweak initial conditions

From the megaboom PR:

> I would like to see an initial diagnosis from @AcKoucher first... but yes. I hope the problem is just some existing rare problem that presents itself with some unfortunate initial conditions and that it can be solved in due course but without urgency.

@AcKoucher
Contributor

AcKoucher commented Aug 26, 2024

It looks like there's a combination of things that make this somewhat peculiar.

  1. After clustering, we end up with multiple mixed clusters, each made of a few std cells and a macro (3, 7, 32, 35, 16, 17, 29).
  2. The dead space filling that we apply to mixed clusters during hierarchical macro placement annealing has a meaningful effect in just three clusters (4 --> 7, 8 --> 11, 29 --> 32).
  3. There are way too many tiny std cell clusters.

Now, the actual problem seems to be that when we get to the point of placing the children of cluster 4 in the first image (cluster 7 in the second image, after dead space filling), SA can't fit the clusters in the outline, even with the target util variation. Apparently this happens because the outline penalty never wins the fight against the boundary penalty.

However, there's something going on with the wirelength, because I see zero for all the steps in the debug report (perhaps it's just too small; I have to check).

------ Penalty ------
Area                       1.0186
Outline Penalty            0.4646
Wirelength                 0.0000
Boundary Penalty         102.0848
Normalized Cost           55.1202

My first suggestion would be to try decreasing the halos, as @oharboe already did, or to decrease the boundary penalty.
@maliberty It looks like there's a lot going on. Do you have an idea of what to aim for first?
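
To illustrate why the outline penalty "never wins", here is a minimal Python sketch that computes each term's share of a plain weighted sum of the reported penalties. The weighted-sum form, the equal weights, and the 0.01 damping factor are assumptions for illustration only; this is not the cost function or the weights that the macro placer actually uses, and it does not reproduce the reported Normalized Cost.

```python
# Raw penalty terms copied from the debug report above.
penalties = {
    "area": 1.0186,
    "outline": 0.4646,
    "wirelength": 0.0000,
    "boundary": 102.0848,
}


def weighted_shares(terms, weights):
    """Return each term's fraction of a plain weighted sum (weight defaults to 1)."""
    weighted = {name: value * weights.get(name, 1.0) for name, value in terms.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


# Hypothetical scenarios: equal weights, then a strongly damped boundary term.
equal = weighted_shares(penalties, {})
damped = weighted_shares(penalties, {"boundary": 0.01})

for label, shares in (("equal weights", equal), ("boundary x0.01", damped)):
    print(label)
    for name, share in shares.items():
        print(f"  {name:<11}{share:6.1%}")
```

With equal weights, the boundary term makes up nearly the entire sum, so a move that improves the outline fit barely changes the total. Damping the boundary term in this toy model gives the outline term a visible share, which is the intuition behind decreasing the boundary penalty.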

@oharboe
Collaborator Author

oharboe commented Aug 26, 2024

Thanks! Sounds like this is in good hands and well understood. No longer urgent for my part as we have a workaround.

@AcKoucher
Contributor

@oharboe Ok :-) I'm investigating what is going on with the clustering so we can have a proper fix.

@oharboe
Collaborator Author

oharboe commented Aug 27, 2024

Another workaround I'm trying out is to save a macro placement. With a saved macro placement, I should avoid rtlmp errors due to slight changes in initial conditions, like changed PLACE_DENSITY.

write_macro_placement macros.tcl

@oharboe
Collaborator Author

oharboe commented Aug 27, 2024

@AcKoucher Please confirm that the fixes work on the full 1-hour testcase.

I included a faster, 13-minute testcase here, which I produced from the full testcase with deltaDebug.py.

There is a risk that deltaDebug.py latched onto a different bug than the original one...
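
The risk mentioned above can be shown with a toy reduction loop. The Python sketch below is a simplified greedy chunk-removal reducer, not the algorithm in deltaDebug.py; the `run_tool` function, the instance names, and the two failure conditions are all hypothetical. It compares an oracle that only matches the error code with one that matches the full original message.

```python
def reduce_case(items, still_fails):
    """Greedily drop chunks of items while the failure check keeps passing."""
    chunk = max(1, len(items) // 2)
    while True:
        i, removed_any = 0, False
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if candidate and still_fails(candidate):
                items, removed_any = candidate, True
            else:
                i += chunk
        if chunk > 1:
            chunk //= 2
        elif not removed_any:
            return items


def run_tool(design):
    """Hypothetical tool with two unrelated bugs that share one error code."""
    if "btb_macro" in design and "wide_halo" in design:
        return "[ERROR MPL-0040] Failed on cluster btb"
    if "ras_macro" in design:
        return "[ERROR MPL-0040] Failed on cluster ras"
    return None


design = ["alu", "btb_macro", "lsu", "wide_halo", "ras_macro", "fpu"]
original_error = run_tool(design)

# Loose oracle: any failure with the same error code counts.
loose = reduce_case(design, lambda d: "MPL-0040" in (run_tool(d) or ""))
# Strict oracle: the full error message must match the original failure.
strict = reduce_case(design, lambda d: run_tool(d) == original_error)

print("loose :", loose)
print("strict:", strict)
```

In this toy setup the loose oracle reduces the case to a different failure that happens to share the error code, while the strict oracle keeps the original one. Confirming a fix on the full testcase guards against that kind of drift.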

@oharboe
Collaborator Author

oharboe commented Aug 28, 2024

> @AcKoucher Please confirm that the fixes work on the full 1-hour testcase.
>
> I included a faster, 13-minute testcase here, which I produced from the full testcase with deltaDebug.py.
>
> There is a risk that deltaDebug.py latched onto a different bug than the original one...

Ah, the full testcase still fails...

@maliberty
Member

Can you re-delta?

@AcKoucher
Contributor

@oharboe As I said in #5666, there are other problems that need to be addressed in order to actually resolve the issue. I'm investigating.

@oharboe
Collaborator Author

oharboe commented Aug 28, 2024

@maliberty New deltaDebug result: this test case takes ca. 13 minutes and fails on master:

https://drive.google.com/file/d/1klYn7s2_uJBk2Wi-vfPKK_ol8Kwv02sY/view?usp=sharing
