Assertion failure in PixelCPEFast<TrackerTraits>::fillParamsForGpu #45332

Open
iarspider opened this issue Jun 27, 2024 · 10 comments

Comments

@iarspider
Contributor

iarspider commented Jun 27, 2024

In CMSSW_14_1_X_2024-06-26-2300, RelVal 29634.501 fails on all architectures with an assertion failure:

cmsRun: src/RecoLocalTracker/SiPixelRecHits/src/PixelCPEFast.cc:131: void PixelCPEFast<TrackerTraits>::fillParamsForGpu() [with TrackerTraits = pixelTopology::Phase2]: Assertion `commonParamsGPU_.thePitchY == p.thePitchY' failed.


A fatal system signal has occurred: abort signal
The following is the call stack containing the origin of the signal.

(...)
Thread 1 (Thread 0x40001d68b4c0 (LWP 733254) "cmsRun"):
#0  0x000040001dad0960 in poll () from /lib64/libc.so.6
#1  0x00004000229c05a0 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00004000229c07d4 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000040001da722e8 in __pthread_kill_implementation () from /lib64/libc.so.6
#5  0x000040001da2a73c in raise () from /lib64/libc.so.6
#6  0x000040001da17034 in abort () from /lib64/libc.so.6
#7  0x000040001da24090 in __assert_fail_base () from /lib64/libc.so.6
#8  0x000040001da24100 in __assert_fail () from /lib64/libc.so.6
#9  0x000040006038d0f8 in PixelCPEFast<pixelTopology::Phase2>::fillParamsForGpu() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/libRecoLocalTrackerSiPixelRecHits.so
#10 0x000040006038d328 in PixelCPEFast<pixelTopology::Phase2>::PixelCPEFast(edm::ParameterSet const&, MagneticField const*, TrackerGeometry const&, TrackerTopology const&, SiPixelLorentzAngle const*, SiPixelGenErrorDBObject const*, SiPixelLorentzAngle const*) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/libRecoLocalTrackerSiPixelRecHits.so
#11 0x00004000600300cc in PixelCPEFastESProducerT<pixelTopology::Phase2>::produce(TkPixelCPERecord const&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
#12 0x00004000600384a0 in void edm::SerialTaskQueueChain::actionToRun<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<PixelCPEFastESProducerT<pixelTopology::Phase2>, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >(PixelCPEFastESProducerT<pixelTopology::Phase2>*, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> > (PixelCPEFastESProducerT<pixelTopology::Phase2>::*)(TkPixelCPERecord const&), edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> const&, edm::es::Label const&)::{lambda(TkPixelCPERecord const&)#1}, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<PixelCPEFastESProducerT<pixelTopology::Phase2>, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >(PixelCPEFastESProducerT<pixelTopology::Phase2>*, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> > (PixelCPEFastESProducerT<pixelTopology::Phase2>::*)(TkPixelCPERecord const&), edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> const&, edm::es::Label const&)::{lambda(TkPixelCPERecord const&)#1}, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(TkPixelCPERecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}&>(tbb::detail::d1::task_group*&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
#13 0x00004000600387f4 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::eventsetup::CallbackBase<edm::ESProducer, edm::ESProducer::setWhatProduced<PixelCPEFastESProducerT<pixelTopology::Phase2>, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >(PixelCPEFastESProducerT<pixelTopology::Phase2>*, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> > (PixelCPEFastESProducerT<pixelTopology::Phase2>::*)(TkPixelCPERecord const&), edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> const&, edm::es::Label const&)::{lambda(TkPixelCPERecord const&)#1}, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >::makeProduceTask<edm::eventsetup::Callback<edm::ESProducer, edm::ESProducer::setWhatProduced<PixelCPEFastESProducerT<pixelTopology::Phase2>, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >(PixelCPEFastESProducerT<pixelTopology::Phase2>*, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> > (PixelCPEFastESProducerT<pixelTopology::Phase2>::*)(TkPixelCPERecord const&), edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> const&, edm::es::Label const&)::{lambda(TkPixelCPERecord const&)#1}, std::unique_ptr<PixelClusterParameterEstimator, std::default_delete<PixelClusterParameterEstimator> >, TkPixelCPERecord, edm::eventsetup::CallbackSimpleDecorator<TkPixelCPERecord> >::prefetchAsync(edm::WaitingTaskHolder, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, edm::ServiceToken const&, edm::ESParentContext const&)::{lambda(auto:1&&, auto:2&&, auto:3&&, auto:4&&)#1}::operator()<tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&>(tbb::detail::d1::task_group*&, edm::ServiceWeakToken&, edm::eventsetup::EventSetupRecordImpl const*&, edm::EventSetupImpl const*&) const::{lambda(TkPixelCPERecord const&)#1}>(tbb::detail::d1::task_group*, edm::ServiceWeakToken const&, edm::eventsetup::EventSetupRecordImpl const*, edm::EventSetupImpl const*, bool, tbb::detail::d1::task_group*&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}::operator()(std::__exception_ptr::exception_ptr const*) const::{lambda()#2}>(tbb::detail::d1::task_group&, tbb::detail::d1::task_group*&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
#14 0x000040001bf07818 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/libFWCoreConcurrency.so
#15 0x000040001d4d86c4 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x4000a5a62100, this=0x40001e49b300) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc12/external/tbb/v2021.9.0-d3fee8b576fbbe8272c9d17d70a75801/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x40001e49b300) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc12/external/tbb/v2021.9.0-d3fee8b576fbbe8272c9d17d70a75801/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#17 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc12/external/tbb/v2021.9.0-d3fee8b576fbbe8272c9d17d70a75801/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#18 0x000040001b936464 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/libFWCoreFramework.so
#19 0x000040001b945bfc in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/libFWCoreFramework.so
#20 0x000040001b93ba0c in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02843/el9_aarch64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-06-26-2300/lib/el9_aarch64_gcc12/libFWCoreFramework.so
#21 0x00000000004071e0 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#22 0x000040001d4d1048 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins_a/workspace/build-any-ib/w/BUILD/el9_aarch64_gcc12/external/tbb/v2021.9.0-d3fee8b576fbbe8272c9d17d70a75801/tbb-v2021.9.0/src/tbb/arena.cpp:688
#23 0x0000000000408ab0 in main::{lambda()#1}::operator()() const ()
#24 0x00000000004047c4 in main ()
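
The failing check compares each module's `p.thePitchY` with the single `thePitchY` held in `commonParamsGPU_`, so it aborts as soon as one module's pitch differs from the common value. As a minimal sketch of that invariant only (the `ModuleParams` struct and the `firstPitchMismatch` helper are hypothetical and are not CMSSW code; also checking `thePitchX` is an assumption, since the log only shows the `thePitchY` comparison), a check that reports the first offending module instead of aborting could look like this:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-module CPE parameters: only the two
// pitch fields are modelled. Checking thePitchX as well as thePitchY is an
// assumption; the reported assertion only shows the thePitchY comparison.
struct ModuleParams {
  float thePitchX;
  float thePitchY;
};

// Hypothetical helper: uses module 0 as the reference (playing the role of the
// common parameters) and returns the index of the first module whose pitch
// differs, or std::nullopt if every module matches or the list is empty.
std::optional<std::size_t> firstPitchMismatch(const std::vector<ModuleParams>& modules) {
  if (modules.empty()) {
    return std::nullopt;
  }
  const ModuleParams& ref = modules.front();
  for (std::size_t i = 1; i < modules.size(); ++i) {
    // Exact float equality, mirroring the == in the failed assertion.
    if (modules[i].thePitchX != ref.thePitchX || modules[i].thePitchY != ref.thePitchY) {
      return i;
    }
  }
  return std::nullopt;
}
```

Returning the index rather than aborting lets the caller name the module that breaks the single-pitch assumption.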
@iarspider
Contributor Author

assign RecoLocalTracker/SiPixelRecHits

@cmsbuild
Contributor

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented Jun 27, 2024

IIUC this is expected and done consciously, see #45175 (comment).
FYI @srimanob @AdrianoDee

@mmusich
Contributor

mmusich commented Jun 27, 2024

Somewhat a duplicate of #45177

@AdrianoDee
Contributor

Yes, it's done on purpose, as a constant reminder, since we have no quick fix for that.

@mmusich
Contributor

mmusich commented Aug 20, 2024

@iarspider I think this issue should be re-opened, see #45177 (comment)

iarspider reopened this Aug 20, 2024
@makortel
Contributor

I'm confused by #45177 (comment) and #45177 (comment) as to whether #45694 was supposed to address these assertion failures in 29634.501 and 29634.502. These workflows are still failing in CMSSW_14_2 2024-09-10-2300.

@AdrianoDee Could you clarify?

@AdrianoDee
Contributor

I was confused too, since I thought the failures came from the RelVals in relval_gpu.py (which I had updated in #45694). Instead, these come from relval_2026.py. I've opened #45980 to fix this.
