Memory corruption when running some Phase 2 reco job through ModuleAllocMonitor #45964
Comments
assign core |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
cms-bot internal usage |
A new Issue was created by @makortel. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
@Dr15Jones This is the one I mentioned to you in person. |
The code here is
void* XMemory::operator new(size_t size, MemoryManager* manager)
{
    assert(manager != 0); // <--- this is the failing assertion
    size_t headerSize = XMLPlatformUtils::alignPointerForNewBlockAllocation(
        sizeof(MemoryManager*));
    void* const block = manager->allocate(headerSize + size);
    *(MemoryManager**)block = manager;
    return (char*)block + headerSize;
} |
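For readers unfamiliar with the pattern in the snippet above, here is a minimal, self-contained sketch of a header-prefixed allocation scheme: the manager pointer is stored at the start of the block, and the caller receives a pointer just past that header. The names `Manager`, `CountingManager`, `Tracked`, `headerSize()` and `ownerOf()` are hypothetical and are not Xerces or CMSSW APIs. Rounding the header up to `alignof(std::max_align_t)` is an assumption standing in for `alignPointerForNewBlockAllocation`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a Xerces-style memory manager interface.
struct Manager {
    virtual ~Manager() = default;
    virtual void* allocate(std::size_t n) = 0;
    virtual void deallocate(void* p) = 0;
};

// Hypothetical manager that counts live blocks so the round trip can be checked.
struct CountingManager : Manager {
    int live = 0;
    void* allocate(std::size_t n) override { ++live; return std::malloc(n); }
    void deallocate(void* p) override { --live; std::free(p); }
};

// Assumption: round the header up to max_align_t so the object placed
// after it keeps a suitable alignment.
constexpr std::size_t headerSize() {
    constexpr std::size_t a = alignof(std::max_align_t);
    return (sizeof(Manager*) + a - 1) / a * a;
}

struct Tracked {
    int value = 0;

    // Same shape as the snippet in the thread: allocate header + object,
    // store the manager in the header, return the address after the header.
    static void* operator new(std::size_t size, Manager* manager) {
        assert(manager != nullptr);
        void* block = manager->allocate(headerSize() + size);
        if (!block) throw std::bad_alloc();
        *static_cast<Manager**>(block) = manager;
        return static_cast<char*>(block) + headerSize();
    }

    // Step back over the header to find which manager owns the block.
    static void operator delete(void* p) noexcept {
        if (!p) return;
        void* block = static_cast<char*>(p) - headerSize();
        Manager* manager = *static_cast<Manager**>(block);
        manager->deallocate(block);
    }

    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void* p, Manager*) noexcept {
        Tracked::operator delete(p);
    }
};

// Read the manager pointer back from the header of a live object.
Manager* ownerOf(const void* p) {
    const void* block = static_cast<const char*>(p) - headerSize();
    return *static_cast<Manager* const*>(block);
}
```

An object would be created with `new (&mgr) Tracked` and released with a plain `delete`. In this scheme the pointer the caller holds is not the pointer the manager returned, and the delete path trusts the header contents, so a null or overwritten manager pointer surfaces at allocation or deallocation time rather than where the damage happened.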
In addition to the 1-thread job working, the following worked in my tests too
2 streams and 5 threads resulted in a
2 streams and 6 threads resulted in a segfault in
2 streams and 7 threads resulted in a segfault in
2 streams and 8 threads resulted in a
Symptoms look like more general memory corruption than a problem specific to Xerces. |
Wonder if we can get valgrind to work with these jobs. Might require linking a version of cmsRun directly with libPerfToolsAllocMonitorPreload.so. |
Running the example job #45854 (comment) by adding
and running the job as
LD_PRELOAD=libPerfToolsAllocMonitorPreload.so cmsRun PSet.py
results in an assertion failure
The job is configured to use 8 threads and 2 streams. With 1 thread and 1 stream the job succeeds.
Other AllocMonitors such as SimpleAllocMonitor and PeriodicAllocMonitor work fine with the multithreaded job.
I was unable to reproduce the assertion failure with the 20034.0 workflow (I tried the first 3 steps).