
[ARM] WFs 145.X fail with Segmentation violation in XercesC #48090


Open
iarspider opened this issue May 15, 2025 · 10 comments

Comments

@iarspider
Contributor

We are observing occasional RelVal failures in WFs 145.x due to SIGSEGVs in XercesC code. Some examples: 1, 2, 3, 4.

The exact failure location in the XercesC code differs from workflow to workflow, but in all cases the XercesC code is called from XMLConfigReader::readPatterns(...).

@cmsbuild
Contributor

cmsbuild commented May 15, 2025

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@iarspider
Contributor Author

assign L1Trigger/L1TMuonOverlap
@makortel FYI

@cmsbuild
Contributor

New categories assigned: l1

@BenjaminRS,@quinnanm you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

For example, in the DetectorDescription packages we call these functions:

```cpp
#include <mutex>
#include <xercesc/util/PlatformUtils.hpp>

XERCES_CPP_NAMESPACE_USE

namespace cms {
  namespace concurrency {
    namespace {
      std::mutex g_xerces_mutex;
    }
    // We need to make sure these are only called by one thread at a time.
    // Until the first Initialize() succeeds, the other threads should not
    // be allowed to proceed. We also do not want two different threads
    // calling Initialize() and Terminate() at the same time. We therefore
    // simply use a global mutex to serialize everything.
    void xercesInitialize() {
      std::unique_lock<std::mutex> l(g_xerces_mutex);
      XMLPlatformUtils::Initialize();
    }
    void xercesTerminate() {
      std::unique_lock<std::mutex> l(g_xerces_mutex);
      XMLPlatformUtils::Terminate();
    }
  }  // namespace concurrency
}  // namespace cms
```

I don't see Utilities/Xerces being used in the L1Trigger/L1TMuonOverlap package.
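A minimal, self-contained sketch of why this serialization matters. It assumes, as the comment in the wrapper code implies, that initialization is reference counted and not thread-safe on its own: the first Initialize() builds global state and the matching last Terminate() destroys it. `FakePlatform`, `serializedInitialize`, `serializedTerminate` and `runWorkers` are hypothetical names for illustration only, not the Xerces or CMSSW API. Without the mutex, the plain `int` counter would be a data race; a lost update is one plausible way for one thread to tear down the global state while another is still parsing.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// HYPOTHETICAL stand-in for xercesc::XMLPlatformUtils (not the Xerces API).
// Assumed behavior: reference counted, with no locking of its own, so
// concurrent unsynchronized calls would be a data race.
struct FakePlatform {
  static int refCount;  // number of outstanding Initialize() calls
  static bool ready;    // stands in for the library's global state
  static int setups;    // how often the expensive first-time setup ran

  static void Initialize() {
    if (++refCount == 1) {  // first caller builds the global state
      ready = true;
      ++setups;
    }
  }
  static void Terminate() {
    assert(refCount > 0 && "Terminate() without matching Initialize()");
    if (--refCount == 0) {  // last caller tears the global state down
      ready = false;
    }
  }
};
int FakePlatform::refCount = 0;
bool FakePlatform::ready = false;
int FakePlatform::setups = 0;

namespace {
  std::mutex g_platform_mutex;  // one global lock, as in cms::concurrency
}

// Same shape as cms::concurrency::xercesInitialize()/xercesTerminate().
void serializedInitialize() {
  std::lock_guard<std::mutex> l(g_platform_mutex);
  FakePlatform::Initialize();
}
void serializedTerminate() {
  std::lock_guard<std::mutex> l(g_platform_mutex);
  FakePlatform::Terminate();
}

// Simulates several framework threads that each set up and tear down the
// library around their own work. Returns the final reference count.
int runWorkers(int nThreads, int iterations) {
  std::vector<std::thread> workers;
  for (int i = 0; i < nThreads; ++i) {
    workers.emplace_back([iterations] {
      for (int j = 0; j < iterations; ++j) {
        serializedInitialize();
        // ... parsing would happen here, relying on the global state ...
        serializedTerminate();
      }
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  return FakePlatform::refCount;
}
```

With every call going through the mutex, `runWorkers(8, 1000)` should leave the count balanced at zero. A single global mutex is coarse, but initialization and termination are rare compared to parsing, so contention should be modest.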

@dan131riley

> I don't see Utilities/Xerces being used in the L1Trigger/L1TMuonOverlap package.

That's correct: it calls XMLPlatformUtils::Initialize() directly, and that is almost certainly the cause of the crashes. Those calls need to be replaced with the cms::concurrency calls, which wrap them in a mutex. These other places may have the same issue, depending on how they are used:

- CondFormats/PPSObjects/src/CTPPSRPAlignmentCorrectionsMethods.cc
- L1TriggerConfig/L1GtConfigProducers/src/L1GtTriggerMenuXmlParser.cc
- L1Trigger/L1TCommon/src/XmlConfigParser.cc
- L1Trigger/L1TMuonOverlapPhase1/src/Omtf/XMLConfigReader.cc
- L1Trigger/L1TMuonOverlapPhase1/src/Omtf/XMLConfigWriter.cc
- L1Trigger/L1TMuonOverlap/src/XMLConfigReader.cc
- L1Trigger/L1TMuonOverlap/src/XMLConfigWriter.cc
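A sketch of the suggested replacement, under stated assumptions. The `cms::concurrency` functions below are local stand-ins that only count calls, so the snippet compiles without CMSSW or Xerces; in the real code they are the mutex-protected wrappers quoted earlier in this thread. `XercesInitGuard`, `readPatternsSketch` and `initCountForTesting` are hypothetical names. The RAII guard is a design choice, not something proposed in the thread: it pairs every initialize with exactly one terminate, even when parsing throws.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>

// STAND-INS so the sketch compiles on its own. The real CMSSW functions lock
// a global mutex and call XMLPlatformUtils::Initialize()/Terminate().
namespace cms::concurrency {
  namespace {
    std::mutex g_mutex;
    int g_initCount = 0;  // tracks init/terminate balance for this sketch only
  }  // namespace

  void xercesInitialize() {
    std::lock_guard<std::mutex> l(g_mutex);
    ++g_initCount;
  }
  void xercesTerminate() {
    std::lock_guard<std::mutex> l(g_mutex);
    assert(g_initCount > 0 && "unbalanced xercesTerminate()");
    --g_initCount;
  }
  int initCountForTesting() {  // hypothetical helper, sketch only
    std::lock_guard<std::mutex> l(g_mutex);
    return g_initCount;
  }
}  // namespace cms::concurrency

// HYPOTHETICAL RAII helper: terminate runs in the destructor, so it is
// called on every exit path, including exceptions.
class XercesInitGuard {
public:
  XercesInitGuard() { cms::concurrency::xercesInitialize(); }
  ~XercesInitGuard() { cms::concurrency::xercesTerminate(); }
  XercesInitGuard(const XercesInitGuard&) = delete;
  XercesInitGuard& operator=(const XercesInitGuard&) = delete;
};

// HYPOTHETICAL reader in the style of XMLConfigReader::readPatterns().
// Before: XMLPlatformUtils::Initialize(); ... XMLPlatformUtils::Terminate();
// After:  the guard routes both calls through the serialized wrappers.
bool readPatternsSketch(const std::string& fileName) {
  XercesInitGuard guard;
  if (fileName.empty()) {
    throw std::invalid_argument("readPatternsSketch: empty file name");
  }
  // ... create the parser, parse fileName, fill the patterns ...
  return true;
}
```

Whether the call succeeds or throws, the initialize/terminate count returns to zero afterwards.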

I was working on opening an issue, but got distracted and @iarspider beat me to it.

It's also worth noting that these were easy to spot in the ROOT6 builds. They do happen with other release series, but they get hidden by all the expression parser segfaults unless you know where to look.

@makortel
Contributor

assign CondFormats/PPSObjects

@cmsbuild
Contributor

New categories assigned: alca

@atpathak,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

One could also ask how much this code really needs Xerces, or whether it could be migrated to e.g. tinyxml2 (though that would, of course, take more effort).

@BenjaminRS
Contributor

BenjaminRS commented May 15, 2025

> One could also ask how much this code really needs Xerces, or whether it could be migrated to e.g. tinyxml2 (though that would, of course, take more effort).

We can follow up with the OMTF experts.
