
Rare exception in PromptReco from SoftMuonMvaRun3Estimator (muon with NaN globalTrack chi2) #48063


Open
gpetruc opened this issue May 13, 2025 · 9 comments

Comments

@gpetruc
Contributor

gpetruc commented May 13, 2025

Hello,

We had one exception in the PromptReco of run 391884, as reported on the Tier0 CMS Talk.

The exception message is

An exception of category 'StdException' occurred while
   [0] Processing  Event run: 391884 lumi: 114 event: 31413135 stream: 1
   [1] Running path 'write_NANOAOD_step'
   [2] Prefetching for module PoolOutputModule/'write_NANOAOD'
   [3] Prefetching for module SimplePATTauFlatTableProducer/'boostedTauTable'
   [4] Prefetching for module PATObjectCrossLinker/'linkedObjects'
   [5] Prefetching for module PATJetRefSelector/'finalJetsPuppi'
   [6] Prefetching for module PATJetUserDataEmbedder/'updatedJetsPuppiWithUserData'
   [7] Prefetching for module PATJetUpdater/'updatedJetsPuppi'
   [8] Prefetching for module PATJetSelector/'slimmedJetsPuppi'
   [9] Prefetching for module PATJetUpdater/'updatedPatJetsTransientCorrectedSlimmedPuppiWithDeepTags'
   [10] Prefetching for module BoostedJetONNXJetTagsProducer/'pfParticleNetFromMiniAODAK4PuppiCentralJetTagsSlimmedPuppiWithDeepTags'
   [11] Prefetching for module ParticleNetFeatureEvaluator/'pfParticleNetFromMiniAODAK4PuppiCentralTagInfosSlimmedPuppiWithDeepTags'
   [12] Prefetching for module PATMuonSlimmer/'slimmedMuons'
   [13] Prefetching for module PATMuonSelector/'selectedPatMuons'
   [14] Calling method for module PATMuonProducer/'patMuons'
Exception Message:
A std::exception was thrown.
Feature is not set: glbNormChi2

The crash is reproducible by running on just the affected event:

cmsrel CMSSW_15_0_5
cd CMSSW_15_0_5/src
cmsenv
tar xzvf /eos/cms/tier0/store/unmerged/data/logs/prod/2025/5/12/PromptReco_Run391884_ParkingSingleMuon7/Reco/0000/3/17264dd8-23e0-4851-b7e9-8a2578de72ce-27-3-logArchive.tar.gz
cd job/WMTaskSpace/cmsRun1

echo 'process.source.eventsToProcess = cms.untracked.VEventRange(cms.EventRange("391884:114:31413135"))' >> PSet.py
cmsRun PSet.py 2>&1 | tee exception.log

I believe the std::exception comes from the pat::XGBooster::predict method
https://github.com/cms-sw/cmssw/blob/CMSSW_15_0_5/PhysicsTools/XGBoost/src/XGBooster.cc#L81-L95
called by the SoftMuonMvaRun3Estimator
https://github.com/cms-sw/cmssw/blob/CMSSW_15_0_5/PhysicsTools/PatAlgos/src/SoftMuonMvaRun3Estimator.cc#L133
The exception is thrown because a muon in the event has a global track, but its normalizedChi2 is NaN, so the XGBooster believes the variable has not been set.
The normalizedChi2 accessor is protected against zero d.o.f., since it returns ndof_ != 0 ? chi2_ / ndof_ : chi2_ * 1e6; but in this case the chi2 value itself is already NaN.
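The failure chain described above can be reproduced in isolation with a small standalone sketch. The normalizedChi2 function below copies the expression quoted above. ToyBooster is a hypothetical stand-in, written under the assumption (taken from the description in this issue, not from the actual XGBooster source) that the booster treats a NaN feature value as "not set" and throws at predict time. The helper predictThrows is also only illustrative.

```cpp
#include <cmath>
#include <limits>
#include <map>
#include <stdexcept>
#include <string>

// Same expression as quoted above for normalizedChi2, written as a free function.
// The zero-ndof branch does not help when chi2 itself is NaN: NaN / ndof and
// NaN * 1e6 are both NaN.
double normalizedChi2(double chi2, double ndof) {
  return ndof != 0 ? chi2 / ndof : chi2 * 1e6;
}

// Hypothetical stand-in for the booster (assumption: NaN marks "not set").
class ToyBooster {
public:
  explicit ToyBooster(const std::string& name) {
    // Every feature starts out as NaN, i.e. unset.
    features_[name] = std::numeric_limits<double>::quiet_NaN();
  }

  void set(const std::string& name, double value) { features_.at(name) = value; }

  double predict() const {
    double sum = 0.;
    for (const auto& kv : features_) {
      // A feature explicitly set to NaN is indistinguishable from an unset one.
      if (std::isnan(kv.second))
        throw std::runtime_error("Feature is not set: " + kv.first);
      sum += kv.second;  // placeholder for the real model evaluation
    }
    return sum;
  }

private:
  std::map<std::string, double> features_;
};

// Returns true if predict() throws for a global track with this chi2 and ndof.
bool predictThrows(double chi2, double ndof) {
  ToyBooster booster("glbNormChi2");
  booster.set("glbNormChi2", normalizedChi2(chi2, ndof));
  try {
    booster.predict();
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

In this sketch, predictThrows(20., 10.) returns false, while passing a NaN chi2 returns true for any ndof, which matches the "Feature is not set: glbNormChi2" message in the log.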
Possibly this should be debugged / fixed upstream, since the chi2 shouldn't be NaN to begin with, but changing SoftMuonMvaRun3Estimator to do

booster.set("glbNormChi2", muon.isGlobalMuon() && !std::isnan(muon.globalTrack()->chi2()) ? muon.globalTrack()->normalizedChi2() : 9999.);

is enough for the processing of that event to conclude without exceptions.
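The logic of that one-line guard can be checked outside CMSSW with a minimal sketch. The function name glbNormChi2Feature and its plain bool/double arguments are hypothetical; they stand in for muon.isGlobalMuon(), muon.globalTrack()->chi2() and the track's ndof, and the normalisation repeats the expression quoted earlier in this comment.

```cpp
#include <cmath>

// Hypothetical helper (not part of CMSSW): the value to pass to
// booster.set("glbNormChi2", ...). Falls back to the same 9999. default
// that is used for non-global muons whenever chi2 is NaN.
double glbNormChi2Feature(bool isGlobalMuon, double chi2, double ndof) {
  constexpr double kDefault = 9999.;
  if (!isGlobalMuon || std::isnan(chi2))
    return kDefault;
  // Same normalisation as quoted above, including the zero-ndof protection.
  return ndof != 0 ? chi2 / ndof : chi2 * 1e6;
}
```

Note that in the one-liner above, && binds more tightly than ?:, so the condition is evaluated as (isGlobalMuon && !isnan) before the ternary selects a value, which is the behaviour spelled out here. Using std::isfinite instead of !std::isnan would additionally reject infinite chi2 values, if that turned out to be desirable.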

@cmsbuild
Contributor

cmsbuild commented May 13, 2025

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @gpetruc.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign PhysicsTools/PatAlgos

@cmsbuild
Contributor

New categories assigned: reconstruction,xpog

@ftorrresd,@hqucms,@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

FYI @cms-sw/muon-pog-l2

@jfernan2
Contributor

@24LopezR @rbhattacharya04 as Muon Reco contacts, can you please have a look?

@24LopezR
Contributor

Hi @gpetruc, the explanation and the fix you provide look reasonable to me. It looks like it is just a problem with NaN values, so a workaround is fine. If you can, please make the corresponding PR and tag both @llunerti and me. Otherwise, we can make the PR for you.

@gpetruc
Contributor Author

gpetruc commented May 16, 2025

Hi,
Apologies, I didn't get the notification for the reply to the message.
I would prefer it if you could make the PR.

Giovanni

@jfernan2
Contributor

+1
Fixed by #48115


5 participants