Compression options ignored on output when no change is made on input data #48182
Comments
A new Issue was created by @rsreds. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
Additionally, I tested the same thing in RNTUPLE_X using the |
What is happening is that your job is using 'fast cloning'. If the job can guarantee that events will be stored in the output in exactly the same order as in the input (i.e. only 1 thread is running and there are no EDFilters in the job), then the default is to 'fast clone'. Fast cloning takes the raw bytes from the input file and stores them in the output without any decompressing/recompressing. This makes the job substantially faster, but it also means any changes to compression settings are ignored. You can stop this behavior by setting process.out.fastCloning = cms.untracked.bool(False)
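A minimal configuration sketch of the workaround described above, assuming a standard CMSSW environment. The process name, the file names, and the chosen algorithm and level are placeholders and do not come from the original report. compressionAlgorithm and compressionLevel are assumed here to be the PoolOutputModule parameters used to request the new compression.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("RECOMPRESS")  # placeholder process name

process.source = cms.Source(
    "PoolSource",
    fileNames=cms.untracked.vstring("file:input.root"),  # placeholder input file
)

process.out = cms.OutputModule(
    "PoolOutputModule",
    fileName=cms.untracked.string("output.root"),  # placeholder output file
    compressionAlgorithm=cms.untracked.string("ZSTD"),  # example choice
    compressionLevel=cms.untracked.int32(5),  # example choice
    # Disable fast cloning so that events are read and rewritten,
    # which lets the compression settings above take effect.
    fastCloning=cms.untracked.bool(False),
)

process.e = cms.EndPath(process.out)
```

With fast cloning disabled the job is expected to run more slowly, since every basket is decompressed and recompressed.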
I understand. I would argue, though, that fast cloning should be performed only if the compression settings stay the same. Otherwise the behavior does not match what is actually stored in the root compression settings.
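The proposal above can be sketched as a small decision function. This is only an illustration of the suggested rule, not CMSSW code: the names Compression and may_fast_clone are hypothetical, and the real framework logic involves more conditions than are modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compression:
    """Hypothetical holder for a compression algorithm and level."""
    algorithm: str
    level: int


def may_fast_clone(order_preserved: bool,
                   input_compression: Compression,
                   output_compression: Compression) -> bool:
    """Return True only if raw bytes could be copied unchanged.

    order_preserved stands for the existing condition (single thread,
    no EDFilters); the equality check is the extra condition proposed
    in the comment above.
    """
    return order_preserved and input_compression == output_compression


same = may_fast_clone(True, Compression("ZLIB", 7), Compression("ZLIB", 7))
changed = may_fast_clone(True, Compression("ZLIB", 7), Compression("ZSTD", 5))
print(same, changed)
```

Under this rule a job that requests a different algorithm or level would fall back to the slower read-and-rewrite path instead of silently keeping the input compression.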
assign core |
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Unfortunately this behavior follows from a decision made a decade (or two) ago, and if it were changed, there is a high chance of breaking something in the offline computing system. Yes, the setup is brittle, but it does not seem worth touching at this point. We should do better with RNTuple.
When opening a file with PoolSource and then using a PoolOutputModule to write it back while specifying a different compression method, the file does not seem to be recompressed, even though the compression setting is correctly assigned.
The minimal config:
run on a TTBar sample with 1000 events containing only FEDRawDataCollection (the content of the input file as per the edmFileUtil print follows):
results in this file:
When checking the compression settings with root:
Meanwhile, if doing the compression with hadd:
The edmFileUtil print is:
Confirmed also by the compression settings in root.
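For readers comparing the compression settings reported by ROOT, the small helper below may be useful. ROOT encodes the setting as a single integer, algorithm * 100 + level, which is the value returned by TFile::GetCompressionSettings. The function name is hypothetical, and the sample values are illustrative rather than taken from the files in this report. Note that, as the discussion in this thread suggests, the file-level setting need not match how fast-cloned baskets were actually compressed.

```python
# Algorithm codes from ROOT's compression algorithm enumeration.
ALGORITHMS = {
    0: "global default",
    1: "ZLIB",
    2: "LZMA",
    3: "old algorithm",
    4: "LZ4",
    5: "ZSTD",
}


def decode_compression_settings(settings: int) -> tuple[str, int]:
    """Split a ROOT compression setting into (algorithm name, level)."""
    algorithm, level = divmod(settings, 100)
    return ALGORITHMS.get(algorithm, "unknown"), level


print(decode_compression_settings(505))  # ('ZSTD', 5)
print(decode_compression_settings(207))  # ('LZMA', 7)
```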