-
Notifications
You must be signed in to change notification settings - Fork 100
Issue-749 - Preserve Symlinks When Copying Additional Build Files #750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the potential issue I pointed out, I/we need to give some thought on how your use case of using symlinks is going to interact with the code that attempts to preserve the build cache by checking file stats. I'm not certain if there is any impact there at all.
Fix when running ansible-builder create twice If symlink changes we will override it If symlink is not changed we will not copy
Copy same symlink Copy different symlink Copy broken symlink Copy file over symlink Copy symlink over file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left you some comments inline. In general, unnecessary code or comment reformatting just adds noise for the reviewer, so it's best to avoid that when possible.
src/ansible_builder/utils.py
Outdated
logger.debug("Symlink %s target differs and will be overwritten.", dest) | ||
elif dest_path.exists(): | ||
logger.debug("Destination %s is a regular file and will be replaced by symlink.", dest) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if maybe it would be better if, instead of comparing what the symlinks point to, we just let the filecmp.cmp()
call below take care of deciding if there are changes? I would think the contents would be more important than the actual file being pointed to. I think with the original code, the filecmp.cmp()
would have worked with symlinks as-is. Probably needs testing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filecmp.cmp()
with symlinks make the unit tests indeterministic - I install pytest-repeat and ran them 1000 times and copied which I would expect to be False would switch between True and False.
I kept the reproduction steps in the unit tests, but added an extra case before filecmp.cmp()
which explicitly checks for the symlink conditions
I couldn't track why filecmp.cmp()
was doing this tho - I was thinking it was something in the os or the pytest fixture for creating tmp files but couldn't figure it out
filecmp.cmp() has an issue with symlinks I'm thinking its perhaps within the os or the pytest tmp_path fixture but I couldn't track it down Using pytest-repeat --count=1000 you get different results 950~ True / 50~ False I tried various things like ensuring flushing the files but couldn't make it deterministic with the filecmp.cmp() check
test/unit/test_utils.py
Outdated
# 1000 runs using pytest-repeat | ||
# 37 times it is False | ||
# 963 times it is True | ||
assert copied in (False, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by this and couldn't think of a way to get around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very odd. I changed the assert to check for False
and ran the test 5000 times and it never failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm developing within a container maybe that is what is going on - I'll try on mac osx directly
I'm fine with just changing the assertion to False, removing the pytest-repeat comments in the tests and knowing that my environment is weird and it might invalidate the cache for me
Does that seem okay given you ran it 5k times and it always worked 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely change it to the expected value, otherwise this isn't really a good test. Can remove all of the comments about the tests failing too. It likely has something to do with your environment. In the meantime, I'll test it some more on my other systems, just to verify..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - the CI run reproduced the same issue I was experiencing locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still cannot reproduce this on my systems. But, on a hunch, can you try adding a call to filecmp.clear_cache()
in between the consecutive copy_file()
calls and see if you still get the failures on your machine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion fixes things - I didn't realize it had a cache, was focusing on the files themselves. The outcome (False -> True) was cached is the issue:
After First Copy (Before Cache Clearing):
filecmp._cache: {('/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/source.txt', '/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/dest_symlink', (32768, 3, 1751046892.509871), (32768, 3, 1751046892.509871)): False}
After Cache Clearing and Second Copy:
filecmp._cache: {('/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/source.txt', '/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/dest_symlink', (32768, 3, 1751046892.509871), (32768, 3, 1751046892.509871)): True}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any preference what should happen here
If I call ansible-builder create
twice there is no shared cache between the runs so that's okay
but if copy_file
gets twice in the same program it could have this issue
I'm thinking:
1.) In copy_file
we could look if the input paths are already in cache and pop them (or just clear all cache) before using filecmp.cmp()
to check
2.) we could just add to the unit test which is probably fine for most current use-cases since each ansible-builder create
has its own cache in memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimal reproduction case that works at least in my env
import filecmp, shutil
with open("/tmp/foo", "w") as f:
f.write("hello")
with open("/tmp/bar", "w") as f:
f.write("world")
# Should be False
print(filecmp.cmp("/tmp/foo", "/tmp/bar", shallow=False))
# Overwrite bar with foo (same size)
shutil.copyfile("/tmp/foo", "/tmp/bar")
# Still returns False due to stale cache!
print(filecmp.cmp("/tmp/foo", "/tmp/bar", shallow=False)) # ❌ Incorrect
# Correct after clearing cache
filecmp.clear_cache()
print(filecmp.cmp("/tmp/foo", "/tmp/bar", shallow=False))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to modify copy_file()
here. This only happens in the test because filecmp.cmp()
is called back-to-back on the same file after it changes which, according to the filecmp
docs, can cause cache issues. This isn't going to happen in the normal code. So just putting the clear_cache()
call in the tests, with a comment on why we are adding it, is the proper solution, IMO.
This is to capture the case where we have a symlink that points to a file that doesn't exist yet (It could be created a runtime via a mount or a local task)
Source needed to move the symlink check before the file exists check
@@ -189,7 +189,10 @@ def copy_file(source: str, dest: str, ignore_mtime: bool = False) -> bool: | |||
raise Exception(f"Source {source} can not be a directory. Please use copy_directory instead.") | |||
if Path(dest).is_dir(): | |||
raise Exception(f"Destination {dest} can not be a directory. Please use copy_directory instead.") | |||
if not os.path.exists(dest): | |||
if Path(source).is_symlink() and Path(dest).is_symlink() and os.readlink(source) == os.readlink(dest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out why not filecmp.cmp(source, dest, shallow=False)
wasn't working properly for symlinks so I just added this as an explicit check.
It needs to be before:
elif not os.path.exists(dest):
To handle the case where the symlink is pointing to a file that does not yet exist (on purpose)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filecmp.cmp()
should totally work with symlinks. Can you show me an example where it isn't working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, maybe you meant filecmp.cmp()
didn't work because the link was pointing to a non-existing file? If so, then yeah, an exception would be thrown in that case since that function attempts to stat
the actual file (not the link). I.e., it follows the symlink for comparing.
Fixes issue reported in #749