Skip to content

Issue-749 - Preserve Symlinks When Copying Additional Build Files #750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: devel
Choose a base branch
from

Conversation

clickthisnick
Copy link

Fixes issue reported in #749

@clickthisnick clickthisnick requested a review from a team as a code owner June 20, 2025 15:16
@github-actions github-actions bot added the needs_triage New item that needs to be triaged label Jun 20, 2025
Copy link
Contributor

@Shrews Shrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the potential issue I pointed out, I/we need to give some thought on how your use case of using symlinks is going to interact with the code that attempts to preserve the build cache by checking file stats. I'm not certain if there is any impact there at all.

@Shrews Shrews removed the needs_triage New item that needs to be triaged label Jun 24, 2025
Fix when running ansible-builder create twice

If symlink changes we will override it
If symlink is not changed we will not copy
Copy same symlink
Copy different symlink
Copy broken symlink
Copy file over symlink
Copy symlink over file
Copy link
Contributor

@Shrews Shrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left you some comments inline. In general, unnecessary code or comment reformatting just adds noise for the reviewer, so it's best to avoid that when possible.

logger.debug("Symlink %s target differs and will be overwritten.", dest)
elif dest_path.exists():
logger.debug("Destination %s is a regular file and will be replaced by symlink.", dest)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if maybe it would be better if, instead of comparing what the symlinks point to, we just let the filecmp.cmp() call below take care of deciding if there are changes? I would think the contents would be more important than the actual file being pointed to. I think with the original code, the filecmp.cmp() would have worked with symlinks as-is. Probably needs testing though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filecmp.cmp() with symlinks make the unit tests indeterministic - I install pytest-repeat and ran them 1000 times and copied which I would expect to be False would switch between True and False.

I kept the reproduction steps in the unit tests, but added an extra case before filecmp.cmp() which explicitly checks for the symlink conditions

I couldn't track why filecmp.cmp() was doing this tho - I was thinking it was something in the os or the pytest fixture for creating tmp files but couldn't figure it out

filecmp.cmp() has an issue with symlinks

I'm thinking its perhaps within the os or the pytest tmp_path fixture but I couldn't track it down

Using pytest-repeat --count=1000 you get different results 950~ True / 50~ False

I tried various things like ensuring flushing the files but couldn't make it deterministic with the filecmp.cmp() check
# 1000 runs using pytest-repeat
# 37 times it is False
# 963 times it is True
assert copied in (False, True)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this and couldn't think of a way to get around it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very odd. I changed the assert to check for False and ran the test 5000 times and it never failed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm developing within a container maybe that is what is going on - I'll try on mac osx directly

I'm fine with just changing the assertion to False, removing the pytest-repeat comments in the tests and knowing that my environment is weird and it might invalidate the cache for me

Does that seem okay given you ran it 5k times and it always worked 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely change it to the expected value, otherwise this isn't really a good test. Can remove all of the comments about the tests failing too. It likely has something to do with your environment. In the meantime, I'll test it some more on my other systems, just to verify..

Copy link
Author

@clickthisnick clickthisnick Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - the CI run reproduced the same issue I was experiencing locally

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still cannot reproduce this on my systems. But, on a hunch, can you try adding a call to filecmp.clear_cache() in between the consecutive copy_file() calls and see if you still get the failures on your machine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion fixes things - I didn't realize it had a cache, was focusing on the files themselves. The outcome (False -> True) was cached is the issue:

After First Copy (Before Cache Clearing):
filecmp._cache: {('/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/source.txt', '/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/dest_symlink', (32768, 3, 1751046892.509871), (32768, 3, 1751046892.509871)): False}

After Cache Clearing and Second Copy:
filecmp._cache: {('/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/source.txt', '/tmp/pytest-of-clickthisnick/pytest-122/test_regular_file_overwrites_s0/dest_symlink', (32768, 3, 1751046892.509871), (32768, 3, 1751046892.509871)): True}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any preference what should happen here

If I call ansible-builder create twice there is no shared cache between the runs so that's okay
but if copy_file gets twice in the same program it could have this issue


I'm thinking:

1.) In copy_file we could look if the input paths are already in cache and pop them (or just clear all cache) before using filecmp.cmp() to check

2.) we could just add to the unit test which is probably fine for most current use-cases since each ansible-builder create has its own cache in memory

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimal reproduction case that works at least in my env

import filecmp, shutil

with open("/tmp/foo", "w") as f:
    f.write("hello")

with open("/tmp/bar", "w") as f:
    f.write("world")

# Should be False
print(filecmp.cmp("/tmp/foo", "/tmp/bar", shallow=False))

# Overwrite bar with foo (same size)
shutil.copyfile("/tmp/foo", "/tmp/bar")

# Still returns False due to stale cache!
print(filecmp.cmp("/tmp/foo", "/tmp/bar", shallow=False))  # ❌ Incorrect

# Correct after clearing cache
filecmp.clear_cache()
print(filecmp.cmp("/tmp/foo", "/tmp/bar", shallow=False)) 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to modify copy_file() here. This only happens in the test because filecmp.cmp() is called back-to-back on the same file after it changes which, according to the filecmp docs, can cause cache issues. This isn't going to happen in the normal code. So just putting the clear_cache() call in the tests, with a comment on why we are adding it, is the proper solution, IMO.

This is to capture the case where we have a symlink that points to a file that doesn't exist yet (It could be created a runtime via a mount or a local task)
Source needed to move the symlink check before the file exists check
@@ -189,7 +189,10 @@ def copy_file(source: str, dest: str, ignore_mtime: bool = False) -> bool:
raise Exception(f"Source {source} can not be a directory. Please use copy_directory instead.")
if Path(dest).is_dir():
raise Exception(f"Destination {dest} can not be a directory. Please use copy_directory instead.")
if not os.path.exists(dest):
if Path(source).is_symlink() and Path(dest).is_symlink() and os.readlink(source) == os.readlink(dest):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out why not filecmp.cmp(source, dest, shallow=False) wasn't working properly for symlinks so I just added this as an explicit check.

It needs to be before:
elif not os.path.exists(dest):

To handle the case where the symlink is pointing to a file that does not yet exist (on purpose)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filecmp.cmp() should totally work with symlinks. Can you show me an example where it isn't working?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe you meant filecmp.cmp() didn't work because the link was pointing to a non-existing file? If so, then yeah, an exception would be thrown in that case since that function attempts to stat the actual file (not the link). I.e., it follows the symlink for comparing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants