Skip to content

Conversation

dzbarsky
Copy link
Contributor

@dzbarsky dzbarsky commented Jul 2, 2025

On a write_source_files tree with ~20 chained targets and a few files each (so total 90 files + directories) the runtimes goes from 5-7s to 20ms.

We will want to precompile the helper binary like the rest of the helpers.

Putting this up to get feedback first.

})
}

func doCopy(action copyAction) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have Go programs in this repo for CopyFile and CopyDirectory mnemonics. I imagine it could share code, or perhaps even just update that binary with some more flags to handle this use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyFile uses cp but I think you're talking about the copy_directory and CloneFile stuff in tools/common?

I saw those initially but it wasn't super trivial to reuse because tools/copy_directory does its own concurrency control and I didn't want multiple layers spawning goroutines. Looking at it again, however, it does look like we can do a bit of restructuring so that write_source_files handles enqueueing the work and there's a single pool of workers. Happy to give that a shot.

I think adding more flags to the existing programs is less appealing, because:

  1. We currently don't use any flags for write_source_files since RunEnvironmentInfo can only take env vars, no args. To use args we'd need to emit launcher scripts vs just symlinking the executable like we do now, which requires forking on platform again
  2. I think it's clearer to understand the tools when they have separate well-defined entrypoints for the (fairly different) use cases

It does mean distributing an extra toolchain + slightly bigger downloads, but that doesn't feel like a huge drawback, WDYT?

@dzbarsky
Copy link
Contributor Author

dzbarsky commented Jul 3, 2025

@alexeagle I updated the PR with a sketch of how some code can be shared. There's a few things bothering me:

  1. I don't see how the existing implementation of copy_directory handles the executable bit (I'm not sure how Clonefile works but the fallback implementation just copies the bytes and doesn't seem to create the file properly?)
  2. I think the implementation I came up with does fewer syscalls (for example it only issues MkdirAll when writes fail, it first attempts to copy a file and then fallsback to directory handling instead of issuing upfront Stat, etc)
  3. The existing io.Copy does not reuse buffers, we should likely allocate a single buffer in the worker and reuse that with io.CopyBuffer
  4. It feels like we are promoting the copy_directory walker as the "common" one, which creates a bit of an oddness for copy_to_directory
  5. CopyFile requires an upfront Stat on srcPath to get the info, and we also need that to do that to decide whether to issue a CopyFile or CopyDirectory. In the current draft this happens serially and then execution of all copy actions is done by the worker pool, but ideally there wouldn't be a serial portion at all. This would mean doing the "try to copy as file and fallback to directory" inside the worker itself, are we ok with that? (It would likely slightly speedup existing copy_directory as well...)

Overall I can't tell if going down this path will create a mess or result in improvements to the existing implementations, but if we do want to unify we will probably want some prep PRs

@alexeagle
Copy link
Collaborator

I'm sorry I disappeared here!
TBH I'm not a great reviewer for this, I don't think I ever reviewed earlier versions (@gregmagolan wrote it originally and I think @thesayyn made changes).

If your analysis leads you to think that sharing code isn't beneficial here, I'm on board with that. These are small programs, and distributing them is easy. I think the highest-order bit is to make it easy to maintain and avoid bugs since we don't have any funding for this org.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants