Skip to content

tee: GNU test failing in CI #7805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jtracey opened this issue Apr 20, 2025 · 6 comments
Open

tee: GNU test failing in CI #7805

jtracey opened this issue Apr 20, 2025 · 6 comments
Labels

Comments

@jtracey
Copy link
Contributor

jtracey commented Apr 20, 2025

tee has started failing its GNU test in the CI, when it had been passing on the main branch. See, e.g., #7765 (comment), #7801 (comment), #7799 (comment). Relevant test output is:

FAIL: tests/misc/tee
====================

tee: file.ro: Permission denied (os error 13)
FAIL tests/misc/tee.sh (exit status: 1)

The relevant part of the test script is, in a nutshell, doing touch file.ro && chmod a-w file.ro && tee -p </dev/null file.ro, expecting exit status 1 (EPERM, operation not permitted). Since this test is passing on my and others machines, I suspect this is the CI environment changing something to return exit status 13 (EACCESS, permission denied). Not sure why this is happening, since we do have permission access to the file (presumably), just not permission to write to it.

@jtracey
Copy link
Contributor Author

jtracey commented Apr 21, 2025

Looks like the (os error 13) might be a red herring, since I can confirm, at least locally, that even when reporting os error 13, our tee does still return 1.

@drinkcat
Copy link
Contributor

drinkcat commented Apr 21, 2025

Yeah, it's a red herring, that error is meant to happen.

You can get the exact some error output by adding exit 1 towards the end of test/misc/tee.sh, so something else is failing but it's not clear what...

Trying by setting VERBOSE=yes (see 6469559)

drinkcat added a commit to drinkcat/coreutils that referenced this issue Apr 21, 2025
drinkcat added a commit to drinkcat/coreutils that referenced this issue Apr 21, 2025
@drinkcat
Copy link
Contributor

Got verbose logs here: https://github.com/drinkcat/coreutils/actions/runs/14574570622/job/40877858265 , https://productionresultssa10.blob.core.windows.net/actions-results/82c4e74b-7898-41a8-bff9-0e4502de1c29/workflow-job-run-2edfd9fd-13f0-51db-b86a-dc1022807e62/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-04-21T15%3A31%3A11Z&sig=vZcj8t4%2BtyQU4%2FckeoXa1a7itxw0ss%2FLX8sgHKUzsJQ%3D&ske=2025-04-21T23%3A22%3A59Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-04-21T11%3A22%3A59Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-01-05&sp=r&spr=https&sr=b&st=2025-04-21T15%3A21%3A06Z&sv=2025-01-05

In a gist for easier parsing... https://gist.github.com/drinkcat/b406f61bed0422985015ff58fff0cc01

This seems to be the error:

2025-04-21T14:20:20.3132187Z + read_fifo
2025-04-21T14:20:20.3132294Z + timeout 10 dd count=1 if=fifo of=/dev/null status=none
2025-04-21T14:20:20.3132355Z + yes
2025-04-21T14:20:20.3132422Z + timeout 10 tee -p
2025-04-21T14:20:20.3132481Z + fail=1

I think its this test that it failing:

# With -p, SIGPIPE is suppressed, exit 0 for EPIPE when all outputs finished
read_fifo
yes | timeout 10 tee -p 2>err >fifo || fail=1
test $(wc -l < err) = 0 || { cat err; fail=1; }

@drinkcat
Copy link
Contributor

drinkcat commented Apr 21, 2025

Reproducer tee-test.sh:

# set -x

rm -f fifo
mkfifo fifo

# Previous test, not 100% necessary but makes errors more frequent
(timeout 10 dd count=1 if=fifo of=/dev/null status=none; echo timeout1) &
yes | timeout 10 tee ./e/noent 2>err >fifo
echo "RET:$? (want nonzero)"

(timeout 10 dd count=1 if=fifo of=/dev/null status=none; echo timeout2) &
yes | timeout 10 tee -p >fifo
echo "RET:$? (want 0)"

Then run with something like this (uutils yes/timeout seem to work fine here, it's a tee problem):

cargo build -p uu_yes
cargo build -p uu_timeout
cargo build -p uu_tee
PATH=./target/debug/:$PATH bash tee-test.sh # Test with uutils
bash tee-test.sh # Test with system tools (GNU)

Sometimes the test "succeeds", but there's a 10 second pause that should not be there.

timeout1
RET:141 (want nonzero)
<<< 10 seconds pause here
timeout2
RET:0 (want 0)

Sometimes it fails:

PATH=./target/debug/:$PATH bash tee-test.sh 
timeout1
RET:141 (want nonzero)
timeout2
RET:124 (want 0)

I think we don't always fail because of a race condition between the 2 timeouts (if you reduce the last timeout 10 tee to 2, uutils always fails, while GNU always succeeds.

@drinkcat
Copy link
Contributor

strace is interesting.

GNU

poll([{fd=0, events=POLLIN|POLLRDBAND}, {fd=1, events=POLLRDBAND}], 2, -1) = 1 ([{fd=0, revents=POLLIN}])
read(0, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
poll([{fd=0, events=POLLIN|POLLRDBAND}, {fd=1, events=POLLRDBAND}], 2, -1) = 2 ([{fd=0, revents=POLLIN}, {fd=1, revents=POLLERR}])
read(0, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192
write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = -1 EPIPE (Broken pipe)

uutils:

poll([{fd=1, events=POLLRDBAND}], 1, -1) = 
[HANGS until the fifo reader exits]
 ([{fd=1, revents=POLLERR}])

I can fix the issue by completely bypassing ensure_stdout_not_broken... @RenjiSann FYI, looks like you added this code recently.

(BTW, this is not a new failure, if you look at https://github.com/uutils/coreutils-tracking/blame/main/aggregated-result.json tee.log line and press the "blame prior" button you'll see that the test most often failed, but happened to flip-flop recently)

@RenjiSann
Copy link
Collaborator

My change what initially fixing #7073. I had feared the change would not be robust enough and I was maybe missing something, which it appears I was.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants