-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
We're observing intermittent HTTP 502s in production, which seem to be correlated with the "autorestarting worker after current request" log line, and are less frequent as we increase max_requests
. I've reproduced this on 21.2.0 and 20.1.0, but it doesn't seem to happen in 20.0.4.
I've produced a minimal reproduction case following the gunicorn.org example as closely as possible, but please let me know if there are other changes you'd recommend:
Application
def app(environ, start_response):
data = b"Hello, World!\n"
start_response("200 OK", [
("Content-Type", "text/plain"),
("Content-Length", str(len(data)))
])
return iter([data])
Configuration
gunicorn --max-requests 10 --worker-class gthread myapp:app
Reproduction
#!/usr/bin/env bash
set -euo pipefail
# start gunicorn
gunicorn --max-requests 10 --log-level debug --worker-class gthread myapp:app &
# wait for gunicorn to be ready
sleep 5
# send 20 sequential HTTP requests with minimal output
for i in {1..20}; do
curl -S -s -o /dev/null -w "[REPRO]: Received HTTP %{http_code} from GET /${i}\n" "127.0.0.1:8000/${i}" || true
done
Quickstart
For convenience, I've packaged this into a single command that consistently reproduces the problem on my machine. If you have Docker installed, this should Just Work™️.
cat <<DOCKER_EOF | docker build -t gunicorn-repro-max-requests-rst - && docker run --rm gunicorn-repro-max-requests-rst
FROM python:3.11
RUN pip install --disable-pip-version-check --no-cache gunicorn
RUN cat <<EOF > myapp.py
def app(environ, start_response):
data = b"Hello, World!\n"
start_response("200 OK", [
("Content-Type", "text/plain"),
("Content-Length", str(len(data)))
])
return iter([data])
EOF
RUN cat <<EOF > repro.sh
#!/usr/bin/env bash
set -euo pipefail
# start gunicorn
gunicorn --max-requests 10 --log-level debug --worker-class gthread myapp:app &
# wait for gunicorn to be ready
sleep 5
# send 20 sequential HTTP requests with minimal output
for i in {1..20}; do
curl -S -s -o /dev/null -w "[REPRO]: Received HTTP %{http_code} from GET /\\\${i}\n" "127.0.0.1:8000/\\\${i}" || true
done
EOF
RUN chmod +x repro.sh
CMD ["./repro.sh"]
DOCKER_EOF
Example
Logs
[2023-07-21 00:20:10 +0000] [6] [DEBUG] Current configuration:
config: ./gunicorn.conf.py
wsgi_app: None
bind: ['127.0.0.1:8000']
backlog: 2048
workers: 1
worker_class: gthread
threads: 1
worker_connections: 1000
max_requests: 10
max_requests_jitter: 0
timeout: 30
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
print_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 0
group: 0
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: None
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: -
loglevel: debug
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
logconfig_json: None
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: myapp:app
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0xffff935672e0>
on_reload: <function OnReload.on_reload at 0xffff93567420>
when_ready: <function WhenReady.when_ready at 0xffff93567560>
pre_fork: <function Prefork.pre_fork at 0xffff935676a0>
post_fork: <function Postfork.post_fork at 0xffff935677e0>
post_worker_init: <function PostWorkerInit.post_worker_init at 0xffff93567920>
worker_int: <function WorkerInt.worker_int at 0xffff93567a60>
worker_abort: <function WorkerAbort.worker_abort at 0xffff93567ba0>
pre_exec: <function PreExec.pre_exec at 0xffff93567ce0>
pre_request: <function PreRequest.pre_request at 0xffff93567e20>
post_request: <function PostRequest.post_request at 0xffff93567ec0>
child_exit: <function ChildExit.child_exit at 0xffff92db8040>
worker_exit: <function WorkerExit.worker_exit at 0xffff92db8180>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0xffff92db82c0>
on_exit: <function OnExit.on_exit at 0xffff92db8400>
ssl_context: <function NewSSLContext.ssl_context at 0xffff92db85e0>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2023-07-21 00:20:10 +0000] [6] [INFO] Starting gunicorn 21.2.0
[2023-07-21 00:20:10 +0000] [6] [DEBUG] Arbiter booted
[2023-07-21 00:20:10 +0000] [6] [INFO] Listening at: http://127.0.0.1:8000 (6)
[2023-07-21 00:20:10 +0000] [6] [INFO] Using worker: gthread
[2023-07-21 00:20:10 +0000] [8] [INFO] Booting worker with pid: 8
[2023-07-21 00:20:10 +0000] [6] [DEBUG] 1 workers
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /1
[REPRO]: Received HTTP 200 from GET /1
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /2
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /2
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /3
[REPRO]: Received HTTP 200 from GET /3
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /4
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /4
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /5
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /5
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /6
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /6
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /7
[REPRO]: Received HTTP 200 from GET /7
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /8
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /8
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /9
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /9
[2023-07-21 00:20:15 +0000] [8] [DEBUG] GET /10
[2023-07-21 00:20:15 +0000] [8] [INFO] Autorestarting worker after current request.
[2023-07-21 00:20:15 +0000] [8] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /10
curl: (56) Recv failure: Connection reset by peer
[2023-07-21 00:20:15 +0000] [8] [INFO] Worker exiting (pid: 8)
[REPRO]: Received HTTP 000 from GET /11
[2023-07-21 00:20:15 +0000] [22] [INFO] Booting worker with pid: 22
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /12
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /12
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /13
[REPRO]: Received HTTP 200 from GET /13
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /14
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /14
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /15
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /15
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /16
[REPRO]: Received HTTP 200 from GET /16
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /17
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /17
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /18
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /18
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /19
[REPRO]: Received HTTP 200 from GET /19
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[2023-07-21 00:20:15 +0000] [22] [DEBUG] GET /20
[2023-07-21 00:20:15 +0000] [22] [DEBUG] Closing connection.
[REPRO]: Received HTTP 200 from GET /20
Expected
I'd expect to receive an HTTP 200 for each request, regardless of the max-requests restart. We should see [DEBUG] GET /11
when the worker handles the 11th request.
Actual
The reproduction script sends GET /11
, but the worker never sees it, and we see a connection reset instead. The repro script reports a status code of 000
, but that's just a quirk of libcurl. I've used tcpdump and can confirm the RST
.
In case it's useful, I've also seen curl: (52) Empty reply from server
, but it happens less frequently and I'm not 100% sure that it's the same problem.
Workaround
Increasing max-requests makes this happen less frequently, but the only way to resolve it is to disable max-requests (or maybe switch to a different worker type?). Increasing the number of workers or threads doesn't seem to resolve the problem either, from what I've seen.