Description
### What's wrong?
When the `loki.source.kubernetes_events` component fails to load and exits, the value of the `agent_component_controller_running_components` metric does not change. All GA components are reported as healthy.
With the enclosed configuration, when the `loki.source.kubernetes_events` component fails to load, `agent_component_controller_running_components` still reports 6 "healthy" components, the same value as when all components load successfully. The `health_type` label is always "healthy".
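To check what the metric reports, the scraped metrics text can be grouped by `health_type`. The following is a minimal Python sketch and not part of the Agent. The function name `components_by_health` and the sample text are hypothetical: the sample shows only the `health_type` label and the value 6 described above, and the parser assumes label values contain no commas.

```python
from collections import defaultdict

METRIC = "agent_component_controller_running_components"

# Hypothetical sample of scraped metrics text, reduced to the one label
# discussed in this report; a real scrape may carry additional labels.
SAMPLE = """
# TYPE agent_component_controller_running_components gauge
agent_component_controller_running_components{health_type="healthy"} 6
"""


def components_by_health(exposition: str) -> dict:
    """Sum the running-components gauge per health_type label value."""
    totals = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        # Skip comments, blank lines, and unrelated metrics.
        if not line.startswith(METRIC + "{"):
            continue
        labels_part, _, value = line.rpartition("}")
        labels = labels_part[len(METRIC) + 1:]
        health = None
        # Naive label parsing: assumes no commas inside label values.
        for pair in labels.split(","):
            key, _, raw = pair.partition("=")
            if key.strip() == "health_type":
                health = raw.strip().strip('"')
        if health is not None:
            totals[health] += float(value.split()[0])
    return dict(totals)


if __name__ == "__main__":
    print(components_by_health(SAMPLE))
```

If the failed component were reflected in the metric, one would expect the "healthy" total to drop or a different `health_type` value to appear; in the behaviour reported here, neither happens.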
The failure to load the component is reported only by this logline:
### Steps to reproduce
The Kubernetes Events source fails to load only occasionally, and only in overloaded clusters.
It may be possible to reproduce the problem with a dev version by setting `informerSyncTimeout` to a very low value.
### System information
_No response_
### Software version
v0.37.2
### Configuration
Full `config.river`
logging {
  level = "info"
  format = "json"
}

otelcol.receiver.otlp "otlp" {
  http {}
  output {
    metrics = [otelcol.exporter.prometheus.prom.input]
  }
}

otelcol.exporter.prometheus "prom" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus-server.monitoring-system.svc.cluster.local/api/v1/write"
  }
}

loki.write "obs" {
  external_labels = { cluster = "staging" }
  endpoint {
    url = "https://loki.platform-staging.internal.xxx.yyy/loki/api/v1/push"
  }
}

loki.relabel "drop_instance" {
  forward_to = [loki.write.obs.receiver]
  rule {
    action = "labeldrop"
    regex = "instance"
  }
}

loki.source.kubernetes_events "default" {
  forward_to = [loki.relabel.drop_instance.receiver]
  job_name = "kubernetes_events"
  log_format = "json"
}
### Logs
_No response_