Metric agent_component_controller_running_components when component fails #274

Open
grafana/agent
#6385
@sadovnikov

Description

What's wrong?

When the loki.source.kubernetes_events component fails to load and exits, the value of the agent_component_controller_running_components metric does not change. All GA components are reported as healthy.

With the enclosed configuration, when the loki.source.kubernetes_events component fails to load, agent_component_controller_running_components still reports 6 "healthy" components, the same value as when all components load successfully. The health_type is always "healthy".
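For anyone who wants to check this on their own setup, here is a minimal sketch that sums the metric per health_type from a scraped /metrics payload. The function name and the sample payload are hypothetical (the sample only mirrors the numbers in this report, and the real label set may contain more than health_type); only the metric name and the health_type label come from the issue.

```python
import re
from collections import defaultdict

# Matches one sample line of the metric, capturing the label set and the value.
SAMPLE_RE = re.compile(
    r'^agent_component_controller_running_components\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def running_components_by_health(metrics_text: str) -> dict[str, float]:
    """Sum agent_component_controller_running_components per health_type.

    Hypothetical helper for illustration; it is not part of Grafana Agent.
    """
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get("health_type", "")] += float(match.group("value"))
    return dict(totals)


# Hypothetical scrape output mirroring the report: 6 components, all "healthy".
SAMPLE = """\
# TYPE agent_component_controller_running_components gauge
agent_component_controller_running_components{health_type="healthy"} 6
"""

if __name__ == "__main__":
    print(running_components_by_health(SAMPLE))
```

If the failed component were accounted for, one would expect part of the total to move out of the "healthy" bucket after the failure. In the case reported here, the whole count stays under "healthy".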

The failure to load the component is reported only by this logline:

[image: screenshot of the log line]

Steps to reproduce

The Kubernetes Events source fails to load only occasionally and only in overloaded clusters.
To reproduce the problem with a dev version, it may be enough to set informerSyncTimeout to a very low value.

System information

No response

Software version

v0.37.2

Configuration

Full `config.river`


    logging {
      level  = "info"
      format = "json"
    }

    otelcol.receiver.otlp "otlp" {
      http {}

      output {
        metrics = [otelcol.exporter.prometheus.prom.input]
      }
    }

    otelcol.exporter.prometheus "prom" {
      forward_to = [prometheus.remote_write.default.receiver]
    }

    prometheus.remote_write "default" {
      endpoint {
        url = "http://prometheus-server.monitoring-system.svc.cluster.local/api/v1/write"
      }
    }

    loki.write "obs" {
      external_labels = { cluster = "staging" }
      endpoint {
        url = "https://loki.platform-staging.internal.xxx.yyy/loki/api/v1/push"
      }
    }

    loki.relabel "drop_instance" {
      forward_to = [loki.write.obs.receiver]
      rule {
        action = "labeldrop"
        regex  = "instance"
      }
    }

    loki.source.kubernetes_events "default" {
      forward_to = [loki.relabel.drop_instance.receiver]
      job_name = "kubernetes_events"
      log_format = "json"
    }


Logs

No response

Labels

bug