Mimir received a series with an invalid label/metric name #3898
Comments
How is the data pushed to Mimir? Does it scrape the OTEL collector's Prometheus exporter? This does not seem like an operator issue. This issue should most likely be opened in https://github.com/open-telemetry/opentelemetry-collector-contrib/.
@pavolloffay we use Prometheus Operator + ServiceMonitors. The data is pushed to Mimir with Prometheus Remote Write. We set the following config for
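(The remote write config referenced above was not captured in this copy of the thread. Purely as an illustration, a `remoteWrite` block in a Prometheus Operator `Prometheus` CR pushing to Mimir typically looks something like the sketch below; the endpoint, tenant, and names are placeholders, not values from this issue.)
```yaml
# Illustrative only: remoteWrite section of a Prometheus Operator `Prometheus` CR
# pushing to Mimir. Endpoint and tenant are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: https://mimir.example.com/api/v1/push
      headers:
        X-Scope-OrgID: example-tenant
```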
One of the ServiceMonitors is created like the one below:
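(The ServiceMonitor manifest itself is missing from this copy of the thread; as a stand-in, here is a minimal sketch of a ServiceMonitor scraping the collector's self-telemetry port. Names, labels, and the port name are illustrative placeholders, not the reporter's actual manifest.)
```yaml
# Illustrative only: a ServiceMonitor selecting the collector Service and
# scraping its self-telemetry endpoint (port 8889 in this setup).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-daemonset-monitoring
  namespace: system-opentelemetry-operator
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
  endpoints:
    - port: monitoring
      interval: 30s
```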
Do you still think I should open an issue in the collector repo?
Yes, I think it is most likely an issue with the Prometheus exporter. @swiatekm WDYT?
@sigurdfalk it would help a lot if you could post your collector configuration. This is a problem that could conceivably be caused by any of the prometheus-related components.
This is my config (copied after it was applied to k8s, so some noisy/sensitive fields are redacted or removed):
```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  labels:
    app: opentelemetry-operator
  name: otel-daemonset
  namespace: system-opentelemetry-operator
spec:
  config:
    exporters:
      otlphttp/loki:
        endpoint: xxx
        headers:
          Authorization: ${GRAFANA_LOKI_BASIC_AUTH}
          X-Scope-OrgID: xxx
    extensions:
      file_storage:
        directory: /var/lib/otelcol
      health_check:
        endpoint: 0.0.0.0:13134
    processors:
      batch: {}
      filter/loki-k8s:
        ...
      k8sattributes:
        ...
      memory_limiter:
        ...
      transform/filelog-receiver:
        ...
    receivers:
      filelog/k8s:
        ...
    service:
      extensions:
        - health_check
        - file_storage
      pipelines:
        logs/loki-k8s:
          exporters:
            - otlphttp/loki
          processors:
            - memory_limiter
            - k8sattributes
            - filter/loki-k8s
            - transform/filelog-receiver
            - batch
          receivers:
            - filelog/k8s
      telemetry:
        logs:
          encoding: json
          level: info
        metrics:
          address: 0.0.0.0:8889
          level: detailed
  configVersions: 3
  daemonSetUpdateStrategy: {}
  deploymentUpdateStrategy: {}
  env:
    - name: GOMEMLIMIT
      value: 13000MiB
  envFrom:
    - secretRef:
        name: collector-grafana-loki-auth
  hostNetwork: true
  ingress:
    route: {}
  ipFamilyPolicy: SingleStack
  managementState: managed
  mode: daemonset
  observability:
    metrics:
      enableMetrics: true
  podDnsConfig: {}
  replicas: 1
  resources: {}
  securityContext:
    runAsGroup: 0
    runAsUser: 0
  serviceAccount: otel-daemonset
  targetAllocator:
    allocationStrategy: consistent-hashing
    filterStrategy: relabel-config
    observability:
      metrics: {}
    prometheusCR:
      scrapeInterval: 30s
    resources: {}
  tolerations:
    - effect: NoSchedule
      operator: Exists
  upgradeStrategy: automatic
  volumeMounts:
    - mountPath: /var/log/pods
      name: varlogpods
      readOnly: true
    - mountPath: /var/lib/otelcol
      name: varlibotelcol
  volumes:
    - hostPath:
        path: /var/log/pods
      name: varlogpods
    - hostPath:
        path: /var/lib/otelcol
        type: DirectoryOrCreate
      name: varlibotelcol
```
Are you sure that's the right one? It doesn't have any prometheus components in it, and only seems to be shipping logs to Loki.
The issue is not metrics being sent by an exporter in the collector. It's when Prometheus scrapes the metrics from the collector itself on port 8889 and then sends them to Mimir via remote write.
I think you're suffering from open-telemetry/opentelemetry-collector#12458, but I'm not sure why, given that you're using the deprecated syntax for configuring the Prometheus endpoint. Can you post the output of the collector's Prometheus endpoint (that is, what it serves on port 8889)?
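(For reference, and not from the thread itself: per the collector's telemetry configuration documentation, the deprecated `service::telemetry::metrics::address` setting maps onto a reader-based configuration roughly like the sketch below, assuming the same host and port as the config above.)
```yaml
# Sketch of the newer self-telemetry configuration that replaces
# service::telemetry::metrics::address; host/port kept as in the original config.
service:
  telemetry:
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0
                port: 8889
```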
Oh, I was not aware that was deprecated. I will try to update to the new config then. Anyway, this is the output on 8889 which causes the errors:
That output doesn't have any of the problems Mimir complains about. Are you sure you aren't introducing them in some intermediate processing step?
Yes, I also thought that was strange, and wondered whether we are doing something in between. However, since I can toggle the issue on and off by switching back and forth between collector versions, I find it hard to see what that could be. I can't really think of anything on our end that would affect this; our setup is pretty standard.
Component(s)
collector
What happened?
Description
After upgrading from Operator version v0.117.0 and collector version v0.117.0 (to the 0.120.0 versions listed below), we see metrics being dropped by Mimir due to "err-mimir-metric-name-invalid" and "err-mimir-label-invalid". More details about the specific metric and label are in the log output.
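(Context added here, not part of the original report: Mimir's default validation follows the classic Prometheus naming rules, so `err-mimir-metric-name-invalid` means the metric name does not match `[a-zA-Z_:][a-zA-Z0-9_:]*`, and `err-mimir-label-invalid` means a label name does not match `[a-zA-Z_][a-zA-Z0-9_]*`. For example:)
```
# Illustrative series, not taken from the issue's logs:
http_requests_total{method="GET"}         # accepted: valid metric and label names
http.server.duration{http.method="GET"}   # rejected: '.' is not allowed by the classic rules
```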
Steps to Reproduce
We install the OTEL Operator with the Helm Chart v0.84.2 with the following values:
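(The actual Helm values were not captured in this copy of the issue. Purely to show the shape, a minimal values.yaml for the opentelemetry-operator chart might look like the sketch below; the keys and values shown are illustrative, not the reporter's settings.)
```yaml
# Illustrative only: minimal opentelemetry-operator chart values; not the
# values actually used by the reporter.
manager:
  collectorImage:
    repository: otel/opentelemetry-collector-contrib
    tag: 0.120.0
```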
Expected Result
Actual Result
Kubernetes Version
1.30.11
Operator version
0.120.0
Collector version
0.120.0
Environment information
Environment
OS: AzureLinux on AKS
Compiler (if manually compiled): (e.g., "go 14.2")
Log output
From Prometheus logs:
Additional context
No response