feat!: OpenTelemetry metrics #13265

Open: wants to merge 2 commits into main

Joibel (Member) commented Jun 28, 2024

Led by #12589.

As discussed in #12589, OpenTelemetry is the future of observability. This PR changes the underlying codebase to collect metrics using the otel libraries, whilst attempting to retain compatibility with Prometheus scraping where this makes sense.

It helps lay the groundwork for adding workflow tracing using otel.

This PR amends and extends the built-in metrics to be more useful and correctly named.

This PR does not attempt to do anything else with otel, nor does it attempt to change custom metrics.

Mostly removes the Prometheus libraries and replaces them with otel libraries.

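Allows use of both Prometheus /metrics scraping and OpenTelemetry protocol transmission.

Extends the workqueue metrics to include all of those exposed by client-go/workqueue (https://github.com/kubernetes/client-go/blob/babfe96631905312244f34500d2fd1a8a10b186c/util/workqueue/metrics.go#L173).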

Removed argo_workflows_workflows_processed_count as a non-functioning metric.

originName enhancements proposed in #12589 were not added, nor was argo_workflows_workflowtemplate_used_total.

metricsTTL for histogram custom metrics isn't implementable, as the OpenTelemetry API by design provides no way to delete metrics. TTL only works for the other custom metric types by making them all asynchronous/observable (even for non-realtime metrics) and simply not reporting metrics whose TTL has expired.

Note to reviewers: this is part of a stack of reviews for metrics changes. Please don't merge until the rest of the stack is also ready.

Supersedes #13232

@Joibel Joibel self-assigned this Jun 28, 2024
@Joibel Joibel changed the title feat: OpenTelemetry metrics feat!: OpenTelemetry metrics Jun 28, 2024
@Joibel Joibel marked this pull request as draft June 28, 2024 13:42
@Joibel Joibel removed their assignment Jun 28, 2024
@Joibel Joibel marked this pull request as ready for review June 30, 2024 09:48
@Joibel Joibel added the area/controller (Controller issues, panics), area/metrics, and prioritized-review (For members of the Sustainability Effort) labels Jul 1, 2024
Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel force-pushed the opentelemetry branch 2 times, most recently from 066e9a7 to 5974967, July 5, 2024 10:11
Signed-off-by: Alan Clucas <alan@clucas.org>