feat: add check_http_metrics task for Prometheus metrics assertions #186

Draft
chetanyb wants to merge 5 commits into ethpandaops:master from chetanyb:check-http-metrics

Summary

  • Add new check_http_metrics task for evaluating assertions against Prometheus metrics endpoints
  • Supports value mode (current value) and delta mode (change over time) assertions
  • Includes counter reset detection, missing metric/series handling, and label subset matching

Features

Assertion Modes:

  • Value mode: Assert on current metric value (e.g., value > 100)
  • Delta mode: Assert on change since baseline (e.g., counter increased by at least 1, gauge decreased)
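
The two modes can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: it assumes the delta is computed as `current - baseline`, the function names `compare` and `evaluate` are hypothetical, and the operators `lt`, `eq`, and `ne` are assumed in addition to the `gt`, `gte`, and `lte` that appear in the example config.

```go
package main

import "fmt"

// compare applies a comparison operator to an observed number and a threshold.
// gt, gte and lte appear in the PR's example config; lt, eq and ne are assumed.
func compare(op string, observed, threshold float64) (bool, error) {
	switch op {
	case "gt":
		return observed > threshold, nil
	case "gte":
		return observed >= threshold, nil
	case "lt":
		return observed < threshold, nil
	case "lte":
		return observed <= threshold, nil
	case "eq":
		return observed == threshold, nil
	case "ne":
		return observed != threshold, nil
	default:
		return false, fmt.Errorf("unknown operator %q", op)
	}
}

// evaluate reports whether an assertion holds. In "delta" mode the compared
// number is current-baseline (an assumption); otherwise it is the current value.
func evaluate(mode, op string, baseline, current, threshold float64) (bool, error) {
	observed := current
	if mode == "delta" {
		observed = current - baseline
	}
	return compare(op, observed, threshold)
}

func main() {
	ok, _ := evaluate("delta", "gte", 10, 12, 1) // counter rose by 2
	fmt.Println("counter_increased:", ok)
	ok, _ = evaluate("delta", "lte", 50, 48, -1) // gauge fell by 2
	fmt.Println("gauge_dropped:", ok)
	ok, _ = evaluate("value", "gt", 0, 100, 100) // 100 is not greater than 100
	fmt.Println("value_check:", ok)
}
```

A "gauge decreased" check is expressed as a negative threshold with `lte`, which is why the delta keeps its sign rather than being an absolute difference.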

Metric Support:

| Type | Value Extracted |
|---|---|
| COUNTER | Counter value (with reset detection) |
| GAUGE | Gauge value |
| UNTYPED | Untyped value |
| SUMMARY | Sample sum |
| HISTOGRAM | Sample sum |

Configurable Behaviors:

  • missingMetric / missingSeries: wait, fail, or pass
  • resetBehavior: fail, rebaseline, or ignore (COUNTER only)
  • failOnCheckMiss: Fail immediately vs keep polling
  • continueOnPass: Keep monitoring after success

Safety Features:

  • Response size limit (maxResponseSize)
  • Request timeout (requestTimeout)
  • Non-finite value detection (NaN/Inf)
  • Label subset matching must select exactly one series

Example Usage

- name: check_metrics
  task: check_http_metrics
  config:
    url: "http://localhost:9090/metrics"
    pollInterval: 10s
    assertions:
      # Check counter increased by at least 1
      - name: counter_increased
        metric: my_counter
        labels:
          env: prod
        mode: delta
        operator: gte
        value: 1

      # Check gauge decreased (negative delta)
      - name: gauge_dropped
        metric: my_gauge
        mode: delta
        operator: lte
        value: -1

      # Check current value is above threshold
      - name: value_check
        metric: my_metric
        operator: gt
        value: 100

Outputs

| Output | Type | Description |
|---|---|---|
| passedAssertions | array | Assertion names that passed |
| failedAssertions | array | Assertion names that failed |
| values | object | Map of assertion name to latest value |
| deltas | object | Map of assertion name to computed delta |
| baselines | object | Map of assertion name to baseline value |
| scrapeErrors | int | HTTP/parsing error count |
| assertionErrors | int | Assertion evaluation error count |

Files

| File | Lines | Description |
|---|---|---|
| pkg/tasks/check_http_metrics/config.go | 235 | Configuration structs and validation |
| pkg/tasks/check_http_metrics/task.go | 599 | Task implementation |
| pkg/tasks/check_http_metrics/task_test.go | 1604 | 37 unit tests |
| pkg/tasks/check_http_metrics/README.md | 131 | Documentation |
| pkg/tasks/tasks.go | +2 | Task registration |

Test Plan

  • Unit tests pass with race detection (go test -race)
  • Linter passes (golangci-lint run --new-from-rev="origin/master")
  • Build succeeds (go build ./...)
  • Code reviewed for correctness and style
  • Manual test against real Prometheus endpoint
