observability-zero-to-hero day-3



📊 Metrics in Prometheus

  • Metrics in Prometheus are the core data objects that represent measurements collected from monitored systems.

  • These metrics provide insights into various aspects of system performance, health, and behavior.

🏷️ Labels:

  • Every metric can carry a set of labels.

  • Labels are key-value pairs that allow you to differentiate between dimensions of a metric, such as different services, instances, or endpoints.

🔍 Example:

container_cpu_usage_seconds_total{namespace="kube-system", endpoint="https-metrics"}
  • container_cpu_usage_seconds_total is the metric.

  • {namespace="kube-system", endpoint="https-metrics"} are the labels.

🛠️ What is PromQL?

  • PromQL (Prometheus Query Language) is the query language Prometheus provides for reading the data it stores.

  • It allows you to retrieve and manipulate time series data, perform mathematical operations, aggregate data, and much more.

  • 🔑 Key Features of PromQL:

    • Selecting Time Series: You can select specific metrics with filters and retrieve their data.

    • Mathematical Operations: PromQL allows for mathematical operations on metrics.

    • Aggregation: You can aggregate data across multiple time series.

    • Functions: PromQL includes a wide range of built-in functions to analyze and manipulate data.
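
To make the "Mathematical Operations" feature concrete, here is a minimal sketch. It assumes the cAdvisor metric container_memory_usage_bytes is being scraped and uses "kube-system" only as an illustrative namespace; each expression is a separate query.

```promql
# Arithmetic with a scalar: convert memory usage from bytes to MiB
container_memory_usage_bytes{namespace="kube-system"} / 1024 / 1024

# Comparison operators act as filters: keep only series currently above 100 MiB
container_memory_usage_bytes{namespace="kube-system"} > 100 * 1024 * 1024
```

Arithmetic operators bind more tightly than comparison operators, so the right-hand side of the second query is evaluated as a single number before the comparison is applied.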

💡 Basic Examples of PromQL

  • container_cpu_usage_seconds_total

    • Return all time series with the metric container_cpu_usage_seconds_total
  • container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}

    • Return all time series with the metric container_cpu_usage_seconds_total and the given namespace and pod labels.
  • container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}[5m]

    • Return all samples from the last 5 minutes (up to the query time) for the same selector. The result is a range vector rather than an instant vector.
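
The selectors above use the = and =~ matchers. PromQL also supports the negated forms != and !~. A short sketch follows; the label values are illustrative assumptions, and each expression is a separate query.

```promql
# != : every series outside the kube-system namespace
container_cpu_usage_seconds_total{namespace!="kube-system"}

# =~ : pods whose name starts with kube-proxy or coredns
container_cpu_usage_seconds_total{pod=~"(kube-proxy|coredns).*"}

# !~ : kube-system pods except those matching the regex
container_cpu_usage_seconds_total{namespace="kube-system", pod!~"kube-proxy.*"}
```

Regex matchers are fully anchored, which is why the patterns end in .* to match the generated suffix of a pod name.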

⚙️ Aggregation & Functions in PromQL

  • Aggregation in PromQL combines multiple time series into fewer series: a single series by default, or one series per group when you group by labels.

  • Sum Up All CPU Usage:

    • This query aggregates the CPU usage across all nodes.
  • Average Memory Usage per Namespace:

      avg(container_memory_usage_bytes) by (namespace)
    • This query provides the average memory usage grouped by namespace.
  • rate() Function:

    • The rate() function calculates the per-second average rate of increase of the time series in a specified range.
    • This calculates the per-second rate of CPU usage, averaged over the last 5 minutes.
  • increase() Function:

    • The increase() function returns the increase in a counter over a specified time range.
    • This gives the total increase in container restarts over the last hour.
  • histogram_quantile() Function:

    • The histogram_quantile() function calculates quantiles (e.g., 95th percentile) from histogram data.
    histogram_quantile(0.95, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))
    • This calculates the 95th percentile of Kubernetes API request durations.
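
The "Sum Up All CPU Usage", rate() and increase() items above describe queries whose text is not shown. The sketch below shows what such queries could look like. The metric names are assumptions (node_cpu_seconds_total from node_exporter, kube_pod_container_status_restarts_total from kube-state-metrics) and may differ from the ones the original queries used.

```promql
# Sum up CPU usage across all nodes:
# per-second rate over 5 minutes for each series, then summed into one value
sum(rate(node_cpu_seconds_total[5m]))

# rate(): per-second rate of container CPU usage over the last 5 minutes
rate(container_cpu_usage_seconds_total[5m])

# increase(): total increase in container restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])
```

Note that node_cpu_seconds_total counts every CPU mode, including idle, so adding a mode!="idle" matcher to the first query would restrict it to busy CPU time.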


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubernetes-prometheus-stack
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /prometheus
            pathType: Prefix
            backend:
              service:
                name: prometheus-service  # Change this to your Prometheus service name
                port:
                  number: 9090
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana-service  # Change this to your Grafana service name
                port:
                  number: 3000
          - path: /alertmanager
            pathType: Prefix
            backend:
              service:
                name: alertmanager-service  # Change this to your Alertmanager service name
                port:
                  number: 9093


Feel free to reach out to me if you have any other questions.

Happy Learning!