Metrics in Prometheus
Metrics in Prometheus are the core data objects that represent measurements collected from monitored systems.
These metrics provide insights into various aspects of system performance, health, and behavior.
Labels:
Metrics are paired with Labels.
Labels are key-value pairs that allow you to differentiate between dimensions of a metric, such as different services, instances, or endpoints.
Example:
container_cpu_usage_seconds_total{namespace="kube-system", endpoint="https-metrics"}
container_cpu_usage_seconds_total is the metric name.
{namespace="kube-system", endpoint="https-metrics"} are the labels.
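To make the split between metric name and labels concrete, here is a minimal Python sketch that takes the selector above apart. The helper name split_selector is hypothetical, and it only handles simple equality matchers with double-quoted values; it is not how Prometheus itself parses selectors.

```python
import re


def split_selector(selector: str) -> tuple[str, dict[str, str]]:
    """Split 'name{k="v", ...}' into the metric name and a dict of labels.

    Simplified: only equality matchers with double-quoted values that
    contain no escaped quotes are recognised.
    """
    name, _, rest = selector.partition("{")
    labels = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', rest))
    return name.strip(), labels


name, labels = split_selector(
    'container_cpu_usage_seconds_total{namespace="kube-system", endpoint="https-metrics"}'
)
print(name)    # container_cpu_usage_seconds_total
print(labels)  # {'namespace': 'kube-system', 'endpoint': 'https-metrics'}
```

Each distinct combination of label values identifies its own time series, which is why the labels come back as a dictionary next to the name.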
What is PromQL?
PromQL (Prometheus Query Language) is a powerful and flexible query language used to query data from Prometheus.
It allows you to retrieve and manipulate time series data, perform mathematical operations, aggregate data, and much more.
Key Features of PromQL:
Selecting Time Series: You can select specific metrics, filter them by label, and retrieve their data.
Mathematical Operations: PromQL supports arithmetic and comparison operators between time series and scalar values.
Aggregation: You can aggregate data across multiple time series.
Functions: PromQL includes a wide range of functions to analyze and manipulate data.
Basic Examples of PromQL
container_cpu_usage_seconds_total
- Return all time series with the metric container_cpu_usage_seconds_total
container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}
- Return all time series with the metric container_cpu_usage_seconds_total whose namespace label equals kube-system and whose pod label matches the regular expression kube-proxy.*
container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}[5m]
- Return all samples from the 5 minutes up to the query time for the same selector. The result is a range vector rather than an instant vector.
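Queries like these are usually sent to Prometheus over its HTTP API. As a rough sketch, the snippet below builds the URL for an instant query against the /api/v1/query endpoint. The helper name instant_query_url and the address http://localhost:9090 are assumptions for illustration; substitute your own server address.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    # urlencode percent-escapes the braces, quotes and regex characters in the query.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


url = instant_query_url(
    "http://localhost:9090",  # assumed address of the Prometheus server
    'container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}',
)
print(url)
```

Fetching that URL with any HTTP client returns a JSON document whose data.result field holds the matching series.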
Aggregation & Functions in PromQL
Aggregation in PromQL combines multiple time series into fewer series: either a single series, or one series per group when you group by labels.
Sum Up All CPU Usage:
sum(rate(node_cpu_seconds_total[5m]))
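- This query sums the per-second rate of CPU time across all nodes, CPU cores, and modes. Because node_cpu_seconds_total also counts idle time, add a matcher such as mode!="idle" to measure actual usage.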
Average Memory Usage per Namespace:
avg(container_memory_usage_bytes) by (namespace)
- This query provides the average memory usage grouped by namespace.
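The grouping step behind "avg ... by (namespace)" can be sketched in a few lines of Python. This is only an illustration of the semantics, not Prometheus code: the helper avg_by and the sample values are made up, and each sample is modelled as a (labels, value) pair.

```python
from collections import defaultdict


def avg_by(samples, label):
    """Average sample values per distinct value of `label`.

    A series without the label falls into the "" group, which mirrors how
    PromQL treats a missing label as an empty one.
    """
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels.get(label, "")].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical memory readings in bytes, one per container series.
samples = [
    ({"namespace": "kube-system", "pod": "kube-proxy-a"}, 100.0),
    ({"namespace": "kube-system", "pod": "kube-proxy-b"}, 300.0),
    ({"namespace": "default", "pod": "web-1"}, 50.0),
]
print(avg_by(samples, "namespace"))  # {'kube-system': 200.0, 'default': 50.0}
```

All labels other than the grouping label are dropped from the result, which is why the output is keyed by namespace alone.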
rate() Function:
- The rate() function calculates the per-second average rate of increase of the time series in a specified range.
rate(container_cpu_usage_seconds_total[5m])
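- This calculates the per-second average rate at which each container consumed CPU time over the last 5 minutes, which roughly corresponds to the number of CPU cores in use.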
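To show what "per-second average rate of increase" means for a counter, here is a simplified Python sketch. The function simple_rate and the sample data are hypothetical. It handles counter resets but, unlike the real rate(), it does not extrapolate to the edges of the time window, so its results can differ slightly from Prometheus.

```python
def simple_rate(samples):
    """Per-second average increase over a list of (timestamp, value) samples.

    A drop in value is treated as a counter reset: the counter is assumed
    to have restarted from zero, so the whole new value counts as increase.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical 5-minute window, one sample per minute, with a reset at t=180.
window = [(0, 10.0), (60, 40.0), (120, 70.0), (180, 30.0), (240, 60.0), (300, 90.0)]
print(simple_rate(window))  # 0.5
```

The counter grows by 150 in total over 300 seconds, so the rate is 0.5 per second even though the raw value dropped at the reset.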
increase() Function:
- The increase() function returns the increase in a counter over a specified time range.
increase(kube_pod_container_status_restarts_total[1h])
- This gives the total increase in container restarts over the last hour.
histogram_quantile() Function:
- The histogram_quantile() function calculates quantiles (e.g., 95th percentile) from histogram data.
histogram_quantile(0.95, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))
- This calculates the 95th percentile of Kubernetes API request durations.
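The estimate behind histogram_quantile() can be approximated with linear interpolation inside the bucket that contains the requested rank. The sketch below is a simplified illustration: the function simple_histogram_quantile and the bucket counts are made up, and it assumes cumulative buckets with positive upper bounds, sorted and ending with +Inf.

```python
import math


def simple_histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    The lowest bucket is assumed to start at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bucket boundary.
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request-duration buckets: (le in seconds, cumulative count).
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(simple_histogram_quantile(0.95, buckets))  # about 0.8125
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the true value.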
ingress_kube_prom_stack.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubernetes-prometheus-stack
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /prometheus
            pathType: Prefix
            backend:
              service:
                name: prometheus-service # Change this to your Prometheus service name
                port:
                  number: 9090
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana-service # Change this to your Grafana service name
                port:
                  number: 3000
          - path: /alertmanager
            pathType: Prefix
            backend:
              service:
                name: alertmanager-service # Change this to your Alertmanager service name
                port:
                  number: 9093
Feel free to reach out if you have any other questions.
Happy Learning!