observability-zero-to-hero day-3

📊 Metrics in Prometheus

Metrics in Prometheus are the core data objects that represent measurements collected from monitored systems.
These metrics provide insights into various aspects of system performance, health, and behavior.

🏷️ Labels:

Metrics are paired with Labels.
Labels are key-value pairs that allow you to differentiate between dimensions of a metric, such as different services, instances, or endpoints.

🔍 Example:

container_cpu_usage_seconds_total{namespace="kube-system", endpoint="https-metrics"}

container_cpu_usage_seconds_total is the metric.
{namespace="kube-system", endpoint="https-metrics"} are the labels.
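
The split between metric name and labels can be illustrated with a small Python sketch. This is not how Prometheus stores data internally; the `Series` type, the `make_series` helper, and the `default` namespace value are hypothetical and exist only to show that each distinct label combination is its own time series.

```python
from typing import NamedTuple


class Series(NamedTuple):
    """Hypothetical stand-in for the identity of one time series."""
    name: str
    labels: frozenset  # set of (label_name, label_value) pairs


def make_series(name: str, **labels: str) -> Series:
    # Freeze the labels so the series identity is hashable and order-independent.
    return Series(name, frozenset(labels.items()))


a = make_series("container_cpu_usage_seconds_total",
                namespace="kube-system", endpoint="https-metrics")
b = make_series("container_cpu_usage_seconds_total",
                namespace="default", endpoint="https-metrics")

# Same metric name, different label values: two distinct time series.
print(a.name == b.name)  # True
print(a == b)            # False
```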

🛠️ What is PromQL?

PromQL (Prometheus Query Language) is a powerful and flexible query language used to query data from Prometheus.
It allows you to retrieve and manipulate time series data, perform mathematical operations, aggregate data, and much more.

🔑 Key Features of PromQL:
- Selecting Time Series: You can select specific metrics with label filters and retrieve their data.
- Mathematical Operations: You can apply arithmetic and comparison operators to metrics.
- Aggregation: You can aggregate data across multiple time series.
- Functions: PromQL includes a wide range of functions to analyze and manipulate data.

💡 Basic Examples of PromQL

container_cpu_usage_seconds_total
- Returns all time series for the metric container_cpu_usage_seconds_total.

container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}
- Returns only the time series of that metric whose namespace label equals kube-system and whose pod label matches the regular expression kube-proxy.* (the =~ operator is a regex match).

container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-proxy.*"}[5m]
- Returns, for the same series, every sample from the 5 minutes up to the query time. The [5m] suffix turns the instant vector into a range vector.
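
How the `=` and `=~` matchers narrow down a selection can be sketched in Python. This is a simplified illustration, not Prometheus code: the `matches` helper and the pod names are hypothetical, and Python's `re` module stands in for the RE2 syntax that Prometheus actually uses. The sketch relies on two PromQL behaviours: regex matchers are fully anchored, and a missing label is treated as an empty string.

```python
import re
from typing import Optional


def matches(labels: dict, equal: Optional[dict] = None,
            regex: Optional[dict] = None) -> bool:
    """Return True if `labels` satisfies every `=` and `=~` matcher."""
    for key, want in (equal or {}).items():
        if labels.get(key, "") != want:
            return False
    for key, pattern in (regex or {}).items():
        # Fully anchored match, so "kube-proxy.*" must cover the whole value.
        if re.fullmatch(pattern, labels.get(key, "")) is None:
            return False
    return True


# Hypothetical label sets of three series of the same metric.
series = [
    {"namespace": "kube-system", "pod": "kube-proxy-abc12"},
    {"namespace": "kube-system", "pod": "coredns-xyz34"},
    {"namespace": "default", "pod": "kube-proxy-def56"},
]

selected = [s for s in series
            if matches(s, equal={"namespace": "kube-system"},
                       regex={"pod": "kube-proxy.*"})]
print(selected)  # only the kube-proxy pod in kube-system remains
```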

⚙️ Aggregation & Functions in PromQL

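Aggregation in PromQL combines multiple time series into fewer series: a single one, or one per group when you aggregate by certain labels.

Sum Up All CPU Usage:
```
  sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))
```
- This query sums the per-second CPU time consumed across all nodes, CPUs, and modes. The mode!="idle" matcher excludes idle time; without it, the sum would approach the total number of CPU cores regardless of load.

Average Memory Usage per Namespace:
```
  avg(container_memory_usage_bytes) by (namespace)
```
- This query averages container_memory_usage_bytes across all series in each namespace and returns one value per namespace.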
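
The grouping behind `avg(...) by (namespace)` can be sketched in Python as follows. The `avg_by` helper and the sample values are made up for illustration; Prometheus evaluates this inside its query engine, not like this.

```python
from collections import defaultdict

# Hypothetical instant-vector samples: (labels, value in bytes).
samples = [
    ({"namespace": "kube-system", "pod": "a"}, 100.0),
    ({"namespace": "kube-system", "pod": "b"}, 300.0),
    ({"namespace": "default", "pod": "c"}, 50.0),
]


def avg_by(samples, label):
    """Group samples by one label and average the values in each group."""
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels.get(label, "")].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


result = avg_by(samples, "namespace")
print(result)  # {'kube-system': 200.0, 'default': 50.0}
```

All labels other than the grouping label (here `pod`) are dropped from the result, which is also what PromQL does for `by (namespace)`.
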
rate() Function:
- The rate() function calculates the per-second average rate of increase of a counter over the specified time range.

    rate(container_cpu_usage_seconds_total[5m])

This calculates the average CPU time consumed per second over the last 5 minutes, which is equivalent to the number of CPU cores in use.
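
The core idea of rate() can be sketched in a few lines of Python. This is a deliberate simplification: the real function also handles counter resets and extrapolates to the edges of the range, both of which this sketch ignores. The `simple_rate` name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second average increase between the first and last sample.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Assumes the counter never resets inside the window.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical CPU-seconds counter scraped once a minute.
window = [(0, 120.0), (60, 150.0), (120, 180.0), (180, 210.0), (240, 240.0)]
print(simple_rate(window))  # 0.5
```

A result of 0.5 here means half a CPU-second consumed per wall-clock second, that is, half a core on average.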

increase() Function:
- The increase() function returns the increase in a counter over a specified time range.

    increase(kube_pod_container_status_restarts_total[1h])

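This gives the number of container restarts over the last hour for each series. Because Prometheus extrapolates to the edges of the time range, the result can be a non-integer value.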
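
Counters drop back to zero when the process behind them restarts, and increase() compensates for that. The Python sketch below shows the idea under simplifying assumptions: it skips the extrapolation Prometheus applies, and `simple_increase` and the sample data are hypothetical.

```python
def simple_increase(samples):
    """Total counter increase across samples, tolerating resets to zero.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so the new value is the
        # whole increase since the reset.
        total += cur if cur < prev else cur - prev
    return total


# Hypothetical restart counter over one hour, with a reset at t=1800.
restarts = [(0, 3.0), (900, 4.0), (1800, 0.0), (2700, 1.0), (3600, 2.0)]
print(simple_increase(restarts))  # 3.0
```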

histogram_quantile() Function:
- The histogram_quantile() function calculates quantiles (e.g., 95th percentile) from histogram data.

    histogram_quantile(0.95, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le))

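This estimates the 95th percentile of Kubernetes API request durations over the last 5 minutes. Aggregating by le keeps the bucket boundaries that histogram_quantile() needs.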
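
The estimate comes from linear interpolation inside the bucket that contains the requested rank. The Python sketch below illustrates that idea for classic cumulative buckets. It assumes 0 < q <= 1, positive bucket bounds, and a final +Inf bucket; the `bucket_quantile` name and the bucket counts are hypothetical.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with float('inf') as the last upper bound. Requires 0 < q <= 1.
    """
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical request-duration buckets (seconds, cumulative count).
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
print(bucket_quantile(0.95, buckets))  # 0.8125
```

Because of the even-spread assumption, the accuracy of the result depends on how closely the bucket boundaries bracket the real latencies.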

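ingress_kube_prom_stack.yaml

This Ingress manifest exposes Prometheus, Grafana, and Alertmanager through an internet-facing AWS Application Load Balancer, each under its own path prefix: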

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubernetes-prometheus-stack
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /prometheus
            pathType: Prefix
            backend:
              service:
                name: prometheus-service  # Change this to your Prometheus service name
                port:
                  number: 9090
          - path: /grafana
            pathType: Prefix
            backend:
              service:
                name: grafana-service  # Change this to your Grafana service name
                port:
                  number: 3000
          - path: /alertmanager
            pathType: Prefix
            backend:
              service:
                name: alertmanager-service  # Change this to your Alertmanager service name
                port:
                  number: 9093
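
Once the Ingress is up, PromQL queries can also be sent to the Prometheus HTTP API, whose instant-query endpoint is `/api/v1/query`. The Python sketch below only builds the request URL. The hostname is a placeholder, and the sketch assumes Prometheus is configured to serve under the `/prometheus` route prefix used in the manifest above; adjust both to your setup.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for the Prometheus instant-query endpoint."""
    params = urlencode({"query": promql})  # percent-encodes braces and quotes
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


# Hypothetical ALB hostname; replace it with the address of your Ingress.
base = "http://my-alb.example.com/prometheus"
url = instant_query_url(base, 'up{namespace="kube-system"}')
print(url)
```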

Feel free to reach out to me if you have any other questions.

Happy Learning!