☸️ Container Orchestration in MLOps — Kubernetes & Helm Introduction


🚀 Introduction

Machine Learning (ML) projects often start small — maybe a single model served through Flask or FastAPI — but as the application scales, managing multiple models, APIs, and services becomes complex.

This is where Kubernetes (K8s) comes in: a container orchestration platform that automates deployment, scaling, and management of containerized ML applications.
And to simplify Kubernetes configuration, we use Helm, the package manager for Kubernetes.

Together, they form the backbone of production-grade MLOps systems.

🧩 Introduction to Kubernetes for MLOps

Kubernetes (often called K8s) is an open-source platform originally developed by Google to manage containers at scale.

It allows you to:

Automatically deploy, scale, and monitor your ML models.
Manage hundreds of containers efficiently.
Handle failures, load balancing, and service discovery.

In MLOps, Kubernetes helps deploy ML models as microservices, orchestrate data pipelines, and manage distributed training jobs.
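
As one illustration of the training-job case, a parallel training run could be described as a Kubernetes Job. This is only a minimal sketch, not a full distributed-training setup: the Job name, image, command, and worker count below are hypothetical placeholders.

```yaml
# Sketch: two training workers run as an Indexed Job.
# The image and command are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  completions: 2            # total worker Pods that must finish successfully
  parallelism: 2            # run both workers at the same time
  completionMode: Indexed   # each Pod gets JOB_COMPLETION_INDEX (0 or 1)
  backoffLimit: 3           # tolerate up to 3 Pod failures before the Job fails
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: example.com/ml-trainer:v1   # hypothetical image
          command: ["python", "train.py"]    # hypothetical entrypoint
```

Each worker could read the JOB_COMPLETION_INDEX environment variable to decide which shard of the data to process.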

🏗️ Overview of Kubernetes Architecture

A Kubernetes cluster is divided into a Control Plane and Worker Nodes.

🧠 Control Plane Components

API Server: Entry point for all Kubernetes commands (kubectl).
etcd: Key-value store for cluster state and configurations.
Controller Manager: Ensures desired state (e.g., replicas running).
Scheduler: Assigns pods to nodes based on resources.

⚙️ Worker Node Components

Kubelet: Communicates with the Control Plane and runs pods.
Kube-proxy: Manages network rules for services.
Container Runtime: Executes containers (e.g., containerd or CRI-O; Docker Engine needs the cri-dockerd adapter since Kubernetes 1.24).

🖼️ Kubernetes Architecture Diagram

+---------------------------------------------------+
|                    Control Plane                  |
|  +------------+   +------------+   +------------+ |
|  | Controller |-->| API Server |<--| Scheduler  | |
|  +------------+   +------------+   +------------+ |
|                         |                         |
|                       etcd                        |
+---------------------------------------------------+
          |
          v
+---------------------------------------------------+
|                    Worker Nodes                   |
|  +-----------+   +-----------+   +-----------+    |
|  | Kubelet   |   | Kubelet   |   | Kubelet   |    |
|  | Pod (ML)  |   | Pod (API) |   | Pod (DB)  |    |
|  +-----------+   +-----------+   +-----------+    |
+---------------------------------------------------+

🧱 Managing Containers with Kubernetes

Kubernetes manages containers as Pods — the smallest deployable unit.
A Pod can contain one or more containers (e.g., ML model + monitoring agent).
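
To make the multi-container case concrete, here is a minimal sketch of a Pod that runs the model API next to a monitoring sidecar. The Pod name and the sidecar image are hypothetical; the model image and port are the ones used in the Deployment example later in this article.

```yaml
# Sketch: one Pod running the model API plus a sidecar container.
apiVersion: v1
kind: Pod
metadata:
  name: ml-model-with-agent                   # hypothetical name
spec:
  containers:
    - name: ml-model
      image: bittusharma/ml-api:v1            # model image used in this article
      ports:
        - containerPort: 5000
    - name: monitoring-agent
      image: example.com/monitoring-agent:v1  # hypothetical sidecar image
```

Containers in the same Pod share a network namespace, so the agent could reach the model at localhost:5000.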

🔹 Basic Commands

kubectl get pods
kubectl get services
kubectl describe pod <pod-name>
kubectl delete pod <pod-name>

Pods are managed using higher-level controllers like Deployments, ReplicaSets, and DaemonSets.

🚀 Deploying Applications on Kubernetes

Let’s deploy a simple ML model service using Kubernetes.

Step 1: Create a Deployment (YAML)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: bittusharma/ml-api:v1
          ports:
            - containerPort: 5000

Step 2: Create a Service

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: NodePort
  selector:
    app: ml-model
  ports:
    - port: 80
      targetPort: 5000
      nodePort: 30001

Step 3: Apply Configuration

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

Once the Pods are running, the model is reachable on any node's IP address at the NodePort:

http://<node-ip>:30001/predict

⚙️ Setting up Kubernetes Cluster for ML Applications

🧩 Local Setup (Minikube)

For practice, set up Minikube:

minikube start
kubectl get nodes

☁️ Cloud Setup

For production ML workloads, use:

Amazon EKS (Elastic Kubernetes Service)
Google GKE (Google Kubernetes Engine)
Azure AKS (Azure Kubernetes Service)

These managed services run the control plane for you and integrate with the cloud provider's node auto-scaling, load balancers, and monitoring.

🧠 Creating and Managing Pods, Deployments, and Services

Pods

Smallest deployable unit — one or more containers.

kubectl run test-pod --image=nginx

Deployments

Manage a set of identical Pods: they keep the requested number of replicas running and roll out updates gradually.

kubectl create deployment webapp --image=nginx

Services

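Give your Pods a stable network endpoint. The default ClusterIP type is reachable only inside the cluster; NodePort and LoadBalancer types expose Pods to the outside world.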

kubectl expose deployment webapp --type=LoadBalancer --port=80
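
For comparison, the expose command above corresponds roughly to the following manifest. This is a sketch that assumes the Deployment was created with the `kubectl create deployment webapp` command shown earlier, which labels its Pods with app: webapp.

```yaml
# Sketch: declarative form of the LoadBalancer Service for webapp.
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: LoadBalancer
  selector:
    app: webapp        # label set on the Pods by `kubectl create deployment webapp`
  ports:
    - port: 80         # port the Service listens on
      targetPort: 80   # port the nginx container listens on
```

On a local Minikube cluster the external IP of a LoadBalancer Service usually stays pending unless `minikube tunnel` is running.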

Scaling

kubectl scale deployment webapp --replicas=3

⚓ Using Helm for Kubernetes Management

Kubernetes uses YAML manifests for every component — which can become complex for large ML projects.
Helm simplifies this by packaging all configurations into a reusable format called a Helm Chart.

📦 Introduction to Helm Charts

A Helm Chart is a package of templated Kubernetes manifests. It defines how to deploy your app, with the concrete settings supplied from values.yaml at install time.

Helm Chart Structure:

my-ml-chart/
  ├── Chart.yaml
  ├── values.yaml
  └── templates/
      ├── deployment.yaml
      └── service.yaml

Example `Chart.yaml`

apiVersion: v2
name: ml-model
version: 0.1.0
description: A Helm chart for deploying ML model API

Example `values.yaml`

replicaCount: 2
image:
  repository: bittusharma/ml-api
  tag: v1
service:
  type: NodePort
  port: 80
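
To show how these values reach the cluster, here is a minimal sketch of what `templates/deployment.yaml` could look like. It is hand-written for illustration and is simpler than the scaffold that `helm create` generates; the `-ml-model` name suffix is an assumption, and the container port is taken from the earlier Deployment example.

```yaml
# templates/deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-ml-model
spec:
  replicas: {{ .Values.replicaCount }}        # from values.yaml
  selector:
    matchLabels:
      app: {{ .Release.Name }}-ml-model
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-ml-model
    spec:
      containers:
        - name: ml-model
          # repository and tag both come from values.yaml
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 5000
```

Running `helm template ./ml-model` renders the chart locally, which is a quick way to check the substituted values before installing anything.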

🚀 Deploying Applications with Helm

Step 1: Create a Chart

helm create ml-model

Step 2: Update `values.yaml` and templates

Step 3: Install the Chart

helm install mlapp ./ml-model

Step 4: Check Release

helm list
kubectl get pods

Step 5: Upgrade / Rollback

helm upgrade mlapp ./ml-model
helm rollback mlapp 1
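
An upgrade usually changes a few values rather than the whole chart. One common approach is a small override file; the file name `values-prod.yaml`, the replica count, and the `v2` tag below are hypothetical.

```yaml
# values-prod.yaml (hypothetical file): overrides for a production release
replicaCount: 4
image:
  tag: v2              # hypothetical newer image tag
service:
  type: LoadBalancer
```

It could then be applied with `helm upgrade mlapp ./ml-model -f values-prod.yaml`. Keys that are not listed, such as image.repository, keep their defaults from values.yaml.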

🧠 Helm is to Kubernetes what apt is to Ubuntu — a package manager for simplifying deployments.

🧩 Best Practices for Kubernetes in MLOps

Use Namespaces – Isolate dev/test/prod workloads.
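Leverage ConfigMaps & Secrets – Keep configuration in ConfigMaps and credentials in Secrets. Secrets are only base64-encoded by default, so restrict access with RBAC and enable encryption at rest.
Use Resource Requests & Limits – Prevent one ML container from starving others of CPU, memory, or GPU.
Use Liveness & Readiness Probes – Liveness probes restart unhealthy model containers; readiness probes keep traffic away from Pods that are still loading a model.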
Monitor & Log Everything – Integrate with Prometheus and Grafana.
CI/CD Integration – Automate model builds and deployments using GitHub Actions or Jenkins.
GPU Workloads – Use GPU node pools and NVIDIA device plugins.
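
Several of these practices can be combined in one manifest. The sketch below is illustrative only: the namespace ml-prod, the ConfigMap and Secret names, the /health endpoint, and all resource numbers are assumptions, not values from a real deployment.

```yaml
# Sketch: namespace, ConfigMap, resource limits, and probes together.
apiVersion: v1
kind: Namespace
metadata:
  name: ml-prod                      # hypothetical namespace
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ml-model-config              # hypothetical ConfigMap
  namespace: ml-prod
data:
  MODEL_NAME: "example-model"
  LOG_LEVEL: "info"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
  namespace: ml-prod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: bittusharma/ml-api:v1
          ports:
            - containerPort: 5000
          envFrom:
            - configMapRef:
                name: ml-model-config
            - secretRef:
                name: ml-model-secrets   # assumed to exist; created separately
          resources:
            requests:                    # what the scheduler reserves
              cpu: "500m"
              memory: "512Mi"
            limits:                      # hard ceiling for the container
              cpu: "1"
              memory: "1Gi"
              # nvidia.com/gpu: 1        # only on GPU nodes with the NVIDIA device plugin
          readinessProbe:                # gate traffic until the model is loaded
            httpGet:
              path: /health              # hypothetical health endpoint
              port: 5000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:                 # restart the container if it stops responding
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 10
```

The Secret is left out of the file on purpose so that credentials stay out of version control; it could be created with `kubectl create secret generic` before applying the manifest.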

⚖️ Scaling and Auto-Scaling ML Models

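Kubernetes provides the Horizontal Pod Autoscaler (HPA), which adds or removes Pod replicas based on CPU or memory usage. It needs a metrics source such as metrics-server, and CPU-percentage targets only work when the containers declare CPU requests.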

Example HPA

kubectl autoscale deployment ml-model-deployment --cpu-percent=70 --min=2 --max=10

This automatically scales ML model replicas based on load — ensuring reliability and cost efficiency.
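
The same autoscaler can also be written declaratively, which makes it easier to keep in Git or in a Helm chart. A minimal sketch using the autoscaling/v2 API, with the same target, bounds, and CPU threshold as the command above:

```yaml
# Sketch: declarative equivalent of the kubectl autoscale command.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the Pods' CPU requests
```

After applying it, `kubectl get hpa` shows the current and target utilization.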


Keep Learning……
