Container Orchestration in MLOps: Kubernetes & Helm Introduction

I am Bittu Sharma, a DevOps & AI Engineer with a keen interest in building intelligent, automated systems. My goal is to bridge the gap between software engineering and data science, ensuring scalable deployments and efficient model operations in production. Let's connect! I would love the opportunity to connect and contribute. Feel free to DM me on LinkedIn or reach out to me at bittush9534@gmail.com. I look forward to networking with people in this exciting tech world.
Introduction
Machine Learning (ML) projects often start small, perhaps as a single model served through Flask or FastAPI, but as the application scales, managing multiple models, APIs, and services becomes complex.
This is where Kubernetes (K8s) comes in: a container orchestration platform that automates the deployment, scaling, and management of containerized ML applications.
And to simplify Kubernetes configuration, we use Helm, the package manager for Kubernetes.
Together, they form the backbone of production-grade MLOps systems.
Introduction to Kubernetes for MLOps
Kubernetes (often called K8s) is an open-source platform originally developed by Google to manage containers at scale.
It allows you to:
Automatically deploy, scale, and monitor your ML models.
Manage hundreds of containers efficiently.
Handle failures, load balancing, and service discovery.
In MLOps, Kubernetes helps deploy ML models as microservices, orchestrate data pipelines, and manage distributed training jobs.
Overview of Kubernetes Architecture
The Kubernetes architecture is divided into a Control Plane and Worker Nodes.
Control Plane Components
API Server: Entry point for all Kubernetes commands (kubectl).
etcd: Key-value store for cluster state and configurations.
Controller Manager: Ensures desired state (e.g., replicas running).
Scheduler: Assigns pods to nodes based on resources.
Worker Node Components
Kubelet: Communicates with the Control Plane and runs pods.
Kube-proxy: Manages network rules for services.
Container Runtime: Executes containers (e.g., containerd or CRI-O; Docker Engine requires the cri-dockerd adapter since Kubernetes 1.24).
Kubernetes Architecture Diagram
+---------------------------------------------------+
|                   Control Plane                   |
|  +------------+   +------------+   +------------+ |
|  | API Server |-->| Controller |-->| Scheduler  | |
|  +------------+   +------------+   +------------+ |
|        |                |                |        |
|       etcd <------------+----------------+        |
+---------------------------------------------------+
                          |
                          v
+---------------------------------------------------+
|                   Worker Nodes                    |
|   +-----------+   +-----------+   +-----------+   |
|   | Kubelet   |   | Kubelet   |   | Kubelet   |   |
|   | Pod (ML)  |   | Pod (API) |   | Pod (DB)  |   |
|   +-----------+   +-----------+   +-----------+   |
+---------------------------------------------------+
Managing Containers with Kubernetes
Kubernetes manages containers as Pods, the smallest deployable unit.
A Pod can contain one or more containers (e.g., ML model + monitoring agent).
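As a rough illustration of the "ML model + monitoring agent" case, a two-container Pod could be declared along the lines below. The Pod name, the sidecar container, and its image (`example/metrics-agent:latest`) are placeholders assumed for illustration; only the `bittusharma/ml-api:v1` image and port 5000 come from the deployment example in this article.

```yaml
# Sketch only: a Pod running a model server plus a sidecar agent.
apiVersion: v1
kind: Pod
metadata:
  name: ml-model-with-agent          # hypothetical name
  labels:
    app: ml-model
spec:
  containers:
    - name: ml-model                 # main container serving predictions
      image: bittusharma/ml-api:v1
      ports:
        - containerPort: 5000
    - name: monitoring-agent         # sidecar; placeholder image, not a real published one
      image: example/metrics-agent:latest
```

Containers in the same Pod share a network namespace, so the agent in this sketch could reach the model server at localhost:5000.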
Basic Commands
kubectl get pods
kubectl get services
kubectl describe pod <pod-name>
kubectl delete pod <pod-name>
Pods are managed using higher-level controllers like Deployments, ReplicaSets, and DaemonSets.
Deploying Applications on Kubernetes
Let's deploy a simple ML model service using Kubernetes.
Step 1: Create a Deployment (YAML)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2                  # run two copies of the model server
  selector:
    matchLabels:
      app: ml-model            # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: bittusharma/ml-api:v1
          ports:
            - containerPort: 5000   # port the API listens on inside the container
Step 2: Create a Service
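apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: NodePort
  selector:
    app: ml-model              # routes traffic to Pods carrying this label
  ports:
    - port: 80                 # port exposed inside the cluster
      targetPort: 5000         # containerPort of the model server
      nodePort: 30001          # port opened on every node (allowed range 30000-32767)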
Step 3: Apply Configuration
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
Now access the model using:
http://<node-ip>:30001/predict
Setting Up a Kubernetes Cluster for ML Applications
Local Setup (Minikube)
For practice, set up Minikube:
minikube start
kubectl get nodes
Cloud Setup
For production ML workloads, use a managed Kubernetes service such as:
Amazon EKS (Elastic Kubernetes Service)
Google GKE (Google Kubernetes Engine)
Azure AKS (Azure Kubernetes Service)
These services run the control plane for you and integrate with the cloud provider's auto-scaling, load balancing, and monitoring.
Creating and Managing Pods, Deployments, and Services
Pods
The smallest deployable unit: one or more containers.
kubectl run test-pod --image=nginx
Deployments
Manage multiple Pod replicas and replace failed ones to keep the application available.
kubectl create deployment webapp --image=nginx
Services
Expose your Pods to the outside world.
kubectl expose deployment webapp --type=LoadBalancer --port=80
Scaling
kubectl scale deployment webapp --replicas=3
Using Helm for Kubernetes Management
Kubernetes uses YAML manifests for every component, which can become complex for large ML projects.
Helm simplifies this by packaging all configurations into a reusable format called a Helm Chart.
Introduction to Helm Charts
A Helm Chart is a package for Kubernetes: it bundles templated manifests and default values that define how to deploy your app.
Helm Chart Structure:
my-ml-chart/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── deployment.yaml
    └── service.yaml
Example Chart.yaml
apiVersion: v2
name: ml-model
version: 0.1.0
description: A Helm chart for deploying ML model API
Example values.yaml
replicaCount: 2
image:
  repository: bittusharma/ml-api
  tag: v1
service:
  type: NodePort
  port: 80
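To show how these values are consumed, here is a minimal sketch of what templates/deployment.yaml might look like. It is hand-written to match the values.yaml shown above rather than the fuller template that `helm create` generates, and the container port 5000 is carried over from the earlier Deployment example as an assumption.

```yaml
# templates/deployment.yaml (sketch; assumes the values.yaml shown above)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-ml-model
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-ml-model
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-ml-model
    spec:
      containers:
        - name: ml-model
          # repository and tag are filled in from the image block of values.yaml
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 5000   # assumed; not part of values.yaml in this article
```

With a release named mlapp, `.Release.Name` would render as mlapp and the replica count as 2, so changing a deployment becomes a matter of editing values rather than manifests.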
Deploying Applications with Helm
Step 1: Create a Chart
helm create ml-model
Step 2: Update values.yaml and templates
Step 3: Install the Chart
helm install mlapp ./ml-model
Step 4: Check Release
helm list
kubectl get pods
Step 5: Upgrade / Rollback
helm upgrade mlapp ./ml-model
helm rollback mlapp 1
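An upgrade usually changes values rather than templates. One common pattern, sketched here with a hypothetical file name (values-prod.yaml), is to keep environment-specific overrides in a separate file and pass it with Helm's `-f` flag. The `v2` tag is an assumed example, not a published image.

```yaml
# values-prod.yaml (hypothetical override file)
# Apply with: helm upgrade mlapp ./ml-model -f values-prod.yaml
# Keys not listed here keep the defaults from the chart's values.yaml.
replicaCount: 4
image:
  tag: v2            # assumed newer image tag, for illustration only
```

If the new revision misbehaves, `helm history mlapp` lists the release revisions, which helps when choosing the number to pass to `helm rollback`.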
Helm is to Kubernetes what apt is to Ubuntu: a package manager for simplifying deployments.
Best Practices for Kubernetes in MLOps
Use Namespaces: Isolate dev/test/prod workloads.
Leverage ConfigMaps & Secrets: Keep non-sensitive configuration in ConfigMaps and credentials in Secrets.
Use Resource Requests & Limits: Prevent a single ML container from consuming all the CPU, memory, or GPU on a node.
Use Liveness & Readiness Probes: Restart unhealthy model containers and keep traffic away from Pods that are not ready.
Monitor & Log Everything: Integrate with Prometheus and Grafana.
CI/CD Integration: Automate model builds and deployments using GitHub Actions or Jenkins.
GPU Workloads: Use GPU node pools and NVIDIA device plugins.
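The manifest below is a minimal sketch of how several of these practices might look together. The namespace (ml-prod), the Secret (ml-model-secrets) and its key, the MODEL_API_KEY variable, the probe paths (/ready, /healthz), and all resource figures are assumptions chosen for illustration; adjust them to whatever your model server actually exposes and needs.

```yaml
# Sketch: namespace isolation, a Secret reference, resource limits, and probes.
apiVersion: v1
kind: Namespace
metadata:
  name: ml-prod                          # hypothetical namespace
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
  namespace: ml-prod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: bittusharma/ml-api:v1
          ports:
            - containerPort: 5000
          env:
            - name: MODEL_API_KEY        # hypothetical variable
              valueFrom:
                secretKeyRef:
                  name: ml-model-secrets # hypothetical Secret, created separately
                  key: api-key
          resources:
            requests:                    # what the scheduler reserves for the container
              cpu: 250m
              memory: 512Mi
            limits:                      # upper bound enforced at runtime
              cpu: "1"
              memory: 1Gi
          readinessProbe:                # hold back traffic until the model is loaded
            httpGet:
              path: /ready               # assumed endpoint
              port: 5000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:                 # restart the container if it stops responding
            httpGet:
              path: /healthz             # assumed endpoint
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 10
```

For GPU workloads, a line such as `nvidia.com/gpu: 1` under `limits` requests one GPU, provided the NVIDIA device plugin is running on the cluster's GPU nodes.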
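Scaling and Auto-Scaling ML Models
Kubernetes provides the Horizontal Pod Autoscaler (HPA), which changes the number of replicas based on observed CPU or memory utilization. It relies on a metrics source such as metrics-server, and the target containers must declare resource requests for utilization targets to work.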
Example HPA
kubectl autoscale deployment ml-model-deployment --cpu-percent=70 --min=2 --max=10
This automatically scales the ML model between 2 and 10 replicas to keep average CPU utilization near 70%, which helps with both reliability and cost efficiency.
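The same policy can also be kept in version control as a manifest. The sketch below uses the autoscaling/v2 API and is intended to mirror the kubectl autoscale command above; the HPA name (ml-model-hpa) is a hypothetical choice.

```yaml
# Sketch of an HPA matching the kubectl autoscale command above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa                   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70       # percent of the CPU request, averaged across Pods
```

The controller computes the desired replica count as ceil(currentReplicas × currentUtilization / targetUtilization), so 2 replicas averaging 140% CPU against a 70% target would be scaled to 4.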