🚀 Day 08 – Kubernetes Advanced: ConfigMaps, Secrets, Autoscaling & Load Balancing for ML Services

I am Bittu Sharma, a DevOps & AI Engineer with a keen interest in building intelligent, automated systems. My goal is to bridge the gap between software engineering and data science, ensuring scalable deployments and efficient model operations in production. Let's connect! I would love the opportunity to connect and contribute. Feel free to DM me on LinkedIn or reach out to me at bittush9534@gmail.com. I look forward to connecting and networking with people in this exciting tech world.
Welcome back to Day 08 of our MLOps Learning Series! 👋
In the previous post, we covered Kubernetes Basics—understanding Pods, Deployments, and Services, and how to deploy an ML container using Minikube.
Today, we’ll explore advanced Kubernetes concepts — ConfigMaps, Secrets, Autoscaling, and Load Balancing — and see how they come together to help you deploy a scalable and secure ML service.
🧠 Why Advanced Kubernetes Concepts Matter in MLOps
When running ML workloads in production, you must handle:
Dynamic configurations (like API endpoints or environment variables).
Sensitive credentials (like access keys or database passwords).
Scaling traffic automatically when model usage spikes.
Ensuring high availability with load balancing.
Kubernetes provides built-in solutions for all these — let’s dive in.
⚙️ 1. ConfigMaps – Managing Application Configurations
ConfigMaps are used to store non-sensitive configuration data in key-value pairs.
They let you keep configuration separate from your application code and from the container image.
🧩 Example: Creating a ConfigMap
kubectl create configmap ml-config \
--from-literal=MODEL_NAME=iris-classifier \
--from-literal=MODEL_VERSION=v1
✅ Verify:
kubectl get configmap ml-config -o yaml
🔗 Injecting the ConfigMap as Environment Variables (container spec):
envFrom:
  - configMapRef:
      name: ml-config
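This exposes every key in ml-config (MODEL_NAME, MODEL_VERSION) as an environment variable in your ML container, so you can change configuration without rebuilding the image. Environment variables are read when the container starts, so after editing the ConfigMap, restart the pods (for example, kubectl rollout restart deployment ml-model-deployment) to pick up the new values.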
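To make this concrete, here is a minimal Python sketch of how the serving code inside the container might read those values. It assumes the app is written in Python; the helper name `load_model_config` and the fallback defaults (used only when running outside the cluster) are illustrative assumptions, not part of the original setup.

```python
import os


def load_model_config() -> dict:
    """Read the keys injected by `envFrom: configMapRef: ml-config`.

    The defaults are hypothetical fallbacks for local runs outside Kubernetes.
    """
    return {
        "model_name": os.environ.get("MODEL_NAME", "iris-classifier"),
        "model_version": os.environ.get("MODEL_VERSION", "v1"),
    }


if __name__ == "__main__":
    cfg = load_model_config()
    print(f"Serving {cfg['model_name']} ({cfg['model_version']})")
```

Because the values come from the environment, promoting a new model version is a ConfigMap edit plus a pod restart rather than a new image build.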
🔒 2. Secrets – Handling Sensitive Data Securely
Secrets are similar to ConfigMaps but are designed for storing sensitive information like API keys, tokens, or passwords.
🧩 Example: Creating a Secret
kubectl create secret generic ml-secret \
--from-literal=DB_USER=admin \
--from-literal=DB_PASS=secure123
✅ Verify:
kubectl get secret ml-secret -o yaml
🔗 Using Secret in Deployment:
envFrom:
  - secretRef:
      name: ml-secret
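Your ML container can now read the credentials as environment variables, so they no longer need to be hard-coded in your source code or baked into the image. Keep in mind that Secret values are only base64-encoded, not encrypted: anyone allowed to read the Secret can decode them, so restrict access with RBAC and consider enabling encryption at rest for production clusters.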
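To see why base64 alone is not protection, this short Python sketch reproduces what `kubectl get secret ml-secret -o yaml` shows under `data:` and then decodes it without any key. The values are computed from the literals used above rather than read from a live cluster; against a real cluster, `kubectl get secret ml-secret -o jsonpath='{.data.DB_PASS}' | base64 --decode` does the same thing.

```python
import base64

# What the `data:` section of the Secret holds: the literals from
# `kubectl create secret generic ml-secret ...`, base64-encoded.
secret_data = {
    "DB_USER": base64.b64encode(b"admin").decode("ascii"),
    "DB_PASS": base64.b64encode(b"secure123").decode("ascii"),
}

# Decoding needs no key or password: base64 is an encoding, not encryption.
decoded = {key: base64.b64decode(value).decode("utf-8") for key, value in secret_data.items()}

for key, value in secret_data.items():
    print(f"{key}: {value} -> {decoded[key]}")
```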
📈 3. Autoscaling – Scaling ML Services Automatically
ML services may experience unpredictable traffic — for example, an API endpoint serving predictions might get sudden load spikes.
Kubernetes handles this with Horizontal Pod Autoscaler (HPA).
🧩 Step 1: Enable Metrics Server
minikube addons enable metrics-server
🧩 Step 2: Create an HPA
kubectl autoscale deployment ml-model-deployment \
--cpu-percent=50 --min=2 --max=5
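This tells the HPA to keep the average CPU utilization of the pods near 50% of their CPU request:
If average utilization rises above the 50% target, Kubernetes adds pods (up to 5).
If it falls well below the target, the HPA removes pods (never going below 2).
Scale-down is deliberately cautious: by default the HPA waits out a 5-minute stabilization window before removing pods. CPU-based scaling also requires the containers to declare CPU requests, as the Deployment in the hands-on section does.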
✅ Check status:
kubectl get hpa
Autoscaling keeps enough pods running to absorb traffic spikes without paying for idle replicas during quiet periods.
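Under the hood, the HPA compares observed average utilization with the target and computes desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), then clamps the result to the min/max range. The Python function below is a simplified sketch of that rule using this post's settings (50% target, 2 to 5 pods) and the default 10% tolerance. It deliberately ignores the scale-down stabilization window, unready pods, and missing metrics, which the real controller also handles. Utilization here is a percentage of the pod's CPU request, so with a 100m request the 50% target means roughly 50m of CPU per pod.

```python
import math


def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float = 50.0, min_replicas: int = 2,
                     max_replicas: int = 5, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * observed/target), clamped to [min, max].

    Utilization values are percentages of the pods' CPU request. This sketch
    skips the stabilization window and readiness/missing-metric handling.
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


for replicas, util in [(2, 90), (4, 20), (5, 100), (3, 52)]:
    print(f"{replicas} pods at {util}% CPU -> {desired_replicas(replicas, util)} pods")
```

For example, 2 pods averaging 90% give ceil(2 * 1.8) = 4 pods, while 5 pods at 100% would want 10 but are capped at the --max of 5.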
🌐 4. Load Balancing – Distributing Traffic Efficiently
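When multiple replicas of your ML service run, a Kubernetes Service gives them a single stable address and spreads incoming requests across all ready pods that match its selector. A Service of type LoadBalancer additionally asks the cloud provider for an external load balancer; on Minikube, run minikube tunnel (or use minikube service) to reach it from your machine.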
🧩 Example: LoadBalancer Service
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  selector:
    app: ml-model
  ports:
    - port: 80
      targetPort: 5000
Requests arriving on port 80 are forwarded to port 5000 on one of the ready pods labeled app: ml-model, so no single instance has to absorb all the traffic.
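How requests are spread depends on kube-proxy's mode: in its default iptables mode each new connection goes to a randomly chosen ready pod, while IPVS mode uses round-robin by default. The Python snippet below is only a toy simulation of random selection across three hypothetical pod names (it is not kube-proxy code), illustrating that random choice still lands close to an even split over many requests.

```python
import random
from collections import Counter

# Hypothetical pod names standing in for the Service's endpoints.
PODS = ["ml-model-pod-a", "ml-model-pod-b", "ml-model-pod-c"]


def route_requests(n_requests: int, pods: list, seed: int = 42) -> Counter:
    """Toy model: pick a backend uniformly at random for each request."""
    rng = random.Random(seed)
    return Counter(rng.choice(pods) for _ in range(n_requests))


if __name__ == "__main__":
    counts = route_requests(3000, PODS)
    for pod in PODS:
        print(f"{pod}: {counts[pod]} requests")
```

Because selection happens per connection, clients that reuse one long-lived keep-alive connection may keep hitting the same pod.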
🧪 Hands-on: Deploy ML Service with Autoscaling
Let’s deploy a simple ML prediction API that scales automatically based on CPU usage.
🧩 Step 1: Deployment File (ml-model-deployment.yaml)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-container
          image: bittu/ml-model:latest
          ports:
            - containerPort: 5000
          envFrom:
            - configMapRef:
                name: ml-config
            - secretRef:
                name: ml-secret
          # The CPU request is required for the HPA: utilization is measured against it.
          resources:
            requests:
              cpu: "100m"
            limits:
              cpu: "500m"
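The post does not show the code inside bittu/ml-model:latest. As a hypothetical stand-in, here is a minimal stdlib-only Python prediction API that listens on port 5000 (matching containerPort and the Service's targetPort) and reads the ConfigMap values from the environment. The `/predict` path, the JSON request shape, the `--serve` flag, and the petal-length rule standing in for a trained model are all assumptions for illustration.

```python
import json
import os
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Injected by `envFrom: configMapRef: ml-config`; defaults are for local runs.
MODEL_NAME = os.environ.get("MODEL_NAME", "iris-classifier")
MODEL_VERSION = os.environ.get("MODEL_VERSION", "v1")


def predict(features: list) -> str:
    """Toy stand-in for a trained model: classify an iris by petal length (cm).

    Expects [sepal_length, sepal_width, petal_length, petal_width].
    """
    petal_length = float(features[2])
    if petal_length < 2.5:
        return "setosa"
    return "versicolor" if petal_length < 5.0 else "virginica"


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
            label = predict(body["features"])
        except (ValueError, KeyError, IndexError, TypeError):
            self.send_error(400, "Bad Request", 'expected {"features": [5.1, 3.5, 1.4, 0.2]}')
            return
        payload = json.dumps(
            {"model": MODEL_NAME, "version": MODEL_VERSION, "prediction": label}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 5000) -> None:
    # Bind on all interfaces so the Service can reach the pod on targetPort 5000.
    ThreadingHTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()


if __name__ == "__main__" and "--serve" in sys.argv:
    serve()
```

Inside the image, the entrypoint would be something like python app.py --serve; a real service would load a trained model in place of the `predict` rule.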
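🧩 Step 2: Apply the Deployment and Create the Service
Create ml-config and ml-secret (sections 1 and 2) first; pods that reference a missing ConfigMap or Secret stay stuck in CreateContainerConfigError.
kubectl apply -f ml-model-deployment.yaml
kubectl expose deployment ml-model-deployment --type=LoadBalancer --port=80 --target-port=5000
kubectl expose creates a LoadBalancer Service named ml-model-deployment with the same selector and ports as the manifest in section 4.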
🧩 Step 3: Add Autoscaling
kubectl autoscale deployment ml-model-deployment --cpu-percent=50 --min=2 --max=5
✅ Step 4: Test
kubectl get pods
kubectl get hpa
minikube service ml-model-deployment
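The HPA reacts only to real load, so send a steady stream of requests to the service URL (printed by minikube service ml-model-deployment --url) and watch kubectl get hpa -w. As average CPU utilization climbs above 50%, the REPLICAS column should grow toward 5; once the load stops, it drops back to 2 after the scale-down window.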
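One way to generate that load is a small Python script run from your machine. This sketch assumes a hypothetical /predict endpoint that accepts JSON like {"features": [...]}; pass it the URL printed by minikube service ml-model-deployment --url plus the path. The request count and concurrency are arbitrary starting values, not tuned numbers.

```python
import http.client
import json
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical request body for the /predict endpoint.
SAMPLE = json.dumps({"features": [5.1, 3.5, 1.4, 0.2]}).encode("utf-8")


def send_prediction(url: str) -> int:
    """POST one sample; return the HTTP status, or 0 if the request failed."""
    req = urllib.request.Request(
        url, data=SAMPLE, method="POST", headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses.
        return 0


def generate_load(url: str, total_requests: int = 5000, concurrency: int = 20) -> dict:
    """Send requests from a thread pool and count successes and failures."""
    counts = {"ok": 0, "failed": 0}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for status in pool.map(send_prediction, [url] * total_requests):
            counts["ok" if status == 200 else "failed"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    print(generate_load(sys.argv[1]))
```

Usage: python load_test.py <service-url>/predict. If the replica count does not move, raise total_requests or concurrency until average CPU stays above the 50% target.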
🧭 Real-World Use Case
Imagine an image classification API hosted on Kubernetes.
During peak traffic (say, during an online sale or event), user requests surge — HPA automatically adds new pods.
When traffic drops, it scales down again.
This automation ensures:
Reduced cost 💰
Stable performance ⚡
High availability 🔄 (at least two replicas are always running)
💡 Key Takeaways
| Concept | Purpose |
| --- | --- |
| ConfigMaps | Manage non-sensitive configuration data |
| Secrets | Securely store credentials and API keys |
| Autoscaling | Automatically scale pods based on usage |
| Load Balancing | Distribute requests among pods |
🧩 What’s Next?
In Day 09, we’ll explore CI/CD for ML — integrating continuous integration and deployment pipelines for machine learning models.
Stay tuned — we’re moving closer to automated, production-ready MLOps workflows! 🚀
Follow me on LinkedIn
Follow me on GitHub
Keep Learning...




