🚀 Day 08 – Kubernetes Advanced: ConfigMaps, Secrets, Autoscaling & Load Balancing for ML Services

Welcome back to Day 08 of our MLOps Learning Series! 👋

In the previous post, we covered Kubernetes Basics—understanding Pods, Deployments, and Services, and how to deploy an ML container using Minikube.

Today, we’ll explore advanced Kubernetes concepts — ConfigMaps, Secrets, Autoscaling, and Load Balancing — and see how they come together to help you deploy a scalable and secure ML service.

🧠 Why Advanced Kubernetes Concepts Matter in MLOps

When running ML workloads in production, you need to handle:

Dynamic configuration (such as API endpoints or environment variables).
Sensitive credentials (such as access keys or database passwords).
Automatic scaling when model usage spikes.
High availability through load balancing.

Kubernetes provides built-in solutions for all these — let’s dive in.

⚙️ 1. ConfigMaps – Managing Application Configurations

ConfigMaps are used to store non-sensitive configuration data in key-value pairs.
They allow you to separate config files from your application code.

🧩 Example: Creating a ConfigMap

kubectl create configmap ml-config \
  --from-literal=MODEL_NAME=iris-classifier \
  --from-literal=MODEL_VERSION=v1

✅ Verify:

kubectl get configmap ml-config -o yaml

🔗 Injecting the ConfigMap into a Pod as environment variables:

envFrom:
  - configMapRef:
      name: ml-config

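This lets your ML container read its configuration from environment variables, so you can change settings without rebuilding the image. Environment variables are set when the container starts, so restart the pods (for example, with kubectl rollout restart) after changing the ConfigMap.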
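From inside the container, the ConfigMap keys look like ordinary environment variables. As a minimal sketch of how a Python service might read them (the function name `load_model_config` and the fallback defaults are illustrative assumptions, not part of the original service):

```python
import os
from typing import Mapping


def load_model_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the model settings injected from the ml-config ConfigMap.

    The fallback values are assumptions for running outside Kubernetes.
    """
    return {
        "model_name": env.get("MODEL_NAME", "unknown-model"),
        "model_version": env.get("MODEL_VERSION", "v0"),
    }


if __name__ == "__main__":
    config = load_model_config()
    print(f"Serving {config['model_name']} ({config['model_version']})")
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.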

🔒 2. Secrets – Handling Sensitive Data Securely

Secrets are similar to ConfigMaps but are designed for storing sensitive information like API keys, tokens, or passwords.

🧩 Example: Creating a Secret

kubectl create secret generic ml-secret \
  --from-literal=DB_USER=admin \
  --from-literal=DB_PASS=secure123

✅ Verify:

kubectl get secret ml-secret -o yaml
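The values under `data` in that YAML output are base64-encoded, not encrypted, so anyone who can read the Secret can recover them. A small Python sketch makes this concrete (the helper name `decode_secret_data` is a hypothetical choice for illustration):

```python
import base64


def decode_secret_data(data: dict) -> dict:
    """Decode the base64 values found under `data` in a Secret manifest."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in data.items()
    }


if __name__ == "__main__":
    # "YWRtaW4=" is the base64 encoding of "admin" (the DB_USER created above).
    print(decode_secret_data({"DB_USER": "YWRtaW4="}))  # {'DB_USER': 'admin'}
```

This is why access to Secrets should be limited with RBAC rather than relying on the encoding for protection.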

🔗 Using Secret in Deployment:

envFrom:
  - secretRef:
      name: ml-secret

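Your ML service can now read the credentials from environment variables instead of hardcoding them in the image or the Deployment manifest. By default, Kubernetes stores Secrets unencrypted in etcd, so for production you should restrict access with RBAC and consider enabling encryption at rest.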

📈 3. Autoscaling – Scaling ML Services Automatically

ML services may experience unpredictable traffic — for example, an API endpoint serving predictions might get sudden load spikes.
Kubernetes handles this with the Horizontal Pod Autoscaler (HPA), which adjusts the number of pod replicas based on observed metrics such as CPU utilization.

🧩 Step 1: Enable Metrics Server

minikube addons enable metrics-server

🧩 Step 2: Create an HPA

kubectl autoscale deployment ml-model-deployment \
  --cpu-percent=50 --min=2 --max=5

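This means the HPA targets an average CPU utilization of 50% across the Deployment's pods, measured relative to each container's CPU request:

If average CPU utilization rises above 50%, the HPA adds pods (up to 5).
If it falls below 50%, the HPA removes pods (down to a minimum of 2) after a stabilization window, which is 5 minutes by default.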
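The Kubernetes documentation describes the underlying calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces that arithmetic in Python for the settings used here. It is a simplification: the function names are illustrative, and it ignores stabilization windows, missing metrics, and pods that are not yet ready.

```python
import math


def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as the HPA measures it: usage relative to the CPU request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 50.0,
                     min_replicas: int = 2, max_replicas: int = 5,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * ratio), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band (0.1 by default) the HPA leaves the count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Two pods each using 75m of a 100m request -> 75% utilization.
    utilization = cpu_utilization_percent(75, 100)
    print(desired_replicas(2, utilization))  # 3
    print(desired_replicas(2, 200.0))        # 5 (capped at max)
    print(desired_replicas(4, 20.0))         # 2 (floored at min)
```

Note that utilization is a percentage of the CPU request, so with a 100m request the 50% target corresponds to an average of 50m per pod.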

✅ Check status:

kubectl get hpa

Autoscaling helps match the number of pods to demand, so you avoid paying for idle replicas while keeping response times stable under load.

🌐 4. Load Balancing – Distributing Traffic Efficiently

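When multiple replicas of your ML service run, a Kubernetes Service spreads incoming traffic across all ready pods that match its selector. Setting the type to LoadBalancer additionally exposes the Service outside the cluster through an external load balancer (on Minikube, use minikube service or minikube tunnel to reach it).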

🧩 Example: LoadBalancer Service

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  selector:
    app: ml-model
  ports:
    - port: 80
      targetPort: 5000

This Service accepts requests on port 80 and forwards them to port 5000 on the pods labeled app: ml-model. Traffic is spread across all ready pods, which reduces the risk of overloading any single instance.
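How a Service picks a pod depends on the kube-proxy mode; in the commonly used iptables mode, a backend is chosen at random for each new connection. The following Python sketch simulates that behaviour to show that the split is roughly even rather than exactly even. The pod names and the function `simulate_service_routing` are hypothetical, and this is a model of the idea, not of kube-proxy itself.

```python
import random
from collections import Counter


def simulate_service_routing(pods, num_connections, seed=0):
    """Assign each new connection to a randomly chosen pod and count the results."""
    rng = random.Random(seed)
    return Counter(rng.choice(pods) for _ in range(num_connections))


if __name__ == "__main__":
    counts = simulate_service_routing(["pod-a", "pod-b", "pod-c"], 3000)
    for pod, handled in sorted(counts.items()):
        print(f"{pod}: {handled} connections")
```

Because the choice is made per connection, a client that reuses one long-lived connection keeps reaching the same pod.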

🧪 Hands-on: Deploy ML Service with Autoscaling

Let’s deploy a simple ML prediction API that scales automatically based on CPU usage.

🧩 Step 1: Deployment File (`ml-model-deployment.yaml`)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-container
        image: bittu/ml-model:latest
        ports:
        - containerPort: 5000
        envFrom:
        - configMapRef:
            name: ml-config
        - secretRef:
            name: ml-secret
        resources:
          requests:
            cpu: "100m"   # HPA CPU utilization is measured against this request
          limits:
            cpu: "500m"   # hard cap on CPU for the container

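🧩 Step 2: Apply the Deployment and Create the Service

Make sure the ml-config ConfigMap and ml-secret Secret from sections 1 and 2 exist first; otherwise the containers will fail to start.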

kubectl apply -f ml-model-deployment.yaml
kubectl expose deployment ml-model-deployment --type=LoadBalancer --port=80 --target-port=5000

🧩 Step 3: Add Autoscaling

kubectl autoscale deployment ml-model-deployment --cpu-percent=50 --min=2 --max=5

✅ Step 4: Test

kubectl get pods
kubectl get hpa
minikube service ml-model-deployment

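Once the service receives enough traffic to push average CPU utilization above the 50% target, kubectl get hpa will show the replica count rising. Use kubectl get hpa --watch to follow it as it changes.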
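Scaling only kicks in under load, so you need some traffic to observe it. Here is a minimal load-generator sketch in Python using only the standard library. The environment variable `ML_SERVICE_URL`, the function names, and the request counts are assumptions for illustration; the script requests the base URL because the post does not name a prediction endpoint.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Send one GET request; return the status code, or 0 on a network/HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except OSError:
        return 0


def generate_load(url, total_requests=1000, workers=20, fetch=fetch_status):
    """Send requests concurrently and return how many got a 2xx response."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, [url] * total_requests))
    return sum(1 for status in statuses if 200 <= status < 300)


if __name__ == "__main__":
    url = os.environ.get("ML_SERVICE_URL")
    if url:
        print(f"{generate_load(url)} successful responses")
    else:
        print("Set ML_SERVICE_URL to the address of the service first.")
```

On Minikube, `minikube service ml-model-deployment --url` prints an address you can export as `ML_SERVICE_URL` before running the script in one terminal while watching the HPA in another.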

🧭 Real-World Use Case

Imagine an image classification API hosted on Kubernetes.
During peak traffic (say, during an online sale or event), user requests surge — HPA automatically adds new pods.
When traffic drops, it scales down again.

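This automation helps deliver:

Lower cost 💰 (fewer idle pods during quiet periods)
Stable performance ⚡ under load
Higher availability 🔄 (traffic is spread across multiple replicas)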

💡 Key Takeaways

| Concept | Purpose |
| --- | --- |
| ConfigMaps | Manage non-sensitive configuration data |
| Secrets | Store credentials and API keys separately from code |
| Autoscaling | Automatically scale pods based on usage |
| Load Balancing | Distribute requests among pods |

🧩 What’s Next?

In Day 09, we’ll explore CI/CD for ML — integrating continuous integration and deployment pipelines for machine learning models.

Stay tuned — we’re moving closer to automated, production-ready MLOps workflows! 🚀

Follow me on LinkedIn

Follow me on GitHub

Keep Learning……
