🚀 Day 08 – Kubernetes Advanced: ConfigMaps, Secrets, Autoscaling & Load Balancing for ML Services


I am Bittu Sharma, a DevOps & AI Engineer with a keen interest in building intelligent, automated systems. My goal is to bridge the gap between software engineering and data science, ensuring scalable deployments and efficient model operations in production. Let's connect: I would love the opportunity to connect and contribute. Feel free to DM me on LinkedIn or reach out to me at bittush9534@gmail.com. I look forward to connecting and networking with people in this exciting tech world.

Welcome back to Day 08 of our MLOps Learning Series! 👋

In the previous post, we covered Kubernetes Basics—understanding Pods, Deployments, and Services, and how to deploy an ML container using Minikube.

Today, we’ll explore advanced Kubernetes concepts (ConfigMaps, Secrets, Autoscaling, and Load Balancing) and see how they come together to help you deploy a scalable and secure ML service.


🧠 Why Advanced Kubernetes Concepts Matter in MLOps

When running ML workloads in production, you must handle:

  • Dynamic configurations (like API endpoints or environment variables).

  • Sensitive credentials (like access keys or database passwords).

  • Scaling traffic automatically when model usage spikes.

  • Ensuring high availability with load balancing.

Kubernetes provides built-in solutions for all of these, so let’s dive in.


⚙️ 1. ConfigMaps – Managing Application Configurations

ConfigMaps are used to store non-sensitive configuration data in key-value pairs.
They allow you to separate config files from your application code.

🧩 Example: Creating a ConfigMap

kubectl create configmap ml-config \
  --from-literal=MODEL_NAME=iris-classifier \
  --from-literal=MODEL_VERSION=v1

✅ Verify:

kubectl get configmap ml-config -o yaml

🔗 Injecting the ConfigMap into a Pod as environment variables (under the container spec):

envFrom:
  - configMapRef:
      name: ml-config

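This lets your ML container read its configuration from environment variables without rebuilding the image. Environment variables are set when the container starts, so after changing the ConfigMap you need to restart the Pods (for example with kubectl rollout restart deployment <name>) for them to pick up the new values.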
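On the application side, the keys from ml-config simply show up as process environment variables. Here is a minimal Python sketch of how a service might read them. The function name load_model_config and the fallback values are assumptions for illustration only; they are not part of the original service.

```python
import os


def load_model_config(env=os.environ):
    """Read the model settings injected from the ml-config ConfigMap.

    The fallback values are hypothetical defaults for running outside
    Kubernetes (for example on a laptop), where the variables are unset.
    """
    return {
        "model_name": env.get("MODEL_NAME", "unknown-model"),
        "model_version": env.get("MODEL_VERSION", "v0"),
    }


if __name__ == "__main__":
    config = load_model_config()
    print(f"Serving {config['model_name']} ({config['model_version']})")
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.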


🔒 2. Secrets – Handling Sensitive Data Securely

Secrets are similar to ConfigMaps but are designed for storing sensitive information like API keys, tokens, or passwords.

🧩 Example: Creating a Secret

kubectl create secret generic ml-secret \
  --from-literal=DB_USER=admin \
  --from-literal=DB_PASS=secure123

✅ Verify:

kubectl get secret ml-secret -o yaml

🔗 Using Secret in Deployment:

envFrom:
  - secretRef:
      name: ml-secret

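Your ML service can now read the credentials from environment variables instead of hardcoding them in the image or the manifest. Keep in mind that Secret values are only base64-encoded by default, not encrypted, so restrict who can read Secrets with RBAC and consider enabling encryption at rest.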
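It helps to see why access to Secrets should be restricted: the values printed by kubectl get secret ml-secret -o yaml are base64-encoded, and base64 is trivially reversible. The Python sketch below decodes the two values that the ml-secret created above would contain. The dictionary is typed in by hand for illustration and the helper name decode_secret_data is an assumption.

```python
import base64

# Values as they would appear under `data:` in the Secret's YAML output
# for the ml-secret created above (base64 of "admin" and "secure123").
secret_data = {
    "DB_USER": "YWRtaW4=",
    "DB_PASS": "c2VjdXJlMTIz",
}


def decode_secret_data(data):
    """Decode every base64-encoded value of a Secret's data mapping."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


print(decode_secret_data(secret_data))
# → {'DB_USER': 'admin', 'DB_PASS': 'secure123'}
```

Anyone who can read the Secret object can recover the plain-text credentials this way, which is why read access matters more than the encoding.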


📈 3. Autoscaling – Scaling ML Services Automatically

ML services may experience unpredictable traffic — for example, an API endpoint serving predictions might get sudden load spikes.
Kubernetes handles this with the Horizontal Pod Autoscaler (HPA), which adjusts the number of pod replicas based on observed metrics such as CPU utilization.

🧩 Step 1: Enable Metrics Server

minikube addons enable metrics-server

🧩 Step 2: Create an HPA

kubectl autoscale deployment ml-model-deployment \
  --cpu-percent=50 --min=2 --max=5

This means:

  • If average CPU utilization across the pods rises above 50% of their CPU request, Kubernetes adds pods (up to 5).

  • If it falls below 50%, Kubernetes scales down again (but never below 2 pods).
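To make the scaling rule concrete, here is a rough Python sketch of the core calculation the HPA documentation describes: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the min and max. The function names are made up for this example, and the real controller also applies stabilization windows and readiness checks that are not modelled here. The default tolerance of 0.1 is taken from the documented controller default.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """Average CPU usage as a percentage of the container's CPU request."""
    return usage_millicores * 100 // request_millicores


def desired_replicas(current_replicas, current_percent, target_percent=50,
                     min_replicas=2, max_replicas=5, tolerance=0.1):
    """Simplified version of the HPA replica calculation."""
    ratio = current_percent / target_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        wanted = current_replicas
    else:
        # Integer ceiling division avoids floating-point rounding surprises.
        wanted = -(-(current_replicas * current_percent) // target_percent)
    # Never go below --min or above --max.
    return max(min_replicas, min(max_replicas, wanted))


# Two pods averaging 90m CPU against a 100m request are at 90% utilization.
percent = cpu_utilization_percent(90, 100)
print(desired_replicas(2, percent))   # → 4
print(desired_replicas(3, 200))       # → 5 (capped at max)
print(desired_replicas(4, 20))        # → 2 (floored at min)
```

Note that the percentage is measured against the CPU request, so the HPA can only work if the Deployment sets resources.requests.cpu.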

✅ Check status:

kubectl get hpa

Autoscaling ensures optimal resource utilization while maintaining performance.


🌐 4. Load Balancing – Distributing Traffic Efficiently

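When multiple replicas of your ML service run, a Kubernetes Service spreads incoming traffic across all pods that match its selector. A Service of type LoadBalancer additionally exposes the service outside the cluster through an external load balancer. On Minikube there is no cloud load balancer, so use minikube service or minikube tunnel to reach it.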

🧩 Example: LoadBalancer Service

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  type: LoadBalancer
  selector:
    app: ml-model
  ports:
    - port: 80
      targetPort: 5000

This spreads user requests across all ready pods labeled app: ml-model, which reduces the risk of overloading any single instance.


🧪 Hands-on: Deploy ML Service with Autoscaling

Let’s deploy a simple ML prediction API that scales automatically based on CPU usage.

🧩 Step 1: Deployment File (ml-model-deployment.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-container
        image: bittu/ml-model:latest
        ports:
        - containerPort: 5000
        envFrom:
        - configMapRef:
            name: ml-config
        - secretRef:
            name: ml-secret
        resources:
          requests:
            cpu: "100m"
          limits:
            cpu: "500m"

🧩 Step 2: Apply the Deployment and Create the Service

kubectl apply -f ml-model-deployment.yaml
kubectl expose deployment ml-model-deployment --type=LoadBalancer --port=80 --target-port=5000

🧩 Step 3: Add Autoscaling

kubectl autoscale deployment ml-model-deployment --cpu-percent=50 --min=2 --max=5

✅ Step 4: Test

kubectl get pods
kubectl get hpa
minikube service ml-model-deployment

Once CPU utilization rises above the 50% target, kubectl get hpa will show the replica count increasing. With no load, the Deployment stays at the minimum of 2 pods.
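To actually watch the HPA react, the service needs some traffic. The sketch below is one possible way to generate it from Python using only the standard library. The script name load_test.py, the function names, and the request counts are all assumptions for illustration, and whether this load is enough to push CPU past the target depends on how expensive each prediction request is.

```python
import concurrent.futures
import sys
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Return the HTTP status as a string, or the exception name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except Exception as exc:  # includes HTTPError for 4xx/5xx responses
        return type(exc).__name__


def generate_load(url, total_requests=200, workers=20, fetch=fetch_status):
    """Send total_requests requests using a thread pool and tally the outcomes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * total_requests))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python load_test.py <service-url>
    print(generate_load(sys.argv[1]))
```

Run it with the URL printed by minikube service ml-model-deployment --url, and keep kubectl get hpa --watch open in a second terminal to see the replica count change.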


🧭 Real-World Use Case

Imagine an image classification API hosted on Kubernetes.
During peak traffic (say, during an online sale or event), user requests surge — HPA automatically adds new pods.
When traffic drops, it scales down again.

This automation helps deliver:

  • Reduced cost 💰

  • Stable performance ⚡

  • Zero downtime 🔄


💡 Key Takeaways

Concept | Purpose
ConfigMaps | Manage non-sensitive configuration data
Secrets | Securely store credentials and API keys
Autoscaling | Automatically scale pods based on usage
Load Balancing | Distribute requests among pods

🧩 What’s Next?

In Day 09, we’ll explore CI/CD for ML — integrating continuous integration and deployment pipelines for machine learning models.

Stay tuned — we’re moving closer to automated, production-ready MLOps workflows! 🚀

Follow me on LinkedIn

Follow me on GitHub

Keep Learning……