🚀 Day 08 All Essential Kubernetes (K8s) Commands for MLOps Engineers

As an MLOps Engineer, you’ll constantly work with Kubernetes to deploy, monitor, and scale ML workloads — from training containers to serving APIs.
Knowing the right kubectl commands saves time, reduces errors, and helps automate your workflows efficiently.

In this post, we’ll cover the practical kubectl commands you’ll use most, from beginner to advanced, tailored for Machine Learning Operations (MLOps) use cases.

⚙️ 1️⃣ Basic Kubernetes Commands

These commands help you start working with any Kubernetes cluster.

# Check Kubernetes version
kubectl version

# Get cluster info
kubectl cluster-info

# List all nodes in the cluster
kubectl get nodes

# Get detailed info about a node
kubectl describe node <node-name>

# View all namespaces
kubectl get namespaces

# Set the default namespace for the current context
kubectl config set-context --current --namespace=<namespace-name>
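
To turn `kubectl get nodes` into a quick health check, the output can be filtered for nodes that are not plainly `Ready`. This is a minimal sketch: the function name `not_ready_nodes` is made up for illustration, and it assumes the default column layout where STATUS is the second column.

```shell
# Print the names of nodes whose STATUS column is anything other than "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Note: cordoned nodes ("Ready,SchedulingDisabled") are reported too.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports `Ready` and is schedulable.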

🧱 2️⃣ Working with Pods

Pods are the smallest deployable units in Kubernetes.
In ML, they often represent your model training or inference containers.

# List all pods
kubectl get pods

# Get pods in all namespaces
kubectl get pods --all-namespaces

# Describe a specific pod
kubectl describe pod <pod-name>

# View pod logs (useful for ML training logs)
kubectl logs <pod-name>

# Stream logs in real-time
kubectl logs -f <pod-name>

# Open a shell inside a pod (e.g., to check model outputs); use /bin/sh if the image has no bash
kubectl exec -it <pod-name> -- /bin/bash
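
When a model runs as several replicas, reading logs pod by pod gets tedious. The sketch below wraps `kubectl logs` with a label selector; the helper name `tail_logs` and the default of 100 lines are assumptions for illustration.

```shell
# Print the last N log lines from every pod matching a label selector.
# --prefix tags each line with the pod and container it came from.
tail_logs() {
  selector="$1"
  lines="${2:-100}"   # default to the last 100 lines per container
  kubectl logs -l "$selector" --tail="$lines" --all-containers=true --prefix
}

# Usage (requires cluster access):
#   tail_logs app=ml-api 200
```

For a container that crashed and restarted, `kubectl logs --previous <pod-name>` shows the logs from the prior run.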

🧩 3️⃣ Deployments and ReplicaSets

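Deployments keep long-running workloads, such as model-serving APIs, running at the desired number of replicas and make them easy to scale. Run-to-completion training belongs in Jobs (see section 7).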

# List all deployments
kubectl get deployments

# Create a deployment
kubectl create deployment ml-api --image=mlops/serve:latest

# Scale deployment (e.g., 3 replicas for model serving)
kubectl scale deployment ml-api --replicas=3

# Update deployment with a new image
# (kubectl create deployment names the container after the image, here "serve")
kubectl set image deployment/ml-api serve=mlops/serve:v2

# Rollback deployment
kubectl rollout undo deployment/ml-api
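
The update and rollback commands above are often chained so that a failed rollout reverts on its own. Here is one possible sketch; the function name `update_image` and the 120-second timeout are assumptions, not recommendations.

```shell
# Update a container image, wait for the rollout, and roll back if it stalls.
update_image() {
  deploy="$1"
  container="$2"
  image="$3"
  kubectl set image "deployment/$deploy" "$container=$image" || return 1
  if ! kubectl rollout status "deployment/$deploy" --timeout=120s; then
    # The rollout did not finish in time: revert to the previous revision.
    kubectl rollout undo "deployment/$deploy"
    return 1
  fi
}

# Usage (requires cluster access):
#   update_image ml-api serve mlops/serve:v2
```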

🌐 4️⃣ Services and Networking

Services expose your ML model to internal or external traffic.

# List all services
kubectl get svc

# Expose a deployment as a service
kubectl expose deployment ml-api --type=LoadBalancer --port=80 --target-port=5000

# Describe a service
kubectl describe svc ml-api

# Get the external IP of a LoadBalancer service
kubectl get svc ml-api -o wide
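
Scripts usually need just the address rather than the whole table. The sketch below reads it with a JSONPath query; the helper name `service_address` is hypothetical, and the hostname fallback is there because some cloud providers report a DNS name instead of an IP.

```shell
# Print the external address of a LoadBalancer service.
# Tries the IP first, then falls back to the hostname field.
service_address() {
  svc="$1"
  addr=$(kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$addr" ]; then
    addr=$(kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  fi
  printf '%s\n' "$addr"
}

# Usage (requires cluster access):
#   curl "http://$(service_address ml-api)/"
```

The output is empty while the load balancer is still being provisioned.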

⚡ 5️⃣ ConfigMaps and Secrets

ConfigMaps hold non-sensitive ML configuration such as data paths; Secrets hold credentials and API keys.

# Create a ConfigMap
kubectl create configmap ml-config --from-literal=DATA_PATH=/data

# View ConfigMaps
kubectl get configmaps

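# Create a Secret (for API keys, DB credentials)
# Note: Secret values are only base64-encoded, not encrypted, unless encryption at rest is enabled
kubectl create secret generic ml-secret --from-literal=API_KEY=abcd1234

# Describe Secret (shows key names and sizes, not the values)
kubectl describe secret ml-secret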
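
Creating a ConfigMap or Secret does nothing until a pod references it. As a minimal sketch, the snippet below writes a Pod manifest that loads `ml-config` as environment variables and reads `API_KEY` from `ml-secret`. The pod name `ml-config-demo`, the file name, and the container name `app` are illustrative assumptions.

```shell
# Write a Pod manifest that consumes the ConfigMap and Secret created above.
cat > ml-config-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ml-config-demo
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: mlops/serve:latest
      envFrom:
        - configMapRef:
            name: ml-config        # exposes DATA_PATH as an env var
      env:
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: ml-secret
              key: API_KEY
EOF

# Then apply it (requires cluster access):
#   kubectl apply -f ml-config-demo.yaml
```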

📈 6️⃣ Autoscaling and Resource Management

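Kubernetes can scale model-serving pods automatically with the Horizontal Pod Autoscaler (HPA). CPU-based autoscaling needs a metrics source such as metrics-server, plus CPU requests on the containers.

# Apply a manifest that declares resource requests and limits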
kubectl apply -f ml-deployment.yaml

# Set up autoscaling based on CPU
kubectl autoscale deployment ml-api --min=2 --max=10 --cpu-percent=80

# Check current autoscalers
kubectl get hpa
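
The post never shows what `ml-deployment.yaml` contains, so here is one possible version, written out via a heredoc. The CPU and memory values are placeholder assumptions to be tuned per model; the container port matches the `--target-port=5000` used in the Services section.

```shell
# Write a Deployment manifest with explicit resource requests and limits.
cat > ml-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
spec:
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: ml-api
          image: mlops/serve:latest
          ports:
            - containerPort: 5000
          resources:
            requests:          # what the scheduler reserves; HPA percentages are relative to this
              cpu: 500m
              memory: 1Gi
            limits:            # hard ceiling for the container
              cpu: "1"
              memory: 2Gi
EOF

# Then apply it (requires cluster access):
#   kubectl apply -f ml-deployment.yaml
```

With a 500m CPU request, `--cpu-percent=80` asks the autoscaler to keep average usage around 400m per pod.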

🧮 7️⃣ Jobs and CronJobs (For Training ML Models)

Jobs are perfect for running ML training tasks that need to complete once.
CronJobs automate model retraining at intervals.

# Run a one-time training job
kubectl create job ml-train --image=mlops/train:v1

# List jobs
kubectl get jobs

# Create a CronJob for daily retraining
kubectl create cronjob ml-retrain --image=mlops/train:v1 --schedule="0 2 * * *"

# List CronJobs
kubectl get cronjobs
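
In a pipeline you usually want to block until training finishes and then collect the logs. The sketch below does that with `kubectl wait`; the function name `wait_for_job` and the 600-second default are assumptions.

```shell
# Block until a Job completes (or the timeout expires), then print its logs.
wait_for_job() {
  job="$1"
  timeout="${2:-600s}"
  if kubectl wait --for=condition=complete "job/$job" --timeout="$timeout"; then
    kubectl logs "job/$job"
  else
    echo "job $job did not complete within $timeout" >&2
    kubectl describe "job/$job"   # show pod failures and events for diagnosis
    return 1
  fi
}

# Usage (requires cluster access):
#   kubectl create job ml-train --image=mlops/train:v1
#   wait_for_job ml-train 3600s
```

Note that waiting on `condition=complete` does not return early when the Job fails; it simply runs into the timeout.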

🔍 8️⃣ Monitoring & Debugging

Useful commands for inspecting resources during model deployment or training.

# View events (troubleshooting training pods)
kubectl get events --sort-by=.metadata.creationTimestamp

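# List common resources (pods, services, deployments, etc.) in the current namespace;
# ConfigMaps, Secrets, and custom resources are not included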
kubectl get all

# Port-forward a pod to access locally
kubectl port-forward pod/<pod-name> 8080:80

# Force-delete a stuck pod (last resort: the container may keep running on the node)
kubectl delete pod <pod-name> --force --grace-period=0
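
Before force-deleting anything, it helps to gather the usual evidence in one go. A rough sketch follows; the helper name `debug_pod` and the 50-line tail are illustrative assumptions.

```shell
# Collect the usual first-look diagnostics for a misbehaving pod.
debug_pod() {
  pod="$1"
  kubectl describe pod "$pod"
  # Logs from the previous container instance, useful after a crash or restart.
  kubectl logs "$pod" --previous --tail=50 || echo "no previous container logs"
  # Only the events that refer to this pod, oldest first.
  kubectl get events --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
}

# Usage (requires cluster access):
#   debug_pod <pod-name>
```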

🧰 9️⃣ Custom Resource Definitions (CRDs) for ML

MLOps platforms like Kubeflow and Seldon Core use CRDs to add ML-specific resource types, such as pipelines and model deployments, to the cluster.

# Get all CRDs
kubectl get crds

# Apply a custom resource (e.g., a SeldonDeployment)
kubectl apply -f seldon-deploy.yaml

# Check the status of a custom resource
kubectl describe seldondeployment <name>
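
Applying a custom resource fails if its CRD has not been installed yet, so a guard can save a confusing error. This is only a sketch: the function name `apply_custom_resource` is hypothetical, and the CRD name in the usage line is what Seldon Core v1 is assumed to register; confirm the real name with `kubectl get crds`.

```shell
# Apply a custom resource only if its CRD is already registered in the cluster.
apply_custom_resource() {
  crd="$1"
  manifest="$2"
  if kubectl get crd "$crd" >/dev/null 2>&1; then
    kubectl apply -f "$manifest"
  else
    echo "CRD $crd not found; install the operator first" >&2
    return 1
  fi
}

# Usage (requires cluster access; CRD name is an assumption):
#   apply_custom_resource seldondeployments.machinelearning.seldon.io seldon-deploy.yaml
```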

🧱 10️⃣ YAML Management

As an MLOps engineer, you often use YAML manifests to define resources.

# Apply a YAML file
kubectl apply -f deployment.yaml

# Delete the resources defined in a YAML file
kubectl delete -f deployment.yaml

# View the live object as YAML
kubectl get deployment ml-api -o yaml
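
Before applying a manifest to a shared cluster, `kubectl diff` can show what would change. The sketch below applies only when there is a difference; the function name `preview_and_apply` is an assumption. It relies on `kubectl diff` exiting with 0 for no changes, 1 for differences, and a higher code on errors.

```shell
# Show what a manifest would change, then apply it only if there is a difference.
preview_and_apply() {
  manifest="$1"
  rc=0
  kubectl diff -f "$manifest" || rc=$?
  if [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed (exit $rc)" >&2
    return "$rc"
  fi
  if [ "$rc" -eq 0 ]; then
    echo "no changes"
    return 0
  fi
  kubectl apply -f "$manifest"
}

# Usage (requires cluster access):
#   preview_and_apply deployment.yaml
```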

🚀 Bonus: Useful Shortcuts

# Delete all pods in the current namespace (pods managed by a controller such as a Deployment are recreated)
kubectl delete pods --all

# Restart deployment
kubectl rollout restart deployment/ml-api

# Switch context
kubectl config use-context <context-name>

# View current context
kubectl config current-context
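
Since commands like `kubectl delete pods --all` act on whatever context is active, a small guard can prevent running them against the wrong cluster. A minimal sketch, with `require_context` as a made-up helper name:

```shell
# Refuse to continue unless kubectl points at the expected context.
require_context() {
  expected="$1"
  current=$(kubectl config current-context)
  if [ "$current" != "$expected" ]; then
    echo "current context is '$current', expected '$expected'" >&2
    return 1
  fi
}

# Usage (requires a kubeconfig; "staging" is a placeholder context name):
#   require_context staging && kubectl delete pods --all
```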

🧩 Real-World Use Case

Imagine you have a TensorFlow model deployed as an API using FastAPI.
Here’s what you might do:

1. Build & push the model Docker image.

2. Deploy it on Kubernetes using:

 kubectl create deployment tf-api --image=mlops/tf-api:v1
 kubectl expose deployment tf-api --type=LoadBalancer --port=80 --target-port=8000

3. Set up autoscaling:

 kubectl autoscale deployment tf-api --min=2 --max=10 --cpu-percent=75

4. Configure model retraining via CronJob:

 kubectl create cronjob tf-retrain --image=mlops/train:v2 --schedule="0 0 * * SUN"
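
The steps above can be sketched as one function that stops at the first failure. The name `deploy_tf_api`, the 180-second timeout, and the CPU and memory requests are assumptions. The `kubectl set resources` call is an addition to the steps above, included because CPU-based autoscaling needs a CPU request to compute utilization.

```shell
# Deploy the TensorFlow API, expose it, autoscale it, and schedule weekly retraining.
# Each step runs only if the previous one succeeded.
deploy_tf_api() {
  image="$1"
  kubectl create deployment tf-api --image="$image" &&
  # Assumed values: kubectl create deployment sets no requests on its own.
  kubectl set resources deployment tf-api --requests=cpu=500m,memory=1Gi &&
  kubectl rollout status deployment/tf-api --timeout=180s &&
  kubectl expose deployment tf-api --type=LoadBalancer --port=80 --target-port=8000 &&
  kubectl autoscale deployment tf-api --min=2 --max=10 --cpu-percent=75 &&
  kubectl create cronjob tf-retrain --image=mlops/train:v2 --schedule="0 0 * * SUN"
}

# Usage (requires cluster access):
#   deploy_tf_api mlops/tf-api:v1
```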

💡 Interview Questions

Q1: What’s the difference between a Deployment and a Job in Kubernetes?
Q2: How do you perform rolling updates for ML model APIs?
Q3: How can you autoscale a training pipeline in Kubernetes?
Q4: How do ConfigMaps and Secrets improve ML workflow security?
Q5: What’s the role of CronJobs in MLOps pipelines?

🧭 Conclusion

Kubernetes underpins many modern MLOps platforms.
With these commands, you can manage ML model training, deployment, scaling, and monitoring in production environments confidently.

Master these commands — and you’ll be ready to handle real-world ML pipelines at scale!

✍️ Author: Bittu Sharma
