Day 07: Introduction to CI/CD for Machine Learning (MLOps Series)

I am Bittu Sharma, a DevOps & AI Engineer with a keen interest in building intelligent, automated systems. My goal is to bridge the gap between software engineering and data science, ensuring scalable deployments and efficient model operations in production. Let's Connect: I would love the opportunity to connect and contribute. Feel free to DM me on LinkedIn or reach out to me at bittush9534@gmail.com. I look forward to connecting and networking with people in this exciting tech world.
Introduction
As Machine Learning (ML) projects grow, maintaining consistency, reproducibility, and automation becomes challenging.
That's where CI/CD (Continuous Integration and Continuous Delivery) steps in: it helps automate the process of building, testing, and deploying ML models.
But unlike traditional software, ML CI/CD comes with unique challenges: data versioning, model drift, dependency tracking, and reproducibility.
In this post, we'll explore how CI/CD applies to ML, the key challenges, and how to build an ML CI/CD pipeline using GitHub Actions.
What is CI/CD?
Continuous Integration (CI)
Continuous Integration is the process of automatically testing and validating code changes every time a team member pushes new code to the repository.
In ML:
Validate code and data integrity
Run unit tests for preprocessing and model scripts
Train lightweight models to check reproducibility
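The data-integrity check from the list above can be sketched with the standard library alone. The column names and the `validate_rows` helper are assumptions made for illustration; this post does not define the schema of data/iris.csv, so adjust them to your own file.

```python
import csv
import io

# Hypothetical schema: four numeric feature columns plus a label column.
EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
NUMERIC_COLUMNS = EXPECTED_COLUMNS[:4]


def validate_rows(csv_text: str) -> list:
    """Return a list of human-readable problems; an empty list means the data passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in NUMERIC_COLUMNS:
            try:
                value = float(row[col])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {col} is not numeric ({row[col]!r})")
                continue
            if value <= 0:
                problems.append(f"line {line_no}: {col} must be positive, got {value}")
        if not row["species"]:
            problems.append(f"line {line_no}: missing species label")
    return problems
```

In a CI job you could read the CSV file, pass its text to `validate_rows`, and fail the build whenever the returned list is not empty.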
Continuous Delivery (CD)
Continuous Delivery automates the release of your models or applications so that they can be deployed as soon as they pass all tests.
In ML:
Deploy trained models to production or staging
Automate retraining and redeployment
Version models (e.g., with DVC or MLflow)
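Before a model is deployed, a CD step usually needs some rule for deciding whether the new model is good enough. Here is a minimal sketch of such a promotion gate. The function name, the 0.90 quality bar, and the 0.01 tolerance are all assumptions for illustration, not values taken from any tool.

```python
from typing import Optional


def should_promote(candidate_acc: float,
                   production_acc: Optional[float],
                   min_accuracy: float = 0.90,
                   tolerance: float = 0.01) -> bool:
    """Decide whether a newly trained model may replace the production model."""
    if candidate_acc < min_accuracy:
        return False  # fails the absolute quality bar
    if production_acc is None:
        return True  # nothing is deployed yet, so the bar alone decides
    # Allow a small regression so metric noise does not block every release.
    return candidate_acc >= production_acc - tolerance
```

For example, `should_promote(0.95, None)` allows a first deployment, while `should_promote(0.91, 0.95)` blocks a model that is clearly worse than the one already in production.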
Why Is CI/CD for ML Different?
Traditional CI/CD works for code.
In ML, we also deal with:
Large datasets
Model artifacts
Data drift and model performance drift
Experiment tracking and reproducibility
Dependency management for libraries and frameworks
So ML CI/CD = Code + Data + Model + Metrics pipelines.
MLOps CI/CD Pipeline Overview
Here's how a typical ML CI/CD pipeline works:
Data Collection → Preprocessing → Model Training → Evaluation → Model Registry → Deployment
                       ↓                                ↓                            ↓
                   CI Testing                       MLflow/DVC                  CD Automation
Hands-on: Set Up CI/CD with GitHub Actions for an ML Pipeline
We'll create a simple ML pipeline that:
Loads data
Trains a model
Runs tests
Tracks metrics
Automates this flow using GitHub Actions
Step 1: Project Structure
ml-ci-cd/
│
├── data/
│   └── iris.csv
├── src/
│   ├── train.py
│   └── test_train.py
├── requirements.txt
└── .github/
    └── workflows/
        └── ml_pipeline.yml
Step 2: train.py - Train Your Model
# src/train.py
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib


def train_model():
    # Load the built-in Iris dataset as a feature matrix X and a label vector y.
    X, y = load_iris(return_X_y=True)
    # Hold out 20% of the rows for evaluation; the fixed seed keeps the split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"Model Accuracy: {acc:.2f}")
    # Persist the fitted model so that later steps (tests, deployment) can load it.
    joblib.dump(model, "model.pkl")


if __name__ == "__main__":
    train_model()
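The pipeline goal "Tracks metrics" is only covered by a print statement in train.py. One way to make the metric available to later steps is to write it to a small JSON file. The helper name `save_metrics` and the file name `metrics.json` are assumptions for this sketch; tools such as MLflow or DVC have their own mechanisms.

```python
import json
from pathlib import Path


def save_metrics(metrics: dict, path: str = "metrics.json") -> Path:
    """Write metrics as pretty-printed JSON and return the path that was written."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    out.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out
```

Inside `train_model()`, a call such as `save_metrics({"accuracy": float(acc)})` right after the accuracy is computed would record the value for each run.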
Step 3: test_train.py - Add Basic Unit Tests
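# src/test_train.py
# Run from the repository root after src/train.py has produced model.pkl.
import joblib
from sklearn.datasets import load_iris


def test_model_exists():
    # joblib.load raises FileNotFoundError if the training step did not write model.pkl.
    model = joblib.load("model.pkl")
    assert model is not None, "Model not found!"


def test_prediction_shape():
    model = joblib.load("model.pkl")
    X, _ = load_iris(return_X_y=True)
    # Predict on five rows and expect exactly five predictions back.
    preds = model.predict(X[:5])
    assert preds.shape[0] == 5, "Prediction shape mismatch"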
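The two tests above confirm that the model loads and predicts, but not that it predicts well. Here is a hedged sketch of a quality check, assuming the training step also writes its accuracy to a `metrics.json` file. That file, the `check_accuracy` helper, and the 0.90 threshold are hypothetical and not part of the train.py shown in this post.

```python
import json
from pathlib import Path

MIN_ACCURACY = 0.90  # hypothetical quality bar; tune it per project


def check_accuracy(metrics_path: str = "metrics.json", minimum: float = MIN_ACCURACY) -> float:
    """Load the recorded accuracy and raise AssertionError if it is below the bar."""
    metrics = json.loads(Path(metrics_path).read_text(encoding="utf-8"))
    accuracy = metrics["accuracy"]
    if accuracy < minimum:
        raise AssertionError(f"accuracy {accuracy:.3f} is below {minimum:.2f}")
    return accuracy
```

In src/test_train.py you could wrap this in a test function that simply calls `check_accuracy()`, so that the CI job fails when the model regresses.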
Step 4: requirements.txt
scikit-learn
joblib
pytest
Step 5: Set Up the GitHub Actions Workflow
Create a new file: .github/workflows/ml_pipeline.yml
name: ML CI/CD Pipeline
on:
  push:
    branches:
      - main
  pull_request:
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Train model
        run: |
          python src/train.py
      - name: Run tests
        run: |
          pytest src/test_train.py
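Before pushing, it can help to run the same steps locally in the same order as the workflow. The runner below is a sketch of that idea and is not part of GitHub Actions. The `run_steps` helper and the `STEPS` list are names chosen for illustration, and the commands assume you run them from the repository root.

```python
import subprocess
import sys

# Mirrors the "Train model" and "Run tests" steps of the workflow.
STEPS = [
    ("Train model", [sys.executable, "src/train.py"]),
    ("Run tests", [sys.executable, "-m", "pytest", "src/test_train.py"]),
]


def run_steps(steps) -> bool:
    """Run each command in order and stop at the first non-zero exit code."""
    for name, command in steps:
        print(f"==> {name}")
        if subprocess.run(command).returncode != 0:
            print(f"Step failed: {name}")
            return False
    return True
```

Calling `run_steps(STEPS)` returns `True` only if every step succeeds. Like a CI job, it skips the tests when training fails.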
Step 6: Verify in GitHub Actions
Commit your code:
git add .
git commit -m "Setup CI/CD for ML pipeline"
git push origin main
Go to GitHub → Actions tab
Watch your ML pipeline run automatically on each push to main and on every pull request!
Step 7: Extend the Pipeline (Optional)
Once basic CI/CD is working, extend it by:
Using MLflow or DVC to track experiments
Pushing trained models to S3 or Model Registry
Triggering deployment workflows (e.g., Docker + AWS + Kubernetes)
Real-World Example
Companies like Spotify, Netflix, and Google integrate CI/CD in their MLOps workflows to automate:
Retraining models when new data arrives
Testing model performance
Rolling back to older models when drift occurs
They often use tools like GitHub Actions, Jenkins, GitLab CI, or Kubeflow Pipelines.
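To make "when drift occurs" concrete, here is a small sketch of one common drift score, the Population Stability Index (PSI). It compares the distribution of a single numeric feature at training time with its distribution in new data. PSI is a technique swapped in here for illustration; this post does not claim that any of the companies above use it. The function name and the bin count are assumptions.

```python
import math


def population_stability_index(expected, actual, bins: int = 10) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a score of 0. A frequently quoted rule of thumb treats scores above roughly 0.25 as a significant shift, but the right threshold depends on your data, so treat that number as a starting point only.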
Common Interview Questions
Q1. What's the difference between traditional CI/CD and ML CI/CD?
A: ML CI/CD includes data and model versioning, not just code automation.
Q2. What tools can be used for ML CI/CD?
A: GitHub Actions, Jenkins, GitLab CI, Kubeflow, Airflow.
Q3. How can you automate retraining in CI/CD?
A: By setting triggers on data changes (using DVC or S3 events).
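Q4. What's model drift, and how do you handle it?
A: Model drift is the degradation of a model's performance when the data distribution (or the relationship between inputs and targets) changes after training. Handle it by monitoring metrics over time (for example, metrics logged to MLflow) and retraining when they degrade.
Q5. What's the role of testing in ML CI/CD?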
A: Ensures that data preprocessing, model training, and inference pipelines behave as expected.
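The answer to Q3 (triggering retraining on data changes) can be illustrated without any external tool. The sketch below hashes the dataset and compares the result with the hash stored from the previous run. It is a simplified stand-in for the content-hash idea and not the DVC or S3 API; the helper names and the `.data_hash` state file are assumptions.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_retraining(data_path: str, state_path: str = ".data_hash") -> bool:
    """Return True and record the new hash when the dataset differs from the last run."""
    current = file_sha256(data_path)
    state = Path(state_path)
    previous = state.read_text(encoding="utf-8").strip() if state.exists() else None
    if current == previous:
        return False
    state.write_text(current + "\n", encoding="utf-8")
    return True
```

A scheduled job could call `needs_retraining("data/iris.csv")` and start the training step only when it returns `True`.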
Summary
| Concept | Description |
| --- | --- |
| CI | Automate build, test, and validation |
| CD | Automate deployment and delivery |
| ML CI/CD | Adds data, model, and metrics to pipeline |
| Tools | GitHub Actions, MLflow, DVC, Kubeflow |
| Outcome | Faster, reproducible, and reliable ML pipelines |
Final Thoughts
Integrating CI/CD into your ML workflow is a crucial step toward reliable, scalable, and production-ready ML systems.
It ensures that every code or data change is tested, validated, and deployed automatically, keeping your models in sync with reality.
Follow me on LinkedIn
Follow me on GitHub
Keep Learning...




