From Pandas to the Cloud: How I Accidentally Became a Container Evangelist

A few months ago, I was minding my own business—tuning data pipelines, optimizing our queries, and doing the usual data engineering work—when a curious message popped up on Teams.
“Hey, got a sec? We’ve got this demand forecasting script that’s... uh, getting out of hand. Can you take a look?”
Sure, why not?
The Forecasting Fiasco
The business team had been running their demand forecasting using a familiar tool—Pandas. It wasn’t flashy, but it worked. Well, sort of. The process pulled source data from multiple systems: CRM, Budgeting, Revenue and even a few Excel files manually emailed every week.
The pipeline (if you could call it that) involved:
- Opening the Python project on someone’s personal laptop
- Running the main script
- Exporting results to CSVs
- Manually uploading those files to an Azure Storage container
- Then praying the files loaded into the database successfully, which occasionally failed due to formatting issues
It was equal parts art, science, and daily anxiety.
Whose Turn Is It to Run the Forecast?
The script wasn’t running on any server. It was tied, rather tragically, to whoever was available to run it. Every week, a different analyst would take their turn.
I still remember the call I got one morning:
“Hey, I tried running the forecast and it’s throwing a weird error about openpyxl and pyodbc. I don’t know what that means.”
I did. It meant someone’s Conda environment had gone rogue.
The business team wasn’t just running forecasts. They were wrestling Conda environments, fighting PATH variables, and battling dependency hell. It was a miracle anything worked at all.
Enter the Data Engineer
That’s when they came to me.
The request was simple enough: “Can we automate this?”
So I started looking at options:
- Azure Synapse Notebook? Tempting, but the script was packed with custom Python libraries written for Python 3.7 that wouldn’t be compatible with the supported Synapse Spark 3.5 runtime.
- Azure Data Factory? Also an option, but converting a sprawling Pandas script into ADF data flows felt like translating poetry into assembly language.
After a few trials, I realized something important: we didn’t need to rewrite the logic—we just needed to containerize it.
A Job for Azure Container Jobs
And that’s when I discovered Azure Container Jobs.
No need for orchestration engines, no Kubernetes cluster to maintain, and no VM just sitting idle. I could:
- Containerize the Python application as a Docker image
- Push it to Azure Container Registry
- Set up an Azure Container Apps environment
- Create an Azure File Share and mount it as a volume in the container
- Create a Container App Job and schedule it to run once a week (or whenever needed), persisting the forecasting report in Azure File Share
Best of all? It ran the exact same environment every time. No more “it works on my laptop” debates. Here's what the solution looks like.
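Before wiring up the job itself, the Azure-side groundwork has to exist: a Container Apps environment and the Azure File Share that will hold the forecast output. Here’s a rough sketch of that setup with the Azure CLI; every resource name and the region are placeholders I’m using for illustration, not the project’s real ones.
# Resource group and Container Apps environment (names/region are placeholders)
az group create --name rg-forecasting --location eastus
az containerapp env create \
    --name forecasting-env \
    --resource-group rg-forecasting \
    --location eastus
# Storage account and file share for the forecast output
az storage account create \
    --name forecaststorage01 \
    --resource-group rg-forecasting \
    --location eastus \
    --sku Standard_LRS
az storage share-rm create \
    --storage-account forecaststorage01 \
    --name forecasts
# Register the file share with the environment so jobs can mount it as a volume
STORAGE_KEY=$(az storage account keys list \
    --account-name forecaststorage01 \
    --resource-group rg-forecasting \
    --query "[0].value" --output tsv)
az containerapp env storage set \
    --name forecasting-env \
    --resource-group rg-forecasting \
    --storage-name forecasts \
    --azure-file-account-name forecaststorage01 \
    --azure-file-account-key "$STORAGE_KEY" \
    --azure-file-share-name forecasts \
    --access-mode ReadWrite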
Dockerfile (Simplified)
To keep the size of the image down, I used a multi-stage build, which cut the final image size by about 50%:
# Use the official Miniconda base image
FROM mcr.microsoft.com/devcontainers/miniconda:latest AS build
# Copy environment files
COPY environment.yml environment.yml
# Install system dependencies (package list omitted in this simplified version)
RUN apt-get update && apt-get install -y --no-install-recommends \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Create a default conda environment
RUN conda env create -f environment.yml
# Install conda-pack:
RUN conda install -y -c conda-forge conda-pack
# Use conda-pack to package the environment into a standalone virtual env
RUN conda-pack -n default -o /tmp/env.tar && \
    mkdir /venv && cd /venv && tar xf /tmp/env.tar && \
    rm /tmp/env.tar
# This makes the entire /venv directory truly self-contained and portable
RUN /venv/bin/conda-unpack
# Start fresh from a minimal base image without conda,
# as the venv we built previously is completely self-sufficient
FROM debian:buster AS runtime
# Copy /venv from the previous stage:
COPY --from=build /venv /venv
# Add Conda Python to PATH
ENV PATH="/venv/bin:$PATH"
# Copy the main script
COPY forecasting_pipeline.py .
# Default run command
CMD ["python", "forecasting_pipeline.py"]
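Before anything goes near Azure, a quick local smoke test is worth the thirty seconds. A minimal sketch, assuming the script picks up its settings from a .env file (it uses python-dotenv) and writes its output under /mnt/forecasts; both of those are placeholder paths, not necessarily what the real script does:
# Build the image and run it once locally; paths and file names are illustrative
docker build -t forecasting-pipeline .
docker run --rm \
    --env-file .env \
    -v "$(pwd)/output:/mnt/forecasts" \
    forecasting-pipeline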
Example environment.yml
The trickiest part of this exercise was upgrading Python from 3.7 to 3.11, but it turned out to be a blessing in disguise: the newer version brought noticeable performance improvements, better error messages, and a longer support window, leaving our container builds leaner and our data pipeline faster and more maintainable.
name: default
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pandas
  - numpy
  - scipy
  - plotly
  - pyodbc
  - azure-identity
  - python-dateutil
  - pip
  - pip:
      - python-dotenv
      - sqlalchemy
Container Registry and Deployment
To deploy:
# Build the container
docker build -t forecasting-pipeline .
# Tag and push to ACR
az acr login --name myregistry
docker tag forecasting-pipeline myregistry.azurecr.io/forecasting-pipeline:latest
docker push myregistry.azurecr.io/forecasting-pipeline:latest
Then, schedule it as an Azure Container Job with the Azure File Share mounted and the appropriate secrets and environment variables passed in.
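A sketch of what that job creation could look like with the Azure CLI; the cron expression, names, sizes, and the secret are all placeholders, and they assume the environment and file share from the earlier sketch. The volume mount that actually points the job at the file share is typically added afterwards through the job’s YAML configuration or the portal.
# Weekly scheduled job (here: Mondays at 06:00 UTC); all names and values are placeholders
az containerapp job create \
    --name forecasting-job \
    --resource-group rg-forecasting \
    --environment forecasting-env \
    --trigger-type Schedule \
    --cron-expression "0 6 * * 1" \
    --replica-timeout 3600 \
    --replica-retry-limit 1 \
    --image myregistry.azurecr.io/forecasting-pipeline:latest \
    --cpu 1.0 \
    --memory 2.0Gi \
    --registry-server myregistry.azurecr.io \
    --registry-username <acr-username> \
    --registry-password <acr-password> \
    --secrets "sql-conn=<connection-string>" \
    --env-vars "SQL_CONNECTION_STRING=secretref:sql-conn"
Once that’s in place, the job pulls the image from ACR on its schedule, runs forecasting_pipeline.py, and drops the report onto the mounted share.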
From Chaos to Confidence
Now the forecasting pipeline runs like clockwork. The business team doesn’t worry about conda environments or weird Excel edge cases. They just get their forecast files where they need them, when they need them.
I didn’t set out to become a DevOps-for-Pandas advocate. But sometimes the best solutions aren’t the flashiest—they’re the ones that quietly work, week after week.
And all it took was one container.
Epilogue: Lessons Learned
- Manual data pipelines are a hidden tax on productivity.
- Not everything needs to be rewritten—encapsulation can be powerful.
- Azure Container Jobs are a sweet spot between DIY infrastructure and fully managed orchestration.
If you’re a data engineer caught between business logic and bad environments, don’t overlook containers. Sometimes, all you need is a good Dockerfile and a quiet Sunday to set things right.