How to Deploy an Airflow Stack

April 01, 2025

Apache Airflow is a powerful, open-source platform for programmatically authoring, scheduling, and monitoring workflows. It is widely used in data engineering and analytics to orchestrate complex pipelines, ensuring tasks run in the correct order and at the right time. With its Directed Acyclic Graph (DAG) structure, Airflow lets users define workflows as Python code, making them both flexible and highly customizable.

Why Use Apache Airflow?

What would you do if you wanted to run a task every few hours? The classic answer is a cron job. But consider this scenario:

Suppose you want to back up your PostgreSQL database every night at midnight. A cron job is perfect for this because it's simple, lightweight, and doesn't require complex dependencies. You can set it up with a single line in your crontab:

0 0 * * * pg_dump -U postgres my_database > /backups/my_database_$(date +\%F).sql

Why Use Cron?
Cron jobs are ideal for straightforward, time-based tasks that don't require monitoring, retries, or interdependencies with other tasks. They're easy to set up and sufficient for simple automation needs.

Now imagine you need to back up your database, process the backup file (e.g., compress it), and then upload it to cloud storage. Additionally, you want to monitor the workflow, handle failures, and ensure tasks run in the correct order. Achieving this with cron jobs is a nightmare. This is where Airflow shines. You can define a Directed Acyclic Graph (DAG) to orchestrate these tasks in the right order and track tasks, configure retries, and so much more.
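To make the contrast concrete, here is a minimal plain-Python sketch (not Airflow code; `run_with_retries` and the step lambdas are hypothetical) of the bookkeeping you would have to hand-roll around cron: ordered execution of dependent steps, plus retries on failure.

```python
import time

def run_with_retries(task, retries=3, delay=0):
    """Run a task callable, retrying on failure the way Airflow would."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the failure
            time.sleep(delay)  # back off before the next attempt

# The three steps of the pipeline, in dependency order.
pipeline = [
    lambda: "backup taken",
    lambda: "backup compressed",
    lambda: "backup uploaded",
]

# Each step runs only after the previous one succeeded.
results = [run_with_retries(step) for step in pipeline]
```

Airflow gives you all of this (plus logging, alerting, and a UI) for free; with cron you would maintain this glue code yourself.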

Why Use Airflow?

Airflow is ideal for managing workflows that require dependency management, retries, and scheduling.

  • ETL Pipelines: Extracting, transforming, and loading data from various sources into data warehouses.
  • Data Processing: Orchestrating machine learning pipelines or large-scale data transformations.
  • Automation: Scheduling and automating repetitive tasks across systems.
  • Monitoring: Providing visibility into task execution and logs for debugging.

Key Features of Apache Airflow

  • Dynamic Workflow Creation: Define workflows as Python code, enabling dynamic and reusable pipelines.
  • Scalability: Scale horizontally with distributed execution using workers.
  • Extensibility: Integrate with a wide range of systems via built-in operators or custom plugins.
  • Web Interface: Monitor and manage workflows through an intuitive web-based UI.
  • Resilience: Built-in retry mechanisms and task dependencies ensure robust execution.

Components of an Airflow Docker Stack

When deploying Airflow using Docker, the stack typically consists of several components, each playing a critical role in the system's functionality. Let’s break them down:

1. Scheduler

The scheduler is the brain of Airflow. It is responsible for parsing DAGs, determining task dependencies, and scheduling tasks for execution. The scheduler ensures that tasks are executed in the correct order and handles retries for failed tasks. It continuously monitors the state of the system and assigns tasks to available workers.

2. Worker

Workers are the execution engines of Airflow. They pick up tasks assigned by the scheduler and execute them. In a distributed setup, multiple workers can run in parallel, enabling horizontal scaling to handle large workloads. Workers use a message broker (like Redis) to receive task execution instructions.

3. Postgres Database

The Postgres database serves as the metadata store for Airflow. It keeps track of DAG definitions, task states, execution logs, and other critical information. This database is essential for ensuring the persistence and consistency of workflow execution data.

4. Redis Server

Redis acts as the message broker in the Airflow stack. It facilitates communication between the scheduler and workers by queuing tasks. Redis ensures that tasks are distributed efficiently and reliably across the system, enabling smooth coordination between components.
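Conceptually, the broker is just a shared FIFO queue: the scheduler pushes task identifiers onto it, and workers pop them off. A toy in-process sketch (a `deque` standing in for Redis; the function names are illustrative, not Airflow APIs):

```python
from collections import deque

broker = deque()  # stands in for the Redis queue Celery uses

def schedule(task_id):
    """Scheduler side: enqueue a task for any available worker."""
    broker.append(task_id)

def work():
    """Worker side: pull the next task off the queue, FIFO order."""
    return broker.popleft() if broker else None

schedule("backup_db")
schedule("compress_backup")
first = work()   # "backup_db"
second = work()  # "compress_backup"
```

In the real stack, multiple worker containers pop from the same Redis queue concurrently, which is what makes horizontal scaling possible.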

5. Webserver

The webserver provides a user-friendly interface for interacting with Airflow. Through the web UI, users can monitor DAGs, view task statuses, trigger workflows, and access logs. It is an essential tool for managing and debugging workflows in real time.


By combining these components, the Airflow Docker stack creates a robust and scalable environment for orchestrating workflows. In the next sections, we’ll dive into how to deploy and configure this stack to get your workflows up and running efficiently.

Deploy the Stack

  1. Create a folder locally named airflow that will contain all of our project code
  2. We will build our own custom Airflow image. This will allow us to install additional Airflow dependencies or Python packages that we might want to add. Create a src folder inside the main project folder. It will contain:
  • Dockerfile for our custom docker image
  • python_requirements.txt for our Python packages
  • airflow_requirements.txt for our Airflow additional packages
  3. We will add GitPython, psycopg2-binary, and redis to the Python requirements file so that custom Airflow operators and hooks can use these packages. For the Airflow requirements, we will add the postgres, redis, google, and google_auth extras. Our Dockerfile should then look like this:
FROM --platform=linux/amd64 apache/airflow:2.10.5-python3.12

ARG MAIN_PYTHON_VERSION=3.12
ARG MAIN_AIRFLOW_VERSION=2.10.5

ARG RUNTIME_APT_DEPS_INSTALL="\
       build-essential \
       curl \
       libpq-dev \
       dnsutils \
       git"

ARG CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${MAIN_AIRFLOW_VERSION}/constraints-${MAIN_PYTHON_VERSION}.txt"

ENV RUNTIME_APT_DEPS=${RUNTIME_APT_DEPS_INSTALL}
ENV SCRIPTS=/scripts

USER root
RUN set -ex \
  && apt-get update -y \
  # installing apt dependencies
  && apt-get install -y --no-install-recommends \
        ${RUNTIME_APT_DEPS} \
  # removing unnecessary files
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -vrf /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base

COPY ./python_requirements.txt ${SCRIPTS}/python_requirements.txt
COPY ./airflow_requirements.txt ${SCRIPTS}/airflow_requirements.txt
RUN pip3 install --no-cache-dir -r "${SCRIPTS}/python_requirements.txt" --constraint ${CONSTRAINT_URL}
RUN pip3 install --no-cache-dir apache-airflow["$(grep -v '^#' ${SCRIPTS}/airflow_requirements.txt | tr -s "( |\n)" , | head -c -1)"]==${MAIN_AIRFLOW_VERSION} --constraint ${CONSTRAINT_URL}
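The last `RUN` line builds the `apache-airflow[extras]` string from the requirements file. The shell pipeline is terse, so here is a hypothetical Python equivalent of the same transformation (`build_extras` is not part of the build, just a sanity check for the file's contents):

```python
def build_extras(requirements_text):
    """Turn the body of airflow_requirements.txt into the extras string
    expected by `pip install apache-airflow[extras]`: drop comment and
    blank lines, keep one extra per line, join with commas."""
    names = [
        line.strip()
        for line in requirements_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return ",".join(names)

extras = build_extras("postgres\nredis\n# optional providers\ngoogle\ngoogle_auth\n")
# With our file, this produces the equivalent of:
#   pip3 install "apache-airflow[postgres,redis,google,google_auth]==2.10.5"
```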
  4. Switch to the src folder and build the Docker image:
cd src
docker build -t custom-airflow:V1 .
  5. Create a data folder to be mounted as a volume, containing the following subfolders for the stack: conf for the Airflow configuration file, dags for the Airflow DAGs (we can add an empty hello_world.py for now), logs for task logs, pgdata for the Postgres data files, and plugins for any custom plugins. Add the airflow.cfg config file to the conf folder. The folder structure should now look like this:
airflow
├── data
│   ├── conf
│   │   └── airflow.cfg
│   ├── dags
│   │   └── hello_world.py
│   ├── logs
│   ├── pgdata
│   └── plugins
├── src
│   ├── Dockerfile
│   ├── airflow_requirements.txt
│   └── python_requirements.txt
└── docker-compose.yml
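If you'd rather script the layout than create it by hand, here is a small pathlib sketch (it uses a temporary base directory for safety; substitute your real airflow project path):

```python
from pathlib import Path
import tempfile

# Substitute your real project path for the temporary one used here.
base = Path(tempfile.mkdtemp()) / "airflow"

# Create the directories the stack will mount as volumes.
for sub in ("data/conf", "data/dags", "data/logs", "data/pgdata",
            "data/plugins", "src"):
    (base / sub).mkdir(parents=True, exist_ok=True)

# Touch the files the later steps will fill in.
for f in ("data/conf/airflow.cfg", "data/dags/hello_world.py",
          "src/Dockerfile", "src/airflow_requirements.txt",
          "src/python_requirements.txt", "docker-compose.yml"):
    (base / f).touch()
```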
  6. Populate airflow.cfg with the default configuration template from the Apache Airflow repository. Since the stack only runs locally, you can set the Postgres, Airflow user, and Redis passwords to a test value such as Abcd1234. Example from the test airflow.cfg local file:
[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:Abcd1234@postgres:5432/airflow

# The encoding for the databases
sql_engine_encoding = utf-8
.
.
.
broker_url = redis://:Abcd1234@redis:6379/0
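Both connection strings follow the standard URL shape `scheme://user:password@host:port/db`. A quick sanity check with the standard library (values copied from the example config above) confirms the hostnames match the docker-compose service names, which is what lets the containers reach each other on the stack network:

```python
from urllib.parse import urlsplit

sql_alchemy_conn = "postgresql+psycopg2://airflow:Abcd1234@postgres:5432/airflow"
broker_url = "redis://:Abcd1234@redis:6379/0"

db = urlsplit(sql_alchemy_conn)
broker = urlsplit(broker_url)

# The hostnames must be the docker-compose service names,
# not localhost, because resolution happens inside the stack network.
print(db.hostname, db.port)          # postgres 5432
print(broker.hostname, broker.port)  # redis 6379
```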
  7. Change the base_url parameter under [webserver] to be: base_url = http://localhost:8080.
  8. These instructions assume you are running the Docker stack on Linux. On any other OS, consider setting up Docker Desktop as your Docker runtime. If you cannot use Docker Desktop due to its license restrictions, Colima is a free, open-source alternative.
  9. Add the following to your docker-compose.yml to define all the services the Airflow stack needs, hardcoding the environment variables (username, passwords, ports, etc.) that match the connection parameters in the config file:
version: "3.8"
services:
  postgres:
    image: postgres:16
    deploy:
      placement:
        constraints: [node.role == manager]
    environment:
      - PGDATA=/var/lib/postgresql/data
      - POSTGRES_DB=airflow
      - POSTGRES_HOST=postgres
      - POSTGRES_PASSWORD=<PASSWORD>
      - POSTGRES_PORT=5432
      - POSTGRES_USER=airflow
    volumes:
      - ~/projects/pipeline/airflow/data/pgdata:/var/lib/postgresql/data
    networks:
      - airflow_network
      # - datadog-network
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5

  redis:
    image: redis:latest
    command: redis-server --requirepass <PASSWORD>
    networks:
      - airflow_network
      # - datadog-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50

  webserver:
    image: custom-airflow:V1
    deploy:
      placement:
        constraints: [node.role == manager]
    command: webserver
    ports:
      - "8080:8080"
    environment:
      - _AIRFLOW_DB_UPGRADE=true
      - _AIRFLOW_WWW_USER_CREATE=true
      - _AIRFLOW_WWW_USER_PASSWORD=<PASSWORD>
      - AIRFLOW_UID=501
      - POSTGRES_PASSWORD=<PASSWORD>

    volumes:
      - ~/projects/pipeline/airflow/data/dags:/opt/airflow/dags
      - ~/projects/pipeline/airflow/data/logs:/opt/airflow/logs
      - ~/projects/pipeline/airflow/data/plugins:/opt/airflow/plugins
      - ~/projects/pipeline/airflow/data/conf/airflow.cfg:/opt/airflow/airflow.cfg
    networks:
      - airflow_network

  scheduler:
    image: custom-airflow:V1
    deploy:
      placement:
        constraints: [node.role == manager]
    command: scheduler
    environment:
      - _AIRFLOW_DB_UPGRADE=true
    volumes:
      - ~/projects/pipeline/airflow/data/dags:/opt/airflow/dags
      - ~/projects/pipeline/airflow/data/logs:/opt/airflow/logs
      - ~/projects/pipeline/airflow/data/plugins:/opt/airflow/plugins
      - ~/projects/pipeline/airflow/data/conf/airflow.cfg:/opt/airflow/airflow.cfg
    networks:
      - airflow_network

  worker:
    image: custom-airflow:V1
    deploy:
      placement:
        constraints: [node.role == manager]
    command: celery worker
    environment:
      - _AIRFLOW_DB_UPGRADE=true
      - POSTGRES_PASSWORD=<PASSWORD>
    volumes:
      - ~/projects/pipeline/airflow/data/dags:/opt/airflow/dags
      - ~/projects/pipeline/airflow/data/logs:/opt/airflow/logs
      - ~/projects/pipeline/airflow/data/plugins:/opt/airflow/plugins
      - ~/projects/pipeline/airflow/data/conf/airflow.cfg:/opt/airflow/airflow.cfg
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - airflow_network

networks:
  airflow_network:
    external: false
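After deployment, the services take a little while to pass their health checks. A small polling helper can save you from refreshing the browser (this is a hypothetical sketch, not part of the stack; the `probe` callable is injected so you can plug in, say, an HTTP check against `http://localhost:8080/health`):

```python
import time

def wait_until_healthy(probe, attempts=30, delay=2.0):
    """Call `probe()` until it returns True or attempts run out.
    Returns True on success, False if the service never came up."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False

# An HTTP probe against the webserver might look like:
#   import urllib.request
#   probe = lambda: urllib.request.urlopen(
#       "http://localhost:8080/health").status == 200
```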
  10. Add DAGs to the dags folder. We will create a simple "Hello World" test DAG with two tasks: a PythonOperator that prints "Hello World!" to the logs, and an EmptyOperator marking the end. The hello_world.py file should look like this:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

def test_python():
    print("Hello World!")

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2025, 4, 1),
    'email': ['khalid@deriv.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=2),
}

dag_params = {
    'dag_id': 'test_exec',
    'default_args': default_args,
    'schedule_interval': None,
    'catchup': False,
    'max_active_runs': 5,
}

with DAG(**dag_params) as dag:

    t1 = PythonOperator(
        task_id='test_python',
        python_callable=test_python,
        dag=dag,
    )

    t2 = EmptyOperator(task_id='end')

    t1 >> t2
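Because the task logic is just a Python function, you can unit-test it without starting Airflow at all, by capturing stdout with the standard library (a sketch of that idea):

```python
import io
from contextlib import redirect_stdout

def test_python():
    print("Hello World!")

# Capture what the callable writes to stdout, the same text
# that ends up in the Airflow task logs.
buf = io.StringIO()
with redirect_stdout(buf):
    test_python()
output = buf.getvalue()  # "Hello World!\n"
```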
  11. Now we can deploy our Airflow Docker stack. Switch to the root folder and deploy the stack by running:
cd .. # Make sure you're in the `airflow` directory
docker swarm init
docker stack deploy --compose-file docker-compose.yml airflow
  12. Go to the webserver URL at http://localhost:8080 and trigger the test DAG.


Written by Khalid Ibrahim Adem, a passionate developer and life-long learner.

© 2026 Khalid Ibrahim Adem. Bragging rights reserved 😎