Run Airflow + StatsD + Grafana locally

Daniel
2 min read · Dec 12, 2020

I wanted to understand how airflow, StatsD, prometheus, and grafana all fit together.

Here’s what I did.

Let’s assume you know how to run airflow locally in a virtualenv. With that, let’s begin.

Run airflow

Install the statsd extra with pip install 'apache-airflow[statsd]' (the quotes keep shells like zsh from treating the brackets as a glob pattern).

Next, like the docs say, enable statsd in airflow.cfg. On the airflow 1.10 line these settings live under [scheduler]; airflow 2.0 moved them to [metrics]:

[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 9125
statsd_prefix = airflow

Notice I changed the port from the default 8125 to 9125, which is the port statsd_exporter listens on by default.
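If you want to double-check what you just edited, here is a minimal sketch that reads the statsd options back out of an airflow.cfg-style file. The function names are hypothetical, and it assumes the settings sit under [scheduler] as above. It also shows the AIRFLOW__{SECTION}__{KEY} environment variable names airflow accepts as an alternative to editing the file.

```python
import configparser

# The same snippet as above, inlined so the example is self-contained.
SAMPLE_CFG = """
[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 9125
statsd_prefix = airflow
"""


def statsd_settings(cfg_text: str, section: str = "scheduler") -> dict:
    """Return the statsd_* options from one section of an airflow.cfg-style file."""
    # interpolation=None because a real airflow.cfg may contain literal '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {k: v for k, v in parser[section].items() if k.startswith("statsd_")}


def as_env_vars(settings: dict, section: str = "scheduler") -> dict:
    """Map options to airflow's AIRFLOW__{SECTION}__{KEY} environment variable names."""
    return {f"AIRFLOW__{section.upper()}__{k.upper()}": v for k, v in settings.items()}
```

To check a real file, pass its contents, e.g. statsd_settings(open(path).read()), where path is wherever your airflow.cfg lives.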

Next run airflow scheduler and airflow webserver, each in its own terminal.

You might think you could just go to http://localhost:8080/metrics and see some metrics. But you’d be wrong.

Run statsd_exporter

Now you have airflow running, and statsd enabled. However, you still have no stats to look at. All you have done is tell airflow to push events to localhost:9125, but nothing is listening for events on this port.
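To make "push events" concrete: StatsD metrics are plain text lines such as name:value|c (a counter), usually sent over UDP. The sketch below is not airflow's own code, just a hand-rolled illustration with hypothetical function names and a made-up metric name. Because UDP is fire-and-forget, the send succeeds even when nothing is listening, which is why airflow does not complain.

```python
import socket


def format_counter(name: str, value: int = 1, prefix: str = "airflow") -> bytes:
    """Build a StatsD counter line, e.g. b'airflow.demo_counter:1|c'."""
    return f"{prefix}.{name}:{value}|c".encode("ascii")


def send_metric(payload: bytes, host: str = "localhost", port: int = 9125) -> None:
    """Send one StatsD line over UDP; no reply is expected or waited for."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Once the exporter from the next step is running, calling send_metric(format_counter("demo_counter")) should show up in its debug output alongside the real airflow events.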

To fix that, we need to run another service:

docker run --rm -it \
  -p 9102:9102 \
  -p 9125:9125 \
  -p 9125:9125/udp \
  prom/statsd-exporter \
  --log.level=debug

This container listens on 9125 for events pushed from airflow.

And since you ran it in debug mode, you should be able to see airflow events coming through in stdout.

Next you can open http://localhost:9102 in a browser. Then click the Metrics link and you’ll see some metrics. In particular, you’ll see some with the prefix airflow.
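If you would rather not scroll through the page, here is a small sketch that pulls the same text from http://localhost:9102/metrics and keeps only the airflow samples. The function names are hypothetical and the parser is deliberately naive: it assumes each sample line ends in a single numeric value with no trailing timestamp, which is a simplification of the full exposition format.

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:9102/metrics") -> str:
    """Download the exporter's metrics page as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def airflow_samples(metrics_text: str, prefix: str = "airflow_") -> dict:
    """Return {sample name (with labels): value} for lines starting with prefix."""
    samples = {}
    for line in metrics_text.splitlines():
        # Skip '# HELP' / '# TYPE' comments and metrics from other sources.
        if line.startswith("#") or not line.startswith(prefix):
            continue
        # Assumes 'name value' with no timestamp after the value.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples
```

With the exporter running, airflow_samples(fetch_metrics()) gives you a dictionary of whatever airflow has pushed so far.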

You might think you can now launch grafana and connect it to statsd-exporter, but you’d be wrong.

Run prometheus

You need yet another service running, namely prometheus, to scrape the metrics from statsd-exporter.

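Make a simple config file prometheus.yml. The airflow job targets host.docker.internal so that prometheus, running in its own container, can reach statsd-exporter through the host. That name resolves out of the box on Docker Desktop; on Linux you may need to add --add-host=host.docker.internal:host-gateway to the docker run command.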

global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'airflow'
    static_configs:
      - targets: ['host.docker.internal:9102']
        labels: {'host': 'my-airlfow-label'}

Now you can launch prometheus:

docker run --name=prometheus \
  -p 9090:9090 \
  -v "$PWD/prometheus.yml:/prometheus.yml" \
  -it \
  prom/prometheus \
  --config.file=/prometheus.yml \
  --log.level=debug \
  --web.listen-address=:9090
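To confirm prometheus is actually scraping statsd-exporter, you can ask its HTTP API for the built-in up metric, which is 1 for every target whose last scrape succeeded. A minimal sketch, assuming prometheus is reachable at http://localhost:9090 as in the command above; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_url(promql: str, base: str = "http://localhost:9090") -> str:
    """Build an instant-query URL for the prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_query(promql: str, base: str = "http://localhost:9090") -> dict:
    """Execute an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(promql, base), timeout=5) as resp:
        return json.load(resp)


def target_health(response: dict) -> dict:
    """Map each job label to True/False from the result of an `up` query."""
    return {
        sample["metric"].get("job", "?"): sample["value"][1] == "1"
        for sample in response["data"]["result"]
    }
```

With everything running, target_health(run_query("up")) should report both the prometheus and airflow jobs; a False for airflow usually means prometheus cannot reach port 9102.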

Run grafana

Ok so now we have a lot of services running:

  • airflow scheduler
  • airflow webserver
  • statsd-exporter
  • prometheus

Now it’s finally time to run grafana.

docker run -d -p 3000:3000 grafana/grafana

Open http://localhost:3000 and log in. User is admin, password is admin.

Go to data sources, and add a Prometheus data source with the URL http://host.docker.internal:9090 (the prometheus server).
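The same step can be scripted through grafana's HTTP API instead of clicking through the UI. The sketch below only builds the request so you can inspect it before sending. It assumes the default admin/admin credentials are still in place and that grafana is on http://localhost:3000; the function name and the data source name "Prometheus" are my own choices.

```python
import base64
import json
import urllib.request


def datasource_request(
    grafana: str = "http://localhost:3000",
    prometheus: str = "http://host.docker.internal:9090",
    user: str = "admin",
    password: str = "admin",
) -> urllib.request.Request:
    """Build a POST /api/datasources request that registers prometheus."""
    body = {
        "name": "Prometheus",   # display name, chosen for this example
        "type": "prometheus",
        "url": prometheus,
        "access": "proxy",      # grafana's backend makes the queries
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana}/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To actually send it, pass the result to urllib.request.urlopen. If you changed the admin password at first login, pass the new one instead.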

Now you’re in business. The included dashboards, e.g. Prometheus 2.0 Stats, should work.

More interestingly, we can explore airflow metrics. Go to Manage Dashboards and create a new dashboard. You should be able to select metrics in the pane at the bottom of the screen.
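If the metric picker feels overwhelming, it can help to list up front which airflow metrics prometheus has collected. This sketch asks the prometheus API for every known metric name and keeps the ones starting with airflow_. It assumes prometheus is on http://localhost:9090, and the function names are hypothetical.

```python
import json
import urllib.request


def metric_names_url(base: str = "http://localhost:9090") -> str:
    """URL of the prometheus endpoint that lists all metric names."""
    return f"{base}/api/v1/label/__name__/values"


def airflow_metric_names(response: dict, prefix: str = "airflow_") -> list:
    """Filter the API response down to sorted names starting with prefix."""
    return sorted(n for n in response.get("data", []) if n.startswith(prefix))


def fetch_airflow_metric_names(base: str = "http://localhost:9090") -> list:
    """Fetch and filter in one step; needs a running prometheus."""
    with urllib.request.urlopen(metric_names_url(base), timeout=5) as resp:
        return airflow_metric_names(json.load(resp))
```

Any name returned by fetch_airflow_metric_names() can be typed straight into the query field of a new grafana panel.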

What’s next

Now you have everything connected, but no good dashboards.
