I wanted to understand how airflow, StatsD, prometheus, and grafana all fit together.
Here’s what I did.
Let’s assume you know how to run airflow locally in a virtualenv. With that, let’s begin.
Run airflow
Install the statsd extra with pip install apache-airflow[statsd].
Next, as the docs say, enable statsd for the scheduler in airflow.cfg (in Airflow 2.x these settings moved to a [metrics] section):
[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 9125
statsd_prefix = airflow
Notice I changed the port from the default 8125 to 9125; that’s the port statsd-exporter will listen on later.
Next, run airflow scheduler and airflow webserver, each in its own terminal.
You might think you could just go to http://localhost:8080/metrics
and see some metrics. But you’d be wrong.
Run statsd_exporter
Now you have airflow running with statsd enabled. However, you still have no stats to look at: all you have done is tell airflow to push events to localhost:9125, but nothing is listening on that port.
To fix that, we need to run another service:
docker run --rm -it \
-p 9102:9102 \
-p 9125:9125 \
-p 9125:9125/udp \
prom/statsd-exporter \
--log.level=debug
This container listens on 9125 for events pushed from airflow.
And since you ran it with debug logging, you should see airflow events coming through on stdout.
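Each of those events is a tiny UDP datagram in the StatsD wire format, name:value|type. Here’s a self-contained sketch that plays both sides of the exchange — one socket standing in for statsd-exporter, one for airflow. (The metric name is illustrative, and it binds an ephemeral port rather than 9125 so it runs anywhere.)

```python
import socket

# Stand-in for statsd-exporter: bind a UDP socket the way it would on 9125.
# (Port 0 picks a free port so this sketch runs anywhere.)
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
port = listener.getsockname()[1]

# Stand-in for airflow: push one counter in the StatsD wire format
# "<prefix>.<metric>:<value>|<type>". The metric name is made up.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"airflow.scheduler_heartbeat:1|c", ("127.0.0.1", port))

packet = listener.recv(1024).decode()
print(packet)  # airflow.scheduler_heartbeat:1|c

# Split the datagram into its three parts.
name, rest = packet.split(":")
value, metric_type = rest.split("|")
print(name, value, metric_type)  # airflow.scheduler_heartbeat 1 c
```

This is all “push to localhost:9125” amounts to — fire-and-forget UDP packets, which is why airflow doesn’t care whether anything is listening.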
Next, open http://localhost:9102 in a browser and click the metrics link. You’ll see some metrics; in particular, some with the prefix airflow.
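That airflow prefix shows up because statsd-exporter, with no mapping config, turns a StatsD name into a Prometheus metric name mostly by swapping the dots for underscores. A rough sketch of that default behavior (my own approximation, not the exporter’s actual code):

```python
import re

def statsd_to_prom(name: str) -> str:
    """Approximate statsd-exporter's default name mapping: dots, dashes,
    and anything else not valid in a Prometheus metric name become
    underscores."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name)

print(statsd_to_prom("airflow.scheduler_heartbeat"))   # airflow_scheduler_heartbeat
print(statsd_to_prom("airflow.dag.loading-duration"))  # airflow_dag_loading_duration
```

So the statsd_prefix = airflow you set earlier is what ends up as the airflow_ prefix on every Prometheus metric name.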
You might think you can now launch grafana and connect it to statsd-exporter, but you’d be wrong.
Run prometheus
You need yet another service running, namely prometheus, to scrape the metrics from statsd-exporter.
Make a simple config file prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'airflow'
    static_configs:
      - targets: ['host.docker.internal:9102']
        labels: {'host': 'my-airflow-label'}
Now you can launch prometheus. (Note that host.docker.internal resolves to your host machine from inside a Docker Desktop container; on Linux you may need to add --add-host=host.docker.internal:host-gateway to the docker run command.)
docker run --name=prometheus \
-p 9090:9090 \
-v $PWD/prometheus.yml:/prometheus.yml \
-it \
prom/prometheus \
--config.file=/prometheus.yml \
--log.level=debug \
--web.listen-address=:9090
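What prometheus now does every 15 seconds is an HTTP GET of each target’s /metrics page, followed by a parse of the text exposition format. A minimal sketch of that parse, run against a hardcoded sample in the shape of what statsd-exporter emits (the names and values here are made up):

```python
def parse_metrics(text: str) -> dict:
    """Parse simple lines of the Prometheus text exposition format
    ("name value", skipping # HELP / # TYPE comments) into a dict.
    Real scrapes also carry labels and timestamps; this handles only
    the bare name-value case."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP airflow_scheduler_heartbeat Metric autogenerated by statsd_exporter.
# TYPE airflow_scheduler_heartbeat counter
airflow_scheduler_heartbeat 42
airflow_dagbag_size 7
"""
print(parse_metrics(sample))
# {'airflow_scheduler_heartbeat': 42.0, 'airflow_dagbag_size': 7.0}
```

Nothing is pushed at this stage: statsd-exporter just holds the latest values, and prometheus pulls and timestamps them on its own schedule.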
Run grafana
Ok so now we have a lot of services running:
- airflow scheduler
- airflow webserver
- statsd-exporter
- prometheus
Now it’s finally time to run grafana.
docker run -d -p 3000:3000 grafana/grafana
User is admin, password is admin.
Go to data sources, and add http://host.docker.internal:9090 (the prometheus server).
Now you’re in business. The included dashboards, e.g. Prometheus 2.0 Stats, should work.
More interestingly, we can explore airflow metrics. Go to Manage Dashboards and create a new dashboard. You should be able to select metrics in the pane at the bottom of the screen.
What’s next
Now you have everything connected, but no good dashboards.