Tuesday, October 16, 2018

Apache Airflow - Runbook

To try out a different scheduler, we evaluated Apache Airflow for scheduling Spark jobs.
Because of a known issue with Kerberos and Python 3 (see below), we had to install Python 2.

I really like Airflow, but as of version 1.9 it doesn't handle user propagation very well. Multi-tenancy isn't supported in a fully secure way, which in some cases allows users to execute jobs as other users.

The runbook for installing Airflow follows.
The Ansible role we used is available here:
https://github.com/infOpen/ansible-role-airflow


Apache Airflow Runbook
