Recently, I had the opportunity to add a new EMR on EKS plugin to Apache Airflow. While I’ve been a consumer of Airflow over the years, I’d never contributed directly to the project. And weighing in at over half a million lines of code, Airflow is a pretty complex project to wade into. So here’s a guide to how I made a new operator in the AWS provider package.
...
Build your own Air Quality Monitor with OpenAQ and EMR on EKS
Fire season is fast approaching, and as somebody who spent two weeks last year hunkered down inside with my browser glued to various air quality sites, I wanted to show how to use data from OpenAQ to build your own air quality analysis.
With Amazon EMR on EKS, you can now customize and package your own Apache Spark dependencies, and I use that functionality in this post.
Overview
OpenAQ maintains a publicly accessible dataset of various air quality metrics that’s updated every half hour. Bokeh is a popular library for Python data visualization. While it includes sample data for US county and state boundaries, we’re going to use shapefiles from census.gov.
...