- Design, build, optimize, launch and support new and existing data models and ETL processes in production
- Interface with engineers, product managers and product analysts to understand data needs
- Manage and verify data accuracy for the Hadoop cluster
- Support the Hadoop cluster environment, including Hive, Spark, HBase, and Presto
- BS degree or equivalent experience in Computer Science or related field
- 2+ years of experience in custom ETL design, implementation and maintenance on Hadoop clusters
- 2+ years of hands-on development experience
- Understanding of Hadoop ecosystem components such as HDFS, YARN, MapReduce, ZooKeeper, Kafka, HBase, Spark and Hive
- Strong SQL skills, especially in the area of data aggregation
- Good understanding of distributed systems and basic mathematics such as statistics and probability
- Comfortable with Git version control
- Experience building real-world data pipelines
- Automation skills with tools such as Airflow, Python and Bash
- Experience in the following is a plus: Druid, GeoMesa, or GeoWave
- Experience with A/B testing environments
- Experience with analytics tools such as R and MATLAB
- Strong Java or Scala skills