Our Data Processing Journey

Full Featured (30 min.)
[Infrastructure]

There are many aspects to consider when choosing a data processing framework. At Juno, we chose the most dominant framework, Apache Spark (1.x). After a while, we started encountering a few issues, so we considered different solutions. In the end, we chose Google Cloud Dataflow. Google Cloud Dataflow is a fully managed big data processing service. With Dataflow you can develop a wide range of data processing patterns, including ETL, and run both batch and stream computation with the same interfaces. It supports auto-scaling, multiple zones for isolation, and more.

In this talk, I will go through our data processing journey and show how Google Cloud Dataflow helps us handle a wide range of data processing patterns.