Reversim Summit 2018

Light up the Spark in Catalyst by avoiding UDFs

Full Featured (30 min.)

Processing data at scale usually involves struggling with performance, strict SLAs, limited hardware capabilities and more. After struggling with Spark SQL query run time, I found the culprit! In this lecture, I would like to share with you the change in perspective and the process we had to go through in order to find the culprit (and the solution!). Today, in the world of Big Data and Spark, we are processing high volumes of transactions. Catalyst is the Spark SQL query optimizer, and in this talk we will reveal how you can fully utilize Catalyst's optimization power to make queries run as fast as possible by pushing down actions and avoiding UDFs as much as possible.

Adi Polak

Software Engineer @ Akamai