Model Compression Techniques for Handling Production Constraints
Full Featured (30 min.)
The tech giants have made great efforts to make Deep Learning more accessible to the general developer community. However, although it is now much simpler to use and train a neural network, it is still hard to comply with production constraints in terms of running time and cost. In this lecture we will review the different ways to make neural networks more efficient and cost-effective while preserving their quality. I'll cover several techniques for model compression, e.g. knowledge distillation, pruning, and quantization, discuss several works in the field, and show how to use and implement these techniques for your own use cases.
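To give a taste of what these techniques look like in practice, here is a minimal sketch (not taken from the talk) assuming PyTorch; the toy model, layer sizes, pruning amount, and temperature are placeholders chosen for illustration only.

```python
# Minimal illustration of quantization, pruning, and knowledge distillation in PyTorch.
# The model and all numbers below are placeholders, not from the lecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# A toy model standing in for a real production network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# 1. Quantization: run Linear layers in int8 at inference time to cut latency and memory.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# 2. Pruning: zero out the 30% of weights with the smallest L1 magnitude in the first layer.
prune.l1_unstructured(model[0], name="weight", amount=0.3)

# 3. Knowledge distillation: train a small "student" to match a larger "teacher"'s
#    softened output distribution.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * temperature ** 2
```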