Apache Spark is an open-source framework for large-scale data processing and analysis. Using Apache Spark with Python (PySpark), this workshop shows how to analyze data sets that are too large to be processed on a single computer.
In a hands-on format, participants learn to import data and to apply functions that transform, reduce, and aggregate it. They also learn to develop parallel algorithms that can run on Alliance clusters.
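As a preview of the map-and-reduce pattern the workshop applies with PySpark, the sketch below expresses a word count in plain Python; it is only an illustrative stand-in, since in Spark the same two steps would run in parallel across a cluster rather than serially on one machine.

```python
from functools import reduce

# A tiny corpus standing in for a large data set.
lines = ["big data with spark", "spark runs in parallel", "big clusters"]

# Map step: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce step: combine the pairs into per-word counts.
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, pairs, {})
print(word_counts["spark"])  # → 2
```

In PySpark the same logic is typically written with `flatMap` to produce the pairs and `reduceByKey` to combine them, which lets Spark distribute both steps.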
The workshop covers the following topics:
- Introduction to big data and map-reduce
- Overview of Apache Spark
- Importing data with PySpark
- Sorting data by key/value
- Working with structured data (PySpark SQL)
- Developing parallel algorithms
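The key/value topic above can be previewed with a small serial sketch: the plain-Python code below groups values by key, reduces each group, and sorts the results by key, which is the single-machine equivalent of what PySpark's `reduceByKey` and `sortByKey` do across a cluster. The city/temperature records are hypothetical sample data, not from the workshop.

```python
# Key/value records: (city, temperature) pairs, a stand-in for a large data set.
records = [("quebec", -10), ("toronto", -4), ("quebec", -7), ("toronto", -1)]

# Group values by key (the serial analogue of Spark grouping pairs by key).
grouped = {}
for key, value in records:
    grouped.setdefault(key, []).append(value)

# Reduce each group to a maximum, then sort the results by key
# (the serial equivalent of reduceByKey followed by sortByKey).
max_by_key = sorted((key, max(values)) for key, values in grouped.items())
print(max_by_key)  # → [('quebec', -7), ('toronto', -1)]
```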