BDA
Big Data Analysis
Objectives
The main learning objective of this course is to promote a comprehensive understanding of the principles of analysing large volumes of data. Specifically, students will learn to:
- understand the challenges of analysing large volumes of data,
- address those challenges in practice,
- analyse case studies of big data analysis,
- use tools and best practices for more effective, scalable, robust and reproducible data analysis,
- design effective solutions for analysing large volumes of data,
- implement effective solutions for analysing large volumes of data.
Program
The curricular unit (CU) covers several important aspects of big data analysis and its tools, including:
- Big data: variety (wide range of data formats), velocity (real-time processing of data streams to support real-time decisions) and volume
- Fundamentals and essential approaches for designing, storing, analysing and managing semi-structured and unstructured data: data models such as tabular, tree, graph, multi-dimensional (cubes), text; and row vs column-oriented storage
- Basic components of data analysis pipelines: acquisition, integration, exploration, mining, analysis, visualisation and interpretation
- Considerations of scalability, availability, consistency, distribution and expressiveness of data
- Distributed processing: approaches such as MapReduce, dataflow/DAG and graph processing for distributing computation across several nodes
- Comparison between batch and stream processing for different data analysis needs
- Performance optimisation strategies in data analysis to maximise computational efficiency
- Analysing large amounts of data in Python: Jupyter Notebooks, Pandas, NumPy, Dask or PySpark
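
As an illustration of the MapReduce approach listed above, the following is a minimal single-machine sketch in plain Python, not course material and not the API of any particular framework. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, chosen only for this example. In a real distributed system, the map and reduce calls would run on different nodes and the shuffle would move data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over an iterable of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, one string per "document".
    docs = ["big data analysis", "big data tools", "data pipelines"]
    print(word_count(docs))
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle be spread across several workers, which is the property that tools such as Dask or PySpark exploit at scale.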
Bibliography
- Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. Martin Kleppmann. 2017. O’Reilly Media, Inc.
- High Performance Python. Micha Gorelick, Ian Ozsvald. 2020. O’Reilly Media, Inc.
- Data Science with Python and Dask. Jesse C. Daniel. 2019. Manning
- Spark: The Definitive Guide. Bill Chambers, Matei Zaharia. 2018. O’Reilly Media, Inc.