Big Data Analysis

Objectives

The main learning objective of this course is to promote a comprehensive understanding of the principles of analysing large volumes of data. The specific learning objectives are to:

  • understand the challenges of analysing large volumes of data
  • solve the challenges of analysing large volumes of data
  • analyse case studies of big data analysis
  • learn tools and best practices for more effective, scalable, robust and reproducible data analysis
  • design effective solutions for analysing large volumes of data
  • implement effective solutions for analysing large volumes of data

Program

The curricular unit (CU) covers several important aspects of big data analytics and its tools, including:

  • Big data: variety (wide range of data formats), velocity (real-time processing of data streams to support real-time decisions) and volume
  • Fundamentals and essential approaches for designing, storing, analysing and managing semi-structured and unstructured data: data models such as tabular, tree, graph, multi-dimensional (cubes) and text; and row- vs column-oriented storage
  • Basic components of data analysis pipelines: acquisition, integration, exploration, mining, analysis, visualisation and interpretation
  • Considerations of scalability, availability, coherence, distribution and expressiveness of data
  • Distributed processing: approaches such as MapReduce, dataflow/DAG and graph processing for distributing computation across several nodes
  • Comparison between batch and stream processing for different data analysis needs
  • Performance optimisation strategies in data analysis to maximise computational efficiency
  • Analysing large amounts of data in Python: Jupyter Notebooks, Pandas, NumPy, Dask or PySpark
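
As an illustration of the MapReduce approach listed above, the following is a minimal single-process sketch in plain Python, not a distributed implementation. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical and chosen only for this example; in a real framework, the map and reduce steps would run in parallel on separate nodes and the shuffle would move data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for a data partition held by one node.
partitions = ["big data big ideas", "data pipelines move data"]
counts = word_count(partitions)
```

Because each call to map_phase depends only on its own partition, and each call to reduce_phase only on the values of a single key, both steps could in principle be distributed without changing the result.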

Bibliography

  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. Martin Kleppmann. 2017. O’Reilly Media, Inc.
  • High Performance Python. Micha Gorelick, Ian Ozsvald. 2020. O’Reilly Media, Inc.
  • Data Science with Python and Dask. Jesse C. Daniel. 2019. Manning
  • Spark: The Definitive Guide. Bill Chambers, Matei Zaharia. 2018. O’Reilly Media, Inc.
