Fault Tolerance

Objectives

  • Explain the importance of distributed systems for dependable computer systems.
  • Characterize distributed systems challenges in terms of abstract models and problems.
  • Discuss the role of distributed systems algorithms (e.g., consensus) in solving dependability problems.
  • Implement distributed systems addressing scale and dependability challenges.
  • Evaluate distributed systems addressing scale and dependability challenges.

Program

Foundations of fault tolerance: concepts and models. Agreement in distributed systems (consensus): impossibility results, failure detection, and algorithms. Strong consistency replication. Case study: database replication. Byzantine fault tolerance: algorithms and applications. Case study: blockchain.

Bibliography

  • Introduction to Reliable and Secure Distributed Programming. Christian Cachin, Rachid Guerraoui, Luís Rodrigues, Springer.
  • Fault-Tolerant Message-Passing Distributed Systems: An Algorithmic Approach. Michel Raynal, Springer.
  • Replication: Theory and Practice. B. Charron-Bost, F. Pedone, A. Schiper (Eds.), Springer.
  • Distributed Systems for System Architects. Paulo Veríssimo, Luís Rodrigues, Kluwer Academic.
  • Reliable Distributed Systems. Kenneth Birman, Springer.
