FT
Fault Tolerance
Objectives
- Explain the importance of distributed systems for dependable computer systems.
- Characterize distributed systems challenges in terms of abstract models and problems.
- Discuss the role of distributed systems algorithms (e.g., consensus) in solving dependability problems.
- Implement distributed systems addressing scale and dependability challenges.
- Evaluate distributed systems addressing scale and dependability challenges.
Program
Foundations of fault tolerance: concepts and models. Agreement in distributed systems (consensus): impossibility results, failure detection, and algorithms. Strong consistency replication. Case study: database replication. Byzantine fault tolerance: algorithms and applications. Case study: blockchain.
Bibliography
- Introduction to Reliable and Secure Distributed Programming. Christian Cachin, Rachid Guerraoui, Luís Rodrigues, Springer
- Fault-Tolerant Message-Passing Distributed Systems: An Algorithmic Approach. Michel Raynal, Springer
- Replication: Theory and Practice. B. Charron-Bost, F. Pedone, A. Schiper (Eds.), Springer
- Distributed Systems for System Architects. Paulo Veríssimo, Luís Rodrigues, Kluwer Academic
- Reliable Distributed Systems. Kenneth Birman, Springer