Engineering Journal | Articles Page 6

Heartbeats: Knowing a Node Is Alive

Failure Detection

How heartbeats help systems detect failures without confusing slow with dead.

April 30, 2024 | 4 min read

Gossip Protocols: Small Messages, Shared State

Cluster State

A simple explanation of gossip protocols and why they are useful for spreading cluster state.

April 18, 2024 | 4 min read

Fencing Tokens: Stop the Old Owner

Concurrency

How fencing tokens protect shared resources when an old leader wakes up after a pause.

April 8, 2024 | 4 min read

Eventual Consistency: Read the System Like a Timeline

Consistency

A practical explanation of eventual consistency and how to make delayed updates understandable to users.

March 26, 2024 | 4 min read

Determinism: Same Inputs, Same Outcome

Correctness

Why deterministic processing makes replay, recovery, testing, and distributed debugging much easier.

March 14, 2024 | 4 min read

CAP Theorem: The Practical Meaning

CAP

A practical explanation of the CAP theorem through the choice a system makes during a network partition.

March 5, 2024 | 4 min read

Byzantine Faults Without the Mystery

Fault Tolerance

A plain explanation of Byzantine faults, where a participant may lie, corrupt data, or send conflicting answers.

February 22, 2024 | 4 min read

Availability in Distributed Systems

Distributed Systems

A practical explanation of availability, graceful degradation, and what users should still be able to do when part of a system is unhealthy.

February 12, 2024 | 4 min read

Distributed Computing Jargon: An A to Z Glossary

Distributed Systems

A plain-language glossary of distributed systems terms, from availability and CAP to Lamport time, consensus, ordering, serialized transactions, and ZooKeeper.

January 31, 2024 | 5 min read

Simple. Scalable. Systems.

Practical notes on building reliable software by Arunkumar Ganesan
