Like it says on the tin, it’s a hash table that’s distributed across many nodes. It originally came out of peer-to-peer (p2p) file sharing to address problems with centralization (aka lawsuits).

Characteristics:

  1. nodes have no central coordination mechanism
  2. the system continues to work with nodes joining/leaving/failing
  3. it scales to thousands or millions of nodes

When a file needs to be stored, the client hashes something that identifies it (e.g. the SHA-1 of the filename) to get a key. The key and the file are sent to any node in the DHT, which forwards the request from node to node until it reaches the node that should hold that key according to the keyspace partitioning rules. That node saves it.

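The keyspace is partitioned using consistent hashing or a similar method, so that when a node joins or leaves, only the keys next to it in the keyspace change owners rather than the whole table being reshuffled.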
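To make the key generation and partitioning concrete, here is a minimal sketch of a consistent-hashing ring. It assumes node IDs are the SHA-1 of a node name and that a key belongs to the first node at or after it on the ring, wrapping around at the end. The names `key_for`, `HashRing`, `add_node` and `owner` are made up for illustration, and real DHTs differ in the details.

```python
import bisect
import hashlib


def key_for(name: str) -> int:
    """Hash a name (filename or node name) to a 160-bit integer key."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hashing ring: each key is owned by its successor node."""

    def __init__(self, node_names=()):
        self._ring = []  # sorted list of (node_id, node_name)
        for name in node_names:
            self.add_node(name)

    def add_node(self, name: str) -> None:
        # Keep the ring sorted by node ID so lookups can use binary search.
        bisect.insort(self._ring, (key_for(name), name))

    def owner(self, filename: str) -> str:
        """Return the name of the node responsible for this file's key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        key = key_for(filename)
        # First node whose ID is >= key; wrap to the start if we run off the end.
        idx = bisect.bisect_left(self._ring, (key,)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("holiday-photos.zip"))
```

The property worth noticing: after `ring.add_node("node-d")`, every file either keeps its old owner or moves to `node-d`. Nothing gets shuffled between the existing nodes.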

Nodes are connected using an overlay network. When a file is requested, the node handling the request hashes the filename to get the key and forwards the request to any of its peers that is “closer” to where the key is supposed to live, according to the partitioning scheme. If no peer is closer than the node itself, it assumes it is the node responsible for the key. There is a natural tension between how many peers a given node stays connected to and how many hops it takes to reach the right node: more connections mean shorter routes but more state to keep up to date.
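The forwarding rule and the connectivity tradeoff can be sketched with a toy overlay. This is an illustration under some loud assumptions, not any particular DHT's protocol: the ID space is a tiny ring of 256 values, “closer” means shortest distance around that ring, and each node links to the nodes 1, 2, 4, ... positions away from it in both directions (`levels` controls how many of those links it keeps). `distance`, `build_overlay` and `route` are hypothetical helper names.

```python
SPACE = 256  # toy identifier space; real DHTs use something like 2**160


def distance(a: int, b: int) -> int:
    """Shortest distance between two IDs going either way around the ring."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)


def build_overlay(node_ids, levels: int) -> dict:
    """Give each node links 2**j positions away (j < levels), both directions.

    levels=1 means each node only knows its immediate neighbours.
    """
    ids = sorted(node_ids)
    n = len(ids)
    peers = {}
    for i, node in enumerate(ids):
        links = set()
        for j in range(levels):
            step = 2 ** j
            links.add(ids[(i + step) % n])
            links.add(ids[(i - step) % n])
        links.discard(node)  # a node is never its own peer
        peers[node] = links
    return peers


def route(peers: dict, start: int, key: int):
    """Greedy routing: keep forwarding to the peer closest to the key.

    Stops when no peer is strictly closer than the current node, and
    returns (node that ended up holding the request, hops taken).
    """
    current, hops = start, 0
    while True:
        best = min(peers[current], key=lambda p: distance(p, key), default=current)
        if distance(best, key) >= distance(current, key):
            return current, hops
        current, hops = best, hops + 1


nodes = [i * 4 for i in range(64)]          # 64 evenly spaced nodes
sparse = build_overlay(nodes, levels=1)     # 2 peers per node
dense = build_overlay(nodes, levels=6)      # up to 11 peers per node

print(route(sparse, start=0, key=129))      # (128, 32)
print(route(dense, start=0, key=129))       # (128, 1)
```

Both overlays deliver the request to node 128, the closest node to key 129, but the sparsely connected one takes 32 hops where the densely connected one takes 1. The price is that each node in the dense overlay has more peers to track as nodes join, leave or fail.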