TensorCAS: Content Addressable Storage for ML Checkpoints
A question I've been thinking about a lot recently is what it would look like to treat model checkpoints as structured state instead of opaque blobs. A model is a collection of named arrays, so if you could identify each array by its content, you would only need to store the arrays that changed between checkpoints instead of the full model every time. This is the idea behind content-addressable storage, and it is the basis for git, IPFS, and other systems that need to store large amounts of versioned data efficiently.
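To make the idea concrete, here is a minimal sketch of what content addressing for checkpoints could look like. It is an illustration under stated assumptions, not a description of how TensorCAS is implemented: the names `tensor_key` and `BlobStore`, the choice of SHA-256, and the in-memory dict standing in for real storage are all hypothetical. Each array is represented as a `(dtype, shape, raw bytes)` tuple to keep the example free of dependencies.

```python
import hashlib
import struct


def tensor_key(dtype: str, shape: tuple, data: bytes) -> str:
    """Derive a key from dtype, shape, and raw bytes (hypothetical scheme).

    Arrays with identical content map to the same key, so they are stored once.
    """
    h = hashlib.sha256()
    h.update(f"{dtype}:{shape}:".encode())
    h.update(data)
    return h.hexdigest()


class BlobStore:
    """Toy in-memory store: key -> (dtype, shape, raw bytes)."""

    def __init__(self):
        self.blobs = {}

    def put_checkpoint(self, checkpoint: dict):
        """Store a checkpoint and return (manifest, number of new blobs written).

        The manifest maps each array name to its content key; it is the only
        thing that must be saved in full for every checkpoint.
        """
        manifest = {}
        written = 0
        for name, (dtype, shape, data) in checkpoint.items():
            key = tensor_key(dtype, shape, data)
            if key not in self.blobs:  # unseen content: store it
                self.blobs[key] = (dtype, shape, data)
                written += 1
            manifest[name] = key
        return manifest, written


def floats(*values: float) -> bytes:
    """Pack values as little-endian float32, standing in for array bytes."""
    return struct.pack(f"<{len(values)}f", *values)


store = BlobStore()

ckpt1 = {
    "embed.weight": ("float32", (4,), floats(1.0, 2.0, 3.0, 4.0)),
    "head.bias": ("float32", (2,), floats(0.0, 0.0)),
}
# Second checkpoint: only head.bias has changed.
ckpt2 = {
    "embed.weight": ("float32", (4,), floats(1.0, 2.0, 3.0, 4.0)),
    "head.bias": ("float32", (2,), floats(0.5, -0.5)),
}

manifest1, written1 = store.put_checkpoint(ckpt1)
manifest2, written2 = store.put_checkpoint(ckpt2)
print(written1, written2, len(store.blobs))
```

In this sketch the first checkpoint writes both arrays, while the second writes only the changed `head.bias`; the unchanged `embed.weight` resolves to the key that is already in the store. This mirrors how git stores a new tree that points at existing blobs for unchanged files.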