May 5, 2020
This post is about two Haskell packages that I wrote a while ago: secret-sharing and data-dispersal.
Imagine we have a file \(F\) that is very large, important and confidential. Think of \(F\) as being a very long ByteString. Our goal is to create a backup of \(F\) in a way that
keeps the file’s contents secret,
allows us to recover the file even if some of the backup servers fail, and
uses \(O(|F|)\) space for the backup in total, where \(|F|\) is the size of the file.
Keeping our file’s content locked away from prying eyes is especially important if we’re planning to use one of the many cloud storage providers (Dropbox, GDrive, iCloud Drive, Nextcloud, OneDrive, ...) for the backup.
We are going to assume that we have \(10\) backup servers each of which will store some fragment of the data in \(F\). We will restrict our threat model by the (optimistic?) assumption that at most \(5\) of them end up losing our data or get hacked.
There is a simply solution if we don’t care about how much space (and bandwidth) we’re using: Just encrypt the file and split it before uploading it. The downside of this is that if we use \(k\) backup servers to store the encrypted file of \(M\) bytes, we will be using \(M\) bytes on each of the servers, and we need to send \(k\cdot M\) bytes every time we want to update the backup. Even though cloud storage space is cheap, it’s not entirely free. In the rest of this post, we will see how we can do much better by using data-dispersal and secret-sharing.
In a nutshell, secret-sharing implements the classic secret sharing technique See Adi Sharmi: How to share a secret.
, where we split some string \(S\) (i.e. the secret) into and the second one
For two integers \(n\) and \(m\), An \((n,m)\)-threshold secret sharing we split a secret string \(S\) into \(n\) parts with the following two crucial properties
The original string \(S\) can be reconstructed from any