All my geeky stuff ends up here. Mostly Unix-related

Archive for the ‘encryption’ Category

Convergent Encryption


In case you have not followed the latest buzz, an Internet startup has been getting a lot of attention lately: Bitcasa. Their offer seems too good to be true: for $10/month you get unlimited remote storage for your data on their servers. The best part: they claim your data will be encrypted on their servers so that even they cannot access your file contents. They also claim they can offer unlimited storage thanks to de-duplication: the first time a file is uploaded it is actually stored on their servers, and every subsequent attempt to upload the same file just returns a pointer to the copy already stored.

First reaction: this sounds like bullshit. How could you both encrypt a file so that only its legitimate owner can access it, and identify redundancies on the servers? If everybody has a different encryption key, the same file will encrypt differently for each user and prevent any attempt at de-duplication. And if everybody has the same encryption key it kind of defeats the whole point of encryption, doesn’t it?

The Bitcasa founder recently mentioned convergent encryption during an interview, which pushed me into looking further into the topic. I have tried to wrap my head around it and summarize my understanding of the whole process here.

Let us do a thought experiment:

The client is given a list of files to store on the server.

Each file is cut into 4KB blocks, with padding added wherever necessary. The same process is repeated for each block:

  1. Compute K = SHA256(block)
  2. Compute H = SHA256(K)
  • K will be used as encryption key for this block.
  • H will be used as an index to store/retrieve the block server-side.
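The two steps above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and zero-padding the last block is an assumption, since the padding scheme is not specified here.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, as in the thought experiment


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks, zero-padding the last one (assumed padding)."""
    blocks = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        blocks.append(block.ljust(block_size, b"\0"))
    return blocks


def derive_key_and_index(block: bytes) -> tuple[bytes, bytes]:
    """Return (K, H) with K = SHA256(block) and H = SHA256(K)."""
    key = hashlib.sha256(block).digest()    # K: encryption key for this block
    index = hashlib.sha256(key).digest()    # H: server-side index for this block
    return key, index
```

Because K and H depend only on the block contents, two users holding the same block will derive the same pair without ever talking to each other.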

Now the client queries the server for H. If the server already has this block, it notifies the client that the block needs no uploading. If the block has never been seen before, it is encrypted with AES-256 using K as the key, then uploaded to the server.

Once all blocks have been either uploaded or identified as already known, the client can store a list of all (H,K) pairs and enough metadata to rebuild complete files from individual blocks. Re-building the whole archive is now a matter of N requests for blocks identified by H, and N decryptions with the associated Ks, followed by file reconstruction based on metadata.
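To make the round trip concrete, here is a minimal sketch of storing and rebuilding a file. Everything in it is an assumption made for illustration: the server is a plain in-memory dictionary, the names store_file and restore_file are hypothetical, and toy_cipher is a deterministic stand-in for AES-256 (Python's standard library has no AES) that must not be used on real data.

```python
import hashlib

BLOCK_SIZE = 4096


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Stand-in for AES-256 only.
    Applying it twice with the same key returns the original data."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[i:i + 32], pad))
    return bytes(out)


def store_file(server: dict, data: bytes) -> dict:
    """Upload blocks the server does not know yet; return the client metadata."""
    pairs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        key = hashlib.sha256(block).digest()      # K
        index = hashlib.sha256(key).digest()      # H
        if index not in server:                   # "query the server for H"
            server[index] = toy_cipher(key, block)
        pairs.append((index, key))
    # The original length is kept so the padding can be stripped on restore.
    return {"length": len(data), "pairs": pairs}


def restore_file(server: dict, meta: dict) -> bytes:
    """Fetch each block by H, decrypt it with K, and reassemble the file."""
    blocks = [toy_cipher(key, server[index]) for index, key in meta["pairs"]]
    return b"".join(blocks)[:meta["length"]]
```

If two users call store_file with the same data, the second call uploads nothing and both end up with identical (H,K) lists, which is exactly the de-duplication effect described above.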

Store these metadata on the local client, and store one copy on the server too for good measure. To ensure only the client can read them back, metadata can be encrypted with a key only known to the client, or using a key derived from a user passphrase.
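For the passphrase option, one common approach is to stretch the passphrase into a 256-bit key with PBKDF2, which ships with Python's hashlib. The sketch below only covers key derivation; the function name, the iteration count and the 16-byte salt are assumptions for illustration, not something any particular service is known to use.

```python
import hashlib
import os


def derive_metadata_key(passphrase: str, salt: bytes,
                        iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random but not secret: it can be stored next to the
# encrypted metadata so the key can be re-derived on another machine.
salt = os.urandom(16)
metadata_key = derive_metadata_key("my long passphrase", salt)
```

Unlike the per-block keys, this key is unique to the user, so the metadata is the one thing on the server that is not de-duplicated.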

What did we achieve?

Client-side: initial files are now reduced to a set of (H,K) pairs and metadata needed to rebuild them. Since a copy is stored on the server, the client local archive can safely be erased and rebuilt from scratch.

Server-side: duplicated blocks are stored only once. Blocks are encrypted with keys unknown to the server and indexed by a hash that does not yield any information about key or contents. Metadata are also stored on the server but encrypted with a key only known to the user.

Executive summary:

  • The server does not have any knowledge about the data it is storing
  • The server cannot determine which files are owned by a given user without attacking their metadata.

This scheme is a good way to deal with snooping authorities, but what about dictionary attacks? An attacker who has prepared a list of known plaintext blocks can easily discover whether they are already present on the server. To be fair, dictionary attacks are cited as the main vulnerability of convergent encryption schemes in all the papers I have seen so far.
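Such a dictionary attack is short to write down. In the sketch below, server_has is a hypothetical callable standing for the "do you already have H?" query of the thought experiment, and probe is a name made up for this example.

```python
import hashlib
from typing import Callable, Iterable


def block_index(block: bytes) -> bytes:
    """H = SHA256(SHA256(block)), computable by anyone who holds the block."""
    key = hashlib.sha256(block).digest()
    return hashlib.sha256(key).digest()


def probe(server_has: Callable[[bytes], bool],
          candidates: Iterable[bytes]) -> list[bytes]:
    """Return the candidate plaintext blocks the server already stores."""
    return [block for block in candidates if server_has(block_index(block))]
```

Note that the attacker never needs to break any encryption: knowing the plaintext block is enough to compute H, and K for that matter.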

What are we trying to protect against?

If you offer the world a way to remotely store their files, chances are that you will soon end up with a very long list of all movies and music ever produced by humanity on your servers. Your main opponents are copyright holders who do not want the public to share their productions so easily.

I believe dictionary attacks are hardly going to be usable by copyright holders. The same data block could very well be present in two very different data files, and I do not see how you could ever prevent somebody from publicly storing 4KB of data that match a 4KB block in a copyright-protected data file. Copyright protection applies to a complete work, not to individual 4KB components. You would need to prove that the same user has access to all individual blocks for the copyright violation to be proven, but that is not possible in the described scheme.

Come to think of it, this suddenly looks like it could work.

Written by nicolas314

Monday 19 September 2011 at 11:15 pm

GMail considered a liability


Your digital life is online

Since free webmail providers emerged, it has become common to enjoy universal e-mail access from every computer without restriction. This makes e-mail ubiquitous, giving us the power to delve into our oldest archives to retrieve pictures, messages, links or conversations whenever we want to access them.

Storing all of your digital life with a single webmail provider like GMail, Yahoo or Hotmail makes you more responsive and also provides that warm, safe feeling that all your private information is always within reach. Unfortunately, it is also a single point of privacy failure. If you can access all of your e-mail history, so can anybody with either your password or administrator access at your webmail provider, whether they are regular admins or successful hackers.

Convenience kills security

Webmail services suffer from security issues. In short:

Administrator access

Anybody with admin privileges at your webmail provider can read your e-mail. Why would a GMail administrator want to access your e-mail history? After all there are millions of users, why would your mailbox be more interesting to them than any other?

Thing is: the very fact that an unknown person (or software) potentially has complete access to your e-mail history should be enough to make you nervous. There are now laws in the US that enable government bodies to access anybody’s e-mail history without having to reveal their investigation, i.e. you will never be told that your messages have been used to gather evidence about you. This is an issue for everybody since GMail, Yahoo and Hotmail are storing a copy of all their users’ data on servers located in the US, where this law is applicable. Still not nervous?

Security breaches

A quick search on the web for webmail security breaches should be enough to convince you that webmail providers will never be able to protect their users’ data with 100% security.

No matter how hard they try, a webmail system with millions of users will always have flaws and there will be people to exploit them. Securing a network is a very hard task, perfect security does not exist.

Stealing your password

There are many ways to steal somebody’s password. Shoulder-surfing is the most obvious one: observe users while they type their password and, in general, you get enough clues to make an educated guess pretty quickly.

If you do not know the person, targeted phishing can be very successful. Send an e-mail to your victim containing a link to a web site you own. This web site displays a fake webmail error page about an expired session and invites the user to enter their credentials again. Done.

This may seem far-fetched but this kind of attack is actually very easy to put together and has been demonstrated to work exceedingly well on all kinds of people, including unsuspecting security experts. Believe me, I recently tried. Getting a GMail password is just a matter of setting up a tiny web site and sending an e-mail. And if you do not feel like doing it yourself you can hire somebody to do it for you. For a price between $50 and $200 you can get anybody’s webmail password in the clear within a couple of days. True story.

Another phishing method was recently described. I believe this is virtually impossible to thwart except by pure chance. What this means is: sooner or later your webmail password will be known to people other than yourself. Prepare for that day or prepare to suffer.

Privacy breach

A privacy breach could destroy lives. Getting access to somebody’s e-mail history will give you more than just a view into their hearts.

To take a parallel, what if starting from now, everything you said was recorded forever and could be used against you at any time in the future with sentences taken out of their context? What if the database of everything you ever said was made public and searchable?

Keeping full control over your e-mail history should be considered an absolute priority. Possible outcomes from a privacy breach are broken lives.

What must be protected

Hardly any e-mail you send or receive is a privacy liability in itself. There are of course countless examples of disastrous reply-to-all messages that were intended for a single person and unfortunately sent to a whole mailing-list, but this tends to remain a negligible part of the e-mail flow. Additionally, the reply-to-all catastrophe can be avoided with better e-mail clients that warn you before sending messages to a list.

Take one embarrassing e-mail out of my mailbox, publish it on public forums and I will either ignore or deny it. You will not have enough information to embarrass me or if you do I will simply call you a liar.

Take my whole mailbox and the e-mail history it contains for several years and you have enough information to impersonate me with as much precision as you like. You will probably be able to gather a list of additional accounts I have on every web site I have been to, know my most intimate friends and thoughts, and probably be able to reconstruct my personal life day by day.

E-mails become a privacy treasure:

  1. When they gather in an archive
  2. Simply over time, by the very fact that they document past events

An e-mail is something instantaneous. It is written within a very focused and narrow context and may later acquire far more importance than the message it carries. Think of the many books by famous authors that consist only of letters exchanged with their friends and families: they give you precious insight into both the writers and the situation they were in when the letters were written. The whole is worth far more than the sum of its components.

E-mail encryption?

Encrypting individual e-mails would solve the issue but is obviously overkill. Individual messages taken separately are not really a danger.


If you want to take the encrypted e-mail path, PGP and S/MIME are two well-designed systems but quite impractical if not implemented seamlessly for the end-user. When you do activate PGP encryption on your mail client, you always ask yourself before sending any message “is this message sensitive enough that I need to encrypt it?”. The answer is almost always “no” and you quickly learn to forget how to use e-mail encryption.

Most importantly, e-mail encryption needs both sender and receiver to agree on an encryption mechanism, something that just cannot be asked of every user. Either it happens at the lowest level without users knowing, or it is not used. Neither PGP nor S/MIME is there yet.


Hushmail is a service offering a free public webmail with limited inbox size for non-paying users. Their whole business is designed around encryption built into their system and it turns out to work pretty well between Hushmail users. Unfortunately it quickly gets impractical or totally unusable when you need to communicate with anybody outside of Hushmail.

Despite heavy advertising of their encryption capabilities, Hushmail lost a lot of credibility when it was revealed that they had at some point handed over the keys to US government agencies for “security reasons”.

So much for privacy…

While I have no doubt that authorities have good reasons to invade some users’ privacy, it shows that there are technical means to access user mailboxes. What does a determined attacker need to get the same access rights as lawful bodies? I do not have the answer to that question but I’d rather take no risk.

Damage control

Perfect security systems do not exist but disasters can be mitigated. Free webmail providers are convenient and there must be ways to keep using them without losing all the benefits. What can we do?

Use encrypted e-mail storage

Use asymmetric cryptography and let the users choose their own keys. This way, no administrator could access your e-mail history. This kind of service is actually sold by lavabit for a modest fee.

The level of webmail functionality between lavabit and GMail just cannot be compared though. GMail’s web interface is beautiful, lavabit has the bare e-mail functionality. Apples and oranges: Google is the most powerful corporation on the planet, lavabit is not.

Roll your own

Stop archiving e-mails with your webmail provider!

A very simple solution is to keep your mail archive out of your webmail provider’s reach. Make it a habit to download all of your inbox to a local folder at regular intervals and make sure all archives are deleted on the webmail site.

For GMail, this implies downloading your mailbox from the [Gmail]/All Mail folder through IMAP. Make sure your IMAP client really deletes mail instead of just sending it to the Trash folder, where it will still live for 30 days. I do not use Yahoo or Hotmail but there must be equivalent procedures.
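If you would rather script the download than rely on a mail client, Python's imaplib and mailbox modules are enough for a first pass. The sketch below is an assumption-laden example, not a complete tool: archive_folder is a name invented here, the folder is opened read-only, and deletion is deliberately left out because what "delete" does over IMAP depends on the account's settings.

```python
import imaplib
import mailbox


def archive_folder(conn: imaplib.IMAP4, folder: str, mbox_path: str) -> int:
    """Copy every message of an IMAP folder into a local mbox file.
    Returns the number of messages saved."""
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue                 # skip messages that fail to download
            box.add(parts[0][1])         # raw RFC 822 bytes of the message
            count += 1
        box.flush()
    finally:
        box.close()
    return count


# Example use (not run here; host and folder name are assumptions that
# match English-language GMail accounts as far as I know):
#   conn = imaplib.IMAP4_SSL("imap.gmail.com")
#   conn.login("user@example.com", "password")
#   archive_folder(conn, '"[Gmail]/All Mail"', "archive.mbox")
#   conn.logout()
```

The resulting mbox file can be opened by most desktop e-mail clients, which keeps the archive browsable once it lives outside the webmail.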

Ok, now that you got your e-mail archive off the web you are a bit safer. There are two other topics you want to address though:

  1. Make sure that only you can access your e-mail archive
  2. Enable mail archive browsing from any of the computers you use, ideally from your e-mail client.

The first topic can be addressed using any disk encryption tool. The second topic can be addressed using services like Dropbox that take care of replicating the same data on all computers registered to your account.

One possible solution

I have spent a bit of time testing and tweaking and finally came to a workable solution:

  • A Dropbox account. If you do not know Dropbox check out this previous post: dropbox love
  • Get TrueCrypt. TrueCrypt is free and open-source, and it works on Windows, Mac and Linux.

Initial procedure:

  • Create a TrueCrypt container and populate it with your e-mail archive.
  • Copy the TrueCrypt container to your Dropbox folder and let it sync.

This is going to take a while, depending on the size of your TrueCrypt container and your available upload bandwidth. Fortunately, this only happens once. Dropbox and TrueCrypt work fine together: when you change just one bit of a file in the encrypted container, only the differences are sync’ed, not the complete container file.
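The idea behind syncing only the differences can be illustrated by hashing a file in fixed-size chunks and comparing the hashes of two versions. This is a sketch of the general technique only, not a description of Dropbox's actual protocol; the 4 MiB chunk size and the function names are assumptions.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # assumed chunk size, for illustration only


def chunk_hashes(data: bytes, chunk: int = CHUNK) -> list[bytes]:
    """SHA-256 of each fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + chunk]).digest()
            for i in range(0, len(data), chunk)]


def changed_chunks(old: bytes, new: bytes, chunk: int = CHUNK) -> list[int]:
    """Indexes of chunks in `new` that differ from `old` or did not exist."""
    old_h = chunk_hashes(old, chunk)
    new_h = chunk_hashes(new, chunk)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]
```

This works with an encrypted container because a small change inside the mounted volume only rewrites a small region of the container file, so most chunks keep the same hash and need no upload.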

Daily procedure:

  • Keep using your webmail as usual

Accessing archives to read or update them:

  • Start dropbox, make sure your encrypted container is sync’ed to the latest version, stop dropbox.
  • Mount your encrypted container with TrueCrypt
  • Start your e-mail client and browse your e-mail archive. You can move mails from your webmail archive to your encrypted container at that point.
  • When you are finished: stop your e-mail client, unmount your encrypted container. To upload your modifications: start dropbox, let it sync.

This solution is by no means ideal. It requires a number of interactions with three pieces of software: TrueCrypt for encryption, Dropbox for synchronization, and an e-mail client to move mail around. But in the end it is far safer than anything I have seen so far. Taking matters into your own hands guarantees that:

  1. Your e-mail archive is only available to you
  2. You have multiple copies of your e-mail archive on all computers you use, and one at Dropbox.
  3. Your e-mail archive is integrated with your e-mail client.

There are probably more convenient solutions but for now this is the best I found. Suggestions are welcome.

Have a safe e-mailing day!

Written by nicolas314

Monday 21 June 2010 at 5:15 pm