All my geeky stuff ends up here. Mostly Unix-related

Archive for the ‘network storage’ Category

Music in 2014

Got to meet old-time friends this Christmas and I was amazed to discover that many of them are still die-hard fans buying all of their music on audio CDs. Guys: we are now living in 2014 and you are still buying physical objects to listen to music? Say again?

I must have given up on audio CDs about 15 years ago when mp3s started flowing around. It first started with dedicated web sites distributing sound files (aiff or au format first, then mp3). The thing snowballed very quickly and then we had Napster and Kazaa to download all of our stuff through our glorious 33k US Robotics modems.

Truth be told, Internet sharing was not the biggest source. I owned about 300 audio CDs at that time, and most of my friends had between one hundred and one thousand music CDs at home. One day we started encoding all of them (using CDex) and circulated hard drives fully loaded with tons of music. After a few months we had all gathered more music than we could possibly listen to in our waking hours for the next decades. The Internet came as an extra source for very recent albums or stuff you could not find in brick-and-mortar stores: bootlegs, one-of-a-kind albums, and little-known artists.

I purchased some of the first portable MP3 players in 1998 and hooked one to my stereo. That was probably the last year I actually inserted a physical disc into a CD reader.

Is that deviant? I do not think so. Let me take an example based on reasonable assumptions:

– Apple’s iPod Classic offers 160GB of storage
– A song is 3-min long on average
– Albums contain 10 songs on average
– Songs encoded at 128 kbps take up about 1MB/min on average

Hence, an iPod Classic contains about 5,000 albums.

Assuming albums are priced 10 euros on average, this represents 50,000 euros worth of music on a device currently priced around 200 euros. I cannot picture anybody who stores 5,000 CDs at home and would be willing to encode them one by one, or anybody who would be willing to spend 50k on a music collection. Seriously: has anyone ever filled up an iPod with only legally-acquired music? How many iPod Classic users have actually spent so much time or money on their content? If there ever was a business model based on the assumption that people would pay for the content they listened to, it is obviously unaware of those very basic facts.
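The figures above can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1,000 MB) and ignores any filesystem overhead, and the variable names are mine.

```python
# Back-of-the-envelope check of the iPod Classic estimate.
# Assumption: decimal units (1 GB = 1,000 MB), no filesystem overhead.
IPOD_CAPACITY_MB = 160 * 1000   # 160GB of storage
SONG_MINUTES = 3                # average song length
SONGS_PER_ALBUM = 10            # average album size
MB_PER_MINUTE = 1               # 128 kbps MP3, roughly 1MB per minute
ALBUM_PRICE_EUR = 10            # average album price

album_mb = SONG_MINUTES * SONGS_PER_ALBUM * MB_PER_MINUTE  # 30 MB per album
albums = IPOD_CAPACITY_MB // album_mb                      # whole albums that fit
collection_value_eur = albums * ALBUM_PRICE_EUR

print(f"{albums} albums, worth about {collection_value_eur} euros")
# prints "5333 albums, worth about 53330 euros"
```

Rounded down, that is the 5,000 albums and 50,000 euros quoted above.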

Music is not a luxury or a commodity. It is part of our human culture and I would go as far as saying it is part of our daily needs. You can survive without music, like you could survive without speaking or bathing, but it is not going to be fun. The only government I ever heard of that decided to prohibit music was the Taliban, between the Russian and the American occupations, and they did not end too well.

You can certainly control, tax, and rule the distribution of physical objects like audio CDs and stereos, but you cannot possibly have any effect on people singing in their showers, friends having a gig, or people who just want to dance to something other than deep silence. Music is a form of language: it is meant to be expressed and shared in order to be alive.

CDs are a convenient way to distribute and share music among humans, not the only possible one. We now all have high-bandwidth Internet connections from home and mobile devices, but we can also share huge collections of music face-to-face by just carrying a 500GB hard drive around. Since music wants to be shared, anything that goes in this direction is naturally favoured. You cannot prevent humans from sharing a form of language they take pleasure in hearing, any more than you can prevent them from telling stories or showing pictures of beautiful places they visited.

My kids have never bought a single CD or even inserted one in a CD player. When they want to listen to music they turn on their iPods and choose an album. Each of their mini iPods contains between twenty and a hundred times more music than was available to me as a teenager in the 80s. They can try everything, build up their tastes, dance, sing, and experience the whole world of music for the price of a single device. Compare that to the tapes and vinyls we carried around thirty years ago: we were stuck with the same few artists and rarely experienced new stuff. If we did, it was through low-quality pirated tapes, and few of us could afford to purchase everything. We just shared.

Having literally thousands of albums on a hard drive is not a solution though. If you want to be able to play them on any MP3 player, you often need to transcode songs, sort them out, find the album covers and re-tag all the songs correctly if you do not want to end up with a million songs labeled “Unknown Song” by “Unknown Artist” in “Unknown Album” — unless you have an iPod Shuffle and enjoy it, of course.

For about five years now, Spotify and Deezer have changed the rules once more: instead of curating your own MP3 collection you can rely on other people doing it for you. They take the time to sort things out, put the right covers, search for the lyrics, find links to the band’s Wikipedia page, etc. The really exciting part is that this kind of service holds a million times more than what you could possibly store at home, and they keep adding new artists every day. If you want to discover new talents, there is no way you could reach that with your personal MP3 collection. Disclaimer: I have no part in those services, I am not even subscribed.

You have to admit that kind of thing goes in the right direction for the environment. When I hear that an artist has sold 2 million copies of an album, I cannot help but think of how many tons of plastic have gone into making discs, just to distribute the same data to a large audience. Instead of all having terabytes of MP3s at home, wouldn’t it be more sensible to store everything in a single shared pool and make it easy for everybody to access the pool remotely? This is exactly what Amazon and Google Music are doing.

We cannot remain blind to the main issue though: how do we fund artists?  It does not take much insight to see that a business based on scarcity of physical goods has no chance against electronic goods that have no cost to store and copy.

A famous post written by Courtney Love summarized the situation in the early 2000s: Courtney Love does the math
tl;dr: Out of the millions generated by her gigs, she and her band only succeed in making a modest revenue. The rest is eaten up by majors.

Tough time for CD vendors, but the fact that the current model of selling CDs is dead does not have to mean there is no other choice. Looking at recent stats, it seems more and more artists are getting most of their revenues from live performances and various merchandising items sold on the spot: T-shirts, mugs, and the inevitable band posters.

See this post dated Nov 2013 about shifting artist revenues over the past 15 years:

I have absolutely no trouble with this model. Again: music wants to be expressed and shared! Live performances are the perfect incarnation of this fact.

Business is going to be tough for people who produce CD-only pieces, things you cannot easily share and enjoy in a live performance. It does not mean they have to cease their activities though. Other models based on free contributions have also been quite successful in many cases.

Music can be distributed under permissive file-sharing licenses (e.g. Creative Commons). See sites like Other artists have decided to offer their songs for free download from their own web sites (e.g. Radiohead) and invite their fans to contribute whatever they want, aka the beggar model, or as Courtney Love put it: “I am a waiter”. Others are asking for funds through Kickstarter equivalents for music. You name it. Compare that to the emergence of radio broadcasts: music was suddenly free and available to all without limits, and yet it survived and generated a huge music recording industry. Some variables have changed but the issue remains the same at heart: let us enjoy your music and we will find a way to fund your next album. You will probably not become as rich as Madonna or Michael Jackson in the 80s, but there should be enough for you to survive.

Another major shift happened recently: the price to pay to record an album is now so low that just anybody can do it with a consumer-class computer at home and get fairly high quality. This certainly reduces the role of Music Majors even further. It used to cost a fortune to record a song, which is why you needed investors to create an album. Not anymore. The price of producing an album and distributing it through the Internet is so low that you do not need to involve bankers and contracts. Just do it over the weekend with the same computer you use to play Starcraft 2 and you are done. What do we still need those record companies for, then?

Lowering the barrier for entry has had consequences. As Moby put it: you have a lot more mediocrity on the market, and real talents are drowned in a flow of bad music. I take this point, but removing top-level executives from the chain of decision can only increase the diversity of what we are hearing, and that is a good thing. Between 1960 and 2000, everything you heard was carefully selected by a small bunch of old white people who made all the decisions about who had a right to be popular. Removing this bias opens the gates to mediocrity, but also to many more talents that would have remained silent.

If you are interested in the topic, you may enjoy this documentary: PressPausePlay dated from 2012.  A lot of the points I touched above are reviewed with a lot more data.

Written by nicolas314

Monday 13 January 2014 at 12:53 am

Posted in music, network storage

Convergent Encryption

If you did not follow the latest buzz, an Internet startup is getting a lot of attention lately: Bitcasa. Their offer seems just too good to be true: for $10/month you get unlimited remote storage for your data on their servers. The best part is: they claim your data will be encrypted on their servers so that even they will not be able to access your file contents. They also claim they can offer unlimited storage by using de-duplication: the first time a file is uploaded it is truly stored on their servers, and all subsequent attempts to upload the same file will just return a pointer to the already stored one.

First reaction: this sounds like bullshit. How could you both encrypt a file so that only its legitimate user can access it, and identify redundancies on the servers? If everybody has a different encryption key, the same file will encrypt differently for each user and prevent any attempt at de-duplication. And if everybody has the same encryption key it kind of defeats the whole point of encryption, doesn’t it?

The Bitcasa founder recently mentioned convergent encryption during an interview, which pushed me into looking further into the topic. I have tried to wrap my head around it and summarize my understanding of the whole process here.

Let us do a thought experiment:

The client is given a list of files to store on the server.

Each file is cut into 4kB-blocks using padding wherever necessary. The same process is repeated for each block:

  1. Compute K = SHA256(block)
  2. Compute H = SHA256(K)
  • K will be used as encryption key for this block.
  • H will be used as an index to store/retrieve the block server-side.
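The two steps above can be sketched with nothing but Python's standard library. The zero-byte padding and the function names are my own assumptions; the thought experiment only says that blocks are padded "wherever necessary".

```python
import hashlib

BLOCK_SIZE = 4096  # 4kB blocks, as in the thought experiment


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks, zero-padding the last one.

    Zero padding is an assumption; any deterministic padding would do.
    """
    return [data[off:off + size].ljust(size, b"\0")
            for off in range(0, len(data), size)]


def derive(block: bytes) -> tuple[bytes, bytes]:
    """Return (K, H) for one block: K = SHA256(block), H = SHA256(K)."""
    k = hashlib.sha256(block).digest()   # encryption key for this block
    h = hashlib.sha256(k).digest()       # server-side index for this block
    return k, h


blocks = split_blocks(b"some file contents" * 500)
pairs = [derive(b) for b in blocks]
```

Both K and H depend only on the block contents, so two users holding the same block derive the same pair without ever talking to each other.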

Now the client queries the server for H. If the server already has this block, it notifies the client that the block needs no uploading. If the block has never been seen before, it is encrypted with AES-256 using K as the key, then uploaded to the server.

Once all blocks have been either uploaded or identified as already known, the client can store a list of all (H,K) pairs and enough metadata to rebuild complete files from individual blocks. Re-building the whole archive is now a matter of N requests for blocks identified by H, and N decryptions with the associated Ks, followed by file reconstruction based on metadata.
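Here is a minimal sketch of the whole exchange, with an in-memory dictionary standing in for the server. Loud assumption: `keystream_xor` is a toy cipher built from SHA-256 so that the example runs with the standard library only. It is a stand-in for the AES-256 step and must not be used to protect real data. The `Server` class, the zero padding and the metadata layout are hypothetical too.

```python
import hashlib

BLOCK_SIZE = 4096


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR cipher, NOT AES-256: a stdlib-only stand-in for this sketch."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += sha256(key + counter.to_bytes(8, "big"))
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Server:
    """Hypothetical block store: it only ever sees H and ciphertext."""

    def __init__(self):
        self.blocks = {}

    def has(self, h: bytes) -> bool:
        return h in self.blocks

    def put(self, h: bytes, ciphertext: bytes) -> None:
        self.blocks[h] = ciphertext

    def get(self, h: bytes) -> bytes:
        return self.blocks[h]


def upload(server: Server, data: bytes) -> dict:
    """Upload unknown blocks only; return the metadata the client must keep."""
    pairs = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        k = sha256(block)
        h = sha256(k)
        if not server.has(h):                 # block never seen before
            server.put(h, keystream_xor(k, block))
        pairs.append((h, k))
    return {"length": len(data), "pairs": pairs}


def download(server: Server, meta: dict) -> bytes:
    """N requests by H, N decryptions with K, then strip the padding."""
    data = b"".join(keystream_xor(k, server.get(h)) for h, k in meta["pairs"])
    return data[:meta["length"]]


server = Server()
document = b"hello" * 3000
alice = upload(server, document)
stored_after_alice = len(server.blocks)
bob = upload(server, document)        # same file, second user
```

After Bob's upload the server holds no more blocks than after Alice's, and either metadata set is enough to rebuild the file with `download(server, alice)`.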

Store these metadata on the local client, and store one copy on the server too for good measure. To ensure only the client can read them back, metadata can be encrypted with a key only known to the client, or using a key derived from a user passphrase.
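Deriving the metadata key from a passphrase could look like the sketch below. PBKDF2, the salt size and the iteration count are my assumptions; the scheme described here only asks for "a key derived from a user passphrase".

```python
import hashlib
import os


def derive_metadata_key(passphrase: str, salt: bytes,
                        iterations: int = 200_000) -> bytes:
    """Stretch a passphrase into a 32-byte key with PBKDF2-HMAC-SHA256.

    The iteration count is an assumption, pick one suited to your hardware.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=32)


# The salt is not secret: it can be stored next to the encrypted metadata.
salt = os.urandom(16)
key = derive_metadata_key("correct horse battery staple", salt)
```

Unlike K and H, this key must not depend on the data, otherwise anybody holding the same files could read your metadata.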

What did we achieve?

Client-side: initial files are now reduced to a set of (H,K) pairs and the metadata needed to rebuild them. Since a copy is stored on the server, the client’s local archive can safely be erased and rebuilt from scratch.

Server-side: duplicated blocks are stored only once. Blocks are encrypted with keys unknown to the server and indexed by a hash that does not yield any information about key or contents. Metadata are also stored on the server but encrypted with a key only known to the user.

Executive summary:

  • The server does not have any knowledge about the data it is storing
  • The server cannot determine which files are owned by a given user without attacking their metadata.

This scheme is a good way to deal with snooping authorities, but what about dictionary attacks? An attacker who has prepared a list of known plaintext blocks can easily discover whether they are already present on the server. To be fair, dictionary attacks are quoted as the main vulnerability of convergent encryption schemes in all papers I have seen so far.
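To make the attack concrete, here is a small sketch. The server is reduced to the set of indexes it stores, and the block contents and names are invented for the example.

```python
import hashlib


def block_index(block: bytes) -> bytes:
    """H = SHA256(SHA256(block)), the server-side index of the scheme above."""
    return hashlib.sha256(hashlib.sha256(block).digest()).digest()


def probe(server_index: set[bytes], candidates: dict[str, bytes]) -> list[str]:
    """Return the names of the candidate blocks the server already stores."""
    return [name for name, block in candidates.items()
            if block_index(block) in server_index]


# Hypothetical server state: two blocks were uploaded by somebody.
known_block = b"A" * 4096
server_index = {block_index(known_block), block_index(b"B" * 4096)}

# The attacker's dictionary of plaintext blocks to look for.
dictionary = {
    "known-file/block-0": known_block,
    "other-file/block-0": b"C" * 4096,
}
hits = probe(server_index, dictionary)
```

The attacker learns that a block exists on the server, not who uploaded it: that link lives in the encrypted metadata.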

What are we trying to protect against?

If you offer the world a way to remotely store their files, chances are that you will soon end up with a very long list of all movies and music ever produced by humanity on your servers. Your main opponents are copyright holders who do not want the public to share their productions so easily.

I believe dictionary attacks are hardly going to be usable by copyright holders. The same data block could very well be present in two very different data files, and I do not see how you could ever prevent somebody from publicly storing 4kB of data that matches a 4kB block in a copyright-protected data file. Copyright protection applies to a complete work, not to individual 4kB components. You would need to prove that the same user has access to all individual blocks for the copyright violation to be proven, but that is not possible in the described scheme.

Come to think of it, this suddenly looks like it could work.

Written by nicolas314

Monday 19 September 2011 at 11:15 pm

Fixing Google Music

Throwing the ball at Google Music in my previous post was easy. The service is in beta after all and looking for ways to optimize user comfort, legal compliance and business model at the same time. Not sure there are obvious solutions to that, especially if you are not willing to enter discussions with the MAFIAA.

Does not mean that it is impossible though. Let’s try to fix Google Music, shall we?

Fix the initial upload

Uploading my whole music collection would require about 60 days full-time on my current DSL line. I tried uploading from other locations with better upload bandwidth but unfortunately Google Music Manager does not support HTTP proxies (yeah, beta). Why should I have to upload my music after all? I bet Google has half a billion users uploading Pink Floyd’s Dark Side of the Moon right now, this eats up tons of space for the same files over and over again, and uses bandwidth for nothing.

Guess what? Apparently Google did not have much choice, but it seems things have just changed: Cloud music is not a crime

Much better. Now I will just point the manager to my copious MP3 collection and let Google Music decide that I can access all of these from the cloud. Pretty cool! Hey wait: how does Google determine that a file on my disk is the same as a file in the cloud? Sheer MD5? Cool! This means that if I can produce a set of files with the same MD5 fingerprints, Google will automatically give me access to the real music files up there. Instead of downloading albums, I can now just download a set of files with the correct fingerprints, or whatever it takes for Google Music Manager to identify them as valid music files and give me access to them instantly. No need to upload but no need to own the real data either!
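For illustration, this is what naive fingerprint matching would look like. It is pure guesswork on my side: I have no idea how Google actually identifies files, and the catalogue below is made up.

```python
import hashlib
import io


def md5_fingerprint(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream chunk by chunk and return the hex MD5 digest."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def match(stream, catalogue: dict[str, str]):
    """Return the catalogue title for this stream, or None if unknown."""
    return catalogue.get(md5_fingerprint(stream))


# Hypothetical catalogue mapping fingerprints to tracks stored in the cloud.
track = b"pretend these are the bytes of an mp3 file"
catalogue = {md5_fingerprint(io.BytesIO(track)): "Some Track"}

title = match(io.BytesIO(track), catalogue)
```

With real files you would pass `open(path, "rb")` instead of a `BytesIO`. The weak point is that the fingerprint is computed on the client's machine, and the service has to take its word for it.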

Going a bit further: there is actually no need to download files. I bet you can hack Music Manager into believing that you have a huge set of music files of your choosing and let it activate it all in the cloud for your account. The Music Manager is a piece of software running on my computer, I can hack the OS all I want to make it believe what I want. I give this a couple of months before somebody finds a way to do that.

Packaging it all could be made even more convenient:

  • Write a script that opens a Google account for you, and gets user help when the captcha is required
  • Automatically subscribe to Google Music, download Music Manager
  • Feed Music Manager whatever it takes to make it believe you have 20,000 songs on your computer
  • Instant access to 20,000 songs!
  • Profit!

We could easily imagine scripts to get instant access to 20,000 jazz pieces, or 20,000 classical recordings, or 20,000 best popular songs. You name it.

Even without having to create new Google accounts, you could have a script that bullshits Music Manager into giving you access to 20,000 songs of your choice on your existing account. You could offer dedicated themed radios too. The sky’s the limit.

Sure, you won’t be able to download the songs, but you will be able to listen to all the music you want from your Android phone or anything that has a flash-enabled web browser.

Not yet ideal but that would certainly make the service a lot more interesting :-)

Written by nicolas314

Tuesday 23 August 2011 at 1:50 pm

Google Music (beta) Review

Been using Google Music for a couple of weeks now. Time for reviewing!

What is it about?

Google recently opened Google Music to compete against Amazon, Spotify, and Apple in general. Compared to other services the terms are pretty simple: you can upload up to 20,000 of your own mp3 songs at no cost (for now) and they are available to you anywhere you can reach them from a browser. Google also offers an Android app that can sync with your music collection: choose the albums you want on your phone, put it on WiFi and let it download locally for offline listening.

Seems sweet huh?

Hmm… Only two weeks in and I am not convinced. Let me elaborate.

Beta Quirks

The service is still in beta, which seems to be the norm for all Google services, but this time it really feels like it is half-baked with profound design errors.

The only way you can upload music is through a desktop app. There is one for Windows, Mac and Linux. I installed the one for Mac (a pretty big app for something that is just supposed to upload stuff), started it and got a configuration screen. I clicked a bit everywhere in the hope of taking control but there are very few things you can set in the panel, and once you click Ok the app disappears.

Ok so it wants to remain unseen like a background daemon doing its job without me noticing. That was friendly but I really want to be in charge and know what is happening. Guys this is my bandwidth!

As for upload options you are given a choice between:

  • Uploading your “Music” folders (? didn’t try)
  • Uploading your iTunes collection
  • Uploading a list of folders of your choice

I stupidly chose to upload my iTunes stuff, thinking Mac integration would be best with the Apple-sanctioned music handler. And then nothing. Waited a bit and noticed my bandwidth was being savagely maxed. I had to rummage through the System Preferences to find what in hell this thing was doing, and finally discovered it was courageously uploading all the podcasts I stored for later perusal on my iPod. No! No! I do not care about putting podcasts in the cloud! I can find them easily enough on their respective web sites, don’t need an extra copy with Google. So how do you stop this thing? Well, turns out you cannot, so I ended up uninstalling it altogether.

Ok, reinstall and this time choose Upload folder. Point it to a folderful of music and… wait. There are 20 Gigs in there, how long is this gonna take? Quick calculation: about three days. THREE EFFING DAYS?? Forget about web browsing when all the upload bandwidth is gone.

But Ok, I’ll play. Eat my bandwidth, Google.

Three days later I got a nice set of 3,000 songs up there. Now what? Now I can proudly listen to a bunch of files I already had access to, by definition, at work and at home. Putting 8 GB of it on my Android would not have taken me that long through USB. What is the service really?

I just cannot grasp why I would want to use this service at all. My personal music store at home runs to about half a terabyte of music of all possible styles, trends and periods, and I will never be able to upload all of it to Google’s servers. And even if I could, what would be the point really? Transferring music to my phone is really easy (mount phone through USB, copy) and I only need to do it every couple of months or so. With USB thumb drives now offering 32 GB of storage, it has never been easier to carry around tons of music and transfer them without difficulty anywhere I go. The “portable” aspect of having my music online just escapes me.

One point about storing things up in the cloud is that they remain there in case you lose all of your backups in a fire. But honestly: if my house burnt, my music collection would be the least of my worries. I bet if this happened I could rebuild a full music DB in a matter of days by just gathering music from friends. This could actually be a good opportunity to discover new stuff, come to think of it.

One useful feature could have been to be able to download your music back from the cloud. Putting everything up there could at least be useful to share songs with friends. But no: nothing is planned for music download. So much for sharing.

The web-based part tries to be as nice as possible but still has a way to go. Some uploaded albums were not recognized correctly: missing covers, or you end up with two half-albums sharing the same name, each one having half of the songs of the original album. Yuck. In terms of file formats you can only upload mp3. Forget ogg, flac, ape, or anything exotic. Barf.

Net result: I am not buying it. The service has fewer features than what I can do myself with a few euros worth of gear in my pocket. A “free” service that has already cost me more in electricity than what I could have achieved in five minutes over USB.

C’mon Google. We all know you can do better than that! Don’t let the MAFIAA ruin what could have been a great project.

NB: Google Music is currently only available through invitations and you need to be in the US to activate it.

Written by nicolas314

Saturday 20 August 2011 at 1:51 am

My 2c on Amazon

Hide the family jewels

As an early adopter I have enjoyed digital cameras at home for over 12 years now. This translates into about 20 GB of JPEGs on my home partition which I absolutely do not want to lose. I had the painful experience of getting burglarized a few years back and was lucky enough to recover my computers from the police station a couple of days later. The hardware itself has no importance to me but the pictures are of course priceless. This calls for a drastic solution: backup, backup, and remote backup. First two steps are easy: multiply the copies of your pictures using rsync on various hard drives around the house and you are covered against single hard drive failure. Make sure you get into the habit of sync’ing them all every time you get a new bunch of pics and you are set. Now what are the solutions for remote backups?
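For the local part, the habit boils down to something like this little wrapper around rsync. The paths are hypothetical, and `-a` (archive mode) is the only rsync option I rely on here.

```python
import subprocess


def rsync_command(source: str, destination: str) -> list[str]:
    """Build the rsync call; the trailing slash copies the directory contents."""
    return ["rsync", "-a", source.rstrip("/") + "/", destination]


def sync_all(source: str, destinations: list[str]) -> None:
    """Mirror the source onto every backup drive, stopping at the first error."""
    for destination in destinations:
        subprocess.run(rsync_command(source, destination), check=True)


# Hypothetical locations, adapt to your own mount points.
PICTURES = "/home/me/pictures"
BACKUP_DRIVES = ["/mnt/disk1/pictures", "/mnt/disk2/pictures"]

# sync_all(PICTURES, BACKUP_DRIVES)   # uncomment to actually run rsync
```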

Store it at work

The obvious solution is to encrypt a disk and leave it somewhere in my office, but that has obvious drawbacks. First is that I have to think about bringing the disk home every time I add more data. I tried it for a while and could never remember to update the drive. Second point is that there are lots of people going through my office every day. Even if I trust my colleagues, it is always tempting to borrow a USB hard drive you have seen sitting around the office for ages. The contents are of course encrypted, which makes the drive appear as unformatted to the untrained eye.

I do not want to lock stuff in drawers. Last time I did, I lost the keys and had to destroy a drawer to get to my stuff. Kinda cryptography in the real world, except brute force actually works.

Network storage

Network storage solutions are a dime a dozen and literally exploding these days. I tried a lot of them and came to the conclusion that Dropbox is by far the best in terms of usability and functionalities. It is the only solution I tried that has clients for Windows, Mac and Linux and that can dig through the firewall and http proxy at work without me configuring anything. It also has an iPhone app to review your files on the go and this is absolutely gorgeous. I can finally have the illusion of having the same disk at home on all machines, at work, and in my pocket.

I will probably become a paid subscriber at some point. The remaining detail I have to fix is to figure out how to upload 20 gigs of data to their servers with my puny 100kB/s home DSL connection. Dropbox also does not offer encryption, so I have to figure out a way to encrypt everything on the fly but still keep the contents accessible for easy retrieval, through an index or equivalent.
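How long would that upload take? A quick estimate, assuming decimal units and a line that delivers its full 100kB/s around the clock (both optimistic):

```python
# Rough upload time for the picture collection.
# Assumptions: decimal units, sustained full upstream rate, no protocol overhead.
DATA_BYTES = 20 * 10**9             # 20 gigs of pictures
UPLINK_BYTES_PER_S = 100 * 10**3    # 100kB/s DSL upstream

seconds = DATA_BYTES // UPLINK_BYTES_PER_S
days = seconds / 86400

print(f"{seconds} s, about {days:.1f} days")
# prints "200000 s, about 2.3 days"
```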

Amazon S3

Another shot at network storage solutions brought me to Amazon S3. This service offered by Amazon is mostly aimed at developers who want to host large amounts of data like a database backend for a dynamic web site. It is a bit rough around the edges. Lots of people have tried disguising the whole thing as a network disk without much success. Reviewing existing Python APIs and fuse-based stuff did not reveal anything revolutionary or stable. Anyway, I felt I just had to try it out.

My tests consisted of creating a dedicated directory (a bucket in Amazon terms) and uploading 100 MB of data to see how easy it would be. I want both to be able to sync my picture directories and encrypt all contents on the way up without having to recode too much stuff. I ended up with a little bit of Python glue around rsync and gpg that was not too satisfactory. It worked for basic tests but I would not have relied on my own code for production :-)

Amazon S3 is not a free service, but it isn’t expensive either. Doing my whole test set ended up with a bill for less than 2 euros. Fair. But this is where it hurts: Amazon billed me in US dollars and that triggers international charges on my credit card that are far above these 2 euros. In the end I might make my bank richer and will not bring anything to Amazon.

Pained by what I had discovered on my bank monthly slip, I decided to close the lid on the S3 experience and deleted all data from the bucket I created. Next month I was charged $0.02 for this operation, which turned into an absolutely ridiculous amount in euros with a fair charge attached from the credit card because they did not appreciate my micro-payment.

This is probably the last time I ever use S3. I really do not understand why Amazon can bill me in euros for books (even when I buy in the UK) and not for services. Another good idea could be for them to accumulate bills until they reach a reasonable sum like 10 or 15 euros. It would not change much for their cash flow and would really avoid unnecessary bank feeding.

My 2c on Amazon S3 have cost me more than my phone bill this month.

Written by nicolas314

Thursday 10 December 2009 at 11:09 pm