All my geeky stuff ends up here. Mostly Unix-related

Posts Tagged ‘lxml’

Sunday Yak Shaving


Seems I will never escape the curse of Yak Shaving, especially when I had plans for better things to do on a Sunday. I have about half a million books in epub format and I wanted to sort them. No complicated stuff, just rename them to something like “Author, Title”. It seemed pretty obvious that this information is contained somewhere in the epub files themselves; I just needed some way to extract it. A quick read through the net told me everything I wanted to know about epub files: they are basically a bunch of HTML files zipped together with an XML description and possibly a cover image. The entry point is META-INF/container.xml, which names the OPF package file where the author and title are stored. A quick search even turned up some Python code to perform the extraction.
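A minimal sketch of that kind of extraction (not the snippet mentioned above) could look like the following. It assumes the usual epub layout: META-INF/container.xml names the OPF package file, whose metadata block carries dc:creator and dc:title. The function names epub_info and new_name are made up for illustration, and the fallback to the standard library is only there so the sketch still runs where lxml refuses to install.

```python
import os
import zipfile

try:
    from lxml import etree  # the library this post is about
except ImportError:
    # Stdlib fallback; the few calls used below exist in both modules.
    import xml.etree.ElementTree as etree

# XML namespaces used by the container file and the OPF package file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def epub_info(path):
    """Return (author, title) read from the OPF package file of an epub."""
    with zipfile.ZipFile(path) as z:
        # container.xml tells us where the OPF file lives inside the zip.
        container = etree.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.get("full-path")
        package = etree.fromstring(z.read(opf_path))
    # Take the first creator and title; fall back to "Unknown" if absent.
    author = package.findtext("opf:metadata/dc:creator", default="Unknown", namespaces=NS)
    title = package.findtext("opf:metadata/dc:title", default="Unknown", namespaces=NS)
    return author.strip(), title.strip()


def new_name(author, title):
    """Build an 'Author, Title.epub' file name, replacing path separators."""
    name = "{}, {}.epub".format(author, title)
    return name.replace("/", "_").replace(os.sep, "_")
```

Calling new_name(*epub_info(path)) gives the target file name. The actual os.rename is left out on purpose, so nothing gets moved by accident while trying this on a real library.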

The tough part turned out to be getting ‘import lxml’ to work on my Mac.

See, I replaced Snow Leopard with Lion last week, and things have only gotten worse ever since. A number of things have stopped working and needed repair. Nothing serious, just an endless series of little crashes needing no more than a bit of Googling and a couple of command-line fixes. This time my progress took me through:

lxml: install failed, cannot find a compiler. What? I have Xcode installed, I swear!

OK, uninstalled Xcode 3 to install Xcode 4. Went to the Apple web site, discovered that it can only be installed through the App Store (FFS???). Created an account, downloaded 1.6 GB.

At that point I let the Mac do its thing and went ironing shirts.

A couple of hours later I installed Xcode: crash after crash, ended up sending lots of fancy bug reports to Apple. Read some wisdom on Stack Overflow, re-installed Xcode once more (with iTunes off, this time), finally got a working compiler.

lxml: install failed. Something about ARCH not correctly set. Impossible to fix with the system Python. Decided to install another Python interpreter using brew.

Ouch: brew did not appreciate the Lion update either. Tried everything I could to salvage my installation, to no avail. gnutls turned out impossible to compile, kept crashing anything I wanted to do and did not let me install Python. Best I could do was delete everything brew-related and reinstall all packages from scratch. A couple more hours lost.

lxml: install failed again, but this time the quick fixes found on Stack Overflow did pay off. Just needed to add a couple of variables to the compile.

At some point the whole thing was so preposterous I knew I wanted to take it all the way until I had a working Python/epub library on my desktop. The whole ordeal would have been limited to a simple ‘apt-get install python-lxml’ on anything Debian-based. My next priority now is to find a way to export all mass storage on the Mac to a VM running Linux Mint on that same Mac, and forget about developing anything on a desktop that is so obviously not meant for that.

This is unfortunately not my first adventure in the realm of Apple development. So far I have never had a positive experience: bad documentation, a web site so incredibly badly designed it looks like a practical joke, a bug report system that keeps bugging (asking you to send a bug report through the bug report site that just bugged), HTTP 500s at regular intervals, and forums that are just completely useless. But yeah, the GUIs look gorgeous.




Written by nicolas314

Monday 13 February 2012 at 11:53 pm

Posted in Uncategorized
