All my geeky stuff ends up here. Mostly Unix-related

Posts Tagged ‘python’

Sunday Yak Shaving

Seems I will never escape the curse of Yak Shaving, especially when I had plans for better things to do on Sunday. I have about half a million books in epub format and I wanted to sort them. No complicated stuff, just rename them to something like “Author, Title”. Seems pretty obvious that this information is contained somewhere in the epub files themselves, I just need some way to extract it. A quick read through the net told me everything I wanted to know about epub files: they are basically a bunch of HTML files zipped together with an XML description and possibly a cover image. The information I am looking for is in the package file that container.xml points to. A quick search even turned up some Python code to perform the extraction.
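The snippet I found is not reproduced here. As a rough stand-in, this is a minimal Python 3 sketch of the same idea using only the standard library (zipfile and xml.etree) instead of lxml. The function name epub_author_title is made up for illustration, and the sketch assumes the book carries Dublin Core dc:title and dc:creator elements, which not every epub does.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and by Dublin Core metadata.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def epub_author_title(source):
    """Return (author, title) for an EPUB given a path or a file-like object."""
    with zipfile.ZipFile(source) as z:
        # META-INF/container.xml names the package (OPF) file holding the metadata.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf = ET.fromstring(z.read(rootfile.get("full-path")))
    # Look for dc:title and dc:creator anywhere in the package document.
    title = opf.findtext(".//dc:title", default="", namespaces=NS)
    author = opf.findtext(".//dc:creator", default="", namespaces=NS)
    return author.strip(), title.strip()
```

Renaming is then a matter of calling this on each file and building the new name from the two strings, after cleaning out characters the filesystem does not like.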

The tough part turned out to be getting import lxml to work on my Mac.

See, I replaced Snow Leopard with Lion last week, and things have only gotten worse ever since. A number of things have stopped working and needed repair. Nothing serious, just an endless series of little crashes needing no more than a bit of Googling and a couple of command-line fixes. This time my progress took me through:

lxml: Install failed, cannot find a compiler. What? I have XCode installed, I swear!

Ok, uninstalled XCode 3, installed XCode 4. Went to the Apple web site, discovered that it can only be installed through the App Store (FFS???). Created an account, downloaded 1.6 GB.

At that point I let the Mac do its thing and went ironing shirts.

A couple of hours later I installed XCode: crash after crash, ended up sending lots of fancy bug reports to Apple. Read some wisdom on Stack Overflow, re-installed XCode another time (with iTunes off, this time), finally got a working compiler.

lxml: install failed. Something about ARCH not correctly set. Impossible to fix with the system Python. Decided to install another Python interpreter using brew.

Ouch: brew did not appreciate the Lion update either. Tried everything I could to salvage my installation, to no avail. gnutls turned out impossible to compile, kept crashing anything I wanted to do and did not let me install Python. Best I could do was delete everything brew-related and reinstall all packages from scratch. A couple more hours lost.

lxml: install failed again, but this time the quickfixes found on Stack Overflow did pay off. Just needed to add a couple variables to compile.

At some point the whole thing was so preposterous I knew I wanted to take it all the way till I had a working python/epub library on my desktop. The whole ordeal would have been limited to a simple ‘apt-get install python-lxml’ on anything Debian-based. My next priority now is to find a way to export all mass storage on the Mac to a VM running Linux Mint on my Mac, and forget about developing anything on a desktop that is so obviously not meant for that.

This is unfortunately not my first adventure in the realm of Apple development. So far I have never had a positive experience: bad documentation, the web site is so incredibly badly designed it looks like a practical joke, the bug report system keeps bugging (asking you to send a bug report through the bug report site that just bugged), you get HTTP 500’s at regular intervals, and the forums are just completely useless. But yeah, the GUIs look gorgeous.




Written by nicolas314

Monday 13 February 2012 at 11:53 pm

Posted in Uncategorized

Programming for kids

Interesting links for kids who want to learn programming:

Scratch from MIT
Scratch is meant just for that: provide a first approach to programming. Completely visual and exists in multiple (human) languages.
Processing
A full-fledged language used to create beautiful visualizations, equally loved by scientists and artists. Very easy to learn and the results are executables that run everywhere.
Snake Wrangling for Kids
This book introduces Python programming for young readers through entertaining topics.

Written by nicolas314

Saturday 22 October 2011 at 12:22 am

fapws3 + web.py

Objective: run a web.py application with the fapws3 web server


If you are looking into fast and easy ways to run your application, there are many exciting alternatives out there claiming to be both easier to install (easy) and faster (not so easy) than Apache+mod_wsgi. A benchmark of Python web servers summarizes all good candidates today. I decided to give them all a quick try and see what they have to offer. First in line: fapws3

Here is my HelloWorld
import web
class hello:
  def GET(self):
    return 'Hello world'
urls = ('/', 'hello')
application = web.application(urls, globals(), True).wsgifunc()

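and here is the glue to run it from fapws3:
import hello
from fapws import base
import fapws._evwsgi as evwsgi
if __name__=="__main__":
  evwsgi.start('', '8080')
  evwsgi.set_base_module(base)             # give fapws3 its request/response helpers
  evwsgi.wsgi_cb(('', hello.application))  # mount the WSGI application
  evwsgi.run()                             # enter the event loop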

Start the server with python and point your browser to http://localhost:8080 to see it run. OK, now let us modify our web app a bit: say I have a URL that requires longer computation times. This is simulated here with time.sleep:

import web, time
class immediate:
  def GET(self):
    return 'immediate'
class delayed:
  def GET(self):
    time.sleep(10)  # simulate a long computation
    return 'delayed'
urls = ('/immediate', 'immediate',
           '/delayed', 'delayed')
application = web.application(urls, globals(), True).wsgifunc()

Open two tabs in your browser, point the first to /immediate and the second one to /delayed. Now reload both, /delayed first… and wait 10 seconds to see /immediate get refreshed. Ouch. One long-running request blocks the whole server.


  • fapws3 is not threaded and never will be, according to the FAQ
  • fapws3 does not support SSL

No support for multi-threading means that you will have to implement your own manager/worker mechanism for long-running requests. The fapws3 FAQ recommends using many parallel instances and pound for load-balancing and SSL support. WTF?
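For comparison, here is a minimal Python 3 sketch of the same two URLs served by a threaded server, using only the standard library (wsgiref plus socketserver.ThreadingMixIn) rather than fapws3 or web.py. The names ThreadedWSGIServer, serve and DELAY are mine, and this is a development-grade illustration, not a production setup.

```python
import time
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server

DELAY = 10  # seconds spent pretending to compute something


def application(environ, start_response):
    """Same two URLs as above: /delayed sleeps, anything else answers at once."""
    if environ.get("PATH_INFO") == "/delayed":
        time.sleep(DELAY)
        body = b"delayed"
    else:
        body = b"immediate"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class ThreadedWSGIServer(ThreadingMixIn, WSGIServer):
    """Handle every incoming request in its own thread."""
    daemon_threads = True


def serve(port=8080):
    """Serve the application until interrupted."""
    httpd = make_server("", port, application, server_class=ThreadedWSGIServer)
    httpd.serve_forever()
```

With serve() running, /immediate should answer right away even while /delayed is still sleeping, since each request gets its own thread.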

Now I am left wondering: what could fapws3 possibly be useful for? There are so many more WSGI-compatible web servers with excellent performance, a full thread stack and complete SSL support out of the box, why should I bother with one that leaves all the work to me? I probably missed something. Oh well…

Written by nicolas314

Monday 7 March 2011 at 11:39 pm

Fast Python webapp

Just spent the last few days trying to find the fastest way to put together a Python webapp. Not an easy task, especially since documentation on the topic is really abundant and (I found) rarely self-sufficient. I ended up choosing what I believe is the most straightforward alignment of code to get a Python webapp up and running in minutes, and make it portable to production mode without effort. web.py is the simplest Python framework there is. Straight and to the point: you can program very basic stuff but if you really want to, you can add templates and database and model-view-controller design as you see fit. The basic hello world in web.py would be:


import web

class hello:
    def GET(self):
        return 'Hello world'

urls = ('/', 'hello')
app  = web.application(urls, globals(), True)

if __name__=="__main__":
    app.run()

Cannot get simpler than that! URLs are mapped to callables by regexes, which gives you perfect flexibility for URL design. Your classes can implement different methods for GET and POST, keeping closer to the real REST philosophy. Without having to install any further software you can immediately test your app by running the script with python.

web.py is friendly enough to embed its own (pure-Python) web server for test purposes. Debug mode is also automatically activated in that mode so you will be able to get usable messages when things go wrong during development.
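To make the regex-to-callable mapping a little more concrete, here is a small Python 3 sketch of how such a dispatcher could work. It is not web.py's actual implementation: the dispatch function, the extra /hello/(\w+) route and the plain-string error results are all invented for illustration.

```python
import re


class hello:
    def GET(self, name=None):
        return "Hello %s" % (name or "world")


# Flat (pattern, class name) pairs, the same shape as the urls tuple above.
urls = ("/", "hello",
        r"/hello/(\w+)", "hello")


def dispatch(method, path, urls, namespace):
    """Call the handler of the first pattern that matches the whole path.

    Regex groups are passed as positional arguments to the method named
    after the HTTP verb (GET, POST, ...).
    """
    for pattern, name in zip(urls[::2], urls[1::2]):
        match = re.fullmatch(pattern, path)
        if match:
            handler = namespace[name]()
            func = getattr(handler, method, None)
            if func is None:
                return "405 Method Not Allowed"
            return func(*match.groups())
    return "404 Not Found"
```

Calling dispatch("GET", "/hello/bob", urls, globals()) looks up the hello class by name and calls its GET method with the captured group, which is why a class only needs to define the verbs it supports.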


I have spent a lot of time with lighttpd now, browsing documentation (I even bought the book!), parsing the source and even participating on their forum. Now it is time to get my return on investment. I just found out that lighttpd can launch apps directly in fastcgi mode without the need to write your own boilerplate code to convert fastcgi to wsgi. Here is a minimal lighttpd configuration that just works:


server.port = 8080
server.modules = ("mod_fastcgi", "mod_rewrite")
server.document-root = "/home/www/"
fastcgi.server = ( "" =>
    (( "socket" => "/tmp/fastcgi.socket",
       "bin-path" => "/home/www/",
       "max-procs" => 5
    ))
)
url.rewrite-once = (
    "^/sta/(.*)$" => "/sta/$1",
    "^/(.*)$" => "/$1"
)


The above specifies a fastcgi handler called ‘’ that is always called thanks to the last rewrite rule, except for stuff located in /sta which is directly served by lighttpd. /sta is where you are going to store your served static content like images and css.

In production you can launch a bunch of lighttpd front-ends and configure them to talk to a fastcgi app possibly located on another server or farm of servers.

Took me quite a while to converge on this simple solution. Other paths I reviewed were:

  • Apache+mod_wsgi: too heavy
  • cherokee+uwsgi: cherokee is really nice but uwsgi is an ugly duckling
  • lighttpd+SCGI+flup+cherrypy: works but heavy and boilerplate code is ugly and un-maintainable

Not saying the other solutions are bad, they are just not as straightforward.

Written by nicolas314

Thursday 7 October 2010 at 11:24 pm

Posted in python, webapp

Python web development woes

Writing a 2c web app is a lot harder than it looks.


Write a mockup prototype for a web-based application. Tools: anything you like, as long as the job is done quickly and can easily be modified to accommodate rapidly-evolving requirements. In fact, this application is meant as a living demonstration of the future full-fledged stuff re-programmed later with something industrial-grade. Think of it as specifications that compile and can actually be run.

As the local Python expert I thought I would demonstrate how it can quickly get the job done with a minimal amount of effort. Little did I know…

Choosing the web server

We are dealing with something that will essentially be web-based, so choosing the appropriate web server seems the first thing to do.

I have worked with two web servers in the past: Apache and lighttpd. Apache is notoriously difficult to configure, the config file is full of traps and possible inconsistencies. There are complete books and tutorials on the Net about how to configure your own server and believe me they are all worth consulting. Apache is a really good server but if you have never used it you’d better plan a couple of weeks in advance to learn how to use it correctly.

lighttpd (pronounced “lighty”) is a really fast and lightweight server, much easier to configure. Once you have it installed you can literally have it run within minutes. Unfortunately it does not support https client-side certificates (yet) and that feature is needed for what I want to do. One guy recently submitted a related patch but unfortunately I could not get it to work against any version of lighttpd I could find. Exit lighttpd, welcome Apache!

First attempt: Python CGI script

Python comes with a cgi module that is great for writing demo scripts but quickly becomes a serious pain whenever you want to implement anything more complicated like logins, sessions or database-related stuff. This is really bare-bones, maybe a little too much so. After spending half a day re-coding a session mechanism I finally gave up and moved on to the next stage.

Second attempt: Apache + mod_python

mod_python is great! A Publisher algorithm browses through your Python files and publishes on the web server anything that looks like a string or a callable. Imagine a static server responding to these URLs:


If you have a Python module containing two top-level strings named ‘hello’ and ‘world’, they will be published by mod_python and displayed verbatim. More interesting of course is to use callables (functions or instances) for a dynamic site.
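The Publisher idea can be sketched in a few lines of Python 3. This is a simplified illustration of the principle, not mod_python's code: the publish function and the site object standing in for a real module are hypothetical, and the real Publisher also deals with module names in the URL, arguments and access rules.

```python
import types


def publish(module, path):
    """Resolve a URL path such as '/hello' against a module's top-level names.

    Strings are returned verbatim; callables are called and their result
    is returned. Anything else, or any private name, is refused.
    """
    name = path.strip("/")
    # Only plain public identifiers may be reached from the outside.
    if not name.isidentifier() or name.startswith("_"):
        raise LookupError(path)
    try:
        obj = getattr(module, name)
    except AttributeError:
        raise LookupError(path) from None
    if isinstance(obj, str):
        return obj
    if callable(obj):
        return str(obj())
    raise LookupError(path)


# Stand-in for a Python module with two top-level strings and one callable.
site = types.SimpleNamespace(
    hello="Hello",
    world="World",
    now=lambda: "generated on request",
)
```

Here publish(site, "/hello") hands back the string as is, while publish(site, "/now") calls the function on every request, which is the static versus dynamic distinction made above.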

Took me a complete week to finish the site with mod_python but development was a breeze. I spent more time in my application business than with the tools themselves, which is the main reason for using tools like Python.

And then I reached a stopping point: I need to authorize web clients to upload XML files to the server in an unusual MIME type. Unfortunately mod_python offers no support to do this and even worse: it silently absorbs the uploaded files and does not even bother warning your application that it missed a client request. Going through mod_python forums I could find that somebody else already mentioned this to the developers but the feature was rejected because if you want to do serious web stuff you should move to WSGI.

At that point I could have gone back to CGI for the file uploading stuff but I did not want to live with schizophrenic code that is half CGI, half mod_python. Besides, I do not even want to know how much time I would have needed to make this work in the Apache configuration file. Time to leave mod_python behind and move on.

Third attempt: Python WSGI

Now I have to wade through this infamous WSGI stuff and see if it is really worth all the buzz. To make things short: WSGI is a pure Python standard that specifies how a Web framework should behave at its lowest level. The intention is to make it easier to port a WSGI-compliant application from one framework or web server to another without having to re-code anything.

I read the full specifications for WSGI and I have to admit I did not really understand the motivations behind this design. But oh well, I trust the guys to have done a good job at factorizing web frameworks. The WSGI standard itself is really low-level.  There is no way you can develop a web site just armed with it, it is only meant for middleware providers. So let’s hunt for WSGI middleware!
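For reference, the contract itself fits in one function: a callable that takes the request environment and a start_response callback, and returns an iterable of byte strings. A minimal Python 3 sketch follows; the call_app helper is a made-up name that uses the standard wsgiref.util.setup_testing_defaults to call the application without any web server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A complete WSGI application: environ in, iterable of bytes out."""
    body = ("You asked for %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, path):
    """Call a WSGI app directly and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Everything a framework adds (routing, sessions, templates) sits on top of this one calling convention, which is why the bare standard is of little use for building a site on its own.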

WSGI stage 1: Django, Pylons

Django and Pylons are full-fledged frameworks that come with all the bells and whistles. Nothing bad with these but they do suffer from the same issues, namely:

  • They offer about a zillion features I do not care about
  • They cover almost everything I need, but not quite

Which means that I will probably end up deploying lots of packages I will never use and will have to code additional functionalities into their framework just to cover my own needs.

Both packages come with half a million dependencies on various additional packages, and every package means more maintenance.  I spent a couple of days on each to try and explore and came to the conclusion that it must be really great to use them as a basis for a larger project but I would not want to do it.

Mental note: I need to train myself on these frameworks, it might come in handy some day.

WSGI stage 2: Bottle

Bottle is a lightweight WSGI environment all contained within a single file. Can’t beat that in terms of the fewest dependencies!  It offers a very simple syntax to route your URLs to your objects and makes for clean code like:

from bottle import route

@route('/administration')   # route paths here are illustrative
def administration():
    return '.... html page here ....'

@route('/')
def index():
    return '.... html page here ....'

Nice package, but I would largely prefer having the framework pick routable objects directly from my Python modules, like mod_python’s Publisher does. There were some other features missing from it and Bottle does not seem to be maintained any more, so I reluctantly decided not to use it.

Just out of curiosity I also tried to run Bottle within lighttpd, losing another evening in the process. lighttpd does not support WSGI, you have to install yet another middleware layer (python-flup) and run the server in FCGI mode. After a whole evening of messing around I still could not get any Hello World out of my setup and ended up tracking an obscure bug in the way lighttpd spawns sub-processes. I do not have the courage to get into that in depth.

My conclusion on lighttpd: great for serving static files, still a long way to go before it can compete with Apache. I have no doubt the lighttpd guys will eventually get there though.

WSGI stage 3: Colubrid, Werkzeug

Colubrid offers exactly the kind of thing I need: a Publisher algorithm that goes through your objects and publishes them at predictable URLs. It took me no more than an hour to transform my mod_python application to Colubrid and see it run. Documentation for this project is pretty sparse though, and it is unfortunately not maintained any more. The authors refer to Werkzeug as the tool of choice now.

Enter Werkzeug: described as a library of WSGI helpers, it tends to suffer from the same overweight issues as Django or Pylons. A lot of dependencies on other libraries and a model that is really hard to understand. I spent a couple of hours going through the tutorial and could not make sense out of it. It is probably very powerful but seems inadequate for my brain.

So Colubrid it will be. It is unmaintained but the library does not force other dependencies and even if it has little documentation I can at least understand it. If I ever face issues I will modify it to suit my needs without fear of seeing my own patches overwritten by a new version. I found a couple of bugs but no showstopper for the moment.

Wrapping it up

I learned quite a lot in the process. Python is sufficiently high-level to expect development to be quick and to the point.  And in a way, that goal is pretty much achieved. Getting a dynamic web application is just a matter of coding your business logic into classes and then hooking them into a View and a database.

On the other hand, the sheer number of dependencies for most frameworks is a definite showstopper for production. Many of these libraries are still relatively young and lack the polish needed to adapt from the package developer’s needs to your own.

Another thing I learned is that as time goes on, Python frameworks tend to become more and more complicated, to the point that there is little left for people like me who want to just have something that handles the HTTP protocol and lets you hook in the tools you need one at a time.

Oh well… Give me enough time and I might just end up writing my own.

Looks like you are writing a framework

Written by nicolas314

Thursday 27 August 2009 at 11:43 pm