Nicolas314

All my geeky stuff ends up here. Mostly Unix-related

Python web development woes

Writing a 2c web app is a lot harder than it looks.

Objective

Write a mockup prototype for a web-based application. Tools: anything you like, as long as the job is done quickly and can easily be modified to accommodate rapidly-evolving requirements. In fact, this application is meant as a living demonstration of the future full-fledged product, which will be re-programmed later with something industrial-grade. Think of it as specifications that compile and can actually be run.

As the local Python expert I thought I would demonstrate how it can quickly get the job done with minimal effort. Little did I know…

Choosing the web server

Since we are dealing with something that will essentially be web-based, choosing the appropriate web server seems the first thing to do.

I have worked with two web servers in the past: Apache and lighttpd. Apache is notoriously difficult to configure: the config file is full of traps and possible inconsistencies. There are complete books and tutorials on the Net about how to configure your own server and believe me, they are all worth consulting. Apache is a really good server but if you have never used it you’d better plan a couple of weeks in advance to learn how to use it correctly.

lighttpd (pronounced “lighty”) is a really fast and lightweight server, much easier to configure. Once you have it installed you can literally have it running within minutes. Unfortunately it does not support https client-side certificates (yet) and that feature is needed for what I want to do. One guy recently submitted a related patch but unfortunately I could not get it to work against any version of lighttpd I could find. Exit lighttpd, welcome Apache!

First attempt: Python CGI script

Python comes with a cgi module that is great for writing demo scripts but quickly becomes a serious pain whenever you want to implement anything more complicated like logins, sessions or database-related stuff. The module is really bare-bones, maybe a little too much so. After spending half a day re-coding a session mechanism I finally gave up and moved on to the next stage.
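To give an idea of what "re-coding a session mechanism" involves, here is a minimal sketch using only the standard library. It is not the code I wrote back then: the cookie name "sid", the in-memory SESSIONS store and the function names are all made up for illustration, and a real CGI script would need to persist sessions to a file or database since every request starts a fresh process.

```python
import os
import secrets
from http.cookies import SimpleCookie

# Illustration only: a real CGI script runs one process per request,
# so this dict would have to be replaced by a file or a database.
SESSIONS = {}

def get_or_create_session(environ, sessions=SESSIONS):
    """Return (session_id, session_dict, extra_header_lines) for a request."""
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get("sid")  # "sid" is a hypothetical cookie name
    if morsel is not None and morsel.value in sessions:
        # Known session: nothing new to send back to the browser.
        return morsel.value, sessions[morsel.value], []
    # Unknown or missing cookie: create a fresh session.
    sid = secrets.token_hex(16)
    sessions[sid] = {}
    return sid, sessions[sid], ["Set-Cookie: sid=%s; HttpOnly; Path=/" % sid]

def main():
    """CGI entry point: headers first, then a blank line, then the body."""
    sid, session, headers = get_or_create_session(os.environ)
    session["hits"] = session.get("hits", 0) + 1
    print("Content-Type: text/plain")
    for header in headers:
        print(header)
    print()
    print("session %s, hits=%d" % (sid, session["hits"]))
```

And this covers only the cookie round trip. Expiry, persistence and logins would all still have to be written by hand, which is exactly where the half day went.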

Second attempt: Apache + mod_python

mod_python is great! A Publisher algorithm browses through your Python files and publishes on the web server anything that looks like a string or a callable. Imagine a static server responding to these URLs:

http://[server]/hello
http://[server]/world

If you have a Python module containing two top-level strings named ‘hello’ and ‘world’, they will be published by mod_python and displayed verbatim. More interesting of course is to use callables (functions or instances) for a dynamic site.
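The idea can be sketched in a few lines of plain Python. This is not mod_python's actual implementation, just a toy version of the same principle; the publish function and the site_demo module are hypothetical names made up for the example.

```python
import types

def publish(module, path):
    """Resolve a URL path such as '/hello' against a module's top-level names.

    Returns a (status, body) pair. Strings are returned verbatim,
    callables are called and their result is converted to a string.
    """
    name = path.strip("/")
    # Refuse empty and underscore-prefixed names so internals stay private.
    if not name or name.startswith("_"):
        return "404 Not Found", ""
    target = getattr(module, name, None)
    if isinstance(target, str):
        return "200 OK", target
    if callable(target):
        return "200 OK", str(target())
    return "404 Not Found", ""

# Hypothetical stand-in for a module holding the site's pages.
site = types.ModuleType("site_demo")
site.hello = "Hello"                                # static string
site.world = lambda: "World, computed on request"   # dynamic page
```

With this in place, publish(site, "/hello") returns the string verbatim and publish(site, "/world") calls the function at request time. The real Publisher does a lot more (nested objects, request arguments, access rules), but the mapping from URL to Python name is the core of it.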

It took me a complete week to finish the site with mod_python but development was a breeze. I spent more time on my application’s business logic than on the tools themselves, which is the main reason for using tools like Python.

And then I reached a stopping point: I need to allow web clients to upload XML files to the server with an unusual MIME type. Unfortunately mod_python offers no support for this and even worse: it silently absorbs the uploaded files and does not even bother warning your application that it missed a client request. Going through the mod_python forums I found that somebody else had already mentioned this to the developers, but the feature was rejected on the grounds that if you want to do serious web stuff you should move to WSGI.

At that point I could have gone back to CGI for the file uploading stuff but I did not want to live with a schizophrenic code base, half CGI and half mod_python. Besides, I do not even want to know how much time I would have needed to make this work in the Apache configuration file. Time to leave mod_python behind and move on.

Third attempt: Python WSGI

Now I have to wade through this infamous WSGI stuff and see if it is really worth all the buzz. To make things short: WSGI is a pure Python standard that specifies how a Web framework should behave at its lowest level. The intention is to make it easier to port a WSGI-compliant application from one framework or web server to another without having to re-code anything.
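Concretely, a WSGI application is just a callable that takes the request environment and a start_response function, and returns an iterable of byte strings. The sketch below uses wsgiref from the standard library as the server; the port number and the serve helper are arbitrary choices for the example.

```python
from wsgiref.simple_server import make_server

def application(environ, start_response):
    """Smallest useful WSGI app: echo the requested path as plain text."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The body is an iterable of byte strings.
    return [body]

def serve(port=8080):
    """Run the app on the reference server (development use only)."""
    with make_server("localhost", port, application) as httpd:
        httpd.serve_forever()
```

Calling serve() and pointing a browser at http://localhost:8080/anything should show the greeting. The same application callable can in principle be handed unchanged to any other WSGI-capable server, which is the whole point of the standard.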

I read the full specifications for WSGI and I have to admit I did not really understand the motivations behind this design. But oh well, I trust the guys to have done a good job at factorizing web frameworks. The WSGI standard itself is really low-level. There is no way you can develop a web site armed with it alone: it is only meant for middleware providers. So let’s hunt for WSGI middleware!
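To show just how low-level this is, here is a sketch of the upload case that blocked me under mod_python, written against bare WSGI. Everything is done by hand: checking the method, checking the content type, reading the raw body. The MIME type and the name upload_app are invented for the example and are not the ones from my project.

```python
ACCEPTED_TYPE = "application/x-demo+xml"  # hypothetical MIME type

def upload_app(environ, start_response):
    """Accept a raw POST body of one specific content type."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "POST")])
        return [b"POST only\n"]
    # Drop parameters such as '; charset=utf-8' before comparing.
    ctype = environ.get("CONTENT_TYPE", "").split(";")[0].strip()
    if ctype != ACCEPTED_TYPE:
        start_response("415 Unsupported Media Type",
                       [("Content-Type", "text/plain")])
        return [b"unsupported content type\n"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # wsgi.input is a file-like object holding the request body.
    payload = environ["wsgi.input"].read(length)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("received %d bytes\n" % len(payload)).encode("ascii")]
```

It works, but nobody wants to write a whole site this way, hence the hunt for something on top.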

WSGI stage 1: Django, Pylons

Django and Pylons are full-fledged frameworks that come with all the bells and whistles. Nothing wrong with these but they do suffer from the same issues, namely:

  • They offer about a zillion features I do not care about
  • They cover almost everything I need, but not quite

Which means that I will probably end up deploying lots of packages I will never use and will have to code additional functionalities into their framework just to cover my own needs.

Both packages come with half a million dependencies on various additional packages, and every package means more maintenance. I spent a couple of days exploring each and came to the conclusion that they must be really great as a basis for a larger project, but I would not want to use them for this one.

Mental note: I need to train myself on these frameworks, it might come in handy some day.

WSGI stage 2: Bottle

Bottle is a lightweight WSGI environment all contained within a single file. Can’t beat that for minimal dependencies! It offers a very simple syntax to route your URLs to your objects and makes for clean code like:

from bottle import route

@route('/admin')
def administration():
    return '.... html page here ....'

@route('/')
def index():
    return '.... html page here ....'

Nice package, but I would largely prefer having the framework pick routable objects directly from my Python modules, like mod_python’s Publisher does. There were some other features missing from it and Bottle does not seem to be maintained any more, so I reluctantly decided not to use it.

Just out of curiosity I also tried to run Bottle within lighttpd, losing another evening in the process. lighttpd does not support WSGI: you have to install yet another middleware layer (python-flup) and run the server in FCGI mode. After a whole evening of messing around I still could not get any Hello World out of my setup and ended up tracking an obscure bug in the way lighttpd spawns sub-processes. I do not have the courage to get into that in depth.

My conclusion on lighttpd: great for serving static files, still a long way to go before it can compete with Apache. I have no doubt the lighttpd guys will eventually get there though.

WSGI stage 3: Colubrid, Werkzeug

Colubrid offers exactly the kind of thing I need: a Publisher algorithm that goes through your objects and publishes them at predictable URLs. It took me no more than an hour to transform my mod_python application to Colubrid and see it run. Documentation for this project is pretty sparse though, and it is unfortunately not maintained any more. The authors refer to Werkzeug as the tool of choice now.

Enter Werkzeug: described as a library of WSGI helpers, it tends to suffer from the same overweight issues as Django or Pylons. A lot of dependencies on other libraries and a model that is really hard to understand. I spent a couple of hours going through the tutorial and could not make sense of it. It is probably very powerful but seems inadequate for my brain.

So Colubrid it will be. It is unmaintained but the library does not force other dependencies and even if it has little documentation I can at least understand it. If I ever face issues I will modify it to suit my needs without fear of seeing my own patches overwritten by a new version. I found a couple of bugs but no showstopper for the moment.

Wrapping it up

I learned quite a lot in the process. Python is sufficiently high-level to expect development to be quick and to the point.  And in a way, that goal is pretty much achieved. Getting a dynamic web application is just a matter of coding your business logic into classes and then hooking them into a View and a database.

On the other hand, the sheer number of dependencies for most frameworks is a definite showstopper for production. Many of these libraries are still relatively young and lack the polish needed to adapt from the package developer’s needs to your own.

Another thing I learned is that as time goes on, Python frameworks tend to become more and more complicated, to the point that there is little left for people like me who want to just have something that handles the HTTP protocol and lets you hook in the tools you need one at a time.

Oh well… Give me enough time and I might just end up writing my own.

Looks like you are writing a framework


Written by nicolas314

Thursday 27 August 2009 at 11:43 pm

6 Responses

  1. You should have tried CherryPy. Comes with everything you need, including a fast HTTP Server.

    Here’s what hello world looks like with CherryPy:
    http://en.wikipedia.org/wiki/CherryPy#Pythonic_interface

    Once you have the CherryPy module installed, nothing else is required to run the above program.

    Seun (Nairalist)

    Monday 31 August 2009 at 3:18 am

    • Actually I gave CherryPy a try just after I posted this. I just could not use it in HTTPS mode with a self-signed certificate. Took me a whole evening but I could not figure out what is going wrong. Other than that the framework looks really neat.

      nicolas314

      Monday 31 August 2009 at 9:44 am

  2. Actually, bottle is in active development. I don’t know how you got the impression it is not maintained anymore.

    Oh, and you don’t need lighttpd at all. The easiest way would be to use Bottle’s built-in HTTP server or grab one of the supported server modules (flup, paste, fapws3, cherrypy) and just start it. If you really want to run Bottle behind a regular HTTP server, start Bottle on a custom port (8080) and use a proxy module (mod_proxy for lighttpd) to redirect the HTTP requests to Bottle. That’s just as fast as SCGI or FCGI but doesn’t require an additional layer of complexity.

    There are some other deployment options described here: http://bottle.paws.de/page/docs#deployment

    I hope that helped :)

    Marcel Hellkamp

    Friday 16 October 2009 at 9:00 pm

    • Hi Marcel:
      Good to hear Bottle is still actively maintained! I stand corrected.
      Running your web site on a Python-based web server is a very appreciated bonus during development, but I’d rather have something more industrial-grade for production. Also: you cannot run an https server behind an http proxy.
      Many thanks for these helpful comments!

      nicolas314

      Friday 16 October 2009 at 9:52 pm

      • To scale well, I recommend two or more Bottle processes using fapws3 (a high performance asynchronous HTTP/WSGI server) running on different ports and a load-balancing proxy (pound or lighttpd+mod_proxy) in front of them. Actually, I recommend this setup for all WSGI frameworks out there. Relying on a single Python process, even if it uses threads, simply wastes the power of multicore servers (Google for GIL: a single Python process cannot utilize more than one core at a time).

        As an alternative, you can use Apache+mod_wsgi to achieve the same.

        Be aware that Colubrid (your framework of choice) uses Paste as its HTTP backend, which is implemented in Python. Paste is not a bad choice (Bottle supports it too), but it has the same limits as any other HTTP server implemented in Python (GIL). As soon as you want to scale a WSGI app, you have to take the multiple-process approach, regardless of the framework used.

        As for HTTPS: You should never need to run an HTTPS server behind a load-balancing proxy, because the proxy should handle the “S” part of HTTPS and just forward HTTP.

        Marcel Hellkamp

        Friday 16 October 2009 at 10:25 pm

      • As a side note: I used pound in a production environment for about a year and it only brought me sorrow, but that was before version 1.0 was released and judged production-ready by its authors. I ended up replacing it with apache+mod_proxy, but that was for another project with very different needs than the ones I am describing in this blog entry.

        A few weeks after having written the above entry I completely switched to CherryPy (the move was relatively trivial) and I cannot complain. Web front-end is served by Apache+WSGI, which is heavier than what I would have wanted but handles HTTPS just fine.

        Interestingly: I really care about server+application dealing with HTTPS itself, since it is completely secured (authenticated) by client-side https certificates. I need to get the full SSL information and a proxy would just eat it all up. Sometimes an HTTPS server is more than just serving data over an encrypted link :-)
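To make this concrete, here is a rough sketch of what the application side can look like, assuming Apache with mod_ssl is configured to export its standard SSL variables (SSLOptions +StdEnvVars) and that they reach the WSGI environ. The function names are invented for the example, and this is not my production code.

```python
def client_identity(environ):
    """Return the client certificate subject DN, or None if not verified.

    SSL_CLIENT_VERIFY and SSL_CLIENT_S_DN are mod_ssl variables; whether
    they show up in environ depends on the server configuration.
    """
    if environ.get("SSL_CLIENT_VERIFY") != "SUCCESS":
        return None
    return environ.get("SSL_CLIENT_S_DN")

def cert_app(environ, start_response):
    """Refuse any request that did not present a verified certificate."""
    subject = client_identity(environ)
    if subject is None:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"client certificate required\n"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [("hello %s\n" % subject).encode("utf-8")]
```

With a plain HTTP proxy in front, none of these variables would reach the application unless the proxy is set up to forward them some other way.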

        nicolas314

        Friday 16 October 2009 at 11:05 pm

