Deploying your Django application with mongrel2 and wsgid

Some time ago, when people talked about web app deployment, they were probably talking about the LAMP stack. Since the appearance of the nginx web server this has changed quite a bit. In this post you will get to know another stack, one that uses neither Apache nor nginx but is equally interesting and quite scalable. We will be talking about mongrel2 for the front-end and wsgid for the back-end worker processes.

TL;DR

In this post you will learn how to configure mongrel2 and wsgid to run your WSGI application. If you are interested in web servers, web application deployment and system administration, read on!

Mongrel2

Mongrel2 is a different kind of web server, at least different from every web server I had seen before I found it. The difference lies in the fact that mongrel2 is language agnostic: it does not matter which language your application is written in. You will understand why very soon, so keep reading!

Mongrel2 decouples the application from the web server in a very simple yet powerful and intelligent way: it uses a high-performance message queue, zeromq, to talk to your application. So all your application needs in order to have mongrel2 as its front-end is the ability to speak zeromq. Fortunately there are already 30+ zeromq bindings, so if your language of choice has one, you are ready to plug your app into the mongrel2 infrastructure.

This decoupling means that your app and mongrel2 run in different operating system processes; in fact, nothing prevents them from running on different machines. To accomplish this, mongrel2 uses a pair of zeromq sockets to communicate with back-end handlers: one to push messages to the handlers and another to receive their responses. And why is this great? Because the zeromq socket type mongrel2 uses to push messages (PUSH/PULL) comes with round-robin load balancing for free. This means that if you have 32 handlers connected to this socket, each one of them will receive one message at a time, in round-robin fashion. Cool, isn't it?
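To make the round-robin idea concrete, here is a small pure-Python simulation. It does not use zeromq at all; it only models how a PUSH socket spreads messages over its connected PULL peers, assuming every handler is always ready to receive (a real PUSH socket skips peers whose queues are full). The handler names are made up for the example.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, handlers):
    """Assign each request to the next handler in turn (round robin).

    This only models what a zeromq PUSH socket does when all connected
    PULL peers are ready; it is a simulation, not zeromq itself.
    """
    rotation = cycle(handlers)
    return [(request, next(rotation)) for request in requests]


if __name__ == "__main__":
    # 32 hypothetical handlers, as in the example above.
    handlers = [f"handler-{n}" for n in range(32)]
    assignments = distribute(range(320), handlers)
    per_handler = Counter(handler for _, handler in assignments)
    # Every handler receives the same share of the 320 requests.
    print(set(per_handler.values()))  # {10}
```

With zeromq the rotation list is simply "whoever is connected right now", which is why handlers can join or leave without any front-end configuration.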

Another big advantage of this load balancing is that it differs from other load balancing setups, and I will explain why. Usually a load balancing setup uses the proxy technique. There is nothing wrong with that, but you must know beforehand the IP:PORT of all your back-end servers, the servers you will proxy requests to. It works like this: the front-end receives a new request, connects to one of the back-end servers and delivers the request it just received. If you want to add a new back-end server, you need to edit your front-end config, introduce the new server to it and tell your front-end to reload.

With mongrel2 you don't need to do this. Yes, you don't! That's because of how zeromq works. What mongrel2 does is open a zeromq PUSH socket on localhost (the address is chosen in the mongrel2 configuration, more on this later), and here is the magic: the back-end workers are the ones that connect to this socket. So it is the back-end handler, not the front-end, that needs to know where to connect! See? Because of this inversion you can add or remove handlers without mongrel2 even knowing you did it, without touching any config files and, best of all, without restarting mongrel2 or reloading its configuration. This was the feature that blew my mind, by the way.

You may be thinking: “Alright, so I have to write my own mongrel2 handler to run my apps on it?” The answer is yes and no. If you want to run Python WSGI applications with mongrel2, you won't have to write a handler; in fact, your WSGI app won't even need to know it is running under mongrel2. That's where wsgid comes to the rescue (more on this later): it is the bridge between mongrel2 and your WSGI app.

So let’s see how to configure our mongrel2 instance and then we will see how to use wsgid to run our web app.

Configuration files

I said files, but mongrel2 actually reads its configuration from a database. Yes, from a database! Breathe… still with me? Alright! At first you may think: “What? From a database? WTF!? This is crazy!” But when you think it through, you start to see some advantages. The most important one is that it becomes absolutely easy to change any config value you want: all you need is a sqlite client. Even better, you can build your own tool to manage this config database (no need to write any kind of file parser). Note: mongrel2 already has internal infrastructure to load its config values from anywhere; see more at this blog post.
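As an illustration of building your own tool, here is a minimal sketch that uses Python's built-in sqlite3 module to change a server's listening port. It assumes the server table schema shown later in this post (the demo creates only the columns it needs), and the function name set_server_port is hypothetical. Mongrel2 only picks up the change after a restart or a SIGHUP.

```python
import sqlite3


def set_server_port(conn, uuid, port):
    """Change the port of the server identified by uuid; return rows updated."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE server SET port = ? WHERE uuid = ?", (port, uuid)
        )
    return cur.rowcount


if __name__ == "__main__":
    # In a real deployment you would connect to config.sqlite instead.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE server (id INTEGER PRIMARY KEY, uuid TEXT, port INTEGER)"
    )
    conn.execute(
        "INSERT INTO server (uuid, port) VALUES (?, ?)",
        ("f400bf85-4538-4f7a-8908-67e313d515c2", 80),
    )
    set_server_port(conn, "f400bf85-4538-4f7a-8908-67e313d515c2", 8080)
    print(conn.execute("SELECT port FROM server").fetchone())  # (8080,)
```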

One disadvantage you probably thought of is: “It is far from optimal to read config values from a database.” You are not completely wrong, but there is one thing that changes it all: mongrel2 does not keep reading the database while it runs. It reads the config only when starting up, so what you end up with is one read from the database versus one read/parse of config files. Maybe a few additional milliseconds to read the database, but honestly, not that bad, right? Mongrel2 also has a hot reload feature: just send it a SIGHUP and all config values will be reloaded on the next request.
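If you want to script the hot reload, it boils down to reading the PID from mongrel2's pid file and sending SIGHUP. Here is a minimal POSIX-only sketch; the helper name is hypothetical. It assumes the pid_file value from the server table is resolved inside the chroot, so with the settings shown later (chroot /var/mongrel2, pid_file /run/mongrel2.pid) the path on the host would be /var/mongrel2/run/mongrel2.pid.

```python
import os
import signal


def reload_mongrel2(pid_file):
    """Send SIGHUP to the process whose PID is stored in pid_file.

    Returns the PID that was signalled. Requires permission to signal
    that process (typically the same user, or root).
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)
    return pid
```

Usage would be something like reload_mongrel2("/var/mongrel2/run/mongrel2.pid"), after which the new config values apply from the next request on.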

Installing mongrel2

Installation is pretty standard; as always, it's a matter of:

$ make
# make install

If you happen to use Gentoo Linux, there is an overlay that contains mongrel2 ebuilds: https://github.com/daltonmatos/gentoo-overlay. Just add it to your Gentoo installation and you will be able to run emerge mongrel2 to install it. If you are installing manually, remember to also install zeromq and sqlite3.

Configuring mongrel2

As said before, mongrel2 uses a database to store its configuration, so we need a database schema to start with. Mongrel2 ships with a tool named m2sh that can build the database schema for the first time. There are some example config files that m2sh understands inside the mongrel2 source code, in examples/configs. All you have to do is:

$ m2sh load -config <your-config-file.conf>

This will build a file named config.sqlite. That's your database! Rather than learning the syntax that m2sh understands (if we worked with the files m2sh understands, we would lose all the advantages of having a database as the source of our settings), we will focus on the database it creates for us. You don't always need m2sh: mongrel2 comes with an SQL file that re-creates the schema for you, src/config/config.sql, which can be seen here: https://github.com/zedshaw/mongrel2/blob/master/src/config/config.sql. So let's see what we get from the m2sh load command:

$ sqlite3 config.sqlite 
SQLite version 3.7.7.1 2011-06-28 17:39:05
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
directory  host       mimetype   route      setting  
handler    log        proxy      server     statistic
sqlite>

So here are the tables mongrel2 needs to store all your configuration. The tables we will see are:

  • server – Stores all our servers, including the port each one listens on;
  • host – Here are all virtual hosts, each host is attached to one server;
  • route – Here are our routes, each route is attached to one host and to one target (more on this later) that can be a handler, proxy or directory;
  • handler – Here we define our backends;
  • directory – Here are all the directories that contain static data.

The server table

Here is the schema of this table:

sqlite> .schema server
CREATE TABLE server (id INTEGER PRIMARY KEY,
    uuid TEXT,
    access_log TEXT,
    error_log TEXT,
    chroot TEXT DEFAULT '/var/www',
    pid_file TEXT,
    default_host TEXT,
    name TEXT DEFAULT '',
    bind_addr TEXT DEFAULT "0.0.0.0",
    port INTEGER,
    use_ssl INTEGER default 0);
sqlite>

And here is some data from my personal deployment:

sqlite> select * from server;
          id = 1
        uuid = f400bf85-4538-4f7a-8908-67e313d515c2
  access_log = /logs/access.log
   error_log = /logs/error.log
      chroot = /var/mongrel2
    pid_file = /run/mongrel2.pid
        name = mainserver
   bind_addr = 0.0.0.0
        port = 80
     use_ssl = 0
default_host = wsgid.com
sqlite>

I used one value per line for clarity. So here we have a typical server deployed on port 80. One important thing to note is the chroot field: mongrel2 always chroots to the specified location before it starts serving requests. For example, all paths in the directory table are relative to this path.

The host table

Here is the schema:

sqlite> .schema host
CREATE TABLE host (id INTEGER PRIMARY KEY, 
    server_id INTEGER,
    maintenance BOOLEAN DEFAULT 0,
    name TEXT,
    matching TEXT);
sqlite>

The important parts are: server_id is the server this host belongs to; name is just a name for this host, which you use to refer to it inside your configuration; and matching is what mongrel2 uses to match the Host header of the HTTP request. You can use some regular-expression-style patterns here; see the mongrel2 docs for more.

Here is some data from the same deployment mentioned above:

sqlite> select * from host where name in ('daltonmatos.com', 'wsgid.com');
         id = 4
  server_id = 1
maintenance = 0
       name = wsgid.com
   matching = wsgid.com

         id = 6
  server_id = 1
maintenance = 0
       name = daltonmatos.com
   matching = daltonmatos.com
sqlite>

Here we have two deployed hosts. At this point mongrel2 is still not useful, because we are not serving any content yet. Now we start to configure our back-ends.

The route table

Here is the schema:

sqlite> .schema route
CREATE TABLE route (id INTEGER PRIMARY KEY,
    path TEXT,
    reversed BOOLEAN DEFAULT 0,
    host_id INTEGER,
    target_id INTEGER,
    target_type TEXT);
sqlite>

Here we put all the routes we want mongrel2 to respond to, and attach each of them to a handler, proxy or directory. In this post we will only cover handler and directory back-ends. Here is some real data:

sqlite> select * from route where host_id = 6;
         id = 12
       path = /static
   reversed = 0
    host_id = 6
  target_id = 5
target_type = dir

         id = 13
       path = /
   reversed = 0
    host_id = 6
  target_id = 2
target_type = handler

These two routes are attached to my personal website host, daltonmatos.com, and serve, respectively, the static content and the main application (a Django app).
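To picture how a request path picks one of these routes, here is a simplified model: choose the route whose path is the longest prefix of the request path. This is an approximation for illustration only; mongrel2's real router also supports patterns (see the mongrel2 manual). The two rows below mirror the routes for host_id = 6 above.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    path: str
    target_type: str
    target_id: int


# The two routes attached to host_id = 6 in the table above.
ROUTES = [
    Route("/static", "dir", 5),
    Route("/", "handler", 2),
]


def match_route(request_path: str, routes=ROUTES) -> Optional[Route]:
    """Return the route with the longest path that prefixes request_path."""
    candidates = [r for r in routes if request_path.startswith(r.path)]
    return max(candidates, key=lambda r: len(r.path), default=None)


if __name__ == "__main__":
    print(match_route("/static/css/site.css"))  # Route(path='/static', target_type='dir', target_id=5)
    print(match_route("/blog/"))  # Route(path='/', target_type='handler', target_id=2)
```

The target_type/target_id pair then tells mongrel2 which table (directory or handler) holds the actual back-end.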

The directory table, for static serving

Here is the schema:

sqlite> .schema directory
CREATE TABLE directory (id INTEGER PRIMARY KEY,
   base TEXT,   
   index_file TEXT,
   default_ctype TEXT,
   cache_ttl INTEGER DEFAULT 0);
sqlite>

And here is some real data from my deploy:

sqlite> select * from directory where id = 5;
           id = 5
         base = apps/daltonmatos.com/app/daltonmatosdotcom/static/
   index_file = index.html
default_ctype = text/html
    cache_ttl = 0
sqlite>

This is the target the /static route is attached to. Remember that the base path is always relative to the chroot path you chose when creating your server.
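To see what "relative to the chroot" means in practice, this sketch computes where a directory.base value lives on the host filesystem, using the chroot from the server table. It is an illustration, not mongrel2's code, and the helper name is hypothetical; it also rejects bases that would climb out of the chroot.

```python
import posixpath


def directory_on_host(chroot: str, base: str) -> str:
    """Return the host path of a directory.base value, given the server chroot."""
    root = posixpath.normpath(chroot)
    # base is relative to the chroot, so join the two and normalize the result.
    path = posixpath.normpath(posixpath.join(root, base.lstrip("/")))
    # Reject values such as "../etc/" that would point outside the chroot.
    prefix = root if root.endswith("/") else root + "/"
    if path != root and not path.startswith(prefix):
        raise ValueError(f"{base!r} escapes the chroot {chroot!r}")
    return path.rstrip("/") + "/"


if __name__ == "__main__":
    print(directory_on_host(
        "/var/mongrel2", "apps/daltonmatos.com/app/daltonmatosdotcom/static/"
    ))  # /var/mongrel2/apps/daltonmatos.com/app/daltonmatosdotcom/static/
```

So the files for /static must be placed under /var/mongrel2/apps/... on the real filesystem, not under /apps/....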

The handler table

Here is the schema:

sqlite> .schema handler
CREATE TABLE handler (id INTEGER PRIMARY KEY,
    send_spec TEXT, 
    send_ident TEXT,
    recv_spec TEXT,
    recv_ident TEXT,
   raw_payload INTEGER DEFAULT 0, protocol TEXT default 'json');
sqlite>

And the data for the daltonmatos.com host:

sqlite> select * from handler where id = 2;
         id = 2
  send_spec = tcp://127.0.0.1:5002
 send_ident = 35353f12-6a2a-11e0-b898-001fe149503a
  recv_spec = tcp://127.0.0.1:5003
 recv_ident = 35ad3ab2-6a2a-11e0-b898-001fe149503a
raw_payload = 0
   protocol = json
sqlite>

Here is the magic behind mongrel2. See these two IP:PORT configs? These are the two sockets I talked about at the beginning of this post. The send_spec is the socket mongrel2 uses to dispatch messages to the handlers; the recv_spec is the socket it uses to receive the responses.

This is how mongrel2 decouples itself from your application. Since it talks to your app across the network, it does not matter which language you write your apps in, as long as they speak the mongrel2 protocol. And that's where wsgid enters the story. Lucky you: none of your existing Django apps need to know about the mongrel2 protocol, because wsgid speaks both the mongrel2 protocol and WSGI. Let's see how we can use wsgid to run our apps with mongrel2 as the front-end.

So, long story short: for every request that arrives at the root of daltonmatos.com, mongrel2 parses the HTTP request and dispatches it to this send_spec socket, and from there until the response comes back, it's all zeromq machinery. To learn more about how mongrel2 handles these two sockets, see the official docs.
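For the curious, here is roughly what travels over those two sockets, based on the handler protocol described in the mongrel2 manual: a request arrives as `UUID ID PATH SIZE:HEADERS,SIZE:BODY,` (with protocol = json the headers netstring holds a JSON object), and a handler replies with `UUID SIZE:ID ID ..., BODY`. The sketch below parses and builds those strings in plain Python without zeromq; it is a simplified illustration, not wsgid's code, and the sample message and sender UUID are made up.

```python
import json
from typing import NamedTuple


class Request(NamedTuple):
    sender: str    # UUID identifying the mongrel2 side of the conversation
    conn_id: str   # connection id, needed to address the reply
    path: str
    headers: dict
    body: bytes


def parse_netstring(data: bytes):
    """Split b'SIZE:PAYLOAD,REST' into (PAYLOAD, REST)."""
    size, rest = data.split(b":", 1)
    size = int(size)
    if rest[size:size + 1] != b",":
        raise ValueError("netstring does not end with ','")
    return rest[:size], rest[size + 1:]


def parse_request(msg: bytes) -> Request:
    """Parse 'UUID ID PATH SIZE:HEADERS,SIZE:BODY,' with JSON headers."""
    sender, conn_id, path, rest = msg.split(b" ", 3)
    raw_headers, rest = parse_netstring(rest)
    body, _ = parse_netstring(rest)
    return Request(sender.decode(), conn_id.decode(), path.decode(),
                   json.loads(raw_headers), body)


def build_response(sender: str, conn_ids, body: bytes) -> bytes:
    """Build 'UUID SIZE:ID ID ..., BODY' addressed to the given connections."""
    idents = " ".join(str(c) for c in conn_ids).encode()
    return (sender.encode() + b" " + str(len(idents)).encode() + b":"
            + idents + b", " + body)


if __name__ == "__main__":
    headers = json.dumps({"METHOD": "GET", "PATH": "/"}).encode()
    msg = b"SENDER-UUID 42 / " + b"%d:%s," % (len(headers), headers) + b"0:,"
    req = parse_request(msg)
    print(req.conn_id, req.headers["METHOD"])  # 42 GET
    print(build_response(req.sender, [req.conn_id], b"hi"))  # b'SENDER-UUID 2:42, hi'
```

In a real handler the body passed to build_response would be a complete raw HTTP response, status line and headers included.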

Starting your server

Now all you have to do is call mongrel2 on the command line to start it up. Since the same config database may hold multiple servers, we must choose which one to start; we do this by passing the server's uuid on the command line, like this:

# mongrel2 <config.sqlite> <uuid>

This starts the server with uuid = <uuid>, reading from the config database passed as the first argument. From then on, mongrel2 is ready to receive requests.
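If you would rather not copy UUIDs around by hand, a small helper can look the UUID up by server name in the config database and build the command line shown above. This is a sketch using Python's sqlite3 module; the helper name and the idea of selecting servers by their name column are assumptions of this example.

```python
import sqlite3


def mongrel2_command(db_path: str, server_name: str):
    """Return the argv that starts the server called server_name."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT uuid FROM server WHERE name = ?", (server_name,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise LookupError(f"no server named {server_name!r} in {db_path}")
    # Same shape as: mongrel2 <config.sqlite> <uuid>
    return ["mongrel2", db_path, row[0]]
```

With the data shown earlier, mongrel2_command("config.sqlite", "mainserver") would return ["mongrel2", "config.sqlite", "f400bf85-4538-4f7a-8908-67e313d515c2"], which you could pass to subprocess.run as root.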

wsgid

Now that you know about mongrel2 handlers, you can think of wsgid as a generic WSGI handler for mongrel2. With it you can run any WSGI app with mongrel2 as the front-end.
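To get a feeling for what a generic WSGI handler does, here is a heavily simplified sketch of the translation step: it turns the headers dict mongrel2 sends into a PEP 3333 environ, calls the WSGI app and collects the response. This is not wsgid's actual code. It assumes mongrel2 adds the uppercase keys METHOD, VERSION, URI, PATH, QUERY and PATTERN next to the HTTP headers, and SERVER_NAME/SERVER_PORT are placeholders.

```python
import io
import sys

# Keys mongrel2 adds that are not real HTTP headers (assumption, see lead-in).
MONGREL2_KEYS = {"METHOD", "VERSION", "URI", "PATH", "QUERY", "PATTERN"}


def build_environ(headers: dict, body: bytes) -> dict:
    """Translate mongrel2 request headers into a minimal WSGI environ."""
    environ = {
        "REQUEST_METHOD": headers.get("METHOD", "GET"),
        "SCRIPT_NAME": "",
        "PATH_INFO": headers.get("PATH", "/"),
        "QUERY_STRING": headers.get("QUERY", ""),
        "SERVER_PROTOCOL": headers.get("VERSION", "HTTP/1.1"),
        "SERVER_NAME": "localhost",  # placeholder
        "SERVER_PORT": "80",         # placeholder
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": True,
        "wsgi.run_once": False,
    }
    for name, value in headers.items():
        if name in MONGREL2_KEYS:
            continue
        key = name.upper().replace("-", "_")
        if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            environ[key] = value
        else:
            environ["HTTP_" + key] = value
    return environ


def call_app(app, headers: dict, body: bytes = b""):
    """Run a WSGI app and return (status, response_headers, body_bytes)."""
    captured = {}
    chunks = []

    def start_response(status, response_headers, exc_info=None):
        # Nothing is sent until the app finishes, so replacing is allowed.
        captured["status"] = status
        captured["headers"] = response_headers
        return chunks.append  # legacy write() callable

    result = app(build_environ(headers, body), start_response)
    try:
        for chunk in result:
            chunks.append(chunk)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], b"".join(chunks)
```

The status, headers and body returned by call_app are what a handler would then serialize into an HTTP response and send back on the recv_spec socket.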

Installing Wsgid

You can install wsgid by downloading it from the official website: http://wsgid.com. After downloading, extract it to a location of your choice and, from inside that folder, run:

# python setup.py install

To check your installation, run:

$ wsgid --version

and you should see the current installed version.

Creating a wsgid app folder

Since wsgid runs your app as a *nix daemon, it needs a special location with some specific folders inside. Let's call it the application folder (app folder). This is where you will put the source code of your Django app. To create a new app folder, run:

$ wsgid init --app-path=/path/to/my/app/folder

Wsgid will create this folder if it does not exist and initialize all the needed subfolders for you. In this example, let's use /tmp/myapp. An initialized folder looks like this:

daltonmatos@jetta /tmp/myapp [13]$ ls
app  logs  pid
daltonmatos@jetta /tmp/myapp [14]$

Deploying your Django app

All of our app's source code lives inside the app/ folder. So we just need to cd to /tmp/myapp/app and create a brand new Django project:

daltonmatos@jetta /tmp/myapp/app [16]$ django-admin.py startproject myproj
daltonmatos@jetta /tmp/myapp/app [17]$ ls
myproj
daltonmatos@jetta /tmp/myapp/app [18]$

OK, so this is a regular Django project. Of course, when deploying a Django project you have already written, you just copy its project folder to this location; the effect is exactly the same.

Starting wsgid instance to run your django app

Now it's time to start wsgid so we can send requests to our app. Remember the two sockets we saw earlier in this post? Remember that it is the back-end handler that needs to know where to connect? Here is how we do it.

Let’s use the same handlers we saw earlier:

  • send_spec = tcp://127.0.0.1:5002
  • recv_spec = tcp://127.0.0.1:5003

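Here is how we pass them to wsgid. Note that the option names are from wsgid's point of view: mongrel2's send_spec is where wsgid receives requests (--recv), and mongrel2's recv_spec is where wsgid sends responses (--send):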

$ wsgid --app-path=/tmp/myapp --recv=tcp://127.0.0.1:5002 --send=tcp://127.0.0.1:5003 --workers=4

Now we have one wsgid instance running with four workers, calling our Django app whenever a new request arrives. You can confirm that wsgid started successfully by looking at the logs:

$ tail -f /tmp/myapp/logs/wsgid.log
2011-11-06 18:06:56,115 - wsgid [pid=27900] - INFO - Master process started
2011-11-06 18:06:56,116 - wsgid [pid=27900] - INFO - New wsgid worker created pid=27903
2011-11-06 18:06:56,117 - wsgid [pid=27900] - INFO - New wsgid worker created pid=27904
2011-11-06 18:06:56,118 - wsgid [pid=27900] - INFO - New wsgid worker created pid=27905
2011-11-06 18:06:56,119 - wsgid [pid=27900] - INFO - New wsgid worker created pid=27906
2011-11-06 18:06:56,145 - wsgid [pid=27903] - INFO - Using AppLoader: DjangoAppLoader
2011-11-06 18:06:56,145 - wsgid [pid=27904] - INFO - Using AppLoader: DjangoAppLoader
2011-11-06 18:06:56,150 - wsgid [pid=27905] - INFO - Using AppLoader: DjangoAppLoader
2011-11-06 18:06:56,168 - wsgid [pid=27906] - INFO - Using AppLoader: DjangoAppLoader
2011-11-06 18:06:56,223 - wsgid [pid=27903] - INFO - All set, ready to serve requests...
2011-11-06 18:06:56,228 - wsgid [pid=27904] - INFO - All set, ready to serve requests...
2011-11-06 18:06:56,231 - wsgid [pid=27905] - INFO - All set, ready to serve requests...
2011-11-06 18:06:56,238 - wsgid [pid=27906] - INFO - All set, ready to serve requests...

Wsgid automatically detects what kind of application it is loading and uses the appropriate AppLoader. Assuming you have a host matching localhost in your host table, routed to this handler, just point your browser at http://localhost to see your Django application running.

daltonmatos@jetta ~ [2]$ curl -i http://localhost/
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2051

daltonmatos@jetta ~ [3]$

To learn more about wsgid's command line options, try wsgid --help.
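Because wsgid's --recv corresponds to mongrel2's send_spec and its --send to mongrel2's recv_spec, the two addresses are easy to swap by mistake. This sketch derives the wsgid command line directly from the handler row in config.sqlite. The helper name is hypothetical; it only emits the options used in this post (--app-path, --recv, --send, --workers).

```python
import sqlite3


def wsgid_command(db_path: str, handler_id: int, app_path: str, workers: int = 4):
    """Build a wsgid argv from the handler row with the given id."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT send_spec, recv_spec FROM handler WHERE id = ?", (handler_id,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise LookupError(f"no handler with id {handler_id}")
    send_spec, recv_spec = row
    return [
        "wsgid",
        f"--app-path={app_path}",
        # mongrel2 pushes requests on send_spec, so wsgid receives there...
        f"--recv={send_spec}",
        # ...and mongrel2 reads replies on recv_spec, so wsgid sends there.
        f"--send={recv_spec}",
        f"--workers={workers}",
    ]
```

For handler 2 above, wsgid_command("config.sqlite", 2, "/tmp/myapp") reproduces the command line used earlier in this section.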

Using wsgid config files

You don't need to type a long command line full of arguments every time you want to start a new wsgid instance. You can tell wsgid to create a simple JSON config file inside your app folder: just add config as the first command line argument:

$ wsgid config --app-path=/tmp/myapp --recv=tcp://127.0.0.1:5002 --send=tcp://127.0.0.1:5003 --workers=4

Wsgid has now created /tmp/myapp/wsgid.json with all these options:

daltonmatos@jetta /tmp/myapp [41]$ cat wsgid.json 
{
  "debug": "False",
  "workers": 4,
  "keep_alive": "True",
  "recv": "tcp://127.0.0.1:5002",
  "send": "tcp://127.0.0.1:5003"
}
daltonmatos@jetta /tmp/myapp [42]$

From now on you can start this same instance with just wsgid --app-path=/tmp/myapp. If you want to change any value in the config file, just re-run the command with the new values and wsgid will update the config file accordingly.
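Notice that debug and keep_alive are stored in wsgid.json as the strings "False" and "True" rather than JSON booleans. If you build your own tooling around this file, it is worth normalizing them. Here is a small sketch; the helper is hypothetical, only reads the file, and says nothing about how wsgid itself interprets these values.

```python
import json


def load_wsgid_config(path: str) -> dict:
    """Read wsgid.json, turning "True"/"False" strings into real booleans."""
    with open(path) as f:
        config = json.load(f)
    for key, value in list(config.items()):
        if value in ("True", "False"):
            config[key] = value == "True"
    return config
```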

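Now if you want to run more instances of your app on different machines (assuming you have a copy of your app there), keep in mind that mongrel2 must bind its handler sockets to an address those machines can reach: the tcp://127.0.0.1 specs used so far only accept local connections. Then all you have to do is log into the other machine and run: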

$ wsgid --app-path=/tmp/myapp --recv=tcp://MONGREL2_IP:5002 --send=tcp://MONGREL2_IP:5003 --workers=4

And you are done! No need to tell mongrel2 about it. Just run this and you will have 4 more processes added to the zeromq round robin load balancing logic.

Wsgid has other useful command line options, such as wsgid reload to reload your app code on the fly and wsgid status to show the PIDs of all your processes, among many more. See all of them in the official docs: http://wsgid.com/docs.

Final words

So what you just saw is another way to deploy and scale your app. There is nothing brand new here, just a well-known technique applied with smart tools to make your life a lot easier! I've been using this setup for a couple of months now and I'm absolutely happy with it. I even ran a blitz.io rush against wsgid.com, and mongrel2+wsgid handled all the requests pretty well!

So let me know what you think! Give this setup a try and share your thoughts. What can we do to make it even better?

Thanks for reading!


