Friday, December 25, 2009

python3.2 (svn), python2.7 (svn), and the new GIL, with cherrypy as a test.

I've done some quick benchmarks of the unreleased python3.2 running the unreleased cherrypy webserver for python 3, and also of the unreleased python2.7. Results for python3.1 and python2.6 are included for comparison.

Here are the results...


python3.1 - Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
25 | 1000.0 | 0.0 | 533.32 | 1.875 | 93.75 |
50 | 1000.0 | 0.0 | 525.86 | 1.902 | 92.69 |
100 | 1000.0 | 0.0 | 522.96 | 1.912 | 92.1 |
200 | 1000.0 | 0.0 | 523.83 | 1.909 | 92.25 |
400 | 1000.0 | 0.0 | 506.92 | 1.973 | 89.27 |
Average | 1000.0 | 0.0 | 522.578 | 1.9142 | 92.012 |


python3.2 (new GIL) - Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
25 | 1000.0 | 0.0 | 555.72 | 1.799 | 97.78 |
50 | 1000.0 | 0.0 | 558.86 | 1.789 | 98.52 |
100 | 1000.0 | 0.0 | 552.87 | 1.809 | 97.45 |
200 | 1000.0 | 0.0 | 546.09 | 1.831 | 96.27 |
400 | 1000.0 | 0.0 | 548.64 | 1.823 | 96.53 |
Average | 1000.0 | 0.0 | 552.436 | 1.8102 | 97.31 |

So here you can see a small improvement at 400 threads with the new GIL in python3.2.

Python 3.2 threads seem more scalable in this benchmark compared to python 3.1, and faster overall (20-40 requests per second faster).


Python2.6 still beats python3.2 in these cherrypy benchmarks. They are network IO heavy and string processing heavy, with heavy use of python threads - so an ok benchmark for the new GIL work, in my opinion.

Note that both python3.2, and cherrypy for python 3 are not released.
python2.6 - Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
25 | 1000.0 | 0.0 | 660.54 | 1.514 | 116.43 |
50 | 1000.0 | 0.0 | 671.01 | 1.49 | 118.28 |
100 | 1000.0 | 0.0 | 663.84 | 1.506 | 117.12 |
200 | 1000.0 | 0.0 | 664.85 | 1.504 | 117.19 |
400 | 1000.0 | 0.0 | 651.9 | 1.534 | 114.8 |
Average | 1000.0 | 0.0 | 662.428 | 1.5096 | 116.764 |


Python2.7 is faster still.
python2.7 - Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
25 | 1000.0 | 0.0 | 695.33 | 1.438 | 122.79 |
50 | 1000.0 | 0.0 | 684.6 | 1.461 | 121.12 |
100 | 1000.0 | 0.0 | 688.99 | 1.451 | 121.67 |
200 | 1000.0 | 0.0 | 682.94 | 1.464 | 120.49 |
400 | 1000.0 | 0.0 | 641.01 | 1.56 | 112.78 |
Average | 1000.0 | 0.0 | 678.574 | 1.4748 | 119.77 |

It's also worth noting that only 100-120% (out of 200%) of the two CPU cores is used during the run of each version of python tested. Even though the GIL is released for many parts of python and cherrypy, and even though the benchmark is very IO heavy, both cores are not fully loaded. It's generally a good thing for a webserver to not load up the CPUs - but in a benchmark you want them to go full speed.

Also, during the tests the benchmarking tool 'ab' was run on the same machine, skewing results. However, ab seems to only use 1% of the CPU during the tests (according to top).

The test machine: a 1.6GHz Core 2 Duo running ubuntu (save the) karmic koala, 32bit version.

Shrinking the stack to save some memory.

So how do we reduce the memory usage of threaded programs? Reducing the stack size is one idea. Since threads do not share a stack with each other, each thread gets its own, quite large, stack. On ubuntu karmic koala it seems to default to 8MB or so.

Note, this is dangerous and can segfault your interpreter if you do not have enough stack space for some operations. So make sure you test things well before doing this.

In python 2.6 you do:

import thread
thread.stack_size(32768 * 2)

Whereas in python 3.x you do:

import threading
threading.stack_size(32768 * 2)

python3.2:
(default stack) * 110MB virt, 10MB resident
(adjusted stack) * 15MB virt, 10MB resident

python2.6:
(default stack) * 107MB virt, 8MB resident
(adjusted stack) * 12MB virt, 8MB resident

Note: the python3.x versions ended up using 2 gigabytes of memory during the benchmark, compared to a peak of 600MB or so with python2.6 - showing some sort of memory leak in either python or cherrypy (note, both are pre-release).

It seems python3.2 worked with a smaller stack than python2.6: python2.6 segfaulted with a stack size of 32768, but worked with 32768*2. However, python3.x used more resident memory, and more virtual memory.

I've uploaded the profile (python cProfile) results and scripts here: Strangely, running the benchmark under cProfile with python2.7 did not finish (well, it was still going 30 minutes later). So I guess that is a bug somewhere.

Wednesday, December 23, 2009

Invent with python - a new pygame/python book.

Invent with python, 'Teach yourself how to program by making computer games!', is a free book now in its second edition. It is released under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States licence - and the author just asks for a donation if you liked reading it.

The book takes the fun approach of teaching programming through making games... the way many of us learnt (and are still learning) to program.

As a bonus it uses python 3 - which pygame has supported for a while.

Congratulations to Albert Sweigart for finishing his second edition of the book.

Tuesday, December 22, 2009

structuring modules/packages and the cdb database for websites and python packages

Integrated modules are nice, but so are modular packages.

How can we have both? That is, keep everything for a module (or sub-package) in one directory, but also have a nice integrated system built on top of that?

Often for one package I will have a file layout like so:


Then each module has its tests, docs, and examples all mixed into the one directory. This is nice if you want to have all of the tests together, all of the docs together, and all of the examples together.

However, all of the modules are then mixed in together too - meaning it is harder to separate them, and to keep one module in its own directory. Having everything for one module together is nicer for developers, I think.

Using namespace packages through setuptools/distribute is one way. There is a draft PEP around to put namespace packages into python as well. However, this feels way too heavyweight to me. Especially since they add multiple paths to the interpreter... meaning python has to do stat and open calls for every single path - which slows down the startup and import speed of python even when not using those packages.

Another way is to use a sub-package for each module. This turns each module into a sub-package.

However, some people put all of their stuff in an __init__.py file. Which is kind of hard to find, and not very descriptive. Also, if you are editing a dozen __init__.py files for a project, it is really quite annoying. It is much better to have the code in a normally named module file, and then import that in the __init__.py file. Still slower than using modules directly though, since there are still extra stat calls, and extra file opens and reads.

Using the __init__.py file is still annoying in that you need the extra file, and have to remember what this magic file does.

This snippet can put all of the sub-packages' modules into the package namespace:

import os
for f in os.listdir('somepackage'):
    if os.path.isdir(os.path.join('somepackage', f)):
        # (reconstructed) import each sub-directory as a sub-package
        __import__('somepackage.' + f)

This is magic itself, and has a pretty bad code smell.

Other issues are playing nicely with things like cx_freeze, py2app, py2exe and the many other tools for managing python packages and modules... who don't know about namespace packages, or my custom magic.

I could write an import hook, or try out importlib in python3.1, but those would probably have similar issues to other non-standard ways.

So in the end, I think I'll stick with a fairly standard method of doing it... using an __init__.py file which imports the proper module into its namespace.

This lets me do import somepackage.somemodule and have it work. I can also "cd somepackage/somemodule/; python" and have it work.
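As a hedged sketch of that layout (the names somepackage/somemodule and the greet function are made up for illustration), here it is built in a temp directory and then imported:

```python
# Build the sub-package-per-module layout in a temp dir, then import it.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
moddir = os.path.join(root, 'somepackage', 'somemodule')
os.makedirs(moddir)

# somepackage/__init__.py can stay empty.
open(os.path.join(root, 'somepackage', '__init__.py'), 'w').close()

# The real code lives in a normally named module file...
with open(os.path.join(moddir, 'somemodule.py'), 'w') as f:
    f.write("def greet():\n    return 'hello'\n")

# ...and the sub-package's __init__.py just re-exports it.
with open(os.path.join(moddir, '__init__.py'), 'w') as f:
    f.write("from somepackage.somemodule.somemodule import *\n")

sys.path.insert(0, root)
import somepackage.somemodule
print(somepackage.somemodule.greet())  # -> hello
```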

This is how I'll structure pywebsite. With a separate directory for each sub module. This gives modularity, making it scalable to many modules, and it will simplify contributors lives. Since it uses standard packaging it should be compatible with most tools. Another bonus is that from within the source download I can just do:
import pywebsite
without having to call the script or install it. If a module changes from a .py file to a .so or a .dll or the other way around I won't have any issues either.

It is now a little harder for sub-packages within a package to refer to other sub-packages, but can be done with newer versions of python.

Sub-packages are more heavyweight than just using modules in your package... but it does seem cleaner and more extensible. However it is not as heavyweight as using namespace packages.

Any other issues with this approach?

namespace packages

Using the method outlined above still does not allow me to split up a package into multiple separately distributed pieces. This is ok though, since to start with I want to keep most of the modules in the one place... in the one repository, with the one bug tracker. However, designing for extensibility from the start is useful, so let's consider how it could be done.

This is where namespace packages are useful.

There is this draft pep: pep-0382, and also the setuptools/distribute namespace packages (which is what everyone uses to do them). See the setuptools/distribute documentation for namespace packages.

It should be possible to use some sort of hack, so that once you import a package, it searches for other packages with the same namespace.

You can use a package's __path__ attribute to tell it to look in other paths when importing modules.

>>> import pywebsite
>>> pywebsite.__path__
>>> pywebsite.__path__.append('lala')
>>> import pywebsite.bla
>>> pywebsite.bla.__file__

Using __path__ is outlined in the Packages in Multiple Directories part of the python tutorial.
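A runnable sketch of that __path__ trick (the package and module names here are invented):

```python
# Extend a package's __path__ so a submodule can be imported from
# a directory outside the package itself.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'pkg'))
open(os.path.join(root, 'pkg', '__init__.py'), 'w').close()

extra = tempfile.mkdtemp()  # a second, unrelated directory
with open(os.path.join(extra, 'bla.py'), 'w') as f:
    f.write("NAME = 'bla'\n")

sys.path.insert(0, root)
import pkg
pkg.__path__.append(extra)  # tell the package to also look here
import pkg.bla              # found via the appended path
print(pkg.bla.NAME)  # -> bla
```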

As an example, say I had an 'otherpackage' package, and then someone else wanted to maintain part of that package, or we wanted to separate part of it out. Let's call this namespace package 'otherpackage_doit'. It could install itself as a directory and package called 'otherpackage_doit'. Then import otherpackage_doit would work fine. However, import otherpackage.doit would not work. You can't just call the package 'otherpackage.doit' either - since python will first look inside the otherpackage package for a doit package, making import otherpackage.doit fail.

From a user's discovery perspective, I would expect otherpackage.doit to be in otherpackage/doit - so that's where I'd look first. Installing into that directory would probably be best then; however, that is not a very good method. After that, I'd probably do "print(otherpackage.doit.__file__)", or a "locate otherpackage.doit" command.

Really I just wish python3.2 could be changed so that 'otherpackage.doit' package is automatically a namespace package - without having to mess around with weird magic .pth files or declaring things in setup files like setuptools does.

So how can we retrofit (hack) existing pythons to do this for us? We need to get the python import machinery to search for otherpackage.* packages outside of the otherpackage directory. I'm sure it's possible with python somehow... Inserting 'otherpackage.doit' into sys.path does not work. You can't even have a package name with a '.' in it.

So I'll give up on namespace packages for now, until a suitable option presents itself... or I have more time for research. Separate packages will have to live in the same source tree, but can still be kept separate with source control tools - like bzr, svn, github etc.

Still not fast enough... cdb databases for websites and python packages

However, the standard python package layout is not the fastest to load. Supporting cgi operation for a web library is a good idea, because many webhosting platforms still only support python through cgi. So loading heaps of files for every cgi request is not an option. It is possible to get acceptable performance out of cgi and python... it's just that many of the large frameworks have poorly optimized loading, and rely on long running processes to avoid the slow load times. Using django via cgi on an embedded 130MHz arm with a limit of 10MiB is not going to work very well (or at all).

So how to make it faster for embedded/cgi apps?

Firstly, an executable can be made using tools like py2exe. This can pack all of your data inside the executable.

One common method people try is the zip format. This works fairly well but is not optimal. Zip files are nice as they are supported by OS level tools and file managers - so this will be one option to use. The downside is that it makes the files harder to edit. I see .zip files as an optimisation that hinders usability. Especially .egg files (which are just .zip files) are bad, as they make it harder to debug or change programs. So, like .pyc files, I think the zip file should be generated as needed - but having the full source tree there to change is very useful. If someone changes the source, the zip file should be regenerated as needed.
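For completeness, python can import straight out of a zip file simply by putting the zip on sys.path (the module name here is made up):

```python
# Pack a module into a zip, then import it via python's zipimport.
import os
import sys
import tempfile
import zipfile

zpath = os.path.join(tempfile.mkdtemp(), 'mypkg.zip')
with zipfile.ZipFile(zpath, 'w') as z:
    z.writestr('zipped_mod.py', "VALUE = 42\n")

sys.path.insert(0, zpath)  # zipimport handles the rest
import zipped_mod
print(zipped_mod.VALUE)  # -> 42
```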

Another option is a constant database (cdb). cdb is a very simple constant database format used by things like djbdns and qmail. cdb happens to be one of the fastest constant databases, if not the fastest [benchmarks pdf]. cdb is perfect for python packages that are not meant to change, since it is so quick.

cdb is also pretty good for serving data from websites. Since much of the data on a website is mostly static (constant), a cdb key/value database is a nice optimisation over files on the file system. There are fewer syscalls, and fewer latency issues.
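The heart of the cdb format is djb's tiny hash function; a python sketch of just that piece (the on-disk hash table and record layout is not shown):

```python
def cdb_hash(key):
    """djb's cdb hash: start at 5381, then h = ((h << 5) + h) ^ c."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xffffffff  # keep it at 32 bits
    return h

print(cdb_hash(b''))   # -> 5381
print(cdb_hash(b'a'))  # -> 177604
```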

Zip files can also be used as .jar files by some browsers (firefox) to reduce latency on websites. See the jar url scheme for details on how to put all your static files into a zip (jar) file.

Some of my python module_experiments are in:
bzr co

So I have now refactored pywebsite to use a sub-package for each module, so that all the tests, docs and examples for that module are within its own sub-package. Using a zip/cdb file for imports will be left for later, as will namespace packages.

Monday, December 21, 2009

hashing urls for authorisation, and pywebsite.signed_url

Using a hash with a salt is one way of validating urls. With it you can tell that the url was quite likely generated by someone with a secret - that is, by someone authorised to make that url. This post describes using this technique, with the python hmac module, applied to urls (and object method calls).

Generating thumbnail urls is a good example. A url which can generate a thumbnail of an image at a specified size is pretty cool - except then you can have someone generating images at all sorts of weird sizes by exploiting the width and height parameters (eg making 100,000 pixel wide thumbnails). Or perhaps you want to limit it to a certain set of images.

You could always code into your thumbnail generating routine which variables are valid... but this has problems. First, it makes your routine less flexible. Separating out authorisation is always a nice idea.

Another way (with HMAC like using pywebsite.signed_url) is to generate the urls and add a hash to them with a secret salt. This way you can be fairly sure that it was generated by someone with access to the secret salt.

This system is not as secure as using PKI, but it is a lot quicker to implement and is faster running.
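A minimal sketch of the idea with the stdlib hmac module - note this is not the real pywebsite.signed_url API (the function shapes here are assumptions, and hmac.compare_digest needs python 3.3+):

```python
import hashlib
import hmac

def sign(values, salt, length_used=6):
    # join the url parts, HMAC them with the secret salt, keep it short
    msg = '/'.join(str(v) for v in values).encode('utf8')
    digest = hmac.new(salt.encode('utf8'), msg, hashlib.sha1).hexdigest()
    return digest[:length_used]

def verify(values, salt, hash_, length_used=6):
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(sign(values, salt, length_used), hash_)

h = sign(['100', '100', '0', 'gallery1', 'cat.jpg'], 'somesecret')
print(verify(['100', '100', '0', 'gallery1', 'cat.jpg'], 'somesecret', h))
print(verify(['9999', '100', '0', 'gallery1', 'cat.jpg'], 'somesecret', h))
```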

One problem with using hashes is when you have to change the salt or hash scheme... then all your old urls are invalidated. Very annoying (sad panda).

Below is a little example of how you would hash-protect some object methods, so that only urls generated by authorised people are accepted. Note that it only uses a hash with a length of 6 characters; this is to keep urls short.

import os

import cherrypy
from pywebsite import signed_url, imageops

class SignedImages(object):
    salt = 'somesecret'
    root_dir = '/tmp/'

    def thumb(self, hash, width, height, rotate, gallery, image):
        """ serves a thumbnail, but only if the url hash is valid """
        salt = self.salt
        keys = None
        length_used = 6
        values = [width, height, rotate, gallery, image]
        if not signed_url.verify(values, salt, hash, keys, length_used = length_used):
            raise ValueError('not a valid url')
        cache_dir = os.path.join(self.root_dir, "cache/")
        path_to_galleries = os.path.join(self.root_dir, "galleries/")
        iops = imageops.ImageOps(cache_dir, path_to_galleries)
        path = iops.resize(gallery, image, width, height, rotate)
        return cherrypy.lib.static.serve_file(path)

    def gen_thumb_url(self, width, height, gallery, image):
        values = [width, height, '0', gallery, image]
        salt = self.salt
        keys = None
        length_used = 6
        hash = signed_url.sign(values, salt, keys = keys, length_used = length_used)
        url = "/".join(["thumb", hash] + values)
        return url

See the code in the launchpad file browser, or bzr co lp:pywebsite. Also see pywebsite.signed_url.signed_url_test for the unit tests. It is quite a simple module, and you could roll your own quite easily (with the python hmac module). Well, maybe not that easily... eg, consider the flickr API exploits. This code is also probably vulnerable (eg, SHA-1 is now vulnerable to collisions), but at least it will stop basic fiddling with unauthorised urls, and shouldn't have the same vulnerabilities the flickr API used to have (and the facebook API might still have).

Updates: the module has been through a number of changes... the name has changed from hash_url to signed_url.sign / signed_url.verify. It has also moved into its own sub-package (like all pywebsite modules, it now lives in its own self contained sub-package). Fixes for a timing attack (unit tested). Uses hmac.

In Switzerland. Lots of snow!

In Switzerland. Had fun visiting an old friend and his new girlfriend. Lots of snow came whilst we were here, which was fun. However, the planes didn't like it, and they got cancelled. So going back to London today instead (if the planes work).

Thursday, December 17, 2009

Python the GIL, unladen swallow, reference counting, and atomic operations.

The unladen swallow project comments on the global interpreter lock and suggests, in the long term, dropping reference counting in favour of GC. They are also now favouring an incremental GIL removal from python.

However, conflating getting rid of the GIL with requiring garbage collection is wrong. This mini-essay will explain why reference counting is good for python, and why you do not need to get rid of it for multiple CPU systems. Then there will be descriptions of other improvements to python for multiple CPU systems that can be made.

Reference counting has a number of advantages over various GC schemes. As a python programmer, optimizing for the programmer is what is most important... and reference counting makes it easier for the programmer. It makes programs much more deterministic, meaning that reference counting code is much easier to think about in our human heads.

Below are two real world examples of atomic reference counting, which makes reference counting thread safe by using atomic operations.

Two mainstream open source object systems have implemented it:

QT: from qt 4, released June 2005.

glib: released August 2005.

It is possible to make reference counting thread safe in a performant, cross platform manner. Arguing against reference counting just because of the GIL and thread safety is a bad argument - as proved by at least these two highly portable, highly performant, open source pieces of code. Not just research papers, but real life code that has been used for a number of years now.

There has been a lot of research and real life code which improves reference counting, which python could take advantage of - rather than moving to a GC.

Simple DirectMedia Layer (SDL) also has a cross platform atomic operations API in SDL 1.3.

Locality of memory matters much more than the choice between GC and reference counting. That is, operating on memory with an eye on whether it is in the caches (L1, L2, L3 etc), and on its locality to certain cpus/cores. This is why modern memory allocators have a per-thread heap... to help memory allocated by one thread be accessed more easily by that thread. See NUMA for details. NUMA factors play a BIG part in modern cpus already... even on single cpu systems. Designing for cache friendliness can give big speedups.

Reference counting can be a problem for cache friendly systems... but it doesn't need to be. There are techniques for reducing, or removing NUMA issues with reference counting memory management.

Reducing the memory usage of objects will also make for faster threaded python programs.

From the pypy project: "The total amount of RAM used on a 32-bit [ED: pypy] Linux is 247 MB, completing in 10.3 seconds. On CPython, it consumes 684 MB and takes 89 seconds to complete... This nicely shows that our GCs are much faster at allocating objects, and that our objects can be much smaller than CPython's."

As you can see, reducing the memory size of objects can have a massive effect on runtime.

Another memory optimization for python is reducing the number of allocations done at run time. Many real time systems have a policy of zero run time memory allocations. For python this is hard; however, there are probably a number of places where memory allocations could be removed or reduced. The pypy project tried to reduce runtime memory allocations, and found a number of places where doing so sped things up.

Intelligent sizing of the stack is another memory optimization python could do. Each new thread gets its own copy of the stack, so by using a very large stack python can run fewer threads than is optimal. An optimization could be to start with a very small stack, and then resize it if necessary. See the threading module documentation for the threading.stack_size([size]) function.

Also, speeding up taking and releasing the GIL will be good for those parts of python that already release it - especially on single CPU systems. Currently grabbing the GIL is a fairly costly operation. Using a method like that used by QT with atomic operations would be MUCH more efficient.

An incremental technique of giant-lock removal was taken by kernels like linux and the *bsds (eg freebsd, dragonfly bsd), and by glib and qt. The idea is that you make more fine grained locks on individual subsystems; then you can slowly remove the big lock from parts incrementally... and you can test them. Removing the GIL all at once is silly... removing it piece by piece is a much better idea. I think the unladen swallow project is now preferring an incremental approach to GIL removal... which is nice.

A plan which approaches GIL removal incrementally is much more sane over all.

Designing APIs in manners better suited to threaded programs is another area where python can get better performance on multi cpu systems. For example, by finding calls which block and making sure they release the GIL. Also, making batching APIs not only avoids the high cost of python function calls, but makes that code embarrassingly easy to parallelise.

One API that could easily be turned into a batching API is the import command. When you import 10 modules at the top of your program, much of that work could be shared or done in parallel. Likewise, libraries could be built so that they can easily use such batching internally.

Documenting which calls release the GIL, or which use fine grained locking / lock free programming, can also help programmers more easily take advantage of multiple cpus.

Documenting which objects are safe to pickle would help people use the multiprocessing module, which does not work with objects unless they can be pickled. Likewise, forking can let you use multiple processes on data that can not be pickled - since when you fork, the OS does the memory copying for you. The advantage over the multiprocessing library is that with forking you only need the results to be picklable... not the inputs.
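A small helper for that kind of checking - probe whether an object survives pickling before handing it to multiprocessing (the helper name is my own):

```python
import pickle

def is_picklable(obj):
    """Return True if obj can be pickled (so multiprocessing can ship it)."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(is_picklable({'a': 1}))     # -> True
print(is_picklable(lambda x: x))  # -> False, lambdas can't be pickled
```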

Using wise madvise calls with mmap can mark data which is read only, making multi-process code much more efficient. Using madvise well can mean much better copy on write behaviour, which is good for multiple processes... not just multiple threads. Likewise, talking to the OS about what the program needs in other ways will also help. For example, if you know your program is going to use 1GB of data, then tell the OS that at the start. If you know it will eventually need 50 threads, then tell it at the start. If you know it needs a very small stack size, then tell it at the start.

Modern CPUs have memory prefetching instructions, which can help by telling the CPU which memory you are likely to access next. Likewise, telling python, and the CPU, that you only want to read or only want to write to some memory can help a lot with performance.

Apart from memory, there are many other OS level tweaks that can make your multi process python code go quicker. For example, setting process/thread priorities. You can even ask for real time priority if your code needs it. This can reduce the latency of processing and also raise throughput - easily a 20% increase in throughput, or a very big drop in latency. Conversely, if you do not care how fast your python program goes, then telling the OS so will help other processes in the system go faster.
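For example, politely lowering a process's own priority from python is one call on POSIX systems (raising it back again needs root):

```python
import os

before = os.nice(0)  # an increment of 0 just reports the current niceness
after = os.nice(5)   # add 5 to our niceness => lower our priority
print(before, after)
```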

Improving the performance of built in primitives like the Queue/Mutex can also help python a lot. I've found that avoiding the thread safe python Queue can give pretty good performance increases - instead just using a list and its python level atomic operations (documenting which python operations are atomic would also help programmers a lot). A multi-thread-safe event queue can be quite nice, especially if you can pass things into the queue at the C level... avoiding the GIL entirely. There are a number of these available for python already (eg pygame.fastevent).
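A quick sketch of the plain-list trick: because list.append executes atomically under the GIL, several threads can append to one shared list without an explicit lock (unlike queue.Queue, this gives you no blocking get):

```python
import threading

items = []

def producer(n):
    for i in range(n):
        items.append(i)  # list.append is atomic under the GIL

threads = [threading.Thread(target=producer, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(items))  # -> 4000, with no lock needed
```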

In fact, for python 3.2 the GIL has been reworked, and the RLock primitive has been rewritten in C. This 'newgil' work has already been put into the py3k trunk. However, I think there is still quite a lot of room for improvement in the python GIL.

Another API design improvement is the set of optimizations available for worker queues. One is sending similar tasks and data to the same cpus/cores, to take advantage of NUMA/cache effects (eg, always sending the url "/dojob1" to the same cpu).

Another worker queue optimisation is to reduce contention on the worker queue by sending batches of tasks/data to the worker threads/processes, rather than one task or piece of data at a time. Divide the work up into large pieces and send those across to the worker queues - eg, sending a list of 1000 images to each of the cpus to process, rather than having each cpu ask for a single image at a time. Dividing 16,000 images into 16 batches of 1000 requires 16 interactions with the worker queue; taking one image at a time means 16,000 interactions - a thousand times more opportunities for GIL locking and thread contention.
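The batching arithmetic above can be sketched as a helper that splits the work into n batches before it ever touches the worker queue (the helper name is made up):

```python
def chunks(seq, n):
    """Split seq into n roughly equal batches to cut queue contention."""
    size = (len(seq) + n - 1) // n  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

work = list(range(16000))             # eg, 16000 images to process
batches = chunks(work, 16)
print(len(batches), len(batches[0]))  # -> 16 1000
```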

Many operations in the python stdlib could be made to use multiple threads/processes internally, at the library level. Then people wouldn't need to know whether the library is using async IO, threads, processes, or alien magic. eg, a url downloader might be given a list of URLs to GET... and return the responses. Underneath the covers it could use an async library, or even multiple threads, to get the job done. This library level parallelism can happen if the APIs have a batching design... that is, they take multiple inputs and return multiple outputs. Then it is possible for library authors to choose the best technique. Taking single inputs and giving single outputs does not let library authors implement parallel algorithms as easily.

In conclusion, there is a lot we can do to improve python on multi cpu systems without removing the GIL entirely. An incremental approach to removing it from parts of python is a good idea as shown by other projects doing the same. Removing reference counting from python is not needed, as there are good, proven ways of using reference counting in multi CPU systems.

Wednesday, December 16, 2009

Writing for the latest python is FUN.

Writing for the latest python version is just simply... fun. Even the name 'python 3000' or py3k is amusing.

It's been a joy over the last week, plugging away at a few personal website and game projects using the latest python version. Over the last year I've been involved in upgrading existing code to the latest versions of python... but not really in writing code JUST for the latest versions.

No need to consider backwards compatibility problems... older pythons be damned when programming for fun! I'm able to take advantage of the latest python features in deep architectural ways. To fiddle around with the new deep magic python internals. But the new deep magic parts aren't the best bits though... it's all the elegant little improvements that make a difference.

It is a great delight finding python 3 compatible modules. It turns out that the modules available for python 3 are often of quite good quality - and well maintained. There are not all that many of them around, so it can be good just to browse through those python 3 packages that are available.

All of this fun makes me reminisce about the days of playing around with python ten years ago. Perhaps I'm romanticising it a little bit (well, I definitely am!). However, first getting into python all those years ago was quite a thing. It was probably the small community, and the idea that optimizing for the programmer was more important than optimizing for the computer (no more squiggly braces :). Like the halcyon days of python2, there is a smaller community of python 3 programmers, and there are a bunch of new techniques and tricks available in python 3. These two factors remind me of what it was like to get into python all those years ago.

Most of the programmers doing python 3 stuff are either doing it for fun, or doing it for the good of the python ecosystem overall - that is, to make python better for people rather than better for the machine.

So if you need a reason to get into python 3 it is this: python 3 makes python fun again.

Wednesday, December 09, 2009

pywebsite.sqlitepickle. sqlite VS pickle

The sqlite and pickle modules that come with python are quite useful. So let's mix them together and see what comes out.


pywebsite.sqlitepickle is a little module I just made which combines sqlite and pickle for persistence. It is useful since it works with python3, and both pickle and sqlite are included with most pythons (including pypy).

Import the module.
>>> from pywebsite import sqlitepickle

An in memory db.
>>> db = sqlitepickle.SQLPickle()
>>> db.save('key', 'value')
>>> db.get('key')
'value'

Can also save to a file. So we first get a temp file name.
>>> import tempfile
>>> f = tempfile.NamedTemporaryFile()
>>> fname = f.name

>>> db = sqlitepickle.SQLPickle(fname)
>>> db.save('key', 'value')
>>> db.get('key')
'value'

>>> db.close()

One issue with this is that sqlite does not like sharing connections between multiple threads. To get around that I just create a new connection in each thread, or do the db work from one thread. Also, using pickles for persistent data can be a security issue.

Speed? Ok speed. Pickle isn't the fastest, and neither is sqlite... but they are both kind of ok.
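For the curious, the whole idea fits in a page. This is a hedged re-implementation sketch, not the real pywebsite.sqlitepickle code (the save/get names are taken from the session above):

```python
import pickle
import sqlite3

class SQLPickle(object):
    def __init__(self, path=':memory:'):
        self.con = sqlite3.connect(path)
        self.con.execute(
            'CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value BLOB)')

    def save(self, key, value):
        # pickle the value and store the bytes as a BLOB
        self.con.execute('REPLACE INTO store (key, value) VALUES (?, ?)',
                         (key, pickle.dumps(value)))
        self.con.commit()

    def get(self, key, default=None):
        row = self.con.execute(
            'SELECT value FROM store WHERE key = ?', (key,)).fetchone()
        return pickle.loads(row[0]) if row else default

    def close(self):
        self.con.close()

db = SQLPickle()
db.save('key', {'numbers': [1, 2, 3]})
print(db.get('key'))  # -> {'numbers': [1, 2, 3]}
```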

Tuesday, December 08, 2009

Play cards. Not business cards.

Play cards. They're like business cards, but take 30 minutes to make five of them. Also, they are for handing out to people you want to play with, rather than to people you want to do business with. Well, they could be used for both... but I think the normal style business card is a bit boring to give to friends.

play cards

Made some of these the other day, based on an earlier version I designed a few years ago. They're about the size of a finger, but unfold to show other details (eg, email, website, etc).


Sunday, December 06, 2009

buildout project that uses numpy

Here is an example buildout project that uses numpy:

If you have bazaar, you can check it out like so:

cd /tmp/
bzr branch lp:numpybuildout
cd numpybuildout/trunk/

To use distribute instead of setuptools use the -d flag of
python -d
>>> import numpy
>>> numpy.__file__

There you have it: a simple project that can download and build numpy from the Python Package Index and install it in its own private directory for use.

update: It was broken for python2.6 and numpy 1.3. However, it works again with numpy 1.4.

Saturday, December 05, 2009

gvim and karmic ubuntu... with a fix for you.

gvim in ubuntu karmic koala is really annoying. It prints out a bunch of gtk warnings each time you call it. However thanks to someone making a patch, you can clean it up. It seems that the fix isn't going to come out until the next ubuntu.

** (gvim:13354): CRITICAL **: gtk_form_set_static_gravity: assertion `static_gravity_supported' failed

** (gvim:13354): CRITICAL **: gtk_form_set_static_gravity: assertion `static_gravity_supported' failed

** (gvim:13354): CRITICAL **: gtk_form_set_static_gravity: assertion `static_gravity_supported' failed

** (gvim:13354): CRITICAL **: gtk_form_set_static_gravity: assertion `static_gravity_supported' failed

** (gvim:13354): CRITICAL **: gtk_form_set_static_gravity: assertion `static_gravity_supported' failed

Very annoying to see that every time you edit a file.

So instead of switching to arch linux, or gentoo linux... or one of the other developer centric distros there is a patch! ya for patches people make and share :)

Grab the debdiff to apply the patch. Take a look at the packaging guide for details on how to build with a debdiff patch.

NOTE: please back up your system before trying anything here. Also DO NOT TRY THIS AT HOME AS SOMETHING MAY GO HORRIBLY WRONG!!!

I'll run through what to do here:

$ cd ~
$ mkdir vim-gnome
$ cd vim-gnome

$ wget

$ md5sum vim_7.2.245-2ubuntu2.1.debdiff
abdb13517ec59a1a0b74b55b977e0139 vim_7.2.245-2ubuntu2.1.debdiff

Check that your md5sum is the same as this. Hey, why not look at the patch to review it for goodness? Well you don't have to, but it can be interesting looking at bug fixes.

We need to install all the tools required for building ubuntu packages...
$ sudo apt-get install build-essential fakeroot devscripts

Then grab the source for vim-gnome (or vim-gtk if you don't want to use the gnome version... replace vim-gnome with vim-gtk in all things below if you want that instead).
$ apt-get source vim-gnome

When I installed the libraries needed to build the source package...
$ sudo apt-get build-dep vim-gnome
I got a 'Segmentation fault'. eek. Well, time for a restart... It seems Karmic isn't all that stable for me with apt-get, dpkg and friends.

Ok, back from the restart of the computer.
Let us try it again...
$ sudo apt-get build-dep vim-gnome

Ya! it's working. I have to download about 100MB of packages. ... wait 10 minutes... then everything is downloaded and installed.

Apply the patch...

$ cd vim-7.2.245/
$ patch -p1 < ../vim_7.2.245-2ubuntu2.1.debdiff

Ok, let's build it again after we've applied this patch. This build step can take a long while!

$ debuild -uc -us

Ok, there's no test step... since ubuntu doesn't really have a way to run automated test suites (that I know of).

At least lintian is run and tells you something.

It should produce a bunch of .deb files for you.

vim_7.2.245-2ubuntu2.1_i386.deb vim-gtk_7.2.245-2ubuntu2.1_i386.deb
vim-common_7.2.245-2ubuntu2.1_i386.deb vim-gui-common_7.2.245-2ubuntu2.1_all.deb
vim-dbg_7.2.245-2ubuntu2.1_i386.deb vim-nox_7.2.245-2ubuntu2.1_i386.deb
vim-doc_7.2.245-2ubuntu2.1_all.deb vim-runtime_7.2.245-2ubuntu2.1_all.deb
vim-gnome_7.2.245-2ubuntu2.1_i386.deb vim-tiny_7.2.245-2ubuntu2.1_i386.deb

Now to install the relevant packages...
$ sudo dpkg -i vim_7.2.245-2ubuntu2.1_i386.deb vim-common_7.2.245-2ubuntu2.1_i386.deb vim-gui-common_7.2.245-2ubuntu2.1_all.deb vim-runtime_7.2.245-2ubuntu2.1_all.deb vim-gnome_7.2.245-2ubuntu2.1_i386.deb

Ya! no more annoying message :)

Now, if someone made a PPA other people on different architectures could also easily update their binaries. Or perhaps the ubuntu folks will make it nice for gvim developers, and apply the patch themselves.

Friday, December 04, 2009

From Berlin, back to London again... and python dojo!

Arrived back in London a few days ago, after a month living in Berlin.

I really enjoyed Berlin, and thought about staying there for longer... but in the end decided against it for now. Would really like to visit in the summer, when it's apparently quite a different place!

Some pictures from Berlin

Finding the bustle and life on the streets of London quite refreshing. I think I'll be based in London for the next six months or so at this point (probably longer).

Going to the London python dojo again on the 10th. This is an interesting format for a get together. People take turns pair programming with a projector, whilst the rest of the people in the room offer comments. This time there's going to be a few short interactive presentations too. It was nice meeting python people from around London last time. Fry-it are the hosts, who offer their office, pizza (+salad) and beer (+wine) for everyone at each dojo.

Friday, November 27, 2009

python's distutils install race condition

Python's distutils has a race condition: it starts copying files into the python path one by one whilst installing.

This is a race condition, since python programs can be importing the package whilst the package is being installed.

It would be good for distutils to install things in an atomic manner, where a package is either installed or not installed - like, on unix, by moving the files in from a temporary directory. This would also help reduce breakages: if an install breaks half way through, the broken version will not overwrite the existing version.

It's not a very serious problem, since most people don't install things on live important systems. Also some packaging tools fix the issues with the source installs.

Sunday, November 22, 2009

2to3c: an implementation of Python's 2to3 for C code

"2to3c: an implementation of Python's 2to3 for C code"

See for a description of this tool.

It works on C modules, so it should be easier for people to port their C modules to py3k.

The 2to3c tool uses Coccinelle for transformations on the C code. Coccinelle has been used on the linux kernel and other software, for updating code when APIs change.

The perfect fit for the python C API changes!

Thursday, November 05, 2009

rakarrack is a decent effects rack for linux

rakarrack is a decent effects rack for linux.

It's not packaged for ubuntu(of course), but is fairly easy to compile. I've been using the cvs version (yes, people are still quite happily using cvs it seems).

Once you get it set up you have access to 80 presets of realtime guitar effects(2-40ms depending on your computer sound card setup).

It works with the jack audio system, so you can route audio into it, and route its audio out to other programs easily enough. You can also control it with midi (alsa, or jack), meaning you can hook it up to a midi controller of some sort. I've been using it with an M-Audio Axiom 25 controller, and with python scripts via pygame.midi.

It even has a very nice help manual integrated into the program... I wish more programs had a good help section. It details the 17 different effects that come with the program.

One thing which is a bit funny is that it doesn't use LADSPA or lv2 plugins. You can of course still use these other plugins through other jack programs if you want. However the ones it comes with are all fairly good quality from my tinkering. There is a lot you can do with the various plugins in different orders, with different settings in each one. The 80 presets show a good variation of the sounds you can get out of it.

I've been using it in conjunction with the mixxx mixing program and the hydrogen drum machine. So you don't need to use it just with guitars... any program or audio input will do.

If you are using guitars it also has a tuner which might come in handy... if you've never quite got the hang of playing that song called Tuning that you hear at the start of every gig.

Monday, November 02, 2009

Ich bin in Berlin

Berlin seems like a fun and laid back place. Arrived here the other day... haven't had much time to look around so far.

Friday, October 23, 2009

Karmic Koala Ubuntu 9.10 beta review.

This is for the beta release, and most(I think all) of these bugs have been reported in the bug tracker. (Note the bug tracker is currently reporting errors... so I can not link to bugs).

I do like a lot of things about the release... these are mainly things I don't like. So please don't take this as a bad view of Ubuntu 9.10 overall... just some criticism.

Hibernation seems to not work correctly all the time (for me)... when I close the lid, and open it later, it does not seem to restore properly. It shows the screen, but does not respond to input on the keyboard/trackpad. Then after a bit the screen goes black. Then I need to press the power button, and then the login box comes back up - and I can log in again. This has worked for the last two ubuntu releases, so it's annoying.

Booting seems slower... and is actually slower(timed it) for me. Perhaps I need to install from scratch for it to be faster or something. Pete Shinners has some benchmarks on his blog of ubuntu booting faster for him... boot benchmarks.

pulseaudio was installed again, stuffing up my sound settings. However after removing pulseaudio, the multimedia keys on my keyboard do not work... as they used to work in the previous release without pulseaudio. Apparently the ubuntu mixer applet only works with pulseaudio now. Yuk. If you go to the sound preferences dialog, it only works if you use pulseaudio.

So if pulseaudio is using 3% of your cpu when no sound is playing, or pulseaudio does not work with your screen reader... or for any of the many other reasons you might not want to use pulseaudio and remove it, ubuntu's gui mixer fails.

The main pulseaudio author even says the ubuntu implementation of pulseaudio is bad on his blog... so I hope this is fixed before the official release.

Changing the speed of the CPUs/cores now requires authentication... and typing in a password... annoying!! So now I end up leaving them at full cpu... wasting power, as I need full cpu in some situations. The usability is not considered here... I think people will just use more power.

Some of my icons were messed up, along with applets I was using. eg, my firefox icon got changed into a red circle crossed with a line through it.

I leave a memory card in the memory card holder... and now ubuntu pops a message up telling me it's there each time I open the lid. Even if I press the ignore button. Annoying. The last ubuntu did not ignore my request.

Newer versions of many software packages are available... which is nice. The graphics do seem to be faster on my intel machine... again nice :) New gcc, etc etc.

Still no jack/pulseaudio compatibility stuff. Still no OSSv4 sound system packaged :( Ubuntu has made it hard to have alsa, jack and pulseaudio all used side by side. The authors of those systems have worked to make the situation better, but ubuntu's implementation is not good. Better to install audio stuff yourself for this release too.

Ubuntu has old SDL, and pygame releases... unlike a lot of other platforms which have the latest stable releases.

So expect Ubuntu karmic koala 9.10 to still have bad sound, and bad support for games. I have hopes that bugs will get fixed before the beta finishes... but I don't have very strong hopes.

update: Pete Shinners has some benchmarks on his blog of ubuntu booting faster for him... boot benchmarks. Added link to pulseaudio authors blog about poor pulseaudio on ubuntu.

Tuesday, October 20, 2009

web design for robots

More robots are reading websites than humans - so should we be designing the websites for robots?

Thanks to the search engine wars, most visits to most websites are made by robots. But has anyone asked what these robots enjoy?

There's often a divide in artists between introverted, and extroverted artists.
"Fuck you - I make art for myself and not for others." OR "I make art for people to enjoy, I'm not so selfish and self indulgent that I just make art for myself".

So I decided to interview a few robots to find out what some of their favourite websites are. But where to find a robot to ask questions of?

I had to look no further than an internet enabled fridge. So with my notebook in hand I pull a chair up to the fridge and ask it some questions. I open with a flurry of questions, trying to provoke a response.

"Do you have any favourite websites?" I ask. No answer. Obviously this robot does not know english. I give it another ten minutes of questioning - then give up. I leave my card for the robot in case it feels like answering questions later, and pull away my chair. The fridge seems relieved I have gone.

Luckily I can just query some robots brains directly. No need for tricky questions to extract the information I'm after. I stuff a proxy in between the fridge and the internet. A week later I study the statistics of its favourite websites.

As we all know, just because you go to a website five times a day - does not mean you enjoy it. However, it might give us an idea of what this robot does like. First I noticed it occasionally calls home to its place of manufacture. I guess to send details back home of what it's up to. However, I guess its family has moved on - 404 - page not found for where it's trying to send its details. I wonder for how many years this will go on? It would be like you sending a post card home to your family for 20 years, and never knowing that your family moved house 19 years ago. Oh well, we keep sending the post cards anyway.

First rule for web design for robots - they are really dumb, and will possibly forever, keep looking at the same urls. Robots do not change with the world around them.

The Egyptians knew about mixing colors around 4500 years ago. Do robots know how to mix colors, and do they care about colors? Some of those early Egyptian paintings, sculptures, and poems are really interesting - and still provide joy, and insight. By designing for robots today, will robots in the future still appreciate our work?

A robot mind can often not change once set. Well, until it is updated by the programmer. What happens when the robot programmer moves on to another project? Will the robot ever be updated again? Will the robot gain a split personality? The result of many different programmers updating its brain, must surely be a variety of personalities mixed in there.

Continuing these thoughts - thinking about how robots think - is not really interesting to me. I am now forcing myself to continue writing, in a workman like fashion. Sorry (you should probably stop reading my robot friends).

What if the robot has some way of asking people to tell it what is nice, good, sane, and lawful? What if the robot can ask the people to tell it things... and reward them in some way for their work? Provide the robot brain programmers with some value, and the robot will evolve - for free.

Oh, so the robots are driven by the programmers - and what the robots can sense about the world around them. Can the robots also be driven by positive and negative feedback? Well certainly, it's possible. Is it possible? Then robots need to process the input - the signals - and make sense of them.

Signals are processed by the robots, but what signals do the robots enjoy the most? Signals that fall into easily recognized patterns surely.

Maybe robots will like other robots' art the best? Robots that make things for other robots? Well, robots can make things all by themselves, this is true.

What if the robot masters try to tell the robots - "do not care about what the robots make - only care for what the humans make". Well, then the robots try to appear human. There are robots who try to become human, and robots that try to find out the difference between a human and a robot.

Signal - human. Signal - robot. Robot, human, robot, robot... 101010101...

We shall call this test the Robot Turing test. The Turing test is the test to see if people can see if a signal is from a human or a robot. An idea from a man dead - at the hands of government lives on in our brains, and in robot brains. Shifting signals henceforth into patterns recognised a long time ago.

Robots can use the human brains as a resource. Humans can use the robots as a resource. Who is in control in the end?


Is anything in control? Or is it stuck in an infinite loop? Are the humans and robots just sending post cards, even though their family has moved house?

Should someone tell them all to stop sending postcards?

Wednesday, October 14, 2009

game review - Tonk Tanks

Tonk Tanks is a really small, and fun game. All game play is on one screen, and the controls are simple.

You have arrow keys to move your tank around, and a button to shoot the other tanks. It's like one of those old atari 2600 tank games, but a more modern version. Once you die, you teleport to a different respawn position around the map.

The game is quite playable single player - but the author is working on a networked multiplayer version. Another nice addition would be multiplayer on one machine - especially with joystick or mouse support (since keyboards are evil).

The tank AI seems different and varied enough, that I have not worked it out from playing it ten times or so. Usually my games only last about five minutes before I move onto something else. It's one of those games I can play for little while when I want a short break.

Tonk Tanks works well on linux, windows and Mac (and probably other platforms supported by python+pygame). There's a windows .exe available, otherwise you need to have python+pygame installed to play. I don't think it's packaged for any linux/*bsd distributions yet.

Tuesday, September 29, 2009

Spam detection on websites?

Assume you have a user content site - or you're using software that can somehow get spam links inserted into it.

How do you find out if your website has spam put on it?

It seems a common enough problem these days... people putting spam links on websites. Surely there must be a service or piece of software to detect such a thing?

I can think of a few ways to go about writing one fairly easily (using existing spam detection tools... but applying them to a spider's crawl of your website). It would be much nicer if there's already a tool which does such a thing though.

Saturday, September 26, 2009

Alsa midi, timidity, fluidsynth and jack.

If you don't have a midi output on linux (because your laptop has crappy audio hardware) you can use timidity or fluidsynth to emulate it.
timidity -iA -B2,8 -Os -EFreverb=0

Well, this piece of html has a bunch of incantations for using timidity on linux... and also gives insight into how to use alsa midi tools.

Like listing midi ports, and connecting midi ports, with these two commands:
$ pmidi -l
Port Client name Port name
14:0 Midi Through Midi Through Port-0
20:0 USB Axiom 25 USB Axiom 25 MIDI 1
128:0 TiMidity TiMidity port 0
128:1 TiMidity TiMidity port 1
128:2 TiMidity TiMidity port 2
128:3 TiMidity TiMidity port 3

To connect the midi input from my usb Axiom 25 keyboard to the timidity synth, aconnect is the command to use.
aconnect 20:0 128:0

The AlsaMidiOverview has more information on things.

#remove all connections...
$ aconnect -x

# list all the output ports (without using pmidi)
$ aconnect -o

# a gui for connections
$ aconnectgui

Another synth that can be driven by midi is fluidsynth. qsynth is the graphical interface for fluidsynth which makes it easier to tweak. You can use it in pretty much the same way as timidity: it opens up a port (which you can list with pmidi -l), and then you connect it to your keyboard with aconnect. fluidsynth is probably a bit nicer than timidity... and you can use soundfonts with it. Heaps of free sound fonts are available for download.

This plugin is *very* useful:

It allows all your alsa using programs to be routed through your jack server. This means you can use all of your normal audio programs with low latency, good mixing and synchronised audio - even over the network (with netjack).

Ubuntu does not have the alsa jack plugin included for some brain dead reason... even though debian has had it packaged for a year or so. However building from source is simple. (./configure && make && make install). I've gone back to removing pulse audio, as this system works very nicely for me.

Here's my old jackd script, I put in a ~/bin/jack_mine and start with screen at boot.
jackd -R -P 70 -d alsa -p 256 -n 3 -r 44100

I can now use various synths, samplers, and effects racks from my python scripts.

Jamin is a mastering program with an eq, compressors etc. I don't think I'll need it for live work. However it might be useful for lots of instruments. I don't see any reason why it couldn't be controlled by another person with a midi controller.

Next up I need to see if multiple sound cards can work... unfortunately that apparently doesn't work the best. You can create a virtual sound card from multiple sound cards, but they have trouble syncing. It's funny that it's currently easier to sync sound cards on multiple machines than on the same machine... from what I know so far.

For my use I don't think sync will matter too much... that is listening with headphones with one card, and outputting to sound system with another channel. My inbuilt sound card has two output lines already, and one line input(which can also be used for output).

Dell inspirion 1525... the channels are:
1(l) - 2(r) - first headphone plug, or speakers if headphone not in plug 0.
5(r) - 6(l) - second headphone plug.
3(l) - 4(r) - third headphone plug.
7(x) - 8(x) - unused... (not soldered on?)

So looks like 6 usable channels... nice for a crappy cheap laptop :) This is much nicer than what windows vista allows me to do, and also way nicer than the pulseaudio/gnome combination. I've used this setup to output 6 channel audio with various different playback libraries including pygame.

Friday, September 25, 2009

screen for ghetto servers and startup scripts.

GNU screen is a good little tool for server administration, or running things on your own remote machines. It's even good for running things locally.

I hope this is useful for people who want to run scripts every time they login, or reboot... and who need interactive access to those scripts. Or useful for those people who are already using screen, but would like to make their setup a bit better:
  • scripting sessions, rather than doing them manually at each login or reboot,
  • finding your screen sessions more easily.
  • restarting scripts at reboot,
  • monitoring,
  • logging,
  • resource control

Running things as daemons is cool... but if you'd also like interactive control occasionally, running things with screen is useful.

Most servers have screen, watch and crontab(osx is lacking watch though) - including most linux distros, *bsd, osx, windows(with cygwin). Most OSes also have their own style init scripts(scripts to run things at boot or logon). So this screen, watch, crontab combination is ok if you want to use multiple different types of computers - but reuse your scripts. If you want something robust, and good this isn't for you.

Scripting sessions
You can make a shell script to script your screen sessions. So you don't need to do them manually each time you login. This can save you a *lot* of time, and can make you less afraid of a reboot :).

-- /home/me/bin/ --

# start up your 'app1' to be restarted 2 seconds after it dies, every time it dies.
screen -d -m -S app1 -t "my web app one descriptive name" /opt/local/bin/watch -n 2 /home/me/someapp1/ >> /home/me/logs/app1stdout 2>> /home/me/logs/app1stderr

# start running an application.
screen -d -m -S proxy -t "proxy server" /home/me/someapp2/

# connect to my server, setup with ssh keys.
screen -d -m -S server1 -t "my server" ssh

# monitoring run time test of my app1 every 67 seconds
screen -d -m -S monitor1 -t "my monitor script" /opt/local/bin/watch -n 67 /home/me/bin/

Finding screen sessions easily: you can run these commands:
screen -d -r app1
screen -d -r server1
screen -d -r proxy

This will let you connect to that screen from any shell. It detaches any session that's open. This is good as you can easily remember things with just the short names you chose, eg 'app1', 'proxy' etc. Normally you have to do 'screen -ls', look at the output, find the session you want to connect to, then finally 'screen -r random_sessionname'.

At Reboot: Then you can add this to your crontab with crontab -e (be careful not to use -r!!! which is right next to e on a qwerty keyboard). Or use the crontab web interface of your hosting account (for example cpanel/whm, webmin, plesk etc). Note that each user has a crontab. So to run your apps as different users, just login and change each one's crontab (or use sudo, or 'su username -l -c' from root's crontab).
@reboot /home/me/bin/
# this line below can be used from a root account to run as the user 'me'.
@reboot su me -l -c /home/me/bin/

Restarting: Of course your app might crash or something... Look at the first one... that one has a watch -n 2... which means "run this process, wait for it to finish, then start it again after 2 seconds." It's kind of like a ghetto daemontools. Not as good as something like daemontools... but good enough for some purposes.

Monitoring: You can have a separate tool monitoring your scripts if you like... then if your app has frozen, or is overloaded... just send it some term, then kill signals... and watch will start another one up when it dies. Consider each script a 'runtime test'. If the 'runtime test' fails, then kill the app and restart it. The app could be a ssh proxy for your vpn - in which case the 'runtime test' would see if you can ping the network, and if not kill the ssh connection... so it restarts. A webserver runtime test might see if it can do a GET request... if not kill the server. A ghetto monitoring system for sure... but simple.

Logging: It's pretty easy to add some stderr and stdout redirection with >> /home/me/logs/app1stdout 2>> /home/me/logs/app1stderr. This way you can have ghetto logging too.

Resource control: You can use ulimit in your scripts to limit how many resources your server can use. Then if it uses too much, it will die and be restarted in two seconds. Say you think your python web server should never *ever* take up 500MB of ram, then run it from a .sh file, and put ulimit -m 500000 before it. See ulimit -a for a list of things you can limit. Ghetto-quick resource control. Similarly you can use nice and ionice to make things behave more nicely :).
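If the app is a python script you launch yourself, the stdlib resource module (unix only) can apply the same cap from inside a wrapper, before the real work starts. A sketch, mirroring the ulimit example above:

```python
import resource

def apply_limit(kind, soft_limit):
    """Set the soft limit for one resource kind, leaving the hard
    limit alone. Exceed it and the process dies... and watch
    restarts it two seconds later."""
    _soft, hard = resource.getrlimit(kind)
    resource.setrlimit(kind, (soft_limit, hard))
    return resource.getrlimit(kind)[0]

# e.g. cap the address space at ~500MB before starting the server:
# apply_limit(resource.RLIMIT_AS, 500 * 1024 * 1024)
```

Same ghetto-quick resource control, without needing a separate .sh wrapper file.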

Debugging: screen doesn't give you an error message with -d -m. So you can either look in your logs or try out your command first with "screen cmd". eg "screen python". You can try out your @reboot command, not just by rebooting but by setting it to run in 10 minutes from now. See cron help pages for how to do that. You might want to use a sh -l in front of your command so it's a 'login shell'. This will setup your paths and environment variables like your login. Or setup explicitly for each app/script which paths and environment variables they need.

Again this isn't for everyone... but some ideas here might be useful for your own ghetto screen usage for servers and startup scripts.

Wednesday, September 23, 2009

Linux sound is getting better.

No, I'm not talking about the free software song sung by Richard Stallman (very funny, but in a low quality .au format). Or the pronunciation of Linus and linux.

To start on this long-journey-of-a-rambling-diatribe-of-words, there are two good audio patches in the SDL bug tracker for the upcoming SDL 1.2.14 release.

One patch is for the pulse audio driver, and the other is for the alsa backend. These solve some of the high latency or scratchy sound issues some have.

That's right, a new SDL release very soon... it's over a year since the last 1.2.13 release, and it seems like forever since the SDL 1.3 series began. Most new development has been happening on the SDL 1.3 tree in the last year... so the 1.2 releases have slowed to an almost stop.

There's a good article on a x-platform atomic operation API for SDL. That's one of the features that's been evolving over a few years, and is being implemented in svn.

In python terms, SDL 1.3 is like python3000. A refinement, and a promise to break backwards compatibility with the ABI. Note, not so much the API... the API is fairly backwards compatible... but some things must change. Also SDL 1.3 has lots of cool features I'm looking forward to.

Even though the SDL 1.3 tree is improving, and many people are now switching over to it, the SDL 1.2 series has a lot of life left in it.

So the SDL 1.2.14 release is all about fixing bugs, and applying patches. There's a lot of bug reports, and also a lot of patches in the main SDL bug tracker.

With free software and open source there is the mantra 'release early, release often' (the other mantra is 'release early, then abandon on sourceforge'). A stable version, that's used by people needs plenty of bug fixes, and people send in patches. Whereas a development version doesn't get the same kind of attention as released-and-used-by-people software. Many of these fixes done on the stable 1.2 tree will also be ported to the 1.3 tree too.

Now enough SDL love... what else is improving in the linux audio world? Now for something completely different.

Well, pulse audio is frantically making releases. Three releases in September... so far, and five for 2009. Pulse and jack are also playing nicer together now (well, not packaged in ubuntu yet... grrrr, see bugs 198048 and 109659. This is critical for allowing many high end audio programs to work alongside the 'beep' sound your terminal makes. Hopefully they'll get a good desktop architect (sound experience) from their job posting to fix things).

Jack is the low latency, synchronised audio system used by many professional audio programs on linux. Think unix pipes applied to audio, but in a way that works with the audio latency requirements. Both jack, and pulse audio have been ported to lots of other operating systems these days. Which can only be good for them getting more developer support... and making the linux audio world better along with it. You can see in their change logs, and repository commits that developers on different platforms other than linux are contributing quite a lot.

Even trusty old Open Sound System (OSS) has gotten better. OSS was removed from the kernel and replaced with alsa a while ago... but OSS kept going anyway. OSS4 has lots of things fixed compared to the OSSv3 that most people remember using a long time ago. Including a fast, transparent, high quality in-kernel mixer (good for crappy cards that only support one program outputting sound at a time). It also has a "record-what-you-hear" feature... for recording what is coming out of your sound card (a feature MS disabled in vista... booo!). The commercial version is now available as open source with a Mercurial repository too! OSS is also quite x-platform.

What about sound applications?

The drum machine Hydrogen got a new release for the first time in three years... and this time it's not just linux only too.

A great DJ program called mixxx is another high quality multiplatform audio program. It's probably my most favourite audio program... just because it's so fun. You can even hook up real vinyl decks to it for scratch control(and midi ones). Unfortunately you can't pipe music in from other audio programs or in from a sound card... so you can't use the vinyl decks in that way. You have to use specially encoded records which the program then reads to figure out where the record is moving. The latest version features javascript scripting of midi and other parts of the program.

(go on, download it and become a dj ninja)

Guitarix is an amp emulator... it tries to sound like various vintage guitar amps. Pretty fun to play around with.

(plenty of knobs to play with)

Especially in combination with many of the effect plugins available through the hundreds of LADSPA(guitarix is a LADSPA plugin too) and LV2 plugins. Other plugins available include vocoders and all sorts of weirdness.

Lash uptake has been good, and now lash talks dbus... letting it mix in nicely with the rest of the linux desktop ecosystem. Lash is a session system for linux audio programs. It lets you open your 12 different linux audio programs (remember audio in linux is like pipes... pipes with audio running through them instead of water... let's call them wires... but digital... maybe fiber optics... but not using light... ok whatever... why am I explaining it this way?... you're not five... too many dots. sorry.) and save your settings for later. The alternative is to open your 12 programs each time, set up the wiring between six of them, start messing around and finally... 2 hours later... realise you were supposed to be setting them up in a certain way rather than making stupid minimal beep noises over house vocals mixed with a recording of a fart noise - filtered down to retro 8bit samples. Without lash, you couldn't save that brilliant setup and play with it later.

Audacity, the simple (yet advanced) audio editing workhorse, is moving towards a 2.0 by the end of the year. Audacity has been around for ages, and has been multi-platform for ages. The 1.3 series seems to have been going on forever... but they do regular beta releases, and nightly builds. So it's pretty easy to get fresh versions. Do proper releases matter that much when new releases are pushed out every day? I guess so.

LiVES reached 1.0 earlier in the year after a long time in development... (since 2002!). LiVES is a video editor(which includes audio). It's actually quite useful for editing video! The other cool part of it, is that it's a VJ tool. So you can do those awesome projections you saw the last time you were in a club rushing around the place. You can control much of LiVES with midi too, which is mice.

(Make home movies of your loved ones. Like grandpa Nelson here.)

In fact lots of audio programs available for linux can be controlled by midi. Which is mice for me since you can easily do midi with python and pygame.midi.

Speaking of things midi and pythony... The vj program freej now has python wrappers! There are even five tutorials which use pygame. Unfortunately this is not in release form yet... but all this good stuff is happening in the git repos.
(you too can make video art like this with freej... All you need is a crazy mask)

Both LiVES and freej use the frei0r video plugins. Which has nothing to do with linux sound getting better really. So there. Jerk(why did this guy even write this? I wish I didn't waste my time reading it.).

Comments? Important typos I should fix? Interesting linux audio things you're doing? Want to tell me how your tomatoes are growing in your garden? Got a picture of your cat you'd like to share with me?

Tuesday, September 22, 2009

Where did the 'new' module go in python 3?

Anyone know where the 'new' module went in python 3?

2to3 can't seem to find 'new', and I can't find it anywhere with my favourite search engine either... filed a bug at: issue6964.

A complete 2to3 tool should at least know about all the modules that are missing. Ideally it would know what to do with those modules, but at a minimum it should be able to tell you which ones are gone. I'm not sure how to get a complete top level module list sanely... I guess by scanning the lib directory of python.

Or maybe there is a module to find all python modules?
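There is something close in the stdlib: pkgutil can walk sys.path and list the importable top-level modules. A rough sketch (builtin/frozen modules like sys won't show up this way, and the list varies per platform and install):

```python
# List importable top-level modules by scanning sys.path with pkgutil.
import pkgutil

top_level = sorted(name for _, name, _ in pkgutil.iter_modules())
print(len(top_level))
```

Run that under each interpreter and you have the per-version lists to feed into a set difference.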

Each platform would be slightly different of course... and there'd be differences based on configure. Also some modules have probably stopped importing or compiling at all these days.

Then you could just find the intersection and differences with the lovely builtin set type :)
# find the modules that exist in the 2.x series but not in 3.x.
top_level_modules_not_in_3 = set(top_level_modules_2series) - set(top_level_modules_3series)

Well maybe the 2to3 tool could work a different way. Instead it could find all the modules it *does know about*, and warn you if it encounters modules it doesn't know about. You can already list fixes with: 2to3-3.1 -l

But what about packages with submodules? It seems hard to pin down exactly what's included with python. Or maybe there is an easy way to find out.

update: it is printed as a warning with python2.6 -3 -c "import new" . The types module is the one to use instead. A reminder to always run python2.6 -3 to warn you of things the 2to3 tool can not fix. Running python2.6 -3 on your current code base is good preparation for a python 3 future.
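As a rough guide (my own mapping, not something the 2to3 tool emits), the old new-module constructors line up with the types module like this - with new.classobj's closest stand-in being the builtin type:

```python
# Sketch of 'types'-module replacements for the removed 'new' module.
import types

mod = types.ModuleType("fresh")        # was new.module("fresh")
mod.answer = 42

Klass = type("Klass", (object,), {})   # was new.classobj("Klass", (object,), {})
obj = Klass()

def greet(self):
    return "hello from %s" % type(self).__name__

obj.say = types.MethodType(greet, obj) # was new.instancemethod(greet, obj, Klass)
print(obj.say())
```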

Wednesday, September 16, 2009

py3k(python3) more than one year on - 0.96 % packages supporting py3k.

Python3 was released more than a year ago now, and the release candidates and beta releases well before then.

How successful has the transition been to python3 so far?

One way to measure that is to look at how many python packages have been ported to py3k. One way to measure what packages are released is to look at the python package index(aka cheeseshop, aka pypi).

Not all packages ported to python3 are listed on pypi, and not all packages are listed on pypi. Also there are many packages which haven't been updated in a long time. However it is still a pretty good way to get a vague idea of how things are going.

73 packages are listed in the python3 section of pypi, and 7568 packages in total. That's 0.96% of python packages having been ported to python3.

Another large project index for python is the pygame.org website, where there are currently over 2000 projects which use pygame. I think there are 2 projects ported to python3 on there (but I can't find them at the moment). This shows a section of the python community using it in projects. Most of the things listed on pypi are packages - not projects. The project list shows what people are using for their projects - not what their libraries support. In a similar way, it could be good to see how many websites are running on top of python3. I think a lot of the people who have ported to python3 aren't really using it for their projects, but have done the porting work as a good will measure towards moving python forward.

Another way to measure the success of the migration, is to pick a random sampling of some popular packages and see if their python3 support exists.

Pygame(yes), Pyopengl(no), pyglet(no), numpy(no), PIL python imaging library(no), cherrypy(yes), Twisted(no), zope.interface(no), buildout(not sure, I think no), setuptools(no, patches available), django(no), plone(no), psyco(no), cython(yes), swig(yes), sqlalchemy(no).

With some packages being used by 1000s or 10,000s of projects, those popular projects hold back the py3k migration significantly. It would seem that some applied efforts to the right projects would help the py3k migration a lot. Perhaps a py3k porting team could be made to help port important libraries to py3k.

How about other python implementations supporting python3 features? None have full python3 support as of yet. For example jython(no), pypy(no), ironpython(no), tinypy(no), python-on-a-chip(no), unladenswallow(no), shedskin(no). However some implementations support some new python3 features.

How about wsgi? wsgi is the python specification for web gateways... that is, it specifies how different web frameworks, web servers and applications can talk to each other and out to http. The wsgiref module in python3 is somewhat broken, and the amendments for python3 have not made it into a new wsgi spec. However, work is being done towards it, with a couple of major wsgi users supporting python3 (cherrypy and mod_wsgi).

Another question to ask is: 'are many projects planning to support py3k soon? Or have they decided not to work on py3k at all yet?'. It seems many projects have decided not to put in the work yet. At this point, for many projects they don't see enough benefit towards moving to py3k. Or their dependencies have not been ported, and they are waiting on those to be ported before beginning to port themselves.

How well have the python developers themselves developed the support material for people upgrading their code? It looks like the cporting guide is still incomplete and hasn't been updated in a while. However, projects using the CPython API have taken up the slack... so there are now a number of extensions for people to look at for guidance. It's possible to make CPython extensions which support both 2.x and 3.x APIs.

There is now a 3to2 script being worked on. This allows projects to write their code in python3 code, and have it translated into python2 code. The python developers realised that having a 2to3 script was backwards in a way - requiring developers to stick with their python 2 code. However, many projects seem to not use the translation script, since it hasn't worked for them. Instead they seem to have either made separate branches, or made their code so that it works in both 2.x and 3.x.

Support for python3.0 was dropped, and python3.1 is the new python3. However python3.0 still exists in some distributions (like ubuntu).

So how are the various OS distributions going with their python3 support? The latest version of OSX to be released (snow leopard) ships python2.6.1. Most unix distributions are using python2.6 as their main python at the moment. However most of them have also packaged python3.x as well, so it's fairly easy for people to try out python3 alongside their python2.x installation(s). macports currently has py25 (286 ports), py26 (206 ports), py30 (0 ports... since py3.0 isn't supported any more), and py31 (4 ports). So macports has about 1.9% as many py31 ports as py26 ports. That's a similar percentage to the ratio of ported packages in the pypi index (0.96%).

This is not mirrored by the number of windows downloads from python.org. The Python2.6.2 windows installer had 786400 downloads, python2.5.4 (104291), python 3.0.1 (241363) and 3.1.1 (214871), for a total of 456234 for 3.x. That's around 58% comparing 2.6 and (3.0+3.1). Strangely, about the same number of people are downloading 3.0 as 3.1 - even though it states that 3.1 is the new py3k and 3.0 is not supported anymore. This is just windows download counts for August... if you compare it to most unix distributions, they almost all come with python2.5 or python2.6.

So, is the python3 migration going along swimmingly? Or has it failed to reach its goals(what were its goals if any)? What can we do to help? Should we even help at this point? comments?

Friday, September 11, 2009

The many JIT projects... Parrot plans to ditch its own JIT and move towards one using LLVM.

It seems LLVM has gained another user, in the parrot multi language VM project. They plan to ditch their current JIT implementation and start using LLVM.

Full details are on their jit wiki planning page. There is more of a discussion on parrot developer Andrew Whitworth's blog here and here.

Parrot aligns very nicely with the LLVM project which itself is attempting to be used by many language projects.

Along with the unladen swallow project(python using LLVM for JIT), this brings other dynamic languages in contact with LLVM. This can only mean good things for dynamically typed languages on top of LLVM.

Mac ruby is another project switching to LLVM - they have been working on it since March.

Rubinius seems to be another ruby implementation mostly written in ruby, and the rest in C++ with LLVM. It even supports C API modules written for the main ruby implementation. 'Rubinius is already faster than MRI on micro-benchmarks but is often slower than MRI running applications'.

Hopefully this will help LLVM become more portable, and faster at creating code... as well as being able to create larger amounts of code(LLVM only supports generating up to 16MB of code currently, but that limit is being worked on).

It's yet to be proven that a major dynamically typed language can be sped up nicely with LLVM, but these projects using it should help it get there.

Luajit for lua and psyco v2 [1] for python are both successful JIT projects for dynamic languages. However, both are limited in their platform support - only supporting 32bit x86. Other successfully JITed dynamically typed languages include the many javascript and actionscript implementations... including V8 (x86 32bit and arm) and tracemonkey (which uses nanojit, which supports many backends: arm, x86 64/32, sparc, ppc etc).

Luajit is compared to lua llvm here and here. It seems luajit 2 is faster than lua llvm, and the posts explain why. They also point out that LJ2 is faster than C speeds for some things.

pypy decided not to use LLVM either, and has embarked on making its own jit system. At one point there was code in the pypy svn repository to support LLVM, but it was removed a while ago. One comment in the past was that LLVM was too slow at generating code, and that it was a very large dependency. LLVM is C++ code that takes quite a while to compile itself, and the library is quite large.

Despite these downsides, LLVM can generate very efficient code - often comparable to the fastest generated code for the C language. This is one reason why the unladen swallow project has chosen it. The unladen swallow project's goal is to optimize long running server processes... so it doesn't care that much about generating fast math code, or about taking its time to generate native code. This makes sense, considering that it is a google sponsored project.

Another interesting project for python is the corepy project. It's a run time assembler for python. One thing corepy is used for is accelerating numpy operations - using SSE and multiple cores - so even though numpy is written in C, it can go much faster with the corepy accelerated version. The numcorepy blog lists the results of the project, including a 200,000 particle particle system done with numcorepy on the cpu(s).

In the same vein of accelerating numpy code - the pygpu, and pycuda projects make it possible to use GPU accelerated versions of numpy functions. This allows python code to run way faster than is possible on any available CPU. These projects generate shader code in C-like languages to run on the GPU. So in a way they are also JIT libraries.

liborc is a cross platform runtime assembler which, unlike many code generators, supports many vector operations. It's a replacement for liboil and is used by the gstreamer and dirac multimedia libraries.

Inferno is a virtual machine project which includes a JIT for many platforms.

[1] psyco v2 doesn't seem to have a web page yet, just a svn (not the old psyco v1 svn on sourceforge).

updates: from the comments... added note about mac ruby using llvm, the inferno vm, and the rubinius ruby using llvm. Added link to numcorepy project, and a link to pypy. Added some links to a comparison of lua llvm and luajit, and a link to lua llvm.

Thursday, September 10, 2009

Linux 2.6.31 released... the good bits.

The new linux kernel has been released. Here are the human readable changes.

Here's the cool stuff (the links in the original article were broken, so I've fixed the links here):
  1. USB 3 support
  2. CUSE (character devices in userspace) and OSS Proxy
  3. Improve desktop interactivity under memory pressure
  4. ATI Radeon Kernel Mode Setting support
  5. Performance Counters
  6. IEEE 802.15.4 Low-Rate Wireless Personal Area Networks support
  7. Gcov support
  8. Kmemcheck
  9. Kmemleak
  10. Fsnotify
  11. Preliminary NFS 4.1 client support
  12. Context Readahead algorithm and mmap readahead improvements

For me the performance counters will be the most useful thing. Also being able to use and write user space character devices is cool (especially for audio). USB3 support is awesome, but not useful right now... since there isn't much hardware out yet!

More info on what that low power wireless support is can be found on Wikipedia: IEEE_802.15.4-2006.

Tuesday, September 08, 2009

Dependency analysis, and a digression onto mock ducks.

Dependency analysis allows all sorts of fun things in software.

It can be used to reduce software defects. How? Say you have 10 components, and over time they bit rot, or change. Reducing dependencies on as many of the components as possible means you have less chance of encountering a bug. It also means you have less code to update or re-factor. Another reason is that combining multiple components together requires more testing... exponentially more testing (which is why unit tests are popular).

Performance can be improved with dependency analysis too - not just by reducing the amount of code run. If code doesn't have dependencies, it can be run in isolation. This is where some object oriented design is missing something: when methods change the state of an object internally, they have a dependency on that state, which makes task and data level parallelism harder.

Compare these two calls:
map(o.meth, data)
map(f, data)

If you had a dependency analysis module you could check what dependencies f and o.meth have. Then you could safely distribute them, and not require locking or anything else of that kind. Failing that, you can make sure they use locking or atomic operations... or you can manually make sure they do not have any dependencies.

Unfortunately method calls often change the state of an object, even when they don't need to. Say half way inside a method, it assigns something to self. Then you've changed the state of the object, and your code is not safe for distribution.
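A tiny made-up sketch of the difference: double depends only on its argument, so its calls are independent and could run anywhere in any order; Counter.double also assigns to self, so every call drags hidden state along.

```python
def double(x):               # no dependencies: safe to distribute
    return x * 2

class Counter:
    def __init__(self):
        self.seen = 0
    def double(self, x):     # mutates self: a hidden dependency
        self.seen += 1
        return x * 2

c = Counter()
pure = list(map(double, [1, 2, 3]))
impure = list(map(c.double, [1, 2, 3]))
# Both return [2, 4, 6], but c.seen is now 3: the second map had a
# side effect the first one didn't. Split those calls across
# processes and each worker's copy of c would silently diverge.
```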

What language features, or design ideas encourage the reduction of dependencies? Functional programming is one. Unit testing is another. Good packaging, and module systems are a third.

Duck typing is another one that can help reduce dependencies. However, it has problems too. Say you have a class like this:

class Neighbor:
    def use_your_duck(self, duck):
        self.number_of_feathers = duck.number_of_feathers

The issue is that the caller of Neighbor.use_your_duck isn't sure exactly what use_your_duck needs a duck for. Why does Neighbor need a whole duck, just to know how many feathers it has? By giving it the whole duck, you've created a dependency on ducks. Each time your neighbor needs to figure out how many feathers are on the duck, you need to give your neighbor the duck. What if instead you just count the feathers yourself, and give that number to the neighbor?

What if your duck changes its number of feathers? If it's important that your neighbor gets an accurate feather count, then they will want access to the duck. Letting your neighbor count the feathers itself means you have less work to do. This is why it's good to be able to give a reference to a duck... rather than just counting the feathers.

However, if your neighbor moves to Alaska, and you live in Buenos Aires - then you might have a problem sending them a duck every time they want to count the feathers. Now your neighbor has to fly over to pick up the duck, take it to Alaska and count the feathers... or just keep the duck in Alaska. Another option is for you to just tell your ex-Neighbor how many feathers the duck has over the phone. Your neighbor gives you a call, you go off to count the feathers... and call-back.

Or, you could make a mock-duck, and give your neighbor that. You could make it mostly like a duck... well, you could design this mock-duck forever trying to figure out what your neighbor is doing with your duck every day. So you plant a spy camera in your neighbor's basement... and note your neighbor only ever counts the feathers on the duck... never plucks your duck or does anything else to it. So it's safe for you to make your mock-duck with a bunch of feathers on it - and be fairly sure your neighbor will not break when borrowing your mock-duck.
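In code, the mock-duck is tiny (all names made up here). Since the spy camera showed the neighbor only ever reads number_of_feathers, a stand-in with just that attribute is enough:

```python
class MockDuck:
    # Only the attribute the neighbor was observed using.
    def __init__(self, feathers):
        self.number_of_feathers = feathers

class Neighbor:
    def use_your_duck(self, duck):
        self.number_of_feathers = duck.number_of_feathers

neighbor = Neighbor()
neighbor.use_your_duck(MockDuck(feathers=3000))
print(neighbor.number_of_feathers)
```

Duck typing means Neighbor never notices the swap - which is exactly the dependency-reduction trick.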

Anyway... enough typing about ducks for now.