Monday, 2 April 2007

libebook scalability

Ever had a monster-addressbook in Evolution? During some performance testing of libebook this weekend, I found that it behaves very badly with large addressbooks. A bit of digging revealed the cause - in the way it uses GList. When fetching multiple contacts from the standard file backend, g_list_append is used instead of g_list_prepend, which scales way better. A tiny patch fixes the problem.



The "libdb direct" numbers stem from a routine that bypasses evolution-data-server altogether and connects directly to libdb. Consider that almost the ultimate "time to beat", given the current database format. There is no doubt room for improvement.

All tests are run with a number of equal contacts with only name and email filled in. The database was stored on a tmpfs drive (ramdisk). For those who are interested, to create 1 million contacts take several hours.. I don't think that anyone sane will have tens of thousands of contacts, but for testing purposes, for batch processing and large enterprises, good scalability is a must. The address book has more potential for peformance improvements, but sooner or later the verbose data structure will be the limit.

I'm sure g_list_append is used in a similar way a lot of places around e-d-s and GNOME in general. A quick-win is to replace with g_list_prepend wherever n is large.

3 comments:

Unknown said...

Nice catch! An audit of the code base for similar cases would definitely be helpful. In some cases it might be more convenient to use a GQueue, which provides O(1) appends.

Jeffrey Stedfast said...

Evolution already has a List data-structure that does O(1) appends and O(1) prepends... it's called EDList and I believe it is in evolution-data-server's e-memory.h or something like that.

Øystein said...

"Something like that" then being e-msgport.{h,c}..

The question is whether to use GSList, GList, GQueue, or EDList.. Why isn't there a doubly-linked list with O(1) appends in glib btw?

Gee, I miss high-level languages..