<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:lj="https://www.livejournal.com">
  <id>urn:lj:livejournal.com:atom1:dmalcolm</id>
  <title>dmalcolm</title>
  <subtitle>dmalcolm</subtitle>
  <author>
    <name>dmalcolm</name>
  </author>
  <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/"/>
  <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom"/>
  <updated>2012-03-23T22:39:37Z</updated>
  <lj:journal userid="15358883" username="dmalcolm" type="personal"/>
  <link rel="service.feed" type="application/x.atom+xml" href="https://dmalcolm.livejournal.com/data/atom" title="dmalcolm"/>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:6935</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/6935.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=6935"/>
    <title>Autodetecting reference-counting bugs: the video!</title>
    <published>2012-03-23T22:39:37Z</published>
    <updated>2012-03-23T22:39:37Z</updated>
    <category term="fedora"/>
    <category term="gcc"/>
    <category term="python"/>
    <content type="html">I gave a talk at PyCon 2012 about my &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fgcc-python-plugin%2F" rel="nofollow" rel="nofollow"&gt;Python plugin for GCC&lt;/a&gt; - how this lowers the barrier for entry to potential GCC hackers, and how&amp;nbsp;&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FStaticAnalysisOfPythonRefcounts" rel="nofollow" rel="nofollow"&gt;I&amp;#39;ve been using this to find reference-counting errors in Python C extension modules&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A video of the talk can be seen here:&lt;br /&gt;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpyvideo.org%2Fvideo%2F648%2Fstatic-analysis-of-python-extension-modules-using" rel="nofollow" rel="nofollow"&gt;http://pyvideo.org/video/648/static-analysis-of-python-extension-modules-using&lt;/a&gt;&lt;br /&gt;or on YouTube &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dn6145JSeqWc" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The slides are &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2F%7Edmalcolm%2Fpresentations%2Fpycon-2012%2FStaticAnalysisOfPythonExtensionModulesUsingGcc.html" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;We had a mini-sprint after the talk, and another later on in the main&amp;nbsp;sprints, covering these topics:&lt;ul&gt;&lt;li&gt;getting the plugin to build on OS X (using MacPorts&amp;#39; build of&amp;nbsp;gcc-4.6.1): this works, but needed some compat patching around some of&amp;nbsp;the differences between glibc and OS X&amp;#39;s libc (also case-sensitivity of&amp;nbsp;filenames).&lt;/li&gt;&lt;li&gt;improving the static analysis engine: currently it takes the&amp;nbsp;simplistic approach of trying to generate all traces of execution&amp;nbsp;through the function in a big tree, which suffers from 
exponential&amp;nbsp;explosions.&amp;nbsp; I&amp;#39;m hoping the insides can be reworked to implement an&amp;nbsp;iterative solver that can handle loops more robustly (data flow&amp;nbsp;equations)&lt;/li&gt;&lt;li&gt;improving the HTML error reports that the analyzer generates. &amp;nbsp;More on this to follow!&lt;/li&gt;&lt;/ul&gt;I&amp;#39;ve also finally succumbed to the inevitable and joined GitHub: you can &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Fgithub.com%2Fdavidmalcolm%2Fgcc-python-plugin" rel="nofollow" rel="nofollow"&gt;fork the plugin here&lt;/a&gt;&amp;nbsp;(this is just a clone I keep synchronized with &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgit.fedorahosted.org%2Fgit%2F%3Fp%3Dgcc-python-plugin.git%3Ba%3Dsummary" rel="nofollow" rel="nofollow"&gt;the Fedora Hosted repository&lt;/a&gt;)</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:6869</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/6869.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=6869"/>
    <title>Adding a spellchecking pass to GCC, using Python</title>
    <published>2011-09-30T18:16:10Z</published>
    <updated>2011-09-30T18:16:10Z</updated>
    <category term="fedora"/>
    <category term="gcc"/>
    <category term="python"/>
    <content type="html">LWN recently posted an &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Flwn.net%2FArticles%2F457543%2F" rel="nofollow" rel="nofollow"&gt;interesting article on writing GCC plugins&lt;/a&gt;, showing how to add a spellchecking pass to the compiler.&lt;br /&gt;&lt;br /&gt;I thought this was a neat example, so I had a go at porting it from C to use my &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FGccPythonPlugin" rel="nofollow" rel="nofollow"&gt;Python plugin for GCC&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The result is about 35 lines of Python code (including comments), which you can see &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Freadthedocs.org%2Fdocs%2Fgcc-python-plugin%2Fen%2Flatest%2Fworking-with-c.html%23spell-checking-string-constants-within-source-code" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I hope the code is easy to read. &amp;nbsp;One other thing this demonstrates is how by extending the compiler in Python you have access to the whole of &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpypi.python.org%2Fpypi" rel="nofollow" rel="nofollow"&gt;the Python packaging ecosystem&lt;/a&gt; - I was able to use the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpackages.python.org%2Fpyenchant%2F" rel="nofollow" rel="nofollow"&gt;&amp;quot;enchant&amp;quot; spellchecking library&lt;/a&gt; (originally from the&amp;nbsp;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fwww.abisource.com%2F" rel="nofollow" rel="nofollow"&gt;AbiWord&lt;/a&gt; project) with about 4 lines of code.&lt;br /&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:6560</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/6560.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=6560"/>
    <title>Automatically detecting reference-count bugs in Python extension modules</title>
    <published>2011-07-15T21:32:30Z</published>
    <updated>2011-07-15T21:38:21Z</updated>
    <category term="fedora"/>
    <category term="gcc"/>
    <category term="python"/>
    <content type="html">[ For the tl;dr version, scroll down to see the pretty screenshots :) ]&lt;br /&gt;&lt;br /&gt;I've been working on &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fgcc-python-plugin%2F" rel="nofollow" rel="nofollow"&gt;a static analysis tool to automatically detect reference-count bugs in C Python extension code&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(see my earlier posts on verifying calls to the PyArg_ParseTuple API &lt;a href="http://dmalcolm.livejournal.com/5931.html" target="_blank"&gt;here&lt;/a&gt; and &lt;a href="http://dmalcolm.livejournal.com/6364.html" target="_blank"&gt;here&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;Mismanaging reference counts can lead to the Python process leaking memory (and other resources) when an object becomes immortal, or segfaulting when an object is cleaned up while things still refer to it.&lt;br /&gt;&lt;br /&gt;My "cpychecker" code is still an early prototype (don't expect to use it on arbitrary C code yet), but here's an example of some of the things it's already capable of:&lt;br /&gt;&lt;br /&gt;Can you see the reference-counting error in this (contrived) code fragment?&lt;br /&gt;&lt;pre&gt;
    22	PyObject *
    23	refcount_demo(PyObject *self, PyObject *args)
    24	{
    25	    PyObject *list;
    26	    PyObject *item;
    27	    list = PyList_New(1);
    28	    if (!list)
    29	        return NULL;
    30	
    31	    item = PyLong_FromLong(42);
    32	    if (!item)
    33	        return NULL;
    34	
    35	    PyList_SetItem(list, 0, item);
    36	    return list;
    37	}
    38	
    39	static PyMethodDef test_methods[] = {
    40	    {"refcount_demo",  refcount_demo, METH_VARARGS, NULL},
    41	    {NULL, NULL, 0, NULL} /* Sentinel */
    42	};
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Compiling like this:&lt;br /&gt;&lt;pre&gt;
  [david@fedora-15 gcc-plugin]$ ./gcc-with-python cpychecker.py -I/usr/include/python2.7 refcount-demo.c
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;the checker adds this output to gcc's:&lt;br /&gt;&lt;pre&gt;
refcount-demo.c: In function ‘refcount_demo’:
refcount-demo.c:37:1: error: ob_refcnt of PyListObject is 1 too high
refcount-demo.c:27:10: note: PyListObject allocated at:     list = PyList_New(1);
refcount-demo.c:27:10: note: when PyList_New() succeeds at:     list = PyList_New(1);
refcount-demo.c:28:8: note: when taking False path at:     if (!list)
refcount-demo.c:31:10: note: reaching:     item = PyLong_FromLong(42);
refcount-demo.c:31:10: note: when PyLong_FromLong() fails at:     item = PyLong_FromLong(42);
refcount-demo.c:32:8: note: when taking True path at:     if (!item)
refcount-demo.c:33:9: note: reaching:         return NULL;
refcount-demo.c:37:1: note: when returning
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;which can be navigated in any IDE that can parse GCC's output messages (&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2F%7Edmalcolm%2Fblog%2F2011-07-15%2Frefcount-demo-textual-errors.png" rel="nofollow" rel="nofollow"&gt;works for me in emacs&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;This demonstrates a particular path of execution that has a bug.&lt;br /&gt;&lt;br /&gt;I found the textual output a bit heavy on the eye, so I've hacked up the plugin script so it can render graphical HTML visualizations of the errors that it finds.&lt;br /&gt;&lt;br /&gt;Here's that same report, in HTML form:&lt;br /&gt;&lt;br /&gt;  &lt;img src="https://imgprx.livejournal.net/5b864aa2dd285f2de822d432158df0944856492c/lRdmkRy0lyEKoQaqOUgcDnnGpXMe4qtmRlfyiRYmHhPzRbkJw6JQCUMHw-I7Xsv3vwkE4xJyaD5VUGjWSmVTucPFHXTQqAIAtQlPhoGRsrivndmV8TXwAHwj4RjyOWWO" fetchpriority="high" /&gt;&lt;br /&gt;&lt;br /&gt;The report shows the control flow through the function: lines that get executed are written in bold and outlined in blue, with arrows connecting them, and additional annotations in italics.  (I'm not so good at HTML/CSS, so help here would be most welcome!).&lt;br /&gt;&lt;br /&gt;I used the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fjsplumb.org%2F" rel="nofollow" rel="nofollow"&gt;jsplumb&lt;/a&gt; JavaScript library to add lines to the HTML to link together elements.  This uses the newish &amp;lt;canvas&amp;gt; element, so the control-flow lines may only appear in recent browsers.  It works for me in Chromium 12 and Firefox 4.  
You can see the HTML report itself here:&lt;br /&gt;  &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2F%7Edmalcolm%2Fblog%2F2011-07-15%2Frefcount_demo-refcount-errors.html' rel='nofollow'&gt;http://fedorapeople.org/~dmalcolm/blog/2011-07-15/refcount_demo-refcount-errors.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(Currently it's hardcoded to generate the reports, but I'll probably add something like a -fdump-html command-line option to the gcc-with-cpychecker harness).&lt;br /&gt;&lt;br /&gt;Here are some more examples:&lt;br /&gt;&lt;br /&gt;  Detecting the all-too-common: "return Py_None;" bug:&lt;br /&gt;    &lt;img src="https://imgprx.livejournal.net/8629df6c406ec7e309161f1f24d4e924eef5f1f7/lRdmkRy0lyEKoQaqOUgcDnnGpXMe4qtmRlfyiRYmHhPzRbkJw6JQCUMHw-I7Xsv3vwkE4xJyaD5VUGjWSmVTuXEes_8TzW07fEkrlNoPB1CWHjeZijNdPfEfPhQmnzdA" loading="lazy" /&gt;&lt;br /&gt;    As HTML: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2F%7Edmalcolm%2Fblog%2F2011-07-15%2Flosing_refcnt_of_none-refcount-errors.html' rel='nofollow'&gt;http://fedorapeople.org/~dmalcolm/blog/2011-07-15/losing_refcnt_of_none-refcount-errors.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;  Another (very contrived) reference leak:&lt;br /&gt;    &lt;img src="https://imgprx.livejournal.net/e1443da567c49a2eaa69eb04132d34dac1f9cf7a/lRdmkRy0lyEKoQaqOUgcDnnGpXMe4qtmRlfyiRYmHhPzRbkJw6JQCUMHw-I7Xsv3vwkE4xJyaD5VUGjWSmVTuWHsv_9VY0YdyXBnH4E-A-Qv_78YDF7QqeHMF1fofUvz" loading="lazy" /&gt;&lt;br /&gt;    As HTML: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2F%7Edmalcolm%2Fblog%2F2011-07-15%2Fobject_leak-refcount-errors.html' rel='nofollow'&gt;http://fedorapeople.org/~dmalcolm/blog/2011-07-15/object_leak-refcount-errors.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;  Detecting a stray Py_INCREF that makes the reference count too high, or segfaults python, depending on what happened earlier:&lt;br /&gt;    &lt;img 
src="https://imgprx.livejournal.net/f69298e7239226a327df91539806e0f38660b58e/lRdmkRy0lyEKoQaqOUgcDnnGpXMe4qtmRlfyiRYmHhPzRbkJw6JQCUMHw-I7Xsv3vwkE4xJyaD5VUGjWSmVTuVPXSzOW-0qEPKU4Em_lOBbGU9vuYdBgVUe9OFIr8L9n" loading="lazy" /&gt;&lt;br /&gt;    As HTML: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2F%7Edmalcolm%2Fblog%2F2011-07-15%2Ftoo_many_increfs-refcount-errors.html' rel='nofollow'&gt;http://fedorapeople.org/~dmalcolm/blog/2011-07-15/too_many_increfs-refcount-errors.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This is still an experimental prototype, so it's not yet ready for general purpose use, but I'm frantically working on it, and I hope it will be ready in time for &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FStaticAnalysisOfCPythonExtensions" rel="nofollow" rel="nofollow"&gt;inclusion in Fedora 16&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The checker is &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFoundations%23Freedom" rel="nofollow" rel="nofollow"&gt;Free Software&lt;/a&gt; (licensed under GPLv3 or later), and if you want to get involved, go to &lt;a href='https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fpipermail%2Fgcc-python-plugin%2F' rel='nofollow'&gt;https://fedorahosted.org/pipermail/gcc-python-plugin/&lt;/a&gt; (as I said above, I could really use some help with HTML and CSS!  The checker is written in Python itself, if you're interested in &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgit.fedorahosted.org%2Fgit%2F%3Fp%3Dgcc-python-plugin.git%3Ba%3Dsummary" rel="nofollow" rel="nofollow"&gt;hacking on the code&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;(Thanks to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fwww.redhat.com%2F" rel="nofollow" rel="nofollow"&gt;Red Hat&lt;/a&gt; for allowing me to spend a substantial proportion of my $DAYJOB on this)</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:6364</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/6364.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=6364"/>
    <title>Verifying the more awkward parts of the CPython API</title>
    <published>2011-06-25T00:45:13Z</published>
    <updated>2011-06-25T00:45:13Z</updated>
    <category term="fedora"/>
    <category term="gcc"/>
    <category term="python"/>
    <content type="html">I've been running my &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fgcc-python-plugin%2F" rel="nofollow" rel="nofollow"&gt;Python extension module static analyser&lt;/a&gt; over CPython itself (the latest in the 2.7 hg branch, specifically).&lt;br /&gt;&lt;br /&gt;I'm pleased to say that &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fmailman%2Flistinfo%2Fgcc-python-plugin" rel="nofollow" rel="nofollow"&gt;the project's mailing list&lt;/a&gt; received the &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fpipermail%2Fgcc-python-plugin%2F2011-June%2F000007.html" rel="nofollow" rel="nofollow"&gt;first patch to the checker&lt;/a&gt; from someone other than me (Thanks Tom!)  - He's been running it over the sources of gdb (which embeds python).&lt;br /&gt;&lt;br /&gt;These make for good torture tests for the analyser, and I'm pleased with how far it survived.  Some bugs do remain - in the checker, that is.&lt;br /&gt;&lt;br /&gt;There are quite a few places where CPython calls PyArg_ with a code expecting a "const char*", but receives a "char*".  I think the ones in CPython are all false positives, so I think we're going to need to make that configurable.&lt;br /&gt;&lt;br /&gt;Perhaps the most fiddly part of checking this API is the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fdocs.python.org%2Fc-api%2Farg.html%23parsing-arguments-and-building-values" rel="nofollow" rel="nofollow"&gt;"O&amp;" conversion code&lt;/a&gt; - I wasn't able to handle this in my &lt;a href="http://dmalcolm.livejournal.com/3689.html" target="_blank"&gt;previous Coccinelle-based approach&lt;/a&gt; to this problem.&lt;br /&gt;&lt;br /&gt;Here's an example:&lt;br /&gt;&lt;pre&gt;
    68	
    69	extern int convert_to_ssize(PyObject *, Py_ssize_t *);
    70	
    71	PyObject *
    72	buggy_converter(PyObject *self, PyObject *args)
    73	{
    74	    int i;
    75	
    76	    if (!PyArg_ParseTuple(args, "O&amp;", convert_to_ssize, &amp;i)) {
    77	        return NULL;
    78	    }
    79	
    80	    Py_RETURN_NONE;
    81	}
    82	
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The idea is that you're meant to supply a conversion callback, which can extract a value back to the next argument.&lt;br /&gt;&lt;br /&gt;The above example has a bug (can you see it?)&lt;br /&gt;&lt;br /&gt;After &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgit.fedorahosted.org%2Fgit%2F%3Fp%3Dgcc-python-plugin.git%3Ba%3Dcommit%3Bh%3Dc00b06553d71d2d52c128b3d63f46308dfb87970" rel="nofollow" rel="nofollow"&gt;a fair amount of coding today&lt;/a&gt;, the checker is now able to detect it:&lt;br /&gt;&lt;br /&gt;[david@fedora-15 gcc-python]$ ./gcc-with-python cpychecker.py $(python-config --cflags) demo.c &lt;br /&gt;&lt;pre&gt;
demo.c: In function ‘buggy_converter’:
demo.c:76:26: error: Mismatching type in call to PyArg_ParseTuple with format code "O&amp;" [-fpermissive]
  argument 4 ("&amp;i") had type
    "int *" (pointing to 32 bits)
  but was expecting
    "Py_ssize_t *" (pointing to 64 bits) (from second argument of "int (*fn) (struct PyObject *, Py_ssize_t *)")
  for format code "O&amp;"
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Notice how it used the type of the callback to figure out what the type of the next argument must be.&lt;br /&gt;&lt;br /&gt;I've also &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgit.fedorahosted.org%2Fgit%2F%3Fp%3Dgcc-python-plugin.git%3Ba%3Dcommitdiff%3Bh%3D63ea6c0548da9869b15b87f41abf47bb4ec6f791" rel="nofollow" rel="nofollow"&gt;reformatted the error messages slightly&lt;/a&gt;, adding newlines and indentation to try to make them easier to grok.&lt;br /&gt;&lt;br /&gt;Hopefully we'll shake out the rest of the bugs soon, and then on to reference-count checking...&lt;br /&gt;&lt;br /&gt;It's still rather rough around the edges, but if you want to try running it on your extension module, then &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fmailman%2Flistinfo%2Fgcc-python-plugin" rel="nofollow" rel="nofollow"&gt;come and join us&lt;/a&gt;.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:5931</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/5931.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=5931"/>
    <title>Static analysis of CPython extensions, using a new GCC plugin</title>
    <published>2011-06-23T21:50:55Z</published>
    <updated>2011-06-23T21:50:55Z</updated>
    <category term="fedora"/>
    <category term="gcc"/>
    <category term="python"/>
    <content type="html">I've been looking at ways to improve the quality of Python extensions written in C.&lt;br /&gt;&lt;br /&gt;CPython provides a &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fdocs.python.org%2Fextending%2Findex.html" rel="nofollow" rel="nofollow"&gt;great C API&lt;/a&gt; that makes it relatively easy to integrate C and C++ libraries with Python code.  We use it extensively within &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2F" rel="nofollow" rel="nofollow"&gt;Fedora&lt;/a&gt; -  for example, &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FAnaconda" rel="nofollow" rel="nofollow"&gt;Fedora's installation program is written in Python&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But you do have to write such code carefully:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;  &lt;li&gt;you have to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fdocs.python.org%2Fextending%2Fextending.html%23reference-counts" rel="nofollow" rel="nofollow"&gt;correctly keep track of reference counts in your objects&lt;/a&gt;.  If you get this wrong, you can segfault the interpreter, or introduce a memory leak.&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;some APIs use a format string, with C variable-length arguments (see e.g. &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fdocs.python.org%2Fc-api%2Farg.html" rel="nofollow" rel="nofollow"&gt;PyArg_ParseTuple and its variants&lt;/a&gt;).  If the C compiler doesn't know the rules, it can't enforce type-safety.  This can lead to people accidentally writing architecture-specific code (more on this below)&lt;/li&gt;&lt;br /&gt;  &lt;li&gt;like any API, function calls can fail.  
This seems to be a universal rule of computer programming: it's tricky to correctly handle all the errors that can occur - bugs tend to lurk in the error-handling cases&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;I want to make it easier for people to write correct Python extension code, so I've been looking at static analysis.&lt;br /&gt;&lt;br /&gt;None of the existing tools seemed to do exactly what I wanted, and given that all of my work is done with GCC, I wanted a solution that was well integrated with GCC.  I also wanted to be able to use Python itself to work on the tool.  (I attempted some of this a while back &lt;a href="http://dmalcolm.livejournal.com/3689.html" target="_blank"&gt;with Coccinelle&lt;/a&gt;, but I use GCC, so I wanted to embed the checking directly into GCC).&lt;br /&gt;&lt;br /&gt;So I've written a &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FGccPythonPlugin" rel="nofollow" rel="nofollow"&gt;GCC plugin that embeds Python&lt;/a&gt; within that compiler.  This means that it's now possible to write new C and C++ compilation passes in Python, and use Python packages for things like syntax-highlighting, visualization, and so on.&lt;br /&gt;&lt;br /&gt;That's the theory, anyway.  The code is still fairly new, so &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Freadthedocs.org%2Fdocs%2Fgcc-python-plugin%2Fen%2Flatest%2Findex.html" rel="nofollow" rel="nofollow"&gt;I've only wrapped a small subset of GCC's types and APIs&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I've started using this to write &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Freadthedocs.org%2Fdocs%2Fgcc-python-plugin%2Fen%2Flatest%2Fcpychecker.html" rel="nofollow" rel="nofollow"&gt;a static analyser for CPython extension code&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's an example of what it can do so far...&lt;br /&gt;&lt;br /&gt;Given this fragment of C code:&lt;br /&gt;&lt;pre&gt;
    24	
    25	PyObject *
    26	socket_htons(PyObject *self, PyObject *args)
    27	{
    28	    unsigned long x1, x2;
    29	
    30	    if (!PyArg_ParseTuple(args, &amp;quot;i:htons&amp;quot;, &amp;amp;x1)) {
    31	        return NULL;
    32	    }
    33	    x2 = (int)htons((short)x1);
    34	    return PyInt_FromLong(x2);
    35	}
    36	
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;there's a bug:  at line 30, the &amp;quot;i&amp;quot; code to PyArg_ParseTuple signifies an &amp;quot;int&amp;quot;, but it's being passed an &amp;quot;unsigned long&amp;quot; from line 28 (via a pointer) to write back its result to.  This will &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsvn.python.org%2Fview%3Fview%3Drevision%26revision%3D34931" rel="nofollow" rel="nofollow"&gt;break badly on a big-endian 64-bit CPU&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;First of all, we can use the Python support in the compiler to visualize the code:&lt;br /&gt;&lt;pre&gt;
[david@fedora-15 gcc-python]$ ./gcc-with-python show-ssa.py -I/usr/include/python2.7 demo.c
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Here's the output.  This visualization shows the basic blocks of code, with source code on the left, interleaved with GCC's internal representation on the right:&lt;br /&gt;&lt;img width="640" height="519" alt="SVG rendering of the control-flow graph of the given function" src="https://imgprx.livejournal.net/2b967fa5cfd23106b7768f7b48296b68a13bd47c/lRdmkRy0lyEKoQaqOUgcDn3mzMNWIFvjNnwsOOz6Pi57PE0fadCevilTY-EGf1opVCfO2UyX3DoAhAVFLBtZkMTkr3vxOIGZcquvFFws4VM" fetchpriority="high" /&gt;&lt;br /&gt;&lt;br /&gt;(If you're wondering what the "PHI&amp;lt;&amp;gt;" functions mean in the above, this is actually showing the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FStatic_single_assignment_form" rel="nofollow" rel="nofollow"&gt;SSA representation&lt;/a&gt; after some of GCC's analysis and optimization passes have already happened).&lt;br /&gt;&lt;br /&gt;Given that this is Python, it's really easy to write new visualizations.&lt;br /&gt;&lt;br /&gt;I've also written the first new compiler warnings using the Python plugin.&lt;br /&gt;&lt;br /&gt;Here's the output from compiling that C code using my &amp;quot;cpychecker.py&amp;quot; script to add new warnings:&lt;br /&gt;&lt;pre&gt;
[david@fedora-15 gcc-python]$ ./gcc-with-python cpychecker.py $(python-config --cflags) demo.c 
demo.c: In function &amp;lsquo;socket_htons&amp;rsquo;:
demo.c:30:26: error: Mismatching type in call to PyArg_ParseTuple with format code &amp;quot;i:htons&amp;quot; [-fpermissive]
  argument 3 (&amp;quot;&amp;amp;x1&amp;quot;) had type &amp;quot;long unsigned int *&amp;quot; (pointing to 64 bits)
  but was expecting &amp;quot;int *&amp;quot; (pointing to 32 bits) for format code &amp;quot;i&amp;quot;
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I've tried to make the new error message readable, containing as much information as possible.&lt;br /&gt;&lt;br /&gt;Any ideas on how to improve this?&lt;br /&gt;&lt;br /&gt;I'm now working frantically on implementing reference-count checking :)&lt;br /&gt;&lt;br /&gt;I hope that I'll be able to get this into a working state in time for Fedora 16: I'd like to &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FStaticAnalysisOfCPythonExtensions" rel="nofollow" rel="nofollow"&gt;run all of the C Python extension code in the Fedora distribution through a checker&lt;/a&gt;, but I need to do a lot of polishing before it's ready!&lt;br /&gt;&lt;br /&gt;The code is free software (GPLv3 or later), and you can grab it from this git repository:&lt;br /&gt;&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fgit.fedorahosted.org%2Fgit%2F%3Fp%3Dgcc-python-plugin.git%3Ba%3Dsummary' rel='nofollow'&gt;http://git.fedorahosted.org/git/?p=gcc-python-plugin.git;a=summary&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm using this Trac instance for bug tracking:&lt;br /&gt;&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fgcc-python-plugin%2F' rel='nofollow'&gt;https://fedorahosted.org/gcc-python-plugin/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Anyone got ideas for other uses for this?  Visualizations of code?  New compiler warnings?  Remember, this thing's built on top of GCC, so (in theory) it can handle anything that GCC can handle e.g. 
C++ templates, Java, Fortran, and so on.&lt;br /&gt;&lt;br /&gt;If you want to get involved, or want more information, there's a mailing list here:&lt;br /&gt;&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fmailman%2Flistinfo%2Fgcc-python-plugin' rel='nofollow'&gt;https://fedorahosted.org/mailman/listinfo/gcc-python-plugin&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Thanks to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fwww.redhat.com%2F" rel="nofollow" rel="nofollow"&gt;Red Hat&lt;/a&gt; for supporting the development of this software! (and for general awesomeness); thanks also to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Freadthedocs.org%2F" rel="nofollow" rel="nofollow"&gt;Read the Docs&lt;/a&gt; for providing a nifty hosting service for free software API documentation.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:5782</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/5782.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=5782"/>
    <title>PyCon US talks on memory usage and gdb</title>
    <published>2011-03-13T06:20:48Z</published>
    <updated>2011-03-13T06:20:48Z</updated>
    <category term="fedora"/>
    <category term="python"/>
    <content type="html">Various people have been asking for the slides to my &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fus.pycon.org%2F2011%2Fhome%2F" rel="nofollow" rel="nofollow"&gt;PyCon US&lt;/a&gt;&amp;nbsp;talks, so here goes.&lt;br /&gt;&lt;br /&gt;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fus.pycon.org%2F2011%2Fschedule%2Fpresentations%2F25%2F" rel="nofollow" rel="nofollow"&gt;&amp;quot;Dude, Where's My RAM?&amp;quot; - A deep dive into how Python uses memory&lt;/a&gt;&lt;br /&gt;&lt;p&gt;Slides can be found &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.fedoraproject.org%2F%7Edmalcolm%2Fpresentations%2FPyCon-US-2011%2FMemoryUsage.odp" rel="nofollow" rel="nofollow"&gt;here in ODP format&lt;/a&gt;&amp;nbsp;(2.7M) and &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.fedoraproject.org%2F%7Edmalcolm%2Fpresentations%2FPyCon-US-2011%2FMemoryUsage.pdf" rel="nofollow" rel="nofollow"&gt;here as a PDF&lt;/a&gt;&amp;nbsp;(2.5M)&lt;/p&gt;   &lt;p&gt;To answer some of the questions asked about the &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fgdb-heap%2F" rel="nofollow" rel="nofollow"&gt;gdb-heap&lt;/a&gt; tool:&lt;/p&gt;      &lt;ul&gt;&lt;li&gt;I've used it with both Python 2.6 and Python 2.7 as the Python running inside gdb&lt;/li&gt;&lt;li&gt;I've used it to analyze the memory usage of Python 2.4, 2.6, 2.7, 3.1 and 3.2&lt;/li&gt;&lt;li&gt;If anyone wants to work on extending this (e.g. 
adding PyPy support), I'm attending the sprints until Thursday&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-size: larger; "&gt;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fus.pycon.org%2F2011%2Fschedule%2Fpresentations%2F27%2F" rel="nofollow" rel="nofollow"&gt;Using Python to debug C and C++ code (using gdb)&lt;br /&gt;&lt;/a&gt;&lt;/span&gt;The slides can be seen &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.fedoraproject.org%2F%7Edmalcolm%2Fpresentations%2FPyCon-US-2011%2FGdbPythonPresentation%2FGdbPython.html" rel="nofollow" rel="nofollow"&gt;here&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;(BTW, these were built using &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fseld.be%2Fnotes%2Fintroducing-slippy-html-presentations" rel="nofollow" rel="nofollow"&gt;slippy&lt;/a&gt;, which I like a lot. &amp;nbsp;I first tried using rst and rst2s5 but ran into lots of aspect ratio issues when trying it out on a projector, so I fixed up the s5 html into slippy's html. &amp;nbsp;Thanks to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fnedbatchelder.com%2F" rel="nofollow" rel="nofollow"&gt;Ned Batchelder&lt;/a&gt; for showing me slippy)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:5597</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/5597.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=5597"/>
    <title>FUDcon talk: Different Species of Python</title>
    <published>2011-01-30T01:46:50Z</published>
    <updated>2011-02-08T18:32:50Z</updated>
    <category term="fedora"/>
    <category term="python"/>
    <content type="html">I'm at &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FFUDCon%3ATempe_2011" rel="nofollow" rel="nofollow"&gt;FUDCon in Tempe, Arizona&lt;/a&gt;. &amp;nbsp;Today I gave a talk entitled &amp;quot;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.fedoraproject.org%2F%7Edmalcolm%2Fpresentations%2FFUDcon%202011%20-%20Different%20Species%20Of%20Python.pdf" rel="nofollow" rel="nofollow"&gt;Different Species of Python&lt;/a&gt;&amp;quot;, comparing the various implementations of the Python language, and how &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FSIGs%2FPython" rel="nofollow" rel="nofollow"&gt;we&lt;/a&gt; might better support them within &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2F" rel="nofollow" rel="nofollow"&gt;Fedora&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I hope this will be of interest to both the folks on the&amp;nbsp;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fplanet.fedoraproject.org%2F" rel="nofollow" rel="nofollow"&gt;Planet Fedora&lt;/a&gt;&amp;nbsp;and the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fplanet.python.org%2F" rel="nofollow" rel="nofollow"&gt;Planet Python&lt;/a&gt; aggregators.&lt;br /&gt;&lt;br /&gt;I ran overtime a bit: lots of interesting questions and discussion on the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsvn.python.org%2Fview%2Fpython%2Fbranches%2Fpy3k%2F" rel="nofollow" rel="nofollow"&gt;internals of the two CPython implementations&lt;/a&gt;, so I didn't get to cover Jython, IronPython and &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fcodespeak.net%2Fpipermail%2Fpypy-dev%2F2011q1%2F006673.html" rel="nofollow" rel="nofollow"&gt;PyPy&lt;/a&gt; as much as I might have done, or the packaging issues, but hopefully a useful time was had by all.&lt;br /&gt;&lt;br /&gt;I've uploaded a copy of my slides to my Fedora 
People page, as &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.fedoraproject.org%2F%7Edmalcolm%2Fpresentations%2FFUDcon%202011%20-%20Different%20Species%20Of%20Python.pdf" rel="nofollow" rel="nofollow"&gt;PDF&lt;/a&gt; here and as &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.fedoraproject.org%2F%7Edmalcolm%2Fpresentations%2FFUDcon%202011%20-%20Different%20Species%20Of%20Python.odp" rel="nofollow" rel="nofollow"&gt;ODF&lt;/a&gt; here - enjoy!&lt;br /&gt;&lt;br /&gt;UPDATE:&lt;br /&gt;Tim Flink heroically transcribed the talk and questions for those on IRC. &amp;nbsp;A transcript can be found &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ftflink.fedorapeople.org%2FFUDConNA2011%2FDifferentSpeciesOfPython_FUDConNA2011_DavidMalcom_IRCtranscript.txt" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:5240</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/5240.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=5240"/>
    <title>Python 3 support for GTK via PyGI: proof-of-concept</title>
    <published>2010-04-20T23:47:10Z</published>
    <updated>2010-04-20T23:47:10Z</updated>
    <category term="fedora"/>
    <category term="python"/>
    <content type="html">&lt;img alt="Screenshot showing GTK window created by Python 3 via PyGI" width="800" height="600" src="https://imgprx.livejournal.net/fccbd7bdb613a57752a3e20628fc43dfd0cf7509/lRdmkRy0lyEKoQaqOUgcDn3mzMNWIFvjNnwsOOz6Pi57PE0fadCevilTY-EGf1opHmAqDy9Chc-AdRyHPdnSpymNiVDQQ3MoVcxPqEUPSYiBYvV0uXQ9r_93xpoU7tnr" fetchpriority="high" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I spent much of last week taking part in a &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Flive.gnome.org%2FHackfests%2FPython2010" rel="nofollow" rel="nofollow"&gt;GTK/Python hackfest&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;My particular interest is in Python 3 support, so I spent some time helping John Ehresman clean up pygobject, and some time with John Palmieri on PyGI.&lt;br /&gt;&lt;br /&gt;I'm somewhat new to gobject-introspection, so I created an ASCII-art diagram to try to help figure out how it works:&lt;br /&gt;&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Flive.gnome.org%2FGObjectIntrospection%2FArchitecture" rel="nofollow" rel="nofollow"&gt;http://live.gnome.org/GObjectIntrospection/Architecture&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In the old GTK approach to binding native libraries for use by other language runtimes (such as Python's), a .defs file provided metadata on the API, which had to be kept in sync with the code.  An example can be seen &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgit.gnome.org%2Fbrowse%2Fpygtk%2Ftree%2Fatk.defs" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;. For Python, this was used to generate .c code which could then be compiled into an extension module.  
A problem with this approach is that you typically need to write numerous &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgit.gnome.org%2Fbrowse%2Fpygtk%2Ftree%2Fgtk%2Fgtktoolbar.override" rel="nofollow" rel="nofollow"&gt;.overrides files&lt;/a&gt; to handle the awkward cases, and these are specific to Python.&lt;br /&gt;&lt;br /&gt;This means that for every N libraries (gtk, atk, gstreamer, dbus, telepathy, etc) and M runtimes (python 2, python 3, javascript, java, etc) you potentially need to handle N*M bindings, and each one could involve special-casing.&lt;br /&gt;&lt;br /&gt;In the new approach, "gobject-introspection" defines a simple textual format for source-code comments, containing similar information to a .defs file, but (I hope) rich enough to handle more of the special cases.  This is scraped from the source into an XML file (e.g. Foo.gir), then compiled into an efficient binary format (e.g. Foo.typelib) which can be mapped into memory at runtime using a library (libgirepository.so).&lt;br /&gt;&lt;br /&gt;The idea is that each language runtime can define a bridge, which calls into libgirepository, mapping between the language and the various libraries, using libffi, dynamically lazy-creating the glue code.&lt;br /&gt;&lt;br /&gt;My hope is that we now need to handle only N + M cases, and so hopefully we have less of a combinatorial explosion, and potentially, faster start-up times and less RAM usage.&lt;br /&gt;&lt;br /&gt;I've now got a py3k branch of PyGI in a Fedora git repository &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2Fgitweb%3Fp%3Ddmalcolm%2Fpublic_git%2Fpygi.git%3Ba%3Dsummary" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(the tracking bug is &lt;a href='https://www.livejournal.com/away?to=https%3A%2F%2Fbugzilla.gnome.org%2Fshow_bug.cgi%3Fid%3D615872' rel='nofollow'&gt;https://bugzilla.gnome.org/show_bug.cgi?id=615872&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;This code is 
able to be compiled against both Python 2.6 and Python 3.1 (in each case using the py3k branch of pygobject), and all tests pass with both versions of Python.&lt;br /&gt;&lt;br /&gt;I can now write something like this:&lt;br /&gt;&lt;pre&gt;
from gi.repository import Gtk
w = Gtk.Window()
w.set_title('\u6587\u5b57\u5316\u3051') # 3 Kanji and a hiragana: "Mojibake"
b = Gtk.Button()
b.set_label('I am a GtkButton created by Python 3 via PyGI')
w.add(b)
w.show_all()
Gtk.main()
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;and have Python 3 dynamically load the Gtk typelib, dynamically load the underlying C libraries, and generate the machine code glue to call into the GTK library and create a Gtk Window.  In this case a unicode string is correctly marshalled from Python 3, converted into UTF-8 internally within PyGI, and used to invoke gtk_window_set_title dynamically.&lt;br /&gt;&lt;br /&gt;Early days yet (still seeing far too many errors when spelunking around inside the API), but at least it works well enough for a screenshot!&lt;br /&gt;&lt;br /&gt;Thanks to the other hackfest attendees for answering my many dumb questions about PyGI, to Red Hat for paying me to work on Python 3 porting, and for sponsoring our food at the event, to One Laptop per Child for letting us use a room in their office as hacking space, and to Canonical for buying us coffee.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:5013</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/5013.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=5013"/>
    <title>What variability exists within proposed updates to the Fedora package collection?</title>
    <published>2010-03-09T18:53:43Z</published>
    <updated>2010-03-09T18:53:43Z</updated>
    <category term="fedora"/>
    <content type="html">There's been a lot of discussion, and alas, some bad feeling, I think, about trying to balance updates versus testing in Fedora.&lt;br /&gt;&lt;br /&gt;I believe there are many areas where we can mitigate risk for the users of Fedora without imposing extra work on package maintainers.&lt;br /&gt;&lt;br /&gt;I don't think &amp;quot;one size fits all&amp;quot; - I believe that one of the problems we face is that no two packages or updates are alike, and that discussion tends to lump things together without recognizing those differences.&lt;br /&gt;&lt;br /&gt;In the hope that it's helpful, I've tried to gather some of the variables that I think are meaningful in the context of &amp;quot;how likely is it that a proposed update might break something?&amp;quot; (and there's some of my opinion in here too)&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Built-in test suite&lt;/strong&gt;&lt;br /&gt;There's great variability here between different src rpms:&lt;ul&gt;&lt;li&gt;Does the upstream code have a test suite?&lt;/li&gt;&lt;li&gt;Do we run it during %check ?&lt;/li&gt;&lt;li&gt;If something started failing, do we actually know about it? (e.g. does an error here kill the build? is there a list of known good/known bad tests?)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;I think that a package that's passed hundreds of selftests during the build and not failed any is doing better than one that has no built-in test suite, and should be in some way privileged in our update system.  (It's possible to auto-detect the presence of a %check section as well)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;External test suite&lt;/strong&gt;&lt;br /&gt;How much testing does the package get via autoqa?  
How much testing has this proposed update had via an automated system?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Manual testing&lt;/strong&gt;&lt;br /&gt;Yes, having a human actually try the software is often good, and finds different types of problem to those that can be found via automated testing.  Having said that, I feel we have far too little automated testing in Fedora today, and that the current way we do manual testing has flaws:  people test for the bugs they wanted to see fixed, and report whether they are fixed or not.  From a formal coverage perspective, we've no idea if they're hitting the important use-cases for other users of the package.  But presumably we do hit a lot of coverage with the current approach.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Can multiple versions be installed?&lt;/strong&gt;&lt;br /&gt;The kernel gets to install multiple copies of itself, and this is the ultimate escape hatch for when a kernel update hoses your system: you still have a known-good version installed (I hope) and get to reboot with that.&lt;br /&gt;&lt;br /&gt;To what extent are other packages allowed to do that? (or would the maintainers want to?)   Would extending it be a way to mitigate risk on those packages that want to rebase more often?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Meaningful test coverage&lt;/strong&gt;&lt;br /&gt;With my former professional QA hat on I think the ideal for software testing is:&lt;ul&gt;&lt;li&gt;The &amp;quot;&lt;strong&gt;functional&lt;/strong&gt;&amp;quot; side:&amp;nbsp;to have a set of personas describing who will be using the software, what they will be using it to do, and to use that to come up with a set of test cases that cover that functionality&lt;/li&gt;&lt;li&gt;The &amp;quot;&lt;strong&gt;non-functional&lt;/strong&gt;&amp;quot; side: to know about the types of flaw expected based on the technology (e.g. 
I expect buffer overruns in C), and to use this to come up with appropriate test cases to prevent these&lt;/li&gt;&lt;/ul&gt;This should give an idea of what test coverage is appropriate, and you can _then_ think about automating them.  &lt;br /&gt;&lt;br /&gt;So I think that a package that has some test cases on the wiki is &amp;quot;in better shape&amp;quot; for doing updates than one that doesn't, and I hope that's a lightweight way of guiding testing.  I hope there's a way of streamlining this within our processes so that we do smarter testing without needing extra work for package maintainers.  (I don't expect anyone wants to adopt IEEE 829 in Fedora QA; see p133-136 of &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fwww.amazon.com%2FLessons-Learned-Software-Testing-Kaner%2Fdp%2F0471081124" rel="nofollow" rel="nofollow"&gt;&amp;quot;Lessons Learned in Software Testing&amp;quot;, Kaner et al (2002)&lt;/a&gt; for excellent arguments for &lt;em&gt;not&amp;nbsp;&lt;/em&gt;using it; a great book, BTW).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Lines of code overall&lt;/strong&gt;&lt;br /&gt;Some packages are small, some are huge.  I did some stats on this for RHEL when I worked on RHEL QA, using &amp;quot;sloccount&amp;quot;. I believe the largest by SLOC was openoffice, closely followed by the kernel (in the millions of SLOC), then a big dropoff to the 100k SLOC packages, then a long tail.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Amount of code touched&lt;/strong&gt;&lt;br /&gt;What is the build-time difference between old and new versions of the src.rpm?  This isn't the whole story (a one-line bug can still kill you), but it's part of the story.  
A rebase might contain a fix for bugs you care about, but might also touch 50 other subsystems.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Amount of testing elsewhere&lt;/strong&gt;&lt;br /&gt;One advantage of a rebase is that you are sharing source code changes with other people, and so if there is a problem, someone else might have already run into it.  This isn't a panacea: yes, there are plenty of ways in which we can have Fedora-specific bugs, but it is one difference between a tarball rebase versus cherry-picking patches.&lt;br /&gt;&lt;br /&gt;(random thought: could Bodhi have integration with other distributions' update systems and warn us about analogous updates that are breaking other people?  or is Fedora always the first into the minefield, finding the bugs for other distributions?)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Noarch vs architecture-specific&lt;/strong&gt;&lt;br /&gt;The former are typically much simpler than the latter.  The latter has specific risks (e.g. word-size assumptions).  To what extent can we mitigate these risks with automated testing?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Programming Language&lt;/strong&gt;&lt;br /&gt;Each programming language exhibits its own sets of emergent behavior.  For example (and this is grossly oversimplifying):&lt;br /&gt;  &lt;ul&gt;&lt;li&gt;C code tends to exhibit buffer-overflow bugs, poor unit testing, poor error-handling&lt;/li&gt;&lt;li&gt;C++ code can be more prone to compiler/static linker/dynamic linkage bugs than C code&lt;/li&gt;&lt;/ul&gt;etc.  I don't want to populate this list too much as this kind of thing is prone to unhelpful programming language flamewars.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Problems inherent to packaging&lt;/strong&gt;&lt;br /&gt;Each software delivery system exhibits its own set of flaws, and our RPM/yum is no exception.  
To what extent does, say, rpmlint cover the types of thing that go wrong, and to what extent can we extend it to help us?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Build system&lt;/strong&gt;&lt;br /&gt;Although the Fedora packaging guidelines manage to impose some sanity on this, there are many ways in which packages get configured and built.&lt;br /&gt;&lt;br /&gt;Some examples:&lt;ul&gt;&lt;li&gt;the GNU autotools: configure.in, Makefile.am, leading to a &amp;quot;configure&amp;quot; used during the build to generate a Makefile.  This can be prone to &amp;quot;silently&amp;quot; dropping functionality when the buildroot changes.  It's sometimes possible to detect such breakage by looking at the &amp;quot;Requires&amp;quot; metadata of the built packages (can we automate this in Bodhi?)&lt;/li&gt;&lt;li&gt;hand-written one-of-a-kind Makefile written by upstream.  Given that each is unique, each will have unique problems&lt;/li&gt;&lt;li&gt;python setup.py, using distutils/setuptools.&lt;/li&gt;&lt;li&gt;cmake&lt;/li&gt;&lt;/ul&gt;etc&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Security fixes&lt;/strong&gt;&lt;br /&gt;Security fixes probably should be treated differently from non-security fixes: many people expect that the former should happen as a matter of course, that if someone has distributed software, they should also promptly distribute security fixes.  This seems to be regarded as some kind of natural entitlement within software in a way that other kinds of update aren't, and so our update process probably should reflect this special quality ascribed to security flaws (I suspect I'm getting grumpy and middle-aged in my attitudes here, sorry)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Critical versus Speciality packages&lt;/strong&gt;&lt;br /&gt;Is this a package that needs to work for an essential Fedora functionality to work, or is it more of a &amp;quot;leaf&amp;quot; within the dependency graph?  
For example, if this package breaks, could it prevent a user from running yum, or running a graphical browser to search for information on the bug?&lt;br /&gt;&lt;br /&gt;I like our &amp;quot;critical path&amp;quot; approach: some packages are definitely more critical than others.  The exact details might need tuning, of course.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Paid versus Volunteer&lt;/strong&gt;&lt;br /&gt;I'm in the very fortunate position that I'm being paid to work on Fedora, and thus I'm professionally responsible for doing some tasks that aren't fun.  Others volunteer their time and effort on Fedora, and I think it's important that their time should be fun, or, at least satisfying for some of the higher levels of Maslow's hierarchy of needs.  (I happen to enjoy most of what I do in Fedora, and I do spend evenings and weekends on it too).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I hope this is a constructive addition to the debate. &amp;nbsp;What other variability is meaningful in the context of &amp;quot;candidate updates&amp;quot;?  I probably missed some.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:4799</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/4799.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=4799"/>
    <title>Update on 2to3c</title>
    <published>2010-02-21T17:35:23Z</published>
    <updated>2010-02-22T16:42:33Z</updated>
    <category term="python"/>
    <content type="html">My 2to3c tool now has a website: &lt;a href='https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2F2to3c%2F' rel='nofollow'&gt;https://fedorahosted.org/2to3c/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I haven't yet sanely packaged it (I'm working on that) but I have done some work since &lt;a href="http://dmalcolm.livejournal.com/3935.html" target="_blank"&gt;my last blog post&lt;/a&gt;:&lt;br /&gt;  - It's somewhat more robust at handling errors from &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fcoccinelle.lip6.fr%2F" rel="nofollow" rel="nofollow"&gt;spatch&lt;/a&gt;&lt;br /&gt;  - Beginnings of a selftest suite (ultimately I want to be able to take C code, run the tool, compile it against multiple runtimes, then execute it in each one, verifying that the module still works)&lt;br /&gt;  - Some (possibly crazy) hacks to try to better handle preprocessor macros when reworking module initialization.&lt;br /&gt;&lt;br /&gt;John Palmieri has used the tool to help with &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Fbugs.freedesktop.org%2Fshow_bug.cgi%3Fid%3D26420" rel="nofollow" rel="nofollow"&gt;porting the DBus python bindings to Python 3&lt;/a&gt;.  He's had to do a lot of manual work, but apparently &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fwww.j5live.com%2F2010%2F02%2F03%2Fthe-quest-for-python-3%2F" rel="nofollow" rel="nofollow"&gt;2to3c did save him some time&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This is more of a shovel than a silver bullet, perhaps.&lt;br /&gt;&lt;br /&gt;Help with this would be most welcome.  I'm at PyCon, BTW, and will be around for a few of the sprint days, if anyone's interested in working on this.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:4545</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/4545.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=4545"/>
    <title>Debugging the innards of Python... with Python</title>
    <published>2010-02-13T06:39:22Z</published>
    <updated>2010-02-13T06:39:22Z</updated>
    <category term="fedora"/>
    <category term="python"/>
    <content type="html">I've been trying a &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FEasierPythonDebugging" rel="nofollow" rel="nofollow"&gt;new approach for debugging the internals of Python&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One of the strengths of the C implementation of Python (&amp;quot;CPython&amp;quot;) is how easy it is to wrap libraries written in C/C++ so that they're callable from Python code.&lt;br /&gt;&lt;br /&gt;The drawback of this is that you're running &amp;quot;native&amp;quot; machine code, and that code can crash the python process, or worse, corrupt the internals of the python process so that the crash seems to be coming from somewhere else.&lt;br /&gt;&lt;br /&gt;Time to break out the debugger...&lt;br /&gt;&lt;br /&gt;Unfortunately, practically everything inside CPython is a pointer to a &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsvn.python.org%2Fview%2Fpython%2Ftrunk%2FInclude%2Fobject.h%3Fview%3Dmarkup" rel="nofollow" rel="nofollow"&gt;PyObject structure&lt;/a&gt;, with some additional type-specific data lurking after it in memory.    A typical debugger by default just shows you the addresses of these structures, or shows you the two values they contain (&amp;quot;ob_refcnt&amp;quot; and &amp;quot;ob_type&amp;quot;, which aren't necessarily enlightening).&lt;br /&gt;&lt;br /&gt;I typically use &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsourceware.org%2Fgdb%2F" rel="nofollow" rel="nofollow"&gt;the &amp;quot;gdb&amp;quot; debugger&lt;/a&gt;.  
Python has long had a file named &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsvn.python.org%2Fview%2Fpython%2Ftrunk%2FMisc%2Fgdbinit%3Fview%3Dmarkup" rel="nofollow" rel="nofollow"&gt;&amp;quot;gdbinit&amp;quot;&lt;/a&gt; which contains various hooks to make it easier to debug CPython within gdb  (In &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2F" rel="nofollow" rel="nofollow"&gt;Fedora&lt;/a&gt; and &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fwww.redhat.com%2Frhel%2F" rel="nofollow" rel="nofollow"&gt;RHEL&lt;/a&gt; that file can be found in our &amp;quot;python-devel&amp;quot; packages).&lt;br /&gt;&lt;br /&gt;If you copy this file to ~/.gdbinit and restart gdb you can then use &amp;quot;pyo&amp;quot; to print a more human-readable representation of a PyObject*, and use &amp;quot;pyframe&amp;quot; and other commands to figure out where the currently selected thread of the process is in Python code.&lt;br /&gt;&lt;br /&gt;This approach has some drawbacks:  &lt;ul&gt;&lt;li&gt;it isn't very robust - it effectively injects calls to the repr() function into the process being debugged (the so-called &amp;quot;inferior process&amp;quot; in debugger parlance). If the data in the inferior process is corrupt, attempting to print it can lead to another segmentation fault (&amp;quot;SIGSEGV&amp;quot;) within that process inside the implementation of &amp;quot;repr&amp;quot;.  For example, if the allocator of the inferior process has become corrupt, then Python won't be able to create the string representation of the data without a crash (be it in Python's memory arena code or the underlying heap), and you've got another SIGSEGV.  If one of the objects inside a list has been splatted with 0xDEADBEEF garbage, attempting to print the repr() object will crash, and you won't even know that you had a list.&lt;/li&gt;&lt;li&gt;you have to go into gdb manually and run these commands by hand.  
Automated crash reports can't get at the &amp;quot;real&amp;quot; data, most significantly: what lines of Python code were being executed when this crash happened?&lt;/li&gt;&lt;li&gt;as well as being a manual task, it's hard to do this correctly; any mistakes when doing this will typically cause a SIGSEGV in the inferior process&lt;/li&gt;&lt;li&gt;the script is written in gdb's own language and is thus hard to work with and extend&lt;/li&gt;&lt;li&gt;complicated intermeshing of C and Python (e.g. invocations of callbacks that then trigger other callbacks each time crossing a language boundary) can be hard to figure out&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Is there a better way?&lt;br /&gt;&lt;br /&gt;(...drum roll...)&lt;br /&gt;&lt;br /&gt;As of gdb 7.0, it's now possible to write custom &amp;quot;pretty-printers&amp;quot; for data in gdb, using Python as the extension language.   One of the reasons for this was to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsourceware.org%2Fgdb%2Fwiki%2FProjectArcher" rel="nofollow" rel="nofollow"&gt;improve visualization of C++ data in the debugger&lt;/a&gt;, so that if you try to print a C++ std::string, rather than get this craziness:&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;(gdb) print str&lt;br /&gt;$1 = {static npos = 4294967295,&lt;br /&gt;  _M_dataplus = {&amp;lt;std::allocator&amp;lt;char&amp;gt;&amp;gt; = {&amp;lt;__gnu_cxx::new_allocator&amp;lt;char&amp;gt;&amp;gt; = {&amp;lt;No data fields&amp;gt;}, &amp;lt;No data fields&amp;gt;}, _M_p = 0x804a014 &amp;quot;hello world&amp;quot;}}&lt;br /&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;you now get:&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;(gdb) print str&lt;br /&gt;$1 = hello world&lt;br /&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;The above is thanks to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fgcc.gnu.org%2Fsvn%2Fgcc%2Ftrunk%2Flibstdc%2B%2B-v3%2Fpython%2Flibstdcxx%2Fv6%2Fprinters.py" rel="nofollow" 
rel="nofollow"&gt;python hooks running inside the gdb process&lt;/a&gt;, using a &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsourceware.org%2Fgit%2Fgitweb.cgi%3Fp%3Darcher.git%3Ba%3Dblob%3Bf%3Dgdb%2Fpython%2Fpython.c%3Bh%3D29386c9205333aab33348c5e9742343ed8070c5e%3Bhb%3DHEAD" rel="nofollow" rel="nofollow"&gt;&amp;quot;gdb&amp;quot; module&lt;/a&gt; to walk the graph of &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsourceware.org%2Fgit%2Fgitweb.cgi%3Fp%3Darcher.git%3Ba%3Dblob%3Bf%3Dgdb%2Fpython%2Fpy-value.c%3Bh%3Da792819d5511d523d0db3ac4722d9d96752a18d9%3Bhb%3DHEAD" rel="nofollow" rel="nofollow"&gt;gdb.Value&lt;/a&gt; and &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fsourceware.org%2Fgit%2Fgitweb.cgi%3Fp%3Darcher.git%3Ba%3Dblob%3Bf%3Dgdb%2Fpython%2Fpy-type.c%3Bh%3Da97c125359336aa7c5fcd3ee7c79740d3605e54b%3Bhb%3DHEAD" rel="nofollow" rel="nofollow"&gt;gdb.Type&lt;/a&gt; objects representing the state of the process being debugged.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;What this is building up to is that I've written a similar set of Python hooks for CPython itself.&lt;br /&gt;&lt;br /&gt;The code can be seen in &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2Fgitweb%3Fp%3Ddmalcolm%2Fpublic_git%2Flibpython.git%3Ba%3Dblob%3Bf%3Dlibpython.py" rel="nofollow" rel="nofollow"&gt;this git repository&lt;/a&gt;.  I've been going through the various types within CPython's internals, writing Python code to run inside the debugger.  For each type I've written a Python class that runs inside the debugger process, and knows how to convert an object in the process being debugged into an object within the debugger, and can then print it.&lt;br /&gt;&lt;br /&gt;To demonstrate, let's try breaking Python:&lt;br /&gt;&lt;br /&gt;The real use case for this for me is when a library decides to do error handling using &amp;quot;assert&amp;quot;, and takes down the whole process, or has a crasher bug.  
Rather than pick on a 3rd-party library, let's inject our own segfault.  We can do this by abusing the &amp;quot;ctypes&amp;quot; module, by picking a random location in the process' address space and trying to read it as a string.  We can do this as a one-liner, but let's set up some data so we can see what it looks like in the crash report:&lt;br /&gt;&lt;pre&gt;
&amp;gt;&amp;gt;&amp;gt; class Foo:
...     def bar(self):
...         from ctypes import string_at
...         string_at(0xDEADBEEF) # this code will (probably) cause Python to segfault
... 
&amp;gt;&amp;gt;&amp;gt; f = Foo()
&amp;gt;&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt; # Let's assign some data of various kinds to the instance:
&amp;gt;&amp;gt;&amp;gt; f.someattr = 42
&amp;gt;&amp;gt;&amp;gt; f.someotherattr = {'one':1, 'two':2L, 'three':[(), (None,), (None, None)]}
&amp;gt;&amp;gt;&amp;gt;
&amp;gt;&amp;gt;&amp;gt; # Now let's trigger the segfault
&amp;gt;&amp;gt;&amp;gt; f.bar()
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;At this point we've generated a segmentation fault inside Python.&lt;br /&gt;&lt;br /&gt;Let's see the old behavior of a backtrace, using the &amp;quot;bt&amp;quot; command:&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;Program received signal SIGSEGV, Segmentation fault.&lt;br /&gt;__strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:87&lt;br /&gt;87		pcmpeqb	(%esi), %xmm0&lt;br /&gt;Current language:  auto&lt;br /&gt;The current source language is &amp;quot;auto; currently asm&amp;quot;.&lt;br /&gt;(gdb) bt&lt;br /&gt;#0  __strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:87&lt;br /&gt;#1  0x07113d30 in PyString_FromString (str=0xdeadbeef &amp;lt;Address 0xdeadbeef out of bounds&amp;gt;) at Objects/stringobject.c:116&lt;br /&gt;#2  0x00167e18 in string_at (ptr=0xdeadbeef &amp;lt;Address 0xdeadbeef out of bounds&amp;gt;, size=-1) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/_ctypes.c:5348&lt;br /&gt;#3  0x0018247f in ffi_call_SYSV () at src/x86/sysv.S:61&lt;br /&gt;#4  0x001822b0 in ffi_call (cif=&amp;lt;value optimized out&amp;gt;, fn=&amp;lt;value optimized out&amp;gt;, rvalue=&amp;lt;value optimized out&amp;gt;, avalue=&amp;lt;value optimized out&amp;gt;) at src/x86/ffi.c:213&lt;br /&gt;#5  0x00171315 in _call_function_pointer (pProc=0x167de0 &amp;lt;string_at&amp;gt;, argtuple=0xb7f3d02c, flags=4357, argtypes=0xb7f45a4c, restype=0x80f3dc4, checker=0x0) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/callproc.c:815&lt;br /&gt;#6  _CallProc (pProc=0x167de0 &amp;lt;string_at&amp;gt;, argtuple=0xb7f3d02c, flags=4357, argtypes=0xb7f45a4c, restype=0x80f3dc4, checker=0x0) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/callproc.c:1162&lt;br /&gt;#7  0x0016a6f2 in CFuncPtr_call (self=0xb7f9d5dc, inargs=0xb7f3d02c, kwds=0x0) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/_ctypes.c:3857&lt;br /&gt;#8  0x070c478c in PyObject_Call (func=0xb7f9d5dc, arg=0xb7f3d02c, kw=0x0) at Objects/abstract.c:2492&lt;br /&gt;#9  0x0716069c in 
do_call (f=0x80f37bc, throwflag=0) at Python/ceval.c:3917&lt;br /&gt;#10 call_function (f=0x80f37bc, throwflag=0) at Python/ceval.c:3729&lt;br /&gt;#11 PyEval_EvalFrameEx (f=0x80f37bc, throwflag=0) at Python/ceval.c:2389&lt;br /&gt;#12 0x07162642 in PyEval_EvalCodeEx (co=0xb7f3bda0, globals=0xb7f3768c, locals=0x0, args=0x80ec788, argcount=1, kws=0x80ec78c, kwcount=0, defs=0xb7f45d78, defcount=1, closure=0x0) at Python/ceval.c:2968&lt;br /&gt;#13 0x07160983 in fast_function (f=0x80ec644, throwflag=0) at Python/ceval.c:3802&lt;br /&gt;#14 call_function (f=0x80ec644, throwflag=0) at Python/ceval.c:3727&lt;br /&gt;#15 PyEval_EvalFrameEx (f=0x80ec644, throwflag=0) at Python/ceval.c:2389&lt;br /&gt;#16 0x07161b79 in fast_function (f=0x80eb1cc, throwflag=0) at Python/ceval.c:3792&lt;br /&gt;#17 call_function (f=0x80eb1cc, throwflag=0) at Python/ceval.c:3727&lt;br /&gt;#18 PyEval_EvalFrameEx (f=0x80eb1cc, throwflag=0) at Python/ceval.c:2389&lt;br /&gt;#19 0x07162642 in PyEval_EvalCodeEx (co=0xb7f2e578, globals=0xb7fc70b4, locals=0xb7fc70b4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968&lt;br /&gt;#20 0x071627a3 in PyEval_EvalCode (co=0xb7f2e578, globals=0xb7fc70b4, locals=0xb7fc70b4) at Python/ceval.c:522&lt;br /&gt;#21 0x0717d94b in run_mod (mod=&amp;lt;value optimized out&amp;gt;, filename=&amp;lt;value optimized out&amp;gt;, globals=0xb7fc70b4, locals=0xb7fc70b4, flags=0xbffff2fc, arena=0x80e8628) at Python/pythonrun.c:1335&lt;br /&gt;#22 0x0717f4a6 in PyRun_InteractiveOneFlags (fp=0x5b5420, filename=0x71c3e7d &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, flags=0xbffff2fc) at Python/pythonrun.c:840&lt;br /&gt;#23 0x0717f6ab in PyRun_InteractiveLoopFlags (fp=0x5b5420, filename=0x71c3e7d &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, flags=&amp;lt;value optimized out&amp;gt;) at Python/pythonrun.c:760&lt;br /&gt;#24 0x0717f7eb in PyRun_AnyFileExFlags (fp=0x5b5420, filename=&amp;lt;value optimized out&amp;gt;, closeit=0, 
flags=0xbffff2fc) at Python/pythonrun.c:729&lt;br /&gt;#25 0x0718c212 in Py_Main (argc=1, argv=0xbffff3f4) at Modules/main.c:599&lt;br /&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Looking closely at the above, we can see that there's a problem inside ctypes, but it's hard to see what triggered the problem within the Python code.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Below is a screendump of what it looks like with my new debugging hooks (warning: some very long lines; hopefully this won't get too badly mangled).&lt;br /&gt;&lt;br /&gt;I've highlighted some of it: we can see&lt;ul&gt;&lt;li&gt;the location we're at (file/line/function) within the Python code at each point in the call stack&lt;/li&gt;&lt;li&gt;the dictionary of globals&lt;/li&gt;&lt;li&gt;the arguments passed to functions&lt;/li&gt;&lt;/ul&gt;and so on - there's almost too much information.  This should come in handy for nasty issues when the stack involves C code calling into Python and back again (e.g. when callbacks get invoked); it should also be useful for debugging all threads at once in a multithreaded app.&lt;br /&gt;&lt;br /&gt;There are probably many things that could be improved about this code, but I'm already finding it very useful, and I hope other people will too.&lt;br /&gt;&lt;br /&gt;I've integrated this code into Fedora's Python 2 and Python 3 builds for the &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FEasierPythonDebugging" rel="nofollow" rel="nofollow"&gt;upcoming Fedora 13 release&lt;/a&gt; so that the hooks get loaded automatically by the debugger.   It should all Just Work - the crash-handling tool will fetch the debugging hooks and the bug report that's generated should contain all of this rich information on what's going on inside Python - &lt;b&gt;without the user needing to be a debugging guru&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
(gdb) bt
#0 &amp;nbsp;__strlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen.S:87&lt;address out="" of="" bounds=""&gt;#1 &amp;nbsp;0x07113d30 in PyString_FromString (str=0xdeadbeef &amp;lt;Address 0xdeadbeef out of bounds&amp;gt;) at Objects/stringobject.c:116&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#2 &amp;nbsp;0x00167e18 in string_at (ptr=0xdeadbeef &amp;lt;Address 0xdeadbeef out of bounds&amp;gt;, size=-1) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/_ctypes.c:5348&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#3 &amp;nbsp;0x0018247f in ffi_call_SYSV () at src/x86/sysv.S:61&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#4 &amp;nbsp;0x001822b0 in ffi_call (cif=&amp;lt;value optimized out&amp;gt;, fn=&amp;lt;value optimized out&amp;gt;, rvalue=&amp;lt;value optimized out&amp;gt;, avalue=&amp;lt;value optimized out&amp;gt;) at src/x86/ffi.c:213&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#5 &amp;nbsp;0x00171315 in _call_function_pointer (pProc=0x167de0 &amp;lt;string_at&amp;gt;, &lt;b&gt;argtuple=(3735928559L, -1)&lt;/b&gt;, flags=4357, argtypes=(&amp;lt;builtin_function_or_method at remote 0xb7f45d2c&amp;gt;, &amp;lt;builtin_function_or_method at remote 0xb7f45d4c&amp;gt;), restype=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;_ctypes.SimpleType at remote 0x80f3dc4&amp;gt;, checker=&amp;lt;unknown at remote 0x0&amp;gt;) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/callproc.c:815&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#6 &amp;nbsp;_CallProc (pProc=0x167de0 &amp;lt;string_at&amp;gt;, argtuple=(3735928559L, -1), flags=4357, argtypes=(&amp;lt;builtin_function_or_method at remote 0xb7f45d2c&amp;gt;, &amp;lt;builtin_function_or_method at remote 0xb7f45d4c&amp;gt;), restype=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;_ctypes.SimpleType at remote 0x80f3dc4&amp;gt;, checker=&amp;lt;unknown at remote 0x0&amp;gt;) at 
/usr/src/debug/Python-2.6.2/Modules/_ctypes/callproc.c:1162&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#7 &amp;nbsp;0x0016a6f2 in CFuncPtr_call (self=0xb7f9d5dc, inargs=(3735928559L, -1), kwds=&amp;lt;unknown at remote 0x0&amp;gt;) at /usr/src/debug/Python-2.6.2/Modules/_ctypes/_ctypes.c:3857&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#8 &amp;nbsp;0x070c478c in PyObject_Call (func=&amp;lt;CFunctionType at remote 0xb7f9d5dc&amp;gt;, arg=(3735928559L, -1), kw=&amp;lt;unknown at remote 0x0&amp;gt;) at Objects/abstract.c:2492&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#9 &amp;nbsp;0x0716069c in do_call (&lt;b&gt;f=File /usr/lib/python2.6/ctypes/__init__.py, line 492, in string_at (ptr=3735928559L, size=-1)&lt;/b&gt;, throwflag=0) at Python/ceval.c:3917&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#10 call_function (&lt;b&gt;f=File /usr/lib/python2.6/ctypes/__init__.py, line 492, in string_at (ptr=3735928559L, size=-1)&lt;/b&gt;, throwflag=0) at Python/ceval.c:3729&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#11 PyEval_EvalFrameEx (f=File /usr/lib/python2.6/ctypes/__init__.py, line 492, in string_at (ptr=3735928559L, size=-1), throwflag=0) at Python/ceval.c:2389&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#12 0x07162642 in PyEval_EvalCodeEx (co=0xb7f3bda0, globals=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'Union': &amp;lt;_ctypes.UnionType at remote 0x17c120&amp;gt;, 'c_wchar': &amp;lt;_ctypes.SimpleType at remote 0x80fb32c&amp;gt;, 'c_bool': &amp;lt;_ctypes.SimpleType at remote 0x80fab54&amp;gt;, 'c_double': &amp;lt;_ctypes.SimpleType at remote 0x80f7a0c&amp;gt;, 'CFUNCTYPE': &amp;lt;function at remote 0xb7f3264c&amp;gt;, '__path__': ['/usr/lib/python2.6/ctypes'], 'byref': &amp;lt;builtin_function_or_method at remote 0xb7f40eec&amp;gt;, 'pointer': &amp;lt;builtin_function_or_method at remote 0xb7f40d6c&amp;gt;, 'alignment': &amp;lt;builtin_function_or_method at remote 
0xb7f40eac&amp;gt;, '_memmove_addr': 4962832, 'c_longlong': &amp;lt;_ctypes.SimpleType at remote 0x80f8544&amp;gt;, 'c_short': &amp;lt;_ctypes.SimpleType at remote 0x80f407c&amp;gt;, 'get_errno': &amp;lt;builtin_function_or_method at remote 0xb7f39f4c&amp;gt;, '__file__': '/usr/lib/python2.6/ctypes/__init__.pyc', '_calcsize': &amp;lt;builtin_function_or_method at remote 0xb7f96bec&amp;gt;, 'c_ulong': &amp;lt;_ctypes.SimpleType at remote 0x80f5974&amp;gt;, 'c_int': &amp;lt;_ctypes.SimpleType at remote 0x80f5124&amp;gt;, 'c_int32': &amp;lt;_ctypes.SimpleType at remote 0x80f5124&amp;gt;, 'memmove': &amp;lt;CFunctionType at remote 0xb7f9d4a4&amp;gt;, '_sys': &amp;lt;module at remote 0xb7fa308c&amp;gt;, '_cast': &amp;lt;CFunctionType at remote 0xb7f9d574&amp;gt;, 'addressof': &amp;lt;builtin_function_or_method at remote 0xb7f40f0c&amp;gt;, 'ArgumentError': &amp;lt;type at remote 0x80f2fdc&amp;gt;, 'c_buffer': &amp;lt;function at remote 0xb7f32614&amp;gt;, 'c_longdouble': &amp;lt;_ctypes.SimpleType at remote 0x80f821c&amp;gt;, 'cdll': &amp;lt;LibraryLoader at remote 0xb7f459ac&amp;gt;, 'memset': &amp;lt;CFunctionType at remote 0xb7f9d50c&amp;gt;, 'string_at': &amp;lt;function at remote 0xb7f32a04&amp;gt;, 'sizeof': &amp;lt;builtin_function_or_method at remote 0xb7f40ecc&amp;gt;, '_FUNCFLAG_PYTHONAPI': 4, 'create_string_buffer': &amp;lt;function at remote 0xb7f325dc&amp;gt;, 'set_errno': &amp;lt;builtin_function_or_method at remote 0xb7f40d2c&amp;gt;, '_pointer_type_cache': {&amp;lt;_ctypes.SimpleType at remote 0x80f9ec4&amp;gt;: &amp;lt;_ctypes.PointerType at remote 0x80fb864&amp;gt;, &amp;lt;_ctypes.SimpleType at remote 0x80fb32c&amp;gt;: &amp;lt;_ctypes.PointerType at remote 0x80fb504&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;: &amp;lt;_ctypes.SimpleType at remote 0x80fa6a4&amp;gt;}, '_Pointer': &amp;lt;_ctypes.PointerType at remote 0x17bd00&amp;gt;, 'create_unicode_buffer': &amp;lt;function at remote 0xb7f326bc&amp;gt;, 'c_long': &amp;lt;_ctypes.SimpleType 
at remote 0x80f5124&amp;gt;, 'c_char_p': &amp;lt;_ctypes.SimpleType at remote 0x80fa37c&amp;gt;, '__builtins__': {'bytearray': &amp;lt;type at remote 0x71fb540&amp;gt;, 'IndexError': &amp;lt;type at remote 0x71ff0e0&amp;gt;, 'all': &amp;lt;builtin_function_or_method at remote 0xb7fafccc&amp;gt;, 'help': &amp;lt;_Helper at remote 0xb7f814ec&amp;gt;, 'vars': &amp;lt;builtin_function_or_method at remote 0xb7fb280c&amp;gt;, 'SyntaxError': &amp;lt;type at remote 0x71fed60&amp;gt;, 'unicode': &amp;lt;type at remote 0x720e2c0&amp;gt;, 'sorted': &amp;lt;builtin_function_or_method at remote 0xb7fb274c&amp;gt;, 'isinstance': &amp;lt;builtin_function_or_method at remote 0xb7fb22cc&amp;gt;, 'copyright': &amp;lt;_Printer at remote 0xb7f81d6c&amp;gt;, 'NameError': &amp;lt;type at remote 0x71feac0&amp;gt;, 'BytesWarning': &amp;lt;type at remote 0x72006c0&amp;gt;, 'dict': &amp;lt;type at remote 0x7205960&amp;gt;, 'input': &amp;lt;builtin_function_or_method at remote 0xb7fb224c&amp;gt;, 'oct': &amp;lt;builtin_function_or_method at remote 0xb7fb246c&amp;gt;, 'bin': &amp;lt;builtin_function_or_method at remote 0xb7fafd8c&amp;gt;, 'SystemExit': &amp;lt;type at remote 0x71fe2e0&amp;gt;, 'StandardError': &amp;lt;type at remote 0x71fdf60&amp;gt;, 'format': &amp;lt;builtin_function_or_method at remote 0xb7fb20ac&amp;gt;, 'repr': &amp;lt;builtin_function_or_method at remote 0xb7fb268c&amp;gt;, 'UnicodeDecodeError': &amp;lt;type at remote 0x71ff540&amp;gt;, 'False': &amp;lt;bool at remote 0x71f9624&amp;gt;, 'RuntimeWarning': &amp;lt;type at remote 0x7200340&amp;gt;, 'bytes': &amp;lt;type at remote 0x7209c80&amp;gt;, 'iter': &amp;lt;builtin_function_or_method at remote 0xb7fb230c&amp;gt;, 'reload': &amp;lt;builtin_function_or_method at remote 0xb7fb264c&amp;gt;, 'Warning': &amp;lt;type at remote 0x71ffee0&amp;gt;, 'round': &amp;lt;builtin_function_or_method at remote 0xb7fb26cc&amp;gt;, 'dir': &amp;lt;builtin_function_or_method at remote 0xb7faff4c&amp;gt;, 'cmp': 
&amp;lt;builtin_function_or_method at remote 0xb7fafe4c&amp;gt;, 'set': &amp;lt;type at remote 0x7207000&amp;gt;, 'list': &amp;lt;type at remote 0x7204420&amp;gt;, 'reduce': &amp;lt;builtin_function_or_method at remote 0xb7fb260c&amp;gt;, 'intern': &amp;lt;builtin_function_or_method at remote 0xb7fb228c&amp;gt;, 'issubclass': &amp;lt;builtin_function_or_method at remote 0xb7fb22ec&amp;gt;, 'apply': &amp;lt;builtin_function_or_method at remote 0xb7fafd4c&amp;gt;, 'EOFError': &amp;lt;type at remote 0x71fe820&amp;gt;, 'locals': &amp;lt;builtin_function_or_method at remote 0xb7fb238c&amp;gt;, 'BufferError': &amp;lt;type at remote 0x71ffe00&amp;gt;, 'slice': &amp;lt;type at remote 0x7207900&amp;gt;, 'FloatingPointError': &amp;lt;type at remote 0x71ff8c0&amp;gt;, 'sum': &amp;lt;builtin_function_or_method at remote 0xb7fb278c&amp;gt;, 'buffer': &amp;lt;type at remote 0x71f9840&amp;gt;, 'getattr': &amp;lt;builtin_function_or_method at remote 0xb7fb20cc&amp;gt;, 'abs': &amp;lt;builtin_function_or_method at remote 0xb7fafc8c&amp;gt;, 'exit': &amp;lt;Quitter at remote 0xb7fd1d2c&amp;gt;, 'print': &amp;lt;builtin_function_or_method at remote 0xb7fb256c&amp;gt;, 'IndentationError': &amp;lt;type at remote 0x71fee40&amp;gt;, 'True': &amp;lt;bool at remote 0x71f9630&amp;gt;, 'FutureWarning': &amp;lt;type at remote 0x7200420&amp;gt;, 'ImportWarning': &amp;lt;type at remote 0x7200500&amp;gt;, 'None': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, 'hash': &amp;lt;builtin_function_or_method at remote 0xb7fb218c&amp;gt;, 'len': &amp;lt;builtin_function_or_method at remote 0xb7fb234c&amp;gt;, 'credits': &amp;lt;_Printer at remote 0xb7f8156c&amp;gt;, 'frozenset': &amp;lt;type at remote 0x72070e0&amp;gt;, '__name__': '__builtin__', 'ord': &amp;lt;builtin_function_or_method at remote 0xb7fb24ec&amp;gt;, 'super': &amp;lt;type at remote 0x720ade0&amp;gt;, 'TypeError': &amp;lt;type at remote 0x71fe040&amp;gt;, 'license': &amp;lt;_Printer at remote 0xb7f8170c&amp;gt;, 'KeyboardInterrupt': 
&amp;lt;type at remote 0x71fe3c0&amp;gt;, 'UserWarning': &amp;lt;type at remote 0x71fffc0&amp;gt;, 'filter': &amp;lt;builtin_function_or_method at remote 0xb7fb206c&amp;gt;, 'range': &amp;lt;builtin_function_or_method at remote 0xb7fb25ac&amp;gt;, 'staticmethod': &amp;lt;type at remote 0x72035e0&amp;gt;, 'SystemError': &amp;lt;type at remote 0x71ffb60&amp;gt;, 'BaseException': &amp;lt;type at remote 0x7200a60&amp;gt;, 'pow': &amp;lt;builtin_function_or_method at remote 0xb7fb252c&amp;gt;, 'RuntimeError': &amp;lt;type at remote 0x71fe900&amp;gt;, 'float': &amp;lt;type at remote 0x72028a0&amp;gt;, 'GeneratorExit': &amp;lt;type at remote 0x71fe200&amp;gt;, 'StopIteration': &amp;lt;type at remote 0x71fe120&amp;gt;, 'globals': &amp;lt;builtin_function_or_method at remote 0xb7fb210c&amp;gt;, 'divmod': &amp;lt;builtin_function_or_method at remote 0xb7faff8c&amp;gt;, 'enumerate': &amp;lt;type at remote 0x71fdaa0&amp;gt;, 'Ellipsis': &amp;lt;ellipsis at remote 0x7207840&amp;gt;, 'LookupError': &amp;lt;type at remote 0x71ff000&amp;gt;, 'open': &amp;lt;builtin_function_or_method at remote 0xb7fb24ac&amp;gt;, 'quit': &amp;lt;Quitter at remote 0xb7fd120c&amp;gt;, 'basestring': &amp;lt;type at remote 0x7209ba0&amp;gt;, 'UnicodeError': &amp;lt;type at remote 0x71ff380&amp;gt;, 'zip': &amp;lt;builtin_function_or_method at remote 0xb7fb284c&amp;gt;, 'hex': &amp;lt;builtin_function_or_method at remote 0xb7fb21cc&amp;gt;, 'long': &amp;lt;type at remote 0x7204f60&amp;gt;, 'next': &amp;lt;builtin_function_or_method at remote 0xb7fb244c&amp;gt;, 'int': &amp;lt;type at remote 0x7203a40&amp;gt;, 'chr': &amp;lt;builtin_function_or_method at remote 0xb7fafe0c&amp;gt;, '__import__': &amp;lt;builtin_function_or_method at remote 0xb7fafc6c&amp;gt;, 'type': &amp;lt;type at remote 0x720ac20&amp;gt;, '__doc__': &amp;quot;Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' 
in slices.&amp;quot;, 'Exception': &amp;lt;type at remote 0x71fde80&amp;gt;, 'tuple': &amp;lt;type at remote 0x720a6a0&amp;gt;, 'UnicodeTranslateError': &amp;lt;type at remote 0x71ff620&amp;gt;, 'reversed': &amp;lt;type at remote 0x71fdb80&amp;gt;, 'UnicodeEncodeError': &amp;lt;type at remote 0x71ff460&amp;gt;, 'IOError': &amp;lt;type at remote 0x71fe660&amp;gt;, 'hasattr': &amp;lt;builtin_function_or_method at remote 0xb7fb214c&amp;gt;, 'delattr': &amp;lt;builtin_function_or_method at remote 0xb7faff0c&amp;gt;, 'setattr': &amp;lt;builtin_function_or_method at remote 0xb7fb270c&amp;gt;, 'raw_input': &amp;lt;builtin_function_or_method at remote 0xb7fb25ec&amp;gt;, 'PendingDeprecationWarning': &amp;lt;type at remote 0x7200180&amp;gt;, 'compile': &amp;lt;builtin_function_or_method at remote 0xb7fafecc&amp;gt;, 'ArithmeticError': &amp;lt;type at remote 0x71ff7e0&amp;gt;, 'str': &amp;lt;type at remote 0x7209c80&amp;gt;, 'property': &amp;lt;type at remote 0x71fcec0&amp;gt;, 'MemoryError': &amp;lt;type at remote 0x71ffd20&amp;gt;, 'ImportError': &amp;lt;type at remote 0x71fe4a0&amp;gt;, 'xrange': &amp;lt;type at remote 0x7206640&amp;gt;, 'KeyError': &amp;lt;type at remote 0x71ff1c0&amp;gt;, 'coerce': &amp;lt;builtin_function_or_method at remote 0xb7fafe8c&amp;gt;, 'SyntaxWarning': &amp;lt;type at remote 0x7200260&amp;gt;, 'file': &amp;lt;type at remote 0x7201d80&amp;gt;, 'EnvironmentError': &amp;lt;type at remote 0x71fe580&amp;gt;, 'unichr': &amp;lt;builtin_function_or_method at remote 0xb7fb27cc&amp;gt;, 'id': &amp;lt;builtin_function_or_method at remote 0xb7fb220c&amp;gt;, 'OSError': &amp;lt;type at remote 0x71fe740&amp;gt;, 'DeprecationWarning': &amp;lt;type at remote 0x72000a0&amp;gt;, 'min': &amp;lt;builtin_function_or_method at remote 0xb7fb242c&amp;gt;, 'UnicodeWarning': &amp;lt;type at remote 0x72005e0&amp;gt;, 'execfile': &amp;lt;builtin_function_or_method at remote 0xb7fb202c&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, 'complex': 
&amp;lt;type at remote 0x71fc840&amp;gt;, 'bool': &amp;lt;type at remote 0x71f9560&amp;gt;, 'ValueError': &amp;lt;type at remote 0x71ff2a0&amp;gt;, 'NotImplemented': &amp;lt;NotImplementedType at remote 0x72061e8&amp;gt;, 'map': &amp;lt;builtin_function_or_method at remote 0xb7fb23cc&amp;gt;, 'any': &amp;lt;builtin_function_or_method at remote 0xb7fafd0c&amp;gt;, 'max': &amp;lt;builtin_function_or_method at remote 0xb7fb240c&amp;gt;, 'object': &amp;lt;type at remote 0x720ad00&amp;gt;, 'TabError': &amp;lt;type at remote 0x71fef20&amp;gt;, 'callable': &amp;lt;builtin_function_or_method at remote 0xb7fafdcc&amp;gt;, 'ZeroDivisionError': &amp;lt;type at remote 0x71ffa80&amp;gt;, 'eval': &amp;lt;builtin_function_or_method at remote 0xb7faffcc&amp;gt;, '__debug__': &amp;lt;bool at remote 0x71f9630&amp;gt;, 'ReferenceError': &amp;lt;type at remote 0x71ffc40&amp;gt;, 'AssertionError': &amp;lt;type at remote 0x71ff700&amp;gt;, 'classmethod': &amp;lt;type at remote 0x7203500&amp;gt;, 'UnboundLocalError': &amp;lt;type at remote 0x71feba0&amp;gt;, 'NotImplementedError': &amp;lt;type at remote 0x71fe9e0&amp;gt;, 'AttributeError': &amp;lt;type at remote 0x71fec80&amp;gt;, 'OverflowError': &amp;lt;type at remote 0x71ff9a0&amp;gt;}, '_FUNCFLAG_USE_ERRNO': 8, '_memset_addr': 4962944, '_dlopen': &amp;lt;builtin_function_or_method at remote 0xb7f40e0c&amp;gt;, '__name__': 'ctypes', 'RTLD_LOCAL': 0, 'c_int16': &amp;lt;_ctypes.SimpleType at remote 0x80f407c&amp;gt;, '_SimpleCData': &amp;lt;_ctypes.SimpleType at remote 0x17bde0&amp;gt;, 'wstring_at': &amp;lt;function at remote 0xb7f32a3c&amp;gt;, 'c_void_p': &amp;lt;_ctypes.SimpleType at remote 0x80fa6a4&amp;gt;, 'set_conversion_mode': &amp;lt;builtin_function_or_method at remote 0xb7f40dec&amp;gt;, 'PyDLL': &amp;lt;type at remote 0x80fc3dc&amp;gt;, 'DEFAULT_MODE': 0, 'LittleEndianStructure': &amp;lt;_ctypes.StructType at remote 0x17c040&amp;gt;, 'c_uint64': &amp;lt;_ctypes.SimpleType at remote 0x80f8d54&amp;gt;, 'c_ulonglong': 
&amp;lt;_ctypes.SimpleType at remote 0x80f8d54&amp;gt;, '_FUNCFLAG_USE_LASTERROR': 16, '_cast_addr': 1490912, 'ARRAY': &amp;lt;function at remote 0xb7f3279c&amp;gt;, 'c_ushort': &amp;lt;_ctypes.SimpleType at remote 0x80f48ac&amp;gt;, '__doc__': 'create and manipulate C data types in Python', '_check_size': &amp;lt;function at remote 0xb7f32684&amp;gt;, 'CDLL': &amp;lt;type at remote 0x80fbeac&amp;gt;, '_wstring_at': &amp;lt;CFunctionType at remote 0xb7f9d644&amp;gt;, 'c_ubyte': &amp;lt;_ctypes.SimpleType at remote 0x80f9564&amp;gt;, 'RTLD_GLOBAL': 256, 'c_char': &amp;lt;_ctypes.SimpleType at remote 0x80f9ec4&amp;gt;, 'c_uint32': &amp;lt;_ctypes.SimpleType at remote 0x80f5974&amp;gt;, 'c_float': &amp;lt;_ctypes.SimpleType at remote 0x80f71fc&amp;gt;, 'SetPointerType': &amp;lt;function at remote 0xb7f32764&amp;gt;, 'resize': &amp;lt;builtin_function_or_method at remote 0xb7f40dcc&amp;gt;, '_c_functype_cache': {(&amp;lt;_ctypes.SimpleType at remote 0x80f5124&amp;gt;, (), 1): &amp;lt;_ctypes.CFuncPtrType at remote 0x8100b14&amp;gt;, (&amp;lt;_ctypes.SimpleType at remote 0x80fa6a4&amp;gt;, (&amp;lt;_ctypes.SimpleType at remote 0x80fa6a4&amp;gt;, &amp;lt;_ctypes.SimpleType at remote 0x80f5124&amp;gt;, &amp;lt;_ctypes.SimpleType at remote 0x80f5974&amp;gt;), 1): &amp;lt;_ctypes.CFuncPtrType at remote 0x80fd84c&amp;gt;}, '---Type &amp;lt;return&amp;gt; to continue, or q &amp;lt;return&amp;gt; to quit---&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;_os': &amp;lt;module at remote 0xb7fa314c&amp;gt;, '_wstring_at_addr': 1494192, 'cast': &amp;lt;function at remote 0xb7f329cc&amp;gt;, 'c_int8': &amp;lt;_ctypes.SimpleType at remote 0x80f9a14&amp;gt;, 'c_byte': &amp;lt;_ctypes.SimpleType at remote 0x80f9a14&amp;gt;, 'c_int64': &amp;lt;_ctypes.SimpleType at remote 0x80f8544&amp;gt;, 'c_voidp': &amp;lt;_ctypes.SimpleType at remote 0x80fa6a4&amp;gt;, '_string_at_addr': 1474016, '_FUNCFLAG_CDECL': 1, 'pythonapi': &amp;lt;PyDLL at remote 0xb7f45a0c&amp;gt;, 'PYFUNCTYPE': 
&amp;lt;function at remote 0xb7f327d4&amp;gt;, '_CFuncPtr': &amp;lt;_ctypes.CFuncPtrType at remote 0x17bb40&amp;gt;, '_endian': &amp;lt;module at remote 0xb7fa3944&amp;gt;, '__package__': 'ctypes', 'c_uint16': &amp;lt;_ctypes.SimpleType at remote 0x80f48ac&amp;gt;, 'BigEndianStructure': &amp;lt;_swapped_meta at remote 0x81006ec&amp;gt;, 'pydll': &amp;lt;LibraryLoader at remote 0xb7f459ec&amp;gt;, '__version__': '1.1.0', 'Structure': &amp;lt;_ctypes.StructType at remote 0x17c040&amp;gt;, 'c_uint': &amp;lt;_ctypes.SimpleType at remote 0x80f5974&amp;gt;, 'py_object': &amp;lt;_ctypes.SimpleType at remote 0x80f3dc4&amp;gt;, 'c_wchar_p': &amp;lt;_ctypes.SimpleType at remote 0x80fae7c&amp;gt;, '_string_at': &amp;lt;CFunctionType at remote 0xb7f9d5dc&amp;gt;, 'c_size_t': &amp;lt;_ctypes.SimpleType at remote 0x80f5974&amp;gt;, 'c_uint8': &amp;lt;_ctypes.SimpleType at remote 0x80f9564&amp;gt;, 'LibraryLoader': &amp;lt;type at remote 0x80fc704&amp;gt;, 'Array': &amp;lt;_ctypes.ArrayType at remote 0x17bc20&amp;gt;, 'POINTER': &amp;lt;builtin_function_or_method at remote 0xb7f40d4c&amp;gt;}, locals=&amp;lt;unknown at remote 0x0&amp;gt;, args=0x80ec788, argcount=1, kws=0x80ec78c, kwcount=0, defs=0xb7f45d78, defcount=1, closure=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;unknown at remote 0x0&amp;gt;) at Python/ceval.c:2968&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#13 0x07160983 in fast_function (&lt;b&gt;f=&lt;/b&gt;&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&lt;b&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;File &amp;lt;stdin&amp;gt;, line 2, in bar (self=&amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, string_at=&amp;lt;function at remote 0xb7f32a04&amp;gt;)&lt;/b&gt;, throwflag=0) at 
Python/ceval.c:3802&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#14 call_function (&lt;b&gt;f=&lt;/b&gt;&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&lt;b&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;File &amp;lt;stdin&amp;gt;, line 2, in bar (self=&amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, string_at=&amp;lt;function at remote 0xb7f32a04&amp;gt;)&lt;/b&gt;, throwflag=0) at Python/ceval.c:3727&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#15 PyEval_EvalFrameEx (f=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;File &amp;lt;stdin&amp;gt;, line 2, in bar (self=&amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, string_at=&amp;lt;function at remote 0xb7f32a04&amp;gt;), throwflag=0) at Python/ceval.c:2389&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#16 0x07161b79 in fast_function (f=File &amp;lt;stdin&amp;gt;, line 1, in &amp;lt;module&amp;gt; (), throwflag=0) at Python/ceval.c:3792&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#17 call_function (f=File &amp;lt;stdin&amp;gt;, line 1, in &amp;lt;module&amp;gt; (), throwflag=0) at Python/ceval.c:3727&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#18 PyEval_EvalFrameEx (f=File &amp;lt;stdin&amp;gt;, line 1, in &amp;lt;module&amp;gt; (), throwflag=0) at Python/ceval.c:2389&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#19 0x07162642 in PyEval_EvalCodeEx (co=0xb7f2e578, globals=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'f': &amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at 
remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, '__builtins__': &amp;lt;module at remote 0xb7fa3074&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, '__name__': '__main__', 'Foo': &amp;lt;classobj at remote 0xb7f3817c&amp;gt;, '__doc__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;}, locals=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'f': &amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, '__builtins__': &amp;lt;module at remote 0xb7fa3074&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, '__name__': '__main__', 'Foo': &amp;lt;classobj at remote 0xb7f3817c&amp;gt;, '__doc__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;}, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0,&amp;nbsp;&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;closure=&amp;lt;unknown at remote 0x0&amp;gt;) at Python/ceval.c:2968&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#20 0x071627a3 in PyEval_EvalCode (co=0xb7f2e578, globals=&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'f': &amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, '__builtins__': &amp;lt;module at remote 0xb7fa3074&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, '__name__': '__main__', 'Foo': &amp;lt;classobj at remote 0xb7f3817c&amp;gt;, '__doc__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;}, locals=&lt;/address&gt;&lt;address 
out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'f': &amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, '__builtins__': &amp;lt;module at remote 0xb7fa3074&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, '__name__': '__main__', 'Foo': &amp;lt;classobj at remote 0xb7f3817c&amp;gt;, '__doc__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;}) at Python/ceval.c:522&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#21 0x0717d94b in run_mod (mod=&amp;lt;value optimized out&amp;gt;, filename=&amp;lt;value optimized out&amp;gt;, &lt;b&gt;globals=&lt;/b&gt;&lt;/address&gt;&lt;b&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'f': &amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, '__builtins__': &amp;lt;module at remote 0xb7fa3074&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, '__name__': '__main__', 'Foo': &amp;lt;classobj at remote 0xb7f3817c&amp;gt;, '__doc__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;}, locals=&lt;/address&gt;&lt;/b&gt;&lt;address out="" of="" bounds=""&gt;&lt;b&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{'f': &amp;lt;Foo({'someattr': 42, 'someotherattr': {'three': [(), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;,), (&amp;lt;NoneType at remote 0x72061e0&amp;gt;, &amp;lt;NoneType at remote 0x72061e0&amp;gt;)], 'two': 2L}}) at remote 0xb7f3946c&amp;gt;, '__builtins__': &amp;lt;module at remote 0xb7fa3074&amp;gt;, '__package__': &amp;lt;NoneType at remote 0x72061e0&amp;gt;, '__name__': '__main__', 'Foo': &amp;lt;classobj at remote 0xb7f3817c&amp;gt;, '__doc__': &amp;lt;NoneType at remote 
0x72061e0&amp;gt;}&lt;/b&gt;, flags=0xbffff2fc, arena=0x80e8628) at Python/pythonrun.c:1335&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#22 0x0717f4a6 in PyRun_InteractiveOneFlags (fp=0x5b5420, filename=0x71c3e7d &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, flags=0xbffff2fc) at Python/pythonrun.c:840&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#23 0x0717f6ab in PyRun_InteractiveLoopFlags (fp=0x5b5420, filename=0x71c3e7d &amp;quot;&amp;lt;stdin&amp;gt;&amp;quot;, flags=&amp;lt;value optimized out&amp;gt;) at Python/pythonrun.c:760&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#24 0x0717f7eb in PyRun_AnyFileExFlags (fp=0x5b5420, filename=&amp;lt;value optimized out&amp;gt;, closeit=0, flags=0xbffff2fc) at Python/pythonrun.c:729&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#25 0x0718c212 in Py_Main (argc=1, argv=0xbffff3f4) at Modules/main.c:599&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;#26 0x080485c7 in main (argc=1, argv=0xbffff3f4) at Modules/python.c:23&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&lt;/address&gt;&lt;address out="" of="" bounds=""&gt;&amp;nbsp;&lt;/address&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;a name='cutid1-end'&gt;&lt;/a&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:4183</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/4183.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=4183"/>
    <title>Python memory usage: is it worth sharing constant data?</title>
    <published>2009-12-19T01:19:52Z</published>
    <updated>2009-12-19T01:19:52Z</updated>
    <category term="fedora"/>
    <category term="python"/>
    <content type="html">I had some interesting conversations at &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FFUDCon%3AToronto_2009" rel="nofollow" rel="nofollow"&gt;FUDCon Toronto&lt;/a&gt; [1] about Python's memory footprint, and about ways of reducing the whole-system memory pressure of a collection of Python processes, by sharing more pages of memory between them.&lt;br /&gt;&lt;br /&gt;We discussed sharing memory between Python processes.&lt;br /&gt;&lt;br /&gt;One approach to doing this would be to write some kind of support for starting Python as a "zygote process", where we keep a started-up Python process around, forking from it, so that the freshly forked process (in theory) uses only memory for the pages that become different (copy-on-write).  As I understand it, this is how &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fcode.google.com%2Fp%2Fchromium%2Fwiki%2FLinuxZygote" rel="nofollow" rel="nofollow"&gt;Chromium implements its tabs&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Bernie Innocenti apparently tried this for OLPC's activities (each a PyGTK process) but only got 10-20% sharing.  The problem is that every Python object embeds a reference count, and the interpreter constantly increments and decrements those ob_refcnt fields, leading to unsharing of the pages.&lt;br /&gt;&lt;br /&gt;One way of improving this might be to invent a magic refcount value (e.g. 0) that means "don't refcount this", so that pages can be shared (stuff that's been created at VM startup will probably be long-lived).  
(do we need the ability to "seal" an object, to optimize for the common case where nothing's been monkeypatched?)&lt;br /&gt;&lt;br /&gt;(Or we could completely scrap refcounting and go to a full gc model, but I'm looking for easy wins here)&lt;br /&gt;&lt;br /&gt;A similar approach that doesn't require us to have a zygote process is to use &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FFeatures%2FKSM" rel="nofollow" rel="nofollow"&gt;KSM (Kernel SamePage Merging)&lt;/a&gt;.  This is a feature added in recent versions of the Linux kernel, where a program can mark regions of its address space.  The kernel can try to hash these pages, and pages that are bit-for-bit identical will be shared in RAM across the various processes.  &lt;br /&gt;&lt;br /&gt;KSM was developed by Qumranet hackers (now at Red Hat) for use in KVM.   Perhaps Python could use it?&lt;br /&gt;&lt;br /&gt;Unfortunately I don't think this approach will work either: all of the various PyObject* pointers cross-referencing the data in memory will be non-equal, and that will kill the sharing; the pages are unlikely to be bit-for-bit identical.&lt;br /&gt;&lt;br /&gt;One idea for achieving equality of pages was to mimic how we build emacs: as I understand it, we build emacs as a process, then take a core-dump of it.  On starting up emacs we bring the coredump back to life.  (I may be horribly mischaracterizing this - it seems very fragile to me).&lt;br /&gt;&lt;br /&gt;An approach that sounded promising was to try to consolidate the representation of the immutable blobs of data in the loaded python modules: the docstrings, the constant strings and unicode representations; the bytecode blobs (actually these are PyStringObjects).&lt;br /&gt;&lt;br /&gt;The idea was a new variant of .pyc/.pyo.  A .pyc file is a hierarchical representation of a parsed python module, containing everything needed at runtime (e.g. 
optimized bytecode blobs rather than source strings, with links to the constants needed in the code) serialized to disk using the "marshal" module.  It's analogous to pickle, except that the marshal format only caters to strict hierarchies of objects, whereas pickle supports cross-references, and thus the marshal code can be simpler (and hopefully more efficient).&lt;br /&gt;&lt;br /&gt;So in our proposed variant of .pyc, we would split the data into two streams:&lt;br /&gt;  - control data for building the hierarchy of objects, to be thrown away after the module is loaded&lt;br /&gt;  - "large" data to be mmap-ed, to persist in the process' address space after the module is loaded, with the kernel sharing all instances of this data in RAM between all python processes.  &lt;br /&gt;&lt;br /&gt;This would require hooks in PyStringObject (need PyBytesObject to do it for py3k) e.g. a new ob_sstate: SSTATE_INTERNED_MMAPPED, which would store a pointer to bytes held elsewhere in the address space.&lt;br /&gt;&lt;br /&gt;Some approaches to doing this:&lt;br /&gt;  - use the .pyc format as is.  Unfortunately I don't think this works: currently they're written to disk as (size, bytes) without a nul terminator, whereas PyStringObject assumes that ob_sval is nul-terminated. &lt;br /&gt;  - come up with a simple variant of .pyc that splits the marshalled data into two streams (as above), storing offsets into the second stream within the first whenever writing out e.g. a PyStringObject&lt;br /&gt;  - use the ELF format directly: ELF is a binary format supporting multiple chunks/streams of data, with room for expansion, and a library and command-line tools for manipulating them.  We could invent some new types of section.  
However I suspect that tools for dealing with ELF files are only readily-available on Linux (it happens to be the native format for executables and shared libraries)  (we came up with the name ".pye" to refer to these ELF-based bytecode files)&lt;br /&gt;&lt;br /&gt;Another idea was linkage: to try to link together all of the .pyc files per RPM into one big file, linking together the large sections as far as possible,  or perhaps a big sqlite db mapping dotted path to .pyc files for standard modules.  The idea here was to try to reduce the number of syscalls needed to locate the .py files.&lt;br /&gt;&lt;br /&gt;As it turned out, this seems to have been a classic case of optimizing without measuring.&lt;br /&gt;&lt;br /&gt;Looking at the "stat" case, starting up the python interpreter 100 times:&lt;br /&gt;&lt;pre&gt;
[david@brick ~]$ time (for i in $(seq 1 100) ; do python -c "pass" ; done)

real	0m3.129s
user	0m2.328s
sys	0m0.652s
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;...so the bulk of the time taken is in user-space, rather than waiting on the kernel.  (We tried this on the OLPC "boot animation" workload, and I believe the real slowdown is an accidental synchronous call that should have been asynchronous, and that stalls waiting for a socket to close).&lt;br /&gt;&lt;br /&gt;On my return from the conference I spent some time trying to capture real measurements to justify a possible pyc rewrite.&lt;br /&gt;&lt;br /&gt;To look at the memory usage of all of those shared docstrings, I wrote some systemtap scripts.&lt;br /&gt;&lt;br /&gt;You can see the latest version here: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fdmalcolm.fedorapeople.org%2Fsystemtap%2Fmarshal.stp' rel='nofollow'&gt;http://dmalcolm.fedorapeople.org/systemtap/marshal.stp&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I tried various approaches to instrumentation.&lt;br /&gt;&lt;br /&gt;The one I've settled on is to instrument returns from the r_object() call in Python/marshal.c, and to record PyObject* instances returned from that function that are of ob_type "str" (i.e. are PyStringObject instances) and have ob_refcnt == 1 (i.e. they aren't shared with anything, and haven't been interned).&lt;br /&gt;&lt;br /&gt;Assuming I've instrumented things correctly, a simple startup of the interpreter under systemtap:&lt;br /&gt;  $ stap -v marshal.stp -c"python -c'pass'"&lt;br /&gt;has this usage of unique strings (the "value" is the length of the string); note that this includes docstrings, constant strings, and bytecode blobs:&lt;br /&gt;(snipped, full output here: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fdmalcolm.fedorapeople.org%2Fsystemtap%2Foutput-of-marshal.stp-on-python-pass.txt' rel='nofollow'&gt;http://dmalcolm.fedorapeople.org/systemtap/output-of-marshal.stp-on-python-pass.txt&lt;/a&gt; )&lt;br /&gt;&lt;pre&gt;
Total cumulative size of r_object() calls returning strings with refcnt==1:  192K
value |-------------------------------------------------- count
    0 |                                                     0
    1 |@@                                                  54
    2 |@@@@@@@@@@@@@@                                     291
    4 |@@@@@@@@@@@                                        238
    8 |@@@@@@@@@@@@@@                                     281
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    959
   32 |@@@@@@@@@@@@@@@@@@@@@                              432
   64 |@@@@@@@@@                                          196
  128 |@@@@@@                                             138
  256 |@@@@                                                97
  512 |@@                                                  45
 1024 |                                                    11
 2048 |                                                     5
 4096 |                                                     1
 8192 |                                                     1
16384 |                                                     0
32768 |                                                     0
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;so (assuming my method is correct) we'd save 192K of mmap-ed data per python process.&lt;br /&gt;&lt;br /&gt;For the OLPC case, each "activity" on the laptop is a python process that typically imports the GTK and DBus modules.&lt;br /&gt;&lt;br /&gt;This shows a larger saving: 431K per python process:&lt;br /&gt;&lt;pre&gt;
$ stap -v marshal.stp -c"python -c'import gtk; import dbus'"
&lt;/pre&gt;&lt;br /&gt;(output snipped; full output here: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fdmalcolm.fedorapeople.org%2Fsystemtap%2Foutput-of-marshal.stp-on-python-import-gtk-and-dbus.txt' rel='nofollow'&gt;http://dmalcolm.fedorapeople.org/systemtap/output-of-marshal.stp-on-python-import-gtk-and-dbus.txt&lt;/a&gt; )&lt;br /&gt;&lt;pre&gt;
Total cumulative size of r_object() calls returning strings with refcnt==1:  431K
value |-------------------------------------------------- count
    0 |                                                      0
    1 |@                                                    65
    2 |@@@@@@@@@@@@@                                       534
    4 |@@@@@@@@@@@@@@                                      565
    8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   1302
   16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1958
   32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      1173
   64 |@@@@@@@@@@@                                         478
  128 |@@@@@@@@                                            336
  256 |@@@@@                                               216
  512 |@@                                                   87
 1024 |                                                     22
 2048 |                                                     10
 4096 |                                                      2
 8192 |                                                      1
16384 |                                                      0
32768 |                                                      0
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Similarly, I suspect that there may be savings if you have numerous python web apps on one box (mod_wsgi daemon mode?), or via KSM savings as above if dealing with multiple guest VMs running python on one host.&lt;br /&gt;&lt;br /&gt;Worth pursuing?&lt;br /&gt;&lt;br /&gt;I also started looking at "yum"'s memory usage; see &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fdmalcolm.fedorapeople.org%2Fsystemtap%2Foutput-of-marshal.stp-on-yum-whatprovides-python.txt' rel='nofollow'&gt;http://dmalcolm.fedorapeople.org/systemtap/output-of-marshal.stp-on-yum-whatprovides-python.txt&lt;/a&gt; .  I wrote a systemtap script to try to instrument the various levels of memory allocation inside the python runtime; see &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fdmalcolm.fedorapeople.org%2Fsystemtap%2Fpython-mem.stp' rel='nofollow'&gt;http://dmalcolm.fedorapeople.org/systemtap/python-mem.stp&lt;/a&gt;  ; unfortunately this script doesn't work yet, owing to &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Fbugzilla.redhat.com%2Fshow_bug.cgi%3Fid%3D546006" rel="nofollow" rel="nofollow"&gt;a systemtap bug&lt;/a&gt;.  Hopefully when that's fixed we can get some real insight into this.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[1] with &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fcodewiz.org%2F" rel="nofollow" rel="nofollow"&gt;Bernie Innocenti&lt;/a&gt;, &lt;a href="http://cgwalters.livejournal.com/" target="_blank"&gt;Colin Walters&lt;/a&gt;, &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Flewk.org%2Fblog" rel="nofollow" rel="nofollow"&gt;Luke Macken&lt;/a&gt;, &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.freedesktop.org%2F%7Eajax%2F" rel="nofollow" rel="nofollow"&gt;Adam Jackson&lt;/a&gt; and others; please forgive me if I've forgotten you.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:3935</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/3935.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=3935"/>
    <title>2to3c: an implementation of Python's 2to3 for C code</title>
    <published>2009-11-21T00:54:51Z</published>
    <updated>2009-11-21T00:54:51Z</updated>
    <category term="python"/>
    <content type="html">I'm hoping that we'll package python 3 versions of as many modules as possible in &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2F" rel="nofollow" rel="nofollow"&gt;Fedora 13&lt;/a&gt;, so the easier it is to port them, the better.&lt;br /&gt;&lt;br /&gt;To that end, I've written a tool to help people port their C python extensions from Python 2 to Python 3.&lt;br /&gt;&lt;br /&gt;It uses the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fcoccinelle.lip6.fr%2F" rel="nofollow" rel="nofollow"&gt;Coccinelle&lt;/a&gt; tool to apply a &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2Fgitweb%3Fp%3Ddmalcolm%2Fpublic_git%2F2to3c.git%3Ba%3Dtree%3Bf%3Dfixes%3Bhb%3DHEAD" rel="nofollow" rel="nofollow"&gt;series of "semantic patches" to .c files&lt;/a&gt;.  I also had to code one of the refactorings in python with regular expressions (due to the need to &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2Fgitweb%3Fp%3Ddmalcolm%2Fpublic_git%2F2to3c.git%3Ba%3Dblob%3Bf%3Dfixes%2Ftypeobject.py" rel="nofollow" rel="nofollow"&gt;manipulate preprocessor macros containing commas&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Sample session, running on a tarball of dbus-python:&lt;br /&gt;&lt;pre&gt;
[david@brick 2to3]$ ./2to3c --help
Usage: 2to3c [options] filenames...

Options:
  -h, --help   show this help message and exit
  -w, --write  Write back modified files
[david@brick 2to3]$ ./2to3c ../../python3/packaging/modules/by-hand/dbus-python/devel/dbus-python-0.83.0/_dbus_bindings/*.c &amp;gt; dbus-python.patch 

[david@brick 2to3]$ diffstat dbus-python.patch
 abstract.c       |   28 ++----
 bus.c            |    4 
 bytes.c          |   16 +--
 conn.c           |    7 -
 containers.c     |   21 ++--
 float.c          |    6 -
 generic.c        |    4 
 int.c            |   31 ++-----
 libdbusconn.c    |    5 -
 mainloop.c       |    3 
 message-append.c |    4 
 message.c        |   17 +--
 module.c         |  243 ++++++++++++++++++++++++++++++++++++++++++++-----------
 pending-call.c   |    3 
 server.c         |    7 -
 signature.c      |    6 -
 string.c         |    9 --
 17 files changed, 267 insertions(+), 147 deletions(-)

[david@brick 2to3]$ head -n 30 dbus-python.patch
--- ../../python3/packaging/modules/by-hand/dbus-python/devel/dbus-python-0.83.0/_dbus_bindings/abstract.c.orig 
+++ ../../python3/packaging/modules/by-hand/dbus-python/devel/dbus-python-0.83.0/_dbus_bindings/abstract.c 
@@ -54,7 +54,7 @@
 
     if (!vl_obj)
         return 0;
-    return PyInt_AsLong(vl_obj);
+    return PyLong_AsLong(vl_obj);
 }
 
 dbus_bool_t
@@ -76,7 +76,7 @@
         }
     }
     else {
-        PyObject *vl_obj = PyInt_FromLong(variant_level);
+        PyObject *vl_obj = PyLong_FromLong(variant_level);
         if (!vl_obj) {
             Py_DECREF(key);
             return FALSE;
@@ -127,7 +127,7 @@
     Py_DECREF(key);
 
     if (!value)
-        return PyInt_FromLong(0);
+        return PyLong_FromLong(0);
     Py_INCREF(value);
     return value;
 }
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;You can see the full patch it generated here: &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fdmalcolm.fedorapeople.org%2Fdbus-python.patch' rel='nofollow'&gt;http://dmalcolm.fedorapeople.org/dbus-python.patch&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It hasn't done all of the work: there are some places involving the preprocessor where it didn't quite generate correct code, and some issues remain. For example, a human is going to have to decide whether the strings are bytes or unicode.&lt;br /&gt;&lt;br /&gt;However, I think this ought to save a lot of time: it takes care of a lot of the tedious parts of such patches.&lt;br /&gt;&lt;br /&gt;The public git repo can be seen here:&lt;br /&gt;  &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2Fgitweb%3Fp%3Ddmalcolm%2Fpublic_git%2F2to3c.git%3Ba%3Dtree' rel='nofollow'&gt;http://fedorapeople.org/gitweb?p=dmalcolm/public_git/2to3c.git;a=tree&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You should be able to download it by cloning it thus:&lt;br /&gt;&lt;pre&gt;git clone git://fedorapeople.org/home/fedora/dmalcolm/public_git/2to3c.git&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Patches most welcome! (send them to dmalcolm@redhat.com)  I intend to license this under LGPLv2.1, but am happy to relicense as the upstream Python community see fit.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:3689</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/3689.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=3689"/>
    <title>Static analysis of CPython's .c code</title>
    <published>2009-11-16T23:09:54Z</published>
    <updated>2009-11-16T23:09:54Z</updated>
    <category term="python"/>
    <content type="html">I've been hearing good things about &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fcoccinelle.lip6.fr%2F" rel="nofollow" rel="nofollow"&gt;Coccinelle&lt;/a&gt; for a while now, a tool for working with C code.&lt;br /&gt;&lt;br /&gt;For example, it's been used for &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Flwn.net%2FArticles%2F315686%2F" rel="nofollow" rel="nofollow"&gt;automating tedious (and error-prone) work on the Linux kernel&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I decided it was time to take it for a test-drive on CPython code.&lt;br /&gt;&lt;br /&gt;I occasionally run into problems with the &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fdocs.python.org%2Fc-api%2Farg.html" rel="nofollow" rel="nofollow"&gt;PyArg_ParseTuple API&lt;/a&gt;.  It's a convenient API - it makes it very easy to marshal the objects passed as parameters to Python function calls into their C equivalents using a mini-language.  The downside of this approach is that the compiler can't check such code for type safety, and so it's an area where bugs can lurk.&lt;br /&gt;&lt;br /&gt;So I've written a tool which can detect such problems.  You can download it from my &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Ffedoraproject.org%2Fwiki%2FOverview" rel="nofollow" rel="nofollow"&gt;Fedora&lt;/a&gt; people page using this command:&lt;br /&gt;&lt;tt&gt;git clone git://fedorapeople.org/home/fedora/dmalcolm/public_git/check-cpython.git&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;or simply read the code online here:&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Ffedorapeople.org%2Fgitweb%3Fp%3Ddmalcolm%2Fpublic_git%2Fcheck-cpython.git%3Ba%3Dtree' rel='nofollow'&gt;http://fedorapeople.org/gitweb?p=dmalcolm/public_git/check-cpython.git;a=tree&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;To run it you'll need Coccinelle (which includes the &amp;quot;spatch&amp;quot; tool).  
On Fedora you can install it using this command:&lt;br /&gt;&lt;tt&gt;yum install coccinelle&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;You should then be able to run it thus:&lt;br /&gt;&lt;tt&gt;spatch -sp_file pyarg-parsetuple.cocci buggy.c&lt;/tt&gt;&lt;br /&gt; &lt;br /&gt;&lt;tt&gt;init_defs_builtins: /usr/share/coccinelle/standard.h&lt;br /&gt;HANDLING: buggy.c&lt;br /&gt;buggy.c:13:socket_htons:Mismatching type of argument 1 in &amp;quot;&amp;quot;i:htons&amp;quot;&amp;quot;: expected &amp;quot;int *&amp;quot; but got &amp;quot;unsigned long *&amp;quot;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;thus correctly finding the bug (an old one, fixed in &lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fsvn.python.org%2Fview%3Fview%3Drev%26amp%3Brevision%3D34931' rel='nofollow'&gt;http://svn.python.org/view?view=rev&amp;amp;revision=34931&lt;/a&gt; )&lt;br /&gt;&lt;br /&gt;Early days yet, but this seems promising. Does anyone know of any other non-proprietary tools that can do this kind of thing?&lt;br /&gt;&lt;br /&gt;(I've posted more info to python-dev list &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fmail.python.org%2Fpipermail%2Fpython-dev%2F2009-November%2F094301.html" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;)</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:3340</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/3340.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=3340"/>
    <title>librpm and python 3(000)</title>
    <published>2009-10-15T21:29:38Z</published>
    <updated>2009-10-15T21:29:38Z</updated>
    <category term="rpm"/>
    <category term="fedora"/>
    <category term="python"/>
    <content type="html">I've had a go at hacking up librpm's python bindings so that they work with both python 2 and python 3.&lt;br /&gt;&lt;br /&gt;Here's my progress so far, trying out some sample rpm python code with py3k.  Some things appear to be working, but I really need a unit test suite to do this properly, I think.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;[david@brick rpm]$ PYTHONPATH=/home/david/coding/python3/rpm-python-bindings/install-prefix/lib/python3.1/site-packages python3&lt;br /&gt;Python 3.1.1 (r311:74480, Oct  1 2009, 12:20:21) &lt;br /&gt;[GCC 4.4.1 20090725 (Red Hat 4.4.1-2)] on linux2&lt;br /&gt;Type &amp;quot;help&amp;quot;, &amp;quot;copyright&amp;quot;, &amp;quot;credits&amp;quot; or &amp;quot;license&amp;quot; for more information.&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; import rpm&lt;br /&gt;/home/david/coding/python3/rpm-python-bindings/install-prefix/lib/python3.1/site-packages/rpm/__init__.py:9: DeprecationWarning: Type rpm.hdr defines tp_reserved (formerly tp_compare) but not tp_richcompare. Comparisons may not behave as intended.&lt;br /&gt;  from rpm._rpm import *&lt;br /&gt;# (clearly I need to fix this)&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; rpm.addMacro(&amp;quot;_dbpath&amp;quot;, &amp;quot;/var/lib/rpm&amp;quot;)&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; ts = rpm.TransactionSet()&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; mi = ts.dbMatch()&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; for h in mi:&lt;br /&gt;...     print(&amp;quot;%s-%s-%s&amp;quot; % (h['name'], h['version'], h['release']))&lt;br /&gt;b'im-chooser'-b'1.2.6'-b'3.fc11'&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;etc. 
(note how print has become a function, and how the data is coming back as &amp;quot;bytes&amp;quot;, not as strings)&lt;br /&gt;&lt;br /&gt;For the gory details, I sent my patches here:&lt;br /&gt;http://lists.rpm.org/pipermail/rpm-maint/2009-October/002528.html&lt;br /&gt;&lt;br /&gt;Does anyone have a good test suite for the python rpm API?&lt;br /&gt;&lt;br /&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:3271</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/3271.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=3271"/>
    <title>Whence came "codebase"?</title>
    <published>2009-06-24T20:27:59Z</published>
    <updated>2009-06-24T20:27:59Z</updated>
    <content type="html">The word &amp;quot;codebase&amp;quot; has been bugging me lately.&amp;nbsp; I've noticed myself using it, and it's starting to grate on me.&lt;br /&gt;&lt;br /&gt;It appears to mean a collection of computer code, from an implementation perspective.&amp;nbsp; Users of the word might refer to the &amp;quot;foo-1.0 codebase&amp;quot; as opposed to the &amp;quot;foo-2.0 codebase, where we totally rewrote everything&amp;quot;.&lt;br /&gt;&lt;br /&gt;Assuming that this is what the word means, it's a useful one.&amp;nbsp; However, where did this word come from, and when?&amp;nbsp;&amp;nbsp; Does it have a more precise meaning, or has it spread across the internet with the rough (but useful) meaning above?&lt;br /&gt;&lt;br /&gt; The only definitions I can find are within Wikipedia or Wiktionary (neither of which I'm a fan of).&amp;nbsp; I don't see it in the Jargon File either (but ditto, frankly).&amp;nbsp; It also seems to have a (different) specific meaning for Java applets, and is the name of a proprietary database product.&lt;br /&gt;&lt;br /&gt;Does anyone have an etymology for this word?</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:3069</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/3069.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=3069"/>
    <title>squeal 0.4</title>
    <published>2009-06-05T01:04:45Z</published>
    <updated>2009-06-05T01:04:45Z</updated>
    <content type="html">&lt;div class=""&gt;&lt;div&gt;&lt;p&gt;I've put together a tarball release of my SQL/command-line collision, formerly &amp;quot;show&amp;quot;, now &amp;quot;squeal&amp;quot;.&lt;br /&gt;&lt;br /&gt;Tarball here:&amp;nbsp;&amp;nbsp; &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.redhat.com%2Fdmalcolm%2Fpython%2Fsqueal-0.4-3.src.rpm" rel="nofollow" rel="nofollow"&gt;https://fedorahosted.org/released/squeal/squeal-0.4.tar.gz&lt;/a&gt;&lt;br /&gt;SRPM for Fedora/RHEL 5 here: &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.redhat.com%2Fdmalcolm%2Fpython%2Fsqueal-0.4-3.src.rpm" rel="nofollow" rel="nofollow"&gt;http://people.redhat.com/dmalcolm/python/squeal-0.4-3.src.rpm&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here's what happened other than the name change:&lt;/p&gt;&lt;h2&gt;New Data Sources&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23NewDataSources" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h2&gt; &lt;h3&gt;Text files and streams&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23Textfilesandstreams" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h3&gt; &lt;p&gt;Squeal can now deal with arbitrary text files, and with stdin (using &amp;quot;-&amp;quot;).&lt;/p&gt; &lt;p&gt;It uses the first line of the input to determine the number of columns, giving you an input source of string columns named &amp;quot;col0&amp;quot;, &amp;quot;col1&amp;quot;, ..., up to N-1.&lt;/p&gt; &lt;p&gt;By default it splits columns on whitespace, but you can use -F à la &amp;quot;awk&amp;quot; to specify a field separator, e.g.:&lt;/p&gt; &lt;pre class=""&gt;
   cat some-file.txt | squeal -F: --format=html -
&lt;/pre&gt;&lt;p&gt;to generate an HTML table from a colon-separated file.&lt;/p&gt; &lt;p&gt;You can also specify a regular expression containing groups, which will become the columns, e.g.:&lt;/p&gt; &lt;pre class=""&gt;
   squeal col0, count from -r &amp;quot;(\&amp;lt;[^\&amp;gt;]+\&amp;gt;) (.*)&amp;quot; irc-log.txt group by col0 order by count desc
&lt;/pre&gt;&lt;p&gt;to figure out who's the chattiest in an IRC log.&lt;/p&gt; &lt;p&gt;Of course, if you need more complicated parsing, it's probably worth writing a dedicated data-source backend.&lt;/p&gt; &lt;h3&gt;Archives&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23Archives" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h3&gt; &lt;p&gt;You can now issue SQL-like queries upon the contents of various kinds of archive files: .zip, .tar, .tar.bz2, .tar.gz, and .rpm.  For example, here's a query on the payload of an RPM file:&lt;/p&gt; &lt;pre class=""&gt;
    $ squeal &amp;quot;total(size)&amp;quot; from ~/rpmbuild/RPMS/noarch/squeal-0.4-1.fc10.noarch.rpm
    total(size)|
    -----------+
       171407.0|
&lt;/pre&gt;&lt;h3&gt;tcpdump/Wireshark files&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23tcpdumpWiresharkfiles" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h3&gt; &lt;p&gt;I wrote a proof-of-concept backend for querying tcpdump files, e.g.:&lt;/p&gt; &lt;pre class=""&gt;
$ squeal &amp;quot;count(*)&amp;quot;, &amp;quot;total(length)&amp;quot;, src_mac, dst_mac from test.pcap group by src_mac, dst_mac
&lt;/pre&gt;&lt;p&gt;to analyse the quantity of network traffic between pairs of machines.&lt;/p&gt; &lt;p&gt;Internally it merely invokes tcpdump to turn the file back into textual form, then carves that up with regexps, so it's not at all robust yet.&lt;/p&gt; &lt;h3&gt;/var/log/maillog* (sendmail and spamd)&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23varlogmaillogsendmailandspamd" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h3&gt; &lt;p&gt;I wrote an experimental parser for /var/log/maillog&lt;/p&gt; &lt;h2&gt;Query Syntax Improvements&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23QuerySyntaxImprovements" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h2&gt; &lt;p&gt;Squeal can split its own arguments, to minimize the amount of escaping that you have to do within the shell.  You can now pass in mixed queries like:&lt;/p&gt; &lt;pre class=""&gt;
      $ squeal &amp;quot;count(*), total(size) host from&amp;quot; /var/log/httpd/*error_log* \
      &amp;quot;order by total(size) desc&amp;quot;
&lt;/pre&gt;&lt;p&gt;where some of the arguments are split by the shell, and some by squeal's parser.&lt;/p&gt; &lt;p&gt;You can now type &amp;quot;count&amp;quot; rather than &amp;quot;count(*)&amp;quot;, provided &amp;quot;count&amp;quot; isn't a column name (it's a pain to type, and to have to escape this from the shell). So the above query can be simplified further to:&lt;/p&gt; &lt;pre class=""&gt;
      $ squeal &amp;quot;count, total(size) host from&amp;quot; /var/log/httpd/*error_log* \
      &amp;quot;order by total(size) desc&amp;quot;
&lt;/pre&gt;&lt;h2&gt;Output&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23Output" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h2&gt; &lt;p&gt;I added two new formatting options:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&amp;quot;text&amp;quot;  : outputs as lines of space-separated fields&lt;/li&gt;&lt;li&gt;&amp;quot;table&amp;quot; : outputs an ascii-art table&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Bugfixes&lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2Fwiki%2FReleaseNotes%2F0.4%23Bugfixes" class="" title="Link to this section" rel="nofollow" rel="nofollow"&gt; &amp;para;&lt;/a&gt;&lt;/h2&gt; &lt;ul&gt;&lt;li&gt;The httpd log backend now supports parsing logs containing usernames&lt;/li&gt;&lt;li&gt;The syslog backend can now deal with single-digit dates within a month(!)&lt;/li&gt;&lt;li&gt;Deal with absolute and relative paths when path-matching input filenames; only check for Augeas below /etc&lt;/li&gt;&lt;li&gt;Various internal cleanups&lt;/li&gt;&lt;li&gt;Started a unit test suite.&lt;/li&gt;&lt;li&gt;Support Python &amp;lt; 2.5 by using earlier versions of sqlite; runs on RHEL 5&lt;/li&gt;&lt;li&gt;Work around an issue seen sometimes with the RPM backend&lt;/li&gt;&lt;li&gt;Detect exceptions in execution of the internal sqlite queries, and log them&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;Enjoy!&lt;br /&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:2785</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/2785.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=2785"/>
    <title>"show" is now "squeal"</title>
    <published>2009-05-25T03:07:48Z</published>
    <updated>2009-05-25T15:47:32Z</updated>
    <content type="html">My &lt;a href="http://dmalcolm.livejournal.com/1301.html" target="_blank"&gt;command-line/SQL hybrid&lt;/a&gt; needed a new name, since &amp;quot;show&amp;quot; was regarded as too generic.&lt;br /&gt;&lt;br /&gt;Thanks to everyone who suggested a name, and for the feedback.  &amp;quot;squeal&amp;quot; was my favourite, so &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fsqueal%2F" rel="nofollow" rel="nofollow"&gt;&amp;quot;show&amp;quot; is now named squeal&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'm working on a release; it's gained a few new features since I last blogged.&lt;br /&gt;&lt;br /&gt;(Thanks to the Fedora Infrastructure team for doing the rename)&lt;br /&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:2433</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/2433.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=2433"/>
    <title>How not to render 3D graphics: 40 ways to get a blank black screen</title>
    <published>2009-04-12T21:52:27Z</published>
    <updated>2009-04-12T21:52:27Z</updated>
    <content type="html">I've learned most of these the hard way:&lt;br /&gt;    &lt;ul&gt;&lt;li&gt;The monitor is in power-saving mode&lt;/li&gt;&lt;li&gt;Your video card has two outputs, and the monitor is plugged into the wrong one.&lt;/li&gt;&lt;li&gt;The screensaver kicked in while you were talking to your boss, but your program has grabbed input focus.&lt;/li&gt;&lt;li&gt;You've forgotten to flip the display buffers.  You're drawing everything OK, but not displaying it.&lt;/li&gt;&lt;li&gt;There's a bug in your memory management.  You're rendering to a different area of video RAM than the one the monitor is reading from.&lt;/li&gt;&lt;li&gt;Your scene is too complex; either the CPU or GPU crashed before the first frame finished rendering.&lt;/li&gt;&lt;li&gt;The &amp;quot;near&amp;quot; clip plane is further away than the &amp;quot;far&amp;quot; clip plane&lt;/li&gt;&lt;li&gt;You're rejecting the wrong side of each line as you walk your BSP tree; your universe is hiding just beyond the corner of your eye.&lt;/li&gt;&lt;li&gt;You forgot to set up the bounding-boxes for your objects, and all of them are being rejected as too small&lt;/li&gt;&lt;li&gt;You have a bug in your matrix library.  The entire universe has collapsed to (0,0,0).&lt;/li&gt;&lt;li&gt;You have a bug in your quaternion library.  All of your rotated objects are collapsing to points.&lt;/li&gt;&lt;li&gt;You have a bug in your back-face culling.  Every face is being culled.&lt;/li&gt;&lt;li&gt;Your camera is too narrow angle; the universe has collapsed to a single point directly ahead of you&lt;/li&gt;&lt;li&gt;Your camera's so wide-angle... 
(insert mother joke)&lt;/li&gt;&lt;li&gt;You've confused the origins of screen space and of the frame buffer, and all of the scene is being rendered off the side of the framebuffer (and culled per-fragment)&lt;/li&gt;&lt;li&gt;You're using index-colour, and you forgot to set up a palette: all colours are black.&lt;/li&gt;&lt;li&gt;Your scale is wildly wrong: the scene is vastly larger than you expect.  You are viewing a single texel on a single triangle in the scene.&lt;/li&gt;&lt;li&gt;Your scale is wildly wrong: the scene is vastly smaller than you expect.  The entire world has collapsed to a tiny point in the centre of the screen.&lt;/li&gt;&lt;li&gt;The camera is outside the scene and pointing the wrong way.  Did you try looking behind you?  Above you?  etc&lt;/li&gt;&lt;li&gt;You forgot to add any lights to the scene.  Darkness remains over the surface of the deep.&lt;/li&gt;&lt;li&gt;You forgot to set the textures.  All of the scene is being rendered with a blank texture.&lt;/li&gt;&lt;li&gt;You forgot to set any UV coordinates.  The entire scene is being rendered with the colour at (0,0) in their textures.&lt;/li&gt;&lt;li&gt;You're doing integer multiplication of fractions and everything is coming out as zero.&lt;/li&gt;&lt;li&gt;You've forgotten to offset the objects in the scene in world space and relative to the camera.  You are viewing all of the visible objects in the scene from inside, and back-face culling ensures that nothing is visible.  (Insert bad pun about &amp;quot;Lost In Translation&amp;quot;)&lt;/li&gt;&lt;li&gt;You've forgotten to pop matrices from the transformation stack, and you overflowed the stack after a few frames.  (You should check error codes at least once per frame)&lt;/li&gt;&lt;li&gt;Your collision algorithms aren't good enough.  
The object representing the camera has fallen through the floor and is hurtling at great speed downwards into nothingness whilst the world disappears high above.&lt;/li&gt;&lt;li&gt;Your geometry shader has a bug, and all geometry in the scene is appearing behind the camera&lt;/li&gt;&lt;li&gt;Your fragment shader failed to compile, and everything is coming out as black.&lt;/li&gt;&lt;li&gt;You forgot to clear the z-buffer, and all of your fragments are being rejected.&lt;/li&gt;&lt;li&gt;Your scene is foggier than a lazy TV remake of Sherlock Holmes.&lt;/li&gt;&lt;li&gt;You forgot to set up alpha values, and the entire world is fully-transparent.&lt;/li&gt;&lt;li&gt;You're standing in front of a black wall.&lt;/li&gt;&lt;li&gt;The game logic has decided that we're fading out the screen, in preparation for the next level.&lt;/li&gt;&lt;li&gt;You set up all of your data correctly, but a stray pointer is trashing one/all of the above.&lt;/li&gt;&lt;li&gt;The in-game menus have appeared, overlaid in front of your scene.  Unfortunately, due to poor state management, they're far too big, and a single corner of one letter &amp;quot;A&amp;quot; is obscuring the entire screen with black.&lt;/li&gt;&lt;li&gt;Your radiosity renderer doesn't have enough photons.&lt;/li&gt;&lt;li&gt;Your volume shader is too opaque.&lt;/li&gt;&lt;li&gt;Your bidirectional reflectance distribution functions aren't reflective enough&lt;/li&gt;&lt;li&gt;Time is running at a different rate than it should be, and your simulation code has either destroyed all of the objects in the scene, or none have been created yet.&lt;/li&gt;&lt;li&gt;There's a FIXME in your code that really needs fixing...&lt;/li&gt;&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:2283</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/2283.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=2283"/>
    <title>IM IN UR UZER-BIN BRAKIN UR TAB COMPLEESHUN</title>
    <published>2009-04-01T21:44:51Z</published>
    <updated>2009-04-01T21:44:51Z</updated>
    <content type="html">&lt;a href="http://pics.livejournal.com/dmalcolm/pic/00002fgh/" target="_blank"&gt;&lt;img src="https://pics.livejournal.com/dmalcolm/pic/00002fgh" width="280" height="210" border="0" fetchpriority="high" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm hoping to get my &lt;a href="http://dmalcolm.livejournal.com/2009/03/22/" target="_blank"&gt;SQL&lt;/a&gt;/&lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Flwn.net%2FArticles%2F324908%2F" rel="nofollow" rel="nofollow"&gt;command-line&lt;/a&gt; &lt;a href="http://dmalcolm.livejournal.com/2009/03/24/" target="_blank"&gt;mashup&lt;/a&gt; into Fedora, but, alas "show" is &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedoraproject.org%2Fwiki%2FPackaging_tricks%23Use_of_common_namespace" rel="nofollow" rel="nofollow"&gt;too generic to go into /usr/bin&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Some names have been suggested:&lt;br /&gt;  - "squeal", a mispronunciation of "sql" (thanks Nalin and Jeremy).&lt;br /&gt;  - &lt;a href="http://dmalcolm.livejournal.com/1704.html?thread=4520#t4520" target="_blank"&gt;shelect&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Anyone got any other ideas?&lt;br /&gt;&lt;br /&gt;My current favourite name is squeal.&lt;br /&gt;&lt;br /&gt;(The current project location is &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fshow%2F" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt; and the package review request is &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Fbugzilla.redhat.com%2Fshow_bug.cgi%3Fid%3D492816" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;)</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:1932</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/1932.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=1932"/>
    <title>Be excellent to each other</title>
    <published>2009-03-30T21:21:44Z</published>
    <updated>2009-03-30T21:21:44Z</updated>
    <content type="html">I wholeheartedly agree with Seth's comment &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Fskvidal.wordpress.com%2F2009%2F03%2F30%2Frecommendation-for-fedora-list-posts%2F" rel="nofollow" rel="nofollow"&gt;here&lt;/a&gt;.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:1704</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/1704.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=1704"/>
    <title>More command-line SQL</title>
    <published>2009-03-25T02:32:39Z</published>
    <updated>2009-03-25T02:32:39Z</updated>
    <content type="html">I've done a bit more hacking on my &lt;a href="http://dmalcolm.livejournal.com/1301.html" target="_blank"&gt;command-line/SQL mashup&lt;/a&gt;, currently called "show".&lt;br /&gt;&lt;br /&gt;It can now handle /var/log/messages, /var/log/secure (and the rotated logs), so you can issue a command like this:&lt;br /&gt;&lt;pre&gt;
  $ show /var/log/secure* where message like \"%authentication failure%\"
&lt;/pre&gt;&lt;br /&gt;and browse the results.&lt;br /&gt;&lt;br /&gt;For example, here's a query with aggregation:&lt;br /&gt;&lt;pre&gt;
$ show "count(*)", source from /var/log/messages group by source order by "count(*)" desc limit 5
count(*)|source        |
--------+--------------+
1635    |kernel        |
1398    |NetworkManager|
98      |ntpd          |
70      |avahi-daemon  |
63      |dhclient      |
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Going beyond log files, I used the rather wonderful &lt;a href="https://www.livejournal.com/away?to=http%3A%2F%2Faugeas.net" rel="nofollow" rel="nofollow"&gt;Augeas library&lt;/a&gt; to get parsers for many of the files in /etc, and wrote a backend to leverage this, so you can write things like:&lt;br /&gt;&lt;pre&gt;
  $ show /etc/passwd where shell !=\'/sbin/nologin\'
&lt;/pre&gt;&lt;br /&gt;and&lt;br /&gt;&lt;pre&gt;
  $ show /etc/yum.repos.d/*.repo where gpgcheck != \'"1"\'
&lt;/pre&gt;&lt;br /&gt;(it's a little dumb about string vs numeric types, and shell escaping requires lots of quotes here)&lt;br /&gt;&lt;br /&gt;I extended the &lt;a href="http://pics.livejournal.com/dmalcolm/pic/00001qes/" target="_blank"&gt;ncurses table-browsing UI&lt;/a&gt; so that you can scroll horizontally as well as vertically, which helps when the columns are wide.&lt;br /&gt;&lt;br /&gt;The Fedora infrastructure team set up a hosted project for me, so you can see the source here:&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fshow%2Fbrowser' rel='nofollow'&gt;https://fedorahosted.org/show/browser&lt;/a&gt;  (thanks!)&lt;br /&gt;&lt;br /&gt;An up-to-date SRPM can be grabbed from here:&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.redhat.com%2Fdmalcolm%2Fshow-0.3-1.fc10.src.rpm' rel='nofollow'&gt;http://people.redhat.com/dmalcolm/show-0.3-1.fc10.src.rpm&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;and you can grab the source via git here:&lt;br /&gt;  $ git clone git://git.fedorahosted.org/show.git&lt;br /&gt;&lt;br /&gt;Thanks to everyone for the great feedback on my previous post.&lt;br /&gt;&lt;br /&gt;I suspect some kind of integration with Func for running queries over groups of machines would be a good next step for this tool (oh, and fixing up the Trac instance)&lt;br /&gt;&lt;br /&gt;Is /usr/bin/show too generic?</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:1301</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/1301.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=1301"/>
    <title>SQL for the command line: "show"</title>
    <published>2009-03-23T00:37:00Z</published>
    <updated>2009-03-23T00:37:00Z</updated>
    <content type="html">I was analyzing some Apache log files the other day, and found myself wanting a SQL interface to the logs.&lt;br /&gt;&lt;br /&gt;Now, it's possible to send the logs directly into a database, but that wasn't how the machine was configured.&lt;br /&gt;&lt;br /&gt;This got me thinking.  We have many different log formats, and many different sources of data.  All of our tools seem to have different interfaces.&lt;br /&gt;&lt;br /&gt;For example, why should I write regular expressions and shell pipelines to get at my logs? &lt;br /&gt;Why do I have to learn a custom syntax (&amp;quot;rpm -qa --queryformat='various things'&amp;quot;) for looking at the software I have installed?  Why does e.g. the audit subsystem have its own query format?&lt;br /&gt;&lt;br /&gt;Why can't I just use SQL, and write SELECT statements to drill down into all of this data?&lt;br /&gt;&lt;br /&gt;So I started writing a SELECT statement for the command line.&lt;br /&gt;&lt;br /&gt;I didn't want to use SELECT as caps are a recipe for RSI, and annoyingly, &amp;quot;select&amp;quot; is a bash keyword.&lt;br /&gt;&lt;br /&gt;So I've picked &amp;quot;show&amp;quot; as the command; it doesn't seem to be taken by anything on my system.  (it's a SQL command, but hopefully that's not going to be too confusing)&lt;br /&gt;&lt;br /&gt;The idea is that it looks at the FROM part of the query, and looks up a data source based on this.  For example, if you write&lt;br /&gt;&lt;pre&gt;

$ show host, &amp;quot;count(*)&amp;quot;, &amp;quot;total(size)&amp;quot; from /var/log/httpd/*access_log* group by host;
&lt;/pre&gt;&lt;br /&gt;it figures out that you're looking at apache logs, looks up an appropriate backend, and makes a table &amp;quot;on the fly&amp;quot;, so that it can run the query and give you the results:&lt;br /&gt;&lt;pre&gt;

     host|count(*)|total(size)|
---------+--------+-----------+
127.0.0.1|      10|    27679.0|
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;(You have to either use quotes or escaping to deal with parentheses and * characters from the shell)&lt;br /&gt;&lt;br /&gt;You can use filters using &amp;quot;WHERE&amp;quot;:&lt;br /&gt;&lt;pre&gt;

$ show distinct request from /var/log/httpd/*access_log* where status = 404
request                  |
-------------------------+
GET /favicon.ico HTTP/1.1|
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;If you specify more than one filename it adds a &amp;quot;filename&amp;quot; column:&lt;br /&gt;&lt;pre&gt;

# show filename, &amp;quot;count(*)&amp;quot;, &amp;quot;total(size)&amp;quot; from /var/log/httpd/*access_log* group by filename order by &amp;quot;total(size)&amp;quot; desc
                       filename|count(*)|total(size)|
-------------------------------+--------+-----------+
/var/log/httpd/ssl_access_log.4|    1921| 12824849.0|
    /var/log/httpd/access_log.3|     222|  6207367.0|
/var/log/httpd/ssl_access_log.3|     741|  2210799.0|
    /var/log/httpd/access_log.4|     268|   626711.0|
/var/log/httpd/ssl_access_log.1|       8|    13351.0|
/var/log/httpd/ssl_access_log.2|       5|     7305.0|
    /var/log/httpd/access_log.2|       4|     6995.0|
    /var/log/httpd/access_log.1|       2|      288.0|
&lt;/pre&gt;&lt;br /&gt;Naturally, this isn't just for apache logs.&lt;br /&gt;&lt;br /&gt;Here I'm querying the yum logs.  The backend code deals with the changes that happened in the logging format, without me having to worry about this in my query:&lt;br /&gt;&lt;pre&gt;

[root@brick select]# show from /var/log/yum.log* where 'name like &amp;quot;kernel%&amp;quot;' limit 5
           time|    event|           name|  arch|epoch|  version|     release|        filename|
---------------+---------+---------------+------+-----+---------+------------+----------------+
Feb 14 20:00:03|Installed|kernel-firmware|noarch| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 20:00:28|  Updated| kernel-headers|  i386| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 20:15:11|Installed|   kernel-devel|  i686| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 21:05:53|Installed|         kernel|  i686| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
Feb 14 21:12:41|Installed|     kernel-PAE|  i686| None|2.6.27.12|170.2.5.fc10|/var/log/yum.log|
&lt;/pre&gt;&lt;br /&gt;I also wrote &amp;quot;rpm&amp;quot; and &amp;quot;proc&amp;quot; backends:&lt;br /&gt;&lt;pre&gt;

[david@brick ~]$ show name, &amp;quot;count(*)&amp;quot; from rpm group by name having &amp;quot;count(*)&amp;gt;1&amp;quot;
                     name|count(*)|
-------------------------+--------+
               gpg-pubkey|       4|
jakarta-commons-validator|       2|
 java-1.6.0-openjdk-devel|       2|
                   kernel|       3|
             kernel-devel|       2|
               kernel-xen|       3|
                  libgnat|       2|
                  openssl|       2|
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Looking in the RPM database by vendor:&lt;br /&gt;&lt;pre&gt;

[david@brick select]$ show vendor, &amp;quot;count(*)&amp;quot; from rpm group by vendor
            vendor|count(*)|
------------------+--------+
              None|      12|
    Fedora Project|    2042|
              Koji|       2|
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;It got tiresome typing &amp;quot;*&amp;quot; and &amp;quot;from&amp;quot; all the time, so you can omit these:&lt;br /&gt;&lt;pre&gt;

# show rpm where release not like \'%fc%\' order by name limit 4
    name|epoch|version|release|  arch|        vendor|
--------+-----+-------+-------+------+--------------+
 MAKEDEV| None|   3.24|      1|  i386|Fedora Project|
   PyXML| None|  0.8.4|     10|  i386|Fedora Project|
  autofs|    1|  5.0.3|     36|  i386|Fedora Project|
automake| None| 1.10.1|      2|noarch|Fedora Project|
&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;There's a --format option, which can currently be used to emit tables in HTML format.  Other formats could be written e.g. xml, json, yaml, odf spreadsheets, etc.&lt;br /&gt;&lt;br /&gt;For bonus points, I wrote an ncurses UI for browsing tabular results.  The command detects if stdout is connected to a tty, and if so, goes into the UI, otherwise it sends text, so you can use it for shell pipelines etc.&lt;br /&gt;Here's a screenshot from&lt;br /&gt;&lt;pre&gt;
show rpm order by name&lt;/pre&gt;&lt;br /&gt;&lt;a href="http://pics.livejournal.com/dmalcolm/pic/00001qes/" target="_blank"&gt;&lt;img height="164" border="0" width="320" src="https://pics.livejournal.com/dmalcolm/pic/00001qes/s320x240" alt="" fetchpriority="high" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm already finding this useful.&lt;br /&gt;&lt;br /&gt;I'm hoping to host this as a Fedora project.  For now, you can grab the code from my Red Hat page here:&lt;br /&gt;&lt;a href='https://www.livejournal.com/away?to=http%3A%2F%2Fpeople.redhat.com%2Fdmalcolm%2Fshow-0.2-1.fc10.src.rpm' rel='nofollow'&gt;http://people.redhat.com/dmalcolm/show-0.2-1.fc10.src.rpm&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There's plenty of scope for writing new table backends for other data sources/file formats, improving the UI, writing new output formats, etc.  Any ideas?</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:1032</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/1032.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=1032"/>
    <title>Things to do when you're waiting for a compile</title>
    <published>2008-04-16T21:15:44Z</published>
    <updated>2008-04-16T21:15:44Z</updated>
    <category term="docbook"/>
    <content type="html">I implemented XInclude support for&amp;nbsp; &lt;a class="" href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fdocbook-lint%2F" rel="nofollow" rel="nofollow"&gt;docbook-lint&lt;img alt="" class="" src="https://imgprx.livejournal.net/f2706ff4a0d2c7244a99156d5608b8dc914df8c1/lRdmkRy0lyEKoQaqOUgcDpQ1kc1FSPGydPcRwvRhS7cGBEP9Akt-_qICzAlBQpO0ts4sxmWCZGQadENp5fYvVw" fetchpriority="high" /&gt;&lt;/a&gt; just now, or rather, a first pass at it.&amp;nbsp; Most documents I've been pointing it at turned out to be built from a short "core" document that has a collection of XInclude elements pointing at the rest of the content, so this was a must-have feature.&lt;br /&gt;&lt;br /&gt;Seems to work, but needs testing.&amp;nbsp; Hopefully I'll have a chance to try hooking it up with &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fpublican" rel="nofollow" rel="nofollow"&gt;Publican&lt;/a&gt; soon (dear lazyweb: it's not clear how to get at the source directly on that page), and maybe make the toolchain a little better.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:dmalcolm:784</id>
    <link rel="alternate" type="text/html" href="https://dmalcolm.livejournal.com/784.html"/>
    <link rel="self" type="text/xml" href="https://dmalcolm.livejournal.com/data/atom/?itemid=784"/>
    <title>Trying to improve the signal:noise ratio</title>
    <published>2008-04-13T23:40:45Z</published>
    <updated>2008-04-13T23:40:45Z</updated>
    <content type="html">I've been avoiding this for a while, but thanks to &lt;a href="http://katzj.livejournal.com/421776.html" target="_blank"&gt;Jeremy's prodding&lt;/a&gt;,&amp;nbsp; I've started this blog.&amp;nbsp; I've got a few coding projects in-flight at the moment, and it's about time I let the world know about them. &amp;nbsp; So far I've set up a home page for &lt;a href="https://www.livejournal.com/away?to=https%3A%2F%2Ffedorahosted.org%2Fdocbook-lint%2F" rel="nofollow" rel="nofollow"&gt;docbook-lint&lt;/a&gt;, but there are some others on my hard drive that ought to see the light of day.&amp;nbsp; More to follow... I hope.</content>
  </entry>
</feed>
