Archive for January, 2009


A tale of a resurrected bug

This is the story of a Friday night spent tracking down a particularly pesky bug in my Plone site. In gory detail.

Crap, it’s not working…

I had just finished moving one of our larger Plone deployments to a new host. Everything had gone, with one or two minor exceptions, exceedingly smoothly, and I was looking forward to setting aside the laptop for the rest of the evening. (Yes, that does happen occasionally 😉 …But then I got an IM…

[our client]: 22:06:50
i am getting:

Bad gateway
The proxy server received an invalid response from an upstream server

Well, shit. That’s weird… Normally I might have expected this 502 error from squid when Zope is restarting, but in this case Zope is directly behind apache, and is running fine. And the error’s only occurring on one particular page…what the heck? Also… why is this a problem now when it definitely wasn’t happening on the old server? My entire software stack is installed from a buildout with pinned versions, and I copied the apache configuration, so what variables could be left?

I could probably have already known at this point that the evening was shot. But what to do?

Time for some inspection…

Okay, well obviously Apache doesn’t like something it’s getting from Zope. Maybe a Zope bug of some sort? Let’s check the apache error log and see what it tells us:

[Fri Jan 23 22:12:05 2009] [warn] proxy: bad HTTP/1.1 header returned by /@@sub_trial (POST)

Gee, thanks for all the detail, apache. Well, at least this confirms my suspicions that some header from Zope is wrong. But what could it be? Time to hit Zope directly and see what we can see in the headers. Open up the port in the firewall, enter the Virtual Host Monster URL to the problem page, view the headers in Firebug…what have we got?

Date: Mon, 26 Jan 2009 08:35:33 GMT
Server: Zope/(Zope 2.10.5-final, python 2.4.4, linux2) ZServer/1.1 Plone/3.0.3
Content-Type: text/html; charset=utf-8
Location: https://www.xxxxx.org/thankyou_trial
Set-Cookie: statusmessages="A+RXZWxjb21lISBZb3UgYXJlIG5vdyBsb2dnZWQgaW4uaW5mbwTEWW91ciBUcmlhbCBTdWJzY3JpcHRpb24gSXMgTm93IEFjdGl2ZS5pbmZv"; Path=/
Via: 1.1 xxxxx.org
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 20
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive

Well, the page loads fine when I hit the site directly in Zope, and those headers look pretty innocuous. At this point my only guess was that the gzipping was screwing things up somehow, and thus followed a half-hour-long digression of confirming the gzip settings in CacheFu, confirming the mod_deflate settings in apache, and finally disabling gzipping entirely in Firefox. All of which served only to tell me in the end that the problem had nothing to do with gzipping.

Pull out the big guns: Google

At this point I did what any self-respecting software professional would do: I typed the error message into Google. This is where things really start to get interesting…

A query of zope "bad HTTP/1.1 header" netted me a few irrelevant-looking results. But aha, also an old e-mail thread that describes a similar error, and that has a link to a Plone trac ticket that…well, “Plone Hotfix 20071106 breaks long status messages” doesn’t exactly sound relevant, but look, there’s a comment that mentions the bug results in a 502 gateway error when running behind apache — which matches my experience — and it even says the bug is still present in Plone 3.0.3, the version I’m using on this particular site. And the description of the issue — a problem with the statusmessage cookie header — sounds reasonably close to the diagnosis I was guessing at. Hey, maybe we’re on to something!

But something is still not right. The comments on the ticket also say that a newer version of the hotfix that introduced the bug, PloneHotfix20071106, was released to fix the issue all the way back in November 2007. And my buildout already includes this updated version — it’s right there as:

http://plone.org/products/plone-hotfix/releases/20071106-2/PloneHotfix20071106.tar.gz

(Note the “-2”). Hmm, now that I think about it, maybe I have a vague recollection of encountering this same issue sometime in the distant past, and updating that hotfix in the buildout. But, obviously the fix is not working now. So what’s going on?

The epiphany

And then it hit me.

David Glick: 23:14:50 holy crap, I think the real problem here is related to the recent plone.org upgrade
David Glick: 23:14:56 no shit 🙂

Let me explain. During the recent plone.org upgrade, all the product tarball files stored in PloneSoftwareCenter were migrated from being stored in the ZODB to being stored in a big directory on the filesystem using FileSystemStorage and collective.psc.externalstorage. (This is part of the efforts to make PSC function as a pypi-style repository, a cool ability which I hope Alex Clark and Tarek Ziad?? will publicize before too long.) Now, the thing I noticed that made me think “aha!” is that the filename listed for the hotfix tarball in my buildout has no version suffix. And it didn’t take me much longer to check the older release of that hotfix and see that it has the exact same tarball filename. And then, knowing what I did about the plone.org upgrade, it didn’t take much imagination to think that the two releases must now be accessing the same file on the filesystem, and the “-2” release was getting the wrong data. And this was quickly confirmed by inspection of the actual Hotfix code that had been fetched into my buildout.

To express my feelings at this point, echoing the immortal words of our Zope Pope:

“WAAAA!”

Case closed? You betcha. I stopped using the broken tarball and checked out the correct version of the hotfix from the svn tag instead. After restarting Zope, no more 502 error when I hit the form. And I followed up soon thereafter by replacing the hotfix tarball with the most recent release so that others shouldn’t run into this same issue.

Note, however, that there’s no guarantee at this point that other packages aren’t experiencing a similar issue, so keep your wits about you when installing old-style product tarballs from plone.org. However, I have filed a report and trust that Alex or Tarek will be able to work on it before too long.

Of course, it goes without saying that I was rather lucky to have the familiarity with the plone.org upgrade needed to correctly diagnose the cause of this issue. But otherwise, just another day in the life of a Plone hacker…



hello world!

According to Abby today is blog de-lurking day, a.k.a. a time to reveal yourself as a reader of someone’s blog by posting a comment. In my case, I suppose it makes an excuse to stop lurking on my own blog and catch you all up (faithful readers) on what I’ve been up to since Sept. 4.

September

  • After a few more false cooking starts, I eventually gathered the equipment I needed and started to feel more settled in to my new apartment:

    View Larger Map
  • I welcomed and started getting to know the new bunch of 8 volunteers at the Mennonite Voluntary Service house. They are a fun group and I’ve enjoyed hanging out with them once or week (sometimes more!) for bible study and watching The Office.
  • I took the train down to Oregon for a weekend, to visit ONE/Northwest‘s Portland office as well as James, Darla, Levi, and Bethany in Salem…great to see each of you! Unfortunately I lost my camera on this trip, so the photos included here are thanks to friends…
  • I spent 3 days at a ONE/Northwest staff retreat on a lake not too far away. This was a wonderful time of getting to know my colleagues better and improving our shared vision of how our organization can be most effective at helping address the climate crisis (at least, when I wasn’t getting distracted by a certain server crisis). And I got to introduce the staff to Cowtris during our no-talent night.

    the web team works to avert a crisis???

    not cold

  • I hosted a homemade ice cream party in honor of my 23rd birthday, which was a great success with some 30 guests who struggled to navigate the tight confines of this apartment.

October

  • I had the privilege to attend the 2008 Plone Conference in Washington, D.C. (Plone is an open source content management system used for building websites; I use it extensively at work and have enjoyed getting involved in contributing to the community that develops the software, as I described in the new Plone section of this blog). In addition to spending plenty of time doing geeky things like giving a talk on debugging Plone, I had some awesome Ethiopian food and spent some great times with Kyle and Christy (helping them decorate their apartment, watching nature documentaries, and trying to figure out how to get to the other side of the railroad tracks) as well as Don, Rosanna, Sharada, Eric, and Alberto (playing games and giving lessons in proper Facebook ettiquette).

    Plone Conference 2008

    2939919455_3fb6598ed6_b.jpg

  • I made my infamous appearance as the bear market at the MVS Halloween party.

    RAAAAAAWWWR!

Whoops, it’s past my bedtime and I still have 2 months to go…guess there will have to be another installment sometime soon.