Archive for the 'Plone' Category


Tahoe sprint off and running

After carpooling together from San Francisco yesterday afternoon, the five participants in the Tahoe Snow Sprint arrived at our swanky lodgings overlooking Lake Tahoe. The decor may be questionable, but there is more than enough space for us all, leaving us with crucial questions such as whether to use the first floor or third floor bar at any given time.

We haven’t done much coding yet, as much of the evening was spent with me giving a walkthrough of current Dexterity functionality, accompanied by much discussion of what works well and what doesn’t, and what things we’d like to work on improving.

I will work on making various improvements to the usability and functionality of the through-the-web content type editor, hopefully with some help on UI design from Alex Limi.

Alex also hopes to work on designing some improved widgets.

Ross Patterson will work on adding support for Choice fields with vocabularies and sources to the TTW editor.

David Brenneman will look into ways of allowing Archetypes content to reference Dexterity content.

And I believe Joel Burton will work on exporting content types created through the web to a full installable package (as opposed to simply exporting the FTI like you can do already via portal_setup).

So let the sprinting begin!

We are hanging out in #sprint on freenode, so feel free to stop by and say hello to the tahoebot.



reflections on PyCon 2010

I just got back from the US PyCon 2010, my first Python Conference, where I had a blast. The conference felt to me a lot like Plone conferences in spirit, only with a greater diversity of software projects and of course more people (a record attendance of ~1100). It was held at the Hyatt in downtown Atlanta and was a great success logistically. One success in the organization of the conference was the push to get more women to attend, which resulted in 11% female attendees, an increase over previous years which I hope will continue as a trend.

Some highlights of the talks I attended were:

  • “Building Leafy Chat, DjangoDose, and Hurricane: Lessons Learned on the Real-Time Web with Python” by Alex Gaynor – Introduced me to Orbited, Twisted, Redis, and other tools for building scalable, interactive websites.
  • “Managing the world’s oldest Django project” by James Bennett – I found myself drawing parallels between the evolution of Django and Ellington that James presented and that of Zope and Plone. The Django community is learning the same lessons about testing and reusability that we have.
  • “What every developer should know about database scalability” by Jonathan Ellis – good general overview of different strategies for replication and caching (focused on concepts rather than any particular software)
  • “Powerful Pythonic Patterns” by Alex Martelli – philosophizing on software patterns and anti-patterns in the Python context
  • “Demystifying Non-Blocking and Asynchronous I/O” by Peter A Portante – very helpful beginner-level overview
  • “Unladen Swallow: fewer coconuts, faster Python” by Collin Winter – an update on the state of Unladen Swallow, which was approved for merging into CPython at the language summit just before PyCon
  • “Pynie: Python 3 on Parrot” by Allison Randal – This one was for fun…I might keep an eye on Pynie just to see how a language actually gets implemented.
  • “How Python is guiding infrastructure construction in Africa” by Roy Hyunjin Han – Covered the use of Python for recognizing buildings in satellite imagery to help with planning development, etc.
  • “Why not run all your tests all the time? A Study of continuous integration systems” by C. Titus Brown – Bottom line: “Use Hudson.”
  • the infamous Testing in Python BoF, which was a 3-hour lightning talk session organized one evening by the folks from Disney, complete with pizza, beer, heckling, and goats (the goat meme was introduced by Terry Peppers as an alternative to lolcats in slides, and ended up being adopted as a testing mascot).
  • “Tests and Testability” by Ned Batchelder – Not a lot new here for me, but a good overview by the creator of coverage.py.

Selecting which talk to go to was sometimes excruciating, and I’m looking forward to catching up with some of the ones I missed. Some of the ones I’ve heard recommended are:

  • “Deployment, development, packaging, and a little bit of the cloud” by Ian Bicking
  • “The state of Packaging” by Tarek Ziadé
  • “Scaling your Python application on EC2” by Jeremy Edberg – learnings from reddit
  • “Dude, Where’s My Database?” by Eric Florenzano
  • “Understanding the Python GIL” by David Beazley – the hot topic of the conference
  • “The Python and the Elephant: Large Scale Natural Language Processing with NLTK and Dumbo” by Nitin Madnani and Jimmy Lin

Videos of the talks are, amazingly, already becoming available. Kudos to the A/V team.

On Sunday my attention waned and I got a bit mischievous. The Eldarion guys, who created Type War, set up OHWar, a type war clone where you compete to correctly guess who said various quotes that were overheard at PyCon. After playing for far too long and still failing to stay in first place for long, I decided it was a job for Python and created an OHWar-playing bot. I left it running in screen and came back a few hours later to find that I had not only topped the leaderboard but also hit the game’s built-in score limit. 🙂 This was also the evening that David Brenneman and I found the Django Pony unattended and added some “enhancements.” 😉

Django pony with Plone stickers

Zope and Plone were not very visible in the conference schedule (there was one talk on Plone GetPaid and Satchmo, one on using Plone with Salesforce in which I contributed a few minutes of technical material to go with Chris Johnson’s high-level overview, and one on the interface/adapter concepts…as well as a couple relating to repoze.bfg which has a Zopish ancestry). On the other hand, I believe Plone was, surprisingly, the only open source project with a booth in the exhibition hall. We had a nice-looking display with the Plone banner that continues to be passed around to US events, a bunch of collateral and books for display, and a big monitor for demoing Plone 4. Various people took turns staffing the booth, including members of the Atlanta Plone group, and Chris Calloway for much of Saturday. The Plone Foundation also subsidized World Plone Day T-shirts which a bunch of us wore on Saturday. We gathered for a photo and ended up with around 30 people.

Plone folks at PyCon

During the conference, a highlight for me was meeting and eating meals with various luminaries, including Jason Huggins (of Selenium fame), Holger Krekel (founder of PyPy), Wesley Chun (author of Core Python Programming) and even Guido himself (well, way down at the other end of the table). I also got to interact briefly with Allison Randal (from the Perl community), while trying out and submitting a new test for pynie, a nascent Python implementation for the Parrot VM. I also now have a face to put with many additional names that I had only seen online before.

I was only able to join the sprints for one day, and mostly spent my time working on some miscellaneous tasks I hadn’t been getting to. However, we were able to have a good meeting of GetPaid folks to try to determine how to move forward with Brandon Rhodes’ work to clean up payment processor configuration. I also did some refactoring of the GetPaid development buildout to clean it up, make sure it still works, and pave the way for updating the product for compatibility with Plone 4. If I had been able to stay longer, I think it would have been fun to participate in the great work being done in the Python packaging sprint, led by Tarek Ziadé and the Packaging Pig. Next year I will have to be sure to attend the entire sprint.



Reflections on building a member directory using Plone and Salesforce.com

I promised Chris Johnson that I would write up some of my learnings from a project integrating Plone and Salesforce.com, which Groundwire is just finishing up. So here you go, Chris!

The goal of the project is to provide web access to a directory of businesses who have paid for membership and inclusion in our client’s directory — while keeping the master data for the directory within Salesforce.com, not Plone. This involves several crucial challenges:

  1. How to present views for searching and browsing the Salesforce directory data within Plone
  2. How to provide the ability for businesses to log in and update their member profile
  3. How to provide the ability for businesses to apply and complete payment for membership, as well as to renew membership each year.

In this article, I’m going to focus on explaining how I approached the first two challenges. This is more of a hand wave in the right direction than a detailed tutorial, and it assumes a fair amount of background in Plone. That said, feel free to ask me questions about aspects of the implementation that I gloss over here.

Exposing the directory within Plone

Querying Salesforce directly on each request is a non-starter for many use cases. That’s because Salesforce puts a pretty low limit on the number of API requests allowed per day (something like 1000 per user license). This means that we need a way to mirror data from Salesforce within Plone, and then update it in batch (thereby using fewer API requests) every night. (Building the directory as VisualForce pages within Salesforce Sites would be a valid alternative in some cases — though requiring more work to integrate visually. But for this project it was a requirement that we be able to store additional data such as logos within Plone, as well as link to related content items for a business.)

How do we model data from Salesforce within Plone? It depends on what you need to do with the content in Plone. If you just need to be able to search and display a listing of results, then there is no reason to create full-fledged content items. In the past, for a case like this, I have just created temporary stub objects during a nightly dump of data from Salesforce, indexed them in a custom catalog, and then discarded the stubs. This is the most lightweight option; you have a catalog full of data for building your search views, but no unnecessary data hanging around.
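To make that concrete, here is a minimal sketch of the stub approach (the Stub class, the catalog id, and the index names are hypothetical, not from the actual project):

from Products.CMFCore.utils import getToolByName

class Stub(object):
    """ Transient stand-in for a Salesforce record; never stored in the ZODB. """

    def __init__(self, path, **data):
        self.path = path
        self.__dict__.update(data)

    def getPhysicalPath(self):
        # ZCatalog can use this to derive a uid for the record
        return self.path

def rebuild_directory_index(context, records):
    # 'directory_catalog' is a hypothetical custom ZCatalog whose indexes
    # (e.g. 'Name') correspond to attributes set on the stubs
    catalog = getToolByName(context, 'directory_catalog')
    catalog.manage_catalogClear()
    for record in records:
        stub = Stub(('', 'directory', str(record.Id)), Name=record.Name)
        catalog.catalog_object(stub, uid='/directory/%s' % record.Id)
        # the stub goes out of scope here; only the indexed data remains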

If you actually need to be able to navigate to a full page view of a particular directory item, then you probably need an actual content item. I think Dexterity would be promising for this sort of thing, but for the project I’m just now wrapping up, I used Archetypes because I needed image scaling and the ability to link to other AT content as related items, both of which Dexterity doesn’t have great support for yet.

Note that you don’t actually need to define most aspects of the schema for fields that you want to display but don’t need to be editable within Plone. For example, my schema looks something like this:

from Products.Archetypes import atapi
from Products.ATContentTypes.content import document

MemberProfileSchema = document.ATDocumentSchema.copy() + atapi.Schema((
    atapi.TextField('sf_id'),
    atapi.TextField('mailingAddress'),
    # etc...
))
# hide most fields
for field in MemberProfileSchema.fields():
    if field.schemata == 'default' and field.__name__ not in ('text',):
        field.widget.visible = {'edit':'invisible', 'view':'visible'}

Fields like mailingAddress get populated during the nightly data dump, but don’t appear on the edit form if you edit the member profile. Why not? Well, mostly because I figured it would be hard to get an Archetypes edit form to save things to Salesforce as well as Plone. Alex Tokar at Web Collective tells me he has successfully taken this approach, though.

Here is an abbreviated version of the browser view that is called once a night to pull in the data from Salesforce:

"""
SFDC sync view. This is intended to be run via cron every night to update
the member profiles based on data from Salesforce.com.

It will:

 * Find all Accounts with a member status of 'Current' or 'Grace Period' (in
   our client's Salesforce schema this is a custom rollup field based on various
   criteria).
 
 * For each Account, find an existing Member Profile object in Plone whose
   'sf_id' field value equals the Id of the Account, and update it.
   
 * Or, if no existing Member Profile was found, create a new one and publish it.

 * Retract any existing Member Profiles that no longer correspond to Accounts
   with the 'Current' or 'Grace Period' membership status in Salesforce, so they
   are still present but not publicly visible.

"""

import logging
import transaction
from zope.component import getUtility
from Products.Five import BrowserView
from Products.CMFCore.utils import getToolByName
from Products.CMFCore.WorkflowCore import WorkflowException
from plone.i18n.normalizer.interfaces import IIDNormalizer
from Products.CMFPlone.utils import safe_unicode
from Products.CMFPlone.utils import _createObjectByType

SOBJECT_TYPE = 'Account'
FIELDS_TO_FETCH = (
    'Id',
    'Name',
    'Description',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    # etc...
    )
FETCH_CRITERIA = "Member_Status__c = 'Current' OR Member_Status__c = 'Grace Period'"
DIRECTORY_ID = 'directory'
PROFILE_PORTAL_TYPE = 'Member Profile'

logger = logging.getLogger('SFDC Import')

class UpdateMemberProfilesFromSalesforce(BrowserView):
    
    def __init__(self, context, request):
        BrowserView.__init__(self, context, request)
        self.catalog = getToolByName(self.context, 'portal_catalog')
        self.wftool = getToolByName(self.context, 'portal_workflow')
        self.normalizer = getUtility(IIDNormalizer)
    
    def getDirectoryFolder(self):
        portal = getToolByName(self.context, 'portal_url').getPortalObject()
    
        # create the directory folder if it doesn't exist yet
        try:
            directory = portal.unrestrictedTraverse(DIRECTORY_ID)
        except KeyError:
            _createObjectByType('Large Plone Folder', portal, id=DIRECTORY_ID)
            directory = getattr(portal, DIRECTORY_ID)
        
        return directory
    
    def findOrCreateProfileBySfId(self, name, sf_id):
        res = self.catalog.searchResults(getSf_id = sf_id)
        if res:
            # update existing profile
            profile = res[0].getObject()
            logger.info('Updating %s' % '/'.join(profile.getPhysicalPath()))
            return profile
        else:
            # no existing profile with this sf_id: create a new one
            name = safe_unicode(name)
            profile_id = self.normalizer.normalize(name)
            directory = self.getDirectoryFolder()
            profile_id = directory.invokeFactory(PROFILE_PORTAL_TYPE, profile_id)
            profile = getattr(directory, profile_id)
            profile.setSf_id(sf_id)
            profile.reindexObject(idxs=['getSf_id'])
            logger.info('Creating %s' % '/'.join(profile.getPhysicalPath()))
        
        return profile
    
    def updateProfile(self, profile, data):
        profile.setSf_id(data.Id)
        profile.setTitle(data.Name)
        if not profile.getText():
            profile.setText(data.Description, mimetype='text/x-web-intelligent')
        profile.setMailingAddress("%s\n%s, %s %s" % (data.BillingStreet, data.BillingCity,
                                                     data.BillingState, data.BillingPostalCode))
        # etc...
        
        # publish and reindex
        try:
            self.wftool.doActionFor(profile, 'publish')
        except WorkflowException:
            # already published, or no publish transition available
            pass
        profile.reindexObject()
    
    def hideProfileBySfId(self, sf_id):
        res = self.catalog.searchResults(getSf_id = sf_id)
        profile = res[0].getObject()
        try:
            self.wftool.doActionFor(profile, 'reject')
        except WorkflowException:
            # already private, or no reject transition available
            pass

    def queryMembers(self):
        """ Returns an iterator over the records of active members from Salesforce.com """
        sfbc = getToolByName(self.context, 'portal_salesforcebaseconnector')
        where = '(' + FETCH_CRITERIA + ')'
        soql = "SELECT %s FROM %s WHERE %s" % (
            ','.join(FIELDS_TO_FETCH),
            SOBJECT_TYPE,
            where)
        logger.debug(soql)
        res = sfbc.query(soql)
        logger.info('%s records found.' % res['size'])
        for member in res:
            yield member
        while not res['done']:
            res = sfbc.queryMore(res['queryLocator'])
            for member in res:
                yield member
    
    # queryMembers is exposed as a default argument so that tests can
    # inject a stub that yields canned records instead of hitting Salesforce
    def __call__(self, queryMembers=queryMembers):
        """ Updates the member directory based on querying Salesforce.com """
        
        # 0. get list of sf_ids for the profiles we already know about, so we
        # can keep track of which ones we need to make private
        sf_ids_not_found = set(self.catalog.uniqueValuesFor('getSf_id'))
        
        # 1. fetch active Member Profile records, update ones that match,
        #    and create new ones
        for i, data in enumerate(queryMembers(self)):
            profile = self.findOrCreateProfileBySfId(name = data.Name, sf_id = data.Id)
            self.updateProfile(profile, data)
            
            # commit periodically (every 10) to avoid conflicts
            if not i % 10:
                transaction.commit()
            
            # keep track of which profiles we need to hide
            try:
                sf_ids_not_found.remove(data.Id)
            except KeyError:
                pass
        
        # 2. hide any profiles that are no longer active
        for sf_id in sf_ids_not_found:
            self.hideProfileBySfId(sf_id)

All that’s left is writing the view which actually queries the catalog for these member profiles and presents them as a listing, which is relatively straightforward, and left as an exercise for the reader. 🙂
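(Or, to give you a head start on that exercise, here is a minimal sketch; the class name and index choices are just illustrative:)

from Products.Five import BrowserView
from Products.CMFCore.utils import getToolByName

class MemberDirectoryListing(BrowserView):
    """ Renders the published member profiles as a listing. """

    def profiles(self):
        catalog = getToolByName(self.context, 'portal_catalog')
        return catalog.searchResults(
            portal_type='Member Profile',  # PROFILE_PORTAL_TYPE from above
            review_state='published',
            sort_on='sortable_title',
            )

The associated page template then just iterates over view/profiles and renders each brain’s Title and URL.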

Allowing updates to directory profiles

So if the Archetypes content type doesn’t allow edits to most of its fields, how did I provide for logged-in members to edit profile info? Well, there are two parts:

  1. The Salesforce Auth Plugin allows logins to Plone based on Account records in Salesforce (by matching on custom username and password fields on the Account).

  2. A custom z3c.form form reads values from the Account associated with the currently logged-in user, and writes to both that Account record in Salesforce and also to the associated Member Profile archetype within Plone (so that updates appear in the directory immediately).

I won’t go into detail on the configuration of the Auth Plugin, as it is covered in the package’s documentation. I configured it to load the Salesforce Id of the Account and several other fields into PAS member properties, for easy access within Plone. I did not configure all of the account fields as member properties — while I could have done so, I didn’t see much utility in that, since Plone can’t (at least not yet) automatically generate an edit form for all the member properties.

Instead, I built a custom z3c.form form that reads and writes directly to Salesforce. This turned out to be less complicated than I anticipated, mostly thanks to a new ORM-style library I built for wrapping the objects returned from Salesforce by beatbox (with attributes corresponding to Salesforce field names) with a model whose attribute names match the field names of the form schema — allowing use of the wrapper as the context of a z3c.form form. I’m not yet going to post the implementation of this library, as I intend to make some significant changes to the API before releasing it (real soon now?). But let me at least show you what using it looks like (again I have simplified from the real code):

from zope import schema
from zope.interface import Interface, implements
from z3c.form import form, field, button
from plone.z3cform.layout import wrap_form
from plone.memoize.instance import memoize
from Products.CMFCore.utils import getToolByName

from sforzando import SFObject, SFField

class IAccountGeneralInfo(Interface):
    """ Schema for member profile edit form """
    business_name = schema.TextLine(title = u'Business Name')
    # etc...

class SFAccount(SFObject):
    """ Adapts a Salesforce Account to the profile edit form schema"""
    implements(IAccountGeneralInfo)
    
    _sObjectType = 'Account'
    
    sf_id = SFField('Id')
    business_name = SFField('Name')
    # etc...

class ProfileEditForm(form.Form):
    """ An edit form for the current authenticated member's Account """
    
    label = u'Update Profile'
    fields = field.Fields(IAccountGeneralInfo)

    def _get_sf_id(self):
        """ Find the Salesforce Account Id corresponding to the current logged in member. """
        mtool = getToolByName(self.context, 'portal_membership')
        member = mtool.getAuthenticatedMember()
        sf_id = member.getProperty('sf_id')
        if not sf_id:
            raise Exception("Did not find valid Salesforce ID for member '%s'" % member.getId())
        return sf_id

    @memoize
    def getContent(self):
        """ Provides the object this form will edit.
            Memoized so we always get the same one for a given request. """
        sfbc = getToolByName(self.context, 'portal_salesforcebaseconnector')
        return SFAccount(sfbc, "Id='%s'" % self._get_sf_id())

    @button.buttonAndHandler(u'Update Profile')
    def handleUpdate(self, action):
        """ Handler for the Update Profile button """
        data, errors = self.extractData()
        if not errors:
            self.status = u'Changes saved.'
            # save changes to Salesforce
            sf_id = self._get_sf_id()
            sfbc = getToolByName(self.context, 'portal_salesforcebaseconnector')
            SFAccount.update(sfbc, id=sf_id, **data)
            # etc...additional code to update the local AT-based copy of the Account data...

ProfileEditView = wrap_form(ProfileEditForm)

Formlib would probably work just as well as z3c.form here. And if you don’t need a particularly fancy form, using a PloneFormGen form with the ‘update’ feature of the salesforcepfgadapter would work without any coding. As long as you’ve mapped the Salesforce object Id as a member property in the Auth Plugin configuration, it’s pretty easy to use that as the basis for determining which object the form should edit.

In conclusion

I’m pretty excited about the results of this project, which is one of the deeper integrations of Plone and Salesforce.com that I have worked on, and which builds on the tools Groundwire has led the development of over the past few years — especially the Salesforce Auth Plugin. Giving Plone the ability to accept logins based on a CRM system opens the door to a lot of exciting possibilities — think about being able to show visitors targeted content based on what your database knows about their interests or location, or allowing them to share content with other visitors from the same geographic area.

If you are putting to good use the tools and code discussed here, or are finding other cool things to do by integrating Plone and Salesforce, I’d love to hear about it.



Using HAProxy with Zope via Buildout

After my post on reducing GIL contention by using fewer Zope threads, Lee Joramo asked for more information on setting up HAProxy, so let me share my configuration. Much of the credit for this goes to Hanno Schlichting and Alex Clark, who offered me much good advice and a sample configuration, respectively.

First, a few words about what HAProxy offers. For the past couple years I’ve been using Pound to load balance between multiple backend Zope instances. But recently I’ve been hearing recommendations from people I trust (such as Jarn and Elizabeth Leddy) to try HAProxy instead.

HAProxy offers some nice features:
– Backend health checks
– Various load-balance algorithms for how requests get distributed to backends
– Can do sticky sessions so that an authenticated user always hits the same backend
– Warmup time (don’t send as many requests to a Zope instance while it’s starting up)
– Provides a status page giving info on backend status and uptime, # of queued requests, # of active sessions, # of errors, etc.

Some of these are possible with Pound too, but the status screen was really the “killer app” for me. It is fun to watch, but also very useful for doing rolling restarts when new code needs to be deployed without an interruption in service.

HAProxy status page

Configuration

In my buildout.cfg I added:

[buildout]
...
parts =
    ...
    haproxy-build
    haproxy-conf

[haproxy-build]
recipe = plone.recipe.haproxy
url = http://dist.plone.org/thirdparty/haproxy-1.3.22.zip

[haproxy-conf]
recipe = collective.recipe.template
input = ${buildout:directory}/haproxy.conf.in
output = ${buildout:directory}/etc/haproxy.conf
maxconn = 24000
ulimit-n = 65536
user = zope
group = staff
bind = 127.0.0.1:8080

Here, we add a part called “haproxy-build” which uses the plone.recipe.haproxy recipe to build haproxy from source and add a bin/haproxy script for running it, and a part called “haproxy-conf” which builds the HAProxy configuration file by filling in variables in a template file called haproxy.conf.in.

Be sure to set the user and group variables to the user and group you want HAProxy to run as, and update the bind variable to set the port to which HAProxy should bind.

I run most of my Plone stack using supervisord, so I also updated my supervisord configuration in buildout to run HAProxy:

[supervisor]
recipe = collective.recipe.supervisor
...
programs =
    ...
    10 haproxy ${buildout:directory}/bin/haproxy [ -f ${buildout:directory}/etc/haproxy.conf -db ]

In a real-life deployment, you’ll probably also want a caching reverse proxy like Squid or Varnish sitting in front of HAProxy.

What about the contents of haproxy.conf.in? Here’s mine:

global
  log 127.0.0.1 local6
  maxconn  ${haproxy-conf:maxconn}
  user     ${haproxy-conf:user}
  group    ${haproxy-conf:group}
  daemon
  nbproc 1

defaults
  mode http
  option httpclose
  # Remove requests from the queue if people press stop button
  option abortonclose
  # Try to connect this many times on failure
  retries 3
  # If a client is bound to a particular backend but it goes down,
  # send them to a different one
  option redispatch
  monitor-uri /haproxy-ping

  timeout connect 7s
  timeout queue   300s
  timeout client  300s
  timeout server  300s

  # Enable status page at this URL, on the port HAProxy is bound to
  stats enable
  stats uri /haproxy-status
  stats refresh 5s
  stats realm Haproxy\ statistics

frontend zopecluster
  bind ${haproxy-conf:bind}
  default_backend zope

# Load balancing over the zope instances
backend zope
  # Use Zope's __ac cookie as a basis for session stickiness if present.
  appsession __ac len 32 timeout 1d
  # Otherwise add a cookie called "serverid" for maintaining session stickiness.
  # This cookie lasts until the client's browser closes, and is invisible to Zope.
  cookie serverid insert nocache indirect
  # If no session found, use the roundrobin load-balancing algorithm to pick a backend.
  balance roundrobin
  # Use / (the default) for periodic backend health checks
  option httpchk

  # Server options:
  # "cookie" sets the value of the serverid cookie to be used for the server
  # "maxconn" is how many connections can be sent to the server at once
  # "check" enables health checks
  # "rise 1" means consider Zope up after 1 successful health check
  server  plone0101 127.0.0.1:${zeoclient1:http-address} cookie p0101 check maxconn 2 rise 1
  server  plone0102 127.0.0.1:${zeoclient2:http-address} cookie p0102 check maxconn 2 rise 1

This assumes that I have Zope instances built by parts called “zeoclient1” and “zeoclient2” in my buildout; you’ll probably need to update those names.

You may want to adjust the “option httpchk” line to use a different URL for checking whether the Zope instances are up — you want to point at something that can be rendered as quickly as possible (in my case it’s the Zope root information screen, so I’m not too worried).

The maxconn setting for each backend should be at least the number of threads that the Zope instance is running. Laurence Rowe pointed out to me that it should probably not be set to 1, since Zope also serves some things (such as blobs) via file stream iterators, which happens apart from the main ZPublisher threads. (So setting maxconn to 1 would mean serving a large blob could block other requests to that backend, for instance.)

See the HAProxy configuration documentation for more details on the settings that can be used in this file.



on Zope, multiple cores, and the GIL

I recently installed HAProxy as a load-balancer for a site that had previously been running on a single Zope instance with 4 threads. I switched to 2 instances with 2 threads each, load-balanced by HAProxy. I wasn’t anticipating that this change would have a noticeable effect on the site’s performance, so I was happily surprised when the client mentioned that users of the site were commenting on the improved speed.

But why did the site get faster?

Looking at a munin graph of server activity, I observed a noticeable drop in the number of rescheduling interrupts — a change that coincided with my change in server configuration:

graph showing decreased contention when I switched to more Zope instances with fewer threads

I suspect that the “before” portion of this graph illustrates a problem that occurs when running multi-threaded Python programs on multi-core machines, wherein threads running in different cores fight for control of the Global Interpreter Lock (a problem Dave Beazley has called to the community’s attention in a recent presentation) — and that this explains the improvement in performance once I switched to multiple processes with fewer threads. By switching to multiple processes, we let concurrent processing get managed by the operating system, which is much better at it.
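If you want to see the effect in isolation, outside of Zope, here is a toy benchmark along the lines of Beazley’s demonstration (illustrative only; the iteration count is arbitrary and results vary by machine):

import time
from threading import Thread

def count(n):
    # pure-Python CPU-bound work, so the GIL is held the whole time
    while n > 0:
        n -= 1

N = 10000000

# sequential: two CPU-bound tasks, one after the other
start = time.time()
count(N)
count(N)
print 'sequential: %.2fs' % (time.time() - start)

# threaded: the same work split across two threads. On a multi-core
# machine this is often *slower* than the sequential version, because
# the two threads fight over the GIL.
start = time.time()
t1 = Thread(target=count, args=(N,))
t2 = Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print 'threaded: %.2fs' % (time.time() - start)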

Moral of the story: If you’re running Zope on a multi-core machine, having more than 2 threads per Zope instance is probably a bad move performance-wise, compared to the option of running more (load-balanced) instances with fewer threads.

(Using a single thread per instance might be even better, although of course you need to make sure you have enough instances to still handle your load, and you need to make sure single-threaded instances don’t make calls to external services which then call back to that instance and block. I haven’t experimented with using single-threaded instances yet myself.)



Come improve Dexterity at the Tahoe Snow Sprint

This year the West Coast is hosting our own version of the infamous Snow Sprint. I’m really looking forward to spending a week coding, hanging out with Plonistas, and playing in the snow at the upcoming Tahoe Snow Sprint, organized by David Brenneman (dbfrombrc) and coming to California’s Sierra Nevada this March 15-19.

The goal of the sprint is to improve the Dexterity content type framework (a modern alternative to Archetypes created by Martin Aspeli and others). As part of the Dexterity team, I want to offer the following list of potential projects to help get your creative juices flowing.

At the sprint, you could…


Implement one of Dexterity’s missing features, as catalogued in the Dexterity issue tracker.

Fix some of the other outstanding issues in the Dexterity issue tracker.

Create a ZopeSkel template for Dexterity-based projects.

Improve the through-the-web content type editor.

  • improve usability and/or sexiness
  • add UI for exporting types for work on the filesystem
  • add support for defining vocabularies
  • add support for selecting/configuring custom widgets

Create an editor that allows through-the-web editing of new behaviors (which can then be applied to existing types in a schema-extender-like fashion).

Add a view editor to accompany the through-the-web schema editor. Deco is coming and will be great, but in the meantime it would be nice to at least have something that generates a basic view template based on your schema and then lets you tweak it and export it to the filesystem.

Build a better workflow editor to accompany the above.

Write a guide to migrating Archetypes-based content types to Dexterity. Or build a tool to do it automatically.

Create replacements for the ATContentTypes types using Dexterity types.

Determine how to handle existing content items sanely when editing schemas.

Devise a PloneFormGen successor that stores its schema in a fashion similar to Dexterity, and makes it easy to convert a form + results into a full-blown content type. Bonus points if the form editing is done using Deco. 🙂


There are so many interesting possibilities I’m having trouble deciding what to focus on myself. Space is limited, so if any of this strikes your fancy, head on over to Coactivate and sign right up to join us at the sprint!



mr.igor

Today I released mr.igor, a utility for helping you write Python faster by filling in missing imports based on where you’ve imported the names from before.

Here’s a one-minute screencast showing how it works.



Recombining ZODB storages

I recently faced the task of joining back together a Plone site composed of 4 ZODB filestorages that had been (mostly through cavalier naïveté on my part) split asunder some time ago.

Normally I would probably just do a ZEXP export of each of the folders that lived in its own mountpoint, then remove the mountpoints and reimport the ZEXP files into the main database. However, that wasn’t going to work in this case because the database included some cross-database references.
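(For reference, that normal approach looks roughly like this from a script run via bin/instance run; the ‘mysite’ and ‘myfolder’ ids are hypothetical.)

import transaction

# write var/myfolder.zexp
app.mysite.manage_exportObject('myfolder')

# ...then, after removing the mountpoint, move the .zexp file into the
# instance's import/ directory and pull it back into the main database:
app.mysite.manage_importObject('myfolder.zexp')
transaction.commit()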

Some background: Normally in Zope, mountpoints are the only place where one filestorage references another one, but the ZODB has some support for *any* object to link to any other object in any other database, and this can happen within Zope if you copy an object from one filestorage to another. This is generally bad, since the ZODB’s support for cross-database references is partial — when you pack one filestorage, the garbage collection routine doesn’t know about the cross-database references (unless you use zc.zodbdgc), so an object might get removed even if some other filestorage still refers to it, and you’ll get POSKeyErrors. Also, in ZODB 3.7.x, the code that handles packing doesn’t know about cross-database references, so you’ll get KeyError: ‘m’ or KeyError: ‘n’ while packing.

Well, this is what had happened to my multi-database, and I wanted to keep those cross-database references intact while I merged the site back into one monolithic filestorage. So I ended up adapting the ZEXP export code to:

  1. traverse cross-database references (the standard ZEXP export ignores them and will not include objects in different filestorages from the starting object),
  2. traverse ZODB mountpoints (removing them in the process),
  3. and rewrite all the oids to avoid collisions in the new merged database.

Here is the script I ended up with. If you need to use it, you should:

  1. Edit the final line to pass the object you want to start traversing from, and the filename you want to write the ZEXP dump to.
  2. Run the script using bin/instance run multiexport.py

"""Support for export of multidatabases."""

##############################################################################
#
# Based on the ZODB import/export code.
# Copyright (c) 2009 David Glick.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################

import logging
import cPickle, cStringIO
from ZODB.utils import p64, u64
from ZODB.ExportImport import export_end_marker
from ZODB.DemoStorage import DemoStorage

logger = logging.getLogger('multiexport')

def export_zexp(self, fname):
    context = self
    f = open(fname, 'wb')
    f.write('ZEXP')
    for oid, p in flatten_multidatabase(context):
        f.writelines((oid, p64(len(p)), p))
    f.write(export_end_marker)
    f.close()

def flatten_multidatabase(context):
    """Walk a multidatabase and yield rewritten pickles with oids for a single database"""
    base_oid = context._p_oid
    base_conn = context._p_jar
    dbs = base_conn.connections
    
    dummy_storage = DemoStorage()

    oids = [(base_conn._db.database_name, base_oid)]
    done_oids = {}
    # table to keep track of mapping old oids to new oids
    ooid_to_oid = {oids[0]: dummy_storage.new_oid()}
    while oids:
        # loop while references remain to objects we haven't exported yet
        (dbname, ooid) = oids.pop(0)
        if (dbname, ooid) in done_oids:
            continue
        done_oids[(dbname, ooid)] = True

        db = dbs[dbname]
        try:
            # get pickle
            p, serial = db._storage.load(ooid, db._version)
        except:
            logger.debug("broken reference for db %s, oid %s", dbname, repr(ooid),
                         exc_info=True)
        else:
            def persistent_load(ref):
                """ Remap a persistent id to a new ID and create a ghost for it.
                
                This is called by the unpickler for each reference found.
                """

                # resolve the reference to a database name and oid
                if isinstance(ref, tuple):
                    rdbname, roid = (dbname, ref[0])
                elif isinstance(ref, str):
                    rdbname, roid = (dbname, ref)
                else:
                    try:
                        ref_type, args = ref
                    except ValueError:
                        # weakref
                        return
                    else:
                        if ref_type in ('m', 'n'):
                            rdbname, roid = (args[0], args[1])
                        else:
                            return

                # traverse Products.ZODBMountpoint mountpoints to the mounted location
                rdb = dbs[rdbname]
                p, serial = rdb._storage.load(roid, rdb._version)
                klass = p.split()[0]
                if 'MountedObject' in klass:
                    mountpoint = rdb.get(roid)
                    # get the object with the root as a parent, then unwrap,
                    # since there's no API to get the unwrapped object
                    mounted = mountpoint._getOrOpenObject(app).aq_base
                    rdbname = mounted._p_jar._db.database_name
                    roid = mounted._p_oid

                if roid:
                    print '%s:%s -> %s:%s' % (dbname, u64(ooid), rdbname, u64(roid))
                    oids.append((rdbname, roid))

                try:
                    oid = ooid_to_oid[(rdbname, roid)]
                except KeyError:
                    # generate a new oid and associate it with this old db/oid
                    ooid_to_oid[(rdbname, roid)] = oid = dummy_storage.new_oid()
                return Ghost(oid)

            # do the repickling dance to rewrite references
            
            pfile = cStringIO.StringIO(p)
            unpickler = cPickle.Unpickler(pfile)
            unpickler.persistent_load = persistent_load

            newp = cStringIO.StringIO()
            pickler = cPickle.Pickler(newp, 1)
            pickler.persistent_id = persistent_id

            pickler.dump(unpickler.load())  # the pickle's class metadata
            pickler.dump(unpickler.load())  # the object's state
            p = newp.getvalue()

            yield ooid_to_oid[(dbname, ooid)], p

class Ghost(object):
    __slots__ = ("oid",)
    def __init__(self, oid):
        self.oid = oid

def persistent_id(obj):
    if isinstance(obj, Ghost):
        return obj.oid

export_zexp(app.mysite, '/tmp/mysite.zexp')

Download multiexport.py

I’ve used this script with apparent success, but it has not been extensively tested and your mileage may of course vary.



Seeing a real-time breakdown of web traffic by vhost

Occasionally our servers are hit by traffic spikes. Since we typically host a number of websites per server, we need a way to quickly determine which site is receiving the bulk of incoming requests. (Then we can improve caching on that site, perhaps.) In order to see a real-time indication of what vhosts are being requested, we use the following awk script:

histo.awk

# creates a histogram of values in the first column of piped-in data
function max(arr, big) {
    big = 0;
    for (i in cat) {
        if (cat[i] > big) { big=cat[i]; }
    }
    return big
}

NF > 0 {
    cat[$1]++;
    if (!start) { start = $6 }
    end = $6
}
END {
    printf "from %s to %s\n", start, end
    maxm = max(cat);
    for (i in cat) {
        scaled = 60 * cat[i] / maxm;
        printf "%-25.25s  [%8d]:", i, cat[i]
        for (j = 0; j < scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}

Which can be used like this:

watch 'tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3'

which will give a histogram of the occurrence of vhosts in the last 100 lines of the Apache log, updating every 2 seconds, sorted with the most frequent vhosts at the top. (Note that this assumes you are using an Apache log format which includes the vhost as the first column.) It looks something like this:

Every 2.0s: tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3       Thu Oct  1 09:51:41 2009

www.dogwoodinitiative.org  [      49]:############################################################
www.wildliferecreation.or  [      24]:##############################
www.earthministry.org      [      14]:##################
blogs.onenw.org            [       3]:####
www.tilth.org              [       2]:###
www.oeconline.org          [       2]:###
www.audubonportland.org    [       1]:##
oraction.org               [       1]:##
oeconline.org              [       1]:##
dogwoodinitiative.org      [       1]:##
bandon.onenw.org           [       1]:##
209.40.194.148             [       1]:##
from [01/Oct/2009:09:51:21 to [01/Oct/2009:09:48:40

(Another useful variant of this is to produce a histogram of requests by IP address, which can help determine what to block in a DOS attack.)



Extending kupu’s initialization with a Javascript wrapper decorator

Today I found myself struggling to do something in Javascript that I’m used to doing with ease in Python — replace an existing method (defined by code I don’t want to touch) with a wrapper that calls the original method and then also performs some additional actions. (Yeah, it’s a monkey patch. But sometimes it’s a cleaner and more maintainable way to extend something than the alternatives.)

In particular, I was trying to adjust the default kupu configuration without overriding kupuploneinit.js to add commands directly to the initPloneKupu method. Here’s the snippet that got me there:

var augmentKupuInit = function(orig_fn) {
  return function(){
    var editor = orig_fn.apply(this, arguments);
    // do what you need to on the editor object here.
    // For example, I was trying to prevent kupu from
    // filtering the 'fb:fan' tag of Facebook's "Fan Box"
    // widget, like so:
    editor.xhtmlvalid.tagAttributes['fb:fan'] = ['*'];
    return editor;
  };
};
initPloneKupu = augmentKupuInit(initPloneKupu);

This defines a decorator function called augmentKupuInit that can be used to wrap another function. Then it uses it to wrap the original initPloneKupu method, calling the newly generated function initPloneKupu. As long as this snippet is registered in such a way that it loads after kupuploneinit.js and before the initPloneKupu method is called, it works like a charm!
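For comparison, here is the Python idiom I had in mind, as a generic sketch of the same wrap-and-reassign pattern:

import functools

def augment(orig_fn):
    @functools.wraps(orig_fn)
    def wrapper(*args, **kwargs):
        result = orig_fn(*args, **kwargs)
        # ...perform additional actions on result here...
        return result
    return wrapper

# monkey-patch a function defined by code we don't want to touch:
# some_module.some_function = augment(some_module.some_function)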

(Many thanks to http://stackoverflow.com/questions/326596/how-do-i-wrap-a-function-in-javascript, which finally pointed me in the right direction.)