Metadata-Version: 1.0
Name: collective.transmogrifier
Version: 1.0
Summary: A configurable pipeline, aimed at transforming content for import and export
Home-page: http://pypi.python.org/pypi/collective.transmogrifier
Author: Jarn
Author-email: info@jarn.com
License: GPL
Description: **************
        Transmogrifier
        **************
        
        .. contents::
        
        Transmogrifier provides support for building pipelines that turn one thing
        into another. Specifically, transmogrifier pipelines are used to convert and
        import legacy content into a Plone site. It provides the tools to construct
        pipelines from multiple sections, where each section processes the data
        flowing through the pipe.
        
        A "transmogrifier pipeline" refers to a description of a set of pipe sections,
        slotted together in a set order. The stated goal is for these sections to
        transform data and ultimately add content to a Plone site based on this data.
        Sections deal with tasks such as sourcing the data (from textfiles, databases,
        etc.) and characterset conversion, through to determining portal type,
        location and workflow state.
        
        Note that a transmogrifier pipeline can be used to process any number of
        things, and is not specific to Plone content import. However, its original
        intent is to provide a pluggable way to import legacy content.
        
        Installation
        ************
        
        See docs/INSTALL.txt for installation instructions.
        
        Credits
        *******
        
        Development sponsored by
            Elkjøp Nordic AS
        
        Design and development
            `Martijn Pieters`_ at Jarn_
        
        Project name
            A transmogrifier_ is a fictional device used for transforming one object
            into another object. The term was coined by Bill Watterson of Calvin and
            Hobbes fame.
        
        .. _Martijn Pieters: mailto:mj@jarn.com
        .. _Jarn: http://www.jarn.com/
        .. _Transmogrifier: http://en.wikipedia.org/wiki/Transmogrifier
        
        Detailed Documentation
        **********************
        
        Pipelines
        =========
        
        To transmogrify, or import and convert non-Plone content, you simply define a
        pipeline. Pipe sections, the equivalent of parts in a buildout_, are slotted
        together into a processing pipe. To slot sections together, you write a
        configuration file that defines the named sections plus a main pipeline
        definition that names the sections in order (one section per line):
        
        >>> exampleconfig = """\
        ... [transmogrifier]
        ... pipeline =
        ...     section 1
        ...     section 2
        ...     section 3
        ...
        ... [section 1]
        ... blueprint = collective.transmogrifier.tests.examplesource
        ... size = 5
        ...
        ... [section 2]
        ... blueprint = collective.transmogrifier.tests.exampletransform
        ...
        ... [section 3]
        ... blueprint = collective.transmogrifier.tests.exampleconstructor
        ... """
        
        As you can see, this is also very similar to how you construct WSGI pipelines
        using paster. The format of the configuration files is defined by the Python
        ConfigParser module, with extensions that we'll describe later. At minimum,
        the transmogrifier section with an empty pipeline is required:
        
        >>> minimalconfig = """\
        ... [transmogrifier]
        ... pipeline =
        ... """
        
        Transmogrifier can load these configuration files either by looking them up
        in a registry or by loading them from a Python package.
        
        You register transmogrifier configurations using the ``registerConfig``
        directive in the http://namespaces.plone.org/transmogrifier namespace,
        together with a name, and optionally a title and description::
        
            <configure
                xmlns="http://namespaces.zope.org/zope"
                xmlns:transmogrifier="http://namespaces.plone.org/transmogrifier"
                i18n_domain="collective.transmogrifier">
        
            <transmogrifier:registerConfig
                name="exampleconfig"
                title="Example pipeline configuration"
                description="This is an example pipeline configuration"
                configuration="example.cfg"
                />
        
            </configure>
        
        You can then tell transmogrifier to load the 'exampleconfig' configuration. To
        load configuration files directly from a Python package, name the package and
        the configuration file separated by a colon, such as
        'collective.transmogrifier.tests:exampleconfig.cfg'.
        
        Registering files with the transmogrifier registry allows other uses, such as
        listing available configurations in a user interface, together with the
        registered description. Loading files directly lets you build reusable
        libraries of configuration files more quickly though.
        
        In this document we'll use the shorthand *registerConfig* to register
        example configurations:
        
        >>> registerConfig(u'collective.transmogrifier.tests.exampleconfig',
        ...                exampleconfig)
        
        Pipeline sections
        -----------------
        
        Each section in the pipeline is created by a blueprint. Blueprints are looked
        up as named utilities implementing the ISectionBlueprint interface. In the
        transmogrifier configuration file, you refer to blueprints by the name under
        which they are registered. Blueprints are factories; when called they produce
        an ISection pipe section. ISections, in turn, are iterators implementing the
        `iterator protocol`_.
        
        Here is a simple blueprint, in the form of a class definition:
        
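        >>> from zope.interface import classProvides, implements
        >>> from collective.transmogrifier.interfaces import ISection
        >>> from collective.transmogrifier.interfaces import ISectionBlueprint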
        >>> from zope.component import provideUtility
        >>> class ExampleTransform(object):
        ...     classProvides(ISectionBlueprint)
        ...     implements(ISection)
        ...
        ...     def __init__(self, transmogrifier, name, options, previous):
        ...         self.previous = previous
        ...         self.name = name
        ...
        ...     def __iter__(self):
        ...         for item in self.previous:
        ...             item['exampletransformname'] = self.name
        ...             yield item
        ...
        >>> provideUtility(ExampleTransform,
        ...                name=u'collective.transmogrifier.tests.exampletransform')
        
        Note that we register this class as a named utility, and that instances of
        this class can be used as an iterator. When slotted together, items 'flow'
        through the pipeline by iterating over the last section, which in turn
        iterates over its preceding section (``self.previous`` in the example), and
        so on.
        
        By iterating over the source, then yielding the items again, each section
        passes items on to the next section. During the iteration loop, sections can
        manipulate the items. Note that items are Python dictionaries; sections simply
        operate on the keys they care about. In our example we add a new key,
        ``exampletransformname``, which we set to the name of the section.
        
        Sources
        ~~~~~~~
        
        The items that flow through the pipe have to originate from somewhere though.
        This is where special sections, sources, come in. A source is simply a pipe
        section that inserts extra items into the pipeline. This is best illustrated
        with another example:
        
        >>> class ExampleSource(object):
        ...     classProvides(ISectionBlueprint)
        ...     implements(ISection)
        ...
        ...     def __init__(self, transmogrifier, name, options, previous):
        ...         self.previous = previous
        ...         self.size = int(options['size'])
        ...
        ...     def __iter__(self):
        ...         for item in self.previous:
        ...             yield item
        ...
        ...         for i in range(self.size):
        ...             yield dict(id='item%02d' % i)
        ...
        >>> provideUtility(ExampleSource,
        ...                name=u'collective.transmogrifier.tests.examplesource')
        
        In this example we use the ``options`` dictionary to read options from the
        section configuration, which in the example configuration we gave earlier has
        the option ``size`` defined as 5. Note that the configuration values are
        always strings, so we need to convert the size option to an integer here.
        
        The source first iterates over the previous section and yields all items
        unchanged. Only when that loop is done does the source produce new items and
        put them into the pipeline. This order is important: when you slot multiple
        source sections together, you want items produced by earlier sections to be
        processed first too.
        
        There is always a previous section, even for the first section defined in the
        pipeline. Transmogrifier passes in an empty iterator when it instantiates this
        first section, expecting such a first section to be a source that'll produce
        items for the pipeline to process.
        
        Constructors
        ~~~~~~~~~~~~
        
        As stated before, transmogrifier is intended for importing content into a
        Plone site. However, transmogrifier itself only drives the pipeline, inserting
        an empty iterator and discarding whatever it pulls out of the last section.
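        
        The driving loop described above can be sketched in plain Python. This is a
        minimal illustration, not transmogrifier's own code: ``run_pipeline``,
        ``source``, ``transform`` and ``recorder`` are hypothetical names, and plain
        generator functions stand in for registered blueprints.
        
        ```python
        def source(previous, size):
            # Pass on items from earlier sections first, then add our own.
            for item in previous:
                yield item
            for i in range(size):
                yield {'id': 'item%02d' % i}
        
        
        def transform(previous, name):
            # Mark each item with the name of this (hypothetical) section.
            for item in previous:
                item['exampletransformname'] = name
                yield item
        
        
        seen = []
        
        
        def recorder(previous):
            # Stand-in for a constructor: record each item, then pass it on.
            for item in previous:
                seen.append(dict(item))
                yield item
        
        
        def run_pipeline(factories):
            # Hypothetical driver: start from an empty iterator, wrap it with
            # each section in order, then exhaust the last section.
            pipeline = iter(())
            for factory in factories:
                pipeline = factory(pipeline)
            count = 0
            for item in pipeline:
                count += 1  # the items themselves are discarded
            return count
        
        
        processed = run_pipeline([
            lambda previous: source(previous, 2),
            lambda previous: transform(previous, 'section 2'),
            recorder,
        ])
        ```
        
        Nothing happens until the driver iterates over the last section; each item is
        then pulled through all sections one at a time.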
        
        In order to create content then, a constructor section is required. Like
        source sections, you should be able to use multiple constructors, so
        constructors should always start with yielding the items passed in from the
        previous section on to a possible next section.
        
        So, a constructor section is an ISection that consumes items from the previous
        section and affects the Plone site based on them, usually by creating content
        objects, and then yields each item on to a possible next section.
        For example purposes, we simply pretty print the items instead:
        
        >>> import pprint
        >>> class ExampleConstructor(object):
        ...     classProvides(ISectionBlueprint)
        ...     implements(ISection)
        ...
        ...     def __init__(self, transmogrifier, name, options, previous):
        ...         self.previous = previous
        ...         self.pprint = pprint.PrettyPrinter().pprint
        ...
        ...     def __iter__(self):
        ...         for item in self.previous:
        ...             self.pprint(item)
        ...             yield item
        ...
        >>> provideUtility(ExampleConstructor,
        ...                name=u'collective.transmogrifier.tests.exampleconstructor')
        
        With this last section blueprint example completed, we can load the example
        configuration we created earlier, and run our transmogrification:
        
        >>> from collective.transmogrifier.transmogrifier import Transmogrifier
        >>> transmogrifier = Transmogrifier(plone)
        >>> transmogrifier(u'collective.transmogrifier.tests.exampleconfig')
        {'exampletransformname': 'section 2', 'id': 'item00'}
        {'exampletransformname': 'section 2', 'id': 'item01'}
        {'exampletransformname': 'section 2', 'id': 'item02'}
        {'exampletransformname': 'section 2', 'id': 'item03'}
        {'exampletransformname': 'section 2', 'id': 'item04'}
        
        Developing blueprints
        ~~~~~~~~~~~~~~~~~~~~~
        
        As we could see from the ISectionBlueprint examples above, a blueprint gets
        called with several arguments: ``transmogrifier``, ``name``, ``options`` and
        ``previous``.
        
        We discussed ``previous`` before; it is a reference to the previous pipe
        section and must be looped over when the section itself is iterated. The
        ``name`` argument is simply the name of the section as given in the
        configuration file.
        
        The ``transmogrifier`` argument is a reference to the transmogrifier itself,
        and it can be used to reach the context we are importing to through its
        ``context`` attribute. The transmogrifier also acts as a dictionary, mapping
        from section names to a mapping of the options in each section.
        
        Finally, as seen before, the ``options`` argument is a mapping of the current
        section options. It is the same mapping as can be had through
        ``transmogrifier[name]``.
        
        A short example shows each of these arguments in action:
        
        >>> class TitleExampleSection(object):
        ...     classProvides(ISectionBlueprint)
        ...     implements(ISection)
        ...
        ...     def __init__(self, transmogrifier, name, options, previous):
        ...         self.transmogrifier = transmogrifier
        ...         self.name = name
        ...         self.options = options
        ...         self.previous = previous
        ...
        ...         pipeline = transmogrifier['transmogrifier']['pipeline']
        ...         pipeline_size = len([s.strip() for s in pipeline.split('\n')
        ...                              if s.strip()])
        ...         self.size = options['pipeline-size'] = str(pipeline_size)
        ...         self.site_title = transmogrifier.context.Title()
        ...
        ...     def __iter__(self):
        ...         for item in self.previous:
        ...             item['pipeline-size'] = self.size
        ...             item['title'] = '%s - %s' % (self.site_title, item['id'])
        ...             yield item
        >>> provideUtility(TitleExampleSection,
        ...                name=u'collective.transmogrifier.tests.titleexample')
        >>> titlepipeline = """\
        ... [transmogrifier]
        ... pipeline =
        ...     section1
        ...     titlesection
        ...     section3
        ...
        ... [section1]
        ... blueprint = collective.transmogrifier.tests.examplesource
        ... size = 5
        ...
        ... [titlesection]
        ... blueprint = collective.transmogrifier.tests.titleexample
        ...
        ... [section3]
        ... blueprint = collective.transmogrifier.tests.exampleconstructor
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.titlepipeline',
        ...                titlepipeline)
        >>> plone.Title()
        u'Plone Test Site'
        >>> transmogrifier = Transmogrifier(plone)
        >>> transmogrifier(u'collective.transmogrifier.tests.titlepipeline')
        {'title': u'Plone Test Site - item00', 'id': 'item00', 'pipeline-size': '3'}
        {'title': u'Plone Test Site - item01', 'id': 'item01', 'pipeline-size': '3'}
        {'title': u'Plone Test Site - item02', 'id': 'item02', 'pipeline-size': '3'}
        {'title': u'Plone Test Site - item03', 'id': 'item03', 'pipeline-size': '3'}
        {'title': u'Plone Test Site - item04', 'id': 'item04', 'pipeline-size': '3'}
        
        Configuration file syntax
        -------------------------
        
        As mentioned earlier, the configuration files use the format
        defined by the Python ConfigParser module with extensions. The
        extensions are based on the zc.buildout extensions and are:
        
        - option names are case sensitive
        
        - option values can use a substitution syntax, described below, to
          refer to option values in specific sections.
        
        - you can include other configuration files, see `Including other
          configurations`_.
        
        The ConfigParser syntax is very flexible. Section names can contain any
        characters other than newlines and right square brackets ("]"). Option names
        can contain any characters (within the ASCII character set) other than
        newlines, colons, and equal signs, cannot start with a space, and don't
        include trailing spaces.
        
        It is a good idea to keep section and option names simple, sticking to
        alphanumeric characters, hyphens, and periods.
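        
        As a rough illustration of the syntax, the standard library parser can read
        such a file. The sketch below uses Python 3's ``configparser`` module and
        hypothetical helper names (``read_config``, ``pipeline_sections``); it only
        approximates the extensions listed above and is not the loader that
        transmogrifier itself uses.
        
        ```python
        import configparser
        
        EXAMPLE = """\
        [transmogrifier]
        pipeline =
            section 1
            section 2
        
        [section 1]
        blueprint = collective.transmogrifier.tests.examplesource
        Size = 5
        
        [section 2]
        blueprint = collective.transmogrifier.tests.exampletransform
        """
        
        
        def read_config(text):
            # RawConfigParser performs no interpolation, so ${...} stays as-is.
            parser = configparser.RawConfigParser()
            # Keep option names case sensitive (the stdlib lowercases by default).
            parser.optionxform = str
            parser.read_string(text)
            return {name: dict(parser.items(name)) for name in parser.sections()}
        
        
        def pipeline_sections(config):
            # One section name per line; blank lines are ignored.
            lines = config['transmogrifier']['pipeline'].splitlines()
            return [line.strip() for line in lines if line.strip()]
        
        
        config = read_config(EXAMPLE)
        names = pipeline_sections(config)
        ```
        
        With case-sensitive option names, ``Size`` and ``size`` are distinct options,
        which is one more reason to keep names simple.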
        
        Variable substitution
        ~~~~~~~~~~~~~~~~~~~~~
        
        Transmogrifier supports a string.Template-like syntax for variable
        substitution, using both the section and the option name joined by a colon:
        
        >>> substitutionexample = """\
        ... [transmogrifier]
        ... pipeline =
        ...     section1
        ...     section2
        ...     section3
        ...
        ... [definitions]
        ... item_count = 3
        ...
        ... [section1]
        ... blueprint = collective.transmogrifier.tests.examplesource
        ... size = ${definitions:item_count}
        ...
        ... [section2]
        ... blueprint = collective.transmogrifier.tests.exampletransform
        ...
        ... [section3]
        ... blueprint = collective.transmogrifier.tests.exampleconstructor
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.substitutionexample',
        ...                substitutionexample)
        
        Here we created an extra section called definitions, and refer to the
        item_count option defined in that section to set the size of the section1
        pipeline section, so we only get 3 items when we execute this pipeline:
        
        >>> transmogrifier = Transmogrifier(plone)
        >>> transmogrifier(u'collective.transmogrifier.tests.substitutionexample')
        {'exampletransformname': 'section2', 'id': 'item00'}
        {'exampletransformname': 'section2', 'id': 'item01'}
        {'exampletransformname': 'section2', 'id': 'item02'}
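        
        The substitution step can be pictured as a small recursive lookup. The
        following is a sketch only, assuming the configuration has already been read
        into nested dictionaries; ``substitute`` and ``REFERENCE`` are hypothetical
        names rather than part of transmogrifier.
        
        ```python
        import re
        
        # Matches ${section:option} references.
        REFERENCE = re.compile(r'\$\{([^:}]+):([^}]+)\}')
        
        
        def substitute(config, value, _seen=()):
            # Replace every ${section:option} in value with the referenced
            # option, resolving nested references and refusing circular ones.
            def lookup(match):
                key = (match.group(1), match.group(2))
                if key in _seen:
                    raise ValueError('circular reference: %s:%s' % key)
                referenced = config[key[0]][key[1]]
                return substitute(config, referenced, _seen + (key,))
            return REFERENCE.sub(lookup, value)
        
        
        config = {
            'definitions': {'item_count': '3'},
            'section1': {'size': '${definitions:item_count}'},
        }
        size = substitute(config, config['section1']['size'])
        ```
        
        The result is still a string, so a section such as the example source has to
        convert it with ``int()`` itself.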
        
        Including other configurations
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        You can include other transmogrifier configurations with the ``include``
        option in the transmogrifier section. This option takes a list of
        configuration ids, separated by whitespace. All sections and options from
        those configuration files will be included provided the options weren't
        already present. This works recursively; inclusions in the included
        configuration files are honoured too:
        
        >>> inclusionexample = """\
        ... [transmogrifier]
        ... include =
        ...     collective.transmogrifier.tests.sources
        ...     collective.transmogrifier.tests.base
        ...
        ... [section1]
        ... size = 3
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.inclusionexample',
        ...                inclusionexample)
        >>> sources = """\
        ... [section1]
        ... blueprint = collective.transmogrifier.tests.examplesource
        ... size = 10
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.sources',
        ...                sources)
        >>> base = """\
        ... [transmogrifier]
        ... pipeline =
        ...     section1
        ...     section2
        ...     section3
        ... include = collective.transmogrifier.tests.constructor
        ...
        ... [section2]
        ... blueprint = collective.transmogrifier.tests.exampletransform
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.base',
        ...                base)
        >>> constructor = """\
        ... [section3]
        ... blueprint = collective.transmogrifier.tests.exampleconstructor
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.constructor',
        ...                constructor)
        >>> transmogrifier = Transmogrifier(plone)
        >>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample')
        {'exampletransformname': 'section2', 'id': 'item00'}
        {'exampletransformname': 'section2', 'id': 'item01'}
        {'exampletransformname': 'section2', 'id': 'item02'}
        
        Like zc.buildout configurations, we can also add or remove lines from included
        configuration options, by using the += and -= syntax:
        
        >>> advancedinclusionexample = """\
        ... [transmogrifier]
        ... include =
        ...     collective.transmogrifier.tests.inclusionexample
        ... pipeline -=
        ...     section2
        ...     section3
        ... pipeline +=
        ...     section4
        ...     section3
        ...
        ... [section4]
        ... blueprint = collective.transmogrifier.tests.titleexample
        ... """
        >>> registerConfig(u'collective.transmogrifier.tests.advancedinclusionexample',
        ...                advancedinclusionexample)
        >>> transmogrifier = Transmogrifier(plone)
        >>> transmogrifier(u'collective.transmogrifier.tests.advancedinclusionexample')
        {'title': u'Plone Test Site - item00', 'id': 'item00', 'pipeline-size': '3'}
        {'title': u'Plone Test Site - item01', 'id': 'item01', 'pipeline-size': '3'}
        {'title': u'Plone Test Site - item02', 'id': 'item02', 'pipeline-size': '3'}
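        
        One way to picture the ``+=`` and ``-=`` handling is as list operations on the
        lines of the included option. The sketch below assumes removals are applied
        before additions, which is consistent with the output above; ``merge_option``
        and ``lines`` are hypothetical helpers, and the real implementation may differ
        in its details.
        
        ```python
        def lines(value):
            # Split a multi-line option value into its non-empty lines.
            return [line.strip() for line in value.splitlines() if line.strip()]
        
        
        def merge_option(included, removed='', added=''):
            # Assumed order: drop the "-=" lines first, then append the "+=" lines.
            drop = set(lines(removed))
            kept = [line for line in lines(included) if line not in drop]
            return kept + lines(added)
        
        
        pipeline = merge_option(
            included='section1\nsection2\nsection3',
            removed='section2\nsection3',
            added='section4\nsection3',
        )
        ```
        
        The merged pipeline has three sections, matching the ``pipeline-size`` of
        ``'3'`` reported in the example output.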
        
        When calling transmogrifier, you can provide your own sections too: any extra
        keyword is interpreted as a section dictionary. Do make sure you use string
        values though:
        
        >>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample',
        ...               section1=dict(size='1'))
        {'exampletransformname': 'section2', 'id': 'item00'}
        
        Conventions
        -----------
        
        At its most basic level, transmogrifier pipelines are just iterators passing
        'things' around. Transmogrifier doesn't expect anything more than being able
        to iterate over the pipeline and doesn't dictate what happens within that
        pipeline, what defines a 'thing' or what ultimately gets accomplished.
        
        But as has been stated repeatedly, transmogrifier has been developed to
        facilitate importing legacy content, processing data in incremental steps
        until a final section constructs new content.
        
        To reach this end, several conventions have been established that help the
        various pipeline sections work together.
        
        Items are mappings
        ~~~~~~~~~~~~~~~~~~
        
        The first one is that the 'things' passed from section to section are
        mappings; i.e. they are or behave just like Python dictionaries. Again,
        transmogrifier doesn't produce these by itself; source sections (see Sources_)
        produce them by injecting them into the stream.
        
        Keys are fields
        ~~~~~~~~~~~~~~~
        
        Secondly, *all* keys in such mappings that do not start with an underscore
        will be used by constructor sections (see Constructors_) to construct Plone
        content. So keys that do not start with an underscore are expected to map to
        Archetypes fields or Zope3 schema fields or whatever the constructor expects.
        
        Paths are to the target object
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        Many sections either create objects (constructors) or operate on
        already-constructed or pre-existing objects. Such sections should interpret
        paths as the complete path for the object. For constructors this means they'll
        need to split the path into a container path and an id in order for them to
        find the correct context for constructing the object.
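        
        Splitting a full path this way takes only a few lines of Python. The helper
        below is a hypothetical sketch (``split_path`` is not a transmogrifier
        function), and stripping leading and trailing slashes is an assumption made
        for the example.
        
        ```python
        def split_path(path):
            # Split a full object path into (container path, id).
            path = path.strip('/')
            container, _, object_id = path.rpartition('/')
            return container, object_id
        
        
        container, object_id = split_path('/news/2008/launch')
        ```
        
        An item directly in the site root yields an empty container path, which a
        constructor would resolve to the import context itself.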
        
        Keys with a leading underscore are controllers
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        This leaves the keys that do start with a leading underscore to have special
        meaning to specific sections, allowing earlier pipeline sections to inject
        'control statements' for later sections in the item mapping. To avoid name
        clashes, sections that do expect such controller keys should use prefixes
        based on the name under which their blueprint was registered, plus optionally
        the name of the pipe section. This allows for precise targeting of pipe
        sections when inserting such keys.
        
        We'll illustrate this with an example. Let's say a source section loads news
        items from a database, but the database tables for such items hold filenames
        to point to binary image data. Rather than have this section load those
        filenames directly and add them to the item for image creation, a generic
        'file loader' section is used to do this. Let's suppose that this file loader
        is registered as ``acme.transmogrifier.fileloader``. This section then could
        be instructed to load files and store them in a named key by using 2
        'controller' keys named ``_acme.transmogrifier.fileloader_filename`` and
        ``_acme.transmogrifier.fileloader_targetkey``. If the source section were to
        create pipeline items with those keys, this later fileloader section would
        then automatically load the filenames and inject them into the items in the
        right location.
        
        If you need 2 such loaders, you can target them each individually by including
        their section names; so to target just the ``imageloader1`` section you'd use
        the keys ``_acme.transmogrifier.fileloader_imageloader1_filename`` and
        ``_acme.transmogrifier.fileloader_imageloader1_targetkey``. Sections that
        support such targeting should prefer such section specific keys over those
        only using the blueprint name.
        
        The collective.transmogrifier.utils module has a handy utility method called
        ``defaultKeys`` that'll generate these keys for you for easy matching:
        
        >>> from collective.transmogrifier import utils
        >>> keys = utils.defaultKeys('acme.transmogrifier.fileloader',
        ...                          'imageloader1', 'filename')
        >>> pprint.pprint(keys)
        ('_acme.transmogrifier.fileloader_imageloader1_filename',
         '_acme.transmogrifier.fileloader_filename',
         '_imageloader1_filename',
         '_filename')
        >>> utils.Matcher(*keys)('_filename', '_imageloader1_filename')
        ('_imageloader1_filename', True)
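        
        The key convention itself is simple enough to sketch by hand. The functions
        below are hypothetical stand-ins, not the ``utils`` implementation:
        ``candidate_keys`` builds the same four names in the same order as the
        ``defaultKeys`` output above, and ``first_present`` picks the most specific
        key that an item actually carries.
        
        ```python
        def candidate_keys(blueprint, section, key):
            # Most specific key first, mirroring the order shown above.
            return (
                '_%s_%s_%s' % (blueprint, section, key),
                '_%s_%s' % (blueprint, key),
                '_%s_%s' % (section, key),
                '_%s' % key,
            )
        
        
        def first_present(item, keys):
            # Return the first candidate key found in the item, or None.
            for key in keys:
                if key in item:
                    return key
            return None
        
        
        keys = candidate_keys(
            'acme.transmogrifier.fileloader', 'imageloader1', 'filename')
        item = {'_filename': 'a.png', '_imageloader1_filename': 'b.png'}
        chosen = first_present(item, keys)
        ```
        
        Because the candidates are ordered from most to least specific, the
        section-specific key wins over the generic ``_filename`` key.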
        
        
        Keep memory use to a minimum
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The above example is a little contrived of course; you'd generally configure a
        file loader section with a key name to grab the filename from, and perhaps put
        the loader *after* the constructor section and load the image data straight
        into the already constructed content item instead. This lowers memory
        requirements as image data can go directly into the ZODB this way, and the
        content object can be deactivated after the binary data has been stored.
        
        By operating on one item at a time, a transmogrifier pipeline can handle huge
        numbers of content items without breaking memory limits; individual sections
        should also avoid using memory unnecessarily.
        
        Previous sections go first
        ~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        As mentioned in the Sources_ section, when inserting new items into the
        stream, generally previous pipe sections come first. This way someone
        constructing a pipeline knows which source sections will be processed earlier
        (those slotted earlier in the pipeline) and can adjust expectations
        accordingly. This makes content construction more predictable when dealing
        with multiple sources.
        
        An exception would be a Folder Source, which inserts additional Folder items
        into the pipeline to ensure that the required container for any given content
        item exists at construction time. Such a source would inject extra items as
        needed, not before or after the previous source section.
        
        Iterators have 3 stages
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        Some tasks have to happen before the pipeline runs, or after all content has
        been created. In such cases it is handy to realise that iteration within a
        section consists of three stages: before iteration, iteration itself, and
        after iteration.
        
        For example, a section creating references may have to wait for all content to
        be created before it can insert the references. In this case it could build a
        queue during iteration, and only when the previous pipe section has been
        exhausted and the last item has been yielded would the section reach into the
        portal and create all the references.
        
        Sources following the `Previous sections go first`_ convention basically
        inject the new items in the after iteration stage.
        
        Here's a piece of pseudocode to illustrate these 3 stages::
        
            def __iter__(self):
                # Before iteration
                # You can do initialisation here
        
                for item in self.previous:
                    # Iteration itself
                    # You could process the items, take notes, inject additional
                    # items based on the current item in the pipe or manipulate portal
                    # content created by previous items
                    yield item
        
                # After iteration
                # The section still has control here and could inject additional
                # items, manipulate all portal content created by the pipeline,
                # or clean up after itself.
        
        You can get quite creative with this. For example, the reference creator could
        defer creating a reference only until it knows that the referenced object has
        been created too, and periodically create these
        references. This would keep memory requirements smaller as not *all*
        references to create have to be remembered.
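        
        A runnable variant of the three stages makes their order visible.
        ``QueueingSection`` is a hypothetical class that takes only ``previous`` and a
        log list, rather than the full blueprint signature, so that it runs without a
        Plone site.
        
        ```python
        class QueueingSection(object):
            # Hypothetical section showing the three stages of iteration.
        
            def __init__(self, previous, log):
                self.previous = previous
                self.log = log
        
            def __iter__(self):
                # Before iteration: set up state.
                queue = []
                self.log.append('start')
        
                for item in self.previous:
                    # Iteration itself: take notes and pass the item on.
                    queue.append(item['id'])
                    yield item
        
                # After iteration: the previous section is exhausted, so
                # deferred work (such as creating references) can happen here.
                self.log.append('deferred: %s' % ', '.join(queue))
        
        
        log = []
        section = QueueingSection(iter([{'id': 'a'}, {'id': 'b'}]), log)
        items = list(section)
        ```
        
        Note that the after-iteration stage only runs once the consumer has exhausted
        the section; a pipeline that stops early never reaches it.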
        
        Store pipeline-wide information in annotations
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        If, for some reason or other, you need to remember state across section
        instances that is pipeline-wide (such as database connections, or data
        counters), such information should be stored as annotations on the
        transmogrifier object::
        
            from zope.annotation.interfaces import IAnnotations
        
            MYKEY = 'foo.bar.baz'
        
            def __init__(self, transmogrifier, name, options, previous):
                self.storage = IAnnotations(transmogrifier).setdefault(MYKEY, {})
                self.storage.setdefault('spam', 0)
                ...
        
            def __iter__(self):
                ...
                self.storage['spam'] += 1
                ...
        
        .. _buildout: http://pypi.python.org/pypi/zc.buildout
        .. _iterator protocol: http://www.python.org/dev/peps/pep-0234/
        
        
        GenericSetup import integration
        ===============================
        
        To ease running a transmogrifier pipeline during site configuration, a generic
        import step for GenericSetup is included.
        
        The import step looks for a file named ``transmogrifier.txt`` and reads
        pipeline configuration names from this file, one name per line. Empty lines
        and lines starting with a # (hash mark) are skipped. These pipelines are then
        executed in the same order as they are found in the file.
        
        This means that if you want to run one or more pipelines as part of a
        GenericSetup profile, all you have to do is name these pipelines in a file
        named ``transmogrifier.txt`` in your profile directory.
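        
        As a rough illustration of the file format described above, the following
        sketch parses the contents of a ``transmogrifier.txt`` file. The function name
        ``read_pipeline_names`` and the sample pipeline names are hypothetical, and
        stripping surrounding whitespace is an assumption; this is not the import
        step's actual code::
        
            def read_pipeline_names(text):
                """Return pipeline names from the text of a transmogrifier.txt file."""
                names = []
                for line in text.splitlines():
                    line = line.strip()  # assumption: surrounding whitespace is ignored
                    # Skip empty lines and lines starting with a hash mark.
                    if not line or line.startswith('#'):
                        continue
                    names.append(line)
                return names
        
            sample = """\
            # pipelines to run, in order
            example.pipeline.one
        
            example.pipeline.two
            """
            names = read_pipeline_names(sample)
            print(names)
        
        The names come back in file order, which is the order in which the pipelines
        would be executed.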
        
        
        Default section blueprints
        **************************
        
        Constructor section
        ===================
        
        A constructor pipeline section is the heart of a transmogrifier content import
        pipeline. It constructs Plone content based on the items it processes. The
        constructor section blueprint name is
        ``collective.transmogrifier.sections.constructor``. Constructor sections do
        only one thing: they construct *new* content. No schema changes are made.
        Also, constructors create content without restrictions; no security checks or
        containment constraints are applied.
        
        Construction needs 2 pieces of information: the path to the item (including
        the id for the new item itself) and its portal type. To determine both of
        these, the constructor section inspects each item and looks for 2 keys, as
        described below. Any item missing any of these 2 pieces will be skipped.
        Similarly, items with a path for a container or type that doesn't exist will
        be skipped as well; make sure that these containers are constructed
        beforehand. Because a constructor section will only construct new objects, if
        an object with the same path already exists, the item will also be skipped.
        
        For the object path, it'll look (in order) for
        ``_collective.transmogrifier.sections.constructor_[sectionname]_path``,
        ``_collective.transmogrifier.sections.constructor_path``,
        ``_[sectionname]_path``, and ``_path``, where ``[sectionname]`` is replaced
        with the name given to the current section. This allows you to target the
        right section precisely if needed. Alternatively, you can specify what key to
        use for the path by specifying the ``path-key`` option, which should be a list
        of keys to try (one key per line, use a ``re:`` or ``regexp:`` prefix to
        specify regular expressions).
        
        For the portal type, use the ``type-key`` option to specify a set of keys just
        like ``path-key``. If omitted, the constructor will look for
        ``_collective.transmogrifier.sections.constructor_[sectionname]_type``,
        ``_collective.transmogrifier.sections.constructor_type``,
        ``_[sectionname]_type``, ``_type``, ``portal_type`` and ``Type`` (in that
        order, with ``[sectionname]`` replaced).
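        
        The default lookup order can be pictured with a small sketch. The helper names
        ``default_keys`` and ``first_present`` are hypothetical and only mirror the
        order documented above; the section's real matching code may differ::
        
            BLUEPRINT = 'collective.transmogrifier.sections.constructor'
        
            def default_keys(section_name, suffix, extra=()):
                """List the candidate item keys, most specific first."""
                return [
                    '_%s_%s_%s' % (BLUEPRINT, section_name, suffix),
                    '_%s_%s' % (BLUEPRINT, suffix),
                    '_%s_%s' % (section_name, suffix),
                    '_%s' % suffix,
                ] + list(extra)
        
            def first_present(item, keys):
                """Return the first candidate key found in the item, or None."""
                for key in keys:
                    if key in item:
                        return key
                return None
        
            # A section named "constructor" (hypothetical item for illustration).
            path_keys = default_keys('constructor', 'path')
            type_keys = default_keys('constructor', 'type',
                                     extra=('portal_type', 'Type'))
            item = {'_constructor_path': '/spam/eggs/foo',
                    '_path': '/ignored',
                    'portal_type': 'FooType'}
            chosen_path_key = first_present(item, path_keys)
            chosen_type_key = first_present(item, type_keys)
        
        Here the section-specific ``_constructor_path`` wins over the generic
        ``_path``, and ``portal_type`` is used because no more specific type key is
        present.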
        
        Unicode paths will be encoded to ASCII. Using the path and type, a new object
        will be constructed using ``invokeFactory``; nothing else is done. Paths are
        always interpreted as relative to the context object, with the last path
        segment being the id of the object to create.
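        
        A minimal sketch of how such a path breaks down into a container path and an
        id. The helper ``split_path`` is hypothetical, and treating a leading slash as
        insignificant is an assumption based on paths being relative to the context::
        
            def split_path(path):
                """Split an item path into (container path, id of the new object)."""
                # Paths are relative to the context, so outer slashes are dropped.
                container, _, new_id = path.strip('/').rpartition('/')
                return container, new_id
        
            nested = split_path('/spam/eggs/foo')
            top_level = split_path('/foo')
        
        With this split, ``'/spam/eggs/foo'`` yields the container ``'spam/eggs'`` and
        the id ``'foo'``, while ``'/foo'`` yields an empty container path, meaning the
        object is created directly in the context.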
        
        >>> import pprint
        >>> constructor = """
        ... [transmogrifier]
        ... pipeline =
        ...     contentsource
        ...     constructor
        ...     printer
        ...
        ... [contentsource]
        ... blueprint = collective.transmogrifier.sections.tests.contentsource
        ...
        ... [constructor]
        ... blueprint = collective.transmogrifier.sections.constructor
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.constructor',
        ...                constructor)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.constructor')
        {'_type': 'FooType', '_path': '/spam/eggs/foo'}
        {'_type': 'FooType', '_path': '/foo'}
        {'_path': 'not/existing/bar',
         '_type': 'BarType',
         'title': 'Should not be constructed, not an existing path'}
        {'_path': '/spam/eggs/existing',
         '_type': 'FooType',
         'title': 'Should not be constructed, an existing object'}
        {'_path': '/spam/eggs/incomplete',
         'title': 'Should not be constructed, no type'}
        {'_path': '/spam/eggs/nosuchtype',
         '_type': 'NonExisting',
         'title': 'Should not be constructed, not an existing type'}
        {'_path': 'spam/eggs/changedByFactory',
         '_type': 'FooType',
         'title': 'Factories are allowed to change the id'}
        >>> pprint.pprint(plone.constructed)
        (('spam/eggs', 'foo', 'FooType'),
         ('', 'foo', 'FooType'),
         ('spam/eggs', 'changedByFactory', 'FooType'))
        
        
        Codec section
        =============
        
        A codec pipeline section lets you alter the character encoding of item
        values, allowing you to recode text from and to unicode and any of the
        codecs supported by Python. The codec section blueprint name is
        ``collective.transmogrifier.sections.codec``.
        
        What values to recode is determined by the ``keys`` option, which takes a set
        of newline-separated key names. If a key name starts with ``re:`` or
        ``regexp:`` it is treated as a regular expression instead.
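        
        The key matching convention can be sketched as follows. ``make_matcher`` is a
        hypothetical helper, and anchoring regular expressions at the start of the key
        (``re.match`` semantics) is an assumption; the package's own matcher may
        behave differently in detail::
        
            import re
        
            def make_matcher(keys_option):
                """Build a key matcher from a newline-separated option value."""
                tests = []
                for line in keys_option.splitlines():
                    line = line.strip()
                    if not line:
                        continue
                    for prefix in ('re:', 'regexp:'):
                        if line.startswith(prefix):
                            # Regular expression: a hit returns the match object.
                            tests.append(re.compile(line[len(prefix):]).match)
                            break
                    else:
                        # Plain key name: a hit returns boolean True.
                        tests.append(lambda key, name=line: key == name)
        
                def matcher(key):
                    for test in tests:
                        result = test(key)
                        if result:
                            return result
                    return None
                return matcher
        
            matcher = make_matcher("""
                title
                re:([^-]+)-copy$
            """)
        
        A literal hit such as ``matcher('title')`` returns ``True``, a regular
        expression hit such as ``matcher('id-copy')`` returns the match object, and an
        unmatched key returns ``None``.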
        
        The optional ``from`` and ``to`` options determine what codecs values are
        recoded from and to. Both these values default to ``unicode``, meaning no
        translation. If either option is set to ``default``, the current default
        encoding of the Plone site is used.
        
        To deal with possible encoding errors, you can set the error handler of both
        the ``from`` and ``to`` codecs separately with the ``from-error-handler`` and
        ``to-error-handler`` options, respectively. These default to ``strict``, but
        can be set to any error handler supported by Python, including ``replace`` and
        ``ignore``.
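        
        The effect of the codec and error handler options can be illustrated with
        plain Python string methods. This is a sketch written for Python 3; the
        function ``recode`` and its parameter names are hypothetical, and ``None``
        stands in for the documented ``unicode`` default (no conversion)::
        
            def recode(value, from_=None, to=None,
                       from_error_handler='strict', to_error_handler='strict'):
                """Decode value from one codec, then encode it to another."""
                if from_ is not None:
                    value = value.decode(from_, from_error_handler)
                if to is not None:
                    value = value.encode(to, to_error_handler)
                return value
        
            raw = u'The Foo Fighters \u2117'.encode('utf8')
            text = recode(raw, from_='utf8')
        
            escaped = recode(text, to='ascii', to_error_handler='backslashreplace')
            replaced = recode(text, to='ascii', to_error_handler='replace')
            ignored = recode(text, to='ascii', to_error_handler='ignore')
        
        With ``backslashreplace`` the unencodable character becomes the escape
        sequence ``\u2117``, with ``replace`` it becomes ``?``, and with ``ignore`` it
        is dropped. With the default ``strict`` handler the same call raises
        ``UnicodeEncodeError``.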
        
        Also optional is the ``condition`` option, which lets you specify a TALES
        expression; when it evaluates to False, no encoding or decoding takes place.
        The condition is evaluated for every matched key.
        
        >>> codecs = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     decode-all
        ...     encode-id
        ...     encode-title
        ...     printer
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.samplesource
        ... encoding = utf8
        ...
        ... [decode-all]
        ... blueprint = collective.transmogrifier.sections.codec
        ... keys = re:.*
        ... from = utf8
        ...
        ... [encode-id]
        ... blueprint = collective.transmogrifier.sections.codec
        ... keys = id
        ... to = ascii
        ...
        ... [encode-title]
        ... blueprint = collective.transmogrifier.sections.codec
        ... keys = title
        ... to = ascii
        ... to-error-handler = backslashreplace
        ... condition = python:'Brand' not in item['title']
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.codecs',
        ...                codecs)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.codecs')
        {'status': u'\u2117', 'id': 'foo', 'title': 'The Foo Fighters \\u2117'}
        {'status': u'\u2122', 'id': 'bar', 'title': u'Brand Chocolate Bar \u2122'}
        {'id': 'monty-python',
         'status': u'\xa9',
         'title': "Monty Python's Flying Circus \\xa9"}
        
        The ``condition`` expression has access to the following:
        
        =================== ==========================================================
        ``item``            the current pipeline item
        ``key``             the name of the matched key
        ``match``           if the key was matched by a regular expression, the match
                            object, otherwise boolean True
        ``transmogrifier``  the transmogrifier
        ``name``            the name of the codec section
        ``options``         the codec options
        ``modules``         sys.modules
        =================== ==========================================================
        
        
        Inserter section
        ================
        
        An inserter pipeline section lets you define a key and value to insert into
        pipeline items. The inserter section blueprint name is
        ``collective.transmogrifier.sections.inserter``.
        
        An inserter section takes a ``key`` and a ``value`` TALES expression. These
        expressions are evaluated to generate the actual key-value pair that gets
        inserted. You can also specify an optional ``condition`` option; if given, the
        key only gets inserted when the condition, which is also a TALES expression,
        is true.
        
        Because the inserter ``value`` expression has access to the original item, it
        could even be used to change existing item values. Just target an existing
        key, pull out the original value in the value expression and return a modified
        version.
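        
        The behaviour described above can be sketched with a plain generator, in which
        ordinary Python callables stand in for the TALES expressions. The function
        ``inserter`` and its signature are hypothetical, not the section's actual
        implementation::
        
            def inserter(previous, key, value, condition=lambda item, key: True):
                """Yield every item, inserting value under key when condition holds."""
                for item in previous:
                    k = key(item)
                    if condition(item, k):
                        item[k] = value(item, k)
                    yield item
        
            source = [{'id': 'item-%02d' % i} for i in range(3)]
            result = list(inserter(
                source,
                key=lambda item: 'foo-%s' % item['id'][-2:],
                value=lambda item, key: int(item['id'][-2:]) * 15,
                # Falsy for item-00, so nothing is inserted for that item.
                condition=lambda item, key: int(item['id'][-2:]),
            ))
        
        Items always continue down the pipeline; the condition only controls whether
        the key is inserted.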
        
        >>> inserter = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     simple-insertion
        ...     expression-insertion
        ...     transform-id
        ...     printer
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.rangesource
        ... size = 3
        ...
        ... [simple-insertion]
        ... blueprint = collective.transmogrifier.sections.inserter
        ... key = string:foo
        ... value = string:bar (inserted into "${item/id}" by the "$name" section)
        ...
        ... [expression-insertion]
        ... blueprint = collective.transmogrifier.sections.inserter
        ... key = python:'foo-%s' % item['id'][-2:]
        ... value = python:int(item['id'][-2:]) * 15
        ... condition = python:int(item['id'][-2:])
        ...
        ... [transform-id]
        ... blueprint = collective.transmogrifier.sections.inserter
        ... key = string:id
        ... value = string:foo-${item/id}
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.inserter',
        ...                inserter)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.inserter')
        {'foo': 'bar (inserted into "item-00" by the "simple-insertion" section)',
         'id': 'foo-item-00'}
        {'foo': 'bar (inserted into "item-01" by the "simple-insertion" section)',
         'foo-01': 15,
         'id': 'foo-item-01'}
        {'foo': 'bar (inserted into "item-02" by the "simple-insertion" section)',
         'foo-02': 30,
         'id': 'foo-item-02'}
        
        The ``key``, ``value`` and ``condition`` expressions have access to the
        following:
        
        =================== ==========================================================
        ``item``            the current pipeline item
        ``transmogrifier``  the transmogrifier
        ``name``            the name of the inserter section
        ``options``         the inserter options
        ``modules``         sys.modules
        ``key``             (only for the value and condition expressions) the key
                            being inserted
        =================== ==========================================================
        
        
        Condition section
        =================
        
        A condition pipeline section lets you selectively discard items from the
        pipeline. The condition section blueprint name is
        ``collective.transmogrifier.sections.condition``.
        
        A condition section takes a ``condition`` TALES expression. When this
        expression, evaluated against the current item, is true, the item is yielded
        to the next pipe section; otherwise it is dropped:
        
        >>> condition = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     condition
        ...     printer
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.rangesource
        ... size = 5
        ...
        ... [condition]
        ... blueprint = collective.transmogrifier.sections.condition
        ... condition = python:int(item['id'][-2:]) > 2
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.condition',
        ...                condition)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.condition')
        {'id': 'item-03'}
        {'id': 'item-04'}
        
        The ``condition`` expression has access to the following:
        
        =================== ==========================================================
        ``item``            the current pipeline item
        ``transmogrifier``  the transmogrifier
        ``name``            the name of the condition section
        ``options``         the condition options
        ``modules``         sys.modules
        =================== ==========================================================
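        
        In plain Python terms, a condition section behaves like a filtering generator.
        The sketch below uses a hypothetical function ``condition_section`` with a
        Python callable in place of the TALES expression::
        
            def condition_section(previous, condition):
                """Yield only the items for which condition(item) is true."""
                for item in previous:
                    if condition(item):
                        yield item
        
            source = ({'id': 'item-%02d' % i} for i in range(5))
            kept = list(condition_section(
                source, lambda item: int(item['id'][-2:]) > 2))
        
        Only ``item-03`` and ``item-04`` survive; the other items never reach the next
        section.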
        
        As condition sections skip items in the pipeline, they should not be used
        inside a splitter section!
        
        
        Manipulator section
        ===================
        
        A manipulator pipeline section lets you copy, move or discard keys from the
        pipeline. The manipulator section blueprint name is
        ``collective.transmogrifier.sections.manipulator``.
        
        A manipulator section will copy keys when you specify a set of keys to copy,
        and an expression to determine what to copy these to. These are the ``keys``
        and ``destination`` options.
        
        The ``keys`` option is a set of key names, one on each line; key names starting
        with ``re:`` or ``regexp:`` are treated as regular expressions. The
        ``destination`` expression is a TALES expression that can access not only the
        item, but also the matched key and, if a regular expression was used, the
        match object.
        
        If a ``delete`` option is specified, it is also interpreted as a set of keys,
        like the ``keys`` option. These keys will be deleted from the item; if used
        together with the ``keys`` and ``destination`` options, keys will be renamed
        instead of copied.
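        
        The copy-then-delete behaviour can be sketched like this. The function
        ``manipulate`` is hypothetical, takes compiled regular expressions rather than
        option strings, and uses a Python callable in place of the ``destination``
        TALES expression::
        
            import re
        
            def manipulate(item, keys=(), destination=None, delete=()):
                """Copy matching keys to their destination, then delete keys."""
                # Iterate over a snapshot, as the item is modified along the way.
                for key in list(item):
                    for pattern in keys:
                        match = pattern.match(key)
                        if match:
                            item[destination(key, match)] = item[key]
                            break
                for key in list(item):
                    if any(pattern.match(key) for pattern in delete):
                        del item[key]
                return item
        
            pattern = re.compile(r'([^-]+)-copy$')
            item = {'id': 'foo', 'id-copy': 'foo', 'title-copy': 'Foo'}
            renamed = manipulate(
                item,
                keys=[pattern],
                destination=lambda key, match: '%s-duplicate' % match.group(1),
                delete=[pattern],
            )
        
        Because the same pattern is used for ``keys`` and ``delete``, each ``-copy``
        key ends up renamed to a ``-duplicate`` key rather than copied.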
        
        >>> manipulator = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     copy
        ...     rename
        ...     delete
        ...     printer
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.samplesource
        ...
        ... [copy]
        ... blueprint = collective.transmogrifier.sections.manipulator
        ... keys =
        ...     title
        ...     id
        ... destination = string:$key-copy
        ...
        ... [rename]
        ... blueprint = collective.transmogrifier.sections.manipulator
        ... keys = re:([^-]+)-copy$
        ... destination = python:'%s-duplicate' % match.group(1)
        ... delete = ${rename:keys}
        ...
        ... [delete]
        ... blueprint = collective.transmogrifier.sections.manipulator
        ... delete = status
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.manipulator',
        ...                manipulator)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.manipulator')
        {'id': 'foo',
         'id-duplicate': 'foo',
         'title': u'The Foo Fighters \u2117',
         'title-duplicate': u'The Foo Fighters \u2117'}
        {'id': 'bar',
         'id-duplicate': 'bar',
         'title': u'Brand Chocolate Bar \u2122',
         'title-duplicate': u'Brand Chocolate Bar \u2122'}
        {'id': 'monty-python',
         'id-duplicate': 'monty-python',
         'title': u"Monty Python's Flying Circus \xa9",
         'title-duplicate': u"Monty Python's Flying Circus \xa9"}
        
        The ``destination`` expression has access to the following:
        
        =================== ==========================================================
        ``item``            the current pipeline item
        ``key``             the name of the matched key
        ``match``           if the key was matched by a regular expression, the match
                            object, otherwise boolean True
        ``transmogrifier``  the transmogrifier
        ``name``            the name of the manipulator section
        ``options``         the manipulator options
        ``modules``         sys.modules
        =================== ==========================================================
        
        
        Splitter section
        ================
        
        A splitter pipeline section lets you branch a pipeline into 2 or more
        sub-pipelines. The splitter section blueprint name is
        ``collective.transmogrifier.sections.splitter``.
        
        A splitter section takes 2 or more pipeline definitions, and sends the items
        from the previous section through each of these sub-pipelines, each with its
        own copy [*]_ of the items:
        
        >>> emptysplitter = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     splitter
        ...     printer
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.rangesource
        ... size = 3
        ...
        ... [splitter]
        ... blueprint = collective.transmogrifier.sections.splitter
        ... pipeline-1 =
        ... pipeline-2 =
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.emptysplitter',
        ...                emptysplitter)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.emptysplitter')
        {'id': 'item-00'}
        {'id': 'item-00'}
        {'id': 'item-01'}
        {'id': 'item-01'}
        {'id': 'item-02'}
        {'id': 'item-02'}
        
        Although the pipeline definitions in the splitter are empty, we end up with 2
        copies of every item in the pipeline as both splitter pipelines get to process
        a copy. Splitter pipelines are defined by options starting with ``pipeline-``.
        
        Normally you'll use conditions to identify items for each sub-pipe, making the
        splitter the pipeline equivalent of an if/elif statement. Conditions are
        optional and use the pipeline option name plus ``-condition``:
        
        >>> evenoddsplitter = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     splitter
        ...     printer
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.rangesource
        ... size = 3
        ...
        ... [splitter]
        ... blueprint = collective.transmogrifier.sections.splitter
        ... pipeline-even-condition = python:int(item['id'][-2:]) % 2
        ... pipeline-even = even-section
        ... pipeline-odd-condition = not:${splitter:pipeline-even-condition}
        ... pipeline-odd = odd-section
        ...
        ... [odd-section]
        ... blueprint = collective.transmogrifier.sections.inserter
        ... key = string:even
        ... value = string:The even pipe
        ...
        ... [even-section]
        ... blueprint = collective.transmogrifier.sections.inserter
        ... key = string:odd
        ... value = string:The odd pipe
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.evenodd',
        ...                evenoddsplitter)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.evenodd')
        {'even': 'The even pipe', 'id': 'item-00'}
        {'odd': 'The odd pipe', 'id': 'item-01'}
        {'even': 'The even pipe', 'id': 'item-02'}
        
        Conditions are expressed as TALES expressions, and have access to:
        
        =================== ==========================================================
        ``item``            the current pipeline item
        ``transmogrifier``  the transmogrifier
        ``name``            the name of the splitter section
        ``pipeline``        the name of the splitter pipeline this condition belongs
                            to (including the ``pipeline-`` prefix)
        ``options``         the splitter options
        ``modules``         sys.modules
        =================== ==========================================================
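        
        The branching idea can be sketched in a few lines. This is a deliberately
        simplified model with hypothetical names (``splitter``, ``tag``): it runs each
        sub-pipeline once per item, whereas the real section feeds long-running
        sub-pipelines and buffers items, so it illustrates only the copy-and-condition
        logic::
        
            import copy
        
            def splitter(previous, branches):
                """Send a deep copy of each item through every matching branch.
        
                branches is a list of (condition, section) pairs; a section is a
                callable taking an iterable of items and returning an iterable.
                """
                for item in previous:
                    for condition, section in branches:
                        if condition(item):
                            # Each sub-pipeline works on its own copy of the item.
                            for result in section([copy.deepcopy(item)]):
                                yield result
        
            def tag(key, value):
                """Return a tiny section that inserts a fixed key-value pair."""
                def section(items):
                    for item in items:
                        item[key] = value
                        yield item
                return section
        
            def is_odd(item):
                return int(item['id'][-2:]) % 2 == 1
        
            source = [{'id': 'item-%02d' % i} for i in range(3)]
            result = list(splitter(source, [
                (is_odd, tag('odd', 'The odd pipe')),
                (lambda item: not is_odd(item), tag('even', 'The even pipe')),
            ]))
        
        Because the branches receive copies, the dictionaries in ``source`` are left
        unmodified.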
        
        
        .. WARNING::
           Although the splitter section employs some techniques to avoid memory
           bloat, any contained section that swallows items (taking them from the
           previous section without passing them on) runs the risk of pulling all
           remaining items into the splitter buffer while the next match for the
           contained pipeline is being sought.
        
           You can avoid this by not using sections that discard items within a
           splitter; place these before or after a splitter section. Better still,
           use a correct condition in the splitter configuration that won't include
           the items to discard in the first place.
        
        .. [*] Note that ``copy.deepcopy`` is used on all items. This will fail on
               items containing file handles, modules or other non-copyable values.
               See the ``copy`` module documentation.
        
        
        Savepoint section
        =================
        
        A savepoint pipeline section commits a savepoint every so often, which has a
        side-effect of freeing up memory. The savepoint section blueprint name is
        ``collective.transmogrifier.sections.savepoint``.
        
        A savepoint section takes an optional ``every`` option, which defaults to
        1000; a savepoint is committed every ``every`` items passing through the pipe.
        A savepoint section doesn't alter the items in any way:
        
        >>> savepoint = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     savepoint
        ...
        ... [source]
        ... blueprint = collective.transmogrifier.sections.tests.rangesource
        ... size = 10
        ...
        ... [savepoint]
        ... blueprint = collective.transmogrifier.sections.savepoint
        ... every = 3
        ... """
        >>> registerConfig(u'collective.transmogrifier.sections.tests.savepoint',
        ...                savepoint)
        
        We'll show savepoints being committed by overriding transaction.savepoint:
        
        >>> import transaction
        >>> original_savepoint = transaction.savepoint
        >>> counter = [0]
        >>> def test_savepoint(counter=counter, *args, **kw):
        ...     counter[0] += 1
        >>> transaction.savepoint = test_savepoint
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.savepoint')
        >>> transaction.savepoint = original_savepoint
        >>> counter[0]
        3
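        
        The counting logic can be sketched as a generator. The function
        ``savepoint_section`` and its ``commit`` callback are hypothetical stand-ins
        (the real section calls ``transaction.savepoint``), and committing before the
        item is passed on is an assumption::
        
            def savepoint_section(previous, commit, every=1000):
                """Pass items through unchanged, calling commit() every N items."""
                count = 0
                for item in previous:
                    count += 1
                    if count % every == 0:
                        commit()
                    yield item
        
            calls = []
            items = list(savepoint_section(
                range(10), lambda: calls.append(1), every=3))
        
        For 10 items and ``every = 3``, the callback fires after the 3rd, 6th and 9th
        item, which matches the 3 savepoints counted in the test above.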
        
        
        CSV source section
        ==================
        
        A CSV source pipeline section lets you create pipeline items from CSV files.
        The CSV source section blueprint name is
        ``collective.transmogrifier.sections.csvsource``.
        
        A CSV source section will load the CSV file named in the ``filename`` option,
        and will yield an item for each line in the CSV file. It'll use the first line
        of the CSV file to determine what keys to use, or you can specify a
        ``fieldnames`` option to specify the key names.
        
        By default the CSV file is assumed to use the Excel CSV dialect, but you can
        specify any dialect supported by the Python ``csv`` module with the
        ``dialect`` option.
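        
        These options map closely onto ``csv.DictReader`` from the standard library.
        The sketch below, written for Python 3, uses a hypothetical generator
        ``csv_items`` and an in-memory file; it is an illustration, not the section's
        actual code::
        
            import csv
            import io
        
            def csv_items(fileobj, fieldnames=None, dialect='excel'):
                """Yield one dictionary per CSV row."""
                # Without fieldnames, DictReader takes the keys from the first row.
                reader = csv.DictReader(fileobj, fieldnames=fieldnames,
                                        dialect=dialect)
                for row in reader:
                    yield dict(row)
        
            data = 'foo,bar,baz\r\nfirst-foo,first-bar,first-baz\r\n'
        
            with_header = list(csv_items(io.StringIO(data, newline='')))
            with_names = list(csv_items(io.StringIO(data, newline=''),
                                        fieldnames='monty spam eggs'.split()))
        
        When explicit field names are given, the first line of the file is no longer
        treated as a header and is yielded as a regular item.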
        
        >>> import tempfile
        >>> tmp = tempfile.NamedTemporaryFile('w+', suffix='.csv')
        >>> tmp.write('\r\n'.join("""\
        ... foo,bar,baz
        ... first-foo,first-bar,first-baz
        ... second-foo,second-bar,second-baz
        ... """.splitlines()))
        >>> tmp.flush()
        >>> csvsource = """
        ... [transmogrifier]
        ... pipeline =
        ...     csvsource
        ...     printer
        ...
        ... [csvsource]
        ... blueprint = collective.transmogrifier.sections.csvsource
        ... filename = %s
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """ % tmp.name
        >>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource',
        ...                csvsource)
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource')
        {'baz': 'first-baz', 'foo': 'first-foo', 'bar': 'first-bar'}
        {'baz': 'second-baz', 'foo': 'second-foo', 'bar': 'second-bar'}
        >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource',
        ...                csvsource=dict(fieldnames='monty spam eggs'))
        {'eggs': 'baz', 'monty': 'foo', 'spam': 'bar'}
        {'eggs': 'first-baz', 'monty': 'first-foo', 'spam': 'first-bar'}
        {'eggs': 'second-baz', 'monty': 'second-foo', 'spam': 'second-bar'}
        
        
        Change History
        **************
        
        (name of developer listed in brackets)
        
        1.0 (2009-08-07)
        ================
        
        - Initial transmogrifier architecture.
        [mj]
        
        
Keywords: content import filtering
Platform: UNKNOWN
