HTMLContentExtractor
====================

The blueprints in this package extract fields from HTML, either by applying XPath rules or by
automatic cluster analysis.

transmogrify.htmlcontentextractor
---------------------------------

The blueprint ``transmogrify.htmlcontentextractor`` takes rules of the following form::

    N-field = (text|html|delete|optional) xpath
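
As a minimal sketch, assuming each rule arrives as a single line of text, a rule can be split into its group number, field, format and XPath with a regular expression. ``Rule``, ``RULE_RE`` and ``parse_rule`` are hypothetical names used for illustration, not part of the blueprint's API, and the characters allowed in field names are an assumption.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    group: int   # N: the group this rule belongs to
    field: str   # attribute that receives the result
    format: str  # one of text, html, delete, optional
    xpath: str   # the XPath expression itself


# Hypothetical pattern for one "N-field = format xpath" line.
# The allowed field-name characters ([\w.-]) are an assumption.
RULE_RE = re.compile(
    r"^\s*(\d+)-([\w.-]+)\s*=\s*(text|html|delete|optional)\s+(.+?)\s*$"
)


def parse_rule(line: str) -> Rule:
    """Split one rule line into its four parts, or raise ValueError."""
    match = RULE_RE.match(line)
    if match is None:
        raise ValueError(f"not a rule: {line!r}")
    group, field, fmt, xpath = match.groups()
    return Rule(int(group), field, fmt, xpath)


rule = parse_rule("1-title = text //div[@id='content']//h1")
print(rule)
```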

``N`` is the group number. Groups are run in order of group number. If
any rule in a group does not match (unless it is marked ``optional``), the next group
is tried instead.
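
The group fallback can be sketched as follows. This is an illustration under stated assumptions, not the blueprint's implementation: rules are hypothetical ``(group, field, format, path)`` tuples, the page is well-formed XHTML, and Python's ``xml.etree.ElementTree`` with its limited XPath subset stands in for a full XPath engine. ``extract`` and ``RULES`` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical rules as (group, field, format, path) tuples. The paths use
# ElementTree's limited XPath subset in place of full XPath.
RULES = [
    (1, "title", "text", ".//div[@id='main']/h1"),
    (1, "nav", "optional", ".//div[@class='nav']"),
    (2, "title", "text", ".//h1"),
]


def extract(root, rules):
    """Return (group, fields) for the first group whose required rules match."""
    ordered = sorted(rules, key=lambda rule: rule[0])  # stable: keeps rule order
    for group, group_rules in groupby(ordered, key=lambda rule: rule[0]):
        fields = {}
        for _, field, fmt, path in group_rules:
            nodes = root.findall(path)
            if not nodes:
                if fmt == "optional":
                    continue  # optional rules may fail without failing the group
                break  # any other rule failing means this group does not match
            if fmt == "text":
                fields[field] = "".join(nodes[0].itertext()).strip()
            elif fmt == "html":
                fields[field] = ET.tostring(nodes[0], encoding="unicode")
            # delete/optional results are not stored as fields
        else:
            return group, fields  # no required rule failed in this group
    return None, {}  # no group matched


page = ET.fromstring(
    "<html><body><h1>Fallback title</h1><p>Body text</p></body></html>"
)
print(extract(page, RULES))
```

Because a group is abandoned as soon as one required rule fails, putting the most specific rules in the lowest-numbered group and looser fallbacks in later groups lets one configuration cover several page layouts.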

``field`` is the attribute that will be set with the result of the XPath.

``format`` is what to do with the result of the XPath: one of ``text``, ``html``, ``delete``
or ``optional``. ``optional`` means the same as ``delete``, except that a non-matching
``optional`` rule does not cause the group to fail. If the format is ``delete`` or
``optional``, the field name doesn't matter, but it must still be unique.
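
One plausible reading of the four formats, sketched with ``xml.etree.ElementTree``: ``text`` returns the text of the first match, ``html`` returns its serialized markup, and ``delete``/``optional`` remove every match from the tree and store nothing, which would explain why their field names do not matter. Taking only the first match and removing nodes on ``delete`` are assumptions of this sketch, and ``apply_format`` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def apply_format(root, fmt, path):
    """Hypothetical handling of one rule's format on an ElementTree tree."""
    nodes = root.findall(path)
    if fmt == "text":
        # Assumed: plain text of the first match, tags dropped.
        return "".join(nodes[0].itertext()).strip() if nodes else None
    if fmt == "html":
        # Assumed: serialized markup of the first match.
        return ET.tostring(nodes[0], encoding="unicode") if nodes else None
    if fmt in ("delete", "optional"):
        # Assumed: remove every match and store nothing, so the field
        # name is irrelevant. ElementTree has no parent pointers, so
        # build a child -> parent map first.
        parents = {child: parent for parent in root.iter() for child in parent}
        for node in nodes:
            parents[node].remove(node)
        return None
    raise ValueError(f"unknown format: {fmt}")


doc = ET.fromstring(
    "<html><body><div class='nav'>Menu</div>"
    "<div class='body'><h1>Title</h1><p>Hello <b>world</b></p></div>"
    "</body></html>"
)
apply_format(doc, "delete", ".//div[@class='nav']")
print(apply_format(doc, "text", ".//div[@class='body']/h1"))
print(apply_format(doc, "html", ".//div[@class='body']/p"))
```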

``xpath`` is an XPath expression.

transmogrify.htmlcontentextractor.auto
--------------------------------------

This blueprint analyses the HTML and attempts to discover rules that extract the
title, description and body of the page.
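
The cluster analysis used by this blueprint is not described here, so the following is a deliberately simpler stand-in, not that algorithm: a text-density heuristic that takes the title from ``<title>`` (or the first ``<h1>``), picks as the body the element whose direct ``<p>`` children hold the most text, and uses that element's first paragraph as the description. ``guess_fields`` and ``text_len`` are hypothetical names, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET


def text_len(el):
    """Count the non-whitespace characters of all text inside an element."""
    return len("".join("".join(el.itertext()).split()))


def guess_fields(root):
    """Crude text-density guess at title, description and body.

    This is NOT cluster analysis; it is a simple stand-in heuristic.
    """
    title_el = root.find(".//title")
    if title_el is None:
        title_el = root.find(".//h1")
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    # Score each element by the text in its direct <p> children, so a
    # container of paragraphs outranks short navigation or footer blocks.
    def score(el):
        return sum(text_len(p) for p in el.findall("p"))

    body = max(root.iter(), key=score)
    first_p = body.find("p")
    description = (
        "".join(first_p.itertext()).strip() if first_p is not None else ""
    )
    return {"title": title, "description": description, "body": body}


page = ET.fromstring(
    "<html><head><title>Report</title></head><body>"
    "<div id='nav'><p>Home</p><p>About</p></div>"
    "<div id='content'><p>First paragraph of the article.</p>"
    "<p>Second, longer paragraph with the main content.</p></div>"
    "</body></html>"
)
fields = guess_fields(page)
print(fields["title"], fields["body"].get("id"))
```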

