Metadata-Version: 1.1
Name: mwstreaming
Version: 0.2.0
Summary: A collection of scripts and utilities to support the stream-processing of MediaWiki data.
Home-page: https://github.com/halfak/MediaWiki-Streaming
Author: Aaron Halfaker
Author-email: ahalfaker@wikimedia.org
License: MIT
Description: MediaWiki Streaming
        ===================
        
        A collection of scripts and utilities to support the stream-processing of
        MediaWiki data.
        
        * dump2json -- Converts an XML dump to a stream of revision JSON blobs
        * wikihadoop2json -- Converts a Wikihadoop-processed stream of XML pages to JSON
                             blobs
        * json2tsv -- Converts a stream of JSON blobs to tab-separated values
        * json2diffs -- Computes and adds a "diff" field to a stream of revision JSON
                        blobs
        * diffs2persistence -- Computes token persistence from a stream of JSON revision
                               diff blobs and adds a "persistence" field
        * persistence2revstats -- Aggregates a stream of token persistence data into
                                  revision statistics
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering
