simplere
========

A simplified interface to Python's regular expression (``re``)
string search that tries to eliminate steps and provide
simpler access to results. As a bonus, it also provides a compatible way to
access Unix glob searches.

.. toctree::
   :maxdepth: 3

Usage
=====

Python regular expressions are powerful, but the language's lack
of an *en passant* (in passing) assignment requires a preparatory
motion and then a test::

    import re

    match = re.search(pattern, some_string)
    if match:
        print(match.group(1))

With ``simplere``, you can do it in fewer steps::

    import re
    from simplere import *

    if match / re.search(pattern, some_string):
        print(match[1])

Motivation
==========

In the simple examples above, "fewer steps" seems like a small
savings (3 lines to 2). While a 33% savings is a pretty good
optimization, is it really worth using another module and
a quirky *en passant* operator to get it?

In code this simple, maybe not. But real regex-based searching tends
to have multiple, cascading searches, and to be tightly interwoven
with complex pre-conditions, error-checking, and post-match formatting
or actions. It gets complicated fast. When multiple ``re`` matches
must be done, it consumes a lot of "vertical space" and often
threatens to push the number of lines a programmer is viewing at
any given moment beyond the number that can be easily held in working
memory. In that case, it proves valuable to condense what is logically
a single operation ("regular expression test") into a single line
with its conditional ``if``.

This is even more true for the "exploratory" phases of development,
before a program's appropriate structure and best logical boundaries
have been established.  One can always "back out" the condensing *en
passant* operation in later production code, if desired.


Re Objects
==========

``Re`` objects are `memoized
<http://en.wikipedia.org/wiki/Memoization>`_ for efficiency, so they compile their
pattern just once, regardless of how many times they're mentioned in a
program.
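
The memoization idea can be sketched with the standard library alone. This is
not ``simplere``'s actual implementation; ``MemoRe`` and its ``_cache``
dictionary are hypothetical names used only for illustration:

```python
import re


class MemoRe(object):
    """Hypothetical memoized pattern wrapper (not simplere's code)."""

    _cache = {}  # maps (pattern, flags) to the single shared instance

    def __new__(cls, pattern, flags=0):
        key = (pattern, flags)
        if key not in cls._cache:
            self = super(MemoRe, cls).__new__(cls)
            self.regex = re.compile(pattern, flags)  # compiled once per key
            cls._cache[key] = self
        return cls._cache[key]


first = MemoRe(r'\d+')
second = MemoRe(r'\d+')
same_object = first is second   # both names refer to one cached instance
```

Because construction is routed through the cache, mentioning the same pattern
in a loop or in several functions costs only a dictionary lookup after the
first use.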

``Re`` objects support Python's ``in`` test, as in ``some_string in Re(pattern)``.
Note that this turns the sense of the matching around (compared to
the standard ``re`` module). It asks "is the given string *in*
the set of items this pattern describes?" To be fancy, the
``Re`` pattern is an intensionally
defined set (namely "all strings matching the pattern"). This order often makes
excellent sense when you have a clear intent for the test. For example, "is the
given string within the set of *all legitimate commands*?"

The ``in`` test also has the side effect of setting the class attribute
``Re._`` to the result. Python doesn't support *en passant* assignment, apparently
no matter how hard you try or how much introspection you use. This makes it
harder to both test and collect results in the same motion, even though that's
often exactly appropriate. Collecting them in a class variable is a fallback
strategy (see *Under the Covers* below for a slicker one).
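
A minimal sketch of how an ``in`` test can both match and record its result,
using only the standard library. ``InRe`` is a hypothetical class written for
this illustration; it is not how ``simplere`` itself is implemented:

```python
import re


class InRe(object):
    """Hypothetical ``in``-testable pattern (not simplere's code)."""

    _ = None  # class-level slot holding the most recent match result

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __contains__(self, string):
        # Python evaluates `string in obj` as obj.__contains__(string)
        InRe._ = self.regex.search(string)
        return InRe._ is not None


matched = 'run fast' in InRe(r'(run|walk) (\w+)')
verb = InRe._.group(1) if matched else None
```

The trade-off of a class-level slot is that every search overwrites the
previous result, so the value should be read right after the test.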

If you prefer the more traditional ``re`` calls::

    if Re(pattern).search(some_string):
        print(Re._[1])

``Re`` works even better with named pattern components, which are exposed
as attributes of the returned object::

    person = 'John Smith 48'
    if person in Re(r'(?P<name>[\w\s]*)\s+(?P<age>\d+)'):
        print("{0} is {1} years old".format(Re._.name, Re._.age))
    else:
        print("don't understand '{0}'".format(person))

One trick being used here is that the returned object is not a pure
``_sre.SRE_Match`` that Python's ``re`` module returns. Nor is it a subclass.
(That class `appears to be unsubclassable
<http://stackoverflow.com/questions/4835352/subclassing-matchobject-in-python>`_.)
Thus, regular expression matches return a proxy object that
exposes the match object's numeric (positional) and
named groups through indices and attributes. If a named group has the same
name as a match object method or property, the named group takes precedence.
Either change the name of the match group or access the underlying property thus:
``x._match.property``.
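
The proxy approach can be sketched as follows. ``MatchProxy`` is a
hypothetical stand-in, not the package's own class, and it assumes that named
groups win over match object attributes, as the advice about ``x._match``
suggests:

```python
import re


class MatchProxy(object):
    """Hypothetical proxy around a match object (not simplere's ReMatch)."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, index):
        # proxy[1] behaves like match.group(1)
        return self._match.group(index)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        if name == '_match':
            raise AttributeError(name)  # avoid recursion if never set
        groups = self._match.groupdict()
        if name in groups:
            return groups[name]         # named groups are tried first
        return getattr(self._match, name)


# The group named "end" deliberately collides with match.end()
found = MatchProxy(re.search(r'(?P<name>\w+) (?P<end>\d+)', 'John 48'))
name, shadowing_group = found.name, found.end
real_end = found._match.end()   # the underlying method is still reachable
```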

It's possible also to loop over the results::

    for found in Re(r'pattern (\w+)').finditer('pattern is as pattern does'):
        print(found[1])

Or collect them all in one fell swoop::

    found = Re(r'pattern (\w+)').findall('pattern is as pattern does')

Pretty much all of the methods and properties one can access from the standard
``re`` module are available.

Bonus: Globs
============

Regular expressions are wonderfully powerful, but sometimes the simpler `Unix glob
<http://en.wikipedia.org/wiki/Glob_(programming)>`_ works just fine. As a bonus,
``simplere`` also provides simple glob access::

    if 'globtastic' in Glob('glob*'):
        print("Yes! It is!")
    else:
        raise ValueError('YES IT IS')
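
For comparison, a similar ``in`` test can be sketched on top of the standard
library's ``fnmatch`` module. Whether ``Glob`` is built this way is an
assumption; ``GlobSketch`` is a hypothetical name used only here:

```python
import fnmatch


class GlobSketch(object):
    """Hypothetical glob wrapper built on fnmatch (not simplere's Glob)."""

    def __init__(self, pattern):
        self.pattern = pattern

    def __contains__(self, string):
        # fnmatchcase skips the platform-dependent case folding of fnmatch()
        return fnmatch.fnmatchcase(string, self.pattern)


is_glob = 'globtastic' in GlobSketch('glob*')
is_blob = 'blobtastic' in GlobSketch('glob*')
```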

Under the Covers
================

``ReMatch`` objects
wrap Python's native ``_sre.SRE_Match`` objects (the things that ``re``
method calls return)::

    import re

    match = re.match(r'(?P<word>th.s)', 'this is a string')
    match = ReMatch(match)
    if match:
        print(match.group(1))    # still works
        print(match[1])          # same thing
        print(match.word)        # same thing, with logical name

But that's a huge amount of boilerplate for a simple test, right? So ``simplere``
provides an *en passant* operator that redefines the division operation and proxies
the ``re`` result on the fly to the pre-defined ``match`` object::

    if match / re.search(r'(?P<word>th.s)', 'this is a string'):
        assert match[1] == 'this'
        assert match.word == 'this'
        assert match.group(1) == 'this'

If the ``re`` operation fails, the resulting object is guaranteed to have
a ``False``-like Boolean value, so that it will fall through conditional tests.
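
One way such an operator could work is sketched below. ``Capture`` is a
hypothetical holder class written for this illustration, not the package's own
code; it overloads ``/`` to store the right-hand value and reports failure
through its Boolean value:

```python
import re


class Capture(object):
    """Hypothetical en passant holder (not simplere's own class)."""

    def __init__(self):
        self._match = None

    def __truediv__(self, match):
        # `holder / value` stores the value, then returns the holder itself
        self._match = match
        return self

    __div__ = __truediv__  # Python 2 spells the `/` hook __div__

    def __bool__(self):
        # A failed search stores None, which makes the holder falsy
        return self._match is not None

    __nonzero__ = __bool__  # Python 2 spelling of __bool__

    def __getitem__(self, index):
        return self._match.group(index)


holder = Capture()
hit = holder[1] if holder / re.search(r'(th.s)', 'this is a string') else None
miss = bool(holder / re.search(r'(xyz)', 'this is a string'))
```

Returning the holder rather than the raw match keeps the test and the later
indexing on the same name.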

Options and Alternatives
========================

If you prefer the look of the less-than (``<``) or less-than-or-equal
(``<=``) operators as indicators that ``match`` takes the value of the
following function call, they are experimentally supported as aliases
of the division operation (``/``). You may define your own match
objects, and can use them on memoized ``Re`` objects too. Putting
a few of these optional things together::

    answer = Match()   # need to do this just once

    if answer < Re(r'(?P<word>th..)').search('and that goes there'):
        assert answer.word == 'that'
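
Operator aliases of this kind can be expressed by binding several special
methods to one function. The sketch below is an assumption about the general
technique, not the package's code; ``AliasCapture`` is a hypothetical name, and
unlike the examples above it returns the captured value directly:

```python
import re


class AliasCapture(object):
    """Hypothetical holder showing operator aliases (not simplere's code)."""

    def __init__(self):
        self.value = None

    def _capture(self, value):
        self.value = value
        # Return the captured value so an `if` sees its truthiness
        return value

    # Every operator spelling routes to the same capture method
    __truediv__ = __div__ = __lt__ = __le__ = _capture


answer = AliasCapture()
via_lt = answer < re.search(r'(th..)', 'and that goes there')
word = answer.value.group(1)
via_le = answer <= re.search(r'(th..)', 'no match')
```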

Notes
=====

 *  Automated multi-version testing is managed with the wonderful
    `pytest <http://pypi.python.org/pypi/pytest>`_
    and `tox <http://pypi.python.org/pypi/tox>`_. ``simplere`` is
    successfully packaged for, and tested against, all late-model versions of
    Python: 2.6, 2.7, 3.2, and 3.3, as well as PyPy 2.1 (based on 2.7.3).
    Travis-CI testing has also commenced.

 *  ``simplere`` is one part of a larger effort to add intensional sets
    to Python. The `intensional <http://pypi.python.org/pypi/intensional>`_
    package contains a parallel implementation of ``Re``, among many other
    things.

 *  The author, `Jonathan Eunice <mailto:jonathan.eunice@gmail.com>`_ or
    `@jeunice on Twitter <http://twitter.com/jeunice>`_
    welcomes your comments and suggestions.

Installation
============

To install the latest version::

    pip install -U simplere

To ``easy_install`` under a specific Python version (3.3 in this example)::

    python3.3 -m easy_install --upgrade simplere

(You may need to prefix these with "sudo " to authorize installation.)
