===========
Transaction
===========

###############################################################################

!!!ATTENTION!!!
This concept has changed. The first part of this documentation, which describes
the current Zope transaction handling, is still correct. But our MongoDB
transaction handling, which is described in the second part, has changed!

See the !!!ATTENTION!!! comment below!

###############################################################################


First, some notes about Zope transaction handling. Zope uses transaction data
managers, each of which observes an object. We use a generic transaction data
manager for our storage and container implementations. It just dispatches the
calls to the object which implements the transaction handling API.
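
A minimal sketch of such a dispatching data manager could look like this. The
names ``GenericDataManager`` and ``RecordingStorage`` are hypothetical and only
used for illustration. Only the methods discussed in this document are shown,
so this is not a complete data manager implementation:

```python
class GenericDataManager(object):
    """Forward every transaction call to the observed object."""

    def __init__(self, context):
        # context is the object which implements the transaction handling API
        self.context = context

    def tpc_begin(self, transaction):
        self.context.tpc_begin(transaction)

    def commit(self, transaction):
        self.context.commit(transaction)

    def tpc_vote(self, transaction):
        self.context.tpc_vote(transaction)

    def tpc_finish(self, transaction):
        self.context.tpc_finish(transaction)

    def tpc_abort(self, transaction):
        self.context.tpc_abort(transaction)

    def abort(self, transaction):
        self.context.abort(transaction)


class RecordingStorage(object):
    """Hypothetical storage which only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for names which are not regular attributes.
        if name.startswith('_'):
            raise AttributeError(name)

        def method(transaction):
            self.calls.append(name)
        return method


storage = RecordingStorage()
dm = GenericDataManager(storage)
dm.tpc_begin(None)
dm.commit(None)
print(storage.calls)
```

The data manager itself knows nothing about the storage. All transaction logic
stays in the observed object.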

Transaction commit
------------------

Let's explain how the transaction manager calls the different transaction
data managers during a commit. The transaction manager calls the following
methods on each data manager. Each method gets called on every data manager
before the next method gets called:

1. tpc_begin
2. commit
3. tpc_vote
4. tpc_finish

If an error happens during these calls, the following methods get called on
each data manager:

1. abort
2. tpc_abort

Note, abort only gets called on a data manager which has voted. This means
the transaction manager called tpc_begin, commit and tpc_vote on that data
manager without any error. The method tpc_abort gets called on every data
manager on any error.
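
The rules above can be modelled with a small toy simulation. Note, this is
only a sketch which mirrors the call order as it is described in this document.
It is not the code of the transaction package, and ``DemoDataManager`` and
``commit_all`` are hypothetical names. Verify the order against the transaction
package version you use:

```python
class DemoDataManager(object):
    """Hypothetical data manager which logs its calls and can fail on demand."""

    def __init__(self, name, log, fail_on=None):
        self.name = name
        self.log = log
        self.fail_on = fail_on

    def _call(self, method):
        self.log.append('%s.%s' % (self.name, method))
        if method == self.fail_on:
            raise RuntimeError('%s.%s failed' % (self.name, method))

    def tpc_begin(self):
        self._call('tpc_begin')

    def commit(self):
        self._call('commit')

    def tpc_vote(self):
        self._call('tpc_vote')

    def tpc_finish(self):
        self._call('tpc_finish')

    def abort(self):
        self._call('abort')

    def tpc_abort(self):
        self._call('tpc_abort')


def commit_all(managers):
    """Toy model of the commit sequence as described in this document."""
    voted = []
    try:
        # each method runs on every data manager before the next method starts
        for method in ('tpc_begin', 'commit', 'tpc_vote'):
            for dm in managers:
                getattr(dm, method)()
                if method == 'tpc_vote':
                    voted.append(dm)
    except Exception:
        # abort only for data managers which are marked as voted
        for dm in voted:
            dm.abort()
        # tpc_abort for every data manager
        for dm in managers:
            dm.tpc_abort()
        raise
    # no cleanup is modelled if tpc_finish fails
    for dm in managers:
        dm.tpc_finish()


log = []
managers = [DemoDataManager('dm1', log),
            DemoDataManager('dm2', log, fail_on='tpc_vote')]
try:
    commit_all(managers)
except RuntimeError:
    pass
print(log)
```

Changing the ``fail_on`` argument lets you replay the different error cases
which are listed step by step in the following sections.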

Let's show an example of what happens if we have 2 data managers. First, a
successful transaction commit looks like this:

- dm1.tpc_begin
- dm2.tpc_begin

- dm1.commit
- dm2.commit

- dm1.tpc_vote
- dm2.tpc_vote

- dm1.tpc_finish
- dm2.tpc_finish


tpc_begin fails
~~~~~~~~~~~~~~~

Now show what happens if the first data manager fails during tpc_begin:

- dm1.tpc_begin
  -> error

- dm1.tpc_abort
- dm2.tpc_abort

- raise sys.exc_info()

Now show what happens if the second data manager fails during tpc_begin:

- dm1.tpc_begin
- dm2.tpc_begin
  -> error

- dm1.tpc_abort
- dm2.tpc_abort

- raise sys.exc_info()


commit fails
~~~~~~~~~~~~

Now show what happens if the first data manager fails during commit:

- dm1.tpc_begin
- dm2.tpc_begin

- dm1.commit
  -> error

- dm1.tpc_abort
- dm2.tpc_abort

- raise sys.exc_info()

Now show what happens if the second data manager fails during commit:

- dm1.tpc_begin
- dm2.tpc_begin

- dm1.commit
- dm2.commit
  -> error

- dm1.tpc_abort
- dm2.tpc_abort

- raise sys.exc_info()


tpc_vote fails
~~~~~~~~~~~~~~

Now show what happens if the first data manager fails during tpc_vote. As you
can see, this is the same as above because none of the two data managers has
voted:

- dm1.tpc_begin
- dm2.tpc_begin

- dm1.commit
- dm2.commit

- dm1.tpc_vote
  -> error

- dm1.tpc_abort
- dm2.tpc_abort

- raise sys.exc_info()

But if the second data manager fails on tpc_vote, abort gets called on the
first data manager because that data manager is already marked as voted:

- dm1.tpc_begin
- dm2.tpc_begin

- dm1.commit
- dm2.commit

- dm1.tpc_vote
- dm2.tpc_vote
  -> error

- dm1.abort

- dm1.tpc_abort
- dm2.tpc_abort

- raise sys.exc_info()

As you can see, tpc_finish does not get called on any error.


tpc_finish fails
~~~~~~~~~~~~~~~~

Now show what happens if a data manager fails during tpc_finish:

- dm1.tpc_begin
- dm2.tpc_begin

- dm1.commit
- dm2.commit

- dm1.tpc_vote
- dm2.tpc_vote

- dm1.tpc_finish
  -> raise error

As you can see, there is no abort call which would clean up anything. This
means tpc_finish should never fail. Never ever, or your data gets messed up!
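
One way to follow this rule is to do everything which may fail in tpc_vote and
to keep tpc_finish limited to simple bookkeeping. The following is a minimal
sketch under that assumption. ``SafeFinishDataManager`` is a hypothetical name
and a plain dict stands in for the external storage:

```python
class SafeFinishDataManager(object):
    """Hypothetical data manager: all risky work happens in tpc_vote."""

    def __init__(self, store):
        self.store = store      # dict standing in for an external storage
        self.pending = {}       # changes collected during the transaction

    def set(self, key, value):
        self.pending[key] = value

    def tpc_vote(self, transaction):
        # Validate first, then write. An error raised here still leads to
        # tpc_abort, so nothing is left half done without a cleanup call.
        for key, value in self.pending.items():
            if value is None:
                raise ValueError('no value given for %r' % (key,))
        self.store.update(self.pending)

    def tpc_finish(self, transaction):
        # Only in-memory cleanup, nothing which is expected to raise.
        self.pending.clear()

    def tpc_abort(self, transaction):
        # A real implementation would also revert written data here.
        self.pending.clear()


store = {}
dm = SafeFinishDataManager(store)
dm.set('a', 1)
dm.tpc_vote(None)
dm.tpc_finish(None)
print(store)

dm.set('b', None)
try:
    dm.tpc_vote(None)
    vote_failed = False
except ValueError:
    vote_failed = True
    dm.tpc_abort(None)
print(vote_failed)
```

The invalid value is rejected during tpc_vote, which is the last point where
the transaction manager still calls a cleanup method.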


Transaction Abort
-----------------

Also note that the transaction can be aborted as a whole. This calls abort
on every data manager without tpc_begin etc. During this abort, tpc_abort
does not get called.

This makes things a little bit complex because sometimes abort gets called,
sometimes tpc_abort, and sometimes both of them.
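
A simple way to deal with this is to route both methods to one cleanup which
is safe to call more than once. This is only a sketch; ``CleanupOnceDataManager``
and its attributes are hypothetical names:

```python
class CleanupOnceDataManager(object):
    """Hypothetical data manager with a cleanup which may run repeatedly."""

    def __init__(self):
        self.changes = ['item']   # stands for uncommitted modifications
        self.cleanups = 0         # counts how often real cleanup work ran

    def _cleanup(self):
        if not self.changes:
            # nothing left to do, a previous call already cleaned up
            return
        del self.changes[:]
        self.cleanups += 1

    def abort(self, transaction):
        self._cleanup()

    def tpc_abort(self, transaction):
        self._cleanup()


dm = CleanupOnceDataManager()
dm.abort(None)       # e.g. called because the data manager has voted
dm.tpc_abort(None)   # called afterwards on any error
print(dm.cleanups)
```

With this pattern it does not matter whether abort, tpc_abort or both of them
get called.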


Things to know
--------------

There is a _callAfterCommitHooks method which gets called after a commit.
This method also calls rm.abort if any _after_commit hook is registered.
This seems like a bug to me. If it is not a bug, we should make sure that
we always clean up after tpc_finish, so that a later abort call has nothing
left to process.


Retry
-----

Another thing which makes transaction handling a little more complex is
the Zope retry concept. This means that if a transaction gets aborted, Zope
tries another time to process everything and commit again. By default, the
number of retries is set to 3.

This is important when it comes to caching. If we cache different objects
in a thread-local storage, we have to make sure that we clean up these caches
if Zope starts a retry. Otherwise we will probably run into problems with
stale cached data.
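
The following sketch shows the idea with a toy retry loop. It is not the Zope
publisher code. ``run_with_retry``, ``Conflict`` and the cache helpers are
hypothetical names, and how Zope counts its retries is not modelled here:

```python
import threading

_local = threading.local()


def get_cache():
    """Return the cache dict of the current thread, creating it if needed."""
    cache = getattr(_local, 'cache', None)
    if cache is None:
        cache = _local.cache = {}
    return cache


def reset_cache():
    get_cache().clear()


class Conflict(Exception):
    """Hypothetical error which triggers a retry."""


def run_with_retry(work, attempts=3):
    """Toy retry loop which resets the thread-local cache before each attempt."""
    for attempt in range(attempts):
        reset_cache()
        try:
            return work(attempt)
        except Conflict:
            if attempt == attempts - 1:
                raise


seen = []


def work(attempt):
    cache = get_cache()
    seen.append(dict(cache))   # snapshot of the cache at the start
    cache['item'] = 'loaded in attempt %d' % attempt
    if attempt == 0:
        raise Conflict()
    return cache['item']


result = run_with_retry(work)
print(result)
```

Because the cache is reset before every attempt, the second attempt does not
see the item which was cached during the failed first attempt.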


Caching
-------

Also note, if we use a cache for the data from a database because we do not
want to load them on every access, we need to make sure that we remove them
from the cache at the end of our transaction.


Conclusion
----------

How can we use this strategy with an external data storage like MongoDB
for consistent data management? Here are some basics:

- We need to keep the original data in case we need to revert committed changes

- We need to approve committed data

- We need to revert data on any kind of abort

But if we commit the data and we lose the database connection before we can
revert, we will get inconsistent data. This is true for almost any kind
of external database without a global transaction. This should happen very
rarely. And if it does, it's like a system interrupt, which most systems can't
handle anyway.

Let's assume that our database connection does not get lost during a commit
phase between calling tpc_begin and tpc_finish/tpc_abort.
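
The three basics listed above can be sketched with a small wrapper. This is
only an illustration. ``RevertableStore`` is a hypothetical name, a plain dict
stands in for MongoDB, and "approve" is interpreted here as "accept the written
data as final", which is an assumption:

```python
_MISSING = object()   # marker for keys which did not exist before


class RevertableStore(object):
    """Hypothetical wrapper which remembers original values for a revert."""

    def __init__(self, data):
        self.data = data        # dict standing in for the external database
        self.originals = {}     # original values of all written keys

    def write(self, key, value):
        # Keep the original only once per key, so a revert restores the
        # state from before the transaction.
        if key not in self.originals:
            self.originals[key] = self.data.get(key, _MISSING)
        self.data[key] = value

    def approve(self):
        # The written data is final, the originals are no longer needed.
        self.originals.clear()

    def revert(self):
        for key, original in self.originals.items():
            if original is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = original
        self.originals.clear()


data = {'a': 1}
store = RevertableStore(data)

store.write('a', 2)
store.write('b', 3)
store.revert()        # e.g. called from abort or tpc_abort
print(data)

store.write('a', 5)
store.approve()       # e.g. called from tpc_finish
store.revert()        # nothing left to revert
print(data)
```

A data manager could call ``revert`` on any kind of abort and ``approve`` once
the transaction has finished.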


Concept
-------

###############################################################################
#
#!!!ATTENTION!!!
#
#This concept has changed. Currently we use a custom transaction data manager
#which offers an enhanced concept with voteApprove and voteCommit methods.
#We also use only one single transaction data manager for all the different
#kinds of MongoItems. This means we can approve all changed mongo items before
#we start to commit the first one.
#
# 
#  - outdated
# |
# v
#
#Let's show a good concept for adding/updating data in an external data
#storage based on the given transaction handling. In our sample we use
#2 different data managers. The first data manager does something else
#and the second one stores the data in an external database. We ensure that our
#database transaction manager is the last one by using sys.maxint in the getKey
#method.
#
#- dm1.tpc_begin
#- dm2.tpc_begin
#  - get the items marked as not independent
#  - move added, removed and modified items to commit caches
#
#- dm1.commit
#- dm2.commit
#
#- dm1.tpc_vote
#- dm2.tpc_vote
#  - compare not independent items by their version number
#    - raise error if data are out of sync
#  - commit added, removed and modified data
#    - raise error if something failed
#
#- dm1.tpc_finish
#- dm2.tpc_finish
#  - reset transaction cache
#
#As you can see, the database transaction manager stores the data after all
#other commits have voted. This is the safest time to commit to the database
#and if something happens, we can still abort the full transaction.
#
#Now show how we handle errors in the steps of our commit process. Let's show
#what happens in combination with different other data managers.
#
#
#tpc_begin fails
#~~~~~~~~~~~~~~~
#
#The first data manager fails:
#
#- dm1.tpc_begin
#  -> error
#
#- dm1.tpc_abort
#- dm2.tpc_abort
#  - reset transaction cache
#
#- raise sys.exc_info()
#
#The second data manager fails:
#
#- dm1.tpc_begin
#- dm2.tpc_begin
#  -> error
#
#- dm1.tpc_abort
#- dm2.tpc_abort
#  - reset transaction cache
#
#- raise sys.exc_info()
#
#
#commit fails
#~~~~~~~~~~~~
#
#The first data manager fails:
#
#- dm1.tpc_begin
#- dm2.tpc_begin
#  - get the items marked as not independent
#  - move added, removed and modified items to commit caches
#  - make ready for retry, reset load caches
#
#- dm1.commit
#  -> error
#
#- dm1.tpc_abort
#- dm2.tpc_abort
#  - reset transaction cache
#
#- raise sys.exc_info()
#
#The second data manager fails:
#
#- dm1.tpc_begin
#- dm2.tpc_begin
#  - get the items marked as not independent
#  - move added, removed and modified items to commit caches
#  - make ready for retry, reset load caches
#
#- dm1.commit
#- dm2.commit
#  -> error
#
#- dm1.tpc_abort
#- dm2.tpc_abort
#  - reset transaction cache
#
#- raise sys.exc_info()
#
#
#tpc_vote fails
#~~~~~~~~~~~~~~
#
#The first data manager fails:
#
#- dm1.tpc_begin
#- dm2.tpc_begin
#  - get the items marked as not independent
#  - move added, removed and modified items to commit caches
#  - make ready for retry, reset load caches
#
#- dm1.commit
#- dm2.commit
#
#- dm1.tpc_vote
#  -> error
#
#- dm1.tpc_abort
#- dm2.tpc_abort
#  - reset transaction cache
#
#- raise sys.exc_info()
#
#The second data manager fails:
#
#- dm1.tpc_begin
#- dm2.tpc_begin
#  - get the items marked as not independent
#  - move added, removed and modified items to commit caches
#  - make ready for retry, reset load caches
#
#- dm1.commit
#- dm2.commit
#
#- dm1.tpc_vote
#- dm2.tpc_vote
#  -> error, this means a MongoDB error happened and will probably leave
#     inconsistent data behind. But this only happens if something is wrong
#     with our data model or with the connection. Also note that the database
#     connection in tpc_begin was fine.
#
#- dm1.abort
#
#- dm1.tpc_abort
#- dm2.tpc_abort
#  - reset transaction cache
#
#- raise sys.exc_info()
#
#
#tpc_finish fails
#~~~~~~~~~~~~~~~~
#
#As you know tpc_finish should never fail. See comments above.
#
#
#Usage
#-----
#
#In the above concept there is an important use case which is not supported by
#default. Let me explain: if you read data from a database and then delete or
#add an item in a database storage, you need to send a response back to the
#client which will, if needed, read the data again. This is needed because
#the add and remove actions only get committed to the database when the
#transaction gets finished. Your code cannot use the modifications before
#the transaction gets committed.
#
#But there's a solution. You can always commit the transaction if you need
#up-to-date data. But take care. I recommend splitting your code into 2 parts:
#
#1. start request
#2. read the data
#3. manipulate them e.g. modify, add or remove items
#4. commit the transaction
#5. only do read operations after commit e.g. read updated data
#6. return response
#
#Note, you should really split your code into the described steps and not commit
#more than once. If you commit more than once, the first commit cannot be
#reverted and your data will be left in an inconsistent state if a second
#commit fails.
#
#It's very safe to use the pattern described above because the second part after
#calling transaction commit will not have anything to commit as long as you
#don't manipulate data after the first commit.
#
# ^
# |
#  - outdated
##############################################################################
