From: Ann Harrison Subject: Re: Interbase - what is it doing? Let me also take a crack at this, since I may be the only person with more experience trying to explain it than Jim (Starkey - my previous & current boss/mentor/(he says "say husband") etc.). First, for Novice InterBasians - when I say _transaction_, I mean a set of actions against the database, ending with a Commit, Rollback, Prepare/Commit (two phase commit), disconnection from the database. A single action, like inserting, updating, or deleting a record is a _statement_. Many tools provide automatic transaction support, so you may not be aware of the number of transactions created on your behalf. Any tool that performs a commit per statement is not your friend if you're loading a database. Here's the hard core stuff. Explanations of sweeping tend to be unsatisfactory because the subject is complicated, and depends on understanding several other complicated ideas. Disclaimer: This description applies to the state of the world in V3.x, with extrapolation to V4.x specifically noted. I have no current connection with InterBase or Borland. Lets by defining transaction state, garbage, garbage collection, and Oldest Interesting, Oldest Active, and sweeping: ___________________ TRANSACTION STATES ___________________ Transactions have four states: active, committed, limbo, and rolled back. Taking these cases in order from the least complex to the most: Limbo: A transaction which started a two-phase commit by calling the Prepare routine. The transaction may be be alive or not. At any point, the transaction may re-appear and ask to commit or rollback. Changes it made can neither be trusted nor ignored, and certainly can not be removed from the database. Committed: A transaction is which completed its activity successfully. Either A) It called the Commit routine which completed successfully, or B) It called the Rollback routine but made no changes to the database. This transaction is finished and will never be heard from again, and its changes are now officially part of the database. Rolledback: A transaction which either: A) called the rollback routine and requested that its changes be removed from the database, or B) was marked as Active, but discovered to be dead by another transaction which marked it as rolled back. In either case, changes made by this transaction must be ignored and should be removed from the database. Active: A transaction which: A) Hasn't started. B) Has started and hasn't finished. C) Has started and ended without calling any termination routine. (e.g. crashed, lost communication, etc.) --------------------------------------- How do transactions know about each others' state? ----------------------------------------------------------------- The state of every transaction is kept on a Transaction Inventory Page (TIP). The single change made to the database when a transaction commits is to change the state of the transaction from Active to Committed. When a transaction calls the rollback routine, it checks its Update Flag - if the flag is not set, meaning that no updates have been made, it calls Commit instead. So, rolling back read-only transactions doesn't mess up the database. -------------------- How can a transaction go from Active to Rolled Back if it exits abnormally? ----------------------- This can happend in one of two ways. 1) When a transaction starts, it takes out a lock on its own transaction id. If a transaction (B) attempts to update or delete a record and finds that some version of the record was created by a transaction (A) whose TIP state is ACTIVE, transaction B tries to get a conflicting lock on A's transaction id. If the lock is gone, then B decides that A died and changes A's TIP state from active to Rolled Back. 2) When a transaction starts, it checks to see if it can get an exclusive lock on the database - meaning that no other transactions are active. Every active transaction has a shared lock on the database. If it gets an exclusive lock, it converts all Active TIP entries to Rolled Back. SUMMARY To reiterate, a transaction is Active (meaning that it appears to be alive), Limbo (meaning that its outcome can not be determined), Committed (meaning that it completed successfully) or Rolled Back (meaning acknowledged its faults and left the field in disgrace). _______________ GARBAGE _______________ InterBase is a multi-generational database. When a record is updated, a copy of the new values is placed in the database, but the old values remain (usually as a bytewise difference from the new value). The old value is called a "Back Version". The back version is the rollback log - if the transaction that updated the record rolls back, the old version is right there, ready to resume its old place. The back version is also the shadow that provides repeatable reads for long running transactions. When the transaction that updated the record commits, and all concurrent transactions finish, the back version is unnecessary. In a database in which records are updated significantly and regularly, back versions could eventually take up enough disk space that they would reduce the performance of the database. Thus they are GARBAGE, and should be cleaned out. _________________ GARBAGE COLLECTION __________________ Garbage collection prevents an update-intensive database from filling up with unnecessary back versions of records. It also removes record versions created by transactions that rolled back. Every transaction participates in garbage collection - every transaction, including read-only transactions. When a transaction reads a record, at top level it gets a record - looks like any record from any database. Two levels lower, somewhere in the server, InterBase pulls a string of record versions off the disk. Each version is tagged with the transaction id of the transaction that created it. The first one is the most recently stored. At this point, the server has two goals: 1) produce an appropriate version of the record for the current transaction 2) remove any versions that are garbage - either because they were created by a transaction that rolled back or because they are too old for anybody to care about. _____________________ Extra Credit Aside _____________________ There is a third kind of garbage collection which happens at the same time. InterBase also uses a "multi-generational" delete. When transaction deletes a record, does the record go away right then? No, of course not. The deletion could be rolled back. So instead of removing the record, InterBase sticks in a Delete marker, and keeps the old version. Sooner or later the deletion commits and matures. Then the whole thing, deletion marker and all record versions are Garbage and get ... (right you are!) garbage collected. ________________________ GARBAGE COLLECTION - resumes ________________________ Garbage collection is co-operative, meaning that all transactions participate in it, rather than a dedicated garbage team. Old versions, deleted records, and rolled back updates are removed when a transaction attempts to read the record. In a database where all records are continually active, or where exhaustive retrievals (i.e. non-indexed access) are done regularly on all tables, co-operative garbage collection works well, as long as the transaction mask stays current. For databases in which all access is indexed, running a periodic backup will have the secondary effect of forcing garbage collection since the backup programs performs exhaustive retrievals on all tables. ___________________________ OLDEST INTERESTING TRANSACTION _____________________________ In order to recognize which record versions can garbage collected, and which updates are rolled back and can be ignored, every transaction includes a "transaction mask" which records the states of all "interesting" transactions. A transaction is "interesting" to another transaction if it is concurrent - meaning that its updates are not committed, or if it rolled back - meaning that its updates should be discarded, or if it's in limbo. The "transaction mask" is a snapshot of the states of all transactions from the oldest interesting, to the current. The snapshot is made when the transaction starts and is never updated. The size of the snapshot depends on the number of transactions that have started since the oldest interesting transaction. __________________________________ OLDEST ACTIVE TRANSACTION __________________________________ This one sounds easy - but it's not. The oldest active transaction is not the oldest transaction currently running. Nor is it the oldest transaction marked Active in the TIP. (Alas). It is the oldest transaction that was active when the oldest transaction currently active started. The bookkeeping on this is hairy, and I frankly don't remember how it was done, but that's the rule, and it does work. Any record version behind a committed version created by a transaction older than the oldest transaction active when the oldest transaction currently active started is garbage and will never be needed ever again. That's pretty dense. Lets ignore the commit/rollback question briefly. Simple case. I'm transaction 20. I find a record created and committed by transaction 15. I modify it and commit. You are transaction 25, and when you start, you are the only transaction active. You read the same record, recognize that all active transactions can use the version of the record created by me, so you garbage collect the original version. In this case, your threshold for garbage collection (aka Oldest Active) is yourself. Harder case: You continue puttering around, modifying this and that. Another transaction, say 27 starts. You are its oldest active. It too can modify this and that, as long as it doesn't modify anything you modified. It commits. I start a transaction 30. You are also my oldest active transaction, and I can't garbage collect any record version unless the newer version is older than you. I run into a record originally created by transaction 15, modified by transaction 20, then modified again by 27. All three of those transactions are committed, but I can garbage collect only the original version, created by transaction 15. Although the version created by transaction 27 is old enough for me, it is not old enough for you, and being cooperative, I have to consider your needs too. Hardest case: I'm transaction 87, and when I started, all transactions before 75 had committed, and everybody from 75 on was active. Transaction 77 modifies a record, created originally by transaction 56. I continue to read the 56 version. All is well. Transaction 77 commits. You are transaction 95. When you start, I, number 87, am the oldest active. You read the record created by 56 and modified by 77. You can't garbage collect anything in that record because I can't read records created by any transaction newer than 74. Maybe you know now why descriptions of the oldest active tend to be a little peculiar. ____________________________________ SWEEPING ____________________________________ Sweeping is NOT just organized garbage collection. What sweeping seeks to do is to move the Oldest Interesting Transaction up, and reduce the size of transaction masks. It does so by rolled back transactions to committed transactions. "What!!!", you say. "The woman is nuts." But that's what a sweep does. It removes all the changes made by a rolled back transaction then changes it state to committed. (Remember we agreed earlier that a read-only transaction that rolled back could be considered committed for all the harm it did. Remove the damage, and its safe to consider the transaction committed.) At the same time, it garbages collects like any other transaction. Prior to version 4.2, the unlucky transaction that triggered the sweep gets to do the work. Other concurrent transactions continue, largely unaffected. In version 4.2 and later, a new thread is started and sweeps the database while everybody else goes about life as normal. ____________________________ ASIDE on LIMBO Transactions ____________________________ A transaction in limbo can not be resolved by a sweep, will continue to trigger sweeps, and will block attempts to update or delete record versions it created. However, InterBase gives good diagnositics when it encouters a record in that state, and no tool is likely to generate incomplete two-phase commits on a random basis. _______________________________________________ SOME EXAMPLES _______________________________________________ The unfortunate case that started this was an attempt to insert 1,000,000 record, one transaction, one commit per record. The process slowed to a crawl, which was blamed on sweeps. Sweeping may be the problem, but I doubt it. Case 1. Single stream of non-concurrent transactions. Transaction 1 inserts record 1, and commits. Transaction 2 starts and is both oldest active and oldest interesting. It inserts record 2 and commits. Transaction 3 starts, is oldest active and oldest interesting, inserts its record and commits. Eventually, transaction 1,000,000 starts and it too is both oldest interesting and oldest active. No sweeps. Case 2. Lurker in the background. Transaction 1 starts, looks around, and goes off for a smoke. Transaction 2 starts, notices that 1 is oldest interesting and oldest active, inserts record 1 and commits. Transaction 3 starts, notices that 1 is still OI and OA, inserts record 2 and commits. Eventually transaction 1,000,001 starts, notices that 1 is still OI and OA so the difference between the two is still 0, stores, and commits. No sweeps again. Case 3. Suicidal lurker. Transaction 1 starts, does something, goes out for a smoke. Transaction 2 starts, notices that 1 is oldest interesting and oldest active, inserts record 1 and commits. Transaction 3 starts, notices that 1 is still OI and OA, inserts record 2 and commits. Eventually transaction 1 succumbs to smoke inhalation and dies quietly in his corner. Transaction 15,034 (by luck) starts, gets an exclusive lock on the database, and sets Transaction 1's state to Rolled Back. Now the oldest interesting is still 1, but the oldest active is 15,034. The difference is 15,033, so no sweep yet. 4,967 transactions later the sweep occurs. Depending on the version of InterBase, transaction 20,001 may actually be charged with the time spent sweeping. Versions since 4.1 start a new thread. Once the sweep is done, the OI and OA march up together, hand in hand, and there is no more sweeping unless another transaction goes into an interesting and non-active state. Case 4. Suicidal Twin If for every record stored, the tool started one transaction which stored the record then rolled back, followed by a second transaction which stored the record and committed, then the difference between the OA and the OI would go up one for each record successfully stored. (Transaction 1 becomes OI when it rolls back. Transaction 2 is OA when it starts and the difference is 1. Transaction 3 rolls back, but is not OI because Transaction 1 is still Oer. Transaction 4 is OA and sees a difference of 3 between it and Transaction 1, and so on until transaction 20,001 which sweeps, and brings the OA and OI together at 20,001. Unfortunately its only storing record 10,001 since half the attempts to store are failing. In this EXTREMELY UNLIKELY case, storing 1,000,000 records would cause 100 sweeps. However, it would require an UNUSUALLY bad programmer to create anything that AMAZINGLY inefficient. Grounds for a career change. ___________ SUMMARY ___________ Beats me why the load was so slow, although the commit per update does a lot more writing than just updating. That and forced write might explain a lot. Maybe a really fragmented disk?