LOCK AVOIDANCE.

In this write up I have tried to give a very basic overview of DB2 Locking mechanism and Lock avoidance.



Maintaining  data concurrency is one of the most important aspects of DB2. DB2 uses locking to ensure that data can be shared by multiple applications without compromising on data integrity.

To achieve locking DB2 uses IBM's IRLM.  The disadvantage of using IRLM is that it is a very cumbersome and time consuming process. I have tried to explain in a simplistic way, how locking requests are processed on behalf of DB2. DBM1 places a cross system call to IRLM through MVS. IRLM Processes this request and hands the result back to MVS and MVS hands that result back to DBM1.






To avoid doing a costly call to the IRLM, DB2 has some innovative methods to determine if the data page or the row it is trying to access is clean without going to the IRLM..

If the page has consistent data, then it need not go to the IRLM and ask for a costly lock. This  process is called  lock avoidance.

 Generally you enable lock avoidance by the BIND parameter CURRENTDATA(NO) and ISOLATION(CS).

There are two tests that are performed by DB2 to determine if it can avoid taking a lock.
1. CLSN test
2. PUNC test

CLSN TEST.

Before we go into the CLSN test we need to understand  a few logging concepts.

 RBA or the Relative Byte Address is the DB2 equivalent of  the human calendar/clock. How ever, unlike our clock that keeps ticking continuously, the RBA ticks only if there are log records that need to be written. You can think of RBA as a simple counter that starts at 0 and is incremented by 1 when ever there is a log record that needs to be written.

You can also think of RBA as an ever increasing counter that keeps track of what time (in DB2 terms) each change took place (each log record was written).

Page Log RBA is  recorded in the page header of every page. Page Log RBA is the last RBA at which any changes to that particular page were committed. Let us assume that we have a table space TS1 (with DB2 Internal identifiers DBID=X, PSID=Y and OBID=Z)and an application program  DML(Insert, update, Delete) wants to update a row in page 100. Let us further assume that DB2 Reads the Page at time T1(say a log RBA value of 000000123000), and does some program logic that takes 5 seconds. Next it updates the actual row in that page at T1+5 seconds (say a log RBA value of  0000001230A3) and it does more program logic. Finally it comes back  after an other 12 seconds and issues an implicit or an explicit commit at time T1+5+12 seconds (say a log RBA value of 000000123456A). At this point, DB2 updates a register in the page header of  Page 100 with the value 000000123456A. This is the Page Log RBA of Page 100.

CLSN or Commit Log Sequence Number for a particular object is the Start RBA of the  oldest, still active but not yet committed Unit of Recovery for that object.

You can think of this as a simple table that is held in DB2's memory which has two columns. Column 1 identifies each object for which there is an active and uncommitted Unit of Recovery and Column 2 identifies the START  RBA of  the oldest, uncommitted Unit of Recovery.

Let us assume that there are two programs A and B that are updating a table space TS2 (with DB2 Internal identifiers DBID=X, PSID=Y and OBID=Z) .
Program A wants to update Page number 325 and Program B wants to update Page 129. Program A  does an update on Page 325 at RBA 123450 and goes to sleep without issuing COMMIT, Program B, on the other hand, updates page 129 at RBA 123455, then goes on to update Page 871 at RBA 12345E and issues a COMMIT.  The CLSN for the page set X,Y,Z is still 123450 because it is the oldest, still active Unit of Recovery.


Now for the CLSN test. When DB2 wants to acquire a lock on a page it compares the Page Log RBA of that page with the CLSN  for that object. If Page Log RBA is lesser than the CLSN then it means that all the committed changes to this page have been written to this page and hence DB2 does not need a lock.





In the figure above if DB2 wants to take a lock on  DBID X, PSID Y, OBID Z Page 100  it cannot use lock avoidance because the Page Log RBA of Page 100 (12346A) is greater than the CLSN for Object X,Y,Z (123450). This is because there is a chance that there might be some changes made to this object by the Unit of Recovery that started at 123450 that have still not been written to page 100. This page might have inconsistent data. So, for Page 100 the CLSN test fails.

On the other hand if DB2 wants a lock on DBID U, PSID V,  OBID W Page 2127  it can use lock avoidance as the Page Log RBA (12345B) is lesser than the CLSN for the object U,V,W (123FA0). So, DB2 can be sure that all the changes made to this page have already been committed and there is no chance of any other changes in the pipe that might be written to this page. For Page 2127 the CLSN test is successful. Same is true for Page 45.

When CLSN test succeeds, DB2 uses lock avoidance. When CLSN test fails, DB2 does an other test called PUNC test. The PUNC test operates at the row level.


PUNC TEST.

PUNC bit stands for Possibly Un Committed bit.

Every Data Row in the data page has a row header. One of the bits in the row header is the PUNC bit. This PUNC bit is set every time an update activity is done to that row.

It is important to understand that there is a chance that a data row that has been committed can still have it's PUNC bit set. This is because the PUNC bits are not reset as soon as the changes are committed.

For a given Data page, the PUNC  bit is reset (set to 0)  if

1. The Unit of Recovery that is doing the update to the page is running with Page or Row locking.
2. If the Page Log RBA is Less than the CLSN for the Page set.
3. If at least 25 % of the rows in the page have their PUNC bits set.

For a given Row, the PUNC bit is reset just for the row being read (Assuming the CLSN test succeeded for that page).




  When the CLSN test fails, DB2 does the PUNC test. It takes a look at the row header to see if the PUNC bit is 0. If it is not 0 the PUNC test also fails and DB2 has to do a costly cross system call to the IRLM.