Buffering
Part 2.
One of the most important factors affecting
database
performance is buffer pool performance. The way in which buffer
pools
work need to be understood to leverage and tune buffers
This is the second article in this series and
deals with buffer chain
management and Page Steal Algorithms. You need to read the previous
article before reading this one.
To ensure clarity and of this article I am
considering just Random Reads. It needs to be remembered that similar
but separate chains are maintained for sequential reads and empty pages.
The foremost objective of having a bufferpool is
to be able to reuse
the data
that has been read once when DB2 needs it the next time, without
actually reading it again from disk. When this happens it is called a
bufferpool Hit.
DB2 buffering is based on two common sense
assumptions. (a) Buffer pool is a limited and costly resource.
Therefore all data cannot be buffered. (b) When a data page is accessed
once the
chances of the same page being accessed again is high.
To understand chain management let us assume an
initial state as shown in the figure below.

Let us see what happens when the Data Manager
requests the Buffer manager to get Page A. But, it is not found in the
bufferpool and there are free pages in the pool.
The Buffer Manager checks to see if the Page A is
in the chain. The page is not there and it is a bufferpool miss.
The Buffer Manager reads Page A from disk and
places it in an empty buffer page.
The Buffer Manager updates the pointer in the hash
table to point to the Most recently used end of the buffer chain
(Page A) from the previous value (Page B).
The Buffer Manager also updates the buffer page headers of
Page A and Page B so that Page A becomes the Most recently used
page in the chain and Page B becomes the second most recently used page
in the chain.
The Most Recently used order at this point is Page
A (hottest), Page B, Page C, Page D and Page E (Coldest).

Now let us see what happens when the Data manager requests Page E .
Before this request for Page E, it was the Least Recently Used
page(coldest). Therefore it was at the end of the chain. But this
request for Page E has made it the hottest page now.
The Buffer Manager returns the location of Page E to the Data Manager..
The Buffer Manager updates the pointer in the hash
table to point to the Most recently used end of the buffer chain
(Page E) from the previous value (Page A).
The Buffer Manager also updates the buffer page headers of
Page E
and Page A so that Page E becomes the Most recently used page in the
chain and Page D becomes the least recently used page in the
chain.
The Most recently used order at this point becomes
Page E (Hottest), Page A, Page B, Page C and Page D (Coldest).

Next, let us see what happens when data manager
requests a Page X that is not there in the bufferpool and there are no
empty pages in the chain.
The Buffer Manager checks to see if the page is in the chain. The page
is not there and it is a bufferpool miss.
The Buffer Manager checks to see which is the coldest page in the
chain. (Page D) (assuming it has a committed page)
The Buffer Manager reads Page X from disk and
places it where Page D used to be.
The Buffer Manager updates the pointer in the hash
table to point to the Most recently used end of the buffer chain
(Page X) from the previous value (Page E).
The Buffer Manager also updates the buffer page headers of
Page X
and Page E so that Page X becomes the Most recently used page in the
chain and Page C becomes the least recently used page in the
chain.

It must be apparent by now that just a move of a page to the Most
Recently used (Hot) end of the chain, which happens during every read
causes
(1) Maintenance of the hash table.
(2) And it's chained buffer page headers.
Even though this is just pure CPU path and is very quick, the
number of reads happening in a production system is a few thousand get
pages per second and it all adds up.
The path is further lengthened if the page is not in the buffer or if
the page is not in the buffer and there are no empty pages or if the
page is not in the buffer and there are no stealable pages.
There is considerable performance improvement if we avoid this costly
LRU chain maintenance by using a simple First in First Out algorithm.
It might be a good idea to consider using FIFO strategy instead of the
default LRU strategy in the following two cases.
(1) For Sort Work Bufferpool associated with DSNDB07. The reason being
, there is no concept of reuse of sorted data. The data is sorted to
satisfy an ORDER BY that was issued by a specific application. There is
no case of an other application reading the sorted data of the first
application.
(2) When you plan to buffer an entire tablespace or an index. It is a
common practice to buffer entire high read tables and indexes that are
relatively small and are read only. An example of one such table would
be a Pricing menu look up table that is seldom changed but is accessed
millions of times a day by every customer. If you already know that
there is enough space in the dedicated buffer pool to hold all the
pages of this tablespace, then there is no point in maintaining the LRU
chain. This is because no pages are going to be aged and casted out to
make room for new pages. There is enough space for all the possible
pages that can be requested from this tablespace with out having to
cast out any existing page. Using FIFO instead of LRU can
cause significant performance gains in this case.