Buffering Part 2.
  
One of the most important factors affecting database performance is buffer pool performance. The way buffer pools work needs to be understood before you can leverage and tune them.

This is the second article in the series and deals with buffer chain management and page steal algorithms. You should read the previous article before reading this one.

To keep this article clear, I am considering only random reads. Keep in mind that similar but separate chains are maintained for sequential reads and for empty pages.

The foremost objective of a buffer pool is to reuse data that has already been read the next time DB2 needs it, without actually reading it again from disk. When this happens, it is called a buffer pool hit.

DB2 buffering is based on two common-sense assumptions: (a) the buffer pool is a limited and costly resource, so not all data can be buffered; and (b) when a data page is accessed once, the chances of the same page being accessed again are high.

To understand chain management, let us assume an initial state as shown in the figure below.


Let us see what happens when the Data Manager requests Page A from the Buffer Manager, the page is not found in the buffer pool, and there are free pages in the pool.

(1) The Buffer Manager checks to see if Page A is in the chain. It is not there: a buffer pool miss.
(2) The Buffer Manager reads Page A from disk and places it in an empty buffer page.
(3) The Buffer Manager updates the pointer in the hash table to point to the new most recently used (MRU) end of the buffer chain (Page A) instead of the previous value (Page B).
(4) The Buffer Manager also updates the buffer page headers of Page A and Page B, so that Page A becomes the most recently used page in the chain and Page B the second most recently used.

The most recently used order at this point is Page A (hottest), Page B, Page C, Page D, and Page E (coldest).
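A rough sketch of this bookkeeping, modelled in Python with an OrderedDict, which is itself a hash table whose entries sit on a doubly linked list. The five-page pool and the page names come from the figure; this is a toy model, not DB2's actual data structures.

    from collections import OrderedDict

    # The chain: first entry = least recently used (coldest),
    # last entry = most recently used (hottest).
    chain = OrderedDict()
    for page in ("E", "D", "C", "B"):     # initial state from the figure
        chain[page] = "data of " + page
    free_pages = 1                        # one empty buffer page remains

    def getpage_miss_with_free_page(page):
        """Page not in the chain, but an empty buffer page is available."""
        global free_pages
        assert page not in chain and free_pages > 0
        free_pages -= 1
        chain[page] = "data of " + page   # stands in for the disk read; the new
        return chain[page]                # hash entry is linked at the MRU end

    getpage_miss_with_free_page("A")
    print(list(chain))                    # ['E', 'D', 'C', 'B', 'A'] -- A is now hottest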


 

Now let us see what happens when the Data Manager requests Page E, which is already in the buffer pool. Before this request, Page E was the least recently used (coldest) page, so it sat at the end of the chain. This request now makes it the hottest page.


(1) The Buffer Manager checks to see if Page E is in the chain. It is there: a buffer pool hit.
(2) The Buffer Manager returns the location of Page E to the Data Manager.
(3) The Buffer Manager updates the pointer in the hash table to point to the new most recently used end of the buffer chain (Page E) instead of the previous value (Page A).
(4) The Buffer Manager also updates the buffer page headers of Page E, Page A, and Page D, so that Page E becomes the most recently used page in the chain and Page D becomes the least recently used page.

The most recently used order at this point becomes Page E (hottest), Page A, Page B, Page C, and Page D (coldest).
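The hit path in the same toy model is a hash lookup followed by a relink to the MRU end:

    from collections import OrderedDict

    # Coldest first, hottest last -- the state after Page A was read in.
    chain = OrderedDict((p, "data of " + p) for p in ("E", "D", "C", "B", "A"))

    def getpage_hit(page):
        """Page found in the chain: return its buffer and make it the MRU page."""
        data = chain[page]           # hash-table lookup -- a buffer pool hit
        chain.move_to_end(page)      # relink the page headers: page becomes MRU
        return data

    getpage_hit("E")
    print(list(chain))               # ['D', 'C', 'B', 'A', 'E'] -- E hottest, D coldest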


Next, let us see what happens when the Data Manager requests Page X, which is not in the buffer pool, and there are no empty pages in the chain.

(1) The Buffer Manager checks to see if Page X is in the chain. It is not there: a buffer pool miss.
(2) The Buffer Manager identifies the coldest page in the chain (Page D), assuming its contents are committed and it is therefore stealable.
(3) The Buffer Manager reads Page X from disk and places it in the buffer page that Page D used to occupy.
(4) The Buffer Manager updates the pointer in the hash table to point to the new most recently used end of the buffer chain (Page X) instead of the previous value (Page E).
(5) The Buffer Manager also updates the buffer page headers of Page X, Page E, and Page C, so that Page X becomes the most recently used page in the chain and Page C becomes the least recently used page.
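The steal path in the same toy model, assuming the coldest page holds committed data and can simply be discarded:

    from collections import OrderedDict

    # Coldest first, hottest last -- the state after Page E became hottest.
    chain = OrderedDict((p, "data of " + p) for p in ("D", "C", "B", "A", "E"))

    def getpage_miss_pool_full(page):
        """Page not in the chain and no empty buffer pages: steal the LRU page."""
        assert page not in chain
        stolen, _ = chain.popitem(last=False)   # cast out the coldest page (Page D)
        chain[page] = "data of " + page         # disk read lands in the stolen slot,
        return stolen                           # linked at the MRU end

    getpage_miss_pool_full("X")
    print(list(chain))                          # ['C', 'B', 'A', 'E', 'X']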



It should be apparent by now that merely moving a page to the most recently used (hot) end of the chain, which happens on every read, requires
 (1) maintenance of the hash table, and
 (2) maintenance of its chained buffer page headers.


Even though this is a pure CPU path and is very quick, a production system performs a few thousand getpages per second, and it all adds up.
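To put a hypothetical number on it: at 5,000 getpages per second, even one extra microsecond of chain maintenance per getpage amounts to 5 milliseconds of CPU every second, around the clock.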

The path is further lengthened if the page is not in the buffer pool; longer still if, in addition, there are no empty pages; and longest when there are no stealable pages either.

Considerable performance improvement is possible if we avoid this costly LRU chain maintenance by using a simple first-in, first-out (FIFO) algorithm.
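A sketch of the FIFO alternative in the same toy model: a hit is now just the hash lookup, and the chain is touched only when a page is read in or stolen, so the common path is shorter.

    from collections import OrderedDict

    POOL_SIZE = 5
    chain = OrderedDict((p, "data of " + p) for p in ("E", "D", "C", "B", "A"))

    def fifo_getpage(page):
        """FIFO page steal: no relinking on a hit."""
        if page in chain:
            return chain[page]              # hit: hash lookup only, no chain work
        if len(chain) >= POOL_SIZE:
            chain.popitem(last=False)       # steal the oldest-read page, which is
                                            # not necessarily the least recently used
        chain[page] = "data of " + page     # newly read page joins the end
        return chain[page]

    fifo_getpage("C")                       # a hit leaves the chain untouched
    print(list(chain))                      # ['E', 'D', 'C', 'B', 'A']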

It might be a good idea to consider using the FIFO strategy instead of the default LRU strategy in the following two cases.

(1) For the sort work buffer pool associated with DSNDB07. The reason is that there is no concept of reusing sorted data: the data is sorted to satisfy an ORDER BY issued by a specific application, and no other application ever reads the sorted data of the first application.

(2) When you plan to buffer an entire tablespace or index. It is a common practice to buffer entire high-read tables and indexes that are relatively small and read-only. An example of one such table is a pricing menu lookup table that is seldom changed but is accessed millions of times a day by every customer. If you already know that there is enough space in the dedicated buffer pool to hold all the pages of the tablespace, there is no point in maintaining the LRU chain: no pages are going to be aged and cast out to make room for new pages, because there is room for every page that can be requested without casting out any existing page. Using FIFO instead of LRU can yield significant performance gains in this case.
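As an aside, in DB2 for z/OS the page steal algorithm is chosen per buffer pool with the ALTER BUFFERPOOL command, for example -ALTER BUFFERPOOL(BP7) PGSTEAL(FIFO), where BP7 is just an illustrative pool name.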