Article-Space Search Algorithm Part 1

PHYSICAL DATA ORGANIZATION FUNDAMENTALS. SPACE SEARCH ALGORITHM PART-1

When a SQL statement inserts a new row into a table how does DB2 determine the page into which the row will be accommodated? Why do we need to know about how DB2 searches for free space? How can we find out how effective the space searches have been for a given table space?

This article is a primer before we move on to the space search algorithms. A clear understanding of clustering is needed before proceeding to space search algorithms. If you are thorough with the concept of clustering, you can move on to the next article in this series.

How does DB2 determine the physical location of a new record that might be inserted?

To put it very simply.

In case of a table space DB2 :
(a) Tries to put the newly inserted row in that page of the table space where the clustering is maintained if the table has a clustering index.
(b) Use any already existing empty space before allocating new space.

In case of an index DB2:
(a) Guarantee key sequence
(b) Use any already existing empty space before allocating new space.

IMPORTANT NOTE: I am not considering MEMBER CLUSTER option that is used to reduce space map contention in data sharing environments.

About clustering, If the physical sequence of data rows in the table space pages closely follow the logical sequence of index entries (of the clustering index ) then the data in the table space is said to be highly clustered. It must be remembered that the index entries, no matter how disorganized they might be physically are always in logical order.

Let us see what I mean by this.

Consider the index shown below. It shows a simple index that is defined on PHONE_NUMBER ASCENDING. If you look at the the physical structuring of the index, the first set of entries of the index are actually stored in the last page, the next set of entries are stored in the first page and the last set of entries are stored some where in the middle. In spite of this when ever you access the index in sequence, you are GUARANTEED to access the LOGICAL PAGE 1 first, immaterial of whether it is the 1st physical page or the 100 th physical page. This is made possible by pointers that are maintained by DB2's B-Tree structure that always points you to the correct page and pointers between index pages.

Unlike an index , table spaces do not maintain pointers that maintain logical order. So if DB2 stores row in a random order in the table space pages and you access the table space without the index, you will get the data in the same random order. If the cluster ratio of the table space is 100 % then you will get the data in the sequential logical order even without using the index.

Consider a perfectly clustered table space as shown above. The Physical sequence of the records are in perfect match with the logical sequence of the index.
Now assume that a new record with a telephone number of 210-440-1212 is being inserted into the table, in order for DB2 to maintain clustering, it needs to be inserted some where between PAGE 1 and PAGE 2 of the table space. If DB2 is unable to find free space between PAGE1 and PAGE2 it has to insert this row in an other page that is not in sequence. At this point the Cluster ratio of this table space will fall from the perfect 100 %.

On the other hand in the the index, no matter where the index entry goes physically, DB2 will update the pointers to include the new entry at the correct position in the sequence. So the concept of clustering is non-existent for indexes.

I will be dealing with space search algorithms specific to table spaces and indexes, Significance of OFFPOS and INDREF measures in my subsequent articles.