Example of Obtaining Index Use Statistics
This section assumes you have run the steps at Examples of Debugging a Slow Query.
Using the PLAYERS
table, we can look at index statistics.
Don’t forget Analyze! Remember, run: ANALYZE TABLE PLAYERS to be sure index statistics are generated.
|
# Don't forget to analyze your table
ANALYZE TABLE players;
# Note the DISTINCTCOUNT for our IDX_PLAYERS_FIRSTNAME index - not too good...
SELECT * FROM system.indexstatistics WHERE tablename = 'PLAYERS';
OBJECTID CATALOGID SCHEMA TABLENAME INDEXNAME KEYCOUNT
--------- ---------- ------- ---------- --------------------- ---------
1 3567 HOCKEY PLAYERS PLAYERS..PRIMARY_KEY 7520
23 3567 HOCKEY PLAYERS IDX_PLAYERS_FIRSTNAME 7520
SELECT * FROM system.indexprefixstatistics WHERE objectid = 23 and catalogid = 3567;
OBJECTID CATALOGID PREFIXLENGTH DISTINCTCOUNT AVERAGEKEYLENGTH SOURCE
--------- ---------- ------------- -------------- ----------------- ----------
23 3567 1 845 5.32 statistics
SELECT * FROM system.indexhistograms where objectid = 23 and catalogid = 3567;
OBJECTID CATALOGID HISTOGRAMID FIELDCOUNT MAXRESOLUTION
--------- ---------- ------------ ----------- --------------
23 3567 1 1 10
# The selectivity of our Histogram buckets leaves a lot to be desired...
SELECT * FROM system.indexhistogrambuckets WHERE objectid = 23 AND catalogid = 3567;
OBJECTID CATALOGID HISTOGRAMID BUCKETINDEX BOUNDARY
--------- ---------- ------------ ------------ ---------------
23 3567 1 0 [0]
23 3567 1 1 [Dan]
23 3567 1 2 [Jack]
23 3567 1 3 [Matt]
23 3567 1 4 [Mike]
23 3567 1 5 [Mike]
23 3567 1 6 [Mike]
23 3567 1 7 [Mike]
23 3567 1 8 [Mike]
23 3567 1 9 [Ray]
23 3567 1 10 [Ziggy]
Now we see that there are eleven buckets in the histogram for the index IDX_PLAYERS_FIRSTNAME
. The 0th bucket contains one element and that is the lowest key in the index, in this case a [0] zero length key. We can easily see that we do not have good selectivity on this index and that most buckets are filled with keys equal to “Mike
”.
In most cases, using the system default value of 10 for the number of histogram buckets that an index will use to store statistics will allow for cost estimates that are reasonable enough for the optimizer to choose the most efficient path to retrieve the result set of a query. However table size increases and data value distributions become highly skewed, having only 10 histogram buckets can hide these spikes in the distribution of data values. This can result in less than optimal cost estimates for index statistics. Therefore, the number of index histogram buckets can be increased at the time of index creation.
For example, using the above example, how would these cost estimates change if we increase the number of histogram buckets for this index.
# Create the index with an increased number of histogram buckets
DROP INDEX idx_players_firstname;
CREATE INDEX idx_players_firstname ON players(firstname) WITH (RESOLUTION 300);
#Analyze the new index to update statistics
ANALYZE INDEX idx_players_firstname;
SELECT * FROM system.indexhistogrambuckets WHERE objectid = 24 AND catalogid = 3567;
OBJECTID CATALOGID HISTOGRAMID BUCKETINDEX BOUNDARY
--------- ---------- ------------ ------------ ---------------
24 3567 1 0 [0]
24 3567 1 1 [Al]
24 3567 1 2 [Aleksey]
24 3567 1 3 [Alexander]
24 3567 1 4 [Alexsandr]
24 3567 1 5 [Andre]
...
24 3567 1 93 [Mikael]
24 3567 1 94 [Mike]
24 3567 1 95 [Mike]
...
24 3567 1 256 [Mike]
24 3567 1 257 [Mikhail]
24 3567 1 258 [Morris]
...
24 3567 1 299 [Wayne]
24 3567 1 300 [Ziggy]
EXPLAIN (opt_estimates on) SELECT firstname, lastname, playerID FROM players WHERE firstname='Mike';
Select
List
Field HOCKEY.PLAYERS.FIRSTNAME (1)
Field HOCKEY.PLAYERS.LASTNAME (1)
Field HOCKEY.PLAYERS.PLAYERID (1)
Boolean sieve
Eql
Field HOCKEY.PLAYERS.FIRSTNAME (1)
"Mike" (varchar)
Inversion HOCKEY.PLAYERS (1)
Bitmap index IDX_PLAYERS_FIRSTNAME [cost: 12182.4, selectivity: 54.000%, rows: 4060]
"Mike" (varchar)
The cost estimates now being returned are more in line with actual data values. Distributing the data across an increased number of histogram buckets decreases the number of rows in each bucket which better relects the spikes in distribution of data.
Previously, the optimizer cost estimates for retrieving rows with a FIRSTNAME
equal to “Mike
” were:
Bitmap index FIRST_IDX [cost: 9024.0, selectivity: 40.000%, rows: 3008]
The new cost estimates after increasing the number of histogram buckets for the index are now calculated as:
Bitmap index FIRST_IDX [cost: 12182.4, selectivity: 54.000%, rows: 4060]
Both the selectivity and the number of rows are much closer to the actual values. There are 4093 rows for “Mike
” out of 7520 total rows, or 54% of the total rows. However, there is a cost for both collecting and storing these statistics. This cost must outweigh the benefits gained in query performance.