Obtaining NuoDB Optimizer Estimates

This section assumes you have run the steps at Examples of Debugging a Slow Query.

EXPLAIN WITH OPT_ESTIMATES

Running the same EXPLAIN commands as at Identifying the Indexes a Query is Using, with the addition of (OPT_STATISTICS ON) results in similar output except includes NuoDB Optimizer estimates for the IDX_PLAYERS_FIRSTNAME index. Optimizer statistics include:

Statistic Meaning

cost

This represents an estimated number of operations required to execute the query. It is somewhat arbitrary and only has meaning to the user when compared with cost of other queries. Lower numbers are better.

selectivity

This is the estimated percentage of rows used in the index scan for the query. Again, lower numbers are better.

rows

This is estimated number of rows used by the index for this query.

Remember, run: ANALYZE TABLE PLAYERS first to be sure index statistics are generated.

The following panel shows examples of EXPLAIN with optimizer estimates turned on.

# Don't forget to run ANALYZE TABLE on your table
ANALYZE TABLE players;

# Run EXPLAIN with optimizer estimates turned on. Note the line:
#  Bitmap index FIRST_IDX [cost: 26.6, selectivity: 0.118%, rows: 8]
EXPLAIN (opt_estimates on) SELECT firstname, lastname, playerID FROM players
    WHERE firstname='Kurt';
Select
  List
    Field HOCKEY.PLAYERS.FIRSTNAME (1)
    Field HOCKEY.PLAYERS.LASTNAME (1)
    Field HOCKEY.PLAYERS.PLAYERID (1)
  Boolean sieve
    Eql
      Field HOCKEY.PLAYERS.FIRSTNAME (1)
      "Kurt" (varchar)
    Inversion HOCKEY.PLAYERS  (1)
      Bitmap index IDX_PLAYERS_FIRSTNAME [cost: 28.2, selectivity: 0.082%, rows: 9]
        "Kurt" (varchar)

# This time let's use the string "Mike". Remember, this PLAYERS table has an
# inordinate number of rows with FIRSTNAME equal to "Mike"``
EXPLAIN (opt_estimates on) SELECT firstname, lastname, playerID FROM players
    WHERE firstname='Mike';``
 Select
  List
    Field HOCKEY.PLAYERS.FIRSTNAME (1)
    Field HOCKEY.PLAYERS.LASTNAME (1)
    Field HOCKEY.PLAYERS.PLAYERID (1)
  Boolean sieve
    Eql
      Field HOCKEY.PLAYERS.FIRSTNAME (1)
      "Mike" (varchar)
    Inversion HOCKEY.PLAYERS  (1)
      Bitmap index IDX_PLAYERS_FIRSTNAME [cost: 9024.0, selectivity: 40.000%, rows: 3008]
        "Mike" (varchar)

So the optimizer estimates for retrieving rows with a FIRSTNAME equal to “Kurt” were:

Bitmap index IDX_PLAYERS_FIRSTNAME [cost: 26.2, selectivity: 0.118%, rows: 8]

The cost is relatively low (compared to the next query), selectivity is pretty good at less than 1% and the optimizer estimates that approximately 8 rows of the index will be scanned to retrieve the result set for the query.

The optimizer estimates for retrieving rows with a FIRSTNAME equal to “Mike” were:

Bitmap index FIRST_IDX [cost: 9024.0, selectivity: 40.000%, rows: 3008]

The cost here is huge compared to the previous query, selectivity is 40% which means approximately 40% of the players are named “Mike” and the optimizer will have to scan approximately 3008 rows of the index in order to retrieve the result set for the query.

Perhaps an index on LASTNAME, a column with data values that are much more selective, should be considered instead?

For more information on the Query Plan see EXPLAIN.