Specifying Query Access Plans in the InterBase Optimizer

                     Deej Bredenberg September 15, 1993

Requirements

Various customers have requested the ability to view the access plan being
used by InterBase in query processing, as well as modifying that plan when
they believe they can improve on it. Here are the requirements for setting
and getting the access plan in the optimizer:

  1. Provide a readable display of the access plan generated by the
     optimizer for any given query (ISQL only).
  2. Provide extended SQL syntax to specify the plan to be used in a query.
     (DSQL/ISQL and embedded SQL).
  3. Make the output of requirement 1, acceptable as input to requirement
     2, so that the user may simply modify the plan rather than generating
     it from scratch.

Limitations

If the user specifies the plan to be used in a query, we will make no
attempts to analyze the submitted plan for errors of omission. That is, the
optimizer is either on or off; it will not attempt to complete a partial
plan or make recommendations for doing so. Of course, if the query cannot
be accomplished via the submitted plan an error will be reported.

Elements of Plan Selection

To determine the options needed in setting a plan, it is important to
survey the methods used by the optimizer to generate a plan. In this
discussion, the term stream refers generically to a set of records, such as
a relation or a relation with restrictions placed on it.

Assigning Indices

InterBase uses indices for three basic purposes:

  1. Equalities and Inequalities: When a field is compared to a constant,
     and the field is indexed, the index can be used to retrieve values
     greater than, less than, or equal to the constant. Utilizing the index
     rather than a sequential scan avoids fetching records that don't meet
     the selection criteria.
  2. Matching joined streams: When two streams are joined by matching two
     fields, an index can be used to perform the join. If an index is
     available for one of the fields, the other stream is retrieved first
     and key values matched against the index.
  3. Sorting and grouping: When a query is sorted by a field or fields
     which are the keys of an index, the index is used to retrieve the
     records in order rather than performing a sort. This is called a
     "navigational" walk of the index. It is not navigational in the sense
     that the index is walked backwards, but in that no bitmap is generated
     for the stream.

[Indices are also used to walk forwards and backwards through a stream to
handle IDAPI-style navigation, but for the moment this functionality does
not surface through SQL and so will not be handled.]

Combining Indices

For each selection criterion which matches an index, InterBase generates a
bitmap of the selected records. It can combine the bitmaps by ANDing and
Oring them together according to the logic of the query. This generates a
single bitmap which describes all the selected records from a single
stream. This is advantageous when iterating through a stream multiple
times, since it is much faster to scan the bitmap than a set of indices.

Determining Join Order

In joining two or more streams together, it is often crucial to determine
which stream should be retrieved first. In general, the stream which is the
most expensive to retrieve should be retrieved first. If you think of a
join as a set of nested loops, it makes sense that the innermost loop will
be executed the most times. Similarly, the last stream in the join list
will be fetched the most times, so it had better be the cheapest to
retrieve.

Generating Rivers

When two or more streams are combined together, they form a river. The
optimizer first attempts to find the longest river which can be retrieved
via indices alone. Then it attempts to form the remaining streams into the
longest river, and so on. Once it has a set of rivers which are retrievable
via indexed access, it joins them together in order from left to right
based on the estimated cost of retrieving each river.

[This is where the process broke down in the case of Philly Stock Exchange.
Finding the longest river does not guarantee the fastest retrieval. The
customer should be given flexibility to specify the rivers used, and the
order in which the rivers will be joined together.]

Cost Estimation

The cost of retrieval of a stream depends on several variables: whether
there is an index, the selectivity of that index, whether there are
selection criteria, the cardinality of the underlying relation, and whether
the stream needs to be sorted.

The optimizer uses all these variables to determine the cost of a retrieval
of a stream. For example, when retrieving an indexed equality, the cost is
estimated as :

        cardinality of base relation * selectivity of index + overhead of index retrieval

Since rough estimation is used to determine the values of all these
variables, and the relative importance each of these variables cannot be
precisely determined for all queries, the optimizer can often select a plan
which is far from the best. The user may simply wish to determine a plan
based on empirical data for a particular query. But we should advise the
customer that the best plan may change over time as the data changes, so
specifying a plan as part of the query implies an added maintenance burden
for the SQL programmer.

Sort Merges

When a join term is specified between two streams, and no index is
available, the optimizer specifies a sort merge between the two streams.
This means that both streams will be sorted and the results merged
together. Once the streams are sorted, the engine can scan through both
streams just once to form the join, rather than iterating through the
second stream for every record in the first stream.

Set Plan Syntax

Syntax will be added to the SELECT expression in GPRE and DSQL/ISQL to
allow the user to set the plan for a query. The syntax should work for
SELECT statements used in creating a view, a stored procedure, or a
trigger. It should also work for subselects in an UPDATE or DELETE
statement. Syntax to specify a plan for UPDATE or DELETE is also an option,
though it is not planned for now.

The Select Statement

A PLAN clause will be added at the end of the SELECT expression. The syntax
should support all current optimizer options, and be extensible enough to
allow for new options. And abridged syntax for the SELECT statement would
then be:

   select_expression [ UNION select_expression ...]



                [ ORDER BY field_list ]



select_expression := SELECT [ DISTINCT | ALL] [field_list|'*']

                        FROM relation_list

                        [ WHERE search_condition ]

                        [ GROUP BY field_list ]

                        [ HAVING search condition ]

                        [ PLAN plan_expression ]

Plans can be specified independently for the query and any subqueries.
Specifying a plan on the main select or any subselect does not require a
plan to be specified on any of the other select expressions in the query.

The Plan Expression

The syntax for the plan expression is as follows :

        plan_expression := [join_type] '('plan_item_list')'

        plan_item_list := plan_item | plan_item ',' plan_item_list

        plan_item := table_specifier access_type | plan_expression

The syntax allows specification of a single relation or a join of two of
more relations in one pass. Nested parenthesis can be used to specify any
combination of joins. Reverse polish notation is used to specify the join
operator, mostly because an infix operator would confuse the parser.

        join_type := JOIN | [SORT] MERGE

The default join type is a JOIN. The keyword MERGE can be used to specify a
sort merge of the substreams. The optional SORT keyword can be used for
clarification purposes.

        table_specifier := table_name | alias_name

The table name or the alias for the table can be specified here. If a table
is used more than once in the query, aliases must be used to distinguish
them in the plan.

        access_type := [ NATURAL | INDEX index_list | ORDER index_name ]

        index_list := index_name | index_name', 'index_list

The default access is NATURAL order, which means sequentially accessing the
records in no defined order.

The INDEX keyword allows specification of one or more indices for
satisfying booleans and join terms in the query. The user must specify all
the indices to be used; if any booleans or join terms remain after all
indices are exhausted, they will be evaluated without benefit of an index.
If any indices are specified which cannot be utilized, an error will be
returned.

The ORDER keyword specifies navigational access, to effect a sort through
the use of the index. In order to cause the result of the query to be
sorted, the stream must be leftmost. If the resultant stream is not sorted
as specified by the ORDER clause, a physical sort will be performed.

Display Plan Syntax

We should provide the following command in ISQL: ISQL> SET EXPLAIN; Once
this option has been set, all select-type statements will display the plan
chosen by the optimizer for the provided query, as shown in the examples
below.

The SET EXPLAIN command will toggle on and off this feature, just as with
other ISQL options.

Examples of Plan Syntax

If SET EXPLAIN were implemented today, these examples demonstrate the plans
the optimizer would choose. Moving the semicolon from the end of the query
to the end of the plan yields a valid query with a plan specified:

  1. Simple retrieval.
     ISQL> SELECT * FROM CITIES;
     PLAN (CITIES NATURAL)

  2. Ordered retrieval, utilizing an index for ordering.
     ISQL> SELECT * FROM CITIES ORDER BY CITY;
     PLAN (CITIES ORDER CITIES_1)

  3. Simple join, resulting in a cross product.
     ISQL>SELECT * FROM CITIES C, STATES S;
     PLAN JOIN(STATE NATURAL, CITIES NATURAL)

  4. Join with indexed equality.
     ISQL>SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE;
     PLAN JOIN(STATE NATURAL, CITIES INDEX DUPE_CITY)

  5. Equi-join with index used for ordering.
     ISQL>SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE ORDER BY
     S.STATE;
     PLAN JOIN(STATES ORDER STATE1, CITIES INDEX DUPE_CITY)

  6. Equi-join with no available indices, resulting in a sort merge.
     ISQL>DROP INDEX STATE1;
     ISQL>DROP INDEX DUPE_CITY;
     ISQL>SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE;
     PLAN SORT MERGE (CITIES NATURAL, STATES NATURAL)

  7. Three-way join with two indexed equalities.
     ISQL>SELECT * FROM CITIES C, STATES S, MAYORS M WHERE C.CITY = M.CITY
     AND C.STATE = S.STATE;
     PLAN JOIN (STATES NATURAL, CITIES INDEX DUPE_CITY, MAYORS INDEX
     MAYORS_1)

---------------------------------------------------------------------------
Last Modified 5/14/96