Specifying Query Access Plans in the InterBase Optimizer Deej Bredenberg September 15, 1993 Requirements Various customers have requested the ability to view the access plan being used by InterBase in query processing, as well as modifying that plan when they believe they can improve on it. Here are the requirements for setting and getting the access plan in the optimizer: 1. Provide a readable display of the access plan generated by the optimizer for any given query (ISQL only). 2. Provide extended SQL syntax to specify the plan to be used in a query. (DSQL/ISQL and embedded SQL). 3. Make the output of requirement 1, acceptable as input to requirement 2, so that the user may simply modify the plan rather than generating it from scratch. Limitations If the user specifies the plan to be used in a query, we will make no attempts to analyze the submitted plan for errors of omission. That is, the optimizer is either on or off; it will not attempt to complete a partial plan or make recommendations for doing so. Of course, if the query cannot be accomplished via the submitted plan an error will be reported. Elements of Plan Selection To determine the options needed in setting a plan, it is important to survey the methods used by the optimizer to generate a plan. In this discussion, the term stream refers generically to a set of records, such as a relation or a relation with restrictions placed on it. Assigning Indices InterBase uses indices for three basic purposes: 1. Equalities and Inequalities: When a field is compared to a constant, and the field is indexed, the index can be used to retrieve values greater than, less than, or equal to the constant. Utilizing the index rather than a sequential scan avoids fetching records that don't meet the selection criteria. 2. Matching joined streams: When two streams are joined by matching two fields, an index can be used to perform the join. If an index is available for one of the fields, the other stream is retrieved first and key values matched against the index. 3. Sorting and grouping: When a query is sorted by a field or fields which are the keys of an index, the index is used to retrieve the records in order rather than performing a sort. This is called a "navigational" walk of the index. It is not navigational in the sense that the index is walked backwards, but in that no bitmap is generated for the stream. [Indices are also used to walk forwards and backwards through a stream to handle IDAPI-style navigation, but for the moment this functionality does not surface through SQL and so will not be handled.] Combining Indices For each selection criterion which matches an index, InterBase generates a bitmap of the selected records. It can combine the bitmaps by ANDing and Oring them together according to the logic of the query. This generates a single bitmap which describes all the selected records from a single stream. This is advantageous when iterating through a stream multiple times, since it is much faster to scan the bitmap than a set of indices. Determining Join Order In joining two or more streams together, it is often crucial to determine which stream should be retrieved first. In general, the stream which is the most expensive to retrieve should be retrieved first. If you think of a join as a set of nested loops, it makes sense that the innermost loop will be executed the most times. Similarly, the last stream in the join list will be fetched the most times, so it had better be the cheapest to retrieve. Generating Rivers When two or more streams are combined together, they form a river. The optimizer first attempts to find the longest river which can be retrieved via indices alone. Then it attempts to form the remaining streams into the longest river, and so on. Once it has a set of rivers which are retrievable via indexed access, it joins them together in order from left to right based on the estimated cost of retrieving each river. [This is where the process broke down in the case of Philly Stock Exchange. Finding the longest river does not guarantee the fastest retrieval. The customer should be given flexibility to specify the rivers used, and the order in which the rivers will be joined together.] Cost Estimation The cost of retrieval of a stream depends on several variables: whether there is an index, the selectivity of that index, whether there are selection criteria, the cardinality of the underlying relation, and whether the stream needs to be sorted. The optimizer uses all these variables to determine the cost of a retrieval of a stream. For example, when retrieving an indexed equality, the cost is estimated as : cardinality of base relation * selectivity of index + overhead of index retrieval Since rough estimation is used to determine the values of all these variables, and the relative importance each of these variables cannot be precisely determined for all queries, the optimizer can often select a plan which is far from the best. The user may simply wish to determine a plan based on empirical data for a particular query. But we should advise the customer that the best plan may change over time as the data changes, so specifying a plan as part of the query implies an added maintenance burden for the SQL programmer. Sort Merges When a join term is specified between two streams, and no index is available, the optimizer specifies a sort merge between the two streams. This means that both streams will be sorted and the results merged together. Once the streams are sorted, the engine can scan through both streams just once to form the join, rather than iterating through the second stream for every record in the first stream. Set Plan Syntax Syntax will be added to the SELECT expression in GPRE and DSQL/ISQL to allow the user to set the plan for a query. The syntax should work for SELECT statements used in creating a view, a stored procedure, or a trigger. It should also work for subselects in an UPDATE or DELETE statement. Syntax to specify a plan for UPDATE or DELETE is also an option, though it is not planned for now. The Select Statement A PLAN clause will be added at the end of the SELECT expression. The syntax should support all current optimizer options, and be extensible enough to allow for new options. And abridged syntax for the SELECT statement would then be: select_expression [ UNION select_expression ...] [ ORDER BY field_list ] select_expression := SELECT [ DISTINCT | ALL] [field_list|'*'] FROM relation_list [ WHERE search_condition ] [ GROUP BY field_list ] [ HAVING search condition ] [ PLAN plan_expression ] Plans can be specified independently for the query and any subqueries. Specifying a plan on the main select or any subselect does not require a plan to be specified on any of the other select expressions in the query. The Plan Expression The syntax for the plan expression is as follows : plan_expression := [join_type] '('plan_item_list')' plan_item_list := plan_item | plan_item ',' plan_item_list plan_item := table_specifier access_type | plan_expression The syntax allows specification of a single relation or a join of two of more relations in one pass. Nested parenthesis can be used to specify any combination of joins. Reverse polish notation is used to specify the join operator, mostly because an infix operator would confuse the parser. join_type := JOIN | [SORT] MERGE The default join type is a JOIN. The keyword MERGE can be used to specify a sort merge of the substreams. The optional SORT keyword can be used for clarification purposes. table_specifier := table_name | alias_name The table name or the alias for the table can be specified here. If a table is used more than once in the query, aliases must be used to distinguish them in the plan. access_type := [ NATURAL | INDEX index_list | ORDER index_name ] index_list := index_name | index_name', 'index_list The default access is NATURAL order, which means sequentially accessing the records in no defined order. The INDEX keyword allows specification of one or more indices for satisfying booleans and join terms in the query. The user must specify all the indices to be used; if any booleans or join terms remain after all indices are exhausted, they will be evaluated without benefit of an index. If any indices are specified which cannot be utilized, an error will be returned. The ORDER keyword specifies navigational access, to effect a sort through the use of the index. In order to cause the result of the query to be sorted, the stream must be leftmost. If the resultant stream is not sorted as specified by the ORDER clause, a physical sort will be performed. Display Plan Syntax We should provide the following command in ISQL: ISQL> SET EXPLAIN; Once this option has been set, all select-type statements will display the plan chosen by the optimizer for the provided query, as shown in the examples below. The SET EXPLAIN command will toggle on and off this feature, just as with other ISQL options. Examples of Plan Syntax If SET EXPLAIN were implemented today, these examples demonstrate the plans the optimizer would choose. Moving the semicolon from the end of the query to the end of the plan yields a valid query with a plan specified: 1. Simple retrieval. ISQL> SELECT * FROM CITIES; PLAN (CITIES NATURAL) 2. Ordered retrieval, utilizing an index for ordering. ISQL> SELECT * FROM CITIES ORDER BY CITY; PLAN (CITIES ORDER CITIES_1) 3. Simple join, resulting in a cross product. ISQL>SELECT * FROM CITIES C, STATES S; PLAN JOIN(STATE NATURAL, CITIES NATURAL) 4. Join with indexed equality. ISQL>SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE; PLAN JOIN(STATE NATURAL, CITIES INDEX DUPE_CITY) 5. Equi-join with index used for ordering. ISQL>SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE ORDER BY S.STATE; PLAN JOIN(STATES ORDER STATE1, CITIES INDEX DUPE_CITY) 6. Equi-join with no available indices, resulting in a sort merge. ISQL>DROP INDEX STATE1; ISQL>DROP INDEX DUPE_CITY; ISQL>SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE; PLAN SORT MERGE (CITIES NATURAL, STATES NATURAL) 7. Three-way join with two indexed equalities. ISQL>SELECT * FROM CITIES C, STATES S, MAYORS M WHERE C.CITY = M.CITY AND C.STATE = S.STATE; PLAN JOIN (STATES NATURAL, CITIES INDEX DUPE_CITY, MAYORS INDEX MAYORS_1) --------------------------------------------------------------------------- Last Modified 5/14/96