
Support nuances of the SELECT syntax: WITH, UNION, subqueries etc. #106

Open
@adamziel

Let's go through the MySQL documentation pages for SELECT and make sure that even complex SELECT queries are supported by the SQLite integration plugin.

This likely means rewriting execute_select as more of a grammar parser or a state machine that reasons about each token it encounters. The current approach, in contrast, consumes all the tokens and only intervenes when a tactical adjustment applies. With a parser, we could reuse the SELECT logic for WITH, UNION, subqueries, etc. Currently we cannot, because the execute_select method assumes it acts on an entire query, not on a part of one.

The implementation could look like this:

// Parse WITH
if($next_token->is_keyword('WITH')) {
    $this->consume_with_clause();
}

/**
 * Processes the WITH clause (https://dev.mysql.com/doc/refman/8.0/en/with.html):
 *      WITH [RECURSIVE]
 *          cte_name [(col_name [, col_name] ...)] AS (subquery)
 *          [, cte_name [(col_name [, col_name] ...)] AS (subquery)] ...
 */
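protected function consume_with_clause() {
    $this->rewriter->consume_assert_is_keyword( 'WITH' );
    // RECURSIVE is optional, so peek before consuming it.
    if($this->rewriter->peek()->is_keyword('RECURSIVE')) {
        $this->rewriter->consume();
    }
    
    while(true) {
        $table_alias = $this->rewriter->consume();
        $column_aliases = null;
        // Peek rather than consume: when there is no column list, the next
        // token is AS and must still be available for the assertion below.
        if($this->rewriter->peek()->is_operator('(')) {
            $column_aliases = [];
            // ...parse column aliases...
        }

        $this->rewriter->consume_assert_is_keyword( 'AS' );
        $this->consume_sub_query();
        $comma_maybe = $this->rewriter->peek();
        if(!$comma_maybe->is_operator(',')) {
            break;
        }
        // Consume the comma so the next iteration starts at the next cte_name.
        $this->rewriter->consume();
    }
}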

/**
 * Processes the SELECT statement (https://dev.mysql.com/doc/refman/8.0/en/select.html)
 *    SELECT
 *       [ALL | DISTINCT | DISTINCTROW ]
 *       [HIGH_PRIORITY]
 *       [STRAIGHT_JOIN]
 *       [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
 *       [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
 *        select_expr [, select_expr] ...
 */
protected function consume_select_query() {
    $this->rewriter->consume_assert_is_keyword( 'SELECT' );
    $token = $this->rewriter->peek();
    if($token->is_keyword(['ALL', 'DISTINCT', 'DISTINCTROW'])) {
        $this->rewriter->consume();
        $token = $this->rewriter->peek();
    }
    if($token->is_keyword('HIGH_PRIORITY')) {
        $this->rewriter->skip();
        $token = $this->rewriter->peek();
    }
    // ... keep going token by token, don't just skip over things like we do now
    //     with a while loop ...
    if($is_subquery) {
        $this->consume_sub_query();
    }
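    // inevitably at some point:
    if($this->rewriter->peek()->is_keyword('UNION')) {
        // Consume UNION and its optional ALL | DISTINCT modifier first:
        // consume_select_query() expects SELECT to be the next token.
        $this->rewriter->consume();
        if($this->rewriter->peek()->is_keyword(['ALL', 'DISTINCT'])) {
            $this->rewriter->consume();
        }
        $this->consume_select_query();
    }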
}

protected function consume_sub_query() {
    // ... consume a nested query ...
    // ... can it be just a SELECT query? Or can it also be something else? ...
    // ... can it have a WITH clause? ...
    // ...
    // inevitably at some point:
    $this->consume_select_query();
}

For starters, just migrating to a state machine approach would be more than enough: it would unlock support for UNION and make it easy to ignore tokens like HIGH_PRIORITY or SQL_SMALL_RESULT.
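To make the idea concrete, here is a minimal, self-contained sketch of the token-by-token approach. It is not the plugin's code: SketchRewriter, its string-based tokens, the naive regex tokenizer, and the sketch_* functions are all hypothetical stand-ins that only loosely mirror the peek/consume/skip methods used in the snippets above. It assumes the query contains no string literals or quoted identifiers with spaces, parentheses, or commas.

```php
<?php
// Hypothetical stand-in for the rewriter: tokens are plain strings and
// $output collects the tokens that are kept for SQLite.
final class SketchRewriter {
    private array $tokens;
    private int $pos = 0;
    public array $output = [];

    public function __construct( string $sql ) {
        // Naive tokenizer, for illustration only: "(", ")" and "," are
        // standalone tokens, everything else is split on whitespace.
        preg_match_all( '/[(),]|[^\s(),]+/', $sql, $matches );
        $this->tokens = $matches[0];
    }

    public function peek(): ?string {
        return $this->tokens[ $this->pos ] ?? null;
    }

    public function peek_is( array $candidates ): bool {
        $token = $this->peek();
        return null !== $token && in_array( strtoupper( $token ), $candidates, true );
    }

    // Copy the current token to the output and advance.
    public function consume(): string {
        $token          = $this->tokens[ $this->pos++ ];
        $this->output[] = $token;
        return $token;
    }

    // Advance without copying the token to the output.
    public function skip(): void {
        $this->pos++;
    }

    public function consume_assert( string $expected ): void {
        if ( ! $this->peek_is( [ $expected ] ) ) {
            throw new RuntimeException( "Expected $expected" );
        }
        $this->consume();
    }
}

function sketch_consume_select_query( SketchRewriter $r ): void {
    $r->consume_assert( 'SELECT' );
    if ( $r->peek_is( [ 'ALL', 'DISTINCT' ] ) ) {
        $r->consume();
    }
    // MySQL-only modifiers are dropped instead of being passed through.
    $ignored = [
        'HIGH_PRIORITY', 'STRAIGHT_JOIN', 'SQL_SMALL_RESULT', 'SQL_BIG_RESULT',
        'SQL_BUFFER_RESULT', 'SQL_NO_CACHE', 'SQL_CALC_FOUND_ROWS',
    ];
    while ( $r->peek_is( $ignored ) ) {
        $r->skip();
    }
    // Stop at the end of input or at the ")" that closes an enclosing subquery.
    while ( null !== $r->peek() && ')' !== $r->peek() ) {
        if ( $r->peek_is( [ 'UNION' ] ) ) {
            $r->consume();
            if ( $r->peek_is( [ 'ALL', 'DISTINCT' ] ) ) {
                $r->consume();
            }
            // The same SELECT logic is reused for the next UNION member.
            sketch_consume_select_query( $r );
            return;
        }
        if ( '(' === $r->peek() ) {
            sketch_consume_parenthesized( $r );
        } else {
            $r->consume();
        }
    }
}

function sketch_consume_parenthesized( SketchRewriter $r ): void {
    $r->consume(); // The opening "(".
    if ( $r->peek_is( [ 'SELECT' ] ) ) {
        // A subquery: reuse the SELECT logic on a part of the query.
        sketch_consume_select_query( $r );
    } else {
        // Anything else, e.g. function arguments: copy through, nesting aware.
        while ( null !== $r->peek() && ')' !== $r->peek() ) {
            if ( '(' === $r->peek() ) {
                sketch_consume_parenthesized( $r );
            } else {
                $r->consume();
            }
        }
    }
    $r->consume_assert( ')' );
}

function sketch_rewrite_select( string $sql ): string {
    $r = new SketchRewriter( $sql );
    sketch_consume_select_query( $r );
    return implode( ' ', $r->output );
}

echo sketch_rewrite_select(
    'SELECT HIGH_PRIORITY a FROM t1 UNION ALL SELECT SQL_NO_CACHE b FROM (SELECT c FROM t2) AS sub'
), "\n";
// Prints: SELECT a FROM t1 UNION ALL SELECT b FROM ( SELECT c FROM t2 ) AS sub
```

Because sketch_consume_select_query() stops at a closing parenthesis instead of assuming it owns the whole query, the same function serves the top-level statement, each UNION member, and nested subqueries. A WITH clause could presumably be added the same way, as one more function that calls into it.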

Labels: enhancement (New feature or request)
