@@ -16,16 +16,124 @@ At some point, if the parsers diverge enough, it may be worth jettisoning
16
16
compatibility with upstream so that we can perform large-scale refactors, but we
17
17
should make such a decision deliberately, not accidentally.
18
18
19
- ## Design
19
+ ## Upstream overview
20
+
21
+ The goal of this project is to build a SQL lexer and parser capable of parsing
22
+ SQL that conforms with the [ANSI/ISO SQL standard][sql-standard] while also
23
+ making it easy to support custom dialects so that this crate can be used as a
24
+ foundation for vendor-specific parsers.
25
+
26
+ This parser is currently being used by the [DataFusion] query engine and
27
+ [LocustDB].
28
+
29
+ ## Example
30
+
31
+ To parse a simple `SELECT` statement:
32
+
33
+ ```rust
+ use sqlparser::dialect::GenericDialect;
+ use sqlparser::parser::Parser;
+
+ let sql = "SELECT a, b, 123, myfunc(b) \
+            FROM table_1 \
+            WHERE a > b AND b < 100 \
+            ORDER BY a DESC, b";
+
+ let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...
+
+ let ast = Parser::parse_sql(&dialect, sql.to_string()).unwrap();
+
+ println!("AST: {:?}", ast);
+ ```
48
+
49
+ This outputs
50
+
51
+ ```rust
+ AST: [Query(Query { ctes: [], body: Select(Select { distinct: false, projection: [UnnamedExpr(Identifier("a")), UnnamedExpr(Identifier("b")), UnnamedExpr(Value(Long(123))), UnnamedExpr(Function(Function { name: ObjectName(["myfunc"]), args: [Identifier("b")], over: None, distinct: false }))], from: [TableWithJoins { relation: Table { name: ObjectName(["table_1"]), alias: None, args: [], with_hints: [] }, joins: [] }], selection: Some(BinaryOp { left: BinaryOp { left: Identifier("a"), op: Gt, right: Identifier("b") }, op: And, right: BinaryOp { left: Identifier("b"), op: Lt, right: Value(Long(100)) } }), group_by: [], having: None }), order_by: [OrderByExpr { expr: Identifier("a"), asc: Some(false) }, OrderByExpr { expr: Identifier("b"), asc: None }], limit: None, offset: None, fetch: None })]
+ ```
54
+
55
+ ## SQL compliance
20
56
21
- *These design notes were copied from upstream.*
57
+ SQL was first standardized in 1987, and revisions of the standard have been
58
+ published regularly since. Most revisions have added significant new features to
59
+ the language, and as a result no database claims to support the full breadth of
60
+ features. This parser currently supports most of the SQL-92 syntax, plus some
61
+ syntax from newer versions that has been explicitly requested, plus some MSSQL-
62
+ and PostgreSQL-specific syntax. Whenever possible, the [online SQL:2011
63
+ grammar][sql-2011-grammar] is used to guide what syntax to accept. (We will
64
+ happily accept changes that conform to the SQL:2016 syntax as well, but that
65
+ edition's grammar is not yet available online.)
22
66
23
- The parser is implemented using the [Pratt Parser](https://tdop.github.io/)
24
- design, which is a top-down operator-precedence parser.
67
+ Unfortunately, stating anything more specific about compliance is difficult.
68
+ There is no publicly available test suite that can assess compliance
69
+ automatically, and doing so manually would strain the project's limited
70
+ resources. Still, we are interested in eventually supporting the full SQL
71
+ dialect, and we are slowly building out our own test suite.
25
72
26
- This approach has the following benefits over parser generators:
73
+ If you are assessing whether this project will be suitable for your needs,
74
+ you'll likely need to experimentally verify whether it supports the subset of
75
+ SQL that you need. Please file issues about any unsupported queries that you
76
+ discover. Doing so helps us prioritize support for the portions of the standard
77
+ that are actually used. Note that if you urgently need support for a feature,
78
+ you will likely need to write the implementation yourself. See the
79
+ [Contributing](#contributing) section for details.
80
+
81
+ ### Supporting custom SQL dialects
82
+
83
+ This is a work in progress, but we have some notes on [writing a custom SQL
84
+ parser](docs/custom_sql_parser.md).
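
To give a rough idea of what a dialect hook can look like, here is a minimal, self-contained sketch in which a dialect decides which characters may form an identifier. The `Dialect` trait shown here, its two methods, the `PlainDialect` and `VendorDialect` types, and the `scan_identifier` helper are all assumptions made for illustration; they are not taken from this crate, and the linked notes remain the reference for the real extension points.

```rust
// Hypothetical sketch of a dialect hook. The trait and type names are
// assumptions for illustration and are not the crate's actual API.
trait Dialect {
    /// Whether `ch` may begin an identifier.
    fn is_identifier_start(&self, ch: char) -> bool;
    /// Whether `ch` may appear after the first character of an identifier.
    fn is_identifier_part(&self, ch: char) -> bool;
}

/// A conservative baseline: ASCII letters, digits, and underscores.
struct PlainDialect;

impl Dialect for PlainDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_ascii_alphabetic() || ch == '_'
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_'
    }
}

/// An imagined vendor dialect that also accepts `@` and `#` in identifiers.
struct VendorDialect;

impl Dialect for VendorDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        PlainDialect.is_identifier_start(ch) || ch == '@' || ch == '#'
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        PlainDialect.is_identifier_part(ch) || ch == '@' || ch == '#'
    }
}

/// Returns the identifier at the start of `input`, if the dialect allows one.
fn scan_identifier<'a>(dialect: &dyn Dialect, input: &'a str) -> Option<&'a str> {
    let first = input.chars().next()?;
    if !dialect.is_identifier_start(first) {
        return None;
    }
    // Byte offset of the first character that cannot continue the identifier.
    let end = input
        .char_indices()
        .skip(1)
        .find(|&(_, ch)| !dialect.is_identifier_part(ch))
        .map_or(input.len(), |(index, _)| index);
    Some(&input[..end])
}
```

With these definitions, `scan_identifier(&PlainDialect, "@count + 1")` returns `None`, while the same input scanned with `VendorDialect` returns `Some("@count")`. Keeping the vendor-specific decision behind a trait object means the scanning code itself does not change from one dialect to the next.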
85
+
86
+ ## Design
87
+
88
+ The core expression parser uses the [Pratt Parser] design, which is a top-down
89
+ operator-precedence (TDOP) parser, while the surrounding SQL statement parser is
90
+ a traditional, hand-written recursive descent parser. Eli Bendersky has a good
91
+ [tutorial on TDOP parsers][tdop-tutorial], if you are interested in learning
92
+ more about the technique.
93
+
94
+ We prefer this design pattern over parser generators for the following
95
+ reasons:
27
96
28
97
- Code is simple to write and can be concise and elegant
29
98
- Performance is generally better than code generated by parser generators
30
99
- Debugging is much easier with hand-written code
31
- - It is far easier to extend and make dialect-specific extensions compared to using a parser generator
100
+ - It is far easier to extend and make dialect-specific extensions
101
+ compared to using a parser generator
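
For readers who have not seen the technique, the following is a minimal sketch of a Pratt (top-down operator-precedence) expression parser. It is not this crate's implementation: the `Token` and `Expr` types, the `binding_power` table and its numeric values, and the whitespace-only tokenizer are all assumptions chosen to keep the example short. It illustrates the core idea only: parse a prefix, then keep folding in infix operators whose precedence exceeds that of the caller.

```rust
// Minimal Pratt parser sketch. Every name below is hypothetical and chosen
// for illustration; none of it is the sqlparser crate's actual API.
use std::{iter::Peekable, vec::IntoIter};

#[derive(Debug)]
enum Token { Number(i64), Ident(String), Op(String), LParen, RParen }

#[derive(Debug)]
enum Expr {
    Number(i64),
    Ident(String),
    Binary { left: Box<Expr>, op: String, right: Box<Expr> },
}

type Tokens = Peekable<IntoIter<Token>>;

/// Tokenizes whitespace-separated input; unknown words become identifiers.
fn tokenize(input: &str) -> Vec<Token> {
    input
        .split_whitespace()
        .map(|word| match word {
            "(" => Token::LParen,
            ")" => Token::RParen,
            "+" | "-" | "*" | "/" | "<" | ">" => Token::Op(word.to_string()),
            _ if word.eq_ignore_ascii_case("and") => Token::Op("AND".to_string()),
            _ => match word.parse::<i64>() {
                Ok(n) => Token::Number(n),
                Err(_) => Token::Ident(word.to_string()),
            },
        })
        .collect()
}

/// Binding power of an infix operator; 0 means "not an infix operator".
fn binding_power(token: Option<&Token>) -> u8 {
    match token {
        Some(Token::Op(op)) => match op.as_str() {
            "AND" => 5,
            "<" | ">" => 20,
            "+" | "-" => 30,
            "*" | "/" => 40,
            _ => 0,
        },
        _ => 0,
    }
}

/// Core Pratt loop: fold in operators binding tighter than `min_precedence`.
fn parse_expr(tokens: &mut Tokens, min_precedence: u8) -> Result<Expr, String> {
    let mut left = parse_prefix(tokens)?;
    loop {
        let precedence = binding_power(tokens.peek());
        if precedence <= min_precedence {
            return Ok(left);
        }
        let op = match tokens.next() {
            Some(Token::Op(op)) => op,
            other => return Err(format!("expected operator, found {:?}", other)),
        };
        // Recursing with the operator's own precedence gives left associativity.
        let right = parse_expr(tokens, precedence)?;
        left = Expr::Binary { left: Box::new(left), op, right: Box::new(right) };
    }
}

/// Parses a literal, an identifier, or a parenthesized sub-expression.
fn parse_prefix(tokens: &mut Tokens) -> Result<Expr, String> {
    match tokens.next() {
        Some(Token::Number(n)) => Ok(Expr::Number(n)),
        Some(Token::Ident(name)) => Ok(Expr::Ident(name)),
        Some(Token::LParen) => match (parse_expr(tokens, 0)?, tokens.next()) {
            (inner, Some(Token::RParen)) => Ok(inner),
            (_, other) => Err(format!("expected ')', found {:?}", other)),
        },
        other => Err(format!("unexpected token {:?}", other)),
    }
}

/// Parses a complete expression and rejects trailing tokens.
fn parse(input: &str) -> Result<Expr, String> {
    let mut tokens = tokenize(input).into_iter().peekable();
    let expr = parse_expr(&mut tokens, 0)?;
    match tokens.next() {
        None => Ok(expr),
        Some(token) => Err(format!("unexpected trailing token {:?}", token)),
    }
}

/// Renders the tree as an S-expression so the nesting is easy to inspect.
fn to_sexpr(expr: &Expr) -> String {
    match expr {
        Expr::Number(n) => n.to_string(),
        Expr::Ident(name) => name.clone(),
        Expr::Binary { left, op, right } => {
            format!("({} {} {})", op, to_sexpr(left), to_sexpr(right))
        }
    }
}
```

Parsing `a > b AND b < 100` with this sketch and rendering it through `to_sexpr` gives `(AND (> a b) (< b 100))`, the same nesting as the `selection` field in the example AST earlier in the README. Supporting a new operator in a design like this amounts to adding one entry to the precedence table, which is the kind of extension the list above has in mind.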
102
+
103
+ ## Contributing
104
+
105
+ Contributions are highly encouraged!
106
+
107
+ Pull requests that add support for or fix a bug in a feature in the SQL
108
+ standard, or a feature in a popular RDBMS, like Microsoft SQL Server or
109
+ PostgreSQL, will almost certainly be accepted after a brief review. For
110
+ particularly large or invasive changes, consider opening an issue first,
111
+ especially if you are a first-time contributor, so that you can coordinate with
112
+ the maintainers. CI will ensure that your code passes `cargo test`,
113
+ `cargo fmt`, and `cargo clippy`, so you will likely want to run all three
114
+ commands locally before submitting your PR.
115
+
116
+ If you are unable to submit a patch, feel free to file an issue instead. Please
117
+ try to include:
118
+
119
+ * some representative examples of the syntax you wish to support or fix;
120
+ * the relevant bits of the [SQL grammar][sql-2011-grammar], if the syntax is
121
+ part of SQL:2011; and
122
+ * links to documentation for the feature for a few of the most popular
123
+ databases that support it.
124
+
125
+ Please be aware that, while we strive to address bugs and review PRs quickly, we
126
+ make no such guarantees for feature requests. If you need support for a feature,
127
+ you will likely need to implement it yourself. Our goal as maintainers is to
128
+ facilitate the integration of various features from various contributors, but
129
+ not to provide the implementations ourselves, as we simply don't have the
130
+ resources.
131
+
132
+ [tdop-tutorial]: https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing
133
+ [`cargo fmt`]: https://github.com/rust-lang/rustfmt#on-the-stable-toolchain
134
+ [current issues]: https://github.com/andygrove/sqlparser-rs/issues
135
+ [DataFusion]: https://github.com/apache/arrow/tree/master/rust/datafusion
136
+ [LocustDB]: https://github.com/cswinter/LocustDB
137
+ [Pratt Parser]: https://tdop.github.io/
138
+ [sql-2011-grammar]: https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html
139
+ [sql-standard]: https://en.wikipedia.org/wiki/ISO/IEC_9075