Commit 247cf66

Merge branch 'master' into materialize-master
2 parents 850e1bf + 8cd64e7 commit 247cf66

14 files changed: 775 additions & 615 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]
 name = "sqlparser"
 description = "Extensible SQL Lexer and Parser with support for ANSI SQL:2011"
-version = "0.3.2-alpha.0"
+version = "0.4.1-alpha.0"
 authors = ["Andy Grove <[email protected]>"]
 homepage = "https://github.com/andygrove/sqlparser-rs"
 documentation = "https://docs.rs/sqlparser/"

README.md

Lines changed: 114 additions & 6 deletions
@@ -16,16 +16,124 @@ At some point, if the parsers diverge enough, it may be worth jettisoning
 compatibility with upstream so that we can perform large-scale refactors, but we
 should make such a decision deliberately, not accidentally.
 
-## Design
+## Upstream overview
+
+The goal of this project is to build a SQL lexer and parser capable of parsing
+SQL that conforms with the [ANSI/ISO SQL standard][sql-standard] while also
+making it easy to support custom dialects so that this crate can be used as a
+foundation for vendor-specific parsers.
+
+This parser is currently being used by the [DataFusion] query engine and
+[LocustDB].
+
+## Example
+
+To parse a simple `SELECT` statement:
+
+```rust
+use sqlparser::dialect::GenericDialect;
+use sqlparser::parser::Parser;
+
+let sql = "SELECT a, b, 123, myfunc(b) \
+           FROM table_1 \
+           WHERE a > b AND b < 100 \
+           ORDER BY a DESC, b";
+
+let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...
+
+let ast = Parser::parse_sql(&dialect, sql.to_string()).unwrap();
+
+println!("AST: {:?}", ast);
+```
+
+This outputs
+
+```rust
+AST: [Query(Query { ctes: [], body: Select(Select { distinct: false, projection: [UnnamedExpr(Identifier("a")), UnnamedExpr(Identifier("b")), UnnamedExpr(Value(Long(123))), UnnamedExpr(Function(Function { name: ObjectName(["myfunc"]), args: [Identifier("b")], over: None, distinct: false }))], from: [TableWithJoins { relation: Table { name: ObjectName(["table_1"]), alias: None, args: [], with_hints: [] }, joins: [] }], selection: Some(BinaryOp { left: BinaryOp { left: Identifier("a"), op: Gt, right: Identifier("b") }, op: And, right: BinaryOp { left: Identifier("b"), op: Lt, right: Value(Long(100)) } }), group_by: [], having: None }), order_by: [OrderByExpr { expr: Identifier("a"), asc: Some(false) }, OrderByExpr { expr: Identifier("b"), asc: None }], limit: None, offset: None, fetch: None })]
+```
+
+## SQL compliance
 
-*These design notes were copied from upstream.*
+SQL was first standardized in 1987, and revisions of the standard have been
+published regularly since. Most revisions have added significant new features to
+the language, and as a result no database claims to support the full breadth of
+features. This parser currently supports most of the SQL-92 syntax, plus some
+syntax from newer versions that have been explicitly requested, plus some MSSQL-
+and PostgreSQL-specific syntax. Whenever possible, the [online SQL:2011
+grammar][sql-2011-grammar] is used to guide what syntax to accept. (We will
+happily accept changes that conform to the SQL:2016 syntax as well, but that
+edition's grammar is not yet available online.)
 
-The parser is implemented using the [Pratt Parser](https://tdop.github.io/)
-design, which is a top-down operator-precedence parser.
+Unfortunately, stating anything more specific about compliance is difficult.
+There is no publicly available test suite that can assess compliance
+automatically, and doing so manually would strain the project's limited
+resources. Still, we are interested in eventually supporting the full SQL
+dialect, and we are slowly building out our own test suite.
 
-This approach has the following benefits over parser generators:
+If you are assessing whether this project will be suitable for your needs,
+you'll likely need to experimentally verify whether it supports the subset of
+SQL that you need. Please file issues about any unsupported queries that you
+discover. Doing so helps us prioritize support for the portions of the standard
+that are actually used. Note that if you urgently need support for a feature,
+you will likely need to write the implementation yourself. See the
+[Contributing](#Contributing) section for details.
+
+### Supporting custom SQL dialects
+
+This is a work in progress, but we have some notes on [writing a custom SQL
+parser](docs/custom_sql_parser.md).
+
+## Design
+
+The core expression parser uses the [Pratt Parser] design, which is a top-down
+operator-precedence (TDOP) parser, while the surrounding SQL statement parser is
+a traditional, hand-written recursive descent parser. Eli Bendersky has a good
+[tutorial on TDOP parsers][tdop-tutorial], if you are interested in learning
+more about the technique.
+
+We are a fan of this design pattern over parser generators for the following
+reasons:
 
 - Code is simple to write and can be concise and elegant
 - Performance is generally better than code generated by parser generators
 - Debugging is much easier with hand-written code
-- It is far easier to extend and make dialect-specific extensions compared to using a parser generator
+- It is far easier to extend and make dialect-specific extensions
+  compared to using a parser generator
+
+## Contributing
+
+Contributions are highly encouraged!
+
+Pull requests that add support for or fix a bug in a feature in the SQL
+standard, or a feature in a popular RDBMS, like Microsoft SQL Server or
+PostgreSQL, will almost certainly be accepted after a brief review. For
+particularly large or invasive changes, consider opening an issue first,
+especially if you are a first time contributor, so that you can coordinate with
+the maintainers. CI will ensure that your code passes `cargo test`,
+`cargo fmt`, and `cargo clippy`, so you will likely want to run all three
+commands locally before submitting your PR.
+
+If you are unable to submit a patch, feel free to file an issue instead. Please
+try to include:
+
+  * some representative examples of the syntax you wish to support or fix;
+  * the relevant bits of the [SQL grammar][sql-2011-grammar], if the syntax is
+    part of SQL:2011; and
+  * links to documentation for the feature for a few of the most popular
+    databases that support it.
+
+Please be aware that, while we strive to address bugs and review PRs quickly, we
+make no such guarantees for feature requests. If you need support for a feature,
+you will likely need to implement it yourself. Our goal as maintainers is to
+facilitate the integration of various features from various contributors, but
+not to provide the implementations ourselves, as we simply don't have the
+resources.
+
+[tdop-tutorial]: https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing
+[`cargo fmt`]: https://github.com/rust-lang/rustfmt#on-the-stable-toolchain
+[current issues]: https://github.com/andygrove/sqlparser-rs/issues
+[DataFusion]: https://github.com/apache/arrow/tree/master/rust/datafusion
+[LocustDB]: https://github.com/cswinter/LocustDB
+[Pratt Parser]: https://tdop.github.io/
+[sql-2011-grammar]: https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html
+[sql-standard]: https://en.wikipedia.org/wiki/ISO/IEC_9075
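
Note on the README's new Design section: it describes a Pratt (top-down operator-precedence) expression parser. The sketch below illustrates that technique on plain integer arithmetic. It is not the crate's code; `Expr`, `precedence`, `parse_prefix`, `parse_expr`, `parse`, and `eval` are hypothetical names chosen for this example, and the operator set (`+`, `-`, `*`) and binding powers are assumptions.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical mini-AST; the crate's real AST is far richer.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    BinaryOp { left: Box<Expr>, op: char, right: Box<Expr> },
}

/// Binding power of an infix operator; `None` means "not an operator".
fn precedence(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(10),
        '*' => Some(20),
        _ => None,
    }
}

fn skip_spaces(input: &mut Peekable<Chars<'_>>) {
    while input.peek().map_or(false, |c| c.is_whitespace()) {
        input.next();
    }
}

/// Parse a prefix term: an integer literal or a parenthesized expression.
/// (Integer overflow is not handled in this sketch.)
fn parse_prefix(input: &mut Peekable<Chars<'_>>) -> Result<Expr, String> {
    skip_spaces(input);
    match input.peek().copied() {
        Some('(') => {
            input.next();
            let inner = parse_expr(input, 0)?;
            skip_spaces(input);
            match input.next() {
                Some(')') => Ok(inner),
                other => Err(format!("expected ')', found {:?}", other)),
            }
        }
        Some(c) if c.is_ascii_digit() => {
            let mut n: i64 = 0;
            while let Some(d) = input.peek().and_then(|c| c.to_digit(10)) {
                n = n * 10 + i64::from(d);
                input.next();
            }
            Ok(Expr::Num(n))
        }
        other => Err(format!("unexpected input: {:?}", other)),
    }
}

/// Pratt loop: fold in infix operators that bind tighter than `min_prec`.
/// Recursing with the operator's own precedence makes operators left-associative.
fn parse_expr(input: &mut Peekable<Chars<'_>>, min_prec: u8) -> Result<Expr, String> {
    let mut left = parse_prefix(input)?;
    loop {
        skip_spaces(input);
        let op = match input.peek().copied() {
            Some(c) => c,
            None => break,
        };
        let prec = match precedence(op) {
            Some(p) if p > min_prec => p,
            _ => break,
        };
        input.next(); // consume the operator
        let right = parse_expr(input, prec)?;
        left = Expr::BinaryOp { left: Box::new(left), op, right: Box::new(right) };
    }
    Ok(left)
}

/// Parse a whole string, rejecting trailing input.
pub fn parse(src: &str) -> Result<Expr, String> {
    let mut input = src.chars().peekable();
    let expr = parse_expr(&mut input, 0)?;
    match input.next() {
        None => Ok(expr),
        Some(c) => Err(format!("unexpected trailing input: {:?}", c)),
    }
}

/// Evaluate the tree, to make the effect of precedence visible.
pub fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::BinaryOp { left, op, right } => {
            let (l, r) = (eval(left), eval(right));
            match *op {
                '+' => l + r,
                '-' => l - r,
                _ => l * r, // only '*' remains in this sketch
            }
        }
    }
}
```

With these definitions, `eval(&parse("1 + 2 * 3").unwrap())` evaluates to 7, because `*` binds tighter than `+`. Supporting another operator only takes a new arm in `precedence` (plus evaluation), which is the kind of extensibility the README's bullet list refers to.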

src/ast/data_type.rs

Lines changed: 38 additions & 31 deletions
@@ -11,6 +11,7 @@
 // limitations under the License.
 
 use super::ObjectName;
+use std::fmt;
 
 /// SQL data types
 #[derive(Debug, Clone, PartialEq, Eq, Hash)]
@@ -65,47 +66,53 @@ pub enum DataType {
     Array(Box<DataType>),
 }
 
-impl ToString for DataType {
-    fn to_string(&self) -> String {
+impl fmt::Display for DataType {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         match self {
-            DataType::Char(size) => format_type_with_optional_length("char", size),
-            DataType::Varchar(size) => format_type_with_optional_length("character varying", size),
-            DataType::Uuid => "uuid".to_string(),
-            DataType::Clob(size) => format!("clob({})", size),
-            DataType::Binary(size) => format!("binary({})", size),
-            DataType::Varbinary(size) => format!("varbinary({})", size),
-            DataType::Blob(size) => format!("blob({})", size),
+            DataType::Char(size) => format_type_with_optional_length(f, "char", size),
+            DataType::Varchar(size) => {
+                format_type_with_optional_length(f, "character varying", size)
+            }
+            DataType::Uuid => write!(f, "uuid"),
+            DataType::Clob(size) => write!(f, "clob({})", size),
+            DataType::Binary(size) => write!(f, "binary({})", size),
+            DataType::Varbinary(size) => write!(f, "varbinary({})", size),
+            DataType::Blob(size) => write!(f, "blob({})", size),
             DataType::Decimal(precision, scale) => {
                 if let Some(scale) = scale {
-                    format!("numeric({},{})", precision.unwrap(), scale)
+                    write!(f, "numeric({},{})", precision.unwrap(), scale)
                 } else {
-                    format_type_with_optional_length("numeric", precision)
+                    format_type_with_optional_length(f, "numeric", precision)
                 }
             }
-            DataType::Float(size) => format_type_with_optional_length("float", size),
-            DataType::SmallInt => "smallint".to_string(),
-            DataType::Int => "int".to_string(),
-            DataType::BigInt => "bigint".to_string(),
-            DataType::Real => "real".to_string(),
-            DataType::Double => "double".to_string(),
-            DataType::Boolean => "boolean".to_string(),
-            DataType::Date => "date".to_string(),
-            DataType::Time => "time".to_string(),
-            DataType::Timestamp => "timestamp".to_string(),
-            DataType::Interval => "interval".to_string(),
-            DataType::Regclass => "regclass".to_string(),
-            DataType::Text => "text".to_string(),
-            DataType::Bytea => "bytea".to_string(),
-            DataType::Array(ty) => format!("{}[]", ty.to_string()),
-            DataType::Custom(ty) => ty.to_string(),
+            DataType::Float(size) => format_type_with_optional_length(f, "float", size),
+            DataType::SmallInt => write!(f, "smallint"),
+            DataType::Int => write!(f, "int"),
+            DataType::BigInt => write!(f, "bigint"),
+            DataType::Real => write!(f, "real"),
+            DataType::Double => write!(f, "double"),
+            DataType::Boolean => write!(f, "boolean"),
+            DataType::Date => write!(f, "date"),
+            DataType::Time => write!(f, "time"),
+            DataType::Timestamp => write!(f, "timestamp"),
+            DataType::Interval => write!(f, "interval"),
+            DataType::Regclass => write!(f, "regclass"),
+            DataType::Text => write!(f, "text"),
+            DataType::Bytea => write!(f, "bytea"),
+            DataType::Array(ty) => write!(f, "{}[]", ty),
+            DataType::Custom(ty) => write!(f, "{}", ty),
         }
     }
 }
 
-fn format_type_with_optional_length(sql_type: &str, len: &Option<u64>) -> String {
-    let mut s = sql_type.to_string();
+fn format_type_with_optional_length(
+    f: &mut fmt::Formatter,
+    sql_type: &'static str,
+    len: &Option<u64>,
+) -> fmt::Result {
+    write!(f, "{}", sql_type)?;
     if let Some(len) = len {
-        s += &format!("({})", len);
+        write!(f, "({})", len)?;
     }
-    s
+    Ok(())
 }
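
Note on the `src/ast/data_type.rs` change: it swaps a hand-written `ToString` impl for `fmt::Display`. The sketch below shows the same pattern on a hypothetical, cut-down `MiniType` enum (not the crate's `DataType`); `write_with_optional_length` is likewise an illustrative name. Each arm writes straight into the caller's formatter instead of building an intermediate `String`, and the standard library's blanket `impl<T: Display> ToString for T` keeps `.to_string()` available to existing callers.

```rust
use std::fmt;

/// Hypothetical, cut-down stand-in for the crate's `DataType` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum MiniType {
    Char(Option<u64>),
    Int,
    Array(Box<MiniType>),
}

impl fmt::Display for MiniType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MiniType::Char(size) => write_with_optional_length(f, "char", size),
            MiniType::Int => write!(f, "int"),
            // The nested type is written into the same formatter via `{}`.
            MiniType::Array(inner) => write!(f, "{}[]", inner),
        }
    }
}

/// Write `sql_type`, followed by `(len)` when a length is present.
fn write_with_optional_length(
    f: &mut fmt::Formatter<'_>,
    sql_type: &str,
    len: &Option<u64>,
) -> fmt::Result {
    write!(f, "{}", sql_type)?; // `?` propagates any formatter error
    if let Some(len) = len {
        write!(f, "({})", len)?;
    }
    Ok(())
}
```

For example, `MiniType::Array(Box::new(MiniType::Char(Some(3)))).to_string()` yields `"char(3)[]"`, and the same value can be embedded directly in `format!` or `println!` with `{}`.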
