Commit ee8b5bf

use prettier to format md files (apache#367)
* use prettier to format md files
* apply prettier
* update ballista
1 parent: 9fdc4fe

31 files changed (+239 −219 lines)
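Every hunk below applies the same handful of rewrites: `*` list markers become `-`, single-asterisk `*emphasis*` becomes `_underscores_`, trailing whitespace is stripped, and tables are re-aligned. As a rough illustration of the first three rewrites (not prettier's actual implementation, which parses markdown properly and does far more; `normalize_md` is a hypothetical helper name):

```python
import re

def normalize_md(text: str) -> str:
    """Illustrative approximation of three normalizations visible in this
    diff: '*' bullets -> '-' bullets, *emphasis* -> _emphasis_, and
    trailing whitespace stripped."""
    out = []
    for line in text.splitlines():
        # A leading '*' followed by whitespace is a list marker.
        line = re.sub(r"^(\s*)\*(\s+)", r"\1-\2", line)
        # Single-asterisk emphasis (but not '**bold**') becomes underscores.
        line = re.sub(r"(?<!\*)\*([^*\n]+)\*(?!\*)", r"_\1_", line)
        out.append(line.rstrip())
    return "\n".join(out)

print(normalize_md("* *High Performance*: fast"))  # prints "- _High Performance_: fast"
```

Running a markdown line through it shows the same before/after shape as the hunks that follow.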

CODE_OF_CONDUCT.md

+2 −2

```diff
@@ -19,6 +19,6 @@
 
 # Code of Conduct
 
-* [Code of Conduct for The Apache Software Foundation][1]
+- [Code of Conduct for The Apache Software Foundation][1]
 
-[1]: https://www.apache.org/foundation/policies/conduct.html
+[1]: https://www.apache.org/foundation/policies/conduct.html
```

DEVELOPERS.md

+36 −36

```diff
@@ -21,57 +21,57 @@
 
 This section describes how you can get started at developing DataFusion.
 
-For information on developing with Ballista, see the
-[Ballista developer documentation](ballista/docs/README.md).
+For information on developing with Ballista, see the
+[Ballista developer documentation](ballista/docs/README.md).
 
 ### Bootstrap environment
 
 DataFusion is written in Rust and it uses a standard rust toolkit:
 
-* `cargo build`
-* `cargo fmt` to format the code
-* `cargo test` to test
-* etc.
+- `cargo build`
+- `cargo fmt` to format the code
+- `cargo test` to test
+- etc.
 
 ## How to add a new scalar function
 
 Below is a checklist of what you need to do to add a new scalar function to DataFusion:
 
-* Add the actual implementation of the function:
-  * [here](datafusion/src/physical_plan/string_expressions.rs) for string functions
-  * [here](datafusion/src/physical_plan/math_expressions.rs) for math functions
-  * [here](datafusion/src/physical_plan/datetime_expressions.rs) for datetime functions
-  * create a new module [here](datafusion/src/physical_plan) for other functions
-* In [src/physical_plan/functions](datafusion/src/physical_plan/functions.rs), add:
-  * a new variant to `BuiltinScalarFunction`
-  * a new entry to `FromStr` with the name of the function as called by SQL
-  * a new line in `return_type` with the expected return type of the function, given an incoming type
-  * a new line in `signature` with the signature of the function (number and types of its arguments)
-  * a new line in `create_physical_expr` mapping the built-in to the implementation
-  * tests to the function.
-* In [tests/sql.rs](datafusion/tests/sql.rs), add a new test where the function is called through SQL against well known data and returns the expected result.
-* In [src/logical_plan/expr](datafusion/src/logical_plan/expr.rs), add:
-  * a new entry of the `unary_scalar_expr!` macro for the new function.
-* In [src/logical_plan/mod](datafusion/src/logical_plan/mod.rs), add:
-  * a new entry in the `pub use expr::{}` set.
+- Add the actual implementation of the function:
+  - [here](datafusion/src/physical_plan/string_expressions.rs) for string functions
+  - [here](datafusion/src/physical_plan/math_expressions.rs) for math functions
+  - [here](datafusion/src/physical_plan/datetime_expressions.rs) for datetime functions
+  - create a new module [here](datafusion/src/physical_plan) for other functions
+- In [src/physical_plan/functions](datafusion/src/physical_plan/functions.rs), add:
+  - a new variant to `BuiltinScalarFunction`
+  - a new entry to `FromStr` with the name of the function as called by SQL
+  - a new line in `return_type` with the expected return type of the function, given an incoming type
+  - a new line in `signature` with the signature of the function (number and types of its arguments)
+  - a new line in `create_physical_expr` mapping the built-in to the implementation
+  - tests to the function.
+- In [tests/sql.rs](datafusion/tests/sql.rs), add a new test where the function is called through SQL against well known data and returns the expected result.
+- In [src/logical_plan/expr](datafusion/src/logical_plan/expr.rs), add:
+  - a new entry of the `unary_scalar_expr!` macro for the new function.
+- In [src/logical_plan/mod](datafusion/src/logical_plan/mod.rs), add:
+  - a new entry in the `pub use expr::{}` set.
 
 ## How to add a new aggregate function
 
 Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
 
-* Add the actual implementation of an `Accumulator` and `AggregateExpr`:
-  * [here](datafusion/src/physical_plan/string_expressions.rs) for string functions
-  * [here](datafusion/src/physical_plan/math_expressions.rs) for math functions
-  * [here](datafusion/src/physical_plan/datetime_expressions.rs) for datetime functions
-  * create a new module [here](datafusion/src/physical_plan) for other functions
-* In [src/physical_plan/aggregates](datafusion/src/physical_plan/aggregates.rs), add:
-  * a new variant to `BuiltinAggregateFunction`
-  * a new entry to `FromStr` with the name of the function as called by SQL
-  * a new line in `return_type` with the expected return type of the function, given an incoming type
-  * a new line in `signature` with the signature of the function (number and types of its arguments)
-  * a new line in `create_aggregate_expr` mapping the built-in to the implementation
-  * tests to the function.
-* In [tests/sql.rs](datafusion/tests/sql.rs), add a new test where the function is called through SQL against well known data and returns the expected result.
+- Add the actual implementation of an `Accumulator` and `AggregateExpr`:
+  - [here](datafusion/src/physical_plan/string_expressions.rs) for string functions
+  - [here](datafusion/src/physical_plan/math_expressions.rs) for math functions
+  - [here](datafusion/src/physical_plan/datetime_expressions.rs) for datetime functions
+  - create a new module [here](datafusion/src/physical_plan) for other functions
+- In [src/physical_plan/aggregates](datafusion/src/physical_plan/aggregates.rs), add:
+  - a new variant to `BuiltinAggregateFunction`
+  - a new entry to `FromStr` with the name of the function as called by SQL
+  - a new line in `return_type` with the expected return type of the function, given an incoming type
+  - a new line in `signature` with the signature of the function (number and types of its arguments)
+  - a new line in `create_aggregate_expr` mapping the built-in to the implementation
+  - tests to the function.
+- In [tests/sql.rs](datafusion/tests/sql.rs), add a new test where the function is called through SQL against well known data and returns the expected result.
 
 ## How to display plans graphically
 
```
README.md

+52 −63

````diff
@@ -30,7 +30,7 @@ logical query plans as well as a query optimizer and execution engine
 capable of parallel execution against partitioned data sources (CSV
 and Parquet) using threads.
 
-DataFusion also supports distributed query execution via the
+DataFusion also supports distributed query execution via the
 [Ballista](ballista/README.md) crate.
 
 ## Use Cases
@@ -42,24 +42,24 @@ the convenience of an SQL interface or a DataFrame API.
 
 ## Why DataFusion?
 
-* *High Performance*: Leveraging Rust and Arrow's memory model, DataFusion achieves very high performance
-* *Easy to Connect*: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem
-* *Easy to Embed*: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific usecase
-* *High Quality*: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
+- _High Performance_: Leveraging Rust and Arrow's memory model, DataFusion achieves very high performance
+- _Easy to Connect_: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem
+- _Easy to Embed_: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific usecase
+- _High Quality_: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
 
 ## Known Uses
 
 Here are some of the projects known to use DataFusion:
 
-* [Ballista](ballista) Distributed Compute Platform
-* [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
-* [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
-* [datafusion-python](https://pypi.org/project/datafusion)
-* [delta-rs](https://github.com/delta-io/delta-rs)
-* [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
-* [ROAPI](https://github.com/roapi/roapi)
-* [Tensorbase](https://github.com/tensorbase/tensorbase)
-* [Squirtle](https://github.com/DSLAM-UMD/Squirtle)
+- [Ballista](ballista) Distributed Compute Platform
+- [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
+- [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
+- [datafusion-python](https://pypi.org/project/datafusion)
+- [delta-rs](https://github.com/delta-io/delta-rs)
+- [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
+- [ROAPI](https://github.com/roapi/roapi)
+- [Tensorbase](https://github.com/tensorbase/tensorbase)
+- [Squirtle](https://github.com/DSLAM-UMD/Squirtle)
 
 (if you know of another project, please submit a PR to add a link!)
@@ -122,8 +122,6 @@ Both of these examples will produce
 +---+--------+
 ```
 
-
-
 ## Using DataFusion as a library
 
 DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
@@ -230,7 +228,6 @@ DataFusion also includes a simple command-line interactive SQL utility. See the
 - [x] Parquet primitive types
 - [ ] Parquet nested types
 
-
 ## Extensibility
 
 DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
@@ -242,35 +239,32 @@ DataFusion is designed to be extensible at all points. To that end, you can prov
 - [x] User Defined `LogicalPlan` nodes
 - [x] User Defined `ExecutionPlan` nodes
 
-
 # Supported SQL
 
 This library currently supports many SQL constructs, including
 
-* `CREATE EXTERNAL TABLE X STORED AS PARQUET LOCATION '...';` to register a table's locations
-* `SELECT ... FROM ...` together with any expression
-* `ALIAS` to name an expression
-* `CAST` to change types, including e.g. `Timestamp(Nanosecond, None)`
-* most mathematical unary and binary expressions such as `+`, `/`, `sqrt`, `tan`, `>=`.
-* `WHERE` to filter
-* `GROUP BY` together with one of the following aggregations: `MIN`, `MAX`, `COUNT`, `SUM`, `AVG`
-* `ORDER BY` together with an expression and optional `ASC` or `DESC` and also optional `NULLS FIRST` or `NULLS LAST`
-
+- `CREATE EXTERNAL TABLE X STORED AS PARQUET LOCATION '...';` to register a table's locations
+- `SELECT ... FROM ...` together with any expression
+- `ALIAS` to name an expression
+- `CAST` to change types, including e.g. `Timestamp(Nanosecond, None)`
+- most mathematical unary and binary expressions such as `+`, `/`, `sqrt`, `tan`, `>=`.
+- `WHERE` to filter
+- `GROUP BY` together with one of the following aggregations: `MIN`, `MAX`, `COUNT`, `SUM`, `AVG`
+- `ORDER BY` together with an expression and optional `ASC` or `DESC` and also optional `NULLS FIRST` or `NULLS LAST`
 
 ## Supported Functions
 
 DataFusion strives to implement a subset of the [PostgreSQL SQL dialect](https://www.postgresql.org/docs/current/functions.html) where possible. We explicitly choose a single dialect to maximize interoperability with other tools and allow reuse of the PostgreSQL documents and tutorials as much as possible.
 
-Currently, only a subset of the PosgreSQL dialect is implemented, and we will document any deviations.
+Currently, only a subset of the PostgreSQL dialect is implemented, and we will document any deviations.
 
 ## Schema Metadata / Information Schema Support
 
 DataFusion supports the showing metadata about the tables available. This information can be accessed using the views of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.
 
 More information can be found in the [Postgres docs](https://www.postgresql.org/docs/13/infoschema-schema.html)).
 
-
-To show tables available for use in DataFusion, use the `SHOW TABLES` command or the `information_schema.tables` view:
+To show tables available for use in DataFusion, use the `SHOW TABLES` command or the `information_schema.tables` view:
 
 ```sql
 > show tables;
@@ -291,7 +285,7 @@ To show tables available for use in DataFusion, use the `SHOW TABLES` command o
 +---------------+--------------------+------------+--------------+
 ```
 
-To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the or `information_schema.columns` view:
+To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the or `information_schema.columns` view:
 
 ```sql
 > show columns from t;
@@ -313,50 +307,45 @@ To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or
 +------------+-------------+------------------+-------------+-----------+
 ```
 
-
-
 ## Supported Data Types
 
 DataFusion uses Arrow, and thus the Arrow type system, for query
 execution. The SQL types from
 [sqlparser-rs](https://github.com/ballista-compute/sqlparser-rs/blob/main/src/ast/data_type.rs#L57)
 are mapped to Arrow types according to the following table
 
-
-| SQL Data Type   | Arrow DataType                   |
-| --------------- | -------------------------------- |
-| `CHAR`          | `Utf8`                           |
-| `VARCHAR`       | `Utf8`                           |
-| `UUID`          | *Not yet supported*              |
-| `CLOB`          | *Not yet supported*              |
-| `BINARY`        | *Not yet supported*              |
-| `VARBINARY`     | *Not yet supported*              |
-| `DECIMAL`       | `Float64`                        |
-| `FLOAT`         | `Float32`                        |
-| `SMALLINT`      | `Int16`                          |
-| `INT`           | `Int32`                          |
-| `BIGINT`        | `Int64`                          |
-| `REAL`          | `Float64`                        |
-| `DOUBLE`        | `Float64`                        |
-| `BOOLEAN`       | `Boolean`                        |
-| `DATE`          | `Date32`                         |
-| `TIME`          | `Time64(TimeUnit::Millisecond)`  |
-| `TIMESTAMP`     | `Date64`                         |
-| `INTERVAL`      | *Not yet supported*              |
-| `REGCLASS`      | *Not yet supported*              |
-| `TEXT`          | *Not yet supported*              |
-| `BYTEA`         | *Not yet supported*              |
-| `CUSTOM`        | *Not yet supported*              |
-| `ARRAY`         | *Not yet supported*              |
-
+| SQL Data Type | Arrow DataType                  |
+| ------------- | ------------------------------- |
+| `CHAR`        | `Utf8`                          |
+| `VARCHAR`     | `Utf8`                          |
+| `UUID`        | _Not yet supported_             |
+| `CLOB`        | _Not yet supported_             |
+| `BINARY`      | _Not yet supported_             |
+| `VARBINARY`   | _Not yet supported_             |
+| `DECIMAL`     | `Float64`                       |
+| `FLOAT`       | `Float32`                       |
+| `SMALLINT`    | `Int16`                         |
+| `INT`         | `Int32`                         |
+| `BIGINT`      | `Int64`                         |
+| `REAL`        | `Float64`                       |
+| `DOUBLE`      | `Float64`                       |
+| `BOOLEAN`     | `Boolean`                       |
+| `DATE`        | `Date32`                        |
+| `TIME`        | `Time64(TimeUnit::Millisecond)` |
+| `TIMESTAMP`   | `Date64`                        |
+| `INTERVAL`    | _Not yet supported_             |
+| `REGCLASS`    | _Not yet supported_             |
+| `TEXT`        | _Not yet supported_             |
+| `BYTEA`       | _Not yet supported_             |
+| `CUSTOM`      | _Not yet supported_             |
+| `ARRAY`       | _Not yet supported_             |
 
 # Architecture Overview
 
 There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact together.
 
-* (March 2021): The DataFusion architecture is described in *Query Engine Design and the Rust-Based DataFusion in Apache Arrow*: [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU) (DataFusion content starts ~ 15 minutes in) and [slides](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)
-* (Feburary 2021): How DataFusion is used within the Ballista Project is described in *Ballista: Distributed Compute with Rust and Apache Arrow: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)
-
+- (March 2021): The DataFusion architecture is described in _Query Engine Design and the Rust-Based DataFusion in Apache Arrow_: [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU) (DataFusion content starts ~ 15 minutes in) and [slides](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)
+- (Feburary 2021): How DataFusion is used within the Ballista Project is described in \*Ballista: Distributed Compute with Rust and Apache Arrow: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)
 
 # Developer's guide
 
````
ballista/README.md

+5 −6

```diff
@@ -19,14 +19,14 @@
 
 # Ballista: Distributed Compute with Apache Arrow
 
-Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built
-on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported as
+Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built
+on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported as
 first-class citizens without paying a penalty for serialization costs.
 
 The foundational technologies in Ballista are:
 
 - [Apache Arrow](https://arrow.apache.org/) memory model and compute kernels for efficient processing of data.
-- [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient
+- [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient
 data transfer between processes.
 - [Google Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing query plans.
 - [Docker](https://www.docker.com/) for packaging up executors along with user-defined code.
@@ -57,7 +57,6 @@ April 2021 and should be considered experimental.
 
 ## Getting Started
 
-The [Ballista Developer Documentation](docs/README.md) and the
-[DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide) are currently the
+The [Ballista Developer Documentation](docs/README.md) and the
+[DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide) are currently the
 best sources of information for getting started with Ballista.
-
```
ballista/docs/README.md

+4 −4

```diff
@@ -16,19 +16,19 @@
 specific language governing permissions and limitations
 under the License.
 -->
+
 # Ballista Developer Documentation
 
-This directory contains documentation for developers that are contributing to Ballista. If you are looking for
-end-user documentation for a published release, please start with the
+This directory contains documentation for developers that are contributing to Ballista. If you are looking for
+end-user documentation for a published release, please start with the
 [DataFusion User Guide](../../docs/user-guide) instead.
 
 ## Architecture & Design
 
-- Read the [Architecture Overview](architecture.md) to get an understanding of the scheduler and executor
+- Read the [Architecture Overview](architecture.md) to get an understanding of the scheduler and executor
 processes and how distributed query execution works.
 
 ## Build, Test, Release
 
 - Setting up a [development environment](dev-env.md).
 - [Integration Testing](integration-testing.md)
-
```