Commit 7fd0c96

Add document about basics of working with expressions (#668)
1 parent 7f27651 commit 7fd0c96

2 files changed: +95 -0 lines changed

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

Expressions
===========

In DataFusion, an expression is an abstraction that represents a computation.
Expressions are the primary inputs and outputs of most functions within
DataFusion. Because functions both accept and return expressions, expressions
can be combined to create expression trees, a concept shared across most
compilers and databases.

Column
------

The first expression most new users will interact with is the Column, which is created by calling :func:`col`.
This expression represents a column within a DataFrame. The function :func:`col` takes a string as its input
and returns an expression as its output.

Literal
-------

Literal expressions represent a single value. These are helpful in a wide range of operations where
a specific, known value is of interest. You can create a literal expression using the function :func:`lit`.
The type of the object passed to :func:`lit` determines the data type of the resulting literal.

In the following example we create expressions for the column named ``color`` and the literal scalar string ``red``.
The resulting variable ``red_units`` is itself also an expression.

.. ipython:: python

    from datafusion import col, lit
    red_units = col("color") == lit("red")
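
To see why ``red_units`` is an expression rather than a Python boolean, the following toy model may help. It is plain Python and is not DataFusion's implementation: the names ``ToyExpr``, ``toy_col``, and ``toy_lit`` are hypothetical and exist only for this sketch. It overloads ``==`` so that a comparison builds a tree node instead of being evaluated immediately.

```python
class ToyExpr:
    """A node in a toy expression tree (hypothetical, for illustration only)."""

    def __init__(self, op, *args):
        self.op = op      # kind of node, e.g. "col", "lit", or "eq"
        self.args = args  # child nodes, or a raw value for leaf nodes

    def __eq__(self, other):
        # Build a new tree node instead of returning True or False.
        return ToyExpr("eq", self, other)

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.args))})"


def toy_col(name):
    # Leaf node standing in for a column reference.
    return ToyExpr("col", name)


def toy_lit(value):
    # Leaf node standing in for a literal value.
    return ToyExpr("lit", value)


toy_red_units = toy_col("color") == toy_lit("red")
print(toy_red_units)  # eq(col('color'), lit('red'))
```

Nothing is compared when the last assignment runs. The result only describes a comparison, which an engine could evaluate later against actual data.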

Boolean
-------

Expressions that evaluate to a boolean value can be combined using boolean operators.
To combine them you *must* use Python's bitwise operators ``&``, ``|``, and ``~`` rather than the
keywords ``and``, ``or``, and ``not``. Because the bitwise operators bind more tightly than comparisons,
wrap each comparison in parentheses. The following examples show the and, or, and not operations.

.. ipython:: python

    red_or_green_units = (col("color") == lit("red")) | (col("color") == lit("green"))
    heavy_red_units = (col("color") == lit("red")) & (col("weight") > lit(42))
    not_red_units = ~(col("color") == lit("red"))
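
The reason for the bitwise operators lies in Python itself: a class can overload ``&``, ``|``, and ``~``, but the keywords ``and``, ``or``, and ``not`` cannot be overloaded. The sketch below illustrates this with a toy class. ``ToyExpr`` is a hypothetical name and this is not how DataFusion implements its expressions.

```python
class ToyExpr:
    """Toy boolean expression that records its own text (illustration only)."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):  # invoked by `a & b`
        return ToyExpr(f"({self.text} AND {other.text})")

    def __or__(self, other):  # invoked by `a | b`
        return ToyExpr(f"({self.text} OR {other.text})")

    def __invert__(self):  # invoked by `~a`
        return ToyExpr(f"(NOT {self.text})")


is_red = ToyExpr("color = 'red'")
is_heavy = ToyExpr("weight > 42")

combined = is_red & is_heavy
either = is_red | is_heavy
negated = ~is_red
print(combined.text)  # (color = 'red' AND weight > 42)
print(either.text)    # (color = 'red' OR weight > 42)
print(negated.text)   # (NOT color = 'red')

# The keyword `and` never calls __and__. It returns one of its operands
# unchanged, so the first condition is silently lost.
lost = is_red and is_heavy
print(lost is is_heavy)  # True
```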

Functions
---------

As mentioned before, most functions in DataFusion return an expression as their output. This allows us to create
a wide variety of expressions built up from other expressions. For example, :func:`.alias` is a function that takes
a single expression as its input and returns an expression in which the name of the expression has changed.

The following example shows a series of expressions that are built up from functions operating on expressions.

.. ipython:: python

    from datafusion import SessionContext
    from datafusion import col, lit

    ctx = SessionContext()
    df = ctx.from_pydict(
        {
            "name": ["Albert", "Becca", "Carlos", "Dante"],
            "age": [42, 67, 27, 71],
            "years_in_position": [13, 21, 10, 54],
        },
        name="employees"
    )

    # Each line below builds a new expression from earlier ones.
    age_col = col("age")
    renamed_age = age_col.alias("age_in_years")
    start_age = age_col - col("years_in_position")
    started_young = start_age < lit(18)
    can_retire = age_col > lit(65)
    long_timer = started_young & can_retire

    df.filter(long_timer).select(col("name"), renamed_age, col("years_in_position"))
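
As a cross-check on what the expressions above describe, the same logic can be written out row by row in plain Python. This is only a sketch: it assumes the filter acts as an ordinary row-wise predicate and the select as a projection, and it does not use DataFusion at all.

```python
rows = [
    {"name": "Albert", "age": 42, "years_in_position": 13},
    {"name": "Becca", "age": 67, "years_in_position": 21},
    {"name": "Carlos", "age": 27, "years_in_position": 10},
    {"name": "Dante", "age": 71, "years_in_position": 54},
]

long_timers = []
for row in rows:
    start_age = row["age"] - row["years_in_position"]
    started_young = start_age < 18
    can_retire = row["age"] > 65
    if started_young and can_retire:  # plain booleans here, so `and` is fine
        long_timers.append(
            {
                "name": row["name"],
                "age_in_years": row["age"],  # mirrors the alias
                "years_in_position": row["years_in_position"],
            }
        )

print(long_timers)
# [{'name': 'Dante', 'age_in_years': 71, 'years_in_position': 54}]
```

Only Dante both started before age 18 (71 - 54 = 17) and is older than 65. Carlos also started at 17 but cannot retire yet, and Becca can retire but started at 46.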

docs/source/user-guide/common-operations/index.rst

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ Common Operations

    basic-info
    select-and-filter
+   expressions
    joins
    functions
    aggregations
