.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

Expressions
===========

In DataFusion, an expression is an abstraction that represents a computation.
Expressions are the primary inputs and outputs of most functions in DataFusion.
Because functions both accept and return expressions, expressions can be nested
inside one another to form expression trees, a concept shared by most compilers
and databases.
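
To make the idea of an expression tree concrete, the sketch below builds a tiny tree with Python operator
overloading. The ``Node``, ``ColumnNode``, ``LiteralNode``, and ``BinaryNode`` classes are hypothetical teaching
stand-ins, not DataFusion's expression implementation; the point is only that each operator returns a new node
instead of computing a value.

.. code-block:: python

    class Node:
        """Hypothetical base class: operators build new nodes instead of computing values."""

        def __add__(self, other: "Node") -> "BinaryNode":
            return BinaryNode("+", self, other)

        def __gt__(self, other: "Node") -> "BinaryNode":
            return BinaryNode(">", self, other)


    class ColumnNode(Node):
        """Leaf that refers to a column by name."""

        def __init__(self, name: str) -> None:
            self.name = name

        def __str__(self) -> str:
            return self.name


    class LiteralNode(Node):
        """Leaf that holds a single constant value."""

        def __init__(self, value: object) -> None:
            self.value = value

        def __str__(self) -> str:
            return repr(self.value)


    class BinaryNode(Node):
        """Inner node that combines two child nodes with an operator."""

        def __init__(self, op: str, left: Node, right: Node) -> None:
            self.op = op
            self.left = left
            self.right = right

        def __str__(self) -> str:
            return f"({self.left} {self.op} {self.right})"


    # `+` runs first and returns a BinaryNode; `>` then wraps that node in another one.
    tree = ColumnNode("age") + LiteralNode(1) > LiteralNode(65)
    print(tree)  # ((age + 1) > 65)

DataFusion expressions follow the same pattern: ``col("age") + lit(1)`` returns a new expression rather than a number.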

Column
------

The first expression most new users will interact with is the Column, which is created by calling :func:`col`.
This expression represents a column within a DataFrame. The function :func:`col` takes the column name as a string
and returns an expression.
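
As a small sketch, assuming the ``datafusion`` Python package is installed, the snippet below selects a single
column by name. The column names ``a`` and ``b`` and their values are made up for illustration. :func:`col` only
describes the column; the data is read when the DataFrame is collected with ``to_pydict()``.

.. code-block:: python

    from datafusion import SessionContext, col

    ctx = SessionContext()
    # Hypothetical in-memory data with two integer columns.
    df = ctx.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})

    # col("a") is an expression naming column "a"; select() returns a new,
    # lazily evaluated DataFrame containing only that column.
    only_a = df.select(col("a"))

    # Collecting executes the plan and returns a dict of column name -> values.
    print(only_a.to_pydict())  # {'a': [1, 2, 3]}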

Literal
-------

Literal expressions represent a single value. They are useful in a wide range of operations where
a specific, known value is of interest, such as comparing a column against a constant. You can create a
literal expression using the function :func:`lit`. The Python type of the value passed to :func:`lit`
determines the data type of the resulting literal.
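
The type mapping can be checked by selecting a few literals and inspecting the resulting Arrow schema. This is a
sketch assuming the ``datafusion`` package (which depends on ``pyarrow``) is installed; the aliases ``i``, ``f``,
and ``b`` are arbitrary names chosen for the example. Python ``int``, ``float``, and ``bool`` values are expected to
become 64-bit integer, 64-bit float, and boolean literals respectively.

.. code-block:: python

    import pyarrow as pa
    from datafusion import SessionContext, lit

    ctx = SessionContext()
    # A one-row DataFrame to project literals against (its data is not used).
    df = ctx.from_pydict({"x": [0]})

    literals = df.select(
        lit(1).alias("i"),
        lit(1.5).alias("f"),
        lit(True).alias("b"),
    )

    # DataFrame.schema() returns a pyarrow.Schema describing the output columns.
    schema = literals.schema()
    print(schema.field("i").type, schema.field("f").type, schema.field("b").type)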

In the following example we create expressions for the column named ``color`` and the literal scalar string ``red``.
The resulting variable ``red_units`` is itself also an expression.

.. ipython:: python

    from datafusion import col, lit

    red_units = col("color") == lit("red")

Boolean
-------

Expressions that evaluate to a boolean value can be combined using boolean operators. Note that you *must* use
the bitwise operators ``&`` (and), ``|`` (or), and ``~`` (not) rather than Python's ``and``, ``or``, and ``not``
keywords, because Python does not allow those keywords to be overloaded. The bitwise operators also bind more
tightly than comparisons such as ``==`` and ``>``, so wrap each comparison in parentheses. The following examples
show the and, or, and not operations.

.. ipython:: python

    red_or_green_units = (col("color") == lit("red")) | (col("color") == lit("green"))
    heavy_red_units = (col("color") == lit("red")) & (col("weight") > lit(42))
    not_red_units = ~(col("color") == lit("red"))
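
The reason for the bitwise operators can be shown with plain Python. In the sketch below, ``Pred`` is a
hypothetical stand-in for an expression object, not DataFusion's expression class: ``&`` calls the class's
``__and__`` method and builds a new node, while the ``and`` keyword only checks truthiness and returns one of its
operands unchanged. The last lines use ordinary integers to show why each comparison needs its own parentheses.

.. code-block:: python

    class Pred:
        """Hypothetical stand-in for a boolean expression (not DataFusion's Expr)."""

        def __init__(self, text: str) -> None:
            self.text = text

        def __and__(self, other: "Pred") -> "Pred":
            return Pred(f"({self.text} AND {other.text})")

        def __or__(self, other: "Pred") -> "Pred":
            return Pred(f"({self.text} OR {other.text})")

        def __invert__(self) -> "Pred":
            return Pred(f"(NOT {self.text})")


    is_red = Pred("color = 'red'")
    is_heavy = Pred("weight > 42")

    # Overloadable operators build a combined predicate.
    print((is_red & is_heavy).text)  # (color = 'red' AND weight > 42)
    print((~is_red).text)            # (NOT color = 'red')

    # `and` cannot be overloaded: Python checks bool(is_red), which is True for
    # any ordinary object, and then returns is_heavy, silently dropping is_red.
    wrong = is_red and is_heavy
    print(wrong is is_heavy)         # True

    # Precedence: `|` binds tighter than `==`, so without parentheses the line
    # below is parsed as the chained comparison x == (1 | y) == 2.
    x, y = 1, 2
    print(x == 1 | y == 2)           # False
    print((x == 1) | (y == 2))       # True

The same reasoning applies to DataFusion expressions, which is why the examples above use ``&``, ``|``, and ``~``
with every comparison parenthesized.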

Functions
---------

As mentioned before, most functions in DataFusion return an expression as their output. This allows us to create
a wide variety of expressions built up from other expressions. For example, :func:`.alias` is called on an
expression and returns a new expression that computes the same value under a different name.

The following example shows a series of expressions that are built up from functions operating on expressions.

.. ipython:: python

    from datafusion import SessionContext
    from datafusion import col, lit

    ctx = SessionContext()
    df = ctx.from_pydict(
        {
            "name": ["Albert", "Becca", "Carlos", "Dante"],
            "age": [42, 67, 27, 71],
            "years_in_position": [13, 21, 10, 54],
        },
        name="employees"
    )

    age_col = col("age")
    renamed_age = age_col.alias("age_in_years")
    start_age = age_col - col("years_in_position")
    started_young = start_age < lit(18)
    can_retire = age_col > lit(65)
    long_timer = started_young & can_retire

    df.filter(long_timer).select(col("name"), renamed_age, col("years_in_position"))
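
To check which employees satisfy each condition, the intermediate expressions can be evaluated side by side before
filtering. The sketch below repeats the setup so it runs on its own (assuming the ``datafusion`` package is
installed) and uses :func:`.alias` to give each computed column a readable name; collecting with ``to_pydict()``
returns plain Python lists.

.. code-block:: python

    from datafusion import SessionContext, col, lit

    ctx = SessionContext()
    df = ctx.from_pydict(
        {
            "name": ["Albert", "Becca", "Carlos", "Dante"],
            "age": [42, 67, 27, 71],
            "years_in_position": [13, 21, 10, 54],
        }
    )

    age_col = col("age")
    start_age = age_col - col("years_in_position")
    started_young = start_age < lit(18)
    can_retire = age_col > lit(65)

    # Evaluate each intermediate expression as its own named column.
    checks = df.select(
        col("name"),
        start_age.alias("start_age"),
        started_young.alias("started_young"),
        can_retire.alias("can_retire"),
    ).to_pydict()

    # Print one tuple per employee: (name, start_age, started_young, can_retire).
    for row in zip(*checks.values()):
        print(row)

    # Only rows where both conditions hold survive the filter.
    long_timers = df.filter(started_young & can_retire).select(col("name")).to_pydict()
    print(long_timers)  # {'name': ['Dante']}

Carlos also started young, but at 27 he is not yet past the retirement threshold, so only Dante remains.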