docs/logical-analysis-rules/FindDataSourceTable.md (78 additions & 12 deletions)
title: FindDataSourceTable
---

# FindDataSourceTable Logical Resolution Rule
`FindDataSourceTable` is a [Catalyst rule](../catalyst/Rule.md) to [resolve UnresolvedCatalogRelation logical operators](#apply) (of Spark and Hive tables) in a logical query plan (`Rule[LogicalPlan]`).

`FindDataSourceTable` is used by the [Hive](../hive/HiveSessionStateBuilder.md#analyzer) and [Spark](../BaseSessionStateBuilder.md#analyzer) Analyzers as part of their [extendedResolutionRules](../Analyzer.md#extendedResolutionRules).
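A quick way to see the rule registered is to inspect the analyzer of an active `SparkSession`. The check below is illustrative only (it assumes a plain `spark-shell` session bound to the `spark` value):

```scala
import org.apache.spark.sql.execution.datasources.FindDataSourceTable

// FindDataSourceTable should be among the analyzer's extended resolution rules
val rules = spark.sessionState.analyzer.extendedResolutionRules
println(rules.exists(_.isInstanceOf[FindDataSourceTable]))  // expected: true
```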
## Creating Instance

`FindDataSourceTable` takes the following to be created:

* <span id="sparkSession"> [SparkSession](../SparkSession.md)

`FindDataSourceTable` is created when:

* `HiveSessionStateBuilder` is requested for the [Analyzer](../hive/HiveSessionStateBuilder.md#analyzer)
* `BaseSessionStateBuilder` is requested for the [Analyzer](../BaseSessionStateBuilder.md#analyzer)
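For experimentation, the rule can also be created by hand with an active `SparkSession` (a minimal sketch; normally the session state builders above do this for you):

```scala
import org.apache.spark.sql.execution.datasources.FindDataSourceTable

val findDataSourceTable = new FindDataSourceTable(spark)
```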
## Execute Rule { #apply }

??? note "Rule"

    ```scala
    apply(
      plan: LogicalPlan): LogicalPlan
    ```

    `apply` is part of the [Rule](../catalyst/Rule.md#apply) abstraction.
`apply` traverses the given [LogicalPlan](../logical-operators/LogicalPlan.md) (from top to leaves) to resolve `UnresolvedCatalogRelation`s of the following logical operators (see the sketch right after the list):

1. [InsertIntoStatement](../logical-operators/InsertIntoStatement.md) with a non-streaming `UnresolvedCatalogRelation` of a [Spark (DataSource) table](../connectors/DDLUtils.md#isDatasourceTable)
1. [InsertIntoStatement](../logical-operators/InsertIntoStatement.md) with a non-streaming `UnresolvedCatalogRelation` of a Hive table
1. [AppendData](../logical-operators/AppendData.md) (that is not [by name](../logical-operators/AppendData.md#isByName)) with a [DataSourceV2Relation](../logical-operators/DataSourceV2Relation.md) of a [V1Table](../connector/V1Table.md)
1. A non-streaming `UnresolvedCatalogRelation` of a [Spark (DataSource) table](../connectors/DDLUtils.md#isDatasourceTable)
1. A non-streaming `UnresolvedCatalogRelation` of a Hive table
1. A streaming `UnresolvedCatalogRelation`
1. A `StreamingRelationV2` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/StreamingRelationV2/)) over a streaming `UnresolvedCatalogRelation`
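The sketch below applies the rule directly to a hand-built `UnresolvedCatalogRelation`. It is illustrative only: the table name `demo_t1` is an assumption, and a regular query gets the same treatment from the analyzer without any manual steps:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.UnresolvedCatalogRelation
import org.apache.spark.sql.execution.datasources.FindDataSourceTable

// A data source (non-Hive) table to experiment with
spark.sql("CREATE TABLE IF NOT EXISTS demo_t1 (id LONG) USING parquet")

// Wrap the table metadata in a non-streaming UnresolvedCatalogRelation
val tableMeta = spark.sessionState.catalog.getTableMetadata(TableIdentifier("demo_t1"))
val plan = UnresolvedCatalogRelation(tableMeta)

// FindDataSourceTable replaces the UnresolvedCatalogRelation with a LogicalRelation
val rule = new FindDataSourceTable(spark)
val resolved = rule(plan)
println(resolved.numberedTreeString)
```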
??? note "Streaming and Non-Streaming `UnresolvedCatalogRelation`s"
    The difference between streaming and non-streaming `UnresolvedCatalogRelation`s is the [isStreaming](../logical-operators/LogicalPlan.md#isStreaming) flag that is disabled (`false`) by default.
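A tiny illustration of the flag, assuming Spark 3.1+ (where `UnresolvedCatalogRelation` carries `options` and `isStreaming`) and the `demo_t1` table created earlier:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.UnresolvedCatalogRelation
import org.apache.spark.sql.util.CaseInsensitiveStringMap

val meta = spark.sessionState.catalog.getTableMetadata(TableIdentifier("demo_t1"))

val batchRelation = UnresolvedCatalogRelation(meta)  // isStreaming is false by default
val streamingRelation = UnresolvedCatalogRelation(meta, CaseInsensitiveStringMap.empty(), isStreaming = true)

println(batchRelation.isStreaming)      // false
println(streamingRelation.isStreaming)  // true
```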
## Creating StreamingRelation { #getStreamingRelation }

`getStreamingRelation` creates a `StreamingRelation` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/StreamingRelation/)) with a [DataSource](../DataSource.md#creating-instance) with the following:

Property | Value
-|-
[DataSource provider](../DataSource.md#className) | The [provider](../CatalogTable.md#provider) of the given [CatalogTable](../CatalogTable.md)
[User-specified schema](../DataSource.md#userSpecifiedSchema) | The [schema](../CatalogTable.md#schema) of the given [CatalogTable](../CatalogTable.md)
[Options](../DataSource.md#options) | [DataSource options](../connectors/DataSourceUtils.md#generateDatasourceOptions) based on the given `extraOptions` and the [CatalogTable](../CatalogTable.md)
[CatalogTable](../DataSource.md#catalogTable) | The given [CatalogTable](../CatalogTable.md)

---

`getStreamingRelation` is used when:

* `FindDataSourceTable` is requested to resolve streaming `UnresolvedCatalogRelation`s
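For example (illustrative; it assumes the file-based data source table `demo_t1` created earlier and Spark 3.1+ for `DataStreamReader.table`), reading a catalog table as a stream produces a streaming `UnresolvedCatalogRelation` that `getStreamingRelation` turns into a `StreamingRelation`:

```scala
val streamingDF = spark.readStream.table("demo_t1")

// The analyzed plan shows the StreamingRelation created for the table
println(streamingDF.queryExecution.analyzed.numberedTreeString)
```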
## Demo

```text
scala> :type spark
org.apache.spark.sql.SparkSession
```

```scala
// Example: InsertIntoTable with UnresolvedCatalogRelation
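// A minimal, illustrative reconstruction of such a demo; the table names t1 and t2
// are assumptions.

// Create two data source tables to play with
spark.sql("CREATE TABLE IF NOT EXISTS t1 (id LONG) USING parquet")
spark.sql("CREATE TABLE IF NOT EXISTS t2 (id LONG) USING parquet")

// Parse an INSERT statement: the parsed plan contains an InsertIntoStatement
// over unresolved relations
val parsed = spark.sessionState.sqlParser.parsePlan("INSERT INTO t2 SELECT * FROM t1")
println(parsed.numberedTreeString)

// Run the analyzer: among the resolution rules, FindDataSourceTable resolves the
// UnresolvedCatalogRelations of t1 and t2 to LogicalRelation operators (the insert
// itself may be further planned into a data source command)
val analyzed = spark.sessionState.analyzer.execute(parsed)
println(analyzed.numberedTreeString)
```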
`apply` resolves `UnresolvedCatalogRelation`s for Spark (Data Source) and Hive tables:

* `apply` [creates LogicalRelation logical operators](#readDataSourceTable) for `UnresolvedCatalogRelation`s of Spark tables (incl. `InsertIntoTable`s)

* `apply` [creates HiveTableRelation logical operators](#readHiveTable) for `InsertIntoTable`s with an `UnresolvedCatalogRelation` of a Hive table or `UnresolvedCatalogRelation`s of a Hive table

`readDataSourceTable` first looks the table up (by its qualified name) in the session catalog's cache of table relations. If not available, `readDataSourceTable` [creates a new DataSource](../DataSource.md) for the [provider](../CatalogTable.md#provider) (of the input `CatalogTable`) with the extra `path` option (based on the `locationUri` of the [storage](../CatalogTable.md#storage) of the input `CatalogTable`). `readDataSourceTable` then requests the `DataSource` to [resolve the relation and create a corresponding BaseRelation](../DataSource.md#resolveRelation) that is used to create a [LogicalRelation](../logical-operators/LogicalRelation.md) with the input [CatalogTable](../CatalogTable.md).

NOTE: `readDataSourceTable` is used when `FindDataSourceTable` is requested to [resolve an `UnresolvedCatalogRelation` in a logical plan](#apply) (for data source tables).
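A simplified sketch of the non-cached path described above (an assumed shape for illustration, not the exact Spark source; the `path` handling in particular is simplified):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.execution.datasources.{DataSource, LogicalRelation}

def readDataSourceTableSketch(spark: SparkSession, table: CatalogTable): LogicalRelation = {
  // Extra `path` option derived from the table's storage location, if any
  val pathOption = table.storage.locationUri.map("path" -> _.toString)

  val dataSource = DataSource(
    spark,
    className = table.provider.get,                 // the table's data source provider
    userSpecifiedSchema = Some(table.schema),
    partitionColumns = table.partitionColumnNames,
    bucketSpec = table.bucketSpec,
    options = table.storage.properties ++ pathOption,
    catalogTable = Some(table))

  // Resolve the relation and wrap it in a LogicalRelation with the table metadata
  LogicalRelation(dataSource.resolveRelation(checkFilesExist = false), table)
}
```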