
Merge latest master #29

Merged: 47 commits, Jan 21, 2025

Conversation

@kunwp1 commented Jan 21, 2025

No description provided.

Yicong-Huang and others added 30 commits December 15, 2024 15:51
We are excited to announce texera.io as the official website for the
Texera project.
- Redundant sections such as “Motivation,” “Education,” and “Videos”
have been removed.
- The "Goals" and “Publications” (including both computer science and
interdisciplinary) section, is retained for some redundancy for now. We
will consider to remove them later.
- The blog has been relocated to a new dedicated home under texera.io
for better accessibility and organization. The old blog site is
deprecated now.
This operator's design requires further discussion. The current issues
are:
1. It accesses the workflow context (i.e., workflowId and
executionId), which may not be permitted in the future.
2. It directly interacts with the local file system.

For the short term, we will drop support for this downloader. We can
discuss how to support it in the future when needed.
This PR removes the obsolete `core/util` package, which was used to
generate jOOQ code. The generation logic has been moved to `dao` as the
`JooqCodeGenerator` class.
## Purpose
Address #3042
Visualization result panels could easily become "sticky": a panel would
continue to be dragged even though the user was no longer holding down
the mouse. This PR fixes this issue.
## Changes
Add logic to modify the z-index of visualization results directly. While
a panel is being dragged, the visualization result's z-index is set to
-1; once the drag ends, the z-index is restored to 0 so that the user
can interact with the visualization again.
## Demo
Before

![chrome_svDxLZYYvl](https://github.com/user-attachments/assets/8299e19d-9c66-4d9c-a40f-3fed8f7823be)
After

![R3oNSHxH8J](https://github.com/user-attachments/assets/65069f65-3995-435a-88f6-d7cc5b66ddf1)

---------

Co-authored-by: Xinyuan Lin <[email protected]>
As titled, this PR removes the redundant jOOQ code in the package
`edu.uci.ics.texera.web.model.jooq.generated` in `core/amber`. All
related imports now point to
`edu.uci.ics.texera.dao.jooq.generated` in `core/dao`.
This PR updates the links in the README to point to texera.io.
1. Add a link to the texera.io landing page on the Texera logo.
2. Make the URLs more meaningful, instead of numerical IDs.
3. Add links to publications, videos, etc.
This PR removes the cache source descriptor. We now explicitly create a
physical operator to read the cache during scheduling (cutting off
physical links for materialization).
### Purpose:
Since "Hub" has been chosen as the more formal name for the community,
the previous name needs to be replaced with "Hub."

### Change:
The 'Community' entry in the left sidebar has been replaced with 'Hub'.

### Demo:
Before:

![image](https://github.com/user-attachments/assets/29604593-7453-43d7-ab80-1f37637752f7)

After:

![image](https://github.com/user-attachments/assets/ac41b971-9f10-4e35-b0a0-8ba37bd200a9)
…bles. (#3168)

### Purpose:
Although a column indicating whether an item is public already exists in
both the dataset and workflow tables, the columns have different names
and types.

### Change:
Modified the dataset and workflow tables to unify the format of
is_public across different tables. **To complete the database update,
please execute `18.sql` located in the update folder.**
This PR unifies the design of the output mode and makes it a property of
an output port.

Previously we had two modes:
1. SET_SNAPSHOT (return a snapshot of a table)
2. SET_DELTA (return a delta of tuples) 

And there were different chart types (e.g., HTML, bar chart, line
chart). However, we only use the HTML type after switching to pyplot.

Additionally, the output mode and chart type were associated with a
logical operator and passed along to the downstream sink operator; this
design does not support operators with multiple output ports.

### New design
1. Move OutputMode onto an output port's property.
2. Unify to three modes:
    a. SET_SNAPSHOT (return a snapshot of a table)
    b. SET_DELTA (return a delta of tuples)
    c. SINGLE_SNAPSHOT (only used for visualizations to return a html)
SINGLE_SNAPSHOT is needed for now because we need a way to differentiate
an HTML output from a normal data table output. This is because MongoDB
storage is limited to 16 MB per document, and HTML outputs are usually
larger than 16 MB. After we remove this storage limitation, we will
remove SINGLE_SNAPSHOT and fall back to SET_SNAPSHOT.
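
As a rough sketch (names are illustrative and may differ from the actual Texera definitions), the port-level output mode could look like:

```scala
// Illustrative sketch only; the actual Texera definitions may differ.
sealed trait OutputMode
object OutputMode {
  case object SET_SNAPSHOT extends OutputMode    // a snapshot of a table
  case object SET_DELTA extends OutputMode       // a delta of tuples
  case object SINGLE_SNAPSHOT extends OutputMode // a single HTML result, for visualizations
}

// The mode now lives on the output port rather than on the logical operator,
// so an operator with multiple output ports can use a different mode per port.
case class OutputPort(
    id: Int,
    displayName: String,
    mode: OutputMode = OutputMode.SET_SNAPSHOT
)
```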
This PR refactors the handling of sink operators in Texera by removing
the sink descriptor and introducing a streamlined approach to creating
physical sink operators during compilation and scheduling. Additionally,
it shifts the storage assignment logic from the logical layer to the
physical layer.

1. **Sink Descriptor Removal:** Removed the sink descriptor; physical
sink operators are no longer created through descriptors. In the future,
we will remove physical sink operators entirely.
2. **Sink Operator Creation:**
- Introduced a temporary factory for creating physical sink operators
without relying on a descriptor.
- Physical sink operators are now considered part of the sub-plan of
their upstream logical operator. For example: If the HashJoin logical
operator requires a sink, its physical sub-plan includes the building
physicalOp, probing physicalOp, and the sink physicalOp.
3. **Storage Assignment Refactor:**
- Merged the storage assignment logic into the physical layer, removing
it from the logical layer.
- When a physical sink operator is created (either during compilation or
scheduling), its associated storage is also created at the same moment.
This PR fixes the issue that, when MongoDB is used as the result
storage, the execution keeps failing.

The root cause: when using MongoDB as the result storage, the schema has
to be extracted from the physical op. The implementation used the
physical ops in `subPlan` to extract the schema, which is incorrect
because `subPlan`'s physical ops do not have their output schemas
propagated.

The fix: use `physicalPlan.getOperator` instead of `subPlan.getOperator`,
so that the operator flowing to the downstream comes from `physicalPlan`,
not `subPlan`.
This PR fixes an issue where operators marked for viewing results did
not display the expected results due to a mismatch caused by
string-based comparison instead of using operator identities. By
updating the matching logic to rely on operator identities, this change
ensures accurate identification of marked operators and correct display
of their results.
Schema propagation is now handled in the physical plan to ensure all
ports, both external and internal, have an associated schema. As a
result, schema propagation in the logical plan is no longer necessary.

In addition, the workflow context was used to provide user information
so that schema propagation could resolve file names. This is also no
longer needed, as files are now resolved through an explicit call as a
step of compilation. The WorkflowContext no longer needs to be set on
the LogicalOperator.
The cache logic will be rewritten at the physical plan layer, using
ports as the caching unit. This version is being removed temporarily.
Previously, all results were tied to logical operators. This PR modifies
the engine to associate results with output ports, enabling better
granularity and support for operators with multiple outputs.

### Key Change: StorageKey with PortIdentity

The most significant update in this PR is the adjustment of the storage
key format to include both the logical operator ID and the port ID. This
ensures that logical operators with multiple output ports (e.g., Split)
can have distinct storages created for each output port.

For now, the frontend retrieves results from the default output port
(port 0). In future updates, the frontend will be enhanced to support
retrieving results from additional output ports, providing more
flexibility in how results are accessed and displayed.
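
A minimal sketch of the adjusted key format (the real key construction in the codebase may differ):

```scala
// Illustrative sketch: a storage key combining the logical operator ID and the
// port ID, so each output port of an operator (e.g., Split) gets its own storage.
case class PortIdentity(id: Int = 0, internal: Boolean = false)

def storageKey(logicalOpId: String, portId: PortIdentity): String =
  s"${logicalOpId}_port_${portId.id}"

// Example: the two output ports of a Split operator map to distinct storages.
storageKey("Split-operator-1", PortIdentity(0)) // "Split-operator-1_port_0"
storageKey("Split-operator-1", PortIdentity(1)) // "Split-operator-1_port_1"
```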
There are a few operator descriptors written in Java, which makes them
difficult to use and maintain. This PR converts all such descriptors to
Scala to streamline the migration process to new APIs and facilitate
future work, such as operator offloading.

Changed Operators:
- PythonUDFSourceOpDescV2
- RUDFSourceOpDesc
- SentimentAnalysisOpDesc
- SpecializedFilterOpDesc
- TypeCastingOpDesc
…de-based IDEs (#3180)

Add Scala ([Metals](https://github.com/scalameta/metals)) generated folders to .gitignore to support the VSCode and Cursor IDEs. Metals is a popular plugin for developing Scala applications in VSCode-based IDEs.
This PR moves all protobuf definitions under workflow-core into the core
package name. The scalapb proto definition was present in both
workflow-core and amber; this PR removes the second copy.
The Python protobuf-generated code was outdated. This PR updates the
generation script to include all protobuf definitions from the
workflow-core and amber sub-projects, ensuring that the latest Python
code is generated and aligned with the current protobuf definitions.
Previously, creating a physical operator during compilation also
required the creation of its corresponding executor instances. To delay
this process so that executor instances were created within workers, we
used a lambda function (in `OpExecInitInfo`). However, the lambda
approach had a critical limitation: it was neither serializable nor
language-independent.

This PR addresses this issue by replacing the lambda functions in
`OpExecInitInfo` with fully serializable Protobuf entities. The
serialized information now ensures compatibility with distributed
environments and is language-independent. Two primary types of
`OpExecInitInfo` are introduced:

1. **`OpExecWithClassName`**:
   - **Fields**: `className: String`, `descString: String`.
- **Behavior**: The language compiler dynamically loads the class
specified by `className` and uses `descString` as its initialization
argument.

2. **`OpExecWithCode`**:
   - **Fields**: `code: String`, `language: String`.
- **Behavior**: The language compiler compiles the provided `code` based
on the specified `language`. The arguments are already pre-populated
into the code string.
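
Sketched in plain Scala (in the codebase these are serializable Protobuf-generated entities, not plain case classes):

```scala
// Illustrative sketch of the two OpExecInitInfo variants described above.
sealed trait OpExecInitInfo

// The language compiler dynamically loads `className` and passes `descString`
// as the initialization argument.
case class OpExecWithClassName(className: String, descString: String)
    extends OpExecInitInfo

// The language compiler compiles `code` for the given `language`; the arguments
// are already pre-populated into the code string.
case class OpExecWithCode(code: String, language: String) extends OpExecInitInfo
```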

### Special Cases

The `ProgressiveSink` and `CacheSource` executors are treated as special
cases. These executors require additional unique information (e.g.,
`storageKey`, `workflowIdentity`, `outputMode`) to initialize their
executor instances. While this PR preserves the handling of these
special cases, these executors will eventually be refactored or removed
as part of the plan to move storage management to the port layer.
This PR improves exception handling in AsyncRPCServer to unwrap the
actual exception from InvocationTargetException.
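
The unwrapping pattern is roughly the following (a sketch, not the exact AsyncRPCServer code):

```scala
import java.lang.reflect.InvocationTargetException

// Illustrative sketch: surface the root cause instead of the reflective wrapper.
def unwrap(throwable: Throwable): Throwable = throwable match {
  case e: InvocationTargetException if e.getCause != null => unwrap(e.getCause)
  case other                                              => other
}
```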

Old:
<img width="889" alt="截屏2024-12-31 上午2 38 34"
src="https://github.com/user-attachments/assets/8ec40cce-1b7a-4ecc-8518-a67b7e79888b"
/>


New:
<img width="1055" alt="截屏2024-12-31 上午2 33 18"
src="https://github.com/user-attachments/assets/de1c3e42-0dbb-4dbe-8b76-dee486d5bbb9"
/>
This PR removes all schema propagation functions from the logical plan.
Developers are now required to implement `SchemaPropagationFunc`
directly within the PhysicalPlan. This ensures that each PhysicalOp has
its own distinct schema propagation logic, aligning schema handling more
closely with the execution layer.

To accommodate the need for schema propagation in the logical plan
(primarily for testing purposes), a new method,
`getExternalOutputSchemas`, has been introduced. This method facilitates
the propagation of schemas across all PhysicalOps within a logical
operator, ensuring compatibility with existing testing workflows.
Each port must have exactly one schema. If multiple links are connected
to the same port, they are required to share the same schema. This PR
introduces a validation step during schema propagation to ensure this
constraint is enforced as part of the compilation process.

For example, consider a Union operator with a single input port that
supports multiple links. If upstream operators produce differing output
schemas, the validation will fail with an appropriate error message:
![CleanShot 2024-12-31 at 14:56:36](https://github.com/user-attachments/assets/077594b4-26fb-4983-9ce4-d7c67365645e)
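
Conceptually, the validation works like this sketch (types are simplified):

```scala
// Illustrative sketch: all links arriving at the same input port must agree on schema.
case class Schema(attributes: List[(String, String)]) // (name, type) pairs, simplified

def validatePortSchema(portId: Int, upstreamSchemas: Seq[Schema]): Schema = {
  val distinct = upstreamSchemas.distinct
  require(
    distinct.size == 1,
    s"Port $portId received ${distinct.size} different schemas; " +
      "all links connected to one port must share the same schema."
  )
  distinct.head
}
```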
#### This PR introduces the `CostEstimator` trait which estimates the
cost of a region, given some resource units.
- The cost estimator is used by `CostBasedScheduleGenerator` to
calculate the cost of a schedule during search.
- Currently we only consider one type of schedule for each region plan,
which is a total order of the regions. The cost of the schedule (and
also the cost of the region plan) is thus the summation of the cost of
each region.
- The resource units are currently passed as placeholders because we
assume a region will have all the resources when doing the estimation.
The units may be used in the future if we consider different methods of
schedule-generation. For example, if we allow two regions to run
concurrently, the units will be split in half for each region.

#### A `DefaultCostEstimator` implementation is also added, which uses
past execution statistics to estimate the wall-clock runtime of a
region:
- The runtime of each region is represented by the runtime of its
longest-running operator.
- The runtime of operators are estimated using the statistics from the
**latest successful execution** of the workflow.
- If such statistics do not exist (e.g., if this is the first execution,
or if all past executions failed), we fall back to using the number of
materialized edges as the cost.
- Added test cases using mock MySQL data.
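
A simplified sketch of the trait and the default estimator's fallback behavior (signatures are illustrative):

```scala
// Simplified region stand-in for this sketch.
case class Region(operatorIds: Seq[String], materializedEdgeCount: Int)

// Illustrative sketch of the CostEstimator trait described above.
trait CostEstimator {
  // Estimate the cost of a region, given some resource units (currently placeholders).
  def estimate(region: Region, resourceUnits: Int): Double
}

class DefaultCostEstimator(pastRuntimes: Map[String, Double]) extends CostEstimator {
  override def estimate(region: Region, resourceUnits: Int): Double =
    // The region's runtime is represented by its longest-running operator, using
    // statistics from the latest successful execution; if none exist, fall back
    // to the number of materialized edges.
    region.operatorIds.flatMap(pastRuntimes.get) match {
      case runtimes if runtimes.nonEmpty => runtimes.max
      case _                             => region.materializedEdgeCount.toDouble
    }
}
```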
To simplify schema creation, this PR removes the Schema.builder()
pattern and makes Schema immutable. All modifications now result in the
creation of a new Schema instance.
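
The immutable pattern, sketched (attribute representation simplified):

```scala
// Illustrative sketch: modifications return a new Schema instead of mutating a builder.
case class Schema(attributes: List[(String, String)] = List.empty) {
  def add(name: String, attributeType: String): Schema =
    copy(attributes = attributes :+ (name -> attributeType))
  def remove(name: String): Schema =
    copy(attributes = attributes.filterNot(_._1 == name))
}

// Usage: each call yields a fresh instance; the original is untouched.
val base     = Schema().add("id", "integer")
val extended = base.add("name", "string")
```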
PhysicalOp relies on the number of input ports to determine whether an
operator is a source operator. For Python UDFs, after the changes in
#3183, the input ports were not correctly associated with the
PhysicalOp, causing all Python UDFs to be recognized as source
operators. This PR fixes the issue.
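
The check is conceptually as simple as this sketch:

```scala
// Illustrative sketch: an operator with no input ports is treated as a source.
// The bug was that Python UDFs' input ports were not attached to the PhysicalOp,
// so this check wrongly returned true for them.
def isSourceOperator(inputPortCount: Int): Boolean = inputPortCount == 0
```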
The ubuntu-latest image was recently updated from 22.04 to 24.04.
However, the new image is incompatible with libncurses5, requiring an
upgrade to libncurses6. Unfortunately, after upgrading, sbt no longer
functions as expected, an issue also documented in
[actions/setup-java#712](actions/setup-java#712).
It appears that the 24.04 image does not include sbt by default.

This PR addresses the issue by pinning the image to ubuntu-22.04. We can
revisit and update the version when the 24.04 image becomes more stable
and resolves these compatibility problems.
In this PR, we add the user avatar to the execution history panel.
<img width="462" alt="Screenshot 2025-01-06 at 3 53 23 PM"
src="https://github.com/user-attachments/assets/e4e662af-c1a9-4686-90f4-b6bef155a36b"
/>
bobbai00 and others added 17 commits January 8, 2025 15:46
# Implement Apache Iceberg for Result Storage

<img width="556" alt="Screenshot 2025-01-06 at 3 18 19 PM"
src="https://github.com/user-attachments/assets/4edadb64-ee28-48ee-8d3c-1d1891d69d6a"
/>


## How to Enable Iceberg Result Storage
1. Update `storage-config.yaml`:
   - Set `result-storage-mode` to `iceberg`.

## Major Changes
- **Introduced `IcebergDocument`**: A thread-safe `VirtualDocument`
implementation for storing and reading results in Iceberg tables.
- **Introduced `IcebergTableWriter`**: Append-only writer for Iceberg
tables with configurable buffer size.
- **Catalog and Data storage for Iceberg**: Uses a local file system
(`file:/`) via `HadoopCatalog` and `HadoopFileIO`. This ensures Iceberg
operates without relying on external storage services.
- **`ProgressiveSinkOpExec`** now takes a new `workerId` parameter. Each
writer of the result storage takes this `workerId` as a new parameter.

## Dependencies
- Added Apache Iceberg-related libraries.
- Introduced Hadoop-related libraries to support Iceberg's
`HadoopCatalog` and `HadoopFileIO`. These libraries are used for
placeholder configuration but do not enforce runtime dependency on HDFS.

## Overview of Iceberg Components
### `IcebergDocument`
- Manages reading and organizing data in Iceberg tables.
- Supports iterator-based incremental reads with thread-safe operations
for reading and clearing data.
- Initializes or overrides the Iceberg table during construction.

### `IcebergTableWriter`
- Writes data as immutable Parquet files in an append-only manner.
- Each writer uniquely prefixes its files to avoid conflicts
(`workerIndex_fileIndex` format; see the sketch below).
- Not thread-safe; single-threaded access is recommended.
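
A sketch of the `workerIndex_fileIndex` naming scheme (an illustrative helper, not the actual writer code):

```scala
// Illustrative sketch: each writer prefixes its Parquet files with its worker index,
// so concurrent writers never produce conflicting file names.
def parquetFileName(workerIndex: Int, fileIndex: Int): String =
  s"${workerIndex}_${fileIndex}.parquet"

parquetFileName(workerIndex = 2, fileIndex = 0) // "2_0.parquet"
parquetFileName(workerIndex = 2, fileIndex = 1) // "2_1.parquet"
```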

## Data Storage via Iceberg Tables
- **Write**: 
  - Tables are created per `storage key`.
  - Writers append Parquet files to the table, ensuring immutability.
- **Read**:
  - Readers use `IcebergDocument.get` to fetch data via an iterator.
  - The iterator reads data incrementally while ensuring the data order
matches the commit sequence of the data files.

## Data Reading Using File Metadata
- Data files are read using `getUsingFileSequenceOrder`, which:
  - Retrieves and sorts metadata files (`FileScanTask`) by sequence
numbers.
  - Reads records sequentially, skipping files or records as needed.
  - Supports range-based reading (`from`, `until`) and incremental reads.
- Sorting ensures data consistency and order preservation; a sketch
follows below.
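
In spirit, the ordered read works like this sketch (types are simplified; the real code uses Iceberg's `FileScanTask` API):

```scala
// Illustrative sketch: sort data-file tasks by commit sequence number, then read
// records in that order, honoring a [from, until) range.
case class ScanTask(sequenceNumber: Long, records: Seq[Any])

def readInSequenceOrder(tasks: Seq[ScanTask], from: Int, until: Int): Seq[Any] =
  tasks
    .sortBy(_.sequenceNumber) // preserve the commit order of the data files
    .flatMap(_.records)       // concatenate records file by file
    .slice(from, until)       // range-based reading
```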

## Hadoop Usage Without HDFS
- The `HadoopCatalog` uses an empty Hadoop configuration, defaulting to
the local file system (`file:/`).
- This enables efficient management of Iceberg tables in local or
network file systems without requiring HDFS infrastructure.
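
Using the Iceberg and Hadoop APIs, the local-file-system setup is roughly the following (a sketch; the warehouse path is a placeholder):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.hadoop.HadoopCatalog

// Illustrative sketch: an empty Hadoop Configuration defaults to the local
// file system (file:/), so no HDFS cluster is required.
val catalog = new HadoopCatalog(new Configuration(), "file:///tmp/texera-iceberg")
```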

---------

Co-authored-by: Shengquan Ni <[email protected]>
This PR removes the `MemoryDocument` class and its usage. Additionally,
it updates the fallback mechanism for `MongoDocument`, changing it from
`Memory` to `Iceberg`.
…t functioning properly (#3200)

### Purpose:
Currently, a 400 error occurs when calling `persistWorkflow`. The reason
is that the parameter sent from the frontend to the backend is
`isPublished` instead of `isPublic`.

### Change:
Change the name of the parameter sent to the backend.

### Demos:
Before:

![image](https://github.com/user-attachments/assets/157ec547-5dfc-4cd2-b805-6ae919e7ed68)

After:

![image](https://github.com/user-attachments/assets/403ebab3-2f29-4993-af01-af7309ada02c)
Remove the redundant Flarum user registration service from the Texera
backend, as it is unnecessary. User registration for Flarum can be
handled directly by calling the Flarum user registration API from the
Texera frontend.

This PR does not affect the functionality or lifecycle of either Texera
or Flarum.
This PR addresses schema normalization and logic improvements for
tracking operator runtime statistics in a workflow execution system. It
introduces changes to the database schema, migration scripts, and Scala
code responsible for inserting and managing runtime statistics. The goal
is to reduce redundancy, improve maintainability, and ensure data
consistency between `operator_executions` and
`operator_runtime_statistics`.

### Schema Design
1. New Table Design:
- `operator_executions`: Tracks execution metadata for each operator in
a workflow execution. Each row contains `operator_execution_id`,
`workflow_execution_id`, and `operator_id`. This table ensures that
operator executions are uniquely identifiable.
- `operator_runtime_statistics`: Tracks runtime statistics for each
operator execution at specific timestamps. It includes
`operator_execution_id` as a foreign key, ensuring a direct reference to
`operator_executions`.

2. Normalization Improvements:
- Replaced repeated `execution_id` and `operator_id` in
`workflow_runtime_statistics` with a single foreign key
`operator_execution_id`, pointing to `operator_executions`.
- Split the previous large `workflow_runtime_statistics` table into
smaller, more manageable tables, eliminating redundancy and improving
data integrity.

3. Indexes and Keys:
- Added a composite index on `operator_execution_id` and `time` in
`operator_runtime_statistics` to speed up joins and queries ordered by
time.

### Testing
Running `core/scripts/sql/update/19.sql` will create the two new tables,
`operator_executions` and `operator_runtime_statistics`, and migrate the
data from `workflow_runtime_statistics` into them.

---------

Co-authored-by: Kunwoo Park <[email protected]>
### Purpose
To restricted/inactive users, it looks as if they can clone a workflow,
when they actually cannot. To reduce confusion, the clone button is
disabled in the GUI.
fix #3066 

### Changes

Added an `isAdminOrRegularUser` variable in
hub-workflow-detail-component.ts to check the user role. If the user is
not an admin or a regular user, the clone button is disabled.



https://github.com/user-attachments/assets/c426c50d-fb31-4377-a2b7-3de2a24da512

---------

Co-authored-by: Texera <[email protected]>
This PR refactors the code style of the `DatasetResource` and
`DatasetAccessResource`.

Specifically, the refactoring principles are:
1. Move all function definitions in the `object` that are only used
within the `class` into the `class`.
2. Make the interaction with jOOQ uniform: use either raw jOOQ queries
or DAOs, not both.
3. Flatten single-kv-pair case classes.
4. Flatten single-line functions.
This PR addresses an issue where the system interacts with the MySQL
database even when `user-sys.enabled` is set to `false`. In such cases,
the system should not perform any interactions with the MySQL database.
To resolve this, an if statement has been added to skip this logic when
the user system is disabled, ensuring the system behaves as expected.

Co-authored-by: Kunwoo Park <[email protected]>
The `IF` operator evaluates a condition against a specified state
variable and routes the input data to either the `True` or `False`
branch accordingly. The condition port accepts a state variable, and
users can define the name of the state variable to be evaluated by the
`IF` operator.
Note: The Date to State operator will be introduced in a separate PR.


![image](https://github.com/user-attachments/assets/5f4c4cc9-a6d9-4c9d-b689-a4a833200a64)
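
Conceptually, the routing works like this sketch (simplified; not the actual operator code):

```scala
// Illustrative sketch: evaluate the named state variable and route each tuple
// to the True or False branch accordingly.
def routeToBranch(state: Map[String, Boolean], conditionName: String): String = {
  val condition = state.getOrElse(conditionName, false)
  if (condition) "True" else "False" // names of the two output branches
}
```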
…functions (#3211)

This PR refactors the storage key by representing it as a storage URI.
It also adds `openDocument` and `createDocument` functions that take the
given URI.

## Major Changes

### 1. VFS URI resource type definition, resolve and decode functions
Two types of VFS resources are defined:
```scala
object VFSResourceType extends Enumeration {
  val RESULT: Value = Value("result")
  val MATERIALIZED_RESULT: Value = Value("materializedResult")
}
```

Two defs are added to the `FileResolver`

- `resolve`: create the URI pointing to the storage resource on the VFS
```scala
/**
    * Resolve a VFS resource to its URI. The URI can be used by the DocumentFactory to create resource or open resource
    *
    * @param resourceType   The type of the VFS resource.
    * @param workflowId     Workflow identifier.
    * @param executionId    Execution identifier.
    * @param operatorId     Operator identifier.
    * @param portIdentity   Optional port identifier. **Required** if `resourceType` is `RESULT` or `MATERIALIZED_RESULT`.
    * @return A VFS URI
    * @throws IllegalArgumentException if `resourceType` is `RESULT` but `portIdentity` is missing.
    */
  def resolve(
      resourceType: VFSResourceType.Value,
      workflowId: WorkflowIdentity,
      executionId: ExecutionIdentity,
      operatorId: OperatorIdentity,
      portIdentity: Option[PortIdentity] = None
  ): URI
```

- `decodeVFSUri`: decode a VFS URI to components
```scala
  /**
    * Parses a VFS URI and extracts its components
    *
    * @param uri The VFS URI to parse.
    * @return A `VFSUriComponents` object with the extracted data.
    * @throws IllegalArgumentException if the URI is malformed.
    */
  def decodeVFSUri(uri: URI): (
      WorkflowIdentity,
      ExecutionIdentity,
      OperatorIdentity,
      Option[PortIdentity],
      VFSResourceType.Value
  )
```


### 2. `createDocument` and `openDocument` functions in the
`DocumentFactory`

`createDocument` and `openDocument` defs create/open a storage resource
pointed to by the URI:

- `DocumentFactory.createDocument`
```scala
  /**
    * Create a document for storage specified by the uri.
    * This document is suitable for storing structural data, i.e. the schema is required to create such document.
    * @param uri the location of the document
    * @param schema the schema of the data stored in the document
    * @return the created document
    */
  def createDocument(uri: URI, schema: Schema): VirtualDocument[_]
```

- `DocumentFactory.openDocument`
```scala
  /**
    * Open a document specified by the uri.
    * The document should be storing the structural data as the document and the schema will be returned
    * @param uri the uri of the document
    * @return the VirtualDocument, which is the handler of the data; the Schema, which is the schema of the data stored in the document
    */
  def openDocument(uri: URI): (VirtualDocument[_], Schema)
```
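
Putting the two pieces together, a typical flow might look like the following (a hypothetical usage sketch; `workflowId`, `executionId`, `operatorId`, and `schema` are assumed to be in scope):

```scala
// Hypothetical usage sketch combining FileResolver and DocumentFactory.
val uri = FileResolver.resolve(
  resourceType = VFSResourceType.RESULT,
  workflowId = workflowId,
  executionId = executionId,
  operatorId = operatorId,
  portIdentity = Some(PortIdentity(0)) // required for RESULT resources
)

val created = DocumentFactory.createDocument(uri, schema)      // create with a known schema
val (opened, openedSchema) = DocumentFactory.openDocument(uri) // reopen; schema is recovered
```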
This PR removes the configuration related to the Python language server
from application.conf, as it is no longer needed; the Python language
server has already been separated out.
…urceMapping during the garbage collection (#3216)

This PR completes the garbage collection logic under the
newly-introduced URIs and ExecutionResourceMapping.

During garbage collection, after all documents associated with the last
execution ID are released, the entry is removed from the mapping.

This PR also ran scalafixAll to remove unused imports.
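
A sketch of that cleanup order (names are illustrative):

```scala
import java.net.URI
import scala.collection.mutable

// Illustrative sketch: release every document of the finished execution first,
// then drop the execution's entry from the resource mapping.
def garbageCollect(
    executionId: Long,
    resourceMapping: mutable.Map[Long, List[URI]],
    releaseDocument: URI => Unit // e.g., clear the document behind this URI
): Unit = {
  resourceMapping.getOrElse(executionId, Nil).foreach(releaseDocument)
  resourceMapping.remove(executionId) // only after all documents are released
}
```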
This PR fixes the slow workflow execution issue by calling
desc.sourceSchema only once in the scan operator executor's constructor,
instead of for every output tuple.

This PR also includes some reformatting to remove unused imports.
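
The fix pattern, sketched with simplified stand-in types:

```scala
// Simplified stand-ins for this sketch.
case class Schema(attributeNames: Seq[String])
case class Tuple(schema: Schema, fields: Seq[Any])
trait ScanSourceDesc { def sourceSchema(): Schema }

// Illustrative sketch: evaluate the (possibly expensive) schema once in the
// constructor instead of once per output tuple.
class ScanOpExec(desc: ScanSourceDesc) {
  private val schema = desc.sourceSchema() // computed a single time

  def produceTuple(fields: Seq[Any]): Tuple = Tuple(schema, fields)
}
```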
This PR addresses issue #3016: the execution dashboard takes a long time
to load.

Changes made:
- Add a loading icon in the frontend while the execution info is loading.
- Move pagination to the backend, including the sorting and filtering
functions.
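
A sketch of the server-side pagination with jOOQ (the table and column names are hypothetical):

```scala
import org.jooq.DSLContext
import org.jooq.impl.DSL

// Illustrative jOOQ sketch: sort, filter, and page on the server so the frontend
// receives only one page of execution entries at a time.
def listExecutions(ctx: DSLContext, pageIndex: Int, pageSize: Int) =
  ctx
    .select()
    .from(DSL.table("workflow_executions"))     // hypothetical table name
    .orderBy(DSL.field("starting_time").desc()) // hypothetical sort column
    .limit(pageSize)
    .offset(pageIndex * pageSize)
    .fetch()
```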

Loading Icon: 

![execution_loading](https://github.com/user-attachments/assets/d7494b54-eb15-4361-b1cd-a70f276aa6f6)

Page Size Change:

![newnewnew](https://github.com/user-attachments/assets/9e1da835-7b2c-4ca1-a02b-87b7b3694101)

---------

Co-authored-by: Chris <[email protected]>
### Purpose:
The current Google login button occasionally disappears. The reason is
that the Google script is loaded statically, meaning it is automatically
loaded via the `<script>` tag when the page loads. This approach has a
potential issue: if we want to use the `onGoogleLibraryLoad` callback
function to confirm that the script has successfully loaded, we must
ensure that the callback is defined before the script finishes loading.
Otherwise, when the script completes loading, Google will check whether
`onGoogleLibraryLoad` is defined. If it is not defined, the script load
notification will be missed, and the Google login initialization logic
will not execute.
fix #3155

### Changes:
1. To reduce the maintenance cost caused by frequent updates to the
Google official API, we chose to implement Google login using the
third-party library angularx-social-login instead of directly relying on
the Google official API.
2. Added the dependency @abacritt/angularx-social-login (version 2.1.0).
Please use `yarn install` to install it.
3. Additionally, a test file for the Google Login component has been
added to verify whether the component initializes correctly after the
script is successfully loaded.

### Demos:
Before:


https://github.com/user-attachments/assets/2717e6cb-d250-49e0-a09d-d3d11ffc7be3

After:


https://github.com/user-attachments/assets/81d9e48b-02ee-48ad-8481-d7a50ce43ba0

---------

Co-authored-by: Chris <[email protected]>
Co-authored-by: Xinyuan Lin <[email protected]>
Fix issue #3055 where scanning an empty file with CSV Scan would result in an error.

---------

Co-authored-by: Jiadong Bai <[email protected]>
@kunwp1 merged commit 0b7e7bc into Texera:master on Jan 21, 2025