Merge latest master #29

Merged
merged 47 commits into from
Jan 21, 2025
8b92548
Add texera.io to README (#3158)
Yicong-Huang Dec 15, 2024
edaf799
Remove BulkDownloader operator (#3161)
Yicong-Huang Dec 16, 2024
41c47bc
Add JooqCodeGenerator to `dao` and remove `core/util` (#3160)
bobbai00 Dec 16, 2024
fc3170a
Address Result Panel Getting Sticky Too Easily (#3136)
sixsage Dec 17, 2024
8cc5098
Remove redundant jooq codes and their usages in `core/amber` (#3164)
bobbai00 Dec 17, 2024
752306f
Modify texera io links in README (#3162)
Yicong-Huang Dec 17, 2024
270f6d7
Remove cache source descriptor (#3163)
Yicong-Huang Dec 17, 2024
64dd0c1
Modify the name 'community' to 'hub' (#3167)
GspikeHalo Dec 18, 2024
7ae89dc
Standardize the "is_public" column across the dataset and workflow ta…
GspikeHalo Dec 18, 2024
5524099
Move output mode on port (#3169)
Yicong-Huang Dec 19, 2024
f90362e
Remove sink desc (#3170)
Yicong-Huang Dec 20, 2024
8a97cbb
Fix the schema fetching during sink storage assignment (#3174)
bobbai00 Dec 22, 2024
5267fec
Fix view result (#3176)
Yicong-Huang Dec 27, 2024
d85ce7a
Remove logical schema propagation (#3177)
Yicong-Huang Dec 30, 2024
14572dd
Remove cache checker in logical plan (#3178)
Yicong-Huang Dec 30, 2024
5b622e3
Support multiple output ports with storage (#3175)
Yicong-Huang Dec 30, 2024
0414544
Convert Java operator descriptors to Scala (#3179)
Yicong-Huang Dec 30, 2024
2c028f8
Update .gitignore to include Metals generated folders to support VSCo…
shengquan-ni Dec 30, 2024
35f1849
Move protobuf definitions under core package (#3181)
Yicong-Huang Dec 30, 2024
1d3561b
Remove duplicated scalapb definition (#3182)
Yicong-Huang Dec 31, 2024
4ea5fac
Fix python proto gen (#3184)
Yicong-Huang Dec 31, 2024
4617890
Make OpExecInitInfo serializable (#3183)
Yicong-Huang Dec 31, 2024
90f99e5
Enhance error handling and stack trace formatting (#3185)
shengquan-ni Dec 31, 2024
0eb36d0
Remove logical schema propagation (#3186)
Yicong-Huang Dec 31, 2024
2556432
Add single schema per port validation during compilation (#3187)
Yicong-Huang Dec 31, 2024
bf6ffc9
Add Cost Estimator Using Past Statistics for Schedule Generator (#3156)
Xiao-zhen-Liu Jan 1, 2025
c2bef3a
Simplify schema build (#3188)
Yicong-Huang Jan 1, 2025
f2aeb0a
Fix python udf source detection (#3189)
Yicong-Huang Jan 1, 2025
19644b4
Fix CI failures by pining the ubuntu version for backend CI (#3194)
shengquan-ni Jan 6, 2025
a1186f8
Add avatar in execution history dashboard (#3196)
yunyad Jan 7, 2025
7debf45
Add IcebergDocument as one type of the operator result storage (#3147)
bobbai00 Jan 8, 2025
630c59d
Remove MemoryDocument from result storage (#3201)
shengquan-ni Jan 9, 2025
3144fd9
Fix the issue where the persistWorkflow method in the workspace is no…
GspikeHalo Jan 9, 2025
d4de38f
Remove Flarum synchronization service from the webserver (#3165)
aglinxinyuan Jan 10, 2025
ae0c7bc
Normalize and Improve Operator Runtime Statistics Handling (#3171)
kunwp1 Jan 13, 2025
8ee3570
Disable the “Clone” button for restricted and inactive users (#3208)
paulschatt Jan 14, 2025
3f66f4e
Fix serialization of LineMode for LineChartOp (#3199)
bobbai00 Jan 14, 2025
00aa822
Refactor DatasetResource and DatasetAccessResource (#3210)
bobbai00 Jan 14, 2025
36e8752
Prevent MySQL Interaction When user-sys.enabled Is False (#3213)
kunwp1 Jan 14, 2025
9b9c200
Introduce IF operator (#3090)
aglinxinyuan Jan 14, 2025
d4d4176
Add URI generator, resolver and corresponding document open & create …
bobbai00 Jan 15, 2025
3ffa9bf
Remove redundant configurations (#3195)
yunyad Jan 15, 2025
64007f7
Remove the corresponding resource entry from the global ExecutionReso…
bobbai00 Jan 15, 2025
13cb52e
Fix slow scan operators (#3215)
shengquan-ni Jan 16, 2025
c5c0701
Execution Dashboard Backend Pagination & Frontend Loading Icon (#3105)
MiuMiuMiue Jan 16, 2025
58a214f
Fix Google Login button disappearing issue (#3197)
GspikeHalo Jan 19, 2025
586496c
Fix CSV File Scan Operator (#3217)
MiuMiuMiue Jan 20, 2025
2 changes: 1 addition & 1 deletion .github/workflows/github-action-build.yml
@@ -56,7 +56,7 @@ jobs:
   core:
     strategy:
       matrix:
-        os: [ ubuntu-latest ]
+        os: [ ubuntu-22.04 ]
         java-version: [ 11 ]
     runs-on: ${{ matrix.os }}
     env:
8 changes: 7 additions & 1 deletion .gitignore
@@ -102,4 +102,10 @@ StoredCredential*
**/apache2/
**/Apache24/
**/php/
-Composer-Setup.exe
+Composer-Setup.exe
+
+# Ignoring folders generated by vscode IDE
+.metals/
+.bloop/
+.ammonite/
+metals.sbt
70 changes: 7 additions & 63 deletions README.md
@@ -1,17 +1,21 @@
<h1 align="center">Texera - Collaborative Data Science and AI/ML Using Workflows</h1>

<p align="center">
-<img src="core/gui/src/assets/logos/full_logo_small.png" alt="texera-logo" width="192px" height="109px"/>
+<a href="https://texera.io"> <img src="core/gui/src/assets/logos/full_logo_small.png" alt="texera-logo" width="192px" height="109px"/> </a>
<br>
<i>Texera supports scalable data computation and enables advanced AI/ML techniques.</i>
<br>
<i>"Collaboration" is a key focus, and we enable an experience similar to Google Docs, but for data science. </i>
<br>

<h4 align="center">
-<a href="https://github.com/Texera/texera#videos">Demo Video</a>
+<a href="https://texera.io">Official Site</a>
|
-<a href="https://texera.github.io/blog/">Blogs</a>
+<a href="https://texera.io/publications/">Publications</a>
+|
+<a href="https://texera.io/category/video/">Video</a>
+|
+<a href="https://texera.io/category/blog/">Blog</a>
|
<a href="https://github.com/Texera/texera/wiki/Getting-Started">Getting Started</a>
<br>
@@ -29,13 +33,6 @@
<img alt="Static Badge" src="https://img.shields.io/badge/Largest_Deployment-100_nodes,_400_cores-green">
</p>

-# Motivation
-
-* Data science is labor-intensive and particularly challenging for non-IT users applying AI/ML.
-* Many workflow-based data science platforms lack parallelism, limiting their ability to handle big datasets.
-* Cloud services and technologies have advanced significantly over the past decade, enabling powerful browser-based interfaces supported by high-speed networks.
-* Existing data science platforms offer limited interaction during long-running jobs, making them difficult to manage after execution begins.
-
# Goals

* Provide data science as cloud services;
@@ -148,59 +145,6 @@ The workflow in the use case shown below includes data cleaning, ML model traini
_In JAMIA 2021_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7989302/pdf/ocab047.pdf)
</details>

-
-# Education
-<table>
-<tr style="height: 500px;">
-<td align="center">
-<a href="https://ds4all.ics.uci.edu/">
-<img src="https://ds4all.ics.uci.edu/wp-content/uploads/2023/07/banner-1024x576.png">
-</a>
-<p><b>Data Science for All</b></p>
-An NSF-funded summer program to teach high-school students data science and AI/ML
-</td>
-<td align="center">
-<a href="https://canvas.eee.uci.edu/courses/63639/pages/syllabus">
-<img src="https://github.com/user-attachments/assets/a7569fd3-6857-48b4-80dc-d9f006ae2c8f">
-</a>
-<p><b>ICS 80: Data Science and AI/ML Using Workflows</b></p>
-A Spring 2024 course at UCI, teaching 42 undergraduates, most of whom are not computer science majors, to learn data science and AI/ML
-</td>
-<td align="center">
-<a href="https://sites.google.com/uci.edu/ds-workshop2024/home">
-<img src="https://www.cerritos.edu/_resources/images/common/cerritos-college-logo.svg">
-</a>
-<p><b>Workshop of Data Science for Everyone at Cerritos College</b></p>
-A two-day workshop designed for non-CS students to learn data science and ML without a single line of coding
-</td>
-</tr>
-</table>
-
-
-# Videos
-<table>
-<tr style="height: 500px;">
-<td align="center">
-<a href="https://www.youtube.com/watch?v=B81iMFS5fPc">
-<img src="https://img.youtube.com/vi/B81iMFS5fPc/0.jpg" alt="Watch the video">
-</a>
-<p><b>dkNET Webinar 04/26/2024</b></p>
-</td>
-<td align="center">
-<a href="https://www.youtube.com/watch?v=SP-XiDADbw0">
-<img src="https://img.youtube.com/vi/SP-XiDADbw0/0.jpg" alt="Watch the video">
-</a>
-<p><b>Texera Demo @ VLDB'20</b></p>
-</td>
-<td align="center">
-<a href="https://www.youtube.com/watch?v=T5ShFRfHmgI">
-<img src="https://img.youtube.com/vi/T5ShFRfHmgI/0.jpg" alt="Watch the video">
-</a>
-<p><b>Amber Presentation @ VLDB'20</b></p>
-</td>
-</tr>
-</table>
-
# Getting Started

* For users, visit [Guide to Use Texera](https://github.com/Texera/texera/wiki/Getting-Started).
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
syntax = "proto3";
package edu.uci.ics.amber.engine.architecture.rpc;

-import "edu/uci/ics/amber/virtualidentity.proto";
-import "edu/uci/ics/amber/workflow.proto";
+import "edu/uci/ics/amber/core/virtualidentity.proto";
+import "edu/uci/ics/amber/core/workflow.proto";
+import "edu/uci/ics/amber/core/executor.proto";
import "edu/uci/ics/amber/engine/architecture/worker/statistics.proto";
import "edu/uci/ics/amber/engine/architecture/sendsemantics/partitionings.proto";
import "scalapb/scalapb.proto";
@@ -58,8 +59,8 @@ message EmptyRequest{}

message AsyncRPCContext {
option (scalapb.message).no_box = true;
-ActorVirtualIdentity sender = 1 [(scalapb.field).no_box = true];
-ActorVirtualIdentity receiver = 2 [(scalapb.field).no_box = true];
+core.ActorVirtualIdentity sender = 1 [(scalapb.field).no_box = true];
+core.ActorVirtualIdentity receiver = 2 [(scalapb.field).no_box = true];
}

message ControlInvocation {
@@ -79,25 +80,25 @@ enum ChannelMarkerType {
// Message for ChannelMarkerPayload
message ChannelMarkerPayload {
option (scalapb.message).extends = "edu.uci.ics.amber.engine.common.ambermessage.WorkflowFIFOMessagePayload";
-ChannelMarkerIdentity id = 1 [(scalapb.field).no_box = true];
+core.ChannelMarkerIdentity id = 1 [(scalapb.field).no_box = true];
ChannelMarkerType markerType = 2;
-repeated ChannelIdentity scope = 3;
+repeated core.ChannelIdentity scope = 3;
map<string, ControlInvocation> commandMapping = 4;
}

message PropagateChannelMarkerRequest {
-repeated PhysicalOpIdentity sourceOpToStartProp = 1;
-ChannelMarkerIdentity id = 2 [(scalapb.field).no_box = true];
+repeated core.PhysicalOpIdentity sourceOpToStartProp = 1;
+core.ChannelMarkerIdentity id = 2 [(scalapb.field).no_box = true];
ChannelMarkerType markerType = 3;
-repeated PhysicalOpIdentity scope = 4;
-repeated PhysicalOpIdentity targetOps = 5;
+repeated core.PhysicalOpIdentity scope = 4;
+repeated core.PhysicalOpIdentity targetOps = 5;
ControlRequest markerCommand = 6;
string markerMethodName = 7;
}

message TakeGlobalCheckpointRequest {
bool estimationOnly = 1;
-ChannelMarkerIdentity checkpointId = 2 [(scalapb.field).no_box = true];
+core.ChannelMarkerIdentity checkpointId = 2 [(scalapb.field).no_box = true];
string destination = 3;
}

@@ -122,7 +123,7 @@ message ModifyLogicRequest {
}

message RetryWorkflowRequest {
-repeated ActorVirtualIdentity workers = 1;
+repeated core.ActorVirtualIdentity workers = 1;
}

enum ConsoleMessageType{
@@ -147,7 +148,7 @@ message ConsoleMessageTriggeredRequest {
}

message PortCompletedRequest {
-PortIdentity portId = 1 [(scalapb.field).no_box = true];
+core.PortIdentity portId = 1 [(scalapb.field).no_box = true];
bool input = 2;
}

@@ -156,21 +157,21 @@ message WorkerStateUpdatedRequest {
}

message LinkWorkersRequest {
-PhysicalLink link = 1 [(scalapb.field).no_box = true];
+core.PhysicalLink link = 1 [(scalapb.field).no_box = true];
}

// Ping message
message Ping {
int32 i = 1;
int32 end = 2;
-ActorVirtualIdentity to = 3 [(scalapb.field).no_box = true];
+core.ActorVirtualIdentity to = 3 [(scalapb.field).no_box = true];
}

// Pong message
message Pong {
int32 i = 1;
int32 end = 2;
-ActorVirtualIdentity to = 3 [(scalapb.field).no_box = true];
+core.ActorVirtualIdentity to = 3 [(scalapb.field).no_box = true];
}

// Pass message
@@ -185,7 +186,7 @@ message Nested {

// MultiCall message
message MultiCall {
-repeated ActorVirtualIdentity seq = 1;
+repeated core.ActorVirtualIdentity seq = 1;
}

// ErrorCommand message
@@ -194,7 +195,7 @@ message ErrorCommand {

// Collect message
message Collect {
-repeated ActorVirtualIdentity workers = 1;
+repeated core.ActorVirtualIdentity workers = 1;
}

// GenerateNumber message
@@ -203,7 +204,7 @@ message GenerateNumber {

// Chain message
message Chain {
-repeated ActorVirtualIdentity nexts = 1;
+repeated core.ActorVirtualIdentity nexts = 1;
}

// Recursion message
@@ -213,44 +214,43 @@ message Recursion {

// Messages for the commands
message AddInputChannelRequest {
-ChannelIdentity channelId = 1 [(scalapb.field).no_box = true];
-PortIdentity portId = 2 [(scalapb.field).no_box = true];
+core.ChannelIdentity channelId = 1 [(scalapb.field).no_box = true];
+core.PortIdentity portId = 2 [(scalapb.field).no_box = true];
}

message AddPartitioningRequest {
-PhysicalLink tag = 1 [(scalapb.field).no_box = true];
+core.PhysicalLink tag = 1 [(scalapb.field).no_box = true];
sendsemantics.Partitioning partitioning = 2 [(scalapb.field).no_box = true];
}

message AssignPortRequest {
-PortIdentity portId = 1 [(scalapb.field).no_box = true];
+core.PortIdentity portId = 1 [(scalapb.field).no_box = true];
bool input = 2;
map<string, string> schema = 3;
}

message FinalizeCheckpointRequest {
-ChannelMarkerIdentity checkpointId = 1 [(scalapb.field).no_box = true];
+core.ChannelMarkerIdentity checkpointId = 1 [(scalapb.field).no_box = true];
string writeTo = 2;
}

message InitializeExecutorRequest {
int32 totalWorkerCount = 1;
-google.protobuf.Any opExecInitInfo = 2 [(scalapb.field).no_box = true];
+core.OpExecInitInfo opExecInitInfo = 2;
bool isSource = 3;
string language = 4;
}

message UpdateExecutorRequest {
-PhysicalOpIdentity targetOpId = 1 [(scalapb.field).no_box = true];
+core.PhysicalOpIdentity targetOpId = 1 [(scalapb.field).no_box = true];
google.protobuf.Any newExecutor = 2 [(scalapb.field).no_box = true];
google.protobuf.Any stateTransferFunc = 3;
}

message PrepareCheckpointRequest{
-ChannelMarkerIdentity checkpointId = 1 [(scalapb.field).no_box = true];
+core.ChannelMarkerIdentity checkpointId = 1 [(scalapb.field).no_box = true];
bool estimationOnly = 2;
}

message QueryStatisticsRequest{
-repeated ActorVirtualIdentity filterByWorkers = 1;
+repeated core.ActorVirtualIdentity filterByWorkers = 1;
}
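
Reviewer note: the `core.`-prefixed references above (and the existing `sendsemantics.Partitioning`) rely on protobuf's scoped name resolution, where a partially qualified name is looked up from the innermost enclosing package outward. The sketch below models that rule in Python as an illustration only. It assumes the moved definitions declare `package edu.uci.ics.amber.core`, which is inferred from the new import paths and the commit title "Move protobuf definitions under core package (#3181)"; the package line itself is not shown in this diff. `resolve` and `DEFINED` are hypothetical names, not part of Texera or protoc, and the model ignores corner cases such as nested message scopes.

```python
# Simplified model of protobuf type-name resolution (illustrative only).
# ASSUMPTION: the fully qualified names below are inferred from the import
# paths in this diff; they are not copied from the actual .proto files.
DEFINED = {
    "edu.uci.ics.amber.core.ActorVirtualIdentity",
    "edu.uci.ics.amber.core.PortIdentity",
    "edu.uci.ics.amber.engine.architecture.sendsemantics.Partitioning",
}


def resolve(ref, current_package, defined):
    """Return the fully qualified name that `ref` resolves to, or None.

    Scopes are searched from the innermost package outward. The first
    scope in which the FIRST component of `ref` is visible wins; the rest
    of the name must then resolve inside that scope, otherwise the lookup
    fails instead of continuing outward.
    """
    first = ref.split(".")[0]
    scope = current_package.split(".")
    while True:
        prefix = ".".join(scope + [first])
        # The first component is visible if a defined name equals the
        # prefix or lives underneath it.
        visible = any(
            name == prefix or name.startswith(prefix + ".") for name in defined
        )
        if visible:
            candidate = ".".join(scope + [ref])
            return candidate if candidate in defined else None
        if not scope:
            return None  # reached the root scope without a match
        scope.pop()  # move one package level outward


if __name__ == "__main__":
    pkg = "edu.uci.ics.amber.engine.architecture.rpc"
    print(resolve("core.ActorVirtualIdentity", pkg, DEFINED))
    print(resolve("sendsemantics.Partitioning", pkg, DEFINED))
```

Under these assumptions, `core.ActorVirtualIdentity` written inside `edu.uci.ics.amber.engine.architecture.rpc` is found once the search reaches `edu.uci.ics.amber`, which is why the short `core.` prefix suffices and the full package path is not needed.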
Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@ syntax = "proto3";

package edu.uci.ics.amber.engine.architecture.sendsemantics;

+import "edu/uci/ics/amber/core/virtualidentity.proto";
import "scalapb/scalapb.proto";

option (scalapb.options) = {
@@ -10,8 +11,6 @@ option (scalapb.options) = {
no_default_values_in_constructor: true
};

-import "edu/uci/ics/amber/virtualidentity.proto";
-
message Partitioning{
oneof sealed_value{
OneToOnePartitioning oneToOnePartitioning = 1;
@@ -24,29 +23,29 @@ message Partitioning{

message OneToOnePartitioning{
int32 batchSize = 1;
-repeated ChannelIdentity channels = 2;
+repeated core.ChannelIdentity channels = 2;
}

message RoundRobinPartitioning{
int32 batchSize = 1;
-repeated ChannelIdentity channels = 2;
+repeated core.ChannelIdentity channels = 2;
}

message HashBasedShufflePartitioning{
int32 batchSize = 1;
-repeated ChannelIdentity channels = 2;
+repeated core.ChannelIdentity channels = 2;
repeated string hashAttributeNames = 3;
}

message RangeBasedShufflePartitioning {
int32 batchSize = 1;
-repeated ChannelIdentity channels = 2;
+repeated core.ChannelIdentity channels = 2;
repeated string rangeAttributeNames = 3;
int64 rangeMin = 4;
int64 rangeMax = 5;
}

message BroadcastPartitioning{
int32 batchSize = 1;
-repeated ChannelIdentity channels = 2;
+repeated core.ChannelIdentity channels = 2;
}
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@ syntax = "proto3";

package edu.uci.ics.amber.engine.architecture.worker;

-import "edu/uci/ics/amber/workflow.proto";
+import "edu/uci/ics/amber/core/workflow.proto";
import "scalapb/scalapb.proto";

option (scalapb.options) = {
@@ -22,7 +22,7 @@ enum WorkerState {
}

message PortTupleCountMapping {
-PortIdentity port_id = 1 [(scalapb.field).no_box = true];
+core.PortIdentity port_id = 1 [(scalapb.field).no_box = true];
int64 tuple_count = 2;
}

Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ package edu.uci.ics.amber.engine.common;

import "edu/uci/ics/amber/engine/architecture/rpc/controlcommands.proto";
import "edu/uci/ics/amber/engine/architecture/rpc/controlreturns.proto";
-import "edu/uci/ics/amber/virtualidentity.proto";
+import "edu/uci/ics/amber/core/virtualidentity.proto";
import "scalapb/scalapb.proto";

option (scalapb.options) = {
@@ -21,11 +21,11 @@ message ControlPayloadV2 {
}

message PythonDataHeader {
-ActorVirtualIdentity tag = 1 [(scalapb.field).no_box = true];
+core.ActorVirtualIdentity tag = 1 [(scalapb.field).no_box = true];
string payload_type = 2;
}

message PythonControlMessage {
-ActorVirtualIdentity tag = 1 [(scalapb.field).no_box = true];
+core.ActorVirtualIdentity tag = 1 [(scalapb.field).no_box = true];
ControlPayloadV2 payload = 2 [(scalapb.field).no_box = true];
}