Guide for Developers
Please follow the Getting Started Guide to install the required packages and obtain the codebase.
Texera uses PostgreSQL to manage the user data and system metadata. To install and configure it, please follow the instructions below:
- Install Postgres@14+. If you are using a Mac, a simple `brew install postgres` works.
- Install PGroonga to enable full-text search. If you are using a Mac, a simple `brew install pgroonga` works.
- Create `texera_db` in Postgres with `core/scripts/sql/texera_ddl.sql` to create the database for storing user data (see the sketch after this list).
- Create `texera_iceberg_catalog` in Postgres with `core/scripts/sql/iceberg_postgres_catalog.sql` to create the database for storing Iceberg catalogs.
- Edit `core/workflow-core/src/main/resources/storage.conf` and change `iceberg.catalog.type` to `postgres`.
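As a rough sketch, assuming a local Postgres instance reachable as your current user (adjust flags such as `-U` and `-h` for your installation), the two databases can be set up by feeding the scripts to `psql`:

```sh
# Load the DDL scripts into the local Postgres instance (run from the repo root).
# If the scripts do not create the databases themselves, create them first with
# `createdb texera_db` and `createdb texera_iceberg_catalog`.
psql -f core/scripts/sql/texera_ddl.sql
psql -f core/scripts/sql/iceberg_postgres_catalog.sql
```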
Texera requires LakeFS and S3 (MinIO is one implementation) as the dataset storage. Setting up these two storage services locally is required for Texera's backend to be fully functional. You may also refer to this PR to see how we introduced them as the underlying storage, and the architecture.
Here are two ways of setting up LakeFS + MinIO:
- Install Docker Desktop, which contains both the Docker engine and Docker Compose. Make sure to launch Docker after installing it.
- Go to the directory `core/file-service/src/main/resources`.
- Configure `docker-compose.yml` to mount the data to a local folder: search for `volumes` in the file and follow the instructions in the comment. This step is required; otherwise your data can be lost if the containers are deleted.
- Execute `docker compose up -d` in that directory.
Refer to https://docs.lakefs.io/howto/deploy/ for how to deploy LakeFS, and https://min.io/docs/minio/kubernetes/upstream/index.html for how to deploy MinIO. Once you finish the deployment, you also need to configure the following items in `core/workflow-core/src/main/resources/storage.conf`:
```hocon
# Configurations of LakeFS & S3 for dataset storage
lakefs {
  endpoint = "http://localhost:8000/api/v1"
  endpoint = ${?STORAGE_LAKEFS_ENDPOINT}
  auth {
    api-secret = ""
    api-secret = ${?STORAGE_LAKEFS_AUTH_API_SECRET}
    username = ""
    username = ${?STORAGE_LAKEFS_AUTH_USERNAME}
    password = ""
    password = ${?STORAGE_LAKEFS_AUTH_PASSWORD}
  }
  block-storage {
    type = ""
    type = ${?STORAGE_LAKEFS_BLOCK_STORAGE_TYPE}
    bucket-name = ""
    bucket-name = ${?STORAGE_LAKEFS_BLOCK_STORAGE_BUCKET_NAME}
  }
}
s3 {
  endpoint = ""
  endpoint = ${?STORAGE_S3_ENDPOINT}
  region = ""
  region = ${?STORAGE_S3_REGION}
  auth {
    username = ""
    username = ${?STORAGE_S3_AUTH_USERNAME}
    password = ""
    password = ${?STORAGE_S3_AUTH_PASSWORD}
  }
}
```
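Because every key falls back to a `${?VAR}` environment substitution, you can also supply these settings via environment variables instead of editing the file. A sketch with illustrative placeholder values (substitute the credentials from your own LakeFS/MinIO deployment):

```sh
# Illustrative values only; use the credentials of your deployment.
export STORAGE_LAKEFS_ENDPOINT="http://localhost:8000/api/v1"
export STORAGE_LAKEFS_AUTH_USERNAME="<lakefs-access-key-id>"
export STORAGE_LAKEFS_AUTH_PASSWORD="<lakefs-secret-access-key>"
export STORAGE_LAKEFS_BLOCK_STORAGE_TYPE="s3"
export STORAGE_LAKEFS_BLOCK_STORAGE_BUCKET_NAME="<bucket-name>"
export STORAGE_S3_ENDPOINT="http://localhost:9000"
export STORAGE_S3_REGION="us-east-1"
export STORAGE_S3_AUTH_USERNAME="<minio-access-key>"
export STORAGE_S3_AUTH_PASSWORD="<minio-secret-key>"
```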
Before you import the project, you need the "Scala" and "SBT Executor" plugins installed in IntelliJ.
- In IntelliJ, open `File -> New -> Project From Existing Source`, then choose the `core` folder.
- In the next window, select `Import Project from external model`, then select `sbt`.
- In the next window, make sure `Project JDK` is set. Click OK.
- IntelliJ should import and build this Scala project. In the terminal under `core`, run:

```sh
sbt clean protocGenerate
```

This generates the code specified by the proto files, and the IntelliJ indexing should start. Wait until the indexing and importing are completed. On the right, you can open the sbt tab and check the loaded `core` project and a couple of sub-projects under `core`:

- When IntelliJ prompts "Scalafmt configuration detected in this project" in the bottom-right corner, select "Use scalafmt formatter". If you missed the IntelliJ prompt, you can check the `Event Log` at the bottom right.
- To check lint, run `sbt "scalafixAll --check"` under `core`; to fix lint issues, run `sbt scalafixAll`.
- To check format, run `sbt scalafmtCheckAll` under `core`; to fix format, run `sbt scalafmtAll`.
- When you need to execute both, scalafmt should be executed after scalafix, as shown below.
The easiest way to run backend services is in IntelliJ. We currently have a couple of microservices for different purposes:
- `TexeraWebApplication`: provides user login, community-resource reads & writes, and loading of the available operators' metadata.
- `FileService`: provides dataset-related endpoints, including dataset management, access control, and reading & writing files across datasets.
- `WorkflowCompilingService`: propagates the schema and checks for static errors while the workflow is being constructed.
- `ComputingUnitMaster`: manages workflow execution and serves as the master node of the computing cluster.
- `ComputingUnitWorker`: a worker node in the computing cluster (it is not a web server).
To be able to run a workflow using Amber, a distributed engine, we need to run the controller process, which is `TexeraWebApplication`, and a master node of the computing cluster, which is `ComputingUnitMaster`.
To run each of the above web servers, go to the corresponding Scala file (e.g., for `TexeraWebApplication`, find `TexeraWebApplication.scala`), then run the main function by pressing the green Run button, and wait for the process to start up.
For `TexeraWebApplication`, the following messages indicate that it is successfully running:

```
[main] [akka.remote.Remoting] Remoting now listens on addresses:
org.eclipse.jetty.server.Server: Started
```
- If IntelliJ displays `CreateProcess error=206, The filename or extension is too long`: add `-Didea.dynamic.classpath=true` in `Help | Edit Custom VM Options` and restart the IDE.
For `ComputingUnitMaster`, the following prompt indicates that it is successfully running:

```
---------Now we have 1 node in the cluster---------
```
An alternative way to run the backend engine is from the command line. Navigate to the `core` folder in a terminal window and run `scripts/deploy-daemon.sh`, which launches all microservices as background processes; to terminate them, run `scripts/terminate-daemon.sh`.
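For example:

```sh
cd core
scripts/deploy-daemon.sh       # launch all backend microservices in the background
# ... develop and test against the running services ...
scripts/terminate-daemon.sh    # shut them down again
```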
- The test framework is `scalatest`. For the Amber engine, tests are located under `core/amber/src/test`; for `WorkflowCompilingService`, tests are located under `core/workflow-compiling-service`. You can find unit tests and e2e tests.
- To execute them, navigate to the `core` directory in the command line and execute `sbt test` (see the example after this list).
- If using IntelliJ to execute the test cases, please make sure you are in the correct working directory:
  - For the Amber engine's tests, the working directory should be `core/amber`.
  - For the compiling service's tests, the working directory should be `core`.
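A quick sketch of running the tests from the command line; the suite name in the second command is hypothetical, shown only to illustrate sbt's `testOnly` filter:

```sh
cd core
sbt test                     # run all test suites
sbt "testOnly *SchemaSpec"   # hypothetical suite name: run a single matching suite
```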
This is for developers who work on the frontend part of the project. This step is NOT needed if you develop the backend only.
We recommend using nodejs@18 LTS; yarn@4.5.1 is required.
Install yarn:

```sh
npm install -g yarn
corepack enable && corepack prepare yarn@4.5.1 --activate && yarn --cwd core/gui set version 4.5.1
```
You need to install the Angular CLI to build and run the new GUI:

```sh
yarn install
```

Ignore the warnings (warnings are usually marked in yellow or start with `WARN`).
- In IntelliJ, open `File -> Open`, then choose the `gui` folder inside `core`.
- IntelliJ should import this project. Wait until the indexing and importing are completed.
- Click the green Run button next to the `Angular CLI Server`.
- Wait for some time and the server will start. Open a browser and access `http://localhost:4200`. You should see the Texera UI with a canvas.
Every time you save the changes to the frontend code, the browser will automatically refresh to show the latest UI.
Before merging your code to the master branch, you need to pass the existing unit tests first.
- Open a command line and navigate to the `core/gui` directory.
- Start the tests: `ng test --watch=false`
- Wait for some time and the tests will start.
You should also write unit tests to cover your code. When others need to change your code, they will have to pass these unit tests, which keeps your features safe. Unit tests should be written in `.spec.ts` files.
Run the following command:

```sh
yarn run build
```

This command optimizes the frontend code to make it run faster. This step takes a while. After that, start the backend engine in IntelliJ and use your browser to access http://localhost:8080.
Run the following command:

```sh
yarn format:fix
```

This command fixes the formatting of the frontend code.
- Install [email protected]; it is recommended to create a virtualenv to get a clean copy of Python. Note: if you are using Apple's M1 chip, please install Python through Anaconda.
- Obtain the Python executable path: for example, run `which python` or `where python`, and copy the returned path.
- Fill the Python executable path into `core/amber/src/main/resources/udf.conf`, under the `path` key.
- Install dependencies: `pip install -r core/amber/requirements.txt -r core/amber/operator-requirements.txt -r core/amber/r-requirements.txt`.
- To format Python files: `black core/amber/src/main/python`.
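Putting the list above together, a sketch of the environment setup (the virtualenv name is arbitrary; on Apple silicon, use a conda environment instead):

```sh
python -m venv texera-venv        # name is arbitrary
source texera-venv/bin/activate
which python                      # copy this path into core/amber/src/main/resources/udf.conf
pip install -r core/amber/requirements.txt \
            -r core/amber/operator-requirements.txt \
            -r core/amber/r-requirements.txt
black core/amber/src/main/python  # optional: format the Python sources
```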
- Make sure you have installed PostgreSQL and configured your local PostgreSQL instance with `core/scripts/sql/texera_ddl.sql`.
- Edit `core/gui/src/environments/environment.default.ts`: change `userSystemEnabled` to `true`.
- Edit `core/amber/src/main/resources/application.conf`: change `user-sys.enabled` to `true`.
- Edit `core/workflow-core/src/main/resources/storage.conf`: change `jdbc.url`, `jdbc.username`, and `jdbc.password` to the Postgres user and password for accessing the `texera_db` you just created (see the sketch after this list).
- Optional: add `googleClientId` to the same file to enable Google login.
- Restart the frontend and backend. You should see the homepage, where you can register or log in.
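A sketch of the JDBC entries in `storage.conf`; the exact nesting may differ in your version of the file, and the values below are illustrative:

```hocon
jdbc {
  url = "jdbc:postgresql://localhost:5432/texera_db"
  username = "postgres"          # your Postgres user
  password = "<your-password>"   # your Postgres password
}
```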
- Set `clientId` and `clientSecret` in `core/amber/src/main/resources/application.conf` (see the sketch after this list).
- Log in to Texera with an admin account.
- Open the Gmail dashboard under the admin tab.
- Authorize a Google account for sending emails.
- Send a test email.
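A sketch of the two keys in `application.conf`; look for the existing `clientId` / `clientSecret` entries, since their enclosing section may differ from this flat layout:

```hocon
clientId = "<google-oauth-client-id>"          # illustrative placeholder
clientSecret = "<google-oauth-client-secret>"  # illustrative placeholder
```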
This part is optional; you only need to do this if you are working on a specific task.
- Install MongoDB in your development environment (4.4 works for Ubuntu; 5.0 works for Mac/Windows).
- Start MongoDB with the default configuration.
- Edit `core/amber/src/main/resources/application.conf`: change `storage.mode` to `"mongodb"`.
- Start Texera; all the results of sink operators will be saved into MongoDB where applicable.
- Create the needed new table in PostgreSQL and update `core/scripts/sql/texera_ddl.sql` to include the new table.
- Run `core/dao/src/main/scala/edu/uci/ics/texera/dao/JooqCodeGenerator.scala` to generate the classes for the new table.
- Create a helper class under `core/amber/src/main/scala/edu/uci/ics/texera/web/resource/dashboard`.

Note: jOOQ creates DAOs for simple operations; if the requested SQL query is complex, the developer can use the generated Table classes to implement the operation.
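As a hedged sketch of that second pattern, here is what a query written against a jOOQ `DSLContext` could look like; the `user` table, `name` column, and helper name are hypothetical, not taken from Texera's schema:

```scala
import scala.jdk.CollectionConverters._
import org.jooq.DSLContext
import org.jooq.impl.DSL

// Hypothetical helper: a query too complex for a generated DAO, written with
// the jOOQ DSL instead. Table and column names are illustrative.
def findUserNames(ctx: DSLContext, prefix: String): List[String] =
  ctx
    .select(DSL.field("name", classOf[String]))
    .from(DSL.table("user"))
    .where(DSL.field("name", classOf[String]).like(prefix + "%"))
    .fetchInto(classOf[String])
    .asScala
    .toList
```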
Edit `core/gui/src/environments/environment.default.ts`: change `localLogin` to `false`.
Edit `core/gui/src/environments/environment.default.ts`: change `inviteOnly` to `true`.
There are two types of permissions for the backend endpoints:
- `@RolesAllowed(Array("Role"))`
- `@PermitAll`

Please don't leave the permission setting blank. If the permission is missing for an endpoint, it will be `@PermitAll` by default.
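A minimal sketch of annotating a JAX-RS resource in Scala; the paths, method names, and the `ADMIN` role are illustrative, not Texera's actual endpoints:

```scala
import javax.annotation.security.{PermitAll, RolesAllowed}
import javax.ws.rs.{GET, Path}

@Path("/example")
class ExampleResource {

  // Restricted to callers holding the (illustrative) ADMIN role.
  @GET
  @Path("/admin-only")
  @RolesAllowed(Array("ADMIN"))
  def adminOnly(): String = "admin data"

  // Explicitly open; prefer stating @PermitAll over leaving the setting blank.
  @GET
  @Path("/everyone")
  @PermitAll
  def everyone(): String = "public data"
}
```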