DOC-753 | Graph ML UI #709
base: main
title: ArangoGraphML Web Interface
menuTitle: ArangoGraphML Web Interface
Title to be discussed (we might rename it to just GraphML)
This shouldn't be the same name twice, but I'm not settled on a particular name. Maybe just ui.md?
aliases:
- getting-started-with-arangographml
---
Solve high-computational graph problems with Graph Machine Learning. Apply ML on a selected graph to predict connections, get better product recommendations, classify nodes, and perform node embeddings. Configure and run the whole machine learning flow entirely in the web interface.
We only have node classification and embeddings available as immediate options. If we mention something like link prediction, we should at least outline how to achieve it.
It would also be good to have a more technical explanation here of how GraphML works (GraphSAGE, using the depth-2 neighborhood, as mentioned in the Slack team channel).
Please also add an overview of the process instead of immediately starting with project creation etc. Users should first get an understanding of the hierarchy and the steps involved.
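For the technical explanation, it may help to show what a "depth 2 neighborhood" means in practice. The following is only a minimal sketch, assuming an undirected graph stored as a plain adjacency dict; the function name and the toy graph are hypothetical, and this is not how GraphML itself is implemented. GraphSAGE additionally samples and aggregates neighbor features, which is left out here.

```python
from collections import deque


def k_hop_neighborhood(adjacency, start, depth=2):
    """Return {node: hop distance} for all nodes within `depth` hops of `start`."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the maximum depth.
        if distances[node] == depth:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical toy graph: a - b - d - e, plus a - c
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}

# "e" is three hops away from "a", so it falls outside the depth-2 neighborhood.
neighborhood = k_hop_neighborhood(adjacency, "a", depth=2)
```

A short illustration like this could sit next to the process overview, so users see which nodes can influence a prediction for a given node.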
To create a new GraphML project using the ArangoDB Web Interface, follow these steps:

- **Select the Target Database** – From the **Database** dropdown in the left-hand sidebar, select the database where the project should reside.
These are steps that should be followed in order, so use an ordered list here.
dropdown -> dropdown menu (or simply just write to select the database without mentioning the specific widget type)
- **Navigate to the Data Science Section** – In the left-hand navigation menu, click on Data Science to open the GraphML project management interface, then click on RunGraphML.
Should we call it the Data Science Suite perhaps?
click on Data Science
-> click **Data Science**
RunGraphML
-> **Run GraphML**
{{< info >}}
The following attributes cannot be used: imdb_feat_description, imdb_feat_genre, imdb_feat_homepage, imdb_feat_id, imdb_feat_imageUrl, imdb_feat_imdb_x_hash, imdb_feat_imdbId, imdb_feat_label, imdb_feat_language, imdb_feat_lastModified, imdb_feat_released, imdb_feat_releaseDate, imdb_feat_runtime, imdb_feat_studio, imdb_feat_tagline, imdb_feat_title, imdb_feat_trailer, imdb_feat_type, imdb_feat_version, imdb_x, imdb_y, prediction_model_output. As some of their values are lists or arrays.
{{< /info >}}
It's fine to mention that certain attributes are not eligible for GraphML but there shouldn't be a list of attributes here that are specific to the dataset, graph, and GraphML project. Users will not have these on the first run, and they will be different based on the mentioned things.
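Instead of a dataset-specific list, the docs could explain the rule generically. As a rough sketch of the kind of check implied by "some of their values are lists or arrays", assuming documents are available as plain Python dicts (the function name and sample documents are hypothetical, and this is not the actual GraphML eligibility check):

```python
def ineligible_attributes(documents):
    """Return the sorted names of attributes that hold a list or tuple in any document."""
    flagged = set()
    for doc in documents:
        for name, value in doc.items():
            # One list-valued occurrence is enough to flag the attribute.
            if isinstance(value, (list, tuple)):
                flagged.add(name)
    return sorted(flagged)


# Hypothetical sample documents, not taken from any real dataset
docs = [
    {"title": "A", "genres": ["drama"], "runtime": 90},
    {"title": "B", "genres": [], "runtime": 120},
]

result = ineligible_attributes(docs)
```

Here only `genres` would be flagged, because its values are lists, while `title` and `runtime` hold scalar values.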
- **Batch size** – The number of documents to process in a single batch.
- **Run analysis checks** – Whether to run analysis checks to perform a high-level analysis of the data quality before proceeding. Default is `true`.
- **Skip labels** – Skip the featurization process for attributes marked as labels. Default is `false`.
- **Overwrite FS graph** – Whether to overwrite the Feature Store graph if features were previously generated. Default is `false`, so features are written to an existing graph.
- **Write to source graph** – Whether to store the generated features in the source graph. Default is `true`.
- **Use feature store** – Enable the use of the Feature Store database, which stores features separately from the source graph. Default is `false`, so features are written to the source graph.
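
The options and defaults listed above can be summarized as a small settings object. This is only a sketch for orientation: the class and field names are hypothetical and do not correspond to a confirmed GraphML API, and the batch size of 1000 is an arbitrary example value because the list states no default for it.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeaturizationOptions:
    """Hypothetical container mirroring the options in the web interface."""

    batch_size: Optional[int] = None      # no default stated in the list above
    run_analysis_checks: bool = True      # high-level data quality analysis
    skip_labels: bool = False             # featurize label attributes too
    overwrite_fs_graph: bool = False      # write to an existing Feature Store graph
    write_to_source_graph: bool = True    # store features in the source graph
    use_feature_store: bool = False       # no separate Feature Store database


# Only the batch size is set explicitly; everything else keeps the documented default.
options = FeaturizationOptions(batch_size=1000)
payload = asdict(options)
```

With the defaults, features end up in the source graph and no separate Feature Store database is used.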
There should be a reasonable amount of additional explanation beyond the labels and tooltips available in the UI to add value.
This is the second step in the ML workflow after featurization. In the training phase, you configure and launch a machine learning training job on your graph data.

#### Select Type of Training Job
This shouldn't be a headline, especially not with the same level as the GraphML tasks
## Prediction Phase

Once the best-performing model has been selected, the final step of the GraphML pipeline is to generate predictions for new or unlabeled data
As I explained, we don't have the capability to only process new/unlabeled data
### Overview

The Prediction interface allows inference to be run using the selected model. It enables configuration of how predictions are executed, which collections are involved, and whether new or outdated documents should be automatically featurized before prediction.
Should add a statement about effects on quality when featurizing new/outdated docs