DOC-753 | Graph ML UI #709

Open: wants to merge 3 commits into base: main


bluepal-thirumala-thotapalli

Description

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:
  • 3.13:


Deploy Preview Available Via
https://deploy-preview-709--docs-hugo.netlify.app

This comment was marked as duplicate.

@Simran-B Simran-B changed the title Doc 753 DOC-753 | Graph ML UI Jun 10, 2025
Comment on lines +2 to +3
title: ArangoGraphML Web Interface
menuTitle: ArangoGraphML Web Interface

Title to be discussed (we might rename it to just GraphML)


This shouldn't be the same name twice, but I'm not settled on a particular name. Maybe just ui.md?

aliases:
- getting-started-with-arangographml
---
Solve high-computational graph problems with Graph Machine Learning. Apply ML on a selected graph to predict connections, get better product recommendations, classify nodes, and perform node embeddings. Configure and run the whole machine learning flow entirely in the web interface.

We only have node classification and embeddings available as immediate options. If we mention something like link prediction, we should at least outline how to achieve that.

It would also be good to have a more technical explanation here of how GraphML works (GraphSAGE, using a depth-2 neighborhood, as mentioned in the Slack team channel).

Please also add an overview of the process instead of starting immediately with project creation. Users should first get an understanding of the hierarchy and the steps involved.
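
To make the "depth 2 neighborhood" point concrete for the technical explanation, here is a minimal sketch of GraphSAGE-style neighbor aggregation. It is not how GraphML is implemented: real GraphSAGE applies learned weight matrices and a nonlinearity at each layer and usually samples neighbors, all of which is left out here. The function names, the toy graph, and the feature values are made up for illustration.

```python
from collections import deque


def neighborhood(adj, start, depth=2):
    """Return all nodes reachable from `start` within `depth` hops (excluding `start`)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    seen.discard(start)
    return seen


def mean_aggregate(adj, features):
    """One simplified aggregation step (no learned weights): average each node's
    own feature vector with the mean of its direct neighbors' vectors."""
    out = {}
    for node, own in features.items():
        nbrs = [features[n] for n in adj.get(node, ()) if n in features]
        if not nbrs:
            out[node] = list(own)
            continue
        nbr_mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out[node] = [(a + b) / 2 for a, b in zip(own, nbr_mean)]
    return out


# Hypothetical path graph a - b - c - d; only "a" starts with a nonzero feature.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
features = {"a": [1.0], "b": [0.0], "c": [0.0], "d": [0.0]}

h1 = mean_aggregate(adj, features)  # information has travelled 1 hop
h2 = mean_aggregate(adj, h1)        # information has travelled 2 hops
```

After two aggregation steps, the signal from `a` has reached `c` (two hops away) but not `d` (three hops away), which is the practical meaning of a depth-2 neighborhood: each node's representation only depends on nodes at most two hops from it.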


To create a new GraphML project using the ArangoDB Web Interface, follow these steps:

- **Select the Target Database** – From the **Database** dropdown in the left-hand sidebar, select the database where the project should reside.

These are steps that should be followed in order, so use an ordered list here.
dropdown -> dropdown menu (or simply say to select the database, without mentioning the specific widget type)

- **Navigate to the Data Science Section** – In the left-hand navigation menu, click on Data Science to open the GraphML project management interface, then click on RunGraphML.

Should we call it the Data Science Suite perhaps?
click on Data Science -> click **Data Science**
RunGraphML -> **Run GraphML**

Comment on lines +31 to +33
{{< info >}}
The following attributes cannot be used: imdb_feat_description, imdb_feat_genre, imdb_feat_homepage, imdb_feat_id, imdb_feat_imageUrl, imdb_feat_imdb_x_hash, imdb_feat_imdbId, imdb_feat_label, imdb_feat_language, imdb_feat_lastModified, imdb_feat_released, imdb_feat_releaseDate, imdb_feat_runtime, imdb_feat_studio, imdb_feat_tagline, imdb_feat_title, imdb_feat_trailer, imdb_feat_type, imdb_feat_version, imdb_x, imdb_y, prediction_model_output. As some of their values are lists or arrays.
{{< /info >}}

It's fine to mention that certain attributes are not eligible for GraphML, but there shouldn't be a list of attributes here that is specific to the dataset, graph, and GraphML project. Users will not have these attributes on the first run, and the list will differ depending on the dataset, graph, and project.
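
A dataset-independent way to convey the same restriction would be to describe the rule and, optionally, show how users can check their own data. The following is a minimal sketch that assumes the restriction is what the info box says, namely that attributes holding lists or arrays are not usable. The function name and the sample documents are hypothetical, and the documents are plain Python dictionaries rather than the result of any particular driver call.

```python
def list_valued_attributes(documents):
    """Return the sorted names of attributes that hold a list in at least one document."""
    found = set()
    for doc in documents:
        for key, value in doc.items():
            if isinstance(value, (list, tuple)):
                found.add(key)
    return sorted(found)


# Hypothetical sample documents, not taken from any real dataset.
docs = [
    {"_key": "1", "title": "A", "genres": ["drama"]},
    {"_key": "2", "title": "B", "genres": [], "runtime": 90},
]

ineligible = list_valued_attributes(docs)
```

With the sample documents above, only `genres` is reported, because it is the only attribute whose value is a list.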

Comment on lines +37 to +42
- **Batch size** – The number of documents to process in a single batch.
- **Run analysis checks** – Whether to run analysis checks to perform a high-level analysis of the data quality before proceeding. Default is `true`.
- **Skip labels** – Skip the featurization process for attributes marked as labels. Default is `false`.
- **Overwrite FS graph** – Whether to overwrite the Feature Store graph if features were previously generated. Default is `false`, so features are written to an existing graph.
- **Write to source graph** – Whether to store the generated features in the source graph. Default is `true`.
- **Use feature store** – Enable the use of the Feature Store database, which stores features separately from the source graph. Default is `false`, so features are written to the source graph.

To add value, there should be a reasonable amount of explanation beyond the labels and tooltips that are already available in the UI.
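
One way to add that explanation is to show how the options relate to each other. The sketch below only restates the defaults given in the quoted lines. The class, the field names, and the `destinations` helper are hypothetical and do not correspond to a confirmed GraphML API, and the way the two storage flags combine is an assumption drawn from the option descriptions. No default is stated for the batch size, so it is a required argument here.

```python
from dataclasses import dataclass


@dataclass
class FeaturizationOptions:
    """Hypothetical container mirroring the featurization options in the UI."""

    batch_size: int                      # documents per batch (no default stated)
    run_analysis_checks: bool = True     # high-level data quality analysis first
    skip_labels: bool = False            # skip featurization of label attributes
    overwrite_fs_graph: bool = False     # overwrite an existing Feature Store graph
    write_to_source_graph: bool = True   # store generated features in the source graph
    use_feature_store: bool = False      # store features separately from the source graph

    def destinations(self):
        """Assumed reading of the two storage flags: list where features end up."""
        targets = []
        if self.write_to_source_graph:
            targets.append("source graph")
        if self.use_feature_store:
            targets.append("feature store")
        return targets


defaults = FeaturizationOptions(batch_size=1000)
separate = FeaturizationOptions(
    batch_size=1000, write_to_source_graph=False, use_feature_store=True
)
```

With the documented defaults, features land in the source graph only. It would help readers if the docs stated explicitly what happens when both storage options are enabled or both are disabled.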


This is the second step in the ML workflow after featurization. In the training phase, you configure and launch a machine learning training job on your graph data.

#### Select Type of Training Job

This shouldn't be a headline, especially not at the same level as the GraphML tasks.


## Prediction Phase

Once the best-performing model has been selected, the final step of the GraphML pipeline is to generate predictions for new or unlabeled data

As I explained, we don't have the capability to process only new or unlabeled data.


### Overview

The Prediction interface allows inference to be run using the selected model. It enables configuration of how predictions are executed, which collections are involved, and whether new or outdated documents should be automatically featurized before prediction.

We should add a statement about the effect on prediction quality when featurizing new or outdated documents.

2 participants