Include more parts of the old operator manual and add more detailed screenshots for published data #3

Open · wants to merge 7 commits into base: main
3 changes: 2 additions & 1 deletion .github/mkdocs/mkdocs.yaml
@@ -12,7 +12,8 @@ nav:
- Dashboard: login/Dashboard.md
- Datasets:
- datasets/index.md
- Register DOIs: datasets/Publishing.md
- Publishing data: datasets/Publishing.md
- Publishing data Advanced: datasets/PublishingAdvanced.md
- Proposals: proposals.md
- Samples: samples.md
- Instruments: instruments.md
37 changes: 36 additions & 1 deletion docs/backendconfig/dois.md
@@ -1,8 +1,43 @@
# DOI minting in SciCat - How to publish datasets
# DOI minting in SciCat - How to set up publication of datasets

## Introduction
User introduction can be found [here](../doisIntro.md).

## Variables to configure

We repeat here the relevant parts of the [```.env``` file](../backendconfig/index.md#environment-variables) that essentially give the admin user a handle on the DOI setup:

* REGISTER_DOI_URI="https://mds.test.datacite.org/doi"
* REGISTER_METADATA_URI="https://mds.test.datacite.org/metadata"
* DOI_USERNAME="username"
* DOI_PASSWORD="password"

The landing page server that has so far been run as a separate frontend client becomes redundant: besides choosing datasets as the entry point to benefit from the dataset search, one can simply use the publishedData main page as the entry point for displaying all externally accessible DOIs.

## Full potential with SciCat's APIs

The respective endpoints can be viewed in Swagger; the list of API endpoints one can access is shown below. A minimal usage sketch follows the individual endpoint descriptions.
![swagger screenshot](../swagger/img/swagger_publishedData.png)

### Endpoints

#### post
The main one is the ```post``` endpoint, which creates the published data object:

![```post```](../swagger/img/swagger_publishedData_post.png)

The other endpoints are:

#### count
![```count```](../swagger/img/swagger_publishedData_count.png)

#### register
![```register```](../swagger/img/swagger_publishedData_register.png)

#### form populate
![```form populate```](../swagger/img/swagger_publishedData_formpopulate.png)

#### resync
![```resync```](../swagger/img/swagger_publishedData_resync.png)
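
A minimal sketch of how the ```post``` and ```count``` endpoints might be used from Python with the ```requests``` library is shown below. The base URL, the ```/api/v3/publisheddata``` paths, the token handling and the field names are assumptions to be verified against your own Swagger page.

```python
# Minimal sketch, assuming a SciCat backend at BASE_URL and a valid JWT token.
# The paths (/publisheddata, /publisheddata/count) are assumptions; check Swagger.
import requests

BASE_URL = "https://scicat.example.org/api/v3"   # hypothetical instance
TOKEN = "..."                                    # obtained via the login endpoint

headers = {"Authorization": f"Bearer {TOKEN}"}

# post: create a published data document with DataCite-style metadata
published = {
    "doi": "10.xxxx/example-doi",                  # placeholder DOI
    "title": "Example published dataset",
    "creator": ["A. Author"],
    "publisher": "Example Facility",
    "publicationYear": 2024,
    "abstract": "Short abstract of the published datasets.",
    "pidArray": ["20.500.12345/some-dataset-pid"],  # hypothetical dataset PIDs
}
response = requests.post(f"{BASE_URL}/publisheddata", json=published, headers=headers)
response.raise_for_status()

# count: number of published data documents
count = requests.get(f"{BASE_URL}/publisheddata/count", headers=headers).json()
print(count)
```

The ```register``` and ```resync``` endpoints follow the same pattern but act on an existing entry identified by its (URL-encoded) DOI.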
15 changes: 5 additions & 10 deletions docs/backendconfig/index.md
@@ -4,7 +4,7 @@ The configuration file ```.env``` allows the systems administrator to configure

There are currently many configurable additions to SciCat, which make it very flexible; these are:

* OIDC for authenticatoin
* OIDC for identification
* LDAP for authentication
* Elastic Search
* SMTP for sending emails to notify users of SciCat jobs
Expand All @@ -13,7 +13,7 @@ There are currently many configurable additions to SciCat which makes it very fl
## Environment Variables
All environment variables can be used in the ```.env``` file. The current source code contains an example .env file, named _.env.example_, listing all (79) environment variables available to configure the backend. They can be found [here](https://github.com/SciCatProject/scicat-backend-next/blob/master/.env.example) and define

* How SciCat handles [access rights](#how-to-handle-access-rights) and connects to identity providers - such as [LDAP](#how-to-configure-ldap) or [OIDC](#how-to-configure-oidc)
* How SciCat handles access rights and connects to other services e.g. to identity providers - such as LDAP or OIDC for authentication.
* How to configure [DOIs](#how-to-configure-doi-minting).
* How to configure Elastic Search (ES)
* How to configure jobs
@@ -499,17 +499,12 @@ LOGGERS_CONFIG_FILE="loggers.json"
DATASET_TYPES_FILE="datasetTypes.json"
PROPOSAL_TYPES_FILE="proposalTypes.json"
```
### How to configure LDAP
Here are some details that are currently unknown to the author.

### How to configure OIDC
Here are some details that are currently unknown to the author.
### How to connect the backend to other services
In [scicatlive](https://www.scicatproject.org/scicatlive/latest/services/backend/) you will find documentation on how to integrate your SciCat system with services providing identities (e.g. Keycloak) and authentication (e.g. OpenLDAP).

### How to configure DOI minting
In SciCat one can publish selected datasets that triggers a DOI minting process. Find [here](dois.md) a short introduction and instructions how to set up such a service. SciCat also has the option to make datasets publicly available, if you wish to do that follow [this Link](toBeWritten.md)

In SciCat one can publish selected datasets, which triggers a DOI minting process. A short introduction and instructions on how to set up such a service can be found [here](../datasets/Publishing.md). SciCat also has the option to make datasets publicly available via its APIs; to get a better idea of that, follow [this link](dois.md).


## More advanced options

If you are compiling the application from source, you can edit the file _src/config/configuration.ts_ with the correct values for your infrastructure. **This option is still undocumented, although it is our intention to provide a detailed how-to guide as soon as we can.**
18 changes: 10 additions & 8 deletions docs/datasets/Publishing.md
@@ -1,6 +1,9 @@
# Publishing datasets
# Publishing SciCat datasets

There are two ways of _publishing_ datasets in SciCat, one via the "publish button" for each dataset, and secondly when adding to it to a selection of datasets for which a DOI is registered.
There are two ways of _publishing_ datasets in SciCat via the GUI: one using the "publish button" for each dataset, and the other when registering a selection of datasets for a DOI according to DataCite standards. The first option is included in the second.
If you want a more advanced option that fully exploits the available SciCat endpoints, see [here](../backendconfig/dois.md).

A more technical description of the workflow can be found [here](PublishingAdvanced.md).

## Publish without DOI registration

@@ -10,13 +13,13 @@ Each dataset has this button on the top right.

## Publish with DOI registration

The user can select one or several datasets for DOI (**Digital Object Identifier**) registration which means that a record in DataCite, a DOI provider, will be made that points to a DESY landing page. SciCat offers a DataCite conform schema during the workflow. Any data that is known to the data catalog can be published. The publication workflow does the following:
The user can select one or several datasets for DOI (**Digital Object Identifier**) registration, producing a record at DataCite, a DOI provider, that points to a local, detailed landing page. SciCat offers a DataCite-conformant schema during the workflow. Any data that is known to the data catalog can be published, and the publication workflow goes as follows:

1. The logged in user can define a **set** of datasets to be published.
2. That person assigns metadata relevant to the publication of the datasets, such as title, author (currently the name(s) under 'Creator'), abstract etc. One can work on it at a later stage, too and re-edit the registration. Note, that editing will be allowed once the registration request has been sent.
2. That user assigns metadata relevant to the publication of the datasets, such as title, author (currently the name(s) under 'Creator'), abstract, etc. One can also work on this at a later stage and re-edit the registration. Note that **no** editing is allowed once the registration request has been sent.
3. A DOI is assigned to the published data which can e.g. be used to link from a journal article to the data.
4. It makes the data publicly available by providing a _landing page_ that describes the data.
5. It publishes the DOI to the worldwide DOI system , e.g. from Datacite
4. It makes the data publicly available by providing a detailed _landing page_ that describes the data.
5. It publishes the DOI to the worldwide DOI system via DataCite.


So the first step is to **select the datasets** that should be published:
@@ -43,8 +46,7 @@ Once this is finished one can hit the "register" button (not shown in previous s

![Landing page of published data](img/landingpage.png)

Finally you can have a look at all the published data by going to the Published Data menu item (again by clicking the user icon at the top right corner and choosing "Published Data"):
Finally you can have a look at all the published data by going to the Published Data menu item: click the user icon at the top right corner and choose "Published Data":

![Landing page of published data](img/published_datasets.png)

This [short video](https://scicatproject.github.io/img/attach_and_publish.mp4) demonstrates how you can add an attachment to your dataset and publish the data.
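
If you would rather look at the published data programmatically than through the GUI, a minimal sketch in Python could look as follows. The endpoint path and field names are assumptions to be checked against your instance's Swagger page.

```python
# Minimal sketch: list the published data via the REST API.
# Assumes the /api/v3/publisheddata endpoint is publicly readable on your instance.
import requests

BASE_URL = "https://scicat.example.org/api/v3"   # hypothetical instance

response = requests.get(f"{BASE_URL}/publisheddata")
response.raise_for_status()

for entry in response.json():
    # Field names such as 'doi' and 'title' follow the DataCite-style schema
    # used by SciCat; verify them against your own published data documents.
    print(entry.get("doi"), "-", entry.get("title"))
```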
28 changes: 28 additions & 0 deletions docs/datasets/PublishingAdvanced.md
@@ -0,0 +1,28 @@
# Publishing SciCat datasets Advanced

The previously described options for [publishing datasets](Publishing.md) in SciCat - the process of registering a selection of datasets - are outlined here in a more technical way.

## Implementation workflow: The purpose.

The diagram at the end of this page shows the essential steps in the workflow to be implemented. Please note that SciCat datasets are *always* only *metadata records*: SciCat has no direct access to the storage system, and there is no default coupling to such systems.

### 1. Create a list of selected datasets
The user can select datasets to create a **dataset list**; datasets can be added and removed over several sessions, and the process can be cancelled at any time. A new feature is that, while examining a single dataset, the user can directly add it to or remove it from the selection in the cart.

### 2. Fill the form for this dataset selection
The user is forwarded to a form where one can **provide metadata specific to this selection**, which can e.g. include site-specific information about grants, associated projects, etc. All selected datasets will be made public, and you will be asked to verify the selection of datasets. Owners and admins are allowed to update this form. Again, this shall be possible over several sessions.

### 3. Publish the selection
After hitting the button, all selected datasets become publicly visible: not only the owner can then view the metadata of the data, date of creation, associated file names, location, PI, etc. This is a **prerequisite** for DOI registration.

### 4. DOI registration
Before hitting the registration button the dataset selection has an "internal" DOI, i.e. an unregistered DOI, clearly indicated by the state of this registration request.
When hitting the register button, all the metadata is forwarded to the DOI provider DataCite, if configured; see the [backend config](../backendconfig/dois.md). For quality control your site may run an external service in between before forwarding the request. *Pending request* is indicated until the request is forwarded to DataCite. Note that from then on no more changes are possible for the requester: the concept of DOIs is to never change the metadata/data behind a DOI.

### 5. For Admins only
In extremely rare cases, and only if justified, i.e. in the case of serious errors, an update can be made by admins only.


![workflow diagram](img/published_data_workflow_1.png)
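
For step 4, the registration itself boils down to a single API call once the published data entry exists. The sketch below is a non-authoritative illustration in Python; the path and the URL-encoding of the DOI are assumptions to verify against your Swagger page.

```python
# Minimal sketch of step 4: ask the backend to forward an existing published data
# entry to DataCite (only works if DataCite is configured, see the backend config).
import urllib.parse

import requests

BASE_URL = "https://scicat.example.org/api/v3"   # hypothetical instance
TOKEN = "..."                                    # token of the owner or an admin
doi = "10.xxxx/example-doi"                      # the "internal", not yet registered DOI

encoded_doi = urllib.parse.quote(doi, safe="")   # DOIs contain '/', so encode them for the URL
response = requests.post(
    f"{BASE_URL}/publisheddata/{encoded_doi}/register",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()   # once this succeeds, the requester can no longer edit the entry
```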


Binary file added docs/datasets/img/publish_button.png
Binary file added docs/datasets/img/published_data_workflow_1.png
5 changes: 2 additions & 3 deletions docs/datasets/index.md
@@ -1,8 +1,7 @@
# Datasets
Datasets can include several files which e.g. comprise a self-contained measurement - which is fully customizeable during ingestion of meta data. Users can search, view, list the meta data of a dataset.

To group and tag datasets is depicted [here](grouping_tagging_ds.md). Datasets can also be issued to be published: either removing the restricted view or triggering the process of obtaining a DOI for the selected datasets, see [old description](Publishing.md).
SciCat datasets are sets of metadata and can include several files which e.g. comprise a self-contained measurement - this is fully customizable during ingestion of metadata. Users can search, view and list the metadata of a dataset.

How to group and tag datasets is described [here](grouping_tagging_ds.md). Datasets can also be **published**: either by removing the restricted view or by triggering the process of obtaining a DOI for the selected datasets; see the description of [publication of SciCat datasets](Publishing.md).

### How to query datasets

Binary file added docs/operator-manual/img/DacatDataflowV3.png
Binary file added docs/operator-manual/img/job-assembler.png
80 changes: 69 additions & 11 deletions docs/operator-manual/index.md
@@ -1,24 +1,82 @@
# Welcome to SciCat Operator's Manual

General manual for site-administrators can be found in the [**scicatlive**](https://www.scicatproject.org/scicatlive/latest/) documentation, it contains information how to set up and run a SciCat instance. For troublshooting issues, please see [the User's Guide](../troubleshoot/index.md).
## Overview

## Configuration of the Backend
There is one central place where one has a handle on how the Backend is configured in SciCat: the [dotenv](../backendconfig/index.md) file.
Getting SciCat up and running at your site should be rather straightforward for a test deployment. However, turning it into a production-ready system may involve a bit more work, because different existing systems will need to be interfaced with SciCat.

### Hands-on SciCat
For getting familiar with SciCat's APIs, you can explore via the [Swagger](../swagger/index.md) interface.
## Understanding the Components

## Configuration of the Frontend
For the subsequent sections it will be useful to have a "helicopter" overview of the various components that need to play together seamlessly. The following diagram shows these components and also shows potentially existing components at your site that you would likely want to interface to SciCat. The specific diagram essentially reflects the situation at PSI as of Sept. 2020. Of course your situation may look different; the diagram should therefore be seen as an example, which you need to adapt to your situation.

![ToDo: Updated schematic view of SciCat components](img/DacatDataflowV3.png)

## Up-to-date operator's information
Generally, the [**scicatlive**](https://www.scicatproject.org/scicatlive/latest/) documentation contains up-to-date information on how to set up and run ```SciCat``` and interface it with various external, site-specific services. For troubleshooting issues, please refer to [the User's Guide](../troubleshoot/index.md).

Here we link to site-specific set ups.
## Backend
At the heart of the SciCat architecture there is the **REST API server**. This is a NodeJS application that uses the NestJS framework to generate RESTful APIs from JSON files that define the models, such as Users, Datasets, Instruments and Proposals. Following the Swagger/OpenAPI format, SDKs can be generated in almost any language. You can explore the backend APIs directly via the [Swagger](../swagger/index.md) interface.

## Links to site-specific SciCat documentation of user sites
The persistence layer behind this API server is a **MongoDB** instance, i.e. an open source, NoSQL, document-based database solution. The API server handles all the bi-directional communication between the REST interface and the database.

* ESS
* [PSI](../sites/PSI/index.md)
* MAXIV
These two components together comprise the "backend" of the architecture.
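
To give a feel for how a client talks to the backend, here is a minimal, hedged sketch in Python that logs in and lists a few datasets. The ```/auth/login``` and ```/datasets``` paths, the payload shape and the ```access_token``` field name are assumptions based on the current backend and should be checked against your Swagger page.

```python
# Minimal sketch: authenticate against the SciCat backend and list a few datasets.
# Endpoint names and response fields are assumptions; verify them in Swagger.
import requests

BASE_URL = "https://scicat.example.org/api/v3"   # hypothetical instance

# Log in with a functional account to obtain a JWT token
login = requests.post(
    f"{BASE_URL}/auth/login",
    json={"username": "ingestor", "password": "secret"},  # placeholder credentials
)
login.raise_for_status()
token = login.json()["access_token"]             # field name may differ per backend version

# Query the datasets endpoint with the token
datasets = requests.get(
    f"{BASE_URL}/datasets",
    headers={"Authorization": f"Bearer {token}"},
)
datasets.raise_for_status()
for dataset in datasets.json()[:5]:
    print(dataset.get("pid"), dataset.get("datasetName"))
```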

### Configuration of the backend
There is one central place where one has a handle on how the backend is configured in SciCat: the [dotenv](../backendconfig/index.md) file.

### Example: How to integrate with OIDC using Keycloak

Integration with an identity provider such as Keycloak can be done using OpenID Connect (OIDC), a protocol for authentication.
See the [scicatlive manual](https://www.scicatproject.org/scicatlive/latest/services/backend/services/keycloak/) for more information on setting up this integration in the SciCat backend.

## Frontend

An arbitrary number of "clients" (frontends) can be connected to the REST server. One of the most important clients is the web-based GUI frontend, which allows users to communicate with the data catalog in a user-friendly way. It is based on Angular (9+) and uses ngrx to communicate with the SciCat API, providing a searchable interface for datasets as well as the option to carry out actions (e.g. archiving).

In addition to the GUI, other clients exist, such as command line (CLI) clients (examples exist written in Go and Python) or desktop GUI applications based on Qt. The CLI tools are especially useful for automated workflows, e.g. to get the data into the data catalog; this process is termed "ingestion" of the data. But they can also be used to add data manually, especially for derived data, since this part of the workflow is often not possible to automate, in particular in truly experimental setups.

### Configuration of the frontend

To start a local instance of the frontend follow this recipe: install the requirements (especially Angular), git clone the [code](https://github.com/SciCatProject/frontend), go to the directory and run "npm run start". Then you can open it by entering "localhost:4200" in your browser.

### How to include site-specific logos
See [here](https://github.com/SciCatProject/frontend/blob/master/SITE-LOGO-CONFIGURATION.md) for an example procedure on how to include your logo.

### Messaging infrastructure

A strength of SciCat is that it integrates into almost any existing infrastructure, because **messaging systems** that take over the communication with other services and systems can easily be interfaced to SciCat.

In particular RabbitMQ (used at PSI) and Apache Kafka are in use. Such systems can e.g. be used to interface to a tape archive system. To add the specific business logic you can e.g. add your own scripting layer. At PSI, however, a Node-RED based solution proved to be a stable and flexible platform for this purpose. Node-RED is a NodeJS based visual programming tool to handle flows of data from one source to another. The following shows the Node-RED flow used for communicating job requests to the PSI archive system.

![Node-RED](img/job-assembler.png)
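
As an illustration of the scripting-layer idea, a small consumer of job messages could look like the sketch below; the broker URL and queue name are placeholders for your site's configuration, and at PSI this role is played by Node-RED rather than by a hand-written script.

```python
# Minimal sketch: consume SciCat job messages from RabbitMQ and hand them over to
# site-specific business logic (e.g. forwarding archive requests to a tape archive).
# Broker URL and queue name are placeholders.
import json

import pika

connection = pika.BlockingConnection(
    pika.URLParameters("amqp://guest:guest@localhost:5672/%2F")
)
channel = connection.channel()
channel.queue_declare(queue="scicat.jobs", durable=True)

def on_message(ch, method, properties, body):
    job = json.loads(body)                     # a SciCat job document, e.g. an archive request
    print("received job:", job.get("type"))    # insert your site-specific handling here
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="scicat.jobs", on_message_callback=on_message)
channel.start_consuming()
```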


### Different entry points to SciCat

One can usually see SciCat datasets, that is, the metadata of the data taken. It will be possible to sort according to samples, proposals, instruments and published data. Integration and generalisation of these entry points to the catalogue is currently in development. Another strength of SciCat is that it provides a publishing server.

#### Publishing Server

In order to publish data you need to run a landing page server and you need to assign DOIs to your published data. Since the API server may be operated in an intranet with no access to the internet, the following architecture was chosen at PSI:

An OAI-PMH server is running in a DMZ connected to a local Mongo instance. At publication time the data from SciCat is pushed to the external OAI-PMH server. From this server the landing page server can fetch the information about the published data. Also external DOI systems connect to this OAI-PMH server to synchronize the data with the world wide DOI system.

If a user wants to download the full datasets of the published data, the data is copied from the internal file server to an HTTPS file server (acting as a cache file server), which subsequently allows anonymous download of the data.
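
Since OAI-PMH is a plain HTTP protocol, the landing page server (or any harvester) can fetch the published records with a simple request. In the sketch below the server URL is a placeholder, while the ```verb``` and ```metadataPrefix``` parameters are standard OAI-PMH.

```python
# Minimal sketch: harvest published data records from the OAI-PMH server in the DMZ.
# The base URL is a placeholder; ListRecords and oai_dc are standard OAI-PMH values.
import xml.etree.ElementTree as ET

import requests

OAI_BASE = "https://doi.example.org/oaipmh/request"   # hypothetical OAI-PMH endpoint

response = requests.get(OAI_BASE, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
response.raise_for_status()

root = ET.fromstring(response.text)
ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
for record in root.findall(".//oai:record", ns):
    identifier = record.find(".//oai:identifier", ns)
    print(identifier.text if identifier is not None else "<no identifier>")
```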

### Underlying Infrastructure of SciCat as a Service

You may or may not run the infrastructure as part of a Kubernetes cluster. E.g. at PSI the API server, the GUI application, RabbitMQ and the Node-RED instances are all deployed to a Kubernetes cluster, whereas the Mongo DB is kept outside Kubernetes. Kubernetes is not necessary, but it can simplify operations; the same holds for "helm charts" or similar tools for managing software applications as a service. <!--Also, the separation into internet and intranet zones can be defined as required -- OK HOW??. You can, of course, operate the whole infrastructure directly in internet accessible servers, if security policies permit.-->

## Who uses SciCat?

Traditionally, PSI, ESS and MaxIV developed and deployed SciCat. More institutions have since joined the effort and pushed its development, and many photon and neutron labs in Europe and worldwide deploy it; see our project's overview of [all facilities](https://www.scicatproject.org/#facilities), contributors and users of SciCat.

Below is a list of their documentation with more details on their deployment.

* ESS - European Spallation Source
* [PSI - Paul-Scherrer-Institute](../sites/PSI/index.md)
* MAXIV
* RFI
* ALS
* SOLEIL
* [DESY](../sites/DESY/index.md)
