hub online synchronization #82

Open
wants to merge 16 commits into
base: master
228 changes: 228 additions & 0 deletions accepted/0000-hub-online-synchronization.md
@@ -0,0 +1,228 @@
- Feature Name: Hub Online synchronization
- Start Date: (fill with today's date, YYYY-MM-DD)

# Summary
[summary]: #summary

A new online synchronization mechanism between the SUSE Manager HUB and peripheral servers that is easier to use and provides automatic content synchronization by re-using existing, stable mechanisms.

In the field, HUB deployments are set up with a network connection between the HUB and the peripheral servers, so disconnected environments can be treated as a separate scope.

# Motivation
[motivation]: #motivation


With the Hub deployment architecture we provided a concept of centralized content management in SUSE Manager server, which would then synchronize the data to the peripheral servers. To support it, a new inter-server-sync (ISSv2) tool was developed with the goal of replacing the existing mechanism.

The new ISSv2 was designed to allow the transfer of different data types, like software channels, configuration channels and images. The design makes it agnostic to database changes since it uses database metadata (if the new fields and tables follow the SUMA database conventions) and also gives support for fully disconnected environments.

ISSv2 has some constraints and limitations. Instead of trying to solve the issues with ISSv2, this RFC proposes a different approach that re-uses existing features to provide a more complete solution.

## Existing Solutions

Let's make a more detailed analysis of each of the existing inter-server-sync solutions, with pros and cons.

### ISS v1

Based on an old code base, with declarative fields to be transferred (changes in database would mean changes in the transfer specification).

Cons:
- Only works with software channels
- Declarative fields to be transferred, serialized in XML
  - Changes in the database structure require adapting the field declaration
- Old code base

Pros:
- Simple usage, with initial configuration through the web-UI
  - Adding a channel needs to go through a command line tool
- Automatic channel and product synchronization with the mgr-sync Taskomatic job (delegates the call to the `mgr-inter-sync` CLI tool)
- Cross-organization synchronization

### ISSv2

Cons:
- Slow to export and import
  - Data transfer is not optimized: it uses SQL statements within a single transaction. The bigger the channel, or the more channels in a single export, the bigger the problem.
- If the database schema doesn't follow the conventions, manual corrections are needed
- Many manual steps are needed to run it and make it more efficient
- No UI support
- Hard to debug
- No good feedback during import
- No cross-organization import (organizations need to have the same name in source and target)
- Needs the whole cloned channel hierarchy to be able to use SP migration

Pros:
- Can transfer more than just software channels (images and configuration channels)
- Sync process is started from the server side
- Export once, import in parallel in all peripherals
- Support for fully disconnected environments


## Proposed solution

One of SUSE Manager's focus areas is synchronizing and managing content, especially RPMs. To do so, it synchronizes remote channels, applies filters to produce new channels (CLM) and makes these channels available for consumption.

The overall idea of the Hub Online synchronization is to re-use the existing repo-sync mechanism and synchronize channels on the peripheral from the repositories published by the HUB (this requires an online connection between the hub and the peripheral servers).

In the first stage the focus will be synchronizing software channels, since it's the most problematic one in terms of performance and automation. However, we should be able to re-use the integration mechanism for other data types, since everything will be implemented through the API.

We will need a mechanism to create the channels in the peripheral servers (vendor and custom ones) in the desired organization. Those channels need to have the corresponding HUB server repository URL as their content source, with the correct authentication token. With this, repo-sync will take care of channel synchronization (a proof of concept has already been tested).

Pros:
- Channels are updated automatically during the night with repo-sync/mgr-sync
- No need for specific code adaptations in case of schema changes; those are already handled by the corresponding repo-sync changes
- Relies on a stable specification: repository metadata
- After initial configuration, all synchronization between hub and peripheral should be automatic and transparent
- Only differences are transferred
- Promoting environments in CLM automatically propagates the changes to all connected peripherals
- The bootstrap repository is created automatically by the repo-sync tool

Cons:
- All peripherals can start synchronizing at the same time
  - This can be problematic if several peripherals perform a full synchronization at the same time. However, we can configure the peripherals to run repo-sync at different hours and spread the load
  - This is only a problem for the first sync, since subsequent syncs only transfer differences
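
One way to spread the load is to derive each peripheral's repo-sync start time from its FQDN, so that every peripheral gets a stable but different slot inside a nightly window. The following is only a sketch: the function names, the six-hour window and the hashing scheme are assumptions made for illustration and are not part of this RFC or of any existing SUSE Manager API.

```python
import hashlib


def sync_offset_minutes(peripheral_fqdn: str, window_minutes: int = 360) -> int:
    """Map a peripheral FQDN to a stable offset (in minutes) inside the sync window."""
    digest = hashlib.sha256(peripheral_fqdn.encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as an integer and fold it into the window.
    return int.from_bytes(digest[:4], "big") % window_minutes


def sync_start_time(peripheral_fqdn: str,
                    window_start_hour: int = 0,
                    window_minutes: int = 360) -> str:
    """Return the HH:MM start time for this peripheral (hypothetical helper)."""
    total = window_start_hour * 60 + sync_offset_minutes(peripheral_fqdn, window_minutes)
    return f"{(total // 60) % 24:02d}:{total % 60:02d}"


if __name__ == "__main__":
    # Example host names are placeholders.
    for fqdn in ("peripheral1.example.com", "peripheral2.example.com"):
        print(fqdn, sync_start_time(fqdn, window_start_hour=22))
```

Because the offset depends only on the FQDN, a peripheral keeps the same slot across restarts, and no central coordination is needed. Two peripherals can still land on the same minute, so this reduces contention rather than eliminating it.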

Member

@srbarrios srbarrios Sep 19, 2024


When synchronizing the channels from the SCC service, in theory, we rely on a service with HA.
In this proposal, the peripherals will rely on a single Hub instance through custom and vendor channels pointing to that machine. What happens if it goes down or... the Hub disk burns? I would also consider how we can recover in these cases.

Member Author


We rely on the HUB, but the repo-sync tool already has a retry mechanism. It will try to download/sync the content in the next iteration of the mgr-sync Taskomatic task.

Member Author


@srbarrios is this still a question for you?


## Tools scope

With this solution we can change the scope of the existing tools. For ISSv1 this would be a full replacement, meaning this tool/feature can be removed in SUSE Manager 5.0.

ISSv2 will continue to exist, but its scope changes: it becomes the tool to synchronize content between fully disconnected environments and is no longer meant for HUB set-ups. However, since this first implementation does not cover images and configuration channels, we still need to use ISSv2 in HUB set-ups for these scenarios (this should be temporary, until they are implemented in HUB Online Synchronization).

The new Hub synchronization would be focused on HUB online deployments, to allow scaling the SUSE Manager infrastructure. The only focus of this solution would be to work at scale and to be easier to use.


# Detailed design
[design]: #detailed-design

The solution focuses on re-using the repo-sync mechanism to synchronize channels from Hub server to peripheral servers. It will be described in several steps that aim to address how it will be used and how we can technically support it.

The main focus is to develop the integration between HUB server and peripherals through API only and re-using existing API methods as much as possible. However, some new custom methods will be needed.

Configuration should be pushed from the HUB to the peripherals, to avoid the need to manage the peripherals directly.

For now this solution is focused on software channels only, but can be extended to synchronize more data types.


## Define connection between HUB and Peripherals

We can follow an approach similar to what exists in ISSv1. On the hub side we can define multiple peripheral servers to connect to by providing the FQDN and an authentication token.
On the peripheral side we also need to define the Hub server FQDN and an authentication token.
Contributor


Hub will have peripheral and generate associated auth token, right?

Why do we need to provide peripheral FQDN? Wouldn't generic peripheral name (may be FQDN) and generated token be enough? Or do we approach this as username/pass scenario?

I am assuming that connection will be always from peripheral to the hub, right?

Member Author


It should be used for authentication from the HUB to the peripheral.
Communication will be bi-directional. In some cases the peripheral will call the hub (like synchronizing software channels and calling SCC endpoints); in other cases the HUB will call the peripheral API (like creating channels, pushing configuration channels, etc.).


Each peripheral server can only have one Hub server (master). This avoids dealing with problems like channel label conflicts between multiple masters. Configuring a hub server will block access to a set of menu items such as "Admin" -> "Setup wizard" -> "Organization Credentials", "Products" and "PAYG Connection" (similar to what we already have for ISSv1).
We cannot add a Hub connection if SCC credentials are defined; the two should be mutually exclusive.
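
The connection rules above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names, fields and error messages are hypothetical, and the real data would live in the (re-used) ISSv1 database tables rather than in Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HubConfig:
    """Hub side: any number of peripherals, each with its own token."""
    peripherals: Dict[str, str] = field(default_factory=dict)  # FQDN -> token

    def add_peripheral(self, fqdn: str, token: str) -> None:
        self.peripherals[fqdn] = token


@dataclass
class PeripheralConfig:
    """Peripheral side: at most one Hub, and never together with SCC credentials."""
    fqdn: str
    scc_credentials_defined: bool = False
    hub_fqdn: Optional[str] = None
    hub_token: Optional[str] = None

    def register_hub(self, hub_fqdn: str, token: str) -> None:
        if self.scc_credentials_defined:
            raise ValueError("SCC credentials are defined; a Hub cannot be added")
        if self.hub_fqdn is not None and self.hub_fqdn != hub_fqdn:
            raise ValueError("a peripheral can only have one Hub server")
        self.hub_fqdn = hub_fqdn
        self.hub_token = token


if __name__ == "__main__":
    hub = HubConfig()
    hub.add_peripheral("peripheral1.example.com", "token-for-peripheral1")
    peripheral = PeripheralConfig(fqdn="peripheral1.example.com")
    peripheral.register_hub("hub.example.com", "token-for-hub")
    print(peripheral)
```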

We can re-use and improve existing ISSv1 database tables to save the needed data.


## Hub as a proxy for SCC data

The SUSE Manager server needs a set of metadata to be able to operate. Currently that metadata is provided by SCC directly or, in the case of PAYG, provided by the cloud RMT infrastructure. We should also provide this data in SUSE Manager HUB to be consumed by the peripheral servers.
The minimal endpoints to be provided are:
- "/suma/product_tree.json"
- "/connect/organizations/products/unscoped"

On top of this, we should also provide an endpoint for peripherals to send the status data needed by SCC (for example, the registered minions and their hardware information). Peripheral servers should send this data to the HUB instead of SCC, and the HUB server should consolidate it and send it to SCC.
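
As a rough sketch of the consolidation step, the HUB could collect the reports received from each peripheral and flatten them into one list, tagging every entry with the peripheral that reported it. The payload shape (`id`, `hostname`, `cpus`), the `peripheral` key and the function name are assumptions made for this example; the actual format expected by SCC is not defined in this RFC.

```python
from typing import Dict, List


def consolidate_reports(reports_by_peripheral: Dict[str, List[dict]]) -> List[dict]:
    """Flatten per-peripheral system reports into a single list.

    System identifiers are only unique within one peripheral, so each entry
    keeps the FQDN of the peripheral that reported it.
    """
    consolidated = []
    for fqdn in sorted(reports_by_peripheral):  # stable ordering across runs
        for system in reports_by_peripheral[fqdn]:
            consolidated.append({**system, "peripheral": fqdn})
    return consolidated


if __name__ == "__main__":
    # Placeholder data, not a real SCC payload.
    reports = {
        "peripheral2.example.com": [{"id": 1000010000, "hostname": "minion-b", "cpus": 4}],
        "peripheral1.example.com": [{"id": 1000010000, "hostname": "minion-a", "cpus": 2}],
    }
    for entry in consolidate_reports(reports):
        print(entry)
```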

## Peripheral software channels creation

We need a mechanism to create the channels in the peripheral servers (vendor and CLM ones) in the desired organization. The peripheral channel creation must be done automatically from the HUB server through an API. Since this is a special kind of channel creation (defined next), those API methods should be available for server-to-server communication only.
Contributor


I think it would be good to define in a bit more detail what this server-to-server API should look like.
This should also say something about the existing API namespaces sync.master and sync.slave.

  • What namespace should be used for it?
  • one namespace or multiple?
  • design it with an outlook to the future and what else needs to be added in future to this API. E.g. activation keys, config channels, images, formulas, etc.
  • how should the authentication work?

Member Author


Hey Michael. I added some clarification about API namespace and use cases. I didn't add any details about the exact API methods to develop because it looks to me like it's an implementation detail.
Could you have a look if it's more clear now? Thank you


For each peripheral we should define the organization mapping between HUB organizations and peripheral ones, following these rules (similar to what we have in ISSv1):
- Each Hub organization can only be mapped to one peripheral organization
- A peripheral organization can receive data from only one HUB organization, which does not need to have the same identifier
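
The two rules above amount to a one-to-one mapping per peripheral. A minimal validation sketch, assuming the mapping is given as `(hub_org_id, peripheral_org_id)` pairs (the function name and the input shape are hypothetical):

```python
from typing import Dict, Iterable, Tuple


def validate_org_mapping(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Check the organization mapping for ONE peripheral.

    Returns a dict hub_org_id -> peripheral_org_id, or raises ValueError
    if one of the two mapping rules is violated.
    """
    hub_to_peripheral: Dict[int, int] = {}
    peripheral_to_hub: Dict[int, int] = {}
    for hub_org, peripheral_org in pairs:
        # Rule 1: a Hub organization maps to only one peripheral organization.
        if hub_to_peripheral.get(hub_org, peripheral_org) != peripheral_org:
            raise ValueError(f"Hub organization {hub_org} is mapped twice")
        # Rule 2: a peripheral organization receives data from only one Hub organization.
        if peripheral_to_hub.get(peripheral_org, hub_org) != hub_org:
            raise ValueError(f"peripheral organization {peripheral_org} has two sources")
        hub_to_peripheral[hub_org] = peripheral_org
        peripheral_to_hub[peripheral_org] = hub_org
    return hub_to_peripheral


if __name__ == "__main__":
    # Identifiers do not have to match between Hub and peripheral.
    print(validate_org_mapping([(1, 1), (2, 5)]))
```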

From the web UI we should be able to select which channels (vendor or custom) should be added to each peripheral. This selection needs to be saved in a new database table on the HUB side. The vendor channel mapping is independent of the organization mapping. Custom channels need to be selected at the organization mapping level, since they are always attached to an organization.

Steps needed to create the channels on the peripheral:
- In the HUB, generate authentication tokens to access the channel(s). Re-use the existing channel access table.
- In the peripheral, create all needed channels (vendor and custom) pointing to the HUB server in the rhncontentsource table. Each channel needs to have its own entry.
- A minimal set of tables will be populated on the peripheral: rhnchannel, rhnchannelcontentsource, rhnchannelproduct, rhnchannelfamilymembers, rhnproductname, rhnchannelcloned, suseproductchannel.


When creating the channels in the peripheral server, the content source will reference the HUB repository that provides that channel. This should be done by a new special API (server-to-server communication) which will add an entry to the `rhncontentsource` table with the URL of the HUB channel repository. With this information, repo-sync will be able to synchronize data from the HUB server without any code change.
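
To make the content source idea concrete, the sketch below builds the kind of entry the new API could write for one channel. Everything specific here is an assumption for illustration: the URL layout (`/rhn/manager/download/<channel-label>?<token>`), the `hub-` label prefix, the dictionary keys and the function name are not defined by this RFC.

```python
from urllib.parse import quote


def hub_content_source(hub_fqdn: str, channel_label: str, token: str) -> dict:
    """Build a hypothetical content source entry pointing to the HUB repository."""
    # ASSUMPTION: the HUB publishes each channel under this path and accepts
    # the channel access token as the query string.
    url = (
        f"https://{hub_fqdn}/rhn/manager/download/"
        f"{quote(channel_label, safe='')}?{quote(token, safe='')}"
    )
    return {
        "label": f"hub-{channel_label}",  # one content source entry per channel
        "source_url": url,
        "type": "yum",
    }


if __name__ == "__main__":
    # Placeholder host, channel label and token.
    entry = hub_content_source("hub.example.com", "sles15-sp5-pool-x86_64", "example-token")
    print(entry["source_url"])
```

Keeping one entry per channel, each with its own token, matches the step list above and lets access to a single channel be revoked on the HUB without touching the others.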

Peripheral servers need to be configured with the flag `java.unify_custom_channel_management`, which will synchronize custom channels during the nightly mgr-sync process.

One important aspect is to recreate the connection between custom channels and vendor channels, so we can have SP migration and avoid the need to synchronize all channel clone chains (ISSv1 does the same).

Vendor channels are not linked to any organization, and can also be synchronized with this method.

## Communication Workflow

![Communication workflow](images/hub_sync_content/comunication_workflow.png)

## Integration with SUSE Manager 5.0

With SUSE Manager containerization we cannot register the peripheral servers as minions anymore, and several components relied on that registration.
This section walks through those problems and how this solution can help solve them.

### HUB XML-RPC API

Component responsible for broadcasting API requests to peripheral servers. This component loads the peripheral servers that are registered on the HUB by looking at the minion entitlements.

With the proposed solution we can retrieve the list of the registered peripheral servers and avoid the peripheral server registration as minion.

### Inter-server-sync-v2

ISSv2 will be replaced by this solution in HUB set-ups once it is fully implemented (including images and configuration channels). It will continue to exist, but for disconnected set-ups only.

### Uyuni-config formula

This proposal doesn't address any of the issues with the configuration formula.

### HUB Report Federation

Reporting database information is collected using a salt state to get the report DB connection information.
With this solution we can replace that mechanism with an API call to the peripheral server that returns the needed information, and in this way remove the need for minion registration.


## Future Steps

This implementation lays the foundations for adding more online content synchronization through the API, like configuration channels, images or activation keys.
This section defines an initial approach to make that possible. It is not as detailed as the proposed solution, since it is out of the implementation scope of this RFC and is described only to give a picture of the full solution.

### Configuration channels

All data for configuration channels is saved in the database and then materialized on disk to be used by salt. The source of truth is the database, and the disk materialization is automatic if API methods are used.

In the HUB server we can collect which configuration channels should be synchronized to each peripheral. Configuration channels are attached to an organization, so they should be part of the organization mapping selection.
Their labels behave differently from software channel labels: the same label can exist in different organizations.

A new Taskomatic job can be created to synchronize data from the HUB server to the peripherals; it should run automatically once per day during a low-usage time.

The existing API methods under the `configchannel` namespace cannot deal with cross-organization mapping. To overcome this we have two options:
- Develop a new endpoint to create and update channels with server-to-server authentication and the possibility to define the target organization. Apart from the organization identifier, all business logic should be the same as in the `configchannel` API namespace.
- Collect organization admin credentials (user + password) on the HUB, and use the existing API. The downside is that there are more fields to collect and manage.

### Images

TODO
- Is all data saved in the database?
- Is the identifier unique in the system, or can we have duplicates across organizations?
- Are the existing endpoints sufficient to update all the data?

### Activation Keys

Activation keys are part of an organization and need to be mapped to peripherals at the organization mapping level. As with configuration channels, we have API methods to create and update activation keys (namespace `activationkey`). However, these API methods cannot do cross-organization mapping.

The same two options are available: new API endpoints, or collecting organization administrator credentials.

# Drawbacks
[drawbacks]: #drawbacks

- Fully synchronizing a high number of peripherals at the same time can overload the HUB server
  - After the first full synchronization it should not be a problem
  - One peripheral can manage up to 10,000 minions. If we add 10 peripherals we would be able to manage 100,000 minions (more than our biggest customer)

# Alternatives
[alternatives]: #alternatives

- Create a new UI and use ISSv2 to synchronize data
- Solve the existing and known problems of ISSv2
Member


I'd not discard this point, in addition to implementing ISSv3.
As ISSv2 is going to be used for disconnected environments, we can still bring a better user experience to that use case. For example, can we consider some improvement around performing parallel SQL queries?

Member Author


In terms of performance, the main issues are in the import process, and we cannot change that to run in parallel.
We can, however, change the way we do the transaction and have one transaction per channel, instead of one transaction per export as we have now.
Another possibility is to not have a transaction at all, but that can be risky if users start to use the channel during the import process, or if an error occurs during the import.


# Unresolved questions
[unresolved]: #unresolved-questions

- How do we deal with a channel mapping being removed from the hub configuration? Should the channel stay in the peripheral but no longer get updates?
- How do we deal with a channel being deleted on the HUB? Should the channel stay in the peripheral but no longer get updates?