
Conversation

@MaxFedotov (Contributor) commented Apr 12, 2020

Hi @shlomi-noach,

This PR is the 'orchestrator' part of openark/orchestrator-agent#1

This PR adds the following:

  • A new seed processing algorithm
  • A redesign of agent-related structs and functions
  • Additional APIs that provide information about seeds and agents
  • An updated agent and seed UI

We define a seed as an operation consisting of a set of predefined stages in which two orchestrator agents participate: one agent is called the source agent (it is on the source side of the seed), the other the target agent (it is on the target side of the seed). The goal of a seed is to transfer all data from the source agent to the target agent and to add the target agent as a slave of the source agent.

A seed can be executed using different seed methods - particular tools or techniques used to transfer data from the source agent to the target agent.

A seed operation is processed entirely by orchestrator and runs at a configurable scheduled interval (SeedProcessIntervalSeconds).
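For illustration, the timed routine is wired up roughly like this (a simplified sketch, not the exact code from the PR):

func continuousSeedProcessing() {
	// sketch only: assumes orchestrator's global config carries the new
	// SeedProcessIntervalSeconds setting introduced by this PR
	interval := time.Duration(config.Config.SeedProcessIntervalSeconds) * time.Second
	for range time.Tick(interval) {
		ProcessSeeds()
	}
}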

By design, each seed operation consists of five stages, executed one after another:

  • Prepare - executed on both agents
  • Backup - executed on either the target or the source agent, depending on the seed method chosen
  • Restore - executed on the target agent
  • ConnectSlave - executed on the target agent
  • Cleanup - executed on both agents
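Roughly, the stages can be thought of as follows (a simplified sketch, not the exact declarations):

// SeedStage enumerates the five sequential stages of a seed operation.
type SeedStage int

const (
	Prepare SeedStage = iota
	Backup
	Restore
	ConnectSlave
	Cleanup
)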

Each stage can be in one of six statuses:

  • Scheduled - on the next seed processing cycle, orchestrator calls the corresponding orchestrator-agent stage API (prepare\backup\restore...), and orchestrator-agent starts executing the stage. If the API call succeeds, the stage status is updated to Running; if not, it is updated to Error
  • Running - on the next seed processing cycle, orchestrator calls the orchestrator-agent API to get information about the stage and saves it to the database. The stage stays in this status until orchestrator-agent reports either that the stage is Completed, in which case the seed moves to the next stage in Scheduled status, or that errors occurred during the stage, in which case its status is updated to Error
  • Completed - an informational status only. Orchestrator does not process seeds in this status
  • Error - on the next seed processing cycle, orchestrator checks whether the number of times the stage has been in this status exceeds a configurable threshold (MaxRetriesForSeedStage). If it does, the seed is marked as Failed and is no longer processed. If it does not, the seed is restarted, i.e. moved back to Scheduled status for the current stage. (The one exception to this rule is errors during the Backup stage: if an error occurred there, the seed is restarted from the Prepare stage, because some seed methods do important work during Prepare - for example, socat is started on the target agent during the Prepare stage.) If the stage was executed on both agents and only one agent had an error, orchestrator aborts the stage on the second agent before restarting it
  • Failed - an informational status only. Orchestrator does not process seeds in this status
  • Aborting - if the user hits the Abort button in the UI or uses the Abort API, the stage moves to this status. Orchestrator then tries to call the abort API on the agent (or on both agents, if the stage involves both). If the call succeeds, the seed is marked as Failed and is no longer processed. If not, orchestrator retries the abort on each subsequent processing cycle until it succeeds.

A seed is considered Active (meaning it should be processed by orchestrator) only while its stage is in Scheduled, Running, Error, or Aborting status.
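In simplified form (not the exact declarations), the statuses and the active check look like this, reusing the status names that also appear in ProcessSeeds() below:

// SeedStatus enumerates the six statuses a seed stage can be in.
type SeedStatus int

const (
	Scheduled SeedStatus = iota
	Running
	Completed
	Error
	Failed
	Aborting
)

// isActive reports whether a seed in this status should still be processed.
func (s SeedStatus) isActive() bool {
	switch s {
	case Scheduled, Running, Error, Aborting:
		return true
	default:
		return false
	}
}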

All seed processing is done by the following function:

func ProcessSeeds() []*Seed {
	activeSeeds, err := ReadActiveSeeds()
	if err != nil {
		log.Errore(fmt.Errorf("Unable to read active seeds: %+v", err))
		return nil
	}
	if len(activeSeeds) > 0 {
		inst.AuditOperation("process-seeds", nil, fmt.Sprintf("Will process %d active seeds", len(activeSeeds)))
		var wg sync.WaitGroup
		for _, seed := range activeSeeds {
			wg.Add(1)
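			// each seed.process* handler below runs in its own goroutine
			// and is expected to call wg.Done() when it finishes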
			switch seed.Status {
			case Scheduled:
				go seed.processScheduled(&wg)
			case Running:
				go seed.processRunning(&wg)
			case Error:
				go seed.processErrored(&wg)
			case Aborting:
				go seed.processAborting(&wg)
			}
		}
		wg.Wait()
		inst.AuditOperation("process-seeds", nil, "All active seeds processed")
	}
	return activeSeeds
}

It starts a goroutine for each active seed, chosen by the seed's status, and waits until all of them complete.

So the overall process looks like this:

  • The /agent-seed/:seedMethod/:targetHost/:sourceHost orchestrator API is called. Various pre-seed checks are performed, and if they pass, a new seed is created in Scheduled status at the Prepare stage
  • The ProcessSeeds() function is called by a timed routine. It starts the processScheduled() function in a goroutine, which calls /api/prepare/:seedID/:seedMethod/:seedSide on both orchestrator-agents; if these calls succeed, it updates the Prepare stage status to Running for both agents
  • The ProcessSeeds() function is called by a timed routine. It starts the processRunning() function in a goroutine (see the sketch after this list), which calls /api/seed-stage-state/:seedID/:seedStage on both orchestrator-agents and logs the result. If both agents report that they have completed the Prepare stage, it is marked as Completed and the seed moves to the Backup stage in Scheduled status. If one of the agents has not yet completed the stage, the seed remains at this stage, and on the next ProcessSeeds() call processRunning() runs again and calls /api/seed-stage-state/:seedID/:seedStage on the remaining agent. This continues until both agents complete the stage
  • The ProcessSeeds() function is called by a timed routine. It starts the processScheduled() function in a goroutine, which calls /api/backup/:seedID/:seedMethod/:seedHost/:mysqlPort on one of the agents, depending on the seed method chosen. If this call succeeds, it updates the Backup stage status to Running for that agent
  • The ProcessSeeds() function is called by a timed routine. It starts the processRunning() function in a goroutine, which calls /api/seed-stage-state/:seedID/:seedStage on the agent. If the agent reports that it has completed the Backup stage, the stage is marked as Completed and the seed moves to the Restore stage in Scheduled status. The same pattern then repeats for the remaining stages
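To make the flow concrete, here is a rough sketch of what a processRunning-style handler looks like; the helper names (pendingAgents, getSeedStageState, saveStageState, markAgentCompleted, allAgentsCompleted, completeStageAndScheduleNext) are simplified placeholders rather than the actual code:

func (s *Seed) processRunning(wg *sync.WaitGroup) {
	defer wg.Done()
	// poll every agent that has not yet finished the current stage
	for _, agent := range s.pendingAgents() { // placeholder helper
		// corresponds to /api/seed-stage-state/:seedID/:seedStage
		state, err := agent.getSeedStageState(s.SeedID, s.Stage)
		if err != nil {
			s.updateStatus(Error)
			return
		}
		s.saveStageState(agent, state) // persist progress to the database
		if state.Completed {
			s.markAgentCompleted(agent)
		}
	}
	if s.allAgentsCompleted() {
		// e.g. Prepare completed -> Backup scheduled
		s.completeStageAndScheduleNext()
	}
}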

So basically this works like a state machine that moves a seed from start to finish along the following path:
New seed -> Prepare scheduled -> Prepare running -> Prepare completed -> Backup scheduled -> Backup running -> Backup completed -> Restore scheduled -> Restore running -> Restore completed -> ConnectSlave scheduled -> ConnectSlave running -> ConnectSlave completed -> Cleanup scheduled -> Cleanup running -> Cleanup completed -> Seed completed
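In simplified form, the stage progression is just a transition table (again a sketch, with simplified names):

// nextStage maps each completed stage to the one scheduled after it;
// Cleanup has no successor, so completing it completes the seed.
var nextStage = map[SeedStage]SeedStage{
	Prepare:      Backup,
	Backup:       Restore,
	Restore:      ConnectSlave,
	ConnectSlave: Cleanup,
}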

I've also created various test cases, located in agent_test.go, which simulate all this logic using docker and orchestrator-agent mocks.

This PR also updates the orchestrator UI to support all these changes. I've tried to keep all previous functionality regarding LVM\local snapshot hosts\remote snapshot hosts as backward-compatible as possible.
The new UI includes an agents page with various auto-complete filters:
[screenshot]
An updated seeds page with auto-complete filters and a UI for starting a new seed:
[screenshot]
An updated agent page (some LVM-specific UI elements are not shown because this agent does not have the LVM seed method added):
[screenshot]

We have already created our own auto-provisioning solution based on orchestrator, orchestrator-agent, and puppet, where new hosts are automatically added to a cluster as slaves. We have started testing it in our staging environment, but I would still consider this functionality to be in beta, as there will obviously be some bugs, which we will catch through active production usage.

I would be very grateful for your feedback and comments, and I will be glad to answer any questions :)
Thanks,
Max

@MaxFedotov (Contributor, Author) commented:

It seems there are some problems with vendored packages, but I don't get these errors during a local build, so I definitely need your help with this :)

@MaxFedotov (Contributor, Author) commented:

I was able to fix the build errors, but now the build fails when running my tests for the seed logic. The thing is that these tests require docker to be installed and some customization of the /etc/hosts file:

// before running add following to your /etc/hosts file depending on number of agents you plan to use
// 127.0.0.2 agent1
// 127.0.0.3 agent2
// ...
// 127.0.0.n agentn

I don't know how to configure this in the CI environment, so maybe we should exclude this test from running?
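For example, we could gate the test behind an environment variable, so it is skipped unless docker and the /etc/hosts entries are available (just a sketch; the variable name is made up):

// sketch only: ORCHESTRATOR_DOCKER_AGENT_TESTS is a hypothetical gate
func TestSeeds(t *testing.T) {
	if os.Getenv("ORCHESTRATOR_DOCKER_AGENT_TESTS") == "" {
		t.Skip("skipping: requires docker and agent1..agentN entries in /etc/hosts")
	}
	// ... actual seed-logic tests ...
}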
