Replace Ruby bosh-monitor with Go implementation #2747
Pull request overview
This PR replaces the legacy Ruby-based bosh-monitor with a Go-based implementation and updates packaging, CI, and integration test scaffolding to build and run the new binary + out-of-process plugins.
Changes:
- Introduces a new Go `bosh-monitor` binary with supporting packages (server, event processing, NATS monitoring, plugin host/protocol, etc.) and Ginkgo/Gomega tests.
- Updates BOSH release packaging/job templates to run the Go binary instead of the Ruby runtime/gem.
- Updates integration support to build the Go binary/plugins and adjusts integration specs/configs for the new log/config formats.
Reviewed changes
Copilot reviewed 156 out of 160 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/spec/integration/health_monitor/hm_stateless_spec.rb | Updates integration log parsing to match Go slog output format. |
| src/spec/integration_support/sandbox.rb | Builds the Go monitor for integration tests and runs it with updated PATH/env. |
| src/spec/integration_support/bosh_monitor_manager.rb | Adds integration helper to build Go bosh-monitor + plugin binaries. |
| src/spec/assets/sandbox/health_monitor_without_resurrector.yml.erb | Adjusts sandbox HM config to match new Go monitor expectations. |
| src/Gemfile.lock | Removes Ruby bosh-monitor gem from bundle. |
| src/Gemfile | Removes Ruby bosh-monitor gem entry. |
| src/bosh-monitor/test/integration/integration_suite_test.go | Adds Go integration test suite scaffold (Ginkgo). |
| src/bosh-monitor/spec/unit/bosh/monitor/protocols/tcp_connection_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/tsdb_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/riemann_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/resurrector_helper_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/paging_datadog_client_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/pagerduty_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/logger_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/json_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/graphite_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/event_logger_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/email_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/dummy_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/plugins/base_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/metric_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/instance_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/events/base_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/events/alert_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/event_processor_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/director_monitor_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/config_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/unit/bosh/monitor/agent_spec.rb | Removes Ruby monitor unit tests (legacy implementation removed). |
| src/bosh-monitor/spec/support/uaa_helpers.rb | Removes Ruby monitor test support (legacy implementation removed). |
| src/bosh-monitor/spec/support/host_authorizatin.rb | Removes Ruby monitor test support (legacy implementation removed). |
| src/bosh-monitor/spec/support/buffered_logger.rb | Removes Ruby monitor test support (legacy implementation removed). |
| src/bosh-monitor/spec/spec_helper.rb | Removes Ruby monitor spec helper (legacy implementation removed). |
| src/bosh-monitor/spec/gemspec_spec.rb | Removes Ruby gemspec tests (legacy implementation removed). |
| src/bosh-monitor/spec/functional/notifying_plugins_spec.rb | Removes Ruby functional tests (legacy implementation removed). |
| src/bosh-monitor/spec/assets/sample_config.yml | Removes Ruby sample config (legacy implementation removed). |
| src/bosh-monitor/spec/assets/dummy_plugin_config.yml | Removes Ruby dummy plugin config (legacy implementation removed). |
| src/bosh-monitor/pkg/server/server.go | Adds Go HTTP API server implementation (healthz + agent endpoints). |
| src/bosh-monitor/pkg/server/server_test.go | Adds Go tests for server endpoints. |
| src/bosh-monitor/pkg/server/server_suite_test.go | Adds Ginkgo suite for server package. |
| src/bosh-monitor/pkg/resurrection/resurrection_suite_test.go | Adds Ginkgo suite for resurrection package. |
| src/bosh-monitor/pkg/resurrection/manager_test.go | Adds resurrection manager rule parsing/behavior tests. |
| src/bosh-monitor/pkg/processor/processor_suite_test.go | Adds Ginkgo suite for processor package. |
| src/bosh-monitor/pkg/processor/event_processor.go | Adds Go event processor (validation, dedupe, pruning, dispatch). |
| src/bosh-monitor/pkg/processor/event_processor_test.go | Adds tests for Go event processor. |
| src/bosh-monitor/pkg/pluginproto/protocol_suite_test.go | Adds Ginkgo suite for plugin protocol package. |
| src/bosh-monitor/pkg/pluginhost/pluginhost_suite_test.go | Adds Ginkgo suite for plugin host package. |
| src/bosh-monitor/pkg/pluginhost/host_test.go | Adds tests for plugin host command handling and startup behavior. |
| src/bosh-monitor/pkg/nats/nats_suite_test.go | Adds Ginkgo suite for NATS package. |
| src/bosh-monitor/pkg/nats/director_monitor.go | Adds Go director-alert subscription monitor. |
| src/bosh-monitor/pkg/nats/director_monitor_test.go | Adds initial unit tests for director monitor (needs strengthening). |
| src/bosh-monitor/pkg/nats/client.go | Adds Go NATS client with TLS and startup retry logic. |
| src/bosh-monitor/pkg/instance/instance.go | Adds Go instance model + formatting helpers. |
| src/bosh-monitor/pkg/instance/instance_suite_test.go | Adds Ginkgo suite for instance package. |
| src/bosh-monitor/pkg/instance/deployment.go | Adds Go deployment model and agent/instance bookkeeping. |
| src/bosh-monitor/pkg/instance/agent.go | Adds Go agent model and timeout/rogue logic. |
| src/bosh-monitor/pkg/events/metric.go | Adds Go metric model. |
| src/bosh-monitor/pkg/events/events_suite_test.go | Adds Ginkgo suite for events package. |
| src/bosh-monitor/pkg/events/base.go | Adds Go event factory/validation helpers. |
| src/bosh-monitor/pkg/director/director_suite_test.go | Adds Ginkgo suite for director package. |
| src/bosh-monitor/pkg/director/auth.go | Adds Go auth provider logic (basic + UAA token flow, CA selection). |
| src/bosh-monitor/pkg/config/config.go | Adds Go config loader with defaults/validation. |
| src/bosh-monitor/pkg/config/config_suite_test.go | Adds Ginkgo suite for config package. |
| src/bosh-monitor/main.go | Adds Go entrypoint (-c config) with slog logging and signal handling. |
| src/bosh-monitor/lib/bosh/monitor/yaml_helper.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/version.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/resurrection_manager.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/protocols/tsdb_connection.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/protocols/tcp_connection.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/protocols/graphite_connection.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/tsdb.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/riemann.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/resurrector_helper.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/README.md | Removes Ruby plugin README (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/paging_datadog_client.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/pagerduty.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/logger.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/json.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/http_request_helper.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/graphite.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/event_logger.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/email.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/dummy.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/datadog.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/plugins/base.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/metric.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/instance.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/events/heartbeat.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/events/base.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/events/alert.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/event_processor.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/errors.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/director.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/director_monitor.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/deployment.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/core_ext.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/config.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/auth_provider.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/api_controller.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor/agent.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/lib/bosh/monitor.rb | Removes Ruby monitor implementation (deleted). |
| src/bosh-monitor/go.mod | Adds Go module definition for new monitor. |
| src/bosh-monitor/cmd/plugins/pluginlib/pluginlib.go | Adds shared plugin runtime library for out-of-process plugins. |
| src/bosh-monitor/cmd/plugins/pluginlib/pluginlib_test.go | Adds tests for plugin runtime library. |
| src/bosh-monitor/cmd/plugins/pluginlib/pluginlib_suite_test.go | Adds Ginkgo suite for pluginlib package. |
| src/bosh-monitor/cmd/plugins/hm-tsdb/main.go | Adds TSDB plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-riemann/main.go | Adds Riemann plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-pagerduty/main.go | Adds PagerDuty plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-logger/main.go | Adds logger plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-json/main.go | Adds JSON fanout plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-graphite/main.go | Adds Graphite plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-event-logger/main.go | Adds event-logger plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-dummy/main.go | Adds dummy plugin (Go) implementation. |
| src/bosh-monitor/cmd/plugins/hm-datadog/main.go | Adds Datadog plugin (Go) implementation. |
| src/bosh-monitor/.golangci.yml | Adds golangci-lint config for the new Go module. |
| packages/health_monitor/spec | Updates package spec to remove Ruby dependencies and ship new monitor sources. |
| packages/health_monitor/packaging | Updates packaging to build Go binary + plugins. |
| jobs/health_monitor/templates/health_monitor | Updates job launcher to run the Go binary (removes Ruby env). |
| jobs/health_monitor/templates/bpm.yml | Updates BPM config to run Go binary with args, removes Ruby env vars/volumes. |
| jobs/health_monitor/spec | Removes Ruby package dependency from health_monitor job. |
| .github/workflows/ruby.yml | Removes the Ruby monitor test matrix entry. |
| .github/workflows/go.yml | Adds lint/test jobs for the new Go bosh-monitor module. |
Comments suppressed due to low confidence (1)
packages/health_monitor/spec:6
`go build` is invoked in this package, but the package spec declares no dependency that would provide a Go toolchain during BOSH compilation. Unless the compilation environment already has `go` available, this will fail to compile the release. Consider either (a) adding a `golang-*` package dependency, or (b) shipping prebuilt binaries (like other packages in this release) and removing the compile-time `go build` requirement.
---
name: health_monitor
files:
- bosh-monitor/**/*
Replace the Ruby bosh-monitor implementation with a Go-based binary.
Changes:
- Add Go bosh-monitor implementation under src/bosh-monitor/
- Delete Ruby bosh-monitor source (lib/, spec/, bin/, gemspec)
- Remove bosh-monitor gem from src/Gemfile and Gemfile.lock
- Update jobs/health_monitor/ to use Go binary (remove director-ruby-3.3 dep)
- Update packages/health_monitor/ to compile Go binary via go build
- Update .github/workflows/go.yml to test and lint src/bosh-monitor/
- Update .github/workflows/ruby.yml to remove monitor:parallel (Ruby deleted)
- Add integration test support: BoshMonitorManager builds Go binary for sandbox
- Fix hm-logger plugin output format to match Ruby logger for integration tests
- Update hm_stateless_spec.rb JSON heartbeat parsing to match Go slog format
- Fix sandbox health_monitor_without_resurrector.yml.erb (remove nats plugin)
- Ensure TLS peer verification with director_ca_cert and uaa_ca_cert
- Implement NATS connection retry logic during startup
- Align DataDog pagerduty_service_name routing with Ruby implementation
- Align Riemann severity string mapping with Ruby implementation
…Data and filter heartbeats only
The hm-logger plugin logs both heartbeats and alerts as JSON. The test at
hm_stateless_spec.rb:96 ("only outputs complete heartbeats") was failing
because:
1. The test filter only excluded compilation jobs but not non-heartbeat
events. Director alerts fired during deployment would appear in the
collected entries and fail the heartbeat schema assertion.
2. The EventData struct lacked the top-level number_of_processes field
that the Ruby HM's heartbeat JSON format includes.
Fixes:
- Add NumberOfProcesses interface{} to EventData with json omitempty.
- Populate it in eventToProto() from the heartbeat's raw Attrs map.
- Update the spec filter to also require kind == 'heartbeat'.
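A minimal sketch of how an `interface{}` field tagged `omitempty` serialises. Only `NumberOfProcesses` and the `number_of_processes` key come from the commit message; the other `EventData` fields and the `eventDataFromAttrs` helper are hypothetical. One consequence of using `interface{}` rather than `int`: `omitempty` drops the key only when the interface is nil, so a heartbeat that reports 0 processes still emits the field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventData is a trimmed, hypothetical stand-in for the plugin event payload.
// Only NumberOfProcesses mirrors the field described in the commit message.
type EventData struct {
	Kind    string `json:"kind"`
	AgentID string `json:"agent_id,omitempty"`
	// interface{} + omitempty: the key is omitted only when the value is nil,
	// so an explicit 0 is still serialised.
	NumberOfProcesses interface{} `json:"number_of_processes,omitempty"`
}

// eventDataFromAttrs (hypothetical) copies number_of_processes from a raw
// heartbeat attribute map and leaves it nil for other event kinds.
func eventDataFromAttrs(kind, agentID string, attrs map[string]interface{}) EventData {
	ed := EventData{Kind: kind, AgentID: agentID}
	if kind == "heartbeat" {
		ed.NumberOfProcesses = attrs["number_of_processes"]
	}
	return ed
}

func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	hb := eventDataFromAttrs("heartbeat", "a1", map[string]interface{}{"number_of_processes": 0})
	alert := eventDataFromAttrs("alert", "a1", nil)
	fmt.Println(mustJSON(hb))    // {"kind":"heartbeat","agent_id":"a1","number_of_processes":0}
	fmt.Println(mustJSON(alert)) // {"kind":"alert","agent_id":"a1"}
}
```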
Implements Go equivalents of the following Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/runner_spec.rb:
connect_to_mbus:
- "should connect using SSL" — covered by buildTLSConfig tests:
* returns TLS config with client certificate
* enforces MinVersion TLS 1.2
* populates CA certificate pool from server CA file
* returns error when cert file is missing
* returns error when cert and key do not match
* returns error when server CA file is missing
* returns error when server CA file contains no valid PEM blocks
NATS connection retries:
- "retries the connection until it succeeds"
- "logs retry attempts"
- "when timeout is exceeded / raises the last connection error"
- "connection_wait_timeout from config / uses the configured timeout"
- "when NATS connection fails with AuthError (subclass of ConnectError) /
retries the connection" — in Go all error types are retried (broader
than Ruby's ConnectError-only policy)
- "when an error occurs while connecting / throws the error"
- Uses default ConnectionWaitTimeout when not configured
Note: Ruby's "non-ConnectError does not retry" test is not applicable to
Go because the Go client retries all errors, not only ConnectError subtypes.
Implementation:
- Add connectFunc and retryWait package-level variables to client.go as
test seams (production behaviour is unchanged: nats.Connect is called
with a 1-second retry interval)
- Add export_test.go to expose BuildTLSConfig, ConnectFunc, and RetryWait
for use in the external test package
- Add client_test.go with 14 new Ginkgo/Gomega specs (17 total in suite)
Implements Go equivalents of all Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/director_monitor_spec.rb:
describe 'subscribe':
- "subscribes to hm.director.alert over NATS" — Subscribe registers a
handler with the NATS client; also covers Subscribe returning an error
when the subscriber fails
context 'alert handler / valid payload':
- "does not log an error"
- "tells the event processor to process the alert" — covered twice:
once via handleAlert directly, and once end-to-end through Subscribe
using fakeSubscriber.fireAlert
- "passes all alert fields to the event processor"
context 'alert handler / invalid payload':
- For each of [id, severity, title, summary, created_at]:
"logs an error if the <key> field is missing"
"does not create a new director alert"
- Malformed JSON: logs error and does not call processor
Additional (no Ruby equivalent):
- "logs the processor error" when event processor returns an error
Implementation:
- Add DirectorAlertSubscriber interface to director_monitor.go so a fake
subscriber can be injected without a live NATS connection
- Extract the anonymous alert handler into handleAlert method
- Add HandleAlert(dm, payload) to export_test.go for direct handler testing
- Replace the skeleton director_monitor_test.go (which only tested JSON
marshaling) with 13 real Ginkgo/Gomega specs (27 total in NATS suite)
Implements Go equivalents of all Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/auth_provider_spec.rb:
Shared examples (:auth_provider_shared_tests) run once per CA cert context:
- "returns auth header provided by UAA"
- "reuses the same token for subsequent requests"
- "when token is about to expire / obtains new token" (expires_in <= 60 s)
- "when getting token fails / logs an error"
- (implicit) does not raise / does not panic when token fetch fails
Five CA cert selection contexts (uaaCACertPath logic):
- "user provides director_ca_cert" — uses director_ca_cert for UAA requests
- "user provides uaa_ca_cert with a non-empty file" — uaa_ca_cert takes
priority over director_ca_cert
- "user provides uaa_ca_cert but file is empty" — falls back to
director_ca_cert (blank/whitespace-only file is not usable)
- "user provides uaa_ca_cert but file is missing" — falls back to
director_ca_cert (non-existent file is not usable)
- "user has not provided director_ca_cert" — returns empty path, meaning
the system trust store is used (Go: tls.Config{RootCAs: nil})
Non-UAA mode (basic auth):
- "returns the basic-auth header with encoded username and password"
Note: Ruby tests CA cert selection indirectly via CF::UAA::TokenIssuer mock
arguments. Go tests uaaCACertPath() directly (exposed via export_test.go)
and separately verifies token behaviour against a plain-HTTP fake UAA server,
which gives the same overall coverage in fewer, more focused specs.
Implementation:
- Add export_test.go to pkg/director/ exposing UaaCACertPath(ap)
- Add auth_test.go with 12 new Ginkgo/Gomega specs (24 total in suite):
5 CA cert selection tests
4 token lifecycle tests (returns token, reuses, near-expiry, error)
1 basic-auth correctness test (exact Base64 value)
2 error-path tests (logs error, returns empty string without panic)
…clude matcher
The previous fix (7a09333) added kind==heartbeat to the post-loop filter but the loop still broke on the first JSON entry (which is a director alert during deployment). This left heartbeat_hashes empty after filtering.
Fixes:
- Move the kind==heartbeat and compilation-job filter inside the collection loop so the loop only breaks when actual heartbeats are found (ignoring alert events logged during deployment).
- Change expect(...).to match(...) to expect(...).to include(...) since EventData JSON includes an 'attributes' field (raw NATS payload) not listed in the expected hash; include() allows extra keys in actual.
Implements Go equivalents of Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/plugins/resurrector_helper_spec.rb
(module Bosh::Monitor::Plugins::ResurrectorHelper):
AlertTracker#state_for contexts:
- "when the number of unresponsive agents is 0" → reports as 'normal',
summary shows '0 of 10 agents are unhealthy (0.0%)'
- "when below the meltdown count threshold" → reports as 'managed'
- "when at/above count threshold and below percent threshold" → 'managed'
- "when at/above both count and percent thresholds" → 'meltdown'
- summary format: "deployment: '<name>'; N of M agents are unhealthy (P%)"
- "when recorded alerts are outside the time threshold" → excludes stale
entries; only alerts within timeThreshold seconds count as unhealthy
JobInstanceKey:
- "hashes properly" → two keys with the same fields resolve to the same
Go map entry (struct equality used as map key)
newAlertTracker:
- default config (empty options) produces minimumDownJobs=5,
percentThreshold=0.2, timeThreshold=600
- custom config values are respected
Implementation notes:
The Ruby AlertTracker wraps a Bosh::Monitor.instance_manager lookup to
obtain total agent count. The Go deploymentState receives agentCount
directly (the caller is responsible for providing it), so tests construct
deploymentState directly rather than going through a manager mock.
unhealthyCount() boundary semantics match the Ruby implementation:
an alert at exactly now-threshold is NOT counted (strict After check).
Implements Go equivalents of Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/agent_spec.rb:
TimedOut? boundary conditions:
- "knows if it is timed out": false when just inside the agent_timeout
threshold (Ruby: at t=344 s with agent_timeout=344), true one full
second past it (Ruby: at t=345 s)
- false for a freshly created agent
Rogue? boundary conditions:
- "knows if it is rogue": false when just inside rogue_agent_alert
threshold (Ruby: at t=124 s with rogue_agent_alert=124), true one
full second past it (Ruby: at t=125 s)
- false once agent.Deployment is set (managed agent is never rogue)
- false for a freshly created agent
Name format across incremental state transitions:
- only cid: "agent zb [cid=deadbeef]"
- cid + instance_id: "agent zb [cid=deadbeef, instance_id=iuuid]"
- + deployment (no job): "agent zb [deployment=oleg-cloud, cid=…, instance_id=…]"
- deployment + job + instance_id: "oleg-cloud: mysql_node(iuuid) [id=zb, cid=deadbeef]"
- + index: "oleg-cloud: mysql_node(iuuid) [id=zb, index=0, cid=deadbeef]"
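The five name formats above follow one rule: once deployment, job, and instance_id are all known, the managed form is used; otherwise the `agent <id> [...]` form lists whichever of deployment, cid, and instance_id are set. This is a hypothetical sketch that reproduces only the listed strings; `agentInfo` and the `*int` index are illustrative choices, and behaviour for combinations not listed (for example a missing cid in the managed form) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// agentInfo holds only the fields the name format needs (hypothetical).
type agentInfo struct {
	ID, CID, InstanceID, Deployment, Job string
	Index                                *int // nil until an index is known
}

// Name reproduces the formats listed in the commit message.
func (a agentInfo) Name() string {
	if a.Deployment != "" && a.Job != "" && a.InstanceID != "" {
		parts := []string{"id=" + a.ID}
		if a.Index != nil {
			parts = append(parts, fmt.Sprintf("index=%d", *a.Index))
		}
		if a.CID != "" {
			parts = append(parts, "cid="+a.CID)
		}
		return fmt.Sprintf("%s: %s(%s) [%s]", a.Deployment, a.Job, a.InstanceID, strings.Join(parts, ", "))
	}
	var parts []string
	if a.Deployment != "" {
		parts = append(parts, "deployment="+a.Deployment)
	}
	if a.CID != "" {
		parts = append(parts, "cid="+a.CID)
	}
	if a.InstanceID != "" {
		parts = append(parts, "instance_id="+a.InstanceID)
	}
	return fmt.Sprintf("agent %s [%s]", a.ID, strings.Join(parts, ", "))
}

func main() {
	a := agentInfo{ID: "zb", CID: "deadbeef"}
	fmt.Println(a.Name()) // agent zb [cid=deadbeef]
	a.InstanceID, a.Deployment, a.Job = "iuuid", "oleg-cloud", "mysql_node"
	idx := 0
	a.Index = &idx
	fmt.Println(a.Name()) // oleg-cloud: mysql_node(iuuid) [id=zb, index=0, cid=deadbeef]
}
```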
UpdateInstance:
- "populates the corresponding attributes" — job, index, cid, instance_id
are taken from the instance
- "does not modify job_state or number_of_processes when updating instance"
Note: Ruby mocks Time.now to test exact-second boundaries; Go tests use a
500 ms guard margin inside the threshold to stay stable without a time mock.
…cessing
Implements Go equivalents of Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/instance_manager_spec.rb
(context 'stubbed config' > describe '#process_event'):
Heartbeat handling:
- "can process" — heartbeats from unknown agents still register them;
two events from the same agent count once (AgentsCount = 2)
- "when heartbeat information cannot be completed for instance_id, job,
or deployment" → "does not process the heartbeat" (processor not called)
- "processes a valid populated heartbeat message" — processor receives
agent_id, deployment, instance_id, job, teams, and timestamp
- "when teams have changed between heartbeats" → "updates teams in
heartbeat event": after a second SyncDeploymentState with ['ateam','bteam']
the next heartbeat carries both teams
Shutdown handling:
- "shutdowns agent" — after shutdown.008 the agent count drops from 3 to 2
and AnalyzeAgents reflects the removal
Alert handling:
- "bad alert" → "does not increment alerts_processed" when the processor
returns an error (Go equivalent of Ruby raising Bosh::Monitor::InvalidEvent)
- "good alert" → "increments alerts_processed" by 2 for two successful
alert events
- heartbeats_received increments after a valid heartbeat
Implementation:
- Add heartbeat_teams_test.go with a captureProcessor that records calls and
can return a configured error, enabling assertion of both event data content
and alert-counter behaviour.
Refactors each plugin to extract the anonymous PluginFunc closure to a
named run<Plugin> function, enabling direct use with pluginlib.RunWithIO
in tests without spawning the binary. Also adds a shared plugintestutil
package with reusable helpers (CmdSink, SendInit, SendEvent, NextCmd, …).
Implements Go equivalents of the Ruby plugin specs:
hm-logger (logger_spec.rb):
- text format: events logged as "[HEARTBEAT] …" / "[ALERT] …"
- json format: events serialised as valid JSON
- startup log message emitted on launch
hm-dummy (dummy_spec.rb):
- each processed event increments the running total logged
- plugin exits cleanly when stdin is closed
hm-event-logger (event_logger_spec.rb):
- alert event → http_request POST /events with action/object_type/
object_name/deployment/instance/context fields
- heartbeat events are silently ignored (no HTTP request)
- context.message contains title and severity
hm-graphite (graphite_spec.rb):
- missing host or port → startup error
- heartbeat metrics → TCP line "deployment.job.inst.agent.metric_name value ts"
- custom prefix prepended to metric name
hm-tsdb (tsdb_spec.rb):
- missing host or port → startup error
- heartbeat → TCP "put metric_name ts value deployment=…"
- alert events are ignored
hm-riemann (riemann_spec.rb):
- missing host → error; missing port → error
- alert → TCP JSON with service=bosh.hm, state=critical (severity 2)
- heartbeat → TCP JSON with service=bosh.hm, name=metric_name, metric=value
hm-consul (consul_spec.rb):
- missing host/port/protocol → startup error
- alert with events=true → PUT /v1/event/fire/<label>
- heartbeat with TTL config → PUT /v1/agent/check/…
hm-pagerduty (pagerduty_spec.rb):
- missing service_key → startup error
- alert → POST with service_key, event_type=trigger, incident_key=id
- heartbeat → POST with service_key and heartbeat description
hm-datadog (datadog_spec.rb):
- missing api_key → error; missing application_key → error
- heartbeat → POST /api/v1/series with bosh.healthmonitor.* metrics
- alert → POST /api/v1/events with title/priority/alert_type
- pagerduty_service_name appends @service to the alert text for normal-priority alerts
hm-email (email_spec.rb):
- missing recipients → startup error
- missing smtp options → startup error
- valid options → plugin starts and stops cleanly (no SMTP mock needed)
hm-json:
- nil options → uses default glob, starts and stops cleanly
- non-matching glob → no child processes, handles events without error
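The Graphite and TSDB line formats listed above map onto the Graphite plaintext protocol (`<path> <value> <timestamp>`) and the OpenTSDB telnet `put <metric> <timestamp> <value> <tags>` command. A sketch under assumptions: the `metric` struct, the placement of the custom prefix as the leading path segment, the lack of any escaping of dots in metric names, and the sorted tag order are all illustrative choices, not the plugins' confirmed behaviour.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metric is a hypothetical trimmed heartbeat metric.
type metric struct {
	Name      string
	Value     float64
	Timestamp int64
	Tags      map[string]string
}

// graphiteLine builds "[prefix.]deployment.job.inst.agent.metric_name value ts\n".
func graphiteLine(prefix, deployment, job, instanceID, agentID string, m metric) string {
	segs := []string{deployment, job, instanceID, agentID, m.Name}
	if prefix != "" {
		segs = append([]string{prefix}, segs...)
	}
	return fmt.Sprintf("%s %v %d\n", strings.Join(segs, "."), m.Value, m.Timestamp)
}

// tsdbLine builds "put metric_name ts value k=v ...\n" with tags sorted by key.
func tsdbLine(m metric) string {
	keys := make([]string, 0, len(m.Tags))
	for k := range m.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic tag order
	tags := make([]string, len(keys))
	for i, k := range keys {
		tags[i] = k + "=" + m.Tags[k]
	}
	return fmt.Sprintf("put %s %d %v %s\n", m.Name, m.Timestamp, m.Value, strings.Join(tags, " "))
}

func main() {
	m := metric{Name: "system.load.1m", Value: 0.5, Timestamp: 1700000000,
		Tags: map[string]string{"deployment": "dep", "job": "web"}}
	fmt.Print(graphiteLine("bosh", "dep", "web", "i1", "a1", m)) // bosh.dep.web.i1.a1.system.load.1m 0.5 1700000000
	fmt.Print(tsdbLine(m))                                       // put system.load.1m 1700000000 0.5 deployment=dep job=web
}
```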
Implementation notes:
- pagerduty and datadog goroutines now use select+ctx.Done() to avoid
sending on a closed cmds channel when the plugin shuts down (concurrent
HTTP calls no longer panic on clean exit)
- datadogSeriesURLTemplate / datadogEventsURLTemplate / apiURI promoted
from const to var so tests can redirect HTTP calls to httptest.Servers
- Tests that override package-level URL vars are not run in parallel to
prevent data races
Implements Go equivalents of the missing Ruby tests from spec/unit/bosh/monitor/runner_spec.rb:
- "when NATS calls error handler with a ConnectError" / "shuts down the server": TestRunnerNATSConnectionFails verifies Run() returns a "failed to connect to NATS" error when the injected NATS client reports a connection failure.
- "when an error occurs while connecting" / "throws the error": TestRunnerNATSConnectErrorPropagated verifies the original error is wrapped and propagated with errors.Is semantics.
- "stops the HM server, stops the event loop" (handle_fatal_error / stop): TestRunnerStopsOnContextCancel and TestRunnerStop verify that cancelling the context (or calling Stop()) causes Run() to call shutdown(), close the NATS client, and return nil.
- "connection_wait_timeout is configured in mbus config" / "uses the configured timeout": TestRunnerNATSUsesConnectionWaitTimeout verifies the ConnectionWaitTimeout value is forwarded to the NATS Config and controls the number of connect attempts.
- "should connect using SSL": TestRunnerPassesTLSConfigToNATSClient verifies all Mbus TLS fields (endpoint, CA path, cert path, key path, timeout) are forwarded verbatim to the NATS client Config.
- Plugin startup errors are non-fatal: TestRunnerContinuesWhenPluginStartFails verifies the runner proceeds to NATS connect and reaches its running state even when a plugin executable cannot be found (matching the Ruby implementation's continue-on-error behaviour).
Production change: introduce a natsClient interface in runner.go and a package-level newNATSClient factory variable so the test can inject a fake client without requiring a live NATS server. Also adds Stop() to cancel the runner's context from outside.
Summary
This PR replaces the Ruby `bosh-monitor` implementation with a new Go-based binary that provides the same functionality, including:
- `director_ca_cert`/`uaa_ca_cert` support for CA-bundle TLS

Changes
- `src/bosh-monitor/`: Delete Ruby source (lib/, spec/, bin/, gemspec). Add Go implementation.
- `src/Gemfile`/`src/Gemfile.lock`: Remove `bosh-monitor` gem.
- `.github/workflows/go.yml`: Add `bosh-monitor-lint` and `bosh-monitor-test` jobs.
- `.github/workflows/ruby.yml`: Remove `monitor:parallel` matrix entry (Ruby code deleted).
- `jobs/health_monitor/`: Switch from Ruby runtime to Go binary; remove `director-ruby-3.3` package dep.
- `packages/health_monitor/`: Replace gem build script with `go build`.
- `src/spec/integration_support/`: Add `BoshMonitorManager` to build the Go binary for integration tests; update sandbox to use it with correct PATH.
- `src/spec/assets/sandbox/`: Update sandbox HM configs to be compatible with Go config format.
- `src/spec/integration/health_monitor/`: Update JSON heartbeat log parsing to match Go slog format.

Test plan
- `cd src/bosh-monitor && go test ./...`: all pass locally
- `fly:integration` submitted: build #363518081
- `go` workflow: `bosh-monitor-test` and `bosh-monitor-lint` jobs
- `ruby` workflow: `nats_sync:parallel`, `common:parallel`, `release` (no longer tests deleted Ruby monitor)