Replace Ruby bosh-monitor with Go implementation #2747

Draft
aramprice wants to merge 11 commits into main from experiment-golang-bosh-monitor

@aramprice

Summary

This PR replaces the Ruby bosh-monitor implementation with a new Go-based binary that provides the same functionality, including:

  • NATS subscription for agent heartbeats, alerts, and shutdown events
  • Director polling for deployment/instance synchronization
  • Plugin host architecture with out-of-process plugins (hm-logger, hm-resurrector, hm-event-logger, hm-datadog, hm-pagerduty, hm-riemann, hm-graphite, hm-email, hm-consul, hm-json, hm-tsdb)
  • HTTP API (healthz, unresponsive_agents, unhealthy_agents, etc.)
  • TLS peer verification for Director and UAA connections
  • NATS connection retry logic during startup
  • director_ca_cert / uaa_ca_cert support for CA-bundle TLS

Changes

  • src/bosh-monitor/: Delete Ruby source (lib/, spec/, bin/, gemspec). Add Go implementation.
  • src/Gemfile / src/Gemfile.lock: Remove bosh-monitor gem.
  • .github/workflows/go.yml: Add bosh-monitor-lint and bosh-monitor-test jobs.
  • .github/workflows/ruby.yml: Remove monitor:parallel matrix entry (Ruby code deleted).
  • jobs/health_monitor/: Switch from Ruby runtime to Go binary; remove director-ruby-3.3 package dep.
  • packages/health_monitor/: Replace gem build script with go build.
  • src/spec/integration_support/: Add BoshMonitorManager to build the Go binary for integration tests; update sandbox to use it with correct PATH.
  • src/spec/assets/sandbox/: Update sandbox HM configs to be compatible with Go config format.
  • src/spec/integration/health_monitor/: Update JSON heartbeat log parsing to match Go slog format.

Test plan

  • Go unit tests: cd src/bosh-monitor && go test ./... — all pass locally
  • fly:integration submitted: build #363518081
  • GitHub Actions go workflow: bosh-monitor-test and bosh-monitor-lint jobs
  • GitHub Actions ruby workflow: nats_sync:parallel, common:parallel, release (no longer tests deleted Ruby monitor)

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Comment thread .github/workflows/go.yml Fixed
Comment thread .github/workflows/go.yml Fixed
Comment thread src/bosh-monitor/main.go Fixed
@aramprice aramprice force-pushed the experiment-golang-bosh-monitor branch from e611365 to 56da280 Compare June 19, 2026 23:43
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 19, 2026

Copilot AI left a comment

Pull request overview

This PR replaces the legacy Ruby-based bosh-monitor with a Go-based implementation and updates packaging, CI, and integration test scaffolding to build and run the new binary + out-of-process plugins.

Changes:

  • Introduces a new Go bosh-monitor binary with supporting packages (server, event processing, NATS monitoring, plugin host/protocol, etc.) and Ginkgo/Gomega tests.
  • Updates BOSH release packaging/job templates to run the Go binary instead of the Ruby runtime/gem.
  • Updates integration support to build the Go binary/plugins and adjusts integration specs/configs for the new log/config formats.

Reviewed changes

Copilot reviewed 156 out of 160 changed files in this pull request and generated 10 comments.

File Description
src/spec/integration/health_monitor/hm_stateless_spec.rb Updates integration log parsing to match Go slog output format.
src/spec/integration_support/sandbox.rb Builds the Go monitor for integration tests and runs it with updated PATH/env.
src/spec/integration_support/bosh_monitor_manager.rb Adds integration helper to build Go bosh-monitor + plugin binaries.
src/spec/assets/sandbox/health_monitor_without_resurrector.yml.erb Adjusts sandbox HM config to match new Go monitor expectations.
src/Gemfile.lock Removes Ruby bosh-monitor gem from bundle.
src/Gemfile Removes Ruby bosh-monitor gem entry.
src/bosh-monitor/test/integration/integration_suite_test.go Adds Go integration test suite scaffold (Ginkgo).
src/bosh-monitor/spec/unit/bosh/monitor/protocols/tcp_connection_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/tsdb_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/riemann_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/resurrector_helper_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/paging_datadog_client_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/pagerduty_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/logger_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/json_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/graphite_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/event_logger_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/email_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/dummy_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/plugins/base_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/metric_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/instance_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/events/base_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/events/alert_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/event_processor_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/director_monitor_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/config_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/unit/bosh/monitor/agent_spec.rb Removes Ruby monitor unit tests (legacy implementation removed).
src/bosh-monitor/spec/support/uaa_helpers.rb Removes Ruby monitor test support (legacy implementation removed).
src/bosh-monitor/spec/support/host_authorizatin.rb Removes Ruby monitor test support (legacy implementation removed).
src/bosh-monitor/spec/support/buffered_logger.rb Removes Ruby monitor test support (legacy implementation removed).
src/bosh-monitor/spec/spec_helper.rb Removes Ruby monitor spec helper (legacy implementation removed).
src/bosh-monitor/spec/gemspec_spec.rb Removes Ruby gemspec tests (legacy implementation removed).
src/bosh-monitor/spec/functional/notifying_plugins_spec.rb Removes Ruby functional tests (legacy implementation removed).
src/bosh-monitor/spec/assets/sample_config.yml Removes Ruby sample config (legacy implementation removed).
src/bosh-monitor/spec/assets/dummy_plugin_config.yml Removes Ruby dummy plugin config (legacy implementation removed).
src/bosh-monitor/pkg/server/server.go Adds Go HTTP API server implementation (healthz + agent endpoints).
src/bosh-monitor/pkg/server/server_test.go Adds Go tests for server endpoints.
src/bosh-monitor/pkg/server/server_suite_test.go Adds Ginkgo suite for server package.
src/bosh-monitor/pkg/resurrection/resurrection_suite_test.go Adds Ginkgo suite for resurrection package.
src/bosh-monitor/pkg/resurrection/manager_test.go Adds resurrection manager rule parsing/behavior tests.
src/bosh-monitor/pkg/processor/processor_suite_test.go Adds Ginkgo suite for processor package.
src/bosh-monitor/pkg/processor/event_processor.go Adds Go event processor (validation, dedupe, pruning, dispatch).
src/bosh-monitor/pkg/processor/event_processor_test.go Adds tests for Go event processor.
src/bosh-monitor/pkg/pluginproto/protocol_suite_test.go Adds Ginkgo suite for plugin protocol package.
src/bosh-monitor/pkg/pluginhost/pluginhost_suite_test.go Adds Ginkgo suite for plugin host package.
src/bosh-monitor/pkg/pluginhost/host_test.go Adds tests for plugin host command handling and startup behavior.
src/bosh-monitor/pkg/nats/nats_suite_test.go Adds Ginkgo suite for NATS package.
src/bosh-monitor/pkg/nats/director_monitor.go Adds Go director-alert subscription monitor.
src/bosh-monitor/pkg/nats/director_monitor_test.go Adds initial unit tests for director monitor (needs strengthening).
src/bosh-monitor/pkg/nats/client.go Adds Go NATS client with TLS and startup retry logic.
src/bosh-monitor/pkg/instance/instance.go Adds Go instance model + formatting helpers.
src/bosh-monitor/pkg/instance/instance_suite_test.go Adds Ginkgo suite for instance package.
src/bosh-monitor/pkg/instance/deployment.go Adds Go deployment model and agent/instance bookkeeping.
src/bosh-monitor/pkg/instance/agent.go Adds Go agent model and timeout/rogue logic.
src/bosh-monitor/pkg/events/metric.go Adds Go metric model.
src/bosh-monitor/pkg/events/events_suite_test.go Adds Ginkgo suite for events package.
src/bosh-monitor/pkg/events/base.go Adds Go event factory/validation helpers.
src/bosh-monitor/pkg/director/director_suite_test.go Adds Ginkgo suite for director package.
src/bosh-monitor/pkg/director/auth.go Adds Go auth provider logic (basic + UAA token flow, CA selection).
src/bosh-monitor/pkg/config/config.go Adds Go config loader with defaults/validation.
src/bosh-monitor/pkg/config/config_suite_test.go Adds Ginkgo suite for config package.
src/bosh-monitor/main.go Adds Go entrypoint (-c config) with slog logging and signal handling.
src/bosh-monitor/lib/bosh/monitor/yaml_helper.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/version.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/resurrection_manager.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/protocols/tsdb_connection.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/protocols/tcp_connection.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/protocols/graphite_connection.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/tsdb.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/riemann.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/resurrector_helper.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/README.md Removes Ruby plugin README (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/paging_datadog_client.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/pagerduty.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/logger.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/json.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/http_request_helper.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/graphite.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/event_logger.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/email.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/dummy.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/datadog.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/plugins/base.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/metric.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/instance.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/events/heartbeat.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/events/base.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/events/alert.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/event_processor.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/errors.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/director.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/director_monitor.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/deployment.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/core_ext.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/config.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/auth_provider.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/api_controller.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor/agent.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/lib/bosh/monitor.rb Removes Ruby monitor implementation (deleted).
src/bosh-monitor/go.mod Adds Go module definition for new monitor.
src/bosh-monitor/cmd/plugins/pluginlib/pluginlib.go Adds shared plugin runtime library for out-of-process plugins.
src/bosh-monitor/cmd/plugins/pluginlib/pluginlib_test.go Adds tests for plugin runtime library.
src/bosh-monitor/cmd/plugins/pluginlib/pluginlib_suite_test.go Adds Ginkgo suite for pluginlib package.
src/bosh-monitor/cmd/plugins/hm-tsdb/main.go Adds TSDB plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-riemann/main.go Adds Riemann plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-pagerduty/main.go Adds PagerDuty plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-logger/main.go Adds logger plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-json/main.go Adds JSON fanout plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-graphite/main.go Adds Graphite plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-event-logger/main.go Adds event-logger plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-dummy/main.go Adds dummy plugin (Go) implementation.
src/bosh-monitor/cmd/plugins/hm-datadog/main.go Adds Datadog plugin (Go) implementation.
src/bosh-monitor/.golangci.yml Adds golangci-lint config for the new Go module.
packages/health_monitor/spec Updates package spec to remove Ruby dependencies and ship new monitor sources.
packages/health_monitor/packaging Updates packaging to build Go binary + plugins.
jobs/health_monitor/templates/health_monitor Updates job launcher to run the Go binary (removes Ruby env).
jobs/health_monitor/templates/bpm.yml Updates BPM config to run Go binary with args, removes Ruby env vars/volumes.
jobs/health_monitor/spec Removes Ruby package dependency from health_monitor job.
.github/workflows/ruby.yml Removes the Ruby monitor test matrix entry.
.github/workflows/go.yml Adds lint/test jobs for the new Go bosh-monitor module.
Comments suppressed due to low confidence (1)

packages/health_monitor/spec:6

  • go build is invoked in this package, but the package spec declares no dependency that would provide a Go toolchain during BOSH compilation. Unless the compilation environment already has go available, this will fail to compile the release. Consider either (a) adding a golang-* package dependency, or (b) shipping prebuilt binaries (like other packages in this release) and removing the compile-time go build requirement.
---
name: health_monitor

files:
- bosh-monitor/**/*


Comment thread src/spec/integration_support/bosh_monitor_manager.rb
Comment thread packages/health_monitor/packaging
Comment thread src/bosh-monitor/go.mod
Comment thread src/bosh-monitor/pkg/nats/director_monitor_test.go Outdated
Comment thread src/bosh-monitor/pkg/server/server_test.go
Comment thread src/bosh-monitor/pkg/server/server_test.go
Comment thread src/bosh-monitor/pkg/processor/event_processor.go Outdated
Comment thread src/bosh-monitor/cmd/plugins/hm-pagerduty/main.go
Comment thread src/bosh-monitor/cmd/plugins/hm-graphite/main.go
Comment thread src/bosh-monitor/cmd/plugins/hm-tsdb/main.go
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 20, 2026
Replace the Ruby bosh-monitor implementation with a Go-based binary.

Changes:
- Add Go bosh-monitor implementation under src/bosh-monitor/
- Delete Ruby bosh-monitor source (lib/, spec/, bin/, gemspec)
- Remove bosh-monitor gem from src/Gemfile and Gemfile.lock
- Update jobs/health_monitor/ to use Go binary (remove director-ruby-3.3 dep)
- Update packages/health_monitor/ to compile Go binary via go build
- Update .github/workflows/go.yml to test and lint src/bosh-monitor/
- Update .github/workflows/ruby.yml to remove monitor:parallel (Ruby deleted)
- Add integration test support: BoshMonitorManager builds Go binary for sandbox
- Fix hm-logger plugin output format to match Ruby logger for integration tests
- Update hm_stateless_spec.rb JSON heartbeat parsing to match Go slog format
- Fix sandbox health_monitor_without_resurrector.yml.erb (remove nats plugin)
- Ensure TLS peer verification with director_ca_cert and uaa_ca_cert
- Implement NATS connection retry logic during startup
- Align DataDog pagerduty_service_name routing with Ruby implementation
- Align Riemann severity string mapping with Ruby implementation
@aramprice aramprice force-pushed the experiment-golang-bosh-monitor branch from be95cbb to 1a7651a Compare June 23, 2026 02:10
aramprice and others added 2 commits June 23, 2026 10:23
…Data and filter heartbeats only

The hm-logger plugin logs both heartbeats and alerts as JSON. The test at
hm_stateless_spec.rb:96 ("only outputs complete heartbeats") was failing
because:

1. The test filter only excluded compilation jobs but not non-heartbeat
   events. Director alerts fired during deployment would appear in the
   collected entries and fail the heartbeat schema assertion.

2. The EventData struct lacked the top-level number_of_processes field
   that the Ruby HM's heartbeat JSON format includes.

Fixes:
- Add NumberOfProcesses interface{} to EventData with json omitempty.
- Populate it in eventToProto() from the heartbeat's raw Attrs map.
- Update the spec filter to also require kind == 'heartbeat'.
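
The first two fixes can be sketched as follows. This is a minimal illustration, not the PR's code: only the NumberOfProcesses field and its omitempty tag come from the commit message, while the other EventData fields and the helpers heartbeatToEventData and encode are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventData is a simplified, hypothetical stand-in for the plugin protocol
// struct. Only NumberOfProcesses is named in the commit message.
type EventData struct {
	Kind              string                 `json:"kind"`
	NumberOfProcesses interface{}            `json:"number_of_processes,omitempty"`
	Attributes        map[string]interface{} `json:"attributes,omitempty"`
}

// heartbeatToEventData (hypothetical name) copies number_of_processes from
// the raw attribute map to the top level of the serialised event.
func heartbeatToEventData(attrs map[string]interface{}) EventData {
	ed := EventData{Kind: "heartbeat", Attributes: attrs}
	if n, ok := attrs["number_of_processes"]; ok {
		ed.NumberOfProcesses = n
	}
	return ed
}

// encode marshals an event and returns an empty string on failure.
func encode(ed EventData) string {
	b, err := json.Marshal(ed)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	attrs := map[string]interface{}{"number_of_processes": 3}
	fmt.Println(encode(heartbeatToEventData(attrs)))
	// An event without the attribute omits the field entirely.
	fmt.Println(encode(EventData{Kind: "alert"}))
}
```

Using interface{} with omitempty keeps the field absent for events that never carried it, so alert JSON is unaffected.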
Implements Go equivalents of the following Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/runner_spec.rb:

  connect_to_mbus:
  - "should connect using SSL" — covered by buildTLSConfig tests:
      * returns TLS config with client certificate
      * enforces MinVersion TLS 1.2
      * populates CA certificate pool from server CA file
      * returns error when cert file is missing
      * returns error when cert and key do not match
      * returns error when server CA file is missing
      * returns error when server CA file contains no valid PEM blocks

  NATS connection retries:
  - "retries the connection until it succeeds"
  - "logs retry attempts"
  - "when timeout is exceeded / raises the last connection error"
  - "connection_wait_timeout from config / uses the configured timeout"
  - "when NATS connection fails with AuthError (subclass of ConnectError) /
     retries the connection" — in Go all error types are retried (broader
     than Ruby's ConnectError-only policy)
  - "when an error occurs while connecting / throws the error"
  - Uses default ConnectionWaitTimeout when not configured

Note: Ruby's "non-ConnectError does not retry" test is not applicable to
Go because the Go client retries all errors, not only ConnectError subtypes.

Implementation:
- Add connectFunc and retryWait package-level variables to client.go as
  test seams (production behaviour is unchanged: nats.Connect is called
  with a 1-second retry interval)
- Add export_test.go to expose BuildTLSConfig, ConnectFunc, and RetryWait
  for use in the external test package
- Add client_test.go with 14 new Ginkgo/Gomega specs (17 total in suite)
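
A minimal sketch of the test-seam approach described above, under stated assumptions: the conn type stands in for the real NATS connection so the example has no external dependency, and connectWithRetry and simulate are hypothetical names. Only connectFunc, retryWait, the retry-all-errors policy, and the 1-second production interval come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// conn stands in for the real NATS connection type (assumption).
type conn struct{ url string }

// Package-level seams: production code keeps the defaults, tests override them.
var (
	connectFunc = func(url string) (*conn, error) { return &conn{url: url}, nil }
	retryWait   = time.Second
)

// connectWithRetry retries on every error type until the timeout elapses,
// then returns the last connection error wrapped with context.
func connectWithRetry(url string, timeout time.Duration) (*conn, error) {
	deadline := time.Now().Add(timeout)
	for attempt := 1; ; attempt++ {
		c, err := connectFunc(url)
		if err == nil {
			return c, nil
		}
		if time.Now().After(deadline) {
			return nil, fmt.Errorf("failed to connect to NATS after %d attempts: %w", attempt, err)
		}
		time.Sleep(retryWait)
	}
}

// simulate swaps in a fake connect function that fails the first `failures`
// calls, shortens the retry interval, and reports how many calls were made.
func simulate(failures int, timeout time.Duration) (calls int, err error) {
	connectFunc = func(string) (*conn, error) {
		calls++
		if calls <= failures {
			return nil, errors.New("connection refused")
		}
		return &conn{}, nil
	}
	retryWait = time.Millisecond
	_, err = connectWithRetry("nats://127.0.0.1:4222", timeout)
	return calls, err
}

func main() {
	calls, err := simulate(2, time.Second)
	fmt.Println("connect calls:", calls, "error:", err)
}
```

Because the seams are package-level variables, specs that override them should not run in parallel with each other.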
colins and others added 8 commits June 23, 2026 15:47
Implements Go equivalents of all Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/director_monitor_spec.rb:

  describe 'subscribe':
  - "subscribes to hm.director.alert over NATS" — Subscribe registers a
    handler with the NATS client; also covers Subscribe returning an error
    when the subscriber fails

  context 'alert handler / valid payload':
  - "does not log an error"
  - "tells the event processor to process the alert" — covered twice:
    once via handleAlert directly, and once end-to-end through Subscribe
    using fakeSubscriber.fireAlert
  - "passes all alert fields to the event processor"

  context 'alert handler / invalid payload':
  - For each of [id, severity, title, summary, created_at]:
    "logs an error if the <key> field is missing"
    "does not create a new director alert"
  - Malformed JSON: logs error and does not call processor

  Additional (no Ruby equivalent):
  - "logs the processor error" when event processor returns an error

Implementation:
- Add DirectorAlertSubscriber interface to director_monitor.go so a fake
  subscriber can be injected without a live NATS connection
- Extract the anonymous alert handler into handleAlert method
- Add HandleAlert(dm, payload) to export_test.go for direct handler testing
- Replace the skeleton director_monitor_test.go (which only tested JSON
  marshaling) with 13 real Ginkgo/Gomega specs (27 total in NATS suite)
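
The injection pattern above might look roughly like the sketch below. The DirectorAlertSubscriber name, the hm.director.alert subject, the required keys, and fakeSubscriber.fireAlert come from the commit message; the interface's method signature, parseAlert, and the counter fields on directorMonitor are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DirectorAlertSubscriber lets a fake be injected in place of a live NATS
// client. The method signature is an assumption.
type DirectorAlertSubscriber interface {
	Subscribe(subject string, handler func(payload []byte)) error
}

// requiredKeys lists the fields an alert payload must contain.
var requiredKeys = []string{"id", "severity", "title", "summary", "created_at"}

// parseAlert (hypothetical helper) rejects malformed JSON and payloads that
// are missing any required key.
func parseAlert(payload []byte) (map[string]interface{}, error) {
	var alert map[string]interface{}
	if err := json.Unmarshal(payload, &alert); err != nil {
		return nil, fmt.Errorf("malformed director alert: %w", err)
	}
	for _, k := range requiredKeys {
		if _, ok := alert[k]; !ok {
			return nil, fmt.Errorf("director alert is missing %q", k)
		}
	}
	return alert, nil
}

// directorMonitor records outcomes in plain fields instead of calling a real
// event processor and logger.
type directorMonitor struct {
	sub       DirectorAlertSubscriber
	processed int
	errs      []string
}

func (dm *directorMonitor) Subscribe() error {
	return dm.sub.Subscribe("hm.director.alert", dm.handleAlert)
}

func (dm *directorMonitor) handleAlert(payload []byte) {
	if _, err := parseAlert(payload); err != nil {
		dm.errs = append(dm.errs, err.Error())
		return
	}
	dm.processed++
}

// fakeSubscriber captures the handler so a test can fire alerts by hand.
type fakeSubscriber struct{ handler func([]byte) }

func (f *fakeSubscriber) Subscribe(_ string, h func([]byte)) error {
	f.handler = h
	return nil
}

func (f *fakeSubscriber) fireAlert(payload string) { f.handler([]byte(payload)) }

func main() {
	fake := &fakeSubscriber{}
	dm := &directorMonitor{sub: fake}
	if err := dm.Subscribe(); err != nil {
		panic(err)
	}
	fake.fireAlert(`{"id":"1","severity":3,"title":"t","summary":"s","created_at":1}`)
	fake.fireAlert(`{"id":"2","severity":3,"title":"t","summary":"s"}`)
	fmt.Println("processed:", dm.processed, "errors:", dm.errs)
}
```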
Implements Go equivalents of all Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/auth_provider_spec.rb:

  Shared examples (:auth_provider_shared_tests) run once per CA cert context:
  - "returns auth header provided by UAA"
  - "reuses the same token for subsequent requests"
  - "when token is about to expire / obtains new token" (expires_in <= 60 s)
  - "when getting token fails / logs an error"
  - (implicit) does not raise / does not panic when token fetch fails

  Five CA cert selection contexts (uaaCACertPath logic):
  - "user provides director_ca_cert" — uses director_ca_cert for UAA requests
  - "user provides uaa_ca_cert with a non-empty file" — uaa_ca_cert takes
    priority over director_ca_cert
  - "user provides uaa_ca_cert but file is empty" — falls back to
    director_ca_cert (blank/whitespace-only file is not usable)
  - "user provides uaa_ca_cert but file is missing" — falls back to
    director_ca_cert (non-existent file is not usable)
  - "user has not provided director_ca_cert" — returns empty path, meaning
    the system trust store is used (Go: tls.Config{RootCAs: nil})

  Non-UAA mode (basic auth):
  - "returns the basic-auth header with encoded username and password"

Note: Ruby tests CA cert selection indirectly via CF::UAA::TokenIssuer mock
arguments.  Go tests uaaCACertPath() directly (exposed via export_test.go)
and separately verifies token behaviour against a plain-HTTP fake UAA server,
which gives the same overall coverage in fewer, more focused specs.

Implementation:
- Add export_test.go to pkg/director/ exposing UaaCACertPath(ap)
- Add auth_test.go with 12 new Ginkgo/Gomega specs (24 total in suite):
    5 CA cert selection tests
    4 token lifecycle tests (returns token, reuses, near-expiry, error)
    1 basic-auth correctness test (exact Base64 value)
    2 error-path tests (logs error, returns empty string without panic)
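
The five CA cert selection contexts reduce to a small fallback rule, sketched here. The function name uaaCACertPath comes from the commit message, but its real signature is not shown there, so the two-string form below and the tempFile helper are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// uaaCACertPath prefers uaa_ca_cert when the file exists and is not blank,
// otherwise falls back to director_ca_cert. An empty result means the system
// trust store is used.
func uaaCACertPath(uaaCACert, directorCACert string) string {
	if uaaCACert != "" {
		data, err := os.ReadFile(uaaCACert)
		if err == nil && strings.TrimSpace(string(data)) != "" {
			return uaaCACert
		}
	}
	return directorCACert
}

// tempFile (hypothetical helper) writes content to a new temporary file and
// returns its path.
func tempFile(content string) string {
	f, err := os.CreateTemp("", "ca-*.pem")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		panic(err)
	}
	return f.Name()
}

func main() {
	blank := tempFile("  \n")
	defer os.Remove(blank)
	// A whitespace-only uaa_ca_cert is unusable, so the director path wins.
	fmt.Println(uaaCACertPath(blank, "director_ca.pem"))
}
```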
…clude matcher

The previous fix (7a09333) added kind==heartbeat to the post-loop filter
but the loop still broke on the first JSON entry (which is a director alert
during deployment). This left heartbeat_hashes empty after filtering.

Fixes:
- Move the kind==heartbeat and compilation-job filter inside the collection
  loop so the loop only breaks when actual heartbeats are found (ignoring
  alert events logged during deployment).
- Change expect(...).to match(...) to expect(...).to include(...) since
  EventData JSON includes an 'attributes' field (raw NATS payload) not
  listed in the expected hash; include() allows extra keys in actual.
Implements Go equivalents of Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/plugins/resurrector_helper_spec.rb
(module Bosh::Monitor::Plugins::ResurrectorHelper):

  AlertTracker#state_for contexts:
  - "when the number of unresponsive agents is 0" → reports as 'normal',
    summary shows '0 of 10 agents are unhealthy (0.0%)'
  - "when below the meltdown count threshold" → reports as 'managed'
  - "when at/above count threshold and below percent threshold" → 'managed'
  - "when at/above both count and percent thresholds" → 'meltdown'
  - summary format: "deployment: '<name>'; N of M agents are unhealthy (P%)"
  - "when recorded alerts are outside the time threshold" → excludes stale
    entries; only alerts within timeThreshold seconds count as unhealthy

  JobInstanceKey:
  - "hashes properly" → two keys with the same fields resolve to the same
    Go map entry (struct equality used as map key)

  newAlertTracker:
  - default config (empty options) produces minimumDownJobs=5,
    percentThreshold=0.2, timeThreshold=600
  - custom config values are respected

Implementation notes:
  The Ruby AlertTracker wraps a Bosh::Monitor.instance_manager lookup to
  obtain total agent count.  The Go deploymentState receives agentCount
  directly (the caller is responsible for providing it), so tests construct
  deploymentState directly rather than going through a manager mock.
  unhealthyCount() boundary semantics match the Ruby implementation:
  an alert at exactly now-threshold is NOT counted (strict After check).
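
A minimal sketch of the state rules listed above, assuming a simplified deploymentState. The defaults (5, 0.2, 600), the summary format, the state names, and the strict After check come from the commit message; the field layout and the newState, stateFor, and summaryFor helpers are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

type deploymentState struct {
	name             string
	agentCount       int
	alerts           map[string]time.Time // last alert time per job instance
	minimumDownJobs  int
	percentThreshold float64
	timeThreshold    time.Duration
}

// unhealthyCount counts alerts strictly newer than now-timeThreshold, so an
// alert at exactly the cutoff is excluded.
func (d *deploymentState) unhealthyCount(now time.Time) int {
	cutoff := now.Add(-d.timeThreshold)
	n := 0
	for _, at := range d.alerts {
		if at.After(cutoff) {
			n++
		}
	}
	return n
}

func (d *deploymentState) unhealthyPercent(now time.Time) float64 {
	if d.agentCount == 0 {
		return 0
	}
	return float64(d.unhealthyCount(now)) / float64(d.agentCount)
}

// state reports meltdown only when both thresholds are met.
func (d *deploymentState) state(now time.Time) string {
	unhealthy := d.unhealthyCount(now)
	switch {
	case unhealthy == 0:
		return "normal"
	case unhealthy >= d.minimumDownJobs && d.unhealthyPercent(now) >= d.percentThreshold:
		return "meltdown"
	default:
		return "managed"
	}
}

func (d *deploymentState) summary(now time.Time) string {
	return fmt.Sprintf("deployment: '%s'; %d of %d agents are unhealthy (%.1f%%)",
		d.name, d.unhealthyCount(now), d.agentCount, d.unhealthyPercent(now)*100)
}

// newState builds a state with the default thresholds, `unhealthy` fresh
// alerts, and one stale alert placed exactly at the cutoff.
func newState(agents, unhealthy int, now time.Time) *deploymentState {
	d := &deploymentState{
		name: "dep", agentCount: agents, alerts: map[string]time.Time{},
		minimumDownJobs: 5, percentThreshold: 0.2, timeThreshold: 600 * time.Second,
	}
	for i := 0; i < unhealthy; i++ {
		d.alerts[fmt.Sprintf("job/%d", i)] = now
	}
	d.alerts["stale"] = now.Add(-d.timeThreshold)
	return d
}

func stateFor(agents, unhealthy int) string {
	now := time.Now()
	return newState(agents, unhealthy, now).state(now)
}

func summaryFor(agents, unhealthy int) string {
	now := time.Now()
	return newState(agents, unhealthy, now).summary(now)
}

func main() {
	fmt.Println(stateFor(10, 5), "|", summaryFor(10, 5))
}
```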
Implements Go equivalents of Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/agent_spec.rb:

  TimedOut? boundary conditions:
  - "knows if it is timed out": false when just inside the agent_timeout
    threshold (Ruby: at t=344 s with agent_timeout=344), true one full
    second past it (Ruby: at t=345 s)
  - false for a freshly created agent

  Rogue? boundary conditions:
  - "knows if it is rogue": false when just inside rogue_agent_alert
    threshold (Ruby: at t=124 s with rogue_agent_alert=124), true one
    full second past it (Ruby: at t=125 s)
  - false once agent.Deployment is set (managed agent is never rogue)
  - false for a freshly created agent

  Name format across incremental state transitions:
  - only cid: "agent zb [cid=deadbeef]"
  - cid + instance_id: "agent zb [cid=deadbeef, instance_id=iuuid]"
  - + deployment (no job): "agent zb [deployment=oleg-cloud, cid=…, instance_id=…]"
  - deployment + job + instance_id: "oleg-cloud: mysql_node(iuuid) [id=zb, cid=deadbeef]"
  - + index: "oleg-cloud: mysql_node(iuuid) [id=zb, index=0, cid=deadbeef]"

  UpdateInstance:
  - "populates the corresponding attributes" — job, index, cid, instance_id
    are taken from the instance
  - "does not modify job_state or number_of_processes when updating instance"

Note: Ruby mocks Time.now to test exact-second boundaries; Go tests use a
500 ms guard margin inside the threshold to stay stable without a time mock.
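
For comparison, the boundary checks can also be made exact by passing the current time in as a parameter. The sketch below shows that alternative; it is not what the PR does (the PR uses a guard margin), and the agent fields, method signatures, and the timedOutAfter and rogueAfter helpers are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

type agent struct {
	deployment   string
	discoveredAt time.Time
	updatedAt    time.Time
}

// timedOut is true only when strictly more than agentTimeout has elapsed
// since the last update.
func (a *agent) timedOut(now time.Time, agentTimeout time.Duration) bool {
	return now.Sub(a.updatedAt) > agentTimeout
}

// rogue is true only for agents with no deployment that have been known for
// strictly longer than rogueAlert.
func (a *agent) rogue(now time.Time, rogueAlert time.Duration) bool {
	return a.deployment == "" && now.Sub(a.discoveredAt) > rogueAlert
}

// timedOutAfter evaluates timedOut for an agent last updated at a fixed
// start time, with both durations given in whole seconds.
func timedOutAfter(elapsedSeconds, timeoutSeconds int) bool {
	start := time.Unix(0, 0)
	a := &agent{discoveredAt: start, updatedAt: start}
	now := start.Add(time.Duration(elapsedSeconds) * time.Second)
	return a.timedOut(now, time.Duration(timeoutSeconds)*time.Second)
}

// rogueAfter evaluates rogue for an agent discovered at a fixed start time.
func rogueAfter(elapsedSeconds, alertSeconds int, deployment string) bool {
	start := time.Unix(0, 0)
	a := &agent{deployment: deployment, discoveredAt: start, updatedAt: start}
	now := start.Add(time.Duration(elapsedSeconds) * time.Second)
	return a.rogue(now, time.Duration(alertSeconds)*time.Second)
}

func main() {
	fmt.Println(timedOutAfter(344, 344), timedOutAfter(345, 344))
	fmt.Println(rogueAfter(125, 124, ""), rogueAfter(125, 124, "oleg-cloud"))
}
```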
…cessing

Implements Go equivalents of Ruby tests from
src/bosh-monitor/spec/unit/bosh/monitor/instance_manager_spec.rb
(context 'stubbed config' > describe '#process_event'):

  Heartbeat handling:
  - "can process" — heartbeats from unknown agents still register them;
    two events from the same agent count once (AgentsCount = 2)
  - "when heartbeat information cannot be completed for instance_id, job,
    or deployment" → "does not process the heartbeat" (processor not called)
  - "processes a valid populated heartbeat message" — processor receives
    agent_id, deployment, instance_id, job, teams, and timestamp
  - "when teams have changed between heartbeats" → "updates teams in
    heartbeat event": after a second SyncDeploymentState with ['ateam','bteam']
    the next heartbeat carries both teams

  Shutdown handling:
  - "shutdowns agent" — after shutdown.008 the agent count drops from 3 to 2
    and AnalyzeAgents reflects the removal

  Alert handling:
  - "bad alert" → "does not increment alerts_processed" when the processor
    returns an error (Go equivalent of Ruby raising Bosh::Monitor::InvalidEvent)
  - "good alert" → "increments alerts_processed" by 2 for two successful
    alert events
  - heartbeats_received increments after a valid heartbeat

Implementation:
- Add heartbeat_teams_test.go with a captureProcessor that records calls and
  can return a configured error, enabling assertion of both event data content
  and alert-counter behaviour.
Refactors each plugin to extract the anonymous PluginFunc closure to a
named run<Plugin> function, enabling direct use with pluginlib.RunWithIO
in tests without spawning the binary.  Also adds a shared plugintestutil
package with reusable helpers (CmdSink, SendInit, SendEvent, NextCmd, …).

Implements Go equivalents of the Ruby plugin specs:

  hm-logger (logger_spec.rb):
  - text format: events logged as "[HEARTBEAT] …" / "[ALERT] …"
  - json format: events serialised as valid JSON
  - startup log message emitted on launch

  hm-dummy (dummy_spec.rb):
  - each processed event increments the running total logged
  - plugin exits cleanly when stdin is closed

  hm-event-logger (event_logger_spec.rb):
  - alert event → http_request POST /events with action/object_type/
    object_name/deployment/instance/context fields
  - heartbeat events are silently ignored (no HTTP request)
  - context.message contains title and severity

  hm-graphite (graphite_spec.rb):
  - missing host or port → startup error
  - heartbeat metrics → TCP line "deployment.job.inst.agent.metric_name value ts"
  - custom prefix prepended to metric name

  hm-tsdb (tsdb_spec.rb):
  - missing host or port → startup error
  - heartbeat → TCP "put metric_name ts value deployment=…"
  - alert events are ignored

  hm-riemann (riemann_spec.rb):
  - missing host → error; missing port → error
  - alert → TCP JSON with service=bosh.hm, state=critical (severity 2)
  - heartbeat → TCP JSON with service=bosh.hm, name=metric_name, metric=value

  hm-consul (consul_spec.rb):
  - missing host/port/protocol → startup error
  - alert with events=true → PUT /v1/event/fire/<label>
  - heartbeat with TTL config → PUT /v1/agent/check/…

  hm-pagerduty (pagerduty_spec.rb):
  - missing service_key → startup error
  - alert → POST with service_key, event_type=trigger, incident_key=id
  - heartbeat → POST with service_key and heartbeat description

  hm-datadog (datadog_spec.rb):
  - missing api_key → error; missing application_key → error
  - heartbeat → POST /api/v1/series with bosh.healthmonitor.* metrics
  - alert → POST /api/v1/events with title/priority/alert_type
  - pagerduty_service_name appends @service to alert text for normal-priority alerts

  hm-email (email_spec.rb):
  - missing recipients → startup error
  - missing smtp options → startup error
  - valid options → plugin starts and stops cleanly (no SMTP mock needed)

  hm-json:
  - nil options → uses default glob, starts and stops cleanly
  - non-matching glob → no child processes, handles events without error

Implementation notes:
- pagerduty and datadog goroutines now use select+ctx.Done() to avoid
  sending on a closed cmds channel when the plugin shuts down (concurrent
  HTTP calls no longer panic on clean exit)
- datadogSeriesURLTemplate / datadogEventsURLTemplate / apiURI promoted
  from const to var so tests can redirect HTTP calls to httptest.Servers
- Tests that override package-level URL vars are not run in parallel to
  prevent data races
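
The select+ctx.Done() pattern from the first note can be sketched as below. One caveat worth stating: a select does not by itself make a send on an already-closed channel safe, so this sketch assumes the channel is closed only after every sender has returned (tracked here with a sync.WaitGroup). The command type, sendCmd, and runDemo are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// command is a stand-in for a plugin-to-host message.
type command struct{ kind string }

// sendCmd delivers cmd unless ctx is cancelled first, and reports whether
// the command was delivered.
func sendCmd(ctx context.Context, cmds chan<- command, cmd command) bool {
	select {
	case cmds <- cmd:
		return true
	case <-ctx.Done():
		return false
	}
}

// runDemo starts `workers` goroutines that try to send after shutdown has
// begun. Nobody is reading from cmds, so every send is abandoned via
// ctx.Done() instead of blocking forever or racing with close.
func runDemo(workers int) (delivered, dropped int) {
	ctx, cancel := context.WithCancel(context.Background())
	cmds := make(chan command)
	cancel() // the plugin is shutting down

	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ok := sendCmd(ctx, cmds, command{kind: "http_request"})
			mu.Lock()
			defer mu.Unlock()
			if ok {
				delivered++
			} else {
				dropped++
			}
		}()
	}

	wg.Wait()   // all senders have returned
	close(cmds) // closing is now safe
	return delivered, dropped
}

func main() {
	delivered, dropped := runDemo(4)
	fmt.Println("delivered:", delivered, "dropped:", dropped)
}
```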
Implements Go equivalents of the missing Ruby tests from
spec/unit/bosh/monitor/runner_spec.rb:

- "when NATS calls error handler with a ConnectError" / "shuts down
  the server": TestRunnerNATSConnectionFails verifies Run() returns a
  "failed to connect to NATS" error when the injected NATS client
  reports a connection failure.

- "when an error occurs while connecting" / "throws the error":
  TestRunnerNATSConnectErrorPropagated verifies the original error is
  wrapped and propagated with errors.Is semantics.

- "stops the HM server, stops the event loop" (handle_fatal_error /
  stop): TestRunnerStopsOnContextCancel and TestRunnerStop verify that
  cancelling the context (or calling Stop()) causes Run() to call
  shutdown(), close the NATS client, and return nil.

- "connection_wait_timeout is configured in mbus config" / "uses the
  configured timeout": TestRunnerNATSUsesConnectionWaitTimeout verifies
  the ConnectionWaitTimeout value is forwarded to the NATS Config and
  controls the number of connect attempts.

- "should connect using SSL": TestRunnerPassesTLSConfigToNATSClient
  verifies all Mbus TLS fields (endpoint, CA path, cert path, key path,
  timeout) are forwarded verbatim to the NATS client Config.

- Plugin startup errors are non-fatal: TestRunnerContinuesWhenPluginStartFails
  verifies the runner proceeds to NATS connect and reaches its running state
  even when a plugin executable cannot be found (matching the Ruby
  implementation's continue-on-error behaviour).

Production change: introduce a natsClient interface in runner.go and a
package-level newNATSClient factory variable so the test can inject a
fake client without requiring a live NATS server.  Also adds Stop() to
cancel the runner's context from outside.
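
A rough sketch of that production change, assuming a much smaller runner than the real one. The natsClient interface name, the newNATSClient factory variable, Stop(), and the "failed to connect to NATS" error text come from the commit message; the interface's methods, fakeClient, errRefused, and isRefused are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// natsClient is the seam the runner depends on (method set is an assumption).
type natsClient interface {
	Connect() error
	Close()
}

// fakeClient returns a preset error from Connect and records Close calls.
type fakeClient struct {
	err    error
	closed bool
}

func (f *fakeClient) Connect() error { return f.err }
func (f *fakeClient) Close()         { f.closed = true }

// newNATSClient is a package-level factory. Production code would build a
// real client here; this sketch defaults to a fake that always connects.
var newNATSClient = func() natsClient { return &fakeClient{} }

var errRefused = errors.New("connection refused")

type runner struct {
	ctx    context.Context
	cancel context.CancelFunc
}

func newRunner() *runner {
	ctx, cancel := context.WithCancel(context.Background())
	return &runner{ctx: ctx, cancel: cancel}
}

// Run connects, blocks until the context is cancelled, then closes the client.
func (r *runner) Run() error {
	client := newNATSClient()
	if err := client.Connect(); err != nil {
		return fmt.Errorf("failed to connect to NATS: %w", err)
	}
	defer client.Close()
	<-r.ctx.Done()
	return nil
}

// Stop cancels the runner's context from outside.
func (r *runner) Stop() { r.cancel() }

// isRefused reports whether err wraps errRefused.
func isRefused(err error) bool { return errors.Is(err, errRefused) }

func main() {
	newNATSClient = func() natsClient { return &fakeClient{err: errRefused} }
	err := newRunner().Run()
	fmt.Println(err, isRefused(err))
}
```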
Labels

None yet

Projects

Status: Pending Merge | Prioritized

4 participants