Add create, attach, and list endpoints to dynamic disks API#2743
Add create, attach, and list endpoints to dynamic disks API#2743Alphasite wants to merge 5 commits into
Conversation
Adds three new Director API endpoints to complement the existing provide/detach/delete endpoints: - GET /dynamic_disks list all dynamic disks (bosh.dynamic-disks.list) - POST /dynamic_disks create disk without attaching (bosh.dynamic-disks.create) - POST /dynamic_disks/:name/attach attach existing disk to a VM (bosh.dynamic-disks.attach) Implements CreateDynamicDisk and AttachDynamicDisk background jobs, both on the :dynamic_disks worker queue. All operations are idempotent. 70 unit specs, all passing. [TNZ-109499] Co-authored-by: Cursor <cursoragent@cursor.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughTwo new background job classes are introduced under Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.23.0)src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.12][ERROR]: unable to find a config; path Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR extends the BOSH Director dynamic disks API with endpoints to list disks, create a disk (without attaching), and attach an existing disk to a VM, to support the next phase of CLI commands.
Changes:
- Add
GET /dynamic_disksto list dynamic disks (new scope:bosh.dynamic-disks.list). - Add
POST /dynamic_disksto create a dynamic disk in the IaaS without attaching it. - Add
POST /dynamic_disks/:name/attachto attach an existing named disk to a VM via a background job.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rb | Adds list/create/attach endpoints and enqueues new background jobs |
| src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb | New job to create a dynamic disk without attaching |
| src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb | New job to attach an existing dynamic disk to an instance’s active VM |
| src/bosh-director/lib/bosh/director.rb | Requires the newly added job classes |
| src/bosh-director/spec/support/test_identity_provider.rb | Adds a scoped test user for the list endpoint |
| src/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rb | Adds controller coverage for the new endpoints and scope enforcement |
| src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb | Adds unit tests for the create job |
| src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rb | Adds unit tests for the attach job |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| cloud_properties = find_disk_cloud_properties(instance, @disk_pool_name) | ||
| cloud = Bosh::Director::CloudFactory.create.get(vm.cpi) | ||
|
|
||
| disk_model = Models::DynamicDisk.find(name: @disk_name) | ||
| if disk_model.nil? | ||
| cloud_properties['name'] = @disk_name | ||
| disk_cid = cloud.create_disk(@disk_size, cloud_properties, vm.cid) |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rb`:
- Around line 8-21: The get '/' route handler in the DynamicDisksController has
an N+1 query issue because it retrieves all dynamic disks with
Models::DynamicDisk.all and then accesses the associated objects
(disk.deployment and disk.vm.instance) within the map block, causing separate
database queries for each disk. Fix this by using Sequel's eager loading method
on the Models::DynamicDisk query to pre-load the deployment and vm associations
along with the nested instance association before performing the map operation.
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb`:
- Around line 30-42: The code directly mutates the cloud_properties hash by
assigning cloud_properties['name'] = `@disk_name` on line 32, which could cause
unintended side effects if this hash is shared or cached. Instead of modifying
the original hash in place, create a new hash by merging the cloud_properties
with a new hash containing the name key, then pass this merged hash to the
cloud.create_disk call. This preserves the original cloud_properties hash and
prevents potential side effects from mutations.
In
`@src/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rb`:
- Around line 337-356: Expand the validation test coverage for the POST '/'
endpoint to match the comprehensiveness of the existing POST '/provide' endpoint
tests. In addition to the current test contexts for disk_name being nil and
disk_size being 0, add more test cases covering nil values for other fields,
empty string validations, and boundary value tests for disk_size and other
numeric properties. Reference the validation test patterns used in the POST
'/provide' endpoint (lines 84-168) to ensure consistency in what scenarios are
being tested, while leveraging the same safe_property validation helper that
both endpoints use.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fb65caa2-5ab2-45be-a214-c92ffe8705bb
📒 Files selected for processing (8)
src/bosh-director/lib/bosh/director.rbsrc/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rbsrc/bosh-director/spec/support/test_identity_provider.rbsrc/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb
Models::DynamicDisk.all.map will cause N+1 queries when accessing disk.deployment and disk.vm.instance for each row. Eager-load the associated records so listing scales with number of disks. ai-assisted=yes [TNZ-109499] Add CLI commands for Dynamic Disks - Phase 2 (list, create, standalone attach) Signed-off-by: Nishad Mathur <nishad.mathur@broadcom.com>
also allows attach_dynamic_disk to accept and set metadata, since ProvideDynamicDisk technically can override metadata if a disk already exists ai-assisted=yes [TNZ-109499] Add CLI commands for Dynamic Disks - Phase 2 (list, create, standalone attach) Signed-off-by: Nishad Mathur <nishad.mathur@broadcom.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb (2)
29-31: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftSame-VM idempotent return can suppress agent reconciliation after partial failure.
Line 37 persists attachment state before agent notification (Lines 44-45), but Lines 29-31 return early on retries when VM already matches. If an earlier attempt fails after Line 37 (e.g., metadata/agent error), retry skips
add_dynamic_disk, leaving the agent unsynced.Also applies to: 37-45
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb` around lines 29 - 31, The early return check in the unless block at lines 29-31 prevents agent reconciliation on retries when a disk is already attached to the same VM, which can leave the agent out of sync if a previous attempt failed after persisting attachment state at line 37 but before agent notification at lines 44-45. Modify the logic to allow the flow to continue to the add_dynamic_disk call even when the disk is already attached to the same VM, or remove this early return condition entirely so that agent notification (lines 44-45) is always executed regardless of whether the attachment state already exists in the database.
29-37: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftAttachment-state check is outside a disk-scoped lock, allowing cross-VM races.
The
disk_model.vm.nil?check (Lines 29-33) happens before locking, and the lock at Line 36 is VM-scoped. Two jobs targeting different VMs can concurrently pass the nil check for the same disk and race intoattach_disk.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb` around lines 29 - 37, The disk attachment state check (disk_model.vm.nil?) occurs outside any disk-scoped lock, allowing concurrent jobs targeting different VMs to race on the same disk. The existing with_vm_lock protects the attach_disk call but only for a specific VM, not the disk itself. Add a disk-scoped lock that wraps the entire disk attachment logic including the nil check, the condition that verifies if the disk is already attached to the same or different VM, and the attach_disk operation. This ensures that only one job can check and modify the attachment state of a specific disk at any given time, preventing cross-VM races on disk_model.src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb (1)
21-42: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftCPI disk creation is not failure-safe across the DB write boundary.
At Line 34, the CPI create happens before the DB write at Line 35. If persistence fails (race/DB failure), the job can leave orphaned IaaS disks, and retries may create more. The pre-check at Lines 21-22 is not atomic protection.
💡 Suggested hardening pattern
- cloud_properties['name'] = `@disk_name` - disk_cid = cloud.create_disk(`@disk_size`, cloud_properties, vm.cid) - disk_model = Models::DynamicDisk.create( - name: `@disk_name`, - disk_cid: disk_cid, - deployment_id: instance.deployment.id, - size: `@disk_size`, - disk_pool_name: `@disk_pool_name`, - cpi: vm.cpi, - ) + cloud_properties = cloud_properties.merge('name' => `@disk_name`) + disk_cid = nil + begin + disk_cid = cloud.create_disk(`@disk_size`, cloud_properties, vm.cid) + disk_model = Models::DynamicDisk.create( + name: `@disk_name`, + disk_cid: disk_cid, + deployment_id: instance.deployment.id, + size: `@disk_size`, + disk_pool_name: `@disk_pool_name`, + cpi: vm.cpi, + ) + rescue + cloud.delete_disk(disk_cid) unless disk_cid.nil? + raise + end🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb` around lines 21 - 42, The cloud.create_disk call at line 34 happens before the Models::DynamicDisk.create call at line 35, creating a window where the disk exists in the IaaS but not in the database. If the database write fails, orphaned disks are left and retries create duplicates. Wrap both the cloud.create_disk operation and the Models::DynamicDisk.create operation within a single database transaction to ensure atomicity, or consider creating the database record first with a pending state, then updating it after successful cloud disk creation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb`:
- Around line 29-31: The early return check in the unless block at lines 29-31
prevents agent reconciliation on retries when a disk is already attached to the
same VM, which can leave the agent out of sync if a previous attempt failed
after persisting attachment state at line 37 but before agent notification at
lines 44-45. Modify the logic to allow the flow to continue to the
add_dynamic_disk call even when the disk is already attached to the same VM, or
remove this early return condition entirely so that agent notification (lines
44-45) is always executed regardless of whether the attachment state already
exists in the database.
- Around line 29-37: The disk attachment state check (disk_model.vm.nil?) occurs
outside any disk-scoped lock, allowing concurrent jobs targeting different VMs
to race on the same disk. The existing with_vm_lock protects the attach_disk
call but only for a specific VM, not the disk itself. Add a disk-scoped lock
that wraps the entire disk attachment logic including the nil check, the
condition that verifies if the disk is already attached to the same or different
VM, and the attach_disk operation. This ensures that only one job can check and
modify the attachment state of a specific disk at any given time, preventing
cross-VM races on disk_model.
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb`:
- Around line 21-42: The cloud.create_disk call at line 34 happens before the
Models::DynamicDisk.create call at line 35, creating a window where the disk
exists in the IaaS but not in the database. If the database write fails,
orphaned disks are left and retries create duplicates. Wrap both the
cloud.create_disk operation and the Models::DynamicDisk.create operation within
a single database transaction to ensure atomicity, or consider creating the
database record first with a pending state, then updating it after successful
cloud disk creation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2d6b5c6f-fbc5-4b73-a085-3193aa69fb49
📒 Files selected for processing (6)
src/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rbsrc/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb
…on attach - GET /dynamic_disks now returns disk_pool_name and metadata fields - POST /dynamic_disks (create): replace instance_id with deployment_name+az; finds an active VM in the requested AZ to use as CPI hint for placement; stores AZ in availability_zone column at creation time - POST /dynamic_disks/:name/attach: assert disk AZ matches target VM AZ, skip check when disk AZ is nil (disks created before AZ tracking) - Refactor find_disk_cloud_properties helper to accept deployment or instance - 76 unit specs, all passing [TNZ-109499] Co-authored-by: Cursor <cursoragent@cursor.com>
563e69e to
68e7872
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb (1)
29-33: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftSerialize attach operations by disk identity, not only VM identity.
Line 29 pre-checks
disk_model.vm, but Line 42 locks byvm.cid. Two concurrent attaches of the same disk to different VMs can both pass the pre-check and race into CPI attach before Line 43 persists ownership, causing inconsistent disk attachment state.Please guard the full precheck + attach + update sequence with a disk-scoped lock (or equivalent atomic claim on the disk row).
Also applies to: 42-43
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb` around lines 29 - 33, The current locking mechanism in the attach_dynamic_disk operation is scoped by VM identity (at line 42 using vm.cid), but the pre-check validation (lines 29-33) operates on disk identity. This creates a race condition where two concurrent attach operations targeting the same disk to different VMs can both pass the pre-check with different VM locks, then both proceed to call the CPI attach, causing inconsistent disk state. Change the synchronization strategy from VM-scoped locking to disk-scoped locking to ensure that the entire sequence of pre-check validation, CPI attach call, and persistence update is atomically protected by a lock based on the disk identity (disk_model) rather than the target VM identity, preventing concurrent attachments of the same disk.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb`:
- Around line 29-33: The current locking mechanism in the attach_dynamic_disk
operation is scoped by VM identity (at line 42 using vm.cid), but the pre-check
validation (lines 29-33) operates on disk identity. This creates a race
condition where two concurrent attach operations targeting the same disk to
different VMs can both pass the pre-check with different VM locks, then both
proceed to call the CPI attach, causing inconsistent disk state. Change the
synchronization strategy from VM-scoped locking to disk-scoped locking to ensure
that the entire sequence of pre-check validation, CPI attach call, and
persistence update is atomically protected by a lock based on the disk identity
(disk_model) rather than the target VM identity, preventing concurrent
attachments of the same disk.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3597a1b6-f1df-4370-a83d-9e12224d5241
📒 Files selected for processing (7)
src/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/helpers/dynamic_disk_helpers.rbsrc/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb (1)
29-33: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winDo not return early before reconciliation in idempotent attach path.
Line 30 returns immediately when already attached to the same VM, which skips metadata reconciliation and agent sync; this makes retries non-convergent. Also, Line 51 unconditionally calls
add_dynamic_disk, unlike existing dynamic-disk flow that guards whendisk_hintis nil.Suggested fix shape
- unless disk_model.vm.nil? - return "disk `#{`@disk_name`}` is already attached to vm `#{disk_model.vm.cid}`" if disk_model.vm.id == vm.id + already_attached_same_vm = false + unless disk_model.vm.nil? + if disk_model.vm.id == vm.id + already_attached_same_vm = true + else + raise "disk `#{`@disk_name`}` is already attached to a different vm `#{disk_model.vm.cid}`" + end - raise "disk `#{`@disk_name`}` is already attached to a different vm `#{disk_model.vm.cid}`" end - disk_hint = with_vm_lock(vm.cid, timeout: VM_LOCK_TIMEOUT) { cloud.attach_disk(vm.cid, disk_model.disk_cid) } - disk_model.update(vm_id: vm.id, availability_zone: vm.instance.availability_zone, disk_hint: disk_hint) + unless already_attached_same_vm + disk_hint = with_vm_lock(vm.cid, timeout: VM_LOCK_TIMEOUT) { cloud.attach_disk(vm.cid, disk_model.disk_cid) } + disk_model.update(vm_id: vm.id, availability_zone: vm.instance.availability_zone, disk_hint: disk_hint) + end - agent_client.add_dynamic_disk(disk_model.disk_cid, disk_model.disk_hint) + agent_client.add_dynamic_disk(disk_model.disk_cid, disk_model.disk_hint) unless disk_model.disk_hint.nil?Also applies to: 50-51
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb` around lines 29 - 33, The idempotent attach path returns early when a disk is already attached to the same VM, which skips necessary metadata reconciliation and agent sync operations, making retries non-convergent. Remove the early return in the condition block within the unless statement (that returns the "disk is already attached" message) so that the method continues execution to perform reconciliation and sync even when already attached to the correct VM. Additionally, guard the unconditional call to add_dynamic_disk so that it only executes when disk_hint is not nil, matching the pattern used in the existing dynamic-disk flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb`:
- Around line 22-39: The create_dynamic_disk operation has a race condition
between the existence check at `Models::DynamicDisk.find(name: `@disk_name`)` and
the actual disk creation at `cloud.create_disk()`. Two concurrent jobs with the
same disk_name can both pass the existence check and create disks in the IaaS,
leaving orphaned resources. Serialize the create-by-name operation by adding a
locking mechanism (such as a database-level lock or mutex based on `@disk_name`)
around the entire sequence from the existence check through to the
Models::DynamicDisk.create call. This ensures that only one job can execute the
full check-and-create flow for a given disk_name at any time, preventing
duplicate IaaS disk creation.
---
Outside diff comments:
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb`:
- Around line 29-33: The idempotent attach path returns early when a disk is
already attached to the same VM, which skips necessary metadata reconciliation
and agent sync operations, making retries non-convergent. Remove the early
return in the condition block within the unless statement (that returns the
"disk is already attached" message) so that the method continues execution to
perform reconciliation and sync even when already attached to the correct VM.
Additionally, guard the unconditional call to add_dynamic_disk so that it only
executes when disk_hint is not nil, matching the pattern used in the existing
dynamic-disk flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7606deb5-cd63-4db7-a8a0-090b6b9c747f
📒 Files selected for processing (7)
src/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/helpers/dynamic_disk_helpers.rbsrc/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb
| disk_model = Models::DynamicDisk.find(name: @disk_name) | ||
| raise "disk `#{@disk_name}` already exists" unless disk_model.nil? | ||
|
|
||
| deployment = Models::Deployment.find(name: @deployment_name) | ||
| raise "deployment `#{@deployment_name}` not found" if deployment.nil? | ||
|
|
||
| # Find an active VM in the requested AZ to use as the hint for create_disk. | ||
| # The IaaS uses the VM's location to determine which AZ/datastore to place the disk in. | ||
| vm = find_active_vm_in_az(deployment, @az) | ||
| raise "no active VM found in deployment `#{@deployment_name}` in AZ `#{@az}`" if vm.nil? | ||
|
|
||
| cloud = Bosh::Director::CloudFactory.create.get(vm.cpi) | ||
|
|
||
| cloud_properties = find_disk_cloud_properties(deployment, @disk_pool_name) | ||
| cloud_properties['name'] = @disk_name | ||
|
|
||
| disk_cid = cloud.create_disk(@disk_size, cloud_properties, vm.cid) | ||
| disk_model = Models::DynamicDisk.create( |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift
Serialize create-by-name to prevent duplicate/orphan IaaS disks.
Line 22 (find) and Line 38 (cloud.create_disk) are not atomic. Two concurrent jobs with the same @disk_name can both pass the existence check and both create disks in CPI; only one record may persist, leaving orphaned infrastructure resources.
Suggested direction
- disk_model = Models::DynamicDisk.find(name: `@disk_name`)
- raise "disk `#{`@disk_name`}` already exists" unless disk_model.nil?
+ # Serialize by disk name (DB lock or distributed lock) before calling CPI create_disk
+ # and persist reservation/record before external side effects where possible.
+ # If CPI create succeeds but persistence fails, perform compensating cleanup (delete_disk).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb`
around lines 22 - 39, The create_dynamic_disk operation has a race condition
between the existence check at `Models::DynamicDisk.find(name: `@disk_name`)` and
the actual disk creation at `cloud.create_disk()`. Two concurrent jobs with the
same disk_name can both pass the existence check and create disks in the IaaS,
leaving orphaned resources. Serialize the create-by-name operation by adding a
locking mechanism (such as a database-level lock or mutex based on `@disk_name`)
around the entire sequence from the existence check through to the
Models::DynamicDisk.create call. This ensures that only one job can execute the
full check-and-create flow for a given disk_name at any time, preventing
duplicate IaaS disk creation.
- Avoid mutating the cloud_properties hash returned by find_disk_cloud_properties; use merge instead - Clean up orphaned IaaS disk with cloud.delete_disk if the DB write fails after cloud.create_disk - Fix attach idempotency: remove early return on same-VM so agent reconciliation (add_dynamic_disk) always runs on retry; guard the agent call when disk_hint is nil - Expand POST / controller validation tests to match the coverage of POST /provide ai-assisted=yes [TNZ-109499] Add CLI commands for Dynamic Disks - Phase 2 (list, create, standalone attach) Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rb (1)
93-112: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd an idempotent retry test when
disk_hintis nil.Current same-VM idempotency coverage only exercises a non-nil
disk_hint. Add adisk_hint: nilcase so behavior is explicit and protected against regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rb` around lines 93 - 112, The current test for the idempotent retry scenario only covers the case where disk_hint has a non-nil value. Add an additional test context after the existing 'when disk is already attached to the same VM' context that specifically exercises the idempotent retry behavior when disk_hint is nil. This new context should use the same FactoryBot.create pattern to create a disk but with disk_hint explicitly set to nil, and verify that the same behavior occurs: the cloud attach_disk method is not called, but the agent_client receives the add_dynamic_disk call with nil as the disk_hint parameter.src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb (2)
29-49: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy liftProtect attachment-state checks with a disk-level lock/transaction.
Line 30-35 (
disk_model.vmcheck) and Line 47-48 (attach_disk+ update) are not atomic per disk. VM-level locking only serializes per target VM, so concurrent jobs can attach the same disk to different VMs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb` around lines 29 - 49, The disk attachment check and update operations are not protected by a disk-level lock, creating a race condition where concurrent jobs can attach the same disk to different VMs. The `disk_model.vm` nil check and the `attach_disk` + `disk_model.update` operations must be wrapped in a disk-level lock or transaction to ensure atomicity. Currently only a VM-level lock (with_vm_lock) is used which only serializes per target VM, not per disk. Add a disk-level locking mechanism (similar to with_vm_lock but for the disk) that encompasses both the initial disk state check and the attach/update operations to prevent concurrent attachment of the same disk to multiple VMs.
56-57: 🩺 Stability & Availability | 🟠 MajorAdd nil check before notifying agent in idempotent retry path.
Line 57 passes
disk_model.disk_hintto the agent without validation. If the disk is already attached to the same VM (idempotent retry), the code skips lines 46-49 and uses the storeddisk_hintdirectly from the database. Sincedisk_hint_jsonis nullable in the schema and the model returnsnilwhen empty, this can result inadd_dynamic_disk(disk_cid, nil)being sent to the agent.This inconsistency exists elsewhere in the codebase:
provide_dynamic_disk.rb(lines 59–61) explicitly guards withunless disk_model.disk_hint.nil?before callingadd_dynamic_disk, andmount_info_mismatch.rb(line 74) guards withif disk_hint. Add a similar guard in the idempotent retry path to ensure compatibility with legacy or partially-updated disk records that may have a nulldisk_hint_jsonvalue.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb` around lines 56 - 57, The call to agent_client.add_dynamic_disk at line 57 in the attach_dynamic_disk method passes disk_model.disk_hint without validating it first. Since disk_hint_json is nullable in the database schema, disk_model.disk_hint can be nil, which would result in sending add_dynamic_disk(disk_cid, nil) to the agent. Add a guard condition around the agent_client.add_dynamic_disk call to check that disk_model.disk_hint is not nil before invoking it, similar to the pattern used in provide_dynamic_disk.rb (lines 59-61) and mount_info_mismatch.rb (line 74). Only call add_dynamic_disk when disk_model.disk_hint has a value.
♻️ Duplicate comments (1)
src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb (1)
21-52: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftSerialize create-by-name around the CPI call path.
Line 22 and Line 38 are still a check-then-act race: concurrent jobs for the same
@disk_namecan both pass existence check and both callcloud.create_diskbefore one DB write fails. Even with Line 49 cleanup, this still causes non-atomic external side effects and brittle retry behavior.Suggested direction
def perform - disk_model = Models::DynamicDisk.find(name: `@disk_name`) - raise "disk `#{`@disk_name`}` already exists" unless disk_model.nil? + with_lock("dynamic-disk-create:#{`@disk_name`}") do + disk_model = Models::DynamicDisk.find(name: `@disk_name`) + raise "disk `#{`@disk_name`}` already exists" unless disk_model.nil? - deployment = Models::Deployment.find(name: `@deployment_name`) - ... - disk_cid = cloud.create_disk(`@disk_size`, cloud_properties, vm.cid) - begin - disk_model = Models::DynamicDisk.create(...) - rescue - cloud.delete_disk(disk_cid) - raise + deployment = Models::Deployment.find(name: `@deployment_name`) + ... + disk_cid = cloud.create_disk(`@disk_size`, cloud_properties, vm.cid) + begin + disk_model = Models::DynamicDisk.create(...) + rescue + cloud.delete_disk(disk_cid) + raise + end end end🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb` around lines 21 - 52, The perform method has a race condition where concurrent jobs can both pass the disk existence check at line 22-23 and proceed to call cloud.create_disk at line 37, causing duplicate disk creation attempts in the cloud. Serialize access around the critical section from the disk existence check through the cloud.create_disk call by implementing a lock mechanism (such as a database-level lock or mutex) based on the disk_name to ensure only one job can execute this sequence at a time and prevent concurrent calls to cloud.create_disk for the same disk name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb`:
- Around line 29-49: The disk attachment check and update operations are not
protected by a disk-level lock, creating a race condition where concurrent jobs
can attach the same disk to different VMs. The `disk_model.vm` nil check and the
`attach_disk` + `disk_model.update` operations must be wrapped in a disk-level
lock or transaction to ensure atomicity. Currently only a VM-level lock
(with_vm_lock) is used which only serializes per target VM, not per disk. Add a
disk-level locking mechanism (similar to with_vm_lock but for the disk) that
encompasses both the initial disk state check and the attach/update operations
to prevent concurrent attachment of the same disk to multiple VMs.
- Around line 56-57: The call to agent_client.add_dynamic_disk at line 57 in the
attach_dynamic_disk method passes disk_model.disk_hint without validating it
first. Since disk_hint_json is nullable in the database schema,
disk_model.disk_hint can be nil, which would result in sending
add_dynamic_disk(disk_cid, nil) to the agent. Add a guard condition around the
agent_client.add_dynamic_disk call to check that disk_model.disk_hint is not nil
before invoking it, similar to the pattern used in provide_dynamic_disk.rb
(lines 59-61) and mount_info_mismatch.rb (line 74). Only call add_dynamic_disk
when disk_model.disk_hint has a value.
In
`@src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rb`:
- Around line 93-112: The current test for the idempotent retry scenario only
covers the case where disk_hint has a non-nil value. Add an additional test
context after the existing 'when disk is already attached to the same VM'
context that specifically exercises the idempotent retry behavior when disk_hint
is nil. This new context should use the same FactoryBot.create pattern to create
a disk but with disk_hint explicitly set to nil, and verify that the same
behavior occurs: the cloud attach_disk method is not called, but the
agent_client receives the add_dynamic_disk call with nil as the disk_hint
parameter.
---
Duplicate comments:
In
`@src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb`:
- Around line 21-52: The perform method has a race condition where concurrent
jobs can both pass the disk existence check at line 22-23 and proceed to call
cloud.create_disk at line 37, causing duplicate disk creation attempts in the
cloud. Serialize access around the critical section from the disk existence
check through the cloud.create_disk call by implementing a lock mechanism (such
as a database-level lock or mutex) based on the disk_name to ensure only one job
can execute this sequence at a time and prevent concurrent calls to
cloud.create_disk for the same disk name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: de11dbb8-dd69-4155-9f0f-19da14c419f2
📒 Files selected for processing (5)
src/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rbsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rbsrc/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rb
Summary
Extends the BOSH Director's dynamic disk API with three new endpoints to support Phase 2 CLI commands (TNZ-109499):
GET /dynamic_disks— list all dynamic disks (bosh.adminorbosh.dynamic-disks.listscope)POST /dynamic_disks— create a disk in the IaaS without attaching it (bosh.dynamic-disks.createscope)POST /dynamic_disks/:name/attach— attach an existing named disk to a VM (bosh.dynamic-disks.attachscope)Changes
New files
src/bosh-director/lib/bosh/director/jobs/dynamic_disks/create_dynamic_disk.rb— background job that creates a disk via CPI without attachingsrc/bosh-director/lib/bosh/director/jobs/dynamic_disks/attach_dynamic_disk.rb— background job that attaches a named disk to a VM (idempotent)src/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/create_dynamic_disk_spec.rbsrc/bosh-director/spec/unit/bosh/director/jobs/dynamic_disks/attach_dynamic_disk_spec.rbModified files
src/bosh-director/lib/bosh/director/api/controllers/dynamic_disks_controller.rb— added 3 new routessrc/bosh-director/lib/bosh/director.rb— required new job classessrc/bosh-director/spec/support/test_identity_provider.rb— addeddynamic-disks-listeranddynamic-disks-attachertest userssrc/bosh-director/spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rb— full coverage for all 6 endpointsTest plan
bundle exec rspec spec/unit/bosh/director/api/controllers/dynamic_disks_controller_spec.rb spec/unit/bosh/director/jobs/dynamic_disks/— 70 examples, 0 failuresRelated: TNZ-109499, TNZ-99509