Skip to content

Conversation

@fm3
Copy link
Member

@fm3 fm3 commented Oct 27, 2025

  • When scanning datasources from directories, always add resolved mag paths immediately
  • Introduces realpaths for attachments
  • Report all realpaths that could be determined, scale down logging for those that couldn’t.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • start wk with some datasets in the binaryData dir, watch backend logging, should show realpath scan failures (provoke some by deleting mags or attachments that are referenced in some datasource-properties.jsons)
  • All realpaths that could be determined should be added to the database

TODOs:

  • immediate paths for datasources
  • simplify layerpath logic
  • attachment realpath column
  • don’t fail for datasource on single realpath failure
  • log failures
  • scan also attachments
  • evolution
  • adapt s3-delete code to also consider realpaths of attachments

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Oct 27, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 27, 2025

📝 Walkthrough

Walkthrough

Renames datastore endpoint to updateRealPaths, replaces MagPathInfo with RealPathInfo across clients/controllers/services, adds realPath and hasLocalData to attachments with DAO/migration support, updates mag-to-literal calls to require explicit allowScalar, and reworks scanning to collect real-paths with failure aggregation.

Changes

Cohort / File(s) Change Summary
Controller & Routes
app/controllers/WKRemoteDataStoreController.scala, conf/webknossos.latest.routes
Endpoint renamed updatePathsupdateRealPaths; controller delegates to datasetService.updateRealPaths; route mapping updated.
Dataset model & DAOs
app/models/dataset/Dataset.scala, app/models/dataset/DatasetService.scala, app/models/dataset/DatasetUploadToPathsService.scala
Replaced MagPathInfo with RealPathInfo; added updateAttachmentRealPathsForDataset; DatasetService now accepts datasetLayerAttachmentsDAO and updates mags and attachments via updateRealPathsForDataSource; minor mag-literal call adjustments.
Datastore client DTOs
webknossos-datastore/app/.../services/DSRemoteWebknossosClient.scala
New RealPathInfo(path, realPath, hasLocalData); DataSourcePathInfo now contains magPathInfos and attachmentPathInfos as Seq[RealPathInfo]; JSON formats and reportRealPaths signature updated.
Datastore scanning & services
webknossos-datastore/app/.../services/DataSourceService.scala, webknossos-datastore/app/.../controllers/DSLegacyApiController.scala, webknossos-datastore/app/.../controllers/DataSourceController.scala
Added scanRealPaths/scanRealPathsForDataSource, getMagPathInfo, getAttachmentPathInfo; dataSourceFromDir gains resolveMagPaths: Boolean; call sites updated to pass resolver flag; scanning/reporting consolidated with failure aggregation.
Storage resolution & callers
webknossos-datastore/app/.../storage/DataVaultService.scala, webknossos-datastore/app/.../services/uploading/UploadService.scala
resolveMagPath signature simplified (removed layerName), relative resolution always against dataset dir; callers updated; upload flows pass resolveMagPaths=false; path validation surfaces disallowed-paths message.
Tracingstore adjustments
webknossos-tracingstore/app/.../TSRemoteDatastoreClient.scala, webknossos-tracingstore/app/.../VolumeSegmentIndexBuffer.scala
mag serialization updated to toMagLiteral(allowScalar = false) where used.
Utility API
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala
toMagLiteral now requires explicit allowScalar: Boolean (default removed).
Database schema & migrations
conf/evolutions/145-attachment-realpaths.sql, conf/evolutions/reversions/145-attachment-realpaths.sql, tools/postgres/schema.sql
Added realPath TEXT and hasLocalData BOOLEAN NOT NULL DEFAULT FALSE to webknossos.dataset_layer_attachments; bumped schema 144→145; migrations include precondition and revert.
Upload & validation
webknossos-datastore/app/.../services/uploading/UploadService.scala, conf/messages
dataSourceFromDir call sites updated with resolveMagPaths=false; upload path validation now surfaces dataset.upload.disallowedPaths message.
Changelog
unreleased_changes/9019.md
Added entries for attachment realpaths, improved realpath scan resilience, and migration reference.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • Migration SQL and DAO updateAttachmentRealPathsForDataset correctness and transactional safety.
    • DataSourceService refactor: scanRealPaths/aggregation and interplay with reporting/error handling.
    • DataVaultService.resolveMagPath change (relative resolution) and all call-site updates.
    • Vec3Int.toMagLiteral signature change — ensure all callers supply explicit allowScalar.
    • DSRemoteWebknossosClient DTO/JSON changes — cross-service compatibility.

Possibly related PRs

Suggested reviewers

  • MichaelBuessemeyer
  • normanrz

Poem

"I hop through code with carrot cheer,
I map the realpaths far and near.
When one mag stumbles, I still keep score,
Attachments find the paths they bore. 🐇✨"

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: making realpath scans more resilient to handle failures without blocking entire datasources.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering the three main objectives of resilience, attachment support, and improved logging.
Linked Issues check ✅ Passed All coding objectives from issue #9018 are fully addressed: single mag failures no longer block entire datasets [DatasetService], base paths determined early in datasource.fromDisk [DataSourceService], attachment realpaths support added [Dataset.scala, migrations], and realpath reporting integrated into original scan [DataSourceService].
Out of Scope Changes check ✅ Passed All changes directly support the core objectives of making realpath scans resilient and adding attachment support; minor refactorings like Vec3Int.toMagLiteral parameter changes and route renaming are necessary supporting changes for the main functionality.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch realpath-scan

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27295e4 and 9e3ddad.

📒 Files selected for processing (4)
  • conf/messages (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • since (68-68)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
  • reportDataSources (107-114)
  • reportRealPaths (114-120)
  • nonEmpty (43-43)
  • DataSourcePathInfo (40-44)
  • DataSourcePathInfo (46-48)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • resolveMagPath (80-99)
  • resolveMagPath (99-104)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (12)
  • isRemote (113-113)
  • isRemote (178-178)
  • toLocalPath (111-111)
  • toLocalPath (176-176)
  • startsWith (138-142)
  • startsWith (198-205)
  • UPath (54-96)
  • fromLocalPath (80-80)
  • isAbsolute (99-99)
  • isAbsolute (147-147)
  • toString (106-109)
  • toString (174-174)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (10)
conf/webknossos.latest.routes (1)

128-128: LGTM! Route rename improves clarity.

The rename from paths to realpaths better reflects the endpoint's specific purpose. The lowercase realpaths follows the existing convention in this file (e.g., datasources, tracingstores, sharingToken), making it more consistent than the previously suggested realPaths with camelCase.

conf/messages (1)

124-124: LGTM! Clear error message for path validation.

The new error message is well-written and consistent with the existing messages in the file. It appropriately communicates the upload restriction related to disallowed paths.

Note: This message is very similar to dataSource.add.pathsNotAllowed at line 139, which is expected since they handle related validation scenarios (upload vs. add operations).

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (8)

8-8: LGTM!

The Instant import is correctly added to support the timing measurements in checkInbox and logging.


71-100: LGTM!

The refactored checkInbox correctly separates datasource discovery from realpath scanning. The failure aggregation approach ensures that individual realpath failures don't block reporting other datasources or their realpaths.


102-107: LGTM!

The failure aggregation logic correctly collects realpath failures without blocking the processing of other datasources.


109-129: LGTM!

The method correctly implements resilient realpath scanning by collecting both successful path resolutions and failures separately, ensuring that partial failures don't prevent reporting successful realpaths.


162-196: LGTM!

The enhanced logging correctly integrates timing information and realpath scan results. The verbose failure logging (lines 192-195) provides good operational visibility without cluttering normal logs.

Note: The variable name byOrganization on line 178 is clear and descriptive—no change needed despite the previous comment suggestion.


220-231: LGTM!

Setting resolveMagPaths = true when scanning organization directories correctly implements the PR objective of resolving mag paths immediately during the initial datasource scan.


233-267: LGTM!

The updated signature and conditional mag path resolution correctly implement the PR objective of controlling when mag paths are resolved during datasource loading. The transformation chain (attachments → optional mag paths → id) is clear and correct.


269-278: LGTM!

The addMagPaths method correctly populates explicit mag paths using resolveMagPath. The use of layer.mapped with newMags = Some(...) ensures the resolved paths are properly integrated into the datasource structure.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Base automatically changed from refactor-datavault-service to master October 29, 2025 12:01
@fm3 fm3 changed the title WIP: Make Realpath Scans More Resilient Make Realpath Scans More Resilient Oct 30, 2025
@fm3 fm3 marked this pull request as ready for review October 30, 2025 12:19
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)

1262-1283: Deduplicate attachments by real path as well.

We now record realPath for attachments, but the storage query still partitions solely by the original path, so symlinked or otherwise relocated files with differing logical paths will continue to be counted multiple times. Mirror the mags query and fall back to path only when realPath is unavailable.

-              ROW_NUMBER() OVER (
-                PARTITION BY att.path
+              ROW_NUMBER() OVER (
+                PARTITION BY COALESCE(att.realPath, att.path)
                 ORDER BY ds.created ASC
               ) AS rn
@@
-          WHERE ranked.rn = 1
+          WHERE ranked.rn = 1
             AND ranked._organization = $organizationId
             AND ranked._dataStore = $dataStoreId
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e6cca4 and 5c74222.

📒 Files selected for processing (16)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (1 hunks)
  • app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)
  • conf/evolutions/145-attachment-realpaths.sql (1 hunks)
  • conf/evolutions/reversions/145-attachment-realpaths.sql (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • tools/postgres/schema.sql (2 hunks)
  • unreleased_changes/9019.md (1 hunks)
  • util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2024-11-22T17:18:04.217Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
  • app/models/dataset/DatasetService.scala
  • app/models/dataset/DatasetUploadToPathsService.scala
  • app/models/dataset/Dataset.scala
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala
  • app/models/dataset/WKRemoteDataStoreClient.scala
📚 Learning: 2024-11-22T17:19:07.947Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.

Applied to files:

  • app/models/dataset/Dataset.scala
  • tools/postgres/schema.sql
📚 Learning: 2025-05-07T06:17:32.810Z
Learnt from: philippotto
PR: scalableminds/webknossos#8602
File: frontend/javascripts/oxalis/model/volumetracing/volume_annotation_sampling.ts:365-366
Timestamp: 2025-05-07T06:17:32.810Z
Learning: The parameter in applyVoxelMap was renamed from `sliceCount` to `sliceOffset` to better reflect its purpose, but this doesn't affect existing call sites since JavaScript/TypeScript function calls are position-based.

Applied to files:

  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T14:15:05.259Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.

Applied to files:

  • tools/postgres/schema.sql
🧬 Code graph analysis (9)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
  • updateRealPaths (491-503)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (1)
  • additionalCoordinatesKeyPart (65-76)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
  • findOneByDataSourceId (430-433)
  • dataSourceId (93-93)
  • updateMagRealPathsForDataset (828-845)
  • updateAttachmentRealPathsForDataset (1174-1191)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
  • shiftBox (312-312)
  • successful (53-56)
  • failure (58-62)
app/controllers/WKRemoteDataStoreController.scala (1)
  • updateRealPaths (196-208)
app/models/dataset/DatasetUploadToPathsService.scala (1)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12)
  • addQueryParam (30-33)
  • addQueryParam (33-36)
  • addQueryParam (36-39)
  • addQueryParam (39-42)
  • addQueryParam (42-45)
  • addQueryParam (45-50)
  • addQueryParam (50-53)
  • addQueryParam (53-56)
  • addQueryParam (56-59)
  • addQueryParam (59-63)
  • addQueryParam (63-68)
  • addQueryParam (68-71)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)
  • mag (25-25)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12)
  • addQueryParam (30-33)
  • addQueryParam (33-36)
  • addQueryParam (36-39)
  • addQueryParam (39-42)
  • addQueryParam (42-45)
  • addQueryParam (45-50)
  • addQueryParam (50-53)
  • addQueryParam (53-56)
  • addQueryParam (56-59)
  • addQueryParam (59-63)
  • addQueryParam (63-68)
  • addQueryParam (68-71)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • since (68-68)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
  • reportDataSources (107-114)
  • reportRealPaths (114-120)
  • nonEmpty (43-43)
  • DataSourcePathInfo (40-44)
  • DataSourcePathInfo (46-48)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayerAttachments.scala (1)
  • allAttachments (19-19)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • resolveMagPath (79-98)
  • resolveMagPath (98-103)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: backend-tests
🔇 Additional comments (10)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1)

70-70: LGTM! Explicit parameter improves clarity.

The addition of the explicit allowScalar = false parameter aligns with the codebase-wide refactoring to make toMagLiteral calls more explicit. This ensures the mag is always serialized in full "x-y-z" format for the remote datastore query, avoiding ambiguity.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1)

29-29: LGTM! Explicit allowScalar = false ensures consistent key format.

The explicit parameter correctly adapts to the updated toMagLiteral API and ensures segment index keys always use the full "x-y-z" format, which is appropriate for consistent key construction and lookups.

util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)

40-40: API change properly implemented across codebase.

Verification confirms that all 32+ call sites of toMagLiteral() throughout the codebase have been updated with explicit allowScalar boolean parameters. The breaking API change was implemented comprehensively with no incomplete migrations detected.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)

79-96: LGTM: Signature simplification and clearer mag literal handling.

The removal of the unused layerName parameter and the simplified relative path resolution logic improve maintainability. The explicit allowScalar = false parameter on line 90 makes it clear that the vec3 notation is intended for the fallback case.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

40-54: LGTM: Clean data model for real-path tracking.

The introduction of RealPathInfo with separate path and realPath fields provides clear semantics for tracking both the reference path and the resolved real path. The hasLocalData flag is useful for distinguishing between local and remote data. The extension of DataSourcePathInfo to include attachmentPathInfos successfully aligns with the PR objective to support attachment real paths.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)

70-99: LGTM: Resilient real-path scanning with proper separation of concerns.

The refactored flow correctly separates datasource discovery from real-path resolution, allowing failures in path resolution to be collected without preventing successful datasources from being reported. This achieves the PR objective of making the scan more resilient.


101-128: LGTM: Comprehensive path scanning with proper failure collection.

The implementation correctly processes both mag and attachment paths for each datasource, collecting failures without propagating them. The handling of optional attachments (dataLayer.attachments.map(_.allAttachments).getOrElse(Seq.empty)) is correct, and filtering non-empty path info prevents unnecessary reporting.


130-146: LGTM: Safe real-path resolution with proper local data detection.

The implementation correctly handles both remote and local paths. For local paths, toRealPath() is wrapped in tryo to safely handle failures (e.g., broken symlinks, non-existent paths). The hasLocalData check using startsWith correctly identifies whether the resolved path is within the dataset directory.


161-195: LGTM: Enhanced logging provides good observability.

The updated logging includes timing information, real-path scan statistics, and detailed failure reporting. The verbose mode provides helpful per-team breakdowns, and failure details are logged when available, which aids troubleshooting.


265-274: LGTM: Early mag path population supports downstream processing.

The addMagPaths method correctly populates mag paths during datasource loading, ensuring they're available for subsequent real-path scanning. Using layer.mapped with newMags is the appropriate pattern for creating modified layers, and the call to resolveMagPath properly resolves each mag's path.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)

244-250: Consider refactoring for improved readability.

The conditional assignment pattern is correct but the formatting makes it harder to follow. Consider this alternative structure:

-            val dataSourceWithMagPaths =
-              if (resolveMagPaths)
-                dataSourceWithAttachments.copy(
-                  dataLayers = addMagPaths(path, dataSourceWithAttachments)
-                )
-              else dataSourceWithAttachments
-            dataSourceWithMagPaths.copy(id)
+            val dataSourceWithIdAndAttachments = dataSourceWithAttachments.copy(id)
+            if (resolveMagPaths)
+              dataSourceWithIdAndAttachments.copy(
+                dataLayers = addMagPaths(path, dataSourceWithAttachments)
+              )
+            else dataSourceWithIdAndAttachments

Or more concisely:

-            val dataSourceWithMagPaths =
-              if (resolveMagPaths)
-                dataSourceWithAttachments.copy(
-                  dataLayers = addMagPaths(path, dataSourceWithAttachments)
-                )
-              else dataSourceWithAttachments
-            dataSourceWithMagPaths.copy(id)
+            val updatedDataLayers = if (resolveMagPaths) addMagPaths(path, dataSourceWithAttachments) else dataSourceWithAttachments.dataLayers
+            dataSourceWithAttachments.copy(id = id, dataLayers = updatedDataLayers)

268-277: Consider adding documentation for the method's purpose.

While the implementation is straightforward, adding a brief comment explaining when and why addMagPaths is called would help future maintainers. For example:

/**
 * Eagerly resolves and populates mag paths for all layers during datasource scanning.
 * This supports early path determination as part of the initial scan rather than as a separate request.
 */
private def addMagPaths(dataSourcePath: Path, dataSource: UsableDataSource): List[StaticLayer] =
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 961de3f and 27c12bc.

📒 Files selected for processing (2)
  • unreleased_changes/9019.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • unreleased_changes/9019.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • since (68-68)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
  • reportDataSources (107-114)
  • reportRealPaths (114-120)
  • nonEmpty (43-43)
  • DataSourcePathInfo (40-44)
  • DataSourcePathInfo (46-48)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • resolveMagPath (79-98)
  • resolveMagPath (98-103)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)

71-71: Excellent logging and timing improvements.

The addition of detailed timing, realpath scan summaries, and verbose failure reporting significantly improves observability. The per-team breakdown and conditional verbose logging strike a good balance between information and noise.

Also applies to: 81-91, 161-195


101-128: Resilient scanning architecture is well-implemented.

The separation of successful path resolutions from failures (lines 101-106) and the per-datasource scanning (lines 108-128) effectively achieve the PR objective: a single failing mag or attachment no longer prevents storing other paths for the datasource.


130-146: Mag path resolution is correct and handles remote/local cases appropriately.

The method properly distinguishes between remote paths (returned as-is) and local paths (resolved with toRealPath() to handle symlinks). The hasLocalData flag correctly identifies whether the resolved path is within the dataset directory.


148-159: Attachment path handling is sound with defensive absolute path check.

The explicit check on Line 153 that attachment.path.isAbsolute addresses the previous review concern about ensuring attachment paths are absolute before calling toRealPath(). This defensive check prevents incorrect resolution against the current working directory if a relative path were to slip through the upstream resolution logic.

Based on learnings

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

489-495: Consider more informative error message for disallowed paths.

The error message "dataset.upload.disallowedPaths" is generic and doesn't indicate which specific paths failed validation. While this may be intentional for security reasons, consider whether logging the problematic paths (server-side only, not in the user-facing error) would aid debugging without compromising security.

Example enhancement (if security permits):

     for {
-      _ <- Fox.fromBool(usableDataSource.allExplicitPaths.forall(dataVaultService.pathIsAllowedToAddDirectly)) ?~> "dataset.upload.disallowedPaths"
+      disallowedPaths = usableDataSource.allExplicitPaths.filterNot(dataVaultService.pathIsAllowedToAddDirectly)
+      _ = if (disallowedPaths.nonEmpty) logger.warn(s"Upload rejected due to disallowed paths: $disallowedPaths")
+      _ <- Fox.fromBool(disallowedPaths.isEmpty) ?~> "dataset.upload.disallowedPaths"
     } yield ()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27c12bc and 7412538.

📒 Files selected for processing (9)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • app/models/dataset/WKRemoteDataStoreClient.scala (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (8 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/models/dataset/WKRemoteDataStoreClient.scala
  • app/controllers/WKRemoteDataStoreController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • conf/webknossos.latest.routes
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/DatasetService.scala
  • app/models/dataset/Dataset.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.

Applied to files:

  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
  • jsonFormat (82-94)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
  • findOneByDataSourceId (430-433)
  • dataSourceId (93-93)
  • updateMagRealPathsForDataset (828-845)
  • updateAttachmentRealPathsForDataset (1183-1200)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (28-228)
  • Fox (230-303)
  • successful (51-54)
  • serialCombined (93-97)
  • serialCombined (97-109)
app/controllers/WKRemoteDataStoreController.scala (1)
  • updateRealPaths (193-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
  • dataSourceFromDir (233-267)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
  • Fox (28-228)
  • Fox (230-303)
  • fromBool (30-34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • allExplicitPaths (44-51)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
  • allExplicitPaths (97-97)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)
  • pathIsAllowedToAddDirectly (135-141)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
  • Instant (14-45)
  • Instant (47-103)
  • now (48-48)
  • since (68-68)
  • toString (15-15)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
  • reportDataSources (107-114)
  • reportRealPaths (114-120)
  • nonEmpty (43-43)
  • DataSourcePathInfo (40-44)
  • DataSourcePathInfo (46-48)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • resolveMagPath (79-98)
  • resolveMagPath (98-103)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (12)
  • isRemote (113-113)
  • isRemote (178-178)
  • toLocalPath (111-111)
  • toLocalPath (176-176)
  • startsWith (138-142)
  • startsWith (198-205)
  • UPath (54-96)
  • fromLocalPath (80-80)
  • isAbsolute (99-99)
  • isAbsolute (147-147)
  • toString (106-109)
  • toString (174-174)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
  • formatDuration (30-82)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

432-434: LGTM: Deferring mag path resolution as intended.

The addition of resolveMagPaths = false aligns with the PR objectives to make realpath scanning more resilient by deferring path resolution to a later, separate stage rather than during initial datasource loading.


560-560: Correct: Explicit allowScalar parameter as required.

The call to mag.toMagLiteral(true) with an explicit boolean parameter aligns with the PR requirement for explicit Vec3Int.toMagLiteral(allowScalar) usage. The true value appropriately allows scalar representation for path resolution.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)

1144-1149: Consider removing unused columns from SELECT.

The query selects realpath and hasLocalData columns, but parseRow (lines 1118-1122) doesn't extract or use them. If these columns aren't needed for LayerAttachment construction, consider removing them from the SELECT clause for clarity and minor performance improvement.

Apply this diff if the columns are not needed:

       rows <- run(
-        q"""SELECT _dataset, layerName, name, path, realpath, hasLocalData, type, dataFormat, uploadToPathIsPending
+        q"""SELECT _dataset, layerName, name, path, type, dataFormat, uploadToPathIsPending
                FROM webknossos.dataset_layer_attachments
                WHERE _dataset = $datasetId
                AND layerName = $layerName
                AND NOT uploadToPathIsPending""".as[DatasetLayerAttachmentsRow])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7412538 and dcb23d6.

📒 Files selected for processing (2)
  • app/models/dataset/Dataset.scala (12 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/models/dataset/DatasetUploadToPathsService.scala
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/Dataset.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.

Applied to files:

  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (1)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromString (59-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (6)
app/models/dataset/Dataset.scala (6)

34-34: LGTM: Import updated to reflect renamed type.

The import change from MagPathInfo to RealPathInfo aligns with the API evolution in the datastore module.


829-844: LGTM: Method refactored to update real paths by logical path.

The renamed method now updates realPath and hasLocalData for mags by matching on path within a dataset. The Serializable transaction isolation with retry logic is appropriate for preventing concurrent update conflicts.


1186-1201: LGTM: Attachment real path update mirrors mag logic.

The new method consistently applies the same transaction isolation and retry semantics as the mag equivalent, maintaining symmetry across the codebase.


785-814: LGTM: Deduplication logic correctly handles real paths.

The use of COALESCE(mag.realPath, mag.path) for partitioning ensures that physically shared mags are counted only once, regardless of whether realPath is populated. The window function with ROW_NUMBER and filtering on rn = 1 is the correct approach for this deduplication.


1270-1298: LGTM: Attachment deduplication mirrors mag logic.

The partitioning strategy using COALESCE(att.realPath, att.path) is consistent with the mag implementation and correctly handles attachment deduplication.


872-891: LGTM: Path uniqueness checks account for real paths.

Both methods correctly identify paths used exclusively by a single dataset by checking both logical path equality and real path matches. The realpath IS NOT NULL guard prevents false positives from NULL comparisons.

Also applies to: 1301-1319

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bf54ca and 9f51abc.

📒 Files selected for processing (3)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/Dataset.scala (12 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/controllers/WKRemoteDataStoreController.scala
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.

Applied to files:

  • app/models/dataset/DatasetService.scala
  • app/models/dataset/Dataset.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.

Applied to files:

  • app/models/dataset/Dataset.scala
🧬 Code graph analysis (2)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
  • findOneByDataSourceId (440-443)
  • dataSourceId (94-94)
  • updateMagRealPathsForDataset (839-856)
  • updateAttachmentRealPathsForDataset (1196-1213)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
  • Fox (28-228)
  • Fox (230-303)
  • successful (51-54)
  • serialCombined (93-97)
  • serialCombined (97-109)
app/controllers/WKRemoteDataStoreController.scala (1)
  • updateRealPaths (193-205)
app/models/dataset/Dataset.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • RealPathInfo (50-50)
  • RealPathInfo (52-54)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromString (59-59)
🔇 Additional comments (10)
app/models/dataset/DatasetService.scala (3)

47-47: LGTM!

The addition of datasetLayerAttachmentsDAO to the constructor is necessary to support attachment real-path updates alongside mag real-path updates.


479-494: Good refactoring: improved encapsulation and added attachment support.

The method is now private and accepts a structured DataSourcePathInfo parameter, which is cleaner. The addition of attachment real-path updates (line 485) alongside mag updates (line 484) aligns with the PR objectives. Error handling for virtual datasets and non-existent datasets is preserved correctly.


496-499: LGTM!

The delegation to updateRealPathsForDataSource via Fox.serialCombined is correct. Sequential processing is appropriate for database updates with transaction isolation.

app/models/dataset/Dataset.scala (7)

34-34: LGTM!

The import of RealPathInfo supports the updated real-path APIs for both mags and attachments.


807-807: LGTM!

The COALESCE(mag.realPath, mag.path) deduplication logic correctly prioritizes physical paths (realPath) over logical paths when available, preventing duplicate storage counts for shared data. The same pattern is consistently applied to attachments at line 1291.


838-854: LGTM!

The method rename to updateMagRealPathsForDataset and parameter change to Seq[RealPathInfo] align with the new real-path model. The UPDATE logic correctly sets realPath and hasLocalData for records matching the logical path.


1195-1211: LGTM!

The new updateAttachmentRealPathsForDataset method correctly mirrors the mags variant, ensuring consistent real-path updates for attachments. The transaction isolation and retry logic are appropriately maintained.


1154-1159: LGTM!

The SELECT statement now retrieves realpath and hasLocalData fields, supporting the database schema evolution. Note that these fields are used for internal storage tracking and deduplication (as seen in lines 1291, 1323, 1341) but are not exposed in the LayerAttachment object returned by parseRow (lines 1128-1132), which is correct.


1320-1325: LGTM!

The updated logic correctly considers both logical path matches and real-path matches when determining if an attachment is used by other datasets. This prevents accidental deletion of shared physical data that's referenced via different logical paths. The NULL check on realpath is appropriate.


1341-1342: LGTM!

The query now correctly checks realpath instead of path to determine if attachments reside in a specific physical directory. This ensures accurate detection of datasets affected by directory deletion, consistent with the mags logic at lines 913-914.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looks good 🗡️
Left one question and 3 mini suggestions :)

Testing went well 👍

override def toString: String = s"($x, $y, $z)"

def toMagLiteral(allowScalar: Boolean = false): String =
def toMagLiteral(allowScalar: Boolean): String =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 in some occurrences toMagLiteral you already changed the parameter and now made it mandatory to avoid bugs, correct?
If yes, what was the actual bug fixed here? I cannot find it in the pr description 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly a bug, but there was one usage where I’d rather have allowScalar=true where it wasn’t passed, so the default false was used. This happened in the DatasetUploadToPathsService. The new paths generated by wk for the uploads should have short-form mags for consistency.

And yes, correct, I decided to remove the default to make the code more explicit and force callers to be mindful of which behavior is desired.

@fm3 fm3 merged commit 6e800e0 into master Nov 11, 2025
5 checks passed
@fm3 fm3 deleted the realpath-scan branch November 11, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make realpath scan more resilient, support attachments

3 participants