-
Notifications
You must be signed in to change notification settings - Fork 30
Make Realpath Scans More Resilient #9019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughRenames datastore endpoint to updateRealPaths, replaces MagPathInfo with RealPathInfo across clients/controllers/services, adds realPath and hasLocalData to attachments with DAO/migration support, updates mag-to-literal calls to require explicit allowScalar, and reworks scanning to collect real-paths with failure aggregation. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-05-12T13:07:29.637ZApplied to files:
📚 Learning: 2025-04-23T08:51:57.756ZApplied to files:
🧬 Code graph analysis (1)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)
1262-1283: Deduplicate attachments by real path as well.We now record
realPathfor attachments, but the storage query still partitions solely by the originalpath, so symlinked or otherwise relocated files with differing logical paths will continue to be counted multiple times. Mirror the mags query and fall back topathonly whenrealPathis unavailable.- ROW_NUMBER() OVER ( - PARTITION BY att.path + ROW_NUMBER() OVER ( + PARTITION BY COALESCE(att.realPath, att.path) ORDER BY ds.created ASC ) AS rn @@ - WHERE ranked.rn = 1 + WHERE ranked.rn = 1 AND ranked._organization = $organizationId AND ranked._dataStore = $dataStoreId
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
app/controllers/WKRemoteDataStoreController.scala(1 hunks)app/models/dataset/Dataset.scala(4 hunks)app/models/dataset/DatasetService.scala(2 hunks)app/models/dataset/DatasetUploadToPathsService.scala(1 hunks)app/models/dataset/WKRemoteDataStoreClient.scala(1 hunks)conf/evolutions/145-attachment-realpaths.sql(1 hunks)conf/evolutions/reversions/145-attachment-realpaths.sql(1 hunks)conf/webknossos.latest.routes(1 hunks)tools/postgres/schema.sql(2 hunks)unreleased_changes/9019.md(1 hunks)util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala(7 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala(2 hunks)webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala(1 hunks)webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2024-11-22T17:18:04.217Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/proofread_saga.ts:1039-1039
Timestamp: 2024-11-22T17:18:04.217Z
Learning: In `frontend/javascripts/oxalis/model/sagas/proofread_saga.ts`, when calling `getMagInfo`, the use of `volumeTracingLayer.resolutions` is intentional and should not be changed to `volumeTracingLayer.mags`.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scalawebknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scalaapp/models/dataset/DatasetService.scalaapp/models/dataset/DatasetUploadToPathsService.scalaapp/models/dataset/Dataset.scalawebknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scalawebknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scalaapp/models/dataset/WKRemoteDataStoreClient.scala
📚 Learning: 2024-11-22T17:19:07.947Z
Learnt from: dieknolle3333
PR: scalableminds/webknossos#8168
File: frontend/javascripts/oxalis/model/sagas/volumetracing_saga.tsx:433-434
Timestamp: 2024-11-22T17:19:07.947Z
Learning: In the codebase, certain usages of `segmentationLayer.resolutions` are intentionally retained and should not be changed to `segmentationLayer.mags` during refactoring.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scalawebknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
PR: scalableminds/webknossos#8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Applied to files:
app/models/dataset/Dataset.scalatools/postgres/schema.sql
📚 Learning: 2025-05-07T06:17:32.810Z
Learnt from: philippotto
PR: scalableminds/webknossos#8602
File: frontend/javascripts/oxalis/model/volumetracing/volume_annotation_sampling.ts:365-366
Timestamp: 2025-05-07T06:17:32.810Z
Learning: The parameter in applyVoxelMap was renamed from `sliceCount` to `sliceOffset` to better reflect its purpose, but this doesn't affect existing call sites since JavaScript/TypeScript function calls are position-based.
Applied to files:
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala
📚 Learning: 2025-05-12T14:15:05.259Z
Learnt from: frcroth
PR: scalableminds/webknossos#8609
File: conf/evolutions/133-datasource-properties-in-db.sql:8-16
Timestamp: 2025-05-12T14:15:05.259Z
Learning: The database schema in WEBKNOSSOS has separate tables for dataset layers (`dataset_layers`) and magnifications (`dataset_mags`). The `dataFormat` field is stored in the layers table while magnification-specific fields like `cubeLength` (specific to WKW format) are stored in the mags table.
Applied to files:
tools/postgres/schema.sql
🧬 Code graph analysis (9)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/DatasetService.scala (1)
updateRealPaths(491-503)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (2)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
toMagLiteral(40-43)webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingBucketHelper.scala (1)
additionalCoordinatesKeyPart(65-76)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
findOneByDataSourceId(430-433)dataSourceId(93-93)updateMagRealPathsForDataset(828-845)updateAttachmentRealPathsForDataset(1174-1191)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
shiftBox(312-312)successful(53-56)failure(58-62)app/controllers/WKRemoteDataStoreController.scala (1)
updateRealPaths(196-208)
app/models/dataset/DatasetUploadToPathsService.scala (1)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
toMagLiteral(40-43)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
RealPathInfo(50-50)RealPathInfo(52-54)app/utils/sql/SqlInterpolation.scala (2)
q(20-39)asUpdate(74-74)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12)
addQueryParam(30-33)addQueryParam(33-36)addQueryParam(36-39)addQueryParam(39-42)addQueryParam(42-45)addQueryParam(45-50)addQueryParam(50-53)addQueryParam(53-56)addQueryParam(56-59)addQueryParam(59-63)addQueryParam(63-68)addQueryParam(68-71)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)
mag(25-25)util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
toMagLiteral(40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath(54-96)fromLocalPath(80-80)util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
toMagLiteral(40-43)
app/models/dataset/WKRemoteDataStoreClient.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/rpc/RPCRequest.scala (12)
addQueryParam(30-33)addQueryParam(33-36)addQueryParam(36-39)addQueryParam(39-42)addQueryParam(42-45)addQueryParam(45-50)addQueryParam(50-53)addQueryParam(53-56)addQueryParam(56-59)addQueryParam(59-63)addQueryParam(63-68)addQueryParam(68-71)util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
toMagLiteral(40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (7)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
Instant(14-45)Instant(47-103)now(48-48)since(68-68)toString(15-15)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
reportDataSources(107-114)reportRealPaths(114-120)nonEmpty(43-43)DataSourcePathInfo(40-44)DataSourcePathInfo(46-48)RealPathInfo(50-50)RealPathInfo(52-54)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayerAttachments.scala (1)
allAttachments(19-19)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
resolveMagPath(79-98)resolveMagPath(98-103)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath(54-96)fromLocalPath(80-80)util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
formatDuration(30-82)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
mapped(104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: backend-tests
🔇 Additional comments (10)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TSRemoteDatastoreClient.scala (1)
70-70: LGTM! Explicit parameter improves clarity.The addition of the explicit
allowScalar = falseparameter aligns with the codebase-wide refactoring to maketoMagLiteralcalls more explicit. This ensures the mag is always serialized in full "x-y-z" format for the remote datastore query, avoiding ambiguity.webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentIndexBuffer.scala (1)
29-29: LGTM! ExplicitallowScalar = falseensures consistent key format.The explicit parameter correctly adapts to the updated
toMagLiteralAPI and ensures segment index keys always use the full "x-y-z" format, which is appropriate for consistent key construction and lookups.util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
40-40: API change properly implemented across codebase.Verification confirms that all 32+ call sites of
toMagLiteral()throughout the codebase have been updated with explicitallowScalarboolean parameters. The breaking API change was implemented comprehensively with no incomplete migrations detected.webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)
79-96: LGTM: Signature simplification and clearer mag literal handling.The removal of the unused
layerNameparameter and the simplified relative path resolution logic improve maintainability. The explicitallowScalar = falseparameter on line 90 makes it clear that the vec3 notation is intended for the fallback case.webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
40-54: LGTM: Clean data model for real-path tracking.The introduction of
RealPathInfowith separatepathandrealPathfields provides clear semantics for tracking both the reference path and the resolved real path. ThehasLocalDataflag is useful for distinguishing between local and remote data. The extension ofDataSourcePathInfoto includeattachmentPathInfossuccessfully aligns with the PR objective to support attachment real paths.webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
70-99: LGTM: Resilient real-path scanning with proper separation of concerns.The refactored flow correctly separates datasource discovery from real-path resolution, allowing failures in path resolution to be collected without preventing successful datasources from being reported. This achieves the PR objective of making the scan more resilient.
101-128: LGTM: Comprehensive path scanning with proper failure collection.The implementation correctly processes both mag and attachment paths for each datasource, collecting failures without propagating them. The handling of optional attachments (
dataLayer.attachments.map(_.allAttachments).getOrElse(Seq.empty)) is correct, and filtering non-empty path info prevents unnecessary reporting.
130-146: LGTM: Safe real-path resolution with proper local data detection.The implementation correctly handles both remote and local paths. For local paths,
toRealPath()is wrapped intryoto safely handle failures (e.g., broken symlinks, non-existent paths). ThehasLocalDatacheck usingstartsWithcorrectly identifies whether the resolved path is within the dataset directory.
161-195: LGTM: Enhanced logging provides good observability.The updated logging includes timing information, real-path scan statistics, and detailed failure reporting. The verbose mode provides helpful per-team breakdowns, and failure details are logged when available, which aids troubleshooting.
265-274: LGTM: Early mag path population supports downstream processing.The
addMagPathsmethod correctly populates mag paths during datasource loading, ensuring they're available for subsequent real-path scanning. Usinglayer.mappedwithnewMagsis the appropriate pattern for creating modified layers, and the call toresolveMagPathproperly resolves each mag's path.
...nossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
244-250: Consider refactoring for improved readability.The conditional assignment pattern is correct but the formatting makes it harder to follow. Consider this alternative structure:
- val dataSourceWithMagPaths = - if (resolveMagPaths) - dataSourceWithAttachments.copy( - dataLayers = addMagPaths(path, dataSourceWithAttachments) - ) - else dataSourceWithAttachments - dataSourceWithMagPaths.copy(id) + val dataSourceWithIdAndAttachments = dataSourceWithAttachments.copy(id) + if (resolveMagPaths) + dataSourceWithIdAndAttachments.copy( + dataLayers = addMagPaths(path, dataSourceWithAttachments) + ) + else dataSourceWithIdAndAttachmentsOr more concisely:
- val dataSourceWithMagPaths = - if (resolveMagPaths) - dataSourceWithAttachments.copy( - dataLayers = addMagPaths(path, dataSourceWithAttachments) - ) - else dataSourceWithAttachments - dataSourceWithMagPaths.copy(id) + val updatedDataLayers = if (resolveMagPaths) addMagPaths(path, dataSourceWithAttachments) else dataSourceWithAttachments.dataLayers + dataSourceWithAttachments.copy(id = id, dataLayers = updatedDataLayers)
268-277: Consider adding documentation for the method's purpose.While the implementation is straightforward, adding a brief comment explaining when and why
addMagPathsis called would help future maintainers. For example:/** * Eagerly resolves and populates mag paths for all layers during datasource scanning. * This supports early path determination as part of the initial scan rather than as a separate request. */ private def addMagPaths(dataSourcePath: Path, dataSource: UsableDataSource): List[StaticLayer] =
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
unreleased_changes/9019.md(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unreleased_changes/9019.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
Instant(14-45)Instant(47-103)now(48-48)since(68-68)toString(15-15)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
reportDataSources(107-114)reportRealPaths(114-120)nonEmpty(43-43)DataSourcePathInfo(40-44)DataSourcePathInfo(46-48)RealPathInfo(50-50)RealPathInfo(52-54)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
resolveMagPath(79-98)resolveMagPath(98-103)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath(54-96)fromLocalPath(80-80)util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
formatDuration(30-82)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
mapped(104-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
71-71: Excellent logging and timing improvements.The addition of detailed timing, realpath scan summaries, and verbose failure reporting significantly improves observability. The per-team breakdown and conditional verbose logging strike a good balance between information and noise.
Also applies to: 81-91, 161-195
101-128: Resilient scanning architecture is well-implemented.The separation of successful path resolutions from failures (lines 101-106) and the per-datasource scanning (lines 108-128) effectively achieve the PR objective: a single failing mag or attachment no longer prevents storing other paths for the datasource.
130-146: Mag path resolution is correct and handles remote/local cases appropriately.The method properly distinguishes between remote paths (returned as-is) and local paths (resolved with
toRealPath()to handle symlinks). ThehasLocalDataflag correctly identifies whether the resolved path is within the dataset directory.
148-159: Attachment path handling is sound with defensive absolute path check.The explicit check on Line 153 that
attachment.path.isAbsoluteaddresses the previous review concern about ensuring attachment paths are absolute before callingtoRealPath(). This defensive check prevents incorrect resolution against the current working directory if a relative path were to slip through the upstream resolution logic.Based on learnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)
489-495: Consider more informative error message for disallowed paths.The error message
"dataset.upload.disallowedPaths"is generic and doesn't indicate which specific paths failed validation. While this may be intentional for security reasons, consider whether logging the problematic paths (server-side only, not in the user-facing error) would aid debugging without compromising security.Example enhancement (if security permits):
for { - _ <- Fox.fromBool(usableDataSource.allExplicitPaths.forall(dataVaultService.pathIsAllowedToAddDirectly)) ?~> "dataset.upload.disallowedPaths" + disallowedPaths = usableDataSource.allExplicitPaths.filterNot(dataVaultService.pathIsAllowedToAddDirectly) + _ = if (disallowedPaths.nonEmpty) logger.warn(s"Upload rejected due to disallowed paths: $disallowedPaths") + _ <- Fox.fromBool(disallowedPaths.isEmpty) ?~> "dataset.upload.disallowedPaths" } yield ()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
app/controllers/WKRemoteDataStoreController.scala(1 hunks)app/models/dataset/Dataset.scala(4 hunks)app/models/dataset/DatasetService.scala(1 hunks)app/models/dataset/WKRemoteDataStoreClient.scala(1 hunks)conf/webknossos.latest.routes(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala(1 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala(2 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala(8 hunks)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- app/models/dataset/WKRemoteDataStoreClient.scala
- app/controllers/WKRemoteDataStoreController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
- conf/webknossos.latest.routes
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scalawebknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/DatasetService.scalaapp/models/dataset/Dataset.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Applied to files:
app/models/dataset/Dataset.scala
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (1)
jsonFormat(82-94)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
findOneByDataSourceId(430-433)dataSourceId(93-93)updateMagRealPathsForDataset(828-845)updateAttachmentRealPathsForDataset(1183-1200)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
Fox(28-228)Fox(230-303)successful(51-54)serialCombined(93-97)serialCombined(97-109)app/controllers/WKRemoteDataStoreController.scala (1)
updateRealPaths(193-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
dataSourceFromDir(233-267)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
Fox(28-228)Fox(230-303)fromBool(30-34)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
allExplicitPaths(44-51)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (1)
allExplicitPaths(97-97)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (1)
pathIsAllowedToAddDirectly(135-141)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (6)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (5)
Instant(14-45)Instant(47-103)now(48-48)since(68-68)toString(15-15)webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (7)
reportDataSources(107-114)reportRealPaths(114-120)nonEmpty(43-43)DataSourcePathInfo(40-44)DataSourcePathInfo(46-48)RealPathInfo(50-50)RealPathInfo(52-54)webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
resolveMagPath(79-98)resolveMagPath(98-103)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (12)
isRemote(113-113)isRemote(178-178)toLocalPath(111-111)toLocalPath(176-176)startsWith(138-142)startsWith(198-205)UPath(54-96)fromLocalPath(80-80)isAbsolute(99-99)isAbsolute(147-147)toString(106-109)toString(174-174)util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)
formatDuration(30-82)webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
mapped(104-133)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
RealPathInfo(50-50)RealPathInfo(52-54)app/utils/sql/SqlInterpolation.scala (2)
q(20-39)asUpdate(74-74)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
- GitHub Check: backend-tests
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
432-434: LGTM: Deferring mag path resolution as intended.The addition of
resolveMagPaths = falsealigns with the PR objectives to make realpath scanning more resilient by deferring path resolution to a later, separate stage rather than during initial datasource loading.
560-560: Correct: ExplicitallowScalarparameter as required.The call to
mag.toMagLiteral(true)with an explicit boolean parameter aligns with the PR requirement for explicitVec3Int.toMagLiteral(allowScalar)usage. Thetruevalue appropriately allows scalar representation for path resolution.
...nossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
app/models/dataset/Dataset.scala (1)
1144-1149: Consider removing unused columns from SELECT.The query selects
realpathandhasLocalDatacolumns, butparseRow(lines 1118-1122) doesn't extract or use them. If these columns aren't needed forLayerAttachmentconstruction, consider removing them from the SELECT clause for clarity and minor performance improvement.Apply this diff if the columns are not needed:
rows <- run( - q"""SELECT _dataset, layerName, name, path, realpath, hasLocalData, type, dataFormat, uploadToPathIsPending + q"""SELECT _dataset, layerName, name, path, type, dataFormat, uploadToPathIsPending FROM webknossos.dataset_layer_attachments WHERE _dataset = $datasetId AND layerName = $layerName AND NOT uploadToPathIsPending""".as[DatasetLayerAttachmentsRow])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
app/models/dataset/Dataset.scala(12 hunks)app/models/dataset/DatasetUploadToPathsService.scala(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/dataset/DatasetUploadToPathsService.scala
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/Dataset.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Applied to files:
app/models/dataset/Dataset.scala
🧬 Code graph analysis (1)
app/models/dataset/Dataset.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
RealPathInfo(50-50)RealPathInfo(52-54)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath(54-96)fromString(59-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (6)
app/models/dataset/Dataset.scala (6)
34-34: LGTM: Import updated to reflect renamed type.The import change from
MagPathInfotoRealPathInfoaligns with the API evolution in the datastore module.
829-844: LGTM: Method refactored to update real paths by logical path.The renamed method now updates
realPathandhasLocalDatafor mags by matching onpathwithin a dataset. The Serializable transaction isolation with retry logic is appropriate for preventing concurrent update conflicts.
1186-1201: LGTM: Attachment real path update mirrors mag logic.The new method consistently applies the same transaction isolation and retry semantics as the mag equivalent, maintaining symmetry across the codebase.
785-814: LGTM: Deduplication logic correctly handles real paths.The use of
COALESCE(mag.realPath, mag.path)for partitioning ensures that physically shared mags are counted only once, regardless of whetherrealPathis populated. The window function withROW_NUMBERand filtering onrn = 1is the correct approach for this deduplication.
1270-1298: LGTM: Attachment deduplication mirrors mag logic.The partitioning strategy using
COALESCE(att.realPath, att.path)is consistent with the mag implementation and correctly handles attachment deduplication.
872-891: LGTM: Path uniqueness checks account for real paths.Both methods correctly identify paths used exclusively by a single dataset by checking both logical path equality and real path matches. The
realpath IS NOT NULLguard prevents false positives from NULL comparisons.Also applies to: 1301-1319
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
app/controllers/WKRemoteDataStoreController.scala(1 hunks)app/models/dataset/Dataset.scala(12 hunks)app/models/dataset/DatasetService.scala(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/controllers/WKRemoteDataStoreController.scala
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-12T13:07:29.637Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8609
File: app/models/dataset/Dataset.scala:753-775
Timestamp: 2025-05-12T13:07:29.637Z
Learning: In the `updateMags` method of DatasetMagsDAO (Scala), the code handles different dataset types distinctly:
1. Non-WKW datasets have `magsOpt` populated and use the first branch which includes axisOrder, channelIndex, and credentialId.
2. WKW datasets will have `wkwResolutionsOpt` populated and use the second branch which includes cubeLength.
3. The final branch is a fallback for legacy data.
This ensures appropriate fields are populated for each dataset type.
Applied to files:
app/models/dataset/DatasetService.scalaapp/models/dataset/Dataset.scala
📚 Learning: 2025-06-02T09:49:51.047Z
Learnt from: frcroth
Repo: scalableminds/webknossos PR: 8598
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala:89-95
Timestamp: 2025-06-02T09:49:51.047Z
Learning: In WebKnossos dataset layer attachments, multiple file types can safely use the same directory name (like "agglomerates") because the scanning logic filters by file extension. For example, AgglomerateFileInfo scans for .hdf5 files while CumsumFileInfo scans for .json files in the same "agglomerates" directory without interference.
Applied to files:
app/models/dataset/Dataset.scala
🧬 Code graph analysis (2)
app/models/dataset/DatasetService.scala (3)
app/models/dataset/Dataset.scala (4)
findOneByDataSourceId(440-443)dataSourceId(94-94)updateMagRealPathsForDataset(839-856)updateAttachmentRealPathsForDataset(1196-1213)util/src/main/scala/com/scalableminds/util/tools/Fox.scala (5)
Fox(28-228)Fox(230-303)successful(51-54)serialCombined(93-97)serialCombined(97-109)app/controllers/WKRemoteDataStoreController.scala (1)
updateRealPaths(193-205)
app/models/dataset/Dataset.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
RealPathInfo(50-50)RealPathInfo(52-54)app/utils/sql/SqlInterpolation.scala (2)
q(20-39)asUpdate(74-74)webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
UPath(54-96)fromString(59-59)
🔇 Additional comments (10)
app/models/dataset/DatasetService.scala (3)
47-47: LGTM!The addition of
datasetLayerAttachmentsDAOto the constructor is necessary to support attachment real-path updates alongside mag real-path updates.
479-494: Good refactoring: improved encapsulation and added attachment support.The method is now private and accepts a structured
DataSourcePathInfoparameter, which is cleaner. The addition of attachment real-path updates (line 485) alongside mag updates (line 484) aligns with the PR objectives. Error handling for virtual datasets and non-existent datasets is preserved correctly.
496-499: LGTM!The delegation to
updateRealPathsForDataSourceviaFox.serialCombinedis correct. Sequential processing is appropriate for database updates with transaction isolation.app/models/dataset/Dataset.scala (7)
34-34: LGTM!The import of
RealPathInfosupports the updated real-path APIs for both mags and attachments.
807-807: LGTM!The
COALESCE(mag.realPath, mag.path)deduplication logic correctly prioritizes physical paths (realPath) over logical paths when available, preventing duplicate storage counts for shared data. The same pattern is consistently applied to attachments at line 1291.
838-854: LGTM!The method rename to
updateMagRealPathsForDatasetand parameter change toSeq[RealPathInfo]align with the new real-path model. The UPDATE logic correctly setsrealPathandhasLocalDatafor records matching the logicalpath.
1195-1211: LGTM!The new
updateAttachmentRealPathsForDatasetmethod correctly mirrors the mags variant, ensuring consistent real-path updates for attachments. The transaction isolation and retry logic are appropriately maintained.
1154-1159: LGTM!The SELECT statement now retrieves
realpathandhasLocalDatafields, supporting the database schema evolution. Note that these fields are used for internal storage tracking and deduplication (as seen in lines 1291, 1323, 1341) but are not exposed in theLayerAttachmentobject returned byparseRow(lines 1128-1132), which is correct.
1320-1325: LGTM!The updated logic correctly considers both logical path matches and real-path matches when determining if an attachment is used by other datasets. This prevents accidental deletion of shared physical data that's referenced via different logical paths. The NULL check on
realpathis appropriate.
1341-1342: LGTM!The query now correctly checks
realpathinstead ofpathto determine if attachments reside in a specific physical directory. This ensures accurate detection of datasets affected by directory deletion, consistent with the mags logic at lines 913-914.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
MichaelBuessemeyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, looks good 🗡️
Left one question and 3 mini suggestions :)
Testing went well 👍
| override def toString: String = s"($x, $y, $z)" | ||
|
|
||
| def toMagLiteral(allowScalar: Boolean = false): String = | ||
| def toMagLiteral(allowScalar: Boolean): String = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 in some occurrences toMagLiteral you already changed the parameter and now made it mandatory to avoid bugs, correct?
If yes, what was the actual bug fixed here? I cannot find it in the pr description 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly a bug, but there was one usage where I’d rather have allowScalar=true where it wasn’t passed, so the default false was used. This happened in the DatasetUploadToPathsService. The new paths generated by wk for the uploads should have short-form mags for consistency.
And yes, correct, I decided to remove the default to make the code more explicit and force callers to be mindful of which behavior is desired.
URL of deployed dev instance (used for testing):
Steps to test:
TODOs:
Issues:
$PR_NUMBER.mdfile inunreleased_changesor use./tools/create-changelog-entry.py)