Add PromptDA depth estimation integration #5
base: main
Conversation
- Add `PromptDADepthEstimator` class for ONNX-based depth estimation
- Include `model_fp16.onnx` model file
- Add PromptDA toggle in UI with purple brain icon
- Update `ARViewModel` to support switching between ARKit and PromptDA depth
- Add onnxruntime dependencies via Swift Package Manager
- Add `README_PromptDA.md` documentation

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
Summary of Changes
Hello @ioridev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the application's depth sensing capabilities by integrating PromptDA, a machine learning-based depth estimation model. This provides users with an alternative to the standard ARKit depth, offering more flexibility and potentially different depth capture characteristics. The changes include adding the necessary model and dependencies, implementing the core estimation logic, and providing a user-friendly toggle in the application's interface.
Highlights
- New Depth Estimation Method: Integrated PromptDA, an alternative depth estimation method, allowing the application to use a machine learning model for depth sensing in addition to ARKit's built-in capabilities.
- Model Integration: Added the `model_fp16.onnx` ONNX model and the `onnxruntime-swift-package-manager` dependency to facilitate the PromptDA depth estimation process.
- New Class for Depth Estimation: Introduced the `PromptDADepthEstimator` class, which encapsulates the logic for preparing inputs, running ONNX inference, and processing the model's output into a depth map.
- User Interface Control: Added a new toggle button in the UI (represented by a purple brain icon) that allows users to easily switch between using ARKit's depth sensing and the new PromptDA depth estimation.
- Documentation: Included a new `README_PromptDA.md` file that provides comprehensive setup instructions, usage details, and debugging tips for the PromptDA integration.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.

| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request successfully integrates PromptDA depth estimation, offering an alternative to ARKit. The core logic is well-encapsulated in `PromptDADepthEstimator.swift`. Key feedback points include addressing a critical force unwrap, ensuring data synchronization for saved images/depth maps, clarifying output resolutions, and adopting more idiomatic Swift error handling. The addition of `README_PromptDA.md` is a good step for documentation.
```swift
    context.render(scaledImage, to: buffer)
}

return resizedBuffer!
```
Force unwrapping `resizedBuffer!` can lead to a crash if `CVPixelBufferCreate` (line 287) fails and `resizedBuffer` remains `nil`. It's safer to handle the optional gracefully, for example, by returning `nil` if buffer creation fails.
```diff
-return resizedBuffer!
+guard let buffer = resizedBuffer else {
+    print("Error: Failed to create resized pixel buffer in resizePixelBuffer")
+    return pixelBuffer // Or return nil, depending on desired contract
+}
+context.render(scaledImage, to: buffer)
+return buffer
```
```diff
 latestImage = frame.capturedImage
 let currentTime = CACurrentMediaTime()

 if currentTime - lastDepthUpdate >= depthUpdateInterval {
     lastDepthUpdate = currentTime // update the timer

-    // Process and display the depth map
-    if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
-        processDepthMap(depthMap)
-    }
-
-    // Process and display the confidence map
-    if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
-        processConfidenceMap(confidenceMap)
+    // Use PromptDA for depth estimation if enabled
+    if usePromptDA, let estimator = depthEstimator {
+        if estimator.shouldProcessFrame(timestamp: frame.timestamp) {
+            print("ARViewModel: Using PromptDA for depth estimation")
+            // For preview, don't resize to save performance
+            if let promptDADepth = estimator.estimateDepth(from: frame.capturedImage, lidarDepth: frame.sceneDepth?.depthMap, resizeToOriginal: false) {
+                latestDepthMap = promptDADepth
+                print("ARViewModel: PromptDA depth estimation successful")
+                processDepthMap(promptDADepth)
+
+                // Also create confidence map from PromptDA
+                if showConfidenceMap, let confMap = estimator.createConfidenceMap(from: promptDADepth) {
+                    processConfidenceMap(confMap)
+                }
+            } else {
+                print("ARViewModel: PromptDA depth estimation failed, falling back to ARKit depth")
+                // Fall back to ARKit depth
+                latestDepthMap = frame.sceneDepth?.depthMap
+                if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
+                    processDepthMap(depthMap)
+                }
+                if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
+                    processConfidenceMap(confidenceMap)
+                }
+            }
+        } else {
+            // Not time to process with PromptDA, use ARKit depth
+            latestDepthMap = frame.sceneDepth?.depthMap
+            if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
+                processDepthMap(depthMap)
+            }
+            if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
+                processConfidenceMap(confidenceMap)
+            }
+        }
+    } else {
+        // PromptDA disabled or not initialized, use ARKit depth
+        latestDepthMap = frame.sceneDepth?.depthMap
+        if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
+            processDepthMap(depthMap)
+        }
+        if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
+            processConfidenceMap(confidenceMap)
+        }
+    }
 }
```
There's a potential synchronization issue when `usePromptDA` is `false`. `latestImage` is updated every frame (line 40), but `latestDepthMap` (when using ARKit, i.e. `!usePromptDA`) is updated within the throttled block (e.g., current line 83). If `saveDepthMap()` is called (line 94) and `usePromptDA` is `false`, `depthMapToSave` (line 109) will use this throttled `latestDepthMap`.

This can lead to `latestImage` being more recent than `latestDepthMap`, resulting in a saved image and depth map that don't perfectly correspond to the same moment in time.

To ensure synchronization, `latestDepthMap` should be updated with the current frame's ARKit depth (`frame.sceneDepth?.depthMap`) unconditionally at the beginning of `session(_:didUpdate:)` if it's intended as the primary source or fallback. PromptDA can then overwrite it if active and successful.
```swift
latestImage = frame.capturedImage
// Always keep latestDepthMap updated with the most recent ARKit depth.
// If PromptDA is active and successful, it will overwrite this later.
// This ensures that if PromptDA is off, or fails, or for the LiDAR prompt,
// we have the freshest ARKit depth available.
if let currentARKitDepth = frame.sceneDepth?.depthMap {
    latestDepthMap = currentARKitDepth
}
let currentTime = CACurrentMediaTime()
if currentTime - lastDepthUpdate >= depthUpdateInterval {
    lastDepthUpdate = currentTime // update the timer
    // Use PromptDA for depth estimation if enabled
    if usePromptDA, let estimator = depthEstimator {
        if estimator.shouldProcessFrame(timestamp: frame.timestamp) {
            print("ARViewModel: Using PromptDA for depth estimation")
            // For preview, don't resize to save performance
            // Pass the most current ARKit depth as the LiDAR prompt
            if let promptDADepth = estimator.estimateDepth(from: frame.capturedImage, lidarDepth: frame.sceneDepth?.depthMap, resizeToOriginal: false) {
                latestDepthMap = promptDADepth // PromptDA's result becomes the new latestDepthMap
                print("ARViewModel: PromptDA depth estimation successful")
                processDepthMap(promptDADepth)
                // Also create confidence map from PromptDA
                if showConfidenceMap, let confMap = estimator.createConfidenceMap(from: promptDADepth) {
                    processConfidenceMap(confMap)
                }
            } else {
                print("ARViewModel: PromptDA depth estimation failed, falling back to ARKit depth")
                // latestDepthMap already holds current ARKit depth from above
                if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit)
                    processDepthMap(depthMap)
                }
                if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
                    processConfidenceMap(confidenceMap)
                }
            }
        } else {
            // Not time to process with PromptDA, use ARKit depth for display
            // latestDepthMap remains as is (either previous PromptDA or current ARKit)
            if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit)
                processDepthMap(depthMap)
            }
            if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
                processConfidenceMap(confidenceMap)
            }
        }
    } else {
        // PromptDA disabled or not initialized, use ARKit depth (already set in self.latestDepthMap)
        if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit)
            processDepthMap(depthMap)
        }
        if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
            processConfidenceMap(confidenceMap)
        }
    }
}
```
DepthCamera/ARViewModel.swift (Outdated)
```swift
// Force process the current frame for capture (keep 182x252 resolution)
depthMapToSave = estimator.estimateDepth(from: latestImage, lidarDepth: latestDepthMap, resizeToOriginal: false)
```
The comment on line 100 mentions "keep 182x252 resolution", but the model's input (and likely output) size is 192x256 (height x width), as defined in `PromptDADepthEstimator` (`modelSize = CGSize(width: 256, height: 192)`). This seems to be a minor typo in the comment.

More importantly, `resizeToOriginal: false` (line 101) means that when PromptDA is used for capture, the saved depth map will be at the model's output resolution (e.g., 192x256). If the corresponding RGB image is saved at a higher, original camera resolution, this discrepancy might be unexpected for users. Consider documenting this in `README_PromptDA.md`.
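If keeping the saved pair at matching resolutions is preferred over documenting the mismatch, the depth map could be upsampled to the capture resolution at save time. A minimal Core Image sketch of that idea follows; the helper name, default context, and the choice of a plain affine scale are assumptions for illustration, not code from this PR:

```swift
import CoreImage
import CoreVideo

/// Hypothetical helper: upscale a model-resolution depth buffer (e.g. 256x192)
/// to the RGB capture resolution so the saved image/depth pair stays aligned.
/// Assumes a kCVPixelFormatType_DepthFloat32 buffer.
func upscaleDepth(_ depth: CVPixelBuffer,
                  to targetSize: CGSize,
                  context: CIContext = CIContext()) -> CVPixelBuffer? {
    let src = CIImage(cvPixelBuffer: depth)
    let sx = targetSize.width / src.extent.width
    let sy = targetSize.height / src.extent.height
    let scaled = src.transformed(by: CGAffineTransform(scaleX: sx, y: sy))

    var out: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     Int(targetSize.width),
                                     Int(targetSize.height),
                                     kCVPixelFormatType_DepthFloat32,
                                     nil,
                                     &out)
    guard status == kCVReturnSuccess, let buffer = out else { return nil }
    context.render(scaled, to: buffer)
    return buffer
}
```

Note that interpolated upsampling smears depth discontinuities at object edges, so documenting the native 192x256 output may still be the better trade-off for users who need metrically faithful samples.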
```swift
throw NSError(domain: "PromptDADepthEstimator", code: -1,
              userInfo: [NSLocalizedDescriptionKey: "model_fp16.onnx not found in bundle"])
```
Using `NSError` for custom error conditions (like model not found, ONNX environment failure, etc.) is less idiomatic in Swift than defining a custom `Error` enum. A custom enum (e.g., `PromptDAError: Error`) would provide better type safety and improve code clarity.

This applies to the other `NSError` instantiations in this file as well (lines 43-44, 241-242, 259-260).
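As a sketch of what such an enum could look like (the case names are illustrative assumptions, not taken from the PR):

```swift
import Foundation

/// Hypothetical custom error type covering the failure modes the review
/// mentions; conforming to LocalizedError preserves user-facing messages.
enum PromptDAError: Error, LocalizedError {
    case modelNotFound(String)
    case environmentCreationFailed
    case inferenceFailed(underlying: Error)

    var errorDescription: String? {
        switch self {
        case .modelNotFound(let name):
            return "\(name) not found in bundle"
        case .environmentCreationFailed:
            return "Failed to create ONNX Runtime environment"
        case .inferenceFailed(let underlying):
            return "ONNX inference failed: \(underlying.localizedDescription)"
        }
    }
}

// The throw site above would then become:
// throw PromptDAError.modelNotFound("model_fp16.onnx")
```

Callers can then pattern-match on specific cases instead of comparing `NSError` domains and codes.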
```swift
                   attributes as CFDictionary,
                   &dstBuffer)

guard let destBuffer = dstBuffer else { return depthBuffer }
```
If `CVPixelBufferCreate` for `dstBuffer` (line 313) fails, `dstBuffer` will be `nil`. In this case, the `guard let destBuffer = dstBuffer else { return depthBuffer }` statement will return the original, unresized `depthBuffer`.

This might be misleading for the caller, who expects either a resized buffer or an indication of failure (like `nil`). Consider returning `nil` if the destination buffer cannot be created.
```diff
-guard let destBuffer = dstBuffer else { return depthBuffer }
+guard let destBuffer = dstBuffer else {
+    print("Error: Failed to create destination buffer in resizeDepthBuffer")
+    return nil // Or throw an error
+}
```
- Add NaN and Infinite value checks before UInt8 conversion
- Replace invalid depth values with 0.0 to prevent crashes
- Objects that are too close will now appear black instead of crashing
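The guarded conversion this commit describes can be sketched as follows; the function name and the normalization range are assumptions for illustration, not the PR's actual implementation:

```swift
// Sketch of a Float32 -> UInt8 depth-to-grayscale conversion with the
// NaN/Infinite guards described above. Invalid samples are replaced with
// 0.0, so too-close objects render as black instead of crashing the
// UInt8 initializer.
func depthToGray(_ depth: Float, maxDepth: Float = 5.0) -> UInt8 {
    let safe = (depth.isNaN || depth.isInfinite) ? 0.0 : depth
    let clamped = min(max(safe, 0.0), maxDepth)
    return UInt8(clamped / maxDepth * 255.0)
}
```

Without the guard, `UInt8(_:)` traps at runtime when handed a NaN or infinite value, which matches the crash the commit fixes.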
- Add NaN/Infinite value checks in processDepthMap to prevent crashes
- Fix bytesPerRow handling in processDepthMap for correct pixel access
- Add 16-bit PNG export option alongside TIFF for better compatibility
- Fix preview/save mismatch by using the same depth map for both operations
- Remove unnecessary depth map regeneration during save
- Add debug logging for troubleshooting

The main issue was that preview and save were using different depth maps: preview used the cached latestDepthMap while save regenerated a new one, causing inconsistencies. Now both use the same cached depth map.
Document the investigation and resolution of the depth map preview/save mismatch, including root cause analysis and lessons learned.
Re-enable 90-degree rotation for depth map preview to match the expected portrait orientation. PromptDA outputs 252x182 (landscape), which needs rotation for correct display.
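One way such a rotation can be expressed with Core Image (a hedged sketch; the helper name and the use of EXIF orientation rather than an explicit transform are assumptions, not taken from this commit):

```swift
import CoreImage

/// Illustrative helper: rotate a landscape depth image 90 degrees clockwise
/// so it displays in portrait. Not necessarily the PR's implementation.
func rotatedForPortrait(_ image: CIImage) -> CIImage {
    // .right applies a 90-degree clockwise rotation via EXIF orientation,
    // turning a 252x182 extent into 182x252.
    return image.oriented(.right)
}
```

Applying the same orientation to both the preview and the saved output keeps the two consistent, which ties into the preview/save mismatch fixed earlier.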
Summary

Features Added

- `PromptDADepthEstimator` class for ONNX-based depth estimation
- `model_fp16.onnx` for depth processing
- `README_PromptDA.md`

Test plan