Skip to content

Conversation

ioridev
Copy link
Owner

@ioridev ioridev commented Jun 18, 2025

Summary

  • Integrate PromptDA depth estimation as an alternative to ARKit's built-in depth sensing
  • Add toggle in UI to switch between ARKit and PromptDA depth estimation
  • Include ONNX model and dependencies for depth processing

Features Added

  • PromptDADepthEstimator class for ONNX-based depth estimation
  • Purple brain icon toggle in UI to enable/disable PromptDA
  • Model file model_fp16.onnx for depth processing
  • Integration with onnxruntime via Swift Package Manager
  • Documentation in README_PromptDA.md

Test plan

  • Verify app builds successfully with new dependencies
  • Test PromptDA toggle switches between depth estimation methods
  • Confirm depth capture works with both ARKit and PromptDA modes
  • Check that saved depth maps are correctly processed

🤖 Generated with Claude Code

- Add PromptDADepthEstimator class for ONNX-based depth estimation
- Include model_fp16.onnx model file
- Add PromptDA toggle in UI with purple brain icon
- Update ARViewModel to support switching between ARKit and PromptDA depth
- Add onnxruntime dependencies via Swift Package Manager
- Add README_PromptDA.md documentation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @ioridev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's depth sensing capabilities by integrating PromptDA, a machine learning-based depth estimation model. This provides users with an alternative to the standard ARKit depth, offering more flexibility and potentially different depth capture characteristics. The changes include adding the necessary model and dependencies, implementing the core estimation logic, and providing a user-friendly toggle in the application's interface.

Highlights

  • New Depth Estimation Method: Integrated PromptDA, an alternative depth estimation method, allowing the application to use a machine learning model for depth sensing in addition to ARKit's built-in capabilities.
  • Model Integration: Added the model_fp16.onnx ONNX model and the onnxruntime-swift-package-manager dependency to facilitate the PromptDA depth estimation process.
  • New Class for Depth Estimation: Introduced the PromptDADepthEstimator class, which encapsulates the logic for preparing inputs, running ONNX inference, and processing the model's output into a depth map.
  • User Interface Control: Added a new toggle button in the UI (represented by a purple brain icon) that allows users to easily switch between using ARKit's depth sensing and the new PromptDA depth estimation.
  • Documentation: Included a new README_PromptDA.md file that provides comprehensive setup instructions, usage details, and debugging tips for the PromptDA integration.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully integrates PromptDA depth estimation, offering an alternative to ARKit. The core logic is well-encapsulated in PromptDADepthEstimator.swift. Key feedback points include addressing a critical force unwrap, ensuring data synchronization for saved images/depth maps, clarifying output resolutions, and adopting more idiomatic Swift error handling. The addition of README_PromptDA.md is a good step for documentation.

context.render(scaledImage, to: buffer)
}

return resizedBuffer!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Force unwrapping resizedBuffer! can lead to a crash if CVPixelBufferCreate (line 287) fails and resizedBuffer remains nil. It's safer to handle the optional gracefully, for example, by returning nil if buffer creation fails.

Suggested change
return resizedBuffer!
guard let buffer = resizedBuffer else {
print("Error: Failed to create resized pixel buffer in resizePixelBuffer")
return pixelBuffer // Or return nil, depending on desired contract
}
context.render(scaledImage, to: buffer)
return buffer

Comment on lines 40 to 91
latestImage = frame.capturedImage
let currentTime = CACurrentMediaTime()

if currentTime - lastDepthUpdate >= depthUpdateInterval {
lastDepthUpdate = currentTime // タイマーを更新

// DepthMapの処理と表示
if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
processDepthMap(depthMap)
}

// ConfidenceMapの処理と表示
if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
processConfidenceMap(confidenceMap)
// Use PromptDA for depth estimation if enabled
if usePromptDA, let estimator = depthEstimator {
if estimator.shouldProcessFrame(timestamp: frame.timestamp) {
print("ARViewModel: Using PromptDA for depth estimation")
// For preview, don't resize to save performance
if let promptDADepth = estimator.estimateDepth(from: frame.capturedImage, lidarDepth: frame.sceneDepth?.depthMap, resizeToOriginal: false) {
latestDepthMap = promptDADepth
print("ARViewModel: PromptDA depth estimation successful")
processDepthMap(promptDADepth)

// Also create confidence map from PromptDA
if showConfidenceMap, let confMap = estimator.createConfidenceMap(from: promptDADepth) {
processConfidenceMap(confMap)
}
} else {
print("ARViewModel: PromptDA depth estimation failed, falling back to ARKit depth")
// Fall back to ARKit depth
latestDepthMap = frame.sceneDepth?.depthMap
if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
processDepthMap(depthMap)
}
if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
processConfidenceMap(confidenceMap)
}
}
} else {
// Not time to process with PromptDA, use ARKit depth
latestDepthMap = frame.sceneDepth?.depthMap
if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
processDepthMap(depthMap)
}
if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
processConfidenceMap(confidenceMap)
}
}
} else {
// PromptDA disabled or not initialized, use ARKit depth
latestDepthMap = frame.sceneDepth?.depthMap
if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
processDepthMap(depthMap)
}
if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
processConfidenceMap(confidenceMap)
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There's a potential synchronization issue when usePromptDA is false. latestImage is updated every frame (line 40), but latestDepthMap (when using ARKit, i.e., !usePromptDA) is updated within the throttled block (e.g., current line 83). If saveDepthMap() is called (line 94), and usePromptDA is false, depthMapToSave (line 109) will use this throttled latestDepthMap.

This can lead to latestImage being more recent than latestDepthMap, resulting in a saved image and depth map that don't perfectly correspond to the same moment in time.

To ensure synchronization, latestDepthMap should be updated with the current frame's ARKit depth (frame.sceneDepth?.depthMap) unconditionally at the beginning of session(_:didUpdate:) if it's intended as the primary source or fallback. PromptDA can then overwrite it if active and successful.

        latestImage = frame.capturedImage
        // Always keep latestDepthMap updated with the most recent ARKit depth.
        // If PromptDA is active and successful, it will overwrite this later.
        // This ensures that if PromptDA is off, or fails, or for the LiDAR prompt,
        // we have the freshest ARKit depth available.
        if let currentARKitDepth = frame.sceneDepth?.depthMap {
            latestDepthMap = currentARKitDepth
        }

        let currentTime = CACurrentMediaTime()
        
        if currentTime - lastDepthUpdate >= depthUpdateInterval {
            lastDepthUpdate = currentTime  // タイマーを更新
            
            // Use PromptDA for depth estimation if enabled
            if usePromptDA, let estimator = depthEstimator {
                if estimator.shouldProcessFrame(timestamp: frame.timestamp) {
                    print("ARViewModel: Using PromptDA for depth estimation")
                    // For preview, don't resize to save performance
                    // Pass the most current ARKit depth as the LiDAR prompt
                    if let promptDADepth = estimator.estimateDepth(from: frame.capturedImage, lidarDepth: frame.sceneDepth?.depthMap, resizeToOriginal: false) {
                        latestDepthMap = promptDADepth // PromptDA's result becomes the new latestDepthMap
                        print("ARViewModel: PromptDA depth estimation successful")
                        processDepthMap(promptDADepth)
                        
                        // Also create confidence map from PromptDA
                        if showConfidenceMap, let confMap = estimator.createConfidenceMap(from: promptDADepth) {
                            processConfidenceMap(confMap)
                        }
                    } else {
                        print("ARViewModel: PromptDA depth estimation failed, falling back to ARKit depth")
                        // latestDepthMap already holds current ARKit depth from above
                        if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit)
                            processDepthMap(depthMap)
                        }
                        if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
                            processConfidenceMap(confidenceMap)
                        }
                    }
                } else {
                    // Not time to process with PromptDA, use ARKit depth for display
                    // latestDepthMap remains as is (either previous PromptDA or current ARKit)
                    if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit)
                        processDepthMap(depthMap)
                    }
                    if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
                        processConfidenceMap(confidenceMap)
                    }
                }
            } else {
                // PromptDA disabled or not initialized, use ARKit depth (already set in self.latestDepthMap)
                if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit)
                    processDepthMap(depthMap)
                }
                if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
                    processConfidenceMap(confidenceMap)
                }
            }
        }

Comment on lines 100 to 101
// Force process the current frame for capture (keep 182x252 resolution)
depthMapToSave = estimator.estimateDepth(from: latestImage, lidarDepth: latestDepthMap, resizeToOriginal: false)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment on line 100 mentions "keep 182x252 resolution", but the model's input (and likely output) size is 192x256 (height x width) as defined in PromptDADepthEstimator (modelSize = CGSize(width: 256, height: 192)). This seems to be a minor typo in the comment.

More importantly, resizeToOriginal: false (line 101) means that when PromptDA is used for capture, the saved depth map will be at the model's output resolution (e.g., 192x256). If the corresponding RGB image is saved at a higher, original camera resolution, this discrepancy might be unexpected for users. Consider documenting this in README_PromptDA.md.

Comment on lines +30 to +31
throw NSError(domain: "PromptDADepthEstimator", code: -1,
userInfo: [NSLocalizedDescriptionKey: "model_fp16.onnx not found in bundle"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using NSError for custom error conditions (like model not found, ONNX environment failure, etc.) is less idiomatic in Swift compared to defining a custom Error enum. A custom enum (e.g., PromptDAError: Error) would provide better type safety and improve code clarity.

This applies to other NSError instantiations in this file as well (lines 43-44, 241-242, 259-260).

attributes as CFDictionary,
&dstBuffer)

guard let destBuffer = dstBuffer else { return depthBuffer }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If CVPixelBufferCreate for dstBuffer (line 313) fails, dstBuffer will be nil. In this case, the guard let destBuffer = dstBuffer else { return depthBuffer } statement will return the original, unresized depthBuffer.

This might be misleading for the caller, who expects either a resized buffer or an indication of failure (like nil). Consider returning nil if the destination buffer cannot be created.

Suggested change
guard let destBuffer = dstBuffer else { return depthBuffer }
guard let destBuffer = dstBuffer else {
print("Error: Failed to create destination buffer in resizeDepthBuffer")
return nil // Or throw an error
}

ioridev and others added 4 commits June 23, 2025 14:22
- Add NaN and Infinite value checks before UInt8 conversion
- Replace invalid depth values with 0.0 to prevent crashes
- Objects that are too close will now appear black instead of crashing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add NaN/Infinite value checks in processDepthMap to prevent crashes
- Fix bytesPerRow handling in processDepthMap for correct pixel access
- Add 16-bit PNG export option alongside TIFF for better compatibility
- Fix preview/save mismatch by using same depth map for both operations
- Remove unnecessary depth map regeneration during save
- Add debug logging for troubleshooting

The main issue was that preview and save were using different depth maps.
Preview used the cached latestDepthMap while save was regenerating a new one,
causing inconsistencies. Now both use the same cached depth map.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Document the investigation and resolution of depth map preview/save mismatch,
including root cause analysis and lessons learned.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Re-enable 90-degree rotation for depth map preview to match expected portrait orientation.
PromptDA outputs 252x182 (landscape) which needs rotation for correct display.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant