Add PromptDA depth estimation integration #5

ioridev · 2025-06-18T04:58:00Z

Summary

Integrate PromptDA depth estimation as an alternative to ARKit's built-in depth sensing
Add toggle in UI to switch between ARKit and PromptDA depth estimation
Include ONNX model and dependencies for depth processing

Features Added

PromptDADepthEstimator class for ONNX-based depth estimation
Purple brain icon toggle in UI to enable/disable PromptDA
Model file model_fp16.onnx for depth processing
Integration with onnxruntime via Swift Package Manager
Documentation in README_PromptDA.md

Test plan

Verify app builds successfully with new dependencies
Test PromptDA toggle switches between depth estimation methods
Confirm depth capture works with both ARKit and PromptDA modes
Check that saved depth maps are correctly processed

🤖 Generated with Claude Code

- Add PromptDADepthEstimator class for ONNX-based depth estimation - Include model_fp16.onnx model file - Add PromptDA toggle in UI with purple brain icon - Update ARViewModel to support switching between ARKit and PromptDA depth - Add onnxruntime dependencies via Swift Package Manager - Add README_PromptDA.md documentation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

gemini-code-assist

Summary of Changes

Hello @ioridev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's depth sensing capabilities by integrating PromptDA, a machine learning-based depth estimation model. This provides users with an alternative to the standard ARKit depth, offering more flexibility and potentially different depth capture characteristics. The changes include adding the necessary model and dependencies, implementing the core estimation logic, and providing a user-friendly toggle in the application's interface.

Highlights

New Depth Estimation Method: Integrated PromptDA, an alternative depth estimation method, allowing the application to use a machine learning model for depth sensing in addition to ARKit's built-in capabilities.
Model Integration: Added the model_fp16.onnx ONNX model and the onnxruntime-swift-package-manager dependency to facilitate the PromptDA depth estimation process.
New Class for Depth Estimation: Introduced the PromptDADepthEstimator class, which encapsulates the logic for preparing inputs, running ONNX inference, and processing the model's output into a depth map.
User Interface Control: Added a new toggle button in the UI (represented by a purple brain icon) that allows users to easily switch between using ARKit's depth sensing and the new PromptDA depth estimation.
Documentation: Included a new README_PromptDA.md file that provides comprehensive setup instructions, usage details, and debugging tips for the PromptDA integration.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request successfully integrates PromptDA depth estimation, offering an alternative to ARKit. The core logic is well-encapsulated in PromptDADepthEstimator.swift. Key feedback points include addressing a critical force unwrap, ensuring data synchronization for saved images/depth maps, clarifying output resolutions, and adopting more idiomatic Swift error handling. The addition of README_PromptDA.md is a good step for documentation.

gemini-code-assist · 2025-06-18T05:00:35Z

DepthCamera/PromptDADepthEstimator.swift

+            context.render(scaledImage, to: buffer)
+        }
+
+        return resizedBuffer!


Force unwrapping resizedBuffer! can lead to a crash if CVPixelBufferCreate (line 287) fails and resizedBuffer remains nil. It's safer to handle the optional gracefully, for example, by returning nil if buffer creation fails.

Suggested change

return resizedBuffer!

guard let buffer = resizedBuffer else {

print("Error: Failed to create resized pixel buffer in resizePixelBuffer")

return pixelBuffer // Or return nil, depending on desired contract

}

context.render(scaledImage, to: buffer)

return buffer

gemini-code-assist · 2025-06-18T05:00:35Z

DepthCamera/ARViewModel.swift

        latestImage = frame.capturedImage
        let currentTime = CACurrentMediaTime()

        if currentTime - lastDepthUpdate >= depthUpdateInterval {
            lastDepthUpdate = currentTime  // タイマーを更新

-            // DepthMapの処理と表示
-            if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
-                processDepthMap(depthMap)
-            }
-
-            // ConfidenceMapの処理と表示
-            if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
-                processConfidenceMap(confidenceMap)
+            // Use PromptDA for depth estimation if enabled
+            if usePromptDA, let estimator = depthEstimator {
+                if estimator.shouldProcessFrame(timestamp: frame.timestamp) {
+                    print("ARViewModel: Using PromptDA for depth estimation")
+                    // For preview, don't resize to save performance
+                    if let promptDADepth = estimator.estimateDepth(from: frame.capturedImage, lidarDepth: frame.sceneDepth?.depthMap, resizeToOriginal: false) {
+                        latestDepthMap = promptDADepth
+                        print("ARViewModel: PromptDA depth estimation successful")
+                        processDepthMap(promptDADepth)
+
+                        // Also create confidence map from PromptDA
+                        if showConfidenceMap, let confMap = estimator.createConfidenceMap(from: promptDADepth) {
+                            processConfidenceMap(confMap)
+                        }
+                    } else {
+                        print("ARViewModel: PromptDA depth estimation failed, falling back to ARKit depth")
+                        // Fall back to ARKit depth
+                        latestDepthMap = frame.sceneDepth?.depthMap
+                        if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
+                            processDepthMap(depthMap)
+                        }
+                        if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
+                            processConfidenceMap(confidenceMap)
+                        }
+                    }
+                } else {
+                    // Not time to process with PromptDA, use ARKit depth
+                    latestDepthMap = frame.sceneDepth?.depthMap
+                    if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
+                        processDepthMap(depthMap)
+                    }
+                    if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
+                        processConfidenceMap(confidenceMap)
+                    }
+                }
+            } else {
+                // PromptDA disabled or not initialized, use ARKit depth
+                latestDepthMap = frame.sceneDepth?.depthMap
+                if showDepthMap, let depthMap = frame.sceneDepth?.depthMap {
+                    processDepthMap(depthMap)
+                }
+                if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap {
+                    processConfidenceMap(confidenceMap)
+                }
            }
        }


There's a potential synchronization issue when usePromptDA is false. latestImage is updated every frame (line 40), but latestDepthMap (when using ARKit, i.e., !usePromptDA) is updated within the throttled block (e.g., current line 83). If saveDepthMap() is called (line 94), and usePromptDA is false, depthMapToSave (line 109) will use this throttled latestDepthMap.

This can lead to latestImage being more recent than latestDepthMap, resulting in a saved image and depth map that don't perfectly correspond to the same moment in time.

To ensure synchronization, latestDepthMap should be updated with the current frame's ARKit depth (frame.sceneDepth?.depthMap) unconditionally at the beginning of session(_:didUpdate:) if it's intended as the primary source or fallback. PromptDA can then overwrite it if active and successful.

latestImage = frame.capturedImage // Always keep latestDepthMap updated with the most recent ARKit depth. // If PromptDA is active and successful, it will overwrite this later. // This ensures that if PromptDA is off, or fails, or for the LiDAR prompt, // we have the freshest ARKit depth available. if let currentARKitDepth = frame.sceneDepth?.depthMap { latestDepthMap = currentARKitDepth } let currentTime = CACurrentMediaTime() if currentTime - lastDepthUpdate >= depthUpdateInterval { lastDepthUpdate = currentTime // タイマーを更新 // Use PromptDA for depth estimation if enabled if usePromptDA, let estimator = depthEstimator { if estimator.shouldProcessFrame(timestamp: frame.timestamp) { print("ARViewModel: Using PromptDA for depth estimation") // For preview, don't resize to save performance // Pass the most current ARKit depth as the LiDAR prompt if let promptDADepth = estimator.estimateDepth(from: frame.capturedImage, lidarDepth: frame.sceneDepth?.depthMap, resizeToOriginal: false) { latestDepthMap = promptDADepth // PromptDA's result becomes the new latestDepthMap print("ARViewModel: PromptDA depth estimation successful") processDepthMap(promptDADepth) // Also create confidence map from PromptDA if showConfidenceMap, let confMap = estimator.createConfidenceMap(from: promptDADepth) { processConfidenceMap(confMap) } } else { print("ARViewModel: PromptDA depth estimation failed, falling back to ARKit depth") // latestDepthMap already holds current ARKit depth from above if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit) processDepthMap(depthMap) } if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap { processConfidenceMap(confidenceMap) } } } else { // Not time to process with PromptDA, use ARKit depth for display // latestDepthMap remains as is (either previous PromptDA or current ARKit) if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit) processDepthMap(depthMap) } if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap { processConfidenceMap(confidenceMap) } } } else { // PromptDA disabled or not initialized, use ARKit depth (already set in self.latestDepthMap) if showDepthMap, let depthMap = self.latestDepthMap { // Use self.latestDepthMap (current ARKit) processDepthMap(depthMap) } if showConfidenceMap, let confidenceMap = frame.sceneDepth?.confidenceMap { processConfidenceMap(confidenceMap) } } }

gemini-code-assist · 2025-06-18T05:00:35Z

DepthCamera/ARViewModel.swift

+            // Force process the current frame for capture (keep 182x252 resolution)
+            depthMapToSave = estimator.estimateDepth(from: latestImage, lidarDepth: latestDepthMap, resizeToOriginal: false)


The comment on line 100 mentions "keep 182x252 resolution", but the model's input (and likely output) size is 192x256 (height x width) as defined in PromptDADepthEstimator (modelSize = CGSize(width: 256, height: 192)). This seems to be a minor typo in the comment.

More importantly, resizeToOriginal: false (line 101) means that when PromptDA is used for capture, the saved depth map will be at the model's output resolution (e.g., 192x256). If the corresponding RGB image is saved at a higher, original camera resolution, this discrepancy might be unexpected for users. Consider documenting this in README_PromptDA.md.

gemini-code-assist · 2025-06-18T05:00:35Z

DepthCamera/PromptDADepthEstimator.swift

+            throw NSError(domain: "PromptDADepthEstimator", code: -1, 
+                         userInfo: [NSLocalizedDescriptionKey: "model_fp16.onnx not found in bundle"])


Using NSError for custom error conditions (like model not found, ONNX environment failure, etc.) is less idiomatic in Swift compared to defining a custom Error enum. A custom enum (e.g., PromptDAError: Error) would provide better type safety and improve code clarity.

This applies to other NSError instantiations in this file as well (lines 43-44, 241-242, 259-260).

gemini-code-assist · 2025-06-18T05:00:35Z

DepthCamera/PromptDADepthEstimator.swift

+                           attributes as CFDictionary,
+                           &dstBuffer)
+
+        guard let destBuffer = dstBuffer else { return depthBuffer }


If CVPixelBufferCreate for dstBuffer (line 313) fails, dstBuffer will be nil. In this case, the guard let destBuffer = dstBuffer else { return depthBuffer } statement will return the original, unresized depthBuffer.

This might be misleading for the caller, who expects either a resized buffer or an indication of failure (like nil). Consider returning nil if the destination buffer cannot be created.

Suggested change

guard let destBuffer = dstBuffer else { return depthBuffer }

guard let destBuffer = dstBuffer else {

print("Error: Failed to create destination buffer in resizeDepthBuffer")

return nil // Or throw an error

}

- Add NaN and Infinite value checks before UInt8 conversion - Replace invalid depth values with 0.0 to prevent crashes - Objects that are too close will now appear black instead of crashing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Add NaN/Infinite value checks in processDepthMap to prevent crashes - Fix bytesPerRow handling in processDepthMap for correct pixel access - Add 16-bit PNG export option alongside TIFF for better compatibility - Fix preview/save mismatch by using same depth map for both operations - Remove unnecessary depth map regeneration during save - Add debug logging for troubleshooting The main issue was that preview and save were using different depth maps. Preview used the cached latestDepthMap while save was regenerating a new one, causing inconsistencies. Now both use the same cached depth map. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Document the investigation and resolution of depth map preview/save mismatch, including root cause analysis and lessons learned. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Re-enable 90-degree rotation for depth map preview to match expected portrait orientation. PromptDA outputs 252x182 (landscape) which needs rotation for correct display. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

gemini-code-assist bot reviewed Jun 18, 2025

View reviewed changes

ioridev and others added 4 commits June 23, 2025 14:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add PromptDA depth estimation integration #5

Add PromptDA depth estimation integration #5

Uh oh!

ioridev commented Jun 18, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jun 18, 2025

Uh oh!

gemini-code-assist bot Jun 18, 2025

Uh oh!

gemini-code-assist bot Jun 18, 2025

Uh oh!

gemini-code-assist bot Jun 18, 2025

Uh oh!

gemini-code-assist bot Jun 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-        return resizedBuffer!
+        guard let buffer = resizedBuffer else {
+            print("Error: Failed to create resized pixel buffer in resizePixelBuffer")
+            return pixelBuffer // Or return nil, depending on desired contract
+        }
+        context.render(scaledImage, to: buffer)
+        return buffer

		// Force process the current frame for capture (keep 182x252 resolution)
		depthMapToSave = estimator.estimateDepth(from: latestImage, lidarDepth: latestDepthMap, resizeToOriginal: false)

		throw NSError(domain: "PromptDADepthEstimator", code: -1,
		userInfo: [NSLocalizedDescriptionKey: "model_fp16.onnx not found in bundle"])

Add PromptDA depth estimation integration #5

Are you sure you want to change the base?

Add PromptDA depth estimation integration #5

Uh oh!

Conversation

ioridev commented Jun 18, 2025

Summary

Features Added

Test plan

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant