
Commit 8364141

(tweak) Even more copy
1 parent 261f845 commit 8364141

12 files changed, +1223 -439 lines changed


blog/authors.yml (+2 -2)

@@ -1,5 +1,5 @@
 dfgHiatus:
-  name: Benjamin M. Evans
+  name: dfgHiatus
   title: Maintainer of the Project Babble docs
   url: https://github.com/dfgHiatus
-  image_url: https://avatars.githubusercontent.com/u/51272212?s=400&u=cff33bfb7d514d6b56bd99c08bd50402a7189c56&v=4
+  image_url: https://avatars.githubusercontent.com/u/51272212?s=400&u=cff33bfb7d514d6b56bd99c08bd50402a7189c56&v=4

blog/reverse-engineering-the-vive-facial-tracker.mdx (+73 -22)
@@ -1,5 +1,8 @@
 ---
 title: Reverse Engineering the Vive Facial Tracker
+description: An article detailing the reverse engineering process of the Vive Facial Tracker, a VR face-tracking accessory.
+authors: dfgHiatus
+hide_table_of_contents: false
 ---
 
 # Reverse Engineering the Vive Facial Tracker
@@ -14,32 +17,41 @@ This is the story of how the Vive Facial Tracker, another VR face tracking acces
 
 Buckle in.
 
+{/* truncate */}
+
 # The Vive Facial Tracker
 
 Some context. The Vive Facial Tracker (VFT) was a VR accessory released on March 24th, 2021. Worn underneath a VR headset, it captures camera images of a user's lower face and, using an in-house AI ([SRanipal](https://docs.vrcft.io/docs/hardware/VIVE/sranipal)), converts expressions into a format other programs can understand.
 
-The VFT currently has integrations for VRChat (via [VRCFaceTracking](https://github.com/benaclejames/VRCFaceTracking)), Resonite and ChilloutVR (natively). Here is a video of it in use:
+import ImageGallery from '@site/src/components/ImageGallery/ImageGallery';
 
-:::note
-Sidenote here, it's really hard to describe the *impact* VR face tracking has had on the entirety of Social VR, at least in my experience. It's a completely different degree of immersion, you have to see it in person.
-:::
+<ImageGallery images={[
+  "/blog/reverse-engineering-the-vive-facial-tracker/vft-front.jpg",
+  "/blog/reverse-engineering-the-vive-facial-tracker/vft-back.jpg",
+  "/blog/reverse-engineering-the-vive-facial-tracker/vive.webp",
+]} />
+
+The VFT currently has integrations for VRChat (via [VRCFaceTracking](https://github.com/benaclejames/VRCFaceTracking)), Resonite and ChilloutVR (natively). Here is a video of it in use in VRChat:
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/F_ptjZ8Dl5E?si=sJ5ptM9EwKGZNEh3" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
 
-Unfortunately, the VFT has been discontinued. [You can't even see it on Vive's own store anymore](https://www.vive.com/us/accessory/). Even worse, it's being scalped on eBay in excess of $1,000. Remember, this accessory cost ~$150 when it came out!!
+Unfortunately, the VFT has been discontinued. [You can't even see it on Vive's own store anymore](https://www.vive.com/us/accessory/). This accessory cost ~$150 when it came out, and I've seen it scalped on eBay in excess of $1,000. Four years after its launch, the VFT is in more demand than ever.
 
 # Understanding the VFT's hardware
 
 Before we can begin to extract camera images from the VFT, we need to understand its underlying hardware.
 
-## Camera
+## Camera(s)
 
-The VFT consists of two [OV6211 image sensors](https://www.ovt.com/products/ov6211/) and an image signal processor [OV00580-B21G-1C](https://www.ovt.com/products/ov580/) from OmniVision. The cameras record at 400px\*400px at 60Hz, their images are then shrunk to 200px\*400px then put side by side, for a combined final image 400px\*400px.
+The VFT consists of two [OV6211 image sensors](https://www.ovt.com/products/ov6211/) and an image signal processor [OV00580-B21G-1C](https://www.ovt.com/products/ov580/) from OmniVision. The cameras record at 400px\*400px at 60Hz; their images are then shrunk to 200px\*400px and placed side by side, for a combined final image of 400px\*400px. Below is the "raw" (in quotation marks!!) camera image from the VFT:
+
+![Camera](/blog/reverse-engineering-the-vive-facial-tracker/hand.PNG)
 
 :::note
 In SRanipal, these separate images are used to compute [disparity](https://en.wikipedia.org/wiki/Binocular_disparity), i.e. how close an object is to the camera. This is useful for expressions in which parts of the face are closer to the camera, such as `Jaw_Forward`, `Tongue_LongStep1` and `Tongue_LongStep2`.
 :::
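For intuition (this is textbook stereo geometry, not something stated in the commit): with focal length f, baseline B between the two sensors, and measured disparity d, the depth of a point is

```latex
% standard pinhole stereo relation; a larger disparity means a closer point
Z = \frac{f \cdot B}{d}
```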
 
-An IR light source is used to illuminate the face of the user. The cameras do not record color information per se but rather the luminance of the IR light. This has a direct influence on the image format.
-
+An IR light source is used to illuminate the face of the user. The cameras do not record color information per se, but rather the luminance of the IR light.
 
 Moreover, this is the output of Video4Linux's `v4l2-ctl --list-devices`:
 
@@ -66,9 +78,11 @@ ioctl: VIDIOC_ENUM_FMT
 
 From above, we can see the VFT provides YUV422 images.
 
-Funnily enough though, the VFT *does not actually output a proper YUV image*. Instead, the VFT stores the grayscale image in all 3 image channels. This breaks trying to convert YUV into RGB, the resulting image can be seen below:
+Remember the raw camera image from earlier? Well, that image was provided by the *SRanipal API*. Here we're querying the device directly, and the VFT *does not actually output a proper YUV image*. Instead, the VFT stores the grayscale image in all 3 image channels. This breaks when trying to interpret YUV as RGB; the resulting image can be seen below:
 
-To workaround this, we can extract the "Y" channel use it as a grayscale image. This is pretty fast, as we only need to decode the "4" part of the YUV422 image and ignore the "22" part.
+![YUV](/blog/reverse-engineering-the-vive-facial-tracker/YUV.PNG)
+
+To work around this, we can extract the "Y" channel and use it as a grayscale image. This is pretty fast, as we only need to decode the "4" part of the YUV422 image and ignore the "22" part. For more information on how to convert from the YUV color space to RGB, check out this [Wikipedia article](https://en.wikipedia.org/wiki/Y%E2%80%B2UV#Conversion_to/from_RGB).
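To make that concrete, here is a minimal sketch of my own (not code from this commit) that pulls the luminance plane out of a packed YUYV422 buffer, assuming the usual Y0 U Y1 V byte order and the VFT's 400x400 frame size:

```python
import numpy as np

def yuyv_to_gray(raw: bytes, width: int = 400, height: int = 400) -> np.ndarray:
    """Extract the Y (luminance) plane from a packed YUYV422 frame."""
    # Each pixel occupies 2 bytes: the first is Y (the "4"), the second alternates U/V (the "22").
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 2)
    return frame[:, :, 0].copy()  # keep the grayscale image, skip the chroma bytes entirely
```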
 
 ## Lights
 
@@ -288,25 +302,25 @@ Device Status: 0x0000
   (Bus Powered)
 ```
 
-Interestingly, the "VideoControl Interface Descriptor" guidExtensionCode's value `{2ccb0bda-6331-4fdb-850e-79054dbd5671}` matches the log output of a "ZED2i" camera online. This means the (open-source!) code of stereolabs's ZED cameras and the VIVE Facial Tracker share a lot in common, at least USB-wise:
+Interestingly, the "VideoControl Interface Descriptor" guidExtensionCode's value `{2ccb0bda-6331-4fdb-850e-79054dbd5671}` matches the log output of a "ZED2i" camera online. This means the (open-source!) code of Stereolabs's ZED cameras and the VFT share a lot in common, at least USB-wise:
 
 - https://github.com/stereolabs/zed-open-capture/blob/5cf66ff777175776451b9b59ecc6231d730fa202/src/videocapture.cpp
 
 ### USB Protocol / Extension Unit
 
-The VIVE Facial Tracker is a video type USB device and behaves like one, with an exception. The data stream is not activated using the regular means but has to be activated using the "Extension Unit". Basically the VIVE Facial Tracker is controlled by sending commands to this "Extension Unit".
+The VFT is a video-type USB device and behaves like one, with an exception. The data stream is not activated by regular means, but has to be controlled with an "Extension Unit".
 
-In general you have to use SET_CUR commands to set camera parameters and to enable the camera stream. The device uses a fixed size scratch buffer of 384 bytes for all sending and receiving. Only the relevant command bytes are actually consumed while the rest is disregarded.
+In general, you have to use `SET_CUR` commands to set camera parameters and enable the camera stream over USB. The VFT uses a fixed-size scratch buffer of 384 bytes for all sending and receiving, with only the relevant command bytes consumed. The rest are disregarded.
 
-Camera parameters are set using the `0xab` request id. Analyzing the protocol there are 11 registers touched by the original SRanipal software. The ZED2i lists in particular 6 parameters to control exposure and gain:
+Camera parameters are set using the `0xab` request ID. Analyzing the protocol shows 11 registers touched by SRanipal. The ZED2i lists in particular 6 parameters to control exposure and gain:
 - ADDR_EXP_H
 - ADDR_EXP_M
 - ADDR_EXP_L
 - ADDR_GAIN_H
 - ADDR_GAIN_M
 - ADDR_GAIN_L
 
-Using some testing they most probably map like this to the VIVE Facial Tracker:
+Some testing reveals they most likely map like this to the VFT:
 
 | Register | Value | Description |
 |----------|---------|-------------|
@@ -322,23 +336,60 @@ Using some testing they most probably map like this to the VIVE Facial Tracker:
 | `x07` | `xb2` | gain low |
 | `x0f` | `x03` | |
 
-The values on the left side are the registers and the value on the right side is the value set by SRanipal. Testing different values produced worse results so the values used by SRanipal seem to be the best choice. What the other parameters dso is unknown.
+The values on the left side are the registers, and the values on the right side are the values set by SRanipal. Testing different values produced worse results, so the values used by SRanipal seem to be the best choice. What the other parameters do is unknown.
 
-The `x14` request is the one enabling and disabling the data stream. Hence first the camera parameters have to be set then the stream has to be enabled.
+The `x14` request is the one that enables and disables the data stream. Hence, the camera parameters have to be set first, then the stream can be enabled.
 
 Once the data stream is enabled, the camera streams data in the YUV422 format using regular USB video device streaming.
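To illustrate what driving that Extension Unit could look like from userspace, here is a rough pyusb sketch of my own, not the code this commit ships. The `SET_CUR` framing (bmRequestType `0x21`, bRequest `0x01`, selector in the high byte of wValue, unit ID and interface in wIndex) follows the UVC spec; the unit ID, selector, vendor ID and the byte layout inside the 384-byte buffer are assumptions for illustration only:

```python
import usb.core

SET_CUR = 0x01   # UVC class-specific request code
BUF_LEN = 384    # the fixed-size scratch buffer described above

def xu_set_cur(dev, interface: int, unit_id: int, selector: int, payload: bytes) -> None:
    """Send a SET_CUR command to a UVC Extension Unit control."""
    buf = bytearray(BUF_LEN)
    buf[:len(payload)] = payload      # only the leading command bytes are consumed
    dev.ctrl_transfer(
        0x21,                         # host-to-device | class request | recipient: interface
        SET_CUR,
        selector << 8,                # control selector in the high byte of wValue
        (unit_id << 8) | interface,   # extension unit ID + VideoControl interface number
        buf,
    )

# Hypothetical usage; unit_id, selector and the command byte layout are placeholders:
# dev = usb.core.find(idVendor=0x0bb4)                   # 0x0bb4 is HTC's vendor ID
# xu_set_cur(dev, 0, 4, 2, bytes([0xab, 0x07, 0xb2]))    # 0xab request: set register x07 (gain low) to xb2
# xu_set_cur(dev, 0, 4, 2, bytes([0x14, 0x01]))          # x14 request: enable the data stream
```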
 
-### Windows
-
+:::note
 One small caveat: Windows has no simple access to USB devices as Linux does. Thankfully, instead of using `v4l2`, we can use [`pygrabber`](https://github.com/andreaschiavinato/python_grabber) when need be.
+:::

335349
## Action
336350

337-
Now that we have a camera image from the VFT, we just need to pass it to the Babble App. That's all it takes! Here's a video of the VFT in use with the Babble App, if you'd like to mess around with it yourself, feel free to check out the branch for it here.
351+
Now that we have a camera image from the VFT, we just need to pass it to the Babble App. Of course, we need to merge and postprocess the left and right images:
352+
353+
```python
354+
...
355+
def process_frame(self: 'ViveTracker', data: np.ndarray) -> np.ndarray:
356+
"""Process a captured frame.
357+
Right now this applies a median blur but other manipulations
358+
are possible to improve the image if desired.
359+
Keyword arguments:
360+
data --- Frame to process
361+
"""
362+
lum = cv.split(data)[0]
363+
364+
"""
365+
gamma = 2.2
366+
inv_gamma = 1.0 / gamma
367+
lut = np.array([((i / 255.0) ** inv_gamma) * 255
368+
for i in np.arange(0, 256)]).astype("uint8")
369+
lum = cv.LUT(lum, lut)
370+
"""
371+
372+
lum = lum[:, 0:200]
373+
374+
lum = cv.resize(lum, (400, 400))
375+
376+
"""
377+
lum = cv.medianBlur(lum, 5)
378+
"""
379+
380+
return cv.merge((lum, lum, lum))
381+
...
382+
```
383+
384+
That's all it takes! Here's an image of the VFT in use with the Babble App:
385+
386+
![Babble App](/blog/reverse-engineering-the-vive-facial-tracker/app.PNG)
387+
388+
If you'd like to mess around with it yourself, feel free to check out the branch for it [here](https://github.com/Project-Babble/ProjectBabble/pull/82).
338389

339390
# Conclusions, Reflections
340391

341-
I want to give a shoutout to DragonLord for providing the code the VFT as well as making it available for the Babble App. I would also like to thank my teammates Summer and Rames, as well as Aero for QA'ing this here too.
392+
I want to give a shoutout to DragonLord again for providing the code the VFT as well as making it available for the Babble App. I would also like to thank my teammates Summer and Rames, as well as Aero for QA'ing this here too.
342393

343394
If you're interested in a Babble Tracker we're looking to restock sometime later this March, maybe April if things go slowly. We'll make an announcement when we re-open sales, you can follow us on Twitter or join or Discord to stay up to date on all things Babble!
344395

package.json (+5 -4)

@@ -14,20 +14,21 @@
     "write-heading-ids": "docusaurus write-heading-ids"
   },
   "dependencies": {
-    "@docusaurus/core": "^3.6.1",
-    "@docusaurus/preset-classic": "^3.6.1",
+    "@docusaurus/core": "^3.7.0",
+    "@docusaurus/preset-classic": "^3.7.0",
     "@mdx-js/react": "^3.0.0",
     "clsx": "^2.0.0",
     "online-3d-viewer": "^0.12.0",
     "prism-react-renderer": "^2.3.0",
     "react": "^18.0.0",
     "react-dom": "^18.0.0",
+    "react-horizontal-scrolling-menu": "^8.2.0",
     "react-svgmt": "^2.0.2",
     "yarn": "^1.22.22"
   },
   "devDependencies": {
-    "@docusaurus/module-type-aliases": "^3.6.1",
-    "@docusaurus/types": "^3.6.1"
+    "@docusaurus/module-type-aliases": "^3.7.0",
+    "@docusaurus/types": "^3.7.0"
   },
   "browserslist": {
     "production": [
src/components/ImageGallery/ImageGallery.css (new file, +43)

@@ -0,0 +1,43 @@
+.gallery-container {
+  display: flex;
+  padding-block: 24px;
+}
+
+.gallery-item {
+  cursor: pointer;
+  margin-right: 10px;
+}
+
+.gallery-item img {
+  width: 200px;
+  border-radius: 8px;
+  transition: transform 0.2s;
+}
+
+.gallery-item img:hover {
+  transform: scale(1.05);
+}
+
+.modal-overlay {
+  position: fixed;
+  top: 0;
+  left: 0;
+  width: 100vw;
+  height: 100vh;
+  background: rgba(0, 0, 0, 0.8);
+  display: flex;
+  justify-content: center;
+  align-items: center;
+  z-index: 1000;
+}
+
+.modal-content {
+  padding: 48px;
+
+  & img {
+    object-fit: contain;
+    height: 80vh;
+    border-radius: 10px;
+  }
+}
src/components/ImageGallery/ImageGallery.js (new file, +34)

@@ -0,0 +1,34 @@
+import React, { useState } from "react";
+import { ScrollMenu } from "react-horizontal-scrolling-menu";
+import "react-horizontal-scrolling-menu/dist/styles.css";
+import "./ImageGallery.css";
+
+const ImageGallery = ({ images = [] }) => {
+  const [selectedImage, setSelectedImage] = useState(null);
+
+  const openModal = (src) => setSelectedImage(src);
+  const closeModal = () => setSelectedImage(null);
+
+  return (
+    <div className="gallery-container">
+      <ScrollMenu>
+        {images.map((src, index) => (
+          <div key={index} className="gallery-item" onClick={() => openModal(src)}>
+            <img src={src} alt={`Gallery Image ${index + 1}`} />
+          </div>
+        ))}
+      </ScrollMenu>
+
+      {/* Modal for Enlarged View */}
+      {selectedImage && (
+        <div className="modal-overlay" onClick={closeModal}>
+          <div className="modal-content">
+            <img src={selectedImage} alt="Enlarged view" />
+          </div>
+        </div>
+      )}
+    </div>
+  );
+};
+
+export default ImageGallery;
Binary file not shown.
