---
title: Reverse Engineering the Vive Facial Tracker
description: An article detailing the reverse engineering process of the Vive Facial Tracker, a VR face-tracking accessory.
authors: dfgHiatus
hide_table_of_contents: false
---
# Reverse Engineering the Vive Facial Tracker
This is the story of how the Vive Facial Tracker, another VR face tracking accessory, was reverse engineered.
Buckle in.
{/* truncate */}
# The Vive Facial Tracker
Some context. The Vive Facial Tracker (VFT) was a VR accessory released on March 24th, 2021. Worn underneath a VR headset, it captures camera images of a user's lower face, and using an in-house AI ([SRanipal](https://docs.vrcft.io/docs/hardware/VIVE/sranipal)) converts expressions into data other programs can understand.
Sidenote here: it's really hard to describe the *impact* VR face tracking has had on the entirety of Social VR, at least in my experience. It's a completely different degree of immersion; you have to see it in person.
The VFT currently has integrations for VRChat (via [VRCFaceTracking](https://github.com/benaclejames/VRCFaceTracking)), Resonite and ChilloutVR (natively). Here is a video of it in use in VRChat:
<iframe width="560" height="315" src="https://www.youtube.com/embed/F_ptjZ8Dl5E?si=sJ5ptM9EwKGZNEh3" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
Unfortunately, the VFT has been discontinued. [You can't even see it on Vive's own store anymore](https://www.vive.com/us/accessory/). This accessory cost ~$150 when it came out, and I've seen it scalped on eBay for in excess of $1,000. So, 4 years after its launch, the VFT is more in demand than ever.
# Understanding the VFT's hardware
Before we can begin to extract camera images from the VFT, we need to understand its underlying hardware.
## Camera(s)
The VFT consists of two [OV6211 image sensors](https://www.ovt.com/products/ov6211/) and an [OV00580-B21G-1C](https://www.ovt.com/products/ov580/) image signal processor, both from OmniVision. The cameras record at 400px\*400px at 60Hz; their images are then shrunk to 200px\*400px and put side by side, for a combined final image of 400px\*400px. Below is the "raw" (in quotation marks!!) camera image from the VFT:
:::note
In SRanipal, these separate images are used to compute [disparity](https://en.wikipedia.org/wiki/Binocular_disparity), i.e. how close an object is to the camera. This is useful for expressions in which parts of the face are closer to the camera, such as `Jaw_Forward`, `Tongue_LongStep1` and `Tongue_LongStep2` (a quick numeric sketch follows this note).
:::
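
For intuition, here's the textbook pinhole-stereo relation at work. This is not pulled from SRanipal, and every number below is a made-up placeholder, not a measurement of the VFT:

```python
# Textbook stereo relation: with focal length f (pixels), baseline B (distance between
# the two sensors) and disparity d (pixels), depth is roughly Z = f * B / d.
# All values here are hypothetical placeholders, not VFT measurements.
f_px = 300.0         # focal length in pixels (made up)
baseline_m = 0.02    # separation between the two sensors in meters (made up)
disparity_px = 12.0  # horizontal offset of a feature between the two views

depth_m = f_px * baseline_m / disparity_px  # ~0.5 m for these numbers
```

The takeaway: the closer something (like a pushed-forward jaw or an extended tongue) gets to the cameras, the larger the disparity between the two views.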
An IR light source is used to illuminate the face of the user. The cameras do not record color information per se, but rather the luminance of the IR light.
Moreover, this is the output of Video4Linux's `v4l2-ctl --list-devices`:
From above, we can see the VFT provides YUV422 images.
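
If you want to poke at this yourself, here's a minimal sketch of grabbing raw frames with OpenCV. The `/dev/video0` path is an assumption (check `v4l2-ctl --list-devices` for the VFT's actual node), and whether your OpenCV build honors these capture properties can vary:

```python
# Minimal sketch: open the VFT's video node and ask for raw YUYV frames.
# "/dev/video0" is an assumption -- find the real node with `v4l2-ctl --list-devices`.
import cv2

cap = cv2.VideoCapture("/dev/video0", cv2.CAP_V4L2)
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"YUYV"))
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)  # hand back the raw buffer, skip BGR conversion

ok, frame = cap.read()
if ok:
    print(frame.shape, frame.dtype)  # raw YUV422 data, not a normal BGR image
cap.release()
```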
Remember the raw camera image from earlier? Well, that image was provided by the *SRanipal API*. We're querying the device directly, and the VFT *does not actually output a proper YUV image*. Instead, the VFT stores the grayscale image in all 3 image channels. This breaks attempts to interpret the YUV data as RGB; the resulting image can be seen below:
To work around this, we can extract the "Y" channel and use it as a grayscale image. This is pretty fast, as we only need to decode the "4" part of the YUV422 image and ignore the "22" part. For more information on how to convert from the YUV color space to RGB, check out this [Wikipedia article](https://en.wikipedia.org/wiki/Y%E2%80%B2UV#Conversion_to/from_RGB).
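
As a rough sketch of that extraction (assuming you have the packed YUYV bytes for one 400x400 combined frame in hand; buffer layouts from different capture backends may differ):

```python
# Pull the luma ("Y") plane out of a packed YUYV (YUV 4:2:2) frame.
# In packed YUYV the bytes repeat as Y0 U Y1 V, so every other byte is luma.
import numpy as np

WIDTH, HEIGHT = 400, 400  # the VFT's combined side-by-side frame

def yuyv_to_gray(frame_bytes: bytes) -> np.ndarray:
    """Return the Y channel of a packed YUYV frame as a HEIGHT x WIDTH grayscale image."""
    raw = np.frombuffer(frame_bytes, dtype=np.uint8)
    return raw[0::2].reshape(HEIGHT, WIDTH)  # skip the interleaved U/V bytes
```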
## Lights
Interestingly, the "VideoControl Interface Descriptor" guidExtensionCode's value `{2ccb0bda-6331-4fdb-850e-79054dbd5671}` matches the log output of a "ZED2i" camera online. This means the (open-source!) code of Stereolabs's ZED cameras and the VFT share a lot in common, at least USB-wise:
The VFT is a video-type USB device and behaves like one, with one exception: the data stream is not activated by regular means, but has to be controlled through an "Extension Unit".
In general, you have to use `SET_CUR` commands to set camera parameters and enable the camera stream over USB. The VFT uses a fixed-size scratch buffer of 384 bytes for all sending and receiving, with only the relevant command bytes consumed. The rest are disregarded.
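
On Linux, the uvcvideo driver exposes Extension Unit controls through the `UVCIOC_CTRL_QUERY` ioctl, so a `SET_CUR` helper could be sketched like this. The struct layout and constants follow `linux/uvcvideo.h` and `linux/usb/video.h`; the unit ID and selector you pass in come from the device's USB descriptors and are not hard-coded here:

```python
# Sketch: send a SET_CUR request to a UVC Extension Unit via the uvcvideo driver.
# Constants and struct layout mirror linux/uvcvideo.h and linux/usb/video.h.
import ctypes
import fcntl

UVC_SET_CUR = 0x01
UVCIOC_CTRL_QUERY = 0xC0107521  # _IOWR('u', 0x21, struct uvc_xu_control_query), 64-bit
BUFFER_SIZE = 384               # the VFT's fixed-size scratch buffer

class uvc_xu_control_query(ctypes.Structure):
    _fields_ = [
        ("unit", ctypes.c_uint8),
        ("selector", ctypes.c_uint8),
        ("query", ctypes.c_uint8),
        ("size", ctypes.c_uint16),
        ("data", ctypes.POINTER(ctypes.c_uint8)),
    ]

def xu_set_cur(fd: int, unit: int, selector: int, payload: bytes) -> None:
    """Pad `payload` to the 384-byte scratch buffer and send it as SET_CUR."""
    buf = (ctypes.c_uint8 * BUFFER_SIZE)(*payload.ljust(BUFFER_SIZE, b"\x00"))
    query = uvc_xu_control_query(unit, selector, UVC_SET_CUR, BUFFER_SIZE,
                                 ctypes.cast(buf, ctypes.POINTER(ctypes.c_uint8)))
    fcntl.ioctl(fd, UVCIOC_CTRL_QUERY, query)
```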
Camera parameters are set using the `0xab` request ID. Analyzing the protocol, there are 11 registers touched by SRanipal. The ZED2i in particular lists 6 parameters to control exposure and gain:
- ADDR_EXP_H
- ADDR_EXP_M
- ADDR_EXP_L
- ADDR_GAIN_H
- ADDR_GAIN_M
- ADDR_GAIN_L
Some testing reveals they most likely map like this to the VFT:
| Register | Value | Description |
|----------|---------|-------------|
|`x07`|`xb2`| gain low |
|`x0f`|`x03`||
The values on the left are the registers, and the values on the right are those set by SRanipal. Testing different values produced worse results, so the values used by SRanipal seem to be the best choice. What the other parameters do is unknown.
The `x14` request is the one that enables and disables the data stream. Hence, the camera parameters have to be set first, then the stream can be enabled.
Once the data stream is enabled, the camera streams data in the YUV422 format using regular USB video device streaming.
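
Putting that ordering together with the helper sketched earlier: write the camera registers first, then flip the stream on. To be clear, the payload byte layouts below (request ID followed by register and value) and the unit/selector IDs are illustrative guesses, not a dump of the actual SRanipal traffic:

```python
# Illustrative ordering only -- payload layouts, unit ID and selector are assumptions.
import os

XU_UNIT_ID = 4    # placeholder: read the real bUnitID from the USB descriptors
XU_SELECTOR = 2   # placeholder selector for the Extension Unit control

fd = os.open("/dev/video0", os.O_RDWR)  # the VFT's node; path is an assumption

# 1. Set camera parameters with the 0xab request (register, value pairs).
xu_set_cur(fd, XU_UNIT_ID, XU_SELECTOR, bytes([0xab, 0x07, 0xb2]))  # e.g. gain low

# 2. Only then enable the data stream with the 0x14 request.
xu_set_cur(fd, XU_UNIT_ID, XU_SELECTOR, bytes([0x14, 0x01]))

os.close(fd)
```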
:::note
One small caveat: Windows doesn't offer simple access to USB devices the way Linux does. Thankfully, instead of using `v4l2`, we can use [`pygrabber`](https://github.com/andreaschiavinato/python_grabber) when need be (see the sketch just after this note).
:::
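
For example, a quick way to check that Windows sees the tracker at all, assuming pygrabber's `FilterGraph` API (the device name the VFT reports is a guess on my part):

```python
# List DirectShow capture devices on Windows and look for the tracker.
# The exact name the VFT shows up under is an assumption here.
from pygrabber.dshow_graph import FilterGraph

devices = FilterGraph().get_input_devices()
for index, name in enumerate(devices):
    print(index, name)

# Pick whichever entry matches the VFT and hand its index to your capture code.
```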
## Action
Now that we have a camera image from the VFT, we just need to pass it to the Babble App. Of course, we need to merge and postprocess the left and right images:
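
The actual postprocessing lives in the branch linked below, but as a rough sketch of the idea (split the side-by-side frame back into halves and blend them into one image; the averaging here is my stand-in, not necessarily what the branch does):

```python
# Rough sketch: split the 400x400 side-by-side grayscale frame and blend the halves.
# Averaging the two views is a stand-in for the real postprocessing.
import numpy as np

def merge_vft_frame(gray: np.ndarray) -> np.ndarray:
    """gray is the 400x400 grayscale frame; returns a single 400x200 image."""
    left, right = gray[:, :200], gray[:, 200:]
    return ((left.astype(np.uint16) + right.astype(np.uint16)) // 2).astype(np.uint8)
```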
If you'd like to mess around with it yourself, feel free to check out the branch for it [here](https://github.com/Project-Babble/ProjectBabble/pull/82).
# Conclusions, Reflections
I want to give a shoutout to DragonLord again for providing the code for the VFT, as well as making it available for the Babble App. I would also like to thank my teammates Summer and Rames, as well as Aero for QA'ing this post.
If you're interested in a Babble Tracker, we're looking to restock sometime later this March, maybe April if things go slowly. We'll make an announcement when we re-open sales; you can follow us on Twitter or join our Discord to stay up to date on all things Babble!