Immersive Web, WebGPU and WebNN

A progress report on my work on realising a vision for the Immersive Web [1]. I started with experiments using WebGL, a long-established API for creating 3D applications in web pages. More recently, I switched to WebGPU, the modern successor to WebGL, which offers better performance along with compute shaders. See [2] for an explanation and links to the demos. Note that most, but not all, browsers fully support WebGPU.
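
As a quick illustration of that caveat, a page can feature-detect WebGPU before committing to it. This minimal sketch uses the standard navigator.gpu entry point and falls back (e.g. to WebGL) when it is absent:

    // Feature-detect WebGPU and fall back gracefully (a minimal sketch).
    async function getGPUDevice() {
      if (!navigator.gpu) {
        console.warn("WebGPU is not supported; fall back to WebGL");
        return null;
      }
      const adapter = await navigator.gpu.requestAdapter();
      if (!adapter) {
        console.warn("No suitable GPU adapter found");
        return null;
      }
      return await adapter.requestDevice();
    }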

I am now experimenting with WebNN, a relatively new browser API for efficient access to tensor processing hardware. The aim is to provide real-time facial reenactment on everyday devices: in other words, to ensure that, when you meet in immersive extended reality environments, your chosen avatar's facial expressions match your own as captured by your laptop's or phone's video camera.
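
For anyone wanting to experiment, the entry point is navigator.ml. Here is a hedged sketch of feature detection and context creation; the API is still maturing, so treat the details as provisional:

    // A minimal sketch of obtaining a WebNN context.
    async function getMLContext() {
      if (!("ml" in navigator)) {
        console.warn("WebNN is not supported in this browser");
        return null;
      }
      // Context options (e.g. preferred device or power profile) exist
      // in the spec but are still in flux, so use the defaults here.
      return await navigator.ml.createContext();
    }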

The plan is to apply federated learning in the browser for crowd-sourced autoregressive training of blend shapes. Blend shapes describe vertex displacements for facial expressions, e.g. the movement of the corners of the mouth. The ambition is for the models to learn blend shapes without needing explicit definitions, and to combine this with albedo and normal maps to reduce polygon counts. The approach uses a combination of WebGPU and WebNN for acceleration.
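
For concreteness, blend shapes combine linearly: each deformed vertex is the neutral position plus a weighted sum of per-shape displacement vectors. A small illustrative sketch (the function and argument names are mine, not from any particular library):

    // Apply blend shape weights to a base mesh (illustrative sketch).
    // basePositions: Float32Array of length 3*N (neutral face vertices)
    // blendShapes:   array of Float32Array, each 3*N (vertex displacements)
    // weights:       one number per blend shape, typically in [0, 1]
    function applyBlendShapes(basePositions, blendShapes, weights) {
      const out = Float32Array.from(basePositions);
      for (let s = 0; s < blendShapes.length; s++) {
        const w = weights[s];
        if (w === 0) continue;  // skip inactive shapes
        const delta = blendShapes[s];
        for (let i = 0; i < out.length; i++) {
          out[i] += w * delta[i];
        }
      }
      return out;
    }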

The WebNN API is designed for inference, not training. My solution is to start with a simple syntax for describing neural network models, which is then processed to generate the JavaScript WebNN code for a) running the model forward, and b) back-propagating the loss by mapping the model to its inverse, so that the parameter updates are computed in a forward pass through the inverse model. This avoids reliance on huge libraries like TensorFlow.js and ONNX Runtime Web.
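
To make the inverse-model idea concrete, consider a single linear layer y = W x. Its gradients are themselves just matrix products, dL/dW = dy · xᵀ and dL/dx = Wᵀ · dy, so they can be computed as a forward pass through a second WebNN graph. The following is a hedged sketch of the kind of code the generator might emit; the builder calls follow the WebNN spec, but the overall shape is my assumption, not actual generated output:

    // Sketch: gradients of a linear layer y = W x, expressed as a WebNN
    // forward graph. Descriptor field names (shape vs. dimensions) have
    // varied across spec drafts, so check your browser's implementation.
    async function buildLinearGradGraph(context, m, n) {
      const builder = new MLGraphBuilder(context);

      // Inputs to the "inverse" graph: the forward inputs/parameters
      // plus the upstream gradient dL/dy.
      const x  = builder.input("x",  { dataType: "float32", shape: [n, 1] });
      const W  = builder.input("W",  { dataType: "float32", shape: [m, n] });
      const dy = builder.input("dy", { dataType: "float32", shape: [m, 1] });

      // dL/dW = dy · xᵀ (outer product) and dL/dx = Wᵀ · dy are both
      // plain matmuls, i.e. an ordinary forward pass through this graph.
      const dW = builder.matmul(dy, builder.transpose(x));
      const dx = builder.matmul(builder.transpose(W), dy);

      // Compile the gradient graph; it is then executed like any other
      // WebNN graph (execution details are still in flux in the spec).
      return await builder.build({ dW, dx });
    }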

A simple model syntax needs to a) declare the tensor shapes and datatypes, b) define how the named tensors are related through standardised operations, and c) supply the initial values for the input tensors. I can use the same names as WebNN for the datatypes and operations, and avoid the need for superfluous quote marks.
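
To illustrate, a tiny model in such a syntax might look like the following. This notation is purely hypothetical, just one possibility for how the three parts could be written:

    # a) tensor declarations: datatype and shape for each named tensor
    input  x  float32 [1, 4]
    param  W  float32 [4, 8]
    param  b  float32 [1, 8]

    # b) how the tensors are related, using WebNN's operation names
    h = relu(add(matmul(x, W), b))

    # c) initial values for the input tensors
    W <- random(-0.1, 0.1)
    b <- 0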

So far I have found Google's Gemini AI to be an excellent research partner. It is good at explaining the concepts, providing examples, and linking to resources. It is less good at providing full working examples, as it tends to get a few things wrong, but then again so do I! Together, we make a good team, and I look forward to exploiting this when I am ready to work on new neural network architectures for the Sentient Web: human inspiration + AI grunt work.

[1] https://www.w3.org/2024/06-Raggett-immersive-web.pdf
[2] https://www.w3.org/2025/webgpu/

Best regards,

Dave Raggett <dsr@w3.org>
