Nvidia's gen-AI facial animation tech is going open source, potentially making it much easier for game devs to turn real-time speech into convincing characters

Nvidia has announced that its Audio2Face animation technology is going open source. On paper, that should make it much easier for a wide range of game developers to create AI characters with convincing facial expressions, including during real-time conversations with gamers.

To recap, in Nvidia's own words, "by using large language and speech models, generative AI is creating intelligent 3D avatars that can engage users in natural conversation, from video games to customer service. To make these characters truly lifelike, they need human-like expressions."

Enter Nvidia's Audio2Face. "Audio2Face accelerates the creation of realistic digital characters by providing real-time facial animation and lip-sync driven by generative AI," Nvidia says.

"Audio2Face uses AI to generate realistic facial animations from audio input. It works by analyzing acoustic features like phonemes and intonation to create a stream of animation data, which is then mapped to a character's facial poses. This data can be rendered offline for pre-scripted content or streamed in real-time for dynamic, AI-driven characters, providing accurate lip-sync and emotional expressions."

In short, by open sourcing Audio2Face, Nvidia says it hopes to "accelerate adoption of AI-powered avatars in games and 3D applications."

Audio2Face is part of Nvidia's broader ACE platform, which is all about creating more convincing digital human avatars. Our own Jacob R sampled it last year and came away impressed. Fuelled by some LLM-generated responses, Jacob found the end result "frighteningly good."

The knee bone's connected to the LLM-powered-emotion-engine face bone... (Image credit: Nvidia)

The only really obvious giveaway that you were dealing with an early, experimental system was the slight delay in responses, which made for "awkward pauses" in conversation.

In terms of what's being released in open source form, we're talking the Audio2Face SDK, audio plugins for inputting voice streams, training frameworks, sample training data, a library of facial models and a dedicated Unreal Engine 5 plugin. The open source release also includes Audio2Emotion models, which can "infer" a speaker's emotional state from audio in real time.
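For a sense of what "inferring" emotion from audio might look like in practice, here's a tiny, hedged Python sketch: a heuristic classifier consuming a simulated live stream in 100 ms chunks. The shipped Audio2Emotion models are trained neural networks, and nothing below reflects the SDK's actual interface; the labels, function name and heuristic are all assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of real-time emotion inference from audio. A crude
# energy/variance heuristic stands in for a trained classifier; these
# names are NOT the Audio2Emotion SDK's actual API.

EMOTIONS = ("neutral", "happy", "angry", "sad")

def infer_emotion(chunk: np.ndarray) -> dict:
    """Return a probability distribution over emotions for one audio chunk."""
    energy = float(np.abs(chunk).mean())
    variance = float(chunk.var())
    logits = np.array([1.0, 4.0 * energy, 4.0 * variance, 1.0 - energy])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                    # softmax over the four labels
    return {e: round(float(p), 3) for e, p in zip(EMOTIONS, probs)}

# Simulate a live voice stream as 100 ms chunks of 16 kHz audio.
for i in range(3):
    chunk = 0.2 * (i + 1) * np.random.randn(1_600).astype(np.float32)
    print(f"chunk {i}:", infer_emotion(chunk))
```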

Nvidia says game devs already using Audio2Face include Codemasters, GSC Game World, NetEase and Perfect World Games, while ISVs include Convai, Inworld AI, Reallusion, Streamlabs, and UneeQ.

Of course, the catch to all this is that Nvidia's broader ACE platform is, inevitably, tied at least to some extent to Nvidia's GPUs, although as we understand it there aren't any obvious technical reasons why ACE features shouldn't run on non-Nvidia GPUs.

But like so many exciting technologies from Nvidia, part of the reason they exist seems to be to push gamers onto Nvidia GPUs, or to keep them there if they're already on Nvidia hardware. Thus Nvidia's default position is to make these features Nvidia-only and leave the likes of AMD playing catch-up. It was ever thus.
