Following the success of the IBC Accelerators Project on synthetic humans in 2023, John Maxwell Hobbs caught up with one of the project leaders to discuss subsequent progress and wider market adoption.
One of the eight IBC Accelerator projects in 2023 looked at ‘synthetic humans.’ A synthetic human differs from a traditional CGI animated character in that it is intended to function in real time in response to a variety of data inputs. Synthetic humans can take several different forms, from a simple text-based chatbot to fully synthesised voices and video personas. For example, the two demos from the Accelerator project brought a historical figure, the opera singer Maria Callas, back to ‘virtual’ life and used voice synthesis to vocalise what a presenter was communicating in British Sign Language.
Michael Davey, Founder and Technologist at Michael Davey Consulting, was project lead on the second workstream looking at presenting weather forecasts in British Sign Language. Discussing industry developments in the area since the project concluded, he says: “The last six months have been marked in the industry with ongoing exploration into making digital humans more relatable and realistic through better understanding of human actions, enabling more intuitive editing of images and videos, and animating fictional characters in innovative ways. This includes efforts to replicate human body language, micro-gestures, facial expressions, and verbal communication, aiming for digital humans to be central in storytelling and creative processes.”
Use cases
These technologies are not limited to research projects. A flood of AI video creation tools has reached the market over the past year. In the run-up to IBC 2023, Apple announced the Vision Pro with its Personas, and Epic, creator of the Unreal Engine CGI system, announced MetaHuman Animator.
Synthetic media companies such as Synthesia, HeyGen, D-ID, Colossyan, Hour One, Elai, Runway, Pictory, Deepbrain AI, InVideo, and Fliki also continue to grow in popularity.
Broadcast media is already finding several use cases for AI-generated humans beyond the obvious film and gaming applications. For example, in early 2023, D-ID and Radio Formula launched three AI newscasters (Nat, Sofi, and Max) in Latin America. In December 2023, entrepreneur Adam Mosam launched Channel 1 AI News in the US, also with digital human newscasters.
Interestingly, the use of virtual news presenters is not new. The first was Ananova, introduced in 2000 by the Press Association in the UK and retired in 2004.
Synthetic humans are increasingly being used as a new form of user interface for a variety of systems. Several companies are developing systems to act as the face and voice of GPT tools such as ChatGPT, particularly in customer service roles, with varying levels of success. Examples include Deutsche Telekom, L’Oreal, Kiehl’s, Vodafone, and UBS.
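As a rough illustration of how such an interface hangs together, the sketch below wires an LLM reply into speech-synthesis and lip-sync stages. This is a minimal sketch, not any vendor's actual API: every function body is a stand-in stub, and in production each stage would call a real chat model, text-to-speech service and avatar renderer.

```python
from dataclasses import dataclass

# Minimal sketch of a synthetic-human "front end" for an LLM-based
# customer-service agent. All three stages are placeholder stubs; the
# function names are hypothetical, not a real vendor API.

@dataclass
class AvatarFrame:
    audio: bytes          # synthesised speech for this utterance
    visemes: list[str]    # mouth shapes driving the facial animation

def generate_reply(user_text: str) -> str:
    """Stage 1: a chat LLM (e.g. a GPT-style model) produces the reply text."""
    return f"Thanks for asking about '{user_text}'. Let me help with that."  # stub

def synthesise_speech(text: str) -> bytes:
    """Stage 2: a text-to-speech service renders the reply as audio."""
    return text.encode()  # stub standing in for real TTS audio

def text_to_visemes(text: str) -> list[str]:
    """Stage 3: map the reply to visemes so the avatar's lips match the audio."""
    return ["AA" if c in "aeiou" else "M" for c in text.lower() if c.isalpha()]

def respond(user_text: str) -> AvatarFrame:
    reply = generate_reply(user_text)
    return AvatarFrame(audio=synthesise_speech(reply),
                       visemes=text_to_visemes(reply))

if __name__ == "__main__":
    frame = respond("When will my order arrive?")
    print(len(frame.visemes), "visemes ready for the avatar renderer")
```

The design point is that the avatar layer is decoupled from the language model: swapping chatbots, voices or renderers only changes one stage of the pipeline.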
Technological advances
A big challenge for the acceptance of synthetic humans is what is known as the ‘uncanny valley.’ In a nutshell, when an artificial human comes close to, but falls just short of, realistic human appearance, viewers experience a strong sense of revulsion. If that ‘valley’ can be crossed and the digital creation becomes almost indistinguishable from a real person, the negative reaction goes away.
Until recently, the techniques used in this field have been based on traditional CGI methods, but that is changing. “A big development is the use of neural radiance fields (NeRFs), Gaussian splatting (GaSp) and spherical harmonics techniques to represent digital humans, rather than the more traditional texture-mapped polygon mesh approach,” explains Davey. “Early indications are that generative AI, polygon 3D engine, and GaSp approaches are starting to merge. This year is looking like it is going to be really interesting for these technologies.”
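To make the contrast with texture-mapped meshes concrete, the sketch below shows the kind of per-splat calculation a Gaussian-splatting renderer performs for view-dependent colour: each splat stores spherical-harmonic (SH) coefficients, and the camera direction is folded in at render time. The coefficient values here are invented for illustration, and sign and basis conventions vary between implementations.

```python
import numpy as np

SH_C0 = 0.28209479177387814   # degree-0 (constant) SH basis term
SH_C1 = 0.4886025119029199    # degree-1 SH basis scale

def sh_to_rgb(sh: np.ndarray, view_dir: np.ndarray) -> np.ndarray:
    """Colour of one splat from degree-1 SH coefficients.
    sh: (4, 3) array, one DC term plus three directional terms per RGB channel.
    view_dir: vector from the camera towards the splat."""
    x, y, z = view_dir / np.linalg.norm(view_dir)
    rgb = (SH_C0 * sh[0]          # base colour, independent of viewpoint
           - SH_C1 * y * sh[1]    # directional terms tilt the colour as
           + SH_C1 * z * sh[2]    # the camera moves around the splat
           - SH_C1 * x * sh[3])
    return np.clip(rgb + 0.5, 0.0, 1.0)   # shift/clamp to a displayable range

coeffs = np.array([[0.8, 0.2, 0.1],   # DC: reddish base colour
                   [0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0],
                   [0.0, 0.0, 0.1]])
print(sh_to_rgb(coeffs, np.array([0.0, 0.0, 1.0])))
```

Higher SH degrees add more directional terms and subtler view-dependent shading; a renderer runs this evaluation for millions of splats per frame on the GPU.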
This area remains very active from a research and development viewpoint, with several significant papers published over the last few months from universities including CMU, the Max Planck Institute for Intelligent Systems, ETH Zurich, Tsinghua University, ShanghaiTech University, the University of Hong Kong, the University of Surrey, and the Technical University of Munich, and from commercial organisations including Synthesia, Meta, Apple, NVIDIA, Flawless AI, Google, and Toyota.
“Meta’s Pixel Codec Avatars approach is really interesting,” adds Davey. “Synthesia and Toyota are amongst the leaders in merging GenAI, GaSp, and 3D engine technologies.
“Furthermore, there are also indications that the wider fields of cloud computing, immersive technologies - spatial computing and XR, with the latter including 3Dweb and metaverse - and accelerated computing (GPUs, NPUs, neuromorphic computing and quantum computing) are converging.”
Real versus synthetic performers
A major point of negotiation in the recent agreement reached in the US between the performers’ union SAG-AFTRA and the Alliance of Motion Picture and Television Producers (AMPTP) concerned what the agreement refers to as ‘Digital Replicas.’
The union draws a strong distinction between what it terms a ‘Digital Replica’ and a ‘Synthetic Performer.’ Essentially, a Digital Replica is a recreation of an actual person’s voice or likeness, while a Synthetic Performer is a ‘wholly digital’ creation that does not resemble an actual performer and is not voiced by a person.
Commenting on the thinking underlying the new agreement, Davey says: “The guilds and unions play a really important role in surfacing the multi-faceted nature of these technologies and starting those conversations around being fairly compensated for the use of their digital likenesses.
“The agreement provides a really useful foundation on which to build out industry standards, commercial models, and agreement frameworks, and those agreements are also providing a starting point for similar discussions in other territories and professions. In turn, with clear rules, technology developers and content producers are more willing to invest in and develop new uses for digital replicas, knowing they are operating within an agreed framework,” he says.
The multidisciplinary nature of IBC’s Accelerators was of significant benefit in addressing these sorts of questions. “We were really fortunate to have HAND (Human & Digital), Signly, and Verizon as part of our accelerator team, guiding us on these issues,” explains Davey. “In fact, HAND hosted a webinar on 7th February of this year with a large panel of industry professionals including Duncan Crabtree-Ireland from SAG-AFTRA, where they explored many of these topics.”
One of the elements necessary for widespread adoption of any technology in broadcasting is the implementation of standardised frameworks, which allow new technology to be integrated with the broad range of existing systems provided by many different suppliers. “There is interesting work going on at Metaverse Standards Forum (MSF), Alliance for OpenUSD (AOUSD), Open Metaverse Interoperability Group (OMI Group) and Academy Software Foundation,” says Davey. “In particular, seeing the OpenUSD (MSF) and glTF (Khronos) teams working together to converge the best features of glTF, OpenUSD and FBX formats. HAND’s work with talent ID standardisation is also noteworthy here.”
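One practical upshot of that convergence work is that glTF scenes are plain JSON, so even a minimal script can inspect a digital-human asset without any vendor SDK. A short sketch, assuming a local file named avatar.gltf (a hypothetical asset):

```python
import json

# Inspect the scene graph of a glTF 2.0 file. A .gltf file is JSON, so the
# standard library is enough to list its nodes and meshes; "avatar.gltf"
# is a placeholder filename for illustration.

with open("avatar.gltf") as f:
    gltf = json.load(f)

print("glTF version:", gltf["asset"]["version"])

for i, node in enumerate(gltf.get("nodes", [])):
    label = node.get("name", f"node_{i}")
    kind = "mesh" if "mesh" in node else "transform only"
    print(f"{label}: {kind}")

for mesh in gltf.get("meshes", []):
    print("mesh:", mesh.get("name", "<unnamed>"),
          "-", len(mesh["primitives"]), "primitive(s)")
```

Binary geometry and textures live in separate buffers referenced from this JSON, which is exactly where cross-format alignment efforts such as the glTF/OpenUSD collaboration matter for moving assets between tools.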
Obsolete humans
The introduction of any new technology is always accompanied by the worry that it will put people out of work. The arrival of realistic synthetic humans alongside generative AI tools such as ChatGPT and DALL-E, which can create convincing text and imagery, has only served to stoke this fear. “This is one of the fascinating developments over the last six months,” adds Davey. “One of the key findings from the IBC Accelerator programme is that a purely generative AI approach won’t scale, for the reasons outlined in the industry review. However, as noted above, early indications are that the polygon-based 3D engines (rasterization engines), NeRFs, GaSps and genAI approaches are merging.”
However, Davey is optimistic about a continuing role for actual human beings. “I suspect that we’ll also see some giant leaps in the emotional connection that digital humans can invoke, over the next couple of years,” he says. “At our heart, the human race are storytellers. Just as desktop publishing allowed anyone to create a book and the Internet allowed anyone to create a podcast or channel, so digital humans will make it easier for anyone to tell their story visually. Democratisation of technology means being able to automate the mundane and the tedious so humans can concentrate on the creative: telling a compelling story, and creating an emotional connection. Or, exploring the human condition, creating a social and cultural dialogue, engaging with the consumer or audience.”
Accelerators 2024
This year, 12 shortlisted teams will pitch their projects at the Accelerator Kickstart Day on 6 March at IET London, with a view to presenting proof-of-concept demos to an international audience at IBC2024 in Amsterdam. Submissions of challenges for the Accelerator Media Innovation Programme 2024 are now closed.