
Generative audio



Generative audio refers to the creation of audio files from databases of audio clips.[citation needed] This technology differs from synthesized voices such as Apple's Siri or Amazon's Alexa, which use a collection of fragments that are stitched together on demand.

Generative audio works by using neural networks to learn the statistical properties of an audio source, then reproducing those properties.[1]
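The idea of learning statistical properties from a source and then reproducing them can be illustrated with a deliberately simple stand-in. The sketch below is not a neural network: it assumes audio already quantized to a few integer levels and uses a first-order Markov chain, which records only which value tends to follow which. The names `fit_transitions`, `generate`, and the toy `source` sequence are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict

def fit_transitions(samples):
    """Record, for each quantized value, the values observed to follow it."""
    table = defaultdict(list)
    for current, nxt in zip(samples, samples[1:]):
        table[current].append(nxt)
    return table

def generate(table, start, length, seed=0):
    """Sample a new sequence whose step-to-step statistics mimic the source."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    out = [start]
    while len(out) < length:
        followers = table.get(out[-1])
        if not followers:  # value never seen with a successor: stop early
            break
        out.append(rng.choice(followers))
    return out

# Toy "audio": a short sequence of quantized amplitude levels (assumed data).
source = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 2, 1, 0]
table = fit_transitions(source)
new_audio = generate(table, start=source[0], length=20)
print(new_audio)
```

The generated sequence is new, yet every step in it is a transition that also occurs in the source. Neural approaches follow the same learn-then-sample pattern, but model far longer context than a single previous sample.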

Implications

With this technology, a person's voice can be replicated to speak phrases that they may never have spoken. A synthetic version of a public figure's voice could therefore be used against them.[2]

Technology

Modern generative audio systems employ various deep learning architectures. One notable approach uses generative adversarial networks (GANs), in which two machine learning models, a generator and a discriminator, are trained against each other to create realistic audio. Another is WaveNet, which uses dilated causal convolutions to model raw audio waveforms. Applications built on such neural network architectures include 15.ai, which demonstrated in 2020 the ability to clone voices using as little as 15 seconds of training data.[3][4]
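To make the term "dilated causal convolution" concrete, here is a minimal pure-Python sketch of the operation alone. It is not WaveNet itself, which adds learned weights and further components. The function names, the two-tap kernel, and the dilation schedule 1, 2, 4 are assumptions chosen for illustration. "Causal" means each output sample depends only on current and past inputs; "dilated" means the kernel taps skip over samples, so stacked layers cover a long history with few weights.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Inputs before t = 0 are treated as zero, so no output ever
    depends on a future sample (the causal property).
    """
    n = len(x)
    y = [0.0] * n
    for t in range(n):
        for k, w in enumerate(weights):
            src = t - k * dilation
            if src >= 0:
                y[t] += w * x[src]
    return y

def stack(x, weights, dilations):
    """Apply one layer per dilation value, feeding each output to the next."""
    for d in dilations:
        x = dilated_causal_conv1d(x, weights, d)
    return x

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)

# An impulse at t = 0 shows how far back the stacked layers can "see".
impulse = [1.0] + [0.0] * 9
response = stack(impulse, [1.0, 1.0], [1, 2, 4])
print(response)
print(receptive_field(2, [1, 2, 4]))  # 8
```

With three layers of two taps each, the impulse spreads over eight output samples, matching the computed receptive field. Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly.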

References

  1. "Fake news: you ain't seen nothing yet". The Economist. July 2017. Retrieved 2017-07-01.
  2. Zotkin, D. N.; Shamma, S. A.; Ru, P.; Duraiswami, R.; Davis, L. S. (April 2003). "Pitch and timbre manipulations using cortical representation of sound". 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Vol. 5. pp. V–517–20. doi:10.1109/ICASSP.2003.1200020. ISBN 978-0-7803-7663-2. S2CID 10372569.
  3. Chandraseta, Rionaldi (January 21, 2021). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on January 21, 2021. Retrieved December 18, 2024.
  4. Temitope, Yusuf (December 10, 2024). "15.ai Creator reveals journey from MIT Project to internet phenomenon". The Guardian. Archived from the original on December 28, 2024. Retrieved December 25, 2024.