Generative audio

Generative audio refers to the creation of audio files from huge databases of audio clips, such as creating phrases and sentences that may never actually have been spoken. The technology differs from AI voices such as Apple's Siri or Amazon's Alexa, which use a collection of recorded fragments that are stitched together on demand.

Audio curves

"Generative audio works differently, using neural networks to learn the statistical properties of the audio source in question, then reproducing those properties directly in any context, modelling how speech changes not just second-by-second, but millisecond-by-millisecond."[1]
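
The idea of learning a signal's statistical properties and then reproducing them sample by sample can be illustrated with a much simpler model than a neural network. The sketch below is a stand-in, not the method the quotation describes: it fits a two-coefficient linear predictor to a sine wave by least squares and then regenerates the wave from two seed samples. The names `fit_ar2` and `generate`, the 8000 Hz sample rate, and the 440 Hz tone are all assumptions chosen for illustration.

```python
import math


def fit_ar2(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] (hypothetical helper)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n in range(2, len(x)):
        p1, p2 = x[n - 1], x[n - 2]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        t1 += p1 * x[n]
        t2 += p2 * x[n]
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    a1 = (t1 * s22 - t2 * s12) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2


def generate(a1, a2, seed, n_samples):
    """Extend two seed samples to n_samples, one predicted sample at a time."""
    out = list(seed)
    while len(out) < n_samples:
        out.append(a1 * out[-1] + a2 * out[-2])
    return out


SAMPLE_RATE = 8000                         # assumed sample rate in Hz
OMEGA = 2.0 * math.pi * 440.0 / SAMPLE_RATE  # assumed 440 Hz test tone

source = [math.sin(OMEGA * n) for n in range(200)]
a1, a2 = fit_ar2(source)
regenerated = generate(a1, a2, source[:2], 400)
```

For a pure sine the fitted coefficients come out close to `2*cos(OMEGA)` and `-1`, so the regenerated signal continues the tone beyond the 200 samples used for fitting. Neural approaches replace the two fixed coefficients with a learned, nonlinear predictor conditioned on far more context.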

Implications

With this technology, phrases can be assembled that the voice's owner never actually spoke. For instance, statements can be fabricated in a public figure's voice and used against them. Imagine the voice of the United States President declaring war on a country without the President ever having said those words.

"When a natural source such as a human voice or a musical instrument produces a sound, the resulting acoustic wave is generated by a time-varying excitation pattern of a possibly time-varying channel, and the sound characteristics depend both on the excitation signal and on the production system."[2]
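
The excitation-plus-channel description above is commonly called a source-filter model, and a minimal version can be sketched in a few lines. The example below is an illustration under stated assumptions, not code from the cited paper: the excitation is an impulse train at a fixed pitch, and the channel is a single two-pole resonator. The function names, the 8000 Hz sample rate, the 100 Hz pitch, and the 700 Hz resonance are hypothetical choices, and both excitation and channel are held constant here rather than varying over time.

```python
import math


def impulse_train(n_samples, sample_rate, f0):
    """Excitation: one unit impulse per pitch period (hypothetical helper)."""
    period = round(sample_rate / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, sample_rate, centre_hz, bandwidth_hz):
    """Channel: two-pole resonator y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    theta = 2.0 * math.pi * centre_hz / sample_rate      # pole angle
    b1 = 2.0 * r * math.cos(theta)
    b2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


SAMPLE_RATE = 8000  # assumed sample rate in Hz

# 0.1 s of excitation at an assumed 100 Hz pitch.
excitation = impulse_train(800, SAMPLE_RATE, f0=100.0)

# Shape the excitation with an assumed resonance at 700 Hz, 100 Hz wide.
voiced = resonator(excitation, SAMPLE_RATE, centre_hz=700.0, bandwidth_hz=100.0)
```

Changing `f0` alters the perceived pitch while changing `centre_hz` alters the timbre, which mirrors the statement that the sound depends on both the excitation signal and the production system.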

Technology

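This method uses a generative adversarial network (GAN), a deep learning technique in which two neural networks are trained against each other: a generator produces candidate samples, and a discriminator tries to distinguish them from real ones. The competition pushes the generator toward more believable images or, soon, pieces of audio.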
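
The adversarial training loop can be sketched on a toy problem. The example below is a deliberately tiny illustration, not an audio model: the "real data" are numbers drawn from a Gaussian, the generator is a linear map `G(z) = a*z + b`, the discriminator is a logistic unit `D(x) = sigmoid(w*x + c)`, and the gradients are written out by hand. The function name `train_toy_gan` and every parameter value are assumptions made for illustration.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=64, lr=0.05,
                  real_mean=4.0, real_std=0.5, seed=0):
    """Alternate discriminator and generator updates on 1-D data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: minimise -[log D(real) + log(1 - D(fake))].
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += (d - 1.0) * x
            gc += d - 1.0
        for x in fake:
            d = sigmoid(w * x + c)
            gw += d * x
            gc += d
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: minimise -log D(G(z)) (non-saturating loss).
        ga = gb = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            ga += (d - 1.0) * w * z
            gb += (d - 1.0) * w
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c
```

Calling `train_toy_gan()` returns the final parameters `(a, b, w, c)`. Early in training the discriminator learns that larger values look real, and the generator responds by shifting its offset `b` upward toward the real data. A discriminator this simple gives the generator little information beyond the mean, and training of this kind can oscillate rather than settle, which is one reason practical GANs use deep networks and additional stabilisation techniques.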

"...algorithms were able to produce speech that occasionally sounded perceptually similar to the target speaker but work remains to be done."[3]

References

[1] "Fake news: you ain't seen nothing yet". The Economist. July 2017. Retrieved 2017-07-01.
[2] Zotkin, D. N.; Shamma, S. A.; Ru, P.; Duraiswami, R.; Davis, L. S. (April 2003). "Pitch and timbre manipulations using cortical representation of sound". 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Vol. 5. pp. V–517–20. doi:10.1109/ICASSP.2003.1200020. ISBN 978-0-7803-7663-2. S2CID 10372569.
[3] Mobin, Shariq (October 2016). "Voice Conversion using Convolutional Neural Networks". arXiv:1610.08927 [stat.ML].

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

