AI that produces sound through analysis of a source video is impressive. Fooling humans is not. Since most of us have grown up on a steady diet of film and television, many of the sounds in our memories are the work of foley artists who add sound effects to sequences in post. The sound of horse hooves on cobblestones is likely created with a percussive technique involving no equine participation. The sound of a person being punched may be a large piece of meat being struck with a club. Similarly, crunching snow is likely not the sound of anyone walking through actual snow.
Our perception of sound in a video/film source is already deeply skewed, so the notion that this AI amounts to a Turing test of sorts is a weak analogy.
You're right, but as someone who's done a lot of sound editing/foley work, I can't help having mixed feelings about seeing yet another job skill automated away. Good part: in a few years this will be good enough for commercial use, which will save sound editors all sorts of tedious, dull work and free them up to do more exciting creative stuff. Bad part: the tedious, dull work was also what paid the bills. The easier it is to do that stuff automatically, the less people are willing to pay for good quality work.
Rather than now being able to make a living doing the fun, really creative stuff, like inventing new sounds for teleportation devices or dramatic natural phenomena, editors are more likely to be asked to work for free on the theory that they'll gain great exposure for their creativity. That's generally a very bad bargain. If past trends in the electronic dance music market are anything to go by, increasing automation will not reward true creative talent; it will just lead to an arms race to own the latest sound libraries, synthesizers, etc., and to be first to market with big splashy new sounds that offer superficial novelty.
The ability to provide high-value equipment below normal rental cost frequently trumps considerations of talent in the film industry. Similarly, there are plenty of crappy directors of photography out there who get hired regularly because they own a pile of nice lenses and related camera equipment, and hiring them plus their camera package looks economically attractive on paper because it's hard to quantify photographic talent.
I have so much respect for foley artists. Artist being the operative word. People don't appreciate the hard work and creativity that goes into making the perfect sound.
As someone who's worked in the film industry (visual fx & CG), I'm subject to the same problem, but all tedious jobs from the Industrial Revolution on have been automated away one by one. I can understand the lament for something you worked hard on, and this isn't to take that away from you, but most job skills do actually have less value in the market over time, right? The other way to look at it is that what counts as good quality work changes and improves continually over time; higher and higher quality becomes available for the same price. Jobs are continually being reinvented, and people always get to work on the interesting parts that can't be automated. Something that took many people to do one decade only takes one person the next decade. This has been true for hundreds of years, from farmers to accountants to cooks to car makers... This "problem" is here to stay, our economy hasn't crashed yet, and there are as many creative people as ever.
the tedious dull work was also what paid the bills. The easier it is to do that stuff automatically, the less people are willing to pay for good quality work.
"Vegetable Violence is an organic sound effects library for creating your own orchestrated sonic mayhem. Vegetable rips, tears, squelches, hits, punches, stabs all recorded & mastered at 96kHz for stomach churning realism, this component library of gore sound effects is available for immediate download."
Watch Dario Argento's films - Profondo Rosso, Suspiria, Inferno, Tenebrae. Those should all be quite easy to get hold of. Then start in on this list... https://en.wikipedia.org/wiki/Giallo
Suspiria in particular sticks in the mind. Great music, saturated colours, properly horrific horror. It's a bit more 'on-screen' than BSS, btw.
AI that produces sound through analysis of a source video is impressive. Fooling humans is not. Since most of us have grown up on a steady diet of film and television, many of the sounds in our memories are the work of foley artists who add sound effects to sequences in post.
Right on! The fact that most audiences seem to expect a shotgun racking sound in a scene with a shotgun that doesn't even have that mechanism, or that drawing a katana is so often accompanied by a metallic "shing" and rattling sound -- these indicate the degree to which large swathes of people are drastically disconnected from an immediate and physically connected sense of how sound relates to the world around them.
I think this is also related to the degree to which I find many people are unaware of the kinesthetic feeling of how beats are emphasized, and how this changes the feel of music. The most vibrant intelligence involves a connection to the world in realtime. You can hear how machine parts interrelate just as much as you can see them. (You can even smell how they interrelate!) This disconnection even seems to be directly correlated with a loss of self awareness and flexibility in problem solving. It's like we're raising generations of brains over-trained on the simplistic and highly abstracted world of media tropes and vastly under-prepared for the messy complexity of the natural physical world.
I'd suspect that there is a wide range of what we're willing to accept in a given situation, reflecting our incomplete model of how sound works. However, this isn't the same as us accepting any substitute sound; the TV tropes persist because their absence feels awkward. An AI that correctly mimics TV-acceptable sounds is just about as impressive (though this isn't on our list of 'hardest problems', for sure).
I'd suspect that there is a wide range of what we're willing to accept in a given situation, reflecting our incomplete model of how sound works.
Given what I was talking about, it's largely a matter of people accepting symbols or tokens of things in lieu of perceiving the actual thing. It's a form of ignorance that masquerades as culture or "sophistication." (It is the former, but it's not the latter.)
The vocabulary of sound. There is also an equivalent vocabulary of vision.
That's one reason films from the 40s seem so different to today's. I suspect a cinema goer from then would have some trouble keeping up with the narrative of a 21st-century movie.
If you see very high definition projections/videos of '40s movies, you'll find that there was sometimes an incredible humanity that came across from the actors then. The cinematography could be incredible at this. I bet a lot of modern audiences would see such a thing and be like, "Aw, man, where are the explosions!?"
I don't see why that matters. Even though the human baseline comes from what is essentially a virtual reality rather than actual reality, it's still quite a challenge to generalize sounds from associated images. The fact that the AI is an artificial foley artist rather than a model of the real world doesn't make it any less impressive to me.
More importantly, what are all of the foley artists of the world thinking right now? "Oh dear god no" would be my first guess. Suddenly the prospect of needing to beat a computer at your job is rearing its head.
Nah, if this "AI" becomes useful, then sound artists will just be expected to use this new tool. AI has taken over "tweening" (which is why 3D animators exist: they specify model movements in such a way that the computer can automatically fill in the annoying-to-do in-between frames, via physics simulations or otherwise).
But the new tools have only made 3D animation more popular, leading to even more artists and more 3D-animated content, and to bigger productions (e.g., Big Hero 6 used a lot of AI for the city, and The Lion King used flocking AI to animate all the wildebeest in the stampede scene).
AIs don't always destroy jobs; they sometimes create them. They replace the jobs no one wants to do (who wants to animate 500+ wildebeest running down a cliff? Nah, let's have the AI do that), letting the artists focus on more meaningful tasks and leading to higher production quality.
What will people do with this tool exactly, once it's a mature tool? It doesn't sound like it will require much in the way of human guidance at some point down the line.
Oh come on. Amplitude, envelopes, equalizing, balancing the audio.
If this were a professional production that needed to be matched up with two voice tracks and background music, the sound designer would use the AI to create the sound for the background events... but would still need to balance the various audio tracks so that the audience knows what to focus on.
The abstract subway sound in the background may be chosen by an AI rather than a human. But the human will still need to determine the amplitude of the various voice tracks and the background music. It's not like these films make themselves.
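For the curious, "balancing the tracks" can be sketched computationally as weighted mixing of stems. A minimal pure-Python illustration (the track contents, names, and gain values are all made up for illustration; real sound designers work in a DAW, not like this):

```python
import math

SAMPLE_RATE = 48_000  # Hz, a common film/video audio rate

def mix(tracks, gains):
    """Mix equal-length mono tracks (lists of floats) with per-track gains."""
    mixed = [sum(g * tr[i] for tr, g in zip(tracks, gains))
             for i in range(len(tracks[0]))]
    # Prevent clipping: normalize only if the mix exceeds full scale.
    peak = max(abs(s) for s in mixed)
    return [s / peak for s in mixed] if peak > 1.0 else mixed

# Hypothetical stems: dialogue should dominate, ambience sits underneath.
n = SAMPLE_RATE  # one second of audio
dialogue = [0.8 * math.sin(2 * math.pi * 220 * i / n) for i in range(n)]  # stand-in for a voice track
ambience = [0.8 * math.sin(2 * math.pi * 60 * i / n) for i in range(n)]   # stand-in for subway rumble
final = mix([dialogue, ambience], gains=[1.0, 0.25])  # voice kept 4x louder than the rumble
```

The artistic decision lives entirely in those gain numbers, which is exactly the part the AI doesn't make for you.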
Even IF somehow an AI became good enough to make all those decisions (and most of those decisions are more "style" and "art" than hard-and-fast rules)... the video editor still needs to choose the cuts, the order of the scenes, and more.
No jobs will be at risk by this tool. If successful, it'd only become one more tool in the MASSIVE toolbox that video editors / sound designers are expected to master.
-----------------------
Anyway, "tweening" AI completely eradicated one form of work for cartoonists. Humans aren't doing "tweening" work anymore. Big studios are making 3D productions where the software can "tween" everything for you. Even 2D anime uses 3D animation techniques to cut down on the work and to leverage the AI.
It takes no work to command the AI to "tween" frames. But picking the right algorithm, deciding when to use the "smear" animation style (stylized tweening), or changing algorithms to switch things up for the audience?
Yeah, those things will always just be straight up work.
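For anyone unfamiliar, "tweening" at its core is just interpolating between keyframes; the work that remains is choosing the easing. A minimal sketch (the keyframe values and the quadratic ease are illustrative assumptions, not any studio's actual pipeline):

```python
def lerp(a, b, t):
    """Linear interpolation between keyframe values a and b, with t in [0, 1]."""
    return a + (b - a) * t

def tween(key_start, key_end, n_frames, ease=lambda t: t):
    """Generate the in-between frames an animator would otherwise draw by hand."""
    return [lerp(key_start, key_end, ease(i / (n_frames - 1)))
            for i in range(n_frames)]

# Two keyframes for an x-position, five frames of automatic in-betweening.
frames = tween(0.0, 100.0, 5)                          # → [0.0, 25.0, 50.0, 75.0, 100.0]
eased = tween(0.0, 100.0, 5, ease=lambda t: t * t)     # slow-in: [0.0, 6.25, 25.0, 56.25, 100.0]
```

The machine fills in the frames; the human still picks the easing curve, and that choice is the "switch things up for the audience" part.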
You are looking at this from such a simplistic perspective.
We are not at the limits of what machine learning can do; we are barely at the beginning.
Tweening, or "making sounds that can fool humans", is not what's important here; it's the underlying "mechanics" that allow them to do these things, which can be applied to so many other areas.
What makes humans extraordinary is that we can combine our various mechanical and intellectual abilities to adapt to our surroundings. What we are witnessing is another "species" that can do this, and it is only at the beginning.
I spent two years of my life writing an automated logic system for a professor. Trust me, I know what AI can and can't do.
In my experience, when AI reaches the critical mass of usefulness, it becomes a tool within the industry. Automatic solving of logical puzzles? Yeah, electrical engineers use PSpice to optimally lay out logic gates in CPU Chips.
Automated logic can be used to verify extremely complicated mathematical proofs, or even search for new mathematical truths! So what happens? Well... some company creates a product with the AI, then sells it as a tool.
We exist in an age where AIs are responsible for searching and coalescing information. (Erm, how often do you use Google's database?) It wasn't very long ago that search was considered an AI problem... but as soon as computers did it better than humans, it became a "tool" and "not AI" anymore.
The last 50 years of AI history have taught me one thing: when AIs are successful, humans change and stop thinking the task was "intelligent". Chess as a measure of intelligence? No longer, once chess AIs got good.
Database search? No longer, now that Google is faster than humans.
Automated driving? Was an AI task; now it isn't one. People are already discussing how it's a tool for truckers or Uber to use to make more money.
-------------
"Intelligent tasks" become "tasks for tools" because that's how stuff sells on the market. You wouldn't believe what people thought was "intelligence" in the '80s: chess, database search / natural language processing, automated logic, symbolic mathematical solvers, chip layouts, compiler optimizations... everything we just take for granted today.
Similarly, the tasks we consider "intelligence" today will simply turn into tools for the next generation once the AIs are written that solve that problem.
You are IMO making the same mistake Searle made in his "Chinese Room" argument.
Your digestive system is a tool for you to get rid of garbage your system does not need, your neurons are tools for allowing you to ultimately think, your legs are tools for allowing you to move around.
It's the entire system that's relevant here, not any of the individual subparts.
And we are not talking about what it can or can't do, but about its potential.
You brush this off with "humans will always find a way", which is what I am objecting to.
Whether you spend 2 or 20 years writing automated logic systems for a professor is unimportant.
As a sound editor with two decades of first doing it for fun and then as a career, I don't think that balancing the tracks is uniquely human and immune from automation.
This isn't going to lead to some new golden age of well-produced soundtracks, it's just going to make big bombastic soundtracks cheaper and more common. For a few years everything is going to sound like a disaster movie. Some would argue that we've already got that problem and I can't entirely disagree.
In short: quality won't go up, prices will go down, and oversupply will result in excess.
Good! I can't abide foley. Nature documentaries are ruined by it. A tiny ant eating a leaf, accompanied by horrible sounds of plastic wrap being twisted. Why not record the actual sounds of an ant eating? It's supposed to be a documentary. And if the ant doesn't make any sound, then just leave some silence.
The answer definitely comes down to how you trained it. If you trained it by showing it tons of movies and television, you'd get a modern foley artist in a box, right? If you wanted it to do something else, you'd need a bunch of stock footage with stock audio. Both are doable, but which was done here?
I didn't know that, but I suppose even then they can claim that their artistry lies in the choices they make. If a machine can make equally viable choices from the audience's and critics' perspectives...
Everyone knows that if you fall while dying you make a Wilhelm scream. It's a clearly documented scientific phenomenon with earliest records dating back to long, long ago.
The library used by Doom is REALLY common. The sound DSBOSPIT (sound of boss demon spitting a telecube in Doom 2) in particular is so overused it's not even funny. You hear it everywhere: in budget movies when a house or plane explodes, sometimes in other video games.
The Turing test in this case might consist of feeding the algorithm a mute video of the scene from Monty Python and the Holy Grail in which coconut halves are used to simulate the sound of galloping horses.
All of your examples sound a lot like what these things sound like in real life. Yes, there's hack foley work, but the reality is that these aren't arbitrary sounds. You don't need an actual horse to get horse sounds.
I think you're shooting for some idealized-authenticity argument here that just doesn't work. I work in a tourist area where horses walk on cobblestones. Yeah, it sounds exactly like what the foley guys do with coconuts; it's uncanny. Also, in the digital age, a lot of foley work is samples of real sounds. Outside of edge cases, we don't have guys in sound booths making new sounds with gadgets and old shoes anymore.
I know everyone likes to feel clever when they identify a popular sound, but most of the time those are intentional homages, and you have to consider the millions of sounds you don't recognize. It's not all from some static library of 1930s foley artists punching meat and knocking coconuts together anymore.
Fooling humans is fairly impressive: they didn't just ask "is this a real sound?", they played the real sound and the synthesised sound and asked "which one is the real sound?"
It looks like they only used a sample of 3 people, though, which is pretty small. I imagine the parametrically synthesised sound would have fooled no one.
Also, the clips in the video fooled all _three_ of the participants tested. I couldn't find anything in the paper about sample sizes... but hopefully it was more than three...
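Back-of-the-envelope, a sample of three really is too small to conclude much: if listeners in the two-alternative test were guessing at 50/50 (the null hypothesis; the binomial model is my assumption about their setup), all three would be "fooled" by pure chance 12.5% of the time.

```python
from math import comb

def p_fool_at_least(k, n, p=0.5):
    """Probability that at least k of n listeners pick the synthesised clip
    under pure guessing (binomial tail with success probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Fooling all 3 of 3 listeners happens by chance 12.5% of the time...
print(p_fool_at_least(3, 3))    # 0.125
# ...whereas fooling all of, say, 30 listeners essentially never does.
print(p_fool_at_least(30, 30))  # ~9.3e-10
```

So if it really was three people, "fooled everyone" is weak evidence; with a few dozen listeners the same result would be decisive.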
The sound of the hamburger rain in "Cloudy with a Chance of Meatballs" was wet brown paper towels (you know, from school?) being flopped against a floor.
Our perception of sound in a video/film source is already deeply skewed, so the notion that this AI amounts to a Turing test of sorts is a weak analogy.