Mozgostroje

Binaural Audio - Spherical Head

08 Sep 2014


Psychological methodology is on the verge of its next revolution. Forty-something years ago psychologists abandoned the cages with pigeons and rats. Instead they studied humans by measuring their response times and giving them questionnaires. Now the response boxes and questionnaires are becoming obsolete, and virtual reality is taking their place. Why now? Why did it take so long? There are two aspects to this. On the hardware side, devices have become cheap and powerful enough to support an immersive experience. On the software side, game engines like Unity and UE4 have become accessible (in terms of ease of use and price tag) and allow researchers to develop complex virtual worlds. The technology still requires further evolution, and most of the issues (such as insufficient resolution and frame rate) are well known and are being tackled. There are, however, a few issues that have been neglected but need to be resolved before the technology can be used in research.

In this post I consider binaural audio. Wikipedia gives a good introduction to binaural audio. A tutorial by Richard Duda goes into more detail. Binaural audio, when well implemented, can induce a strong feeling of presence, just like VR goggles do via the visual channel. Unfortunately, for whatever reason, modern video games and game engines do not use binaural audio.

I introduce a few simple models that try to create an immersive binaural experience by filtering the sound appropriately. Let's see how good these solutions are.

First, we will use this sound.

In [1]:
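%pylab inline
# load the buzz waveform and plot it against time in milliseconds
buzz=np.load('buzz.npy')
fr=44100 # sampling rate (Hz)
tot=buzz.size/float(fr)*1000 # duration of one buzz (ms)
plt.plot(np.linspace(0,tot,buzz.size),buzz);
plt.xlabel('time (ms)');plt.ylabel('amplitude')
# repeat the waveform K times to get a longer sound
K=500
buzz=np.concatenate([buzz]*K)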
Populating the interactive namespace from numpy and matplotlib

We repeat the sound K=500 times to obtain a 5-second buzz. This is what it sounds like (you need to listen through headphones):
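The arithmetic behind the repetition can be checked with a small sketch. The contents of buzz.npy are not shown in this post, so the code below uses a hypothetical stand-in waveform of 441 samples (10 ms at 44100 Hz); that length is an assumption chosen so that 500 repetitions give exactly 5 seconds.

```python
import numpy as np

fr = 44100   # sampling rate in Hz, as in the post
K = 500      # number of repetitions, as in the post

# Hypothetical stand-in for buzz.npy: a 10 ms ramp stored as int16.
# The real waveform and its length may differ.
n = 441
single = np.linspace(-20000, 20000, n).astype(np.int16)

# np.tile gives the same result as np.concatenate([single]*K)
repeated = np.tile(single, K)

# duration in seconds = number of samples / sampling rate
duration_s = repeated.size / float(fr)
print(duration_s)
```

With a waveform of a different length the duration scales accordingly: K * n / fr seconds.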

In [2]:
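from embedaudio import Audio as EmbedAudio
def embedLR(left,right=None,rate=44100):
    ''' Embed a stereo audio player. left and right are
        assumed to be int16 arrays of equal length.'''
    if right is None: right=left # same signal in both ears
    # interleave the channels: L0,R0,L1,R1,...
    out=np.concatenate([left[:,np.newaxis],
                        right[:,np.newaxis]],axis=1)
    out=out.flatten()/32767. # scale int16 to roughly [-1,1]
    return EmbedAudio(out,rate=rate,nrchannels=2)
embedLR(buzz,rate=fr)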
Out[2]:
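The embedaudio module used above is a local helper and is not shown in this post. If it is not available, the same left/right interleaving can be written to a 16-bit stereo WAV with Python's standard wave module. This is only a sketch of an alternative, not the code behind the player above; the function names interleave_lr and stereo_wav_bytes are made up for illustration.

```python
import io
import wave
import numpy as np

def interleave_lr(left, right=None):
    """Interleave two mono int16 signals as L0, R0, L1, R1, ..."""
    if right is None:
        right = left  # same signal in both ears
    left = np.asarray(left, dtype=np.int16)
    right = np.asarray(right, dtype=np.int16)
    # column_stack gives shape (n, 2); ravel reads it row by row
    return np.column_stack([left, right]).ravel()

def stereo_wav_bytes(left, right=None, rate=44100):
    """Return the contents of a 16-bit stereo WAV file as bytes."""
    frames = interleave_lr(left, right)
    buf = io.BytesIO()
    w = wave.open(buf, 'wb')
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 2 bytes per sample = int16
    w.setframerate(rate)
    # WAV stores samples as little-endian integers
    w.writeframes(frames.astype('<i2').tobytes())
    w.close()
    return buf.getvalue()
```

The returned bytes can be saved with open('buzz.wav','wb').write(...) and played in any audio player, again through headphones.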