Weeks 8-11: Spatial Audio for VR

The Workflow Challenge: Ambisonics or Stereo?

Spatial audio is essential to the sense of immersion in VR. Despite its importance, designing spatial audio for VR is not easy: there is no unified workflow spanning recording, postproduction and final delivery to the target platform. Sound quality also varies greatly between audio engines, as I will discuss later.

Spatial audio is used primarily in games and cinematic VR. Audio in cinematic VR places a strong emphasis on timed playback and synchronisation with video, and depends little on user interaction beyond the head tracking used for rendering. VR games, on the other hand, demand more interaction and are less time-based. As our project is a semi-cinematic VR app built with a game engine, I preferred to leave room for interactivity that could be easily controlled in Unity, rather than mixing down a pre-baked 360 soundscape in a digital audio workstation, given how little support DAWs currently offer for the two major first-order ambisonic formats: the traditional FuMa B-format and the popular AmbiX B-format (ACN channel ordering with SN3D normalisation).

Enda had proposed four types of audio for the app earlier: voiceover, music, UI tones and general sound effects. He took care of the narration and produced the music and UI tones. I focused on recording and processing ambience, and on implementing spatial audio in Unity with third-party audio engines. The narration, music and UI tones throughout the tour are not meant to be spatialised. The other audio in our app responds to head movement, and its playback depends on user interaction. Enda had asked me to prepare an ambience background for each static locale. I discovered later that even sounds as subtle as ambience can have HRTF rendering enabled and respond to head rotation if handled properly from the beginning.

The workflow for designing spatial audio starts with recording. With the ZOOM H2n we can record sound in a 4-channel AmbiX format, or capture regular stereo audio with perfect mono compatibility. A major problem for spatial audio arises in postproduction: currently only Pro Tools, Reaper and Nuendo have plugin support for ambisonics, and mainstream media players such as QuickTime Player and iTunes cannot play back ambisonics at all. By contrast, stereo files have no such support problems in playback or postproduction.

After mixing in a DAW, the audio is imported into Unity and mapped to head-rotation data in the background for real-time rendering of directional sound. Unity has native support for ambisonics, but the sound quality is far from ideal compared with third-party audio engines. The Google VR SDK enhances audio spatialisation with the GvrAudioSoundfield APIs, which aim specifically to support ambisonics. The GvrAudioSoundfield API was mentioned in the Google I/O talk in May 2016, but the feature was not released until 4 August, in the v0.9.0 SDK, with the official documentation updated the next day. Had the GvrAudioSoundfield APIs been available when I made the spatial audio demo in mid-July, I would probably have recorded the audio in the AmbiX B-format, processed it in Reaper and configured it in Unity with the GvrAudioSoundfield APIs. Given the time constraints, I did not have time to swap the ambience over to the GvrAudioSoundfield APIs in Unity; instead, I made a YouTube demo to show that we are able to implement ambisonics in Unity with the Google VR SDK.

In light of the compatibility and support problems with ambisonics, I chose an easy workaround for audio spatialisation: I recorded the audio in stereo and processed it in Audacity. The audio clips were then imported into Unity and sprinkled around the scenes as invisible game objects to give a sense of spatialisation when users interact with them. As the GvrAudioSource API supports stereo and mono audio and spatialises better than Unity's native audio engine, it was used for the directional sound effects in our app.

Recording and Editing Ambience

The first two weeks of August were primarily spent recording and editing ambience. I recorded the ambience with the ZOOM H2n recorder. It has five built-in microphones, two arranged as an XY pair and the other three configured in an MS pattern. The ambience was recorded in 4-channel mode, with the signals from the XY mics and the MS mics written to two separate stereo tracks; the output is two synchronised stereo files for each location in the tour. I also upgraded the recorder to firmware v2.0. After the upgrade, the ZOOM H2n in the lab can produce first-order ambisonic surround sound and write the 4-channel audio to a single track in AmbiX B-format. I did not have time to redo the ambience recordings in AmbiX B-format, so I settled for the traditional stereo recordings I had made in early August. With the Google VR SDK for Unity, audio rendering can adapt to head movement in real time.

The ambience recording and editing took more time than expected, spanning from 19 July to 13 August. I first recorded the ambience during the daytime, with lots of human activity and conversation. The panorama in the July demo was also packed with people. However, the formal 360 photos and videos were shot in the early morning, when the campus was quiet and empty, and it sounded very strange to have human activity in the ambience for those serene morning scenes. So I did the field recordings again in the early mornings of 4, 5 and 6 August. As the environment was very quiet, I turned up the gain, fearing I would miss the bird tweets and rustling leaves in the background. That turned out to be another mistake, as increasing the gain in the recorder also introduced noise, which was difficult to remove in postproduction. Jill suggested not turning up the gain during recording, and only bringing levels up in postproduction when necessary. So I went for another early-morning field recording on 13 August with the gain set at 0 or 1. This batch turned out to be very quiet, with RMS levels around -70 dB. I normalised the soundtracks to somewhere between -28 dB and -36 dB before putting them in Unity. The August recordings formed the bulk of the usable ambience; the July recordings were mostly discarded. Also, in postproduction, Enda reminded me to remove the low-frequency rumble in the ambience with a high-pass filter, which was very useful.

Sample recording 1: Regent House ambience with conversation

Sample recording 2: Regent House ambience without conversation

Sample recording 3: Front Square ambience with conversation

Sample recording 4: Front Square ambience without conversation

Building Spatial Audio and Collaboration in Unity

I had proposed to Enda in July that we should use Git for version control and GitHub for remote collaboration to keep everything in sync. Enda thought Git was not very helpful for Unity. Git is great for handling text-based files like scripts, but can have problems with scenes and prefabs, as these are normally stored as a mix of binary and text in Unity. Although we had the option to store scene and prefab assets as text-based files, they were too important for us to risk breaking them. Also, it would be much easier for Enda to integrate my work in Unity if it were packed as prefabs and exported as a Unity package. We eventually went with the prefab way of collaborating. A minor problem with this approach was that Unity did not support updating nested prefabs: a nested prefab would appear as missing and had to be re-linked every time a new set of prefabs was imported into the project. As Enda finished a mature prototype in Unity with photospheres, videospheres, navigation markers and multimedia placeholders in early August, I started to build on his master copy of the project. The following is a snapshot of the structure of game objects in a scene of our project.

1. Photosphere (with Narration attached)
    1) Marker
        a) Navigation (NAV)
        b) Points of Interest (POI)
    2) Audio 
        a) Ambience (AMB)
        b) Directional Sound Effects (SFX)
2. Videosphere
3. UI Sounds
4. Music

I took care of everything grouped under Audio in the above hierarchy. My work in Unity concerned the placement and control of ambience and directional sound effects. I used the GvrAudioSource API to trigger ambience and directional sounds with the event trigger component, while Enda used Unity's native AudioSource API to control narration, music and UI sounds, as these did not need to be spatialised. The audio handling in Unity was therefore clearly separated by class and functionality, and my scripts would not break Enda's work.

The ambience audio was dropped at a location close to the main camera, and playback was triggered when the camera entered the active photosphere. At first I used GvrAudioSource for the ambience and offset it slightly from the camera to increase the feeling of spatialisation: if the location of the ambience audio overlapped with the camera to which the GvrAudioListener was attached, the spatial illusion no longer worked. GvrAudioSource only creates a sense of spatialisation if the audio is played somewhere away from the GvrAudioListener. However, the Google VR SDK for Unity tended to attenuate the audio level and muffle the ambience, which was very subtle already. After I swapped in better-quality ambience soundtracks, I realised the degradation of sound quality in Unity might not come from the audio alone, but from the rendering algorithm in the Google VR SDK for Unity. Considering the excessive spatialisation and the SDK's tendency to muffle the original soundtracks, we preferred Unity's native audio engine for the ambience, for a few good reasons. First, the output from the native audio engine seemed to be much louder than playback through GvrAudioSource. Second, the output from the native audio engine can be routed to a mixer in Unity and brought to a desired level in a batch. As GvrAudioSource sidesteps Unity's mixer workflow, staying with the mixer approach also let us adjust the levels of narration, UI sounds, music and ambience all in one place.
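
As a rough illustration of that mixer routing, here is a minimal sketch of an ambience player; the "Ambience" mixer group, the trigger collider around the photosphere and the class name are assumptions for illustration, not our exact setup.

    using UnityEngine;
    using UnityEngine.Audio;

    // Minimal sketch: plays a looping ambience clip through a mixer group when
    // the main camera enters the photosphere's trigger collider. The "Ambience"
    // mixer group and the trigger-collider wiring are illustrative assumptions.
    [RequireComponent(typeof(Collider))]
    public class AmbiencePlayer : MonoBehaviour
    {
        public AudioClip ambienceClip;
        public AudioMixerGroup ambienceGroup;   // e.g. the mixer's "Ambience" group

        private AudioSource source;

        void Awake()
        {
            source = gameObject.AddComponent<AudioSource>();
            source.clip = ambienceClip;
            source.loop = true;
            source.playOnAwake = false;
            source.outputAudioMixerGroup = ambienceGroup;   // route to the mixer for batch level control
        }

        // Requires this collider to be marked as a trigger, and a Rigidbody on
        // either the camera or this object, for trigger events to fire.
        void OnTriggerEnter(Collider other)
        {
            if (other.CompareTag("MainCamera") && !source.isPlaying)
                source.Play();
        }

        void OnTriggerExit(Collider other)
        {
            if (other.CompareTag("MainCamera"))
                source.Stop();
        }
    }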

The directional sounds were sprinkled around the scenes and their playback was triggered by gaze-over interaction. Although GvrAudioSource was not suited to the ambience in our project, it was well suited to directional sound effects spread around the virtual world. Generally there was an empty object in the scene representing the audio source, and an event trigger that decided when playback should start. The event trigger could be attached to the object holding the audio source, or separated from it. An advantage of the latter approach was that we could have the user look in one direction but trigger sound from another direction. Also, the object serving as a trigger could be set inactive, its collider could be disabled, or the event trigger component could be disabled after the crosshair exited the colliding zone; this guaranteed the audio would be played only once. In the spatial audio demo in July, the reticle grew immediately after hitting an audio object. In the later build, I disabled the reticle when it entered the colliding zone and enabled it again upon exit. This did not affect the interaction between the GvrReticle and the UI markers elsewhere in the scene. As the crosshair stayed unchanged after colliding with audio objects, users needed to explore the soundscapes without the aid of visual cues.
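
The pattern looked roughly like the sketch below: play once on gaze, then disable the collider. It assumes the gaze pointer raises PointerEnter events through Unity's event system and that GvrAudioSource exposes an AudioSource-style Play() call; in the actual project the wiring was done with an Event Trigger component in the inspector.

    using UnityEngine;
    using UnityEngine.EventSystems;

    // Sketch of a gaze-triggered, play-once directional sound. Assumes a
    // GvrAudioSource on the same object and that the gaze pointer sends
    // PointerEnter events via a collider; the project used an Event Trigger
    // component in the inspector rather than this interface.
    [RequireComponent(typeof(GvrAudioSource))]
    [RequireComponent(typeof(Collider))]
    public class GazeSfxTrigger : MonoBehaviour, IPointerEnterHandler
    {
        private GvrAudioSource sfx;
        private bool hasPlayed;

        void Awake()
        {
            sfx = GetComponent<GvrAudioSource>();
        }

        public void OnPointerEnter(PointerEventData eventData)
        {
            if (hasPlayed) return;
            hasPlayed = true;
            sfx.Play();
            // Disable the collider so the reticle stops reacting to this object
            // and the sound is guaranteed to play only once.
            GetComponent<Collider>().enabled = false;
        }
    }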

Another issue was that I needed to be careful with UI canvases or other game objects that might block the ray-casting from the camera to the audio triggers in the scene. After Enda changed the projection meshes from a sphere to an octahedron, many audio triggers ended up outside the projected mesh and their transform coordinates needed to be tweaked again. The audio design for each location also changed as more multimedia content was brought into Unity. For instance, if Enda put a movie clip behind the user, I would place an audio trigger at a location that could not be missed, such as the area near the forward navigation marker, and use directional sound to draw attention to the position of the movie clip.

GoogleVR, FB360 Rendering SDK (Two Big Ears' 3Dception) and Compatibility Issues in Unity

In the last week of August, I started to consider implementing the surround-sound ambience with the newly released GvrAudioSoundfield APIs or another third-party plugin for Unity. 3Dception had been highly praised before its developer, Two Big Ears, was recently acquired by Facebook. I had a false impression from the official website that Two Big Ears had ceased support for the Unity plugin; it turned out that the rendering engine was incorporated into the FB360 Rendering SDK, which is free to download. I tried both the FB360 rendering engine (henceforth the TwoBigEars rendering engine, for brevity) and the GvrAudioSoundfield APIs.

Here is a brief comparison between the GvrAudioSoundfield in Google VR SDK for Unity and the TwoBigEars rendering engine in FB360 in terms of sound quality, supported audio formats and the ease of use.

The GvrAudioSoundfield, as part of the Google VR SDK for Unity, tends to attenuate sound levels, and the developer has to turn up the gain in the inspector to get louder surround sound; still, the average levels from GvrAudioSoundfield were higher than those from GvrAudioSource. The TwoBigEars rendering engine tends to push the sound to the highest possible level and lets the developer scale it down in the range 0 to 1.

The GvrAudioSoundfield has support for mono, stereo and AmbiX B-format audio, which carries 4 channels in one track. One can simply drag and drop the two audio files into the Unity inspector and the Google VR SDK will render them with head-tracking data.
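
As a code-level illustration of that setup, the sketch below assigns the two stereo halves of a first-order AmbiX recording to a GvrAudioSoundfield at runtime. The field names soundfieldClip0102 and soundfieldClip0304, and the AudioSource-style loop/Play() members, are my recollection of the v0.9.0 SDK and should be treated as assumptions rather than verified API; in practice we would simply drag the clips into the inspector as described above.

    using UnityEngine;

    // Illustrative sketch only: feeding a first-order AmbiX recording to
    // GvrAudioSoundfield from script. The member names used here are
    // recalled from the 2016 SDK and are assumptions, not verified API.
    public class AmbisonicAmbience : MonoBehaviour
    {
        public AudioClip channels12;   // channels 1-2 of the AmbiX file (W and Y in ACN order)
        public AudioClip channels34;   // channels 3-4 of the AmbiX file (Z and X in ACN order)

        void Start()
        {
            var soundfield = gameObject.AddComponent<GvrAudioSoundfield>();
            soundfield.soundfieldClip0102 = channels12;
            soundfield.soundfieldClip0304 = channels34;
            soundfield.loop = true;
            soundfield.Play();
        }
    }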

The TwoBigEars rendering SDK requires encoding to its custom *.tbe format, B-format (FuMa), B-format (AmbiX) or a quad-binaural format before the audio can be decoded and rendered in Unity. It has indirect support for traditional mono and stereo audio in popular DAWs such as Reaper and Pro Tools through a rebranded spatialiser plugin. In Reaper, sound designers can use the FB360 plugin to spatialise mono or stereo tracks and render the master 3D track to an 8-channel audio file, which is then encoded with the FB360 encoder before being brought into Unity. I tried the custom *.tbe encoding and decoded the audio in Unity. The spatialisation was slightly better than the GvrAudioSoundfield's. However, the pipeline involves more software, and the TwoBigEars audio engine in the FB360 Rendering SDK exposes only a few playback parameters in the Unity inspector. The TBE rendering engine does not even provide a loop option in the inspector, so I had to write a simple C# script to loop an audio asset with it. The TBE APIs can be queried in C#, but the number of exposed APIs is limited. In short, the GvrAudioSoundfield has an easier-to-use interface in Unity, while the TwoBigEars engine renders audio with better quality.
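
I no longer have that looping script to hand, but the workaround amounted to polling the decoder every frame and restarting the asset whenever playback stopped. The sketch below is purely illustrative: SpatDecoderStub stands in for the FB360 decoder component (the TBSpatDecoder mentioned later in this post), and its members are placeholders rather than the SDK's real API.

    using UnityEngine;

    // Purely illustrative: SpatDecoderStub is a placeholder for the FB360
    // decoder component; its members are not the SDK's real API. The point
    // is the pattern: poll the playback state and restart when it ends.
    public abstract class SpatDecoderStub : MonoBehaviour
    {
        public abstract bool IsPlaying { get; }
        public abstract void Play();
    }

    public class TbeLooper : MonoBehaviour
    {
        public SpatDecoderStub decoder;   // the decoder component, assigned in the inspector

        void Update()
        {
            // Emulate a loop toggle: restart the asset as soon as playback stops.
            if (decoder != null && !decoder.IsPlaying)
                decoder.Play();
        }
    }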

Last but not least, the Google VR SDK cannot work together with the FB360 rendering engine in the same project, although both work well with Unity's native audio engine. Since all the directional sounds in Unity already used the Google VR SDK, migrating them to the TwoBigEars rendering engine would have been a large amount of work. Given the time limits, Enda and I decided to leave the ambience rendering to Unity's native audio engine, while sticking with GvrAudioSource for the directional sound effects.

Sample ambience soundtrack playback in Reaper

Soundtrack encoded with FB360 Encoder and decoded with FB360 Rendering SDK (TwoBigEars’ TBSpatDecoder) in Unity

Soundtrack playback with Audio Source in Unity

Soundtrack playback with Google VR SDK’s GvrAudioSoundfield

Soundtrack playback with Google VR SDK’s GvrAudioSource


Music for TrinityVR

Having previously acquired an M.Mus in Production of Popular Music, I thought it would be good to contribute some original music to the app. This not only demonstrates the breadth of skills in the project, but also serves a valuable purpose. While the photospheres are stationary, and hence so is their ambience, the videospheres involve Adrian walking with the camera, which changes the audio ambience as he goes. Unfortunately, the microphone built into the camera is quite low-fidelity, picking up wind noise and rumble from footsteps too easily. Music could instead replace this ambience in the videospheres. In addition, Ying found the ambience recorded in the Museum Building and Long Room to be inadequate. We intended to have the Chapel constantly playing music recorded there, and it was decided that this concept would suit all indoor areas, making them feel a bit different from the Trinity exterior.

All pieces of music were created and processed in Avid Pro Tools 12, except for Captain O’Kane, which was created in Logic Pro X, due to its more appropriate virtual instrument selection.

Trinity Dawn

My initial idea with music was to create sounds in the style of the ambient music genre. This genre was pioneered by Brian Eno (in works such as Ambient 1: Music for Airports (1978)), and places an emphasis on creating atmosphere through sound over traditional music structures like rhythm and melody. I wanted to sculpt a piece that was simple and unobtrusive, which felt both light and comforting. I call the finished work Trinity Dawn, as it seemed to naturally coalesce with Adrian’s 360º photography of Trinity, taken at a rather beautiful dawn. The music features long, deep and swelling tones, accompanied by a slow harp melody from which evolve light, crystalline sounds.

Ancient Stones

The harp became an important idea to me in thinking about music and sound for this project. The harp is not only a truly Irish symbol, but also a symbol of Trinity. Something about the gentle lull of the harp also evokes an older, more ancient world. For this reason, I composed Ancient Stones. The song is mostly driven by a rhythmic harp and a simple melody that has a universally traditional sound. Originally, the outdoor videos were to alternate between this song and Trinity Dawn. However, when it became more appropriate to have music in the indoor areas, I moved this song there, naming it appropriately, inspired by the Museum Building's old marble and limestone, as well as its ancient fossils and relics.

Captain O’Kane

Ying conducted research on Irish harp music and found some very nice pieces. In particular, she transcribed the melody of Captain O’Kane, a traditional harp piece by Turlough O’Carolan. However, the virtual instruments available in Cubase left much to be desired. I found a full score, including the rhythmic and harmonic accompaniment, and transcribed it in Logic Pro X, which comes with some very good quality sampled instruments. I also included some extra voices, an Irish tin whistle and an Irish hammered dulcimer, which added to the traditional sound. I also wrote a string accompaniment to enhance the second repetition of the tune. The striking and poignant-sounding piece came to reside in the Long Room, as well as in the introduction video Hailee created.


Chapel Music

There were a few different ideas for the music in the Chapel, but unfortunately, time was against developing a particularly interesting feature out of it. However, my sister-in-law was a member of one of the Trinity choirs, and she offered me several tracks of her choir that had been recorded in the Chapel itself. I processed these in Pro Tools, making sure to enhance the lush natural reverberation that the Chapel possesses. I created a script which iterates through these songs in a list, at the user’s discretion. The songs, which are mostly religious in nature, include: Faire Is The Heaven, Bring Us O Lord God, Agnus Dei (Lamb of God), Ubi Caritas and Pastime in Good Company.
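
Enda's script isn't reproduced in this post, so the following is only a minimal sketch of the idea, assuming the choir tracks are held as a list of AudioClips and that NextSong() is wired to whatever gaze or UI event advances the playlist; the names are illustrative, not the actual script.

    using System.Collections.Generic;
    using UnityEngine;

    // Illustrative sketch of a chapel playlist: the clip list and the
    // NextSong() entry point (wired to a gaze/UI event) are assumptions,
    // not the actual script used in the app.
    [RequireComponent(typeof(AudioSource))]
    public class ChapelPlaylist : MonoBehaviour
    {
        public List<AudioClip> songs = new List<AudioClip>();

        private AudioSource source;
        private int index = -1;

        void Awake()
        {
            source = GetComponent<AudioSource>();
        }

        // Called at the user's discretion, e.g. from an Event Trigger.
        public void NextSong()
        {
            if (songs.Count == 0) return;
            index = (index + 1) % songs.Count;   // wrap back to the first track
            source.clip = songs[index];
            source.Play();
        }
    }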

The Intro Video

Early on in the project, we had realized that a lot of the information we wanted to convey throughout the tour could be easily tied to buildings, making their placement intuitive and simple. But there was some information, mainly contextual and having to do with wider history, that didn’t really fit anywhere along the route. This gave rise to the idea of a timeline, where the sort of synopsis of Trinity’s 400+ years of history could live.

In June, we wrestled with how this timeline would fit within the larger app. Would the user walk into it in Front Square? Would it live over the portals? Would it be its own sphere separate from the tour itself? Ultimately, we decided we wanted something inclusive, brief, and easy to interact with, so we chose to make a video.

That work fell to me. In writing the rest of the script for the POIs, I also wrote a script for the intro video. The script kept with the timeline theme, basically walking the viewer through the highlights of Trinity and Irish history. After making sure to have Enda and Adrian check the script for “Americanisms,” I downloaded After Effects and planned on taking full advantage of the 7-day trial. With everything else going on in the app, it was 3 days in before I actually got started. That didn’t leave much time, so I based most of my work off a timeline template thinking this would be more efficient. Looking back now, I realize I barely used any of it (of course).

These animated infographics help to illustrate changes in Trinity over time

Before starting this part of the project, I had forgotten how annoying the rendering time is with After Effects. I wasn’t able to make any precomps because I was still waiting on the final narration (and therefore the final timing), so the RAM playback was painfully slow. The timeline used a combination of historical images, nested videos, animated infographics, and Blender renders.

 

The latter proved to be the biggest headache of all. I wanted to illustrate how Front Square used to look when surrounded by red-brick buildings, so I used a model of the Rubrics building I’d made in SketchUp. Unfortunately, exports from SketchUp do not play well with Blender, and each of the models had well over 10,000 objects in it. This obviously slowed down my computer a lot, and placing each of them was very trying. But it worked out, eventually. I used Adrian’s 360 photo of Front Square as the environment background and placed the models within the world convincingly enough, having a centrally-located camera spin around as four of them rose from the ground in a quadrangle.

Blender render illustrating the recreation of the red-brick quadrangle in Front Square

I also animated a lot of graphics in the video. I keyframed each of them by hand to coincide with the narration to illustrate things like the divide between Catholics and Protestants, Britain’s imperial standing, and the kinds of educational offerings the school provided at first. All of this information could have been conveyed simply using narration, but I think the addition of engaging, animated visuals is more apt to keep the viewer entertained and paying attention.

Intro video in Front Square 1

Recording the Narration

Hey there, Enda here!

For the narration, I arranged to record a family friend, Alan Condell. Alan, along with his wife Marie, has been a neighbour of my parents in Monkstown for over twenty years now. My mum recommended Alan to me, as he has a very articulate voice, full of dulcet tones. Alan generously donated his time and voice for this project.

I offered to go to Alan to record in his home. With me I brought a variety of sound recording equipment, along with my laptop. The two most vital pieces were the microphone and preamp.

The microphone I used is the Shure SM7B. This is a dynamic mic first made in 1974, and it has been popular in the recording industry and in radio ever since. The mic has a reputation as an excellent vocal mic, especially for men, though its applications are flexible. It is frequently used by radio presenters as it gives them a warm and full-bodied sounding voice. I thought this mic would be ideal, being one of the best in my collection, in addition to its pleasant tone.

The Shure SM7B

The preamp I used is the Golden Age Project Pre-73 (hereafter referred to as the GAP73). A preamp provides gain to the microphone, in order to bring its signal up to recording level. Not all preamps are the same, however, and preamps may be prized for their transparency or for how they colour a sound. The GAP73 is a modern take on the vintage Neve 1073 preamp. The sound is fairly coloured, creating an extra-warm tone while still keeping excellent definition. I find its characterful sound really complements the SM7B.

The front plate of the GAP73

Hailee had prepared the script from Jill’s research, which I had also proofread. There were around 60 points in all, each lasting from 20 seconds to a minute or more, though most erred on the shorter side. Thankfully, recording with Alan went perfectly, as he is an excellent speaker, and the process took less than two hours from setup to completion (though much of that time was spent having a good chat with Alan!)

I recorded, edited and mixed the audio in Avid Pro Tools 12, my DAW of choice. Editing involved topping and tailing each audio clip, as well as cutting out breaths and mouth noise. For mixing, I used a variety of plugins to enhance the audio and give it a professional quality:

RBass: this plugin is particularly useful on spoken-word vocals, especially when they have to stand on their own. The plugin generates low-frequency harmonics from the source, giving it a deeper and more solid sound.


Fabfilter Pro-Q: the Pro-Q is my go-to EQ plugin, as it’s very versatile yet easy to use. Using this I cut any super-low or subharmonic frequencies (less than 40Hz) and also dipped the low mids around 250Hz. This ensured that the low end was under control and made the low mids a bit clearer.


Slate Digital VMR: the Virtual Mix Rack by Slate contains a variety of plugin modules in one plugin slot. In this case, I used the Virtual Console Collection mix buss plugin, which emulates the sound of the summing amplifier of a vintage console’s mix buss (the model I chose is a Neve). This imparts a subtle sonic change that is very pleasing, lightly adding some harmonics and giving the low and high ends a nice finish. I then used Revival, which further added some crispness and clarity to the top and some fullness to the low mids.


Fabfilter Pro-C: again from Fabfilter, my go-to compression plugin, especially when smooth and transparent compression is required. This plugin levelled off Alan’s voice, giving a more consistent amplitude – lowering phrases or syllables that are too loud, while simultaneously raising ones that are too quiet.


Fabfilter Pro-L: this limiter plugin was used to provide a standard loudness across all the audio clips. Like the Pro-C, it has excellent transparency. In this case, I brought the clips up to a peak value of -6 dBFS, which still leaves headroom for any other audio sources playing simultaneously.


It’s all well and good to write about audio, but it can only truly be understood with our ears! Unfortunately, this blog cannot host audio, so you’ll just have to listen to Alan’s lovely voice on the app itself!

 

Fade In / Fade Out

Hey there, Enda here!

It might seem like a little thing, and it will be a short entry, but the topic of this blog post is definitely something I found made the app feel ‘real’ and professional. Transitions between events in a virtual space need to be smooth – when things pop in and out of existence, it can be quite jarring for the user. Rather, transitions often use fades to make this a gradual, rather than instant process.

Fading is a crucial part of the design of this app, as it affects the aesthetic ‘feel’ of so many objects. For example, fades are used in the following situations:

  • Fading photographs in and out, in time with their narration
  • Fading Blender stills in and out, in time with their narration
  • Fading the whole screen from black when the scene starts
  • Fading the whole screen to black when the scene ends
  • Fading in the loading screen and its animation

Fading is possible through linear interpolation (commonly known as a ‘lerp’). Lerping is a mathematical function that creates a line between two points (lerp(a, b, t) = a + (b - a) × t, with t running from 0 to 1), therefore creating a smooth change between the values.

 


Unity has built-in lerping functions. The main one is Mathf.Lerp, but there is also Color.Lerp, specifically for colour changes. The fade is scripted around Color.Lerp:


In this case, the code loops in a while loop, changing the colour from clear (RGBA (0,0,0,0)) to black (RGBA (0,0,0,1)) by the value of ‘progress’, which is a multiplier times Time.deltaTime. Time.deltaTime is the amount of time in seconds the last frame took to complete – our app runs at approximately 60FPS.
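
The original script only survives here as a screenshot, so below is a minimal reconstruction of the fade following that description; the class name, the fade-duration parameter and the use of a full-screen UI Image are my assumptions rather than the project's actual code.

    using System.Collections;
    using UnityEngine;
    using UnityEngine.UI;

    // Minimal reconstruction of the fade-to-black loop described above. The
    // full-screen Image, the class name and the duration parameter are
    // illustrative assumptions; only the Color.Lerp / Time.deltaTime pattern
    // comes from the post.
    public class ScreenFader : MonoBehaviour
    {
        public Image fadeOverlay;        // full-screen UI Image used as the overlay
        public float fadeDuration = 1.5f;

        public IEnumerator FadeToBlack()
        {
            float progress = 0f;
            while (progress < 1f)
            {
                // Advance by a multiple of the last frame's duration (~1/60 s at 60 FPS).
                progress += Time.deltaTime / fadeDuration;
                // Interpolate from clear (RGBA 0,0,0,0) to black (RGBA 0,0,0,1).
                fadeOverlay.color = Color.Lerp(Color.clear, Color.black, progress);
                yield return null;
            }
        }
    }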

These fades create a smoothness that makes the app a more pleasant experience. They also allow me to bring up full-screen overlays (such as the loading screen below) which can hide scene changes that would otherwise be awkward in VR.

The TrinityVR loading screen

Bugs, horrible bugs! (Troubleshooting)

Hey there, Enda here.

So I’ve definitely fallen off the regular schedule of blogging, since the project has escalated and been incredibly busy! However, I plan to address a few different topics over the next few blog posts and catch up.

This post will talk about bugs and troubleshooting them, and this project has had more than a few, which led to some very despairing situations. Bugs are a daily reality in programming; if you’re lucky, when you find a bug, it’s something small and you immediately recognise what has gone wrong (most likely because you put it there in the first place!) However, on this project I have dealt with two ‘major’ bugs, which were proving catastrophic for the project.

Bug 1: Easy Movie Textures

As mentioned in previous posts, I’ve been using a third-party plugin, Easy Movie Textures, in order to create film textures on mobile devices. This is because Unity still does not natively support such a feature, although it really should! As a third-party plugin, its behaviour is not entirely certain, and it certainly lacks the robust testing that Unity’s native features go through. However, it took some sleuthing even to pinpoint the bug as an Easy Movie Texture issue.

The problem manifested in early builds: UI buttons that moved the camera would stop working. The problem appeared in builds only; everything behaved as expected in the editor. My first thought was that this was a UI issue, as we had already encountered UI problems with Unity and the GoogleVR SDK. The bug initially seemed to be random, but further examination revealed that it only happened after a video transition – the first clue that it had something to do with the Easy Movie Texture asset. As it transpired, I was able to establish that the variable containing the current state of the movie (ready, playing, stopped, ended) did not behave in builds as it did in the editor. This meant that the ‘if’ statement used to detect when the movie was over was being triggered constantly, and the function it called, which moved the camera to a particular set of coordinates, was therefore also being called constantly, forcing the camera to become stuck in one place. The next UI marker would move the camera when clicked, but only for a single frame, as it was then forced back to its previous location by the still-running script.
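
The post describes the symptom rather than the eventual fix, but a common way to guard against this kind of end-of-movie handler re-firing every frame is to latch it with a flag, as in the hypothetical sketch below; IsMovieFinished() and MoveCameraToNextLocation() are placeholders, not Easy Movie Texture's or the project's actual calls.

    using UnityEngine;

    // Hypothetical sketch of latching the end-of-movie handling so it fires
    // only once. IsMovieFinished() and MoveCameraToNextLocation() are
    // placeholders, not Easy Movie Texture's or the project's actual calls.
    public class VideoTransitionGuard : MonoBehaviour
    {
        private bool transitionDone;   // latches once the end-of-video transition has run

        void Update()
        {
            if (!transitionDone && IsMovieFinished())
            {
                transitionDone = true;          // stop the camera move from re-firing every frame
                MoveCameraToNextLocation();
            }
        }

        private bool IsMovieFinished()
        {
            // Placeholder: the real check queried the movie texture's playback state.
            return false;
        }

        private void MoveCameraToNextLocation()
        {
            // Placeholder: the real function moved the camera to a set of coordinates.
        }
    }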

The issue was reported to Unity, who reproduced the problem but could not identify the cause. Thankfully, I was able to discover the solution myself in the end. This bug was an important lesson for me – it taught me that the Unity Editor and a build will not necessarily behave 1:1. As a result, I became more cautious about, and more understanding of, the differences between the two environments.

 

Bug 2: Crashing & Memory Leaks

This bug induced more than a bit of panic! As builds started getting bigger and bigger, we noticed that they would crash more frequently. Initially this was attributed to whatever the user was doing at the time, but eventually we noticed the app crashed regardless of what the user was doing – the crashes seemed inevitable. By using the debug tools in Xcode while the phone was hooked up, I found some shocking numbers.

 

Xcode memory graphs: a memory warning, and memory use in VR mode after nine minutes

Unfortunately, our app was eating up the phone’s memory. I discovered that this is called a ‘memory leak’, where something is constantly being stored in memory but never released from it. When memory use hits a critical point, the app crashes so that the phone can preserve its basic functions.

I did a lot of testing with various kinds of builds to see how memory behaved and to try to identify the source of the leak. I discovered two interesting things: when VR mode was disabled, there was no leak, and when no Unity UI elements were present (in VR mode), there was also no leak. In those cases the app’s memory use looked as one would expect:

Memory use with VR mode disabled, after three minutes

It was clear something was wrong with Google VR, and its VR mode specifically, in combination with Unity’s UI. This led me to do a lot of querying online and with various people who might have had answers. However, I blissfully stumbled across a closed Google VR thread: as it turns out, the issue is with Unity, in any version greater than 5.3.1f. That version came out in February, so we had been using an incompatible version since the start of the project! Thankfully, this meant that rolling the project back to an older version of Unity saved it.

Despite being a very stressful event, the memory leak bug gave me a better understanding of Unity, Google VR, programming and memory management – especially for mobiles, which have so little memory to spare.

Week 7 (25-31 July, 2016)

Hi, it’s Ying here. It has been quite a while since my last update, which will be recounted in this post. I will start with the last week of July, after the mid-term presentation, and then summarise my work in August in the following posts.

Spatial Audio Runtime Glitch in Unity

After finishing the spatial audio demo with the Google VR SDK in week 6, I found an annoying glitch in Unity. My spatial audio demo had four audio assets – the bell, the chisel, the bike and the ambience. All of them are triggered by the user’s gaze input except the ambience, which is played on awake and loops in the background. When the clips played simultaneously at runtime, the screen flashed pink on my MacBook. To be honest, I don’t know the exact cause of the glitch; it was probably due to the memory limitations of my Mac, or to Unity not handling audio clips at very different levels well. The glitch did not happen after I normalised the individual audio files. I did the normalisation in Audacity, but the task can be done in other DAWs or in Unity as well.
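
As an aside, here is a minimal sketch of how peak and RMS levels could be checked from inside Unity itself; the component and its logging are my own illustration, and it assumes the clip’s import settings allow its sample data to be read (e.g. Decompress On Load).

    using UnityEngine;

    // Illustrative sketch: measuring an AudioClip's peak and RMS inside Unity,
    // as an alternative to checking levels in Audacity. Requires an import
    // setting that exposes sample data (e.g. Decompress On Load).
    public class ClipLevelMeter : MonoBehaviour
    {
        public AudioClip clip;

        void Start()
        {
            float[] samples = new float[clip.samples * clip.channels];
            clip.GetData(samples, 0);

            float peak = 0f;
            float sumSquares = 0f;
            foreach (float s in samples)
            {
                float a = Mathf.Abs(s);
                if (a > peak) peak = a;
                sumSquares += s * s;
            }
            float rms = Mathf.Sqrt(sumSquares / samples.Length);

            // Convert linear amplitude to decibels relative to full scale (0 dBFS).
            Debug.Log("Peak: " + (20f * Mathf.Log10(peak)) + " dBFS, RMS: " + (20f * Mathf.Log10(rms)) + " dBFS");
        }
    }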

Waveform Analyser Plugin for Audacity

I reported the aforesaid glitch to Enda, and he suggested bringing all the audio clips to roughly the same level to save the computer from extra computation. He pointed me to an audio waveform analyser plugin for Audacity < https://gist.github.com/endolith/514077 >, so that I could check the peaks and average levels (RMS) of each soundtrack. This plugin is very useful when I need to know the average decibel level of the audio files before putting them into Unity. A solution for the glitch was found: as long as the audio files were normalised to approximately the same level, the screen would not flash again in Unity 5.4.0b23 at runtime.

Integrating VSTs in Cubase and Creating UI Sounds

I used Cubase for the composition of the UI tones. Although my UI sounds were not adopted in the end, I think the process itself is worth a bit of documentation. On Tuesday and Wednesday (26-27 July), I was searching for free VSTs and integrating them into Cubase.

Jill preferred to have a percussion instrument called the hang for the UI. Unfortunately, I could not find a free VST for this wonderful instrument. Also, Enda mentioned that he preferred to have a mix of harp and piano for the UI. The notion was inspired by the Trinity College Harp, which, according to legend, was once owned by the Irish King Brian Boru. The harp represents the Irish music tradition and aesthetics in many ways. I specifically looked for a harp VST as the HALion Sonic instruments in Cubase are mostly unavailable on the PC I am working with, and the existing VSTs do not sound natural for either a harp or a piano. By the end of the day I had 10 VSTs installed in Cubase; their names are listed below. I decided to go only with mda Piano and VS Etherealwinds Harp to make the UI sounds.

mda Piano < http://mda.smartelectronix.com/synths.htm >
VS Etherealwinds Harp < http://vis.versilstudios.net/etherealwinds-harp.html >
Other VSTs: DSKAkoustiK KeyZ, EP-Station, Piano Harp, 4Front EPiano Module, EVM Grand Piano, GlueReeds, Joe’s Jazz Piano I, MrRay 73


I made two UI tones, one for the highlight effect when a UI marker is gazed upon and the other for confirmation when a UI marker is clicked. They were made with the VS Etherealwinds Harp and the mda Piano. I chose a chord for the highlight and two notes for the confirmation. Both are very simple so that they won’t distract the user from the immersive experience.


 

Research on Irish harp music

This week I also did a little bit of research into Irish harp music. Turlough O’Carolan (1670-1738) is perhaps the most famous harper and composer in Irish history. He bequeathed to generations of Irish musicians more than 200 pieces for the harp alone. I selected seven songs from his repertoire in preparation for the video background music in the app.

Captain O’Kane (recorded)
Lady Maxwell (recorded)
Maurice O’Connor
Blind Mary
Lady Athenry
Mrs Judge
Sheebeg & Sheemore

Many songs created by Turlough O’Carolan were not published during his lifetime, and they were meant to be learned by ear. When I searched for the sheet music for the above pieces, the scores I found only provided the main theme for the right hand. The performer needs to improvise the left-hand accompaniment, which varies endlessly between performers. Due to time constraints and the limits of my talent, I only recorded the main themes of Captain O’Kane and Lady Maxwell on Friday, 29 July. I passed the Captain O’Kane piece in MIDI to Enda, and he found a better arrangement for it in the end.


 

Finalising Ambience Sound Design Document

I also finalised the ambience sound design document on Thursday, 28 July, so that Enda and the others could have an idea of what kinds of sounds would be put into Unity, by location. The design document followed the naming conventions Enda had outlined in Unity. The test recordings I took on 18, 19 and 20 July were also taken into account, especially the environment and the directions the sounds come from. With the sound design penned down, I did another two field recordings on Sunday, 31 July and Monday, 1 August.

Audio format considerations

All the ambience was recorded with both the MS and XY microphone pairs, giving stereo audio files with 4 channels across two tracks. I did not record the ambience as a 4-channel ambisonic file in a single track, as I was not sure about the editing workflow in a DAW or the support for that format in Unity. I thought the directional 4-channel recording would suffice for postproduction anyway.

Also, I decided not to export the audio in MP3 format. There was 0.04 seconds of silence at the beginning of the soundtrack, which is unavoidable when exporting to MP3, and it resulted in a sudden pop when the track was looped in Unity. I would rather export the audio as .wav and let Unity compress it after the assets are imported.