Project Overview and Demo

Project Overview

Terminal is a virtual reality puzzle game, created to measure how auditory and textual cues affect puzzle-solving ability in virtual reality.

The game takes place in a warehouse, where there are two basic layers of interaction. The first layer uses basic physics: picking up and moving boxes, turning on lights, opening doors, etc. The second layer takes place on an in-game computer monitor, and these interactions affect the warehouse in various ways. The puzzles throughout the game span both layers of interaction, and gradually increase in difficulty.

The source code for the project can be found here.

Below is the poster we presented at York University's 2017 Undergraduate Research Fair.

Technical Summary

The game is built with Unity, using C#, and played using an Xbox 360 controller with a head-mounted display.


The Puzzles

Puzzle 1: The Lights

For the first puzzle, we thought it was important to keep things simple. The player has to interact with the virtual monitor to push a red box through a maze to its end goal -- a blue panel in the middle. As the red box is pushed along inside the monitor, a second box begins to float in front of the player, seemingly mirroring the movement of the on-screen red box.

When the on-screen red box is placed over top of the blue panel in the middle of the maze, the hovering box drops, all the lights in the warehouse (including the monitor screen) shut off, and the emergency lighting (a dim red glow) is activated.

The player then has to figure out how to turn the lights back on. Doing so requires finding the breaker, which is behind a locked door. To unlock the door, the player has to find the key (which fell with the floating box).

Puzzle 2: Colour Matching

After the player has turned the lights back on, the 4 consoles that line the left and right walls turn on, each showing one of four colours: red, blue, green and yellow. A cue then tells the player that the coloured boxes need to be delivered. Since the boxes are on the top shelves, out of reach, the player has to return to the monitor.

The monitor now displays a top-down view of the warehouse. When the player is locked onto it, the left joystick controls the movement of a box that's visible on the screen but not in the warehouse. Moving this box into the coloured boxes atop the shelves knocks them down to the floor.

Now the player can place each box into its matching console. Once each of the 8 boxes (2 of each colour) has been placed inside its matching console, the console lights shut off and 1 box shoots out of each console: each being 1 of the 4 colours they accepted.
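The completion rule for this puzzle can be sketched in a few lines. The game itself is written in C# for Unity; the Python below only illustrates the rule. The names (Console, place, puzzle_complete) and the choice to reject mismatched boxes are assumptions made for the example, not the project's actual code.

```python
from dataclasses import dataclass

COLOURS = ("red", "blue", "green", "yellow")
BOXES_PER_COLOUR = 2  # 8 boxes in total, 2 of each colour


@dataclass
class Console:
    """Hypothetical model of one wall console that accepts a single colour."""
    colour: str
    accepted: int = 0

    def place(self, box_colour: str) -> bool:
        """Accept the box only if it matches and the console is not full."""
        if box_colour != self.colour or self.accepted >= BOXES_PER_COLOUR:
            return False
        self.accepted += 1
        return True


def puzzle_complete(consoles) -> bool:
    """True once every console holds both of its matching boxes."""
    return all(c.accepted == BOXES_PER_COLOUR for c in consoles)


consoles = [Console(colour) for colour in COLOURS]
```

In this sketch, the check that shuts off the console lights would run after every successful place call.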

Coming Soon!

There are several other puzzles that are in active development. They'll be documented here once they're finalized.

Documentation and Technical Challenges

The monitor

The primary mechanic we wanted our game to use in its puzzles is the interplay between the "virtual" space (the contents of the monitor) and the "real" space (the warehouse). Getting an in-game monitor display working and making it interactable was our first priority.

The easiest (and only) method we could figure out for accomplishing an in-game monitor effect was to create a plane (to serve as the monitor), create a camera (to serve as the monitor's "feed"), create a material out of the camera's feed, and then apply that material to the plane.

Luckily for us, this had become a lot easier than it used to be: using a camera as a texture was once a Pro-only feature in Unity, and that was no longer the case.

Below is a video of our first test of the monitor mechanic.

Once we had the basic mechanic working, we built our first monitor game. We created a simple maze. As you can see in the video below, we had a lot to learn about collisions in Unity.


Lighting

We went through several iterations of lighting in our project. Initially, we wanted to use baked lighting to improve performance and make everything look nicer, but we ran into several problems along the way. We ended up using realtime lighting for most of development, and then switched to baked lighting at the end.

First, we couldn't even get Unity to properly complete baking a lighting setup. It would take hours and get stuck halfway through. We left it overnight once and it still hadn't completed by morning. We eventually figured out that the models we'd created in Maya were somehow causing Unity's lighting engine to work a lot harder than it had to, so we remodelled everything in SketchUp.

When we finally managed to get the lights to bake, the resulting lightmaps were a mix of various bugs and issues.

Creating a text-based HUD in the Oculus Rift headset

In order to properly compare textual cues with auditory cues, we had to give the two a fair fight. Audio cues typically allow the player to continue focusing on their environment, so we wanted to bring our text cues to the same level.

We wanted the text to show up in the middle of the player's screen, no matter where they were looking, and it ended up being quite finicky to get the text to show up this way.

The first thing I tried was creating a Canvas object and then adding a Text object to it. I then created a Camera object called UI Camera (which only rendered the layer that the canvas was on), and then, putting the canvas in "World Space" mode, I placed it a few feet in front of the player character. I attached the canvas to the CenterEyeAnchor inside the OVRPlayerController object, so that it would always be in front of the player's headset.

Using this method was almost perfect, but not quite. If I put the canvas on the UI layer, it showed up over top of everything else in the scene, but it wasn't locked to the center of the headset, and seemed to float somewhere above the player character, matching the movement of the mouse but not the headset.

If I put the canvas on the default layer, it showed up in the center of the headset like I wanted it to, but it didn't render on top of everything else in the scene.

I eventually found that there was an Oculus shader that I could apply to the 3D text object I'd placed in world space that would make it appear on top of everything else. The text was finally in the middle of the headset and over top of everything else.

And many more

There are several other difficulties that will be documented here in the future.



Virtual reality is on the brink of mainstream adoption, with video games currently being the main application for the technology. While the genre is young, virtual reality game developers are still experimenting with different ways to direct players through their games. Existing research in the field of virtual reality tends to focus on its effects on cognitive abilities when compared against alternative technologies. By having participants play a puzzle game in virtual reality, we explore the differences between the two main types of direction in video games: visual (text) and auditory (speech). We hypothesize that auditory instructions will be more effective and lead to better solve times than visual instructions.


Monitor is the name of a game that we’ve developed to determine the differences between textual and auditory cues. The game requires its players to solve a series of puzzles and takes place in a virtual warehouse. The puzzles mainly take place on an in-game monitor and are based around the interplay between the objects on the monitor and the objects in the warehouse. Designing the puzzles this way forces players to think spatially. We decided on spatial puzzles because a body of research suggests that virtual reality is particularly effective when it comes to solving spatial puzzles. A couple of these studies are outlined in the section below, but many more can be found with ease. [1]

We will use solve times, paired with a brief questionnaire, to compare two different groups of players. The first group of players will receive their directions via a computer-generated voice, through a pair of headphones. The second group of players will receive their directions via text overlaid on a heads-up display inside the virtual reality headset (the text follows the player’s head movement so that it’s always visible). Both groups will receive the exact same directions, and the directions will tell players what they need to do next.

By analyzing and comparing the metrics we acquire from both groups of players, we will determine whether or not there is any quantifiable difference between the solve times of the group who receives auditory cues, and the group who receives visual cues.

The game is being developed in the Unity game engine [2]. The game space is experienced through a head-mounted display (Oculus Rift) and a pair of headphones. The player moves through the space and interacts with objects using an Xbox controller. Since training “in a virtual environment leads to a reduction in real task completion time when tested” [3], we wanted to explore the nuances and deeper implications of this correlation, with regard to virtual reality technology and its ability to improve cognitive abilities. Immersing the player in the game will also have the added benefit of eliminating almost all sources of distraction.

Related works

We found several research projects that measure problem solving skills, but found none that talk about how a game’s narrative elements change its players’ ability to solve puzzles, so we decided to build a game that would allow us to measure this.

In a project called Squareland [4], researchers attempted to create a framework for spatial processing and cognitive processes in wayfinding. In their preliminary findings, Hamburger and Knauff discovered that “[t]ask instruction did not provide any significant differences or advantages for certain groups” [4], but we hypothesize that this is because the tasks in their test were so simple that users had already figured out what to do next by the time the “hints” were given. In our game, the results will likely be different, because it is not always immediately apparent how the puzzles tie together, and what the player has to do next. Our cues will simply guide players in the right direction, without outright telling them what to do.

In another study, Linehan et al. found that “The pace at which challenges are introduced in a game has long been identified as a key determinant of both the enjoyment and difficulty experienced by game players, and their ability to learn from game play.” [5] We intend to pace the introduction of our game’s mechanics accordingly, building each puzzle so that it uses a previously introduced mechanic with new challenges or complexities added onto it.

Furthermore, their findings suggest that: “1) the main skills learned in each game are introduced separately, 2) through simple puzzles that require only basic performance of that skill, 3) the player has the opportunity to practice and integrate that skill with previously learned skills, and 4) puzzles increase in complexity until the next new skill is introduced. These data provide practical guidance for designers, support contemporary thinking on the design of learning structures in games, and suggest future directions for empirical research.” [5]

When testing the problem-solving abilities of test participants who played the game Portal 2 [6], Shute et al. found that those participants “showed significant increases from pretest to posttest on specific small- and large- scale spatial tests” [7], which suggests that puzzle games with a strong audio direction component are worth exploring in the context of cognitive ability.

Performance evaluation

The majority of data we’ll collect from each player will be temporal. Every time a puzzle begins, a timer will start, and it will not stop until that puzzle is completed (or the player expresses a desire to give up on that puzzle). We’ll also record how long it takes, after the game starts, for the player to first interact with the monitor. This number could be used as a baseline to weight each player based on how quickly they typically solve things, but may not end up being used at all if no correlation is found between this number and other performance metrics.
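As a rough illustration of the per-puzzle timing described above, the sketch below records an elapsed time and an outcome for each puzzle. It is written in Python rather than the game's C#, and the class name, method names, and the "solved" / "gave_up" labels are all hypothetical. The clock is passed in so the logic can be checked without waiting in real time.

```python
import time


class PuzzleTimer:
    """Hypothetical per-puzzle stopwatch; `clock` is injectable for testing."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = {}   # puzzle name -> start time
        self.results = {}    # puzzle name -> (seconds, outcome)

    def start(self, puzzle):
        """Begin timing when the puzzle begins."""
        self._started[puzzle] = self._clock()

    def stop(self, puzzle, gave_up=False):
        """Stop the timer when the puzzle is solved or abandoned."""
        elapsed = self._clock() - self._started.pop(puzzle)
        outcome = "gave_up" if gave_up else "solved"
        self.results[puzzle] = (elapsed, outcome)
        return elapsed
```

A monotonic clock is used as the default so that changes to the system time during a session cannot produce negative or inflated solve times.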

Another metric we’ll use is a brief questionnaire that participants will be asked to fill out. The questionnaire will determine how familiar the player is with video games, virtual reality, and whether they consider themselves to be more of a visual or auditory learner. Furthermore, we’ll also use a survey to get a general idea of the past experience each player has with puzzle games. We can ask things as broad as “are you familiar with Xbox controllers? [yes / no]” or as specific as “have you played the video game Portal or Portal 2 before?” to give us an idea of how new the experience may be to each player. The participant’s answers to these questions will help us determine how much weight their data carries.

We will also include questions relating to virtual reality to determine which players have experience with the technology. We will use a scale from “no experience” to “lots of experience” and use the player’s response to determine how much weight to place on their performance in our overall scoring of players using virtual reality versus a traditional monitor interface. Determining how much a player’s past experience with virtual reality will affect their ability to solve puzzles in virtual reality will be challenging. We will have to come up with some sort of averaging to balance out any advantages those players have over players with no virtual reality experience. Due to how immature the technology is, finding an audience who is familiar with virtual reality would be very difficult, so finding a balance will be imperative to getting meaningful results from our experiment. We’re relying on the fact that virtual reality is generally fairly intuitive.

We will also take a sum of all the puzzle solve times and calculate an average solve time for each player. This figure will allow us to compare every player’s performance against an average.
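A minimal sketch of this calculation follows, in Python. The function names and the example numbers are made up for illustration, and the "average" each player is compared against is assumed here to be the mean of the per-player averages; the study may define it differently.

```python
from statistics import mean


def average_solve_times(solve_times):
    """Map each player to the mean of their puzzle solve times (seconds)."""
    return {player: mean(times) for player, times in solve_times.items()}


def deviation_from_overall(solve_times):
    """Each player's average minus the average across all players."""
    per_player = average_solve_times(solve_times)
    overall = mean(per_player.values())
    return {player: avg - overall for player, avg in per_player.items()}


# Made-up example data, not measurements from the study.
example = {"p1": [60.0, 120.0], "p2": [30.0, 90.0]}
```

With the example data, p1 averages 90 seconds and p2 averages 60 seconds, so they sit 15 seconds above and below the overall average of 75 seconds.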

The rough logical flow of how measurements will be captured throughout the game can be seen in the flowchart below.

It’s important to note that puzzle solve times don’t necessarily directly represent how long it took for someone to figure out a puzzle, but rather how long it took them to solve the puzzle. The measured time includes the repetition often found in puzzle games, so it does not separate the time spent figuring a puzzle out from the time spent actually carrying out the solution once it’s been figured out. This is a limitation in our research.

Interaction mechanics and controls

Our ideal participant would have a comfortable level of familiarity with a traditional game controller and with virtual reality. We expect to find many more people who have experience with the former than the latter due to the age of each technology. We will devise a mathematical mapping to weight participants’ scores differently depending on how closely they match our ideal participant. For example: we’ll add weight to the score of a player who is familiar with virtual reality and we’ll take away weight from the score of a player who has never used a traditional game controller before. The reasoning behind this is that learning how to use a new technology will take time away from figuring out how to solve the puzzles. The weighting (or whether or not it’s even necessary) will be revisited once we’ve collected all the necessary data and measurements.
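The mapping has not been decided yet, so the following Python sketch is only one possible shape for it. The baseline of 1.0, the bonus and penalty of 0.25, and both function names are placeholder assumptions chosen for the example, not values from the study.

```python
def participant_weight(vr_familiar, controller_familiar,
                       vr_bonus=0.25, controller_penalty=0.25):
    """Hypothetical weight: 1.0 baseline, adjusted by questionnaire answers."""
    weight = 1.0
    if vr_familiar:
        weight += vr_bonus            # add weight for VR familiarity
    if not controller_familiar:
        weight -= controller_penalty  # take away weight for controller novices
    return weight


def weighted_mean(values, weights):
    """Weighted average of per-player solve times."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

For instance, with solve times of 100 and 200 seconds and weights of 1.25 and 0.75, the weighted mean is 137.5 seconds, compared with an unweighted mean of 150 seconds.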

Research on the topic suggests that “subjects are able to acquire configuration knowledge of [a] virtual [space] even in the absence of physical movement” [8], so we don’t foresee the lack of physical movement being a factor in the research.


It remains to be seen if there will be any difference between the performance of players given audio versus visual cues.

Audio and text are both used frequently in video games (oftentimes in conjunction) as mediums through which to deliver directions to players, yet it is unclear which (if either) is more effective.

Since players are already using their vision to play the game, we believe that audio cues will have an inherent advantage: they will allow players to use more than one sense to interpret information.