The Silver Lining - Journal Special - Entry #10: Testing 1, 2, 3...


October 11th, 2008 | Neil Rodrigues

Last month’s entry discussed the final phase of development, but the work is not over yet. It is very rare that something is done perfectly from start to finish. When you consider all the previously discussed game elements, all the different people who worked on them during various phases, and all the rework and refinement they went through over the years, it’s very easy for things to “break” or behave differently than expected in the final product. This variance between an expected result and an actual result is more commonly referred to as a bug. A bug could be something obvious and critical like the engine crashing, or something minor and nitpicky like a missing word in some dialogue. It could be something visual, audible, code-related or even design-related if the problem is with the plot itself. A bug could be fixed in a couple of minutes or an hour, or it could take days, weeks or months, depending on the complexity and work involved. The goal of testing is to find and fix bugs. While it’s impossible to find and fix all bugs, fixing as many as possible is the next best thing. This month’s entry will discuss how we perform testing in The Silver Lining.


Reference Material

While most people know that a bug is an oddity found in software, it’s hard to test something without knowing exactly what you are trying to find. Alpha testing is usually done internally by team members. They play through the game, comparing the gameplay to what they expect from reading the script. While the script doesn’t describe everything and may have been modified over time, it still provides a good basis for what to expect in each scene and what things the player can interact with.

Beta testing, on the other hand, is done by people who usually have no prior knowledge of the game script. They play through the game without any reference material, and look for bugs based on their own expectations. To help them out, before the testers joined our team I gave them some general tips on the sorts of things they should be looking for. Playing through the game without knowledge of the script allows testers to try things that team members may not have anticipated. For example, the script might have specified one exact location to obtain water, without its authors realizing that there are plenty of other places to accomplish this very same action. When a tester tries to fetch water from a plausible source but is instead treated to a general Narrator error message, they’ve found a bug.


Bug Categorizing

Some bugs are obvious, such as the game crashing when entering a scene, or fire appearing as blue instead of orange/yellow. Other bugs are more obscure, like a word discrepancy between written dialogue and spoken dialogue, or being able to use the same inventory object on a character two or more times.

When we held a contest to find beta testers, we used a very recent build of the game restricted to three scenes. The scenes themselves contained 57 known bugs, plus over 50 previously unknown bugs found by those who participated in the contest. The winners were chosen based on how many of these bugs they found individually, as well as on how well they reported their findings. They too were given no reference material, and the only guidance provided was to sort their findings into four broad bug categories: visual, sound, click and user interface.

A visual bug refers to:

- texture related (e.g. the graffiti on the bench which read “tsl4life”)
- geometry related (e.g. broken 3D model of Saladin ancestor statue)
- export related (e.g. GuardDogs not holding spears)
- environment related (e.g. rain penetrating roof in Hallway)

Figure 1: Example of a Visual Bug: Trogdor Graffiti

An audio bug refers to:

- voice related (e.g. GuardDogs lack voice)
- dialogue related (e.g. eye on Garlands says "Bitterly, Graham's eyes linger on the garlands of fresh flowers." while audio does not say "fresh")
- volume related (e.g. Oberon’s voice needs normalization)
- music related (e.g. Hand on Cloak playing Justin Timberlake - SexyBack)

Example of an Audio Bug: Oberon's Supercalafragilistic Outburst

A click-related bug refers to:

- dialogue related (e.g. eye on general in InnerGarden says "INSERT DIALOGUE HERE")
- walk related (e.g. difficult to walk in scenes with tracking camera)
- collision-mesh related (e.g. Graham can walk through walls and fall into a white abyss)
- scripting related (e.g. the left autowalk takes Graham to east side of balcony, when you go back through door, the moment he gets to the door he turns around and goes back up)
- marker related (e.g. hand on door to Hallway causes Graham to walk to incorrect marker)
- audio or visual bugs that occur when clicking on something

Figure 2: Example of a Click-related Bug: Graham-eating Column

A user interface bug refers to:

- inventory related (e.g. eye & hand on 3D cloak do nothing)
- main menu related (e.g. save & restore buttons have different highlights)
- options related (e.g. audio extensions text overlaps volume sliders)
- cursor related (e.g. closing inventory with Arrow cursor doesn’t revert cursor to last used one)
- save/load related (e.g. screenshot preview is missing)

Figure 3: Example of UI Bug: Text Overlapping Volume Sliders

There was also an “other” category, for bugs people found that did not fit into any of the four broad categories.


Bug Reporting

For the beta testing contest, the structure for bug reporting was fairly loose. Only one bug report was permitted per person, and all bugs had to be numbered, but that was decided upon more for clarity and evaluation purposes than anything else. The actual beta testers use our project management tool, Redmine, to report issues they find while playing the game.

Figure 4: Screenshot of a Reported Bug

Unlike bug reporting in the contest, there are a few rules team members must follow before entering bugs. First, the tester must use the search function to see if the bug they’re reporting has already been entered. The reason for this is that there were known issues already entered into Redmine by the team before the testers joined. A duplicate issue wastes the developer’s time investigating it, misleadingly inflates the amount of work remaining, and can quickly become annoying for any other tester needing to search through the issue list. If a duplicate is reported, one of the issues must be closed. Sometimes the older issue is closed, so that the remaining one indicates the issue is still present in the latest build. Other times the newer issue is closed, because the older issue describes the problem better.

Next, every subject must be prefixed with the isle and scene name (e.g. IoC_CrossroadsStorm) so that the developer can immediately know what area of the game to look at. It also helps to identify all bugs reported in a particular scene when sorting the issue list by subject. The rest of the subject line should contain a brief description of the problem (which some of our testers enjoy turning into amusing puns).

The description field is where the tester describes the issue in detail. This usually starts with the build number they performed the test on (i.e. the Subversion revision number), which helps the developer know whether or not their fix was in that build. Additionally, the description may contain steps to reproduce the issue, and attachments like screenshots or saved game files. Anything that can help the developer identify and reproduce the issue is beneficial.

Lastly, the tester assigns the issue to the Lead Tester, who verifies that the issue follows the aforementioned guidelines. She then re-assigns the issue to the correct developer, and may set the bug category as well.
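The subject-line convention described above lends itself to a simple automated check. The following is only a minimal sketch, not the team's actual tooling: the regular expression, the `Subject` type and both function names are assumptions made for illustration, and only the example prefix `IoC_CrossroadsStorm` comes from this entry. It also assumes the prefix is separated from the description by whitespace.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: "<Isle>_<Scene>" prefix, whitespace, then a short description.
# The exact separator and allowed characters are assumptions.
SUBJECT_RE = re.compile(
    r"^(?P<isle>[A-Za-z0-9]+)_(?P<scene>[A-Za-z0-9]+)\s+(?P<summary>\S.*)$"
)


class Subject(NamedTuple):
    isle: str
    scene: str
    summary: str


def parse_subject(subject: str) -> Optional[Subject]:
    """Return the parts of a subject line, or None if the isle/scene prefix is missing."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return Subject(match["isle"], match["scene"], match["summary"])


def group_by_scene(subjects):
    """Group subjects by their isle/scene prefix; malformed ones go under the key None."""
    groups = {}
    for subject in subjects:
        parsed = parse_subject(subject)
        key = f"{parsed.isle}_{parsed.scene}" if parsed else None
        groups.setdefault(key, []).append(subject)
    return groups
```

Grouping by prefix mirrors what sorting the issue list by subject achieves: all reports for one scene end up together, and anything filed without a prefix is easy to spot and send back to the tester.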


Bug Fixing

When fixing bugs, usually it’s enough to simply read the subject and description. However, there are times when the problem is still too vague to identify. While not all developers use the bug category field, I find that it helps greatly in narrowing down the problem. For example, if the issue is "clicking on an object yields the general scene dialogue", it could mean any of the following:

1. the collision mesh for that object is missing
2. the collision mesh for that object was not exported correctly
3. the collision mesh for that object is being covered by some other collision mesh
4. the name of the collision mesh does not match the name in the Conversation file
5. the dialogue for clicking on that object is missing from the Conversation file
6. the scene scripting for that object interaction is invalid
7. the game engine is unable to process the action for some other reason

Problems 1-3 are art-related issues, 4 & 5 are dialogue-related, and 6 & 7 are programming-related issues. Different people handle issues depending on what the problem is, meaning an issue could possibly be re-assigned several times and worked on by several people before it finally gets fixed.

Once the issue has been fixed, the developer sets the issue status to Re-test, and re-assigns the issue to the original tester. That tester will then update to the latest build of the game and try to reproduce the issue. If the test passes, they mark the issue as Closed. If it fails, they set the status to Feedback, and add a note that the issue (or some aspect of the issue) still has not been fixed. The Feedback status and note field may also be used by the developer to request additional information about the issue from the tester before attempting a fix.
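The status hand-offs described above can be pictured as a small state machine. This is a sketch under stated assumptions rather than a description of how Redmine is configured for the project: the `New` starting status is an assumption (the entry does not name the initial status), and the `ALLOWED` table and `advance` function are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"            # assumed initial status; not named in the entry
    RETEST = "Re-test"     # developer believes the bug is fixed
    FEEDBACK = "Feedback"  # fix failed re-test, or developer needs more information
    CLOSED = "Closed"      # tester confirmed the fix


# Transitions implied by the workflow described above (an assumption, not Redmine config).
ALLOWED = {
    Status.NEW: {Status.RETEST, Status.FEEDBACK},
    Status.RETEST: {Status.CLOSED, Status.FEEDBACK},
    Status.FEEDBACK: {Status.RETEST},
    Status.CLOSED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise ValueError if the workflow does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move an issue from {current.value} to {target.value}")
    return target
```

Under these assumptions, an issue only reaches Closed by passing through Re-test, which matches the rule that the original tester must confirm every fix.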


Bug Prioritizing

One of the most difficult things the team goes through is deciding how to prioritize bugs. I like to base priority on severity. That is, any bug marked as Critical should be fixed first. Critical bugs are usually bugs that cause the game to crash, make gameplay unbearably slow, or otherwise put the game into an unplayable state.

Figure 5: List of Critical and Less Severe Bugs

The “unplayable state” could be argued and deemed less severe than Critical (i.e. Immediate, Urgent or High), because Graham getting stuck in a wall or falling through the floor is usually something that only happens when the user tries to walk through odd places. Another example of a less urgent Critical bug is one that occurs only for certain people. People with older computers tend to experience issues that others do not. For example, having less RAM or an older video card generally means scenes will take much longer to load and may even be temporarily choppy during gameplay.

One Critical issue we solved recently involved the game getting extremely sluggish after exiting and re-entering a scene 20 or more times. It turned out that the game had a memory leak where audio was never being freed when leaving a scene. The game would consume more and more memory each time a scene loaded, until the operating system could no longer render the game at a normal framerate. This bug was solved by resetting the audio driver between scene changes, which forces the audio buffers to clear.

Most bugs fall under the severity of Normal, which means they all have equal priority and can be fixed in any order. Below that we have two priorities: Low and Tweak. Ideally, Tweak priority bugs should be kept for last. These are issues that do not impact a player’s ability to complete the game, and usually amount to nothing more than finishing touches. The problem is that a finishing touch for one person could be a Normal or even High priority bug for another, depending on how annoying the issue is. For example, I would consider a spelling mistake in dialogue to be a Low or even Normal priority bug, because to me, spelling errors tend to stand out like a sore thumb (especially when replaying a scene over and over)! On the other hand, someone else may consider a spelling mistake to be acceptable because it doesn’t impact gameplay.

Tweaks are usually very easy and quick to fix, but over time they can really add up. They can get to the point where the total number of issues reported gives a misleading picture of the work left to be done. The other problem with tweaks is that they can be deceiving. Something that seems to one person like a tiny fix may in fact turn out to be a massive undertaking. An aspect of the game can have so many tiny things wrong with it that the entire piece needs to be redone from scratch. When this happens, it’s done with the intention of solving many issues at once, but there’s also the possibility of it creating more issues than it solves, due to the sheer size of the change.
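As a rough illustration of severity-based ordering, the priority names mentioned above can be ranked and used as a sort key. This is a sketch only: the relative order of Immediate, Urgent and High is an assumption based on the order in which they are listed, and `PRIORITY_ORDER`, `RANK` and `triage` are hypothetical names rather than part of any real tool.

```python
# Priority names from this entry, most severe first.
# The order of Immediate > Urgent > High is an assumption.
PRIORITY_ORDER = ["Critical", "Immediate", "Urgent", "High", "Normal", "Low", "Tweak"]
RANK = {name: position for position, name in enumerate(PRIORITY_ORDER)}


def triage(issues):
    """Sort (subject, priority) pairs so the most severe issues come first.

    sorted() is stable, so issues sharing a priority keep their reported order,
    matching the idea that Normal bugs can be fixed in any order.
    """
    return sorted(issues, key=lambda issue: RANK[issue[1]])
```

A fixed ranking like this does not settle the disagreement described above, where one person's Tweak is another person's High. It only makes the chosen order explicit once a priority has been agreed on.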


Testing is often an ongoing and repetitive process, which involves performing similar actions over and over again in order to find bugs. Once a bug has been found, a report must be made, describing the bug in detail as well as the exact steps taken to reproduce the bug. The responsibility then transfers to the developer, to determine the cause of the bug, to fix it and to reproduce the steps to ensure it’s fixed. Lastly, the responsibility is transferred back to the original tester, to double-check that the bug no longer exists. Once a considerable list of bugs has been made, the problem then becomes one of deciding which to fix first. Ideally, the bugs that impact gameplay the most should be fixed first, but this hasn’t always been the case. In the past, we’ve used many different strategies to figure out what tasks should be worked on and who should be working on them. Next month’s entry will discuss how this project has been managed over the years, as well as the tools used throughout each process.