Wednesday, April 7, 2010

Usurping the (Useless?) Utilization of the Usual Usability Test with Gratuitous Substitutes

The following article uses a project I am developing for a friend of mine, Kyle Britt, as a case study.

One might think that I'd be remarkably adept in formulating a robustly rigorous usability testing plan. Why would one think this? Well, admittedly, I served for a brief time as Usability Research Assistant at the Simmons GSLIS Usability Lab. Unfortunately, if one assumed my deft aptitude, one would be sorely mistaken. Though slated to assist in the development and facilitation of the usability testing plan for the new redesign at my current place of employment, I am abhorrently clueless about the subject.

Disclaimer done.

Here's the lowdown, dirty: there's an ideal situation and then there's reality. Constrained by limited availability of time and labor, and compounded by an unfortunate dearth of expertise, I can't really achieve much. Reality bites. Therefore, I'll just start by saying that an intensively comprehensive evaluation of this site is impossible. (Or is it?) Here I am, alone in this venture; the financial and temporal feasibility of an adequate test is dangerously low, and even a traditional testing scenario involving just a few users would prove decidedly difficult.

That said, given the tools that are readily available (on the Internet and through Imagineering) in concert with the sheer size of Kyle's network of friends and clients, many of whom use Macs (there's a point to that...), there actually (that is, in reality; no, really) is a lot that I can accomplish! So, to that end, rather than spend my time contemplating the rainbows, sunshine, ice cream, and ponies that could be (In Dreams), I thought, "Why not investigate the options, of which there are many, available to me for conducting these here usability testing-thingies on a shoestring and a smile?" In other words, why not look into something that I could potentially achieve in the next three weeks?

So, to reiterate redundantly, what I'm going to do here is talk about a few options available to me and those like me who are working alone on a site, haven't any money or time to spare, and are just plain sick of sitting days on end in a usability lab staring at the walls. Simultaneously, I'll construct some qualitative and quantitative questions to quash any would-be quips about my quibbling and quandary-inducing cacophony of characteristically culpable crimes against all good taste and decency. OK?

OK. Onto the plan.

The Traditional Test

Now, if I were to conduct just one test, I would do so upon having nearly finished the project, i.e. right before I were to go live with the site but while I still had sufficient time to make changes that could quickly and effectively alleviate any truly glaring issues. Who and how would I test? Well, again, in a perfect world (Skokie, IL, 1953), I'd test at least fifteen persons in a rigorous fashion. In this instance, “rigorous fashion” means to individually sit each of the fifteen persons behind a two-way mirror but in front of an excellent computer machine, replete with an eye-tracking-enabled monitor, a high-quality audio microphone, and both overhead and straight-ahead cameras. I would employ Morae, a software suite developed and released by TechSmith Corporation, in order to capture each user's datastream: e.g., mouse clicks, on-screen text, keystrokes, program launches, and browser window modifications, not to mention mouse, keyboard, and tower hurls (these are the most fun, and the most destructive).

Naturally, as observeur and arbiteur extraordinaire, I'd sit on the opposite side of the two-way, laughing heartily at the testers as they toiled through the ten or so questions and tasks that I'd have developed for them, my hands at the ready to strike the necessary key combinations to input real-time notes into my Morae screencast and video. Perhaps I'd ask them questions or instruct them to complete tasks akin to the following:

1. "Could you find a picture of a woman sitting on a chess board and tell me how Kyle managed to do her hair in just twenty minutes flat?"

(This question would force the user to think about which navigational button he or she would have to depress in order to land on a page containing this information. Once there, they would have to quickly scan the photos and determine the correct image. As I ask this question now, I am immediately flooded with newer, infinitely more impressive, audaciously more awesome ideas, like: "Hmmmm, this makes it sound like the gallery would be an excellent candidate for a relational database with a robust (coffee, anyone? I can't be sure if this is a “java” joke...), perhaps Boolean, search function, enabling users to search the voluminous pictures for a variety of pieces of metadata and cull out the attributes in which they are interested." Or they could just suffer the unbearable burden of searching through the thumbnails in the gallery and rolling over the pictures so that they can find the picture of the lady on the chess board, read the description, and then find the picture of Kyle doing her hair. How successful one is at this task can tell the researcher [i.e. me] a lot. Why? It'll show me how clear and articulate the writing is in describing the photos, as well as how natural and fluid the navigational features of the website are. A rough sketch of that imagined search function appears after this list of tasks.)

2. "Please tell me your general impression of this site. Do you like the overall aesthetic, i.e. the look and feel? Is it pleasing to your senses? Would you come back?"

(I love questions like these. Very often we can obtain more information about a site, and how a person really feels about it, simply by asking them outright, "How do you feel about the pixels of light at which you are currently looking?" People like to talk [most of them] and people have interesting things to say [a few of them]. Obviously, quantitative data can tell us a lot about a site, its various functions, and how effective they actually are. But unlike, say, an anonymous poll, usability tests can be much more personal and can take into account [if you've completed an appropriate IRB review and have the person's permission] various demographic and personal details about the person. And unlike in an anonymous poll or survey, participants can't really hide anything! You can get to know a person pretty well by sitting and watching them struggle with, elate over, or otherwise stare at something you've created [and obviously think is awesome] for an hour or more! This is invaluable in determining how well one has designed one's site and what one can do to ensure that it targets the appropriate user population.)

3. "Could you tell me how you might make a donation to Kyle's brother's film project? And could you, uh, go ahead and make one (thanks...)?"

(Again, this question will test the clarity of the navigational structure as well as the intuitiveness of one of the main features of the site, namely, the donation form [PayPal, probably...]. I will be able to determine just how obvious the heading of "Documentary" is to the users, or whether it is semantically serpentine. Seemingly for certain. For some surfers. Sanguinely not for some others. [Ah, if only all humans were exactly the same!])

4. "Please sign up for news and exciting opportunities from Kyle!"

(This task is straightforward, but composed of several steps. The user will work through the sign-up form, after first having to find it, then submit it, navigate to their email, and confirm their subscription. Should be easy, right? Well, this task will show the researcher [yeah, you guessed it; Frank Stallo---err, I mean, me] how intuitive the various aspects of the sign-up form are and how easily the entire sign-up process can be completed.)

5. "Please share this on Facebook."

(This task could be completed in a couple of ways. Obviously, a user could copy the URL of the site and link it on Facebook, sharing it with their friends. But I'd be looking for them to utilize the social media icons at the bottom of every page.)

6. "What is your name?"

(These descriptions are getting shorter...)
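
As promised in the first task above, here is a rough sketch of how that imagined Boolean metadata search might work. Everything here is invented for illustration: the photo records, field names, and criteria are hypothetical stand-ins, not anything Kyle's gallery actually contains.

var photos = [
    // Invented records; a real gallery would pull these from a database.
    { file: 'chessboard.jpg', subject: 'woman on a chess board', stylist: 'Kyle', minutes: 20 },
    { file: 'rooftop.jpg', subject: 'rooftop portrait', stylist: 'Kyle', minutes: 45 }
];

// Return every photo satisfying an arbitrary Boolean predicate.
function searchPhotos(predicate) {
    var results = [];
    for (var i = 0; i < photos.length; i++) {
        if (predicate(photos[i])) {
            results.push(photos[i]);
        }
    }
    return results;
}

// e.g. "subject mentions 'chess' AND the stylist is Kyle"
var hits = searchPhotos(function (p) {
    return p.subject.indexOf('chess') !== -1 && p.stylist === 'Kyle';
});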

After all the users have completed the testing, I would inevitably pore over the data that was collected and determine ways to reduce clicks, rectify shoddy wordage, and refine inordinately meandering processes. It will obviously be important to have taken excellent notes over the course of the sessions, procured adequate background information on the users who undoubtedly "volunteered" for the study (so as to determine how that may have affected their feedback and task success), and developed a plan for manipulating the data in such a way that I could excitedly infer incisive conclusions from statistically significant analysis. I'm not a statistician, so I'll decline to comment much (or at all) on what that might entai---- "Hey! Wait a minute!!" I interject.

Obviously I've just realized that I unwittingly began discussing a framework for a traditional, fully-realized usability testing scenario. If you remember, I said that I wasn't going to do that. For shame.

Well, if I were to perform just one test at just one point of development during the creation and implementation of this site, it'd be something along those lines. But I don't want to test my site just once! And I don't want to sit in a usability lab! And I don't want to pay anybody! At least not with money... ("Hey, I'll buy you a beer and you can sit and listen to me play 'Yankee Doodle' on harmonica!")

Stage 1: Time-Aware Research

I approach websites like I approach any creative pursuit. For instance, when I've written a new song or had an interesting musical idea, I allow it to percolate for a time, in an internalized, insular way. Specifically, I play it by myself for a while, roll it around in my head for a longer while, and then finally allow it to burst forth brilliantly and blindingly. This bursting forth occurs quickly and results in a rough, raw, and unrefined 'product'. By product, I mean a 'demo' or, if you will, a 'mockup' or 'wireframe'. It is achieved by recording a basic outline of the song onto tape or computer, quickly and easily. Once that is completed, it is ready for limited dissemination, whereby persons whose opinions I respect on the matter listen to the song and provide feedback.

What would be an equivalent of this process for wireframes or mockups?

Obviously, the first thing that comes to mind is the dissemination of said mockups and wireframes, perhaps to GSLIS peers and professors, coworkers, friends, or family. The feedback received at that stage of development is invaluable.

But what are some other avenues for objective, unbiased review of initial design ideas?

Well, here is one.

This website provides a free outlet for designers and developers to share initial mockups or design drafts with others whose opinions are at once informed (maybe) and unbiased (we hope...). It also gives one an opportunity to see others' early designs, gathering ideas to further one's own creations. It affords a quick and easy way to see feedback from a variety of personalities and backgrounds in one place. Not a bad little tool for getting things off the ground. One can invest as much time as he or she likes (though the potential for time wastage is high) and it doesn't cost anything at all! Whether anyone actually bothers to look at the design is something that would have to be considered as well, but as a start, I probably wouldn't hesitate to at least contemplate uploading a design and gathering some feedback.

Here's another tool I might employ.

This website offers researchers the opportunity to “predict” how potential users might look at their site. The nice part about this tool is that you don't actually need a completed site; working with your wireframe or mockup, and using the “powerful algorithms” developed by Feng-Gui's "world-renowned" scientists, one can catch a glimpse of how users would likely scan over your site well before you ever sit down to start coding it. Neat. You also don't need to bother any potential users or solicit outside help; the “heatmap” that Feng-Gui creates is generated entirely by computer simulation.

I tried it with a website I had created a long time ago and more or less abandoned. I've linked the image from my Simmons web space on my project pages wiki.

Sadly, I don't think it worked terribly well, but then again, I don't think it was the best website with which to give this tool a whirl. As you can see, the “users” were really focused on the scroll bar (which is really obtrusive and conspicuous, isn't it?), the center image, and, for some reason, the right-hand side of the screen just under the main header image. Why there? I don't know.

Another very useful, free (albeit only in its most limited version) tool is Chalkmark.

It takes no time to sign up for an account with this really cool online tool. Once I made an account, I was able to quickly set up a new survey. Chalkmark allows you to upload images of your site and then administer a survey of questions that pertain to the image. Users then click on the location in the image that they believe best answers or suits the question or task at hand. There are many customizable options determining how participants access your image and how the survey is administered. The free demonstration allows up to three questions on a single survey. I uploaded a test so that readers of this article can try the survey I developed for Kyle's site.

Go here to take the very quick test. The tasks in this survey are obviously an attempt to call out any significant problems with the wording of labels or links. But I designed it such that the task or question is not relevant to the main content of the particular page, but rather forces the user to focus their attention on either the functional links or the main navigation. As the artistic and aesthetic elements of the site have yet to be realized, it makes no sense to ask users about qualitative aspects of the site.

If I'm able to get in touch with Kyle (no small task), I'm at a distinct advantage. All he has to do is post the link to his Facebook status, and it is very likely that at least 300-400 of his Facebook “friends” (or approximately 30-40% [the usual response rate when soliciting a large group of people en masse {how many levels of asides can I go? \\so many!\\}], implying a network of roughly a thousand people) would respond to the survey. Granted, there would be an element of the “lazy” or “disinterested” tester inherent in this, as there is in any testing scenario.

But the lazy or malignant tester is not the only obstacle to obtaining useful results!

“Time-aware research” is a concept that has been getting a lot more play in the user experience and usability testing fields recently, and the artificiality it responds to is probably one of the chief causes of “disinterested” or “lazy” tester syndrome. Basically, the premise is this: usability testing is by its very nature “artificial”. We set up a day or two in the lab, give people $20 or some other small stipend to satiate their need to deem the study worth their time, formulate a few random tasks (or more scientifically developed tasks, designed to really “engross” the user in what he or she is doing), and give the user ample time to complete the task, with no pressure of a real-world scenario bearing down on them. They want to perform the tasks, because they want the stipend, but they really don't have a vested interest in what they are doing. They don't “own” the goals of the tasks, nor are they particularly invested in the outcome or the goals sought by the researchers.

However, a visit to a website that is born of genuine necessity, and the use that follows from it, stands a far greater chance of producing vital observational data. For this reason, there seems to be ample evidence that a “self-selecting” group of “participants” is more effective than what essentially amounts to a “bribed” group of “subjects”. These effects can no doubt be mitigated with effective research into the potential participants' backgrounds and interests; however, in a situation like mine, at this point in time, such research would be impossible. Granted, I am lucky enough to have a large group of people, namely friends of Kyle, to draw upon for my data collection, but in most cases I would be relying on people whom I have never met and who are inevitably going to be “asked” to participate in a study in which they have no vested interest.

This is one reason why the GSLIS Usability Lab seeks out clients like EBSCO, Brigham and Women's Hospital, and Harvard Catalyst (associated with Harvard Medical School and Center in the Longwood Medical Area). EBSCO's users are actually here, at Simmons. And the hospitals can rely on the fact that nurses, doctors, and other health professionals from the LMA can be solicited to take part in studies done on campus. These potential participants can walk over to Simmons and complete some tasks in the Catalyst system on their lunch break.

The difference is analogous to sitting down friends and family to listen to a demo CD versus popping a CD in the car's player during a long ride together.

Stage 2: Heuristics

This brings us to the next stage of development, and the penultimate step towards completing a cost-effective, yet fruitful usability testing plan.

It is at this point that the site has been coded and uploaded to the testing server. With a bigger budget, extensive laboratory testing might be conducted. But as it is, I'm once again seeking inspired workarounds.

With a functioning, navigable “rough draft” of the site, there are some more free, or at the very least, very cheap tools to consider utilizing.

For $15, I can direct “expert” reviewers to my site and receive comprehensive feedback on it. Where can I do this? At Feedback Army, of course. This website offers a service putting your site in touch with “Turks” from “Mechanical Turk”, a crowdsourced workforce available through Amazon.com. As advertised, the workforce is composed of a college-educated majority from around the United States and the world who get paid to complete “human intelligence tasks”. In effect, they serve as the “participants” in the usability test, except that they do so remotely, and for very little recompense. Just how effective they are at uncovering usability issues is up to the researcher's discretion, of course, but for the price, compared to thousands of dollars to conduct the same tasks and ask the same questions in a lab, it is probably worth at least one go 'round.

At this point I would also employ a basic heuristic evaluation based on one of the several heuristic evaluation checklists circulating on the Web, the most popular and well-known being Jakob Nielsen's heuristic checklist and “how-to” guide.


Another excellent, albeit long, checklist can be found here.

With checklist in hand, the next step is to find a few persons willing to sit with me and actually work through the list alongside the nearly-finished product. This is no small task, but it can be done remotely. Ideally, however, this is done by expert evaluators or with the researcher available, in person, to explain what in the world the checklist is asking.

If the researcher is lucky enough to have access to Morae usability testing software, there is an excellent guide to setting up a Nielsen heuristics test.
Find it here.

Stage 3: Multivariate and Remote Testing

Finally, it is time to drop the ubiquitous Google Analytics JavaScript code into the final design and disseminate the URL to those persons who will be most interested in using the site, specifically, Kyle's clients, prospective clients, and friends.
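
For reference, this is roughly what the standard asynchronous Analytics snippet looks like as of this writing; UA-XXXXX-X is a placeholder for the account ID Google assigns when the site is registered.

<script type="text/javascript">
  // Queue up the account ID and an ordinary pageview.
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);   // placeholder account ID
  _gaq.push(['_trackPageview']);

  // Load ga.js asynchronously so it doesn't block the page render.
  (function() {
    var ga = document.createElement('script');
    ga.type = 'text/javascript';
    ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(ga, s);
  })();
</script>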

However, the usability testing is not complete! No, in fact, there are a couple of final cheap, and potentially free, very powerful tools I can utilize before I put the finishing touches on the site and get it out of my hair forever (or until something breaks, or Kyle moves, or the documentary actually gets made, or hell freezes over).

These two techniques work hand-in-hand and can offer some very useful, final feedback.

First, there is “multivariate”, A/B, or split testing. These terms are often used interchangeably, though, strictly speaking, A/B (or split) testing compares two whole versions of a page, while multivariate testing compares combinations of individual page elements. Either way, the technique usually focuses on minimizing “bounce” and maximizing “conversion”. Bounce refers to the rate at which users leave a site moments after they've navigated to it, and conversion refers to the rate at which users execute and complete a desired outcome, e.g. donating to the documentary, signing up for the newsfeed, or linking to the site on their Facebook or Twitter account. (If 200 of 1,000 visitors leave immediately and 50 donate, that's a 20% bounce rate and a 5% conversion rate.)

The first way to achieve this is to set up an account at Visual Website Optimizer. But another way is to combine the power of Google Analytics with a simple piece of JavaScript written into the site's “index.html”. The JavaScript would go something like this:



// Randomly pick 1 or 2, sending roughly half of all visitors to variant B.
var number = Math.floor(Math.random() * 2 + 1);
if (number == 2) {
    // Variant B: redirect this visitor to the alternate homepage.
    window.location.replace('http://web.simmons.edu/~lague/LIS467/index.html');
}
// Variant A: otherwise do nothing, and the current homepage simply loads.



What this does is allow the researcher or developer to send users to two different homepages at random. Why is this useful? Well, for starters, in my current development quandary, I'd like to test the effects of having one homepage use a Flash animation as a sort of introduction to the site. Then, on another page, I'd like to test the effects of not having the animated introduction. I can see the effects these two differing homepages have on the overall “conversion” and “bounce” rates by checking Google Analytics. One potential way to achieve this is to set up two sites, identical but for the homepages, in two separate directories on my web server. Each directory would have its own distinct Google Analytics account. Alternatively, I could do some serious sleuthing and just look at the pages that are being accessed from the referring pages (that is, the two unique homepages), and see which one is more effective that way.
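
There is also a middle road, sketched below under the assumption that the asynchronous _gaq snippet shown earlier is in place: keep a single Analytics account and have each homepage report itself under its own virtual page path (the '/variant-a' and '/variant-b' names are made up for illustration), so the two variants can be compared side by side in one set of reports.

<script type="text/javascript">
  // On variant B's homepage (the ga.js loader from the standard snippet
  // must still be included); variant A's homepage would push
  // '/variant-a/index.html' instead.
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);                 // placeholder account ID
  _gaq.push(['_trackPageview', '/variant-b/index.html']);   // made-up virtual path
</script>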

But the really cool thing, and this brings me back to why I mentioned that many of Kyle's friends own Macs, is that I can use this technique in conjunction with a piece of software called “Silverback”. Readers of this article can investigate this technology for themselves by going here: http://silverbackapp.com/

Silverback is free for the first thirty days. That is really cool, because what it does is simultaneously record a screencast, video via the built-in iSight camera, and audio via the built-in microphone on MacBooks, MacBook Pros, and iMacs. This is really amazing, because it turns your $900 MacBook into a mobile usability lab for free (for thirty days) or $50 (forever!). One can conduct truly remote usability testing, capturing screencasts, video of the participant, and audio recordings all in one integrated file, by enlisting the help of friends and allowing them to conduct their own test of the site on their own time.

Obviously I could still draft a list of tasks for these participants. However, at this point of the testing, I'd be more inclined to gather feedback on the overall aesthetic and interactivity of the site. For instance, I would probably try to lead the users, half of whom would be experiencing an animated introduction before accessing the home page, the other half of whom would be taken directly to the site, with questions regarding their first impressions of the site such as:

1. "Do you feel the site initially grabbed and held your interest?"

(It will be interesting to compare the responses from the A group and the B group on this question. Will users like the animation? Will they prefer to get right into the content of the site?)

2. "Do you like the format of the gallery and the method by which you select and view pictures?"

(I've included this question because it might be another chance to use multivariate testing techniques. One group would be directed to a gallery which uses a "rollover" JavaScript function to elicit a larger version of the image and its coincident description. The other would be directed to a more traditional point-and-click format, where clicking directly on an image calls out a larger image and a description of that image. Determining which version the participants preferred would be extremely beneficial. A rough sketch of the rollover approach appears after these questions.)

3. "Did you enjoy using this site?"

(Hey, why not? As informative as the site is supposed to be, it is supposed to be enjoyed. It is supposed to be entertaining.)
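
As promised above, here is a minimal sketch of what the rollover variant of the gallery might look like. The element ids, file names, and caption text are hypothetical stand-ins invented for illustration, not Kyle's actual markup.

<script type="text/javascript">
// Swap a larger preview image and its description into place when a
// thumbnail is rolled over; 'preview' and 'caption' are made-up ids.
function showPreview(src, description) {
    document.getElementById('preview').src = src;
    document.getElementById('caption').innerHTML = description;
}
</script>

<!-- hypothetical gallery markup -->
<img src="thumbs/chessboard.jpg" alt="thumbnail"
     onmouseover="showPreview('photos/chessboard.jpg', 'Hair by Kyle, twenty minutes flat.')" />

<img id="preview" src="photos/chessboard.jpg" alt="larger preview" />
<p id="caption"></p>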

These questions would be supplemented by some of the more quantitative, task-oriented instructions found throughout this article, and in the Chalkmark example as well.

After I'd finished this here test, I'd say that I was done.

Done.