user testing statistics

However, this argument holds only if the different users are actually going to behave in completely different ways. The following chart summarizes 83 of Nielsen Norman Group's recent usability consulting projects. We are looking for behavior-based insight (what they do). Summary: The answer is 5, except when it's not. In the case of running a series of usability tests or iterating your testing process (recommended for refinements based on evolving design decisions), you may want to choose a smaller number of users: I recommend no fewer than 8 users. Remember in the early 1990s, only the hard-core research and development labs at Apple, Bell Labs, Microsoft, IBM and Sun were doing usability testing. However, even the highest-value design projects will still optimize their ROI by keeping each study small and conducting many more studies than a lower-value project could afford. When the users and their tasks are this different, you're essentially running a new test for each target audience, and you'll need close to 5 users per group. Academic Usability Research: Samples are usually larger depending on size and scope and research objectives (e.g. "A big website has hundreds of features." He holds 79 United States patents, mainly on ways of making the Internet easier to use. This answer has been the same since I started promoting "discount usability engineering" in 1989. Throughout the design process, several techniques can be employed to help you increase the odds of your product being usable. )- Also one of the major problems with gaining insight from web analytics (website traffic statistics). At the end of usability testing you will have collected several types of data depending on the metrics you identified in your test plan. 
The end result of usability testing is not statistical validity per se (the outcome of quant-itative research) but verification of insights and assumptions based on behavioral observation (the outcome of qual-itative research). I think it is important to understand that Jakob Nielsen was. To use any of these calculators, a user simply fills in each of the fields and the resulting test statistic is shown below. Learn if participants are able to complete specified tasks successfully and 2. At Experience Dynamics (a usability consultancy), we have found that the cost savings of using fewer users are negligible. The site has a huge library of templates and resources, including consent forms, report templates, and sample emails. Most arguments for using more test participants are wrong, but some tests should be bigger and some smaller. Usability.gov was created by the US Department of Health and Human Services as a resource for UX best practices and website guidelines. This approach isn’t much better than guessing. As each test only takes around 20 minutes to complete, that’s a fairly generous pay rate. "A big website has millions of users." Testing with 5 people lets you find almost as many usability problems as you'd find using many more test participants. Yes, you'll need more users overall for a feature-rich design, but you need to spread these users across many studies, each focusing on a subset of your research agenda. Basically, if 10 of 15 users are confused, you can assume that many more will be confused as well. 80% of your videos will be completed in less than 2 hours. In a usability-testing session, a researcher (called a “facilitator” or a “moderator”) asks a participant to perform tasks, usually using one or more specific user interfaces. Dr. 
Nielsen established the "discount usability engineering" movement for fast and cheap improvements of user interfaces and has invented several usability methods, including heuristic evaluation. Translation: 5 users per audience segment or target user group, or for a website with 3 diverse segments you will need 15 users for the one test. Clearly, I need to better explain the benefits of small-N usability testing. Yay! Hypothesis testing is a key concept in statistics, analytics, and data science; Learn how hypothesis testing works, the difference between Z-test and t-test, and other statistics concepts . Jakob Nielsen, Ph.D., is a User Advocate and principal of the Nielsen Norman Group which he co-founded with Dr. Donald A. Norman (former VP of research at Apple Computer). For some other projects, 8 users — or sometimes even more — might be better. An opinion poll needs the same number of respondents to find out who will be elected mayor of Pittsburgh or president of France. Usability testing lets the design and development teams identify problems before they are coded. So, which is it, 5 or 15? Usability testing is being used industry-wide and has been for past 25 years. Only use this if you're desperate for money. on Some clients wanted bigger studies for internal credibility. Quantifying the User Experience: Practical Statistics for User Research offers a practical guide for using statistics to solve quantitative problems in user research. (The chart includes only normal qualitative studies; we also run competitive studies and benchmark measurements, and conduct other types of research not shown here.). Spend it on additional studies, not more users in each study. A test statistic shares some of the same qualities of a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. Helping some of the worlds best known brands measure and improve the user experience. 
Our objective is to apply findings to fix design problems in a corporate setting (not academic analysis). About this template: this ten-page, text-heavy template is a blueprint for a comprehensive moderated usability testing proposal. The first question that has to be asked is “Why are statistics important to A/B testing?” The main argument for small tests is simply return on investment: testing costs increase with each additional study participant, yet the number of findings quickly reaches the point of diminishing returns. If you want to calculate the test statistic based on paired data samples, see our Paired t-test Calculator. The benefit you get from adding a few more users to the total (or in the case of 5 users, doubling the amount) is far greater than the small test that gives you "quick and dirty" results. If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test. If you have an Agile-style UX process with very low overhead, your investment in each study is so trivial that the cost–benefit ratio is optimized by a smaller benefit. (If management trusted its own employees, much money could be saved.) Sounds exciting, huh? Before we venture into the differences between tests, we need to formulate a clear understanding of what a null hypothesis is. A null hypothesis proposes that no significant difference exists in a set of given observations. Some of the randomly selected sets of 5 participants found 99% of the problems; other sets found only 55%. 
Basically, guerrilla testing … Scale research across your organization with … Usability Testing with 5 Users: Information Foraging (video 3 of 3), Usability Testing with 5 Users: Design Process (video 1 of 3), The Word "Validate" Undermines UX Effectiveness. want to collect as much relevant knowledge as you can get in order to make the product that people really want Desktop Testing. Why did we run more users in the first place, given that I certainly believe my own research results showing the superiority of small-N testing? 2. ", you will need several hundred responses to gain statistical validity in order to validate what will be opinion-driven data. 1. Recruit for engagement, not … We end at the 1 Sample Binomial Test with a link to the One Proportion Calculator. Copyright © 1998-2020 Nielsen Norman Group, All Rights Reserved. 15 users per segment or 40-100 users in a usability test). Some examples from our projects include. Later on in the article Nielsen says that, Statistical Validity in Usability Testing, Jakob Nielsen's "test with 5 users" assumption. "The site makes so much money that even the smallest usability problem is unacceptable." If you've been asked to participate in a special test, you can find more information here. If you want a single number, the answer is simple: test 5 users in a usability study. (Conversely, the decision about whether to fix a design flaw should certainly consider how much use it'll get: it might not be worth the effort to improve a feature that has few users; better to spend the effort recoding something with millions of users.). During the UX Conference, I surveyed 217 participants about the practices at their companies. If you have many things to fix, simply plan for a lot of iterations. Quantifying the User Experience: Practical Statistics for User Research, Second Edition, provides practitioners and researchers with the information they need to confidently quantify, qualify, and justify their data. 
15 or 20 participants). To use A/B testing efficiently and effectively, you must understand what it is and all the statistics that surround it. Why did they fail? Look for trends and keep a count of problems that occurred across participants. In user testing, we focus on a website's functionality to see which design elements are easy or difficult to use. (It might seem counterintuitive to get more return on investment by benefiting less from each study, but this savings occurs because the smaller overhead per study lets you run so many more studies that the sum of numerous small benefits becomes a big number.) Rich companies certainly have an ROI case to spend more on usability. The test participant should belong to your target audience. However, it's very unreliable in the sense that you will see this message over and over again: "Unfortunately you didn't qualify for this test." The end result will be higher quality (and thus higher business value) due to the additional iterations than from testing more users each time. For example, if a medical doctor needed to test the probable effectiveness of a drug, she would utilize statistics to see if the drug worked a certain number of times for a certain population. Meh. With higher investment, you want a larger benefit. When a study's sponsor presents findings to executives who don't understand usability, the recommendations are easier to swallow when more users were tested. In general, if the data is normally distributed, parametric tests should be used. The null hypothesis, in this case, is that the mean linewidth is 500 micrometers. Usability testing is a popular UX research methodology. Laurie Faulkner (PDF: 2003) has conducted new empirical research showing benefits from increased sample size. Guerrilla testing is the simplest form of usability testing. 
In other words, after you spend the time and money to set up, facilitate and report on the test, adding a few more users does not add "that much" time and money to the overall project. No worries, no one will ask you to make grind statistics and make calculations. Entering 20 out of 25, “Is Greater Than” and a Test Proportion of .75 tells us there’s about a 70% chance at least 75% of all users would be able to find the Sewing … Get rapid feedback with access to the largest and most diverse first-party panel. If the data is non-normal, non-parametric tests should be … For really low-overhead projects, it's often optimal to test as few as 2 users per study. This test-statistic i… Statistics tell half the story and often are devoid of context (e.g. Statistical hypothesis testing sits at the core of A/B testing. Asking someone their opinion does not constitute usability requirements, since usability testing is about isolating "how they will actually use" the design not just "what they think" of the design. The variance in statistical sampling is determined by the sample size, not the size of the full population from which the sample was drawn. 3. With, say, a financial site that targets novice, intermediate, and experienced investors, you might test 3 of each, for a total of 9 users — you won't need 15 users total to assess the site's usability. You need big samples for market research because of this (though focus groups bend this because they are somewhat qualitative). This is an argument for running several different tests — each focusing on a smaller set of features — not for having more users in each test. If you give a small set of users a scenario that forces them to interact with home page elements and observe their behavior, and listen to their unsolicited reactions, you will get a better idea of what they think and need. Anything not fixed now will be fixed next time. And if you’re just starting with user testing, don’t worry much about demographics at all. 
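The "20 out of 25 against a test proportion of .75" example above is a one-sample binomial test. The text does not say how the calculator arrives at its "about 70%" figure, so the sketch below makes no attempt to reproduce it. It only computes the plain exact upper-tail probability, that is, the chance of seeing at least that many successes if the true completion rate were exactly the test proportion. The function name and argument names are illustrative assumptions.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, test_proportion: float) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, test_proportion).

    This is the one-sided p-value for the null hypothesis that the true
    completion rate equals `test_proportion` (alternative: it is greater).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if not 0.0 <= test_proportion <= 1.0:
        raise ValueError("test_proportion must be between 0 and 1")
    p = test_proportion
    # Sum the binomial probabilities of every outcome at least as large
    # as the one observed.
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical study from the text: 20 of 25 users completed the task.
    p_value = binomial_upper_tail(20, 25, 0.75)
    print(f"P(X >= 20 | n=25, p=0.75) = {p_value:.3f}")
```

A large tail probability means the observed 20 of 25 is unremarkable under a true rate of 75%, so a sample this small cannot by itself establish that the real rate exceeds the benchmark.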
Qual-itative research follows different research rules to quant-itative research and it is typical that sample size is low (i.e. If this is your strategy, you’re ripe for disappointment. The driver here is expectation (governed by cognitive factors) vs. opinion which can be driven solely by emotional, social or personal factors. "We have several different target audiences." When to use a t-test. The UserTesting Human Insight Platformhelps you close the empathy gap. It’s great that you guys have got the opportunity to do some usability testing of the app that DigitalAgencyCo are building. Answer 2: = 15 users (Laurie Faulkner, 2004), PDF file. Typically, you can get away with 3–4 users per group because the user experience will overlap somewhat between the two groups. You might even mirror certain competitor activities and run heuristic evaluations to check for basic usability errors. From: Matthew Magain To: Sarah Doyle Subject: Re: testing the app Hi Sarah. The average response was that they used 11 test participants per round of user testing — more than twice the recommended size. This can actually be a legitimate reason for testing a larger user set because you'll need representatives of each target group. There's little additional benefit to running more than 5 people through the same study; ROI drops like a stone with a bigger N. And if you have a big budget? In contrast, market research is largely opinion-driven: You ask people what they think and what they think they think. Usability research is largely qual-itative, or driven by insight (why users don't understand or why they are confused). 2012-06-03 For example, suppose that we are interested in ensuring that photomasks in a production process have mean linewidths of 500 micrometers. Sadly, most companies insist on running bigger tests. The earlier issues are identified and fixed, the less expensive the fixes will be in terms of both staff time and possible impact to the schedule. 
Behavior-driven research is more predictable. And why are we arguing about an extra 10 users? Doesn't one need to test with at least 100 users for statistical significance, accuracy and validity? ROI is the ratio between benefits and expense. The t-test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. You ask a number of people to perform a number of typical tasks on your website or intranet. Or on a mock-up if you’re in the process of building a new one. It’s probably more fun to put up a test between a red and green button and wait until your testing tool tells you one of them has beaten the other. Example: If you ask someone "what do you think of this homepage? The evaluation of a design element's quality is independent of how many people use it. When hiring a consultant, the true expense is higher than just the fee because the client must also spend time finding the consultant and negotiating the project. The concept of statistical significance is central to planning, executing and evaluating A/B (and multivariate) tests, but at the same time it is the most misunderstood and misused statistical tool in internet marketing, conversion optimization, landing page optimization, and user testing. 85% of issues related to UX can be detected by performing a usability test on a group of 5 users. When analyzing the data you’ve collected, read through the notes carefully looking for patterns and be sure to add a description of each of the problems. If you could complete three tests within an hour, you’d earn $30 for an hour's work. an auction site where you can either sell stuff or buy stuff. 
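The "85% with 5 users" figure can be reproduced with the problem-discovery model usually attributed to Nielsen and Landauer, in which every test user independently uncovers a fixed share of the usability problems. The sketch below is a minimal illustration under that model. The default discovery rate of 0.31 is an assumption (the average Nielsen reports across his projects), not a value stated in this text, and real projects can differ considerably.

```python
def share_of_problems_found(n_users: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems found after testing n_users.

    Model: found(n) = 1 - (1 - L)^n, where L (`discovery_rate`) is the
    share of problems a single user uncovers. L = 0.31 is an assumed
    average; substitute a rate measured on your own studies if you have one.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be between 0 and 1")
    return 1.0 - (1.0 - discovery_rate) ** n_users


if __name__ == "__main__":
    # Diminishing returns: each extra user adds less than the one before.
    for n in (1, 3, 5, 8, 15):
        print(f"{n:2d} users -> {share_of_problems_found(n):.1%} of problems")
```

Under this assumption 5 users land at roughly 84%, and going from 5 to 15 users adds only about 15 percentage points, which is the diminishing-returns argument for running several small studies instead of one large one.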
I initially did them in a Doc (like Word), but this looked quite text-heavy so I have now switched to a Presentation (like PowerPoint). Many designers and researchers view usability and design as qualitative activities, which do not require attention to formulas and numbers. Statistics help you interpret results and make practical business decisions. a medical site targeting both doctors and patients, and. In Nielsen's much respected and equally criticized article "Why You Only Need to Test With 5 Users" (written in 2000) he recommends (based on the early 1990s analysis) that instead of opting for higher accuracy, you go for the "fast and dirty" approach of conducting three tests instead of one "elaborate" study. Finally, the very fact that these were consulting projects justified including a few more users, which is why we often run studies with around 8 users. A t-test can only be used when comparing the means of two groups (a.k.a. 
Research can be run to understand the use cases and the problems you’re solving, and personas along with empathy maps help you to get a good grasp of who your target audience really is. Even if they spend "too much" on each quality improvement, they'll make even more back because of the vast amounts of money flowing through the user interface. For the purpose of these tests in generalNull: Given two sample means are equalAlternate: Given two sample means are not equalFor rejecting a null hypothesis, a test statistic is calculated. You don’t want to find the love of your life – you just want to observe behaviour and detect errors. ), Some design projects had multiple target audiences and the differences in expected (or at least. In her study, "Beyond the five-user assumption: Benefits of increased sample sizes in usability testing", she wrote: It is widely assumed that 5 participants suffice for usability testing. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Nowadays, it is all done automatically for you. Usability research is behavior-driven: You observe what people do, not what they say. Three reasons: The last point also explains why the true answer to "how many users" can sometimes be much smaller than 5. The book presents a practical guide on how to use statistics to solve common quantitative problems that arise in user research. With 10 users, the lowest percentage of problems revealed by any one set was increased to 80%, and with 20 users, to 95%. Ho… The test is performed on an individual basis.So it’s not like a focus group where there’s a bunch of people giving you feedback all at once.Please, don’t ever call a focus group a user test. 
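The null and alternate hypotheses about two sample means above lead to a test statistic. The text does not say which statistic it has in mind, so as an assumption this sketch uses Welch's t, which does not require the two groups to have equal variances. The task times are made-up illustrative data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Welch's t statistic for the difference between two sample means.

    t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)
    Each sample needs at least two observations so that the sample
    variance is defined.
    """
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    if standard_error == 0:
        raise ValueError("both samples are constant; t is undefined")
    return (mean(sample_a) - mean(sample_b)) / standard_error


if __name__ == "__main__":
    # Hypothetical task-completion times in seconds for two designs.
    design_a = [41.0, 38.5, 45.2, 39.9, 43.1]
    design_b = [35.4, 33.0, 37.8, 36.1, 34.6]
    print(f"t = {welch_t_statistic(design_a, design_b):.2f}")
```

The statistic is then compared against a t distribution to obtain a p-value. With only a handful of participants per group, as in typical qualitative usability studies, such a test has little power, which is one reason the surrounding text treats statistical testing as a separate activity from small-N observation.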
As with any human factors issue, however, there are exceptions: However, these exceptions shouldn't worry you much: the vast majority of your user research should be qualitative — that is, aimed at collecting insights to drive your design, not numbers to impress people in PowerPoint. Here the sections are more clearly marked by slides so it’s easier to consume. The coronavirus pandemic has made a statistician out of us all. Statistical analysis helps elaborate on trends or patterns found within the research of a topic. A free inside look at UserTesting salary trends based on 172 salaries wages for 91 jobs at UserTesting. During a usability test, you will: 1. This is why phone or web surveys require hundreds or thousands of responses. You can't ask any individual to test more than a handful of tasks before the poor user is tired out. June 3, 2012. Experts, authors and academics put their reputations and credentials behind the methodology. Site Map | Copyright 2020. Each dot is one usability study and shows how many users we tested and how many usability findings we reported to the client. Usability Testing = 10-15 participants; Field Studies = 15-40 participants; Card Sorting = 15-30 (higher is better since card sorting uses the statistical method of cluster analysis) Academic Usability Research: Samples are usually larger depending on size and scope and research objectives (e.g. Question: How many users do you need to test with for a usability test? Doesn't matter for the sample size, even if you were doing statistics. Statistics aren’t necessarily fun to learn. For most projects, however, you should stay with the tried-and-true: 5 users per usability test. Thanks for your message. It's not a scam like some people have stated: you do get paid a week after a completed test. The variance in statistical sampling is determined by the sample size, not the size of the full population from which the sample was drawn. 
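The claim that sampling variance depends on the sample size and not on the population size can be checked numerically. The sketch below computes the margin of error for a surveyed proportion, optionally applying the finite population correction. The population figures in the demo are rough illustrative assumptions for a city and a country, and the 1.96 multiplier assumes a 95% confidence level under the normal approximation.

```python
from math import sqrt
from typing import Optional


def margin_of_error(
    sample_size: int,
    population_size: Optional[int] = None,
    proportion: float = 0.5,
    z: float = 1.96,
) -> float:
    """Approximate margin of error for a proportion estimated from a sample.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). When
    `population_size` is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied as well.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    standard_error = sqrt(proportion * (1.0 - proportion) / sample_size)
    if population_size is not None:
        if population_size < 2 or population_size < sample_size:
            raise ValueError("population_size must be >= max(2, sample_size)")
        standard_error *= sqrt(
            (population_size - sample_size) / (population_size - 1)
        )
    return z * standard_error


if __name__ == "__main__":
    # Same sample size, very different (assumed) population sizes.
    for label, population in (("city", 300_000), ("country", 60_000_000)):
        moe = margin_of_error(1000, population)
        print(f"{label:8s} N={population:>11,d}  margin = {moe:.4f}")
    # Quadrupling the sample roughly halves the margin.
    print(f"n=4000 margin = {margin_of_error(4000):.4f}")
```

For any population much larger than the sample the correction factor is close to 1, so the two margins differ only in the fourth decimal place, while changing the sample size moves the margin substantially.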
… With 5 users, you almost always get close to user testing's maximum benefit-cost ratio. A lack of understanding of A/B testing statistics can lea… Obviously if I had a little more notice I could probably come in and give you guys a hand, but I can’t really juggle things at this late notice. The decision of which statistical test to use depends on the research design, the distribution of the data, and the type of variable. pairwise comparison). A classic use of a statistical test occurs in process control studies. In this study, 60 users were tested and random sets of 5 or more were sampled from the whole, to demonstrate the risks of using only 5 participants and the benefits of using more. Doesn't matter whether you test websites, intranets, PC applications, or mobile apps. Instead, usability testing participants should be recruited based on matching their behaviour and prior experience and knowledge about the topic. Research shows that even with low numbers, you can gain valid data. This data can come from the natural or social sciences. Identify how long it takes to complete specified tasks 3. Keeping the documents online is a great idea, as people can refer to them wherever they are, so I tend to use Google Drive for my testing reports. Guerrilla testing. Often, it ends with a year’s worth of testing but the exact same conversion rate as when you started. Answer 1: = 5 users (Jakob Nielsen and Thomas Landauer, 1993). While the participant completes each task, the researcher observes the participant’s behavior and listens for feedback. 
The CDC’s test was designed to use three main sets of primers and probes — two that match just the novel coronavirus, and one that matches a variety of highly similar viruses. User Testing’s pay is pretty good – you earn $10 per test. The basic point is that it's okay to leave usability problems behind in any one version of the design as long as you're employing an iterative design process where you'll design and test additional versions. There is a wide range of statistical tests.