A/B Testing for Small Nonprofits & Political Campaigns


Follow regular Epolitics.com contributor Allyson Goldsmith on Twitter at @allyson8765.

I hear the same advice at practically every digital strategy conference and event I attend: test everything. In principle I agree, but anyone at a small organization like mine also has to be practical about what can realistically be accomplished on the testing front, given the many other demands on your time.

In my career, I’ve worked for a series of organizations with small digital programs, where it’s been just me or a two-person shop. We all wish we had a list the size of Organizing for America and a digital staff to match. But we don’t! So how should you structure a testing program that matches the resources and constraints of a small digital department?

Here are two questions you should ask yourself:

1. What do your tools and software allow you to test?

Do you have Optimizely to test website performance? Do you have ShareProgress, which lets you test your organization’s social sharing text? Do you have an email system, like SalsaLabs or Convio, that permits you to A/B test your emails? If not, I suggest you acquire them pronto!

2. What do you want to optimize?

Focus on acquiring information that could have a large impact on your digital strategy going forward. When someone asks me if we can test something, I always ask what they are hoping to learn and what they are going to do with the information the test provides. I want to make sure there is a plan to use what we gather to improve the organization’s performance moving forward.

If you’re testing email, do you want to learn about your template, your signers, or the length and tone of the message? If you’re evaluating a donation page, you could test the ask string, image or no image, one column or two, and the text. Clearly, there are many features and outcomes that can be evaluated with a test, so it’s important to choose carefully to get the most value for the time you allocate to testing.

Once you’ve decided what you want to test, you have to decide how you want to determine success. Otherwise, you can’t interpret what your test reveals.

For example, if you’re testing a new donation page, your measures of success could be the number of donations, the total amount donated, or the average size of a donation. You should decide up front which of these outcomes is most important. If you choose multiple measures of success, it’s possible the test will show you succeeding on one front but not on another. I had a situation like that once: the tests I ran revealed that a new donation page raised more money than its predecessor, but from fewer donors.
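To make that tradeoff concrete, here’s a quick sketch in Python, with made-up donation amounts, showing how the same test can win on one metric and lose on another:

```python
# Hypothetical donation amounts for each page variant.
old_page = [25, 25, 50, 25]
new_page = [100, 150]

for name, donations in [("old page", old_page), ("new page", new_page)]:
    count = len(donations)
    total = sum(donations)
    average = total / count
    print(f"{name}: {count} donors, ${total} raised, ${average:.2f} average gift")

# old page: 4 donors, $125 raised, $31.25 average gift
# new page: 2 donors, $250 raised, $125.00 average gift
# The new page wins on total raised and average gift but loses on donor count.
```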

You also need to decide what level of statistical significance you’re going to use to determine success. Do you want to be confident in the inference you have drawn virtually all of the time (i.e., 99 percent of the time), or would you be comfortable if the conclusion you reach is accurate 90 percent of the time? Not sure what I’m talking about? Never fear: I’ll explain statistical significance in the analytics section below.

Once you have your goals, you’re ready to devise a testing plan. As you do, keep in mind how long it takes to conduct a series of A/B tests. Testing usually involves building two versions of something, so if you do your own coding, give yourself double the amount of time.

When I test something that I think will change our digital strategy, I want to test it multiple times before settling on a view of what the results reveal and deciding whether or not to implement a strategy. I do this to determine whether the results are consistent (they say the same thing each time) and believable (i.e., statistically significant), because findings can vary from run to run.

Moreover, the confidence you hold in your findings can change over time. Running the same test a number of times over a period of time is called longitudinal testing. One of the organizations I worked for tested a new donation page to see if it was attracting more donors and raising more money. The organization ran the test four times over a few weeks. The initial test revealed that the original and the new donation pages yielded practically the same results in terms of number of donors and amount of money raised. However, the second test showed that the new page raised more money and had more donors than the old donation page. The third test resulted in the old donation page having better results for both variables. The fourth test suggested that the original and the new donation pages provided equivalent levels of performance in terms of total donors and total donations.

Interestingly, when I compared the two donation pages across the entire analysis period, the new page outperformed the old one in both total donations and number of donors. Situations like this are why longitudinal testing is important when you’re considering major changes to your digital program.
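If you want to run that kind of full-period comparison yourself, here’s a rough sketch (with invented numbers) of pooling the raw counts from each run into one overall read:

```python
# Hypothetical results for each of four weekly runs of the same test:
# (old_donors, old_total, new_donors, new_total)
runs = [
    (40, 1200, 41, 1210),   # run 1: roughly a tie
    (35, 1000, 44, 1350),   # run 2: new page wins
    (48, 1500, 39, 1150),   # run 3: old page wins
    (42, 1250, 43, 1290),   # run 4: roughly a tie
]

old_donors = sum(r[0] for r in runs)
old_total = sum(r[1] for r in runs)
new_donors = sum(r[2] for r in runs)
new_total = sum(r[3] for r in runs)

print(f"old page: {old_donors} donors, ${old_total} raised")
print(f"new page: {new_donors} donors, ${new_total} raised")
# Individual runs disagree, but the pooled totals give one overall read
# on which page performed better across the whole period.
```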

Now, go forth and implement your testing plan!

Which means it’s Analytics time!

Analytics is how you’ll determine which version wins the test. Hang on, everybody: I want to stop and take a minute to talk about statistics. I know it can be hard to understand, but I promise it’ll be painless!

Statistical significance is how confident you can be that if you repeated the test, you’d get the same result. Generally three levels of statistical significance are used: 90%, 95%, and 99%. The level you choose depends on how sure you want to be that your results aren’t random. When testing a new donation page, I generally use a 99% confidence level. But when I’m testing an email subject line, I’ll use a lower standard; I’m typically happy with 90% confidence in that realm. The reason to use a higher confidence level for a test that will affect your entire digital program is that you want to be able to hang your hat on your findings, so you don’t casually do something that might harm your program. Subject lines are important, but they only affect one email, not the entire shebang. Fortunately for us and the rest of the busy world, an online calculator called AB/BA will calculate the statistical significance of your test results.
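If you’d rather see the math than trust a black box, here’s a rough sketch of the standard two-proportion z-test that significance calculators are typically built on; the visitor and donation counts are made up for illustration:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z-score and two-sided p-value for a difference in rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal tail
    return z, p_value

# Hypothetical counts: 120 of 2,000 visitors donated on the old page,
# 155 of 2,000 on the new page.
z, p = two_proportion_z_test(120, 2000, 155, 2000)
print(f"z = {z:.2f}, p-value = {p:.3f}")  # z = 2.19, p-value = 0.029
```

In this made-up example, the difference clears the 95% confidence bar (p < 0.05) but not the 99% one (p < 0.01), so I’d act on it for a subject line but want more data before overhauling a donation page.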

Finally, save your test data. It’s impossible to remember everything you’ve ever tested and what the results were. I have a spreadsheet where I record the key findings from all my tests and whether each result was significant. I’ll even make this easy for you: here’s a template of the spreadsheet that I use.
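If you’d rather log results programmatically than by hand, here’s a minimal sketch; the column names are just one plausible layout, not the actual template above:

```python
import csv
from datetime import date

# One row per test; these columns are a hypothetical layout.
row = {
    "date": date.today().isoformat(),
    "test": "Donation page: image vs. no image",
    "metric": "donations per visitor",
    "winner": "no image",
    "confidence": "95%",
    "significant": "yes",
    "notes": "Re-run before the year-end campaign",
}

with open("ab_test_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    if f.tell() == 0:  # brand-new file: write the header first
        writer.writeheader()
    writer.writerow(row)
```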

The beauty of testing? You now have actual, real evidence to guide your decision making! Not too shabby.
