Around the start of 2013, the senior leadership team of my company decided to make a big change to our business model. We switched from selling stand-alone software products to selling an all-in-one platform with a good/better/best pricing model. Depending on which package you were in, you would have access to certain products and features. Overall, we believed this change would be beneficial for both our company and our customers.
In order to bring this idea to life, senior leadership put together an internal start-up team of about twenty-five people, with representation from different departments in the company including marketing, analytics, billing, sales, support, and a few others. I was one of three analytics members on the team. Although we collaborated on everything analytics-related, each of us had areas of work that we were most involved in. Among other things, we were tasked with providing some input into product features, leading quick and iterative A/B testing, building easy-to-consume reporting and analyses, and giving updates to C-level executives. For this post, I’m going to skip “other things” and “providing input into product features,” mainly because they were pretty basic or were primarily done by the other two analysts. I’m going to give a little more detail on the three areas I was involved in and contributed to the most.
The first thing to figure out for testing was how large a sample size we would need. We based this on two things: 1) getting to, or as close as we could get to, statistical significance with our testing, which would give us confidence in the results we were seeing, and 2) making sure that if this completely tanked ($0 revenue), we wouldn’t harm the overall business. Keep in mind that although we were treating this like a start-up, we were still part of a public company with revenue targets.
Like most SaaS businesses, we have a funnel that narrows down from website visitors > free trialers > paying customers > cancelled customers. The metrics we traditionally used for this funnel are Visitor:Trial rate (V:T), Trial:Pay rate (T:P), Average Revenue Per User/Customer (ARPU), and Cancel rate. After playing around with different conservative scenarios based on historical trends, we figured out that if we took 10% of our overall website traffic to put into this new experience, we should have enough sample to test with. We also worked with the financial analyst on the project to confirm that 10% wouldn’t put our revenue targets in jeopardy.
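To give a sense of the math behind that 10% decision, here is a rough Python sketch of a standard two-proportion sample-size calculation. The baseline rate, target lift, significance level, and power are illustrative stand-ins, not the actual numbers we used.

```python
from math import sqrt
from scipy.stats import norm

def visitors_needed(p_baseline, p_target, alpha=0.05, power=0.80):
    """Approximate visitors needed per group to detect a change in a
    conversion rate (e.g. Visitor:Trial) with a two-sided z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)           # critical value for the desired power
    p_bar = (p_baseline + p_target) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_baseline * (1 - p_baseline)
                                 + p_target * (1 - p_target))) ** 2
    return numerator / (p_baseline - p_target) ** 2

# Illustrative only: a 5% baseline V:T rate and a hoped-for 5.5%.
n = visitors_needed(0.05, 0.055)
print(f"~{n:,.0f} visitors per group per test iteration")
```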
Along with the 10% of website traffic that we put into this new experience, we held out another 10% in our old business experience as a control group. The control group was kept out of any promotions, offers, or other special activity we may have been running at the time, which kept it clean for the best analysis. For this project, we added a new metric that was our primary measure of success: Average Revenue Per Visitor (ARP-V). It combined the four previous metrics into one number that gave a simple view into whether or not the new experience as a whole was working better than our old experience. A worked example of ARP-V is sketched below.
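The funnel numbers here are illustrative only, chosen to mirror the pattern described next rather than our actual rates.

```python
def arp_v(visitors, vt_rate, tp_rate, arpu, cancel_rate):
    """Average Revenue Per Visitor: revenue that survives the funnel,
    divided by the visitors who entered it."""
    trials = visitors * vt_rate               # Visitor:Trial
    customers = trials * tp_rate              # Trial:Pay
    retained = customers * (1 - cancel_rate)  # customers left after cancels
    return retained * arpu / visitors

# Illustrative numbers only (not our actual rates).
control = arp_v(visitors=100_000, vt_rate=0.05, tp_rate=0.20, arpu=30.0, cancel_rate=0.03)
test    = arp_v(visitors=100_000, vt_rate=0.07, tp_rate=0.15, arpu=35.0, cancel_rate=0.02)

print(f"Control ARP-V: ${control:.2f}")  # about $0.29 per visitor
print(f"Test ARP-V:    ${test:.2f}")     # about $0.36 per visitor
```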
If you follow the funnel example from start to finish, you’ll see that the control group has a much higher T:P rate than the test group. However, the test group has a higher V:T rate, a higher ARPU, and a lower cancel rate. When you divide the total revenue by the starting visitor number, you get more revenue from each visitor in the test group than you do in the control group.
The last part of the testing was to figure out how long we would have to run each test. As I mentioned before, the goal for testing was to test, learn, and iterate quickly. We wanted to make as many tweaks as possible in the shortest amount of time. After playing around with more scenarios of how some of these early tests might go, we came up with about 14 days for each test iteration.
After running a test for 14 days, we would take any learnings we could (click-tracking data, feedback from the sales reps, etc.) and make tweaks to our website, the product UI, pricing, and more. In a few cases where results were too close to reach statistical significance, we supplemented them with a testing forecast that I had previously built. That gave us a little more confidence in the recommendations we were making. In total, we went through about twenty iterations of testing. Originally, we had a goal of a 5% lift in ARP-V. By the end of our testing iterations, we were able to achieve close to a 20% lift in ARP-V.
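For reference, the significance check on a single rate like T:P for one 14-day iteration can be done with a standard two-proportion z-test. The sketch below uses statsmodels with made-up counts, not our real data.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Illustrative counts for one 14-day iteration: [test, control].
paid = np.array([150, 130])        # trialers who converted to paying customers
trials = np.array([1_000, 1_000])  # trialers who started in each group

# Two-sided z-test on the difference in Trial:Pay rates.
z_stat, p_value = proportions_ztest(count=paid, nobs=trials)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("The T:P difference is statistically significant at the 5% level.")
else:
    print("Too close to call; lean on more data or the testing forecast.")
```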
Once we had our testing plan in place, we needed to be able to see how our new experience was performing so we could make appropriate recommendations. To do this, we built some advanced Excel reporting that was updated daily. In addition to the funnel metrics mentioned earlier, this reporting showed product engagement metrics like the % of trialers starting our product flow, the % of trialers uploading their contacts into our system, and the % of customers logging into the product in the last 90 days. Each of these detailed metrics was an indicator of one of the funnel metrics, so if a funnel metric wasn’t performing well in the new experience, we knew which engagement metrics to look at. Based on what we saw, we would know if it was worth tweaking the product for the next iteration of testing, or if the sales reps needed to adjust how they pitched the new experience. Below is a sketch of what some of the visitor and trial reporting looked like in Excel.
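The real reporting lived in Excel, but to make the structure concrete, here is a rough pandas sketch of how a daily summary along those lines might be assembled. The column names and numbers are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical daily rollup: one row per group per day.
daily = pd.DataFrame({
    "date":        ["2013-04-01", "2013-04-01"],
    "group":       ["test", "control"],
    "visitors":    [12_000, 12_400],
    "trials":      [720, 560],
    "flow_starts": [430, 390],  # trialers who started the product flow
    "paid":        [95, 104],
})

summary = daily.assign(
    vt_rate=lambda d: d.trials / d.visitors,              # Visitor:Trial
    tp_rate=lambda d: d.paid / d.trials,                   # Trial:Pay
    pct_starting_flow=lambda d: d.flow_starts / d.trials,  # engagement indicator
).set_index(["date", "group"])

print(summary[["vt_rate", "tp_rate", "pct_starting_flow"]].round(3))
```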
That’s still a little vague, so I’ll walk through a specific example of how we used the reporting, along with an ad-hoc analysis we did occasionally, to discover insights about trial performance in the new experience.
One of the biggest metrics we struggled with was Trial:Pay (T:P) rate. Whether it was the new UI, pricing, or sales positioning, we always had a decline, or a gap, between the new experience and the old experience. The reporting showed this was consistent no matter which test iteration we were looking at. We brainstormed with the product team to figure out if there was a change we could make to the UI that would close the T:P gap.
As I mentioned before, we have a few product engagement metrics that we know are indicators of a higher T:P rate. One of them, which is fairly intuitive, is successfully getting through our product flow. We also looked at T:P rates of trialers in other segments, including 1) starting the product flow, 2) clicking a button on the UI homepage, 3) having a sales interaction only, with no UI, and 4) no interaction at all. Based on the % of trialers falling into each of these segments, combined with the T:P rates of those segments, we were able to figure out which segment was the biggest cause of the overall T:P gap we had (essentially a weighted average, sketched below).
It was the segment of trialers who were starting the flow. The T:P rate for this group was similar between the new experience and the old experience; however, far fewer people were starting the flow in the new experience. If we could get more trialers in the new experience into the product flow, our overall T:P gap should close.
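Here is a minimal sketch of that weighted-average attribution. The segment names come from the list above, but the shares and T:P rates are illustrative; the point is how a segment with a similar per-segment rate but a much smaller share of trialers can drive most of the overall gap.

```python
# Illustrative shares of trialers and T:P rates per segment: (share, T:P).
test_segments = {
    "started product flow":     (0.25, 0.40),
    "clicked UI homepage only": (0.30, 0.12),
    "sales interaction only":   (0.20, 0.10),
    "no interaction":           (0.25, 0.02),
}
control_segments = {
    "started product flow":     (0.40, 0.41),
    "clicked UI homepage only": (0.25, 0.11),
    "sales interaction only":   (0.15, 0.11),
    "no interaction":           (0.20, 0.02),
}

# Overall T:P is the share-weighted average of the segment T:P rates.
overall_test = sum(share * tp for share, tp in test_segments.values())
overall_ctrl = sum(share * tp for share, tp in control_segments.values())
print(f"Overall T:P: test {overall_test:.1%}, control {overall_ctrl:.1%}")

# How much of the control-over-test gap does each segment explain?
for name in test_segments:
    s_t, tp_t = test_segments[name]
    s_c, tp_c = control_segments[name]
    contribution = s_c * tp_c - s_t * tp_t
    print(f"{name:<25} {contribution:+.1%} of the gap")
```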
After collaborating with the product team on different changes we could make for the next test iteration, we agreed on the change below.
Our hypothesis was that the initial UI homepage trialers saw was too busy with some of the menus and additional features we added into the new experience, and trialers were getting overwhelmed. We simplified the UI so that the only thing trialers could click was the button for the product flow. Once they went into the product flow, they would see the additional menus whenever they went back to the homepage. This change was successful and closed the overall T:P gap by a few percentage points.
The last part of this project I’ll discuss is how we communicated with the rest of the team, the executives, and the company as a whole. We had a different communication cadence depending on whether it was a daily, weekly, monthly, or ad-hoc update.
Every day, we’d send an email to the start-up team with the results of the latest test iteration. This mostly included the funnel metrics and whether or not we were seeing statistically significant differences. If there was anything special to note, we’d include that as well.
Once a week, we would send an email to the start-up team as well as the executives. This would include the funnel metrics for the latest test iteration, insights we found, and updates on the changes we were going to make, like the UI change I highlighted above.
Whenever we were giving updates to the team, we always used a Fact/Meaning/Action structure. This was the simplest and most impactful way we were able to get our message across and make progress. The Facts are the actual results of the test. In other words, what the data is telling you. In the UI change, the facts were that we were seeing a 30% increase in trialers starting the product flow, a 10% increase in trialers finishing the product flow, and the T:P gap closing by 2 percentage points. All of these results were statistically significant. The Meaning was that trialers were more successful in the new experience, and more likely to buy based on ease of use than on the appearance of many features. The Action was that we kept the simplified UI as the default in the following test iterations.
Overall, this project was a great experience for me personally and professionally. It’s probably the most rewarding thing I’ve done, and the most successful. Due to the 20% lift we saw in ARP-V, our executive team made the decision to launch the new experience to 100% of our web traffic after just 7 months of testing.
That wouldn’t have been possible if the 25-person start-up team hadn’t worked so well together. I think it was a great example of what can happen when you put great employees together and empower them to do great things.
DISCLAIMER: For confidentiality reasons, the actual metric values included in this post have been adjusted. However, the story is still an accurate portrayal of the project and what occurred.