## Data-Driven Decision Making

by **Jitendra Subramanyam** | July 22, 2019

Here is some data. You’ve probably seen something like this in your management dashboard. Does it warrant a data-driven decision?

You are in charge of Group 4. Given that your group’s renewal rate is lower than that of Groups 1, 2, and 3, should you take action to fix customer renewal in your group? Should you find out what’s working, especially in Group 1, and strive to emulate that?

To answer this question, you first have to know how much these renewal rates can vary from group to group even when there is no real difference between the groups. Why would the results vary like this? For one of two reasons (or both).

Before we get into the details, here’s the **key takeaway**: *Before* you make any data-driven decisions, figure out how much variation you *should* expect to see within the groups based on these two reasons alone.

Let’s get into the two reasons for variation in data of the kind we see in Figure 1.

### Reason 1: Sampling Variation

First, consider the entire population that consists of all of the groups put together. We can think of it like the diagram below.

The green plus signs are customers who will renew at the end of the year. The red minus signs are customers who won’t renew at the end of the year. The population of all customers can be grouped in many ways. Here are just two of the many possibilities.

You can see why, just by grouping customers differently, the renewal rate of the groups can change. This kind of variation in the percentage renewal rate within a group is called *sampling variation.*

Now, there might be a very good reason why you group customers the way you do. It might indeed be the case that a customer in Group 4 cannot be conceived to be part of Group 2, and vice versa. But sometimes, group affiliation is fluid. A customer in Group 4 could well be classified as falling into Group 2 or vice versa. This can happen when the groups themselves are not conceptually watertight or there is an issue with data quality that makes it difficult to definitively classify customers into one group or another.

In such situations, just seeing a difference in percentage renewal rates between the groups is not enough to drive action. We must first determine if the difference lies within the variation we *should* expect. If it does, then there’s no reason to take action; if the differences are greater than what should be expected due to sampling variation, then we are seeing a real difference between the groups.

To determine the variation we should expect to see, we can do a bit of simulation. Imagine you have a *population* of clients as shown in Figure 2. For this population as a whole, we’ll set the renewal rate at 80% — in other words, 80% of the total number of customers renew at year end. From this population we’ll choose a random sample of 300 customers (*without* replacement, in case you’re wondering). We’ll calculate the renewal rate of this sample. And we’ll repeat this 100 times to get a sense of how the renewal rate varies from one sample to another.
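This simulation is easy to sketch in Python. The 80% rate, the sample size of 300, and the 100 repetitions come from the setup above; the overall population size of 3,000 is an assumption made purely for illustration:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed population of 3,000 customers, 80% of whom renew (1 = renews).
population = [1] * 2400 + [0] * 600

rates = []
for _ in range(100):                         # 100 simulated groupings
    sample = random.sample(population, 300)  # draw 300 without replacement
    rates.append(sum(sample) / 300)          # renewal rate of this sample

print(f"lowest rate: {min(rates):.1%}, highest rate: {max(rates):.1%}")
```

Even though every sample comes from the same 80% population, the sample rates spread over roughly ten percentage points.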

Here’s the variation we see when we run the simulation.

Just by taking different samples from the same population, the renewal rate can vary from a high of roughly 86% to a low of roughly 75%. Remember, this is the exact same population with the exact same renewal rate of 80% overall. But simply because we sample different groups of 300 customers from this population, the sample renewal rates range over a difference of 11 percentage points!
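This spread can also be sanity-checked analytically. For a sample proportion with true rate p and sample size n, the standard error is sqrt(p(1 - p)/n), and values within about 2.5 standard errors of p cover nearly all samples (the 2.5 cutoff is a back-of-the-envelope assumption, not something fixed by the simulation):

```python
import math

p, n = 0.80, 300                 # renewal rate and group size from the simulation
se = math.sqrt(p * (1 - p) / n)  # standard error of a sample proportion
low, high = p - 2.5 * se, p + 2.5 * se

print(f"standard error: {se:.3f}")                 # about 0.023
print(f"expected range: {low:.0%} to {high:.0%}")  # about 74% to 86%
```

The analytic range closely matches the 75% to 86% spread the simulation produces.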

Figure 5 depicts the same point using Group 4 as an example. Sampling variation alone can cause Group 4’s percentage renewal rate to vary anywhere between 86% and 75%.

In other words, there is no real difference between a percentage renewal rate of 86% and 75% — any value in this range is *plausible* due to the inherent variation in the way Group 4 (or any other group, for that matter) is composed.

Except for Group 5, sampling variation alone accounts for the range of values we see in Figure 1. There is really no difference in renewal rate between Group 1 and Group 4, although it appears so at first glance. It would be misguided to take action based on Figure 1 to fix Group 4’s renewal “problems”. However, the customers in Group 5 have a renewal rate of 72%, which falls below the expected range; it makes sense to look into the customers in this group and take appropriate action to improve their renewal rates.

Variation in the results seen in Figure 1 could be for a second reason as well. Let’s understand this second reason for variation.

### Reason 2: Trajectory Variation

Put yourself in a customer’s shoes. During the course of a year, a customer experiences various events that may influence his or her propensity to renew. Some of these events positively influence the renewal decision; for example, having a great interaction with customer service or saving critical time by using the product. But many of the events during the course of the year might negatively influence the renewal decision. The customer’s role may change making the product no longer useful, or they may switch employers. Or they may have a bad experience with the product.

How many positive and negative events a customer might encounter during the course of a year is hard to predict. But this kind of customer renewal trajectory can be modeled based on some assumptions. We’ll use three such models to try and capture the customers’ trajectories. The point is that we can’t say much about the data without having an understanding of how the data was *generated*. That’s what a model aims to do. Let’s look at our first trajectory variation model.

#### Trajectory Variation Due to a Brownian Renewal Process

A Brownian process is a model that captures the many vagaries of a customer’s renewal journey. The customer starts the year at a neutral state, say at zero. Through the course of the year the customer experiences positive and negative events. A positive event pushes them in the positive direction along the y axis, while a negative event pushes them in the negative direction along the y axis (see Figure 6 below).

The Brownian model fixes the number of events a customer experiences and draws the impact of each event at random from a Normal distribution, so each event nudges the customer in a positive or negative direction.

Once again, we take 300 customers and simulate their trajectories over the course of a year. We find the renewal rate for this group. We then repeat this again until we’ve run 100 simulations. The results are captured in Figure 7 below.
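A minimal sketch of this kind of Brownian trajectory simulation follows. The 300 customers and 100 simulations come from the setup above; the 24 events per customer, the drift, and the zero renewal threshold are illustrative assumptions chosen so that roughly 80% of customers renew:

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

def simulate_customer(n_events=24, drift=0.18):
    """Cumulative sum of Normally distributed event impacts over one year."""
    position = 0.0
    for _ in range(n_events):
        position += random.gauss(drift, 1.0)  # positive or negative nudge
    return position > 0.0  # assumed rule: end the year above zero -> renew

rates = []
for _ in range(100):  # 100 simulated groups of 300 customers each
    renewals = sum(simulate_customer() for _ in range(300))
    rates.append(renewals / 300)

print(f"lowest rate: {min(rates):.1%}, highest rate: {max(rates):.1%}")
```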

And once again, renewal percentages range from 86% to 75%, showing that there is no real difference among the renewal rates of the first four groups shown in Figure 1. There is no reason to take action to fix Group 4’s renewal rate based on what we know about the range of variation in renewal rates.

#### Trajectory Variation Due to a Markov Renewal Process

In a Markov process, a customer starts out the renewal year in a particular state. They’ve just signed and are excited by the product (for the most part). Let’s call this the “High Value” state. During the course of the year the customer experiences various events. Each event leads to a transition from one state to another *based only on their current state*. In a Markov process these transitions are defined by a *transition matrix*.

If a customer’s current state is High Value, no matter what event they experience, they have a probability of 0.6 of staying in the High Value state. Similarly, if a customer is in a Medium Value state, they have a probability of 0.1 of transitioning to a Low Value state no matter what event they experience. And once they are in a Low Value state, the probability of transitioning to a Medium Value state is 0.3.

We can set the transition matrix to anything reasonable. But note that the Markov process model assumes that a customer’s current state matters more than the type of event they experience, positive or negative. In this respect, the Markov process model is quite different from the Brownian process model, where the customer is swayed in a positive or negative direction by the valence of each event they experience.

Once again we simulate 100 instances of 300 customers moving through their renewal journeys. Each customer has 24 interactions — or, more precisely, 24 transitions from one value state to another. At the start of the year, 40% of customers are in the High Value state, 50% in the Medium Value state, and 10% in the Low Value state. The renewal rates at the end of the year for 100 simulations are shown in Figure 9.
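Here is one way such a Markov simulation might look. The text above pins down three transition probabilities (High to High = 0.6, Medium to Low = 0.1, Low to Medium = 0.3), the 24 transitions, and the 40/50/10 starting mix; the remaining matrix entries and the rule that only customers ending in the Low Value state fail to renew are assumptions:

```python
import random

random.seed(11)  # fixed seed so the sketch is reproducible

STATES = ["High", "Medium", "Low"]

# Each row sums to 1. Entries marked 'given' come from the text;
# the rest are assumptions filled in for illustration.
P = {
    "High":   {"High": 0.60, "Medium": 0.35, "Low": 0.05},  # High->High given
    "Medium": {"High": 0.20, "Medium": 0.70, "Low": 0.10},  # Medium->Low given
    "Low":    {"High": 0.05, "Medium": 0.30, "Low": 0.65},  # Low->Medium given
}

def simulate_customer():
    # Starting mix: 40% High, 50% Medium, 10% Low
    state = random.choices(STATES, weights=[0.4, 0.5, 0.1])[0]
    for _ in range(24):  # 24 transitions during the year
        weights = [P[state][s] for s in STATES]
        state = random.choices(STATES, weights=weights)[0]
    return state != "Low"  # assumed rule: ending in Low Value means no renewal

rates = []
for _ in range(100):  # 100 simulated groups of 300 customers each
    rates.append(sum(simulate_customer() for _ in range(300)) / 300)

print(f"lowest rate: {min(rates):.1%}, highest rate: {max(rates):.1%}")
```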

In Figure 9 we again see variation in renewal rates from 86% to 75% due to the multiple ways in which customer renewal journeys unfold. This variation in customer renewal rates is what we should expect. Going back to Figure 1, we again see that there’s no real difference between Groups 1 through 4 even though the reported renewal rates differ.

#### Trajectory Variation Due to a Customized Random Process

While the Brownian and Markov process models capture some of the aspects of a customer’s renewal trajectory, they fall short as models in some ways. Perhaps it’s not always appropriate for events to come from a Normal distribution (as is the case in our Brownian process model). Or maybe the customer’s mindset is not as important as the Markov process model makes it out to be. To get around these problems we can make assumptions that better fit the customer renewal trajectory.

For our customized random process model we’ll once again use 300 customers. During the renewal trajectory a customer experiences six types of events:

- Positive High Impact (Prob = 0.1, Impact = +5)
- Positive Medium Impact (Prob = 0.1, Impact = +2)
- Positive Low Impact (Prob = 0.1, Impact = +1)
- Negative Low Impact (Prob = 0.24, Impact = -1)
- Negative Medium Impact (Prob = 0.23, Impact = -2)
- Negative High Impact (Prob = 0.23, Impact = -5)

Each customer is randomly assigned the number of events they experience during the course of the renewal year. The distribution of the types of events is set as described above — 30% of the events are positive while 70% of the events are negative. A customer starts the year at zero and is pushed up or down from this starting point based on the number and types of events they experience. At the end of the year they are either above or below a set threshold. If they end up above the threshold, they renew; otherwise they don’t.
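A sketch of this customized process, using the event probabilities and impacts listed above. The per-customer event count (uniform between 10 and 30) and the renewal threshold of -33 are assumptions tuned so that roughly 80% of customers renew; the text leaves both unspecified:

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# (probability, impact) pairs for the six event types listed above
EVENTS = [(0.10, +5), (0.10, +2), (0.10, +1),
          (0.24, -1), (0.23, -2), (0.23, -5)]
PROBS = [p for p, _ in EVENTS]
IMPACTS = [impact for _, impact in EVENTS]
THRESHOLD = -33  # assumed renewal threshold

def simulate_customer():
    n_events = random.randint(10, 30)  # assumed number of events in the year
    total = sum(random.choices(IMPACTS, weights=PROBS, k=n_events))
    return total > THRESHOLD           # above threshold at year end -> renew

rates = []
for _ in range(100):  # 100 simulated groups of 300 customers each
    rates.append(sum(simulate_customer() for _ in range(300)) / 300)

print(f"lowest rate: {min(rates):.1%}, highest rate: {max(rates):.1%}")
```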

Here’s the variation in renewal rates when we use the customized random process model and simulate the journeys of 300 clients.

You won’t be surprised to see the variation of renewal rates in Figure 10.

### Conclusion

Without knowing how much variation there is between groups — variation due to sampling, trajectory variation, or both — it doesn’t make sense to read reported differences in dashboard visualizations as real differences, and it makes no sense to act on differences caused simply by the types of variation we’ve seen. To make effective data-driven decisions (rather than misguided ones), figure out whether the differences exceed what you should expect to see. And to figure out how much variation you should expect to see, get a better understanding of the process by which the data was *generated*.


Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.