Are You Making A Difference Statistically?

We probably all like to think we are making a difference in the world, particularly those people who apply various interventions to facilitate improvement. Examples include medical doctors, counsellors, teachers, organisational psychologists, environmentalists, and vegetarians, to name but a few. However, if you are providing a service to achieve a positive change, or you’re a client who has purchased the services of someone who has committed to effecting a positive change, how do you know if a “real” difference is being achieved? To help answer this question, one needs to perform a statistical analysis.

Let’s consider an example. Suppose an organisation is interested in contracting a consultant to deliver a training program aimed at increasing psychological resilience in its employees. However, before the organisation commits to rolling out the program in full, it agrees to a smaller implementation with a select number of employees (a pilot study). If the results are impressive, the organisation will proceed with the full roll-out, and the consultant stands to earn a substantial amount of money.

Arguably, the most straightforward and powerful approach to addressing this question is an experimental design which includes both a between-groups effect (‘training group’ versus ‘control group’) and a within-groups effect (‘pre-training’ versus ‘post-training’). Statistically, this is known as a split-plot design. Understandably, these last two sentences may very well be gobbledygook to many non-statistically inclined readers. Consequently, I’ve simulated some data to produce the plausible results displayed in Figure 1, which may be in a format more familiar to everyone.
Figure 1: Resilience Training Efficacy: Means
We can see that there are two groups: (1) a control group; and (2) a training group. In this fictitious example, the two groups were tested on resilience on two occasions: (1) pre-training; and (2) post-training. It can be seen that the training group’s level of resilience has increased from pre- to post-training. By contrast, the control group’s level of resilience does not appear to have changed much at all. However, from a statistical point of view, looking at this chart, the question remains: what is the probability that this apparent difference between the two groups arose simply by chance? We can test this question with a split-plot ANOVA, which is arguably one of the most useful statistical analyses around (I’d place it in my top 3; I'm talking 'take it with me on a deserted island' here).

Using a statistical program (Statistica 10), I estimated the chances of having obtained the pattern of means displayed in Figure 1, if there were no real effect, at less than 1 in 100. Such a result makes for a compelling case that the training program’s apparent effectiveness is very probably not a fluke. Figure 2 (below) is the same chart, with the addition of confidence intervals around the means. As the confidence intervals associated with the two post-training means are quite distant from each other, we can infer that there is little chance that the two groups of employees are operating at the same level of resilience: the training group is outperforming the control group, post-training.
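If you'd like to check the confidence intervals yourself, here is a minimal sketch of how they can be computed from the summary statistics reported in the Additional Technical Information section (means, SDs, and n = 50 per group). I'm assuming 95% intervals here; the post doesn't state the level explicitly, though 95% is the conventional default.

```python
from math import sqrt
from scipy import stats

n = 50  # employees per group, as reported below

def mean_ci(mean, sd, n, level=0.95):
    """Confidence interval for a group mean, using the t distribution."""
    t_crit = stats.t.ppf((1 + level) / 2, df=n - 1)
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin

control_post = mean_ci(53.80, 4.33, n)   # post-training, control group
training_post = mean_ci(58.80, 4.65, n)  # post-training, training group
print(control_post, training_post)
```

The control group's post-training interval tops out below where the training group's interval begins, which is exactly the "quite distant from each other" pattern visible in Figure 2.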
Figure 2: Resilience Training Efficacy: Means with Confidence Intervals
Of course, this statistical analysis does not address whether the difference in the means at post-training (which equals 5.0, btw) is “meaningful” or consequential. It simply states that the pattern of four means is unlikely to have happened by chance. To address the question of “consequential” would require an additional statistical calculation (which is pretty simple), as well as some qualitative considerations. I’d nonetheless say that the consultant in this fictitious example has taken a big step toward building a compelling case to persuade the organisation to roll out the program to all of its employees.
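One common choice for that "additional statistical calculation" (my assumption of what's meant here; the post doesn't name it) is a standardised effect size such as Cohen's d. Here is a quick sketch using the post-training summary statistics reported in the Additional Technical Information section:

```python
from math import sqrt

# Post-training summary statistics from the technical section below
mean_control, sd_control = 53.80, 4.33
mean_training, sd_training = 58.80, 4.65

# Cohen's d with a pooled standard deviation (equal group sizes)
pooled_sd = sqrt((sd_control**2 + sd_training**2) / 2)
d = (mean_training - mean_control) / pooled_sd
print(round(d, 2))  # roughly 1.1, a "large" effect by Cohen's conventions
```

Whether d of this size is consequential for the organisation is, as noted, partly a qualitative judgement; the number just puts the 5-point raw difference on a scale-free footing.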

Additional Technical Information: To test the hypothesis in this case, a split-plot ANOVA was performed in Statistica 10. The group*time interaction was the effect of interest, and it was associated with F(1, 98) = 119.09, p < .001. The sample size was 50 in both groups. The pre-training means (SDs) were: Control = 53.88 (4.90), Training = 52.56 (4.56); the post-training means (SDs) were: Control = 53.80 (4.33), Training = 58.80 (4.65).

Thought Expansion: An alternative way to think about the interaction in this case is that the magnitude of the difference between the two means at time 2 (post-training) was statistically significantly larger than the magnitude of the difference between the two means at time 1 (pre-training). In this type of design, some people would be tempted to test the mean differences with two separate independent-samples t-tests: (1) at pre-training (53.88 versus 52.56), and (2) at post-training (53.80 versus 58.80). Based on this example’s data, they would draw the same conclusion as the split-plot ANOVA, because the means are not statistically different at pre-training, but they are at post-training. However, there are plenty of real-world occasions (say, when N = 100+ in each group) where the means will in fact be statistically significantly different at both time 1 (pre-training) and time 2 (post-training). In that case, the two-t-test approach is clearly useless. The situation is particularly vexing for a t-test-pairs advocate when the treatment group is higher than the control group at time 1 (assuming higher is better). By contrast, the split-plot interaction effect is impervious to this type of circumstance: if the interaction is statistically significant, and the means are in the hypothesized direction at time 2, then there is evidence that the treatment “worked”.

Software Comments: I performed the analyses and created the charts with Statistica 10. I had never used Statistica prior to this blog entry, and I’m happy to report that I was impressed (I have no affiliation with this software). I was particularly impressed by how easy it was to add confidence intervals to a chart with both between-subjects and within-subjects factors (in fact, within Statistica, it’s the default; you have to click a button to remove the confidence intervals). Believe it or not, this is not a common feature within stats packages, unfortunately. You can watch me perform the analysis here.

I've also done the same split-plot analysis using SPSS. Check it out: