Spearman Rank Correlation


The Spearman rank correlation is the non-parametric equivalent of the Pearson correlation. What does that mean? It means that the Spearman correlation has fewer assumptions. The biggest difference is that the Spearman correlation can be applied to non-normal data. In a sense, all the Spearman correlation does is transform the data into ranks, if they are not ranked already. It's really just a Pearson correlation applied to ranked or ordinal data.
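To make the "Pearson on ranks" idea concrete, here is a minimal Python sketch. The function names (`average_ranks`, `pearson`, `spearman`) and the data are made up for illustration, and giving tied values the average of their ranks is an assumption about how ties are handled; it is the usual convention, but it is not something this post or the video spells out.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: rank both variables, then apply Pearson."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: score always rises with hours, but not in a straight line.
hours = [1, 2, 3, 4, 5, 6]
score = [2, 4, 8, 16, 32, 64]

print("Pearson :", pearson(hours, score))
print("Spearman:", spearman(hours, score))
```

Because the ranks of the two variables line up perfectly here, the Spearman value is at its maximum, while the Pearson value on the raw scores is lower because the relationship is curved rather than linear.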

What I really like about the Spearman correlation is that I can quickly compare it to the Pearson correlation. That is, when dealing with interval/ratio data that might be non-normally distributed, I can apply both Spearman and Pearson to the same data. If the two correlations differ "meaningfully," that tells me my data are too non-normally distributed to use the Pearson correlation.
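Here is a rough Python sketch of that side-by-side check. The data are invented, with one extreme value at the end, and the helper `simple_ranks` assumes there are no tied values just to keep the example short. Note that I'm not offering a cutoff for what counts as a "meaningful" gap; that stays a judgment call.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_ranks(values):
    """Rank values from 1 to n. Assumes no ties (illustration only)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Made-up data: y climbs steadily with x, except for one extreme last value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 6, 8, 9, 11, 90]

r_pearson = pearson(x, y)
r_spearman = pearson(simple_ranks(x), simple_ranks(y))  # Pearson on the ranks

print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
print(f"Gap:      {r_spearman - r_pearson:.3f}")
```

With data like these, the single extreme value drags the Pearson correlation well below the Spearman correlation, which is the kind of gap that would make me hesitate to report Pearson.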

If the Spearman has fewer assumptions than Pearson, why not use it every time? Well, that's a good question. One thing to keep in mind is that, across all statistical analyses, statistics with fewer assumptions also tend to be less informative. In the case of the Spearman correlation, it is considered inappropriate to square the correlation to derive a coefficient of determination. So, with Spearman correlations, you cannot justifiably talk about percentage of variance accounted for (unless you want to talk about percentage of variance in ranks accounted for). That's a pretty big negative, in my opinion. Nonetheless, the Spearman correlation should be in every data analyst's arsenal.

Check out this video where I apply Spearman correlation to a set of data in SPSS.


Youtube Link: http://www.youtube.com/watch?v=r_WQe2c-ISU
