Ceiling Effects?

There’s been a lot of discussion about “ceiling effects” in the recently published Johnson, Cheung, and Donnellan (hereafter, JCD) replication of Schnall, Benton, & Harvey (2008; hereafter SBH). In particular, Simone Schnall has argued (here, here, and here) that there is a “ceiling effect” in the replication data that makes it potentially uninterpretable. Most of the criticism has focused on Study 1, so I will focus on this study here as well.

The design of this study is simple: two conditions, one in which cleanliness is primed with a sentence-unscrambling task, and a "neutral" condition in which nothing is primed. After the priming task, participants rate the wrongness of 6 moral transgressions on a 0 ("perfectly OK") to 9 ("extremely wrong") scale. These 6 wrongness ratings are then averaged to form a composite, which is the main DV. The hypothesis is that the composite wrongness ratings will be lower in the "cleanliness" condition than in the "neutral" condition.
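To make the scoring concrete, here is a minimal sketch (not the authors' code, and with made-up ratings) of how such a composite is formed: each participant's six 0–9 item ratings are averaged into a single score.

```python
import numpy as np

# Hypothetical data: rows = participants, columns = the six transgression items.
ratings = np.array([
    [7, 8, 6, 9, 7, 8],   # participant 1
    [5, 6, 7, 6, 5, 6],   # participant 2
    [9, 9, 8, 9, 9, 9],   # participant 3 (near the item-level ceiling of 9)
])

# The composite DV: one mean wrongness score per participant.
composite = ratings.mean(axis=1)
print(composite)  # → [7.5        5.83333333 8.83333333]
```

Note that a participant can sit at the ceiling on some individual items (participant 3 here) while the composite itself still falls below the maximum of 9, which is why item-level and composite-level distributions can tell different stories.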

The important thing to keep in mind is that this 6-item composite, NOT the individual item scores, is the main DV. The "ceiling effect" critique has focused on item-level distributions, not the distribution of the composites. So how do the composites compare across the two studies? Below I compare their distributions in the "neutral" (control) condition, in order to see whether there is a ceiling effect at baseline.

Here is a histogram of the composite scores from the original study (SBH):


Here is a histogram of the composite scores from the replication (JCD):


A few things are apparent:

1. The mean in the replication study is higher by 0.67 scale points, a statistically significant difference (t[125] = 2.30, p = .023).

2. There is no evidence for a ceiling effect in the composite scores. The distribution looks almost suspiciously normal.

3. The distribution in the original study looks somewhat less normal, probably because of the smaller N (20).

4. There is plenty of room to move scores *down*, which is what the manipulation was hypothesized to do.
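For readers who want to run these checks themselves, here is a sketch of the analyses described above: a two-sample t-test on the composites, a Shapiro-Wilk normality check, and a simple "how many scores are near the top of the scale" count. The data below are simulated placeholders, NOT the real SBH/JCD scores (the sample sizes are chosen only so the degrees of freedom match the t[125] reported above); download the actual data from the links at the end if you want the real numbers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated composite scores, hypothetical stand-ins for the two control groups.
sbh_neutral = rng.normal(loc=5.8, scale=1.2, size=20).clip(0, 9)    # original
jcd_neutral = rng.normal(loc=6.5, scale=1.2, size=107).clip(0, 9)   # replication

# Independent-samples t-test comparing the two control-condition means.
t, p = stats.ttest_ind(jcd_neutral, sbh_neutral)

# Shapiro-Wilk test of normality on the replication composites.
w, p_norm = stats.shapiro(jcd_neutral)

# A crude ceiling check: fraction of composites within half a point of the max.
near_ceiling = np.mean(jcd_neutral >= 8.5)

print(f"t = {t:.2f}, p = {p:.3f}")
print(f"Shapiro-Wilk p = {p_norm:.3f}")
print(f"proportion near ceiling = {near_ceiling:.2f}")
```

A histogram of the composites (e.g. with `matplotlib.pyplot.hist`) plus these two tests is essentially everything the analysis above relies on.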

Conclusion: the composite scores in the replication study don't deviate from normality in any meaningful way, and there is no evidence of a ceiling effect in them. It may be the case that the effect of cleanliness priming only "works" when baseline wrongness ratings are lower, but it is not clear why this would follow from current theory. More research is (always) needed.

Thanks & disclaimers:

1. I would not have been able to do these analyses had JCD and SBH not made data publicly available. I appreciate their efforts. Data can be downloaded here: SBH, JCD.

2. I am not a statistician, although I do know how to look at histograms and run t-tests. I may have made errors. If you find any, please tell me.

3. Yes, I still use SPSS. Please don’t judge me.