I first heard about the enneagram in mid-2017. If you’re not familiar with it, it’s a psychological and spiritual personality assessment that supposedly reveals our inner motivations. It has quickly become very popular, particularly among Christians, as a tool for spiritual growth. When I first heard about it, everyone around me already knew what it was and spoke as if it were the universally accepted gold standard for personality tests. This embarrassed me quite a bit because I felt like I’d missed a major development in the field of personality psychology. At that time, I already had a BS and an MA in psychology, had served as a behavioral scientist in the Air Force for nine years (three of which I spent teaching psychology as an adjunct), and was attending seminary while volunteering in a psychology research lab at a university.
Despite my embarrassment, I swallowed my pride and asked them to tell me more about it, and when I got home, I researched it further. I quickly discovered why I had never heard of it: the enneagram is not a scientifically validated tool, nor is it used by therapists (with some rare exceptions). It wasn’t in any textbooks on personality, counseling, or the psychology of religion, and only a handful of scientific studies had used it in any way.
Additionally, I looked up 20 or so enneagram experts. Many have theology degrees, and a few have degrees in the ballpark of personality science (Jerome Wagner and Beatrice Chestnut, Ph.D. in clinical psychology; Don Richard Riso, M.A. in social psychology), but none appear to have degrees that would offer thorough training in psychometrics, test construction, and personality, all of which are necessary to evaluate the accuracy of such a tool. For example, Russ Hudson, president of the Enneagram Institute, describes himself as “one of the principal scholars and innovative thinkers in the enneagram world today,” yet his LinkedIn page only lists a BA in East Asian Studies. This does not automatically mean the enneagram is invalid. I mention it because it helps explain why I hadn’t heard of it.
There are some valid theological concerns about the enneagram, but most of the critiques boil down to its questionable origins. As a scientist, I’m not concerned about its origins because they have nothing to do with whether or not it’s accurate (judging a claim by where it came from is what philosophers call the genetic fallacy), so I’m not going to address those concerns. The only point I want to make theologically is that the Bible charges us to be wise (Matt 10:16), discerning (Phil 1:9-10), and to test everything (1 Thess 5:21 & 1 John 4:1), especially things concerned with spiritual matters like the enneagram.
On the one hand, many people personally testify to the usefulness of the enneagram. Even for me, when I read the description of my type, there are aspects of it that seem eerily accurate. On the other hand, there are aspects of other enneagram types that seem eerily accurate about me, the claims about the enneagram seem to be too good to be true, and the enneagram experts don’t have the proper training to substantiate these claims.
So how are we to follow the biblical command to test it, especially when there are seemingly conflicting data about the enneagram? Enter science.
Science of Personality
I’m often told that the enneagram is not a personality test and that it cannot be tested scientifically. As a scientist who studies personality, I can tell you that both of these objections are plainly false. The enneagram makes the same kind of claims as every other personality test. There’s nothing magical about it that makes it beyond the realm of science. Therefore, we can and should test it accordingly, which means we need to understand the science of personality.
Personality is hard to assess because it’s easy to include non-personality factors in the test, such as intelligence, education (correlated with intelligence, but still different from it), religious beliefs, identity, etc. Personality often correlates with these factors, but a good personality test will discriminate between personality and these other factors. Studying personality scientifically is important because it helps us remove our personal biases so we can accurately assess different measures. It also allows us to consider multiple variables and see whether the results apply to large populations of people rather than being limited to a single person’s experiences or best guesses.
The two personality tests usually considered the gold standard are the NEO, which assesses personality according to five broad traits (often called the big five), and the MMPI. Scientists debate which is better, but the big five is used more in research because it is more accessible; the MMPI is expensive and requires certification to administer or interpret. The Myers-Briggs Type Indicator (MBTI) is the most popular among laypeople because it is simple and flashy, but most scientists don’t use it because its validity is questionable (in fact, it’s not uncommon for psychologists to openly mock it).
Unfortunately, there is very little scientific data on the enneagram, so it’s hard to draw definitive conclusions about it. I could only find a handful of scientific studies that examined it. None of them were in top-tier journals, and their methodology was questionable. This does not invalidate them, but it does raise more red flags. Either way, I will take these studies at face value and assume they are valid.
Perhaps the most important factor for a personality test is test-retest reliability, which checks whether the test produces consistent results when someone takes it more than once. Only one of the studies actually looked at this measure, and it found that 79-100% of participants, depending on the type, were in the same type at the pre- and post-tests. This is really good, but it was also based on a biased sample of people who were trained in the enneagram and selected their own type both times.
For comparison, test-retest correlations for the NEO PI-R, which measures the big five factors, range from .86 to .91 after 3 months and from .63 to .83 over 6 years. While personality is fairly stable over time, particularly in adulthood, it does gradually change, so some change should be expected on any personality test. Interestingly, many people claim that your enneagram type does not change, even from childhood, which contradicts other personality research.
Related to this is inter-rater reliability, which checks whether two people rate a person the same way. For the enneagram, the highest score came from raters with at least 2.5 years of experience with the enneagram, and they agreed only 55% of the time. The scores only went down from there in other studies or when less experienced raters were tested. In fact, one researcher who advocates for the enneagram states that trained enneagram practitioners are pretty good (although the data doesn’t support this), but that they are “not as good as they think they are!”
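Raw agreement also has to be judged against chance: two raters guessing at random would still agree some of the time. A common way to correct for this is Cohen’s kappa, which subtracts out the agreement expected by chance. The sketch below is a hypothetical Python illustration, not a reanalysis of the enneagram studies, and the 1/9 baseline assumes all nine types are equally common, which is an assumption made only for the arithmetic.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of people that both raters placed in the same type."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance that two independent raters, each using their own type
    # frequencies, happen to pick the same type for a person
    expected = sum(counts_a[t] * counts_b[t] for t in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten people by two raters (types 1-9)
rater_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
rater_b = [1, 2, 3, 4, 6, 6, 7, 9, 9, 5]
print(percent_agreement(rater_a, rater_b), round(cohens_kappa(rater_a, rater_b), 2))

# If all nine types were equally common, chance agreement would be 1/9 (about 11%),
# so 55% raw agreement would correspond to a kappa of roughly 0.49
chance = 1 / 9
print(round((0.55 - chance) / (1 - chance), 2))
```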
Another important factor, and the most commonly reported one, is internal consistency (reported as Cronbach’s alpha), which checks whether the questions for each enneagram type are measuring the same thing. An acceptable score is considered .70 or higher. The enneagram types ranged from .37 to .82, with at least three of the types falling below the .70 threshold. This means that 18-63% of the variation in scores is due to measurement error! For comparison, the NEO PI-R ranges from .86 to .92 (8-14% measurement error). Since the enneagram is ipsative, meaning the questions force you to choose between two answers instead of rating the degree to which you agree, the low internal consistency means that most people typically have characteristics of multiple types.
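For readers who want to see where these numbers come from, here is a minimal Python sketch of Cronbach’s alpha using its standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of questions. The `error_share` helper is a hypothetical name that simply computes 1 - alpha, which is how the measurement-error percentages above follow from the alphas under classical test theory. The item scores are invented.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: one list per question, each holding that question's scores
    for every respondent (all lists the same length).
    """
    k = len(items)
    # Each respondent's total score across all k questions
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))

def error_share(alpha):
    """Share of score variance treated as measurement error (1 - alpha)."""
    return 1 - alpha

# Hypothetical 3-question scale answered by 4 respondents
items = [
    [2, 4, 3, 5],
    [3, 5, 3, 4],
    [2, 5, 4, 5],
]
print(round(cronbach_alpha(items), 2))

# Endpoints of the ranges reported for the enneagram types and the NEO PI-R
for alpha in (0.37, 0.82, 0.86, 0.92):
    print(alpha, round(error_share(alpha), 2))
```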
The next factor is predictive validity, which checks how well the test predicts behavior. One of the studies specifically compared the enneagram to the big five (Sutton, Allinson, & Williams, 2013), which is great in theory, but they compared apples to oranges, so it’s hard to draw conclusions. They compared the enneagram to single factors of the big five rather than combining scores across all five factors, which would have enhanced the predictive utility of the big five. Even so, the big five still fared better, despite being used in a less-than-optimal way. The enneagram did about as well as a single factor of the big five, and in one case, it did better. The authors should have used a multiple regression with all five big five factors before comparing it to the enneagram.
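To show why combining the big five factors matters, here is a sketch in Python with NumPy on simulated data (no real study data is used, and the trait weights are arbitrary assumptions). It fits an ordinary least squares regression of an outcome on each trait alone and then on all five traits together, and compares the variance explained (R²).

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1 - residuals.var() / y.var()

rng = np.random.default_rng(0)
n = 500
# Simulated scores on five traits (hypothetical data, not from any study)
traits = rng.normal(size=(n, 5))
# Simulated outcome that depends a little on every trait, plus noise
weights = np.array([0.3, 0.2, 0.25, 0.15, 0.2])
outcome = traits @ weights + rng.normal(size=n)

single = [r_squared(traits[:, [j]], outcome) for j in range(5)]
combined = r_squared(traits, outcome)
print("best single trait R^2:", round(max(single), 3))
print("all five traits R^2:", round(combined, 3))
```

Because each single-trait model is nested inside the five-trait model, the combined fit can never explain less variance in the same sample, which is why comparing the enneagram only against single factors understates what the big five can do.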
The final factor is the organization of the types and the notion of having a “wing.” One study had participants organize the types based on their similarities, and the resulting arrangements were vastly different from how the types are organized on the enneagram. More research needs to be done here, but it does suggest that even if the types are valid, their arrangement on the circle may simply be arbitrary.
Conclusion of Scientific Evidence
Overall, the psychometric properties of the enneagram are mixed. Some properties are below standard thresholds, a few are very good, and many are right around minimally acceptable standards. It’s not a terrible test, but it’s not good either. This won’t change unless someone develops a revised version of it, in which case it will be a different test from the one anyone is currently using.
The Wagner Enneagram Personality Style Scales (WEPSS) appears to be a little more accurate than other versions, but it still has mixed or uncertain results, and where it improves in some areas, it creates other issues. I am still waiting to hear back from the company about its reliability and validity statistics so I can see more than the basic information reported in The Fifteenth Mental Measurements Yearbook.
Additionally, the current research only looks at the basic explanations and delineations of each type. The issue is that the enneagram is also supposed to tell a person what their sins and weaknesses are, how they can get healthy, and how they can best relate to other people. These are all additional claims that stem from assumptions about the types, meaning they will have the same degree of error as the type, plus more!
Think of it this way. If you are playing pool and you are off by a millimeter, you may still make the shot. But if you are off by a millimeter when you try a combo shot, banking one ball off another, you will almost certainly miss, because the first error carries into the next ball and is magnified. Even more so if you try banking off two balls, and so on. This is how the enneagram is supposed to work. As a 5, I supposedly become more like a 7 when stressed and an 8 when I am relaxed. This is like a quadruple combo because it assumes each number is correct, plus that the relationships between the numbers are correct.
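The pool analogy can be put into rough numbers. If each link in a chain of inferences (the core type, the stress arrow, the growth arrow, and so on) is right with some probability, and the errors are independent, then the chance that the whole chain is right is the product of those probabilities. The 80% accuracy per link below is purely hypothetical, not a measured figure for the enneagram, and real errors may not be independent.

```python
from math import prod

def chain_accuracy(link_accuracies):
    """Probability that every link in a chain of inferences is right,
    assuming each link's errors are independent of the others."""
    return prod(link_accuracies)

# Hypothetical: every link is right 80% of the time; lengthen the chain
for length in range(1, 5):
    print(length, round(chain_accuracy([0.8] * length), 4))
```

Even with fairly generous accuracy at each step, a four-link chain in this sketch is right well under half the time.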
My guess is that the sins associated with each type are probably only a little more accurate than a roll of the dice. Some are probably above chance while others are probably below chance. I suspect the same is true for how people are supposed to get healthy, what they do when stressed, the triads, or what their “wing” is (assuming it could theoretically be any number and not just a neighboring number).
Finally, there is no cross-cultural data on the enneagram, so even if it were valid in the U.S., it may not be in other cultures. The big five, however, has been tested in many cultures and has been shown to reliably describe personality across them. I’m not aware of any culture it does not apply to. The only caveat is that testing it in collectivist cultures has revealed there might be another factor pertaining to interpersonal relatedness.
Unless you’ve done graduate work in psychometrics, the scientific data probably doesn’t mean a whole lot to you (which is why there are two parts to this article). For those who have studied psychometrics, it’s a no-brainer that the enneagram simply cannot do all its proponents claim it can. Any scientist who studies personality would simply look at the reliability scores and conclude the test is not accurate enough to be helpful, and therefore, they wouldn’t use it because the potential for harm would be too high.
I hope this information is helpful and informative both for those who’ve been silently skeptical of the enneagram and for those who are fans of it. My goal was and is to be as objective as possible, which is why I included statistics that may have been hard to understand. In this article, I mostly wanted to get the data out. In part 2, I explain why the enneagram still seems to work (for some), why it matters whether we use it, and offer recommendations for better tools that can be used as a replacement.
For thoughts on it from a theological perspective, consider this article from the Christian Research Journal.
Here’s a list of scientific(ish) sources I consulted (it does not include the books and websites I used to personally understand the enneagram). Many of these sources are not actually peer-reviewed, or they are in low-level and inappropriate journals (meaning the reviewers may not be qualified to properly critique the methods, statistical analyses, or interpretation of results). This is due to the limited number of articles available that test the enneagram. Most of these are favorable to the enneagram, and therefore I am accepting them as more valid than I otherwise would, to be fair and present the best possible case for the enneagram. There were also a few other peer-reviewed articles on the enneagram, but they were not looking at its validity, so they are not included here.
- Bland, A. M. (2010). The Enneagram: A review of the empirical and transformational literature. The Journal of Humanistic Counseling, Education and Development, 49(1), 16-31.
- Costa, P. T., & McCrae, R. R. (2010). The NEO Personality Inventory: 3. Odessa, FL: Psychological Assessment Resources.
- Edwards, A. C. (1991). Clipping the wings off the enneagram: A study in people’s perceptions of a ninefold personality typology. Social Behavior and Personality: An International Journal, 19(1), 11-20.
- Matise, M. (2007). The enneagram: An innovative approach. Journal of Professional Counseling: Practice, Theory & Research, 35(1).
- McCrae, R. R., & Costa, P. T. (1983). Joint factors in self-reports and ratings: Neuroticism, extraversion and openness to experience. Personality and Individual Differences, 4(3), 245-255.
- McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52(1), 81.
- McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60(2), 175-215.
- McCrae, R. R., & Costa, P. T., Jr. (1997). Personality trait structure as a human universal. American Psychologist, 52(5), 509.
- McCrae, R. R., & Costa, P. T. (2003). Personality in adulthood: A five-factor theory perspective. Guilford Press.
- McCrae, R. R., Kurtz, J. E., Yamagata, S., & Terracciano, A. (2011). Internal consistency, retest reliability, and their implications for personality scale validity. Personality and Social Psychology Review, 15(1), 28-50.
- Newgent, R. A., Parr, P. E., & Newman, I. (2002). The Enneagram: Trends in validation.
- Newgent, R. A., Parr, P. H., Newman, I., & Wiggins, K. K. (2004). The Riso-Hudson Enneagram Type Indicator: Estimates of reliability and validity. Measurement and Evaluation in Counseling and Development, 36(4), 226-237.
- Scott, S. A. (2011). An analysis of the validity of the enneagram. The College of William and Mary.
- Sutton, A. M. (2012). But is it real? A review of research on Enneagram. Enneagram Journal, 5.
- Sutton, A., Allinson, C., & Williams, H. (2013). Personality type and work-related outcomes: An exploratory application of the Enneagram model. European Management Journal, 31(3), 234-249.
- Wagner, J. P., & Walker, R. E. (1983). Reliability and validity study of a Sufi personality typology: The enneagram. Journal of Clinical Psychology, 39(5), 712-717.
- Yilmaz, E. D., Gençer, A. G., Ünal, Ö., & Aydemir, Ö. (2014). From enneagram to nine types temperament model: A proposal. Eğitim ve Bilim, 39(173).
- Yilmaz, E. D., Gençer, A. G., Aydemir, Ö., Yilmaz, A., Kesebir, S., Ünal, Ö., … & Bilici, M. (2014). Validity and reliability of Nine Types Temperament Scale. Eğitim ve Bilim, 39(171).
Here’s a link to my Google Drive folder with the enneagram articles saved, in case you want to read them.