McKinsey’s "Diversity Wins" Under the Microscope: A Case Study in Flawed Consulting Data

For nearly a decade, McKinsey & Company has been the primary authority behind the "business case for diversity." In a series of highly influential reports—Diversity Matters (2015), Delivering through Diversity (2018), Diversity Wins: How Inclusion Matters (2020), and Diversity Matters Even More (2023)—McKinsey reported finding statistically significant positive relationships between the racial/ethnic diversity of executive teams and corporate financial outperformance (measured by EBIT margin).

These statistics became corporate gospel, cited in thousands of investor pitch decks, board presentations, and HR policies worldwide to argue that diversity is not just a moral good, but a financial imperative.

However, a groundbreaking 2024 peer-reviewed study by researchers Jeremiah Green (Texas A&M) and John R. M. Hand (UNC Chapel Hill), published in Econ Journal Watch, subjected McKinsey's findings to a rigorous "quasi-replication" and exposed severe methodological flaws.

The Methodological Flaws in McKinsey’s Research

The academic investigation revealed that McKinsey's widely repeated statistics are built on a highly questionable foundation.

1. Reverse Causality (The Time-Horizon Loop)

McKinsey’s reports imply a causal relationship: that increasing executive diversity causes improved financial performance.

In reality, McKinsey measured corporate financial performance (EBIT margin) over the 4-to-5-year period leading up to the year in which it measured the diversity of the executive team.
For example, the 2015 study analyzed financial data from 2010–2013 but measured executive diversity in 2014.
Because the profits came first, any causal direction the data can support runs opposite to the one the reports imply: firms that were already highly profitable were more likely to have diverse executive teams afterward, possibly because they have the luxury and resources to deploy advanced talent acquisition strategies.

2. Complete Lack of Reproducibility

Because McKinsey refused to share its detailed datasets or the names of the firms in its samples (citing client confidentiality), Green and Hand conducted a quasi-replication using firms in the S&P 500 Index as of December 31, 2019.

They applied McKinsey's statistical testing approach, as described in its reports, to S&P 500 firms, measuring financial performance from 2015–2019 and executive diversity in 2020.
The Result: They found no statistically significant relationship between McKinsey's executive diversity metrics and industry-adjusted EBIT margin.
To ensure robustness, the researchers expanded their analysis to five other financial metrics: sales growth, gross margin, return on assets (ROA), return on equity (ROE), and total shareholder return (TSR). Out of 40 separate statistical tests, 37 showed no significant relationship, one was positive, and two were negative.
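
As a rough sanity check on that tally, the sketch below asks how many "significant" results pure chance would produce across 40 tests. It assumes a 5% significance threshold and fully independent tests. Neither assumption is stated in the text, and the tests share the same firms and years, so the figures are illustrative only. The function name prob_at_least is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


N_TESTS = 40        # tests reported in the quasi-replication
ALPHA = 0.05        # ASSUMED significance threshold (not stated in the text)
N_SIGNIFICANT = 3   # one positive result plus two negative results

# Number of tests expected to look significant if no real relationship exists.
expected_by_chance = N_TESTS * ALPHA

# Chance of seeing three or more significant results under the same assumption.
p_three_or_more = prob_at_least(N_SIGNIFICANT, N_TESTS, ALPHA)

print(f"Expected significant results by chance: {expected_by_chance:.1f}")
print(f"P(at least {N_SIGNIFICANT} of {N_TESTS} by chance): {p_three_or_more:.2f}")
```

Under these assumptions, about two of the 40 tests would come out significant by chance alone, and three or more would appear roughly one time in three. Three significant results of mixed sign is therefore what a sample with no underlying relationship could plausibly produce.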

3. A Mathematically Flawed Metric for Real-World Diversity

McKinsey measures diversity using an inverse normalized Herfindahl-Hirschman Index (NHHI).

Mathematically, this metric is only maximized when a firm has an exactly equal number of executives from all 5 (or 8) racial/ethnic groups.
Because the actual US population and labor force do not contain equal numbers of all racial groups, this metric is an unrealistic and counter-intuitive standard for corporate diversity.
Under this metric, a firm whose executive team perfectly mirrors the demographics of the US population (e.g., ~61% White, ~18% Hispanic, ~13% Black, ~7% Asian, with the remainder in other groups) is rated as significantly less diverse than a firm whose executives are equally divided (20% each) across five groups.
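
The gap can be made concrete with a small calculation. The sketch below assumes the metric is one minus the standard normalized Herfindahl-Hirschman Index, where HHI is the sum of squared group shares. McKinsey's exact implementation is not public, so this is a reading of the description above, not its code. The function name inverse_nhhi is hypothetical, and a fifth "other" group of 1% is assumed so that the shares sum to one.

```python
def inverse_nhhi(shares):
    """Return 1 - normalized HHI for group shares that sum to 1.

    HHI is the sum of squared shares. Normalizing rescales it to [0, 1]
    for the given number of groups, and subtracting from 1 makes 1.0 the
    most diverse score and 0.0 the least diverse.
    """
    n = len(shares)
    if n < 2:
        raise ValueError("need at least two groups")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    hhi = sum(s * s for s in shares)
    normalized = (hhi - 1.0 / n) / (1.0 - 1.0 / n)
    return 1.0 - normalized


# Approximate shares from the text; the final 1% "other" group is ASSUMED.
population_mirror = [0.61, 0.18, 0.13, 0.07, 0.01]
equal_split = [0.20] * 5

print(round(inverse_nhhi(population_mirror), 3))  # about 0.717
print(round(inverse_nhhi(equal_split), 3))        # 1.0
```

On these assumed shares, the team that mirrors the population scores about 0.72, while the evenly split team scores the maximum of 1.0.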

Why the McKinsey Statistics Went Viral

Despite these massive limitations—most of which McKinsey quietly acknowledged in small-print footnotes—the statistics spread like wildfire.

The Halo Effect of the McKinsey Brand: Because the reports carried the imprimatur of the world's most prestigious consulting firm, business leaders, journalists, and HR professionals assumed the data was ironclad. The brand name "laundered" the questionable methodology.
Moral-Empirical Alignment: Corporate leaders and activists desperately wanted the "business case" for diversity to be true. It allowed them to present a moral and social goal as a pure, bottom-line financial decision, making it far easier to sell to boards and shareholders. Skepticism was suspended because the conclusion was highly desirable.
Academic Shielding: By keeping its datasets secret, McKinsey shielded its research from independent scrutiny for nearly a decade. It was only when independent academics took the initiative to build their own comparable dataset from scratch that the claims were finally tested, and they failed to replicate.

