This paper examines the structural validity of tournament-based performance evaluation systems that mandate fixed distributional outcomes, commonly known as forced ranking or stack ranking and widely used in technology and corporate environments. The authors argue that such systems produce systematic classification errors regardless of implementation quality, rather than merely as a result of poor execution. Using an agent-based simulation of 994 engineers in 142 teams of seven, the study shows that random team assignment alone generates a 32% error rate in termination and promotion decisions, attributable purely to team composition variance. Under conditions modelling differential managerial capability, the error rate rises to 53%, so that misclassifications (false positives and false negatives combined) outnumber correct classifications. The authors further argue that cross-team calibration, a commonly proposed corrective, converts evaluation into influence contests driven by managerial persuasion rather than employee merit. Multi-period dynamics are shown to incentivise risk-averse behaviour and to accelerate high-performer attrition through adverse selection. The paper concludes that forced ranking persists not because it aligns incentives effectively but because it satisfies institutional demands for a demonstrable, auditable process, despite producing outcomes statistically indistinguishable from random allocation.

Key insights:
Random team assignment alone, absent any managerial bias, generates a 32% misclassification rate in forced-ranking termination and promotion decisions because of team composition variance.
Under realistic conditions modelling differential managerial capability, the classification error rate reaches 53%, meaning false classifications outnumber correct ones.
Cross-team calibration processes, frequently proposed as a remedy for forced ranking's inconsistencies, structurally transform evaluation into influence contests in which persuasive managers secure favourable outcomes for their reports regardless of actual merit.
Multi-period dynamics produce adverse selection: as employees observe random or merit-independent outcomes, risk-averse behaviour increases and high performers become disproportionately likely to leave the organisation.
The paper argues that forced ranking persists not through demonstrated incentive alignment but because it satisfies institutional and legal demands for a formalised, auditable process.

Practical takeaways:
Organisations that impose fixed distributional requirements on performance evaluation face an inherent structural error rate driven by team composition variance, a factor largely invisible to standard process audits.
The proposed alternative, delegating evaluative judgment to managers with hierarchical accountability, is identified in the paper as the theoretically efficient solution, though the authors note that it cannot be fully formalised within the legal and coordination constraints that originally motivated forced ranking.
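
To illustrate how team composition variance alone can produce misclassification, the simulation setup summarised above (994 engineers, 142 teams of seven, random assignment) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' model. It assumes, hypothetically, that true ability is drawn from a standard normal distribution, that every manager observes ability without noise, that each team must name exactly one lowest and one highest performer, and that a "correct" decision is one that matches a company-wide ranking with the same number of slots. The function name `forced_ranking_error` and all of its parameters are illustrative. Because these assumptions differ from the paper's, the sketch should not be expected to reproduce the 32% figure.

```python
import random


def forced_ranking_error(n_teams=142, team_size=7, seed=0):
    """Share of forced-ranking decisions that disagree with a company-wide ranking."""
    rng = random.Random(seed)
    n = n_teams * team_size

    # Assumption: true ability is standard normal and perfectly observed.
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Random team assignment: shuffle ids, then cut into consecutive teams.
    ids = list(range(n))
    rng.shuffle(ids)
    teams = [ids[i * team_size:(i + 1) * team_size] for i in range(n_teams)]

    # Forced ranking (assumed quota): each team names its lowest and
    # highest performer, whatever the team's overall strength.
    fired = {min(team, key=ability.__getitem__) for team in teams}
    promoted = {max(team, key=ability.__getitem__) for team in teams}

    # Benchmark: fill the same number of slots from a company-wide ranking.
    ranked = sorted(ids, key=ability.__getitem__)
    true_bottom = set(ranked[:n_teams])
    true_top = set(ranked[-n_teams:])

    # A decision is wrong if the person named is not in the benchmark set.
    wrong = len(fired - true_bottom) + len(promoted - true_top)
    return wrong / (2 * n_teams)


if __name__ == "__main__":
    rates = [forced_ranking_error(seed=s) for s in range(200)]
    print(f"mean error rate over 200 runs: {sum(rates) / len(rates):.2%}")
```

With a single team containing everyone (`n_teams=1, team_size=994`), the team ranking and the company-wide ranking coincide and the error is zero. Any error in the multi-team case therefore comes only from how people happen to be grouped, since the sketch contains no managerial bias or measurement noise.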