A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code

Jan 6, 2026

Dawoon Jeong

James Holehouse

Jisung Yoon

Christopher P. Kempes

Geoffrey B. West

Hyejin Youn*



Figure 1, Jeong et al. (2026), Scientific Data

Abstract

As societies confront increasingly complex regulatory demands in domains such as digital governance, climate policy, and public health, there is a pressing need to understand how legal systems evolve, where they concentrate regulatory attention, and how their institutional architectures shape capacity for adaptation. Yet, the long-term structural dynamics of law remain empirically underexplored. Here, we provide a quantitative analysis of the United States Code (U.S. Code), the primary compilation of federal statutory law in the United States, covering the entire history of the Code from 1926 to 2023. We include statistics related to the structural and linguistic complexity of the Code: word counts, vocabulary statistics, hierarchical organization (titles, chapters, sections, subsections), and cross-references among titles. Additionally, we make the generative AI method utilized to clean the old OCR versions of the U.S. Code publicly available. The dataset offers an empirical foundation for large-scale and long-term interdisciplinary analysis of the growth, reorganization, and internal logic of statutory systems. The dataset is released on GitHub with comprehensive documentation to support reuse across legal studies, data science, complexity research, and institutional analysis.
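As a rough illustration of the kinds of statistics the abstract lists (word counts, vocabulary statistics, and cross-references among titles), the following is a minimal sketch, not the authors' pipeline. The tokenizer, the citation pattern `CITATION`, and the function names `text_stats` and `cross_references` are all assumptions introduced here for illustration; the released dataset and its documentation define the actual measures.

```python
import re
from collections import Counter

# Assumed citation pattern: matches forms like "42 U.S.C. 1983" and captures
# the cited title number. Real citations in the Code take many more forms.
CITATION = re.compile(r"\b(\d+)\s+U\.S\.C\.")


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def text_stats(text):
    """Return word count, vocabulary size, and type-token ratio for a text."""
    tokens = tokenize(text)
    vocab = set(tokens)
    return {
        "word_count": len(tokens),
        "vocabulary_size": len(vocab),
        "type_token_ratio": len(vocab) / len(tokens) if tokens else 0.0,
    }


def cross_references(sections):
    """Count citations between titles.

    `sections` is an iterable of (source_title, text) pairs (hypothetical input
    shape). Citations to the source title itself are skipped so that only
    cross-title references are counted.
    """
    edges = Counter()
    for source_title, text in sections:
        for match in CITATION.finditer(text):
            target_title = int(match.group(1))
            if target_title != source_title:
                edges[(source_title, target_title)] += 1
    return edges
```

For example, `text_stats("The law is the law")` counts five words and three distinct words, and `cross_references([(26, "See 42 U.S.C. 1983.")])` records one reference from Title 26 to Title 42.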

Type

Journal article

Publication

In Scientific Data, 13, 13

Last updated on Jan 6, 2026

US Legal Code, Complexity, Institutional Evolution, Legal Systems
