
CPL Roadmap to Government Administrative Data in California

California offers substantial opportunities for policy-focused research. It is the most populous US state, with a population that exceeds that of the smallest 21 states combined and an economy larger than all but four nations in the world. California is also highly diverse, with a wide variety of industries, landscapes, population densities, racial and ethnic groups, and complex social issues. Finally, California has often been an innovator in many domains of social policy, making it a useful laboratory for studying policies that may be adopted nationally.

The California Policy Lab (CPL) has active research partnerships with over twenty state and local government agencies in California. In doing this work, we have learned a great deal about the state’s data resources. Most of the data we work with is sensitive, restricted-access, person-level data. This is not “open data”; rather, it is of necessity “closed” to most access. (If you’re looking for open data resources, check out data.ca.gov.)

We also recommend checking out CPL’s PERLI Initiative. PERLI is expanding the data resources available for quantitative Social, Behavioral, and Economic research. Using administrative microdata from California, the world’s 5th largest economy, we are providing linked longitudinal datasets that unlock new pathways to discovering the causes and consequences of poverty and economic mobility, and whether government interventions help households succeed.

In the spirit of knowledge sharing, we provide here a list of important person-level administrative data that exist in California. It is not intended to be comprehensive, and it may not be fully accurate. However, we often get questions regarding basic data discovery issues, and so we provide this informal resource for interested parties and hope it is of value. If you’re thankful, send us a note or make a donation to CPL. To make suggested edits or additions, please email datapage@capolicylab.org.

State Level Data

For the marked rows below, we host a user group where users of that data can share tips and discuss complexities of the data. Email datapage@capolicylab.org to join.

County Level Data

Many important data systems in California are maintained at the county level, rather than the state level. This is especially true in criminal justice: data from the courts, the sheriff, the district attorney, the public defender, and the probation department are all kept at the county level and have no state-level equivalent. CPL has worked with county justice data in Alameda, Los Angeles, Sacramento, San Francisco, Santa Clara, and Sonoma counties.

Social services data are also maintained at the county level in the SAWS systems, although CDSS has recently been granted some access to those data. CPL has worked with county social services data in Los Angeles, San Francisco, and Sonoma counties.

Homelessness management information systems (HMIS) are maintained by the local or regional continuum of care, whose boundaries often, but not always, coincide with county boundaries. CPL has worked with county HMIS data in Los Angeles and Sonoma counties.

Some California counties have done impressive work to link together person-level data from different domains in order to study complex problems and populations. The largest and most significant of such systems is Los Angeles’s Enterprise Linkages Project. CPL works with a similar linked data system in Sonoma County. As we understand it, there are other such linked data systems (extant or in development) in Alameda, Humboldt, Sacramento, San Diego, San Francisco, and perhaps other counties.

CPL’s Internal DataWiki

In addition to the information above, CPL also maintains an internal DataWiki that has more granular information about the datasets with which CPL works. If you are interested in gaining access, send an inquiry with your affiliation and your reason for interest to datawiki@capolicylab.org. Access is restricted to bona fide researchers and even then may not be granted in all cases (for example, when restricted by the agency data owner).

Statewide Longitudinal Data System

In 2019, the California Legislature passed the California Cradle-to-Career Data System Act, which began a comprehensive planning process to create a statewide longitudinal data system that would link, among other things, K-12 data to higher education data to workforce data. The website dedicated to these efforts is here. In December 2020, the working group posted recommendations for the Legislature to consider.

Two CPL staff participated on a subcommittee and advisory group for this process. Though we have some issues with how the process was conducted and with the final recommendations, we are generally quite supportive of the effort and hope that California can join the ranks of most other states that have already linked their K-12, higher ed, and workforce data systems.

Education Data

In the education space, there is a rich set of groups with experience using California’s educational administrative data, including:

  • Policy Analysis for California Education, or PACE, is a research consortium across many California universities that includes faculty who work with data from all of California’s education systems. They also have a collaboration with the CORE districts (see below) that allows researchers to use those data.
  • The California Education Lab at UC Davis includes faculty who have worked extensively with data from the K-12, community college, and CSU systems, and more.
  • The CORE Districts is a collaboration among 8 California school districts (including many of the largest) that has created a joint data system available to researchers.
  • Cal-PASS Plus is a clearinghouse of longitudinal data following students from K-12 into the workforce. Cal-PASS Plus does entertain research requests.
  • The Silicon Valley Regional Data Trust is an initiative by Santa Clara, San Mateo, and Santa Cruz counties to combine educational and other relevant data about kids and their success. This data system is updated nightly and serves operational needs. SVRDT also does limited work with researchers.

Other Relevant Resources

Other groups that have substantial cross-domain experience with California administrative data include the Children’s Data Network, the Stanford Center on Poverty and Inequality, and the Public Policy Institute of California.

One excellent example: the Children’s Data Network (CDN) has worked with the California Health and Human Services Agency (CHHSA) to help link data from several CHHSA departments in order to develop a client-centered approach to health and human services delivery. Learn more.

Is there something missing? Send us an email at: datapage@capolicylab.org.
