The University of California Consumer Credit Panel (UC-CCP) is a new dataset of anonymized consumer credit information, created for the purpose of studying consumer financial well-being and identifying trends among California households related to credit, debt, income, and mobility. The UC-CCP was created in 2020 through a partnership between the California Policy Lab, the Student Borrower Protection Center, and the Student Loan Law Initiative. The dataset is designed for use by researchers affiliated with the University of California or the California Policy Lab. The data can inform research on a variety of topics including economic mobility, health and financial well-being, social mobility, the impact of student debt, California’s housing challenges, and more.
More About the Dataset
The UC-CCP is a longitudinal panel of approximately 40 million consumers, starting in 2004 and continuing quarterly through the present. Quarterly updates are anticipated but depend on funding availability. The sample comprises anonymized credit records of a nationally representative 2% sample of U.S. adult consumers with credit records, along with 100% of Californians with credit histories. The dataset also includes records from consumers who shared an address or an account (e.g., co-signers) with those in the sample. Data elements include demographic and geographic information about consumers, credit scores, and raw tradeline-level information about each loan or collections item, including payment history, credit limits and balances, and the type and status of each tradeline (such as collections and deferments).
While the UC-CCP is similar to existing credit panels by the Federal Reserve Bank of New York and the Consumer Financial Protection Bureau, it also has three distinct advantages for researchers:
1. The size of the sample and the oversampling of California consumers.
2. The granularity of the data.
3. A streamlined process (through CPL) for potentially linking the UC-CCP data with other California data.
The data originates from one of the three nationwide consumer reporting agencies. Before being provided to the UC-CCP, the data was stripped of any information that might reveal consumers’ identities, such as names, addresses, and Social Security numbers.
Here is a public deck describing the data.
Accessing the Data
The UC-CCP is hosted on CPL’s Secure Data Hub, which is a virtual enclave environment designed for secure analysis and research of sensitive administrative microdata. Only approved users have access and only for approved projects. User activities are monitored, logged, and audited.
There is a cost to use the data (see FAQ below). Interested researchers should send questions and inquiries to: firstname.lastname@example.org.
Potential users should note that not all requests may be approved. As of December 2020, we have usable annual data from 1Q2004 through 3Q2020. The data in 2020 is monthly. Pending funding availability, we hope to purchase quarterly data on an ongoing basis.
Frequently Asked Questions
Can you describe the data in more detail? For example, what variables are in the data?
We do not yet have detailed data documentation available to share with potential users. However, our data is similar to credit panels held by the Federal Reserve Bank of New York and the Consumer Financial Protection Bureau, so it may help to read up on those data.
The NY Fed describes its data here: An Introduction to the New York Fed Consumer Credit Panel. One main difference from the NY Fed’s data, besides sampling, is that the UC-CCP contains tradeline-level information from the credit bureau in addition to person-level information.
Here is a sampling of research using consumer credit panel data:
Pandemic Patterns: California is Seeing Fewer Entrances and More Exits (2021) (note: this research used UC-CCP data)
CalExodus: Are People Leaving California (2021) (note: this research used UC-CCP data)
For each consumer in each archive, there are four files: one on consumer characteristics, one on tradelines, one on inquiries, and one on public records.
- Consumer characteristics include credit score, geography, gender, month and year of birth, marital status, occupation and education codes, household count, and an indicator of homeownership status. We plan to perform reliability tests on some of these data, which are modeled/estimated by the credit bureau.
- “Tradelines” are loans or other reported credit products. We receive several variables describing each tradeline, such as loan type, balance amount, minimum payment, credit limit, open and closure dates, and a multi-year monthly payment history.
- Hard inquiries for credit (i.e., credit checks) are tracked by date, dollar amount, and type of business.
- Public records include bankruptcy records, including the type of bankruptcy, the filing date, and the amounts of assets and liabilities.
Can you identify individuals in the data?
No. The data are anonymized so that consumer privacy is maintained. There are no names, addresses, Social Security numbers, birth dates, or other personally identifying information in the data.
Do you know the race or ethnicity of individuals in the data?
No. However, we have merged in race and ethnicity data at the census block group level from the American Community Survey, which provides a probability of each race/ethnicity for each individual.
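As an illustration of what a block-group-level probability can look like, the sketch below assigns each consumer the population shares of their census block group. This is an assumption about the general approach, not a description of the UC-CCP's actual method; the block-group codes, group labels, counts, and function names are all hypothetical.

```python
# Hypothetical block-group population counts (not real ACS figures).
# Keys are made-up 12-digit block-group codes; group labels are placeholders.
block_group_counts = {
    "060010001001": {"group_a": 600, "group_b": 300, "group_c": 100},
    "060010001002": {"group_a": 50, "group_b": 150, "group_c": 0},
}


def group_shares(counts):
    """Convert population counts into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


# Hypothetical anonymized consumer records carrying only a block-group code.
consumers = [
    {"consumer_id": "c1", "block_group": "060010001001"},
    {"consumer_id": "c2", "block_group": "060010001002"},
    {"consumer_id": "c3", "block_group": None},  # no census geography available
]


def attach_probabilities(records, counts_by_block_group):
    """Give each record the population shares of its block group, if known."""
    enriched_records = []
    for rec in records:
        counts = counts_by_block_group.get(rec["block_group"])
        probs = group_shares(counts) if counts else {}
        enriched_records.append({**rec, "race_probs": probs})
    return enriched_records


enriched = attach_probabilities(consumers, block_group_counts)
```

Under this reading, the probabilities describe the neighborhood rather than the person, so they are best suited to aggregate analysis. Records without census geography simply carry no probabilities.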
For what years do you have the data?
We have quarterly extracts going back to 2004. The first archive is from March 2004, and we receive archives from March, June, September, and December of each year through the present. We hope to continue purchasing the data going forward, pending funding availability.
Some of the demographic data is unavailable for archives before June 2010.
In order to better study the COVID-19 pandemic, we obtained monthly data for all of 2020.
Going forward we expect to receive quarterly data about one month after each quarter ends.
What is the most detailed geography for which you have data?
Each consumer record has a 5-digit ZIP. After June 2010, the UC-CCP has census geography information for ~80% of records, down to the Census Block-level.
Can you describe the sampling methodology in greater detail?
There are two samples, one nationwide and one from California.
The National Sample: For each archive, we first select all records with a “consumer pin” ending in one of two two-digit numbers (e.g., 24 or 56). The consumer pin is assigned sequentially by the credit bureau and we have conducted testing to ensure that the pins are as good as random, thereby creating a representative nationwide sample.
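The selection rule for the National Sample can be sketched as follows. Choosing two of the 100 possible two-digit endings yields roughly a 2% sample when pins are effectively random. The suffixes below are the source's illustrative examples (24 and 56), not necessarily the ones actually used, and the function name and simulated pin range are hypothetical.

```python
# Illustrative suffixes only: the source gives 24 and 56 as examples.
SAMPLE_SUFFIXES = {"24", "56"}


def in_national_sample(consumer_pin: int) -> bool:
    """Return True when the pin's last two digits match a selected suffix."""
    return f"{consumer_pin % 100:02d}" in SAMPLE_SUFFIXES


# Simulate one million sequentially assigned pins (a hypothetical range).
pins = range(1, 1_000_001)
sampled = [p for p in pins if in_national_sample(p)]

# Two of every 100 endings are selected, so the rate is 2%.
sample_rate = len(sampled) / len(pins)
```

Selecting on trailing digits rather than leading digits matters when pins are assigned sequentially: the last two digits cycle through every value evenly, whereas leading digits would track when a pin was issued.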
The California Sample: We first selected all consumers who had a California address during one of the sixty quarterly archives between March 2004 and December 2019. We have data for those consumers from all archives, including archives in which they are not located in California. The resulting sample includes “always” residents of California, “comers” who arrived in California during the 2004-19 period, and “leavers” who departed California at any point from 2004 to the present (and those who leave in the future).
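One way a researcher might derive the “always”, “comer”, and “leaver” labels from a consumer's location history is sketched below. The labels come from the source, but the exact definitions used here are assumptions, as are the function name and the input format (an ordered list of state codes, one per archive). The sketch ignores the 2004-19 selection window and archives in which a consumer has no record.

```python
def classify_residency(states_by_archive):
    """Label a consumer from an ordered list of state codes, one per archive.

    Returns a set because one consumer can be both a comer and a leaver.
    """
    in_ca = [state == "CA" for state in states_by_archive]
    if not any(in_ca):
        return set()  # never observed in California
    if all(in_ca):
        return {"always"}
    labels = set()
    # Compare each archive with the next one to detect moves across the border.
    for prev, curr in zip(in_ca, in_ca[1:]):
        if not prev and curr:
            labels.add("comer")  # moved into California
        if prev and not curr:
            labels.add("leaver")  # moved out of California
    return labels


examples = {
    "stayer": classify_residency(["CA", "CA", "CA"]),
    "arrival": classify_residency(["TX", "CA", "CA"]),
    "departure": classify_residency(["CA", "CA", "NV"]),
    "round_trip": classify_residency(["TX", "CA", "TX"]),
}
```

Because the panel follows California Sample members into archives where they live elsewhere, moves in both directions are observable for the same person.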
Household Members and Associated Borrowers: For both the National and California Samples, we also have data for consumers who share an address with a Sample member during a given archive (Household Members, up to a maximum of 8 co-habitants), as well as for consumers who are on the same tradelines, such as co-signers (Associated Borrowers). These Household Members and Associated Borrowers are distinguished within the data, and we only have data for them during the archives in which they are associated with the Sample members.
Can the UC-CCP data be linked to other data? If so, how?
Yes, in some circumstances.
The UC-CCP data can readily be linked with other data at the 5-digit ZIP level or, for archives after June 2010, at the census block level or any higher level of geography.
In addition, we have arranged a streamlined process by which UC-CCP data can be linked with other data at the individual level. Each data provider encrypts its identifiers in the same manner, so that records can be matched and linked on CPL’s servers without CPL ever seeing the underlying identifiers. The process requires an additional fee (listed below) and is subject to approval by the data providers.
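The general idea of matching on identically encrypted identifiers can be sketched with a keyed hash. The source does not specify the encryption method, so the use of HMAC-SHA256 here is an assumption, and the key, the fake identifiers, the record fields, and the function name are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, shared_key: bytes) -> str:
    """Normalize an identifier, then replace it with a keyed hash token."""
    normalized = identifier.strip().replace("-", "")
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key agreed on by the data providers and never given to the linker.
SHARED_KEY = b"hypothetical-key-agreed-by-providers"

# Each provider hashes its own (fake) identifiers before sending anything.
provider_a = {
    pseudonymize("123-45-6789", SHARED_KEY): {"credit_score": 700},
}
provider_b = {
    pseudonymize("123456789", SHARED_KEY): {"program": "X"},
    pseudonymize("987-65-4321", SHARED_KEY): {"program": "Y"},
}

# The linking party sees only tokens and joins records that share one.
linked = {
    token: {**provider_a[token], **provider_b[token]}
    for token in provider_a.keys() & provider_b.keys()
}
```

Both providers must normalize and hash in exactly the same way, or identical identifiers will produce different tokens and fail to match. A keyed hash is used in this sketch rather than a plain hash because identifiers drawn from a small space are otherwise easy to recover by brute force.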
Who is eligible to access the data?
Faculty, students, and employees of the University of California are eligible to access the data, but their specific use of the data must first be approved by the credit bureau and CPL. UC authors may have non-UC co-authors, subject to approval by CPL and the credit bureau.
What is the approval process for accessing the data?
If you are interested in conducting research with the UC-CCP, please fill out this form: UC-CCP Research Request Form.
Not every research project will be approved. Projects will generally be reviewed within 6 weeks of submission, so please be patient if you submitted your request less than 6 weeks ago. Please do not check in with us before then unless there is some unusual urgency.
Even those who are approved for access may experience delays in getting access to the data as we get our hosting environment set up. We appreciate your patience; we are a small team and there is a large volume of requests.
What are the fees related to data access?
Effective July 1st, 2022, we are charging users $6,583 per project to access the data. This helps recoup our costs of cleaning and hosting the data.
An additional $12,503 fee is assessed for each project that requires an individual-level linkage. $5,000 of that fee goes directly to the credit bureau. The remainder covers CPL’s costs in facilitating the linkage, writing code to hash the data, and hosting the data.
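For budgeting purposes, the fee arithmetic above can be worked through as follows. The figures are the rates stated as effective July 1st, 2022; the constant and function names are hypothetical.

```python
# Rates stated as effective July 1st, 2022, in U.S. dollars.
BASE_ACCESS_FEE = 6_583  # per-project access fee
LINKAGE_FEE = 12_503     # additional fee for an individual-level linkage
BUREAU_SHARE = 5_000     # portion of the linkage fee paid to the credit bureau


def project_cost(needs_linkage: bool) -> int:
    """Total fee for one project, with or without an individual-level linkage."""
    return BASE_ACCESS_FEE + (LINKAGE_FEE if needs_linkage else 0)


# Portion of the linkage fee that covers CPL's own costs.
cpl_linkage_share = LINKAGE_FEE - BUREAU_SHARE

cost_without_linkage = project_cost(needs_linkage=False)
cost_with_linkage = project_cost(needs_linkage=True)
```

At these rates, a project with an individual-level linkage costs $19,086 in total, of which $7,503 of the linkage fee stays with CPL.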
These rates are subject to change and are likely to change on an annual basis, in alignment with our costs.
Seed Grants are available to cover access fees. If you apply for a Seed Grant, please also submit an application here for UC-CCP access, and reference each application in the other.