All facilities certified by the Centers for Medicare & Medicaid Services (CMS) are required to submit an annual financial report, known as a cost report. These reports are somewhat like tax returns: they include basic information on the facility (its name, address, etc.) and the services it provides. The bulk of the cost report is devoted to the facility's finances, including details on revenues, expenses, profits, assets, liabilities, and wages.
The following types of CMS-certified facilities are required to submit cost reports:
The RAND Hospital Data currently includes information from hospital cost reports only. (See "What facilities are included in the RAND Hospital Data?")
HCRIS stands for the Healthcare Cost Report Information System, which is the database in which cost report data submitted by CMS facilities are stored.
Yes. Cost report data are released to the public, and the raw files can be downloaded directly from CMS. CMS releases a separate zip file for each facility type, and there is also a separate zip file for each federal fiscal year for many of the facility types. Note that many raw files, especially those containing information for hospitals and SNFs, are too large to open in Microsoft Excel or Access.
CMS releases cost report data four times per year: between the middle and end of January, April, July, and October. These releases are timed to occur soon after the close of each quarter of the federal fiscal year. Federal fiscal year quarters end on the last day of December, March, June, and September.
The RAND Hospital Data are updated a month after CMS cost report data are released, on or around the first day of February, May, August, and November. (See "How often are the RAND Hospital Data updated?")
Each type of facility fills out a specific CMS cost report form. "2552" is the official name for the cost report form submitted by hospitals. CMS has made several major updates to the hospital cost report form, including updates in 1996 and 2010. The last two digits of "2552-96" and "2552-10" indicate the version of the form a hospital completed and submitted. Hospitals generally completed form "2552-96" for cost reporting periods that started within the years 1996-2009, and form "2552-10" for cost reporting periods that started in 2010 or later.
The raw cost report data are stored in a relational database called the Healthcare Cost Report Information System (HCRIS), which includes three tables: the report table, numeric table, and alphanumeric table. Each table is released as a separate comma-separated-values (CSV) file.
The report table contains one observation for each cost report with a beginning date in a given federal fiscal year. The report record number (rpt_rec_num) uniquely identifies each cost report and serves as a linking variable to the other two tables. Other fields in the report table include:
The numeric table contains one observation for each variable with a numeric data type for each cost report for each facility. It includes:
The alphanumeric table contains one observation for each variable with an alphanumeric data type for each cost report for each facility. It includes:
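As a rough illustration of how the three tables relate, the sketch below links a report table, a numeric table, and an alphanumeric table on rpt_rec_num. Only rpt_rec_num comes from the description above; every other column name and every value is made up for the example and does not reflect the actual HCRIS file layout.

```python
import csv
import io

# Tiny made-up stand-ins for the three CSV files. Column names other than
# rpt_rec_num are assumptions for illustration, not the real HCRIS layout.
REPORT_CSV = """rpt_rec_num,provider_id,begin_date,end_date
1001,HOSP_A,2020-10-01,2021-09-30
1002,HOSP_B,2021-01-01,2021-12-31
"""
NUMERIC_CSV = """rpt_rec_num,field,value
1001,total_revenue,5000000
1001,total_expenses,4500000
1002,total_revenue,750000
"""
ALPHA_CSV = """rpt_rec_num,field,value
1001,facility_name,Example Hospital A
1002,facility_name,Example Hospital B
"""


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def link_tables(report_text, numeric_text, alpha_text):
    """Group numeric and alphanumeric rows under their parent cost report."""
    linked = {}
    for row in read_rows(report_text):
        linked[row["rpt_rec_num"]] = {**row, "numeric": {}, "alpha": {}}
    for row in read_rows(numeric_text):
        # Skip values whose report record is not in the report table.
        if row["rpt_rec_num"] in linked:
            linked[row["rpt_rec_num"]]["numeric"][row["field"]] = float(row["value"])
    for row in read_rows(alpha_text):
        if row["rpt_rec_num"] in linked:
            linked[row["rpt_rec_num"]]["alpha"][row["field"]] = row["value"]
    return linked


reports = link_tables(REPORT_CSV, NUMERIC_CSV, ALPHA_CSV)
```

Because the real files can be very large, a production workflow would more likely load them into a database or a dataframe library than hold them in Python dictionaries; the linking logic stays the same.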
The most detailed documentation available for raw CMS cost report data is the full set of instructions that CMS provides to facilities on how to complete cost report forms. These instructions are part of the publicly available CMS "Provider Reimbursement Manual."
The Manual consists of two parts:
Chapter 36 and Chapter 40 include a detailed set of instructions for hospitals, and a full set of the worksheets (in pdf format) that hospitals fill out to complete their cost reports. For further details, see "What do '2552-10' and '2552-96' mean?"
Note: The intended audience for the Provider Reimbursement Manual is finance professionals working in the healthcare industry. The information provided therein explains how to complete cost report forms and is not intended for use by the average individual interested in understanding cost report data.
The RAND Hospital Data is a streamlined collection of variables that make CMS hospital cost report data more accessible. We download raw CMS cost report data submitted by hospitals and improve upon it, enabling users to solve logistical problems and answer complex questions more easily.
The RAND Hospital Data are structured as panel datasets; that is, each observation (row) in the data corresponds to one year of data for one level of observation (e.g., hospital, county, state, etc. - see “What does level of observation mean?” for more information).
Each dataset has observations for all available years of data between 1996 and the latest year of released hospital cost report data. For example, a hospital-level file downloaded from the May 1, 2023, release will contain one record for each hospital in each year the hospital submitted a cost report between 1996 and 2021. (See “How long does it take for the RAND Hospital Data to publish a complete year of CMS Hospital Cost Report data?” for further explanation.)
No. The RAND Hospital Data contain a useful subset of variables, comprising approximately 1,500 fields selected directly from the hospital cost report, plus additional variables we derive for users. Counting all columns, lines, and sub-lines, Form 2552-10 (the full CMS hospital cost report) contains around 38,000 separate alphanumeric fields and 327,000 separate numeric fields.
See "What if there is a CMS cost report field I need that is not in the RAND Hospital Data?" if you are interested in a variable that is not included in the RAND Hospital Data.
The RAND Hospital Data files contain information on all hospitals in the United States and Puerto Rico that are required to submit CMS cost reports. These include short term, acute care hospitals; critical access hospitals; psychiatric hospitals; children's hospitals; rehabilitation facilities; and long-term care facilities.
The RAND Hospital Data are updated four times a year, on or around the first day of February, May, August, and November.
The RAND Hospital Data Team makes the current and all historical releases of the RAND Hospital Data available for subscriber download. The Release Date identifies a version specific to the date on which RAND published the dataset. For example, “2025_02_01” indicates the version of the RAND Hospital Data published on February 1, 2025.
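For users who script their downloads, a Release Date label such as “2025_02_01” can be turned into a date and compared with other labels. This is a minimal sketch; the function names are hypothetical, and it assumes every label follows the year_month_day pattern shown in the example above.

```python
from datetime import date, datetime


def parse_release_date(label: str) -> date:
    """Convert a label like '2025_02_01' into a date object."""
    return datetime.strptime(label, "%Y_%m_%d").date()


def latest_release(labels):
    """Return the label with the most recent Release Date."""
    return max(labels, key=parse_release_date)
```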
The Level of Observation is the characteristic or set of characteristics that allows the user to uniquely identify one row in a dataset. For example, in a dataset with hospital and year as the level of observation, each record represents one hospital in one year. In a dataset with a geographical area and year as the level of observation (e.g., county, CBSA, state, or national), each record represents the summary of all hospitals contained in one geographic area in one year.
CMS requires all facilities to submit an annual cost report, but it allows each facility to select whatever cost reporting period it would like. In most cases, a facility submits cost reports using its fiscal year, which will typically run either from October 1 through September 30 (federal fiscal year), January 1 through December 31 (the calendar year), or July 1 through June 30. Facilities can and do have other cost reporting periods. Some hospitals will also change their cost reporting period in the middle of a year and have a cost reporting period shorter than a year.
There are many options in addressing the lack of uniformity in hospital cost reporting periods. To make comparisons across hospitals, it is useful to have the cost report data processed so that each record covers a standardized time period.
Subscribers to the RAND Hospital Data can choose datasets where cost report information has been standardized to one of three time periods, or they can choose to download unweighted data.
The three standardized periods:
In the hospital cost report time period datasets, the data are unweighted, and each record corresponds to one hospital cost reporting period. So, the start and end dates will differ by hospital and will not necessarily be one year long.
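To make the idea of standardizing to a common time period concrete, the sketch below apportions a value from each cost report to a calendar year in proportion to the number of days the report overlaps that year. This is one plausible approach chosen for illustration only; the weighting method RAND actually uses is not described here, and the function names and the (begin, end, value) input layout are assumptions.

```python
from datetime import date


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two inclusive date ranges."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return max((end - start).days + 1, 0)


def calendar_year_estimate(reports, year):
    """Day-weighted estimate of a flow value (e.g., revenue) for one calendar year.

    reports: list of (begin_date, end_date, value) tuples for one hospital.
    Returns (estimate, fraction_of_year_covered).
    """
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    days_in_year = (year_end - year_start).days + 1
    total, covered = 0.0, 0
    for begin, end, value in reports:
        days = overlap_days(begin, end, year_start, year_end)
        report_days = (end - begin).days + 1
        # Assign the share of the report's value that falls inside the year.
        total += value * days / report_days
        covered += days
    return total, covered / days_in_year
```

For a hospital with a July 1 through June 30 fiscal year, a calendar-year estimate under this scheme draws roughly half of its value from each of two consecutive cost reports. Day weighting only makes sense for flow variables such as revenues and expenses; point-in-time values such as bed counts would need different handling.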
Core-Based Statistical Areas (CBSAs) are defined by the U.S. Census Bureau, and each one contains a single core county, an urbanized area, or an urban cluster with a population of at least 10,000 people. Each core county is combined with any adjacent counties that have a high degree of social and economic integration with the core, as measured through commuting ties with the core or other counties associated with the core.
CBSAs are each classified as either a metropolitan area (urbanized area with a population of at least 50,000 people) or a micropolitan area (urbanized area with a population of 10,000-50,000). When processing and creating RAND Hospital Data, all rural counties within a state - i.e., those not included in a metropolitan or micropolitan area - are grouped and coded as “XX999” where “XX” is the 2-character postal abbreviation for the state, such as “PA999” for Pennsylvania.
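The rural grouping code can be built or recognized with a couple of small helpers. This is a sketch based only on the “XX999” pattern described above; the function names are hypothetical, and the check assumes standard CBSA codes are all-numeric so they never begin with two letters.

```python
def rural_cbsa_code(state_abbr: str) -> str:
    """Build the rural grouping code for a state, e.g. 'PA' -> 'PA999'."""
    abbr = state_abbr.strip().upper()
    if len(abbr) != 2 or not abbr.isalpha():
        raise ValueError(f"expected a 2-letter postal abbreviation, got {state_abbr!r}")
    return f"{abbr}999"


def is_rural_code(cbsa_code: str) -> bool:
    """True if the code follows the 'XX999' rural grouping pattern."""
    return (
        len(cbsa_code) == 5
        and cbsa_code[:2].isalpha()
        and cbsa_code[2:] == "999"
    )
```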
Subscribers can choose between data with or without outliers corrected. To correct data outliers, RAND applies an algorithm that identifies numeric values that fall far outside the normal range of variation and replaces them with interpolated values. In general, the data are allowed a very wide range of variation before being corrected; the allowed range is adjusted based on the degree of observed variation within a given hospital over time (hospitals that typically exhibit wider-than-normal variation are given more latitude) and the typical degree of variation for a given variable.
Registered users can only access a subset of variables from the most recent cost report for each hospital with outliers corrected.
Yes. A documentation zip package is available for download, and contains three Excel workbooks, two .zip files, and a Readme:
RAND Hospital Data: Web-Based Tool. Santa Monica, CA: RAND Corporation, 2018. https://www.rand.org/pubs/tools/TL303.html.
As a subscriber, if there is a field that you need for your analysis that is not included in RAND Hospital Data, please contact us and let us know the worksheet, column, and line(s) from Form 2552-10 that interest you. We will accommodate as many of these requests as possible in future releases of the RAND Hospital Data.
Unless a user is trying to replicate a specific analysis that relies on data from a particular Release Date, we recommend using the most recent release. We make this recommendation for several reasons:
It takes approximately 18 months after the end of the year of interest for complete data to appear in the RAND Hospital Data. This is due to three main factors:
We include two variables to provide details on the completeness of cost report data for individual hospitals in a given time period:
We calculate two additional metrics to help users estimate the relative completeness of cost report data within the time period of interest across all hospitals:
To estimate completeness, we take the ratio of the number of hospital-years to the number of hospitals reporting any data. For example, if 4,000 hospitals have submitted cost reports that cover any part of the time period of interest, and each of those cost reports covers six months of the year of interest, we calculate that to be 2,000 hospital-years. Taking the ratio of hospital-years to hospitals yields a value of 0.5, indicating that the data are half complete. A year of data is considered complete when the hospital-years metric approaches or equals the number of hospitals metric.
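The arithmetic in this example can be written out as a short function. This is a minimal sketch: the function name and the input (one fraction of the year covered per hospital) are assumptions made for illustration, not variables from the RAND Hospital Data.

```python
def completeness_ratio(fractions_covered):
    """Ratio of hospital-years to hospitals reporting any data.

    fractions_covered: one value per hospital, giving the share of the
    period of interest (0.0 to 1.0) covered by its submitted cost reports.
    """
    reporting = [f for f in fractions_covered if f > 0]
    if not reporting:
        return 0.0
    hospital_years = sum(reporting)  # e.g., 4,000 hospitals x 0.5 = 2,000
    return hospital_years / len(reporting)


# The example above: 4,000 hospitals, each covering six months of the year.
ratio = completeness_ratio([0.5] * 4000)
```

Hospitals with no coverage are excluded from the denominator, matching the definition of "hospitals reporting any data."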
The following sources provide additional information and examples of how CMS hospital cost reports have been used:
CMS and NBER also provide CMS hospital cost report data that has been pre-processed to make it easier to use. However, there are a number of differences between the RAND Hospital Data and these files:
Feature | RAND Hospital Data | CMS Roll-up Data | NBER Files |
---|---|---|---|
Multiple years of data, one file | |||
Sub-lines summed to line level | |||
Standardized time periods | |||
Calendar year | |||
Federal fiscal year | |||
Hospital fiscal year | |||
Derived variables | |||
Variable documentation written for research audience | |||
Variable names | |||
Concept-based | |||
Location-based | |||
Longitudinally consistent between Form 2552-96 and Form 2552-10 | |||
Available data formats | |||
SAS | |||
Stata | |||
csv | |||
Geographic summary files |
At this time, there are no discounts offered for subscriptions to the RAND Hospital Data for any group, including students, government workers, nonprofit staff, etc. As a nonprofit, we charge minimal fees to support and sustain the service.
The RAND Hospital Data is not offered with a tiered pricing structure at this time; users cannot purchase a subset of variables at a lower price. Users have two options:
We do not currently offer a site, university, or multi-user license; subscriptions are on a per-user basis. We ask subscribers to use their best judgment on the number of subscriptions to purchase if the data are to be used across an institution or for different projects.
If a subscription is purchased through PayPal, it will automatically renew at the end of the year, unless the user cancels prior to the next charge date. If a user provides payment through a Purchase Order, the subscription will lapse after one year, and a new Purchase Order must be issued to access the data for an additional year.
We are happy to answer questions about individual variables in the RAND Hospital Data and how they are constructed. Unfortunately, we do not have sufficient resources to provide support in designing analyses, writing code, or interpreting results.