Frequently Asked Questions | RAND Hospital Data

FAQs

Cost Reports

RAND Hospital Data

Using the RAND Hospital Data

RAND Hospital Data Subscriptions

All facilities certified by the Centers for Medicare & Medicaid Services (CMS) are required to submit an annual financial report, called a cost report. These reports are somewhat like tax returns: they include basic information about the facility (its name, address, etc.) and the services it provides. The bulk of the cost report is devoted to the facility's finances, including details on revenues, expenses, profits, assets, liabilities, wages, and more.

The following types of CMS-certified facilities are required to submit cost reports:

Hospitals
Skilled Nursing Facilities (SNF)
Home Health Agencies
Renal Dialysis Facilities
Hospice Facilities
Federally Qualified Health Centers (FQHC)
Rural Health Centers (RHC)
Community Mental Health Centers (CMHC)

The RAND Hospital Data currently includes information from hospital cost reports only. (See "What facilities are included in the RAND Hospital Data?")

HCRIS stands for the Healthcare Cost Report Information System, the database in which CMS stores the cost report data submitted by CMS-certified facilities.

Yes. Cost report data are released to the public, and the raw files can be downloaded directly from CMS. CMS releases a separate zip file for each facility type, and for many facility types there is also a separate zip file for each federal fiscal year. Note that many raw files, especially those for hospitals and SNFs, are too large to open in Microsoft Excel or Access.

CMS releases cost report data four times per year: between the middle and end of January, April, July, and October. These releases are timed to occur soon after the close of each quarter of the federal fiscal year. Federal fiscal year quarters end on the last day of December, March, June, and September.

The RAND Hospital Data are updated shortly after each CMS release, on or around the first day of February, May, August, and November. (See "How often are the RAND Hospital Data updated?")

Each type of facility fills out a specific CMS cost report form; "2552" is the official name of the form submitted by hospitals. CMS has made several major updates to the hospital cost report form, two of which occurred in 1996 and 2010. The last two digits of "2552-96" and "2552-10" indicate which version of the form a hospital completed and submitted. Hospitals generally completed Form 2552-96 for cost reporting periods that began between 1996 and 2009, and Form 2552-10 for periods beginning in 2010 or later.

The raw cost report data are stored in a relational database called the Healthcare Cost Report Information System (HCRIS), which includes three tables: the report table, numeric table, and alphanumeric table. Each table is released as a separate comma-separated-values (CSV) file.

Report Table

The report table contains one observation for each cost report with a beginning date in a given federal fiscal year. The report record number (rpt_rec_num) uniquely identifies each cost report and serves as a linking variable to the other two tables. Other fields in the report table include:

Medicare ID of the hospital that submitted the report (prvdr_num)
Beginning and end dates of the cost reporting period (fy_bgn_dt and fy_end_dt)
Status of the cost report (rpt_stus_cd)
Identity of the fiscal intermediary that processed the cost report
Additional fields

Numeric Table

The numeric table contains one observation for each numeric field reported in each cost report. It includes:

Report record number (rpt_rec_num)
Worksheet, line, and column
Numeric value reported by the hospital

Alphanumeric Table

The alphanumeric table contains one observation for each alphanumeric (text) field reported in each cost report. It includes:

Report record number (rpt_rec_num)
Worksheet, line, and column
The alphanumeric value
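To make the linking role of rpt_rec_num concrete, here is a minimal Python sketch that looks up one worksheet/line/column cell for one hospital by joining the report table to the numeric table. It is illustrative only: the tiny CSV excerpts are made up, the column names wksht_cd, line_num, clmn_num, and itm_val_num (and the code formats shown for worksheets, lines, and columns) are assumptions about the raw files, the helper lookup_numeric is hypothetical, and each excerpt is assumed to have a header row.

```python
import csv
import io

# Made-up excerpts standing in for the raw report and numeric CSV files.
# Column names beyond rpt_rec_num, prvdr_num, fy_bgn_dt, fy_end_dt, and
# rpt_stus_cd (e.g., wksht_cd, line_num, clmn_num, itm_val_num) are assumptions.
RPT_CSV = """rpt_rec_num,prvdr_num,fy_bgn_dt,fy_end_dt,rpt_stus_cd
1001,010001,10/01/2018,09/30/2019,1
1002,010005,01/01/2019,12/31/2019,1
"""

NMRC_CSV = """rpt_rec_num,wksht_cd,line_num,clmn_num,itm_val_num
1001,A000000,00100,00100,250000
1002,A000000,00100,00100,180000
1002,A000000,00200,00100,42000
"""


def load_rows(text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup_numeric(report_rows, numeric_rows, prvdr_num, wksht, line, col):
    """Return {rpt_rec_num: value} for one worksheet/line/column cell,
    across every cost report filed by one hospital (hypothetical helper)."""
    # Report table: find the report record numbers for this Medicare ID.
    rec_nums = {r["rpt_rec_num"] for r in report_rows if r["prvdr_num"] == prvdr_num}
    # Numeric table: keep matching cells, linked back through rpt_rec_num.
    return {
        n["rpt_rec_num"]: float(n["itm_val_num"])
        for n in numeric_rows
        if n["rpt_rec_num"] in rec_nums
        and (n["wksht_cd"], n["line_num"], n["clmn_num"]) == (wksht, line, col)
    }


reports = load_rows(RPT_CSV)
numeric = load_rows(NMRC_CSV)
print(lookup_numeric(reports, numeric, "010005", "A000000", "00100", "00100"))
# {'1002': 180000.0}
```

The same join works for the alphanumeric table. For the full raw files, which can be very large, you would typically stream rows from disk with csv.DictReader over an open file rather than holding everything in memory.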

The most detailed documentation available for raw CMS cost report data is the full set of instructions that CMS provides to facilities on how to complete cost report forms. These instructions are part of the publicly available CMS "Provider Reimbursement Manual."

Provider Reimbursement Manual

The Manual consists of two parts:

Publication 15-1: Contains 31 chapters on accounting topics spanning multiple facility and provider types.
Publication 15-2: Contains 44 chapters, each of which relates to a specific type of provider and set of cost report forms.

Key Chapters for Hospital Cost Report Forms

Publication 15-2, Chapter 36: "Hospital and Healthcare Complex, Form CMS-2552-96"
Publication 15-2, Chapter 40: "Hospital and Hospital Health Care Complex Cost Report, Form CMS-2552-10"

Chapters 36 and 40 include detailed instructions for hospitals and a full set of the worksheets (in PDF format) that hospitals fill out to complete their cost reports. For further details, see "What do '2552-10' and '2552-96' mean?"

Note: The intended audience for the Provider Reimbursement Manual is finance professionals working in the healthcare industry. The information provided therein explains how to complete cost report forms and is not intended for use by the average individual interested in understanding cost report data.

The RAND Hospital Data is a streamlined collection of variables that make CMS hospital cost report data more accessible. We download raw CMS cost report data submitted by hospitals and improve upon it, enabling users to solve logistical problems and answer complex questions more easily.

The RAND Hospital Data are structured as panel datasets; that is, each observation (row) in the data corresponds to one year of data for one unit at a given level of observation (e.g., a hospital, county, or state; see “What does level of observation mean?” for more information).

Each dataset has observations for all available years of data between 1996 and the latest year of released hospital cost report data. For example, a hospital-level file downloaded from the May 1, 2023, release will contain one record for each hospital in each year the hospital submitted a cost report between 1996 and 2021 (See “How long does it take for the RAND Hospital Data to publish a complete year of CMS Hospital Cost Report data?” for further explanation).

No. The RAND Hospital Data contain a useful subset of variables, comprising approximately 1,500 fields selected directly from the hospital cost report, plus additional variables we derive for users. Counting all columns, lines, and sub-lines, Form 2552-10 (the full CMS hospital cost report) contains around 38,000 separate alphanumeric fields and 327,000 separate numeric fields.

See "What if there is a CMS cost report field I need that is not in the RAND Hospital Data?" if you are interested in a variable that is not included in the RAND Hospital Data.

The RAND Hospital Data files contain information on all hospitals in the United States and Puerto Rico that are required to submit CMS cost reports. These include short term, acute care hospitals; critical access hospitals; psychiatric hospitals; children's hospitals; rehabilitation facilities; and long-term care facilities.

The RAND Hospital Data are updated four times a year, on or around the first day of February, May, August, and November.

The RAND Hospital Data Team makes the current and all historical releases of the RAND Hospital Data available for subscriber download. The Release Date identifies a version specific to the date on which RAND published the dataset. For example, “2025_02_01” indicates the version of the RAND Hospital Data published on February 1, 2025.
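Because Release Dates follow a YYYY_MM_DD pattern, they can be parsed and compared directly. A small sketch, using hypothetical helper names parse_release and latest_release:

```python
from datetime import datetime


def parse_release(tag):
    """Convert a Release Date tag such as "2025_02_01" into a date."""
    return datetime.strptime(tag, "%Y_%m_%d").date()


def latest_release(tags):
    """Pick the most recent Release Date from a list of tags."""
    return max(tags, key=parse_release)


print(latest_release(["2024_11_01", "2025_02_01", "2024_08_01"]))
# 2025_02_01
```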

The Level of Observation is the characteristic or set of characteristics that allows the user to uniquely identify one row in a dataset. For example, in a dataset with hospital and year as the level of observation, each record represents one hospital in one year. In a dataset with a geographical area and year as the level of observation (e.g., county, CBSA, state, or national), each record represents the summary of all hospitals contained in one geographic area in one year.
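One quick way to confirm a dataset's level of observation is to check that the identifying columns never repeat. The sketch below assumes rows are dictionaries; the column names prvdr_num and year are hypothetical placeholders for whatever identifiers a given file uses.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return key combinations that occur more than once. An empty list means
    key_fields uniquely identify each row, i.e., they form the level of observation."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"prvdr_num": "010001", "year": 2019},
    {"prvdr_num": "010001", "year": 2020},
    {"prvdr_num": "010005", "year": 2019},
]
print(duplicate_keys(rows, ["prvdr_num", "year"]))  # [] -> hospital-year level
print(duplicate_keys(rows, ["prvdr_num"]))          # [('010001',)] -> not hospital level
```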

CMS requires all facilities to submit an annual cost report, but it allows each facility to choose its own cost reporting period. In most cases, a facility reports on its fiscal year, which typically runs from October 1 through September 30 (the federal fiscal year), January 1 through December 31 (the calendar year), or July 1 through June 30. Facilities can and do use other cost reporting periods. Some hospitals also change their cost reporting period partway through a year, resulting in a cost reporting period shorter than a year.

There are many ways to address the lack of uniformity in hospital cost reporting periods. To make comparisons across hospitals, it is useful to process the cost report data so that each record covers a standardized time period.

Subscribers to the RAND Hospital Data can choose datasets where cost report information has been standardized to one of three time periods, or they can choose to download unweighted data.

The three standardized periods:

Calendar year: Each observation is standardized to January 1 - December 31.
Federal fiscal year: Each observation is standardized to October 1 - September 30. For example, federal fiscal year 2019 spans October 1, 2018, through September 30, 2019.
Hospital fiscal year: Each observation is the sum of all information included in cost reports with a start date during a given federal fiscal year. For example, if a hospital submits cost report 1 covering October 1, 2018, through December 31, 2018, and then submits cost report 2 covering January 1, 2019, through December 31, 2019, information from both reports will be summed into a single observation because both have a beginning date that falls within federal fiscal year 2019.

In the hospital cost report time period datasets, the data are unweighted, and each record corresponds to one hospital cost reporting period. So, the start and end dates will differ by hospital and will not necessarily be one year long.
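The FAQ does not spell out the exact weighting RAND uses for the calendar-year and federal-fiscal-year datasets, so the following is only a sketch of one common approach, assuming each report's values are prorated by the number of its days that fall in each calendar year (the federal fiscal year case would use October-September windows instead). It also shows the start-date rule behind the hospital fiscal year definition. All function names are hypothetical, and the dollar values in the usage lines are made up.

```python
from datetime import date


def federal_fiscal_year(d):
    """Federal fiscal year N runs from October 1 of year N-1 to September 30 of year N."""
    return d.year + 1 if d.month >= 10 else d.year


def prorate_to_calendar_years(start, end, value):
    """Split one report's value across calendar years in proportion to the
    number of days (inclusive) of the reporting period that fall in each year."""
    total_days = (end - start).days + 1
    shares = {}
    for year in range(start.year, end.year + 1):
        lo = max(start, date(year, 1, 1))
        hi = min(end, date(year, 12, 31))
        shares[year] = value * ((hi - lo).days + 1) / total_days
    return shares


def sum_by_hospital_fiscal_year(reports):
    """Sum (start, end, value) reports whose start dates fall in the same
    federal fiscal year, following the hospital fiscal year rule."""
    totals = {}
    for start, _end, value in reports:
        ffy = federal_fiscal_year(start)
        totals[ffy] = totals.get(ffy, 0) + value
    return totals


# A July-June report split across two calendar years (365 days total).
print(prorate_to_calendar_years(date(2018, 7, 1), date(2019, 6, 30), 365000))
# {2018: 184000.0, 2019: 181000.0}

# The two-report example from the FAQ, with made-up values: both start in FFY 2019.
print(sum_by_hospital_fiscal_year([
    (date(2018, 10, 1), date(2018, 12, 31), 25),
    (date(2019, 1, 1), date(2019, 12, 31), 100),
]))
# {2019: 125}
```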

Core-Based Statistical Areas (CBSAs) are delineated by the U.S. Office of Management and Budget (OMB) using data from the U.S. Census Bureau. Each CBSA contains at least one core (an urbanized area or urban cluster) with a population of at least 10,000 people. The county or counties containing the core are combined with any adjacent counties that have a high degree of social and economic integration with the core, as measured through commuting ties with the core or other counties associated with the core.

CBSAs are each classified as either a metropolitan area (with an urbanized area of at least 50,000 people) or a micropolitan area (with an urban cluster of at least 10,000 but fewer than 50,000 people). When processing and creating the RAND Hospital Data, all rural counties within a state (i.e., counties not included in a metropolitan or micropolitan area) are grouped and coded as “XX999”, where “XX” is the state's two-letter postal abbreviation, such as “PA999” for Pennsylvania.
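The metropolitan/micropolitan thresholds and the rural coding rule can be expressed in a few lines. This sketch uses hypothetical function names, the population cutoffs simply restate the definitions above, and "12345" stands in for an arbitrary CBSA code.

```python
def cbsa_type(core_population):
    """Classify a CBSA by the population of its core urban area."""
    if core_population >= 50_000:
        return "metropolitan"
    if core_population >= 10_000:
        return "micropolitan"
    return None  # too small to anchor a CBSA


def area_code(state_abbr, cbsa_code=None):
    """Return the county's CBSA code, or the statewide rural code "XX999"
    for counties outside any metropolitan or micropolitan area."""
    if cbsa_code:
        return cbsa_code
    return f"{state_abbr.upper()}999"


print(area_code("PA"), area_code("PA", "12345"))  # PA999 12345
```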

Subscribers can choose between data with or without outlier correction. To correct outliers, RAND applies an algorithm that identifies numeric values that fall far outside the normal range of variation and replaces them with interpolated values. In general, the data are allowed a very wide range of variation before being corrected. The allowed range is adjusted based on the degree of variation observed within a given hospital over time (hospitals that typically exhibit wider-than-normal variation are given more latitude) and on the typical degree of variation for a given variable.
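The FAQ describes RAND's algorithm only qualitatively, so the following is not that algorithm. It is a generic sketch of the same idea, assuming a robust rule: a value in one hospital's time series is flagged when its distance from the series median exceeds k times the median absolute deviation (MAD), so hospitals with more variation automatically get more latitude, and flagged values are replaced by linear interpolation between the nearest unflagged years. The function name and the default k = 10 are hypothetical; a per-variable k would mirror the "typical degree of variation for a given variable".

```python
from statistics import median


def correct_outliers(values, k=10.0):
    """Flag values far from the series median (relative to the MAD) and
    replace them by linear interpolation between the nearest unflagged
    neighbors. A flagged endpoint copies its single unflagged neighbor."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to judge against; leave untouched
    flagged = [abs(v - med) > k * mad for v in values]
    good = [i for i, f in enumerate(flagged) if not f]
    out = list(values)
    for i, f in enumerate(flagged):
        if not f:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is not None and right is not None:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
        elif left is not None:
            out[i] = values[left]
        elif right is not None:
            out[i] = values[right]
    return out


# One implausible year in an otherwise stable series is replaced.
print(correct_outliers([100, 102, 98, 101, 5000, 103, 99]))
```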

Registered users can access only a subset of variables, drawn from the most recent cost report for each hospital, with outliers corrected.

Yes. A documentation zip package is available for download; it contains three Excel workbooks, two .zip files, and a Readme:

HCRIS_2552-10_Worksheets_A_B_C_D_E_G_L_S_with_varnames_[YYYY_MM_DD].xlsx: CMS hospital cost report worksheets from Form 2552-10, with yellow/green highlights for fields included in processed datasets.
HCRIS_2552-96_Worksheets__A_B_C_D_E_G_L_S_with_varnames_[YYYY_MM_DD].xlsx: CMS hospital cost report worksheets from Form 2552-96, with yellow/green highlights for fields included in processed datasets.
P152_36.zip: Chapter 36 of the CMS Provider Reimbursement Manual, which gives hospitals instructions on filling out worksheets in Form 2552-96.
P152_40.zip: Chapter 40 of the CMS Provider Reimbursement Manual, which gives hospitals instructions on filling out worksheets in Form 2552-10.
RAND_hospital_data_contents_[YYYY_MM_DD].xlsx: A workbook containing a list of fields included in the datasets available for download for subscribers and registered users.
Readme.pdf: Contains tips on how to use the included documentation files.

RAND Hospital Data: Web-Based Tool. Santa Monica, CA: RAND Corporation, 2018. https://www.rand.org/pubs/tools/TL303.html.

As a subscriber, if there is a field that you need for your analysis that is not included in RAND Hospital Data, please contact us and let us know the worksheet, column, and line(s) from Form 2552-10 that interest you. We will accommodate as many of these requests as possible in future releases of the RAND Hospital Data.

Unless a user is trying to replicate a specific analysis that relies on data from a particular Release Date, we recommend using the most recent release. We make this recommendation for several reasons:

The most recent release will reflect the most recent raw data and will likely contain more complete data for later years.
At times, CMS requires facilities to revise and resubmit cost reports, and some reports are subject to audit. Either can lead to changes in reported values.
The RAND Hospital Data Team strives for continuous quality improvement. In many releases, we add new raw and derived variables, and if needed, make corrections and improvements to our methods.

It takes approximately 18 months after the end of the year of interest for complete data to appear in the RAND Hospital Data. This is due to three main factors:

Cost reports are submitted on a rolling basis (i.e., hospitals can choose the reporting period).
Hospitals are given up to five months after the end of their cost reporting period to submit their data to CMS. Once a report is submitted, the Medicare Administrative Contractor checks it and applies edits, which takes additional time.
Data are refreshed four times a year.
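The timing factors above can be sketched as simple date arithmetic. This is a rough illustration, not RAND's schedule logic: the helper names are hypothetical, the five-month deadline and the February/May/August/November release months come from this FAQ, and PROCESSING_LAG_DAYS is an assumed placeholder for MAC review and CMS release timing, which the FAQ does not quantify.

```python
import calendar
from datetime import date, timedelta

RELEASE_MONTHS = (2, 5, 8, 11)  # RAND releases on or around the 1st of these months
PROCESSING_LAG_DAYS = 60  # hypothetical allowance for MAC review and CMS release timing


def add_months(d, months):
    """Shift a date by whole months, clamping to the end of the target month."""
    m = d.month - 1 + months
    year, month = d.year + m // 12, m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def submission_deadline(period_end):
    """Hospitals have up to five months after the period ends to submit."""
    return add_months(period_end, 5)


def next_rand_release(d):
    """First nominal RAND release date (Feb, May, Aug, or Nov 1) on or after d."""
    for year in (d.year, d.year + 1):
        for month in RELEASE_MONTHS:
            if date(year, month, 1) >= d:
                return date(year, month, 1)


deadline = submission_deadline(date(2021, 12, 31))
print(deadline, next_rand_release(deadline + timedelta(days=PROCESSING_LAG_DAYS)))
# 2022-05-31 2022-08-01
```

The rolling-basis factor drives most of the delay: a hypothetical one-year report beginning December 31, 2021, still covers one day of 2021 but ends December 30, 2022, so `submission_deadline(date(2022, 12, 30))` is May 30, 2023, about 17 months after the end of 2021, before any processing or refresh lag.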

We include two variables to provide details on the completeness of cost report data for individual hospitals in a given time period:

share_of_time_period_with_report: Ranges from 0 to 1, with 0 indicating no days of a year are covered, and 1 indicating that all days of a year are covered.
days_in_cost_reporting_period: Ranges from 0 to 365, with 0 indicating no days of a year are covered, and 365 indicating all days of the year are covered (we do not adjust this variable for leap years).
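As a sketch of how these two variables relate for one hospital and one calendar year, the function below counts the distinct days covered by the hospital's reports. The function name is hypothetical, and two details are assumptions rather than documented RAND behavior: the day count is capped at 365 (consistent with the no-leap-year-adjustment note), and the share divides by the actual number of days in the year.

```python
from datetime import date, timedelta


def coverage_for_year(reports, year):
    """Return (days_in_cost_reporting_period, share_of_time_period_with_report)
    for one hospital and one calendar year, given its (start, end) report dates."""
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    covered = set()  # a set, so overlapping reports are not double counted
    for start, end in reports:
        d, hi = max(start, year_start), min(end, year_end)
        while d <= hi:
            covered.add(d)
            d += timedelta(days=1)
    days_in_year = (year_end - year_start).days + 1
    days = min(len(covered), 365)  # assumption: capped at 365, no leap-year adjustment
    share = len(covered) / days_in_year
    return days, share


# A report running October 2018 through June 2019 covers 181 days of 2019.
print(coverage_for_year([(date(2018, 10, 1), date(2019, 6, 30))], 2019))
```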

We calculate two additional metrics to help users estimate the relative completeness of cost report data within the time period of interest across all hospitals:

Number of hospitals: The number of hospitals that have submitted a cost report that covers any portion of the time period of interest.
Number of hospital-years: The combined length of time, in years, covered by all cost reports across all facilities for a time period of interest.

To estimate completeness, we take the ratio of the number of hospital-years to the number of hospitals reporting any data. For example, if 4,000 hospitals have submitted cost reports that cover any part of the time period of interest, and each of those cost reports covers six months of the year of interest, the total is 2,000 hospital-years. The ratio of hospital-years to hospitals is then 0.5, indicating that the data are half complete. A year of data is considered complete when the number of hospital-years approaches or equals the number of hospitals.
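The completeness calculation is a single ratio. A minimal sketch for a one-year time period of interest, assuming each hospital's coverage is already expressed as the fraction of that year its reports cover (the function name is hypothetical):

```python
def completeness(coverage_shares):
    """coverage_shares: one entry per hospital with any report in a one-year
    period, giving the fraction of that year its reports cover (0 < share <= 1).
    Returns (number of hospitals, number of hospital-years, completeness ratio)."""
    if not coverage_shares:
        raise ValueError("no hospitals reported any data for this period")
    n_hospitals = len(coverage_shares)
    hospital_years = sum(coverage_shares)
    return n_hospitals, hospital_years, hospital_years / n_hospitals


# The FAQ's example: 4,000 hospitals, each with six months of coverage.
print(completeness([0.5] * 4000))  # (4000, 2000.0, 0.5)
```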

The following sources provide additional information and examples of how CMS hospital cost reports have been used:

Blue Cross Blue Shield Association (BCBSA): Released a series of commentaries and case studies on CMS cost reporting forms. Although these reports are not publicly available, they can be extremely helpful if they can be obtained from BCBSA or another source.
National Bureau of Economic Research (NBER): Provides HCRIS data and documentation, including the relational databases (in original text format, as well as SAS and Stata formats), and processed flat files containing select variables.
Sheps Center at the University of North Carolina: Published several studies that use Medicare hospital cost reports to measure the financial performance of rural hospitals.
CMS FAQs: Answers a set of Frequently Asked Questions relating to cost reports.
Nancy Kane and Stephen Magnus (2001): Discussed limitations of the CMS hospital cost reports, some of which were addressed in the update from 2552-96 to 2552-10.
Medicare Payment Advisory Commission (MedPAC): In June 2004, compared several sources of data on hospital financial performance, including CMS cost reports and audited financial statements.
University of North Carolina (2012): Compared selected items from CMS cost reports with "gold standard" audited financial statements.

CMS and NBER also provide CMS hospital cost report data that have been preprocessed to make them easier to use. However, there are several differences between the RAND Hospital Data and these files:

Feature	RAND Hospital Data	CMS Roll-up Data	NBER Files
Multiple years of data, one file
Sub-lines summed to line level
Standardized time periods
  Calendar year
  Federal fiscal year
  Hospital fiscal year
Derived variables
Variable documentation written for a research audience
Variable names
  Concept-based
  Location-based
  Longitudinally consistent between Form 2552-96 and Form 2552-10
Available data formats
  SAS
  Stata
  CSV
Geographic summary files

At this time, there are no discounts offered for subscriptions to the RAND Hospital Data for any group, including students, government workers, nonprofit staff, etc. As a nonprofit, we charge minimal fees to support and sustain the service.

The RAND Hospital Data is not offered with a tiered pricing structure at this time; users cannot purchase a subset of variables at a lower price. Users have two options:

Register for free to gain access to a limited dataset.
Pay $499 for an annual subscription to obtain full access to all releases of the RAND Hospital Data (see our Pricing page).

We do not currently offer a site, university, or multi-user license; subscriptions are on a per-user basis. We ask subscribers to use their best judgment on the number of subscriptions to purchase if the data are to be used across an institution or for different projects.

If a subscription is purchased through PayPal, it will automatically renew at the end of the one-year term unless the user cancels before the next charge date. If a user pays with a Purchase Order, the subscription will lapse after one year, and a new Purchase Order must be issued to access the data for an additional year.

We are happy to answer questions about individual variables in the RAND Hospital Data and how they are constructed. Unfortunately, we do not have sufficient resources to provide support in designing analyses, writing code, or interpreting results.
