Brief 3: Development of HCBS Outcome Measures
Limited high-quality measurement in the Home and Community-Based Services (HCBS) field has impeded our understanding of the extent to which HCBS helps recipients achieve desired life outcomes. It has also limited applied research on the facilitators, strategies, and interventions that constitute high-quality services and supports. The development and validation of psychometrically sound HCBS outcome and quality measures has been identified as a priority by the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), the Administration for Community Living (ACL), and the Centers for Medicare & Medicaid Services (CMS). At present, however, federal and state authorities, service providers, and advocacy/self-advocacy organizations must exercise extreme caution when interpreting available outcome data, because most available measures have little empirical evidence to support their use. The aim of this brief is to present a systematic process for developing and validating HCBS outcome measures, with the goal of increasing the likelihood that they will demonstrate sound psychometric characteristics, be endorsed by the National Quality Forum, and prove useful for quality improvement.
HCBS constitute a part of Long-Term Services and Supports (LTSS) funded by Medicaid and are designed to provide person-centered services within an eligible individual’s home or community (CMS, 2019). Such services may be used for assistance with daily living tasks, to support employment in the community, or to facilitate the inclusion of people with disabilities in a variety of other contexts. People who rely on HCBS include individuals with intellectual or developmental disabilities (IDD), physical disabilities (PD), age-related disabilities (ARD), traumatic brain injury (TBI), and psychiatric disabilities (PsyD).
Over 2.5 million individuals receive HCBS through optional Section 1915(c) or Section 1115 waivers, with nearly 1.2 million receiving optional personal care state plan services (KFF, 2020). Accurate and reliable measurement of the effectiveness of HCBS is critical for a variety of reasons beyond the sheer number of people served. First and foremost, HCBS recipients depend on the supports they receive in their daily lives. When delivered as intended, these services have the potential to enhance inclusion within the community as well as self-determination and independence. It is imperative for people with disabilities and their families to know the extent to which these life outcomes are experienced. In addition, a tremendous amount of funding is required to provide HCBS, with joint federal and state Medicaid HCBS spending totaling $92 billion in FY 2018. Federal agencies, states, and taxpayers have the right to know the extent to which the services they fund have their intended impact.
A Framework for HCBS Measure Development
Measure development is a systematic process that requires: (1) the development of a measure concept or construct based on a thorough understanding of existing conceptual frameworks and research; (2) generation of guiding questions that one desires to be able to answer with the measure that will be produced; (3) creation of a pool of potential items that are sufficient to adequately saturate the construct of interest; (4) development of one or more sets of response options for items; (5) evaluation of items for a variety of aspects of usability, feasibility, and accessibility; and (6) testing of the measure concepts to ensure that their psychometric properties meet or exceed the guidelines laid out by the National Quality Forum (NQF), CMS, and the American Psychological Association (APA). The process of measure development is divided into four parts as depicted in Figure 1.
- Exploration and understanding of the construct to be measured;
- Item Development;
- Qualitative feedback; and
- Quantitative testing
Before presenting each phase of the measure development process in detail, we review the guidelines for measure development offered by the NQF, CMS, and APA, as well as an important stakeholder input process for the selection of measures.
Theoretical frameworks and guidelines for measure development
Organizations concerned with measurement quality such as the NQF, CMS, and APA have provided extensive guidance on what constitutes good measure development. CMS has developed a comprehensive measure development lifecycle that details a process to follow in developing, testing, and maintaining measures. Moreover, the NQF provides guidelines for evaluating the scientific acceptability of measures (e.g., importance, feasibility, usability, and psychometric properties) submitted for endorsement. Specific guidelines for measurement with people with disabilities are provided by the APA (APA, 2020) and the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014). Despite these extensive guidelines and standards, the vast majority of instruments in HCBS have not undergone this rigorous development process. As a result, users cannot be confident that these instruments will yield reliable and valid data. This brief describes a process for developing quality measures to assess the impact of HCBS on outcomes experienced by HCBS recipients.
Process for selection of measures to be developed (stakeholder input and gap analysis)
The Research and Training Center on HCBS Outcome Measurement (RTC/OM), funded by the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), was tasked with developing person-centered HCBS quality and outcome measures based on the National Quality Forum’s (NQF) HCBS Outcome Measurement Framework (NQF, 2016) (Figure 2). This comprehensive framework was developed by a group of HCBS subject matter experts and includes 11 domains, each with 2–7 subdomains, representing outcome areas deemed essential to measure in HCBS. However, the framework was not validated with stakeholders. Therefore, the RTC/OM’s first step was to meet with stakeholders to validate the new framework and determine which domains and subdomains were considered most important to measure.
To validate the NQF framework, RTC/OM staff persons led fifty-eight Participatory Planning and Decision Making (PPDM) groups with stakeholders across the country from four primary backgrounds: people with disabilities, family members of people with disabilities, organizations that provide HCBS, and HCBS policymakers and program administrators. The PPDM format allowed stakeholders to weigh the importance of potential measurement domains and subdomains and move toward consensus as to which were most important to measure.
After using the results of stakeholder input to determine the most important areas to measure, a gap analysis was undertaken, and input from a national advisory group of HCBS leaders and stakeholders was used to finalize the selection of eight outcome areas for measure development. The gap analysis involved a review of over 130 instruments used for HCBS outcome measurement and coding their items against the NQF framework to identify the domains and subdomains in which promising measures were already available and those needing further measure development. An expert panel of members from the RTC/OM national leadership and advisory groups then rated the potential domains and subdomains on feasibility, usability, and importance to guide the selection of the initial concepts for development. Two important criteria for inclusion were: 1) high ratings from stakeholder groups, and 2) person-centeredness (i.e., responded to by the person, based on what is important to him or her). Table 1 includes the measure concepts selected for development and their definitions.
Part 1 Exploration of the construct to measure
After identifying the measures to develop, extensive literature scans were undertaken for each measure concept. These scans had the broad goal of exploring the problem space of each measure concept, including: 1) construct definitions (e.g., How is each measure theoretically defined in the literature?); 2) operational definitions (e.g., What are the concrete domains and indicators used to measure the construct?); and 3) existing instruments that measure the construct (e.g., Are their psychometric characteristics known? Are they person-centered?). The information collected about each measure concept was summarized in a series of blueprints that guided the next step of item development.
Part 2 Item Development
The next step in the measure development process was the drafting of guiding questions. Guiding questions can be conceptualized as arguments that specify the intended inferences and supporting assumptions for each measure; they refer to specific inferences related to the construct measured and, more specifically, to its identified domains and subdomains. Before creating new items, items from existing HCBS instruments were reviewed to identify aspects of the construct in question that were not currently being measured. Based on this review, the RTC/OM team developed items to capture the domains and subdomains identified in the blueprint for each measure concept.
Each measure was conceptualized as having two tiers. Tier 1 consists of a small number of questions that assess global aspects of the construct in question. Tier 2 items are more specific in nature and attempt to provide answers as to the factors underlying the experience of more global outcomes. This two-tiered approach was utilized to 1) test the validity of the constructs, 2) gather insight with respect to how specific items are related to the broad measure concept, and 3) potentially reduce the length of the measures.
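To illustrate the second goal, the relationship between Tier 1 and Tier 2 items can be examined with simple inter-item correlations once response data are available. The Python sketch below uses entirely invented response data and is illustrative only; it is not the RTC/OM's analysis code.

```python
import numpy as np

# Hypothetical responses from 8 interviewees: one global Tier 1 item and
# three specific Tier 2 items, each rated on a 1-5 scale
tier1 = np.array([4, 3, 5, 2, 4, 1, 5, 3])
tier2 = np.array([
    [4, 3, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 3],
    [1, 2, 1],
    [5, 5, 4],
    [3, 2, 3],
])

# Correlate each specific item with the global item to gauge how closely
# each underlying factor tracks the broad outcome
rs = [float(np.corrcoef(tier1, tier2[:, j])[0, 1]) for j in range(tier2.shape[1])]
for j, r in enumerate(rs, start=1):
    print(f"Tier 2 item {j} vs. global Tier 1 item: r = {r:.2f}")
```

Specific items that correlate weakly with the global item are candidates for revision or removal, which is one way a two-tiered design can shorten a measure.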
Table 1. Definitions of Measure Concepts Selected for Development by the RTC/OM
- Choice and Control: the degree to which HCBS recipients exercise choice & control over…
- Abuse and Neglect: the degree to which…
Part 3 Feedback from Stakeholders - Technical Expert Panel
Once draft items were developed for each measure concept, a technical expert panel (TEP) consisting of a variety of stakeholders, including content and measurement experts in each concept area, HCBS program administrators, and people with disabilities and their families, reviewed and rated each item in the six areas summarized in Table 2.
Table 2. Summary of TEP Areas of Input
- 1) Item Relevance: Is the item relevant to the specific measure concept?
- 2) Item Importance: Is the item important to measure within the specific measure concept?
- 3) Item Understandability (Interviewer): Is the item understandable by an interviewer (data collector)?
- 4) Item Understandability (HCBS Recipient): Is the item understandable by the recipient of HCBS (respondent)?
- 5) Does the item accurately measure an aspect of the construct?
- 6) Response Options: Are the response options appropriate for the item?
For all areas except response options, reviewers rated each item on a four-point scale. If a reviewer gave an item a low rating, they then completed a follow-up open-ended item to provide specific feedback. TEP members were also asked for feedback on the appropriateness of the response options used for each item and given an opportunity to suggest alternative options or scales. TEP ratings and feedback were used to revise and, in some cases, remove or replace items that stakeholders indicated did not adequately measure a concept.
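Although the brief does not specify how the four-point ratings were aggregated, one widely used way to summarize such panel ratings is an item-level content validity index (I-CVI): the proportion of experts rating an item 3 or 4 on a four-point scale, with values below the conventional 0.78 cut-off flagged for revision. The Python sketch below is illustrative only; the ratings are invented and this is not necessarily the RTC/OM's aggregation method.

```python
# Hypothetical TEP ratings: rows = panel members, columns = items,
# each rating on a four-point scale (1 = low, 4 = high)
ratings = [
    [4, 3, 2, 4, 1],
    [3, 4, 2, 4, 2],
    [4, 4, 3, 3, 1],
    [4, 3, 1, 4, 2],
]

def item_cvi(ratings, item):
    """Item-level content validity index: share of experts rating the item 3 or 4."""
    scores = [row[item] for row in ratings]
    return sum(1 for s in scores if s >= 3) / len(scores)

# Flag items falling below the conventional 0.78 cut-off for revision or removal
flagged = [i for i in range(len(ratings[0])) if item_cvi(ratings, i) < 0.78]
print(flagged)  # prints [2, 4]
```

Pairing a quantitative index like this with the open-ended feedback items gives developers both a trigger for revision and the reasoning behind it.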
Cognitive testing (CT) is used to fine-tune items using direct input from the target population and is considered an essential step for improving item accessibility (Kramer & Schwartz, 2017). CT is designed to verify that the respondent’s interpretation of an item, and of the terms within it, matches the developer’s intent (Ericsson & Simon, 1980; Willis et al., 1991) and thereby contributes to the validity of the measure (Castillo-Díaz & Padilla, 2013). This form of stakeholder involvement was essential to ensure that items were interpreted similarly by people from all disability groups and recipients of a variety of HCBS, and to increase their accessibility for intended interviewees.
Cognitive testing strategies employed for this phase of measure development were designed to elicit input as to how the target population interpreted each item as a whole, using what is referred to as the think-aloud approach, as well as probing questions about key terms (Beatty & Willis, 2007; Willis, 2005). The protocol was designed to address the core cognitive components of item responding included in the Cognitive Aspects of Survey Methodology (CASM) model: comprehending the item, retrieving the information needed to answer the item, making a judgment, and reporting a response (Tourangeau, 1984; 2018). Figure 3 illustrates the major phases of the RTC/OM cognitive testing protocol.
Figure 3. General RTC/OM Cognitive Testing Protocol
- Pre-Item Testing Stage
- Think Aloud Stage
- Item Probing Stage
This protocol was used with at least five members of each target population (ARD, IDD, TBI, PsyD, PD) for each measure. Testing identified several item wording improvements related to the item’s measurement intent and, particularly, interpretation of terms (e.g., HCBS-related terminology). Qualitative information was also used to create a glossary of terms and examples to aid interviewers in supporting item comprehension.
Part 4 Quantitative Testing
Pilot testing of newly developed measures is imperative to ensure the results they produce are reliable and valid. The purpose of pilot testing is to evaluate the usability and feasibility of developed outcome measures, observe the inter-item correlations (i.e., between global and specific items), and establish initial estimates of measure reliability. Pilot testing using a structured interview format was conducted with 104 HCBS recipients with intellectual and developmental disabilities (IDD), physical disabilities (PD), traumatic brain injury (TBI), psychiatric disabilities (PsyD), and age-related disabilities (ARD). Measure content validity (the extent to which the measure represents the domain measured) and procedural validity (the extent to which participants interpreted and responded to the measures the way they were intended), as well as internal consistency, inter-rater, and test-retest reliability, were evaluated. Internal consistency reliability reflects the extent to which items within an instrument reliably measure aspects of the same characteristic or construct. Inter-rater reliability provides information about how consistently two interviewers “score” interviewee responses to the items of which a measure is composed. Test-retest reliability reflects the stability of the measure over time. Pilot testing provides an opportunity to finalize measures and prepare them for more widescale testing and validation with a large representative sample of participants.
The purpose of undertaking a field study of measure constructs is to evaluate them with respect to their reliability, validity, and sensitivity to change with a representative sample of the target populations. In the case of the measures developed by this Center, five populations of HCBS recipients were targeted, including people with IDD, PD, PsyD, TBI, and ARD. A national field study is essential to evaluate: 1) the measure administration protocol and the effectiveness of interviewer training in maximizing administration fidelity, 2) usability and feasibility of the measures with a large representative sample, 3) the reliability and validity of measures across intended populations, 4) variability of scores on each measure, and 5) the degree to which measures are sensitive to change over time. The goal is to test the overall performance of the outcome measures across people with different disabilities and support needs, living in different locations and from diverse cultural communities.
An examination of the manner in which measures are field-tested is critical in order for potential users to be able to gauge the quality of the measure or measurement program they are considering. Testing should be rigorous, transparent, and provide users with a clear indication of the appropriate uses, strengths, and limitations of the measures under investigation.
An initial critical step in measure testing is the development of a sampling frame and the recruitment of a sample (see Figure 4). In psychometrics, a sampling frame refers to the specific source from which a subset of cases or individuals in the larger population can be sampled. In the case of HCBS outcome measurement, this could include individuals, community residences, providers, managed care organizations, or states (Jessen, 1970; Salant & Dillman, 1994). Inclusionary and exclusionary criteria for participation in testing should be clearly delineated, including disability type and level of support needs, residential type, age, and cultural group membership. The sampling frame will differ depending on the intended use of the measure: if an HCBS measure is meant to differentiate providers, the frame’s focus needs to be provider agencies; if it is meant to inform decisions about the support programs of people with disabilities, the individual level will be the focus. It is vital that potential measure users and developers know how the sample was obtained and the degree to which the sampling frame covers the entire target population. This is especially important for HCBS measures that are intended to be used with diverse populations, across disabilities, and with individuals whose levels of support need range from low to extremely high (McGraw et al., 1992).
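As a concrete illustration of drawing a sample from a frame stratified on criteria like these, the Python sketch below samples equally from hypothetical disability-by-support-need strata. The frame, group labels, and stratum sizes are invented for the example.

```python
import random
from collections import Counter

# Hypothetical sampling frame: (person_id, disability_group, support_level)
groups = ["IDD", "PD", "PsyD", "TBI", "ARD"]
frame = [
    (i, g, s)
    for i, (g, s) in enumerate(
        (g, s) for g in groups for s in ["low", "high"] for _ in range(50)
    )
]

def stratified_sample(frame, per_stratum, seed=0):
    """Draw an equal number of cases from each disability-by-support stratum."""
    rng = random.Random(seed)
    strata = {}
    for rec in frame:
        strata.setdefault((rec[1], rec[2]), []).append(rec)
    sample = []
    for _, members in sorted(strata.items()):
        sample.extend(rng.sample(members, per_stratum))
    return sample

sample = stratified_sample(frame, per_stratum=5)
print(Counter((g, s) for _, g, s in sample))  # five cases from each of the 10 strata
```

Stratifying in this way guards against a frame that nominally covers all five populations but yields a sample dominated by the easiest-to-reach groups.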
The manner in which people are recruited to take part in measure testing is a second important consideration. One would hope that HCBS measures are tested on unbiased samples, but all too often this is not the case. Because outcome measures are tested with people, potential participants may decline to take part, and there may be systematic differences between those who agree and those who do not. The result is that, all too often, samples are not representative. Individuals with more significant disabilities may not be responsive to traditional means of recruitment, and outreach to culturally and linguistically diverse communities is often insufficient. It may also be the case that the people who do agree to take part in testing do not truly represent the population of interest; they may, for example, be either extremely satisfied or dissatisfied with the supports and services they receive. This is especially likely when positive response rates to recruitment are low (Fahimi et al., 2015).
When devising a sampling frame and establishing a sample we encourage developers to include not only HCBS recipients (i.e., people with disabilities eligible for HCBS) but persons without disabilities as well. The inclusion of members of this group is important because, at the current time, we have no standards with respect to the personal outcomes one should expect HCBS recipients to achieve.
A second point that those field-testing measures or considering their adoption need to keep in mind concerns the steps taken to ensure fidelity of administration, both during testing and in further use. Questions need to be answered about the requirements for effectively administering the measures. Some highly structured measures, for example, can be well administered with a minimum of training; those that are less structured require higher interviewer skill levels and more instruction. Well-conceptualized manuals or modules should be readily available that clearly lay out training requirements, administration competencies, and the steps that will be taken following training to maintain integrity in the manner in which measures are administered, scored/completed, and prepared for analysis. Part of this process includes a well-thought-out plan to ensure data privacy and integrity.
Two additional factors should be considered when testing or selecting outcome measures associated with HCBS. The first pertains to decisions regarding the analyses employed to estimate the psychometric characteristics of measures. Regardless of the approach taken, be it classical test theory or item response theory, it should be specified beforehand, be appropriate for the intent of the analysis and the data with which it will be used, and be accompanied by a justification for its selection. In recent years, we have seen far too many instances of questionable data analysis practices.
The second factor that needs consideration relates to the purpose of HCBS outcome measurement. We believe that its function goes beyond compliance and must be grounded in quality improvement and supporting people with disabilities to achieve the outcomes in life that they desire. If that is the case, we need to have measures that are supported by evidence that indicates that they are sensitive to change over time. If the supports a person is provided are changed, a program improved, or a state or federal policy either newly implemented or significantly altered, the HCBS measures one is using should be sensitive enough to detect changes in personal outcomes or the quality of supports that HCBS recipients are receiving. This is especially critical for state programs operating under Olmstead agreements. Unfortunately, few of the most widely used HCBS measures have had their sensitivity to change evaluated. For measures to be sensitive to the outlined changes, HCBS measurement needs to be conceptualized as a longitudinal endeavor and follow people for sufficient lengths of time to determine if, in fact, changes that occur in policies and services actually have an impact on the quality of supports received and the outcomes people experience.
Conclusion & Recommendations
To evaluate the extent to which HCBS effectively supports the outcomes of people with different types of disabilities, the measures used in this process need to conform to quality measure development standards, as specified by CMS, the NQF, and the APA. By following this development process, we can be confident that the data collected using such measures will reliably and accurately represent the perspectives of HCBS recipients.
Recommendation 1: Measure development needs to be grounded in the experiences of people who use HCBS, as well as other stakeholders. These include family members, provider agency staff, and government officials.
Recommendation 2: Measures that are developed need to be person-centered, that is, based on the desired life outcomes of the people with disabilities with whom they are used.
Recommendation 3: Well-operationalized constructs based on sound theoretical frameworks should serve as the foundation of new measures. These should be determined on the basis of a comprehensive review of the existing literature and measures with high-quality psychometric properties.
Recommendation 4: New measures need to be vetted by a panel of experts as well as assessed with respect to their feasibility and usability with people who use HCBS to ensure everyone understands measure items in the way intended.
Recommendation 5: Prior to scale-up, measures must be rigorously field-tested to ensure they are valid, reliable, and sensitive to change.
References
AERA, APA, & NCME. (2014). Standards for Educational and Psychological Testing. American Educational Research Association.
American Psychological Association (APA). (2020). Guidelines for assessment of and intervention with persons with disabilities. Retrieved from https://www.apa.org/pi/disability/resources/assessment-disabilities
Beatty, P. C., & Willis, G. B. (2007). Research synthesis: The practice of cognitive interviewing. Public Opinion Quarterly, 71(2), 287–311. https://doi.org/10.2307/4500375
Castillo-Diaz, M., & Padilla, J.-L. (2013). How Cognitive Interviewing can Provide Validity Evidence of the Response Processes to Scale Items. Social Indicators Research, 114(3), 963–975. https://doi.org/10.1007/s11205-012-0184-8
Centers for Medicare and Medicaid Services (CMS). (2019). Blueprint for the CMS Measures Management System. Retrieved from https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/MMS/Downloads/Blueprint.pdf
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251. https://doi.org/10.1037/0033-295X.87.3.215
Fahimi, M., Barlas, F. M., Thomas, R. K., & Buttermore, N. (2015). Scientific surveys based on incomplete sampling frames and high rates of nonresponse. Survey Practice, 8(6).
Jessen, R. J. (1970). Probability sampling with marginal constraints. Journal of the American Statistical Association, 65(330), 776–796.
Kaiser Family Foundation (KFF). (2020). Medicaid home and community-based services enrollment and spending (February 2020 Issue Brief). Retrieved from http://files.kff.org/attachment/Issue-Brief-Medicaid-Home-and-Community-Based-Services-Enrollment-and-Spending
Kramer, J. M., & Schwartz, A. (2017). Reducing barriers to patient-reported outcome measures for people with cognitive impairments. Archives of Physical Medicine and Rehabilitation, 98, 1705–1720. https://doi.org/10.1016/j.apmr.2017.03.011
McGraw, S. A., McKinlay, J. B., Crawford, S. A., Costa, L. A., & Cohen, D. L. (1992). Health survey methods with minority populations: some lessons from recent experience. Ethnicity & Disease, 2(3), 273–287.
National Quality Forum (NQF). (2016). Quality in Home and Community-Based Services to Support Community Living: Addressing Gaps in Performance Measurement. Retrieved from http://www.qualityforum.org/Publications/2016/09/Quality_in_Home_and_Community-Based_Services_to_Support_Community_Living__Addressing_Gaps_in_Performance_Measurement.aspx
National Quality Forum (NQF). (2019). Measure evaluation criteria and guidance for evaluating measures for endorsement. Retrieved from http://www.qualityforum.org/WorkArea/linkit.aspx?LinkIdentifier=id&ItemID=92804
Rehabilitation Research and Training Center on Outcome Measurement. (RTC/OM). (2016). Stakeholder Input: Identifying Critical Domains and Subdomains of HCBS Outcomes. Retrieved from https://rtcom.umn.edu/phases/phase-1-stakeholder-input
Salant, P., & Dillman, D. A. (1994). How to conduct your own survey. John Wiley & Sons.
Tourangeau, R. (1984). Cognitive science and survey methods. In T. Jabine, M. L. Straf, J. M. Tanur, & R. Tourangeau (Eds.), Cognitive Aspects of Survey Design: Building a Bridge Between Disciplines (pp. 73–100). Washington, DC: National Academy Press.
Tourangeau, R. (2018). The survey response process from a cognitive viewpoint. Quality Assurance in Education, 26(2), 169–181. https://doi.org/10.1108/QAE-06-2017-0034
Willis, G. B. (2005). Cognitive interviewing: A tool for improving questionnaire design. Sage Publications.
Willis, G. B., Royston, P., & Bercini, D. (1991). The use of verbal report methods in the development and testing of survey questionnaires. Applied Cognitive Psychology, 5(3), 251–267. https://doi.org/10.1002/acp.2350050307