Benefits of Small Area Measurements: a Spatial Clustering Analysis on Medicare Beneficiaries in the USA

Small area estimates of where services for potential Medicare beneficiaries may be needed could provide unique research opportunities for improving the healthcare quality of the ageing U.S. population. The project described in this paper validates this argument by contrasting the spatial clustering results from an analysis that uses large geographical units with proxy measures to the results from an analysis that uses small area geographic units with direct measures. Large-area proxy measures come from county-level U.S. Census Bureau 2010 cross-sectional data on the number of people aged 65 and over. Medicare beneficiary estimates for 2007 by Primary Care Service Area (PCSA) make up the small-area direct-measure analysis. Findings show that the latter offers a more geographically defined appraisal of where healthcare quality efforts should focus to aid potential Medicare beneficiary populations. Because the healthcare quality of an aging population will only increase in importance as its numbers grow in the US, further research is needed.


Introduction
Estimating the geographical distribution of health care needs in the aged is a topic of great importance due to their growing population. The United States (US) Census Bureau projects that, by 2030, nearly one out of every five US residents will be aged 65 and older (Vincent & Velkoff, 2010). Individuals aged 90 and older are expected to make up about 10% of the older US population by 2050 (He & Muenchrath, 2011). Influencing policy is frequently the professional raison d'être (reason for existence) in public health research (Luft, 2012). A call for using spatial analysis to understand health outcomes was made decades ago (Mayer, 1983). This study argues that adding a "geo-spatial" (i.e., geographical space: for use of term see Siordia & Saenz, 2012) dimension to how healthcare needs are geographically localized and ultimately understood has the potential to improve our understanding of what affects the quality of healthcare for the growing population of aged adults in the US.

Literature review
Quantitative investigations making use of hierarchical models continue to grow as the quantity of geographically referenced data expands, making use of geographic information systems (GIS) more commonplace in spatial epidemiology and public health research (Jerret, Gale, & Kontgis, 2010). The proliferating use of small area estimates is born out of an attempt to capture how an individual's environment plays a role in their health (Riva, Gauvin, & Barnett, 2007). The environment, in this context, refers to the built, physical, and social habitat of the individual (Weden, et al., 2011). Recent work has advanced this line of research to explicate that even geographical membership by government administrative areas plays a role in health outcomes (Arcaya, et al., 2012). At the most fundamental level, the motivation for creating and using small area geographical attributes is born out of the theoretical assumption that health varies by location, where an individual's composition and geographical exposure to health-harming risks interact to influence their health status.
Operationalizing the theoretical significance of various human geographies and drawing appropriate statistical inferences is complex given the intractable nature of "scale", a factor determined by both physical distance and the socially constructed meaning of space (Lefebvre, 1991). Discussions of the modifiable areal unit problem (Openshaw, 1984) began about 80 years ago (Gehlke & Biehl, 1934), were formally introduced into statistics over half a century ago (Yule & Kendall, 1950), and were directly connected with quantitative geography about four decades ago (Clark & Avery, 1976). The core argument in this discourse is that changing the size and/or shape of a spatial unit has the potential to affect inferences from the analyses, where results: (1) are not influenced by the scale of the areal unit; (2) are quantitatively impacted (estimated equation parameters are impacted but do not change direction); or (3) are qualitatively impacted by the areal scale (estimated equation parameters and their direction change) (Fotheringham & Wong, 1991). A publication investigating the effects of the modifiable areal unit problem on spatial econometrics has more recently confirmed that parameter efficiency is affected by the level of aggregation (i.e., the scale) when spatial autocorrelation is present (Arbia & Petrarca, 2011).
A full discussion of the challenges posed by the modifiable areal unit problem is beyond the scope of the current study. Brief attention is given to it here in order to highlight why it is important to discuss and investigate the effects of scale in the representation and analysis of geographically referenced data. The main point here is to explain why creating aggregate measures at different geographical scales is important. For example, the European Commission develops economic information by territories using the Nomenclature of Territorial Units for Statistics (NUTS), a hierarchical classification scheme that produces small-, basic-, and major-regional economic statistics. Information created at different scales can then be used to serve different purposes, such as the needs of a community versus those of a region.
Because the use of healthcare resources in the US is highly localized, the availability of small area Medicare measures is very valuable. The main argument being made in this paper is that public health researchers benefit from having health-related information at a small geographical scale. The benefits derived from analyzing small area measures are commonly understood amongst experts and have been used for many years (Carstairs, 1981). The discussion here is directly linked with public health policy, because when it comes to informing policy, the use of small area data in quantitative analysis is more useful than aggregate measures derived from large area geographies (Box, 1979). Arguments for why more geographically defined healthcare estimates are better than their large-area counterparts have largely relied on qualitative explanations. However, such a measure of credence falls short of the scientific standard. In this paper, quantitative techniques are employed to show an instance where small geographies with direct measures produce more geographically precise results than those created by using large-area polygons with proxy measures.
The specific aim of this investigation is to give empirical evidence for why small area Medicare estimates are important. In the US, "Medicare" is a national health insurance program administered by the federal government that ensures access to basic health services for legal residents aged 65 and over. To investigate why smaller geographical areas with more direct measures provide more precise information for policy creation than estimates from large-scale geographies with proxy measures, the investigation focuses on analyzing the spatial clustering of: (1) the count of Medicare beneficiaries aged 65 and above by Primary Care Service Area (PCSA); and (2) the count of those aged 65 and over by county in 2010. The first spatial clustering analysis with PCSAs (i.e., small area units) uses 6,494 geographical polygons in the US mainland, while the second analysis with counties (i.e., large area units) uses only 3,109 spatial units.
The research question is: Do smaller and more precise Medicare beneficiary measures provide insight not available through large polygon analyses with proxy measures? It is hypothesized that using smaller geographical units and more precise Medicare beneficiary measures will render more precise findings when contrasted with methods that employ larger geographical polygons and proxy measures. Qualitatively important differences are expected to be found because PCSAs are geographically smaller and because they offer more precise measurements of Medicare beneficiary populations.

Study design and methods

2010 Census Summary File 1
The county-level percent aged 65 and over measure is derived from US Census Bureau Summary File 1 (SF1) files (U.S. Census Bureau, 2012a). SF1 data are derived from the full count of the population conducted during the 2010 decennial data collection period. The SF1 file used in the analysis provides a county-level tabulation of the total population by age categories. The "percent aged 65 and over" measure used in the county spatial clustering analysis is computed for each county with the following equation: [(number of people aged 65 and over ÷ total population) × 100].
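As a minimal illustration of the equation above, the following Python sketch computes the measure for a set of counties. The function name and the county figures are hypothetical and are not taken from SF1.

```python
def percent_aged_65_and_over(aged_65_plus, total_population):
    """Return (people aged 65 and over / total population) * 100 for one county."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return 100.0 * aged_65_plus / total_population


# Hypothetical county tabulations: (aged 65 and over, total population).
# These are illustrative numbers, not actual SF1 values.
counties = {
    "County A": (1_500, 10_000),
    "County B": (4_200, 20_000),
}

percents = {
    name: percent_aged_65_and_over(aged, total)
    for name, (aged, total) in counties.items()
}
```

Computing the ratio per county, rather than from pooled totals, keeps each polygon's attribute independent of the size of its neighbors.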

PCSA layer 2007 data
The PCSA layer 2007 data (see endnote 1) prepared for the Health Resources and Services Administration by the Center for Health Policy Research is used. The 2007 PCSA "Medicare beneficiary count" variable was derived from the 100% Outpatient file for Medicare Fee-for-Service beneficiaries aged over 65 (see endnote 2). PCSAs are service areas delineated by the use of primary care services from the primary care workforce (Goodman et al., 2003). The geographical polygons are defined by aggregating ZIP code areas to designate primary care market areas that are based on Medicare beneficiaries' travel to primary care health professionals (e.g., physicians, advanced nurse practitioners, and physician assistants for ambulatory primary care). This means that a PCSA includes a ZIP code area with one or more primary care providers and any contiguous ZIP code areas whose Medicare populations seek the plurality of their primary care from those providers. Please note that PCSA estimates of Medicare beneficiaries are assumed to be as valid and reliable as county-level percent aged 65 and over measures.

Mapping
Geographically referenced measures are linked to mapping software using a geocoding system that assigns a numeric or alphanumeric code to each polygon.
Topologically Integrated Geographic Encoding and Referencing (TIGER) Shapefiles from the Census were used to conduct all mapping and spatial analyses. A shapefile is a geospatial vector data format used in GIS-related software. Shapefiles provide an open specification for data interoperability by describing geometries using points, polylines, and polygons. Full details on ".shp" files are available elsewhere (ESRI, 2011). A broader explanation of U.S. Census Bureau TIGER/Line Shapefiles is also available elsewhere (U.S. Census Bureau, 2012b). Please note that although figures presented in this paper have been altered (exported from ArcMap as JPEG and extracted for display in publication), spatial analyses were conducted using shapefiles projected with a US contiguous Albers equal area (conic USGS) projection, using the GCS NA 1983 coordinate system with the 1983 datum and the prime meridian. All analyses and mapping were conducted using ArcGIS® 9.3 [software by ESRI. ArcGIS® and ArcMap™ are the intellectual property of ESRI and are used herein under license (Copyright © ESRI, all rights reserved); for more information about ESRI® software, please visit www.esri.com].

Spatial Nonstationarity
Modeling geographic dependence is popular in the social and behavioral sciences in part because spatial clustering has to do with spatial autocorrelation (Carpenter, 2011). Spatial autocorrelation is a form of statistical dependence sometimes present in geographically referenced data. Geo-spatial autocorrelation arises when spatial processes vary as a function of geographical location. Spatial autocorrelation is the statistical terminology representing the concept of "spatial nonstationarity" (for a more complete discussion on spatial nonstationarity see Siordia, Saenz, & Tom, 2012). In its most basic form, spatial nonstationarity describes the fact that environmental processes may at times influence each other as a function of geographical space, where being physically proximal usually means having more similarity than being physically distant. For example, the economic well-being of one county has the potential to influence the economic well-being of surrounding counties.

Spatial Clustering
Spatial nonstationarity can be captured with a local indicator of spatial autocorrelation (Anselin, 1995). This project measures spatial autocorrelation by using polygon centroids and shifts in the movement of kernel intensities. The kernel intensity for the spatial point pattern on both measures is estimated using polygon attributes (e.g., percent aged 65 and over) with an adjusted bandwidth of 2 miles for PCSAs and 91 miles for county measures. The bandwidth is the minimum distance, for PCSAs and counties respectively, at which every polygon has at least one neighbor. The kernel intensity function should therefore be thought of as an exploratory tool capable of producing an intensity plot.
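One way to read the bandwidth rule described above is that the smallest distance at which every polygon has at least one neighbor equals the largest nearest-neighbor distance among the polygon centroids. The sketch below illustrates that reading in plain Python. The function name and the coordinates are hypothetical, and the coordinates are assumed to be planar (projected) values in miles; this is not a reproduction of the ArcGIS procedure.

```python
import math


def min_bandwidth_all_have_neighbor(centroids):
    """Smallest distance band at which every centroid has at least one neighbor.

    This is the maximum, over all centroids, of the distance to the
    nearest other centroid.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    nearest = []
    for i, (xi, yi) in enumerate(centroids):
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(centroids)
                if j != i
            )
        )
    return max(nearest)


# Hypothetical projected centroid coordinates, in miles.
demo_centroids = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (6.0, 2.0)]
band = min_bandwidth_all_have_neighbor(demo_centroids)
```

In this example the isolated fourth centroid sets the bandwidth, which is how a single remote polygon can push the band well above the typical spacing between centroids.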
The purpose of using local measures of similarity between PCSA (or county) neighbors is to create a map of P values related to the hypothesis of no autocorrelation in each polygon. Local indicators of spatial association make use of the K function by observing spatial autocorrelation. Ripley (1976) formally introduced the K function and proved its reliability as a statistical tool to analyze the second-order moment in a point pattern process. The K function is the ratio K(h) = E[N(h)] / λ, where the numerator is the expected number of events lying within distance h of an arbitrary event of the process, and where the denominator, λ, is the intensity of the process (Galvis et al., 2009). If the benchmark point pattern process is K(h) = πh², then K(h) < πh² signals a "regular" point pattern, while K(h) > πh² indicates a "clustered" process (Galvis et al., 2009). Local Moran's I (LMi) is used with areal data to produce local-specific statistics by identifying the locations where pattern deviations are unlikely to have occurred by chance. In the simplified global Moran's I equation, as applied to the PCSA analysis, x_i is the Medicare beneficiary count and μ is its sample mean, so that Z is the polygon's count of Medicare beneficiaries, taken relative to μ, at PCSA i and at PCSA j. In the spatial clustering analysis, a set of weighted PCSA and county features is developed with the Cluster and Outlier Analysis tool in ArcGIS 9.3. Fundamental theoretical assumptions in spatial clustering analysis have been discussed elsewhere (Siordia & Fox, 2013).
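For reference, the quantities named in this paragraph can be written in their standard textbook forms, which may differ in notation from the simplified forms used in the analysis. The symbols N(h), w_ij, S_0, z_i, and m_2 are introduced here for illustration: N(h) counts events within distance h of an arbitrary event, w_ij is the spatial weight between polygons i and j, and n is the number of polygons.

```latex
% Ripley's K function and the benchmark of complete spatial randomness
K(h) = \frac{\mathbb{E}\left[ N(h) \right]}{\lambda},
\qquad
K_{\text{benchmark}}(h) = \pi h^{2}

% Global Moran's I, with z_i = x_i - \mu and S_0 = \sum_{i}\sum_{j} w_{ij}
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\, z_i z_j}{\sum_{i=1}^{n} z_i^{2}}

% Local Moran's I for polygon i (Anselin, 1995), with m_2 = \frac{1}{n}\sum_{i} z_i^{2}
I_i = \frac{z_i}{m_2} \sum_{j \neq i} w_{ij}\, z_j,
\qquad
\sum_{i=1}^{n} I_i = S_0 \, I
```

The last identity shows the sense in which the local statistics decompose the global one: summing the local values recovers the global statistic up to the constant S_0.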
The procedure identifies "Medicare beneficiary" and "total population aged 65" clusters with high values. Local Moran's I scores, as derived from the global Moran's I equation, are used. High-intensity clusters with high values are referred to as High-High clusters (HHc). In the PCSA analysis, HHc represent areas where a high count of Medicare beneficiaries is surrounded by zones where the count of Medicare beneficiaries is spatially autocorrelated. More technically, HHc signals where the intensity in the second-order moment of a point pattern deviates significantly from the benchmark. In essence, the LMi approach assesses the degree of relatedness of the sets of PCSAs and counties with respect to Medicare beneficiary counts and total population aged 65, respectively. LMi captures the extent to which neighboring PCSAs and counties influence the counts and population in their respective polygons.
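The logic of flagging High-High candidates can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the ArcGIS Cluster and Outlier Analysis tool: it assumes row-standardized distance-band weights between centroids, uses the standard local Moran's I form, and omits the significance testing that the analysis relies on. All function names, coordinates, and counts are hypothetical.

```python
import math


def distance_band_weights(points, band):
    """Row-standardized binary weights: j neighbors i when 0 < d_ij <= band."""
    n = len(points)
    weights = []
    for i in range(n):
        row = [0.0] * n
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                row[j] = 1.0
        total = sum(row)
        # Polygons without neighbors keep an all-zero row.
        weights.append([w / total for w in row] if total else row)
    return weights


def local_morans_i(values, weights):
    """Return (local I scores, deviations z, spatial lags of z)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    lags = [sum(w * zj for w, zj in zip(row, z)) for row in weights]
    scores = [zi * lag / m2 for zi, lag in zip(z, lags)]
    return scores, z, lags


def high_high_candidates(z, lags):
    """True where a polygon and its neighborhood are both above the mean."""
    return [zi > 0 and lag > 0 for zi, lag in zip(z, lags)]


# Hypothetical centroids (miles) and beneficiary counts: two separated groups.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
counts = [90, 100, 95, 5, 10, 8]

w = distance_band_weights(points, band=1.5)
scores, z, lags = local_morans_i(counts, w)
hh = high_high_candidates(z, lags)
```

Here the first three polygons are flagged as High-High candidates, while the last three have positive local scores for the opposite reason (low values surrounded by low values), which is why the sign of the deviation is checked separately from the score.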

Results
Clustering of populations aged 65 and over by county is discussed first, followed by a review of the spatial clustering of Medicare beneficiary count by PCSA. The results are contrasted in the last section of the results.

County Clustering
County clustering results are displayed in Figure 1, where state boundaries (thick black lines) surround county polygons (thin black lines). Red areas on the map signal the statistically significant clustering of older adult populations (i.e., the HHc of percent aged 65 and over). As can be noted from viewing the map, large clusters of populations aged 65 and over are found in: California; New Jersey; Connecticut; Massachusetts; south-eastern New York and Pennsylvania; and Arizona (note the large county polygons). Texas, Washington, Illinois, and Ohio also have noticeable clusters.

Qualitative Evaluation on the Differences in Clustering Results
It was hypothesized that using PCSAs would render unique findings when contrasted with a method using counties. The visual differences were expected to be found not only because PCSAs are geographically smaller, but because they have more precise measurements of Medicare beneficiary populations than the proxy measure (i.e., population aged 65 and over). In contrast to Figure 1, where proxy measures are used with county polygons, the findings displayed in Figure 2 clearly show that PCSAs provide more geographically detailed and precise clusters of Medicare beneficiary counts.
For example, when using county polygons, the planning for distributing resources for the aged population may suggest giving equal funding to the following Arizona counties: Mohave; Yuma; Yavapai; Maricopa; Pinal; and Pima. Mohave is a largely rural county. In this instance, Mohave County would be better served by knowing where the funds should be allocated. The analysis using PCSAs clearly shows that potential Medicare beneficiaries in Mohave County reside in a small southwestern area of the county, where Bullhead City, Fort Mohave, Mohave Valley, and Lake Havasu City are located. This qualitative comparison clearly shows that improving the healthcare quality of the ageing US population could be aided by the use of small area geographic units with direct measures. Please note that the qualitative comparison being made here assumes that PCSA measures are more valid and reliable because they have more direct and geographically defined measures of potential Medicare beneficiaries. Thus, when comparing Figures 1 and 2, the latter is believed to be more informative. The key argument being made here is that the visualization of the information makes a notable difference, where scalar factors are one important component of visual representation.

Discussion
In answer to the research question, it is found that smaller and more precise measures provide unique information on the geographical distribution of potential Medicare beneficiary populations. Small area estimates provide unique research and policy opportunities for improving the healthcare quality of the aged. By qualitatively contrasting the spatial clustering results from an analysis that uses large geographical units with proxy measure attributes to the results from an analysis using small area direct measures, the study finds that Medicare beneficiary estimates with PCSAs provide a more detailed and precise appraisal of the Medicare beneficiary population.

Limitations
The paper makes a substantive contribution to spatial healthcare quality research by providing an investigation of the differences between using small-area and large-area population estimates. Notwithstanding this contribution, it has some limitations. In the first instance, the qualitative and visual comparison between maps fails to capture whether the differences are statistically significant. Of particular importance is the fact that an important assumption is being made: the differences in clustering are more than the product of using different geographical polygons (county versus PCSA), measures (direct versus proxy), and years of data (2007 versus 2010). The project assumes potential Medicare beneficiary populations are best captured by PCSA data and polygons. Since both secondary survey data sources make use of sampling techniques, they contain errors in their estimates. Without an absolute count of potential Medicare beneficiaries, the assumptions above cannot be tested, and assumptions in spatial modeling and health-related outcomes need careful treatment (Anselin, 2006).
Another limitation is technical in nature. The spatial clustering algorithm employed in this analysis first calculates polygon centroids, then assigns the polygon's attribute (e.g., % age > 65) to the centroid point, and then measures the Euclidean vector-space between points to compute geographically referenced weights to be used in the estimation of the K function (Siordia & Fox, 2013). In location modeling with Local Moran's I, geographically referenced continuous variation is thus represented by a set of discrete points. Implicit in this procedure is the fact that the polygon centroid is an aggregate measure from a two-dimensional polygon which consists of an infinite number of points (Murray et al., 2008). This approach has the potential to introduce uncertainty/error in the estimated parameters (Tong & Church, 2013).
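The centroid step described above can be illustrated with a short Python sketch. It assumes simple (non-self-intersecting) polygons given as lists of planar vertex coordinates and uses the standard area-weighted (shoelace) centroid formula; the function names and the two example polygons are hypothetical, and the GIS software may compute centroids differently for multipart or complex shapes.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area = 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def centroid_distance(poly_a, poly_b):
    """Euclidean distance between the centroids of two polygons."""
    return math.dist(polygon_centroid(poly_a), polygon_centroid(poly_b))


# Two hypothetical square polygons in projected coordinates.
square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
square_b = [(10, 0), (14, 0), (14, 4), (10, 4)]

center_a = polygon_centroid(square_a)
separation = centroid_distance(square_a, square_b)
```

Each polygon, whatever its extent, is reduced to one coordinate pair before distances are measured, which is the loss of within-polygon information that this limitation refers to.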
The spatial analysis techniques used in this project are popular because estimating spatial clustering while accounting for polygon complexity and multiple spatial relationships is challenging (Goodchild & Haining, 2004; Tong, 2012). However, Local Moran's I spatial clustering estimation has the potential to be affected by polygon size and shape since it uses polygon centroids in point-pattern detection. In general, PCSAs are made up of smaller polygons than counties. However, polygon complexity (i.e., the degree to which a geometric shape differs from a simple shape such as a circle, square, or triangle) may be higher in PCSAs than in counties. Because of these differences, it is difficult to ascertain if an analysis that accounts for polygon complexity would produce similar results.
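The paper does not specify how polygon complexity would be quantified. As one possible illustration, the sketch below uses the isoperimetric (Polsby-Popper) compactness ratio, 4πA / P², which equals 1 for a circle and falls toward 0 as a shape departs from it. The choice of this measure, the function names, and the example polygons are assumptions made here for illustration only.

```python
import math


def area_and_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for k in range(n):
        p0 = vertices[k]
        p1 = vertices[(k + 1) % n]  # wrap around to close the ring
        twice_area += p0[0] * p1[1] - p1[0] * p0[1]
        perimeter += math.dist(p0, p1)
    return abs(twice_area) / 2.0, perimeter


def compactness(vertices):
    """Isoperimetric ratio 4*pi*A / P**2: 1 for a circle, lower when less compact."""
    area, perimeter = area_and_perimeter(vertices)
    if perimeter == 0:
        raise ValueError("degenerate polygon with zero perimeter")
    return 4.0 * math.pi * area / perimeter ** 2


# Two hypothetical polygons with the same area (16) but different shapes.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
thin_strip = [(0, 0), (16, 0), (16, 1), (0, 1)]

square_score = compactness(square)
strip_score = compactness(thin_strip)
```

A ratio of this kind would allow PCSA and county polygons to be compared on shape independently of size, since scaling a polygon up or down leaves the value unchanged.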

Directions for Future Research
The results of this study provide evidence for the argument that using smaller geographical areas with more precise measures provides detailed, rich results that are unavailable when using large areas with proxy measures. Spatial methodologies should explore alternatives for comparing the statistical significance of differences from clustering analyses where different geographical polygons and measurements are used. Future work should aim at investigating the same phenomenon with different polygons and health measures. Scale-space theory (Witkin, 1983) has led to the formation of models that account for alternative representation schemes (Lu & Wang, 2008) where vector-based spatial clustering analysis is independent of frame and is scale invariant (Kan & Weber, 2003). Examples include models examining results from the partitioning of a bounded continuous space into a number of two-dimensional zones (Tong & Murray, 2009). These methodologies should be more frequently used in public health research.
Healthcare quality researchers should aim at informing policy by engaging the geo-spatial dimension in their investigations. Funding for the development of more precise and geographically localized health data should continue. Because research has shown that analytic mapping can be applied in the primary care setting and is received enthusiastically by clinic leaders (Bazemore, Phillips, & Miyoshi, 2011), efforts to implement GIS in the healthcare quality public health area should continue. Because the healthcare quality of the aging population will only increase in importance as its numbers grow in the US, further research is needed.

Endnotes
1 Data from the "updated CHC/FQHC/RHC location data" version 9/4/2009.
2 The count variable made use of the "PVDEN_07" variable. This factor was created for each PCSA by using a 20% national sample of fee-for-service Medicare beneficiaries and analyzing 100% of their physician and hospital claims.

Figure 1. County "HH" Clustering of Population Age 65 and above

PCSA Clustering
PCSA clusters are displayed in Figure 2, where county boundaries surround PCSA polygons. As with the previous figure, the red areas on the map signal statistically significant clustering. In this case, the red polygons indicate the spatial autocorrelation of high levels of Medicare beneficiary count (i.e., the HHc). Here again, the following states stand out: California; New Jersey; Connecticut; Massachusetts; south-eastern New York and Pennsylvania; and Arizona. Texas, Georgia, Washington, Illinois, Michigan, and Ohio are now more important state players in the findings.

Figure 2. PCSA "HH" Clustering of Medicare Beneficiary Count

Beneficiaries were included in the calculation of PVDEN_07 if they: (1) resided in the United States; (2) were not HMO enrolled during 2007; (3) were aged 65 to 99 years on January 1, 2007; (4) had Part A on June 30, 2007; and (5) had Part B entitlement in June 2007. PCSAs are assigned to each study beneficiary based on his/her resident ZIP code.