# Distance-Band Spatial Weights

## Introduction

In this Chapter, we will continue to deal with the spatial weights functionality in GeoDa, but now we will focus on weights that use the notion of distance. Intrinsically, this is most appropriate for point layers, but we will see that it can readily be generalized to polygons as well.

We will initially use a data set with point locations of house sales for Cleveland, OH, but later return to our U.S. county Homicides data to illustrate the polygon case.

We will compute the distance between points to create distance-band weights, as well as k-nearest neighbor weights. We will examine the weights characteristics and pay particular attention to the issue of neighborless locations, or isolates. We also consider generalizing the concept of contiguity to points (using Thiessen polygons), and the notion of distance-based weights for polygons (using their centroids to compute the distances).

### Objectives

• Construct distance band spatial weights

• Understand the contents of a gwt weights file

• Assess the characteristics of distance-based weights

• Assess the effect of the max-min distance cut-off

• Identify isolates

• Construct k-nearest neighbor spatial weights

• Create Thiessen polygons from a point layer

• Construct contiguity weights for points and distance weights for polygons

• Understand the use of great circle distance

#### GeoDa functions covered

• Legend > set color for category
• Preferences > Use classic yellow cross-hatching to highlight selection in maps
• Weight File Creation dialog
• distance-band weights
• k-nearest neighbor weights
• great circle distance option
• Weights Manager
• Connectivity Histogram, Map and Graph
• Map
• Thiessen Polygons option

### Getting started

To begin, we will use a data set that contains the location and sales price of 205 homes in a core area of Cleveland, OH for the fourth quarter of 2015. The data set is included as one of the Center for Spatial Data Science example data sets and can be downloaded from Cleveland Home Sales (2015). This yields a folder with the four files that make up the shape file, each with file name clev_sls_154_core and one of the usual four file extensions (shp, shx, dbf and prj).

We get started by clearing the previous project and dropping the file clev_sls_154_core.shp into the Drop files here rectangle of the connect to data source dialog. A themeless base map is shown in Figure 1. Figure 1: Cleveland home sales themeless map

#### Customizing a point map

To make this look a little nicer, we make three minor modifications. First, we add a base map using the corresponding toolbar icon and pick Nokia Day. The transparency may need some adjustment to make the points easier to see (use Change Map Transparency from the base map icon options menu).

Next, we change the color of the points themselves (actually, drawn as tiny circles). This is one of the options in the legend panel of the map. Right click on the green box to the left of (205) and choose Color for Category, as in Figure 2.2 Figure 2: Set color for category

In our example, we turned the circles to black, which results in the base map shown in Figure 3. Figure 3: Cleveland home sales point map

A third minor adjustment is to change the way selected and unselected points are displayed. The default is to use transparency to distinguish them, but for points this often does not provide a clear distinction, in the sense that the unselected points may be hard to see. Instead, we go back to the old way of highlighting selected observations in GeoDa, which is to use a different color.

We set this option in the System tab of the GeoDa Preferences Setup. As shown in Figure 4, the first item under Maps pertains to the way in which map selections are highlighted. The default is to have the check box off, but we change that so that the box is checked. Figure 4: GeoDa selection preferences

A selection of points (locations) in the map is now shown as red, whereas the unselected points remain in black, as shown in Figure 5. Figure 5: Cleveland points with selection highlighted

The points considered in our example are contained within an arbitrary rectangular selection taken from the full set of home sales for that period. We are now ready to proceed with spatial weights derived from a point layer.

## Distance-Band Weights

### Concepts

#### Distance metric

The core input into the determination of a neighbor relation for distance-based spatial weights is a formal measure of distance, or a distance metric. The most familiar special case is the Euclidean or straight line distance, $$d_{ij}$$, as the crow flies: $\begin{equation*} d_{ij} = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}}, \end{equation*}$ for two points $$i$$ and $$j$$, with respective coordinates $$(x_{i},y_{i})$$ and $$(x_{j},y_{j})$$.

#### Great circle distance

Euclidean inter-point distances are only meaningful when the coordinates are recorded on a plane, i.e., for projected points.

In practice, one often works with unprojected points, expressed as degrees of latitude and longitude, in which case using a straight line distance measure is inappropriate, since it ignores the curvature of the earth. This is especially the case for longer distances, such as from the East Coast to the West Coast in the U.S.

The proper distance measure in this case is the so-called arc distance or great circle distance. This takes the latitude and longitude in decimal degrees as input into a conversion formula.3 Decimal degrees are obtained from the degree-minute-second value as degrees + minutes/60 + seconds/3600.
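The degree-minute-second conversion can be sketched in a few lines of Python. The function name `dms_to_decimal` and the `negative` flag (used here for southern latitudes and western longitudes) are illustrative assumptions and not part of GeoDa.

```python
def dms_to_decimal(degrees, minutes, seconds, negative=False):
    """Convert a degree-minute-second value to decimal degrees.

    `negative` is a hypothetical convenience flag for southern latitudes
    or western longitudes.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if negative else value


# 41 degrees, 30 minutes, 0 seconds (north)
lat = dms_to_decimal(41, 30, 0)
# 81 degrees, 41 minutes, 24 seconds (west)
lon = dms_to_decimal(81, 41, 24, negative=True)
print(lat, lon)
```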

The latitude and longitude in decimal degrees are converted into radians as: $\begin{eqnarray*} \mbox{Lat}_r &=& (\mbox{Lat}_d - 90) * \pi/180\\ \mbox{Lon}_r &=& \mbox{Lon}_d * \pi/180, \end{eqnarray*}$ where the subscripts $$d$$ and $$r$$ refer respectively to decimal degrees and radians, and $$\pi = 3.14159 \dots$$. With $$\Delta \mbox{Lon} = \mbox{Lon}_{r(j)} - \mbox{Lon}_{r(i)}$$, the expression for the arc distance is: $\begin{eqnarray*} d_{ij} &=& \mbox{R} * \arccos [ \cos ( \Delta \mbox{Lon} ) * \sin \mbox{Lat}_{r(i)} * \sin \mbox{Lat}_{r(j)}\\ &&+ \cos \mbox{Lat}_{r(i)} * \cos \mbox{Lat}_{r(j)} ], \end{eqnarray*}$ or, equivalently, when the latitude is converted into radians without the 90 degree offset (i.e., as $$\mbox{Lat}_d * \pi/180$$): $\begin{eqnarray*} d_{ij} &=& \mbox{R} * \arccos [ \cos ( \Delta \mbox{Lon} ) * \cos \mbox{Lat}_{r(i)} * \cos \mbox{Lat}_{r(j)}\\ &&+ \sin \mbox{Lat}_{r(i)} * \sin \mbox{Lat}_{r(j)} ], \end{eqnarray*}$ where R is the radius of the earth. In GeoDa, the arc distance is obtained in miles with R = 3959, and in kilometers with R = 6371.

These calculated distance values are only approximate, since a single fixed value is used for the radius of the earth. A more precise measure would take into account the actual latitude at which the distance is measured. In addition, the earth’s shape is much more complex than a sphere, but the approximation serves our purposes.
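As an illustration, the arc distance can be sketched in Python. This is a minimal version and not GeoDa's actual implementation. It converts latitude directly to radians (without the 90 degree offset) and therefore uses the form of the expression with the cosines of the latitudes multiplied by the cosine of the longitude difference. The function name `arc_distance` and the `unit` argument are assumptions made for the example.

```python
import math

# radius values quoted in the text for miles and kilometers
EARTH_RADIUS = {"mi": 3959.0, "km": 6371.0}


def arc_distance(lat_i, lon_i, lat_j, lon_j, unit="mi"):
    """Great circle distance between two points given in decimal degrees."""
    lat_i, lon_i, lat_j, lon_j = map(math.radians, (lat_i, lon_i, lat_j, lon_j))
    cos_angle = (
        math.cos(lon_j - lon_i) * math.cos(lat_i) * math.cos(lat_j)
        + math.sin(lat_i) * math.sin(lat_j)
    )
    # guard against rounding that pushes the value slightly outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS[unit] * math.acos(cos_angle)


# one degree of latitude along a meridian, in miles
print(arc_distance(0.0, 0.0, 1.0, 0.0))
# a quarter of the way around the equator, in kilometers
print(arc_distance(0.0, 0.0, 0.0, 90.0, unit="km"))
```

For points that are very close together, the arccosine form loses numerical precision, so other formulations are sometimes preferred in practice.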

#### Distance-band weights

The most straightforward spatial weights matrix constructed from a distance measure is obtained when $$i$$ and $$j$$ are considered neighbors whenever $$j$$ falls within a critical distance band from $$i$$. More precisely, $$w_{ij} = 1$$ when $$d_{ij} \le \delta$$, and $$w_{ij} = 0$$ otherwise, where $$\delta$$ is a preset critical distance cutoff.

In order to avoid isolates (islands) that would result from too stringent a critical distance, the distance must be chosen such that each location has at least one neighbor. Such a distance conforms to a max-min criterion, i.e., it is the largest of the nearest neighbor distances.4

In practice, the max-min criterion often leads to too many neighbors for locations that are somewhat clustered, since the critical distance is determined by the points that are furthest apart. This problem frequently occurs when the density of the points is uneven across the data set, such as when some of the points are clustered and others more spread out. We revisit this problem in the illustrations below.
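The max-min criterion and its side effect can be sketched with a toy example in Python. The coordinates are made up for illustration, and the function names are assumptions rather than GeoDa functionality. Four points form a tight cluster and a fifth point lies far away, so that the remote point determines the cut-off.

```python
import math


def euclidean(p, q):
    """Straight line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def max_min_distance(points):
    """Largest of the nearest neighbor distances."""
    return max(
        min(euclidean(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def distance_band_neighbors(points, cutoff):
    """Neighbor lists with w_ij = 1 when d_ij <= cutoff, for i != j."""
    return {
        i: [j for j, q in enumerate(points) if j != i and euclidean(p, q) <= cutoff]
        for i, p in enumerate(points)
    }


# made-up coordinates: a cluster of four points and one remote point
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 0)]
cutoff = max_min_distance(points)
neighbors = distance_band_neighbors(points, cutoff)
print(cutoff)
for i, nbrs in neighbors.items():
    print(i, nbrs)
```

The remote point ends up with a single neighbor, whereas the clustered points have three or four, even though their own nearest neighbor distances are much smaller than the cut-off.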

Further technical details on distance-based spatial weights are contained in Chapters 3 and 4 of Anselin and Rey (2014), although the software illustrations are for an earlier GeoDa interface design.

### Creating distance-band weights

As we did for contiguity weights, we invoke the Weights Manager and click on the Create button to get the process started. In the Weights File Creation interface, after specifying the ID variable (unique_id), we focus on the right-most button, Distance Weight. This generates the dialog for the three distance weight options: Distance band, K-Nearest neighbors, and Adaptive kernel. In this Chapter, we focus on the first two options.

Distance band weights are initiated by selecting the Distance band button in the interface, as shown in Figure 6. This is also the default option. The max-min distance (largest nearest neighbor distance) is given in the box next to Specify bandwidth, in units appropriate for the projection used. In our example, these are feet. Figure 6: Distance band default setting

Note the importance of the Distance metric, highlighted in Figure 6. Since our data is projected, it is appropriate to use Euclidean (straight line) distance. However, many data sets come in simple latitude-longitude, for which a great circle distance (or arc distance) must be used instead. We will revisit this shortly.

The critical distance for our point data is about 3598 feet, or roughly 0.7 miles. This is the distance that ensures that each point (house sale) has at least one neighbor. After clicking on the Create button and specifying a file name, such as clev_sls_154_core_d (the GWT file extension is added automatically), the new weights and their summary properties are listed in the Weights Manager, as in Figure 7. Figure 7: Distance weights in weight manager

An important aspect of the metadata in the weights manager is the threshold value. This information will also be included in any Project File that is saved. This is the only reliable way to remember which distance cut-off was used for the distance bands in the weights.

### GWT file

Distance-based weights are saved in files with a GWT file extension. This format, illustrated in Figure 8, is slightly different from the GAL format used for contiguity weights. It was first introduced in SpaceStat in 1995, and later adopted by R spdep and other software packages. The header line is the same as for GAL files, but each pair of neighbors is listed, with the ID of the observation, the ID of its neighbor and the distance that separates them. This distance is currently only included for informational purposes, since GeoDa does not use the actual distance value in any statistical operations (spatial weights are also row-standardized by default). Figure 8: GWT file contents
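To make the structure concrete, the following Python sketch reads GWT-style content into a dictionary of neighbor lists. It assumes a single header line followed by whitespace-separated triples (observation ID, neighbor ID, distance). The sample content, including the layout of its header line, is made up for illustration and is not taken from the Cleveland file.

```python
def parse_gwt(lines):
    """Return (header_fields, {obs_id: [(neighbor_id, distance), ...]}).

    Assumes one header line, then one 'id neighbor_id distance' triple per line.
    """
    header = lines[0].split()
    neighbors = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        obs_id, nbr_id, dist = line.split()
        neighbors.setdefault(obs_id, []).append((nbr_id, float(dist)))
    return header, neighbors


# hypothetical file content, for illustration only
sample = """0 3 example_layer my_id
1 2 10.5
2 1 10.5
2 3 7.25
3 2 7.25""".splitlines()

header, neighbors = parse_gwt(sample)
print(header)
print(neighbors["2"])
```

Keeping the IDs as strings avoids assumptions about the type of the ID variable.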

### Weights characteristics

In the same way as for contiguity weights, we can assess the characteristics of distance-based weights by means of the Connectivity Histogram, the Connectivity Map, and the Connectivity Graph, available through the buttons at the bottom of the weights manager.

#### Connectivity histogram

The shape of the connectivity histogram for distance-band weights is typically very different from that of contiguity-based weights (as in any histogram, we can bring up descriptive statistics through the View > Display Statistics option). As illustrated in Figure 9, we see a much larger range in the number of neighbors, as well as extremes, with some observations having only one neighbor, and others having 32. Figure 9: Connectivity histogram – default distance band

We also observe these descriptive statistics in the property list shown in Figure 7. Compared to contiguity weights, the mean (12.64) and median (13.00) number of neighbors are much higher, and the matrix is also much denser (% non-zero = 6.17%).
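The density figure is consistent with the average number of neighbors. As a rough check, assume that the percentage of non-zero elements is computed relative to all $$n \times n$$ cells of the weights matrix (an assumption about the calculation, since the weights manager only reports the result). With $$n = 205$$ observations and $$\bar{k}$$ as the mean number of neighbors: $\begin{equation*} \% \mbox{ non-zero} \approx \frac{n * \bar{k}}{n^{2}} = \frac{\bar{k}}{n} = \frac{12.64}{205} \approx 0.0617, \end{equation*}$ or 6.17%.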

The range in the number of neighbors is directly related to the spatial distribution of the points. Locations that are somewhat isolated will drive the determination of the largest nearest neighbor cut-off point (their nearest neighbor distance will be large), whereas dense clusters of locations will encompass many neighbors using this large cut-off distance.

#### Finding the locations separated by the max-min distance

In the Cleveland example, we can examine the GWT file to find the observation pair separated by the critical distance cut-off (3598). As shown in Figure 10, this turns out to be the observation pair with unique_id 11359 and 10014. Figure 10: Cut off distance in GWT file

In the Table, selecting the observation with unique_id 11359 highlights the corresponding location in red in the point map, as indicated with the pointer in Figure 11.5 Figure 11: Selected max-min location

We now open the Connectivity Map, shown in Figure 12. Since this map is a regular themeless map, we can customize it in the same way as the original point map by adding a base layer (Nokia Day) and changing the category color to black. As the current selection, the point with unique_id 11359 will be highlighted in red. With the pointer hovering over the selected point, the status bar of the connectivity map will show the selected observation (11359) and its one neighbor (10114, in black), as highlighted in Figure 12. Figure 12: Selected max-min location in Connectivity Map

#### Finding the most connected observations

We can examine the effect of the large distance cut-off on more densely distributed point locations. For example, selecting the right-most bar in the Connectivity Histogram will highlight the two most connected observations in the map. The connectivity histogram shows that these have 32 neighbors. The two points are highlighted in red in the map in Figure 13. Figure 13: Most connected observations

As expected, the two points in question are in the center of a dense cluster of sales transactions. Through linking, they are also selected in the Connectivity Map. We can find their unique_id values from the table (Move Selected to Top). For example, for the observation with unique_id=19785, we see in the connectivity map that this point has 32 neighbors, highlighted as black circles in Figure 14. Figure 14: Most connected observations in connectivity map

The unequal distribution of the neighbor cardinality in distance-band weights is often an undesirable feature. Therefore, when the spatial distribution of the points is highly uneven, distance-band weights should be avoided, since they could provide misleading impressions of (local) spatial autocorrelation. We examine some alternatives below.

#### Connectivity graph

The properties of the distance band weights can be further investigated by means of the Connectivity Graph. As before, this is invoked through the right-most button at the bottom of the weights manager.

The pattern shown in Figure 15 highlights how the connectivity divides the points into two interconnected subgraphs and two pairs of points. The different sub-networks have no connection between them. In Figure 15, the pointer is centered on location 19195, which has 32 neighbors and is located in the center of a tightly connected network. We can also identify a few locations that are only connected with their nearest neighbor, but not with any other locations. Figure 15: Connectivity graph for distance band weights

### Isolates

So far, we have used the default cut-off value for the distance band. However, the dialog is flexible enough that we can type in any value for the cut-off, or use the moveable button to drag to any value larger than the minimum. Sometimes, theoretical or policy considerations suggest a specific value for the cut-off that may be smaller than the max-min distance.

For example, say we want to use 1500 ft. as the distance band. After typing in that value in the dialog, shown in Figure 16, we proceed in the usual way to create the weights. Figure 16: Distance band set to 1500

However, a warning appears, as in Figure 17, pointing out that the specified cut-off value is smaller than the max-min distance needed to ensure that each observation has at least one neighbor. Figure 17: Isolates warning message

If we proceed and click Yes in the dialog, the properties of the new weights are listed in the weight manager, as in Figure 18. This includes the threshold value of 1500, but also shows a much sparser distribution, with % non-zero as 1.48% (compared to 6.17% for the default). In addition, the minimum number of neighbors is indicated to be 0. In other words, one or more observations do not have neighbors when a distance band of 1500 feet is used. Figure 18: Distance threshold 1500 properties

#### Isolates in the connectivity histogram

The connectivity histogram shown in Figure 19 reveals a much more compact distribution of neighbor cardinality (compared to the max-min criterion). However, it also suggests the existence of 24 isolates, i.e., observations without neighbors. This is given as a warning at the top of the histogram (highlighted in red in the figure), but can also be seen by hovering the pointer over the first bin. The status bar reveals that the range 0-1 has 24 observations. Figure 19: Isolates in connectivity histogram

When selecting the left-most bar in the histogram, we can locate the isolated points in the map, as in Figure 20 (the red points are the selected observations). The selected points are indeed locations that are far away from the other points (more than 1500 feet). Figure 20: Isolates in point map

#### Isolates in the connectivity graph

The most dramatic visualization of the isolates is given by the Connectivity Graph, shown in Figure 21. The 24 points without an edge in the graph to another point are easily identified. In our example, the pointer is over the location with unique_id 62122. The status bar indicates that this point has 0 neighbors. Figure 21: Connectivity graph for distance band set to 1500

#### How to deal with isolates

Since the isolated observations are not included in the spatial weights (in effect, the corresponding row in the spatial weights matrix consists of zeros), they are not accounted for in any spatial analysis, such as tests for spatial autocorrelation, or spatial regression. For all practical purposes, they should be removed from such analysis. However, they can still be included in a traditional non-spatial data analysis.

Ignoring isolates may cause problems in the calculation of spatially lagged variables, or measures of local spatial autocorrelation. By construction, the spatially lagged variable will be zero, which may suggest spurious correlations.
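The effect on a spatially lagged variable can be sketched as follows, using row-standardized weights so that the lag is the average of the neighbors' values. The neighbor lists and values are made up for illustration, and the function name `spatial_lag` is an assumption.

```python
def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: the average of the neighbors' values.

    An observation without neighbors (an isolate) has a row of zeros in the
    weights matrix, so its lag is 0.0.
    """
    lag = []
    for i in range(len(values)):
        nbrs = neighbors.get(i, [])
        lag.append(sum(values[j] for j in nbrs) / len(nbrs) if nbrs else 0.0)
    return lag


# hypothetical neighbor lists: observation 3 is an isolate
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}
values = [10.0, 20.0, 30.0, 40.0]

isolates = [i for i, nbrs in neighbors.items() if not nbrs]
lag = spatial_lag(values, neighbors)
print(isolates)
print(lag)
```

The isolate has the largest value in the data, yet its lag is zero, which is an artifact of the empty neighbor set and not a property of its surroundings.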

Alternatives where isolates are avoided by design are the K-nearest neighbor weights and contiguity weights constructed from the Thiessen polygons for the points. They are discussed next.

## K-Nearest Neighbor Weights

### Concept

As mentioned, an alternative type of distance-based spatial weights that avoids the problem of isolates is $$k$$-nearest neighbor weights. In contrast to the distance band, this is not a symmetric relation. The fact that B is the nearest neighbor to A does not imply that A is the nearest neighbor to B. There may be another point C that is actually closer to B than A. This asymmetry can cause problems in analyses that depend on the intrinsic symmetry of the weights (e.g., some algorithms to estimate spatial regression models). One solution is to replace the original weights matrix $$\mathbf{W}$$ by $$(\mathbf{W + W'})/2$$, which is symmetric by construction.6 GeoDa currently does not implement this approach.

A potential issue with $$k$$-nearest neighbor weights is the occurrence of ties, i.e., when more than one location $$j$$ has the same distance from $$i$$. A number of solutions exist to break the tie, from randomly selecting one of the $$k$$-th order neighbors, to including all of them. In GeoDa, random selection is implemented.
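The asymmetry of the k-nearest neighbor relation can be illustrated with a small Python sketch. The coordinates are made up and the function names are assumptions. Note that ties are broken here by index order for simplicity, which differs from the random selection described above.

```python
import math


def knn_neighbors(points, k):
    """Indices of the k nearest points for each point (ties by index order)."""
    result = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.hypot(p[0] - points[j][0], p[1] - points[j][1]),
        )
        result[i] = others[:k]
    return result


def symmetrize(neighbors, n):
    """Dense (W + W') / 2 computed from binary neighbor lists."""
    w = [[1.0 if j in neighbors[i] else 0.0 for j in range(n)] for i in range(n)]
    return [[(w[i][j] + w[j][i]) / 2 for j in range(n)] for i in range(n)]


# three made-up points on a line
points = [(0, 0), (1, 0), (3, 0)]
nn = knn_neighbors(points, k=1)
sym = symmetrize(nn, len(points))
print(nn)
print(sym)
```

Point 1 is the nearest neighbor of point 2, but the nearest neighbor of point 1 is point 0. After averaging with the transpose, both the (1, 2) and the (2, 1) elements equal 0.5.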

### Creating KNN weights

KNN weights are computed by selecting the corresponding button in the Distance Weight panel of the Weights File Creation interface. The value for the Number of neighbors (k) is specified in the box shown in Figure 22. The default is 4, but in our example, we have selected 6. Figure 22: K nearest neighbor weights

The weights (saved as the file clev_sls_154_core_k6) are added to the collection contained in the weights manager. In addition, all their properties are listed, as illustrated in Figure 23. Note that the properties now include the number of neighbors (instead of the distance threshold value, as is the case for distance-band weights). Also, symmetry is set to asymmetric, which is a fundamental difference from distance-band weights. Figure 23: KNN-6 weights properties

#### Properties of KNN weights

The properties listed in the weights manager also include the mean and median number of neighbors, which of course equal k (in our example, they equal 6). The resulting weights matrix is much sparser than the distance-band weights (2.93% compared to 6.17%).

Again, we can also use the connectivity histogram and the connectivity map to inspect the neighbor characteristics of the observations. However, in this case, the histogram doesn’t make much sense, since all observations have the same number of neighbors (by construction), as shown in Figure 24. Figure 24: KNN-6 connectivity histogram

In contrast, the connectivity graph, shown in Figure 25, clearly demonstrates how each point is connected to six other points. In our example, this yields a fully connected graph instead of the collection of sub-graphs for the distance band. Figure 25: KNN-6 connectivity graph

In Figure 25, we have the pointer over observation with unique_id 9372. The status bar lists the ID values for its six nearest neighbors.

#### KNN and distance

One drawback of the k-nearest neighbor approach is that it ignores the distances involved. The first k neighbors are selected, irrespective of how near or how far they may be. This suggests a notion of distance decay that is not absolute, but relative, in the sense of intervening opportunities (e.g., you consider the two closest grocery stores, irrespective of how far they may be).

This can be illustrated in the Cleveland example by selecting an observation in the western part of the map, where the house sales are densely distributed in space. With the pointer on one of the circles in the connectivity map, we distinguish six black circles close by, as shown in Figure 26. Figure 26: KNN-6 close neighbors

By contrast, if we move to the eastern part of the data set and similarly select an observation with the pointer, the six neighbors are much farther apart, as in Figure 27. Figure 27: KNN-6 far neighbors

This relative distance effect should be kept in mind before mechanically applying a k-nearest neighbor criterion.

## Generalizing the Concept of Contiguity

In GeoDa, the concept of contiguity can be generalized to point layers by converting the latter to a tessellation, specifically Thiessen polygons. Queen or rook contiguity weights can then be created for the polygons, in the usual way.

Similarly, the concepts of distance-band weights and k-nearest neighbor weights can be generalized to polygon layers. The layers are represented by their central points and the standard distance computations are applied.

These operations can be carried out explicitly, by actually creating a separate Thiessen polygon layer or centroid point layer, and subsequently loading it into GeoDa as a new project. Alternatively, the computations happen under the hood, in the sense that it is not necessary to create a separate layer. In this way, it is possible to use the weights manager to create contiguity weights for points or distance weights for polygons directly in the Weights File Creation dialog.

We briefly consider these options.

### Contiguity-based weights for points

#### Thiessen polygons

An alternative solution to deal with the problem of the uneven distribution of neighbor cardinality for distance-band weights is to compute a measure of contiguity. This is accomplished by turning the points into Thiessen polygons. These are also referred to as Voronoi diagrams, and are the dual of the Delaunay triangulation.7

In general terms, a Thiessen polygon is a tessellation (a way to divide an area into regular subareas) that encloses all locations that are closer to the central point than to any other point. In economic geography, this is a (simplistic) notion of a market area, in the sense that all consumers in the polygon would patronize the seller located at the central point. The polygons are constructed by combining lines perpendicular at the midpoint of a line that connects a point to its nearest neighbors. From this, the most compact polygon is created.
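The neighbor relation implied by Thiessen polygons can be sketched without constructing the polygons themselves. The Python example below is a brute-force illustration and not the algorithm used in GeoDa. It relies on the standard duality with the Delaunay triangulation: three points form a Delaunay triangle when no other point falls inside the circle through them, and the points of such a triangle have adjoining Thiessen polygons. It is only practical for a handful of points, and the function names and coordinates are assumptions.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Center and squared radius of the circle through a, b, c (None if collinear)."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    r2 = (a[0] - ux) ** 2 + (a[1] - uy) ** 2
    return (ux, uy), r2


def thiessen_neighbors(points):
    """Neighbor lists from Delaunay triangles found by the empty circle test."""
    n = len(points)
    nbrs = {i: set() for i in range(n)}
    for i, j, k in combinations(range(n), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        # the small tolerance treats points exactly on the circle as outside
        empty = all(
            (points[m][0] - ux) ** 2 + (points[m][1] - uy) ** 2 >= r2 - 1e-9
            for m in range(n)
            if m not in (i, j, k)
        )
        if empty:
            for u, v in ((i, j), (j, k), (i, k)):
                nbrs[u].add(v)
                nbrs[v].add(u)
    return {i: sorted(s) for i, s in nbrs.items()}


# made-up layout: four corners of a square and its center
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
contiguity = thiessen_neighbors(points)
print(contiguity)
```

In this layout, the center point is a neighbor of all four corners, while diagonally opposite corners are not neighbors, since the polygon around the center separates them.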

#### Creating Thiessen polygons from a point layer

In any point map, the tessellation is invoked as an option, by right-clicking on the map. Similar to the setup for Shape Centers, Thiessen Polygons gives a choice between simply displaying the polygons over the point layer, or saving them as a separate layer, as shown in Figure 28.

There is a third option (greyed out in the example below) that applies to situations where multiple points share the same location (e.g., apartments in the same high rise). GeoDa has an option to save the information for the duplicate points to a table.8 Figure 28: Thiessen polygon option

The map overlay is illustrated in Figure 29 for the Cleveland house sale points. Each polygon encloses exactly one point. Besides the characteristic distortions at the edges of the polygon layer, we notice a great discrepancy between the polygon sizes in the areas with a dense distribution of points and the areas where they are further apart. Figure 29: Thiessen polygons overlaid on point map

#### Contiguity weights for Thiessen polygons

When selecting rook or queen contiguity in the Contiguity Weight panel of the weights file creation dialog, the Thiessen polygons are constructed in the background and the contiguity criteria applied to them. For example, for our Cleveland point data, we can create a queen contiguity weights file in the standard way (e.g., as clev_sls_154_core_q). The file name subsequently shows up in the weights manager list, as illustrated in Figure 30. Figure 30: Queen contiguity for points

The descriptive statistics in the properties list as well as the associated connectivity histogram illustrate why this approach may be a useful alternative to distance-band or k-nearest neighbor weights. In Figure 31, the connectivity histogram is shown with the statistics displayed. Figure 31: Points-based queen connectivity histogram

The histogram represents a much more symmetric and compact distribution of the neighbor cardinalities, very similar to the typical shape of the histogram for first order contiguity between polygons. The median number of neighbors is 6 and the average 5.6, with a limited spread around these values. In many instances where the point distribution is highly uneven, this approach provides a useful compromise between the distance-band and the k-nearest neighbors.

This is further illustrated by the connectivity graph for the queen contiguity, shown in Figure 32. The pointer over point with unique_id 40295 shows this location having six neighbors, compared to only one in Figure 15 for the default distance band contiguity. The more balanced structure is also reflected by the fully connected graph. Figure 32: Connectivity graph for Thiessen queen contiguity

Before moving on to the next option, we save the weights information in a Project File (e.g., clev_sls_154_core.gda), using File > Save Project.

### Distance-based weights for polygons

To illustrate the application of distance-based weights to polygons, the current project needs to be cleared and the U.S. county homicide file (natregimes) loaded, either as a shape file (natregimes.shp, without any weights information), or from a previously created project file (natregimes.gda, with the weights information).

As we have seen before, the polygon layer has a series of Shape Center options to add the centroid or mean center information to the data table, display those points on the map, or save them as a separate point layer.

In order to create distance weights for polygons, such as the U.S. counties, there is no need to explicitly save or display the centroids. The calculation happens in the background, whenever a distance option is chosen in the weights file creation dialog.
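The idea of representing each polygon by a central point can be sketched in Python with the standard area-weighted centroid formula for a simple polygon. This is an illustration under that assumption; the exact computation used in the background by GeoDa is not detailed here. The polygons are made-up squares and the function name is hypothetical.

```python
import math


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


# two made-up unit squares, three units apart
square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_b = [(3, 0), (4, 0), (4, 1), (3, 1)]

ca = polygon_centroid(square_a)
cb = polygon_centroid(square_b)
print(ca, cb)
# the distance between the polygons is taken as the distance between centroids
print(math.hypot(ca[0] - cb[0], ca[1] - cb[1]))
```

Once the centroids are available, the distance-band and k-nearest neighbor criteria apply to them exactly as for a point layer.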

We proceed as usual, and select the Distance Weight option in the Weights File Creation dialog. With fipsno as the ID variable and Distance Band as the type of weight, the Specify bandwidth box will show a cut-off distance of 1.465776, as in Figure 33. Figure 33: Distance cut-off in decimal degrees

However, this distance cut-off is for the default setting of Euclidean Distance. For the U.S. counties, the geographic layer is provided in latitude-longitude decimal degrees (i.e., the coordinates are unprojected). Consequently, the use of a straight line Euclidean distance is inappropriate (at least, for larger distances). Instead, the great circle distance or arc distance needs to be computed. So far, we have only considered Euclidean distance (the default), but the drop down list in the weights file creation interface also includes Arc Distance (in miles or in kilometers), as shown in Figure 34. Figure 34: Arc distance option
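A back-of-the-envelope calculation illustrates why decimal degrees are a poor distance unit. For two points on the same latitude $$\phi$$ (a symbol introduced here for illustration), separated by a small difference in longitude, the arc distance is approximately: $\begin{equation*} d_{ij} \approx \mbox{R} * \Delta \mbox{Lon} * \cos \phi, \end{equation*}$ with $$\Delta \mbox{Lon}$$ expressed in radians. With R = 3959, one degree of longitude thus corresponds to roughly 69 miles at the equator, but only to about 49 miles at a latitude of 45 degrees, whereas the Euclidean distance computed from the decimal degrees equals 1 in both cases.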

With the Arc Distance (mi) option checked, the threshold distance becomes about 91 miles, as displayed in the dialog in Figure 35. Figure 35: Arc distance cut-off distance

Proceeding in the usual fashion (and saving the weights as natregimes_darc) adds the properties of the new weights to the list in the weights manager, shown in Figure 36.9 Note how the properties include the distance unit (mile), the points for which the distances were computed (centroids), as well as the threshold value, with the distance metric now set to arc. Figure 36: Arc distance weights properties

The resulting weights clearly demonstrate the pitfalls of using a distance-band when polygons (such as U.S. counties) are of widely varying sizes. This is similar to the issues encountered for points with different densities. From the property list in Figure 36, we see that the number of neighbors ranges from 1 to 85. The connectivity histogram in Figure 37 illustrates the extensive range of neighbor cardinalities, very similar to what we obtained for the Cleveland data in Figure 9. Figure 37: Connectivity histogram for arc distance weights

To illustrate the problem with the max-min cut-off distance, we search in the GWT file for the pair of observations that are a distance of 90.8652 apart. As shown in Figure 38, the cut-off distance is between the centroids of fipsno 32007 (Elko county, NV) and fipsno 16083 (Twin Falls, ID). Figure 38: Max nearest neighbor distance for US counties

Selecting the observation with fipsno 32007 in the Table (using the Selection Tool) highlights this location in all the open views. With the Connectivity Map open, we find that there is only one neighbor for the selected/highlighted Elko county, NV (FIPS code 32007). Figure 39: Elko county connectivity map

Clearly, the cut-off distance has major consequences for the smaller counties east of the Mississippi. As an illustration, we select the right-most bar in the connectivity histogram, the single location that has 85 neighbors. Checking the linked location in the table reveals this selection refers to Jessamine county, KY, with fipsno 21113.10 We can now identify this county in the Connectivity Graph. As shown in Figure 40, we find that the roughly 91 mile range selects the 85 small counties that neighbor Jessamine county. Figure 40: Connectivity graph for Jessamine county, KY

In practice, policy or theoretical considerations often dictate a given distance band (e.g., commuting distance). As we have seen, we need to be cautious before we uncritically translate these criteria into distance bands. Especially when the areal units in question are of widely varying sizes, there will be problems with the distribution of the neighbor cardinalities. In addition, isolates will result when the distance is insufficiently large.

1. University of Chicago, Center for Spatial Data Science – anselin@uchicago.edu

2. The right click needs to be exactly on the rectangle, since clicking in the white legend panel will result in a set of options that does not include the color choice for the category. Also, it has to be a right click; just clicking on the box will select all the points in the map.

3. The latitude is the $$y$$ dimension, and the longitude the $$x$$ dimension, so that the traditional reference to the pair (lat, lon) actually pertains to the coordinates as (y,x) and not (x,y).

4. The nearest neighbor distance is the smallest distance from a given point to all the other points, or, the distance from a point to its nearest neighbor.

5. The selection tool with unique_id = 11359 will select the observation in the table. Through the process of linking, it will also be highlighted in the map.

6. $$\mathbf{W'}$$ is the transpose of the weights matrix $$\mathbf{W}$$, such that rows of the original matrix become columns in the transpose. Each new weight is then $$(w_{ij} + w_{ji})/2$$.

7. For a more extensive technical discussion and historical background, see, e.g., Yamada (2016).

8. Most algorithms to construct Thiessen polygons break down when multiple locations have the same coordinates. This can be the result of rounding errors in the coordinates, but also when there are several observations at the same location (e.g., with data over different time periods). To create a valid layer with Thiessen polygons, only unique coordinates can be included in the point layer.

9. In this example, the earlier project file was loaded, so that the previously created contiguity weights are already contained in the weights manager list.

10. The county is easily identified by using the Move Selected to Top option in the table.