# Distance-Band Spatial Weights

*Luc Anselin*^{1}

*03/16/2018 (revised and updated)*

## Introduction

In this Chapter, we will continue to deal with the spatial weights functionality in GeoDa, but now we will focus on weights that use the notion of distance. Intrinsically, this is most appropriate for point layers, but we will see that it can readily be generalized to polygons as well.

We will initially use a data set with point locations of house sales for Cleveland, OH, but later return to our U.S. county Homicides data to illustrate the polygon case.

We will compute the distance between points to create distance-band weights, as well as k-nearest neighbor weights. We will examine the weights characteristics and pay particular attention to the issue of neighborless locations, or *isolates*. We also
consider generalizing the
concept of contiguity to points (using Thiessen polygons), and the notion of distance-based weights for polygons (using their centroids to compute the distances).

### Objectives

- Construct distance band spatial weights
- Understand the contents of a gwt weights file
- Assess the characteristics of distance-based weights
- Assess the effect of the max-min distance cut-off
- Identify isolates
- Construct k-nearest neighbor spatial weights
- Create Thiessen polygons from a point layer
- Construct contiguity weights for points and distance weights for polygons
- Understand the use of great circle distance

#### GeoDa functions covered

- Legend > set color for category
- Preferences > Use classic yellow cross-hatching to highlight selection in maps
- Weight File Creation dialog
    - distance-band weights
    - k-nearest neighbor weights
    - great circle distance option
- Weights Manager
    - Connectivity Histogram, Map and Graph
- Map
    - Thiessen Polygons option

### Getting started

To begin, we will use a data set that contains the location and sales price of 205 homes in a core area of Cleveland, OH for the fourth quarter of 2015.
The data set is included as one of the Center for Spatial Data Science example
data sets and can be downloaded from
Cleveland Home Sales (2015).
This yields a folder with the four files that make up the shape file, each with file name **clev_sls_154_core** and
one of the usual four file extensions (shp, shx, dbf and prj).

We get started by clearing the previous project and dropping the file **clev_sls_154_core.shp** into the **Drop files here** rectangle of the connect to data source dialog. A themeless base map is shown
in Figure 1.

#### Customizing a point map

To make this look a little nicer, we make three minor modifications. First, we add a base map using the
corresponding toolbar icon and pick **Nokia Day**. The transparency may need some adjustment to
make the points easier to see (use **Change Map Transparency** from the base map icon options menu).

Next, we change the color of the points themselves (actually, drawn as tiny circles). This is one of the options in the legend panel of the map. Right click on the green box to the left
of **(205)** and choose **Color for Category**, as in Figure 2.^{2}

In our example, we turned the circles to black, which results in the base map shown in Figure 3.

A third minor adjustment is to change the way selected and unselected points are
displayed. The default is to use transparency to distinguish them, but for points this
often does not provide a clear distinction, in the sense that the unselected points
may be hard to see. Instead, we go back to the *old* way of highlighting selected observations
in GeoDa, which is to use a different color.

We set this option in the **System** tab of the **GeoDa Preferences Setup**. As shown in Figure 4, the first item under **Maps** pertains to the way in which map selections are highlighted. The default is to have the check box **off**, but we change that so that the box is checked.

A selection of points (locations) in the map is now shown as *red*, whereas the
unselected points remain in black, as shown in Figure 5.

The points considered in our example are contained within an arbitrary rectangular selection taken from the full set of home sales for that period. We are now ready to proceed with spatial weights derived from a point layer.

## Distance-Band Weights

### Concepts

#### Distance metric

The core input into the determination of a neighbor relation for distance-based
spatial weights is a formal measure of distance, or a distance *metric*.
The most familiar special case is the *Euclidean* or straight line
distance, \(d_{ij}\), as the crow flies:
\[\begin{equation*}
d_{ij} = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}},
\end{equation*}\]
for two points \(i\) and \(j\), with respective coordinates \((x_{i},y_{i})\) and
\((x_{j},y_{j})\).
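
As a minimal sketch of this formula, the function below computes the straight line distance between two projected points. The function name and the coordinates are hypothetical and chosen only so that the result is easy to check by hand; they are not taken from the Cleveland data.

```python
import math


def euclidean_distance(p, q):
    """Straight line distance between two projected points given as (x, y)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


# Hypothetical projected coordinates (e.g., in feet).
point_i = (0.0, 0.0)
point_j = (3000.0, 4000.0)
print(euclidean_distance(point_i, point_j))  # 5000.0
```

The result is expressed in the same units as the coordinates, which is why the projection of the layer matters for the interpretation of the distance band.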

#### Great circle distance

Euclidean inter-point distances are only meaningful when the coordinates are recorded on a plane, i.e., for projected points.

In practice, one often works with unprojected points, expressed as degrees of latitude and longitude, in which case using a straight line distance measure is inappropriate, since it ignores the curvature of the earth. This is especially the case for longer distances, such as from the East Coast to the West Coast in the U.S.

The proper distance measure in this case is the so-called
*arc distance* or *great circle distance*. This
takes the latitude and longitude in decimal degrees as input
into a conversion formula.^{3} Decimal degrees are obtained
from the degree-minute-second value as degrees +
minutes/60 + seconds/3600.

The latitude and longitude in decimal degrees are converted into radians as:
\[\begin{eqnarray*}
\mbox{Lat}_r &=& (\mbox{Lat}_d - 90) * \pi/180\\
\mbox{Lon}_r &=& \mbox{Lon}_d * \pi/180,
\end{eqnarray*}\]
where the subscripts \(d\) and \(r\) refer respectively to decimal degrees and radians, and \(\pi = 3.14159 \dots\). With \(\Delta \mbox{Lon} = \mbox{Lon}_{r(j)} - \mbox{Lon}_{r(i)}\), the expression for the arc distance is:
\[\begin{eqnarray*}
d_{ij} &=& \mbox{R} * \arccos [ \cos ( \Delta \mbox{Lon} ) * \sin \mbox{Lat}_{r(i)} * \sin \mbox{Lat}_{r(j)}\\
&&+ \cos \mbox{Lat}_{r(i)} * \cos \mbox{Lat}_{r(j)} ],
\end{eqnarray*}\]
or, equivalently, in terms of the latitude in radians without the 90 degree shift, \(\phi = \mbox{Lat}_d * \pi/180\) (so that \(\sin \mbox{Lat}_r = -\cos \phi\) and \(\cos \mbox{Lat}_r = \sin \phi\)):
\[\begin{eqnarray*}
d_{ij} &=& \mbox{R} * \arccos [ \cos ( \Delta \mbox{Lon} ) * \cos \phi_{i} * \cos \phi_{j}\\
&&+ \sin \phi_{i} * \sin \phi_{j} ],
\end{eqnarray*}\]
where R is the radius of the earth. In GeoDa, the arc distance is obtained in miles with R = 3959, and in kilometers with R = 6371.
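
The arc distance computation can be sketched in a few lines. The code below is an illustration, not GeoDa's implementation: it applies the spherical law of cosines to the ordinary latitude in radians (without the 90 degree shift), uses the two radius values quoted above, and includes the degree-minute-second conversion. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3959.0  # radius values quoted in the text
EARTH_RADIUS_KM = 6371.0


def dms_to_decimal(degrees, minutes, seconds):
    """Decimal degrees from non-negative degree-minute-second values."""
    return degrees + minutes / 60.0 + seconds / 3600.0


def arc_distance(lat_i, lon_i, lat_j, lon_j, radius=EARTH_RADIUS_MI):
    """Great circle distance between two points given in decimal degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    delta_lon = math.radians(lon_j - lon_i)
    cos_angle = (math.cos(delta_lon) * math.cos(phi_i) * math.cos(phi_j)
                 + math.sin(phi_i) * math.sin(phi_j))
    # Rounding can push the value slightly outside [-1, 1] for nearby points.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


print(dms_to_decimal(41, 30, 0))  # 41.5
# One degree of longitude along the equator, in miles and in kilometers.
print(arc_distance(0.0, 0.0, 0.0, 1.0))
print(arc_distance(0.0, 0.0, 0.0, 1.0, radius=EARTH_RADIUS_KM))
```

Note that the arguments follow the (lat, lon) order, i.e., (y, x). For very short distances the arccos form loses numerical precision, which is one reason other formulations are sometimes preferred in practice.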

These calculated distance values are only approximate, since the radius of the earth is taken at the equator. A more precise measure would take into account the actual latitude at which the distance is measured. In addition, the earth’s shape is much more complex than a sphere, but the approximation serves our purposes.

#### Distance-band weights

The most straightforward spatial weights matrix constructed from a
distance measure is obtained when \(i\) and \(j\) are considered neighbors whenever \(j\)
falls within a critical *distance band* from \(i\). More precisely, \(w_{ij} = 1\)
when \(d_{ij} \le \delta\), and \(w_{ij} = 0\) otherwise, where \(\delta\) is
a preset critical distance cutoff.
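
The definition translates directly into a pairwise comparison against the cut-off. The sketch below assumes projected coordinates stored in a dictionary keyed by a hypothetical observation ID, and returns neighbor lists rather than a full matrix. It requires Python 3.8 or later for `math.dist`.

```python
import math


def distance_band_neighbors(points, cutoff):
    """Neighbor lists with w_ij = 1 when d_ij <= cutoff (for i != j)."""
    neighbors = {i: [] for i in points}
    ids = list(points)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            i, j = ids[a], ids[b]
            if math.dist(points[i], points[j]) <= cutoff:
                # The distance band relation is symmetric.
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Hypothetical coordinates: three points close together and one far away.
points = {"A": (0, 0), "B": (3, 4), "C": (3, 0), "D": (20, 20)}
print(distance_band_neighbors(points, 4.0))
# {'A': ['C'], 'B': ['C'], 'C': ['A', 'B'], 'D': []}
```

With a cut-off of 4, point D ends up without neighbors, which is the situation the max-min criterion is designed to avoid.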

In order to avoid isolates (islands) that would result from too
stringent a critical distance, the distance must be chosen such
that each
location has at least one neighbor. Such a distance conforms
to a *max-min* criterion, i.e., it is the largest of the
nearest neighbor distances.^{4}
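
A brute-force sketch of the max-min criterion follows: compute each point's nearest neighbor distance, then take the largest of these values. The coordinates and function names are hypothetical, and the quadratic loop is only meant for small illustrations.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each point to its nearest neighbor."""
    result = {}
    for i, p in points.items():
        result[i] = min(math.dist(p, q) for j, q in points.items() if j != i)
    return result


def max_min_distance(points):
    """Largest of the nearest neighbor distances."""
    return max(nearest_neighbor_distances(points).values())


# Hypothetical coordinates: three points close together and one far away.
points = {"A": (0, 0), "B": (3, 4), "C": (3, 0), "D": (20, 20)}
print(nearest_neighbor_distances(points))
print(max_min_distance(points))
```

In this toy example, the remote point D determines the cut-off, even though the other three points are only 3 to 5 units apart.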

In practice, the max-min criterion often leads to too many neighbors for locations that are somewhat clustered, since the critical distance is determined by the point that is farthest from its nearest neighbor. This problem frequently occurs when the density of the points is uneven across the data set, such as when some of the points are clustered and others more spread out. We revisit this problem in the illustrations below.

Further technical details on distance-based spatial weights are contained in Chapters 3 and 4 of Anselin and Rey (2014), although the software illustrations are for an earlier GeoDa interface design.

### Creating distance-band weights

As we did for contiguity weights, we invoke the **Weights Manager** and click on the **Create** button to get the process started. In the **Weights File Creation** interface, after specifying the ID variable (**unique_id**), we focus on the right-most button, **Distance Weight**. This generates the dialog for
the three distance weight options: **Distance band**, **K-Nearest neighbors**, and **Adaptive kernel**.
In this Chapter, we focus on the first two options.

Distance band weights are initiated by selecting the **Distance band** button in the interface, as
shown in Figure 6. This is also the default option. The max-min distance (largest nearest neighbor distance) is given in the box next to
**Specify bandwidth**, in units appropriate for the projection used. In our example, these are feet.

Note the importance of the **Distance metric**, highlighted in Figure 6. Since our data is projected, it is appropriate to use **Euclidean** (straight line) distance. However, many data sets come in simple latitude-longitude, for which a great circle distance (or arc distance) must be used instead. We will revisit this shortly.

The critical distance for our point data is about 3598 feet, or roughly 0.7 miles. This is the distance that ensures that each point (house sale) has at least one neighbor. After clicking on the **Create** button and specifying a file name, such as **clev_sls_154_core_d**
(the GWT file extension is added automatically), the new weights and their
summary properties are listed in the **Weights Manager**, as in Figure 7.

An important aspect of the metadata in the weights manager is the **threshold value**. This
information will also be included in any **Project File** that is saved. This is the only reliable way to remember which distance cut-off was used for the distance bands in the weights.

### GWT file

Distance-based weights are saved in files with a **GWT** file extension. This format,
illustrated in Figure 8, is slightly different from the GAL format used for contiguity weights. It was first introduced in SpaceStat in 1995, and later adopted by R spdep and other software packages. The header line is the same as for GAL files, but each pair of neighbors is listed, with the ID of the observation, the ID of its neighbor and the distance that separates them. This distance is currently only included for informational purposes, since GeoDa does not use the actual distance value in any statistical operations (spatial weights are also row-standardized by default).
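
To make the layout concrete, the sketch below reads a small GWT-style text into a dictionary. It assumes a single whitespace-separated header line followed by one `ID neighbor-ID distance` triplet per line; the header is kept as opaque tokens, and the sample content (layer name, IDs and distances) is entirely made up for illustration.

```python
def parse_gwt(text):
    """Return the header tokens and a dict {id: {neighbor_id: distance}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    neighbors = {}
    for line in lines[1:]:
        i, j, d = line.split()
        neighbors.setdefault(i, {})[j] = float(d)
    return header, neighbors


def longest_link(neighbors):
    """The (id, neighbor_id, distance) triplet with the largest distance."""
    triplets = ((i, j, d) for i, nbrs in neighbors.items() for j, d in nbrs.items())
    return max(triplets, key=lambda t: t[2])


# Hypothetical file content, not the Cleveland weights.
sample = """0 3 example_layer unique_id
1 2 1500.5
2 1 1500.5
2 3 3200.0
3 2 3200.0
"""

header, neighbors = parse_gwt(sample)
print(header)
print(neighbors["2"])
print(longest_link(neighbors))
```

Because each neighbor pair appears twice (once in each direction) for symmetric weights, a lookup such as `longest_link` returns only the first of the two matching lines.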

### Weights characteristics

In the same way as for contiguity weights, we can assess the characteristics of
distance-based weights by means of the **Connectivity Histogram**, the **Connectivity
Map**, and the **Connectivity Graph**, available through the buttons at the bottom of the weights manager.

#### Connectivity histogram

The shape of the connectivity histogram for distance-band weights is typically very different from that of contiguity-based weights (as in any histogram, we can bring up descriptive statistics through the
**View > Display Statistics** option). As illustrated in Figure 9, we see a much larger range in the number of neighbors, as well as extremes, with some observations having only one neighbor, and others having 32.

We also observe these descriptive statistics in the property list shown in Figure 7. Compared to contiguity weights, the mean (12.64) and median (13.00) number of neighbors are much higher, and the matrix is also much denser (% non-zero = 6.17%).
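
The same summary measures are straightforward to derive from a set of neighbor lists. The sketch below uses a hypothetical four-observation example and assumes that % non-zero is the number of non-zero weights divided by the total number of cells in the n by n matrix.

```python
from statistics import mean, median


def cardinality_summary(neighbors):
    """Summary of the number of neighbors per observation."""
    counts = [len(nbrs) for nbrs in neighbors.values()]
    n = len(counts)
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "median": median(counts),
        # Share of non-zero cells in the n x n weights matrix, in percent.
        "pct_nonzero": 100.0 * sum(counts) / (n * n),
        "isolates": [i for i, nbrs in neighbors.items() if not nbrs],
    }


# Hypothetical neighbor lists.
neighbors = {"A": ["C"], "B": ["C"], "C": ["A", "B"], "D": []}
print(cardinality_summary(neighbors))
```

A minimum of zero in such a summary signals the presence of observations without neighbors.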

The range in the number of neighbors is directly related to the spatial distribution of the points. Locations that are somewhat isolated will drive the determination of the largest nearest neighbor cut-off point (their nearest neighbor distance will be large), whereas dense clusters of locations will encompass many neighbors using this large cut-off distance.

#### Finding the locations separated by the max-min distance

In the Cleveland example, we can examine the GWT file to find the observation pair separated by the critical distance cut-off (3598). As shown in Figure 10, this turns out to be the observation pair with unique_id 11359 and 10014.

Selecting the observation with unique_id 11359 in the Table highlights the corresponding
location in red in the
point map, as indicated with the pointer in
Figure 11.^{5}

We now open the **Connectivity Map**, shown in Figure 12. Since this map is a regular themeless map, we can customize
it in the same way as the original point map by adding a base layer (Nokia day) and changing the
color category to black. As the current selection,
the point with **unique_id** 11359 will be highlighted in red. With the pointer
hovering over the selected point, the status bar of the connectivity map
will show the selected observation (11359) and its one neighbor (10114, in black), as highlighted in
Figure 12.

#### Finding the most connected observations

We can examine the effect of the large distance cut-off on more densely distributed point locations. For example, selecting the right-most bar in the **Connectivity Histogram** will highlight the two most connected observations in the map. The connectivity histogram shows
that these have 32 neighbors. The two points are highlighted in red in the map in Figure 13.

As expected, the two points in question are in the center of a dense cluster of sales transactions.
Through linking, they are also selected in the **Connectivity Map**. We can find their unique_id
values from the table (**Move Selected to Top**). For example, for the observation with unique_id=19785, we see in the connectivity map that this point has 32 neighbors, highlighted as
black circles in Figure 14.

The unequal distribution of the neighbor cardinality in distance-band weights is often an undesirable feature. Therefore, when the spatial distribution of the points is highly uneven, distance-band weights should be avoided, since they could provide misleading impressions of (local) spatial autocorrelation. We examine some alternatives below.

#### Connectivity graph

The properties of the distance band weights can be further investigated by means of the
**Connectivity Graph**. As before, this is invoked through the right-most button at
the bottom of the weights manager.

The pattern shown in Figure 15 highlights how the connectivity divides the points into two interconnected subgraphs and two pairs of points. The different sub-networks have no connection between them. In Figure 15, the pointer is centered on location 19195, which has 32 neighbors and is located in the center of a tightly connected network. We can also identify a few locations that are only connected with their nearest neighbor, but not with any other locations.
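
The separate sub-networks in a connectivity graph correspond to its connected components. As a sketch, a breadth-first search over the neighbor lists recovers them; the IDs below are hypothetical and the function is not part of GeoDa.

```python
from collections import deque


def connected_components(neighbors):
    """Groups of observations that are linked, directly or indirectly."""
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in neighbors[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(component))
    return components


# Hypothetical symmetric neighbor lists: a small network, a pair and an isolate.
neighbors = {1: [2, 3], 2: [1], 3: [1], 4: [5], 5: [4], 6: []}
print(connected_components(neighbors))  # [[1, 2, 3], [4, 5], [6]]
```

The traversal follows links in one direction only, so it identifies the components correctly for symmetric weights such as a distance band.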

### Isolates

So far, we have used the default cut-off value for the distance band. However, the dialog is flexible enough that we can type in any value for the cut-off, or use the moveable button to drag to any value larger than the minimum. Sometimes, theoretical or policy considerations suggest a specific value for the cut-off that may be smaller than the max-min distance.

For example, say we want to use 1500 ft. as the distance band. After typing in that value in the dialog, shown in Figure 16, we proceed in the usual way to create the weights.

However, a warning appears, as in Figure 17, pointing out that the specified cut-off value is smaller than the max-min distance needed to ensure that each observation has at least one neighbor.

If we proceed and click **Yes** in the dialog, the properties of the new weights are listed in
the weight manager, as in Figure 18. This includes the **threshold value** of 1500,
but also shows a much sparser distribution, with **%non-zero** as 1.48% (compared to 6.17% for the
default). In addition, the minimum number of neighbors is indicated to be **0**. In other words,
one or more observations do not have neighbors when a distance band of 1500 feet is used.

#### Isolates in the connectivity histogram

The connectivity histogram shown in Figure 19 reveals a much more compact distribution of neighbor cardinality (compared to the max-min criterion). However, it also suggests the existence of 24 isolates, i.e., observations without neighbors. This is given as a warning at the top of the histogram (highlighted in red in the figure), but can also be seen by hovering the pointer over the first bin. The status bar reveals that the range 0-1 has 24 observations.

When selecting the left-most bar in the histogram, we can locate the isolated points in the map, as in Figure 20 (the red points are the selected observations). The selected points are indeed locations that are far away from the other points (more than 1500 feet).

#### Isolates in the connectivity graph

The most dramatic visualization of the isolates is given by the **Connectivity Graph**, shown in
Figure 21. The 24 points without an edge in the graph to another point are easily
identified. In our example, the pointer is over location with unique_id 62122. The status bar indicates
that this point has 0 neighbors.

#### How to deal with isolates

Since the isolated observations are not included in the spatial weights (in effect, the corresponding row in the spatial weights matrix consists of zeros), they are not accounted for in any spatial analysis, such as tests for spatial autocorrelation, or spatial regression. For all practical purposes, they should be removed from such analysis. However, they can still
be included in a traditional *non-spatial* data analysis.

Ignoring isolates may cause problems in the calculation of spatially lagged variables, or measures of local spatial autocorrelation. By construction, the spatially lagged variable will be zero, which may suggest spurious correlations.
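
The effect on a spatially lagged variable can be illustrated with a small sketch. It assumes row-standardized weights, so that the lag is the average of the neighboring values, and it assigns zero to an observation without neighbors, mirroring the behavior described above. The values and IDs are hypothetical.

```python
def spatial_lag(values, neighbors):
    """Average of the neighbors' values; zero for an isolate."""
    lag = {}
    for i, nbrs in neighbors.items():
        if nbrs:
            lag[i] = sum(values[j] for j in nbrs) / len(nbrs)
        else:
            lag[i] = 0.0  # a row of zeros in the weights matrix
    return lag


# Hypothetical sales prices (in thousands) and neighbor lists; D is an isolate.
values = {"A": 100.0, "B": 120.0, "C": 110.0, "D": 300.0}
neighbors = {"A": ["C"], "B": ["C"], "C": ["A", "B"], "D": []}
print(spatial_lag(values, neighbors))
# {'A': 110.0, 'B': 110.0, 'C': 110.0, 'D': 0.0}
```

Here the highest value in the data set (D) is paired with a lag of zero, a combination that reflects the absence of neighbors and not the surrounding prices.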

Alternatives where isolates are avoided by design are the K-nearest neighbor weights and contiguity weights constructed from the Thiessen polygons for the points. They are discussed next.

## K-Nearest Neighbor Weights

### Concept

As mentioned, an alternative type of distance-based spatial weights that avoids the problem
of isolates is \(k\)-nearest neighbor weights. In contrast
to the distance band, this is not a symmetric relation. The fact that B is the nearest
neighbor to A does not imply that A is the nearest neighbor to B. There may be another point
C that is actually closer to B than A. This asymmetry can cause problems in analyses that
depend on the intrinsic symmetry of the weights (e.g., some algorithms to estimate spatial
regression models). One solution is to replace the original weights matrix \(\mathbf{W}\) by
\((\mathbf{W + W'})/2\), which is symmetric by construction.^{6} GeoDa currently does not implement
this approach.
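
The asymmetry and the \((\mathbf{W + W'})/2\) adjustment can be sketched for three hypothetical points on a line. The code below is an illustration outside of GeoDa: it works with binary (not row-standardized) weights, and it breaks ties deterministically by ID, which is a simplifying assumption.

```python
import math


def knn_neighbors(points, k):
    """The k nearest neighbors of each point (ties broken by ID)."""
    neighbors = {}
    for i, p in points.items():
        others = sorted((math.dist(p, q), j) for j, q in points.items() if j != i)
        neighbors[i] = [j for _, j in others[:k]]
    return neighbors


def is_symmetric(neighbors):
    """True when every neighbor relation also holds in the other direction."""
    return all(i in neighbors[j] for i, nbrs in neighbors.items() for j in nbrs)


def symmetrize(neighbors):
    """Non-zero elements of (W + W') / 2 for binary weights W."""
    w = {(i, j): 1.0 for i, nbrs in neighbors.items() for j in nbrs}
    sym = {}
    for (i, j) in w:
        value = (w.get((i, j), 0.0) + w.get((j, i), 0.0)) / 2
        sym[(i, j)] = value
        sym[(j, i)] = value
    return sym


# Hypothetical points: B is nearest to C, but A is nearest to B.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
knn = knn_neighbors(points, 1)
print(knn)                # {'A': ['B'], 'B': ['A'], 'C': ['B']}
print(is_symmetric(knn))  # False
print(symmetrize(knn))
```

After the adjustment, the one-directional link from C to B becomes a pair of weights of 0.5, whereas the mutual link between A and B keeps a weight of 1.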

A potential issue with \(k\)-nearest neighbor weights is the occurrence of ties, i.e., when more than one location \(j\) has the same distance from \(i\). A number of solutions exist to break the tie, from randomly selecting one of the \(k\)-th order neighbors, to including all of them. In GeoDa, random selection is implemented.

### Creating KNN weights

KNN weights are computed by selecting the corresponding button in
the **Distance Weight** panel of the **Weights File Creation** interface. The value for the **Number of neighbors** (k) is specified in the box shown in Figure 22. The default is 4, but in our example, we have selected **6**.

The weights (saved as the file **clev_sls_154_core_k6**) are added to the collection contained in the weights manager. In addition, all its properties are listed, as illustrated in Figure 23. Note
that the properties now include the number of neighbors (instead of the distance threshold value,
as is the case of distance-band weights). Also, **symmetry** is set to asymmetric, which is a
fundamental difference with distance-band weights.

#### Properties of KNN weights

The properties listed in the weights manager also include the mean and median number of neighbors, which of course equal k (in our example, they equal 6). The resulting weights matrix is much sparser than the distance-band weights (2.93% compared to 6.17%).
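
As a quick check on these sparsity figures, and assuming that % non-zero is the share of non-zero elements among the \(n^{2}\) cells of the weights matrix (an interpretation that matches the reported values, but is not spelled out in the weights manager), the \(k\)-nearest neighbor case with \(n = 205\) and \(k = 6\) gives:
\[\begin{equation*}
\frac{n \times k}{n^{2}} = \frac{k}{n} = \frac{6}{205} \approx 0.0293,
\end{equation*}\]
or 2.93%. The same reasoning with the mean number of neighbors for the distance-band weights yields \(12.64 / 205 \approx 0.0617\), or 6.17%.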

Again, we can also use the connectivity histogram and the connectivity map to inspect the neighbor characteristics of the observations. However, in this case, the histogram doesn’t make much sense, since all observations have the same number of neighbors (by construction), as shown in Figure 24.

In contrast, the connectivity graph, shown in Figure 25, clearly demonstrates how each point is connected to six other points. In our example, this yields a fully connected graph instead of the collection of sub-graphs for the distance band.

In Figure 25, we have the pointer over observation with unique_id 9372. The status bar lists the ID values for its six nearest neighbors.

#### KNN and distance

One drawback of the k-nearest neighbor approach is that it ignores the distances involved. The first k neighbors are selected, irrespective of how near or how far they may be. This suggests a notion of distance decay that is not absolute, but relative, in the sense of intervening opportunities (e.g., you consider the two closest grocery stores, irrespective of how far they may be).

This can be illustrated in the Cleveland example by selecting an observation in the western part of the map, where the house sales are densely distributed in space. With the pointer on one of the circles in the connectivity map, we distinguish six black circles close by, as shown in Figure 26.

By contrast, if we move to the eastern part of the data set and similarly select an observation with the pointer, the six neighbors are much farther apart, as in Figure 27.

This relative distance effect should be kept in mind before mechanically applying a k-nearest neighbor criterion.
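
One way to gauge this effect is to compute, for each point, the distance to its k-th nearest neighbor. The sketch below does so for a hypothetical layout with a tight cluster and two remote points; the coordinates and the function name are assumptions made for the illustration.

```python
import math


def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbor."""
    result = {}
    for i, p in points.items():
        dists = sorted(math.dist(p, q) for j, q in points.items() if j != i)
        result[i] = dists[k - 1]
    return result


# Hypothetical layout: a, b and c are clustered; d and e are spread out.
points = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0),
    "d": (50.0, 0.0), "e": (100.0, 0.0),
}
print(kth_neighbor_distance(points, 2))
```

With k = 2, the second neighbor of point a is 1 unit away, whereas the second neighbor of point e is 99 units away, even though both count equally as neighbors in the weights.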

## Generalizing the Concept of Contiguity

In GeoDa, the concept of contiguity can be generalized to point layers by converting the latter to a tessellation, specifically Thiessen polygons. Queen or rook contiguity weights can then be created for the polygons, in the usual way.

Similarly, the concepts of distance-band weights and k-nearest neighbor weights can be generalized to polygon layers. The layers are represented by their central points and the standard distance computations are applied.

These operations can be carried out explicitly, by actually creating a separate Thiessen polygon
layer or centroid point layer, and subsequently loading it into GeoDa as a new project.
Alternatively, the computations happen *under the hood*, in the sense that it is not
necessary to create a separate layer. In this way, it is possible to use the weights
manager to create contiguity weights for points or distance weights for polygons directly
in the **Weights File Creation** dialog.

We briefly consider these options.

### Contiguity-based weights for points

#### Thiessen polygons

An alternative solution to deal with the problem of the uneven distribution of neighbor cardinality for distance-band weights is to compute a measure of contiguity. This is accomplished by turning the points into Thiessen polygons. These are also referred to as Voronoi diagrams, and they are closely related to Delaunay triangulations, which connect the points whose polygons share a border.^{7}

In general terms, a Thiessen polygon is a tessellation (a way to divide an area into regular subareas) that encloses all locations that are closer to the central point than to any other point. In economic geography, this is a (simplistic) notion of a market area, in the sense that all consumers in the polygon would patronize the seller located at the central point. The polygons are constructed by combining lines perpendicular at the midpoint of a line that connects a point to its nearest neighbors. From this, the most compact polygon is created.
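
The defining property (each location belongs to the polygon of its closest point) can be sketched without constructing any geometry. The function below simply returns the ID of the nearest seed point for an arbitrary location; the seed coordinates and IDs are hypothetical, and this is not how the polygons themselves are drawn.

```python
import math


def thiessen_cell(location, seeds):
    """ID of the seed point whose Thiessen polygon contains the location."""
    return min(seeds, key=lambda i: math.dist(location, seeds[i]))


# Hypothetical seed points (e.g., sellers) and customer locations.
seeds = {"s1": (0.0, 0.0), "s2": (10.0, 0.0), "s3": (0.0, 10.0)}
print(thiessen_cell((2.0, 1.0), seeds))  # s1
print(thiessen_cell((8.0, 3.0), seeds))  # s2
print(thiessen_cell((4.0, 9.0), seeds))  # s3
```

A location that is exactly equidistant from two seeds lies on the shared border of their polygons; in this sketch it is assigned to whichever seed comes first in the dictionary.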

#### Creating Thiessen polygons from a point layer

In any point map, the tessellation is invoked as an option, by right-clicking on the map. Similar
to the setup for **Shape Centers**, **Thiessen Polygons** gives a choice between simply
displaying the polygons over the point layer, or saving them as a separate
layer, as shown in Figure 28.

There is a third option (greyed out in the example below) that applies to situations where
multiple points share the same location (e.g., apartments in the same high rise).
GeoDa has an option to save the information for the duplicate points to a
table.^{8}

The map overlay is illustrated in Figure 29 for the Cleveland house sale points. Each polygon encloses exactly one point. Besides the characteristic distortions at the edge of the layer, we notice a great discrepancy between the polygon sizes in the areas with a dense distribution of points and the areas where they are further apart.

#### Contiguity weights for Thiessen polygons

When selecting rook or queen contiguity in the **Contiguity Weight** panel of the
weights file creation dialog, the Thiessen polygons are constructed in the background
and the contiguity criteria applied to them. For example, for our Cleveland point data, we can create
a queen contiguity weights file in the standard way (e.g., as **clev_sls_154_core_q**). The
file name subsequently shows up in the weights manager list, as
illustrated in Figure 30.

The descriptive statistics in the properties list as well as the associated connectivity histogram illustrate why this approach may be a useful alternative to distance-band or k-nearest neighbor weights. In Figure 31, the connectivity histogram is shown with the statistics displayed.

The histogram represents a much more symmetric and compact distribution of the neighbor cardinalities, very similar to the typical shape of the histogram for first order contiguity between polygons. The median number of neighbors is 6 and the average 5.6, with a limited spread around these values. In many instances where the point distribution is highly uneven, this approach provides a useful compromise between the distance-band and the k-nearest neighbors.

This is further illustrated by the connectivity graph for the queen contiguity, shown in Figure 32. The pointer over point with unique_id 40295 shows this location having six neighbors, compared to only one in Figure 15 for the default distance band contiguity. The more balanced structure is also reflected by the fully connected graph.

Before moving on to the next option, we save the weights information in a **Project File**
(e.g., **clev_sls_154_core.gda**), using **File > Save Project**.

### Distance-based weights for polygons

To illustrate the application of distance-based weights to polygons, the current project
needs to be cleared and the U.S. county homicide file (**natregimes**) loaded, either
as a shape file (**natregimes.shp**, without any weights information), or from a
previously created project
file (**natregimes.gda**, with the weights information).

As we have seen before, the polygon layer has a series of **Shape Center** options to
add the centroid or mean center information to the data table, display those points on the
map, or save them as a separate point layer.

In order to create distance weights for polygons, such as the U.S. counties, there is no need to explicitly save or display the centroids. The calculation happens in the background, whenever a distance option is chosen in the weights file creation dialog.
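
To give an idea of what such a background calculation involves, the sketch below computes the area-weighted centroid of a simple polygon from its vertices. Whether GeoDa uses exactly this formula is an assumption; the function name and the rectangle are hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon.

    The vertices are (x, y) pairs listed in order around the boundary,
    without repeating the first vertex at the end.
    """
    twice_area = cx = cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical rectangle, 4 units wide and 2 units high.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(polygon_centroid(rectangle))  # (2.0, 1.0)
```

Once each polygon is reduced to a single point in this way, the distance-band and k-nearest neighbor criteria apply exactly as they do for a point layer.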

We proceed as usual, and select the **Distance Weight** option in the **Weights File Creation**
dialog. With
**fipsno** as the ID variable and **Distance Band** as the type of weight, the **Specify bandwidth**
box will show a cut-off distance of 1.465776, as in Figure 33.

However, this distance cut-off is for the default setting of **Euclidean
Distance**. For the U.S. counties, the geographic layer is provided in latitude-longitude
decimal degrees (i.e., the coordinates are unprojected). Consequently, the use of a straight line Euclidean distance is inappropriate (at least, for larger distances). Instead, the great circle distance or arc distance needs to be computed. So far, we have only considered Euclidean distance (the default), but the drop down list in the weights file creation interface also includes **Arc Distance** (in miles or in kilometers), as shown in Figure 34.

With the **Arc Distance (mi)** option checked, the threshold distance becomes about
91 miles, as displayed in the dialog in Figure 35.

Proceeding in the usual fashion (and saving the weights as **natregimes_darc**) adds the
properties of the new weights to the list in the weights manager, shown in
Figure 36.^{9} Note how the properties include the
distance unit (**mile**), the points for which the distances were
computed (**centroids**), as well as the threshold value, with the distance metric now set to **arc**.

The resulting weights clearly demonstrate the pitfalls of using a distance-band when polygons (such as U.S. counties) are of widely varying sizes. This is similar to the issues encountered for points with different densities. From the property list in Figure 36, we see that the number of neighbors ranges from 1 to 85. The connectivity histogram in Figure 37 illustrates the extensive range of neighbor cardinalities, very similar to what we obtained for the Cleveland data in Figure 9.

To illustrate the problem with the max-min cut-off distance, we search in the GWT file for the pair of observations that are a distance of 90.8652 apart. As shown in Figure 38, the cut-off distance is between the centroids of fipsno 32007 (Elko county, NV) and fipsno 16083 (Twin Falls, ID).

Selecting the observation with fipsno 32007 in the Table (using the **Selection Tool**) highlights
this location in all the open views. With the **Connectivity Map** open, we find that there
is only one neighbor for the selected/highlighted Elko county, NV (FIPS code 32007).

Clearly, the cut-off distance
has major consequences for the smaller counties east of the Mississippi. As an illustration, we
select the right-most bar in the connectivity histogram, the single location that has 85 neighbors.
Checking the linked location in the table reveals this selection refers to Jessamine county, KY, with fipsno 21113.^{10} We
can now identify this county in the **Connectivity Graph**. As shown in Figure 40,
we find that the roughly 91 mile range selects the 85 small counties that neighbor Jessamine county.

In practice, policy or theoretical considerations often dictate a given distance band (e.g., commuting distance). As we have seen, we need to be cautious before we uncritically translate these criteria into distance bands. Especially when the areal units in question are of widely varying sizes, there will be problems with the distribution of the neighbor cardinalities. In addition, isolates will result when the distance is insufficiently large.

## References

Anselin, Luc, and Sergio J. Rey. 2014. *Modern Spatial Econometrics in Practice, a Guide to Geoda, Geodaspace and Pysal*. Chicago, IL: GeoDa Press.

Yamada, Ikuho. 2016. “Thiessen Polygons.” *The International Encyclopedia of Geography*, 1–6.

^{1} University of Chicago, Center for Spatial Data Science – anselin@uchicago.edu

^{2} The right click needs to be exactly on the rectangle, since clicking in the white legend panel will result in a set of options that does not include the color choice for the category. Also, it has to be a right click; just clicking on the box will select all the points in the map.

^{3} The latitude is the \(y\) dimension, and the longitude the \(x\) dimension, so that the traditional reference to the pair (lat, lon) actually pertains to the coordinates as (y,x) and not (x,y).

^{4} The nearest neighbor distance is the smallest distance from a given point to all the other points, or, the distance from a point to its nearest neighbor.

^{5} The selection tool with **unique_id** = 11359 will select the observation in the table. Through the process of linking, it will also be highlighted in the map.

^{6} \(\mathbf{W'}\) is the *transpose* of the weights matrix \(\mathbf{W}\), such that rows of the original matrix become columns in the transpose. Each new weight is then \((w_{ij} + w_{ji})/2\).

^{7} For a more extensive technical discussion and historical background, see, e.g., Yamada (2016).

^{8} Most algorithms to construct Thiessen polygons break down when multiple locations have the same coordinates. This can be the result of rounding errors in the coordinates, but also when there are several observations at the same location (e.g., with data over different time periods). To create a valid layer with Thiessen polygons, only unique coordinates can be included in the point layer.

^{9} In this example, the earlier project file was loaded, so that the previously created contiguity weights are already contained in the weights manager list.

^{10} The county is easily identified by using the **Move Selected to Top** option in the table.