# GIS Spatial Analysis and Application Modeling

1. Spatial analysis of vector data
2. Spatial analysis of raster data
3. Spatial analysis of 3D data

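## Statistical Analysis of Spatial Data

### General Statistical Analysis

General statistical analysis of GIS attribute data:

1. First run a **descriptive statistical analysis** on the data to uncover its underlying patterns.
2. Then choose the methods for further analysis.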

Frequency distribution:

1. Frequency (count)
2. Relative frequency
3. Frequency distribution chart
4. Frequency histogram
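The frequency measures above can be computed in a few lines. The following is a minimal Python sketch, not a GIS-specific tool: `frequency_table` and `histogram_counts` are illustrative names, the land-use labels and attribute values are made-up sample data, and the histogram bins are assumed to be equal-width and half-open.

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to (frequency, relative frequency), sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in sorted(counts.items())}

def histogram_counts(values, start, width, nbins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Values that fall outside the nbins bins are ignored.
    """
    counts = [0] * nbins
    for x in values:
        k = int((x - start) // width)  # index of the bin containing x
        if 0 <= k < nbins:
            counts[k] += 1
    return counts

# Hypothetical land-use labels for eight polygons.
land_use = ["forest", "farmland", "forest", "water",
            "urban", "forest", "farmland", "urban"]
print(frequency_table(land_use))
# {'farmland': (2, 0.25), 'forest': (3, 0.375), 'urban': (2, 0.25), 'water': (1, 0.125)}

# Hypothetical attribute values, binned into widths of 10 starting at 40.
values = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print(histogram_counts(values, start=40, width=10, nbins=7))  # [4, 0, 5, 4, 3, 0, 1]
```

The bin counts are the bar heights of a frequency histogram; dividing each count by the total gives the relative-frequency version.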

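Measures of central tendency:

1. Mean
• Arithmetic: $$\frac{1}{n}\sum_{i=1}^{n} a_i.$$
• Weighted: $$\bar{a}_w=\frac{\sum_{i=1}^{n} w_i a_i}{\sum_{i=1}^{n} w_i};$$ for a continuous random variable with density $f(x)$, the corresponding density-weighted mean is $$\int_{-\infty}^{\infty} xf(x)\,dx$$
• Harmonic mean
• Weighted harmonic mean
• Geometric mean
2. Median
• For a continuous random variable with density $f(x)$, the median $m$ satisfies: $$\operatorname{P}(X\leq m) = \operatorname{P}(X\geq m)=\int_{-\infty}^m f(x)\, dx=\frac{1}{2}.$$
3. Mode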
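As a sketch of the mean variants listed above, plus the median and mode, the following Python uses illustrative helper names and a tiny hypothetical sample. `median` and `multimode` come from Python's standard `statistics` module; the harmonic and geometric means here assume strictly positive values.

```python
import math
from statistics import median, multimode

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    # sum(w_i * x_i) / sum(w_i)
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def harmonic_mean(xs):
    # n / sum(1 / x_i); assumes all values are positive
    return len(xs) / sum(1 / x for x in xs)

def weighted_harmonic_mean(xs, ws):
    # sum(w_i) / sum(w_i / x_i); assumes all values are positive
    return sum(ws) / sum(w / x for x, w in zip(xs, ws))

def geometric_mean(xs):
    # exp(mean of ln x_i), i.e. the n-th root of the product; assumes positive values
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

xs = [2, 4, 8]   # hypothetical sample
ws = [1, 1, 2]   # hypothetical weights
print(arithmetic_mean(xs))             # ≈ 4.667
print(weighted_mean(xs, ws))           # 5.5
print(harmonic_mean(xs))               # ≈ 3.429
print(weighted_harmonic_mean(xs, ws))  # 4.0
print(geometric_mean(xs))              # ≈ 4.0
print(median(xs))                      # 4
print(multimode([1, 2, 2, 3, 3]))      # [2, 3]
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, as this sample shows.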

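Measures of dispersion:

1. Variance: $\sigma^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$ (dividing by $n-1$ instead gives the sample variance)
• Related: for $n$ independent observations, the mean square error of the sample mean is $\operatorname{MSE}(\overline{X})=\operatorname{E}((\overline{X}-\mu)^2)=\left(\frac{\sigma}{\sqrt{n}}\right)^2= \frac{\sigma^2}{n}$, where $\sigma^2$ is the population variance.
2. Standard deviation
3. Range
4. Deviation
5. Mean deviation
• Mean absolute deviation: $$\frac{1}{n}\sum_{i=1}^n |x_i-m(X)|,$$ where $m(X)$ is a chosen measure of central tendency, such as the mean or the median.
6. Sum of squared deviations
7. Coefficient of variation: $CV=\sigma/\bar{x}$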
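A minimal Python sketch of the dispersion measures listed above. It assumes the population (divide-by-$n$) variance and uses the mean as $m(X)$ in the mean absolute deviation; `dispersion_summary` and the sample values are illustrative.

```python
import math

def dispersion_summary(xs):
    """Dispersion measures for a list of numbers, using the population (divide-by-n) variance."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]      # deviation of each value from the mean
    sum_sq = sum(d * d for d in deviations)  # sum of squared deviations
    variance = sum_sq / n                    # divide by n - 1 instead for the sample variance
    std = math.sqrt(variance)
    return {
        "range": max(xs) - min(xs),
        "deviations": deviations,
        "mean_abs_dev": sum(abs(d) for d in deviations) / n,  # m(X) taken as the mean here
        "sum_sq_dev": sum_sq,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation; undefined when the mean is 0
    }

summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample
print(summary["variance"], summary["std"], summary["cv"])  # 4.0 2.0 0.4
```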

Measures of distribution shape:

1. Skewness
• Skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean. Its value can be positive, negative, or even undefined.
• Skewness is sometimes denoted $\operatorname{Skew}[X]$. It is defined as the third standardized moment; expanding this definition expresses it in terms of the non-central moment $\operatorname E[X^3]$ (using $\operatorname E[X]=\mu$ and $\operatorname E[X^2]-\mu^2=\sigma^2$):
$$\begin{aligned} \gamma_1 &= \operatorname{E}\left[\left(\frac{X-\mu}{\sigma}\right)^3 \right] \\ &= \frac{\operatorname{E}[X^3] - 3\mu\operatorname E[X^2] + 3\mu^2\operatorname E[X] - \mu^3}{\sigma^3}\\ &= \frac{\operatorname{E}[X^3] - 3\mu(\operatorname E[X^2] -\mu\operatorname E[X]) - \mu^3}{\sigma^3}\\ &= \frac{\operatorname{E}[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}. \end{aligned}$$
2. Kurtosis (peakedness) [kɝ'tosɪs]: $$\beta=\frac{V_4}{\sigma^4}=\frac{\frac{\sum(X-\bar{X})^4f}{\sum f}}{\sigma^4},$$ where $V_4$ is the fourth central moment and $f$ is the frequency of each value $X$.
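To check the expansion of $\gamma_1$ numerically and to compute $\beta$ from a frequency table, here is a small Python sketch. The helper names and sample data are illustrative; all moments use the population (divide-by-$n$) convention, and `kurtosis` takes distinct values with their frequencies, mirroring the $\sum(X-\bar{X})^4 f / \sum f$ form.

```python
import math

def _mean_and_std(xs):
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # population standard deviation
    return n, mu, sigma

def skewness(xs):
    """Third standardized moment E[((X - mu) / sigma)^3] of the data."""
    n, mu, sigma = _mean_and_std(xs)
    return sum(((x - mu) / sigma) ** 3 for x in xs) / n

def skewness_from_raw_moment(xs):
    """Same quantity via the expansion (E[X^3] - 3 mu sigma^2 - mu^3) / sigma^3."""
    n, mu, sigma = _mean_and_std(xs)
    ex3 = sum(x ** 3 for x in xs) / n  # non-central third moment E[X^3]
    return (ex3 - 3 * mu * sigma ** 2 - mu ** 3) / sigma ** 3

def kurtosis(values, freqs):
    """beta = V4 / sigma^4 for distinct values X with frequencies f."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    var = sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total
    v4 = sum((x - mean) ** 4 * f for x, f in zip(values, freqs)) / total  # fourth central moment
    return v4 / var ** 2

right_skewed = [1, 1, 1, 10]
print(skewness(right_skewed), skewness_from_raw_moment(right_skewed))  # both ≈ 1.155
print(kurtosis([1, 2, 3], [1, 1, 1]))  # ≈ 1.5
```

The two skewness functions agree up to floating-point rounding, which is a quick sanity check on the derivation.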

Statistical charts:

1. Bar chart
2. Pie chart
3. Histogram
4. Line chart
5. Scatter plot

### Exploratory Spatial Data Analysis

“Let the data speak.”

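1. Computational EDA methods

These range from simple statistical calculations to advanced multivariate statistical methods for exploring patterns in multivariate datasets.

2. Graphical EDA methods

Exploratory data analysis through visualization.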

Common graphical methods include:

• Histogram

• Stem-and-leaf plot

| Stem | Leaves |
|---:|:---|
| 4 | 4 6 7 9 |
| 5 | |
| 6 | 3 4 6 8 8 |
| 7 | 2 2 5 6 |
| 8 | 1 4 8 |
| 9 | |
| 10 | 6 |

Key: 6 | 3 = 63 (leaf unit: 1.0; stem unit: 10.0)

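The stem-and-leaf display can be regenerated from its raw values. This Python sketch assumes non-negative data and a stem unit of 10 × the leaf unit; `stem_and_leaf` is an illustrative name, and the input values are simply read back off the display.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=1.0):
    """Return the lines of a stem-and-leaf display (stem unit = 10 * leaf_unit).

    Assumes non-negative data; values are rounded to the nearest leaf unit.
    """
    scaled = sorted(round(x / leaf_unit) for x in data)
    stems = defaultdict(list)
    for v in scaled:
        stems[v // 10].append(v % 10)  # tens digit -> stem, units digit -> leaf
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems such as 5 and 9
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

values = [68, 44, 72, 106, 63, 47, 84, 66, 72, 49, 81, 64, 76, 46, 88, 75, 68]
for line in stem_and_leaf(values):
    print(line)
```

Sorting before splitting keeps the leaves on each stem in ascending order, and empty stems are printed so gaps in the data stay visible.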
• Box plot (box-and-whisker plot)

• The “interquartile range” (IQR) is the width of the box in a box-and-whisker plot: IQR = Q3 − Q1. It measures how spread out the values are. Assuming the values cluster around some central value, the IQR describes the spread of the “middle” half of the data, and it can also flag values that lie “too far” from the centre. In Tukey's rule, a value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is called an “outlier”, because it “lies outside” the range in which we expect the values to fall.

• (Why one and a half times the width of the box? Why does that particular value mark the line between “acceptable” and “unacceptable” values? Because when John Tukey devised the box-and-whisker plot in 1977 to display these values, he chose 1.5 × IQR as the demarcation line for outliers. The choice has worked well in practice, so it is still used today.)
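A short Python sketch of the IQR and Tukey's 1.5 × IQR rule, applied to the values from the stem-and-leaf display. It relies on `statistics.quantiles`, whose default `'exclusive'` method is only one of several quartile conventions, so other software may report slightly different Q1 and Q3. The function names are illustrative.

```python
from statistics import quantiles

def tukey_fences(data, k=1.5):
    """Return (q1, q3, iqr, lower_fence, upper_fence) using Tukey's k * IQR rule."""
    q1, _, q3 = quantiles(data, n=4)  # Python's default 'exclusive' quartile method
    iqr = q3 - q1
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr

def tukey_outliers(data, k=1.5):
    """Values lying below the lower fence or above the upper fence."""
    _, _, _, lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]

values = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print(tukey_fences(values))                           # (56.0, 78.5, 22.5, 22.25, 112.25)
print(tukey_outliers(values))                         # []
print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [100]
```

With this convention Q1 = 56 and Q3 = 78.5, so the upper fence is 112.25 and the isolated value 106 is not flagged as an outlier.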

• Adjusted box plots are intended for skewed distributions. They rely on the medcouple, a robust measure of skewness that takes values between −1 and 1. For a medcouple value $MC$, the upper and lower whisker lengths are defined as
$$\text{upper} = 1.5 \times IQR \times e^{3MC},\qquad \text{lower} = 1.5 \times IQR \times e^{-4MC}\qquad \text{if } MC \geq 0,$$
$$\text{upper} = 1.5 \times IQR \times e^{4MC},\qquad \text{lower} = 1.5 \times IQR \times e^{-3MC}\qquad \text{if } MC < 0.$$
For a symmetric distribution the medcouple is zero, and the rule reduces to Tukey's box plot with equal whisker lengths of $1.5 \times IQR$.
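The adjusted whisker formulas translate directly into code. In this Python sketch the medcouple `mc` is taken as an input rather than computed (it is assumed to come from elsewhere), and the function names are illustrative.

```python
import math

def adjusted_whiskers(iqr, mc, k=1.5):
    """(lower, upper) whisker lengths for a medcouple-adjusted box plot.

    mc is the medcouple of the data, assumed to be computed elsewhere.
    """
    if mc >= 0:
        lower = k * iqr * math.exp(-4 * mc)
        upper = k * iqr * math.exp(3 * mc)
    else:
        lower = k * iqr * math.exp(-3 * mc)
        upper = k * iqr * math.exp(4 * mc)
    return lower, upper

def adjusted_fences(q1, q3, mc, k=1.5):
    """Fences Q1 - lower and Q3 + upper; points outside them would be flagged."""
    lower, upper = adjusted_whiskers(q3 - q1, mc, k)
    return q1 - lower, q3 + upper

print(adjusted_whiskers(10.0, 0.0))  # (15.0, 15.0): Tukey's rule when MC = 0
print(adjusted_whiskers(10.0, 0.2))  # longer upper whisker, shorter lower whisker
```

Setting `mc = 0` recovers Tukey's equal 1.5 × IQR whiskers; a positive `mc` (right skew) lengthens the upper whisker and shortens the lower one, and a negative `mc` mirrors this.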

• Scatter plot

• Given variables $X_1, X_2, \dots, X_k$, a scatter plot matrix shows all pairwise scatter plots of the variables on a single page, arranged as a matrix: with $k$ variables it has $k$ rows and $k$ columns, and the plot in row $i$, column $j$ shows $X_i$ versus $X_j$.

• MATLAB's plotmatrix function

• plotmatrix(X,Y) creates a matrix of subaxes containing scatter plots of the columns of X against the columns of Y. If X is p-by-n and Y is p-by-m, then plotmatrix produces an n-by-m matrix of subaxes.
• Parallel coordinate plot

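ESDA = EDA + SDA: exploratory spatial data analysis (ESDA) combines exploratory data analysis (EDA) with spatial data analysis (SDA).

ESDA is used to:

1. summarize the properties of spatial data;

2. explore patterns in spatial data;

3. generate hypotheses about geographic data;

4. locate anomalous data on the map;

5. detect whether hotspots exist.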

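Maps show where each case is located and how cases relate to one another in space, and they play an important role in analyzing, testing, and presenting model results.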