8.复杂数据处理 · 关系图谱

# 8.复杂数据处理 · 关系图谱

我们在上一节中用了一个企业中的人员结构作为树形结构的例子，在这棵树中其实节点与节点之间的边（Edge）所表达的是人员之间的上下级关系。而在日常生活中我们还有非常多并不存在上下层次的关系（人际关系、事物关系等等），就好比我们的人际交往中绝大部分都是平等的。而且在树形结构中节点之间的关系严格遵守不形成环的原则，然而在我们的人际交往关系中，环形关系结构则是必然存在的。

环形关系结构

当需要使用计算机编程来实现这种结构的时候，就需要用到关系图谱（Graph）。需要注意的是，关系图谱与我们平时所听到的图片是完完全全的两回事。图片是用于表达视觉效果的二维格式（包含点阵图和矢量图），而图形是一种多维的抽象结构，主要用于表达抽象事物之间的关系。

有趣的是我们刚学习完的树形结构其实是一种特殊的关系图谱，树形结构中所规定的是一种节点之间只能有上下级的关系，且不再重复。一棵树形结构中必然存在着根节点和叶节点，但是在关系图谱中却不一定存在明确的头尾节点，它可能是由一群看似杂乱无章的节点相互连接，并且彼此的连接还有一些各式各样的差异，如连接强度不同等。

# 8.1 无向图

在 7.1.2 节中我们定义了一个用于树形结构的节点，在每一个树形节点中拥有一个来自父节点的引用、存储自身数据的空间和存储子节点引用的数组。

不过与树形结构存在区别的是，树形结构中节点之间的连接是带有方向属性的，即从父节点指向其子节点。而关系图谱存在着两种类型，无向图（Graph）和有向图（Directed Graph，或 Digraph）。其中树形结构正是一种特殊的、每两个顶点之间只有单向边的有向图。但正如上面的图所示，实际应用中同样存在着具有双向关系的有向图。所以在关系图谱中并不存在子节点这一概念，取而代之的则是相邻顶点（Adjacent Vertice）。

# 8.1.1 定义顶点

一般来说，因为关系图谱具有非常高的复杂性和不确定性，节点与节点之间的关系经常需要发生不同的变化，如果采取像上一节中树形结构中一样，使用引用来表示节点之间的关系，就会产生非常深的引用嵌套。而且 JavaScript 中的引用也无法满足我们表达节点之间关系数据的需求。

所以我们需要另辟蹊径，使用另外一种方式来描述一个关系图谱。我们在上一节的树形结构中，定义一个节点并不需要对其进行编号，因为稳定且单向的层级关系可以使得节点在树形结构中的位置会较为稳定。但这在关系图谱中情况则变得非常的复杂，而导致没办法使用这种简单的方式来完成，那么我们就需要对节点进行编号了。

class Vertex {
  constructor(id, property) {
    this.id = id
    this.property = property
  }
}

const me = new Vertex(1, [ 'Will', 'male' ])

1
2
3
4
5
6
7
8
9

那么问题来了，如果说我们在创建顶点的时候原数据并不带有能够标识每一个个体信息的标识符，就需要有一个能够产生具有唯一性的的标识符。业界通常会使用如 UUID（Universally Unique Identifier，通用唯一识别码）、数据库自增键值等方式。而我们这里可以简单地定义一个用于创建带有标识符的顶点的函数以方便我们使用。

let vertexId = 0
function newVertex(property) {
  return new Vertex(++vertexId, property)
}

const me = newVertex([ 'Will', 'male' ])

1
2
3
4
5
6
7

# 8.1.2 定义边

定义好了关系图谱中的顶点之后，就需要开始定义我们用于表达节点之间关系的边了。而因为 JavaScript 中的引用并不能满足稀疏存储和附带信息的需求，所以我们同样需要为边定义一个类型以创建一个边对象。

class Edge {
  constructor(leftId, rightId, property) {
    this.leftId = leftId
    this.rightId = rightId
    this.property = property
  }
}

const will = newVertex({
  name: 'Will',
  gender: 'male'
})
const ru = newVertex({
  name: 'Rrrruu',
  gender: 'female'
})

const relation = new Edge(will.id, ru.id, 'couple')

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

现在我们可以通过顶点对象中的 id 属性取得该顶点的标识符，但若需要使用标识符来获取顶点对象本身，就需要别的实体来完成这个需求。而这样的任务我们可以交由关系图谱本身来完成。

# 8.1.3 图

既然我们对关系图谱中的顶点对象进行编号，以便进行检索，那么我们是不是可以使用同样的方式来对边对象进行编号以便进行检索呢？答案自然是肯定的。

相比于顶点对象只会从自身的 id 被检索，边对象则会从与边相连的两个顶点被检索，所以更需要进行编号以提升检索的速度。通过对边对象的编号和关系的变换，我们可以整理出顶点与边的关系。

顶点与边的关系

整理好这些关系之后，我们就可以通过已经梳理好的逻辑来定义一个关系图谱的 JavaScript 类了。

对顶点进行编号，以优化对顶点的检索；
对边进行编号，并存储好边与顶点的关系。

class Graph {
  constructor(vertices, edges) {

    // Vertices
    this.vertexIds = []
    this.vertices = {}

    for (let i = 0; i < vertices.length; ++i) {
      const vertex = vertices[i]

      this.vertexIds.push(vertex.id)
      this.vertices[vertex.id] = vertex
    }

    const edgesWithId = edges.map(function(edge, i) {
      edge.id = i + 1
      return edge
    })
    
    // Edges
    this.edgeIds = []
    this.edges = {}
    this.edgeRelations = {}

    for (let i = 0; i < edgesWithId.length; ++i) {
      const edge = edgesWithId[i]

      this.edgeIds.push(edge.id)
      this.edges[edge.id] = edge
      
      // 初始化顶点与边的关系
      if (typeof this.edgeRelations[edge.leftId] === 'undefined') {
        this.edgeRelations[edge.leftId] = []
      }

      if (typeof this.edgeRelations[edge.rightId] === 'undefined') {
        this.edgeRelations[edge.rightId] = []
      }

      this.edgeRelations[edge.leftId].push(edge.id)
      this.edgeRelations[edge.rightId].push(edge.id)
    }

  }
}

const vertices = [
  new Vertex(1, 'A'),
  new Vertex(2, 'B'),
  new Vertex(3, 'C'),
  new Vertex(4, 'D')
]

const edges = [
  new Edge(1, 2, 1),
  new Edge(1, 3, 2),
  new Edge(2, 4, 1),
  new Edge(3, 4, 1)
]

const graph = new Graph(vertices, edges)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62

graph

# 8.1.4 操作图形

完成了关系图谱的建立后，自然需要将其运用起来以配合算法来解决一些我们所面临的问题。但在实现算法之前自然不可能让算法直接涉及图对象中的内部元素，所以我们也需要先为关系图谱定义一些操作方法，如获取某一个顶点、遍历所有顶点、遍历所有边等。

# 获取某一个顶点

前面在关系图谱类中使用了 vertexIds 存储顶点的标识符和使用 vertices 来存储顶点对象。那么要获取图形中的某一个顶点，保险起见首先要确保在 vertexIds 中存在该节点标识符，否则就直接返回 null。然后再从 vertices 中获取该节点的实例对象以返回。

class Graph {
  // ...

  getVertex(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return null
    }

    return this.vertices[vertexId]
  }
}

1
2
3
4
5
6
7
8
9
10
11
12

# 遍历顶点/边

虽然在 JavaScript 中我们默认所使用的数组都是自带有序特性的，但是在关系图谱的定义中，顶点之间并不存在顺序。所以我们自然也不会允许对图对象中的顶点进行直接的循环操作，而采用回调函数的方式进行循环，以模糊其有序性。

class Graph {
  // ...
  
  eachVertices(callbackFunc) {
    const self = this

    return self.vertexIds.forEach(function(vertexId) {
      return callbackFunc(self.vertices[vertexId])
    })
  }
  
  eachEdges(callbackFunc) {
    const self = this

    return self.edgeIds.forEach(function(edgeId) {
      return callbackFunc(self.edges[edgeId])
    })
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

# 8.1.5 特征值

我们将关系图谱建立了起来以后，就需要开始对这个图进行一些操作了。首先我们前面就说到树形结构是一种特别的图形结构，那么在树形结构中的一些节点特性在图形结构中也同样适用。

# Degree

Degree 在树形结构的节点中表示的是某一个节点的子节点数量，而因为在关系图谱中的顶点并不存在“子节点”或“子顶点”的概念，取而代之的则是相邻顶点。而相邻顶点的数量就等于与该顶点相连的边的数量。那么要获取相邻边的数量则首先需要定义一个方法以传入顶点标识符并得到相邻边数组。

class Graph {
  // ...
  
  getEdgesByVertexId(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return []
    }

    if (!_.has(this.edgeRelations, vertexId)) {
      return []
    }

    const self = this

    return self.edgeRelations[vertexId].map(function(edgeId) {
      return self.edges[edgeId]
    })
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

得到相邻边后返回其长度则便是该顶点的度。

class Graph {
  // ...
  
  degree(vertexId) {
    return this.getEdgesByVertexId(vertexId).length
  }
}

1
2
3
4
5
6
7
8

# 最大的度（Max Degree）

在关系图谱中每一个顶点的度是图论算法中非常重要的计算对象。就好比使用关系图谱描述一个社交群体中人与人的相关关系，每一个人作为一个顶点时，顶点的度则代表了对应的个人社交关系，数量越多则代表该名成员在该群体中的重要性越高。显而易见，如果某一个节点的度最大，则说明他很有可能是这个群体中的核心人物。

寻找一个关系图谱中带有最大度数的顶点并不困难，只需全部先计算出所有顶点的度，然后找出最大数即可。

当然，光是找出最大的度可不能满足算法的需要。除了找出最大的度数以外，自然还需要知道是哪一个顶点具有最大的度数。所以我们需要准备两个函数，一个用于找出带有最大度数的顶点，而另外一个则用于获取其度数。

class Graph {
  // ...

  largestVertex() {
    const self = this

    const degrees = self.vertexIds.map(function(vertexId) {
      return {
        degree: self.degree(vertexId),
        id: vertexId
      }

    })

    return self.getVertex(_.maxBy(degrees, 'degree').id)
  }

  maxDegree() {
    return this.degree(this.largestVertex().id)
  }
}

const vertices = [
  new Vertex(1, 'A'),
  new Vertex(2, 'B'),
  new Vertex(3, 'C'),
  new Vertex(4, 'D'),
  new Vertex(5, 'E')
]

const edges = [
  new Edge(1, 2, 1),
  new Edge(1, 3, 2),
  new Edge(2, 4, 1),
  new Edge(3, 4, 1),
  new Edge(1, 4, 2),
  new Edge(4, 5, 3)
]

const graph = new Graph(vertices, edges)
console.log(graph.largestVertex().property) //=> D
console.log(graph.maxDegree()) //=> 4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43

# 平均度（Average Degree）

在一个关系图谱中，除了最大的顶点度以外，每一个顶点的平均度数也是一个非常重要的数学特征值。在这里我们可以有一个比较巧妙的计算方式，而不需要像上面计算最大度的时候那样子先把所有的顶点度数求出来。这有一个用于计算关系图谱中平均度的公式。

$\overline{D}=\frac{2\times N_e}{N_v}$

其中 $\overline{D}$ 为一个图中的平均度，而 $N_e$ 是该图中边的数量， $N_v$ 则是顶点的数量。这个公式还是非常好理解的，因为一条边同时属于左右两个顶点，所以在计算平均度的时候首先需要将边数乘以 $2$ ，然后再除以顶点的数量即可得到平均度。

class Graph {
  // ...

  avgDegree() {
    return 2 * this.edgeIds.length / this.vertexIds.length
  }
}

const vertices = [
  new Vertex(1, 'A'),
  new Vertex(2, 'B'),
  new Vertex(3, 'C'),
  new Vertex(4, 'D'),
  new Vertex(5, 'E')
]

const edges = [
  new Edge(1, 2, 1),
  new Edge(1, 3, 2),
  new Edge(2, 4, 1),
  new Edge(3, 4, 1),
  new Edge(1, 4, 2),
  new Edge(4, 5, 3)
]

const graph = new Graph(vertices, edges)
console.log(graph.avgDegree()) //=> 2.4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

# 自环（Self-Loop）

自环是一种十分抽象的概念，其定义是在一个关系图谱内，某一个顶点存在一条边再与自己相连。这种情况我们很难使用一个现实生活中的例子来说明，但是我们可以将一个关系图谱中的每一个顶点看作是事件序列中的某一个事件，顶点间的边则表示多种情况下事件之间的连续关系。那么在这种场景下，顶点的自环就可以被解释为某一个事件的连续发生可能性。

self-loop

在多种事件顺序发生的可能性中，如果某一个事件的连续发生次数（我们可以使用边的属性值表示）越多，则表示该事件的重要性越强。这种概念在一些对用户行为进行分析的应用中十分重要。

而找到自环的边的方法也非常简单，只需要找到那些左右顶点相同的边即可。

class Graph {
  // ...
  
  loops() {
    const self = this

    return self.edgeIds
      .map(function(edgeId) {
        return self.edges[edgeId]
      })
      .filter(function(edge) {
        return edge.leftId === edge.rightId
      })
  }
}

const vertices = [
  new Vertex(1, '1'),
  new Vertex(2, '2'),
  new Vertex(3, '3')
]

const edges = [
  new Edge(1, 1, 3),
  new Edge(1, 2, 1),
  new Edge(1, 3, 1),
  new Edge(2, 3, 2)
]

const graph = new Graph(vertices, edges)
console.log(graph.loops()) //=> [ Edge{ leftId: 1, rightId: 1, property: 3 } ]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32

# 8.1.6 无向图代码清单

class Vertex {
  constructor(id, property) {
    this.id = id
    this.property = property
  }
}

class Edge {
  constructor(leftId, rightId, property) {
    this.leftId = leftId
    this.rightId = rightId
    this.property = property
  }
}

let vertexId = 0
function newVertex(property) {
  return new Vertex(++vertexId, property)
}


class Graph {
  constructor(vertices, edges) {

    // Vertices
    this.vertexIds = []
    this.vertices = {}

    for (let i = 0; i < vertices.length; ++i) {
      const vertex = vertices[i]

      this.vertexIds.push(vertex.id)
      this.vertices[vertex.id] = vertex
    }

    const edgesWithId = edges.map(function(edge, i) {
      edge.id = i + 1
      return edge
    })
    
    // Edges
    this.edgeIds = []
    this.edges = {}
    this.edgeRelations = {}

    for (let i = 0; i < edgesWithId.length; ++i) {
      const edge = edgesWithId[i]

      this.edgeIds.push(edge.id)
      this.edges[edge.id] = edge
      
      if (typeof this.edgeRelations[edge.leftId] === 'undefined') {
        this.edgeRelations[edge.leftId] = []
      }

      if (typeof this.edgeRelations[edge.rightId] === 'undefined') {
        this.edgeRelations[edge.rightId] = []
      }

      this.edgeRelations[edge.leftId].push(edge.id)
      this.edgeRelations[edge.rightId].push(edge.id)
    }

  }

  getVertex(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return null
    }

    return this.vertices[vertexId]
  }

  eachVertices(callbackFunc) {
    const self = this

    return self.vertexIds.forEach(function(vertexId) {
      return callbackFunc(self.vertices[vertexId])
    })
  }

  eachEdges(callbackFunc) {
    const self = this

    return self.edgeIds.forEach(function(edgeId) {
      return callbackFunc(self.edges[edgeId])
    })
  }

  getEdgesByVertexId(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return []
    }

    if (!_.has(this.edgeRelations, vertexId)) {
      return []
    }

    const self = this

    return self.edgeRelations[vertexId].map(function(edgeId) {
      return self.edges[edgeId]
    })
  }

  degree(vertexId) {
    return this.getEdgesByVertexId(vertexId).length
  }

  largestVertex() {
    const self = this

    const degrees = self.vertexIds.map(function(vertexId) {
      return {
        degree: self.degree(vertexId),
        id: vertexId
      }

    })

    return self.getVertex(_.maxBy(degrees, 'degree').id)
  }

  maxDegree() {
    return this.degree(this.largestVertex().id)
  }

  avgDegree() {
    return 2 * this.edgeIds.length / this.vertexIds.length
  }

  loops() {
    const self = this

    return self.edgeIds
      .map(function(edgeId) {
        return self.edges[edgeId]
      })
      .filter(function(edge) {
        return edge.leftId === edge.rightId
      })
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144

# 8.2 有向图

相比于无向图，有向图则是将无向图中的边加上方向特征，即从原本的一条边即代表两个顶点共同拥有一个平等的关系，变成允许顶点之间存在单向的关系。就好像我们在人际社交中，朋友之间互相认识所以我们是平等的，但是我们也有很多我们只认识他们，他们却不认识我们的人。

# 8.2.1 有向边

因为我们前面所定义的边是不存在方向特性的，所以我们直接使用了 leftId 和 rightId 来存储与边相连的两个顶点的信息。而有向图的边是带有方向特征的，虽然我们也可以像数学一样定义一个向量的表达方式就是从左到右的（如 $\overrightarrow{AB}$ ），直接使用前面所定义的边来表示有向边。

但是出于对程序严谨性和语义的考虑，我们还是需要另外定义一个有向边类型以使用。

class DirectedEdge {
  constructor(originalId, targetId, property) {
    this.originalId = originalId
    this.targetId = targetId
    this.property = property
  }
}

1
2
3
4
5
6
7
8

# 8.2.2 有向图 Digraph

因为使用的边不再是无方向特性的边，所以之前所定义的无向图类型也不能直接当做有向图使用了。有向图和无向图最大的区别就是顶点之间从平行的关系变成了有出或入的单边关系。出代表着某一个顶点存在一个单向的关系指向另外一个顶点，而入则表示某一个顶点被另外一个顶点所指向。

那么前面使用 edgeRelations 来存储顶点与边的关系时则需要加以改动了，我们可以简单地分开 inEdgeRelations 和 outEdgeRelations 来分别存储顶点与入边、出边的关系。

class Digraph {
  constructor(vertices, edges) {

    // Vertices
    this.vertexIds = []
    this.vertices = {}

    for (let i = 0; i < vertices.length; ++i) {
      const vertex = vertices[i]

      this.vertexIds.push(vertex.id)
      this.vertices[vertex.id] = vertex
    }

    const edgesWithId = edges.map(function(edge, i) {
      edge.id = i + 1
      return edge
    })

    // Edges
    this.edgeIds = []
    this.edges = {}
    this.inEdgeRelations = {}
    this.outEdgeRelations = {}

    for (let i = 0; i < edgesWithId.length; ++i) {
      const edge = edgesWithId[i]

      this.edgeIds.push(edge.id)
      this.edges[edge.id] = edge

      if (typeof this.outEdgeRelations[edge.originalId] === 'undefined') {
        this.outEdgeRelations[edge.originalId] = []
      }

      if (typeof this.inEdgeRelations[edge.targetId] === 'undefined') {
        this.inEdgeRelations[edge.targetId] = []
      }

      this.inEdgeRelations[edge.targetId].push(edge.id)
      this.outEdgeRelations[edge.originalId].push(edge.id)
    }
    
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46

完成了有向图的基本构建后，我们就可以将无向图中的一些数学特征值计算方法应用到有向图中。但其中度的概念在有向图中被分开为入度和出度，所以平均值、最大值等等都需要分别计算。

class Graph {
  // ...

  getVertex(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return null
    }

    return this.vertices[vertexId]
  }

  eachVertices(callbackFunc) {
    const self = this

    return self.vertexIds.forEach(function(vertexId) {
      return callbackFunc(self.vertices[vertexId])
    })
  }

  eachEdges(callbackFunc) {
    const self = this

    return self.edgeIds.forEach(function(edgeId) {
      return callbackFunc(self.edges[edgeId])
    })
  }

  getInEdgesByVertexId(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return []
    }

    if (!_.has(this.inEdgeRelations, vertexId)) {
      return []
    }

    const self = this

    return self.inEdgeRelations[vertexId].map(function(edgeId) {
      return self.edges[edgeId]
    })
  }

  getOutEdgesByVertexId(vertexId) {
    if (!_.includes(this.vertexIds, vertexId)) {
      return []
    }

    if (!_.has(this.outEdgeRelations, vertexId)) {
      return []
    }

    const self = this

    return self.outEdgeRelations[vertexId].map(function(edgeId) {
      return self.edges[edgeId]
    })
  }

  inDegree(vertexId) {
    return this.getInEdgesByVertexId(vertexId).length
  }

  outDegree(vertexId) {
    return this.getOutEdgesByVertexId(vertexId).length
  }

  largestInDegreeVertex() {
    const self = this

    const inDegrees = self.vertexIds.map(function(vertexId) {
      return {
        inDegree: self.inDegree(vertexId),
        id: vertexId
      }
    })

    return self.getVertex(_.maxBy(inDegrees, 'inDegree').id)
  }

  largestOutDegreeVertex() {
    const self = this

    const outDegrees = self.vertexIds.map(function(vertexId) {
      return {
        outDegree: self.outDegree(vertexId),
        id: vertexId
      }
    })

    return self.getVertex(_.maxBy(outDegrees, 'outDegree').id)
  }

  maxInDegree() {
    return this.inDegree(this.largestInDegreeVertex().id)
  }

  maxOutDegree() {
    return this.outDegree(this.largestOutDegreeVertex().id)
  }

  avgInDegree() {
    const self = this

    const totalInEdgesCount = self.vertexIds
      .map(function(vertexId) {
        if (typeof self.inEdgeRelations[vertexId] !== 'undefined') {
          return self.inEdgeRelations[vertexId]
        } else {
          return []
        }
      })
      .map(function(edges) {
        return edges.length
      })
      .reduce(function(a, b) {
        return a + b
      })

    return totalInEdgesCount / this.vertexIds.length
  }

  avgOutDegree() {
    const self = this

    const totalOutEdgesCount = self.vertexIds
      .map(function(vertexId) {
        if (typeof self.outEdgeRelations[vertexId] !== 'undefined') {
          return self.outEdgeRelations[vertexId]
        } else {
          return []
        }
      })
      .map(function(edges) {
        return edges.length
      })
      .reduce(function(a, b) {
        return a + b
      })

    return totalOutEdgesCount / this.vertexIds.length
  }

  loops() {
    const self = this

    return self.edgeIds
      .map(function(edgeId) {
        return self.edges[edgeId]
      })
      .filter(function(edge) {
        return edge.originalId === edge.targetId
      })
  }

}

有向图

如图所示，我们首先建立一个有向图以备后续使用。

const vertices = [
  new Vertex(1, 'A'),
  new Vertex(2, 'B'),
  new Vertex(3, 'C'),
  new Vertex(4, 'D'),
  new Vertex(5, 'E')
]

const edges = [
  new DirectedEdge(1, 2, 1),
  new DirectedEdge(1, 3, 2),
  new DirectedEdge(2, 4, 1),
  new DirectedEdge(3, 4, 1),
  new DirectedEdge(1, 1, 3),
  new DirectedEdge(3, 5, 4),
  new DirectedEdge(4, 5, 5)
]

const graph = new Digraph(vertices, edges)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

# 8.3 有向图的最短路径

有向图的意义在于能够以抽象的方式表示一些实际生活中的事物，人际关系、地点之间路网关系等等。在心理学中有一个非常重要的理论叫做“六度隔离”（Six Degrees of Separation） (opens new window)，而假如我们将一个足够大的人际关系网络使用关系图谱的方式表示出来，那么就可以一探这个理论的究竟了。

而在交通系统中，则可以使用顶点表示地点、交叉路口，用边表示路程，而边的值则可以表示路程的长度。那么这个关系图谱则可以用于计算地点之间的路程，这也是我们所使用的导航系统中的基本原理。

要计算在一个关系图谱中两个顶点之间的最短距离，有非常多的算法用于计算，这里我们介绍一个非常直观和常用的算法——Dijkstra 算法。

Dijkstra 算法是一种适用于有向图的最短路径计算算法，它需要遍历所有的可能性之后，然后返回其中的最短路程。

假设我们使用前面所创建的有向图模型，并将每一条边的距离（边的属性值）作为计算指标，对一整条路径的总距离进行计算。

加权有向图

但由于 Dijkstra 算法涉及的逻辑十分的复杂，有兴趣的同学可以参考掘金上一篇不错的介绍文章 (opens new window)进行学习，这里本节仅提供实现代码以供参考学习。

class Digraph {
  // ...
  
  // Dijkstra's algorithm
  shortestPath(fromVertexId, toVertexId) {
    const self = this

    const preferQueue = []
    const rootNode = new Node(fromVertexId)
    const candidateTree = new Tree(rootNode)
    
    preferQueue.push(...self.getOutEdgesByVertexId(fromVertexId).map(function(edge) {
      return [ fromVertexId, edge.targetId ]
    }))

    while (preferQueue.length > 0) {
      const pair = preferQueue.shift()
      const parentVertexId = pair[0]
      const currentVertexId = pair[1]

      // Add the edge to the candidate tree
      const parentNodes = candidateTree.search(function(node) {
        return node.name === parentVertexId
      })
      const currentNode = new Node(currentVertexId)
      parentNodes.forEach(function(parentNode) {
        candidateTree.addNode(currentNode, parentNode)
      })

      if (currentVertexId === toVertexId) {
        continue
      }

      // Add the next vertex into the prefer queue
      let outEdges = self.getOutEdgesByVertexId(currentVertexId)

      if (outEdges.length <= 0) {
        continue
      }

      outEdges = outEdges.filter(function(edge) {
        return candidateTree.search(function(node) {
          return node.name === edge.targetId
        }).length === 0
      })

      preferQueue
        .push(...outEdges.map(function(edge) {
          return [ currentVertexId, edge.targetId ]
        }))
    }

    const targetNodes = candidateTree.search(function(node) {
      return node.name === toVertexId
    })

    if (targetNodes.length > 0) {
      const pathsWithDistance = targetNodes
        .map(function(node) {
          const vertexId = node.name
          const path = [ vertexId ]
          let lastNode = node

          while (lastNode.parent != null) {
            path.push(lastNode.parent.name)
            lastNode = lastNode.parent
          }

          return path.reverse()
        })
        .map(function(path) {
          const distance = path
            .map(function(vertexId, index) {
              const nextVertexId = path[index + 1]

              if (typeof nextVertexId === 'undefined') {
                return
              }

              const edge = self.getOutEdgesByVertexId(vertexId)
                .find(function(edge) {
                  return edge.targetId === nextVertexId
                })
              
              return edge
            })
            .filter(function(edge) {
              return typeof edge !== 'undefined'
            })
            .map(function(edge) {
              return edge.property
            })
            .reduce(function(distanceA, distanceB) {
              return distanceA + distanceB
            })

          return { path, distance }
        })
      
      const shortestPath = _.minBy(pathsWithDistance, 'distance')

      shortestPath.path = shortestPath.path.map(function(vertexId) {
        return self.getVertex(vertexId)
      })

      return shortestPath
    } else {
      return {
        path: [],
        distance: Infinity
      }
    }
  }
}

const vertices = [
  new Vertex(1, 'A'),
  new Vertex(2, 'B'),
  new Vertex(3, 'C'),
  new Vertex(4, 'D'),
  new Vertex(5, 'E')
]

const edges = [
  new DirectedEdge(1, 2, 1),
  new DirectedEdge(1, 3, 2),
  new DirectedEdge(2, 4, 1),
  new DirectedEdge(3, 4, 1),
  new DirectedEdge(1, 1, 3),
  new DirectedEdge(3, 5, 4),
  new DirectedEdge(4, 5, 5)
]

const graph = new Digraph(vertices, edges)
console.log(graph.shortestPath(1, 5)) //=>
// {
//   distance: 6,
//   path: [
//     Vertex{ A },
//     Vertex{ C },
//     Vertex{ E }
//   ]
// }

# 小结

在本节中，我们学习了如何构建一个没有方向特征的关系图谱，也就是无向图，来表示一些事物之间的平等关系网络；还学习了在无向图的基础上，为顶点之间的边加上方向特性，构成具有传递性的关系网络，以表示一些更为具体的事物关系；并且对一种最短路径寻路算法 Dijkstra 进行了探索。

# 习题

请自行并认真地学习 Dijkstra 算法，并思考如何对 Dijkstra 算法进行变化，使其可以应用在无向图中。
使用加权无向图构建一个你身边朋友圈的关系图谱，并使用习题 1 中所得到的 Dijkstra 算法变种，探索“六度隔离”理论在你身边朋友圈中的适用性。并且通过使用度的概念，寻找你身边朋友圈中的“核心人物”。
学习了最短路径计算算法之后，请思考如何寻找一个关系图谱中两个点之间的最长路径。

海南老脚数

Choose mode

8.复杂数据处理 · 关系图谱