路径查询 #

一、路径查询概述 #

1.1 什么是路径查询 #

路径查询是指在图中查找从一个顶点到另一个顶点的路径的过程。

text

路径查询类型：
├── 最短路径：两点间最短路径
├── 所有路径：两点间所有路径
├── 简单路径：无环路径
├── 加权路径：考虑边权重的路径
└── 路径遍历：遍历指定深度的路径

1.2 路径概念 #

text

路径概念：
├── 路径长度：边的数量
├── 路径权重：边权重之和
├── 简单路径：不重复顶点
├── 环路：起点和终点相同
└── 路径方向：有向或无向

二、Gremlin路径查询 #

2.1 基本路径查询 #

gremlin

// 获取遍历路径
g.V('1').out('knows').path()

// 路径属性
g.V('1').out('knows').path().by('name')

// 多跳路径
g.V('1').out('knows').out('knows').path().by('name')

// 带边的路径
g.V('1').outE('knows').inV().path().by('name').by('weight')

2.2 简单路径 #

gremlin

// 无环路径
g.V('1').repeat(out().simplePath()).times(3)

// 排除特定顶点
g.V('1').out('knows').simplePath().where(without('1'))

2.3 最短路径 #

gremlin

// 最短路径（BFS）
g.V('1').repeat(out().simplePath()).until(hasId('2')).limit(1).path()

// 最短路径带属性
g.V().has('name', 'Tom').
  repeat(out().simplePath()).until(has('name', 'Jerry')).
  limit(1).
  path().by('name')

// 指定深度的最短路径
g.V('1').repeat(out().simplePath()).until(hasId('2').or().loops().is(gt(5))).
  hasId('2').
  path()

2.4 所有路径 #

gremlin

// 所有路径（限制深度）
g.V('1').repeat(out().simplePath()).times(3).emit(hasId('2')).path()

// 所有路径（限制数量）
g.V('1').repeat(out().simplePath()).until(hasId('2')).limit(10).path()

// 所有路径带深度限制
g.V('1').repeat(out().simplePath()).until(hasId('2').or().loops().is(gt(5))).
  hasId('2').
  path()

2.5 加权路径 #

gremlin

// 使用sack计算路径权重
g.withSack(0).V('1').
  repeat(outE().sack(sum).by('weight').inV().simplePath()).
  until(hasId('2')).
  limit(1).
  sack()

// 最小权重路径
g.withSack(0).V('1').
  repeat(outE().sack(sum).by('weight').inV().simplePath()).
  until(hasId('2')).
  order().by(sack()).
  limit(1).
  path().by('name').sack()

2.6 路径过滤 #

gremlin

// 按路径长度过滤
g.V('1').out('knows').path().filter(count(local).is(eq(3)))

// 按路径内容过滤
g.V('1').out('knows').path().filter(unfold().has('status', 'active'))

// 排除特定路径
g.V('1').out('knows').path().filter(unfold().has('name', without('Jerry')))

三、SPARQL路径查询 #

3.1 属性路径 #

sparql

PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# 序列路径
SELECT ?friend
WHERE {
  ex:Tom foaf:knows/foaf:knows ?friend .
}

# 零次或多次
SELECT ?ancestor
WHERE {
  ex:Tom ex:parent* ?ancestor .
}

# 一次或多次
SELECT ?descendant
WHERE {
  ex:Tom ex:parent+ ?descendant .
}

3.2 路径操作符 #

sparql

# 路径选择（OR）
SELECT ?related
WHERE {
  ex:Tom (foaf:knows|foaf:follows) ?related .
}

# 反向路径
SELECT ?follower
WHERE {
  ?follower ^foaf:knows ex:Tom .
}

# 路径组合
SELECT ?friend
WHERE {
  ex:Tom (foaf:knows/ex:worksAt/^ex:worksAt) ?friend .
}

3.3 路径长度限制 #

sparql

# 1到3跳
SELECT ?friend
WHERE {
  ex:Tom (foaf:knows{1,3}) ?friend .
}

# 精确跳数
SELECT ?friend
WHERE {
  ex:Tom (foaf:knows{2}) ?friend .
}

# 至少N跳
SELECT ?friend
WHERE {
  ex:Tom (foaf:knows{2,}) ?friend .
}

四、实际应用示例 #

4.1 社交网络路径 #

gremlin

// 查找两个用户之间的最短路径
g.V().has('userId', 'user_001').
  repeat(both('follows').simplePath()).
  until(has('userId', 'user_002')).
  limit(1).
  path().by('name')

// 查找N度人脉
g.V().has('userId', 'user_001').
  repeat(both('follows').simplePath()).times(3).
  emit().
  dedup().
  path().by('name')

// 查找共同好友路径
g.V().has('userId', 'user_001').as('a').
  out('follows').as('mutual').
  in('follows').as('b').
  where('b', has('userId', 'user_002')).
  select('a', 'mutual', 'b').by('name')

4.2 组织结构路径 #

gremlin

// 查找上级链
g.V().has('userId', 'emp_001').
  repeat(in('manages')).emit().path().by('name')

// 查找下属树
g.V().has('userId', 'ceo').
  repeat(out('manages')).emit().path().by('name')

// 查找跨部门协作路径
g.V().has('userId', 'emp_001').
  repeat(out('works_with').simplePath()).
  until(has('department', 'Engineering')).
  limit(5).
  path().by('name')

4.3 知识图谱路径 #

gremlin

// 查找概念层级路径
g.V().has('name', 'Graph Database').
  repeat(in('subClassOf')).emit().path().by('name')

// 查找相关概念路径
g.V().has('name', 'Neptune').
  repeat(out('related').simplePath()).times(2).
  emit().path().by('name')

// 查找推理路径
g.V().has('name', 'Tom').
  repeat(out('hasProperty').out('implies').simplePath()).
  until(has('name', 'Expert')).
  path().by('name')

4.4 欺诈检测路径 #

gremlin

// 查找可疑交易路径
g.V().has('transactionId', 'tx_001').
  repeat(both('transferred_to').simplePath()).times(3).
  emit(has('flagged', true)).
  path().by('transactionId')

// 查找环路（可能的洗钱）
g.V().hasLabel('account').
  repeat(out('transferred_to')).times(5).
  emit(cyclicPath()).
  path().by('accountId')

// 查找高风险路径
g.withSack(0).V().has('transactionId', 'tx_001').
  repeat(outE('transferred_to').sack(sum).by('riskScore').inV().simplePath()).
  until(sack().is(gt(100))).
  limit(5).
  path().by('transactionId').sack()

五、路径查询优化 #

5.1 性能优化技巧 #

gremlin

// 使用simplePath避免环路
g.V('1').repeat(out().simplePath()).times(3)  // 好
g.V('1').repeat(out()).times(3)  // 可能产生环路

// 限制遍历深度
g.V('1').repeat(out()).times(3)  // 好
g.V('1').repeat(out()).until(...)  // 注意条件

// 使用limit限制结果
g.V('1').repeat(out()).until(hasId('2')).limit(1).path()  // 好

// 使用索引过滤
g.V().has('userId', 'user_001').repeat(out()).times(3)  // 好

5.2 查询策略 #

text

查询策略：
├── 从小集合开始
├── 使用双向搜索
├── 限制路径深度
├── 使用简单路径
└── 分批处理大量结果

六、路径算法 #

6.1 BFS vs DFS #

text

遍历策略：
├── BFS（广度优先）
│   ├── 适合最短路径
│   └── 内存消耗较大
└── DFS（深度优先）
    ├── 适合深度遍历
    └── 可能栈溢出

6.2 Neptune默认使用BFS #

gremlin

// Neptune默认使用BFS
g.V('1').repeat(out()).times(3)

// 最短路径使用BFS
g.V('1').repeat(out().simplePath()).until(hasId('2')).limit(1).path()

七、总结 #

路径查询要点：

操作	Gremlin	SPARQL
路径获取	path()	属性路径
简单路径	simplePath()	-
最短路径	repeat().until().limit(1)	属性路径
加权路径	sack()	-
路径过滤	filter()	FILTER

最佳实践：

使用simplePath避免环路
限制遍历深度
使用limit限制结果
选择合适的遍历策略
使用索引加速查询

下一步，让我们学习图算法！