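1. Graph Fundamentals
When I first encountered graphs, a subway map made the idea click: stations are vertices, tracks are edges, and a transfer station is simply a node whose degree is greater than 2. This non-linear data structure consists of a vertex set V and an edge set E, and it is far more complex than a linear list. In Java it is usually implemented as an adjacency matrix or an adjacency list. The matrix suits dense graphs (it trades space for time: O(V²) memory buys O(1) edge lookups), while the list suits sparse graphs (it trades time for space: O(V + E) memory, but checking for an edge means scanning a neighbor list).
Key point: graphs are either directed or undirected, and either weighted or unweighted. A mutual-friendship social network is an undirected graph, web page links form a directed graph, and navigation routes form a weighted graph.
An adjacency matrix is a two-dimensional array in which matrix[i][j] = 1 means there is an edge from vertex i to vertex j. My most common early mistake was mixing up rows and columns, so I adopted a habit: row first, then column, matching "from ... to ...". An adjacency list saves space by storing an array of linked lists, with each vertex keeping a list of its neighbors. In real development I recommend a Map<Integer, List<Integer>>, which is more flexible than an array because vertex IDs do not have to be dense or start at 0.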
2. Storage Structures in Detail
2.1 Adjacency Matrix in Practice
```java
class Graph {
    private final int[][] matrix;
    private final int vertexCount;

    public Graph(int vertexCount) {
        this.vertexCount = vertexCount;
        matrix = new int[vertexCount][vertexCount];
    }

    public void addEdge(int from, int to) {
        matrix[from][to] = 1;   // row = from, column = to
        // Undirected graph: mirror the entry (drop this line for a directed graph)
        matrix[to][from] = 1;
    }

    // Prints each vertex followed by the vertices it has an edge to
    public void print() {
        for (int i = 0; i < vertexCount; i++) {
            System.out.print(i + " -> ");
            for (int j = 0; j < vertexCount; j++) {
                if (matrix[i][j] == 1) {
                    System.out.print(j + " ");
                }
            }
            System.out.println();
        }
    }
}
```
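This basic implementation leaves room for several improvements:
- Switch to a boolean matrix to save space
- For weighted graphs, use Integer instead of int, with null meaning "no edge"
- Add dynamic resizing (common in real projects)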
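The second improvement can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's own code: the class name `WeightedMatrixGraph` and its methods are hypothetical, and edges are treated as directed.

```java
// Hypothetical sketch: a weighted adjacency matrix where null marks "no edge".
class WeightedMatrixGraph {
    // Integer (not int) so that 0 stays a legal weight and null can mean "absent".
    private final Integer[][] weights;

    WeightedMatrixGraph(int vertexCount) {
        weights = new Integer[vertexCount][vertexCount];
    }

    // Adds a directed edge from -> to with the given weight.
    void addEdge(int from, int to, int weight) {
        weights[from][to] = weight;
    }

    boolean hasEdge(int from, int to) {
        return weights[from][to] != null;
    }

    // Returns the edge weight, or null when there is no edge.
    Integer weight(int from, int to) {
        return weights[from][to];
    }
}
```

The price of this design is boxing: every stored weight is an object reference, so for large dense graphs a primitive matrix with a sentinel value may still be the better fit.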
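2.2 Adjacency List Implementation
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Graph {
    private final Map<Integer, List<Integer>> adjList = new HashMap<>();

    public void addVertex(int vertex) {
        adjList.putIfAbsent(vertex, new ArrayList<>());
    }

    public void addEdge(int from, int to) {
        // computeIfAbsent creates missing vertices, so a bare get() cannot return null here
        adjList.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        // Undirected graph: add the reverse direction as well
        adjList.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    public List<Integer> getNeighbors(int vertex) {
        return adjList.getOrDefault(vertex, Collections.emptyList());
    }

    // Vertex count; the traversals below size their visited arrays with it,
    // which assumes vertex IDs are 0..size()-1
    public int size() {
        return adjList.size();
    }
}
```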
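In real projects I often use Guava's Multimap instead of maintaining the lists by hand:
```java
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;

Multimap<Integer, Integer> graph = ArrayListMultimap.create();
graph.put(1, 2);   // edge 1 -> 2
graph.put(1, 3);   // edge 1 -> 3
```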
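3. Graph Traversal Algorithms in Depth
3.1 Depth-First Search (DFS) in Practice
DFS works like the keep-your-right-hand-on-the-wall strategy in a maze: follow one path as far as it goes, then backtrack. The recursive implementation is the most intuitive:
```java
void dfs(int node, boolean[] visited, Graph graph) {
    visited[node] = true;
    System.out.print(node + " ");
    for (int neighbor : graph.getNeighbors(node)) {
        if (!visited[neighbor]) {
            dfs(neighbor, visited, graph);
        }
    }
}
```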
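On large or deep graphs, however, recursion can overflow the call stack, so real projects should use an explicit stack:
```java
void dfsIterative(int start, Graph graph) {
    Deque<Integer> stack = new ArrayDeque<>();   // preferred over the legacy Stack class
    boolean[] visited = new boolean[graph.size()];
    stack.push(start);
    while (!stack.isEmpty()) {
        int current = stack.pop();
        if (visited[current]) {
            continue;   // a vertex can be pushed more than once
        }
        // Mark on pop (not on push) so the visit order is a genuine depth-first order
        visited[current] = true;
        System.out.print(current + " ");
        for (int neighbor : graph.getNeighbors(current)) {
            if (!visited[neighbor]) {
                stack.push(neighbor);
            }
        }
    }
}
```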
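Pitfall: because a stack is last-in, first-out, the iterative version visits each vertex's neighbors in the reverse of their insertion order. When the order matters, push the neighbors in reverse, for example with a LinkedList's descendingIterator() or by calling Collections.reverse() on a copy of the list.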
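The ordering pitfall can be sketched as follows. This is a minimal, self-contained illustration rather than the article's Graph class: the `OrderedDfs` name and the plain `Map<Integer, List<Integer>>` adjacency list are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: iterative DFS that visits neighbors in insertion order.
class OrderedDfs {
    static List<Integer> dfs(Map<Integer, List<Integer>> adj, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int current = stack.pop();
            if (!visited.add(current)) {
                continue; // already visited through another path
            }
            order.add(current);
            List<Integer> neighbors = adj.getOrDefault(current, List.of());
            // Push in reverse so the first-inserted neighbor ends up on top of the stack.
            for (int i = neighbors.size() - 1; i >= 0; i--) {
                int next = neighbors.get(i);
                if (!visited.contains(next)) {
                    stack.push(next);
                }
            }
        }
        return order;
    }
}
```

For a graph with edges 0 -> 1, 0 -> 2, 1 -> 3 and 2 -> 3, this visits 0, 1, 3, 2, which is the same order the recursive version produces.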
3.2 Breadth-First Search (BFS) Implementation
BFS spreads like ripples on water, one layer at a time, which is why it is built on a queue:
```java
void bfs(int start, Graph graph) {
    Queue<Integer> queue = new ArrayDeque<>();
    boolean[] visited = new boolean[graph.size()];
    queue.offer(start);
    visited[start] = true;   // mark on enqueue so no vertex is queued twice
    while (!queue.isEmpty()) {
        int current = queue.poll();
        System.out.print(current + " ");
        for (int neighbor : graph.getNeighbors(current)) {
            if (!visited[neighbor]) {
                visited[neighbor] = true;
                queue.offer(neighbor);
            }
        }
    }
}
```
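In a social network project I used BFS to recommend third-degree connections. The key optimizations were:
- Recording the depth (level) of each vertex
- Using a double-ended queue to control the traversal direction
- Terminating early (for example, once the target node is found)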
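Depth tracking and early termination can be combined in a depth-limited BFS. The sketch below is an assumption-laden illustration, not the project code: `DegreeBfs`, `withinDepth`, and the `Map`-based adjacency list are hypothetical names chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: BFS that records hop counts and stops at a depth limit.
class DegreeBfs {
    // Returns every vertex reachable from start within maxDepth hops, mapped to its hop count.
    static Map<Integer, Integer> withinDepth(Map<Integer, List<Integer>> adj, int start, int maxDepth) {
        Map<Integer, Integer> depth = new LinkedHashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.offer(start);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            int d = depth.get(current);
            if (d == maxDepth) {
                // BFS dequeues in non-decreasing depth, so everything still queued
                // is also at the limit and nothing further needs expanding.
                break;
            }
            for (int next : adj.getOrDefault(current, List.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.offer(next);
                }
            }
        }
        depth.remove(start); // the start vertex is not its own connection
        return depth;
    }
}
```

Calling `withinDepth(adj, userId, 3)` would give the candidates for a third-degree recommendation; filtering the result for hop count 3 keeps only the outermost ring.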
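4. Shortest Path Algorithms Compared
4.1 Dijkstra's Algorithm
The classic solution for shortest paths in a weighted graph. I optimize it with a priority queue:
```java
int[] dijkstra(int start, WeightedGraph graph) {
    PriorityQueue<Node> pq = new PriorityQueue<>();
    int[] dist = new int[graph.size()];
    Arrays.fill(dist, Integer.MAX_VALUE);
    dist[start] = 0;
    pq.offer(new Node(start, 0));
    while (!pq.isEmpty()) {
        Node current = pq.poll();
        if (current.distance > dist[current.id]) {
            continue;   // stale queue entry: a shorter path was already found
        }
        for (Edge edge : graph.getEdges(current.id)) {
            int newDist = dist[current.id] + edge.weight;
            if (newDist < dist[edge.to]) {
                dist[edge.to] = newDist;
                pq.offer(new Node(edge.to, newDist));
            }
        }
    }
    return dist;   // Integer.MAX_VALUE marks unreachable vertices
}

class Node implements Comparable<Node> {
    final int id, distance;

    Node(int id, int distance) {
        this.id = id;
        this.distance = distance;
    }

    @Override
    public int compareTo(Node other) {
        return Integer.compare(distance, other.distance);   // smallest distance first
    }
}
```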
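Important limitation: Dijkstra cannot handle negative edge weights. I once got wrong results from a navigation system because of this.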
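The `WeightedGraph` and `Edge` types used above are not shown in the article, so here is a self-contained sketch with hypothetical stand-ins for both. The `DijkstraSketch` class, its nested `Edge`, and the list-of-lists adjacency structure are assumptions for illustration. It also rejects negative weights up front, which turns the limitation into a visible error instead of a silent wrong answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: Dijkstra over a list-of-lists adjacency structure.
class DijkstraSketch {
    // Assumed edge shape: target vertex plus a non-negative weight.
    static final class Edge {
        final int to, weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    // adj.get(u) lists the outgoing edges of u; vertices are numbered 0..n-1.
    static int[] shortestPaths(List<List<Edge>> adj, int start) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[start] = 0;
        // Queue entries are {vertex, distance}, ordered by distance.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.offer(new int[] {start, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) {
                continue; // stale entry, a shorter path to u was already settled
            }
            for (Edge e : adj.get(u)) {
                if (e.weight < 0) {
                    throw new IllegalArgumentException("Dijkstra requires non-negative weights");
                }
                int candidate = dist[u] + e.weight;
                if (candidate < dist[e.to]) {
                    dist[e.to] = candidate;
                    pq.offer(new int[] {e.to, candidate});
                }
            }
        }
        return dist; // Integer.MAX_VALUE marks unreachable vertices
    }
}
```

Re-inserting a vertex with its improved distance and skipping stale entries on poll is a common substitute for a decrease-key operation, which `java.util.PriorityQueue` does not offer.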
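4.2 Floyd-Warshall: the Dynamic Programming Approach
An all-pairs shortest path algorithm. Its three nested loops are brute force, O(V³), but effective:
```java
int[][] floydWarshall(int[][] graph) {
    final int INF = Integer.MAX_VALUE;   // assumption: the input marks "no edge" with Integer.MAX_VALUE
    int V = graph.length;
    int[][] dist = new int[V][V];
    // Initialize with the direct edge weights
    for (int i = 0; i < V; i++) {
        for (int j = 0; j < V; j++) {
            dist[i][j] = graph[i][j];
        }
    }
    // Dynamic programming core: allow vertex k as an intermediate stop
    for (int k = 0; k < V; k++) {
        for (int i = 0; i < V; i++) {
            for (int j = 0; j < V; j++) {
                // Skip unreachable legs so that INF + weight cannot overflow
                if (dist[i][k] != INF && dist[k][j] != INF
                        && dist[i][k] + dist[k][j] < dist[i][j]) {
                    dist[i][j] = dist[i][k] + dist[k][j];
                }
            }
        }
    }
    return dist;
}
```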
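In practice the algorithm should be paired with a path reconstruction matrix, since distances alone do not tell you which route to take. I once used this algorithm to optimize logistics delivery routes.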
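One common way to add path reconstruction is a `next` matrix that stores the first hop of each shortest path. The sketch below is an illustration under assumptions, not the logistics code: the `FloydPaths` class and its `INF` sentinel are hypothetical, and the input is expected to hold 0 on the diagonal and `INF` where there is no edge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Floyd-Warshall with a first-hop matrix for path reconstruction.
class FloydPaths {
    static final int INF = Integer.MAX_VALUE / 2; // "no edge" marker; INF + INF still fits in an int

    final int[][] dist;
    final int[][] next; // next[i][j] = vertex that follows i on a shortest path to j, or -1

    FloydPaths(int[][] weights) {
        int n = weights.length;
        dist = new int[n][n];
        next = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = weights[i][j];
                next[i][j] = (weights[i][j] < INF) ? j : -1;
            }
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] < INF && dist[k][j] < INF
                            && dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                        next[i][j] = next[i][k]; // go toward k first
                    }
                }
            }
        }
    }

    // Returns the vertices on a shortest path, or an empty list when none exists.
    List<Integer> path(int from, int to) {
        List<Integer> path = new ArrayList<>();
        if (next[from][to] == -1) {
            return path;
        }
        path.add(from);
        int current = from;
        while (current != to) {
            current = next[current][to];
            path.add(current);
        }
        return path;
    }
}
```

Storing the first hop rather than the intermediate vertex k keeps reconstruction a simple loop instead of a recursive split. The sketch assumes there are no negative cycles.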
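5. Minimum Spanning Trees in Practice
5.1 Prim's Algorithm
A greedy strategy that grows the tree one vertex at a time:
```java
// Assumes a connected, undirected graph where graph[u][v] == 0 means "no edge"
void primMST(int[][] graph) {
    int V = graph.length;
    int[] parent = new int[V];          // parent[v] = tree neighbor of v
    int[] key = new int[V];             // key[v] = cheapest known edge into v
    boolean[] mstSet = new boolean[V];  // vertices already in the tree
    Arrays.fill(key, Integer.MAX_VALUE);
    key[0] = 0;       // start from vertex 0
    parent[0] = -1;   // the root has no parent
    for (int count = 0; count < V - 1; count++) {
        int u = minKey(key, mstSet);
        mstSet[u] = true;
        for (int v = 0; v < V; v++) {
            if (graph[u][v] != 0 && !mstSet[v] && graph[u][v] < key[v]) {
                parent[v] = u;
                key[v] = graph[u][v];
            }
        }
    }
    printMST(parent, graph);
}

// One possible helper: index of the cheapest vertex not yet in the tree (linear scan)
int minKey(int[] key, boolean[] mstSet) {
    int minIndex = -1;
    for (int v = 0; v < key.length; v++) {
        if (!mstSet[v] && (minIndex == -1 || key[v] < key[minIndex])) {
            minIndex = v;
        }
    }
    return minIndex;
}

// One possible helper: prints each tree edge with its weight
void printMST(int[] parent, int[][] graph) {
    for (int v = 1; v < parent.length; v++) {
        System.out.println(parent[v] + " - " + v + " : " + graph[v][parent[v]]);
    }
}
```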
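5.2 Kruskal's Algorithm with Union-Find
Sort the edges, then use a disjoint set (union-find) to detect cycles:
```java
void kruskalMST(Graph graph) {
    List<Edge> result = new ArrayList<>();   // a spanning tree has exactly V - 1 edges
    Arrays.sort(graph.edges);                // ascending by weight; Edge must implement Comparable<Edge>
    DisjointSet ds = new DisjointSet(graph.V);
    for (Edge edge : graph.edges) {
        if (result.size() == graph.V - 1) break;   // tree is complete
        int x = ds.find(edge.src);
        int y = ds.find(edge.dest);
        if (x != y) {   // different components, so this edge cannot close a cycle
            result.add(edge);
            ds.union(x, y);
        }
    }
    printMST(result);
}
```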
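In a power grid wiring project, Kruskal outperformed Prim because the graph was sparse: the number of edges was far smaller than the square of the number of vertices. Kruskal costs O(E log E), whereas the array-based Prim above costs O(V²) regardless of how few edges there are.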
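Kruskal's cycle check relies on a `DisjointSet` class that the article does not show. One possible implementation, offered as a sketch rather than the version used in the project, combines union by rank with path halving:

```java
// Hypothetical sketch of the DisjointSet used by Kruskal's algorithm.
class DisjointSet {
    private final int[] parent;
    private final int[] rank;

    DisjointSet(int size) {
        parent = new int[size];
        rank = new int[size];
        for (int i = 0; i < size; i++) {
            parent[i] = i; // every element starts as its own root
        }
    }

    // Returns the representative of x's set, shortening the path as it goes.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    // Merges the sets of a and b; returns false if they were already joined.
    boolean union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) {
            return false;
        }
        if (rank[rootA] < rank[rootB]) { // attach the shallower tree under the deeper one
            int tmp = rootA;
            rootA = rootB;
            rootB = tmp;
        }
        parent[rootB] = rootA;
        if (rank[rootA] == rank[rootB]) {
            rank[rootA]++;
        }
        return true;
    }
}
```

Because `union` reports whether a merge happened, a caller could also skip the two explicit `find` calls and simply test its return value.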
6. Topological Sorting and Its Applications
A topological sort is a linear ordering of a directed acyclic graph (DAG). I have used it for dependency resolution in a compiler:
```java
List<Integer> topologicalSort(Graph graph) {
    Deque<Integer> stack = new ArrayDeque<>();
    boolean[] visited = new boolean[graph.V];
    for (int i = 0; i < graph.V; i++) {
        if (!visited[i]) {
            topologicalSortUtil(i, visited, stack, graph);
        }
    }
    // Popping yields reverse finishing order, which is a valid topological order
    List<Integer> result = new ArrayList<>();
    while (!stack.isEmpty()) {
        result.add(stack.pop());
    }
    return result;
}

void topologicalSortUtil(int v, boolean[] visited, Deque<Integer> stack, Graph graph) {
    visited[v] = true;
    for (int neighbor : graph.adj[v]) {
        if (!visited[neighbor]) {
            topologicalSortUtil(neighbor, visited, stack, graph);
        }
    }
    stack.push(v);   // push only after every descendant has been finished
}
```
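Improvements made in real projects:
- Cycle detection (marking vertices that are on the recursion stack)
- Parallel processing (worker model)
- Incremental updates (partial re-sorting)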
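The first improvement, cycle detection through recursion-stack marking, can be sketched with three vertex states. The `TopoSortSketch` class and the list-of-lists input are assumptions for illustration; the version above works on the article's own Graph type instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: DFS-based topological sort that fails fast on cycles.
class TopoSortSketch {
    private static final int UNVISITED = 0, ON_STACK = 1, DONE = 2;

    // Returns a topological order of vertices 0..n-1, or throws if the graph has a cycle.
    static List<Integer> sort(List<List<Integer>> adj) {
        int[] state = new int[adj.size()];
        Deque<Integer> finished = new ArrayDeque<>();
        for (int v = 0; v < adj.size(); v++) {
            if (state[v] == UNVISITED) {
                visit(v, adj, state, finished);
            }
        }
        // Iterating from the head gives the most recently finished vertex first.
        return new ArrayList<>(finished);
    }

    private static void visit(int v, List<List<Integer>> adj, int[] state, Deque<Integer> finished) {
        state[v] = ON_STACK;
        for (int next : adj.get(v)) {
            if (state[next] == ON_STACK) {
                // An edge back into the current recursion path closes a cycle.
                throw new IllegalStateException("cycle detected through vertex " + next);
            }
            if (state[next] == UNVISITED) {
                visit(next, adj, state, finished);
            }
        }
        state[v] = DONE;
        finished.push(v);
    }
}
```

A plain visited flag cannot tell a back edge from an edge into an already finished subtree, which is why the sketch keeps a separate ON_STACK state.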
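7. Performance Optimization and Engineering Practice
7.1 Memory Optimization Techniques
- Store sparse graphs in CSR (compressed sparse row) format:
  - a vertex pointer (row offset) array
  - a neighbor index array
  - an edge weight array (optional)
- Bitmap compression: once the vertex count exceeds 1 million, use a BitSet instead of a boolean array (one bit per vertex instead of, typically, one byte)
- Object pooling: manage frequently created Edge objects through a pool so they can be reused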
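The CSR layout from the list above can be built in two passes over the edge list. This is a minimal sketch under assumptions: the `CsrGraph` class, its field names, and the `{from, to}` edge format are hypothetical, and the optional weight array is left out.

```java
import java.util.Arrays;

// Hypothetical sketch: building a CSR (compressed sparse row) graph from an edge list.
class CsrGraph {
    final int[] rowOffsets; // neighbors of v live in neighbors[rowOffsets[v] .. rowOffsets[v + 1])
    final int[] neighbors;

    // edges[i] = {from, to}; vertices are numbered 0..vertexCount-1
    CsrGraph(int vertexCount, int[][] edges) {
        rowOffsets = new int[vertexCount + 1];
        neighbors = new int[edges.length];
        // Pass 1: count out-degrees, shifted by one slot.
        for (int[] e : edges) {
            rowOffsets[e[0] + 1]++;
        }
        // A prefix sum turns the counts into start offsets.
        for (int v = 0; v < vertexCount; v++) {
            rowOffsets[v + 1] += rowOffsets[v];
        }
        // Pass 2: drop each target into the next free slot of its source row.
        int[] cursor = Arrays.copyOf(rowOffsets, vertexCount);
        for (int[] e : edges) {
            neighbors[cursor[e[0]]++] = e[1];
        }
    }

    // Returns a copy of v's neighbor slice.
    int[] neighborsOf(int v) {
        return Arrays.copyOfRange(neighbors, rowOffsets[v], rowOffsets[v + 1]);
    }
}
```

Two flat int arrays avoid the per-list and per-Integer object overhead of a `Map<Integer, List<Integer>>`, at the cost of making edge insertion after construction expensive.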
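7.2 Multithreading Approach
What I did in a social network analysis project:
```java
ExecutorService executor = Executors.newFixedThreadPool(8);
try {
    List<Future<Integer>> futures = new ArrayList<>();
    for (int partition = 0; partition < 8; partition++) {
        final int p = partition;   // lambdas can only capture effectively final variables
        futures.add(executor.submit(() -> processPartition(graph, p)));
    }
    int total = 0;
    for (Future<Integer> f : futures) {
        total += f.get();   // blocks; throws InterruptedException and ExecutionException
    }
} finally {
    executor.shutdown();   // otherwise the pool's threads keep the JVM alive
}
```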
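Key points:
- Partition by a hash of the vertex ID
- Use copy-on-write to avoid lock contention
- Decide on a strategy for merging results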
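Hash partitioning and result merging can be sketched as below. Everything here is illustrative: the article does not show `processPartition`, so this version invents a simple task (summing out-degrees) purely to have something to merge, and `PartitionedDegreeSum` and `totalEdges` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: partition vertices by ID hash, process in parallel, merge by summing.
class PartitionedDegreeSum {
    // Stand-in task: sums the out-degrees of the vertices owned by one partition.
    static int processPartition(Map<Integer, List<Integer>> adj, int partition, int partitions) {
        int sum = 0;
        for (Map.Entry<Integer, List<Integer>> e : adj.entrySet()) {
            // floorMod keeps the partition index non-negative even for negative hashes
            if (Math.floorMod(Integer.hashCode(e.getKey()), partitions) == partition) {
                sum += e.getValue().size();
            }
        }
        return sum;
    }

    // The graph is only read here, so the tasks can share it without locking.
    static int totalEdges(Map<Integer, List<Integer>> adj, int partitions) {
        ExecutorService executor = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                futures.add(executor.submit(() -> processPartition(adj, partition, partitions)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // merge step: partial sums are simply added
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

Each task scans the whole map and keeps only its own vertices, which is wasteful but keeps the sketch short; a real system would more likely materialize the partitions once.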
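7.3 Troubleshooting Common Problems
- Stack overflow: when DFS recursion goes too deep, switch to an explicit stack
- Negative cycles: detect them with the Bellman-Ford algorithm
- Concurrent modification: use ConcurrentHashMap or CopyOnWriteArrayList
- Out of memory: process the graph in chunks or move it to disk storage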
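Negative cycle detection with Bellman-Ford can be sketched as follows. The `BellmanFordSketch` class and the `{from, to, weight}` edge format are assumptions made for the example; the check only covers cycles reachable from the chosen source.

```java
import java.util.Arrays;

// Hypothetical sketch: Bellman-Ford used only to detect a reachable negative cycle.
class BellmanFordSketch {
    // edges[i] = {from, to, weight}; vertices are numbered 0..vertexCount-1
    static boolean hasNegativeCycle(int vertexCount, int[][] edges, int source) {
        long[] dist = new long[vertexCount]; // long avoids overflow when sums go very negative
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        // Without negative cycles, V - 1 relaxation rounds are always enough.
        for (int round = 1; round < vertexCount; round++) {
            boolean changed = false;
            for (int[] e : edges) {
                if (dist[e[0]] != Long.MAX_VALUE && dist[e[0]] + e[2] < dist[e[1]]) {
                    dist[e[1]] = dist[e[0]] + e[2];
                    changed = true;
                }
            }
            if (!changed) {
                return false; // converged early, so no negative cycle is reachable
            }
        }
        // One extra pass: any further improvement can only come from a negative cycle.
        for (int[] e : edges) {
            if (dist[e[0]] != Long.MAX_VALUE && dist[e[0]] + e[2] < dist[e[1]]) {
                return true;
            }
        }
        return false;
    }
}
```

Running this check before Dijkstra is a cheap guard when edge weights come from untrusted or computed data.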
8. Real-World Project Cases
8.1 Social Network Analysis
Computing user influence on a graph with PageRank:
```java
Map<Integer, Double> calculatePageRank(Graph graph, int iterations) {
    Map<Integer, Double> pr = new HashMap<>();
    double initialValue = 1.0 / graph.size();
    // Initialize: every node starts with the same rank
    for (int node : graph.nodes()) {
        pr.put(node, initialValue);
    }
    // Iterate
    for (int i = 0; i < iterations; i++) {
        Map<Integer, Double> newPr = new HashMap<>();
        double danglingFactor = 0.0;
        // Collect the rank held by dangling nodes (nodes with no outgoing edges)
        for (int node : graph.nodes()) {
            if (graph.outDegree(node) == 0) {
                danglingFactor += pr.get(node);
            }
        }
        danglingFactor /= graph.size();   // spread it evenly over all nodes
        // Compute the new PR values (damping factor 0.85)
        for (int node : graph.nodes()) {
            double sum = 0.0;
            for (int inNode : graph.inNeighbors(node)) {
                sum += pr.get(inNode) / graph.outDegree(inNode);
            }
            newPr.put(node, 0.15 / graph.size() + 0.85 * (sum + danglingFactor));
        }
        pr = newPr;
    }
    return pr;
}
```
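8.2 Recommendation Systems
Graph-based collaborative filtering:
- Build a user-item bipartite graph
- Compute Personalized PageRank
- Recommend the top K items the user has not interacted with
```java
List<Integer> recommendItems(int userId, Graph userItemGraph, int topK) {
    Map<Integer, Double> scores = personalizedPageRank(userId, userItemGraph);
    // Items the user already interacted with are the user's direct neighbors
    Set<Integer> seen = new HashSet<>(userItemGraph.getNeighbors(userId));
    return scores.entrySet().stream()
        .filter(e -> e.getKey() > 10000)            // assumes item IDs are > 10000
        .filter(e -> !seen.contains(e.getKey()))    // keep only items not yet interacted with
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .limit(topK)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
}
```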
Directions for optimization:
- Introduce a time decay factor
- Combine with content features
- Support real-time incremental updates
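The `personalizedPageRank` call in the recommendation code is not shown in the article. A minimal power-iteration sketch is given below as one way it could work; the class name, the extra `iterations` and `damping` parameters, and the `Map`-based adjacency list are all assumptions, and every vertex is expected to appear as a key in the map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Personalized PageRank by power iteration with restart.
class PersonalizedPageRankSketch {
    static Map<Integer, Double> personalizedPageRank(
            int source, Map<Integer, List<Integer>> adj, int iterations, double damping) {
        // All rank mass starts on the source vertex.
        Map<Integer, Double> rank = new HashMap<>();
        for (int node : adj.keySet()) {
            rank.put(node, node == source ? 1.0 : 0.0);
        }
        for (int i = 0; i < iterations; i++) {
            Map<Integer, Double> next = new HashMap<>();
            for (int node : adj.keySet()) {
                next.put(node, 0.0);
            }
            double restart = 0.0; // mass that jumps back to the source this round
            for (int node : adj.keySet()) {
                double mass = rank.get(node);
                List<Integer> out = adj.get(node);
                if (out.isEmpty()) {
                    restart += mass; // dangling vertex: everything returns to the source
                    continue;
                }
                restart += (1 - damping) * mass;
                double share = damping * mass / out.size();
                for (int target : out) {
                    next.merge(target, share, Double::sum);
                }
            }
            next.merge(source, restart, Double::sum);
            rank = next;
        }
        return rank;
    }
}
```

The only difference from the global PageRank in 8.1 is where the restart mass goes: to the single source user instead of being spread evenly, which biases the scores toward that user's neighborhood.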
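9. Recommended Visualization Tools
- GraphViz: draws professional-quality diagrams from the DOT language
```dot
digraph G {
    A -> B
    B -> C
    C -> A
}
```
- JGraphT: a Java graph theory library that can be paired with visualization components
```java
Graph<Integer, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
// Note: VisualizationImageServer and CircleLayoutAlgorithm are not part of JGraphT core;
// they come from a separate JUNG-style visualization library, so check this constructor
// against the version you use
VisualizationImageServer<Integer, DefaultEdge> v = new VisualizationImageServer<>(new CircleLayoutAlgorithm<>(g), new Dimension(600, 400));
```
- Gephi: open-source network analysis software with support for force-directed layouts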
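To feed an in-memory graph to GraphViz, it is enough to emit DOT text. The sketch below is a hypothetical helper, not part of any of the tools listed: `DotExporter` and `toDot` are names invented for the example, and vertices without outgoing edges are simply omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: render a directed adjacency list as GraphViz DOT text.
class DotExporter {
    static String toDot(Map<Integer, List<Integer>> adj) {
        StringBuilder sb = new StringBuilder("digraph G {\n");
        // TreeMap gives a stable, sorted output regardless of the input map type.
        for (Map.Entry<Integer, List<Integer>> entry : new TreeMap<>(adj).entrySet()) {
            for (int target : entry.getValue()) {
                sb.append("    ").append(entry.getKey())
                  .append(" -> ").append(target).append(";\n");
            }
        }
        return sb.append("}\n").toString();
    }
}
```

Saving the returned string to a file such as graph.dot and running `dot -Tpng graph.dot -o graph.png` would then render it.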
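10. Advanced Learning Roadmap
- Advanced algorithms:
  - Maximum flow (Ford-Fulkerson)
  - Bipartite matching (Hungarian algorithm)
  - Strongly connected components (Kosaraju)
- Graph databases:
  - Neo4j Cypher queries
  - JanusGraph for distributed deployments
  - TigerGraph for parallel computation
- Graph computing frameworks:
  - Apache Giraph
  - GraphX (Spark)
  - The Pregel model
- Research frontiers:
  - Graph neural networks (GNN)
  - Dynamic graph processing
  - Subgraph matching optimization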
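My biggest takeaway from real projects is that choosing the right graph algorithm often matters more than how you implement it. I once spent two weeks squeezing out a 0.5% performance gain, only to discover later that switching to a different algorithm gave a 20x speedup outright. Understanding the essence of the problem and choosing suitable data structures and algorithms matters more than optimizing blindly.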