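As a developer who has used FlinkSQL for real-time data processing for a long time, I often need to understand the execution logic behind a SQL statement. In this post I show how to use EXPLAIN PLAN to examine how FlinkSQL executes a query under the hood, and which SQL operations create state, which matters a great deal for system stability.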
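FlinkSQL executes a statement in several key stages: the SQL text is parsed and validated into a logical plan, the optimizer rewrites that plan, the planner translates it into a physical plan, and the physical plan is finally turned into the operators that run in the job.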
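Of these stages, the physical plan deserves the most attention, because it directly determines the job's execution efficiency and resource consumption.
Let's walk through a practical example of how to view and analyze an execution plan: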
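```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// Register a Kafka-backed table with an event-time attribute and a 5-second watermark delay.
tEnv.executeSql("CREATE TABLE events (" +
    "  userId STRING," +
    "  amount BIGINT," +
    "  ts TIMESTAMP(3)," +
    // Processing-time attribute; lookup joins against dimension tables need one.
    "  proc_time AS PROCTIME()," +
    "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'events'," +
    "  'properties.bootstrap.servers' = 'kafka:9092'," +
    "  'format' = 'json'" +
    ")");

// explainSql only plans the query; it does not start a job.
String plan = tEnv.explainSql(
    "SELECT userId, COUNT(*) AS cnt " +
    "FROM TABLE(TUMBLE(TABLE events, DESCRIPTOR(ts), INTERVAL '1' MINUTES)) " +
    "GROUP BY window_start, window_end, userId"
);
System.out.println(plan);
```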
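The explain output typically contains three sections. In recent Flink versions these are the abstract syntax tree (`== Abstract Syntax Tree ==`), the optimized physical plan (`== Optimized Physical Plan ==`), and the optimized execution plan (`== Optimized Execution Plan ==`).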
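When a plan is long, I find it easier to look at one section at a time. The following is a minimal sketch in plain Java (no Flink dependency) that splits the explain text on its `== Title ==` header lines. The class name `PlanSections` and the sample plan text are my own illustrations; the sample is hand-written and heavily abbreviated, not real Flink output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Splits the text returned by explainSql() into its "== Title ==" sections. */
class PlanSections {

    /** Returns section title -> section body, in the order the sections appear. */
    static Map<String, String> split(String plan) {
        Map<String, String> sections = new LinkedHashMap<>();
        String title = null;
        StringBuilder body = new StringBuilder();
        for (String line : plan.split("\\R")) {
            String t = line.trim();
            boolean isHeader = t.length() > 6 && t.startsWith("== ") && t.endsWith(" ==");
            if (isHeader) {
                if (title != null) {
                    sections.put(title, stripTrailing(body));
                }
                title = t.substring(3, t.length() - 3);
                body.setLength(0);
            } else if (title != null) {
                // Keep the original indentation: it encodes the operator tree.
                body.append(line).append('\n');
            }
        }
        if (title != null) {
            sections.put(title, stripTrailing(body));
        }
        return sections;
    }

    private static String stripTrailing(StringBuilder sb) {
        return sb.toString().replaceAll("\\s+\\z", "");
    }

    public static void main(String[] args) {
        // Hand-written, abbreviated sample; real operator descriptions are much longer.
        String plan = String.join("\n",
            "== Abstract Syntax Tree ==",
            "LogicalProject(...)",
            "",
            "== Optimized Physical Plan ==",
            "Calc(...)",
            "",
            "== Optimized Execution Plan ==",
            "Calc(...)");
        split(plan).forEach((title, text) -> System.out.println(title + " -> " + text));
    }
}
```

In practice you would pass the string returned by `explainSql` and then read only the physical plan section when hunting for stateful operators.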
Let's look more closely at the common operators and whether they hold state:
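| Operator | Meaning | Stateful? | State notes |
|---|---|---|---|
| TableSourceScan | Reads the data source | ✅ | Stores the read position, such as Kafka offsets |
| WatermarkAssigner | Generates watermarks | ❌ | Stateless |
| Calc | SELECT/WHERE projection and filtering | ❌ | Simple per-row transformation, no state |
| Exchange | Repartitions data (corresponds to keyBy) | ❌ | Only distributes data |
| GroupWindowAggregate (shown as WindowAggregate for window TVF queries) | Window aggregation | ✅ | Holds the aggregation state of open windows |
| GroupAggregate | Unbounded GROUP BY | ✅ | Dangerous: state grows with the number of distinct keys and is never cleaned up by default |
| Join | Stream-stream join | ✅ | Buffers rows from both inputs |
| LookupJoin | Dimension table join | ❌ | Queries the external system for each record |
| Deduplicate | ROW_NUMBER deduplication | ✅ | Stores the retained row for each key |
| OverAggregate | OVER window | ✅ | Stores the rows inside the window range |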
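Stateful operators that need special attention:
- GroupAggregate: a GROUP BY without a window keeps one accumulator per key forever, so state grows without limit as new keys arrive. Set a state TTL or switch to a window aggregation.
- Join: a stream-stream join buffers rows from both inputs. State size depends on the data volume and on how long rows must be retained; a regular join with no time bound and no TTL retains every row.
- OverAggregate: an OVER window retains the specified number of rows or time range per partition, so choose the window size carefully.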
The following queries all create state:
```sql
-- 1. Unbounded GROUP BY: state grows without limit as new keys arrive
SELECT userId, COUNT(*) FROM events GROUP BY userId;
-- 2. Stream-stream JOIN: rows from both inputs are buffered
SELECT a.*, b.name
FROM events a JOIN users b ON a.userId = b.userId;
-- 3. OVER window: keeps the rows inside the window range
SELECT userId,
       SUM(amount) OVER (PARTITION BY userId ORDER BY ts ROWS BETWEEN 10 PRECEDING AND CURRENT ROW)
FROM events;
-- 4. Deduplication: stores the retained row for each key
SELECT userId, ts FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY userId ORDER BY ts DESC) AS rn FROM events
) WHERE rn = 1;
```
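The following queries keep no state, or only bounded state:
```sql
-- 1. Simple projection and filter
SELECT userId, amount * 2 AS double_amount FROM events WHERE amount > 100;
-- 2. Merging streams
SELECT * FROM events_a UNION ALL SELECT * FROM events_b;
-- 3. Window aggregation: stateful, but bounded, because state is cleared when the window closes
SELECT userId, COUNT(*) FROM TABLE(TUMBLE(TABLE events, DESCRIPTOR(ts), INTERVAL '1' MINUTES))
GROUP BY window_start, window_end, userId;
-- 4. Lookup join against a dimension table (queried per record at processing time).
--    Requires a processing-time attribute on events, e.g. proc_time AS PROCTIME().
SELECT e.userId, d.name
FROM events e JOIN dim_user FOR SYSTEM_TIME AS OF e.proc_time AS d ON e.userId = d.userId;
```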
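```sql
-- Not recommended: state grows without limit
SELECT userId, SUM(amount) FROM events GROUP BY userId;
-- Recommended: hourly window aggregation (returns per-hour totals, not a running total)
SELECT window_start, window_end, userId, SUM(amount) AS total_amount
FROM TABLE(TUMBLE(TABLE events, DESCRIPTOR(ts), INTERVAL '1' HOURS))
GROUP BY window_start, window_end, userId;
```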
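To see why the windowed version stays small, here is a toy model in plain Java. It is not how Flink stores state; it only counts how many per-key accumulators are alive under each strategy. The class name `StateGrowthDemo` and the workload (every hour brings users never seen before) are assumptions chosen to make the difference visible.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: live state entries for an unbounded GROUP BY versus a tumbling window. */
class StateGrowthDemo {

    /** Returns {unboundedEntries, peakWindowedEntries} after feeding synthetic events. */
    static int[] simulate(int hours, int newUsersPerHour) {
        Map<String, Long> unbounded = new HashMap<>(); // one accumulator per userId, never removed
        Map<String, Long> window = new HashMap<>();    // accumulators of the currently open window
        int peakWindow = 0;
        for (int h = 0; h < hours; h++) {
            for (int u = 0; u < newUsersPerHour; u++) {
                String userId = "user-" + h + "-" + u; // assumption: each hour brings unseen users
                unbounded.merge(userId, 1L, Long::sum);
                window.merge(userId, 1L, Long::sum);
            }
            peakWindow = Math.max(peakWindow, window.size());
            window.clear(); // the hourly window closes and its state is dropped
        }
        return new int[] {unbounded.size(), peakWindow};
    }

    public static void main(String[] args) {
        int[] r = simulate(24, 1000);
        System.out.println("unbounded GROUP BY entries: " + r[0]); // 24000
        System.out.println("peak windowed entries:      " + r[1]); // 1000
    }
}
```

The unbounded map keeps growing with every new key, while the windowed map never exceeds the number of keys seen in a single window.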
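```sql
-- Not recommended: a stream-stream join is stateful
SELECT e.*, u.name FROM events e JOIN users u ON e.userId = u.userId;
-- Recommended: a lookup join is stateless. It needs a processing-time attribute on events
-- (e.g. proc_time AS PROCTIME()) and does not update earlier results when dim_user changes.
SELECT e.*, u.name
FROM events e
JOIN dim_user FOR SYSTEM_TIME AS OF e.proc_time AS u
  ON e.userId = u.userId;
```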
For operations that really do need unbounded state, always configure a state TTL:
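```java
import java.time.Duration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// SQL operators take their TTL from the table config (option 'table.exec.state.ttl').
// A DataStream StateTtlConfig is attached to individual state descriptors and does not apply here.
// State for a key is dropped once it has not been updated for 24 hours.
tEnv.getConfig().setIdleStateRetention(Duration.ofHours(24));
```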
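A TTL bounds state at the cost of correctness for keys that go quiet and then return. The sketch below is a toy model in plain Java, not Flink's implementation: a per-key counter whose timer is refreshed on every write and whose expired values are never returned. The class name `TtlCounter` and the timestamps are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy per-key counter with a TTL that is refreshed on every write. */
class TtlCounter {

    private static final class Cell {
        long count;
        long lastWriteMs;
    }

    private final long ttlMs;
    private final Map<String, Cell> state = new HashMap<>();

    TtlCounter(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Adds one event for the key at time nowMs and returns the count visible afterwards. */
    long increment(String key, long nowMs) {
        Cell cell = state.get(key);
        if (cell == null || nowMs - cell.lastWriteMs >= ttlMs) {
            cell = new Cell(); // expired state is never returned, so counting restarts
            state.put(key, cell);
        }
        cell.count++;
        cell.lastWriteMs = nowMs; // every write refreshes the timer
        return cell.count;
    }

    /** Drops entries idle for at least the TTL and returns the number of live entries. */
    int cleanUp(long nowMs) {
        state.values().removeIf(cell -> nowMs - cell.lastWriteMs >= ttlMs);
        return state.size();
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        TtlCounter counter = new TtlCounter(24 * hour);
        System.out.println(counter.increment("u1", 0));         // 1
        System.out.println(counter.increment("u1", 1 * hour));  // 2
        System.out.println(counter.increment("u1", 26 * hour)); // 1: idle for 25 h, old count expired
        System.out.println(counter.cleanUp(60 * hour));         // 0 live entries
    }
}
```

The third call shows the trade-off: once a key has been idle longer than the TTL, its aggregate starts again from scratch, so pick a TTL longer than the longest gap you expect between events for the same key.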
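A three-step method for analyzing EXPLAIN PLAN output:
1. Locate the Exchange operators: they show where data is repartitioned (keyBy), which is essential for performance tuning.
2. Identify the stateful operators: look for operators whose names contain Aggregate, Join, or Deduplicate, and estimate their state size.
3. Watch for GroupAggregate: an aggregation without a window causes unbounded state growth and needs special handling.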
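The three steps can be roughly automated. The following is a minimal sketch in plain Java that scans one section of the plan text line by line. The class name `PlanStateCheck`, the regular expression, and the list of stateful operator names are my own assumptions; the list is illustrative rather than exhaustive, and the sample plan in `main` is hand-written and abbreviated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Scans a physical plan for Exchange operators, stateful operators, and GroupAggregate. */
class PlanStateCheck {

    // Operator name at the start of a plan line, after tree-drawing characters such as "+- " or ":- ".
    private static final Pattern OPERATOR = Pattern.compile("^[\\s:+\\-|]*([A-Za-z]+)\\(");

    // Illustrative, not exhaustive: extend this with the operators your plans contain.
    private static final List<String> STATEFUL = Arrays.asList(
        "GroupAggregate", "GroupWindowAggregate", "WindowAggregate",
        "Join", "Deduplicate", "OverAggregate");

    static final class Report {
        final List<String> exchanges = new ArrayList<>();
        final List<String> statefulOperators = new ArrayList<>();
        boolean hasGroupAggregate;
    }

    /** Expects a single plan section, for example only the optimized physical plan. */
    static Report analyze(String physicalPlan) {
        Report report = new Report();
        for (String line : physicalPlan.split("\\R")) {
            Matcher m = OPERATOR.matcher(line);
            if (!m.find()) {
                continue;
            }
            String name = m.group(1);
            if (name.equals("Exchange")) {
                report.exchanges.add(line.trim());       // step 1: where data is repartitioned
            } else if (STATEFUL.contains(name)) {
                report.statefulOperators.add(name);      // step 2: operators that hold state
                if (name.equals("GroupAggregate")) {
                    report.hasGroupAggregate = true;     // step 3: unbounded aggregation
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        String plan = String.join("\n",
            "Calc(select=[userId, cnt])",
            "+- GroupAggregate(groupBy=[userId], select=[userId, COUNT(*) AS cnt])",
            "   +- Exchange(distribution=[hash[userId]])",
            "      +- TableSourceScan(table=[[default_catalog, default_database, events]])");
        Report r = analyze(plan);
        System.out.println("exchanges: " + r.exchanges);
        System.out.println("stateful:  " + r.statefulOperators);
        System.out.println("unbounded GroupAggregate present: " + r.hasGroupAggregate);
    }
}
```

Pass only one section of the explain output. The same operators appear in both the physical plan and the execution plan, so scanning the full text would report them twice.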
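In real projects, I usually run EXPLAIN first to see how a SQL statement uses state, and only then decide whether it needs optimizing or a state TTL. This habit has helped me avoid many potential production problems.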