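1. The Core SQL Toolbox for Everyday PostgreSQL DBA Work
I have managed more than a hundred PostgreSQL clusters, and one thing I have learned is that gaps in operational efficiency often come down to how well a DBA knows a handful of key SQL queries. The scripts below are not textbook answers. They are practical tools that have been validated repeatedly in production. Each one comes with usage scenarios and pitfalls to avoid, so bookmark them for later.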
2. Database Health Diagnostics
2.1 Connection Monitoring and Troubleshooting
```sql
SELECT
    datname,
    usename,
    application_name,
    client_addr,
    state,
    count(*) AS connection_count,
    max(age(now(), backend_start)) AS max_connection_age
FROM pg_stat_activity
WHERE state IS NOT NULL
GROUP BY 1, 2, 3, 4, 5
ORDER BY connection_count DESC;
```
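Use cases:
- Investigating a sudden spike in the number of connections
- Identifying abnormally long-lived connections
- Finding client applications that do not close their connections properly
Pitfalls:
- If many connections are in the `idle in transaction` state, check the application's transaction commit logic first.
- Connections whose `max_connection_age` exceeds one hour deserve special attention.
- When you follow up with `pg_terminate_backend()`, take care not to kill connections that critical business workloads depend on.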
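The pitfalls above can be turned into a targeted query. The following is a minimal sketch that lists sessions that have been `idle in transaction` for more than 10 minutes. The 10-minute threshold is an assumption, so choose your own. The `pg_terminate_backend()` call is left commented out because each session should be checked before it is killed.

```sql
-- Sessions holding an open transaction while idle for more than 10 minutes (threshold is an assumption)
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    now() - xact_start   AS xact_age,
    now() - state_change AS idle_for,
    left(query, 80)      AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '10 minutes'
ORDER BY idle_for DESC;

-- After confirming that a session is safe to end, terminate it by pid:
-- SELECT pg_terminate_backend(12345);   -- 12345 is a placeholder pid
```

To stop such sessions from accumulating, `idle_in_transaction_session_timeout` (PostgreSQL 9.6+) can end them automatically, for example `ALTER ROLE app_user SET idle_in_transaction_session_timeout = '10min';`. Here `app_user` is a hypothetical role name.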
2.2 Table Size Analysis
```sql
SELECT
    schemaname,
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS data_size,
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS external_size,
    n_live_tup AS row_count
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
```
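Reading the key metrics:
- `external_size` covers everything outside the main data file: indexes, TOAST, and the free space and visibility maps. A large value may mean oversized TOAST data or index bloat that needs cleaning up.
- Periodically compare the growth of `row_count` (an estimate) with the change in `data_size`. If size grows much faster than rows, the table is probably bloating.
- For tables larger than 10 GB, consider tuning per-table autovacuum parameters.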
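Because `external_size` lumps several components together, it helps to split it apart. Here is a minimal sketch that reports heap, index and TOAST sizes separately, using the built-in `pg_indexes_size()` and the table's `reltoastrelid`. The heap figure is the main fork only, so the three parts add up to slightly less than the total.

```sql
-- Break each table's footprint into heap, index and TOAST parts
SELECT
    s.schemaname,
    s.relname,
    pg_size_pretty(pg_relation_size(s.relid)) AS heap_size,
    pg_size_pretty(pg_indexes_size(s.relid))  AS index_size,
    -- reltoastrelid is 0 when the table has no TOAST table
    pg_size_pretty(coalesce(pg_total_relation_size(nullif(c.reltoastrelid, 0)), 0)) AS toast_size,
    pg_size_pretty(pg_total_relation_size(s.relid)) AS total_size
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY pg_total_relation_size(s.relid) DESC
LIMIT 20;
```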
3. Performance Troubleshooting
3.1 Finding Slow Queries
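```sql
-- total_exec_time / mean_exec_time are PostgreSQL 13+ column names
-- (total_time / mean_time in older versions)
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows,
    shared_blks_hit,
    shared_blks_read
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
```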
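Prerequisites:
`pg_stat_statements` must be listed in `shared_preload_libraries`, which requires a server restart. Then run `CREATE EXTENSION pg_stat_statements;` in the database you query. The statistics are cumulative since the last `pg_stat_statements_reset()`, so this query shows historically slow statements, not what is running right now.
Analysis tips:
- Focus on queries with `mean_exec_time` above 100 ms.
- A high `shared_blks_read` means many blocks were not found in shared buffers and had to be read in. Some of those reads may still be served by the OS page cache, but a high value usually indicates heavy I/O.
- Use `EXPLAIN ANALYZE` to examine the plan of a specific query.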
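Since `pg_stat_statements` is cumulative, a real-time view of slow queries has to come from `pg_stat_activity`. Here is a minimal sketch that lists statements currently executing for longer than 5 seconds. The threshold is arbitrary, so adjust it to your workload.

```sql
-- Statements executing right now that have run for more than 5 seconds
SELECT
    pid,
    usename,
    datname,
    now() - query_start AS running_for,
    wait_event_type,
    wait_event,
    left(query, 100)    AS query_head
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()          -- exclude this monitoring session
  AND now() - query_start > interval '5 seconds'
ORDER BY running_for DESC;
```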
3.2 Lock Wait Analysis
```sql
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
```
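Emergency handling steps:
- First, record `blocking_statement` and `blocking_pid`.
- Try to contact the owner of the application involved.
- If necessary, run `SELECT pg_cancel_backend(<blocking_pid>);` to cancel the blocking query. If the blocker is `idle in transaction`, there is no running query to cancel. In that case its locks are released only by `SELECT pg_terminate_backend(<blocking_pid>);`, which ends the whole session.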
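On PostgreSQL 9.6 and later, the built-in `pg_blocking_pids()` function gives a shorter alternative to the self-join above. Here is a minimal sketch listing each blocked session together with the pids blocking it.

```sql
-- Blocked sessions and the pids that block them (PostgreSQL 9.6+)
SELECT
    pid,
    usename,
    pg_blocking_pids(pid) AS blocked_by,
    wait_event_type,
    now() - query_start   AS query_age,
    left(query, 80)       AS blocked_query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
ORDER BY query_age DESC;
```

The PostgreSQL documentation notes that frequent calls to `pg_blocking_pids()` can affect performance, so run this during incidents rather than every few seconds from a monitoring loop.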
4. Maintenance and Management
4.1 Index Usage Efficiency
```sql
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
```
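Optimization suggestions:
- Indexes with `idx_scan = 0` are candidates for removal. Before dropping one, confirm three things: the counters cover a long enough period (they count since the last statistics reset), the index does not back a primary key or unique constraint, and it is not used on standbys (each server keeps its own counters).
- On large tables, if the total size of the indexes exceeds 30% of the table data, review whether every index is needed.
- A low `idx_tup_fetch / idx_tup_read` ratio may indicate an inefficient index. However, bitmap scans increment `idx_tup_read` but not `idx_tup_fetch`, which also lowers the ratio.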
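To apply the first suggestion more safely, here is a minimal sketch that lists never-scanned indexes while excluding unique indexes and any index that backs a constraint. It also shows when the current database's statistics were last reset, so you can judge how long the "never used" period really is.

```sql
-- Never-scanned indexes that do not enforce uniqueness or back a constraint
SELECT
    s.schemaname,
    s.relname,
    s.indexrelname,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique                  -- skips primary key and unique indexes
  AND NOT EXISTS (                       -- skips e.g. exclusion-constraint indexes
      SELECT 1 FROM pg_constraint con WHERE con.conindid = s.indexrelid
  )
ORDER BY pg_relation_size(s.indexrelid) DESC;

-- When the statistics for this database were last reset
SELECT stats_reset FROM pg_stat_database WHERE datname = current_database();
```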
4.2 Vacuum Status Monitoring
```sql
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum,
    vacuum_count,
    autovacuum_count,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
```
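Tuning thresholds:
- When `n_dead_tup` exceeds 10% of `n_live_tup`, consider running `VACUUM` manually. By default, autovacuum waits until dead tuples exceed 50 plus 20% of the table's estimated row count.
- For frequently updated tables, consider settings such as:
```sql
ALTER TABLE your_table SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```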
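Autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. The following minimal sketch computes that trigger point from the global settings only. Per-table overrides like the `ALTER TABLE` above are deliberately ignored here, so treat the result as an approximation for tables that have such overrides.

```sql
-- Approximate dead-tuple count at which autovacuum would process each table
-- (global settings only; per-table reloptions are not taken into account)
SELECT
    s.schemaname,
    s.relname,
    s.n_dead_tup,
    round(current_setting('autovacuum_vacuum_threshold')::numeric
          + current_setting('autovacuum_vacuum_scale_factor')::numeric
            * greatest(c.reltuples, 0)::numeric) AS autovacuum_trigger,
    s.last_autovacuum
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 20;
```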
5. Backup and Recovery
5.1 WAL Archiving Status Check
```sql
SELECT
    name,
    setting,
    unit,
    category
FROM pg_settings
WHERE name IN (
    'archive_mode',
    'archive_command',
    'restore_command',
    'archive_timeout'
);
```
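Configuration checklist:
- `archive_mode` must be `on` (or `always`, which also archives on standbys).
- Test that `archive_command` actually runs and that it has the correct permissions on the archive target.
- In production, I recommend setting `archive_timeout` to 1 hour. It forces a WAL segment switch so that quiet systems still archive regularly. Shorter values reduce potential data loss but produce more archived segment files.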
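The settings show only what is configured, not whether archiving is working. The built-in `pg_stat_archiver` view reports what has actually happened. Here is a minimal sketch: a growing `failed_count`, or a `last_failed_time` newer than `last_archived_time`, indicates that `archive_command` is currently failing.

```sql
-- Archiver progress and failures since the last statistics reset
SELECT
    archived_count,
    last_archived_wal,
    last_archived_time,
    failed_count,
    last_failed_wal,
    last_failed_time,
    now() - last_archived_time AS since_last_archive
FROM pg_stat_archiver;
```

To test the archive chain end to end, `SELECT pg_switch_wal();` (PostgreSQL 10+) forces a segment switch. After it runs, `last_archived_wal` should advance.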
5.2 Point-in-Time Recovery Testing
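```sql
-- Create a named restore point before the change.
-- The function returns the restore point's WAL LSN; record it.
SELECT pg_create_restore_point('before_critical_update');

-- PostgreSQL has no built-in function that lists restore points.
-- To recover to this one, set in the recovery configuration:
--   recovery_target_name = 'before_critical_update'
-- After recovery, check the instance state and how far WAL was replayed:
SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();
```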
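Best practices:
- Always create a restore point before a major change.
- Regularly rehearse `pg_rewind` and PITR procedures.
- Save the backup verification SQL as script files.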
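When rehearsing PITR, it helps to have recovery stop at the target so the data can be inspected before the instance goes live. Here is a minimal sketch. It assumes the restored instance was configured with `recovery_target_name` and `recovery_target_action = 'pause'`, and the functions shown exist in PostgreSQL 10+ (`pg_promote()` in 12+).

```sql
-- On the restored instance, paused at the recovery target (assumed configuration)
SELECT
    pg_is_wal_replay_paused()       AS replay_paused,
    pg_last_xact_replay_timestamp() AS last_replayed_commit;

-- Run read-only checks against the business tables here, then end recovery:
-- SELECT pg_promote();   -- PostgreSQL 12+
```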
6. Security Auditing
6.1 Role Privilege Audit
```sql
SELECT
    rolname,
    rolsuper,
    rolcreaterole,
    rolcreatedb,
    rolcanlogin,
    rolconnlimit,
    rolvaliduntil
FROM pg_roles
-- '_' is a LIKE wildcard; escape it so only built-in pg_* roles are skipped
WHERE rolname NOT LIKE 'pg\_%'
ORDER BY rolsuper DESC, rolname;
```
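Security baseline:
- No more than 3 superuser accounts.
- Every account must have `rolvaliduntil` set. Note that this only expires the password; it has no effect on roles that authenticate without one, such as through certificate or peer authentication.
- Service accounts should have `rolconnlimit` set.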
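The baseline can be checked directly. Here is a minimal sketch that lists login roles breaking at least one of the rules above, followed by a fix for a single account. `app_service`, the date and the limit of 50 are hypothetical values.

```sql
-- Login roles that break the baseline: superusers, no password expiry, or no connection limit
SELECT
    rolname,
    rolsuper,
    rolvaliduntil,
    rolconnlimit
FROM pg_roles
WHERE rolcanlogin
  AND rolname NOT LIKE 'pg\_%'
  AND (rolsuper
       OR rolvaliduntil IS NULL
       OR rolconnlimit = -1)      -- -1 means unlimited
ORDER BY rolsuper DESC, rolname;

-- Fix one service account (hypothetical role name and values)
ALTER ROLE app_service VALID UNTIL '2030-01-01' CONNECTION LIMIT 50;
```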
6.2 Sensitive Data Scan
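```sql
SELECT
    table_schema,
    table_name,
    column_name
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
  -- ILIKE also catches mixed-case (quoted) column names
  AND (column_name ILIKE '%pass%'
       OR column_name ILIKE '%token%'
       OR column_name ILIKE '%auth%')
ORDER BY table_schema, table_name;
```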
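Hardening suggestions:
- Evaluate encryption needs for every sensitive column found right away. Name matching also produces false positives; for example, `author` matches `%auth%`.
- Consider column-level encryption with the `pgcrypto` extension.
- Audit who has access to these columns.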
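Here is a minimal `pgcrypto` sketch showing the two common cases. A secret that must be read back (such as an API token) is encrypted reversibly with `pgp_sym_encrypt`. A password is hashed one-way with `crypt()` and `gen_salt('bf')`, so it can only be verified, never recovered. The table `app_credentials`, the sample values and the key string are all hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Hypothetical table: one reversible secret and one one-way password hash
CREATE TABLE app_credentials (
    id            bigint PRIMARY KEY,
    api_token     bytea,   -- PGP symmetric encryption, can be decrypted with the key
    password_hash text     -- bcrypt hash, can only be verified
);

INSERT INTO app_credentials
VALUES (1,
        pgp_sym_encrypt('example-token', 'key-from-app-config'),
        crypt('example-password', gen_salt('bf')));

-- Decrypt the token (requires the key)
SELECT pgp_sym_decrypt(api_token, 'key-from-app-config') AS api_token
FROM app_credentials
WHERE id = 1;

-- Verify a password attempt: true only when it matches the stored hash
SELECT password_hash = crypt('example-password', password_hash) AS password_ok
FROM app_credentials
WHERE id = 1;
```

Keys written as SQL literals can end up in server logs (for example, with `log_statement` enabled). In practice, the application should pass the key as a bound parameter.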
7. Advanced Performance Tuning
7.1 Shared Buffer Hit Ratio
```sql
SELECT
    sum(blks_hit) * 100 / nullif(sum(blks_hit + blks_read), 0) AS hit_ratio,
    sum(blks_hit) AS blks_hit,
    sum(blks_read) AS blks_read
FROM pg_stat_database;
```
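Tuning guidelines:
- If the ratio is below 95%, consider increasing `shared_buffers`. Reads counted in `blks_read` may still be served from the OS page cache, so confirm with I/O metrics first.
- Use the `pg_prewarm` extension to preload frequently used tables.
- The trend over time matters more than any single reading.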
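A database-wide ratio can hide a single problem table. Here is a minimal sketch that ranks tables by heap reads using the built-in `pg_statio_user_tables` view, followed by a `pg_prewarm` call. `your_table` is a placeholder.

```sql
-- Per-table heap hit ratio: which tables cause the most reads outside shared_buffers
SELECT
    schemaname,
    relname,
    heap_blks_read,
    heap_blks_hit,
    round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;

-- Load a hot table into shared_buffers (placeholder name); returns the number of blocks read
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('your_table');
```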
7.2 Work Memory Usage Analysis
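```sql
-- pg_stat_activity has no work_mem column, so the original sum(work_mem) cannot work.
-- Count active queries per user and show the work_mem value of the current session.
SELECT
    usename,
    count(*) AS active_queries,
    current_setting('work_mem') AS work_mem_setting
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY usename
ORDER BY active_queries DESC;
```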
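Configuration suggestions:
- `work_mem` applies to each sort or hash operation, not to a session. One complex query can use several multiples of it (hash operations may use up to `hash_mem_multiplier` times it on PostgreSQL 13+), and every concurrent query can do the same. As a rule of thumb, keep the per-session `work_mem` setting below 5% of total memory.
- Applications that sort heavily can raise it moderately.
- Use `SET LOCAL work_mem` to adjust it temporarily within a transaction.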
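Actual memory use per query is not exposed in a catalog view, but queries whose sorts or hashes outgrow `work_mem` spill to temporary files, and those files are counted. Here is a minimal sketch that checks temp-file volume per database and then raises `work_mem` for a single transaction. The `'256MB'` value is an assumption, so size it to your server.

```sql
-- Temporary files created by queries (typically sorts/hashes that exceeded work_mem)
SELECT
    datname,
    temp_files,
    pg_size_pretty(temp_bytes) AS temp_size,
    stats_reset
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;

-- Raise work_mem for one heavy transaction only; it reverts at COMMIT or ROLLBACK
BEGIN;
SET LOCAL work_mem = '256MB';   -- assumed value
-- ... run the large sort or hash query here ...
COMMIT;
```

To find which statements spill, setting `log_temp_files = 0` logs every temporary file together with its size.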
8. High Availability Monitoring
8.1 Replication Status Monitoring
```sql
SELECT
    client_addr,
    usename,
    application_name,
    state,
    sync_state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;
```
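Fault diagnosis:
- A steadily growing `replay_lag_bytes` may point to network problems, or to a standby that cannot replay WAL fast enough.
- If `sync_state` is not `sync` (for example, `async`), failing over to that standby may lose recently committed transactions.
- For physical replication, also monitor `write_lag` and `flush_lag`.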
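Two related checks complement `pg_stat_replication`. Replication slots keep WAL on the primary until every consumer has received it, so an inactive slot can quietly fill the disk. On the standby side, the time since the last replayed transaction shows lag as wall-clock time. Here is a minimal sketch of both.

```sql
-- On the primary: WAL retained by each replication slot
SELECT
    slot_name,
    slot_type,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;

-- On a standby: time since the last replayed transaction
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

The standby figure also grows when the primary is simply idle, so read it together with the primary's write activity.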
8.2 Standby Query Routing Check
```sql
SELECT
    name,
    setting,
    pending_restart
FROM pg_settings
WHERE name IN (
    'hot_standby',
    'hot_standby_feedback',
    'max_standby_streaming_delay',
    'max_standby_archive_delay'
);
```
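Configuration points:
- Enable `hot_standby_feedback` on read-heavy standbys to reduce query cancellations. The trade-off is that long-running standby queries can then hold back vacuum on the primary and cause bloat there.
- Set `max_standby_streaming_delay` according to how much replication delay the business can tolerate.
- Regularly check the `pending_restart` flag.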
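To decide whether `hot_standby_feedback` or the delay settings need changing, look at how many standby queries have actually been cancelled. Here is a minimal sketch using the built-in `pg_stat_database_conflicts` view, run on the standby.

```sql
-- Run on the standby: queries cancelled by recovery conflicts, per database
SELECT
    datname,
    confl_snapshot,    -- needed row versions were vacuumed away on the primary
    confl_lock,
    confl_bufferpin,
    confl_deadlock,
    confl_tablespace
FROM pg_stat_database_conflicts
ORDER BY confl_snapshot DESC;
```

A high `confl_snapshot` count is the case that `hot_standby_feedback` is meant to address.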
9. Statistics Management
9.1 Stale Statistics Check
```sql
SELECT
    schemaname,
    relname,
    last_analyze,
    last_autoanalyze,
    analyze_count,
    autoanalyze_count
FROM pg_stat_user_tables
ORDER BY
    greatest(last_analyze, last_autoanalyze) NULLS FIRST
LIMIT 20;
```
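Maintenance suggestions:
- Run `ANALYZE` manually on tables that have not been analyzed for more than 7 days.
- For large tables, consider setting `autovacuum_analyze_scale_factor = 0.05`.
- Always re-analyze the affected tables after schema changes.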
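The 7-day rule and the per-table setting above can be expressed directly. Here is a minimal sketch. `n_mod_since_analyze` (PostgreSQL 9.4+) shows how many rows changed since the last analyze, which helps prioritize, and `your_big_table` is a placeholder.

```sql
-- Tables not analyzed in the last 7 days (or never), busiest first
SELECT
    schemaname,
    relname,
    greatest(last_analyze, last_autoanalyze) AS last_analyzed,
    n_mod_since_analyze
FROM pg_stat_user_tables
WHERE greatest(last_analyze, last_autoanalyze) IS NULL
   OR greatest(last_analyze, last_autoanalyze) < now() - interval '7 days'
ORDER BY n_mod_since_analyze DESC;

-- Per-table analyze tuning for a large table (placeholder name)
ALTER TABLE your_big_table SET (autovacuum_analyze_scale_factor = 0.05);
```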
9.2 Extended Statistics Check
```sql
SELECT
    stxname,
    stxnamespace::regnamespace,
    stxkeys,
    stxkind
FROM pg_statistic_ext;
```
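Usage tips:
- Create extended statistics on correlated columns that are queried together.
- `ndistinct` statistics improve row estimates for `GROUP BY` on multiple columns. Functional-dependency statistics (`dependencies`) improve estimates for correlated equality conditions in `WHERE`.
- Extended statistics are populated only by `ANALYZE`, so run it regularly to keep them current.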
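Here is a minimal sketch of creating extended statistics (PostgreSQL 10+). It uses a hypothetical table `addresses(city, zip_code)` in which the zip code largely determines the city, which is the kind of correlation the planner cannot see from per-column statistics alone.

```sql
-- Hypothetical table addresses(city, zip_code) with correlated columns
CREATE STATISTICS addresses_city_zip_stats (ndistinct, dependencies)
    ON city, zip_code FROM addresses;

-- Extended statistics are only populated by ANALYZE
ANALYZE addresses;

-- Compare estimated vs. actual row counts before and after creating the statistics
EXPLAIN ANALYZE
SELECT city, zip_code, count(*)
FROM addresses
GROUP BY city, zip_code;
```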
10. Daily Inspection Script Template
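```sql
-- Inspection report: one row per check item
SELECT 'Database version' AS item, version() AS value
UNION ALL
SELECT 'Uptime', (now() - pg_postmaster_start_time())::text
UNION ALL
SELECT 'Config load time', pg_conf_load_time()::text
UNION ALL
SELECT 'Active connections', count(*)::text
FROM pg_stat_activity
WHERE state = 'active'
UNION ALL
SELECT 'Total connections', count(*)::text
FROM pg_stat_activity
UNION ALL
SELECT 'Max connections', current_setting('max_connections')
UNION ALL
-- There is no pg_wal_dir_usage() function; sum pg_ls_waldir() instead
-- (PostgreSQL 10+, requires superuser or pg_monitor membership)
SELECT 'WAL directory size', pg_size_pretty(sum(size))
FROM pg_ls_waldir()
UNION ALL
SELECT 'Total database size', pg_size_pretty(sum(pg_database_size(oid))::bigint)
FROM pg_database;
```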
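Inspection schedule:
- Run it every day on core business databases.
- Compare the results with historical data.
- Add any abnormal metrics to monitoring alerts.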
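Comparing results with history requires storing them somewhere. Here is a minimal sketch that keeps one numeric value per item per day and shows the change since the previous run. The table `dba_daily_check` and the two item names are hypothetical, and only two metrics are captured; extend the `UNION ALL` list with the numeric items you care about.

```sql
-- Hypothetical history table: one value per item per day
CREATE TABLE IF NOT EXISTS dba_daily_check (
    checked_on date    NOT NULL DEFAULT current_date,
    item       text    NOT NULL,
    value      numeric,
    PRIMARY KEY (checked_on, item)
);

-- Record today's values (re-running on the same day overwrites them)
INSERT INTO dba_daily_check (item, value)
SELECT 'total_connections', count(*) FROM pg_stat_activity
UNION ALL
SELECT 'total_db_bytes', sum(pg_database_size(oid)) FROM pg_database
ON CONFLICT (checked_on, item) DO UPDATE SET value = EXCLUDED.value;

-- Day-over-day change per item
SELECT
    checked_on,
    item,
    value,
    value - lag(value) OVER (PARTITION BY item ORDER BY checked_on) AS change_vs_prev
FROM dba_daily_check
ORDER BY item, checked_on DESC;
```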
11. Additional Practical Tips
11.1 Generating Dynamic SQL
```sql
-- Batch-generate CREATE INDEX statements for columns that appear in no index.
-- Review the output before running it: not every column needs an index.
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
SELECT format(
           'CREATE INDEX CONCURRENTLY IF NOT EXISTS %I ON %I.%I (%I);',
           'idx_' || c.relname || '_' || a.attname,   -- quote the whole name once
           n.nspname,
           c.relname,
           a.attname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind = 'r'
  AND n.nspname NOT LIKE 'pg\_%'              -- pg_catalog, pg_toast, pg_temp_*
  AND n.nspname <> 'information_schema'
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.oid
        AND a.attnum = ANY (i.indkey)
  );
```
11.2 Efficient Metadata Queries
```sql
-- Quickly get a table's column definitions
SELECT
    a.attname AS column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type,
    NOT a.attnotnull AS nullable
FROM pg_catalog.pg_attribute a
WHERE a.attnum > 0
  AND NOT a.attisdropped
  AND a.attrelid = 'your_schema.your_table'::regclass
ORDER BY a.attnum;
```
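The column list is usually only part of what you need. Here is a minimal sketch that adds column defaults and comments, plus the table's constraints and index definitions, using the built-in `pg_get_expr`, `col_description`, `pg_get_constraintdef` and `pg_get_indexdef` functions. `your_schema.your_table` remains a placeholder.

```sql
-- Column defaults and comments
SELECT
    a.attname                               AS column_name,
    pg_get_expr(d.adbin, d.adrelid)         AS default_value,
    col_description(a.attrelid, a.attnum)   AS column_comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON d.adrelid = a.attrelid AND d.adnum = a.attnum
WHERE a.attrelid = 'your_schema.your_table'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;

-- Constraints on the table
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'your_schema.your_table'::regclass;

-- Index definitions on the table
SELECT pg_get_indexdef(indexrelid) AS index_definition
FROM pg_index
WHERE indrelid = 'your_schema.your_table'::regclass;
```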
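These SQL scripts are distilled from my years of running PostgreSQL in production, and each of them has solved a real problem at a critical moment. Adapt the parameters to your environment, save them as a script library, and run the key monitoring queries regularly. In my experience, this catches roughly 80% of potential problems before they turn into incidents.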