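1. A Core SQL Toolkit for Day-to-Day PostgreSQL DBA Work
As PostgreSQL database administrators, we handle all kinds of operational tasks every day, from performance tuning to troubleshooting, from capacity planning to security audits. After years of hands-on work, I have found that about 20% of SQL statements solve 80% of everyday problems. These statements are like a Swiss Army knife: once you are fluent with them, your efficiency goes up dramatically. Below is my list of high-frequency, practical SQL, with the usage scenario and a short reading guide for each.
Note: all SQL was verified on PG 12+; some functions may need syntax adjustments on older versions.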
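2. Database Health Checks and Monitoring
2.1 Real-Time Performance Diagnosis
-- Show currently active sessions (longest-running first)
SELECT pid, usename, application_name, client_addr,
       now() - query_start AS duration,
       query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY now() - query_start DESC;
-- Key metric: duration is how long the query has been running; anything over 30 seconds deserves a closer look
This query has helped me track down countless performance problems. Once, when production CPU suddenly spiked, it revealed a report query that was missing its time-range condition and was scanning the entire table. Pair it with pg_cancel_backend(pid) to cancel the offending query on the spot; if the whole session has to go, use pg_terminate_backend(pid).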
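As a minimal sketch of how the two functions differ: the pid 12345 is a placeholder for a value taken from the pg_stat_activity output, and the 30-minute threshold in the last statement is an assumed policy, not a recommendation from this article.

```sql
-- 12345 is a placeholder pid copied from the pg_stat_activity output.
-- Cancel only the running statement; the session stays connected.
SELECT pg_cancel_backend(12345);

-- If the backend does not react to the cancel request, close the whole session.
SELECT pg_terminate_backend(12345);

-- Assumed policy: cancel every client statement active for more than 30 minutes.
SELECT pid, pg_cancel_backend(pid) AS cancelled
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend'
  AND pid <> pg_backend_pid()          -- never cancel our own session
  AND now() - query_start > interval '30 minutes';
```

Both functions return true when the signal was sent, which is not the same as the query having stopped, so it is worth re-running the activity query afterwards.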
2.2 Space Usage Analysis
-- Databases ranked by size
SELECT d.datname AS database,
       pg_size_pretty(pg_database_size(d.datname)) AS size
FROM pg_database d
ORDER BY pg_database_size(d.datname) DESC;
-- Per-table size details (including indexes)
SELECT nspname AS schema,
       relname AS table,
       pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size,
       pg_size_pretty(pg_relation_size(C.oid)) AS table_size,
       pg_size_pretty(pg_indexes_size(C.oid)) AS index_size
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind = 'r'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 50;
Space monitoring is basic DBA work. I run these two queries first thing every morning and pay particular attention to tables that are growing abnormally. Once a log table ballooned by 200 GB in a single week, and this is how we caught, in time, that automatic cleanup had never been configured for it.
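When a table grows unexpectedly, one follow-up check is whether autovacuum is keeping up with it. The query below is a sketch based on the standard pg_stat_user_tables view; the LIMIT of 20 is an arbitrary choice.

```sql
-- Tables with the most dead tuples, and when autovacuum last touched them.
SELECT schemaname, relname,
       n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

A large dead_pct together with an empty or very old last_autovacuum suggests bloat rather than genuine data growth.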
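3. Query Performance Analysis and Optimization
3.1 Identifying Slow Queries
-- Get the most resource-intensive queries from pg_stat_statements
-- (column names are for PG 13+; on PG 12 use total_time and mean_time)
SELECT query, calls,
       total_exec_time, mean_exec_time,
       rows, shared_blks_hit, shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- The extension must be enabled first; pg_stat_statements also has to be
-- listed in shared_preload_libraries, which requires a server restart:
-- CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
This extension is PostgreSQL's killer tool for performance analysis. Sort by total_exec_time to see where the time goes overall, then read mean_exec_time together with calls to find the queries that are truly worth optimizing. A classic case: a query averaging 2 seconds per execution was being called 500,000 times a day; after optimizing it, overall load dropped by 40%.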
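To make the "calls times mean" reasoning concrete, the sketch below shows each statement's share of total execution time. It assumes PG 13+ column names (total_exec_time, mean_exec_time); the truncation to 60 characters is only for readability.

```sql
-- Share of total execution time per statement (times are in milliseconds).
SELECT left(query, 60) AS query_head,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round((total_exec_time / 1000)::numeric, 1) AS total_s,
       round((100 * total_exec_time
              / nullif(sum(total_exec_time) OVER (), 0))::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

A statement with a modest mean_ms but a high pct_of_total is usually a better optimization target than a rare query that is slow once a day.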
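3.2 Execution Plan Analysis
-- JSON output for graphical plan viewers (needs client or tool support)
-- Note: ANALYZE actually executes the statement
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 1000;
-- Simplified text format
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 1000;
Reading execution plans is a core DBA skill. I pay particular attention to:
- Seq Scan vs Index Scan: be wary of full table scans
- Buffers: shared hit/read: the cache hit ratio
- Actual Rows vs Planned Rows: whether the statistics are accurate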
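4. Maintenance and Administration Workflows
4.1 Index Management
-- Find tables that may be missing an index (based on pg_stat_all_tables)
SELECT schemaname, relname, seq_scan, seq_tup_read,
       seq_tup_read/seq_scan AS avg_tuples_per_scan
FROM pg_stat_all_tables
WHERE seq_scan > 0
  AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY seq_tup_read DESC
LIMIT 25;
-- Rebuild an index (to fix bloat)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
I check for missing indexes every month as a routine. The CONCURRENTLY option is the key: the index is rebuilt without blocking reads or writes on the table, at the cost of a slower rebuild. An e-commerce site once used this approach ahead of a major sales event and improved order query performance 8x.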
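One caveat with concurrent builds is that a failed or interrupted run can leave an invalid index behind (for REINDEX CONCURRENTLY, typically with a _ccnew suffix). A small check against the system catalogs, offered as a sketch:

```sql
-- List indexes left in an invalid state by a failed concurrent build.
SELECT n.nspname AS schema_name,
       c.relname AS index_name,
       i.indrelid::regclass AS table_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
```

Invalid indexes are not used by queries but still add write overhead, so they are normally dropped and the rebuild retried.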
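4.2 Updating Statistics
-- Update statistics manually
ANALYZE VERBOSE orders;
-- For large tables: raise the sample size (session-level setting), then re-analyze
SET default_statistics_target = 1000;
ANALYZE orders;
Inaccurate statistics lead the planner to choose inefficient execution plans. For columns with skewed data distribution (such as a status field), I raise default_statistics_target as appropriate. In one system, queries on a date column suddenly slowed down precisely because the statistics had not captured the recent change in data distribution.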
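Instead of raising the target for the whole session, the target can also be set per column. The sketch below assumes the orders table has a skewed column named status; both the column name and the value 1000 are illustrative.

```sql
-- Raise the statistics target for one skewed column only (persists in the catalog).
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;

-- Re-collect statistics for just that column.
ANALYZE orders (status);

-- Inspect what the planner now knows about the column.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'status';
```

The per-column setting survives reconnects, whereas SET default_statistics_target only lasts for the current session.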
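5. Backup, Recovery, and HA Management
5.1 Checking Backup Status
-- Check WAL archiving settings
-- (no numeric cast here: values such as archive_mode and archive_command are text)
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE '%archive%';
-- Replication slot status (streaming replication; run on the primary)
-- replication_lag = WAL retained between the slot's restart_lsn and the current write position
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(
           pg_current_wal_lsn(), restart_lsn
       )) AS replication_lag
FROM pg_replication_slots;
Monitoring backup and replication status is the top priority for keeping data safe. Through replication_lag I have caught replication delays caused by network problems and avoided a potential risk of data loss.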
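Settings only show how archiving is configured, not whether it is working. As a complementary sketch, the standard views pg_stat_archiver and pg_stat_replication report the actual state; both queries are meant to be run on the primary.

```sql
-- Is WAL archiving actually succeeding?
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- Per-standby replication lag, in bytes and as a time interval.
SELECT application_name, client_addr, state, sync_state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_size,
       replay_lag
FROM pg_stat_replication;
```

A failed_count that keeps rising, or a last_failed_time newer than last_archived_time, is worth an alert.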
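5.2 Point-in-Time Recovery Testing
-- Create a named restore point (superuser required by default)
SELECT pg_create_restore_point('before_major_upgrade');
-- Verify after recovery
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn();
Testing the recovery procedure regularly is a DBA best practice. As I always say: "A backup that has never been verified is no backup at all." I once used this approach to recover a mistakenly deleted production database within 3 hours.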
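During a recovery test it also helps to know how far replay has progressed in wall-clock terms. A small sketch using built-in functions, intended to be run on the instance that is being recovered:

```sql
-- How recent is the last transaction that has been replayed?
SELECT pg_is_in_recovery() AS in_recovery,
       pg_last_xact_replay_timestamp() AS last_replayed_commit,
       now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

pg_last_xact_replay_timestamp() returns NULL if no transaction has been replayed, so the last two columns may be empty on a server that was never in recovery.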
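6. Security Auditing and Privilege Management
6.1 Privilege Checks
-- Show table privileges granted to users
SELECT grantee, table_schema, table_name, privilege_type
FROM information_schema.table_privileges
WHERE grantee NOT IN ('postgres', 'PUBLIC');
-- Find columns that may hold sensitive data
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE column_name LIKE '%pass%'
   OR column_name LIKE '%credit%';
Privilege management is the foundation of security. I often use the first query to audit the privileges assigned to new employees. The second query once helped me discover a legacy system storing plaintext passwords, which we then promptly reworked to use encryption.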
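Table privileges are only part of the picture; role attributes matter as well. The sketch below lists roles with elevated attributes and spot-checks one privilege. The role new_hire and the table public.orders are hypothetical names used for illustration.

```sql
-- Roles holding elevated attributes (built-in pg_* roles excluded).
SELECT rolname, rolsuper, rolcreaterole, rolcreatedb,
       rolreplication, rolbypassrls, rolcanlogin, rolvaliduntil
FROM pg_roles
WHERE rolname NOT LIKE 'pg\_%'
  AND (rolsuper OR rolcreaterole OR rolcreatedb OR rolreplication OR rolbypassrls)
ORDER BY rolname;

-- Spot-check: can the hypothetical role new_hire read public.orders?
SELECT has_table_privilege('new_hire', 'public.orders', 'SELECT') AS can_select;
```

has_table_privilege also accounts for privileges inherited through role membership, which the information_schema view does not show directly.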
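6.2 Login Auditing
-- Failed login attempts are not exposed in any statistics view; they are
-- written to the server log as FATAL authentication errors.
-- Check SSL usage: list current connections that are NOT using SSL
SELECT a.usename, a.datname, a.client_addr,
       a.backend_start
FROM pg_stat_ssl s
JOIN pg_stat_activity a USING (pid)
WHERE NOT s.ssl;
Security is never a small matter. Audits like these helped me identify brute-force attempts, which pushed us to strengthen our authentication policy. All production environments now enforce SSL connections.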
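For a broader view of connection security, the following sketch summarizes current client connections by TLS version and cipher, and lists the authentication rules in effect. pg_hba_file_rules (PG 10+) is where the auth_method column lives, and reading it requires superuser rights by default.

```sql
-- Current client connections grouped by SSL status, protocol version, and cipher.
SELECT s.ssl, s.version, s.cipher, count(*) AS connections
FROM pg_stat_ssl s
JOIN pg_stat_activity a USING (pid)
WHERE a.backend_type = 'client backend'
GROUP BY s.ssl, s.version, s.cipher
ORDER BY connections DESC;

-- Authentication rules as parsed from pg_hba.conf.
SELECT line_number, type, database, user_name, address, auth_method
FROM pg_hba_file_rules
ORDER BY line_number;
```

Rules of type host with a weak auth_method are the usual candidates for tightening to hostssl.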
7. Advanced Operations Techniques
7.1 Lock Monitoring and Handling
-- Detect lock waits
SELECT blocked_locks.pid AS blocked_pid,
       blocking_locks.pid AS blocking_pid,
       blocked_activity.query AS blocked_query,
       blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
 AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
 AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
 AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
 AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
 AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
 AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
 AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
 AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
 AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
 AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
Lock contention is a common performance problem. This rather complex query has helped me resolve many lock-blocking incidents. The key is to find blocking_pid and then, depending on the situation, either terminate that session or optimize the transaction.
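On PG 9.6 and later, the built-in pg_blocking_pids() function gives a shorter route to the same information. A minimal sketch; the 80-character truncation is only for readability.

```sql
-- Sessions that are currently blocked, with the pids blocking them.
SELECT a.pid AS blocked_pid,
       pg_blocking_pids(a.pid) AS blocking_pids,
       a.wait_event_type,
       now() - a.query_start AS running_for,
       left(a.query, 80) AS blocked_query
FROM pg_stat_activity a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0;
```

blocking_pids is an array, so a session waiting behind several lock holders shows all of them in one row.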
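7.2 Parameter Tuning Aids
-- Find non-default parameters (tuning reference)
SELECT name, setting, boot_val, reset_val
FROM pg_settings
WHERE source != 'default'
  AND name NOT LIKE '%password%'
ORDER BY name;
-- Compute the cache hit ratio (a value between 0 and 1)
SELECT sum(blks_hit) / sum(blks_hit + blks_read) AS hit_ratio
FROM pg_stat_database;
Parameter tuning needs data to back it up. I often use the first query to compare configuration across environments. A cache hit ratio below 0.99 (99%) usually means that shared_buffers needs adjusting or that queries need optimizing.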
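The cluster-wide ratio can hide one badly behaved database. A per-database breakdown, sketched with the same counters:

```sql
-- Cache hit percentage per database; nullif avoids division by zero on idle databases.
SELECT datname,
       blks_hit, blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY blks_read DESC;
```

These counters are cumulative since the last statistics reset, so comparing two snapshots taken some time apart gives a more current picture than a single reading.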
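8. Automating Operations in Practice
Wrapping these SQL statements into scripts or visual reports is the next step up. My personal workflow is:
- Use psql's \o command to write results to a file
- Use cron to run the key checks on a schedule
- Set up email alerts for abnormal results
- Record important metrics in a time-series database for trend analysis
For example, this simple monitoring script:
#!/bin/bash
# Append a one-line snapshot of connection, commit, and cache metrics to a daily log.
OUTFILE="/var/log/pg_monitor_$(date +%Y%m%d).log"
psql -U postgres -c "SELECT now(),
sum(numbackends) as total_connections,
sum(xact_commit) as commits,
sum(blks_hit)*100/(sum(blks_hit)+sum(blks_read)) as cache_hit_ratio
FROM pg_stat_database;" >> "$OUTFILE"
Once you are fluent with these SQL tools, you will find that your PostgreSQL operations efficiency improves qualitatively. Every DBA should build their own SQL toolbox and keep refining and extending it to fit the characteristics of their workload.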
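The \o approach mentioned in the workflow can be sketched as a psql script file. The file name health_check.sql, the output path, and the 30-second threshold are assumptions for illustration; the script would be run with psql -X -U postgres -f health_check.sql.

```sql
-- health_check.sql (hypothetical file name)
\set ON_ERROR_STOP on
\pset footer off

-- Redirect all following query output to a file (assumed path).
\o /tmp/pg_health_check.txt

SELECT now() AS checked_at;

-- Number of statements that have been active for more than 30 seconds.
SELECT count(*) AS long_running_queries
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND now() - query_start > interval '30 seconds';

-- Stop redirecting; output goes back to the terminal.
\o
```

Keeping the checks in a .sql file rather than inline in a shell string avoids quoting problems as the list of queries grows.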