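1. PostgreSQL Lock Mechanism Fundamentals
PostgreSQL is an enterprise-grade open-source relational database, and its locking design directly affects concurrent performance. As with other MVCC-based engines (InnoDB in MySQL, for example), PostgreSQL uses multi-version concurrency control (MVCC), so reads do not block writes and writes do not block reads. Schema changes are a different matter: there, lock conflicts can still seriously disrupt business continuity.
PostgreSQL has eight table-level lock modes. From least to most restrictive: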
- AccessShareLock (SELECT queries)
- RowShareLock (SELECT FOR UPDATE/SHARE)
- RowExclusiveLock (INSERT/UPDATE/DELETE)
- ShareUpdateExclusiveLock (maintenance operations such as VACUUM and ANALYZE)
- ShareLock (CREATE INDEX without CONCURRENTLY)
- ShareRowExclusiveLock (some ALTER TABLE forms)
- ExclusiveLock (REFRESH MATERIALIZED VIEW CONCURRENTLY)
- AccessExclusiveLock (DDL such as DROP and TRUNCATE)
Key point: depending on the subcommand, ALTER TABLE acquires ShareUpdateExclusiveLock, ShareRowExclusiveLock, or AccessExclusiveLock. Adding a column, for example, requires AccessExclusiveLock, the most restrictive mode.
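2. ALTER TABLE Blocking Scenarios in Depth
2.1 Reproducing a typical blocking scenario
The following experiment reproduces a typical blocking scenario:
Session 1 (long-running transaction):
BEGIN;
SELECT * FROM users WHERE id=1; -- acquires AccessShareLock, held until the transaction ends
-- leave the transaction open
Session 2 (schema change):
ALTER TABLE users ADD COLUMN phone VARCHAR(20); -- requires AccessExclusiveLock
-- session 2 hangs here, and new queries on users queue behind its pending lock request
The pg_locks view shows the lock wait: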
SELECT locktype, relation::regclass, pid, mode, granted
FROM pg_locks
WHERE relation = 'users'::regclass;
The output shows:
- Session 1 holds AccessShareLock (granted = true)
- Session 2 is waiting for AccessExclusiveLock (granted = false)
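To make the holder/waiter split concrete, here is a minimal sketch that sorts rows from the pg_locks query into the two groups. It assumes each row has been reduced to a (pid, mode, granted) tuple; the function name `summarize_locks` and the PIDs in the usage example are hypothetical.

```python
def summarize_locks(rows):
    """Split (pid, mode, granted) rows into lock holders and lock waiters."""
    summary = {"holders": [], "waiters": []}
    for pid, mode, granted in rows:
        # granted = true means the lock is held; false means the session is queued
        key = "holders" if granted else "waiters"
        summary[key].append((pid, mode))
    return summary


if __name__ == "__main__":
    # Hypothetical PIDs: 101 is session 1, 102 is session 2
    rows = [(101, "AccessShareLock", True), (102, "AccessExclusiveLock", False)]
    print(summarize_locks(rows))
```

Any non-empty "waiters" list on a busy table is worth investigating, because later queries queue behind the waiting request.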
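2.2 The lock conflict matrix
The table below shows which lock modes conflict with each other (rows: requested mode; columns: mode already held):
| Requested / Held | AccessShare | RowShare | RowExclusive | ShareUpdateExclusive | Share | ShareRowExclusive | Exclusive | AccessExclusive |
|---|---|---|---|---|---|---|---|---|
| AccessShare | - | - | - | - | - | - | - | conflict |
| RowShare | - | - | - | - | - | - | conflict | conflict |
| RowExclusive | - | - | - | - | conflict | conflict | conflict | conflict |
| ShareUpdateExclusive | - | - | - | conflict | conflict | conflict | conflict | conflict |
| Share | - | - | conflict | conflict | - | conflict | conflict | conflict |
| ShareRowExclusive | - | - | conflict | conflict | conflict | conflict | conflict | conflict |
| Exclusive | - | conflict | conflict | conflict | conflict | conflict | conflict | conflict |
| AccessExclusive | conflict | conflict | conflict | conflict | conflict | conflict | conflict | conflict |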
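The conflict rules can also be written down as a small lookup, which is handy for checking a planned operation against what is already running. This is a sketch only: the sets are transcribed from the table-level lock conflict table in the PostgreSQL documentation, and the names `CONFLICTS` and `conflicts` are illustrative, not part of any library.

```python
# Table-level lock modes, least to most restrictive, as they appear in pg_locks.mode
MODES = [
    "AccessShareLock",
    "RowShareLock",
    "RowExclusiveLock",
    "ShareUpdateExclusiveLock",
    "ShareLock",
    "ShareRowExclusiveLock",
    "ExclusiveLock",
    "AccessExclusiveLock",
]

# For each requested mode, the held modes that block it
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowShareLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "RowExclusiveLock": {
        "ShareLock", "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareLock": {
        "RowExclusiveLock", "ShareUpdateExclusiveLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "ShareRowExclusiveLock": {
        "RowExclusiveLock", "ShareUpdateExclusiveLock", "ShareLock",
        "ShareRowExclusiveLock", "ExclusiveLock", "AccessExclusiveLock",
    },
    "ExclusiveLock": set(MODES) - {"AccessShareLock"},
    "AccessExclusiveLock": set(MODES),
}


def conflicts(requested, held):
    """Return True if a request for `requested` must wait behind a held `held` lock."""
    return held in CONFLICTS[requested]


if __name__ == "__main__":
    # ALTER TABLE ... ADD COLUMN versus an open SELECT
    print(conflicts("AccessExclusiveLock", "AccessShareLock"))
    # Two plain CREATE INDEX statements on the same table
    print(conflicts("ShareLock", "ShareLock"))
```

The relation is symmetric: if a request for A waits behind a held B, a request for B also waits behind a held A.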
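3. Solutions for Production
3.1 Preventive measures
- Bounding statement and lock-wait time:
SET statement_timeout = '30s'; -- timeout for a single SQL statement
SET lock_timeout = '5s'; -- give up waiting for a lock after 5 seconds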
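- Maintenance windows:
# Use pg_blocking_pids() to find and terminate the sessions blocking a waiting ALTER TABLE.
# It must be given the PID of the waiting session: pg_blocking_pids(pg_backend_pid())
# is always empty, because the session running this query is not itself waiting.
psql -c "SELECT pg_terminate_backend(blocker.pid)
FROM pg_stat_activity AS waiting
CROSS JOIN LATERAL unnest(pg_blocking_pids(waiting.pid)) AS blocker(pid)
WHERE waiting.wait_event_type = 'Lock'
AND waiting.query ILIKE 'ALTER TABLE%';"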
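- Explicit locking (take the strongest lock up front instead of upgrading):
BEGIN;
SET LOCAL lock_timeout = '5s'; -- fail fast instead of queueing indefinitely
LOCK TABLE users IN ACCESS EXCLUSIVE MODE; -- upgrading from SHARE MODE can deadlock if two sessions both try it
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
COMMIT;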
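The lock_timeout setting pairs naturally with a retry loop: the DDL gives up quickly when it cannot get its lock and tries again after a pause, so it never sits in the lock queue for long. The sketch below assumes an `execute` callable that runs one statement in autocommit mode and raises `LockTimeout` when PostgreSQL reports SQLSTATE 55P03 (lock_not_available). Both names are hypothetical stand-ins for whatever your driver provides.

```python
import time


class LockTimeout(Exception):
    """Hypothetical stand-in for the driver error raised on SQLSTATE 55P03."""


def run_ddl_with_retry(execute, ddl, attempts=5, lock_timeout="5s",
                       base_delay=1.0, sleep=time.sleep):
    """Run `ddl` with a lock timeout, retrying with exponential backoff.

    Returns the attempt number that succeeded; re-raises LockTimeout
    if every attempt times out.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Bound the time this statement may spend waiting for its lock
            execute(f"SET lock_timeout = '{lock_timeout}'")
            execute(ddl)
            return attempt
        except LockTimeout:
            if attempt == attempts:
                raise
            # Back off so blocked queries can drain before the next try
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    def demo_execute(sql):
        # Replace with a real cursor.execute on an autocommit connection
        print("would run:", sql)

    run_ddl_with_retry(demo_execute, "ALTER TABLE users ADD COLUMN phone VARCHAR(20)")
```

Short timeouts with several retries tend to be gentler on production traffic than one long wait, since each failed attempt releases its place in the lock queue.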
3.2 Online DDL best practices
- Build indexes concurrently:
CREATE INDEX CONCURRENTLY users_phone_idx ON users(phone);
-- takes ShareUpdateExclusiveLock, a weaker lock than the ShareLock of a plain CREATE INDEX
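- Avoid long-running transactions:
# Django example: use the atomic decorator to bound the transaction scope
from django.db import transaction
from django.http import JsonResponse
from .models import User  # assumes a User model in the same app

@transaction.atomic
def update_user(request):
    user = User.objects.select_for_update().get(id=1)
    # business logic (keep it short: the row lock is held until commit)
    return JsonResponse({"id": user.id})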
- Monitor with an event trigger:
-- The function must exist before the event trigger that references it
CREATE OR REPLACE FUNCTION abort_long_running_ddl()
RETURNS event_trigger AS $$
BEGIN
    IF EXISTS (
        SELECT 1 FROM pg_stat_activity
        WHERE pid <> pg_backend_pid()
          AND xact_start < NOW() - INTERVAL '1 minute'
          AND state IN ('active', 'idle in transaction')
    ) THEN
        RAISE EXCEPTION 'Long-running transactions exist; DDL aborted';
    END IF;
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER abort_long_running_ddl
    ON ddl_command_start
    WHEN TAG IN ('ALTER TABLE')
    EXECUTE FUNCTION abort_long_running_ddl();
4. Advanced Troubleshooting
4.1 An enhanced lock monitoring view
Create a custom lock monitoring view:
CREATE VIEW lock_monitor AS
SELECT blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.usename AS blocked_user,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement,
blocked_activity.application_name AS blocked_application,
blocking_activity.application_name AS blocking_application,
now() - blocked_activity.query_start AS blocked_duration,
now() - blocking_activity.query_start AS blocking_duration
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
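When several sessions pile up behind one another, the view returns many (blocked_pid, blocking_pid) pairs, and the useful question is which session sits at the head of the chain. A minimal sketch, assuming the pairs have already been fetched from lock_monitor; the function name `root_blockers` and the PIDs in the usage example are hypothetical:

```python
def root_blockers(edges):
    """Return PIDs that block other sessions but are not blocked themselves.

    edges: iterable of (blocked_pid, blocking_pid) pairs.
    """
    edges = list(edges)
    blocked = {blocked_pid for blocked_pid, _ in edges}
    blocking = {blocking_pid for _, blocking_pid in edges}
    # A root blocker appears only on the blocking side
    return sorted(blocking - blocked)


if __name__ == "__main__":
    # 200 and 400 wait on 100; 300 waits on 200
    print(root_blockers([(200, 100), (300, 200), (400, 100)]))
```

Because the view pairs any two sessions with locks on the same object, waiters can show up as each other's "blockers". Filtering down to sessions that are not themselves waiting removes that noise.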
4.2 Automated handling script
#!/bin/bash
# Automatically terminate transactions that block DDL
BLOCKING_PIDS=$(psql -t -A -c "
SELECT DISTINCT blocking_locks.pid
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE blocked_activity.query LIKE 'ALTER TABLE%'
AND NOT blocked_locks.GRANTED
AND blocking_locks.GRANTED
AND blocking_activity.xact_start < NOW() - INTERVAL '5 minutes'
")
for PID in $BLOCKING_PIDS; do
psql -c "SELECT pg_terminate_backend($PID)"
done
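Terminating every blocker unconditionally is risky, so it can help to filter the candidates before killing anything. The sketch below is one possible policy, not a recommendation: it assumes blocker details (PID, application_name, transaction age) have already been read from pg_stat_activity, and the `Blocker` type, the `pick_victims` name, and the default protected application list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Blocker:
    pid: int
    application_name: str
    xact_age_seconds: float


def pick_victims(blockers, min_age_seconds=300, protected_apps=("pg_dump",)):
    """Choose which blocking sessions may be terminated.

    Skips sessions whose transaction is younger than the threshold and
    sessions whose application_name is on the protected list.
    """
    victims = set()
    for blocker in blockers:
        if blocker.xact_age_seconds < min_age_seconds:
            continue  # likely to finish on its own
        if blocker.application_name in protected_apps:
            continue  # critical job: never terminate automatically
        victims.add(blocker.pid)
    return sorted(victims)


if __name__ == "__main__":
    candidates = [
        Blocker(101, "report_service", 1800.0),
        Blocker(102, "pg_dump", 3600.0),
        Blocker(103, "web_app", 12.0),
    ]
    # Print instead of terminating, as a dry run
    print(pick_victims(candidates))
```

Running the selection as a dry run first, and only then passing the PIDs to pg_terminate_backend, keeps a human in the loop for the first few incidents.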
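5. Alternatives and Newer Features
5.1 Logical replication approach
-- On the publisher (primary)
CREATE PUBLICATION users_pub FOR TABLE users;
-- On the subscriber (a separate instance). Logical replication matches tables by
-- schema-qualified name, so the target table must also be named users;
-- it may carry extra columns that the publisher lacks.
-- Load the table definition first (for example from a schema-only dump),
-- then change the still-empty table:
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE SUBSCRIPTION users_sub
CONNECTION 'host=primary dbname=test'
PUBLICATION users_pub;
-- Once the initial sync has finished and replication has caught up,
-- stop writes on the publisher and point the application at the subscriber.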
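5.2 Improvements in PostgreSQL 11 and 12
- Fast column addition (NOT NULL with a default):
ALTER TABLE users ADD COLUMN status VARCHAR(10) NOT NULL DEFAULT 'active';
-- PG11+ stores a constant default in the catalog instead of rewriting the table; AccessExclusiveLock is still taken, but only briefly
- Monitoring index build progress (PG12+):
SELECT pid, command, phase, tuples_done, tuples_total
FROM pg_stat_progress_create_index;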
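For dashboards or log lines, the progress row can be reduced to a single readable string. This is a small sketch assuming the phase, tuples_done, and tuples_total values have already been fetched; the function name `format_progress` is hypothetical. tuples_total may be zero or NULL in phases that do not count tuples, so the sketch guards against that.

```python
def format_progress(phase, tuples_done, tuples_total):
    """Render one pg_stat_progress_create_index row as 'phase: NN.N%'."""
    if not tuples_total:
        # Zero or NULL total: this phase reports no tuple counts
        return f"{phase}: n/a"
    percent = 100.0 * tuples_done / tuples_total
    return f"{phase}: {percent:.1f}%"


if __name__ == "__main__":
    print(format_progress("building index", 250, 1000))
```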
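We hit a typical case of this in production: a financial system ran ALTER TABLE to add a column during peak hours, and lock waits on a core transaction table exceeded 30 minutes. Analysis showed that a long-running query from the reporting system was holding AccessShareLock. We resolved it with the following measures:
- Set up a dedicated standby for the reporting system
- Run DDL during maintenance windows
- Set the lock_timeout parameter
- Use pg_terminate_backend to terminate non-critical blocking sessions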
