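As a high-performance columnar database, ClickHouse delivers remarkable query speed on massive datasets, but that speed comes with demanding memory requirements. In production, memory problems are often the leading cause of ClickHouse performance degradation and even service crashes. This article presents a systematic approach to troubleshooting ClickHouse memory problems. It works from business-level SQL analysis, through system-level memory allocation, down to low-level diagnostic tools, to form a complete troubleshooting workflow.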
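When a ClickHouse node shows abnormal memory usage, the first task is to identify which queries are currently consuming the most memory. The system.processes table reports the memory consumption of running queries in real time: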
```sql
SELECT
    hostname(),
    query_id,
    user,
    formatReadableSize(memory_usage) AS memory_used,
    elapsed,
    query
FROM clusterAllReplicas('ck_cluster', system.processes)
WHERE memory_usage > 0
ORDER BY memory_usage DESC
LIMIT 20;
```
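Note: in production, always keep a LIMIT clause to avoid oversized result sets. Pay particular attention to queries whose memory_usage exceeds 1 GB; these are usually the main candidates for optimization.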
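If the result of this query is fetched into a script for alerting or reporting, the byte values need consistent units. The following is a minimal sketch under stated assumptions:

- Rows arrive as dicts with `query_id` and `memory_usage` keys.
- `format_readable_size` is a hypothetical helper that approximates the 1024-based, two-decimal style of `formatReadableSize`.
- `heavy_queries` applies a 1 GiB review threshold, which you can change.

```python
"""Sketch: interpret memory_usage values returned by the system.processes query."""

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_readable_size(num_bytes: float) -> str:
    """Format bytes with 1024-based units and two decimals (assumed formatReadableSize style)."""
    value = float(num_bytes)
    unit = 0
    while abs(value) >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    return f"{value:.2f} {UNITS[unit]}"


def heavy_queries(rows, threshold_bytes=1 << 30):
    """Return (query_id, readable size) for queries above the threshold, largest first."""
    selected = [r for r in rows if r["memory_usage"] > threshold_bytes]
    selected.sort(key=lambda r: r["memory_usage"], reverse=True)
    return [(r["query_id"], format_readable_size(r["memory_usage"])) for r in selected]


if __name__ == "__main__":
    # Hypothetical rows shaped like the system.processes result
    rows = [
        {"query_id": "q1", "memory_usage": 3 * 1024**3},
        {"query_id": "q2", "memory_usage": 200 * 1024**2},
        {"query_id": "q3", "memory_usage": 1_500_000_000},
    ]
    print(heavy_queries(rows))  # -> [('q1', '3.00 GiB'), ('q3', '1.40 GiB')]
```

Pass a different `threshold_bytes` if your team's "1 GB" means 10^9 bytes rather than 2^30.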
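Interpreting the key metrics:
Beyond real-time monitoring, the system.query_log table lets you analyze the memory usage of past queries and find recurring "memory killers":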
```sql
SELECT
    argMax(hostName(), memory_usage) AS node_host,
    argMax(user, memory_usage) AS execution_user,
    formatReadableSize(max(memory_usage)) AS max_mem_readable,
    max(memory_usage) AS raw_max_mem,
    argMax(query_start_time, memory_usage) AS peak_time,
    round(argMax(query_duration_ms, memory_usage) / 1000, 2) AS duration_sec,
    argMax(query, memory_usage) AS worst_query,
    count() AS execution_count
FROM clusterAllReplicas('ck_cluster', system.query_log)
-- queries that hit a memory limit end with an exception, so include those as well
WHERE type IN ('QueryFinish', 'ExceptionWhileProcessing')
  AND event_time >= now() - INTERVAL 7 DAY
  AND memory_usage > 1e9 -- only queries that used more than 1e9 bytes (about 1 GB)
GROUP BY normalized_query_hash
ORDER BY raw_max_mem DESC
LIMIT 50;
```
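`argMax(x, memory_usage)` returns the value of `x` from the row in which `memory_usage` is largest. Every `argMax` column in the query above therefore describes the same peak execution of a query pattern.

The sketch below reproduces that logic in plain Python to make the semantics concrete. The row layout is an assumption. The default cutoff mirrors the query's `1e9`, which is 10^9 bytes (roughly 953.67 MiB rather than 1 GiB).

```python
"""Sketch: what argMax(..., memory_usage) + GROUP BY normalized_query_hash computes."""
from collections import Counter


def worst_per_pattern(rows, min_bytes=1e9):
    """For each normalized_query_hash, keep the execution with the highest memory_usage.

    rows: iterable of dicts with keys normalized_query_hash, memory_usage, user, query.
    """
    peaks = {}
    counts = Counter()
    for row in rows:
        if row["memory_usage"] <= min_bytes:  # mirrors WHERE memory_usage > 1e9
            continue
        key = row["normalized_query_hash"]
        counts[key] += 1  # count() per group, after the WHERE filter
        # argMax(x, memory_usage): take x from the row where memory_usage is largest
        if key not in peaks or row["memory_usage"] > peaks[key]["memory_usage"]:
            peaks[key] = row
    report = [{**row, "execution_count": counts[key]} for key, row in peaks.items()]
    report.sort(key=lambda r: r["memory_usage"], reverse=True)  # ORDER BY raw_max_mem DESC
    return report
```

Because all fields are taken from one row, `worst_query`, `execution_user` and `peak_time` always belong to the same execution. Independent `max()` calls would not guarantee that.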
Analysis tips:
Some ClickHouse table engines keep their data in memory for as long as the table exists, so they need special attention:
```sql
SELECT
    database,
    name AS table_name,
    engine,
    formatReadableSize(total_bytes) AS size
FROM system.tables
WHERE engine IN ('Memory', 'Set', 'Join', 'Buffer')
  AND total_bytes > 0
ORDER BY total_bytes DESC;
```
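Characteristics of common in-memory table engines:
Practical advice: where an in-memory engine is unavoidable, cap its size with the engine's own limits and set up monitoring and alerting on its size. For example, use max_bytes_in_join / max_rows_in_join for Join tables, or max_bytes_to_keep / max_rows_to_keep for Memory tables in recent releases.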
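The monitoring-and-alerting advice can start as a periodic check over the rows returned by the system.tables query above. This is a hypothetical sketch. `oversized_memory_tables` and its per-engine budgets are illustrative choices, not ClickHouse features, and rows are assumed to be dicts keyed by the query's column aliases.

```python
"""Sketch: flag in-memory tables (Memory/Set/Join/Buffer) that exceed a size budget."""

MEMORY_ENGINES = {"Memory", "Set", "Join", "Buffer"}


def oversized_memory_tables(rows, budget_bytes):
    """rows: dicts with database, table_name, engine, total_bytes (as in the query above).

    budget_bytes: an int applied to every engine, or a dict {engine: limit};
    engines missing from the dict are not checked.
    Returns "db.table (engine)" strings for tables over budget, largest first.
    """
    over = []
    for row in rows:
        engine = row["engine"]
        if engine not in MEMORY_ENGINES or not row["total_bytes"]:
            continue  # not an in-memory engine, or size unknown (NULL)
        limit = budget_bytes.get(engine) if isinstance(budget_bytes, dict) else budget_bytes
        if limit is not None and row["total_bytes"] > limit:
            over.append(row)
    over.sort(key=lambda r: r["total_bytes"], reverse=True)
    return [f'{r["database"]}.{r["table_name"]} ({r["engine"]})' for r in over]
```

The returned list can feed whatever alerting channel you already use. An empty list means every in-memory table is within budget.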
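ClickHouse exposes memory-related metrics through two system tables: system.metrics (current values) and system.asynchronous_metrics (values computed periodically in the background):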
```sql
SELECT
    hostname(),
    metric,
    value,
    formatReadableSize(value) AS size
FROM clusterAllReplicas('ck_cluster', system.metrics)
WHERE metric LIKE '%Memory%' OR metric LIKE '%Cache%' -- the name column of system.metrics is `metric`
ORDER BY value DESC
LIMIT 20;
```
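Key metrics:
| Metric | Typical range | Action when abnormal |
|---|---|---|
| MemoryTracking | Usually < 50% of physical memory | If persistently high, check the query workload |
| MarkCacheBytes | 1-5% of data volume | If too large, adjust mark_cache_size |
| UncompressedCacheBytes | Depends on the query pattern | If too high, reduce uncompressed_cache_size |
| QueryCacheBytes | Depends on the caching policy | Watch the cache hit rate |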
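The ranges in this table are rules of thumb, and they are easiest to apply consistently when encoded in a check. This minimal sketch assumes three things:

- You have already collected metric values into a `{name: value}` dict.
- You know the host's RAM.
- You have chosen how to measure "data volume" for the MarkCacheBytes guideline.

The function name and messages are hypothetical.

```python
"""Sketch: apply the rule-of-thumb ranges from the table above to sampled metric values."""


def check_memory_metrics(metrics, physical_bytes, data_bytes):
    """metrics: {metric_name: value} gathered from system.metrics / system.asynchronous_metrics.

    physical_bytes: RAM of the host; data_bytes: the data volume used for the mark-cache guideline.
    Returns {metric_name: message} for values outside the guideline; thresholds are not hard limits.
    """
    findings = {}
    tracked = metrics.get("MemoryTracking", 0)
    if tracked / physical_bytes >= 0.5:
        findings["MemoryTracking"] = "at or above 50% of RAM; if this persists, check the query workload"
    marks = metrics.get("MarkCacheBytes", 0)
    if data_bytes and marks / data_bytes > 0.05:
        findings["MarkCacheBytes"] = "above ~5% of data volume; review mark_cache_size"
    return findings
```

A single sample above a threshold is not necessarily a problem. The table's advice is about values that stay high, so such a check is best evaluated over several consecutive samples.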
```sql
SELECT
    hostname(),
    metric,
    value,
    formatReadableSize(value) AS readable_size
FROM clusterAllReplicas('ck_cluster', system.asynchronous_metrics)
WHERE metric LIKE '%Memory%' OR metric LIKE '%Cache%'
ORDER BY value DESC
LIMIT 20;
```
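Asynchronous metrics worth watching:
ClickHouse can sample memory allocations, which makes it possible to trace memory usage down to individual functions: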
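```sql
-- Check the memory profiler settings
SELECT * FROM system.settings
WHERE name LIKE '%memory_profiler%';

-- addressToSymbol() and demangle() are introspection functions and must be enabled first
SET allow_introspection_functions = 1;

-- Find memory allocation hot spots
SELECT
    trace_type,
    count() AS alloc_times,
    formatReadableSize(sum(size)) AS total_size,
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS stacktrace
FROM system.trace_log
WHERE event_date >= today() - 1
  AND trace_type = 'Memory'
GROUP BY trace_type, trace
-- sort on the numeric sum: total_size is a formatted string and would sort lexicographically
ORDER BY sum(size) DESC
LIMIT 10;
```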
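The `GROUP BY trace ... ORDER BY sum(size)` pattern simply totals allocation sizes per identical stack. If you export raw `(trace, size)` rows from system.trace_log for offline analysis, the same ranking can be reproduced client-side. This sketch assumes each trace is a sequence of addresses or symbol names. It is an illustration, not a ClickHouse API.

```python
"""Sketch: the GROUP BY trace / ORDER BY sum(size) aggregation, done client-side."""
from collections import defaultdict


def top_allocation_stacks(samples, limit=10):
    """samples: iterable of (trace, size) pairs; trace is a sequence of frame addresses or
    symbol names, size the sampled allocation size in bytes (as in system.trace_log).
    Returns [(trace_tuple, alloc_times, total_size)] sorted by total_size, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # trace -> [count, total bytes]
    for trace, size in samples:
        entry = totals[tuple(trace)]  # lists are unhashable, so key on a tuple
        entry[0] += 1
        entry[1] += size
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(trace, count, total) for trace, (count, total) in ranked[:limit]]
```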
Interpretation tips:
ClickHouse uses the jemalloc allocator by default, and its metrics are essential for diagnosing memory problems:
```sql
SELECT
    hostname(),
    metric,
    value,
    formatReadableSize(value) AS readable_value,
    description
FROM clusterAllReplicas('ck_cluster', system.asynchronous_metrics)
WHERE metric LIKE 'jemalloc%'
ORDER BY value DESC;
```
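Key jemalloc metrics:
| Metric group | Key metric | Health criterion |
|---|---|---|
| Allocation | jemalloc.allocated | Should stay below 80% of physical memory |
| Active memory | jemalloc.active | allocated/active ≈ 0.9 |
| Physical memory | jemalloc.resident | Close to the RSS reported by the OS |
| Retained (not returned to the OS) memory | jemalloc.retained | Should not grow continuously |
| Background purging | jemalloc.background_thread.num_threads | 4-8 threads expected |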
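The health criteria above translate directly into ratios. The sketch below assumes the jemalloc rows from system.asynchronous_metrics have been collected into a `{metric: value}` dict for one host. The default 0.8 and 0.9 thresholds come from the table (the ≈0.9 guideline is treated as a lower bound), and the function name is hypothetical.

```python
"""Sketch: evaluate jemalloc metrics against the health criteria in the table above."""


def jemalloc_health(m, physical_bytes, max_ram_share=0.8, min_alloc_active=0.9):
    """m: {metric: value} with jemalloc.allocated, jemalloc.active, optionally jemalloc.resident."""
    allocated = m["jemalloc.allocated"]
    active = m["jemalloc.active"]
    resident = m.get("jemalloc.resident", active)
    # share of RAM handed out to the application (criterion: below 80%)
    alloc_vs_ram = allocated / physical_bytes
    # well below ~0.9 means active pages are poorly filled, i.e. fragmentation
    alloc_vs_active = allocated / active if active else 1.0
    return {
        "allocated_vs_ram": alloc_vs_ram,
        "allocated_vs_active": alloc_vs_active,
        # resident pages beyond the active ones: unpurged dirty pages plus allocator metadata
        "resident_minus_active": resident - active,
        "ok": alloc_vs_ram < max_ram_share and alloc_vs_active >= min_alloc_active,
    }
```

A growing `resident_minus_active` while `allocated` stays flat is the pattern to look for when memory does not seem to be returned after queries finish.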
Beyond ClickHouse's own metrics, also check the memory state of the operating system:
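```bash
# Run on the ClickHouse server host; pidof can return several PIDs
# (e.g. the watchdog and the server process), so inspect each one
for pid in $(pidof clickhouse-server); do
    echo "== PID $pid"; grep -i '^vm' "/proc/$pid/status"
done
free -h
grep -i cache /proc/meminfo
```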
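To compare these numbers with ClickHouse's own metrics programmatically, the Vm* lines of /proc/<pid>/status can be parsed into bytes (VmRSS is the RSS figure). This is a minimal, Linux-only sketch, and `parse_proc_status` and `read_status` are illustrative helpers.

```python
"""Sketch: read the Vm* fields of /proc/<pid>/status as byte counts (Linux only)."""


def parse_proc_status(text):
    """Return {field: bytes} for the Vm* lines of a /proc/<pid>/status dump."""
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not key.startswith("Vm"):
            continue
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            result[key] = int(parts[0]) * 1024  # the kernel's "kB" here means 1024 bytes
    return result


def read_status(pid):
    """Parse the status file of a live process, e.g. read_status(12345)["VmRSS"]."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_proc_status(fh.read())
```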
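How ClickHouse metrics map to OS metrics:
| ClickHouse metric | OS metric | Meaning |
|---|---|---|
| MemoryResident | RSS | Physical memory actually in use |
| MemoryVirtual | VIRT | Virtual memory size |
| jemalloc.resident | AnonPages | Anonymous pages in use |
| OSMemoryCached | Cached | OS page cache |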
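```xml
<!-- users.xml (or a file in users.d/): settings profiles are normally defined in the users configuration, not config.xml -->
<profiles>
    <default>
        <max_memory_usage>10000000000</max_memory_usage> <!-- 10 GB per query -->
        <max_memory_usage_for_user>20000000000</max_memory_usage_for_user> <!-- 20 GB for all of a user's concurrent queries -->
        <max_untracked_memory>1048576</max_untracked_memory> <!-- 1 MiB: per-thread allocations are batched up to this size before being tracked -->
    </default>
</profiles>
```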
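Memory limits in the profile are written as plain byte counts, and mixing decimal and binary units is an easy mistake. 10000000000 is 10 GB (decimal), while 10 GiB would be 10737418240. A small, hypothetical conversion helper makes the intended unit explicit when generating these values:

```python
"""Sketch: convert human-readable sizes to the byte counts written into ClickHouse settings."""
import re

_FACTORS = {
    "": 1, "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,   # decimal units
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40,  # binary units
}


def to_bytes(size: str) -> int:
    """'10GB' -> 10_000_000_000 (decimal); '10GiB' -> 10_737_418_240 (binary)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", size)
    if not match:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    factor = _FACTORS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(number) * factor)
```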
Key parameters:
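```xml
<!-- config.xml (server settings) -->
<yandex> <!-- newer releases name the root element <clickhouse>; <yandex> is still accepted -->
    <mark_cache_size>8589934592</mark_cache_size> <!-- 8 GiB -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size> <!-- 8 GiB; only used when use_uncompressed_cache = 1 -->
    <!-- the eviction policy is configured per cache -->
    <mark_cache_policy>SLRU</mark_cache_policy>
    <uncompressed_cache_policy>SLRU</uncompressed_cache_policy>
</yandex>
```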
Cache tuning principles:
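Emergency response:
Root cause analysis:
Long-term optimization:
Symptom: an aggregation query occasionally consumed more than 50 GB of memory and caused an OOM.
Analysis:
Solution: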
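```sql
-- Stream the aggregation in sorting-key order (helps when GROUP BY is a prefix of ORDER BY)
SET optimize_aggregation_in_order = 1;
-- index_granularity (default 8192) is fixed at table creation and cannot be ALTERed;
-- to change it, create a new table with SETTINGS index_granularity = <value> and copy the data
```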
Symptom: memory was not released after queries stopped, and resident memory stayed persistently high.
Diagnostic steps:
Tuning:
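```bash
# jemalloc reads its options from the MALLOC_CONF environment variable at startup.
# One way to set it is a systemd drop-in (opened by the first command below):
#   [Service]
#   Environment="MALLOC_CONF=background_thread:true,dirty_decay_ms:10000"
# dirty_decay_ms: how long unused dirty pages are kept before being purged
# background_thread: let jemalloc purge memory from background threads
sudo systemctl edit clickhouse-server
sudo systemctl restart clickhouse-server
```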
Symptom: periodic memory growth correlated with a Join table.
Investigation:
Optimizations:
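```sql
-- Join tables keep all rows in RAM; columns and Join(...) parameters are elided here
CREATE TABLE join_table (...) ENGINE = Join(...) SETTINGS join_use_nulls = 0;
-- 'auto' starts with hash join and switches algorithms if the memory limit would be exceeded
SET join_algorithm = 'auto';
```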
Thorough monitoring is the key to preventing memory problems:
Basic monitoring:
Advanced monitoring:
Alerting rules:
Example Prometheus rules:
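```yaml
groups:
  - name: clickhouse_memory
    rules:
      # Recording rule: share of system memory tracked by ClickHouse, per instance
      - record: clickhouse:memory_usage:ratio
        expr: sum(clickhouse_metric_MemoryTracking) by (instance) / on(instance) node_memory_MemTotal_bytes
      # `for` and `annotations` are only valid on alerting rules; the alert name and 0.8 threshold are examples
      - alert: ClickHouseMemoryUsageHigh
        expr: clickhouse:memory_usage:ratio > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ClickHouse memory usage high (instance {{ $labels.instance }})"
          description: "ClickHouse is using {{ $value | humanizePercentage }} of system memory"
```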
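The ratio expression divides a per-instance sum by node_exporter's MemTotal, using one-to-one matching on the `instance` label. Series whose `instance` values differ therefore silently drop out of the result. This happens, for example, when ClickHouse and node_exporter are scraped on different ports, which gives them different `instance` labels unless you relabel.

The sketch below models that matching in Python to make the behaviour explicit. It illustrates the PromQL semantics and is not part of any exporter.

```python
"""Sketch: the one-to-one vector match behind `sum(...) by (instance) / on(instance) ...`."""
from collections import defaultdict


def memory_ratio_by_instance(tracking_samples, memtotal_by_instance):
    """tracking_samples: iterable of (instance, value) pairs for the MemoryTracking series.
    memtotal_by_instance: {instance: MemTotal bytes} from node_exporter.
    Instances missing on the right-hand side are dropped, as in PromQL vector matching
    (a zero MemTotal is also skipped here instead of producing +Inf).
    """
    summed = defaultdict(float)  # sum(...) by (instance)
    for instance, value in tracking_samples:
        summed[instance] += value
    return {
        inst: summed[inst] / memtotal_by_instance[inst]
        for inst in summed
        if memtotal_by_instance.get(inst)
    }
```

If the returned dict is missing hosts you expect, align the `instance` labels (for example with relabeling) before relying on the alert.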
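Applied systematically and adapted to the situation at hand, the troubleshooting methods above can resolve most kinds of memory problems in a ClickHouse deployment and keep the service stable and efficient.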