In modern distributed systems, log management is one of the biggest challenges facing operations and development teams. As microservice architectures have become widespread, traditional ways of inspecting logs no longer meet the need. The ELK stack (Elasticsearch + Logstash + Kibana), paired with the lightweight Filebeat log shipper, has become a standard solution for enterprise log management.
The core value of this approach is bringing every service's logs together in one place: centralized collection, near-real-time full-text search, and visual analysis instead of grepping across machines.
Before starting the deployment, make sure each server meets the minimum hardware requirements for the component it will run.
Tip: for production environments, deploy each component on a separate server for better performance and reliability.
```bash
# Download Elasticsearch (version 8.5.0 as an example)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.5.0-linux-x86_64.tar.gz

# Extract the archive
tar -xzf elasticsearch-8.5.0-linux-x86_64.tar.gz
cd elasticsearch-8.5.0/

# Create a dedicated user (Elasticsearch refuses to run as root)
sudo useradd -m -s /bin/bash -d /home/elasticsearch elasticsearch
sudo chown -R elasticsearch:elasticsearch .
```
Edit config/elasticsearch.yml:
```yaml
cluster.name: spring-boot-logging
node.name: ${HOSTNAME}
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node   # single-node mode; configure a cluster for production
xpack.security.enabled: true  # enable security features
xpack.security.http.ssl.enabled: true
```
Set the JVM heap in config/jvm.options (keep it at or below 50% of physical RAM):

```
# JVM heap settings
-Xms4g
-Xmx4g
```
```bash
# Raise the memory-map count limit required by Elasticsearch
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Raise file descriptor and memory lock limits
echo 'elasticsearch - nofile 65536' | sudo tee -a /etc/security/limits.conf
echo 'elasticsearch - memlock unlimited' | sudo tee -a /etc/security/limits.conf
```
```bash
# Start as the elasticsearch user (-d runs it as a daemon)
su - elasticsearch -c "./bin/elasticsearch -d"

# Verify the service; HTTPS is enabled above, so use https://
# (-k skips CA verification, fine for a quick smoke test)
curl -k -u elastic:your_password https://localhost:9200
```
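If you don't know the built-in users' passwords yet, they can be (re)set with the bundled `elasticsearch-reset-password` tool, run from the Elasticsearch install directory:

```bash
# Reset the elastic superuser password (prints the new one)
./bin/elasticsearch-reset-password -u elastic

# Kibana will authenticate as kibana_system, so set that one too
./bin/elasticsearch-reset-password -u kibana_system
```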
```bash
wget https://artifacts.elastic.co/downloads/kibana/kibana-8.5.0-linux-x86_64.tar.gz
tar -xzf kibana-8.5.0-linux-x86_64.tar.gz
cd kibana-8.5.0/
```
Edit config/kibana.yml:
```yaml
server.port: 5601
server.host: "0.0.0.0"
# Elasticsearch has TLS enabled, so use https and point Kibana at the CA certificate
elasticsearch.hosts: ["https://localhost:9200"]
elasticsearch.ssl.certificateAuthorities: ["/path/to/your/ca.crt"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "your_password"
i18n.locale: "zh-CN"   # Chinese UI; remove for the default English interface

# Production should enable HTTPS for Kibana itself
server.ssl.enabled: true
server.ssl.certificate: /path/to/your/cert.pem
server.ssl.key: /path/to/your/key.pem
```
```bash
# Kibana should not run as root; start it as an unprivileged user
nohup ./bin/kibana > kibana.log 2>&1 &

# Verify it is up (HTTPS is enabled above)
curl -k -I https://localhost:5601
```
```bash
wget https://artifacts.elastic.co/downloads/logstash/logstash-8.5.0-linux-x86_64.tar.gz
tar -xzf logstash-8.5.0-linux-x86_64.tar.gz
cd logstash-8.5.0/
```
Create config/pipelines.yml:
```yaml
- pipeline.id: spring-boot-logs
  # relative to the Logstash home directory; prefer an absolute path in production
  path.config: "config/conf.d/spring-boot.conf"
  pipeline.workers: 4
  queue.type: persisted
  queue.max_bytes: 2gb
```
Create config/conf.d/spring-boot.conf:
```ruby
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/logstash/certs/logstash.crt"
    ssl_key => "/etc/logstash/certs/logstash.key"
  }
}

filter {
  # Drop Beats metadata we don't need
  mutate {
    remove_field => ["host", "agent", "ecs", "input"]
  }

  # Parse a "timestamp" field if one is present (the Logback encoder
  # configured later already emits @timestamp directly, so this is a safety net)
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }

  # Tag error events; logstash-logback-encoder emits the level
  # as a top-level "level" field
  if [level] == "ERROR" {
    mutate {
      add_tag => ["error"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    index => "spring-boot-logs-%{+YYYY.MM.dd}"
    user => "logstash_writer"
    password => "your_password"
    ssl_certificate_verification => true
    cacert => "/etc/logstash/certs/ca.crt"
  }
}
```
```bash
# -f bypasses pipelines.yml; handy for testing the pipeline directly.
# In production, run Logstash under systemd instead (see the unit sketch below).
./bin/logstash -f config/conf.d/spring-boot.conf --config.reload.automatic
```
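For reference, a minimal systemd unit sketch, assuming Logstash was extracted to /opt/logstash-8.5.0 and runs as a dedicated logstash user (adjust paths and user to your layout):

```ini
# /etc/systemd/system/logstash.service
[Unit]
Description=Logstash
After=network.target

[Service]
User=logstash
Group=logstash
ExecStart=/opt/logstash-8.5.0/bin/logstash -f /opt/logstash-8.5.0/config/conf.d/spring-boot.conf
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now logstash`.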
```bash
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.5.0-linux-x86_64.tar.gz
tar -xzf filebeat-8.5.0-linux-x86_64.tar.gz
cd filebeat-8.5.0-linux-x86_64/
```
Edit filebeat.yml:
```yaml
filebeat.inputs:
- type: log        # deprecated in 8.x in favor of filestream, but still works
  enabled: true
  paths:
    - /var/log/spring-boot/*.log
  json.keys_under_root: true
  json.add_error_key: true
  fields:
    app: "spring-boot-app"
    env: "production"
  fields_under_root: true
  # Multiline handling for plain-text Java stack traces; JSON-encoded
  # events are single-line, so this only matters for non-JSON log files
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after

output.logstash:
  hosts: ["logstash:5044"]
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]

# Module configuration reloading
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
```
```bash
# Validate the configuration and the output connection
./filebeat test config
./filebeat test output

# Start the service (use systemd in production, as with Logstash)
nohup ./filebeat -e > filebeat.log 2>&1 &
```
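To keep secrets out of filebeat.yml, Filebeat ships with a built-in keystore; a quick sketch (ES_PWD is just an example key name, referenced from the config as `${ES_PWD}`):

```bash
# Create the keystore, then add a named secret (prompts for the value)
./filebeat keystore create
./filebeat keystore add ES_PWD
```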
Add the required dependencies to pom.xml:
```xml
<dependencies>
    <!-- Logstash Logback encoder -->
    <dependency>
        <groupId>net.logstash.logback</groupId>
        <artifactId>logstash-logback-encoder</artifactId>
        <version>7.2</version>
    </dependency>
    <!-- Micrometer metrics -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-core</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>
```
Create src/main/resources/logback-spring.xml:
```xml
<configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>

    <!-- Pull Spring properties into the Logback context; plain
         ${spring.application.name} placeholders are not resolved by Logback itself -->
    <springProperty scope="context" name="appName" source="spring.application.name" defaultValue="app"/>
    <springProperty scope="context" name="appVersion" source="info.app.version" defaultValue="1.0.0"/>
    <springProperty scope="context" name="activeProfile" source="spring.profiles.active" defaultValue="default"/>

    <property name="LOG_PATH" value="/var/log/spring-boot"/>
    <property name="LOG_FILE" value="${LOG_PATH}/application.log"/>

    <!-- JSON log output -->
    <appender name="FILE_JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_FILE}</file>
        <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
            <providers>
                <timestamp>
                    <timeZone>UTC</timeZone>
                </timestamp>
                <logLevel/>
                <loggerName/>
                <message/>
                <mdc/>
                <stackTrace>
                    <throwableConverter class="net.logstash.logback.stacktrace.ShortenedThrowableConverter">
                        <maxDepthPerThrowable>30</maxDepthPerThrowable>
                        <maxLength>2048</maxLength>
                    </throwableConverter>
                </stackTrace>
                <pattern>
                    <pattern>
                        {
                        "app": "${appName}",
                        "version": "${appVersion}",
                        "env": "${activeProfile}"
                        }
                    </pattern>
                </pattern>
            </providers>
        </encoder>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_FILE}.%d{yyyy-MM-dd}.%i.gz</fileNamePattern>
            <maxFileSize>100MB</maxFileSize>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
    </appender>

    <!-- Asynchronous logging -->
    <appender name="ASYNC_FILE" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="FILE_JSON"/>
        <queueSize>1024</queueSize>
    </appender>

    <root level="INFO">
        <appender-ref ref="ASYNC_FILE"/>
    </root>
</configuration>
```
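With this encoder, each event is written as a single JSON line, which is exactly what the Filebeat json.* settings above expect. An illustrative event (field values made up):

```json
{"@timestamp":"2024-01-15T08:30:12.345Z","level":"ERROR","logger_name":"com.example.order.OrderService","message":"Order processing failed","trace.id":"d2c8a8e4-5b1e-4f3a-9c0d-7e6f5a4b3c2d","app":"order-service","version":"1.0.0","env":"production"}
```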
Example application.yml:
```yaml
spring:
  application:
    name: order-service

logging:
  config: classpath:logback-spring.xml
  level:
    root: INFO
    com.example: DEBUG

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
```
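Once the application is running, the exposed Actuator endpoints can be checked directly; assuming the default port 8080:

```bash
# Liveness check and the Prometheus scrape endpoint exposed above
curl http://localhost:8080/actuator/health
curl http://localhost:8080/actuator/prometheus
```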
Create a log utility class to enrich log output:
```java
import jakarta.servlet.http.HttpServletRequest; // javax.servlet.* on Spring Boot 2.x

import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class LogUtils {

    private static final Logger logger = LoggerFactory.getLogger(LogUtils.class);

    public static void logHttpRequest(HttpServletRequest request, long duration, int status) {
        MDC.put("http.method", request.getMethod());
        MDC.put("http.path", request.getRequestURI());
        MDC.put("http.status", String.valueOf(status));
        MDC.put("http.duration_ms", String.valueOf(duration));
        MDC.put("client.ip", getClientIp(request));
        logger.info("HTTP request completed");
        // Remove only our own keys; MDC.clear() would also wipe unrelated
        // entries such as the trace.id set by the TraceFilter shown later
        MDC.remove("http.method");
        MDC.remove("http.path");
        MDC.remove("http.status");
        MDC.remove("http.duration_ms");
        MDC.remove("client.ip");
    }

    public static void logBusinessEvent(String eventType, String userId, Map<String, Object> details) {
        MDC.put("event.type", eventType);
        MDC.put("user.id", userId);
        logger.info("Business event: {}", details);
        MDC.remove("event.type");
        MDC.remove("user.id");
    }

    private static String getClientIp(HttpServletRequest request) {
        // Behind a proxy, the first X-Forwarded-For entry is the original client
        String ip = request.getHeader("X-Forwarded-For");
        return ip != null ? ip.split(",")[0].trim() : request.getRemoteAddr();
    }
}
```
Implement request logging with an interceptor:
```java
import jakarta.servlet.http.HttpServletRequest;  // javax.servlet.* on Spring Boot 2.x
import jakarta.servlet.http.HttpServletResponse;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Component
public class RequestLoggingInterceptor implements HandlerInterceptor {

    private static final ThreadLocal<Long> startTime = new ThreadLocal<>();

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        startTime.set(System.currentTimeMillis());
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
                                Object handler, Exception ex) {
        Long start = startTime.get();
        if (start != null) {
            long duration = System.currentTimeMillis() - start;
            LogUtils.logHttpRequest(request, duration, response.getStatus());
        }
        startTime.remove();
    }
}

@Configuration
public class WebConfig implements WebMvcConfigurer {

    @Autowired
    private RequestLoggingInterceptor loggingInterceptor;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(loggingInterceptor);
    }
}
```
Enhance the Logstash pipeline's processing capabilities:
```ruby
filter {
  # User-agent parsing
  if [http_user_agent] {
    useragent {
      source => "http_user_agent"
      target => "user_agent"
    }
  }

  # IP geolocation; ECS mode is disabled so the classic "geoip" target
  # matches the geoip.location mapping in the index template below
  if [client_ip] {
    geoip {
      source => "client_ip"
      target => "geoip"
      ecs_compatibility => "disabled"
    }
  }

  # Classify HTTP status codes; translate matches dictionary keys exactly,
  # so regex mode is required for patterns ("2??" would never match)
  translate {
    source => "http.status"
    target => "status.category"
    regex => true
    dictionary => {
      "^2\d\d$" => "SUCCESS"
      "^4\d\d$" => "CLIENT_ERROR"
      "^5\d\d$" => "SERVER_ERROR"
    }
    fallback => "UNKNOWN"
  }

  # Fingerprint for de-duplication; concatenate_sources hashes both
  # fields together instead of overwriting the target per source
  fingerprint {
    source => ["message", "@timestamp"]
    target => "[@metadata][fingerprint]"
    method => "SHA256"
    key => "your_secret_key"
    concatenate_sources => true
  }
}
```
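The fingerprint only de-duplicates if the Elasticsearch output actually uses it as the document ID. A sketch of the adjusted output block (assumes overwriting a duplicate event is acceptable):

```ruby
output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    index => "spring-boot-logs-%{+YYYY.MM.dd}"
    # Events hashing to the same fingerprint overwrite each other
    # instead of being indexed twice
    document_id => "%{[@metadata][fingerprint]}"
    user => "logstash_writer"
    password => "your_password"
    cacert => "/etc/logstash/certs/ca.crt"
  }
}
```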
Create an index template to optimize storage. Be careful with `"dynamic": "strict"`: it rejects any document containing unmapped fields, so either map every field your pipeline emits or relax it to `"false"`:
```json
{
  "index_patterns": ["spring-boot-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy",
      "index.refresh_interval": "30s"
    },
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "@timestamp": { "type": "date" },
        "app": { "type": "keyword" },
        "log.level": { "type": "keyword" },
        "message": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "geoip": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}
```
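Applying the template is a single API call; a sketch assuming the JSON above is saved as template.json and using the elastic credentials from earlier:

```bash
curl -k -u elastic:your_password -X PUT "https://localhost:9200/_index_template/spring-boot-logs" \
  -H 'Content-Type: application/json' -d @template.json
```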
Configure an ILM policy to manage log retention automatically. One caveat: the rollover action only works when writing through an alias or data stream, not the per-day index names used above, so either switch the output to a write alias or rely mainly on the delete phase:
```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```
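Register the policy under the name the template references (`logs-policy`); assuming the JSON above is saved as logs-policy.json:

```bash
curl -k -u elastic:your_password -X PUT "https://localhost:9200/_ilm/policy/logs-policy" \
  -H 'Content-Type: application/json' -d @logs-policy.json
```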
Build the key monitoring dashboards in Kibana, then set up alerting. The watch below fires when more than 20 ERROR-level events arrive within five minutes (note that Watcher is a licensed X-Pack feature):
```json
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["spring-boot-logs-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "term": { "level": "ERROR" } },
                { "range": { "@timestamp": { "gte": "now-5m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "return ctx.payload.hits.total > 20",
      "lang": "painless"
    }
  },
  "actions": {
    "email_alert": {
      "throttle_period": "15m",
      "email": {
        "to": ["devops@example.com"],
        "subject": "High Error Rate Detected",
        "body": "Found {{ctx.payload.hits.total}} errors in the last 5 minutes"
      }
    }
  }
}
```
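Register the watch through the Watcher API; the ID in the URL serves as its name. Assuming the JSON above is saved as error-alert.json:

```bash
curl -k -u elastic:your_password -X PUT "https://localhost:9200/_watcher/watch/high-error-rate" \
  -H 'Content-Type: application/json' -d @error-alert.json
```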
Filebeat tuning:
- Increase queue.mem.events (default 4096) to buffer more events in memory
- Enable pipelining to improve throughput
- Use loadbalance mode to spread traffic across multiple Logstash instances

Logstash tuning:
- Size pipeline.workers to the available CPU cores
- Keep the persisted queue enabled to prevent data loss
- Enable the dead_letter_queue to capture events that fail parsing

Elasticsearch tuning:
- Raise refresh_interval (30s works well for logging workloads)
- Run _forcemerge on read-only indices to reduce segment count

Security hardening covers three layers: network security (TLS on every hop, as configured above, plus firewalling the service ports), authentication and authorization (dedicated least-privilege accounts such as logstash_writer and kibana_system), and data protection (encrypted transport and tight file permissions on certificates and credentials).
Troubleshooting usually falls into three buckets: log collection failures (check Filebeat's connectivity to Logstash and its registry state), performance problems (watch queue backlogs and Elasticsearch indexing pressure), and data inconsistencies (verify timestamps, time zones, and index mappings). The health check script below covers the basic connectivity checks:
```bash
#!/bin/bash
# Health check script for the logging stack

check_service() {
  service=$1
  port=$2
  if nc -z localhost "$port"; then
    echo "[OK] $service is running on port $port"
  else
    echo "[ERROR] $service is not responding on port $port"
  fi
}

# Check each service
check_service "Elasticsearch" 9200
check_service "Kibana" 5601
check_service "Logstash" 5044

# Check disk space
df -h /var/lib/elasticsearch
```
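To run the check periodically, a cron entry is enough; a sketch assuming the script is saved as /opt/scripts/log-health-check.sh (path and schedule are placeholders):

```bash
# crontab -e: run the health check every 5 minutes, appending output to a log
*/5 * * * * /opt/scripts/log-health-check.sh >> /var/log/log-health-check.log 2>&1
```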
To correlate logs across services, inject a trace ID into the MDC at the start of every request:

```java
import java.io.IOException;
import java.util.UUID;

import jakarta.servlet.Filter;          // javax.servlet.* on Spring Boot 2.x
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;

@Component
public class TraceFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpRequest = (HttpServletRequest) request;

        // Reuse the caller's trace ID if present; otherwise start a new trace
        String traceId = httpRequest.getHeader("X-Trace-ID");
        if (traceId == null || traceId.isEmpty()) {
            traceId = UUID.randomUUID().toString();
        }

        MDC.put("trace.id", traceId);
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove("trace.id");
        }
    }
}
```
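For the trace ID to follow a request across services, outbound HTTP calls must forward it as well. A minimal sketch using a RestTemplate interceptor (assuming RestTemplate is the HTTP client in use):

```java
import java.io.IOException;

import org.slf4j.MDC;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;

// Copies the current MDC trace ID onto outgoing requests so the next
// service's TraceFilter picks it up instead of minting a new one
public class TracePropagationInterceptor implements ClientHttpRequestInterceptor {

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
                                        ClientHttpRequestExecution execution) throws IOException {
        String traceId = MDC.get("trace.id");
        if (traceId != null) {
            request.getHeaders().add("X-Trace-ID", traceId);
        }
        return execution.execute(request, body);
    }
}
```

Register it with `restTemplate.getInterceptors().add(new TracePropagationInterceptor())`.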
Business metrics can be recorded alongside the logs with Micrometer:

```java
import java.util.Map;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderProcessingTimer;

    public OrderService(MeterRegistry registry) {
        this.orderCounter = Counter.builder("orders.total")
                .description("Total orders count")
                .register(registry);
        this.orderProcessingTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .register(registry);
    }

    public Order createOrder(OrderRequest request) {
        return orderProcessingTimer.record(() -> {
            Order order = processOrder(request); // domain logic, defined elsewhere
            orderCounter.increment();
            LogUtils.logBusinessEvent("ORDER_CREATED",
                    request.getUserId(),
                    Map.of("orderId", order.getId(), "amount", order.getAmount()));
            return order;
        });
    }
}
```
When deploying this platform for real, the points above deserve particular attention: secure every hop with TLS, size JVM heaps and queues deliberately, and let ILM handle retention rather than manual cleanup.
This setup has been running stably in our production environment for over two years, processing more than 1 TB of logs per day and supporting log management for over a hundred microservices. With sensible configuration and tuning, it maintains stable performance even at peak load.