Microservice Logging Standards
Logging Architecture
Overall Architecture
┌──────────────────────────────────────────────────┐
│               Application services               │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │ Log output │  │ Collection │  │  Shipping  │  │
│  └────────────┘  └────────────┘  └────────────┘  │
└─────────────────────────┬────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────┐
│                Filebeat / Fluentd                 │
│              (log collection agent)               │
└─────────────────────────┬────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────┐
│                      Kafka                        │
│                (log buffer queue)                 │
└─────────────────────────┬────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────┐
│                     Logstash                      │
│                 (log processing)                  │
└─────────────────────────┬────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────┐
│                  Elasticsearch                    │
│                  (log storage)                    │
└─────────────────────────┬────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────┐
│                      Kibana                       │
│               (log visualization)                 │
└──────────────────────────────────────────────────┘
Log Levels
| Level | Meaning | Typical scenarios |
|---|---|---|
| ERROR | Error | System exceptions, business exceptions |
| WARN | Warning | Recoverable exceptions, failed parameter validation |
| INFO | Information | Key business flows, state changes |
| DEBUG | Debug | Detailed debugging information |
| TRACE | Trace | The most fine-grained tracing information |
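A minimal sketch of how these levels map onto call sites in application code (the class, method, and placeholder helper are illustrative, not part of the original standard):
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public void pay(long orderId, long amount) {
        log.info("Payment started, orderId={}, amount={}", orderId, amount);   // INFO: key business flow
        try {
            // ... call the payment gateway ...
        } catch (IllegalArgumentException e) {
            log.warn("Payment request rejected, orderId={}", orderId, e);      // WARN: recoverable / validation failure
        } catch (Exception e) {
            log.error("Payment failed, orderId={}", orderId, e);               // ERROR: system or business exception
        }
        if (log.isDebugEnabled()) {
            log.debug("Gateway response for orderId={}: {}", orderId, lastResponse()); // DEBUG: detailed diagnostics
        }
    }

    private String lastResponse() {
        return "OK"; // placeholder so the sketch compiles
    }
}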
Log Format
1. Structured Logs
JSON format:
{
"timestamp": "2026-04-10T10:00:00.000+08:00",
"level": "INFO",
"service": "user-service",
"instance": "192.168.1.100:8080",
"traceId": "abc123def456",
"spanId": "span789",
"thread": "http-nio-8080-exec-1",
"class": "com.example.controller.UserController",
"method": "getUser",
"line": 42,
"message": "查询用户成功",
"userId": 12345,
"duration": 15
}
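With logstash-logback-encoder (configured below), business fields such as userId and duration can be attached as structured arguments instead of being embedded in the message string. A minimal sketch, assuming the encoder is on the classpath; the controller and timing logic are illustrative:
import static net.logstash.logback.argument.StructuredArguments.kv;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UserController {

    private static final Logger log = LoggerFactory.getLogger(UserController.class);

    public void getUser(long userId) {
        long start = System.currentTimeMillis();
        // ... look up the user ...
        // kv() renders as "userId=12345" in the plain-text message and as a
        // top-level JSON field in the LogstashEncoder output.
        log.info("User query succeeded, {} {}",
                kv("userId", userId),
                kv("duration", System.currentTimeMillis() - start));
    }
}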
2. Logback Configuration
<!-- logback-spring.xml -->
<configuration>
<springProperty scope="context" name="appName" source="spring.application.name"/>
<property name="LOG_PATTERN"
value="%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] [%X{traceId:-N/A}] %-5level %logger{36} - %msg%n"/>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder">
<customFields>{"service":"${appName}"}</customFields>
<includeMdc>true</includeMdc>
<includeContext>false</includeContext>
<includeCallerData>true</includeCallerData>
<fieldNames>
<timestamp>timestamp</timestamp>
<level>level</level>
<message>message</message>
<logger>class</logger>
<thread>thread</thread>
</fieldNames>
</encoder>
</appender>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/${appName}.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<fileNamePattern>logs/${appName}.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
<maxFileSize>100MB</maxFileSize>
<maxHistory>30</maxHistory>
<totalSizeCap>10GB</totalSizeCap>
</rollingPolicy>
<encoder>
<pattern>${LOG_PATTERN}</pattern>
<charset>UTF-8</charset>
</encoder>
</appender>
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
<queueSize>8192</queueSize>
<neverBlock>true</neverBlock>
<discardingThreshold>0</discardingThreshold>
<appender-ref ref="CONSOLE"/>
<appender-ref ref="FILE"/>
</appender>
<!-- Business log -->
<appender name="BUSINESS" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/business.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>logs/business.%d{yyyy-MM-dd}.log</fileNamePattern>
<maxHistory>30</maxHistory>
</rollingPolicy>
<encoder>
<pattern>${LOG_PATTERN}</pattern>
<charset>UTF-8</charset>
</encoder>
</appender>
<!-- Slow-query log -->
<appender name="SLOW_QUERY" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/slow-query.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>logs/slow-query.%d{yyyy-MM-dd}.log</fileNamePattern>
<maxHistory>30</maxHistory>
</rollingPolicy>
<encoder>
<pattern>${LOG_PATTERN}</pattern>
<charset>UTF-8</charset>
</encoder>
</appender>
<!-- Audit log -->
<appender name="AUDIT" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/audit.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>logs/audit.%d{yyyy-MM-dd}.log</fileNamePattern>
<maxHistory>90</maxHistory>
</rollingPolicy>
<encoder>
<pattern>${LOG_PATTERN}</pattern>
<charset>UTF-8</charset>
</encoder>
</appender>
<!-- Logger level configuration -->
<logger name="com.example.business" level="INFO" additivity="false">
<appender-ref ref="BUSINESS"/>
<appender-ref ref="ASYNC"/>
</logger>
<logger name="com.example.slow" level="WARN" additivity="false">
<appender-ref ref="SLOW_QUERY"/>
<appender-ref ref="ASYNC"/>
</logger>
<logger name="com.example.audit" level="INFO" additivity="false">
<appender-ref ref="AUDIT"/>
<appender-ref ref="ASYNC"/>
</logger>
<root level="INFO">
<appender-ref ref="ASYNC"/>
</root>
</configuration>
3. Log Content Guidelines
Good log examples:
// Business operation log
log.info("Order created, orderId={}, userId={}, amount={}",
        orderId, userId, amount);
// Error log (with stack trace)
log.error("Order creation failed, orderId={}, userId={}, reason={}",
        orderId, userId, e.getMessage(), e);
// Slow-query log
if (duration > 1000) {
    log.warn("Slow query, sql={}, duration={}ms, params={}",
            sql, duration, params);
}
// Audit log
log.info("User login, userId={}, ip={}, userAgent={}",
        userId, ip, userAgent);
Bad log examples:
// Key information missing
log.info("order created");
// Error logged without the stack trace
log.error("order create failed");
// Sensitive data in plain text
log.info("user login, password={}", password);
// Logging every item inside a loop
for (Item item : items) {
    log.info("processing item: {}", item); // performance problem
}
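One way to keep hot loops quiet is to log per item only at DEBUG and emit a single INFO summary. A sketch; items, process(), and the timing come from the surrounding code:
long start = System.currentTimeMillis();
int processed = 0;
for (Item item : items) {
    process(item);
    processed++;
    if (log.isDebugEnabled()) {
        log.debug("processing item: {}", item);   // per-item detail only when DEBUG is enabled
    }
}
// One INFO summary instead of one line per item
log.info("Batch finished, total={}, durationMs={}", processed, System.currentTimeMillis() - start);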
Distributed Tracing Integration
1. Trace ID Propagation
@Component
public class TraceIdInterceptor implements HandlerInterceptor {
@Override
public boolean preHandle(HttpServletRequest request,
HttpServletResponse response,
Object handler) {
String traceId = request.getHeader("X-Trace-ID");
if (traceId == null) {
traceId = UUID.randomUUID().toString().replace("-", "");
}
MDC.put("traceId", traceId);
return true;
}
@Override
public void afterCompletion(HttpServletRequest request,
HttpServletResponse response,
Object handler,
Exception ex) {
MDC.clear();
}
}
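The interceptor only takes effect once it is registered with Spring MVC. A minimal registration sketch; the configuration class name and path pattern are illustrative:
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class WebMvcConfig implements WebMvcConfigurer {

    private final TraceIdInterceptor traceIdInterceptor;

    public WebMvcConfig(TraceIdInterceptor traceIdInterceptor) {
        this.traceIdInterceptor = traceIdInterceptor;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // Attach the trace-ID interceptor to every request path
        registry.addInterceptor(traceIdInterceptor).addPathPatterns("/**");
    }
}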
2. Propagating the Trace ID with Feign
@Component
public class FeignTraceInterceptor implements RequestInterceptor {
@Override
public void apply(RequestTemplate template) {
String traceId = MDC.get("traceId");
if (traceId != null) {
template.header("X-Trace-ID", traceId);
}
}
}
3. Propagating MDC Through Thread Pools
@Component
public class MdcTaskDecorator implements TaskDecorator {
@Override
public Runnable decorate(Runnable runnable) {
Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
try {
MDC.setContextMap(contextMap);
runnable.run();
} finally {
MDC.clear();
}
};
}
}
@Configuration
public class ThreadPoolConfig {
@Bean
public ThreadPoolTaskExecutor taskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setTaskDecorator(new MdcTaskDecorator());
return executor;
}
}
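With the decorated executor in place, and assuming @EnableAsync is configured somewhere in the application (not shown in the original), logs written inside async tasks keep the caller's traceId. A usage sketch with an illustrative service:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class ReportService {

    private static final Logger log = LoggerFactory.getLogger(ReportService.class);

    // Runs on the MDC-aware "taskExecutor" bean defined above, so
    // %X{traceId} in the log pattern still resolves inside the task.
    @Async("taskExecutor")
    public void generateReport(long userId) {
        log.info("Generating report, userId={}", userId);
    }
}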
Log Collection
1. Filebeat Configuration
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log
    fields:
      service: ${SERVICE_NAME}
      environment: ${ENVIRONMENT}
      instance: ${HOSTNAME}
    # Place the custom fields at the event root so %{[environment]} resolves
    # in the Kafka topic below and in the Logstash output index
    fields_under_root: true
    json.keys_under_root: true
    json.add_error_key: true
    json.message_key: message
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~

output.kafka:
  enabled: true
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: "logs-%{[environment]}"
  partition.hash:
    reachable_only: true
  compression: gzip
  required_acks: 1
2. Logstash Configuration
# logstash.conf
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092,kafka3:9092"
    # Subscribe by pattern; the plain "topics" option does not expand wildcards
    topics_pattern => "logs-.*"
    group_id => "logstash"
    codec => "json"
  }
}

filter {
  # Use the log's own timestamp as @timestamp
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }

  # Parse plain-text log lines
  if [message] {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{DATA:log_level} %{DATA:log_class} - %{GREEDYDATA:log_message}" }
    }
  }

  # Enrich with geo information
  if [ip] {
    geoip {
      source => "ip"
      target => "geoip"
    }
  }

  # Drop fields that are not needed downstream
  mutate {
    remove_field => ["host", "agent", "ecs"]
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{[environment]}-%{+YYYY.MM.dd}"
    template_name => "logs"
  }
}
3. Elasticsearch Index Template
PUT _template/logs
{
"index_patterns": ["logs-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "logs",
"index.lifecycle.rollover_alias": "logs"
},
"mappings": {
"properties": {
"timestamp": { "type": "date" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"instance": { "type": "keyword" },
"traceId": { "type": "keyword" },
"message": { "type": "text" },
"class": { "type": "keyword" },
"thread": { "type": "keyword" },
"duration": { "type": "long" },
"userId": { "type": "long" },
"ip": { "type": "ip" },
"geoip": {
"properties": {
"country": { "type": "keyword" },
"city": { "type": "keyword" }
}
}
}
}
}
Log Analysis
1. Kibana Queries
Error logs in the last hour (Lucene query syntax):
service: "user-service" AND level: "ERROR" AND timestamp:[now-1h TO *]
Slow query analysis:
service: "order-service" AND duration:>1000
User behavior analysis:
userId: 12345 AND (message: "login" OR message: "place order" OR message: "payment")
2. Log Aggregation
Error statistics:
GET /logs-*/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{ "term": { "level": "ERROR" } },
{ "range": { "timestamp": { "gte": "now-1h" } } }
]
}
},
"aggs": {
"errors_by_service": {
"terms": { "field": "service" }
},
"errors_over_time": {
"date_histogram": {
"field": "timestamp",
"interval": "5m"
}
}
}
}
3. Alerting Rules
Error-rate alert:
- alert: HighErrorRate
  expr: |
    sum(rate(logs_total{level="ERROR"}[5m]))
      / sum(rate(logs_total[5m])) > 0.01
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Log error rate too high"
    description: "Error rate is above 1%"
Slow-query alert:
- alert: SlowQuery
  expr: |
    avg(logs_duration{query_type="SELECT"}) > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Slow query alert"
    description: "Average query time exceeds 1 second"
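The logs_total and logs_duration series used above are not produced by the ELK pipeline itself; they have to be exported to Prometheus from the application. One option, sketched here under the assumption that Micrometer with a Prometheus registry is on the classpath, is a Logback filter that counts events per level (the Prometheus naming convention exposes the counter "logs" as logs_total; a logs_duration-style metric would be recorded analogously with a timer around query execution):
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;
import io.micrometer.core.instrument.Metrics;

// Attach to an appender with <filter class="com.example.logging.LogMetricsFilter"/>.
// It never drops events; it only counts them by level.
public class LogMetricsFilter extends Filter<ILoggingEvent> {

    @Override
    public FilterReply decide(ILoggingEvent event) {
        Metrics.counter("logs", "level", event.getLevel().toString()).increment();
        return FilterReply.NEUTRAL;
    }
}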
Log Data Masking
1. Sensitive Field Filtering
@Component
public class SensitiveDataMasker {

    // Keys are compared in lower case, so list them in lower case here
    private static final Set<String> SENSITIVE_FIELDS = Set.of(
            "password", "token", "secret", "key", "creditcard", "idcard"
    );

    /**
     * Recursively replaces the values of sensitive keys with "***"
     * before the object is written into a log line.
     */
    public Object maskSensitiveData(Object data) {
        if (data instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) data;
            Map<Object, Object> result = new HashMap<>();
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                String key = String.valueOf(entry.getKey());
                Object value = entry.getValue();
                if (SENSITIVE_FIELDS.contains(key.toLowerCase())) {
                    result.put(key, "***");
                } else {
                    result.put(key, maskSensitiveData(value));
                }
            }
            return result;
        }
        return data;
    }
}
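Typical usage is to run a payload through the masker before it reaches the logger; the payload map and the injected sensitiveDataMasker bean in this fragment are illustrative:
Map<String, Object> payload = new HashMap<>();
payload.put("userId", 12345);
payload.put("password", "p@ssw0rd");
// Logged as {userId=12345, password=***}
log.info("Login request: {}", sensitiveDataMasker.maskSensitiveData(payload));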
2. Masking in Logback
public class SensitiveDataConverter extends ClassicConverter {
    @Override
    public String convert(ILoggingEvent event) {
        String message = event.getFormattedMessage();
        // Mask mobile phone numbers
        message = message.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
        // Mask national ID numbers
        message = message.replaceAll("(\\d{6})\\d{8}(\\d{4})", "$1********$2");
        // Mask email addresses
        message = message.replaceAll("(\\w{3})\\w*@(\\w+)", "$1***@$2");
        return message;
    }
}
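The converter only takes effect once it is registered as a conversion word and referenced from the pattern. A sketch for logback-spring.xml; the word maskedMsg, the package name, and the appender are arbitrary choices:
<conversionRule conversionWord="maskedMsg"
                converterClass="com.example.logging.SensitiveDataConverter"/>

<appender name="MASKED_FILE" class="ch.qos.logback.core.FileAppender">
    <file>logs/masked.log</file>
    <encoder>
        <!-- %maskedMsg replaces %msg so every message passes through the converter -->
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %maskedMsg%n</pattern>
    </encoder>
</appender>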
Log Archiving
1. Lifecycle Management
# ILM policy
PUT _ilm/policy/logs
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_size": "50GB",
"max_age": "1d"
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"freeze": {}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
2. Log Backup
#!/bin/bash
# Log backup script
BACKUP_DIR="/backup/logs"
DATE=$(date +%Y%m%d)
# Archive rotated log files
tar -czf $BACKUP_DIR/logs-$DATE.tar.gz /var/log/app/*.log.*
# Upload to object storage
aws s3 cp $BACKUP_DIR/logs-$DATE.tar.gz s3://logs-bucket/
# Remove local backups older than 30 days
find $BACKUP_DIR -name "logs-*.tar.gz" -mtime +30 -delete
Best Practices
1. Logging Conventions
- Consistent format: use structured JSON
- Include the key fields: timestamp, level, service, traceId
- Avoid sensitive data: mask passwords, tokens, and similar values
- Control log volume: do not flood the logs with noise
2. Performance Optimization
- Asynchronous logging: use AsyncAppender
- Batched shipping: reduce network overhead
- Log sampling: sample DEBUG/TRACE output (see the sketch after this list)
- Compressed archives: save storage space
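A sketch of DEBUG/TRACE sampling with a custom Logback TurboFilter, registered via <turboFilter class="..."/> in logback-spring.xml; the 10% rate is illustrative:
import java.util.concurrent.ThreadLocalRandom;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.turbo.TurboFilter;
import ch.qos.logback.core.spi.FilterReply;
import org.slf4j.Marker;

public class DebugSamplingTurboFilter extends TurboFilter {

    private static final double SAMPLE_RATE = 0.1;

    @Override
    public FilterReply decide(Marker marker, Logger logger, Level level,
                              String format, Object[] params, Throwable t) {
        // Never interfere with INFO and above
        if (level.isGreaterOrEqual(Level.INFO)) {
            return FilterReply.NEUTRAL;
        }
        // Keep roughly 10% of DEBUG/TRACE events, drop the rest
        return ThreadLocalRandom.current().nextDouble() < SAMPLE_RATE
                ? FilterReply.NEUTRAL
                : FilterReply.DENY;
    }
}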
3. Troubleshooting
- Correlated queries: tie all related logs together via traceId
- Complete context: include enough information to diagnose the problem
- Timely alerts: build a solid alerting mechanism
Summary
Microservice logs are a key resource for troubleshooting, so teams need a unified logging standard covering log format, log levels, collection, and analysis.
The ELK stack provides centralized log management and analysis, and combined with distributed tracing it makes problems quick to pinpoint.
Solid masking and archiving mechanisms keep log data secure and compliant.