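Introduction
SkyWalking is an open-source APM (application performance monitoring) system that originated in China and is now an Apache top-level project. It provides distributed tracing, performance metrics, and service dependency analysis. This article walks through integrating SkyWalking with a Spring Boot application, from starting the backend to alerting.
SkyWalking basics
1. Architecture components
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Agent    │────▶│     OAP     │────▶│   Storage   │
│   (probe)   │     │  (backend)  │     │  (storage)  │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │     UI      │
                    │ (dashboard) │
                    └─────────────┘
- Agent - a non-intrusive probe that collects trace data from the application
- OAP - Observability Analysis Platform, the backend that receives and analyzes the data
- Storage - the persistence layer (Elasticsearch, MySQL, H2)
- UI - the web dashboard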
2. Starting SkyWalking
# Download SkyWalking
wget https://archive.apache.org/dist/skywalking/9.7.0/apache-skywalking-apm-9.7.0.tar.gz
# Unpack
tar -xzf apache-skywalking-apm-9.7.0.tar.gz
# Recent releases unpack to apache-skywalking-apm-bin; check the tar output if the name differs
cd apache-skywalking-apm-bin
# Start the OAP server
bin/oapService.sh
# Start the UI
bin/webappService.sh
3. Starting with Docker
# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data
  oap:
    image: apache/skywalking-oap-server:9.7.0
    # depends_on only orders container startup; OAP may need a restart if Elasticsearch is not ready yet
    depends_on:
      - elasticsearch
    environment:
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: elasticsearch:9200
    ports:
      - "11800:11800"   # gRPC, used by agents
      - "12800:12800"   # HTTP, used by the UI
  ui:
    image: apache/skywalking-ui:9.7.0
    depends_on:
      - oap
    environment:
      SW_OAP_ADDRESS: http://oap:12800
    ports:
      - "8080:8080"
volumes:
  es-data:
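Integrating with Spring Boot
1. Attaching the agent
The Java agent is released separately from the APM backend (apache-skywalking-java-agent on the same download site). After unpacking it, point the JVM at skywalking-agent.jar.
Option 1: JVM startup arguments
java -javaagent:/path/to/skywalking-agent.jar \
     -Dskywalking.agent.service_name=demo-service \
     -Dskywalking.collector.backend_service=localhost:11800 \
     -jar demo.jar
Option 2: IDEA run configuration
VM Options:
-javaagent:/path/to/skywalking-agent.jar
-Dskywalking.agent.service_name=demo-service
-Dskywalking.collector.backend_service=localhost:11800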
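A forgotten or mistyped -javaagent option fails silently: the application starts normally and simply reports nothing. As a minimal sketch, the check below inspects the JVM's own input arguments for an agent option. The class name AgentCheck is hypothetical, and matching on the jar file name "skywalking-agent" is an assumption that only holds if the jar has not been renamed.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Startup sanity check (hypothetical helper, not part of SkyWalking). */
final class AgentCheck {

    /** Returns true if any JVM argument is a -javaagent option naming the SkyWalking agent jar. */
    static boolean hasSkyWalkingAgent(List<String> jvmArgs) {
        return jvmArgs.stream()
                .anyMatch(arg -> arg.startsWith("-javaagent:") && arg.contains("skywalking-agent"));
    }

    public static void main(String[] args) {
        // The runtime MXBean exposes the options the JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("SkyWalking agent attached: " + hasSkyWalkingAgent(jvmArgs));
    }
}
```

Logging the result once at startup is usually enough to catch a run configuration that lost its VM options.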
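2. Configuration file
The agent reads config/agent.config from its own directory. The file uses Java properties syntax, not YAML.
# agent.config
agent.service_name=demo-service
agent.namespace=default
collector.backend_service=localhost:11800
logging.level=INFO
logging.dir=/var/log/skywalking
logging.file_name=skywalking-agent.log
# Plugin options use the plugin.* prefix, for example:
plugin.jdbc.trace_sql_parameters=false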
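3. Maven plugin
To ship the agent with the build output, a copy step can be bound to the package phase. Confirm that these plugin coordinates resolve in your Maven repository before depending on them; the official releases distribute the agent as a standalone archive.
<plugin>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>sw-maven-plugin</artifactId>
    <version>9.7.0</version>
    <executions>
        <execution>
            <goals>
                <goal>copy-agent</goal>
            </goals>
            <phase>package</phase>
        </execution>
    </executions>
    <configuration>
        <agentDestination>${project.build.directory}/skywalking-agent</agentDestination>
    </configuration>
</plugin>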
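Distributed tracing
1. Automatic tracing
With the agent attached, SkyWalking instruments the following components without code changes:
- Spring MVC - controller methods
- Spring HTTP clients - RestTemplate, WebClient
- Data stores - MySQL, PostgreSQL, Redis
- Message queues - Kafka, RocketMQ, RabbitMQ
- HTTP clients - HttpClient, OkHttp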
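2. Custom tracing
The annotation and helpers below come from the org.apache.skywalking:apm-toolkit-trace dependency, whose version should match the agent.
import java.util.concurrent.CompletableFuture;
import lombok.RequiredArgsConstructor;
import org.apache.skywalking.apm.toolkit.trace.ActiveSpan;
import org.apache.skywalking.apm.toolkit.trace.SupplierWrapper;
import org.apache.skywalking.apm.toolkit.trace.Trace;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor
public class OrderService {
    private final OrderRepository orderRepository;

    /**
     * Creates a local span for this method and enriches it with tags and logs.
     */
    @Trace
    public Order createOrder(OrderCreateDTO dto) {
        // Tags are attached to the span created by @Trace
        ActiveSpan.tag("order.type", dto.getType());
        ActiveSpan.tag("user.id", dto.getUserId().toString());
        try {
            Order order = orderRepository.save(convert(dto));
            // Add an info-level log entry to the span
            ActiveSpan.info("Order created: " + order.getId());
            return order;
        } catch (Exception e) {
            // Mark the span as failed and record the exception
            ActiveSpan.error(e);
            throw e;
        }
    }

    /**
     * Asynchronous tracing.
     */
    @Trace
    public CompletableFuture<Order> createOrderAsync(OrderCreateDTO dto) {
        // SupplierWrapper captures the trace context of the calling thread
        // and continues it on the thread that runs the supplier.
        return CompletableFuture.supplyAsync(
            SupplierWrapper.of(() -> orderRepository.save(convert(dto)))
        );
    }
}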
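Crossing a thread boundary loses the trace because the tracing context is bound to the thread that started the span. The following is a simplified sketch of the capture-and-continue idea, using a plain ThreadLocal as a stand-in for the agent's context. The class ContextPropagationSketch and its TRACE_ID field are hypothetical and do not reflect SkyWalking's internal implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Illustrates snapshot-and-restore of a per-thread context (not SkyWalking code). */
final class ContextPropagationSketch {

    // Stand-in for the agent's per-thread tracing context.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's context now and restores it around the task on whichever thread runs it. */
    static <T> Supplier<T> wrap(Supplier<T> task) {
        String captured = TRACE_ID.get();           // snapshot taken on the submitting thread
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);                 // continue the captured context on the worker
            try {
                return task.get();
            } finally {
                // Put the worker thread back the way it was, so pooled threads do not leak context.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    public static void main(String[] args) {
        TRACE_ID.set("trace-123");
        Supplier<String> readContext = wrap(TRACE_ID::get);
        // The worker thread sees the submitting thread's value.
        System.out.println(CompletableFuture.supplyAsync(readContext).join());
    }
}
```

The important detail is that the snapshot is taken when the task is wrapped, not when it runs; by the time a pooled thread picks the task up, the submitting thread's context is no longer reachable.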
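3. Trace context propagation
The agent injects and extracts the sw8 header automatically for supported HTTP servers and clients, so application code does not need to parse it, and the agent's internal ContextManager classes are not accessible from application code. What the application can do is read the current trace ID through the toolkit API, for example to return it to callers for log correlation:
import jakarta.servlet.Filter;            // javax.servlet.* on Spring Boot 2.x
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import org.apache.skywalking.apm.toolkit.trace.TraceContext;
import org.springframework.stereotype.Component;

@Component
public class TraceContextFilter implements Filter {
    @Override
    public void doFilter(
        ServletRequest request,
        ServletResponse response,
        FilterChain chain
    ) throws IOException, ServletException {
        // Trace ID of the span the agent already opened for this request;
        // empty when no agent is attached.
        String traceId = TraceContext.traceId();
        if (!traceId.isEmpty() && response instanceof HttpServletResponse) {
            // Header name is illustrative
            ((HttpServletResponse) response).setHeader("X-Trace-Id", traceId);
        }
        chain.doFilter(request, response);
    }
}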
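For debugging propagation problems it can help to read an sw8 value by hand. In the v3 cross-process propagation protocol the value has eight dash-separated fields: the sample flag, trace ID, parent segment ID, parent span ID, parent service, parent service instance, parent endpoint, and target address, with every string field Base64-encoded. The decoder below is a diagnostic sketch under that assumption; the class name Sw8Header is hypothetical, and this is not how the agent itself should be driven.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Decoded view of an sw8 header value (illustrative class, not part of SkyWalking). */
final class Sw8Header {
    final boolean sampled;
    final String traceId;
    final String parentSegmentId;
    final int parentSpanId;
    final String parentService;
    final String parentInstance;
    final String parentEndpoint;
    final String targetAddress;

    private Sw8Header(String[] f) {
        sampled = "1".equals(f[0]);
        traceId = decode(f[1]);
        parentSegmentId = decode(f[2]);
        parentSpanId = Integer.parseInt(f[3]);   // the span ID is a plain integer, not Base64
        parentService = decode(f[4]);
        parentInstance = decode(f[5]);
        parentEndpoint = decode(f[6]);
        targetAddress = decode(f[7]);
    }

    /** Splits on '-', which is safe because the standard Base64 alphabet never contains it. */
    static Sw8Header parse(String headerValue) {
        String[] fields = headerValue.split("-");
        if (fields.length != 8) {
            throw new IllegalArgumentException("sw8 must have 8 fields, got " + fields.length);
        }
        return new Sw8Header(fields);
    }

    private static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Sw8Header h = parse(
            "1-My40LjU=-MS4yLjM=-4-c2VydmljZQ==-aW5zdGFuY2U=-L2FwcA==-MTI3LjAuMC4xOjgwODA=");
        System.out.println(h.traceId + " from " + h.parentService + " via " + h.parentEndpoint);
    }
}
```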
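Performance monitoring
1. Metrics
SkyWalking collects the following metrics automatically:
- Service metrics - throughput, response time, success rate
- Endpoint metrics - throughput and response time per endpoint
- Database metrics - SQL execution time, slow statements
- JVM metrics - memory, GC, threads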
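2. Slow query monitoring
The agent traces JDBC calls automatically, so the Spring Boot side only needs its usual datasource settings. The slow-statement threshold is not an application property; it is set on the OAP server.
# application.yml (Spring Boot application)
spring:
  datasource:
    url: jdbc:mysql://localhost:3306/demo
    username: root
    password: 123456
# config/application.yml (OAP server), or the SW_SLOW_DB_THRESHOLD environment variable
# Threshold in milliseconds: 100
agent-analyzer:
  default:
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:100}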
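3. Custom metrics
Business metrics are reported through the org.apache.skywalking:apm-toolkit-meter dependency. The OAP server also needs a matching meter analysis rule before the meter shows up as a metric.
import org.apache.skywalking.apm.toolkit.meter.Counter;
import org.apache.skywalking.apm.toolkit.meter.MeterFactory;
import org.springframework.stereotype.Component;

@Component
public class CustomMetrics {
    /**
     * Records a business metric.
     */
    public void recordOrderMetrics(Order order) {
        // Order amount, labeled by type and status
        Counter counter = MeterFactory.counter("order_amount")
            .tag("type", order.getType())
            .tag("status", order.getStatus())
            .mode(Counter.Mode.INCREMENT)
            .build();
        // The agent reports the counter to OAP in the background.
        // Assumes getAmount() returns a Number such as BigDecimal.
        counter.increment(order.getAmount().doubleValue());
    }
}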
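Alerting
1. Alarm rules
In the 9.7.0 format, rules are keyed by a name ending in _rule, and each expression must reduce to a single true/false result over the period (in minutes). Note that service_sla is expressed in hundredths of a percent, so 90% is 9000.
# alarm-settings.yml
rules:
  # Slow service response: above 1000 ms in at least 3 of the last 10 minutes
  service_resp_time_rule:
    expression: sum(service_resp_time > 1000) >= 3
    period: 10
    message: Response time of service {name} is above 1000 ms
  # Low service success rate: below 90% in at least 2 of the last 10 minutes
  service_sla_rule:
    expression: sum(service_sla < 9000) >= 2
    period: 10
    message: Success rate of service {name} is below 90%
  # Slow endpoint response
  endpoint_resp_time_rule:
    expression: sum(endpoint_resp_time > 500) >= 2
    period: 10
    message: Response time of endpoint {name} is above 500 ms
  # Slow database access
  database_access_resp_time_rule:
    expression: sum(database_access_resp_time > 1000) >= 2
    period: 10
    message: Response time of database access {name} is above 1000 ms
hooks:
  webhook:
    default:
      is-default: true
      urls:
        - http://localhost:8080/webhook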
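SkyWalking's bundled default rules use expressions of the form sum(service_resp_time > 1000) >= 3 with a period of 10, which reads as "the metric exceeded 1000 in at least 3 of the last 10 one-minute points". As a rough illustration of that counting semantics only, not of the OAP's actual evaluation engine, the sketch below applies it to a list of per-minute values. The class and parameter names are hypothetical.

```java
import java.util.List;

/** Illustrates the "count of matching minutes in a window" rule shape (not OAP code). */
final class AlarmRuleSketch {

    /**
     * Evaluates "sum(metric > threshold) >= minMatches" over the most recent
     * {@code period} values, where each value is one minute of the metric.
     */
    static boolean fires(List<Long> valuesPerMinute, int period, long threshold, int minMatches) {
        int from = Math.max(0, valuesPerMinute.size() - period);
        long matches = valuesPerMinute.subList(from, valuesPerMinute.size()).stream()
                .filter(v -> v > threshold)
                .count();
        return matches >= minMatches;
    }

    public static void main(String[] args) {
        List<Long> respTimeMs = List.of(200L, 1500L, 1200L, 900L, 1100L);
        // Three of the five minutes exceed 1000 ms, so the rule fires.
        System.out.println(fires(respTimeMs, 10, 1000, 3));
    }
}
```

Requiring several matching minutes rather than a single one is what keeps a one-off latency spike from paging anyone.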
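2. Handling the webhook
import java.util.List;
import lombok.Data;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@Slf4j
@RestController
@RequestMapping("/webhook")
@RequiredArgsConstructor
public class SkyWalkingWebhook {
    private final NotificationService notificationService;

    // SkyWalking posts a JSON array of alarm messages in a single request.
    @PostMapping
    public ResponseEntity<Void> handleAlert(@RequestBody List<Alert> alerts) {
        for (Alert alert : alerts) {
            log.warn("Alert received: {}", alert);
            // Send the notification
            notificationService.sendDingTalk(alert);
        }
        return ResponseEntity.ok().build();
    }
}

@Data
public class Alert {
    private String scope;        // e.g. SERVICE, ENDPOINT
    private String name;         // name of the affected entity
    private String ruleName;     // rule that fired
    private String alarmMessage;
    private Long startTime;      // epoch milliseconds
}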
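3. Notification integration
import com.dingtalk.api.DefaultDingTalkClient;
import com.dingtalk.api.DingTalkClient;
import com.dingtalk.api.request.OapiRobotSendRequest;
import com.taobao.api.ApiException;
import java.util.Date;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class NotificationService {
    // Full robot URL, i.e. https://oapi.dingtalk.com/robot/send plus its
    // access_token query parameter. The property name is illustrative.
    @Value("${dingtalk.webhook-url}")
    private String dingTalkWebhookUrl;

    /**
     * Sends an email notification.
     */
    public void sendEmail(Alert alert) {
        // Implement email delivery here
    }

    /**
     * Sends a DingTalk notification.
     */
    public void sendDingTalk(Alert alert) {
        DingTalkClient client = new DefaultDingTalkClient(dingTalkWebhookUrl);
        OapiRobotSendRequest request = new OapiRobotSendRequest();
        request.setMsgtype("markdown");
        OapiRobotSendRequest.Markdown markdown = new OapiRobotSendRequest.Markdown();
        markdown.setTitle("SkyWalking alert");
        markdown.setText(formatAlert(alert));
        request.setMarkdown(markdown);
        try {
            client.execute(request);
        } catch (ApiException e) {
            throw new IllegalStateException("Failed to send DingTalk alert", e);
        }
    }

    private String formatAlert(Alert alert) {
        return String.format(
            "## SkyWalking alert\n\n" +
            "- **Scope**: %s\n" +
            "- **Entity**: %s\n" +
            "- **Time**: %s\n" +
            "- **Details**: %s",
            alert.getScope(),
            alert.getName(),
            new Date(alert.getStartTime()),
            alert.getAlarmMessage()
        );
    }
}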
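Service dependency analysis
1. Topology map
SkyWalking builds the service topology automatically from trace data. It shows:
- Calls between services
- Database dependencies
- Message queue dependencies
- External API dependencies
2. Dependency metrics
- Call count - how often one service calls another
- Response time - average, P95, P99
- Success rate - share of calls that succeed
- Slow calls - calls that exceed the threshold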
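Averages hide tail latency, which is why the P95 and P99 figures matter when reading these metrics. The sketch below computes them from raw samples with the nearest-rank method. That method is an assumption chosen for clarity; SkyWalking's own percentile calculation works on aggregated data and may differ in detail. The class name LatencyStats is hypothetical.

```java
import java.util.Arrays;

/** Nearest-rank percentile over raw latency samples (illustrative, not OAP code). */
final class LatencyStats {

    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Multiply before dividing to avoid floating-point drift pushing the rank up by one.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean, or NaN when there are no samples. */
    static double average(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 11, 14, 13, 12, 16, 15, 14, 900};
        System.out.println("avg=" + average(samples)
                + " p50=" + percentile(samples, 50)
                + " p95=" + percentile(samples, 95));
    }
}
```

In the sample data a single 900 ms call dominates the average while the median stays near 13 ms, which is the pattern the P95 and P99 columns are meant to expose.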
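3. Bottleneck analysis
Combining the topology map with these metrics makes it quick to locate:
- Slow services
- Services with high error rates
- Resource bottlenecks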
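Best practices
1. Agent configuration
Each value in agent.config can fall back to a default with the ${ENV_NAME:default} syntax, so the same file works across environments.
# agent.config
# Service name
agent.service_name=${SW_AGENT_NAME:demo-service}
# Namespace
agent.namespace=${SW_AGENT_NAMESPACE:default}
# Maximum traces sampled per 3 seconds. A non-positive value keeps every trace;
# set a positive limit in production to reduce overhead.
agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-1}
# Request suffixes to ignore
agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.css,.js,.html,.png}
# OAP address
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:localhost:11800}
# Log level
logging.level=${SW_AGENT_LOGGING_LEVEL:INFO}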
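The ${NAME:default} placeholders in agent.config mean "use the environment variable NAME if it is set, otherwise the text after the first colon". The sketch below shows that resolution rule in isolation; it is a simplified illustration, the class name PlaceholderSketch is hypothetical, and the agent's real resolver may consult additional sources such as system properties.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Resolves ${NAME:default} placeholders against a map of variables (illustrative only). */
final class PlaceholderSketch {

    // Group 1: variable name up to the first ':'. Group 2: default, which may itself contain ':'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = env.getOrDefault(m.group(1), m.group(2));
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "${SW_AGENT_COLLECTOR_BACKEND_SERVICES:localhost:11800}";
        System.out.println(resolve(line, System.getenv()));
    }
}
```

Because only the first colon separates the name from the default, a default such as localhost:11800 or -1 is taken literally.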
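2. Performance tuning
# Production settings (agent.config)
# Limit sampling to 10 traces per 3 seconds
agent.sample_n_per_3_secs=10
# Disable plugins you do not need. Comma-separated plugin names; take the exact
# names from the agent's plugin list, the two below are examples.
plugin.exclude_plugins=spring-cloud-gateway-3.x,grpc-1.x
# The agent already reports asynchronously through an in-memory buffer;
# size the buffer for your traffic.
buffer.channel_size=5
buffer.buffer_size=1000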
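The sample_n_per_3_secs setting caps how many traces are kept in each 3-second window, and a non-positive value turns the cap off. A minimal fixed-window sketch of that behavior follows. It is an illustration of the idea rather than the agent's implementation; the class name ThreeSecondSampler and the injected clock are assumptions made so the logic can be exercised without waiting.

```java
import java.util.function.LongSupplier;

/** Keeps at most N traces per 3-second window (illustrative, not agent code). */
final class ThreeSecondSampler {
    private static final long WINDOW_MILLIS = 3000;

    private final int maxPerWindow;
    private final LongSupplier clockMillis;
    private long windowStart;
    private int taken;

    ThreeSecondSampler(int maxPerWindow, LongSupplier clockMillis) {
        this.maxPerWindow = maxPerWindow;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    /** Returns true if the current trace should be kept. */
    synchronized boolean trySample() {
        if (maxPerWindow <= 0) {
            return true;                       // non-positive limit: sampling is off, keep everything
        }
        long now = clockMillis.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {
            windowStart = now;                 // start a new window and reset the budget
            taken = 0;
        }
        if (taken < maxPerWindow) {
            taken++;
            return true;
        }
        return false;                          // budget for this window is used up
    }

    public static void main(String[] args) {
        ThreeSecondSampler sampler = new ThreeSecondSampler(10, System::currentTimeMillis);
        int kept = 0;
        for (int i = 0; i < 1000; i++) {
            if (sampler.trySample()) {
                kept++;
            }
        }
        System.out.println("kept " + kept + " of 1000 traces");
    }
}
```

A cap like this bounds tracing overhead under load but also means rare failures can go unsampled, so the limit is a trade-off between cost and coverage.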
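3. Storage tuning
# OAP config/application.yml (relevant keys only)
core:
  default:
    # Days to keep trace and log records
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:7}
    # Days to keep metrics
    metricsDataTTL: ${SW_CORE_METRICS_DATA_TTL:7}
storage:
  elasticsearch:
    # Shards per index
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:3}
    # Replicas per index
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:1}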
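4. Isolating environments
Use a separate namespace and service name per environment so that traces from different environments never mix in the topology.
# Development (agent.config)
agent.namespace=dev
agent.service_name=demo-service-dev
# Test (agent.config)
agent.namespace=test
agent.service_name=demo-service-test
# Production (agent.config)
agent.namespace=prod
agent.service_name=demo-service-prod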
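Summary
Key points for tracing with SkyWalking:
- ✅ Non-intrusive probe - the Java agent traces automatically
- ✅ Distributed tracing - end-to-end call chains
- ✅ Performance monitoring - metrics, slow queries, JVM
- ✅ Alerting - rules and notifications
- ✅ Service dependencies - topology map, bottleneck analysis
- ✅ Best practices - configuration tuning, environment isolation
SkyWalking is a strong fit for monitoring Spring Boot applications.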