GuntherX's


[Close Reading] Characterizing Microservice Dependency and Performance: Alibaba Microservice Trace Analysis

Apr 5, 2023 - #Research

Metadata

Abstract

Loosely-coupled and light-weight microservices running in containers are replacing monolithic applications gradually. Understanding the characteristics of microservices is critical to make good use of microservice architectures. However, there is no comprehensive study about microservice and its related systems in production environments so far. In this paper, we present a solid analysis of large-scale deployments of microservices at Alibaba clusters. Our study focuses on the characterization of microservice dependency as well as its runtime performance. We conduct an in-depth anatomy of microservice call graphs to quantify the difference between them and traditional DAGs of data-parallel jobs. In particular, we observe that microservice call graphs are heavy-tail distributed and their topology is similar to a tree and moreover, many microservices are hot-spots. We reveal three types of meaningful call dependency that can be utilized to optimize microservice designs. Our investigation on microservice runtime performance indicates most microservices are much more sensitive to CPU interference than memory interference. To synthesize more representative microservice traces, we build a mathematical model to simulate call graphs. Experimental results demonstrate our model can well preserve those graph properties observed from Alibaba traces.

Annotations


Main Content

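2. Microservice architecture

2.1 Call graphs

Microservices can be divided into two categories: stateless and stateful.

Message queues (MQ) usually split a microservice graph into several parts, and each part can in turn be broken down into a number of two-tier invocations. In a two-tier invocation the caller is the Upstream Microservice (UM) and the callee is the Downstream Microservice (DM).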
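As a minimal sketch of this decomposition, the snippet below groups the calls of one call graph by caller, so that each UM together with the DMs it calls forms one two-tier invocation. The `(um, dm)` tuple representation and the name `two_tier_invocations` are assumptions made for illustration; they are not the trace format used in the paper.

```python
from collections import defaultdict


def two_tier_invocations(calls):
    """Group (um, dm) call records by their caller.

    `calls` is a hypothetical list of (upstream, downstream) pairs taken
    from a single call graph. Each key of the result is a UM; the value
    lists the DMs it called, in order, with repeated calls kept.
    """
    grouped = defaultdict(list)
    for um, dm in calls:
        grouped[um].append(dm)
    return dict(grouped)


# Toy call graph: A calls B once and C twice, B calls D.
calls = [("A", "B"), ("A", "C"), ("B", "D"), ("A", "C")]
invocations = two_tier_invocations(calls)
```

Repeated calls are kept on purpose, because the same DM can be called several times by one UM within a single request.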

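2.2 Graph learning algorithm (graph clustering)

[[InfoGraph]] is an unsupervised learning method. It feeds node information (microservice type) and edge features (communication framework) into a deep neural network that is trained to maximize mutual information over the training set, which yields an embedding vector for each graph. The embeddings are then clustered with K-means, and the Silhouette score is used to judge how good a given k is.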
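The clustering step can be sketched in plain Python as follows. This is only an illustration of K-means plus Silhouette-based selection of k; the embeddings are made-up 2-D stand-ins, not InfoGraph outputs, and the helper names are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points (singleton clusters score 0)."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance inside own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up graph embeddings forming two well-separated groups.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = kmeans(embeddings, k=2)
score = silhouette_score(embeddings, labels)
best_k = max(range(2, 4),
             key=lambda k: silhouette_score(embeddings, kmeans(embeddings, k)))
```

A Silhouette score close to 1 means points sit much closer to their own cluster than to the nearest other one, so the k with the highest score is preferred.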

3. Anatomy of call graphs

3.1 The size of microservice call graphs follows a heavy-tailed distribution (e.g. a Burr distribution)

image.png
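To make "heavy-tailed" concrete, here is a small sketch of the Burr Type XII distribution with unit scale, whose CDF is \(F(x) = 1 - (1 + x^c)^{-k}\), together with inverse-transform sampling. The shape parameters `c` and `k` used below are arbitrary illustrative values, not the ones fitted to the Alibaba traces.

```python
import random


def burr_cdf(x, c, k):
    """CDF of the Burr Type XII distribution with unit scale."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x ** c) ** (-k)


def burr_quantile(u, c, k):
    """Inverse CDF, valid for 0 <= u < 1."""
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)


def burr_sample(c, k, rng):
    """Draw one value by inverse-transform sampling."""
    return burr_quantile(rng.random(), c, k)


# Illustrative parameters only (assumed, not taken from the paper).
rng = random.Random(42)
sizes = sorted(burr_sample(c=2.0, k=1.0, rng=rng) for _ in range(1000))
median, largest = sizes[len(sizes) // 2], sizes[-1]
```

In a heavy-tailed sample like this one, the largest values are far above the median, which matches the observation that a small share of call graphs are very large.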

3.1 The structure of a microservice call graph resembles a tree, and many graphs contain a long call chain

image.png
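One way to check how tree-like a call graph is can be sketched as below: a graph rooted at the entering microservice is a tree if the root has no caller, every other node has exactly one caller, and every node is reachable from the root. The edge-list format and the name `tree_stats` are assumptions for illustration, and edges are assumed to be deduplicated.

```python
from collections import defaultdict, deque


def tree_stats(edges, root):
    """Return (is_tree, depth) for a call graph given as (um, dm) edges.

    Depth is the BFS distance in hops from the root to the farthest
    node; for a tree this is the length of the longest call chain.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {root}
    for um, dm in edges:
        children[um].append(dm)
        indegree[dm] += 1
        nodes.update((um, dm))

    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # visited set also guards against cycles
                depth[child] = depth[node] + 1
                queue.append(child)

    is_tree = (
        indegree[root] == 0
        and all(indegree[n] == 1 for n in nodes if n != root)
        and len(depth) == len(nodes)
    )
    return is_tree, max(depth.values())


chain = [("entry", "a"), ("entry", "b"), ("a", "c"), ("c", "d")]
tree_result = tree_stats(chain, "entry")
shared_result = tree_stats(chain + [("b", "c")], "entry")
```

In the second call, `c` is called by both `a` and `b`, so the graph is no longer a tree even though its depth is unchanged.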

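3.2 Microservice call graphs are highly dynamic

Once a user request reaches the Entering Microservice, the subsequent call path can become very complex, because it also depends on the user's state. For example, when ranking products for a user, whether the user holds a coupon changes the call path taken by the ranking logic. This further underlines the need for graph-based prediction tasks in microservice systems.

3.3 Anatomy

The calls made by stateless microservices fall into three types (Fig 8-a):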

image.png

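As call graphs get deeper, the IP of stateless microservices accessing Memcached (S2M) drops linearly. Requests with deep call graphs have usually hit a Memcached cache miss and therefore go to the database for the data, so IP(S2D) trends upward. The gap between the rise in S2D and the drop in S2M is filled by MQ.

4. Dependency analysis among stateless microservices

4.1 Cyclic dependency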

image.png

4.2 Coupled dependency: a dependency with a high call probability and a large number of calls

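For a two-tier invocation \(Y \to X\), define two metrics:

\[ \begin{aligned} \text{Call Probability}(Y2X) &= N / Sum \\ \text{Call Time}(Y2X) &= Count(X) / N \end{aligned} \]

Here \(Count(X)\) is the number of times UM Y calls DM X (note that Y may call the same DM several times within one request), \(Sum\) is the total number of two-tier invocations initiated by Y across all call graphs, and \(N\) is the number of those \(Sum\) two-tier invocations in which DM X is called. Call Probability is therefore the fraction of Y's two-tier invocations that involve X, and Call Time is the average number of calls to X within the invocations that involve it.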
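The two metrics can be computed with a few lines of Python. The sketch takes Call Probability to be \(N / Sum\), the fraction of Y's two-tier invocations in which X appears, so that it stays within [0, 1]; this reading, the input format (one list of called DMs per two-tier invocation of Y), and the name `coupling_metrics` are assumptions made for illustration.

```python
def coupling_metrics(invocations, x):
    """Return (call_probability, call_time) for DM `x` under one UM Y.

    `invocations` is a hypothetical list with one entry per two-tier
    invocation initiated by Y; each entry lists the DMs Y called there.
    """
    total = len(invocations)                             # Sum
    n = sum(1 for dms in invocations if x in dms)        # N
    count_x = sum(dms.count(x) for dms in invocations)   # Count(X)
    if total == 0 or n == 0:
        return 0.0, 0.0
    return n / total, count_x / n


# Y issues four two-tier invocations; X appears in three of them, six times in total.
invocations = [["X", "X", "Z"], ["Z"], ["X"], ["X", "X", "X"]]
probability, call_time = coupling_metrics(invocations, "X")
```

Here X is called in 3 of 4 invocations (probability 0.75) and 6 times across those 3 (call time 2.0), so the product is 1.5 calls to X per two-tier invocation of Y.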

image.png

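Fig 10 shows that for more than 10% of microservice pairs the product of Call Time and Call Probability exceeds 5, which means a considerable share of microservice pairs are strongly coupled.

The authors also find that in 17% of the strongly coupled pairs the DM is not shared with any other microservice (its in-degree is 1). Such a pair can be co-located to reduce the overhead of remote calls.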
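A rough sketch of how such co-location candidates could be picked out is shown below. It keeps the pairs whose Call Probability times Call Time exceeds a threshold and whose DM has exactly one caller. The input dictionary, the function name, and the default threshold of 5 (borrowed from the Fig 10 observation) are illustrative assumptions, not the paper's procedure.

```python
from collections import Counter


def colocation_candidates(pair_metrics, threshold=5.0):
    """Select strongly coupled (um, dm) pairs whose DM has in-degree 1.

    `pair_metrics` maps a hypothetical (um, dm) pair to its
    (call_probability, call_time) tuple.
    """
    # In-degree of a DM = number of distinct UMs that call it.
    indegree = Counter(dm for _, dm in pair_metrics)
    return sorted(
        (um, dm)
        for (um, dm), (prob, time) in pair_metrics.items()
        if prob * time > threshold and indegree[dm] == 1
    )


pair_metrics = {
    ("Y", "X"): (0.9, 8.0),  # strongly coupled, X has a single caller
    ("Y", "Z"): (0.9, 8.0),  # strongly coupled, but Z is shared with W
    ("W", "Z"): (0.2, 1.0),
    ("V", "U"): (0.5, 2.0),  # single caller, but weakly coupled
}
candidates = colocation_candidates(pair_metrics)
```

Only the pair (Y, X) survives both filters in this toy input.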

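5. Microservice runtime performance

5.1 Microservice call rate

5.2 Microservice response time (RT) performance

6. A probabilistic model for generating microservice graphs

The paper also proposes a probabilistic model that generates microservice graphs and call traces, which makes it possible to simulate production-scale microservice architectures and user requests more faithfully.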
