Continuous profiling enabled by Pyroscope in Vald
Profiling is a way of measuring and showing how an application uses resources such as CPU and memory.
Continuous profiling is the process of collecting application performance data continuously, which helps developers analyze performance more deeply.
Demand for continuous profiling has been growing, and Vald has supported continuous profiling with Pyroscope since v1.4.
This post explains why continuous profiling is needed and then briefly introduces how Pyroscope is used in Vald.
Why do we need continuous profiling?
When performance problems such as high latency, memory leaks, or excessive CPU usage occur in an application, the developer needs to investigate the bottlenecks and identify their root causes. Such situations have become more common as applications grow more complex.
Usually, developers start profiling only after such a problem has happened. However, because the same problem is often hard to reproduce, finding and fixing the root cause takes a long time and exhausts the developers. The Vald team has run into this situation several times as well.
Continuous profiling is not a silver bullet, but it helps. It stores performance data from the OS layer up to the application layer, visualizes the profile data, and lets developers compare performance before and after a problem occurs.
Difference from metrics
Profiling and metrics sound similar, but there is a clear difference between the two.
Profiling deals only with predefined parameters (CPU, RAM, threads, etc.), takes a long time to sample (at least 10 minutes), and visualizes correlated stack traces. Metrics, on the other hand, deal with user-defined parameters, take less time to collect than profiling, and are visualized as time series.
What is Pyroscope?
Pyroscope is a continuous profiling tool.
It supports many languages and platforms, e.g., Go, Python, Java, PHP, and eBPF.
You can try it in the Pyroscope Live Demo.
In this post, we focus on Go. Profiling an application implemented in Go requires the application itself and a Pyroscope server.
There are two types of agents for using Pyroscope: Push and Pull. With the Push agent, the application sends pprof data to the Pyroscope server. With the Pull agent, the Pyroscope server scrapes pprof data from the application.
*NOTE: pprof is a tool for visualizing and analyzing profiling data.
How Vald uses Pyroscope
Vald uses the Pull agent for profiling because we do not want the application code to depend on Pyroscope. This way, we can enable profiling just by editing Kubernetes manifests.
In addition to profiling the application, Vald also uses eBPF to profile at the Linux kernel level.
The architecture is as follows:
+-----------------+  pull pprof
| vald components | <-----+
+-----------------+       |     +------------------+
                          +-----+ Pyroscope server |
+-----------------+       |     +------------------+
|   eBPF-Agent    | <-----+
+-----------------+  pull eBPF
We can easily set up a continuous profiling system by:
- Enabling pprof for each component in valdrelease.yaml.
- Deploying the eBPF agent on each Kubernetes node.
- Deploying the Pyroscope server.
A tutorial using k3d
Deploy
Here are the steps for deploying a Vald cluster with Pyroscope on k3d. Please use k3d v5.3.0 or later, because eBPF is unavailable in earlier versions.
- Clone the Vald repository
git clone https://github.com/vdaas/vald.git && cd vald
- Create a k3d cluster
k3d cluster create -v "/lib/modules:/lib/modules" --host-pid-mode=true --agents=3
- Deploy vald-helm-operator
When deploying a Vald cluster in your local environment, set rbac.create to true.
helm install --values ./charts/vald-helm-operator/values.yaml vald-helm-operator vald/vald-helm-operator --set rbac.create=true
- Deploy vald-release
helm install vald vald/vald --values example/helm/values-with-pyroscope.yaml
When each Vald component runs with pprof enabled, annotations like the following are added to its pod. The Pyroscope server uses these annotations to decide which pods to scrape.
pyroscope.io/application-name: vald-agent-ngt
pyroscope.io/port: "6060"
pyroscope.io/profile-cpu-enabled: "true"
pyroscope.io/profile-mem-enabled: "true"
pyroscope.io/scrape: "true"
- Deploy the Pyroscope server and eBPF agent
make k8s/metrics/pyroscope/deploy
You can find the manifests on the page below.
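The pod annotations shown earlier act as an opt-in rule for the Pyroscope server. The Go sketch below models one plausible reading of that rule. A pod is scraped only when pyroscope.io/scrape is "true", its target address is built from the pod IP and pyroscope.io/port, and the two profile-*-enabled flags select which profiles to collect. The Pod and Target types, the scrapeTarget function, and the example IP are hypothetical; this is not Pyroscope's actual service-discovery code.

```go
package main

import (
	"fmt"
	"net"
)

// Pod is a hypothetical, minimal stand-in for a Kubernetes pod.
type Pod struct {
	IP          string
	Annotations map[string]string
}

// Target describes what a scraper would fetch from one pod.
type Target struct {
	App      string
	Address  string   // host:port of the pprof endpoint
	Profiles []string // which profile types to collect
}

// scrapeTarget returns the target for a pod and whether it should be
// scraped, based on the pyroscope.io/* annotations.
func scrapeTarget(p Pod) (Target, bool) {
	a := p.Annotations
	if a["pyroscope.io/scrape"] != "true" {
		return Target{}, false // pods without the opt-in annotation are ignored
	}
	port, ok := a["pyroscope.io/port"]
	if !ok || p.IP == "" {
		return Target{}, false // not enough information to build an address
	}
	t := Target{
		App:     a["pyroscope.io/application-name"],
		Address: net.JoinHostPort(p.IP, port),
	}
	if a["pyroscope.io/profile-cpu-enabled"] == "true" {
		t.Profiles = append(t.Profiles, "cpu")
	}
	if a["pyroscope.io/profile-mem-enabled"] == "true" {
		t.Profiles = append(t.Profiles, "mem")
	}
	return t, true
}

func main() {
	pod := Pod{
		IP: "10.42.0.15", // hypothetical pod IP
		Annotations: map[string]string{
			"pyroscope.io/application-name":    "vald-agent-ngt",
			"pyroscope.io/port":                "6060",
			"pyroscope.io/profile-cpu-enabled": "true",
			"pyroscope.io/profile-mem-enabled": "true",
			"pyroscope.io/scrape":              "true",
		},
	}
	if t, ok := scrapeTarget(pod); ok {
		fmt.Printf("%s -> %s %v\n", t.App, t.Address, t.Profiles)
	}
}
```

Because selection is driven entirely by annotations, enabling or disabling profiling for a component is a manifest change, with no change to the application code.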
Pyroscope UI
Once the deployment succeeds, we can access the Pyroscope UI in a browser.
There are three types of views: Single view, Comparison view, and Diff view. You can choose a view depending on your needs. Each view provides a table view and a Flamegraph.
The above image is an example of a Flamegraph. The horizontal axis represents the CPU time spent in function calls, and the vertical axis represents the call stack. The wider a bar is, the longer that function (including the functions it calls) used the CPU.
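The Go sketch below folds sampled call stacks into a tree, as a Flamegraph does, to show the width rule concretely. Each node's Total is the number of samples whose stack passes through that frame, and that count is the bar width. A parent is therefore always at least as wide as its children. The sample stacks and function names are made up for illustration, and this is a simplified model rather than Pyroscope's internal data structure.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Node is one frame in a flame graph. Total is the number of samples
// whose stack passes through this frame, i.e. the bar width.
type Node struct {
	Name     string
	Total    int
	Children map[string]*Node
}

func newNode(name string) *Node {
	return &Node{Name: name, Children: map[string]*Node{}}
}

// buildFlameTree folds stacks (root first, frames joined by ";") into a
// tree, adding each stack's sample count to every frame on its path.
func buildFlameTree(samples map[string]int) *Node {
	root := newNode("total")
	for stack, count := range samples {
		root.Total += count
		cur := root
		for _, frame := range strings.Split(stack, ";") {
			child, ok := cur.Children[frame]
			if !ok {
				child = newNode(frame)
				cur.Children[frame] = child
			}
			child.Total += count
			cur = child
		}
	}
	return root
}

// printTree indents by depth (the vertical axis) and prints the sample
// count (the horizontal width) of each frame.
func printTree(n *Node, depth int) {
	fmt.Printf("%s%s %d\n", strings.Repeat("  ", depth), n.Name, n.Total)
	names := make([]string, 0, len(n.Children))
	for name := range n.Children {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order
	for _, name := range names {
		printTree(n.Children[name], depth+1)
	}
}

func main() {
	// Hypothetical CPU samples: folded stack -> number of samples.
	samples := map[string]int{
		"main;search;distance": 70,
		"main;search;sort":     20,
		"main;insert":          10,
	}
	printTree(buildFlameTree(samples), 0)
}
```

In this example, search (90 samples) ends up nine times wider than insert (10 samples), even though search does little work itself. Most of its width comes from its callee distance.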
Single view
Single view shows profiling data for a selected time range. It is the primary view for investigating your applications.
The table view on the left is sortable and is used to check which functions use the most resources. The view on the right is the Flamegraph described above. Above the table view is a search box that lets you search for specific functions in the profile.
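As a hedged sketch of what such a "top" table can contain, the Go code below aggregates folded stacks per function. For each function it computes self samples, where the function is the leaf frame and is running its own code. It also computes total samples, where the function appears anywhere in the stack, including time spent in callees. It then sorts by self. The Row type, the topTable function, and the sort order are assumptions for illustration; Pyroscope's table may differ in detail.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Row is one line of a hypothetical "top" table.
type Row struct {
	Name  string
	Self  int // samples where the function is the leaf frame
	Total int // samples where the function appears anywhere in the stack
}

// topTable aggregates folded stacks ("root;...;leaf" -> samples) per
// function and returns rows sorted by Self, then Total, then Name.
func topTable(samples map[string]int) []Row {
	rows := map[string]*Row{}
	get := func(name string) *Row {
		r, ok := rows[name]
		if !ok {
			r = &Row{Name: name}
			rows[name] = r
		}
		return r
	}
	for stack, count := range samples {
		frames := strings.Split(stack, ";")
		seen := map[string]bool{} // count recursive frames only once toward Total
		for _, f := range frames {
			if !seen[f] {
				get(f).Total += count
				seen[f] = true
			}
		}
		get(frames[len(frames)-1]).Self += count
	}
	out := make([]Row, 0, len(rows))
	for _, r := range rows {
		out = append(out, *r)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Self != out[j].Self {
			return out[i].Self > out[j].Self
		}
		if out[i].Total != out[j].Total {
			return out[i].Total > out[j].Total
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	// Hypothetical CPU samples: folded stack -> number of samples.
	samples := map[string]int{
		"main;search;distance": 70,
		"main;search;sort":     20,
		"main;insert":          10,
	}
	fmt.Printf("%-10s %6s %6s\n", "function", "self", "total")
	for _, r := range topTable(samples) {
		fmt.Printf("%-10s %6d %6d\n", r.Name, r.Self, r.Total)
	}
}
```

With these samples, distance tops the table. main and search have zero self samples but large totals, because they spend their time only in the functions they call.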
Comparison view
Comparison view shows profiling data for two specified time periods, which helps you compare them. For example, when releasing a new version, you can compare how CPU and memory usage changed before and after the deployment.
Diff view
Diff view shows the difference between the profiling data of two specified time periods, using the older one as the baseline. Its bars are shown in three colors: gray, red, and green.
Gray bars represent functions whose performance is the same before and after, so you usually do not need to focus on them. Red indicates how much usage has increased compared with the old profile, and green, the opposite of red, indicates how much it has decreased. Like Comparison view, this is useful for comparing old and new profiling data of an application, but it is more quantitative because it shows the increase or decrease as a percentage.
Conclusion
This post explained why Vald adopted continuous profiling with Pyroscope and how Pyroscope is used in a Vald cluster. After we have used it for a while and confirmed its effectiveness, we will write about it. Why not start continuous profiling for your application?