Continuous profiling enabled by Pyroscope in Vald

vald.vdaas.org
Published in ITNEXT
6 min read, Mar 25, 2022

Profiling is a way of showing how an application uses resources.

Continuous profiling is the process of continuously collecting application performance data, which helps developers analyze performance more deeply.

Demand for continuous profiling is growing. Vald has supported continuous profiling with Pyroscope since v1.4.

This post explains why continuous profiling is needed and then briefly introduces how Pyroscope is used in Vald.

Why do we need continuous profiling?

When performance problems such as high latency, memory leaks, or excessive CPU usage occur in an application, the developer needs to investigate the bottlenecks and identify their root causes. In recent years, this situation has become more common as applications grow more complex.

Usually, the developer starts profiling only after the problem has happened. However, because it is difficult to reproduce the same problem, finding the root cause takes a long time and exhausts the developers. Needless to say, the Vald team has run into such cases too.

Continuous profiling is not a silver bullet, but it helps. It stores performance data from the OS layer to the application layer, visualizes the profile data, and lets developers compare performance before and after a problem occurs.

Difference with metrics

Profiling sounds similar to metrics, but there is a clear difference between the two.

Profiling deals only with predefined parameters (CPU, RAM, threads, etc.), takes a long time to sample (at least 10 minutes), and visualizes the correlated stack traces. Metrics, on the other hand, deal with user-defined parameters, take less time to collect than ordinary profiling, and are visualized as time series.
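To make the distinction concrete, here is a minimal Go sketch. It lists the profiles the Go runtime predefines and contrasts them with a counter the application defines itself. The `searchRequests` counter and the `handleSearch` function are hypothetical names used only for illustration; they are not part of Vald.

```go
package main

import (
	"expvar"
	"fmt"
	"runtime/pprof"
	"sort"
)

// predefinedProfiles returns the names of the profiles the Go runtime
// already knows how to collect (heap, goroutine, and so on).
// The CPU profile is started separately with pprof.StartCPUProfile,
// so it does not appear in this list.
func predefinedProfiles() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	sort.Strings(names)
	return names
}

// searchRequests is a hypothetical user-defined metric: the application
// decides what it means and when it changes.
var searchRequests expvar.Int

// handleSearch is a hypothetical request handler that bumps the metric.
func handleSearch() {
	searchRequests.Add(1)
}

func main() {
	fmt.Println("predefined profiles:", predefinedProfiles())

	handleSearch()
	handleSearch()
	fmt.Println("user-defined metric, search requests:", searchRequests.Value())
}
```

The profile names come from the runtime, while the counter only exists because the application author chose to add it.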

This post focuses on Go. Profiling a Go application with Pyroscope requires two things: the Go application itself and a Pyroscope server.

Pyroscope offers two types of agents, called Push and Pull. With the Push agent, the application sends pprof data to the Pyroscope server. With the Pull agent, the Pyroscope server scrapes pprof data from the application.

*NOTE: pprof is a tool for visualization and analysis of profiling data.

The way of Vald with Pyroscope

Vald uses the Pull agent for profiling because it adds no Pyroscope dependency to the application code. We can enable profiling just by editing the Kubernetes manifests.

In addition to profiling the application, Vald also uses eBPF to profile at the Linux kernel level.

The architecture is as follows.

+-----------------+  pull pprof
| vald components | <----+
+-----------------+      |     +------------------+
                         +-----+ Pyroscope server |
+-----------------+      |     +------------------+
|   eBPF-Agent    | <----+
+-----------------+  pull eBPF

We can set up a continuous profiling system in three steps:

  • Enable pprof for each component in valdrelease.yaml.
  • Deploy the eBPF agent on each Kubernetes node.
  • Deploy the Pyroscope server.

A tutorial using k3d

Deploy

Here are the steps for deploying a Vald cluster on k3d with Pyroscope. Please use k3d v5.3.0 or later, because eBPF is unavailable in earlier versions.

  • Clone the Vald repository
git clone https://github.com/vdaas/vald.git && cd vald
  • Create k3d cluster
k3d cluster create -v "/lib/modules:/lib/modules" --host-pid-mode=true --agents=3
  • Deploy vald-helm-operator

To deploy a Vald cluster in your local environment, set rbac.create to true.

helm install --values ./charts/vald-helm-operator/values.yaml vald-helm-operator vald/vald-helm-operator --set rbac.create=true
  • Deploy vald-release
helm install vald vald/vald --values example/helm/values-with-pyroscope.yaml

When a Vald component pod runs with pprof enabled, annotations like the ones below are added to the pod. The Pyroscope server decides which pods to scrape based on this information.

pyroscope.io/application-name: vald-agent-ngt
pyroscope.io/port: "6060"
pyroscope.io/profile-cpu-enabled: "true"
pyroscope.io/profile-mem-enabled: "true"
pyroscope.io/scrape: "true"
  • Deploy Pyroscope server and eBPF agent
make k8s/metrics/pyroscope/deploy

You can get the manifest from the below page.
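To illustrate how annotations such as `pyroscope.io/scrape` and `pyroscope.io/port` can drive the scrape decision, here is a minimal Go sketch. It is not Pyroscope's actual discovery code; the `scrapeTarget` type, the `targetFor` function, and the pod IP are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

const annotationPrefix = "pyroscope.io/"

// scrapeTarget is a hypothetical description of one pod to scrape.
type scrapeTarget struct {
	Application string
	Address     string
	CPU         bool
	Memory      bool
}

// isTrue reports whether an annotation value parses as boolean true.
func isTrue(v string) bool {
	b, err := strconv.ParseBool(v)
	return err == nil && b
}

// targetFor turns pod annotations into a scrape target. It returns false
// when the pod has not opted in or does not declare a port.
func targetFor(podIP string, annotations map[string]string) (scrapeTarget, bool) {
	if !isTrue(annotations[annotationPrefix+"scrape"]) {
		return scrapeTarget{}, false
	}
	port := annotations[annotationPrefix+"port"]
	if port == "" {
		return scrapeTarget{}, false
	}
	return scrapeTarget{
		Application: annotations[annotationPrefix+"application-name"],
		Address:     net.JoinHostPort(podIP, port),
		CPU:         isTrue(annotations[annotationPrefix+"profile-cpu-enabled"]),
		Memory:      isTrue(annotations[annotationPrefix+"profile-mem-enabled"]),
	}, true
}

func main() {
	annotations := map[string]string{
		"pyroscope.io/application-name":    "vald-agent-ngt",
		"pyroscope.io/port":                "6060",
		"pyroscope.io/profile-cpu-enabled": "true",
		"pyroscope.io/profile-mem-enabled": "true",
		"pyroscope.io/scrape":              "true",
	}
	// 10.42.0.7 is a placeholder pod IP.
	if target, ok := targetFor("10.42.0.7", annotations); ok {
		fmt.Printf("scrape %s at %s (cpu=%t, mem=%t)\n",
			target.Application, target.Address, target.CPU, target.Memory)
	}
}
```

A pod without the scrape annotation, or with it set to "false", is simply skipped, which is what lets profiling be switched on and off from the manifest alone.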

Pyroscope UI

Once the deployment succeeds, we can access the Pyroscope UI in a browser.

There are three types of view: Single view, Comparison view, and Diff view. You can select the view that suits your needs. Each view provides a table view and a Flamegraph.

Example Flamegraph (vald-agent-ngt CPU usage)

The above image is an example of a Flamegraph. The horizontal axis represents the CPU time consumed by each function call, and the vertical axis is the call stack. A wide bar means that the function used the CPU for a long time.
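The relationship between samples and bar width can be sketched in a few lines of Go. The example below is a simplification: it aggregates by function name, whereas a real Flamegraph keeps one bar per position in the call tree. The function names and sample counts are invented for illustration and are not taken from a Vald profile.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one observed call stack and how many times it was seen.
type sample struct {
	Stack []string // root first, leaf last
	Count int
}

// widths returns, for every function, the share of all samples in which
// the function appears somewhere on the stack.
func widths(samples []sample) map[string]float64 {
	total := 0
	inclusive := map[string]int{}
	for _, s := range samples {
		total += s.Count
		seen := map[string]bool{} // count recursive frames once per sample
		for _, fn := range s.Stack {
			if !seen[fn] {
				seen[fn] = true
				inclusive[fn] += s.Count
			}
		}
	}
	out := make(map[string]float64, len(inclusive))
	if total == 0 {
		return out
	}
	for fn, n := range inclusive {
		out[fn] = float64(n) / float64(total)
	}
	return out
}

func main() {
	// Hypothetical samples.
	samples := []sample{
		{Stack: []string{"main", "search", "distance"}, Count: 60},
		{Stack: []string{"main", "search"}, Count: 15},
		{Stack: []string{"main", "insert"}, Count: 25},
	}
	w := widths(samples)
	names := make([]string, 0, len(w))
	for fn := range w {
		names = append(names, fn)
	}
	sort.Strings(names)
	for _, fn := range names {
		fmt.Printf("%-8s %.0f%% of the graph width\n", fn, w[fn]*100)
	}
}
```

A function that is on the stack in most samples gets a wide bar, which is why wide bars are the first place to look for CPU hot spots.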

Single view

Single view shows the profiling data for a selected time range. It is the primary view for investigating your applications.
The table view on the left is sortable and is used to check the top consumers. The view on the right is the Flamegraph described above. Above the table view is a search box that lets you search for specific entries in the profiling data.

Example single view (vald-agent-ngt.cpu)

Comparison view

The comparison view shows profiling data for two specified time periods side by side, which makes it easy to compare them. For example, when a new version is released, you can compare how CPU and memory usage changed before and after the deployment.

Example comparison view (vald-agent-ngt.cpu)

Diff view

The diff view shows how the profiling data differs between two specified time periods, using the older period as the baseline. The bars are drawn in three colors: gray, red, and green.
The gray bars represent functions whose usage is the same before and after, so you do not need to focus on them. Red indicates how much usage has increased compared to the older profile, while green is the opposite and indicates how much it has decreased. Like the comparison view, this is useful for comparing old and new profiling data, but it is more quantitative: it lets you check the increase or decrease as a percentage.
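The coloring rule can be sketched as follows. This is only an illustration of the idea: the formula Pyroscope actually uses may differ, and the `diff` function, the tolerance parameter, and the numbers are assumptions made for this example. Each input maps a function name to its share of samples in percent, and the change is expressed in percentage points.

```go
package main

import (
	"fmt"
	"sort"
)

// change describes how one function's share moved between two periods.
type change struct {
	Function string
	Delta    float64 // percentage points, after minus before
	Color    string
}

// classify maps a delta to the color described for the diff view.
func classify(delta, tolerance float64) string {
	switch {
	case delta > tolerance:
		return "red" // usage increased
	case delta < -tolerance:
		return "green" // usage decreased
	default:
		return "gray" // effectively unchanged
	}
}

// diff compares two profiles, using "before" as the baseline.
// Functions missing from one side are treated as having a share of 0.
func diff(before, after map[string]float64, tolerance float64) []change {
	names := map[string]bool{}
	for fn := range before {
		names[fn] = true
	}
	for fn := range after {
		names[fn] = true
	}
	out := make([]change, 0, len(names))
	for fn := range names {
		d := after[fn] - before[fn]
		out = append(out, change{Function: fn, Delta: d, Color: classify(d, tolerance)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Function < out[j].Function })
	return out
}

func main() {
	// Hypothetical shares of CPU samples, in percent.
	before := map[string]float64{"distance": 60, "insert": 25, "search": 15}
	after := map[string]float64{"distance": 45, "insert": 25, "search": 30}
	for _, c := range diff(before, after, 0.5) {
		fmt.Printf("%-8s %+6.1f points  %s\n", c.Function, c.Delta, c.Color)
	}
}
```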

Example diff view (vald-agent-ngt.cpu)

Conclusion

This post explained why Vald adopted continuous profiling with Pyroscope and how Pyroscope runs in a Vald cluster. Once we have used it enough to evaluate its effectiveness, we will write about the results. How about starting continuous profiling for your application?

A highly scalable distributed fast approximate nearest neighbor dense vector search engine.