Continuous profiling enabled by Pyroscope in Vald

vald.vdaas.org

Published in

ITNEXT

6 min read · Mar 25, 2022

Profiling is a way of measuring and visualizing an application's resource usage.

Continuous profiling is the process of collecting application performance data continuously. It helps developers analyze performance more deeply.

Demand for continuous profiling is increasing. Vald has also supported continuous profiling with Pyroscope since v1.4.

This post briefly explains why continuous profiling is needed and how Pyroscope is used in Vald.

Why do we need continuous profiling?

When performance problems such as high latency, memory leaks, or excessive CPU usage occur in an application, developers need to investigate the bottlenecks and identify their root causes. In recent years, such problems have become more common as applications grow more complex.

Usually, developers start profiling only after a problem has occurred. However, because it is difficult to reproduce the same problem, finding the root cause takes a long time and exhausts the developers. Needless to say, the Vald team has run into such cases as well.

Continuous profiling is not a silver bullet, but it helps. It stores performance data from the OS layer to the application layer, visualizes the profile data, and enables developers to compare performance before and after a problem occurs.

Difference from metrics

Profiling may sound similar to metrics, but there is a clear difference between the two.

Profiling deals only with predefined parameters (CPU, RAM, threads, etc.), takes a long time to sample (at least 10 minutes), and visualizes the correlated stack traces. Metrics, on the other hand, deal with user-defined parameters, take less time to collect than conventional profiling, and are visualized as time series.

What is Pyroscope?

Pyroscope is a continuous profiling tool.

Open Source Continuous Profiling Platform | Debug performance issues down to a single line of code…

Find bottlenecks in your code and fix performance issues

pyroscope.io

It supports many languages and platforms, e.g., Go, Python, Java, PHP, and eBPF.

Pyroscope Agent | Open Source Continuous Profiling Platform

Pyroscope Agent records and aggregates what your application has been doing, then sends that data over to the Pyroscope…

pyroscope.io

You can see the Pyroscope Live Demo.

This post focuses on Go. Profiling a Go application with Pyroscope requires the application itself and a Pyroscope server.

There are two types of Pyroscope agents, Push and Pull. With the Push agent, the application sends pprof data to the Pyroscope server. With the Pull agent, the Pyroscope server scrapes pprof data from the application.

*NOTE: pprof is a tool for visualization and analysis of profiling data.

How Vald uses Pyroscope

Vald uses the Pull agent because it adds no Pyroscope dependency to the application code. We can enable profiling just by editing the Kubernetes manifests.

In addition to profiling the application, Vald also uses eBPF for profiling at the Linux kernel level.

The architecture is shown below.

+-----------------+  pull pprof
| vald components | <----+
+-----------------+      |     +------------------+
                         +-----+ Pyroscope server |
+-----------------+      |     +------------------+
|    eBPF-Agent   | <----+
+-----------------+  pull eBPF

We can set up continuous profiling easily in three steps:

Enable pprof for each component in valdrelease.yaml.
Deploy the eBPF agent on each Kubernetes node.
Deploy the Pyroscope server.

A tutorial using k3d

Deploy

Here are the steps to deploy a Vald cluster on k3d with Pyroscope. Please use k3d v5.3.0 or later, because eBPF is unavailable in earlier versions.

Clone the Vald repository

git clone https://github.com/vdaas/vald.git && cd vald

Create k3d cluster

k3d cluster create -v "/lib/modules:/lib/modules" --host-pid-mode=true --agents=3

Deploy vald-helm-operator

To deploy a Vald cluster in your local environment, set rbac.create to true.

helm install --values ./charts/vald-helm-operator/values.yaml vald-helm-operator vald/vald-helm-operator --set rbac.create=true

Deploy vald-release

helm install vald vald/vald --values example/helm/values-with-pyroscope.yaml

When a Vald component pod runs with pprof enabled, annotations like the following are added to the pod. The Pyroscope server decides which pods to scrape based on this information.

pyroscope.io/application-name: vald-agent-ngt
pyroscope.io/port: "6060"
pyroscope.io/profile-cpu-enabled: "true"
pyroscope.io/profile-mem-enabled: "true"
pyroscope.io/scrape: "true"
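To make the role of these annotations concrete, here is a conceptual sketch in Go of how a scraper could turn them into a scrape target. It is not Pyroscope's actual implementation. The function scrapeTarget, the pod IP, and the /debug/pprof/ path are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// scrapeTarget is a hypothetical helper. It returns the base URL to scrape
// for a pod, or false when the annotations do not opt the pod in.
func scrapeTarget(podIP string, annotations map[string]string) (string, bool) {
	if annotations["pyroscope.io/scrape"] != "true" {
		return "", false
	}
	port := annotations["pyroscope.io/port"]
	if port == "" {
		return "", false
	}
	// Assumption: the pod serves Go pprof data under /debug/pprof/.
	return "http://" + net.JoinHostPort(podIP, port) + "/debug/pprof/", true
}

func main() {
	annotations := map[string]string{
		"pyroscope.io/application-name": "vald-agent-ngt",
		"pyroscope.io/port":             "6060",
		"pyroscope.io/scrape":           "true",
	}
	// The pod IP below is made up for the example.
	target, ok := scrapeTarget("10.42.0.7", annotations)
	fmt.Println(target, ok)
}
```

A pod without the scrape annotation is simply skipped, which is why profiling can be switched on and off from the manifest alone.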

Deploy Pyroscope server and eBPF agent

make k8s/metrics/pyroscope/deploy

You can get the manifests from the page below.

vald/k8s/metrics/pyroscope at master · vdaas/vald

These are the manifests to deploy pyroscope server and pyroscope agent. The pyroscope server scrapes pprof data. Which pod…

github.com

Pyroscope UI

Once the deployment succeeds, we can access the Pyroscope UI in a browser.

The UI offers three types of view: Single view, Comparison view, and Diff view. You can select a view according to your needs. Each view provides a table view and a Flamegraph.

Example Flamegraph (vald-agent-ngt CPU usage)

The above image is an example of a Flamegraph. The horizontal axis represents the CPU time consumed by each function call, and the vertical axis represents the call stack. The wider a bar is, the longer that function used the CPU.
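The relationship between sampled stacks and bar widths can be sketched in Go as follows. This is a simplified illustration, not Pyroscope's rendering code. The function frameWidths and the sample function names are hypothetical. Each sample is a call stack from root to leaf, and the width of a bar is the fraction of samples that passed through that call path.

```go
package main

import (
	"fmt"
	"strings"
)

// frameWidths returns, for every call path seen in the samples, the
// fraction of samples that passed through it. A call path is written as
// its frames joined by ";", from the root to the frame itself.
func frameWidths(samples [][]string) map[string]float64 {
	counts := make(map[string]int)
	for _, stack := range samples {
		for i := range stack {
			counts[strings.Join(stack[:i+1], ";")]++
		}
	}
	widths := make(map[string]float64, len(counts))
	if len(samples) == 0 {
		return widths
	}
	for path, n := range counts {
		widths[path] = float64(n) / float64(len(samples))
	}
	return widths
}

func main() {
	// Made-up samples: three hit the search path, one hits the insert path.
	samples := [][]string{
		{"main", "search", "distance"},
		{"main", "search", "distance"},
		{"main", "search", "distance"},
		{"main", "insert"},
	}
	for path, w := range frameWidths(samples) {
		fmt.Printf("%-22s %.0f%%\n", path, w*100)
	}
}
```

In this example the root frame spans the full width, while the search path covers three quarters of it, so it is the first place to look for a CPU bottleneck.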

Single view

Single view shows profiling data for a selected time range. It is the primary view for investigating your applications.
The table view on the left is sortable and is used to check the top consumers. The view on the right is the Flamegraph described above. Above the table view is a search box that lets you search the profiling data for specific entries.

Comparison view

The comparison view shows the profiling data for two specified time periods side by side. For example, when releasing a new version, you can compare how CPU and memory usage changed before and after the deployment.

Diff view

The diff view shows the difference in profiling data between two specified time periods, using the older one as the baseline. Bars are shown in three colors: gray, red, and green.
Gray bars represent functions whose profiling results are the same before and after, so you do not need to focus on them. Red indicates how much usage has increased compared to the old profiling data, while green indicates how much it has decreased. As with the comparison view, this is useful for comparing old and new profiling data, but it is more quantitative because it shows the increase or decrease as a percentage.

Conclusion

This post introduced why Vald adopted continuous profiling with Pyroscope and how Pyroscope works in a Vald cluster. Once we have seen its effectiveness in practice, we will write another post about it. How about starting continuous profiling for your application?

Other posts

Vald. A highly scalable distributed fast approximate nearest neighbour dense vector search engine.

Introduction of Vald: Cloud-Native Vector Search Engine

A New World created by similar search: Cases where Vald can be used.

What Do You Use Vald for?: Example Senarios & Case Studies

A Super Easy Way to Try Similarity Search using Vald

How to deploy Vald on your k3d within 5 minutes

Release Announcement: v1.4.0

We will release v1.4.0 this week.

Written by vald.vdaas.org