Release Announcement: Vald v1.6.0

Vald v1.6.0, a minor update, was released this week. This update introduces several new features.

This post describes the background and mechanism of OpenTelemetry and the Circuit Breaker.

What is OpenTelemetry?

Vald is highly scalable: users can easily scale each Vald component. As the components scale, however, it becomes harder to observe and trace how the service behaves.

To collect telemetry data such as traces and metrics, Vald previously implemented this functionality using OpenCensus. Since OpenCensus has been merged into OpenTelemetry, Vald has been updated to follow it.

OpenTelemetry is an open-source framework for instrumenting, generating, collecting, and exporting telemetry data. It provides sets of standardized vendor-agnostic SDKs, APIs, and tools for ingesting, transforming, and sending data to an Observability back-end.

How to configure OpenTelemetry?

To enable OpenTelemetry on Vald, you can update the Helm chart settings.

defaults:
  observability:
    # enable observability feature in Vald
    enabled: true
    metrics:
      # enable version info metrics
      enable_version_info: true
      # enable memory metrics
      enable_memory: true
      # enable goroutine metrics
      enable_goroutine: true
      # enable cgo metrics
      enable_cgo: true

    trace:
      # enable tracing feature in Vald
      enabled: true

    prometheus:
      # enable Prometheus
      enabled: true
      # Prometheus collect interval
      collect_interval: 500ms
      # Prometheus collect timeout
      collect_timeout: 10s
      # enable in memory mode of Prometheus
      enable_in_memory_mode: true

    jaeger:
      # enable Jaeger
      enabled: false

Please set the following configurations to enable OpenTelemetry:

  • Set defaults.observability.enabled to true
  • Set defaults.observability.metrics.enabled to true
  • Set defaults.observability.trace.enabled to true
  • Set defaults.observability.prometheus.enabled to true
  • Set defaults.observability.jaeger.enabled to true
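
Each dotted name in the list above is a path into the nested Helm values. As a rough illustration only (a sketch that is not part of Vald or Helm; `set_path` is a hypothetical helper introduced here), the following Python snippet expands those dotted keys into the nested structure they refer to:

```python
import json


def set_path(values, dotted_key, value):
    """Set values[a][b][c] = value for a dotted key such as "a.b.c"."""
    node = values
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate mappings on demand.
        node = node.setdefault(key, {})
    node[leaf] = value
    return values


# The settings listed above, all switched on.
keys = [
    "defaults.observability.enabled",
    "defaults.observability.metrics.enabled",
    "defaults.observability.trace.enabled",
    "defaults.observability.prometheus.enabled",
    "defaults.observability.jaeger.enabled",
]

overrides = {}
for key in keys:
    set_path(overrides, key, True)

# Print the nested form so it can be compared with the YAML layout.
print(json.dumps(overrides, indent=2))
```

The printed structure mirrors the nesting of the Helm values: every key sits under `defaults.observability`.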

For details on deploying Vald, please refer to the Vald documentation at vald.vdaas.org.

Circuit Breaker

The Circuit Breaker is a design pattern for handling service failures and limiting their impact. Vald is built on a microservices architecture, in which a service usually calls other services to perform an action or retrieve data.

When a service cannot connect to another service because of an outage, the calling service may retry the connection.

However, if the target service is down for maintenance or because of an external failure, it may not recover quickly, and the calling service will keep retrying in vain.

To avoid this issue, Vald implements the Circuit Breaker pattern.

How does it work?

The Circuit Breaker pattern has three states:

Closed

  • If the upstream service is up and returns proper responses, the Circuit Breaker remains closed and all calls proceed normally.
  • If calls fail and the failures exceed the defined threshold, the Circuit Breaker switches to the Open state.

Open

  • The Circuit Breaker returns an error immediately without executing the call.
  • The Circuit Breaker automatically switches to the Half Open state after a defined period.

Half Open

  • If a single request succeeds, the Circuit Breaker switches to the Closed state and operation returns to normal.
  • Otherwise, if the request fails, the Circuit Breaker switches back to the Open state and continues to return errors until the next timeout.
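
The three states can be sketched as a small state machine. The Python class below is a minimal illustration under stated assumptions, not Vald's actual implementation: the class name, the `failure_threshold` parameter, and the choice to count consecutive failures are all hypothetical simplifications.

```python
import time


class CircuitBreaker:
    """Illustrative three-state breaker (closed, open, half-open)."""

    def __init__(self, failure_threshold=3, open_timeout=1.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumption: consecutive failures before opening
        self.open_timeout = open_timeout            # seconds to stay open before probing
        self.clock = clock                          # injectable clock, handy for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_timeout:
                # Open: fail fast without executing the function.
                raise RuntimeError("circuit breaker is open")
            # Timeout elapsed: let a single probe request through.
            self.state = "half-open"
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # A success in any state returns the breaker to closed.
        self.state = "closed"
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # A failed probe, or too many failures while closed, opens the breaker.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would wrap each remote call, for example `breaker.call(lambda: client.search(query))`, where `client.search` stands in for any function that may raise on failure.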

How to configure the Circuit Breaker?

To configure the Circuit Breaker, you can update the Helm chart settings.

defaults:
  grpc:
    client:
      circuit_breaker:
        # closed refresh timeout: the interval at which the error rate is checked in the closed state
        closed_refresh_timeout: "10s"
        # minimum number of sampled requests within closed_refresh_timeout;
        # the error rate is not checked if the request count is smaller than min_samples
        min_samples: 1000

        # error rate at which the closed state turns to open, based on closed_refresh_timeout and min_samples
        closed_error_rate: 0.7
        # timeout of the open state; the breaker automatically turns to the half-open state after this timeout
        open_timeout: "1s"
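
To make the interaction between `min_samples` and `closed_error_rate` concrete, here is a minimal sketch of the check performed at the end of one `closed_refresh_timeout` window. The function name `should_open` is hypothetical, and the use of `>=` for the comparison is an assumption; Vald's actual comparison may differ.

```python
def should_open(total_requests, failed_requests, min_samples=1000, closed_error_rate=0.7):
    """Decide whether a closed breaker should open after one refresh window."""
    if total_requests < min_samples:
        # Too few samples in this window: the error rate is not checked.
        return False
    # Assumption: the breaker opens when the error rate reaches the threshold.
    return failed_requests / total_requests >= closed_error_rate


print(should_open(500, 500))    # below min_samples, so the rate is ignored
print(should_open(2000, 1000))  # 50% errors, below the 70% threshold
print(should_open(2000, 1800))  # 90% errors, above the 70% threshold
```

With the settings shown above, a window of 500 requests never opens the breaker even if every request fails, because it falls below `min_samples`.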

For details on deploying Vald, please refer to the Vald documentation linked below.

vald.vdaas.org

A highly scalable distributed fast approximate nearest neighbor dense vector search engine.