The story of JAPAN SEARCH with Vald

Vald provides highly scalable, high-speed approximate nearest neighbor vector search. However, vector search has many possible applications, and it is not always easy to imagine how it could be used in a particular service. Teams that already know vector search can adopt Vald easily, but teams that are just beginning to explore it may find it challenging.

In this post, we would like to introduce an example of how Vald is used in JAPAN SEARCH, a service provided by the National Diet Library in Japan. The following article was contributed by the National Diet Library and describes the background of adopting Vald and their expectations for its future. We hope it will help you imagine how Vald can be used in your own products.

Hello! I am a librarian and engineer working for the National Diet Library, Japan. Today, I am pleased to introduce the use case of Vald in Japan Search, a service for which I’m in charge of backend operations and development.

Japan Search, officially launched in August 2020, is a national platform that aggregates metadata on digital resources from various fields and provides an integrated search service, an API, and a variety of other features to promote the use of digital content. The Japan Search system is developed and operated by the National Diet Library in cooperation with many organizations in Japan. Japan Search also structures the aggregated metadata as Linked Open Data using uniform rules and provides it in a machine-readable format.

In addition to keyword search over metadata, Japan Search has an image search function, built on Vald, that lets users search for similar thumbnail images. Specifically, it uses a machine learning model to convert images into feature vectors, indexes those vectors in Vald, and returns the images whose feature vectors are closest to that of the query image. This function helps users discover the content they are looking for when they do not know which keywords to use, or when they are not familiar with the Japanese language.
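To make the mechanism concrete, here is a minimal sketch in Java of "index feature vectors, then return the closest ones to a query vector". It is an illustration under stated assumptions, not Japan Search's or Vald's code: the class SimilarImageSearch, its methods, and the toy three-dimensional vectors are hypothetical. It also performs exact brute-force search with Euclidean distance, whereas Vald performs approximate search over a graph-based index (NGT) distributed across pods.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: exact (brute-force) nearest neighbor search
 * over image feature vectors. Vald uses an approximate graph-based index
 * instead; this class just shows the idea of "closest vectors win".
 */
class SimilarImageSearch {
    // Image ID -> feature vector (assumed to come from an image-embedding model).
    private final Map<String, float[]> index = new LinkedHashMap<>();

    /** Adds the vector for an image ID, replacing any previous vector. */
    public void upsert(String id, float[] vector) {
        index.put(id, vector.clone());
    }

    /** Returns up to k image IDs, ordered from most to least similar to the query. */
    public List<String> search(float[] query, int k) {
        List<Map.Entry<String, float[]>> entries = new ArrayList<>(index.entrySet());
        // Squared distance preserves the ordering of Euclidean distance, so sqrt is skipped.
        entries.sort(Comparator.comparingDouble(e -> squaredDistance(query, e.getValue())));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    private static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

For example, after upserting vectors for three images, calling search with a query vector and k = 2 returns the two image IDs whose vectors lie nearest to the query. Brute force compares the query with every stored vector on each request, which is exactly the cost an approximate index such as Vald's avoids when the collection grows to several million images.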

Let me show you some examples. The following URL shows the result of a search that used a painting from the 1800s, depicting a tourist attraction in Tokyo, as the query. You can find old images with a similar composition.

This is the result of a search for images of armor. It’s cool, isn’t it?

This is the search result for images of ramen. You will find thumbnails of visual materials on the theme of ramen (Japanese noodle soup). Don’t they make you hungry?

In this way, you can search the metadata in various ways. You can also filter the image search results by license, such as Creative Commons.
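The article does not say how the license filter is implemented. As a hedged illustration only, one simple approach is to post-filter the similarity results by each item's license metadata while keeping their similarity order. The ImageHit record, the LicenseFilter class, and the license strings used with them are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical search hit: image ID, distance to the query, and its license label. */
record ImageHit(String id, double distance, String license) {}

/** Hypothetical post-filter applied to similarity search results. */
final class LicenseFilter {
    private LicenseFilter() {}

    /** Keeps only hits whose license is allowed; the input (similarity) order is preserved. */
    static List<ImageHit> filterByLicense(List<ImageHit> hits, Set<String> allowedLicenses) {
        return hits.stream()
                .filter(hit -> allowedLicenses.contains(hit.license()))
                .collect(Collectors.toList());
    }
}
```

One consequence of post-filtering is that fewer than k hits may survive, so a caller would typically request more candidates from the vector search than it intends to display.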

An example of a similar image search result

Background

The current image search function was originally built on an on-premises server using the NGTD Docker container. When Japan Search moved from its beta version to the official release in the summer of 2020, we migrated it to a Kubernetes environment on AWS (Amazon EKS). At the same time, we consulted the development team of Vald, the successor to NGTD, and decided to adopt it.

What I consider to be the advantages of using Vald

Using Vald for Japan Search has four advantages:

  1. Fast response times when searching a large number of images (around several million).
  2. The ability to add, delete, and update search targets without stopping the service.
  3. Stable and fault-tolerant operation.
  4. A simple configuration that keeps operating costs low (especially since the Vald 1.0 release).

We are now using Vald v1.1.2, which I think is an excellent product in terms of both fault tolerance over long-term operation and the simplicity of its system configuration. Let me also touch on Vald's related tools. The server-side application of Japan Search is implemented in Java, so it was very useful that Vald officially provides a Java gRPC client library for communication between Vald and the application. It is also convenient that client libraries are available for other programming languages, such as Python and Go.

Difficulties in implementing Vald

The biggest challenge we faced when introducing Vald was that little useful information was available, either on the Internet or in the official repositories. That’s why I had to run a series of experiments to understand the role of each pod. And when something went wrong, it sometimes took a lot of effort to determine whether the problem was specific to the product or not, since our team had no prior experience with Kubernetes. However, the Vald development team's kind advice was always very helpful. Today, the official repository provides an architecture document with illustrations.

If you are considering implementing Vald, the document should be a great help for you.

What to expect from Vald in the future

For a small team like ours, recipes for specific use cases would be helpful, because they would let us better understand the software's features and use them more easily. So I hope that the documentation will be further enriched. One feature I would like to see in the future is the ability to switch between multiple nearest neighbor search algorithms (e.g., FAISS), which would let us run experiments to compare how well each suits our needs. We are planning to use Vald for similarity search in other services of our organization. We will continue to share our operational cases with the Vald community.

Written by: Toru Aoike (National Diet Library)

Postscript

Many thanks to the JAPAN SEARCH team for sharing the story of bringing Vald to JAPAN SEARCH. Did this post help you understand how Vald can be used?

If you’re interested in Vald, please feel free to contact us! We are happy to support you.

We will keep publishing documentation to make Vald more useful. Please tell us about anything you are unsure of, and we will consider publishing a document on it.

See you in the next post :)
