Why do we need an ANN vector search engine?
As you know, information technology has evolved enormously in recent decades. Technologies such as smartphones, 5G networks, and cloud computing have been changing our lifestyles. We can not only search for what we would like to know but also buy almost anything on the Internet, and share and save images, videos, memories, and other digital content. Along with those changes, the demand for searching over object data has grown stronger. The most popular search technique is k-NN (k-nearest neighbor). It is a very simple, fast, and high-precision algorithm. However, when object data is vectorized, the dimension of the vectors may be very high. That makes exact similarity search difficult because of the curse of dimensionality, which is why ANN (Approximate Nearest Neighbor) search is getting attention.
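To make the cost of exact search concrete, here is a minimal sketch of brute-force k-NN in plain Python. It is not how Vald works internally; the function name knn and the toy data are made up for illustration. Every query is compared against every stored vector, so the work grows with both the number of vectors and their dimension, which is the cost that ANN methods try to avoid by accepting approximate results.

```python
import math

def knn(query, vectors, k):
    """Exact k-NN: measure the distance from the query to every stored vector."""
    scored = []
    for vec_id, vec in vectors.items():
        # One Euclidean distance per stored vector: O(N * d) work per query.
        scored.append((math.dist(query, vec), vec_id))
    scored.sort()
    return [vec_id for _, vec_id in scored[:k]]

# Toy 3-dimensional data; real embeddings often have hundreds of dimensions.
vectors = {
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 0.0, 0.0],
    "c": [0.0, 5.0, 0.0],
    "d": [4.0, 4.0, 4.0],
}
nearest = knn([0.9, 0.1, 0.0], vectors, k=2)
print(nearest)  # ['b', 'a']
```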
Example scenarios for using Vald
Vald is a highly scalable distributed fast ANN dense vector search engine. A blog post giving an overview of it has already been published. In this post, we focus on usage examples and do not describe Vald in depth. Please check that post if you are interested.
Before showing the example scenarios, keep in mind that Vald can be used for any object data, as long as the data can be converted to vectors by some method of your own (such as an ML model).
Image / Video / Audio
Images, video, and audio are among the most popular kinds of object data, and many studies have been done on them. The following sections show examples of using Vald with them.
Let us show you an example of image recognition using Vald. Known images are vectorized with an ML model and indexed into Vald beforehand. An image can then be recognized by vectorizing it with the same model and sending it as a query. Vald returns the vectors of similar images as the search result. Face recognition is an easy example to imagine.
One example using video data is recognizing a video from one of its scenes. Vald can also be used to recognize a speaker from voice data.
Recommendation is an important technology for various modern Internet services such as e-commerce (EC) or streaming services. For example, by vectorizing product images with a model that takes product relationships on an EC site into account, Vald can recommend products based on the product a user is viewing. For video recommendations (movies, YouTube, etc.), the idea is to use a model based on context or on performers/actors. Likewise, singer, tune, or voice quality could be used for recommending content.
As in the previous examples, text is one of the most popular kinds of object data. Here we show three situations for using Vald with text data.
For example, you can search for similar sentences or program code using a vectorization model trained to detect writing habits, because such texts reflect the habits of whoever wrote them. This is useful for detecting whether a sentence is plagiarized.
To use Vald as a grammar checker, index both grammatically correct and incorrect sentences into it. The sentence you want to check is vectorized and used as the search query, and you get back sentences similar to it. If many of them are grammatically correct, the target sentence can be interpreted as correct; if not, as incorrect.
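The voting step in the grammar checker can be sketched as follows. This is only an illustration under assumptions: the 2-dimensional vectors stand in for real sentence embeddings, the labels are stored next to the vectors, and classify_by_neighbors is a hypothetical helper that does a plain exact search instead of calling Vald.

```python
import math
from collections import Counter

def classify_by_neighbors(query, indexed, k=3):
    """Label the query with the majority label among its k nearest indexed vectors."""
    ranked = sorted(indexed, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical sentence embeddings with labels; a real model would produce these.
indexed = [
    ([0.1, 0.9], "correct"),
    ([0.2, 0.8], "correct"),
    ([0.3, 0.7], "correct"),
    ([0.9, 0.1], "incorrect"),
    ([0.8, 0.2], "incorrect"),
]
verdict = classify_by_neighbors([0.25, 0.75], indexed, k=3)
print(verdict)  # correct
```

Using an odd k avoids ties between the two labels.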
To check spelling, index vectorized idiom or word data into Vald. To check the spelling of a word, vectorize it and send it to Vald as a search query. If the spelling is correct, the returned vector will be the same as the query. If not, the nearest vector corresponds to the correctly spelled word.
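Here is a rough sketch of that spelling check. The vectorizer is an assumption chosen only to keep the example small: each word becomes a sparse count of its adjacent character pairs, and similarity is the cosine between those counts. A real setup would use a trained model and let Vald do the search; bigram_vector, cosine, and check_spelling are hypothetical names.

```python
import math
from collections import Counter

def bigram_vector(word):
    """Toy vectorizer (an assumption): counts of adjacent character pairs."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def check_spelling(word, dictionary):
    """Return the nearest dictionary word and whether its vector equals the query's."""
    query = bigram_vector(word)
    best = max(dictionary, key=lambda w: cosine(query, bigram_vector(w)))
    return best, bigram_vector(best) == query

dictionary = ["search", "vector", "engine", "neighbor"]
print(check_spelling("vector", dictionary))  # ('vector', True)
print(check_spelling("serach", dictionary))  # ('search', False)
```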
Automatic translation is helpful for people who find it hard to understand the original language. It can be approached by vectorizing text with a model trained so that the same word in two or more languages maps to nearby vectors; a search with a word in one language then returns its counterparts in the others.
There are more application ideas for ANN.
One idea is malware detection. To detect whether a given binary is malware, index already known malware binaries into Vald as vectors, then send a search request with the vector of the given binary. If it is malware, the similarity to the indexed vectors will be high.
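The decision rule in the malware idea is a similarity threshold, which can be sketched like this. Everything here is an assumption for illustration: the byte-frequency vectorizer is far too crude for real detection, the threshold of 0.9 is arbitrary, the sample bytes are made up, and looks_like_malware does an exact scan instead of querying Vald.

```python
import math

def byte_histogram(data):
    """Toy vectorizer (an assumption): normalized 256-bin byte frequencies."""
    counts = [0] * 256
    for byte in data:
        counts[byte] += 1
    total = len(data) or 1
    return [c / total for c in counts]

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def looks_like_malware(sample, known_malware, threshold=0.9):
    """Flag the sample if its best similarity to any known vector reaches the threshold."""
    query = byte_histogram(sample)
    best = max(cosine(query, byte_histogram(m)) for m in known_malware)
    return best >= threshold

# Made-up byte strings standing in for known malware binaries.
known_malware = [
    b"\x90\x90\x90\x90\xcc\xcc\xeb\xfe",
    b"\x00\x01\x02\x03\x04\x05\x06\x07",
]
print(looks_like_malware(b"\x90\x90\x90\x90\xcc\xcc\xeb\xff", known_malware))  # True
print(looks_like_malware(b"hello world, plain text", known_malware))  # False
```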
Another idea is social analysis. Today, many kinds of social data, such as activity histories, interests, and hobbies, are stored. You can vectorize those data with a model for each kind and insert them into Vald. Vald can then be used for suggesting related friends or events, or for recommending content.
These are just example applications of ANN. Try applying it to your own use case.
Finally, we introduce some case studies of Vald in use.
JAPAN SEARCH is a platform for many kinds of digital content archives, operated by the National Diet Library, Japan. It holds many kinds of content data, and you can search any content with a query (e.g. Mt. Fuji). It uses Vald for image similarity search.
In JAPAN SEARCH, some content has image data, and these images are indexed in its Vald cluster. On the search result page, an image search icon appears when a result has an image. When you click it, Vald receives the search request and returns the nearest neighbors, which are then shown on the search result page.
HACKDAY is one of the most famous hackathons in Japan. In HACKDAY 2021, held in March 2021, Vald participated as a technical sponsor. Many products were born at HACKDAY 2021, and some of them used Vald.
“mevie” is an app for sharing self-introductions that uses Vald for text similarity search. First, mevie converts the self-introduction data to text. Then it converts the text into a dense vector using BERT, a language representation model from Google. Based on the context of users’ self-introductions, mevie recommends people a user is likely to get along with.
Thanks for taking the time to read this blog post. In this post, we introduced the variety of searches that vector search engines enable, along with some real use cases where Vald is actually used. Beyond what we introduced here, some users use Vald for searching various other types of objects. We hope to introduce them in the future.
If you are interested in Vald, please check the public documentation.
You are welcome to contact us.
Join vald on Slack
Next time, we will show a super easy way to use Vald.
See you again :)