Publishing a Notebook with the chiVe dataset

This post introduces the background and contents of the recently released vald-demo repository.

Why did we create this repository?

Vald is a highly scalable, distributed, fast approximate nearest neighbor dense vector search engine. It is a new OSS project and is not yet widely known. We are working to grow the user community and attract more contributions, and this post and the published notebook are part of that effort.

Contents of vald-demo/chive

Next, let’s take a look at the contents of the repository we have published.

chive/
- README.md
- sample-values.yaml: Sample YAML for deploying Vald with Helm to run the notebook.
- tutorial.ipynb: Example of using Vald with chiVe.
- tutorial.md: Example of using Vald with chiVe (with output cells).

How to use

This section shows one way to run the notebook: using Jupyter Notebook in a Docker environment.

git clone https://github.com/vdaas/vald-demo.git
docker run -it -v "$(pwd)/vald-demo":/home/jovyan/work -p 8888:8888 jupyter/datascience-notebook
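
Before opening the notebook, it can be handy to confirm that the ports involved are actually reachable. The following is a minimal sketch and not part of the repository: is_reachable is a hypothetical helper, 8888 is the Jupyter port published by the docker command above, and 8081 is the Vald endpoint that the snippets later in this post connect to (an assumption about your setup, which may expose Vald elsewhere).

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Ports are assumptions: 8888 for Jupyter, 8081 for the Vald endpoint.
    for name, port in [("Jupyter", 8888), ("Vald", 8081)]:
        state = "reachable" if is_reachable("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {state}")
```

This only checks that something accepts TCP connections on the port. It does not verify that the service behind it is Vald or that the index is ready.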

Insert

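Insert is the interface for registering vectors in Vald. Here, a 300-dimensional vector randomly generated by np.random.rand(300) is inserted into Vald. (NOTE: The import paths shown assume the v1 package layout of the vald-client-python package and may differ in other versions.)

# imports (paths assume the vald-client-python v1 package layout)
import grpc
import numpy as np
from vald.v1.payload import payload_pb2
from vald.v1.vald import insert_pb2_grpc, search_pb2_grpc, update_pb2_grpc, remove_pb2_grpc

# create gRPC channel
channel = grpc.insecure_channel("localhost:8081")
# create stub
istub = insert_pb2_grpc.InsertStub(channel)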

# Insert
sample = np.random.rand(300)
ivec = payload_pb2.Object.Vector(id="test", vector=sample)
icfg = payload_pb2.Insert.Config(skip_strict_exist_check=True)
ireq = payload_pb2.Insert.Request(vector=ivec, config=icfg)

istub.Insert(ireq)
name: "vald-agent-ngt-0"
uuid: "test"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
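
The vector passed to Insert is expected to match the dimension of the index, which we assume to be 300 as in this post. As a small sketch under that assumption, a vector can be checked before it is wrapped in a request. The names DIMENSION, random_vector, and validate_vector are hypothetical helpers for illustration, not Vald APIs, and the standard library random module stands in for np.random.rand.

```python
import math
import random

DIMENSION = 300  # assumed index dimension, taken from the vectors in this post


def random_vector(dim: int = DIMENSION, seed=None) -> list:
    """Return `dim` floats drawn uniformly from [0, 1), like np.random.rand(dim)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(dim)]


def validate_vector(vec, dim: int = DIMENSION) -> list:
    """Return vec as a list of floats, or raise ValueError if it is unusable."""
    values = [float(v) for v in vec]
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinite values")
    return values


sample = validate_vector(random_vector(seed=0))
print(len(sample), min(sample) >= 0.0, max(sample) < 1.0)
```

Checking on the client side keeps a malformed vector from ever reaching the server, which tends to give a clearer error than a failed gRPC call.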

Search

Search is the interface for running a similarity search over the vectors indexed in Vald. In this example, we query with a randomly generated 300-dimensional vector. Since only one vector was indexed in the previous Insert step, the search returns a single result. The distance value depends on the random vectors and on the distance function used by the index, such as l2 (L2 norm, i.e. Euclidean distance) or cos (cosine distance).

# create stub
sstub = search_pb2_grpc.SearchStub(channel)

# Search
svec = np.random.rand(300)
# num: maximum number of results; timeout is given in nanoseconds (3 seconds here)
scfg = payload_pb2.Search.Config(num=10, radius=-1.0, epsilon=0.1, timeout=3000000000)
sreq = payload_pb2.Search.Request(vector=svec, config=scfg)

sstub.Search(sreq)
results {
id: "test"
distance: 0.22659634053707123
}
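
To make the role of the distance function more concrete, here is a minimal pure-Python sketch of the two functions mentioned above, together with an exhaustive search that returns at most num results. It is only an illustration: Vald performs an approximate search inside its index rather than this brute-force scan, cosine distance is assumed here to mean 1 minus cosine similarity, and l2_distance, cosine_distance, and brute_force_search are hypothetical names.

```python
import math


def l2_distance(a, b) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b) -> float:
    """Cosine distance, defined here as 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_search(query, vectors, num, distance=l2_distance):
    """Return up to `num` (id, distance) pairs, closest first."""
    scored = [(vid, distance(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:num]


# "near" is close to the query; "aligned" points the same way but is longer.
vectors = {"near": [1.0, 0.0, 0.0], "aligned": [3.0, 3.0, 0.0]}
query = [1.0, 1.0, 0.0]

print(brute_force_search(query, vectors, num=10))
print(brute_force_search(query, vectors, num=10, distance=cosine_distance))
```

With l2, the vector labeled near ranks first because it is closer in space. With cos, the vector labeled aligned ranks first because it points in the same direction as the query. The same data can therefore produce different rankings and different distance values depending on the function in use.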

Update

Update is the interface for replacing vectors that have already been inserted into Vald. Here, we replace the vector whose id is test with another random vector.

# create stub
ustub = update_pb2_grpc.UpdateStub(channel)

# Update
sample = np.random.rand(300)
uvec = payload_pb2.Object.Vector(id="test", vector=sample)
ucfg = payload_pb2.Update.Config(skip_strict_exist_check=True)
ureq = payload_pb2.Update.Request(vector=uvec, config=ucfg)

ustub.Update(ureq)
name: "vald-agent-ngt-0"
uuid: "test"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"

Remove

Remove is the interface for deleting vectors that have already been inserted into Vald. In this example, the vector whose id is test is deleted from Vald.

# create stub
rstub = remove_pb2_grpc.RemoveStub(channel)

# Remove
rid = payload_pb2.Object.ID(id="test")
rcfg = payload_pb2.Remove.Config(skip_strict_exist_check=True)
rreq = payload_pb2.Remove.Request(id=rid, config=rcfg)

rstub.Remove(rreq)
name: "vald-agent-ngt-0"
uuid: "test"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
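
As a recap, the life cycle of the vector with id test across the four interfaces can be mimicked with a toy in-memory index. This is only a sketch to show how the operations relate to each other: ToyIndex is a hypothetical class, it uses an exact L2 search instead of an approximate one, and it says nothing about how Vald stores or distributes vectors.

```python
import math


class ToyIndex:
    """Hypothetical in-memory stand-in for the four operations; not Vald's API."""

    def __init__(self):
        self._vectors = {}

    def insert(self, vid, vector):
        if vid in self._vectors:
            raise KeyError(f"id {vid!r} already exists")
        self._vectors[vid] = list(vector)

    def update(self, vid, vector):
        if vid not in self._vectors:
            raise KeyError(f"id {vid!r} not found")
        self._vectors[vid] = list(vector)

    def remove(self, vid):
        del self._vectors[vid]  # raises KeyError if the id is missing

    def search(self, query, num):
        # Exact L2 search over every stored vector, closest first.
        scored = [(vid, math.dist(query, vec)) for vid, vec in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1])[:num]


index = ToyIndex()
query = [3.0, 4.0]

index.insert("test", [0.0, 0.0])
print(index.search(query, num=10))  # one result, as after Insert

index.update("test", [3.0, 0.0])
print(index.search(query, num=10))  # same id, different distance

index.remove("test")
print(index.search(query, num=10))  # nothing left to return
```

The id stays the same across Insert, Update, and Remove, while the distance reported by Search changes once the stored vector is replaced and the result disappears once it is removed.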

Closing

We have introduced the background and contents of the vald-demo repository. Thank you for your interest in this post and in Vald. If you are interested in approximate nearest neighbor search, please try running our notebook.
