Publishing a Notebook with the chiVe Dataset

This post introduces the background and contents of the recently released vald-demo repository.

Why did we create this repository?

Vald is designed with many new technologies to provide flexibility and meet the varied demands of its users. However, high flexibility often comes with a complex software structure. This makes it hard to imagine how to use Vald, gives the impression that it is difficult to use, and makes it hard to get a concrete picture of what it can do.

Vald now aims to be simple, easy to use, fast, and high-performance. As a first step, we created this notebook as a simple, concrete use case. By providing one concrete example of what Vald can do, we hope to make it easier to understand and use. We also created this notebook to help people use Vald easily across various environments and use cases.

That is why we created this repository.

We would like to release more demos using other datasets, such as English text or images.

Contents of vald-demo/chive

The vald-demo repository contains a chive directory with the notebook and other files we created. Its contents are as follows:

chive/
- README.md
- sample-values.yaml: sample values file for deploying Vald with Helm to run the notebook
- tutorial.ipynb: example of using Vald with chiVe
- tutorial.md: example of using Vald with chiVe (including cell outputs)

The notebook is intended for those who have completed Get Started; completing Get Started is required to run it.

If you haven’t done it yet, please try it first. Below, we introduce how to use the notebook and what it contains.

How to use

First, run the following commands.

git clone https://github.com/vdaas/vald-demo.git
docker run -it -v $(pwd)/vald-demo:/home/jovyan/work -p 8888:8888 jupyter/datascience-notebook

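If the commands succeed, Jupyter Notebook runs in the container and is published on port 8888. Open it with the URL (including the access token) printed in the container log; the cloned repository appears in the work directory.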

This notebook lets you try Vald’s basic interfaces (Insert, Search, Update, and Remove) using chiVe. You can also try applied uses such as word analogies through similarity search.
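A word analogy such as "man is to king as woman is to ?" can be answered with vector arithmetic: compute vec(king) - vec(man) + vec(woman) and look for the nearest word vector. With Vald, that nearest-neighbor step would be a Search request using the computed vector as the query. The sketch below swaps Vald out for a brute-force cosine-similarity search over a tiny, made-up vocabulary so it runs on its own; the helper names `nearest` and `analogy` and the toy vectors are hypothetical, not chiVe data or the notebook's code.

```python
import numpy as np


def nearest(query, vectors, exclude=(), k=1):
    """Return the k words whose vectors are most cosine-similar to query (brute force)."""
    q = query / np.linalg.norm(query)
    scored = []
    for word, vec in vectors.items():
        if word in exclude:
            continue  # skip the words used to build the query
        sim = float(np.dot(q, vec / np.linalg.norm(vec)))
        scored.append((sim, word))
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


def analogy(a, b, c, vectors, k=1):
    """Answer 'a is to b as c is to ?' using vec(b) - vec(a) + vec(c)."""
    query = vectors[b] - vectors[a] + vectors[c]
    return nearest(query, vectors, exclude={a, b, c}, k=k)


# Toy 3-dimensional vectors, made up for illustration (chiVe vectors are much larger).
toy = {
    "man": np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king": np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([0.0, 0.2, 0.1]),
}

print(analogy("man", "king", "woman", toy))  # → ['queen']
```

Excluding the three input words is a common convention, since the query vector is often closest to one of them.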

Next, we introduce the basic Vald interfaces used in this notebook.

Insert

code:

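import grpc
import numpy as np
from vald.v1.payload import payload_pb2
from vald.v1.vald import insert_pb2_grpc

# create a gRPC channel to the Vald gateway
# (assumes it is reachable at localhost:8081, e.g. via the port-forward from Get Started)
channel = grpc.insecure_channel("localhost:8081")
# create a stub for the Insert service
istub = insert_pb2_grpc.InsertStub(channel)

# Insert a random 300-dimensional vector under the ID "test"
# (its length must match the dimension Vald is deployed with; chiVe vectors have 300)
sample = np.random.rand(300)
ivec = payload_pb2.Object.Vector(id="test", vector=sample)
icfg = payload_pb2.Insert.Config(skip_strict_exist_check=True)
ireq = payload_pb2.Insert.Request(vector=ivec, config=icfg)

# returns where the vector was stored (agent name, ID, and IPs)
istub.Insert(ireq)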

output:

name: "vald-agent-ngt-0"
uuid: "test"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
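The Insert example above uses a random vector. To insert chiVe word vectors instead, one option is to use each word as the vector ID. The sketch below reads word vectors, assuming they are stored in the word2vec text format (a header line with the vocabulary size and dimension, then one word and its values per line); check the format of the chiVe release you download. The function name `read_word2vec_text` is hypothetical and not part of the notebook.

```python
import io
from typing import Iterator, TextIO, Tuple

import numpy as np


def read_word2vec_text(f: TextIO) -> Iterator[Tuple[str, np.ndarray]]:
    """Yield (word, vector) pairs from a word2vec-style text file.

    The first line is "<vocabulary size> <dimension>"; every following line
    is a word followed by its space-separated vector components.
    """
    _, dim = map(int, f.readline().split())
    for line in f:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        word, values = parts[0], parts[1:]
        vec = np.array([float(v) for v in values], dtype=np.float32)
        if vec.shape != (dim,):
            raise ValueError(f"{word!r}: expected {dim} values, got {vec.size}")
        yield word, vec


# Usage with a tiny in-memory file (real chiVe files have 300 dimensions).
demo = io.StringIO("2 3\nfoo 0.1 0.2 0.3\nbar 1 2 3\n")
for word, vec in read_word2vec_text(demo):
    print(word, vec.shape)
```

Each pair could then be sent as payload_pb2.Object.Vector(id=word, vector=vec.tolist()) in the same way as the Insert example. Reading lazily with a generator avoids loading the whole vocabulary into memory at once, and because Vald does not accept two vectors with the same ID, each word should appear only once in the file.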

Search

If you insert more vectors with IDs other than "test", you can search across multiple vectors stored in Vald. (Vald does not allow inserting more than one vector with the same ID.) Please try it!
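To make the ID rule concrete without a running cluster, here is a toy in-memory store. It is not Vald and does not reflect how Vald works internally: it keeps vectors in a Python dict, rejects an insert whose ID already exists (mirroring the rule above), and answers searches by brute-force comparison against every stored vector. The class name `ToyVectorStore` and its use of cosine distance are assumptions for illustration; a real Vald deployment uses the distance type it is configured with.

```python
import numpy as np


class ToyVectorStore:
    """Hypothetical in-memory stand-in for illustrating Insert/Search; not Vald."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = {}  # ID -> vector

    def insert(self, vid: str, vector) -> None:
        vec = np.asarray(vector, dtype=np.float64)
        if vec.shape != (self.dim,):
            raise ValueError(f"expected a {self.dim}-dimensional vector")
        if vid in self.vectors:
            # An ID that is already stored cannot be inserted again.
            raise ValueError(f"ID {vid!r} already exists")
        self.vectors[vid] = vec

    def search(self, query, num: int = 10):
        """Return up to `num` (id, cosine distance) pairs, nearest first."""
        q = np.asarray(query, dtype=np.float64)
        q = q / np.linalg.norm(q)
        results = []
        for vid, vec in self.vectors.items():  # brute force: every stored vector
            distance = 1.0 - float(np.dot(q, vec / np.linalg.norm(vec)))
            results.append((vid, distance))
        results.sort(key=lambda r: r[1])  # smallest distance first
        return results[:num]


rng = np.random.default_rng(0)
store = ToyVectorStore(dim=300)
for i in range(3):
    store.insert(f"test-{i}", rng.random(300))  # distinct IDs are accepted
try:
    store.insert("test-0", rng.random(300))  # reusing an ID is rejected
except ValueError as err:
    print("rejected:", err)
for vid, distance in store.search(rng.random(300), num=3):
    print(vid, round(distance, 4))
```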

code:

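from vald.v1.vald import search_pb2_grpc

# create a stub for the Search service, reusing the channel created above
sstub = search_pb2_grpc.SearchStub(channel)

# Search for the nearest neighbors of a random 300-dimensional query vector
svec = np.random.rand(300)
# num: maximum number of results; radius=-1.0: no radius limit;
# epsilon: search range coefficient; timeout: nanoseconds (3 seconds here)
scfg = payload_pb2.Search.Config(num=10, radius=-1.0, epsilon=0.1, timeout=3000000000)
sreq = payload_pb2.Search.Request(vector=svec, config=scfg)

sstub.Search(sreq)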

output:

results {
  id: "test"
  distance: 0.22659634053707123
}
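Why is the distance about 0.23 when both vectors are random? Assuming the Vald agent deployed with sample-values.yaml uses cosine distance, defined as 1 minus cosine similarity (an assumption; the setting is not shown here), the value is roughly what you would expect for two vectors produced by np.random.rand, whose n components are independent and uniform on [0, 1):

```latex
\mathbb{E}[x_i y_i] = \tfrac{1}{4}, \qquad
\mathbb{E}[x_i^2] = \mathbb{E}[y_i^2] = \tfrac{1}{3}

\cos(\mathbf{x}, \mathbf{y})
  = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
  \approx \frac{n/4}{n/3} = \frac{3}{4},
\qquad
1 - \cos(\mathbf{x}, \mathbf{y}) \approx \frac{1}{4}
```

With n = 300 the sums stay close to these averages, so a result near 0.25, like the 0.2266 above, is typical. Under L2 distance, by contrast, each component would contribute E[(x_i - y_i)^2] = 1/3 - 2/4 + 1/3 = 1/6, giving a distance near sqrt(300/6) ≈ 7.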

Update

code:

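from vald.v1.vald import update_pb2_grpc

# create a stub for the Update service, reusing the channel created above
ustub = update_pb2_grpc.UpdateStub(channel)

# Update: replace the vector stored under the ID "test" with a new random vector
sample = np.random.rand(300)
uvec = payload_pb2.Object.Vector(id="test", vector=sample)
ucfg = payload_pb2.Update.Config(skip_strict_exist_check=True)
ureq = payload_pb2.Update.Request(vector=uvec, config=ucfg)

ustub.Update(ureq)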

output:

name: "vald-agent-ngt-0"
uuid: "test"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"

Remove

code:

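from vald.v1.vald import remove_pb2_grpc

# create a stub for the Remove service, reusing the channel created above
rstub = remove_pb2_grpc.RemoveStub(channel)

# Remove: delete the vector stored under the ID "test"
rid = payload_pb2.Object.ID(id="test")
rcfg = payload_pb2.Remove.Config(skip_strict_exist_check=True)
rreq = payload_pb2.Remove.Request(id=rid, config=rcfg)

rstub.Remove(rreq)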

output:

name: "vald-agent-ngt-0"
uuid: "test"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"
ips: "127.0.0.1"

Closing

We hope to grow the community around Vald and approximate nearest neighbor search. Let’s work together to improve the community!

If you want to know more about Vald, please visit the following website or join our Slack:

See you again :)
