This is related to the story "Find what is relevant".
The immediate application is searching for relevant content given some keywords: documents related to those keywords should show up even if those exact words were never written literally. It also enables searching across languages; searching for terms or concepts in one language would yield results in another.
Other applications could take advantage of LLMs (out of the scope of this project) to generate the keywords themselves, reading the current document and coming up with relevant keywords; the semantic search would then find similar documents/fragments. The LLM would be exposed through an OpenAI API-compatible interface so that the frontend could use it however it wants, while the backend would implement the MCP tools and the model itself.
Data Embedding
First we have to vectorize the texts using an embedding model. These models translate text into a numeric representation so that, given a query (which also needs to be vectorized), we can apply a distance algorithm. The vectorization is multilingual, so we can support the major languages, and the resulting numeric representation stays comparable regardless of the language used. This is one of the benefits over traditional search: no matter what language you use in the query, you will find results with similar ideas in any language.
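To make the "distance algorithm" step concrete, here is a minimal sketch of comparing two already-vectorized texts with cosine similarity (the vectors below are toy values, not real embeddings):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns a value in [-1, 1]; higher means the two
// texts are semantically closer, regardless of the language they were
// written in, since the embedding model maps both into the same space.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy 4-dimensional vectors standing in for real embeddings.
	query := []float32{0.1, 0.3, 0.5, 0.1}
	doc := []float32{0.2, 0.25, 0.45, 0.05}
	fmt.Printf("similarity: %.3f\n", cosineSimilarity(query, doc))
}
```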
The Go go-llama-cpp binding does not support newer architectures (like BERT) since it is based on old versions of llama.cpp and is not currently maintained.
The Ollama engine is a separate process that would need to run in parallel.
So the only realistic option to ship an embedding model with the daemon is to use the ONNX Runtime for Go.
We would load a dynamic library (.so on Linux, .dylib on macOS, .dll on Windows) and then load the models. A good enough model is multilingual-e5-small, especially in its qint8 version: the model is just 118 MB and the tokenizer about 17 MB.
As long as we ship those three files (library, model, tokenizer) along with the daemon, we are good to go.
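A rough sketch of that setup, assuming the yalue/onnxruntime_go binding (any ONNX Runtime wrapper for Go would look similar). The tensor names, the 384-dimension output size, and the omitted tokenization and pooling steps are assumptions to be checked against the exported model, not settled choices:

```go
package embed

import (
	ort "github.com/yalue/onnxruntime_go"
)

// Init points the binding at the shared ONNX Runtime library shipped
// next to the daemon and prepares the global environment.
func Init(libraryPath string) error {
	// .so on Linux, .dylib on macOS, .dll on Windows.
	ort.SetSharedLibraryPath(libraryPath)
	return ort.InitializeEnvironment()
}

// EmbedTokens runs one already-tokenized input through the model and
// returns the raw output tensor data. inputIDs and attentionMask must
// have the same length. Tokenization (tokenizer.json -> ids/mask) and
// mean pooling of the output are left out of this sketch; in practice
// the session would also be created once and reused, not per call.
func EmbedTokens(modelPath string, inputIDs, attentionMask []int64) ([]float32, error) {
	seqLen := int64(len(inputIDs))
	inputShape := ort.NewShape(1, seqLen)

	idsTensor, err := ort.NewTensor(inputShape, inputIDs)
	if err != nil {
		return nil, err
	}
	defer idsTensor.Destroy()

	maskTensor, err := ort.NewTensor(inputShape, attentionMask)
	if err != nil {
		return nil, err
	}
	defer maskTensor.Destroy()

	// Assumes 384-dimensional token embeddings (multilingual-e5-small).
	outTensor, err := ort.NewEmptyTensor[float32](ort.NewShape(1, seqLen, 384))
	if err != nil {
		return nil, err
	}
	defer outTensor.Destroy()

	// Input/output names are assumptions; check the exported model.
	session, err := ort.NewAdvancedSession(modelPath,
		[]string{"input_ids", "attention_mask"},
		[]string{"last_hidden_state"},
		[]ort.ArbitraryTensor{idsTensor, maskTensor},
		[]ort.ArbitraryTensor{outTensor}, nil)
	if err != nil {
		return nil, err
	}
	defer session.Destroy()

	if err := session.Run(); err != nil {
		return nil, err
	}
	return outTensor.GetData(), nil
}
```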
The vectorization process is slow (even when done in batches), so it will slow down the reindex process. However, since the search text is already stored in the database, we can populate the embeddings table in the background once the reindex has finished, as sketched below. On a live index (for example, when we create a new change) the embedding overhead is minimal.
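One possible shape for that background backfill, with placeholder table and column names (the actual vector table is covered in the next section) and the embedding function passed in as a callback:

```go
package index

import (
	"context"
	"database/sql"
	"encoding/json"
)

// backfillEmbeddings walks rows whose search text is already in the
// database but that have no embedding yet, vectorizes them in small
// batches, and stores the result. It is meant to run in the background
// once the regular reindex has finished. Table and column names are
// placeholders, not the real schema.
func backfillEmbeddings(ctx context.Context, db *sql.DB,
	embed func(texts []string) ([][]float32, error)) error {

	const batchSize = 64
	for {
		ids, texts, err := pendingBatch(ctx, db, batchSize)
		if err != nil {
			return err
		}
		if len(ids) == 0 {
			return nil // nothing left to embed
		}

		vectors, err := embed(texts)
		if err != nil {
			return err
		}

		for i, id := range ids {
			// Store the vector as a JSON array for simplicity.
			blob, err := json.Marshal(vectors[i])
			if err != nil {
				return err
			}
			if _, err := db.ExecContext(ctx,
				`INSERT INTO embeddings(doc_id, embedding) VALUES (?, ?)`,
				id, string(blob)); err != nil {
				return err
			}
		}
	}
}

// pendingBatch returns up to limit documents that still lack an embedding.
func pendingBatch(ctx context.Context, db *sql.DB, limit int) ([]int64, []string, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT d.id, d.content
		   FROM documents d
		   LEFT JOIN embeddings e ON e.doc_id = d.id
		  WHERE e.doc_id IS NULL
		  LIMIT ?`, limit)
	if err != nil {
		return nil, nil, err
	}
	defer rows.Close()

	var ids []int64
	var texts []string
	for rows.Next() {
		var id int64
		var content string
		if err := rows.Scan(&id, &content); err != nil {
			return nil, nil, err
		}
		ids = append(ids, id)
		texts = append(texts, content)
	}
	return ids, texts, rows.Err()
}
```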
Vector search
We should then use the sqlite-vec extension so that we can store the vectorized data in the database.
The database will grow by around 300 MB (plus indexes) for large nodes.
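On the storage side, a sqlite-vec vec0 virtual table sized for the 384-dimensional multilingual-e5-small vectors could look like the sketch below. The table name is a placeholder, and it assumes the extension has already been loaded into the connection (how that happens depends on the SQLite driver used):

```go
package search

import (
	"database/sql"
	"encoding/json"
)

// createVectorTable sets up a sqlite-vec virtual table sized for the
// 384-dimensional output of multilingual-e5-small. It assumes the
// sqlite-vec extension has already been loaded into this connection.
func createVectorTable(db *sql.DB) error {
	_, err := db.Exec(
		`CREATE VIRTUAL TABLE IF NOT EXISTS vec_documents
		 USING vec0(embedding float[384])`)
	return err
}

// storeEmbedding inserts one vector, keyed by the document's rowid so
// results can be joined back to the regular search tables. sqlite-vec
// accepts vectors either as compact BLOBs or as JSON arrays; JSON is
// used here for simplicity.
func storeEmbedding(db *sql.DB, docID int64, vector []float32) error {
	js, err := json.Marshal(vector)
	if err != nil {
		return err
	}
	_, err = db.Exec(
		`INSERT INTO vec_documents(rowid, embedding) VALUES (?, ?)`,
		docID, string(js))
	return err
}
```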
With all the data stored, we can then implement the API that searches for content similar to the query text. This will be done with a K-nearest-neighbors query on the vectorized data, instead of a brute-force comparison written by hand in application code; the SQLite extension takes care of it, so it should be relatively fast.
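A sketch of the corresponding KNN query, reusing the placeholder vec_documents table from above. The MATCH / ORDER BY distance / LIMIT pattern follows the sqlite-vec documentation, but it should be verified against the extension version that ends up shipping:

```go
package search

import (
	"database/sql"
	"encoding/json"
)

// searchSimilar takes an already-embedded query vector and returns the
// rowids of the k nearest stored vectors, closest first. sqlite-vec
// performs the KNN scan and exposes the distance as a regular column.
func searchSimilar(db *sql.DB, queryVec []float32, k int) ([]int64, error) {
	js, err := json.Marshal(queryVec)
	if err != nil {
		return nil, err
	}
	rows, err := db.Query(
		`SELECT rowid, distance
		   FROM vec_documents
		  WHERE embedding MATCH ?
		  ORDER BY distance
		  LIMIT ?`,
		string(js), k)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ids []int64
	for rows.Next() {
		var id int64
		var distance float64
		if err := rows.Scan(&id, &distance); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```

The returned rowids can then be joined back to the regular search tables to build the final result list.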