Advanced Usage
Search across Different Languages
To search notes written in multiple languages, you can use a multilingual model.
For example, paraphrase-multilingual-MiniLM-L12-v2 supports 50+ languages and offers good search quality and speed. To use it:
- Manually update search-type > asymmetric > encoder to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml file for now. See the diff of khoj.yml below for illustration:
     asymmetric:
    - encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    + encoder: paraphrase-multilingual-MiniLM-L12-v2
      cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
      model_directory: "~/.khoj/search/asymmetric/"
- Regenerate your content index. For example, by opening <khoj-url>/api/update?t=force
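The re-index step can also be scripted instead of opening the URL by hand. The following is a minimal sketch, not part of Khoj itself: the helper names `build_force_update_url` and `force_update` are hypothetical, and the example address assumes Khoj is listening on localhost at port 42110.

```python
import urllib.request


def build_force_update_url(khoj_url: str) -> str:
    """Return the URL that asks Khoj to regenerate its content index."""
    # Strip a trailing slash so the path is not doubled up.
    return f"{khoj_url.rstrip('/')}/api/update?t=force"


def force_update(khoj_url: str, timeout: float = 600.0) -> int:
    """Send a GET request to the force-update URL and return the HTTP status code."""
    # Re-indexing can be slow, hence the generous (assumed) default timeout.
    with urllib.request.urlopen(build_force_update_url(khoj_url), timeout=timeout) as response:
        return response.status


# Example (assumes a Khoj server is running at this address):
# force_update("http://localhost:42110")
```

This does the same thing as opening the URL in a browser; it is only convenient if you change the encoder often or want to re-index from a script.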
Access Khoj on Mobile
- Set up Khoj on your personal server. This can be any always-on machine, e.g. an old computer, a Raspberry Pi(?) etc.
- Install Tailscale on your personal server and phone
- Open the Khoj web interface of the server from your phone browser. It should be http://tailscale-ip-of-server:42110, or http://name-of-server:42110 if you've set up MagicDNS
- Click the Add to Homescreen button
- Enjoy exploring your notes, documents and images from your phone!
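If the page does not load on your phone, it can help to check from another device on the same Tailscale network whether the server's Khoj port is reachable at all. A minimal sketch, assuming the default port 42110 mentioned above; the function name `is_reachable` is hypothetical and the host you pass in is whatever Tailscale IP or MagicDNS name your server has.

```python
import socket


def is_reachable(host: str, port: int = 42110, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example (replace with your server's Tailscale IP or MagicDNS name):
# print(is_reachable("name-of-server"))
```

A result of False only says that the TCP connection failed; it does not tell you whether Tailscale, the firewall or Khoj itself is the cause.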
Use OpenAI Models for Search
Setup
- Set encoder-type, encoder and model-directory under the asymmetric and/or symmetric search-type in your khoj.yml (at ~/.khoj/khoj.yml):
     asymmetric:
    - encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
    + encoder: text-embedding-ada-002
    + encoder-type: khoj.utils.models.OpenAI
      cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    - encoder-type: sentence_transformers.SentenceTransformer
    - model_directory: "~/.khoj/search/asymmetric/"
    + model-directory: null
- Set up your OpenAI API key in Khoj
- Restart the Khoj server to generate embeddings. This will take longer than with the offline search models.
Warnings
This configuration uses an online model:
- It will send all notes to OpenAI to generate embeddings
- All queries will be sent to OpenAI when you search with Khoj
- You will be charged by OpenAI based on the total tokens processed
- It requires an active internet connection to search and index
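Because billing is by tokens processed, a rough estimate before switching can be useful. The sketch below is only a back-of-the-envelope calculation under stated assumptions: it uses the common rule of thumb of about 4 characters per token for English text (actual tokenization differs), it only looks at files matching a glob pattern you choose, and the price per 1,000 tokens is a parameter you must look up yourself on OpenAI's pricing page. All function names are hypothetical.

```python
import math
from pathlib import Path


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count. Assumption: about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def estimate_notes_tokens(notes_dir, pattern: str = "*.md") -> int:
    """Sum the rough token estimates of all files under notes_dir matching pattern."""
    total = 0
    for path in Path(notes_dir).rglob(pattern):
        if path.is_file():
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost of processing total_tokens at the given price per 1,000 tokens."""
    return total_tokens / 1000 * usd_per_1k_tokens


# Example (directory and price are placeholders, not real values):
# tokens = estimate_notes_tokens(Path.home() / "notes")
# print(tokens, estimate_cost_usd(tokens, usd_per_1k_tokens=...))
```

The estimate covers a single indexing pass over your notes; search queries and any later re-indexing are charged on top of it.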
Bootstrap Khoj Search for Offline Usage Later
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine. Note: Only search can currently run in fully offline mode, not chat.
- With Internet
  - Manually download the asymmetric text, symmetric text and image search models from HuggingFace
  - Pip install khoj (and dependencies) in an associated virtualenv. E.g. python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant
- Without Internet
  - Copy each of the search models into their respective folders, asymmetric, symmetric and image, under the ~/.khoj/search/ directory on the air-gapped machine
  - Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g. source .venv/bin/activate && khoj
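The model copy step on the air-gapped machine can be sketched as a small script. This is an illustration under assumptions, not a Khoj tool: it assumes you brought the downloaded models over in a directory (e.g. on a USB drive) that contains one subfolder per search type named asymmetric, symmetric and image. The function name `copy_search_models` and that source layout are hypothetical; only the ~/.khoj/search/ destination comes from the steps above.

```python
import shutil
from pathlib import Path

SEARCH_TYPES = ("asymmetric", "symmetric", "image")


def copy_search_models(source_dir, khoj_search_dir=None):
    """Copy each model folder in source_dir to the matching folder under khoj_search_dir."""
    if khoj_search_dir is None:
        khoj_search_dir = Path.home() / ".khoj" / "search"
    copied = []
    for search_type in SEARCH_TYPES:
        src = Path(source_dir) / search_type
        if not src.is_dir():
            continue  # skip search types that were not downloaded
        dst = Path(khoj_search_dir) / search_type
        # dirs_exist_ok (Python 3.8+) lets the copy merge into an existing folder.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied


# Example (the source path is a placeholder):
# copy_search_models("/media/usb/khoj-models")
```

Search types missing from the source directory are skipped rather than treated as errors, so the same script works if you only downloaded the text models.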