Advanced Usage
Search across Different Languages
To search notes written in multiple languages, you can use a multi-lingual model.
For example, paraphrase-multilingual-MiniLM-L12-v2 supports 50+ languages and offers good search quality and speed. To use it:
- Manually update search-type > asymmetric > encoder to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml file for now. See the diff of khoj.yml below for illustration:
    asymmetric:
    - encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    + encoder: paraphrase-multilingual-MiniLM-L12-v2
      cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
      model_directory: "~/.khoj/search/asymmetric/"
- Regenerate your content index. For example, by opening <khoj-url>/api/update?t=force
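The regeneration step can also be scripted instead of opened in a browser. The following is a minimal sketch using only the Python standard library. The function names `build_update_url` and `regenerate_index`, the timeout value and the example address `http://localhost:42110` are assumptions for illustration; substitute your own <khoj-url>.

```python
from urllib.request import urlopen


def build_update_url(khoj_url: str) -> str:
    """Return the forced index-update URL for a Khoj server."""
    # /api/update?t=force is the path and query string shown in the step above.
    return f"{khoj_url.rstrip('/')}/api/update?t=force"


def regenerate_index(khoj_url: str, timeout: float = 600.0) -> int:
    """Request the update URL (a plain GET) and return the HTTP status code."""
    # The timeout is a guess; re-indexing may take a while on large collections.
    with urlopen(build_update_url(khoj_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Assumed address: replace with the URL of your own Khoj server.
    print(build_update_url("http://localhost:42110"))
```

Calling `regenerate_index("http://localhost:42110")` performs the same request a browser would make when you open that URL.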
Access Khoj on Mobile
- Set up Khoj on your personal server. This can be any always-on machine, e.g. an old computer or a Raspberry Pi
- Install Tailscale on your personal server and phone
- Open the Khoj web interface of the server from your phone browser. It should be http://tailscale-ip-of-server:42110, or http://name-of-server:42110 if you've set up MagicDNS
- Click the Add to Homescreen button
- Enjoy exploring your notes, documents and images from your phone!
Use OpenAI Models for Search
Setup
- Set encoder-type, encoder and model-directory under the asymmetric and/or symmetric search-type in your khoj.yml (at ~/.khoj/khoj.yml):
    asymmetric:
    - encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
    + encoder: text-embedding-ada-002
    + encoder-type: khoj.utils.models.OpenAI
      cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    - encoder-type: sentence_transformers.SentenceTransformer
    - model_directory: "~/.khoj/search/asymmetric/"
    + model-directory: null
- Set up your OpenAI API key in Khoj
- Restart Khoj server to generate embeddings. It will take longer than with the offline search models.
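To make the configuration change concrete, here is a hedged sketch that applies the same edits to a configuration already parsed into a Python dict (for example by a YAML library, which is not shown). The helper name `use_openai_encoder` is hypothetical, and the nesting under a top-level `search-type` key is an assumption based on the setting path used in this guide; check it against your own khoj.yml.

```python
import copy


def use_openai_encoder(config: dict, search_type: str = "asymmetric") -> dict:
    """Return a copy of a parsed khoj.yml with the OpenAI encoder settings applied."""
    updated = copy.deepcopy(config)
    # Assumed layout: search-type > asymmetric (or symmetric) > settings.
    section = updated["search-type"][search_type]
    section["encoder"] = "text-embedding-ada-002"
    section["encoder-type"] = "khoj.utils.models.OpenAI"
    # The diff drops the underscore key and adds a hyphenated key set to null.
    section.pop("model_directory", None)
    section["model-directory"] = None
    return updated


# Sample input mirroring the "before" side of the diff.
config = {
    "search-type": {
        "asymmetric": {
            "encoder": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
            "cross-encoder": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "encoder-type": "sentence_transformers.SentenceTransformer",
            "model_directory": "~/.khoj/search/asymmetric/",
        }
    }
}
updated = use_openai_encoder(config)
```

The cross-encoder entry is left untouched, matching the diff, and the original dict is not modified.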
Warnings
This configuration uses an online model
- It will send all notes to OpenAI to generate embeddings
- All queries will be sent to OpenAI when you search with Khoj
- You will be charged by OpenAI based on the total tokens processed
- It requires an active internet connection to search and index
Bootstrap Khoj Search for Offline Usage later
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine. Note: Only search can currently run in fully offline mode, not chat.
- With Internet
  - Manually download the asymmetric text, symmetric text and image search models from HuggingFace
  - Pip install khoj (and dependencies) in an associated virtualenv. E.g.
    python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant
- Without Internet
  - Copy each of the search models into their respective folders, asymmetric, symmetric and image, under the ~/.khoj/search/ directory on the air-gapped machine
  - Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g.
    source .venv/bin/activate && khoj
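Before starting Khoj on the air-gapped machine, it may help to confirm that the copied model folders are in place. The sketch below only checks that each of the three folders named above exists and is not empty. It does not verify that the files inside are valid models, and the helper name `missing_model_folders` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the copy step above.
MODEL_FOLDERS = ("asymmetric", "symmetric", "image")


def missing_model_folders(search_root: Path) -> list[str]:
    """Return the model folders that are absent or empty under search_root."""
    missing = []
    for name in MODEL_FOLDERS:
        folder = search_root / name
        # A folder counts as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    root = Path.home() / ".khoj" / "search"
    missing = missing_model_folders(root)
    if missing:
        print(f"Missing or empty under {root}: {', '.join(missing)}")
    else:
        print(f"All model folders under {root} contain files")
```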
Query Filters
Use structured query syntax to filter the entries from your knowledge base that are used for search results or chat responses.
- Word Filter: Get entries that include/exclude a specified term
  - Entries that contain term_to_include: +"term_to_include"
  - Entries that do not contain term_to_exclude: -"term_to_exclude"
- Date Filter: Get entries containing dates in YYYY-MM-DD format from specified date (range)
  - Entries from April 1st 1984: dt:"1984-04-01"
  - Entries after March 31st 1984: dt>="1984-04-01"
  - Entries before April 2nd 1984: dt<="1984-04-01"
- File Filter: Get entries from a specified file
  - Entries from incoming.org file: file:"incoming.org"
- Combined Example
  what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"
  - Adds all filters to the natural language query. It should return entries
    - from the file 1984.org
    - containing dates from the year 1984
    - excluding words "big" and "brother"
    - that best match the natural language query "what is the meaning of life?"
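If you assemble queries programmatically, the filter syntax above can be composed with a small helper. This is an illustrative sketch: `build_query` and its parameter names are hypothetical and not part of Khoj. It only formats the query string and does not send it anywhere.

```python
from typing import Optional, Sequence


def build_query(
    text: str,
    file: Optional[str] = None,
    on_or_after: Optional[str] = None,
    on_or_before: Optional[str] = None,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> str:
    """Append file, date and word filters to a natural language query."""
    parts = [text]
    if file:
        parts.append(f'file:"{file}"')
    # Dates are expected as YYYY-MM-DD strings; they are not validated here.
    if on_or_after:
        parts.append(f'dt>="{on_or_after}"')
    if on_or_before:
        parts.append(f'dt<="{on_or_before}"')
    parts.extend(f'+"{term}"' for term in include)
    parts.extend(f'-"{term}"' for term in exclude)
    return " ".join(parts)


query = build_query(
    "what is the meaning of life?",
    file="1984.org",
    on_or_after="1984-01-01",
    on_or_before="1985-01-01",
    exclude=["big", "brother"],
)
print(query)
# what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"
```

The printed string reproduces the combined example, so it can be pasted into the search box as-is.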