Advanced Usage

Search across Different Languages

To search notes written in multiple languages, you can use a multilingual model.
For example, paraphrase-multilingual-MiniLM-L12-v2 supports 50+ languages and offers good search quality and speed. To use it:

  1. For now, manually update search-type > asymmetric > encoder to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml file. The diff of khoj.yml below illustrates the change:

    asymmetric:
    -  encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    +  encoder: paraphrase-multilingual-MiniLM-L12-v2
      cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
      model_directory: "~/.khoj/search/asymmetric/"
    
  2. Regenerate your content index, for example by opening <khoj-url>/api/update?t=force
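
The re-index step above can also be scripted. The following is a minimal sketch, assuming the server answers a plain GET on the /api/update?t=force path shown above without extra authentication. The function names and the timeout value are illustrative and not part of Khoj.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def build_update_url(khoj_url: str) -> str:
    """Return the forced-update URL for a Khoj server base URL."""
    # Normalise the trailing slash so urljoin appends the path
    # instead of replacing the last path segment.
    base = khoj_url.rstrip("/") + "/"
    return urljoin(base, "api/update?t=force")


def regenerate_index(khoj_url: str, timeout: float = 600.0) -> int:
    """Request a forced re-index and return the HTTP status code.

    Assumption: re-indexing can be slow, so the timeout is generous.
    """
    with urlopen(build_update_url(khoj_url), timeout=timeout) as response:
        return response.status
```

For a server running locally, `regenerate_index("http://localhost:42110")` would request the same URL you would otherwise open in the browser.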

Access Khoj on Mobile

  1. Set up Khoj on your personal server. This can be any always-on machine, e.g. an old computer or a Raspberry Pi
  2. Install Tailscale on your personal server and phone
  3. Open the Khoj web interface of the server from your phone browser.
    It should be http://tailscale-ip-of-server:42110 or http://name-of-server:42110 if you've set up MagicDNS
  4. Click the Add to Homescreen button
  5. Enjoy exploring your notes, documents and images from your phone!
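
If the page does not load on your phone, it can help to first check from another device on the same Tailscale network that the server accepts connections. This is a rough sketch using only the Python standard library. The helper names are hypothetical, and the host name is whatever Tailscale IP or MagicDNS name your server has.

```python
import socket

KHOJ_PORT = 42110  # port used in the URLs above


def khoj_url(host: str, port: int = KHOJ_PORT) -> str:
    """Build the web interface URL for a given host."""
    return f"http://{host}:{port}"


def is_reachable(host: str, port: int = KHOJ_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A `False` result from `is_reachable("name-of-server")` suggests a network or Tailscale issue rather than a problem with the phone browser.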

Setup

  1. To use an OpenAI model for search, set encoder-type, encoder and model-directory under the asymmetric and/or symmetric search-type in your khoj.yml (at ~/.khoj/khoj.yml):
       asymmetric:
    -    encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
    +    encoder: text-embedding-ada-002
    +    encoder-type: khoj.utils.models.OpenAI
         cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
    -    encoder-type: sentence_transformers.SentenceTransformer
    -    model_directory: "~/.khoj/search/asymmetric/"
    +    model-directory: null
    
  2. Set up your OpenAI API key in Khoj
  3. Restart the Khoj server to generate embeddings. This will take longer than with the offline search models.

Warnings

This configuration uses an online model

  • It will send all notes to OpenAI to generate embeddings
  • All queries will be sent to OpenAI when you search with Khoj
  • You will be charged by OpenAI based on the total tokens processed
  • It requires an active internet connection to search and index
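
Because billing depends on the total tokens processed, a back-of-the-envelope estimate before indexing can be useful. The sketch below is only a rough approximation: it assumes about 4 characters per token, which is a common rule of thumb for English text and not an exact tokenizer count. The price per 1,000 tokens is a placeholder you must take from OpenAI's current pricing.

```python
import math
from typing import Iterable


def estimate_tokens(texts: Iterable[str], chars_per_token: float = 4.0) -> int:
    """Roughly estimate the token count of a collection of notes.

    Assumption: chars_per_token is a heuristic, not a tokenizer result.
    """
    total_chars = sum(len(text) for text in texts)
    return math.ceil(total_chars / chars_per_token)


def estimate_cost(
    texts: Iterable[str],
    price_per_1k_tokens: float,
    chars_per_token: float = 4.0,
) -> float:
    """Estimate the embedding cost in the currency of price_per_1k_tokens."""
    tokens = estimate_tokens(texts, chars_per_token)
    return tokens / 1000 * price_per_1k_tokens
```

Note that this only covers the initial indexing. Each search query is also sent to OpenAI and adds to the total.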

Bootstrap Khoj Search for Offline Usage later

You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine. Note: Only search can currently run in fully offline mode, not chat.

  • With Internet
    1. Manually download the asymmetric text, symmetric text and image search models from HuggingFace
    2. Pip install khoj (and dependencies) in an associated virtualenv. E.g. python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant
  • Without Internet
    1. Copy each of the search models into its respective folder (asymmetric, symmetric or image) under the ~/.khoj/search/ directory on the air-gapped machine
    2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g. source .venv/bin/activate && khoj
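
The model copy step on the air-gapped machine can be sketched as a small script. It assumes, hypothetically, that the downloaded models were transferred in a directory with one subfolder per search type (asymmetric, symmetric, image). That layout and the function name are illustrative, not something Khoj prescribes.

```python
import shutil
from pathlib import Path
from typing import List, Optional, Union

SEARCH_TYPES = ("asymmetric", "symmetric", "image")


def install_models(
    download_dir: Union[str, Path],
    khoj_search_dir: Optional[Union[str, Path]] = None,
) -> List[Path]:
    """Copy downloaded model folders into the Khoj search directory.

    Returns the list of target folders that were written.
    """
    source_root = Path(download_dir)
    if khoj_search_dir is None:
        # Default location named in the steps above.
        khoj_search_dir = Path("~/.khoj/search").expanduser()
    target_root = Path(khoj_search_dir)

    installed = []
    for search_type in SEARCH_TYPES:
        source = source_root / search_type
        if not source.is_dir():
            continue  # this model type was not downloaded; skip it
        target = target_root / search_type
        # dirs_exist_ok lets the copy merge into an existing folder (Python 3.8+).
        shutil.copytree(source, target, dirs_exist_ok=True)
        installed.append(target)
    return installed
```

Calling `install_models("/media/usb/khoj-models")` (a made-up path) would populate ~/.khoj/search/ with whichever of the three model folders are present on the transfer medium.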

Query Filters

Use structured query syntax to filter the entries from your knowledge base that are used by search results or chat responses.

  • Word Filter: Get entries that include/exclude a specified term
    • Entries that contain term_to_include: +"term_to_include"
    • Entries that contain term_to_exclude: -"term_to_exclude"
  • Date Filter: Get entries containing dates in YYYY-MM-DD format from specified date (range)
    • Entries from April 1st 1984: dt:"1984-04-01"
    • Entries after March 31st 1984: dt>="1984-04-01"
    • Entries before April 2nd 1984: dt<="1984-04-01"
  • File Filter: Get entries from a specified file
    • Entries from incoming.org file: file:"incoming.org"
  • Combined Example
    • what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"
    • This adds all the filters to the natural language query. It should return entries
      • from the file 1984.org
      • containing dates from the year 1984
      • excluding words "big" and "brother"
      • that best match the natural language query "what is the meaning of life?"
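
When queries are generated by a script, the filter syntax above can be assembled programmatically. The helper below is a minimal sketch. `build_query` and its parameter names are hypothetical. It only concatenates the filters documented above and does not escape quotes inside terms.

```python
from datetime import date
from typing import Iterable, Optional, Union

DateLike = Union[str, date]


def _iso(value: DateLike) -> str:
    """Format a date (or pass through a YYYY-MM-DD string)."""
    return value.isoformat() if isinstance(value, date) else str(value)


def build_query(
    text: str,
    file: Optional[str] = None,
    on_or_after: Optional[DateLike] = None,
    on_or_before: Optional[DateLike] = None,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> str:
    """Append file, date and word filters to a natural language query."""
    parts = [text]
    if file is not None:
        parts.append(f'file:"{file}"')
    if on_or_after is not None:
        parts.append(f'dt>="{_iso(on_or_after)}"')
    if on_or_before is not None:
        parts.append(f'dt<="{_iso(on_or_before)}"')
    parts.extend(f'+"{term}"' for term in include)
    parts.extend(f'-"{term}"' for term in exclude)
    return " ".join(parts)
```

Calling `build_query("what is the meaning of life?", file="1984.org", on_or_after=date(1984, 1, 1), on_or_before=date(1985, 1, 1), exclude=["big", "brother"])` reproduces the combined example above.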