2023-06-21 08:13:50 +00:00
< h1 > < img src = "src/khoj/interface/web/assets/icons/khoj-logo-sideways.svg" width = "330" alt = "Khoj Logo" > < / h1 >
2023-06-21 07:13:21 +00:00
[![test ](https://github.com/khoj-ai/khoj/actions/workflows/test.yml/badge.svg )](https://github.com/khoj-ai/khoj/actions/workflows/test.yml)
[![dockerize ](https://github.com/khoj-ai/khoj/actions/workflows/dockerize.yml/badge.svg )](https://github.com/khoj-ai/khoj/pkgs/container/khoj)
[![pypi ](https://github.com/khoj-ai/khoj/actions/workflows/pypi.yml/badge.svg )](https://pypi.org/project/khoj-assistant/)
2022-07-29 13:06:34 +00:00
2023-06-02 05:10:33 +00:00
*An AI personal assistant for your digital brain*
2022-07-29 13:06:34 +00:00
2023-01-29 02:01:20 +00:00
**Supported Plugins**
2023-06-21 07:13:21 +00:00
[![Khoj on Obsidian ](https://img.shields.io/badge/Obsidian-%23483699.svg?style=for-the-badge&logo=obsidian&logoColor=white )](https://github.com/khoj-ai/khoj/tree/master/src/interface/obsidian#readme)
[![Khoj on Emacs ](https://img.shields.io/badge/Emacs-%237F5AB6.svg?&style=for-the-badge&logo=gnu-emacs&logoColor=white )](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs#readme)
2023-01-29 02:01:20 +00:00
2022-07-29 13:06:34 +00:00
## Table of Contents
- [Features ](#Features )
2023-01-04 21:42:53 +00:00
- [Demos ](#Demos )
- [Khoj in Obsidian ](#khoj-in-obsidian )
- [Khoj in Emacs, Browser ](#khoj-in-emacs-browser )
2022-08-15 21:35:06 +00:00
- [Interfaces ](#Interfaces )
2022-07-29 13:06:34 +00:00
- [Architecture ](#Architecture )
- [Setup ](#Setup )
2022-08-04 20:32:32 +00:00
- [Install ](#1-Install )
2023-02-19 05:43:59 +00:00
- [Run ](#2-Run )
- [Configure ](#3-Configure )
2023-03-10 23:31:59 +00:00
- [Install Plugins ](#4-install-interface-plugins )
2022-07-29 13:06:34 +00:00
- [Use ](#Use )
2023-03-10 23:31:59 +00:00
- [Khoj Search ](#Khoj-search )
- [Khoj Chat ](#Khoj-chat )
2022-07-29 13:06:34 +00:00
- [Upgrade ](#Upgrade )
2023-01-04 22:50:26 +00:00
- [Khoj Server ](#upgrade-khoj-server )
- [Khoj.el ](#upgrade-khoj-on-emacs )
- [Khoj Obsidian ](#upgrade-khoj-on-obsidian )
2023-02-19 05:43:59 +00:00
- [Uninstall ](#uninstall )
2022-07-31 23:42:48 +00:00
- [Troubleshoot ](#Troubleshoot )
2023-01-04 16:33:31 +00:00
- [Advanced Usage ](#advanced-usage )
- [Access Khoj on Mobile ](#access-khoj-on-mobile )
2023-01-12 03:36:56 +00:00
- [Use OpenAI Models for Search ](#use-openai-models-for-search )
2023-02-06 22:38:20 +00:00
- [Search across Different Languages ](#search-across-different-languages )
2023-07-06 03:45:00 +00:00
- [Boostrap Khoj Search for Offline Usage Later ](#bootstrap-khoj-search-for-offline-usage-later )
2022-07-29 13:06:34 +00:00
- [Miscellaneous ](#Miscellaneous )
2023-01-14 03:22:12 +00:00
- [Setup OpenAI API key in Khoj ](#set-your-openai-api-key-in-khoj )
2023-03-10 23:31:59 +00:00
- [GPT API ](#gpt-api )
2022-07-29 13:06:34 +00:00
- [Performance ](#Performance )
- [Query Performance ](#Query-performance )
- [Indexing Performance ](#Indexing-performance )
- [Miscellaneous ](#Miscellaneous-1 )
2022-08-05 02:27:09 +00:00
- [Development ](#Development )
2023-01-05 23:07:00 +00:00
- [Visualize Codebase ](#visualize-codebase )
2023-07-11 03:08:07 +00:00
- [Create Release ](#create-khoj-release )
2022-08-05 02:27:09 +00:00
- [Setup ](#Setup )
- [Using Pip ](#Using-Pip )
- [Using Docker ](#Using-Docker )
2023-02-17 18:52:45 +00:00
- [Validate ](#Validate )
2022-07-31 23:42:48 +00:00
- [Credits ](#Credits )
2022-07-29 13:06:34 +00:00
## Features
2023-03-27 13:51:27 +00:00
- **Search**
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers.
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
- **General**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
2023-07-02 23:49:51 +00:00
- **Multiple Sources**: Index your Org-mode and Markdown notes, PDF files, Github repositories, and Photos
2023-03-27 13:51:27 +00:00
- **Multiple Interfaces**: Interact from your [Web Browser ](./src/khoj/interface/web/index.html ), [Emacs ](./src/interface/emacs/khoj.el ) or [Obsidian ](./src/interface/obsidian/ )
2022-07-29 13:06:34 +00:00
2023-01-04 21:42:53 +00:00
## Demos
### Khoj in Obsidian
2023-06-21 07:13:21 +00:00
https://github.com/khoj-ai/khoj/assets/6413477/3e33d8ea-25bb-46c8-a3bf-c92f78d0f56b
2022-07-29 13:06:34 +00:00
2023-01-04 23:07:32 +00:00
< details > < summary > Description< / summary >
2023-01-04 21:42:53 +00:00
2023-07-02 02:07:59 +00:00
1. Install Khoj via `pip` and start Khoj backend in a terminal (Run `khoj` )
```
python -m pip install khoj-assistant
khoj
```
2. Install Khoj plugin via Community Plugins settings pane on Obsidian app
- Check the new Khoj plugin settings
- Let Khoj backend index the markdown, pdf, Github markdown files in the current Vault
- Open Khoj plugin on Obsidian via Search button on Left Pane
- Search \"*Announce plugin to folks*\" in the [Obsidian Plugin docs ](https://marcus.se.net/obsidian-plugin-docs/ )
- Jump to the [search result ](https://marcus.se.net/obsidian-plugin-docs/publishing/submit-your-plugin )
2023-01-04 23:07:32 +00:00
< / details >
2023-01-04 21:42:53 +00:00
### Khoj in Emacs, Browser
2022-08-15 23:15:43 +00:00
https://user-images.githubusercontent.com/6413477/184735169-92c78bf1-d827-4663-9087-a1ea194b8f4b.mp4
2022-07-29 13:06:34 +00:00
2023-01-04 23:07:32 +00:00
< details > < summary > Description< / summary >
2022-07-29 13:06:34 +00:00
2022-08-15 23:15:43 +00:00
- Install Khoj via pip
- Start Khoj app
2023-06-21 07:13:21 +00:00
- Add this readme and [khoj.el readme ](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs ) as org-mode for Khoj to index
2022-08-15 23:15:43 +00:00
- Search \"*Setup editor*\" on the Web and Emacs. Re-rank the results for better accuracy
2023-06-21 07:13:21 +00:00
- Top result is what we are looking for, the [section to Install Khoj.el on Emacs ](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs#2-Install-Khojel )
2023-01-04 23:07:32 +00:00
< / details >
2022-07-29 13:06:34 +00:00
2023-01-04 23:07:32 +00:00
< details > < summary > Analysis< / summary >
2022-07-29 13:06:34 +00:00
2022-08-16 13:38:07 +00:00
- The results do not have any words used in the query
2022-07-29 13:06:34 +00:00
- *Based on the top result it seems the re-ranking model understands that Emacs is an editor?*
- The results incrementally update as the query is entered
2022-08-15 21:35:06 +00:00
- The results are re-ranked, for better accuracy, once user hits enter
2023-01-04 23:07:32 +00:00
< / details >
2022-08-15 21:35:06 +00:00
### Interfaces
2023-06-21 07:13:21 +00:00
![](https://github.com/khoj-ai/khoj/blob/master/docs/interfaces.png?)
2022-07-29 13:06:34 +00:00
## Architecture
2023-06-21 07:13:21 +00:00
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_architecture.png?)
2022-07-29 13:06:34 +00:00
## Setup
2023-01-04 21:42:53 +00:00
These are the general setup instructions for Khoj.
2023-06-29 21:54:51 +00:00
- Make sure [python ](https://realpython.com/installing-python/ ) and [pip ](https://pip.pypa.io/en/stable/installation/ ) are installed on your machine
2023-06-21 07:13:21 +00:00
- Check the [Khoj.el Readme ](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs#Setup ) to setup Khoj with Emacs< br />
2023-03-27 13:51:27 +00:00
Its simpler as it can skip the server *install* , *run* and *configure* step below.
2023-06-21 07:13:21 +00:00
- Check the [Khoj Obsidian Readme ](https://github.com/khoj-ai/khoj/tree/master/src/interface/obsidian#Setup ) to setup Khoj with Obsidian< br />
2023-03-27 13:51:27 +00:00
Its simpler as it can skip the *configure* step below.
2023-01-04 21:42:53 +00:00
2022-08-04 20:32:32 +00:00
### 1. Install
2023-07-10 18:54:57 +00:00
Run the following command in your terminal to install the Khoj backend.
2023-04-16 13:17:20 +00:00
- On Linux/MacOS
```shell
python -m pip install khoj-assistant
```
- On Windows
```shell
py -m pip install khoj-assistant
```
2022-07-29 13:06:34 +00:00
2023-02-19 05:43:59 +00:00
### 2. Run
2022-07-29 13:06:34 +00:00
2023-07-10 18:54:57 +00:00
Run the following commmand from your terminal to start the Khoj backend and open Khoj in your browser.
2022-08-16 13:38:07 +00:00
```shell
2023-07-10 18:54:57 +00:00
khoj --gui
2022-08-16 13:38:07 +00:00
```
2022-08-15 21:35:06 +00:00
2023-02-19 05:43:59 +00:00
Note: To start Khoj automatically in the background use [Task scheduler ](https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10 ) on Windows or [Cron ](https://en.wikipedia.org/wiki/Cron ) on Mac, Linux (e.g with `@reboot khoj` )
2022-08-16 13:38:07 +00:00
### 3. Configure
2023-07-10 18:54:57 +00:00
1. Set `File` , `Folder` and hit `Save` in each Plugins you want to enable for Search on the Khoj config page
2. Add your OpenAI API key to Chat Feature settings if you want to use Chat
3. Click `Configure` and wait. The app will download ML models and index the content for search and (optionally) chat
2022-08-15 21:35:06 +00:00
2023-03-10 23:31:59 +00:00
### 4. Install Interface Plugins
2023-07-10 18:54:57 +00:00
Khoj exposes a web interface to search, chat and configure by default.< br / >
2023-03-10 23:31:59 +00:00
The optional steps below allow using Khoj from within an existing application like Obsidian or Emacs.
- **Khoj Obsidian**:< br />
2023-06-21 07:13:21 +00:00
[Install ](https://github.com/khoj-ai/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin ) the Khoj Obsidian plugin
2023-03-10 23:31:59 +00:00
- **Khoj Emacs**:< br />
2023-06-21 07:13:21 +00:00
[Install ](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs#2-Install-Khojel ) khoj.el
2022-07-29 13:06:34 +00:00
2023-03-10 23:31:59 +00:00
## Use
### Khoj Search
2023-01-04 21:42:53 +00:00
- **Khoj via Obsidian**
2023-01-04 23:07:32 +00:00
- Click the *Khoj search* icon 🔎 on the [Ribbon ](https://help.obsidian.md/User+interface/Workspace/Ribbon ) or Search for *Khoj: Search* in the [Command Palette ](https://help.obsidian.md/Plugins/Command+palette )
2022-07-29 13:06:34 +00:00
- **Khoj via Emacs**
- Run `M-x khoj <user-query>`
2023-01-04 21:42:53 +00:00
- **Khoj via Web**
2023-07-11 03:16:25 +00:00
- Open < http: // localhost:42110 /> directly
2022-07-29 13:06:34 +00:00
- **Khoj via API**
2023-07-11 03:16:25 +00:00
- See the Khoj FastAPI [Swagger Docs ](http://localhost:42110/docs ), [ReDocs ](http://localhost:42110/redocs )
2022-07-29 13:06:34 +00:00
2023-03-10 23:31:59 +00:00
< details > < summary > Query Filters< / summary >
2022-12-26 19:12:49 +00:00
Use structured query syntax to filter the natural language search results
- **Word Filter**: Get entries that include/exclude a specified term
- Entries that contain term_to_include: `+"term_to_include"`
- Entries that contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from specified date (range)
- Entries from April 1st 1984: `dt:"1984-04-01"`
- Entries after March 31st 1984: `dt>="1984-04-01"`
- Entries before April 2nd 1984 : `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
- Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
- `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
- Adds all filters to the natural language query. It should return entries
- from the file *1984.org*
- containing dates from the year *1984*
- excluding words *"big"* and *"brother"*
- that best match the natural language query *"what is the meaning of life?"*
2023-03-10 23:31:59 +00:00
< / details >
### Khoj Chat
#### Overview
- Creates a personal assistant for you to inquire and engage with your notes
- Uses [ChatGPT ](https://openai.com/blog/chatgpt ) and [Khoj search ](#khoj-search )
- Supports multi-turn conversations with the relevant notes for context
- Shows reference notes used to generate a response
- **Note**: *Your query and top notes from khoj search will be sent to OpenAI for processing*
#### Setup
- [Setup your OpenAI API key in Khoj ](#set-your-openai-api-key-in-khoj )
#### Use
2023-07-11 03:16:25 +00:00
1. Open [/chat ](http://localhost:42110/chat )[^2]
2023-03-10 23:31:59 +00:00
2. Type your queries and see response by Khoj from your notes
#### Demo
2023-06-21 07:13:21 +00:00
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_chat_web_interface.png?)
2023-03-10 23:31:59 +00:00
### Details
1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
2. These notes, the last few messages and associated metadata is passed to ChatGPT along with your query for a response
2022-07-29 13:06:34 +00:00
## Upgrade
2023-01-04 22:50:26 +00:00
### Upgrade Khoj Server
2022-08-16 13:38:07 +00:00
```shell
2022-08-04 20:32:32 +00:00
pip install --upgrade khoj-assistant
2022-07-29 13:06:34 +00:00
```
2023-02-19 05:43:59 +00:00
*Note: To upgrade to the latest pre-release version of the khoj server run below command*
```shell
# Maps to the latest commit on the master branch
pip install --upgrade --pre khoj-assistant
```
2023-02-17 18:52:45 +00:00
2023-01-04 22:50:26 +00:00
### Upgrade Khoj on Emacs
- Use your Emacs Package Manager to Upgrade
2023-06-21 07:13:21 +00:00
- See [khoj.el readme ](https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs#Upgrade ) for details
2023-01-04 22:50:26 +00:00
### Upgrade Khoj on Obsidian
- Upgrade via the Community plugins tab on the settings pane in the Obsidian app
2023-06-21 07:13:21 +00:00
- See the [khoj plugin readme ](https://github.com/khoj-ai/khoj/tree/master/src/interface/obsidian#2-Setup-Plugin ) for details
2023-01-04 22:50:26 +00:00
2023-02-19 05:43:59 +00:00
## Uninstall
2023-02-06 22:38:20 +00:00
1. (Optional) Hit `Ctrl-C` in the terminal running the khoj server to stop it
2. Delete the khoj directory in your home folder (i.e `~/.khoj` on Linux, Mac or `C:\Users\<your-username>\.khoj` on Windows)
3. Uninstall the khoj server with `pip uninstall khoj-assistant`
4. (Optional) Uninstall khoj.el or the khoj obsidian plugin in the standard way on Emacs, Obsidian
2022-07-31 23:42:48 +00:00
## Troubleshoot
2022-07-29 13:06:34 +00:00
2023-01-14 02:03:15 +00:00
#### Install fails while building Tokenizer dependency
- **Details**: `pip install khoj-assistant` fails while building the `tokenizers` dependency. Complains about Rust.
- **Fix**: Install Rust to build the tokenizers package. For example on Mac run:
2023-01-05 23:09:36 +00:00
```shell
brew install rustup
rustup-init
source ~/.cargo/env
```
2023-06-21 07:13:21 +00:00
- **Refer**: [Issue with Fix ](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946 ) for more details
2022-07-29 13:06:34 +00:00
2023-01-14 02:03:15 +00:00
#### Search starts giving wonky results
2023-07-11 03:16:25 +00:00
- **Fix**: Open [/api/update?force=true ](http://localhost:42110/api/update?force=true )[^2] in browser to regenerate index from scratch
2023-01-14 02:03:15 +00:00
- **Note**: *This is a fix for when you percieve the search results have degraded. Not if you think they've always given wonky results*
#### Khoj in Docker errors out with \"Killed\" in error message
- **Fix**: Increase RAM available to Docker Containers in Docker Settings
- **Refer**: [StackOverflow Solution ](https://stackoverflow.com/a/50770267 ), [Configure Resources on Docker for Mac ](https://docs.docker.com/desktop/mac/#resources )
#### Khoj errors out complaining about Tensors mismatch or null
- **Mitigation**: Disable `image` search using the desktop GUI
## Advanced Usage
2023-01-14 03:22:12 +00:00
### Access Khoj on Mobile
2023-01-04 16:33:31 +00:00
1. [Setup Khoj ](#Setup ) on your personal server. This can be any always-on machine, i.e an old computer, RaspberryPi(?) etc
2. [Install ](https://tailscale.com/kb/installation/ ) [Tailscale ](tailscale.com/ ) on your personal server and phone
2023-07-11 03:16:25 +00:00
3. Open the Khoj web interface of the server from your phone browser.< br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've setup [MagicDNS ](https://tailscale.com/kb/1081/magicdns/ )
2023-01-12 03:36:56 +00:00
4. Click the [Add to Homescreen ](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen ) button
2023-07-02 23:49:51 +00:00
5. Enjoy exploring your notes, documents and images from your phone!
2023-01-04 16:33:31 +00:00
2023-06-21 07:13:21 +00:00
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_pwa_android.png?)
2023-01-04 18:38:57 +00:00
2023-01-14 03:22:12 +00:00
### Use OpenAI Models for Search
#### Setup
2023-01-12 03:36:56 +00:00
1. Set `encoder-type` , `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml` [^1]:
```diff
asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+ encoder: text-embedding-ada-002
2023-05-28 04:50:26 +00:00
+ encoder-type: khoj.utils.models.OpenAI
2023-01-12 03:36:56 +00:00
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
- encoder-type: sentence_transformers.SentenceTransformer
- model_directory: "~/.khoj/search/asymmetric/"
+ model-directory: null
```
2023-01-14 03:22:12 +00:00
2. [Setup your OpenAI API key in Khoj ](#set-your-openai-api-key-in-khoj )
3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
2023-01-12 03:36:56 +00:00
2023-01-14 03:22:12 +00:00
#### Warnings
This configuration *uses an online model*
- It will **send all notes to OpenAI** to generate embeddings
- **All queries will be sent to OpenAI** when you search with Khoj
- You will be **charged by OpenAI** based on the total tokens processed
- It *requires an active internet connection* to search and index
2023-02-06 22:38:20 +00:00
### Search across Different Languages
To search for notes in multiple, different languages, you can use a [multi-lingual model ](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models ).< br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2 ](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 ) supports [50+ languages ](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages ), has good search quality and speed. To use it:
2023-03-19 23:23:49 +00:00
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
2023-02-06 22:38:20 +00:00
```diff
asymmetric:
2023-03-22 02:38:17 +00:00
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
2023-03-19 23:23:49 +00:00
+ encoder: "paraphrase-multilingual-MiniLM-L12-v2"
2023-02-06 22:38:20 +00:00
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
model_directory: "~/.khoj/search/asymmetric/"
```
2023-07-11 03:16:25 +00:00
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force ](http://localhost:42110/api/update?t=force )
2023-01-14 03:22:12 +00:00
2023-07-06 03:45:00 +00:00
### Bootstrap Khoj Search for Offline Usage later
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
Note: *Only search can currently run in fully offline mode, not chat.*
- With Internet
1. Manually download the [asymmetric text ](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1 ), [symmetric text ](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 )and [image search ](https://huggingface.co/sentence-transformers/clip-ViT-B-32 ) models from HuggingFace
2. Pip install khoj (and dependencies) in an associated virtualenv. E.g `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
- Without Internet
1. Copy each of the search models into their respective folders, `asymmetric` , `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start and khoj as normal. E.g `source .venv/bin/activate && khoj`
2023-01-14 03:22:12 +00:00
## Miscellaneous
### Set your OpenAI API key in Khoj
If you want, Khoj can be configured to use OpenAI for search and chat.< br / >
Add your OpenAI API to Khoj by using either of the two options below:
2023-07-11 03:16:25 +00:00
- Open your [Khoj settings ](http://localhost:42110/config/processor/conversation ), add your OpenAI API key, and click *Save* . Then go to your [Khoj settings ](http://localhost:42110/config ) and click `Configure` . This will refresh Khoj with your OpenAI API key.
2023-01-14 03:22:12 +00:00
- Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml` [^1] to your [OpenAI API key ](https://beta.openai.com/account/api-keys ) and restart khoj:
```diff
2023-01-12 03:36:56 +00:00
processor:
conversation:
2023-01-14 03:22:12 +00:00
- openai-api-key: # "YOUR_OPENAI_API_KEY"
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
2023-01-12 03:36:56 +00:00
model: "text-davinci-003"
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
2023-01-14 03:22:12 +00:00
```
2023-01-12 03:36:56 +00:00
2023-03-10 23:31:59 +00:00
**Warning**: *This will enable Khoj to send your query and note(s) to OpenAI for processing*
2023-01-12 03:36:56 +00:00
2023-03-10 23:31:59 +00:00
### GPT API
2023-07-11 03:16:25 +00:00
- The [chat ](http://localhost:42110/api/chat ), [answer ](http://localhost:42110/api/beta/answer ) and [search ](http://localhost:42110/api/beta/search ) API endpoints use [OpenAI API ](https://openai.com/api/ )
2023-01-14 03:22:12 +00:00
- They are disabled by default
- To use them:
1. [Setup your OpenAI API key in Khoj ](#set-your-openai-api-key-in-khoj )
2023-07-11 03:16:25 +00:00
2. Interact with them from the [Khoj Swagger docs ](http://locahost:42110/docs )[^2]
2022-07-29 13:06:34 +00:00
2023-07-01 09:13:19 +00:00
### Index Github Repository for Search, Chat
The Khoj Github plugin can index issues, commit messages and markdown, org-mode and PDF files from any repositories you have access to. This allows you to chat or search with these repositories. Get answers, resolve issues or just explore a repo with the help of your AI personal assistant.
2023-06-13 23:55:58 +00:00
2023-07-01 09:13:19 +00:00
See the [Khoj FAQ ](https://faq.khoj.dev ) for a demo of Khoj search and chat. It makes the Khoj github repo available for exploring.
Note: *Khoj will ignore code files in the repository for now as the default AI model used works best with natural language text, not code.*
#### Setup Khoj Github plugin
2023-06-13 23:55:58 +00:00
1. Get a [pat token ](https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token ) with `repo` and `read:org` scopes in the classic flow.
2023-07-01 09:13:19 +00:00
2. Configure Khoj settings to include the `owner` and `repo_name` . The `owner` will be the organization name if the repo is in an organization. The `repo_name` will be the name of the repository. Optionally, you can also supply a branch name. If no branch name is supplied, the `master` branch will be used.
2022-07-29 13:06:34 +00:00
2022-08-05 02:27:09 +00:00
## Performance
### Query performance
2022-08-17 15:32:55 +00:00
- Semantic search using the bi-encoder is fairly fast at \<50 ms
2022-08-05 02:27:09 +00:00
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
2022-09-07 11:51:03 +00:00
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
2022-08-05 02:27:09 +00:00
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
2022-09-07 11:51:03 +00:00
- Indexing 100K+ line corpus of notes takes about 10 minutes
2022-08-05 02:27:09 +00:00
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
2022-09-07 11:10:38 +00:00
- Note: *It should only take this long on the first run* as the index is incrementally updated
2022-08-05 02:27:09 +00:00
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet
2022-08-04 20:32:32 +00:00
## Development
2023-01-05 23:07:00 +00:00
### Visualize Codebase
*[Interactive Visualization](https://mango-dune-07a8b7110.1.azurestaticapps.net/?repo=debanjum%2Fkhoj)*
2023-06-21 07:13:21 +00:00
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_codebase_visualization_0.2.1.png?)
2023-01-05 23:07:00 +00:00
2023-07-10 22:59:34 +00:00
### Create Khoj Release
Follow the steps below to [release ](https://github.com/debanjum/khoj/releases/ ) Khoj. This will create a stable release of Khoj on [Pypi ](https://pypi.org/project/khoj-assistant/ ), [Melpa ](https://stable.melpa.org/#%252Fkhoj ) and [Obsidian ](https://obsidian.md/plugins?id%253Dkhoj ). It will also create desktop apps of Khoj and attach them to the latest release.
1. Create and tag release commit by running the bump_version script. The release commit sets version number in required metadata files.
```shell
./scripts/bump_version.sh -c "< release_version > "
```
2. Push commit and then the tag to trigger the release workflow to create Release with auto generated release notes.
```shell
git push origin master # push release commit to khoj repository
git push origin < release_version > # push release tag to khoj repository
```
3. [Optional] Update the Release Notes to highlight new features, fixes and updates
2022-08-04 20:32:32 +00:00
### Setup
2022-08-05 01:59:52 +00:00
#### Using Pip
2022-08-05 02:27:09 +00:00
##### 1. Install
2022-08-16 13:38:07 +00:00
```shell
2023-02-19 05:43:59 +00:00
# Get Khoj Code
2023-06-21 07:13:21 +00:00
git clone https://github.com/khoj-ai/khoj & & cd khoj
2023-02-19 05:43:59 +00:00
# Create, Activate Virtual Environment
2022-08-16 13:38:07 +00:00
python3 -m venv .venv & & source .venv/bin/activate
2023-02-19 05:43:59 +00:00
# Install Khoj for Development
2023-02-17 18:52:45 +00:00
pip install -e .[dev]
2022-08-16 13:38:07 +00:00
```
2022-08-05 02:27:09 +00:00
2023-02-17 18:52:45 +00:00
##### 2. Run
1. Start Khoj
```shell
khoj -vv
```
2. Configure Khoj
2023-07-11 03:16:25 +00:00
- **Via the Settings UI**: Add files, directories to index the [Khoj settings ](http://localhost:42110/config ) UI once Khoj has started up. Once you've saved all your settings, click `Configure` .
2023-02-17 18:52:45 +00:00
- **Manually**:
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type` and `processor` sub-section(s) irrelevant for your use-case
- Restart khoj
2022-08-05 02:27:09 +00:00
2023-07-02 23:49:51 +00:00
Note: Wait after configuration for khoj to Load ML model, generate embeddings and expose API to query notes, images, documents etc specified in config YAML
2022-08-05 02:27:09 +00:00
2022-08-04 20:32:32 +00:00
#### Using Docker
2022-08-05 02:27:09 +00:00
##### 1. Clone
2022-07-29 13:06:34 +00:00
2022-08-16 13:38:07 +00:00
```shell
2023-06-21 07:13:21 +00:00
git clone https://github.com/khoj-ai/khoj & & cd khoj
2022-08-04 20:32:32 +00:00
```
2022-07-29 13:06:34 +00:00
2022-08-05 02:27:09 +00:00
##### 2. Configure
2022-08-02 18:12:27 +00:00
2023-07-02 23:49:51 +00:00
- **Required**: Update [docker-compose.yml ](./docker-compose.yml ) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
2022-08-04 20:32:32 +00:00
- **Optional**: Edit application configuration in [khoj_docker.yml ](./config/khoj_docker.yml )
2022-08-02 18:12:27 +00:00
2022-08-05 02:27:09 +00:00
##### 3. Run
2022-08-02 18:12:27 +00:00
2022-08-16 13:38:07 +00:00
```shell
2022-08-04 20:32:32 +00:00
docker-compose up -d
```
2022-08-02 18:12:27 +00:00
2022-08-04 20:32:32 +00:00
*Note: The first run will take time. Let it run, it\'s mostly not hung, just generating embeddings*
2022-08-02 18:12:27 +00:00
2022-08-05 02:27:09 +00:00
##### 4. Upgrade
2022-08-16 13:38:07 +00:00
```shell
2022-08-05 02:27:09 +00:00
docker-compose build --pull
```
2023-02-17 18:52:45 +00:00
### Validate
#### Before Make Changes
2023-02-17 20:29:12 +00:00
1. Install Git Hooks for Validation
2023-02-17 18:52:45 +00:00
```shell
2023-02-17 20:29:12 +00:00
pre-commit install -t pre-push -t pre-commit
2023-02-17 18:52:45 +00:00
```
2023-02-17 20:29:12 +00:00
- This ensures standard code formatting fixes and other checks run automatically on every commit and push
2023-02-17 18:52:45 +00:00
- Note 1: If [pre-commit ](https://pre-commit.com/#intro ) didn't already get installed, [install it ](https://pre-commit.com/#install ) via `pip install pre-commit`
2023-02-17 20:29:12 +00:00
- Note 2: To run the pre-commit changes manually, use `pre-commit run --hook-stage manual --all` before creating PR
2023-02-17 18:52:45 +00:00
#### Before Creating PR
2023-06-13 23:55:58 +00:00
1. Run Tests. If you get an error complaining about a missing `fast_tokenizer_file` , follow the solution [in this Github issue ](https://github.com/UKPLab/sentence-transformers/issues/1659 ).
2023-02-17 18:52:45 +00:00
```shell
pytest
```
2. Run MyPy to check types
```shell
mypy --config-file pyproject.toml
```
#### After Creating PR
- Automated [validation workflows ](.github/workflows ) run for every PR.
Ensure any issues seen by them our fixed
- Test the python packge created for a PR
1. Download and extract the zipped `.whl` artifact generated from the pypi workflow run for the PR.
2. Install (in your virtualenv) with `pip install /path/to/download*.whl>`
3. Start and use the application to see if it works fine
2022-07-29 13:06:34 +00:00
2022-07-31 23:42:48 +00:00
## Credits
2022-07-29 13:06:34 +00:00
- [Multi-QA MiniLM Model ](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1 ), [All MiniLM Model ](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 ) for Text Search. See [SBert Documentation ](https://www.sbert.net/examples/applications/retrieve_rerank/README.html )
- [OpenAI CLIP Model ](https://github.com/openai/CLIP ) for Image Search. See [SBert Documentation ](https://www.sbert.net/examples/applications/image-search/README.html )
- Charles Cave for [OrgNode Parser ](http://members.optusnet.com.au/~charles57/GTD/orgnode.html )
- [Org.js ](https://mooz.github.io/org-js/ ) to render Org-mode results on the Web interface
- [Markdown-it ](https://github.com/markdown-it/markdown-it ) to render Markdown results on the Web interface
2023-01-14 03:22:12 +00:00
[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
2023-07-11 03:16:25 +00:00
[^2]: Default Khoj url @ http://localhost:42110