Resolve merge conflicts in dependency imports

sabaimran 2023-11-19 11:42:20 -08:00
commit ef5e9d66c1
43 changed files with 450 additions and 428 deletions

@@ -9,7 +9,7 @@
</div> </div>
<div align="center"> <div align="center">
<b>An AI personal assistant for your digital brain</b> <b>An AI copilot for your Second Brain</b>
</div> </div>
@@ -24,30 +24,29 @@
</div> </div>
## Introduction ## Introduction
Welcome to the Khoj Docs! This is the best place to [get started](./setup.md) with Khoj. Welcome to the Khoj Docs! This is the best place to get setup and explore Khoj's features.
- Khoj is a desktop application to [search](./search.md) and [chat](./chat.md) with your notes, documents and images - Khoj is an open source, personal AI
- It is an offline-first, open source AI personal assistant accessible from your [Emacs](./emacs.md), [Obsidian](./obsidian.md) or [Web browser](./web.md) - You can [chat](chat.md) with it about anything. When relevant, it'll use any notes or documents you shared with it to respond
- It works with jpeg, markdown, [notion](./notion_integration.md) org-mode, pdf files and [github repositories](./github_integration.md) - Quickly [find](search.md) relevant notes and documents using natural language
- If you have more questions, check out the [FAQ](https://faq.khoj.dev/) - it's a live Khoj instance indexing our Github repository! - It understands pdf, plaintext, markdown, org-mode files, [notion pages](notion_integration.md) and [github repositories](github_integration.md)
- Access it from your [Emacs](emacs.md), [Obsidian](obsidian.md), [Web browser](web.md) or the [Khoj Desktop app](desktop.md)
- You can self-host Khoj on your consumer hardware or share it with your family, friends or team from your private cloud
## Quickstart ## Quickstart
[Click here](./setup.md) for full setup instructions - [Try Khoj Cloud](https://app.khoj.dev) to get started quickly
- [Read these instructions](./setup.md) to self-host a private instance of Khoj
```shell
pip install khoj-assistant && khoj
```
## Overview ## Overview
<img src="https://docs.khoj.dev/assets/khoj_search_on_web.png" width="400px"> <img src="https://docs.khoj.dev/assets/khoj_search_on_web.png" width="400px">
<span>&nbsp;&nbsp;</span> <span>&nbsp;&nbsp;</span>
<img src="https://docs.khoj.dev/assets/khoj_chat_on_web.png" width="400px"> <img src="https://docs.khoj.dev/assets/khoj_chat_on_web.png" width="400px">
#### [Search](./search.md) #### [Search](search.md)
- **Local**: Your personal data stays local. All search and indexing is done on your machine. - **Natural**: Use natural language queries to quickly find relevant notes and documents.
- **Incremental**: Incremental search for a fast, search-as-you-type experience - **Incremental**: Incremental search for a fast, search-as-you-type experience
#### [Chat](./chat.md) #### [Chat](chat.md)
- **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers. - **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers.
- **Iterative discovery**: Iteratively explore and (re-)discover your notes - **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation - **Assisted creativity**: Smoothly weave across answers retrieval and content generation

@@ -1,12 +1,13 @@
- Get Started - Get Started
- [Overview](README.md) - [Overview](README.md)
- [Install](setup.md) - [Self-Host](setup.md)
- [Demos](demos.md) - [Demos](demos.md)
- Use - Use
- [Features](features.md) - [Features](features.md)
- [Chat](chat.md) - [Chat](chat.md)
- [Search](search.md) - [Search](search.md)
- Interfaces - Clients
- [Desktop](desktop.md)
- [Obsidian](obsidian.md) - [Obsidian](obsidian.md)
- [Emacs](emacs.md) - [Emacs](emacs.md)
- [Web](web.md) - [Web](web.md)

@@ -1,63 +1,11 @@
## Advanced Usage ## Advanced Usage
### Search across Different Languages
### Search across Different Languages (Self-Hosting)
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br /> To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it: For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration: 1. Manually update the search config in server's admin settings page. Go to [the search config](http://localhost:42110/server/admin/database/searchmodelconfig/). Either create a new one, if none exists, or update the existing one. Set the bi_encoder to `sentence-transformers/multi-qa-MiniLM-L6-cos-v1` and the cross_encoder to `cross-encoder/ms-marco-MiniLM-L-6-v2`.
2. Regenerate your content index from all the relevant clients. This step is very important, as you'll need to re-encode all your content with the new model.
```diff
asymmetric:
- encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
+ encoder: paraphrase-multilingual-MiniLM-L12-v2
cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
model_directory: "~/.khoj/search/asymmetric/"
```
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
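The force-update link in the step above can also be assembled programmatically. This is a minimal Python sketch, assuming a self-hosted server at `http://localhost:42110` as in the link above. The `force_update_url` helper is a hypothetical name used only for illustration and is not part of Khoj.

```python
from urllib.parse import urlencode, urljoin


def force_update_url(base_url: str = "http://localhost:42110") -> str:
    """Build the <khoj-url>/api/update?t=force link (hypothetical helper)."""
    # The path and the t=force parameter are copied from the step above.
    return urljoin(base_url, "/api/update") + "?" + urlencode({"t": "force"})


if __name__ == "__main__":
    print(force_update_url())
```

Opening the resulting link in a browser is what the step above describes. The sketch only builds the string and does not contact the server.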
### Access Khoj on Mobile
1. [Setup Khoj](/#/setup) on your personal server. This can be any always-on machine, e.g. an old computer, Raspberry Pi(?) etc
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](tailscale.com/) on your personal server and phone
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've setup [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
5. Enjoy exploring your notes, documents and images from your phone!
![](./assets/khoj_pwa_android.png?)
### Use OpenAI Models for Search
#### Setup
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml` (at `~/.khoj/khoj.yml`):
```diff
asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+ encoder: text-embedding-ada-002
+ encoder-type: khoj.utils.models.OpenAI
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
- encoder-type: sentence_transformers.SentenceTransformer
- model_directory: "~/.khoj/search/asymmetric/"
+ model-directory: null
```
2. [Setup your OpenAI API key in Khoj](/#/chat?id=setup)
3. Restart Khoj server to generate embeddings. It will take longer than with the offline search models.
#### Warnings
This configuration *uses an online model*
- It will **send all notes to OpenAI** to generate embeddings
- **All queries will be sent to OpenAI** when you search with Khoj
- You will be **charged by OpenAI** based on the total tokens processed
- It *requires an active internet connection* to search and index
### Bootstrap Khoj Search for Offline Usage later
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
Note: *Only search can currently run in fully offline mode, not chat.*
- With Internet
1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
2. Pip install khoj (and dependencies) in an associated virtualenv. E.g `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
- Without Internet
1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g `source .venv/bin/activate && khoj`
### Query Filters ### Query Filters

Binary file not shown. Size: 298 KiB

Binary file not shown. Size: 333 KiB

@@ -1,13 +1,13 @@
### Khoj Chat ## Khoj Chat
#### Overview ### Overview
- Creates a personal assistant for you to inquire and engage with your notes - Creates a personal assistant for you to inquire and engage with your notes
- You can choose to use Online or Offline Chat depending on your requirements - You can choose to use Online or Offline Chat depending on your requirements
- Supports multi-turn conversations with the relevant notes for context - Supports multi-turn conversations with the relevant notes for context
- Shows reference notes used to generate a response - Shows reference notes used to generate a response
### Setup ### Setup (Self-Hosting)
#### Offline Chat #### Offline Chat
Offline chat stays completely private and works without internet. But it is slower, lower quality and more compute intensive. Offline chat stays completely private and works without internet using open-source models.
> **System Requirements**: > **System Requirements**:
> - Minimum 8 GB RAM. Recommend **16 GB VRAM** > - Minimum 8 GB RAM. Recommend **16 GB VRAM**
@@ -15,9 +15,10 @@ Offline chat stays completely private and works without internet. But it is slow
> - A CPU supporting [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) is required > - A CPU supporting [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) is required
> - A Mac M1+ or [Vulkan supported GPU](https://vulkan.gpuinfo.org/) should significantly speed up chat response times > - A Mac M1+ or [Vulkan supported GPU](https://vulkan.gpuinfo.org/) should significantly speed up chat response times
- Open your [Khoj settings](http://localhost:42110/config/) and click *Enable* on the Offline Chat card 1. Open your [Khoj offline settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and click *Enable* on the Offline Chat configuration.
2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. Make sure to use `Offline` as its type. We currently only support offline models that use the [Llama chat prompt](https://replicate.com/blog/how-to-prompt-llama#wrap-user-input-with-inst-inst-tags) format. We recommend using `mistral-7b-instruct-v0.1.Q4_0.gguf`.
![Configure offline chat](https://user-images.githubusercontent.com/6413477/257021364-8a2029f5-dc21-4de8-9af9-9ba6100d695c.mp4 ':include :type=mp4') !> **Note**: Offline chat is not supported for a multi-user scenario. The host machine will encounter segmentation faults if multiple users try to use offline chat at the same time.
#### Online Chat #### Online Chat
Online chat requires internet to use ChatGPT but is faster, higher quality and less compute intensive. Online chat requires internet to use ChatGPT but is faster, higher quality and less compute intensive.
@@ -25,14 +26,12 @@ Online chat requires internet to use ChatGPT but is faster, higher quality and l
!> **Warning**: This will enable Khoj to send your chat queries and query relevant notes to OpenAI for processing !> **Warning**: This will enable Khoj to send your chat queries and query relevant notes to OpenAI for processing
1. Get your [OpenAI API Key](https://platform.openai.com/account/api-keys) 1. Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)
2. Open your [Khoj Online Chat settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key. 2. Open your [Khoj Online Chat settings](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/). Add a new setting with your OpenAI API key, and click *Save*. Only one configuration will be used, so make sure that's the only one you have.
3. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the OpenAI chat model you want to use. Make sure to use `OpenAI` as its type.
![Configure online chat](https://user-images.githubusercontent.com/6413477/256998908-ac26e55e-13a2-45fb-9348-3b90a62f7687.mp4 ':include :type=mp4')
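The admin pages referenced in the setup steps above all sit under the same server path. As a rough illustration, the Python sketch below builds those links for a self-hosted instance. The base URL `http://localhost:42110` and the paths are copied from the steps above. The `ADMIN_PAGES` table, its labels and the `admin_url` helper are hypothetical names invented for this example and are not part of Khoj.

```python
from urllib.parse import urljoin

# Paths copied from the setup steps above; the dictionary keys are made-up labels.
ADMIN_PAGES = {
    "offline_chat": "/server/admin/database/offlinechatprocessorconversationconfig/",
    "openai_chat": "/server/admin/database/openaiprocessorconversationconfig/",
    "chat_models": "/server/admin/database/chatmodeloptions/",
}


def admin_url(page: str, base_url: str = "http://localhost:42110") -> str:
    """Return the full admin URL for a labelled page (hypothetical helper)."""
    return urljoin(base_url, ADMIN_PAGES[page])


if __name__ == "__main__":
    for name in ADMIN_PAGES:
        print(f"{name}: {admin_url(name)}")
```

If the server runs on another host or port, pass a different `base_url`. The paths themselves are taken as-is from the steps above.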
### Use ### Use
1. Open Khoj Chat 1. Open Khoj Chat
- **On Web**: Open [/chat](http://localhost:42110/chat) in your web browser - **On Web**: Open [/chat](https://app.khoj.dev/chat) in your web browser
- **On Obsidian**: Search for *Khoj: Chat* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) - **On Obsidian**: Search for *Khoj: Chat* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **On Emacs**: Run `M-x khoj <user-query>` - **On Emacs**: Run `M-x khoj <user-query>`
2. Enter your queries to chat with Khoj. Use [slash commands](#commands) and [query filters](./advanced.md#query-filters) to change what Khoj uses to respond 2. Enter your queries to chat with Khoj. Use [slash commands](#commands) and [query filters](./advanced.md#query-filters) to change what Khoj uses to respond

23
docs/desktop.md Normal file

@@ -0,0 +1,23 @@
<h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Desktop</h1>
> An AI copilot for your Second Brain
## Features
- **Chat**
- **Faster answers**: Find answers quickly, from your private notes or the public internet
- **Assisted creativity**: Smoothly weave across retrieving answers and generating content
- **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Incremental**: Incremental search for a fast, search-as-you-type experience
## Setup
1. Install the [Khoj Desktop app](https://khoj.dev/downloads) for your OS
2. Generate an API key on the [Khoj Web App](https://app.khoj.dev/config#clients)
3. Set your Khoj API Key on the *Settings* page of the Khoj Desktop app
4. [Optional] Add any files, folders you'd like Khoj to be aware of on the *Settings* page and Click *Save*
## Interface
![](./assets/khoj_chat_on_desktop.png ':size=600px')
![](./assets/khoj_search_on_desktop.png ':size=600px')

@@ -25,13 +25,7 @@ pip install -e .'[dev]'
khoj -vv khoj -vv
``` ```
2. Configure Khoj 2. Configure Khoj
- **Via the Settings UI**: Add files, directories to index the [Khoj settings](http://localhost:42110/config) UI once Khoj has started up. Once you've saved all your settings, click `Configure`. - **Via the Desktop application**: Add files, directories to index using the settings page of your desktop application. Click "Save" to immediately trigger indexing.
- **Manually**:
- Copy the `config/khoj_sample.yml` to `~/.khoj/khoj.yml`
- Set `input-files` or `input-filter` in each relevant `content-type` section of `~/.khoj/khoj.yml`
- Set `input-directories` field in `image` `content-type` section
- Delete `content-type` and `processor` sub-section(s) irrelevant for your use-case
- Restart khoj
Note: Wait after configuration for khoj to Load ML model, generate embeddings and expose API to query notes, images, documents etc specified in config YAML Note: Wait after configuration for khoj to Load ML model, generate embeddings and expose API to query notes, images, documents etc specified in config YAML

@@ -1,6 +1,6 @@
<h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Emacs</h1> <h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Emacs</h1>
> An AI personal assistance for your digital brain > An AI copilot for your Second Brain in Emacs
<img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge"> <img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge">
<img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge"> <img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge">
@@ -10,14 +10,13 @@
## Features ## Features
- **Chat**
- **Faster answers**: Find answers quickly, from your private notes or the public internet
- **Assisted creativity**: Smoothly weave across retrieving answers and generating content
- **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search** - **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models - **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search, indexing is done on your machine*
- **Incremental**: Incremental search for a fast, search-as-you-type experience - **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster than search
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answer retrieval and content generation
## Interface ## Interface
#### Search #### Search
@@ -27,79 +26,76 @@
![khoj chat on emacs](./assets/khoj_chat_on_emacs.png ':size=400px') ![khoj chat on emacs](./assets/khoj_chat_on_emacs.png ':size=400px')
## Setup ## Setup
- *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine* 1. Generate an API key on the [Khoj Web App](https://app.khoj.dev/config#clients)
2. Add below snippet to your Emacs config file, usually at `~/.emacs.d/init.el`
- *khoj.el attempts to automatically install, start and configure the khoj server.*
If this fails, follow [these instructions](/setup) to manually setup the khoj server.
### Direct Install <!-- tabs:start -->
#### **Direct Install**
*Khoj will index your org-agenda files, by default*
```elisp ```elisp
;; Install Khoj.el
M-x package-install khoj M-x package-install khoj
; Set your Khoj API key
(setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY")
``` ```
### Minimal Install #### **Minimal Install**
Add below snippet to your Emacs config file. *Khoj will index your org-agenda files, by default*
Indexes your org-agenda files, by default.
```elisp ```elisp
;; Install Khoj Package from MELPA Stable ;; Install Khoj client from MELPA Stable
(use-package khoj
:ensure t
:pin melpa-stable
:bind ("C-c s" . 'khoj))
```
- Note: Install `khoj.el` from MELPA (instead of MELPA Stable) if you installed the pre-release version of khoj
- That is, use `:pin melpa` to install khoj.el in above snippet if khoj server was installed with `--pre` flag, i.e `pip install --pre khoj-assistant`
- Else use `:pin melpa-stable` to install khoj.el in above snippet if khoj was installed with `pip install khoj-assistant`
- This ensures both khoj.el and khoj app are from the same version (git tagged or latest)
### Standard Install
Add below snippet to your Emacs config file.
Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Chat
```elisp
;; Install Khoj Package from MELPA Stable
(use-package khoj (use-package khoj
:ensure t :ensure t
:pin melpa-stable :pin melpa-stable
:bind ("C-c s" . 'khoj) :bind ("C-c s" . 'khoj)
:config (setq khoj-org-directories '("~/docs/org-roam" "~/docs/notes") :config (setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY"))
khoj-org-files '("~/docs/todo.org" "~/docs/work.org")
khoj-openai-api-key "YOUR_OPENAI_API_KEY")) ; required to enable chat
``` ```
### With [Straight.el](https://github.com/raxod502/straight.el) #### **Standard Install**
Add below snippet to your Emacs config file. *Configures the specified org files, directories to be indexed by Khoj*
Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Chat
```elisp ```elisp
;; Install Khoj Package using Straight.el ;; Install Khoj client from MELPA Stable
(use-package khoj
:ensure t
:pin melpa-stable
:bind ("C-c s" . 'khoj)
:config (setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY"
khoj-org-directories '("~/docs/org-roam" "~/docs/notes")
khoj-org-files '("~/docs/todo.org" "~/docs/work.org")))
```
#### **Straight.el**
*Configures the specified org files, directories to be indexed by Khoj*
```elisp
;; Install Khoj client using Straight.el
(use-package khoj (use-package khoj
:after org :after org
:straight (khoj :type git :host github :repo "khoj-ai/khoj" :files (:defaults "src/interface/emacs/khoj.el")) :straight (khoj :type git :host github :repo "khoj-ai/khoj" :files (:defaults "src/interface/emacs/khoj.el"))
:bind ("C-c s" . 'khoj) :bind ("C-c s" . 'khoj)
:config (setq khoj-org-directories '("~/docs/org-roam" "~/docs/notes") :config (setq khoj-api-key "YOUR_KHOJ_CLOUD_API_KEY"
khoj-org-files '("~/docs/todo.org" "~/docs/work.org") khoj-org-directories '("~/docs/org-roam" "~/docs/notes")
khoj-openai-api-key "YOUR_OPENAI_API_KEY" ; required to enable chat) khoj-org-files '("~/docs/todo.org" "~/docs/work.org")))
``` ```
<!-- tabs:end -->
## Use ## Use
### Search ### Search
See [Khoj Search](search.md) for details
1. Hit `C-c s s` (or `M-x khoj RET s`) to open khoj search 1. Hit `C-c s s` (or `M-x khoj RET s`) to open khoj search
2. Enter your query in natural language<br/>
2. Enter your query in natural language E.g *"What is the meaning of life?"*, *"My life goals for 2023"*
e.g "What is the meaning of life?", "My life goals for 2023"
### Chat ### Chat
See [Khoj Chat](chat.md) for details
1. Hit `C-c s c` (or `M-x khoj RET c`) to open khoj chat 1. Hit `C-c s c` (or `M-x khoj RET c`) to open khoj chat
2. Ask questions in a natural, conversational style<br/>
2. Ask questions in a natural, conversational style E.g *"When did I file my taxes last year?"*
E.g "When did I file my taxes last year?"
See [Khoj Chat](/#/chat) for more details
### Find Similar Entries ### Find Similar Entries
This feature finds entries similar to the one you are currently on. This feature finds entries similar to the one you are currently on.
@@ -108,7 +104,6 @@ This feature finds entries similar to the one you are currently on.
### Advanced Usage ### Advanced Usage
- Add [query filters](https://github.com/khoj-ai/khoj/#query-filters) during search to narrow down results further - Add [query filters](https://github.com/khoj-ai/khoj/#query-filters) during search to narrow down results further
e.g `What is the meaning of life? -"god" +"none" dt>"last week"` e.g `What is the meaning of life? -"god" +"none" dt>"last week"`
- Use `C-c C-o 2` to open the current result at cursor in its source org file - Use `C-c C-o 2` to open the current result at cursor in its source org file
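The filtered query in the example above follows a simple pattern: the plain question, then quoted terms prefixed with `-` or `+`, then a `dt` date condition. The Python sketch below composes such a string. It assumes, based only on that example, that `-"term"` excludes a term, `+"term"` requires it and `dt>"..."` restricts by date. The `build_query` helper and its parameter names are hypothetical and are not part of Khoj or khoj.el.

```python
def build_query(text, include=(), exclude=(), date_filter=None):
    """Compose a search query string with optional filters (hypothetical helper)."""
    parts = [text]
    # Excluded terms come first, matching the order used in the example above.
    parts += [f'-"{term}"' for term in exclude]
    parts += [f'+"{term}"' for term in include]
    if date_filter:
        # date_filter carries the operator and quoted value, e.g. '>"last week"'.
        parts.append(f"dt{date_filter}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(
        "What is the meaning of life?",
        include=["none"],
        exclude=["god"],
        date_filter='>"last week"',
    ))
```

The resulting string can be typed into the search prompt as-is. The helper only formats text and does not validate the filters.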
@@ -121,31 +116,21 @@ This feature finds entries similar to the one you are currently on.
![](./assets/khoj_emacs_menu.png) ![](./assets/khoj_emacs_menu.png)
Hit `C-c s` (or `M-x khoj`) to open the khoj menu above. Then: Hit `C-c s` (or `M-x khoj`) to open the khoj menu above. Then:
- Hit `t` until your preferred content type is selected in the khoj menu - Hit `t` until your preferred content type is selected in the khoj menu
`Content Type` specifies the content to perform `Search`, `Update` or `Find Similar` actions on `Content Type` specifies the content to perform `Search`, `Update` or `Find Similar` actions on
- Hit `n` twice and then enter number of results you want to see - Hit `n` twice and then enter number of results you want to see
`Results Count` is used by the `Search` and `Find Similar` actions `Results Count` is used by the `Search` and `Find Similar` actions
- Hit `-f u` to `force` update the khoj content index - Hit `-f u` to `force` update the khoj content index
The `Force Update` switch is only used by the `Update` action The `Force Update` switch is only used by the `Update` action
## Upgrade ## Upgrade
### Upgrade Khoj Backend
```bash
pip install --upgrade khoj-assistant
```
### Upgrade Khoj.el
Use your Emacs package manager to upgrade `khoj.el` Use your Emacs package manager to upgrade `khoj.el`
<!-- tabs:start -->
- For `khoj.el` from MELPA #### **With MELPA**
- Method 1 1. Run `M-x package-refresh-content`
- Run `M-x package-list-packages` to list all packages 2. Run `M-x package-reinstall khoj`
- Press `U` on `khoj` to mark it for upgrade
- Press `x` to execute the marked actions
- Method 2
- Run `M-x package-refresh-content`
- Run `M-x package-reinstall khoj`
- For `khoj.el` from Straight #### **With Straight.el**
- Run `M-x straight-pull-package khoj` - Run `M-x straight-pull-package khoj`
<!-- tabs:end -->

@@ -1,10 +1,10 @@
## Features ## Features
#### [Search](./search.md) #### [Search](search.md)
- **Local**: Your personal data stays local. All search and indexing is done on your machine. - **Local**: Your personal data stays local. All search and indexing is done on your machine.
- **Incremental**: Incremental search for a fast, search-as-you-type experience - **Incremental**: Incremental search for a fast, search-as-you-type experience
#### [Chat](./chat.md) #### [Chat](chat.md)
- **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers. - **Faster answers**: Find answers faster, smoother than search. No need to manually scan through your notes to find answers.
- **Iterative discovery**: Iteratively explore and (re-)discover your notes - **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation - **Assisted creativity**: Smoothly weave across answers retrieval and content generation

@@ -1,14 +1,14 @@
# Setup the Github integration # 🧑🏾‍💻 Setup the Github integration
The Github integration allows you to index as many repositories as you want. It's currently default configured to index Issues, Commits, and all Markdown/Org files in each repository. For large repositories, this takes a fairly long time, but it works well for smaller projects. The Github integration allows you to index as many repositories as you want. It's currently default configured to index Issues, Commits, and all Markdown/Org files in each repository. For large repositories, this takes a fairly long time, but it works well for smaller projects.
# Configure your settings # Configure your settings
1. Go to [http://localhost:42110/config](http://localhost:42110/config) and enter in settings for the data sources you want to index. You'll have to specify the file paths. 1. Go to [https://app.khoj.dev/config](https://app.khoj.dev/config) and enter in settings for the data sources you want to index. You'll have to specify the file paths.
## Use the Github plugin ## Use the Github plugin
1. Generate a [classic PAT (personal access token)](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) from [Github](https://github.com/settings/tokens) with `repo` and `admin:org` scopes at least. 1. Generate a [classic PAT (personal access token)](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) from [Github](https://github.com/settings/tokens) with `repo` and `admin:org` scopes at least.
2. Navigate to [http://localhost:42110/config/content-source/github](http://localhost:42110/config/content-source/github) to configure your Github settings. Enter in your PAT, along with details for each repository you want to index. 2. Navigate to [https://app.khoj.dev/config/content-source/github](https://app.khoj.dev/config/content-source/github) to configure your Github settings. Enter in your PAT, along with details for each repository you want to index.
3. Click `Save`. Go back to the settings page and click `Configure`. 3. Click `Save`. Go back to the settings page and click `Configure`.
4. Go to [http://localhost:42110/](http://localhost:42110/) and start searching! 4. Go to [https://app.khoj.dev/](https://app.khoj.dev/) and start searching!

@@ -17,11 +17,13 @@
repo: 'https://github.com/khoj-ai/khoj', repo: 'https://github.com/khoj-ai/khoj',
loadSidebar: true, loadSidebar: true,
themeColor: '#c2a600', themeColor: '#c2a600',
auto2top: true,
// coverpage: true, // coverpage: true,
} }
</script> </script>
<!-- Docsify v4 --> <!-- Docsify v4 -->
<script src="//cdn.jsdelivr.net/npm/docsify@4"></script> <script src="//cdn.jsdelivr.net/npm/docsify@4"></script>
<script src="//cdn.jsdelivr.net/npm/docsify-tabs@1"></script>
<script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script> <script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script> <script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script> <script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>

@@ -8,7 +8,7 @@ We haven't setup a fancy integration with OAuth yet, so this integration still r
![setup_new_integration](https://github.com/khoj-ai/khoj/assets/65192171/b056e057-d4dc-47dc-aad3-57b59a22c68b) ![setup_new_integration](https://github.com/khoj-ai/khoj/assets/65192171/b056e057-d4dc-47dc-aad3-57b59a22c68b)
3. Share all the workspaces that you want to integrate with the Khoj integration you just made in the previous step 3. Share all the workspaces that you want to integrate with the Khoj integration you just made in the previous step
![enable_workspace](https://github.com/khoj-ai/khoj/assets/65192171/98290303-b5b8-4cb0-b32c-f68c6923a3d0) ![enable_workspace](https://github.com/khoj-ai/khoj/assets/65192171/98290303-b5b8-4cb0-b32c-f68c6923a3d0)
4. In the first step, you generated an API key. Use the newly generated API Key in your Khoj settings, by default at http://localhost:42110/config/content-source/notion. Click `Save`. 4. In the first step, you generated an API key. Use the newly generated API Key in your Khoj settings, by default at https://app.khoj.dev/config/content-source/notion. Click `Save`.
5. Click `Configure` in http://localhost:42110/config to index your Notion workspace(s). 5. Click `Configure` in https://app.khoj.dev/config to index your Notion workspace(s).
That's it! You should be ready to start searching and chatting. Make sure you've configured your OpenAI API Key for chat. That's it! You should be ready to start searching and chatting. Make sure you've configured your OpenAI API Key for chat.

@@ -1,16 +1,15 @@
<h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Obsidian</h1> <h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Obsidian</h1>
> An AI personal assistant for your Digital Brain in Obsidian > An AI copilot for your Second Brain in Obsidian
## Features ## Features
- **Chat**
- **Faster answers**: Find answers quickly, from your private notes or the public internet
- **Assisted creativity**: Smoothly weave across retrieving answers and generating content
- **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search** - **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models - **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
- **Incremental**: Incremental search for a fast, search-as-you-type experience - **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster and with less effort than search
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
## Interface ## Interface
![](./assets/khoj_search_on_obsidian.png ':size=400px') ![](./assets/khoj_search_on_obsidian.png ':size=400px')
@ -18,102 +17,37 @@
## Setup ## Setup
- *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine*
- *Ensure you follow the ordering of the setup steps. Install the plugin after starting the khoj backend. This allows the plugin to configure the khoj backend*
### 1. Setup Backend
Open terminal/cmd and run below command to install and start the khoj backend
- On Linux/MacOS
```shell
python -m pip install khoj-assistant && khoj
```
- On Windows
```shell
py -m pip install khoj-assistant && khoj
```
### 2. Setup Plugin
1. Open [Khoj](https://obsidian.md/plugins?id=khoj) from the *Community plugins* tab in Obsidian settings panel 1. Open [Khoj](https://obsidian.md/plugins?id=khoj) from the *Community plugins* tab in Obsidian settings panel
2. Click *Install*, then *Enable* on the Khoj plugin page in Obsidian 2. Click *Install*, then *Enable* on the Khoj plugin page in Obsidian
3. [Optional] To enable Khoj Chat, set your [OpenAI API key](https://platform.openai.com/account/api-keys) in the Khoj plugin settings 3. Generate an API key on the [Khoj Web App](https://app.khoj.dev/config#clients)
4. Set your Khoj API Key in the Khoj plugin settings in Obsidian
See [official Obsidian plugin docs](https://help.obsidian.md/Extending+Obsidian/Community+plugins) for details See the official [Obsidian Plugin Docs](https://help.obsidian.md/Extending+Obsidian/Community+plugins) for more details on installing Obsidian plugins.
## Use ## Use
### Chat ### Chat
Run *Khoj: Chat* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) and ask questions in a natural, conversational style.<br /> Run *Khoj: Chat* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) and ask questions in a natural, conversational style.<br />
E.g "When did I file my taxes last year?" E.g *"When did I file my taxes last year?"*
Notes:
- *Using Khoj Chat will result in query relevant notes being shared with OpenAI for ChatGPT to respond.*
- *To use Khoj Chat, ensure you've set your [OpenAI API key](https://platform.openai.com/account/api-keys) in the Khoj plugin settings.*
See [Khoj Chat](/chat) for more details See [Khoj Chat](/chat) for more details
### Search
Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or run *Khoj: Search* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
*Note: Ensure the khoj server is running in the background before searching. Execute `khoj` in your terminal if it is not already running*
[search_demo](https://user-images.githubusercontent.com/6413477/218801155-cd67e8b4-a770-404a-8179-d6b61caa0f93.mp4 ':include :type=mp4')
#### Query Filters
Use structured query syntax to filter the natural language search results
- **Word Filter**: Get entries that include/exclude a specified term
- Entries that contain term_to_include: `+"term_to_include"`
- Entries that contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from specified date (range)
- Entries from April 1st 1984: `dt:"1984-04-01"`
- Entries after March 31st 1984: `dt>="1984-04-01"`
- Entries before April 2nd 1984 : `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
- Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
- `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
- Adds all filters to the natural language query. It should return entries
- from the file *1984.org*
- containing dates from the year *1984*
- excluding words *"big"* and *"brother"*
- that best match the natural language query *"what is the meaning of life?"*
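
As a rough illustration of how the filter syntax above splits a query, the sketch below separates the filters from the natural language text. It is not Khoj's parser: `parse_query` and the three regular expressions are hypothetical names made up for this example, and only the `dt:`, `dt>=` and `dt<=` operators listed above are handled.

```python
import re

# Hypothetical patterns for illustration only; Khoj's own filter parsing may differ.
FILE_FILTER = re.compile(r'file:"([^"]+)"')
DATE_FILTER = re.compile(r'dt(>=|<=|:)"([^"]+)"')
WORD_FILTER = re.compile(r'([+-])"([^"]+)"')


def parse_query(query: str) -> dict:
    """Split a query into its filters and the remaining natural language text."""
    # Pull out file and date filters first so the word filter never sees them
    files = FILE_FILTER.findall(query)
    query = FILE_FILTER.sub("", query)
    dates = DATE_FILTER.findall(query)  # list of (operator, date) tuples
    query = DATE_FILTER.sub("", query)
    words = WORD_FILTER.findall(query)  # list of (sign, term) tuples
    query = WORD_FILTER.sub("", query)
    return {
        "text": " ".join(query.split()),  # collapse leftover whitespace
        "files": files,
        "dates": dates,
        "include": [term for sign, term in words if sign == "+"],
        "exclude": [term for sign, term in words if sign == "-"],
    }


combined = 'what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"'
parsed = parse_query(combined)
```

For the combined example, `parsed["text"]` holds the natural language query, `parsed["files"]` holds *1984.org*, `parsed["dates"]` holds the two date bounds, and `parsed["exclude"]` holds *big* and *brother*.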
### Find Similar Notes ### Find Similar Notes
To see other notes similar to the current one, run *Khoj: Find Similar Notes* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) To see other notes similar to the current one, run *Khoj: Find Similar Notes* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
### Search
Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or run *Khoj: Search* from the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
See [Khoj Search](/search) for more details. Use [query filters](/advanced#query-filters) to limit entries to search
[search_demo](https://user-images.githubusercontent.com/6413477/218801155-cd67e8b4-a770-404a-8179-d6b61caa0f93.mp4 ':include :type=mp4')
## Upgrade ## Upgrade
### 1. Upgrade Backend
```shell
pip install --upgrade khoj-assistant
```
### 2. Upgrade Plugin
1. Open *Community plugins* tab in Obsidian settings 1. Open *Community plugins* tab in Obsidian settings
2. Click the *Check for updates* button 2. Click the *Check for updates* button
3. Click the *Update* button next to Khoj, if available 3. Click the *Update* button next to Khoj, if available
## Demo
### Search Demo
[demo](https://github-production-user-asset-6210df.s3.amazonaws.com/6413477/240061700-3e33d8ea-25bb-46c8-a3bf-c92f78d0f56b.mp4 ':include :type=mp4')
#### Description
1. Install Khoj via `pip` and start Khoj backend
```shell
python -m pip install khoj-assistant && khoj
```
2. Install Khoj plugin via Community Plugins settings pane on Obsidian app
- Check the new Khoj plugin settings
- Wait for Khoj backend to index markdown, PDF files in the current Vault
- Open Khoj plugin on Obsidian via Search button on Left Pane
- Search \"*Announce plugin to folks*\" in the [Obsidian Plugin docs](https://marcus.se.net/obsidian-plugin-docs/)
- Jump to the [search result](https://marcus.se.net/obsidian-plugin-docs/publishing/submit-your-plugin)
## Troubleshooting ## Troubleshooting
- Open the Khoj plugin settings pane, to configure Khoj - Open the Khoj plugin settings pane, to configure Khoj
- Toggle Enable/Disable Khoj, if setting changes have not applied - Toggle Enable/Disable Khoj, if setting changes have not applied
- Click *Update* button to force index to refresh, if results are failing or stale - Click *Update* button to force index to refresh, if results are failing or stale
## Current Limitations
- The plugin loads the index of only one vault at a time.<br/>
So notes across multiple vaults **cannot** be searched at the same time

View file

@ -1,7 +1,7 @@
## Khoj Search ## Khoj Search
### Use ### Use
1. Open Khoj Search 1. Open Khoj Search
- **On Web**: Open <http://localhost:42110/> in your web browser - **On Web**: Open <https://app.khoj.dev/> in your web browser
- **On Obsidian**: Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette) - **On Obsidian**: Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **On Emacs**: Run `M-x khoj <user-query>` - **On Emacs**: Run `M-x khoj <user-query>`
2. Query using natural language to find relevant entries from your knowledge base. Use [query filters](./advanced.md#query-filters) to limit entries to search 2. Query using natural language to find relevant entries from your knowledge base. Use [query filters](./advanced.md#query-filters) to limit entries to search

View file

@ -3,41 +3,15 @@ These are the general setup instructions for Khoj.
- Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine - Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine
- Check the [Khoj Emacs docs](/emacs?id=setup) to setup Khoj with Emacs<br /> - Check the [Khoj Emacs docs](/emacs?id=setup) to setup Khoj with Emacs<br />
Its simpler as it can skip the server *install*, *run* and *configure* step below. It's simpler as it can skip the server *install*, *run* and *configure* step below.
- Check the [Khoj Obsidian docs](/obsidian?id=_2-setup-plugin) to setup Khoj with Obsidian<br /> - Check the [Khoj Obsidian docs](/obsidian?id=_2-setup-plugin) to setup Khoj with Obsidian<br />
It's simpler as it can skip the *configure* step below. It's simpler as it can skip the *configure* step below.
### 1. Install For Installation, you can either use Docker or install Khoj locally.
#### 1.1 Local Server Setup ### 1. Installation (Docker)
Run the following command in your terminal to install the Khoj backend.
- On Linux/MacOS Use the sample docker-compose [in Github](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to run Khoj in Docker. Start by configuring all the environment variables to your choosing. Your admin account will automatically be created based on the admin credentials in that file, so pay attention to those. To start the container, run the following command in the same directory as the docker-compose.yml file. This will automatically setup the database and run the Khoj server.
```shell
python -m pip install khoj-assistant
```
- On Windows
```shell
py -m pip install khoj-assistant
```
For more detailed Windows installation and troubleshooting, see [Windows Install](./windows_install.md).
##### 1.1.1 Local Server Start
Run the following command from your terminal to start the Khoj backend and open Khoj in your browser.
```shell
khoj
```
Khoj should now be running at http://localhost:42110. You can see the web UI in your browser.
Note: To start Khoj automatically in the background use [Task scheduler](https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10) on Windows or [Cron](https://en.wikipedia.org/wiki/Cron) on Mac, Linux (e.g with `@reboot khoj`)
#### 1.2 Local Docker Setup
Use the sample docker-compose [in Github](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to run Khoj in Docker. To start the container, run the following command in the same directory as the docker-compose.yml file. You'll have to configure the mounted directories to match your local knowledge base.
```shell ```shell
docker-compose up docker-compose up
@ -45,27 +19,131 @@ docker-compose up
Khoj should now be running at http://localhost:42110. You can see the web UI in your browser. Khoj should now be running at http://localhost:42110. You can see the web UI in your browser.
#### 1.3 Download the desktop client [Optional] ### 1. Installation (Local)
You can use our desktop executables to select file paths and folders to index. You can simply select the folders or files, and they'll be automatically uploaded to the server. Once you specify a file or file path, you don't need to update the configuration again; it will grab any data diffs dynamically over time. This part is currently optional, but may make setup and configuration slightly easier. It removes the need for setting up custom file paths for your Khoj data configurations. #### Prerequisites
**To download the desktop client, go to https://download.khoj.dev** and the correct executable for your OS will automatically start downloading. Once downloaded, you can configure your folders for indexing using the settings tab. To set your chat configuration, you'll have to use the web interface for the Khoj server you setup in the previous step. ##### Install Postgres (with PgVector)
### 1.4 Use (deprecated) desktop builds Khoj uses the `pgvector` package to store embeddings of your index in a Postgres database. In order to use this, you need to have Postgres installed.
Before `v0.12.0`, we had self-contained desktop builds that included both the server and the client. These were difficult to maintain, but are still available as part of earlier releases. To find setup instructions, see here: <!-- tabs:start -->
- [Desktop Installation](desktop_installation.md) #### **MacOS**
- [Windows Installation](windows_install.md)
### 2. Configure Install [Postgres.app](https://postgresapp.com/). This comes pre-installed with `pgvector` and relevant dependencies.
1. Set `File`, `Folder` and hit `Save` in each Plugins you want to enable for Search on the Khoj config page
2. Add your OpenAI API key to Chat Feature settings if you want to use Chat
3. Click `Configure` and wait. The app will download ML models and index the content for search and (optionally) chat
![configure demo](https://user-images.githubusercontent.com/6413477/255307879-61247d3f-c69a-46ef-b058-9bc533cb5c72.mp4 ':include :type=mp4') #### **Windows**
### 3. Install Interface Plugins (Optional) Use the [recommended installer](https://www.postgresql.org/download/windows/)
#### **Linux**
From [official instructions](https://wiki.postgresql.org/wiki/Apt)
```bash
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install postgresql-16 postgresql-16-pgvector
```
##### **From Source**
1. Follow instructions to [Install Postgres](https://www.postgresql.org/download/)
2. Follow instructions to [Install PgVector](https://github.com/pgvector/pgvector#installation) in case you need to manually install it. Reproduced instructions below for convenience.
```bash
cd /tmp
git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install # may need sudo
```
<!-- tabs:end -->
##### Create the Khoj database
Make sure to update your environment variables to match your Postgres configuration if you're using a different name. The default values should work for most people.
<!-- tabs:start -->
#### **MacOS**
```bash
createdb khoj -U postgres
```
#### **Windows**
```bash
createdb khoj -U postgres
```
#### **Linux**
```bash
sudo -u postgres createdb khoj
```
<!-- tabs:end -->
#### Install package
##### Local Server Setup
- *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine*
Run the following command in your terminal to install the Khoj backend.
<!-- tabs:start -->
#### **MacOS**
```shell
python -m pip install khoj-assistant
```
#### **Windows**
```shell
py -m pip install khoj-assistant
```
For more detailed Windows installation and troubleshooting, see [Windows Install](./windows_install.md).
#### **Linux**
```shell
python -m pip install khoj-assistant
```
<!-- tabs:end -->
##### Local Server Start
Run the following command from your terminal to start the Khoj backend and open Khoj in your browser.
```shell
khoj --anonymous-mode
```
`--anonymous-mode` allows you to run the server without setting up Google credentials for login. This allows you to use any of the clients without a login wall. If you want to use Google login, you can skip this flag, but you will have to add your Google developer credentials.
On the first run, you will be prompted to enter credentials for your admin account and do some basic configuration of your chat model settings. Once the account is created, you can go to http://localhost:42110/server/admin and log in with the credentials you just created.
Khoj should now be running at http://localhost:42110. You can see the web UI in your browser.
Note: To start Khoj automatically in the background use [Task scheduler](https://www.windowscentral.com/how-create-automated-task-using-task-scheduler-windows-10) on Windows or [Cron](https://en.wikipedia.org/wiki/Cron) on Mac, Linux (e.g with `@reboot khoj`)
### 2. Download the desktop client
You can use our desktop executables to select file paths and folders to index. You can simply select the folders or files, and they'll be automatically uploaded to the server. Once you specify a file or file path, you don't need to update the configuration again; it will grab any data diffs dynamically over time.
**To download the latest desktop client, go to https://download.khoj.dev** and the correct executable for your OS will automatically start downloading. Once downloaded, you can configure your folders for indexing using the settings tab. To set your chat configuration, you'll have to use the web interface for the Khoj server you set up in the previous step.
To use the desktop client, you need to go to your Khoj server's settings page (http://localhost:42110/config) and copy the API key. Then, paste it into the desktop client's settings page. Once you've done that, you can select files and folders to index.
### 3. Configure
1. Go to http://localhost:42110/server/admin and log in with your admin credentials. Go to the ChatModelOptions if you want to add additional models for chat.
1. Select files and folders to index [using the desktop client](./setup.md?id=_2-download-the-desktop-client). When you click 'Save', the files will be sent to your server for indexing.
- Select Notion workspaces and Github repositories to index using the web interface.
### 4. Install Client Plugins (Optional)
Khoj exposes a web interface to search, chat and configure by default.<br /> Khoj exposes a web interface to search, chat and configure by default.<br />
The optional steps below allow using Khoj from within an existing application like Obsidian or Emacs. The optional steps below allow using Khoj from within an existing application like Obsidian or Emacs.
@ -75,9 +153,17 @@ The optional steps below allow using Khoj from within an existing application li
- **Khoj Emacs**:<br /> - **Khoj Emacs**:<br />
[Install](/emacs?id=setup) khoj.el [Install](/emacs?id=setup) khoj.el
### 5. Use Khoj 🚀
You can head to http://localhost:42110 to use the web interface. You can also use the desktop client to search and chat.
## Upgrade ## Upgrade
### Upgrade Khoj Server ### Upgrade Khoj Server
<!-- tabs:start -->
#### **Local Setup**
```shell ```shell
pip install --upgrade khoj-assistant pip install --upgrade khoj-assistant
``` ```
@ -88,6 +174,16 @@ pip install --upgrade khoj-assistant
pip install --upgrade --pre khoj-assistant pip install --upgrade --pre khoj-assistant
``` ```
#### **Docker**
From the same directory where you have your `docker-compose` file, this will fetch the latest build and upgrade your server.
```shell
docker-compose up --build
```
<!-- tabs:end -->
### Upgrade Khoj on Emacs ### Upgrade Khoj on Emacs
- Use your Emacs Package Manager to Upgrade - Use your Emacs Package Manager to Upgrade
- See [khoj.el package setup](/emacs?id=setup) for details - See [khoj.el package setup](/emacs?id=setup) for details

View file

@ -1,4 +1,4 @@
# Telemetry # Telemetry (self-hosting)
We collect some high level, anonymized metadata about usage of Khoj. This includes: We collect some high level, anonymized metadata about usage of Khoj. This includes:
- Client (Web, Emacs, Obsidian) - Client (Web, Emacs, Obsidian)

View file

@ -1,19 +1,18 @@
<h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Web</h1> <h1><img src="./assets/khoj-logo-sideways-500.png" width="200" alt="Khoj Logo"> Web</h1>
> An AI personal assistant for your Digital Brain > An AI copilot for your Second Brain
## Features ## Features
- **Chat**
- **Faster answers**: Find answers quickly, from your private notes or the public internet
- **Assisted creativity**: Smoothly weave across retrieving answers and generating content
- **Iterative discovery**: Iteratively explore and re-discover your notes
- **Search** - **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models - **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
- **Incremental**: Incremental search for a fast, search-as-you-type experience - **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster and with less effort than search
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
## Setup ## Setup
The Khoj web interface is the default interface. It comes packaged with the khoj server. No setup required. The Khoj web app is the default interface to Khoj. You can access it from any web browser. Try it on [Khoj Cloud](https://app.khoj.dev)
## Interface ## Interface
![](./assets/khoj_search_on_web.png ':size=400px') ![](./assets/khoj_search_on_web.png ':size=400px')

View file

@ -1,7 +1,7 @@
{ {
"id": "khoj", "id": "khoj",
"name": "Khoj", "name": "Khoj",
"version": "0.14.0", "version": "1.0.0",
"minAppVersion": "0.15.0", "minAppVersion": "0.15.0",
"description": "An AI copilot for your Second Brain", "description": "An AI copilot for your Second Brain",
"author": "Khoj Inc.", "author": "Khoj Inc.",

View file

@ -39,7 +39,7 @@ dependencies = [
"bs4 >= 0.0.1", "bs4 >= 0.0.1",
"dateparser >= 1.1.1", "dateparser >= 1.1.1",
"defusedxml == 0.7.1", "defusedxml == 0.7.1",
"fastapi == 0.77.1", "fastapi >= 0.104.1",
"python-multipart >= 0.0.5", "python-multipart >= 0.0.5",
"jinja2 == 3.1.2", "jinja2 == 3.1.2",
"openai >= 0.27.0, < 1.0.0", "openai >= 0.27.0, < 1.0.0",
@ -60,7 +60,7 @@ dependencies = [
"bs4 >= 0.0.1", "bs4 >= 0.0.1",
"anyio == 3.7.1", "anyio == 3.7.1",
"pymupdf >= 1.23.5", "pymupdf >= 1.23.5",
"django == 4.2.5", "django == 4.2.7",
"authlib == 1.2.1", "authlib == 1.2.1",
"gpt4all >= 2.0.0; platform_system == 'Linux' and platform_machine == 'x86_64'", "gpt4all >= 2.0.0; platform_system == 'Linux' and platform_machine == 'x86_64'",
"gpt4all >= 2.0.0; platform_system == 'Windows' or platform_system == 'Darwin'", "gpt4all >= 2.0.0; platform_system == 'Windows' or platform_system == 'Darwin'",

View file

@ -240,10 +240,18 @@ class ConversationAdapters:
def get_openai_conversation_config(): def get_openai_conversation_config():
return OpenAIProcessorConversationConfig.objects.filter().first() return OpenAIProcessorConversationConfig.objects.filter().first()
@staticmethod
async def aget_openai_conversation_config():
return await OpenAIProcessorConversationConfig.objects.filter().afirst()
@staticmethod @staticmethod
def get_offline_chat_conversation_config(): def get_offline_chat_conversation_config():
return OfflineChatProcessorConversationConfig.objects.filter().first() return OfflineChatProcessorConversationConfig.objects.filter().first()
@staticmethod
async def aget_offline_chat_conversation_config():
return await OfflineChatProcessorConversationConfig.objects.filter().afirst()
@staticmethod @staticmethod
def has_valid_offline_conversation_config(): def has_valid_offline_conversation_config():
return OfflineChatProcessorConversationConfig.objects.filter(enabled=True).exists() return OfflineChatProcessorConversationConfig.objects.filter(enabled=True).exists()
@ -267,10 +275,21 @@ class ConversationAdapters:
return None return None
return config.setting return config.setting
@staticmethod
async def aget_conversation_config(user: KhojUser):
config = await UserConversationConfig.objects.filter(user=user).prefetch_related("setting").afirst()
if not config:
return None
return config.setting
@staticmethod @staticmethod
def get_default_conversation_config(): def get_default_conversation_config():
return ChatModelOptions.objects.filter().first() return ChatModelOptions.objects.filter().first()
@staticmethod
async def aget_default_conversation_config():
return await ChatModelOptions.objects.filter().afirst()
@staticmethod @staticmethod
def save_conversation(user: KhojUser, conversation_log: dict): def save_conversation(user: KhojUser, conversation_log: dict):
conversation = Conversation.objects.filter(user=user) conversation = Conversation.objects.filter(user=user)
@ -320,10 +339,6 @@ class ConversationAdapters:
async def get_openai_chat_config(): async def get_openai_chat_config():
return await OpenAIProcessorConversationConfig.objects.filter().afirst() return await OpenAIProcessorConversationConfig.objects.filter().afirst()
@staticmethod
async def aget_default_conversation_config():
return await ChatModelOptions.objects.filter().afirst()
class EntryAdapters: class EntryAdapters:
word_filer = WordFilter() word_filer = WordFilter()

View file

@ -1,6 +1,6 @@
{ {
"name": "Khoj", "name": "Khoj",
"version": "0.14.0", "version": "1.0.0",
"description": "An AI copilot for your Second Brain", "description": "An AI copilot for your Second Brain",
"author": "Saba Imran, Debanjum Singh Solanky <team@khoj.dev>", "author": "Saba Imran, Debanjum Singh Solanky <team@khoj.dev>",
"license": "GPL-3.0-or-later", "license": "GPL-3.0-or-later",

View file

@ -6,7 +6,7 @@
;; Saba Imran <saba@khoj.dev> ;; Saba Imran <saba@khoj.dev>
;; Description: An AI copilot for your Second Brain ;; Description: An AI copilot for your Second Brain
;; Keywords: search, chat, org-mode, outlines, markdown, pdf, image ;; Keywords: search, chat, org-mode, outlines, markdown, pdf, image
;; Version: 0.14.0 ;; Version: 1.0.0
;; Package-Requires: ((emacs "27.1") (transient "0.3.0") (dash "2.19.1")) ;; Package-Requires: ((emacs "27.1") (transient "0.3.0") (dash "2.19.1"))
;; URL: https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs ;; URL: https://github.com/khoj-ai/khoj/tree/master/src/interface/emacs
@ -63,7 +63,7 @@
;; Khoj Static Configuration ;; Khoj Static Configuration
;; ------------------------- ;; -------------------------
(defcustom khoj-server-url "http://localhost:42110" (defcustom khoj-server-url "https://app.khoj.dev"
"Location of Khoj API server." "Location of Khoj API server."
:group 'khoj :group 'khoj
:type 'string) :type 'string)
@ -94,7 +94,7 @@
:type 'number) :type 'number)
(defcustom khoj-api-key nil (defcustom khoj-api-key nil
"API Key to Khoj server." "API Key to your Khoj. Default at https://app.khoj.dev/config#clients."
:group 'khoj :group 'khoj
:type 'string) :type 'string)

View file

@ -1,7 +1,7 @@
{ {
"id": "khoj", "id": "khoj",
"name": "Khoj", "name": "Khoj",
"version": "0.14.0", "version": "1.0.0",
"minAppVersion": "0.15.0", "minAppVersion": "0.15.0",
"description": "An AI copilot for your Second Brain", "description": "An AI copilot for your Second Brain",
"author": "Khoj Inc.", "author": "Khoj Inc.",

View file

@ -1,6 +1,6 @@
{ {
"name": "Khoj", "name": "Khoj",
"version": "0.14.0", "version": "1.0.0",
"description": "An AI copilot for your Second Brain", "description": "An AI copilot for your Second Brain",
"author": "Debanjum Singh Solanky, Saba Imran <team@khoj.dev>", "author": "Debanjum Singh Solanky, Saba Imran <team@khoj.dev>",
"license": "GPL-3.0-or-later", "license": "GPL-3.0-or-later",

View file

@ -75,7 +75,7 @@ export default class Khoj extends Plugin {
if (this.settings.khojUrl === "https://app.khoj.dev") { if (this.settings.khojUrl === "https://app.khoj.dev") {
if (this.settings.khojApiKey === "") { if (this.settings.khojApiKey === "") {
new Notice(`Khoj API key is not configured. Please visit https://app.khoj.dev to get an API key.`); new Notice(`Khoj API key is not configured. Please visit https://app.khoj.dev/config#clients to get an API key.`);
return; return;
} }

View file

@ -13,7 +13,7 @@ export interface KhojSetting {
export const DEFAULT_SETTINGS: KhojSetting = { export const DEFAULT_SETTINGS: KhojSetting = {
resultsCount: 6, resultsCount: 6,
khojUrl: 'http://127.0.0.1:42110', khojUrl: 'https://app.khoj.dev',
khojApiKey: '', khojApiKey: '',
connectedToBackend: false, connectedToBackend: false,
autoConfigure: true, autoConfigure: true,

View file

@ -26,5 +26,6 @@
"0.12.2": "0.15.0", "0.12.2": "0.15.0",
"0.12.3": "0.15.0", "0.12.3": "0.15.0",
"0.13.0": "0.15.0", "0.13.0": "0.15.0",
"0.14.0": "0.15.0" "0.14.0": "0.15.0",
"1.0.0": "0.15.0"
} }

View file

@ -2,7 +2,7 @@
{% block content %} {% block content %}
<div class="page"> <div class="page">
<div class="section"> <div id="content" class="section">
<h2 class="section-title">Content</h2> <h2 class="section-title">Content</h2>
<div class="section-cards"> <div class="section-cards">
<div class="card"> <div class="card">
@ -118,7 +118,7 @@
</div> </div>
</div> </div>
</div> </div>
<div class="section"> <div id="features" class="section">
<h2 class="section-title">Features</h2> <h2 class="section-title">Features</h2>
<div id="features-hint-text"></div> <div id="features-hint-text"></div>
<div class="section-cards"> <div class="section-cards">
@ -144,9 +144,9 @@
</div> </div>
</div> </div>
</div> </div>
<div class="section"> <div id="clients" class="section">
<h2 class="section-title">Clients</h2> <h2 class="section-title">Clients</h2>
<div class="api-settings"> <div id="clients-api" class="api-settings">
<div class="card-title-row"> <div class="card-title-row">
<img class="card-icon" src="/static/assets/icons/key.svg" alt="API Key"> <img class="card-icon" src="/static/assets/icons/key.svg" alt="API Key">
<h3 class="card-title">API Keys</h3> <h3 class="card-title">API Keys</h3>
@ -172,7 +172,7 @@
</div> </div>
</div> </div>
{% if billing_enabled %} {% if billing_enabled %}
<div class="section"> <div id="billing" class="section">
<h2 class="section-title">Billing</h2> <h2 class="section-title">Billing</h2>
<div class="section-cards"> <div class="section-cards">
<div class="card"> <div class="card">

View file

@ -1,5 +1,6 @@
# Standard Packages # Standard Packages
import logging import logging
import json
from datetime import datetime, timedelta from datetime import datetime, timedelta
from typing import Optional from typing import Optional
@ -31,6 +32,10 @@ def extract_questions(
""" """
Infer search queries to retrieve relevant notes to answer user query Infer search queries to retrieve relevant notes to answer user query
""" """
def _valid_question(question: str):
return not is_none_or_empty(question) and question != "[]"
# Extract Past User Message and Inferred Questions from Conversation Log # Extract Past User Message and Inferred Questions from Conversation Log
chat_history = "".join( chat_history = "".join(
[ [
@ -70,7 +75,7 @@ def extract_questions(
# Extract, Clean Message from GPT's Response # Extract, Clean Message from GPT's Response
try: try:
questions = ( split_questions = (
response.content.strip(empty_escape_sequences) response.content.strip(empty_escape_sequences)
.replace("['", '["') .replace("['", '["')
.replace("']", '"]') .replace("']", '"]')
@ -79,9 +84,18 @@ def extract_questions(
.replace('"]', "") .replace('"]', "")
.split('", "') .split('", "')
) )
questions = []
for question in split_questions:
if question not in questions and _valid_question(question):
questions.append(question)
if is_none_or_empty(questions):
raise ValueError("GPT returned empty JSON")
except: except:
logger.warning(f"GPT returned invalid JSON. Falling back to using user message as search query.\n{response}") logger.warning(f"GPT returned invalid JSON. Falling back to using user message as search query.\n{response}")
questions = [text] questions = [text]
logger.debug(f"Extracted Questions by GPT: {questions}") logger.debug(f"Extracted Questions by GPT: {questions}")
return questions return questions
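The filtering step added to `extract_questions` in the hunk above can be sketched in isolation. This is a minimal sketch, not the project's code: `is_none_or_empty` is a simplified stand-in for the helper of the same name used in the diff, and `clean_questions` is a hypothetical wrapper name. Unlike the diff, which raises `ValueError` and lets the `except` branch fall back to the user message, the sketch returns the fallback directly.

```python
from typing import List


def is_none_or_empty(item) -> bool:
    # Simplified stand-in for the helper referenced in the diff (assumption)
    return item is None or (hasattr(item, "__len__") and len(item) == 0)


def valid_question(question: str) -> bool:
    # Reject empty strings and the literal "[]" a model may emit for "no questions"
    return not is_none_or_empty(question) and question != "[]"


def clean_questions(split_questions: List[str], fallback: str) -> List[str]:
    # Keep the first occurrence of each valid question, preserving order
    questions: List[str] = []
    for question in split_questions:
        if question not in questions and valid_question(question):
            questions.append(question)
    # Use the raw user message as the only query when nothing usable remains
    return questions if questions else [fallback]
```

Membership checks on a list keep the original ordering of the questions, which a `set` would not; for the handful of queries a model returns, the linear lookup cost is negligible.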


@ -154,17 +154,20 @@ def truncate_messages(
) )
system_message = messages.pop() system_message = messages.pop()
assert type(system_message.content) == str
system_message_tokens = len(encoder.encode(system_message.content)) system_message_tokens = len(encoder.encode(system_message.content))
tokens = sum([len(encoder.encode(message.content)) for message in messages]) tokens = sum([len(encoder.encode(message.content)) for message in messages if type(message.content) == str])
while (tokens + system_message_tokens) > max_prompt_size and len(messages) > 1: while (tokens + system_message_tokens) > max_prompt_size and len(messages) > 1:
messages.pop() messages.pop()
tokens = sum([len(encoder.encode(message.content)) for message in messages]) assert type(system_message.content) == str
tokens = sum([len(encoder.encode(message.content)) for message in messages if type(message.content) == str])
# Truncate current message if still over max supported prompt size by model # Truncate current message if still over max supported prompt size by model
if (tokens + system_message_tokens) > max_prompt_size: if (tokens + system_message_tokens) > max_prompt_size:
current_message = "\n".join(messages[0].content.split("\n")[:-1]) assert type(system_message.content) == str
original_question = "\n".join(messages[0].content.split("\n")[-1:]) current_message = "\n".join(messages[0].content.split("\n")[:-1]) if type(messages[0].content) == str else ""
original_question = "\n".join(messages[0].content.split("\n")[-1:]) if type(messages[0].content) == str else ""
original_question_tokens = len(encoder.encode(original_question)) original_question_tokens = len(encoder.encode(original_question))
remaining_tokens = max_prompt_size - original_question_tokens - system_message_tokens remaining_tokens = max_prompt_size - original_question_tokens - system_message_tokens
truncated_message = encoder.decode(encoder.encode(current_message)[:remaining_tokens]).strip() truncated_message = encoder.decode(encoder.encode(current_message)[:remaining_tokens]).strip()
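The token accounting in the `truncate_messages` hunk above can be illustrated with a toy tokenizer. This is a sketch under stated assumptions: `WhitespaceEncoder`, `count_tokens` and `drop_oldest` are hypothetical names, the encoder only mimics the `encode`/`decode` interface used in the diff, and the list is assumed to be ordered newest-first, as the diff's use of `messages[0]` as the current message suggests.

```python
from typing import List


class WhitespaceEncoder:
    """Toy tokenizer (assumption): one token per whitespace-separated word."""

    def encode(self, text: str) -> List[str]:
        return text.split()

    def decode(self, tokens: List[str]) -> str:
        return " ".join(tokens)


def count_tokens(contents: list, encoder: WhitespaceEncoder) -> int:
    # Only string contents are counted, mirroring the type guard added in the diff
    return sum(len(encoder.encode(c)) for c in contents if isinstance(c, str))


def drop_oldest(contents: list, system_tokens: int, max_prompt_size: int, encoder: WhitespaceEncoder) -> list:
    # Work on a copy; pop() removes the last (assumed oldest) entry
    contents = list(contents)
    while count_tokens(contents, encoder) + system_tokens > max_prompt_size and len(contents) > 1:
        contents.pop()
    return contents
```

The loop never removes the final remaining entry, so a single oversized message survives this stage and has to be shortened separately, which is what the "Truncate current message" branch in the diff does.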


@ -31,6 +31,7 @@ from khoj.utils import state, constants
from khoj.utils.helpers import AsyncIteratorWrapper, get_device from khoj.utils.helpers import AsyncIteratorWrapper, get_device
from fastapi.responses import StreamingResponse, Response from fastapi.responses import StreamingResponse, Response
from khoj.routers.helpers import ( from khoj.routers.helpers import (
CommonQueryParams,
get_conversation_command, get_conversation_command,
validate_conversation_config, validate_conversation_config,
agenerate_chat_response, agenerate_chat_response,
@ -55,6 +56,7 @@ from database.models import (
Entry as DbEntry, Entry as DbEntry,
GithubConfig, GithubConfig,
NotionConfig, NotionConfig,
ChatModelOptions,
) )
@ -122,7 +124,7 @@ async def map_config_to_db(config: FullConfig, user: KhojUser):
def _initialize_config(): def _initialize_config():
if state.config is None: if state.config is None:
state.config = FullConfig() state.config = FullConfig()
state.config.search_type = SearchConfig.parse_obj(constants.default_config["search-type"]) state.config.search_type = SearchConfig.model_validate(constants.default_config["search-type"])
@api.get("/config/data", response_model=FullConfig) @api.get("/config/data", response_model=FullConfig)
@ -355,15 +357,12 @@ def get_config_types(
async def search( async def search(
q: str, q: str,
request: Request, request: Request,
common: CommonQueryParams,
n: Optional[int] = 5, n: Optional[int] = 5,
t: Optional[SearchType] = SearchType.All, t: Optional[SearchType] = SearchType.All,
r: Optional[bool] = False, r: Optional[bool] = False,
max_distance: Optional[Union[float, None]] = None, max_distance: Optional[Union[float, None]] = None,
dedupe: Optional[bool] = True, dedupe: Optional[bool] = True,
client: Optional[str] = None,
user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None),
host: Optional[str] = Header(None),
): ):
user = request.user.object user = request.user.object
start_time = time.time() start_time = time.time()
@ -467,10 +466,7 @@ async def search(
request=request, request=request,
telemetry_type="api", telemetry_type="api",
api="search", api="search",
client=client, **common.__dict__,
user_agent=user_agent,
referer=referer,
host=host,
) )
end_time = time.time() end_time = time.time()
@ -483,12 +479,9 @@ async def search(
@requires(["authenticated"]) @requires(["authenticated"])
def update( def update(
request: Request, request: Request,
common: CommonQueryParams,
t: Optional[SearchType] = None, t: Optional[SearchType] = None,
force: Optional[bool] = False, force: Optional[bool] = False,
client: Optional[str] = None,
user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None),
host: Optional[str] = Header(None),
): ):
user = request.user.object user = request.user.object
if not state.config: if not state.config:
@ -514,10 +507,7 @@ def update(
request=request, request=request,
telemetry_type="api", telemetry_type="api",
api="update", api="update",
client=client, **common.__dict__,
user_agent=user_agent,
referer=referer,
host=host,
) )
return {"status": "ok", "message": "khoj reloaded"} return {"status": "ok", "message": "khoj reloaded"}
@ -527,10 +517,7 @@ def update(
@requires(["authenticated"]) @requires(["authenticated"])
def chat_history( def chat_history(
request: Request, request: Request,
client: Optional[str] = None, common: CommonQueryParams,
user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None),
host: Optional[str] = Header(None),
): ):
user = request.user.object user = request.user.object
validate_conversation_config() validate_conversation_config()
@ -542,10 +529,7 @@ def chat_history(
request=request, request=request,
telemetry_type="api", telemetry_type="api",
api="chat", api="chat",
client=client, **common.__dict__,
user_agent=user_agent,
referer=referer,
host=host,
) )
return {"status": "ok", "response": meta_log.get("chat", [])} return {"status": "ok", "response": meta_log.get("chat", [])}
@ -555,10 +539,7 @@ def chat_history(
@requires(["authenticated"]) @requires(["authenticated"])
async def chat_options( async def chat_options(
request: Request, request: Request,
client: Optional[str] = None, common: CommonQueryParams,
user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None),
host: Optional[str] = Header(None),
) -> Response: ) -> Response:
cmd_options = {} cmd_options = {}
for cmd in ConversationCommand: for cmd in ConversationCommand:
@ -568,10 +549,7 @@ async def chat_options(
request=request, request=request,
telemetry_type="api", telemetry_type="api",
api="chat_options", api="chat_options",
client=client, **common.__dict__,
user_agent=user_agent,
referer=referer,
host=host,
) )
return Response(content=json.dumps(cmd_options), media_type="application/json", status_code=200) return Response(content=json.dumps(cmd_options), media_type="application/json", status_code=200)
@ -580,14 +558,11 @@ async def chat_options(
@requires(["authenticated"]) @requires(["authenticated"])
async def chat( async def chat(
request: Request, request: Request,
common: CommonQueryParams,
q: str, q: str,
n: Optional[int] = 5, n: Optional[int] = 5,
d: Optional[float] = 0.18, d: Optional[float] = 0.18,
client: Optional[str] = None,
stream: Optional[bool] = False, stream: Optional[bool] = False,
user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None),
host: Optional[str] = Header(None),
rate_limiter_per_minute=Depends(ApiUserRateLimiter(requests=30, window=60)), rate_limiter_per_minute=Depends(ApiUserRateLimiter(requests=30, window=60)),
rate_limiter_per_day=Depends(ApiUserRateLimiter(requests=500, window=60 * 60 * 24)), rate_limiter_per_day=Depends(ApiUserRateLimiter(requests=500, window=60 * 60 * 24)),
) -> Response: ) -> Response:
@ -601,7 +576,7 @@ async def chat(
meta_log = (await ConversationAdapters.aget_conversation_by_user(user)).conversation_log meta_log = (await ConversationAdapters.aget_conversation_by_user(user)).conversation_log
compiled_references, inferred_queries, defiltered_query = await extract_references_and_questions( compiled_references, inferred_queries, defiltered_query = await extract_references_and_questions(
request, meta_log, q, (n or 5), (d or math.inf), conversation_command request, common, meta_log, q, (n or 5), (d or math.inf), conversation_command
) )
online_results: Dict = dict() online_results: Dict = dict()
@ -647,11 +622,8 @@ async def chat(
request=request, request=request,
telemetry_type="api", telemetry_type="api",
api="chat", api="chat",
client=client,
user_agent=user_agent,
referer=referer,
host=host,
metadata=chat_metadata, metadata=chat_metadata,
**common.__dict__,
) )
if llm_response is None: if llm_response is None:
@ -678,6 +650,7 @@ async def chat(
async def extract_references_and_questions( async def extract_references_and_questions(
request: Request, request: Request,
common: CommonQueryParams,
meta_log: dict, meta_log: dict,
q: str, q: str,
n: int, n: int,
@ -710,7 +683,16 @@ async def extract_references_and_questions(
# Infer search queries from user message # Infer search queries from user message
with timer("Extracting search queries took", logger): with timer("Extracting search queries took", logger):
# If we've reached here, either the user has enabled offline chat or the openai model is enabled. # If we've reached here, either the user has enabled offline chat or the openai model is enabled.
if await ConversationAdapters.ahas_offline_chat(): offline_chat_config = await ConversationAdapters.aget_offline_chat_conversation_config()
conversation_config = await ConversationAdapters.aget_conversation_config(user)
if conversation_config is None:
conversation_config = await ConversationAdapters.aget_default_conversation_config()
openai_chat_config = await ConversationAdapters.aget_openai_conversation_config()
if (
offline_chat_config
and offline_chat_config.enabled
and conversation_config.model_type == ChatModelOptions.ModelType.OFFLINE
):
using_offline_chat = True using_offline_chat = True
offline_chat = await ConversationAdapters.get_offline_chat() offline_chat = await ConversationAdapters.get_offline_chat()
chat_model = offline_chat.chat_model chat_model = offline_chat.chat_model
@ -722,7 +704,7 @@ async def extract_references_and_questions(
inferred_queries = extract_questions_offline( inferred_queries = extract_questions_offline(
defiltered_query, loaded_model=loaded_model, conversation_log=meta_log, should_extract_questions=False defiltered_query, loaded_model=loaded_model, conversation_log=meta_log, should_extract_questions=False
) )
elif await ConversationAdapters.has_openai_chat(): elif openai_chat_config and conversation_config.model_type == ChatModelOptions.ModelType.OPENAI:
openai_chat_config = await ConversationAdapters.get_openai_chat_config() openai_chat_config = await ConversationAdapters.get_openai_chat_config()
openai_chat = await ConversationAdapters.get_openai_chat() openai_chat = await ConversationAdapters.get_openai_chat()
api_key = openai_chat_config.api_key api_key = openai_chat_config.api_key
@ -744,9 +726,9 @@ async def extract_references_and_questions(
r=True, r=True,
max_distance=d, max_distance=d,
dedupe=False, dedupe=False,
common=common,
) )
) )
# Dedupe the results again, as duplicates may be returned across queries.
result_list = text_search.deduplicated_search_responses(result_list) result_list = text_search.deduplicated_search_responses(result_list)
compiled_references = [item.additional["compiled"] for item in result_list] compiled_references = [item.additional["compiled"] for item in result_list]
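The backend selection introduced in `extract_references_and_questions` above reduces to a small decision function. The sketch below is illustrative only: `ModelType` stands in for `ChatModelOptions.ModelType` with assumed values, the two dataclasses are hypothetical simplifications of the database models, and `pick_backend` is not a function in the codebase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ModelType(str, Enum):
    # Stand-in for ChatModelOptions.ModelType; the string values are assumptions
    OFFLINE = "offline"
    OPENAI = "openai"


@dataclass
class OfflineChatConfig:
    enabled: bool


@dataclass
class ConversationConfig:
    model_type: ModelType


def pick_backend(
    offline: Optional[OfflineChatConfig],
    openai_config: Optional[Any],
    conversation: ConversationConfig,
) -> Optional[ModelType]:
    # Offline chat needs a config that is enabled AND a user preference for it
    if offline and offline.enabled and conversation.model_type == ModelType.OFFLINE:
        return ModelType.OFFLINE
    # OpenAI chat needs a config to exist AND a user preference for it
    if openai_config and conversation.model_type == ModelType.OPENAI:
        return ModelType.OPENAI
    # Neither condition holds: the caller has to decide how to proceed
    return None
```

Compared with the earlier `ahas_offline_chat()` check, the user's configured model type now takes part in the decision, so an enabled offline model no longer overrides a user who selected OpenAI.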


@ -6,10 +6,10 @@ from datetime import datetime
from functools import partial from functools import partial
import logging import logging
from time import time from time import time
from typing import Iterator, List, Optional, Union, Tuple, Dict, Any from typing import Annotated, Iterator, List, Optional, Union, Tuple, Dict, Any
# External Packages # External Packages
from fastapi import HTTPException, Request from fastapi import HTTPException, Header, Request, Depends
# Internal Packages # Internal Packages
from khoj.utils import state from khoj.utils import state
@ -232,3 +232,20 @@ class ApiUserRateLimiter:
# Add the current request to the cache # Add the current request to the cache
user_requests.append(time()) user_requests.append(time())
class CommonQueryParamsClass:
def __init__(
self,
client: Optional[str] = None,
user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None),
host: Optional[str] = Header(None),
):
self.client = client
self.user_agent = user_agent
self.referer = referer
self.host = host
CommonQueryParams = Annotated[CommonQueryParamsClass, Depends()]
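The `**common.__dict__` calls in the route handlers rely on the instance attributes of `CommonQueryParamsClass` matching the keyword names the telemetry call expects. A plain-Python sketch of that mechanism, without FastAPI: `CommonParams` is a stand-in that drops `Header` and `Depends`, and `record_telemetry` is a hypothetical function, not the project's telemetry helper.

```python
from typing import Optional


class CommonParams:
    # Plain stand-in for CommonQueryParamsClass (no FastAPI Header/Depends here)
    def __init__(
        self,
        client: Optional[str] = None,
        user_agent: Optional[str] = None,
        referer: Optional[str] = None,
        host: Optional[str] = None,
    ):
        self.client = client
        self.user_agent = user_agent
        self.referer = referer
        self.host = host


def record_telemetry(
    telemetry_type: str,
    api: str,
    client: Optional[str] = None,
    user_agent: Optional[str] = None,
    referer: Optional[str] = None,
    host: Optional[str] = None,
) -> dict:
    # Hypothetical receiver: returns what it was given so the unpacking is visible
    return {
        "telemetry_type": telemetry_type,
        "api": api,
        "client": client,
        "user_agent": user_agent,
        "referer": referer,
        "host": host,
    }


common = CommonParams(client="web", host="localhost")
row = record_telemetry("api", "search", **common.__dict__)
```

One caveat of this pattern: every attribute set on the instance is forwarded, so adding a field to the class that the receiving function does not accept raises a `TypeError` at call time.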


@ -63,7 +63,7 @@ async def update(
request: Request, request: Request,
files: list[UploadFile], files: list[UploadFile],
force: bool = False, force: bool = False,
t: Optional[Union[state.SearchType, str]] = None, t: Optional[Union[state.SearchType, str]] = state.SearchType.All,
client: Optional[str] = None, client: Optional[str] = None,
user_agent: Optional[str] = Header(None), user_agent: Optional[str] = Header(None),
referer: Optional[str] = Header(None), referer: Optional[str] = Header(None),
@ -182,13 +182,16 @@ def configure_content(
files: Optional[dict[str, dict[str, str]]], files: Optional[dict[str, dict[str, str]]],
search_models: SearchModels, search_models: SearchModels,
regenerate: bool = False, regenerate: bool = False,
t: Optional[state.SearchType] = None, t: Optional[state.SearchType] = state.SearchType.All,
full_corpus: bool = True, full_corpus: bool = True,
user: KhojUser = None, user: KhojUser = None,
) -> tuple[Optional[ContentIndex], bool]: ) -> tuple[Optional[ContentIndex], bool]:
content_index = ContentIndex() content_index = ContentIndex()
success = True success = True
if t is not None and t in [type.value for type in state.SearchType]:
t = state.SearchType(t)
if t is not None and not t.value in [type.value for type in state.SearchType]: if t is not None and not t.value in [type.value for type in state.SearchType]:
logger.warning(f"🚨 Invalid search type: {t}") logger.warning(f"🚨 Invalid search type: {t}")
return None, False return None, False
@ -201,7 +204,7 @@ def configure_content(
try: try:
# Initialize Org Notes Search # Initialize Org Notes Search
if (search_type == None or search_type == state.SearchType.Org.value) and files["org"]: if (search_type == state.SearchType.All.value or search_type == state.SearchType.Org.value) and files["org"]:
logger.info("🦄 Setting up search for orgmode notes") logger.info("🦄 Setting up search for orgmode notes")
# Extract Entries, Generate Notes Embeddings # Extract Entries, Generate Notes Embeddings
text_search.setup( text_search.setup(
@ -217,7 +220,9 @@ def configure_content(
try: try:
# Initialize Markdown Search # Initialize Markdown Search
if (search_type == None or search_type == state.SearchType.Markdown.value) and files["markdown"]: if (search_type == state.SearchType.All.value or search_type == state.SearchType.Markdown.value) and files[
"markdown"
]:
logger.info("💎 Setting up search for markdown notes") logger.info("💎 Setting up search for markdown notes")
# Extract Entries, Generate Markdown Embeddings # Extract Entries, Generate Markdown Embeddings
text_search.setup( text_search.setup(
@ -234,7 +239,7 @@ def configure_content(
try: try:
# Initialize PDF Search # Initialize PDF Search
if (search_type == None or search_type == state.SearchType.Pdf.value) and files["pdf"]: if (search_type == state.SearchType.All.value or search_type == state.SearchType.Pdf.value) and files["pdf"]:
logger.info("🖨️ Setting up search for pdf") logger.info("🖨️ Setting up search for pdf")
# Extract Entries, Generate PDF Embeddings # Extract Entries, Generate PDF Embeddings
text_search.setup( text_search.setup(
@ -251,7 +256,9 @@ def configure_content(
try: try:
# Initialize Plaintext Search # Initialize Plaintext Search
if (search_type == None or search_type == state.SearchType.Plaintext.value) and files["plaintext"]: if (search_type == state.SearchType.All.value or search_type == state.SearchType.Plaintext.value) and files[
"plaintext"
]:
logger.info("📄 Setting up search for plaintext") logger.info("📄 Setting up search for plaintext")
# Extract Entries, Generate Plaintext Embeddings # Extract Entries, Generate Plaintext Embeddings
text_search.setup( text_search.setup(
@ -269,7 +276,7 @@ def configure_content(
try: try:
# Initialize Image Search # Initialize Image Search
if ( if (
(search_type == None or search_type == state.SearchType.Image.value) (search_type == state.SearchType.All.value or search_type == state.SearchType.Image.value)
and content_config and content_config
and content_config.image and content_config.image
and search_models.image_search and search_models.image_search
@ -286,7 +293,9 @@ def configure_content(
try: try:
github_config = GithubConfig.objects.filter(user=user).prefetch_related("githubrepoconfig").first() github_config = GithubConfig.objects.filter(user=user).prefetch_related("githubrepoconfig").first()
if (search_type == None or search_type == state.SearchType.Github.value) and github_config is not None: if (
search_type == state.SearchType.All.value or search_type == state.SearchType.Github.value
) and github_config is not None:
logger.info("🐙 Setting up search for github") logger.info("🐙 Setting up search for github")
# Extract Entries, Generate Github Embeddings # Extract Entries, Generate Github Embeddings
text_search.setup( text_search.setup(
@ -305,7 +314,9 @@ def configure_content(
try: try:
# Initialize Notion Search # Initialize Notion Search
notion_config = NotionConfig.objects.filter(user=user).first() notion_config = NotionConfig.objects.filter(user=user).first()
if (search_type == None or search_type in state.SearchType.Notion.value) and notion_config: if (
search_type == state.SearchType.All.value or search_type in state.SearchType.Notion.value
) and notion_config:
logger.info("🔌 Setting up search for notion") logger.info("🔌 Setting up search for notion")
text_search.setup( text_search.setup(
NotionToEntries, NotionToEntries,
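The hunks above replace the `None` sentinel with `SearchType.All` and coerce string inputs to the enum before comparing. A minimal sketch of that pattern, assuming a string-valued enum: the `SearchType` below lists only three members with assumed values, and `coerce_search_type` and `should_index` are hypothetical helper names.

```python
from enum import Enum
from typing import Optional, Union


class SearchType(str, Enum):
    # Subset of members with assumed values, for illustration only
    All = "all"
    Org = "org"
    Markdown = "markdown"


def coerce_search_type(t: Union[SearchType, str, None]) -> Optional[SearchType]:
    # Pass enum members through unchanged
    if isinstance(t, SearchType):
        return t
    try:
        # Look the member up by value; unknown values raise ValueError
        return SearchType(t)
    except ValueError:
        return None


def should_index(requested: SearchType, content: SearchType) -> bool:
    # Index a content type when everything is requested or it is requested by name
    return requested in (SearchType.All, content)
```

Using an explicit `All` member instead of `None` keeps "index everything" distinguishable from "no type was given", which is the ambiguity the default change in the diff removes.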


@ -229,7 +229,7 @@ def collate_results(hits, image_names, output_directory, image_files_url, count=
# Add the image metadata to the results # Add the image metadata to the results
results += [ results += [
SearchResponse.parse_obj( SearchResponse.model_validate(
{ {
"entry": f"{image_files_url}/{target_image_name}", "entry": f"{image_files_url}/{target_image_name}",
"score": f"{hit['score']:.9f}", "score": f"{hit['score']:.9f}",
@ -237,7 +237,7 @@ def collate_results(hits, image_names, output_directory, image_files_url, count=
"image_score": f"{hit['image_score']:.9f}", "image_score": f"{hit['image_score']:.9f}",
"metadata_score": f"{hit['metadata_score']:.9f}", "metadata_score": f"{hit['metadata_score']:.9f}",
}, },
"corpus_id": hit["corpus_id"], "corpus_id": str(hit["corpus_id"]),
} }
) )
] ]


@ -163,7 +163,7 @@ def deduplicated_search_responses(hits: List[SearchResponse]):
else: else:
hit_ids.add(hit.corpus_id) hit_ids.add(hit.corpus_id)
yield SearchResponse.parse_obj( yield SearchResponse.model_validate(
{ {
"entry": hit.entry, "entry": hit.entry,
"score": hit.score, "score": hit.score,


@ -288,15 +288,15 @@ def generate_random_name():
# List of adjectives and nouns to choose from # List of adjectives and nouns to choose from
adjectives = [ adjectives = [
"happy", "happy",
"irritated", "serendipitous",
"annoyed", "exuberant",
"calm", "calm",
"brave", "brave",
"scared", "scared",
"energetic", "energetic",
"chivalrous", "chivalrous",
"kind", "kind",
"grumpy", "suave",
] ]
nouns = ["dog", "cat", "falcon", "whale", "turtle", "rabbit", "hamster", "snake", "spider", "elephant"] nouns = ["dog", "cat", "falcon", "whale", "turtle", "rabbit", "hamster", "snake", "spider", "elephant"]


@ -14,7 +14,7 @@ from khoj.utils.helpers import to_snake_case_from_dash
class ConfigBase(BaseModel): class ConfigBase(BaseModel):
class Config: class Config:
alias_generator = to_snake_case_from_dash alias_generator = to_snake_case_from_dash
allow_population_by_field_name = True populate_by_name = True
def __getitem__(self, item): def __getitem__(self, item):
return getattr(self, item) return getattr(self, item)
@ -29,8 +29,8 @@ class TextConfigBase(ConfigBase):
class TextContentConfig(ConfigBase): class TextContentConfig(ConfigBase):
input_files: Optional[List[Path]] input_files: Optional[List[Path]] = None
input_filter: Optional[List[str]] input_filter: Optional[List[str]] = None
index_heading_entries: Optional[bool] = False index_heading_entries: Optional[bool] = False
@ -50,31 +50,31 @@ class NotionContentConfig(ConfigBase):
class ImageContentConfig(ConfigBase): class ImageContentConfig(ConfigBase):
input_directories: Optional[List[Path]] input_directories: Optional[List[Path]] = None
input_filter: Optional[List[str]] input_filter: Optional[List[str]] = None
embeddings_file: Path embeddings_file: Path
use_xmp_metadata: bool use_xmp_metadata: bool
batch_size: int batch_size: int
class ContentConfig(ConfigBase): class ContentConfig(ConfigBase):
org: Optional[TextContentConfig] org: Optional[TextContentConfig] = None
image: Optional[ImageContentConfig] image: Optional[ImageContentConfig] = None
markdown: Optional[TextContentConfig] markdown: Optional[TextContentConfig] = None
pdf: Optional[TextContentConfig] pdf: Optional[TextContentConfig] = None
plaintext: Optional[TextContentConfig] plaintext: Optional[TextContentConfig] = None
github: Optional[GithubContentConfig] github: Optional[GithubContentConfig] = None
notion: Optional[NotionContentConfig] notion: Optional[NotionContentConfig] = None
class ImageSearchConfig(ConfigBase): class ImageSearchConfig(ConfigBase):
encoder: str encoder: str
encoder_type: Optional[str] encoder_type: Optional[str] = None
model_directory: Optional[Path] model_directory: Optional[Path] = None
class SearchConfig(ConfigBase): class SearchConfig(ConfigBase):
image: Optional[ImageSearchConfig] image: Optional[ImageSearchConfig] = None
class OpenAIProcessorConfig(ConfigBase): class OpenAIProcessorConfig(ConfigBase):
@ -95,26 +95,26 @@ class ConversationProcessorConfig(ConfigBase):
class ProcessorConfig(ConfigBase): class ProcessorConfig(ConfigBase):
conversation: Optional[ConversationProcessorConfig] conversation: Optional[ConversationProcessorConfig] = None
class AppConfig(ConfigBase): class AppConfig(ConfigBase):
should_log_telemetry: bool should_log_telemetry: bool = True
class FullConfig(ConfigBase): class FullConfig(ConfigBase):
content_type: Optional[ContentConfig] = None content_type: Optional[ContentConfig] = None
search_type: Optional[SearchConfig] = None search_type: Optional[SearchConfig] = None
processor: Optional[ProcessorConfig] = None processor: Optional[ProcessorConfig] = None
app: Optional[AppConfig] = AppConfig(should_log_telemetry=True) app: Optional[AppConfig] = AppConfig()
version: Optional[str] = None version: Optional[str] = None
class SearchResponse(ConfigBase): class SearchResponse(ConfigBase):
entry: str entry: str
score: float score: float
cross_score: Optional[float] cross_score: Optional[float] = None
additional: Optional[dict] additional: Optional[dict] = None
corpus_id: str corpus_id: str


@ -39,7 +39,7 @@ def load_config_from_file(yaml_config_file: Path) -> dict:
def parse_config_from_string(yaml_config: dict) -> FullConfig: def parse_config_from_string(yaml_config: dict) -> FullConfig:
"Parse and validate config in YML string" "Parse and validate config in YML string"
return FullConfig.parse_obj(yaml_config) return FullConfig.model_validate(yaml_config)
def parse_config_from_file(yaml_config_file): def parse_config_from_file(yaml_config_file):


@ -9,9 +9,6 @@ import os
from fastapi import FastAPI from fastapi import FastAPI
app = FastAPI()
# Internal Packages # Internal Packages
from khoj.configure import configure_routes, configure_search_types, configure_middleware from khoj.configure import configure_routes, configure_search_types, configure_middleware
from khoj.processor.embeddings import CrossEncoderModel, EmbeddingsModel from khoj.processor.embeddings import CrossEncoderModel, EmbeddingsModel
@ -320,6 +317,7 @@ def client(
state.anonymous_mode = False state.anonymous_mode = False
app = FastAPI()
configure_routes(app) configure_routes(app)
configure_middleware(app) configure_middleware(app)
app.mount("/static", StaticFiles(directory=web_directory), name="static") app.mount("/static", StaticFiles(directory=web_directory), name="static")


@ -227,7 +227,7 @@ def test_answer_not_known_using_notes_command(chat_client_no_background, default
# Assert # Assert
assert response.status_code == 200 assert response.status_code == 200
assert response_message == prompts.no_notes_found.format() assert response_message == prompts.no_entries_found.format()
# ---------------------------------------------------------------------------------------------------- # ----------------------------------------------------------------------------------------------------


@ -26,5 +26,6 @@
"0.12.2": "0.15.0", "0.12.2": "0.15.0",
"0.12.3": "0.15.0", "0.12.3": "0.15.0",
"0.13.0": "0.15.0", "0.13.0": "0.15.0",
"0.14.0": "0.15.0" "0.14.0": "0.15.0",
"1.0.0": "0.15.0"
} }