mirror of
https://github.com/khoj-ai/khoj.git
synced 2024-11-27 17:35:07 +01:00
Fix diff blocks, links, remove footnotes & rearrange sections in docs
Extract performance into a separate section instead of shoving it under search. Create page for web interface
Parent: 2ddee7e745
Commit: c28755ccd2
12 changed files with 125 additions and 115 deletions
4
.github/workflows/build_desktop.yml
vendored
|
@ -4,6 +4,10 @@ on:
|
||||||
push:
|
push:
|
||||||
branches:
|
branches:
|
||||||
- master
|
- master
|
||||||
|
paths:
|
||||||
|
- src/khoj/**
|
||||||
|
- pyproject.toml
|
||||||
|
- Khoj.spec
|
||||||
workflow_dispatch:
|
workflow_dispatch:
|
||||||
|
|
||||||
jobs:
|
jobs:
|
||||||
|
|
|
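With the new `paths` filter, a push to `master` only triggers the desktop build when it touches the server source, `pyproject.toml` or `Khoj.spec`; manual `workflow_dispatch` runs are not filtered. To sanity-check which changed files would trigger a build, here is a rough Python sketch. It is an approximation, not GitHub's matcher: `fnmatchcase` lets `*` cross `/`, which GitHub's glob syntax does not, and `should_trigger_desktop_build` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Patterns copied from the `paths` filter added to build_desktop.yml
DESKTOP_BUILD_PATHS = ["src/khoj/**", "pyproject.toml", "Khoj.spec"]


def should_trigger_desktop_build(changed_files: list[str]) -> bool:
    """Roughly estimate whether a push touching `changed_files` matches the paths filter.

    fnmatchcase is case-sensitive like GitHub, but its `*` also matches `/`,
    so this is only a stand-in for GitHub's own glob rules.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in DESKTOP_BUILD_PATHS
    )


if __name__ == "__main__":
    print(should_trigger_desktop_build(["docs/web.md"]))       # docs-only push -> False
    print(should_trigger_desktop_build(["src/khoj/main.py"]))  # server change -> True
```

The example file paths are illustrative; a docs-only commit like this one would no longer kick off a desktop build on push.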
@ -1,18 +1,20 @@
|
||||||
- Getting Started
|
- Get Started
|
||||||
- [Overview](README.md)
|
- [Overview](overview.md)
|
||||||
- [Install](setup.md)
|
- [Install](setup.md)
|
||||||
- [Windows Installation](windows_install.md)
|
- [Windows Installation](windows_install.md)
|
||||||
- Learn More
|
- [Demos](demos.md)
|
||||||
|
- Use
|
||||||
- [Features](features.md)
|
- [Features](features.md)
|
||||||
- [Chat](chat.md)
|
- [Chat](chat.md)
|
||||||
- [Search](search.md)
|
- [Search](search.md)
|
||||||
- [Demos](demos.md)
|
|
||||||
- Interfaces
|
- Interfaces
|
||||||
- [Obsidian](obsidian.md)
|
- [Obsidian](obsidian.md)
|
||||||
- [Emacs](emacs.md)
|
- [Emacs](emacs.md)
|
||||||
|
- [Web](web.md)
|
||||||
- Data Sources
|
- Data Sources
|
||||||
- [Github](github_integration.md)
|
- [Github](github_integration.md)
|
||||||
- [Notion](notion_integration.md)
|
- [Notion](notion_integration.md)
|
||||||
- [Advanced](advanced.md)
|
- [Advanced](advanced.md)
|
||||||
|
- [Performance](performance.md)
|
||||||
- Contributing
|
- Contributing
|
||||||
- [Development](development.md)
|
- [Development](development.md)
|
||||||
|
|
|
@ -1,54 +1,60 @@
|
||||||
|
|
||||||
## Advanced Usage
|
## Advanced Usage
|
||||||
|
### Search across Different Languages
|
||||||
|
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
|
||||||
|
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
|
||||||
|
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
|
||||||
|
|
||||||
|
```diff
|
||||||
|
asymmetric:
|
||||||
|
- encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
|
||||||
|
+ encoder: paraphrase-multilingual-MiniLM-L12-v2
|
||||||
|
cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
|
||||||
|
model_directory: "~/.khoj/search/asymmetric/"
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
|
||||||
|
|
||||||
### Access Khoj on Mobile
|
### Access Khoj on Mobile
|
||||||
1. [Setup Khoj](#Setup) on your personal server. This can be any always-on machine, i.e an old computer, RaspberryPi(?) etc
|
1. [Set up Khoj](/#/setup) on your personal server. This can be any always-on machine, e.g. an old computer, a Raspberry Pi(?), etc.
|
||||||
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](https://tailscale.com/) on your personal server and phone
|
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](https://tailscale.com/) on your personal server and phone
|
||||||
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've set up [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
|
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've set up [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
|
||||||
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
|
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
|
||||||
5. Enjoy exploring your notes, documents and images from your phone!
|
5. Enjoy exploring your notes, documents and images from your phone!
|
||||||
|
|
||||||
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_pwa_android.png?)
|
![](./assets/khoj_pwa_android.png?)
|
||||||
|
|
||||||
### Search across Different Languages
|
### Use OpenAI Models for Search
|
||||||
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
|
#### Setup
|
||||||
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
|
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml` (at `~/.khoj/khoj.yml`):
|
||||||
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
|
```diff
|
||||||
```diff
|
asymmetric:
|
||||||
asymmetric:
|
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
|
||||||
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
|
+ encoder: text-embedding-ada-002
|
||||||
+ encoder: "paraphrase-multilingual-MiniLM-L12-v2"
|
+ encoder-type: khoj.utils.models.OpenAI
|
||||||
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
||||||
model_directory: "~/.khoj/search/asymmetric/"
|
- encoder-type: sentence_transformers.SentenceTransformer
|
||||||
```
|
- model_directory: "~/.khoj/search/asymmetric/"
|
||||||
|
+ model-directory: null
|
||||||
|
```
|
||||||
|
2. [Set up your OpenAI API key in Khoj](/#/chat?id=setup)
|
||||||
|
3. Restart the Khoj server to generate embeddings. This will take longer than with the offline search models.
|
||||||
|
|
||||||
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
|
#### Warnings
|
||||||
|
This configuration *uses an online model*
|
||||||
|
- It will **send all notes to OpenAI** to generate embeddings
|
||||||
|
- **All queries will be sent to OpenAI** when you search with Khoj
|
||||||
|
- You will be **charged by OpenAI** based on the total tokens processed
|
||||||
|
- It *requires an active internet connection* to search and index
|
||||||
|
|
||||||
### Bootstrap Khoj Search for Offline Usage later
|
### Bootstrap Khoj Search for Offline Usage later
|
||||||
|
|
||||||
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
|
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
|
||||||
Note: *Only search can currently run in fully offline mode, not chat.*
|
Note: *Only search can currently run in fully offline mode, not chat.*
|
||||||
|
|
||||||
- With Internet
|
- With Internet
|
||||||
1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
|
1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
|
||||||
2. Pip install khoj (and dependencies) in an associated virtualenv. E.g. `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
|
2. Pip install khoj (and dependencies) in an associated virtualenv. E.g. `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
|
||||||
- Without Internet
|
- Without Internet
|
||||||
1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
|
1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
|
||||||
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g. `source .venv/bin/activate && khoj`
|
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g. `source .venv/bin/activate && khoj`
|
||||||
|
|
||||||
|
|
||||||
## Miscellaneous
|
|
||||||
### Set your OpenAI API key in Khoj
|
|
||||||
If you want, Khoj can be configured to use OpenAI for search and chat.<br />
|
|
||||||
Add your OpenAI API to Khoj by using either of the two options below:
|
|
||||||
- Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
|
|
||||||
- Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml`[^1] to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
|
|
||||||
```diff
|
|
||||||
processor:
|
|
||||||
conversation:
|
|
||||||
- openai-api-key: # "YOUR_OPENAI_API_KEY"
|
|
||||||
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
|
|
||||||
model: "text-davinci-003"
|
|
||||||
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
|
|
||||||
```
|
|
||||||
|
|
||||||
!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing
|
|
||||||
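For the *Bootstrap Khoj Search for Offline Usage later* steps, copying the downloaded models into place on the air-gapped machine can be scripted. The Python sketch below assumes the three HuggingFace model folders are already on local disk and, going by the `model_directory: "~/.khoj/search/asymmetric/"` value in the config, that each model's files sit directly inside its `asymmetric`, `symmetric` or `image` folder. The `stage_search_models` helper and the source paths are hypothetical, and the script does not verify that Khoj can load what it copies.

```python
import shutil
from pathlib import Path

# Target sub-folders named in the offline bootstrap instructions
SEARCH_TYPES = ("asymmetric", "symmetric", "image")


def stage_search_models(model_dirs: dict[str, Path], search_root: Path) -> list[Path]:
    """Copy each downloaded model folder's contents into <search_root>/<search-type>/.

    `model_dirs` maps a search type to the local folder of a model downloaded
    from HuggingFace, e.g. {"asymmetric": Path("multi-qa-MiniLM-L6-cos-v1")}.
    """
    unknown = set(model_dirs) - set(SEARCH_TYPES)
    if unknown:
        raise ValueError(f"Unexpected search types: {sorted(unknown)}")

    staged = []
    for search_type, source in model_dirs.items():
        destination = search_root / search_type
        # dirs_exist_ok allows re-running the copy over an existing folder
        shutil.copytree(source, destination, dirs_exist_ok=True)
        staged.append(destination)
    return staged


if __name__ == "__main__":
    default_root = Path.home() / ".khoj" / "search"
    print(f"Models would be staged under {default_root}")
```

On the air-gapped machine you would call it with `Path.home() / ".khoj" / "search"` as `search_root` before starting khoj.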
|
|
22
docs/chat.md
|
@ -1,16 +1,30 @@
|
||||||
### Khoj Chat
|
### Khoj Chat
|
||||||
#### Overview
|
#### Overview
|
||||||
- Creates a personal assistant for you to inquire and engage with your notes
|
- Creates a personal assistant for you to inquire and engage with your notes
|
||||||
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](#khoj-search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
|
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](/#/search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
|
||||||
- Supports multi-turn conversations with the relevant notes for context
|
- Supports multi-turn conversations with the relevant notes for context
|
||||||
- Shows reference notes used to generate a response
|
- Shows reference notes used to generate a response
|
||||||
- **Note**: *Your query and top notes from khoj search will be sent to OpenAI for processing*
|
|
||||||
|
!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing
|
||||||
|
|
||||||
#### Setup
|
#### Setup
|
||||||
- [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
|
- Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)
|
||||||
|
- Add your OpenAI API key to Khoj using either of the two options below:
|
||||||
|
|
||||||
|
- Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
|
||||||
|
|
||||||
|
- Set the `openai-api-key` field under the `processor.conversation` section in your `khoj.yml` at `~/.khoj/khoj.yml` to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart Khoj:
|
||||||
|
```diff
|
||||||
|
processor:
|
||||||
|
conversation:
|
||||||
|
- openai-api-key: # "YOUR_OPENAI_API_KEY"
|
||||||
|
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
|
||||||
|
model: "text-davinci-003"
|
||||||
|
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
|
||||||
|
```
|
||||||
|
|
||||||
#### Use
|
#### Use
|
||||||
1. Open [/chat](http://localhost:42110/chat)[^2]
|
1. Open [/chat](http://localhost:42110/chat)
|
||||||
2. Type your queries and see Khoj's responses based on your notes
|
2. Type your queries and see Khoj's responses based on your notes
|
||||||
|
|
||||||
#### Demo
|
#### Demo
|
||||||
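If you would rather script the second chat setup option than edit `khoj.yml` by hand, the change amounts to filling in `processor.conversation.openai-api-key`. This Python sketch works on the config as an already-parsed dict (reading and writing the YAML file needs a third-party YAML library and is left out). Falling back to an `OPENAI_API_KEY` environment variable is an assumption of this example, not Khoj behaviour, and `set_openai_api_key` is a hypothetical helper.

```python
import copy
import os
from typing import Optional


def set_openai_api_key(config: dict, api_key: Optional[str] = None) -> dict:
    """Return a copy of a parsed khoj.yml dict with processor.conversation.openai-api-key set."""
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("No OpenAI API key given and OPENAI_API_KEY is not set")

    updated = copy.deepcopy(config)
    # An empty YAML section (e.g. `processor:`) parses to None, so fall back to {}
    processor = updated.get("processor") or {}
    conversation = processor.get("conversation") or {}
    conversation["openai-api-key"] = key
    processor["conversation"] = conversation
    updated["processor"] = processor
    return updated


if __name__ == "__main__":
    example = {
        "processor": {
            "conversation": {
                "openai-api-key": None,
                "model": "text-davinci-003",
                "conversation-logfile": "~/.khoj/processor/conversation/conversation_logs.json",
            }
        }
    }
    print(set_openai_api_key(example, "sk-example")["processor"]["conversation"]["openai-api-key"])
```

Other keys such as `model` and `conversation-logfile` are left untouched; Khoj still needs a restart after the file is written back.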
|
|
|
@ -40,8 +40,8 @@ git clone https://github.com/khoj-ai/khoj && cd khoj
|
||||||
|
|
||||||
#### 2. Configure
|
#### 2. Configure
|
||||||
|
|
||||||
- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
|
- **Required**: Update [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
|
||||||
- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
|
- **Optional**: Edit application configuration in [khoj_docker.yml](https://github.com/khoj-ai/khoj/blob/master/config/khoj_docker.yml)
|
||||||
|
|
||||||
#### 3. Run
|
#### 3. Run
|
||||||
|
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Emacs</h1>
|
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Emacs</h1>
|
||||||
|
|
||||||
An AI personal assistance for your digital brain
|
> An AI personal assistant for your digital brain
|
||||||
|
|
||||||
<img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge">
|
<img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge">
|
||||||
<img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge">
|
<img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge">
|
||||||
|
@ -100,7 +100,7 @@ Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Ch
|
||||||
|
|
||||||
E.g "When did I file my taxes last year?"
|
E.g "When did I file my taxes last year?"
|
||||||
|
|
||||||
See [Khoj Chat](./README.md#khoj-chat) for more details
|
See [Khoj Chat](/#/chat) for more details
|
||||||
|
|
||||||
### Find Similar Entries
|
### Find Similar Entries
|
||||||
This feature finds entries similar to the one you are currently on.
|
This feature finds entries similar to the one you are currently on.
|
||||||
|
|
|
@ -25,6 +25,7 @@
|
||||||
<script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script>
|
<script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script>
|
||||||
<script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script>
|
<script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script>
|
||||||
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>
|
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>
|
||||||
|
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-diff.min.js"></script>
|
||||||
<script defer data-domain="khoj.dev" src="https://plausible.io/js/script.js"></script>
|
<script defer data-domain="khoj.dev" src="https://plausible.io/js/script.js"></script>
|
||||||
</body>
|
</body>
|
||||||
<style>
|
<style>
|
||||||
|
|
|
@ -7,7 +7,7 @@
|
||||||
# Khoj
|
# Khoj
|
||||||
*An AI personal assistant for your digital brain*
|
*An AI personal assistant for your digital brain*
|
||||||
|
|
||||||
Welcome to the Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to dive straight into the code.
|
Welcome to the Khoj Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to explore the code or our [Website](https://khoj.dev) for an invite to Khoj Cloud.
|
||||||
|
|
||||||
Khoj gives you lightning fast, offline search on your personal machine and gives you the power to talk to your notes.
|
Khoj gives you lightning fast, offline search on your personal machine and gives you the power to talk to your notes.
|
||||||
|
|
||||||
|
@ -28,7 +28,7 @@ Khoj gives you lightning fast, offline search on your personal machine and gives
|
||||||
- **Natural**: Advanced natural language understanding using Transformer based ML Models
|
- **Natural**: Advanced natural language understanding using Transformer based ML Models
|
||||||
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
|
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
|
||||||
- **Multiple Sources**: Index your Org-mode and Markdown notes, PDF files, Github repositories, and Photos
|
- **Multiple Sources**: Index your Org-mode and Markdown notes, PDF files, Github repositories, and Photos
|
||||||
- **Multiple Interfaces**: Interact from your [Web Browser](./src/khoj/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or [Obsidian](./src/interface/obsidian/)
|
- **Multiple Interfaces**: Interact from your [Web Browser](https://docs.khoj.dev/#/web), [Emacs](https://docs.khoj.dev/#/emacs) or [Obsidian](https://docs.khoj.dev/#/obsidian)
|
||||||
|
|
||||||
## Install
|
## Install
|
||||||
[Click here](./setup.md) for full setup instructions.
|
[Click here](./setup.md) for full setup instructions.
|
||||||
|
@ -52,8 +52,3 @@ If you're using Github or Notion, you can get on a waitlist for [Khoj Cloud](htt
|
||||||
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
|
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
|
||||||
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
|
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
|
||||||
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
|
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
|
||||||
|
|
||||||
|
|
||||||
[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
|
|
||||||
|
|
||||||
[^2]: Default Khoj url @ http://localhost:42110
|
|
19
docs/performance.md
Normal file
|
@ -0,0 +1,19 @@
|
||||||
|
## Performance
|
||||||
|
|
||||||
|
### Search performance
|
||||||
|
|
||||||
|
- Semantic search using the bi-encoder is fairly fast at \<100 ms across all content types
|
||||||
|
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to trade off speed against accuracy of results
|
||||||
|
- Filters in query (e.g. by file, word or date) usually add \<20ms to query latency
|
||||||
|
|
||||||
|
### Indexing performance
|
||||||
|
|
||||||
|
- Indexing is more strongly impacted by the size of the source data
|
||||||
|
- Indexing a 100K+ line corpus of notes takes about 10 minutes
|
||||||
|
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
|
||||||
|
- Note: *It should only take this long on the first run* as the index is incrementally updated
|
||||||
|
|
||||||
|
### Miscellaneous
|
||||||
|
|
||||||
|
- Testing was done on an M1 Mac with a \>100K line corpus of notes
|
||||||
|
- Search and indexing on a GPU have not been tested yet
|
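The speed gap between the two search stages comes from how much work each does: the bi-encoder embeds the query once and compares it against note embeddings computed at indexing time, while the cross-encoder runs a model over every (query, note) pair it reranks, so its cost grows with `top_k`. The toy Python sketch below shows that retrieve-then-rerank pattern; the vectors and the `rerank_score` callable are stand-ins for real models, and `search` is a hypothetical function, not Khoj's code.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    note_vecs: dict[str, Sequence[float]],
    rerank_score: Callable[[str], float],
    top_k: int = 15,
) -> list[str]:
    """Two-stage search: cheap embedding retrieval, then costly reranking of top_k hits."""
    # Stage 1 (bi-encoder style): one cheap similarity per note against precomputed vectors
    ranked = sorted(note_vecs, key=lambda note: cosine(query_vec, note_vecs[note]), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2 (cross-encoder style): one expensive score per candidate, so cost grows with top_k
    return sorted(candidates, key=rerank_score, reverse=True)


if __name__ == "__main__":
    notes = {"taxes": [1.0, 0.0], "recipes": [0.0, 1.0], "budget": [0.9, 0.1]}
    # Pretend the reranker judges "budget" more relevant than "taxes"
    scores = {"budget": 2.0, "taxes": 1.0, "recipes": 0.0}
    print(search([1.0, 0.0], notes, scores.get, top_k=2))  # ['budget', 'taxes']
```

Lowering `top_k` means fewer reranker calls (faster) but a higher chance that a relevant note never reaches the second stage.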
|
@ -1,11 +1,11 @@
|
||||||
## Khoj Search
|
## Khoj Search
|
||||||
- **Khoj via Obsidian**
|
- **Using Obsidian**
|
||||||
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
|
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
|
||||||
- **Khoj via Emacs**
|
- **Using Emacs**
|
||||||
- Run `M-x khoj <user-query>`
|
- Run `M-x khoj <user-query>`
|
||||||
- **Khoj via Web**
|
- **Using Web**
|
||||||
- Open <http://localhost:42110/> directly
|
- Open <http://localhost:42110/> in your web browser
|
||||||
- **Khoj via API**
|
- **Using API**
|
||||||
- See the Khoj FastAPI [Swagger Docs](http://localhost:42110/docs), [ReDocs](http://localhost:42110/redocs)
|
- See the Khoj FastAPI [Swagger Docs](http://localhost:42110/docs), [ReDocs](http://localhost:42110/redocs)
|
||||||
|
|
||||||
### Query Filters
|
### Query Filters
|
||||||
|
@ -27,53 +27,3 @@ Use structured query syntax to filter the natural language search results
|
||||||
- containing dates from the year *1984*
|
- containing dates from the year *1984*
|
||||||
- excluding words *"big"* and *"brother"*
|
- excluding words *"big"* and *"brother"*
|
||||||
- that best match the natural language query *"what is the meaning of life?"*
|
- that best match the natural language query *"what is the meaning of life?"*
|
||||||
|
|
||||||
## Details
|
|
||||||
1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
|
|
||||||
2. These notes, the last few messages and associated metadata is passed to ChatGPT along with your query for a response
|
|
||||||
|
|
||||||
|
|
||||||
## Performance
|
|
||||||
|
|
||||||
### Query performance
|
|
||||||
|
|
||||||
- Semantic search using the bi-encoder is fairly fast at \<50 ms
|
|
||||||
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
|
|
||||||
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
|
|
||||||
|
|
||||||
### Indexing performance
|
|
||||||
|
|
||||||
- Indexing is more strongly impacted by the size of the source data
|
|
||||||
- Indexing 100K+ line corpus of notes takes about 10 minutes
|
|
||||||
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
|
|
||||||
- Note: *It should only take this long on the first run* as the index is incrementally updated
|
|
||||||
|
|
||||||
### Miscellaneous
|
|
||||||
|
|
||||||
- Testing done on a Mac M1 and a \>100K line corpus of notes
|
|
||||||
- Search, indexing on a GPU has not been tested yet
|
|
||||||
|
|
||||||
## Advanced Usage
|
|
||||||
|
|
||||||
### Use OpenAI Models for Search
|
|
||||||
#### Setup
|
|
||||||
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml`[^1]:
|
|
||||||
```diff
|
|
||||||
asymmetric:
|
|
||||||
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
|
|
||||||
+ encoder: text-embedding-ada-002
|
|
||||||
+ encoder-type: khoj.utils.models.OpenAI
|
|
||||||
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
|
||||||
- encoder-type: sentence_transformers.SentenceTransformer
|
|
||||||
- model_directory: "~/.khoj/search/asymmetric/"
|
|
||||||
+ model-directory: null
|
|
||||||
```
|
|
||||||
2. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
|
|
||||||
3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
|
|
||||||
|
|
||||||
#### Warnings
|
|
||||||
This configuration *uses an online model*
|
|
||||||
- It will **send all notes to OpenAI** to generate embeddings
|
|
||||||
- **All queries will be sent to OpenAI** when you search with Khoj
|
|
||||||
- You will be **charged by OpenAI** based on the total tokens processed
|
|
||||||
- It *requires an active internet connection* to search and index
|
|
||||||
|
|
|
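The *Use OpenAI Models for Search* config change (moved here from search.md into advanced.md) swaps the local sentence-transformers encoder for OpenAI's `text-embedding-ada-002`. The Python sketch below applies the same edit to a parsed `khoj.yml` dict, mirroring the keys exactly as they appear in that diff, including dropping `model_directory` and setting `model-directory` to null. `use_openai_encoder` is a hypothetical helper, and the warnings in that section still apply: all notes and queries get sent to OpenAI.

```python
import copy


def use_openai_encoder(config: dict, search_type: str = "asymmetric") -> dict:
    """Return a copy of a parsed khoj.yml dict with `search_type` using OpenAI embeddings.

    Mirrors the documented diff: set encoder and encoder-type, drop the local
    model_directory and set model-directory to null (None in Python).
    """
    updated = copy.deepcopy(config)
    search = updated["search-type"][search_type]
    search["encoder"] = "text-embedding-ada-002"
    search["encoder-type"] = "khoj.utils.models.OpenAI"
    search.pop("model_directory", None)
    search["model-directory"] = None
    return updated


if __name__ == "__main__":
    config = {
        "search-type": {
            "asymmetric": {
                "encoder": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
                "cross-encoder": "cross-encoder/ms-marco-MiniLM-L-6-v2",
                "encoder-type": "sentence_transformers.SentenceTransformer",
                "model_directory": "~/.khoj/search/asymmetric/",
            }
        }
    }
    print(use_openai_encoder(config)["search-type"]["asymmetric"])
```

Pass `search_type="symmetric"` to change the symmetric search model as well; the cross-encoder is left as is.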
@ -87,7 +87,7 @@ pip install --upgrade --pre khoj-assistant
|
||||||
- **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details
|
- **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details
|
||||||
|
|
||||||
#### Search starts giving wonky results
|
#### Search starts giving wonky results
|
||||||
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true)[^2] in browser to regenerate index from scratch
|
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true) in browser to regenerate index from scratch
|
||||||
- **Note**: *This is a fix for when you perceive the search results have degraded. Not if you think they've always given wonky results*
|
- **Note**: *This is a fix for when you perceive the search results have degraded. Not if you think they've always given wonky results*
|
||||||
|
|
||||||
#### Khoj in Docker errors out with \"Killed\" in error message
|
#### Khoj in Docker errors out with \"Killed\" in error message
|
||||||
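For the *Search starts giving wonky results* fix, the forced re-index can also be triggered from a script instead of a browser. This Python sketch assumes the server runs at the default `http://localhost:42110` and that a plain GET to `/api/update` is enough, as opening the link in a browser suggests. The docs use `force=true` here but `t=force` in the multilingual search steps, so pass whichever matches your Khoj version. `update_url` and `regenerate_index` are hypothetical helpers.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

KHOJ_URL = "http://localhost:42110"  # default Khoj URL used throughout the docs


def update_url(base_url: str = KHOJ_URL, **params: str) -> str:
    """Build the /api/update URL, defaulting to the force=true form used in troubleshooting."""
    query = urlencode(params or {"force": "true"})
    return f"{base_url.rstrip('/')}/api/update?{query}"


def regenerate_index(base_url: str = KHOJ_URL, timeout: float = 600, **params: str) -> int:
    """Send the update request and return the HTTP status code.

    Re-indexing a large corpus may take minutes, hence the generous timeout.
    """
    with urlopen(update_url(base_url, **params), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    print(update_url())           # http://localhost:42110/api/update?force=true
    print(update_url(t="force"))  # http://localhost:42110/api/update?t=force
```

Only `update_url` runs offline; `regenerate_index` needs a running Khoj server.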
|
|
19
docs/web.md
Normal file
|
@ -0,0 +1,19 @@
|
||||||
|
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Web</h1>
|
||||||
|
|
||||||
|
> An AI personal assistant for your Digital Brain
|
||||||
|
|
||||||
|
## Features
|
||||||
|
- **Search**
|
||||||
|
- **Natural**: Advanced natural language understanding using Transformer based ML Models
|
||||||
|
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat, which requires access to GPT.*
|
||||||
|
- **Incremental**: Incremental search for a fast, search-as-you-type experience
|
||||||
|
- **Chat**
|
||||||
|
- **Faster answers**: Find answers faster and with less effort than search
|
||||||
|
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
|
||||||
|
- **Assisted creativity**: Smoothly weave across answer retrieval and content generation
|
||||||
|
|
||||||
|
## Setup
|
||||||
|
The Khoj web interface is the default interface. It comes packaged with the khoj server.
|
||||||
|
|
||||||
|
## Interface
|
||||||
|
![](./assets/khoj_chat_web_interface.png?)
|