mirror of
https://github.com/khoj-ai/khoj.git
synced 2024-11-27 09:25:06 +01:00
Fix diff blocks, links, remove footnotes & rearrange sections in docs
Extract performance into a separate section instead of shoving it under search. Create page for web interface.
parent
2ddee7e745
commit
c28755ccd2
12 changed files with 125 additions and 115 deletions
4
.github/workflows/build_desktop.yml
vendored
|
@ -4,6 +4,10 @@ on:
|
|||
push:
|
||||
branches:
|
||||
- master
|
||||
paths:
|
||||
- src/khoj/**
|
||||
- pyproject.toml
|
||||
- Khoj.spec
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
|
|
|
@ -1,18 +1,20 @@
|
|||
- Getting Started
|
||||
- [Overview](README.md)
|
||||
- Get Started
|
||||
- [Overview](overview.md)
|
||||
- [Install](setup.md)
|
||||
- [Windows Installation](windows_install.md)
|
||||
- Learn More
|
||||
- [Demos](demos.md)
|
||||
- Use
|
||||
- [Features](features.md)
|
||||
- [Chat](chat.md)
|
||||
- [Search](search.md)
|
||||
- [Demos](demos.md)
|
||||
- Interfaces
|
||||
- [Obsidian](obsidian.md)
|
||||
- [Emacs](emacs.md)
|
||||
- [Web](web.md)
|
||||
- Data Sources
|
||||
- [Github](github_integration.md)
|
||||
- [Notion](notion_integration.md)
|
||||
- [Advanced](advanced.md)
|
||||
- [Performance](performance.md)
|
||||
- Contributing
|
||||
- [Development](development.md)
|
||||
|
|
|
@ -1,54 +1,60 @@
|
|||
|
||||
## Advanced Usage
|
||||
### Search across Different Languages
|
||||
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
|
||||
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
|
||||
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
|
||||
|
||||
```diff
|
||||
asymmetric:
|
||||
- encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
|
||||
+ encoder: paraphrase-multilingual-MiniLM-L12-v2
|
||||
cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
|
||||
model_directory: "~/.khoj/search/asymmetric/"
|
||||
```
|
||||
|
||||
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
|
||||
|
||||
### Access Khoj on Mobile
|
||||
1. [Setup Khoj](#Setup) on your personal server. This can be any always-on machine, e.g. an old computer, RaspberryPi(?) etc
|
||||
1. [Setup Khoj](/#/setup) on your personal server. This can be any always-on machine, e.g. an old computer, RaspberryPi(?) etc
|
||||
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](https://tailscale.com/) on your personal server and phone
|
||||
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've setup [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
|
||||
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
|
||||
5. Enjoy exploring your notes, documents and images from your phone!
|
||||
|
||||
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_pwa_android.png?)
|
||||
![](./assets/khoj_pwa_android.png?)
|
||||
|
||||
### Search across Different Languages
|
||||
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
|
||||
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
|
||||
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
|
||||
```diff
|
||||
asymmetric:
|
||||
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
|
||||
+ encoder: "paraphrase-multilingual-MiniLM-L12-v2"
|
||||
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
||||
model_directory: "~/.khoj/search/asymmetric/"
|
||||
```
|
||||
### Use OpenAI Models for Search
|
||||
#### Setup
|
||||
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml` (at `~/.khoj/khoj.yml`):
|
||||
```diff
|
||||
asymmetric:
|
||||
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
|
||||
+ encoder: text-embedding-ada-002
|
||||
+ encoder-type: khoj.utils.models.OpenAI
|
||||
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
||||
- encoder-type: sentence_transformers.SentenceTransformer
|
||||
- model_directory: "~/.khoj/search/asymmetric/"
|
||||
+ model-directory: null
|
||||
```
|
||||
2. [Setup your OpenAI API key in Khoj](/#/chat?id=setup)
|
||||
3. Restart Khoj server to generate embeddings. It will take longer than with the offline search models.
|
||||
|
||||
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
|
||||
#### Warnings
|
||||
This configuration *uses an online model*
|
||||
- It will **send all notes to OpenAI** to generate embeddings
|
||||
- **All queries will be sent to OpenAI** when you search with Khoj
|
||||
- You will be **charged by OpenAI** based on the total tokens processed
|
||||
- It *requires an active internet connection* to search and index
|
||||
|
||||
### Bootstrap Khoj Search for Offline Usage later
|
||||
|
||||
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
|
||||
Note: *Only search can currently run in fully offline mode, not chat.*
|
||||
You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
|
||||
Note: *Only search can currently run in fully offline mode, not chat.*
|
||||
|
||||
- With Internet
|
||||
1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
|
||||
2. Pip install khoj (and dependencies) in an associated virtualenv. E.g `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
|
||||
- Without Internet
|
||||
1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
|
||||
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g `source .venv/bin/activate && khoj`
|
||||
|
||||
|
||||
## Miscellaneous
|
||||
### Set your OpenAI API key in Khoj
|
||||
If you want, Khoj can be configured to use OpenAI for search and chat.<br />
|
||||
Add your OpenAI API key to Khoj by using either of the two options below:
|
||||
- Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
|
||||
- Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml`[^1] to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
|
||||
```diff
|
||||
processor:
|
||||
conversation:
|
||||
- openai-api-key: # "YOUR_OPENAI_API_KEY"
|
||||
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
|
||||
model: "text-davinci-003"
|
||||
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
|
||||
```
|
||||
|
||||
!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing
|
||||
- With Internet
|
||||
1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
|
||||
2. Pip install khoj (and dependencies) in an associated virtualenv. E.g `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
|
||||
- Without Internet
|
||||
1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
|
||||
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal. E.g `source .venv/bin/activate && khoj`
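
Before starting Khoj on the air-gapped machine, it can help to confirm that the copied models landed where the steps above expect them. The following is a minimal sketch and not part of Khoj: the helper name `missing_model_dirs` is hypothetical, and it relies only on the `asymmetric`, `symmetric` and `image` folder names under `~/.khoj/search/` mentioned above.

```python
from pathlib import Path

# Folder names taken from the offline setup steps above
MODEL_DIRS = ("asymmetric", "symmetric", "image")


def missing_model_dirs(search_root):
    """Return the model folders that are absent or empty under search_root.

    Hypothetical helper for illustration; Khoj itself does not ship it.
    """
    root = Path(search_root).expanduser()
    missing = []
    for name in MODEL_DIRS:
        folder = root / name
        # Treat a folder as missing if it does not exist or has no entries
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_model_dirs("~/.khoj/search")` returns an empty list when all three folders exist and contain at least one entry. It does not verify that the model files themselves are complete.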
|
||||
|
|
22
docs/chat.md
|
@ -1,16 +1,30 @@
|
|||
### Khoj Chat
|
||||
#### Overview
|
||||
- Creates a personal assistant for you to inquire and engage with your notes
|
||||
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](#khoj-search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
|
||||
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](/#/search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
|
||||
- Supports multi-turn conversations with the relevant notes for context
|
||||
- Shows reference notes used to generate a response
|
||||
- **Note**: *Your query and top notes from khoj search will be sent to OpenAI for processing*
|
||||
|
||||
!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing
|
||||
|
||||
#### Setup
|
||||
- [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
|
||||
- Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)
|
||||
- Add your OpenAI API key to Khoj by using either of the two options below:
|
||||
|
||||
- Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
|
||||
|
||||
- Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml` @ `~/.khoj/khoj.yml` to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
|
||||
```diff
|
||||
processor:
|
||||
conversation:
|
||||
- openai-api-key: # "YOUR_OPENAI_API_KEY"
|
||||
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
|
||||
model: "text-davinci-003"
|
||||
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
|
||||
```
|
||||
|
||||
#### Use
|
||||
1. Open [/chat](http://localhost:42110/chat)[^2]
|
||||
1. Open [/chat](http://localhost:42110/chat)
|
||||
2. Type your queries and see response by Khoj from your notes
|
||||
|
||||
#### Demo
|
||||
|
|
|
@ -40,8 +40,8 @@ git clone https://github.com/khoj-ai/khoj && cd khoj
|
|||
|
||||
#### 2. Configure
|
||||
|
||||
- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
|
||||
- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
|
||||
- **Required**: Update [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
|
||||
- **Optional**: Edit application configuration in [khoj_docker.yml](https://github.com/khoj-ai/khoj/blob/master/config/khoj_docker.yml)
|
||||
|
||||
#### 3. Run
|
||||
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Emacs</h1>
|
||||
|
||||
An AI personal assistant for your digital brain
|
||||
> An AI personal assistant for your digital brain
|
||||
|
||||
<img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge">
|
||||
<img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge">
|
||||
|
@ -100,7 +100,7 @@ Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Ch
|
|||
|
||||
E.g "When did I file my taxes last year?"
|
||||
|
||||
See [Khoj Chat](./README.md#khoj-chat) for more details
|
||||
See [Khoj Chat](/#/chat) for more details
|
||||
|
||||
### Find Similar Entries
|
||||
This feature finds entries similar to the one you are currently on.
|
||||
|
|
|
@ -25,6 +25,7 @@
|
|||
<script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script>
|
||||
<script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script>
|
||||
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>
|
||||
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-diff.min.js"></script>
|
||||
<script defer data-domain="khoj.dev" src="https://plausible.io/js/script.js"></script>
|
||||
</body>
|
||||
<style>
|
||||
|
|
|
@ -7,7 +7,7 @@
|
|||
# Khoj
|
||||
*An AI personal assistant for your digital brain*
|
||||
|
||||
Welcome to the Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to dive straight into the code.
|
||||
Welcome to the Khoj Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to explore the code or our [Website](https://khoj.dev) for an invite up to Khoj cloud.
|
||||
|
||||
Khoj gives you lightning fast, offline search on your personal machine and gives you the power to talk to your notes.
|
||||
|
||||
|
@ -28,7 +28,7 @@ Khoj gives you lightning fast, offline search on your personal machine and gives
|
|||
- **Natural**: Advanced natural language understanding using Transformer based ML Models
|
||||
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
|
||||
- **Multiple Sources**: Index your Org-mode and Markdown notes, PDF files, Github repositories, and Photos
|
||||
- **Multiple Interfaces**: Interact from your [Web Browser](./src/khoj/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or [Obsidian](./src/interface/obsidian/)
|
||||
- **Multiple Interfaces**: Interact from your [Web Browser](https://docs.khoj.dev/#/web), [Emacs](https://docs.khoj.dev/#/emacs) or [Obsidian](https://docs.khoj.dev/#/obsidian)
|
||||
|
||||
## Install
|
||||
[Click here](./setup.md) for full setup instructions.
|
||||
|
@ -52,8 +52,3 @@ If you're using Github or Notion, you can get on a waitlist for [Khoj Cloud](htt
|
|||
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
|
||||
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
|
||||
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
|
||||
|
||||
|
||||
[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
|
||||
|
||||
[^2]: Default Khoj url @ http://localhost:42110
|
19
docs/performance.md
Normal file
|
@ -0,0 +1,19 @@
|
|||
## Performance
|
||||
|
||||
### Search performance
|
||||
|
||||
- Semantic search using the bi-encoder is fairly fast at \<100 ms across all content types
|
||||
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
|
||||
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
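
As a rough way to reason about the `top_k` tradeoff, the upper bounds above can be combined into a single estimate. This is only a sketch: it assumes reranking cost grows linearly with the number of results, which the figures above do not state (they only give under 2 s for 15 results), and the function name `estimate_query_latency_ms` is hypothetical.

```python
# Upper bounds quoted in the list above, in milliseconds
BI_ENCODER_MS = 100        # semantic search with the bi-encoder
RERANK_MS_FOR_15 = 2000    # cross-encoder reranking of 15 results
FILTER_MS = 20             # extra latency when the query uses filters


def estimate_query_latency_ms(top_k, rerank=True, filters=False):
    """Rough upper bound on query latency in milliseconds.

    Assumption (not from the source): reranking cost is linear in top_k.
    """
    total = BI_ENCODER_MS
    if rerank:
        # Scale the quoted 15-result figure to the requested top_k
        total += top_k * RERANK_MS_FOR_15 / 15
    if filters:
        total += FILTER_MS
    return total
```

Under that linear assumption, halving `top_k` roughly halves the reranking share of the latency, while the bi-encoder and filter costs stay fixed.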
|
||||
|
||||
### Indexing performance
|
||||
|
||||
- Indexing is more strongly impacted by the size of the source data
|
||||
- Indexing 100K+ line corpus of notes takes about 10 minutes
|
||||
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
|
||||
- Note: *It should only take this long on the first run* as the index is incrementally updated
|
||||
|
||||
### Miscellaneous
|
||||
|
||||
- Testing done on a Mac M1 and a \>100K line corpus of notes
|
||||
- Search, indexing on a GPU has not been tested yet
|
|
@ -1,11 +1,11 @@
|
|||
## Khoj Search
|
||||
- **Khoj via Obsidian**
|
||||
- **Using Obsidian**
|
||||
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
|
||||
- **Khoj via Emacs**
|
||||
- **Using Emacs**
|
||||
- Run `M-x khoj <user-query>`
|
||||
- **Khoj via Web**
|
||||
- Open <http://localhost:42110/> directly
|
||||
- **Khoj via API**
|
||||
- **Using Web**
|
||||
- Open <http://localhost:42110/> in your web browser
|
||||
- **Using API**
|
||||
- See the Khoj FastAPI [Swagger Docs](http://localhost:42110/docs), [ReDocs](http://localhost:42110/redocs)
|
||||
|
||||
### Query Filters
|
||||
|
@ -27,53 +27,3 @@ Use structured query syntax to filter the natural language search results
|
|||
- containing dates from the year *1984*
|
||||
- excluding words *"big"* and *"brother"*
|
||||
- that best match the natural language query *"what is the meaning of life?"*
|
||||
|
||||
## Details
|
||||
1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
|
||||
2. These notes, the last few messages and associated metadata is passed to ChatGPT along with your query for a response
|
||||
|
||||
|
||||
## Performance
|
||||
|
||||
### Query performance
|
||||
|
||||
- Semantic search using the bi-encoder is fairly fast at \<50 ms
|
||||
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
|
||||
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
|
||||
|
||||
### Indexing performance
|
||||
|
||||
- Indexing is more strongly impacted by the size of the source data
|
||||
- Indexing 100K+ line corpus of notes takes about 10 minutes
|
||||
- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
|
||||
- Note: *It should only take this long on the first run* as the index is incrementally updated
|
||||
|
||||
### Miscellaneous
|
||||
|
||||
- Testing done on a Mac M1 and a \>100K line corpus of notes
|
||||
- Search, indexing on a GPU has not been tested yet
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
### Use OpenAI Models for Search
|
||||
#### Setup
|
||||
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml`[^1]:
|
||||
```diff
|
||||
asymmetric:
|
||||
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
|
||||
+ encoder: text-embedding-ada-002
|
||||
+ encoder-type: khoj.utils.models.OpenAI
|
||||
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
||||
- encoder-type: sentence_transformers.SentenceTransformer
|
||||
- model_directory: "~/.khoj/search/asymmetric/"
|
||||
+ model-directory: null
|
||||
```
|
||||
2. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
|
||||
3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
|
||||
|
||||
#### Warnings
|
||||
This configuration *uses an online model*
|
||||
- It will **send all notes to OpenAI** to generate embeddings
|
||||
- **All queries will be sent to OpenAI** when you search with Khoj
|
||||
- You will be **charged by OpenAI** based on the total tokens processed
|
||||
- It *requires an active internet connection* to search and index
|
||||
|
|
|
@ -87,7 +87,7 @@ pip install --upgrade --pre khoj-assistant
|
|||
- **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details
|
||||
|
||||
#### Search starts giving wonky results
|
||||
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true)[^2] in browser to regenerate index from scratch
|
||||
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true) in browser to regenerate index from scratch
|
||||
- **Note**: *This is a fix for when you perceive the search results have degraded. Not if you think they've always given wonky results*
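
The same request can be prepared from a script instead of the browser. A minimal sketch, assuming Khoj is running at the default `http://localhost:42110` and that the `force=true` query parameter shown above is the one your version accepts; the helper name `build_update_url` is hypothetical.

```python
from urllib.parse import urlencode, urlunsplit


def build_update_url(host="localhost", port=42110, force=True):
    """Build the index update URL for a Khoj server.

    Hypothetical helper; the host and port defaults mirror the link above.
    """
    # Only add the query string when a full regeneration is requested
    query = urlencode({"force": "true"}) if force else ""
    return urlunsplit(("http", f"{host}:{port}", "/api/update", query, ""))
```

Passing the result to `urllib.request.urlopen` sends the request. This needs a running Khoj server, and regenerating the index from scratch can take a while.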
|
||||
|
||||
#### Khoj in Docker errors out with \"Killed\" in error message
|
||||
|
|
19
docs/web.md
Normal file
|
@ -0,0 +1,19 @@
|
|||
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Web</h1>
|
||||
|
||||
> An AI personal assistant for your Digital Brain
|
||||
|
||||
## Features
|
||||
- **Search**
|
||||
- **Natural**: Advanced natural language understanding using Transformer based ML Models
|
||||
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
|
||||
- **Incremental**: Incremental search for a fast, search-as-you-type experience
|
||||
- **Chat**
|
||||
- **Faster answers**: Find answers faster and with less effort than search
|
||||
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
|
||||
- **Assisted creativity**: Smoothly weave across answers retrieval and content generation
|
||||
|
||||
## Setup
|
||||
The Khoj web interface is the default interface. It comes packaged with the khoj server.
|
||||
|
||||
## Interface
|
||||
![](./assets/khoj_chat_web_interface.png?)
|