Fix diff blocks, links, remove footnotes & rearrange sections in docs

Extract performance into a separate section instead of shoving it under search
Create page for web interface
Debanjum Singh Solanky 2023-07-21 00:05:44 -07:00
parent 2ddee7e745
commit c28755ccd2
12 changed files with 125 additions and 115 deletions


@ -4,6 +4,10 @@ on:
push:
branches:
- master
paths:
- src/khoj/**
- pyproject.toml
- Khoj.spec
workflow_dispatch:
jobs:


@ -1,18 +1,20 @@
- Getting Started
- [Overview](README.md)
- Get Started
- [Overview](overview.md)
- [Install](setup.md)
- [Windows Installation](windows_install.md)
- Learn More
- [Demos](demos.md)
- Use
- [Features](features.md)
- [Chat](chat.md)
- [Search](search.md)
- [Demos](demos.md)
- Interfaces
- [Obsidian](obsidian.md)
- [Emacs](emacs.md)
- [Web](web.md)
- Data Sources
- [Github](github_integration.md)
- [Notion](notion_integration.md)
- [Advanced](advanced.md)
- [Performance](performance.md)
- Contributing
- [Development](development.md)


@ -1,27 +1,51 @@
## Advanced Usage
### Search across Different Languages
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
```diff
asymmetric:
- encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
+ encoder: paraphrase-multilingual-MiniLM-L12-v2
cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
model_directory: "~/.khoj/search/asymmetric/"
```
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
### Access Khoj on Mobile
1. [Set up Khoj](#Setup) on your personal server. This can be any always-on machine, e.g. an old computer, Raspberry Pi(?), etc.
1. [Set up Khoj](/#/setup) on your personal server. This can be any always-on machine, e.g. an old computer, Raspberry Pi(?), etc.
2. [Install](https://tailscale.com/kb/installation/) [Tailscale](https://tailscale.com/) on your personal server and phone
3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've set up [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
5. Enjoy exploring your notes, documents and images from your phone!
![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_pwa_android.png?)
![](./assets/khoj_pwa_android.png?)
### Search across Different Languages
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
### Use OpenAI Models for Search
#### Setup
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml` (at `~/.khoj/khoj.yml`):
```diff
asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+ encoder: "paraphrase-multilingual-MiniLM-L12-v2"
+ encoder: text-embedding-ada-002
+ encoder-type: khoj.utils.models.OpenAI
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
model_directory: "~/.khoj/search/asymmetric/"
- encoder-type: sentence_transformers.SentenceTransformer
- model_directory: "~/.khoj/search/asymmetric/"
+ model-directory: null
```
2. [Setup your OpenAI API key in Khoj](/#/chat?id=setup)
3. Restart Khoj server to generate embeddings. It will take longer than with the offline search models.
2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
#### Warnings
This configuration *uses an online model*
- It will **send all notes to OpenAI** to generate embeddings
- **All queries will be sent to OpenAI** when you search with Khoj
- You will be **charged by OpenAI** based on the total tokens processed
- It *requires an active internet connection* to search and index
### Bootstrap Khoj Search for Offline Usage later
@ -34,21 +58,3 @@
- Without Internet
1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start khoj as normal, e.g. `source .venv/bin/activate && khoj`
## Miscellaneous
### Set your OpenAI API key in Khoj
If you want, Khoj can be configured to use OpenAI for search and chat.<br />
Add your OpenAI API key to Khoj using either of the two options below:
- Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
- Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml`[^1] to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
```diff
processor:
conversation:
- openai-api-key: # "YOUR_OPENAI_API_KEY"
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
model: "text-davinci-003"
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
```
!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing


@ -1,16 +1,30 @@
### Khoj Chat
#### Overview
- Creates a personal assistant for you to inquire and engage with your notes
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](#khoj-search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](/#/search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
- Supports multi-turn conversations with the relevant notes for context
- Shows reference notes used to generate a response
- **Note**: *Your query and top notes from khoj search will be sent to OpenAI for processing*
!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing
#### Setup
- [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
- Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)
- Add your OpenAI API key to Khoj using either of the two options below:
- Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
- Set the `openai-api-key` field under the `processor.conversation` section in your `khoj.yml` (at `~/.khoj/khoj.yml`) to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
```diff
processor:
conversation:
- openai-api-key: # "YOUR_OPENAI_API_KEY"
+ openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
model: "text-davinci-003"
conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
```
#### Use
1. Open [/chat](http://localhost:42110/chat)[^2]
1. Open [/chat](http://localhost:42110/chat)
2. Type your queries and see Khoj's responses based on your notes
#### Demo


@ -40,8 +40,8 @@ git clone https://github.com/khoj-ai/khoj && cd khoj
#### 2. Configure
- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
- **Required**: Update [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
- **Optional**: Edit application configuration in [khoj_docker.yml](https://github.com/khoj-ai/khoj/blob/master/config/khoj_docker.yml)
#### 3. Run


@ -1,6 +1,6 @@
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Emacs</h1>
An AI personal assistant for your digital brain
> An AI personal assistant for your digital brain
<img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge">
<img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge">
@ -100,7 +100,7 @@ Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Ch
E.g "When did I file my taxes last year?"
See [Khoj Chat](./README.md#khoj-chat) for more details
See [Khoj Chat](/#/chat) for more details
### Find Similar Entries
This feature finds entries similar to the one you are currently on.


@ -25,6 +25,7 @@
<script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-diff.min.js"></script>
<script defer data-domain="khoj.dev" src="https://plausible.io/js/script.js"></script>
</body>
<style>


@ -7,7 +7,7 @@
# Khoj
*An AI personal assistant for your digital brain*
Welcome to the Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to dive straight into the code.
Welcome to the Khoj Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to explore the code or our [Website](https://khoj.dev) to request an invite to Khoj Cloud.
Khoj gives you lightning fast, offline search on your personal machine and gives you the power to talk to your notes.
@ -28,7 +28,7 @@ Khoj gives you lightning fast, offline search on your personal machine and gives
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- **Multiple Sources**: Index your Org-mode and Markdown notes, PDF files, Github repositories, and Photos
- **Multiple Interfaces**: Interact from your [Web Browser](./src/khoj/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or [Obsidian](./src/interface/obsidian/)
- **Multiple Interfaces**: Interact from your [Web Browser](https://docs.khoj.dev/#/web), [Emacs](https://docs.khoj.dev/#/emacs) or [Obsidian](https://docs.khoj.dev/#/obsidian)
## Install
[Click here](./setup.md) for full setup instructions.
@ -52,8 +52,3 @@ If you're using Github or Notion, you can get on a waitlist for [Khoj Cloud](htt
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
[^2]: Default Khoj url @ http://localhost:42110

19
docs/performance.md Normal file

@ -0,0 +1,19 @@
## Performance
### Search performance
- Semantic search using the bi-encoder is fairly fast at \<100 ms across all content types
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to trade off speed against accuracy of results
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
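The figures above can be combined into a rough upper bound on query latency. The sketch below is illustrative only: it assumes reranking cost grows linearly with `top_k`, which the docs do not state, and the function and constant names are hypothetical rather than part of Khoj.

```python
# Upper bounds taken from the search performance notes above.
BI_ENCODER_SECONDS = 0.1   # semantic search: <100 ms
RERANK_SECONDS = 2.0       # cross-encoder reranking: <2 s ...
RERANK_RESULTS = 15        # ... measured on 15 results
FILTER_SECONDS = 0.02      # query filters: <20 ms


def estimate_query_latency(top_k: int, use_filters: bool = False) -> float:
    """Return a rough upper bound, in seconds, for one search query.

    Assumption (not from the docs): reranking time scales linearly
    with the number of results passed to the cross-encoder.
    """
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    rerank = RERANK_SECONDS * top_k / RERANK_RESULTS
    filters = FILTER_SECONDS if use_filters else 0.0
    return BI_ENCODER_SECONDS + rerank + filters


if __name__ == "__main__":
    for k in (5, 15, 30):
        print(f"top_k={k}: <{estimate_query_latency(k, use_filters=True):.2f}s")
```

Under that assumption, lowering `top_k` mostly shrinks the reranking term, which dominates the total.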
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
- Note: *It should only take this long on the first run* as the index is incrementally updated
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet


@ -1,11 +1,11 @@
## Khoj Search
- **Khoj via Obsidian**
- **Using Obsidian**
- Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
- **Khoj via Emacs**
- **Using Emacs**
- Run `M-x khoj <user-query>`
- **Khoj via Web**
- Open <http://localhost:42110/> directly
- **Khoj via API**
- **Using Web**
- Open <http://localhost:42110/> in your web browser
- **Using API**
- See the Khoj FastAPI [Swagger Docs](http://localhost:42110/docs), [ReDocs](http://localhost:42110/redocs)
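
For scripted access, a search request URL can be assembled as in the sketch below. It assumes a `/api/search` endpoint that takes `q` (query), `n` (result count) and `t` (content type) parameters; none of these are shown in this page, so check them against the Swagger Docs of your server. `build_search_url` is a hypothetical helper, not part of Khoj.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    query: str,
    base_url: str = "http://localhost:42110",  # default Khoj url from these docs
    results: int = 5,
    content_type: Optional[str] = None,
) -> str:
    """Build a search URL for a Khoj server.

    Assumed, unverified parameters: q = query text, n = number of
    results, t = content type to search.
    """
    params = {"q": query, "n": results}
    if content_type:
        params["t"] = content_type
    return f"{base_url}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("what is the meaning of life?", content_type="markdown"))
```

The returned URL can be opened in a browser or fetched with any HTTP client once the server is running.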
### Query Filters
@ -27,53 +27,3 @@ Use structured query syntax to filter the natural language search results
- containing dates from the year *1984*
- excluding words *"big"* and *"brother"*
- that best match the natural language query *"what is the meaning of life?"*
## Details
1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
2. These notes, the last few messages and associated metadata is passed to ChatGPT along with your query for a response
## Performance
### Query performance
- Semantic search using the bi-encoder is fairly fast at \<50 ms
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to trade off speed against accuracy of results
- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
### Indexing performance
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
- Note: *It should only take this long on the first run* as the index is incrementally updated
### Miscellaneous
- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search, indexing on a GPU has not been tested yet
## Advanced Usage
### Use OpenAI Models for Search
#### Setup
1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml`[^1]:
```diff
asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+ encoder: text-embedding-ada-002
+ encoder-type: khoj.utils.models.OpenAI
cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
- encoder-type: sentence_transformers.SentenceTransformer
- model_directory: "~/.khoj/search/asymmetric/"
+ model-directory: null
```
2. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
#### Warnings
This configuration *uses an online model*
- It will **send all notes to OpenAI** to generate embeddings
- **All queries will be sent to OpenAI** when you search with Khoj
- You will be **charged by OpenAI** based on the total tokens processed
- It *requires an active internet connection* to search and index


@ -87,7 +87,7 @@ pip install --upgrade --pre khoj-assistant
- **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details
#### Search starts giving wonky results
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true)[^2] in browser to regenerate index from scratch
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true) in browser to regenerate index from scratch
- **Note**: *This fix is for when you perceive that the search results have degraded, not for when you think they have always been wonky*
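
The same fix can be triggered from a script instead of the browser. This is a minimal sketch that assumes the default host and port and the `force=true` query form shown in the fix above; `build_update_url` and `regenerate_index` are hypothetical helper names, not part of Khoj.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def build_update_url(host: str = "localhost", port: int = 42110, force: bool = True) -> str:
    """Build the index update URL, using the defaults shown in these docs."""
    query = urlencode({"force": "true"}) if force else ""
    return urlunsplit(("http", f"{host}:{port}", "/api/update", query, ""))


def regenerate_index(url: str, timeout: float = 600.0) -> int:
    """Request the update URL and return the HTTP status code.

    Regenerating from scratch can take minutes, hence the long timeout.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    url = build_update_url()
    print(url)
    # Uncomment to trigger regeneration against a running Khoj server:
    # print(regenerate_index(url))
```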
#### Khoj in Docker errors out with \"Killed\" in error message

19
docs/web.md Normal file

@ -0,0 +1,19 @@
<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Web</h1>
> An AI personal assistant for your Digital Brain
## Features
- **Search**
- **Natural**: Advanced natural language understanding using Transformer based ML Models
- **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
- **Incremental**: Incremental search for a fast, search-as-you-type experience
- **Chat**
- **Faster answers**: Find answers faster and with less effort than search
- **Iterative discovery**: Iteratively explore and (re-)discover your notes
- **Assisted creativity**: Smoothly weave between answer retrieval and content generation
## Setup
The Khoj web interface is the default interface. It comes packaged with the khoj server.
## Interface
![](./assets/khoj_chat_web_interface.png?)