Fix diff blocks, links, remove footnotes & rearrange sections in docs

Extract performance into a separate section instead of shoving it under search. Create page for web interface.
2024-11-27 09:25:06 +01:00 · 2023-07-21 00:05:44 -07:00 · c28755ccd2
commit c28755ccd2
parent 2ddee7e745
12 changed files with 125 additions and 115 deletions
--- a/.github/workflows/build_desktop.yml
+++ b/.github/workflows/build_desktop.yml
@@ -4,6 +4,10 @@ on:
  push:
    branches:
      - master
+    paths:
+      - src/khoj/**
+      - pyproject.toml
+      - Khoj.spec
  workflow_dispatch:

 jobs:
--- a/docs/_sidebar.md
+++ b/docs/_sidebar.md
@@ -1,18 +1,20 @@
-- Getting Started
-    - [Overview](README.md)
+- Get Started
+    - [Overview](overview.md)
    - [Install](setup.md)
        - [Windows Installation](windows_install.md)
-- Learn More
+    - [Demos](demos.md)
+- Use
    - [Features](features.md)
        - [Chat](chat.md)
        - [Search](search.md)
-    - [Demos](demos.md)
    - Interfaces
        - [Obsidian](obsidian.md)
        - [Emacs](emacs.md)
+        - [Web](web.md)
    - Data Sources
        - [Github](github_integration.md)
        - [Notion](notion_integration.md)
    - [Advanced](advanced.md)
+    - [Performance](performance.md)
 - Contributing
    - [Development](development.md)
--- a/docs/advanced.md
+++ b/docs/advanced.md
@@ -1,54 +1,60 @@

 ## Advanced Usage
+### Search across Different Languages
+To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
+For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
+1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
+
+    ```diff
+    asymmetric:
+    -  encoder: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
+    +  encoder: paraphrase-multilingual-MiniLM-L12-v2
+      cross-encoder: cross-encoder/ms-marco-MiniLM-L-6-v2
+      model_directory: "~/.khoj/search/asymmetric/"
+    ```
+
+2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
+
 ### Access Khoj on Mobile
-1. [Setup Khoj](#Setup) on your personal server. This can be any always-on machine, i.e an old computer, RaspberryPi(?) etc
+1. [Setup Khoj](/#/setup) on your personal server. This can be any always-on machine, i.e an old computer, RaspberryPi(?) etc
 2. [Install](https://tailscale.com/kb/installation/) [Tailscale](tailscale.com/) on your personal server and phone
 3. Open the Khoj web interface of the server from your phone browser.<br /> It should be `http://tailscale-ip-of-server:42110` or `http://name-of-server:42110` if you've setup [MagicDNS](https://tailscale.com/kb/1081/magicdns/)
 4. Click the [Add to Homescreen](https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Add_to_home_screen) button
 5. Enjoy exploring your notes, documents and images from your phone!

-![](https://github.com/khoj-ai/khoj/blob/master/docs/khoj_pwa_android.png?)
+![](./assets/khoj_pwa_android.png?)

-### Search across Different Languages
-  To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
-  For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages), has good search quality and speed. To use it:
-  1. Manually update `search-type > asymmetric > encoder` to `paraphrase-multilingual-MiniLM-L12-v2` in your `~/.khoj/khoj.yml` file for now. See diff of `khoj.yml` below for illustration:
-  ```diff
-   asymmetric:
- encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
-+ encoder: "paraphrase-multilingual-MiniLM-L12-v2"
-     cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
-     model_directory: "~/.khoj/search/asymmetric/"
-  ```
+### Use OpenAI Models for Search
+#### Setup
+1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml` (at `~/.khoj/khoj.yml`):
+   ```diff
+      asymmetric:
+   -    encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
+   +    encoder: text-embedding-ada-002
+   +    encoder-type: khoj.utils.models.OpenAI
+        cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
+   -    encoder-type: sentence_transformers.SentenceTransformer
+   -    model_directory: "~/.khoj/search/asymmetric/"
+   +    model-directory: null
+   ```
+2. [Setup your OpenAI API key in Khoj](/#/chat?id=setup)
+3. Restart Khoj server to generate embeddings. It will take longer than with the offline search models.

-  2. Regenerate your content index. For example, by opening [\<khoj-url\>/api/update?t=force](http://localhost:42110/api/update?t=force)
+#### Warnings
+  This configuration *uses an online model*
+  - It will **send all notes to OpenAI** to generate embeddings
+  - **All queries will be sent to OpenAI** when you search with Khoj
+  - You will be **charged by OpenAI** based on the total tokens processed
+  - It *requires an active internet connection* to search and index

 ### Bootstrap Khoj Search for Offline Usage later

-  You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
-  Note: *Only search can currently run in fully offline mode, not chat.*
+You can bootstrap Khoj pre-emptively to run on machines that do not have internet access. An example use-case would be to run Khoj on an air-gapped machine.
+Note: *Only search can currently run in fully offline mode, not chat.*

-  - With Internet
-    1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
-    2. Pip install khoj (and dependencies) in an associated virtualenv. E.g `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
-  - Without Internet
-    1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
-    2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start and khoj as normal. E.g `source .venv/bin/activate && khoj`
-
-
-## Miscellaneous
-### Set your OpenAI API key in Khoj
-If you want, Khoj can be configured to use OpenAI for search and chat.<br />
-Add your OpenAI API to Khoj by using either of the two options below:
- - Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
- - Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml`[^1] to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
-    ```diff
-    processor:
-      conversation:
-    -    openai-api-key: # "YOUR_OPENAI_API_KEY"
-    +    openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
-        model: "text-davinci-003"
-        conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
-    ```
-
-!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing
+- With Internet
+  1. Manually download the [asymmetric text](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [symmetric text](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and [image search](https://huggingface.co/sentence-transformers/clip-ViT-B-32) models from HuggingFace
+  2. Pip install khoj (and dependencies) in an associated virtualenv. E.g `python -m venv .venv && source .venv/bin/activate && pip install khoj-assistant`
+- Without Internet
+  1. Copy each of the search models into their respective folders, `asymmetric`, `symmetric` and `image` under the `~/.khoj/search/` directory on the air-gapped machine
+  2. Copy the khoj virtual environment directory onto the air-gapped machine, activate the environment and start and khoj as normal. E.g `source .venv/bin/activate && khoj`
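
Reviewer sketch, not part of the commit: the encoder swap documented in the `advanced.md` hunk above amounts to replacing one nested key. The snippet below is a minimal illustration, assuming `khoj.yml` has already been parsed into a plain dict with the `search-type > asymmetric > encoder` nesting the docs describe. `set_asymmetric_encoder` is a hypothetical helper written for this note, not a Khoj API.

```python
import copy


def set_asymmetric_encoder(config: dict, encoder: str) -> dict:
    """Return a copy of `config` with search-type > asymmetric > encoder replaced.

    Hypothetical helper for illustration only; it is not part of Khoj.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    asymmetric = updated.setdefault("search-type", {}).setdefault("asymmetric", {})
    asymmetric["encoder"] = encoder
    return updated


# Assumed shape of the parsed config, taken from the diff block in the docs
config = {
    "search-type": {
        "asymmetric": {
            "encoder": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
            "cross-encoder": "cross-encoder/ms-marco-MiniLM-L-6-v2",
        }
    }
}

new_config = set_asymmetric_encoder(config, "paraphrase-multilingual-MiniLM-L12-v2")
print(new_config["search-type"]["asymmetric"]["encoder"])
```

After a change like this, the docs still require regenerating the content index before the new encoder takes effect.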
--- a/docs/chat.md
+++ b/docs/chat.md
@@ -1,16 +1,30 @@
 ### Khoj Chat
 #### Overview
 - Creates a personal assistant for you to inquire and engage with your notes
-- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](#khoj-search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
+- Uses [ChatGPT](https://openai.com/blog/chatgpt) and [Khoj search](/#/search). [Offline chat](https://github.com/khoj-ai/khoj/issues/201) is coming soon.
 - Supports multi-turn conversations with the relevant notes for context
 - Shows reference notes used to generate a response
-- **Note**: *Your query and top notes from khoj search will be sent to OpenAI for processing*
+
+!> **Warning**: This will enable Khoj to send your query and note(s) to OpenAI for processing

 #### Setup
-- [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
+- Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)
+- Add your OpenAI API to Khoj by using either of the two options below:
+
+  - Open your [Khoj settings](http://localhost:42110/config/processor/conversation), add your OpenAI API key, and click *Save*. Then go to your [Khoj settings](http://localhost:42110/config) and click `Configure`. This will refresh Khoj with your OpenAI API key.
+
+  - Set `openai-api-key` field under `processor.conversation` section in your `khoj.yml` @ `~/.khoj/khoj.yml` to your [OpenAI API key](https://beta.openai.com/account/api-keys) and restart khoj:
+    ```diff
+    processor:
+      conversation:
+    -    openai-api-key: # "YOUR_OPENAI_API_KEY"
+    +    openai-api-key: sk-aaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhhhhhhhhhhhhhhh
+        model: "text-davinci-003"
+        conversation-logfile: "~/.khoj/processor/conversation/conversation_logs.json"
+    ```

 #### Use
-1. Open [/chat](http://localhost:42110/chat)[^2]
+1. Open [/chat](http://localhost:42110/chat)
 2. Type your queries and see response by Khoj from your notes

 #### Demo
--- a/docs/development.md
+++ b/docs/development.md
@@ -40,8 +40,8 @@ git clone https://github.com/khoj-ai/khoj && cd khoj

 #### 2. Configure

-- **Required**: Update [docker-compose.yml](./docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
-- **Optional**: Edit application configuration in [khoj_docker.yml](./config/khoj_docker.yml)
+- **Required**: Update [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to mount your images, (org-mode or markdown) notes, PDFs and Github repositories
+- **Optional**: Edit application configuration in [khoj_docker.yml](https://github.com/khoj-ai/khoj/blob/master/config/khoj_docker.yml)

 #### 3. Run

--- a/docs/emacs.md
+++ b/docs/emacs.md
@@ -1,6 +1,6 @@
 <h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Emacs</h1>

-An AI personal assistance for your digital brain
+> An AI personal assistance for your digital brain

 <img src="https://stable.melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Stable Badge">
 <img src="https://melpa.org/packages/khoj-badge.svg" width="150" alt="Melpa Badge">
@@ -100,7 +100,7 @@ Indexes the specified org files, directories. Sets up OpenAI API key for Khoj Ch

  E.g "When did I file my taxes last year?"

-  See [Khoj Chat](./README.md#khoj-chat) for more details
+  See [Khoj Chat](/#/chat) for more details

 ### Find Similar Entries
 This feature finds entries similar to the one you are currently on.
--- a/docs/index.html
+++ b/docs/index.html
@@ -25,6 +25,7 @@
  <script src="//cdn.jsdelivr.net/npm/docsify/lib/plugins/search.min.js"></script>
  <script src="//cdn.jsdelivr.net/npm/docsify-copy-code/dist/docsify-copy-code.min.js"></script>
  <script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>
+  <script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-diff.min.js"></script>
  <script defer data-domain="khoj.dev" src="https://plausible.io/js/script.js"></script>
 </body>
 <style>
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -7,7 +7,7 @@
 # Khoj
 *An AI personal assistant for your digital brain*

-Welcome to the Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to dive straight into the code.
+Welcome to the Khoj Docs! This is the best place to get started with Khoj. Check out our [Github](https://github.com/khoj-ai/khoj) to explore the code or our [Website](https://khoj.dev) for an invite up to Khoj cloud.

 Khoj gives you lightning fast, offline search on your personal machine and gives you the power to talk to your notes.

@@ -28,7 +28,7 @@ Khoj gives you lightning fast, offline search on your personal machine and gives
  - **Natural**: Advanced natural language understanding using Transformer based ML Models
  - **Pluggable**: Modular architecture makes it easy to plug in new data sources, frontends and ML models
  - **Multiple Sources**: Index your Org-mode and Markdown notes, PDF files, Github repositories, and Photos
-  - **Multiple Interfaces**: Interact from your [Web Browser](./src/khoj/interface/web/index.html), [Emacs](./src/interface/emacs/khoj.el) or [Obsidian](./src/interface/obsidian/)
+  - **Multiple Interfaces**: Interact from your [Web Browser](https://docs.khoj.dev/#/web), [Emacs](https://docs.khoj.dev/#/emacs) or [Obsidian](https://docs.khoj.dev/#/obsidian)

 ## Install
 [Click here](./setup.md) for full setup instructions.
@@ -52,8 +52,3 @@ If you're using Github or Notion, you can get on a waitlist for [Khoj Cloud](htt
 - Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
 - [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
 - [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
-
-
-[^1]: Default Khoj config file @ `~/.khoj/khoj.yml`
-
-[^2]: Default Khoj url @ http://localhost:42110
--- a/docs/performance.md
+++ b/docs/performance.md
@@ -0,0 +1,19 @@
+## Performance
+
+### Search performance
+
+- Semantic search using the bi-encoder is fairly fast at \<100 ms across all content types
+- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
+- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
+
+### Indexing performance
+
+- Indexing is more strongly impacted by the size of the source data
+- Indexing 100K+ line corpus of notes takes about 10 minutes
+- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
+- Note: *It should only take this long on the first run* as the index is incrementally updated
+
+### Miscellaneous
+
+- Testing done on a Mac M1 and a \>100K line corpus of notes
+- Search, indexing on a GPU has not been tested yet
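
Reviewer sketch, not part of the commit: the `top_k` tradeoff mentioned in the new `performance.md` follows from two-stage retrieval, where a cheap scorer ranks everything and a costly scorer reranks only the top `top_k` candidates. The snippet below is a toy illustration of that pattern. The scoring functions are simple word-overlap stand-ins chosen for this note, not Khoj's bi-encoder or cross-encoder, and all names here are hypothetical.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]

# Counts how often the costly scorer runs, to make the cost of top_k visible
calls = {"costly": 0}


def word_overlap(query: str, doc: str) -> float:
    """Cheap stand-in for a bi-encoder score: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_bonus(query: str, doc: str) -> float:
    """Costly stand-in for a cross-encoder score: overlap plus an exact-phrase bonus."""
    calls["costly"] += 1
    return word_overlap(query, doc) + (10 if query.lower() in doc.lower() else 0)


def search(query: str, docs: List[str], cheap: Scorer, costly: Scorer, top_k: int) -> List[str]:
    # Stage 1: rank every document with the cheap scorer, keep the top_k candidates
    candidates = sorted(docs, key=lambda d: cheap(query, d), reverse=True)[:top_k]
    # Stage 2: rerank only those candidates with the costly scorer
    return sorted(candidates, key=lambda d: costly(query, d), reverse=True)


docs = [
    "meaning of life notes",
    "the meaning of a word",
    "what is the meaning of life",
    "grocery list",
]
results = search("meaning of life", docs, word_overlap, phrase_bonus, top_k=2)
print(results, calls["costly"])
```

In this sketch the costly scorer runs once per candidate, so a larger `top_k` means more reranking work per query and a smaller one means fewer candidates get a second look.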
--- a/docs/search.md
+++ b/docs/search.md
@@ -1,11 +1,11 @@
 ## Khoj Search
-- **Khoj via Obsidian**
+- **Using Obsidian**
  - Click the *Khoj search* icon 🔎 on the [Ribbon](https://help.obsidian.md/User+interface/Workspace/Ribbon) or Search for *Khoj: Search* in the [Command Palette](https://help.obsidian.md/Plugins/Command+palette)
-- **Khoj via Emacs**
+- **Using Emacs**
  - Run `M-x khoj <user-query>`
-- **Khoj via Web**
-  - Open <http://localhost:42110/> directly
-- **Khoj via API**
+- **Using Web**
+  - Open <http://localhost:42110/> in your web browser
+- **Using API**
  - See the Khoj FastAPI [Swagger Docs](http://localhost:42110/docs), [ReDocs](http://localhost:42110/redocs)

 ### Query Filters
@@ -27,53 +27,3 @@ Use structured query syntax to filter the natural language search results
    - containing dates from the year *1984*
    - excluding words *"big"* and *"brother"*
    - that best match the natural language query *"what is the meaning of life?"*
-
-## Details
-1. Your query is used to retrieve the most relevant notes, if any, using Khoj search
-2. These notes, the last few messages and associated metadata is passed to ChatGPT along with your query for a response
-
-
-## Performance
-
-### Query performance
-
-- Semantic search using the bi-encoder is fairly fast at \<50 ms
-- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to tradeoff speed for accuracy of results
-- Filters in query (e.g by file, word or date) usually add \<20ms to query latency
-
-### Indexing performance
-
-- Indexing is more strongly impacted by the size of the source data
-- Indexing 100K+ line corpus of notes takes about 10 minutes
-- Indexing 4000+ images takes about 15 minutes and more than 8Gb of RAM
-- Note: *It should only take this long on the first run* as the index is incrementally updated
-
-### Miscellaneous
-
-- Testing done on a Mac M1 and a \>100K line corpus of notes
-- Search, indexing on a GPU has not been tested yet
-
-## Advanced Usage
-
-### Use OpenAI Models for Search
-#### Setup
-1. Set `encoder-type`, `encoder` and `model-directory` under `asymmetric` and/or `symmetric` `search-type` in your `khoj.yml`[^1]:
-   ```diff
-      asymmetric:
-   -    encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
-   +    encoder: text-embedding-ada-002
-   +    encoder-type: khoj.utils.models.OpenAI
-        cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
-   -    encoder-type: sentence_transformers.SentenceTransformer
-   -    model_directory: "~/.khoj/search/asymmetric/"
-   +    model-directory: null
-   ```
-2. [Setup your OpenAI API key in Khoj](#set-your-openai-api-key-in-khoj)
-3. Restart Khoj server to generate embeddings. It will take longer than with offline models.
-
-#### Warnings
-  This configuration *uses an online model*
-  - It will **send all notes to OpenAI** to generate embeddings
-  - **All queries will be sent to OpenAI** when you search with Khoj
-  - You will be **charged by OpenAI** based on the total tokens processed
-  - It *requires an active internet connection* to search and index
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -87,7 +87,7 @@ pip install --upgrade --pre khoj-assistant
 - **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details

 #### Search starts giving wonky results
-- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true)[^2] in browser to regenerate index from scratch
+- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true) in browser to regenerate index from scratch
 - **Note**: *This is a fix for when you percieve the search results have degraded. Not if you think they've always given wonky results*

 #### Khoj in Docker errors out with \"Killed\" in error message
--- a/docs/web.md
+++ b/docs/web.md
@@ -0,0 +1,19 @@
+<h1><img src="./assets/khoj-logo-sideways.svg" width="200" alt="Khoj Logo">Web</h1>
+
+> An AI personal assistant for your Digital Brain
+
+## Features
+- **Search**
+  - **Natural**: Advanced natural language understanding using Transformer based ML Models
+  - **Local**: Your personal data stays local. All search and indexing is done on your machine. *Unlike chat which requires access to GPT.*
+  - **Incremental**: Incremental search for a fast, search-as-you-type experience
+- **Chat**
+  - **Faster answers**: Find answers faster and with less effort than search
+  - **Iterative discovery**: Iteratively explore and (re-)discover your notes
+  - **Assisted creativity**: Smoothly weave across answers retrieval and content generation
+
+## Setup
+The Khoj web interface is the default interface. It comes packaged with the khoj server.
+
+## Interface
+![](./assets/khoj_chat_web_interface.png?)