Commit graph

2457 commits

Author SHA1 Message Date
Debanjum Singh Solanky
efcf7d1508 Extract prompts as LangChain Prompt Templates into a separate module
Improves code modularity and cleanliness. Reduces bloat in the GPT.py module
2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky
b484953bb3 Import app state correctly to generate embeddings with OpenAI model
Resolves #216
2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
9cfaaf0941 Update docs to configure khoj.yml for using OpenAI model for embeddings 2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
a0d0dbaca7 Fix link to Khoj Obsidian Demo video in Readmes 2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky
ebb5d7b8e5 Release Khoj version 0.6.2 2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky
d02415edcc Write generated server id to env file when env file does not contain it 2023-05-17 19:38:44 +05:30
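A minimal sketch of the "write the server id only when missing" idea, assuming a plain `KEY=value` env file. The key name `SERVER_ID` and the helper `ensure_server_id` are hypothetical and only illustrate the technique; they are not Khoj's actual names.

```python
import tempfile
import uuid
from pathlib import Path

SERVER_ID_KEY = "SERVER_ID"  # hypothetical key name, for illustration only

def ensure_server_id(env_file: Path) -> str:
    """Return the server id stored in env_file, generating and appending one if missing."""
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == SERVER_ID_KEY:
            return value.strip()  # id already present: reuse it
    server_id = str(uuid.uuid4())  # otherwise generate a fresh random id
    lines.append(f"{SERVER_ID_KEY}={server_id}")
    env_file.write_text("\n".join(lines) + "\n")  # keep any existing entries
    return server_id

# Usage: the second call reads back the id written by the first.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("OTHER=1\n")
    first_id = ensure_server_id(env_path)
    second_id = ensure_server_id(env_path)
    env_text = env_path.read_text()
```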
Debanjum Singh Solanky
dc0626856e Put the telemetry db in a separate directory by default 2023-05-17 18:58:47 +05:30
Debanjum
dc495babb3 Add Telemetry to Understand Khoj Usage
### Objective
Use telemetry to better understand Khoj usage.
This will help motivate and prioritize work on Khoj.

Specific questions:
- How many active deployments of the khoj server are there?
- How regularly is khoj used (hourly, daily, weekly, etc.)?
- How much is each feature used (chat, search)?
- Which interface is used most (obsidian, emacs, web ui)?

### Details
- Expose setting to disable telemetry logging in khoj.yml
- Create basic telemetry server to log data to a DB
- Log calls to Khoj API /search, /chat, /update endpoints
- Batch upload telemetry data to server at ~hourly interval
2023-05-17 19:09:50 +08:00
Debanjum Singh Solanky
55d72231b3 Generate docker image for telemetry server using Github workflow 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
e9f04dc644 Add dockerfile to containerize telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
07b19964d4 Schedule jobs at (co-)prime intervals to reduce overlap in job runs 2023-05-17 16:08:21 +05:30
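Why co-prime intervals help: two periodic jobs started together fire at the same instant every lcm(a, b) time units, and for intervals of similar size the lcm is largest when gcd(a, b) = 1. The sketch below uses hypothetical intervals in minutes, not the ones configured in Khoj.

```python
from math import lcm

def coinciding_runs(interval_a: int, interval_b: int, horizon: int) -> list[int]:
    """Times in (0, horizon] at which two jobs, both started at t=0, fire together."""
    # Two periodic jobs realign every lcm(a, b) time units.
    period = lcm(interval_a, interval_b)
    return list(range(period, horizon + 1, period))

# Hypothetical intervals in minutes, chosen for illustration only.
minutes_per_day = 24 * 60
shared_factor = coinciding_runs(60, 90, minutes_per_day)  # gcd 30 -> lcm 180: clash every 3 hours
coprime = coinciding_runs(61, 89, minutes_per_day)        # gcd 1 -> lcm 5429: no clash within a day
```

Note that `math.lcm` requires Python 3.9 or later.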
Debanjum Singh Solanky
d42f0f5055 Add basic telemetry server for khoj 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
134cce9d32 Batch upload telemetry data at regular interval instead of while querying 2023-05-17 16:08:21 +05:30
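A sketch of moving the upload off the request path: each API call only appends an event to an in-memory buffer, and a periodic job sends the whole buffer in one batch once the interval has elapsed. The class `TelemetryBatcher` and its `upload` callback are hypothetical; a fake clock is injected so the example runs instantly.

```python
import time
from typing import Callable

class TelemetryBatcher:
    """Buffer usage events in memory and upload them together once per interval.

    Hypothetical helper for illustration; not Khoj's actual class or API.
    """

    def __init__(self, upload: Callable[[list], None], interval_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.upload = upload          # sends one batch to the telemetry server
        self.interval_s = interval_s  # roughly hourly by default
        self.clock = clock
        self.buffer: list = []
        self.last_upload = clock()

    def log(self, event: dict) -> None:
        # Called on each API request: just record the event, no network I/O.
        self.buffer.append(event)

    def maybe_upload(self) -> bool:
        # Called from a periodic background job instead of while serving queries.
        if not self.buffer or self.clock() - self.last_upload < self.interval_s:
            return False
        batch, self.buffer = self.buffer, []
        self.upload(batch)
        self.last_upload = self.clock()
        return True

# Usage with a fake clock.
sent_batches = []
fake_now = [0.0]
batcher = TelemetryBatcher(sent_batches.append, interval_s=3600, clock=lambda: fake_now[0])
batcher.log({"api": "search"})
batcher.log({"api": "chat"})
uploaded_early = batcher.maybe_upload()    # interval has not elapsed yet
fake_now[0] = 3600.0
uploaded_on_time = batcher.maybe_upload()  # both events sent in a single batch
```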
Debanjum Singh Solanky
3ede919c66 Log usage of /search, /chat, /update API endpoints to telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
f2e89f6f46 Add khoj app helper methods to log app usage to a telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
9ca61d62ff Enable/disable logging telemetry by setting bool in khoj.yml config
We log usage telemetry by default, unless the setting is explicitly
disabled in khoj.yml
2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky
131b8407b5 Allow Khoj Chat to respond to general queries not in reference notes
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes are available, or
  2. it is explicitly asked to, by prefixing the chat message with "@general"

- Previously Khoj Chat would often refuse to respond to
  general queries not answerable from reference notes or chat history

- Make chat quality tests more robust
  - Add more equivalent chat responses that refuse to answer
  - Force haiku writing to not give any preamble, just the haiku
2023-05-12 18:42:40 +08:00
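The two conditions above can be sketched as a small routing function. The name `plan_response` and its `(mode, message)` return shape are hypothetical; only the "@general" prefix comes from the commit message.

```python
GENERAL_PREFIX = "@general"  # prefix named in the commit message

def plan_response(message: str, references: list[str]) -> tuple[str, str]:
    """Decide whether to answer from reference notes or as a general query.

    Hypothetical helper; returns (mode, message_to_send).
    """
    if message.startswith(GENERAL_PREFIX):
        # Explicit request: strip the prefix and answer without notes
        return "general", message[len(GENERAL_PREFIX):].strip()
    if not references:
        # No relevant reference notes found: fall back to a general answer
        return "general", message
    return "notes", message
```

For example, `plan_response("@general Write a haiku", ["note"])` returns `("general", "Write a haiku")`.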
Debanjum Singh Solanky
cc75f986b2 Test text search index only updates on changes to text content 2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky
f9ccce430e Allow configuring OpenAI chat model for Khoj chat
- Simplifies switching between different OpenAI chat models, e.g. GPT4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
  defaults to gpt-3.5-turbo, unless the chat-model field under the
  conversation processor is set in khoj.yml
2023-05-03 23:01:13 +08:00
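A sketch of the default-with-override behaviour, operating on khoj.yml after it has been parsed into a dict. The nesting `processor -> conversation -> chat-model` is an assumption based on the wording above, and `get_chat_model` is a hypothetical helper.

```python
DEFAULT_CHAT_MODEL = "gpt-3.5-turbo"  # default named in the commit message

def get_chat_model(config: dict) -> str:
    """Return the configured chat model, falling back to the default."""
    # Assumed layout of the parsed khoj.yml: processor -> conversation -> chat-model
    processor = config.get("processor") or {}
    conversation = processor.get("conversation") or {}
    return conversation.get("chat-model") or DEFAULT_CHAT_MODEL
```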
Debanjum
f0253e2cbb Include Filename, Entry Heading in All Compiled Entries to Improve Search Context
Merge pull request #214 from debanjum/add-filename-heading-to-compiled-entry-for-context

- Set filename as top heading in compiled org, markdown entries
  - Note: *Khoj was already indexing filenames in compiled markdown entries, but they were appended as bare text rather than set as top-level headings*. The updated structure should provide more structured context for relevance
- Set entry heading as heading for compiled org, md entries, even if split by max tokens
- Snip prepended heading to avoid crossing model max_token limits
- Entries with no md headings should not get heading prefix prepended
2023-05-03 22:59:30 +08:00
Debanjum Singh Solanky
6b535cc345 Snip prepended heading to avoid crossing model max_token limits
Otherwise, if the heading is longer than max_tokens, the search models will just see
a heading (with repeated filename) for each compiled entry and not the
actual content.

100 characters should be sufficient to include the filename (not path) and
entry heading. If longer, truncate it so the entry's unique text is still
passed to the model for search context
2023-05-03 22:53:13 +08:00
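A sketch combining the heading-related changes in this series: split a long entry into chunks, then prefix every chunk (not just the first) with the entry heading, snipped to 100 characters. Whitespace-separated words stand in for model tokens here, and `split_with_heading` is a hypothetical helper, not Khoj's function.

```python
MAX_HEADING_CHARS = 100  # limit from the commit message: filename + entry heading

def split_with_heading(heading: str, body: str, max_tokens: int) -> list[str]:
    """Split body into chunks of at most max_tokens words, each prefixed by the heading.

    Sketch only: words approximate model tokens.
    """
    # Snip long headings so each chunk still carries the entry's own text.
    snipped_heading = heading[:MAX_HEADING_CHARS]
    words = body.split(" ")
    chunks = [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
    # Every chunk gets the heading, so continuation chunks keep their context.
    return [f"{snipped_heading}\n{chunk}" for chunk in chunks]

chunks = split_with_heading("# trip.md\n## Itinerary", "Fly to Lisbon on Monday then train to Porto", 4)
```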
Debanjum Singh Solanky
02aeee60aa Set filename as top heading of org entries for better search context
Previously filename was only being appended to markdown entries.

Test filename getting prepended to compiled entry as heading
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
94825a70b9 Set heading of md entries to improve search context for long entries
Otherwise, if a markdown entry is longer than max_tokens, the split
entries (apart from the first one) do not get their heading context set
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
5de04621b5 Set filename as top heading of md entries for better search context
Previously filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context

Test filename getting prepended as heading to compiled entry
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
0e3fb59e09 Entries with no md headings should not get heading prefix prepended
Files with no headings would previously get their entry prefixed
with a markdown heading prefix (#)
2023-05-03 22:50:31 +08:00
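A sketch of the compiled-entry layout described by these commits: the file name (not its path) becomes the top-level heading, and an entry heading line is added only when the entry actually has one, so headless files no longer get a stray `#` prefix. The helper name and exact heading levels are assumptions for illustration.

```python
from pathlib import PurePath

def compile_markdown_entry(file_path: str, heading: str, body: str) -> str:
    """Compile one markdown entry for indexing (hypothetical helper)."""
    # Use the file name only, not the full path, as the top-level heading.
    parts = [f"# {PurePath(file_path).name}"]
    if heading.strip():
        # Only entries with a real markdown heading get a heading line.
        parts.append(f"## {heading.strip()}")
    parts.append(body)
    return "\n".join(parts)

with_heading = compile_markdown_entry("notes/trip.md", "Itinerary", "Fly to Lisbon.")
without_heading = compile_markdown_entry("notes/scratch.md", "", "Buy milk.")
```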
Debanjum Singh Solanky
45a991d75c Prepend entry heading to all compiled org snippets to improve search context
Previously, compiled snippets split by max tokens (apart from the first) did not
get the heading as context.

This limited the search context needed to retrieve these continuation
entries
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
3386cc92b5 Fix khoj server config update in khoj.el by unquoting list to cl-push to
- cl-push expects a generalized variable. Otherwise it throws a (setf quote)
  undefined warning
- This resulted in the config call failing when calling the khoj entrypoint
2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky
948a4274e4 Fix documentation strings and simplify not null checks 2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky
731ef5688f Use cl-pushnew to fix byte-compile errors with using add-to-list 2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky
f046523b33 Improve khoj.el messages to convey state of khoj server
- Remove the "waiting for server" message as it hides the messages from the
  server
- Fix the nil messages that were being rendered, by checking messages
  from the server before showing them
- Consistently prefix messages from khoj with khoj.el
2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky
76df393eb5 Only call khoj server configure API from khoj.el when config updated
Previously khoj.el was calling the server configure API even when the
config was the same as before.
This had broken the khoj search-as-you-type experience from Emacs

Also show the user more details about what in khoj is being configured
2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
ceae06ae9d Fix khoj.el compilation warnings around unused variables 2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
8269adf849 Refactor khoj-setup in khoj.el for readability. No functional change 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
865d12b6f2 Fix escaping quote in chat references to prevent it breaking out of html 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
26cb878327 Add Yarn lockfile for Khoj Obsidian 2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky
e3180d63e6 Sync Khoj Obsidian Tagline with Khoj tagline 2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky
62e6e09521 Release Khoj version 0.6.1 2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky
b079fb31bc Replace Windows path separators in indexName configured via Khoj Obsidian
Resolves #185, #199

- Issue
  The IndexName created from the absolute Obsidian vault path wasn't replacing
  Windows path and drive separators with underscores. It was only
  replacing Unix path separators

- Fix
  Also replace Windows drive and path separators with _ while creating
  the IndexName in the Khoj Obsidian plugin
2023-04-17 16:55:33 +07:00
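The fix itself lives in the plugin's TypeScript; below is a Python sketch of the same replacement logic, for illustration. The helper name is hypothetical. It maps the Unix `/` separator as well as the Windows backslash path separator and `:` drive separator to `_`.

```python
import re

def index_name_from_vault_path(vault_path: str) -> str:
    """Turn an absolute vault path into an index name (hypothetical helper)."""
    # Replace '/', backslash and ':' so both Unix and Windows paths are handled.
    return re.sub(r"[/\\:]", "_", vault_path)

windows_name = index_name_from_vault_path(r"C:\Users\me\Vault")
unix_name = index_name_from_vault_path("/home/me/Vault")
```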
Debanjum Singh Solanky
d90df966a9 Make khoj logger use utf-8 encoding when writing to khoj log file
Resolve logger error issue mentioned in #199
2023-04-17 16:55:07 +07:00
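A sketch of the general technique with the standard library `logging` module: passing `encoding="utf-8"` to `FileHandler` avoids depending on the platform's default encoding, which on Windows is often not UTF-8. The helper name `add_file_handler` is hypothetical.

```python
import logging
import tempfile
from pathlib import Path

def add_file_handler(logger: logging.Logger, log_file: Path) -> logging.FileHandler:
    """Attach a file handler that always writes UTF-8, whatever the OS default is."""
    log_file.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler

# Usage: non-ASCII text (e.g. from note titles) round-trips through the log file.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "khoj.log"
    demo_logger = logging.getLogger("khoj-demo")
    demo_logger.setLevel(logging.INFO)
    handler = add_file_handler(demo_logger, log_path)
    demo_logger.info("Indexed note: Café ☕ 日本語")
    handler.close()  # release the file before the temp dir is removed
    demo_logger.removeHandler(handler)
    logged_text = log_path.read_text(encoding="utf-8")
```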
Debanjum Singh Solanky
dc3f399f91 Fix to get score associated with SearchResponse in result as string 2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky
d5000c63e1 Update Readmes to use python -m pip install khoj-assistant
Makes it easier to tell which Python installation pip is associated with.
Easier to debug when users have multiple versions of Python
installed (e.g. 3.10 and 3.11)
2023-04-16 20:17:20 +07:00
Debanjum Singh Solanky
453c84ab79 Add Screenshots of Khoj Chat Interface on Emacs, Obsidian to Readmes 2023-04-07 23:19:47 +07:00
Debanjum Singh Solanky
35aa06067f Release Khoj version 0.6.0
Upload styles.css via release workflow
2023-03-31 18:13:16 +07:00
Debanjum
8f4e5d3d83 Improve Styling of Khoj Search Modal on Obsidian and Indexing of Markdown
Merge pull request #198 from debanjum/improve-khoj-search-for-markdown-obsidian

### Overview
- Copied Khoj Search Modal styling from Jim Prince's PR #135 with minor improvements
- Implements improvements to the Khoj Search in Markdown/Obsidian suggested by folks. Specifically:
  - #133
  - #134
  - #142

### Changes
- 5673bd5 Keep original formatting in compiled text entry strings
- a2ab68a Include filename of markdown entries for search indexing
- 6712996 Create Note with Query as title from within Khoj Search Modal
- d3257cb Style the search result. Use Obsidian theme colors and font-size
- 4009148 For each result: snip it by lines, show filename, remove frontmatter
2023-03-30 14:15:23 +07:00
Debanjum Singh Solanky
5673bd5b96 Keep original formatting in compiled text entry strings
- Explicitly split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information.
  No point in dropping the formatting unnecessarily,
  even if (say) the current search models don't account for it (yet)
2023-03-30 14:02:46 +07:00
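Why splitting on a single space matters: Python's `str.split()` with no argument splits on any whitespace run and discards newlines and indentation, while `str.split(" ")` keeps them inside the pieces, so the chunks can be joined back into the original text. The helper below is a hypothetical sketch of that idea.

```python
def split_preserving_format(text: str, max_words: int) -> list[str]:
    """Split text into chunks of at most max_words space-separated pieces."""
    # Splitting on " " (not any whitespace) keeps newlines and indentation in the pieces.
    pieces = text.split(" ")
    return [" ".join(pieces[i:i + max_words]) for i in range(0, len(pieces), max_words)]

entry = "* Trip\n  - Fly to Lisbon\n  - Train to Porto"
chunks = split_preserving_format(entry, 3)
lossy = " ".join(entry.split())  # str.split() collapses all whitespace, losing the list structure
```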
Debanjum Singh Solanky
a2ab68a7a2 Include filename of markdown entries for search indexing
Append originating filename to compiled string of each entry for
better search quality by providing more context to model

Update markdown_to_jsonl tests to ensure filename being added

Resolves #142
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
67129964a7 Create Note with Query as title from within Khoj Search Modal
This follows the expected behavior for Obsidian search modals,
e.g. Omnisearch and the default Obsidian search.

The note creation code is borrowed from Omnisearch.

Resolves #133
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
d3257cb24e Style the search result. Use Obsidian theme colors and font-size
Based on PR #135
2023-03-30 12:35:29 +07:00
Debanjum Singh Solanky
40091489c0 For each result: snip it by lines, show filename, remove frontmatter
Based on PR #135
Resolves #134
2023-03-30 12:34:55 +07:00
Debanjum Singh Solanky
240db7b4f0 Add screenshot of Khoj chat on Obsidian to Readme. Fix links 2023-03-30 02:49:05 +07:00