Commit graph

1085 commits

Author SHA1 Message Date
Debanjum Singh Solanky
d77e05c279 Release Khoj version 0.7.0 2023-07-01 05:44:22 -07:00
Debanjum Singh Solanky
30d87a9a01 Update color of Khoj chat in Obsidinan plugin to Lantern theme 2023-07-01 02:18:47 -07:00
Debanjum Singh Solanky
51826d28d6 Ensure clicking Update in Khoj Obsidian indexes PDF files too 2023-07-01 02:18:47 -07:00
sabaimran
dac2d14380 Handle file names appropriately for md files and render commits in github results 2023-07-01 01:20:58 -07:00
sabaimran
dbe713604d Fix error in tests for markdown_to_jsonl 2023-07-01 00:49:40 -07:00
sabaimran
931aab4464 Handle case for when headers value is None 2023-07-01 00:37:30 -07:00
sabaimran
d01afb3ee4 Fix path issues for URL-based markdown files 2023-07-01 00:25:11 -07:00
sabaimran
31655447e7 Add the sign-up list to the chat page as well and update copy 2023-06-30 21:43:01 -07:00
sabaimran
796102c74e Add separate configuration if the given Khoj instance is meant for demo
- In theory, this will be suitable for any Khoj instance that's meant for external-facing purposes (as in, outside of the user's network)
- Prevent re-indexing for Github data if this is a demo instance
- Fix up some issues with the CSS which made settings page small in mobile
- In the frontend views for Khoj, add a button to get on the waitlist and links to the landing page
2023-06-30 20:38:55 -07:00
sabaimran
db3026739d Resolve diffs in api.py to make /chat endpoint async with new request parameter 2023-06-30 00:25:37 -07:00
sabaimran
ef72508914 Try/catch around github file decoding, await call to search in chat API, fix img width 2023-06-30 00:23:21 -07:00
Debanjum Singh Solanky
b950889f47 Fix org-mode web renderer to handle results containing list in block
- Break out of rendering list if at end of org block in org.js
- This would previous hang rendering results in web interface

Should try fix this upstream in org.js as well
2023-06-29 19:01:25 -07:00
sabaimran
780c769567 Add additional request headers to improve telemetry 2023-06-29 18:51:24 -07:00
sabaimran
6c10d68262
Merge pull request #253 from khoj-ai/features/github-issues-indexing
Support indexing Github issues as well as corresponding comments
2023-06-29 16:02:47 -07:00
sabaimran
b2dd946c6d Rename issue to entry method for accuracy 2023-06-29 15:23:50 -07:00
Debanjum Singh Solanky
51dfa48e2b Have Khoj support Python 3.11 as Pytorch supports it now
- Previously Khoj could only support Python upto 3.10 due to pytorch.
  But lots of folks had python 3.11 installed by default on their machines.

  This required installing python 3.10 and dealing with virtual envs.

  With Torch >= 2.0.1 now able to support python 3.11, at least one
  class of installation troubles for Khoj should drop. See
  https://github.com/pytorch/pytorch/issues/86566 for reference

- Preliminary testing indicates using the new torch 2.x may reduce
  search time by 25% (from 80ms to 60ms on Mac M1)

- Update Docs to not require mentioning python <=3.10 required
- Update Github test workflow to run khoj tests with python 3.11 too
2023-06-29 15:13:26 -07:00
sabaimran
65bf894302 Interpret org files as a list and put them in separate divs. Update styling of search results to separate into cards 2023-06-29 15:12:48 -07:00
Debanjum Singh Solanky
d212298573 Make Configure button on web interface incrementally update by default
We should add a way to force index everything.

But force indexing should not be the default when user is just trying
update content to index
2023-06-29 14:52:51 -07:00
Debanjum Singh Solanky
da2de21339 Only return requested result count even if search in multiple content types
- Set results_count to default value at start so it is an int, never None
2023-06-29 14:49:05 -07:00
sabaimran
77672ac0ae Demarcate different results with a border box
- Add back support for searching by type Github
- Remove custom class name in markdown js file
2023-06-29 14:14:25 -07:00
sabaimran
6edc32f2f4 Accept current changes to include issues in rendering flow 2023-06-29 12:25:29 -07:00
sabaimran
ab7dabe74f Explicitly use Union type for function parameters for lint checks 2023-06-29 11:44:30 -07:00
sabaimran
fecf6700d2 Limit small image rendering to just the avatar images 2023-06-29 11:27:18 -07:00
sabaimran
70e550250a Add an additional data source for issues from Github repositories + quality of life updates
- Use a request session to reduce the overhead of setting up a new connection with the Github URL each request
- Use the streaming feature for the REST api to reduce some of the memory footprint
2023-06-29 10:59:54 -07:00
Debanjum Singh Solanky
5f2717cc4b Use logger.warning since logger.warn is deprecated 2023-06-28 22:15:27 -07:00
Debanjum Singh Solanky
56ce97ef9e Use async/await in tests for query method of text and image search
The text, image search query method has become async. So async/await
is required to get results correctly in tests etc
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
b1767f93d6 Get any configured asymmetric search model to encode query for search
- Set image_search.query to async to use it with multi-threading
  This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
8eae7c898c Put each result under org heading when query for "all" content type in khoj.el
- Add "all" as default content type when no content type retrieved
  from server
2023-06-28 22:07:02 -07:00
Debanjum Singh Solanky
630bf995f1 Style each result based on its content type in same view on Khoj web
- So when searching across content types (with content-type = "all")
  org-mode results get rendered differently than markdown, PDF etc. results

- Set div class for each result separately instead of a single uber div
  for styling. This allows styling div of each result based on the
  content-type of that result

- No need to create placeholder "all" content type on web interface as
  server is passing an all content type by itself
2023-06-28 22:07:01 -07:00
Debanjum Singh Solanky
1773a78339 Fix createRequestUrl method signature to fetch results from khoj web 2023-06-28 12:10:45 -07:00
Debanjum Singh Solanky
212b1a96c8 Create "all" search type for search across all content types on khoj server
Allows moving logic to handle search across all content types to
server from clients
2023-06-28 11:34:26 -07:00
Debanjum Singh Solanky
0636ceaf14 Merge branch 'master' of github.com:khoj-ai/khoj into parallelize-search-across-all-asymmetric-text-content-types
Conflicts:
- src/khoj/routers/api.py: Use theirs
2023-06-27 16:10:32 -07:00
Debanjum Singh Solanky
510bb7e684 Use typing union in text_search for python 3.8 compatible type hinting 2023-06-27 15:59:50 -07:00
Debanjum Singh Solanky
1b11d5723d Extract search request URL builder into js function in web interface 2023-06-27 15:50:41 -07:00
Debanjum Singh Solanky
09f739b8cc Null check config, log warning instead of error when configuring search 2023-06-27 15:48:48 -07:00
sabaimran
9d62d66a77 Simplify construction of repo shorthand in GithubToJsonl 2023-06-27 15:05:03 -07:00
sabaimran
227169ebde Support configuration of multiple Github repositories in the settings interface
- Add cards to configure each of the Github repositories
- Fix a bug in the API which caused all other settings to be wiped when updating one of the content types
- Provide an error message to the user if they have a misconfiguration in their chat settings
2023-06-27 14:10:09 -07:00
sabaimran
37a1f15c38 Add backend support for indexing multiple repositories
- Add support for indexing org files as well as markdown files from the Github repository and update corresponding search view
- Support indexing a list of repositories
2023-06-27 12:06:15 -07:00
sabaimran
ddd550e6f4 Add call to use X-CSRFToken in relevant POST methods 2023-06-26 12:38:00 -07:00
sabaimran
35e24d7851 Fix null checking in state for content config API and telemetry API 2023-06-26 11:37:34 -07:00
sabaimran
5e39421f56 Merge branch 'master' of github.com:debanjum/khoj 2023-06-25 11:41:47 -07:00
sabaimran
4410a3bb4b Limit max width of the pre tag to 100% of the screen width 2023-06-25 11:41:15 -07:00
sabaimran
ffe66b848a Use a single column tempalte for config plugins when in mobile 2023-06-25 11:27:41 -07:00
Debanjum Singh Solanky
b1890aa050 Null check intermediary objects when config not fully initialized 2023-06-24 15:34:18 -07:00
Debanjum Singh Solanky
946af0889d Improve showing status message on saving config via web interface
- Show success/failure status message much closer to the save button
  Previously status message was shown on top of the page, which wasn't
  always in view and wasn't easily seen
- Improve the status message to more clearly show next steps on success
2023-06-24 00:49:57 -07:00
Debanjum Singh Solanky
40d1abfe50 Update the new /config APIs to configure Khoj for first time users
- Setup state.config and sub-components from unset state
- Setup search types with default settings
2023-06-24 00:45:30 -07:00
Debanjum Singh Solanky
edabede93a Fix post configuration state update on error or success on config html 2023-06-23 14:52:25 -07:00
Debanjum Singh Solanky
4744d69221 Resolve button name, anchor tag feedback. Add status message to settings page
- Use "Configure" name for settings config action
- Use more standard anchor tag instead of button
- Add configure status message
2023-06-23 09:48:38 -07:00
Debanjum Singh Solanky
26abafa658 Highlight currently active tab in web interface for orientation 2023-06-22 00:33:28 -07:00
Debanjum Singh Solanky
2728c714d7 Put pico.css in local assets. Move common css styling into khoj.css 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
20a37697de Add Khoj header with navigation pane to Search and Chat Interfaces 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
c467a0cbb0 Update UI of config sub pages to use khoj lantern theme styling 2023-06-22 00:33:11 -07:00
Debanjum Singh Solanky
0ce2ec590a Update main config page on khoj server to match khoj lantern theme 2023-06-21 20:25:25 -07:00
Debanjum Singh Solanky
d30a9ddd33 Use Khoj Logo on Search, Chat pages of Web Interface 2023-06-21 12:34:53 -07:00
Debanjum Singh Solanky
6d4aad57e1 Use new Khoj Lantern Logo in Web, Emacs, Obsidian UIs and Docs 2023-06-21 01:57:22 -07:00
Debanjum Singh Solanky
69d4fa6525 Rename project links across repo from debanjum/khoj to khoj-ai/khoj 2023-06-21 00:13:21 -07:00
Debanjum Singh Solanky
5c4eb950d5 Search across all content types via khoj.el on Emacs
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.

This results in search across all enabled asymmetric search text
content types
2023-06-20 23:39:56 -07:00
Debanjum Singh Solanky
2cd3e799d3 Improve null and type checks 2023-06-20 23:30:59 -07:00
Debanjum Singh Solanky
d5fb4196de Update web interface to allow querying all content types at once 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
5c7c8d1f46 Use async/await to fix parallelization of search across content types 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
1192e49307 Pass default value matching argument types expected by text_search methods 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
0144e610d6 Only search across content types that work with asymmetric search 2023-06-20 22:21:46 -07:00
Debanjum Singh Solanky
f6a7aa6c96 Style Khoj chat on web interface with new lantern theme
- Color khoj chat message with new yellow theme color
- Update Khoj chat emoji to lantern
- Add page type to title of pages on web interface
2023-06-20 01:39:33 -07:00
Debanjum Singh Solanky
6d94d6e75a Encode the asymmetric, symmetric search queries in parallel for speed
Use timer to measure time to encode queries and total search time
2023-06-20 01:18:17 -07:00
Debanjum Singh Solanky
d292dc03b3 Use new Khoj Logotype in Web interface 2023-06-20 01:13:06 -07:00
Debanjum Singh Solanky
db07362ca3 Encode user query as same across search types to speed up query time
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
  query and pass it to the query methods of each search types

TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
2023-06-19 23:29:54 -07:00
Debanjum Singh Solanky
285d17af2a Search in parallel across all enabled content types requested via API
- Update API to return content from all enabled content types when type
  is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
2023-06-19 23:29:06 -07:00
Debanjum Singh Solanky
79d325fbb6 Fix triggering @general queries in Khoj Chat 2023-06-19 23:05:33 -07:00
Debanjum Singh Solanky
e97a20d70c Set conversation type if query param set, else return chat history
Only initialize variables if query is not empty, to avoid unnecessary
compute, variable null checks etc.

Fixes #230
2023-06-19 19:59:16 -07:00
sabaimran
4722a2c16d Add Github configuration page and success notifications 2023-06-18 10:06:45 -07:00
sabaimran
668135c763 Merge branch 'master' of github.com:debanjum/khoj into features/pretty-config-page 2023-06-18 08:35:09 -07:00
sabaimran
81183a1fe1 Address misc PR comments and update logo in all clients
- Rename the new logo to reflect accuracy on size (e.g., 128x128)
- Update the icns file for Mac
- Update nomenclature in settings pages
2023-06-18 08:34:58 -07:00
Debanjum Singh Solanky
a44cde2865 Show hint to re-index vault if wonky results in Obsidian search modal
Remove spurious indentation in Obsidian styles.css

Resolves #207
2023-06-18 04:53:51 -07:00
Debanjum Singh Solanky
595cc5b0f5 Use printer icon for PDF logs. Only split lines if file at web link in web interface 2023-06-18 02:26:03 -07:00
Debanjum Singh Solanky
e31a540a5e Get all md files recursively in repository by passing recursive param
Previously the `get_markdown_files' method was only getting files at
root of the repository

Fix, improve logger messages in github to jsonl processor
2023-06-18 01:47:15 -07:00
Debanjum Singh Solanky
6fdac24416 Set page size to 100 to reduce requests required to Github API to 1/3
- Default is 30. So number of paginated requests required to get all
  items (commits, files) will reduce by 67%

- No need to increase page size for the get tree Github API request from
  `get_markdown_files'

  Get tree Github API doesn't support pagination and return 100K items
  in response. This should be way more than enough for our current
  use-cases
2023-06-18 01:44:36 -07:00
Debanjum Singh Solanky
87975e589a Fix passing auth token to Github API to increase rate limits by x85
- Previously wasn't prefixing "token" to PAT token in Auth header
  This resulted in the request being considered unauthenticated

- Unauthenticated requests to Github API are limited to 60 requests/hour
  Authenticated requests to Github API are allowed 5000 requests/hour
2023-06-18 01:19:26 -07:00
Debanjum Singh Solanky
9c70af960c Extract logic to get file content from Github into a separate method 2023-06-18 01:19:13 -07:00
Debanjum Singh Solanky
10d4c38ce9 Extract Wait for rate limit reset logic into a function for reuse 2023-06-18 01:06:46 -07:00
sabaimran
aad7f825e0 Remove music configuration 2023-06-17 21:23:56 -07:00
sabaimran
5f97afbfac Ignore type checks from mypy in subindexed fields 2023-06-17 16:53:36 -07:00
sabaimran
c2d46de8bc Add endpoint for regenerating directly from the config page and add music content-type 2023-06-17 15:47:33 -07:00
sabaimran
ded3100caf Update the configuration page to make config management easier
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
2023-06-17 15:21:28 -07:00
Debanjum Singh Solanky
3f24e53b6e Render URL as link in web interface if file param of result is a web link 2023-06-17 04:26:40 -07:00
Debanjum Singh Solanky
63ec84ad78 Store Github URL of Markdown files on Github in file jsonl param 2023-06-17 04:23:01 -07:00
Debanjum Singh Solanky
0c1c7583b5 Handle pagination, API rate limits. Get all commits from Github repo 2023-06-17 04:21:39 -07:00
Debanjum Singh Solanky
31d17d0b22 Index commits message from repository with the github plugin 2023-06-17 02:59:54 -07:00
Debanjum Singh Solanky
c29c141a7e Use Github Rest API to index Markdown files in Github Repository
The Llama_Hub Github plugin is fairly limited.

The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
2023-06-17 02:16:13 -07:00
Saba
ac96f43b1b Remove try-catch specific to Github plugin; consolidate GUI logic 2023-06-16 23:46:25 -07:00
Saba
019d3732de Rename orgmode_search to org_search 2023-06-13 16:06:54 -07:00
Saba
08d79f5ba4 Unify types used in Github and other text-based configs. Fix typing issues 2023-06-13 15:52:36 -07:00
Saba
a6cd96a6a9 Add a Github plugin which can be used to read from a Github repository 2023-06-13 14:40:06 -07:00
Debanjum
c68cde4803
Log clients calling API endpoints on Khoj server
- Make API endpoints on Khoj server accept `client` as request parameter
  - Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
  - Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
2023-06-09 18:36:49 +05:30
sabaimran
59fa48036f
Merge pull request #224 from debanjum/fix/message-exceeds-prompt-size
Pass truncated message as string in ChatMessage when exceeding max prompt size
2023-06-08 17:32:53 -07:00
Debanjum Singh Solanky
139a3ba060 Update server to log new server version field to telemetry db 2023-06-08 14:14:21 +05:30
Saba
5d5ebcbf7c Rename truncate messages method and update unit tests to simplify assertion logic 2023-06-06 23:25:43 -07:00
Saba
7119ed0849 Run pre-commit script 2023-06-05 19:29:23 -07:00
Saba
6212d7c2e8 Remove debug line 2023-06-05 19:00:25 -07:00
Saba
f65ff9815d Move message truncation logic into a separate function. Add unit tests with factory boy. 2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky
eb6175e9b0 Update description field in webmanifest of Khoj, Khoj Chat PWA 2023-06-06 01:53:42 +05:30
Debanjum Singh Solanky
bb2363f324 Set client request param when calling khoj server APIs from Web 2023-06-06 00:05:00 +05:30
Debanjum Singh Solanky
caab55fbdd Set client request param when calling khoj server APIs from Obsidian 2023-06-06 00:04:46 +05:30
Debanjum Singh Solanky
de2494154f Set client request param when calling khoj server APIs from Emacs 2023-06-06 00:02:10 +05:30
Debanjum Singh Solanky
168c11cea7 Make server API endpoints accept client as query param
- The chat, search and update API will accept client as request param.
- This will allow logging the client from which these APIs was called.
2023-06-05 23:57:08 +05:30
Debanjum Singh Solanky
8617cf1389 Push telemetry to Posthog to grok Khoj usage 2023-06-05 22:47:49 +05:30
Debanjum Singh Solanky
d13db2e666 Make old telemetry server forward requests to new server 2023-06-05 13:06:45 +05:30
Saba
5f4223efb4 Increase timeout to OpenAI call 2023-06-04 20:49:47 -07:00
Saba
0e63a90377 Fix the mechanism to retrieve the message content 2023-06-04 20:25:37 -07:00
Saba
f0efe0177e Pass truncated message as string in ChatMessage when exceeding max prompt size 2023-06-04 19:33:46 -07:00
Saba
068ee0ac5e Swap elif with else, as usage of this method does not use openai_api_key 2023-06-04 02:25:08 -07:00
Saba
6508379d7b Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky
7af8a56434 Remove filename from reference before rendering references in khoj.el
Fixes bug where actual reference heading in next line jumping out of
references footnote section
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
ec280067ef Do not retrieve relevant notes when having a general chat with Khoj
- This improves latency of @general chat by avoiding unnecessary
  compute
- It also avoids passing references in API response when they haven't
  been used to generate the chat response. So interfaces don't have to
  add logic to not render them unnecessarily
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
90439a8db1 Update Khoj subtitle to AI personal assistant for your digital brain 2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
e9ed7a19fd Update search prompt to extract PDF search type. Fix extract_question prompt 2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky
bbe3bf9733 Render PDF search results in Khoj Obsidian interface
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
  - Ensure combined results are sorted by score across both types
- Jump to PDF file when select it PDF search result from modal
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
e3892945d4 Render PDF search results in Khoj.el Emacs interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
85144006a1 Render PDF search results in khoj web interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
acd14a5e41 Wire up PDF to jsonl processor to Khoj server layer (API, config)
- Specify PDF content to index via khoj.yml
- Index PDF content on app start, reconfigure
- Expose PDF as a search type via API
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
286b500f66 Create PDF to JSONL processor using PyPDF and LangChain
Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts
throwing typing error for python 3.8, 3.9
2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky
1b3effd8e6 Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor 2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky
1cd9ecd449 Truncate last message if still over max supported prompt size by model 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
ed4d0f9076 Simplify argument names used in khoj openai completion functions
- Match argument names passed to khoj openai completion funcs with
  arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
703a7c89c0 Reduce retry count and request timeout for faster response or failure
- Fix bug where both LangChain and Khoj retry requests 6 times each.
  So a total of 12 requests at >1minute intervals for each chat
  response in case of OpenAI API being down

- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
  between retries way too much. This slowed down chat response times
  quite a bit when API was being flaky

- With these updates you'll know if call to chat API failed in under a
  minute
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
18081b3bc6 Use LangChain to call GPT over API 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
277d2f5c96 Do not add "Notes:" suffix to chat messages when no notes retrieved
This was causing spurious "Notes:" suffix being added to Khoj Chat in
response
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
334be4e600 Use LangChain to call OpenAI for Khoj Chat
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
  using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
  creating agents and managing their state
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
efcf7d1508 Extract prompts as LangChain Prompt Templates into a separate module
Improves code modularity, cleanliness. Reduces bloat in GPT.py module
2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky
b484953bb3 Import app state correctly to generate embeddings with OpenAI model
Resolves #216
2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
a0d0dbaca7 Fix link to Khoj Obsidian Demo video in Readmes 2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky
ebb5d7b8e5 Release Khoj version 0.6.2 2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky
d02415edcc Write generated server id to env file when env file does not contain it 2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky
dc0626856e Put the telemetry db in a separate directory by default 2023-05-17 18:58:47 +05:30
Debanjum Singh Solanky
e9f04dc644 Add dockerfile to containerize telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
07b19964d4 Schedule jobs at (co-)prime intervals to reduce overlap in job runs 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
d42f0f5055 Add basic telemetry server for khoj 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
134cce9d32 Batch upload telemetry data at regular interval instead of while querying 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
3ede919c66 Log usage of /search, /chat, /update API endpoints to telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
f2e89f6f46 Add khoj app helper methods to log app usage to a telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
9ca61d62ff Enable/disable logging telemetry by setting bool in khoj.yml config
We log usage telemetry by default, unless setting explicitly set in
khoj.yml
2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky
131b8407b5 Allow Khoj Chat to respond to general queries not in reference notes
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes available or
  2. when explicitly induced by prefixing the chat message with "@general"

- Previously Khoj Chat would a lot of times refuse to respond to
  general queries not answerable from reference notes or chat history

- Make chat quality tests more robust
  - Add more equivalent chat response options refusing to answer
  - Force haiku writing to not give any preable, just the haiku
2023-05-12 18:42:40 +08:00
Debanjum Singh Solanky
cc75f986b2 Test text search index only updates on changes to text content 2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky
f9ccce430e Allow configuring OpenAI chat model for Khoj chat
- Simplifies switching between different OpenAI chat models. E.g GPT4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
  defaults to using gpt-3.5-turbo, unless chat-model field under
  conversation processor updated in khoj.yml
2023-05-03 23:01:13 +08:00
Debanjum Singh Solanky
6b535cc345 Snip prepended heading to avoid crossing model max_token limits
Otherwise if heading > max_tokens than the search models will just see
a heading (with repeated filename) for each compiled entry and not
actual content.

100 characters should be sufficient to include filename (not path) and
entry heading. If longer rather truncate to pass entry unique text to
model for search context
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
02aeee60aa Set filename as top heading of org entries for better search context
Previously filename was only being appended to markdown entries.

Test filename getting prepended to compiled entry as heading
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
94825a70b9 Set heading of md entries to improve search context for long entries
Otherwise if a markdown entry is longer than max_tokens, the split
entries (apart from first one) do not get their heading context set
2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky
5de04621b5 Set filename as top heading of md entries for better search context
Previously filename was appended to the end of the compiled entry.
This didn't provide appropriate structured context

Test filename getting prepended as heading to compiled entry
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
0e3fb59e09 Entries with no md headings should not get heading prefix prepended
Files with no headings would previously get their entry be prefixed
with a markdown heading prefix (#)
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
45a991d75c Prepend entry heading to all compiled org snippets to improve search context
All compiled snippets split by max tokens (apart from first) do not
get the heading as context.

This limits search context required to retrieve these continuation
entries
2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky
3386cc92b5 Fix khoj server config update in khoj.el by unquoting list to cl-push to
- cl-push expects a generatlized variable. Else throws (setf quote)
  undefined warning
- This results in the config call failing on calling khoj entrypoint
2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky
948a4274e4 Fix documentation strings and simplify not null checks 2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky
731ef5688f Use cl-pushnew to fix byte-compile errors with using add-to-list 2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky
f046523b33 Improve khoj.el messages to convey state of khoj server
- Remove waiting for server message as it hides the messages from the
  server
- Fix the nil message that were being rendered, by checking before
  showing messages from server
- Consistently prefix messages from khoj with khoj.el
2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky
76df393eb5 Only call khoj server configure API from khoj.el when config updated
Previously khoj.el was calling the server configure API even when
config was same as before.
This had broken the khoj search as you type experience from emacs

Also show more details to user about what in khoj is being configured
2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
ceae06ae9d Fix khoj.el compilation warnings around unused variables 2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky
8269adf849 Refactor khoj-setup in khoj.el for readability. No functional change 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
865d12b6f2 Fix escaping quote in chat references to prevent it breaking out of html 2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky
26cb878327 Add Yarn lockfile for Khoj Obsidian 2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky
e3180d63e6 Sync Khoj Obsidian Tagline with Khoj tagline 2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky
62e6e09521 Release Khoj version 0.6.1 2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky
b079fb31bc Replace Windows path separators in indexName configured via Khoj Obsidian
Resolves #185, #199

- Issue
  IndexName created from Obsidian Absolute Vault path wasn't replacing
  windows path, drive separators with underscore. It was only
  replacing unix path separators

- Fix
  Also replace windows drive and path separators with _ while creating
  IndexName in Khoj Obsidian plugin
2023-04-17 16:55:33 +07:00
Debanjum Singh Solanky
d90df966a9 Make khoj logger use utf-8 encoding when writing to khoj log file
Resolve logger error issue mentioned in #199
2023-04-17 16:55:07 +07:00
Debanjum Singh Solanky
dc3f399f91 Fix to get score associated with SearchResponse in result as string 2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky
d5000c63e1 Update Readmes to use python -m pip install khoj-assistant
Makes it easier to tell pip associated with which python is being
used. Easier to debug when users have different versions of python
installed (e.g 3.10 and 3.11)
2023-04-16 20:17:20 +07:00
Debanjum Singh Solanky
453c84ab79 Add Screenshots of Khoj Chat Interface on Emacs, Obsidian to Readmes 2023-04-07 23:19:47 +07:00
Debanjum Singh Solanky
35aa06067f Release Khoj version 0.6.0
Upload styles.css via release workflow
2023-03-31 18:13:16 +07:00
Debanjum Singh Solanky
5673bd5b96 Keep original formatting in compiled text entry strings
- Explicity split entry string by space during split by max_tokens
- Prevent formatting of compiled entry from being lost
- The formatting itself contains useful information
  No point in dropping the formatting unnecessarily,
  even if (say) the currrent search models don't account for it (yet)
2023-03-30 14:02:46 +07:00
Debanjum Singh Solanky
a2ab68a7a2 Include filename of markdown entries for search indexing
Append originating filename to compiled string of each entry for
better search quality by providing more context to model

Update markdown_to_jsonl tests to ensure filename being added

Resolves #142
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
67129964a7 Create Note with Query as title from within Khoj Search Modal
This follows expected behavior for obsidain search modals
E.g Ominsearch and default Obsidian search.

The note creation code is borrowed from Omnisearch.

Resolves #133
2023-03-30 13:51:36 +07:00
Debanjum Singh Solanky
d3257cb24e Style the search result. Use Obsidian theme colors and font-size
Based on PR #135
2023-03-30 12:35:29 +07:00
Debanjum Singh Solanky
40091489c0 For each result: snip it by lines, show filename, remove frontmatter
Based on PR #135
Resolves #134
2023-03-30 12:34:55 +07:00
Debanjum Singh Solanky
240db7b4f0 Add screenshot of Khoj chat on Obsidian to Readme. Fix links 2023-03-30 02:49:05 +07:00
Debanjum Singh Solanky
234be96e53 Fix processor key used to configure chat model in khoj obsidian 2023-03-30 01:47:09 +07:00
Debanjum Singh Solanky
c8c0cfd10e Add Chat features, setup and usage to Khoj Obsidian plugin Readme 2023-03-30 00:32:24 +07:00
Debanjum Singh Solanky
7ecae224e7 Configure OpenAI API Key from the Khoj plugin setting in Obsidian 2023-03-29 23:54:08 +07:00
Debanjum Singh Solanky
3d616c8d65 Use Obsidian font sizes. Improve input field, reference indexing
- Give space in the input field. Too narrow previously
- References should be indexed from 1 instead of 0
- Use Obsidian font size variables to scale fonts in chat appropriately
2023-03-29 22:13:55 +07:00
Debanjum Singh Solanky
23bd737f6b Use chat input element to send message on Enter. No send button required 2023-03-29 22:13:30 +07:00
Debanjum Singh Solanky
81e98c3079 Scroll to bottom of modal on open and message send 2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
59ff1ae27f Use obsidian theme colors for bg, text. Restrict css namespace via prefix 2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
001ac7b5eb Style Obsidian Chat Modal like Khoj Chat Web Interface
- Add message sender, date metadata as message footer
- Use css directly from Khoj Chat Web Interface.
  - Modify it to work under a Obsidian modal
  - So replace html, body styling from web interface to instead
    styling new "khoj-chat" class attached to contentEl of modal
2023-03-29 18:12:12 +07:00
Debanjum Singh Solanky
112f388ada Render references next to chat responses by khoj in chat modal 2023-03-28 18:11:03 +07:00
Debanjum Singh Solanky
1d3d949962 Render conversation logs on page load 2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
cd46a17e5f Add Khoj Chat Modal, Command in Khoj Obsidian to Chat using API 2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
c0972e09e6 Rename KhojModal to KhojSearchModal, a more specific name for it
In preparation to introduce Khoj chat in Obsidian
2023-03-28 14:56:29 +07:00
Debanjum Singh Solanky
64fff1d372 Release Khoj version 0.5.0 2023-03-28 03:35:59 +07:00
Debanjum Singh Solanky
fc218508f9 Update khoj.el docs and Emacs Readme for chat, simplified setup 2023-03-27 22:02:47 +07:00
Debanjum Singh Solanky
83a7ccd729 Fix docstrings and method ordering in khoj.el 2023-03-27 18:33:09 +07:00
Debanjum Singh Solanky
5c2327ee4f Configure org directories to index from khoj.el
Converts paths to glob style regexes that will index all org files
recursively under the specified list of path

Should help setup for org-roam users from khoj.el
2023-03-27 18:30:53 +07:00
Debanjum Singh Solanky
6e8a40906d Allow disabling automatic server setup. Fix server start vs ready logic
- khoj-auto-setup controls whether to automatically check for and
  setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
  method. Allows calling khoj-setup during package load via init.el

- Fix: Do not attempt to configure or wait for server ready if
  user has said no to auto-setup request
- Fix logic to mark server started vs ready
  - Previously the started/running vs ready variables defs were getting
    intertwined
  - Server started indicates server bootup has been triggered
  - Server ready indicates server API ready to accept requests
2023-03-27 17:53:08 +07:00
Debanjum Singh Solanky
526a927bce Fix org entry extraction test, variable prefixed with khoj in khoj.el
Discovered via failing build and test workflows on Github
2023-03-27 16:44:50 +07:00
Debanjum Singh Solanky
7243059507 Track index update asynchronously via moon phase progressbar in khoj.el 2023-03-27 06:01:04 +07:00
Debanjum Singh Solanky
8a9055f918 Restrict server messages show in echo area to main server files 2023-03-27 04:59:55 +07:00
Debanjum Singh Solanky
ae535a06eb Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs 2023-03-27 04:59:54 +07:00
Debanjum Singh Solanky
36b17d4ae0 Generalize the directory from config extraction elisp method 2023-03-27 03:44:03 +07:00
Debanjum Singh Solanky
924424c754 Throw actionable exceptions when content types or chat not configured 2023-03-27 02:47:44 +07:00
Debanjum Singh Solanky
359a2cacef Fix khoj--server-running to work with unconfigured or external server
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)

- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
2023-03-27 02:45:59 +07:00
Debanjum Singh Solanky
d7fb9a596e Auto configure server before loading khoj-menu
If the config hasn't changed there'll be no update. If config has
changed indexing will get triggered asynchronously. But user cannot
make query till indexing done

As easier to know when server ready to configure
2023-03-27 02:44:02 +07:00
Debanjum Singh Solanky
8a21aff438 Make khoj.el server start, stop, restart, setup methods interactive
No need to erase temporary buffers before working on them
2023-03-27 01:53:15 +07:00
Debanjum Singh Solanky
cb40a96c85 Index configured org files from khoj.el
- Set `khoj-org-files-index' to list of files to index
- Defaults to indexing org-agenda-files
- Uses khoj server api to configure org files to index
2023-03-27 01:05:26 +07:00
Debanjum Singh Solanky
50760acc37 Wait for Khoj server to get ready before opening khoj.el transient menu
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
  Until then khoj features wouldn't work anyway, so avoids confusion
2023-03-26 13:00:01 +07:00
Debanjum Singh Solanky
82eb4bfd0d Setup Khoj server on opening khoj from with Emacs
- Create helper methods to check, stop, restart, setup khoj server
- (Ask to) setup khoj server on calling khoj main entrypoint function
2023-03-26 10:12:06 +07:00
Debanjum Singh Solanky
99d19dcf43 Start Khoj server from Emacs using khoj.el 2023-03-26 09:38:46 +07:00
Debanjum Singh Solanky
c92d79118a Install Khoj server from Emacs using khoj.el 2023-03-26 08:50:03 +07:00
Debanjum Singh Solanky
e281a498b4 Style Khoj search org buffer via elisp instead of in-buffer settings 2023-03-26 06:34:18 +07:00
Debanjum Singh Solanky
4f655d20ae Style Khoj chat directly via elisp instead of via in-buffer settings 2023-03-26 06:03:30 +07:00
Debanjum Singh Solanky
f6ff7b1beb Render foonote reference links as superscript for Khoj Chat on Emacs 2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky
67c850a4ac Add retry logic to OpenAI API queries to increase Chat tenacity
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
  tenacity package. This is officially suggested and used by other
  popular GPT based libraries
2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky
ff846f05c5 Clean-up khoj.el based on linting helpers and manual review 2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky
7e36f421f9 Truncate message logs to below max supported prompt size by model
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
  argument to generate_chatml_messages_with_context method
2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky
4725416fbd Use shortcut keybindings in buffer to ease sending messages to Khoj 2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky
508b2176b7 Update Chat API, Logs, Interfaces to store, use references as list
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
b08745b541 Keep chat messages at 1 empty line visible distance in khoj.el
- Clean redundant concat, format string
- Improve variable name to emojified sender
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
27217a330d Time chat API sub-components for performance analysis
Time and the search query extraction, search and response generation
components
2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky
5e9558d39d Stylize references shown as footnote links in chat messages
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
  - Add continuation suffix, ..., when reference definition truncated
2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky
cf28f104c7 Register separate timestamps for user query and response by Khoj Chat 2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky
93e2aff786 Add references as org footnotes instead of links 2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky
d78454d4ad Load Khoj Chat buffer before asking for query to provide context 2023-03-24 13:43:46 +07:00
Debanjum Singh Solanky
863933daaa Resolve build issues found by melpazoid 2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky
e9ca04af0d Require dash, org to run ERT tests for khoj.el 2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky
06df394d6c Style chat messages as org-mode entries in Emacs
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
  - Improves color coding for readability
  - Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky
364e6c11af Render chat history from API in chat buffer on first run
- Generalize the render-chat-response method to handle rendering
  history or chat response from chat API reponse

- Trigger rendering of khoj chat history if Khoj chat buffer not
  created for this session yet
2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky
36b52fdd0a Properly escape reference links before rendering
- Use org-insert-link method to improve link rendering robustness
  Previous simple mechanism to crete org-links would result in links
  escaping out of formating. Use a user-facing org-mode method to
  remove/reduce probability of this

- Replace newlines with space to render reference notes as links
2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky
72f63a6ef7 Add basic chat interface for Khoj on Emacs
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
  - [sender-name]: *[message]*
    - /[receive-date]/
- Add references as org links with context visible on hover,
  but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
  Use `-map-indexed' method from dash
2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky
e4d67694e1 Add search to method, variable names meant for khoj search in khoj.el
In preparation to introduce Khoj chat in Emacs
2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky
2f6284872d Mention Khoj needs Python version 3.10 or lower in docs 2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky
601ff2541b Revert to using GPT to extract search queries from users message
- Reasons:
  - GPT can extract date aware search queries with date filters
    better than ChatGPT given the same prompt.
  - Need quality more than cost savings for now.
  - Need to figure ways to improve prompt for ChatGPT before using it
2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky
e28526bbc9 Extract search queries from users message using ChatGPT as Search Actor
- Reasons
  - ChatGPT should be better at following instructions than GPT
  - At 1/10th the cost, it's much cheaper than using older GPT models
2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky
939d7731da Fix-up Search Actor GPT's response for decoding it as valid JSON 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f63fd0995e Pass more search results as context to Chat Actor to improve inference 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
10836dedee Search should return user message if GPT response is not valid JSON
Previously would throw if GPT response is not valid JSON. Better to
return original message to use for search instead
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
08f5fb315f Add answers to context for Search Actor to generate relevant queries
Update Search Actor prompt with answers, more precise primer and
two more examples for context

Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
45cb510421 Loosen search results score thresold used by chat for more context 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
d871e04a81 Use past user messages, inferred questions as context to extract questions
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
  extract complete questions
- This should improve search results quality

- Example Expected Inferred Questions from User Message using History:
  1. "What is the name of Arun's daughter?"
    => "What is the name of Arun's daughter"
  2. "Where does she study?" =>
    => "Where does Arun's daughter study?" OR
    => "Where does Arun's daughter, Reena study?"
2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky
1a5d1130f4 Generate search queries from message to answer users chat questions
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
   E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
   Answer time range based questions
   Limit search to specified timeframe in question using date filter
   E.g "What national parks did I visit last year?" adds
   dt>="2022-01-01" dt<"2023-01-01" to Khoj search

Note: Temperature set to 0. Message to search queries should be deterministic
2023-03-18 16:28:51 -06:00
Debanjum
e75e13d788
Create Tests to Measure Chat Quality, Capabilities
Create Rubric to Test Chat Quality and Capabilities

### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities

### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
   - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
   - Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
   - Measure quality at 2 separate layers
     - **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
     - **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message.  This is what the `/api/chat` API exposes.
   - Mark desired but not currently available capabilities as expected to fail <br />
     This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky
7526a50dd4 Extract conversation processor utility funcs from gpt.py into utils.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
24ddebf3ce Make converse prompt more precise. Fix default arg vals in gpt methods
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
  answering
- Make GPT be more reliable in looking at past conversations for
  forming response
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
8609e3129e Fix, improve displaying chat messages, sources by Khoj in web interface
Pretty pretty json in conversation logs
2023-03-14 11:24:47 -06:00
Debanjum
6c0e82b2d6
Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
  - Previously was asking GPT to summarize the notes
  - Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
  - Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
  - Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now

## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
  - Allows Khoj to say don't know if it can't find answer to query from notes
  - Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky
cccd225247 Deduplicate and simplify logic to render chat message with reference 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
b9caad458e Type score_threshold with union, not |, to support python <3.10 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
a71f168273 Move the chat API out of beta. Save chat sessions at 15min intervals 2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky
8bb8824d0c Bump khoj versions in obsidian, emacs files 2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky
e16d0b6d7e Open references notes used for chat on mobile too (by clicking)
Requires clicking the reference as hover doesn't work on mobile
2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky
c3c7b8a951 Make Khoj chat a separate Progressive Web App (PWA) for easier access 2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky
3838f9d8e3 Remove explicitly asking GPT to say I don't know in prompt for now
GPT still mostly says I don't know when answer not in notes or chats

But with this its more inclined to answer general questions not in
chats or notes while informing user that the information is not from
existing chats or notes
2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky
f7b8cdd02e Log prompts being passed to GPT for debugging 2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky
2739a492b4 Log message metadata along with Khoj message instead of user message
References should be attached to khoj chat messsage rather than the
users message in the chat interface
2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
87d1e1341d Show reference notes used as response context in chat interface 2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
280061e1fa Do not deduplicate search results used for chat context
- Chat uses compiled form of search results, not the raw entries to
  provide context for chat. The compiled snipped search results
  themselves are unique and using multiple of them for context from
  the same raw note is fine if they cross the score and rank thresholds

  This should improve the context provided for chat

- Also apply score_threshold, no deduplication to the answers API
2023-03-06 23:51:31 -06:00