Commit graph

830 commits

Author SHA1 Message Date
Debanjum Singh Solanky
f6ff7b1beb Render foonote reference links as superscript for Khoj Chat on Emacs 2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky
67c850a4ac Add retry logic to OpenAI API queries to increase Chat tenacity
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
  tenacity package. This is officially suggested and used by other
  popular GPT based libraries
2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky
ff846f05c5 Clean-up khoj.el based on linting helpers and manual review 2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky
7e36f421f9 Truncate message logs to below max supported prompt size by model
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
  argument to generate_chatml_messages_with_context method
2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky
4725416fbd Use shortcut keybindings in buffer to ease sending messages to Khoj 2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky
508b2176b7 Update Chat API, Logs, Interfaces to store, use references as list
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
b08745b541 Keep chat messages at 1 empty line visible distance in khoj.el
- Clean redundant concat, format string
- Improve variable name to emojified sender
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
27217a330d Time chat API sub-components for performance analysis
Time and the search query extraction, search and response generation
components
2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky
5e9558d39d Stylize references shown as footnote links in chat messages
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
  - Add continuation suffix, ..., when reference definition truncated
2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky
cf28f104c7 Register separate timestamps for user query and response by Khoj Chat 2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky
93e2aff786 Add references as org footnotes instead of links 2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky
d78454d4ad Load Khoj Chat buffer before asking for query to provide context 2023-03-24 13:43:46 +07:00
Debanjum Singh Solanky
863933daaa Resolve build issues found by melpazoid 2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky
e9ca04af0d Require dash, org to run ERT tests for khoj.el 2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky
06df394d6c Style chat messages as org-mode entries in Emacs
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
  - Improves color coding for readability
  - Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky
364e6c11af Render chat history from API in chat buffer on first run
- Generalize the render-chat-response method to handle rendering
  history or chat response from chat API reponse

- Trigger rendering of khoj chat history if Khoj chat buffer not
  created for this session yet
2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky
36b52fdd0a Properly escape reference links before rendering
- Use org-insert-link method to improve link rendering robustness
  Previous simple mechanism to crete org-links would result in links
  escaping out of formating. Use a user-facing org-mode method to
  remove/reduce probability of this

- Replace newlines with space to render reference notes as links
2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky
72f63a6ef7 Add basic chat interface for Khoj on Emacs
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
  - [sender-name]: *[message]*
    - /[receive-date]/
- Add references as org links with context visible on hover,
  but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
  Use `-map-indexed' method from dash
2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky
e4d67694e1 Add search to method, variable names meant for khoj search in khoj.el
In preparation to introduce Khoj chat in Emacs
2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky
2f6284872d Mention Khoj needs Python version 3.10 or lower in docs 2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky
601ff2541b Revert to using GPT to extract search queries from users message
- Reasons:
  - GPT can extract date aware search queries with date filters
    better than ChatGPT given the same prompt.
  - Need quality more than cost savings for now.
  - Need to figure ways to improve prompt for ChatGPT before using it
2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky
e28526bbc9 Extract search queries from users message using ChatGPT as Search Actor
- Reasons
  - ChatGPT should be better at following instructions than GPT
  - At 1/10th the cost, it's much cheaper than using older GPT models
2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky
939d7731da Fix-up Search Actor GPT's response for decoding it as valid JSON 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f63fd0995e Pass more search results as context to Chat Actor to improve inference 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
10836dedee Search should return user message if GPT response is not valid JSON
Previously would throw if GPT response is not valid JSON. Better to
return original message to use for search instead
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
08f5fb315f Add answers to context for Search Actor to generate relevant queries
Update Search Actor prompt with answers, more precise primer and
two more examples for context

Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
45cb510421 Loosen search results score thresold used by chat for more context 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
d871e04a81 Use past user messages, inferred questions as context to extract questions
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
  extract complete questions
- This should improve search results quality

- Example Expected Inferred Questions from User Message using History:
  1. "What is the name of Arun's daughter?"
    => "What is the name of Arun's daughter"
  2. "Where does she study?" =>
    => "Where does Arun's daughter study?" OR
    => "Where does Arun's daughter, Reena study?"
2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky
1a5d1130f4 Generate search queries from message to answer users chat questions
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
   E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
   Answer time range based questions
   Limit search to specified timeframe in question using date filter
   E.g "What national parks did I visit last year?" adds
   dt>="2022-01-01" dt<"2023-01-01" to Khoj search

Note: Temperature set to 0. Message to search queries should be deterministic
2023-03-18 16:28:51 -06:00
Debanjum
e75e13d788
Create Tests to Measure Chat Quality, Capabilities
Create Rubric to Test Chat Quality and Capabilities

### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities

### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
   - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
   - Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
   - Measure quality at 2 separate layers
     - **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
     - **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message.  This is what the `/api/chat` API exposes.
   - Mark desired but not currently available capabilities as expected to fail <br />
     This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky
7526a50dd4 Extract conversation processor utility funcs from gpt.py into utils.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
24ddebf3ce Make converse prompt more precise. Fix default arg vals in gpt methods
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
  answering
- Make GPT be more reliable in looking at past conversations for
  forming response
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
8609e3129e Fix, improve displaying chat messages, sources by Khoj in web interface
Pretty pretty json in conversation logs
2023-03-14 11:24:47 -06:00
Debanjum
6c0e82b2d6
Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
  - Previously was asking GPT to summarize the notes
  - Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
  - Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
  - Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now

## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
  - Allows Khoj to say don't know if it can't find answer to query from notes
  - Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky
cccd225247 Deduplicate and simplify logic to render chat message with reference 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
b9caad458e Type score_threshold with union, not |, to support python <3.10 2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky
a71f168273 Move the chat API out of beta. Save chat sessions at 15min intervals 2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky
8bb8824d0c Bump khoj versions in obsidian, emacs files 2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky
e16d0b6d7e Open references notes used for chat on mobile too (by clicking)
Requires clicking the reference as hover doesn't work on mobile
2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky
c3c7b8a951 Make Khoj chat a separate Progressive Web App (PWA) for easier access 2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky
3838f9d8e3 Remove explicitly asking GPT to say I don't know in prompt for now
GPT still mostly says I don't know when answer not in notes or chats

But with this its more inclined to answer general questions not in
chats or notes while informing user that the information is not from
existing chats or notes
2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky
f7b8cdd02e Log prompts being passed to GPT for debugging 2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky
2739a492b4 Log message metadata along with Khoj message instead of user message
References should be attached to khoj chat messsage rather than the
users message in the chat interface
2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
87d1e1341d Show reference notes used as response context in chat interface 2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky
280061e1fa Do not deduplicate search results used for chat context
- Chat uses compiled form of search results, not the raw entries to
  provide context for chat. The compiled snipped search results
  themselves are unique and using multiple of them for context from
  the same raw note is fine if they cross the score and rank thresholds

  This should improve the context provided for chat

- Also apply score_threshold, no deduplication to the answers API
2023-03-06 23:51:31 -06:00
Debanjum Singh Solanky
672f61529e Make getting deduped search results configurable via Search API 2023-03-06 23:48:46 -06:00
Debanjum Singh Solanky
4fb628975c Fix jumping to note from Khoj Obsidian search modal result on Windows
- Issue
  The file path separator by khoj server and the Obsidian vault were
  different on Windows
- Fix
  Normalize file path to use forward slash(/) to find the matching
  note file in the Obsidian vault for jump to it

Resolves #177
2023-03-05 21:07:54 -06:00
Debanjum Singh Solanky
b6cdc5c7cb Do not expose answer API as a chat type in chat web interface or API
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat

For now it is only exposed via API. Later it will be expose in the
interfaces as well

Remove ability to select different chat types from the chat web
interface as there is only a single chat type

Stop appending answers to the conversation logs
2023-03-05 18:21:59 -06:00
Debanjum Singh Solanky
7f994274bb Support multi-turn conversations in chat mode
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context

- Make GPT provide reason when it can't answer the question. Gives
  user context to tune their questions
2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky
d73042426d Support filtering for results above threshold score in search API 2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky
45f461d175 Keep search results passed to GPT as context in conversation logs
This will be useful to
1. Show source references used to arrive at answer
2. Carry out multi-turn conversations
2023-03-05 16:00:19 -06:00
Debanjum Singh Solanky
7cad1c9428 Only use past chat message, not session summaries as chat context
Passing only chat messages for current active, and summaries
for past session isn't currently as useful
2023-03-05 16:00:18 -06:00
Debanjum Singh Solanky
ad1f1cf620 Improve and simplify Khoj Chat using ChatGPT
- Set context by either including last 2 chat messages from active
  session or past 2 conversation summaries from conversation logs

- Set personality in system message
- Place personality system message before last completed back & forth
  This may stop ChatGPT forgetting its personality as conversation progresses given:
  - The conditioning based on system role messages is light
  - If system message is too far back in conversation history, the
    model may forget its personality conditioning
  - If system message at end of conversation, the model can think its
    the start of a new conversation
  - Inserting the system message before last completed back & forth should
    prevent ChatGPT from assuming its the start of a new conversation
    while not losing personality conditioning from the system message

- Simplfy the Khoj Chat API to for now just answer from users notes
  instead of trying to infer other potential interaction types.
  - This is the default expected behavior from the feature anyway
  - Use the compiled text of the top 2 search results for context

- Benefits of using ChatGPT
  - Better model
  - 1/10th the price
  - No hand rolled prompt required to make GPT provide more chatty,
    assistant type responses
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
9d42b5d60d Use multiple compiled search results for more relevant context to GPT
Increase temperature to allow GPT to collect answer across multiple
notes
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
c3b624e351 Introduce improved answer API and prompt. Use by default in chat web interface
- Improve GPT prompt
  - Make GPT answer users query based on provided notes instead
    of summarizing the provided notes
  - Make GPT be truthful using prompt and reduced temperature
  - Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
  fit for default consumption yet
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
7184508784 Mention Python and Pip need to be installed in Main and Emacs Readme 2023-03-02 21:28:54 -06:00
Debanjum Singh Solanky
211e460398 Output date filter from cache log at debug level. Remove unused imports
Other logs not directly useful to user have already been converted
to debug log levels in 1ae4016. Just forgot to convert this log line too
2023-03-02 15:41:32 -06:00
Debanjum Singh Solanky
b6dbe4dd1d Do not try retrieve an unconfigured core content type in Config GUI
Previous behavior was resulting in a null reference error. As key for
the core content/search type was not present in current config

Fallback to using default config for unconfigured core content type
instead

See #165 for details
2023-03-02 11:09:31 -06:00
Debanjum Singh Solanky
1ae40163a9 Show user friendly information logs by default for context
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
  take time
- Convert all other info logs to be only shown in verbose mode
2023-03-01 16:39:07 -06:00
Debanjum Singh Solanky
fe03ba3dce Index intro text before headings in org files
- Text before headings was not being indexed due to buggy orgnode
  parsing logic
- Resolved indexing intro text from files with and without headings in
  them
- Ensure intro text node has heading set to all title lines collected
  from the file

Resolves #165
2023-03-01 12:11:33 -06:00
Debanjum Singh Solanky
7ad251b8ef Log and Continue on OSError while collating dates for date filters
Log to understand if error, date can be handled better
Mitigates #172
2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky
2bed4c3b50 Fix configuring search types & /config/types API when no plugin configured
- Test /config/types API when no plugin configured, only plugin configured
  and no content configured scenarios
- Do not throw null reference exception while configuring search types
  when no plugin configured
- Do not throw null reference exception on calling /config/types API
  when no plugin configured

Resolves bug introduced by #173
2023-03-01 01:23:37 -06:00
Debanjum Singh Solanky
8914dbd073 Fix creating GUI panels for unconfigured search, processor types
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`

Bug:
- Unconfigured processor and search_types are instantiated as None in
  self.current_config
- While creating the desktop GUI, these null configs are attempted to
  be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors

Fix:
Use default config to create their GUI elements for unconfigured
search and processor types

Resolves #167
2023-03-01 01:20:58 -06:00
Debanjum Singh Solanky
b09350c052 Fix to return only enabled content types via the new config/types API
- Previously was return all core content types even if they had not been
  setup
- Add test to validate only configured content types are returned by
  the api/config/types API endpoint
2023-02-28 22:08:26 -06:00
Debanjum Singh Solanky
b177adf3a7 Return value of search_type in /config/type API endpoint
- Remove need for interfaces to downcase content types returned by API
  before using the type in search and other API endpoint
- Fix to check for search_type.name in plugin keys instead of value
2023-02-28 21:49:26 -06:00
Debanjum Singh Solanky
88344f9ed2 Improve rendering search results of plugin content types on web interface
Render only the entry from plugin search response instead of raw json
Use the results-ledger styling for results-plugin styling
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
c2814fce58 Improve rendering search results of plugin content types in khoj.el
Render only the entry from plugin search response instead of raw json
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
f3f24387ec Use new config/types API to set enabled content types on web interface 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
1e43f1a12e Use new config/types API to set enabled content types in khoj.el menu 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
9d38eadd42 Return enabled content types via api/config/types API endpoint
Simplifies dynamically populating enabled content types for interfaces
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
68bd5d9ebc Configure API routes after set up search types while configuring server
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
d91c7e2761 Search for plugin content via the search API 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
47b58a2a4d Configure, use dynamically instantiated SearchType enum on app start
The SearchType is now dynamically populated with core and configured
plugin types

Use the new dynamic SearchType enum from state.py across codebase
2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
ab0d3a08e2 Index configured plugins on app start and via update API endpoint 2023-02-28 20:25:51 -06:00
Debanjum Singh Solanky
55a032e8c4 Add processor to index entries from jsonl files for plugins
- Read, merge entries from input jsonl files and filters
- Mark new, modified entries for update
2023-02-24 02:54:12 -06:00
Debanjum Singh Solanky
fcbbe8c759 Read content plugin configs from Khoj config YAML
Configure external text content plugins via the Khoj YAML
Reuse existing TextContentConfig definition for external text content plugins
2023-02-23 23:57:32 -06:00
Debanjum Singh Solanky
61b6ee2857 Use helper script to bump khoj pre-release versions 2023-02-17 20:31:51 -06:00
Debanjum Singh Solanky
053d6141f3 Ignore ts typing error, Fix SPDX license identifier in Obsidian plugin 2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky
36be3c4b8f Fix or ignore MyPy issues in PyQt desktop GUI code
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
  different errors in different environments (venv, system and pre-commit)
2023-02-17 16:13:05 -06:00
Debanjum Singh Solanky
051f0e3fb5 Add, configure and run pre-commit locally and in test workflow 2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky
5e83baab21 Use Black to format Khoj server code and tests 2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky
8b293edd7c Move mypy config into pyproject.toml. Ignore 2 remaining mypy issues 2023-02-16 03:33:08 -06:00
Debanjum Singh Solanky
c641eb4ad6 Improve rendering log and error stacktraces using the Rich package
- Use Rich to render uvicorn, fastAPI logs as well
  The previous CustomFormatter only worked on khoj logs
- Improve rendering stacktrace on errors using Rich
2023-02-15 16:19:32 -06:00
Debanjum Singh Solanky
bc7477ea3e Move Emacs, Obsidian plugin code out from under src/khoj directory
- What
  - The Emacs and Obsidian interfaces stay in their original
    directories under src/
  - src/khoj now only contains code meant for pypi packaging

- Benefits
  - This avoids having to update khoj MELPA, Obsidian plugin config as
    the Emacs, Obsidian code is under their original directories
  - It separates the code in src/khoj meant for python packaging from
    code for external interfaces like Emacs and Obsidian
2023-02-14 15:44:22 -06:00
Debanjum Singh Solanky
25a749ca1d Use the src/ layout to fix packaging Khoj for PyPi
- Why
  The khoj pypi packages should be installed in `khoj' directory.
  Previously it was being installed into `src' directory, which is a
  generic top level directory name that is discouraged from being used

- Changes
 - move src/* to src/khoj/*
 - update `setup.py' to `find_packages' in `src' instead of project root
 - rename imports to form `from khoj.*' in complete project
 - update `constants.web_directory' path to use `khoj' directory
 - rename root logger to `khoj' in `main.py'
 - fix image_search tests to use the newly rename `khoj' logger
 - update config, docs, workflows to reference new path `src/khoj'
2023-02-14 15:19:06 -06:00
Debanjum
84322b2a45
Demo using Search in Khoj Obsidian Plugin 2023-02-14 08:43:50 -08:00
Debanjum Singh Solanky
a4dcb20622 Add setting to toggle auto configuring of khoj backend from Obsidian
- By default the obsidian plugin automatically configures the khoj
  backend to index the current vault
- For more complex scenarios, users can manage their ~/.khoj/khoj.yml
  manually by toggling the auto-configure setting off in the khoj
  plugin settings

Resolves #156
2023-02-13 20:15:28 -06:00
Debanjum Singh Solanky
24aa696ef5 Indicate indexing active on Update button in Obsidian plugin settings
Use moon rotating through phases to indicate notes indexing in progress

Resolves #129
2023-02-13 19:28:19 -06:00
Debanjum Singh Solanky
11517ba8eb Encode jsonl data as utf8 for gzip write for consistent read/write encoding
Should help with issue #89
2023-02-12 17:33:23 -06:00
Debanjum Singh Solanky
3ec41c4d64 Wrap lines for org, markdown results in khoj search results buffer 2023-02-12 07:33:50 -06:00
Debanjum Singh Solanky
9a013ec48f Add more details to setup Khoj backend in Obsidian plugin readme 2023-02-12 07:31:13 -06:00
Jason Axelson
6d5930363a Fix obsidian plugins doc link
Also make it more obvious where the link is going, initially I thought
the link was to another official khoj documentation site.
2023-02-10 07:11:21 -10:00
Debanjum Singh Solanky
215235efd2 Bump khoj pre-release version 2023-02-08 20:24:36 -03:00
Debanjum Singh Solanky
2445664d40 Deprioritize searching for Music content over other text content 2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky
2e052913b6 Search in first configured content type when no search type set
Instead of searching through all configured content types but only
returning results of the last configured content type
2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky
a26ab31d20 Allow chat with markdown notes if no org-mode content configured 2023-02-07 02:41:31 -03:00
Debanjum Singh Solanky
fbb7747dcc Read Markdown file as utf8 instead of the default encoding used by OS
- Background
  1. Obsidian stores markdown notes as utf8[1]
  2. By default, the python `open' command uses the OS locale encoding[2]

  This was causing the `UnicodeDecodeError: <locale_encoding> codec can't decode byte' error

- Fix
  - Read markdown files as utf8
    The Obsidian plugin is the main use-case for markdown files in
    khoj currently and that stores md files as utf8.
    Do not assume utf8 for other content types like org-mode, beancount for now.
  - Fail if error in reading file as utf8, instead of ignoring errors.
    Would rather have user realize that their files are not going to
    get indexed correctly.

[1]: https://forum.obsidian.md/t/better-handle-md-files-not-stored-in-utf8-format/13524/3
[2]: https://docs.python.org/3/library/functions.html#open
2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky
66dca6cf33 Add Docs to Search across Languages, Uninstall Khoj to Readme
Add details and fixes to Obsidian, Main readme
based on feedback, confusion from the Obsidian plugin announcement
2023-02-06 21:04:50 -03:00
Debanjum Singh Solanky
cba9a6a703 Use List, Tuple, Set from typing to support Python 3.8 for khoj
Before Python 3.9, you can't directly use list, tuple, set etc for
type hinting

Resolves #130
2023-02-06 01:23:52 -03:00
Debanjum Singh Solanky
f26cee604d Update Khoj Plugin Install Instructions. Rename main Readme to README
Khoj plugin page from within Obsidian isn't recognized. Seems like it
needs an uppercase readme file only. So it doesn't show the Khoj
readme from within Obsidian itself.
2023-01-27 20:01:31 -03:00
Debanjum Singh Solanky
2e13e15625 Ensure markdown entries in khoj.el results separated by empty line
- Update khoj.el test to reflect updated rendering logic
- Move ledger render function before image rendered to group functions
  with similar logic closer
2023-01-26 19:13:02 -03:00
Debanjum Singh Solanky
85ae46f429 Use thread_last to make results rendering funcs more readable in khoj.el 2023-01-26 18:59:44 -03:00
Debanjum Singh Solanky
b415f87093 Split code in onChooseSuggestion method to make it more readable
Split find file, jump to file code to make onChooseSuggestion more readable
- Use find, instead of using return in forEach to get first match
- Move the jump to file+heading code out from forEach
2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
37063f6a38 Truncate query to 8k chars for find similar notes from obsidian plugin
Truncate current file data passed to khoj backend API via query string
below default query size supported by popular servers
2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
4456cf5c8f No need to use then or finally in async functions after an await 2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
4070be637c Pass app object from plugin instance to child objects and functions
Do not reference global app object from child objects and funcs
directly.

It is only available for debugging purposes and access to it maybe
dropped in the future.
2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
c203c6a3fd Use Sentence case for Find Similar Note command name in Khoj Obsidian 2023-01-26 18:26:24 -03:00
Debanjum Singh Solanky
e18124ef6f Add badge for tests and update project subtitle in khoj.el Readme 2023-01-23 20:52:03 -03:00
Debanjum Singh Solanky
86e808abfb Test get-current-text helpers for Find Similar feature in khoj.el 2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky
be6acda212 Create khoj.el tests. Test rendering results of each content types 2023-01-23 20:33:47 -03:00
Debanjum Singh Solanky
0d0bf3b5aa Simplify get-current-text functions for Find Similar in khoj.el
Use existing functions like `string-trim', `thing-at-point' and
remove unneeded code from the two functions
2023-01-23 19:15:52 -03:00
Debanjum Singh Solanky
07e9e4ecc3 Get current paragraph text when point at start of paragraph in khoj.el
Previously if cursor was at start of current paragraph, it would get
text for the current and next paragraph, instead of just the current one
2023-01-23 18:05:54 -03:00
Debanjum Singh Solanky
a0b03c8bb1 Get current entry text when point at heading for Find Similar in khoj.el
Previously if cursor was at heading of current entry, it would find entries
similar to the previous outline heading, instead of the current one
2023-01-23 10:01:25 -03:00
Debanjum Singh Solanky
013c7c10a4 Bump khoj pre-release version 2023-01-22 18:45:56 -03:00
Debanjum Singh Solanky
ad3c9b5f44 Bump khoj version to 0.2.5 in preparation for release 2023-01-22 18:18:21 -03:00
Debanjum Singh Solanky
9ed056c7e7 Use consistent indentation in Khoj Emacs Readme 2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky
0980c6e87f Update Emacs Usage section in Readme. Add find-similar, menu usage 2023-01-22 18:04:12 -03:00
Debanjum Singh Solanky
6908b6eed3 Truncate image queries below max tokens length supported by ML model
This would previously return the infamous tensor size mismatch error
Verify this error is not raised since adding the query truncation logic
2023-01-21 14:11:00 -03:00
Debanjum Singh Solanky
3d9ed91e42 Search by image at path only if query of form "file:/path/to/image"
Previously no query syntax helpers, like the "file:" prefix, were used
before checking if query contains file path.

This made query to image search brittle to misinterpretation and
pointless checking

Add test to verify search by image at file works as expected
2023-01-21 14:06:56 -03:00
Debanjum Singh Solanky
b7aa22a059 Change order of arg passed to query-api-and-render-results by importance 2023-01-20 22:13:24 -03:00
Debanjum Singh Solanky
936a88fa7e Find items of specified type similar to current text item at point
- Support querying with text surrounding point in any text buffer
  Previously could only find items similar to org entry at point

- Find similar items of specified content type indexed on khoj
  Previously only looked for similar org entries indexed on khoj

  Now uses the content-type configured in khoj transient menu to find
  items of the specified content type

- Details
  - Generalize the get-current-org-entry-text func to get text for any
    outline section
  - Replace leading whitespaces from query text as well
  - Create method to get current paragraph text from non-outline mode
    buffers
  - Update transient, find-similar funcs to pass, use content-type
    configured in khoj transient menu
  - Generalize query title creation logic to remove markdown headings
    prefix (#) apart from org heading prefix (*) as well
  - Update last used khoj content-type and results from the
    find-similar and update funcs for later reuse
  - Jump to top of results buffer after results rendered
2023-01-20 22:12:54 -03:00
Debanjum Singh Solanky
17aaadea1f Find notes similar to current org entry at point 2023-01-20 05:14:54 -03:00
Debanjum Singh Solanky
44bbc0a417 Add section separators to khoj.el for easier code traversal 2023-01-19 23:36:54 -03:00
Debanjum Singh Solanky
48ad3c535e Use default content types if fail to call backend on khoj.el load
Do not want khoj.el to fail on init/load if khoj backend not running
2023-01-19 20:13:49 -03:00
Debanjum Singh Solanky
9f0bd0a361 Add Github workflow for khoj.el build and quality checks
Add khoj.el build badge to khoj.el Readme
2023-01-19 20:13:19 -03:00
Debanjum Singh Solanky
0dd1cba272 Rename configuration sections in khoj.el transient menu 2023-01-19 03:03:08 -03:00
Debanjum Singh Solanky
5d0f369186 Add ability to quit khoj transient with standard q keybinding 2023-01-19 02:47:07 -03:00
Debanjum Singh Solanky
87c7cf4272 Use single khoj func as entrypoint. Group khoj.el code into sections
- Give more relevant, specific name to khoj suffix commands
- Remove `khoj-simple'. Have single `khoj' function for entrypoint
2023-01-19 02:38:19 -03:00
Debanjum Singh Solanky
9d64a009fd Allow updating khoj content index from within khoj.el
- Split transient config menu by type
2023-01-18 23:07:59 -03:00
Debanjum Singh Solanky
a8d0c7d905 Rename search type to more apt content type in khoj.el 2023-01-18 22:13:49 -03:00
Debanjum Singh Solanky
00daea16df Allow setting default-search-type to image. Make docstrings compact 2023-01-18 22:01:17 -03:00
Debanjum Singh Solanky
216b17cfd0 Dynamically populate content type choices when khoj transient invoked 2023-01-18 22:00:56 -03:00
Debanjum Singh Solanky
5f446b1440 Convert main khoj.el entrypoint into transient menu for richer configuration 2023-01-18 21:50:07 -03:00
Debanjum Singh Solanky
5c07dcd219 Fix, update Obsidian Readme. Add Find Similar Notes to Implementation section 2023-01-18 00:22:26 -03:00
Debanjum
b7fc344be1
Search for Similar Notes from Obsidian Plugin
Enable searching for notes similar to the current note being viewed

## Main Changes
- 39a18e2 Extend search modal to search for similar notes
  - Hide input field on init, Trigger search on opening modal when in similar notes mode
  - Set input to contents of current markdown file and get notes similar to it
  - Re-rank, by default, when searching for similar notes
  - Filter out current note from similar note search results
- 0bed410 Only show `Find Similar Note' command in Editor
2023-01-18 00:10:10 -03:00
Debanjum Singh Solanky
6119d0a69e Add usage of "Find Similar Notes" command to the Khoj Obsidian Readme 2023-01-18 00:03:13 -03:00
Debanjum Singh Solanky
657e455785 Remove unused `onunload' method in main.ts of khoj obsidian plugin 2023-01-17 23:46:38 -03:00
Debanjum Singh Solanky
0bed410712 Limit Find Similar Note command to be triggered from Editor
Fixup indentation and comments
2023-01-17 19:34:48 -03:00
Debanjum Singh Solanky
39a18e2080 Add ability to search for similar notes in Khoj Obsidian
- Hide input field on init, Trigger search on opening modal in similar notes mode
- Set input to current markdown file and get similar notes to it
- Enable rerank when searching for similar notes
- Filter out current note from similar note search results
2023-01-17 19:07:18 -03:00
Debanjum Singh Solanky
ffaef92476 Encode query string before passing as query param to search API 2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky
d5a7cc5b0f Compact code to map results from search API into SearchResult objects
Make code compact for readability
Remove unneeded temporary variables and return statements
2023-01-17 18:04:11 -03:00
Debanjum Singh Solanky
8ab7a26bde Update Khoj on Obsidian screenshots in Main and Plugin Readme
- Screenshot querying "Setup Editor" on test vault with Khoj Readmes
- New features showcase:
  - information keybindings, rerank keybinding at bottom of modal
  - fixed top level headings in search results
  - search results snipped if greater than N words
2023-01-17 13:58:50 -03:00
Debanjum Singh Solanky
7b4f78776c Fix extracting Markdown Entries with Top Level Headings
- Previously top level headings would have get stripped of the
  space between heading text and the prefix # symbols. That is,
  `# Top Level Heading' would get converted to `#Top Level Heading'
- This would mess up their rendering as a heading in search results

- Add unit tests to text_to_jsonl processors to prevent regression
2023-01-17 13:06:28 -03:00
Debanjum Singh Solanky
1a296518c5 Limit total words for each Search Result rendered in search modal
Provides a more consistent rendering of results in modal.
Makes it easier to see more results in modal.
To see complete entry, user can always just jump to entry from modal
2023-01-17 13:06:14 -03:00
Debanjum Singh Solanky
e7b89f7fd0 Return compiled entry in additional details of /api/search response
This can be used to highlight portion of raw entry to highlight and
for passing to summarizer to stay with max_tokens limit supported by
GPT models
2023-01-16 22:56:06 -03:00
Debanjum Singh Solanky
7071d081e9 Increase max_tokens returned by GPT summarizer. Remove default params 2023-01-16 22:55:36 -03:00
Debanjum Singh Solanky
3d9cdadbbb Add codebase visualization of Khoj Obsidian to Khoj Obsidian Readme 2023-01-15 14:09:21 -03:00
Debanjum Singh Solanky
d02ba325aa Handle empty chat history returned by API to chat.html on web interface 2023-01-15 13:51:16 -03:00
Debanjum
3f2ea039a7
Add Chat page to the Khoj Web Interface
### Overview
- Provide a chat interface to engage with and inquire your notes
- Simplify interacting with the beta `chat` and `summarize` APIs

### Use
- Open `<khoj-url>/chat`, by default at http://localhost:8000/chat?type=summarize
- Type your queries, see summarized response by Khoj from your notes

**Note**:
 - **You will need to add an API key from OpenAI to your khoj.yml**
 - **Your query and top note from search result will be sent to OpenAI for processing**

## Details
- 177756b Show chat history on loading chat page on web interface
- d8ee0f0 Save chat history to disk for persistence, seeing chat logs
- 5294693 Style chat messages as speech bubbles
- d170747 Add khoj web interface and chat styling to new chat page on khoj web
- de6c146 Implement functional, unstyled chat page for khoj web interface
2023-01-13 23:02:19 -03:00
Debanjum Singh Solanky
16d4560ff8 Comment css styling of chat page for later reference 2023-01-13 22:40:01 -03:00
Debanjum Singh Solanky
cfef346d03 Do not update query field to ever chat message
It doesn't work as well with chat, unlike for search page
Use more appropriate thinking face emoji for you instead of surprise face
2023-01-13 22:24:26 -03:00
Debanjum Singh Solanky
177756be7e Fetch chat history from backend and render it on chat page load 2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
330febaa1a Update conversation logs from /beta/summary API endpoint too 2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
cb6f0b53c9 Make user_message_metadata arg to message_to_log in gpt.py optional
- Use a default user_message_metadata if arg not set
- Update conversation to use `by' as `you' and `khoj'
2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
cc2456e411 Update /beta/chat API to return chat history if no query param passed 2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
d8ee0f0e9a Use scheduler to save chat history to disk every 5 minutes
- The previous mechanism to trigger saving on shutdown event did not work
- Use scheduler to persist chat sessions to disk at a 5 minute interval
  - This improve time granularity, fixed interval of saving chat logs
  - It may lose ~5 minutes of chat history until mechanism to also
    write on shutdown found/resolved
- Create conversation directory if it doesn't exist before attempting write
- Reset chat_session after writing it to disk
2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
5294693e97 Style message as speech bubbles on chat page of web interface
- Wrap messages into speech bubbles
  - Color messages by khoj blue, sender grey
  - Add those standard protrusions to the speech bubbles for fun

- Align bubbles left or right based on sender
  - messages by khoj are left aligned, message by self are right aligned

- Put message metadata like sender and time under speech bubble
  - use data-* attribute and ::after css pseudo-selector for this

- Update renderMessage func to accept time param, remove unused type_ param
2023-01-13 22:01:57 -03:00
Debanjum Singh Solanky
7723d656dc Do not force GPT to summarize note using past tense
Not all notes are in the past. Notes can be about stuff in the future.
Casting them to past tense gives the impression that they've already
happened / been done.
2023-01-13 13:10:35 -03:00
Debanjum Singh Solanky
2842e3a035 Automatically scroll to bottom of chat body on new messages 2023-01-13 13:09:51 -03:00
Debanjum Singh Solanky
34014635d0 Improve colors, fix contrast for accessability on web interface
- Changes
  - Use blue color for khoj heading font
    - This fixes the title color issue

  - Update background to lighter shade
    - This fixes the body text color issue

  - Update colors for todo, done, miscellaneous todo state, tag color
    - This does not fix the color contrast issue but seems like an acceptable solution
    - Using white text rather than black text on blue background
      better even though the black text on blue background passes the
      WCAG acceptable contrast score
    - For details see blog post:
      https://uxmovement.com/buttons/the-myths-of-color-contrast-accessibility/

  - Add border to tags to give them tag pills look and differntiate
    from todo states

  - Buttons and inputs
    - Change background color of input fields like type dropdown,
      update button and results count counter, to match background
      color of page
    - Add shadow on hover over button, dropdowns

Resolves #111
2023-01-12 21:59:50 -03:00
Debanjum Singh Solanky
d170747ec2 Add khoj web interface & chat styling to new chat page on khoj web
- Ensure message input box sticks to bottom of screen
- Ensure chat logs div is scrollable when logs become longer than screen
  Do not make the whole page scroll, just the chat logs body div
2023-01-12 21:58:46 -03:00
Debanjum Singh Solanky
de6c146290 Implement functional, unstyled chat page for khoj web interface
Expose it at /chat URL
2023-01-12 21:53:25 -03:00
Debanjum Singh Solanky
e6793816f9 Upgrade Khoj.el Readme. Add TOC, Screenshot, Features Sections
- Update Query filter details
2023-01-12 02:14:02 -03:00
Debanjum Singh Solanky
26f791e9ad Update Obsidian Plugin Readme. Add Khoj icon to Khoj Modal Placeholder text
- Fold Query Filter, Demo Description
- Add Limitations to Readme
- Add *Update* index bullet to Troubleshooting Options
2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky
3e63af5c94 Constrain grid rows to fix layout of Khoj web interface on Chrome 2023-01-12 01:48:52 -03:00
Debanjum Singh Solanky
50c797962c Jump to Search Result from Khoj Modal even on Obsidian Android
Uses longest file path match to find markdown file in vault
corresponding to file of search result returned by Khoj

Allow jumping to search result from khoj plugin modal on Android too
2023-01-11 19:44:11 -03:00
Debanjum Singh Solanky
51ea6d9c9b Do not force index update when configure backend on plugin load
- Backend can handle incremental updates
- Avoid khoj usability delay by avoiding recomputed everytime vault opened
2023-01-11 17:17:08 -03:00
Debanjum Singh Solanky
5996d47d7c Trigger input event to Get, Render Reranked results from Khoj backend
Previous mechanism of manually triggering getSuggestions,
renderSuggestions flow was corrupting traversing and opening
reranked search results in KhojModal

Emulate event that would anyway trigger the get & render of results in
modal. This lets obsidian core handle the flow without digging too
deep into obsidian cores handling of the flow. Lowers the chance of
breakage
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
1c813a6884 Convert results count setting to slider in plugin settings pane 2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
4e1abd1b72 Disable update button while indexing vault in plugin settings 2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
513c86c6a1 Set index file paths relative to current or default path on khoj backend
We need the index file paths to make sense on the khoj backend server

Having path of index on backend relative to current vault directory
on frontend ignores the fact that the frontend maybe on a different
machine than the khoj backend server

Using unique index name per vault allows switching vaults without
overwriting indices of other vaults created on khoj backend when khoj
obsidian plugin is loaded on opening a different vault
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
4407e23c19 Only index current vault on Khoj. Remove plugin setting to configure it
- Overview
  Limits using Khoj with a single vault at a time. This is
  automatically configured to the most recently opened vault.

  Once directory filters are supported on backend, the plugin will be
  updated to index multiple vault but search only current vault from
  current vaults khoj obsidian plugin

- Code Details
 - Remove setting to configure Vault directory from Khoj Obsidian plugin
 - Automatically configure Khoj to index only current Vault.
 - Overwrites any previous vaults that were intended to be indexed by
   Khoj backend
 - Force update of index after configuring vault

- Why
  It's not helpful for now and can lead to more problems, confusion.
  Once directory filters
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
86a1e43605 Return HTTP Exception on /api/update API call failure
- Previously the backend was just throwing backend error.
  The frontend calling the /update API wasn't getting notified
- Now the frontend can react appropriately and make the issue
  visible to the user
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
5af2b68e2b Update plugin notifications for errors and success
- Only show notification on plugin load and failure.
- In settings page, set current backend status at top of pane instead
  of showing notification
  Notices bubbles cluttered the UI while typing updates to settings
- Show notification once index updated via settings pane button click
  There was no notification on index updated, which usually takes time
  on the backend
2023-01-11 16:39:23 -03:00
Debanjum Singh Solanky
853192932a setCTA on Khoj Obsidian plugin button. Minor cleanup of space, tabs 2023-01-10 23:36:02 -03:00
Debanjum Singh Solanky
da49ea272c Add placeholder text to modal in Khoj Obsidian plugin 2023-01-10 22:50:11 -03:00
Debanjum Singh Solanky
580f4aca23 Add hints to Modal for available Keybindings 2023-01-10 22:03:47 -03:00
Debanjum Singh Solanky
b52cd85c76 Allow Reranking Results using Keybinding from Khoj Search Modal 2023-01-10 21:59:38 -03:00
Debanjum Singh Solanky
7991ab7a86 Add button in Obsidian plugin settings to force re-indexing your vault 2023-01-10 19:49:12 -03:00
Debanjum Singh Solanky
f046a95f3d Track connectedToBackend as a setting. Use it across obsidian plugin
- Display warning at top of khoj obsidian plugin settings
- Make search command available only if connected to backend
- Show warning notice on clicking khoj search ribbon button

- Call saveData after configureKhojBackend to ensure
  connnectedToBackend setting saved after being (potentially) updated
  in configureKhojBackend function
2023-01-10 17:28:47 -03:00
Debanjum Singh Solanky
768e874185 Load obsidian plugin even if fail to connect to backend but show warning
- Previously the plugin would not load if cannot connect to Khoj backend
  - Silently failing to load with no reason provided is not helpful
- Load plugin to allow user to fix the Khoj URL in their plugin setting
- Show reason for khoj plugin not working. More helpful than failing silently
2023-01-10 17:20:02 -03:00
Debanjum Singh Solanky
aa22d83172 Create and use a context manager to time code
Use the timer context manager in all places where code was being timed

- Benefits
  - Deduplicate timing code scattered across codebase.
  - Provides single place to manage perf timing code
  - Use consistent timing log patterns
2023-01-09 19:48:16 -03:00
Debanjum Singh Solanky
93f39dbd43 Add typing to text_search. Reformat code to set existing_embedding 2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
db7483329c Only import type hint packages for type checking. Avoids circular imports
Use annotations from the __future__ package to avoid having to quote
type hints. This import will not be required after Python 3.11
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
e5254a8e56 Create BaseEncoder class. Make OpenAI encoder its child. Use for typing
- Set type of all bi_encoders to BaseEncoder

- Make load_model return type Union of CrossEncoder and BaseEncoder
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
cf7400759b Remove unused render_results method from text and image search
It's a relic from when khoj was being used as a python module
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
afcfc3cd62 Split text_search.query logic into separate methods for modularity
The query method had become too big.

Extract out filter, score, sort and deduplicate logic used by
text_search.query into separate methods.

This should improve readabilty of code.
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
8dc6ee8b6c Pass `model' arg to extract_search_type method from beta search API
Issue caught by mypy
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
8498903641 Fix, add typing to Filter and TextSearchModel classes
- Changes
  - Fix method signatures of BaseFilter subclasses.
    Else typing information isn't translating to them
  - Explicitly pass `entries: list[Entry]' as arg to `load' method
  - Fix type of `raw_entries' arg to `apply' method
    to list[Entry] from list[str]
  - Rename `raw_entries' arg to `apply' method to `entries'
  - Fix `raw_query' arg used in `apply' method of subclasses to `query'
  - Set type of entries, corpus_embeddings in TextSearchModel

- Verification
  Ran `mypy --config-file .mypy.ini src' to verify typing
2023-01-09 19:47:27 -03:00
Debanjum Singh Solanky
eace7c6215 Use torch.tensor as torch.Tensor cannot create tensor on MPS device
- `torch.Tensor' is apparently a legacy tensor constructor
- Using that to create tensor on MPS devices throws error:
  RuntimeError: legacy constructor expects device type: cpu but device type: mps was passed
- `torch.tensor' can handle creating tensors on Mac GPU (MPS) fine
2023-01-09 19:47:19 -03:00
Debanjum Singh Solanky
9def3f8c6f Add exception handling to beta APIs, in case OpenAI API call fails 2023-01-09 01:27:06 -03:00
Debanjum Singh Solanky
7b164de021 Add beta API to summarize top search result using an OpenAI model
This is unlike the more general chat API that combines summarization
of top search result and conversing with the OpenAI model

This should give faster summary results. As no intent categorization
API call required
2023-01-09 01:25:59 -03:00
Debanjum Singh Solanky
d36da46f7b Truncate prompt to not exceed OpenAI prompt limit
Truncate prompt containing the top retrieved entry to 500 words to
avoid triggering the max_token limit error
2023-01-09 00:51:46 -03:00
Debanjum Singh Solanky
237123d18c Fix tests for the conversation processor
- Use latest davinci model for tests
- Wrap prompt in triple quotes to improve legibilty
- `understand' method returns dictionary instead of string. Fix its test
- Fix prompt for new model to pass `chat_with_history' test
2023-01-09 00:22:26 -03:00
Debanjum Singh Solanky
918af5e6f8 Make OpenAI conversation model configurable via khoj.yml
- Default to using `text-davinci-003' if conversation model not
  explicitly configured by user. Stop using the older `davinci' and
  `davinci-instruct' models

- Use `model' instead of `engine' as parameter.
  Usage of `engine' parameter in OpenAI API is deprecated
2023-01-09 00:17:51 -03:00
Debanjum Singh Solanky
74e779f8d0 Fix /beta/chat API to use Entry class instead of old dictionary pattern
Search returns response of type SearchResponse instead of a dict now
2023-01-08 15:28:26 -03:00
Debanjum Singh Solanky
f2436039a0 Improve readability of GPT prompt strings in conversation processor 2023-01-08 15:27:41 -03:00
Debanjum Singh Solanky
6119005838 Improve comments, exceptions, typing and init of OpenAI model code 2023-01-08 00:36:18 -03:00
Debanjum Singh Solanky
c0ae8eee99 Allow using OpenAI models for search in Khoj
- Init processor before search to instantiate `openai_api_key'
  from `khoj.yml'. The key is used to configure search with openai models

- To use OpenAI models for search in Khoj
  - Set `encoder' to name of an OpenAI model. E.g text-embedding-ada-002
  - Set `encoder-type' in `khoj.yml' to `src.utils.models.OpenAI'
  - Set `model-directory' to `null', as online model cannot be stored on disk
2023-01-07 23:13:56 -03:00
Debanjum Singh Solanky
826f9dc054 Drop long words from compiled entries to be within max token limit of models
Long words (>500 characters) provide less useful context to models.

Dropping very long words allow models to create better embeddings by
passing more of the useful context from the entry to the model
2023-01-07 23:13:56 -03:00