Commit graph

1187 commits

Author SHA1 Message Date
Saba
5d5ebcbf7c Rename truncate messages method and update unit tests to simplify assertion logic 2023-06-06 23:25:43 -07:00
Saba
7119ed0849 Run pre-commit script 2023-06-05 19:29:23 -07:00
Saba
948ba6ddca Remove unused logger 2023-06-05 19:01:03 -07:00
Saba
6212d7c2e8 Remove debug line 2023-06-05 19:00:25 -07:00
Saba
f65ff9815d Move message truncation logic into a separate function. Add unit tests with factory boy. 2023-06-05 18:58:29 -07:00
Saba
5f4223efb4 Increase timeout to OpenAI call 2023-06-04 20:49:47 -07:00
Saba
0e63a90377 Fix the mechanism to retrieve the message content 2023-06-04 20:25:37 -07:00
Saba
f0efe0177e Pass truncated message as string in ChatMessage when exceeding max prompt size 2023-06-04 19:33:46 -07:00
Debanjum
f6ceb22373
Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 15:05:34 +05:30
Saba
068ee0ac5e Swap elif with else, as usage of this method does not use openai_api_key 2023-06-04 02:25:08 -07:00
Saba
6508379d7b Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky
7af8a56434 Remove filename from reference before rendering references in khoj.el
Fixes bug where actual reference heading in next line jumping out of
references footnote section
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
ec280067ef Do not retrieve relevant notes when having a general chat with Khoj
- This improves latency of @general chat by avoiding unnecessary
  compute
- It also avoids passing references in API response when they haven't
  been used to generate the chat response. So interfaces don't have to
  add logic to not render them unnecessarily
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
90439a8db1 Update Khoj subtitle to AI personal assistant for your digital brain 2023-06-02 10:42:44 +05:30
Debanjum
e022910f31
Search PDF files with Khoj. Integrate with LangChain
- **Introduce Khoj to LangChain**: 
    Call GPT with LangChain for Khoj Chat
- **Search (and Chat about) PDF files with Khoj**
  - Create PDF to JSONL Processor: Convert PDF content into standardized JSONL format
  - Expose PDF search type via Khoj server API
  - Enable querying PDF files via Obsidian, Emacs and Web interfaces
2023-06-02 10:20:26 +05:30
Debanjum Singh Solanky
e9ed7a19fd Update search prompt to extract PDF search type. Fix extract_question prompt 2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky
89fbfce20a Mention PDF are also supported in Khoj Readme 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
bbe3bf9733 Render PDF search results in Khoj Obsidian interface
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
  - Ensure combined results are sorted by score across both types
- Jump to PDF file when select it PDF search result from modal
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
e3892945d4 Render PDF search results in Khoj.el Emacs interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
85144006a1 Render PDF search results in khoj web interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
acd14a5e41 Wire up PDF to jsonl processor to Khoj server layer (API, config)
- Specify PDF content to index via khoj.yml
- Index PDF content on app start, reconfigure
- Expose PDF as a search type via API
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
d63194c3a9 Create tests for PDF to JSONL processor 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
286b500f66 Create PDF to JSONL processor using PyPDF and LangChain
Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts
throwing typing error for python 3.8, 3.9
2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky
1b3effd8e6 Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor 2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky
1cd9ecd449 Truncate last message if still over max supported prompt size by model 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
ed4d0f9076 Simplify argument names used in khoj openai completion functions
- Match argument names passed to khoj openai completion funcs with
  arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
703a7c89c0 Reduce retry count and request timeout for faster response or failure
- Fix bug where both LangChain and Khoj retry requests 6 times each.
  So a total of 12 requests at >1minute intervals for each chat
  response in case of OpenAI API being down

- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
  between retries way too much. This slowed down chat response times
  quite a bit when API was being flaky

- With these updates you'll know if call to chat API failed in under a
  minute
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
18081b3bc6 Use LangChain to call GPT over API 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
277d2f5c96 Do not add "Notes:" suffix to chat messages when no notes retrieved
This was causing spurious "Notes:" suffix being added to Khoj Chat in
response
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
334be4e600 Use LangChain to call OpenAI for Khoj Chat
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
  using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
  creating agents and managing their state
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
efcf7d1508 Extract prompts as LangChain Prompt Templates into a separate module
Improves code modularity, cleanliness. Reduces bloat in GPT.py module
2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky
b484953bb3 Import app state correctly to generate embeddings with OpenAI model
Resolves #216
2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
9cfaaf0941 Update docs to configure khoj.yml for using OpenAI model for embeddings 2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
a0d0dbaca7 Fix link to Khoj Obsidian Demo video in Readmes 2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky
ebb5d7b8e5 Release Khoj version 0.6.2 2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky
d02415edcc Write generated server id to env file when env file does not contain it 2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky
dc0626856e Put the telemetry db in a separate directory by default 2023-05-17 18:58:47 +05:30
Debanjum
dc495babb3
Add Telemetry to Understand Khoj Usage
### Objective: 
Use telemetry to better understand Khoj usage.
This will motivate and prioritize work for Khoj.

Specific questions:
- Number of active deployments of khoj server
- How regularly is khoj used (hourly, daily, weekly etc)?
- How much is which feature used (chat, search)?
- Which UI interface is used most (obsidian, emacs, web ui)?

### Details
- Expose setting to disable telemetry logging in khoj.yml
- Create basic telemetry server to log data to a DB
- Log calls to Khoj API /search, /chat, /update endpoints
- Batch upload telemetry data to server at ~hourly interval
2023-05-17 19:09:50 +08:00
Debanjum Singh Solanky
55d72231b3 Generate docker image for telemetry server using Github workflow 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
e9f04dc644 Add dockerfile to containerize telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
07b19964d4 Schedule jobs at (co-)prime intervals to reduce overlap in job runs 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
d42f0f5055 Add basic telemetry server for khoj 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
134cce9d32 Batch upload telemetry data at regular interval instead of while querying 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
3ede919c66 Log usage of /search, /chat, /update API endpoints to telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
f2e89f6f46 Add khoj app helper methods to log app usage to a telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
9ca61d62ff Enable/disable logging telemetry by setting bool in khoj.yml config
We log usage telemetry by default, unless setting explicitly set in
khoj.yml
2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky
131b8407b5 Allow Khoj Chat to respond to general queries not in reference notes
- Khoj chat will now respond to general queries if:
  1. no relevant reference notes available or
  2. when explicitly induced by prefixing the chat message with "@general"

- Previously Khoj Chat would a lot of times refuse to respond to
  general queries not answerable from reference notes or chat history

- Make chat quality tests more robust
  - Add more equivalent chat response options refusing to answer
  - Force haiku writing to not give any preable, just the haiku
2023-05-12 18:42:40 +08:00
Debanjum Singh Solanky
cc75f986b2 Test text search index only updates on changes to text content 2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky
f9ccce430e Allow configuring OpenAI chat model for Khoj chat
- Simplifies switching between different OpenAI chat models. E.g GPT4
- It was previously hard-coded to use gpt-3.5-turbo. Now it just
  defaults to using gpt-3.5-turbo, unless chat-model field under
  conversation processor updated in khoj.yml
2023-05-03 23:01:13 +08:00
Debanjum
f0253e2cbb
Include Filename, Entry Heading in All Compiled Entries to Improve Search Context
Merge pull request #214 from debanjum/add-filename-heading-to-compiled-entry-for-context

- Set filename as top heading in compiled org, markdown entries
  - Note: *Khoj was already indexing filenames in compiled markdown entries but they weren't set as top level headings but rather appended as bare text*. The updated structure should provide more schematic context of relevance
- Set entry heading as heading for compiled org, md entries, even if split by max tokens
- Snip prepended heading to avoid crossing model max_token limits
- Entries with no md headings should not get heading prefix prepended
2023-05-03 22:59:30 +08:00