Commit graph

3995 commits

Author SHA1 Message Date
Saba
c5666e0404 Move factory dependencies to optional settings 2023-06-06 23:26:24 -07:00
Saba
5d5ebcbf7c Rename truncate messages method and update unit tests to simplify assertion logic 2023-06-06 23:25:43 -07:00
Saba
7119ed0849 Run pre-commit script 2023-06-05 19:29:23 -07:00
Saba
948ba6ddca Remove unused logger 2023-06-05 19:01:03 -07:00
Saba
6212d7c2e8 Remove debug line 2023-06-05 19:00:25 -07:00
Saba
f65ff9815d Move message truncation logic into a separate function. Add unit tests with factory boy. 2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky
eb6175e9b0 Update description field in webmanifest of Khoj, Khoj Chat PWA 2023-06-06 01:53:42 +05:30
Debanjum Singh Solanky
bb2363f324 Set client request param when calling khoj server APIs from Web 2023-06-06 00:05:00 +05:30
Debanjum Singh Solanky
caab55fbdd Set client request param when calling khoj server APIs from Obsidian 2023-06-06 00:04:46 +05:30
Debanjum Singh Solanky
de2494154f Set client request param when calling khoj server APIs from Emacs 2023-06-06 00:02:10 +05:30
Debanjum Singh Solanky
168c11cea7 Make server API endpoints accept client as query param
- The chat, search and update API will accept client as request param.
- This will allow logging the client from which these APIs was called.
2023-06-05 23:57:08 +05:30
Debanjum Singh Solanky
8617cf1389 Push telemetry to Posthog to grok Khoj usage 2023-06-05 22:47:49 +05:30
Debanjum Singh Solanky
d13db2e666 Make old telemetry server forward requests to new server 2023-06-05 13:06:45 +05:30
Saba
5f4223efb4 Increase timeout to OpenAI call 2023-06-04 20:49:47 -07:00
Saba
0e63a90377 Fix the mechanism to retrieve the message content 2023-06-04 20:25:37 -07:00
Saba
f0efe0177e Pass truncated message as string in ChatMessage when exceeding max prompt size 2023-06-04 19:33:46 -07:00
Debanjum
f6ceb22373
Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 15:05:34 +05:30
Saba
068ee0ac5e Swap elif with else, as usage of this method does not use openai_api_key 2023-06-04 02:25:08 -07:00
Saba
6508379d7b Use api_key keyword argument to set the openai_api_key parameter for GPT 2023-06-04 00:57:00 -07:00
Debanjum Singh Solanky
7af8a56434 Remove filename from reference before rendering references in khoj.el
Fixes bug where actual reference heading in next line jumping out of
references footnote section
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
ec280067ef Do not retrieve relevant notes when having a general chat with Khoj
- This improves latency of @general chat by avoiding unnecessary
  compute
- It also avoids passing references in API response when they haven't
  been used to generate the chat response. So interfaces don't have to
  add logic to not render them unnecessarily
2023-06-02 10:42:44 +05:30
Debanjum Singh Solanky
90439a8db1 Update Khoj subtitle to AI personal assistant for your digital brain 2023-06-02 10:42:44 +05:30
Debanjum
e022910f31
Search PDF files with Khoj. Integrate with LangChain
- **Introduce Khoj to LangChain**: 
    Call GPT with LangChain for Khoj Chat
- **Search (and Chat about) PDF files with Khoj**
  - Create PDF to JSONL Processor: Convert PDF content into standardized JSONL format
  - Expose PDF search type via Khoj server API
  - Enable querying PDF files via Obsidian, Emacs and Web interfaces
2023-06-02 10:20:26 +05:30
Debanjum Singh Solanky
e9ed7a19fd Update search prompt to extract PDF search type. Fix extract_question prompt 2023-06-02 10:06:03 +05:30
Debanjum Singh Solanky
89fbfce20a Mention PDF are also supported in Khoj Readme 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
bbe3bf9733 Render PDF search results in Khoj Obsidian interface
- Make plugin update khoj server config to index PDF files in vault too
- Make Obsidian plugin update index for PDF files in vault too
- Show PDF results in Khoj Search modal as well
  - Ensure combined results are sorted by score across both types
- Jump to PDF file when select it PDF search result from modal
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
e3892945d4 Render PDF search results in Khoj.el Emacs interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
85144006a1 Render PDF search results in khoj web interface 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
acd14a5e41 Wire up PDF to jsonl processor to Khoj server layer (API, config)
- Specify PDF content to index via khoj.yml
- Index PDF content on app start, reconfigure
- Expose PDF as a search type via API
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
d63194c3a9 Create tests for PDF to JSONL processor 2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
286b500f66 Create PDF to JSONL processor using PyPDF and LangChain
Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts
throwing typing error for python 3.8, 3.9
2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky
1b3effd8e6 Fork Markdown to JSONL processor as start template for PDF to Jsonl Processor 2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky
1cd9ecd449 Truncate last message if still over max supported prompt size by model 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
ed4d0f9076 Simplify argument names used in khoj openai completion functions
- Match argument names passed to khoj openai completion funcs with
  arguments passed to langchain calls to OpenAI
- This simplifies the logic in the khoj openai completion funcs
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
703a7c89c0 Reduce retry count and request timeout for faster response or failure
- Fix bug where both LangChain and Khoj retry requests 6 times each.
  So a total of 12 requests at >1minute intervals for each chat
  response in case of OpenAI API being down

- Retrying too many times when the API is failing doesn't help
- The earlier 60 second request timeout was spacing out the interval
  between retries way too much. This slowed down chat response times
  quite a bit when API was being flaky

- With these updates you'll know if call to chat API failed in under a
  minute
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
18081b3bc6 Use LangChain to call GPT over API 2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
277d2f5c96 Do not add "Notes:" suffix to chat messages when no notes retrieved
This was causing spurious "Notes:" suffix being added to Khoj Chat in
response
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
334be4e600 Use LangChain to call OpenAI for Khoj Chat
- Use ChatModel and ChatOpenAI to call OpenAI chat model instead of
  using OpenAI package directly
- This is being done as part of migration to rely on LangChain for
  creating agents and managing their state
2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky
efcf7d1508 Extract prompts as LangChain Prompt Templates into a separate module
Improves code modularity, cleanliness. Reduces bloat in GPT.py module
2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky
b484953bb3 Import app state correctly to generate embeddings with OpenAI model
Resolves #216
2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
9cfaaf0941 Update docs to configure khoj.yml for using OpenAI model for embeddings 2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky
a0d0dbaca7 Fix link to Khoj Obsidian Demo video in Readmes 2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky
ebb5d7b8e5 Release Khoj version 0.6.2 2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky
d02415edcc Write generated server id to env file when env file does not contain it 2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky
dc0626856e Put the telemetry db in a separate directory by default 2023-05-17 18:58:47 +05:30
Debanjum
dc495babb3
Add Telemetry to Understand Khoj Usage
### Objective: 
Use telemetry to better understand Khoj usage.
This will motivate and prioritize work for Khoj.

Specific questions:
- Number of active deployments of khoj server
- How regularly is khoj used (hourly, daily, weekly etc)?
- How much is which feature used (chat, search)?
- Which UI interface is used most (obsidian, emacs, web ui)?

### Details
- Expose setting to disable telemetry logging in khoj.yml
- Create basic telemetry server to log data to a DB
- Log calls to Khoj API /search, /chat, /update endpoints
- Batch upload telemetry data to server at ~hourly interval
2023-05-17 19:09:50 +08:00
Debanjum Singh Solanky
55d72231b3 Generate docker image for telemetry server using Github workflow 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
e9f04dc644 Add dockerfile to containerize telemetry server 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
07b19964d4 Schedule jobs at (co-)prime intervals to reduce overlap in job runs 2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky
d42f0f5055 Add basic telemetry server for khoj 2023-05-17 16:08:21 +05:30