Commit graph

72 commits

Author SHA1 Message Date
Debanjum Singh Solanky
05a3c81adb Add beautiful as dependency to pass pytests 2023-06-23 15:10:09 -07:00
Debanjum Singh Solanky
69d4fa6525 Rename project links across repo from debanjum/khoj to khoj-ai/khoj 2023-06-21 00:13:21 -07:00
Debanjum Singh Solanky
c29c141a7e Use Github Rest API to index Markdown files in Github Repository
The Llama_Hub Github plugin is fairly limited.

The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
2023-06-17 02:16:13 -07:00
Saba
a6cd96a6a9 Add a Github plugin which can be used to read from a Github repository 2023-06-13 14:40:06 -07:00
Saba
c5666e0404 Move factory dependencies to optional settings 2023-06-06 23:26:24 -07:00
Saba
f65ff9815d Move message truncation logic into a separate function. Add unit tests with factory boy. 2023-06-05 18:58:29 -07:00
Debanjum Singh Solanky
acd14a5e41 Wire up PDF to jsonl processor to Khoj server layer (API, config)
- Specify PDF content to index via khoj.yml
- Index PDF content on app start, reconfigure
- Expose PDF as a search type via API
2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky
286b500f66 Create PDF to JSONL processor using PyPDF and LangChain
Switch `pydantic' to >= 1.9.1 else `langchain.document_loaders' starts
throwing typing error for python 3.8, 3.9
2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky
efcf7d1508 Extract prompts as LangChain Prompt Templates into a separate module
Improves code modularity, cleanliness. Reduces bloat in GPT.py module
2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky
285a2b86d2 Use aiohttp version 3.8.4 as 4.x breaks docker image build 2023-03-26 05:33:02 +07:00
Debanjum Singh Solanky
67c850a4ac Add retry logic to OpenAI API queries to increase Chat tenacity
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
  tenacity package. This is officially suggested and used by other
  popular GPT based libraries
2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky
7e36f421f9 Truncate message logs to below max supported prompt size by model
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
  argument to generate_chatml_messages_with_context method
2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky
1b4d562700 Test Chat Director Capabilities: Answer from notes, chat history etc
- Chat directors are broad agents.
  - Chat directors orchestrate narrow actor agents to synthesize
    final response for the user
  - Agents are Prompts + ML Model

- Test Chat Director Capabilities
  1. [X] Answer from retrieved notes
  2. [X] Answer from chat history
  3. [X] Answer general questions
  4. [X] Carry out multi-turn conversation
  5. [X] Say don't know when answer not in provided context
  6. [X] Answers that require current date awareness
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
  7. [X] Date-aware aggregation across multiple different notes
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
  8. [X] Ask clarification questions if no unambiguous answer in provided context
  9. [X] Retrieve answer from chat history beyond lookback window
     This test is expected to fail as the chat director is not capable
     of searching chat history yet. But the test allows assessing chat quality
 10. [X] Retrieve context for answer using multiple independent
         searches on knowledge base
     This test is expected to fail as the chat is not capable of doing
     this without the Search actor. But the test allows assessing chat quality
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
7c4d546039 Configure tests to mark chat quality tests & filter unhelpful warnings
- Mark chat quality tests, register custom mark for chat quality
- Filter unhelpful deprecation warnings from within dateparser library
- Error if tests use unregistered marks
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
ad1f1cf620 Improve and simplify Khoj Chat using ChatGPT
- Set context by either including last 2 chat messages from active
  session or past 2 conversation summaries from conversation logs

- Set personality in system message
- Place personality system message before last completed back & forth
  This may stop ChatGPT forgetting its personality as conversation progresses given:
  - The conditioning based on system role messages is light
  - If system message is too far back in conversation history, the
    model may forget its personality conditioning
  - If system message at end of conversation, the model can think its
    the start of a new conversation
  - Inserting the system message before last completed back & forth should
    prevent ChatGPT from assuming its the start of a new conversation
    while not losing personality conditioning from the system message

- Simplfy the Khoj Chat API to for now just answer from users notes
  instead of trying to infer other potential interaction types.
  - This is the default expected behavior from the feature anyway
  - Use the compiled text of the top 2 search results for context

- Benefits of using ChatGPT
  - Better model
  - 1/10th the price
  - No hand rolled prompt required to make GPT provide more chatty,
    assistant type responses
2023-03-05 01:24:13 -06:00
Debanjum Singh Solanky
a8940462c4 Automate khoj python package versioning using hatch-vcs and Git tags 2023-02-17 18:19:01 -06:00
Debanjum Singh Solanky
36be3c4b8f Fix or ignore MyPy issues in PyQt desktop GUI code
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
  different errors in different environments (venv, system and pre-commit)
2023-02-17 16:13:05 -06:00
Debanjum Singh Solanky
051f0e3fb5 Add, configure and run pre-commit locally and in test workflow 2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky
5e83baab21 Use Black to format Khoj server code and tests 2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky
8b293edd7c Move mypy config into pyproject.toml. Ignore 2 remaining mypy issues 2023-02-16 03:33:08 -06:00
Debanjum Singh Solanky
7a9a811874 Fix authors, homepage URL in pyproject.toml and workflow triggers 2023-02-16 03:19:56 -06:00
Debanjum Singh Solanky
dcb86c2d3e Build khoj python package using hatchling, pyproject.toml
- Why
  - pyprojects.toml is the python standards compliant config format
    - allows collating python tooling configs into single standard file
  - hatch(-ling) is a new lightweight build system for python packages

- Detailed Changes
  - Replace setup.py, setuptools with pyproject.toml, hatchling for
    khoj python config and build
  - move pytest into optional development dependencies
  - add more links to khoj in the project urls section
  - add topic classifiers and keywords to find khoj package

  - Delete setup.py, MANIFEST.in as moved to pyproject.toml based setup
  - Update pypi workflow to set python package version in pyproject.toml
2023-02-16 02:37:32 -06:00