Commit graph

2474 commits

Author SHA1 Message Date
Debanjum Singh Solanky
e3deb29f8e Upgrade khoj.el workflow to use Python 3.11 2024-04-07 11:24:07 +05:30
Debanjum Singh Solanky
14fbf594b2 Support using Python 3.12 with Khoj
- RapidOCR for indexing image PDFs doesn't currently support python 3.12.
  It's an optional dependency anyway, so only install it if python < 3.12
- Run unit tests with python version 3.12 as well

Resolves #522
2024-04-07 11:23:44 +05:30
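The conditional install described in this commit can be sketched in Python as below. This is only an illustration, not the project's packaging code: the package name and the helper function are hypothetical. In packaging metadata the same condition is usually written as an environment marker such as `python_version < "3.12"`.

```python
import sys

# Hypothetical package name, used only for illustration.
OCR_PACKAGE = "rapidocr-onnxruntime"


def optional_ocr_requirements(version_info=sys.version_info):
    """Return the optional OCR dependency only on interpreters older than 3.12."""
    if tuple(version_info[:2]) < (3, 12):
        return [OCR_PACKAGE]
    # On Python 3.12 and later the optional dependency is skipped.
    return []
```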
sabaimran
86c831f7e2 Add a link to the data sources portion in the clients documentation 2024-04-07 09:32:58 +05:30
sabaimran
351fb31a34 Add webpage search to socket codepath, add a feature page for online search 2024-04-07 09:23:29 +05:30
Debanjum Singh Solanky
4be4c53222 Release Khoj version 1.9.0 2024-04-05 17:13:58 +05:30
sabaimran
54db0152b9 Add link to the khoj cloud service for connection to Notion 2024-04-05 15:41:43 +05:30
sabaimran
81f1450c1c Update yarn.lock to sync with package.json for documentation 2024-04-05 15:36:23 +05:30
sabaimran
d22fd6dfe3 Get rid of unnecessary package-lock.json file 2024-04-05 15:34:02 +05:30
sabaimran
7d7ce92e46 Add updated information in docs about the Notion integration 2024-04-05 15:31:43 +05:30
sabaimran
2aedd3c819 Increase freq. of telemetry upload to every 5 minutes 2024-04-05 14:13:47 +05:30
sabaimran
3b1234d084 Await the calls to the db in the notion.py file 2024-04-05 13:58:14 +05:30
sabaimran
19c10b1418 Upgrade the package versions used in yarn.lock for the documentation project 2024-04-05 13:25:41 +05:30
sabaimran
00a67e9524 Add additional log lines when configuring the Notion settings for a user in the callback 2024-04-05 13:19:24 +05:30
sabaimran
d23f7da8e3 Handle the case where a previous search model isn't set when updating the model 2024-04-05 13:18:51 +05:30
sabaimran
f57f9f672d
Address Notion, Image tech debt in indexing code path (#687)
* Add support for using OAuth2.0 in the Notion integration
* Add notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is set up
* Fix references to the configure_content methods
2024-04-05 12:10:03 +05:30
sabaimran
69dee75c34 Update the readme for accuracy, updated demos 2024-04-04 10:57:24 +05:30
sabaimran
a60321b68e Push khoj to include inline references when possible 2024-04-04 10:31:13 +05:30
sabaimran
5bdcb4e69c Wait for location data to be returned before setting up the socket connection 2024-04-04 10:31:13 +05:30
Debanjum Singh Solanky
f01a12b1d2 Improve styling of chat sessions side panel
- Move green server connected dot to the bottom. Show status when
  disconnected from server
- Move "New conversation" button to right of the "Conversation" title
- Center alignment of the new conversation and connection status buttons
2024-04-04 01:43:26 +05:30
sabaimran
dd1e5e145a Use List[Any] for typing 2024-04-03 21:46:41 +05:30
sabaimran
b8087c4c8e Add typing to empty list variables in github_to_entries 2024-04-03 21:41:36 +05:30
sabaimran
d036fdfc26 If tree is not in the contents, then just return empty files list 2024-04-03 17:55:25 +05:30
Debanjum Singh Solanky
f915b2bd14 Fix passing model_name param to chatml formatter for online chat 2024-04-03 17:21:43 +05:30
sabaimran
6aa88761b8 Skip creating the default agent if there's no default conversation config 2024-04-03 17:21:01 +05:30
sabaimran
9c42c8be6b
Merge pull request #679 from khoj-ai/features/chat-socket-streaming
Add a websocket for streaming from the chat UI
2024-04-03 04:43:31 -07:00
sabaimran
b4f71e06b3 Add timeout after 10 minutes of inactivity on socket 2024-04-02 22:12:27 +05:30
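An inactivity timeout of this kind can be sketched with `asyncio.wait_for`. This is a minimal sketch, not the commit's actual code: `receive` and `handle` are hypothetical coroutine callables standing in for the socket's receive call and the message handler.

```python
import asyncio

# 10 minutes of inactivity, as stated in the commit message.
INACTIVITY_TIMEOUT_SECONDS = 10 * 60


async def receive_until_idle(receive, handle, timeout=INACTIVITY_TIMEOUT_SECONDS):
    """Handle incoming messages until none arrives within `timeout` seconds.

    `receive` and `handle` are hypothetical coroutine callables.
    Returns the number of messages handled before the connection went idle.
    """
    handled = 0
    while True:
        try:
            # The timer restarts on every loop iteration, so it measures
            # inactivity rather than total connection time.
            message = await asyncio.wait_for(receive(), timeout=timeout)
        except asyncio.TimeoutError:
            return handled
        await handle(message)
        handled += 1
```

The caller would close the socket once this coroutine returns.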
sabaimran
f48426623d resolve merge conflict in chat.html 2024-04-02 17:29:48 +05:30
sabaimran
bf1187f465 Use new online/websearch logic and add agent to chat_metadata 2024-04-02 17:20:38 +05:30
sabaimran
867e1007d1 Remove superfluous newline 2024-04-02 17:20:08 +05:30
sabaimran
228ad68042 Merge with origin/master 2024-04-02 17:02:21 +05:30
sabaimran
776550d5ce Add a migration for updating the default chat model, update for existing users 2024-04-02 17:01:31 +05:30
sabaimran
47fc7e1ce6 Rebase with master 2024-04-02 16:16:06 +05:30
Debanjum
215ab6e66a
Extract More Dates from entries to improve Date Filter (#683)
- Overview
  - Extract more structured date variants (e.g. with dot (.) and slash (/) separators, 2-digit year)
  - Extract some natural, partial dates as well from entries
- Capability
  Add ability to extract the following additional date forms:
  - Natural Dates: 21st April 2000, February 29 2024
  - Partial Natural Dates: March 24, Mar 2024
  - Structured Dates: 20/12/24, 20.12.2024, 2024/12/20
  Note: Previously only YYYY-MM-DD ISO-8601 structured date form was extracted for date filters
- Performance
  Using regexes is MUCH faster than using the `dateparser' python library
  It's a little crude but gives acceptable performance for large datasets
2024-04-02 16:14:53 +05:30
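The structured date forms listed in this commit can be matched with regexes roughly as follows. This is a sketch under stated assumptions, not the project's actual patterns: the pattern names and function are hypothetical, dates with a leading 1-2 digit field are read day-first (as in 20/12/24), and 2-digit years are assumed to fall in the 2000s.

```python
import re
from datetime import datetime

# Hypothetical patterns for illustration; the real regexes may differ.
# Year first: 2024-12-20, 2024/12/20, 2024.12.20
YEAR_FIRST = re.compile(
    r"\b(?P<year>\d{4})(?P<sep>[-/.])(?P<month>\d{1,2})(?P=sep)(?P<day>\d{1,2})\b"
)
# Day first: 20/12/24, 20.12.2024 (assumes day-month-year order)
DAY_FIRST = re.compile(
    r"\b(?P<day>\d{1,2})(?P<sep>[-/.])(?P<month>\d{1,2})(?P=sep)(?P<year>\d{4}|\d{2})\b"
)


def extract_structured_dates(text):
    """Return the sorted, de-duplicated dates found in `text`."""
    dates = set()
    for pattern in (YEAR_FIRST, DAY_FIRST):
        for match in pattern.finditer(text):
            year = int(match.group("year"))
            if year < 100:
                year += 2000  # Assumption: 2-digit years mean 20xx.
            try:
                dates.add(datetime(year, int(match.group("month")), int(match.group("day"))))
            except ValueError:
                # Skip impossible dates such as 31/02/2024.
                continue
    return sorted(dates)
```

The backreference `(?P=sep)` requires the same separator on both sides of the month, which avoids matching mixed forms such as 20/12.24.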
Debanjum
3c3e48b18c
Migrate to Llama.cpp for Offline Chat (#680)
## Benefits
- Support all GGUF format chat models
- Support more GPUs like AMD, Nvidia, Mac, Vulkan (previously just Vulkan, Mac)
- Support more capabilities like larger context window, schema enforcement, speculative decoding etc.

## Changes
### Major
- Use llama.cpp for offline chat models
  - Support larger context window
  - Automatically apply appropriate chat template. So offline chat models not using llama2 format are now supported
  - Use better default offline chat model, NousResearch/Hermes-2-Pro-Mistral-7B
- Enable extract queries actor to improve notes search with offline chat
- Update documentation to use llama.cpp for offline chat in Khoj

### Minor
- Migrate to use NousResearch's Hermes-2-Pro 7B as default offline chat model in khoj.yml
- Rename GPT4AllChatProcessor to OfflineChatProcessor Config, Model
- Only add location to image prompt generator when location known
2024-04-02 15:49:42 +05:30
Debanjum Singh Solanky
7afee2d55c Let offline chat model set context window. Improve, fix prompts 2024-03-31 16:19:35 +05:30
Debanjum Singh Solanky
4228965c9b Handle msg truncation when question is larger than max prompt size
Notice and truncate the question itself at this point
2024-03-31 15:50:06 +05:30
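The truncation behaviour described in this commit might look roughly like the sketch below. It is not the project's implementation: the function name is hypothetical, tokens are approximated by whitespace-separated words, and keeping the leading part of an oversized question is an assumption.

```python
def fit_to_budget(history, question, max_tokens):
    """Trim chat history, and the question if needed, to fit `max_tokens`.

    Assumptions for illustration: tokens are whitespace-separated words,
    and an oversized question keeps its leading tokens.
    Returns (kept_history, question).
    """
    question_tokens = question.split()
    if len(question_tokens) > max_tokens:
        # The question alone exceeds the budget: drop all history
        # and truncate the question itself.
        return [], " ".join(question_tokens[:max_tokens])

    remaining = max_tokens - len(question_tokens)
    kept = []
    # Walk from newest to oldest so the most recent context survives.
    for message in reversed(history):
        cost = len(message.split())
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()
    return kept, question
```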
Debanjum Singh Solanky
c6487f2e48 Fix docs showing how to set up llama-cpp with Khoj 2024-03-31 15:36:40 +05:30
Debanjum Singh Solanky
886d49e3a4 Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat 2024-03-31 00:59:20 +05:30
Debanjum Singh Solanky
4f65dde201 Release Khoj version 1.8.0 2024-03-31 00:06:15 +05:30
sabaimran
c0e78fd56d Fix broken get-started documentation links 2024-03-30 15:05:12 +05:30
sabaimran
dd2a3f712b Add more demo videos, images, add feature sections 2024-03-30 14:48:46 +05:30
sabaimran
4cb91a042e Add an agents feature page, and clarification around custom domains 2024-03-30 14:20:46 +05:30
sabaimran
928f273bbe Configure production setup for moving to single worker model 2024-03-30 10:35:55 +05:30
Debanjum Singh Solanky
7923903d21 Improve date filter regexes to extract structured, natural, partial dates
- Much faster than using dateparser
  - The improved regex took 2x-4x as long to extract 1-15% more dates
  - Whereas dateparser took 33x to 100x as long to extract 65%-400% more dates
  - Improve date extractor tests to test deduping dates, natural,
    structured date extraction from content

- Extract some natural, partial dates and more structured dates
  Using regex is much faster than using dateparser. It's a little
  crude but should pay off in performance.

  Supports dates of form:
  - (Day-of-Month) Month|AbbreviatedMonth Year|2DigitYear
  - Month|AbbreviatedMonth (Day-of-Month) Year|2DigitYear
2024-03-30 00:07:19 +05:30
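The two natural date forms above can be approximated with regexes as sketched below. The names are hypothetical and the patterns are not the project's own. As simplifying assumptions, only 4-digit years are handled (a 2-digit year such as "March 24" is ambiguous with a day of month), and a missing day defaults to the 1st.

```python
import re
from datetime import datetime

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
# Map full and 3-letter abbreviated month names to month numbers.
MONTH_NUMBER = {}
for number, name in enumerate(MONTHS, start=1):
    MONTH_NUMBER[name] = number
    MONTH_NUMBER[name[:3]] = number

# Try longer names first so "march" is preferred over "mar".
MONTH_RE = "|".join(sorted(MONTH_NUMBER, key=len, reverse=True))
DAY_RE = r"(?P<day>\d{1,2})(?:st|nd|rd|th)?"

# (Day-of-Month) Month Year, e.g. "21st April 2000" or "Mar 2024"
DAY_FIRST = re.compile(
    rf"\b(?:{DAY_RE}\s+)?(?P<month>{MONTH_RE})\.?\s+(?P<year>\d{{4}})\b", re.IGNORECASE
)
# Month Day-of-Month Year, e.g. "February 29 2024"
MONTH_FIRST = re.compile(
    rf"\b(?P<month>{MONTH_RE})\.?\s+{DAY_RE},?\s+(?P<year>\d{{4}})\b", re.IGNORECASE
)


def extract_natural_dates(text):
    """Return the sorted, de-duplicated natural-language dates found in `text`."""
    dates = set()
    for pattern in (DAY_FIRST, MONTH_FIRST):
        for match in pattern.finditer(text):
            day = int(match.group("day") or 1)  # Assumption: partial dates map to the 1st.
            month = MONTH_NUMBER[match.group("month").lower()]
            try:
                dates.add(datetime(int(match.group("year")), month, day))
            except ValueError:
                continue  # Skip impossible dates such as "February 30 2024".
    return sorted(dates)
```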
Debanjum Singh Solanky
104eeea274 Extract natural language and locale specific dates in content
Previously we just extracted dates in YYYY-MM-DD format from content
for date filtering during search.

Use dateparser to extract dates across locales and natural language

This should improve notes returned as context when chat searches
knowledge base with date filters

Fallback to regex for date parsing from content if dateparser fails

- Limit natural date extractor capabilities to improve performance
  - Assume language is English
    Language detection otherwise takes a REALLY long time
  - Do not extract unix timestamps, timezone
    - This isn't required, as just using date and approximating dates as UTC
2024-03-30 00:06:56 +05:30
Debanjum Singh Solanky
90c5b3c410 Update stale Khoj pypi package metadata
Use latest License, Intended Audience and Dev Status
2024-03-29 00:06:55 +05:30
sabaimran
1195f843a3 Remove forward slash from the root agents endpoint 2024-03-28 23:06:55 +05:30
Debanjum Singh Solanky
a374288cea Use OIDC TrustedPublisher to publish khoj python package to PyPi 2024-03-28 22:58:36 +05:30
sabaimran
3417164ec2 Bump gunicorn workers up to 8 2024-03-28 22:34:13 +05:30
sabaimran
a1729b9b9e Add telemetry for agents used in conversation, increase image width in agents page 2024-03-28 22:18:11 +05:30