Commit graph

2514 commits

Author SHA1 Message Date
Debanjum Singh Solanky
8ff3890ba8 Dynamically generate navigation menu based on user info from server 2024-04-10 10:34:36 +05:30
Debanjum Singh Solanky
94c69eb8e3 Create API endpoint to get authenticated user information
This help clients render UI with user information
2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
377e979800 Make current chat expand to full width when session panel collapsed
This behavior also matches web client behavior on chat session panel
collapse
2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
913dcdfbcd Only render first run setup message once if error or server not running 2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
3b630841bd s/aget_all_filenames_by_source/get_all_filenames_by_source as sync func 2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
e45edbb992 Collapse navigation tabs into icons on mobile. Add spacing to them 2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
93edd5427f Add Chat navigation tab back to top pane on web client
Reduces user confusion on how to go to chat pane
Add emoji's for each tab to provide cleaner, iconified division between
the nav options
2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
8159d1ab25 Fix showing Search navigation tab from Agent pages on web client
The `has_documents' flag wasn't being passed. So the search tab
always showing up as empty instead of being dynamically enabled if
documents had been indexed.
2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
76cb543347 Show title bar in Khoj desktop app on Windows 2024-04-09 21:04:44 +05:30
sabaimran
312528d471 Fix typo in SECURE_PROXY_SSL_HEADER settings 2024-04-09 12:33:21 +05:30
sabaimran
e56c5e67dd Revert SSL Redirect setting as it prevents the admin page from loading 2024-04-09 12:24:48 +05:30
sabaimran
1770bb174b Add UUID to the KhojUser search fields and inc frequency of telemetry job to 2 mins 2024-04-09 11:51:51 +05:30
sabaimran
ab51ae9091 Use SECURE_SSL_REDIRECT to ensure requests are routed to https always 2024-04-09 10:18:12 +05:30
sabaimran
1c229dad91 Set daily limit for unsubsribed users to 5 in websocket API 2024-04-08 21:16:48 +05:30
sabaimran
27815d982c Redirect user to the login page when either of the csrf token inputs is missing 2024-04-08 20:22:17 +05:30
sabaimran
d257629f81 Handle case when properties field isn't present in the page 2024-04-08 16:15:47 +05:30
Debanjum
9b68062fa9
Add Sponsors Section to Readme 2024-04-08 03:09:24 -07:00
sabaimran
089e0d028b Add a more gracefull error message when the rate limit is exceeded 2024-04-08 15:20:54 +05:30
Debanjum
11ce3e2268
Update Text Chunking Strategy to Improve Search Context (#645)
## Major
- Parse markdown, org parent entries as single entry if fit within max tokens
- Parse a file as single entry if it fits with max token limits
- Add parent heading ancestry to extracted markdown entries for context
- Chunk text in preference order of para, sentence, word, character

## Minor
- Create wrapper function to get entries from org, md, pdf & text files
- Remove unused Entry to Jsonl converter from text to entry class, tests
- Dedupe code by using single func to process an org file into entries

Resolves #620
2024-04-08 13:56:38 +05:30
Debanjum Singh Solanky
9239c2c2ed Update drop large words test to ensure newlines considerd word boundary
Prevent regression to #620
2024-04-08 13:38:08 +05:30
Debanjum Singh Solanky
67b1178aec Remove debug logs generated while compiling org-mode entries 2024-04-08 13:01:24 +05:30
Debanjum
4eda79cc3a
Support using Python 3.12 with Khoj (#690)
### Why
- Python 3.12 is the default Python on Ubuntu 24.04 LTS, Windows and Mac via Homebrew
- Python 3.12 has a bunch of improvements that can be explored with Khoj (e.g per core GIL for performance)

## Changes
- The latest PyTorch now supports Python 3.12
- RapidOCR for indexing image PDFs doesn't currently support python 3.12.
  But it's an optional dependency, so only install it if python < 3.12

### Testing
- Verified Khoj installs fine on Windows and Mac with Python 3.12
- Verified Khoj chat works fine on Mac, Windows with Python 3.12

Resolves #522
2024-04-08 11:43:34 +05:30
sabaimran
731ad03348 Skip indexing commits that are missing properties 2024-04-07 15:19:07 +05:30
sabaimran
376eaf64cd Check if results are present in the pages or db response in Notion 2024-04-07 15:19:07 +05:30
Debanjum Singh Solanky
8222615280 Do not add original user message to knowledge search queries for offline chat
It's not required anymore. The extracted questions by the offline chat
model being used should be good enough.
2024-04-07 11:29:35 +05:30
Debanjum Singh Solanky
e3deb29f8e Upgrade khoj.el workflow to use Python 3.11 2024-04-07 11:24:07 +05:30
Debanjum Singh Solanky
14fbf594b2 Support using Python 3.12 with Khoj
- RapidOCR for indexing image PDFs doesn't currently support python 3.12.
  It's an optional dependency anyway, so only install it if python < 3.12
- Run unit tests with python version 3.12 as well

Resolves #522
2024-04-07 11:23:44 +05:30
sabaimran
86c831f7e2 Add a link to the data sources portion in the clients documentation 2024-04-07 09:32:58 +05:30
sabaimran
351fb31a34 Add webpage search to socket codepath, add a feature page for online search 2024-04-07 09:23:29 +05:30
Debanjum Singh Solanky
4be4c53222 Release Khoj version 1.9.0 2024-04-05 17:13:58 +05:30
sabaimran
54db0152b9 Add link to the khoj cloud service for connection to Notion 2024-04-05 15:41:43 +05:30
sabaimran
81f1450c1c Update yarn.lock to sync with package.json for documentation 2024-04-05 15:36:23 +05:30
sabaimran
d22fd6dfe3 Get rid of unnecessary package-lock.json file 2024-04-05 15:34:02 +05:30
sabaimran
7d7ce92e46 Add updated information in docs about the Notion integration 2024-04-05 15:31:43 +05:30
sabaimran
2aedd3c819 Increase freq. of telemetry upload to every 5 minutes 2024-04-05 14:13:47 +05:30
sabaimran
3b1234d084 Await the calls to the db in the notion.py file 2024-04-05 13:58:14 +05:30
sabaimran
19c10b1418 Upgrade the package versions used in yarn.lock for the documentation project 2024-04-05 13:25:41 +05:30
sabaimran
00a67e9524 Add additional log lines when configuring the Notion settings for a user in the callback 2024-04-05 13:19:24 +05:30
sabaimran
d23f7da8e3 Handle the case where a previous serach model isn't set when updating the model 2024-04-05 13:18:51 +05:30
sabaimran
f57f9f672d
Address Notion, Image tech debt in indexing code path (#687)
* Add support for using OAuth2.0 in the Notion integration
* Add notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is setup
* Fix references to the configure_content methods
2024-04-05 12:10:03 +05:30
sabaimran
69dee75c34 Update the readme for accuracy, updated demos 2024-04-04 10:57:24 +05:30
sabaimran
a60321b68e Push khoj to include inline references when possible 2024-04-04 10:31:13 +05:30
sabaimran
5bdcb4e69c Wait for location data to be returned before setting up the socket connection 2024-04-04 10:31:13 +05:30
Debanjum Singh Solanky
00f599ea78 Fix passing flags to re.split to break org, md content by heading level
`re.MULTILINE' should be passed to the `flags' argument, not the
`max_splits' argument of the `re.split' func

This was messing up the indexing by only allowing a maximum of
re.MULTILINE splits. Fixing this improves the search quality to
previous state
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
32ac0622ff Extract dates from compiled text entries 2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
29c1c18042 Increase search distance to get relevant content for chat post indexer update
More content indexed per entry would result in an overall scores
lowering effect. Increase default search distance threshold to counter that

- Details
  - Fix expected results post indexing updates
  - Fix search with max distance post indexing updates

- Minor
  - Remove openai chat actor test for after: operator as it's not expected anymore
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
ad4fa4b2f4 Fix adding file path instead of stem to markdown entries 2024-04-04 02:41:55 +05:30
sabaimran
720139c3c1 Fix all unit tests for test_text_search 2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
44b3247869 Update logical splitting of org-mode text into entries
- Major
  - Do not split org file, entry if it fits within the max token limits
    - Recurse down org file entries, one heading level at a time until
      reach leaf node or the current parent tree fits context window
    - Update `process_single_org_file' func logic to do this recursion

  - Convert extracted org nodes with children into entries
    - Previously org node to entry code just had to handle leaf entries
    - Now it recieve list of org node trees
    - Only add ancestor path to root org-node of each tree
    - Indent each entry trees headings by +1 level from base level (=2)

- Minor
  - Stop timing org-node parsing vs org-node to entry conversion
    Just time the wrapping function for org-mode entry extraction
    This standardizes what is being timed across at md, org etc.
  - Move try/catch to `extract_org_nodes' from `parse_single_org_file'
    func to standardize this also across md, org
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
eaa27ca841 Only add spaces after heading if any tags in orgnode raw entry repr 2024-04-04 02:41:55 +05:30