sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-19 02:57:10 +00:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	377e979800	Make current chat expand to full width when session panel collapsed. This behavior also matches web client behavior on chat session panel collapse	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	913dcdfbcd	Only render first run setup message once if error or server not running	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	3b630841bd	s/aget_all_filenames_by_source/get_all_filenames_by_source as sync func	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	e45edbb992	Collapse navigation tabs into icons on mobile. Add spacing to them	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	93edd5427f	Add Chat navigation tab back to top pane on web client. Reduces user confusion on how to go to chat pane. Add emojis for each tab to provide cleaner, iconified division between the nav options	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	8159d1ab25	Fix showing Search navigation tab from Agent pages on web client. The `has_documents' flag wasn't being passed. So the search tab was always showing up as empty instead of being dynamically enabled if documents had been indexed.	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	76cb543347	Show title bar in Khoj desktop app on Windows	2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky	f040418cf1	Fix indexing files in sub-folders on the Desktop app - `fs.readdir' func in node version 18.18.2 has buggy `recursive' option See nodejs/node#48640, effect-ts/effect#1801 for details - We were recursing down a folder in two ways on the Desktop app. Remove `recursive: True' option to the `fs.readdirSync' method call to recurse down via app code only	2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky	a8dec1c9d5	Index all text, code files in Github repos. Not just md, org files	2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky	8291b898ca	Standardize structure of text to entries to match other entry processors Add process_single_plaintext_file func etc with similar signatures as org_to_entries and markdown_to_entries processors The standardization makes modifications, abstractions easier to create	2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky	079f409238	Skip indexing Github repo on hitting Github API rate limit. Sleeping until the rate limit has passed is too expensive, as it keeps an app worker occupied. Ideally we should schedule a job to continue after the rate limit wait time has passed. But this can only be added once we support job scheduling.	2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky	d5c9b5cb32	Stop indexing commits, issues and issue comments in Github indexer. Normal indexing quickly hits Github rate limits. Purpose of exposing Github indexer is for indexing content like notes, code and other knowledge base in a repo. The current indexer doesn't scale to index metadata given Github's rate limits, so remove it instead of giving a degraded experience of partially indexed repos	2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky	7ff1bd9f8b	Send more text file types from Desktop app and improve indexing them - Allow syncing more file types from desktop app to index on server - Use `file-type' package to identify valid text file types on Desktop app - Split plaintext entries into smaller logical units than a whole file Since the text splitting upgrades in #645, compiled chunks have more logical splits like paragraph, sentence. Show those (potentially) smaller snippets to the user as references - Tangential Fix: Initialize unbound currentTime variable for error log timestamp	2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky	89915dcb4c	Identify file type by content & allow server to index all text files - Use Magika's AI for a tiny, portable and better file type identification system - Existing file type identification tools like `file' and `magic' require system level packages, that may not be installed by default on all operating systems (e.g `file' command on Windows)	2024-04-09 20:19:39 +05:30
sabaimran	312528d471	Fix typo in SECURE_PROXY_SSL_HEADER settings	2024-04-09 12:33:21 +05:30
sabaimran	e56c5e67dd	Revert SSL Redirect setting as it prevents the admin page from loading	2024-04-09 12:24:48 +05:30
sabaimran	1770bb174b	Add UUID to the KhojUser search fields and inc frequency of telemetry job to 2 mins	2024-04-09 11:51:51 +05:30
sabaimran	ab51ae9091	Use SECURE_SSL_REDIRECT to ensure requests are routed to https always	2024-04-09 10:18:12 +05:30
sabaimran	1c229dad91	Set daily limit for unsubscribed users to 5 in websocket API	2024-04-08 21:16:48 +05:30
sabaimran	27815d982c	Redirect user to the login page when either of the csrf token inputs is missing	2024-04-08 20:22:17 +05:30
sabaimran	d257629f81	Handle case when properties field isn't present in the page	2024-04-08 16:15:47 +05:30
Debanjum	9b68062fa9	Add Sponsors Section to Readme	2024-04-08 03:09:24 -07:00
sabaimran	089e0d028b	Add a more graceful error message when the rate limit is exceeded	2024-04-08 15:20:54 +05:30
Debanjum	11ce3e2268	Update Text Chunking Strategy to Improve Search Context (#645) ## Major - Parse markdown, org parent entries as single entry if they fit within max tokens - Parse a file as single entry if it fits within max token limits - Add parent heading ancestry to extracted markdown entries for context - Chunk text in preference order of para, sentence, word, character ## Minor - Create wrapper function to get entries from org, md, pdf & text files - Remove unused Entry to Jsonl converter from text to entry class, tests - Dedupe code by using single func to process an org file into entries Resolves #620	2024-04-08 13:56:38 +05:30
Debanjum Singh Solanky	9239c2c2ed	Update drop large words test to ensure newlines are considered word boundaries. Prevent regression to #620	2024-04-08 13:38:08 +05:30
Debanjum Singh Solanky	67b1178aec	Remove debug logs generated while compiling org-mode entries	2024-04-08 13:01:24 +05:30
Debanjum	4eda79cc3a	Support using Python 3.12 with Khoj (#690) ### Why - Python 3.12 is the default Python on Ubuntu 24.04 LTS, Windows and Mac via Homebrew - Python 3.12 has a bunch of improvements that can be explored with Khoj (e.g per core GIL for performance) ## Changes - The latest PyTorch now supports Python 3.12 - RapidOCR for indexing image PDFs doesn't currently support python 3.12. But it's an optional dependency, so only install it if python < 3.12 ### Testing - Verified Khoj installs fine on Windows and Mac with Python 3.12 - Verified Khoj chat works fine on Mac, Windows with Python 3.12 Resolves #522	2024-04-08 11:43:34 +05:30
sabaimran	731ad03348	Skip indexing commits that are missing properties	2024-04-07 15:19:07 +05:30
sabaimran	376eaf64cd	Check if results are present in the pages or db response in Notion	2024-04-07 15:19:07 +05:30
Debanjum Singh Solanky	8222615280	Do not add original user message to knowledge search queries for offline chat It's not required anymore. The extracted questions by the offline chat model being used should be good enough.	2024-04-07 11:29:35 +05:30
Debanjum Singh Solanky	e3deb29f8e	Upgrade khoj.el workflow to use Python 3.11	2024-04-07 11:24:07 +05:30
Debanjum Singh Solanky	14fbf594b2	Support using Python 3.12 with Khoj - RapidOCR for indexing image PDFs doesn't currently support python 3.12. It's an optional dependency anyway, so only install it if python < 3.12 - Run unit tests with python version 3.12 as well Resolves #522	2024-04-07 11:23:44 +05:30
sabaimran	86c831f7e2	Add a link to the data sources portion in the clients documentation	2024-04-07 09:32:58 +05:30
sabaimran	351fb31a34	Add webpage search to socket codepath, add a feature page for online search	2024-04-07 09:23:29 +05:30
Debanjum Singh Solanky	4be4c53222	Release Khoj version 1.9.0	2024-04-05 17:13:58 +05:30
sabaimran	54db0152b9	Add link to the khoj cloud service for connection to Notion	2024-04-05 15:41:43 +05:30
sabaimran	81f1450c1c	Update yarn.lock to sync with package.json for documentation	2024-04-05 15:36:23 +05:30
sabaimran	d22fd6dfe3	Get rid of unnecessary package-lock.json file	2024-04-05 15:34:02 +05:30
sabaimran	7d7ce92e46	Add updated information in docs about the Notion integration	2024-04-05 15:31:43 +05:30
sabaimran	2aedd3c819	Increase freq. of telemetry upload to every 5 minutes	2024-04-05 14:13:47 +05:30
sabaimran	3b1234d084	Await the calls to the db in the notion.py file	2024-04-05 13:58:14 +05:30
sabaimran	19c10b1418	Upgrade the package versions used in yarn.lock for the documentation project	2024-04-05 13:25:41 +05:30
sabaimran	00a67e9524	Add additional log lines when configuring the Notion settings for a user in the callback	2024-04-05 13:19:24 +05:30
sabaimran	d23f7da8e3	Handle the case where a previous search model isn't set when updating the model	2024-04-05 13:18:51 +05:30
sabaimran	f57f9f672d	Address Notion, Image tech debt in indexing code path (#687) * Add support for using OAuth2.0 in the Notion integration * Add notion to the admin page * Remove unnecessary content_index and image search/setup references * Trigger background job to start indexing Notion after user configures it * Add a log line when a new Notion integration is setup * Fix references to the configure_content methods	2024-04-05 12:10:03 +05:30
sabaimran	69dee75c34	Update the readme for accuracy, updated demos	2024-04-04 10:57:24 +05:30
sabaimran	a60321b68e	Push khoj to include inline references when possible	2024-04-04 10:31:13 +05:30
sabaimran	5bdcb4e69c	Wait for location data to be returned before setting up the socket connection	2024-04-04 10:31:13 +05:30
Debanjum Singh Solanky	00f599ea78	Fix passing flags to re.split to break org, md content by heading level. `re.MULTILINE' should be passed to the `flags' argument, not the `maxsplit' argument of the `re.split' func. This was messing up the indexing by only allowing a maximum of re.MULTILINE splits. Fixing this improves the search quality to previous state	2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky	32ac0622ff	Extract dates from compiled text entries	2024-04-04 02:41:55 +05:30
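
The `re.split' pitfall described in commit 00f599ea78 can be reproduced with the standard library alone. The sketch below is not Khoj's code: the heading pattern and the sample text are assumptions made up for illustration. It only shows that the third positional argument of `re.split' is `maxsplit', so a flag passed there acts as a split cap equal to its integer value.

```python
import re

# Hypothetical sample: 12 org-style headings, each followed by one body line.
text = "\n".join(f"* Heading {i}\nbody {i}" for i in range(1, 13))

# Hypothetical pattern: a newline followed by a top-level heading marker.
heading_boundary = r"\n\* "

# Signature is re.split(pattern, string, maxsplit=0, flags=0).
# Passed positionally, re.MULTILINE lands in maxsplit and caps the
# number of splits at its integer value.
multiline_as_int = int(re.MULTILINE)
capped = re.split(heading_boundary, text, re.MULTILINE)

# Passing the flag by keyword leaves maxsplit at 0, meaning no cap.
fixed = re.split(heading_boundary, text, flags=re.MULTILINE)

print(multiline_as_int, len(capped), len(fixed))
```

With the cap in place, everything after the last allowed split stays in one oversized final piece, which is consistent with the indexing problem the commit message reports.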

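Commit 11ce3e2268 mentions chunking text "in preference order of para, sentence, word, character". A minimal sketch of that idea follows, under stated assumptions: the function name `chunk_text', the separator list, and the use of character counts in place of token counts are all hypothetical, and this is not the splitter Khoj uses.

```python
def chunk_text(text, max_chars, separators=("\n\n", ". ", " ", "")):
    """Split text into chunks of at most max_chars characters.

    Tries the coarsest separator first (paragraph), then falls back to
    sentence, word and finally a hard character split. Separators are
    hypothetical; character counts stand in for token counts.
    """
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            # Keep merging neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is too large on its own: retry with finer separators.
            chunks.extend(chunk_text(piece, max_chars, rest))
    if current:
        chunks.append(current)
    return chunks


sample = "First para. It has two sentences.\n\nSecond para is short.\n\nSupercalifragilistic"
chunks = chunk_text(sample, max_chars=25)
hard_split = chunk_text("abcdefghij", max_chars=4)
print(chunks)
print(hard_split)
```

Note that this sketch drops the separator at a chunk boundary (for example the ". " between two sentences that end up in different chunks); a production splitter would likely preserve it.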
3769 commits