sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-27 17:35:07 +01:00

Author	SHA1	Message	Date
sabaimran	6607e666dc	Increase rate limit for data upload packet size in indexer.py	2024-07-26 19:35:32 +05:30
Debanjum	498fe2458c	Support Gemma 2 Model Family for Offline Chat (#855) ## Overview - Gemma 2 is a new open model family by Google. They've released a 9B, 27B param model. A 2B model is also expected. - It performs really well on the Chatbot arena and shows good performance when testing within Khoj as well. - Llama.cpp support for Gemma 2 architecture seems to have stabilized - If Gemma 2 performs well in further testing, it can be made the default offline chat model for Khoj - Once the 2B param model is released, the model size to download can be automatically chosen based on (V)RAM available ## Major - Support Gemma 2 for Offline Chat - Improve and fix chat model prompts for better, consistent context ## Minor - Fix and improve offline chat actor, director tests - Improve offline chat truncation to consider chat message delimiter tokens	2024-07-23 06:57:02 -07:00
sabaimran	e694c82343	Fix Docker build issues with yarn / next / node (#859) * Rollback node version being installed from nodesource to node 20	2024-07-19 19:11:29 +05:30
sabaimran	1af9dbb083	Switch node/yarn install steps to use more native installation patterns	2024-07-19 17:10:08 +05:30
sabaimran	6d5ca5a3e1	yarn clean cache before build	2024-07-19 16:06:38 +05:30
sabaimran	7f0d1bd414	Add verbose logs when outputting yarn install steps	2024-07-19 15:48:43 +05:30
sabaimran	7426a4f819	Prefetch related agent when retrieving the conversation for performance improvements	2024-07-19 14:43:30 +05:30
Debanjum Singh Solanky	e9f86e320b	Fix and improve offline chat actor, director tests - Use updated references schema with compiled key - Enable director tests that are now expected to pass and that do pass (with Gemma 2 at least)	2024-07-18 03:43:09 +05:30
Debanjum Singh Solanky	b0ee78586c	Improve offline chat truncation to consider message separator tokens	2024-07-18 03:43:09 +05:30
Debanjum Singh Solanky	6f46e6afc6	Improve and fix chat model prompts for better, consistent context - Add day of week to system prompt of openai, anthropic, offline chat models - Pass more context to offline chat system prompt to - ask follow-up questions - know where to find information about khoj (itself) - Fix output mode selection prompt. Log error if model does not select valid option from list of valid output modes provided - Use consistent names for question, answers passed to extract_questions_offline prompt - Log which model extracts question, what the offline chat model sees as context. Similar to debug log shown for openai models	2024-07-18 03:43:09 +05:30
Debanjum Singh Solanky	53eabe0c06	Support Gemma 2 for Offline Chat - Pass system message as the first user chat message as Gemma 2 doesn't support system messages - Use gemma-2 chat format - Pass chat model name to generic, extract questions chat actors Used to figure out chat template to use for model For generic chat actor argument was anyway available but not being passed, which is confusing	2024-07-18 03:09:38 +05:30
Debanjum	2ab8fb78b1	Migrate the PyPI package to use project name: khoj (#853) ### Changes - Deprecate [khoj-assistant](https://pypi.org/project/khoj-assistant) pypi package. Use more accurate and succinct pypi project name, [khoj](https://pypi.org/project/khoj) - Update references to use `khoj` pypi package in docs and code - Update pypi workflow to publish to both khoj, khoj-assistant for now - Update stale python 3.9 support mentioned in our pyproject Can't support python 3.9 as depend on [Django 5.0.7](https://pypi.org/project/Django/5.0.7/) which needs python >=3.10 ### Verify - Updated `pypi.yml` github workflow publishes to both (new) [khoj](https://pypi.org/project/khoj/1.16.1.dev16/), (old) [khoj-assistant](https://pypi.org/project/khoj-assistant/1.16.1.dev16/) pypi projects - Can install Khoj python package with `pip install khoj`	2024-07-17 01:05:51 -07:00
Debanjum Singh Solanky	30d60aaae9	Add, fix Khoj Docker container labels	2024-07-17 10:41:17 +05:30
Debanjum Singh Solanky	583fa3c188	Migrate the pypi package to khoj project name. Update references - Deprecate khoj-assistant pypi package. Use more accurate and succinct pypi project name, khoj - Update references to use khoj pypi package in docs and code instead of the legacy khoj-assistant pypi package - Update pypi workflow to publish to both khoj, khoj-assistant for now - Update stale python 3.9 support mentioned in our pyproject. Can't support python 3.9 as depend on latest django which supports >=3.10	2024-07-17 10:41:16 +05:30
Debanjum	23f61d49e0	Support syncing, searching images from Obsidian plugin (#847) - Sync images from Obsidian vault with Khoj server now that Khoj can OCR images - Support rendering images returned by Khoj search modal	2024-07-14 20:41:39 -07:00
Debanjum Singh Solanky	02658ad4fd	Upgrade Django version	2024-07-11 16:35:10 +05:30
Debanjum Singh Solanky	cbae8b68fb	Add DB migration from making bi_encode configs optional in #834	2024-07-11 16:33:31 +05:30
Debanjum Singh Solanky	3a75838196	Add Keyboard shortcuts to navigate in Khoj Desktop	2024-07-11 16:29:53 +05:30
Debanjum Singh Solanky	6c1861b319	Improve the prompt to generate images with DALLE3 and SD3 - Major - Ask for prompt in prose - Remove seed from SD3 image generation to improve diversity of output for a given prompt Otherwise for conversations with similar sounding prompts, the images would be almost exactly the same. This may be another indicator of SD3's inability to capture detailed instructions - Consistently use "prompt" wording instead of "query" in improved image generation prompts. Previously a mix of those terms was being used, which could confuse the chat model - Minor - Add day of week to prompt - Remove 2-5 sentence limit on instructions to SD3. It seems to be able to follow longer instructions just with less fidelity than DALLE. And the 2-5 sentence instruction limit wasn't being adhered to - Improve ability to edit, improve the image based on follow-up instructions by the user - Align prompts for DALLE and SD3. Only difference is to wrap text to be rendered in quotes for SD3. This improves its ability to render requested text. DALLE cannot render text as well or consistently	2024-07-11 16:29:53 +05:30
Debanjum Singh Solanky	21fe1a917b	Support syncing, searching images from Obsidian plugin	2024-07-11 16:22:31 +05:30
sabaimran	260aa61818	Remove tests for python3.9	2024-07-09 12:28:11 +05:30
sabaimran	4471c1e37f	Apply mitigations for piling up open connections - Because we're using a FastAPI api framework with a Django ORM, we're running into some interesting conditions around connection pooling and clean-up. We're ending up with a large pile-up of open, stale connections to the DB recurringly when the server has been running for a while. To mitigate this problem, given starlette and django run in different python threads, add a middleware that will go and call the connection clean up method in each of the threads.	2024-07-09 12:22:58 +05:30
Debanjum	0b1b262512	Add system dependencies required by RapidOCR to fix Khoj Docker image (#842) - Issue The Khoj docker build would fail with `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`. This was required by the Khoj RapidOCR python package dependency. - Fix A minimal set of system packages have been added to resolve this issue.	2024-07-08 22:16:16 +05:30
kxnarak	43413cd21f	add dependencies required by the RapidOCR python package	2024-07-08 18:26:19 +05:30
sabaimran	037e157648	Fix a variety of links	2024-07-08 16:49:13 +05:30
sabaimran	6b80bb3f37	Add a demo for the khoj mini application, minor updates to other pages, remove out of date demos page	2024-07-08 16:33:47 +05:30
Debanjum Singh Solanky	9e31ebff93	Release Khoj version 1.16.0	2024-07-07 18:26:10 +05:30
Debanjum Singh Solanky	54132efd67	Fix Khoj Obsidian plugin build	2024-07-07 18:26:10 +05:30
Debanjum Singh Solanky	510d9b3a29	Add short keys to open chat menu, new chat, search from Obsidian pane	2024-07-07 17:57:17 +05:30
Debanjum Singh Solanky	3e0c882e27	Transcribe only when keyboard shortcut or button pressed in Obsidian - Transcribe on holding Ctrl+s keyboard shortcut - Transcribe on holding the transcribe button pressed via mouse too - Make the transcribe button robust to inadvertent touches by using timeout - Do not transcribe, trigger auto-send on silences. Silence detection is super rudimentary, just blocks standard emanations by whisper when no speech	2024-07-07 17:57:17 +05:30
sabaimran	0eb000c3ea	Add health checks for the django ORM	2024-07-07 16:11:28 +05:30
Debanjum Singh Solanky	a31cd0dec1	Fix async batch delete of indexed entries	2024-07-06 22:45:26 +05:30
Debanjum	08b379c2ab	Fix, Improve Indexing, Deleting Files (#840) ### Fix - Fix degrade in speed when indexing large files - Resolve org-mode indexing bug by splitting current section only once by heading - Improve summarization by fixing formatting of text in indexed files ### Improve - Improve scaling user, admin flows to delete all entries for a user	2024-07-06 19:52:42 +05:30
Debanjum Singh Solanky	4a471979eb	Upgrade sentence-transformer package to version 3.0.1 Add einops dependency for some sentence transformer models like the nomic-embed	2024-07-06 19:35:59 +05:30
Debanjum Singh Solanky	d693baccbc	Make it optional to set the encoder, cross-encoder configs via admin UI	2024-07-06 19:35:59 +05:30
Debanjum Singh Solanky	1baebb8d0e	Identify markdown headings by any whitespace character after ^#+ Previously only markdown headings with space characters after # would be considered a heading. So ^##\t wouldn't be considered a valid heading	2024-07-06 19:35:59 +05:30
Debanjum Singh Solanky	010486fb36	Split current section once by heading to resolve org-mode indexing bug - Split once by heading (=first_non_empty) to extract current section body Otherwise child headings with same prefix as current heading will cause the section split to go into infinite loop - Also add check to prevent getting into recursive loop while trying to split entry into sub sections	2024-07-06 19:35:59 +05:30
Debanjum Singh Solanky	6a135b1ed7	Fix degrade in speed of indexing large files. Improve summarization Adding files to the DB for summarization was slow, buggy in two ways: - We were updating same text of modified files in DB = no of chunks per file times - The `" ".join(file_content)` code was breaking each character in the file content by a space. This formats the original file content incorrectly before storing in the DB Because this code ran in the main file indexing path, it was slowing down file indexing. Knowledge bases with larger files were impacted more strongly	2024-07-06 19:35:59 +05:30
Debanjum Singh Solanky	e6ffb6b52c	Improve scaling user flow to delete all entries - Delete entries by batch to improve efficiency of query at scale - Share code to delete all user entries between its async, sync methods - Add indicator to show when files being deleted on web config page	2024-07-06 19:35:59 +05:30
Debanjum Singh Solanky	1ab59865b5	Improve scaling admin flow to delete all entries for user	2024-07-06 19:35:59 +05:30
Debanjum	05138cbd0a	Use DOM Scripting, Add CSP to Web config pages. Disable CSP in Obsidian plugin (#834) - Add CSP to web config pages. Load phone no. validation js, css from S3 - Construct config page elements on Web via DOM scripting - Disable CSP in Khoj Obsidian as it interferes with Obsidian functionality - Other miscellaneous voice message level improvements (rate limit, listening animation)	2024-07-06 19:30:09 +05:30
Debanjum Singh Solanky	9bdb48807b	Ratelimit text to speech model. Validate share chat url domain - Do not log auth error message on server when Resend setup as Magic links for sign-in are now supported	2024-07-06 12:53:19 +05:30
Debanjum Singh Solanky	b334db0fca	Add CSP to web config pages. Load phone no validation js, css from S3	2024-07-06 12:48:28 +05:30
Debanjum Singh Solanky	2f034f807a	Construct config page elements on Web via DOM scripting. Minimize usage of innerHTML to prevent DOM clobbering and unintended escape by user input	2024-07-06 12:48:28 +05:30
Debanjum Singh Solanky	69c9e8cc08	Disable CSP in Khoj Obsidian as it interferes with Obsidian functionality The Khoj CSP interferes with other Obsidian features and plugins as CSP is applied page wide. For now chat message sanitization via Dompurify should suffice. Enable CSP when can scope it to only the Khoj Obsidian plugin.	2024-07-05 16:10:08 +05:30
Debanjum Singh Solanky	a353d883a0	Make it optional to set the encoder, cross-encoder configs via admin UI Upgrade sentence-transformer, add einops dependency for some sentence transformer models like nomic	2024-07-05 16:09:30 +05:30
Debanjum Singh Solanky	6d59ad7fc9	Add listening circle animation to speak button in Obsidian plugin Use icon active focus as color of animation button	2024-07-05 14:00:53 +05:30
Debanjum Singh Solanky	516af86575	Fix add, remove of the text to speech loader element in Obsidian	2024-07-04 17:38:45 +05:30
Debanjum Singh Solanky	814aca6d69	Skip summarize when not triggered via slash cmd and can't summarize Maybe better to fallback to non-summarize behavior if summarize intent is just inferred but we can't actually summarize because the single file added to conversation isn't satisfied	2024-07-04 13:31:00 +05:30
Debanjum	4446de00d3	Enable Voice, Keyboard Shortcuts in Khoj Obsidian Plugin (#837) - Simplify quick jump between Khoj side pane and main editor view using keyboard shortcuts - Enable voice chat in Obsidian to make interactions with Khoj more seamless	2024-07-04 13:28:29 +05:30
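
Commit 6a135b1ed7 in the log above says that `" ".join(file_content)` put a space between every character of the file content. A minimal Python sketch of that pitfall follows. It assumes `file_content` was a plain string; the variable's actual type and the surrounding Khoj code are not shown in this log, so the values here are hypothetical.

```python
# Hypothetical stand-in for the text of an indexed file.
file_content = "Khoj"

# str.join iterates over its argument. Iterating over a string yields
# single characters, so each character ends up separated by a space.
joined_chars = " ".join(file_content)

# Joining a list of strings is the case where join keeps each piece whole.
joined_parts = " ".join(["Khoj", "indexes", "files"])

print(joined_chars)
print(joined_parts)
```

If the content is already a single string, storing it as-is avoids the problem entirely.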

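Commit 1baebb8d0e in the log above changes markdown heading detection to accept any whitespace character after `^#+`, not only a space. The sketch below illustrates the difference with two regular expressions. Both patterns and the `is_heading` helper are assumptions made for illustration; the expressions Khoj actually uses are not shown in this log.

```python
import re

# Hypothetical patterns, written only to contrast the two behaviours.
SPACE_ONLY_HEADING = re.compile(r"^#+ ")       # one or more '#', then a literal space
ANY_WHITESPACE_HEADING = re.compile(r"^#+\s")  # one or more '#', then any whitespace


def is_heading(line: str, pattern: re.Pattern = ANY_WHITESPACE_HEADING) -> bool:
    """Return True if the line starts like a markdown heading under the given pattern."""
    return pattern.match(line) is not None


tab_heading = "##\tTitle"
print(is_heading(tab_heading, SPACE_ONLY_HEADING))  # tab after '##' is rejected
print(is_heading(tab_heading))                      # tab after '##' is accepted
```

A line such as `#tag`, with no whitespace after the `#`, is rejected by both patterns.
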

2992 commits