Commit graph

131 commits

Author SHA1 Message Date
Debanjum Singh Solanky
a60baa55fb Upgrade Django, a Khoj server dependency, to version 5.0.8 2024-08-20 12:32:00 -07:00
Debanjum Singh Solanky
acdc3f9470 Unwrap any json in md code block, when parsing chat actor responses
This is a more robust way to extract json output requested from
gemma-2 (2B, 9B) models which tend to return json in md codeblocks.

Other models should remain unaffected by this change.

Also removed the request to not wrap json in code blocks from prompts,
as the code now does the unwrapping automatically when present
2024-08-16 14:16:29 -05:00
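A minimal sketch of the unwrapping described in the commit above, assuming the actor response arrives as a raw string; the function name and regex are illustrative, not Khoj's actual implementation:

```python
import json
import re


def parse_actor_response(raw: str) -> dict:
    """Extract JSON from a chat actor response, unwrapping a markdown code block if present."""
    # Models like gemma-2 often wrap their JSON output in ```json ... ``` fences
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = match.group(1) if match else raw.strip()
    return json.loads(payload)
```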
Debanjum Singh Solanky
1cdfa8087c Update Khoj tagline to "Your Second Brain" 2024-08-05 02:27:05 +05:30
Debanjum Singh Solanky
53eabe0c06 Support Gemma 2 for Offline Chat
- Pass system message as the first user chat message as Gemma 2
  doesn't support system messages
- Use gemma-2 chat format
- Pass chat model name to generic, extract questions chat actors
  Used to figure out which chat template to use for the model
  For the generic chat actor the argument was already available but not
  being passed, which was confusing
2024-07-18 03:09:38 +05:30
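A rough sketch of folding the system prompt into the first user turn, since Gemma 2 does not accept a system role; the message shape and helper name are assumptions:

```python
def to_gemma2_messages(system_prompt: str, chat_history: list[dict]) -> list[dict]:
    """Gemma 2 has no system role, so prepend the system prompt to the first user message."""
    messages = list(chat_history)
    if messages and messages[0]["role"] == "user":
        first = messages[0]
        messages[0] = {"role": "user", "content": f"{system_prompt}\n\n{first['content']}"}
    else:
        messages.insert(0, {"role": "user", "content": system_prompt})
    return messages
```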
Debanjum Singh Solanky
583fa3c188 Migrate the pypi package to khoj project name. Update references
- Deprecate khoj-assistant pypi package. Use more accurate and
  succinct pypi project name, khoj
- Update references to use the khoj pypi package in docs and code
  instead of the legacy khoj-assistant pypi package
- Update pypi workflow to publish to both khoj, khoj-assistant for now
- Update stale python 3.9 support mentioned in our pyproject. Can't
  support python 3.9 as we depend on latest django, which requires >=3.10
2024-07-17 10:41:16 +05:30
Debanjum Singh Solanky
02658ad4fd Upgrade Django version 2024-07-11 16:35:10 +05:30
sabaimran
260aa61818 Remove tests for python3.9 2024-07-09 12:28:11 +05:30
sabaimran
4471c1e37f Apply mitigations for piling up open connections
- Because we're using the FastAPI framework with a Django ORM, we're running into some interesting conditions around connection pooling and clean-up. When the server has been running for a while, we recurringly end up with a large pile-up of open, stale connections to the DB. To mitigate this, given that starlette and django run in different python threads, add a middleware that calls the connection clean-up method in each of the threads.
2024-07-09 12:22:58 +05:30
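A minimal sketch of such a middleware, assuming Starlette's BaseHTTPMiddleware and Django's close_old_connections(); the class name is illustrative and the commit's actual mitigation may reach each thread differently:

```python
from django.db import close_old_connections
from starlette.middleware.base import BaseHTTPMiddleware


class CloseStaleDBConnectionsMiddleware(BaseHTTPMiddleware):
    """Drop stale Django DB connections around every request served by the FastAPI app."""

    async def dispatch(self, request, call_next):
        close_old_connections()
        try:
            return await call_next(request)
        finally:
            close_old_connections()


# Usage (FastAPI/Starlette): app.add_middleware(CloseStaleDBConnectionsMiddleware)
```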
Debanjum Singh Solanky
a353d883a0 Make it optional to set the encoder, cross-encoder configs via admin UI
Upgrade sentence-transformer, add einops dependency for some sentence
transformer models like nomic
2024-07-05 16:09:30 +05:30
Debanjum Singh Solanky
0d04018622 Install pydantic with optional email validator package
Otherwise Khoj fails on startup. Not sure why, must be new changes to
pydantic?
2024-06-24 16:12:20 +05:30
Debanjum Singh Solanky
22f6db0a6b Upgrade RapidOCR and enable for Python 3.12. Fix PDF OCR test 2024-06-22 16:01:55 +05:30
Debanjum Singh Solanky
55a23eae25 Upgrade pillow to fix pytest workflow failure 2024-06-22 15:17:43 +05:30
Raghav Tirumale
bd3b590153
Support Indexing Docx Files (#801)
* Add support for indexing docx files and associated unit tests

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2024-06-20 11:18:01 +05:30
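A minimal sketch of docx text extraction using the python-docx package; the PR's actual loader and entry structure may differ:

```python
from docx import Document  # from the python-docx package


def extract_docx_text(path: str) -> str:
    """Pull plain text out of a .docx file, paragraph by paragraph, for indexing."""
    doc = Document(path)
    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
```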
sabaimran
a57e1e7a14 Fix langchain, tenacity versions 2024-06-17 14:52:11 +05:30
sabaimran
ce9c14f894 Fix more packages related to langchain in the pyproject.toml 2024-06-17 14:38:05 +05:30
Debanjum Singh Solanky
179c70dba8 Upgrade Khoj llama-cpp, django and jinja dependencies 2024-06-04 09:05:53 +05:30
sabaimran
4aac84e1c1 Pin resend version in pyproject.toml 2024-05-30 07:05:11 +05:30
sabaimran
01cdc54ad0
Add support for Anthropic models (#760)
* Add support for chatting with Anthropic's suite of models

- Had to use a custom class because there was enough nuance with how the Anthropic SDK works that it was better to simply separate out the logic. The extract questions flow needed modification of the system prompt in order to work as intended with the Haiku model
2024-05-26 22:50:34 +05:30
sabaimran
0b7910d4af Pin the langchain-community version explicitly 2024-05-21 05:26:17 -05:00
sabaimran
2b8e5a86cc Update version for resend library in pyproject.toml 2024-05-09 13:43:27 -07:00
sabaimran
eb65532386 Use Django APScheduler in place of the SQLAlchemy one 2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
230d160602 Improve rendering task scheduled settings view and message
- Render crontime string in natural language in message & settings UI
- Show more fields in tasks web config UI
- Add link to the tasks settings page in task scheduled chat response
- Improve task variable names
  Rename executing_query to query_to_run and scheduling_query to
  scheduling_request
2024-05-01 08:30:10 +05:30
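One way to render a crontime string in natural language, as described in the commit above, is the cron-descriptor package; whether Khoj uses this exact library is an assumption:

```python
from cron_descriptor import get_description

# e.g. "0 9 * * 1" -> "At 09:00 AM, only on Monday"
print(get_description("0 9 * * 1"))
```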
Debanjum Singh Solanky
3ce06a938c Render scheduled task response as html to improve readability in email 2024-05-01 08:30:10 +05:30
Debanjum Singh Solanky
c17dbbeb92 Render next run time in user timezone in config, chat UIs
- Pass timezone string from ipapi to khoj via clients
  - Pass this data from web, desktop and obsidian clients to server
- Use user tz to render next run time of scheduled task in user tz
2024-05-01 08:30:10 +05:30
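A small sketch of rendering the next run time in the user's timezone, assuming a tz string like the one reported by ipapi (e.g. "Asia/Kolkata") and a UTC datetime from the scheduler:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def render_next_run(next_run_utc: datetime, user_tz: str) -> str:
    """Convert the scheduler's UTC next-run time into the user's local timezone for display."""
    local = next_run_utc.astimezone(ZoneInfo(user_tz))
    return local.strftime("%Y-%m-%d %H:%M %Z")


# Example usage
print(render_next_run(datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc), "Asia/Kolkata"))
```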
Debanjum Singh Solanky
c11742f443 Add chat actor to schedule run query for user at specified times
- Detect when user intends to schedule a task, aka reminder
  Add new output mode: reminder. Add example of selecting the reminder
  output mode
- Extract schedule time (as cron timestring) and inferred query to run
  from user message
- Use APScheduler to call chat with inferred query at scheduled time
- Handle reminder scheduling from both websocket and http chat requests

- Support constructing scheduled task using chat history as context
  Pass chat history to scheduled query generator for improved context
  for scheduled task generation
2024-05-01 08:28:59 +05:30
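A sketch of the scheduling step with APScheduler, as mentioned in the commit above; run_chat_query here is a hypothetical stand-in for the chat call that would answer the inferred query:

```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

scheduler = BackgroundScheduler()
scheduler.start()


def run_chat_query(query: str, user_id: str) -> None:
    """Placeholder for the chat call that actually answers the scheduled query."""
    print(f"Running scheduled query for {user_id}: {query}")


def schedule_task(crontime: str, query_to_run: str, user_id: str) -> None:
    """Register a job that runs the inferred query at the times described by the cron string."""
    scheduler.add_job(
        run_chat_query,
        CronTrigger.from_crontab(crontime),
        args=[query_to_run, user_id],
    )
```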
Debanjum
17a06f152c
Support Llama 3 and Improve Offline Chat Actors (#724)
- Add support for Llama 3 in Khoj offline mode
- Make chat actors generate valid json with more local models
- Fix offline chat actor tests
2024-04-25 14:00:56 +05:30
Debanjum Singh Solanky
89ef23de50 Upgrade gunicorn and make it only a production dependency 2024-04-24 11:28:55 +05:30
Debanjum Singh Solanky
a2e4e4bede Add support for Llama 3 in Khoj offline mode
- Improve extract question prompts to explicitly request JSON list
- Use llama-3 chat format if HF repo_id mentions llama-3. The
  llama-cpp-python logic for detecting when to use llama-3 chat format
  isn't robust enough currently
2024-04-24 09:40:00 +05:30
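A sketch of that repo_id check, passing an explicit chat_format to llama-cpp-python when the repo looks like Llama 3; the helper name and example repo are illustrative, not Khoj's defaults:

```python
from llama_cpp import Llama


def infer_chat_format(repo_id: str) -> str | None:
    """Force the llama-3 chat template when the HF repo id mentions it; otherwise let llama-cpp-python auto-detect."""
    return "llama-3" if "llama-3" in repo_id.lower() else None


repo_id = "bartowski/Meta-Llama-3-8B-Instruct-GGUF"  # example repo
llm = Llama.from_pretrained(
    repo_id=repo_id,
    filename="*Q4_K_M.gguf",
    chat_format=infer_chat_format(repo_id),
)
```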
sabaimran
46210695b6 Pin version of huggingface hub explicitly to ensure relevant constants are present. Closes #708 2024-04-17 01:09:36 +05:30
Debanjum Singh Solanky
4977b55106 Use offline chat prompt config to set context window of loaded chat model
Previously you couldn't configure the n_ctx of the loaded offline chat
model. This made it hard to use a good offline chat model (which these
days also has a larger context window) on machines with lower VRAM
2024-04-14 02:35:36 +05:30
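A minimal sketch of what the config now allows, assuming llama-cpp-python; the path and values below are illustrative:

```python
from llama_cpp import Llama

# n_ctx comes from the offline chat prompt config instead of being hard-coded
llm = Llama(
    model_path="/models/offline-chat-model.gguf",  # illustrative path
    n_ctx=8192,       # context window size taken from the prompt config
    n_gpu_layers=-1,  # offload all layers to GPU when available
)
```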
Debanjum
9a48f72041
Index more text file types from Desktop, Github (#692)
### Index more text file types 
- Index all text, code files in Github repos. Not just md, org files
- Send more text file types from Desktop app and improve indexing them
- Identify file type by content & allow server to index all text files

### Deprecate Github Indexing Features
- Stop indexing commits, issues and issue comments in a Github repo
- Skip indexing Github repo on hitting Github API rate limit

### Fixes and Improvements
- **Fix indexing files in sub-folders from Desktop app**
- Standardize structure of text to entries to match other entry processors
2024-04-12 00:08:29 +05:30
sabaimran
3fe94a67b0
Send welcome emails when a new user signs up (#691)
* Don't trigger any re-indexing on server initialization

* Integrate Resend to send welcome emails when a new user signs up

- Only send if this is the first time they've signed in
- Configure welcome email with basic styling, as more complex designs and the style tag did not work
2024-04-10 19:57:33 +05:30
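A rough sketch of sending the welcome email with the Resend Python SDK; the sender address, subject, and body are placeholders, and the first-sign-in gating is elided:

```python
import os

import resend

resend.api_key = os.environ["RESEND_API_KEY"]


def send_welcome_email(to_address: str) -> None:
    """Send a simply-styled welcome email to a newly signed-up user."""
    resend.Emails.send(
        {
            "from": "hello@example.com",  # placeholder sender
            "to": to_address,
            "subject": "Welcome to Khoj",
            "html": "<p>Welcome! Here's how to get started.</p>",
        }
    )
```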
Debanjum Singh Solanky
89915dcb4c Identify file type by content & allow server to index all text files
- Use Magika's AI for a tiny, portable and better file type
  identification system
- Existing file type identification tools like `file' and `magic'
  require system-level packages that may not be installed by default
  on all operating systems (e.g. the `file' command on Windows)
2024-04-09 20:19:39 +05:30
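A small sketch of content-based file type detection with Magika, assuming the 0.5.x Python API; the helper name and group check are illustrative:

```python
from pathlib import Path

from magika import Magika

magika = Magika()


def is_text_file(path: Path) -> bool:
    """Decide whether a file is plain text or code from its content, not its extension."""
    result = magika.identify_path(path)
    return result.output.group in ("text", "code")
```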
Debanjum Singh Solanky
14fbf594b2 Support using Python 3.12 with Khoj
- RapidOCR for indexing image PDFs doesn't currently support python 3.12.
  It's an optional dependency anyway, so only install it if python < 3.12
- Run unit tests with python version 3.12 as well

Resolves #522
2024-04-07 11:23:44 +05:30
sabaimran
47fc7e1ce6 Rebase with master 2024-04-02 16:16:06 +05:30
Debanjum Singh Solanky
886d49e3a4 Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat 2024-03-31 00:59:20 +05:30
Debanjum Singh Solanky
90c5b3c410 Update stale Khoj pypi package metadata
Use latest License, Intended Audience and Dev Status
2024-03-29 00:06:55 +05:30
sabaimran
56da96b2e9 Increase minimum python required in the pyproject, use python 3.11 for building the wheel in the workflow 2024-03-28 12:19:07 +05:30
Debanjum Singh Solanky
8ca39a436c Use llama.cpp for offline chat models
- Benefits of moving to llama-cpp-python from gpt4all:
  - Support for all GGUF format chat models
  - Support for AMD, Nvidia, Mac, Vulkan GPU machines (instead of just Vulkan, Mac)
  - Supports models with more capabilities like tools, schema
    enforcement, speculative decoding, image gen etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
  chat models

- Load offline chat model when present on disk without requiring internet
  - Load model onto GPU if not disabled and device has GPU
  - Load model onto CPU if loading model onto GPU fails
  - Create helper function to check and load model from disk, when model
    glob is present on disk.

    `Llama.from_pretrained' needs internet to get repo info from
    HuggingFace. This isn't required if the model is already downloaded

    Didn't find any existing HF or llama.cpp method that looked for model
    glob on disk without internet
2024-03-26 22:33:01 +05:30
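A sketch of the disk-first loading behaviour with a GPU-then-CPU fallback described in the commit above; the cache path and helper name are assumptions, not Khoj's actual code:

```python
import glob
import os

from llama_cpp import Llama

MODEL_CACHE = os.path.expanduser("~/.cache/huggingface")  # assumed cache location


def load_offline_chat_model(repo_id: str, filename_glob: str, n_ctx: int = 8192) -> Llama:
    """Prefer a locally downloaded GGUF (no internet needed); try GPU first, then fall back to CPU."""
    matches = glob.glob(os.path.join(MODEL_CACHE, "**", filename_glob), recursive=True)
    for n_gpu_layers in (-1, 0):  # -1: offload everything to GPU, 0: pure CPU
        try:
            if matches:
                return Llama(model_path=matches[0], n_ctx=n_ctx, n_gpu_layers=n_gpu_layers)
            # Only reaches the network when no matching file is found on disk
            return Llama.from_pretrained(
                repo_id=repo_id, filename=filename_glob, n_ctx=n_ctx, n_gpu_layers=n_gpu_layers
            )
        except Exception:
            continue
    raise RuntimeError(f"Could not load offline chat model {repo_id}")
```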
sabaimran
36af9776e6 Add the websockets dependency to pyproject.toml 2024-03-20 14:11:18 +05:30
Debanjum Singh Solanky
8cdfaf41ec Update project URLs to show on pypi project page 2024-03-15 04:03:39 +05:30
Debanjum
3abe7ccb26
Improve Online Search Speed and Context (#670)
### Major
- Read web pages in parallel to improve chat response time
- Read web pages directly when Olostep proxy not setup
- Include search results & web page content in online context for chat response

### Minor
- Simplify, modularize and add type hints to online search functions
2024-03-11 22:16:30 +05:30
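A minimal sketch of reading web pages in parallel with aiohttp and asyncio.gather, as described in the PR above; timeouts and response parsing are simplified:

```python
import asyncio

import aiohttp


async def read_webpages(urls: list[str]) -> list[str]:
    """Fetch all pages concurrently instead of one after another."""
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:

        async def fetch(url: str) -> str:
            async with session.get(url) as response:
                return await response.text()

        return await asyncio.gather(*(fetch(url) for url in urls))


# Example usage: asyncio.run(read_webpages(["https://example.com"]))
```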
Debanjum Singh Solanky
88f096977b Read webpages directly when Olostep proxy not setup
This is useful for self-hosted, individual user, low traffic setups
where a proxy service is not required
2024-03-11 18:41:02 +05:30
Debanjum Singh Solanky
1105d8814f Use cross-encoder to rerank search results by default on GPU machines
Latest sentence-transformer package uses GPU for cross-encoder. This
makes it fast enough to enable reranking on machines with GPU.

Enabling search reranking by default allows (at least) users with GPUs
to side-step learning the UI affordance to rerank results
(i.e. hitting Cmd/Ctrl-Enter or ENTER).
2024-03-10 14:29:21 +05:30
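A minimal sketch of reranking search results with a sentence-transformers CrossEncoder; the model name is an example, not necessarily Khoj's default:

```python
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model


def rerank(query: str, passages: list[str], top_k: int = 10) -> list[str]:
    """Score each (query, passage) pair with the cross-encoder and keep the most relevant ones."""
    scores = cross_encoder.predict([(query, passage) for passage in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```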
sabaimran
81beb7940c
Upload generated images to s3, if AWS credentials and bucket is available (#667)
* Upload generated images to s3, if AWS credentials and bucket is available.
- In clients, render the images via the URL if it's returned with a text-to-image2 intent type
* Make the loading screen more intuitive and less jerky, and update the programmatic copy button
* Update the loading icon when waiting for a chat response
2024-03-08 10:54:13 +05:30
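A rough sketch of the upload path with boto3 for the PR above; the bucket, key layout, content type, and returned URL format are assumptions:

```python
import uuid

import boto3

s3 = boto3.client("s3")  # picks up AWS credentials from the environment


def upload_generated_image(image_bytes: bytes, bucket: str) -> str:
    """Store a generated image in S3 and return a URL clients can render directly."""
    key = f"generated/{uuid.uuid4()}.webp"
    s3.put_object(Bucket=bucket, Key=key, Body=image_bytes, ContentType="image/webp")
    return f"https://{bucket}.s3.amazonaws.com/{key}"
```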
Debanjum Singh Solanky
4696577636 Upgrade python dependencies 2024-02-16 17:41:09 +05:30
Debanjum Singh Solanky
e21a8530f3 Move used python packages for test into dev dependency group
The test dependency group was being used independently
2024-02-16 17:41:09 +05:30
Debanjum Singh Solanky
cf4a524988 Move production dependencies to prod python packages group
This will reduce the khoj dependencies self-hosting users have to install

- Move auth production dependencies to prod python packages group
  - Only enable authentication API router if not in anonymous mode
  - Improve error with requirements to enable authentication when not in
    anonymous mode
2024-02-16 17:41:08 +05:30
sabaimran
208ccc83ec Fix version of gpt4all to 2.1.0 as it's not backwards compatible 2024-02-10 09:32:04 +05:30
Debanjum
d1bfb245df
Improve Khoj Chat and Settings UI (#630)
* Fix license in pyproject.toml. Remove unused utils.state import

* Use single debug mode check function. Disable telemetry in debug mode

- Use single logic to check if khoj is running in debug mode.
  Previously there were 3 different variants of the check

- Do not log telemetry if KHOJ_DEBUG is set to true. Previously telemetry
  wasn't logged even when KHOJ_DEBUG was set to false

* Respect line breaks in user, khoj chat messages to improve formatting

* Disable Whatsapp config section on web client if Twilio not configured

Simplify Whatsapp configuration status checking js by standardizing
external input to lower case

* Disable Phone API when Twilio not setup and rate limit calls to it

- Move phone api to separate router and only enable it if Twilio enabled
- Add rate-limiting to OTP and verification calls

* Add slugs for phone rate limiting

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2024-01-29 18:03:43 +05:30
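A sketch of a single debug-mode check driven by the KHOJ_DEBUG environment variable, as described in the PR above; the function name is illustrative:

```python
import os


def in_debug_mode() -> bool:
    """Single source of truth for debug mode, replacing scattered ad hoc checks."""
    return os.getenv("KHOJ_DEBUG", "false").lower() == "true"


# Telemetry is skipped whenever in_debug_mode() returns True
```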