sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-27 17:35:07 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	8ca39a436c	Use llama.cpp for offline chat models - Benefits of moving to llama-cpp-python from gpt4all: - Support for all GGUF format chat models - Support for AMD, Nvidia, Mac, Vulcan GPU machines (instead of just Vulcan, Mac) - Supports models with more capabilities like tools, schema enforcement, speculative ddecoding, image gen etc. - Upgrade default chat model, prompt size, tokenizer for new supported chat models - Load offline chat model when present on disk without requiring internet - Load model onto GPU if not disabled and device has GPU - Load model onto CPU if loading model onto GPU fails - Create helper function to check and load model from disk, when model glob is present on disk. `Llama.from_pretrained' needs internet to get repo info from HuggingFace. This isn't required, if the model is already downloaded Didn't find any existing HF or llama.cpp method that looked for model glob on disk without internet	2024-03-26 22:33:01 +05:30
sabaimran	e323a6d69b	Include additional user context in the image generation flow (#660 ) * Make major improvements to the image generation flow - Include user context from online references and personal notes for generating images - Dynamically select the modality that the LLM should respond with - Retun the inferred context in the query response for the dekstop, web chat views to read * Add unit tests for retrieving response modes via LLM * Move output mode unit tests to the actor suite, rather than director * Only show the references button if there is at least one available * Rename aget_relevant_modes to aget_relevant_output_modes * Use a shared method for generating reference sections, simplify some of the prompting logic * Make out of space errors in the desktop client more obvious	2024-03-06 13:48:41 +05:30
Debanjum Singh Solanky	0d949140f4	Fix actor, director tests using freeze time by ignoring transformers package transformers package was causing freeze time to fail during setup	2024-02-06 03:00:48 +05:30
sabaimran	79913d4c17	Add isort to the pre-commit configuration and apply it to the whole project (#595 ) * Apply isort to the entire repository * Fix missing import issues in text_to_entries * Fix imports in migration files	2023-12-28 18:04:02 +05:30
Debanjum Singh Solanky	a0a7ab7ec8	Rename conversation.gpt4all package to conversation.offline	2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky	0f1ebcae18	Upgrade to latest GPT4All. Use Mistral as default offline chat model GPT4all now supports gguf llama.cpp chat models. Latest GPT4All (+mistral) performs much at least 3x faster. On Macbook Pro at ~10s response start time vs 30s-120s earlier. Mistral is also a better chat model, although it hallucinates more than llama-2	2023-10-22 19:04:23 -07:00
Debanjum Singh Solanky	1a9023d396	Update Chat Actor test to not incept with prior world knowledge	2023-10-15 17:22:44 -07:00
Debanjum Singh Solanky	56bd69d5af	Improve Llama v2 extract questions actor and associated prompt - Format extract questions prompt format with newlines and whitespaces - Make llama v2 extract questions prompt consistent - Remove empty questions extracted by offline extract_questions actor - Update implicit qs extraction unit test for offline search actor	2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky	13b16a4364	Use default Llama 2 supported by GPT4All Remove custom logic to download custom Llama 2 model. This was added as GPT4All didn't support Llama 2 when it was added to Khoj	2023-10-03 19:01:54 -07:00
sabaimran	343854752c	Improve docker builds for local hosting (#476 ) * Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions * Move configure_search method into indexer * Add conditional installation for gpt4all * Add hint to go to localhost:42110 in the docs. Addresses #477	2023-09-08 17:07:26 -07:00
Debanjum Singh Solanky	95acb1583d	Update local Chat Actor and Director tests expected to fail	2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky	c2b7a14ed5	Fix context, response size for Llama 2 to stay within max token limits Create regression text to ensure it does not throw the prompt size exceeded context window error	2023-08-01 20:52:00 -07:00
sabaimran	d8fa967b43	Update chat actor unit tests for greater accuracy and benchmarking	2023-08-01 12:24:43 -07:00
sabaimran	124d97c26d	Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352 ) * Working example with LlamaV2 running locally on my machine - Download from huggingface - Plug in to GPT4All - Update prompts to fit the llama format * Add appropriate prompts for extracting questions based on a query based on llama format * Rename Falcon to Llama and make some improvements to the extract_questions flow * Do further tuning to extract question prompts and unit tests * Disable extracting questions dynamically from Llama, as results are still unreliable	2023-07-27 20:51:20 -07:00
sabaimran	8b2af0b5ef	Add support for our first Local LLM 🤖🏠 (#330 ) * Add support for gpt4all's falcon model as an additional conversation processor - Update the UI pages to allow the user to point to the new endpoints for GPT - Update the internal schemas to support both GPT4 models and OpenAI - Add unit tests benchmarking some of the Falcon performance * Add exc_info to include stack trace in error logs for text processors * Pull shared functions into utils.py to be used across gpt4 and gpt * Add migration for new processor conversation schema * Skip GPT4All actor tests due to typing issues * Fix Obsidian processor configuration in auto-configure flow * Rename enable_local_llm to enable_offline_chat	2023-07-26 16:27:08 -07:00

15 commits