Commit graph

426 commits

Author SHA1 Message Date
Debanjum
fc6be543bd Improve GPQA eval prompt to improve parsing of answers from Khoj responses
2024-11-30 17:21:09 -08:00
Debanjum
29e801c381 Add MATH500 dataset to eval
Evaluate simpler MATH500 responses with Gemini 1.5 Flash

This improves both the speed and cost of running this eval
2024-11-28 12:48:25 -08:00
Debanjum
22aef9bf53 Add GPQA (diamond) dataset to eval 2024-11-28 12:48:25 -08:00
Debanjum
70b7e7c73a Improve load of complex json objects. Use it to pick tool, run code
Gemini doesn't work well when asked to output JSON objects directly. Using it
to output raw JSON strings with complex, multi-line structures requires more
intensive clean-up of the raw JSON string before parsing (see the sketch below)
2024-11-26 17:37:57 -08:00
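A minimal sketch of the kind of clean-up this commit describes, assuming a hypothetical load_complex_json helper; the actual clean-up in Khoj likely handles more edge cases:

```python
import json
import re

def load_complex_json(raw: str) -> dict:
    """Best-effort parse of a raw LLM response into a JSON object."""
    cleaned = raw.strip()
    # Strip markdown code fences the model may wrap its answer in
    cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*```$", "", cleaned)
    # Drop trailing commas before closing braces/brackets
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)
```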
Debanjum
ed364fa90e Track running costs & accuracy of eval runs in progress
Collect, display and store the running costs & accuracy of the eval run.

This provides more insight into eval runs during execution instead of
having to wait until the eval run completes.
2024-11-20 12:40:51 -08:00
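As an illustration of the running-stats idea, a sketch of the kind of accumulator this could use; the function name and print format are hypothetical:

```python
from typing import Optional

running_cost, correct, evaluated = 0.0, 0, 0

def update_running_stats(decision: Optional[bool], cost: float) -> None:
    """Accumulate and print running accuracy and cost after each evaluated example."""
    global running_cost, correct, evaluated
    running_cost += cost
    if decision is not None:
        evaluated += 1
        correct += int(decision)
    accuracy = correct / evaluated if evaluated else 0.0
    print(f"Running accuracy: {accuracy:.2%} | Running cost: ${running_cost:.4f}")
```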
Debanjum
45c623f95c Dedupe, organize chat actor, director tests
- Move Chat actor tests that were previously in chat director tests file
- Dedupe online, offline io selector chat actor tests
2024-11-18 16:10:50 -08:00
Debanjum
2a76c69d0d Run online, offline chat actor, director tests for any supported provider
- Previously online chat actor, director tests only worked with OpenAI.
  This change allows running them for any supported online provider,
  including Google, Anthropic and OpenAI.

- Enable online/offline chat actor, director tests in two ways:
  1. Explicitly, by setting the KHOJ_TEST_CHAT_PROVIDER environment variable to
     google, anthropic, openai or offline
  2. Implicitly, by the first API key found from openai, google or anthropic.

- Default offline chat provider to use Llama 3.1 3B for faster, lower
  compute test runs
2024-11-18 15:11:37 -08:00
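A sketch of the two selection paths described above, assuming a hypothetical get_chat_provider helper and the {OPENAI,GEMINI,ANTHROPIC}_API_KEY env var names mentioned elsewhere in this log:

```python
import os

def get_chat_provider() -> str:
    """Pick the test chat provider: explicit env var first, else first API key found, else offline."""
    explicit = os.getenv("KHOJ_TEST_CHAT_PROVIDER")
    if explicit in ("google", "anthropic", "openai", "offline"):
        return explicit
    for env_var, provider in [
        ("OPENAI_API_KEY", "openai"),
        ("GEMINI_API_KEY", "google"),
        ("ANTHROPIC_API_KEY", "anthropic"),
    ]:
        if os.getenv(env_var):
            return provider
    return "offline"
```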
Debanjum
653127bf1d Improve data source, output mode selection
- Set output mode to a single string. Specify output schema in prompt
  - Both these should encourage the model to select only 1 output mode
    instead of encouraging it in the prompt too many times
  - Output schema should also improve schema following in general
- Standardize variable, func name of io selector for readability
- Fix chat actor tests to test the io selector chat actor
- Make chat actor return sources, output separately for better
  disambiguation, at least during tests, for now
2024-11-18 15:11:37 -08:00
Debanjum
a2ccf6f59f Fix github workflow to start Khoj, connect to PG and upload results
- Do not trigger tests to run in CI on update to evals
2024-11-18 04:25:15 -08:00
Debanjum
7c0fd71bfd
Add GitHub workflow to quiz Khoj across modes and specified evals (#982)
- Evaluate Khoj on 200 random questions from each of the Google FRAMES and OpenAI SimpleQA benchmarks across *general*, *default* and *research* modes
- Run eval with Gemini 1.5 Flash as test giver and Gemini 1.5 Pro as test evaluator models
- Trigger eval workflow on release or manually
- Make dataset, khoj mode and sample size configurable when triggered via manual workflow
- Enable Web search, webpage read tools during evaluation
2024-11-18 02:19:30 -08:00
sabaimran
0eba6ce315 When diagram generation fails, save to conversation log
- Update tool name when choosing tools to execute
2024-11-17 13:23:12 -08:00
sabaimran
7e662a05f8 Merge branch 'master' of github.com:khoj-ai/khoj into features/improve-tool-selection 2024-11-17 12:26:55 -08:00
Debanjum
41d9011a26 Move evaluation script into tests/evals directory
This should give more space for eval scripts, results and readme
2024-11-17 02:08:20 -08:00
Debanjum
d9d5884958 Enable evaluating Khoj on the OpenAI SimpleQA bench using the eval script
- Just load the raw CSV from the OpenAI bucket. Normalize it into FRAMES format
- Improve docstring for frames datasets as well
- Log the load dataset perf timer at info level
2024-11-17 02:08:20 -08:00
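A rough sketch of the load-and-normalize step, assuming pandas; the bucket URL and the SimpleQA/FRAMES column names shown here are assumptions, not the script's actual values:

```python
import pandas as pd

# Assumed location of the raw SimpleQA csv in OpenAI's public bucket
SIMPLEQA_CSV_URL = "https://openaipublic.blob.core.windows.net/simple-evals/simple_qa_test_set.csv"

def load_simpleqa_as_frames(sample_size: int = 200) -> pd.DataFrame:
    """Load the raw SimpleQA csv and normalize it into the FRAMES column format."""
    raw = pd.read_csv(SIMPLEQA_CSV_URL)
    # Rename to the (assumed) FRAMES-style columns the eval script expects
    frames = raw.rename(columns={"problem": "Prompt", "answer": "Answer"})[["Prompt", "Answer"]]
    return frames.sample(n=min(sample_size, len(frames)))
```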
Debanjum
eb5bc6d9eb Remove Talc search bench from Khoj eval script 2024-11-17 02:08:20 -08:00
sabaimran
c77dc84a68 Remove output_modes function reference in chat tests 2024-11-15 14:03:07 -08:00
Debanjum
9fc44f1a7f Enable evaluating Khoj on the Talc Search Bench using the eval script
- Just load the raw jsonl from GitHub and normalize it into FRAMES format
- Color printed accuracy in eval script to blue for readability
2024-11-13 22:50:14 -08:00
Debanjum
f4e37209a2 Improve error handling, display and configurability of eval script
- Default to evaluation decision of None when either agent or
  evaluator LLM fails. This fixes accuracy calculations on errors
- Fix showing color for decision True
- Enable arg flags to specify output results file paths
2024-11-13 14:32:22 -08:00
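A sketch of how accuracy might be computed once failures default to None; the function name and the choice to exclude failures from the denominator are illustrative:

```python
from typing import List, Optional

def compute_accuracy(decisions: List[Optional[bool]]) -> float:
    """Accuracy over evaluated examples only; None marks agent or evaluator LLM failures."""
    evaluated = [d for d in decisions if d is not None]
    return sum(evaluated) / len(evaluated) if evaluated else 0.0
```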
Debanjum
ff5c10c221 Do not CRUD on entries, files & conversations in DB for null user
Increase defense-in-depth by reducing paths to create, read, update or
delete entries, files and conversations in DB when user is unset.
2024-11-11 12:20:07 -08:00
sabaimran
8805e731fd Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter 2024-11-10 19:24:11 -08:00
Debanjum
f967bdf702 Show correct example index being currently processed in frames eval
Previously the batch start index wasn't being passed, so all batches
started in parallel showed the same example index being processed

This change doesn't impact the evaluation itself, just the index shown
for the example currently being evaluated
2024-11-10 14:49:51 -08:00
Debanjum
84a8088c2b Only evaluate non-empty responses to reduce eval script latency, cost
An empty response from Khoj is always an incorrect response, so there is
no need to call an evaluator agent to check it (see the sketch below)
2024-11-10 14:49:51 -08:00
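A small sketch of the short-circuit this commit describes; the grading function name is hypothetical and the evaluator is passed in as a callable to keep the example self-contained:

```python
from typing import Callable, Optional

def grade_response(
    response: str,
    ground_truth: str,
    evaluator: Callable[[str, str], bool],
) -> Optional[bool]:
    """Grade a Khoj response, skipping the evaluator LLM call for empty responses."""
    if not response.strip():
        # Empty responses are always incorrect, so don't spend an evaluator call on them
        return False
    return evaluator(response, ground_truth)
```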
sabaimran
623a97a9ee Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter 2024-11-07 17:18:23 -08:00
sabaimran
cf0bcec0e7 Revert SKIP_TESTS flag in offline chat director tests 2024-11-04 19:06:54 -08:00
sabaimran
1f372bf2b1 Update file summarization unit tests now that multiple files are allowed 2024-11-04 17:45:54 -08:00
Debanjum
1ccbf72752 Use logger instead of print to track eval 2024-11-04 00:40:26 -08:00
Debanjum
791eb205f6 Run prompt batches in parallel for faster eval runs 2024-11-02 04:58:03 -07:00
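A minimal sketch of parallel batch evaluation with a thread pool; the batch size and function names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def run_eval_in_batches(examples, process_batch, batch_size=10):
    """Evaluate batches of prompts in parallel threads and flatten the results."""
    starts = range(0, len(examples), batch_size)
    batches = [examples[i:i + batch_size] for i in starts]
    with ThreadPoolExecutor() as executor:
        # Pass each batch's start index too, so progress logs show the right example number
        batch_results = list(executor.map(process_batch, batches, starts))
    return [result for batch in batch_results for result in batch]
```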
Debanjum
96904e0769 Add script to evaluate Khoj on Google's FRAMES benchmark
Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of an agent.

The script uses Gemini as an LLM Judge to evaluate Khoj responses to
the FRAMES benchmark prompts against the ground truth provided by the benchmark.
2024-11-02 04:57:42 -07:00
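A sketch of the LLM-judge call using the google-generativeai client; the judge prompt and model choice here are illustrative, not the script's exact prompt:

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-pro")

def judge_response(question: str, khoj_answer: str, ground_truth: str) -> bool:
    """Ask the Gemini judge whether Khoj's answer agrees with the FRAMES ground truth."""
    prompt = (
        f"Question: {question}\n"
        f"Ground truth: {ground_truth}\n"
        f"Candidate answer: {khoj_answer}\n"
        "Reply with only TRUE if the candidate answer matches the ground truth, else FALSE."
    )
    verdict = judge.generate_content(prompt).text.strip().upper()
    return verdict.startswith("TRUE")
```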
Debanjum
50ffd7f199 Merge branch 'master' into features/advanced-reasoning 2024-10-28 04:10:59 -07:00
Debanjum
3e17ab438a
Separate notes, online context from user message sent to chat models (#950)
Overview
---
- Put context into separate user message before sending to chat model.
  This should improve model response quality and truncation logic in code
- Pass online context from chat history to chat model for response.
  This should improve response speed when previous online context can be reused
- Improve format of notes, online context passed to chat models in prompt.
  This should improve model response quality

Details
---
The document and online search context are now passed as separate user
messages to the chat model, instead of being added to the final user message.

This will
- Improve the model's ability to differentiate data from the user query.
  That should improve response quality and reduce prompt injection
  probability
- Make truncation logic simpler and more robust.
  When the context window is hit, messages can simply be popped to
  auto-truncate context in order of context, user, assistant message for
  each conversation turn in history until the current user query is reached.

  The complex, brittle logic to extract the user query from context in the
  last user message is no longer required.
2024-10-28 02:03:18 -07:00
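A simplified sketch of the pop-based truncation this enables; the message shape and token-counting hook are assumptions:

```python
def truncate_conversation(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
    """Drop oldest messages until the conversation fits the context window.

    Each turn is ordered context, user, assistant, so popping from the front
    sheds context first and always keeps the current user query (the last message).
    """
    while len(messages) > 1 and sum(count_tokens(m["content"]) for m in messages) > max_tokens:
        messages.pop(0)
    return messages
```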
sabaimran
30f9225021 Merge branch 'master' of github.com:khoj-ai/khoj into features/advanced-reasoning 2024-10-23 19:15:51 -07:00
sabaimran
f3ce47b445
Create explicit flow to enable the free trial (#944)
* Create explicit flow to enable the free trial

The current design is confusing. It obfuscates the fact that the user is on a free trial. This design will make the opt-in explicit and more intuitive.

* Use the Subscription Type enum instead of hardcoded strings everywhere

* Use length of free trial in the frontend code as well
2024-10-23 15:29:23 -07:00
Debanjum Singh Solanky
39a613d3bc Fix up openai chat actor tests 2024-10-22 03:09:36 -07:00
sabaimran
ad197be70c Fix PDFs unit test, skip OCR 2024-10-20 22:25:41 -07:00
sabaimran
a979457442 Add unit tests for agents
- Add test permutations for agents with and without a knowledge base, and for private, public and different users.
2024-10-20 20:04:50 -07:00
Debanjum Singh Solanky
6a8fd9bf33 Reorder embeddings search arguments based on argument importance 2024-10-10 04:45:00 -07:00
Debanjum Singh Solanky
91c76d4152 Intelligently initialize a decent default set of chat model options
Given the LLM landscape is rapidly changing, providing a good default
set of options should help reduce decision fatigue to get started

Improve initialization flow during first run
- Set Google, Anthropic Chat models too
Previously only Offline, OpenAI chat models could be set during init

- Add multiple chat models for each LLM provider
  Interactively set a comma separated list of models for each provider

- Auto add default chat models for each provider in non-interactive
  mode if the {OPENAI,GEMINI,ANTHROPIC}_API_KEY env var is set

- Do not ask for max_tokens, tokenizer for offline models during
  initialization. Use better defaults inferred in code instead

- Explicitly set default chat model to use
  If unset, it implicitly defaults to using the first chat model.
  Make it explicit to reduce this confusion

Resolves #882
2024-09-19 20:32:08 -07:00
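A sketch of the non-interactive auto-add described above; the concrete default model names are assumptions for illustration:

```python
import os

# Assumed default model names per provider, for illustration only
DEFAULT_CHAT_MODELS = {
    "openai": ["gpt-4o-mini", "gpt-4o"],
    "google": ["gemini-1.5-flash", "gemini-1.5-pro"],
    "anthropic": ["claude-3-5-sonnet-20240620", "claude-3-haiku-20240307"],
}
API_KEY_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "google": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def auto_default_chat_models() -> dict:
    """In non-interactive mode, add default chat models for each provider whose API key env var is set."""
    return {
        provider: models
        for provider, models in DEFAULT_CHAT_MODELS.items()
        if os.getenv(API_KEY_ENV_VARS[provider])
    }
```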
Debanjum Singh Solanky
bc2e889d72 Update chat director, client tests to call chat API using new POST method 2024-09-11 17:28:06 -07:00
Debanjum Singh Solanky
241b9009ba Update OpenAI chat actor tests to handle more questions being extracted 2024-09-11 16:16:55 -07:00
Raghav Tirumale
549686a7a4
Add Vision Support (#889)
# Summary of Changes
* New UI to show preview of image uploads
* ChatML message changes to support gpt-4o vision based responses on images
* AWS S3 image uploads for persistent image context in conversations
* Database changes to have `vision_enabled` option in server admin panel while configuring models
* Render previously uploaded images in the chat history, show uploaded images for pending msgs
* Pass the uploaded_image_url through to subqueries
* Allow image to render upon first message from the homepage
* Add rendering support for images to shared chat as well
* Fix some UI/functionality bugs in the share page
* Convert user attached images for chat to webp format before upload
* Use placeholder for attached image for data source, response mode actors
* Update all clients to call /api/chat as a POST instead of GET request
* Fix copying chat messages with images to clipboard

TL;DR: Add vision support for OpenAI models on Khoj via the web UI!

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2024-09-09 15:22:18 -07:00
Debanjum Singh Solanky
238bc11a50 Fix, improve openai chat actor, director tests & online search prompt 2024-08-22 19:09:33 -07:00
Debanjum Singh Solanky
9986c183ea Default to gpt-4o-mini instead of gpt-3.5-turbo in tests, func args
GPT-4o-mini is cheaper, smarter and can hold more context than
GPT-3.5-turbo. In production, we also default to gpt-4o-mini, so it makes
sense to upgrade defaults and tests to work with it
2024-08-22 19:04:49 -07:00
Debanjum Singh Solanky
58c8068079 Upgrade default offline chat model to llama 3.1 2024-08-20 09:28:56 -07:00
Debanjum
39e566ba91
Improve Document, Online Search to Answer Vague or Meta Questions (#870)
- Major
  - Improve doc search actor performance on vague, random or meta questions
  - Pass user's name to document and online search actors prompts

- Minor
  - Fix and improve openai chat actor tests
  - Remove unused max tokens arg to extract questions func of doc search actor
2024-08-16 06:46:13 -07:00
srikary12
05c0aa3882
Support exclusion file filters (#826)
### Overview
Support exclude file filter in user search queries

### Details
- All of the exclude file filter terms need to be satisfied
- Any one of the include file filter terms should be satisfied

### Example
- **Search Query**: *what happened yesterday? -file:"tasks.org" -file:"work.md" file:"diary.org" file:"journal.org"*
- **Behavior**: Query will try to find relevant notes in any of `journal.org` or `diary.org` and not in `tasks.org` and not in `work.md`

### Details
* Add support for exclusion file filters
* Translate file filter to valid Django DB entry filter regex
* Exclude all files when multiple exclude file filters in query

Previously we were applying an "Or" filter, which would exclude any
file mentioned in a query with multiple exclude file filters.

This is not what we naturally mean when we ask to exclude a file in a query

* Rename, rearrange, deduplicate and add file filter tests

Closes #728
---------

Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2024-08-12 05:41:54 -07:00
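A sketch of the include/exclude semantics described above expressed as a Django queryset filter; the Entry queryset and file_path field names are assumptions:

```python
from django.db.models import Q

def apply_file_filters(entries, include_files, exclude_files):
    """Keep entries matching any include filter; drop entries matching each exclude filter."""
    if include_files:
        include_q = Q()
        for pattern in include_files:
            include_q |= Q(file_path__iregex=pattern)  # any one include filter may match
        entries = entries.filter(include_q)
    for pattern in exclude_files:
        entries = entries.exclude(file_path__iregex=pattern)  # every exclude filter must hold
    return entries
```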
sabaimran
c08b9e89f0 Update test_db_lock with new function name 2024-08-08 13:03:01 +05:30
sabaimran
1a1d9c7257 Merge branch 'master' of github.com:khoj-ai/khoj into features/big-upgrade-chat-ux 2024-07-27 14:18:05 +05:30
Debanjum Singh Solanky
878cc023a0 Fix and improve openai chat actor tests
- Use the new form of passing doc references in the now-passing chat actor
  test
- Fix message list generation from the conversation logs provided.
  Strangely, the parent conversation_log gets passed down to the
  message_to_log func when the kwarg is not explicitly specified
2024-07-26 23:53:47 +05:30
sabaimran
44d34f9090 Update the unit test for the subscribed user 2024-07-26 19:59:01 +05:30
sabaimran
377f7668c5
Merge pull request #858 from khoj-ai/use-sse-instead-of-websocket
Use Single HTTP API for Robust, Generalizable Chat Streaming
2024-07-26 07:11:54 -07:00