sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-03 12:23:02 +01:00

Author	SHA1	Message	Date
sabaimran	47937d5148	Merge branch 'features/include-full-file-in-convo-with-filter' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-11 09:34:08 -08:00
sabaimran	ae4eb96d48	Consolidate file name to icon mapping	2024-11-11 09:34:04 -08:00
Debanjum	7954f39633	Use accept param to file input to indicate supported file types in web app Remove unused total size calculations in chat input	2024-11-11 04:06:17 -08:00
Debanjum	4223b355dc	Use python stdlib methods to write pdf, docx to temp files for loaders Use python standard method tempfile.NamedTemporaryFile to write, delete temporary files safely.	2024-11-11 03:24:50 -08:00
Debanjum	fd15fc1e59	Move construct chat history back to its original position in file Keeping the function where it originally was allows tracking diffs and change history more easily	2024-11-11 03:24:50 -08:00
sabaimran	8805e731fd	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-10 19:24:11 -08:00
sabaimran	a5e2b9e745	Exit early when running an automation if the conversation for the automation does not exist.	2024-11-10 19:22:21 -08:00
sabaimran	55200be4fa	Apply agent color fill to the toggle both in off and on states	2024-11-10 19:16:43 -08:00
Debanjum	7468f6a6ed	Deduplicate online references returned by chat API to clients This will ensure only unique online references are shown in all clients. The duplication issue was exacerbated in research mode as even with different online search queries, you can get previously seen results. This change does a global deduplication across all online results seen across research iterations before returning them in client response.	2024-11-10 16:10:32 -08:00
Debanjum	137687ee49	Deduplicate searches in normal mode & across research iterations - Deduplicate online, doc search queries across research iterations. This avoids running previously run online, doc searches again and dedupes online, doc context seen by model to generate response. - Deduplicate online search queries generated by chat model for each user query. - Do not pass online, docs, code context separately when generating response in research mode. These are already collected in the meta research passed with the user query - Improve formatting of context passed to generate research response - Use xml tags to delimit context. Pass per iteration queries in each iteration result - Put user query before meta research results in user message passed for generating response These deduplications will improve speed, cost & quality of research mode	2024-11-10 16:10:32 -08:00
Debanjum	306f7a2132	Show error in picking next tool to researcher llm in next iteration Previously the whole research mode response would fail if the pick next tool call to chat model failed. Now instead of it completely failing, the researcher actor is told to try again in next iteration. This allows for a more graceful degradation in answering a research question even if a (few?) calls to the chat model fail.	2024-11-10 14:52:02 -08:00
Debanjum	eb492f3025	Only keep webpage content requested, even if Jina API gets more data Jina search API returns content of all webpages in search results. Previously code wouldn't remove content beyond max_webpages_to_read limit set. Now, webpage content in organic results is explicitly removed beyond the requested max_webpage_to_read limit. This should align behavior of online results from Jina with other online search providers. And restrict llm context to a reasonable size when using Jina for online search.	2024-11-10 14:51:16 -08:00
Debanjum	8ef7892c5e	Exclude non-dictionary doc context from chat history sent to chat models This fixes chat with old chat sessions. Fixes issue where old WhatsApp users couldn't chat with Khoj because chat history doc context was stored as a list earlier	2024-11-10 14:51:16 -08:00
Debanjum	d892ab3174	Fix handling of command rate limit and improve rate limit messages Command rate limit wouldn't be shown to user as server wouldn't be able to handle HTTP exception in the middle of streaming. Catch exception and render it as LLM response message instead for visibility into command rate limiting to user on client Log rate limit messages for all rate limit events on server as info messages Convert exception messages into first person responses by Khoj to prevent breaking the fourth wall and provide more details on what happened and possible ways to resolve them.	2024-11-10 14:51:16 -08:00
Debanjum	80ee35b9b1	Wrap messages in web, obsidian UI to stay within screen when long links Wrap long links etc. in chat messages and train of thought lists on web app and obsidian plugin by breaking them into newlines by word	2024-11-10 14:49:51 -08:00
Debanjum	f967bdf702	Show correct example index being currently processed in frames eval Previously the batch start index wasn't being passed so all batches started in parallel were showing the same processing example index This change doesn't impact the evaluation itself, just the index shown of the example currently being evaluated	2024-11-10 14:49:51 -08:00
Debanjum	84a8088c2b	Only evaluate non-empty responses to reduce eval script latency, cost Empty responses by Khoj will always be an incorrect response, so no need to make call to an evaluator agent to check that	2024-11-10 14:49:51 -08:00
sabaimran	170d959feb	Handle offline messages differently, as they don't respond well to the structured messages	2024-11-09 19:52:46 -08:00
sabaimran	2c543bedd7	Add typing to the constructed messages listed	2024-11-09 19:40:27 -08:00
sabaimran	79b15e4594	Only add images when they're present and vision enabled	2024-11-09 19:37:30 -08:00
sabaimran	bd55028115	Fix randint import from random when creating filenames for tmp	2024-11-09 19:17:18 -08:00
sabaimran	92b6b3ef7b	Add attached files to latest structured message in chat ml format	2024-11-09 19:17:00 -08:00
sabaimran	835fa80a4b	Allow docx conversion in the chatFunction.ts	2024-11-09 18:51:00 -08:00
sabaimran	459318be13	Add random suffixes to decrease any clash probability when writing tmp files to disk	2024-11-09 18:46:34 -08:00
sabaimran	dbf0c26247	Remove _summary_ description in function descriptions	2024-11-09 18:42:42 -08:00
sabaimran	e5ac076fc4	Move construct_chat_history method back to conversation.utils.py	2024-11-09 18:27:46 -08:00
sabaimran	bc95a99fb4	Make tracer the last input parameter for all the relevant chat helper methods	2024-11-09 18:22:46 -08:00
sabaimran	ceb29eae74	Add phone number verification and remove telemetry update call from place where authentication middleware isn't yet installed (in the middleware itself).	2024-11-09 12:25:36 -08:00
sabaimran	3badb27744	Remove stored uploaded files after they're processed.	2024-11-08 23:28:02 -08:00
sabaimran	78630603f4	Delete the fact checker application	2024-11-08 17:27:42 -08:00
sabaimran	807687a0ac	Automatically generate titles for conversations from history	2024-11-08 16:02:34 -08:00
sabaimran	7159b0b735	Enforce limits on file size when converting to text	2024-11-08 15:27:28 -08:00
sabaimran	4695174149	Add support for file preview in the chat input area (before message sent)	2024-11-08 15:12:48 -08:00
sabaimran	ad46b0e718	Label pages when extracting text from pdf, docs content. Fix scroll area in doc preview.	2024-11-08 14:53:20 -08:00
sabaimran	ee062d1c48	Fix parsing for PDFs via content indexing API	2024-11-07 18:17:29 -08:00
sabaimran	623a97a9ee	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-07 17:18:23 -08:00
sabaimran	33498d876b	Simplify the share chat page. Don't need it to maintain its own conversation history - When chatting on a shared page, fork and redirect to a new conversation page	2024-11-07 17:14:11 -08:00
sabaimran	4b8be55958	Convert UUID to string when forking a conversation	2024-11-07 17:13:04 -08:00
sabaimran	9bbe27fe36	Set default value of attached files to empty list	2024-11-07 17:12:45 -08:00
sabaimran	3a51996f64	Process attached files in the chat history and add them to the chat message	2024-11-07 16:06:58 -08:00
sabaimran	a89160e2f7	Add support for converting an attached doc and chatting with it - Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend - We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the more generic interface down the road.	2024-11-07 16:06:37 -08:00
sabaimran	e521853895	Remove unnecessary console.log statements	2024-11-07 16:03:31 -08:00
sabaimran	92c3b9c502	Add function to get an icon from a file type	2024-11-07 16:02:53 -08:00
sabaimran	140c67f6b5	Remove focus ring from the text area component	2024-11-07 16:02:02 -08:00
sabaimran	b8ed98530f	Accept attached files in the chat API - weave through all subsequent subcalls to models, where relevant, and save to conversation log	2024-11-07 16:01:48 -08:00
sabaimran	ecc81e06a7	Add separate methods for docx and pdf files to just convert files to raw text, before further processing	2024-11-07 16:01:08 -08:00
sabaimran	394035136d	Add an api that gets a document, and converts it to just text	2024-11-07 16:00:10 -08:00
sabaimran	3b1e8462cd	Include attached files in calls to extract questions	2024-11-07 15:59:15 -08:00
sabaimran	de73cbc610	Add support for relaying attached files through backend calls to models	2024-11-07 15:58:52 -08:00
Debanjum	4cad96ded6	Add Script to Evaluate Khoj on Google's FRAMES benchmark (#955) - Why We need better, automated evals to measure performance shifts of Khoj across prompt, model and capability changes. Google's FRAMES benchmark evaluates multi-step retrieval and reasoning capabilities of AI agents. It's a good starter benchmark to evaluate Khoj. - Details This PR adds an eval script to evaluate Khoj responses on the FRAMES benchmark prompts against the ground truth provided by it. Script allows configuring sample size, batch size, sampling queries from the eval dataset. Gemini is used as an LLM Judge to auto grade Khoj responses vs ground truth data from the benchmark.	2024-11-06 17:52:01 -08:00
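
Commit 4223b355dc above mentions using tempfile.NamedTemporaryFile to write and delete temporary files for the pdf and docx loaders. The following is only a minimal sketch of that general pattern, not the code from the commit: the function name read_via_temp_file and the loader callback are hypothetical, and the real loaders in the repository may be invoked differently.

```python
import os
import tempfile


def read_via_temp_file(data: bytes, suffix: str, loader):
    """Write bytes to a named temp file, pass its path to loader, then delete it.

    `loader` is a hypothetical callable that takes a file path and returns
    the extracted content; it stands in for a pdf or docx loader.
    """
    # delete=False so the path can be reopened by the loader on any platform;
    # the file is removed explicitly in the finally block instead.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flush contents to disk before the loader opens the path
        return loader(tmp.name)
    finally:
        tmp.close()  # safe to call again if already closed
        os.unlink(tmp.name)  # always remove the temp file, even on error
```

The random name chosen by NamedTemporaryFile also avoids the filename clashes that commit 459318be13 addressed by hand with random suffixes.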


3844 commits