sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-30 10:53:02 +01:00

Author	SHA1	Message	Date
Debanjum	8851b5f78a	Standardize chat message truncation and serialization before print Previously chatml messages were just strings, now they can be list of strings or list of dicts as well. - Use json serialization to manage their variations and truncate them before printing for context. - Put logic in single function for use across chat models	2024-11-13 16:30:17 -08:00
Debanjum	15b0cfa3dd	Improve structured message truncation in logger Previously chatml messages were just strings. Since gemini, anthropic models always have messages as list of strings, truncate those strings instead of the list of message content	2024-11-13 14:32:22 -08:00
Debanjum	153ae8bea9	Cut binary, long output files from code result for context efficiency Removing binary data and truncating large data in output files generated by code runs should improve speed and cost of research mode runs with large or binary output files. Previously binary data in code results was passed around in iteration context during research mode. This made the context inefficient because models have limited efficiency and reasoning capabilities over b64 encoded image (and other binary) data and would hit context limits, leading to unnecessary truncation of other useful context. Also remove image data when logging output of code execution	2024-11-13 14:32:22 -08:00
sabaimran	4a1b1e8b9a	Add support for interrupting messages after they've been sent.	2024-11-12 22:22:45 -08:00
sabaimran	d607ad7a27	Release Khoj version 1.29.1	2024-11-12 10:32:56 -08:00
sabaimran	8ec1764e42	Handle size calculation more gracefully for converted documents, depending on type	2024-11-12 02:00:29 -08:00
sabaimran	b6714c202f	Increase the title character limit to 500 for conversations	2024-11-12 01:51:19 -08:00
sabaimran	f05e64cf8c	Release Khoj version 1.29.0	2024-11-11 21:46:25 -08:00
sabaimran	47d3c8c235	Remove email query parameter from subscription patch api	2024-11-11 21:39:49 -08:00
sabaimran	d7027109a5	Add null handling for response output_files in code output	2024-11-11 21:14:56 -08:00
sabaimran	d68243a3fb	Revert clean_json logic temporarily. Eventually, we should do better validation here to extract markdown-formatted json.	2024-11-11 21:05:17 -08:00
sabaimran	1cab6c081f	Add better error handling for diagram output, and fix chat history construct - Make the `clean_json` method more robust as well	2024-11-11 20:44:19 -08:00
sabaimran	7bd2f83f97	Wrap test in suggestionCard	2024-11-11 20:12:46 -08:00
Debanjum	48862a8400	Enable Passing External Documents for Analysis in Code Sandbox (#960) - Allow passing user files as input into code sandbox for analysis - Update prompt to give more example of complex, multi-line code - Simplify logic for model. Run one program at a time, instead of allowing model to run multiple programs in parallel - Show Code generated charts and docs in Reference pane of web app and make them downloadable	2024-11-11 19:37:17 -08:00
Debanjum	5078ac0ce2	Await on conversation save when generate conversation title via API	2024-11-11 19:17:39 -08:00
Debanjum	e1d0015248	Allow disabling Khoj telemetry via KHOJ_TELEMETRY_DISABLE env var	2024-11-11 19:17:39 -08:00
Debanjum	a52500d289	Show generated code artifacts before notes and online references	2024-11-11 18:00:22 -08:00
Debanjum	218eed83cd	Show output file not code on hover. Remove reference card title border	2024-11-11 18:00:22 -08:00
Debanjum	b970cfd4b3	Align styling of reference panel card across code, docs, web results - Add a border below heading - Show code snippet in pre block - Overflow-x when reference side panel open to allow seeing whole text via x-scroll - Align header, body position of reference cards with each other - Only show filename in doc reference cards at message bottom. Show full file path in hover and reference side panel	2024-11-11 18:00:22 -08:00
Debanjum	8e9f4262a9	Render code output files with code references in reference section - Improve rendering code reference with better icons, smaller text and different line clamps for better visibility - Show code output files as sub card of code card in reference section - Allow downloading files generated by code instead of rendering it in chat message directly - Show executed code before online references in reference panel	2024-11-11 18:00:22 -08:00
Debanjum	92c1efe6ee	Fixes to render & save code context with non text based output modes - Fix to render code generated chart with images, excalidraw diagrams - Fix to save code context to chat history in image, diagram output modes - Fix bug in image markdown being wrapped twice in markdown syntax - Render newline in code references shown on chat page of web app Previously newlines weren't getting rendered. This made the code executed by Khoj hard to read in references. This change fixes that. `dangerouslySetInnerHTML` usage is justified as rendered code snippet is being sanitized by DOMPurify before rendering.	2024-11-11 18:00:22 -08:00
Debanjum	af0215765c	Decode code text output files from b64 to str to ease client processing	2024-11-11 18:00:22 -08:00
Debanjum	7b39f2014a	Enable analysing user documents in code sandbox and other improvements - Run one program at a time, instead of allowing model to pass multiple programs to run in parallel to simplify logic for model - Update prompt to give more example of complex, multi-line code - Allow passing user files as input into code sandbox for analysis - Log code execution timer at info level to evaluate execution latencies in production - Type the generated code for easier processing by caller functions	2024-11-11 17:59:37 -08:00
sabaimran	dc109559d4	Research mode gray when off, colored when on	2024-11-11 16:35:07 -08:00
sabaimran	cdda9c2e73	Improve text wrapping for attached files and preview context For the research mode toggle, make it not fill when it's off	2024-11-11 13:32:10 -08:00
sabaimran	dd36303bb7	Fix sending file attachments in save_to_conversation method - When files attached but upload fails, don't update the state variables - Make removing null characters in pdf extraction more space efficient	2024-11-11 12:53:06 -08:00
Debanjum	536fe994be	Remove unused db adapter methods, like for fact checker data store	2024-11-11 12:22:34 -08:00
Debanjum	10bca6fa8f	Convert required user param check into decorator. Use with more adapters	2024-11-11 12:22:32 -08:00
Debanjum	ff5c10c221	Do not CRUD on entries, files & conversations in DB for null user Increase defense-in-depth by reducing paths to create, read, update or delete entries, files and conversations in DB when user is unset.	2024-11-11 12:20:07 -08:00
sabaimran	27fa39353e	Make custom agent creation flow available to everyone - For private agents, add guardrails to prevent against any misuse or violation of terms of service.	2024-11-11 11:54:59 -08:00
sabaimran	b563f46a2e	Merge pull request #957 from khoj-ai/features/include-full-file-in-convo-with-filter Support including file attachments in the chat message Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation. Pipe the relevant attached_files context through all methods calling into models. This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).	2024-11-11 11:34:42 -08:00
sabaimran	2bb2ff27a4	Rename attached_files to query_files. Update relevant backend and client-side code.	2024-11-11 11:21:26 -08:00
sabaimran	47937d5148	Merge branch 'features/include-full-file-in-convo-with-filter' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-11 09:34:08 -08:00
sabaimran	ae4eb96d48	Consolidate file name to icon mapping	2024-11-11 09:34:04 -08:00
Debanjum	7954f39633	Use accept param to file input to indicate supported file types in web app Remove unused total size calculations in chat input	2024-11-11 04:06:17 -08:00
Debanjum	4223b355dc	Use python stdlib methods to write pdf, docx to temp files for loaders Use python standard method tempfile.NamedTemporaryFile to write, delete temporary files safely.	2024-11-11 03:24:50 -08:00
Debanjum	fd15fc1e59	Move construct chat history back to its original position in file Keeping the function where it originally was allows tracking diffs and change history more easily	2024-11-11 03:24:50 -08:00
Debanjum	35d6c792e4	Show snippet of truncated messages in debug logs to avoid log flooding	2024-11-11 02:30:38 -08:00
sabaimran	8805e731fd	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-10 19:24:11 -08:00
sabaimran	a5e2b9e745	Exit early when running an automation if the conversation for the automation does not exist.	2024-11-10 19:22:21 -08:00
sabaimran	55200be4fa	Apply agent color fill to the toggle both in off and on states	2024-11-10 19:16:43 -08:00
Debanjum	7468f6a6ed	Deduplicate online references returned by chat API to clients This will ensure only unique online references are shown in all clients. The duplication issue was exacerbated in research mode as even with different online search queries, you can get previously seen results. This change does a global deduplication across all online results seen across research iterations before returning them in client response.	2024-11-10 16:10:32 -08:00
Debanjum	137687ee49	Deduplicate searches in normal mode & across research iterations - Deduplicate online, doc search queries across research iterations. This avoids running previously run online, doc searches again and dedupes online, doc context seen by model to generate response. - Deduplicate online search queries generated by chat model for each user query. - Do not pass online, docs, code context separately when generate response in research mode. These are already collected in the meta research passed with the user query - Improve formatting of context passed to generate research response - Use xml tags to delimit context. Pass per iteration queries in each iteration result - Put user query before meta research results in user message passed for generating response These deduplications will improve speed, cost & quality of research mode	2024-11-10 16:10:32 -08:00
Debanjum	306f7a2132	Show error in picking next tool to researcher llm in next iteration Previously the whole research mode response would fail if the pick next tool call to chat model failed. Now instead of it completely failing, the researcher actor is told to try again in next iteration. This allows for a more graceful degradation in answering a research question even if a (few?) calls to the chat model fail.	2024-11-10 14:52:02 -08:00
Debanjum	eb492f3025	Only keep webpage content requested, even if Jina API gets more data Jina search API returns content of all webpages in search results. Previously code wouldn't remove content beyond max_webpages_to_read limit set. Now, webpage content in organic results is explicitly removed beyond the requested max_webpages_to_read limit. This should align behavior of online results from Jina with other online search providers. And restrict llm context to a reasonable size when using Jina for online search.	2024-11-10 14:51:16 -08:00
Debanjum	8ef7892c5e	Exclude non-dictionary doc context from chat history sent to chat models This fixes chat with old chat sessions. Fixes issue where old WhatsApp users couldn't chat with Khoj because chat history doc context was stored as a list earlier	2024-11-10 14:51:16 -08:00
Debanjum	d892ab3174	Fix handling of command rate limit and improve rate limit messages Command rate limit wouldn't be shown to user as server wouldn't be able to handle HTTP exception in the middle of streaming. Catch exception and render it as LLM response message instead for visibility into command rate limiting to user on client. Log rate limit messages for all rate limit events on server as info messages. Convert exception messages into first person responses by Khoj to prevent breaking the fourth wall and provide more details on what happened and possible ways to resolve them.	2024-11-10 14:51:16 -08:00
Debanjum	80ee35b9b1	Wrap messages in web, obsidian UI to stay within screen when long links Wrap long links etc. in chat messages and train of thought lists on web app and obsidian plugin by breaking them into newlines by word	2024-11-10 14:49:51 -08:00
sabaimran	170d959feb	Handle offline messages differently, as they don't respond well to the structured messages	2024-11-09 19:52:46 -08:00
sabaimran	2c543bedd7	Add typing to the constructed messages listed	2024-11-09 19:40:27 -08:00
sabaimran	79b15e4594	Only add images when they're present and vision enabled	2024-11-09 19:37:30 -08:00
sabaimran	bd55028115	Fix randint import from random when creating filenames for tmp	2024-11-09 19:17:18 -08:00
sabaimran	92b6b3ef7b	Add attached files to latest structured message in chat ml format	2024-11-09 19:17:00 -08:00
sabaimran	835fa80a4b	Allow docx conversion in the chatFunction.ts	2024-11-09 18:51:00 -08:00
sabaimran	459318be13	Add random suffixes to decrease any clash probability when writing tmp files to disk	2024-11-09 18:46:34 -08:00
sabaimran	dbf0c26247	Remove _summary_ description in function descriptions	2024-11-09 18:42:42 -08:00
sabaimran	e5ac076fc4	Move construct_chat_history method back to conversation.utils.py	2024-11-09 18:27:46 -08:00
sabaimran	bc95a99fb4	Make tracer the last input parameter for all the relevant chat helper methods	2024-11-09 18:22:46 -08:00
sabaimran	ceb29eae74	Add phone number verification and remove telemetry update call from place where authentication middleware isn't yet installed (in the middleware itself).	2024-11-09 12:25:36 -08:00
sabaimran	3badb27744	Remove stored uploaded files after they're processed.	2024-11-08 23:28:02 -08:00
sabaimran	78630603f4	Delete the fact checker application	2024-11-08 17:27:42 -08:00
sabaimran	807687a0ac	Automatically generate titles for conversations from history	2024-11-08 16:02:34 -08:00
sabaimran	7159b0b735	Enforce limits on file size when converting to text	2024-11-08 15:27:28 -08:00
sabaimran	4695174149	Add support for file preview in the chat input area (before message sent)	2024-11-08 15:12:48 -08:00
sabaimran	ad46b0e718	Label pages when extract text from pdf, docs content. Fix scroll area in doc preview.	2024-11-08 14:53:20 -08:00
sabaimran	ee062d1c48	Fix parsing for PDFs via content indexing API	2024-11-07 18:17:29 -08:00
sabaimran	623a97a9ee	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-07 17:18:23 -08:00
sabaimran	33498d876b	Simplify the share chat page. Don't need it to maintain its own conversation history - When chatting on a shared page, fork and redirect to a new conversation page	2024-11-07 17:14:11 -08:00
sabaimran	4b8be55958	Convert UUID to string when forking a conversation	2024-11-07 17:13:04 -08:00
sabaimran	9bbe27fe36	Set default value of attached files to empty list	2024-11-07 17:12:45 -08:00
sabaimran	3a51996f64	Process attached files in the chat history and add them to the chat message	2024-11-07 16:06:58 -08:00
sabaimran	a89160e2f7	Add support for converting an attached doc and chatting with it - Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend - We couldn't directly use a UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the more generic interface down the road.	2024-11-07 16:06:37 -08:00
sabaimran	e521853895	Remove unnecessary console.log statements	2024-11-07 16:03:31 -08:00
sabaimran	92c3b9c502	Add function to get an icon from a file type	2024-11-07 16:02:53 -08:00
sabaimran	140c67f6b5	Remove focus ring from the text area component	2024-11-07 16:02:02 -08:00
sabaimran	b8ed98530f	Accept attached files in the chat API - weave through all subsequent subcalls to models, where relevant, and save to conversation log	2024-11-07 16:01:48 -08:00
sabaimran	ecc81e06a7	Add separate methods for docx and pdf files to just convert files to raw text, before further processing	2024-11-07 16:01:08 -08:00
sabaimran	394035136d	Add an api that gets a document, and converts it to just text	2024-11-07 16:00:10 -08:00
sabaimran	3b1e8462cd	Include attach files in calls to extract questions	2024-11-07 15:59:15 -08:00
sabaimran	de73cbc610	Add support for relaying attached files through backend calls to models	2024-11-07 15:58:52 -08:00
Debanjum	05a93fcbed	v-align attach, send buttons with chat input text area on web app Otherwise, those buttons look off-center when images are attached to the chat input area	2024-11-05 17:10:53 -08:00
sabaimran	a0480d5f6c	use fill weight for the toggle right (enabled state) for research mode	2024-11-04 22:01:09 -08:00
sabaimran	dc26da0a12	Add uploaded files in the conversation file filter for a new convo	2024-11-04 22:00:47 -08:00
Debanjum	b51ee644aa	Fix escaping filename when normalizing in org node parser	2024-11-04 20:24:57 -08:00
Debanjum	5724d16a6f	Fix passing images to anthropic chat models to extract questions	2024-11-04 20:24:57 -08:00
sabaimran	7543360210	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-04 16:55:48 -08:00
sabaimran	b6145df3be	Handle file retrieval when agent is None	2024-11-04 16:55:22 -08:00
sabaimran	3dc9139cee	Add additional handling for when file_object comes back empty	2024-11-04 16:53:07 -08:00
sabaimran	a27b8d3e54	Remove summarize condition for only 1 file filter	2024-11-04 16:51:37 -08:00
sabaimran	362bdebd02	Add methods for reading full files by name and including context Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation. Pipe the relevant attached_files context through all methods calling into models. We'll want to limit the file sizes for which this is used and provide more helpful UI indicators that this sort of behavior is taking place.	2024-11-04 16:37:13 -08:00
sabaimran	e3ca52b7cb	Use .get() to get text accompanying image url, instead of subindexing	2024-11-04 16:09:16 -08:00
sabaimran	1e89baca7b	Deprecate the UserSearchModelConfig and remove all references - The server has moved to a model of standardization for the embeddings generation workflow. Remove references to the support for differentiated models. - The migration script for a new model needs to be updated to accommodate full regeneration.	2024-11-04 12:24:41 -08:00
sabaimran	99c1d2831a	Release Khoj version 1.28.3	2024-11-02 12:23:11 -07:00
sabaimran	075b4ecf15	Call subscription_to_state with sync_to_async wrapper when getting user subscription state - This is needed in case the renewal_date is not set and we need to reset it for the user	2024-11-02 12:22:35 -07:00
sabaimran	ec44cbe1e7	Release Khoj version 1.28.2	2024-11-02 07:53:51 -07:00
Debanjum	31b5fde163	Only enable prompt tracer if git python is installed	2024-11-02 02:07:02 -07:00
sabaimran	5b18dc96e0	Release Khoj version 1.28.1	2024-11-01 22:51:51 -07:00
Debanjum	e85dd59295	Release Khoj version 1.28.0	2024-11-01 19:06:59 -07:00
Debanjum	14e453039d	Add prompt tracing, agent personality to infer webpage urls chat actor	2024-11-01 18:12:50 -07:00
Debanjum	ab321dc518	Expect query before tool in response to give think space in research prompt	2024-11-01 17:51:41 -07:00

3059 commits