sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-18 18:47:11 +00:00

Author	SHA1	Message	Date
sabaimran	d91935c880	Initial commit of a functional but not yet elegant prototype for this concept	2024-11-28 17:28:23 -08:00
Debanjum	a552543f4f	Use json5 to parse llm generated questions to query docs and web json5 is more forgiving, handles double quotes, newlines in raw json string	2024-11-28 14:35:34 -08:00
Debanjum	0a69af4f61	Update to latest ToDesktop runtime	2024-11-28 13:56:14 -08:00
Debanjum	1d0fe141dc	Release Khoj version 1.30.8	2024-11-28 13:37:30 -08:00
Debanjum	8c120a5139	Fallback to json5 loader if json.loads cannot parse complex json str JSON5 spec is more flexible, try to load using a fast json5 parser if the stricter json.loads from the standard library can't load the raw complex json string into a python dictionary/list	2024-11-26 21:17:00 -08:00
Debanjum	70b7e7c73a	Improve load of complex json objects. Use it to pick tool, run code Gemini doesn't work well when trying to output json objects. Using it to output raw json strings with complex, multi-line structures requires more intense clean-up of raw json string for parsing	2024-11-26 17:37:57 -08:00
Debanjum	29315f44e7	Add assetlinks.json to link android app to app.khoj.dev domain Add sha cert of android upload, signing keys to open debug, prod apps as TWA in fullscreen on android phones	2024-11-26 01:57:54 -08:00
Debanjum	a97a45bf20	Align agent personality with recently updated khoj personality See update to Khoj personality in commit `6eb59464da`	2024-11-26 00:06:16 -08:00
Debanjum	5723a3778e	Speed up Docker image builds using multi-stage parallel pipelines (#987) ## Objective Improve build speed and size of khoj docker images ## Changes ### Improve docker image build speeds - Decouple web app and server build steps - Build the web app and server in parallel - Cache docker layers for reuse across dockerize github workflow runs - Split Docker build layers for improved cacheability (e.g separate `yarn install` and `yarn build` steps) ### Reduce size of khoj docker images - Use an up-to-date `.dockerignore` to exclude unnecessary directories - Do not install cuda python packages for cpu builds ### Improve web app builds - Use consistent mechanism to get fonts for web app - Make tailwind extensions production instead of dev dependencies - Make next.js create production builds for the web app (via `NODE_ENV=production` env var)	2024-11-24 21:49:46 -08:00
Debanjum	6a39651ad3	Standardize loading fonts locally across pages on web app	2024-11-24 20:41:15 -08:00
sabaimran	6eb59464da	Add additional reinforcement to coax gemini into giving a minimum helpful response	2024-11-24 14:53:53 -08:00
sabaimran	15f062b34a	Remove print statement for agent style map	2024-11-24 14:53:53 -08:00
sabaimran	d7e68a2d1b	Wait for iplcodata to load before first message - Fix the console khoj ai ascii art - Remove some not so good suggested prompt	2024-11-24 14:53:53 -08:00
Debanjum	710e00ad9e	Make tailwind extensions prod, instead of dev, deps of web app	2024-11-24 13:59:40 -08:00
Debanjum	7c77d65d35	Improve logic to disable telemetry via KHOJ_TELEMETRY_DISABLE env var The newly added KHOJ_TELEMETRY_DISABLE env var knob to disable telemetry should override old config mechanism when set	2024-11-24 00:54:16 -08:00
sabaimran	2d683898c2	Release Khoj version 1.30.7	2024-11-23 22:51:10 -08:00
sabaimran	914ff994f7	Fix cost addition to chat_metadata	2024-11-23 22:50:45 -08:00
Debanjum	caaa127dcf	Release Khoj version 1.30.6	2024-11-23 21:07:00 -08:00
Debanjum	8f966b11ec	Release Khoj version 1.30.5	2024-11-23 20:49:05 -08:00
Debanjum	e5b211a743	Release Khoj version 1.30.4	2024-11-23 19:48:21 -08:00
Debanjum	c4ef31d86f	Release Khoj version 1.30.3	2024-11-23 14:40:06 -08:00
sabaimran	4ac49ca90f	Release Khoj version 1.30.2	2024-11-23 12:00:28 -08:00
sabaimran	eb1b21baaa	Add a new sign in modal that is triggered from the login prompt screen, rather than redirecting user to another screen to sign in	2024-11-23 11:55:34 -08:00
sabaimran	7f5bf35806	Disambiguate renewal_date type. Previously, being used as None, False, and Datetime in different places.	2024-11-22 12:06:20 -08:00
sabaimran	5e8c824ecc	Improve the experience for finding past conversation - add a conversation title search filter, and an agents filter, for finding conversations - in the chat session api, return relevant agent style data	2024-11-22 12:03:01 -08:00
sabaimran	a761865724	Fix handling of customer.subscription.updated event to process new renewal end date	2024-11-22 12:03:01 -08:00
sabaimran	6a054d884b	Add quicker/easier filtering on auth	2024-11-22 12:03:01 -08:00
Debanjum	b9a889ab69	Fix Khoj responses when code generated charts in response context The current fix should improve Khoj responses when charts are in the response context. It truncates code context before sharing with response chat actors. Previously Khoj would respond that it was not able to create a chart but then have a generated chart in its response in default mode. The truncate code context was added to research chat actor for decision making but it wasn't added to conversation response generation chat actors. When khoj generated charts with code for its response, the images in the context would exceed context window limits. So the truncation logic would drop all past context, including chat history and context gathered for the current response. This would result in the chat response generator 'forgetting' everything for the current response when code generated images or charts were in the response context.	2024-11-21 14:43:52 -08:00
Debanjum	5475a262d4	Move truncate code context func for reusability across modules It needs to be used across routers and processors. It being in run_code tool makes it hard to be used in other chat provider contexts due to circular dependency issues created by send_message_to_model_wrapper func	2024-11-21 14:27:39 -08:00
Debanjum	f434c3fab2	Fix toggling prompt tracer on/off in Khoj via PROMPTRACE_DIR env var Previous changes to depend on just the PROMPTRACE_DIR env var instead of KHOJ_DEBUG or verbosity flag was partial/incomplete. This fix adds all the changes required to only depend on the PROMPTRACE_DIR env var to enable/disable prompt tracing in Khoj.	2024-11-21 14:06:00 -08:00
Debanjum	1f96c13f72	Enable starting khoj uvicorn server with ssl cert file, key for https Pass your domain cert files via the --sslcert, --sslkey cli args. For example, to start khoj at https://example.com, you'd run command: KHOJ_DOMAIN=example.com khoj --sslcert example.com.crt --sslkey example.com.key --host example.com This sets up ssl certs directly with khoj without requiring a reverse proxy like nginx to serve khoj behind https endpoint for simple setups. More complex setups should, of course, still use a reverse proxy for efficient request processing	2024-11-21 11:07:18 -08:00
sabaimran	9fea02f20f	In telemetry, differentiate create_user google and email	2024-11-21 11:01:37 -08:00
sabaimran	9db885b5f7	Limit access to chat models to futurist users	2024-11-21 07:53:24 -08:00
sabaimran	3519dd76f0	Fix type of excalidraw image response	2024-11-20 19:01:13 -08:00
sabaimran	467de76fc1	Improve the image diagramming prompts and response parsing	2024-11-20 18:59:40 -08:00
Debanjum	2203236e4c	Update desktop app dependencies	2024-11-20 13:05:55 -08:00
Debanjum	6f1adcfe67	Track Usage Metrics in Chat API. Track Running Cost, Accuracy in Evals (#985) - Track, return cost and usage metrics in chat api response Track input, output token usage and cost of interactions with openai, anthropic and google chat models for each call to the khoj chat api - Collect, display and store costs & accuracy of eval run currently in progress This provides more insight into eval runs during execution instead of having to wait until the eval run completes.	2024-11-20 12:59:44 -08:00
Debanjum	bbd24f1e98	Improve dropdown menus on web app setting page with scroll & min-width - Previously when settings list became long the dropdown height would overflow screen height. Now its max height is clamped and y-scroll - Previously the dropdown content would take width of content. This would mean the menu could sometimes be less wide than the button. It felt strange. Now dropdown content is at least width of parent button	2024-11-20 12:27:13 -08:00
Debanjum	c53c3db96b	Track, return cost and usage metrics in chat api response - Track input, output token usage and cost for interactions via chat api with openai, anthropic and google chat models - Get usage metadata from OpenAI using stream_options - Handle openai proxies that do not support passing usage in response - Add new usage, end response events returned by chat api. - This can be optionally consumed by clients at a later point - Update streaming clients to mark message as completed after new end response event, not after end llm response event - Ensure usage data from final response generation step is included - Pass usage data after llm response complete. This allows gathering token usage and cost for the final response generation step across streaming and non-streaming modes	2024-11-20 12:17:58 -08:00
Debanjum	80df3bb8c4	Enable prompt tracing only when PROMPTRACE_DIR env var set Better to decouple prompt tracing from debug mode or verbosity level and require explicit, independent config to enable prompt tracing	2024-11-20 11:54:02 -08:00
Debanjum	9ab76ccaf1	Skip adding agent to chat metadata when chat unset to avoid null ref	2024-11-19 21:10:23 -08:00
Debanjum	4da0499cd7	Stream responses by openai's o1 model series, as api now supports it Previously o1 models did not support streaming responses via API. Now they seem to do	2024-11-19 21:10:23 -08:00
Debanjum	7bdc9590dd	Fix handling sources, output in chat actor when is automated task Remove unnecessary ```python prefix removal. It isn't being triggered in json deserialize path.	2024-11-19 13:49:27 -08:00
Debanjum	0e7d611a80	Remove ```python codeblock prefix from raw json before deserialize	2024-11-19 12:53:52 -08:00
Debanjum	001c13ef43	Upgrade web app package dependencies	2024-11-19 12:53:52 -08:00
sabaimran	5134d49d71	Release Khoj version 1.30.1	2024-11-18 17:30:33 -08:00
sabaimran	8bdd0b26d3	Add a connections clean up decorator to all scheduled tasks	2024-11-18 17:19:36 -08:00
Debanjum	817601872f	Update default offline models enabled	2024-11-18 16:38:17 -08:00
Debanjum	653127bf1d	Improve data source, output mode selection - Set output mode to single string. Specify output schema in prompt - Both these should encourage model to select only 1 output mode instead of encouraging it in prompt too many times - Output schema should also improve schema following in general - Standardize variable, func name of io selector for readability - Fix chat actors to test the io selector chat actor - Make chat actor return sources, output separately for better disambiguation, at least during tests, for now	2024-11-18 15:11:37 -08:00
Debanjum	e3fd51d14b	Pass user arg to create title from query in new automation flow	2024-11-18 12:58:10 -08:00
Debanjum	9e74de9b4f	Improve serializing conversation JSON to print messages on console - Handle chatml message.content with non-json serializable data like WebP image binary data used by Gemini models	2024-11-18 12:57:05 -08:00
sabaimran	3f70d2f685	Add more graceful exception handling when tool selection doesn't work	2024-11-18 09:34:49 -08:00
sabaimran	f75085dc7a	Release Khoj version 1.30.0	2024-11-17 21:36:22 -08:00
sabaimran	c72813ba67	Merge pull request #981 from rznzippy/bugfix/980/database-connections-leakage Fix database connections leakage (#980)	2024-11-17 21:01:06 -08:00
sabaimran	7d50c6590d	Merge pull request #977 from khoj-ai/features/improve-tool-selection - JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically. - Make slight improvements to tool selection indicators	2024-11-17 20:08:19 -08:00
Debanjum	48567fd468	Do not erase partial message when generation stopped via button on web app Previously, we'd replace the generated message with an error message when message generation stopped via stop button on chat page of web app. So the partially generated message (which could be useful) gets lost. This change just stops generation, while keeping the generated response so any useful information from the partially generated message can be retrieved.	2024-11-17 16:29:18 -08:00
Debanjum	285006d6c9	Sync chat models in Khoj with OpenAI proxies (e.g Ollama) on startup - Allows managing chat models in the OpenAI proxy service like Ollama. - Removes need to manually add, remove chat models from Khoj Admin Panel for these OpenAI compatible API services when enabled. - Khoj still maintains the chat models configs within Khoj, so they can be configured via the Khoj admin panel as usual.	2024-11-17 15:34:36 -08:00
Debanjum	d6eece63f4	Use Jina API Key of Jina web scraper if configured in DB Previously Jina search didn't need an API key. Now that it does need an API key, we should re-use the API key set in the Jina web scraper config, otherwise fallback to using JINA_API_KEY from environment variable, if either is present. Resolves #978	2024-11-17 15:34:14 -08:00
sabaimran	6531f24ca0	Further improvements for descriptions to LLM on modes, code, diagram, image.	2024-11-17 13:23:57 -08:00
sabaimran	0eba6ce315	When diagram generation fails, save to conversation log - Update tool name when choosing tools to execute	2024-11-17 13:23:12 -08:00
sabaimran	7e662a05f8	Merge branch 'master' of github.com:khoj-ai/khoj into features/improve-tool-selection	2024-11-17 12:26:55 -08:00
Ilya Khrustalev	00b1af8f99	Fix database connections leakage (#980)	2024-11-17 19:15:05 +01:00
Debanjum	69ef6829c1	Simplify integrating Ollama, OpenAI proxies with Khoj on first run - Integrate with Ollama or other openai compatible APIs by simply setting `OPENAI_API_BASE` environment variable in docker-compose etc. - Update docs on integrating with Ollama, openai proxies on first run - Auto populate all chat models supported by openai compatible APIs - Auto set vision enabled for all commercial models - Minor - Add huggingface cache to khoj_models volume. This is where chat models and (now) sentence transformer models are stored by default - Reduce verbosity of yarn install of web app. Otherwise hit docker log size limit & stops showing remaining logs after web app install - Suggest `ollama pull <model_name>` to start it in background	2024-11-17 02:08:20 -08:00
Debanjum	2366fa08b9	Update default vision supported & anthropic chat models on first run - Update to latest initialize with new claude 3.5 sonnet and haiku models - Update to set vision enabled for google and anthropic models by default. Previously we didn't support but we've supported this for a month or two now	2024-11-17 02:08:20 -08:00
Debanjum	23ab258d78	Improve user conversation config details on Admin panel Show user email and chat model that is associated with the user conversation config	2024-11-17 02:08:20 -08:00
Debanjum	fc45aceecf	Delete unused favicon ico in old web app directory	2024-11-17 02:08:20 -08:00
Debanjum	a16fc3ade8	Only add /research prefix when no slash command in message on web app - Explicitly adding a slash command is a higher priority intent than research mode being enabled in the background. Respect that for a more intuitive UX flow. - Explicit slash commands do not currently work in research mode. You've to turn research mode off to use other slash commands. This is strange, unnecessary given intent priority is clear.	2024-11-17 02:08:20 -08:00
sabaimran	a1b4587b34	Remove extract_images flag from PDF loader	2024-11-15 21:46:35 -08:00
sabaimran	e3f1ea9dee	Improve tool, output mode selection process - JSON extract from LLMs is pretty decent now, so get the input tools and output modes all in one go. It'll help the model think through the full cycle of what it wants to do to handle the request holistically. - Make slight improvements to tool selection indicators	2024-11-15 13:53:53 -08:00
sabaimran	c1a5b32ebf	Do not start server when importing the main.py file, unless gunicorn - Add more graceful shutdown when closing bg scheduler thread	2024-11-14 17:36:51 -08:00
sabaimran	be3ee5ec9f	Add cool new suggestion cards for math, diagramming	2024-11-14 17:36:51 -08:00
Debanjum	8e009f48ce	Show tool call error in next iteration. Allow rerun if model requests. Previously errors would get eaten up but the model wouldn't see anything. And the model wouldn't be allowed to re-run the same query-tool combination in the next iteration. This update should give it insight into why it didn't get a result. So it can make an informed (hopefully better) decision on what to do next. And re-run the previous query if appropriate.	2024-11-13 22:50:14 -08:00
Debanjum	604da90fa8	Wrap try/catch around online search in research mode like other tools Previously when call to online search API etc. failed, it'd error out of response to query in research mode. Khoj should skip tool use that iteration but continue to try to respond.	2024-11-13 16:46:09 -08:00
Debanjum	8851b5f78a	Standardize chat message truncation and serialization before print Previously chatml messages were just strings, now they can be list of strings or list of dicts as well. - Use json serialization to manage their variations and truncate them before printing for context. - Put logic in single function for use across chat models	2024-11-13 16:30:17 -08:00
Debanjum	15b0cfa3dd	Improve structured message truncation in logger Previously chatml messages were just strings. Since gemini, anthropic models always have messages as list of strings, truncate those strings instead of the list of message content	2024-11-13 14:32:22 -08:00
Debanjum	153ae8bea9	Cut binary, long output files from code result for context efficiency Removing binary data and truncating large data in output files generated by code runs should improve speed and cost of research mode runs with large or binary output files. Previously binary data in code results was passed around in iteration context during research mode. This made the context inefficient because models have limited efficiency and reasoning capabilities over b64 encoded image (and other binary) data and would hit context limits leading to unnecessary truncation of other useful context Also remove image data when logging output of code execution	2024-11-13 14:32:22 -08:00
sabaimran	4a1b1e8b9a	Add support for interrupting messages after they've been sent.	2024-11-12 22:22:45 -08:00
sabaimran	d607ad7a27	Release Khoj version 1.29.1	2024-11-12 10:32:56 -08:00
sabaimran	8ec1764e42	Handle size calculation more gracefully for converted documents, depending on type	2024-11-12 02:00:29 -08:00
sabaimran	b6714c202f	Increase the title character limit to 500 for conversations	2024-11-12 01:51:19 -08:00
sabaimran	f05e64cf8c	Release Khoj version 1.29.0	2024-11-11 21:46:25 -08:00
sabaimran	47d3c8c235	Remove email query parameter from subscription patch api	2024-11-11 21:39:49 -08:00
sabaimran	d7027109a5	Add null handling for response output_files in code output	2024-11-11 21:14:56 -08:00
sabaimran	d68243a3fb	Revert clean_json logic temporarily. Eventually, we should do better validation here to extract markdown-formatted json.	2024-11-11 21:05:17 -08:00
sabaimran	1cab6c081f	Add better error handling for diagram output, and fix chat history construct - Make the `clean_json` method more robust as well	2024-11-11 20:44:19 -08:00
sabaimran	7bd2f83f97	Wrap test in suggestionCard	2024-11-11 20:12:46 -08:00
Debanjum	48862a8400	Enable Passing External Documents for Analysis in Code Sandbox (#960) - Allow passing user files as input into code sandbox for analysis - Update prompt to give more examples of complex, multi-line code - Simplify logic for model. Run one program at a time, instead of allowing model to run multiple programs in parallel - Show Code generated charts and docs in Reference pane of web app and make them downloadable	2024-11-11 19:37:17 -08:00
Debanjum	5078ac0ce2	Await on conversation save when generate conversation title via API	2024-11-11 19:17:39 -08:00
Debanjum	e1d0015248	Allow disabling Khoj telemetry via KHOJ_TELEMETRY_DISABLE env var	2024-11-11 19:17:39 -08:00
Debanjum	a52500d289	Show generated code artifacts before notes and online references	2024-11-11 18:00:22 -08:00
Debanjum	218eed83cd	Show output file not code on hover. Remove reference card title border	2024-11-11 18:00:22 -08:00
Debanjum	b970cfd4b3	Align styling of reference panel card across code, docs, web results - Add a border below heading - Show code snippet in pre block - Overflow-x when reference side panel open to allow seeing whole text via x-scroll - Align header, body position of reference cards with each other - Only show filename in doc reference cards at message bottom. Show full file path in hover and reference side panel	2024-11-11 18:00:22 -08:00
Debanjum	8e9f4262a9	Render code output files with code references in reference section - Improve rendering code reference with better icons, smaller text and different line clamps for better visibility - Show code output files as sub card of code card in reference section - Allow downloading files generated by code instead of rendering it in chat message directly - Show executed code before online references in reference panel	2024-11-11 18:00:22 -08:00
Debanjum	92c1efe6ee	Fixes to render & save code context with non text based output modes - Fix to render code generated chart with images, excalidraw diagrams - Fix to save code context to chat history in image, diagram output modes - Fix bug in image markdown being wrapped twice in markdown syntax - Render newline in code references shown on chat page of web app Previously newlines weren't getting rendered. This made the code executed by Khoj hard to read in references. This change fixes that. `dangerouslySetInnerHTML` usage is justified as rendered code snippet is being sanitized by DOMPurify before rendering.	2024-11-11 18:00:22 -08:00
Debanjum	af0215765c	Decode code text output files from b64 to str to ease client processing	2024-11-11 18:00:22 -08:00
Debanjum	7b39f2014a	Enable analysing user documents in code sandbox and other improvements - Run one program at a time, instead of allowing model to pass multiple programs to run in parallel to simplify logic for model - Update prompt to give more example of complex, multi-line code - Allow passing user files as input into code sandbox for analysis - Log code execution timer at info level to evaluate execution latencies in production - Type the generated code for easier processing by caller functions	2024-11-11 17:59:37 -08:00
sabaimran	dc109559d4	Research mode gray when off, colored when on	2024-11-11 16:35:07 -08:00
sabaimran	cdda9c2e73	Improve text wrapping for attached files and preview context For the research mode toggle, make it not fill when it's off	2024-11-11 13:32:10 -08:00
sabaimran	dd36303bb7	Fix sending file attachments in save_to_conversation method - When files attached but upload fails, don't update the state variables - Make removing null characters in pdf extraction more space efficient	2024-11-11 12:53:06 -08:00
Debanjum	536fe994be	Remove unused db adapter methods, like for fact checker data store	2024-11-11 12:22:34 -08:00
Debanjum	10bca6fa8f	Convert required user param check into decorator. Use with more adapters	2024-11-11 12:22:32 -08:00
Debanjum	ff5c10c221	Do not CRUD on entries, files & conversations in DB for null user Increase defense-in-depth by reducing paths to create, read, update or delete entries, files and conversations in DB when user is unset.	2024-11-11 12:20:07 -08:00
sabaimran	27fa39353e	Make custom agent creation flow available to everyone - For private agents, add guardrails to prevent against any misuse or violation of terms of service.	2024-11-11 11:54:59 -08:00
sabaimran	b563f46a2e	Merge pull request #957 from khoj-ai/features/include-full-file-in-convo-with-filter Support including file attachments in the chat message Now that models have much larger context windows, we can reasonably include full texts of certain files in the messages. Do this when an explicit file filter is set in a conversation. Do so in a separate user message in order to mitigate any confusion in the operation. Pipe the relevant attached_files context through all methods calling into models. This breaks certain prior behaviors. We will no longer automatically be processing/generating embeddings on the backend and adding documents to the "brain". You'll have to go to settings and go through the upload documents flow there in order to add docs to the brain (i.e., have search include them during question / response).	2024-11-11 11:34:42 -08:00
sabaimran	2bb2ff27a4	Rename attached_files to query_files. Update relevant backend and client-side code.	2024-11-11 11:21:26 -08:00
sabaimran	47937d5148	Merge branch 'features/include-full-file-in-convo-with-filter' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-11 09:34:08 -08:00
sabaimran	ae4eb96d48	Consolidate file name to icon mapping	2024-11-11 09:34:04 -08:00
Debanjum	7954f39633	Use accept param to file input to indicate supported file types in web app Remove unused total size calculations in chat input	2024-11-11 04:06:17 -08:00
Debanjum	4223b355dc	Use python stdlib methods to write pdf, docx to temp files for loaders Use python standard method tempfile.NamedTemporaryFile to write, delete temporary files safely.	2024-11-11 03:24:50 -08:00
Debanjum	fd15fc1e59	Move construct chat history back to its original position in file Keeping the function where it originally was allows tracking diffs and change history more easily	2024-11-11 03:24:50 -08:00
Debanjum	35d6c792e4	Show snippet of truncated messages in debug logs to avoid log flooding	2024-11-11 02:30:38 -08:00
sabaimran	8805e731fd	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-10 19:24:11 -08:00
sabaimran	a5e2b9e745	Exit early when running an automation if the conversation for the automation does not exist.	2024-11-10 19:22:21 -08:00
sabaimran	55200be4fa	Apply agent color fill to the toggle both in off and on states	2024-11-10 19:16:43 -08:00
Debanjum	7468f6a6ed	Deduplicate online references returned by chat API to clients This will ensure only unique online references are shown in all clients. The duplication issue was exacerbated in research mode as even with different online search queries, you can get previously seen results. This change does a global deduplication across all online results seen across research iterations before returning them in client response.	2024-11-10 16:10:32 -08:00
Debanjum	137687ee49	Deduplicate searches in normal mode & across research iterations - Deduplicate online, doc search queries across research iterations. This avoids running previously run online, doc searches again and dedupes online, doc context seen by model to generate response. - Deduplicate online search queries generated by chat model for each user query. - Do not pass online, docs, code context separately when generate response in research mode. These are already collected in the meta research passed with the user query - Improve formatting of context passed to generate research response - Use xml tags to delimit context. Pass per iteration queries in each iteration result - Put user query before meta research results in user message passed for generating response These deduplications will improve speed, cost & quality of research mode	2024-11-10 16:10:32 -08:00
Debanjum	306f7a2132	Show error in picking next tool to researcher llm in next iteration Previously the whole research mode response would fail if the pick next tool call to chat model failed. Now instead of it completely failing, the researcher actor is told to try again in next iteration. This allows for a more graceful degradation in answering a research question even if a (few?) calls to the chat model fail.	2024-11-10 14:52:02 -08:00
Debanjum	eb492f3025	Only keep webpage content requested, even if Jina API gets more data Jina search API returns content of all webpages in search results. Previously code wouldn't remove content beyond max_webpages_to_read limit set. Now, webpage content in organic results aree explicitly removed beyond the requested max_webpage_to_read limit. This should align behavior of online results from Jina with other online search providers. And restrict llm context to a reasonable size when using Jina for online search.	2024-11-10 14:51:16 -08:00
Debanjum	8ef7892c5e	Exclude non-dictionary doc context from chat history sent to chat models This fixes chat with old chat sessions. Fixes issue where old WhatsApp users can't chat with Khoj because chat history doc context was stored as a list earlier	2024-11-10 14:51:16 -08:00
Debanjum	d892ab3174	Fix handling of command rate limit and improve rate limit messages Command rate limit wouldn't be shown to user as server wouldn't be able to handle HTTP exception in the middle of streaming. Catch exception and render it as LLM response message instead for visibility into command rate limiting to user on client. Log rate limit messages for all rate limit events on server as info messages. Convert exception messages into first person responses by Khoj to prevent breaking the fourth wall and provide more details on what happened and possible ways to resolve them.	2024-11-10 14:51:16 -08:00
Debanjum	80ee35b9b1	Wrap messages in web, obsidian UI to stay within screen when long links Wrap long links etc. in chat messages and train of thought lists on web app and obsidian plugin by breaking them into newlines by word	2024-11-10 14:49:51 -08:00
sabaimran	170d959feb	Handle offline messages differently, as they don't respond well to the structured messages	2024-11-09 19:52:46 -08:00
sabaimran	2c543bedd7	Add typing to the constructed messages listed	2024-11-09 19:40:27 -08:00
sabaimran	79b15e4594	Only add images when they're present and vision enabled	2024-11-09 19:37:30 -08:00
sabaimran	bd55028115	Fix randint import from random when creating filenames for tmp	2024-11-09 19:17:18 -08:00
sabaimran	92b6b3ef7b	Add attached files to latest structured message in chat ml format	2024-11-09 19:17:00 -08:00
sabaimran	835fa80a4b	Allow docx conversion in the chatFunction.ts	2024-11-09 18:51:00 -08:00
sabaimran	459318be13	Add random suffixes to decrease any clash probability when writing tmp files to disk	2024-11-09 18:46:34 -08:00
sabaimran	dbf0c26247	Remove _summary_ description in function descriptions	2024-11-09 18:42:42 -08:00
sabaimran	e5ac076fc4	Move construct_chat_history method back to conversation.utils.py	2024-11-09 18:27:46 -08:00
sabaimran	bc95a99fb4	Make tracer the last input parameter for all the relevant chat helper methods	2024-11-09 18:22:46 -08:00
sabaimran	ceb29eae74	Add phone number verification and remove telemetry update call from place where authentication middleware isn't yet installed (in the middleware itself).	2024-11-09 12:25:36 -08:00
sabaimran	3badb27744	Remove stored uploaded files after they're processed.	2024-11-08 23:28:02 -08:00
sabaimran	78630603f4	Delete the fact checker application	2024-11-08 17:27:42 -08:00
sabaimran	807687a0ac	Automatically generate titles for conversations from history	2024-11-08 16:02:34 -08:00
sabaimran	7159b0b735	Enforce limits on file size when converting to text	2024-11-08 15:27:28 -08:00
sabaimran	4695174149	Add support for file preview in the chat input area (before message sent)	2024-11-08 15:12:48 -08:00
sabaimran	ad46b0e718	Label pages when extracting text from pdf, docs content. Fix scroll area in doc preview.	2024-11-08 14:53:20 -08:00
sabaimran	ee062d1c48	Fix parsing for PDFs via content indexing API	2024-11-07 18:17:29 -08:00
sabaimran	623a97a9ee	Merge branch 'master' of github.com:khoj-ai/khoj into features/include-full-file-in-convo-with-filter	2024-11-07 17:18:23 -08:00
sabaimran	33498d876b	Simplify the share chat page. Don't need it to maintain its own conversation history - When chatting on a shared page, fork and redirect to a new conversation page	2024-11-07 17:14:11 -08:00
sabaimran	4b8be55958	Convert UUID to string when forking a conversation	2024-11-07 17:13:04 -08:00
sabaimran	9bbe27fe36	Set default value of attached files to empty list	2024-11-07 17:12:45 -08:00
sabaimran	3a51996f64	Process attached files in the chat history and add them to the chat message	2024-11-07 16:06:58 -08:00
sabaimran	a89160e2f7	Add support for converting an attached doc and chatting with it - Document is first converted in the chatinputarea, then sent to the chat component. From there, it's sent in the chat API body and then processed by the backend - We couldn't directly use an UploadFile type in the backend API because we'd have to convert the api type to a multipart form. This would require other client side migrations without uniform benefit, which is why we do it in this two-phase process. This also gives us capacity to repurpose the more generic interface down the road.	2024-11-07 16:06:37 -08:00
sabaimran	e521853895	Remove unnecessary console.log statements	2024-11-07 16:03:31 -08:00
sabaimran	92c3b9c502	Add function to get an icon from a file type	2024-11-07 16:02:53 -08:00
sabaimran	140c67f6b5	Remove focus ring from the text area component	2024-11-07 16:02:02 -08:00
sabaimran	b8ed98530f	Accept attached files in the chat API - weave through all subsequent subcalls to models, where relevant, and save to conversation log	2024-11-07 16:01:48 -08:00
sabaimran	ecc81e06a7	Add separate methods for docx and pdf files to just convert files to raw text, before further processing	2024-11-07 16:01:08 -08:00
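
The global deduplication of online references described in commits 7468f6a6ed and 137687ee49 can be pictured roughly as follows. This is a minimal sketch, not the khoj implementation: the function name `dedupe_online_references`, the list-of-dicts shape of each iteration's results, and the `link` key are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def dedupe_online_references(
    iterations: Iterable[List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Merge online results from several research iterations.

    Hypothetical helper: keeps the first occurrence of each result,
    identified by its (assumed) "link" key, and preserves the order
    in which results were first seen.
    """
    seen_links = set()
    unique_results = []
    for results in iterations:
        for result in results:
            link = result.get("link")
            # Skip results without a link and results already seen
            # in this or an earlier iteration.
            if link is None or link in seen_links:
                continue
            seen_links.add(link)
            unique_results.append(result)
    return unique_results


if __name__ == "__main__":
    first_iteration = [
        {"link": "https://example.com/a", "title": "A"},
        {"link": "https://example.com/b", "title": "B"},
    ]
    second_iteration = [
        {"link": "https://example.com/b", "title": "B again"},
        {"link": "https://example.com/c", "title": "C"},
    ]
    merged = dedupe_online_references([first_iteration, second_iteration])
    print([r["link"] for r in merged])
```

Tracking seen links in a set keeps the pass linear in the total number of results, and keeping the first occurrence means the earliest iteration's copy of a result is the one returned to clients.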

3182 commits