sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-18 02:27:10 +00:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	20b6f0c2f4	Access internal links directly via a simple get request. The other webpage scrapers will not work for internal webpages. Try to access those URLs directly if they are visible to the Khoj server over the network. Only enable this by default for self-hosted, single user setups. Otherwise the ability to scan the internal network would be a liability! For use-cases where it makes sense, the Khoj server admin can explicitly add the direct webpage scraper via the admin panel	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	d94abba2dc	Fallback through enabled scrapers to reduce web page read failures - Set up scrapers via API keys, explicitly adding them via admin panel or enabling only a single scraper to use via server chat settings. - Use validation to ensure only valid scrapers added via admin panel Example API key is present for scrapers that require it etc. - Modularize the read webpage functions to take api key, url as args Removes dependence on constants loaded in online_search. Functions are now mostly self contained - Improve ability to read webpages by using the speed, success rate of different scrapers. Optimal configuration needs to be discovered	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	11c64791aa	Allow changing perf timer log level. Info log time for webpage read	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	c841abe13f	Change webpage scraper to use via server admin panel	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	e47922e53a	Aggregate webpage extract queries to run once for each distinct webpage. This should reduce webpage read and response generation time. Previously, we'd run separate webpage read and extract relevant content pipes for each distinct (query, url) pair. Now we aggregate all queries for each url to extract information from and run the webpage read and extract relevant content pipes once for each distinct url. Even though the webpage content extraction pipes were previously being run in parallel, they increased response time by 1. adding more context for the response generation chat actor to respond from 2. and by being more susceptible to page read and extract latencies of the parallel jobs. The aggregated retrieval of context for all queries for a given webpage could result in some hit to context quality. But it should improve and reduce variability in response time, quality and costs.	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	98f99fa6f8	Allow using Firecrawl to extract web page content Set the FIRECRAWL_TO_EXTRACT environment variable to true to have Firecrawl scrape and extract content from webpage using their LLM This could be faster, not sure about quality as LLM used is obfuscated	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	993fd7cd2b	Support using Firecrawl to read webpages Firecrawl is open-source, self-hostable with a default hosted service provided, similar to Jina.ai. So it can be 1. Self-hosted as part of a private Khoj cloud deployment 2. Used directly by getting an API key from the Firecrawl.dev service This is as an alternative to Olostep and Jina.ai for reading webpages.	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	731ea3779e	Return data sources to use if exception in data source chat actor Previously no value was returned if an exception got triggered when collecting information sources to search.	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	a932564169	Try respond even if web search, webpage read fails during chat. Khoj shouldn't refuse to respond to user if web lookups fail. It should transparently mention that online search etc. failed. But try respond as best as it can without those references. This change ensures a response to the user's query is attempted even when web info retrieval fails.	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	1b04b801c6	Try respond even if document search via inference endpoint fails. The huggingface endpoint can be flaky. Khoj shouldn't refuse to respond to user if document search fails. It should transparently mention that document lookup failed. But try respond as best as it can without the document references. This change provides graceful failover when inference endpoint requests fail either when encoding query or reranking retrieved docs	2024-10-17 17:40:49 -07:00
Debanjum Singh Solanky	9affeb9e85	Fix to log the client app calling the chat API - Remove unused subscribed variable from the chat API - Unexpectedly dropped client app logging when migrated API chat to do advanced streaming in July	2024-10-17 15:24:43 -07:00
Debanjum Singh Solanky	c6c48cfc18	Fix arg to generate_summary_from_file and type of this_iteration	2024-10-17 13:38:48 -07:00
Debanjum Singh Solanky	884fe42602	Allow automation as an output mode supported by custom agents	2024-10-17 11:58:52 -07:00
Debanjum Singh Solanky	c5e19b37ef	Use Khoj icons. Add automation & improve agent text on web login page	2024-10-17 11:58:52 -07:00
Debanjum Singh Solanky	42acc324dc	Handle correctly setting file filters as array when API call fails - Only set addedFiles to selectedFiles when selectedFiles is an array - Only set selectedFiles, addedFiles to API response json when response succeeded. Previously we set it to response json on errors as well. This made the variables into json objects instead of arrays on API call failure - Check if selectedFiles, addedFiles are arrays before running operations on them. Previously the addedFiles.includes was where the code would fail	2024-10-17 11:58:52 -07:00
Debanjum Singh Solanky	7ebfc24a96	Upgrade Django version used by Khoj server	2024-10-17 11:58:52 -07:00
Debanjum Singh Solanky	ea59dde4a0	Upgrade documentation website dependencies	2024-10-17 11:58:52 -07:00
sabaimran	07ab8ab931	Update handling of gemini response with new API changes. Per documentation: finish_reason (google.ai.generativelanguage_v1beta.types.Candidate.FinishReason): Optional. Output only. The reason why the model stopped generating tokens. If empty, the model has not stopped generating the tokens.	2024-10-17 09:00:01 -07:00
Rehan Daphedar	27835628e6	Fix typo in docs for error 400 fix when self-hosting (#938)	2024-10-16 23:15:43 -07:00
Debanjum Singh Solanky	19c65fb82b	Show user uuid field in django admin panel	2024-10-15 17:59:12 -07:00
Debanjum Singh Solanky	6c5b362551	Remove deprecated GET chat API endpoint	2024-10-15 15:13:09 -07:00
Debanjum Singh Solanky	931c56182e	Fix default chat model to use user model if no server chat model set - Advanced chat model should also fall back to user chat model if set - Get conversation config should fall back to user chat model if set. These assume no server chat model settings are configured	2024-10-15 15:13:09 -07:00
Debanjum Singh Solanky	feb6d65ef8	Merge branch 'master' into features/advanced-reasoning	2024-10-15 09:37:56 -07:00
Debanjum Singh Solanky	336c6c3689	Show tool to use decision for next iteration in train of thought	2024-10-15 01:12:18 -07:00
Debanjum Singh Solanky	81fb65fa0a	Return data sources to use if exception in data source chat actor Previously no value was returned if an exception got triggered when collecting information sources to search.	2024-10-14 18:20:20 -07:00
Debanjum Singh Solanky	3c93f07b3f	Try respond even if web search, webpage read fails during chat. Khoj shouldn't refuse to respond to user if web lookups fail. It should transparently mention that online search etc. failed. But try respond as best as it can without those references. This change ensures a response to the user's query is attempted even when web info retrieval fails.	2024-10-14 18:13:26 -07:00
Debanjum Singh Solanky	07ab7ebf07	Try respond even if document search via inference endpoint fails. The huggingface endpoint can be flaky. Khoj shouldn't refuse to respond to user if document search fails. It should transparently mention that document lookup failed. But try respond as best as it can without the document references. This change provides graceful failover when inference endpoint requests fail either when encoding query or reranking retrieved docs	2024-10-14 18:13:26 -07:00
Debanjum Singh Solanky	d6206aa80c	Remove deprecated GET chat API endpoint	2024-10-14 18:13:26 -07:00
Debanjum Singh Solanky	263eee4351	Fix default chat model to use user model if no server chat model set - Advanced chat model should also fall back to user chat model if set - Get conversation config should fall back to user chat model if set. These assume no server chat model settings are configured	2024-10-14 18:13:26 -07:00
sabaimran	81aa1b5589	Update some edge cases and usability of create agent flow - Use the slug to determine which agent to PATCH - Make the agent creation form multi-step to streamline the process	2024-10-14 14:07:31 -07:00
Debanjum Singh Solanky	abcd11cfc0	Merge branch 'master' into features/advanced-reasoning	2024-10-13 03:06:23 -07:00
Debanjum Singh Solanky	9356e66b94	Fix default chat model to use user model if no server chat model set - Advanced chat model should also fall back to user chat model if set - Get conversation config should fall back to user chat model if set. These assume no server chat model settings are configured	2024-10-13 03:02:29 -07:00
Debanjum Singh Solanky	9314f0a398	Fix default chat configs to use user model if no server chat model set. Post merge cleanup in advanced reasoning to fall back to user chat model if no server chat model defined for advanced and default	2024-10-13 02:59:10 -07:00
Debanjum Singh Solanky	8ff13e4cf6	Update readme. Mention new capabilities	2024-10-13 01:30:53 -07:00
Debanjum Singh Solanky	a2200466b7	Merge branch 'master' into features/advanced-reasoning	2024-10-12 21:01:22 -07:00
Debanjum	c66c571396	Simplify switching chat model when self-hosting (#934) # Overview - Default to use user chat models for train of thought when no server chat settings created by admins - Default to not create server chat settings on first run # Details This change simplifies switching chat models for self-hosted setups by just changing the chat model on the user settings page. It falls back to use the user chat model for train of thought if server chat settings have not been created on the admin panel. Server chat settings, when set, control the chat model used for Khoj's train of thought and the default user chat model. Previously a self-hosted user had to update 1. the server chat settings in the admin panel and 2. their own user chat model in the user settings panel to completely switch to a different chat model for both train of thought & response generation respectively. You can still set server chat settings via the admin panel to use a different chat model for train of thought vs response generation. But this is only useful for advanced, multi-user setups.	2024-10-12 19:58:05 -07:00
Debanjum Singh Solanky	90888a1099	Log when new user created via magic link or whatsapp as well	2024-10-12 19:56:01 -07:00
Debanjum Singh Solanky	8222c6629d	Remove unused subscribed argument to read_webpage function	2024-10-12 10:45:39 -07:00
Debanjum Singh Solanky	9daaae0fdb	Render inline any image files output by code in message Update regex to also include any links to code generated images that aren't explicitly meant to be displayed inline. This allows folks to download the image (unlike the fake link that doesn't work created by model)	2024-10-12 10:34:57 -07:00
Debanjum Singh Solanky	20d495c43a	Update the iterative chat director prompt to generalize across chat models These prompts work across o1 and standard openai model. Works with anthropic and google models as well	2024-10-12 10:34:57 -07:00
sabaimran	eb4d598d0f	Eliminate the drawer component from the Agents view	2024-10-10 20:40:59 -07:00
sabaimran	0a1c3e4f41	Release Khoj version 1.25.0	2024-10-10 18:07:30 -07:00
sabaimran	01a58b71a5	Skip image, code generation if in research mode	2024-10-10 18:06:29 -07:00
Debanjum Singh Solanky	1b13d069f5	Pass data collected from various sources to code tool in normal flow too	2024-10-10 05:19:27 -07:00
Debanjum Singh Solanky	f462d34547	Render image files output by code interpreter in message on web app	2024-10-10 05:17:53 -07:00
Debanjum Singh Solanky	564491e164	Extract date filters quoted with non-ascii quotes in query	2024-10-10 04:45:00 -07:00
Debanjum Singh Solanky	6a8fd9bf33	Reorder embeddings search arguments based on argument importance	2024-10-10 04:45:00 -07:00
Debanjum Singh Solanky	0eacc0b2b0	Use consistent name for user, planner to not miss current user query. Previously Khoj would start answering the previous query. This may be because the prompt uses User for prompt in chat history but was using Q for current user prompt.	2024-10-10 04:45:00 -07:00
Debanjum Singh Solanky	284c8c331b	Increase default max iterations for research chat director to 5	2024-10-10 04:45:00 -07:00
Debanjum Singh Solanky	1e390325d2	Let research chat director decide which webpage to read, if any. Make webpages to read automatically on search_online configurable via an argument. Set it to default to 1, so other callers of the function are unaffected. But iterative chat director can still decide which, if any, webpages to read based on the online search it performs	2024-10-10 04:45:00 -07:00
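
Two of the commits above (e47922e53a and d94abba2dc) describe techniques that are easier to follow with a small example. The sketch below is not Khoj's code. It is a minimal illustration, under assumptions, of (1) grouping (query, url) pairs so each distinct url is read once and (2) falling through a list of enabled scrapers until one returns content. The function names `aggregate_queries_by_url` and `read_with_fallback`, and the shape of a scraper as a plain `url -> text` callable, are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical type: a scraper is any callable that takes a url and returns page text.
Scraper = Callable[[str], str]


def aggregate_queries_by_url(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (query, url) pairs so each distinct url maps to its unique queries."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for query, url in pairs:
        # Keep first-seen order and skip repeated queries for the same url.
        if query not in grouped[url]:
            grouped[url].append(query)
    return dict(grouped)


def read_with_fallback(
    url: str, scrapers: List[Tuple[str, Scraper]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each enabled scraper in order; return (scraper name, content) of the first success.

    A scraper counts as failed if it raises or returns empty content.
    Returns (None, None) when every scraper fails.
    """
    for name, scrape in scrapers:
        try:
            content = scrape(url)
        except Exception:
            # Assumption for this sketch: any error just moves on to the next scraper.
            continue
        if content:
            return name, content
    return None, None
```

With this shape, a caller would run `read_with_fallback` once per key of the dictionary returned by `aggregate_queries_by_url`, then extract content for all of that url's queries together, instead of reading the same page once per (query, url) pair.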

3885 commits