sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-23 23:48:56 +01:00

Author	SHA1	Message	Date
Debanjum	586654e2af	Allow directly reading web pages, even when SERP not enabled (#676) ### Overview Khoj can now read websites directly without needing to go through the search step first ### Details - Parallelize simple webpage read and extractor - Rename extract_content online results field to web pages - Tweak prompts to extract information from webpages, online results - Test select webpage as data source and extract web urls chat actors - Render webpage read in chat response references on Web, Desktop apps - Pass multiple webpages with their urls in online results context - Support webpage command in chat API - Add webpage chat command for read web pages requested by user - Create chat actor for directly reading webpages based on user message	2024-03-24 16:25:25 +05:30
Debanjum Singh Solanky	85c62efca1	Test select webpage as data source and extract web urls chat actors	2024-03-24 15:46:29 +05:30
Debanjum Singh Solanky	ecddf98430	Handle truncation when single long non-system chat message Previously was assuming the system prompt is being always passed as the first message. So expected there to be at least 2 messages in logs. This broke chat actors querying with single long non system message. A more robust way to extract system prompt is via the message role instead	2024-03-15 15:58:39 +05:30
Debanjum Singh Solanky	6118d1ff57	Create chat actor for directly reading webpages based on user message - Add prompt for the read webpages chat actor to extract, infer webpage links - Make chat actor infer or extract webpage to read directly from user message - Rename previous read_webpage function to more narrow read_webpage_at_url function	2024-03-14 14:58:37 +05:30
Debanjum Singh Solanky	dd883dc53a	Dedupe query in notes prompt. Improve OAI chat actor, director tests - Remove stale tests - Improve tests to pass across gpt-3.5 and gpt-4-turbo - The haiku creation director was failing because of duplicate query in instantiated prompt	2024-03-14 01:22:33 +05:30
Debanjum Singh Solanky	70b04d16c0	Test data source, output mode selector, web search query chat actors	2024-03-14 01:22:33 +05:30
Debanjum Singh Solanky	88f096977b	Read webpages directly when Olostep proxy not setup This is useful for self-hosted, individual user, low traffic setups where a proxy service is not required	2024-03-11 18:41:02 +05:30
Debanjum Singh Solanky	ca2f962e95	Read, extract information from web pages in parallel to lower response time - Time reading webpage, extract info from webpage steps for perf analysis - Deduplicate webpages to read gathered across separate google searches - Use aiohttp to make API requests non-blocking, pair with asyncio to parallelize all the online search webpage read and extract calls	2024-03-11 18:41:02 +05:30
sabaimran	e323a6d69b	Include additional user context in the image generation flow (#660) * Make major improvements to the image generation flow - Include user context from online references and personal notes for generating images - Dynamically select the modality that the LLM should respond with - Return the inferred context in the query response for the desktop, web chat views to read * Add unit tests for retrieving response modes via LLM * Move output mode unit tests to the actor suite, rather than director * Only show the references button if there is at least one available * Rename aget_relevant_modes to aget_relevant_output_modes * Use a shared method for generating reference sections, simplify some of the prompting logic * Make out of space errors in the desktop client more obvious	2024-03-06 13:48:41 +05:30
sabaimran	44f8f20ea7	Miscellaneous bugs and fixes for chat sessions (#646) * Display given_name field only if it is not None * Add default slugs in the migration script * Ensure that updated_at is saved appropriately, make sure most recent chat is returned for default history * Remove the bin button from the chat interface, given deletion is handled in the drop-down menus * Refresh the side panel when a new chat is created * Improve tool retrieval prompt, don't let /online fail, and improve parsing of extract questions * Fix ending chat response by offline chat on hitting a stop phrase Previously the whole phrase wouldn't be in the same response chunk, so chat response wouldn't stop on hitting a stop phrase Now use a queue to keep track of last 3 chunks, and to stop responding when hit a stop phrase * Make chat on Obsidian backward compatible post chat session API updates - Make chat on Obsidian get chat history from `responseJson.response.chat' when available (i.e when using new api) - Else fallback to loading chat history from responseJson.response (i.e when using old api) * Fix detecting success of indexing update in khoj.el When khoj.el attempts to index on a Khoj server served behind an https endpoint, the success response status contains plist with certs. This doesn't mean the update failed. Look for :errors key in status instead to determine if indexing API call failed. This fixes detecting indexing API call success on the Khoj Emacs client, even for Khoj servers running behind SSL/HTTPS * Fix the mechanism for populating notes references in the conversation primer for both offline and online chat * Return conversation.default when empty list for dynamic prompt selection, send all cmds in telemetry * Fix making chat on Obsidian backward compatible post chat session API updates New API always has conversation_id set, not `chat' which can be unset when chat session is empty. So use conversation_id to decide whether to get chat logs from `responseJson.response.chat' or `responseJson.response' instead --------- Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>	2024-02-20 13:55:35 -08:00
sabaimran	a3eb17b7d4	Have Khoj dynamically select conversation command(s) in chat (#641) * Have Khoj dynamically select which conversation command(s) are to be used in the chat flow - Intercept the commands if in default mode, and have Khoj dynamically guess which tools would be the most relevant for answering the user's query * Remove conditional for default to enter online search mode * Add multiple-tool examples in the prompt, make prompt for tools more specific to info collection	2024-02-11 17:11:32 +05:30
Debanjum Singh Solanky	dd4cf66be1	Improve offline chat system prompt to think step by step	2024-02-06 20:23:19 +05:30
Debanjum Singh Solanky	035165b534	Make offline chat model current date aware. Improve system prompts - Can now expect date awareness chat quality test to pass - Prevent offline chat model from printing verbatim user Notes and special tokens - Make it ask follow-up questions if it needs more context	2024-02-06 20:23:19 +05:30
Debanjum Singh Solanky	0d949140f4	Fix actor, director tests using freeze time by ignoring transformers package transformers package was causing freeze time to fail during setup	2024-02-06 03:00:48 +05:30
sabaimran	b782683e60	Scrape results from Serper results using Olostep (#627) * Initialize changes to incorporate web scraping logic after getting SERP results - Do some minor refactors to pass a symptom prompt to the openai model when making a query - integrate Olostep in order to perform the webscraping * Fix truncation error with new line, fix typing in olostep code * Use the authorization header for the token * Add a small hint/indicator for how to use Khoj's other modalities in the welcome prompt * Add more detailed error message if Olostep query fails * Add unit tests which invoke Olostep in chat director * Add test for olostep tool	2024-01-29 14:16:50 +05:30
Debanjum	4d30f7d1d9	Short-circuit API rate limiter for unauthenticated users (#607) ### Major - Short-circuit API rate limiter for unauthenticated user Calls by unauthenticated users were failing at API rate limiter as it failed to access user info object. This is a bug. API rate limiter should short-circuit for unauthenticated users so a proper Forbidden response can be returned by API Add regression test to verify that unauthenticated users get 403 response when calling the /chat API endpoint ### Minor - Remove trailing slash to normalize khoj url in obsidian plugin settings - Move used /api/config API controllers into separate module - Delete unused /api/beta API endpoint - Fix error message rendering in khoj.el, khoj obsidian chat - Handle deprecation warnings for subscribe renew date, langchain, pydantic & logger.warn	2024-01-17 00:59:52 +05:30
Debanjum Singh Solanky	d26a4ffcea	Only run the OpenAI chat client, /online test when API keys are set	2024-01-17 00:36:03 +05:30
Debanjum Singh Solanky	7039c202c8	Merge branch 'master' into short-circuit-api-rate-limiter	2024-01-16 18:18:34 +05:30
Debanjum Singh Solanky	6ded4c1d75	Merge branch 'master' into fix-1000-file-index-update-limit	2024-01-16 16:50:58 +05:30
Debanjum Singh Solanky	e0b381d523	Only run /online command offline chat director test when SERPER KEY present	2024-01-16 13:09:38 +05:30
Debanjum Singh Solanky	7dfbcd2e5a	Handle subscribe renew date, langchain, pydantic & logger.warn warnings - Ensure langchain less than 0.2.0 is used, to prevent breaking ChatOpenAI, PyMuPDF usage due to their deprecation after 0.2.0 - Set subscription renewal date to a timezone aware datetime - Use logger.warning instead of logger.warn as latter is deprecated - Use `model_dump' not deprecated dict to get all configured content_types	2024-01-12 01:46:52 +05:30
Debanjum Singh Solanky	ba99089a12	Short-circuit API rate limiter for unauthenticated user Calls by unauthenticated users were failing at API rate limiter as it failed to access user info object. This is a bug. API rate limiter should short-circuit for unauthenticated users so a proper Forbidden response can be returned by API Add regression test to verify that unauthenticated users get 403 response when calling the /chat API endpoint	2024-01-12 00:23:50 +05:30
Debanjum Singh Solanky	4ded32cc64	Test 1000 file upload limit to index/update API endpoint Due to FastAPI limitation	2024-01-03 22:14:36 +05:30
sabaimran	79913d4c17	Add isort to the pre-commit configuration and apply it to the whole project (#595) * Apply isort to the entire repository * Fix missing import issues in text_to_entries * Fix imports in migration files	2023-12-28 18:04:02 +05:30
sabaimran	6dd2b05bf5	Rebase with master	2023-12-19 21:02:49 +05:30
sabaimran	ef21d78c99	Initial changes to support multiple search model configurations - All search models are loaded into memory, and stored in a dictionary indexed by name - Still need to add database migrations and create a UI for user to select their choice. Presently, it uses the default option	2023-12-05 00:35:40 -05:00
Debanjum Singh Solanky	2b09caa237	Make online results an optional argument to the gpt converse method	2023-12-04 12:15:29 -05:00
sabaimran	6e1ba11e59	Resolve merge conflicts for rendering chat response	2023-11-27 11:33:13 -08:00
sabaimran	e438853b09	Add additional unit tests to verify behavior of unsubscribed/subscribed users	2023-11-26 13:09:00 -08:00
Debanjum Singh Solanky	a0a7ab7ec8	Rename conversation.gpt4all package to conversation.offline	2023-11-26 04:19:32 -08:00
sabaimran	73e38fccf3	Explicitly set billing to off in the test for being able to index a large set of data	2023-11-25 20:48:32 -08:00
sabaimran	b2afbaa315	Add support for rate limiting the amount of data indexed - Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed - Delete any files that are being removed for administering the calculation - Show current amount of data indexed in the config page	2023-11-25 20:28:04 -08:00
sabaimran	60c23d9e3a	Add online search chat director tests	2023-11-21 23:08:36 -08:00
sabaimran	c652a7fd2d	Move text_to_entries under the new content folder	2023-11-21 22:25:17 -08:00
sabaimran	1e2af083f0	Rename the data_sources module to content	2023-11-21 22:11:32 -08:00
sabaimran	2bb989e9d8	Resolve merge conflicts and fix some import ordering	2023-11-21 12:30:43 -08:00
sabaimran	a474c31e02	Move the django app into the src/khoj folder for better organization and functionality - Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder - Update associated file paths and system references	2023-11-21 10:56:04 -08:00
sabaimran	b8e6883a81	Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search	2023-11-19 16:20:08 -08:00
sabaimran	4def8cce36	Merge pull request #541 from asim-shrestha/patch-1 Add test separators	2023-11-19 14:14:34 -08:00
Debanjum	71799add0b	Index Parent Headings of Org-Mode Entries to Improve Search Context (#548) ### Overview The parent hierarchy of org-mode entries can store important context. This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index ### Details - Test search uses ancestor headings as context for improved results - Add ancestor headings of each org-mode entry to their compiled form - Track ancestor headings for each org-mode entry in org-node parser Resolves #85	2023-11-19 13:18:19 -08:00
sabaimran	e398a76779	Fix test word filter	2023-11-19 13:14:58 -08:00
sabaimran	33a9304428	Resolve merge conflicts	2023-11-19 12:57:55 -08:00
sabaimran	ef5e9d66c1	Resolve merge conflicts in dependency imports	2023-11-19 11:42:20 -08:00
sabaimran	f688529150	Update the default configuration for the AppConfig	2023-11-17 19:26:31 -08:00
Debanjum Singh Solanky	ca87b4ede9	Wrap common API query parameters into shared class to deduplicate code - Upgrade FastAPI to >= latest version. Required upgrade of FastAPI. Earlier version didn't support wrapping common query params in class - Use per fixture app instead of a global FastAPI app in conftest - Upgrade minimum required Django version - Fix no notes chat director test with updated no notes message No notes message was updated in commit `118f1143`	2023-11-17 18:43:49 -08:00
Debanjum Singh Solanky	33ad9b8e64	Update text search test since indexing ancestor hierarchy added	2023-11-17 15:26:55 -08:00
Debanjum Singh Solanky	55785d50c3	Use title, when present, as root ancestor of entries instead of file path	2023-11-17 15:03:27 -08:00
sabaimran	ec06d2c446	Move data indexer files into a separate folder under processor. Update assoc UTs	2023-11-16 17:19:55 -08:00
Debanjum Singh Solanky	ddb07def0d	Test search uses ancestor headings as context for improved results - Update test data to add deeper outline hierarchy for testing hierarchy as context - Update collateral tests that need count of entries updated, deleted asserts to be updated	2023-11-16 03:05:19 -08:00
Debanjum Singh Solanky	74403e3536	Add ancestor headings of each org-mode entry to their compiled form Resolves #85	2023-11-16 02:54:41 -08:00

323 commits