sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-23 23:48:56 +01:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	d63194c3a9	Create tests for PDF to JSONL processor	2023-06-01 21:42:48 +05:30
Debanjum Singh Solanky	286b500f66	Create PDF to JSONL processor using PyPDF and LangChain. Switch `pydantic` to >= 1.9.1, else `langchain.document_loaders` starts throwing a typing error for Python 3.8, 3.9	2023-06-01 21:41:49 +05:30
Debanjum Singh Solanky	1b3effd8e6	Fork Markdown to JSONL processor as start template for PDF to JSONL processor	2023-06-01 09:13:31 +05:30
Debanjum Singh Solanky	1cd9ecd449	Truncate last message if still over max supported prompt size by model	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	ed4d0f9076	Simplify argument names used in khoj openai completion functions - Match argument names passed to khoj openai completion funcs with arguments passed to langchain calls to OpenAI - This simplifies the logic in the khoj openai completion funcs	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	703a7c89c0	Reduce retry count and request timeout for faster response or failure - Fix bug where both LangChain and Khoj retry requests 6 times each, for a total of 12 requests at >1 minute intervals for each chat response when the OpenAI API is down - Retrying too many times when the API is failing doesn't help - The earlier 60 second request timeout was spacing out the interval between retries way too much. This slowed down chat response times quite a bit when the API was being flaky - With these updates you'll know if a call to the chat API failed in under a minute	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	18081b3bc6	Use LangChain to call GPT over API	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	277d2f5c96	Do not add "Notes:" suffix to chat messages when no notes retrieved. This was causing a spurious "Notes:" suffix to be added to Khoj Chat responses	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	334be4e600	Use LangChain to call OpenAI for Khoj Chat - Use ChatModel and ChatOpenAI to call OpenAI chat model instead of using OpenAI package directly - This is being done as part of migration to rely on LangChain for creating agents and managing their state	2023-06-01 08:50:59 +05:30
Debanjum Singh Solanky	efcf7d1508	Extract prompts as LangChain Prompt Templates into a separate module. Improves code modularity, cleanliness. Reduces bloat in GPT.py module	2023-06-01 08:50:58 +05:30
Debanjum Singh Solanky	b484953bb3	Import app state correctly to generate embeddings with OpenAI model. Resolves #216	2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky	9cfaaf0941	Update docs to configure khoj.yml for using OpenAI model for embeddings	2023-05-28 10:21:54 +05:30
Debanjum Singh Solanky	a0d0dbaca7	Fix link to Khoj Obsidian Demo video in Readmes	2023-05-23 04:23:08 +05:30
Debanjum Singh Solanky	ebb5d7b8e5	Release Khoj version 0.6.2	2023-05-17 20:04:20 +05:30
Debanjum Singh Solanky	d02415edcc	Write generated server id to env file when env file does not contain it	2023-05-17 19:38:44 +05:30
Debanjum Singh Solanky	dc0626856e	Put the telemetry db in a separate directory by default	2023-05-17 18:58:47 +05:30
Debanjum	dc495babb3	Add Telemetry to Understand Khoj Usage ### Objective: Use telemetry to better understand Khoj usage. This will motivate and prioritize work for Khoj. Specific questions: - Number of active deployments of khoj server - How regularly is khoj used (hourly, daily, weekly etc)? - How much is each feature used (chat, search)? - Which interface is used most (obsidian, emacs, web ui)? ### Details - Expose setting to disable telemetry logging in khoj.yml - Create basic telemetry server to log data to a DB - Log calls to Khoj API /search, /chat, /update endpoints - Batch upload telemetry data to server at ~hourly interval	2023-05-17 19:09:50 +08:00
Debanjum Singh Solanky	55d72231b3	Generate docker image for telemetry server using Github workflow	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	e9f04dc644	Add dockerfile to containerize telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	07b19964d4	Schedule jobs at (co-)prime intervals to reduce overlap in job runs	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	d42f0f5055	Add basic telemetry server for khoj	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	134cce9d32	Batch upload telemetry data at regular interval instead of while querying	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	3ede919c66	Log usage of /search, /chat, /update API endpoints to telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	f2e89f6f46	Add khoj app helper methods to log app usage to a telemetry server	2023-05-17 16:08:21 +05:30
Debanjum Singh Solanky	9ca61d62ff	Enable/disable logging telemetry by setting a bool in khoj.yml config. We log usage telemetry by default, unless the setting is explicitly disabled in khoj.yml	2023-05-15 23:26:38 +08:00
Debanjum Singh Solanky	131b8407b5	Allow Khoj Chat to respond to general queries not in reference notes - Khoj chat will now respond to general queries if: 1. no relevant reference notes are available or 2. when explicitly induced by prefixing the chat message with "@general" - Previously Khoj Chat would often refuse to respond to general queries not answerable from reference notes or chat history - Make chat quality tests more robust - Add more equivalent chat response options refusing to answer - Force haiku writing to not give any preamble, just the haiku	2023-05-12 18:42:40 +08:00
Debanjum Singh Solanky	cc75f986b2	Test text search index only updates on changes to text content	2023-05-12 17:37:34 +08:00
Debanjum Singh Solanky	f9ccce430e	Allow configuring OpenAI chat model for Khoj chat - Simplifies switching between different OpenAI chat models, e.g. GPT-4 - It was previously hard-coded to use gpt-3.5-turbo. Now it just defaults to gpt-3.5-turbo, unless the chat-model field under the conversation processor is updated in khoj.yml	2023-05-03 23:01:13 +08:00
Debanjum	f0253e2cbb	Include Filename, Entry Heading in All Compiled Entries to Improve Search Context Merge pull request #214 from debanjum/add-filename-heading-to-compiled-entry-for-context - Set filename as top heading in compiled org, markdown entries - Note: Khoj was already indexing filenames in compiled markdown entries but they weren't set as top level headings but rather appended as bare text. The updated structure should provide more schematic context of relevance - Set entry heading as heading for compiled org, md entries, even if split by max tokens - Snip prepended heading to avoid crossing model max_token limits - Entries with no md headings should not get heading prefix prepended	2023-05-03 22:59:30 +08:00
Debanjum Singh Solanky	6b535cc345	Snip prepended heading to avoid crossing model max_token limits. Otherwise, if the heading > max_tokens, then the search models will just see a heading (with repeated filename) for each compiled entry and not the actual content. 100 characters should be sufficient to include the filename (not path) and entry heading. If longer, rather truncate it to pass the entry's unique text to the model for search context	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	02aeee60aa	Set filename as top heading of org entries for better search context. Previously filename was only being appended to markdown entries. Test filename getting prepended to compiled entry as heading	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	94825a70b9	Set heading of md entries to improve search context for long entries Otherwise if a markdown entry is longer than max_tokens, the split entries (apart from first one) do not get their heading context set	2023-05-03 22:53:13 +08:00
Debanjum Singh Solanky	5de04621b5	Set filename as top heading of md entries for better search context. Previously filename was appended to the end of the compiled entry. This didn't provide appropriate structured context. Test filename getting prepended as heading to compiled entry	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	0e3fb59e09	Entries with no md headings should not get heading prefix prepended. Files with no headings would previously get their entry prefixed with a markdown heading prefix (#)	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	45a991d75c	Prepend entry heading to all compiled org snippets to improve search context. Previously, compiled snippets split by max tokens (apart from the first) did not get the heading as context. This limited the search context required to retrieve these continuation entries	2023-05-03 22:50:31 +08:00
Debanjum Singh Solanky	3386cc92b5	Fix khoj server config update in khoj.el by unquoting the list to cl-push onto - cl-push expects a generalized variable. Otherwise it throws a (setf quote) undefined warning - This results in the config call failing on calling the khoj entrypoint	2023-05-03 15:10:56 +08:00
Debanjum Singh Solanky	948a4274e4	Fix documentation strings and simplify not null checks	2023-05-02 21:47:50 +08:00
Debanjum Singh Solanky	731ef5688f	Use cl-pushnew to fix byte-compile errors with using add-to-list	2023-05-02 21:47:38 +08:00
Debanjum Singh Solanky	f046523b33	Improve khoj.el messages to convey state of khoj server - Remove waiting for server message as it hides the messages from the server - Fix the nil messages that were being rendered, by checking messages from the server before showing them - Consistently prefix messages from khoj with khoj.el	2023-04-28 11:15:13 +08:00
Debanjum Singh Solanky	76df393eb5	Only call khoj server configure API from khoj.el when config updated. Previously khoj.el was calling the server configure API even when the config was the same as before. This had broken the khoj search-as-you-type experience from emacs. Also show more details to the user about what in khoj is being configured	2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky	ceae06ae9d	Fix khoj.el compilation warnings around unused variables	2023-04-27 20:45:16 +08:00
Debanjum Singh Solanky	8269adf849	Refactor khoj-setup in khoj.el for readability. No functional change	2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky	865d12b6f2	Fix escaping quote in chat references to prevent it breaking out of html	2023-04-27 20:45:00 +08:00
Debanjum Singh Solanky	26cb878327	Add Yarn lockfile for Khoj Obsidian	2023-04-18 00:57:11 +07:00
Debanjum Singh Solanky	e3180d63e6	Sync Khoj Obsidian Tagline with Khoj tagline	2023-04-18 00:56:50 +07:00
Debanjum Singh Solanky	62e6e09521	Release Khoj version 0.6.1	2023-04-17 23:31:35 +07:00
Debanjum Singh Solanky	b079fb31bc	Replace Windows path separators in indexName configured via Khoj Obsidian. Resolves #185, #199 - Issue: IndexName created from the absolute Obsidian vault path wasn't replacing Windows path and drive separators with underscores. It was only replacing Unix path separators - Fix: Also replace Windows drive and path separators with _ while creating IndexName in Khoj Obsidian plugin	2023-04-17 16:55:33 +07:00
Debanjum Singh Solanky	d90df966a9	Make khoj logger use utf-8 encoding when writing to khoj log file. Resolve logger error issue mentioned in #199	2023-04-17 16:55:07 +07:00
Debanjum Singh Solanky	dc3f399f91	Fix to get score associated with SearchResponse in result as string	2023-04-16 20:22:51 +07:00
Debanjum Singh Solanky	d5000c63e1	Update Readmes to use python -m pip install khoj-assistant. Makes it easier to tell which python the pip being used is associated with. Easier to debug when users have different versions of python installed (e.g. 3.10 and 3.11)	2023-04-16 20:17:20 +07:00
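Commit 07b19964d4 schedules jobs at (co-)prime intervals to reduce overlap in job runs. The minimal Python sketch below illustrates why, assuming two periodic jobs that both start at t=0 with integer intervals in minutes. The function names and interval values are hypothetical and are not taken from Khoj's scheduler code. Two such jobs next run together at the least common multiple of their intervals.

```python
import math


def first_overlap(interval_a: int, interval_b: int) -> int:
    """First time (same unit as the intervals) at which two periodic jobs,
    both started at t=0, are due to run together again: lcm(a, b)."""
    # lcm(a, b) = a * b // gcd(a, b); avoids math.lcm, which needs Python 3.9+
    return interval_a * interval_b // math.gcd(interval_a, interval_b)


def overlaps_in_window(interval_a: int, interval_b: int, window: int) -> int:
    """Number of coinciding runs in the time window (0, window]."""
    return window // first_overlap(interval_a, interval_b)


# Hypothetical intervals in minutes, for illustration only
print(first_overlap(60, 30))                # 60: the 30-minute job collides with every hourly run
print(first_overlap(61, 29))                # 1769: co-prime intervals only collide at their product
print(overlaps_in_window(60, 30, 24 * 60))  # 24 collisions per day
print(overlaps_in_window(61, 29, 24 * 60))  # 0 collisions per day
```

Since lcm(a, b) is at most a * b, with equality exactly when gcd(a, b) = 1, co-prime periods push the first collision as far out as intervals of that size allow.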


1166 commits