Commit graph

2559 commits

Author SHA1 Message Date
Debanjum Singh Solanky
93edd5427f Add Chat navigation tab back to top pane on web client
Reduces user confusion on how to go to chat pane
Add emoji's for each tab to provide cleaner, iconified division between
the nav options
2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
8159d1ab25 Fix showing Search navigation tab from Agent pages on web client
The `has_documents' flag wasn't being passed. So the search tab
always showing up as empty instead of being dynamically enabled if
documents had been indexed.
2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
76cb543347 Show title bar in Khoj desktop app on Windows 2024-04-09 21:04:44 +05:30
Debanjum Singh Solanky
f040418cf1 Fix indexing files in sub-folders on the Desktop app
- `fs.readdir' func in node version 18.18.2 has buggy `recursive' option
  See nodejs/node#48640, effect-ts/effect#1801 for details

- We were recursing down a folder in two ways on the Desktop app.
  Remove `recursive: True' option to the `fs.readdirSync' method call
  to recurse down via app code only
2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky
a8dec1c9d5 Index all text, code files in Github repos. Not just md, org files 2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky
8291b898ca Standardize structure of text to entries to match other entry processors
Add process_single_plaintext_file func etc with similar signatures as
org_to_entries and markdown_to_entries processors

The standardization makes modifications, abstractions easier to create
2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky
079f409238 Skip indexing Github repo on hitting Github API rate limit
Sleep until rate limit passed is too expensive, as it keeps a
app worker occupied.

Ideally we should schedule job to contine after rate limit wait time
has passed. But this can only be added once we support jobs scheduling.
2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky
d5c9b5cb32 Stop indexing commits, issues and issue comments in Github indexer
Normal indexing quickly Github hits rate limits. Purpose of exposing
Github indexer is for indexing content like notes, code and other
knowledge base in a repo.

The current indexer doesn't scale to index metadata given Github's
rate limits, so remove it instead of giving a degraded experience of
partially indexed repos
2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky
7ff1bd9f8b Send more text file types from Desktop app and improve indexing them
- Allow syncing more file types from desktop app to index on server
  - Use `file-type' package to identify valid text file types on Desktop app

- Split plaintext entries into smaller logical units than a whole file
  Since the text splitting upgrades in #645, compiled chunks have more
  logical splits like paragraph, sentence.
  Show those (potentially) smaller snippets to the user as references

- Tangential Fix:
  Initialize unbound currentTime variable for error log timestamp
2024-04-09 20:19:40 +05:30
Debanjum Singh Solanky
89915dcb4c Identify file type by content & allow server to index all text files
- Use Magika's AI for a tiny, portable and better file type
  identification system
- Existing file type identification tools like `file' and `magic'
  require system level packages, that may not be installed by default
  on all operating systems (e.g `file' command on Windows)
2024-04-09 20:19:39 +05:30
sabaimran
312528d471 Fix typo in SECURE_PROXY_SSL_HEADER settings 2024-04-09 12:33:21 +05:30
sabaimran
e56c5e67dd Revert SSL Redirect setting as it prevents the admin page from loading 2024-04-09 12:24:48 +05:30
sabaimran
1770bb174b Add UUID to the KhojUser search fields and inc frequency of telemetry job to 2 mins 2024-04-09 11:51:51 +05:30
sabaimran
ab51ae9091 Use SECURE_SSL_REDIRECT to ensure requests are routed to https always 2024-04-09 10:18:12 +05:30
sabaimran
1c229dad91 Set daily limit for unsubsribed users to 5 in websocket API 2024-04-08 21:16:48 +05:30
sabaimran
27815d982c Redirect user to the login page when either of the csrf token inputs is missing 2024-04-08 20:22:17 +05:30
sabaimran
d257629f81 Handle case when properties field isn't present in the page 2024-04-08 16:15:47 +05:30
sabaimran
089e0d028b Add a more gracefull error message when the rate limit is exceeded 2024-04-08 15:20:54 +05:30
Debanjum
11ce3e2268
Update Text Chunking Strategy to Improve Search Context (#645)
## Major
- Parse markdown, org parent entries as single entry if fit within max tokens
- Parse a file as single entry if it fits with max token limits
- Add parent heading ancestry to extracted markdown entries for context
- Chunk text in preference order of para, sentence, word, character

## Minor
- Create wrapper function to get entries from org, md, pdf & text files
- Remove unused Entry to Jsonl converter from text to entry class, tests
- Dedupe code by using single func to process an org file into entries

Resolves #620
2024-04-08 13:56:38 +05:30
Debanjum Singh Solanky
67b1178aec Remove debug logs generated while compiling org-mode entries 2024-04-08 13:01:24 +05:30
sabaimran
731ad03348 Skip indexing commits that are missing properties 2024-04-07 15:19:07 +05:30
sabaimran
376eaf64cd Check if results are present in the pages or db response in Notion 2024-04-07 15:19:07 +05:30
Debanjum Singh Solanky
8222615280 Do not add original user message to knowledge search queries for offline chat
It's not required anymore. The extracted questions by the offline chat
model being used should be good enough.
2024-04-07 11:29:35 +05:30
sabaimran
351fb31a34 Add webpage search to socket codepath, add a feature page for online search 2024-04-07 09:23:29 +05:30
Debanjum Singh Solanky
4be4c53222 Release Khoj version 1.9.0 2024-04-05 17:13:58 +05:30
sabaimran
2aedd3c819 Increase freq. of telemetry upload to every 5 minutes 2024-04-05 14:13:47 +05:30
sabaimran
3b1234d084 Await the calls to the db in the notion.py file 2024-04-05 13:58:14 +05:30
sabaimran
00a67e9524 Add additional log lines when configuring the Notion settings for a user in the callback 2024-04-05 13:19:24 +05:30
sabaimran
d23f7da8e3 Handle the case where a previous serach model isn't set when updating the model 2024-04-05 13:18:51 +05:30
sabaimran
f57f9f672d
Address Notion, Image tech debt in indexing code path (#687)
* Add support for using OAuth2.0 in the Notion integration
* Add notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is setup
* Fix references to the configure_content methods
2024-04-05 12:10:03 +05:30
sabaimran
a60321b68e Push khoj to include inline references when possible 2024-04-04 10:31:13 +05:30
sabaimran
5bdcb4e69c Wait for location data to be returned before setting up the socket connection 2024-04-04 10:31:13 +05:30
Debanjum Singh Solanky
00f599ea78 Fix passing flags to re.split to break org, md content by heading level
`re.MULTILINE' should be passed to the `flags' argument, not the
`max_splits' argument of the `re.split' func

This was messing up the indexing by only allowing a maximum of
re.MULTILINE splits. Fixing this improves the search quality to
previous state
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
32ac0622ff Extract dates from compiled text entries 2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
29c1c18042 Increase search distance to get relevant content for chat post indexer update
More content indexed per entry would result in an overall scores
lowering effect. Increase default search distance threshold to counter that

- Details
  - Fix expected results post indexing updates
  - Fix search with max distance post indexing updates

- Minor
  - Remove openai chat actor test for after: operator as it's not expected anymore
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
ad4fa4b2f4 Fix adding file path instead of stem to markdown entries 2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
44b3247869 Update logical splitting of org-mode text into entries
- Major
  - Do not split org file, entry if it fits within the max token limits
    - Recurse down org file entries, one heading level at a time until
      reach leaf node or the current parent tree fits context window
    - Update `process_single_org_file' func logic to do this recursion

  - Convert extracted org nodes with children into entries
    - Previously org node to entry code just had to handle leaf entries
    - Now it recieve list of org node trees
    - Only add ancestor path to root org-node of each tree
    - Indent each entry trees headings by +1 level from base level (=2)

- Minor
  - Stop timing org-node parsing vs org-node to entry conversion
    Just time the wrapping function for org-mode entry extraction
    This standardizes what is being timed across at md, org etc.
  - Move try/catch to `extract_org_nodes' from `parse_single_org_file'
    func to standardize this also across md, org
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
eaa27ca841 Only add spaces after heading if any tags in orgnode raw entry repr 2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
2ea8a832a0 Log error when fail to index md file. Fix, improve typing in md_to_entries 2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
44eab74888 Dedupe code by using single func to process an org file into entries
Add type hints to orgnode and org-to-entries packages
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
db2581459f Parse markdown parent entries as single entry if fit within max tokens
These changes improve context available to the search model.
Specifically this should improve entry context from short knowledge trees,
that is knowledge bases with sparse, short heading/entry trees

Previously we'd always split markdown files by headings, even if a
parent entry was small enough to fit entirely within the max token
limits of the search model. This used to reduce the context available
to the search model to select appropriate entries for a query,
especially from short entry trees

Revert back to using regex to parse through markdown file instead of
using MarkdownHeaderTextSplitter. It was easier to implement the
logical split using regexes rather than bend MarkdowHeaderTextSplitter
to implement it.
- DFS traverse the markdown knowledge tree, prefix ancestry to each entry
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
982ac1859c Parse markdown file as single entry if it fits with max token limits
These changes improve entry context available to the search model
Specifically this should improve entry context from short knowledge trees,
that is knowledge bases with small files

Previously we split all markdown files by their headings,
even if the file was small enough to fit entirely within the max token
limits of the search model. This used to reduce the context available
to select the appropriate entries for a given query for the search model,
especially from short knowledge trees
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
d8f01876e5 Add parent heading ancestory to extracted markdown entries for context
Improve, update the markdown to entries extractor tests
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
86575b2946 Chunk text in preference order of para, sentence, word, character
- Previous simplistic chunking strategy of splitting text by space
  didn't capture notes with newlines, no spaces. For e.g in #620

- New strategy will try chunk the text at more natural points like
  paragraph, sentence, word first. If none of those work it'll split
  at character to fit within max token limit

- Drop long words while preserving original delimiters

Resolves #620
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
a627f56a64 Remove unused Entry to Jsonl converter from text to entry class, tests
This was earlier used when the index was plaintext jsonl file. Now
that documents are indexed in a DB this func is not required.

Simplify org,md,pdf,plaintext to entries tests by removing the entry
to jsonl conversion step
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
28105ee027 Create wrapper function to get entries from org, md, pdf & text files
- Convert extract_org_entries function to actually extract org entries
  Previously it was extracting intermediary org-node objects instead
  Now it extracts the org-node objects from files and converts them
  into entries
- Create separate, new function to extract_org_nodes from files
- Similarly create wrapper funcs for md, pdf, plaintext to entries

- Update org, md, pdf, plaintext to entries tests to use the new
  simplified wrapper function to extract org entries
2024-04-04 02:41:55 +05:30
Debanjum Singh Solanky
f01a12b1d2 Improve styling of chat sessions side panel
- Move green server connected dot to the bottom. Show status when
  disconnected from server
- Move "New conversation" button to right of the "Conversation" title
- Center alignment of the new conversation and connection status buttons
2024-04-04 01:43:26 +05:30
sabaimran
dd1e5e145a Use List[Any] for typing 2024-04-03 21:46:41 +05:30
sabaimran
b8087c4c8e Add typing to empty list variables in github_to_entries 2024-04-03 21:41:36 +05:30
sabaimran
d036fdfc26 If tree is not in the contents, then just return empty files list 2024-04-03 17:55:25 +05:30
Debanjum Singh Solanky
f915b2bd14 Fix passing model_name param to chatml formatter for online chat 2024-04-03 17:21:43 +05:30
sabaimran
6aa88761b8 Skip creating the default agent if there's no default conversation config 2024-04-03 17:21:01 +05:30
sabaimran
b4f71e06b3 Add timeout after 10 minutes of inactivity on socket 2024-04-02 22:12:27 +05:30
sabaimran
f48426623d resolve merge conflict in chat.html 2024-04-02 17:29:48 +05:30
sabaimran
bf1187f465 Use new online/websearch logic and add agent to chat_metadata 2024-04-02 17:20:38 +05:30
sabaimran
867e1007d1 Remove superfluous newline 2024-04-02 17:20:08 +05:30
sabaimran
228ad68042 Merge with origin/master 2024-04-02 17:02:21 +05:30
sabaimran
776550d5ce Add a migration for updating the default chat model, update for existing users 2024-04-02 17:01:31 +05:30
sabaimran
47fc7e1ce6 Rebase with matser 2024-04-02 16:16:06 +05:30
Debanjum
215ab6e66a
Extract More Dates from entries to improve Date Filter (#683)
- Overview
  - Extract more structured date variants (e.g with dot(.) & slash(/) separators, 2-digit year)
  - Extract some natural, partial dates as well from entries
- Capability
  Add ability to extract the following additional date forms:
  - Natural Dates: 21st April 2000, February 29 2024
  - Partial Natural Dates: March 24, Mar 2024
  - Structured Dates: 20/12/24, 20.12.2024, 2024/12/20
  Note: Previously only YYYY-MM-DD ISO-8601 structured date form was extracted for date filters
- Performance
  Using regexes is MUCH faster than using the `dateparser' python library
  It's a little crude but gives acceptable performance for large datasets
2024-04-02 16:14:53 +05:30
Debanjum Singh Solanky
7afee2d55c Let offline chat model set context window. Improve, fix prompts 2024-03-31 16:19:35 +05:30
Debanjum Singh Solanky
4228965c9b Handle msg truncation when question is larger than max prompt size
Notice and truncate the question it self at this point
2024-03-31 15:50:06 +05:30
Debanjum Singh Solanky
886d49e3a4 Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat 2024-03-31 00:59:20 +05:30
Debanjum Singh Solanky
4f65dde201 Release Khoj version 1.8.0 2024-03-31 00:06:15 +05:30
Debanjum Singh Solanky
7923903d21 Improve date filter regexes to extract structured, natural, partial dates
- Much faster than using dateparser
  - It took 2x-4x for improved regex to extracts 1-15% more dates
  - Whereas It took 33x to 100x for dateparser to extract 65% - 400% more dates
  - Improve date extractor tests to test deduping dates, natural,
    structured date extraction from content

- Extract some natural, partial dates and more structured dates
  Using regex is much faster than using dateparser. It's a little
  crude but should pay off in performance.

  Supports dates of form:
  - (Day-of-Month) Month|AbbreviatedMonth Year|2DigitYear
  - Month|AbbreviatedMonth (Day-of-Month) Year|2DigitYear
2024-03-30 00:07:19 +05:30
Debanjum Singh Solanky
104eeea274 Extract natural language and locale specific dates in content
Previously we just extracted dates in YYYY-MM-DD format from content
for date filterings during search.

Use dateparser to extract dates across locales and natural language

This should improve notes returned as context when chat searches
knowledge base with date filters

Fallback to regex for date parsing from content if dateparser fails

- Limit natural date extractor capabilities to improve performance
  - Assume language is english
    Language detection otherwise takes a REALLY long time
  - Do not extract unix timestamps, timezone
    - This isn't required, as just using date and approximating dates as UTC
2024-03-30 00:06:56 +05:30
sabaimran
1195f843a3 Remove forward slash from the root agents endpoint 2024-03-28 23:06:55 +05:30
sabaimran
a1729b9b9e Add telemetry for agents used in conversation, increase image width in agents page 2024-03-28 22:18:11 +05:30
sabaimran
d503b3e867 Use Personality vernacular in agent page
- When setting up the default agent, configure every conversation that doesn't have an agent to use the Khoj agent
- Fix reverse migration for the locale removal migration
2024-03-28 15:07:02 +05:30
sabaimran
e59de8c9b1 Constrain width/size of agent image in agents view 2024-03-28 13:32:11 +05:30
sabaimran
51d0c9b8b0 Add telemetry to keep state of new agents being used 2024-03-28 11:37:24 +05:30
sabaimran
46ebc55e2b Add a top tab for agents 2024-03-28 11:37:01 +05:30
sabaimran
8397187231 Use default agent when creating a new conversation without agent specified 2024-03-28 11:36:27 +05:30
Debanjum Singh Solanky
4912c0ee30 Use extract queries actor to improve notes search with offline chat
Previously we were skipping the extract questions step for offline
chat as default offline chat model wasn't good enough to output proper
json given the time it took to extract questions.

The new default offline chat models gives json much more regularly and
with date filters, so the extract questions step becomes useful given
the impact on latency
2024-03-26 22:33:01 +05:30
Debanjum Singh Solanky
1ebd5c3648 Rename GPT4AllChatProcessor* to OfflineChatProcessor Config, Model 2024-03-26 22:33:01 +05:30
Debanjum Singh Solanky
2a0b943bb4 Use Hermes-2-Pro as default offline chat model in khoj.yml 2024-03-26 22:33:01 +05:30
Debanjum Singh Solanky
8ca39a436c Use llama.cpp for offline chat models
- Benefits of moving to llama-cpp-python from gpt4all:
  - Support for all GGUF format chat models
  - Support for AMD, Nvidia, Mac, Vulcan GPU machines (instead of just Vulcan, Mac)
  - Supports models with more capabilities like tools, schema
    enforcement, speculative ddecoding, image gen etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
  chat models

- Load offline chat model when present on disk without requiring internet
  - Load model onto GPU if not disabled and device has GPU
  - Load model onto CPU if loading model onto GPU fails
  - Create helper function to check and load model from disk, when model
    glob is present on disk.

    `Llama.from_pretrained' needs internet to get repo info from
    HuggingFace. This isn't required, if the model is already downloaded

    Didn't find any existing HF or llama.cpp method that looked for model
    glob on disk without internet
2024-03-26 22:33:01 +05:30
Debanjum Singh Solanky
0a7392f6ec Only add location to image prompt generator when location known 2024-03-26 22:33:01 +05:30
sabaimran
fdf78525b4
Part 2: Add web UI updates for basic agent interactions (#675)
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications

* Customize default behaviors for conversations without agents or with default agents

* Add a new web client route for viewing all agents

* Use agent_id for getting correct agent

* Add web UI views for agents
- Add a page to view all agents
- Add slugs to manage agents
- Add a view to view single agent
- Display active agent when in chat window
- Fix post-login redirect issue

* Fix agent view

* Spruce up the 404 page and improve the overall layout for agents pages

* Create chat actor for directly reading webpages based on user message

- Add prompt for the read webpages chat actor to extract, infer
  webpage links
- Make chat actor infer or extract webpage to read directly from user
  message
- Rename previous read_webpage function to more narrow
  read_webpage_at_url function

* Rename agents_page -> agent_page

* Fix unit test for adding the filename to the compiled markdown entry

* Fix layout of agent, agents pages

* Merge migrations

* Let the name, slug of the default agent be Khoj, khoj

* Fix chat-related unit tests

* Add webpage chat command for read web pages requested by user

Update auto chat command inference prompt to show example of when to
use webpage chat command (i.e when url is directly provided in link)

* Support webpage command in chat API

- Fallback to use webpage when SERPER not setup and online command was
  attempted
- Do not stop responding if can't retrieve online results. Try to
  respond without the online context

* Test select webpage as data source and extract web urls chat actors

* Tweak prompts to extract information from webpages, online results

- Show more of the truncated messages for debugging context
- Update Khoj personality prompt to encourage it to remember it's capabilities

* Rename extract_content online results field to webpages

* Parallelize simple webpage read and extractor

Similar to what is being done with search_online with olostep

* Pass multiple webpages with their urls in online results context

Previously even if MAX_WEBPAGES_TO_READ was > 1, only 1 extracted
content would ever be passed.

URL of the extracted webpage content wasn't passed to clients in
online results context. This limited them from being rendered

* Render webpage read in chat response references on Web, Desktop apps

* Time chat actor responses & chat api request start for perf analysis

* Increase the keep alive timeout in the main application for testing

* Do not pipe access/error logs to separate files. Flow to stdout/stderr

* [Temp] Reduce to 1 gunicorn worker

* Change prod docker image to use jammy, rather than nvidia base image

* Use Khoj icon when Khoj web is installed on iOS as a PWA

* Make slug required for agents

* Simplify calling logic and prevent agent access for unauthenticated users

* Standardize to use personality over tuning in agent nomenclature

* Make filtering logic more stringent for accessible agents and remove unused method:

* Format chat message query

---------

Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2024-03-26 18:13:24 +05:30
Debanjum Singh Solanky
15ed208996 Use Khoj icon when Khoj web is installed on iOS as a PWA 2024-03-26 00:13:12 +05:30
Debanjum
586654e2af
Allow directly reading web pages, even when SERP not enabled (#676)
### Overview
Khoj can now read website directly without needing to go through the search step first

### Details
- Parallelize simple webpage read and extractor
- Rename extract_content online results field to web pages
- Tweak prompts to extract information from webpages, online results
- Test select webpage as data source and extract web urls chat actors

- Render webpage read in chat response references on Web, Desktop apps
- Pass multiple webpages with their urls in online results context

- Support webpage command in chat API
- Add webpage chat command for read web pages requested by user
- Create chat actor for directly reading webpages based on user message
2024-03-24 16:25:25 +05:30
Debanjum Singh Solanky
9e52ae9e98 Time chat actor responses & chat api request start for perf analysis 2024-03-24 15:47:38 +05:30
Debanjum Singh Solanky
dabf71bc3c Render webpage read in chat response references on Web, Desktop apps 2024-03-24 15:47:38 +05:30
Debanjum Singh Solanky
a2e79c94be Pass multiple webpages with their urls in online results context
Previously even if MAX_WEBPAGES_TO_READ was > 1, only 1 extracted
content would ever be passed.

URL of the extracted webpage content wasn't passed to clients in
online results context. This limited them from being rendered
2024-03-24 15:47:38 +05:30
Debanjum Singh Solanky
71b6905008 Parallelize simple webpage read and extractor
Similar to what is being done with search_online with olostep
2024-03-24 15:46:29 +05:30
Debanjum Singh Solanky
1167f6ddf9 Rename extract_content online results field to webpages 2024-03-24 15:46:29 +05:30
Debanjum Singh Solanky
b22a7dae5d Tweak prompts to extract information from webpages, online results
- Show more of the truncated messages for debugging context
- Update Khoj personality prompt to encourage it to remember it's capabilities
2024-03-24 15:46:29 +05:30
Debanjum Singh Solanky
ad6f6bb0ed Support webpage command in chat API
- Fallback to use webpage when SERPER not setup and online command was
  attempted
- Do not stop responding if can't retrieve online results. Try to
  respond without the online context
2024-03-24 15:46:29 +05:30
Debanjum Singh Solanky
a6b7432837 Add webpage chat command for read web pages requested by user
Update auto chat command inference prompt to show example of when to
use webpage chat command (i.e when url is directly provided in link)
2024-03-24 15:46:29 +05:30
sabaimran
8abc8ded82
Part 1: Server-side changes to support agents integrated with Conversations (#671)
* Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications

* Customize default behaviors for conversations without agents or with default agents

* Use agent_id for getting correct agent

* Merge migrations

* Simplify some variable definitions, add additional security checks for agents

* Rename agent.tuning -> agent.personality
2024-03-23 22:09:38 +05:30
sabaimran
4deb849fb1 Merge branch 'features/add-agents-ui' of github.com:khoj-ai/khoj into features/chat-socket-streaming 2024-03-23 14:04:25 +05:30
sabaimran
8edbd7094f Let the name, slug of the default agent be Khoj, khoj 2024-03-23 14:03:58 +05:30
sabaimran
6b4c4f10b5 Merge branch 'features/add-agents-ui' of github.com:khoj-ai/khoj into features/chat-socket-streaming 2024-03-23 11:22:00 +05:30
sabaimran
20617614ae Merge branch 'features/customize-chat-with-agents' of github.com:khoj-ai/khoj into features/add-agents-ui 2024-03-23 11:20:57 +05:30
sabaimran
2399d91f61 Merge migrations 2024-03-22 10:05:33 +05:30
sabaimran
d38089ab57 Merge with origin 2024-03-22 09:55:33 +05:30
Debanjum Singh Solanky
aed4313cfc Fix updating specific conversation by id from the chat API endpoint
- Use the conversation id of the retrieved conversation rather than the
  potentially unset conversation id passed via API
- await creating new chat when no chat id provided and no existing
  conversations exist
2024-03-21 02:46:52 +05:30
sabaimran
6ba0d8e379 Add a connected notification if the websocket is connected 2024-03-20 20:53:28 +05:30
sabaimran
255b69dc58 Add a comma delimeter between outputted search queries 2024-03-20 19:43:35 +05:30
sabaimran
d84188b221 Scroll down when a message is added in the chat interface's handle stream response method 2024-03-20 15:04:41 +05:30
sabaimran
70ad78990a Use a common method for sending a generic message to the client from the server in the ws connection 2024-03-20 15:04:14 +05:30
sabaimran
d4e83b060a Update the web UI for the chat interface to establish a connection via a socket to the server
- Move some common methods into separate functions to make the UI components more efficient
- The normal HTTP-based chat connection will still work and serves as a fallback if the websocket is unavailable
2024-03-20 14:34:47 +05:30
sabaimran
a346f79b39 Add support for chatting via the web socket connection
- Convert to a model of calling the search API directly with a function call (rather than using the API method)
- Gracefully handle websocket connection disconnects
- Ensure that the rest of the response is still saved, as it is currently, if the user disconects from the client
- Setup unchangeable context at the beginning of the session when the connection is established (like location, username, etc)
2024-03-20 14:33:33 +05:30
Debanjum Singh Solanky
62a83dc9bb Fix online search actor to use natural dates not after: operator
The recently added after: operator to online search actor was too
restrictive, gave worse results than when just use natural language
dates in search query
2024-03-15 21:50:14 +05:30
Debanjum Singh Solanky
4a1e6a2275 Convert deleted old user requests log line to debug from info 2024-03-15 20:50:10 +05:30
Debanjum Singh Solanky
9a068dadbf Fix extract questions prompt to use YYYY-MM-DD date filter format 2024-03-15 18:43:18 +05:30
Debanjum Singh Solanky
ecddf98430 Handle truncation when single long non-system chat message
Previously was assuming the system prompt is being always passed as
the first message. So expected there to be at least 2 messages in logs.

This broke chat actors querying with single long non system message.

A more robust way to extract system prompt is via the message role
instead
2024-03-15 15:58:39 +05:30
Debanjum Singh Solanky
ec0c35b7ed Improve delete, rename chat session UX in Desktop, Web app
- Ask for Confirmation before deleting chat session in Desktop, Web app
- Save chat session rename on hitting enter in title edit input box
- No need to flash previous conversation cleared status message
- Move chat session delete button after rename button in Desktop app
2024-03-15 15:58:19 +05:30
Debanjum Singh Solanky
924b1215ce Allow unset locale for Google authenticated user 2024-03-15 15:35:20 +05:30
Debanjum Singh Solanky
c792fa819f Fix setting chat session title from Desktop app
Pass auth headers to not have the chat session title update request fail
2024-03-15 15:19:20 +05:30
Debanjum Singh Solanky
c9e05dc184 Get conversation by title when requested via chat API 2024-03-15 12:31:50 +05:30
sabaimran
724557fc7b Merge branch 'master' of github.com:khoj-ai/khoj into features/add-agents-ui 2024-03-15 12:14:34 +05:30
sabaimran
7fc484ba7a Merge branch 'master' of github.com:khoj-ai/khoj into features/customize-chat-with-agents 2024-03-15 12:13:28 +05:30
Debanjum Singh Solanky
cac26dafe3 Only create new chat on get if a specific chat id, slug isn't requested 2024-03-15 11:58:39 +05:30
sabaimran
416feb13ef Fix layout of agent, agents pages 2024-03-15 11:17:40 +05:30
sabaimran
d734be61cf Rename agents_page -> agent_page 2024-03-15 10:17:51 +05:30
Debanjum Singh Solanky
08993ff109 Add new, remove old known chat models from model to prompt size map 2024-03-15 04:02:25 +05:30
Debanjum Singh Solanky
fba0338787 Release Khoj version 1.7.0 2024-03-15 00:08:32 +05:30
Debanjum Singh Solanky
6118d1ff57 Create chat actor for directly reading webpages based on user message
- Add prompt for the read webpages chat actor to extract, infer
  webpage links
- Make chat actor infer or extract webpage to read directly from user
  message
- Rename previous read_webpage function to more narrow
  read_webpage_at_url function
2024-03-14 14:58:37 +05:30
Debanjum
e549824fe2
Improve OpenAI Chat Actors and their prompts (#673)
### Major
- Enforce json mode response from OpenAI chat actors prev using string lists
- Use `gpt-4-turbo-preview' as default chat model, extract questions actor
- Make Khoj read khoj website to respond with accurate, up-to-date information about itself
- Dedupe query in notes prompt. Improve OAI chat actor, director tests

### Minor
- Test data source, output mode selector, web search query chat actors
- Improve notes search actor to always create a non-empty list of queries
- Construct available data sources, output modes as a bullet list in prompts
- Use consistent agent name across static and dynamic examples in prompts
- Add actor's name to extract questions prompt to improve context for guidance
2024-03-14 12:44:40 +05:30
sabaimran
3caf0a79d8 Spruce up the 404 page and improve the overall layout for agents pages 2024-03-14 11:26:49 +05:30
sabaimran
c45030af44 Fix agent view 2024-03-14 11:13:19 +05:30
Debanjum Singh Solanky
a1ce12296f Fix rendering online with note references post streaming chat response
Previously only the notes references would get rendered post response
streaming when when both online and notes references were used to
respond to the user's message
2024-03-14 03:40:40 +05:30
Debanjum Singh Solanky
1aeea3d854 Fix opening external links from confirmation dialog box on desktop app 2024-03-14 02:29:22 +05:30
Debanjum Singh Solanky
2e5cc49cb3 Enforce json response from OpenAI chat actors prev using string lists
- Allow passing response format type to OpenAI API via chat actors
- Convert in-context examples to use json objects instead of str lists
- Update actors outputting str list to request output to be json_object
  - OpenAI's json mode enforces the model to output valid json object
2024-03-14 01:22:33 +05:30
Debanjum Singh Solanky
7211eb9cf5 Default to gpt-4-turbo-preview for chat model, extract questions actor
GPT-4 is more expensive and generally less capable than gpt-4-turbo-preview
2024-03-14 01:22:33 +05:30
Debanjum Singh Solanky
dd883dc53a Dedupe query in notes prompt. Improve OAI chat actor, director tests
- Remove stale tests
- Improve tests to pass across gpt-3.5 and gpt-4-turbo
- The haiku creation director was failing because of duplicate query in
  instantiated prompt
2024-03-14 01:22:33 +05:30
Debanjum Singh Solanky
14682d5354 Improve notes search actor to always create a non-empty list of queries
- Remove the option for Notes search query generation actor to return
  no queries. Whether search should be performed is decided before,
  this step doesn't need to decide that
- But do not throw warning if the response is a list with no elements
2024-03-14 01:22:33 +05:30
Debanjum Singh Solanky
f5734826cb Improve pick data source prompt to look online for info about Khoj
- Add examples where user queries requesting information about Khoj
  results in the "online" data source being selected
- Add an example for "general" to select chat command prompt
2024-03-14 01:21:13 +05:30
Debanjum Singh Solanky
9a516bed47 Construct available data sources, output modes as a bullet list in prompts 2024-03-14 00:34:57 +05:30
Debanjum Singh Solanky
f28fb89af8 Use consistent agent name across static and dynamic examples in prompts
Previously the examples constructed from chat history used "Khoj" as
the agent's name but all 3 prompts using the func used static examples
with "AI:" as the pertinent agent's name
2024-03-14 00:34:57 +05:30
Debanjum Singh Solanky
f5793149a9 Add actor's name to extract questions prompt to improve context for guidance 2024-03-14 00:34:57 +05:30
Debanjum Singh Solanky
73ad444086 Make online search Actor read khoj.dev for docs, info about Khoj
- Add example to read khoj.dev website for up-to-date info to setup,
  use khoj, discover khoj features etc.
- Online search should use site: and after: google search operators
  - Show example of adding the after: date filter to google search
- Give local event lookup example using user's current location in
  query
- Remove unused select search content type prompt
2024-03-14 00:34:57 +05:30
sabaimran
290712c3fe Add web UI views for agents
- Add a page to view all agents
- Add slugs to manage agents
- Add a view to view single agent
- Display active agent when in chat window
- Fix post-login redirect issue
2024-03-14 00:07:36 +05:30
Debanjum
3abe7ccb26
Improve Online Search Speed and Context (#670)
### Major
- Read web pages in parallel to improve chat response time
- Read web pages directly when Olostep proxy not setup
- Include search results & web page content in online context for chat response

### Minor
- Simplify, modularize and add type hints to online search functions
2024-03-11 22:16:30 +05:30
Debanjum Singh Solanky
dc86e44a07 Include search results & webpage content in online context for chat response
Previously if a web page was read for a sub-query, only the extracted
web page content was provided as context for the given sub-query. But
the google results themselves have relevant snippets. So include them
2024-03-11 18:41:02 +05:30
Debanjum Singh Solanky
d136a6be44 Simplify, modularize and add type hints to online search functions
- Simplify content arg to `extract_relevant_info' function. Validate,
  clean the content arg inside the `extract_relevant_info' function

- Extract `search_with_google' function outside the parent function
- Call the parent function a more appropriate `search_online' instead
  of `search_with_google'
- Simplify the `search_with_google' function using list comprehension.
  Drop empty search result fields from chat model context for response
  to reduce cost and response latency

- No need to show stacktrace when unable to read webpage, basic error
  is enough
- Add type hints to online search functions to catch issues with mypy
2024-03-11 18:41:02 +05:30
Debanjum Singh Solanky
88f096977b Read webpages directly when Olostep proxy not setup
This is useful for self-hosted, individual user, low traffic setups
where a proxy service is not required
2024-03-11 18:41:02 +05:30
Debanjum Singh Solanky
ca2f962e95 Read, extract information from web pages in parallel to lower response time
- Time reading webpage, extract info from webpage steps for perf
  analysis
- Deduplicate webpages to read gathered across separate google
  searches
- Use aiohttp to make API requests non-blocking, pair with asyncio to
  parallelize all the online search webpage read and extract calls
2024-03-11 18:41:02 +05:30
sabaimran
8e1445b15b Use agent_id for getting correct agent 2024-03-11 14:44:46 +05:30
sabaimran
6ab649312f Add a new web client route for viewing all agents 2024-03-11 14:40:40 +05:30
sabaimran
352168d6c2 Customize default behaviors for conversations without agents or with default agents 2024-03-11 14:20:28 +05:30
sabaimran
9b88976f36 Initial pass at backend changes to support agents
- Add a db model for Agents, attaching them to conversations
- When an agent is added to a conversation, override the system prompt to tweak the instructions
- Agents can be configured with prompt modification, model specification, a profile picture, and other things
- Admin-configured models will not be editable by individual users
- Add unit tests to verify agent behavior. Unit tests demonstrate imperfect adherence to prompt specifications
2024-03-11 12:45:24 +05:30
Debanjum
18fa3e2384
Rerank Search Results by Default on GPU machines (#668)
- Trigger
   SentenceTransformer Cross Encoder models now run fast on GPU enabled machines, including Mac ARM devices since UKPLab/sentence-transformers#2463

- Details
  - Use cross-encoder to rerank search results by default on GPU machines and when using an inference server
  - Only call search API when pause in typing search query on web, desktop apps
2024-03-10 15:15:25 +05:30
Debanjum Singh Solanky
53d402480c Rerank search results with cross-encoder when using an inference server
If an inference server is being used, we can expect the cross encoder
to be running fast enough to rerank search results by default
2024-03-10 15:09:46 +05:30
Debanjum Singh Solanky
44c8d09342 Only call search API when pause in typing search query on web, desktop apps
Wait for 300ms since stop typing before calling search API.

This smooths out UI jitter when rendering search results, especially
now that we're reranking for every search query on GPU enabled devices

Emacs already has 300ms debounce time. More convoluted to add
debounce time to Obsidian search modal, so not updating that yet
2024-03-10 14:29:24 +05:30
Debanjum Singh Solanky
1105d8814f Use cross-encoder to rerank search results by default on GPU machines
Latest sentence-transformer package uses GPU for cross-encoder. This
makes it fast enough to enable reranking on machines with GPU.

Enabling search reranking by default allows (at least) users with GPUs
to side-step learning the UI affordance to rerank results
(i.e hitting Cmd/Ctrl-Enter or ENTER).
2024-03-10 14:29:21 +05:30
Debanjum Singh Solanky
fd81446ba3 Do not create new chat session when an old chat session is deleted
- Fix
  `get_conversation_by_user' shouldn't return new conversation if
  conversation with requested id not found.

  It should only return new conversation if no specific conversation
  is requested and no conversations found for user at all

- Repro
  - Delete a new chat, this calls loadChat via window.onload which
    calls server /chat/history API endpoint with conversationId set to
    that of just deleted conversation sporadically

    The call to GET chat/history API with conversationId set occurs
    when window.onload triggers before the conversationId is deleted
    by the delete button after the DELETE /chat/history API call (via race)

  - In such a scenario, get_conversation_by_user called by
    chat/history API with conversationId of deleted conversation
    returns a new conversation

- Miscellaneous
  - Chat history load should be logged as call to that chat_history api,
    not the "chat" api
  - Show status updates of clearing conversation history in chat input
  - Simplify web, desktop client code by removing unnecessary new variables
2024-03-10 02:17:23 +05:30
Debanjum Singh Solanky
b7fad04870 Use consistent field name for queries in chat history & better image prompt 2024-03-09 19:11:03 +05:30
sabaimran
6aae9864d3 Fix Notion indexing and add an admin view for Entry objects 2024-03-09 16:25:23 +05:30
sabaimran
12d6c4da7d Only include inferred queries in the conversation history for images, not links. Overflow the side panel when too long 2024-03-09 11:59:35 +05:30
sabaimran
e5cd0237e3 Release Khoj version 1.6.2 2024-03-08 17:04:03 +05:30
Debanjum Singh Solanky
446ac7649d Remove unused js method in web chat client, add newline to web data in prompt 2024-03-08 16:40:39 +05:30
Debanjum Singh Solanky
12d32ac99c Increase user visibility into more errors during image generation
Catch OpenAI connection error and errors during better image prompt
generation
2024-03-08 16:40:39 +05:30
sabaimran
ff31759423 Fix target determination in the copy programmatic output button 2024-03-08 16:33:12 +05:30
sabaimran
9f934929c6 Infer mime type from file ending when not available in browser. Don't output image in conversation turns 2024-03-08 12:34:26 +05:30
sabaimran
81beb7940c
Upload generated images to s3, if AWS credentials and bucket is available (#667)
* Upload generated images to s3, if AWS credentials and bucket is available.
- In clients, render the images via the URL if it's returned with a text-to-image2 intent type
* Make the loading screen more intuitve, less jerky and update the programmatic copy button
* Update the loading icon when waiting for a chat response
2024-03-08 10:54:13 +05:30
sabaimran
13894e1fd5 add instructions for drag/drop files in sys prompt 2024-03-07 17:57:42 +05:30
sabaimran
7357b6eff1 Revert white-space preline and add more detailed help text when selecting file 2024-03-06 16:47:27 +05:30
sabaimran
b615c0719e
Support upload for files via drag/drop in the web UI (#666)
* Add additional styling changes for showing UI changes when dragging file to the main screen

* Add a loading spinner when file upload is in progress, and don't index github/notion when indexing files

* Add an explicit icon for file uploading in the chat button menu

* Add appropriate dragover styling when picking a file from the file picker/browser

* Add a loading screen when retrieving chat history. Fix width of the chat window. Put attachment icon to the left of chat input
2024-03-06 16:43:05 +05:30
sabaimran
e323a6d69b
Include additional user context in the image generation flow (#660)
* Make major improvements to the image generation flow

- Include user context from online references and personal notes for generating images
- Dynamically select the modality that the LLM should respond with
- Retun the inferred context in the query response for the dekstop, web chat views to read

* Add unit tests for retrieving response modes via LLM

* Move output mode unit tests to the actor suite, rather than director

* Only show the references button if there is at least one available

* Rename aget_relevant_modes to aget_relevant_output_modes

* Use a shared method for generating reference sections, simplify some of the prompting logic

* Make out of space errors in the desktop client more obvious
2024-03-06 13:48:41 +05:30
Debanjum Singh Solanky
2d61591c22 Improve user visibility into errors during image generation 2024-02-29 13:19:13 +05:30
sabaimran
0bbb5cff85 Release Khoj version 1.6.1 2024-02-26 13:27:20 -08:00
sabaimran
c8194a7364 Make out of space errors in the desktop client more obvious 2024-02-26 11:53:36 -08:00
Debanjum Singh Solanky
956dd71d91 Clean entry before adding to DB and log when it fails
Remove \0 null characters from entry fields as this is causing
indexing errors
2024-02-27 01:19:34 +05:30
Debanjum Singh Solanky
bb613a8e1d Make indentation styling more compact on Obsidian client 2024-02-25 14:41:45 +05:30
Debanjum Singh Solanky
682b70011f Set chat body height to remove UX jitter on chat history load in Web, Desktop 2024-02-25 14:40:47 +05:30
Debanjum Singh Solanky
efe86ce159 Fix saved conversation logger to handle image responses 2024-02-25 13:46:32 +05:30
Debanjum Singh Solanky
4839f2901a Open external links in Desktop app with default app for url on OS
- Open external links using the default link handler registered on OS
  for the link type, e.g http:// -> firefox, mailto: thunderbird etc
- Confirm before opening non-http URL using an external app
2024-02-25 13:21:52 +05:30
Debanjum
170bce2c02
Fix, Improve rendering images in Obsidian, Desktop, Web clients (#659)
- Improve render of inferred query in image chat messages in Web, Desktop apps
- Add inferred queries to image chat responses in Obsidian client
- Fix rendering images from Khoj response in Obsidian client
2024-02-25 00:56:26 +05:30
Debanjum Singh Solanky
f84606325c Improve render of inferred query in image chat messages in Web, Desktop apps 2024-02-25 00:47:06 +05:30
Debanjum Singh Solanky
a2e53d5e41 Add inferred queries to image chat responses in Obsidian client 2024-02-25 00:24:58 +05:30
Debanjum Singh Solanky
9b61f0b5f7 Fix rendering images from Khoj response in Obsidian client 2024-02-25 00:11:11 +05:30
sabaimran
b9d0533d92
Misc. fixes to prompting, admin, and others (#658)
* Simplify and clarify prompt for selecting toolset dynamically

* Add error handling around call to OLOSTEP api

* Fix conversation admin page

* Skip adding none or empty entries in the chunking method
2024-02-24 10:25:42 -08:00
Debanjum Singh Solanky
0e0e751ef7 Improve docstring of entrypoint function to the emacs client 2024-02-24 21:09:41 +05:30
Debanjum
8855529637
Improve Syncing Obsidian Vault, Invalidate Static Assets in Browser Cache in Web Client (#657)
- Improve
  - Only send files modified since their last sync for indexing on server from the Obsidian client
- Fix 
  - Invalidate static asset browser cache in Web client when Khoj version changes
2024-02-24 20:20:30 +05:30
Debanjum Singh Solanky
a46f70c4b0 Remove deprecated lastSyncedFiles settings field from Obsidian client 2024-02-24 20:18:22 +05:30
Debanjum Singh Solanky
03a6b491b2 Warn when can't identify mimeType of files in Desktop, Obsidian clients 2024-02-24 19:59:03 +05:30
Debanjum Singh Solanky
3675ab4864 Only sync modified files from the Obsidian client
Previously we'd send all files in vault and let the server
deduplicate.

This changes takes inspiration from the desktop app, and only pushes
files which were modified after their previous sync with the server.

This should reduce the processing load on the server
2024-02-24 07:48:40 +05:30
Debanjum Singh Solanky
ddfbf31bc8 Append version query param to web asset URLs to bypass browser cache
Ensure latest assets are loaded when khoj version is updated
2024-02-24 06:49:25 +05:30
sabaimran
42773e808c
Retrieve, create, and save conversations differently for ClientApplications (#656)
* Retrieve, create, and save conversations differently if they're coming from a client application

- Not all of our client apps will necessarily maintain state over the conversation IDs available to a user. For some (single-threaded conversations), it should just use a single conversation. Fix the code to do so

* Simplify conversation retrieval logic

* Keep 0 padding below chat response

* Add order_by sorting to retrieving the conversation without id
2024-02-23 11:32:00 -08:00
Debanjum
9afb2a14ef
Fix and Improve Chat UI in Web, Desktop apps (#655)
### Improvements to Chat UI on Web, Desktop apps
- Improve styling of chat session side panel
- Improve styling of chat message bubble in Desktop, Web app
- Add frosted, minimal chat UI to background of Login screen
- Improve PWA install experience of Khoj

### Fixes to Chat UI on Web, Desktop apps
- Fix creating new chat sessions from the Desktop app
- Only show 3 starter questions even when consecutive chat sessions created

### Other Improvements
- Update Khoj cloud trial period to a fortnight instead of a week
- Document using venv to handle dependency conflict on khoj pip install

Resolves #276
2024-02-23 19:27:02 +05:30
Debanjum Singh Solanky
c70ca78cdc Improve PWA install experience for Khoj on Desktop, Mobile
- Resolve PWA issues thrown by Chrome/Edge
  - Add screenshot samples showcasing remember, browse and draw features
    - This can provide a richer app store like experience when
      installing Khoj PWA on Mobile or Desktop
    - Add wide and narrow screenshots to show Mobile vs Desktop UX
  - Add higher resolution favicon for PWA
- Use single web manifest instead of separate ones for Chat, Search
- Update manifest description with more details about Khoj features
2024-02-23 18:59:52 +05:30
Debanjum Singh Solanky
e10b260988 Update web login screen to show frosted minimal chat UI in background 2024-02-23 18:59:52 +05:30
Debanjum Singh Solanky
1b0318564e Log when conversation turn is saved to DB 2024-02-23 18:59:52 +05:30
Debanjum Singh Solanky
4c39960917 Make number of conversation starters to get from DB configurable 2024-02-23 18:59:52 +05:30
Debanjum Singh Solanky
50617594fd Only show 3 starter questions even when consecutive chat sessions created
Reset starter question suggestions before appending in web, desktop app

Otherwise previously it'd keep adding to existing starter question
suggestions on each new session creation if multiple consecutive new
chat sessions created.

This would result in more than the 3 expected starter questions being
displayed at a time
2024-02-23 18:59:52 +05:30
Debanjum Singh Solanky
102f5c3f53 Improve styling of chat session side panel
- Make collapse, expand toggle arrow point in the direction the action
  will expand the side panel in
- Make the collapsed side panel reduce to a 1px sliver
2024-02-23 18:59:52 +05:30
Debanjum Singh Solanky
6283d9fe83 Update Khoj cloud trial period to a fortnight instead of a week
- Improve rate limit error message wording
- Make the "too many requests" error message more robust. Should throw
  that exception fix self.request >= self.subscribed_requests because
  upgrading wouldn't fix this rate limiting
2024-02-23 18:33:56 +05:30
Debanjum Singh Solanky
05c1903784 Fix creating new chat sessions from the Desktop app
Code wasn't passing the authorization header in the POST request to
create new chat session
2024-02-23 18:33:56 +05:30
Debanjum Singh Solanky
8a219b6e9c Improve styling of chat message bubble in Desktop, Web app
- Respect newline with pre-line but not for bullets to improve
  formatting of responses by Khoj
- Respect bold font by loading tajawal font with other weights
- Reduce bottom margin in chat message bubble, its taking too much space
2024-02-23 18:33:56 +05:30
sabaimran
b4902090e7
Misc. chat and application improvements (#652)
* Document original query when subqueries can't be generated
* Only add messages to the chat message log if it's non-empty
* When changing the search model, alert the user that all underlying data will be deleted
* Adding more clarification to the prompt input for username, location
* Check if has_more is in the notion results before getting next_cursor
* Update prompt template for user name/location, update confirmation message when changing search model
2024-02-22 19:09:22 -08:00
Debanjum Singh Solanky
7271164256 Set chat session title to textContent of the chat session HTML element
We don't expect/want the user to use HTML titles for chat session
2024-02-23 02:07:08 +05:30
sabaimran
f8ec6b4464 Remove backslash for default route in api_chat 2024-02-20 20:09:44 -08:00
sabaimran
b1c86fee3b Release Khoj version 1.6.0 2024-02-20 14:12:24 -08:00
sabaimran
44f8f20ea7
Miscellaneous bugs and fixes for chat sessions (#646)
* Display given_name field only if it is not None

* Add default slugs in the migration script

* Ensure that updated_at is saved appropriately, make sure most recent chat is returned for default history

* Remove the bin button from the chat interface, given deletion is handled in the drop-down menus

* Refresh the side panel when a new chat is created

* Improveme tool retrieval prompt, don't let /online fail, and improve parsing of extract questions

* Fix ending chat response by offline chat on hitting a stop phrase

Previously the whole phrase wouldn't be in the same response chunk, so
chat response wouldn't stop on hitting a stop phrase

Now use a queue to keep track of last 3 chunks, and to stop responding
when hit a stop phrase

* Make chat on Obsidian backward compatible post chat session API updates

- Make chat on Obsidian get chat history from
  `responseJson.response.chat' when available (i.e when using new api)
- Else fallback to loading chat history from
  responseJson.response (i.e when using old api)

* Fix detecting success of indexing update in khoj.el

When khoj.el attempts to index on a Khoj server served behind an https
endpoint, the success reponse status contains plist with certs. This
doesn't mean the update failed.

Look for :errors key in status instead to determine if indexing API
call failed. This fixes detecting indexing API call success on the
Khoj Emacs client, even for Khoj servers running behind SSL/HTTPS

* Fix the mechanism for populating notes references in the conversation primer for both offline and online chat

* Return conversation.default when empty list for dynamic prompt selection, send all cmds in telemetry

* Fix making chat on Obsidian backward compatible post chat session API updates

New API always has conversation_id set, not `chat' which can be unset
when chat session is empty.

So use conversation_id to decide whether to get chat logs from
`responseJson.response.chat' or `responseJson.response' instead

---------

Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2024-02-20 13:55:35 -08:00
sabaimran
138f5223bd
Fix process for generating embeddings for Notion entries (#648)
* Fix process for generating embeddings for Notion entries
* If no title field found, just log a warning and set the title to
2024-02-20 13:46:56 -08:00
Debanjum Singh Solanky
4722da9642 Only enable API token, Whatsapp cards on Web UI when Stripe, Twilio setup 2024-02-16 17:41:09 +05:30
Debanjum Singh Solanky
cf4a524988 Move production dependencies to prod python packages group
This will reduce khoj dependencies to install for self-hosting users

- Move auth production dependencies to prod python packages group
  - Only enable authentication API router if not in anonymous mode
  - Improve error with requirements to enable authentication when not in
    anonymous mode
2024-02-16 17:41:08 +05:30
Debanjum Singh Solanky
d7dbb715ef Fix docs links in khoj introductory chat message 2024-02-13 22:38:03 +05:30
sabaimran
32ec54172e
Add additional personalization in Chat via Location, Username (#644)
* Add location metadata to chat history
* Add support for custom configuration of the user name
* Add region, country, city in the desktop app's URL for context in chat
* Update prompts to specify user location, rather than just location.
* Add location data to Obsidian chat query
* Use first word for first name, last word for last name when setting profile name
2024-02-13 17:05:13 +05:30
sabaimran
a3eb17b7d4
Have Khoj dynamically select conversation command(s) in chat (#641)
* Have Khoj dynamically select which conversation command(s) are to be used in the chat flow
- Intercept the commands if in default mode, and have Khoj dynamically guess which tools would be the most relevant for answering the user's query
* Remove conditional for default to enter online search mode
* Add multiple-tool examples in the prompt, make prompt for tools more specific to info collection
2024-02-11 17:11:32 +05:30
sabaimran
69344a6aa6
Add support for multiple chat sessions in the desktop application (#639)
* Add chat sessions to the desktop application
* Increase width of the main chat body to 90vw
* Update the version of electron
* Render the default message if chat history fails to load
* Merge conversation migrations and fix slug setting
* Update the welcome message, use the hostURL, and update background color for chat actions
* Only update the window's web contents if the page is config
2024-02-11 16:05:28 +05:30
sabaimran
1412ed6a00
Support multiple chat sessions within the web UI (#638)
* Enable support for multiple chat sessions within the web client

- Allow users to create multiple chat sessions and manage them
- Give chat session slugs based on the most recent message
- Update web UI to have a collapsible menu with active chats
- Move chat routes into a separate file

* Make the collapsible side panel more graceful, improve some styling elements of the new layout

* Support modification of the conversation title

- Add a new field to the conversation object
- Update UI to add a threedotmenu to each conversation

* Get the default conversation if a matching one is not found by id
2024-02-11 15:48:28 +05:30
Debanjum Singh Solanky
70f74cde68 Fix timestamps to separate each logline. Info log response start time 2024-02-07 20:45:16 +05:30
Debanjum Singh Solanky
8e5db72140 Release Khoj version 1.5.1 2024-02-06 23:09:33 +05:30
Debanjum
fc1b8f6fb6
Fix Khoj Obsidian plugin on Obsidian Mobile (#635)
- Removed node-fetch dependency to work on mobile. 
- Fix CORS issue for Khoj (streaming) chat on Obsidian mobile
- Verified Khoj plugin, search, chat work on Obsidian mobile.

## Details
### Major
- Allow calls to Khoj server from Obsidian mobile app to fix CORS issue
- Chat stream using default `fetch' not `node-fetch' in obsidian plugin

### Minor
- Load chat history after other elements in chat modal on Obsidian are rendered
- Scroll to bottom of chat modal on Obsidian across mobile & desktop
2024-02-06 22:03:51 +05:30
Debanjum Singh Solanky
fd238ff792 Load chat history after other elements in chat modal on Obsidian rendered
This reduces laggy feeling due to latency of loading chat history from
server
2024-02-06 21:25:43 +05:30
Debanjum Singh Solanky
e06a0c6ae0 Scroll to bottom of chat modal on Obsidian across mobile & desktop
Put logic into single reused function
2024-02-06 21:25:43 +05:30
Debanjum Singh Solanky
07dc04f40e Allow calls to Khoj server from Obsidian mobile app to fix CORS issue
- Obsidian mobile uses capacitor js. Requests from it have origin as
  http://localhost on Android and capacitor://localhost on iOS
- Allow those Obsidian mobile origins in CORS middleware of server
2024-02-06 21:25:43 +05:30
Debanjum Singh Solanky
dd4cf66be1 Improve offline chat system prompt to think step by step 2024-02-06 20:23:19 +05:30
Debanjum Singh Solanky
035165b534 Make offline chat model current date aware. Improve system prompts
- Can now expect date awareness chat quality test to pass
- Prevent offline chat model from printing verbatim user Notes and
  special tokens
- Make it ask follow-up questions if it needs more context
2024-02-06 20:23:19 +05:30
Debanjum Singh Solanky
447904f0ab Chat stream using default fetch' not node-fetch' in obsidian plugin
Plugins using NodeJS libraries like `node-fetch' don't work on
Obsidian mobile
2024-02-06 03:03:42 +05:30
Debanjum Singh Solanky
ba79334863 Only log number of day old user requests, not the complete dictionary 2024-02-02 10:33:31 +05:30
Debanjum Singh Solanky
1c6f1d94f5 Fix styling of Whatsapp card & notify banner in config page of web app
- Put Whatsapp card back in Client section.
  - Fixes side spacing on cards
  - Improve Whatsapp card row gaps

- Hide notification banner on web app load. Previously it showed up as
  a yellow dot on smaller displays
2024-02-01 22:59:57 +05:30
sabaimran
4daac334bc
Fix subscription state detection for users based on phone numbers, emails (#633)
* Fix subscription state detection for users based on phone numbers, emails
* Fix unit tests for api_user4
* Use a single method for determining subscription from user
* Pass user object, rather than user.email for getting subscription state
2024-01-31 07:48:55 +05:30
sabaimran
fc4b57d9f6 Revert styling for white-space pre-line in the chat views as it looks bad 2024-01-29 18:29:54 +05:30
sabaimran
da854703aa Release Khoj version 1.5.0 2024-01-29 18:05:10 +05:30
Debanjum
d1bfb245df
Improve Khoj Chat and Settings UI (#630)
* Fix license in pyproject.toml. Remove unused utils.state import

* Use single debug mode check function. Disable telemetry in debug mode

- Use single logic to check if khoj is running in debug mode.
  Previously there were 3 different variants of the check

- Do not log telemetry if KHOJ_DEBUG is set to true. Previously didn't
log telemetry even if KHOJ_DEBUG set to false

* Respect line breaks in user, khoj chat messages to improve formatting

* Disable Whatsapp config section on web client if Twilio not configured

Simplify Whatsapp configuration status checking js by standardizing
external input to lower case

* Disable Phone API when Twilio not setup and rate limit calls to it

- Move phone api to separate router and only enable it if Twilio enabled
- Add rate-limiting to OTP and verification calls

* Add slugs for phone rate limiting

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2024-01-29 18:03:43 +05:30
sabaimran
4fb8d5c6d4
Store rate limiter-related metadata in the database for more resilience (#629)
* Store rate limiter-related metadata in the database for more resilience
- This helps maintain state even between server restarts
- Allows you to scale up workers on your service without having to implement sticky routing
* Make the usage exceeded message less abrasive
* Fix rate limiter for specific conversation commands and improve the copy
2024-01-29 15:27:06 +05:30
sabaimran
71cbe5160d
Add retries in case the embeddings API fails (#628)
* Add retries in case the embeddings API fails
* Improve error handling in the inference endpoint API request handler
- retry only if HTTP exception
- use logger to output information about errors
2024-01-29 15:26:34 +05:30
sabaimran
b782683e60
Scrape results from Serper results using Olostep (#627)
* Initailize changes to incporate web scraping logic after getting SERP results
- Do some minor refactors to pass a symptom prompt to the openai model when making a query
- integrate Olostep in order to perform the webscraping
* Fix truncation error with new line, fix typing in olostep code
* Use the authorization header for the token
* Add a small hint/indicator for how to use Khojs other modalities in the welcome prompt
* Add more detailed error message if Olostep query fails
* Add unit tests which invoke Olostep in chat director
* Add test for olostep tool
2024-01-29 14:16:50 +05:30
sabaimran
360b59cdb2 Add handling for None field values in logs and make telemetry upload more frequent 2024-01-26 00:00:55 +05:30
sabaimran
737fb6417b Revert none checking in telemetry logs 2024-01-25 23:48:09 +05:30
sabaimran
211c5623e8 Improve error handling for telemetry uploads
- Use response.raise_for_status when telemetry upload files
- Do not send null packets to the destination server
2024-01-25 20:40:42 +05:30
Debanjum Singh Solanky
098a8e4fb1 Fix evaluating connected to server status in Obsidian plugin
Only show welcome status message when khojApiKey not set and khojUrl
set to khoj cloud
2024-01-25 18:04:29 +05:30
Debanjum Singh Solanky
1c52ddf792 Bump up server side content indexing interval to ~1 day
Reduce server side indexing load and API request failures
2024-01-25 13:33:34 +05:30
sabaimran
0fba1e27c5 Add hint to input text for using slash commands 2024-01-25 11:56:56 +05:30
sabaimran
da6cd5ddc4
Improve subqueries for online search and prompt generation for image (#626)
* Improve subqueries for online search and prompt generation for image
- Include conversation history so that subqueries or intermediate prompts are generated with the appropriate context
2024-01-24 17:42:59 +05:30
sabaimran
dbdca7d8d1 Disable swagger UI docs in production 2024-01-24 15:23:39 +05:30
sabaimran
ddf6fd9c09 Remove valid number alert 2024-01-23 17:57:27 +05:30
Debanjum Singh Solanky
17107a0337 Release Khoj version 1.4.0 2024-01-23 10:18:31 +05:30
sabaimran
679db51453
Add support for phone number authentication with Khoj (part 2) (#621)
* Allow users to configure phone numbers with the Khoj server

* Integration of API endpoint for updating phone number

* Add phone number association and OTP via Twilio for users connecting to WhatsApp

- When verified, store the result as such in the KhojUser object

* Add a Whatsapp.svg for configuring phone number

* Change setup hint depending on whether the user has a number already connected or not

* Add an integrity check for the intl tel js dependency

* Customize the UI based on whether the user has verified their phone number

- Update API routes to make nomenclature for phone addition and verification more straightforward (just /config/phone, etc).
- If user has not verified, prompt them for another verification code (if verification is enabled) in the configuration page

* Use the verified filter only if the user is linked to an account with an email

* Add some basic documentation for using the WhatsApp client with Khoj

* Point help text to the docs, rather than landing page info

* Update messages on various callbacks and add link to docs page to learn more about the integration
2024-01-22 18:14:58 -08:00
sabaimran
58bf917775
Update the font used across Khoj desktop and web to be Tajawal (#622) 2024-01-20 23:13:33 +05:30
Debanjum
679f0f24a4
Improve Chat Input Pane Actions. Move to 1 Click Audio Chat on Mobile (#624)
## Major
### Move to single click audio chat UX on Obsidian, Desktop, Web clients
  New default UX has 1 long-press on mobile, 2-click on desktop to send transcribed audio message
  - New Audio Chat Flow
    1. Record audio while microphone button pressed
    2. Show auto-send 3s countdown timer UI for audio chat message
        Provide a visual cue around send button for how long before audio
        message is automatically sent to Khoj for response
    3. Auto-send msg in 3s unless stop send message button clicked
  - Why
    - Removes the previous default of 3 clicks required to send audio message
       The record > stop > send process to send audio messages was unclear and effortful
    - Still allows stopping message from being sent, to make correction to transcribed audio
    - Removes inadvertent long audio transcriptions if forget to press stop while recording

### Improve chat input pane actions & icons on Obsidian. Desktop, Web clients
- Use SVG icons in chat footer on web, desktop app
- Move delete icon to left of chat input. This makes it harder to inadvertently click it
- Add send button to chat input pane
- Color chat message send button to make it primary CTA
- Make chat footer shorter. Use no or round border on action buttons

## Minor
- Stop rendering empty starter questions element when no questions present
- Add round border, hover color to starter questions in web, desktop apps
- Fix auto resizing chat input box when transcribed text added
- Convert chat input into a text area in the Obsidian client
2024-01-20 21:52:56 +05:30
Debanjum Singh Solanky
ec3b837d00 Send audio message in 2-clicks on desktop to avoid holding down mic button 2024-01-20 21:40:38 +05:30
Debanjum Singh Solanky
f0daa45ae0 Move to single click audio chat UX on Obsidian client
- Capabillity
  New default UX has 1 long-press to send transcribed audio message

  - Removes the previous default of 3 clicks required to send audio message
    - The record > stop > send process to send audio messages was unclear
  - Still allows stopping message from being sent, if users want to make
    correction to transcribed audio
  - Removes inadvertent long audio transcriptions if user forgets to
    press stop when recording

- Changes
  - Record audio while microphone button pressed
  - Show auto-send 3s countdown timer UI for audio chat message
    Provide a visual cue around send button for how long before audio
    message is automatically sent to Khoj for response
  - Auto-send msg in 3s unless stop send message button clicked
2024-01-20 16:07:12 +05:30
Debanjum Singh Solanky
29a581d2b0 Move to single click audio chat UX on desktop app
- Capabillity
  New default UX has 1 long-press to send transcribed audio message

  - Removes the previous default of 3 clicks required to send audio message
    - The record > stop > send process to send audio messages was unclear
  - Still allows stopping message from being sent, if users want to make
    correction to transcribed audio
  - Removes inadvertent long audio transcriptions if user forgets to
    press stop when recording

- Changes
  - Record audio while microphone button pressed
  - Show auto-send 3s countdown timer UI for audio chat message
    Provide a visual cue around send button for how long before audio
    message is automatically sent to Khoj for response
  - Auto-send msg in 3s unless stop send message button clicked
2024-01-20 16:03:51 +05:30
Debanjum Singh Solanky
699e9ff878 Move to single click audio chat UX on web app
- Capabillity
  New default UX has 1 long-press to send transcribed audio message

  - Removes the previous default of 3 clicks required to send audio message
    - The record > stop > send process to send audio messages was unclear
  - Still allows stopping message from being sent, if users want to make
    correction to transcribed audio
  - Removes inadvertent long audio transcriptions if user forgets to
    press stop when recording

- Changes
  - Record audio while microphone button pressed
  - Show auto-send 3s countdown timer UI for audio chat message
    Provide a visual cue around send button for how long before audio
    message is automatically sent to Khoj for response
  - Auto-send msg in 3s unless stop send message button clicked
2024-01-20 15:56:46 +05:30
Debanjum Singh Solanky
26bd3533d8 Stop rendering empty starter questions element when no questions present 2024-01-20 11:39:58 +05:30
Debanjum Singh Solanky
7c8c475c3a Add round border, hover color to starter questions in web, desktop apps 2024-01-20 00:51:11 +05:30
Debanjum Singh Solanky
8a488b9e39 Fix auto resizing chat input box when transcribed text added 2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky
07ca137bdf Convert chat input into a text area in the Obsidian client
This allows for better readability of multi-line messages by users.
The chat input is a text area in the other clients as well.
2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky
d4552117f6 Add and improve chat input pane, actions, icons on Obsidian client
- Move delete icon to left of chat input. This makes it harder to
  inadvertently click
- Add send button to chat footer. Enter being the only way to send
  messages is not intuitive, outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
2024-01-20 00:48:56 +05:30
Debanjum Singh Solanky
c0ad64d9a3 Add and improve chat input pane, actions, icons on desktop client
- Use SVG icons in chat footer on web
- Move delete icon to left of chat input. This makes it harder to
  inadvertently click
- Add send button to chat footer. Enter being the only way to send
  messages is not intuitive, outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
2024-01-20 00:29:49 +05:30
Debanjum Singh Solanky
ea85ebdacb Add and improve chat input pane, actions, icons on web client
- Use SVG icons in chat footer on web
- Move delete icon to left of chat input. This makes it harder to
  inadvertently click
- Add send button to chat footer. Enter being the only way to send
  messages is not intuitive, outside standard modern UI patterns
- Color chat message send button to make it primary CTA on web client
- Make chat footer shorter. Use no or round border on action buttons
2024-01-19 20:40:42 +05:30
sabaimran
039ed78253
Add support for a first-party client app to call into Khoj (Part 1) (#601)
* Add support for a first party client app
- Based on a client id and client secret, allow a first party app to call into the Khoj backend with a phone number identifier
- Add migration to add phone numbers to the KhojUser object
* Add plus in front of country code when registering a phone number.
- Decrease free tier limit to 5 (from 10)
- Return a response object when handling stripe webhooks
* Fix telemetry method which references authenticated user's client app
* Add better error handling for null phone numbers, simplify logic of authenticating user
* Pull the client_secret in the API call from the authorization header
* Add a migration merge to resolve phone number and other changes
2024-01-18 19:24:14 +05:30
Debanjum Singh Solanky
9dfe1bb003 Fix updating subscription when invoice paid. Revert renewal_date logic
The actual issue was that `get_or_create_user_by_email' tried to
create a subscription even if it already existed.

With updated logic:
- New subscription is only created when it doesn't already exist in
  `get_or_create_user_by_email'
- `set_user_subscription' just updates the subscription state as
  user subscription object creation is already managed by
  `get_or_create_user_by_email'. So the other conditionals are
  unnecessary
2024-01-18 16:20:18 +05:30
Debanjum Singh Solanky
9b1a66c969 Fix updating subscription renewal date when invoice paid 2024-01-18 14:46:10 +05:30
sabaimran
93d5cb128c Initialize embeddings to empty list before processing 2024-01-18 13:27:04 +05:30
Debanjum Singh Solanky
24af888c41 Release Khoj version 1.3.0 2024-01-18 11:42:13 +05:30
Debanjum
8b4dd16255
Fix markdownRenderer arg to allow chat responses in Obsidian plugin (#619)
- Issue
Users with Dataview plugin would have error as its markdown
post-processor expects the sourcePath to be a string

This prevents Khoj from responding to chat messages in the Obsidian
chat modal. Search via Obsidian still works but it throws the same
dataview plugin error

- Fix
Pass a string as sourcePath to markdownRenderer to fix failing chat response
and stop throwing dataview errors on search

Resolves #614, Resolves #606
2024-01-18 10:18:31 +05:30
Debanjum
c8dbe8ee7b
Improve server status check and message in Obsidian client (#617)
- Update health API to pass authenticated users their info
- Improve Khoj server status check in Khoj Obsidian client
- Show Khoj Obsidian commands even if no connection to server
- Show Khoj chat by default in Obsidian side pane instead of search
2024-01-18 10:17:35 +05:30
Debanjum Singh Solanky
f9420e1209 Show Khoj Obsidian commands even if no connection to server
Server connection check can be a little flaky in Obsidian. Don't gate
the commands behind it to improve usability of Khoj.

Previously the commands would get disabled when server connection
check failed, even though server was actually accessible
2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky
36bf42a860 Show Khoj chat by default in Obsidian side pane instead of search 2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky
aab75a6ead Improve Khoj server status check in Khoj Obsidian client
- Update server connection status on every edit of khoj url, api key in
  settings instead of only on plugin load

  The error message was stale if connection fixed after changes in
  Khoj plugin settings to URL or API key, like on plugin install

- Show better welcome message on first plugin install.
  Include API key setup instruction

- Show logged in user email on Khoj settings page
2024-01-18 10:09:20 +05:30
Debanjum Singh Solanky
1a46734485 Fix markdownRenderer arg to allow chat responses in Obsidian plugin
- Issue: Users with Dataview plugin would have error as its markdown
post-processor expects the sourcePath to be a string

This prevents Khoj from responding to chat messages in the Obsidian
chat modal. Search via Obsidian still works but it throws the same
dataview error

- Fix: Pass a string as sourcePath to markdownRenderer to fix
failing chat response

Resolves #614, Resolves #606
2024-01-18 10:02:50 +05:30
sabaimran
e9e49ea098
Allow custom inference endpoint for the crossencoder model (#616)
* Add support for custom inference endpoints for the cross encoder model
- Since there's not a good out of the box solution, I've deployed a custom model/handler via huggingface to support this use case.
* Use langchain.community for pdf, openai chat modules
* Add an explicit stipulation that the api endpoint for crossencoder inference should be for huggingface for now
2024-01-18 10:02:12 +05:30
Debanjum Singh Solanky
870af19ba4 Update health API to pass authenticated users their info
This allows Khoj clients to get email address associated with
user's API token for display in client UX

In anonymous mode, default user information is passed
2024-01-17 13:38:57 +05:30
Debanjum
4d30f7d1d9
Short-circuit API rate limiter for unauthenticated users (#607)
### Major
- Short-circuit API rate limiter for unauthenticated user
  Calls by unauthenticated users were failing at API rate limiter as it
  failed to access user info object. This is a bug.
  
  API rate limiter should short-circuit for unauthenicated users so a
  proper Forbidden response can be returned by API
  
  Add regression test to verify that unauthenticated users get 403
  response when calling the /chat API endpoint
  
### Minor
- Remove trailing slash to normalize khoj url in obsidian plugin settings
- Move used /api/config API controllers into separate module
- Delete unused /api/beta API endpoint
- Fix error message rendering in khoj.el, khoj obsidian chat
- Handle deprecation warnings for subscribe renew date, langchain, pydantic & logger.warn
2024-01-17 00:59:52 +05:30
Debanjum Singh Solanky
2752e0d607 Update jinja2 and axios min supported package versions 2024-01-16 18:45:38 +05:30
Debanjum Singh Solanky
7039c202c8 Merge branch 'master' into short-circuit-api-rate-limiter 2024-01-16 18:18:34 +05:30
Debanjum Singh Solanky
8917228dbb Remove unused, deprecated /api/config/data API endpoints
- Use /api/health for server up check instead of api/config/default
- Remove unused `khoj--post-new-config' method
- Remove the now unused /config/data GET, POST API endpoints
2024-01-16 18:15:06 +05:30
Debanjum Singh Solanky
6ded4c1d75 Merge branch 'master' into fix-1000-file-index-update-limit 2024-01-16 16:50:58 +05:30
Debanjum Singh Solanky
16175137e5 Decode URL encoded query string in chat API endpoint before processing 2024-01-16 13:09:28 +05:30
Debanjum Singh Solanky
9fe1c8ae13 Make references and online_results optional params to converse_offline
Fixes all the failing GPT4All tests because they were missing the
online_results argument
2024-01-16 13:09:28 +05:30
Debanjum Singh Solanky
d74f8e03d3 Pass max context length to fix using updated GPT4All.list_gpu method
It's signature was updated in GPT4All 2.1.0 pypi release.

Resolves #610
2024-01-16 12:23:45 +05:30
Debanjum Singh Solanky
1ae6669fbf Correctly handle API response when no files to index 2024-01-16 11:57:40 +05:30
sabaimran
50575b749b
Add option to use HuggingFace's inference endpoint for generating embeddings (#609)
* Support using hosted Huggingface inference endpoint for embeddings generation
* Since the huggingface inference endpoint is model-specific, make the URL an optional property of the search model config
* Handle ECONNREFUSED error in desktop app
* Drive API key via the search model config model and use more generic names
2024-01-16 08:58:24 +05:30
Debanjum Singh Solanky
ba37b28fb5 Improve batched error handling. Catch can't connect to server error
Break out of batch processing when unable to connect to server or
when requests throttled by server
2024-01-14 01:04:44 +05:30
Debanjum Singh Solanky
7dfbcd2e5a Handle subscribe renew date, langchain, pydantic & logger.warn warnings
- Ensure langchain less than 0.2.0 is used, to prevent breaking
  ChatOpenAI, PyMuPDF usage due to their deprecation after 0.2.0
- Set subscription renewal date to a timezone aware datetime
- Use logger.warning instead of logger.warn as latter is deprecated
- Use `model_dump' not deprecated dict to get all configured content_types
2024-01-12 01:46:52 +05:30
Debanjum Singh Solanky
5f97357fe0 Delete unused /api/beta API endpoint 2024-01-12 01:11:05 +05:30
Debanjum Singh Solanky
bb1c1b39d8 Move /api/config API controllers into separate module for code modularity 2024-01-12 01:11:04 +05:30
Debanjum Singh Solanky
ba99089a12 Short-circuit API rate limiter for unauthenticated user
Calls by unauthenticated users were failing at API rate limiter as it
failed to access user info object. This is a bug.

API rate limiter should short-circuit for unauthenicated users so a
proper Forbidden response can be returned by API

Add regression test to verify that unauthenticated users get 403
response when calling the /chat API endpoint
2024-01-12 00:23:50 +05:30
Debanjum Singh Solanky
b1269fdad2 Remove trailing slash to normalize khoj url in obsidian plugin settings 2024-01-11 21:56:36 +05:30
Debanjum Singh Solanky
ffdb291fe0 Fix error message rendering in khoj.el, khoj obsidian chat
- Fix failed to index error message in khoj.el
- Fix chat model not configured message in khoj obsidian chat
2024-01-11 21:55:54 +05:30
Debanjum Singh Solanky
af9ceb00a0 Show relevant error msg in desktop app, e.g when can't connect to server 2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
43423432ce Pass indexed filenames in API response for client validation 2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
5f9ac5a630 Collect files to index in single dict to simplify index/update controller
Simplifies code while maintaining typing
2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
efe41aaaca Push 1000 files at a time from the Desktop client for indexing
FastAPI API endpoints only support uploading 1000 files at a time.
So split all files to index into groups of 1000 for upload to
index/update API endpoint
2024-01-09 23:09:34 +05:30
Debanjum Singh Solanky
b6d5392c0c Release Khoj version 1.2.1 2024-01-04 18:45:37 +05:30
Debanjum Singh Solanky
fca7a5ff32 Push 1000 files at a time from the Obsidian client for indexing
FastAPI API endpoints only support uploading 1000 files at a time.
So split all files to index into groups of 1000 for upload to
index/update API endpoint
2024-01-04 18:43:22 +05:30
Debanjum Singh Solanky
4a234c8db3 Use default offline/openai chat model to extract DB search queries
Make usage of the first offline/openai chat model as the default LLM
to use for background tasks more explicit

The idea is to use the default/first chat model for all background
activities, like user message to extract search queries to perform.
This is controlled by the server admin.

The chat model set by the user is used for user-facing functions like
generating chat responses
2024-01-03 14:04:49 +05:30
Debanjum Singh Solanky
e28adf2884 Also index pdf, markdown and plaintext files using khoj emacs client
Previously you could only index org-mode files and directories from
khoj.el

Mark the `khoj-org-directories', `khoj-org-files' variables for
deprecation, since `khoj-index-directories', `khoj-index-files'
replace them as more appropriate names for the more general case

Resolves #597
2024-01-03 11:46:17 +05:30
Debanjum Singh Solanky
5abaed9d08 Use user chosen OpenAI model to extract DB search questions from query
Previously Khoj was selecting the first OpenAI model configured on
server and not the OpenAI model configured by the user for themselves
2024-01-03 11:45:06 +05:30
Debanjum Singh Solanky
05536aab6b Merge how users can share personal information in personality prompt 2024-01-03 11:40:14 +05:30
Liam Swayne
455f78b178
Replace var declarations with let declarations (#576)
* Replace var declaration with let declaration
2023-12-29 10:20:48 +05:30
sabaimran
79913d4c17
Add isort to the pre-commit configuration and apply it to the whole project (#595)
* Apply isort to the entire repository
* Fix missing import issues in text_to_entries
* Fix imports in migration files
2023-12-28 18:04:02 +05:30
sabaimran
442c913de3 Update telemetry state for search model only if one is found, fix alt text for language setting 2023-12-28 12:53:53 +05:30
sabaimran
d3ab3f1b70 Rename matrix_blog to web and move the language setting into the content section 2023-12-28 12:44:49 +05:30
sabaimran
00af6baeb6 Resolve merge conflicts with intro message in chat.html web view 2023-12-23 17:52:58 +05:30
sabaimran
afec4394f9
Merge pull request #592 from ayushjha119/Fixed-Health-Check-to-Khoj-api
Fixed health check to khoj api
2023-12-23 13:04:50 +05:30
sabaimran
c50eb8a691 Fix mypy/pre-commit issues 2023-12-23 11:44:37 +05:30
Debanjum Singh Solanky
21c55b4c0d Release Khoj version 1.2.0 2023-12-22 21:43:47 +05:30
Debanjum Singh Solanky
6a8c1fe423 Sanitize rendering chat references in Web, Desktop and Obsidian clients
Use textContent instead of innerHTML to append references

Resolves #583
2023-12-22 18:11:49 +05:30
Debanjum
6879daccc6
Fix Chat Streaming on Obsidian, Docker Image Version and First-Run, Chat Error Messages in Clients (#589)
- Fix streaming chat response in Obsidian client
- Fix first-run, chat error message in obsidian, desktop and web clients
- Set Khoj app version to latest version in Docker images
- Tag Khoj Docker image built on release with the `latest` tag
   This align docker image release cadence with client, server releases
2023-12-22 04:13:01 -08:00
Debanjum Singh Solanky
d101297995 Use markdown formatted chat message in chat modal 2023-12-22 17:01:31 +05:30
Debanjum Singh Solanky
350fd89c8d Clear chat history html in Obsidian if getChatHistory works too 2023-12-22 17:01:31 +05:30
ayushjha119
e487ec5370 fixed app to api health Check 2023-12-21 17:51:30 +05:30
Debanjum Singh Solanky
70607cbbbb Update FRE message to get any Khoj client to sync files with server 2023-12-21 15:23:47 +05:30
ayushjha119
b3d7d6a79d used the Response class from fastapi.responses and set the input for status_code to 200 2023-12-21 14:26:40 +05:30
sabaimran
e1aaff2053 Add more details about functionality in Khoj's intro message 2023-12-21 10:09:30 +05:30
sabaimran
a1211f40d7 Fix type declaration for the cross_encoder_model state variable. Update name of the new update API 2023-12-21 09:15:13 +05:30
sabaimran
089e4bee12 FIx unit tests with new search model configurations 2023-12-20 21:50:44 +05:30
Debanjum Singh Solanky
447c1b90e7 Fix streaming chat response in Obsidian client
- Convert renderIncrementalMessage to an async method as
  MarkdownRenderer is an async method

- Simplify code, remove unneeded JSON check
2023-12-20 14:51:19 +05:30
sabaimran
aa23da60a3 Add a notification banner to show temporary messages 2023-12-20 14:22:08 +05:30
Debanjum Singh Solanky
e04fe921eb Fix first-run, chat error message in obsidian, desktop and web clients
- Disable chat input field if getChatHistory had error as Khoj may not
  be setup correctly to chat
2023-12-20 14:03:07 +05:30
sabaimran
5ff9df9d4c Add support per user for configuring the preferred search model from the config page
- Honor this setting across the relevant places where embeddings are used
- Convert the VectorField object to have None for dimensions in order to make the search model easily configurable
2023-12-20 13:25:43 +05:30
sabaimran
0f6e4ff683 Add a model that specifies the user's search model configuration
- Update all endpoints that generate embeddings to use the new model. Incl. generating text embeddings, creating embeddings for a search query
2023-12-20 09:22:26 +05:30
sabaimran
6dd2b05bf5 Rebase with master 2023-12-19 21:02:49 +05:30
sabaimran
e3557cd8b7 Update the personality prompt to make Khoj aware that users can share data via the desktop app 2023-12-19 16:42:45 +05:30
sabaimran
927e477f68 Ignore typing error in custom action short description 2023-12-19 16:10:58 +05:30
sabaimran
946305d977 Add function to export conversations for debugging 2023-12-19 16:05:20 +05:30
sabaimran
903a01745f Use 0px for padding for input row buttons in web 2023-12-18 16:09:06 +05:30
sabaimran
5b092d59f4 Ignore dict assignment typing error 2023-12-17 22:34:54 +05:30
sabaimran
03cb86ee46 Update typing and object assignment for new text to image method return 2023-12-17 21:28:33 +05:30
sabaimran
0288804f2e Render the inferred query along with the image that Khoj returns 2023-12-17 21:02:55 +05:30
sabaimran
49af2148fe Miscellaneous improvements to image generation
- Improve the prompt before sending it for image generation
- Update the help message to include online, image functionality
- Improve styling for the voice, trash buttons
2023-12-17 20:25:35 +05:30
sabaimran
7cb64cb2f9 Add telemetry for image generation conversation command 2023-12-17 18:25:03 +05:30
sabaimran
09544dee09 Add TextToImageModelConfig to the admin page 2023-12-17 16:44:19 +05:30
sabaimran
0459666beb CSRF Cookie not set error in prod. Try fixing https forwarding for mitigation 2023-12-17 12:55:18 +05:30
sabaimran
61dde8ed89 If text to image config isn't set, send back an error message to the client 2023-12-17 12:54:50 +05:30
sabaimran
3065cea562 Address mypy typing issues 2023-12-16 09:24:26 +05:30
sabaimran
5f6dcf9f2e Add a rate limiter for the transcribe API endpoint 2023-12-16 09:18:56 +05:30
sabaimran
73a107690d Add a ConversationCommand rate limiter for the chat endpoint 2023-12-16 09:03:52 +05:30
sabaimran
9b961ed496
Merge pull request #580 from khoj-ai/fix-upgrade-chat-to-create-images
Support Image Generation with Khoj
2023-12-07 21:17:58 +05:30
Debanjum Singh Solanky
7504669f2b Fix rendering image on chat response in obsidian client 2023-12-05 03:48:07 -05:00
Debanjum Singh Solanky
408b7413e9 Use global openai client for transcribe, image 2023-12-05 03:36:33 -05:00
Debanjum Singh Solanky
162b219f2b Throw unsupported error when server not configured for image, speech-to-text 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
8f2f053968 Fix rendering image on chat response in web, desktop client 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
d124266923 Reduce promise based nesting in chat JS func used in desktop, web client
Use async/await to reduce .then() based nesting to improve code
readability
2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
6e3f66c0f1 Use base64 encoded image instead of source URL for persistence
The source URL returned by OpenAI would expire soon. This would make
the chat sessions contain non-accessible images/messages if using
OpenaI image URL

Get base64 encoded image from OpenAI and store directly in
conversation logs. This resolves the image link expiring issue
2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
52c5f4170a Show generated images in the chat modal of the Khoj Obsidian plugin 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
8016a57b5e Show generated images in chat interface on Desktop client 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
cc051ceb4b Show generated images in chat interface on Web client 2023-12-05 01:51:14 -05:00
Debanjum Singh Solanky
252b35b2f0 Support /image slash command to generate images using the chat API 2023-12-05 01:51:14 -05:00
sabaimran
ef21d78c99 Initial changes to support multiple search model configurations
- All search models are loaded into memory, and stored in a dictionary indexed by name
- Still need to add database migrations and create a UI for user to select their choice. Presently, it uses the default option
2023-12-05 00:35:40 -05:00
Debanjum Singh Solanky
1d9c1333f2 Configure text to image models available on server
- Currently supports OpenAI text to image model, by default dall-e-3
- Allow setting the text to image model via CLI during server setup
2023-12-04 21:27:53 -05:00
Debanjum Singh Solanky
f0222f6d08 Make save_to_conversation_log helper function reusable
- Move it out to conversation.utils from generate_chat_response function
- Log new optional intent_type argument to capture type of response
  expected. This can be type responses by Khoj e.g speech, image. It
  can be used to render responses by Khoj appropriately on clients
- Make user_message_time argument optional, set the time to now by
  default if not passed by calling function
2023-12-04 19:42:12 -05:00
sabaimran
d2ddbef08f Use a unique name for the temp PDF generated 2023-12-04 19:27:00 -05:00
sabaimran
d20746613a Properly filter out empty PDFs for indexing 2023-12-04 16:15:17 -05:00
Debanjum Singh Solanky
316b7d471a Handle offline chat model retrieval when no internet
Offline chat shouldn't fail on retrieve_model when no internet,
if model was previously downloaded and usable offline
2023-12-04 13:46:25 -05:00
Debanjum Singh Solanky
2b09caa237 Make online results an optional argument to the gpt converse method 2023-12-04 12:15:29 -05:00
Debanjum Singh Solanky
7009793170 Migrate to OpenAI Python library >= 1.0 2023-12-03 18:16:00 -05:00
sabaimran
cc064ea57d Fix circular import issue 2023-12-03 17:46:44 -05:00
sabaimran
21f8d63e89 If a user subscribes to Khoj with an email address that's not present in the DB, create an account 2023-12-03 17:28:40 -05:00
sabaimran
c5d297a9ed Recursively search through folders for indexing 2023-12-03 16:17:28 -05:00
Debanjum Singh Solanky
a57d529f39 Fix path to system tray icon of Khoj desktop app 2023-12-03 00:12:50 -08:00
Debanjum Singh Solanky
106cdbe455 Release Khoj version 1.1.0 2023-11-30 20:09:08 -08:00
Debanjum Singh Solanky
10ce4ee11c Ignore null params type check for markdown renderer in Obsidian client 2023-11-30 20:09:08 -08:00
sabaimran
a5ffa2342f Add documentation for local setup and fix admin panel bugs
- Wasn't able to login to the admin panel when KHOJ_DEBUG was not True. Fix this error so self-hosted users can get unblocked from accessing the admin settings
- Don't force users to set their KHOJ_DJANGO_SECRET_KEY
2023-11-30 17:55:27 -08:00
Debanjum Singh Solanky
d587632700 Clear result before render thinking placeholder emoji in Obsidian chat 2023-11-30 13:53:09 -08:00
Debanjum Singh Solanky
48719ee0dd Render newline separation in chat references to improve readability 2023-11-30 13:16:48 -08:00
Debanjum Singh Solanky
1a31a2efcf Render Khoj chat streaming response as md & show refs in Obsidian
- Use new style references for Khoj chat modal in Obsidian
- Khoj Chat responses in Obsidian had regressed to not show references
  for new questions after modal has been opened. Now even those are
  rendered, and use new references style
- Render chat response as markdown while it's being streamed
2023-11-30 13:02:00 -08:00
Debanjum Singh Solanky
0430fa67b6 Show temporary status message when copied to clipboard 2023-11-29 13:49:33 -08:00
Debanjum Singh Solanky
491a1a949a Render chat responses as markdown in Desktop client too 2023-11-29 13:49:33 -08:00
Debanjum Singh Solanky
20ef5bfc93 Properly stop mediaRecorder stream to clear microphone in-use state 2023-11-29 13:48:35 -08:00
Debanjum Singh Solanky
8faa63c3c6 Convert config page buttons to use stronger yellow 2023-11-28 19:55:43 -08:00
Debanjum Singh Solanky
a6ca2076d5 Open link to Khoj app landing page from nav pane in current tab 2023-11-28 14:20:37 -08:00
Debanjum Singh Solanky
643e018947 Handle if user subscription field doesn't exists in telemetry func
Avoid null ref in the method when running Khoj server in anon mode
2023-11-28 14:15:14 -08:00
Debanjum Singh Solanky
110d7646fc Use milder yellow as primary Khoj theme color for chat, buttons etc. 2023-11-28 14:15:14 -08:00
sabaimran
18254850ab Set a default value for the khoj django secret key and add additional guidance for setting environment variables on first run 2023-11-28 09:39:44 -08:00
sabaimran
6290b463f5 Compute size of the indexed data only if explicitly requested to avoid heavy load on the DB 2023-11-27 12:05:00 -08:00
sabaimran
eb5e3096e0 Change subscribed scope to premium 2023-11-27 11:39:20 -08:00
sabaimran
6e1ba11e59 Resolve merge conflicts for rendering chat response 2023-11-27 11:33:13 -08:00
Debanjum Singh Solanky
71f2d54258 Render chat response as markdown while streaming on Web, Desktop clients 2023-11-26 20:27:10 -08:00
Debanjum Singh Solanky
9e714d032b Fix Khoj telemetry server. Add server_version column 2023-11-26 15:05:43 -08:00
Debanjum Singh Solanky
b249bbb5b5 Limit max audio file size allowed for transcription on API endpoint 2023-11-26 14:19:46 -08:00
Debanjum Singh Solanky
a79604b601 Fix return types of offline, online transcribe methods for python 3.9 2023-11-26 06:26:34 -08:00
Debanjum Singh Solanky
06f99ceb3c Rename /api/speak API endpoint to /api/transcribe 2023-11-26 06:18:44 -08:00
Debanjum Singh Solanky
56a1a61c77 Remove unused button element retrieval code from web, desktop 2023-11-26 06:17:56 -08:00
Debanjum Singh Solanky
877532a167 Speak to Khoj from the Obsidian client
- Add transcription button with mic icon
- Collect audio recording on pressing mic
- Process and send audio recording to server for transcription
- Extract the functionality to flash status in chat input for reuse
2023-11-26 06:17:54 -08:00
Debanjum Singh Solanky
cc9eae5d18 Update default chat model to Mistral in GPT4AllProcessor config 2023-11-26 05:55:43 -08:00
Debanjum Singh Solanky
4636390f7f Transcribe speech to text offline with Whisper
- Allow server admin to configure offline speech to text model during
  initialization
- Use offline speech to text model to transcribe audio from clients
- Set offline whisper as default speech to text model as no setup api key reqd
2023-11-26 05:55:11 -08:00
Debanjum Singh Solanky
a0a7ab7ec8 Rename conversation.gpt4all package to conversation.offline 2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
499adf86a0 Move transcription using OpenAI API into independent package 2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
897170ab15 Use single db migration script for transcribe model, related updates 2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
28090216f6 Show transcription error status in chatInput placeholder on web, desktop
- Extract flashing status message in chat input placeholder into
  reusable function
- Use emoji prefixes for status messages
- Improve alt text of transcribe button to indicate what the button does
2023-11-26 04:19:32 -08:00
Debanjum Singh Solanky
fc040825b2 Default to Offline chat with Mistral as minimal setup, no API key reqd. 2023-11-26 01:07:20 -08:00
Debanjum Singh Solanky
5a6547677c Add type of operation variable in latest migration 2023-11-26 00:38:52 -08:00
Debanjum Singh Solanky
3e252036c3 Remove whitespace: pre-line from chat html, since markdown rendering 2023-11-26 00:27:29 -08:00
Debanjum Singh Solanky
b484795b8e Merge branch 'master' into add-speak-to-chat
- Conflicts:
  - src/interface/desktop/chat.html
    Combine and use common class names for speak component
  - src/khoj/database/adapters/__init__.py
    Combine imports
  - src/khoj/interface/web/chat.html
    Combine and use common class names for speak component
  - src/khoj/routers/api.py
    Combine imports
2023-11-26 00:26:21 -08:00
sabaimran
6233a957b4 Merge branch 'master' of github.com:khoj-ai/khoj into features/enforce-subscription-status 2023-11-25 22:46:10 -08:00
sabaimran
52b88de7f4 Indicate in the desktop if the user gets rate limited for indexing 2023-11-25 22:31:23 -08:00
Debanjum
e0a59cff68
Delete Conversation History from Web, Desktop, Obsidian Clients (#551)
Add delete button to clear conversation history from Web, Desktop and Obsidian Khoj clients

Resolves #523
2023-11-25 22:24:12 -08:00
Debanjum Singh Solanky
d0e294d8a5 Clear Conversation History from the Obsidian client
- Fix font color for Khoj chat responses in Obsidian. Previous color
  had too low a contrast to be readable
2023-11-25 22:16:13 -08:00
sabaimran
b2afbaa315 Add support for rate limiting the amount of data indexed
- Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed
- Delete any files that are being removed for adminstering the calculation
- Show current amount of data indexed in the config page
2023-11-25 20:28:04 -08:00
Debanjum Singh Solanky
07bf365c7c Clear any network connections to khoj server via khoj.el on reindex
- Ignore errors in deleting network requests to khoj server
- Also delete open network connection to khoj server on auto reindex
  Otherwise when server is unreachable a bunch of failed network
  connections accrue in the processes list
2023-11-25 20:19:41 -08:00
sabaimran
dd1badae81 Use userwithtoken.user when authenticating with an API key 2023-11-24 22:18:45 -08:00
sabaimran
48b9116195 Fix to use user rather than user_with_token in authenticated credentials 2023-11-24 22:18:00 -08:00
sabaimran
771f9bcfa1 If the user subscription was created over 7 days ago, then their trial is expired 2023-11-24 22:08:32 -08:00
sabaimran
e5b1350523 Enforce API use limits depending on whether the server has billing enabled
and whether the given user is subscribed
2023-11-24 21:55:16 -08:00
sabaimran
9c868ee10b Use the state.billing_enabled field to determine whether to use the subscribed scope 2023-11-24 20:41:19 -08:00
sabaimran
69c8f45830 Use scopes to represent whether the use has a valid subscription in the middleware 2023-11-24 20:29:36 -08:00
Debanjum
25f3f2367e
Handle Server Unavailable Error from Khoj.el (#568)
- Make auto-update of content index user configurable from khoj.el
- Handle server unavailable error on auto-index schedule job in khoj.el

Resolves #567
2023-11-24 16:46:07 -08:00
Debanjum Singh Solanky
138f4e3f3c Make auto-update of content index user configurable from khoj.el 2023-11-24 16:40:50 -08:00
Debanjum Singh Solanky
0885fc6c23 Handle server unavailable error on auto-index schedule job in khoj.el 2023-11-24 16:39:44 -08:00
sabaimran
c13953311a Add reflective questions to admin pages 2023-11-23 14:01:05 -08:00
sabaimran
c42ec32a95
Merge pull request #552 from khoj-ai/features/internet-enabled-search
Support internet-enabled, online searching using Serper.dev
2023-11-23 12:34:05 -08:00
sabaimran
c641b8df58 Update desktop package version 2023-11-22 17:54:53 -08:00
sabaimran
a1b2289074 Release Khoj version 1.0.1 2023-11-22 17:52:07 -08:00
sabaimran
b1b037f0ea Fix URL configuration issues with reorganized subfolders 2023-11-22 17:03:33 -08:00
sabaimran
e0949e232b Import random in adapters file for selecting reflective question 2023-11-22 07:52:51 -08:00
sabaimran
256e8de40a Merge with features/internet-enabled-search 2023-11-22 07:25:24 -08:00
Debanjum Singh Solanky
fd60db766e Clear Conversation History from the Web Client 2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky
d5a4830761 Clear Conversation History from the Desktop Client 2023-11-22 03:35:00 -08:00
Debanjum Singh Solanky
3096544cf2 Create API endpoint to clear user's chat history 2023-11-22 03:34:59 -08:00
Debanjum Singh Solanky
63675b3299 Speak to Khoj from the Desktop client
- Use icons to style speech to text recording state
2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky
2951fc92d7 Speak to Khoj from the Web client
- Use icons to style speech to text recording state
2023-11-22 02:47:17 -08:00
Debanjum Singh Solanky
cc77bc4076 Create speech to text API endpoint. Use OpenAI whisper for ASR
- Wrap audio transcription in try/catch and delete audio file after
processing
- Use configured speech to text model, else handle error
2023-11-22 02:47:06 -08:00
Debanjum Singh Solanky
1ca99b6eb0 Add speech to text model configuration to Database 2023-11-22 02:24:31 -08:00
sabaimran
c652a7fd2d Move text_to_entries under the new content folder 2023-11-21 22:25:17 -08:00
sabaimran
1e2af083f0 Rename the data_sources module to content 2023-11-21 22:11:32 -08:00
sabaimran
4cb28aeffb Resolve merge conflicts with master 2023-11-21 22:07:41 -08:00
Debanjum Singh Solanky
4cdfe8fc4f Re-enable Khoj Obsidian plugin for Mobile, as Khoj cloud is available 2023-11-21 16:33:48 -08:00
Debanjum
5d9d50157e
Clean Logs, Improve Message Rendering and Make Khoj Trusted Host Configurable (#561)
- Append chat message to chat logs as TextNodes in web, desktop clients

- Simplify Code to Identify Files from Github, Notion on Web, Desktop Client
  - Use file source to find entries from github, notion on web, desktop client
  - Pass file source to clients via text search API response

- Make Django Logs Follow Khoj Log Format, Verbosity
  - Handle image search setup related warning
  - Format Django initializing outputs using Khoj logger format

- Use `KHOJ_HOST` env var to set allowed/trusted domains to host Khoj
2023-11-21 15:14:34 -08:00
Debanjum Singh Solanky
9e736d4340 Use KHOJ_DOMAIN for CORS allow_origins list as well
- Default to app.khoj.dev
- Remove unnecesary any_path regex in allow_origins. It only cares
  about host, paths are not set in origin header
2023-11-21 14:02:04 -08:00
sabaimran
5469e81a87 Use full path for the static directory in FastAPI and reflect deeper nesting of the django app 2023-11-21 13:44:45 -08:00
sabaimran
d199c4c35f Resovle merge conflicts with matser 2023-11-21 13:35:56 -08:00
Debanjum Singh Solanky
76d041f633 Use KHOJ_HOST env var to set allowed/trusted domains to host Khoj
Allows hosting Khoj behind other, non "khoj.dev" domains
2023-11-21 13:11:45 -08:00
Debanjum Singh Solanky
90d463c12a Append chat message to chat logs as TextNodes in web, desktop clients 2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
befcbcdd5d Use file source to find entries from github, notion on web, desktop client
This is a more robust mechanism of identification than via file name
including github or notion domain names
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
3f0de45ec6 Pass file source to clients via text search API response
Source of entry stored in DB is now passed to clients for processing
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
4aec581306 Handle image search setup related warning
Ideally should rename model_directory to config_directory or some such
but the current image search code will need to be migrated soon. So
changing the variable name and creating a migration script for old
khoj.yml files using model-directory variable isn't worth it

Remove the explicity set of number of threads to use by pytorch. Use
the default used by it.
2023-11-21 13:10:50 -08:00
Debanjum Singh Solanky
b06628ee31 Format Django initializing outputs using Khoj logger format
- Collect STDOUT from the `migrate', `collectstatic' commands and
  output using the Khoj logger format and verbosity settings

- Only show Django `collectstatic' command output in verbose mode

- Fix showing the Initializing Khoj log line by moving it after logger
  level set
2023-11-21 13:10:50 -08:00
sabaimran
341abf03ff Handle none for search_type and use equals comparator rather than in for determining Notion type 2023-11-21 12:55:09 -08:00
sabaimran
2bb989e9d8 Resolve merge conflicts and fix some import ordering 2023-11-21 12:30:43 -08:00
sabaimran
244b76ffed Add isort for automatic import sorting and skip main.py because it's a drama queen 👑 2023-11-21 12:20:41 -08:00
Debanjum
8a0d92e2d7
Fix Connectivity Check in Obsidian Client (#559) from dtkav/bugfix-local-connectivity-check
Check connection to Khoj server for self-hosted server. This check had regressed during the cloud rearchitecture
2023-11-21 12:05:16 -08:00
sabaimran
0e6f09b241
Merge pull request #562 from khoj-ai/fix/pypi-package-app-not-included
Fix PyPi package app reference issue
2023-11-21 11:54:46 -08:00
sabaimran
333cb3445c Use colon rather than equals to indicate typing 2023-11-21 11:28:51 -08:00
Debanjum Singh Solanky
645fd96634 Search across all content types from Khoj Obsidian client
Previously it was only searching for PDF and Markdown files. This was
meant to show only content from current vault as results.

But it has not scaled well as other clients also allow syncing PDF and
markdown files now. So remove this content type filter for now.

A proper solution would limit by using file/dir filters on server or
client side.
2023-11-21 11:19:33 -08:00
sabaimran
a1460a5bf9 Set operations to typed empty list in migration file 2023-11-21 11:14:40 -08:00
sabaimran
71e794c26f Remove the sys.append line in the main.py file, as it's not required 2023-11-21 10:57:21 -08:00
sabaimran
a474c31e02 Move the django app into the src/khoj folder for better organization and functionality
- Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder
- Update associated file paths and system references
2023-11-21 10:56:04 -08:00
Debanjum Singh Solanky
c89bd49973 Fix ranking search results on Obsidian
It's reversed since score of entries is now a distance metric on
Khoj server. So lesser distance is better. Previously higher score was
better
2023-11-21 01:24:59 -08:00
Daniel Grossmann-Kavanagh
f142999bce fix khoj local server usage 2023-11-20 17:07:30 -08:00
Debanjum Singh Solanky
c07401cf76 Fix, Improve chat config via CLI on first run by using defaults
- Fix setting prompt size for online chat
- generally improve chat config via cli by using default chat model,
  prompt size for online and offline chat
2023-11-20 17:01:20 -08:00
sabaimran
b142de15a8 Merge branch 'features/internet-enabled-search' of github.com:khoj-ai/khoj into features/reflective-suggested-questions 2023-11-20 15:56:09 -08:00
sabaimran
a9623ef85a Add requisite imports in order to instantiate offline model in adapters file 2023-11-20 15:27:42 -08:00
sabaimran
a8f13f334f Fix merging issues with base after popping the stash 2023-11-20 15:22:50 -08:00
sabaimran
8fa0b69c67 Resolve merge issue with adapters methods 2023-11-20 15:21:06 -08:00
sabaimran
fee99779bf Add subqueries for internet-connected search results and update client-side code accordingly
- Add a wrapper method to help make direct queries to the LLM and determine any intermediate responses needed for handling the request
2023-11-20 15:19:15 -08:00
Debanjum Singh Solanky
d61b0dd55c Add Khoj Django app package to sys path to load Django module via pip install 2023-11-20 14:55:00 -08:00
sabaimran
b8e6883a81 Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search 2023-11-19 16:20:08 -08:00
sabaimran
237195e20e Make all name-related fields nullable within the GoogleUser 2023-11-19 14:22:32 -08:00
Debanjum
71799add0b
Index Parent Headings of Org-Mode Entries to Improve Search Context (#548)
### Overview
The parent hierarchy of org-mode entries can store important context. 
This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index

### Details
- Test search uses ancestor headings as context for improved results
- Add ancestor headings of each org-mode entry to their compiled form
- Track ancestor headings for each org-mode entry in org-node parser

Resolves #85
2023-11-19 13:18:19 -08:00
sabaimran
ef5e9d66c1 Resolve merge conflicts in dependency imports 2023-11-19 11:42:20 -08:00
Debanjum Singh Solanky
c3465d6982 Release Khoj version 1.0.0 2023-11-19 09:50:25 -08:00
Debanjum
736744be3a
Update documentation to reflect new multi-user config scenario (#550)
- Update docs to show how to use Khoj Cloud
- Move self-hosting Khoj to separate section
- Add page to setup Desktop app
- Set default URL to Khoj Cloud URL in Obsidian, Emacs clients
2023-11-18 18:22:46 -08:00
Debanjum Singh Solanky
e1bf1f0e86 Update default Khoj server URL to Khoj cloud on Emacs, Obsidian clients 2023-11-18 16:25:45 -08:00
Debanjum Singh Solanky
8775ce730a Use URL fragments to allow jumping to config page sections on Web app 2023-11-18 16:25:45 -08:00
sabaimran
f792b1e301 Remove already defined identical function 2023-11-18 14:08:50 -08:00
sabaimran
e2fff5dc47 Don't explicitly use value to get the model type value 2023-11-18 14:01:01 -08:00
sabaimran
a8a25ceac2 Honor user's chat settings when running the extract questions phase
- Add marginally better error handling when GPT gives a messed up respones to the extract questions method
- Remove debug log lines
2023-11-18 13:31:51 -08:00
sabaimran
67156e6aec Add new logs for debugging issues with chat references 2023-11-18 12:10:50 -08:00
sabaimran
5de2ab6098 Change parse_obj calls to use model_validate per new pydantic specification 2023-11-18 12:10:36 -08:00
sabaimran
6d249645a6 Fix interpretation of the default search type 2023-11-18 00:04:18 -08:00
sabaimran
f180b2ba94 Resolve mypy errors for various data types 2023-11-17 23:26:15 -08:00
sabaimran
3328a41f08 Update types of base config models for pydantic 2.0 2023-11-17 23:08:52 -08:00
sabaimran
f688529150 Update the default configuration for the AppConfig 2023-11-17 19:26:31 -08:00
sabaimran
11ccb92755 Fix formatting of welcome message to use markdown 2023-11-17 18:55:59 -08:00
Debanjum Singh Solanky
ca87b4ede9 Wrap common API query parameters into shared class to deduplicate code
- Upgrade FastAPI to >= latest version. Required upgrade of FastAPI.
  Earlier version didn't support wrapping common query params in class

- Use per fixture app instead of a global FastAPI app in conftest

- Upgrade minimum required Django version

- Fix no notes chat director test with updated no notes message
  No notes message was updated in commit 118f1143
2023-11-17 18:43:49 -08:00
sabaimran
262f3ccb59 Resolve mypy issues with formatting 2023-11-17 17:11:00 -08:00
sabaimran
a7e00898cb Fix rendering even when no online context references are returned 2023-11-17 16:41:28 -08:00
sabaimran
0fcf234f07 Add support for using serper.dev for online queries
- Use the knowledgeGraph, answerBox, peopleAlsoAsk and organic responses of serper.dev to provide online context for queries made with the /online command
- Add it as an additional tool for doing Google searches
- Render the results appropriately in the chat web window
- Pass appropriate reference data down to the LLM
2023-11-17 16:19:11 -08:00
Debanjum Singh Solanky
55785d50c3 Use title, when present, as root ancestor of entries instead of file path 2023-11-17 15:03:27 -08:00
sabaimran
bfbe273ffd Add some styling to the copy button for programmatic output 2023-11-17 12:18:35 -08:00
sabaimran
9ddf3b58c3 Use the markdown parser for rendering the chat messages in the web interface 2023-11-17 12:14:02 -08:00
sabaimran
a0b12b001a Provide in-line rendering when output matches certain views 2023-11-17 11:04:36 -08:00
sabaimran
ec06d2c446 Move data indexer files into a separate folder under processor. Update assoc UTs 2023-11-16 17:19:55 -08:00
sabaimran
45a42faec8 Make adjectives more positive for api token generation 2023-11-16 15:55:35 -08:00
sabaimran
118f1143ff When user tries using the notes slash command without having any data indexed 2023-11-16 12:52:39 -08:00
sabaimran
e8a13f0813
Add multi-user support to Khoj and use Postgres for backend storage (#549)
- Adds support for multiple users to be connected to the same Khoj instance using their Google login credentials
- Moves storage solution from in-memory json data to a Postgres db. This stores all relevant information, including accounts, embeddings, chat history, server side chat configuration
- Adds the concept of a Khoj server admin for configuring instance-wide settings regarding search model, and chat configuration
- Miscellaneous updates and fixes to the UX, including chat references, colors, and an updated config page
- Adds billing to allow users to subscribe to the cloud service easily
- Adds a separate GitHub action for building the dockerized production (tag `prod`) and dev (tag `dev`) images, separate from the image used for local building. The production image uses `gunicorn` with multiple workers to run the server.
- Updates all clients (Obsidian, Emacs, Desktop) to follow the client/server architecture. The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration.
- Changes license to GNU AGPLv3

Resolves #467 
Resolves #488 
Resolves #303 
Resolves #345 
Resolves #195 
Resolves #280 
Resolves #461 
Closes #259 
Resolves #351
Resolves #301
Resolves #296
2023-11-16 11:48:01 -08:00
Debanjum Singh Solanky
74403e3536 Add ancestor headings of each org-mode entry to their compiled form
Resolves #85
2023-11-16 02:54:41 -08:00
Debanjum Singh Solanky
305c25ae1a Track ancestor headings for each org-mode entry in org-node parser 2023-11-16 02:39:14 -08:00
Debanjum Singh Solanky
cc05013715 Update first run message on Web app with Chat models setup instructions
- Link to Django admin panel for user to create Chat Models on their
  Khoj server
- This should only get hit when user is not using Khoj cloud, as Khoj
  cloud would already have Chat models configured
2023-11-15 22:44:24 -08:00
Debanjum Singh Solanky
6c1693b8f4 Update first run message on Desktop app with API token setup instructions
- Open Web app settings in the default browser via link click
- Open Desktop app settings via link click
2023-11-15 22:44:11 -08:00
Debanjum Singh Solanky
922983bd53 Set max cos distance to 0.18. Test search API query with max distance 2023-11-15 20:26:21 -08:00
Debanjum Singh Solanky
18dbad5edb Use Sigmoid to normalize cross-encoder score between 0-1
- While sigmoid normalization isn't required for reranking.
  Normalizing score to distance metrics for both encoder and cross
  encoder scores is useful to reason about them
- Softmax wasn't required as don't need probabilities, sigmoid is good
  enough to get distance metric
2023-11-15 19:31:59 -08:00
sabaimran
ea144de438 Merge with master 2023-11-15 18:34:46 -08:00
Debanjum Singh Solanky
348cc0cf0e Use better name for DB adapter func to create user by Google token 2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky
08a057bdd5 Rename SearchModel to SearchModelConfig DB model, Require Cross-Encoder 2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky
0679b2a7bd Use embeddings model store from state in text to entries
Do not need to instantiating it separately. In all other places we're
using the embeddings model store in global state anyway
2023-11-15 17:31:50 -08:00
sabaimran
245a9cbf63 Fix return type of the update_or_create method 2023-11-15 17:31:50 -08:00
sabaimran
bbae7dd83c Update logic for creating a new user to use aupdate_or_create 2023-11-15 17:31:50 -08:00
sabaimran
8e62af77b9 Update format for return type of the generate token mehtod 2023-11-15 17:03:01 -08:00
sabaimran
4a487aff23 Fix return type of the update_or_create method 2023-11-15 14:35:42 -08:00
sabaimran
b63856ecb4 Update logic for creating a new user to use aupdate_or_create 2023-11-15 12:50:39 -08:00
sabaimran
b8e7488a95 Use a more permissive distance filter for search results from notes 2023-11-15 11:13:47 -08:00
sabaimran
05b7542115 Remove config lock from the state 2023-11-15 10:44:45 -08:00
sabaimran
ecd005cac0 Check if search model is already in DB before creating a new one 2023-11-15 10:41:35 -08:00
Debanjum Singh Solanky
9c6e7bdea2 Upgrade server, desktop app dependencies to resolve CVE bugs 2023-11-15 01:47:53 -08:00
Debanjum Singh Solanky
8f200cf53f Remove unused parameter from configure_search_type method 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
f8e5e118e1 Only create KhojUser on login if doesn't already exist 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
3d8d6145f2 Add search model config from khoj.yml to Postgres DB via migration script 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
4af194d74b Make search model configurable on server
- Expose ability to modify search model via Django admin interface
- Previously the bi_encoder and cross_encoder models to use were set
  in code
- Now it's user configurable but with a default config generated by
  default
2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky
e98141f4c3 Subscribe default user to standard plan with a far away renewal date
Self hosted users in anonymous mode have all capabilities unlocked
2023-11-14 16:31:39 -08:00
Debanjum Singh Solanky
9d30fda26d Deduplicate, improve name of prompt templates for GPT4All chat models
- Do not pass unused rerank_results parameter to text_search.query method
2023-11-14 16:31:09 -08:00
Debanjum Singh Solanky
795ec9eb55 Add KHOJ_prefix to server admin credentials environment variables 2023-11-14 16:13:13 -08:00
sabaimran
ee005de662 Rename django files URL to server instead of django 2023-11-14 12:36:38 -08:00
sabaimran
20ce3d0c78 Update default docker compose configuration with Khoj local mode 2023-11-14 12:21:26 -08:00
sabaimran
8c36079f74 Add a first run experience to intialize the admin user if none exists and setup chat models 2023-11-13 21:07:12 -08:00
Debanjum Singh Solanky
e9adb58c16 Rate limit calls to the /chat API per user, per day/minute 2023-11-13 19:41:46 -08:00
Debanjum Singh Solanky
33a8eb0470 Log when new user is created 2023-11-13 19:37:24 -08:00
sabaimran
603f838115 Block input text field when waiting for chat response 2023-11-11 17:14:37 -08:00
Debanjum Singh Solanky
9c321ac070 Fix cross encoder to use softmax to convert it to a distance metric 2023-11-11 16:12:24 -08:00
sabaimran
8a824167cf Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references 2023-11-11 12:59:31 -08:00
sabaimran
fa428932a8 Update URL for downloading the desktop application 2023-11-11 12:59:15 -08:00
Debanjum Singh Solanky
941c7f23a3 Only get text search results above confidence threshold via API
- During the migration, the confidence score stopped being used. It
  was being passed down from API to some point and went unused

- Remove score thresholding for images as image search confidence
  score different from text search model distance score

- Default score threshold of 0.15 is experimentally determined by
  manually looking at search results vs distance for a few queries

- Use distance instead of confidence as metric for search result quality
  Previously we'd moved text search to a distance metric from a
  confidence score.

  Now convert even cross encoder, image search scores to distance metric
  for consistent results sorting
2023-11-11 04:11:33 -08:00
Debanjum Singh Solanky
e44e6df221 Reduce data dumped in console log from web, desktop app 2023-11-11 02:05:07 -08:00
Debanjum Singh Solanky
f044a89d50 Show status in Save, Reinitialize button of config page on web app
- Show non-transient error message in status element if action fails
- On success, just show temporary success message within button
2023-11-11 02:04:58 -08:00
Debanjum Singh Solanky
f17d9da36c Move Configure, Reinitialize buttons into the Content section on Web app
Remove the Results Count button from the web app. It's hanging weirdly
with not much context to its purpose.

Reintroduce it in the Search card when created under the Features section
2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky
325cb0f7fb Show message in Save button of Github, Notion config save in web app
Show the success, failure message only temporarily. Previously it
stuck around after clicking save until page refresh
2023-11-11 02:01:39 -08:00
Debanjum Singh Solanky
b34d4fa741 Save config, update index on save of Github, Notion config in web app
Reduce user confusion by joining config update with index updation for
each content type.

So only a single click required to configure any content type instead
of two clicks on two separate pages
2023-11-11 00:33:49 -08:00
Debanjum Singh Solanky
c4364b9100 Weaken asking follow-up qs and q&a mode in notes prompt to OpenAI models
- Notes prompt doesn't need to be so tuned to question answering. User
could just want to talk about life. The notes need to be used to
response to those, not necessarily only retrieve answers from notes

- System and notes prompts were forcing asking follow-up questions a
  little too much. Reduce strength of follow-up question asking
2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky
cba371678d Stop OpenAI chat from emitting reference notes directly in chat body
The Chat models sometime output reference notes directly in the chat
body in unformatted form, specifically as Notes:\n['. Prevent that.
Reference notes are shown in clean, formatted form anyway
2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky
8585976f37 Revert "Use notes in system prompt, rather than in the user message"
This reverts commit e695b9ab8c.
2023-11-10 23:36:43 -08:00
Debanjum Singh Solanky
b6441683c6 Increase reference text on 1st expansion to 3 lines and 140 characters 2023-11-10 23:36:43 -08:00
sabaimran
55c97241b5 Merge branch 'fix/imports-and-references' of github.com:khoj-ai/khoj into fix/imports-and-references 2023-11-10 22:38:34 -08:00
sabaimran
e2e96f9aa4 Add default settings to let new users be subscribed on trial
- Add the default user to a subscription trial
- Update associated unit tests
2023-11-10 22:38:28 -08:00
Debanjum Singh Solanky
501e7606a0 Increase reference text on 1st expansion to 3 lines and 140 characters 2023-11-10 21:27:04 -08:00
sabaimran
0a950d9382 Fix checker to determine if obsidian client is connected 2023-11-10 19:21:58 -08:00
sabaimran
c736604366 Merge with remote 2023-11-10 17:50:15 -08:00
sabaimran
b0b07bde6c Allow chat reference to expand enough to show the whole reference, rather than constraining the height 2023-11-10 17:49:20 -08:00
sabaimran
14f8c151c8 Fix return type of the generate_chat_response method 2023-11-10 17:48:54 -08:00
Debanjum Singh Solanky
45b8670c25 Fix return type hint for generate_chat_response func 2023-11-10 17:34:19 -08:00
Debanjum Singh Solanky
9b6c5ddba4 Update action row padding in cards on config page of web app 2023-11-10 16:53:25 -08:00
sabaimran
54d4fd0e08 Add chat_model data for logging selected models to telemetry 2023-11-10 16:46:34 -08:00
sabaimran
e695b9ab8c Use notes in system prompt, rather than in the user message 2023-11-10 15:09:33 -08:00
sabaimran
cec932d88a Update prompt so that GPT is more context aware with its capabilities 2023-11-10 14:37:11 -08:00
sabaimran
e62788ad79 Await result for determining if user has entries 2023-11-10 13:51:56 -08:00
sabaimran
1a56344f12 Remove the old syncData reference as it no longer exists 2023-11-10 10:10:07 -08:00
Debanjum Singh Solanky
39ad1c6ce6 Release Khoj version 0.14.0
Fix Khoj subtitle in manifest of Khoj Obsidian plugin
2023-11-10 00:28:33 -08:00
Debanjum Singh Solanky
745d6bfeed Add detailed intro message, mention download desktop app for docs sync 2023-11-10 00:20:28 -08:00
Debanjum Singh Solanky
6eb7df717c Only show search in web app nav pane if user has documents indexed 2023-11-09 19:14:54 -08:00
Debanjum Singh Solanky
c0789dc57b Use email to get_user_subscription from DB and other DB adapters
- Needing user subscription requires chaining function
- Simplify get_file_sources DB adapter
2023-11-09 19:09:57 -08:00
Debanjum Singh Solanky
841ed95521 Move active user profile halo check into nav pane macro on web app 2023-11-09 18:05:19 -08:00
Debanjum Singh Solanky
ddac693762 Hide download desktop app message in web app if synced files exist 2023-11-09 17:47:00 -08:00
Debanjum Singh Solanky
30a9674f25 Mark generated profile pic with subscription circle in web app 2023-11-09 15:22:38 -08:00
Debanjum Singh Solanky
d6e6ed1cfa Keep single Save button, Show next sync, default to prod Khoj URL in Desktop app
- Make mutable syncing variable not a const
- Show next sync time to make users aware of data sync is automated
- Keep a single Save button to reduce confusion. It does what Save All
  previously did. Intent to manual sync should Save All
- Default to using app.khoj.dev as default Khoj URL to ease setup
2023-11-09 14:04:58 -08:00
Debanjum Singh Solanky
e1f0128576 Change config migration script to update to 0.15.0 version
Next release, 0.14.0 wouldn't contain the migration to Postgres
2023-11-09 12:21:58 -08:00
Debanjum Singh Solanky
17cbbb0b01 Use Consistent Environment Variable for KHOJ_DEBUG 2023-11-09 11:01:28 -08:00
Debanjum Singh Solanky
391db80499 Improve subscribed user profile pictures and nav pane selection
- Add yellow halo around subscribed user profile
- Fix highlighting current page in header nav pane
2023-11-09 00:57:05 -08:00
Debanjum Singh Solanky
605058c72a Allow null user profile picture from Google OAuth in DB
- Fix width of generated profile picture generated for user
- Ignore unused Stripe webhook events
2023-11-09 00:46:59 -08:00
Debanjum Singh Solanky
a2609973b8 Disable Subscription if Stripe environment not setup
Deduplicate DJANGO_SECRET_KEY and KHOJ_DJANGO_SECRET_KEY to latter
name as prefixed with KHOJ as KHOJ app specific
2023-11-08 19:39:32 -08:00
Debanjum Singh Solanky
09e1235832 Auto update billing card UI on (re/un-)subscribe click on web app
Previously required a page load to see the updated billing state after
clicking resubscribe or unsubscribe buttons
2023-11-08 18:38:12 -08:00
Debanjum Singh Solanky
8b8bb15866 Keep sync state in memory, initialized to false in Desktop app
Prevent deadlock if desktop app killed in middle of syncing
2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky
c043eb54ae Use typed entry source instead of raw str to map source to conf in api.py 2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky
8178004e6d Move Subscription data into separate table in DB. Merge migrations 2023-11-08 18:03:08 -08:00
Debanjum Singh Solanky
3bb10128ef Move subscription API to separate, independent router 2023-11-08 16:20:27 -08:00
Debanjum Singh Solanky
ec1395d072 Clean, merge subscription update events, API and functions
- Reduce webhook triggers for subscription updates
- Merge subscription update API endpoint, functions for (re/un-)subscribe
2023-11-08 15:55:20 -08:00
Debanjum Singh Solanky
ef5c13f968 Keep user subscription state. Update it when user has unsubscribed 2023-11-08 12:08:36 -08:00
Debanjum Singh Solanky
c52affc6d9 Get Khoj Cloud Subscription URL via environment variable 2023-11-08 12:07:53 -08:00
sabaimran
609d358b1a Use sql datetime comparison for detecting validity of subscription renewal date
- Update the unsubscribe endpoint to use query params
- Use subscription id to process unsubscribe endpoint, rather than the customer id
2023-11-07 19:17:36 -08:00
sabaimran
98cf095b65 Fix bug for rendering chat references in LLM response 2023-11-07 16:44:41 -08:00
sabaimran
0e1cdb6536 Add additional error handling for processing unknown Stripe events and fix typo in STRIPE_SIGNING env variable 2023-11-07 16:43:05 -08:00
sabaimran
08c86927cb Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into fix-improve-config-page-on-desktop-and-web-app 2023-11-07 12:46:49 -08:00
sabaimran
cec54e3a8a
Merge pull request #536 from khoj-ai/features/update-chat-ui
Update the chat UI to have richer representation of the references
2023-11-07 12:34:57 -08:00
Debanjum Singh Solanky
f466751f4d Expose card on web app config page to manage subscription to Khoj cloud 2023-11-07 10:21:00 -08:00
Debanjum Singh Solanky
9aaf475c8a Create API webhook, endpoints for subscription payments using Stripe
- Add fields to mark users as subscribed to a specific plan and
  subscription renewal date in DB
- Add ability to unsubscribe a user using their email address
- Expose webhook for stripe to callback confirming payment
2023-11-07 10:20:51 -08:00
Debanjum Singh Solanky
156421d30a Show file type icons for each indexed file in config card of web app 2023-11-07 05:48:44 -08:00
Debanjum Singh Solanky
045c2252d6 Set content enabled status on update via config buttons on web app
Previously hitting configure or disable wouldn't update the state of
the content cards. It needed page refresh to see if the content was
synced correctly.

Now cards automatically get set to new state on hitting disable button
on card or global configure buttons
2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
7c424e0d5f Enable deleting all indexed desktop files from Khoj via Desktop app 2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
779fa531a5 Prevent Desktop app triggering multiple simultaneous syncs to server
Lock syncing to server if a sync is already in progress.

While the sync save button gets disabled while sync is in progress,
the background sync job can still trigger a sync in parallel. This
sync lock prevents that
2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
404d47f1a1 Bubble up content indexing errors to notify user on client apps 2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky
6e957584ac Create config page on web app to manage computer files indexed by Khoj
Remove the table of all files indexed by Khoj. This seems overkill and
doesn't match the UI semantics of the other data sources like Github,
Notion.

Create instead a data source card for computer files with the same
update, disable semantics of the Github and Notion data source cards

Users can disable each data source from its card on the main config page.

They can see/delete individual files indexed from the computer data source
once they click into the computer files data source card on the config page
2023-11-07 04:42:53 -08:00
Debanjum Singh Solanky
d527b644f4 Update content by source via API. Make web client use this API for config 2023-11-07 03:41:19 -08:00
Debanjum Singh Solanky
9ab327a2b6 Store the data source of each entry in database
This will be useful for updating, deleting entries by their data
source. Data source can be one of Computer, Github or Notion for now

Store each file/entries source in database
2023-11-07 02:18:48 -08:00
Debanjum Singh Solanky
c82cd0862a Delete deprecated content config pages for local files from web client
The desktop app now manages syncing local computer files to index
The server only manages "cloud" data source like github and notion.
2023-11-06 23:55:37 -08:00
Debanjum Singh Solanky
97cf8339aa Rename Sync button, Force Sync toggle to Save, Save All buttons 2023-11-06 21:57:37 -08:00
Debanjum Singh Solanky
a08b152358 Improve log messages in text_entries and memory leak unit test 2023-11-06 19:27:31 -08:00
sabaimran
6c8689e4ae Update corresponding chat UX in the desktop client as well 2023-11-06 16:18:41 -08:00
sabaimran
e01ecf1419 /s/references/reference to fix bug of jumping references 2023-11-06 16:12:25 -08:00
Debanjum
38f24a037d
Improve Indexing Text Entries (#535)
Major
- Ensure search results logic consistent across migration to DB, multi-user
- Manually verified search results for sample queries look the same across migration
 - Flatten indexing code for better indexing progress tracking and code readability

Minor
- a4f407f Test memory leak on MPS device when generating vector embeddings
- ef24485 Improve Khoj with DB setup instructions in the Django app readme (for now)
- f212cc7 Arrange remaining text search tests in arrange, act, assert order
- 022017d Fix text search tests to test updated indexing log messages
2023-11-06 16:01:53 -08:00
sabaimran
270f7b3eb3 Update the chat UI to have richer representation of the references 2023-11-05 15:46:43 -08:00
sabaimran
d697d752c2
Use repeat rather than manually specify auto in grid-template-rows
Co-authored-by: Debanjum <debanjum@gmail.com>
2023-11-05 15:23:42 -08:00
sabaimran
5f1e37fff0 Adjust indentation for css property 2023-11-05 14:33:23 -08:00
Debanjum Singh Solanky
a4f407f595 Test memory leak on MPS device when generating vector embeddings
Slope threshold of 2.0 determined qualitatively on local Mac device
Minor unused import and clean-up
2023-11-05 03:48:54 -08:00
Debanjum Singh Solanky
ef24485ada Improve Khoj with DB setup instructions in the Django app readme (for now) 2023-11-05 02:04:52 -08:00
sabaimran
084a8becc5 Fix but to prevent default in chat trigger 2023-11-04 20:13:33 -07:00
Debanjum Singh Solanky
5489e98b9c Do not index org heading entries by default
This is to maintain the previous default behavior
2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky
34b5a86d1d Use SentenceTransformer to disable progress bar when encoding query
The Langchain HuggingFaceEmbeddings wrapper doesn't support disabling
progressbar, not especially for only query but not documents.

This makes the logs noisy with encoding progressbar for each
incremental queries

No features of the Langchain wrapper for SentenceTransformer was
currently being used anyway for now, and we can always switch back to
it if required
2023-11-04 20:09:25 -07:00
Debanjum Singh Solanky
dc9946fc03 Flatten nested loops, improve progress reporting in text_to_jsonl indexer
Flatten the nested loops to improve visibilty into indexing progress

Reduce spurious logs, report the logs at aggregated level and update
the logging description text to improve indexing progress reporting
2023-11-04 20:09:25 -07:00
sabaimran
88eeee3f4b Move try/catch for import one line later 2023-11-04 19:46:47 -07:00
sabaimran
dbaa892665 Flip catching modulenotfound to import error exception 2023-11-04 19:34:10 -07:00
sabaimran
8c3d5a49da Add try/except around image extraction step 2023-11-04 19:27:18 -07:00
sabaimran
fdfab39942 Update the config UI to show all files indexed with option to delete
- Given the separation of the client and server now, the web UI will no longer support configuration of local file paths of data to index
- Expose a way to show all the files that are currently set for indexing, along with an option to delete all or specific files
2023-11-04 19:03:34 -07:00
sabaimran
800bb4f458 Remove references to demo
- The demo setting is no longer necessary for the time being, as we won't have anymore demo instances
2023-11-04 17:17:04 -07:00
sabaimran
b5972e9311 Use OCR to extract image text in PDFs 2023-11-04 17:15:28 -07:00
Debanjum Singh Solanky
8273bf26b7 Fix multi-line chat input and output render on web, desktop clients
- Remove spurious whitespace in chat input box on page load being
  added because text area element was ending on newline
- Do not insert newline in message when send message by hitting enter key
  This would be more evident when send message with cursor in the
  middle of the sentence, as a newline would be inserted at the cursor
  point
- Remove chat message separator tokens from model output. Model
  sometimes starts to output text in it's chat format
2023-11-04 01:09:35 -07:00
Debanjum Singh Solanky
2f1756cc15 Do not use icon for each file, folder to index in desktop app.
Other minor fixes based on PR feedback
2023-11-04 00:13:10 -07:00
Debanjum Singh Solanky
e8f568d79c Make splash screen wider, opaque and fix it's spinner radius
Radius should be such that final spin doesn't extend out of the circle
Opaque background improves contrast for better visual
2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky
3ef05f4803 Use css var for main font color in search, chat page of desktop app 2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky
a19cbde2d7 Add About page for Khoj to Desktop app. Expose it via system tray
- Pass current khoj version from package.json to about page via
  electron IPC between backend js and frontend page
- Update Khoj information in default About screen as well, in case
  it's exposed anywhere else
2023-11-03 23:59:21 -07:00
Debanjum Singh Solanky
a327294ee9 Rename khoj.js to utils.js in web and desktop client apps 2023-11-03 18:13:37 -07:00
Debanjum Singh Solanky
db57eeaefe Console log a welcome message on loading Desktop client 2023-11-03 05:15:41 -07:00
Debanjum Singh Solanky
6fae6fb2a4 Merge branch 'features/multi-user-support-khoj' into improve-client-app-theming 2023-11-03 04:58:41 -07:00
Debanjum Singh Solanky
4cd76311ad Slow down spinning at end of splash sequence. Make animation bigger 2023-11-03 04:28:17 -07:00
Debanjum Singh Solanky
34661c33a2 Show splash screen on starting desktop app 2023-11-03 03:19:08 -07:00
Debanjum Singh Solanky
126d3f4563 Render each file, folder to index row with icon in desktop app
Make the file, folders to index look less like an editable field
2023-11-03 02:48:42 -07:00
Debanjum Singh Solanky
80ae132cad Update Desktop, Obsidian client color theme to lighter yellow
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
  with lantern flame shade
- Add water, leaf, flower color variables
2023-11-03 02:48:42 -07:00
sabaimran
fb6ebd19fc
Fix refactor bugs, CSRF token issues for use in production (#531)
Fix refactor bugs, CSRF token issues for use in production
* Add flags for samesite settings to enable django admin login
* Include tzdata to dependencies to work around python package issues in linux
* Use DJANGO_DEBUG flag correctly
* Fix naming of entry field when creating EntryDate objects
* Correctly retrieve openai config settings
* Fix datefilter with embeddings name for field
2023-11-02 23:02:38 -07:00
Debanjum Singh Solanky
345856e7be Merge branch 'master' of github.com:khoj-ai/khoj into features/multi-user-support-khoj
Merge changes to use latest GPT4All with GPU, GGUF model support into
khoj multi-user support rearchitecture branch
2023-11-02 22:44:25 -07:00
Debanjum Singh Solanky
041074ccd6 Make chat the landing page for the desktop app
Chat, unlike search, doesn't knowledge base indexing setup.
So you can get started with chat much faster.
2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
3801105b2a Make chat the landing page for the web app
Chat, unlike search, doesn't knowledge base indexing setup.
So you can get started with chat much faster.
2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
0d4e7d46c2 Fix color and size of profile picture circle in nav pane 2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
4fbe8ac6b1 Console log a welcome message on loading web client 2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
9fc6c97139 Use Khoj standard font family, weight in web client settings page 2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
b6f07099cd Simplify login page styling on web client
- Center all elements: icon, text and button
- Use khoj icon not logo-text
- Simplify login title text
2023-11-02 20:42:21 -07:00
Debanjum Singh Solanky
7b7f6d3bc8 Update web client theme to a lighter
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
  with lantern flame shade
- Add water, leaf, flower color variables
2023-11-02 20:42:21 -07:00
sabaimran
fe860aaf83 Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into features/multi-user-support-khoj 2023-11-02 14:56:01 -07:00
sabaimran
2c9496bcf1 Add additional null checks in the migrate_server_pg script 2023-11-02 14:55:58 -07:00
sabaimran
20df0f5330 Use url_path_for for creating the login page URL in the application 2023-11-02 14:55:14 -07:00
sabaimran
fd11b78552
Fix migration script error when openai not available (#530) 2023-11-02 11:28:08 -07:00
sabaimran
fe6720fa06
[Multi-User Part 8]: Make conversation processor settings server-wide (#529)
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
2023-11-02 10:43:27 -07:00
Debanjum Singh Solanky
12b3eeae9e Use Khoj fonts on config page of web and desktop apps too
Previously pico.css font-families were being selected for the config
page. This was different from the fonts used by index.html, chat.html

This improves spacing issue of heading further
2023-11-01 17:50:50 -07:00
Debanjum Singh Solanky
022d695309 Switch to narrow view below width of 700px on web client
This makes the dropdown menu align better to the profile picture in
mobile view
2023-11-01 17:49:44 -07:00
Debanjum Singh Solanky
6a0adfbfbb Default to profile picture with Initial if user has no profile picture 2023-11-01 17:49:44 -07:00
Tuan Nguyen
354605e73e
Autofocus to chat input when openning chat (#524) 2023-11-01 16:09:45 -07:00
Debanjum Singh Solanky
d92a2d03a7 Rename Files, Classes from X_To_JSONL to more appropriate X_To_Entries
These content processors are converting content into entries in DB
instead of entries in JSONL file
2023-11-01 14:51:33 -07:00
Debanjum Singh Solanky
2ad2055bcb Remove user null check in API controllers that require authentication 2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky
7ac5a4766d Match spacing of navigation header pane in config vs search/chat pages 2023-11-01 14:38:19 -07:00
Debanjum Singh Solanky
2e3a4a6a9b Use Jinja macro to deduplicate navigation header HTML 2023-11-01 14:38:12 -07:00
Debanjum Singh Solanky
c631b61a81 Put colors shared by index, chat html into khoj css global variables 2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky
f585a71744 Put logout, settings under dropdown menu with logged in user's profile picture
- Create dropdown menu. Put settings page, logout action under it
- Make user's profile picture the dropdown menu heading
- Create khoj.js to store shared js across web client
  It currently stores the dropdown menu open, close functionality
- Put shared styling for khoj dropdown menu under khoj.css
2023-11-01 02:13:24 -07:00
Debanjum Singh Solanky
58a7171911 Show truncated API key for identification & restrict table width
- Use a function to generate API Key table row HTML, to dedup logic
- Show delete, copy icon hints on hover
- Reduce length of copied message to not expand table width
- Truncating API key helps keep the API key table width within width
  of smaller width displays
2023-10-31 23:10:26 -07:00
Debanjum Singh Solanky
9cebd7f856 Add emoji icons to Search, Chat, Settings items in nav menu of Web client
Emoji icons have already been added to the Search, Chat and Settings
top navigation menu in the desktop client. This change adds these to
the web client as well
2023-10-31 22:38:44 -07:00
Debanjum Singh Solanky
f77336ba61 Add key icon for API keys table in Web client config page 2023-10-31 19:01:09 -07:00
Debanjum Singh Solanky
87e6b1eab9 Rename TextEmbeddings to TextEntries for improved readability
Improves readability as name has closer match to underlying
constructs
2023-10-31 18:55:59 -07:00
Debanjum Singh Solanky
bcbee05a9e Rename DbModels Embeddings, EmbeddingsAdapter to Entry, EntryAdapter
Improves readability as name has closer match to underlying
constructs

- Entry is any atomic item indexed by Khoj. This can be an org-mode
  entry, a markdown section, a PDF or Notion page etc.

- Embeddings are semantic vectors generated by the search ML model
  that encodes for meaning contained in an entries text.

- An "Entry" contains "Embeddings" vectors but also other metadata
  about the entry like filename etc.
2023-10-31 18:50:54 -07:00
sabaimran
54a387326c
[Multi-User Part 6]: Address small bugs and upstream PR comments (#518)
- 08654163cb: Add better parsing for XML files
- f3acfac7fb: Add a try/catch around the dateparser in order to avoid internal server errors in app
- 7d43cd62c0: Chunk embeddings generation in order to avoid large memory load
- e02d751eb3: Addresses comments from PR #498 
- a3f393edb4: Addresses comments from PR #503 
- 66eb078286: Addresses comments from PR #511 
- Address various items in https://github.com/khoj-ai/khoj/issues/527
2023-10-31 17:59:53 -07:00
sabaimran
5f3f6b7c61
[Multi-User Part 5]: Add a production Docker file and use a gunicorn configuration with it (#514)
- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
2023-10-26 13:15:31 -07:00
Debanjum
9acc722f7f
[Multi-User Part 4]: Authenticate using API Tokens (#513)
###  New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API Keys
- Create user API keys table and functions to CRUD them in Database

### 🧪 Improve
- Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme

### ⚙️ Fix
- Fix error handling when configure offline chat via Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
2023-10-26 12:33:03 -07:00
sabaimran
4b6ec248a6
[Multi-User Part 3]: Separate chat sesssions based on authenticated users (#511)
- Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.

This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.
2023-10-26 11:37:41 -07:00
sabaimran
a8a82d274a
[Multi-User Part 2]: Add login pages and gate access to application behind login wall (#503)
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
2023-10-26 10:17:29 -07:00
sabaimran
216acf545f
[Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498)
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
2023-10-26 09:42:29 -07:00
Debanjum Singh Solanky
9677eae791 Expose CLI flag to disable using GPU for offline chat model
- Offline chat models outputing gibberish when loaded onto some GPU.
  GPU support with Vulkan in GPT4All seems a bit buggy

- This change mitigates the upstream issue by allowing user to
  manually disable using GPU for offline chat

Closes #516
2023-10-25 17:51:46 -07:00
Debanjum Singh Solanky
0f1ebcae18 Upgrade to latest GPT4All. Use Mistral as default offline chat model
GPT4all now supports gguf llama.cpp chat models. Latest
GPT4All (+mistral) performs much at least 3x faster.

On Macbook Pro at ~10s response start time vs 30s-120s earlier.
Mistral is also a better chat model, although it hallucinates more
than llama-2
2023-10-22 19:04:23 -07:00
sabaimran
963cd165eb Resolve merge conflicts 2023-10-19 14:39:05 -07:00
Debanjum Singh Solanky
8346e1193c Release Khoj version 0.13.0 2023-10-18 03:43:54 -07:00
Debanjum Singh Solanky
6631fc38db Delete plaintext config via API. Catch any offline model loading exception 2023-10-18 03:37:45 -07:00
Debanjum Singh Solanky
53abd1a506 Mark sync completed on desktop client, even when no files to send
Previously Sync spinner on desktop config screen would hang when no
files to send to server & the Sync button had been manually triggered
2023-10-18 01:30:56 -07:00
Debanjum Singh Solanky
71b0012e8c Set offline chat config to default value if unset on server load 2023-10-18 00:59:43 -07:00
Debanjum Singh Solanky
cf1cdc3fe1 Disambiguate input_filter variable names in fs_syncer functions 2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky
e3cd8b4150 Only index files returned by input-filter globs in fs_syncer
Ignore .org, .pdf etc. suffixed directories under `input-filter' from
being evaluated as files.

Explicitly filter results by input-filter globs to only index files,
not directory for each text type

Add test to prevent regression

Closes #448
2023-10-17 23:32:10 -07:00
Debanjum Singh Solanky
51363d280d Do not configure khoj server for pull based indexing from khoj.el
Do not make khoj server pull update index on Obsidian plugin load.
Index is updated on push from plugin instead now/
2023-10-17 21:47:19 -07:00
Debanjum Singh Solanky
d9d133dfb9 Read text files as utf-8, instead of default os locale
On Windows, the default locale isn't utf8. Khoj had regressed to
reading files in OS specified locale encoding, e.g cp1252, cp949 etc.

It now explicitly uses utf8 encoding to read text files for indexing

Resolves #495, resolves #472
2023-10-17 21:47:19 -07:00
Debanjum
3d4576ae38
Fix encoding binary files for sync from the Desktop, Obsidian client (#506)
- Fix encoding binary files like PDFs for sync from Desktop client
- Fix encoding binary files like PDFs for sync from Obsidian client
2023-10-17 15:37:22 -07:00
Debanjum Singh Solanky
c8293998d9 Fix encoding binary files like PDFs for sync from Obsidian client
Use readBinary to read binary files like PDFs instead of read
2023-10-17 15:08:30 -07:00
sabaimran
ba60c869c9 Fix encoding binary files like PDFs for sync from Desktop client
Use readFileSync, Buffer to pass appropriately formatted binary data
2023-10-17 15:08:23 -07:00
Andrew Spott
3d7381446d
Changed globbing. Now doesn't clobber a users glob if they want to a… (#496)
* Changed globbing.  Now doesn't clobber a users glob if they want to add it, but will (if just given a directory), add a recursive glob.

Note: python's glob engine doesn't support `{}` globing, a future option is to warn if that is included.

* Fix typo in globformat variable

* Use older glob pattern for plaintext files

---------

Co-authored-by: Saba <narmiabas@gmail.com>
2023-10-17 11:26:06 -07:00
sabaimran
2646c8554d Provide a default value to offline_chat configuration of the conversation processor 2023-10-17 10:35:22 -07:00
Debanjum Singh Solanky
b8976426eb Update offline chat model config schema used by Emacs, Obsidian clients
The server uses a new schema for the conversation config. The Emacs,
Obsidian clients need to use this schema to update the conversation
config
2023-10-17 07:01:35 -07:00
Debanjum
ecc6fbfeb2
Push Files to Index from Emacs, Obsidian & Desktop Clients using Multi-Part Forms (#499)
### Overview
- Add ability to push data to index from the Emacs, Obsidian client
- Switch to standard mechanism of syncing files via HTTP multi-part/form. Previously we were streaming the data as JSON
  - Benefits of new mechanism
    - No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multi-part/form requests
    - The whole response is not required to be kept in memory to parse content as JSON. As individual files arrive they're automatically pushed to disk to conserve memory if required
    - Binary files don't need to be encoded on client and decoded on server

### Code Details
### Major
- Use multi-part form to receive files to index on server
- Use multi-part form to send files to index on desktop client
- Send files to index on server from the khoj.el emacs client
  - Send content for indexing on server at a regular interval from khoj.el
- Send files to index on server from the khoj obsidian client
- Update tests to test multi-part/form method of pushing files to index

#### Minor
- Put indexer API endpoint under /api path segment
- Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method
- Improve emoji, message on content index updated via logger
- Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user
- Improve indexing of binary files
  - Let fs_syncer pass PDF files directly as binary before indexing
  - Use encoding of each file set in indexer request to read file 
- Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost
- Update indexer API endpoint URL to` index/update` from `indexer/batch`

Resolves #471 #243
2023-10-17 06:05:15 -07:00
Debanjum Singh Solanky
6a4f1b2188 Add more client, request details in logs by index/update API endpoint 2023-10-17 05:43:29 -07:00
Debanjum Singh Solanky
5efae1ad55 Update indexer API endpoint query params for force, content type
New URL query params, `force' and `t' match name of query parameter in
existing Khoj API endpoints

Update Desktop, Obsidian and Emacs client to call using these new API
query params. Set `client' query param from each client for telemetry
visibility
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
84654ffc5d Update indexer API endpoint URL to index/update from indexer/batch
New URL follows action oriented endpoint naming convention used for
other Khoj API endpoints

Update desktop, obsidian and emacs client to call this new API
endpoint
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
e347823ff4 Log telemetry for index updates via push to API endpoint 2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
05be6bd877 Clicking Update Index in Obsidian settings should push files to index
Use the indexer/batch API endpoint to regenerate content index rather
than the previous pull based content indexing API endpoint
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
13a3122bf3 Stop configuring server to pull files to index from Obsidian client
Obsidian client now pushes vault files to index instead
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
99a2c934a3 Add CORS policy to allow requests from khoj apps, obsidian & localhost
Using fetch from Khoj Obsidian plugin was failing due to cross-origin
request and method: no-cors didn't allow passing x-api-key custom
header. And using Obsidian's request with multi-part/form-data wasn't
possible either.
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
541cd59a49 Let fs_syncer pass PDF files directly as binary before indexing
No need to do unneeded base64 encoding/decoding to pass pdf contents
for indexing from fs_syncer to pdf_to_jsonl
2023-10-17 04:58:13 -07:00
Debanjum Singh Solanky
d27dc71dfe Use encoding of each file set in indexer request to read file
Get encoding type from multi-part/form-request body for each file
Read text files as utf-8 and pdfs, images as binary
2023-10-17 04:58:12 -07:00
Debanjum Singh Solanky
8e627a5809 Pass any files to be deleted to indexer API via Khoj Obsidian plugin
- Keep state of previously synced files to identify files to be deleted
- Last synced files stored in settings for persistence of this data
  across Obsidian reboots
2023-10-17 03:34:49 -07:00
Debanjum Singh Solanky
f2e293a149 Push Vault files to index to Khoj server using Khoj Obsidian plugin
Use the multi-part/form-data request to sync Markdown, PDF files in
vault to index on khoj server

Run scheduled job to push updates to value for indexing every 1 hour
2023-10-17 03:05:30 -07:00
Debanjum Singh Solanky
6baaaaf91a Test request body of multi-part form to update content index from khoj.el 2023-10-16 23:54:32 -07:00
Debanjum Singh Solanky
79b3f8273a Make khoj.el send files to be deleted from index to server 2023-10-16 23:53:02 -07:00
Debanjum Singh Solanky
f64fa06e22 Initialize the Khoj Transient menu on first run instead of load
This prevents Khoj from polling the Khoj server until explicitly
invoked via `khoj' entrypoint function.

Previously it'd make a request to the khoj server every time Emacs or
khoj.el was loaded

Closes #243
2023-10-16 19:11:46 -07:00
Debanjum
b4949f7f0b
Improve Offline Chat Model Experience (#494)
- Make offline chat model user configurable. Use `filename` of any [GPT4All supported  model](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models.json) like below:
- Run GPT4All Chat Model on GPU, when available via [GPT4All Vulcan support](https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan)
- Use default Llama 2 supported by GPT4All
- Make `tokenizer` and `max-prompt-size` of chat model user configurable. E.g When using chat models not in [this pre-defined list](https://github.com/khoj-ai/khoj/blob/master/src/khoj/processor/conversation/utils.py) that support larger context window or a different tokenizer.

Closes #406, #418
2023-10-16 17:44:49 -07:00
Debanjum Singh Solanky
644c3b787f Scale no. of chat history messages to use as context with max_prompt_size
Previously lookback turns was set to a static 2. But now that we
support more chat models, their prompt size vary considerably.

Make lookback_turns proportional to max_prompt_size. The truncate_messages
can remove messages if they exceed max_prompt_size later

This lets Khoj pass more of the chat history as context for models
with larger context window
2023-10-16 17:22:28 -07:00
Debanjum Singh Solanky
df1d74a879 Use max_prompt_size, tokenizer from config for chat model context stuffing 2023-10-15 16:52:53 -07:00
Debanjum Singh Solanky
116595b351 Use chat_model specified in new offline_chat section of config
- Dedupe offline_chat_model variable. Only reference offline chat
  model stored under offline_chat. Delete the previous chat_model
  field under GPT4AllProcessorConfig

- Set offline chat model to use via config/offline_chat API endpoint
2023-10-15 16:37:49 -07:00
Debanjum Singh Solanky
feb4f17e3d Update chat config schema. Make max_prompt, chat tokenizer configurable
This provides flexibility to use non 1st party supported chat models

- Create migration script to update khoj.yml config
  - Put `enable_offline_chat' under new `offline-chat' section
    Referring code needs to be updated to accomodate this change
  - Move `offline_chat_model' to `chat-model' under new `offline-chat' section
  - Put chat `tokenizer` under new `offline-chat' section
  - Put `max_prompt' under existing `conversation' section
    As `max_prompt' size effects both openai and offline chat models
2023-10-15 16:35:11 -07:00
sabaimran
c125995d94
[Multi-User]: Part 0 - Add support for logging in with Google (#487)
* Add concept of user authentication to the request session via GoogleUser
2023-10-14 19:39:13 -07:00
Debanjum Singh Solanky
247e75595c Use AutoTokenizer to support more tokenizers 2023-10-14 16:54:52 -07:00
Saba
ff2dbadc9d Use computed plaintext_content to set file content rather than calling f.read again 2023-10-14 13:28:34 -07:00
Debanjum Singh Solanky
1ad8b150e8 Add default tokenizer, max_prompt as fallback for non-default offline chat models
Pass user configured chat model as argument to use by converse_offline

The proper fix for this would allow users to configure the max_prompt
and tokenizer to use (while supplying default ones, if none provided)
For now, this is a reasonable start.
2023-10-13 22:48:56 -07:00
Debanjum Singh Solanky
56bd69d5af Improve Llama v2 extract questions actor and associated prompt
- Format extract questions prompt format with newlines and whitespaces
- Make llama v2 extract questions prompt consistent

- Remove empty questions extracted by offline extract_questions actor
- Update implicit qs extraction unit test for offline search actor
2023-10-13 22:48:56 -07:00
sabaimran
09bb3686cc
Strip the incoming query from the slash conversation command (#500)
* Strip the incoming query from the slash conversation command before passing it to the model or for search
* Return q when content index not loaded
* Remove -n 4 from pytest ini configuration to isolate test failures
2023-10-13 21:11:23 -07:00
Debanjum Singh Solanky
96c0b21285 Sync desktop app package.json with other Khoj clients metadata
- Make `bump_version.sh' script set version for the Khoj desktop app too
- Sync Khoj desktop app authors, license, description and version with
  the other interfaces and server
- Update description in packages metadata to match project subtitle on Github
2023-10-13 20:43:55 -07:00
sabaimran
80fb56b8a5 Sync deksktop app package version with the other releases 2023-10-13 19:23:00 -07:00
Debanjum Singh Solanky
b669aa2395 Clean and fix the content indexing code in the Emacs client
- Pass payloads as unibyte. This was causing the request to fail for
  files with unicode characters
- Suppress messages with file content in on index updates
- Fix rendering response from server on index update API call
- Extract code to populate body of index update HTTP request with files
2023-10-13 18:00:37 -07:00
Debanjum Singh Solanky
bea196aa30 Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method
Previously global state of `url-request-method' would affect the
kind of request made to api/config/data API endpoint as it wasn't
being explicitly being set before calling the API endpoint

This was done with the assumption that the default value of GET for
url-request-method wouldn't change globally

But in some cases, experientially, it can get changed. This was
resulting in khoj.el load failing as POST request was being made
instead which would throw error
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
292f0420ad Send content for indexing on server at a regular interval from khoj.el
- Allow indexing frequency to be configurable by user
- Ensure there is only one khoj indexing timer running
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
fc99431754 Send files to index on server from the khoj.el emacs client
- Add elisp variable to set API key to engage with the Khoj server
- Use multi-part form to POST the files to index to the indexer API
  endpoint on the khoj server
2023-10-12 20:58:52 -07:00
Debanjum Singh Solanky
68018ef397 Use multi-part form to send files to index on desktop client
- Add typing for variables in for loop and other minor formatting clean-up
- Assume utf8 encoding for text files and binary for image, pdf files
2023-10-12 20:58:49 -07:00
Debanjum Singh Solanky
7190b3811d Remove all filter terms in user query from defiltered_query
Previously only the the last filter's terms were getting effectively
applied as the `filter.defilter' operation was being done on
`user_query' but was updating the `defiltered_query'
2023-10-12 20:56:17 -07:00
Debanjum Singh Solanky
60e9a61647 Use multi-part form to receive files to index on server
- This uses existing HTTP affordance to process files
  - Better handling of binary file formats as removes need to url encode/decode
  - Less memory utilization than streaming json as files get
    automatically written to disk once memory utilization exceeds preset limits
  - No manual parsing of raw files streams required
2023-10-11 23:58:23 -07:00
Debanjum Singh Solanky
9ba173bc2d Improve emoji, message on content index updated via logger
Use mailbox closed with flag down once content index completed.

Use standard, existing logger messages in new indexer messages, when
files to index sent by clients
2023-10-11 17:12:03 -07:00
Debanjum Singh Solanky
6aa69da3ef Put indexer API endpoint under /api path segment
Update FastAPI app router, desktop app and to use new url path to
batch indexer API endpoint

All api endpoints should exist under /api path segment
2023-10-09 21:35:58 -07:00
Debanjum Singh Solanky
f6f7a62d80 Wait for user to stop typing to trigger search from khoj.el in Emacs
- Improves user experience by aligning idle time with search latency
  to avoid display jitter (to render results) while user is typing

- Makes the idle time configurable

Closes #480
2023-10-06 12:44:45 -07:00
sabaimran
5c4f0d42b7 Return new default config in API endpoint 2023-10-06 12:30:09 -07:00
sabaimran
052b25af0a Update default configuration passed to Khoj clients to circumvent valiation issues 2023-10-06 12:29:15 -07:00
Debanjum Singh Solanky
a85ff941ca Make offline chat model user configurable
Only GPT4All supported Llama v2 models will work given the prompt
structure is not currently configurable
2023-10-04 20:41:14 -07:00
Debanjum Singh Solanky
d1ff812021 Run GPT4All Chat Model on GPU, when available
GPT4All now supports running models on GPU via Vulkan
2023-10-04 18:42:12 -07:00
Debanjum Singh Solanky
13b16a4364 Use default Llama 2 supported by GPT4All
Remove custom logic to download custom Llama 2 model.
This was added as GPT4All didn't support Llama 2 when it was added to Khoj
2023-10-03 19:01:54 -07:00
sabaimran
4a5ed7f06c
Update Khoj package version for Electron, Desktop app (#492)
* Address package upgrade for Electron application
* Update package version for Electron desktop application
2023-10-03 12:21:32 -07:00
sabaimran
3f962a55c3
Fix Linux Desktop Application (#491)
* Use separate functions for adding files and folders to configuration for indexing
* Add a loading bar while data is syncing
* Bump the minor version for the application
2023-10-03 11:43:19 -07:00
sabaimran
63b3696af0 Release Khoj version 0.12.3 2023-09-26 22:41:11 -07:00
sabaimran
d2f9bca1cf Fix null ref issue in query method and update logic for determining whether khoj is already configured in obsidian 2023-09-26 22:33:44 -07:00
sabaimran
2f18383349 Release Khoj version 0.12.2 2023-09-26 11:59:47 -07:00
sabaimran
588f35b6e9 Add max prompt size for gpt-3.5-turbo-16k 2023-09-26 10:57:35 -07:00
sabaimran
4e370d7a18 Release Khoj version 0.12.1 2023-09-26 09:24:53 -07:00
sabaimran
3675aa348a Update naming of Khoj in manifest.json for Obsidian 2023-09-26 09:24:36 -07:00
sabaimran
a82d1becc3 Release Khoj version 0.12.0 2023-09-26 09:17:56 -07:00
sabaimran
38f0df3d53 Remove unused icons from electron app folder 2023-09-26 07:56:29 -07:00
sabaimran
5e16074b92 Fix comparison for search type in plugins mode 2023-09-25 10:57:17 -07:00
sabaimran
2dd15e9f63
Resolve issues with GPT4All and fix prompt for yesterday extract questions date filter (#483)
- GPT4All integration had ceased working with 0.1.7 specification. Update to use 1.0.12. At a later date, we should also use first party support for llama v2 via gpt4all
- Update the system prompt for the extract_questions flow to add start and end date to the yesterday date filter example.
- Update all setup data in conftest.py to use new client-server indexing pattern
2023-09-18 14:41:26 -07:00
sabaimran
b225d1188c Fix formatting of gpt.py 2023-09-18 11:09:02 -07:00
Jonny-GM
34b202b868
More lenient date searching (#481)
* Modify DateFilter to use compiled entry key
* Instruct search to include date in query
* Minor prompt change
* Prompt fix
2023-09-18 10:46:00 -07:00
sabaimran
16874e1953 Provide force fallback for regeneration 2023-09-12 16:35:07 -07:00
sabaimran
9f42a1a036 Propagate flags to configure index command 2023-09-11 10:33:44 -07:00
sabaimran
343854752c
Improve docker builds for local hosting (#476)
* Remove GPT4All dependency in pyproject.toml and use multiplatform builds in the dockerization setup in GH actions
* Move configure_search method into indexer
* Add conditional installation for gpt4all
* Add hint to go to localhost:42110 in the docs. Addresses #477
2023-09-08 17:07:26 -07:00
sabaimran
dccfae3853
Remove PySide dependency and deprecate desktop builds (#475)
* Remove PySide, gui option from code
* Remove pyside 6 dependency from code
* Remove workflows which build desktop applications
* Update unit tests and update line in documentation
* Remove additional references to pyinstaller, gui
* Add uninstall steps to normal uninstall instructions
2023-09-07 11:36:27 -07:00
sabaimran
76562f4250
Add front-end Electron application for Khoj local file syncing (#473)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Use state.host and state.port for configuring the URL for the indexer
* Fix parsing of PDF files
* Read markdown files from streamed data and update unit tests
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
* Init: refactor indexer/batch endpoint to support a generic file ingestion format
* Add features to better support indexing from files sent by the desktop client
* Initial commit with Electron application
- Adds electron app
* Add import for pymupdf, remove import for pypdf
* Allow user to configure khoj host URL
* Remove search type configuration from index.html
* Use v1 path for current indexer routes
2023-09-06 12:04:18 -07:00
bholagabbar
205dc90746
Fix notion title bug (#474)
* Update notion_to_jsonl.py
* Fix try-catch block
2023-09-05 10:47:42 -07:00
sabaimran
4854258047
Move to a push-first model for retrieving embeddings from local files (#457)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
2023-08-31 12:55:17 -07:00
sabaimran
92cbfef7ab Skip plaintext file indexing if there's a parsing issue and log the file 2023-08-29 14:34:08 -07:00
sabaimran
74409c2c64 Release Khoj version 0.11.4 2023-08-29 11:44:35 -07:00
sabaimran
1b85958bcc trim chat input start 2023-08-28 19:18:10 -07:00
sabaimran
e592f6eac8 Release Khoj version 0.11.3 2023-08-28 14:46:03 -07:00
sabaimran
7c35da9fc4 Fix bug in /chat endpoint for general and update depdendencies 2023-08-28 14:12:11 -07:00
sabaimran
bc09143856 Release Khoj version 0.11.2 2023-08-28 10:16:13 -07:00
Debanjum Singh Solanky
01b310635e Enable passing search query filters via chat and test it 2023-08-28 09:24:32 -07:00
Debanjum Singh Solanky
794bad8bcb Make date_filter.extract_date_range method always return a list type 2023-08-28 00:55:28 -07:00
Debanjum Singh Solanky
d5a2de6222 Add method to extract filter terms from query to all filters
- Test the get_filter_term method in all 3 word, file, date filters
- Make the existing can_filter method by default in base filter abstract class
2023-08-28 00:55:28 -07:00
Debanjum
150105505b
Add Default chat command. Make Khoj ask clarifying questions (#468)
- Make Khoj ask clarifying questions when answer not in provided context
- Add default conversation command to auto switch b/w general, notes modes
- Show filtered list of commands available with the currently input text
- Use general prompt when no references found and not in Notes mode
- Test general and notes slash commands in offline chat director tests
2023-08-28 00:52:57 -07:00
Debanjum Singh Solanky
eb6cd4f8d0 Use general prompt when no references found and not in Notes mode 2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
edffbad837 Make Khoj ask clarifying questions when answer not in provided context
Previously it would just refuse ask for clarification. This improves
the chat quality score for the existing director tests
2023-08-28 00:47:02 -07:00
Debanjum Singh Solanky
75c1016ec0 Show filtered list of commands available with the currently input text 2023-08-28 00:46:10 -07:00
Debanjum Singh Solanky
74605f6159 Add default conversation command to auto switch b/w general, notes modes
This was the default behavior but behavior regressed when adding slash
commands in PR #463
2023-08-28 00:46:10 -07:00
sabaimran
cbc978ea08 Update help links for notion, github to point to the main docs 2023-08-27 15:02:55 -07:00
sabaimran
b45e1d8c0d
Fix plaintext HTML parsing and rendering (#464)
* Store conversation command options in an Enum
* Move to slash commands instead of using @ to specify general commands
* Calculate conversation command once & pass it as arg to child funcs
* Add /notes command to respond using only knowledge base as context
This prevents the chat model to try respond using it's general world
knowledge only without any references pulled from the indexed
knowledge base
* Test general and notes slash commands in openai chat director tests
---------

Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2023-08-27 11:24:30 -07:00
Debanjum
7919787fb7
Use Slash Commands and Add Notes Slash Command (#463)
* Store conversation command options in an Enum

* Move to slash commands instead of using @ to specify general commands

* Calculate conversation command once & pass it as arg to child funcs

* Add /notes command to respond using only knowledge base as context

This prevents the chat model to try respond using it's general world
knowledge only without any references pulled from the indexed
knowledge base

* Test general and notes slash commands in openai chat director tests

* Update gpt4all tests to use md configuration

* Add a /help tooltip

* Add dynamic support for describing slash commands. Remove default and treat notes as the default type

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2023-08-26 18:11:18 -07:00
sabaimran
e64357698d
Skip indexing single bad markdown, plaintext file (#460) 2023-08-23 15:34:56 -07:00
sabaimran
84bd579077 Format the chat outputted message with code, bolding, or italics. Add a copy button for code. Closes #445. 2023-08-19 20:02:57 -07:00
sabaimran
f9e09ba490 Do not try downloading model from GPT4All if the user is not connected to the internet 2023-08-19 19:09:21 -07:00
Debanjum Singh Solanky
3ff4e19dd2 Release Khoj version 0.11.1 2023-08-16 22:53:29 -07:00
sabaimran
4fb8c2c5e1 Pass a SIGTERM to tell the uvicorn server to exit and gracefully kill the thread 2023-08-16 21:27:05 -07:00
sabaimran
4e03dfea43
Attach the parent to the server thread, allowing the kill signal to trigger a graceful exit (#446) 2023-08-16 19:36:10 -07:00
Debanjum Singh Solanky
26c3977fb9 Remove info hint to reindex khoj on unexpected search results
The index corruption was issue resolved a while ago in #325 and
hasn't cropped up again
2023-08-16 00:58:59 -07:00
sabaimran
def909a913
Revert "Open Web interface within Desktop app in GUI mode" (#444) 2023-08-15 23:26:28 -07:00
sabaimran
6562ec6531 Release Khoj version 0.11.0 2023-08-14 19:25:03 -07:00
sabaimran
0ea901c7c1
Allow indexing to continue even if there's an issue parsing a particular org file (#430)
* Allow indexing to continue even if there's an issue parsing a particular org file
* Use approximation in pytorch comparison in text_search UT, skip additional file parser errors for org files
* Change error of expected failure
2023-08-14 07:56:33 -07:00
sabaimran
7b907add77
Add support for indexing plaintext files (#420)
* Add support for indexing plaintext files
- Adds backend support for parsing plaintext files generically (.html, .txt, .xml, .csv, .md)
- Add equivalent frontend views for setting up plaintext file indexing
- Update config, rawconfig, default config, search API, setup endpoints
* Add a nifty plaintext file icon to configure plaintext files in the Web UI
* Use generic glob path for plaintext files. Skip indexing files that aren't in whitelist
2023-08-09 15:44:40 -07:00
Ellen7ions
26bddcb65c
Add support for starting a new line with shift-enter (#412)
* Add support for starting a new line with shift-enter
* Remove useless comments. Set font-size: medium.
* Update src/khoj/interface/web/chat.html
Update the styling to have the padding, margin and line-height like before.
Co-authored-by: Debanjum <debanjum@gmail.com>
* Update src/khoj/interface/web/chat.html
Make the chat-body scroll to the bottom after resizing
Co-authored-by: Debanjum <debanjum@gmail.com>
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
2023-08-07 19:49:07 -07:00
Debanjum Singh Solanky
97609e4995 Use 500px png of khoj logo instead svg for much smaller asset size
The khoj logo svg was 1.3Mb. The 500px png of it is 38Kb.
Given all usage of khoj-logo are below 230px this should work fine
2023-08-07 18:27:11 -07:00
Debanjum
14a816d173
Open Web interface within Desktop app in GUI mode (#429)
Previously the GUI mode (with khoj --gui or using the desktop app) would open the web interface in the users default web browser. Now the web interface is just rendered within the app itself using PyQT's Webview. This gives it a more proper app like feel
2023-08-07 17:48:30 -07:00
Debanjum Singh Solanky
378b96ec1b Open the khoj app window maximized on startup 2023-08-07 15:39:05 -07:00
Debanjum Singh Solanky
ea734ba1c8 Open app in native view on starting it in GUI mode instead of on web browser
- Opens settings page on first run and landing page after in GUI mode
  Previously was only opening the GUI on linux after first run as it
  doesn't have a system tray
- Both the views are from the web interface but are rendered within
  the app instead of the browser
2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky
9c494705a8 Open the search, chat or config view in app from the system tray menu 2023-08-07 13:41:42 -07:00
Debanjum Singh Solanky
cc36b87345 Render the web interface directly within the desktop app as a webview 2023-08-07 13:41:12 -07:00
Jason Qin
3ef1b7073d
Update obsidian/manifest.json
Closes #426
2023-08-07 10:41:39 -07:00
sabaimran
738cf650b3
Explicitly set Khoj to use the default locale of the user (#425)
- Explicitly set locale using `locale.setLocale(locale.LC_ALL, '')` for localization. Relevant for datetime libraries. See [Python 3 documentation](https://docs.python.org/3/library/locale.html#locale.setlocale).
2023-08-07 09:23:24 -07:00
Muftawo
c8ef619090
fixed reference link to landing page (#417)
* Fixed zsh error no matches found
* Fixed home page 404 error
2023-08-04 10:38:14 -07:00
sabaimran
78012b8111 Avoid null ref issue when setting model state for web UI. Closes #410 2023-08-03 00:39:06 -07:00
sabaimran
0baed742e4
Add checksums to verify the correct model is downloaded as expected (#405)
* Add checksums to verify the correct model is downloaded as expected
- This should help debug issues related to corrupted model download
- If download fails, let the application continue
* If the model is not download as expected, add some indicators in the settings UI
* Add exc_info to error log if/when download fails for llamav2 model
* Simplify checksum checking logic, update key name in model state for web client
2023-08-02 23:26:52 -07:00
Debanjum Singh Solanky
e6e3acdbe4 Release Khoj version 0.10.1 2023-08-01 23:55:13 -07:00
Debanjum Singh Solanky
7c1d70aa17 Bump GPT4All response generation batch size to 512 from 256
A batch size of 512 performs ~20% better on a XPS with no GPU and 16Gb
RAM. Seems worth the tradeoff for now
2023-08-01 23:34:02 -07:00
Debanjum
16c6bfce8e
Improve Quality and Reliability of Offline Chat (#393)
# Incoming
## Major
### Fix Prompt Size Exceeded Issue
- Fix issues related to prompt size, Closes #386. Use the correct tokenizer to calculate whether the input needs to be truncated or not.

### Improve Llama 2 Model Download
- Use the correct download link for LlamaV2 -- should have been using the small model, but was using the medium
- Add better downloading logic to retry download if it failed, Closes #379 

### Fix Segmentation Fault due to Race
- Add a lock around generating chat responses from the offline model to avoid segmentation faults. Closes #367.
- Add a loading symbol to the web chat UI when the model is thinking. Closes #392

### Improve Chat Response Latency
- Improve performance of offline chat by increasing batch size (via `n_batch`) to automatically engage more cores/GPU, using smaller model and fixing prompt vs response token generation numbers. Closes #363

### Fix Fake Dialogue Continuation
- Fix formatting of user query with offline chat, this was contributing to #398
- Stop Llama 2 from Creating Fake Dialogue Continuations. Closes #398

## Minor
- Improve default message for Chat window on web when it's not configured. Include hint to use offline chat.
- Add null check in `perform_chat_checks` method
- Add offline chat director unit tests

## Performance Analysis (Time to First Token)
|  | v0.10.0 | this branch |
|-|-|-|
| Query 1 | 52s | 28s |
| Query 2 | 33s| 42s |
| Query 3 | 67s| 38s|
2023-08-01 22:07:27 -07:00
Debanjum Singh Solanky
44292afff2 Put offline model response generation behind the chat lock as well
Not just the chat response streaming
2023-08-01 21:53:52 -07:00
Debanjum Singh Solanky
1812473d27 Extract new schema version for each migration script into a variable
This should ease readability, indicates which version this
migration script will update the schema to once applied
2023-08-01 21:41:08 -07:00
Debanjum Singh Solanky
b9937549aa Simplify migration scripts management. Make them use static version
- Only make them update config when it's run conditions are satisfies
- Use static schema version to simplify reasoning about run conditions
2023-08-01 21:28:20 -07:00
Debanjum Singh Solanky
185a1fbed7 Remove old chat setup timer. It is mislabelled, irrelevant since streaming 2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
c2b7a14ed5 Fix context, response size for Llama 2 to stay within max token limits
Create regression text to ensure it does not throw the prompt size
exceeded context window error
2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
6e4050fa81 Make Llama 2 stop generating response on hitting specified stop words
It would previously some times start generating fake dialogue with
it's internal prompt patterns of <s>[INST] in responses.

This is a jarring experience. Stop generation response when hit <s>

Resolves #398
2023-08-01 20:52:00 -07:00
Debanjum Singh Solanky
aa6846395d Fix offline model migration script to run for version < 0.10.1
- Use same batch_size in extract question actor as the chat actor
- Log final location the chat model is to be stored in, instead of
  it's temp filename while it is being downloaded
2023-08-01 20:51:53 -07:00
Ikko Eltociear Ashimine
49abb9df9c
Fix typo in orgnode.py (#397)
Fix spelling of Ouput in org parser property drawer comment to Output.
2023-08-01 19:54:57 -07:00
sabaimran
f409e16137 Update some of the extract question prompts for llamav2 2023-08-01 12:23:36 -07:00
sabaimran
b11b00a9ff Add log line for time to first response 2023-08-01 10:57:38 -07:00
sabaimran
778df6be71 Add a logline when the offline model migration script runs 2023-08-01 09:27:42 -07:00
sabaimran
3a5d93d673 Add migration script for getting the new offline model 2023-08-01 09:25:05 -07:00
sabaimran
90efc2ea7a Update comments and add explanations 2023-08-01 09:24:03 -07:00
sabaimran
f7e03f6d63 Switch spinner snake case -> camel case 2023-08-01 08:52:25 -07:00
sabaimran
1c52a6993f add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources
- This also solves #367
2023-08-01 00:23:17 -07:00
sabaimran
6c3074061b Disable the input bar when chat response is in flight 2023-08-01 00:21:39 -07:00
sabaimran
c14cbe926a Add a loading symbol to web chat. Closes #392 2023-07-31 23:35:48 -07:00
sabaimran
8054bdc896 Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU) 2023-07-31 23:25:08 -07:00
sabaimran
e55e9a7b67 Fix unit tests and truncation logic 2023-07-31 21:37:59 -07:00
sabaimran
2335f11b00 Add better error handling for download processes incase of failure 2023-07-31 21:07:38 -07:00
sabaimran
209975e065 Resolve merge conflicts: let Khoj fail if the model tokenizer is not found 2023-07-31 19:12:26 -07:00
sabaimran
2d6c3cd4fa Misc. quality improvements for Llama V2
- Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S
- Use a proper Llama Tokenizer for counting tokens for truncation with Llama
- Add additional null checks when running
2023-07-31 19:11:20 -07:00
sabaimran
ca195097d7 Update chat hint message at first run 2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky
ded606c7cb Fix format of user query during general conversation with Llama 2 2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky
48e5ac0169 Do not drop system message when truncating context to max prompt size
Previously the system message was getting dropped when the context
size with chat history would be more than the max prompt size
supported by the cat model

Now only the previous chat messages are dropped or the current
message is truncated but the system message is kept to provide
guidance to the chat model
2023-07-31 17:21:14 -07:00
sabaimran
88ef86ad5c
Fix typing issues for mypy (#372) 2023-07-30 19:27:48 -07:00
sabaimran
ca2c942b65 Add typing to compiled_references and inferred_queries 2023-07-30 19:10:30 -07:00
sabaimran
3646fd1449 Add a warning to indicate that Khoj is not configured to work with personal data sources 2023-07-30 18:52:10 -07:00
sabaimran
996832dc72 Allow user to chat even if content types aren't configured - use empty references 2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky
53810a0ff7 Create khoj config dir if non-existant, before writing to khoj env file 2023-07-30 01:35:36 -07:00
sabaimran
f65d157244 Release Khoj version 0.10.0 2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky
f76af869f1 Do not log the gpt4all chat response stream in khoj backend
Stream floods stdout and does not provide useful info to user
2023-07-28 19:14:04 -07:00
sabaimran
5ccb01343e
Add Offline chat to Obsidian (#359)
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
2023-07-28 18:47:56 -07:00
Debanjum
b3c1507708
Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs
- Configure using Offline Chat from Emacs: 
- Enable, Disable Offline Chat from Emacs

- Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup
- Benefits: Offline chat models are better for privacy but not great at answering questions
2023-07-28 18:06:58 -07:00
sabaimran
9f78db0579
Let Offline chat override OpenAI API settings (#362)
* Let Offline chat override OpenAI API settings
* Download the offline model whenever offline chat is enabled
* Add progressbar for download for llamav2 model to track progress
* Change ordering of n due to switch of default processor
* Flip ordering of offline/openai checks when extracting questions from query
2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky
ebfbef1f68 Configure using offline chat from Emacs
Closes #358
2023-07-28 16:07:33 -07:00
sabaimran
29081f4429 Adjust parameters for offline chat 2023-07-27 22:22:09 -07:00
sabaimran
124d97c26d
Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352)
* Working example with LlamaV2 running locally on my machine

- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format

* Add appropriate prompts for extracting questions based on a query based on llama format

* Rename Falcon to Llama and make some improvements to the extract_questions flow

* Do further tuning to extract question prompts and unit tests

* Disable extracting questions dynamically from Llama, as results are still unreliable
2023-07-27 20:51:20 -07:00
Debanjum Singh Solanky
715d56d4f0 Use new schema to update khoj.yml config from khoj.el 2023-07-26 17:34:16 -07:00