Commit graph

1241 commits

Author SHA1 Message Date
Debanjum Singh Solanky
510bb7e684 Use typing union in text_search for python 3.8 compatible type hinting 2023-06-27 15:59:50 -07:00
Debanjum Singh Solanky
1b11d5723d Extract search request URL builder into js function in web interface 2023-06-27 15:50:41 -07:00
Debanjum Singh Solanky
09f739b8cc Null check config, log warning instead of error when configuring search 2023-06-27 15:48:48 -07:00
Debanjum Singh Solanky
5c4eb950d5 Search across all content types via khoj.el on Emacs
If no content-type selected in transient menu option, khoj.el queries
khoj server without content-type parameter (t) set.

This results in search across all enabled asymmetric search text
content types
2023-06-20 23:39:56 -07:00
Debanjum Singh Solanky
2cd3e799d3 Improve null and type checks 2023-06-20 23:30:59 -07:00
Debanjum Singh Solanky
d5fb4196de Update web interface to allow querying all content types at once 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
5c7c8d1f46 Use async/await to fix parallelization of search across content types 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
1192e49307 Pass default value matching argument types expected by text_search methods 2023-06-20 22:21:50 -07:00
Debanjum Singh Solanky
0144e610d6 Only search across content types that work with asymmetric search 2023-06-20 22:21:46 -07:00
Debanjum Singh Solanky
6d94d6e75a Encode the asymmetric, symmetric search queries in parallel for speed
Use timer to measure time to encode queries and total search time
2023-06-20 01:18:17 -07:00
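A minimal sketch of the idea in this commit: encode one query with several encoders in parallel threads and record how long it took. The names `timer`, `encode_in_parallel`, and the `encoders` mapping are hypothetical and not taken from the repository; the encoders are assumed to be plain callables.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


@contextmanager
def timer(label, timings):
    """Record wall-clock seconds spent inside the block under timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def encode_in_parallel(query, encoders, timings):
    """Run every encoder on the same query in its own thread.

    encoders: dict mapping a search type name to a callable(query) -> embedding.
    Both names are assumptions made for this illustration.
    """
    with timer("encode", timings):
        with ThreadPoolExecutor() as executor:
            futures = {name: executor.submit(encode, query) for name, encode in encoders.items()}
            return {name: future.result() for name, future in futures.items()}
```

Timing the whole block rather than each encoder keeps the measurement close to what the caller experiences, since the threads overlap.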
Debanjum Singh Solanky
db07362ca3 Encode user query as same across search types to speed up query time
- Add new filter abstract method to remove filter terms from query
- Use the filter method to remove filter terms, encode this defiltered
  query and pass it to the query methods of each search type

TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
2023-06-19 23:29:54 -07:00
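The abstract method described in this commit could look roughly like the sketch below. The class names, the method name `defilter`, and the `+"word"` / `-"word"` filter syntax are assumptions made for illustration, not confirmed by the commit message.

```python
import re
from abc import ABC, abstractmethod


class BaseFilter(ABC):
    @abstractmethod
    def defilter(self, query: str) -> str:
        """Return the query with this filter's terms removed."""


class WordFilter(BaseFilter):
    # Assumed syntax: +"word" requires a word, -"word" excludes it
    pattern = re.compile(r'[+-]"[^"]*"')

    def defilter(self, query: str) -> str:
        # Drop the filter terms, then collapse the leftover whitespace
        return " ".join(self.pattern.sub(" ", query).split())


def defilter_query(query, filters):
    """Strip the terms of every filter so the query is encoded only once."""
    for query_filter in filters:
        query = query_filter.defilter(query)
    return query
```

The defiltered string can then be encoded a single time and the resulting embedding shared by every search type.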
Debanjum Singh Solanky
285d17af2a Search in parallel across all enabled content types requested via API
- Update API to return content from all enabled content types when type
  is not set to specific type in HTTP request param
- To do this efficiently run the search queries in parallel threads
2023-06-19 23:29:06 -07:00
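A rough sketch of the dispatch described in this commit: search a single content type when one is requested, otherwise search every enabled type in parallel threads. The function name `search` and the `searchers` mapping are hypothetical stand-ins, not the server's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def search(
    query: str,
    searchers: Dict[str, Callable[[str], List[str]]],
    content_type: Optional[str] = None,
) -> List[str]:
    """searchers maps an enabled content type to a callable(query) -> results."""
    if content_type is not None:
        selected = [searchers[content_type]]
    else:
        # No type requested: query all enabled content types
        selected = list(searchers.values())

    with ThreadPoolExecutor() as executor:
        # executor.map returns results in the same order as `selected`
        grouped = list(executor.map(lambda run: run(query), selected))

    # Flatten the per-type result lists into one response
    return [hit for hits in grouped for hit in hits]
```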
Debanjum Singh Solanky
79d325fbb6 Fix triggering @general queries in Khoj Chat 2023-06-19 23:05:33 -07:00
Debanjum Singh Solanky
e97a20d70c Set conversation type if query param set, else return chat history
Only initialize variables if query is not empty, to avoid unnecessary
compute, variable null checks etc.

Fixes #230
2023-06-19 19:59:16 -07:00
sabaimran
6224dce49d
Merge pull request #228 from debanjum/features/pretty-config-page
Update the config page to be more usable
2023-06-19 18:11:35 -07:00
sabaimran
4722a2c16d Add Github configuration page and success notifications 2023-06-18 10:06:45 -07:00
sabaimran
668135c763 Merge branch 'master' of github.com:debanjum/khoj into features/pretty-config-page 2023-06-18 08:35:09 -07:00
sabaimran
81183a1fe1 Address misc PR comments and update logo in all clients
- Rename the new logo to reflect accuracy on size (e.g., 128x128)
- Update the icns file for Mac
- Update nomenclature in settings pages
2023-06-18 08:34:58 -07:00
Debanjum Singh Solanky
a44cde2865 Show hint to re-index vault if wonky results in Obsidian search modal
Remove spurious indentation in Obsidian styles.css

Resolves #207
2023-06-18 04:53:51 -07:00
Debanjum Singh Solanky
595cc5b0f5 Use printer icon for PDF logs. Only split lines if file at web link in web interface 2023-06-18 02:26:03 -07:00
Debanjum
e06be395f9
Use Github REST API and Index Commit Messages off Github Repository
- Migrate to Github REST API instead of Llama Hub to index Markdown Docs in Github Repository
- Index Commit Messages from Github Repository as well
2023-06-18 14:51:32 +05:30
Debanjum Singh Solanky
e31a540a5e Get all md files recursively in repository by passing recursive param
Previously the `get_markdown_files' method was only getting files at
root of the repository

Fix, improve logger messages in github to jsonl processor
2023-06-18 01:47:15 -07:00
Debanjum Singh Solanky
6fdac24416 Set page size to 100 to reduce requests required to Github API to 1/3
- Default is 30. So number of paginated requests required to get all
  items (commits, files) will reduce by 67%

- No need to increase page size for the get tree Github API request from
  `get_markdown_files'

  Get tree Github API doesn't support pagination and returns up to 100K items
  in response. This should be way more than enough for our current
  use-cases
2023-06-18 01:44:36 -07:00
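The saving from a larger page size is simple arithmetic. The sketch below works it through, assuming hypothetical helper names `requests_needed` and `page_params`; `per_page` and `page` are the Github REST API query parameters for pagination.

```python
import math


def requests_needed(total_items: int, per_page: int) -> int:
    """Number of paginated requests required to fetch total_items."""
    return math.ceil(total_items / per_page)


def page_params(total_items: int, per_page: int = 100):
    """Query parameters for each paginated request (hypothetical helper)."""
    pages = requests_needed(total_items, per_page)
    return [{"per_page": per_page, "page": page} for page in range(1, pages + 1)]
```

For a repository with 300 commits, the default page size of 30 needs 10 requests, while a page size of 100 needs 3.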
Debanjum Singh Solanky
87975e589a Fix passing auth token to Github API to increase rate limits by x85
- Previously wasn't prefixing "token" to PAT token in Auth header
  This resulted in the request being considered unauthenticated

- Unauthenticated requests to Github API are limited to 60 requests/hour
  Authenticated requests to Github API are allowed 5000 requests/hour
2023-06-18 01:19:26 -07:00
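The fix described in this commit comes down to the format of the `Authorization` header. A minimal sketch, assuming a hypothetical helper named `build_github_headers` that is not taken from the repository:

```python
def build_github_headers(pat_token: str) -> dict:
    """Build request headers for the Github REST API (hypothetical helper)."""
    headers = {"Accept": "application/vnd.github+json"}
    if pat_token:
        # The value needs the "token" prefix; per the commit, a bare PAT
        # left the request treated as unauthenticated.
        headers["Authorization"] = f"token {pat_token}"
    return headers
```

With an empty token the helper omits the header entirely, so requests fall back to the unauthenticated limit of 60 requests/hour instead of failing.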
Debanjum Singh Solanky
9c70af960c Extract logic to get file content from Github into a separate method 2023-06-18 01:19:13 -07:00
Debanjum Singh Solanky
10d4c38ce9 Extract Wait for rate limit reset logic into a function for reuse 2023-06-18 01:06:46 -07:00
sabaimran
aad7f825e0 Remove music configuration 2023-06-17 21:23:56 -07:00
sabaimran
5f97afbfac Ignore type checks from mypy in subindexed fields 2023-06-17 16:53:36 -07:00
sabaimran
c2d46de8bc Add endpoint for regenerating directly from the config page and add music content-type 2023-06-17 15:47:33 -07:00
sabaimran
ded3100caf Update the configuration page to make config management easier
- Add a central configuration management page to make management of config details easier
- Add relevant api endpoints both for client and server to update/request data as necessary
- Attempt to update the favicon
2023-06-17 15:21:28 -07:00
Debanjum Singh Solanky
3f24e53b6e Render URL as link in web interface if file param of result is a web link 2023-06-17 04:26:40 -07:00
Debanjum Singh Solanky
63ec84ad78 Store Github URL of Markdown files on Github in file jsonl param 2023-06-17 04:23:01 -07:00
Debanjum Singh Solanky
0c1c7583b5 Handle pagination, API rate limits. Get all commits from Github repo 2023-06-17 04:21:39 -07:00
Debanjum Singh Solanky
31d17d0b22 Index commit messages from repository with the github plugin 2023-06-17 02:59:54 -07:00
Debanjum Singh Solanky
c29c141a7e Use Github Rest API to index Markdown files in Github Repository
The Llama_Hub Github plugin is fairly limited.

The Github Rest API is well supported and can easily be extended to
index commit messages, issues, discussions, PRs etc.
2023-06-17 02:16:13 -07:00
Debanjum
9f00a366ab
Add a Github plugin to index content from a Github repository
- Use the Github plugin on LlamaHub to read in markdown files from specified Github repository for indexing
- Update the desktop GUI application to take in the required parameters to read from Github
- Requires a classic PAT token for Github access
2023-06-17 12:28:47 +05:30
Saba
ac96f43b1b Remove try-catch specific to Github plugin; consolidate GUI logic 2023-06-16 23:46:25 -07:00
Saba
07ade2262a Set default value of pat_token in conftest.py to be empty string 2023-06-13 17:03:03 -07:00
Saba
751edfefe5 Add separate unit test for github. Will only run if a PAT token is set 2023-06-13 16:55:58 -07:00
Saba
3a61919344 Fix failing unit tests by hard-coding model presence of expected search types 2023-06-13 16:32:47 -07:00
Saba
019d3732de Rename orgmode_search to org_search 2023-06-13 16:06:54 -07:00
Saba
08d79f5ba4 Unify types used in Github and other text-based configs. Fix typing issues 2023-06-13 15:52:36 -07:00
Saba
a6cd96a6a9 Add a Github plugin which can be used to read from a Github repository 2023-06-13 14:40:06 -07:00
Debanjum
c68cde4803
Log clients calling API endpoints on Khoj server
- Make API endpoints on Khoj server accept `client` as request parameter
  - Khoj API endpoints: /chat, /search, /update
- Make Khoj clients set `client` request param when calling the API endpoints on the Khoj server
  - Khoj clients: Emacs, Obsidian and Web
- Also log khoj server_version running to telemetry server
2023-06-09 18:36:49 +05:30
sabaimran
59fa48036f
Merge pull request #224 from debanjum/fix/message-exceeds-prompt-size
Pass truncated message as string in ChatMessage when exceeding max prompt size
2023-06-08 17:32:53 -07:00
Debanjum Singh Solanky
139a3ba060 Update server to log new server version field to telemetry db 2023-06-08 14:14:21 +05:30
Saba
c5666e0404 Move factory dependencies to optional settings 2023-06-06 23:26:24 -07:00
Saba
5d5ebcbf7c Rename truncate messages method and update unit tests to simplify assertion logic 2023-06-06 23:25:43 -07:00
Saba
7119ed0849 Run pre-commit script 2023-06-05 19:29:23 -07:00
Saba
948ba6ddca Remove unused logger 2023-06-05 19:01:03 -07:00