
92 commits

Author SHA1 Message Date
Debanjum Singh Solanky
8ca39a436c Use llama.cpp for offline chat models
- Benefits of moving to llama-cpp-python from gpt4all:
  - Support for all GGUF format chat models
  - Support for AMD, Nvidia, Mac and Vulkan GPU machines (instead of just Vulkan and Mac)
  - Support for models with more capabilities, like tools, schema
    enforcement, speculative decoding, image gen etc.
- Upgrade default chat model, prompt size and tokenizer for the newly supported
  chat models

- Load offline chat model when present on disk without requiring internet
  - Load model onto GPU if not disabled and device has GPU
  - Load model onto CPU if loading model onto GPU fails
  - Create a helper function to check for and load the model from disk when
    a file matching the model glob is present on disk

    `Llama.from_pretrained' needs internet access to get repo info from
    HuggingFace. This isn't required if the model is already downloaded.

    Didn't find any existing HF or llama.cpp method that looked for the model
    glob on disk without internet access.
2024-03-26 22:33:01 +05:30
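The offline loading flow described above (look for the model on disk first, try the GPU, fall back to the CPU) can be sketched as follows. This is a minimal sketch, not Khoj's code: `find_model_on_disk` and `load_model` are hypothetical helpers, and the on-disk directory layout is an assumption. The `loader` argument stands in for a constructor such as llama-cpp-python's `llama_cpp.Llama`, which accepts `model_path` and `n_gpu_layers` (`-1` offloads all layers).

```python
import glob
import os
from typing import Callable, Optional


def find_model_on_disk(model_dir: str, filename_glob: str) -> Optional[str]:
    """Return the first file under model_dir matching filename_glob, or None.

    Purely local lookup, so no internet access is needed (hypothetical helper).
    """
    pattern = os.path.join(model_dir, "**", filename_glob)
    matches = sorted(glob.glob(pattern, recursive=True))
    return matches[0] if matches else None


def load_model(path: str, loader: Callable[..., object], use_gpu: bool = True) -> object:
    """Load onto the GPU when allowed; if that fails, retry on the CPU."""
    if use_gpu:
        try:
            return loader(model_path=path, n_gpu_layers=-1)  # offload every layer
        except Exception:
            pass  # e.g. no usable GPU on this machine: fall through to CPU
    return loader(model_path=path, n_gpu_layers=0)
```

Only when `find_model_on_disk` returns `None` would a caller need a networked path like `Llama.from_pretrained`.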
Debanjum Singh Solanky
8cdfaf41ec Update project URLs to show on pypi project page 2024-03-15 04:03:39 +05:30
Debanjum
3abe7ccb26
Improve Online Search Speed and Context (#670)
### Major
- Read web pages in parallel to improve chat response time
- Read web pages directly when Olostep proxy is not set up
- Include search results & web page content in online context for chat response

### Minor
- Simplify, modularize and add type hints to online search functions
2024-03-11 22:16:30 +05:30
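Reading pages in parallel, with a direct-read fallback when no proxy is configured, might look roughly like this with `asyncio.gather`. It is an illustrative sketch: `read_webpages` and the two reader callables are hypothetical stand-ins for the real Olostep and direct-fetch code, and dropping failed pages is an assumed policy.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Reader = Callable[[str], Awaitable[str]]


async def read_webpages(
    urls: List[str], read_directly: Reader, read_via_proxy: Optional[Reader] = None
) -> Dict[str, str]:
    """Fetch all pages concurrently; use the proxy only when one is configured."""
    reader = read_via_proxy if read_via_proxy is not None else read_directly
    # Total latency is roughly that of the slowest page, not the sum of all pages
    pages = await asyncio.gather(*(reader(url) for url in urls), return_exceptions=True)
    # Skip pages that failed instead of failing the whole chat response
    return {url: page for url, page in zip(urls, pages) if not isinstance(page, BaseException)}
```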
Debanjum Singh Solanky
88f096977b Read webpages directly when Olostep proxy not set up
This is useful for self-hosted, single-user, low-traffic setups
where a proxy service is not required
2024-03-11 18:41:02 +05:30
Debanjum Singh Solanky
1105d8814f Use cross-encoder to rerank search results by default on GPU machines
The latest sentence-transformers package runs the cross-encoder on the GPU.
This makes it fast enough to enable reranking by default on machines with a GPU.

Enabling search reranking by default lets (at least) users with GPUs
side-step learning the UI affordance to rerank results
(i.e., hitting Cmd/Ctrl-Enter or ENTER).
2024-03-10 14:29:21 +05:30
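The "rerank by default on GPU machines" rule can be sketched as below. This assumes a hypothetical `rerank` helper. `score_pairs` stands in for a cross-encoder scorer such as sentence-transformers' `CrossEncoder(...).predict(pairs)`, and `has_gpu` could come from `torch.cuda.is_available()`. An explicit user choice still overrides the default.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    hits: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], List[float]],
    rerank_requested: Optional[bool] = None,
    has_gpu: bool = False,
) -> List[str]:
    """Reorder hits by cross-encoder relevance score, highest first."""
    # No explicit choice from the user: rerank only if a GPU makes it cheap
    should_rerank = has_gpu if rerank_requested is None else rerank_requested
    if not should_rerank:
        return list(hits)
    scores = score_pairs([(query, hit) for hit in hits])
    ranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)
    return [hit for hit, _ in ranked]
```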
sabaimran
81beb7940c
Upload generated images to S3, if AWS credentials and bucket are available (#667)
* Upload generated images to S3, if AWS credentials and bucket are available.
- In clients, render the images via the URL if it's returned with a text-to-image2 intent type
* Make the loading screen more intuitive, less jerky and update the programmatic copy button
* Update the loading icon when waiting for a chat response
2024-03-08 10:54:13 +05:30
Debanjum Singh Solanky
4696577636 Upgrade python dependencies 2024-02-16 17:41:09 +05:30
Debanjum Singh Solanky
e21a8530f3 Move used python packages for test into dev dependency group
The test dependency group was being used independently
2024-02-16 17:41:09 +05:30
Debanjum Singh Solanky
cf4a524988 Move production dependencies to prod python packages group
This reduces the khoj dependencies self-hosting users need to install

- Move auth production dependencies to prod python packages group
  - Only enable authentication API router if not in anonymous mode
  - Improve the error message listing the requirements to enable
    authentication when not in anonymous mode
2024-02-16 17:41:08 +05:30
sabaimran
208ccc83ec Fix version of gpt4all to 2.1.0 as it's not backwards compatible 2024-02-10 09:32:04 +05:30
Debanjum
d1bfb245df
Improve Khoj Chat and Settings UI (#630)
* Fix license in pyproject.toml. Remove unused utils.state import

* Use single debug mode check function. Disable telemetry in debug mode

- Use a single check to determine if khoj is running in debug mode.
  Previously there were 3 different variants of the check

- Do not log telemetry if KHOJ_DEBUG is set to true. Previously telemetry
  wasn't logged even if KHOJ_DEBUG was set to false

* Respect line breaks in user, khoj chat messages to improve formatting

* Disable Whatsapp config section on web client if Twilio not configured

Simplify Whatsapp configuration status checking js by standardizing
external input to lower case

* Disable Phone API when Twilio is not set up and rate limit calls to it

- Move phone api to separate router and only enable it if Twilio enabled
- Add rate-limiting to OTP and verification calls

* Add slugs for phone rate limiting

---------

Co-authored-by: sabaimran <narmiabas@gmail.com>
2024-01-29 18:03:43 +05:30
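A single debug-mode check that only treats an explicit `true` as debug (so `KHOJ_DEBUG=false` no longer disables telemetry) could be as small as this. Both function names are hypothetical, and accepting only the string `true` (case-insensitive) is an assumption.

```python
import os


def in_debug_mode() -> bool:
    """One shared check: debug only when KHOJ_DEBUG is explicitly "true"."""
    return os.getenv("KHOJ_DEBUG", "false").strip().lower() == "true"


def should_log_telemetry(telemetry_disabled: bool = False) -> bool:
    # A merely present KHOJ_DEBUG (e.g. "false") no longer suppresses telemetry
    return not telemetry_disabled and not in_debug_mode()
```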
sabaimran
b782683e60
Scrape results from Serper results using Olostep (#627)
* Initialize changes to incorporate web scraping logic after getting SERP results
- Do some minor refactors to pass a system prompt to the openai model when making a query
- Integrate Olostep in order to perform the web scraping
* Fix truncation error with new line, fix typing in olostep code
* Use the authorization header for the token
* Add a small hint/indicator for how to use Khoj's other modalities in the welcome prompt
* Add more detailed error message if Olostep query fails
* Add unit tests which invoke Olostep in chat director
* Add test for olostep tool
2024-01-29 14:16:50 +05:30
sabaimran
679db51453
Add support for phone number authentication with Khoj (part 2) (#621)
* Allow users to configure phone numbers with the Khoj server

* Integration of API endpoint for updating phone number

* Add phone number association and OTP via Twilio for users connecting to WhatsApp

- When verified, store the result as such in the KhojUser object

* Add a Whatsapp.svg for configuring phone number

* Change setup hint depending on whether the user has a number already connected or not

* Add an integrity check for the intl tel js dependency

* Customize the UI based on whether the user has verified their phone number

- Update API routes to make nomenclature for phone addition and verification more straightforward (just /config/phone, etc.).
- If user has not verified, prompt them for another verification code (if verification is enabled) in the configuration page

* Use the verified filter only if the user is linked to an account with an email

* Add some basic documentation for using the WhatsApp client with Khoj

* Point help text to the docs, rather than landing page info

* Update messages on various callbacks and add link to docs page to learn more about the integration
2024-01-22 18:14:58 -08:00
sabaimran
039ed78253
Add support for a first-party client app to call into Khoj (Part 1) (#601)
* Add support for a first-party client app
- Based on a client id and client secret, allow a first-party app to call into the Khoj backend with a phone number identifier
- Add migration to add phone numbers to the KhojUser object
* Add plus in front of country code when registering a phone number.
- Decrease free tier limit to 5 (from 10)
- Return a response object when handling Stripe webhooks
* Fix telemetry method which references authenticated user's client app
* Add better error handling for null phone numbers, simplify logic of authenticating user
* Pull the client_secret in the API call from the authorization header
* Add a migration merge to resolve phone number and other changes
2024-01-18 19:24:14 +05:30
Debanjum
4d30f7d1d9
Short-circuit API rate limiter for unauthenticated users (#607)
### Major
- Short-circuit API rate limiter for unauthenticated users
  Calls by unauthenticated users were failing at the API rate limiter as it
  failed to access the user info object. This is a bug.

  API rate limiter should short-circuit for unauthenticated users so a
  proper Forbidden response can be returned by the API
  
  Add regression test to verify that unauthenticated users get 403
  response when calling the /chat API endpoint
  
### Minor
- Remove trailing slash to normalize khoj url in obsidian plugin settings
- Move used /api/config API controllers into separate module
- Delete unused /api/beta API endpoint
- Fix error message rendering in khoj.el, khoj obsidian chat
- Handle deprecation warnings for subscribe renew date, langchain, pydantic & logger.warn
2024-01-17 00:59:52 +05:30
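The short-circuit fix can be illustrated with a small sliding-window limiter that returns early when there is no user, leaving the authentication check to produce the 403. Everything here is hypothetical (`UserRateLimiter`, `chat_endpoint`, and the two exception classes standing in for HTTP 403/429 responses); it is a sketch of the idea, not Khoj's limiter.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 response."""


class UserRateLimiter:
    def __init__(self, requests: int, window_seconds: float):
        self.requests = requests
        self.window = window_seconds
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, user_id: Optional[str], now: Optional[float] = None) -> None:
        # Short-circuit: no user info to rate limit on, defer to the auth check
        if user_id is None:
            return
        now = time.monotonic() if now is None else now
        calls = self.calls[user_id]
        # Drop calls that have slid out of the time window
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.requests:
            raise TooManyRequests(user_id)
        calls.append(now)


def chat_endpoint(user_id: Optional[str], limiter: UserRateLimiter) -> str:
    limiter.check(user_id)  # must not crash for unauthenticated callers
    if user_id is None:
        raise Forbidden("authentication required")
    return "ok"
```

A regression test in this spirit calls the endpoint without a user and expects `Forbidden` rather than an internal error.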
Debanjum Singh Solanky
2752e0d607 Update jinja2 and axios min supported package versions 2024-01-16 18:45:38 +05:30
Debanjum Singh Solanky
7039c202c8 Merge branch 'master' into short-circuit-api-rate-limiter 2024-01-16 18:18:34 +05:30
Debanjum Singh Solanky
d74f8e03d3 Pass max context length to fix using updated GPT4All.list_gpu method
Its signature was updated in the GPT4All 2.1.0 PyPI release.

Resolves #610
2024-01-16 12:23:45 +05:30
Debanjum Singh Solanky
7dfbcd2e5a Handle subscribe renew date, langchain, pydantic & logger.warn warnings
- Ensure langchain less than 0.2.0 is used, to prevent breaking
  ChatOpenAI, PyMuPDF usage due to their deprecation after 0.2.0
- Set subscription renewal date to a timezone aware datetime
- Use logger.warning instead of logger.warn as the latter is deprecated
- Use `model_dump' instead of the deprecated `dict' to get all configured content_types
2024-01-12 01:46:52 +05:30
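Two of the warning fixes above are easy to show with the standard library: building a timezone-aware renewal date and using `logger.warning`. The helper name and the 30-day default are assumptions for illustration only.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Optional

logger = logging.getLogger(__name__)


def next_renewal_date(days: int = 30, now: Optional[datetime] = None) -> datetime:
    """Return a timezone-aware renewal date `days` after `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)  # aware, unlike datetime.utcnow()
    elif now.tzinfo is None:
        # logger.warning, not the deprecated logger.warn alias
        logger.warning("Received naive datetime %s, assuming UTC", now)
        now = now.replace(tzinfo=timezone.utc)
    return now + timedelta(days=days)
```

Storing aware datetimes avoids the naive-datetime warnings Django emits when time zone support is enabled.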
sabaimran
5ff9df9d4c Add support per user for configuring the preferred search model from the config page
- Honor this setting across the relevant places where embeddings are used
- Convert the VectorField object to have None for dimensions in order to make the search model easily configurable
2023-12-20 13:25:43 +05:30
Debanjum Singh Solanky
7009793170 Migrate to OpenAI Python library >= 1.0 2023-12-03 18:16:00 -05:00
Debanjum Singh Solanky
de5aa5c32e Update pillow, aiohttp dependencies 2023-11-28 19:55:43 -08:00
Debanjum Singh Solanky
4636390f7f Transcribe speech to text offline with Whisper
- Allow server admin to configure offline speech to text model during
  initialization
- Use offline speech to text model to transcribe audio from clients
- Set offline Whisper as the default speech to text model as it requires no API key setup
2023-11-26 05:55:11 -08:00
Debanjum Singh Solanky
19e042037a Run isort with black profile to avoid conflicts between the two 2023-11-21 12:52:07 -08:00
Debanjum Singh Solanky
4e98acbca7 Update minimum pydantic version to one with model_validate function 2023-11-20 14:52:37 -08:00
Debanjum Singh Solanky
ca87b4ede9 Wrap common API query parameters into shared class to deduplicate code
- Upgrade minimum FastAPI version to the latest release. Earlier versions
  didn't support wrapping common query params in a class

- Use a per-fixture app instead of a global FastAPI app in conftest

- Upgrade minimum required Django version

- Fix no notes chat director test with updated no notes message
  No notes message was updated in commit 118f1143
2023-11-17 18:43:49 -08:00
sabaimran
d06b2cf24b Downgrade pyproject.toml to avert dependency conflict 2023-11-15 10:47:54 -08:00
Debanjum Singh Solanky
9c6e7bdea2 Upgrade server, desktop app dependencies to resolve CVE bugs 2023-11-15 01:47:53 -08:00
Debanjum Singh Solanky
9aaf475c8a Create API webhook, endpoints for subscription payments using Stripe
- Add fields to mark users as subscribed to a specific plan and
  subscription renewal date in DB
- Add ability to unsubscribe a user using their email address
- Expose webhook for stripe to callback confirming payment
2023-11-07 10:20:51 -08:00
Debanjum Singh Solanky
9f47fc8e34 Upgrade langchain version since adding support for OCR-ing PDFs 2023-11-06 21:58:33 -08:00
Debanjum
38f24a037d
Improve Indexing Text Entries (#535)
Major
- Ensure search results logic is consistent across migration to DB, multi-user
- Manually verified search results for sample queries look the same across migration
- Flatten indexing code for better indexing progress tracking and code readability

Minor
- a4f407f Test memory leak on MPS device when generating vector embeddings
- ef24485 Improve Khoj with DB setup instructions in the Django app readme (for now)
- f212cc7 Arrange remaining text search tests in arrange, act, assert order
- 022017d Fix text search tests to test updated indexing log messages
2023-11-06 16:01:53 -08:00
Debanjum Singh Solanky
a4f407f595 Test memory leak on MPS device when generating vector embeddings
Slope threshold of 2.0 determined qualitatively on local Mac device
Minor unused import and clean-up
2023-11-05 03:48:54 -08:00
sabaimran
b5972e9311 Use OCR to extract image text in PDFs 2023-11-04 17:15:28 -07:00
Debanjum Singh Solanky
6fae6fb2a4 Merge branch 'features/multi-user-support-khoj' into improve-client-app-theming 2023-11-03 04:58:41 -07:00
sabaimran
fb6ebd19fc
Fix refactor bugs, CSRF token issues for use in production (#531)
Fix refactor bugs, CSRF token issues for use in production
* Add flags for samesite settings to enable django admin login
* Add tzdata to dependencies to work around python package issues on Linux
* Use DJANGO_DEBUG flag correctly
* Fix naming of entry field when creating EntryDate objects
* Correctly retrieve openai config settings
* Fix datefilter with embeddings name for field
2023-11-02 23:02:38 -07:00
Debanjum Singh Solanky
345856e7be Merge branch 'master' of github.com:khoj-ai/khoj into features/multi-user-support-khoj
Merge changes to use latest GPT4All with GPU, GGUF model support into
khoj multi-user support rearchitecture branch
2023-11-02 22:44:25 -07:00
sabaimran
54a387326c
[Multi-User Part 6]: Address small bugs and upstream PR comments (#518)
- 08654163cb: Add better parsing for XML files
- f3acfac7fb: Add a try/catch around the dateparser in order to avoid internal server errors in app
- 7d43cd62c0: Chunk embeddings generation in order to avoid large memory load
- e02d751eb3: Addresses comments from PR #498 
- a3f393edb4: Addresses comments from PR #503 
- 66eb078286: Addresses comments from PR #511 
- Address various items in https://github.com/khoj-ai/khoj/issues/527
2023-10-31 17:59:53 -07:00
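Chunking embeddings generation keeps only one batch of inputs in flight at a time. This is a minimal sketch with hypothetical names. `embed` stands in for a model call such as a sentence-transformers `encode`, and the chunk size is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def embed_in_chunks(
    texts: Sequence[str],
    embed: Callable[[Sequence[str]], List[List[float]]],
    chunk_size: int = 1000,
) -> List[List[float]]:
    # Peak memory is bounded by one chunk's intermediate tensors, not the whole corpus
    embeddings: List[List[float]] = []
    for chunk in chunks(texts, chunk_size):
        embeddings.extend(embed(chunk))
    return embeddings
```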
sabaimran
5f3f6b7c61
[Multi-User Part 5]: Add a production Docker file and use a gunicorn configuration with it (#514)
- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
2023-10-26 13:15:31 -07:00
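Gunicorn config files are plain Python modules, so a multi-worker setup can be sketched as below. These values are assumptions, not the file shipped with the production Dockerfile. Port 42110 is assumed to be Khoj's default, the uvicorn worker class is assumed because the server is an ASGI app, and the worker count follows gunicorn's documented `(2 x cores) + 1` rule of thumb.

```python
# gunicorn.conf.py: hypothetical production settings, used via `gunicorn -c gunicorn.conf.py ...`
import multiprocessing

# Listen on all interfaces (assumed default Khoj port)
bind = "0.0.0.0:42110"

# Gunicorn's rule of thumb; lower this if each worker loads large ML models into memory
workers = multiprocessing.cpu_count() * 2 + 1

# Serve the ASGI app from gunicorn-managed processes through uvicorn
worker_class = "uvicorn.workers.UvicornWorker"

# Allow slow chat/indexing requests to finish before a worker is killed and restarted
timeout = 120
keepalive = 60
```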
sabaimran
a8a82d274a
[Multi-User Part 2]: Add login pages and gate access to application behind login wall (#503)
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
2023-10-26 10:17:29 -07:00
sabaimran
216acf545f
[Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498)
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
2023-10-26 09:42:29 -07:00
Debanjum Singh Solanky
0f1ebcae18 Upgrade to latest GPT4All. Use Mistral as default offline chat model
GPT4All now supports GGUF llama.cpp chat models. The latest
GPT4All (+mistral) performs at least 3x faster.

On a MacBook Pro, response start time is now ~10s vs 30s-120s earlier.
Mistral is also a better chat model, although it hallucinates more
than llama-2
2023-10-22 19:04:23 -07:00
sabaimran
6dc0df3afb
Pin pytorch version to 2.0.1 in order to avoid exit code 139 in Docker container (#512) 2023-10-20 14:10:21 -07:00
sabaimran
963cd165eb Resolve merge conflicts 2023-10-19 14:39:05 -07:00
Debanjum
ecc6fbfeb2
Push Files to Index from Emacs, Obsidian & Desktop Clients using Multi-Part Forms (#499)
### Overview
- Add ability to push data to index from the Emacs, Obsidian client
- Switch to the standard mechanism of syncing files via HTTP multipart/form-data. Previously we were streaming the data as JSON
  - Benefits of new mechanism
    - No manual parsing of files to send or receive on clients or server is required as most have in-built mechanisms to send multipart form requests
    - The whole request doesn't need to be kept in memory to parse content as JSON. As individual files arrive they're automatically pushed to disk to conserve memory if required
    - Binary files don't need to be encoded on client and decoded on server

### Code Details
#### Major
- Use multi-part form to receive files to index on server
- Use multi-part form to send files to index on desktop client
- Send files to index on server from the khoj.el emacs client
  - Send content for indexing on server at a regular interval from khoj.el
- Send files to index on server from the khoj obsidian client
- Update tests to test multi-part/form method of pushing files to index

#### Minor
- Put indexer API endpoint under /api path segment
- Explicitly make GET request to /config/data from khoj.el:khoj-server-configure method
- Improve emoji, message on content index updated via logger
- Don't call khoj server on khoj.el load, only once khoj invoked explicitly by user
- Improve indexing of binary files
  - Let fs_syncer pass PDF files directly as binary before indexing
  - Use encoding of each file set in indexer request to read file 
- Add CORS policy to khoj server. Allow requests from khoj apps, obsidian & localhost
- Update indexer API endpoint URL to `index/update` from `indexer/batch`

Resolves #471 #243
2023-10-17 06:05:15 -07:00
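To see why binary files need no special encoding with multipart forms, here is a hand-built `multipart/form-data` body. It is illustrative only: `encode_multipart` is hypothetical, and the `files` field name is an assumption. Real clients (Emacs, Obsidian, Electron) use their built-in form APIs instead. Filenames containing quotes or non-ASCII characters would need extra escaping per RFC 7578.

```python
import uuid
from typing import List, Tuple


def encode_multipart(files: List[Tuple[str, bytes, str]], field: str = "files") -> Tuple[bytes, str]:
    """Build a multipart/form-data body from (filename, content, content_type) tuples.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content, content_type in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        # Raw bytes go in as-is: binary files like PDFs need no base64 or URL encoding
        parts.append(header.encode() + content + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```

The result can be POSTed with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, with `url` pointing at the server's indexer endpoint.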
sabaimran
c125995d94
[Multi-User]: Part 0 - Add support for logging in with Google (#487)
* Add concept of user authentication to the request session via GoogleUser
2023-10-14 19:39:13 -07:00
sabaimran
09bb3686cc
Strip the incoming query from the slash conversation command (#500)
* Strip the incoming query from the slash conversation command before passing it to the model or for search
* Return q when content index not loaded
* Remove -n 4 from pytest ini configuration to isolate test failures
2023-10-13 21:11:23 -07:00
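Stripping the slash command before the query reaches the model or search could look like this sketch. `split_command` is hypothetical, and the command names used in the example are assumptions. Unknown commands are left in the query untouched.

```python
from typing import Optional, Sequence, Tuple


def split_command(query: str, commands: Sequence[str]) -> Tuple[Optional[str], str]:
    """Return (command, query without the command); (None, query) if no known command."""
    stripped = query.strip()
    if stripped.startswith("/"):
        head, _, rest = stripped[1:].partition(" ")
        if head in commands:
            return head, rest.strip()
    return None, stripped
```

For example, `split_command("/notes what did I write?", ["notes", "general"])` gives `("notes", "what did I write?")`, so only the question is embedded for search.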
Debanjum Singh Solanky
96c0b21285 Sync desktop app package.json with other Khoj clients metadata
- Make `bump_version.sh' script set version for the Khoj desktop app too
- Sync Khoj desktop app authors, license, description and version with
  the other interfaces and server
- Update description in packages metadata to match project subtitle on GitHub
2023-10-13 20:43:55 -07:00
Debanjum Singh Solanky
72f8fde7ef Run pytests in parallel on multiple CPU cores using pytest-xdist for speed 2023-10-12 20:56:17 -07:00
Debanjum Singh Solanky
60e9a61647 Use multi-part form to receive files to index on server
- This uses the existing HTTP affordance to process files
  - Better handling of binary file formats as it removes the need to URL encode/decode
  - Less memory utilization than streaming JSON as files get
    automatically written to disk once memory utilization exceeds preset limits
  - No manual parsing of raw file streams required
2023-10-11 23:58:23 -07:00
Debanjum Singh Solanky
148e8f468f Restrict openai package version below 1.0.0 to avoid breaking changes 2023-10-09 19:30:58 -07:00