Commit graph

7 commits

Author SHA1 Message Date
sabaimran
377f7668c5
Merge pull request #858 from khoj-ai/use-sse-instead-of-websocket
Use Single HTTP API for Robust, Generalizable Chat Streaming
2024-07-26 07:11:54 -07:00
Debanjum Singh Solanky
54b4203683 Update chat API client tests to mix testing of batch and streaming mode 2024-07-23 17:56:03 +05:30
Debanjum Singh Solanky
e9f86e320b Fix and improve offline chat actor, director tests
- Use updated references schema with compiled key
- Enable director tests that are now expected to pass and that do pass
  (with Gemma 2 at least)
2024-07-18 03:43:09 +05:30
Debanjum Singh Solanky
d5ceff2691 Update tests and documentation with Jina reader API usage and info
Update offline, openai chat actor, director tests to not require
Serper to run the online command tests

Update documentation for self-hosted online search to mention that no
setup is required by default, though results can be improved by using
Serper.dev or Olostep
2024-07-02 17:19:09 +05:30
Raghav Tirumale
d4e5c95711
Add Ability to Summarize Documents (#800)
* Uses entire file text and summarizer model to generate document summary.
* Uses the contents of the user's query to create a tailored summary.
* Integrates with File Filters #788 for a better UX.
2024-06-18 19:31:07 +05:30
Debanjum Singh Solanky
886d49e3a4 Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat 2024-03-31 00:59:20 +05:30
Debanjum Singh Solanky
8ca39a436c Use llama.cpp for offline chat models
- Benefits of moving to llama-cpp-python from gpt4all:
  - Support for all GGUF format chat models
  - Support for AMD, Nvidia, Mac, Vulkan GPU machines (instead of just Vulkan, Mac)
  - Supports models with more capabilities like tools, schema
    enforcement, speculative decoding, image gen etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
  chat models

- Load offline chat model when present on disk without requiring internet
  - Load model onto GPU if not disabled and device has GPU
  - Load model onto CPU if loading model onto GPU fails
  - Create helper function to check and load model from disk, when model
    glob is present on disk.

    `Llama.from_pretrained` needs internet to get repo info from
    HuggingFace. This isn't required if the model is already downloaded.

    Didn't find any existing HF or llama.cpp method that looked for model
    glob on disk without internet
2024-03-26 22:33:01 +05:30
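
The load order described in the commit above (look for the model glob on disk, try the GPU, fall back to the CPU) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `find_model_on_disk`, `load_with_gpu_fallback`, and the injected `loader` callable are hypothetical names, and the sketch assumes the loader accepts a path and a GPU layer count.

```python
from pathlib import Path
from typing import Callable, Optional, TypeVar

ModelT = TypeVar("ModelT")


def find_model_on_disk(model_dir: Path, filename_glob: str) -> Optional[Path]:
    """Return the first file under model_dir matching filename_glob, or None.

    Hypothetical helper: it only inspects the local filesystem, so it
    needs no network access.
    """
    if not model_dir.is_dir():
        return None
    matches = sorted(p for p in model_dir.rglob(filename_glob) if p.is_file())
    return matches[0] if matches else None


def load_with_gpu_fallback(
    model_path: Path,
    loader: Callable[[Path, int], ModelT],
    use_gpu: bool = True,
) -> ModelT:
    """Try loading the model on the GPU first, then fall back to the CPU.

    `loader` is an assumed callable taking (path, n_gpu_layers).
    """
    if use_gpu:
        try:
            # In llama-cpp-python, n_gpu_layers=-1 offloads all layers to the GPU.
            return loader(model_path, -1)
        except Exception:
            pass  # GPU load failed; fall through to the CPU attempt
    # n_gpu_layers=0 keeps the whole model on the CPU.
    return loader(model_path, 0)
```

With llama-cpp-python installed, one plausible loader would be `lambda path, layers: Llama(model_path=str(path), n_gpu_layers=layers)`. Passing the loader in as an argument keeps the fallback logic testable without a model file or a GPU.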
Renamed from tests/test_gpt4all_chat_director.py