- How to pip install khoj to run offline chat on GPU
  After the migration to llama-cpp-python, more GPU types are supported but
  require a build step, so mention how (see the sketch after this list)
- New default offline chat model
- Where to get supported chat models from on HuggingFace
- Benefits of moving to llama-cpp-python from gpt4all:
  - Support for all GGUF format chat models
  - Support for AMD, Nvidia, Mac and Vulkan GPU machines (instead of just Vulkan, Mac)
  - Supports models with more capabilities like tools, schema
    enforcement, speculative decoding, image generation etc.
- Upgrade default chat model, prompt size, tokenizer for new supported
chat models
- Load offline chat model when present on disk without requiring internet
- Load model onto GPU if not disabled and device has GPU
- Load model onto CPU if loading model onto GPU fails
- Create helper function to check for and load the model from disk when the
  model glob is present on disk (see the sketch after this list).
  `Llama.from_pretrained` needs internet access to get repo info from
  HuggingFace. This isn't required if the model is already downloaded.
  Didn't find an existing HF or llama.cpp method that looks for a model
  glob on disk without internet
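The GPU build happens at install time through llama-cpp-python's CMake flags, e.g. something like `CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python` for CUDA (check the llama-cpp-python README for the exact flags for your GPU and version) before installing khoj itself. Below is a minimal sketch of the disk-first loader described above, assuming llama-cpp-python's `Llama` and `Llama.from_pretrained` APIs; the `load_model` helper, its arguments and the cache directory are hypothetical, not Khoj's actual code.

```python
import glob
import os

from llama_cpp import Llama

def load_model(repo_id: str, filename: str, model_dir: str) -> Llama:
    """Hypothetical helper: load a GGUF chat model from disk if cached,
    else download it; prefer GPU, fall back to CPU."""
    # Look for an already downloaded model on disk first, so loading a
    # cached model works without internet access.
    matches = glob.glob(os.path.join(model_dir, "**", filename), recursive=True)

    def load(n_gpu_layers: int) -> Llama:
        if matches:
            return Llama(model_path=matches[0], n_gpu_layers=n_gpu_layers)
        # Llama.from_pretrained needs internet to fetch repo info from HuggingFace.
        return Llama.from_pretrained(repo_id=repo_id, filename=filename, n_gpu_layers=n_gpu_layers)

    try:
        # n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU.
        return load(n_gpu_layers=-1)
    except Exception:
        # Loading onto GPU can fail (e.g. not enough VRAM); retry on CPU.
        return load(n_gpu_layers=0)
```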
### Overview
Khoj can now read websites directly without needing to go through the search step first
### Details
- Parallelize simple webpage read and extractor (see the sketch at the end of this section)
- Rename extract_content online results field to web pages
- Tweak prompts to extract information from webpages, online results
- Test the "select webpage as data source" and "extract web URLs" chat actors
- Render webpage read in chat response references on Web, Desktop apps
- Pass multiple webpages with their URLs in online results context
- Support webpage command in chat API
- Add webpage chat command to read web pages requested by the user
- Create chat actor for directly reading webpages based on user message
Previously, even if MAX_WEBPAGES_TO_READ was > 1, only one webpage's
extracted content would ever be passed.
The URL of the extracted webpage content wasn't passed to clients in the
online results context, which kept it from being rendered
- Fall back to reading the webpage directly when SERPER isn't set up and
  the online command was attempted
- Do not stop responding if online results can't be retrieved. Try to
  respond without the online context
- Use the conversation id of the retrieved conversation rather than the
potentially unset conversation id passed via API
- Await creating a new chat when no chat id is provided and no existing
  conversations exist
The recently added after: operator in the online search actor was too
restrictive and gave worse results than just using natural language
dates in the search query
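A minimal sketch of the parallel webpage read, assuming aiohttp for fetching; the function names mirror the ones mentioned above, but the result shape (content tagged with its URL so clients can render references) is illustrative rather than Khoj's actual schema.

```python
import asyncio

import aiohttp

async def read_webpage_at_url(session: aiohttp.ClientSession, url: str) -> dict:
    # Keep the URL alongside the extracted content so it can be passed to
    # clients in the online results context and rendered as a reference.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as response:
        return {"link": url, "content": await response.text()}

async def read_webpages(urls: list[str]) -> list[dict]:
    # Read all requested webpages in parallel instead of one at a time.
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(read_webpage_at_url(session, url) for url in urls),
            return_exceptions=True,
        )
    # Skip pages that failed to load rather than aborting the whole response.
    return [result for result in results if not isinstance(result, Exception)]

webpages = asyncio.run(read_webpages(["https://khoj.dev", "https://docs.khoj.dev"]))
```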
### Improve
- Improve delete, rename chat session UX in Desktop, Web app
- Get conversation by title when requested via chat API
### Fix
- Allow unset locale for Google authenticating user
- Handle truncation when there is a single long non-system chat message
- Fix setting chat session title from Desktop app
- Only create a new chat on get if a specific chat id or slug isn't requested
Previously the code assumed the system prompt is always passed as the
first message, so it expected there to be at least 2 messages in the logs.
This broke chat actors querying with a single long non-system message.
Extracting the system prompt via the message role is more robust (see the
sketch after this list)
- Ask for Confirmation before deleting chat session in Desktop, Web app
- Save chat session rename on hitting enter in title edit input box
- No need to flash the previous "conversation cleared" status message
- Move chat session delete button after rename button in Desktop app
- Add prompt for the read webpages chat actor to extract, infer
webpage links
- Make chat actor infer or extract webpage to read directly from user
message
- Rename the previous read_webpage function to the more narrowly scoped
  read_webpage_at_url function
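A minimal sketch of the role-based system prompt extraction behind the truncation fix; the dict message shape is illustrative, Khoj's actual message class may differ.

```python
def split_system_prompt(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    # Pick out the system prompt by its message role instead of assuming it
    # is always the first message. This also handles a conversation log that
    # contains a single long non-system message.
    system_prompt = [m for m in messages if m["role"] == "system"]
    chat_messages = [m for m in messages if m["role"] != "system"]
    return system_prompt, chat_messages
```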
### Major
- Enforce JSON mode responses from OpenAI chat actors previously using string lists (see the sketch after this list)
- Use `gpt-4-turbo-preview` as the default chat model and for the extract questions actor
- Make Khoj read the khoj website to respond with accurate, up-to-date information about itself
- Dedupe query in notes prompt. Improve OAI chat actor, director tests
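The `response_format` parameter below is the OpenAI API's real JSON mode switch; the prompt text and the `queries` key are illustrative stand-ins for the actors' actual prompts.

```python
import json

from openai import OpenAI

client = OpenAI()

# JSON mode makes the model emit a valid JSON object, so actors can parse a
# structured response instead of a stringified Python list.
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    response_format={"type": "json_object"},
    messages=[
        # JSON mode requires the word "json" to appear somewhere in the messages.
        {"role": "system", "content": 'Reply with a JSON object like {"queries": ["..."]}.'},
        {"role": "user", "content": "What can Khoj do?"},
    ],
)
queries = json.loads(response.choices[0].message.content)["queries"]
```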
### Minor
- Test data source, output mode selector, web search query chat actors
- Improve notes search actor to always create a non-empty list of queries
- Construct available data sources, output modes as a bullet list in prompts
- Use consistent agent name across static and dynamic examples in prompts
- Add actor's name to extract questions prompt to improve context for guidance
Previously only the notes references would get rendered after response
streaming when both online and notes references were used to respond to
the user's message
- Allow passing response format type to OpenAI API via chat actors
- Convert in-context examples to use json objects instead of str lists
- Update actors outputting str lists to request their output as a json_object
  - OpenAI's JSON mode forces the model to output a valid JSON object
- Remove stale tests
- Improve tests to pass across gpt-3.5 and gpt-4-turbo
- The haiku creation director was failing because of a duplicate query in
  the instantiated prompt
- Remove the option for the notes search query generation actor to return
  no queries. Whether search should be performed is decided earlier, so
  this step doesn't need to decide that
  - But do not throw a warning if the response is a list with no elements
- Add examples where user queries requesting information about Khoj
  result in the "online" data source being selected
- Add an example for "general" to select chat command prompt
Previously the examples constructed from chat history used "Khoj" as the
agent's name, but all 3 prompts using the function used static examples
with "AI:" as the agent's name
- Add example to read the khoj.dev website for up-to-date info on setting
  up and using khoj, discovering khoj features etc.
- Online search should use site: and after: google search operators
- Show example of adding the after: date filter to google search
- Give local event lookup example using user's current location in
query
- Remove unused select search content type prompt
### Major
- Read web pages in parallel to improve chat response time
- Read web pages directly when the Olostep proxy isn't set up (see the sketch after this list)
- Include search results & web page content in online context for chat response
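A sketch of the direct-read fallback; the OLOSTEP_API_KEY environment variable is a hypothetical stand-in for however the proxy is configured, and the proxied read is stubbed out since its endpoint isn't reproduced here.

```python
import os

import aiohttp

async def read_webpage_with_olostep(url: str) -> str:
    # Placeholder for the proxied read via Olostep; the real endpoint and
    # request parameters live in Khoj's Olostep integration.
    raise NotImplementedError

async def read_webpage(url: str) -> str:
    # Use the Olostep proxy when configured, else fetch the page directly
    # so online chat keeps working without the proxy.
    if os.getenv("OLOSTEP_API_KEY"):
        return await read_webpage_with_olostep(url)
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
```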
### Minor
- Simplify, modularize and add type hints to online search functions
Previously if a web page was read for a sub-query, only the extracted
web page content was provided as context for the given sub-query. But the
Google results themselves have relevant snippets, so include them (see the
sketch at the end of this list)
- Simplify the content arg to the `extract_relevant_info` function. Validate,
  clean the content arg inside the `extract_relevant_info` function
- Extract the `search_with_google` function outside its parent function
- Call the parent function the more appropriate `search_online` instead
  of `search_with_google`
- Simplify the `search_with_google` function using a list comprehension.
  Drop empty search result fields from the chat model context to reduce
  cost and response latency
- No need to show a stacktrace when unable to read a webpage; a basic error
  is enough
- Add type hints to online search functions to catch issues with mypy
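To make the snippets-plus-webpage point concrete, a sketch of one possible shape for the online context with type hints; the `organic` and `webpages` key names are illustrative, not Khoj's actual schema.

```python
from typing import Any

def collect_online_context(
    subquery: str,
    search_results: list[dict[str, Any]],
    webpages: list[dict[str, str]],
) -> dict[str, Any]:
    # Keep the search engine's own snippets alongside any extracted webpage
    # content, since the snippets are often relevant even when a page is read.
    organic = [
        # Drop empty fields to shrink the chat model context, reducing
        # cost and response latency.
        {key: value for key, value in result.items() if value}
        for result in search_results
    ]
    return {subquery: {"organic": organic, "webpages": webpages}}
```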