Create Rubric to Test Chat Quality and Capabilities
### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities
### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
- Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
- Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
- Measure quality at 2 separate layers
- **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
- **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message. This is what the `/api/chat` API exposes.
- Mark desired but not currently available capabilities as expected to fail <br />
This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
answering
- Make GPT be more reliable in looking at past conversations for
forming response
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
- Previously was asking GPT to summarize the notes
- Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
- Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
- Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now
## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
- Allows Khoj to say don't know if it can't find answer to query from notes
- Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
GPT still mostly says I don't know when answer not in notes or chats
But with this its more inclined to answer general questions not in
chats or notes while informing user that the information is not from
existing chats or notes
- Chat uses compiled form of search results, not the raw entries to
provide context for chat. The compiled snipped search results
themselves are unique and using multiple of them for context from
the same raw note is fine if they cross the score and rank thresholds
This should improve the context provided for chat
- Also apply score_threshold, no deduplication to the answers API
- Issue
The file path separator by khoj server and the Obsidian vault were
different on Windows
- Fix
Normalize file path to use forward slash(/) to find the matching
note file in the Obsidian vault for jump to it
Resolves#177
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat
For now it is only exposed via API. Later it will be expose in the
interfaces as well
Remove ability to select different chat types from the chat web
interface as there is only a single chat type
Stop appending answers to the conversation logs
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context
- Make GPT provide reason when it can't answer the question. Gives
user context to tune their questions
- Set context by either including last 2 chat messages from active
session or past 2 conversation summaries from conversation logs
- Set personality in system message
- Place personality system message before last completed back & forth
This may stop ChatGPT forgetting its personality as conversation progresses given:
- The conditioning based on system role messages is light
- If system message is too far back in conversation history, the
model may forget its personality conditioning
- If system message at end of conversation, the model can think its
the start of a new conversation
- Inserting the system message before last completed back & forth should
prevent ChatGPT from assuming its the start of a new conversation
while not losing personality conditioning from the system message
- Simplfy the Khoj Chat API to for now just answer from users notes
instead of trying to infer other potential interaction types.
- This is the default expected behavior from the feature anyway
- Use the compiled text of the top 2 search results for context
- Benefits of using ChatGPT
- Better model
- 1/10th the price
- No hand rolled prompt required to make GPT provide more chatty,
assistant type responses
- Improve GPT prompt
- Make GPT answer users query based on provided notes instead
of summarizing the provided notes
- Make GPT be truthful using prompt and reduced temperature
- Use Official OpenAI Q&A prompt from cookbook as starting reference
- Replace summarize API with the improved answer API endpoint
- Default to answer type in chat web interface. The chat type is not
fit for default consumption yet
Previous behavior was resulting in a null reference error. As key for
the core content/search type was not present in current config
Fallback to using default config for unconfigured core content type
instead
See #165 for details
- Use emojis to make info logs easier to read
- Inform when khoj is ready to use
- Provide information on what khoj is doing while starting up
- Inform when content/search types and processors are setup
- Inform when models are being loaded from the web as this step can
take time
- Convert all other info logs to be only shown in verbose mode
- Text before headings was not being indexed due to buggy orgnode
parsing logic
- Resolved indexing intro text from files with and without headings in
them
- Ensure intro text node has heading set to all title lines collected
from the file
Resolves#165
- Test /config/types API when no plugin configured, only plugin configured
and no content configured scenarios
- Do not throw null reference exception while configuring search types
when no plugin configured
- Do not throw null reference exception on calling /config/types API
when no plugin configured
Resolves bug introduced by #173
Repro:
1. Open khoj server with `khoj` on first run
2. Install/enable Khoj Obsidian plugin (to configure khoj server)
3. Restart khoj server with `khoj`
Bug:
- Unconfigured processor and search_types are instantiated as None in
self.current_config
- While creating the desktop GUI, these null configs are attempted to
be accessed as valid dictionaries for creating their GUI panels
- This results in the null ref errors
Fix:
Use default config to create their GUI elements for unconfigured
search and processor types
Resolves#167
- Previously was return all core content types even if they had not been
setup
- Add test to validate only configured content types are returned by
the api/config/types API endpoint
- Remove need for interfaces to downcase content types returned by API
before using the type in search and other API endpoint
- Fix to check for search_type.name in plugin keys instead of value
Configure app routes after configuring server.
Import API routers after search type is dynamically populated.
Allow API to recognize the dynamically populated plugin search types
as valid type query param.
Enable searching for plugin type content.
- Remove unneeded type ignore for mps with the latest mypy
- Stop excluding PyQT desktop GUI code from MyPy checks
- Do not warn about unused ignores. Some issue with mypy giving
different errors in different environments (venv, system and pre-commit)