- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using the
tenacity package. This is officially suggested and used by other
popular GPT-based libraries
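A minimal sketch of what such a retrying helper could look like, using the pre-1.0 openai SDK interface; the wait/stop parameters and the helper's signature are illustrative rather than the actual code in utils.py:

```python
# Sketch of a retrying chat completion helper in utils.py (pre-1.0 openai SDK).
# The wait/stop parameters and function signature are assumptions for illustration.
import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential


@retry(
    retry=retry_if_exception_type(openai.error.OpenAIError),
    wait=wait_random_exponential(min=1, max=30),
    stop=stop_after_attempt(3),
    reraise=True,
)
def chat_completion(messages, model_name="gpt-3.5-turbo", temperature=0, api_key=None):
    """Call the OpenAI chat completion API, retrying with exponential backoff on OpenAI errors."""
    openai.api_key = api_key
    response = openai.ChatCompletion.create(model=model_name, messages=messages, temperature=temperature)
    return response.choices[0]["message"]["content"]
```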
Merge pull request #192 from debanjum/improvements-to-khoj-chat-in-emacs
### Khoj Chat on Emacs Improvements
- d78454d Load Khoj Chat buffer before asking for query to provide context
- 93e2aff Use org footnotes to add references, allows jump to def on click
- 5e9558d Stylize reference links as superscripts and show definition on hover
- bc71c19 Use `m` or `C-x m` in-buffer keybindings to send messages to Khoj
### Khoj Chat Server Improvements
- 27217a3 Time chat API sub-components for performance analysis
- 508b217 Update Chat API, Logs, Interfaces to store, use references as list
- d4b3866 Truncate message logs to below max supported prompt size by chat model
- cf28f10 Register separate timestamps for user query and response by Khoj Chat
- Use tiktoken to count tokens for chat models (see the truncation sketch after this list)
- Make the number of conversation turns added to the prompt configurable via an
argument to the generate_chatml_messages_with_context method
- Remove the need to split by a magic string in the Emacs and chat interfaces
- Move compiling references into a string, as context for GPT, into the GPT layer
- Update setup in tests to use new style of setting references
- Rename the first argument of converse to the more appropriate "references"
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate the reference definition shown on hover to 70 characters
- Add a continuation suffix, "...", when the reference definition is truncated
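A rough sketch of the tiktoken-based truncation mentioned above; the function name, arguments, and token budget are illustrative, not the actual implementation:

```python
# Illustrative: count tokens with tiktoken and drop the oldest turns until the
# conversation fits the model's prompt budget. Names and limits are hypothetical.
import tiktoken


def truncate_messages(messages, max_prompt_size=3000, model_name="gpt-3.5-turbo"):
    """Drop the oldest chat messages until the total token count fits under max_prompt_size."""
    encoder = tiktoken.encoding_for_model(model_name)
    tokens = sum(len(encoder.encode(message["content"])) for message in messages)
    while tokens > max_prompt_size and len(messages) > 1:
        dropped = messages.pop(0)  # keep the most recent turns, discard the oldest
        tokens -= len(encoder.encode(dropped["content"]))
    return messages
```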
Merge pull request #191 from debanjum/create-chat-interface-on-emacs
- Render conversation history in a read-only org-mode buffer for Khoj Chat
- Add `chat` as a transient action in the Khoj transient menu
- Style chat messages as org-mode entries
- Put received date in property drawer and keep it hidden/folded by default
- Add the Khoj chat response as a child entry of the user's associated question org entry
This allows folding the back-n-forth between user and Khoj for easier viewing
- Render source note snippets used as references for the response as org-mode links
Hovering the mouse over a link or opening it shows the reference note snippets used
- Style messages as org entries instead of a list
- Put the Khoj response as a child of the user query entry
- Improves color coding for readability
- Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
- Generalize the render-chat-response method to handle rendering chat
history or a single chat response from the chat API response
- Trigger rendering of Khoj chat history if the Khoj chat buffer has not
been created for this session yet
- Use the org-insert-link method to improve link rendering robustness
The previous, simpler mechanism to create org links would result in links
escaping out of formatting. Use a user-facing org-mode method to
remove/reduce the probability of this
- Replace newlines with spaces to render reference notes as links
- Query the Khoj chat API to get the Khoj Chat response to the user message
- Render chat messages as an org-mode list in the format:
- [sender-name]: *[message]*
- /[receive-date]/
- Add references as org links with context visible on hover,
but no jump to note
- Require the dash library for khoj.el to simplify list manipulation.
Use the `-map-indexed' method from dash
Merge pull request #189 from debanjum/add-search-actor-to-improve-notes-lookup-for-chat
### Introduce Search Actor
The Search Actor infers search queries from the user's message
- Capabilities
- Use previous messages to add context to current search queries[^1]
This improves the quality of responses in multi-turn conversations.
- Deconstruct the user's message into multiple search queries to look up notes[^2]
- Use relative date awareness to add date filters to search queries[^3]
- Chat Director now does the following:
1. [*NEW*] Use the Search Actor to generate search queries from the user's message
2. Retrieve relevant notes from Knowledge Base using the Search queries
3. Pass retrieved relevant notes to Chat Actor to respond to user
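In rough pseudocode, the director flow now looks like the sketch below; `extract_search_queries`, `search_notes`, and the argument names are placeholders for the actual actor and search calls:

```python
# Hypothetical sketch of the chat director flow described above. The helper
# names stand in for the actual Search Actor, search, and Chat Actor calls.
def chat_director(user_message, conversation_log):
    # 1. Search Actor: infer search queries from the user's message and chat history
    queries = extract_search_queries(user_message, conversation_log)

    # 2. Retrieve relevant notes from the knowledge base for each inferred query
    references = []
    for query in queries:
        references += search_notes(query, results_count=5)

    # 3. Chat Actor: respond to the user with the retrieved notes as context
    return converse(references, user_message, conversation_log)
```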
### Add Chat Quality Tests
- Test Search Actor capabilities
- Mark Chat Director Tests for Relative Date, Multiple Search Queries as Expected Pass
### Give More Search Results as Context to Chat Actor
- Loosen the search results score threshold to work better for searches with date filters
- Pass more search results (up from 2 to 5) as context to the Chat Actor to improve inference
[^1]: Multi-Turn Example
Q: "When did I go to Mars?"
Search: "When did I go to Mars?"
A: "You went to Mars in the future"
Q: "How was that experience?"
Search: "How my Mars experience?"
*This gives better context for the Chat actor to respond*
[^2]: Deconstruct Example:
Is Alpha older than Beta? => What is Alpha's age? & When was Beta born?
[^3]: Date Example:
Convert user messages containing relative dates like last month, yesterday to date filters on specific dates like dt>="2023-03-01"
- Reasons:
- GPT can extract date-aware search queries with date filters
better than ChatGPT, given the same prompt.
- Need quality more than cost savings for now.
- Need to figure out ways to improve the prompt for ChatGPT before using it
Update the Search Actor prompt with answers, a more precise primer and
two more examples for context
Mark the 3 chat quality tests that use answers as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
- Keep inferred questions in logs
- Improve the prompt to GPT to try to use past questions as context
- Pass the past user message and inferred questions as context to help GPT
extract complete questions
- This should improve search result quality
- Example Expected Inferred Questions from User Message using History:
1. "What is the name of Arun's daughter?"
=> "What is the name of Arun's daughter"
2. "Where does she study?" =>
=> "Where does Arun's daughter study?" OR
=> "Where does Arun's daughter, Reena study?"
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
Answer time range based questions
Limit search to specified timeframe in question using date filter
E.g "What national parks did I visit last year?" adds
dt>="2022-01-01" dt<"2023-01-01" to Khoj search
Note: Temperature set to 0. Message to search queries should be deterministic
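For illustration, the query extraction call could pin temperature to 0 along these lines; the prompt, completion helper, and response parsing below are assumptions, not the actual Search Actor prompt:

```python
# Illustrative only: the prompt, completion helper and response parsing are
# placeholders; the key point is temperature=0 for deterministic query extraction.
def extract_search_queries(user_message, chat_history=""):
    prompt = f"{chat_history}\nConvert this message into searches over my notes: {user_message}"
    response = completion(prompt, temperature=0)  # deterministic message -> search queries
    return [query.strip() for query in response.split("\n") if query.strip()]
```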
- Remove the stale message_to_prompt test
It is too broad and reduces maintainability.
Remove it as it doesn't really need its own test right now
- Setting skipif at the module level for the chat actor and director tests
reduces code duplication; earlier, the skipif decorator was applied to each
chat test
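With pytest, a single module-level `pytestmark` replaces the per-test decorators; a sketch, assuming the skip condition checks for the OpenAI API key:

```python
# Module-level skip for the chat actor and director test modules when no OpenAI
# API key is configured. The exact skip condition is an assumption.
import os
import pytest

pytestmark = pytest.mark.skipif(
    os.getenv("OPENAI_API_KEY") is None,
    reason="Set OPENAI_API_KEY environment variable to run chat tests",
)
```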
Create Rubric to Test Chat Quality and Capabilities
### Issues
- Previously, whether changes improved the quality of Khoj Chat was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities
### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
- Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
- Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
- Measure quality at 2 separate layers
- **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
- **Chat Director**: This is the chat orchestration agent. It calls on the required chat actors, searches through the user-provided knowledge base (i.e. notes, ledger, images) etc. to respond appropriately to the user's message. This is what the `/api/chat` API exposes.
- Mark desired but not currently available capabilities as expected to fail <br />
This still allows measuring the chat capability score/percentage, while changes to chat only fail capability tests that were passing before
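Such capabilities can be marked with pytest's `xfail` so they count toward the capability score without breaking the suite; an illustrative example, where the chat_client fixture, query, and xfail reason are placeholders:

```python
# Illustrative: track a desired but not-yet-supported capability as expected to fail.
# The chat_client fixture, query and response shape are placeholders.
import pytest


@pytest.mark.xfail(reason="Chat director cannot yet answer beyond the conversation lookback window")
@pytest.mark.chatquality
def test_answer_from_chat_history_beyond_lookback_window(chat_client):
    response = chat_client.get('/api/chat?q="What is my sister studying?"')
    assert response.status_code == 200
```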
Combine hand-written custom notes and PG essays with personal
content to bulk up the notes count
Delete the old documentation markdown as it is not a representative dataset for
the application (which is tuned more for personal notes)
- Chat directors are broad agents.
- Chat directors orchestrate narrow actor agents to synthesize the
final response for the user
- Agents are prompts + an ML model
- Test Chat Director Capabilities
1. [X] Answer from retrieved notes
2. [X] Answer from chat history
3. [X] Answer general questions
4. [X] Carry out multi-turn conversation
5. [X] Say don't know when answer not in provided context
6. [X] Answers that require current date awareness
This test is expected to fail as the chat is not capable of doing
this without the Search actor. But the test allows assessing chat quality
7. [X] Date-aware aggregation across multiple different notes
This test is expected to fail as the chat is not capable of doing
this without the Search actor. But the test allows assessing chat quality
8. [X] Ask clarification questions if no unambiguous answer in provided context
9. [X] Retrieve answer from chat history beyond lookback window
This test is expected to fail as the chat director is not capable
of searching chat history yet. But the test allows assessing chat quality
10. [X] Retrieve context for answer using multiple independent
searches on knowledge base
This test is expected to fail as the chat is not capable of doing
this without the Search actor. But the test allows assessing chat quality
- Index markdown test data as the knowledge base, as it is easier to get good
markdown content (vs org)
- Set up markdown_content_config, processor_config and chat_client to
test the chat API
- Mark chat quality tests and register a custom mark for chat quality (see the conftest sketch after this list)
- Filter unhelpful deprecation warnings from within dateparser library
- Error if tests use unregistered marks
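In pytest terms, registering the custom mark and filtering the dateparser warnings could look like the conftest.py sketch below; the exact configuration in the repo may differ, and erroring on unregistered marks is done with pytest's `--strict-markers` option in the pytest config:

```python
# Illustrative conftest.py fragment: register the custom chatquality mark and
# silence the dateparser deprecation warnings. Unregistered marks are turned
# into errors separately via pytest's --strict-markers option.
import warnings


def pytest_configure(config):
    config.addinivalue_line("markers", "chatquality: marks tests that evaluate chat response quality")
    warnings.filterwarnings("ignore", category=DeprecationWarning, module="dateparser.*")
```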
- Chat actors are narrow agents (prompt + ML model)
Chat actors are different from the Chat Director, which orchestrates
the narrow actor agents to synthesize the final response to the user
- Test Chat Actor Capabilities
1. Answer from retrieved notes
2. Answer from chat history
3. Answer general questions
4. Carry out multi-turn conversation
5. Say don't know when answer not in provided context
6. Answers that require current date awareness
7. Date-aware aggregation across multiple different notes
8. Ask clarification questions if no unambiguous answer in provided context
This test is expected to fail as the chat is not capable of doing
this consistently yet. But having the test allows assessing chat quality
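Actor tests call the prompt + model combination directly instead of going through the API; a hedged sketch of capability 5, where `converse` follows the "references" first-argument rename above but the test data, import path, and keyword arguments are illustrative:

```python
# Illustrative chat actor test for capability 5 (say don't know without context).
# `converse` is the chat actor from gpt.py; its import path, keyword arguments and
# the expected phrasing asserted below are assumptions.
import os
import pytest


@pytest.mark.chatquality
def test_chat_actor_says_dont_know_without_relevant_context():
    references = ["I planted tomatoes in the garden this summer."]
    response = converse(references, "What is the capital of Atlantis?", api_key=os.getenv("OPENAI_API_KEY"))
    assert "don't know" in response.lower()
```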
- Use the OpenAI API key from the OPENAI_API_KEY environment variable
- Gitignore the .env file and Python virtualenv directory
Put the OpenAI API key in the .env file to run the chatbot tests via VS Code
The .env file is the default location for importing env vars
- Set the conversation_log arg default to a dict
- Increase the default temperature to 0.2 for a little creativity in
answering
- Make GPT more reliable at looking at past conversations when
forming its response