Mirror of khoj from GitHub: https://github.com/khoj-ai/khoj.git (synced 2025-02-17 08:04:21 +00:00)
Topics: agent, ai, assistant, chat, chatgpt, emacs, image-generation, llama3, llamacpp, llm, obsidian, obsidian-md, offline-llm, productivity, rag, research, self-hosted, semantic-search, stt, whatsapp-ai
## Overview

### New
- Support using Firecrawl (https://firecrawl.dev) to read web pages
- Add, switch and re-prioritize the web page reader(s) to use via the admin panel

### Speed
- Improve response speed by aggregating web page read and extract queries so they run only once for each web page

### Response Resilience
- Fall back through the enabled web page readers until the web page is read
- Enable reading web pages on the internal network for self-hosted Khoj running in anonymous mode
- Try to respond even if web search or web page read fails during chat
- Try to respond even if document search via inference endpoint fails

### Fix
- Return data sources to use if an exception occurs in the data source chat actor

## Details

### Configure web page readers to use
- Use only the web scraper set in Server Chat Settings via the Django admin panel, if set
- Otherwise, use the web scrapers added via the Django admin panel (in order of priority), if set
- Otherwise, use all the web scrapers enabled by setting API keys via environment variables (e.g. `FIRECRAWL_API_KEY`, `JINA_API_KEY` env vars set), if set
- Otherwise, use Jina to web scrape if no scrapers are explicitly defined

For self-hosted setups running in anonymous mode, the ability to directly read webpages is also enabled by default. This is especially useful for reading webpages in your internal network that the other web page readers will not be able to access.

### Aggregate webpage extract queries to run once for each distinct web page
Previously, we'd run separate webpage read and extract relevant content pipes for each distinct (query, url) pair. Now we aggregate all queries for each url to extract information from, and run the webpage read and extract relevant content pipes once for each distinct URL.

Even though the webpage content extraction pipes were previously being run in parallel, they increased the response time by
1. adding more ~duplicate context for the response generation step to read
2. being more susceptible to variability in web page read latency of the parallel jobs

The aggregated retrieval of context for all queries for a given webpage could result in some hit to context quality. But it should improve response time, quality and costs, and reduce their variability. This should especially help with speed and quality of online search for offline or low context chat models.
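The reader selection order, the fallback through readers, and the per-URL query aggregation described above can be sketched roughly as follows. This is an illustrative sketch only, not Khoj's actual code: `Reader`, `select_readers`, `read_with_fallback`, `aggregate_queries_by_url` and the `fetch` callback are hypothetical names, and the "lower priority value is tried first" ordering is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Reader:
    """Hypothetical stand-in for a configured web page reader."""
    name: str
    priority: int = 0  # assumption: lower value is tried earlier


def select_readers(
    server_default: Optional[Reader],
    admin_readers: List[Reader],
    env: Mapping[str, str],
) -> List[Reader]:
    """Pick the readers to try, following the precedence listed above."""
    if server_default is not None:
        # 1. Only the reader set in the server chat settings, if set
        return [server_default]
    if admin_readers:
        # 2. Readers added via the admin panel, in order of priority
        return sorted(admin_readers, key=lambda r: r.priority)
    # 3. Readers enabled by API keys in environment variables
    from_env = []
    if env.get("FIRECRAWL_API_KEY"):
        from_env.append(Reader("firecrawl"))
    if env.get("JINA_API_KEY"):
        from_env.append(Reader("jina"))
    # 4. Default to Jina when nothing is explicitly defined
    return from_env or [Reader("jina")]


def read_with_fallback(
    url: str,
    readers: List[Reader],
    fetch: Callable[[Reader, str], Optional[str]],
) -> Optional[str]:
    """Try each reader in turn until one returns content for the URL."""
    for reader in readers:
        try:
            content = fetch(reader, url)
        except Exception:
            continue  # this reader failed, fall through to the next one
        if content:
            return content
    return None  # every reader failed; the caller can still try to respond


def aggregate_queries_by_url(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group (query, url) pairs so each distinct URL is read only once."""
    queries_by_url: Dict[str, List[str]] = defaultdict(list)
    for query, url in pairs:
        if query not in queries_by_url[url]:
            queries_by_url[url].append(query)
    return dict(queries_by_url)
```

With this grouping, three (query, url) pairs that touch two distinct URLs lead to two page reads instead of three, and each read extracts content for all of that URL's queries in one pass.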
- .github
- documentation
- scripts
- src
- tests
- .dockerignore
- .gitattributes
- .gitignore
- .pre-commit-config.yaml
- docker-compose.yml
- Dockerfile
- gunicorn-config.py
- LICENSE
- manifest.json
- prod.Dockerfile
- pyproject.toml
- pytest.ini
- README.md
- versions.json
Your AI second brain
Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.
- Chat with any local or online LLM (e.g. llama3, qwen, gemma, mistral, gpt, claude, gemini).
- Get answers from the internet and your docs (including image, pdf, markdown, org-mode, word, notion files).
- Access it from your Browser, Obsidian, Emacs, Desktop, Phone or WhatsApp.
- Create agents with custom knowledge, persona, chat model and tools to take on any role.
- Automate away repetitive research. Get personal newsletters and smart notifications delivered to your inbox.
- Find relevant docs quickly and easily using our advanced semantic search.
- Generate images, talk out loud, play your messages.
- Khoj is open-source, self-hostable. Always.
- Run it privately on your computer or try it on our cloud app.
See it in action
Go to https://app.khoj.dev to see Khoj live.
Full feature list
You can see the full feature list here.
Self-Host
To get started with self-hosting Khoj, read the docs.
Contributors
Cheers to our awesome contributors! 🎉
Made with contrib.rocks.
Interested in Contributing?
We are always looking for contributors to help us build new features, improve the project documentation, or fix bugs. If you're interested, please see our Contributing Guidelines and check out our Contributors Project Board.