sij/khoj: Mirror of khoj from GitHub

mirror of https://github.com/khoj-ai/khoj.git synced 2025-02-17 08:04:21 +00:00

agent ai assistant chat chatgpt emacs image-generation llama3 llamacpp llm obsidian obsidian-md offline-llm productivity rag research self-hosted semantic-search stt whatsapp-ai

Debanjum 7fb4c2939d Make Chat and Online Search Resilient and Faster (#936)	2024-10-17 17:57:44 -07:00

## Overview
### New
- Support using Firecrawl (https://firecrawl.dev) to read web pages
- Add, switch and re-prioritize web page reader(s) to use via the admin panel

### Speed
- Improve response speed by aggregating web page read and extract queries to run only once for each web page

### Response Resilience
- Fall back through enabled web page readers until the web page is read
- Enable reading web pages on the internal network for self-hosted Khoj running in anonymous mode
- Try to respond even if web search or web page read fails during chat
- Try to respond even if document search via inference endpoint fails

### Fix
- Return data sources to use if an exception occurs in the data source chat actor

## Details
### Configure web page readers to use
- Only the web scraper set in Server Chat Settings via the Django admin panel, if set
- Otherwise, use the web scrapers added via the Django admin panel (in order of priority), if set
- Otherwise, use all the web scrapers enabled by setting API keys via environment variables (e.g. `FIRECRAWL_API_KEY`, `JINA_API_KEY` env vars set), if set
- Otherwise, use Jina to web scrape if no scrapers are explicitly defined

For self-hosted setups running in anonymous mode, the ability to directly read webpages is also enabled by default. This is especially useful for reading webpages in your internal network that the other web page readers will not be able to access.

### Aggregate webpage extract queries to run once for each distinct web page
Previously, we'd run separate webpage read and extract relevant content pipes for each distinct (query, url) pair.

Now we aggregate all queries for each url to extract information from and run the webpage read and extract relevant content pipes once for each distinct URL.

Even though the webpage content extraction pipes were previously being run in parallel, they increased the response time by
1. adding more ~duplicate context for the response generation step to read
2. being more susceptible to variability in web page read latency of the parallel jobs

The aggregated retrieval of context for all queries for a given webpage could result in some hit to context quality. But it should improve, and reduce variability in, response time, quality and costs. This should especially help with speed and quality of online search for offline or low context chat models.

.github	Remove tools cache in dockerize.yml workflow	2024-09-29 00:27:37 -07:00
documentation	Upgrade documentation website dependencies	2024-10-17 11:58:52 -07:00
scripts	Update bump version script to bump new next.js web app version too	2024-08-05 16:20:47 +05:30
src	Return enabled scrapers as WebScraper objects for more ergonomic code	2024-10-17 17:44:09 -07:00
tests	Intelligently initialize a decent default set of chat model options	2024-09-19 20:32:08 -07:00
.dockerignore	Use pypi khoj to fix docker builds and dockerize github workflow	2023-02-19 01:57:01 -06:00
.gitattributes	Exclude tests data file from programming stats on Github	2023-08-28 11:00:52 -07:00
.gitignore	Cycle through chat history in chat input on Obsidian (#861)	2024-08-12 23:55:25 -07:00
.pre-commit-config.yaml	Add isort to the pre-commit configuration and apply it to the whole project (#595)	2023-12-28 18:04:02 +05:30
docker-compose.yml	Intelligently initialize a decent default set of chat model options	2024-09-19 20:32:08 -07:00
Dockerfile	Reduce size of Khoj Docker images by removing layers and caches	2024-09-29 04:06:35 -07:00
gunicorn-config.py	Bump gunicorn workers per server up to 2	2024-04-18 11:32:51 +05:30
LICENSE	Change license to GNU AGPLv3 from GNU GPLv3	2023-11-16 11:14:06 -08:00
manifest.json	Release Khoj version 1.25.0	2024-10-10 18:07:30 -07:00
prod.Dockerfile	Reduce size of Khoj Docker images by removing layers and caches	2024-09-29 04:06:35 -07:00
pyproject.toml	Upgrade Django version used by Khoj server	2024-10-17 11:58:52 -07:00
pytest.ini	Move the django app into the src/khoj folder for better organization and functionality	2023-11-21 10:56:04 -08:00
README.md	Use Khoj icons. Add automation & improve agent text on web login page	2024-10-17 11:58:52 -07:00
versions.json	Release Khoj version 1.25.0	2024-10-10 18:07:30 -07:00
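
The reader selection order, fallback, and per-URL query aggregation described in the latest commit message can be illustrated with a minimal sketch. This is not Khoj's actual code: the function names, the `Reader` type, and the reader labels `"firecrawl"` and `"jina"` are hypothetical, chosen only to mirror the behavior the commit message describes. The direct web page reader enabled in anonymous mode is left out for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical reader type: takes a URL, returns page text, raises on failure.
Reader = Callable[[str], str]


def select_readers(
    server_default: Optional[str],
    admin_ordered: List[str],
    env: Dict[str, str],
) -> List[str]:
    """Pick web page readers in the priority order given in the commit message."""
    if server_default:  # 1. scraper set in Server Chat Settings
        return [server_default]
    if admin_ordered:  # 2. scrapers added via the admin panel, already in priority order
        return list(admin_ordered)
    # 3. scrapers enabled by API keys set as environment variables
    from_env = [
        name
        for name, key in (("firecrawl", "FIRECRAWL_API_KEY"), ("jina", "JINA_API_KEY"))
        if env.get(key)
    ]
    return from_env or ["jina"]  # 4. default to Jina when nothing is configured


def aggregate_queries_by_url(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (query, url) pairs so each distinct URL is read only once."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for query, url in pairs:
        if query not in grouped[url]:  # skip duplicate queries for the same URL
            grouped[url].append(query)
    return dict(grouped)


def read_with_fallback(url: str, readers: Iterable[Reader]) -> Optional[str]:
    """Try each reader in order; return the first non-empty read, else None."""
    for reader in readers:
        try:
            content = reader(url)
        except Exception:
            continue  # this reader failed, so fall through to the next one
        if content:
            return content
    return None
```

With this shape, a caller would group the `(query, url)` pairs first, then call `read_with_fallback` once per distinct URL and run all of that URL's queries against the single read. Returning `None` instead of raising lets the chat step still try to respond when every reader fails.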

README.md

Your AI second brain

📑 Docs • 🌐 Web • 🔥 App • 💬 Discord • ✍🏽 Blog

Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.

Chat with any local or online LLM (e.g. llama3, qwen, gemma, mistral, gpt, claude, gemini).
Get answers from the internet and your docs (including image, PDF, Markdown, Org-mode, Word and Notion files).
Access it from your Browser, Obsidian, Emacs, Desktop, Phone or WhatsApp.
Create agents with custom knowledge, persona, chat model and tools to take on any role.
Automate away repetitive research. Get personal newsletters and smart notifications delivered to your inbox.
Find relevant docs quickly and easily using our advanced semantic search.
Generate images, talk out loud, play your messages.
Khoj is open-source, self-hostable. Always.
Run it privately on your computer or try it on our cloud app.

See it in action

Go to https://app.khoj.dev to see Khoj live.

Full feature list

You can see the full feature list here.

Self-Host

To get started with self-hosting Khoj, read the docs.

Contributors

Cheers to our awesome contributors! 🎉

Made with contrib.rocks.

Interested in Contributing?

We are always looking for contributors to help us build new features, improve the project documentation, or fix bugs. If you're interested, please see our Contributing Guidelines and check out our Contributors Project Board.