- Khoj supports indexing subdirectories, but the Khoj docker config
wasn't updated to do the same
- This should also allow Khoj docker users to index multiple separate
directory trees by mounting them into separate subfolders within
/data/<content-type>/.
E.g /data/org/dir1, /data/org/dir2 etc. in khoj_docker.yml, as
sketched below
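A minimal sketch of what this enables. The host paths and compose
service name are hypothetical, and the config keys are assumptions
based on the sample configs; the /data mount points follow
khoj_docker.yml:

  # docker-compose.yml (sketch): mount separate trees under /data/org/
  services:
    server:
      volumes:
        - /home/user/notes:/data/org/dir1
        - /home/user/journal:/data/org/dir2

  # khoj_docker.yml (sketch): recursive glob picks up both subfolders
  content-type:
    org:
      input-filter: ["/data/org/**/*.org"]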
- Default to using `text-davinci-003' if the conversation model isn't
explicitly configured by the user. Stop using the older `davinci' and
`davinci-instruct' models
- Use `model' instead of `engine' as the parameter, as shown below.
Usage of the `engine' parameter in the OpenAI API is deprecated
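A minimal sketch of the updated call against the legacy openai Python
client of that era; the API key and prompt are placeholders:

  import openai

  openai.api_key = "sk-..."  # placeholder

  # Default used when the user hasn't configured a conversation model
  chat_model = "text-davinci-003"

  response = openai.Completion.create(
      model=chat_model,  # previously passed as engine=
      prompt="Summarize my notes on quantum computing",
      max_tokens=200,
  )
  print(response["choices"][0]["text"])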
- The CLIP image score and XMP metadata score are not combining well.
When combined they give nonsensical results. Enable only once we
figure out how best to combine the two.
- Show scores with higher precision for image search
- Image search scores seem to mostly fall between 0.2 and 0.3 for some reason
- Higher precision scores make it easier to understand the quality
of returned results as perceived by the model itself; see the sketch below
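A hypothetical sketch of the change in display precision; the
filenames and scores are made up:

  results = [("beach.jpg", 0.28731), ("sunset.jpg", 0.23142)]
  for image, score in results:
      # More decimal places distinguish closely clustered scores that
      # coarser rounding would collapse together
      print(f"{image}: {score:.5f}")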
- Reason
- Simplifies code. No merge_dict required
- One place for the user to see all configurables, defaults and
required values
- Details
- Remove default_config from code. Set defaults in khoj_sample.yml
itself. See the sketch after this list
- Keep fields that the user is required to set empty in khoj_sample.yml
- Set defaults for fields not requiring configuration by the user
- That is, sample_config.yml is renamed to khoj_sample.yml
- This makes the application config filename less generic and
more easily identifiable with the application
- Update docs and app accordingly
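A sketch of the resulting khoj_sample.yml; the exact keys and default
values are assumptions. The point is that defaults live in the sample
file while required fields stay empty:

  # khoj_sample.yml (sketch)
  content-type:
    org:
      input-files:                          # required, left empty for the user to fill
      compressed-jsonl: .notes.jsonl.gz     # default, no user action needed
      embeddings-file: .note_embeddings.pt  # default, no user action needed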
- The all-MiniLM-L6-v2 model is more accurate
- The exact previous model isn't benchmarked, but going by the
performance of the closest benchmarked model to it, the new model
seems similar in speed and size
- On very preliminary evaluation, the new model seems faster, with
pretty decent results
- The multi-qa-MiniLM-L6-cos-v1 model is more extensively benchmarked[1]
- It has the right mix of query speed, model size and performance on benchmarks
- On Hugging Face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
- It doubles the encoding speed of all entries (indexing time drops
from ~8 min to ~4 min)
- It returns more entries that stay relevant to the query (3/5 vs 1/5
earlier); see the sketch below
[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
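A minimal sketch of swapping in the new bi-encoder via the
sentence-transformers library; the entries and query are made up:

  from sentence_transformers import SentenceTransformer, util

  # Switch the bi-encoder from the msmarco model to the multi-qa model
  encoder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

  entries = ["Notes on conda environments", "Recipe for dal makhani"]
  corpus_embeddings = encoder.encode(entries, convert_to_tensor=True)
  query_embedding = encoder.encode("how to set up conda", convert_to_tensor=True)

  # The model produces normalized embeddings, so cosine similarity applies
  hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
  print(hits)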
Conda doesn't support using the same environment file across platforms.
We were able to get away with this till now by manually curating the
conda environment.yml.
But it's more robust to just add a conda environment YAML file for each
platform as necessary
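A sketch of what a per-platform file could look like; the filename and
pinned dependencies are hypothetical:

  # config/environment-linux.yml (hypothetical): one file per platform
  name: khoj
  channels:
    - conda-forge
  dependencies:
    - python=3.10
    - pip
    - pip:
        - torch                    # resolves to platform-specific wheels
        - sentence-transformers

Each is created on its platform with e.g. `conda env create -f
config/environment-linux.yml'.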
- Keeps directory paths consistent between host and container volumes
- Consistency simplifies the documentation and the updates required to
set up sample_config.yml for a local installation