Commit graph

32 commits

Author SHA1 Message Date
Debanjum
d8efcd559f
Add Feature Section to Readme
- Make Architecture a top-level section
- Minor improvement to Configure section
2022-07-25 15:43:27 -07:00
Debanjum Singh Solanky
f953b20415 Add Khoj Architecture Diagram in Docs. Show it in the Project Readme 2022-07-26 02:09:51 +04:00
Debanjum Singh Solanky
674d933282 Improve Khoj Intro text. Move Run Unit Test Section under Development Heading 2022-07-26 02:06:44 +04:00
Debanjum Singh Solanky
3728583e08 Update Readme. Add section for using Khoj via Web interface 2022-07-22 04:02:03 +04:00
Debanjum Singh Solanky
4c24202e42 Update documentation. Simplify, reflect current capabilities 2022-07-21 22:09:44 +04:00
Debanjum Singh Solanky
d4d7dbaca6 Support Natural Search on Markdown Files
- Reason:
  Allow natural search on markdown based notes, documentation,
  websites etc

- Details:
  - Create markdown processor to extract Markdown entries (identified by
    Heading) into standard jsonl format required by text_search
  - Update API, Configs to support interfacing with new markdown type
  - Update Emacs, Web clients to support interfacing with new markdown
    type via API
  - Update Readme to mention that Markdown is also supported

Closes #35
2022-07-21 22:07:05 +04:00
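
The Markdown processing described in the commit above (entries identified by heading, written out as jsonl) can be sketched roughly as follows. This is a minimal illustration, not the project's actual processor: the function names `extract_markdown_entries` and `to_jsonl` and the `raw` field name are hypothetical, and headings inside fenced code blocks are not handled.

```python
import json
import re

# An ATX-style Markdown heading: 1-6 '#' characters, whitespace, then text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def extract_markdown_entries(text):
    """Split Markdown text into entries, starting a new entry at each heading.

    Hypothetical helper for illustration. Any text before the first heading
    becomes its own entry.
    """
    entries, current = [], []
    for line in text.splitlines():
        # A heading closes the entry collected so far and starts a new one.
        if HEADING.match(line) and current:
            entries.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current).strip())
    # Drop entries that are empty after stripping whitespace.
    return [entry for entry in entries if entry]


def to_jsonl(entries):
    """Serialize entries as JSON Lines: one JSON object per line.

    The "raw" key is an assumed field name, not taken from the project.
    """
    return "\n".join(json.dumps({"raw": entry}) for entry in entries)


if __name__ == "__main__":
    sample = "# Notes\nFirst entry body\n\n## Todo\nSecond entry body\n"
    print(to_jsonl(extract_markdown_entries(sample)))
```

Splitting on headings keeps each entry self-describing, since the heading travels with its body into the search index.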
Debanjum Singh Solanky
732b2d287f Give the project a short, less generic name. Rename it to Khoj
- Semantic Search was just a placeholder used to test the idea out
  Didn't want to get into naming at that point of time
2022-07-19 18:26:16 +04:00
Debanjum Singh Solanky
4a90972e38 Use a better model for asymmetric semantic search
- The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1]
- It has the right mix of model query speed, size and performance on benchmarks
- On hugging face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
  - It doubles the encoding speed of all entries (down from ~8min to 4mins)
  - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)

[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
2022-07-18 20:27:26 +04:00
sabaimran
36ef37e940
Fix formatting for pytest command
Use org formatting rather than md.
2022-07-08 10:18:26 -04:00
Saba
07a56c4ab6 Add specific version for Python packages and downgrade miniconda Docker image to potentially fix build issues 2022-07-04 18:01:55 -04:00
Saba
092d0f2f21 Move Dockerfile to project root to avoid permissions issues. Allocate more memory to docker-compose to avoid OOM 2022-07-04 12:33:55 -04:00
Debanjum
19f8f85333
Show Demo of Semantic Search in Readme
- Use Markdown file to help upload demo to Github
- Use generated link from upload into Readme org file
2022-05-14 01:29:13 -07:00
Debanjum Singh Solanky
37bfc956c9 Update Readme Local Development Section 2022-02-27 23:16:58 -05:00
Debanjum Singh Solanky
624a3faf92 Update Readme. Improve Organization, Reduce Staleness 2022-02-26 19:04:49 -05:00
Debanjum Singh Solanky
359f25b0a4 Rename publish workflow to build. Add badge to the workflow on Readme 2022-01-29 18:11:47 -05:00
Debanjum Singh Solanky
859258864c Update Readme badge post rename of build.yml to test.yml 2022-01-29 17:10:43 -05:00
Debanjum Singh Solanky
3e889760c7 Merge sample_config, docker_sample_config yml into a single sample_config.yml
- Update readme to indicate how to update the new sample_config to run on test data
2022-01-29 01:32:12 -05:00
Debanjum Singh Solanky
2bc2780501 Mention the experimental /chat API interacts with OpenAI's API 2022-01-29 00:11:40 -05:00
Debanjum Singh Solanky
6ed667aed0 Add Troubleshooting Section, Minor Fixes to Readme 2022-01-29 00:11:40 -05:00
Saba
1ba7fa66e5 Update README and default folders in docker_sample_config.yml
- Add instructions for using Docker to the README
- Use the ./tests/data folder in docker_sample_config.yml so it can work right away for users
2022-01-28 23:20:50 -05:00
Albert Davies
2e2069f720
Fix url error in README
Misplaced quotes
2022-01-13 16:28:46 -08:00
Saba
916a1ffc73 Fix formatting of README env dependencies 2021-12-16 20:36:31 -05:00
Saba
9ebf00e29b Add instructions for installing exiftool to README (for Ubuntu only) 2021-12-11 14:13:37 -05:00
Debanjum Singh Solanky
6244ccc01a Add Configuring Application Section, Update Run command in Readme 2021-11-18 19:25:50 +05:30
Debanjum Singh Solanky
e933a5d3d0 Fix badge in readme, post workflow rename 2021-09-30 04:58:18 -07:00
Debanjum Singh Solanky
cedd723721 Add tests badge to readme. Simplify name of tests workflow 2021-09-30 04:51:47 -07:00
Debanjum Singh Solanky
150593c776 Update Readme. Acknowledge PyExifTool and Minor Fixes 2021-09-16 12:39:42 -07:00
Debanjum Singh Solanky
8dec58b12a Update Readme to state can now query beancount transactions, images 2021-08-22 21:50:27 -07:00
Debanjum Singh Solanky
78a1f4ebb4 Use YAML file to allow user to configure application. Add tests
- YAML Config
  - Can specify in the config YAML all params[1] that were earlier passed via cmd args
  - Can now also configure which sentence-transformer models to use for search, etc.
    - [1] Config params
       - org files
       - compressed entries file config path
       - embeddings file config path

  - Include sample_config.yaml
  - Include sample .org file from this repo's readmes

- CLI
  - Configuration Priority: Config via cmd > Config via YAML > Default Config
  - Test CLI, include test config.yml for the tests

- Set default type to None unless set via query param to API
  Run notes search if search_enabled, also if type is None (default)
  Prepares for running queries on all search types unless type
  specified in API query param

- Update Readme
2021-08-21 19:07:39 -07:00
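
The configuration priority stated in the commit above (config via cmd > config via YAML > default config) can be sketched as a layered merge. This is an illustrative sketch only: `resolve_config` and the keys in the usage example are hypothetical, and plain dicts stand in for the parsed YAML file and the parsed command-line arguments.

```python
def resolve_config(defaults, yaml_config, cli_args):
    """Merge three config layers; later layers override earlier ones.

    Hypothetical helper for illustration. A value of None in a layer means
    "not set" and does not override a lower-priority value.
    """
    resolved = dict(defaults)
    # Apply layers from lowest to highest priority: YAML first, then cmd args.
    for layer in (yaml_config, cli_args):
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved


if __name__ == "__main__":
    # All keys and values below are made up for the example.
    defaults = {"org_files": None, "embeddings_file": "embeddings.pt", "verbose": 0}
    from_yaml = {"org_files": ["notes.org"], "verbose": 1}
    from_cli = {"verbose": 2, "embeddings_file": None}
    print(resolve_config(defaults, from_yaml, from_cli))
```

Treating None as "not set" lets an argument parser supply None for omitted flags without wiping out values from the YAML file.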
Debanjum Singh Solanky
eddbc67358 Document how to install latest version in Readme 2021-08-17 18:27:10 -07:00
Debanjum Singh Solanky
af9660f28e Move application files under src directory. Update Readmes
- Remove the command for calling the asymmetric search script directly.
  It no longer works when called directly due to internal package
  import issues
2021-08-17 04:11:03 -07:00
Debanjum Singh Solanky
0509854e14 Replace README.md with README.org. Can be used as notes for testing 2021-08-16 20:00:05 -07:00