diff --git a/config/sample_config.yml b/config/sample_config.yml
index 7f5809c1..077c8564 100644
--- a/config/sample_config.yml
+++ b/config/sample_config.yml
@@ -8,6 +8,12 @@ content-type:
compressed-jsonl: "/data/embeddings/notes.jsonl.gz"
embeddings-file: "/data/embeddings/note_embeddings.pt"
+ markdown:
+ input-files: null
+ input-filter: "/data/markdown/*.md"
+ compressed-jsonl: "/data/embeddings/markdown.jsonl.gz"
+ embeddings-file: "/data/embeddings/markdown_embeddings.pt"
+
ledger:
input-files: null
input-filter: /data/ledger/*.beancount
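The new `markdown` entry sets `input-files` to null and relies on the `input-filter` glob. The sketch below shows one way such a pair could be turned into a list of files. It assumes that explicit files and glob matches are merged and that only `.md` files are kept. `resolve_markdown_files` is a hypothetical helper, not Khoj's actual API.

```python
from __future__ import annotations

import glob
import os
import tempfile
from typing import Iterable, Optional


def resolve_markdown_files(
    input_files: Optional[Iterable[str]], input_filter: Optional[str]
) -> list[str]:
    # Hypothetical helper, not Khoj's code: merge the explicit `input-files`
    # list with the files matched by the `input-filter` glob pattern.
    files = set(input_files or [])
    if input_filter:
        files.update(glob.glob(os.path.expanduser(input_filter)))
    # Keep only markdown files, in a stable order.
    return sorted(f for f in files if f.endswith(".md"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("notes.md", "todo.md", "ledger.beancount"):
            open(os.path.join(root, name), "w").close()
        # Mirrors the sample config: input-files is null, input-filter is a glob.
        print(resolve_markdown_files(None, os.path.join(root, "*.md")))
```

Collecting paths in a set prevents a file from being indexed twice when it is listed explicitly and also matched by the glob.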
diff --git a/docker-compose.yml b/docker-compose.yml
index fbf2a6b8..3ea99981 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -24,6 +24,7 @@ services:
- ./tests/data/images/:/data/images/
- ./tests/data/ledger/:/data/ledger/
- ./tests/data/music/:/data/music/
+ - ./tests/data/markdown/:/data/markdown/
# Embeddings and models are populated after the first run
# You can set these volumes to point to empty directories on host
- ./tests/data/embeddings/:/data/embeddings/
diff --git a/tests/data/markdown/interface_emacs_readme.md b/tests/data/markdown/interface_emacs_readme.md
new file mode 100644
index 00000000..c61abf80
--- /dev/null
+++ b/tests/data/markdown/interface_emacs_readme.md
@@ -0,0 +1,69 @@
+# Emacs Khoj
+
+*An Emacs interface for [Khoj](https://github.com/debanjum/khoj)*
+
+## Requirements
+
+- Install and Run [Khoj](https://github.com/debanjum/khoj)
+
+## Installation
+
+- Direct Install
+ - Put `khoj.el` in your Emacs load path, e.g. \~/.emacs.d/lisp
+
+ - Load it via `use-package` in your \~/.emacs.d/init.el or .emacs
+ file by adding the snippet below
+
+ ``` elisp
+ ;; Khoj Package
+ (use-package khoj
+ :load-path "~/.emacs.d/lisp"
+ :bind ("C-c s" . 'khoj))
+ ```
+- With [straight.el](https://github.com/raxod502/straight.el)
+ - Add the snippet below to your \~/.emacs.d/init.el or .emacs config
+ file and evaluate it.
+
+ ``` elisp
+ ;; Khoj Package for Semantic Search
+ (use-package khoj
+ :after org
+ :straight (khoj :type git :host github :repo "debanjum/khoj" :files (:defaults "src/interface/emacs/khoj.el"))
+ :bind ("C-c s" . 'khoj))
+ ```
+- With [Quelpa](https://github.com/quelpa/quelpa#installation)
+ - Ensure [Quelpa](https://github.com/quelpa/quelpa#installation) and
+ [quelpa-use-package](https://github.com/quelpa/quelpa-use-package#installation)
+ are installed
+
+ - Add the snippet below to your \~/.emacs.d/init.el or .emacs config
+ file and evaluate it.
+
+ ``` elisp
+ ;; Khoj Package
+ (use-package khoj
+ :after org
+ :quelpa (khoj :fetcher url :url "https://raw.githubusercontent.com/debanjum/khoj/master/src/interface/emacs/khoj.el")
+ :bind ("C-c s" . 'khoj))
+ ```
+
+## Usage
+
+1. Open Query Interface on Client
+
+ - In Emacs: Call `khoj` using keybinding `C-c s` or `M-x khoj`
+ - On Web: Open [localhost:8000](http://localhost:8000/)
+
+2. Query in Natural Language
+
+ For example: "What is the meaning of life?", "What are my life goals?"
+
+ **Note: A query takes about 4s on a Mac M1 with a \>100K line corpus of
+ notes**
+
+3. (Optional) Narrow down results further
+
+ Include or exclude specific words, or restrict results to a date range,
+ by adding filters to the query in the format below
+
+ e.g. `What is the meaning of life? -god +none dt:"last week"`
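The filter format in step 3 of the Emacs readme above can be illustrated with a small parser. This is a sketch built on assumptions inferred from the single example query: `-word` excludes a word, `+word` requires a word, and `dt:"..."` holds a date phrase. `parse_query` and `ParsedQuery` are hypothetical names, and Khoj's own filter handling may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical filter grammar, inferred from the example query:
#   dt:"..."  a date phrase,  -word  exclude a word,  +word  require a word
DATE_FILTER = re.compile(r'dt:"([^"]*)"')
WORD_FILTER = re.compile(r'(?:^|\s)([+-])(\w+)')


@dataclass
class ParsedQuery:
    text: str
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)
    dates: list[str] = field(default_factory=list)


def parse_query(raw: str) -> ParsedQuery:
    # Pull out date filters first so their quoted text is not scanned for words.
    dates = DATE_FILTER.findall(raw)
    remainder = DATE_FILTER.sub(" ", raw)
    include: list[str] = []
    exclude: list[str] = []
    for sign, word in WORD_FILTER.findall(remainder):
        (include if sign == "+" else exclude).append(word)
    # Whatever is left, with whitespace normalised, is the natural-language query.
    text = " ".join(WORD_FILTER.sub(" ", remainder).split())
    return ParsedQuery(text, include, exclude, dates)


if __name__ == "__main__":
    print(parse_query('What is the meaning of life? -god +none dt:"last week"'))
```

Word filters only match when the sign follows whitespace or the start of the query. As a result, hyphenated words such as "long-term" stay part of the query text.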
diff --git a/tests/data/markdown/main_readme.md b/tests/data/markdown/main_readme.md
new file mode 100644
index 00000000..682515aa
--- /dev/null
+++ b/tests/data/markdown/main_readme.md
@@ -0,0 +1,153 @@
+![](https://github.com/debanjum/khoj/actions/workflows/test.yml/badge.svg)
+![](https://github.com/debanjum/khoj/actions/workflows/build.yml/badge.svg)
+
+# Khoj
+
+*Natural language search over user content like notes, images and
+transactions, using transformer ML models*
+
+Users can interact with Khoj via [Web](./src/interface/web/index.html),
+[Emacs](./src/interface/emacs/khoj.el) or the API. All search is done
+locally[\*](https://github.com/debanjum/khoj#miscellaneous).
+
+## Demo
+
+
+
+## Setup
+
+### 1. Clone
+
+``` shell
+git clone https://github.com/debanjum/khoj && cd khoj
+```
+
+### 2. Configure
+
+- \[Required\] Update [docker-compose.yml](./docker-compose.yml) to
+ mount your images, notes (org-mode or markdown) and beancount
+ directories
+- \[Optional\] Edit application configuration in
+ [sample_config.yml](./config/sample_config.yml)
+
+### 3. Run
+
+``` shell
+docker-compose up -d
+```
+
+*Note: The first run takes a while. Let it run; it is most likely not hung,
+just generating embeddings*
+
+## Use
+
+- **Khoj via API**
+ - See [Khoj API Docs](http://localhost:8000/docs)
+ - [Query](http://localhost:8000/search?q=%22what%20is%20the%20meaning%20of%20life%22)
+ - [Regenerate
+ Embeddings](http://localhost:8000/regenerate?t=ledger)
+ - [Configure Application](http://localhost:8000/ui)
+- **Khoj via Emacs**
+ - [Install](https://github.com/debanjum/khoj/tree/master/src/interface/emacs#installation)
+ [khoj.el](./src/interface/emacs/khoj.el)
+ - Run `M-x khoj `
+
+## Run Unit tests
+
+``` shell
+pytest
+```
+
+## Upgrade
+
+``` shell
+docker-compose build --pull
+```
+
+## Troubleshooting
+
+- Symptom: Errors out with \"Killed\" in the error message
+ - Fix: Increase the RAM available to Docker containers in Docker
+ settings
+ - Refer: [StackOverflow
+ Solution](https://stackoverflow.com/a/50770267), [Configure
+ Resources on Docker for
+ Mac](https://docs.docker.com/desktop/mac/#resources)
+- Symptom: Errors out complaining about tensor mismatches, nulls, etc.
+ - Mitigation: Delete the content-type \> image section from
+ docker_sample_config.yml
+
+## Miscellaneous
+
+- The experimental [chat](http://localhost:8000/chat) API endpoint uses the
+ [OpenAI API](https://openai.com/api/)
+ - It is disabled by default
+ - To use it, add your `openai-api-key` to config.yml
+
+## Development Setup
+
+### Setup on Local Machine
+
+1. Install Dependencies
+
+ 1. Install Python3 \[Required\]
+
+ 2. [Install
+ Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
+ \[Required\]
+
+ 3. Install Exiftool \[Optional\]
+
+ ``` shell
+ sudo apt-get -y install libimage-exiftool-perl
+ ```
+
+2. Install Khoj
+
+ ``` shell
+ git clone https://github.com/debanjum/khoj && cd khoj
+ conda env create -f config/environment.yml
+ conda activate khoj
+ ```
+
+3. Configure
+
+ - Configure files/directories to search in `content-type` section
+ of `sample_config.yml`
+ - To run the application on test data, update file paths containing
+ `/data/` to `tests/data/` in `sample_config.yml`
+ - For example, replace `/data/notes/*.org` with
+ `tests/data/notes/*.org`
+
+4. Run
+
+ Load the ML models, generate embeddings and expose an API to query the notes,
+ images, transactions etc. specified in the config YAML
+
+ ``` shell
+ python3 -m src.main -c=config/sample_config.yml -vv
+ ```
+
+### Upgrade On Local Machine
+
+``` shell
+cd khoj
+git pull origin master
+conda deactivate
+conda env update -f config/environment.yml
+conda activate khoj
+```
+
+## Acknowledgments
+
+- [Multi-QA MiniLM
+ Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1)
+ for Asymmetric Text Search. See [SBert
+ Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
+- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image
+ Search. See [SBert
+ Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
+- Charles Cave for [OrgNode
+ Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
+- Sven Marnach for
+ [PyExifTool](https://github.com/smarnach/pyexiftool/blob/master/exiftool.py)
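The Use section of main_readme.md above links to the search and regenerate endpoints, with their parameters URL-encoded by hand. The sketch below builds the same URLs programmatically. It assumes the server runs at `http://localhost:8000`, as in those links, and uses only the `q` (search) and `t` (regenerate) parameters shown there. `search_url` and `regenerate_url` are hypothetical helpers, not part of Khoj.

```python
from urllib.parse import quote, urlencode

# Assumption: Khoj serves its API at this address, as in the readme links.
BASE_URL = "http://localhost:8000"


def search_url(query: str) -> str:
    # quote_via=quote encodes spaces as %20 (matching the readme link)
    # rather than the "+" that urlencode's default quote_plus would produce.
    return f"{BASE_URL}/search?{urlencode({'q': query}, quote_via=quote)}"


def regenerate_url(content_type: str) -> str:
    # `t` selects the content type whose embeddings are regenerated.
    return f"{BASE_URL}/regenerate?{urlencode({'t': content_type}, quote_via=quote)}"


if __name__ == "__main__":
    print(search_url('"what is the meaning of life"'))
    print(regenerate_url("ledger"))
```

The resulting URLs can be opened in a browser or fetched with `urllib.request.urlopen` once the server is running.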