Commit graph

22 commits

Author SHA1 Message Date
sabaimran
1e2af083f0 Rename the data_sources module to content 2023-11-21 22:11:32 -08:00
sabaimran
b8e6883a81 Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search 2023-11-19 16:20:08 -08:00
Debanjum Singh Solanky
55785d50c3 Use title, when present, as root ancestor of entries instead of file path 2023-11-17 15:03:27 -08:00
sabaimran
ec06d2c446 Move data indexer files into a separate folder under processor. Update assoc UTs 2023-11-16 17:19:55 -08:00
Debanjum Singh Solanky
305c25ae1a Track ancestor headings for each org-mode entry in org-node parser 2023-11-16 02:39:14 -08:00
sabaimran
4854258047
Move to a push-first model for retrieving embeddings from local files (#457)
* Initial version - setup a file-push architecture for generating embeddings with Khoj
* Update unit tests to fix with new application design
* Allow configure server to be called without regenerating the index; this no longer works because the API for indexing files is not up in time for the server to send a request
* Use state.host and state.port for configuring the URL for the indexer
* On application startup, load in embeddings from configurations files, rather than regenerating the corpus based on file system
2023-08-31 12:55:17 -07:00
sabaimran
2697c7a186 Update org tests to use new method, update Github configuration in tests 2023-06-27 15:04:48 -07:00
Debanjum Singh Solanky
fe03ba3dce Index intro text before headings in org files
- Text before headings was not being indexed due to buggy orgnode
  parsing logic
- Resolved indexing intro text from files with and without headings in
  them
- Ensure intro text node has heading set to all title lines collected
  from the file

Resolves #165
2023-03-01 12:11:33 -06:00
Debanjum Singh Solanky
47569da38e Fix usage of "\" in orgnode test string to resolve DeprecationWarning 2023-02-17 17:15:44 -06:00
Debanjum Singh Solanky
051f0e3fb5 Add, configure and run pre-commit locally and in test workflow 2023-02-17 13:31:36 -06:00
Debanjum Singh Solanky
5e83baab21 Use Black to format Khoj server code and tests 2023-02-17 11:55:17 -06:00
Debanjum Singh Solanky
25a749ca1d Use the src/ layout to fix packaging Khoj for PyPi
- Why
  The khoj pypi packages should be installed in `khoj' directory.
  Previously it was being installed into `src' directory, which is a
  generic top level directory name that is discouraged from being used

- Changes
 - move src/* to src/khoj/*
 - update `setup.py' to `find_packages' in `src' instead of project root
 - rename imports to form `from khoj.*' in complete project
 - update `constants.web_directory' path to use `khoj' directory
 - rename root logger to `khoj' in `main.py'
 - fix image_search tests to use the newly rename `khoj' logger
 - update config, docs, workflows to reference new path `src/khoj'
2023-02-14 15:19:06 -06:00
Debanjum Singh Solanky
9d369ae4df Fix OrgNode render of entries with property drawers and empty body
- Issue
  - Indent regex was previously catching escape sequences like newlines
  - This was resulting in entries with only escape sequences in body to
    be prepended to property drawers etc during rendering
- Fix
  - Update indent regex to only look for spaces in each line
  - Only render body when body contains non-escape characters
  - Create test to prevent this regression from silently resurfacing
2022-09-11 16:09:19 +03:00
Debanjum Singh Solanky
1d3b3d5f39 Convert field get/set methods in OrgNode class to @property
- Use more descriptive variable names in OrgNode parser and class
- Convert OrgNode fields to private/protected, use property methods to
  get/set them
2022-09-11 14:59:28 +03:00
Debanjum Singh Solanky
ebd5039bd1 Merge branch 'master' into support-incremental-updates-of-embeddings 2022-09-10 22:37:13 +03:00
Debanjum Singh Solanky
b9a6e80629 Make OrgNode tags stable sorted to find new entries for incremental updates
- Having Tags as sets was returning them in a different order
  everytime
- This resulted in spuriously identifying existing entries as new
  because their tags ordering changed
- Converting tags to list fixes the issue and identifies updated new
  entries for incremental update correctly
2022-09-10 20:59:52 +03:00
Debanjum Singh Solanky
976397bd82 Ignore empty #+TITLE, merge multiple #+TITLE for 0th level headings 2022-09-10 15:34:47 +03:00
Debanjum Singh Solanky
11917c6ddd Do not normalize absolute filenames for creating links in OrgNode 2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky
a02d9db457 Test Task State Extraction in OrgNode Tests 2022-08-10 13:48:18 +03:00
Debanjum Singh Solanky
d50bfb5188 Parse Logbook Entries in the OrgNode parser for Org-Mode. Update tests 2022-07-21 00:15:30 +04:00
Debanjum Singh Solanky
85fbe1c42b Normalize org notes path to be relative to home directory
- This is still clunky but it should be commitable
- General enough that it'll work even when a users notes are not in the home directory
- While solving for the special case where:
  - Notes are being processed on a different machine and used on a different machine
  - But the notes directory is in the same location relative to home on both the machines
2022-06-28 19:16:11 +04:00
Debanjum Singh Solanky
f66192f2a7 Test OrgNode Parsing and Rendering 2022-06-17 19:13:11 +03:00