khoj/src/search_filter
Debanjum Singh Solanky b673d26a12 Extract Entries in a standardized format across text search types
Issue:
 - Extracted entries had different schemas for symmetric_ledger vs asymmetric search

 - Entry extraction for asymmetric search was messy, relying on cryptic
   indices to tell the raw entry from the cleaned entry meant to be passed
   to the embedding model

 - This pushed the burden of figuring out which property to extract
   from each entry onto downstream processes like the filters

 - This limited the filters to only work for asymmetric search, not for
   symmetric_ledger

Fix:
   - Use consistent format for extracted entries
     {
       'embed': entry_string_meant_to_be_passed_to_model_and_get_embeddings,
       'raw'  : raw_entry_string_meant_to_be_passed_to_user
     }

Result:
   - Now filters can be applied across search types, and the specific
     field they should be applied on can be configured by each search
     type
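
A minimal sketch of the idea described above, not the code from this commit. It assumes entries are plain dicts with the 'embed' and 'raw' keys from the message; the helper names make_entry and exclude_words, and the field parameter, are hypothetical and only illustrate how a filter could be pointed at either field by each search type.

```python
import re


def make_entry(raw, clean=str.strip):
    """Build one entry in the standardized format (hypothetical helper).

    'embed' holds the cleaned string meant for the embedding model,
    'raw' holds the original string.
    """
    return {"embed": clean(raw), "raw": raw}


def exclude_words(entries, blocked_words, field):
    """Drop entries whose chosen field contains any blocked word.

    `field` is an assumed configuration knob: each search type picks
    whether the filter inspects 'raw' or 'embed'.
    """
    blocked = {word.lower() for word in blocked_words}
    kept = []
    for entry in entries:
        tokens = set(re.findall(r"\w+", entry[field].lower()))
        if not blocked & tokens:
            kept.append(entry)
    return kept


entries = [
    # Strip leading org-mode style heading markers before embedding.
    make_entry("** TODO Buy milk", clean=lambda s: s.lstrip("* ")),
    make_entry("Paid rent 1200 USD"),
]
kept = exclude_words(entries, ["milk"], field="raw")
```

Because every entry has the same shape, the same filter function works for any search type; only the field argument changes.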
2022-07-19 20:52:25 +04:00
..
date_filter.py Extract Entries in a standardized format across text search types 2022-07-19 20:52:25 +04:00
explicit_filter.py Extract Entries in a standardized format across text search types 2022-07-19 20:52:25 +04:00