Issue:
- Had different schema of extracted entries for symmetric_ledger vs asymmetric
- Entry extraction for asymmetric was dirty, relying on cryptic
indices to store raw entry vs cleaned entry meant to be passed to embeddings
- This was pushing the load of figuring out what property to extract
from each entry to downstream processes like the filters
- This limited the filters to only work for asymmetric search, not for
symmetric_ledger
- Fix
- Use consistent format for extracted entries
{
'embed': entry_string_meant_to_be_passed_to_model_and_get_embeddings,
'raw' : raw_entry_string_meant_to_be_passed_to_use
}
- Result
- Now filters can be applied across search types, and the specific
field they should be applied on can be configured by each search
type
- With \t Last Word in Headings was suffixed by \t and so couldn't be
filtered by
- User interacts with raw entries, so run explicit filters on raw entry
- For semantic search using the filtered entry is cleaner, still
- Fix date_filter date_in_entry within query range check
- Extracted_date_range is in [included_date, excluded_date) format
- But check was checking for date_in_entry <= excluded_date
- Fixed it to do date_in_entry < excluded_date
- Fix removal of date filter from query
- Add tests for date_filter
- Default to looking at dates from past, as most notes are from past
- Look for dates in future for cases where it's obvious query is for
dates in the future but dateparser's parse doesn't parse it at all.
E.g parse('5 months from now') returns nothing
- Setting PREFER_DATES_FROM_FUTURE in this case and passing just
parse('5 months') to dateparser.parse works as expected
Details
--
- Move explicit_filters function into separate module under search_filter
- Update signature of explicit filter to take and return query, entries, embeddings
- Use this explicit_filter func from search_filters module in query
Reason
--
Abstraction will simplify adding other pre-search filters. E.g datetime filter