13 commits

Author SHA1 Message Date
Debanjum Singh Solanky
b673d26a12 Extract Entries in a standardized format across text search types
Issue
 - Extracted entries had a different schema for symmetric_ledger vs asymmetric

 - Entry extraction for asymmetric was messy, relying on cryptic
   indices to tell the raw entry apart from the cleaned entry meant to
   be passed to embeddings

 - This pushed the work of figuring out which property to extract
   from each entry onto downstream processes like the filters

 - This limited the filters to asymmetric search; they did not work for
   symmetric_ledger

Fix
 - Use a consistent format for extracted entries
   {
     'embed': entry_string_meant_to_be_passed_to_model_and_get_embeddings,
     'raw'  : raw_entry_string_meant_to_be_passed_to_user
   }

Result
 - Filters can now be applied across search types, and the specific
   field they should be applied on can be configured by each search
   type
2022-07-19 20:52:25 +04:00
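
The standardized entry format described in b673d26a12 could be sketched roughly as follows. Only the 'embed' and 'raw' keys come from the commit message; the function names, the whitespace-collapsing cleanup step, and the substring match are assumptions made for illustration, not the repository's actual code.

```python
# Hypothetical sketch: only the 'embed' and 'raw' keys are taken from the
# commit message. All names and the cleaning step are illustrative.

def extract_entry(raw_text: str) -> dict:
    """Return one entry in the standardized {'embed', 'raw'} format."""
    # Assumed cleaning step: collapse runs of whitespace before embedding.
    cleaned = " ".join(raw_text.split())
    return {"embed": cleaned, "raw": raw_text}


def filter_entries(entries: list, term: str, field: str = "raw") -> list:
    """Keep entries whose configured field contains the term (case-insensitive)."""
    return [entry for entry in entries if term.lower() in entry[field].lower()]


entries = [
    extract_entry("* Heading\tone\n  body text"),
    extract_entry("* Other   note"),
]
# Each search type could choose which field its filters run on.
matches = filter_entries(entries, "heading", field="raw")
```

Because every search type produces the same two keys, a filter only needs to be told which key to read rather than knowing a per-type index layout.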
Debanjum Singh Solanky
85077bc1d1 Handle unparseable date range passed via date filter in query
- Do not reuse the same list
- Create a new list instead, so it holds only successfully parsed dates
2022-07-14 22:47:23 +04:00
Debanjum Singh Solanky
c3b3e8959d Put entry splitting regex in explicit filter into a variable for code readability 2022-07-14 22:00:10 +04:00
Debanjum Singh Solanky
3aac3c7d52 Run explicit filter on raw entry, add more terms to split entries by
- The last word in a heading was suffixed by \t and so could not be
  filtered on; splitting on \t as well fixes this
- Users interact with raw entries, so run explicit filters on the raw entry
   - For semantic search, using the filtered entry is still cleaner
2022-07-14 21:54:04 +04:00
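
A minimal sketch of the idea in 3aac3c7d52 and c3b3e8959d: keep the splitting regex in a named variable and include tab among the separators, so a heading's last word is not left with a trailing \t. The exact separator set, the function names, and the "all required words must appear" semantics are assumptions; the commits only confirm that \t was added and that the filter runs on the raw entry.

```python
import re

# Assumed separator set; the commit only confirms that tab was added.
ENTRY_WORD_SEPARATORS = r"[ \t\n,.;:!?]+"


def words_in(raw_entry: str) -> set:
    """Split a raw entry into lowercase words using the named regex."""
    return {word.lower() for word in re.split(ENTRY_WORD_SEPARATORS, raw_entry) if word}


def explicit_filter(raw_entries: list, required_words: list) -> list:
    """Keep raw entries that contain every required word (assumed semantics)."""
    required = {word.lower() for word in required_words}
    return [entry for entry in raw_entries if required <= words_in(entry)]


raw_entries = ["* Weekly Plan\t:work:\nnotes", "* Groceries\nbuy milk"]
kept = explicit_filter(raw_entries, ["plan"])
```

Without \t in the separator set, the first entry would yield the token "plan\t:work" instead of "plan", and the filter would miss it.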
Debanjum Singh Solanky
7640e2ab0c Wrap attempt to extract dates from entry in try/catch
- Not all YYYY-MM-DD strings in entry are necessarily dates
2022-07-14 21:38:00 +04:00
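
The guard described in 7640e2ab0c might look something like this. The pattern and function name are illustrative assumptions; the point is only that a string can match the YYYY-MM-DD shape without being a valid calendar date, so the parse is wrapped in try/except.

```python
import re
from datetime import datetime

# Matches the YYYY-MM-DD shape only; it does not validate month or day.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def dates_in_entry(entry: str) -> list:
    """Return the valid dates found in an entry, skipping look-alikes."""
    dates = []
    for candidate in DATE_PATTERN.findall(entry):
        try:
            dates.append(datetime.strptime(candidate, "%Y-%m-%d"))
        except ValueError:
            # e.g. "2022-13-45" has the right shape but is not a real date
            continue
    return dates


found = dates_in_entry("Met on 2022-07-14, ticket 2022-13-45")
```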
Debanjum Singh Solanky
9de2097182 Fix date filter usage with multi word queries. Simplify date regex 2022-07-14 21:34:33 +04:00
Debanjum Singh Solanky
dcb6fe479e Fix date_filter query, entry in query range check. Add tests for it
- Fix date_filter check that date_in_entry is within the query range
  - Extracted_date_range is in [included_date, excluded_date) format
  - But the check tested date_in_entry <= excluded_date
  - Fixed it to test date_in_entry < excluded_date

- Fix removal of date filter from query
- Add tests for date_filter
2022-07-14 20:01:35 +04:00
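
The boundary fix in dcb6fe479e comes down to treating the range as half-open. A small sketch, with a hypothetical function name and the variable names borrowed from the commit message:

```python
from datetime import datetime


def date_in_range(date_in_entry, included_date, excluded_date) -> bool:
    """Check membership in the half-open range [included_date, excluded_date)."""
    # Using < rather than <= keeps the excluded end out of the range.
    return included_date <= date_in_entry < excluded_date


start = datetime(2022, 7, 14)
end = datetime(2022, 7, 15)
on_start = date_in_range(start, start, end)
on_end = date_in_range(end, start, end)
```

With the earlier <= comparison, an entry dated exactly on excluded_date would have been matched even though it lies outside the range.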
Debanjum Singh Solanky
011f81fac5 Fix date_filter to handle non overlapping date ranges 2022-07-14 18:53:38 +04:00
Debanjum Singh Solanky
70ac35b2a5 Compute Date Range to filter entries to, from Comparators, Dates in Query 2022-07-14 18:20:09 +04:00
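
One plausible reading of 70ac35b2a5 and 011f81fac5 is that each date constraint in the query yields a half-open range, and the effective range is their intersection, which is empty when the constraints do not overlap. The sketch below assumes that reading; the function name and the use of None for "no overlap" are illustrative and not taken from the repository.

```python
from datetime import datetime


def intersect_ranges(ranges: list):
    """Intersect [start, end) ranges; return None when they do not overlap."""
    if not ranges:
        return None
    start = max(r[0] for r in ranges)  # latest start wins
    end = min(r[1] for r in ranges)    # earliest end wins
    return (start, end) if start < end else None


overlapping = intersect_ranges([
    (datetime(2022, 1, 1), datetime(2022, 7, 1)),
    (datetime(2022, 4, 1), datetime(2023, 1, 1)),
])
disjoint = intersect_ranges([
    (datetime(2022, 1, 1), datetime(2022, 2, 1)),
    (datetime(2022, 3, 1), datetime(2022, 4, 1)),
])
```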
Debanjum Singh Solanky
e6db3e3d00 Prefer Dates From Future only when specific words in date string
- Default to looking at dates from the past, as most notes are from the past
- Look for dates in the future when it's obvious the query is for
  dates in the future but dateparser's parse doesn't parse it at all.
  E.g. parse('5 months from now') returns nothing

- Setting PREFER_DATES_FROM_FUTURE in this case and passing just
  '5 months' to dateparser.parse works as expected
2022-07-14 18:13:12 +04:00
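
The approach in e6db3e3d00 can be sketched as a small preprocessing step. The list of hint words and the function name are assumptions, since the commit does not say which words trigger the future preference. The settings key used here is dateparser's `PREFER_DATES_FROM` with the value `'future'` or `'past'`, which is what the commit's PREFER_DATES_FROM_FUTURE appears to refer to. The helper itself does not import dateparser, so it runs without the library installed.

```python
import re

# Assumed hint words; the commit only says "specific words in date string".
FUTURE_HINTS = re.compile(r"\b(from now|later|in the future)\b", re.IGNORECASE)


def prepare_date_string(date_string: str):
    """Return (cleaned string, dateparser settings) for a natural date string."""
    if FUTURE_HINTS.search(date_string):
        # Strip the hint so only the part dateparser can handle is passed on.
        cleaned = FUTURE_HINTS.sub("", date_string).strip()
        return cleaned, {"PREFER_DATES_FROM": "future"}
    # Default to the past, as most notes are from the past.
    return date_string.strip(), {"PREFER_DATES_FROM": "past"}


text, settings = prepare_date_string("5 months from now")

# Usage, assuming dateparser is installed:
#   import dateparser
#   parsed = dateparser.parse(text, settings=settings)
```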
Debanjum Singh Solanky
4a201d52af Add, test date filter regex and date parsing to get natural date range 2022-07-14 16:47:32 +04:00
Debanjum Singh Solanky
b54588717f Filter for entries with dates specified by user in query
- Create Date filter
  - Users can pass dates in YYYY-MM-DD format in their query
- Use it to filter asymmetric search to user specified dates
2022-07-14 00:51:02 +04:00
Debanjum Singh Solanky
c92789d20a Extract explicit pre-search filter function into a separate module
Details
--
- Move explicit_filters function into separate module under search_filter
- Update signature of explicit filter to take and return query, entries, embeddings
- Use this explicit_filter func from search_filters module in query

Reason
--
Abstraction will simplify adding other pre-search filters, e.g. a datetime filter
2022-07-13 16:20:04 +04:00
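
The signature change in c92789d20a (each filter takes and returns query, entries, embeddings) is what makes filters chainable. A rough sketch under stated assumptions: the "+word" syntax, the function names, and the use of plain lists for embeddings are all hypothetical and chosen only to keep the example self-contained.

```python
def required_word_filter(query, entries, embeddings):
    """Drop entries missing any "+word" term; return the stripped query too."""
    words = query.split()
    # Assumed syntax: a word prefixed with "+" must appear in the entry.
    required = [w[1:].lower() for w in words if w.startswith("+") and len(w) > 1]
    remaining_query = " ".join(w for w in words if not w.startswith("+"))
    keep = [i for i, entry in enumerate(entries)
            if all(term in entry.lower() for term in required)]
    # Entries and embeddings are filtered together so they stay aligned.
    return remaining_query, [entries[i] for i in keep], [embeddings[i] for i in keep]


def apply_filters(filters, query, entries, embeddings):
    """Run each pre-search filter in turn, threading the three values through."""
    for search_filter in filters:
        query, entries, embeddings = search_filter(query, entries, embeddings)
    return query, entries, embeddings


result = apply_filters(
    [required_word_filter],
    "notes +plan",
    ["weekly plan", "groceries"],
    [[0.1], [0.2]],
)
```

With a shared signature like this, a later filter such as a date filter could be appended to the list without changing the caller.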