khoj/tests/evals at 7c0fd71bfd2803f40b1c39937748b4ab67108c28 - sij/khoj

sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-11-27 09:25:06 +01:00


Debanjum 7c0fd71bfd Add GitHub workflow to quiz Khoj across modes and specified evals (#982)	2024-11-18 02:19:30 -08:00
eval.py	Add GitHub workflow to quiz Khoj across modes and specified evals (#982)	2024-11-18 02:19:30 -08:00

Add GitHub workflow to quiz Khoj across modes and specified evals (#982)

- Evaluate Khoj on 200 random questions from each of the Google FRAMES and OpenAI SimpleQA benchmarks across *general*, *default* and *research* modes
- Run eval with Gemini 1.5 Flash as the test giver model and Gemini 1.5 Pro as the test evaluator model
- Trigger eval workflow on release or manually
- Make dataset, Khoj mode and sample size configurable when the workflow is triggered manually
- Enable web search and webpage read tools during evaluation
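
The sampling and mode sweep described in the list above could look roughly like the following. This is a minimal sketch, not the code in eval.py: the function names, the dataset keys `frames` and `simpleqa`, the `seed` parameter, and the choice to reuse one sample across all modes are assumptions made for illustration.

```python
import random

# Hypothetical identifiers for illustration; eval.py may name these differently.
DATASETS = ("frames", "simpleqa")
MODES = ("general", "default", "research")


def sample_questions(questions, sample_size=200, seed=None):
    """Pick up to `sample_size` questions at random, without replacement."""
    rng = random.Random(seed)
    pool = list(questions)
    return rng.sample(pool, min(sample_size, len(pool)))


def plan_eval_runs(question_bank, datasets=DATASETS, modes=MODES,
                   sample_size=200, seed=None):
    """Build one run per (dataset, mode) pair.

    Assumption: the same random sample is reused for every mode of a dataset,
    so that modes are compared on identical questions.
    """
    runs = []
    for dataset in datasets:
        sampled = sample_questions(question_bank[dataset], sample_size, seed)
        for mode in modes:
            runs.append({"dataset": dataset, "mode": mode, "questions": sampled})
    return runs
```

With the defaults, two datasets and three modes give six runs of 200 questions each. Passing a single dataset, a single mode, or a smaller `sample_size` mirrors the configurable inputs of a manually triggered workflow.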

2024-11-18 02:19:30 -08:00
