sij/khoj

mirror of https://github.com/khoj-ai/khoj.git synced 2024-12-19 02:57:10 +00:00

Author	SHA1	Message	Date
Debanjum	29e801c381	Add MATH500 dataset to eval Evaluate simpler MATH500 responses with gemini 1.5 flash This improves both the speed and cost of running this eval	2024-11-28 12:48:25 -08:00
Debanjum	22aef9bf53	Add GPQA (diamond) dataset to eval	2024-11-28 12:48:25 -08:00
Debanjum	8dd2122817	Set sample size to 200 for automated eval runs as well	2024-11-23 14:48:38 -08:00
Debanjum	50d8405981	Enable khoj to use terrarium code sandbox as tool in eval workflow	2024-11-20 14:19:27 -08:00
Debanjum	ffbd0ae3a5	Fix eval github workflow to run on releases, i.e on tags push	2024-11-20 12:57:42 -08:00
Debanjum	a2ccf6f59f	Fix github workflow to start Khoj, connect to PG and upload results - Do not trigger tests to run in ci on update to evals	2024-11-18 04:25:15 -08:00
Debanjum	7c0fd71bfd	Add GitHub workflow to quiz Khoj across modes and specified evals (#982 ) - Evaluate khoj on random 200 questions from each of google frames and openai simpleqa benchmarks across general, default and research modes - Run eval with Gemini 1.5 Flash as test giver and Gemini 1.5 Pro as test evaluator models - Trigger eval workflow on release or manually - Make dataset, khoj mode and sample size configurable when triggered via manual workflow - Enable Web search, webpage read tools during evaluation	2024-11-18 02:19:30 -08:00