Set page size to 100 to reduce requests to the GitHub API to about 1/3

- Default is 30, so the number of paginated requests required to get all items (commits, files) will drop by roughly 70%
- No need to increase page size for the get tree GitHub API request from `get_markdown_files`. The get tree GitHub API doesn't support pagination and returns up to 100K items in a response. This should be way more than enough for our current use-cases
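
The request savings claimed above can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the commit: `pages_needed` is a hypothetical helper, and the item count of 1000 is an assumed example, not a measurement from any repository.

```python
import math

DEFAULT_PER_PAGE = 30   # default page size, as stated in the commit message
NEW_PER_PAGE = 100      # page size set by this commit


def pages_needed(total_items: int, per_page: int) -> int:
    """Hypothetical helper: number of paginated requests needed for total_items.

    Even an empty result costs one request, hence the lower bound of 1.
    """
    return max(1, math.ceil(total_items / per_page))


# Assumed example: a repository with 1000 commits.
before = pages_needed(1000, DEFAULT_PER_PAGE)
after = pages_needed(1000, NEW_PER_PAGE)
reduction = 1 - after / before

print(f"{before} requests -> {after} requests ({reduction:.0%} fewer)")
```

For small repositories the saving is smaller or zero, since any result set of 30 items or fewer already fits in a single request at either page size.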
2024-11-24 07:55:07 +01:00 · 2023-06-18 01:20:05 -07:00 · 2023-06-18 01:20:05 -07:00 · 6fdac24416
commit 6fdac24416
parent 87975e589a
1 changed file with 2 additions and 1 deletion
--- a/src/khoj/processor/github/github_to_jsonl.py
+++ b/src/khoj/processor/github/github_to_jsonl.py
@@ -117,11 +117,12 @@ class GithubToJsonl(TextToJsonl):
        # Get commit messages from the repository using the Github API
        commits_url = f"{self.repo_url}/commits"
        headers = {"Authorization": f"token {self.config.pat_token}"}
+        params = {"per_page": 100}
        commits = []
        while commits_url is not None:
            # Get the next page of commits
-            response = requests.get(commits_url, headers=headers)
+            response = requests.get(commits_url, headers=headers, params=params)
            raw_commits = response.json()
            # Wait for rate limit reset if needed