---
sidebar_position: 4
slug: /privacy
---
# Privacy
If you're using Khoj to index your personal data, you will almost certainly be indexing sensitive and private information.

Khoj is designed to be a personal AI, so one of our cornerstone principles is to make it as privacy-friendly as possible. That's why you can *always* choose to run Khoj on your own hardware and never share your data outside of your device. You can generate your embeddings directly on your machine and then use an offline chat client, so that your data never leaves your machine. See the [self-hosting instructions](./setup.mdx) to get started.

Here's what to consider if you're using Khoj, whether self-hosted or on our cloud:

1. Some of your relevant indexed data may be included as context when you chat with Khoj. This means it may be sent to OpenAI if you use one of the OpenAI models.
1. We collect completely anonymized usage telemetry and send it to [PostHog](https://posthog.com/). This includes data like unique chat requests, unique search requests, and unique requests to index data. We collect usage data to understand how people use Khoj and to help us prioritize features.
    - We do not log your IP address, nor upload any of your personal data to PostHog.
    - You can see our telemetry aggregation code [here](https://github.com/khoj-ai/khoj/blob/master/src/khoj/routers/helpers.py#L71) and see our telemetry server [here](https://github.com/khoj-ai/khoj/blob/master/src/telemetry/telemetry.py).
    - If you're self-hosting, you can opt out of telemetry by following [these instructions](/miscellaneous/telemetry).
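
To make the telemetry point concrete, here is a minimal sketch of what anonymized usage aggregation can look like. It is an illustration only, not Khoj's actual telemetry code (follow the links above for that). The `UsageEvent` shape, the field names, and both helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical anonymized event; not Khoj's actual telemetry schema."""

    event_type: str  # "chat", "search", or "index"
    request_id: str  # random per-request ID, not tied to a person


def anonymize(raw_request: dict) -> UsageEvent:
    """Copy only the non-identifying fields; IP address and content are dropped."""
    return UsageEvent(
        event_type=raw_request["event_type"],
        request_id=raw_request["request_id"],
    )


def count_unique_requests(events: list[UsageEvent]) -> dict[str, int]:
    """Count distinct requests per event type, ignoring duplicate reports."""
    return dict(Counter(event.event_type for event in set(events)))


# Made-up requests: the second entry is a duplicate report of the first.
raw_requests = [
    {"event_type": "chat", "request_id": "r1", "ip": "203.0.113.7", "query": "my notes"},
    {"event_type": "chat", "request_id": "r1", "ip": "203.0.113.7", "query": "my notes"},
    {"event_type": "search", "request_id": "r2", "ip": "203.0.113.7", "query": "taxes"},
]
events = [anonymize(r) for r in raw_requests]
summary = count_unique_requests(events)
print(summary)
```

The `ip` and `query` fields never make it into a `UsageEvent`, so only the event type and an opaque request ID are available for aggregation.
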

Self-hosting isn't for everyone, so we've taken steps to keep Khoj privacy-friendly even if you choose to use our [cloud offering](https://app.khoj.dev/login). Here's what to consider when using Khoj Cloud:

1. Your embeddings are generated by an open source model within our own dedicated endpoint [hosted on AWS with Hugging Face](https://huggingface.co/inference-endpoints/dedicated). The Hugging Face Inference Endpoint is stateless and has no persistent memory.
1. Your embeddings and the associated raw text are stored in a secure Postgres DB in our private AWS cloud. Your data is sharded on a unique user ID. We store the raw text in your files to improve file syncing and provide context when you chat with Khoj.
1. When you use the single-sign-on option with Google, we only receive your name, a link to your profile photo, and your email address.

:::tip[Info]
Your data is yours. We do not sell your data or use it for training models. Khoj is a sustainable, open-source alternative to closed-source, commercial personal AI. We have no interest in selling your data to make a quick buck.
:::

We have lots of ideas for how to make Khoj really robust as a personal AI and cloud offering, while also being trustless and privacy-centric. Please [reach out](mailto:team@khoj.dev) if this is important to you and you'd like to help us build it.