Welcome to the GitHub Copilot Trust Center. We are excited you are here.
GitHub Copilot Trust Center
We enable developers and organizations to maximize their potential by prioritizing security, privacy, compliance, and transparency as we develop and iterate on GitHub Copilot.
GitHub Copilot and AI
AI coding tools are already reshaping software development, and AI’s role in coding will only continue to grow. Here’s what we’ve learned so far.
GitHub Copilot and security
GitHub Copilot uses top-notch Azure infrastructure and encryption, and an AI-based vulnerability prevention system that blocks insecure coding patterns in real-time.
How GitHub Copilot treats code and customer data
GitHub Copilot for Business does not access the source code in your editor other than to generate a suggestion, and prompts used to generate a suggestion are transmitted to the model securely. Once a suggestion is generated, your prompts are not retained.
Prompts used to generate a suggestion may include various elements of the context, including file content from the file you are editing as well as from neighboring or related files within a project. Prompts may also include the URLs of repositories or file paths to identify relevant context. The comments and code, along with this context, are then used to synthesize and suggest individual lines and whole functions.
What is transmitted back and forth:
How much data is transmitted back and forth varies widely and depends on many factors, including the language the user is using, whether other tabs are open, how long the current file is, whether the code referencing feature is enabled, whether the model generates a single-line or multi-line suggestion, and whether parts of a prompt are redacted. Therefore, we can't indicate exactly how many bytes or characters a prompt contains. Further, because responses are non-deterministic, we can't know ahead of time how many characters or bytes the model will emit. The same input could yield a different output depending on context.
Copilot for Business does not retain any prompts—including code and other context used for the purposes of providing suggestions—for training its models or any other development of Microsoft or GitHub products. Prompts are discarded once a suggestion is returned.
GitHub Copilot transmits data to GitHub’s Azure tenant to generate suggestions, including both contextual data about the code and file being edited (“prompts”) and data about the user’s actions (“user engagement data”). Copilot applies several measures of protection to this data, including:
The transmitted data is encrypted both in transit and at rest: Copilot-related data is encrypted in transit using Transport Layer Security (TLS), and any data we retain is encrypted at rest using Microsoft Azure’s data encryption (FIPS Publication 140-2 standards).
Prompts only persist long enough to return a response from GitHub’s Azure tenant and are then immediately discarded. Prompts are never stored; all processing is ephemeral and takes place in memory.
Where user engagement data is retained, access is strictly controlled. The data can only be accessed by (1) named GitHub personnel working on the Copilot team or on the GitHub platform health team, and (2) Microsoft personnel working on or with the Copilot team.
Role-based access controls and multi-factor authentication are required for personnel accessing user engagement data.
User engagement data is currently stored for 24 months.
For Copilot for Business, prompts are never written to durable storage; they are held in volatile memory only for as long as the request is being serviced. Prompts are crafted on the user’s machine by the Copilot extension and sent to the Copilot proxy over TLS, where they are forwarded to the Azure OpenAI service (also over TLS). Once the proxy receives the response and forwards it back to the extension, the request context is destroyed and the prompt content is no longer accessible. Access to the runtime where the Copilot proxy is deployed is limited to the Copilot engineering team. There is no access to prompt content, as it is neither logged nor exposed through any API.
Audits and Certifications: GitHub Copilot is not currently included in GitHub’s existing audits and certifications, including SOC 2, ISO 27001, and FedRAMP Tailored. Compliance at GitHub begins with good security, so our first focus is fully onboarding Copilot to GitHub security programs and tooling. GitHub is engaging with a third-party audit firm to perform a gap assessment of Copilot as part of readiness activities for SOC 2 Type 1 (security criteria) and ISO 27001, with a goal of having the full audits completed by May 2024.
External Penetration Test: GitHub has not yet performed an external penetration test of Copilot. GitHub plans to include Copilot for Business penetration testing in the next cycle, which is in the second half of 2023, with the report being issued in early 2024.
How can you help
You can help by using GitHub Copilot and sharing feedback in the feedback forum. Please also report incidents (e.g., offensive output, code vulnerabilities, apparent personal information in code generation) directly to firstname.lastname@example.org so that we can improve our safeguards. GitHub takes safety and security very seriously and we are committed to continually improving.
Copilot is included in the GitHub Bug Bounty program. Copilot submissions are triaged and processed through the existing bug bounty workstreams.
How GitHub Copilot aids secure development
As suggestions are generated and before they are returned to the user, Copilot applies an AI-based vulnerability prevention system that blocks insecure coding patterns in real-time to make Copilot suggestions more secure. Our model targets the most common vulnerable coding patterns, including hardcoded credentials, SQL injections, and path injections.
The system leverages LLMs to approximate the behavior of static analysis tools and can even detect vulnerable patterns in incomplete fragments of code. This means insecure coding patterns can be quickly blocked and replaced by alternative suggestions.
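To make one of the vulnerable patterns named above concrete, the following is a minimal sketch of SQL injection and its usual remedy, using Python's standard sqlite3 module. It is an illustration only and does not represent how Copilot's prevention system works; the table, the function names, and the demo data are all hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable pattern: user input is concatenated into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer pattern: the value is passed as a bound parameter and is
    # never interpreted as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

With the input x' OR '1'='1, the concatenating version returns every row in the table, while the parameterized version returns none.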
The best way to build secure software is through a secure software development lifecycle (SDLC). GitHub offers solutions to assist with other aspects of security throughout the SDLC, including code scanning (SAST), secret scanning, and dependency management (SCA). We recommend enabling features like branch protection to ensure that code is merged into your codebase only after it has passed your required tests and peer review.
How GitHub Copilot works with other security measures
Proxies for filtering, e.g., PII: Outbound requests contain a prompt which is made up of code in the currently edited file and related files. If this request is dropped, then Copilot will fail to provide a completion and may show an error message. If the request is modified through operation of a proxy filter that removes personal information or questionable content or code, then Copilot is able to process the request as normal.
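As a rough illustration of the kind of proxy filter described above, the sketch below redacts two simple categories of personal information from a prompt before it would be forwarded. Everything in it is an assumption made for illustration: the function name, the placeholder tokens, and the two regular expressions are hypothetical, and a production filter would need far broader coverage.

```python
import re

# Hypothetical patterns for illustration; real PII detection is much broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_prompt(prompt: str) -> str:
    """Return the prompt with email addresses and US SSN-like numbers masked."""
    redacted = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", prompt)
    redacted = SSN_PATTERN.sub("[REDACTED_SSN]", redacted)
    return redacted
```

Because the filter only rewrites the text of the prompt and leaves the request otherwise intact, a request modified this way can still be processed as normal.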
Air-gapped environments. GitHub Copilot for Business requires an active internet connection between a user’s IDE and the GitHub Copilot Proxy service. As a result, it does not work in air-gapped environments.
Limitations of GitHub Copilot
While our experiments have shown that GitHub Copilot suggests code of the same or better quality than the average developer, we can’t give any assurance that the code is bug free. Like any programmer, Copilot may sometimes suggest insecure code. We recommend taking the same precautions you take with the code written by your engineers (linting, code scanning, IP scanning, etc.).
GitHub Copilot and privacy
Your privacy is paramount. We're committed to handling your data responsibly, while delivering an optimal GitHub Copilot experience.
What personal data does GitHub Copilot for Business collect?
GitHub Copilot for Business collects personal data from three categories of data: user engagement data, prompts and suggestions.
User engagement data
User engagement data is usage information about events generated when interacting with a code editor. These events include user edit actions (for example completions accepted and dismissed), error messages, and general usage data to identify user metrics such as latency and feature engagement. This information may include personal data, such as pseudonymous identifiers.
A prompt is the collection of code and supporting contextual information that the GitHub Copilot extension sends to GitHub to generate suggestions. The extension sends a prompt when a user working on a file pauses typing or uses a designated keyboard shortcut to request a suggestion.
A suggestion is one or more lines of proposed code and other output returned to the Copilot extension after a prompt is received and processed by the AI models that power Copilot.
What processing role does GitHub play with respect to personal data collected by GitHub Copilot for Business?
For purposes of this section, we use “processor” in the meaning of the EU’s General Data Protection Regulation.
For enterprise customers, GitHub acts primarily as a processor for personal data in GitHub Copilot for Business.
GitHub’s data protection commitments to our enterprise customers are laid out in GitHub’s Data Protection Agreement (“GitHub DPA”).
Per the GitHub DPA, GitHub acts primarily as processor (or subprocessor to enterprise customers who are processors) whenever it processes personal data to provide Copilot. “Providing” Copilot includes processing activities, such as delivering functional capabilities, troubleshooting, and making ongoing improvements.
GitHub processes personal data as a controller in limited, contractually agreed circumstances including billing and account management, and to produce aggregated reports for capacity planning, product development and regulatory financial reports.
GitHub’s Privacy Statement applies to the information GitHub processes as a data controller on behalf of individuals that use GitHub for personal use through the free or Team services. The GitHub DPA governs GitHub’s use of personal data as a controller in relation to enterprise customers.
What access does Microsoft have to personal data collected by GitHub Copilot for Business? What processing role does Microsoft play with respect to this data?
GitHub ensures that its subprocessors, including Microsoft, comply with all obligations and requirements regarding the use of personal data in accordance with GDPR. Microsoft acts as a sub-processor to GitHub specifically for personal data processed by GitHub on behalf of our customers to provide Copilot for Business. While Microsoft systems and employees may have access to user engagement data, such access is strictly controlled. It's important to note that Microsoft's involvement as a sub-processor is limited to the processing of data on their infrastructure, and access to the data is subject to strict controls, just-in-time approvals, and role-based access controls as outlined in GitHub’s Data Protection Agreement (DPA).
Does GitHub Copilot use Prompts or Suggestions to train AI models?
No, GitHub Copilot does not use Prompts or Suggestions to train AI models. These inputs are not retained or utilized in the training process of AI models for GitHub Copilot.
How else is personal data in GitHub Copilot for Business used?
GitHub processes user engagement data as a processor to provide the service, which includes:
Delivering functional capabilities as licensed, configured, and used by Customer and its users, including providing personalized user experiences;
Troubleshooting (preventing, detecting, and repairing problems); and
Keeping Products up to date and performant, and enhancing user productivity, reliability, efficacy, quality, and security.
Per Customer instruction in the GitHub DPA, GitHub processes user engagement data as a data controller for the purposes of:
Billing and account management;
Compensation such as calculating employee commissions and partner incentives;
Aggregated internal reporting and business modeling, such as forecasting, revenue, capacity planning, and product strategy; and
Aggregated financial reporting.
How long does GitHub Copilot for Business retain personal data?
Prompts are discarded once a suggestion is returned.
Suggestions are not retained by GitHub.
User engagement data
User engagement data is retained by GitHub for 24 months.
GitHub Copilot and data flow
How is the data flowing and what is being done with it?
GitHub Copilot and copyright
Respecting intellectual property rights is an important part of the software development process. Learn about code ownership, filtering, and public code use here.
Does GitHub Copilot “copy/paste”?
No, GitHub Copilot generates suggestions using probabilistic reasoning.
When thinking about intellectual property and open source issues, it is critical to understand how GitHub Copilot really works. The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.
To generate a suggestion, Copilot begins by examining the code in your editor—focusing on the lines just before and after your cursor, but also information in other files open in your editor. That information is sent to Copilot’s model, to make a probabilistic determination of what is likely to come next and generate suggestions.
What are the intellectual property considerations when using GitHub Copilot?
The primary IP considerations for GitHub Copilot relate to copyright. The model that powers Copilot is trained on a broad collection of publicly accessible code, which may include copyrighted code, and Copilot’s suggestions (in rare instances) may resemble the code its model was trained on. Here’s some basic information you should know about these considerations:
Copyright law permits the use of copyrighted works to train AI models: Countries around the world have provisions in their copyright laws that enable machines to learn, understand, and extract patterns and facts from copyrighted materials, including software code. For example, the European Union, Japan, and Singapore have express provisions permitting machine learning to develop AI models. Other countries, including Canada, India, and the United States, also permit such training under their fair use/fair dealing provisions. GitHub Copilot’s AI model was trained with the use of code from GitHub’s public repositories, which are publicly accessible and within the scope of permissible copyright use.
What about copyright risk in suggestions? In rare instances (less than 1% based on GitHub’s research), suggestions from GitHub Copilot may match examples of code used to train GitHub’s AI model. Again, Copilot does not “look up” or “copy and paste” code, but instead uses context from a user’s code editor to synthesize and generate a suggestion. Our experience shows that matching suggestions are most likely to occur in two situations: (i) when there is little or no context in the code editor for Copilot’s model to synthesize, or (ii) when a matching suggestion represents a common approach or method. If a suggestion matches existing copyrighted code, there is a risk that using that suggestion could trigger claims of copyright infringement, which would depend on the amount and nature of code used, and the context of how the code is used. In many ways, this is the same risk that arises when using any code that a developer does not originate, such as copying code from an online source, or reusing code from a library. That is why responsible organizations and developers recommend that users employ code scanning policies to identify and evaluate potential matching code.
What about open source license considerations?
When a developer uses code made available under an open source software license, they may have to meet license requirements, such as attributing the author of the code, disclosing source code that makes use of open source code, or distributing the code under certain licenses. If these requirements are not met, the owner of the code could assert claims including copyright infringement or breach of the applicable open source license.
Does a suggestion that matches code automatically trigger copyright or open source considerations? No. The existence of matching code does not itself dictate whether the concerns and their legal risk exist. Whether and when these considerations may apply depends on many factors, including the quantity and nature of the open source code used, and the specific open source license applicable to such code. As with any code that your developers did not originate, the decision about when, how much, and in what context to use any code is one your organization needs to make based on its policies, and in consultation with industry and legal service providers. All organizations should maintain appropriate policies and procedures to ensure that these licensing concerns are properly addressed, as described below.
Discussing all possible concerns and safeguards around open source is beyond the scope of this document. If your organization is using GitHub Copilot, however, you are likely already developing code, policies, and procedures around open source. You should apply them equally to code suggested by Copilot.
Each organization is responsible for setting its open source policies and procedures.
Does GitHub Copilot include a filtering mechanism to mitigate risk?
Yes, GitHub Copilot does include an optional code referencing filter to detect and suppress certain suggestions that match public code on GitHub.
GitHub has created a duplication detection filter to detect and suppress GitHub Copilot suggestions that contain code that includes snippets of at least 150 characters that match public code on GitHub. This filter is enabled by the administrator for your enterprise and it can apply for all organizations within your enterprise or the administrator can defer control to individual organizations.
With the filter enabled, Copilot checks each code suggestion, together with its surrounding code of about 150 characters, for matches or near matches (ignoring whitespace) against public code on GitHub. If there is a match, the suggestion will not be shown to the user.
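The matching idea described above can be sketched as a whitespace-insensitive substring check. This is a simplified illustration under stated assumptions, not GitHub's implementation, which is not described here: the function names are hypothetical, the public code is modeled as a small in-memory list of strings, and only exact matches after whitespace removal are detected (near matches are not handled).

```python
MATCH_LENGTH = 150  # threshold taken from the description above


def strip_whitespace(code: str) -> str:
    """Remove all whitespace so that formatting differences are ignored."""
    return "".join(code.split())


def matches_public_code(suggestion: str, public_snippets: list,
                        match_length: int = MATCH_LENGTH) -> bool:
    """Return True if any run of match_length non-whitespace characters
    in the suggestion also appears in one of the public snippets."""
    candidate = strip_whitespace(suggestion)
    if len(candidate) < match_length:
        return False  # too short to reach the threshold
    corpus = [strip_whitespace(snippet) for snippet in public_snippets]
    # Slide a fixed-size window over the suggestion and look it up.
    for start in range(len(candidate) - match_length + 1):
        window = candidate[start:start + match_length]
        if any(window in document for document in corpus):
            return True
    return False
```

A filter built this way would suppress a suggestion whenever matches_public_code returns True. A linear scan like this would not scale to a real corpus; an index over hashed windows is the more likely design at that size.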
Does GitHub Copilot include features to make it easier for users to identify potentially relevant open source licenses for matching suggestions?
Yes, GitHub Copilot offers a code referencing feature, currently in preview, to help users find and review potentially relevant open source licenses.
If a suggestion matches publicly available code on GitHub, maintain consistency with your organization’s open source policies and procedures. A prudent and responsible step should include investigating available information to determine whether to use a suggestion.
GitHub Copilot’s code referencing feature identifies suggestions that contain exact or near matches with public code. When a match is located, Copilot provides an alert that includes links to repositories for any such matching code, along with any available information on applicable software licenses, and logs this information. Copilot users can review this information to determine whether the applicable suggestions are suitable for use, and whether additional measures may be necessary to use them.
Copilot users can also use this feature as a tool for learning. Using the information provided by the code referencing feature, a developer might find inspiration from other codebases, discover documentation, and almost certainly gain confidence that this fragment is appropriate to use in their project. They might take a dependency, provide attribution where appropriate, or possibly even pursue another implementation strategy. By helping developers understand the community context of their code in a manner that also preserves developer flow, we believe Copilot will continue to deliver responsible innovation and true happiness at the keyboard.
Is GitHub Copilot intended to fully automate code generation and replace developers?
No. Copilot is a tool intended to make developers more efficient. It’s not intended to replace developers, who should continue to apply the same sorts of safeguards and diligence they would apply with regard to any third-party code of unknown origin.
The product is called “Copilot” not “Autopilot” and it’s not intended to generate code without oversight. You should use exactly the same sorts of safeguards and diligence with Copilot’s suggestions as you would use with any third-party code.
Identifying best practices for use of third-party code is beyond the scope of this section. That said, whatever practices your organization currently uses (rigorous functionality testing, code scanning, security testing, etc.), you should continue to apply them to Copilot’s suggestions. Moreover, you should make sure your code editor does not automatically compile or run generated code before you review it.
Can GitHub Copilot users simply use suggestions without concern?
Not necessarily. GitHub Copilot users should align their use of Copilot with their respective risk tolerances.
As noted above, GitHub Copilot is not intended to replace developers, or their individual skill and judgment, and is not intended to fully automate the process of code development. The same risks that apply to the use of any third-party code apply to the use of Copilot’s suggestions.
Depending on your particular use case, you should consider implementing the protections discussed above. It is your responsibility to assess what is appropriate for the situation and implement appropriate safeguards.
You’re entitled to IP indemnification from GitHub for the unmodified suggestions when Copilot’s filtering is enabled. If you do elect to enable this feature, the copyright responsibility is ours, not our customers’. As part of our ongoing commitment to responsible AI, GitHub and Microsoft extend our IP indemnity and protection support to our customers who are empowering their teams with GitHub Copilot. Details here.
How does GitHub Copilot use your code to provide suggestions?
GitHub Copilot provides suggestions based on the context of what you’re working on in your code editor. This requires temporarily transferring an ephemeral copy of various elements of that context to GitHub’s servers.
Generative AI tools provide responses to something generically called a “prompt.” In the case of GitHub Copilot, the prompt consists of various elements from your code editor. This may include file content both in the file you’re editing, as well as neighboring or related files within a project. It may also include URLs of repositories or file paths to identify relevant context. The comments and code, along with this context, are then used to synthesize and suggest individual lines of code and entire functions.
The prompt needs to be transferred to GitHub’s servers for processing. The transmitted data is encrypted, both in transit and at rest.
Copilot interacts with its model, which is hosted on Microsoft’s Azure service, to generate suggestions. These suggestions are then transmitted back to the user. As above, the suggestions are encrypted in transit and at rest (as described in the security section).
Prompts are transmitted in real time only to return suggestions and are discarded once a suggestion is returned. Unless you are a Copilot for Individuals user who has allowed code snippet collection in settings, Copilot does not retain any prompts, including code content, code prompts, code suggestions, or content from neighboring tabs and local file systems. Suggestions are not retained either, except temporarily as further described below.
Does GitHub Copilot retain any of your code that it used as a basis for providing suggestions?
GitHub Copilot for Business does not retain your code for any purpose after it has provided Suggestions.
As noted above, Copilot does transfer content from your code editor to GitHub’s servers for purposes of assessing the context and providing suggestions. The transferred copy is purely ephemeral and, shortly after Copilot has provided suggestions, the copy is deleted. It is not used for any other purpose.
Does GitHub Copilot use any of your code to train or retrain the Codex model (or any successor model)?
No. GitHub Copilot does not use your code to train the Azure OpenAI model.
As noted above, Copilot does not use your code to train large language models. Your code is used only to improve the performance of the Copilot service, by fine-tuning ranking and sorting algorithms and prompt crafting.
Who owns the suggestions provided by GitHub Copilot?
We don’t determine whether a suggestion is capable of being owned, but we are clear that GitHub does not claim ownership of a suggestion.
Whether a suggestion generated by an AI model can be owned depends on many factors (e.g., the intellectual property law in the relevant country, the length of the suggestion, the extent to which the suggestion is considered “functional” instead of expressive, etc.).
If a suggestion is capable of being owned, our terms are clear: GitHub does not claim ownership.
GitHub does not claim ownership of any suggestion. In certain cases, it is possible for Copilot to produce similar suggestions to different users. For example, two unrelated users both starting new files to code the quicksort algorithm in Java will likely get the same suggestion. The possibility of providing similar suggestions to multiple users is a common part of generative AI systems.
GitHub Copilot and the future of work
GitHub Copilot isn’t made to replace developers—it’s here to enhance their work and make the industry more inclusive, too.
Does GitHub Copilot empower developers and enhance productivity?
GitHub Copilot is not a substitute for developers. In fact, it empowers them to be more productive, accelerating some coding tasks to free up time for them to focus on more important problems. Our research shows that GitHub Copilot boosts job satisfaction, too—and we’re supporting external research to understand what generative AI technology means for the future of work.
How does GitHub Copilot act as a catalyst for inclusive development?
Early research on generative AI, and on GitHub Copilot specifically, finds that these tools have the potential to lower barriers to entry and enable more people to become software developers. GitHub supports programs to expand access to Copilot (including making it free for students) and other developer tools in an effort to help those interested in joining the industry.
How does GitHub Copilot transform developer opportunities?
Advances in developer productivity are nothing new. While AI may change your workflow, history offers many examples of how jobs evolve and adapt, often creating more opportunities in the process. Compilers, high-level programming languages, open source software, IDEs—the list of advances that have changed how developers work is long and ever-expanding. The data shows that, over time, these tools have lowered costs of software development while dramatically increasing demand for software and developers. Here’s the result: According to the US Bureau of Labor Statistics, there are more developers than ever before and they are paid more, too (even after adjusting for inflation).
GitHub Copilot and accessibility
GitHub is committed to empowering developers with disabilities to help build the technologies that drive human progress.
What is GitHub's mission and goal regarding developer collaboration and accessibility for people with disabilities?
At GitHub, our mission is to accelerate human progress through developer collaboration. We believe people with disabilities should benefit from and be able to contribute to the creation of that progress.
Our goal is to empower developers with disabilities to build on GitHub. In doing so, we collectively increase access to technology for all people with disabilities. This includes access to our AI pair programmer, GitHub Copilot, which improves developer productivity and happiness.
What accessibility standards does GitHub use?
While developing products including GitHub Copilot, we take into account leading global accessibility standards, which include:
Web Content Accessibility Guidelines (WCAG)
U.S. Section 508
EN 301 549
How does GitHub include accessibility in the development process?
In addition to accessibility standards, the development and iterative improvement of our products is guided by the lived experiences of people in our community with disabilities. We host regular internal accessibility office hours that provide direct feedback to designers and developers.
Accessibility is also integrated into our development processes through design checklists, linting, code inspection, automated accessibility scanning, and manual testing. GitHub Copilot for Visual Studio and Visual Studio Code leverages the native accessibility features of those Integrated Development Environments.
How does GitHub test accessibility?
Our internal accessibility audits are performed by testers who have been certified through the U.S. Department of Homeland Security (DHS) Trusted Tester Program. The Trusted Tester Program creates a common testing approach, including code and UI inspection-based tests, for determining software and website accessibility compliance and conformance to accessibility standards. Our internal audit process also includes testing by people with disabilities.
Where can users find information about GitHub accessibility?
Visit accessibility.github.com for our accessibility statement and accessibility conformance reports
Check the accessibility change log for the latest improvements
Find product accessibility demos on our accessibility playlist on YouTube
Catch the latest news on the GitHub accessibility blog
Ask a question on the accessibility community discussions page
Report a problem at support.github.com
GitHub Copilot and contracts
There are a few documents that govern your use of GitHub Copilot. Learn about them here.
GitHub Copilot product specific terms