tag:www.githubstatus.com,2005:/history GitHub Status - Incident History 2024-06-28T19:03:04Z GitHub tag:www.githubstatus.com,2005:Incident/21251038 2024-06-28T19:03:04Z 2024-06-28T19:03:04Z Delays in Actions <p><small>Jun <var data-var='date'>28</var>, <var data-var='time'>19:03</var> UTC</small><br><strong>Update</strong> - We are continuing to work on mitigating delays creating pull request merge commits, Actions runs for pull request events, and changes to organization members.</p><p><small>Jun <var data-var='date'>28</var>, <var data-var='time'>17:59</var> UTC</small><br><strong>Update</strong> - Actions runs triggered by pull requests are experiencing start delays. We have engaged the appropriate teams and are investigating the issue.</p><p><small>Jun <var data-var='date'>28</var>, <var data-var='time'>17:58</var> UTC</small><br><strong>Update</strong> - Pull Requests is experiencing degraded performance. We are continuing to investigate.</p><p><small>Jun <var data-var='date'>28</var>, <var data-var='time'>17:34</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions</p> tag:www.githubstatus.com,2005:Incident/21243659 2024-06-27T23:44:21Z 2024-06-27T23:44:21Z Incident with Codespaces <p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>23:44</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>23:43</var> UTC</small><br><strong>Update</strong> - We have identified a root cause for the Codespaces issue in the West US region and are rolling out a fix.</p><p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>23:34</var> UTC</small><br><strong>Update</strong> - A subset of customers are currently experiencing issues creating and resuming Codespaces in the West US region.</p><p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>23:34</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Codespaces</p> tag:www.githubstatus.com,2005:Incident/21242609 2024-06-27T21:42:14Z 2024-06-28T13:51:48Z Disruption with GitHub services <p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>21:42</var> UTC</small><br><strong>Resolved</strong> - Between June 27th, 2024 at 20:39 UTC and 21:37 UTC the Migrations service was unable to process migrations. This was due to an invalid infrastructure credential. <br /><br />We mitigated the issue by updating the credential and deploying the service. <br /><br />Mechanisms and automation will be implemented to detect and prevent this issue from recurring in the future.</p><p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>21:20</var> UTC</small><br><strong>Update</strong> - Some GitHub Enterprise Importer migrations are failing. We have identified a root cause and are rolling out a fix.</p><p><small>Jun <var data-var='date'>27</var>, <var data-var='time'>21:16</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/21156994 2024-06-19T12:53:33Z 2024-06-24T02:34:26Z Incident with Copilot Pull Request Summaries <p><small>Jun <var data-var='date'>19</var>, <var data-var='time'>12:53</var> UTC</small><br><strong>Resolved</strong> - Between June 18th, 2024 at 09:34 PM UTC and June 19th, 2024 at 12:53 PM UTC the Copilot Pull Request Summaries Service was unavailable.
This was due to an internal change in how the Copilot Pull Request service accesses the Copilot API.<br /><br />We mitigated the incident by reverting the access change, which immediately resolved the errors.<br /><br />We are working to improve our monitoring in this area and reduce our time to detection to more quickly address issues like this one in the future.<br /></p><p><small>Jun <var data-var='date'>19</var>, <var data-var='time'>12:31</var> UTC</small><br><strong>Update</strong> - We are deploying a fix now and expect recovery within the hour.</p><p><small>Jun <var data-var='date'>19</var>, <var data-var='time'>11:59</var> UTC</small><br><strong>Update</strong> - We’ve identified an issue with Copilot pull request summaries that has caused errors when attempting to generate summaries since yesterday (June 18, 2024) at around 21:00 UTC. <br /><br />We have identified a fix, and we expect the issue to be resolved within two hours.</p><p><small>Jun <var data-var='date'>19</var>, <var data-var='time'>11:58</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Copilot</p> tag:www.githubstatus.com,2005:Incident/21148992 2024-06-18T18:09:43Z 2024-06-21T19:43:13Z We are investigating degraded performance for GitHub Enterprise Importer migrations <p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>18:09</var> UTC</small><br><strong>Resolved</strong> - On June 18th, from 4:59pm UTC to 6:06pm UTC, customer migrations were unavailable and failing. This impacted all in-progress migrations during that time. This issue was due to an incorrect configuration on our database cluster. We mitigated the issue by remediating the database configuration and are working with stakeholders to ensure safeguards are in place to prevent the issue going forward.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>18:04</var> UTC</small><br><strong>Update</strong> - We have applied a configuration change to our migration service as a mitigation and are beginning to see recovery and an increase in successful migration runs. We are continuing to monitor.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>17:48</var> UTC</small><br><strong>Update</strong> - We have identified what we believe to be the source of the migration errors and are applying a mitigation, which we expect will begin improving migration success rates.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>17:15</var> UTC</small><br><strong>Update</strong> - We are investigating degraded performance for GitHub Enterprise Importer migrations. Some customers may see an increase in failed migrations. Investigation is ongoing.</p><p><small>Jun <var data-var='date'>18</var>, <var data-var='time'>17:14</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/21078205 2024-06-11T21:39:47Z 2024-06-14T05:48:37Z Incident with Actions <p><small>Jun <var data-var='date'>11</var>, <var data-var='time'>21:39</var> UTC</small><br><strong>Resolved</strong> - On June 11th, 2024 between 20:13 UTC and 21:39 UTC, the GitHub Actions service was degraded.
A security-related change applied by one of our third-party providers prevented new customers from onboarding to GitHub Actions and caused an average of 28% of Actions jobs to fail.<br /><br />We mitigated the incident by working with the third-party provider to revert the change and are working with their engineering team to fully understand the root cause. Additionally, we are improving communication between GitHub and our service providers to reduce the time needed to resolve similar issues in the future.</p><p><small>Jun <var data-var='date'>11</var>, <var data-var='time'>21:35</var> UTC</small><br><strong>Update</strong> - We've applied a mitigation to unblock running Actions and are seeing an improvement in our service availability.</p><p><small>Jun <var data-var='date'>11</var>, <var data-var='time'>21:16</var> UTC</small><br><strong>Update</strong> - Customers may see issues running Actions; we are in the process of applying a mitigation to restore our service.</p><p><small>Jun <var data-var='date'>11</var>, <var data-var='time'>20:34</var> UTC</small><br><strong>Update</strong> - Customers may see issues running Actions</p><p><small>Jun <var data-var='date'>11</var>, <var data-var='time'>20:33</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions and API Requests</p> tag:www.githubstatus.com,2005:Incident/21015719 2024-06-06T04:43:52Z 2024-06-07T22:05:34Z Incident with Packages <p><small>Jun <var data-var='date'> 6</var>, <var data-var='time'>04:43</var> UTC</small><br><strong>Resolved</strong> - On June 6, 2024 between 03:29 and 04:19 UTC, the service responsible for the Maven package registry was degraded. This affected GitHub customers who were trying to upload packages to the Maven package registry.<br /><br />We observed increased database pressure due to bulk operations in progress, and at 04:19 UTC, the Maven upload issues resolved when those bulk operations finished. We're continuing to assess any additional compounding factors.<br /><br />We are working on improving our thresholds for existing alerts to reduce our time to detection and mitigation of issues like this one in the future.<br /></p><p><small>Jun <var data-var='date'> 6</var>, <var data-var='time'>04:38</var> UTC</small><br><strong>Update</strong> - We were alerted to problems with Maven uploads. These have now improved, and we're continuing to monitor and investigate.</p><p><small>Jun <var data-var='date'> 6</var>, <var data-var='time'>04:21</var> UTC</small><br><strong>Update</strong> - We are investigating reports of issues with Packages. We will continue to keep users updated on progress towards mitigation.</p><p><small>Jun <var data-var='date'> 6</var>, <var data-var='time'>04:21</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Packages</p> tag:www.githubstatus.com,2005:Incident/21009829 2024-06-05T19:27:21Z 2024-06-25T15:16:14Z Incident with Issues <p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>19:27</var> UTC</small><br><strong>Resolved</strong> - On June 5, 2024, between 17:05 UTC and 19:27 UTC, the GitHub Issues service was degraded. During that time, no events related to projects were displayed on issue timelines. These events indicate when an issue was added to or removed from a project and when its status changed within a project. The data couldn’t be loaded due to a misconfiguration of the service backing these events.
This happened after a scheduled secret rotation, when the misconfigured service continued using the old secrets, which had expired. <br /><br />We mitigated the incident by remediating the service configuration and have started simplifying the configuration to avoid similar misconfigurations in the future.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>19:27</var> UTC</small><br><strong>Update</strong> - Issues is operating normally.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>19:19</var> UTC</small><br><strong>Update</strong> - We continue to troubleshoot the problem with the issues timeline.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>18:47</var> UTC</small><br><strong>Update</strong> - We continue to troubleshoot the problem with the issues timeline.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>18:01</var> UTC</small><br><strong>Update</strong> - We're continuing to investigate the problem.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>17:26</var> UTC</small><br><strong>Update</strong> - We're seeing issues related to the issues timeline service. We're investigating and will continue to keep users updated on progress towards mitigation.</p><p><small>Jun <var data-var='date'> 5</var>, <var data-var='time'>17:22</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded availability for Issues</p> tag:www.githubstatus.com,2005:Incident/20938285 2024-05-30T17:22:55Z 2024-06-25T20:59:51Z Incident with Copilot <p><small>May <var data-var='date'>30</var>, <var data-var='time'>17:22</var> UTC</small><br><strong>Resolved</strong> - On May 30th, 2024, between 03:37 PM UTC and 05:14 PM UTC Copilot chat conversations on github.com saw degraded availability, where chat requests referencing files from a repository failed. This was due to an expired security certificate that was required for communication with an internal service. Overall, the error rate was 40% on average. Other Copilot chat experiences were unaffected during this time.<br /><br />The incident was mitigated by rotating the certificate in question.<br /><br />To prevent future incidents, we are working to reduce our time to detection and have removed certificate-based dependencies between these internal systems in the process.<br /></p><p><small>May <var data-var='date'>30</var>, <var data-var='time'>17:22</var> UTC</small><br><strong>Update</strong> - Copilot is operating normally.</p><p><small>May <var data-var='date'>30</var>, <var data-var='time'>17:22</var> UTC</small><br><strong>Update</strong> - We have rolled out a mitigation and fixes appear to be stable. This incident has been resolved.</p><p><small>May <var data-var='date'>30</var>, <var data-var='time'>17:14</var> UTC</small><br><strong>Update</strong> - Copilot is experiencing degraded performance.
We are continuing to investigate.</p><p><small>May <var data-var='date'>30</var>, <var data-var='time'>17:14</var> UTC</small><br><strong>Update</strong> - Our Copilot API is currently experiencing back-end connectivity issues and we are actively engaged in mitigation steps.</p><p><small>May <var data-var='date'>30</var>, <var data-var='time'>17:14</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/20910208 2024-05-28T21:24:34Z 2024-05-28T21:24:34Z Incident with Codespaces <p><small>May <var data-var='date'>28</var>, <var data-var='time'>21:24</var> UTC</small><br><strong>Investigating</strong> - Codespaces is operating normally.</p><p><small>May <var data-var='date'>28</var>, <var data-var='time'>21:24</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>May <var data-var='date'>28</var>, <var data-var='time'>21:19</var> UTC</small><br><strong>Update</strong> - A fix has been applied and we are seeing some recovery. We will continue to monitor for a bit before marking this issue resolved.</p><p><small>May <var data-var='date'>28</var>, <var data-var='time'>20:53</var> UTC</small><br><strong>Update</strong> - We are still investigating root cause and remediation options. In the meantime, here is a workaround to be able to pull images from DockerHub:<br /><br />1. Make a free DockerHub account at https://hub.docker.com (or use an existing account if you have one).<br />2. Create a DockerHub secret/PAT from https://hub.docker.com/settings/security (Read permission should be sufficient).<br />3. Go to https://github.com/settings/codespaces<br /><br />Add three Codespace secrets:<br /><br />- DOCKERHUB_CONTAINER_REGISTRY_PASSWORD (equal to the DockerHub PAT you created)<br />- DOCKERHUB_CONTAINER_REGISTRY_SERVER (equal to https://index.docker.io/v1/)<br />- DOCKERHUB_CONTAINER_REGISTRY_USER (equal to your DockerHub username)<br /><br />4. Make sure these secrets are set as visible to the target repo.<br />5. Create/rebuild your Codespace<br /><br />Steps above are distilled from the official docs: https://docs.github.com/en/codespaces/reference/allowing-your-codespace-to-access-a-private-registry#example-secrets</p><p><small>May <var data-var='date'>28</var>, <var data-var='time'>20:23</var> UTC</small><br><strong>Update</strong> - Some Codespaces are currently failing to be properly created for images hosted by DockerHub. Other registries should be unaffected. We are investigating root cause and will report back shortly.</p><p><small>May <var data-var='date'>28</var>, <var data-var='time'>20:17</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Codespaces</p> tag:www.githubstatus.com,2005:Incident/20863096 2024-05-23T16:02:48Z 2024-05-28T20:33:02Z Incident with Codespaces <p><small>May <var data-var='date'>23</var>, <var data-var='time'>16:02</var> UTC</small><br><strong>Resolved</strong> - On May 23, 2024 between 15:31 and 16:02 UTC the Codespaces service reported a degraded experience in Codespaces across all regions. Upon further investigation this was found to be an error reporting issue and did not have user-facing impact.
The newly implemented error reporting began flagging existing non-user-facing errors that are handled further along in the flow, at the controller level, and do not cause user impact. We are working to improve our reporting rollout process to reduce issues like this in the future, which includes updating monitors and dashboards to exclude this class of error. We are also reclassifying and correcting internal API responses to better represent when errors are user-facing for more accurate reporting.</p><p><small>May <var data-var='date'>23</var>, <var data-var='time'>15:41</var> UTC</small><br><strong>Update</strong> - We are investigating increased error rates for customers attempting to start Codespaces across all regions; around 15% of attempts are affected. Any affected customers may attempt to retry starting their Codespace. We are continuing to investigate.</p><p><small>May <var data-var='date'>23</var>, <var data-var='time'>15:31</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Codespaces</p> tag:www.githubstatus.com,2005:Incident/20841162 2024-05-21T19:06:05Z 2024-05-23T16:32:06Z Incident with Actions <p><small>May <var data-var='date'>21</var>, <var data-var='time'>19:06</var> UTC</small><br><strong>Resolved</strong> - On May 21, 2024, between 11:40 UTC and 19:06 UTC, various services experienced elevated latency due to a configuration change in an upstream cloud provider.<br /><br />GitHub Copilot Chat experienced P50 latency of up to 2.5s and P95 latency of up to 6s. GitHub Actions was degraded with 20-60 minute delays for workflow run updates. GitHub Enterprise Importer customers experienced longer migration run times due to GitHub Actions delays. Additionally, billing-related metrics for budget notifications and UI reporting were delayed, leading to outdated billing details. No data was lost and systems caught up after the incident. <br /><br />At 12:31 UTC, we detected increased latency to cloud hosts. At 14:09 UTC, non-critical traffic was paused, which did not result in restoration of service. At 14:27 UTC, we identified high CPU load within a network gateway cluster caused by a scheduled operating system upgrade that resulted in unintended, uneven distribution of traffic within the cluster. We initiated deployment of additional hosts at 16:35 UTC. Rebalancing completed by 17:58 UTC, with system recovery observed at 18:03 UTC and completion at 19:06 UTC.<br /><br />We have identified gaps in our monitoring and alerting for load thresholds. We have prioritized these fixes to improve time to detection and mitigation of this class of issues.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>18:14</var> UTC</small><br><strong>Update</strong> - Actions is operating normally.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>18:03</var> UTC</small><br><strong>Update</strong> - We are beginning to see recovery for any delays to Actions Workflow Runs, Workflow Job Runs, and Check Steps. Customers who are still experiencing jobs which appear to be stuck may re-run the workflow in order to see a completed state. We are also seeing recovery for GitHub Enterprise Importer migrations. We are continuing to monitor recovery.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>17:41</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate delays to status updates to Actions Workflow Runs, Workflow Job Runs, and Check Steps.
This is impacting 100% of customers using these features, with an average delay of 20 minutes and P99 delay of 1 hour. Customers may see that their Actions workflows have completed, but the run may appear to be hung waiting for its status to update. This is also impacting GitHub Enterprise Importer migrations. Migrations may take longer to complete. We are working with our provider to address the issue and will continue to provide updates as we learn more.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>17:14</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate delays to status updates to Actions Workflow Runs, Workflow Job Runs, and Check Steps. Customers may see that their Actions workflows have completed, but the run may appear to be hung waiting for its status to update. This is also impacting GitHub Enterprise Importer migrations. Migrations may take longer to complete. We are working with our provider to address the issue and will continue to provide updates as we learn more.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>16:02</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate delays to Actions Workflow Runs, Workflow Job Runs, and Check Steps and will provide further updates as we learn more.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>15:00</var> UTC</small><br><strong>Update</strong> - We have identified a change in a third-party network configuration and are working with the provider to address the issue. We will continue to provide updates as we learn more.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>14:34</var> UTC</small><br><strong>Update</strong> - We have identified network connectivity issues causing delays in Actions Workflow Runs, Workflow Job Runs, and Check Steps. We are continuing to investigate.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>13:58</var> UTC</small><br><strong>Update</strong> - We are investigating delayed updates to Actions job statuses.</p><p><small>May <var data-var='date'>21</var>, <var data-var='time'>12:45</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions</p> tag:www.githubstatus.com,2005:Incident/20833952 2024-05-20T17:05:48Z 2024-05-22T19:13:58Z We are investigating reports of degraded performance. <p><small>May <var data-var='date'>20</var>, <var data-var='time'>17:05</var> UTC</small><br><strong>Resolved</strong> - Between May 19th at 3:40 AM UTC and May 20th at 5:40 PM UTC, the service responsible for rendering Jupyter notebooks was degraded. During this time, customers were unable to render Jupyter notebooks.<br /><br />This occurred due to an issue with a Redis dependency, which was mitigated by restarting it. An issue with our monitoring led to a delay in our response.
We are working to improve the quality and accuracy of our monitors to reduce the time to detection.</p><p><small>May <var data-var='date'>20</var>, <var data-var='time'>17:01</var> UTC</small><br><strong>Update</strong> - We are beginning to see recovery in rendering Jupyter notebooks and are continuing to monitor.</p><p><small>May <var data-var='date'>20</var>, <var data-var='time'>16:50</var> UTC</small><br><strong>Update</strong> - Customers may experience errors viewing rendered Jupyter notebooks from PR diff pages or the files tab</p><p><small>May <var data-var='date'>20</var>, <var data-var='time'>16:47</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/20800199 2024-05-16T05:15:35Z 2024-05-17T20:16:45Z Incident with Actions <p><small>May <var data-var='date'>16</var>, <var data-var='time'>05:15</var> UTC</small><br><strong>Resolved</strong> - On May 16, 2024, between 4:10 UTC and 5:02 UTC, customers experienced various delays in background jobs, primarily UI updates for Actions. This issue was due to degradation in our background job service affecting 22.4% of total jobs. Across all affected services, the average job delay was 2m 22s. Actions jobs themselves were unaffected; this issue affected the timeliness of UI updates, with an average delay of 11m 40s and a maximum of 20m 14s.<br /><br />This incident was due to a performance problem on a single processing node, where Actions UI updates were being processed. Additionally, a misconfigured monitor did not alert immediately, resulting in a 25m late detection time and a 37m total increase in time to mitigate. <br /><br />We mitigated the incident by removing the problematic node from the cluster, and service was restored. No data was lost, and all jobs executed successfully.<br /><br />To reduce our time to detection and mitigation of issues like this one in the future, we have repaired our misconfigured monitor and added additional monitoring to this service. <br /></p><p><small>May <var data-var='date'>16</var>, <var data-var='time'>04:43</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions</p> tag:www.githubstatus.com,2005:Incident/20787399 2024-05-14T21:04:49Z 2024-06-06T16:17:08Z We are investigating reports of degraded performance. <p><small>May <var data-var='date'>14</var>, <var data-var='time'>21:04</var> UTC</small><br><strong>Resolved</strong> - On May 14, 2024 between 18:00 UTC and 20:10 UTC, GitHub Actions performance was degraded and larger hosted runners using <code>linux-x64</code> images (Ubuntu20, Ubuntu22, Ubuntu24-beta) experienced longer-than-normal job start-up times. Approximately 25% of all runs targeting larger hosted runners queued during this time were slow to start, with a median wait time of 1 minute, 55 seconds.<br /><br />The issue was caused by a downstream dependency becoming overloaded, which impacted our machine setup process. Each larger hosted runner job is run on a fresh VM, and one of the setup steps is installing the Actions agent. Up to the point of this incident, we would pull the latest agent version from the GitHub release and install it on the VM. However, during this incident the specific GitHub release for the Linux x64 Actions agent became overloaded, and our agent downloads were severely throttled. This throttling led to timeouts during the download, and caused our hosted runner system to conclude the VMs were failing to start.
We need the Actions agent online to start serving jobs, and with the download timing out, our service assumed the runner wasn't starting up successfully. This failure to start up led to those VMs being repeatedly reset instead of serving jobs. We mitigated the issue by falling back to a cached version of the Actions agent present on our image.<br /><br />We have further refined the fallback system to automatically use the cached agent binaries, and added new functionality to allow for easier agent downloading from other locations. Both of these measures should eliminate future impact from similar downstream issues.</p><p><small>May <var data-var='date'>14</var>, <var data-var='time'>20:47</var> UTC</small><br><strong>Update</strong> - We are seeing recovery for queue times on Actions Larger Runners and are continuing to monitor full recovery.</p><p><small>May <var data-var='date'>14</var>, <var data-var='time'>20:09</var> UTC</small><br><strong>Update</strong> - We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.</p><p><small>May <var data-var='date'>14</var>, <var data-var='time'>19:16</var> UTC</small><br><strong>Update</strong> - We are continuing to investigate long queue times for Actions Larger Runners</p><p><small>May <var data-var='date'>14</var>, <var data-var='time'>18:40</var> UTC</small><br><strong>Update</strong> - We are investigating long queue times for Actions Larger Runners</p><p><small>May <var data-var='date'>14</var>, <var data-var='time'>18:37</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/20778761 2024-05-13T20:10:42Z 2024-05-14T17:10:22Z Incident with Actions <p><small>May <var data-var='date'>13</var>, <var data-var='time'>20:10</var> UTC</small><br><strong>Resolved</strong> - On May 13, 2024, between 19:03 UTC and 19:57 UTC, some customers experienced delays in receiving status updates for in-progress GitHub Actions workflow runs. The root cause was identified as a bug in the logic for checking the state of a configuration, which would only manifest under very specific conditions and cause exceptions. These exceptions impacted the backend process for handling workflow run status updates and caused jobs with any annotations to not be updated properly. Jobs without any annotations were not affected. Jobs affected during the incident will be marked as failed after 24 hours, and affected customers will need to manually retry the jobs they want to execute.<br /><br />We resolved the incident by reverting the problematic change.
We are enhancing our process for deploying changes and reassessing our monitoring of relevant subsystems to prevent similar issues in the future.<br /></p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>20:10</var> UTC</small><br><strong>Update</strong> - We are seeing signs of recovery for Actions jobs</p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>19:51</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Actions</p> tag:www.githubstatus.com,2005:Incident/20775725 2024-05-13T15:44:36Z 2024-05-28T15:20:27Z Incident with Copilot <p><small>May <var data-var='date'>13</var>, <var data-var='time'>15:44</var> UTC</small><br><strong>Resolved</strong> - Incident Report: May 13, 2024 (lasting approximately 4 hours)<br /><br />On May 13 at 10:40 AM UTC, GitHub Copilot Chat began returning error responses to 6% of users. The problem was identified, and a status update was provided shortly after. A mitigation strategy was implemented by 14:30 UTC, which addressed the impact.<br /><br />The root cause of the incident was a combination of issues in the request handling process. Specifically, some requests were malformed, which resulted in them being incorrectly routed to the wrong deployment. That deployment wasn’t resilient to the malformed requests, which resulted in errors.<br /><br />To mitigate the immediate impact, requests were routed away from the failing deployment. This temporarily reduced the number of errors while the underlying issue was investigated and resolved.<br /><br />To prevent similar incidents in the future, we have enhanced validation checks for incoming requests to ensure proper handling and routing. In addition, we upgraded backend systems to provide more robust error handling and observability.<br /></p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>14:56</var> UTC</small><br><strong>Update</strong> - We are applying configuration changes to mitigate impact to Copilot Chat users.</p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>14:13</var> UTC</small><br><strong>Update</strong> - We continue to investigate the root cause of elevated errors in Copilot Chat.</p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>13:27</var> UTC</small><br><strong>Update</strong> - Copilot is experiencing degraded performance. We are continuing to investigate.</p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>13:24</var> UTC</small><br><strong>Update</strong> - We are investigating an increase in exceptions impacting Copilot Chat usage from IDEs.</p><p><small>May <var data-var='date'>13</var>, <var data-var='time'>13:23</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/20730441 2024-05-07T15:55:12Z 2024-05-08T02:43:43Z Incident with Issues, API Requests and Pull Requests <p><small>May <var data-var='date'> 7</var>, <var data-var='time'>15:55</var> UTC</small><br><strong>Resolved</strong> - Starting on May 7th, 2024 at 14:00 UTC, our Elasticsearch cluster that powers Issues and Pull Requests became unresponsive, correlating with a spike in usage. This affected GitHub customers who were trying to search issues and pull requests.<br /><br />We mitigated this incident by adding additional cluster members, which increased the resources available to the cluster. We are working to add additional safeguards to our endpoints.
We are also continuing to investigate the root cause of the instability and whether the index needs to be re-sharded for future risk mitigation.</p><p><small>May <var data-var='date'> 7</var>, <var data-var='time'>15:25</var> UTC</small><br><strong>Update</strong> - We are seeing recovery, and are continuing to monitor issues and pull requests search results.</p><p><small>May <var data-var='date'> 7</var>, <var data-var='time'>15:00</var> UTC</small><br><strong>Update</strong> - We’re investigating problems with our Issues and Pull Requests search cluster that are impacting result list pages and endpoints.</p><p><small>May <var data-var='date'> 7</var>, <var data-var='time'>14:24</var> UTC</small><br><strong>Update</strong> - We're investigating page load problems with issues and pull requests.</p><p><small>May <var data-var='date'> 7</var>, <var data-var='time'>14:23</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Issues, API Requests and Pull Requests</p> tag:www.githubstatus.com,2005:Incident/20693830 2024-05-03T02:45:29Z 2024-05-03T18:06:21Z We are investigating reports of degraded performance. <p><small>May <var data-var='date'> 3</var>, <var data-var='time'>02:45</var> UTC</small><br><strong>Resolved</strong> - From May 2, 2024 at 8:00 UTC through May 3 at 2:45 UTC, the GitHub Enterprise Server (GHES) Azure Marketplace offering was degraded and customers were not able to create GHES VMs using our provided GHES images. This affected all GitHub customers that were attempting to deploy VMs in Azure using either the API or the Azure Portal. This was due to an incorrect configuration of our Azure Marketplace offering, which caused the images to no longer be visible to Azure users.<br /><br />We mitigated the incident by working with our partners in Azure to restore access to the affected images.<br /><br />We are working with our partners in Azure to add additional safeguards to ensure our images remain available to customers at all times. In addition, we continue to work on restoring access to some older patch versions of GHES that remain unavailable at this time.</p><p><small>May <var data-var='date'> 3</var>, <var data-var='time'>02:37</var> UTC</small><br><strong>Update</strong> - Azure Marketplace links have been restored and we are validating the images</p><p><small>May <var data-var='date'> 3</var>, <var data-var='time'>01:44</var> UTC</small><br><strong>Update</strong> - Work is in progress to restore Azure Marketplace.</p><p><small>May <var data-var='date'> 3</var>, <var data-var='time'>01:05</var> UTC</small><br><strong>Update</strong> - GHES images on Azure are now restored and are available via the Azure CLI. The Azure Marketplace listing is not yet available. We will provide updates on the progress of the Azure Marketplace restoration.</p><p><small>May <var data-var='date'> 3</var>, <var data-var='time'>00:32</var> UTC</small><br><strong>Update</strong> - Work is in progress to restore images.
ETA is within 30 minutes.</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>23:58</var> UTC</small><br><strong>Update</strong> - Work is in progress to restore images; however, there is no ETA yet for when the restore will be complete</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>23:17</var> UTC</small><br><strong>Update</strong> - Work is in progress to restore images; however, there is no ETA yet for when the restore will be complete</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>22:43</var> UTC</small><br><strong>Update</strong> - Work in progress to restore images</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>22:09</var> UTC</small><br><strong>Update</strong> - We have identified the issue and are working to restore GHES VHD images.</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>20:26</var> UTC</small><br><strong>Update</strong> - We are actively engaged and working to mitigate the issue.</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>19:48</var> UTC</small><br><strong>Update</strong> - We have identified the root cause and are working to mitigate the issue.</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>19:07</var> UTC</small><br><strong>Update</strong> - Currently, customers who use Azure Marketplace are unable to access GHES VHD images. This prevents customers who use Azure from spinning up new GHES instances (running instances are unaffected and are still able to hotpatch to new versions as well)</p><p><small>May <var data-var='date'> 2</var>, <var data-var='time'>19:07</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/20642925 2024-04-26T16:49:38Z 2024-04-30T17:59:20Z We are investigating reports of degraded performance. <p><small>Apr <var data-var='date'>26</var>, <var data-var='time'>16:49</var> UTC</small><br><strong>Resolved</strong> - On April 26th, 2024, we received several internal reports of GitHub staff being unable to log in using their passkeys. Out of an abundance of caution, we updated our public status to yellow. Further investigation discovered that a background job intended to clean up stale authenticated device records was mistakenly removing associated passkeys. As a result, some passkey registrations were incorrectly removed from some user accounts.<br />The impact from the bug in the background job was limited to a small number of passkey users (<0.5%). We have since resolved the bug in question and are working to notify all affected users.</p><p><small>Apr <var data-var='date'>26</var>, <var data-var='time'>16:49</var> UTC</small><br><strong>Update</strong> - The issue appears to be limited in scope to a few internal users, without any reports of issues from outside GitHub. We are adding additional logging to our WebAuthn flow to detect this in the future. If you cannot use your mobile passkey to sign in, please contact support or reach out to us in https://github.com/orgs/community/discussions/67791</p><p><small>Apr <var data-var='date'>26</var>, <var data-var='time'>15:10</var> UTC</small><br><strong>Update</strong> - Signing in to GitHub.com using a passkey from a mobile device is currently failing.
Users may see an error message saying that passkey sign-in failed, or may not see any passkeys available after signing in with their password.<br />This issue impacts both GitHub.com on mobile devices and cross-device authentication where the phone's passkey is used to authenticate on a desktop browser.<br />To work around this issue, use your password and the 2FA method you set up prior to setting up your passkey, either TOTP or SMS.</p><p><small>Apr <var data-var='date'>26</var>, <var data-var='time'>15:10</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p> tag:www.githubstatus.com,2005:Incident/20625392 2024-04-24T18:20:14Z 2024-04-30T13:05:34Z Incident with Pull Requests <p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>18:20</var> UTC</small><br><strong>Resolved</strong> - This incident was caused by a mitigation for <a href="https://www.githubstatus.com/incidents/7txcg0j03kkg">this incident</a>. Please follow the link for the incident summary.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>18:01</var> UTC</small><br><strong>Update</strong> - The previous mitigation has been rolled back and updates to the pull request merge button should be working again. If you are still seeing issues, please attempt refreshing the pull request page.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>17:45</var> UTC</small><br><strong>Update</strong> - One of our mitigations from the previous incident caused live updates to the pull request merge button to be disabled for some customers. Refreshing the page will update the mergeability status.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>17:40</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Pull Requests</p> tag:www.githubstatus.com,2005:Incident/20621859 2024-04-24T16:16:34Z 2024-04-30T11:36:11Z Incident with Pull Requests, Git Operations, Actions, API Requests, Issues and Webhooks <p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:16</var> UTC</small><br><strong>Resolved</strong> - On April 24, 2024, between 10:26 and 16:30 UTC, the Pull Request page took longer than usual to enable a Pull Request to be merged. During the incident, the 75th percentile time to merge a pull request went from around 3 seconds to just over 6 minutes. We also saw slightly elevated error rates across the service, with an average error rate of 0.1%, peaking at 0.31% of requests. <br /><br />The underlying cause of the incident was a repeated problematic query that resulted in MySQL replicas crashing, which caused a backlog in the jobs that compute whether PRs can be merged.<br /><br />At 16:30 the incident self-mitigated when the volume of the problematic query decreased. Concurrent to this, we deployed a mitigation to remove the mergeability polling code from the PR experience to alleviate pressure on MySQL servers. This mitigation caused the merge button to not enable automatically, requiring a page refresh instead. At 17:40 we statused PRs again for this issue.
We rolled back the mitigation to resolve the second incident at 18:20.<br /><br />Going forward, to improve mitigation, we are investing in faster recovery from cascading read-replica crashes, improving query retry logic to more aggressively back off when read replicas are unhealthy, improving our ability to diagnose server crashes and trace them back to user activity, and making it easier to block problematic queries.<br /><br />To prevent recurrence, we are eliminating the query that caused the server to crash, improving our detection and mitigation of problematic queries, and investigating and remediating the underlying issues that caused MySQL to crash. We are also improving the merge button polling mechanism and mergeability background job to be resilient to this class of incident so customers can still merge PRs. <br /></p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:13</var> UTC</small><br><strong>Update</strong> - Issues is operating normally.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:13</var> UTC</small><br><strong>Update</strong> - Actions is operating normally.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:13</var> UTC</small><br><strong>Update</strong> - Pull Requests is operating normally.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:13</var> UTC</small><br><strong>Update</strong> - Webhooks is operating normally.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:13</var> UTC</small><br><strong>Update</strong> - Git Operations is operating normally.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>16:12</var> UTC</small><br><strong>Update</strong> - API Requests is operating normally.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>15:50</var> UTC</small><br><strong>Update</strong> - We are seeing site-wide recovery but continue to closely monitor our systems and are putting additional mitigations in place to ensure we are back to full health.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>14:08</var> UTC</small><br><strong>Update</strong> - We are continuing to see consistent impact, and we’re continuing to work on multiple mitigations to reduce load on our systems.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>12:47</var> UTC</small><br><strong>Update</strong> - We have found an issue that may be contributing additional load to the website and are working on mitigations. We don't see any additional impact at this time and will provide another update within an hour if we see improvements or fully mitigate the issue based on this investigation.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>12:00</var> UTC</small><br><strong>Update</strong> - We have taken some mitigations and see less than 0.3 percent of requests failing site-wide, but we still see elevated 500 errors and will continue to stay statused and investigate until we are confident we have restored our error rate to baseline.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>11:13</var> UTC</small><br><strong>Update</strong> - We are seeing increased 500 errors for various GraphQL and REST APIs related to database issues. Some users may see periodic 500 errors.
The team is looking into the problematic queries and mitigations now.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>11:09</var> UTC</small><br><strong>Update</strong> - Actions is experiencing degraded performance. We are continuing to investigate.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>11:06</var> UTC</small><br><strong>Update</strong> - Git Operations is experiencing degraded performance. We are continuing to investigate.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>10:55</var> UTC</small><br><strong>Update</strong> - Pull Requests is experiencing degraded performance. We are continuing to investigate.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>10:52</var> UTC</small><br><strong>Update</strong> - Webhooks is experiencing degraded performance. We are continuing to investigate.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>10:51</var> UTC</small><br><strong>Update</strong> - Issues is experiencing degraded performance. We are continuing to investigate.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>10:45</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for API Requests</p> tag:www.githubstatus.com,2005:Incident/20621931 2024-04-24T11:01:46Z 2024-04-24T11:01:46Z Incident with Git Operations <p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>11:01</var> UTC</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>10:56</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Git Operations</p> tag:www.githubstatus.com,2005:Incident/20571321 2024-04-18T18:47:27Z 2024-05-01T17:49:09Z Incident with Codespaces <p><small>Apr <var data-var='date'>18</var>, <var data-var='time'>18:47</var> UTC</small><br><strong>Resolved</strong> - Between 4:40 UTC and 18:47 UTC on April 18th, approximately 200 users experienced issues creating and reconnecting to Codespaces. After investigation, it was determined that nearly 100% of those users were internal to GitHub. There was very little customer impact during this incident.<br /><br />We are discussing improvements to our monitoring so we can more quickly discern internal from external impact and avoid potentially confusing public status.</p><p><small>Apr <var data-var='date'>18</var>, <var data-var='time'>18:41</var> UTC</small><br><strong>Update</strong> - Codespaces customers using our 16-core machines in the West US 2 and West US 3 regions may experience issues creating new Codespaces and resuming existing Codespaces. We suggest any customers experiencing issues switch to the East US region.</p><p><small>Apr <var data-var='date'>18</var>, <var data-var='time'>18:25</var> UTC</small><br><strong>Investigating</strong> - We are investigating reports of degraded performance for Codespaces</p> tag:www.githubstatus.com,2005:Incident/20551752 2024-04-17T00:48:52Z 2024-04-25T20:49:15Z Incident with Copilot <p><small>Apr <var data-var='date'>17</var>, <var data-var='time'>00:48</var> UTC</small><br><strong>Resolved</strong> - On April 16th, 2024, between 22:31 UTC and 00:11 UTC, Copilot chat users experienced elevated request errors. On average, the error rate was 1.2% and peaked at 5.2%.
This was due to a rolling application upgrade applied to a backend system during a maintenance event.<br /><br />The incident was resolved once the rolling upgrade was completed.<br /><br />We are working to improve monitoring and alerting of our services, be more resilient to failures, and coordinate maintenance events to reduce our time to detection and mitigation of issues like this in the future.</p><p><small>Apr <var data-var='date'>17</var>, <var data-var='time'>00:30</var> UTC</small><br><strong>Update</strong> - We're continuing to investigate issues with Copilot</p><p><small>Apr <var data-var='date'>16</var>, <var data-var='time'>23:59</var> UTC</small><br><strong>Update</strong> - Copilot is experiencing degraded performance. We are continuing to investigate.</p><p><small>Apr <var data-var='date'>16</var>, <var data-var='time'>23:57</var> UTC</small><br><strong>Update</strong> - We're investigating issues with Copilot availability</p><p><small>Apr <var data-var='date'>16</var>, <var data-var='time'>23:51</var> UTC</small><br><strong>Investigating</strong> - We are currently investigating this issue.</p>