<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xml" href="http://githubengineering.com/feed.xslt.xml"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="3.3.1">Jekyll</generator><link href="http://githubengineering.com/atom.xml" rel="self" type="application/atom+xml" /><link href="http://githubengineering.com/" rel="alternate" type="text/html" /><updated>2016-12-21T18:37:27+00:00</updated><id>http://githubengineering.com//</id><title type="html">GitHub Engineering</title><subtitle>The Blog of the GitHub Engineering Team</subtitle><author><name>GitHub Engineering</name></author><entry><title type="html">Orchestrator at GitHub</title><link href="http://githubengineering.com/orchestrator-github/" rel="alternate" type="text/html" title="Orchestrator at GitHub" /><published>2016-12-08T00:00:00+00:00</published><updated>2016-12-08T00:00:00+00:00</updated><id>http://githubengineering.com/orchestrator-github</id><content type="html" xml:base="http://githubengineering.com/orchestrator-github/">&lt;p&gt;GitHub uses MySQL to store its metadata: Issues, Pull Requests, comments, organizations, notifications and so forth. While &lt;code class=&quot;highlighter-rouge&quot;&gt;git&lt;/code&gt; repository data does not need MySQL to exist and persist, GitHub’s service does. Authentication, API, and the website itself all require the availability of our MySQL fleet.&lt;/p&gt;

&lt;p&gt;Our replication topologies span multiple data centers and this poses a challenge not only for availability but also for manageability and operations.&lt;/p&gt;

&lt;h3 id=&quot;automated-failovers&quot;&gt;Automated failovers&lt;/h3&gt;

&lt;p&gt;We use a classic MySQL master-replicas setup, where the master is the single writer, and replicas are mainly used for read traffic. We expect our MySQL fleet to be available for writes. Placing a review, creating a new repository, adding a collaborator, all require write access to our backend database. We require the master to be available.&lt;/p&gt;

&lt;p&gt;To that effect we employ automated master failovers. The time it would take a human to wake up and fix a failed master exceeds our availability expectations, and operating such a failover is sometimes non-trivial. We expect master failures to be automatically detected and recovered within 30 seconds or less, and we expect failover to result in minimal loss of available hosts.&lt;/p&gt;

&lt;p&gt;We also expect to avoid false positives and false negatives. Failing over when there’s no failure is wasteful and should be avoided. Not failing over when failover should take place means an outage. Flapping is unacceptable. And so there must be a reliable detection mechanism that makes the right choice and takes a predictable course of action.&lt;/p&gt;

&lt;h3 id=&quot;orchestrator&quot;&gt;orchestrator&lt;/h3&gt;

&lt;p&gt;We employ &lt;a href=&quot;https://github.com/github/orchestrator&quot;&gt;Orchestrator&lt;/a&gt; to manage our MySQL failovers. &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; is an open source MySQL replication management and high availability solution. It observes MySQL replication topologies, auto-detects topology layout and changes, understands replication rules across configurations and versions, detects failure scenarios and recovers from master and intermediate master failures.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;orchestrator logo&quot; src=&quot;/images/orchestrator-github/orchestrator-logo-wide.png&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;failure-detection&quot;&gt;Failure detection&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; takes a different approach to failure detection than the common monitoring tools. The common way to detect master failure is by observing the master: via ping, via simple port scan, via simple &lt;code class=&quot;highlighter-rouge&quot;&gt;SELECT&lt;/code&gt; query. These tests all suffer from the same problem: &lt;em&gt;What if there’s an error?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Network glitches can happen; the monitoring tool itself may be network partitioned. The naive solutions are along the lines of “try several times at fixed intervals, and on the n-th successive failure, assume the master has failed”. While repeated polling works, such checks tend to lead to false positives and to increased outages: the smaller &lt;em&gt;n&lt;/em&gt; is (or the smaller the interval is), the more potential there is for a false positive, as short network glitches will cause unjustified failovers. However, larger &lt;em&gt;n&lt;/em&gt; values (or longer poll intervals) will delay detection of a true failure.&lt;/p&gt;

&lt;p&gt;A better approach employs multiple observers, all, or at least a majority, of whom must agree that the master has failed. This reduces the danger of a single observer suffering from network partitioning.&lt;/p&gt;
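The multi-observer idea can be sketched as a simple quorum check. This is an illustrative sketch only, not orchestrator's actual implementation; the probe list and `quorum` parameter are invented for the example:

```python
def master_seems_failed(observer_probes, quorum):
    """Declare failure only when at least `quorum` independent observers
    report the master unreachable.

    observer_probes: list of booleans, True meaning "I could not reach
    the master" from one observer's vantage point.
    """
    failed_votes = sum(1 for unreachable in observer_probes if unreachable)
    return failed_votes >= quorum

# A single partitioned observer cannot trigger a failover:
assert not master_seems_failed([True, False, False], quorum=2)
# Agreement among multiple observers can:
assert master_seems_failed([True, True, False], quorum=2)
```

The trade-off remains: a quorum protects against one partitioned observer, but all observers still probe the master from the outside.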

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; uses a holistic approach, utilizing the replication cluster itself. The master is not an isolated entity. It has replicas. These replicas continuously poll the master for incoming changes, copy those changes and replay them. They have their own retry count/interval setup. When &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; looks for a failure scenario, it looks at the master &lt;em&gt;and&lt;/em&gt; at all of its replicas. It knows what replicas to expect because it continuously observes the topology, and has a clear picture of how it looked the moment before failure.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; seeks agreement between itself and the replicas: if &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; cannot reach the master, but all replicas are happily replicating and making progress, there is no failure scenario. But if the master is unreachable to &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; and all replicas say: “Hey! Replication is broken, we cannot reach the master”, our conclusion becomes very powerful: we haven’t just gathered input from multiple hosts. We have identified that the replication cluster is broken &lt;em&gt;de-facto&lt;/em&gt;. The master may be alive, it may be dead, may be network partitioned; it does not matter: the cluster does not receive updates and for all practical purposes does not function. This situation is depicted in the image below:&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;failed master topology&quot; src=&quot;/images/orchestrator-github/orchestrator-failed-master.png&quot; /&gt;&lt;/p&gt;
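The holistic check can be expressed as combining orchestrator's own probe with the replicas' replication state. A simplified sketch, not the real code; the `io_thread_running` field name merely stands in for "this replica can still pull changes from the master":

```python
def detect_dead_master(master_reachable, replicas):
    """Holistic failure detection, simplified.

    master_reachable: whether orchestrator itself can reach the master.
    replicas: list of dicts with an 'io_thread_running' flag.
    """
    if master_reachable:
        return False  # orchestrator sees the master; no failure scenario
    if not replicas:
        return False  # no corroborating witnesses; avoid acting alone
    # Failure is declared only if *every* expected replica also reports
    # broken replication from the master: the cluster is broken de-facto.
    return all(not r["io_thread_running"] for r in replicas)

# Network glitch: orchestrator is partitioned, but replicas are fine.
assert not detect_dead_master(False, [{"io_thread_running": True}] * 3)
# True failure: nobody can reach the master.
assert detect_dead_master(False, [{"io_thread_running": False}] * 3)
```

The key design choice: the replicas' verdict, not the direct probe, is what makes the conclusion trustworthy.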

&lt;p&gt;Masters are not the only subject of failure detection: &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; applies similar logic to intermediate masters: replicas which happen to have further replicas of their own.&lt;/p&gt;

&lt;p&gt;Furthermore, &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; also considers more complex cases, such as unreachable replicas or other scenarios where decision making turns fuzzier. In some such cases it is still confident enough to proceed with failover; in others it settles for detection and notification only.&lt;/p&gt;

&lt;p&gt;We observe that &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;’s detection algorithm is very accurate. We spent a few months in testing its decision making before switching on auto-recovery.&lt;/p&gt;

&lt;h3 id=&quot;failover&quot;&gt;Failover&lt;/h3&gt;

&lt;p&gt;Once the decision to fail over has been made, the next step is to choose where to fail over to. That decision, too, is non-trivial.&lt;/p&gt;

&lt;p&gt;In semi-sync replication environments, which &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; supports, one or more designated replicas are guaranteed to be most up-to-date, which guarantees one or more servers that would be ideal to promote. Enabling semi-sync is on our roadmap; we use asynchronous replication at this time. Some updates made to the master may never make it to any replica, and there is no guarantee as to which replica will get the most recent updates. Choosing the most up-to-date replica means losing the least data. However, in the world of operations not all replicas are created equal: at any given time we may be experimenting with a recent MySQL release that we’re not ready yet to put into production; or we may be transitioning from &lt;code class=&quot;highlighter-rouge&quot;&gt;STATEMENT&lt;/code&gt; based replication to &lt;code class=&quot;highlighter-rouge&quot;&gt;ROW&lt;/code&gt; based; or we may have servers in a remote data center that preferably wouldn’t take writes. Or we may have a designated server with stronger hardware that we’d like to promote no matter what.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; understands all replication rules and picks the replica that makes the most sense to promote, based on a set of rules and on the available servers, their configuration, their physical location and more. Depending on the servers’ configuration, it is able to do a two-step promotion by first healing the topology in whatever setup is easiest, then promoting a designated or otherwise best server as master.&lt;/p&gt;
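Rule-based candidate selection of this kind could be sketched as follows. This is a toy illustration, not orchestrator's promotion logic; the attribute names (`experimental_version`, `remote_dc`, `designated`, `exec_pos`) are invented:

```python
def pick_promotion_candidate(replicas):
    """Choose a replica to promote: filter out ineligible servers, then
    prefer designated candidates, breaking ties by how up-to-date they are."""
    eligible = [
        r for r in replicas
        if not r["experimental_version"]  # e.g. testing a newer MySQL release
        and not r["remote_dc"]            # prefer not to take writes remotely
    ]
    if not eligible:
        return None
    # Designated servers win; otherwise the most recent executed position
    # (least data loss) decides.
    return max(eligible, key=lambda r: (r["designated"], r["exec_pos"]))

replicas = [
    {"name": "a", "experimental_version": False, "remote_dc": False,
     "designated": False, "exec_pos": 120},
    {"name": "b", "experimental_version": False, "remote_dc": False,
     "designated": True, "exec_pos": 100},
    {"name": "c", "experimental_version": True, "remote_dc": False,
     "designated": False, "exec_pos": 130},
]
# "c" is most up-to-date but runs an experimental version; "b" is designated.
assert pick_promotion_candidate(replicas)["name"] == "b"
```

Note the tension the paragraph describes: the designated server wins even though a filtered-out replica holds more recent data, which is why a two-step promotion can be useful.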

&lt;p&gt;We build trust in the failover procedure by continuously testing failovers. We intend to write more on this in a later post.&lt;/p&gt;

&lt;h3 id=&quot;anti-flapping-and-acknowledgements&quot;&gt;Anti-flapping and acknowledgements&lt;/h3&gt;

&lt;p&gt;Flapping is strictly unacceptable. To that effect &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; is configured to only perform one automated failover for any given cluster in a preconfigured time period. Once a failover takes place, the failed cluster is marked as “blocked” from further failovers. This mark is cleared after, say, &lt;code class=&quot;highlighter-rouge&quot;&gt;30&lt;/code&gt; minutes, or until a human says otherwise.&lt;/p&gt;
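The anti-flapping behavior amounts to a per-cluster rate limiter with a human override. A minimal sketch under the assumptions above (a 30-minute block, cleared early by acknowledgement); the class and method names are illustrative, not orchestrator's API:

```python
import time

class FailoverThrottle:
    """Allow at most one automated failover per cluster within a block
    period, unless a human acknowledges the previous one."""

    def __init__(self, block_seconds=30 * 60):
        self.block_seconds = block_seconds
        self.last_failover = {}  # cluster name -> timestamp of last failover

    def may_failover(self, cluster, now=None):
        now = time.time() if now is None else now
        last = self.last_failover.get(cluster)
        return last is None or now - last >= self.block_seconds

    def record_failover(self, cluster, now=None):
        self.last_failover[cluster] = time.time() if now is None else now

    def acknowledge(self, cluster):
        # A human reviewed the failover; clear the block immediately.
        self.last_failover.pop(cluster, None)

t = FailoverThrottle()
t.record_failover("main", now=1000.0)
assert not t.may_failover("main", now=1000.0 + 60)  # still blocked
t.acknowledge("main")
assert t.may_failover("main", now=1000.0 + 61)      # cleared by a human
```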

&lt;p&gt;To clarify, an automated master failover in the middle of the night does not mean stakeholders get to sleep through it. Pages will arrive, even as failover takes place. A human will observe the state, and may or may not acknowledge the failover as justified. Once acknowledged, &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; forgets about that failover and is free to proceed with further failovers on that cluster should the case arise.&lt;/p&gt;

&lt;h3 id=&quot;topology-management&quot;&gt;Topology management&lt;/h3&gt;

&lt;p&gt;There’s more than failovers to &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;. It allows for simplified topology management and visualization.&lt;/p&gt;

&lt;p&gt;We have multiple clusters of differing sizes that span multiple datacenters (DCs). Consider the following:&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;3 datacenter topology&quot; src=&quot;/images/orchestrator-github/orchestrator-3dc.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The different colors indicate different data centers; the above topology spans three DCs. Cross-DC network calls have higher latency and are more expensive than intra-DC calls, so we typically group a DC’s servers under a designated &lt;em&gt;intermediate master&lt;/em&gt;, aka &lt;em&gt;local DC master&lt;/em&gt;, to reduce cross-DC network traffic. In the topology above, &lt;code class=&quot;highlighter-rouge&quot;&gt;instance-64bb&lt;/code&gt; (blue, 2nd from bottom on the right) could replicate from &lt;code class=&quot;highlighter-rouge&quot;&gt;instance-6b44&lt;/code&gt; (blue, bottom, middle) and free up some cross-DC traffic.&lt;/p&gt;

&lt;p&gt;This design leads to more complex topologies: replication trees that go deeper than one or two levels. There are more use cases for such topologies:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Experimenting with a newer version: to test, say, MySQL &lt;code class=&quot;highlighter-rouge&quot;&gt;5.7&lt;/code&gt; we create a subtree of &lt;code class=&quot;highlighter-rouge&quot;&gt;5.7&lt;/code&gt; servers, with one acting as an intermediate master. This allows us to test &lt;code class=&quot;highlighter-rouge&quot;&gt;5.7&lt;/code&gt; replication flow and speed.&lt;/li&gt;
  &lt;li&gt;Migrating from &lt;code class=&quot;highlighter-rouge&quot;&gt;STATEMENT&lt;/code&gt; based replication to &lt;code class=&quot;highlighter-rouge&quot;&gt;ROW&lt;/code&gt; based replication: we again migrate slowly by creating subtrees, adding more and more nodes to those trees until they consume the entire topology.&lt;/li&gt;
  &lt;li&gt;Simplified automation: a newly provisioned host, or a host restored from backup, is set to replicate from the backup server whose data was used to restore the host.&lt;/li&gt;
  &lt;li&gt;Data partitioning is achieved by incubating and splitting out new clusters, originally dangling as sub-clusters then becoming independent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deep nested replication topologies introduce management complexity:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Every intermediate master becomes a point of failure for its nested subtree.&lt;/li&gt;
  &lt;li&gt;Recoveries in mixed-version or mixed-format topologies are subject to cross-version or cross-format replication constraints. Not every server can replicate from every other.&lt;/li&gt;
  &lt;li&gt;Maintenance requires careful refactoring of the topology: you can’t just take down a server to upgrade its hardware; if it serves as a local/intermediate master, taking it offline would break replication for its own replicas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; allows for easy and safe refactoring and management of such complex topologies:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;It can fail over dead intermediate masters, eliminating the “point of failure” problem.&lt;/li&gt;
  &lt;li&gt;Refactoring (moving replicas around the topology) is made easy via GTID or &lt;a href=&quot;https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md&quot;&gt;Pseudo-GTID&lt;/a&gt; (an application level injection of sparse GTID-like entries).&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; understands replication rules and will refuse to place, say, a &lt;code class=&quot;highlighter-rouge&quot;&gt;5.6&lt;/code&gt; server below a &lt;code class=&quot;highlighter-rouge&quot;&gt;5.7&lt;/code&gt; server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; also serves as the de-facto topology state/inventory indicator. It complements &lt;code class=&quot;highlighter-rouge&quot;&gt;puppet&lt;/code&gt; or service discovery configuration, which imply the &lt;em&gt;desired&lt;/em&gt; state, by actually observing the &lt;em&gt;existing&lt;/em&gt; state. State is queryable at various levels, and we employ &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; in some of our automation tasks.&lt;/p&gt;
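Automation can query that observed state over the orchestrator service's HTTP API. A minimal sketch; the service address is hypothetical, and while orchestrator documents a cluster-by-alias API endpoint, treat the exact path here as an assumption to verify against your orchestrator version:

```python
import json
from urllib.request import urlopen

# Hypothetical orchestrator service address; adjust to your deployment.
ORCHESTRATOR = "http://orchestrator.example.com:3000"

def cluster_api_url(cluster_alias):
    """Build the API URL for a cluster's observed topology."""
    return f"{ORCHESTRATOR}/api/cluster/alias/{cluster_alias}"

def cluster_instances(cluster_alias):
    """Fetch the list of instances orchestrator currently observes
    for the given cluster, as JSON."""
    with urlopen(cluster_api_url(cluster_alias)) as resp:
        return json.load(resp)

assert cluster_api_url("sample-cluster").endswith(
    "/api/cluster/alias/sample-cluster")
```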

&lt;h3 id=&quot;chatops-integration&quot;&gt;Chatops integration&lt;/h3&gt;

&lt;p&gt;We love our chatops as they make our operations visible and accessible to our greater group of engineers.
While the orchestrator service provides a web interface, we rarely use it: one’s browser is one’s own private command center, with no visibility to others and no history.&lt;/p&gt;

&lt;p&gt;We rely on chatops for most operations. As a quick example of the visibility we get via chatops, let’s examine a cluster:&lt;/p&gt;

&lt;div class=&quot;chat&quot; style=&quot;margin: 30px 0;&quot;&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .orc cluster sample-cluster
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
&lt;pre&gt;
host                     lag  status  version          mode  format     extra
---                      ---  ---     ---              ---   ---        ---
instance-e854             0s  ok      5.6.26-74.0-log  rw    STATEMENT  &amp;gt;&amp;gt;,P-GTID
+ instance-fadf           0s  ok      5.6.26-74.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-9d3d         0s  ok      5.6.31-77.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-8125         0s  ok      5.6.31-77.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
+ instance-b982           0s  ok      5.6.26-74.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
+ instance-c5a7           0s  ok      5.6.31-77.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-64bb         0s  ok      5.6.31-77.0-log  rw    nobinlog   P-GTID
+ instance-6b44           0s  ok      5.6.31-77.0-log  rw    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-cac3     14400s  ok      5.6.31-77.0-log  rw    STATEMENT  &amp;gt;&amp;gt;,P-GTID
&lt;/pre&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Say we wanted to upgrade &lt;code class=&quot;highlighter-rouge&quot;&gt;instance-fadf&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;5.6.31-77.0-log&lt;/code&gt;. It has two replicas attached that we don’t want to be affected. We can:&lt;/p&gt;

&lt;div class=&quot;chat&quot; style=&quot;margin: 30px 0;&quot;&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .orc relocate-replicas instance-fadf below instance-c5a7
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
&lt;pre&gt;
instance-9d3d
instance-8125
&lt;/pre&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;To the following effect:&lt;/p&gt;

&lt;div class=&quot;chat&quot; style=&quot;margin: 30px 0;&quot;&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .orc cluster sample-cluster
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
&lt;pre&gt;
host                     lag  status  version          mode  format     extra
---                      ---  ---     ---              ---   ---        ---
instance-e854             0s  ok      5.6.26-74.0-log  rw    STATEMENT  &amp;gt;&amp;gt;,P-GTID
+ instance-fadf           0s  ok      5.6.26-74.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
+ instance-b982           0s  ok      5.6.26-74.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
+ instance-c5a7           0s  ok      5.6.31-77.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-9d3d         0s  ok      5.6.31-77.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-8125         0s  ok      5.6.31-77.0-log  ro    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-64bb         0s  ok      5.6.31-77.0-log  rw    nobinlog   P-GTID
+ instance-6b44           0s  ok      5.6.31-77.0-log  rw    STATEMENT  &amp;gt;&amp;gt;,P-GTID
  + instance-cac3     14400s  ok      5.6.31-77.0-log  rw    STATEMENT  &amp;gt;&amp;gt;,P-GTID
&lt;/pre&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;The instance is now free to be taken &lt;a href=&quot;http://githubengineering.com/context-aware-mysql-pools-via-haproxy&quot;&gt;out of the pool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Other actions are available to us via chatops: we can force a failover, acknowledge recoveries, query topology structure, and more. &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; further communicates with us on chat, and notifies us in the event of a failure/recovery.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; also runs as a command-line tool, and the &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; service supports a web API, so it can easily participate in automated tasks.&lt;/p&gt;

&lt;h3 id=&quot;orchestrator--github&quot;&gt;orchestrator @ GitHub&lt;/h3&gt;

&lt;p&gt;GitHub has adopted &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;, and will continue to improve and maintain it. The &lt;a href=&quot;https://github.com/github/orchestrator&quot;&gt;github repo&lt;/a&gt; will serve as the new upstream and will accept issues and pull requests from the community.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; continues to be free and open source, and is released under the &lt;a href=&quot;https://github.com/github/orchestrator/blob/master/LICENSE&quot;&gt;Apache License 2.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Migrating the project to the &lt;a href=&quot;https://github.com/github/orchestrator&quot;&gt;GitHub repo&lt;/a&gt; had the unfortunate result of diverging from the original &lt;a href=&quot;https://github.com/outbrain/orchestrator/&quot;&gt;Outbrain repo&lt;/a&gt;, due to the way import paths are coupled with the repo URI in &lt;code class=&quot;highlighter-rouge&quot;&gt;golang&lt;/code&gt;. The two diverged repositories will not be kept in sync, and we took the opportunity to make some further diverging changes, though we made sure to keep the API &amp;amp; command line spec compatible. We’ll keep an eye on incoming issues in the Outbrain repo.&lt;/p&gt;

&lt;h3 id=&quot;outbrain&quot;&gt;Outbrain&lt;/h3&gt;

&lt;p&gt;It is our pleasure to acknowledge &lt;a href=&quot;http://www.outbrain.com/&quot;&gt;Outbrain&lt;/a&gt; as the original author of &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;. The project originated at Outbrain as it sought to manage a growing fleet of servers in three data centers. It began as a means to visualize the existing topologies, with minimal support for refactoring, and came at a time when massive hardware upgrades and datacenter changes were taking place. &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; was used as the tool for refactoring and for ensuring topology setups went as planned and without interruption to service, even as servers were being provisioned or retired.&lt;/p&gt;

&lt;p&gt;Later on Pseudo-GTID was introduced to overcome the problems of unreachable/crashing/lagging intermediate masters, and shortly afterwards recoveries came into being. &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; was put to production in very early stages and worked on busy and sensitive systems.&lt;/p&gt;

&lt;p&gt;Outbrain was happy to develop &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; as a public open source project and provided the resources to allow its development, not only to the specific benefits of the company, but also to the wider community. Outbrain authors many more open source projects, which can be found on their GitHub’s &lt;a href=&quot;https://outbrain.github.io/&quot;&gt;Outbrain engineering&lt;/a&gt; page.&lt;/p&gt;

&lt;p&gt;We’d like to thank Outbrain for their contributions to &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;, as well as for their openness to having us adopt the project.&lt;/p&gt;

&lt;h3 id=&quot;further-acknowledgements&quot;&gt;Further acknowledgements&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; was later developed at &lt;a href=&quot;http://www.booking.com&quot;&gt;Booking.com&lt;/a&gt;, where it was brought in to improve on the existing high availability scheme. &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;’s flexibility allowed for simpler hardware setups and faster failovers. The project benefited from the very large MySQL deployment Booking.com runs, managing various MySQL vendors, versions and configurations, on clusters ranging from a single master to many hundreds of MySQL servers and Binlog Servers across multiple data centers. Booking.com continuously contributes to &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We’d like to further acknowledge major community contributions made by Google/&lt;a href=&quot;http://vitess.io&quot;&gt;Vitess&lt;/a&gt; (&lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt; is the &lt;a href=&quot;http://vitess.io/user-guide/server-configuration.html#orchestrator&quot;&gt;failover mechanism&lt;/a&gt; used by Vitess), and by &lt;a href=&quot;https://squareup.com/&quot;&gt;Square, Inc&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;related-projects&quot;&gt;Related projects&lt;/h3&gt;

&lt;p&gt;We are working to release a public &lt;code class=&quot;highlighter-rouge&quot;&gt;puppet&lt;/code&gt; module for &lt;code class=&quot;highlighter-rouge&quot;&gt;orchestrator&lt;/code&gt;, and will edit this post once released.&lt;/p&gt;

&lt;p&gt;Chef users, please consider this &lt;a href=&quot;https://github.com/sendgrid-ops/chef-orchestrator&quot;&gt;Chef cookbook&lt;/a&gt; by &lt;a href=&quot;https://github.com/silviabotros&quot;&gt;@silviabotros&lt;/a&gt;.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;shlomi-noach&quot;, &quot;fullname&quot;=&gt;&quot;Shlomi Noach&quot;, &quot;twitter&quot;=&gt;&quot;ShlomiNoach&quot;, &quot;role&quot;=&gt;&quot;Senior Infrastructure Engineer&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;Website&quot;, &quot;url&quot;=&gt;&quot;http://openark.org&quot;}, {&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/shlomi-noach&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/ShlomiNoach&quot;}]}</name></author><summary type="html">GitHub uses MySQL to store its metadata: Issues, Pull Requests, comments, organizations, notifications and so forth. While git repository data does not need MySQL to exist and persist, GitHub’s service does. Authentication, API, and the website itself all require the availability of our MySQL fleet.</summary></entry><entry><title type="html">How we made diff pages three times faster</title><link href="http://githubengineering.com/how-we-made-diff-pages-3x-faster/" rel="alternate" type="text/html" title="How we made diff pages three times faster" /><published>2016-12-06T00:00:00+00:00</published><updated>2016-12-06T00:00:00+00:00</updated><id>http://githubengineering.com/how-we-made-diff-pages-3x-faster</id><content type="html" xml:base="http://githubengineering.com/how-we-made-diff-pages-3x-faster/">&lt;p&gt;We serve a lot of diffs here at GitHub. Because it is computationally
expensive to generate and display a diff, we’ve traditionally had to apply
some very conservative limits on what gets loaded. We knew
we could do better, and we set out to do so.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;total diff runs per hour&quot; src=&quot;/images/progressive-diffs/total-diffs.png&quot; /&gt;
&lt;/div&gt;

&lt;h2 id=&quot;historical-approach-and-problems&quot;&gt;Historical approach and problems&lt;/h2&gt;

&lt;p&gt;Before this change, we fetched diffs by asking Git for the diff between two
commit objects. We would then parse the output, checking it against the various
limits we had in place. At the time they were as follows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Up to 300 files in total.&lt;/li&gt;
  &lt;li&gt;Up to 100KB of diff text per file.&lt;/li&gt;
  &lt;li&gt;Up to 1MB of diff text overall.&lt;/li&gt;
  &lt;li&gt;Up to 3,000 lines of diff text per file.&lt;/li&gt;
  &lt;li&gt;Up to 20,000 lines of diff text overall.&lt;/li&gt;
  &lt;li&gt;An overall RPC timeout of up to eight seconds, though in some places it would
be adjusted to fit within the remaining time allotted to the request.&lt;/li&gt;
&lt;/ul&gt;
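Enforcing limits of this kind while consuming Git's diff output in order can be sketched as follows. An illustrative sketch mirroring the values listed above, not GitHub's actual code; in particular it shows why files appearing later in Git's output could be cut off:

```python
MAX_FILES = 300
MAX_BYTES_PER_FILE = 100 * 1024
MAX_BYTES_TOTAL = 1024 * 1024
MAX_LINES_PER_FILE = 3_000
MAX_LINES_TOTAL = 20_000

def truncate_diff(files):
    """files: list of (name, patch_text) in the order Git emitted them.
    Returns the files that fit within the limits, plus a flag saying
    whether the diff was truncated."""
    kept, total_bytes, total_lines = [], 0, 0
    for name, patch in files:
        size, lines = len(patch), patch.count("\n")
        if (len(kept) >= MAX_FILES
                or size > MAX_BYTES_PER_FILE
                or lines > MAX_LINES_PER_FILE
                or total_bytes + size > MAX_BYTES_TOTAL
                or total_lines + lines > MAX_LINES_TOTAL):
            return kept, True  # truncated: later files are never shown
        kept.append((name, patch))
        total_bytes += size
        total_lines += lines
    return kept, False

small = [("a.txt", "+x\n"), ("b.txt", "+y\n")]
assert truncate_diff(small) == (small, False)
# One oversized file early in the output hides everything after it:
big = [("huge.bin", "x" * (200 * 1024))] + small
kept, truncated = truncate_diff(big)
assert truncated and kept == []
```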

&lt;p&gt;These limits were in place both to prevent excessive load on the file servers
and to keep the browser’s DOM from growing too large and making the web
page less responsive.&lt;/p&gt;

&lt;p&gt;In practice, our limits did a pretty good job of protecting our servers and
users’ web browsers from being overloaded. But because these limits were applied
in the order Git handed us back the diff text, it was possible for a diff to be
truncated before we reached the interesting parts. Unfortunately, users had to
fall back to command-line tools to see their changes in these cases.&lt;/p&gt;

&lt;p&gt;Finally, timeouts were happening far more frequently than we liked. Regardless
of the size of the requested diff, users shouldn’t have to wait up to eight
seconds for a response, only to occasionally be met with an error message.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;Diff page timeouts before progressive diff&quot; src=&quot;/images/progressive-diffs/previous-timeouts.png&quot; /&gt;
&lt;/div&gt;

&lt;h2 id=&quot;our-goals&quot;&gt;Our Goals&lt;/h2&gt;

&lt;p&gt;Our main goal was to improve the user experience around (re)viewing
diffs on GitHub:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Allow users to (re)view the changes that matter, rather than just whatever
appears before the diff is truncated.&lt;/li&gt;
  &lt;li&gt;Reduce request timeouts due to very large diffs.&lt;/li&gt;
  &lt;li&gt;Pave the way for previously inaccessible optimizations (e.g. avoid loading
suppressed diffs).&lt;/li&gt;
  &lt;li&gt;Reduce unnecessary load on &lt;a href=&quot;http://githubengineering.com/introducing-dgit/&quot;&gt;GitHub’s storage infrastructure&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Improve accuracy of diff statistics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;a-new-approach&quot;&gt;A new approach&lt;/h2&gt;

&lt;p&gt;To achieve these goals, we needed a better approach to handling large diffs.
We wanted a solution that would give us a high-level overview of all changes in
a diff, and then load the patch texts for the individual changed files
“progressively”. These discrete sections could later be assembled by the user’s
browser.&lt;/p&gt;

&lt;p&gt;But to achieve this without disrupting the user experience, our new solution
also needed to be flexible enough to load and display diffs identically to the
existing production behavior. We wanted to verify accuracy and monitor any
performance impact by running the old and new diff loading strategies in
production, side by side, before switching to the new progressive loading
strategy.&lt;/p&gt;

&lt;p&gt;Lucky for us, Git provides an excellent plumbing command called &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;diff-table-of-contents-with-git-diff-tree&quot;&gt;Diff “table of contents” with &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree&lt;/code&gt; is a low-level (plumbing) &lt;code class=&quot;highlighter-rouge&quot;&gt;git&lt;/code&gt; command that can be used to
compare the contents of two tree objects and output the comparison result in
different ways.&lt;/p&gt;

&lt;p&gt;The default output format is &lt;code class=&quot;highlighter-rouge&quot;&gt;--raw&lt;/code&gt;, which prints a list of changed files:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; git-diff-tree --raw -r --find-renames HEAD~ HEAD
:100644 100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 441624ae5d2a2cd192aab3ad25d3772e428d4926 M	fileA
:100644 100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 4ea306ce50a800061eaa6cd1654968900911e891 M	fileB
:100644 100644 7c4ede99d4fefc414a3f7d21ecaba1cbad40076b fb3f68e3ca24b2daf1a0575d08cd6fe993c3f287 M	fileC
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Using &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --raw&lt;/code&gt; we could determine what changed at a high level very
quickly, without the overhead of generating patch text. We could then later
paginate through this list of changes, or “deltas”, and load the exact patch
data for each “page” by specifying a subset of the deltas’ paths to
&lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --patch&lt;/code&gt;.&lt;/p&gt;
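&lt;p&gt;A minimal sketch of this two-step flow, assuming only stock &lt;code class=&quot;highlighter-rouge&quot;&gt;git&lt;/code&gt; and a throwaway repository (the file names are invented; this is an illustration, not GitHub’s production code):&lt;/p&gt;

```python
import os
import subprocess
import tempfile

def run(args, cwd):
    """Run a git command and return its stdout as text."""
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

# Throwaway repository with one commit that touches two files.
repo = tempfile.mkdtemp()
ident = ["-c", "user.email=e@example.com", "-c", "user.name=example"]
run(["git", "init", "-q"], repo)
for name in ("fileA", "fileB"):
    with open(os.path.join(repo, name), "w") as f:
        f.write("one\n")
run(["git", "add", "-A"], repo)
run(["git", *ident, "commit", "-qm", "base"], repo)
for name in ("fileA", "fileB"):
    with open(os.path.join(repo, name), "w") as f:
        f.write("two\n")
run(["git", "add", "-A"], repo)
run(["git", *ident, "commit", "-qm", "change"], repo)

# Step 1: the cheap "table of contents" -- no patch text is generated.
# (The last tab-separated field of each --raw line is the path; renamed
# entries carry two paths, which this sketch ignores.)
raw = run(["git", "diff-tree", "--raw", "-r", "--find-renames",
           "HEAD~", "HEAD"], repo)
paths = [line.split("\t")[-1] for line in raw.splitlines()]

# Step 2: load patch text for only the first "page" of deltas.
page = paths[:1]
patch = run(["git", "diff-tree", "--patch", "-r", "HEAD~", "HEAD",
             "--", *page], repo)
print(patch)
```

&lt;p&gt;Feeding all of the returned paths into the second call reproduces a plain &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --patch&lt;/code&gt;, which is how the old and new strategies could be compared before paginating.&lt;/p&gt;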

&lt;p&gt;To measure the performance overhead of calling two git commands instead of
one, and to ensure that we wouldn’t cause any regressions in the returned data,
we initially focused on generating the same output as a plain call to
&lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --patch&lt;/code&gt;, by calling &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --raw&lt;/code&gt; and then
feeding all returned paths back into &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --patch&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We started a &lt;a href=&quot;https://github.com/github/scientist&quot;&gt;Scientist&lt;/a&gt;
experiment which ran both algorithms in parallel, comparing accuracy and
timing. This gave us detailed information on cases where results were not as
expected, and allowed us to keep an eye on performance.&lt;/p&gt;

&lt;p&gt;As expected, our new algorithm, which was replacing code that hadn’t been
materially refactored in years, had many mismatches, and performance was
worse than before.&lt;/p&gt;

&lt;p&gt;Most of the issues that we found were simply unexpected behaviors of the old
code under certain conditions. We meticulously emulated these corner cases,
until we were left only with mismatches related to rename detection in
&lt;code class=&quot;highlighter-rouge&quot;&gt;git diff&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;fetching-diff-text-with-git-diff-pairs&quot;&gt;Fetching diff text with &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-pairs&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Loading the patch text from a set of deltas sounds like it should have been a
pretty straightforward operation. We had the list of paths that changed, and
just needed to look up the patch texts for these paths. What could possibly go
wrong?&lt;/p&gt;

&lt;p&gt;In our first attempt we loaded the diffs by passing the first 300 paths from our
deltas to &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --patch&lt;/code&gt;. This emulated our existing behavior, and
we unexpectedly ran into rare mismatches. Curiously, these mismatches were all
related to renames, but only when multiple files containing the same or very
similar contents were renamed in the same diff.&lt;/p&gt;

&lt;p&gt;This happened because rename detection in git is based on the contents of the
tree that it is operating on; by looking at only a subset of the original
tree, git failed to match renames as expected.&lt;/p&gt;

&lt;p&gt;To preserve the rename associations from the initial &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --raw&lt;/code&gt; run,
&lt;a href=&quot;http://github.com/peff&quot;&gt;@peff&lt;/a&gt; added a &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-pairs&lt;/code&gt; command to our fork of
Git. Given a set of blob object IDs (taken from the deltas), it returns
the corresponding diff text, which was exactly what we needed.&lt;/p&gt;

&lt;p&gt;At a high level, the process for generating a diff in Git is as follows:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Do a tree-wide diff, generating modified pairs, or added/deleted paths (which
are just considered pairs with a null before/after state).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Run various algorithms on the whole set of pairs, like rename detection. This
is just linking up adds and deletes of similar content.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;For each pair, output it in the appropriate format (we’re interested in
&lt;code class=&quot;highlighter-rouge&quot;&gt;--patch&lt;/code&gt;, obviously).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-pairs&lt;/code&gt; lets you take the output from step 2 and feed individual
pairs into step 3.&lt;/p&gt;

&lt;p&gt;With this new function in place, we were finally able to get our performance and
accuracy to a point where we could transparently switch to this new diff method
without negative user impact.&lt;/p&gt;

&lt;p&gt;If you’re interested in viewing or contributing to the source for
&lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-pairs&lt;/code&gt; we submitted it upstream &lt;a href=&quot;http://public-inbox.org/git/20161201204042.6yslbyrg7l6ghhww@sigill.intra.peff.net/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;change-statistics-with-git-diff-tree---numstat---shortstat&quot;&gt;Change statistics with &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --numstat --shortstat&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;GitHub displays line change statistics for both the entire diff and each
delta. Generating the line change statistics for a diff can be a very
costly operation, depending on the size and contents of the diff. However, it is
very useful to have summary statistics on a diff at a glance so that the user
can have a good overview of the changes involved.&lt;/p&gt;

&lt;p&gt;Historically we counted the changes in the patch text as we processed it so that
only one diff operation would need to run to display a diff. This operation and
its results were cached so performance was optimal. However, in the case of
truncated diffs there were changes that were never seen and therefore not
included in these statistics. This was done to give us better performance at the
cost of slightly inaccurate total counts for large diffs.&lt;/p&gt;

&lt;p&gt;With our move to progressive diffs, it became increasingly likely that we
would only ever be looking at part of the diff at any one time, so the counts
would be inaccurate most of the time rather than rarely.&lt;/p&gt;

&lt;p&gt;To address this problem we decided to collect the statistics for the entire diff
using &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree --numstat --shortstat&lt;/code&gt;. This would not only solve the
problem of dealing with partial diffs, but also make the counts accurate in
cases where they would have been incorrect before.&lt;/p&gt;

&lt;p&gt;The downside of this change is that Git was now potentially running the entire
diff twice. We determined this was acceptable, however, as the remaining diff
processing for presentation was far more resource intensive. Also, with
progressive diffs, it was entirely probable that many larger diffs would never
need the second pass, since those deltas might never be loaded anyway.&lt;/p&gt;

&lt;p&gt;Due to the nature of how &lt;code class=&quot;highlighter-rouge&quot;&gt;git-diff-tree&lt;/code&gt; works, we were even able to combine the
call for these statistics with the call for deltas into a single command, to
further improve performance. This is because Git already needed to perform a
full diff in order to determine what the statistics were, so having it also
print the tree diff information is essentially free.&lt;/p&gt;
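&lt;p&gt;As a minimal sketch of that combined invocation (using a throwaway repository; everything here is illustrative rather than GitHub’s code), a single &lt;code class=&quot;highlighter-rouge&quot;&gt;git diff-tree&lt;/code&gt; call can emit all three sections at once:&lt;/p&gt;

```python
import os
import subprocess
import tempfile

def run(args, cwd):
    """Run a git command and return its stdout as text."""
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

# Throwaway repository: one file, one change (2 lines added, 1 removed).
repo = tempfile.mkdtemp()
ident = ["-c", "user.email=e@example.com", "-c", "user.name=example"]
run(["git", "init", "-q"], repo)
with open(os.path.join(repo, "fileA"), "w") as f:
    f.write("a\nb\n")
run(["git", "add", "-A"], repo)
run(["git", *ident, "commit", "-qm", "base"], repo)
with open(os.path.join(repo, "fileA"), "w") as f:
    f.write("a\nc\nd\n")
run(["git", "add", "-A"], repo)
run(["git", *ident, "commit", "-qm", "change"], repo)

# One tree traversal, three output sections: the --raw delta lines,
# the --numstat per-file counts, and the --shortstat summary line.
combined = run(["git", "diff-tree", "-r", "--raw", "--numstat",
                "--shortstat", "HEAD~", "HEAD"], repo)
print(combined)
```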

&lt;h3 id=&quot;patches-in-batches-a-whole-new-diff&quot;&gt;Patches in batches: a whole new diff&lt;/h3&gt;

&lt;p&gt;For the initial request of a page containing a diff, we first fetched the deltas
along with the diff statistics. Next we fetched as much diff text as we could,
but with significantly reduced limits compared to before.&lt;/p&gt;

&lt;p&gt;To determine optimal limits, we turned to some of our copious internal
metrics. We wanted results as quickly as possible, but we also wanted a solution
which would display the full diff in “most” cases. Some of the information
our metrics revealed was:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;81% of viewed diffs contain changes to fewer than ten files.&lt;/li&gt;
  &lt;li&gt;52% of viewed diffs contain only changes to one or two files.&lt;/li&gt;
  &lt;li&gt;80% of viewed diffs have fewer than 20KB of patch text.&lt;/li&gt;
  &lt;li&gt;90% of viewed diffs have fewer than 1000 lines of patch text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From these, it was clear a great number of diffs only involved a handful of
changes. If we set our new limits with these metrics in mind, we could continue
to be very fast in most cases while significantly improving performance in
previously slow or inaccessible diffs.&lt;/p&gt;

&lt;p&gt;In the end, we settled on the following for the initial request for a diff page:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Up to 400 lines of diff text.&lt;/li&gt;
  &lt;li&gt;Up to 20KB of diff text.&lt;/li&gt;
  &lt;li&gt;A request cycle dependent timeout.&lt;/li&gt;
  &lt;li&gt;A maximum individual patch size of 400 lines or 20KB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allowed the initial request for a large diff to be &lt;em&gt;much&lt;/em&gt; faster, and the
rest of the diff to automatically load after the first batch of patches was
already rendered.&lt;/p&gt;

&lt;p&gt;Once one of the limits on patch text is reached during asynchronous batch
loading, we simply render the remaining deltas without their diff text, along
with a “load diff” button to retrieve each patch as needed.&lt;/p&gt;

&lt;p&gt;Overall, the effective limits we enforced for the &lt;em&gt;entire&lt;/em&gt; diff became:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Up to 3,000 files.&lt;/li&gt;
  &lt;li&gt;Up to 60,000,000 lines (not loaded automatically).&lt;/li&gt;
  &lt;li&gt;Up to 3GB of diff text (also not loaded automatically).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these changes, you got more of the diff you needed in less time than ever
before. Of course, viewing a 60,000,000 line diff would require the user to
press the “load diff” button more than a couple thousand times.&lt;/p&gt;

&lt;p&gt;The benefits of this approach were a clear win: the number of diff timeouts
dropped almost immediately.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;Diff page timeouts after progressive diff&quot; src=&quot;/images/progressive-diffs/timeouts-after.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Additionally, the high-percentile performance of our main diff pages improved
by nearly 3x!&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;compare page performance after progressive diff&quot; src=&quot;/images/progressive-diffs/compare-view-after.png&quot; /&gt;
&lt;/div&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;pull request files tab performance after progressive diff&quot; src=&quot;/images/progressive-diffs/pulls-view-after.png&quot; /&gt;
&lt;/div&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;commit view performance after progressive diff&quot; src=&quot;/images/progressive-diffs/commit-view-after.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Our diff pages were traditionally among our worst performing, so the
performance win was noticeable even on our high-percentile graph for overall
request performance &lt;em&gt;across the entire site&lt;/em&gt;, shaving around 3.5s off
the 99.9th percentile:&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;overall high percentile performance&quot; src=&quot;/images/progressive-diffs/overall-after.png&quot; /&gt;
&lt;/div&gt;

&lt;h2 id=&quot;looking-to-the-future&quot;&gt;Looking to the future&lt;/h2&gt;

&lt;p&gt;This new approach opens the door to new types of optimizations and interface
ideas that weren’t possible before. We’ll be continuing to improve how we fetch
and render diffs, making them more useful and responsive.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;brianmario&quot;, &quot;fullname&quot;=&gt;&quot;Brian Lopez&quot;, &quot;role&quot;=&gt;&quot;Application Engineering Manager&quot;, &quot;twitter&quot;=&gt;&quot;brianmario&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/brianmario&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/brianmario&quot;}]}</name></author><summary type="html">We serve a lot of diffs here at GitHub. Because it is computationally
expensive to generate and display a diff, we’ve traditionally had to apply
some very conservative limits on what gets loaded. We knew
we could do better, and we set out to do so.</summary></entry><entry><title type="html">GLB part 2: HAProxy zero-downtime, zero-delay reloads with multibinder</title><link href="http://githubengineering.com/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/" rel="alternate" type="text/html" title="GLB part 2: HAProxy zero-downtime, zero-delay reloads with multibinder" /><published>2016-12-01T00:00:00+00:00</published><updated>2016-12-01T00:00:00+00:00</updated><id>http://githubengineering.com/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder</id><content type="html" xml:base="http://githubengineering.com/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/">&lt;p&gt;Recently we &lt;a href=&quot;http://githubengineering.com/introducing-glb/&quot;&gt;introduced GLB&lt;/a&gt;, the GitHub Load Balancer that powers GitHub.com. The GLB proxy tier, which handles TCP connection and TLS termination, is powered by &lt;a href=&quot;http://www.haproxy.org/&quot;&gt;HAProxy&lt;/a&gt;, a reliable and high-performance TCP and HTTP proxy daemon. As part of the design of GLB, we set out to solve a few of the common issues found when using HAProxy at scale.&lt;/p&gt;

&lt;p&gt;Prior to GLB, each host ran a single monolithic instance of HAProxy for all our public services, with frontends for each external IP set, and backends for each backing service. With the number of services we run, this became unwieldy: our configuration was over one thousand lines long with many interdependent ACLs and no modularization. Migrating to GLB, we decided to split the configuration per-service and support running multiple isolated load balancer instances on a single machine. Additionally, we wanted to be able to update a single HAProxy configuration easily, without any downtime, additional latency on connections, or disruption to any other HAProxy instance on the host. Today we are releasing our solution to this problem, &lt;a href=&quot;https://github.com/github/multibinder&quot;&gt;multibinder&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;haproxy-almost-safe-reloads&quot;&gt;HAProxy almost-safe reloads&lt;/h2&gt;

&lt;p&gt;HAProxy uses the SO_REUSEPORT socket option, which allows multiple processes to create LISTEN sockets on the same IP/port combination. The Linux kernel then balances new connections between all available LISTEN sockets. In this diagram, we see the initial stage of an HAProxy reload starting with a single process (left) and then causing a second process to start (right) which binds to the same IP and port, but with a different socket:&lt;/p&gt;

&lt;div style=&quot;text-align:center; padding: 10px 0px;&quot;&gt;
&lt;img alt=&quot;Forking a second HAProxy by default&quot; src=&quot;/images/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/1-fork.png&quot; /&gt;
&lt;/div&gt;
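&lt;p&gt;The setup in this diagram is easy to reproduce. This sketch (illustrative only, not GLB code) binds two independent LISTEN sockets to the same port the way two HAProxy processes do during a reload; note that &lt;code class=&quot;highlighter-rouge&quot;&gt;SO_REUSEPORT&lt;/code&gt; is Linux-specific:&lt;/p&gt;

```python
import socket

def listener(port):
    """Create an independent LISTEN socket on 127.0.0.1:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # Linux-specific
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s

old = listener(0)        # port 0: let the kernel pick a free port
port = old.getsockname()[1]
new = listener(port)     # second bind to the SAME port succeeds

# The kernel now balances new connections between `old` and `new`; a
# connection still queued on `old` when it closes is discarded -- the
# window described above.
client = socket.create_connection(("127.0.0.1", port))
```

&lt;p&gt;Without &lt;code class=&quot;highlighter-rouge&quot;&gt;SO_REUSEPORT&lt;/code&gt;, the second bind would fail with EADDRINUSE.&lt;/p&gt;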

&lt;p&gt;This works great so far, until the original process terminates. HAProxy sends a signal to the original process stating that the new process is now &lt;code class=&quot;highlighter-rouge&quot;&gt;accept()&lt;/code&gt;ing and handling connections (left), which causes it to stop accepting new connections and close its own socket before eventually exiting once all connections complete (right):&lt;/p&gt;

&lt;div style=&quot;text-align:center; padding: 10px 0px;&quot;&gt;
&lt;img alt=&quot;Lost connections on termination&quot; src=&quot;/images/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/2-lost-conns.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Unfortunately there’s a small period between when this process last calls &lt;code class=&quot;highlighter-rouge&quot;&gt;accept()&lt;/code&gt; and when it calls &lt;code class=&quot;highlighter-rouge&quot;&gt;close()&lt;/code&gt; where the kernel will still route some new connections to the original socket. The code then blindly continues to close the socket, and all connections that were queued up in that LISTEN socket get discarded (because &lt;code class=&quot;highlighter-rouge&quot;&gt;accept()&lt;/code&gt; is never called for them):&lt;/p&gt;

&lt;div style=&quot;text-align:center; padding: 10px 0px;&quot;&gt;
&lt;img alt=&quot;Dropped connections between accept() and close()&quot; src=&quot;/images/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/3-accept-close.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;For small-scale sites, the chance of a new connection arriving in the few microseconds between these calls is very low. Unfortunately, at the scale we run HAProxy, a customer-impacting number of connections would hit this issue each and every time we reload HAProxy. Previously we used the official solution offered by HAProxy: dropping SYN packets during this small window, causing the client to retry the SYN packet shortly afterwards. Other &lt;a href=&quot;https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html&quot;&gt;potential solutions&lt;/a&gt; to the same problem include using &lt;code class=&quot;highlighter-rouge&quot;&gt;tc qdisc&lt;/code&gt; to stall the SYN packets as they come in, and then un-stall the queue once the reload is complete. During development of GLB, we weren’t satisfied with either solution and sought one with no queueing delays that instead shares the same LISTEN socket.&lt;/p&gt;

&lt;h2 id=&quot;supporting-zero-downtime-zero-delay-reloads&quot;&gt;Supporting zero-downtime, zero-delay reloads&lt;/h2&gt;

&lt;p&gt;The way other services typically support zero-downtime reloads is to share a LISTEN socket, usually by having a parent process that holds the socket open and &lt;code class=&quot;highlighter-rouge&quot;&gt;fork()&lt;/code&gt;s the service when it needs to reload, leaving the socket open for the new process to consume. This creates a slightly different situation, where the kernel has a single LISTEN socket and clients are queued for &lt;code class=&quot;highlighter-rouge&quot;&gt;accept()&lt;/code&gt; by either process. The file descriptors in each process may be different, but they will point to the same in-kernel socket structure.&lt;/p&gt;

&lt;p&gt;In this scenario, a new process would be started that inherits the same LISTEN socket (left), and when the original pid stops calling &lt;code class=&quot;highlighter-rouge&quot;&gt;accept()&lt;/code&gt;, connections remain queued for the new process to handle, because the kernel LISTEN socket and queue are shared (right):&lt;/p&gt;

&lt;div style=&quot;text-align:center; padding: 10px 0px;&quot;&gt;
&lt;img alt=&quot;Ideal socket sharing method&quot; src=&quot;/images/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/4-share-socket.png&quot; /&gt;
&lt;/div&gt;
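&lt;p&gt;This shared-queue behavior can be sketched in a few lines (an illustration of the classic pattern, not multibinder itself):&lt;/p&gt;

```python
import os
import socket

# Parent creates the LISTEN socket, then fork()s. Both processes hold
# descriptors for the SAME in-kernel socket and accept queue.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
port = listener.getsockname()[1]

pid = os.fork()
if pid == 0:
    # Child ("new process"): serve one connection from the shared queue.
    conn, _ = listener.accept()
    conn.sendall(b"hello from child")
    conn.close()
    os._exit(0)

# Parent ("old process"): never calls accept() again. The connection
# below is still served, because it queues on the shared kernel socket
# and the child accept()s it.
client = socket.create_connection(("127.0.0.1", port))
data = client.recv(32)
os.waitpid(pid, 0)
```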

&lt;p&gt;Unfortunately, HAProxy doesn’t support this method directly. We considered patching HAProxy to add built-in support but found that the architecture of HAProxy favours process isolation and non-dynamic configuration, making it a non-trivial architectural change. Instead, we created &lt;a href=&quot;https://github.com/github/multibinder&quot;&gt;multibinder&lt;/a&gt; to solve this problem generically for any daemon that needs zero-downtime reload capabilities, and integrated it with HAProxy by using a few tricks with existing HAProxy configuration directives to get the same result.&lt;/p&gt;

&lt;p&gt;Multibinder is similar to other file-descriptor sharing services such as &lt;a href=&quot;https://github.com/stripe/einhorn&quot;&gt;einhorn&lt;/a&gt;, except that it runs as an isolated service and process tree on the system, managed by your usual process manager. The actual service, in this case HAProxy, runs separately as another service, rather than as a child process. When HAProxy is started, a small wrapper script calls out to multibinder and requests the existing LISTEN socket to be sent using &lt;a href=&quot;http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch14lev1sec6.html&quot;&gt;Ancillary Data&lt;/a&gt; over a UNIX domain socket. The flow looks something like the following:&lt;/p&gt;

&lt;div style=&quot;text-align:center; padding: 10px 0px;&quot;&gt;
&lt;img alt=&quot;Multibinder reload flow&quot; src=&quot;/images/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/5-ancillary.png&quot; /&gt;
&lt;/div&gt;
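&lt;p&gt;The handoff can be sketched with the standard library (an illustration of the SCM_RIGHTS mechanism, not multibinder’s own code; &lt;code class=&quot;highlighter-rouge&quot;&gt;send_fds&lt;/code&gt;/&lt;code class=&quot;highlighter-rouge&quot;&gt;recv_fds&lt;/code&gt; require Python 3.9+):&lt;/p&gt;

```python
import socket

# The LISTEN socket that "multibinder" would hold open across reloads.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)

# binder plays multibinder; wrapper plays the HAProxy wrapper script.
binder, wrapper = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# send_fds/recv_fds wrap sendmsg/recvmsg with SCM_RIGHTS ancillary data.
socket.send_fds(binder, [b"take this listener"], [listener.fileno()])
msg, fds, flags, addr = socket.recv_fds(wrapper, 1024, 1)

# The received descriptor refers to the SAME in-kernel LISTEN socket;
# a process accept()ing on it shares the listener's queue.
inherited = socket.socket(fileno=fds[0])
print(inherited.getsockname())
```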

&lt;p&gt;Once the socket is provided to the HAProxy wrapper, it leaves the LISTEN socket in the file descriptor table and writes out the HAProxy configuration file from an ERB template, injecting the file descriptors using &lt;a href=&quot;http://cbonte.github.io/haproxy-dconv/1.6/configuration.html#bind&quot;&gt;file descriptor binds&lt;/a&gt; like &lt;code class=&quot;highlighter-rouge&quot;&gt;fd@N&lt;/code&gt; (where N is the file descriptor received from multibinder). It then calls &lt;code class=&quot;highlighter-rouge&quot;&gt;exec()&lt;/code&gt; to launch HAProxy, which uses the provided file descriptor rather than creating a new socket, thus inheriting the same LISTEN socket. From here, we get the ideal setup where the original HAProxy process can stop calling &lt;code class=&quot;highlighter-rouge&quot;&gt;accept()&lt;/code&gt; and connections simply queue up for the new process to handle.&lt;/p&gt;

&lt;div style=&quot;text-align:center; padding: 10px 0px;&quot;&gt;
&lt;img alt=&quot;Multibinder LISTEN socket sharing diagram&quot; src=&quot;/images/glb-part-2-haproxy-zero-downtime-zero-delay-reloads-with-multibinder/6-success.png&quot; /&gt;
&lt;/div&gt;
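&lt;p&gt;For illustration, a rendered configuration might contain a bind line like the following (the frontend and backend names are invented, and &lt;code class=&quot;highlighter-rouge&quot;&gt;fd@3&lt;/code&gt; assumes multibinder handed the wrapper file descriptor 3):&lt;/p&gt;

```
# Rendered from the wrapper's ERB template. Instead of binding an ip:port,
# HAProxy accept()s on the inherited file descriptor.
frontend www
    bind fd@3
    default_backend app

backend app
    server app1 10.0.0.1:8080 check
```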

&lt;h2 id=&quot;example--multiple-instances&quot;&gt;Example &amp;amp; multiple instances&lt;/h2&gt;

&lt;p&gt;Along with the release of multibinder, we’re also providing examples of &lt;a href=&quot;https://github.com/github/multibinder/tree/master/haproxy&quot;&gt;running multiple HAProxy instances with multibinder&lt;/a&gt; leveraging systemd service templates. Following these instructions you can launch a set of HAProxy servers using separate configuration files, each using the same system-wide multibinder instance to request their binds and having true zero-downtime, zero-delay reloads.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;joewilliams&quot;, &quot;fullname&quot;=&gt;&quot;Joe Williams&quot;, &quot;role&quot;=&gt;&quot;Senior Infrastructure Engineer&quot;, &quot;twitter&quot;=&gt;&quot;williamsjoe&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/joewilliams&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/williamsjoe&quot;}, {&quot;name&quot;=&gt;&quot;Blog&quot;, &quot;url&quot;=&gt;&quot;http://joeandmotorboat.com/&quot;}]}</name></author><summary type="html">Recently we introduced GLB, the GitHub Load Balancer that powers GitHub.com. The GLB proxy tier, which handles TCP connection and TLS termination is powered by HAProxy, a reliable and high performance TCP and HTTP proxy daemon. 
As part of the design of GLB, we set out to solve a few of the common issues found when using HAProxy at scale.</summary></entry><entry><title type="html">octocatalog-diff: GitHub’s Puppet development and testing tool</title><link href="http://githubengineering.com/octocatalog-diff-github-s-puppet-development-and-testing-tool/" rel="alternate" type="text/html" title="octocatalog-diff: GitHub's Puppet development and testing tool" /><published>2016-10-20T00:00:00+00:00</published><updated>2016-10-20T00:00:00+00:00</updated><id>http://githubengineering.com/octocatalog-diff-github-s-puppet-development-and-testing-tool</id><content type="html" xml:base="http://githubengineering.com/octocatalog-diff-github-s-puppet-development-and-testing-tool/">&lt;p&gt;Today we are announcing the open source release of &lt;a href=&quot;http://github.com/github/octocatalog-diff&quot;&gt;octocatalog-diff&lt;/a&gt;: GitHub’s Puppet development and testing tool.&lt;/p&gt;

&lt;p&gt;GitHub uses &lt;a href=&quot;https://puppet.com/&quot;&gt;Puppet&lt;/a&gt; to configure the infrastructure that powers GitHub.com, composed of hundreds of roles deployed on thousands of nodes. Each change to Puppet code must be validated to ensure not only that it serves the intended purpose for the role at hand, but also that it avoids causing unexpected side effects on other roles. GitHub employs automated Continuous Integration testing and manual deployment testing for Puppet code changes, but it can be time-consuming to complete the manual deployment testing across hundreds of roles.&lt;/p&gt;

&lt;p&gt;Recently, GitHub has been using an internally-developed tool called &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; to help reduce the time required for these testing cycles. With this tool, developers are able to preview the effects of their change across all roles via a distributed “catalog difference” test that takes less than three minutes to run. Because of reduced testing cycles and increased confidence in their deployments, developers can iterate much faster on their Puppet code changes.&lt;/p&gt;

&lt;p&gt;Before demonstrating &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt;, let’s address the existing solutions and the reasoning for creating a new tool.&lt;/p&gt;

&lt;h3 id=&quot;existing-landscape-of-puppet-testing&quot;&gt;Existing landscape of Puppet testing&lt;/h3&gt;

&lt;p&gt;There are three main strategies for Puppet code testing in wide use, and GitHub uses all of them:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Deployment testing – actually running the Puppet agent (possibly with &lt;code class=&quot;highlighter-rouge&quot;&gt;--noop&lt;/code&gt; to preview actions without actually making changes) allows the developer to review log files or examine the system to see if the results are as intended.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Automated testing – this may include unit tests with &lt;a href=&quot;http://rspec-puppet.com/&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;rspec-puppet&lt;/code&gt;&lt;/a&gt;, acceptance tests with &lt;a href=&quot;https://github.com/puppetlabs/beaker&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;beaker&lt;/code&gt;&lt;/a&gt;, syntax checking &lt;code class=&quot;highlighter-rouge&quot;&gt;puppet parser validate&lt;/code&gt; or linting with &lt;a href=&quot;http://puppet-lint.com/&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;puppet-lint&lt;/code&gt;&lt;/a&gt;. These types of tests often run in a Continuous Integration environment to verify that the code meets a set of specified criteria.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Catalog testing – &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; and Puppet’s &lt;a href=&quot;https://forge.puppet.com/puppetlabs/catalog_preview&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;catalog_preview&lt;/code&gt;&lt;/a&gt; module both allow comparison of catalogs produced by two different Puppet versions or between two environments.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;GitHub needed a catalog testing approach that could run from a development or CI environment, because for security reasons only a small number of engineers have direct access to the Puppet master. Because &lt;code class=&quot;highlighter-rouge&quot;&gt;catalog_preview&lt;/code&gt; is designed to be fully integrated into the Puppet master, it would be inaccessible for a large portion of GitHub’s Puppet contributors, and as such it was not the right fit. Therefore, we embarked upon our own development of a tool that could run independently of a Puppet installation, and produced &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;octocatalog-diff&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This screenshot shows &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; in action, as run from a developer’s workstation:&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;octocatalog-diff command line&quot; src=&quot;/images/announcing-octocatalog-diff/octocatalog-diff-screenshot.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;In this example, the developer is comparing the Puppet catalog changes between the master branch and the Puppet code in the current working directory. Two resources are being created (an Exec resource to create the mount point, and a Filesystem resource to format &lt;code class=&quot;highlighter-rouge&quot;&gt;/dev/xvdf&lt;/code&gt;). Two resources are being removed (the old Exec resource to change permissions on the work directory, and the old Filesystem on &lt;code class=&quot;highlighter-rouge&quot;&gt;/dev/xvdb&lt;/code&gt;). And one resource is being changed (several parameters of the mount point are being updated).&lt;/p&gt;

&lt;p&gt;The output was generated in under 15 seconds, obviating a traditional workflow of committing code, waiting for CI jobs to pass, deploying code to a node, and reviewing the results. The process that generated this output did not require access to, or put any load on, either the Puppet master or the node whose catalog was computed.&lt;/p&gt;

&lt;p&gt;The next graphic shows output from &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; when run via a distributed CI job, to preview the effects of a code change on nodes across the fleet:&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;octocatalog-diff distributed CI&quot; src=&quot;/images/announcing-octocatalog-diff/octocatalog-diff-ci-screenshot.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;In this example, the developer wishes to see which systems will be affected by a particular change to the Puppet code. The output from &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; reveals that the changes affect certain GitHub API nodes. The developer can use this information to test deployment on just those six representative systems instead of hundreds or thousands of nodes. This cuts down on unnecessary testing and provides confidence that there will not be unexpected side effects, allowing the developer to complete the work more efficiently and with less risk.&lt;/p&gt;

&lt;h3 id=&quot;octocatalog-diff-key-features&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; key features&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; has several useful features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Comparing catalogs generated by two branches of a Git repository&lt;/li&gt;
  &lt;li&gt;Predicting differences due to fact changes by allowing the developer to override facts&lt;/li&gt;
  &lt;li&gt;Comparing the content differences of static files, not just the path differences&lt;/li&gt;
  &lt;li&gt;Caching base catalogs to allow subsequent runs to complete faster&lt;/li&gt;
  &lt;li&gt;Ignoring selected types, titles, or parameters to suppress meaningless or known changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; is able to compare catalogs obtained in the following ways:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Compiling a catalog from Puppet code (the most common use case)&lt;/li&gt;
  &lt;li&gt;Reading in a JSON file containing a compiled catalog&lt;/li&gt;
  &lt;li&gt;Retrieving the last known catalog for a node from PuppetDB&lt;/li&gt;
  &lt;li&gt;Querying the Puppet master server for the catalog via its API&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;octocatalog-diff-at-github&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; at GitHub&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; is being used in “catalog only” mode as a Continuous Integration (CI) job in GitHub’s Puppet repository. Upon every push, &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; compiles the catalogs for over 50 critical roles, using real node names and facts, to ensure that changes to one role do not unexpectedly break Puppet catalogs for other roles. In addition, developers use &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; in “difference” mode to preview their changes across the fleet, which has enabled them to perform major refactoring with minimal risk.&lt;/p&gt;

&lt;p&gt;Over the past year, GitHub has successfully upgraded from Puppet 3.4 to 4.5, migrated hard-coded parameters from thousands of manifests into the &lt;code class=&quot;highlighter-rouge&quot;&gt;hiera&lt;/code&gt; hierarchical data store, transitioned node classification from hostname regular expressions to application and roles, expanded roles to run in different environments and in containers, and upgraded roles to run under new operating systems. Using &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; to predict changes across the fleet, a relatively small number of developers accomplished these substantial initiatives quickly and without their Puppet changes causing outages.&lt;/p&gt;

&lt;h3 id=&quot;open-source&quot;&gt;Open source&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; is &lt;a href=&quot;https://github.com/github/octocatalog-diff&quot;&gt;released&lt;/a&gt; to the open source community &lt;a href=&quot;https://github.com/github/octocatalog-diff/blob/master/LICENSE&quot;&gt;under the MIT license&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While we find &lt;code class=&quot;highlighter-rouge&quot;&gt;octocatalog-diff&lt;/code&gt; to be reliable for our needs, there are undoubtedly configurations or customizations within others’ Puppet code bases that we have not anticipated. We welcome community participation and contributions, and look forward to enhancing the compatibility and functionality of the tool.&lt;/p&gt;

&lt;h3 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h3&gt;

&lt;p&gt;We acknowledge and thank the Site Reliability Engineering team at GitHub for their suggestions and code reviews, and the other engineers at GitHub who worked patiently with us to diagnose problems and test improvements during the pre-production stages.&lt;/p&gt;</content><author><name>kpaulisse</name></author><summary type="html">Today we are announcing the open source release of octocatalog-diff: GitHub’s Puppet development and testing tool.</summary></entry><entry><title type="html">Introducing the GitHub Load Balancer</title><link href="http://githubengineering.com/introducing-glb/" rel="alternate" type="text/html" title="Introducing the GitHub Load Balancer" /><published>2016-09-22T00:00:00+00:00</published><updated>2016-09-22T00:00:00+00:00</updated><id>http://githubengineering.com/introducing-glb</id><content type="html" xml:base="http://githubengineering.com/introducing-glb/">&lt;p&gt;At GitHub we serve billions of HTTP, Git and SSH connections each day. To get the best performance we run on &lt;a href=&quot;http://githubengineering.com/githubs-metal-cloud/&quot;&gt;bare metal hardware&lt;/a&gt;. Historically one of the more complex components has been our load balancing tier. Traditionally we scaled this vertically, running a small set of very large machines running &lt;a href=&quot;http://www.haproxy.org/&quot;&gt;haproxy&lt;/a&gt;, and using a very specific hardware configuration allowing dedicated 10G link failover. Eventually we needed a solution that was scalable and we set out to create a load balancer solution that would run on commodity hardware in our typical data center configuration.&lt;/p&gt;

&lt;p&gt;Over the last year we’ve developed our new load balancer, called GLB (GitHub Load Balancer). Today, and over the next few weeks, we will be sharing the design and releasing its components as open source software.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;GLB&quot; src=&quot;/images/introducing-glb/glb-logo-dark.png&quot; /&gt;
&lt;/div&gt;

&lt;h2 id=&quot;out-with-the-old-in-with-the-new&quot;&gt;Out with the old, in with the new&lt;/h2&gt;

&lt;p&gt;GitHub is growing and our monolithic, vertically scaled load balancer tier had met its match and a new approach was required. Our original design was based around a small number of large machines each with dedicated links to our network spine. This design tied networking gear, the load balancing hosts and load balancer configuration together in such a way that scaling horizontally was deemed too difficult. We set out to find a better way.&lt;/p&gt;

&lt;p&gt;We first identified the goals of the new system, design pitfalls of the existing system and prior art that we could draw &lt;a href=&quot;http://www.linuxvirtualserver.org/&quot;&gt;experience&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/watch?v=dKsOvc73gQk&quot;&gt;inspiration&lt;/a&gt; from. After some time we determined that the following would produce a successful load balancing tier that we could maintain into the future:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Runs on commodity hardware&lt;/li&gt;
  &lt;li&gt;Scales horizontally&lt;/li&gt;
  &lt;li&gt;Supports high availability, avoids breaking TCP connections during normal operation and failover&lt;/li&gt;
  &lt;li&gt;Supports connection draining&lt;/li&gt;
  &lt;li&gt;Per service load balancing, with support for multiple services per load balancer host&lt;/li&gt;
  &lt;li&gt;Can be iterated on and deployed like normal software&lt;/li&gt;
  &lt;li&gt;Testable at each layer, not just integration tests&lt;/li&gt;
  &lt;li&gt;Built for multiple POPs and data centers&lt;/li&gt;
  &lt;li&gt;Resilient to typical DDoS attacks, and tools to help mitigate new attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;design&quot;&gt;Design&lt;/h2&gt;

&lt;p&gt;To achieve these goals we needed to rethink the relationship between IP addresses and hosts, the constituent layers of our load balancing tier and how connections are routed, controlled and terminated.&lt;/p&gt;

&lt;h3 id=&quot;stretching-an-ip&quot;&gt;Stretching an IP&lt;/h3&gt;

&lt;p&gt;In a typical setup, you assign a single public-facing IP address to a single physical machine. DNS can then be used to split traffic over multiple IPs, letting you shard traffic across multiple servers. Unfortunately, DNS entries are cached fairly aggressively (often ignoring the TTL), and some of our users may specifically whitelist or hardcode IP addresses. Additionally, we offer a certain set of IPs for our Pages service that customers can use directly for their apex domain. Rather than adding more IPs to increase capacity, and accepting that an IP address fails whenever its single server fails, we wanted a solution that would allow a single IP address to be served by multiple physical machines.&lt;/p&gt;

&lt;p&gt;Routers have a feature called Equal-Cost Multi-Path (ECMP) routing, which is designed to split traffic destined for a single IP across multiple links of equal cost. ECMP works by hashing certain components of an incoming packet such as the source and destination IP addresses and ports. By using a consistent hash for this, subsequent packets that are part of the same TCP flow will hash to the same path, avoiding out of order packets and maintaining session affinity.&lt;/p&gt;
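&lt;p&gt;The per-flow hashing described above can be sketched in a few lines of Python. This is an illustrative model, not router behavior: real ECMP implementations hash in hardware, and the field selection and hash function here are assumptions.&lt;/p&gt;

```python
import hashlib

def ecmp_link(src_ip, src_port, dst_ip, dst_port, links):
    # Hash the flow 4-tuple. Because the hash input is identical for
    # every packet of a given TCP flow, each flow sticks to one link.
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return links[digest % len(links)]

links = ["link-a", "link-b", "link-c", "link-d"]
flow = ("203.0.113.7", 52311, "192.0.2.80", 443)

# Repeated packets of the same flow always map to the same link,
# avoiding reordering and preserving session affinity.
assert ecmp_link(*flow, links) == ecmp_link(*flow, links)
```

&lt;p&gt;Different flows hash to different links, spreading traffic roughly evenly, while any one flow never changes links as long as the link set is stable.&lt;/p&gt;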

&lt;p&gt;This works great for routing packets across multiple paths to the same physical destination server. Where it gets interesting is when you use ECMP to split traffic destined for a single IP across multiple physical servers, each of which terminates TCP connections but shares no state, as in a load balancer. When one of these servers fails or is taken out of rotation and removed from the ECMP server set, a &lt;a href=&quot;https://en.wikipedia.org/wiki/Consistent_hashing&quot;&gt;rehash event occurs&lt;/a&gt;: 1/N connections get reassigned to the remaining servers. Since these servers don’t share connection state, those connections are terminated. Unfortunately, they may not be the same 1/N connections that were mapped to the failing server. Additionally, there is no way to gracefully remove a server for maintenance without also disrupting 1/N active connections.&lt;/p&gt;

&lt;h3 id=&quot;l4l7-split-design&quot;&gt;L4/L7 split design&lt;/h3&gt;

&lt;p&gt;A pattern that has been used by other projects is to split the load balancers into a L4 and L7 tier. At the L4 tier, the routers use ECMP to shard traffic using consistent hashing to a set of L4 load balancers - typically using software like &lt;a href=&quot;http://www.linuxvirtualserver.org/software/ipvs.html&quot;&gt;ipvs/LVS&lt;/a&gt;. LVS keeps connection state, and optionally syncs connection state with multicast to other L4 nodes, and forwards traffic to the L7 tier which runs software such as haproxy. We call the L4 tier “director” hosts since they direct traffic flow, and the L7 tier “proxy” hosts, since they proxy connections to backend servers.&lt;/p&gt;

&lt;p&gt;This L4/L7 split has an interesting benefit: the proxy tier nodes can now be removed from rotation by gracefully draining existing connections, since the connection state on the director nodes will keep existing connections mapped to their existing proxy server, even after they are removed from rotation for new connections. Additionally, the proxy tier tends to be the one that requires more upkeep due to frequent configuration changes, upgrades and scaling so this works to our advantage.&lt;/p&gt;

&lt;p&gt;If multicast connection syncing is used, then the L4 load balancer nodes handle failure slightly more gracefully: once a connection has been synced to the other L4 nodes, it will no longer be disrupted. Without connection syncing, provided the director nodes hash connections the same way and have the same backend set, connections may successfully survive a director node failure. In practice, most installations of this tiered design simply accept connection disruption under node failure or node maintenance.&lt;/p&gt;

&lt;p&gt;Unfortunately, using LVS for the director tier has some significant drawbacks. Firstly, multicast was not something we wanted to support, so we would be relying on the nodes having the same view of the world, and having consistent hashing to the backend nodes. Without connection syncing, certain events, including planned maintenance of nodes, could cause connection disruption. Connection disruption is something we wanted to avoid due to how git cannot retry or resume if the connection is severed mid-flight. Finally, the fact that the director tier requires connection state at all adds an extra complexity to DDoS mitigation such as &lt;a href=&quot;http://githubengineering.com/syn-flood-mitigation-with-synsanity/&quot;&gt;synsanity&lt;/a&gt; - to avoid resource exhaustion, syncookies would now need to be generated on the director nodes, despite the fact that the connections themselves are terminated on the proxy nodes.&lt;/p&gt;

&lt;h3 id=&quot;designing-a-better-director&quot;&gt;Designing a better director&lt;/h3&gt;

&lt;p&gt;We decided early on in the design of our load balancer that we wanted to improve on the common pattern for the director tier. We set out to design a new director tier that was stateless and allowed both director and proxy nodes to be gracefully removed from rotation without disruption to users wherever possible. Many of our users live in countries with less-than-ideal internet connectivity, and it was important to us that long-running clones of reasonably sized repositories, completing within a reasonable time limit, would not fail during planned maintenance.&lt;/p&gt;

&lt;p&gt;The design we settled on, and now use in production, is a variant of &lt;a href=&quot;https://en.wikipedia.org/wiki/Rendezvous_hashing&quot;&gt;Rendezvous hashing&lt;/a&gt; that supports constant time lookups. We start by storing each proxy host and assigning it a state. These states handle the connection draining aspect of our design goals and will be discussed further in a future post. We then generate a single, fixed-size forwarding table and fill each row with a set of proxy servers using the ordering component of Rendezvous hashing. This table, along with the proxy states, is sent to all director servers and kept in sync as proxies come and go. When a TCP packet arrives on the director, we hash the source IP to generate a consistent index into the forwarding table. We then encapsulate the packet inside another IP packet (actually &lt;a href=&quot;https://lwn.net/Articles/614348/&quot;&gt;Foo-over-UDP&lt;/a&gt;) destined to the internal IP of the proxy server, and send it over the network. The proxy server receives the encapsulated packet, decapsulates it, and processes the original packet locally. Any outgoing packets use Direct Server Return, meaning packets destined to the client egress directly to the client, completely bypassing the director tier.&lt;/p&gt;
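&lt;p&gt;The forwarding-table idea can be sketched as follows. This is a simplified model under stated assumptions: a small table, a single winning proxy per row rather than GLB’s ordered set of proxies, SHA-256 as the rendezvous weight function, and none of the proxy-state or encapsulation handling.&lt;/p&gt;

```python
import hashlib

TABLE_SIZE = 256  # fixed-size forwarding table (illustrative size)

def weight(row, proxy):
    # Rendezvous weight: hash the (row, proxy) pair. For each row, the
    # proxy with the highest weight "wins" that row.
    data = f"{row}/{proxy}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def build_table(proxies, size=TABLE_SIZE):
    return [max(proxies, key=lambda p: weight(row, p)) for row in range(size)]

def lookup(table, src_ip):
    # Constant-time lookup: hash only the source IP into a row index.
    h = int.from_bytes(hashlib.sha256(src_ip.encode()).digest(), "big")
    return table[h % len(table)]

proxies = ["proxy-1", "proxy-2", "proxy-3", "proxy-4"]
table = build_table(proxies)

# Removing one proxy only reassigns the rows that proxy had won; every
# other row keeps its winner, so most flows keep their proxy assignment.
shrunk = build_table([p for p in proxies if p != "proxy-4"])
assert all(shrunk[i] == table[i] for i in range(TABLE_SIZE) if table[i] != "proxy-4")
```

&lt;p&gt;The stability property in the final assertion is what makes draining graceful: rendezvous ordering guarantees that removing a proxy never changes rows won by the surviving proxies.&lt;/p&gt;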

&lt;h2 id=&quot;stay-tuned&quot;&gt;Stay tuned&lt;/h2&gt;

&lt;p&gt;Now that you have a taste of the system that processed and routed the request to this blog post, we hope you stay tuned for future posts describing our director design in depth, how we improved haproxy hot configuration reloads, and how we managed to migrate to the new system without anyone noticing.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;joewilliams&quot;, &quot;fullname&quot;=&gt;&quot;Joe Williams&quot;, &quot;role&quot;=&gt;&quot;Senior Infrastructure Engineer&quot;, &quot;twitter&quot;=&gt;&quot;williamsjoe&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/joewilliams&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/williamsjoe&quot;}, {&quot;name&quot;=&gt;&quot;Blog&quot;, &quot;url&quot;=&gt;&quot;http://joeandmotorboat.com/&quot;}]}</name></author><summary type="html">At GitHub we serve billions of HTTP, Git and SSH connections each day. To get the best performance we run on bare metal hardware. Historically one of the more complex components has been our load balancing tier. Traditionally we scaled this vertically, running a small set of very large machines running haproxy, and using a very specific hardware configuration allowing dedicated 10G link failover.
Eventually we needed a solution that was scalable and we set out to create a load balancer solution that would run on commodity hardware in our typical data center configuration.</summary></entry><entry><title type="html">The GitHub GraphQL API</title><link href="http://githubengineering.com/the-github-graphql-api/" rel="alternate" type="text/html" title="The GitHub GraphQL API" /><published>2016-09-14T00:00:00+00:00</published><updated>2016-09-14T00:00:00+00:00</updated><id>http://githubengineering.com/the-github-graphql-api</id><content type="html" xml:base="http://githubengineering.com/the-github-graphql-api/">&lt;p&gt;GitHub announced a public API &lt;a href=&quot;https://github.com/blog/21-the-api&quot;&gt;one month after the site launched&lt;/a&gt;. We’ve evolved this platform through three versions, adhering to RFC standards and embracing new design patterns to provide a clear and consistent interface. We’ve often heard that our REST API was an inspiration for other companies; countless tutorials refer to our endpoints. Today, we’re excited to announce our biggest change to the API since we snubbed XML in favor of JSON: we’re making the GitHub API available through GraphQL.&lt;/p&gt;

&lt;p&gt;GraphQL is, at its core, a specification for a data querying language. We’d like to talk a bit about GraphQL, including the problems we believe it solves and the opportunities it provides to integrators.&lt;/p&gt;

&lt;h2 id=&quot;why&quot;&gt;Why?&lt;/h2&gt;

&lt;p&gt;You may be wondering why we chose to start supporting GraphQL. Our API was designed to be RESTful and hypermedia-driven. We’re fortunate to have &lt;a href=&quot;https://developer.github.com/libraries/&quot;&gt;dozens of different open-source clients&lt;/a&gt; written in a plethora of languages. Businesses grew around these endpoints.&lt;/p&gt;

&lt;p&gt;Like most technology, REST is not perfect and has some drawbacks. Our ambition to change our API focused on solving two problems.&lt;/p&gt;

&lt;p&gt;The first was scalability. The REST API is responsible for over 60% of the requests made to our database tier. This is partly because, by its nature, hypermedia navigation requires a client to repeatedly communicate with a server so that it can get all the information it needs. Our responses were bloated and filled with all sorts of &lt;code class=&quot;highlighter-rouge&quot;&gt;*_url&lt;/code&gt; hints in the JSON responses to help people continue to navigate through the API to get what they needed. Despite all the information we provided, we heard from integrators that our REST API also wasn’t very flexible. It sometimes required two or three separate calls to assemble a complete view of a resource. It seemed like our responses simultaneously sent too much data &lt;em&gt;and&lt;/em&gt; didn’t include data that consumers needed.&lt;/p&gt;

&lt;p&gt;As we began to audit our endpoints in preparation for an APIv4, we encountered our second problem. We wanted to collect some meta-information about our endpoints. For example, we wanted to identify the OAuth scopes required for each endpoint. We wanted to be smarter about how our resources were paginated. We wanted assurances of type-safety for user-supplied parameters. We wanted to generate documentation from our code. We wanted to generate clients instead of manually supplying patches to &lt;a href=&quot;http://octokit.github.io/&quot;&gt;our Octokit suite&lt;/a&gt;. We studied a variety of API specifications built to make some of this easier, but we found that none of the standards totally matched our requirements.&lt;/p&gt;

&lt;p&gt;And then we learned about GraphQL.&lt;/p&gt;

&lt;h2 id=&quot;the-switch&quot;&gt;The switch&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;http://graphql.org/&quot;&gt;GraphQL&lt;/a&gt; is a querying language developed by Facebook over the course of several years. In essence, you construct your request by defining the resources you want. You send this via a &lt;code class=&quot;highlighter-rouge&quot;&gt;POST&lt;/code&gt; to a server, and the response matches the format of your request.&lt;/p&gt;

&lt;p&gt;For example, say you wanted to fetch just a few attributes off of a user. Your GraphQL query might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-graphql&quot;&gt;{
  viewer {
    login
    bio
    location
    isBountyHunter
  }
}
&lt;/code&gt;&lt;/pre&gt;
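&lt;p&gt;Mechanically, the query above is just a string wrapped in a JSON document and sent in a &lt;code class=&quot;highlighter-rouge&quot;&gt;POST&lt;/code&gt; body. A minimal client-side sketch follows; the payload shape (a top-level &lt;code class=&quot;highlighter-rouge&quot;&gt;query&lt;/code&gt; key) is the common GraphQL convention, and endpoint and authentication details are deliberately omitted.&lt;/p&gt;

```python
import json

# The GraphQL query travels verbatim as a string.
query = """
{
  viewer {
    login
    bio
    location
    isBountyHunter
  }
}
"""

# Conventional GraphQL request body: a JSON object with a "query" key.
# An HTTP client would POST this, with an Authorization header carrying
# an OAuth token, to the API's GraphQL endpoint.
payload = json.dumps({"query": query})
```

&lt;p&gt;Because the request is an ordinary JSON document over HTTP, any existing HTTP client library can speak GraphQL without special tooling.&lt;/p&gt;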

&lt;p&gt;And the response back might look like this:&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;data&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;viewer&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;login&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;octocat&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;bio&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;I've been around the world, from London to the Bay.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;location&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;San Francisco, CA&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;isBountyHunter&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You can see that the keys and values in the JSON response match right up with the terms in the query string.&lt;/p&gt;

&lt;p&gt;What if you wanted something more complicated? Let’s say you wanted to know how many repositories you’ve starred. You also want to get the names of your first three repositories, as well as their total number of stars, total number of forks, total number of watchers, and total number of open issues. That query might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-graphql&quot;&gt;{
  viewer {
    login
    starredRepositories {
      totalCount
    }
    repositories(first: 3) {
      edges {
        node {
          name
          stargazers {
            totalCount
          }
          forks {
            totalCount
          }
          watchers {
            totalCount
          }
          issues(states:[OPEN]) {
            totalCount
          }
        }
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The response from our API might be:&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  
  &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;data&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  
    &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;viewer&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;login&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;octocat&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;starredRepositories&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;131&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;repositories&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;edges&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
          &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;node&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;octokit.rb&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;stargazers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;forks&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;watchers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;issues&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
          &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
          &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  
            &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;node&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;octokit.objc&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;stargazers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;forks&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;watchers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;issues&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
          &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
          &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;node&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;octokit.net&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;stargazers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;forks&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;watchers&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;issues&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
                &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;totalCount&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
              &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
            &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
          &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;You just made &lt;em&gt;one&lt;/em&gt; request to fetch all the data you wanted.&lt;/p&gt;

&lt;p&gt;This type of design benefits clients for which smaller payload sizes are essential. For example, a mobile app could simplify its requests by asking only for the data it needs. This enables new possibilities and workflows that are freed from the limitations of downloading and parsing massive JSON blobs.&lt;/p&gt;

&lt;p&gt;Query analysis is something that we’re also exploring. Based on the resources that are requested, we can start providing more intelligent information to clients. For example, say you’ve made the following request:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-graphql&quot;&gt;{
  viewer {
    login
    email
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Before executing the request, the GraphQL server notes that you’re trying to get the &lt;code class=&quot;highlighter-rouge&quot;&gt;email&lt;/code&gt; field. If your client is misconfigured, a response back from our server might look like this:&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;data&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;viewer&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;login&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;octocat&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;errors&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&quot;message&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Your token has not been granted the required scopes to
      execute this query. The 'email' field requires one of the following
      scopes: ['user'], but your token has only been granted the: ['gist']
      scopes. Please modify your token's scopes at: https://github.com/settings/tokens.&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This could be beneficial for users concerned about the OAuth scopes that integrators request. Insight into the required scopes could help ensure that an integration asks for only the permissions and data it actually needs.&lt;/p&gt;

&lt;p&gt;There are several other features of GraphQL that we hope to make available to clients, such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The ability to &lt;em&gt;batch requests&lt;/em&gt;, where you can define dependencies between two separate queries and fetch data efficiently.&lt;/li&gt;
  &lt;li&gt;The ability to &lt;em&gt;create subscriptions&lt;/em&gt;, where your client can receive new data when it becomes available.&lt;/li&gt;
  &lt;li&gt;The ability to &lt;em&gt;defer data&lt;/em&gt;, where you choose to mark a part of your response as time-insensitive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;defining-the-schema&quot;&gt;Defining the schema&lt;/h2&gt;

&lt;p&gt;In order to determine if GraphQL really was a technology we wanted to embrace, we formed a small team within the broader Platform organization and went looking for a feature on the site we wanted to build using GraphQL. We decided that implementing &lt;a href=&quot;https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comments&quot;&gt;emoji reactions on comments&lt;/a&gt; was concise enough to try and port to GraphQL. Choosing a subset of the site to power with GraphQL required us to model a complete workflow and focus on building the new objects and types that defined our GraphQL schema. For example, we started by constructing a user in our schema, moved on to a repository, and then expanded to issues within a repository. Over time, we grew the schema to encapsulate all the actions necessary for modeling reactions.&lt;/p&gt;

&lt;p&gt;We found implementing a GraphQL server to be very straightforward. &lt;a href=&quot;https://facebook.github.io/graphql/&quot;&gt;The Spec&lt;/a&gt; is clearly written and succinctly describes the behaviors of various parts of a schema. GraphQL has a type system that forces the server to be unambiguous about requests it receives and responses it produces. You define a schema, describing the objects that represent your resources, fields on those objects, and the connections between various objects. For example, a &lt;code class=&quot;highlighter-rouge&quot;&gt;Repository&lt;/code&gt; object has a non-null &lt;code class=&quot;highlighter-rouge&quot;&gt;String&lt;/code&gt; field for the &lt;code class=&quot;highlighter-rouge&quot;&gt;name&lt;/code&gt;. A repository also has &lt;code class=&quot;highlighter-rouge&quot;&gt;watchers&lt;/code&gt;, which is a connection to another non-nullable object, &lt;code class=&quot;highlighter-rouge&quot;&gt;User&lt;/code&gt;.&lt;/p&gt;
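
&lt;p&gt;To make the type-system idea concrete, here is a toy Ruby model of that nullability rule. It is an illustration only, not our actual schema implementation; the field table and helper function are invented for this sketch.&lt;/p&gt;

```ruby
# Toy model of a GraphQL-style nullability check, not a real DSL.
# A Repository has a non-null String field called name, so a response
# containing a Repository whose name is nil violates the schema.
REPOSITORY_FIELDS = {
  name:     { type: 'String', non_null: true },
  homepage: { type: 'String', non_null: false }
}

# Returns the names of fields whose nullability rule is violated.
def nullability_violations(fields, record)
  fields.select { |field, meta| meta[:non_null] and record[field].nil? }.keys
end

p nullability_violations(REPOSITORY_FIELDS, { name: 'octokit.rb' }) # []
p nullability_violations(REPOSITORY_FIELDS, { name: nil })          # [:name]
```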

&lt;p&gt;Although the initial team exploring GraphQL worked mostly on the backend, we had several allies on the frontend who were also interested in GraphQL, and, specifically, moving parts of GitHub to use &lt;a href=&quot;https://facebook.github.io/relay/&quot;&gt;Relay&lt;/a&gt;. They too were seeking better ways to access user data and present it more efficiently on the website. We began to work together to continue finding portions of the site that would be easy to communicate with via our nascent GraphQL schema. We decided to begin transforming some of our social features, such as the profile page, the stars counter, and the ability to watch repositories. These initial explorations paved the way to placing GraphQL in production. (That’s right! We’ve been running GraphQL in production for some time now.) As time went on, we began to get a bit more ambitious: we ported over some of the Git commit history pages to GraphQL and used &lt;a href=&quot;http://githubengineering.com/scientist/&quot;&gt;Scientist&lt;/a&gt; to identify any potential discrepancies.&lt;/p&gt;

&lt;p&gt;Drawing on our experiences supporting the REST API, we worked quickly to adapt our existing services to GraphQL. This included setting up request logging and exception reporting, OAuth and AuthZ access, rate limiting, and helpful error responses. We tested our schema to ensure that every part of it was documented, and we wrote linters to ensure that our naming structure was standardized.&lt;/p&gt;

&lt;h2 id=&quot;open-source&quot;&gt;Open source&lt;/h2&gt;

&lt;p&gt;We work primarily in Ruby, and we were grateful for the existing gems supporting GraphQL. We used the &lt;a href=&quot;https://github.com/rmosolgo/graphql-ruby&quot;&gt;rmosolgo/graphql-ruby&lt;/a&gt; gem to implement &lt;strong&gt;the entirety&lt;/strong&gt; of our schema. We also incorporated the &lt;a href=&quot;https://github.com/Shopify/graphql-batch&quot;&gt;Shopify/graphql-batch&lt;/a&gt; gem to ensure that multiple records and relationships were fetched efficiently.&lt;/p&gt;
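
&lt;p&gt;The batching idea can be sketched in plain Ruby: queue record ids during resolution, then satisfy them all with a single fetch. The class below is a hand-rolled illustration, not the Shopify/graphql-batch API; the Hash stands in for a database table.&lt;/p&gt;

```ruby
# Illustrative batching loader: ids queued with load are resolved
# together, so N requested records cost one fetch instead of N.
class BatchLoader
  attr_reader :fetch_count

  def initialize(store)
    @store = store        # stands in for a database table
    @pending = []         # ids queued during field resolution
    @fetch_count = 0
  end

  def load(id)
    @pending.push(id)     # queue the id; nothing is fetched yet
  end

  def resolve
    @fetch_count += 1     # a single fetch for every queued id
    @pending.uniq.map { |id| [id, @store[id]] }.to_h
  end
end

loader = BatchLoader.new({ r1: 'octokit.rb', r2: 'octokit.net' })
loader.load(:r1)
loader.load(:r2)
loader.load(:r1)          # duplicate ids collapse into one lookup
records = loader.resolve
p records.length          # 2
p loader.fetch_count      # 1
```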

&lt;p&gt;Our frontend and backend engineers were also able to contribute to these gems as we experimented with them. We’re thankful to the maintainers for their very quick work in accepting our patches. To that end, we’d like to humbly offer a couple of our own open source projects:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/github/graphql-client&quot;&gt;github/graphql-client&lt;/a&gt;, a client that can be integrated into Rails for rendering GraphQL-backed views.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/github/github-graphql-rails-example&quot;&gt;github/github-graphql-rails-example&lt;/a&gt;, a small app built with Rails that demonstrates how you might interact with our GraphQL schema.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re going to continue to extract more parts of our system that we’ve developed internally and release them as open source software, such as our loaders that efficiently batch ActiveRecord requests.&lt;/p&gt;

&lt;h2 id=&quot;the-future&quot;&gt;The future&lt;/h2&gt;

&lt;p&gt;The move to GraphQL marks a larger shift in our Platform strategy to be more transparent and more flexible. Over the next year, we’re going to keep iterating on our schema to bring it out of Early Access and into wider production readiness.&lt;/p&gt;

&lt;p&gt;Since our application engineers are using the same GraphQL platform that we’re making available to our integrators, this provides us with the opportunity to ship UI features &lt;em&gt;in conjunction with&lt;/em&gt; API access. Our new Projects feature is a good example of this: the UI on the site is powered by GraphQL, and you can already use the feature programmatically. Using GraphQL on the frontend and backend eliminates the gap between what we release and what you can consume. We really look forward to making more of these simultaneous releases.&lt;/p&gt;

&lt;p&gt;GraphQL represents a massive leap forward for API development. Type safety, introspection, generated documentation, and predictable responses benefit both the maintainers and consumers of our platform. We’re looking forward to our new era of a GraphQL-backed platform, and we hope that you do, too!&lt;/p&gt;

&lt;p&gt;If you’d like to get started with GraphQL—including our new GraphQL Explorer that lets you make :sparkles:live queries:sparkles:, check out &lt;a href=&quot;https://developer.github.com/early-access/graphql&quot;&gt;our developer documentation&lt;/a&gt;!&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;gjtorikian&quot;, &quot;role&quot;=&gt;&quot;Platform Engineer&quot;, &quot;twitter&quot;=&gt;&quot;gjtorikian&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/gjtorikian&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/gjtorikian&quot;}]}</name></author><summary type="html">GitHub announced a public API one month after the site launched. We’ve evolved this platform through three versions, adhering to RFC standards and embracing new design patterns to provide a clear and consistent interface. We’ve often heard that our REST API was an inspiration for other companies; countless tutorials refer to our endpoints. Today, we’re excited to announce our biggest change to the API since we snubbed XML in favor of JSON: we’re making the GitHub API available through GraphQL.</summary></entry><entry><title type="html">Building resilience in Spokes</title><link href="http://githubengineering.com/building-resilience-in-spokes/" rel="alternate" type="text/html" title="Building resilience in Spokes" /><published>2016-09-07T00:00:00+00:00</published><updated>2016-09-07T00:00:00+00:00</updated><id>http://githubengineering.com/building-resilience-in-spokes</id><content type="html" xml:base="http://githubengineering.com/building-resilience-in-spokes/">&lt;p&gt;&lt;a href=&quot;/introducing-dgit/&quot;&gt;Spokes&lt;/a&gt; is the replication system for the file
servers where we store over 38 million Git repositories and over 36 million gists.  It
keeps at least three copies of every repository and every gist so that we
can provide durable, highly available access to content even when servers and networks fail.  Spokes
uses a combination of Git and rsync to replicate, repair, and rebalance
repositories.&lt;/p&gt;

&lt;h2 id=&quot;what-is-spokes&quot;&gt;What is Spokes?&lt;/h2&gt;

&lt;p&gt;Before we get into the topic at hand—building resilience—we have a
new name to announce: DGit is now Spokes.&lt;/p&gt;

&lt;p&gt;Earlier this year, we &lt;a href=&quot;/introducing-dgit/&quot;&gt;announced&lt;/a&gt; “DGit” or
“Distributed Git,” our application-level replication system for Git.  We
got feedback that the name “DGit” wasn’t very distinct and could cause
confusion with the Git project itself.  So we have decided to rename the
system &lt;em&gt;Spokes&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&quot;defining-resilience&quot;&gt;Defining resilience&lt;/h2&gt;

&lt;p&gt;In any system or service, there are two key ways to measure resilience:
availability and durability.  A system’s availability is the fraction of
the time it can provide the service it was designed to provide.  Can it
serve content?  Can it accept writes?  Availability can be partial,
complete, or degraded: is every repository available?  Are some
repositories—or whole servers—slow?&lt;/p&gt;

&lt;p&gt;A system’s durability is its resistance to permanent data loss.  Once the
system has accepted a write—a push, a merge, an edit through the
website, new-repository creation, etc.—it should never corrupt or
revert that content.  The key here is the moment that the system accepts
the write: how many copies are stored, and where?  Enough copies must be
stored for us to believe with some very high probability that the write
will not be lost.&lt;/p&gt;

&lt;p&gt;A system can be durable but not available.  For example, if a system can’t
make the minimum required number of copies of an incoming write, it might
refuse to accept writes.  Such a system would be temporarily unavailable
for writing, while maintaining the promise not to lose data.  Of course,
it is also possible for a system to be available without being durable,
for example, by accepting writes whether or not they can be committed
safely.&lt;/p&gt;

&lt;p&gt;Readers may recognize this as related to the &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;CAP
Theorem&lt;/a&gt;.  In short, a system
can satisfy at most two of these three properties:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;consistency: all nodes see the same data&lt;/li&gt;
  &lt;li&gt;availability: the system can satisfy read and write requests&lt;/li&gt;
  &lt;li&gt;partition tolerance: the system works even when nodes are down or
unable to communicate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spokes puts the highest priority on consistency and partition tolerance.
In worst-case failure scenarios, it will refuse to accept writes that it
cannot commit, synchronously, to at least two replicas.&lt;/p&gt;

&lt;h2 id=&quot;availability&quot;&gt;Availability&lt;/h2&gt;

&lt;p&gt;Spokes’s availability is a function of the availability of underlying
servers and networks, and of our ability to detect and route around server
and network problems.&lt;/p&gt;

&lt;p&gt;Individual servers become unavailable pretty frequently.  Since rolling
out Spokes this past spring, we have had individual servers crash due to a
kernel deadlock and faulty RAM chips.  Sometimes servers provide degraded
service due to lesser hardware faults or high system load.  In all cases,
Spokes must detect the problem quickly and route around it.  Each
repository is replicated on three servers, so there’s almost always an
up-to-date, available replica to route to even if one server is offline.  Spokes is more
than the sum of its individually-failure-prone parts.&lt;/p&gt;

&lt;p&gt;Detecting problems quickly is the first step.  Spokes uses a combination
of heartbeats and real application traffic to determine when a file server
is down.  Using real application traffic is key for two reasons.  First,
heartbeats learn and react slowly.  Each of our file servers handles a
hundred or more incoming requests per second.  A heartbeat that happens
once per second would learn about a failure only after a hundred requests
had already failed.  Second, heartbeats test only a subset of the server’s
functionality: for example, whether or not the server can accept a TCP connection and respond
to a no-op request.  But what if the failure mode is more subtle?  What if
the Git binary is corrupt?  What if disk accesses have stalled?  What if
all authenticated operations are failing?  No-ops can often succeed when
real traffic will fail.&lt;/p&gt;

&lt;p&gt;So Spokes watches for failures during the processing of real application
traffic, and it marks a node as offline if too many requests fail.  Of
course, real requests do fail sometimes.  Someone can try to read a branch
that has already been deleted, or try to push to a branch they don’t have
access to, for example.  So Spokes only marks the node offline if three
requests fail in a row.  That sometimes marks perfectly healthy nodes
offline—three requests can fail in a row just by random chance—but
not often, and the penalty for it is not large.&lt;/p&gt;

&lt;p&gt;Spokes uses heartbeats, too, but not as the primary failure-detection
mechanism.  Instead, heartbeats have two purposes: polling system load and
providing the all-clear signal after a node has been marked as offline.
As soon as a heartbeat succeeds, the node is marked as online again.  If
the heartbeat succeeds despite ongoing server problems (retrieving system
load is almost a no-op), the node
will get marked offline again after three more failed requests.&lt;/p&gt;
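
&lt;p&gt;The detection rules above can be sketched as a tiny state machine: three consecutive failed requests mark a node offline, any success resets the count, and a successful heartbeat is the all-clear. The class and method names are illustrative, not Spokes internals.&lt;/p&gt;

```ruby
# Sketch of consecutive-failure detection with heartbeat recovery.
class NodeHealth
  THRESHOLD = 3

  def initialize
    @consecutive_failures = 0
    @online = true
  end

  def record_success
    @consecutive_failures = 0
  end

  def record_failure
    @consecutive_failures += 1
    # Three in a row can happen by chance, but the penalty for a
    # false positive is small: the node is merely deprioritized.
    @online = false if @consecutive_failures == THRESHOLD
  end

  def heartbeat_succeeded
    @online = true            # the all-clear signal
    @consecutive_failures = 0
  end

  def online?
    @online
  end
end

node = NodeHealth.new
3.times { node.record_failure }
p node.online?                # false
node.heartbeat_succeeded
p node.online?                # true
```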

&lt;p&gt;So Spokes detects that a node is down within about three failed
operations.  That’s still three failed operations too many!  For clean
failures—connections refused or timeouts—all operations know how to
try the next host.  Remember, Spokes has three or more copies of every
repository.  A routing query for a repository returns not one server, but
a list of three (or so) up-to-date replicas, sorted in preference order.
If an operation attempted on the first-choice replica fails, there are usually two other
replicas to try.&lt;/p&gt;
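
&lt;p&gt;A read with failover over that preference-ordered list might look like the following sketch, where a nil response stands in for a refused connection or timeout. All server names are invented.&lt;/p&gt;

```ruby
# Try each replica in preference order and serve from the first one
# that answers; return nil only if every replica fails.
def read_with_failover(replicas)
  replicas.each do |replica|
    result = yield(replica)
    return [replica, result] unless result.nil?
  end
  nil
end

# Simulate the first-choice replica being unreachable.
responses = { fs1: nil, fs2: 'ref: refs/heads/main', fs3: 'ref: refs/heads/main' }
served_by, _data = read_with_failover([:fs1, :fs2, :fs3]) { |r| responses[r] }
p served_by   # :fs2
```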

&lt;p&gt;A graph of operations (here, remote procedure calls, or RPCs) failed over
from one server to another clearly shows when a server is offline.  In
this graph, a single server is unavailable for about 1.5 hours; during
this time, many thousands of RPC operations are redirected to other
servers.  This graph is the single best detector the Spokes team has for
discovering misbehaving servers.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;One server down&quot; src=&quot;/images/building-resilience-in-spokes/one-server-down.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Spokes’s node-offline detection is only advisory—i.e., only an
optimization.  A node that has had three failures in a row just gets moved
to the end of the preference order for all read operations, rather than
removed from the list of replicas.  It’s better for Spokes to try a
probably-offline replica last, than to not try it at all.&lt;/p&gt;
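
&lt;p&gt;That advisory demotion can be sketched as a simple reordering: probably-offline replicas stay in the list but move to the end, so they are still tried last. Names are illustrative.&lt;/p&gt;

```ruby
# Keep every replica in the routing answer, but demote the ones the
# failure detector currently suspects to the end of the order.
def preference_order(replicas, suspected_offline)
  healthy = replicas.reject { |r| suspected_offline.include?(r) }
  healthy + (replicas - healthy)
end

p preference_order([:fs1, :fs2, :fs3], [:fs1]) # [:fs2, :fs3, :fs1]
```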

&lt;p&gt;This failure detector works well for server failures: when a server is
overloaded or offline, operations to it will fail.  Spokes detects those
failures and temporarily stops directing traffic to the failed server
until a heartbeat succeeds.  However, failures of networks and application
(Rails) servers are much messier.  A given file server can appear to be
offline to just a subset of the application servers, or one bad
application server can spuriously determine that every file server is
offline.  So Spokes’s failure detection is actually MxN: each application
server keeps its own list of which file servers are offline, or not.  If
we see many application servers marking a single file server as offline,
then it probably is.  If we see a single application server marking many
file servers offline, then we’ve learned about a fault on that application
server, instead.&lt;/p&gt;
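
&lt;p&gt;The MxN reading of those per-application-server lists can be sketched as vote counting. The majority threshold below is an assumption for illustration; the post does not specify the exact heuristic.&lt;/p&gt;

```ruby
# A file server is the likely fault if a majority of application
# servers independently report it offline.
def likely_file_server_fault?(reports, file_server)
  votes = reports.values.count { |list| list.include?(file_server) }
  majority = reports.size / 2 + 1
  [votes, majority].min == majority # true when votes reach a majority
end

reports = {
  app1: [:dfs4],
  app2: [:dfs4],
  app3: [:dfs4],
  app4: []      # one healthy app server saw no failures at all
}
p likely_file_server_fault?(reports, :dfs4) # true
p likely_file_server_fault?(reports, :dfs1) # false
```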

&lt;p&gt;The figure below illustrates the MxN nature of failure detection and shows
in red which failure detectors are true if a single file server, &lt;code class=&quot;highlighter-rouge&quot;&gt;dfs4&lt;/code&gt;, is
offline.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;MxN failure detection&quot; src=&quot;/images/building-resilience-in-spokes/mxn.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;In one recent incident, a single front-end application server in a staging
environment lost its ability to resolve the DNS names of the file servers.
Because it couldn’t reach the file servers to send them RPC operations or
heartbeats, it concluded that every file server was offline.  But that
incorrect determination was limited to that one application server; all
other application servers worked normally.  So the flaky application
server was immediately obvious in the RPC-failover graphs, and no
production traffic was affected.&lt;/p&gt;

&lt;h2 id=&quot;durability&quot;&gt;Durability&lt;/h2&gt;

&lt;p&gt;Sometimes, servers fail.  Disks can fail; RAID controllers can fail; even
entire servers or entire racks can fail.  Spokes provides durability for
repository data even in the face of such adversity.&lt;/p&gt;

&lt;p&gt;The basic building block of durability, like availability, is
replication.  Spokes keeps at least three copies of every repository,
wiki, and gist, and those copies are in different racks.  No updates to a
repository—pushes, renames, edits to a wiki, etc.—are accepted
unless a strict majority of the replicas can apply the change and get the
same result.&lt;/p&gt;

&lt;p&gt;Spokes needs just one extra copy to survive a single-node failure.  So why
a majority?  It’s possible, even common, for a repository to get multiple
writes at roughly the same time.  Those writes might conflict: one user
might delete a branch while another user pushes new commits to that same
branch, for example.  Conflicting writes must be serialized—that is,
they have to be applied (or rejected) in the same order on every replica,
so every replica gets the same result.  The way Spokes serializes writes
is by ensuring that every write acquires an exclusive lock on a majority
of replicas.  It’s impossible for two writes to acquire a majority at the
same time, so Spokes eliminates conflicts by eliminating concurrent writes
entirely.&lt;/p&gt;

&lt;p&gt;If a repository exists on exactly three replicas, then a successful write
on two replicas constitutes both a durable set, and a majority.  If a
repository has four or five replicas, then three are required for a
majority.&lt;/p&gt;
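
&lt;p&gt;That quorum arithmetic fits in one line: a strict majority of n replicas is the integer division n / 2, plus one.&lt;/p&gt;

```ruby
# Strict majority of a replica set: more than half must agree.
def majority(replica_count)
  replica_count / 2 + 1
end

p majority(3) # 2
p majority(4) # 3
p majority(5) # 3
```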

&lt;p&gt;In contrast, many other replication and consensus protocols have a single
primary copy at any moment.  The order that writes arrive at the primary
copy is the official order, and all other replicas must apply writes in
that order.  The primary is generally designated manually, or
automatically using an election protocol.  Spokes simply skips that step
and treats every write as an election—selecting a winning order and
outcome directly, rather than a winning server that dictates the write
order.&lt;/p&gt;

&lt;p&gt;Any write in Spokes that can’t be applied identically at a majority of
replicas gets reverted from any replica where it was applied.  In essence,
every write operation goes through a voting protocol, and any replicas on
the losing side of the vote are marked as unhealthy—unavailable for
reads or writes—until they can be repaired.  Repairs are automatic and
quick.  Because a majority agreed either to accept or to roll back the
update, there are still at least two replicas available to continue
accepting both reads and writes while the unhealthy replica is
repaired.&lt;/p&gt;
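
&lt;p&gt;A per-write vote can be sketched as tallying the result each replica computed: if a strict majority agree, the write commits and the disagreeing replicas are marked unhealthy; otherwise it rolls back everywhere. The checksums below stand in for whatever result comparison the real protocol performs.&lt;/p&gt;

```ruby
# Tally the post-write results reported by each replica and decide
# the outcome. Returns the decision and the replicas needing repair.
def vote(results)
  tally = Hash.new(0)
  results.each_value { |checksum| tally[checksum] += 1 }
  winner, count = tally.max_by { |_, c| c }
  majority = results.size / 2 + 1
  # No quorum: the write is reverted wherever it was applied.
  return [:rolled_back, []] if [count, majority].min != majority
  losers = results.select { |_, checksum| checksum != winner }.keys
  [:committed, losers]
end

p vote({ fs1: 'abc123', fs2: 'abc123', fs3: 'ffffff' })
# [:committed, [:fs3]] -- fs3 is repaired while fs1 and fs2 serve traffic
```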

&lt;p&gt;To be clear, disagreements and repairs are exceptional cases.  GitHub
accepts many millions of repository writes each day.  On a typical day, a
few dozen writes will result in non-unanimous votes, generally because one
replica was particularly busy, the connection to it timed out, and the
other replicas voted to move on without it.  The lagging replica almost
always recovers within a minute or two, and there is no user-visible
impact on the repository’s availability.&lt;/p&gt;

&lt;p&gt;Rarer still are whole-disk and whole-server failures, but they do happen.
When we have to remove an entire server, there are suddenly hundreds of
thousands of repositories with only two copies, instead of three.  This,
too, is a repairable condition.  Spokes checks periodically to see if
every repository has the desired number of replicas; if not, more replicas
are created.  New replicas can be created anywhere, and they can be copied
from wherever the surviving two copies of each repository are.  Hence,
repairs after a server failure are N-to-N.  The larger the file server cluster, the faster it can
recover from a single-node failure.&lt;/p&gt;
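
&lt;p&gt;The periodic repair check can be sketched as scanning for under-replicated repositories and choosing a source and a destination for each missing copy. The desired count of three matches the description above; the placement policy here is an invented simplification.&lt;/p&gt;

```ruby
DESIRED_COPIES = 3

# For each repository with fewer than the desired number of replicas,
# plan a copy from any survivor to any server not already holding one.
def repair_plan(repos, servers)
  plans = repos.map do |repo, replicas|
    next if [replicas.size, DESIRED_COPIES].min == DESIRED_COPIES
    source = replicas.first             # any surviving copy will do
    target = (servers - replicas).first # any server without a copy
    [repo, source, target]
  end
  plans.compact
end

repos = { shiny: [:fs1, :fs2], stable: [:fs1, :fs2, :fs3] }
p repair_plan(repos, [:fs1, :fs2, :fs3, :fs4])
# [[:shiny, :fs1, :fs3]] -- stable already has three copies
```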

&lt;h2 id=&quot;clean-shutdowns&quot;&gt;Clean shutdowns&lt;/h2&gt;

&lt;p&gt;As described above, Spokes can deal quickly and transparently with a
server going offline or even failing permanently.  So, can we use that
for planned maintenance, when we need to reboot or retire a server?  Yes
and no.&lt;/p&gt;

&lt;p&gt;Strictly speaking, we can reboot a server with &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo reboot&lt;/code&gt;, and we can
retire it just by unplugging it.  But there are subtle disadvantages to
doing so, so we have more careful mechanisms, reusing a lot of the same
logic that would respond to a crash or a failure.&lt;/p&gt;

&lt;p&gt;Simply rebooting a server does not affect future read and write
operations, which will be transparently directed to other replicas.  It
doesn’t affect in-progress write operations, either, as those are
happening on all replicas, and the other two replicas can easily vote to
proceed without the server we’re rebooting.  But a reboot does break
in-progress read operations.  Most of those reads—e.g., fetching a
README to display on a repository’s home page—are quick and will
complete while the server shuts down gracefully.  But some reads,
particularly clones of large repositories, take minutes or hours to
complete, depending on the speed of the end user’s network.
Breaking these is, well, rude.  They can be restarted on
another replica, but all progress up to that point would be lost.&lt;/p&gt;

&lt;p&gt;Hence, rebooting a server intentionally in Spokes begins with a quiescing
period.  While a server is quiescing, it is marked as offline for the
purposes of new read operations, but existing read operations, including
clones, are allowed to finish.  Quiescing can take anywhere from a few
seconds to many hours, depending on which read operations are active on
the server that is getting rebooted.&lt;/p&gt;
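&lt;p&gt;A toy model of that quiescing wait is sketched below; a counter file stands in for the server’s in-flight read count, whereas the real system tracks live read RPCs:&lt;/p&gt;

```shell
#!/bin/sh
# Toy model of quiescing: stop accepting new reads, then wait for
# in-flight reads to drain before it is safe to reboot.
ACTIVE_READS=$(mktemp)
echo 3 > "$ACTIVE_READS"   # pretend three reads are in flight

quiesce() {
  echo "offline for new reads"
  while [ "$(cat "$ACTIVE_READS")" -gt 0 ]; do
    # A real loop would sleep and re-poll; here we simulate reads finishing.
    n=$(( $(cat "$ACTIVE_READS") - 1 ))
    echo "$n" > "$ACTIVE_READS"
  done
  echo "quiesced: safe to reboot"
}

quiesce
```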

&lt;p&gt;Perhaps surprisingly, write operations are sent to servers as usual, even
while they quiesce.  That’s because write operations run on all replicas,
so one replica can drop out at any time without user-visible impact.
Also, that replica would fall arbitrarily far behind if it didn’t receive
writes while quiescing, creating a lot of catch-up load when it is finally
brought fully back online.&lt;/p&gt;

&lt;p&gt;We don’t perform “&lt;a href=&quot;https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey&quot;&gt;chaos
monkey&lt;/a&gt;”
testing on the Spokes file servers, for
the same reasons we prefer to quiesce them before rebooting them: to avoid
interrupting long-running reads.  That is, we do not reboot them randomly
just to confirm that sudden, single-node failures are still (mostly)
harmless.&lt;/p&gt;

&lt;p&gt;Instead of “chaos monkey” testing, we perform rolling reboots as needed,
which accomplish roughly the same testing goals.  When we need to make
some change that requires a reboot—e.g., changing kernel or filesystem
parameters, or changing BIOS settings—we quiesce and reboot each
server.  Racks serve as availability zones&lt;sup&gt;&lt;a href=&quot;#footnote-1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;,
so we quiesce entire racks at
a time.  As servers in a given rack finish quiescing—i.e., complete all
outstanding read operations—we reboot up to five of them at a time.
When a whole rack is finished, we move on to the next rack.&lt;/p&gt;

&lt;p&gt;Below is a graph showing RPC operations failed over during a rolling
reboot.  Each server gets a different color.  Values are stacked, so the
tallest spike shows a moment where eight servers were rebooting at once.
The large block of light red shows where one server did not reboot cleanly
and was offline for over two hours.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;Rolling reboot&quot; src=&quot;/images/building-resilience-in-spokes/rolling-reboots.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Retiring a server by simply unplugging it has the same disadvantages as
unplanned reboots, and more.  In addition to disrupting any in-progress
read operations, it creates several hours of additional risk for all the
repositories that used to be hosted on the server.  When a server
disappears suddenly, all of the repositories formerly on it are now down
to two copies.  Two copies are enough to perform any read or write
operation, but two copies aren’t enough to tolerate an additional failure.
In other words, removing a server without warning increases the
probability of rejecting write operations later that same day.  We’re in
the business of keeping that probability to a minimum.&lt;/p&gt;

&lt;p&gt;So instead, we prepare a server for retirement by removing it from the
count of active replicas for any repository.  Spokes can still
use that server for both read and write operations.  But when it asks if
all repositories have enough replicas, suddenly some of them—the ones
on the retiring server—will say no, and more replicas will be created.
These repairs proceed exactly as if the server had just disappeared,
except that now the server remains available in case some other server
fails.&lt;/p&gt;

&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;Availability is important, and durability is more important still.
Availability is a measure of what fraction of the time a service responds
to requests.  Durability is a measure of what fraction of committed data a
service can faithfully store.&lt;/p&gt;

&lt;p&gt;Spokes keeps at least three replicas of every repository, to provide both
availability and durability.  Three replicas means that one server can
fail with no user-visible effect.  If two servers fail, Spokes can provide
full access for most repositories and read-only access to repositories that had
two of their replicas on the two failing servers.&lt;/p&gt;

&lt;p&gt;Spokes does not accept writes to a repository unless a majority of
replicas—and always at least two—can commit the write and produce
the same resulting repository state.  That requirement provides
consistency by ensuring the same write ordering on all replicas.  It
also provides durability in the face of single-server failures by
storing every committed write in at least two places.&lt;/p&gt;
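&lt;p&gt;The quorum rule can be sketched as follows. This is an illustrative model, not Spokes’ actual code: each argument stands for the resulting repository state reported by one replica, and a write is accepted only when a majority of replicas, and always at least two, agree:&lt;/p&gt;

```shell
#!/bin/sh
# Illustrative model of the write-quorum rule: accept a write only if
# a majority of replicas (and at least two) report the same state.
accept_write() {
  total=$#
  # Size of the largest group of replicas agreeing on one state.
  best=$(printf '%s\n' "$@" | sort | uniq -c | sort -rn | head -1 | awk '{print $1}')
  if [ "$best" -ge 2 ]; then
    if [ $(( best * 2 )) -gt "$total" ]; then echo accept; return; fi
  fi
  echo reject
}

accept_write abc abc abc   # unanimous: accept
accept_write abc abc def   # 2-of-3 majority: accept
accept_write abc def ghi   # no majority: reject
```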

&lt;p&gt;Spokes has a failure detector, based on monitoring live application
traffic, that determines when a server is
offline and routes around the problem.  Finally, Spokes has automated repairs
for recovering quickly when a disk or server fails permanently.&lt;/p&gt;

&lt;hr /&gt;

&lt;div style=&quot;font-size: 0.8em; font-style: italic;&quot;&gt;
&lt;p&gt;
&lt;a name=&quot;footnote-1&quot;&gt;1.&lt;/a&gt; Treating racks as availability zones means we
place repository replicas so that no repository has two replicas within
the same rack.  Hence, we can lose an entire rack of servers and not
affect the availability or durability of any of the repositories hosted on
them.  We chose racks as availability zones because several important
failure modes, especially related to power and networking, can affect
entire racks of servers at a time.
&lt;/p&gt;
&lt;/div&gt;</content><author><name>Patrick Reynolds</name></author><summary type="html">Spokes is the replication system for the file
servers where we store over 38 million Git repositories and over 36 million gists.  It
keeps at least three copies of every repository and every gist so that we
can provide durable, highly available access to content even when servers and networks fail.  Spokes
uses a combination of Git and rsync to replicate, repair, and rebalance
repositories.</summary></entry><entry><title type="html">Context aware MySQL pools via HAProxy</title><link href="http://githubengineering.com/context-aware-mysql-pools-via-haproxy/" rel="alternate" type="text/html" title="Context aware MySQL pools via HAProxy" /><published>2016-08-17T00:00:00+00:00</published><updated>2016-08-17T00:00:00+00:00</updated><id>http://githubengineering.com/context-aware-mysql-pools-via-haproxy</id><content type="html" xml:base="http://githubengineering.com/context-aware-mysql-pools-via-haproxy/">&lt;p&gt;At GitHub we use MySQL as our main datastore. While repository data lies in &lt;code class=&quot;highlighter-rouge&quot;&gt;git&lt;/code&gt;, metadata is stored in MySQL. This includes Issues, Pull Requests, Comments etc. We also auth against MySQL via a custom git proxy (&lt;a href=&quot;http://githubengineering.com/benchmarking-github-enterprise/&quot;&gt;babeld&lt;/a&gt;). To be able to serve under the high load GitHub operates at, we use MySQL replication to scale out read load.&lt;/p&gt;

&lt;p&gt;We have different clusters that provide different types of services, but the single-writer-multiple-readers design applies to them all. Depending on traffic growth, application demand, operational tasks, or other constraints, we take replicas in or out of our pools. Depending on workload, some replicas may lag more than others.&lt;/p&gt;

&lt;p&gt;Displaying up-to-date data is important. We have tooling that helps us ensure we keep replication lag at a minimum, and typically it doesn’t exceed &lt;code class=&quot;highlighter-rouge&quot;&gt;1&lt;/code&gt; second. However sometimes lags do happen, and when they do, we want to put aside those lagging replicas, let them catch their breath, and avoid sending traffic their way until they are caught up.&lt;/p&gt;

&lt;p&gt;We set out to create a self-managing topology that will exclude lagging replicas automatically, handle disasters gracefully, and yet allow for complete human control and visibility.&lt;/p&gt;

&lt;h3 id=&quot;haproxy-for-load-balancing-replicas&quot;&gt;HAProxy for load balancing replicas&lt;/h3&gt;

&lt;p&gt;We use HAProxy for various tasks at GitHub. Among others, we use it to load balance our MySQL replicas. Our applications connect to HAProxy servers at &lt;code class=&quot;highlighter-rouge&quot;&gt;:3306&lt;/code&gt; and are routed to replicas that can serve read requests. Exactly what makes a replica able to “serve read requests” is the topic of this post.&lt;/p&gt;

&lt;p&gt;MySQL load balancing via HAProxy is commonly used, but we wanted to tackle a few operational and availability concerns:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Can we automate exclusion and inclusion of backend servers based on replication status?&lt;/li&gt;
  &lt;li&gt;Can we automate exclusion and inclusion of backend servers based on server role?&lt;/li&gt;
  &lt;li&gt;How can we react to a scenario where too many servers are excluded, and we are only left with one or two “good” replicas?&lt;/li&gt;
  &lt;li&gt;Can we &lt;em&gt;always serve&lt;/em&gt;?&lt;/li&gt;
  &lt;li&gt;How easy would it be to override pool membership manually?&lt;/li&gt;
  &lt;li&gt;Will our solution survive a &lt;code class=&quot;highlighter-rouge&quot;&gt;service haproxy reload/restart&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With these criteria in mind, the standard &lt;code class=&quot;highlighter-rouge&quot;&gt;mysql-check&lt;/code&gt; commonly used in HAProxy-MySQL load balancing will not suffice.&lt;/p&gt;

&lt;p&gt;This simple check merely tests whether a MySQL server is live; it gains no insight into the server’s replication state (lagging or broken) or its operational state (maintenance, ETL, backup jobs, etc.).&lt;/p&gt;

&lt;p&gt;Instead, we make our HAProxy pools context aware. We let the backend MySQL hosts make an informed decision: “should I be included in a pool or should I not?”&lt;/p&gt;

&lt;h3 id=&quot;context-aware-mysql-pools&quot;&gt;Context aware MySQL pools&lt;/h3&gt;

&lt;p&gt;In its very simplistic form, context awareness begins with asking the MySQL backend replica: “are you lagging?” We will reach far beyond that, but let’s begin by describing this commonly used setup.&lt;/p&gt;

&lt;p&gt;In this situation, HAProxy no longer uses a &lt;code class=&quot;highlighter-rouge&quot;&gt;mysql-check&lt;/code&gt; but rather an &lt;code class=&quot;highlighter-rouge&quot;&gt;http-check&lt;/code&gt;. The MySQL backend server provides an &lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP&lt;/code&gt; interface, responding with &lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP 200&lt;/code&gt; or &lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP 503&lt;/code&gt; depending on replication lag. HAProxy will interpret these as “good” (&lt;code class=&quot;highlighter-rouge&quot;&gt;UP&lt;/code&gt;) or “bad” (&lt;code class=&quot;highlighter-rouge&quot;&gt;DOWN&lt;/code&gt;), respectively. On the HAProxy side, it looks like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;backend mysql_ro_main
  option httpchk GET /
  balance roundrobin
  retries 1
  timeout connect 1000
  timeout check 300
  timeout server 86400000

  default-server port 9876 fall 2 inter 5000 rise 1 downinter 5000 on-marked-down shutdown-sessions weight 10
  server my-db-0001 my-db-0001.heliumcarbon.com:3306 check
  server my-db-0002 my-db-0002.heliumcarbon.com:3306 check
  server my-db-0003 my-db-0003.heliumcarbon.com:3306 check
  server my-db-0004 my-db-0004.heliumcarbon.com:3306 check
  server my-db-0005 my-db-0005.heliumcarbon.com:3306 check
  server my-db-0006 my-db-0006.heliumcarbon.com:3306 check
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The backend servers need to provide an HTTP service on &lt;code class=&quot;highlighter-rouge&quot;&gt;:9876&lt;/code&gt;. That service would connect to MySQL, check for replication lag, and return with &lt;code class=&quot;highlighter-rouge&quot;&gt;200&lt;/code&gt; (say, &lt;code class=&quot;highlighter-rouge&quot;&gt;lag &amp;lt;= 5s&lt;/code&gt;) or &lt;code class=&quot;highlighter-rouge&quot;&gt;503&lt;/code&gt; (&lt;code class=&quot;highlighter-rouge&quot;&gt;lag &amp;gt; 5s&lt;/code&gt; or replication is broken).&lt;/p&gt;
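&lt;p&gt;A minimal sketch of such a lag-to-status mapping follows; the threshold and responses here are illustrative, and GitHub’s actual scripts are linked later in this post:&lt;/p&gt;

```shell
#!/bin/sh
# Minimal sketch of the lag check: map replication lag (in seconds)
# to the HTTP status line HAProxy will see. A real service would obtain
# the lag from MySQL, e.g. from a heartbeat table.
lag_to_http() {
  lag="$1"; threshold="${2:-5}"
  if [ -z "$lag" ]; then
    echo "HTTP/1.1 503 Service Unavailable"   # replication broken or dead
  elif [ "$lag" -le "$threshold" ]; then
    echo "HTTP/1.1 200 OK"                    # healthy: lag within threshold
  else
    echo "HTTP/1.1 503 Service Unavailable"   # lagging: shed read traffic
  fi
}

lag_to_http 2    # healthy replica
lag_to_http 42   # lagging replica
```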

&lt;h3 id=&quot;some-reflections&quot;&gt;Some reflections&lt;/h3&gt;

&lt;p&gt;This commonly used setup automatically excludes or includes backend servers based on replication status. If the server is lagging, the specialized HTTP service will report &lt;code class=&quot;highlighter-rouge&quot;&gt;503&lt;/code&gt;, which HAProxy will interpret as &lt;code class=&quot;highlighter-rouge&quot;&gt;DOWN&lt;/code&gt;, and the server will not serve traffic until it recovers.&lt;/p&gt;

&lt;p&gt;But, what happens when two, three, or four replicas are lagging? We are left with less and less serving capacity. The remaining replicas are receiving two or three times more traffic than they’re used to receiving. If this happens, the replicas might succumb to the load and lag as well, and the solution above might not be able to handle an entire fleet of lagging replicas.&lt;/p&gt;

&lt;p&gt;What’s more, some of our replicas have special roles. Each cluster has a node running continuous logical or physical backups. For example, other nodes might be serving a purely analytical workload or be partially weighted to verify a newer MySQL version.&lt;/p&gt;

&lt;p&gt;In the past, we would update the HAProxy config file with the list of servers as they came and went. As we grew in volume and in number of servers this became an operational overhead. We’d rather take a more dynamic approach that provides increased flexibility.&lt;/p&gt;

&lt;p&gt;We may perform a MySQL master failover. This may be a planned operation (e.g. upgrading to the latest release) or an unplanned one (e.g. automated failover on hardware failure). The new master must be excluded from the read-pool. The old master, if available, may now serve reads. Again, we wish to avoid updating HAProxy’s configuration with these changes.&lt;/p&gt;

&lt;h3 id=&quot;static-haproxy-configuration-dynamic-decisions&quot;&gt;Static HAProxy configuration, dynamic decisions&lt;/h3&gt;

&lt;p&gt;In our current setup the HAProxy configuration does not regularly change. It may change now and then as we introduce new hardware, but otherwise it is static, and HAProxy reacts to ongoing instructions from the backend servers telling it:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I’m good to participate in a pool (&lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP 200&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;I’m in bad state; don’t send traffic my way (&lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP 503&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;I’m in maintenance mode. No error on my side, but don’t send traffic my way (&lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP 404&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
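&lt;p&gt;Schematically, and assuming &lt;code class=&quot;highlighter-rouge&quot;&gt;http-check disable-on-404&lt;/code&gt; is configured (as it is in the backends shown later in this post), HAProxy maps these responses to server states like so:&lt;/p&gt;

```shell
#!/bin/sh
# How HAProxy interprets the three check responses, given
# "http-check disable-on-404" (illustrative mapping, not HAProxy code).
haproxy_state_for() {
  case "$1" in
    200) echo UP   ;;   # healthy: serve read traffic
    404) echo NOLB ;;   # maintenance: no new traffic, not an error
    *)   echo DOWN ;;   # failed check: route around the server
  esac
}

haproxy_state_for 200
haproxy_state_for 404
haproxy_state_for 503
```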

&lt;p&gt;The HAProxy config file lists each and every known server. The list includes the backup server, the analytics server, and even the master. The backend servers themselves tell HAProxy whether they wish to participate in taking read traffic or not.&lt;/p&gt;

&lt;p&gt;Before showing you how to implement this, let’s consider availability.&lt;/p&gt;

&lt;h3 id=&quot;graceful-failover-of-pools&quot;&gt;Graceful failover of pools&lt;/h3&gt;

&lt;p&gt;HAProxy supports multiple backend pools per frontend, and provides Access Control Lists (&lt;a href=&quot;https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#7&quot;&gt;ACLs&lt;/a&gt;). ACLs often act on incoming connection data (headers, cookies, etc.) but can also observe backend status.&lt;/p&gt;

&lt;p&gt;The scheme is to define two (or more) backend pools:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The first (“main”/”normal”) pool consists of replicas with acceptable lag that are able to serve traffic, as above&lt;/li&gt;
  &lt;li&gt;The second (“backup”) pool consists of valid replicas which are allowed to be lagging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We use an &lt;code class=&quot;highlighter-rouge&quot;&gt;acl&lt;/code&gt; that observes the number of available servers in our &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt; backend. We then set a rule to use the &lt;code class=&quot;highlighter-rouge&quot;&gt;backup&lt;/code&gt; pool if that &lt;code class=&quot;highlighter-rouge&quot;&gt;acl&lt;/code&gt; applies:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;frontend mysql_ro
  ...
  acl mysql_not_enough_capacity nbsrv(mysql_ro_main) lt 3
  use_backend mysql_ro_backup if mysql_not_enough_capacity
  default_backend mysql_ro_main
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;See &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/haproxy/haproxy-sample.cfg#L4-L13&quot;&gt;code sample&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above we choose to switch to the &lt;code class=&quot;highlighter-rouge&quot;&gt;mysql_ro_backup&lt;/code&gt; pool when left with less than three active hosts in our &lt;code class=&quot;highlighter-rouge&quot;&gt;mysql_ro_main&lt;/code&gt; pool. We’d rather serve stale data than stop serving altogether. Of course, by this time our alerting system will have alerted us to the situation and we will already be looking into the source of the problem.&lt;/p&gt;

&lt;p&gt;Remember that it’s not HAProxy that makes the decision “who’s in and who’s out” but the backend server itself. To that effect, HAProxy sends a check &lt;em&gt;hint&lt;/em&gt; to the server. We choose to send the hint in the form of a URI, as this makes for a readable, clear code:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;backend mysql_ro_main
  option httpchk GET /check-lag
  http-check disable-on-404
  balance roundrobin
  retries 1
  timeout connect 1000
  timeout check 300
  timeout server 86400000

  default-server port 9876 fall 2 inter 5000 rise 1 downinter 5000 on-marked-down shutdown-sessions weight 10
  server my-db-0001 my-db-0001.heliumcarbon.com:3306 check
  server my-db-0002 my-db-0002.heliumcarbon.com:3306 check
  server my-db-0003 my-db-0003.heliumcarbon.com:3306 check
  server my-db-0004 my-db-0004.heliumcarbon.com:3306 check
  server my-db-0005 my-db-0005.heliumcarbon.com:3306 check
  server my-db-0006 my-db-0006.heliumcarbon.com:3306 check

backend mysql_ro_backup
  option httpchk GET /ignore-lag
  http-check disable-on-404
  balance roundrobin
  retries 1
  timeout connect 1000
  timeout check 300
  timeout server 86400000

  default-server port 9876 fall 2 inter 10000 rise 1 downinter 10000 on-marked-down shutdown-sessions weight 10
  server my-db-0001 my-db-0001.heliumcarbon.com:3306 check
  server my-db-0002 my-db-0002.heliumcarbon.com:3306 check
  server my-db-0003 my-db-0003.heliumcarbon.com:3306 check
  server my-db-0004 my-db-0004.heliumcarbon.com:3306 check
  server my-db-0005 my-db-0005.heliumcarbon.com:3306 check
  server my-db-0006 my-db-0006.heliumcarbon.com:3306 check
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;See &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/haproxy/haproxy-sample.cfg#L15-L47&quot;&gt;code sample&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both backend pools list the exact same servers. The major difference between the pools is the &lt;code class=&quot;highlighter-rouge&quot;&gt;check&lt;/code&gt; URI:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;option httpchk GET /check-lag
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;vs.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;option httpchk GET /ignore-lag
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As the URIs suggest, the first, &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt; pool looks for backend servers that do not lag (and we also wish to exclude the master, the backup server, etc.). The &lt;code class=&quot;highlighter-rouge&quot;&gt;backup&lt;/code&gt; pool is happy to take servers that actually do lag, but it still wishes to exclude the master and other special servers.&lt;/p&gt;

&lt;p&gt;HAProxy’s behavior is to use the &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt; pool for as long as at least three replicas are happy to serve data. If two or fewer replicas are in good shape, HAProxy switches to the &lt;code class=&quot;highlighter-rouge&quot;&gt;backup&lt;/code&gt; pool, where we re-introduce the lagging replicas: serving staler data, but still serving.&lt;/p&gt;

&lt;p&gt;Also noteworthy in the above is &lt;code class=&quot;highlighter-rouge&quot;&gt;http-check disable-on-404&lt;/code&gt;, which puts a &lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP 404&lt;/code&gt; server in a &lt;code class=&quot;highlighter-rouge&quot;&gt;NOLB&lt;/code&gt; state. We will discuss this in more detail soon.&lt;/p&gt;

&lt;h3 id=&quot;implementing-the-http-service&quot;&gt;Implementing the HTTP service&lt;/h3&gt;

&lt;p&gt;Any &lt;code class=&quot;highlighter-rouge&quot;&gt;HTTP&lt;/code&gt; service implementation will do. At GitHub, we commonly use &lt;code class=&quot;highlighter-rouge&quot;&gt;shell&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;Ruby&lt;/code&gt; scripts that integrate well with our &lt;a href=&quot;https://hubot.github.com/&quot;&gt;ChatOps&lt;/a&gt;. We have many reliable &lt;code class=&quot;highlighter-rouge&quot;&gt;shell&lt;/code&gt; building blocks, and our current solution is a &lt;code class=&quot;highlighter-rouge&quot;&gt;shell&lt;/code&gt;-oriented service, in the form of &lt;code class=&quot;highlighter-rouge&quot;&gt;xinetd&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;xinetd&lt;/code&gt; makes it easy to “speak HTTP” via &lt;code class=&quot;highlighter-rouge&quot;&gt;shell&lt;/code&gt;. A simplified setup looks like this:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/xinetd/xinetd.conf&quot;&gt;xinetd config&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/xinetd/mysqlchk_general&quot;&gt;mysqlcheck_general&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/puppet/xinetd.pp&quot;&gt;puppet&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the above, we’re particularly interested in the fact that &lt;code class=&quot;highlighter-rouge&quot;&gt;xinetd&lt;/code&gt; serves on &lt;code class=&quot;highlighter-rouge&quot;&gt;:9876&lt;/code&gt; and calls upon &lt;code class=&quot;highlighter-rouge&quot;&gt;/path/to/scripts/xinetd-mysql&lt;/code&gt; to respond to HAProxy’s &lt;code class=&quot;highlighter-rouge&quot;&gt;check&lt;/code&gt; requests.&lt;/p&gt;

&lt;h3 id=&quot;implementing-the-check-script&quot;&gt;Implementing the check script&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/scripts/xinetd-mysql&quot;&gt;xinetd-mysql&lt;/a&gt; script routes the request to an appropriate handler. Recall that we asked HAProxy to &lt;em&gt;hint&lt;/em&gt; per &lt;code class=&quot;highlighter-rouge&quot;&gt;check&lt;/code&gt;. The hint URI, such as &lt;code class=&quot;highlighter-rouge&quot;&gt;/check-lag&lt;/code&gt;, is intercepted by &lt;code class=&quot;highlighter-rouge&quot;&gt;xinetd-mysql&lt;/code&gt; which further invokes a dedicated handler for this check. Thus, we have different handlers for &lt;code class=&quot;highlighter-rouge&quot;&gt;/check-lag&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;/ignore-lag&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;/ignore-lag-and-yes-please-allow-backup-servers-as-well&lt;/code&gt; etc.&lt;/p&gt;

&lt;p&gt;The real magic happens when running &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/scripts/xinetd-mysql-check-lag&quot;&gt;this handler script&lt;/a&gt;. This is where the server makes the decision: “Should I be included in the read-pool or not?” The script bases its decision on the following factors:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Did a human suggest that this server be explicitly included/excluded?
This is just a matter of &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/scripts/xinetd-mysql-check-lag#L9-L17&quot;&gt;touching a file&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Is this the &lt;em&gt;master&lt;/em&gt; server? A &lt;em&gt;backup&lt;/em&gt; server? Something else?
The server happens to know its own role via service discovery or even via &lt;code class=&quot;highlighter-rouge&quot;&gt;puppet&lt;/code&gt;. We &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/scripts/xinetd-mysql-check-lag#L20-L26&quot;&gt;check for a hint file&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Is MySQL lagging? Is it alive at all?
This (finally) &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd/blob/master/scripts/xinetd-mysql-check-lag#L30-L45&quot;&gt;executes a self-check&lt;/a&gt; on MySQL.
For lag we use a &lt;a href=&quot;https://www.percona.com/doc/percona-toolkit/pt-heartbeat.html&quot;&gt;heartbeat&lt;/a&gt; mechanism, but your mileage may vary.&lt;/li&gt;
&lt;/ul&gt;
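&lt;p&gt;Schematically, the handler’s decision chain looks like this. The file names, hint directory, and lag source are all illustrative; see the linked &lt;code class=&quot;highlighter-rouge&quot;&gt;xinetd-mysql-check-lag&lt;/code&gt; script for the real implementation:&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the /check-lag decision chain: human override first,
# then role hints, then the actual replication self-check.
HINT_DIR=$(mktemp -d)   # stand-in for the host's hint-file directory

check_lag_decision() {
  lag="$1"   # a real handler would query a MySQL heartbeat table here
  # 1. Human override: a touched file wins over everything else.
  if [ -f "$HINT_DIR/force-enable" ];  then echo "200 forced in";  return; fi
  if [ -f "$HINT_DIR/force-disable" ]; then echo "404 forced out"; return; fi
  # 2. Role hint: the master and special-role servers opt out quietly.
  for role in master backup analytics; do
    if [ -f "$HINT_DIR/role-$role" ]; then echo "404 $role"; return; fi
  done
  # 3. Self-check: is replication alive and within the lag threshold?
  if [ -n "$lag" ]; then
    if [ "$lag" -le 5 ]; then echo "200 lag=${lag}s"; return; fi
  fi
  echo "503 lagging or broken"
}

check_lag_decision 2          # healthy, no hint files present
touch "$HINT_DIR/role-backup"
check_lag_decision 2          # backup role: opted out with a 404
```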

&lt;p&gt;This &lt;code class=&quot;highlighter-rouge&quot;&gt;xinetd&lt;/code&gt;/&lt;code class=&quot;highlighter-rouge&quot;&gt;shell&lt;/code&gt; implementation means we do not use persistent MySQL connections; each &lt;code class=&quot;highlighter-rouge&quot;&gt;check&lt;/code&gt; generates a new connection on the backend server. While this seems wasteful, the rate of incoming check requests is low and negligible at the scale of our busy servers. Moreover, it strengthens our trust in the system: a hogged server may be able to serve existing connections but refuse new ones, and we’re happy to catch that scenario.&lt;/p&gt;

&lt;h3 id=&quot;or-error&quot;&gt;404 or error?&lt;/h3&gt;

&lt;p&gt;Servers that just don’t want to participate send a &lt;code class=&quot;highlighter-rouge&quot;&gt;404&lt;/code&gt;, causing them to go &lt;code class=&quot;highlighter-rouge&quot;&gt;NOLB&lt;/code&gt;. Lagging, broken or dead replicas send a &lt;code class=&quot;highlighter-rouge&quot;&gt;503&lt;/code&gt;. This makes it easier on our alerting system and makes it clearer when we have a problem.&lt;/p&gt;

&lt;p&gt;One outstanding issue is that HAProxy never transitions a server directly from &lt;code class=&quot;highlighter-rouge&quot;&gt;DOWN&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;NOLB&lt;/code&gt;; its state machine requires the server to first go &lt;code class=&quot;highlighter-rouge&quot;&gt;UP&lt;/code&gt;. This is not an integrity problem, but it does cause extra alerting. We work around it by cross-checking servers and refreshing when needed. This situation is rare for us and thus of no significant concern.&lt;/p&gt;

&lt;h3 id=&quot;operations&quot;&gt;Operations&lt;/h3&gt;

&lt;p&gt;This small-building-blocks design permits simple unit testing. Control and visibility are easily gained: disabling and enabling a server is a matter of creating a file, whether placed by a human or implied by server role.&lt;/p&gt;

&lt;p&gt;These scripts integrate well within our chatops. We are able to see the exact response HAProxy sees via simple chatops commands:&lt;/p&gt;

&lt;div class=&quot;chat&quot; style=&quot;margin: 30px 0;&quot;&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .mysql xinetd my-db-0004 /check-lag
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      200 ; OK
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Or we can interfere and force backends in/out the pools:&lt;/p&gt;

&lt;div class=&quot;chat&quot; style=&quot;margin: 30px 0;&quot;&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .mysqlproxy host force-disable my-db-0004
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      Host my-db-0004 disabled by @shlomi-noach at 2016-06-30 15:22:07
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .mysqlproxy host restore my-db-0004
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      Host my-db-0004 restored to normal state
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;We have specialized monitoring for these HAProxy boxes, but we don’t wish to be notified when a single replica starts to lag. Rather, we’re interested in the bigger picture: a summary of the total errors found across the pools. This means there’s a difference between a half-empty &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt; pool and a completely empty one. In the event of problems, we get a single alert that summarizes the status across the cluster’s pools. As always, we can also check from chatops:&lt;/p&gt;

&lt;div class=&quot;chat&quot; style=&quot;margin: 30px 0;&quot;&gt;
  &lt;div class=&quot;message self&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22&quot; alt=&quot;shlomi-noach&quot; srcset=&quot;https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=22 1x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=44 2x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=66 3x, https://avatars3.githubusercontent.com/shlomi-noach?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;shlomi-noach&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
      .mysqlproxy sup myproxy-0001
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;message robot&quot;&gt;
    &lt;img class=&quot;avatar avatar-small&quot; src=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22&quot; alt=&quot;hubot&quot; srcset=&quot;https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=22 1x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=44 2x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=66 3x, https://avatars1.githubusercontent.com/hubot?v=3&amp;amp;s=88 4x&quot; width=&quot;22&quot; height=&quot;22&quot; data-proofer-ignore=&quot;true&quot; /&gt;
    &lt;b class=&quot;author&quot;&gt;Hubot&lt;/b&gt;
    &lt;div class=&quot;entry&quot;&gt;
&lt;pre&gt;&lt;code&gt;OK
mysql_ro_main OK
  3/10 servers are nolb in pool
mysql_ro_backup OK
  3/10 servers are nolb in pool
&lt;/code&gt;&lt;/pre&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;We’ve stripped our script and config files to decouple them from GitHub’s specific setup and flow. We’ve also &lt;a href=&quot;https://github.com/github/mysql-haproxy-xinetd&quot;&gt;open sourced them&lt;/a&gt; in the hope that you’ll find them useful, and that they’ll help you implement your own solution with context-aware MySQL replica pools.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;shlomi-noach&quot;, &quot;fullname&quot;=&gt;&quot;Shlomi Noach&quot;, &quot;twitter&quot;=&gt;&quot;ShlomiNoach&quot;, &quot;role&quot;=&gt;&quot;Senior Infrastructure Engineer&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;Website&quot;, &quot;url&quot;=&gt;&quot;http://openark.org&quot;}, {&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/shlomi-noach&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/ShlomiNoach&quot;}]}</name></author><summary type="html">At GitHub we use MySQL as our main datastore. While repository data lies in git, metadata is stored in MySQL. This includes Issues, Pull Requests, Comments etc. We also auth against MySQL via a custom git proxy (babeld). 
To be able to serve under the high load GitHub operates at, we use MySQL replication to scale out read load.</summary></entry><entry><title type="html">gh-ost: GitHub’s online schema migration tool for MySQL</title><link href="http://githubengineering.com/gh-ost-github-s-online-migration-tool-for-mysql/" rel="alternate" type="text/html" title="gh-ost: GitHub's online schema migration tool for MySQL" /><published>2016-08-01T00:00:00+00:00</published><updated>2016-08-01T00:00:00+00:00</updated><id>http://githubengineering.com/gh-ost-github-s-online-migration-tool-for-mysql</id><content type="html" xml:base="http://githubengineering.com/gh-ost-github-s-online-migration-tool-for-mysql/">&lt;p&gt;Today we are announcing the open source release of &lt;a href=&quot;http://github.com/github/gh-ost&quot;&gt;gh-ost&lt;/a&gt;: GitHub’s triggerless online schema migration tool for MySQL.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; has been developed at GitHub in recent months to answer a problem we faced with ongoing, continuous production changes requiring modifications to MySQL tables. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; changes the existing online table migration paradigm by providing a low-impact, controllable, auditable, operations-friendly solution.&lt;/p&gt;

&lt;p&gt;MySQL table migration is a well known problem, and has been addressed by online schema change tools since 2009. Growing, fast-paced products often require changes to database structure. Adding, changing, or removing columns and indexes are blocking operations with the default MySQL behavior. We conduct such schema changes multiple times per day and wish to minimize user-facing impact.&lt;/p&gt;

&lt;p&gt;Before illustrating &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt;, let’s address the existing solutions and the reasoning for embarking on a new tool.&lt;/p&gt;

&lt;h3 id=&quot;online-schema-migrations-existing-landscape&quot;&gt;Online schema migrations, existing landscape&lt;/h3&gt;

&lt;p&gt;Today, online schema changes are made possible via these three main options:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Migrate the schema on a replica, clone/apply on other replicas, promote refactored replica as new master&lt;/li&gt;
  &lt;li&gt;Use MySQL’s Online DDL for InnoDB&lt;/li&gt;
  &lt;li&gt;Use a schema migration tool. Most common today are &lt;a href=&quot;https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;pt-online-schema-change&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/&quot;&gt;Facebook’s OSC&lt;/a&gt;; also found are &lt;a href=&quot;https://github.com/soundcloud/lhm&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;LHM&lt;/code&gt;&lt;/a&gt; and the original &lt;a href=&quot;http://shlomi-noach.github.io/openarkkit/oak-online-alter-table.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;oak-online-alter-table&lt;/code&gt;&lt;/a&gt; tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other options include Rolling Schema Upgrade with Galera Cluster, and otherwise non-InnoDB storage engines. At GitHub we use the common master-replicas architecture and utilize the reliable InnoDB engine.&lt;/p&gt;

&lt;p&gt;Why have we decided to embark on a new solution rather than use any of the above? Each existing solution is limited in its own way, and what follows is a brief and generalized breakdown of some of their shortcomings. We will drill down more deeply into the shortcomings of trigger-based online schema change tools.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Replica migration makes for operational overhead: it requires a larger host count, longer delivery times and more complex management. Changes are applied explicitly on specific replicas or on sub-trees of the topology. Considerations such as hosts going down, hosts restored from an earlier backup, and newly provisioned hosts all require a strict tracking system for per-host changes. A change might require multiple iterations, hence more time. Promoting a replica to master incurs a brief outage. Multiple changes going on at once are more difficult to coordinate. We commonly deploy multiple schema changes per day and wish to be free of this management overhead, though we recognize this solution is in use elsewhere.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;MySQL’s Online DDL for InnoDB is only “online” on the server on which it is invoked. The replication stream serializes the &lt;code class=&quot;highlighter-rouge&quot;&gt;alter&lt;/code&gt;, which causes replication lag. An attempt to run it individually per-replica results in much of the management overhead mentioned above. The DDL is uninterruptible; killing it halfway results in a long rollback or in data dictionary corruption. It does not play “nice”; it cannot throttle or pause on high load. It is a commitment to an operation that may exhaust your resources.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We’ve been using &lt;code class=&quot;highlighter-rouge&quot;&gt;pt-online-schema-change&lt;/code&gt; for years. However, as we grew in volume and traffic, we hit more and more problems, to the point of considering many migrations “risky operations”. Some migrations could only run during off-peak hours or over weekends; others would consistently cause MySQL outages.
All existing online-schema-change tools utilize MySQL &lt;code class=&quot;highlighter-rouge&quot;&gt;triggers&lt;/code&gt; to perform the migration, and therein lie a few problems.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;whats-wrong-with-trigger-based-migrations&quot;&gt;What’s wrong with trigger-based migrations?&lt;/h3&gt;

&lt;p&gt;All online-schema-change tools operate in similar manner: they create a &lt;em&gt;ghost&lt;/em&gt; table, in the likeness of your original table, migrate that table while empty, slowly and incrementally copy data from your original table to the &lt;em&gt;ghost&lt;/em&gt; table, meanwhile propagating ongoing changes (any &lt;code class=&quot;highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;DELETE&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; applied to your table) to the &lt;em&gt;ghost&lt;/em&gt; table. When the tool is satisfied the tables are in sync, it replaces your original table with the &lt;em&gt;ghost&lt;/em&gt; table.&lt;/p&gt;

&lt;p&gt;Tools like &lt;code class=&quot;highlighter-rouge&quot;&gt;pt-online-schema-change&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;LHM&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;oak-online-alter-table&lt;/code&gt; use a synchronous approach, where each change to your table translates immediately, within the same transaction space, to a mirrored change on the &lt;em&gt;ghost&lt;/em&gt; table. The Facebook tool uses an asynchronous approach of writing changes to a changelog table, then iterating over that table and applying the changes onto the &lt;em&gt;ghost&lt;/em&gt; table. All of these tools use triggers to identify the ongoing changes to your table.&lt;/p&gt;

&lt;p&gt;Triggers are stored routines which are invoked on a per-row basis upon &lt;code class=&quot;highlighter-rouge&quot;&gt;INSERT&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;DELETE&lt;/code&gt; or &lt;code class=&quot;highlighter-rouge&quot;&gt;UPDATE&lt;/code&gt; on a table. A trigger may contain a set of queries, and these queries run in the same transaction space as the query that manipulates the table. This makes for atomicity of both the original operation on the table and the trigger-invoked operations.&lt;/p&gt;
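&lt;p&gt;For illustration only, a synchronous trigger-based tool installs triggers of roughly the following shape; the table and column names here are hypothetical, and the exact statements vary by tool:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Mirror every write on `my_table` onto the ghost table `_my_table_new`.
-- Each trigger runs inside the same transaction as the original statement.
CREATE TRIGGER my_table_ins AFTER INSERT ON my_table FOR EACH ROW
  REPLACE INTO _my_table_new (id, data) VALUES (NEW.id, NEW.data);
CREATE TRIGGER my_table_upd AFTER UPDATE ON my_table FOR EACH ROW
  REPLACE INTO _my_table_new (id, data) VALUES (NEW.id, NEW.data);
CREATE TRIGGER my_table_del AFTER DELETE ON my_table FOR EACH ROW
  DELETE FROM _my_table_new WHERE id = OLD.id;
&lt;/code&gt;&lt;/pre&gt;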

&lt;p&gt;Trigger usage in general, and trigger-based migrations in particular, suffer from the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Triggers, being stored routines, are interpreted code. MySQL does not precompile them. Hooking onto your query’s transaction space, they add the overhead of a parser and interpreter to each query acting on your migrated table.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Locks: the triggers share the same transaction space as the original queries, and while those queries compete for locks on the table, the triggers independently compete for locks on another table. This is particularly acute with the synchronous approach. Lock contention is directly related to write concurrency on the master. We have experienced near-complete lockdowns in production, to the effect of rendering the table or the entire database inaccessible due to lock contention.
Another aspect of trigger locks is the metadata locks they require when created or destroyed. We’ve seen stalls of many seconds to a minute while attempting to remove triggers from a busy table at the end of a migration.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Non pausability: when load on the master turns high, you wish to throttle or suspend your pending migration. However a trigger-based solution cannot truly do so. While it may suspend the row-copy operation, it cannot suspend the triggers. Removal of the triggers results in data loss. Thus, the triggers must keep working throughout the migration. On busy servers, we have seen that even as the online operation throttles, the master is brought down by the load of the triggers.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Concurrent migrations: we or others may be interested in being able to run multiple concurrent migrations (on different tables). Given the above trigger overhead, we are not prepared to run multiple concurrent trigger-based migrations. We are unaware of anyone doing so in practice.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Testing: we might want to experiment with a migration, or evaluate its load. Trigger-based migrations can only simulate a migration on replicas via Statement Based Replication, and are far from representing a true master migration given that the workload on a replica is single-threaded (that is always the case on a per-table basis, regardless of any multi-threaded replication technology in use).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;gh-ost&quot;&gt;gh-ost&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; stands for GitHub’s Online Schema Transmogrifier/Transfigurator/Transformer/Thingy.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;gh-ost light logo&quot; src=&quot;/images/announcing-gh-ost/gh-ost-general-flow.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Triggerless&lt;/li&gt;
  &lt;li&gt;Lightweight&lt;/li&gt;
  &lt;li&gt;Pauseable&lt;/li&gt;
  &lt;li&gt;Dynamically controllable&lt;/li&gt;
  &lt;li&gt;Auditable&lt;/li&gt;
  &lt;li&gt;Testable&lt;/li&gt;
  &lt;li&gt;Trustable&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;triggerless&quot;&gt;Triggerless&lt;/h4&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; does not use triggers. It intercepts changes to table data by tailing the binary logs. It therefore works in an asynchronous approach, applying the changes to the &lt;em&gt;ghost&lt;/em&gt; table some time after they’ve been committed.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; expects binary logs in RBR (Row Based Replication) format; however that does not mean you cannot use it to migrate a master running with SBR (Statement Based Replication). In fact, we do just that. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is happy to read binary logs from a replica that translates SBR to RBR, and it is happy to reconfigure the replica to do that.&lt;/p&gt;

&lt;h4 id=&quot;lightweight&quot;&gt;Lightweight&lt;/h4&gt;

&lt;p&gt;By not using triggers, &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; decouples the migration workload from the general master workload. It is indifferent to the concurrency and contention of queries running on the migrated table. Changes applied by such queries are streamlined and serialized in the binary log, where &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; picks them up to apply to the &lt;em&gt;ghost&lt;/em&gt; table. In fact, &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; also serializes the row-copy writes along with the binary log event writes. Thus, the master only observes a single connection sequentially writing to the &lt;em&gt;ghost&lt;/em&gt; table. This is not very different from ETLs.&lt;/p&gt;

&lt;h4 id=&quot;pauseable&quot;&gt;Pauseable&lt;/h4&gt;

&lt;p&gt;Since all writes are controlled by &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt;, and since reading the binary logs is an asynchronous operation in the first place, &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is able to suspend all writes to the master when throttling. Throttling implies no row-copy on the master &lt;em&gt;and&lt;/em&gt; no row updates. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; does create an internal tracking table and keeps writing heartbeat events to that table even when throttled, in negligible volumes.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; takes throttling one step further and offers multiple controls over throttling:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Load: a familiar feature for users of &lt;code class=&quot;highlighter-rouge&quot;&gt;pt-online-schema-change&lt;/code&gt;, one may set thresholds on MySQL metrics, such as &lt;code class=&quot;highlighter-rouge&quot;&gt;Threads_running=30&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Replication lag: &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; has a built-in heartbeat mechanism which it utilizes to examine replication lag; you may specify control replicas, or &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will implicitly use the replica you hook it to in the first place.&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Query: you may provide a query that decides whether throttling should kick in. Consider &lt;code class=&quot;highlighter-rouge&quot;&gt;SELECT HOUR(NOW()) BETWEEN 8 and 17&lt;/code&gt;.&lt;/p&gt;

    &lt;p&gt;All the above metrics can be &lt;em&gt;dynamically changed&lt;/em&gt; even while the migration is executing.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;Flag file: touch a file and &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; begins throttling. Remove the file and it resumes work.&lt;/li&gt;
  &lt;li&gt;User command: dynamically connect to &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; (see following) across the network and &lt;em&gt;instruct it&lt;/em&gt; to start throttling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;dynamically-controllable&quot;&gt;Dynamically controllable&lt;/h4&gt;

&lt;p&gt;With existing tools, when a migration generates high load, the DBA would reconfigure, say, a smaller &lt;code class=&quot;highlighter-rouge&quot;&gt;chunk-size&lt;/code&gt;, then terminate and re-run the migration from the start. We find this wasteful.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; listens for requests on a unix socket file and, if configured, on TCP. You may give &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; instructions even while a migration is running. For example, you may:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;echo throttle | socat - /tmp/gh-ost.sock&lt;/code&gt; to start throttling. Likewise you may &lt;code class=&quot;highlighter-rouge&quot;&gt;no-throttle&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Change execution parameters: &lt;code class=&quot;highlighter-rouge&quot;&gt;chunk-size=1500&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;max-lag-millis=2000&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;max-load=Threads_running=30&lt;/code&gt; are examples of instructions &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; accepts that change its behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;auditable&quot;&gt;Auditable&lt;/h4&gt;

&lt;p&gt;Likewise, the same interface can be used to ask &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; for its &lt;em&gt;status&lt;/em&gt;. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is happy to report current progress, major configuration parameters, the identity of the servers involved and more. As this information is accessible over the network, it gives great visibility into the ongoing operation that you would otherwise get only by using a shared screen or tailing log files.&lt;/p&gt;
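&lt;p&gt;For example, assuming the same socket file as in the throttling example above (the socket path is configurable), status can be queried with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;echo status | socat - /tmp/gh-ost.sock
&lt;/code&gt;&lt;/pre&gt;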

&lt;h4 id=&quot;testable&quot;&gt;Testable&lt;/h4&gt;

&lt;p&gt;Because the binary log content is decoupled from the master’s workload, applying a migration on a replica is more similar to a true master migration (though still not completely, and more work is on the roadmap).&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; comes with built-in support for testing via &lt;code class=&quot;highlighter-rouge&quot;&gt;--test-on-replica&lt;/code&gt;: it allows you to run a migration on a replica, such that at the end of the migration &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; would stop the replica, swap tables, reverse the swap, and leave you with both tables in place and in sync, replication stopped. This allows you to examine and compare the two tables at your leisure.&lt;/p&gt;

&lt;p&gt;This is how we test &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; in production at GitHub: we have multiple designated production replicas; they are not serving traffic but instead running a continuous, covering migration test on all tables. Each of our production tables, as small as empty and as large as many hundreds of GB, is migrated via a trivial statement that does not actually modify its structure (&lt;code class=&quot;highlighter-rouge&quot;&gt;engine=innodb&lt;/code&gt;). Each such migration ends with replication stopped. We take a complete checksum of the entire table data from both the original table and the &lt;em&gt;ghost&lt;/em&gt; table and expect them to be identical. We then resume replication and proceed to the next table. Every single one of our production tables is &lt;em&gt;known&lt;/em&gt; to have passed multiple successful migrations via &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt;, on replica.&lt;/p&gt;

&lt;h4 id=&quot;trustable&quot;&gt;Trustable&lt;/h4&gt;

&lt;p&gt;All the above, and more, are made to build trust in &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt;’s operation. After all, it is a new tool in a landscape that has used the same tools for years.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;We test &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; on replicas; we’ve completed thousands of successful migrations before trying it out on masters for the first time. So can you. Migrate your replicas, verify the data is intact. We want you to do that!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;As you execute &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt;, and as you may suspect load on your master is increasing, go ahead and initiate throttling. Touch a file. &lt;code class=&quot;highlighter-rouge&quot;&gt;echo throttle&lt;/code&gt;. See how the load on your master is just back to normal. By just knowing you &lt;em&gt;can&lt;/em&gt; do that, you will gain a lot of peace of mind.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;A migration begins and the ETA says it’s going to end at &lt;code class=&quot;highlighter-rouge&quot;&gt;2:00am&lt;/code&gt;? Are you concerned with the final cut-over, where the tables are swapped, and you want to stick around? You can instruct &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; to &lt;em&gt;postpone&lt;/em&gt; the cut-over using a flag file. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will complete the row-copy but will not flip the tables. Instead, it will keep applying ongoing changes, keeping the &lt;em&gt;ghost&lt;/em&gt; table in sync. As you come to the office the next day, remove the flag file or &lt;code class=&quot;highlighter-rouge&quot;&gt;echo unpostpone&lt;/code&gt; into &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt;, and the cut-over will be made. We don’t like our software to bind us into observing its behavior. It should instead liberate us to do things humans do.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Speaking of ETA, &lt;code class=&quot;highlighter-rouge&quot;&gt;--exact-rowcount&lt;/code&gt; will keep you smiling. Pay the initial price of a lengthy &lt;code class=&quot;highlighter-rouge&quot;&gt;SELECT COUNT(*)&lt;/code&gt; on your table. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will get an accurate estimate of the amount of work it needs to do. It will heuristically &lt;em&gt;update&lt;/em&gt; that estimation as migration proceeds. While ETA timing is always subject to change, progress percentage turns accurate. If, like us, you’ve been bitten by migrations stating &lt;code class=&quot;highlighter-rouge&quot;&gt;99%&lt;/code&gt; then stalling for an hour keeping you biting your fingernails, you’ll appreciate the change.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;gh-ost-operation-modes&quot;&gt;gh-ost operation modes&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; operates by connecting to potentially multiple servers, as well as connecting itself as a replica in order to stream binary log events directly from one of those servers. There are various operation modes, which depend on your setup, configuration, and where you want to run the migration.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;gh-ost operation modes&quot; src=&quot;/images/announcing-gh-ost/gh-ost-operation-modes.png&quot; /&gt;
&lt;/div&gt;

&lt;h5 id=&quot;a-connect-to-replica-migrate-on-master&quot;&gt;a. Connect to replica, migrate on master&lt;/h5&gt;

&lt;p&gt;This is the mode &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; expects by default. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will investigate the replica, crawl up to find the topology’s master, and connect to it as well. Migration will:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Read and write row-data on master&lt;/li&gt;
  &lt;li&gt;Read binary logs events on the replica, apply the changes onto the master&lt;/li&gt;
  &lt;li&gt;Investigate table format, columns &amp;amp; keys, count rows on the replica&lt;/li&gt;
  &lt;li&gt;Read internal changelog events (such as heartbeat) from the replica&lt;/li&gt;
  &lt;li&gt;Cut-over (switch tables) on the master&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your master works with SBR, this is the mode to work with. The replica must be configured with binary logs enabled (&lt;code class=&quot;highlighter-rouge&quot;&gt;log_bin&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;log_slave_updates&lt;/code&gt;) and should have &lt;code class=&quot;highlighter-rouge&quot;&gt;binlog_format=ROW&lt;/code&gt; (&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; can apply the latter for you).&lt;/p&gt;

&lt;p&gt;Even with RBR, however, we suggest this is the least master-intrusive operation mode.&lt;/p&gt;
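&lt;p&gt;A minimal invocation for this mode might look as follows; the host, schema, table and column names are hypothetical, and in practice you would add credentials and other flags. Omitting &lt;code class=&quot;highlighter-rouge&quot;&gt;--execute&lt;/code&gt; performs a dry run:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;gh-ost \
  --host=my-replica-host \
  --database=my_schema \
  --table=my_table \
  --alter=&quot;ADD COLUMN status INT UNSIGNED NOT NULL DEFAULT 0&quot; \
  --max-load=Threads_running=25 \
  --chunk-size=1000 \
  --verbose \
  --execute
&lt;/code&gt;&lt;/pre&gt;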

&lt;h5 id=&quot;b-connect-to-master&quot;&gt;b. Connect to master&lt;/h5&gt;

&lt;p&gt;If you don’t have replicas, or do not wish to use them, you are still able to operate directly on the master. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will do all operations directly on the master. You may still ask it to be considerate of replication lag.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Your master must produce binary logs in RBR format.&lt;/li&gt;
  &lt;li&gt;You must approve this mode via &lt;code class=&quot;highlighter-rouge&quot;&gt;--allow-on-master&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5 id=&quot;c-migratetest-on-replica&quot;&gt;c. Migrate/test on replica&lt;/h5&gt;

&lt;p&gt;This will perform a migration on the replica. &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will briefly connect to the master but will thereafter perform all operations on the replica without modifying anything on the master.
Throughout the operation, &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; will throttle such that the replica is up to date.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;--migrate-on-replica&lt;/code&gt; indicates to &lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; that it must migrate the table directly on the replica. It will perform the cut-over phase even while replication is running.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;--test-on-replica&lt;/code&gt; indicates the migration is for purpose of testing only. Before cut-over takes place, replication is stopped. Tables are swapped and then swapped back: your original table returns to its original place.
Both tables are left with replication stopped. You may examine the two and compare data.&lt;/li&gt;
&lt;/ul&gt;
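&lt;p&gt;A &lt;code class=&quot;highlighter-rouge&quot;&gt;--test-on-replica&lt;/code&gt; run can be sketched as follows; the replica host name is hypothetical, and the trivial &lt;code class=&quot;highlighter-rouge&quot;&gt;engine=innodb&lt;/code&gt; alter matches the continuous test flow described earlier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;gh-ost \
  --host=my-test-replica \
  --database=my_schema \
  --table=my_table \
  --alter=&quot;engine=innodb&quot; \
  --test-on-replica \
  --execute
&lt;/code&gt;&lt;/pre&gt;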

&lt;h3 id=&quot;gh-ost-at-github&quot;&gt;gh-ost at GitHub&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is now powering all of our production migrations. We’re running it daily, as engineering requests come, sometimes multiple times a day. With its auditing and control capabilities, we will be integrating it into our chatops. Our engineers will have clear insight into migration progress and will be able to control its behavior. Metrics and events are being collected and will provide clear visibility into migration operations in production.&lt;/p&gt;

&lt;h3 id=&quot;open-source&quot;&gt;Open source&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is &lt;a href=&quot;https://github.com/github/gh-ost&quot;&gt;released&lt;/a&gt; with &lt;span class=&quot;octicon octicon-heart&quot;&gt;&lt;/span&gt; to the open source community &lt;a href=&quot;https://github.com/github/gh-ost/blob/master/LICENSE&quot;&gt;under the &lt;code class=&quot;highlighter-rouge&quot;&gt;MIT&lt;/code&gt; license&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While we find it to be stable, we have improvements we want to make. We release it at this time as we wish to welcome community participation and contributions. From time to time we may publish suggestions for community contributions.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is actively maintained. We encourage you to try it out and test it; we’ve made great efforts to make it trustworthy.&lt;/p&gt;

&lt;h3 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;gh-ost&lt;/code&gt; is designed, developed, reviewed and tested by the database infrastructure engineering team at GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/jonahberquist&quot;&gt;@jonahberquist&lt;/a&gt;, &lt;a href=&quot;https://github.com/ggunson&quot;&gt;@ggunson&lt;/a&gt;, &lt;a href=&quot;https://github.com/tomkrouper&quot;&gt;@tomkrouper&lt;/a&gt;, &lt;a href=&quot;https://github.com/shlomi-noach&quot;&gt;@shlomi-noach&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We would like to acknowledge the engineers at GitHub who have provided valuable information and advice. Thank you to our friends from the MySQL community who have reviewed and commented on this project during its pre-production stages.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;shlomi-noach&quot;, &quot;fullname&quot;=&gt;&quot;Shlomi Noach&quot;, &quot;twitter&quot;=&gt;&quot;ShlomiNoach&quot;, &quot;role&quot;=&gt;&quot;Senior Infrastructure Engineer&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;Website&quot;, &quot;url&quot;=&gt;&quot;http://openark.org&quot;}, {&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/shlomi-noach&quot;}, {&quot;name&quot;=&gt;&quot;Twitter Profile&quot;, &quot;url&quot;=&gt;&quot;https://twitter.com/ShlomiNoach&quot;}]}</name></author><summary type="html">Today we are announcing the open source release of gh-ost: GitHub’s triggerless online schema migration tool for MySQL.</summary></entry><entry><title type="html">SYN Flood Mitigation with synsanity</title><link href="http://githubengineering.com/syn-flood-mitigation-with-synsanity/" rel="alternate" type="text/html" title="SYN Flood Mitigation with synsanity" /><published>2016-07-12T00:00:00+00:00</published><updated>2016-07-12T00:00:00+00:00</updated><id>http://githubengineering.com/syn-flood-mitigation-with-synsanity</id><content type="html" xml:base="http://githubengineering.com/syn-flood-mitigation-with-synsanity/">&lt;p&gt;GitHub hosts a wide range of user content, and like all large websites this often causes us to become a target of denial of service attacks. Around a year ago, GitHub was on the receiving end of a large, unusual and very well publicised attack involving both application level and volumetric attacks against our infrastructure.&lt;/p&gt;

&lt;p&gt;Our users rely on us to be highly available, and we take this seriously. Although the attackers are doing the wrong thing, there’s no use blaming them for their attacks being successful. Our commitment is to own our own availability: we have a responsibility to mitigate these sorts of attacks to the maximum extent technically possible.&lt;/p&gt;

&lt;p&gt;In an effort to reduce the impact of these attacks, we began work on a series of additional mitigation strategies and systems to better prepare us for a future attack of a similar nature. Today we’re sharing our mitigation for one of the attacks we received: synsanity, a SYN flood DDoS mitigation module for Linux 3.x.&lt;/p&gt;

&lt;h2 id=&quot;what-is-a-syn-flood-anyway&quot;&gt;What is a SYN flood anyway?&lt;/h2&gt;

&lt;p&gt;SYN floods are one of the oldest and most common attacks, so common that the Linux kernel includes some built in support for mitigating them. When a client connects to a server using TCP, it uses the &lt;a href=&quot;https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_establishment&quot;&gt;three-way handshake&lt;/a&gt; to synchronise:&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;TCP Three-way Handshake&quot; src=&quot;/images/syn-flood-mitigation-with-synsanity/tcp-3whs.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;A SYN packet is essentially the client telling the server “I’d like to connect”. During this handshake, both client and server generate random Initial Sequence Numbers (ISNs), which are used to synchronise the TCP connection between the two parties. These sequence numbers let TCP keep track of which messages have been sent and acknowledged by the other party.&lt;/p&gt;

&lt;p&gt;A SYN flood abuses this handshake by only going part way through the handshake. Rather than progressing through the normal sequence, an attacker floods the target server with as many SYN packets as they can muster, from as many different hosts as they can, and spoofing the origin IP as much as they can.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;SYN Flood&quot; src=&quot;/images/syn-flood-mitigation-with-synsanity/syn-flood.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;The host receiving the SYN flood must respond to each and every packet with a SYN-ACK, but since the source IP was likely spoofed, those SYN-ACKs go nowhere (or worse, come back as rejected). These packets are almost indistinguishable from real SYN packets from real clients, which makes it hard or impossible to filter out the bad ones on the server. Even external DDoS scrubbing services can only guess whether a packet is legitimate or part of a flood, making it difficult to mitigate an attack without impacting legitimate traffic.&lt;/p&gt;

&lt;p&gt;To make matters worse, when the server is handling normal connections and receives the ACK from a real client, it still needs to know that it came from a SYN packet it sent, so it must also keep a list of connections (in state &lt;code class=&quot;highlighter-rouge&quot;&gt;SYN_RECV&lt;/code&gt;) for which a SYN has been received and an ACK has not yet been received.&lt;/p&gt;

&lt;p&gt;During a SYN flood, this behaviour is undesirable. If the queue of connections in &lt;code class=&quot;highlighter-rouge&quot;&gt;SYN_RECV&lt;/code&gt; has no size limit, memory will get exhausted pretty quickly. If it does have a size limit, as is the case in Linux, then there’s no more space to store state and the connections will simply fail as the packets are dropped.&lt;/p&gt;
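&lt;p&gt;On Linux, both the queue limit and the fallback behaviour are visible via &lt;code class=&quot;highlighter-rouge&quot;&gt;sysctl&lt;/code&gt;; a quick way to inspect them (the values you see will vary by distribution and tuning):&lt;/p&gt;

```shell
# Size limit of the queue of connections in SYN_RECV state:
sysctl net.ipv4.tcp_max_syn_backlog

# Whether the kernel falls back to SYN cookies when that queue overflows
# (1 is the common default):
sysctl net.ipv4.tcp_syncookies

# Count connections currently stuck in SYN_RECV, e.g. during a flood:
ss -n state syn-recv | wc -l
```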

&lt;h2 id=&quot;syn-cookies&quot;&gt;SYN cookies&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/SYN_cookies&quot;&gt;SYN cookies&lt;/a&gt; are a clever way of avoiding the storage of TCP connection state during the initial handshake, deferring that storage until a valid ACK has been received. The server crafts the Initial Sequence Number (ISN) of its SYN-ACK packet as a cryptographic hash of details about the initial SYN packet and its TCP options. When the ACK is received (with an acknowledgment number 1 larger than the ISN), the server can validate that it generated the SYN-ACK packet for which an ACK is now being received. The server stores no state for the connection until the ACK (containing the validated SYN cookie) arrives, and only at that point is state regenerated and stored.&lt;/p&gt;
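&lt;p&gt;The scheme can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not the kernel’s actual algorithm (which additionally encodes the MSS and a slowly-incrementing timestamp into the cookie):&lt;/p&gt;

```python
import hashlib
import os

# Simplified illustration of the SYN cookie idea -- NOT the kernel's actual
# algorithm. The server encodes connection details into the ISN it sends,
# so it can later verify the ACK without having stored any state.

SECRET = os.urandom(16)  # known only to the server

def make_syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn):
    """Craft the server's ISN for the SYN-ACK as a keyed hash of the 4-tuple."""
    data = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{client_isn}".encode()
    digest = hashlib.sha256(SECRET + data).digest()
    return int.from_bytes(digest[:4], "big")  # a 32-bit TCP sequence number

def ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number):
    """The ACK must acknowledge our ISN + 1; recompute the cookie to check."""
    expected = (make_syn_cookie(src_ip, src_port, dst_ip, dst_port, client_isn) + 1) % 2**32
    return ack_number == expected

# The server sends a SYN-ACK carrying this ISN, storing nothing:
cookie = make_syn_cookie("198.51.100.7", 54321, "203.0.113.1", 443, 1000)
# A legitimate client ACKs with cookie + 1, which validates:
assert ack_is_valid("198.51.100.7", 54321, "203.0.113.1", 443, 1000, (cookie + 1) % 2**32)
# An ACK with any other acknowledgment number does not:
assert not ack_is_valid("198.51.100.7", 54321, "203.0.113.1", 443, 1000, (cookie + 2) % 2**32)
```

&lt;p&gt;Because only the server holds the secret, only the server can mint a valid cookie, which is what lets it discard all per-connection state until a verifiable ACK shows up.&lt;/p&gt;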

&lt;p&gt;Since this hash is calculated with a secret that only the server knows, it doesn’t significantly weaken the sequence number selection and it’s still difficult for someone to forge an ACK (or other packet) for a different connection without having seen the SYN-ACK from the real server.&lt;/p&gt;

&lt;p&gt;SYN cookies have been around for a while, and they have fairly minimal impact on the reliability and spoof-protection of TCP. Rather than enabling them constantly, the Linux kernel by default automatically enables SYN cookies only when the SYN receive queue is full. This means that under normal circumstances when no SYN flood is occurring, you get no impact at all, but during a SYN flood, you accept the minimal impact of SYN cookies (in return for not dropping connections). The extra CPU cost of creating SYN cookies is offset by the fact that you no longer have a limited resource, and in practice this is an excellent trade-off.&lt;/p&gt;

&lt;p&gt;In Linux 3.x, SYN cookies are generated under a machine-wide lock on the LISTEN socket that the packet was destined for. This implementation causes all SYN cookies to be generated serially across all cores, defeating the benefits of a multi-processor system. To make matters worse, all cores spin waiting for the lock to become available. This was fine back in the days when an average attacker could only send a few Mbit/s of SYN packets your way, mostly because networks were much slower. These days, however, with servers attached to transit providers over multiple 10Gb/s+ links the whole way down the line, it’s now possible to completely saturate CPU resources.&lt;/p&gt;

&lt;p&gt;While Linux 4.x has a patch to send SYN cookies under a per-CPU-core socket lock, which does fix the problem, we wanted a solution that allowed us to use an existing, maintained kernel with upstream security patches. We didn’t want to roll and maintain an entire custom kernel, plus all related future security patches, just to mitigate this form of attack. Backporting the socket lock change to Linux 3.x posed a similar maintenance burden that we wanted to avoid.&lt;/p&gt;

&lt;h2 id=&quot;synproxy&quot;&gt;SYNPROXY&lt;/h2&gt;

&lt;p&gt;One solution to get the best of both worlds was the SYNPROXY iptables module. It sits in &lt;a href=&quot;http://www.netfilter.org/&quot;&gt;netfilter&lt;/a&gt; in the kernel, before the Linux TCP stack, and as the name suggests, proxies all connections while generating SYN cookies. When a SYN packet comes in, it responds with a SYN-ACK and throws away all state. On receipt of a valid ACK packet matching the SYN cookie, it then sends a SYN downstream and completes the usual TCP handshake. For every subsequent packet in each direction, it modifies the sequence numbers so that it is transparent to both sides.&lt;/p&gt;
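&lt;p&gt;For reference, a representative SYNPROXY ruleset looks something like the following (the port and TCP options here are illustrative, drawn from the module’s common usage):&lt;/p&gt;

```shell
# Exempt incoming SYNs on port 443 from connection tracking, so the kernel
# stores no conntrack state for half-open handshakes:
iptables -t raw -A PREROUTING -p tcp --dport 443 --syn -j CT --notrack

# Hand untracked/invalid TCP packets to SYNPROXY, which answers with
# SYN-cookie SYN-ACKs and only opens a real backend connection on a valid ACK:
iptables -A INPUT -p tcp --dport 443 -m conntrack --ctstate INVALID,UNTRACKED \
  -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460

# Drop whatever SYNPROXY did not validate:
iptables -A INPUT -p tcp --dport 443 -m conntrack --ctstate INVALID -j DROP
```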

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;SYNPROXY packet flow&quot; src=&quot;/images/syn-flood-mitigation-with-synsanity/synproxy.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;This is quite an intrusive way of solving the problem since it touches every packet during the entire connection, but it does successfully mitigate SYN floods. Unfortunately we found that in practice, under our load and with the volume of malformed packets we receive, it quickly broke down and caused a kernel panic. Additionally, it had to be enabled all the time, since there was no simple way to activate it only when under attack. This meant that we would have to accept the minimal impact of SYN cookies constantly, and at our scale this would still likely cause issues for some of our users.&lt;/p&gt;

&lt;p&gt;We decided that it was more complicated than it needed to be for our use case, and we wanted a simpler solution that would only touch the packets that needed to be touched to mitigate a SYN flood. We also decided that a mitigation should only cause potential (even if minimal) impact during mitigation, and not under normal operation.&lt;/p&gt;

&lt;h2 id=&quot;synsanity&quot;&gt;synsanity&lt;/h2&gt;

&lt;p&gt;Enter synsanity, our solution to mitigate SYN floods on Linux 3.x. synsanity is inspired by SYNPROXY, in that it is an iptables module that sits between the Linux TCP stack and the network card. The major difference is that rather than touching all packets, synsanity simply generates a SYN cookie exactly as the Linux kernel would if the SYN queue were full; once it validates the ACK packet, it allows it through to the standard Linux SYN cookie code, which creates and completes the connection. After this point, synsanity doesn’t touch any further packets in the TCP connection.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;synsanity packet flow&quot; src=&quot;/images/syn-flood-mitigation-with-synsanity/synsanity.png&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;Just as Linux only enables SYN cookies when the SYN queue overflows, we only enable synsanity when the SYN queue overflows. We match the core Linux code exactly, except that we do it in an iptables module, outside the LISTEN lock. Since an iptables module can be compiled and maintained outside the Linux kernel source tree, we don’t need to use a custom Linux kernel, and can instead just maintain and deploy a single module to our servers.&lt;/p&gt;

&lt;p&gt;synsanity has allowed us to mitigate multiple attacks that would have previously caused a partial or complete service outage, both long running attacks and large volume attacks.&lt;/p&gt;

&lt;div style=&quot;text-align:center&quot;&gt;
&lt;img alt=&quot;synsanity syncookie graph&quot; src=&quot;/images/syn-flood-mitigation-with-synsanity/graph-300kpps-syn-flood.png&quot; /&gt;
synsanity sending SYN cookies during a 300kpps SYN flood
&lt;/div&gt;

&lt;h2 id=&quot;open-source&quot;&gt;Open Source&lt;/h2&gt;

&lt;p&gt;We believe that if you need to hide your mitigation to keep it secure, it’s not designed well enough. The best and most secure tools are shared, open and subject to community scrutiny, so today we’re open sourcing &lt;a href=&quot;https://github.com/github/synsanity&quot;&gt;synsanity&lt;/a&gt; so that everyone can benefit from this work.&lt;/p&gt;</content><author><name>{&quot;username&quot;=&gt;&quot;theojulienne&quot;, &quot;fullname&quot;=&gt;&quot;Theo Julienne&quot;, &quot;role&quot;=&gt;&quot;Infrastructure Engineering Manager&quot;, &quot;twitter&quot;=&gt;&quot;theojulienne&quot;, &quot;links&quot;=&gt;[{&quot;name&quot;=&gt;&quot;GitHub Profile&quot;, &quot;url&quot;=&gt;&quot;https://github.com/theojulienne&quot;}]}</name></author><summary type="html">GitHub hosts a wide range of user content, and like all large websites this often causes us to become a target of denial of service attacks. Around a year ago, GitHub was on the receiving end of a large, unusual and very well publicised attack involving both application level and volumetric attacks against our infrastructure.</summary></entry></feed>
