Google suffers Sunday outage, impacts Cloud, YouTube, Gmail and more

Google Cloud experienced widespread issues on Sunday, June 2, impacting the search giant's own services as well as those of its cloud customers.
The intermittent outage, which has since been resolved, was blamed on "high levels of network congestion."
Networking not working
Google services like YouTube, Nest and Gmail, as well as Cloud customers like Snapchat, Shopify, Vimeo and Discord, were impacted by the problem, which began around 12:15 Pacific time.
"We are experiencing high levels of network congestion in the eastern USA, affecting multiple services in Google Cloud, G Suite and YouTube. Users may see slow performance or intermittent errors," the company said on its status page at the time.
While the congestion occurred in the US, its impact was felt globally and was described by network monitoring company ThousandEyes as a "large scale" outage.
The problem was resolved as of 4:00 pm Pacific time, with Google promising to "conduct an internal investigation of this issue and make appropriate improvements to our systems to help prevent or minimize future recurrence. We will provide a detailed report of this incident once we have completed our internal investigation. This detailed report will contain information regarding SLA credits."
In a statement, the company apologized for the inconvenience and thanked customers for their "patience and continued support." It added: "Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better."
The outage, which among other things meant that Nest users could not control their thermostats, comes after several major disruptions at the largest cloud companies in recent years, highlighting the difficulty of building a resilient service, even with enormous resources.
Last year alone, Google Cloud went down due to a BGP error, Microsoft Azure was knocked out by a lightning strike, and Amazon Web Services was disrupted by a “power event.”
Update: Google's VP of 24x7, Benjamin Treynor Sloss, said in a blog post: "In essence, the root cause of Sunday’s disruption was a configuration change that was intended for a small number of servers in a single region. The configuration was incorrectly applied to a larger number of servers across several neighbouring regions, and it caused those regions to stop using more than half of their available network capacity. The network traffic to/from those regions then tried to fit into the remaining network capacity, but it did not. The network became congested, and our networking systems correctly triaged the traffic overload and dropped larger, less latency-sensitive traffic in order to preserve smaller latency-sensitive traffic flows, much as urgent packages may be couriered by bicycle through even the worst traffic jam.
"Google’s engineering teams detected the issue within seconds, but diagnosis and correction took far longer than our target of a few minutes. Once alerted, engineering teams quickly identified the cause of the network congestion, but the same network congestion which was creating service degradation also slowed the engineering teams’ ability to restore the correct configurations, prolonging the outage. The Google teams were keenly aware that every minute which passed represented another minute of user impact, and brought on additional help to parallelize restoration efforts."
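The triage Sloss describes, in which latency-sensitive traffic is kept and larger, less urgent traffic is dropped once capacity shrinks, can be illustrated with a small sketch. This is not Google's implementation: the `Flow` type, the `triage` function, the flow names and the bandwidth figures are all hypothetical, and real networks make these decisions per packet in switch hardware rather than per flow in Python.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """A hypothetical traffic flow; all fields are illustrative."""
    name: str
    size_gbps: int
    latency_sensitive: bool


def triage(flows, capacity_gbps):
    """Admit flows until capacity runs out; return (admitted, dropped).

    Latency-sensitive flows are considered first, and within each
    class smaller flows are considered before larger ones.
    """
    ordered = sorted(flows, key=lambda f: (not f.latency_sensitive, f.size_gbps))
    admitted, dropped = [], []
    remaining = capacity_gbps
    for flow in ordered:
        if flow.size_gbps <= remaining:
            admitted.append(flow)
            remaining -= flow.size_gbps
        else:
            dropped.append(flow)
    return admitted, dropped


# Assumed scenario: 12 Gbps of demand, with capacity cut from 12 to 4 Gbps.
flows = [
    Flow("interactive-rpc", 1, True),
    Flow("video-control", 2, True),
    Flow("bulk-backup", 6, False),
    Flow("log-shipping", 3, False),
]
admitted, dropped = triage(flows, capacity_gbps=4)
print([f.name for f in admitted])  # latency-sensitive flows that still fit
print([f.name for f in dropped])   # larger, less urgent flows shed
```

In this assumed scenario the two small latency-sensitive flows fit into the reduced capacity, while the bulk transfers are shed until capacity is restored.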