Amazon cloud outage caused by human error

Amazon has released a statement addressing last week's Amazon Web Services S3 outage, which disrupted a number of websites on the afternoon of Feb. 28.

The AWS S3 service disruption, which affected cloud computing services in the Northern Virginia region, occurred while Amazon's S3 team was working on an issue in its S3 billing system. An authorized S3 team member incorrectly executed a command meant to remove servers from one of the S3 subsystems used by the billing process, and more servers were removed than intended.

"The servers that were inadvertently removed supported two other S3 subsystems," according to the statement. As a result, each of these subsystems required a full restart.

Since last week, Amazon has made a few changes to its protocols, including improving the recovery time of S3 subsystems and modifying the tools it uses to remove servers.

"While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly," according to the statement. "We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level."
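The two safeguards described in the statement, removing capacity more slowly and refusing any removal that would take a subsystem below its minimum, can be illustrated with a short sketch. This is not Amazon's tool: the names `Subsystem`, `plan_removal`, `max_per_batch` and `CapacityError` are hypothetical, and the numbers are invented for illustration.

```python
from dataclasses import dataclass


class CapacityError(Exception):
    """Raised when a removal would leave a subsystem under its minimum."""


@dataclass
class Subsystem:
    # Hypothetical model of a subsystem; field names are assumptions.
    name: str
    active_servers: int
    min_required: int


def plan_removal(subsystem, requested, max_per_batch):
    """Return the batch sizes for removing `requested` servers.

    Refuses the whole request if it would take the subsystem below its
    minimum required capacity, and otherwise splits the removal into
    batches of at most `max_per_batch` servers so it proceeds slowly.
    """
    if requested < 0 or max_per_batch <= 0:
        raise ValueError("requested must be >= 0 and max_per_batch > 0")

    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise CapacityError(
            f"{subsystem.name}: removing {requested} servers would leave "
            f"{remaining}, below the minimum of {subsystem.min_required}"
        )

    batches = []
    left = requested
    while left > 0:
        step = min(left, max_per_batch)
        batches.append(step)
        left -= step
    return batches


if __name__ == "__main__":
    billing = Subsystem(name="billing", active_servers=100, min_required=80)
    print(plan_removal(billing, requested=15, max_per_batch=4))
    try:
        plan_removal(billing, requested=30, max_per_batch=4)
    except CapacityError as err:
        print("refused:", err)
```

In this sketch, a mistyped request for too many servers is rejected outright instead of being carried out, and a valid request is spread over several small batches that an operator could pause between.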



© Copyright ASC COMMUNICATIONS 2017.

 
