
Reply to "Tehcnical issues log in - Why was this not on forums? but on facebook & twitter???"

Just an example of how Etsy handled their issues:

from: Etsy Explains Causes of Its Recent Site Outages
By Ina Steiner
EcommerceBytes.com
August 21, 2012


Etsy had several unrelated site outages since July 30, and in a forum post on Monday, John Allspaw, Senior VP of Technical Operations at Etsy, explained what happened, how the company handled the outages, and what it's doing to help prevent them in the future. Allspaw explained the nature of the outages in great detail.

In the case of the July 30th outage, Etsy said it had been upgrading its databases to make two improvements to the site. One upgrade was made to allow Etsy to add support for new languages - it has already added German, French, Italian, Dutch and Japanese in addition to English, but plans to support additional languages in the future.

The other upgrade was an improvement to how Etsy's databases did nightly backups, "because they were slowing the site down when they ran."

Etsy said, "We expected to make those improvements separately. Instead, they were accidentally made at the same time. In order to confirm that there was no data loss or corruption during the accidental upgrade, we took the site down while we verified everything was in order, which it was. We brought the site back up." That was the "short version" of what happened.

In "the longer version," Etsy explained that the engineer who deployed the fix for the backup didn't know that it would be coupled with the database upgrade. "When we detected this was happening, we disabled the site in order to make sure we weren't going to corrupt or lose any data, and manage the upgrade more gracefully."

Etsy then explained it was making changes to the way it does large upgrades and would be bolstering its automated tools to make it clearer to the engineers what is being deployed.

On August 10, Etsy had another outage. The space it set aside for creating unique ID numbers for the various elements on Etsy.com (such as shops, listings, treasuries, etc.) was not large enough.

"When a new member registers, or a new shop opens, or a new listing is uploaded, we go to a special set of servers to get a new unique ID number for it. The job of those servers is to make sure that no IDs get reused for the same thing. For example, we don't want two shops to get the same ID number, because if they did, we couldn't be sure which one we should show listings from when a buyer wants to browse one of those shops."

Etsy needs to tell its servers in what range the numbers will be so they can set aside space and memory for them. When it realized the space it had set aside for some of the ID numbers wasn't large enough, it took the site down. "After confirming all was okay, we brought the site back online again, and began proactively looking for and enlarging ranges that might overflow in the future."
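
To make the ID-range problem concrete, here is a minimal sketch of an ID allocator with a fixed range. It is not Etsy's implementation; the names IdAllocator, IdRangeExhausted and ranges_near_overflow are hypothetical, and the single in-memory counter stands in for the "special set of servers" described above. It shows two ideas from the article: refusing to hand out an ID once the range is used up, and checking ahead of time for ranges that are close to overflowing.

```python
class IdRangeExhausted(Exception):
    """Raised when an allocator has handed out every ID in its range."""


class IdAllocator:
    """Hands out unique, increasing IDs for one kind of object (hypothetical)."""

    def __init__(self, kind, max_id):
        self.kind = kind        # e.g. "shop", "listing", "treasury"
        self.max_id = max_id    # largest ID the reserved range can hold
        self.next_id = 1

    def allocate(self):
        # Fail loudly instead of wrapping around and reusing an ID.
        if self.next_id > self.max_id:
            raise IdRangeExhausted(f"no IDs left for {self.kind}")
        new_id = self.next_id
        self.next_id += 1
        return new_id

    def headroom(self):
        """Number of IDs that can still be allocated."""
        return self.max_id - self.next_id + 1


def ranges_near_overflow(allocators, min_free_fraction=0.1):
    """Return the kinds whose remaining share of the range is below the threshold."""
    return [
        a.kind
        for a in allocators
        if a.headroom() / a.max_id < min_free_fraction
    ]


if __name__ == "__main__":
    shops = IdAllocator("shop", max_id=3)
    print([shops.allocate() for _ in range(3)])
    print(ranges_near_overflow([shops]))
```

Raising an error at the edge of the range is the design choice here: a failed request is visible and recoverable, while a reused ID silently mixes up two shops.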

However, it left treasuries and parts of activity feeds disabled. "Being able to disable some features is one of the things we do, precisely for situations like this. We don't want to prevent shoppers from buying items just because the Treasury and Activity Feed weren't behaving correctly, so we brought the site up without them."
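
The ability to switch individual features off is usually called a feature flag. The sketch below is only an illustration of that idea, not Etsy's tooling; FeatureFlags, render_home_page and the section names are all made up for the example. It shows how a page can keep its core parts (listings, checkout) while skipping the features that have been disabled.

```python
class FeatureFlags:
    """Tracks which optional features are currently switched on (hypothetical)."""

    def __init__(self, enabled):
        self._enabled = set(enabled)

    def disable(self, name):
        # discard() does nothing if the feature is already off.
        self._enabled.discard(name)

    def is_enabled(self, name):
        return name in self._enabled


def render_home_page(flags):
    """Return the page sections to show; core sections are always included."""
    sections = ["listings", "checkout"]
    for optional in ("treasury", "activity_feed"):
        if flags.is_enabled(optional):
            sections.append(optional)
    return sections


if __name__ == "__main__":
    flags = FeatureFlags({"treasury", "activity_feed"})
    print(render_home_page(flags))

    # During an incident, turn off the misbehaving features only.
    flags.disable("treasury")
    flags.disable("activity_feed")
    print(render_home_page(flags))
```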

Etsy is making some changes to help prevent such incidents from happening in the future, including changes to its database - "We would rather a member get an error message than allow for data to be corrupted."

Etsy then went on to describe "Other Outages and Degradations," such as one on August 18.

Etsy said when there are unexpected outages, it is presented with a decision:

1. Keep the site online and risk either being too slow to be usable or taking in bad data.
2. Take the site offline in order to fix the slowness and verify that the data is correct.

"In each of the cases, we decided to take option #2, because it's safer for the community."

Allspaw wrapped up the explanation with these words:

"I wrote this blog post to give you the confidence you deserve that we take outages seriously, are willing to give detailed information about them, and that our aim is to learn from each one in order to lessen the possibility of another in the future."