Investigating reports of trouble with outgoing calls (Hosted Platform)

Resolved
After 9 hours and 31 minutes

Our Hosted Platform had a bad moment Tuesday, and we wanted to share a review of the trouble, its resolution, and how we’re working to prevent it from happening again. This incident was limited to our Hosted Platform users, and all other customers and services for Simple Phone and Hosted Phone Server were unaffected.

The simple explanation: a bug in our server software caused an issue connecting calls. We identified the software bug and resolved the trouble. For a detailed explanation, see below.

Here’s what happened:

Just before 10am (PST) this morning we began receiving reports from users about intermittent outgoing calling errors. While troubleshooting, we encountered an issue that prevented phones from registering and connecting calls for a short period of time.

The issue was identified as a bug in our core operating system software stack that caused our “Call Manager” to freeze. The “Call Manager” is like the crossing guard for the entire Hosted Platform network, balancing calls across all of our servers in data centers nationwide. By design, we can remove any affected system from the network during planned or unplanned maintenance or data center outages. We restarted a few servers during that troubleshooting process, which introduced the bug on additional systems and caused a larger, system-wide issue.

Once we identified the bug in the software stack and rolled it back, the network stabilized and the “Call Manager” worked as designed.

Here’s what we’re doing differently:

We’re going to review our software and system upgrade process from top to bottom, and develop better testing procedures before releasing updates publicly. Oddly, this bug didn’t cause trouble until today, even though our last update was about 2 weeks ago. A more detailed review of each update applied to all systems, including package update details for the OS, will help us prevent this from happening again.

We apologize to our affected customers, and will continue to work hard on building not just a reliable phone service, but also one that serves our customers better in times like this. Today's trouble, no matter the issue, is an opportunity to "strengthen our code," and "strengthen our code" is what we will do.

After 3 hours and 57 minutes

Our team is reporting a full restoration of the impacted services. We will be conducting a full review and issuing an RFO report for our customers shortly.

After 2 hours and 58 minutes

The team is still working to resolve the remaining issues. We've identified a cause for the trouble, and are still working to bring things back online. Our developers and engineers are working full steam ahead on restoring services. If you would like to forward your calls to a different number, please feel free to contact Customer Support at 844-4-Simple anytime.

After 2 hours and 31 minutes

Some of the affected services have returned online, and we are working to restore the remaining features and services. More updates to follow soon.

After 1 hour and 42 minutes

It looks like it is taking a little more work to get everything back online. We will continue to provide updates.

After 1 hour and 12 minutes

The team is currently working to resolve this and we will follow up once the trouble has cleared.

After 23 minutes

Our engineers have identified a few areas that are potentially impacting services, and are currently troubleshooting. We will follow up with more information shortly.


Our team is currently investigating reports of trouble with outbound calls. Our engineers are reviewing these reports and will provide updates shortly.

Began at: