[Core Network] Service Alert 22nd May 2018 - Update 1
Posted by David Croft on 11 June 2018 08:33 AM
This is an incident notification regarding an outage on our core network.

Date: Tuesday, 22nd May 2018
Start time: 14:46 BST
End time: 16:44 BST (Final clear)

Services affected:

Intermittent partial and total outage across our IP network

Report:

A failure of a network device in our core IP network led to a cascade
of additional failures across the network.

Controlled shutdowns of portions of our network were necessary to
bring it back to a stable state in order to fully restore service.

Both the failure and the controlled shutdowns caused packet loss and
temporary routing failures to services hosted on or delivered through
our network.

Root Cause Analysis:

Memory exhaustion on an edge transit router caused it to restart its
BGP process, and the consequent withdrawal and re-announcement of all
routes from BGP caused other devices on the network to suffer similar
failures in a cascading fashion.

Next Steps:

We are bringing forward our upcoming planned network upgrades, which
will now take place in July. Emergency maintenance sessions will be
announced once lab testing has been completed.

In the meantime, we will continue to observe the change freeze on our
network to prevent a further incident.

Regards,

David Croft
