DETAILED EXPLANATION OF THE FACEBOOK, INSTAGRAM, AND WHATSAPP OUTAGE

Monday's outage of the three services originated in the backbone network, which cut off access to them.

07.10.2021
BY A. NUGROHO

The outage of Facebook, Inc.'s services, including Facebook, WhatsApp, and Instagram, on Monday (4/10) left many users puzzled. All three were inaccessible for approximately 8 hours and only recovered on Tuesday (5/10) morning.

Facebook has experienced problems several times before, but this outage was the worst so far. What really happened? Facebook, Inc., as the parent company, has provided an explanation.

In an official Facebook blog post, VP of Infrastructure Santosh Janardhan explained that Monday's disruption was caused by an unintentional internal error. According to Janardhan, the problem originated in the backbone network during routine maintenance on Facebook's infrastructure. It then set off a domino effect that made repairs complicated and slow.

To understand Monday's mass disruption of Facebook, WhatsApp, and Instagram, it helps to know that the three services run on the same backbone network. This network was built to connect all of Facebook, Inc.'s computing facilities.

What does this network physically look like? It consists of tens of thousands of kilometres of fibre-optic cable that stretch across the globe and connect all of Facebook's data centres. Some of these data centres are large buildings housing the computers that store data and carry out the processing needed to run all of Facebook's services.

Other data centres are smaller facilities that connect the Facebook backbone to the wider internet and to users of Facebook, Inc.'s platforms. The data exchanges requested by users of the three services worldwide take place over this backbone network and in these data centres.

For example, when you refresh your Instagram feed, the request travels from your phone to the nearest Facebook facility. From there, it is carried over the backbone to a larger data centre.

Data traffic between all of Facebook's computing facilities is managed by backbone routers that figure out where to send incoming and outgoing data.
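
To make the routers' job more concrete, here is a minimal sketch of longest-prefix matching, the standard way IP routers in general decide where to forward a packet. The source does not describe Facebook's actual router configuration: the prefixes and next-hop names below (`backbone-core`, `datacenter-A`, and so on) are invented for illustration, and `next_hop` is a hypothetical helper.

```python
import ipaddress
from typing import Optional

# Hypothetical forwarding table (prefix -> next hop). These prefixes and
# names are illustrative only; they are not Facebook's real routes.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "backbone-core",
    ipaddress.ip_network("10.20.0.0/16"): "datacenter-A",
    ipaddress.ip_network("10.20.5.0/24"): "datacenter-A-cluster-5",
    ipaddress.ip_network("0.0.0.0/0"): "internet-edge",
}


def next_hop(dst: str, routes=ROUTES) -> Optional[str]:
    """Pick the next hop for dst: the most specific (longest) matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for net, hop in routes.items():
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop  # None if no prefix matches at all


if __name__ == "__main__":
    for dst in ["10.20.5.7", "10.20.9.1", "10.99.0.1", "8.8.8.8"]:
        print(dst, "->", next_hop(dst))
```

An address such as 10.20.5.7 matches three prefixes, and the /24 wins because it is the most specific; 8.8.8.8 matches only the default route 0.0.0.0/0, so it is sent towards the internet edge.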

"There, the information the user requested is retrieved and processed. The data is then sent back over the backbone network to the user's phone," said Janardhan.

When the exchange succeeds, the user's request is fulfilled: refreshing the Instagram feed, for example, displays the latest posts from the accounts the user follows. The backbone and data centres are therefore the most essential parts of what makes Facebook, Instagram, and WhatsApp accessible to users. You can imagine that when this system is disrupted, the impact is massive.

Facebook often performs routine maintenance on its infrastructure. To carry out this work, Facebook technicians sometimes need to take part of the backbone network offline. The maintenance might involve repairing a cable line, adding capacity, updating software, or other tasks.

During Monday's incident, a command was issued to assess the available capacity of the global backbone network. Instead, the command unintentionally cut all the connections in Facebook's backbone. Facebook's data centres around the world went offline because they were no longer connected to each other.

Facebook's systems are designed to audit commands like this to prevent such mistakes. However, a bug in the audit tool meant that it failed to detect the problem and stop the command from running.
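
The source does not say how Facebook's audit tool works. As a rough illustration only, one kind of pre-execution check could simulate a command's effect on the backbone and refuse it if any data centre would be cut off from the rest. The functions `is_connected` and `audit_command` and the four-node ring topology below are hypothetical.

```python
from collections import deque


def is_connected(nodes, links):
    """Return True if every node can reach every other node over the links."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from an arbitrary starting node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


def audit_command(nodes, links, links_to_disable):
    """Hypothetical pre-check: allow the command only if the backbone stays connected."""
    # Links are undirected, so compare them as unordered pairs.
    disabled = {frozenset(link) for link in links_to_disable}
    remaining = [link for link in links if frozenset(link) not in disabled]
    return is_connected(nodes, remaining)


if __name__ == "__main__":
    data_centres = ["DC1", "DC2", "DC3", "DC4"]
    ring = [("DC1", "DC2"), ("DC2", "DC3"), ("DC3", "DC4"), ("DC4", "DC1")]
    print(audit_command(data_centres, ring, [("DC1", "DC2")]))  # True: ring still connected
    print(audit_command(data_centres, ring, ring))              # False: every link cut
```

A check like this only helps if it runs and is itself correct; the point of the incident is that Facebook's real audit had a bug and let the command through.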

"This caused the server connection between our data centers and the internet network to be completely cut off. Complete loss of connection creates a second problem that makes matters worse," Janardhan said.

The next problem involved Facebook's Domain Name System (DNS) servers, which became unreachable because the backbone behind them was disconnected. One of the jobs of Facebook's smaller data centre facilities is to respond to DNS queries from users.

DNS translates the name of a host or website into the IP address that computers use to reach it. For the rest of the internet to reach Facebook's DNS servers, however, the routes to those servers' own addresses must be announced through the Border Gateway Protocol (BGP).

BGP is the protocol networks use to exchange routing information: it tells the rest of the internet which paths lead to which IP addresses, so that traffic can reach them. Facebook's DNS servers advertise the routes to their own addresses via BGP, but they are designed to withdraw those advertisements when they cannot communicate with Facebook's data centres, since that indicates an unhealthy network connection.

The backbone error cut Facebook's DNS servers off from the data centres entirely, so the servers automatically withdrew their BGP advertisements. As a result, the rest of the internet could no longer find Facebook's DNS servers, and therefore could not reach Facebook's services.
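
The DNS and BGP chain of events can be modelled as a toy simulation. This is a minimal sketch under loud assumptions: `DnsServer`, `Internet`, and `sync_bgp` are hypothetical names, the "global routing table" is just a set of announced prefixes, and the addresses come from the documentation-only ranges 192.0.2.0/24 and 198.51.100.0/24 rather than Facebook's real ones. It shows how the DNS records can still exist yet become unreachable once the route to the DNS server is withdrawn.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DnsServer:
    prefix: str                # address block the server lives in (hypothetical)
    records: Dict[str, str]    # hostname -> IP address the server answers with
    backbone_up: bool = True   # can the server still reach the data centres?

    def health_check(self) -> bool:
        # Stand-in for "can I still talk to the data centres?"
        return self.backbone_up


@dataclass
class Internet:
    # Toy global routing table: the set of prefixes currently announced via BGP.
    announced: Set[str] = field(default_factory=set)

    def sync_bgp(self, server: DnsServer) -> None:
        """Announce the server's prefix while healthy; withdraw it otherwise."""
        if server.health_check():
            self.announced.add(server.prefix)
        else:
            self.announced.discard(server.prefix)

    def resolve(self, server: DnsServer, name: str) -> Optional[str]:
        """Look up name, but only if there is a route to the DNS server."""
        if server.prefix not in self.announced:
            return None  # no route: outside resolvers cannot reach the server at all
        return server.records.get(name)


if __name__ == "__main__":
    dns = DnsServer("192.0.2.0/24", {"www.example.com": "198.51.100.10"})
    internet = Internet()
    internet.sync_bgp(dns)
    print(internet.resolve(dns, "www.example.com"))  # 198.51.100.10
    dns.backbone_up = False  # the backbone is cut during maintenance
    internet.sync_bgp(dns)   # the health check fails, so the route is withdrawn
    print(internet.resolve(dns, "www.example.com"))  # None
```

The design choice being illustrated is reasonable in isolation: a server that cannot reach its data centres takes itself out of service. It becomes a problem only when every server fails the check at once.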

The disruption happened very quickly, but fixing it did not: the repair turned out to be long and complicated, which is why Facebook, WhatsApp and Instagram stayed offline for hours.

The technical team also ran into problems of its own. First, the data centres could not be accessed as usual because the network was disconnected. Second, Facebook, Inc.'s DNS was gone.

"The overall loss of DNS has subverted many of the internal tools we normally use to investigate and resolve service outages like this," Janardhan said.

Facebook then sent technicians to the data centres in person, because repairs could not be done remotely. They had to debug and restart the systems on site to get them running again.

Even this was not straightforward. Field teams found it difficult to physically access the servers, because the data centres are designed with high security to prevent tampering by unauthorised people. Getting through the security protocols took extra time before the technicians could reach the hardware and start repairing the servers.

Once the servers were back online and the backbone connections were restored, Facebook, WhatsApp and Instagram services were brought back gradually. This was to avoid power surges and floods of traffic that could cause further problems.
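
Bringing services back gradually is essentially a staged ramp-up. The sketch below is an assumption-laden illustration, not Facebook's procedure: `ramp_schedule` and `staged_restart` are hypothetical, the 1% starting load and doubling factor are arbitrary, and `healthy_at` stands in for whatever checks (power draw, cache behaviour, error rates) an operator would actually watch.

```python
from typing import Callable, List


def ramp_schedule(start_pct: int = 1, factor: int = 2, cap_pct: int = 100) -> List[int]:
    """Percentages of normal traffic to admit at each stage, ending at cap_pct."""
    if start_pct <= 0 or factor <= 1:
        raise ValueError("start_pct must be positive and factor greater than 1")
    steps = []
    pct = start_pct
    while pct < cap_pct:
        steps.append(pct)
        pct *= factor
    steps.append(cap_pct)  # the final stage is always full (capped) load
    return steps


def staged_restart(healthy_at: Callable[[int], bool], start_pct: int = 1,
                   factor: int = 2, cap_pct: int = 100) -> int:
    """Raise load stage by stage and stop at the first unhealthy stage.

    Returns the highest percentage of traffic that was reached safely.
    """
    reached = 0
    for pct in ramp_schedule(start_pct, factor, cap_pct):
        if not healthy_at(pct):
            break  # hold at the last safe level instead of pushing on
        reached = pct
    return reached


if __name__ == "__main__":
    print(ramp_schedule())                        # [1, 2, 4, 8, 16, 32, 64, 100]
    print(staged_restart(lambda pct: pct <= 50))  # 32
```

Each stage admits more traffic only after the previous level has proved stable, so a problem shows up at a partial load rather than as a sudden jump from zero to full traffic.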

The company had never run a simulation to prepare for an incident like Monday's, in which Facebook's backbone went down worldwide. Facebook is treating the event as an important lesson.