Today, having a website or a service down can have a massive impact on your brand image or your business. In traditional IT, a resilient architecture was difficult or very expensive to design because the infrastructure was not modular.

This topic provides quick and easy steps toward a resilient infrastructure:


General Information

High Availability

Downtime

In a year, how long can you afford to have your service down? Today, "five nines" availability is the target for major web players. To reach that target, a flexible and resilient infrastructure is mandatory.

| Rate                 | Time per year |
|----------------------|---------------|
| 99%                  | 3.65 days     |
| 99.9%                | 8.76 hours    |
| 99.99%               | 52.56 minutes |
| 99.999% (five nines) | 5.26 minutes  |
| 99.9999%             | 31.5 seconds  |
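The downtime figures above follow directly from the availability rate. As a minimal sketch, assuming a 365-day year (the function name is illustrative and not part of any OUTSCALE tool), the allowed downtime can be computed as follows:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in seconds, for an availability rate."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


# Print the downtime budget for each rate discussed in this section.
for rate in (99, 99.9, 99.99, 99.999, 99.9999):
    seconds = downtime_seconds_per_year(rate)
    print(f"{rate}% -> {seconds / 60:.2f} minutes per year")
```

The same function can be used to translate a contractual availability target into a yearly downtime budget.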


Performance

We consider that a website must have a maximum response time of 5 seconds, and should aim for an average response time of 2 to 3 seconds. Load balancing provides an easy way to achieve this, regardless of the number of website sessions.

However, multiplying layers and services can slow down the overall user experience. For that reason, you should consider designing your solution with asynchronous communication mechanisms that rely on a message bus, such as RabbitMQ (which implements AMQP) or Apache Kafka.
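To illustrate the asynchronous pattern, here is a minimal sketch in which a producer hands messages to a consumer through a queue and does not wait for them to be processed. It uses Python's in-process `queue.Queue` as a stand-in for a real message bus; the function names are hypothetical, and a production setup would use a broker client instead.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: list) -> None:
    """Consume messages until the None sentinel is received."""
    while True:
        message = tasks.get()
        if message is None:
            break
        results.append(f"processed {message}")


def handle_requests(messages: list) -> list:
    """Publish messages without blocking on their processing."""
    tasks = queue.Queue()
    results = []
    consumer = threading.Thread(target=worker, args=(tasks, results))
    consumer.start()
    for message in messages:
        tasks.put(message)  # the producer returns immediately
    tasks.put(None)  # sentinel: tell the consumer to stop
    consumer.join()
    return results
```

The producer only pays the cost of enqueuing a message, so a slow downstream service does not directly slow down the layer that receives user requests.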

Redundancy

To guarantee your customers maximum availability whatever happens in a physical datacenter, you can use our Regions and Availability Zones (AZs). For more information, see About Regions, Endpoints, and Availability Zones and Regions, Endpoints and Availability Zones Reference.


What to Leverage

Infrastructure Architecture

Moving to the Cloud enables you to use mechanisms such as Regions, Availability Zones, and load balancers to design a highly available, load-distributed infrastructure without modifying your application stack.

Software Architecture

After designing your infrastructure, you can improve your application, or design a new one, following the rules of a state-of-the-art clustered application. You can also design your software to be fault tolerant. For more information, see Netflix Chaos Monkey.





Infrastructure Architecture

Philosophy

The first step is to install each service or application on its own Virtual Machine (VM) and create an OUTSCALE machine image (OMI) from it. This enables you to easily replicate a VM and deploy it several times. For more information, see Creating an OMI.

To avoid a single point of failure (SPOF), a single service needs to be provided by several VMs. A VM with a service is called a "node", and the collection of "nodes" providing a service is called a "cluster". For more information, see Computer Cluster (Wikipedia).

Each cluster contains a load balancer that receives incoming traffic. Your Virtual Machines should never receive direct incoming traffic.
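To see why several nodes behind a load balancer improve availability, consider the following minimal sketch. It assumes that nodes fail independently and that the load balancer itself is always available, which is a simplification; the function name is illustrative.

```python
def cluster_availability(node_availability: float, node_count: int) -> float:
    """Availability of a cluster that is up as long as one node is up.

    Assumes independent node failures (a simplifying assumption).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # The cluster is down only if every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** node_count


for count in (1, 2, 3):
    print(count, "node(s):", f"{cluster_availability(0.99, count):.6f}")
```

Under these assumptions, two nodes that are each available 99% of the time give a cluster availability of 99.99%.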

Best Practices

  • Run each of your critical services or jobs on its own dedicated VM, as in a 3-tier pattern.
  • Scale your infrastructure out, not up: when a service is overloaded, add nodes instead of resizing a single node.
  • Use several Availability Zones (AZs) to guarantee the availability of your service.

Subnets and Security Group Isolation

Because Cloud Computing provides a philosophy of security by design, we use Virtual Private Clouds (VPCs) and subnets to logically isolate each business layer from the others, and your overall infrastructure from other infrastructures. For more information, see Creating and Managing Subnets in Your VPC.

Three VMs are launched from existing OUTSCALE machine images (OMIs). Each VM has its own dedicated security group, and is placed in a subnet dedicated to a single business logic.

The mechanism is consistent: for 1 business logic, you get 1 OMI, 1 subnet, and 1 security group. Elements discussed here appear in red in the diagram below:
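The "1 business logic, 1 OMI, 1 subnet, 1 security group" rule can be expressed as a small consistency check. This is only a sketch: the identifiers and IP ranges below are placeholders invented for illustration, not real OUTSCALE resources, and no OUTSCALE API is called.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from itertools import combinations


@dataclass(frozen=True)
class BusinessLayer:
    name: str
    omi: str             # placeholder image identifier
    subnet: str          # placeholder IP range in CIDR notation
    security_group: str  # placeholder security group identifier


def validate(layers: list) -> list:
    """Return a list of isolation problems; an empty list means none found."""
    problems = []
    for a, b in combinations(layers, 2):
        if a.omi == b.omi:
            problems.append(f"{a.name} and {b.name} share an OMI")
        if a.security_group == b.security_group:
            problems.append(f"{a.name} and {b.name} share a security group")
        if ip_network(a.subnet).overlaps(ip_network(b.subnet)):
            problems.append(f"{a.name} and {b.name} have overlapping subnets")
    return problems


LAYERS = [
    BusinessLayer("web", "omi-web-example", "10.0.1.0/24", "sg-web-example"),
    BusinessLayer("intel", "omi-intel-example", "10.0.2.0/24", "sg-intel-example"),
    BusinessLayer("database", "omi-db-example", "10.0.3.0/24", "sg-db-example"),
]

print(validate(LAYERS))
```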


Scaling Out

Instead of having 1 Virtual Machine per service and making it grow (more CPU, more RAM), which is called scaling up, we prefer to distribute the load across several machines. To do that, the load of each business layer is distributed through a load balancer. For more information, see Load Balancing Unit (LBU).

Databases cannot be managed with load balancers.
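As an illustration of how a load balancer distributes requests across the nodes of a layer, here is a minimal round-robin sketch. Round robin is used here only as an example strategy; it is an assumption and not a description of how the LBU selects back-end nodes. The class and node names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out nodes in turn so that each one receives an equal share."""

    def __init__(self, nodes: list) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = cycle(list(nodes))

    def route(self) -> str:
        """Return the node that should receive the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["web-node-1", "web-node-2", "web-node-3"])
for request_id in range(6):
    print(f"request {request_id} -> {balancer.route()}")
```

Adding a node to the list is all it takes to absorb more load, which is the point of scaling out rather than up.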

Feeding the Multitude

Based on existing OMIs, you can run multiple business nodes. For more information about adding nodes, see Working with Back-end Instances.

We replicate the previous infrastructure and cross the flows between load balancers and nodes. These elements appear in red in the diagram below.

To protect against a datacenter failure, each horizontal layer (web server, intel, or database) is located in a separate subnet and in a different Availability Zone.
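A simple way to reason about this placement is to check, for each layer, whether it would keep at least one node if any single Availability Zone failed. The sketch below works on a plain dictionary; the zone names and node counts are placeholders chosen for illustration.

```python
def layers_at_risk(placement: dict) -> list:
    """Return the layers that would lose all their nodes if one AZ failed.

    `placement` maps a layer name to a {zone name: node count} dictionary.
    """
    at_risk = []
    for layer, nodes_per_zone in placement.items():
        zones_with_nodes = [z for z, n in nodes_per_zone.items() if n > 0]
        # A layer present in fewer than two zones cannot survive a zone outage.
        if len(zones_with_nodes) < 2:
            at_risk.append(layer)
    return at_risk


PLACEMENT = {
    "web server": {"zone-a": 1, "zone-b": 1},
    "intel": {"zone-a": 1, "zone-b": 1},
    "database": {"zone-a": 2, "zone-b": 0},
}

print(layers_at_risk(PLACEMENT))
```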

3DS OUTSCALE provides load balancer replication, and data replication for snapshots and OMIs. The time elapsed between two snapshots determines your RPO (Recovery Point Objective): any data written since the last snapshot may be lost in a failure.
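As a minimal sketch of this idea, the worst-case RPO of a snapshot schedule can be estimated as the longest gap between two consecutive snapshots. The timestamps below are made up for illustration, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_rpo(snapshot_times: list) -> timedelta:
    """Return the longest interval between two consecutive snapshots."""
    ordered = sorted(snapshot_times)
    if len(ordered) < 2:
        raise ValueError("at least two snapshots are required")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Illustrative snapshot schedule (not real data).
snapshots = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 1, 18, 0),
]

print(worst_case_rpo(snapshots))
```

If the result is longer than the data loss your business can tolerate, snapshots need to be taken more often.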


Go Further: Self-Healing and Reliability

We highly recommend using supervision tools, which enable you to terminate or run new nodes when one of them encounters a problem.
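The self-healing principle can be sketched as a reconciliation step: unhealthy nodes are set aside for termination, and new nodes are launched until the desired count is reached. In this sketch, `is_healthy` and `launch_node` are hypothetical callbacks that you would implement with your own supervision tool and deployment mechanism; no specific product API is implied.

```python
from typing import Callable


def reconcile(
    nodes: list,
    is_healthy: Callable[[str], bool],
    launch_node: Callable[[], str],
    desired_count: int,
) -> tuple:
    """Return (nodes to keep, nodes to terminate) after one healing pass."""
    healthy = [node for node in nodes if is_healthy(node)]
    to_terminate = [node for node in nodes if not is_healthy(node)]
    # Launch replacements until the cluster is back to its desired size.
    while len(healthy) < desired_count:
        healthy.append(launch_node())
    return healthy, to_terminate
```

Running such a step periodically keeps the cluster at its intended size without manual intervention.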