Business resilience and disaster recovery

Ensuring system resilience and reliability is a core part of our work at Learnosity. Here’s how we keep your product up and running at all times.

How does Learnosity track system failure?

Learnosity uses monitoring and management services such as CloudWatch, DataDog, and StatusCake. These then feed into OpsGenie, which sends out any critical alerts to our tech support team.

Sounds good, but can you tell me more?

DataDog collects metrics from AWS CloudWatch and from the DataDog agent software we’ve installed on all hosts in the environment. These cover CPU, memory, and network usage, among others. We have a number of “monitors” in DataDog that raise alerts to OpsGenie whenever certain thresholds are exceeded or unusual patterns are detected. Depending on the time of day, OpsGenie then sends emails and SMS messages to the relevant people.
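As a rough illustration of the flow described above, the sketch below checks a metric against a threshold and picks notification channels by time of day. It is not Learnosity's actual configuration and does not use the DataDog or OpsGenie APIs: the Monitor class, the 90% CPU threshold, the business-hours window, and the channel policy are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Monitor:
    """A hypothetical threshold monitor on a single metric."""
    name: str
    metric: str
    threshold: float


def breached(monitor, samples):
    """Return True if the most recent sample exceeds the monitor's threshold."""
    return bool(samples) and samples[-1] > monitor.threshold


def channels_for(now):
    """Pick notification channels by time of day (assumed policy)."""
    business_hours = time(9, 0) <= now < time(17, 0)
    return ["email"] if business_hours else ["email", "sms"]


def evaluate(monitor, samples, now):
    """Return an alert record if the monitor is breached, otherwise None."""
    if not breached(monitor, samples):
        return None
    return {"monitor": monitor.name, "channels": channels_for(now)}


# Example: CPU spikes above the assumed threshold at 03:00.
cpu = Monitor(name="high-cpu", metric="cpu_percent", threshold=90.0)
alert = evaluate(cpu, [40.0, 95.0], time(3, 0))
```

In this sketch an out-of-hours breach is sent by both email and SMS, while a breach during business hours is sent by email only. A real setup would more likely route through on-call schedules and escalation rules.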

How quickly will we be notified if a region goes down?

It depends on the error type and the severity of the issue, but you should receive a notification within 5 minutes.

Does Learnosity conduct dry runs to test disaster recovery procedures?

Yes, we do. We regularly test our automatic fail-over and autoscaling, as well as our database recovery procedures. Most of our environments are also rebuilt completely from scratch every 3 weeks as part of our release cycle.

Can you tell me anything else about Learnosity’s risk mitigation, redundancy, and recoverability?

Of course. In each region, we operate across at least three availability zones, with even distribution between them. This reduces the risk of an internal AWS failure causing issues. We operate with no SPOFs (single points of failure): every component, including our databases and application servers, has a backup or redundant system behind it.
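To make the idea of even distribution concrete, here is a minimal sketch that spreads a number of instances across zones and checks whether enough capacity remains if any one zone is lost. The function names, zone names, and instance counts are hypothetical and are not taken from Learnosity's infrastructure.

```python
def spread_evenly(instance_count, zones):
    """Distribute instances across zones so counts differ by at most one."""
    if len(zones) < 3:
        raise ValueError("expected at least three availability zones")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def survives_zone_loss(placement, minimum_instances):
    """Return True if losing any single zone leaves at least minimum_instances."""
    total = sum(placement.values())
    return all(total - count >= minimum_instances for count in placement.values())


# Example: seven instances across three hypothetical zones.
placement = spread_evenly(7, ["zone-a", "zone-b", "zone-c"])
ok = survives_zone_loss(placement, minimum_instances=4)
```

With seven instances in three zones, the largest zone holds three, so at least four instances remain after any single-zone outage.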

All network connections to the internet are also made in redundant pairs (or more), with multiple VPN tunnels between each location. In an emergency, this allows us to move clients between regions to maintain continuity.
