5. Cyber resilience

  • Availability is a fundamental security principle. To achieve it, a cyber resilience approach is necessary.
  • Resilience is mainly an architecture concern, at both the system and the software level. Cloud architectures are much easier to make resilient, because the underlying infrastructure already offers resilient services.
  • However, the effort required to achieve resilience should not be underestimated, as it requires a consistent assembly of cloud services. Do not assume that a cloud application is automatically resilient!
  • Cloud providers offer several levels of geographical resiliency:
    • Availability Zones (AZs), separated by a few kilometers, with very low latency between them
    • Regions, which are typically separated by more than 50 km
  • You can replicate a database in real time between two AZs. Using several AZs significantly improves resilience by removing any dependence on a single datacenter.
  • Using several regions provides even higher resilience, in particular if an entire region suffers an outage.
  • Avoid and remove unnecessary dependencies. This is one of the simplest and most effective ways to improve resilience. However, because it is a subtractive approach, it is often overlooked (research has shown that people tend to neglect subtractive solutions, see https://doi.org/10.1038/s41586-021-03380-y).
  • Testing the effectiveness of resilience mechanisms is essential to really make sure everything is working. Rule of thumb: if it is not tested, it will not work.
  • Chaos Monkey approaches are the ultimate form of testing, but they are only realistic for very mature organizations. They consist of deliberately introducing random failures in production, in order to continuously test the resilience mechanisms and regularly challenge operations teams. This approach was popularized by Netflix.
  • Backup and restore processes are also fundamental for cyber resilience. Indeed, restoring from backups is often the only way to recover from a successful ransomware attack.
  • Beware that attackers will do their best to erase or corrupt backups too, so the backup process must be highly independent of the information system (offline backup on tapes or hard disks is a good option, and can be provided by Cloud Service Providers).
  • Restoration testing is essential to make sure backups will work when necessary and are not corrupt. Rule of thumb: an untested backup does not work.
  • Backup encryption is recommended, but it must be managed properly, as cryptography is prone to pitfalls. Losing your keys is equivalent to losing your backup. At the end of the day, you will probably keep the keys in a physical safe.
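
The restoration testing rule above can be illustrated with a small sketch. It is only one possible approach, written under simplifying assumptions: the backup is a local gzip-compressed tar archive, and a manifest of SHA-256 checksums is recorded at backup time and kept separately from the archive. The function names (`create_backup`, `restore_test`, `build_manifest`) are hypothetical and chosen for this illustration; a real setup would rely on the organization's actual backup tooling and storage.

```python
import hashlib
import tarfile
import tempfile
import zlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def create_backup(source: Path, archive: Path) -> dict[str, str]:
    """Archive the source directory and return the manifest to store separately."""
    manifest = build_manifest(source)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")
    return manifest


def restore_test(archive: Path, manifest: dict[str, str]) -> list[str]:
    """Restore into a scratch directory and list every problem found.

    An empty list means the backup could be restored and matches the manifest.
    """
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch)
        # Use the safer "data" extraction filter when this Python supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        try:
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(target, **extra)
        except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
            return [f"archive unreadable: {exc}"]
        restored = build_manifest(target)

    problems = []
    for name, expected in sorted(manifest.items()):
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != expected:
            problems.append(f"corrupt: {name}")
    return problems
```

A scheduled job could call `restore_test` on the latest archive and raise an alert whenever the returned list is not empty. The sketch deliberately restores into a temporary directory rather than only checking the checksum of the archive itself, since the point of a restoration test is to exercise the restore path end to end.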