AWS Backups
Overview
This solution provides enterprise-grade protection for your AWS infrastructure across multiple accounts and environments. It is built around a centralized backup strategy that automatically protects resources in your Development, Staging, and Production environments and stores backup copies in a separate, isolated region for disaster recovery.
Key Benefits:
Automated backup protection across all your AWS accounts
Immutable backup copies that cannot be deleted or modified before their retention period expires
Cross-region disaster recovery capability
Centralized management from a dedicated backup account
Compliance-ready with built-in retention controls
How It Works
The solution operates in three main phases:
Phase 1: Resource Identification
The system identifies resources to back up based on the tags applied to them. By default, every resource in a targeted environment is backed up unless it is explicitly excluded by tagging it with backup: false.
Phase 2: Local Backup Creation
Each environment (Development, Staging, Production) creates local backups in its own AWS account using a local vault. These backups happen on a schedule you define, typically daily during off-peak hours.
Phase 3: Air-Gapped Copy
After local backups complete, the system automatically copies them to a centralized air-gapped vault located in a different geographic region within a dedicated backup account. This secondary copy is locked and cannot be modified or deleted before its retention period expires.
Architecture Explanation
The architecture diagram shows the complete backup flow:

Components in the Diagram
Organization Structure (Top): At the top level, your AWS Organization controls backup policies. The organization can specify which Organizational Units receive backup policies, allowing you to selectively enable backups for specific environments or business units.
Identity Center for Breakglass Access (Top Right): AWS Identity Center provides the authentication and authorization mechanism for breakglass roles. These special access roles are only used in emergency disaster recovery scenarios when you need to access the air-gapped vault to restore data. Access through these roles is tightly controlled and audited.
Environment Accounts (Middle Row): Multiple AWS accounts across your organization that you choose to protect with backups. The diagram shows three example environments (Dev, Prod, Stage), but you can apply backup policies to any accounts or Organizational Units in your organization. Each account that has backups enabled contains its own local account vault.
Resources in these accounts create backups locally first before copying to the centralized location.
Central Backup Account (Bottom): A dedicated backup account contains the air-gapped vault in a secondary geographic region. All backups from Dev, Prod, and Stage environments are copied here for long-term retention and disaster recovery.
Break-Glass Access (Bottom Right): Special emergency access roles allow authorized personnel to access the air-gapped vault only when needed for disaster recovery. This access is tightly controlled through AWS Identity Center and requires special permissions.
The Backup Flow
Organizational policies determine which accounts should perform backups
Resources in Dev, Prod, and Stage environments are identified for backup
Local backups are created in each environment's local vault
Backups are encrypted using the centralized encryption keys
Encrypted backups are copied into the air-gapped vault in the backup account
The air-gapped vault stores backups with immutable retention policies
Break-glass roles provide emergency access for disaster recovery scenarios
Organizational Unit Level Backups
One of the most powerful features of this solution is backup management at the Organizational Unit level. This means you can apply backup policies across entire groups of AWS accounts simultaneously.
How OU-Level Backups Work
Instead of configuring backups individually for each account, you attach a backup policy to an entire Organizational Unit. This policy then applies automatically to all accounts within that OU.
Example Scenario: If you attach a backup policy to your "Production" OU, every account within that OU will automatically:
Create daily backups
Use the same backup schedule and retention settings
Copy backups to the centralized air-gapped vault
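A sketch of what such an OU-level policy might look like, assuming the AWS Organizations backup policy syntax (`plans`, `rules`, `@@assign` operators) and the example values used elsewhere on this page: a daily 10:00 UTC schedule, 60/120-minute windows, 14-day local retention, and 180-day air-gapped retention. The plan, rule, vault and policy names, the region, and the account ID are hypothetical placeholders. The `selections` block, which decides which resources are backed up and which IAM role runs the jobs, is deliberately omitted and has to be added following the AWS syntax reference. `create_policy` and `attach_policy` are real AWS Organizations operations; the boto3 client is passed in so the sketch can be exercised without credentials.

```python
import json

# Hypothetical placeholders: replace with your own values.
BACKUP_ACCOUNT_ID = "111122223333"
AIRGAP_REGION = "eu-west-1"
AIRGAP_VAULT_ARN = (
    f"arn:aws:backup:{AIRGAP_REGION}:{BACKUP_ACCOUNT_ID}:backup-vault:airgapped-vault"
)


def build_backup_policy(workload_region: str) -> dict:
    """Return a BACKUP_POLICY document (selection block intentionally omitted)."""
    return {
        "plans": {
            "daily-plan": {
                "regions": {"@@assign": [workload_region]},
                "rules": {
                    "daily": {
                        # 10:00 UTC every day, matching the example schedule
                        "schedule_expression": {"@@assign": "cron(0 10 ? * * *)"},
                        "start_backup_window_minutes": {"@@assign": "60"},
                        "complete_backup_window_minutes": {"@@assign": "120"},
                        # Local vault that must exist in every member account
                        "target_backup_vault_name": {"@@assign": "local-vault"},
                        "lifecycle": {"delete_after_days": {"@@assign": "14"}},
                        "copy_actions": {
                            AIRGAP_VAULT_ARN: {
                                "target_backup_vault_arn": {"@@assign": AIRGAP_VAULT_ARN},
                                "lifecycle": {"delete_after_days": {"@@assign": "180"}},
                            }
                        },
                    }
                },
            }
        }
    }


def attach_policy_to_ou(org_client, ou_id: str, workload_region: str) -> str:
    """Create the policy and attach it to an OU; returns the new policy ID.

    org_client is expected to be boto3.client("organizations").
    """
    response = org_client.create_policy(
        Content=json.dumps(build_backup_policy(workload_region)),
        Description="Daily backups with air-gapped copy",
        Name="daily-airgapped-backups",
        Type="BACKUP_POLICY",
    )
    policy_id = response["Policy"]["PolicySummary"]["Id"]
    # Every account in the OU, including accounts added later, inherits it.
    org_client.attach_policy(PolicyId=policy_id, TargetId=ou_id)
    return policy_id
```

Attaching at the OU rather than per account is what gives the consistency and scalability benefits listed below: moving an account into the OU is enough for it to inherit the plan.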
Benefits of OU-Level Management
Consistency: All accounts in an OU follow identical backup policies, eliminating configuration drift.
Scalability: Adding a new account to an OU automatically applies the backup policy without additional configuration.
Simplified Management: Change the policy once at the OU level and it applies to all member accounts.
Governance: Centralized control ensures compliance with organizational backup standards.
How to Disable Backups for a Resource
Within each account, the system uses tags to decide which resources to back up. The default configuration backs up every resource unless it is tagged with backup: false, so to exclude a resource, add a tag with the key backup and the value false.
This opt-out approach ensures new resources are automatically protected.
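A minimal sketch of applying the opt-out tag programmatically, assuming the tag key and value are exactly `backup` and `false` (AWS tag keys and values are case-sensitive, so keep the spelling consistent with the backup selection). It uses the Resource Groups Tagging API `tag_resources` operation, which tags resources of many types by ARN; the helper name and the example ARN in the usage comment are hypothetical.

```python
# Opt-out tag used on this page; AWS tag keys and values are case-sensitive.
OPT_OUT_TAG = {"backup": "false"}


def exclude_from_backup(tagging_client, resource_arns):
    """Tag resources with backup=false and return the ARNs that failed.

    tagging_client is expected to be boto3.client("resourcegroupstaggingapi").
    """
    failed = []
    arns = list(resource_arns)
    # tag_resources accepts at most 20 ARNs per call, so send them in batches.
    for start in range(0, len(arns), 20):
        batch = arns[start:start + 20]
        response = tagging_client.tag_resources(ResourceARNList=batch, Tags=OPT_OUT_TAG)
        failed.extend(response.get("FailedResourcesMap", {}).keys())
    return failed


# Usage (hypothetical ARN):
#   import boto3
#   client = boto3.client("resourcegroupstaggingapi")
#   exclude_from_backup(client, ["arn:aws:ec2:us-east-1:111122223333:volume/vol-0123456789abcdef0"])
```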
The Air-Gapped Vault Strategy
The term "air-gapped" refers to the security isolation of the backup vault. While not physically disconnected from the network, it is logically isolated through strict access controls and policies.
What Makes It Air-Gapped?
Geographic Separation: The vault exists in a completely different AWS region from your primary workloads, typically on a different continent.
Account Isolation: The vault resides in a dedicated backup account, separate from your operational accounts.
Write-Only Access: Accounts can copy backups into the vault but cannot delete or modify them once stored.
Time-Based Lock: Vault lock policies prevent anyone, including administrators, from deleting backups before the retention period expires.
Limited Access: Only specially authorized break-glass roles can restore from the vault.
Why Air-Gapped Vaults Matter
Ransomware Protection: If malware compromises your primary environment, it cannot reach or delete the air-gapped backups.
Insider Threat Mitigation: Even privileged users in operational accounts cannot delete or modify backups in the air-gapped vault.
Regional Disaster Recovery: If an entire AWS region experiences an outage, your backups remain available in a different region.
Compliance Requirements: Many regulatory frameworks require offsite, immutable backup storage. Air-gapped vaults satisfy these requirements.
Grace Period Consideration
The vault lock includes a configurable grace period, typically 20 days for production environments. During this grace period, the vault lock policy can still be removed or modified. After the grace period expires, the lock becomes permanent and cannot be changed by anyone, including the root account owner or even AWS Support. This provides the highest level of protection for your backup data.
This grace period allows time to validate that backup processes work correctly before making the vault lock permanent.
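A sketch of applying such a lock with the AWS Backup `PutBackupVaultLockConfiguration` operation through boto3, assuming a 20-day grace period (`ChangeableForDays`) and hypothetical minimum and maximum retention bounds of 7 and 365 days, which bracket the 180-day air-gapped retention used on this page. The vault name and helper name are hypothetical. The helper returns the date after which the lock can no longer be changed.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 7    # hypothetical lower bound
MAX_RETENTION_DAYS = 365  # hypothetical upper bound


def lock_airgapped_vault(backup_client, vault_name, grace_days=20, now=None):
    """Apply a compliance-mode vault lock; returns when the lock becomes permanent.

    backup_client is expected to be boto3.client("backup") in the backup account.
    """
    if grace_days < 3:
        raise ValueError("ChangeableForDays must be at least 3")
    now = now or datetime.now(timezone.utc)
    backup_client.put_backup_vault_lock_configuration(
        BackupVaultName=vault_name,
        MinRetentionDays=MIN_RETENTION_DAYS,
        MaxRetentionDays=MAX_RETENTION_DAYS,
        # Supplying ChangeableForDays selects compliance mode; without it the
        # lock stays in governance mode and authorized users can remove it.
        ChangeableForDays=grace_days,
    )
    # Until this moment the lock can still be removed or edited.
    return now + timedelta(days=grace_days)
```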
Backup Integrity and Incremental Storage
How AWS Backup Stores Data
AWS Backup uses incremental storage: after the initial backup, only changed blocks are stored. Unlike traditional incremental backups, however, each recovery point is fully independent.
Traditional backup systems create chain dependencies: a full backup followed by increments that must be replayed in sequence to restore. If any backup in the chain is corrupted, every later backup that depends on it becomes unusable.
AWS takes a different approach. Each recovery point maintains its own complete manifest, a map of pointers referencing every block needed to reconstruct the resource. Unchanged blocks aren't duplicated; multiple recovery points simply reference the same underlying blocks. This makes each recovery point a "synthetic full backup" for restore purposes.
What This Means for You
No chain dependency: Restore from any recovery point independently, without needing other recovery points
Isolated corruption: A problem with one recovery point does not affect others
Efficiency preserved: Backup windows stay short and storage costs stay low
Backup Integrity vs. Source Data Integrity
AWS guarantees the integrity of the backup process itself: your backups won't become corrupted during creation, storage, or copying. However, if corrupted data is written to your source resource, AWS Backup faithfully backs up that corrupted state. This is why maintaining multiple recovery points with appropriate retention periods is essential: it lets you restore from a point before the corruption occurred.
Security Features
Customer-Managed Encryption Keys
All backups are encrypted using customer-managed KMS keys rather than AWS-managed keys. This provides several advantages:
Control: You own and manage the encryption keys, not AWS.
Auditability: Every use of the encryption key is logged in CloudTrail for compliance auditing.
Access Control: You define exactly which services and accounts can use the encryption keys.
Cross-Account Support: The keys can be shared across organizational accounts while maintaining centralized control.
Key Policy Requirements
For backups to work correctly across accounts, source account resources must use customer-managed KMS keys with appropriate permissions. AWS-managed keys and default encryption keys do not support cross-account backup copy operations.
The encryption key policies must grant permissions to:
The AWS Backup service in the secondary region
The destination backup account for copy operations
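A sketch of key policy statements that grant the central backup account use of a source key, expressed as Python dicts that serialize into the key policy JSON. The backup account ID is hypothetical, and the action list is a commonly used set for cross-account copy rather than an authoritative minimum; confirm it against the AWS Backup cross-account documentation for your resource types. Narrowing the grant further to the AWS Backup service in the secondary region (for example with `kms:ViaService`) depends on the resource types involved and is left out. The helper names are hypothetical; the statements are appended to the key's existing policy, which keeps its own administrator statements.

```python
import json

BACKUP_ACCOUNT_ID = "111122223333"  # hypothetical central backup account


def cross_account_key_statements(backup_account_id: str) -> list:
    principal = {"AWS": f"arn:aws:iam::{backup_account_id}:root"}
    return [
        {
            "Sid": "AllowBackupAccountCryptoOperations",
            "Effect": "Allow",
            "Principal": principal,
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey",
            ],
            "Resource": "*",  # in a key policy, "*" means the key itself
        },
        {
            "Sid": "AllowBackupAccountGrantsForAwsResources",
            "Effect": "Allow",
            "Principal": principal,
            "Action": "kms:CreateGrant",
            "Resource": "*",
            # Only grants that AWS services create on the caller's behalf.
            "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}},
        },
    ]


def add_statements(existing_policy: dict, statements: list) -> dict:
    """Return a copy of the key policy with new statements appended by Sid."""
    policy = json.loads(json.dumps(existing_policy))  # deep copy
    present = {s.get("Sid") for s in policy.get("Statement", [])}
    policy.setdefault("Statement", []).extend(
        s for s in statements if s["Sid"] not in present
    )
    return policy


# Usage with boto3 (key ID is a placeholder):
#   kms = boto3.client("kms")
#   current = json.loads(kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"])
#   updated = add_statements(current, cross_account_key_statements(BACKUP_ACCOUNT_ID))
#   kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(updated))
```

Appending by `Sid` keeps the update idempotent, so rerunning it against keys that were already updated does not duplicate statements.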
Vault Access Policy
The air-gapped vault uses an access policy to control which accounts can copy backups into it. This policy is scoped to your AWS Organization ID, ensuring only accounts within your organization can write to the vault.
However, this access policy is explicitly write-only for copy operations. It does not grant permissions to delete, modify, or restore backups, providing an additional security layer.
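A sketch of such a vault access policy, assuming a hypothetical organization ID. It allows only `backup:CopyIntoBackupVault`, scoped with the `aws:PrincipalOrgID` condition key, and applies it with the AWS Backup `PutBackupVaultAccessPolicy` operation through boto3. Because no delete, update, or restore actions are granted, the policy does not give member accounts those permissions on the vault. The helper names are hypothetical.

```python
import json

ORG_ID = "o-exampleorgid"  # hypothetical organization ID


def vault_access_policy(org_id: str) -> str:
    """Allow accounts in the organization to copy into the vault, nothing else."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgCopyIntoVault",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "backup:CopyIntoBackupVault",
                "Resource": "*",
                # Restricts the wildcard principal to this organization.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }
    return json.dumps(policy)


def apply_vault_access_policy(backup_client, vault_name: str, org_id: str) -> None:
    """backup_client is expected to be boto3.client("backup") in the backup account."""
    backup_client.put_backup_vault_access_policy(
        BackupVaultName=vault_name, Policy=vault_access_policy(org_id)
    )
```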
Breakglass Access Role
A special Identity Center role provides emergency access to the air-gapped vault for disaster recovery scenarios. This role includes permissions to:
List and describe backups in the vault
Initiate restore operations
Interact with encryption keys for decryption
Create new resources during restore operations
Access to this role should be tightly controlled and audited. It represents the "break glass in case of emergency" access pattern and should only be used during actual disaster recovery events.
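The permissions listed above might be expressed as an inline policy on the break-glass Identity Center permission set, roughly as follows. The action names are real AWS Backup, AWS KMS, and IAM actions, but the exact list, the key ARN, and the restore role ARN are assumptions. The service-specific permissions needed to create restored resources (for example EC2 or RDS create actions) vary by resource type and are omitted. `put_inline_policy_to_permission_set` is a real `sso-admin` operation; the helper names are hypothetical.

```python
import json

# Hypothetical ARNs: the air-gapped vault's KMS key and a restore role.
KEY_ARN = "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-KEY-ID"
RESTORE_ROLE_ARN = "arn:aws:iam::111122223333:role/hypothetical-restore-role"


def breakglass_policy(key_arn: str, restore_role_arn: str) -> dict:
    """Inline policy covering the break-glass permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListAndDescribeBackups",
                "Effect": "Allow",
                "Action": [
                    "backup:ListRecoveryPointsByBackupVault",
                    "backup:DescribeRecoveryPoint",
                    "backup:GetRecoveryPointRestoreMetadata",
                ],
                "Resource": "*",
            },
            {
                "Sid": "RunRestoreJobs",
                "Effect": "Allow",
                "Action": ["backup:StartRestoreJob", "backup:DescribeRestoreJob"],
                "Resource": "*",
            },
            {
                "Sid": "UseVaultKeyForDecryption",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:CreateGrant"],
                "Resource": key_arn,
            },
            {
                # Lets restore jobs run under the restore role.
                "Sid": "PassRestoreRoleToBackup",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": restore_role_arn,
                "Condition": {"StringEquals": {"iam:PassedToService": "backup.amazonaws.com"}},
            },
        ],
    }


def attach_to_permission_set(sso_admin_client, instance_arn, permission_set_arn):
    """sso_admin_client is expected to be boto3.client("sso-admin")."""
    sso_admin_client.put_inline_policy_to_permission_set(
        InstanceArn=instance_arn,
        PermissionSetArn=permission_set_arn,
        InlinePolicy=json.dumps(breakglass_policy(KEY_ARN, RESTORE_ROLE_ARN)),
    )
```

Note that no `backup:Delete*` actions are granted: the break-glass role is for reading and restoring, not for removing recovery points. After an inline policy changes, the permission set must be re-provisioned to the accounts where it is assigned.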
What Gets Backed Up
AWS Backup supports a comprehensive range of AWS services:
Compute and Storage
EC2 instances and EBS volumes
Elastic File System (EFS) file systems
FSx file systems (Windows File Server, Lustre, OpenZFS, NetApp ONTAP)
Storage Gateway volumes
S3 buckets
Databases
RDS database instances (all engine types)
Aurora database clusters
Aurora DSQL databases
DynamoDB tables
DocumentDB clusters
Neptune databases
Timestream databases
Data Warehousing
Redshift clusters
Other Services
VMware Cloud on AWS virtual machines
SAP HANA on EC2
CloudFormation stacks
Selection Criteria
Resources are selected for backup based on:
Tags: Resources must have appropriate tags (or lack exclusion tags) as defined in the backup policy.
Resource Type: The backup plan specifies which resource types to include, typically all types unless specifically excluded.
Account Membership: The account must be part of an Organizational Unit with an attached backup policy.
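A sketch of the three criteria as a plain predicate, next to an equivalent `BackupSelection` structure for the AWS Backup `CreateBackupSelection` API: `Resources: ["*"]` plus a `StringNotEquals` condition on the `backup` tag. The resource dictionary shape, OU IDs, role ARN, and selection name are hypothetical. In the OU-policy setup described on this page, account membership is enforced by where the policy is attached rather than by code like this.

```python
# Hypothetical inputs for illustration.
PROTECTED_OUS = {"ou-prod-1234", "ou-stage-5678", "ou-dev-9012"}
EXCLUDED_TYPES = set()  # all resource types are included unless listed here


def is_selected(resource: dict, account_ou: str) -> bool:
    """Mirror the three selection criteria for a single resource."""
    if account_ou not in PROTECTED_OUS:          # account membership
        return False
    if resource["type"] in EXCLUDED_TYPES:       # resource type
        return False
    tags = resource.get("tags") or {}
    return tags.get("backup") != "false"         # opt-out tag


# Equivalent AWS Backup selection: every resource except those tagged backup=false.
BACKUP_SELECTION = {
    "SelectionName": "all-except-opt-out",
    "IamRoleArn": "arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole",
    "Resources": ["*"],
    "Conditions": {
        "StringNotEquals": [
            {"ConditionKey": "aws:ResourceTag/backup", "ConditionValue": "false"}
        ]
    },
}

# Usage with boto3 (plan ID is a placeholder):
#   boto3.client("backup").create_backup_selection(
#       BackupPlanId=plan_id, BackupSelection=BACKUP_SELECTION)
```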
What Cannot Be Backed Up
Some AWS resources do not support AWS Backup:
Lambda function code (use S3 versioning instead)
EC2 instance stores (ephemeral storage)
CloudFront distributions
Route 53 configurations
Security groups and network configurations
IAM roles and policies
For these resources, you need alternative backup strategies such as infrastructure-as-code version control.
Backup Lifecycle
Daily Backup Schedule
The default configuration creates backups daily at a scheduled time, typically during off-peak hours (10:00 AM UTC in the example configuration).
Backup Window: A 60-minute window, beginning at the scheduled time, within which the backup job must start.
Completion Window: A 120-minute window, measured from when the backup job starts, within which it must complete.
If a backup cannot start within the backup window or complete within the completion window, it is marked as failed and alerts can be triggered.
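The two windows can be checked with simple date arithmetic. This sketch assumes the completion window is measured from the moment the job actually starts, as described above, and uses the example 60-minute start window and 120-minute completion window. The function name and outcome labels are illustrative, not AWS Backup job states.

```python
from datetime import datetime, timedelta
from typing import Optional

START_WINDOW = timedelta(minutes=60)
COMPLETION_WINDOW = timedelta(minutes=120)


def window_outcome(scheduled: datetime, started: Optional[datetime],
                   finished: Optional[datetime]) -> str:
    """Classify a finished run against the start and completion windows."""
    if started is None or started > scheduled + START_WINDOW:
        return "missed start window"
    if finished is None or finished > started + COMPLETION_WINDOW:
        return "missed completion window"
    return "ok"
```

For a 10:00 UTC schedule, a job must start by 11:00; a job that starts at 10:10 then has until 12:10 to finish.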
Local Retention
Backups created in local account vaults have a short retention period, typically 14 days. This provides immediate restore capability for recent issues while minimizing storage costs in operational accounts.
Long-Term Retention
After local backups complete, they are automatically copied to the air-gapped vault with a longer retention period, typically 180 days (6 months) for production environments.
The retention period in the air-gapped vault must fall within the vault lock constraints:
Must be at least the minimum retention days specified in the vault lock
Cannot exceed the maximum retention days specified in the vault lock
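A sketch of checking retention settings before deploying a plan, using the example figures from this page (14-day local, 180-day air-gapped) and hypothetical vault lock bounds of 7 and 365 days. A copy whose retention falls outside the lock range fails in AWS Backup; a check like this simply surfaces the mismatch earlier. The function names are hypothetical.

```python
from datetime import date, timedelta

LOCAL_RETENTION_DAYS = 14
AIRGAP_RETENTION_DAYS = 180
LOCK_MIN_DAYS, LOCK_MAX_DAYS = 7, 365  # hypothetical vault lock bounds


def check_airgap_retention(days: int, lock_min: int, lock_max: int) -> None:
    """Raise if a copy's retention would violate the vault lock range."""
    if not lock_min <= days <= lock_max:
        raise ValueError(
            f"retention {days}d is outside the vault lock range {lock_min}-{lock_max}d"
        )


def expiry_dates(created: date) -> dict:
    """When a recovery point created on a given day expires in each vault."""
    check_airgap_retention(AIRGAP_RETENTION_DAYS, LOCK_MIN_DAYS, LOCK_MAX_DAYS)
    return {
        "local": created + timedelta(days=LOCAL_RETENTION_DAYS),
        "airgapped": created + timedelta(days=AIRGAP_RETENTION_DAYS),
    }
```

For a recovery point created on 1 January 2024, the local copy expires on 15 January and the air-gapped copy on 29 June.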
Automatic Cleanup
When retention periods expire, AWS Backup automatically deletes the recovery points. In the air-gapped vault, deletions only occur after the vault lock retention period is satisfied, ensuring compliance with retention requirements.
Access and Recovery
Normal Operations
During normal operations, users cannot directly access the air-gapped vault. The system operates automatically:
Backups are created on schedule
Copies are made to the air-gapped vault
Old backups expire automatically
Monitoring alerts notify of any failures
Disaster Recovery Scenarios
When disaster recovery is necessary:
Step 1: Assessment
Determine the scope of the disaster and identify which resources need recovery.
Step 2: Breakglass Access
Authorized personnel assume the breakglass role through AWS Identity Center. This access should be logged and audited.
Step 3: Recovery Point Selection
Browse available recovery points in the air-gapped vault and select the appropriate point in time for restoration.
Step 4: Restore Job Initiation
Initiate a restore job, specifying the target account and region for restoration. The restore job decrypts the backup and recreates the resource.
Step 5: Validation
Verify that restored resources function correctly and contain the expected data.
Step 6: Application Recovery
Update application configurations to point to newly restored resources and resume operations.
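Steps 3 and 4 might look like this with the AWS Backup client in boto3, called under the break-glass role. The sketch picks the newest completed recovery point created before a known-good cutoff (for example, just before corruption was detected) and starts a restore using the metadata AWS Backup stored for that point. `list_recovery_points_by_backup_vault`, `get_recovery_point_restore_metadata`, and `start_restore_job` are real operations; the vault name, resource ARN, restore role, and helper names are hypothetical. Pagination and resource-specific metadata overrides are omitted, and the restore runs in the account and region of the client used, so restoring into a different target account may need an extra copy or sharing step that is not shown.

```python
from datetime import datetime


def pick_recovery_point(recovery_points: list, cutoff: datetime):
    """Newest COMPLETED recovery point created strictly before the cutoff, or None."""
    candidates = [
        rp for rp in recovery_points
        if rp["Status"] == "COMPLETED" and rp["CreationDate"] < cutoff
    ]
    return max(candidates, key=lambda rp: rp["CreationDate"], default=None)


def restore_before(backup_client, vault_name, resource_arn, cutoff, restore_role_arn):
    """Start a restore from the last good recovery point; returns the job ID.

    backup_client is expected to be boto3.client("backup").
    """
    response = backup_client.list_recovery_points_by_backup_vault(
        BackupVaultName=vault_name, ByResourceArn=resource_arn, ByCreatedBefore=cutoff
    )
    point = pick_recovery_point(response["RecoveryPoints"], cutoff)
    if point is None:
        raise LookupError("no completed recovery point before the cutoff")
    metadata = backup_client.get_recovery_point_restore_metadata(
        BackupVaultName=vault_name, RecoveryPointArn=point["RecoveryPointArn"]
    )["RestoreMetadata"]
    job = backup_client.start_restore_job(
        RecoveryPointArn=point["RecoveryPointArn"],
        Metadata=metadata,
        IamRoleArn=restore_role_arn,
    )
    return job["RestoreJobId"]
```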
Recovery Time Objectives
Recovery times vary based on resource type and size:
Small EC2 instances: 15-30 minutes
Large databases: Several hours
Large EFS or FSx systems: Multiple hours to days
Plan your Recovery Time Objectives (RTO) accordingly and test disaster recovery procedures regularly.
Limitations and Constraints
Service-Linked Role Requirements
Each AWS service that you back up requires its service-linked role to exist in the backup account. For example, backing up RDS databases requires the RDS service-linked role (AWSServiceRoleForRDS) in the backup account.
These roles must be created manually before backups of those resource types will succeed. The role creation is a one-time operation per service.
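A sketch of the one-time role creation using the IAM `CreateServiceLinkedRole` operation through boto3, run in the backup account. It assumes that IAM reports an already-existing role with the error code `InvalidInput` and a message saying the role name "has been taken"; that matching is an assumption worth verifying, since the same error code also covers other invalid input. The helper name is hypothetical; the service principals in the test (`rds.amazonaws.com`, `elasticfilesystem.amazonaws.com`) are examples.

```python
def ensure_service_linked_roles(iam_client, service_principals):
    """Create service-linked roles, treating 'already exists' as success.

    iam_client is expected to be boto3.client("iam") in the backup account.
    Returns the service principals for which a role was newly created.
    """
    created = []
    for principal in service_principals:
        try:
            iam_client.create_service_linked_role(AWSServiceName=principal)
            created.append(principal)
        except Exception as err:  # botocore ClientError exposes a .response dict
            error = getattr(err, "response", {}).get("Error", {})
            # Assumed signature of "role already exists"; anything else is re-raised.
            if error.get("Code") != "InvalidInput" or "has been taken" not in error.get("Message", ""):
                raise
    return created
```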
Encryption Key Requirements
Customer-Managed Keys Required: All resources must use customer-managed KMS keys. Resources encrypted with AWS-managed keys or default encryption keys cannot be copied to the air-gapped vault.
Key Policy Updates Required: Each customer-managed key needs policy statements allowing the backup service and destination account access. This requires updating key policies for existing resources.
Organizational Configuration Prerequisites
Service Access Principals: The AWS Backup service must be enabled as a trusted service in your organization.
Policy Type Enablement: The BACKUP_POLICY type must be enabled at the organization level.
Delegated Administrator: The backup account must be registered as a delegated administrator for the backup service.
Cross-Account Backup: Global settings must enable cross-account backup operations at the organization level.
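The four prerequisites map onto real AWS Organizations and AWS Backup operations. A sketch, assuming it runs once from the organization's management account with boto3 clients passed in; the root ID, account ID, and function name are hypothetical. The calls are not idempotent: rerunning it raises errors for steps that are already done (for example, a policy type that is already enabled), so handle or skip those as appropriate.

```python
BACKUP_PRINCIPAL = "backup.amazonaws.com"


def enable_org_backup_prerequisites(org_client, backup_client, root_id, backup_account_id):
    """Run the four prerequisite steps from the management account.

    org_client: boto3.client("organizations"); backup_client: boto3.client("backup").
    """
    # 1. Trust AWS Backup as an organization service.
    org_client.enable_aws_service_access(ServicePrincipal=BACKUP_PRINCIPAL)
    # 2. Allow BACKUP_POLICY documents to be attached within this root.
    org_client.enable_policy_type(RootId=root_id, PolicyType="BACKUP_POLICY")
    # 3. Delegate AWS Backup administration to the central backup account.
    org_client.register_delegated_administrator(
        AccountId=backup_account_id, ServicePrincipal=BACKUP_PRINCIPAL
    )
    # 4. Enable cross-account backup operations for the organization.
    backup_client.update_global_settings(
        GlobalSettings={"isCrossAccountBackupEnabled": "true"}
    )
```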
Resource Limitations
Copy Job Limits: AWS Backup has service quotas on the number of concurrent copy jobs. Large environments may need quota increases.
Vault Capacity: While vaults have no hard capacity limits, extremely large backup volumes may require engagement with AWS Support for planning.
Restore Capacity: Restoring many resources simultaneously may hit EC2 instance launch limits or other service quotas in the target account.