content/en/agent/guide/datadog-disaster-recovery.md
{{< /callout >}}
## Overview
Datadog Disaster Recovery (DDR) provides observability continuity during rare outages that may affect a cloud service provider region or the Datadog services running within that region. In such cases, DDR helps your organization meet critical observability, availability, and business continuity goals. With DDR, you can typically recover live observability at an alternate, functional Datadog site in under an hour. <br><br>
Additionally, Datadog Disaster Recovery lets you periodically conduct disaster recovery drills. These drills test your ability to recover from outages and help you meet your business and regulatory compliance needs.
{{% collapse-content title=" 4. Link the DDR org to the primary org" level="h5" %}}
After the Datadog team has configured the designated orgs and you have confirmed the public IDs for both orgs, you can link them. For security reasons, Datadog cannot link the orgs on your behalf.
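Before linking, you may want to double-check each public ID against the organization endpoint on its own site (`GET /api/v1/org/<PUBLIC-ID>` in the Datadog API). The following minimal Python sketch only builds the two lookup URLs; it does not send requests or link anything. It assumes `site` is the full site domain, such as `datadoghq.eu` or `us5.datadoghq.com`. The function names and IDs are hypothetical.

```python
def org_lookup_url(site: str, public_id: str) -> str:
    """Build the URL of the "get organization" endpoint on one Datadog site.

    `site` is the full site domain, for example "datadoghq.com",
    "us5.datadoghq.com", or "datadoghq.eu" (an assumption of this sketch).
    """
    site = site.strip().lower()
    public_id = public_id.strip()
    if not site or not public_id:
        raise ValueError("site and public_id must both be non-empty")
    return f"https://api.{site}/api/v1/org/{public_id}"


def lookup_urls(primary: tuple[str, str], ddr: tuple[str, str]) -> dict[str, str]:
    """Return one lookup URL per org from (site, public_id) pairs.

    The primary and DDR orgs are distinct orgs, so identical public IDs
    almost certainly mean a copy-paste mistake.
    """
    (primary_site, primary_id), (ddr_site, ddr_id) = primary, ddr
    if primary_id.strip() == ddr_id.strip():
        raise ValueError("primary and DDR orgs must have different public IDs")
    return {
        "primary": org_lookup_url(primary_site, primary_id),
        "ddr": org_lookup_url(ddr_site, ddr_id),
    }
```

For example, `lookup_urls(("datadoghq.eu", "<PRIMARY_PUBLIC_ID>"), ("us5.datadoghq.com", "<DDR_PUBLIC_ID>"))` returns one URL per org. You can then query each URL with the API key and App key of the matching site.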
To link your primary and DDR orgs, run these commands:
#### when you have linked the DDR org to your primary org
{{% collapse-content title=" 5. Create your Datadog API and App key for syncing" level="h5" %}}
At the secondary Datadog site, create both an API key **and** an App key. You will use these keys in _step 7_ to copy dashboards and monitors between Datadog sites.
For your Agents, Datadog can copy API key signatures to the secondary backup account, so you do not need to maintain a separate set of API keys for your Agents.
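A sync job needs both secondary-site keys on every request. The Python helper below is a hedged sketch of how a job might handle them. It reads the keys from an environment-like mapping and returns them as the `DD-API-KEY` and `DD-APPLICATION-KEY` headers that the Datadog API expects. The variable names `DDR_API_KEY` and `DDR_APP_KEY` are hypothetical. The sync-cli README describes how sync-cli itself receives its keys.

```python
from typing import Mapping

# Hypothetical environment variable names -> Datadog API header names.
_KEY_SOURCES = {
    "DD-API-KEY": "DDR_API_KEY",
    "DD-APPLICATION-KEY": "DDR_APP_KEY",
}


def destination_credentials(env: Mapping[str, str]) -> dict[str, str]:
    """Return request headers for the secondary site, or fail if a key is missing."""
    missing = [var for var in _KEY_SOURCES.values() if not env.get(var)]
    if missing:
        # Fail early: a sync job without both keys cannot authenticate.
        raise KeyError(f"missing secondary-site keys: {', '.join(missing)}")
    return {header: env[var] for header, var in _KEY_SOURCES.items()}
```

In a scheduled job, you might call `destination_credentials(os.environ)`. That way, a missing key fails the job before any sync starts.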
{{% /collapse-content %}}
{{% collapse-content title=" 6. Configure Single Sign-On for the Datadog App" level="h5" %}}
Go to your [Organization Settings][1] to configure SAML or Google Login for your users.
**Single Sign-On (SSO) is highly recommended** so that all your users can seamlessly log in to your Disaster Recovery organization during an outage.
You must invite your users to your Disaster Recovery organization and give them appropriate roles and permissions. Alternatively, to streamline this step, you can use [Just-in-Time provisioning with SAML][2].
{{% collapse-content title=" 7. Set up resource syncing and scheduling" level="h5" %}}
Datadog provides a tool called [Datadog sync-cli][3] to copy your dashboards, monitors, and other configurations from your primary organization to your secondary organization. You can determine the frequency and timing of syncing based on your business requirements. Regular syncing keeps your secondary organization up to date in the event of a disaster, and we recommend syncing daily. For information on setting up and running the backup process, see the [datadog-sync-cli README][5].
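How you trigger sync-cli is up to you, whether through cron, a CI pipeline, or another scheduler. The scheduling decision itself is simple enough to sketch in Python. This example assumes you record the time of the last successful sync somewhere. The names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Daily cadence, matching the recommendation above; adjust to your requirements.
SYNC_INTERVAL = timedelta(days=1)


def sync_is_due(last_success: Optional[datetime], now: datetime,
                interval: timedelta = SYNC_INTERVAL) -> bool:
    """Return True when the next primary-to-secondary sync should run."""
    if last_success is None:
        return True  # No successful sync recorded yet: run right away.
    return now - last_success >= interval


def next_run(last_success: datetime, interval: timedelta = SYNC_INTERVAL) -> datetime:
    """Earliest time at which the next sync becomes due."""
    return last_success + interval
```

The check compares against the last *successful* sync rather than the last attempt. As a result, a failed run is retried at the next check instead of waiting another full interval.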
Sync-cli is primarily intended for unidirectional copying and updating of resources from your primary org to your secondary org. Resources copied to the secondary organization can be edited, but the next sync overwrites any changes that differ from the source in the primary organization. Sync-cli can be configured for bidirectional syncing, but this mode is not yet fully tested and should be considered experimental.
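To make the overwrite behavior concrete, here is a toy Python model of one-way syncing. It is not how sync-cli is implemented. Resources are plain dictionaries keyed by a hypothetical shared ID. Resources that exist only in the secondary org are simply left alone in this model.

```python
from copy import deepcopy


def one_way_sync(primary: dict[str, dict], secondary: dict[str, dict]) -> dict[str, dict]:
    """Toy unidirectional sync: the primary's version of each resource always wins.

    Edits made only in the secondary org are overwritten on the next sync.
    Secondary-only resources are kept (an assumption of this toy model).
    Neither input is mutated; a new mapping is returned.
    """
    result = deepcopy(secondary)
    for resource_id, definition in primary.items():
        result[resource_id] = deepcopy(definition)
    return result
```

For example, if a dashboard title was edited in the DDR org, the next `one_way_sync` call restores the primary org's title.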
Each item can be added to the sync scope using the sync-cli configuration available in the documentation. Here’s an example of a configuration file for syncing specific dashboards and monitors using name and tag filtering from an `EU` site to a `US5` site.
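The configuration file format is defined by sync-cli, so refer to its README for the exact syntax. Independently of that syntax, a small Python filter can show the idea behind name and tag filtering. The `name` and `tags` fields and the sample resources are hypothetical simplifications. Requiring both conditions to match is an assumption of this sketch, not documented sync-cli behavior.

```python
import re
from typing import Iterable


def select_resources(resources: Iterable[dict], name_pattern: str,
                     required_tag: str) -> list[dict]:
    """Keep resources whose name matches `name_pattern` and that carry `required_tag`."""
    name_re = re.compile(name_pattern)
    return [
        resource
        for resource in resources
        if name_re.search(resource.get("name", ""))
        and required_tag in resource.get("tags", [])
    ]
```

For example, `select_resources(items, r"^\[prod\]", "team:payments")` keeps only resources whose names start with `[prod]` and that are tagged `team:payments`.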
[Remote Configuration (RC)][7] is a Datadog capability that allows you to remotely configure and change the behavior of Datadog Agents deployed in your infrastructure. Remote Configuration is strongly recommended for more seamless failover control. Alternatively, you can configure your Agents manually or with configuration management tools such as Puppet, Ansible, or Chef.
Remote Configuration is enabled by default in your new organization. Any new API keys you create for your Agents are also RC-enabled by default. For more information, see the [Remote Configuration][7] documentation.
Update your Datadog Agents to version **7.54 or higher**. This version comes with a new configuration for Disaster Recovery.
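If you track Agent versions across a fleet, compare version numbers by their numeric components rather than as strings. A plain string comparison ranks `7.6` above `7.54`. A minimal Python sketch, assuming version strings such as `7.54.0` (the function name is hypothetical):

```python
import re

MIN_DDR_AGENT = (7, 54)  # Minimum Agent version for the Disaster Recovery configuration


def agent_supports_ddr(version: str) -> bool:
    """Return True if an Agent version string such as "7.54.0" is 7.54 or higher."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Agent version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders (7, 6) below (7, 54), unlike string comparison.
    return (major, minor) >= MIN_DDR_AGENT
```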
Configure your Datadog Agent's `datadog.yaml` configuration file as shown in the example below and restart the Agent.
```yaml
multi_region_failover:
  enabled: true
  failover_metrics: false
  failover_logs: false
  failover_traces: false
  site: <DDR_SITE> # For example, "us5.datadoghq.com" for a US5 site
  api_key: <DDR_SITE_API_KEY>
```
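Because `datadog.yaml` must parse as YAML, each key needs a space after its colon. The failover settings must also be indented under `multi_region_failover`. The following hedged Python sketch checks the dictionary you would get after loading the file with a YAML parser, for example PyYAML's `yaml.safe_load`. The checks mirror the `multi_region_failover` settings above. The function name is hypothetical.

```python
FAILOVER_FLAGS = ("failover_metrics", "failover_logs", "failover_traces")


def check_failover_config(config: dict) -> list[str]:
    """Return human-readable problems in a parsed datadog.yaml (empty list if none)."""
    section = config.get("multi_region_failover")
    if not isinstance(section, dict):
        return ["missing `multi_region_failover` section"]

    problems = []
    if section.get("enabled") is not True:
        problems.append("`enabled` should be true")
    for flag in FAILOVER_FLAGS:
        if not isinstance(section.get(flag), bool):
            problems.append(f"`{flag}` should be true or false")

    site = str(section.get("site") or "")
    api_key = str(section.get("api_key") or "")
    if not site or site.startswith("<"):
        problems.append("`site` is empty or still a placeholder")
    if not api_key or api_key.startswith("<"):
        problems.append("`api_key` is empty or still a placeholder")
    # The DDR site should differ from the primary (top-level) `site`, if one is set.
    if site and site == config.get("site"):
        problems.append("`multi_region_failover.site` matches the primary site")
    return problems
```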
During the preview, we recommend setting `failover_metrics`, `failover_logs`, and `failover_traces` to **false** during passive phases.
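The passive-phase recommendation can be expressed as a small lookup. This Python sketch uses two hypothetical phase names, `passive` and `failover`. It also assumes you would turn all three flags on during an actual failover or a scheduled drill. Confirm the exact switch-over procedure with your Datadog contact.

```python
FAILOVER_FLAGS = ("failover_metrics", "failover_logs", "failover_traces")


def failover_flags(phase: str) -> dict[str, bool]:
    """Return failover_* settings for a hypothetical phase name.

    "passive":  normal operation; all three flags stay false (preview recommendation).
    "failover": outage or drill; all three flags are set to true (assumption).
    """
    if phase not in ("passive", "failover"):
        raise ValueError(f"unknown phase: {phase!r}")
    enabled = phase == "failover"
    return {flag: enabled for flag in FAILOVER_FLAGS}
```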
Your Datadog contact will work with you to schedule dedicated windows for game day testing to measure performance and Recovery Time Objective (RTO).
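For game day testing, the RTO measurement can be as simple as the elapsed time between two timestamps you record. This Python sketch assumes you note when the failover starts and when live data first appears in the DDR org. The one-hour default only mirrors the "typically under an hour" figure from the overview, so set your own objective.

```python
from datetime import datetime, timedelta

# Example objective only; derive your real RTO from your business requirements.
EXAMPLE_RTO = timedelta(hours=1)


def recovery_time(failover_started: datetime, data_visible_in_ddr: datetime) -> timedelta:
    """Elapsed time between starting the failover and seeing live data in the DDR org."""
    if data_visible_in_ddr < failover_started:
        raise ValueError("data cannot be visible before the failover started")
    return data_visible_in_ddr - failover_started


def meets_rto(measured: timedelta, objective: timedelta = EXAMPLE_RTO) -> bool:
    """True if the measured recovery time is within the objective."""
    return measured <= objective
```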