# Welcome to the ouroboros-network-ops Repository
Cardano is a decentralized third-generation proof-of-stake blockchain platform and home to the ada cryptocurrency. It is the first blockchain platform to evolve out of a scientific philosophy and a research-first driven approach.
The ouroboros-network-ops project serves as a `cardano-node` test cluster for the network team.
It utilizes `flake-parts` and reusable `nixosModules` and `flakeModules` from `cardano-parts`.
## Getting Started

While working on the next step, you can already start the devshell using:

```bash
nix develop
```

This will be done automatically if you are using direnv and issue `direnv allow`.
Note that the nix version must be at least 2.17, and `fetch-closure`, `flakes`, and `nix-command` should be included in your nix config for `experimental-features`.
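For example, a minimal entry covering these features might look like the following in `/etc/nix/nix.conf` (or `~/.config/nix/nix.conf` for a per-user config):

```ini
# Merge with any experimental-features line you already have
experimental-features = nix-command flakes fetch-closure
```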
It might be handy to add dev-deployer as a remote, so you don't need to push branches via GitHub:

```bash
git remote add dev-deployer dev-deployer:./ouroboros-network-ops
```

For that to work you also need to add dev-deployer to your `~/.ssh/config`.
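A minimal sketch of such an entry is shown below; the `HostName` and `User` values are placeholders for your actual deployer details:

```
# Hypothetical ~/.ssh/config entry; substitute real values
Host dev-deployer
  HostName deployer.example.org
  User your-username
```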
## AWS

From the parent AWS org, create an IAM identity center user with your name and `AdministratorAccess` to the AWS account of the ouroboros-network-ops deployment, then store the config in `~/.aws/config` under the profile name `ouroboros-network-ops`:
```ini
[sso-session $PARENT_ORG_NAME]
sso_start_url = https://$IAM_CENTER_URL_ID.awsapps.com/start
sso_region = $REGION
sso_registration_scopes = sso:account:access

[profile ouroboros-network-ops]
sso_session = $PARENT_ORG_NAME
sso_account_id = $REPO_ARN_ORG_ID
sso_role_name = AdministratorAccess
region = $REGION
```
The `$PARENT_ORG_NAME` can be set to whatever you prefer, e.g. `ioe`. The `$IAM_CENTER_URL_ID` and `$REGION` will be provided when creating your IAM identity center user, and the `$REPO_ARN_ORG_ID` can be obtained in AWS as the account number of the ouroboros-network-ops, aka network-team, org ARN.
If your AWS setup uses a flat or single org structure, adjust IAM identity center account access and the above config to reflect this. The above config can also be generated from the devshell using:

```bash
aws configure sso
```
When adding additional profiles, the `aws configure sso` command may create duplicate sso-session blocks or cause other issues, so manually adding new profiles is preferred. Before accessing or using ouroboros-network-ops org resources, you will need to start an sso session, which can be done with:

```bash
just aws-sso-login
```
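To verify that the session and profile work, one option is the standard AWS CLI identity check (a generic AWS command, not a repo recipe):

```bash
# Prints the account and role ARN when the sso session is valid
aws sts get-caller-identity --profile ouroboros-network-ops
```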
While the identity center approach above gives session-based access, it also requires periodic manual session refreshes, which are more difficult to accomplish on a headless system or shared deployer. For those use cases, IAM of the ouroboros-network-ops organization can be used to create an AWS user with your name and the `AdministratorAccess` policy. With this approach, create an access key set for your IAM user and store it in `~/.aws/credentials` under the profile name `ouroboros-network-ops`:
```ini
[ouroboros-network-ops]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```
No session initiation will be required and AWS resources in the org can be used immediately.

In the case that a profile exists in both `~/.aws/config` and `~/.aws/credentials`, the `~/.aws/config` sso definition will take precedence.
## Admin Age Key

While cluster secrets shared by all machines are generally handled using AWS KMS, per-machine secrets are handled using sops-nix age. However, an admin age key is still typically desired so that all per-machine secrets can be decrypted by an admin or SRE. A new age admin key can be generated with `age-keygen` and should be placed in `~/.age/credentials`:

```
# ouroboros-network-ops: sre
AGE-SECRET-KEY-***********************************************************
```
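A minimal sketch of generating the key; only the `AGE-SECRET-KEY-...` line, plus an identifying comment like the one above, needs to end up in the credentials file:

```bash
mkdir -p ~/.age
# Prints the keypair to stdout; copy the AGE-SECRET-KEY-... line into ~/.age/credentials
age-keygen
```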
## Cloudformation

We bootstrap our infrastructure using AWS Cloudformation, which creates resources such as S3 buckets, a DNS zone, a KMS key, and OpenTofu state storage.

The distinction between what is managed by Cloudformation and what is managed by OpenTofu is not very strict, but generally anything that is not of the mentioned resource types goes into OpenTofu, since such resources are harder to configure and reuse otherwise.

All configuration is in `./flake/cloudFormation/terraformState.nix`.

We use Rain to apply the configuration. There is a wrapper that evaluates the config and deploys it:

```bash
just cf terraformState
```
When arranging DNS zone delegation, the nameservers to delegate to are shown with:

```bash
just show-nameservers
```
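Once delegation has been configured at the parent zone, it can be sanity-checked with standard DNS tooling, assuming `$ZONE` is your deployment's DNS zone:

```bash
# Should return the same nameservers that `just show-nameservers` printed
dig +short NS "$ZONE"
```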
## OpenTofu

We use OpenTofu to create AWS instances, roles, profiles, policies, Route53 records, EIPs, security groups, and similar resources.

All monitoring dashboards, alerts, and recording rules are configured in `./flake/opentofu/grafana.nix`.

All other cluster resource configuration is in `./flake/opentofu/cluster.nix`.

The wrapper to set up the state and workspace, evaluate the config, and run tofu for cluster resources is:

```bash
just tofu [cluster] plan
just tofu [cluster] apply
```

Similarly, for monitoring resources:

```bash
just tofu grafana plan
just tofu grafana apply
```
## SSH

If your credentials are correct and the cluster is already provisioned with OpenTofu infrastructure, you will be able to access SSH after creating an `./.ssh_config` and nix ip module information using:

```bash
just save-ssh-config
just update-ips
```
With that you can then get started with:

```bash
# List machines
just list-machines

# Ssh to a newly provisioned machine
just ssh-bootstrap $MACHINE

# Ssh to a machine already deployed
just ssh $MACHINE

# Find many other operations recipes to use
just --list
```
Behind the scenes ssh is using AWS SSM, so no open port 22 is required. In fact, the default template for a cardano-parts repo does not open port 22 for ingress on security groups.

For special use cases which still utilize port 22 ingress for ssh, ipv4 or ipv6 ssh_config can be used by appending `.ipv4` or `.ipv6` to the target hostname.
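As a hedged example, assuming the generated `./.ssh_config` defines these suffixed host aliases and port 22 ingress is open for your use case, a direct connection might look like:

```bash
# Hypothetical machine name; forces the ipv4 entry from the generated ssh_config
ssh -F .ssh_config mainnet1-rel-au-1.ipv4
```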
## Colmena Deployment

To deploy changes on an OS level, we use the excellent Colmena.

All colmena configuration is in `./flake/colmena.nix`.

To deploy a machine for the first time:

```bash
just apply-bootstrap $MACHINE
```

To subsequently deploy a machine:

```bash
just apply $MACHINE
```
## Secrets

Secrets are encrypted using SOPS with KMS and age.

All secrets live in `./secrets/`.

KMS encryption is generally used for secrets intended to be consumed by all machines, as it has the benefit over age encryption of not needing re-encryption every time a machine in the cluster changes. To sops encrypt a secret file intended for all machines with KMS:

```bash
sops --encrypt \
  --kms "$KMS" \
  --config /dev/null \
  --input-type binary \
  --output-type binary \
  $SECRET_FILE \
  > secrets/$SECRET_FILE.enc
rm $SECRET_FILE
```
For per-machine secrets, age encryption is preferred, where each secret is typically encrypted only for the target machine and an admin such as an SRE.

Age public and private keys will be automatically derived for each deployed machine from the machine's `/etc/ssh/ssh_host_ed25519_key` file. Therefore, no manual generation of private age keys for machines is required, and the public age key for each machine is printed during each colmena deployment, for example:

```
> just apply machine
...
machine | sops-install-secrets: Imported /etc/ssh/ssh_host_ed25519_key as age key with fingerprint $AGE_PUBLIC_KEY
...
```

These machine public age keys become the basis for access assignment of per-machine secrets declared in `.sops.yaml`.
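As a hypothetical sketch, a per-machine rule in `.sops.yaml` might look like the following; the path and keys are placeholders, and the actual `.sops.yaml` in this repo is authoritative:

```yaml
creation_rules:
  # Per-machine secrets: encrypted for the machine's age key plus the sre admin key
  - path_regex: secrets/machine1/.*
    key_groups:
      - age:
          - age1machine... # machine1 public key
          - age1admin...   # sre admin public key
```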
A machine's age public key can also be generated on demand:

```bash
just ssh machine -- "'ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub'"
```
A KMS or age sops secret file can generally be edited using:

```bash
sops ./secrets/github-token.enc
```

Or simply decrypt a KMS or age sops secret with:

```bash
sops -d ./secrets/github-token.enc
```

In cases where the decrypted data is in json format, the sops args `--input-type binary --output-type binary` may also be required so that sops treats the file as an opaque blob rather than attempting structured json decryption.
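For example, to decrypt such a json-formatted secret as an opaque blob:

```bash
sops -d --input-type binary --output-type binary ./secrets/github-token.enc
```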
See also the related sops encryption and decryption recipes:

```bash
just sops-decrypt-binary "$FILE"          # Decrypt a file to stdout using .sops.yaml rules
just sops-decrypt-binary-in-place "$FILE" # Decrypt a file in place using .sops.yaml rules
just sops-encrypt-binary "$FILE"          # Encrypt a file in place using .sops.yaml rules
just sops-rotate-binary "$FILE"           # Rotate sops encryption using .sops.yaml rules
```
## Monitoring

Grafana monitoring of nodes managed by the networking team is available here.

Grafana templates are available here. Note that one can modify a dashboard in Grafana, download the JSON model, and commit it in this repo. To deploy dashboard changes, ssh to dev-deployer, go to the ouroboros-network-ops directory, and execute:

```bash
just tf grafana apply
```
## Updating cardano-parts

This deployment uses cardano-parts, which periodically releases new features and `cardano-node` versions. Therefore, it's necessary to update the cardano-parts version in this repository. This section provides a step-by-step guide to perform the update.
Before starting, ensure that you have the appropriate environment:

- Using the dev-deployer machine: the direnv tool will automatically load the Nix development shell.
- Using a local copy: if working from a local repository, run the following command to enable the necessary Nix features:

  ```bash
  nix --extra-experimental-features fetch-closure develop
  ```

Note: Updating from the dev-deployer machine is recommended because you can immediately test the changes. However, updating from a local copy makes opening a PR easier, as the dev-deployer machine lacks GitHub credentials.

Note: For those working regularly in this repo, the `fetch-closure` experimental feature can be added to the `nix.conf` file so it doesn't need to be included manually on the cli each time. This would then also allow using direnv to automatically load the nix develop default shell when entering the local repo directory.
### Check the Current Version

To check the current version of cardano-parts, run:

```bash
nix flake metadata | grep cardano-parts
```

This command will return output similar to the following:

```
├───cardano-parts: github:input-output-hk/cardano-parts/01f54d8feac449f846f33988f7d74711a1b856da
│ │ ├───nixpkgs follows input 'cardano-parts/nixpkgs'
│ │ ├───nixpkgs follows input 'cardano-parts/nixpkgs'
│ │ │ └───nixpkgs follows input 'cardano-parts/haskell-nix/hydra/nix/nixpkgs'
│ │ ├───nixpkgs follows input 'cardano-parts/haskell-nix/nixpkgs-unstable'
│ │ └───stackage follows input 'cardano-parts/empty-flake'
│ └───nixpkgs follows input 'cardano-parts/nixpkgs'
├───flake-parts follows input 'cardano-parts/flake-parts'
├───nixpkgs follows input 'cardano-parts/nixpkgs'
└───nixpkgs-unstable follows input 'cardano-parts/nixpkgs-unstable'
```

At the time of writing, the repository uses the cardano-parts revision `01f54d8feac449f846f33988f7d74711a1b856da`.
To see which pull request (PR) corresponds to the current revision:

- Visit the cardano-parts `main` commits page.
- Locate the commit that matches your current revision, or go directly to the commit URL for the current version.

For example, revision `01f54d8feac449f846f33988f7d74711a1b856da` corresponds to PR #48.

Once you've identified the current PR, determine which PRs have been merged since. For instance, PR #48 is followed by the unfinished PR #49. We'll focus on updating based on the latest merged PR (#48).
### Update to the Latest Version

To update cardano-parts to the latest version, run:

```bash
nix flake update cardano-parts
```

This updates to the latest version from the `main` branch. If you want to point to a specific branch or revision, manually update the `flake.nix` file.
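For instance, a hypothetical pin to a specific branch or revision might look like this in `flake.nix` (the branch name here is a placeholder):

```nix
cardano-parts.url = "github:input-output-hk/cardano-parts/<branch-or-rev>";
```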
After updating, you can use the following commands to verify changes:

- Run `menu` to confirm that the `cardano-node` version has changed.
- Run `just` to list useful commands.
### Review and Apply Template Changes

Review the PR details for any changes or recommendations. For example, PR #48 modifies the following files:

```
Justfile # Adds IPv6 recipe support
flake/colmena.nix # Adds staticIpv6 declaration
flake/opentofu/cluster.nix # Adds IPv6 Terraform support
flake/opentofu/grafana/alerts/cardano-node-divergence.nix-import # Updates alerts for new tracing metrics
flake/opentofu/grafana/alerts/cardano-node-forge.nix-import # Updates alerts for new tracing metrics
flake/opentofu/grafana/alerts/cardano-node-quality.nix-import # Updates alerts for new tracing metrics
flake/opentofu/grafana/alerts/cardano-node.nix-import # Updates alerts for new tracing metrics
flake/opentofu/grafana/dashboards/cardano-node-new-tracing.json # Updates dashboard for new tracing metrics
flake/opentofu/grafana/dashboards/cardano-node-p2p-new-tracing.json # Updates dashboard for new tracing metrics
flake/opentofu/grafana/dashboards/cardano-performance-new-tracing.json # Updates dashboard for new tracing metrics
```
Focus on inspecting changes to `flake/colmena.nix` and `flake/opentofu/cluster.nix`, as these are important. Use the following commands to review and apply changes:

- To diff a file, use:

  ```bash
  just template-diff "$FILE"
  ```

- To apply changes if the diff looks correct, use:

  ```bash
  just template-patch "$FILE"
  ```

For other files, you can clone them directly by running:

```bash
just template-clone "$FILE"
```
Note: Some files listed in the PR may not exist in your deployment. These can typically be ignored unless explicitly mentioned in the PR description.
### Deploy the Changes

After updating the necessary files, apply the changes with:

```bash
just apply
```

For specific machines, use something similar to:

```bash
just apply "'mainnet*'"
```

Important: If changes were made to `opentofu` or `grafana` files, you also need to run:

```bash
just tf grafana apply
```

Important: If changes were made to cluster resources, such as aws ec2, route53, etc., run:

```bash
just tf apply # or: just tf cluster apply
```
After successfully applying the changes, open a PR with the updated files.

Remember: If you're working from the dev-deployer machine, do not commit the `ips-DONT-COMMIT` file.
## Customize deployment
Nix is a wonderful declarative configuration language, but it can be tricky to understand where things come from, due to it being untyped and due to all the implicit arguments and abstraction layers. This section explains how one can update the deployment to run a custom `cardano-node` version or a custom `cardano-node` configuration.
### Custom `cardano-node` Version
The configuration entry point is the `./flake.nix` file, which defines inputs
and outputs for the deployment. Currently, the file includes local pins for
customizations, as shown below:
```nix
inputs = {
nixpkgs.follows = "cardano-parts/nixpkgs";
nixpkgs-unstable.follows = "cardano-parts/nixpkgs-unstable";
flake-parts.follows = "cardano-parts/flake-parts";
cardano-parts.url = "github:input-output-hk/cardano-parts";
# Local pins for additional customization:
cardano-node-8-12-2.url = "github:IntersectMBO/cardano-node/8.12.2";
iohk-nix-8-12-2.url = "github:input-output-hk/iohk-nix/577f4d5072945a88dda6f5cfe205e6b4829a0423";
};
```
To add a custom node version, simply include a new input pointing to the desired branch or revision. If you're using an older `cardano-node` version, you might also need to pin an appropriate version of `iohk-nix`.

Add your custom inputs in the `flake.nix` file. For example, to add a custom `cardano-node` version:

```nix
cardano-node-<version>.url = "github:<your-organization>/cardano-node/<version>";
```
If needed, also add the corresponding `iohk-nix` pin for compatibility with older versions.

The `flake.lock` file needs to be updated if the `<version>` changes. This can be done with:

```bash
nix flake update cardano-node-<version>
```
Next, update the `./flake/colmena.nix` file, which contains the configuration for each individual machine. Create a new variable for your custom node version, for example:
```nix
node8-12-2 = {
  imports = [
    config.flake.cardano-parts.cluster.groups.default.meta.cardano-node-service
    inputs.cardano-parts.nixosModules.profile-cardano-node-group
    inputs.cardano-parts.nixosModules.profile-cardano-custom-metrics
    {
      cardano-parts.perNode = {
        lib.cardanoLib = config.flake.cardano-parts.pkgs.special.cardanoLibCustom inputs.iohk-nix-8-12-2 "x86_64-linux";
        pkgs = {
          inherit
            (inputs.cardano-node-8-12-2.packages.x86_64-linux)
            cardano-cli
            cardano-node
            cardano-submit-api
            ;
        };
      };
    }
  ];
};
```
This is based on the default `node` configuration, with custom inputs for the specified node version.

If you don't need to include previous `cardano-node` versions, you can simplify the configuration by omitting unused packages. For example:
```nix
node-tx-submission = {
  imports = [
    # Base cardano-node service
    config.flake.cardano-parts.cluster.groups.default.meta.cardano-node-service

    # Config for cardano-node group deployments
    inputs.cardano-parts.nixosModules.profile-cardano-node-group
    inputs.cardano-parts.nixosModules.profile-cardano-custom-metrics
    {
      cardano-parts.perNode = {
        pkgs = {
          inherit
            (inputs.cardano-node-tx-submission.packages.x86_64-linux)
            cardano-node
            ;
        };
      };
    }
  ];
};
```
You'll need to update the `./flake.nix` file to include such an input.
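For example, a hypothetical input matching the `node-tx-submission` variable above, where the branch or revision is a placeholder you'd fill in:

```nix
cardano-node-tx-submission.url = "github:IntersectMBO/cardano-node/<your-branch-or-rev>";
```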
Once the custom node variable is instantiated, update the machine configurations. At the end of the `./flake/colmena.nix` file, you'll see entries similar to:
```nix
mainnet1-rel-au-1 = {imports = [au m6i-xlarge (ebs 300) (group "mainnet1") node rel topoAu];};
mainnet1-rel-br-1 = {imports = [br m6i-xlarge (ebs 300) (group "mainnet1") node rel topoBr];};
mainnet1-rel-eu3-1 = {imports = [eu3 m6i-xlarge (ebs 300) (group "mainnet1") node-tx-submission rel topoEu3];};
mainnet1-rel-jp-1 = {imports = [jp m6i-xlarge (ebs 300) (group "mainnet1") node rel topoJp];};
mainnet1-rel-sa-1 = {imports = [sa m6i-xlarge (ebs 300) (group "mainnet1") node rel topoSa];};
mainnet1-rel-sg-1 = {imports = [sg m6i-xlarge (ebs 300) (group "mainnet1") node rel topoSg];};
mainnet1-rel-us1-1 = {imports = [us1 m6i-xlarge (ebs 300) (group "mainnet1") node-tx-submission rel topoUs1];};
mainnet1-rel-us2-1 = {imports = [us2 m6i-xlarge (ebs 300) (group "mainnet1") node-tx-submission rel topoUs2];};
```
Here, I've replaced `node` with `node-tx-submission` for the machines where that configuration is required. You should make similar substitutions as needed for your deployment.

If you want to adjust any configuration parameters, this section of the file is where you should make changes. Update the machine definitions as necessary to meet your specific requirements.