This repository was archived by the owner on Sep 21, 2023. It is now read-only.

vaultStatus - Active node: pod doesn't exist any more #321

Open
sebdoido opened this issue Jun 4, 2018 · 1 comment

Comments


sebdoido commented Jun 4, 2018

Hi there,

I have an issue with my Vault cluster.

Currently, I no longer have an active Vault pod.

kubectl -n default get vault poc-vault -o jsonpath='{.status.vaultStatus.active}' | xargs -0 -I {} kubectl -n default port-forward {} 8200
Error from server (NotFound): pods "poc-vault-d66f8f747-24f5m" not found
k get pods | grep vault

poc-vault-d66f8f747-lqt76                                      1/2       Running   0          2d
poc-vault-d66f8f747-rgdth                                      1/2       Running   2          2d
poc-vault-etcd-2q9tl8m4hn                                      1/1       Running   0          3d
poc-vault-etcd-gqtp5sgs4p                                      1/1       Running   0          2d
poc-vault-etcd-x6ht4ckrpz                                      1/1       Running   0          2d
kubectl -n default get vault poc-vault -o json
{
    "apiVersion": "vault.security.coreos.com/v1alpha1",
    "kind": "VaultService",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"vault.security.coreos.com/v1alpha1\",\"kind\":\"VaultService\",\"metadata\":{\"annotations\":{},\"name\":\"poc-vault\",\"namespace\":\"default\"},\"spec\":{\"TLS\":{\"static\":{\"clientSecret\":\"vault-client-tls\",\"serverSecret\":\"vault-server-tls\"}},\"nodes\":2,\"version\":\"0.9.1-0\"}}\n"
        },
        "clusterName": "",
        "creationTimestamp": "2018-05-28T13:55:46Z",
        "generation": 0,
        "name": "poc-vault",
        "namespace": "default",
        "resourceVersion": "5029789",
        "selfLink": "/apis/vault.security.coreos.com/v1alpha1/namespaces/default/vaultservices/poc-vault",
        "uid": "d2174cf2-627e-11e8-950f-fa163e1a205a"
    },
    "spec": {
        "TLS": {
            "static": {
                "clientSecret": "vault-client-tls",
                "serverSecret": "vault-server-tls"
            }
        },
        "baseImage": "quay.io/coreos/vault",
        "configMapName": "",
        "nodes": 2,
        "version": "0.9.1-0"
    },
    "status": {
        "clientPort": 8200,
        "initialized": true,
        "phase": "Running",
        "serviceName": "poc-vault",
        "updatedNodes": [
            "poc-vault-d66f8f747-lqt76",
            "poc-vault-d66f8f747-rgdth"
        ],
        "vaultStatus": {
            "active": "poc-vault-d66f8f747-24f5m",
            "sealed": [
                "poc-vault-d66f8f747-lqt76",
                "poc-vault-d66f8f747-rgdth"
            ],
            "standby": null
        }
    }
}

The pod listed as active in vaultStatus no longer exists...

I checked the operator logs:

k logs -f vault-operator-67d5846657-4zdfq
time="2018-06-01T14:02:44Z" level=info msg="Go Version: go1.9.2"
time="2018-06-01T14:02:44Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-06-01T14:02:44Z" level=info msg="vault-operator Version: 0.1.9"
time="2018-06-01T14:02:44Z" level=info msg="Git SHA: 43a1dd7"
ERROR: logging before flag.Parse: I0601 14:02:44.832229       1 leaderelection.go:174] attempting to acquire leader lease...
ERROR: logging before flag.Parse: I0601 14:03:02.263682       1 leaderelection.go:184] successfully acquired lease default/vault-operator
time="2018-06-01T14:03:02Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"default\", Name:\"vault-operator\", UID:\"ce489616-627b-11e8-950f-fa163e1a205a\", APIVersion:\"v1\", ResourceVersion:\"4822438\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' vault-operator-67d5846657-4zdfq became leader"
time="2018-06-01T14:03:02Z" level=info msg="starting Vaults controller"
time="2018-06-01T14:03:02Z" level=info msg="Vault CR (default/poc-vault) is created"
time="2018-06-01T14:03:12Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-rgdth): Get https://10-2-2-76.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp 10.2.2.76:8200: getsockopt: connection refused"
time="2018-06-01T14:03:22Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-rgdth): Get https://10-2-2-76.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp 10.2.2.76:8200: getsockopt: connection refused"
time="2018-06-02T01:38:44Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-24f5m): Get https://10-2-0-67.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp: i/o timeout"
time="2018-06-02T01:39:14Z" level=error msg="failed to update vault replica status: failed requesting health info for the vault pod (default/poc-vault-d66f8f747-rgdth): Get https://10-2-2-76.default.pod:8200/v1/sys/health?sealedcode=299&uninitcode=299: dial tcp: i/o timeout"
ERROR: logging before flag.Parse: W0602 13:29:46.715365       1 reflector.go:334] github.com/coreos-inc/vault-operator/pkg/operator/controller.go:35: watch of *v1alpha1.VaultService ended with: too old resource version: 5029789 (5239818) 

I deleted the two remaining pods:

k delete pod poc-vault-d66f8f747-lqt76 poc-vault-d66f8f747-rgdth

They were recreated, but poc-vault still seems to be stuck on poc-vault-d66f8f747-24f5m:

kubectl -n default get vault poc-vault -o jsonpath='{.status.vaultStatus.active}'
poc-vault-d66f8f747-24f5m
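Each remaining pod's seal state can also be checked directly over a port-forward (a sketch, not from the original report; assumes only curl is available and skips TLS verification with `-k` since the serving certs are self-signed):

```shell
# For each remaining Vault pod, port-forward and hit the health endpoint.
# By default /v1/sys/health returns HTTP 503 when sealed and 429 for standby;
# the sealedcode/uninitcode query params override those status codes.
for pod in poc-vault-d66f8f747-lqt76 poc-vault-d66f8f747-rgdth; do
  kubectl -n default port-forward "$pod" 8200 &
  pf=$!
  sleep 2
  echo "$pod:"
  curl -sk "https://localhost:8200/v1/sys/health?sealedcode=299&uninitcode=299"
  echo
  kill "$pf"
done
```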

Any ideas? How can I force the operator to re-check the other pods and elect a new active node?

Thanks for your help


cu12 commented Aug 1, 2018

Apparently you have to unseal again in order to replace the active node :/

If you run Vault with two pods, you have to unseal both. You can verify that both are unsealed by checking the vaultStatus; it should look like this:

$ kubectl -n kube-system get vault vault -o json
...
        "vaultStatus": {
            "active": "vault-98bd96bd8-8cxt9",
            "sealed": null,
            "standby": [
                "vault-98bd96bd8-jr7gf"
            ]
        }
...

In this case it fails over properly.
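The unsealing can be scripted against the CR status; a minimal sketch (assumes jq is installed and an unseal key is available in `$UNSEAL_KEY`, which is hypothetical here; note that the 0.9.x CLI uses `vault unseal`, while newer releases use `vault operator unseal`):

```shell
# Pull the list of sealed pods from the VaultService status.
sealed=$(kubectl -n default get vault poc-vault -o json \
  | jq -r '.status.vaultStatus.sealed[]?')

for pod in $sealed; do
  # Forward the Vault API port for this pod in the background.
  kubectl -n default port-forward "$pod" 8200 &
  pf=$!
  sleep 2
  # Self-signed certs, so skip verification; 0.9.x CLI syntax.
  VAULT_ADDR=https://localhost:8200 VAULT_SKIP_VERIFY=true \
    vault unseal "$UNSEAL_KEY"
  kill "$pf"
done
```

Once enough key shares have been provided to every sealed pod, one of them should take over as the active node.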
