Subconscious Networks SRE

To migrate to new machine images in HashiCorp's Nomad, we can't easily take a git-ops approach; it requires manual cluster management.

At a high level, we need to bring up a parallel set of nodes, get them configured correctly, and then turn off the original set. This is made more difficult because both Consul and Nomad use Raft consensus, so we have to manage that state, and who the leader is, carefully; otherwise we break the cluster, which causes even more manual intervention.
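As a reminder of why this needs care: Raft requires a quorum of floor(N/2)+1 voters, so the quorum target moves as servers are added and removed, and losing too many voters at once leaves the cluster without quorum.

servers  quorum  failures tolerated
3        2       1
5        3       2
6        4       2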

First, bring up a new set of nodes. In Terraform there are two instances of the application platform, blue and green. One is active and has nodes (the server_count parameter controls this); the other doesn't. Add nodes to the set that doesn't have any, along with the updated machine image. Once the apply completes, you should have new instances up and running the new machine image.
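As a rough sketch, assuming hypothetical variable names for the green set (use whatever the blue/green instances are actually called in the Terraform config):

# Hypothetical variable names; adjust to match the real blue/green configuration
terraform plan  -var="green_server_count=3" -var="green_machine_image=<new image id>"
terraform apply -var="green_server_count=3" -var="green_machine_image=<new image id>"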

Validate that things are running correctly and that the new servers have joined the peer list:

consul operator raft list-peers
consul members # this one also includes the client instances
nomad operator raft list-peers
nomad server members
nomad node status # shows client instances

consul maint -enable -reason="" puts a node into maintenance mode, which marks it as critical and removes it from DNS queries (it does not, by itself, remove the server from Consul's Raft peer list). Do this on a node before you delete it. Note that this is a stateful flag: the node won't come out of maintenance mode until you disable it with consul maint -disable. That's fine if you're deleting the node.
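For example (the reason string is just illustrative), on the node that is about to go away:

$ consul maint -enable -reason="replacing with new machine image"
$ consul maint    # with no arguments, prints the node's current maintenance status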

From there, we need to delete the old servers; in Terraform that means dropping the old set's server_count to zero and applying, as sketched below.
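Again with hypothetical variable names, assuming the same layout as the earlier sketch:

# Drop the old (blue) set to zero; the green count stays where we set it earlier
terraform apply -var="blue_server_count=0" -var="green_server_count=3"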

After that's done, tell the remaining peers that the old Nomad servers should be removed from the peer list. You can get the list of servers with nomad server members and then remove any that shouldn't be there with nomad server force-leave green-test-server-0. If that server is still up, it'll rejoin.
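For example, using the name exactly as it appears in nomad server members (which typically includes the region suffix, e.g. .global):

$ nomad server members
$ nomad server force-leave green-test-server-0.global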

For the servers, we need to make sure the new servers are part of the Raft consensus and then remove the old ones.

$ nomad operator raft list-peers
Node                        ID                                    Address            State     Voter  RaftProtocol
test-server-0.global        b3f3dd36-0c73-4a2b-ec22-46f86e07ef62  192.168.87.2:4647  follower  true   3
test-server-1.global        a0471d3e-ee9b-1153-8e9d-46d35bd6ce26  192.168.87.3:4647  follower  true   3
test-server-2.global        779beb08-25f9-1219-635c-6880cf5186e6  192.168.87.4:4647  leader    true   3
green-test-server-1.global  03313bb6-08e7-73ac-26c0-84b635750929  192.168.87.6:4647  follower  true   3
green-test-server-2.global  c8a984a6-346a-3e50-9b20-00a88ffde8a7  192.168.87.7:4647  follower  true   3
 
$ nomad operator raft remove-peer -peer-id 03313bb6-08e7-73ac-26c0-84b635750929
Removed peer with id "03313bb6-08e7-73ac-26c0-84b635750929"
 
$ nomad operator raft list-peers
Node                  ID                                    Address            State     Voter  RaftProtocol
test-server-0.global  b3f3dd36-0c73-4a2b-ec22-46f86e07ef62  192.168.87.2:4647  follower  true   3
test-server-1.global  a0471d3e-ee9b-1153-8e9d-46d35bd6ce26  192.168.87.3:4647  follower  true   3
test-server-2.global  779beb08-25f9-1219-635c-6880cf5186e6  192.168.87.4:4647  leader    true   3

If the servers keep coming back, you'll need to stop Nomad on those servers with systemctl stop nomad.
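On the old host itself (assuming the machine image runs Nomad as a systemd unit named nomad, which is what systemctl stop nomad implies):

$ sudo systemctl stop nomad
$ sudo systemctl disable nomad    # optional: keeps it from coming back on reboot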

Drain the old "client" nodes. This may take a bit.

nomad node drain -enable -yes 46f1

46f1 is a prefix of the node ID; Nomad commands accept any unique prefix.

Verify the node is ineligible for scheduling (check the Eligibility column in the output):

$ nomad node status
 
ID        Node Pool  DC   Name           Class    Drain  Eligibility  Status
0c0369a7  <none>     dc1  test-client-1  service  false  eligible     ready
1080c9d5  <none>     dc1  test-client-0  ingress  false  eligible     ready
e4567d4b  <none>     dc1  test-client-3  service  false  eligible     ready
6dcc189d  <none>     dc1  test-client-4  service  false  eligible     ready
74144982  <none>     dc1  test-client-2  service  false  eligible     ready

You can mark a node as ineligible for scheduling before draining it, so old workloads don't get placed onto a server that is about to be turned off.

nomad node eligibility -disable 46f1
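Putting it together for an old client node (46f1 again being a unique prefix of its node ID): mark it ineligible, confirm, then drain.

$ nomad node eligibility -disable 46f1
$ nomad node status    # the Eligibility column for that node should now read "ineligible"
$ nomad node drain -enable -yes 46f1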