multiregion Stanza

Placement	`job -> multiregion`

This feature requires Nomad Enterprise(opens in new tab).

The multiregion stanza specifies that a job will be deployed to multiple federated regions. If omitted, the job will be deployed to a single region — the one specified by the region field or the -region command line flag to nomad job run.

Federated Nomad clusters are members of the same gossip cluster but not the same raft cluster; they don't share their data stores. Each region in a multiregion deployment gets an independent copy of the job, parameterized with the values of the region stanza. Nomad regions coordinate to rollout each region's deployment using rules determined by the strategy stanza.

job "docs" {
  multiregion {

    strategy {
      max_parallel = 1
      on_failure   = "fail_all"
    }

    region "west" {
      count = 2
      datacenters = ["west-1"]
      meta {
        my-key = "my-value-west"
      }
    }

    region "east" {
      count = 5
      datacenters = ["east-1", "east-2"]
      meta {
        my-key = "my-value-east"
      }
    }
  }
}

Multiregion Deployment States

A single region deployment using one of the various upgrade strategies begins in the running state, and ends in the successful state, the canceled state (if another deployment supersedes it before it it's complete), or the failed state. A failed single region deployment may automatically revert to the previous version of the job if its update stanza has the auto_revert setting.

In a multiregion deployment, regions begin in the pending state. This allows Nomad to determine that all regions have accepted the job before continuing. At this point up to max_parallel regions will enter running at a time. When each region completes its local deployment, it enters a blocked state where it waits until the last region has completed the deployment. The final region will unblock the regions to mark them as successful.

`multiregion` Parameters

strategy (Strategy: nil) - Specifies a rollout strategy for the regions.
region (Region: nil) - Specifies the parameters for a specific region. This can be specified multiple times to define the set of regions for the multiregion deployment. Regions are ordered; depending on the rollout strategy Nomad may roll out to each region in order or to several at a time.

Note: Regions can be added, but regions that are removed will not be stopped and will be ignored by the deployment. This behavior may change before multiregion deployments are considered GA.

`strategy` Parameters

max_parallel (int: <optional>) - Specifies the maximum number of region deployments that a multiregion will have in a running state at a time. By default, Nomad will deploy all regions simultaneously.
on_failure (string: <optional>) - Specifies the behavior when a region deployment fails. Available options are "fail_all", "fail_local", or the default (empty ""). This field and its interactions with the job's update stanza is described in the examples below.
Each region within a multiregion deployment follows the auto_revert strategy of its own update stanza (if any). The multiregion on_failure field tells Nomad how many other regions should be marked as failed when one region's deployment fails:
- The default behavior is that the failed region and all regions that come after it in order are marked as failed.
- If on_failure: "fail_all" is set, all regions will be marked as failed. If all regions have already completed their deployments, it's possible that a region may transition from blocked to successful while another region is failing. This successful region cannot be rolled back.
- If on_failure: "fail_local" is set, only the failed region will be marked as failed. The remaining regions will move on to blocked status. At this point, you'll need to manually unblock regions to mark them successful with the nomad deployment unblock command or correct the conditions that led to the failure and resubmit the job.

For system jobs, only max_parallel is enforced. The system scheduler will be updated to support on_failure when the the update stanza is fully supported for system jobs in a future release.

`region` Parameters

The name of a region must match the name of one of the federated regions.

count (int: <optional>) - Specifies a count override for task groups in the region. If a task group specifies a count = 0, its count will be replaced with this value. If a task group specifies its own count or omits the count field, this value will be ignored. This value must be non-negative.
datacenters (array<string>: <optional>) - A list of datacenters in the region which are eligible for task placement. If not provided, the datacenters field of the job will be used.
meta - Meta: nil - The meta stanza allows for user-defined arbitrary key-value pairs. The meta specified for each region will be merged with the meta stanza at the job level.

As described above, the parameters for each region replace the default values for the field with the same name for each region.

`multiregion` Examples

The following examples only show the multiregion stanza and the other stanzas it might be interacting with.

Max Parallel

This example shows the use of max_parallel. This job will deploy first to the "north" and "south" regions. If either "north" finishes and enters the blocked state, then "east" will be next. At most 2 regions will be in a running state at any given time.

multiregion {

  strategy {
    max_parallel = 2
  }

  region "north" {}
  region "south" {}
  region "east" {}
  region "west" {}
}

Rollback Regions

This example shows the default value of on_failure. Because max_parallel = 1, the "north" region will deploy first, followed by "south", and so on. But supposing the "east" region failed, both the "east" region and the "west" region would be marked failed. Because the job has an update stanza with auto_revert=true, both regions would then rollback to the previous job version. The "north" and "south" regions would remain blocked until an operator intervenes.

multiregion {

  strategy {
    on_failure = ""
    max_parallel = 1
  }

  region "north" {}
  region "south" {}
  region "east" {}
  region "west" {}
}

update {
  auto_revert = true
}

Override Counts

This example shows how the count field override the default count of the task group. The job the deploys 2 "worker" and 1 "controller" allocations to the "west" region, and 5 "worker" and 1 "controller" task groups to the "east" region.

multiregion {

    region "west" {
      count = 2
    }

    region "east" {
      count = 5
    }
  }
}

group "worker" {
  count = 0
}

group "controller" {
  count = 1
}

Merging Meta

This example shows how the meta is merged with the meta field of the job, group, and task. A task in "west" will have the values first-key="regional-west", second-key="group-level", whereas a task in "east" will have the values first-key="job-level", second-key="group-level".

multiregion {

    region "west" {
      meta {
        first-key = "regional-west"
        second-key = "regional-west"
      }
    }

    region "east" {
      meta {
        second-key = "regional-east"
      }
    }
  }
}

meta {
  first-key = "job-level"
}

group "worker" {
  meta {
    second-key = "group-level"
  }
}