Rolls (AKS)
Cloud service provider relevance: AKS
The Ocean Roll feature synchronizes cluster infrastructure with a fresh image, user data, or security groups, without requiring you to turn off the Ocean autoscaler or manually detach nodes from the cluster.
With Ocean, you can roll a cluster with a single click. The roll takes the workloads running in the cluster into account. It freezes scale-down activity in the cluster and deploys new compute capacity to meet the demands of the workloads. While the new nodes are starting up, the cluster can still scale up as needed, so operations are not interrupted. The old nodes are scaled down only after the new nodes are confirmed to be healthy.
How It Works
Whether you are rolling your entire Ocean cluster, a specific virtual node group (VNG), or only specific nodes, Ocean can divide the roll into batches according to your selected batch sizes. For example, if you roll with the default batch size of 20%, Ocean divides the roll into 5 batches and processes as follows:
- Ocean calculates the number of batches required based on your specified batch size and distributes workloads evenly across the batches.
- Ocean begins with the first batch, replacing each node while ensuring workloads are successfully accommodated on new nodes. All relevant constraints are considered during the replacement process.
- When all nodes in a batch finish processing and at least 50% are successfully replaced, Ocean proceeds to the next batch. You can configure this percentage using the batchMinHealthyPercentage parameter (described later).
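The batch arithmetic above can be sketched in a few lines of Python. This is only an illustration of how a percentage maps to a number of batches. The function name plan_batches and the even split by node count are assumptions, and Ocean's actual distribution also takes the running workloads into account.

```python
def plan_batches(node_names, batch_size_percentage=20):
    """Split a list of nodes into batches sized by a percentage (illustrative only)."""
    if not 0 < batch_size_percentage <= 100:
        raise ValueError("batch_size_percentage must be in the range 1-100")
    if not node_names:
        return []

    # Integer ceiling of 100 / percentage, e.g. 20% -> 5 batches, 30% -> 4 batches.
    batch_count = (100 + batch_size_percentage - 1) // batch_size_percentage
    # Never plan more batches than there are nodes.
    batch_count = min(batch_count, len(node_names))

    # Spread the nodes as evenly as possible across the batches.
    base, extra = divmod(len(node_names), batch_count)
    batches, start = [], 0
    for index in range(batch_count):
        size = base + (1 if index < extra else 0)
        batches.append(node_names[start:start + size])
        start += size
    return batches


nodes = [f"node-{i}" for i in range(50)]
print([len(batch) for batch in plan_batches(nodes, 20)])  # [10, 10, 10, 10, 10]
```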
Replace Node with Smaller Nodes
A cluster roll can replace a single node with multiple smaller nodes. This avoids a cluster roll failure when only smaller node types are configured in the Ocean cluster before initiating the roll. Rather than replacing each existing node with one of the same type, Ocean provisions the most relevant infrastructure during the cluster roll. This is based on the workloads currently running on the nodes selected for rolling. This is especially helpful when you have modified the list of allowed node types or if you want to remove and replace a specific node type with multiple smaller ones.
This approach can improve cluster utilization by running workloads on infrastructure that better matches their requirements. While Ocean constantly attempts to scale down the cluster, a cluster roll can achieve better utilization when automatic scaling is not possible.
Roll Parameters
- Respect Pod Disruption Budget (PDB): Some pods may have a Pod Disruption Budget (PDB). In the Spot API, use respectPdb to instruct Ocean to verify the PDB. When respectPdb is set to True, Ocean will not replace a node if replacing it would violate the PDB.
- Respect Restrict Scale Down during Roll: By default, rolls do not consider the restrict-scale-down label, and Ocean will replace a node even if a task or pod uses this label. To have the roll honor the label, set respectRestrictScaleDown to True. Ocean's autoscaler considers all configured constraints before the roll.
- Roll Batch Size Percentage: Indicates the percentage of the cluster's target capacity that will be rolled in each batch during a node pool update or scale operation. For example, if the cluster's target capacity is 50 nodes and the Batch Size Percentage is set to 20%, each batch consists of 10 nodes (50 nodes * 20% = 10 nodes).
- Batch Size Healthy Percentage: Indicates the minimum percentage of healthy instances in a single batch. The roll fails if the percentage of healthy instances in a single batch is lower than this value. The range is 1-100; if the parameter value is null, the default value is 50%. Ocean considers instances not replaced due to PDB as healthy. You can override the behavior of the batchMinHealthyPercentage parameter by setting the ignorePdb parameter to True.
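As a rough illustration, the parameters above could be assembled and validated before a roll is requested. The keys respectPdb, respectRestrictScaleDown, and batchMinHealthyPercentage are named in this document. The keys batchSizePercentage and comment, the helper name build_roll_parameters, and the range check on the batch size are assumptions, so check the exact request shape against the Spot API reference.

```python
def build_roll_parameters(batch_size_percentage=20,
                          batch_min_healthy_percentage=None,
                          respect_pdb=False,
                          respect_restrict_scale_down=False,
                          comment=None):
    """Return a dict of roll parameters (hypothetical helper, not part of any SDK)."""
    # Assumed sanity check: a batch must cover between 1% and 100% of the capacity.
    if not 1 <= batch_size_percentage <= 100:
        raise ValueError("batch size percentage must be in the range 1-100")

    # A null value falls back to the documented default of 50%.
    if batch_min_healthy_percentage is None:
        batch_min_healthy_percentage = 50
    if not 1 <= batch_min_healthy_percentage <= 100:
        raise ValueError("batchMinHealthyPercentage must be in the range 1-100")

    parameters = {
        "batchSizePercentage": batch_size_percentage,  # assumed key name
        "batchMinHealthyPercentage": batch_min_healthy_percentage,
        "respectPdb": respect_pdb,
        "respectRestrictScaleDown": respect_restrict_scale_down,
    }
    if comment:
        parameters["comment"] = comment  # assumed key name
    return parameters


print(build_roll_parameters(respect_pdb=True, comment="monthly image refresh"))
```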
Node Status
During the replacement process, Ocean reports a status for each node:
- REPLACED: A new node successfully replaced the node.
- TO_BE_REPLACED: Ocean has not yet tried to replace the node.
- COULD_NOT_BE_REPLACED: The node was not replaced. This may occur, for example, when no replacement node becomes healthy within the grace period.
- NOT_REPLACED_DUE_TO_PDB: Replacing the node violates the PDB configuration on one of the pods running on the node. This status is only relevant when respectPdb is set to True. If a node cannot be replaced due to PDB and the allowed PDB percentage of nodes for the batch is respected, Ocean continues to the next batch.
- NOT_REPLACED_DUE_TO_RSD: Replacing the node violates the Restrict Scale Down configuration on one of the pods running on the node. This status is only relevant when respectRestrictScaleDown is set to True. If a node cannot be replaced due to Restrict Scale Down and the allowed Restrict Scale Down percentage of nodes for the batch is respected, Ocean continues to the next batch.
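The relationship between node statuses and the batch health threshold can be sketched as follows. This is a minimal illustration, not Ocean's implementation. It counts REPLACED and NOT_REPLACED_DUE_TO_PDB as healthy, as described under Roll Parameters. Treating NOT_REPLACED_DUE_TO_RSD as not healthy is an assumption made here, and the function name batch_passes is hypothetical.

```python
# Statuses counted as healthy. PDB-blocked nodes count as healthy per the
# Batch Size Healthy Percentage description; RSD handling is an assumption.
HEALTHY_STATUSES = {"REPLACED", "NOT_REPLACED_DUE_TO_PDB"}


def batch_passes(node_statuses, batch_min_healthy_percentage=50):
    """Return True if a finished batch meets the minimum healthy percentage."""
    if not node_statuses:
        raise ValueError("a batch must contain at least one node")
    if "TO_BE_REPLACED" in node_statuses:
        # The batch has not finished processing yet, so it cannot be evaluated.
        return False

    healthy = sum(1 for status in node_statuses if status in HEALTHY_STATUSES)
    healthy_percentage = 100 * healthy / len(node_statuses)
    return healthy_percentage >= batch_min_healthy_percentage


batch = ["REPLACED", "NOT_REPLACED_DUE_TO_PDB",
         "COULD_NOT_BE_REPLACED", "COULD_NOT_BE_REPLACED"]
print(batch_passes(batch))  # True: 2 of 4 nodes (50%) count as healthy
```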
Roll Status
These are the roll statuses:
- IN_PROGRESS: Nodes are being replaced successfully.
- FAILED: An error caused the roll to fail. An error message is recorded in the Elastilog.
- STOPPED: The roll was manually stopped. When you stop a roll, the nodes remain in the state they were in at the stop time (that is, there is no rollback to the initial state).
- COMPLETED: All nodes have been processed, and at least 50% have been successfully replaced.
In the UI console, a specific batch may appear with a Pending state. This means that even though the roll process has started, that batch has not yet started to replace its nodes.
Log Messages
These messages are recorded in the log:
- Roll ${ROLL_ID} has completed successfully.
- Roll ${ROLL_ID} has failed. Reason "${FAILURE_REASON}".
- Roll ${ROLL_ID} has started. Number of batches "${NUM_OF_BATCHES}".
- Roll ${ROLL_ID} has stopped.
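If you export these messages and want to process them, the four formats can be matched with regular expressions. The sketch below assumes the messages appear exactly as listed and that a roll ID contains no whitespace. The function name parse_roll_log and the sample roll ID are hypothetical.

```python
import re

# One pattern per documented message format.
ROLL_LOG_PATTERNS = {
    "started": re.compile(
        r'^Roll (?P<roll_id>\S+) has started\. Number of batches "(?P<batches>\d+)"\.$'),
    "completed": re.compile(r'^Roll (?P<roll_id>\S+) has completed successfully\.$'),
    "failed": re.compile(r'^Roll (?P<roll_id>\S+) has failed\. Reason "(?P<reason>.*)"\.$'),
    "stopped": re.compile(r'^Roll (?P<roll_id>\S+) has stopped\.$'),
}


def parse_roll_log(message):
    """Return the event type and captured fields, or None if nothing matches."""
    text = message.strip()
    for event, pattern in ROLL_LOG_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return {"event": event, **match.groupdict()}
    return None


# "roll-1234" is a made-up ID used only for illustration.
print(parse_roll_log('Roll roll-1234 has started. Number of batches "5".'))
```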
These are reasons for failure:
- The roll has been stuck in the same status for too long.
- The Ocean Controller is not active.
- More than 50 percent of nodes could not be replaced.
- There may be constraint or configuration mismatches such as labels, selectors, taints, or affinity rules.
- There may be one or more unhealthy nodes.
- Kubernetes version not supported.
Roll from Spot API
You can schedule a roll in the Create Cluster or Update Cluster Spot API using a cron expression. This lets you run the roll easily during off hours.
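For orientation, the sketch below breaks a cron expression into its fields. It assumes the standard five-field cron format (minute, hour, day of month, month, day of week); confirm the accepted format and time zone in the Spot API reference. The helper name split_cron is hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected a five-field cron expression")
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * 0" means 02:00 every Sunday in standard cron syntax.
print(split_cron("0 2 * * 0"))
```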
Roll per Cluster, Virtual Node Group, or Node Pool
Ocean virtual node groups are subsets of nodes on a cluster that you can configure for specific purposes within a single Ocean cluster, for example:
- Separate development, test, and production environments.
- Different teams.
- Different applications or microservices.
In AKS, nodes with the same configuration are grouped into node pools, which contain the underlying VMs that run your applications.
In Ocean, each virtual node group (VNG) manages its own set of node pools. A virtual node group can contain multiple node pools, but a node pool belongs to only one virtual node group.
AKS is responsible for launching the VMs with the given configuration and registering them with the cluster. Ocean uses the node pool data to get information about a VM.
The Spot API lets you roll one or more nodes in a virtual node group without having to roll the entire cluster, for example, when you do not want to roll the entire cluster for a local software update. You do this by specifying a list of node IDs or a specific VNG ID.
The virtual node group parameter initiates a roll of one or more virtual node groups in the cluster. When you specify a VNG ID, all the nodes in that virtual node group are rolled.
Similarly, the Spot API lets you roll one or more node pools without rolling the entire cluster. Do this by specifying a list of node pool IDs or a specific node pool ID.
The node pool parameter initiates a roll of one or more node pools in the cluster. When you specify a node pool ID, all the nodes in that node pool are rolled.
Roll from Console
Access the Ocean Cluster Rolls Tab
To access the Ocean Cloud Cluster Rolls tab:
- In the left main menu, click Ocean, and click Cloud Clusters.
- Select a cluster from the list of clusters.
- Click Rolls.
In the Rolls tab, you can run immediate rolls for your clusters, virtual node groups, and node pools or schedule your cluster and VNG rolls.
The Rolls tab is empty if you have not run or scheduled a roll in this cluster.
Otherwise:
- If at least one roll exists, the Rolls History list appears.
- Configured roll schedules appear in the Scheduled Rolls list below the Rolls History list.
The Rolls History list contains an entry for each roll.
These are the columns:
- Roll ID (unique ID for the roll).
- Roll Scope (cluster, virtual node group, or node pool).
- Comments (optional).
- Start Time for roll: mm/dd/yyyy, hh:mm:ss
- End Time for roll: mm/dd/yyyy, hh:mm:ss
- Nodes Rolled: the number of nodes rolled, shown as x out of y (for example, 20/23).
- Roll Status:
  - Green color: Completed: Roll successfully completed.
  - Orange color: Partly completed: At least one node could not be replaced.
  - Gray color: Stopped: Roll was stopped.
  - Red color: None of the nodes could be replaced.
Click a down arrow for an entry to drill down for information at the node level:
- Node Name.
- Node ID.
- Node Pool Name.
- VNG Name: Click on the link to access settings for the virtual node group.
- VNG ID.
- Batch Number: Number of the batch that was run.
- Node Status:
  - Green color: Completed: Node was replaced.
  - Red color: Node could not be replaced.
The Scheduled Rolls list contains an entry for each schedule.
The columns are as follows:
- Roll Scope (cluster, virtual node group, or node pool).
- Scheduled frequency.
Roll Now
To roll immediately:
- From the Rolls tab: If this is your first roll, click either Cluster, Virtual Node Group, or Node Pool.
  -OR-
  Click Cluster Roll, VNG Roll, or Node Pool Roll from the Create Roll menu on the right of the screen.
Alternative options for starting a roll:
- From the cloud cluster, virtual node groups tab: Select a virtual node group from the list, and then select VNG Roll from the Actions drop-down menu at the top-right of the screen.
- From the cloud cluster overview tab, select Cluster Roll from the Actions menu at the top-right of the screen.
Note: The dialog box that appears depends on what you selected to roll.
- If you are rolling virtual node groups or node pools, select from the menu at the top of the dialog box. You can optionally select All.
- Configure the Roll Parameters:
  - Set the size of a roll batch (%).
  - Set the batch size healthy percentage (%).
  - Add an optional comment.
  - Turn on or turn off Respect Pod Disruption Budget (PDB).
  - Turn on or turn off Respect Restrict Scale Down.
- Click Roll Cluster / VNG / Node Pool.
Note: To stop a roll while it is running, click the Stop Roll button on the right of the screen, and then click Stop Roll in the confirmation box.
Create a Roll Schedule
You can schedule cluster or virtual node group rolls. You cannot schedule node pool rolls.
To create a roll schedule:
- To create your first roll schedule, click Schedule a Roll.
  -OR-
  From the Create Roll drop-down menu on the right of the screen, click Schedule Roll.
- In the first step of the wizard, select the roll type. The available roll types depend on your system deployment.
- In the second step of the wizard, if you are rolling virtual node groups or node pools, select from the drop-down menu at the top of the dialog box. You can optionally select All.
- Configure the Roll Parameters:
  - Configure the size of a roll batch (%).
  - Configure the batch size healthy percentage (%).
  - Add an optional comment.
  - Turn on or turn off Respect Pod Disruption Budget (PDB).
  - Turn on or turn off Respect Restrict Scale Down.
- In the third step of the wizard, set the schedule frequency using the day/week/month/time controls, or type in a cron expression.
- Click Schedule Roll. Your schedule appears in the Scheduled Rolls list on the Rolls tab, below the Rolls History list.
Turn a Scheduled Roll On or Off
- To the right of the scheduled roll, move the slider to the right (turn on) or to the left (turn off).
Delete a Scheduled Roll
- Click the wastebasket icon to the right of the scheduled roll.
- When the confirmation message appears, type "Delete" and then click Delete.