Modular Cluster Topology: Upgrade a Multinode Environment with the ISO and Template#

Important

  • When upgrading from an existing Modular Cluster Topology that was available since VOSS Automate 21.1, use the steps listed here.

  • Before upgrading to release 24.1:

    Install EKB-21455-21.4.0_patch.script first. Refer to MOP-EKB-21455-21.4.0_patch.pdf.

    • Server Name: https://voss.portalshape.com

    • Path: Downloads > VOSS Automate > 24.1 > Upgrade > ISO

    • MOP: MOP-EKB-21455-21.4.0_patch.pdf

    • Patch File: EKB-21455-21.4.0_patch.script

  • Before upgrading to release 24.1, ensure that:

    • an additional 70 GB disk is available for the Insights database

    • all application and database nodes memory allocation is 32 GB with 32 GB reservation

    See: Adding Hard Disk Space and VOSS Automate Hardware Specifications.

    This disk must be assigned to the insights-voss-sync:database mount point. See: Mount the Insights disk (outside, after Maintenance Window).

  • Before upgrading to release 24.1, ensure that sufficient time is allocated to the maintenance window. This may vary according to your topology and the number of devices and subscribers.

    The information below serves as a guideline; VOSS support can be contacted if further guidance is required:

    • Cluster upgrade: 4h

    • Template install: 2.5h

    • For a 500K Data User system (13Mil RESOURCE documents), the expected upgrade_db step is about 12h.

    • For a 160K Data User system (2.5Mil RESOURCE documents), the expected upgrade_db step is about 2.5h.

    You can follow the progress on the Admin Portal transaction list.

  • Tasks that are marked Prior to Maintenance Window can be completed a few days before the scheduled maintenance window, so that VOSS support can be contacted if needed and to reduce downtime.

The standard screen command should be used where indicated. See: Using the screen command.
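For reference, a minimal sketch of typical screen handling during the upgrade; the session name upgrade is illustrative, and passing options through the platform CLI screen command is an assumption (plain screen, as shown in the steps, is sufficient otherwise):

  # Start a named screen session for the upgrade activity (name is illustrative)
  screen -S upgrade

  # Detach without stopping the session: press Ctrl-a d
  # Reattach later, for example after an SSH disconnect:
  screen -r upgrade

  # List existing sessions if the name is unknown:
  screen -ls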

Primary database and application node in a Modular Cluster Topology

  • Verify the primary application node (UN2) with the cluster primary role application command run on the node. The output should be true, for example:

    platform@UN2:~$ cluster primary role application
    is_primary: true
    
  • Verify the primary database node (UN1) with the cluster primary role database command run on the node. The output should be true, for example:

    platform@UN1:~$ cluster primary role database
    is_primary: true
    

Download Files and Check (Prior to Maintenance Window)#

Note

Ensure that the .iso file is available on all nodes.

Description and Steps

Notes and Status

VOSS files: https://voss.portalshape.com > Downloads > VOSS Automate > XXX > Upgrade

Download .iso and .template files, where XXX is the release number.

  • Transfer the .iso file to the media/ folder of all nodes.

  • Transfer the .template file to the media/ folder of the primary application node.

Two transfer options:

Either using SFTP:

  • sftp platform@<unified_node_hostname>

  • cd media

  • put <upgrade_iso_file>

  • put <upgrade_template_file>

Or using SCP:

  • scp <upgrade_iso_file> platform@<unified_node_ip_address>:~/media

  • scp <upgrade_template_file> platform@<unified_node_ip_address>:~/media
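A worked example of the SCP option; the file names, release number, and node addresses below are illustrative only:

  # Copy the upgrade ISO to the media/ folder of each node (repeat per node)
  scp VOSS-Automate-24.1.iso platform@192.0.2.21:~/media

  # Copy the template file to the primary application node only
  scp VOSS-Automate-24.1.template platform@192.0.2.22:~/media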

Verify that the .iso image and .template file were copied:

  • ls -l media/

Verify that the checksums match the original .sha256 checksums on the download site.

  • primary database node: system checksum media/<upgrade_iso_file>

    Checksum: <SHA256>

  • primary application node: system checksum media/<upgrade_template_file>

    Checksum: <SHA256>
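A minimal verification sketch; the file name is illustrative. The value reported by system checksum must match the .sha256 value published on the download site, and can also be computed locally with the standard sha256sum utility before the transfer:

  # On the download workstation, before transferring the file
  sha256sum VOSS-Automate-24.1.iso

  # On the primary database node, after the transfer
  system checksum media/VOSS-Automate-24.1.iso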

Version Check (Prior to Maintenance Window)#

Description and Steps

Notes and Status

Version

Record the current version information. This is required for upgrade troubleshooting.

  • Log in on the Admin Portal and record the information contained in the menu: About > Version

Security and Health Check Steps (Prior to Maintenance Window)#

Description and Steps

Notes and Status

Choose an option:

  • If you’re upgrading from: [21.4-PB4, 21.4-PB5]

    Place the system in maintenance mode to suspend any scheduled transactions. Scheduled transactions that are in progress will be allowed to complete; alternatively, cancel data sync transactions that are in progress on the GUI (refer to the Core Feature Guide). For details, refer to the System Maintenance Mode topic in the Platform Guide.

    On an application node of the system, run:

    cluster maintenance-mode start

    You can verify the maintenance mode status with:

    cluster maintenance-mode status

  • If you’re upgrading from: [21.4, 21.4-PB1, 21.4-PB2, 21.4-PB3]

    Turn off any scheduled imports to prevent syncs triggering part way through the upgrade.

    Note

    Schedules can easily be activated and deactivated from the Bulk Schedule Activation / Deactivation menu available on the MVS-DataSync-Dashboard.

    Two options are available:

    Individually for each job:

    1. Log in on the Admin Portal as a high level administrator above Provider level.

    2. Select the Scheduling menu to view scheduled jobs.

    3. Click each scheduled job. On the Base tab, uncheck the Activate check box.

    Mass modify:

    1. On the Admin Portal, export scheduled syncs into a bulk load sheet.

    2. Modify the schedule settings to de-activate scheduled syncs.

    3. Import the sheet.

    Schedules enabled on the CLI:

    1. Run schedule list to check if any schedules exist and overlap with the maintenance window.

    2. For overlapping schedules, disable them: run schedule disable <job-name> (a minimal sketch follows below).
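
    A minimal sketch of the CLI schedule handling; the job name data-sync-cucm is hypothetical:

      # List configured schedules and note any that overlap the maintenance window
      schedule list

      # Disable an overlapping schedule (repeat per job)
      schedule disable data-sync-cucm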

Verify that the primary database node is the active primary node at the time of upgrade.

database config

Ensure that the node on which the installation will be initiated has the stateStr parameter set to PRIMARY and has the highest priority number (highest priority number could vary depending on cluster layout).

Example output

<ip address>:27020:
  priority: <number>
  stateStr: PRIMARY
  storageEngine: WiredTiger

Description and Steps

Notes and Status

The following step is needed if your own private certificate and generated SAN certificates are required and the web cert gen_csr command was run. For details, refer to the Web Certificate Setup Options topic in the Platform Guide.

The steps below are needed to check if a CSR private key exists but no associated signed certificate is available.

Request VOSS support to run on the CLI as root user, the following command:

for LST in /opt/platform/apps/nginx/config/csr/*; do
  openssl x509 -in "$LST" -text -noout >/dev/null 2>&1 && SIGNED="$LST"
done

echo $SIGNED

If the echo $SIGNED command output is blank, back up the csr/ directory with, for example, the following command:

mv /opt/platform/apps/nginx/config/csr/ /opt/platform/apps/nginx/config/csrbackup

Description and Steps

Notes and Status

Validate the system health. Carry out the following:

  • system mount - mount upgrade ISO.

  • app install check_cluster - install the new version of the cluster check command.

    For details, refer to Cluster Check.
  • cluster check - inspect the output of this command for warnings and errors. You can also use cluster check verbose to see more details, for example, avx enabled. While warnings will not prevent an upgrade, it is advisable that these be resolved prior to upgrading where possible. Some warnings may be resolved by upgrading.

    For troubleshooting and resolutions, also refer to the Health Checks for Cluster Installations Guide and Platform Guide.

    If any of the paths below are over 80% full, a clean-up is needed, for example to avoid the risk of logs filling up during the upgrade. Clean-up steps are indicated next to the paths:

    /              (call support if over 80%)
    /var/log       (run: log purge)
    /opt/platform  (remove any unnecessary files from /media directory)
    /tmp           (reboot)
    

    On the primary application node, verify there are no pending Security Updates on any of the nodes.

Note

If you run cluster status after installing the new version of cluster check, any error message regarding a failed command can be ignored. This error message will not show after upgrade.

Pre-Upgrade Steps (Maintenance Window)#

As part of the rollback procedure, ensure that a suitable restore point is obtained prior to the start of the activity, as per the guidelines for the infrastructure on which the VOSS Automate platform is deployed.

Optional: If a backup is also required - on the primary database node, use the backup add <location-name> and backup create <location-name> commands. For details, refer to the Platform Guide.
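A minimal sketch of the optional backup, run on the primary database node; the location name localbackup is illustrative, and the full backup add syntax (including the backup destination) is described in the Platform Guide:

  # Register a backup location (see the Platform Guide for destination parameters)
  backup add localbackup

  # Create a backup at that location
  backup create localbackup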

Description and Steps

Notes and Status

After restore point creation and before upgrading: validate system health and check all services, nodes and weights for the cluster:

  • cluster run application cluster list

    Make sure all application nodes show.

  • cluster check - inspect the output of this command, for warnings and errors. You can also use cluster check verbose to see more details.

    • Make sure no services are stopped/broken. The message ‘suspended waiting for mongo’ is normal on the fresh database nodes.

    • Check that the database weights are set. It is critical to ensure the weights are set before upgrading a cluster. Example output:

      172.29.21.240:
          weight: 80
      172.29.21.241:
          weight: 70
      172.29.21.243:
          weight: 60
      172.29.21.244:
          weight: 50
      
    • Verify the primary node in the primary site and ensure no nodes are in the 'recovering' state (stateStr is not RECOVERING). On the primary database node, run database config and inspect the stateStr values.

On the primary application node, verify there are no pending Security Updates on any of the nodes:

  • cluster run all security check

Upgrade (Maintenance Window)#

Note

  • By default, the cluster upgrade is carried out in parallel on all nodes and without any backup in order to provide a fast upgrade.

  • For systems upgrading to 24.1 from 21.4.0 - 21.4-PB5:

    • The VOSS platform maintenance mode will be started automatically when the cluster upgrade command is run. This prevents any new occurrences of scheduled transactions, including the 24.1 database syncs associated with insights sync. For details on insights sync, see the Insights Analytics topic in the Platform Guide.

    • The cluster maintenance-mode stop command must, however, be run manually after the upgrade maintenance window. See: End of the Maintenance Window and Restoring Schedules.

For details on the VOSS platform maintenance mode, see the Maintenance Mode topic in the Platform Guide.

Description and Steps

Notes and Status

It is recommended that the upgrade steps are run in a terminal opened with the screen command.

Verify that the ISO has been uploaded to the media/ directory on each node. This will speed up the upgrade time.
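A quick check sketch; running ls through the cluster run wrapper is an assumption (alternatively, run ls -l media/ on each node individually):

  # From the primary database node: list the media/ directory on every node
  cluster run all ls -l media/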

On the primary database node:

  • screen

  • cluster upgrade media/<upgrade_iso_file>

Close screen: Ctrl-a \

Log in on the primary database node and run cluster run database app status. If the report shows insights-voss-sync:realtime stopped on some database nodes, request assistance with root access on the system CLI from VOSS support in order to carry out the following on the primary database node:

  1. Run the command:

    /opt/platform/mags/insights-voss-sync-mag-script install database

    This should return: Configured Postgres secrets.

  2. Verify that the database nodes now all have the correct mongo info:

    cluster run database diag config app insights-voss-sync /mongo

    All nodes should have the password/port/user shown as below:

    mongo:
        password: ********
        port: 27020
        user: insights-platform
    
  3. Restart the insights-voss-sync:real-time service on all database nodes:

    cluster run database app start insights-voss-sync:real-time

All unused docker images except selfservice and voss_ubuntu images will be removed from the system at this stage.

Post-Upgrade and Health Steps (Maintenance Window)#

Description and Steps

Notes and Status

On the primary database node, verify the cluster status:

  • cluster check

  • If any of the above commands show errors, check for further details to assist with troubleshooting:

    cluster run all diag health

    For a cloud deployment (MS Azure / AWS), also refer to the steps below.

To remove a mount directory media/<iso_file basename> that may have remained on nodes after, for example, an upgrade, run the following command on the primary database node:

cluster run all app cleanup

Check for needed security updates. On the primary application node, run:

  • cluster run all security check

If one or more updates are required for any node, run on the primary application node:

  • cluster run all security update

    If upgrading a cloud deployment (MS Azure / AWS), run cluster check. If the following error shows on each node:

    grub-pc: package in an undesired state

    then request assistance with root access on the system CLI from VOSS Support in order to run the command on each node:

    dpkg --configure -a

    A text user interface opens and you will be prompted:

    • At “GRUB install devices:”, do not select any device. Press <Tab> to highlight <Ok> and press <Enter>.

    • At “Continuing without installing GRUB?”, press <Yes>

    • Exit root user, run cluster check again and verify the error does not show.

Note: if the system reboots, do not carry out the next manual reboot step.

Manual reboot only if needed:

  • cluster run notme system reboot

If messages such as <node name> failed with timeout are displayed, they can be ignored.

  • system reboot

Since all services will be stopped, this takes some time.

If upgrade is successful, the screen session can be closed by typing exit in the screen terminal. If errors occurred, keep the screen terminal open for troubleshooting purposes and contact VOSS support.

Database Schema Upgrade (Maintenance Window)#

Description and Steps

Notes and Status

It is recommended that the upgrade steps are run in a terminal opened with the screen command.

On the primary application node:

  • screen

  • voss upgrade_db

Check cluster status

  • cluster check

Template Upgrade (Maintenance Window)#

Description and Steps

Notes and Status

It is recommended that the upgrade steps are run in a terminal opened with the screen command.

On the primary application node:

  • screen

  • app template media/<VOSS Automate.template>

The following message appears:

Running the DB-query to find the current environment's
existing solution deployment config...
  • Python functions are deployed

  • System artifacts are imported.

    Note

    To reduce the number of upgrade steps, updates of instances of some models are skipped in the cases where:

    • data/CallManager instance does not exist as instance in data/NetworkDeviceList

    • data/CallManager instance exists, but data/NetworkDeviceList is empty

    • Call Manager AXL Generic Driver and Call Manager Control Center Services match the data/CallManager IP

The template upgrade automatically detects the deployment mode: “Enterprise” or “Provider”. A message displays according to the selected deployment type. Check for one of the messages below:

Importing EnterpriseOverlay.json

Importing ProviderOverlay.json ...

The template install automatically restarts necessary applications. If a cluster is detected, the installation propagates changes throughout the cluster.

Description and Steps

Notes and Status

Review the output from the app template command and confirm that the upgrade message appears:

Deployment summary of PREVIOUS template solution
(i.e. BEFORE upgrade):
-------------------------------------------------


Product: [PRODUCT]
Version: [PREVIOUS PRODUCT RELEASE]
Iteration-version: [PREVIOUS ITERATION]
Platform-version: [PREVIOUS PLATFORM VERSION]

This is followed by updated product and version details:

Deployment summary of UPDATED template solution
(i.e. current values after installation):
-----------------------------------------------

Product: [PRODUCT]
Version: [UPDATED PRODUCT RELEASE]
Iteration-version: [UPDATED ITERATION]
Platform-version: [UPDATED PLATFORM VERSION]

Description and Steps

Notes and Status

  • If no errors are indicated, create a restore point.

    As part of the rollback procedure, ensure that a suitable restore point is obtained prior to the start of the activity, as per the guidelines for the infrastructure on which the VOSS Automate platform is deployed.

For an unsupported upgrade path, the install script stops with the message:

Upgrade failed due to unsupported upgrade path.
Please log in as sysadmin
and see Transaction logs for more detail.

You can roll back as per the guidelines for the infrastructure on which the VOSS Automate platform is deployed.

If there are errors for another reason, the install script stops with a failure message listing the problem. Contact VOSS support.

On the primary application node, verify the extra_functions have the same checksum across the cluster.

  • cluster run application voss get_extra_functions_version -c

Post upgrade migrations:

On a single application node of a cluster, run:

  • voss post-upgrade-migrations

Data migrations that are not critical to system operation can take significant time to execute at scale. They are therefore performed after the main upgrade, allowing the migration to proceed while the system is in use and so limiting the length of the upgrade window.

A transaction is queued on VOSS Automate and its progress is displayed as it executes.

Description and Steps

Notes and Status

Check cluster status and health on the primary database node:

  • cluster status

Post Template Upgrade Tasks (Maintenance Window)#

Description and Steps

Notes and Status

Verify the upgrade

Log in on the Admin Portal and check the information contained in the About > Version menu. Confirm that versions have upgraded.

  • Release should show XXX

  • Platform Version should show XXX

where XXX corresponds with the release number of the upgrade.

  • Check themes on all roles are set correctly

  • For configurations that make use of the Northbound Billing Integration (NBI), please check the service status of NBI and restart if necessary.

Log Files and Error Checks (Maintenance Window)#

Description and Steps

Notes and Status

Inspect the output of the command line interface for upgrade errors, for example File import failed! or Failed to execute command.

On the primary application node, use the log view command to view any log files indicated in the error messages, for example, run the command if the following message appears:

For more information refer to the execution log file with
'log view platform/execute.log'

For example, if required, send all the install log files in the install directory to an SFTP server:

  • log send sftp://x.x.x.x install

Log in on the Admin Portal as system level administrator, go to Administration Tools > Transaction and inspect the transactions list for errors.

End of the Maintenance Window and Restoring Schedules#

Description and Steps

Notes and Status

On the CLI:

Run the cluster maintenance-mode stop command to end the VOSS maintenance mode when upgrading to 24.1 from 21.4 or 21.4-PBx.

This will allow scheduled data sync transactions to resume, including insights sync operations added in 24.1.

For details on the VOSS platform maintenance mode, see the Maintenance Mode topic in the Platform Guide.

  • If you’re upgrading from: [21.4, 21.4-PB1, 21.4-PB2, 21.4-PB3]

    Restore Schedules

    Note

    Schedules can easily be activated and deactivated from the Bulk Schedule Activation / Deactivation menu available on the MVS-DataSync-Dashboard.

    Re-enable scheduled imports if any were disabled prior to the upgrade.

    Individually for each job:

    1. Log in on the Admin Portal as a high level administrator above Provider level.

    2. Select the Scheduling menu to view scheduled jobs.

    3. Click each scheduled job. On the Base tab, check the Activate check box.

    Mass modify:

    1. Modify the exported sheet of schedules to activate scheduled syncs.

    2. Import the sheet.

    Note

    Select the Skip next execution option if you do not wish to execute schedules that overlap with the maintenance window, but only execute them thereafter.

    Schedules enabled on the CLI:

    For schedules that were disabled because they overlapped with the maintenance window, re-enable them.

    Run schedule enable <job-name> (a minimal sketch follows below).
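
    A minimal sketch, reusing the hypothetical job name from the pre-upgrade disable step:

      # Confirm the configured schedules
      schedule list

      # Re-enable a schedule that was disabled for the upgrade
      schedule enable data-sync-cucm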

Licensing (outside, after Maintenance Window)#

Description and Steps

Notes and Status

From release 21.4 onwards, the deployment needs to be licensed. After installation, a 7-day grace period is available to license the product. Since license processing is only scheduled every hour, if you wish to license immediately, first run voss check-license from the primary application node CLI.

  1. Obtain the required license token from VOSS.

  2. Steps for GUI and CLI:

    1. To license through the GUI, follow steps indicated in Product License Management in the Core Feature Guide.

    2. To license through the CLI, follow steps indicated in Product Licensing in the Platform Guide.

Mount the Insights disk (outside, after Maintenance Window)#

Description and Steps

Notes and Status

On each database node, assign the insights-voss-sync:database mount point to the drive added for the Insights database prior to upgrade.

For example, if drives list shows the added disk as:

Unused disks:
sde

then run the command

drives add sde insights-voss-sync:database

on each database node where the drive has been added.

Sample output follows. The message WARNING: Failed to connect to lvmetad. Falling back to device scanning. can be ignored on release 24.1:

$ drives add sde insights-voss-sync:database
Configuration setting "devices/scan_lvs" unknown.
Configuration setting "devices/allow_mixed_block_sizes" unknown.
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
71ad98e0-7622-49ad-9fg9-db04055e82bc
Application insights-voss-sync processes stopped.
Migrating data to new drive - this can take several minutes
Data migration complete - reassigning drive
Checking that /dev/sde1 is mounted
Checking that /dev/dm-0 is mounted
/opt/platform/apps/mongodb/dbroot
Checking that /dev/sdc1 is mounted
/backups

Application services:firewall processes stopped.
Reconfiguring applications...
Application insights-voss-sync processes started.
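
Once the command completes on a node, you can re-check the assignment; the disk should no longer be listed under Unused disks:

  # Confirm the disk is now assigned to the insights-voss-sync:database mount point
  drives list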