Upgrading the Big Data Protector

Starting from version 10.1, the Big Data Protector provides a feature to seamlessly move to a newer version. This upgrade mechanism leverages the Rolling Restart feature provided by Cloudera.

Rolling Restart in Cloudera is a feature that allows services and role instances in a cluster to be restarted sequentially, rather than all at once. This minimizes downtime and ensures high availability during configuration changes or upgrades. By restarting components in controlled batches, Cloudera helps maintain cluster stability and service continuity without disrupting critical workloads.
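
As a rough illustration of what restarting in batches means for planning, the sketch below estimates how many batches a rolling restart needs and how long the pauses between batches add up to. It is only planning arithmetic under stated assumptions: the node count is a hypothetical input, the batch size and pause correspond to the ROLLING_BATCH_SIZE and ROLLING_SLEEP_SECONDS parameters in the cluster.config file, and the function names are invented for this example.

```shell
#!/usr/bin/env bash
# Planning arithmetic for a rolling restart (illustrative only).
# node_count is a hypothetical input; batch_size and sleep_seconds correspond
# to ROLLING_BATCH_SIZE and ROLLING_SLEEP_SECONDS in cluster.config.

rolling_batches() {
  local node_count=$1 batch_size=$2
  # Ceiling division: a partially filled final batch still counts as a batch.
  echo $(( (node_count + batch_size - 1) / batch_size ))
}

rolling_pause_seconds() {
  local node_count=$1 batch_size=$2 sleep_seconds=$3
  local batches
  batches=$(rolling_batches "$node_count" "$batch_size")
  # Pauses occur between batches, so there is one fewer pause than batches.
  if [ "$batches" -lt 1 ]; then
    echo 0
  else
    echo $(( (batches - 1) * sleep_seconds ))
  fi
}

echo "batches: $(rolling_batches 10 1)"
echo "pause seconds: $(rolling_pause_seconds 10 1 300)"
```

For a hypothetical 10-node cluster with a batch size of 1 and a 300-second pause, the pauses alone add up to 2700 seconds. Whether that time counts toward ROLLING_TIMEOUT_SECONDS is not stated here, so treat the comparison as a sanity check rather than a guarantee.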

The overall process for upgrading the Big Data Protector to a newer version is listed below.

  1. Download the installation package for the newer version of the Big Data Protector.
  2. Extract the contents of the installation package into a separate directory.

    Note: For more information, refer to Extracting the installation package.

  3. Execute the configurator script to generate the required parcels or installation files.

    Note: For more information, refer to Running the configurator script.

  4. Update the cluster.config file.

    Note: For more information, refer to Editing the Cluster Configuration File.

  5. Execute the smooth upgrade script to switch to a newer version of the Big Data Protector.

    Note: For more information, refer to Executing the Upgrade Script.

1 - Editing the Cluster Configuration File

The cluster.config file contains critical parameters required to switch to another version of the Big Data Protector. This file is created after executing the configurator script. The cluster.config file is available in the /Installation_Files/ directory.

To edit the cluster.config file:

  1. Log in to the Master node.
  2. Navigate to the directory where the installation files for the new version of the Big Data Protector are extracted.
  3. To view the cluster.config file in a text editor, run the following command:
    vim cluster.config
    
  4. Press ENTER.
    The command displays the contents of the cluster.config file. The parameters in the file are categorized into mandatory and optional sections.
     CM_HOST=                            # Cloudera Manager server hostname or IP address (e.g., 192.168.123.25 or cm.example.com)
     CM_PORT=                            # Cloudera Manager server port (default: 7180 for HTTP, 7183 for HTTPS)
     CM_USER=                            # Cloudera Manager admin username (e.g., admin)
     CM_PASS=                            # Cloudera Manager admin password (e.g., admin)
     CLOUDERA_BASE=                      # Base directory for Cloudera installation (e.g., /opt/cloudera)
     CLUSTER_NAME=                       # Name of the cluster as shown in Cloudera Manager (e.g., Cluster1)
     PREV_INSTALL_FILES_DIR=             # Path to previous install files directory (e.g., "/build/10.1.1/Installation_Files")
    
     # Rolling restart tuning (optional)
     ROLLING_BATCH_SIZE="1"              # Number of nodes to restart in each batch. A value of 1 ensures strict sequential upgrade—only one node is offline at a time. Increasing this (e.g., to 2 or 5) allows parallel upgrades, which speeds up the process but increases risk and potential downtime. This value depends on cluster size and workload characteristics. Please consult your cluster administrator before modifying.
     ROLLING_SLEEP_SECONDS="300"         # Pause duration (in seconds) between batches. This gives time for services to stabilize and avoids overwhelming cluster. Useful for large clusters or when workload is high.
     ROLLING_FAIL_COUNT_THRESHOLD="0"    # Maximum number of node failures allowed before the rolling restart is aborted. 0 means no limit—restart continues regardless of failures. Set this to a small number (e.g., 2) to enforce safety and halt the process if too many nodes fail.
     ROLLING_STALE_CONFIGS_ONLY="true"   # If true, only roles with stale configuration (i.e., config changes not yet applied) will be restarted. This avoids unnecessary restarts and speeds up the process. If false, all roles are restarted regardless of config state.
     ROLLING_UNUPGRADED_ONLY="true"      # Controls whether the rolling restart targets all roles or only outdated ones.  - false: Full rolling restart (all roles restarted, cleanup runs).  - true: Retry mode (only outdated roles restarted, cleanup skipped). Useful for resuming interrupted upgrades.
     ROLLING_TIMEOUT_SECONDS="3600"      # Total time (in seconds) allowed for the rolling restart to complete. If the process exceeds this duration, it will be considered failed. Default is 1 hour. This value should be tuned based on the number of nodes, batch size, and expected restart duration per node. Please check with your cluster administrator.
     ROLLING_EXCLUDE_SERVICES="impala"   # Optional. Space-separated list of CM service names to exclude from the rolling restart. For example, excluding impala avoids restarting Impala daemons, which may be critical for ongoing queries.
     PARCEL_RECOGNITION_TIMEOUT=300      # Seconds to wait after uploading a parcel and restarting Cloudera Manager for it to detect the new parcel. This value depends on CM performance and cluster size. Please confirm with your administrator.
     STAGE_WAIT_TIMEOUT=900              # Time (in seconds) to wait for a parcel to reach a target stage (e.g., DISTRIBUTED, ACTIVATED). The final expected stage is ACTIVATED. This timeout should be adjusted based on network speed, disk I/O, and number of nodes. Please check with your cluster administrator.
     BDP_SSH_USER=root                   # SSH user used for remote commands and safety checks. Defaults to root, but can be changed if CM agents run under a different user.
     REMOVE_OLD_PARCELS_AFTER_RR=true    # If true, old parcels (e.g., 10.1.x) will be removed after a successful rolling restart. Helps free up disk space and avoid confusion. If false, old parcels are retained for rollback.
    
  5. Edit the parameters as required.

    Note: If the password for Cloudera Manager is not provided in the cluster.config file, the script will prompt for the password during the upgrade.

    Enter CM_PASS (Cloudera Manager password):
    
  6. Save the changes to the cluster.config file.
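
Before running the upgrade, it may be worth confirming that none of the mandatory parameters were left empty. The following is a minimal sketch, not part of the product: check_cluster_config is a hypothetical helper, and it assumes that each parameter sits on a single line in the NAME=value # comment layout shown above. CM_PASS is deliberately not checked, because the script prompts for it when it is empty.

```shell
#!/usr/bin/env bash
# Pre-check sketch: report mandatory cluster.config parameters that are empty.
# check_cluster_config is a hypothetical helper, not shipped with the product.

check_cluster_config() {
  local config_file=$1
  local missing=0 name value
  if [ ! -r "$config_file" ]; then
    echo "Cannot read $config_file" >&2
    return 2
  fi
  for name in CM_HOST CM_PORT CM_USER CLOUDERA_BASE CLUSTER_NAME PREV_INSTALL_FILES_DIR; do
    # Take the text between "NAME=" and the first "#", then drop quotes and
    # whitespace. The file is parsed as text; nothing in it is executed.
    value=$(sed -n "s/^[[:space:]]*${name}=\([^#]*\).*/\1/p" "$config_file" \
      | tail -n 1 | tr -d "\"' \t")
    if [ -z "$value" ]; then
      echo "Missing mandatory parameter: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the directory that contains the file):
#   check_cluster_config ./cluster.config
```

The function returns 0 when every checked parameter has a value, 1 when at least one is empty, and 2 when the file cannot be read.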

2 - Executing the Upgrade Script

After editing the cluster.config file, execute the smooth upgrade script to upgrade the protector. On all the nodes, the script will:

  1. Distribute the new parcels.
  2. Activate the new parcels.
  3. Remove the old configuration.
  4. Set the new configuration.
  5. Start the rolling restart to update the required services.

To execute the upgrade script:

  1. Log in to the Master node.
  2. Navigate to the directory where the installation files for the new version are extracted.
  3. To execute the script, run the following command:
    ./bdp_smooth_upgrade.sh
    
  4. Press ENTER. The script upgrades the protector to a newer version using the Rolling Restart feature provided by Cloudera.
    'jq' is available.
    Config loaded:
    CM_SCHEME = http
    CM_HOST   = <master_node_ip_address>
    CM_PORT   = 7180
    CLOUDERA_BASE = /opt/cloudera
    CLUSTER_NAME  = <name_of_the_cluster>
    BASE_URL  = http://<master_node_ip_address>:7180/api
    CSD_DIR   = /opt/cloudera/csd
    PARCEL_DIR= /opt/cloudera/parcel-repo
    REMOVE_OLD_PARCELS_AFTER_RR = true
    REMOVE_PARCEL_STRATEGY      = hosts_only
    Detecting Cloudera Manager API version from http://<master_node_ip_address>:7180/api/version ...
    Detected API version: v57
    CM_URL set to: http://<master_node_ip_address>:7180/api/v57
    Checking if cluster '<name_of_the_cluster>' exists in Cloudera Manager...
    Cluster '<name_of_the_cluster>' exists and is accessible.
    Checking if cluster-level Rolling Restart is available (non-intrusive)...
    Rolling Restart appears available (HDFS HA detected: 2xNN, 2xZKFC, 3xJN).
    Copying files from . to Cloudera directories...
    Copying JAR files to /opt/cloudera/csd ...
    Copying parcel files to /opt/cloudera/parcel-repo ...
    Files copied and permissions set successfully.
    Extracting parcel versions from ....
    Detected versions:
    PTY_BDP : <new_BDP_version>_CDP7.1.p0
    PTY_CERT: <new_BDP_version>_CDP7.1.p0
    PTY_LOGFORWARDER_CONF: <new_BDP_version>_CDP7.1.p0
    Encoded versions:
    PTY_BDP : <new_BDP_version>_CDP7.1.p0
    PTY_CERT: <new_BDP_version>_CDP7.1.p0
    PTY_LOGFORWARDER_CONF: <new_BDP_version>_CDP7.1.p0
    Pre-upgrade ACTIVE versions from CM:
    PTY_BDP             : <old_BDP_version>_CDP7.1.p0
    PTY_CERT            : <old_BDP_version>_CDP7.1.p0
    PTY_LOGFORWARDER_CONF: <old_BDP_version>_CDP7.1.p0
    Restarting Cloudera Manager Server...
    Cloudera Manager service restart initiated.
    Waiting for Cloudera Manager API to become available...
    Cloudera Manager is up and responding.
    Waiting for PTY_CERT (<new_BDP_version>_CDP7.1.p0) to be recognized by CM ...
    PTY_CERT recognized.
    PTY_CERT current stage: ACTIVATED
    PTY_CERT is already ACTIVATED. Skipping.
    Waiting for PTY_LOGFORWARDER_CONF (<new_BDP_version>_CDP7.1.p0) to be recognized by CM ...
    PTY_LOGFORWARDER_CONF recognized.
    PTY_LOGFORWARDER_CONF current stage: ACTIVATED
    PTY_LOGFORWARDER_CONF is already ACTIVATED. Skipping.
    Waiting for PTY_BDP (<new_BDP_version>_CDP7.1.p0) to be recognized by CM ...
    PTY_BDP recognized.
    PTY_BDP current stage: ACTIVATED
    PTY_BDP is already ACTIVATED. Skipping.
    Running BDP config script (UNSET): /<old_version_dir>/Installation_Files/set_unset_bdp_config.sh
    Args: --protocol=http:// --cm-server-ip=<master_node_ip_address> --cm-server-port=7180 --cluster-name='<name_of_the_cluster>' --username='<name_of_the_user>' --password=****** --user-choice=UNSET
    
    Checking Cluster's existence...
    
    Cluster's existence verified.
    
    Checking existence of Tez service with name 'tez'.
    ##O=-#      #
    Service 'tez' exists.
    
    Unsetting Tez's config...
    ##################################################################################### 100.0%
    Tez Service wide config ('tez.cluster.additional.classpath.prefix') has been updated.
    
    Checking existence of Impala service with name 'impala'.
    
    Service 'impala' exists.
    
    Unsetting Impala's config...
    ##################################################################################### 100.0%
    Impala's 'IMPALAD_role_env_safety_valve' config for Role Group 'impala-IMPALAD-2' has been updated.
    ##O=-#      #
    ##################################################################################### 100.0%
    Impala's 'IMPALAD_role_env_safety_valve' config for Role Group 'impala-IMPALAD-1' has been updated.
    ##O=-#      #
    ##################################################################################### 100.0%
    Impala's 'IMPALAD_role_env_safety_valve' config for Role Group 'impala-IMPALAD-BASE' has been updated.
    
    Checking existence of Spark on Yarn service with name 'spark_on_yarn'.
    
    Service 'spark_on_yarn' exists.
    
    Unsetting Spark on Yarn's config...
    ##################################################################################### 100.0%
    Spark on Yarn Service wide config ('spark-conf/spark-env.sh_service_safety_valve') has been updated.
    
    Running BDP config script (SET): ./set_unset_bdp_config.sh
    Args: --protocol=http:// --cm-server-ip=<master_node_ip_address> --cm-server-port=7180 --cluster-name='<name_of_the_cluster>' --username='<name_of_the_user>' --password=****** --user-choice=SET
    
    Checking Cluster's existence...
    
    Cluster's existence verified.
    
    Checking existence of Tez service with name 'tez'.
    ##O=-#      #
    Service 'tez' exists.
    
    Setting Tez's config...
    ##O=-#      #
    ##################################################################################### 100.0%
    Tez Service wide config ('tez.cluster.additional.classpath.prefix') has been updated.
    
    Checking existence of Impala service with name 'impala'.
    
    Service 'impala' exists.
    
    Setting Impala's config...
    
    ##################################################################################### 100.0%
    Impala's 'IMPALAD_role_env_safety_valve' config for Role Group 'impala-IMPALAD-2' has been updated.
    
    ##################################################################################### 100.0%
    Impala's 'IMPALAD_role_env_safety_valve' config for Role Group 'impala-IMPALAD-1' has been updated.
    
    ##################################################################################### 100.0%
    Impala's 'IMPALAD_role_env_safety_valve' config for Role Group 'impala-IMPALAD-BASE' has been updated.
    
    Checking existence of Spark on Yarn service with name 'spark_on_yarn'.
    
    Service 'spark_on_yarn' exists.
    
    Setting Spark on Yarn's config...
    
    ##################################################################################### 100.0%
    Spark on Yarn Service wide config ('spark-conf/spark-env.sh_service_safety_valve') has been updated.
    
    ROLLING_EXCLUDE_SERVICES is set. Using restartServiceNames API to exclude: impala
    Waiting for CM command id=<command_ID> to complete ...
    - RollingRestart progress: 0%, active: ?, success: true
    Command <command_ID> finished successfully.
    Rolling restart finished successfully.
    Evaluating convergence before parcel cleanup ...
    Warning: Permanently added 'edge.localdomain.com' (ECDSA) to the list of known hosts.
    Warning: Permanently added 'master.localdomain.com' (ECDSA) to the list of known hosts.
    Warning: Permanently added 'node1.localdomain.com' (ECDSA) to the list of known hosts.
    Warning: Permanently added 'node2.localdomain.com' (ECDSA) to the list of known hosts.
    Warning: Permanently added 'node3.localdomain.com' (ECDSA) to the list of known hosts.
    Cluster appears converged: all hosts use PTY_BDP <new_BDP_version>_CDP7.1.p0 and no old-parcel processes found.
    Converged ? cleaning old parcels (REMOVE_OLD_PARCELS_AFTER_RR=true) ...
    Selected PTY_CERT old version to clean: Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0
    Cleaning old parcel PTY_CERT (Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0) ...
    Current stage: DISTRIBUTED
    Removing distribution of PTY_CERT Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0 from hosts ...
    Waiting for PTY_CERT to reach stage: DOWNLOADED
    Current stage for PTY_CERT: UNDISTRIBUTING
    Current stage for PTY_CERT: UNDISTRIBUTING
    Current stage for PTY_CERT: DOWNLOADED
    Done with PTY_CERT (Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0).
    Selected PTY_BDP old version to clean: Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0
    Cleaning old parcel PTY_BDP (Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0) ...
    Current stage: DISTRIBUTED
    Removing distribution of PTY_BDP Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0 from hosts ...
    Waiting for PTY_BDP to reach stage: DOWNLOADED
    Current stage for PTY_BDP: UNDISTRIBUTING
    Current stage for PTY_BDP: UNDISTRIBUTING
    Current stage for PTY_BDP: DOWNLOADED
    Done with PTY_BDP (Discovering previous parcel versions in: <old_version_dir>/Installation_Files
    Previous PTY_BDP version: <old_BDP_version>_CDP7.1.p0
    Previous PTY_CERT version: <old_BDP_version>_CDP7.1.p0
    <old_BDP_version>_CDP7.1.p0).
    Old parcels cleanup completed.
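
The output above shows the script detecting the Cloudera Manager API version from the /api/version endpoint. The same endpoint can be queried by hand before starting an upgrade to confirm that Cloudera Manager is reachable with the values entered in cluster.config. The sketch below is an illustration only: cm_base_url and cm_api_version are hypothetical helpers, and choosing HTTPS when the port is 7183 is an assumption based on the default ports noted in the configuration file, not a description of how the upgrade script selects the scheme.

```shell
#!/usr/bin/env bash
# Manual reachability check against the Cloudera Manager API (sketch).
# cm_base_url and cm_api_version are hypothetical helpers.

cm_base_url() {
  local host=$1 port=$2 scheme=http
  # Assumption: port 7183 implies HTTPS, following the defaults noted in
  # cluster.config. The upgrade script may decide this differently.
  if [ "$port" = "7183" ]; then
    scheme=https
  fi
  echo "${scheme}://${host}:${port}/api"
}

cm_api_version() {
  local host=$1 port=$2 user=$3
  # With only a username given to --user, curl prompts for the password,
  # so the password does not appear in the shell history.
  curl --silent --show-error --fail --user "$user" \
    "$(cm_base_url "$host" "$port")/version"
}

# Usage (hostname and username are example values):
#   cm_api_version cm.example.com 7180 admin
```

If the call succeeds, it prints the API version string, similar to the value reported on the Detected API version line of the script output.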
    

3 - Downgrading to an Older Version

To downgrade the Big Data Protector to an older version:

  1. Edit the cluster.config file for the older version to update the following:
    1. Set the value of the PREV_INSTALL_FILES_DIR parameter to the Installation_Files directory of the newer version of the protector.
    2. Set the value of the ROLLING_STALE_CONFIGS_ONLY parameter to true.
    3. Set the value of the ROLLING_UNUPGRADED_ONLY parameter to true.
  2. Execute the bdp_smooth_upgrade.sh script.
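
The three parameter changes in step 1 can be made in a text editor. For repeatable runs, they could also be scripted along the lines of the sketch below. set_config_value is a hypothetical helper, not part of the product. It assumes the one-line NAME=value # comment layout of cluster.config and values that do not contain the characters |, &, \ or #.

```shell
#!/usr/bin/env bash
# set_config_value is a hypothetical helper: it rewrites one NAME=value
# assignment in a cluster.config-style file and keeps any trailing comment.

set_config_value() {
  local file=$1 name=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Replace everything between "NAME=" and the first "#" (or end of line)
  # with the new value in double quotes.
  if sed "s|^\([[:space:]]*${name}=\)[^#]*|\1\"${value}\"  |" "$file" > "$tmp"; then
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage, run in the older version's installation files directory
# (replace the placeholder with the real path):
#   set_config_value cluster.config PREV_INSTALL_FILES_DIR "<new_version_dir>/Installation_Files"
#   set_config_value cluster.config ROLLING_STALE_CONFIGS_ONLY true
#   set_config_value cluster.config ROLLING_UNUPGRADED_ONLY true
```

Review the file after running the helper, because a parameter that is absent from the file is left unchanged rather than added.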

To execute the script:

  1. Log in to the Master node.
  2. Navigate to the directory where the installation files for the older version are extracted.
  3. To execute the script, run the following command:
    ./bdp_smooth_upgrade.sh
    
  4. Press ENTER. The script downgrades the protector to the older version specified in the cluster.config file.