EMR Serverless Setup CLI
The instructions in this section apply only to the Serverless approach for installing the Big Data Protector.
The EMR Serverless Setup CLI automates the complete Docker image build and deployment pipeline for the Big Data Protector. It validates the environment, prepares the configuration files, generates the Docker files, builds images with ESA certificate injection, and pushes the artifacts to AWS ECR.
To facilitate the installation, the configurator script generates a set of Python scripts within the ./Installation_Files/ directory. The script and its arguments are listed below.
python scripts/emr_serverless_setup_cli.py <argument>
| Argument | Purpose |
|---|---|
| validate | Verifies the working directory and the config.json schema. Also validates AWS CLI connectivity and Docker presence. |
| prepare-assets | Updates the config.ini file and the GetCertificates.sh script with ESA details. |
| generate-dockerfile | Creates the runtime-specific Dockerfile (Spark/Hive). |
| build | Builds the Docker image with ESA certificate injection. |
| push | Pushes the custom image to AWS ECR. |
| deploy | Runs the full pipeline, from validation to push, in a single command, if required. |
Note: Execute the individual commands to accommodate custom modifications at any step.
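As an illustration of running the steps one at a time, the following minimal Python sketch builds the command line for each argument in the table and runs the steps in order. The helper names step_command and run_steps are hypothetical, and the sketch assumes that the CLI returns a non-zero exit code when a step fails, which this section does not state.

```python
import subprocess
import sys

# Script path and step names as listed in the table above.
CLI_SCRIPT = "scripts/emr_serverless_setup_cli.py"
STEPS = ["validate", "prepare-assets", "generate-dockerfile", "build", "push"]


def step_command(step, python=sys.executable):
    """Return the argument list that runs one CLI step."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    return [python, CLI_SCRIPT, step]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first failure.

    Assumption: a failing step exits with a non-zero return code.
    Returns the name of the failing step, or None if all steps pass.
    """
    for step in steps:
        result = runner(step_command(step))
        if result.returncode != 0:
            return step
    return None
```

Calling run_steps(["validate", "prepare-assets"]) from the directory where the installation files are extracted would stop after the assets are prepared, leaving room to edit the generated files before running the remaining steps.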
Validating the Environment
The validate argument in the Python script:
- Validates the config.json schema and the required parameters.
- Verifies the Docker installation and the daemon status.
- Verifies the AWS CLI configuration and credentials.
- Tests ECR repository connectivity.
- Validates the presence of BDP artifacts, such as .jar and configuration files.
- Tests ESA connectivity on the configured port.
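Two of the checks listed above, tool presence and ESA port connectivity, can be approximated by hand before running the script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and this is not the script's actual implementation.

```python
import shutil
import socket


def tool_installed(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, tool_installed("docker") and tool_installed("aws") check the two command-line tools, and port_reachable can be called with the ESA host and port from the config.json file. A successful TCP connection only shows that the port is open; it does not confirm that the ESA credentials are valid.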
To validate the environment:
- Log in to the CLI on a machine or an Amazon EC2 node that has connectivity to the ESA.
- Navigate to the directory where the installation files are extracted.
- To execute the Python script, run the following command:
python scripts/emr_serverless_setup_cli.py validate
- Press ENTER.
The script performs the required validations and the status of each step appears.
[Validation]
============================================================
[OK] config.json schema valid
+ docker info
+ docker buildx version
+ aws sts get-caller-identity --output json
+ aws ecr describe-repositories --repository-names bdp-emr-serverless --region <region_name>
Summary:
[OK] Working directory
[OK] Config schema
[OK] Docker installed
[OK] Docker daemon
[OK] BuildKit support
[OK] AWS CLI installed
[OK] AWS credentials
[OK] Assets prepared
[OK] Dockerfile exists
[OK] COPY sources exist
[OK] ECR repo exists
[VALIDATION PASSED]
Preparing the Assets
The prepare-assets argument in the Python script:
- Reads the common/config.ini template.
- Appends the [sync] section in the config.ini file with ESA connection settings from the config.json file.
- Appends the [log] section in the config.ini file with output = stdout.
- Updates the /common/GetCertificates.sh file with the ESA host/port.
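The config.ini update described above can be pictured with the following minimal sketch, which uses the Python configparser module. The [sync] key names host and port are hypothetical placeholders, because this section does not list the actual keys, and the script's own implementation may differ.

```python
import configparser
import io


def update_config(ini_text, esa_host, esa_port):
    """Return config.ini text with a [sync] section and output = stdout under [log]."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original case of option names
    parser.read_string(ini_text)

    # Hypothetical key names; the real [sync] keys are not given in this section.
    parser["sync"] = {"host": esa_host, "port": str(esa_port)}

    if not parser.has_section("log"):
        parser.add_section("log")
    parser["log"]["output"] = "stdout"

    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Note that configparser discards comments and appends new sections at the end of the file, so this sketch does not preserve the exact layout of the template.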
To prepare the assets:
- Log in to the CLI on a machine or an Amazon EC2 node that has connectivity to the ESA.
- Navigate to the directory where the installation files are extracted.
- To execute the Python script, run the following command:
python scripts/emr_serverless_setup_cli.py prepare-assets
- Press ENTER.
The script performs the required actions and a confirmation appears.
[Phase 1: Prepare Assets]
============================================================
[INFO] Runtime: SPARK
[INFO] Log Output: stdout (audit logs will be sent to stdout)
[OK] inserted [sync] after [protector] and updated [log] section (output=stdout, mode=drop) -> ../common/config.ini
[OK] updated GetCertificates.sh -> ../common/GetCertificates.sh
Generating the Dockerfile
The generate-dockerfile argument in the Python script:
- Reads the runtime configuration from the config.json file for the Spark or Hive application.
- Generates a multi-stage Dockerfile optimized for EMR Serverless.
- Configures BuildKit secrets for secure ESA credential handling.
- Stores the config.ini file in both Spark and Hive locations to ensure runtime interoperability.
- Sets up the certificate fetch at build time rather than at runtime.
- Configures the required permissions for the hadoop:hadoop user.
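To make the points above concrete, the following sketch renders a two-stage Dockerfile that mounts BuildKit secrets only for the certificate fetch step. It is an illustration, not the Dockerfile that the script generates: the base image, the certificate directory, the config.ini destinations, and the way GetCertificates.sh receives the credentials (the ESA_USER and ESA_PASSWORD environment variables) are all assumptions. Only the secret IDs esa_user and esa_password and the hadoop:hadoop user come from this section.

```python
# Hypothetical template; paths and the GetCertificates.sh interface are assumptions.
DOCKERFILE_TEMPLATE = """\
# syntax=docker/dockerfile:1
FROM {base_image} AS certs
USER root
COPY common/GetCertificates.sh /tmp/GetCertificates.sh
# Secrets are mounted only for this RUN step and are not written to an image layer.
RUN --mount=type=secret,id=esa_user \\
    --mount=type=secret,id=esa_password \\
    ESA_USER="$(cat /run/secrets/esa_user)" \\
    ESA_PASSWORD="$(cat /run/secrets/esa_password)" \\
    sh /tmp/GetCertificates.sh

FROM {base_image}
USER root
COPY --from=certs --chown=hadoop:hadoop {cert_dir} {cert_dir}
{config_copies}
USER hadoop:hadoop
"""


def render_dockerfile(base_image, cert_dir, config_dests):
    """Render the template, copying config.ini to every destination in `config_dests`."""
    copies = "\n".join(
        f"COPY --chown=hadoop:hadoop common/config.ini {dest}" for dest in config_dests
    )
    return DOCKERFILE_TEMPLATE.format(
        base_image=base_image, cert_dir=cert_dir, config_copies=copies
    )
```

Because the certificates are fetched in the first stage and only the resulting files are copied into the final stage, the credentials are needed at build time only and are not present in the final image.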
To generate the Dockerfile:
- Log in to the CLI on a machine or an Amazon EC2 node that has connectivity to the ESA.
- Navigate to the directory where the installation files are extracted.
- To execute the Python script, run the following command:
python scripts/emr_serverless_setup_cli.py generate-dockerfile
- Press ENTER.
The script performs the required actions and a confirmation appears.
[Phase 2: Generate Dockerfile]
============================================================
+ which docker 2>/dev/null
+ docker info 2>/dev/null | grep -i 'docker root dir' || true
[INFO] traditional Docker - using BuildKit secrets (secure)
[OK] Generated /home/ubuntu/serverless/final_build/spark/Installation_Files/Dockerfile
Building the Docker Image
The build argument in the Python script:
- Prompts for ESA credentials, such as the username and password.
- Executes the Docker build with BuildKit secrets.
- Cleans up the temporary credential files immediately after building the image.
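The credential handling described above follows a common pattern: write each secret to a private temporary file, pass the file to docker build with --secret, and delete the file whether or not the build succeeds. The following is a minimal sketch of that pattern, not the script's actual code; the helper names secret_files and build_command are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def secret_files(secrets):
    """Write each secret to its own temp file and delete all files on exit.

    `secrets` maps a BuildKit secret ID to its value. tempfile.mkstemp creates
    each file readable and writable only by the current user.
    """
    paths = {}
    try:
        for secret_id, value in secrets.items():
            fd, path = tempfile.mkstemp(suffix=".secret")
            paths[secret_id] = path  # record first so cleanup also covers write errors
            with os.fdopen(fd, "w") as handle:
                handle.write(value)
        yield paths
    finally:
        for path in paths.values():
            if os.path.exists(path):
                os.remove(path)


def build_command(tag, paths, dockerfile="Dockerfile"):
    """Return the docker build argument list with one --secret flag per file."""
    cmd = ["docker", "build"]
    for secret_id, path in paths.items():
        cmd += ["--secret", f"id={secret_id},src={path}"]
    cmd += ["-t", tag, "-f", dockerfile, "."]
    return cmd
```

Inside a with secret_files({...}) as paths: block, the list returned by build_command can be passed to subprocess.run with DOCKER_BUILDKIT=1 set in the environment. Reading the password with getpass.getpass keeps it out of the shell history and the terminal echo.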
To build the Docker image:
- Log in to the CLI on a machine or an Amazon EC2 node that has connectivity to the ESA.
- Navigate to the directory where the installation files are extracted.
- To execute the Python script, run the following command:
python scripts/emr_serverless_setup_cli.py build
- Press ENTER.
The script starts the build process and the prompt to select the authentication method appears.
============================================================
EMR Serverless BDP Image Builder (Build Only)
============================================================
Runtime: spark
+ docker info
+ docker buildx version
[INFO] Using existing config.ini and Dockerfile
[INFO] If you need to regenerate them, use 'prepare-assets' command first
============================================================
ESA Authentication Required
============================================================
Credentials needed to fetch certificates during Docker build.
NOT stored in config files or image layers.
Passed securely via Docker BuildKit secrets.
Authentication Method:
[1] Username/Password
[2] JWT Token
Select authentication method (1 or 2):
- To authenticate with a username and password, type 1.
- Press ENTER.
The prompt to enter the ESA username appears.
Enter ESA Username:
- Enter the username.
- Press ENTER.
The prompt to enter the password appears.
Enter ESA Password:
- Enter the password.
- Press ENTER.
The script resumes and completes the build process.
[Phase 3: Build]
============================================================
+ aws ecr describe-repositories --repository-names bdp-emr-serverless --region <region_name>
+ aws ecr get-login-password --region <region_name> | docker login --username AWS --password-stdin <Account_ID>.dkr.ecr.<region_name>.amazonaws.com
+ which docker 2>/dev/null
+ docker info 2>/dev/null | grep -i 'docker root dir' || true
[BUILD] traditional Docker - using BuildKit secrets (secure)
+ cd /home/ubuntu/serverless/final_build/spark/Installation_Files && DOCKER_BUILDKIT=1 docker build --secret id=esa_user,src=/tmp/tmpoyvdsake.secret --secret id=esa_password,src=/tmp/tmpq6l9mn8v.secret -t bdp-emr-serverless:tag_spark -f Dockerfile .
[OK] Built local image bdp-emr-serverless:tag_spark for runtime 'spark'
============================================================
[SUCCESS] Image built locally
Use 'push' command to push to ECR
============================================================
Pushing the Image to ECR
The push argument in the Python script:
- Authenticates with AWS ECR using aws ecr get-login-password.
- Tags the local image with the full ECR URI.
- Pushes all image layers to ECR.
- Verifies that the image exists in ECR after the push.
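The tagging and push steps above depend on the full ECR image URI, which has the form <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/<repository>:<tag>. The following minimal sketch shows how that URI and the corresponding commands can be assembled. The function names are hypothetical, and the use of aws ecr describe-images for the final verification is an assumption; this section does not say which call the script uses.

```python
def ecr_registry(account_id, region):
    """Return the ECR registry host for an AWS account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, tag):
    """Return the full ECR URI for repository:tag."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


def push_commands(account_id, region, repository, tag):
    """Return the tag, push, and verify commands as argument lists.

    The ECR login is omitted here because it pipes the output of
    `aws ecr get-login-password` into `docker login --password-stdin`.
    """
    local_image = f"{repository}:{tag}"
    remote_image = ecr_image_uri(account_id, region, repository, tag)
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
        # Assumed verification step: look up the pushed tag in the repository.
        ["aws", "ecr", "describe-images", "--repository-name", repository,
         "--image-ids", f"imageTag={tag}", "--region", region],
    ]
```

Each returned list can be passed to subprocess.run with check=True after a successful ECR login.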
To push the image to ECR:
- Log in to the CLI on a machine or an Amazon EC2 node that has connectivity to the ESA.
- Navigate to the directory where the installation files are extracted.
- To execute the Python script, run the following command:
python scripts/emr_serverless_setup_cli.py push
- Press ENTER.
The script pushes the image to ECR and a confirmation appears.
[Push Image to ECR]
============================================================
+ aws sts get-caller-identity --output json
+ aws ecr describe-repositories --repository-names bdp-emr-serverless --region <region_name>
+ docker info
+ docker images --format '{{.Repository}}:{{.Tag}}'
+ aws ecr get-login-password --region <region_name> | docker login --username AWS --password-stdin <Account_ID>.dkr.ecr.<region_name>.amazonaws.com
[OK] Logged in to ECR: <Account_ID>.dkr.ecr.<region_name>.amazonaws.com
+ docker tag bdp-emr-serverless:tag_spark <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
[OK] Tagged image bdp-emr-serverless:tag_spark -> <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
+ docker push <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
[OK] Pushed image <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
[SUCCESS] Image pushed to ECR
Deploying the Image
The deploy argument runs the complete pipeline, from validation through the push to ECR, in a single command.
Note: This is an optional step.
To deploy the image:
- Log in to the CLI on a machine or an Amazon EC2 node that has connectivity to the ESA.
- Navigate to the directory where the installation files are extracted.
- To execute the Python script, run the following command:
python scripts/emr_serverless_setup_cli.py deploy
- Press ENTER.
The script deploys the image and a confirmation appears.
============================================================
EMR Serverless BDP Image Deployment (Full Pipeline)
============================================================
Runtime: spark
+ docker info
+ docker buildx version
+ aws sts get-caller-identity --output json
+ aws ecr describe-repositories --repository-names bdp-emr-serverless --region <region_name>
[Phase 1/3] Preparing assets...
[Phase 1: Prepare Assets]
============================================================
[INFO] Runtime: SPARK
[INFO] Log Output: stdout (audit logs will be sent to stdout)
[OK] replaced [sync] and updated [log] section (output=stdout, mode=drop) -> ../common/config.ini
[OK] updated GetCertificates.sh -> ../common/GetCertificates.sh
[Phase 2/3] Generating Dockerfile...
[Phase 2: Generate Dockerfile]
============================================================
+ which docker 2>/dev/null
+ docker info 2>/dev/null | grep -i 'docker root dir' || true
[INFO] traditional Docker - using BuildKit secrets (secure)
[OK] Generated /home/ubuntu/serverless/final_build/spark/Installation_Files/Dockerfile
[Phase 3/3] Building and pushing image...
============================================================
ESA Authentication Required
============================================================
Credentials needed to fetch certificates during Docker build.
NOT stored in config files or image layers.
Passed securely via Docker BuildKit secrets.
Authentication Method:
[1] Username/Password
[2] JWT Token
Select authentication method (1 or 2): 1
Enter ESA Username: admin
Enter ESA Password:
[Phase 3: Build]
============================================================
+ aws ecr describe-repositories --repository-names bdp-emr-serverless --region <region_name>
+ aws ecr get-login-password --region <region_name> | docker login --username AWS --password-stdin <Account_ID>.dkr.ecr.<region_name>.amazonaws.com
+ which docker 2>/dev/null
+ docker info 2>/dev/null | grep -i 'docker root dir' || true
[BUILD] traditional Docker - using BuildKit secrets (secure)
+ cd /home/ubuntu/serverless/final_build/spark/Installation_Files && DOCKER_BUILDKIT=1 docker build --secret id=esa_user,src=/tmp/tmphax6dcg9.secret --secret id=esa_password,src=/tmp/tmpzgrig1jz.secret -t bdp-emr-serverless:tag_spark -f Dockerfile .
[OK] Built local image bdp-emr-serverless:tag_spark for runtime 'spark'
+ docker tag bdp-emr-serverless:tag_spark <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
+ docker push <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
[OK] Pushed <Account_ID>.dkr.ecr.<region_name>.amazonaws.com/bdp-emr-serverless:tag_spark
============================================================
[SUCCESS] All phases completed
============================================================