Performing URP Operations

The instructions in this section apply only to the Serverless approach.

The Big Data Protector on the EMR Serverless architecture provides the following approaches to perform URP operations:

  • AWS Web UI - operations using this approach return only the driver logs.
  • AWS CLI - operations using this approach return both the driver and executor logs.

Creating the EMR Serverless Application for Spark

  1. Log in to the AWS console.
  2. Navigate to the EMR page.
  3. From the left pane, click EMR Serverless.
  4. Under Manage applications, select the required EMR Studio.
  5. Click Manage applications.
  6. Click Create application.
  7. Under Application settings, specify a value for the following:
    1. Name
    2. Type
    3. Release version
  8. Under Application setup options, select the Use custom settings option.
  9. Under Custom image settings, select the Use the custom image with this application check box.
  10. Browse and select the required image from the Amazon Elastic Container Registry (ECR).
  11. Under Application logs and metrics, select the Deliver logs to Amazon CloudWatch check box.
  12. In the Log group name box, enter the name of the CloudWatch log group. The name must match the log group that was created to receive logs from the application.
  13. (Optional) Under Interactive endpoint, select the Enable endpoint for EMR studio check box to analyze data in Jupyter notebooks on EMR Serverless.
  14. Under Network connections, from the Virtual private cloud (VPC) list, select the required VPC.
  15. Select the required Subnets and the Security groups.
  16. Under Application behavior, set the idle time after which the application stops automatically.
  17. Click Create and start application.
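
The console steps above can also be approximated in code. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The function names, the 15-minute idle timeout, and all argument values are illustrative assumptions and are not part of the Big Data Protector product.

```python
def build_create_application_request(
    name,
    release_label,
    image_uri,
    log_group_name,
    subnet_ids,
    security_group_ids,
    idle_timeout_minutes=15,   # assumed default, adjust as required
    studio_enabled=False,      # optional interactive endpoint (step 13)
):
    """Build keyword arguments for the EMR Serverless CreateApplication call."""
    return {
        "name": name,
        "type": "SPARK",
        "releaseLabel": release_label,
        # Custom image from Amazon ECR (steps 9 and 10).
        "imageConfiguration": {"imageUri": image_uri},
        # Deliver logs to Amazon CloudWatch (steps 11 and 12).
        "monitoringConfiguration": {
            "cloudWatchLoggingConfiguration": {
                "enabled": True,
                "logGroupName": log_group_name,
            }
        },
        "interactiveConfiguration": {"studioEnabled": studio_enabled},
        # VPC subnets and security groups (steps 14 and 15).
        "networkConfiguration": {
            "subnetIds": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
        # Stop the application after it has been idle (step 16).
        "autoStopConfiguration": {
            "enabled": True,
            "idleTimeoutMinutes": idle_timeout_minutes,
        },
    }


def create_and_start_application(request, region_name):
    """Create and start the application; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("emr-serverless", region_name=region_name)
    application_id = client.create_application(**request)["applicationId"]
    client.start_application(applicationId=application_id)
    return application_id
```

Keeping the request builder separate from the AWS call makes it possible to review or log the settings before any resource is created.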

Submitting a Spark Job

  1. Create a Spark script using Protegrity functions.
  2. Upload the Spark script to the S3 bucket.
  3. Submit the job using the AWS CLI or AWS CloudShell. A sample command follows:
    aws emr-serverless start-job-run \
    --region <region_name> \
    --application-id <application_id> \
    --execution-role-arn arn:aws:iam::<Account_ID>:role/EMR-Serverless-Execution-Role \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://<script_path>/<script_name>.py"
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "cloudWatchLoggingConfiguration": {
                "enabled": true,
                "logGroupName": "<log_group_name>",
                "logStreamNamePrefix": "emrs",
                "logTypes": {
                    "SPARK_DRIVER": ["STDOUT", "STDERR"],
                    "SPARK_EXECUTOR": ["STDOUT", "STDERR"]
                }
            }
        }
    }'
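
The upload and submit steps can also be scripted. The following is a minimal sketch that mirrors the sample command, assuming boto3 is installed and AWS credentials are configured. The function names and the include_executor_logs switch are illustrative assumptions. The Spark script itself is not shown because its contents depend on the Protegrity functions used.

```python
def build_start_job_run_request(
    application_id,
    execution_role_arn,
    script_uri,
    log_group_name,
    log_stream_prefix="emrs",
    include_executor_logs=True,
):
    """Build keyword arguments for the EMR Serverless StartJobRun call."""
    log_types = {"SPARK_DRIVER": ["STDOUT", "STDERR"]}
    if include_executor_logs:
        log_types["SPARK_EXECUTOR"] = ["STDOUT", "STDERR"]
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": script_uri}},
        "configurationOverrides": {
            "monitoringConfiguration": {
                "cloudWatchLoggingConfiguration": {
                    "enabled": True,
                    "logGroupName": log_group_name,
                    "logStreamNamePrefix": log_stream_prefix,
                    "logTypes": log_types,
                }
            }
        },
    }


def upload_script(local_path, bucket, key):
    """Upload the Spark script to S3 and return its s3:// URI (step 2)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def submit_job(request, region_name):
    """Submit the job and return its run ID (step 3)."""
    import boto3

    client = boto3.client("emr-serverless", region_name=region_name)
    return client.start_job_run(**request)["jobRunId"]
```

A possible call sequence is upload_script, then build_start_job_run_request with the returned URI, then submit_job. The returned run ID can be passed to the get_job_run operation to check the job state.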
    

Last modified : January 13, 2026