The final article in this five-part series covers the last high-applicability scenario, Scenario 4: Windows AD + Open-Source Ranger.
We hope you have enjoyed the previous four articles in this series. In this final article, we introduce the last high-applicability scenario: “Windows AD + Open-Source Ranger.”
In this solution, Windows AD acts as the authentication provider and stores all user account data. Ranger acts as the authorization controller: it syncs account data from Windows AD so that privileges can be granted to those accounts. The EMR cluster, in turn, needs a series of Ranger plugins installed. These plugins check with the Ranger server to ensure the current user has permission to perform an action. The EMR cluster also syncs account data from Windows AD via SSSD so that users can log in to the cluster nodes and submit jobs. End users can log in to the EMR cluster nodes over SSH with their Windows AD accounts. If Hue is available, they can also log in to Hue with the same account.
Let’s deep dive into Ranger for more details; its architecture looks as follows:
The installer will finish the following jobs:
Generally, the installation and integration process can be divided into three stages:
The following diagram illustrates the progress in detail:
At stage 1, we do some preparatory work. At stage 2, we install and integrate. There are two options at this stage: an all-in-one installation driven by a command-line-based workflow, or a step-by-step installation. In most cases, the all-in-one installation is the best choice; however, the installation workflow may be interrupted by unforeseen errors. If you want to continue installing from the last failed step, use the step-by-step installation. Step-by-step is also the better choice if you want to retry a step with different argument values to find the right one. At stage 3, we create an EMR cluster. If you already have one, skip this job. In most cases, Ranger needs to be installed on an existing cluster, not a new one. EMR-native Ranger cannot be installed on an existing cluster (because EMR-native Ranger plugins can only be installed when the cluster is created), but open-source Ranger does not have this restriction, so you are free to install it on either an existing or a new EMR cluster.
There is a little overlap in the execution sequence between stages 2 and 3. At step 2.4, the installation pauses, and the installer prompts users to create their own cluster while it keeps monitoring the target cluster’s status. Once the cluster is ready, the installation resumes and performs the remaining actions.
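The pause-and-resume behavior at step 2.4 can be pictured as a simple polling loop. The following is a minimal sketch, not the installer's actual code: `wait_for_cluster`, its parameters, and the choice of `WAITING` and `RUNNING` as "ready" states are assumptions made for illustration. The state names themselves are the standard EMR cluster states.

```python
import time
from typing import Callable

# Assumption: the cluster counts as "ready" once EMR reports one of these states.
READY_STATES = {"WAITING", "RUNNING"}
# States from which the cluster will never become ready.
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_cluster(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the cluster is ready, fails, or the timeout expires.

    get_state is a hypothetical callable returning the current cluster state.
    With boto3 it could wrap
    emr.describe_cluster(ClusterId=...)["Cluster"]["Status"]["State"].
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in READY_STATES:
            return state  # the installation can resume
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster did not become ready in time")
        sleep(poll_seconds)
```

Passing the state lookup in as a callable keeps the loop independent of any particular AWS client and makes it easy to exercise with canned states.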
As a design principle, the installer does not include any actions to create an EMR cluster. You should always create the cluster yourself because an EMR cluster can have unpredictable settings, e.g., application-specific (HDFS, Yarn, etc.) configuration, step scripts, bootstrap scripts, and so on. Coupling Ranger’s installation with the EMR cluster’s creation is inadvisable.
Notes:
To integrate with Windows AD, EMR cluster nodes need to join the Windows domain (realm), which imposes constraints on the VPC. Before installing, please ensure the hostname of each EC2 instance is no more than fifteen characters. This is a limitation of Windows AD; however, because AWS assigns DNS hostnames based on the IPv4 address, the limitation propagates to the VPC. If the VPC's CIDR guarantees that the IPv4 address has no more than nine digits (excluding the dots), the assigned DNS hostnames stay within fifteen characters. Given this limitation, a recommended CIDR setting for the VPC is 10.0.0.0/16.
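To make the arithmetic concrete, here is a minimal sketch that checks whether every address in a candidate CIDR yields a hostname within the fifteen-character limit. It assumes the default IP-based hostname has the form `ip-a-b-c-d` (for example, `ip-10-0-1-23`); the function names are illustrative and not part of any installer.

```python
import ipaddress

MAX_HOSTNAME_LEN = 15  # the Windows AD limit stated above


def default_hostname(ip: str) -> str:
    """Assumed IP-based naming scheme: 10.0.1.23 -> ip-10-0-1-23."""
    return "ip-" + ip.replace(".", "-")


def longest_hostname(cidr: str) -> str:
    """Return one of the longest default hostnames any address in the CIDR can get."""
    network = ipaddress.ip_network(cidr)
    # A VPC CIDR block is at most a /16, so a full scan stays small.
    if network.num_addresses > 2 ** 16:
        raise ValueError("this sketch only scans networks up to /16")
    return max((default_hostname(str(ip)) for ip in network), key=len)


def cidr_fits(cidr: str) -> bool:
    """True if every hostname derived from the CIDR fits within the limit."""
    return len(longest_hostname(cidr)) <= MAX_HOSTNAME_LEN


if __name__ == "__main__":
    for cidr in ("10.0.0.0/16", "172.31.0.0/16"):
        name = longest_hostname(cidr)
        print(cidr, name, len(name), cidr_fits(cidr))
```

Under this assumption, 10.0.0.0/16 passes because its longest hostname is exactly fifteen characters, while a range such as 172.31.0.0/16 does not, since its first two octets already use five digits.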
Although we can change the default hostname after the EC2 instances are available, the hostname is used when the computers join the Windows AD directory, and this happens during the creation of the EMR cluster. Modifying the hostname afterward therefore does not work. (A possible workaround is to put the hostname-modifying actions into bootstrap scripts, but we haven’t tried it.) To change the hostname, please refer to the AWS documentation titled Change the hostname of your Amazon Linux instance.
The first task is to create a Windows AD server with PowerShell scripts. Start by creating an EC2 instance with the Windows Server 2019 Base image (2016 is also tested and supported). Next, log in with an Administrator account, download the Windows AD installation scripts file from here, and save it to your desktop.
Next, press “Win + R” to open a run dialog, copy the following command line, and replace the parameter values with your own settings:
The ad.ps1 has pre-defined default parameter values. The domain name is example.com, the password is Admin1234!, and the trusted realm is COMPUTE.INTERNAL. As a quick start, you can right-click the ad.ps1 file and select Run with PowerShell to execute it.
Note: In us-east-1, you cannot run the script by right-clicking and selecting “Run with PowerShell,” because the default trusted realm in that region is EC2.INTERNAL rather than the script's default of COMPUTE.INTERNAL. There, you should set -TrustedRealm EC2.INTERNAL explicitly via the above command line.
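The region rule in this note can be captured in a few lines. This is only a sketch of the rule as stated in this article (EC2.INTERNAL for us-east-1, COMPUTE.INTERNAL otherwise); `trusted_realm_for_region` is a hypothetical helper, not part of ad.ps1.

```python
def trusted_realm_for_region(region: str) -> str:
    """Pick the value to pass as -TrustedRealm, following the rule described above.

    Assumption: us-east-1 is the only region that needs EC2.INTERNAL;
    every other region uses the ad.ps1 default, COMPUTE.INTERNAL.
    """
    if region.strip().lower() == "us-east-1":
        return "EC2.INTERNAL"
    return "COMPUTE.INTERNAL"


if __name__ == "__main__":
    for region in ("us-east-1", "us-west-2"):
        print(region, trusted_realm_for_region(region))
```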
After the scripts are executed, the computer will ask to restart. This is forced by Windows. Wait for the computer to restart and then log in again as an Administrator so that the subsequent commands in the scripts file continue executing. Be sure to log in again; otherwise, part of the scripts will never get a chance to run.
After logging in, we can open “Active Directory Users and Computers” from Start Menu -> Windows Administrative Tools -> Active Directory Users and Computers or enter dsa.msc from the “Run” dialog to check on the created AD. If everything goes well, we will get the following AD directory:
Next, we need to check the DNS setting. An invalid DNS setting will result in installation failure.