Learn what autonomous driving research is, how traditional data management approaches slow autonomous development, and how to accelerate autonomous development.
The march towards autonomous vehicles continues to accelerate. While expert opinions differ on the specific timing and use cases that will emerge first, few deny that self-driving cars are in our future. Not surprisingly, when I review big data strategies with my automotive clients, data management strategies for autonomous driving research inevitably come up.
A few weeks ago, at DataWorks Summit 2017 in Munich, I co-presented a “Big Data in Automotive” session with NorCom and Microsoft, two Hortonworks solution partners collaborating with Hortonworks on data management solutions for autonomous driving research. Particularly intriguing was a discussion I had with Dr. Tobias Abthoff from NorCom. Through our discussions, a more robust data management strategy for autonomous vehicle research emerged.
As we all can appreciate, “teaching” a vehicle to drive under the full range of conditions it will encounter (i.e., road conditions, weather conditions, and the behavior of other traffic participants such as cars, trucks, or people) is a daunting proposition. If merely the thought of this makes you nervous, you’re not alone. According to the American Automobile Association (AAA), 75 percent of consumers are not yet ready to embrace self-driving cars. However, that is the very challenge facing automakers: teaching vehicles to unfailingly assess and respond to any combination of operational conditions “on-the-fly” through discrete rules (algorithms) governing a vehicle’s behavior.
Interestingly, humans and machines “learn” in similar ways. For any given situation, both humans and machines must first absorb experiences (data), followed by applying a set of rules (algorithms) that facilitate problem-solving. When outcomes are either positive or negative, we generally learn from the exercise.
As it turns out, teaching cars to drive is an incredibly data-intensive endeavor. Traditional data management approaches are straining to cope with the demands imposed by autonomous driving research. Consider the challenges in the following two areas: data storage and data processing.
Traditionally, vehicle test data has been stored in numerous Network Attached Storage (NAS) systems, often distributed across locations around the world. However, given the cost and performance limitations inherent in NAS-based systems, automakers are investigating more efficient storage solutions for autonomous vehicle research.
Once all of this vehicle data is stored, how is it actually used to teach a car to drive? Several data processing steps are required. First, each frame of video (with associated RADAR, LIDAR, and sensor data) is analyzed to capture exactly what was “seen” (e.g., a person crossing an intersection) and cataloged, providing a library of driving scenario “inputs” from which engineers can develop rules (algorithms) that dictate how a vehicle should respond. Next, these algorithms must be tested via simulations that replay the real-world vehicle data previously collected.
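To make these two steps concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a description of any automaker's pipeline: the FrameRecord fields, the label names, and the brake_rule function are all hypothetical, and real systems would derive labels from perception models rather than hand-written sets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRecord:
    """One annotated video frame; every field name here is hypothetical."""
    drive_id: str
    frame_no: int
    labels: frozenset  # what was "seen", e.g. {"pedestrian", "intersection"}


def build_catalog(frames):
    """Step 1: index frames by label so scenarios can be looked up later."""
    catalog = defaultdict(list)
    for frame in frames:
        for label in frame.labels:
            catalog[label].append(frame)
    return catalog


def find_scenario(catalog, *labels):
    """Return the frames whose labels include every requested label."""
    if not labels:
        return []
    wanted = set(labels)
    candidates = catalog.get(labels[0], [])
    return [frame for frame in candidates if wanted <= frame.labels]


def brake_rule(frame):
    """Toy rule under test: brake whenever a pedestrian is in view."""
    return "brake" if "pedestrian" in frame.labels else "continue"


# Illustrative data only; not taken from any real test drive.
frames = [
    FrameRecord("drive-001", 10, frozenset({"intersection"})),
    FrameRecord("drive-001", 11, frozenset({"intersection", "pedestrian"})),
    FrameRecord("drive-002", 7, frozenset({"highway", "truck"})),
]

catalog = build_catalog(frames)

# Step 2: replay a cataloged scenario against the rule being tested.
scenario = find_scenario(catalog, "pedestrian", "intersection")
decisions = [(f.drive_id, f.frame_no, brake_rule(f)) for f in scenario]
print(decisions)
```

The point of the catalog is that engineers can ask for "every recorded pedestrian at an intersection" and run a candidate rule against exactly those frames, instead of scanning all of the raw footage for each test.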
In our discussions, Dr. Abthoff shared with me a fascinating approach that NorCom has taken to address the data management challenges associated with autonomous vehicle research. The approach rests on the following two basic principles.
Because it can store data at very large scale (petabytes and beyond) and of any variety (video, LIDAR, RADAR, sensor, etc.), the Hadoop Distributed File System (HDFS) provides a high-performance and cost-effective foundation for storing data associated with autonomous vehicle research.
The NorCom approach also leverages Hadoop’s inherent capability to run massively scalable MapReduce and Spark workloads, which is particularly useful for processing algorithm test simulations. By doing so, NorCom has rewritten the data processing playbook. Rather than moving data to the algorithms on workstations for processing, this new method prescribes exactly the opposite: deploying the algorithms to the data (Hadoop), where high-performance computing also resides.
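The difference between the two directions can be sketched with a toy model in plain Python. This is not NorCom's implementation and it does not use Hadoop or Spark; the Node class, the run_on_workstation and run_on_cluster functions, and the speed samples are all hypothetical, chosen only to show how much has to travel in each case.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A storage node holding one block of recorded samples (hypothetical)."""
    name: str
    records: list


def run_on_workstation(nodes, algorithm):
    """Traditional direction: copy every record to one machine, then compute."""
    gathered = []
    for node in nodes:
        gathered.extend(node.records)  # each record crosses the network
    return algorithm(gathered), len(gathered)


def run_on_cluster(nodes, algorithm, combine):
    """Opposite direction: run the algorithm where each block is stored.

    Only one small partial result per node has to be moved and combined.
    """
    partials = [algorithm(node.records) for node in nodes]
    return combine(partials), len(partials)


def count_fast_samples(records, limit=30):
    """Toy algorithm: count speed samples above a limit."""
    return sum(1 for speed in records if speed > limit)


# Illustrative data only.
nodes = [
    Node("node-1", [10, 35, 40]),
    Node("node-2", [20, 31]),
    Node("node-3", [50, 5, 5, 33]),
]

central_result, records_moved = run_on_workstation(nodes, count_fast_samples)
local_result, partials_moved = run_on_cluster(nodes, count_fast_samples, sum)

print(central_result, records_moved)   # same answer, all 9 records moved
print(local_result, partials_moved)    # same answer, only 3 partials moved
```

Both paths produce the same answer, but the second one ships the (small) algorithm and returns one number per node, while the first ships all of the data. With nine toy samples the difference is trivial; with petabytes of video and sensor recordings it is the difference that matters.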
With this approach, data processing performance improves substantially. Simulation test results that once required days can now be achieved in minutes, accelerating the pace of autonomous development. Complementary technologies extend these benefits still further. For example, by equipping Hadoop nodes with Graphics Processing Units (GPUs), simulation computations based on deep learning frameworks can be dramatically accelerated. In addition, container technologies such as Docker allow legacy applications that could once run only on workstations to be deployed directly on the high-performance Hadoop cluster, without the need to adapt the applications.