
Apache Spark Internals: As Easy as Baking a Pizza!


This quick (and hopefully easy) read is a fun take on how you can draw parallels between baking a pizza and running an Apache Spark application!
Even if you aren't into pizzas, I am sure you have a vague idea of what goes into making one.

Okay! So here we go. Let me introduce Dug. Making "DAGs" is his specialty; he is good with schedules and with coming up with a good overall plan for any "job" given to him. He is in the fresh pizza-making business with two other partners: Tessy, who is quite the "Task Master," and Buster, who is a good "Cluster Manager."

When a pizza job order comes in, Dug, as the planner in charge, comes up with a set of predefined stages and the operations that go into each stage of making the pizza. For a very simple pizza, he will come up with a DAG (Directed Acyclic Graph; just read that aloud again, it is quite an intuitive name and graph, isn't it?). So Dug works out an optimized set of stages for making the pizza.

Good job! But hold on: remember, he is the planner in charge, so he has the additional responsibility of creating the specific tasks associated with this job. Essentially, Dug treats each unit of work as a task; the bigger the pizza, the larger the number of tasks to be done! And so Dug comes up with a set of tasks that can then be handed over to Tessy, the Task Scheduler. (The first sketch at the end of this section shows how this planning maps onto Spark code.)

Now that the exact plan is laid out, the actual execution of these tasks comes into the picture. Picture the worker rooms of the pizza shop, with executors coming in when summoned. In the pizza shop, assume you have a pool of worker rooms; each room can hold a limited number of pizza chefs, depending on the room's capacity (let us call the chefs "executors," as they execute the tasks given to them). Only when a task needs to be executed is a chef called into a room to carry it out.

In short, when it finally comes to execution, Tessy the Task Scheduler takes the set of tasks and works together with Buster the Cluster Manager. Tessy decides which task gets scheduled when, and Buster decides when an "executor" should be summoned into which "worker" room. To reiterate: Tessy only knows when to hand out the tasks in the right order, and Buster only knows which worker rooms are free to hold the executors and how to call upon them.

spark.executor.cores

One employee with one pair of hands (a 1-core vCPU) can execute one task at a time. But let's say we attach, say, 4 pairs of hands (a 4-core vCPU) to the employee; they can then execute 4 tasks at the same time!

spark.executor.memory

Stretch this imagery a bit more. Let's say we have the feature of attaching hands of different sizes to the executor.
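To ground the analogy, here is a minimal Scala sketch of the kind of job Dug plans (the input file orders.txt and its contents are hypothetical, and the local master is just for experimenting). Narrow transformations such as flatMap and map share a stage; the shuffle forced by reduceByKey starts a new stage; and each stage yields one task per partition:

import org.apache.spark.sql.SparkSession

object PizzaJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("pizza-job")
      .master("local[4]") // local run with 4 worker threads, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: one comma-separated list of toppings per order.
    // 4 partitions means 4 tasks per stage.
    val orders = sc.textFile("orders.txt", 4)

    // Narrow transformations: each partition is processed independently,
    // like one chef working on their own slice of the job. Same stage.
    val toppings = orders
      .flatMap(_.split(","))
      .map(t => (t.trim, 1))

    // Wide transformation: reduceByKey needs a shuffle, so the DAG
    // scheduler (Dug) cuts a stage boundary here.
    val counts = toppings.reduceByKey(_ + _)

    // Only the action triggers the job: Dug turns the lineage graph into
    // stages, and each stage into one task per partition, then hands the
    // tasks to the task scheduler (Tessy).
    counts.collect().foreach(println)

    spark.stop()
  }
}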
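And here is a minimal sketch of how those two knobs are set when the session is created; the values are purely illustrative, not recommendations, and executor settings like these only take effect on a real cluster manager (Buster), not in local mode:

import org.apache.spark.sql.SparkSession

// Sizing the "worker rooms" and the chefs' "hands".
val spark = SparkSession.builder
  .appName("pizza-shop")
  .config("spark.executor.instances", "3") // three chefs summoned
  .config("spark.executor.cores", "4")     // four pairs of hands per chef
  .config("spark.executor.memory", "2g")   // how big the hands are: heap per chef
  .getOrCreate()

// With 3 executors x 4 cores each, up to 12 tasks can run at the same time.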
