Best Practices for Resource Management in PrestoDB

May 13, 2022

This post discusses some of the paradigms that PrestoDB introduces with resource groups as well as best practices.
Resource management in databases allows administrators to control resources and assign priorities to sessions, ensuring that the most critical transactions get a significant share of system resources. In a distributed environment, resource management makes data more accessible and manages resources across a network of autonomous computers (i.e., a distributed system). Resource sharing is also the basis of resource management in distributed systems.

PrestoDB is a distributed query engine written by Facebook as the successor to Hive for highly scalable processing of large volumes of data. Written for the Hadoop ecosystem, PrestoDB is built to scale to tens of thousands of nodes and process petabytes of data. To be usable at production scale, it was built to serve thousands of queries from many users without running into bottlenecks or “noisy neighbor” issues. PrestoDB uses resource groups to organize how different workloads are prioritized.

This post discusses some of the paradigms that PrestoDB introduces with resource groups, as well as best practices and considerations to think about before setting up a production system with resource grouping.

Presto can manage quotas for several “resources.” The two main ones are CPU and memory. In addition, more granular constraints can be specified, such as concurrency, time, and cpuTime. All of this is configured through a pretty ugly JSON configuration file, shown in the example below from the PrestoDB documentation pages.
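Because that JSON file is easy to get wrong by hand, one option is to generate and sanity-check it from a script. The Python below is a minimal sketch, assuming PrestoDB's file-based resource group manager (enabled, as we read the docs, with resource-groups.configuration-manager=file and resource-groups.config-file pointing at the JSON file). Only the property names (rootGroups, subGroups, softMemoryLimit, hardConcurrencyLimit, maxQueued, schedulingPolicy, schedulingWeight, selectors, cpuQuotaPeriod) come from the PrestoDB documentation; the group names global.etl and global.adhoc, the limits, the selector pattern, and the helpers leaf_paths, check_selectors, and select_group are hypothetical. The checks mirror two documented rules: a group may have subgroups or accept queries but not both, so selectors should target leaf groups; and selectors are evaluated in order, with the first match winning.

```python
import json
import re

# Hypothetical config for PrestoDB's file-based resource group manager.
# Property names follow the PrestoDB docs; group names, limits, and the
# selector pattern are illustrative values chosen for this sketch.
CONFIG = {
    "rootGroups": [
        {
            "name": "global",
            "softMemoryLimit": "80%",
            "hardConcurrencyLimit": 100,
            "maxQueued": 1000,
            "schedulingPolicy": "weighted",
            "subGroups": [
                {
                    "name": "etl",
                    "softMemoryLimit": "50%",
                    "hardConcurrencyLimit": 20,
                    "maxQueued": 200,
                    "schedulingWeight": 3,
                },
                {
                    "name": "adhoc",
                    "softMemoryLimit": "30%",
                    "hardConcurrencyLimit": 10,
                    "maxQueued": 50,
                    "schedulingWeight": 1,
                },
            ],
        }
    ],
    # Selectors are evaluated in order; the first match decides the group,
    # so the catch-all selector (no user/source pattern) must come last.
    "selectors": [
        {"source": ".*pipeline.*", "group": "global.etl"},
        {"group": "global.adhoc"},
    ],
    "cpuQuotaPeriod": "1h",
}


def leaf_paths(groups, prefix=""):
    """Return dotted paths (e.g. 'global.etl') of groups that have no subGroups."""
    leaves = []
    for group in groups:
        path = f"{prefix}.{group['name']}" if prefix else group["name"]
        children = group.get("subGroups", [])
        leaves.extend(leaf_paths(children, path) if children else [path])
    return leaves


def check_selectors(config):
    """Return the target groups of selectors that do not point at a leaf group."""
    leaves = set(leaf_paths(config["rootGroups"]))
    return [s["group"] for s in config["selectors"] if s["group"] not in leaves]


def select_group(config, user, source):
    """Return the group of the first selector whose user/source regexes match.

    Assumes full-string regex matching; a selector with no user/source
    pattern matches every query.
    """
    attrs = {"user": user, "source": source}
    for selector in config["selectors"]:
        if all(re.fullmatch(selector[key], value)
               for key, value in attrs.items() if key in selector):
            return selector["group"]
    return None  # no selector matched; the query would have no group


if __name__ == "__main__":
    problems = check_selectors(CONFIG)
    assert not problems, problems
    # Redirect this output to the file named by resource-groups.config-file.
    print(json.dumps(CONFIG, indent=2))
```

For example, select_group(CONFIG, "alice", "nightly-pipeline") returns "global.etl", while a query whose source is "presto-cli" falls through to the catch-all selector and lands in "global.adhoc". Generating the file from code also makes it easy to review limit changes in version control instead of editing deeply nested JSON by hand.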