
Gravity, Residency, and Latency: Balancing the Three Dimensions of Big Data


Struggling with challenges related to big data? Remember to balance data gravity, data residency, and data latency priorities.
The concept of big data has been around for well over a decade now, and today, data sets are bigger than ever before.
Content production of all types continues to explode. Various sorts of telemetry-producing devices, from IoT sensors to robots to cloud-based microservices, churn out massive quantities of data, all of them potentially valuable.
On the consumption side, AI has given us good reason to generate, collect, and process vast sums of data – the more, the merrier. Every AI use case, from autonomous vehicles to preventing application outages, can be improved with more data, all the time.
Where to put all these data remains an ongoing concern. The explosion of data threatens to swamp any reduction in storage costs we might eke out. Data gravity continues to weigh us down. And no matter how many data we have, we want any insights we can extract from them right now.
The challenge of how to manage all these data, in fact, is more complicated than people realize. There are multiple considerations that impact any decision we might make about collecting, processing, storing, and extracting value from increasingly large, dynamic data sets.
Here are some of the basics.
The first dimension we must deal with is data gravity. Data gravity refers to the cost and time required to move large data sets, compared with the cost and time of moving compute capabilities closer to the data instead.
If we’re moving data in or out of a cloud, there are typically ingress and egress fees we must take into account. It’s also important to consider the storage costs for those data, given how hot (rapidly accessible) those data must be.
Bandwidth or network costs can also be a factor, especially if moving multiple data sets in parallel is the best bet.
Every bit as important as cost is the time consideration. How long will it take to move these data from here to there? If we’re moving data through a narrow pipe, such time constraints can be prohibitive.
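To make that tradeoff concrete, here is a minimal back-of-envelope sketch in Python. The function name, the egress rate, and the bandwidth figure are illustrative assumptions, not quotes from any particular provider:

```python
# Back-of-envelope estimate of moving a large data set out of a cloud.
# The default egress rate and bandwidth below are assumptions for illustration.

def migration_estimate(dataset_tb: float,
                       egress_usd_per_gb: float = 0.09,  # assumed egress rate
                       bandwidth_gbps: float = 1.0) -> dict:
    """Estimate the cost and wall-clock time of moving a data set."""
    dataset_gb = dataset_tb * 1024
    egress_cost = dataset_gb * egress_usd_per_gb

    # 1 Gbps is 0.125 GB/s; protocol overhead is ignored for simplicity.
    gb_per_second = bandwidth_gbps / 8
    transfer_hours = dataset_gb / gb_per_second / 3600

    return {"egress_cost_usd": round(egress_cost, 2),
            "transfer_hours": round(transfer_hours, 1)}

# A 100 TB data set over a 1 Gbps link:
print(migration_estimate(100))
# {'egress_cost_usd': 9216.0, 'transfer_hours': 227.6}
```

At those assumed rates, a 100 TB migration costs thousands of dollars in egress fees and takes more than nine days of continuous transfer, which is often what tips the decision toward moving compute to the data instead.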
The second dimension is data residency. Data residency refers to the regulatory constraints that limit the physical locations of our data.
Some jurisdictions require data sovereignty – keeping data on EU citizens within Europe, for example. Other regulations constrain the movement of certain data across borders.
In some cases, data residency limitations apply to entire data sets, but more often than not, they apply to specific fields within those data sets.
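As a rough illustration of field-level residency, here is a hypothetical Python sketch that tags each field with the regions where it may be stored and strips disallowed fields before a record crosses a border. The field names and region rules are invented for the example; real policies come from your legal and compliance teams:

```python
# Hypothetical field-level residency filter. The policy mapping below is
# invented for illustration, not a statement of any actual regulation.

RESIDENCY_POLICY = {
    "user_id":    {"EU", "US"},  # assume a pseudonymous ID may travel
    "email":      {"EU"},        # assume personal data must stay in the EU
    "birth_date": {"EU"},
    "event_type": {"EU", "US"},  # assume non-personal telemetry is unrestricted
}

def filter_for_region(record: dict, destination: str) -> dict:
    """Drop any fields that may not be stored in the destination region."""
    return {field: value
            for field, value in record.items()
            if destination in RESIDENCY_POLICY.get(field, set())}

record = {"user_id": "u-123", "email": "a@example.eu",
          "birth_date": "1990-01-01", "event_type": "login"}

print(filter_for_region(record, "US"))
# {'user_id': 'u-123', 'event_type': 'login'}
```

The point of the sketch is that the whole data set does not have to stay put; only the constrained fields do, which leaves room to move or replicate the rest freely.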
