Data-driven 2021: Predictions for a new year in data, analytics and AI

A tumultuous 2020 has had many in the industry pondering what comes next, yielding almost 50 pages of predictions from more than 30 companies in my inbox. Here's a roundup of many of the 2021 predictions, broken down into the topics that garnered the most conjecture.
Towards the end of each year, I receive a slew of predictions from data/analytics industry executives and luminaries, focused on the year ahead. This year, those predictions filled a 49-page document. While I couldn't include all of them, I've rounded up many of this year's prognostications, from over 30 companies, in this post. The roster includes numerous well-known data/analytics players, including Cloudera, Databricks, Micro Focus, Qlik, SAS and Snowflake, to name a few. Thoughts from execs at Andreessen Horowitz, the Deloitte AI Institute and O'Reilly are in the mix as well, as are those from executives at smaller but still important industry players.

This year's groupings include data warehouse vs. data lake; the democratization of artificial intelligence (AI); responsible AI; the convergence of AI and business intelligence (BI); growth in data literacy; the data governance imperative; and, of course, the interplay between analytics and the COVID-19 pandemic. Anyway, enough preamble; let's get on with this year's predictions.

One popular topic this year was the relative strength, and ultimate survivability, of the data warehouse and data lake approaches to analytics. Bob Muglia, Snowflake's former CEO, says that fully transacting images and videos together with any source of data in a data warehouse is "…coming in the next two to three years, and that's going to be the nail in the coffin for the data lake."

Paige Roberts, Micro Focus' Open Source Relations Manager, feels "the data warehouse vendors have an unbeatable head start [over data lake vendors] because building a solid, dependable analytical database like Vertica can take ten years or more alone. The data lake vendors have only been around about ten years and are scrambling to play catch-up."

George Fraser, CEO of Fivetran, says "I think 2021 will reveal the need for data lakes in the modern data stack is shrinking," adding that "…there are no longer new technical reasons for adopting data lakes because data warehouses that separate compute from storage have emerged." If that's not categorical enough for you, Fraser sums things up thus: "In the world of the modern data stack, data lakes are not the optimal solution. They are becoming legacy technology."

Data lake supporters are even more ardent. In a prediction he titled "The Data Lake Can Do What Data Warehouses Do and Much More," Tomer Shiran, co-founder of Dremio, says "data warehouses have historically had…advantages over data lakes. But that's now changing with the latest open source innovations in the data tier." He mentions Apache Parquet and Delta Lake as two such innovations, as well as the lesser-known projects Apache Iceberg and Nessie. Together, these projects allow data to be stored in open, columnar formats across file systems, versioned and processed with transactional consistency.
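To make that more concrete, here is a minimal sketch of the versioning and transactional consistency such table formats layer over plain Parquet files. It uses the open source deltalake Python bindings for Delta Lake; the package choice, the local path and the sample data are our own illustrative assumptions, not anything Shiran or the projects prescribe.

```python
# A minimal sketch of a versioned, transactional table on a data lake,
# using the open source `deltalake` Python package
# (pip install deltalake pandas). Path and sample data are illustrative.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

orders = pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 24.50]})

# Each write is an atomic commit over ordinary, open-format Parquet files,
# producing a new table version rather than mutating files in place.
write_deltalake("/tmp/orders_delta", orders)                 # version 0
write_deltalake("/tmp/orders_delta", orders, mode="append")  # version 1

table = DeltaTable("/tmp/orders_delta")
print(table.version())    # latest committed version (1)
print(table.to_pandas())  # a consistent snapshot of that version
```

The same pattern works against object storage such as Amazon S3, which is what makes warehouse-like guarantees on a lake plausible.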
Martin Casado, General Partner of Andreessen Horowitz, put it this way: "If you look at the use cases for data lakes vs. data analytics, it's very different. Data lakes tend to be more unstructured data, compute intensive, focused on operational AI. The use case for operational AI is larger and growing faster. Over time, I think you can argue that it's the data lake that ends up consuming everything."

Dipti Borkar, at PrestoDB-focused Ahana, says "As cloud adoption has become mainstream, companies are creating and storing the majority of their data in the cloud, especially in cost-efficient Amazon S3-based data lakes." Her colleague Dave Simmen, Ahana's CTO, says "A federated, disaggregated stack…is displacing the traditional data warehouse with its tightly coupled database." Simmen also believes that "…we'll see traditional data warehousing and tightly coupled database architectures relegated to legacy workloads."

Over at Databricks, the strategy is to focus on data lake technology but to imbue it with certain data warehouse-like qualities. Joel Minnick, Databricks' VP of Marketing, explains it this way: "The vision we see taking shape now is called the lakehouse. It provides a structured transactional layer to a data lake to add data warehouse-like performance, reliability, quality, and scale. It allows many of the use cases that would traditionally have required legacy data warehouses to be accomplished with a data lake alone."

What about players with no dog in the race? O'Reilly's Rachel Roumeliotis, VP of AI and Data, acknowledges the validity of the lake and lakehouse models: "Data lakes have experienced a fairly robust resurgence over the last few years, specifically cloud data lakes…these will remain on the radar in 2021. Similarly, the data lakehouse, an architecture that features attributes of both the data lake and the data warehouse, gained traction in 2020 and will continue to grow in prominence in 2021." Roumeliotis gives a nod to the warehouse model, adding: "Cloud data warehouse engineering develops as a particular focus as database solutions move more to the cloud."

Over at Starburst, which focuses on Trino (formerly PrestoSQL), an engine well suited to querying data lakes but one that can also connect to data warehouses and numerous other data sources, CEO Justin Borgman says "We'll see business leaders pointing…to make data-driven decisions, which encompasses all types of data no matter where it lives — in the cloud, on prem, in data lakes or data warehouses."
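To illustrate that query-anything posture, below is a rough sketch of a federated query through Trino's Python client. The coordinator host, catalog names and table names are hypothetical stand-ins; what you can actually join depends entirely on which connectors your deployment has configured.

```python
# A rough sketch of a federated query via the Trino Python client
# (pip install trino). Host, catalogs and tables here are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com",  # hypothetical coordinator address
    port=8080,
    user="analyst",
)
cur = conn.cursor()

# One SQL statement spanning two sources: files on a data lake
# (hive catalog) joined to a warehouse table (postgresql catalog).
cur.execute("""
    SELECT o.region, SUM(e.amount) AS total
    FROM hive.web.click_events AS e
    JOIN postgresql.public.orders AS o ON e.order_id = o.id
    GROUP BY o.region
""")
for row in cur.fetchall():
    print(row)
```

As you can imagine, there were a great number of predictions focused on AI and machine learning (ML) this year; there were so many, in fact, that they break down into a few substantial subcategories. One set of predictions focuses on how AI will become more democratized, accessible, affordable and mature.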
Starburst's Borgman says "ML/AI will become more accessible to a broader base of users." He adds that while data science backgrounds have been necessary to take advantage of AI up until now, this "is changing to include anyone in the organization who needs data access to make more intelligent decisions." Alex Peña, Lead Research and Development Engineer at Linode, thinks the economics of AI will improve its accessibility too, saying "Smaller businesses are going to be able to take advantage of AI as the cost of cloud GPU services comes down." Ryan Wilkinson, Chief Technology Officer at IntelliShift, concurs, stating: "with hardware at a point to support AI…the ML and AI software running in the cloud will mature faster than ever before."

Ryohei Fujimaki, Ph.D., Founder & CEO of dotData, sees automated machine learning (AutoML) as another driver of AI accessibility for non-data scientists, predicting that in 2021 "…we will see the rise of AutoML 2.0 platforms that take 'no-code' to the next level." Fujimaki also feels that AutoML will help take AI beyond predictive analytics use cases, because it "…can also provide invaluable insights into past trends, events and information that adds value to the business by allowing businesses to discover the 'unknown unknowns,' trends and data patterns that are important, but that no one had suspected would be true."
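For readers new to the term, the core AutoML idea is that the machine, rather than a data scientist, searches over candidate models and hyperparameters. The short scikit-learn loop below is a generic toy sketch of that idea under our own assumptions; it is not dotData's product or its AutoML 2.0 approach.

```python
# A toy illustration of the AutoML idea: automatically searching over
# candidate models and hyperparameters instead of hand-tuning them.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate model families and the hyperparameter grids to search.
candidates = [
    (LogisticRegression(max_iter=5000), {"C": [0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [50, 200]}),
]

best_score, best_model = -1.0, None
for model, grid in candidates:
    search = GridSearchCV(model, grid, cv=5).fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(best_model)                        # the winning configuration
print(best_model.score(X_test, y_test))  # held-out accuracy
```

Commercial platforms add automated feature engineering and no-code interfaces on top of this search loop, which is what puts the capability within reach of non-data scientists.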
