Is Data Discovery Just a Buzzword?

May 11, 2017

Learn about data discovery, data discovery’s rise to fame, what real data discovery should be like, and why the author thinks data discovery is a buzzword.
Ever been irked by a buzzword? If you’re a data analyst, we’re betting you have.
The business analytics industry is notorious for its use of jargon. That’s not a problem for those of us in the loop, but for business users who want to get in on the analytics action, the terminology alone can be a major barrier.
If you’re a business user who wants to capitalize on the huge opportunity presented by better business data analytics, read on. We’re about to give you the real deal when it comes to one term you’ve definitely seen a lot of: data discovery.
Let’s start with a simple definition.
At its core, data discovery is the process of teasing out relevant data insights and delivering those insights to the business users who need them — a great proposition as more business users want to access and analyze their own data.
In the early days of BI, data analytics was reserved for technical and IT departments. Marketing managers, R&D heads, and any other type of business user had to rely on manual reports or templates that, as you can imagine, didn’t always give them the answers they needed.
Enter: Data Discovery Tools.
In 2008, Kurt Schlegel, a Research VP at Gartner, published a paper called “The Rise of Data Discovery Tools.” In it, he predicted big growth for data discovery tools, and indeed, by 2012, data discovery had become a multibillion-dollar industry under the larger umbrella of BI.
As more diverse business users called for data visualizations they could access and digest quickly, the need for data discovery systems soared.
Suddenly, business users across the organization were able to get the answers (and internal approvals) they needed in a format that was easy to understand and act on. But it came at a price.
Most data discovery tools rely on resource-heavy data prep, forcing you to aggregate the data before you can visualize it.
This requires additional tools for cleansing, on top of a separate data warehouse, and usually at least one frustrated call to IT. The process is not only lengthy and expensive but also leaves a lot of room for error. Unfortunately, the same problem of a high-cost, highly fragmented setup also applies to data discovery visualization tools.
If you’re like most organizations, you’re probably working with multiple systems like Google Analytics, SQL, and Excel. What you want is a single view of the data to help you prove your point or make the right call.
Many visualization tools can indeed combine multiple sources into a single table. But for this to happen, someone has to model the data first in order to get a correct analysis. And much like data prep, data modeling takes time and resources.
This tempts many users into simply skipping the time-intensive data modeling step, but that would be a big mistake. Without the right data modeling, you’ll be looking at inaccurate data and defeating the purpose altogether.
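To see why skipping the modeling step produces inaccurate numbers, consider a minimal sketch in Python. The two sources, their `campaign` key, and every figure below are hypothetical and not taken from any real tool. Joining the raw rows directly repeats each order once per matching session row, so revenue is overcounted. Summarizing each source to one row per campaign before joining keeps the totals correct.

```python
from collections import defaultdict

# Hypothetical rows from two sources that share a "campaign" key.
orders = [("spring", 100.0), ("spring", 50.0), ("summer", 80.0)]  # (campaign, revenue)
sessions = [("spring", 300), ("spring", 200), ("summer", 400)]    # (campaign, visits)


def naive_revenue(orders, sessions):
    """Join raw rows on campaign, then sum revenue (this overcounts)."""
    totals = defaultdict(float)
    for order_campaign, revenue in orders:
        for session_campaign, _visits in sessions:
            if order_campaign == session_campaign:
                # Each order is counted once per matching session row.
                totals[order_campaign] += revenue
    return dict(totals)


def modeled_totals(orders, sessions):
    """Summarize each source to one row per campaign, then join."""
    revenue = defaultdict(float)
    for campaign, amount in orders:
        revenue[campaign] += amount
    visits = defaultdict(int)
    for campaign, count in sessions:
        visits[campaign] += count
    # Keep only campaigns present in both sources.
    return {c: (revenue[c], visits[c]) for c in revenue if c in visits}


naive = naive_revenue(orders, sessions)
modeled = modeled_totals(orders, sessions)
print("naive join:  ", naive)
print("modeled join:", modeled)
```

In this sketch the unmodeled join reports 300.0 in spring revenue, while the orders actually add up to 150.0.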
So with all the cost and productivity barriers, why are data discovery tools still so popular?
More importantly…
Since most visual analytics platforms offer only half a solution (i.e., visualization with no or incomplete data prep and modeling), data discovery tools are still the most popular option for filling the gaps (not to mention boosting revenues for service providers).
In the past, investing in these tools made sense. It was the only way organizations could make their data available across the enterprise.
Thankfully, the BI landscape has evolved. Now, with a plethora of full-stack solutions and columnar-based tools, you can connect directly to raw data and join data sources into a single data model, or as we like to call it, a single version of the truth.
A great BI tool is one that uses logical joins to combine multiple disparate data sources into one user-friendly, easy-to-read, and, more importantly, accurate data visualization.
In other words, you get to mash up multiple data sources without messing up your analysis, which also solves the vast majority of modeling challenges before they even arise.
With a full-stack solution, there’s no need to spend time or money on upkeep because there’s no need to juggle multiple models or worry about manually cleaning, structuring, and updating data in a centralized data warehouse.
You can now do all of this visually, no coding required. A system with a focus on predictive analytics can even remember past updates and automate them to save you time in the future.
Investing in a variety of data discovery tools actually creates more gaps within the organization and places a greater burden on IT and technical teams as they try to support multiple systems and users.
Compare that with a full-stack solution and there’s no question about which one actually democratises data.
For example, a columnar-based solution combines different datasets and accesses insights from raw data. Business users can plug in data sources as and when they need them in no time.
With a more modern, full-stack solution, every business user can take control of their data management. Even non-technical users can enter easy commands to run logic on data and make it as simple or complex as they need, all within a single environment.
Multiple people can access data without having to download the files to their PC, update the data, and then reload the server, as is the case with many data discovery tools. This long-held and incredibly cumbersome process isn’t just time-intensive; it also requires a huge amount of RAM and CPU on each user’s machine, which can get very expensive, very quickly.
On the other hand, full-stack solutions with a central server let you easily add a file from your own machine and then make the changes on the remote server directly, giving you faster data syncing while using far fewer resources. You can even work with billions of rows of data.
Full stack solutions with a central server also resolve the issue of errors and discrepancies arising from multiple people accessing the same data simultaneously, because everything is synced in real time.
How many calculations do you want to see within one query or analysis? Most visualization tools make you summarize the data on two levels. But sometimes multiple calculations are what’s needed.
For example, if you want to compare how many product units were sold each month with the average number sold per day, you need an extra calculation: first sum the units for every month, then divide each sum by the number of days in that month.
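The two-step calculation can be sketched in a few lines of Python. The sales records and the `monthly_summary` helper are made up for illustration and do not come from any particular BI product; the point is only that the second level (average per day) is computed from the result of the first (the monthly sum).

```python
import calendar
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (sale date, units sold).
sales = [
    (date(2017, 1, 5), 120),
    (date(2017, 1, 20), 190),
    (date(2017, 2, 3), 140),
    (date(2017, 2, 14), 140),
]


def monthly_summary(records):
    """Return {(year, month): (total_units, average_units_per_day)}."""
    # Level 1: sum the units sold in every month.
    totals = defaultdict(int)
    for day, units in records:
        totals[(day.year, day.month)] += units

    # Level 2: divide each monthly sum by the number of days in that month.
    summary = {}
    for (year, month), total in totals.items():
        days_in_month = calendar.monthrange(year, month)[1]
        summary[(year, month)] = (total, total / days_in_month)
    return summary


summary = monthly_summary(sales)
for (year, month), (total, per_day) in sorted(summary.items()):
    print(f"{year}-{month:02d}: {total} units, {per_day:.1f} per day")
```

Here January's 310 units over 31 days and February's 280 units over 28 days both work out to 10 units per day, a comparison that a single-level summary of monthly totals would not show.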