
Pushing IoT Data Gathering, Analysis, and Response to the Edge

Dive into IoT architecture and where edge computing fits in for data gathering, analysis, and response. Furthermore, see how the TICK stack can help.
This article is featured in the new DZone Guide to IoT: Harnessing Device Data. Get your free copy for more insightful articles, industry statistics, and more!
The Internet of Things is not so much a thing as it is a concept. It’s a concept that enables us to instrument our world with sensors and respond to the data coming in from those sensors in meaningful ways. It’s about adding sensors to all the things in our world so that we can measure, analyze, visualize, predict, and react to the environment around those things.
The IoT is not a product because it’s not a single thing. If that sounds fairly abstract, it is, but we’ll figure it all out. In addition to all of that, there are multiple segments, or markets, within the IoT. The one most people are familiar with is consumer IoT. You may have a smart thermostat in your house, some smart switches, or other internet-connected devices and appliances. These are generally considered part of consumer IoT. Then there’s the Industrial IoT, or IIoT. This segment includes things like smart buildings, industrial automation, and monitoring of industrial processes. This is a part of the IoT that most people never see, and rarely hear about, but it’s where the most growth and innovation happens, and it’s what we will focus on in this article.
Almost any IoT architecture is going to involve a few basic components like sensors, a place to collect and store data, some way to visualize and interact with the data, and often some way for actions to be taken based on events in the data.
It’s a fairly simple concept that’s been around for a very long time: a sensor collects data and sends it to a server to store it. That data is then made available for analysis and, based on that, some action is taken. But as IoT deployments grow in size and complexity, simply having every sensor send all of its data to a single, monolithic backend becomes less and less practical.
First, the sheer amount of data quickly becomes overwhelming for any single backend. Second, there are few systems that can handle that volume of sensor data. To illustrate this point, let’s look at a modest-sized IoT deployment of, say, 10,000 sensors deployed across an enterprise. Each sensor takes a series of 5 readings every second. That’s 50,000 readings per second streaming over the internet to a single backend server. If each reading is 1kB of data, that’s 5kB per sensor per second, or 50MB per second overall.
It’s fairly reasonable to assume that almost any competent backend system could handle this fairly modest amount of data. But now let’s begin to scale that to something that would actually go into production, and the numbers rapidly grow out of control. A sensor that collects 1,000 readings per second, at 1kB of data per reading, generates 1MB of data per second. At 10,000 sensors, you’re streaming 10GB of data per second. Still think it sounds reasonable to stream all of that data to a single backend system in real time? We need to look for alternatives.
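To keep the arithmetic honest, here is a quick back-of-the-envelope sketch in plain Python using the figures above (nothing here is specific to any particular IoT product):

```python
def fleet_rate_bytes_per_s(sensors: int, readings_per_s: int,
                           bytes_per_reading: int) -> int:
    """Aggregate ingest rate for a fleet of identical sensors."""
    return sensors * readings_per_s * bytes_per_reading

# Modest deployment: 10,000 sensors, 5 readings/second, 1 kB per reading.
modest = fleet_rate_bytes_per_s(10_000, 5, 1_000)
print(f"{modest / 1e6:.0f} MB/s")    # 50 MB/s into a single backend

# Production scale: 1,000 readings/second per sensor, same reading size.
production = fleet_rate_bytes_per_s(10_000, 1_000, 1_000)
print(f"{production / 1e9:.0f} GB/s")  # 10 GB/s -- impractical to centralize
```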
You could compress your data, but there’s a compute overhead in doing so. You could scale back the frequency of data collection, but this could impact your ability to detect and respond to anomalies. Or, you could push your data collection, analysis, and response out from the data center or cloud to the edge.
In the scenario above, with 10,000 sensors, it would be reasonable to segment the deployment into groups, each group connecting to the internet via a gateway device. If you could turn each gateway device into a mini data collection, analysis, and response machine, that would help with the overall scaling problem. If you segment the deployment so that each gateway device services as many as 1,000 sensors, each gateway sees about 5MB of data per second and can easily handle the load.
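The per-gateway load works out the same way as the fleet-wide figures, just scoped to one group of sensors:

```python
# Per-gateway load when 10,000 sensors are split into groups of 1,000,
# each sensor sending 5 readings/second at 1 kB per reading.
sensors_per_gateway = 1_000
readings_per_s = 5
bytes_per_reading = 1_000

gateway_rate = sensors_per_gateway * readings_per_s * bytes_per_reading
print(f"{gateway_rate / 1e6:.1f} MB/s")  # 5.0 MB/s per gateway
```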
Now you’ve got 10 gateway devices, each collecting data from its own 1,000 sensors, which has solved your data-rate scaling problem, but it has created another problem: distributed data. Now your data is spread across 10 different devices, and you have no way to see aggregated data from all of your sensors. As always, the left hand giveth while the right hand taketh away!
Now, on top of your data collection and scaling problems, you’ve got a data aggregation problem! I know it sounds like I’m creating more problems than I am solving here, but there is a way to apply a solution across the entire deployment that makes all of these problems go away: the TICK Stack.
The TICK Stack is made of 4 open source software components designed specifically to make the collection, storage, management, visualization, and manipulation of time series data easy and scalable.
The ‘T’ in TICK stands for Telegraf. Telegraf is a plugin-based, high-performance data ingestion engine designed to gather incoming data streams from multiple sources and, in an extremely efficient manner, output those streams to data storage platforms.
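As a concrete sketch of what that looks like, here is a minimal (hypothetical) Telegraf configuration that subscribes to sensor readings over MQTT and forwards them to InfluxDB. The broker address, topic layout, and database name are all placeholder assumptions, not values from the article:

```toml
# telegraf.conf -- minimal sketch: MQTT in, InfluxDB out.
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]   # your broker here
  topics  = ["sensors/#"]              # hypothetical topic layout
  data_format = "influx"               # readings published in line protocol

[[outputs.influxdb]]
  urls     = ["http://localhost:8086"]
  database = "sensors"
```

In practice you would swap the MQTT input for whichever of Telegraf’s input plugins matches how your sensors actually report.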
One of those platforms, the one we’re going to focus on here, is InfluxDB, the ‘I.’ InfluxDB is a Time Series Database designed from the ground up for performance and ease of use when dealing with time series data — and what is IoT data if not time series data?
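For reference, a single sensor reading in InfluxDB’s line protocol looks like the following; the measurement, tag, and field names here are invented for illustration:

```
environment,sensor_id=0042,site=plant-3 temperature=72.1,co2=451i 1526313600000000000
```

That’s one measurement (`environment`), tags identifying the sensor, the field values themselves, and a nanosecond-precision timestamp.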
The ‘C’ in TICK is for Chronograf, the data visualization and management front-end for the other components of the stack. Using Chronograf, you can quickly and easily build stunning dashboards for data monitoring.
These dashboards can help you easily monitor your sensor data and spot anomalies and trends in your data that you might otherwise miss.
The ‘K’ in TICK is Kapacitor. Kapacitor is our stream processing engine that runs alongside InfluxDB to do more complex data processing, process alerts, etc.

Great, so how will this help? Well, one thing I’ve been working on lately is deploying the entire TICK Stack from edge to data center for complete IoT data collection, analysis, reporting, and alerting, and I can say that it has been wildly successful. I took a $30 Pine-64 LTS board, added a $35 7″ touchscreen display and a $9 Bluetooth/Wi-Fi card, and built an edge device that is capable of collecting sensor data via Wi-Fi, wired Ethernet, or Bluetooth LE (I’ve since added a LoRa radio to it as well, just for fun).
That device, with a 32GB MicroSD card, collects sensor data and displays it on a local Chronograf dashboard. In addition, it processes that data and sends alerts when the temperature from one of the sensors changes by more than 1ºF, or whenever the CO2 concentration in the room changes by more than 100ppm.
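One way to express a change-based alert like that in Kapacitor is as a rate-of-change check. The TICKscript below is a sketch under assumed names: the `environment` measurement and `temperature` field are hypothetical, and the alert fires when the temperature is changing faster than 1 degree per minute rather than tracking the exact reading-to-reading delta:

```js
// Hypothetical TICKscript: alert when temperature changes
// faster than 1 degree per minute (names are assumptions).
stream
    |from()
        .measurement('environment')
    |derivative('temperature')
        .unit(1m)
        .as('temp_change')
    |alert()
        .crit(lambda: abs("temp_change") > 1.0)
        .log('/tmp/temp_alerts.log')
```

The same shape works for the CO2 alert by swapping the field name and threshold.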
But I said I was going to conquer the distributed data problem, and with Kapacitor I have done exactly that. I’ve used Kapacitor to generate the alerts discussed above, but I’m also using Kapacitor to do some fairly sophisticated downsampling of the data.
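A downsampling task in Kapacitor follows the same pattern: window the raw stream, aggregate, and write the result back to InfluxDB. This sketch (again with assumed measurement, field, and database names) reduces raw readings to one-minute means:

```js
// Hypothetical TICKscript: downsample raw readings to 1-minute means
// and write them to a separate long-retention database.
stream
    |from()
        .measurement('environment')
    |window()
        .period(1m)
        .every(1m)
    |mean('temperature')
        .as('mean_temperature')
    |influxDBOut()
        .database('sensors_downsampled')
        .measurement('environment_1m')
```

Run at the edge, a task like this is what lets each gateway forward a compact summary upstream instead of the full raw stream, which is how the distributed-data and aggregation problems get solved at the same time.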