
Merging Transactional and Predictive Analytics to Scale IoT

A core value of IoT is using the data it gathers for predictive analytics and decision making. Take a look at one solution, and the code to go with it.
The digital universe is estimated to see a 50-fold increase in data over the 2010-2020 decade. Gartner expects 6.4 billion connected things to be in use worldwide in 2016, up 30% from 2015, reaching 20.8 billion by 2020.
When it comes to the IoT, this involves an increasing number of complex projects encompassing hundreds of suppliers, devices, and technologies. Michele Pelino and Frank E. Gillett from Forrester predict that fleet management in transportation, security and surveillance applications in government, inventory and warehouse management applications in retail, and industrial asset management in primary manufacturing will be the hottest areas for IoT growth.
As the amount of data increases, so does the velocity at which we have to ingest it, analyze it, and filter out the relevant information. With streams of millions of events per second coming in from IoT devices, organizations must equip themselves with flexible, comprehensive, and cost-effective solutions for their IoT needs.
At GigaSpaces, we've come to the realization that the solution to this growing need is not radically changing an existing architecture, but rather extending it with in-memory computing to enable fast analytics and control over fast data. The combination of low-latency streaming analytics and transactional workflow triggers makes it possible to act on IoT data in the moment. This includes predictive maintenance and anomaly detection against millions of sensor data points.
Magic Software Enterprises, a global provider of enterprise-grade application development and business process integration software and a vendor of a broad range of software and IT services, has been leveraging GigaSpaces XAP for years. When Magic came out with its xpi Integration Platform, it was looking for a data aggregation solution to form an IoT hub in front of Magic xpi. This solution needed to be flexible enough to serve a variety of applications regardless of their data and velocity requirements.
In the age of fast data, the xpi platform, although it has proven its operational interoperability, still faces the challenge common to many existing platforms that are not ready to handle fast data ingestion scenarios. Magic was looking for a POC that could be implemented as quickly as possible while delivering fast results.
InsightEdge was the perfect choice to help Magic's IoT solutions handle all the difficult data transformation challenges, allowing customers to concentrate on designing the best processes and flows to support their business goals. The solution needed to be flexible and open to any type of data input, regardless of the type and structure of the data or its velocity, and to run in-memory. That's where we came in. During our meeting, we suggested a simple solution based on Kafka and InsightEdge to help handle data velocity and variety in IoT use cases.
By integrating InsightEdge in-memory streaming technology, incoming sensor data is analyzed through a multitude of predefined filters and rules and aggregated by InsightEdge. The aggregated data is easily compared, correlated, and merged, and is transferred in batches to Magic xpi, where a prediction engine first predicts when IoT equipment failure might occur and then prevents that failure by triggering maintenance. Monitoring for future failures allows maintenance to be planned before a failure occurs.
InsightEdge provides Magic with a few key benefits:
Using InsightEdge, Magic is able to provide its customers with fast data streaming and the ability to perform aggregations and calculations on the in-memory grid. Using the XAP data grid makes the streaming process that much faster, eliminating the need for Hadoop.
InsightEdge facilitates Magic's customers' needs for IoT deployments with predictive manufacturing and maintenance, enabling them to receive fast, real-time, data-driven events from their systems.
A live InsightEdge use case is Car Telemetry Ingestion and Data Prediction using Magic’s xpi. In the case of car telemetry, it is very hard to predict in advance what data will be useful. In the case of data prediction, we need to think about not only device telemetry but also diagnostic telemetry.
Predictive car maintenance requires car telemetry ingestion and data prediction. Magic's solution stack needed one more component in the architecture to fully support fast data and scalable scenarios; the right puzzle piece had to be found.
In this use case, we will cover everything from post-data-collection (assuming we have CSV files, though the data could just as easily have been streamed) up until the data is sent to Magic's xpi Integration Platform.
Apache Kafka is a distributed streaming platform: a reliable message broker on steroids, but not limited to just that. It enables building real-time streaming data pipelines that reliably move data between systems or applications, as well as real-time streaming applications that transform or react to streams of data.
We'll be using version "kafka_2.10-0.9.0.0" to run our tests; however, newer Kafka versions are available. You can download Kafka here or download the specific version we've used for this use case.
Kafdrop is a simple UI monitoring tool for message brokers. In this case, we will use it with Kafka to monitor topics and message contents during development. Download and install Kafdrop by following the instructions on its Git page.
InsightEdge is a high-performance Spark distribution designed for low-latency workloads and extreme analytics processing in one unified solution. With robust analytics capacity and virtually no latency, InsightEdge provides immediate results. GigaSpaces' Spark distribution eliminates the dependency on the Hadoop Distributed File System (HDFS) to break through the embedded performance "glass ceiling" of the standard Spark offering. To this, GigaSpaces has added enterprise-grade features such as high availability and security. The result is a hardened Spark distribution that is thirty times faster than standard Spark.
Download InsightEdge here. No installation is needed; simply unzip the file to the desired location. (Note: not compatible with Windows.)
First, we need to start Kafka and InsightEdge, so we'll use the following two scripts:
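The original scripts are not reproduced in this excerpt, so here is a minimal sketch assuming a default local setup: Kafka's stock helper scripts and InsightEdge's demo mode (the exact InsightEdge script name and flags may differ between releases).

    # Start ZooKeeper and a single Kafka broker (run from the Kafka installation directory).
    # Edit config/zookeeper.properties first if you want ZooKeeper on a non-default port.
    bin/zookeeper-server-start.sh config/zookeeper.properties &
    bin/kafka-server-start.sh config/server.properties &

    # Start InsightEdge in local demo mode (run from the unzipped InsightEdge directory).
    # Demo mode brings up a local Spark master, a Spark worker, and the data grid.
    ./sbin/insightedge.sh --mode demo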
For example, if I run ZooKeeper locally on port 2081, I use the following:
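The original command is not preserved either; a typical Kafdrop launch looks roughly like this, where the jar name and version are placeholders and --zookeeper.connect points at the ZooKeeper instance mentioned above.

    # Point Kafdrop at the local ZooKeeper instance; Kafdrop serves its UI on port 9000 by default.
    java -jar target/kafdrop-<version>.jar --zookeeper.connect=localhost:2081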
Browse to the local instance to make sure it works: http://localhost:9000
Now we’ ll have to build our model, so let’s see how it should look like:
Next, we write our event class (to handle incoming events):
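Again as a sketch under the same assumptions: the event class can be a plain case class representing one raw record as it arrives on the Kafka topic, plus a small CSV parser. The field layout mirrors the hypothetical model above.

    // Raw event as it arrives from Kafka, i.e. one line of the CSV feed (fields are illustrative).
    case class CarTelemetryEvent(
      vin: String,
      timestamp: Long,
      engineTemperature: Double,
      rpm: Int,
      fuelLevel: Double
    )

    object CarTelemetryEvent {
      // Parse a single CSV line such as "VIN123,1490000000000,92.5,3100,0.62".
      def fromCsv(line: String): CarTelemetryEvent = {
        val f = line.split(",")
        CarTelemetryEvent(f(0), f(1).toLong, f(2).toDouble, f(3).toInt, f(4).toDouble)
      }
    }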
Now that we have our model and event model, we can write the code we want to deploy to Spark (which will read from Kafka and persist to the grid):
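The deployed code is not shown in this excerpt; the sketch below assumes InsightEdge 1.x with its Scala implicits and the Spark 1.x Kafka receiver API. The topic name, consumer group, and grid lookup settings (the InsightEdge demo defaults) are assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.insightedge.spark.context.InsightEdgeConfig
    import org.insightedge.spark.implicits.all._

    object TelemetryIngestJob extends App {
      // Connect Spark to the local InsightEdge space (demo-mode defaults assumed).
      val ieConfig = InsightEdgeConfig("insightedge-space", Some("insightedge"), Some("127.0.0.1:4174"))
      val sparkConf = new SparkConf()
        .setAppName("car-telemetry-ingest")
        .setInsightEdgeConfig(ieConfig)

      val ssc = new StreamingContext(sparkConf, Seconds(5))

      // Read raw CSV lines from the "car-telemetry" topic via the ZooKeeper instance used above.
      val stream = KafkaUtils.createStream(ssc, "localhost:2081", "telemetry-group", Map("car-telemetry" -> 1))

      stream
        .map { case (_, line) => CarTelemetryEvent.fromCsv(line) }  // parse the raw event
        .map(e => CarTelemetry(null, e.vin, e.timestamp, e.engineTemperature, e.rpm, e.fuelLevel)) // map to the grid model
        .foreachRDD(rdd => rdd.saveToGrid())  // persist each micro-batch to the in-memory grid

      ssc.start()
      ssc.awaitTermination()
    }

From the grid, the aggregated data can then be handed off in batches to Magic xpi, as described earlier.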
Now, we have three options for running the logic:
We chose to go with the third option, as we have scalability and growth considerations. We need to take into account dozens of external processes running rather than one very long event on the grid.
