IoT and IIoT generated data will dominate the next wave of advanced analytics. As an advanced analytics professional, it is critical to understand the IIoT data flow architecture at a high level, in order to leverage the data generated efficiently. This post aims to provide an introduction to the flow. Effort has been made to keep it as basic as possible. All definitions provided here have been simplified.
The Industrial IoT data flow in a Distribution Center (DC)
The architecture of the IoT data flow in a Smart DC is shown in the illustration below:
As you can see in the illustration above, the data flows from the controllers on the floor all the way to the cloud, where it is processed. In order to understand this data flow, we will follow it downstream, from the sensors to the cloud. Let us start with the sensors and actuators.
Sensors and Actuators
The actuator is a piece of hardware that transforms a command signal into a physical action on the process. It receives a signal from the control device as its input and delivers energy to the process as its output, acting on the commanded variable.
The sensor is a piece of hardware that is similar to an actuator, but the flow of information goes in the opposite direction. It transforms the information generated by the process into digital values, which it then feeds into the control system. Sensors are generally attached to various pieces of equipment in and around the processes (on robots, on and around conveyor belts, and so on) to capture process characteristics and relay the information back to the control system.
The information generated is then exchanged with network entities such as SCADA systems using fieldbuses.
Fieldbus is a digital two-way communication link between intelligent field devices. Think of it as a local area network dedicated to industrial automation.
Controllers perform the following steps cyclically and sequentially:
Read the measurements that arrive as input from the sensors
Execute the control algorithms that are fed by the last reading of the inputs
Write the outcomes of the control algorithms as outputs to the actuators
The duration of a cycle varies according to the kind of controllers in use:
Embedded controllers take a few milliseconds
PLCs take between 10 and 100 milliseconds
In any case, every controller, in every scan cycle, whatever its duration, reads all of its sensors and calculates the values for all of the actuators linked to it. These values may be analog signals, such as temperatures or pressures, or Boolean values for two-state representations, such as whether a valve is open or closed, or whether a part is present on a conveyor.
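The read-execute-write cycle described above can be sketched in a few lines of Python. This is an illustrative model only (the sensor names, setpoint, and callables are hypothetical), not how a real PLC is programmed:

```python
# Minimal sketch of a controller scan cycle: read all inputs, run the control
# logic, write all outputs. All names and values here are hypothetical.

def scan_cycle(read_sensors, control_logic, write_actuators):
    """Run one read-execute-write cycle and return the computed outputs."""
    inputs = read_sensors()          # step 1: sample every sensor
    outputs = control_logic(inputs)  # step 2: run the control algorithms on the latest inputs
    write_actuators(outputs)         # step 3: push the results to the actuators
    return outputs

# Example: open a valve (Boolean output) when an analog temperature reading
# exceeds a setpoint.
sensors = {"temperature_c": 81.5, "valve_feedback": False}

def control_logic(inputs):
    return {"open_valve": inputs["temperature_c"] > 80.0}

actuator_state = {}
scan_cycle(lambda: sensors, control_logic, actuator_state.update)
print(actuator_state)  # {'open_valve': True}
```

In a real controller this loop runs cyclically, once per scan cycle, at the durations given above.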
Supervisory Control and Data Acquisition (SCADA) and Historian
Samples of these values come as input to the systems at the upper level, such as SCADA and Historian, as a representation of the related signals. In the industrial world, the digital representation of a signal is called a tag, and its samples are called a time-series. A time-series is an ordered sequence of timestamps and values related to a signal in a particular time interval. SCADA systems and Historians gather the data from the controllers, but with a higher scan time. Typically, they collect the time-series using a scan cycle of 500 to 1,000 milliseconds in SCADA applications, and of one second to one minute or more in Historian systems.
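The tag and time-series concepts can be made concrete with a small sketch. The tag name, timestamps, and values below are invented for illustration:

```python
# Sketch: a tag is a named signal; its time-series is an ordered sequence of
# (timestamp, value) samples. The tag name and readings are hypothetical.
from datetime import datetime, timedelta

tag_name = "line1.conveyor.motor_temp_c"
t0 = datetime(2023, 1, 1, 12, 0, 0)

# Samples collected at a 1-second SCADA scan cycle.
time_series = [(t0 + timedelta(seconds=i), 60.0 + i * 0.5) for i in range(5)]

for ts, value in time_series:
    print(f"{tag_name} @ {ts.isoformat()} = {value}")
```

Note that the samples are strictly ordered in time, which is what allows downstream systems to treat them as a time-series rather than a bag of readings.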
Alongside the time-series, two other kinds of data are typically made available:
- A memory buffer area that temporarily stores the values of the controller's internal variables, at its scan time, in a sliding time window. The memory buffer area can be exported as a binary or text file on an event or a condition.
- Alarms related to measurements or calculations that fall outside the configured high or low limit for that variable.
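The alarm logic is simply a comparison against configured limits. A minimal sketch, with hypothetical tag names and limit values:

```python
# Sketch: raise an alarm when a measurement falls outside its configured
# high or low limit. The tag and limit values are hypothetical.

LIMITS = {"boiler.pressure_bar": (1.0, 8.0)}  # (low, high)

def check_alarm(tag, value):
    low, high = LIMITS[tag]
    if value < low:
        return f"LOW alarm on {tag}: {value} < {low}"
    if value > high:
        return f"HIGH alarm on {tag}: {value} > {high}"
    return None  # value is within limits, no alarm

print(check_alarm("boiler.pressure_bar", 9.2))  # HIGH alarm
print(check_alarm("boiler.pressure_bar", 4.0))  # None
```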
From the data collection perspective, the data sources of the I-IoT are the following:
- Sensors, single pieces of equipment, or stations driven by an embedded controller cannot act autonomously as data sources for the I-IoT gateway. This is because they are connected to isolated networks that are managed by legacy protocols. Moreover, their data is mostly collected and made available by the controllers, so there is no need to connect to them directly.
- SCADA and Historian sit in the corporate network and are based on Ethernet, so they are not in an isolated, specialized network and therefore don't need specific network cards or boards to communicate with other systems. SCADA and Historian do not collect all of the data made available by the controllers to which they are connected; instead, they collect only the most important data, and at a lower frequency.
Open Platform Communications (OPC)
In any case, whatever device acts as the data source, we need a software layer that interacts with the industrial data source through its own protocol, querying tags and time-series and exposing them through a standard interface to the upper levels and external systems. This software layer, which acts as a translator between the controllers (the producers of the industrial data) and the consumer systems, is Open Platform Communications (OPC).
OPC is a standard used to abstract controller-specific protocols into a standardized interface. This allows SCADA and other systems to interface with a middleman that converts generic OPC read or write requests into device-specific requests, and vice versa.
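The translator role can be illustrated with a toy abstraction. This is not the real OPC API; the device classes, their method names, and the tag map are all invented to show the idea of one standard read interface over heterogeneous, protocol-specific devices:

```python
# Sketch of OPC's translator role: one standard read(tag) call is mapped to
# device- and protocol-specific requests. All classes here are hypothetical.

class ModbusDevice:
    def read_register(self, address):   # one vendor-specific API...
        return {40001: 23.5}[address]

class ProfinetDevice:
    def get_value(self, slot):          # ...and a completely different one
        return {3: True}[slot]

class OpcServer:
    """Exposes a single standard read(tag) interface over both devices."""
    def __init__(self):
        self.tag_map = {
            "plant.temp": (ModbusDevice(), lambda d: d.read_register(40001)),
            "plant.valve_open": (ProfinetDevice(), lambda d: d.get_value(3)),
        }
    def read(self, tag):
        device, reader = self.tag_map[tag]
        return reader(device)  # translate the generic request to the device call

opc = OpcServer()
print(opc.read("plant.temp"))        # 23.5
print(opc.read("plant.valve_open"))  # True
```

The consumer (SCADA, an edge device, and so on) only ever sees `read(tag)`; the protocol details stay behind the translator.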
SCADA and Historian, which were initially developed as consumers of industrial data (and therefore implemented an OPC client interface), became producers of industrial data themselves, implementing an OPC server interface as well and playing both roles at the same time. The quality and performance of OPC technology had various issues and limitations at the beginning. Even so, the adoption and spread of OPC in the industrial world was considerable. This was because of the significant need of all the actors involved (vendors, end users, and integrators) to exchange data between applications belonging to the automated production system in a standardized and reusable way. OPC interfaces allow you to query for time-series and data cyclically, on change, and through subscriptions with dead bands, with a scan time typically of one second or more.
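The dead-band subscription mentioned above is worth sketching: a new sample is forwarded to the subscriber only when it differs from the last forwarded value by more than the configured dead band, which suppresses noise-level changes. The readings and dead-band value below are hypothetical:

```python
# Sketch of a dead-band subscription filter: publish a sample only when it
# moves more than `deadband` away from the last published value.

def deadband_filter(samples, deadband):
    published = []
    last = None
    for value in samples:
        if last is None or abs(value - last) > deadband:
            published.append(value)
            last = value
    return published

raw = [20.0, 20.1, 20.05, 21.5, 21.6, 19.0]
print(deadband_filter(raw, deadband=1.0))  # [20.0, 21.5, 19.0]
```

Small fluctuations (20.1, 20.05, 21.6) are dropped; only changes larger than the dead band reach the subscriber.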
The edge device
The edge device is the device that enables the I-IoT. It is physically located in the factory, linked on one side to the industrial data sources through the OPC server, and on the other side to the cloud through the IoT gateway. There is much debate about the location, role, and responsibilities of the edge device. With regard to the I-IoT, it must have the following minimum capabilities:
- Implement OPC client interfaces to gather the data from the industrial data sources, which include controllers, SCADA, and Historians
- Have OPC proxy functionality to handle incoming data requests uniformly, whether they follow the OPC Classic or the OPC Unified Architecture (UA) interfaces, and to address them accordingly
- Implement bi-directional communication with the cloud through VPN, SSL/TLS, or cellular links, using the most common internet application protocols, to:
- Transfer the data to the cloud through the IoT Hub
- Receive commands and configurations from the IoT Hub
- Implement a store and forward mechanism to guarantee the data transfer along the communication channel in case of poor, intermittent, or overloaded connectivity
- Expose centralized functionalities available from the cloud. These include the following:
- Setup and monitoring of the device
- Data acquisition parameters and configuration deployment
- Software patches and software upgrade deployment
- Be available both as a physical and a virtual appliance
- Be multi-platform (at least on Windows and Linux)
- Be flexible and scalable enough to support the collection and transfer of anything from a few hundred to tens of thousands of tags
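Of the capabilities above, store and forward is the one most worth sketching: messages are queued locally and drained to the cloud when the link is up, so nothing is lost during an outage. A minimal sketch, with a hypothetical `send` callable standing in for the IoT Hub connection:

```python
# Sketch of a store-and-forward edge buffer: queue locally, drain to the cloud
# when the link is up, keep data queued on connection failure.
from collections import deque

class StoreAndForward:
    def __init__(self, send):
        self.queue = deque()
        self.send = send          # callable that raises ConnectionError when the link is down

    def publish(self, message):
        self.queue.append(message)
        self.flush()

    def flush(self):
        while self.queue:
            try:
                self.send(self.queue[0])
            except ConnectionError:
                return            # keep the data; retry on the next flush
            self.queue.popleft()  # remove only after a successful send

sent, link_up = [], False
def send(msg):
    if not link_up:
        raise ConnectionError
    sent.append(msg)

buf = StoreAndForward(send)
buf.publish({"tag": "plant.temp", "value": 23.5})   # link down: message retained
link_up = True
buf.publish({"tag": "plant.temp", "value": 23.7})   # link up: both messages drain, in order
print(sent)
```

The key design point is that a message is dequeued only after the send succeeds, which preserves ordering and guarantees delivery once connectivity returns.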
The Industrial IoT data flow in the cloud
The most relevant components that manage the flow of industrial data in the cloud are the following:
- IoT Hub: This is the dispatcher of data and the manager of devices. It checks security and dispatches data to the right data processors (storage, analytics, or queue). Normally, it is implemented with a multi-protocol gateway, such as HTTPS and a message broker.
- Time Series Database (TSDB): This is the centralized database in which events and data points acquired from sensors are stored.
- Analytics: These work with data to extract anomalies, the machine's health, efficiency, or generic key performance indicators (KPIs). Analytics can work either in stream mode or in micro-batch processing mode. Normally, we use simple, stream-based analytics to evaluate simple rules, and machine-learning analytics or physics-based analytics for more complex analytics working in micro-batch mode.
- Asset registry: This supports additional (static) information, such as the model of the machine being monitored and the operational attributes. This might include the fuel used, the process steps followed, or the machine’s status.
- Data lake: This is normally used to support raw data, such as images or log files. Sometimes, it is used to offer storage support for events and measures (time-series). In the data lake, we normally store the outcome of the analytics.
- Object storage: This stores additional information for large files or document-based data. Object storage can be implemented using the data lake.
- Big data analytics: These are not necessarily within the scope of the IoT, but sometimes we need to run big data analytics over the entire fleet. Alternatively, we might be using a huge amount of data to carry out business analysis.