Why the sudden hype?
For many of you who have worked in manufacturing for a long time, this sudden hype around IoT data may be confusing. Sensors have been used to track data in manufacturing for decades. So why are we seeing this IoT hype now? The fact is, IoT is not really new; elements of it have been developing for decades. Remote detection of oil well spills was happening in the 1970s, and GPS-based vehicle telematics has been around for 20 years.
However, recent developments in cloud computing, cellular technology, and computing power have brought IoT to the forefront again. These developments, combined with a significant decrease in sensor costs, create ideal conditions for connecting devices over the internet. The graph from Goldman Sachs shows how sensor costs have decreased over the years.
Figure: Decrease in sensor costs over the years (Source: Goldman Sachs)
Unique aspects of IoT data
Many of the challenges associated with running analytics on IoT data are the same as those for any other data source. However, IoT data also brings some unique challenges. Examples of unique aspects of IoT data include:
- The devices are often widely distributed geographically.
- The data is created by devices operating remotely, sometimes in widely varying environmental conditions that can change from day to day.
- The data is communicated over long distances, often across different networking technologies. It is very common for data to travel first across a wireless network, then through a gateway device, and then over the public internet, which itself includes multiple types of networking technology working together.
Analytics challenges associated with IoT data
A company can easily have thousands to millions of IoT devices with several sensors on each unit, each sensor reporting values on a regular basis. The inflow of data can grow quite large very quickly. Since IoT devices send data on an ongoing basis, the volume of data in total can increase much faster than many companies are used to.
To demonstrate how this can happen, imagine a company that manufactures small monitoring devices. It produces 12,000 devices a year, starting in 2010 when the product was launched. Each one is tested at the end of assembly and the values reported by the sensors on the device are kept for analysis for five years.
Now, imagine the device also has internet connectivity to report sensor values, and each one remains connected for two years. Since data keeps flowing in long after the devices are built, the data inflow keeps growing until it levels off once older devices stop reporting values.
The data volume rapidly leads to computing and storage requirements well beyond what a single server can handle, and under traditional architectures, distributing the data across hundreds or thousands of servers quickly becomes cost-prohibitive. Yet the best analytics need lots of historical data, and since you are unlikely to know ahead of time which data is most predictive, you have to keep as much of it as you can.
When data used for analytics is recorded at headquarters or a manufacturing plant, everything happens in the same place and time zone. IoT devices, by contrast, are spread out across the globe, so events that happen at exactly the same moment do not happen at the same local time. How time is recorded affects the integrity of the resulting analytics.
When IoT devices communicate sensor data, they may record time in local time. If it is not clear whether local time or UTC was recorded, analytics results can be dramatically affected. For example, consider an analyst at a company that makes parking spot occupancy sensors. She is tasked with creating predictive models to estimate future parking lot fill rates. Time of day is likely to be a very predictive data point, so how the time is recorded makes a big difference to her. Without knowing which convention was used, even determining whether it was day or night at the sensor location will be difficult.
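One way to avoid this ambiguity is to store each reading both as a UTC timestamp, for comparing events across sites, and with its local hour, for time-of-day features. The sketch below assumes hypothetical parking-sensor records that carry a naive local timestamp plus the site's UTC offset. Fixed offsets are a simplification that ignores daylight saving time; real code would more likely look up an IANA time zone name, for example with Python's `zoneinfo` module (Python 3.9+).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical parking-sensor readings recorded in local time, with each site's
# UTC offset. Fixed offsets ignore daylight saving time (a simplification).
readings = [
    {"sensor": "lot-denver", "local_time": "2021-03-01 08:00", "utc_offset_hours": -7},
    {"sensor": "lot-tokyo", "local_time": "2021-03-02 00:00", "utc_offset_hours": 9},
]


def normalize(local_time: str, utc_offset_hours: float) -> tuple[datetime, int]:
    """Return the reading's UTC timestamp and its local hour of day."""
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M")
    # Attach the site's offset so the timestamp refers to an unambiguous instant.
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc), local.hour


for r in readings:
    utc, local_hour = normalize(r["local_time"], r["utc_offset_hours"])
    print(f'{r["sensor"]}: UTC {utc.isoformat()}, local hour {local_hour}')
```

Both readings map to the same UTC instant (15:00 on March 1, 2021), yet one was taken at 8 a.m. local time and the other at midnight, which is exactly the distinction a fill-rate model needs.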
IoT devices are located in many geographic locations, and different areas of the world have different environmental conditions. Temperature variations can affect sensor accuracy: if cold affects your device, you could get less accurate readings in Calgary, Canada, than in Cancun, Mexico.
Elevation can affect equipment such as diesel engines. If location and elevation are not taken into consideration, you may falsely conclude from IoT sensor readings that a Denver-based fleet of delivery trucks manages fuel economy poorly compared to a fleet in Indiana. Lots of mountain roads can burn up some fuel!
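A simple guard against this kind of false conclusion is to compare each truck with peers operating under the same conditions rather than with the whole fleet. The sketch below expresses each truck's fuel economy as a ratio to the median of its own region. The readings, truck IDs, and field names are hypothetical, and grouping by region is only one option; a fuller analysis might model elevation or temperature directly.

```python
from statistics import median

# Hypothetical fuel-economy readings (miles per gallon) tagged with each truck's region.
trips = [
    {"truck": "D1", "region": "Denver", "mpg": 8.1},
    {"truck": "D2", "region": "Denver", "mpg": 7.9},
    {"truck": "D3", "region": "Denver", "mpg": 6.8},
    {"truck": "I1", "region": "Indiana", "mpg": 9.6},
    {"truck": "I2", "region": "Indiana", "mpg": 9.4},
    {"truck": "I3", "region": "Indiana", "mpg": 9.5},
]


def relative_to_region(rows):
    """Express each truck's mpg as a ratio to the median mpg of its own region."""
    by_region: dict[str, list[float]] = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["mpg"])
    baselines = {region: median(values) for region, values in by_region.items()}
    return {row["truck"]: row["mpg"] / baselines[row["region"]] for row in rows}


ratios = relative_to_region(trips)
for truck, ratio in sorted(ratios.items()):
    print(f"{truck}: {ratio:.2f} of regional median")
```

Every Denver truck has lower raw mileage than every Indiana truck, but only D3 stands out once each truck is measured against its own region.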
Remote locations may have weaker network access, and the resulting higher data loss could cause data values from those locations to be underrepresented in the analytics. Many IoT devices are solar-powered, so the available battery charge can affect how often they report data. A device in Portland, Oregon, where it is often cloudy and rainy, will be more affected than the same device in Phoenix, Arizona, where it is mostly sunny. There are also political considerations related to the location of an IoT device. Privacy laws in Europe affect how data from devices can be stored and what types of analytics are acceptable. You may be required to anonymize data from certain countries, which can limit what you can do with analytics.
Constrained devices often communicate over lossy networks. For analytics, this frequently results in missing or inconsistent data, and the missing data is often not random; as mentioned previously, it can depend on location. Devices run software called firmware, and firmware versions may not be consistent across locations. This can mean differences in reporting frequency or in how values are formatted, which can result in lost or mangled data.
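A quick way to check whether missing data is random is to measure how complete the data is for each group of devices. The sketch below computes the share of expected reports actually received, grouped by site or by firmware version. The hourly reporting schedule, device IDs, field names, and counts are all hypothetical.

```python
from collections import defaultdict

EXPECTED_PER_DAY = 24  # assumption: each device should report once an hour

# Hypothetical one-day report counts per device, tagged with site and firmware version.
received = [
    {"device": "a1", "site": "Phoenix", "firmware": "2.1", "reports": 24},
    {"device": "a2", "site": "Phoenix", "firmware": "2.1", "reports": 23},
    {"device": "b1", "site": "Portland", "firmware": "2.1", "reports": 17},
    {"device": "b2", "site": "Portland", "firmware": "1.9", "reports": 12},
]


def completeness_by(rows, key):
    """Share of expected reports actually received, grouped by `key`."""
    got = defaultdict(int)
    devices = defaultdict(int)
    for row in rows:
        got[row[key]] += row["reports"]
        devices[row[key]] += 1
    return {group: got[group] / (devices[group] * EXPECTED_PER_DAY) for group in got}


print(completeness_by(received, "site"))
print(completeness_by(received, "firmware"))
```

If completeness differs sharply between groups, as it does between the two sites here, any average computed over the received data will lean toward the better-connected devices.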
Data messages from IoT devices often require the receiving system to know how to interpret them. Software bugs can lead to garbled messages and data records, and messages lost in translation or never sent because of dead batteries result in missing values. Conserving power often means that not all values available on the device are sent at the same time: a device may send some values every time it reports and others less frequently, so the resulting datasets often have missing values.
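For values a device sends only some of the time, one common approach is to carry the last reported value forward, but only for a limited time so that stale values are not treated as current. The sketch below assumes hourly messages in which a hypothetical `battery_v` field arrives only occasionally, and an arbitrary three-hour maximum age; both are illustrative choices, not a recommendation.

```python
# Hypothetical hourly messages: temperature arrives every time, battery voltage
# (to save power) only in some messages. None marks a value that was not sent.
messages = [
    {"hour": 0, "temp_c": 21.0, "battery_v": 3.7},
    {"hour": 1, "temp_c": 21.4, "battery_v": None},
    {"hour": 2, "temp_c": 21.9, "battery_v": None},
    {"hour": 3, "temp_c": 22.3, "battery_v": None},
    {"hour": 4, "temp_c": 22.0, "battery_v": None},
    {"hour": 5, "temp_c": 21.8, "battery_v": 3.6},
]

MAX_AGE_HOURS = 3  # assumption: older values are too stale to reuse


def fill_forward(rows, field, max_age):
    """Carry the last reported value of `field` forward for at most `max_age` hours."""
    last_value, last_hour = None, None
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input messages are left untouched
        if row[field] is not None:
            last_value, last_hour = row[field], row["hour"]
        elif last_hour is not None and row["hour"] - last_hour <= max_age:
            row[field] = last_value
        filled.append(row)
    return filled


for row in fill_forward(messages, "battery_v", MAX_AGE_HOURS):
    print(row)
```

The hour-4 message stays missing because the last battery reading is four hours old, which keeps the gap visible instead of silently papering over it.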
Business value concerns
Many companies are struggling to find value with IoT data. The costs to store, process, and analyze IoT data can grow quickly. With future financial returns uncertain, some companies are questioning if it is worth the investment.
According to McKinsey & Company, a consulting firm, most IoT data is not used. Their research found that less than 1% of the data generated by an oil platform was used for decision-making purposes.
Finding value in IoT analytics is often like finding a diamond in a mountain of rubble. We can accept that 1% of the data has value, but which 1% is it? The answer can vary depending on the question: one man’s worthless granite is another man’s priceless diamond. The business value challenge is to keep costs low while improving the ability to generate superior financial returns. Analytics is a great way to get there.
Analytics often requires deciding whether to fill in or ignore missing values. Either choice may lead to a dataset that is not representative of reality. There can also be outside influences, such as environmental conditions, that are not captured in the data. Winter storms can cause power failures that determine which devices are able to report data. You may end up drawing conclusions from a non-representative sample of data without realizing it. This can skew the results of IoT analytics, and it will not be clear why.
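A small simulation shows why neither choice is safe when missingness follows outside conditions. In the made-up scenario below, the two coldest devices lost power in a storm, so their readings never arrived. The sketch compares ignoring the missing values with filling them using the mean of the observed values (mean imputation). The temperatures and device IDs are hypothetical, and the "true" values are knowable only because this is a simulation.

```python
from statistics import mean

# Hypothetical temperatures (°C). The coldest sites lost power during a storm,
# so their readings never arrived (None). In practice you never see true_temps.
true_temps = {"d1": -12.0, "d2": -9.0, "d3": 4.0, "d4": 6.0, "d5": 8.0, "d6": 10.0}
received = {"d1": None, "d2": None, "d3": 4.0, "d4": 6.0, "d5": 8.0, "d6": 10.0}

observed = [v for v in received.values() if v is not None]

# Option 1: ignore missing values and average what arrived.
dropped_mean = mean(observed)

# Option 2: fill each gap with the mean of the observed values (mean imputation).
filled = [v if v is not None else dropped_mean for v in received.values()]
filled_mean = mean(filled)

true_mean = mean(true_temps.values())  # only knowable in a simulation like this one

print(f"ignore missing: {dropped_mean:.2f}  fill with mean: {filled_mean:.2f}  true: {true_mean:.2f}")
```

Both approaches report an average of 7.00 °C, while the true average is about 1.17 °C, because the readings that went missing were exactly the unusual ones.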
Since connectivity is new for many devices, there is often a lack of historical data on which to base predictive models. This can limit the types of analytics that can be done with the data. It can also lead to recency bias in datasets, as newer products are overrepresented in the data simply because a higher percentage of them are connected.