The other night I had a great conversation with a client at our MDM/BI Summit in Grapevine about IoT and risk management. What we began to realize was that a tightly integrated, end-to-end, automated solution can create a significant amount of risk.
For example, remember the “Flash Crash” of 2010 created by automated, algorithmic trading and day traders? (See WSJ blog here). The government is still trying to figure out exactly what happened there, and what, if anything, can be done to prevent it from happening again.
That’s when we started to discuss the need for a circuit breaker, fail-safe, or kill switch within an IoT solution. This is something that is built into the solution, by design, so that IT has the ability to stop automated actions when they can see that things have gone haywire. Why would such a circuit breaker be necessary? Well, when one is collecting stream data in real time, and using smart machine-learning algorithms that can not only prescribe an action but actually take it, all within seconds, there is real potential for catastrophic events.
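To make the idea a bit more concrete, here is a minimal Python sketch of a kill switch that sits in front of automated actions. The names `KillSwitch` and `GuardedActuator` are hypothetical and are not taken from any particular IoT platform; a real system would also need authentication, persistence, and auditing around the switch.

```python
import threading
from typing import Callable, List


class KillSwitch:
    """Thread-safe flag that an operator can trip to halt automated actions."""

    def __init__(self) -> None:
        self._tripped = threading.Event()

    def trip(self) -> None:
        # Called by IT when things have gone haywire.
        self._tripped.set()

    def reset(self) -> None:
        # Called once a human has decided it is safe to resume.
        self._tripped.clear()

    @property
    def tripped(self) -> bool:
        return self._tripped.is_set()


class GuardedActuator:
    """Runs an action only while the kill switch is not tripped."""

    def __init__(self, switch: KillSwitch) -> None:
        self.switch = switch
        # Names of actions that were refused while the switch was tripped.
        self.blocked: List[str] = []

    def run(self, name: str, action: Callable[[], None]) -> bool:
        if self.switch.tripped:
            self.blocked.append(name)
            return False
        action()
        return True
```

The point of routing every automated action through one guard is that the switch only has to be checked in a single place, and the list of blocked actions gives operators a record of what the system wanted to do while it was halted.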
So, the client and I agreed that circuit breakers are a good thing in automated IoT systems, but the conversation quickly turned to “how”. How do we build circuit breakers? What are the best implementations for IoT automation fail-safes? Here are a few:
- Shut down some or all of the machine learning systems or prescriptive analytics software that recommend a course of action in the IoT platform. The trade-off is that one might miss a real, necessary action, although recommended actions could be cached and replayed later.
- Shut down some or all of the streaming ingestion/processing engines. This will probably result in lost data (unless the sending device or aggregation point can queue up data), which may be acceptable.
- Use anomaly detection software to prevent inaccurate data from entering the system, where it might trigger unwanted automated actions.
- Put in a software gate, where the system cannot scale up without authorization. If the IT organization sees that the situation is normal, then they can spin up additional resources when demand calls for it. This can create human-dependent bottlenecks, but that’s kinda the idea.
- Put in a software “dead man’s switch”. This would be something like: if any of the necessary processing software is overwhelmed or unexpectedly crashes, then all the other systems stop too. This requires synchronization between all of the various parts of the solution, but might be necessary in some instances.
- Ultimately, one can turn off physical machines. Although drastic, sometimes it may be the last resort.
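As one illustration of the options above, here is a rough Python sketch of the software dead man's switch. It assumes each component reports a periodic heartbeat, and that automated actions are allowed only while every component has checked in recently. The class name, the component names, and the timeout are all hypothetical; a production version would run across processes or machines rather than inside a single object.

```python
import time
from typing import Callable, Dict, Iterable, List


class DeadMansSwitch:
    """Allows automated actions only while every component keeps checking in."""

    def __init__(
        self,
        components: Iterable[str],
        timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock          # injectable so the switch can be tested
        self._timeout_s = timeout_s  # max silence allowed per component
        now = clock()
        self._last_seen: Dict[str, float] = {name: now for name in components}

    def heartbeat(self, component: str) -> None:
        """Record that a component is alive right now."""
        if component not in self._last_seen:
            raise KeyError(f"unknown component: {component}")
        self._last_seen[component] = self._clock()

    def stale_components(self) -> List[str]:
        """Return components that have been silent longer than the timeout."""
        now = self._clock()
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )

    def safe_to_act(self) -> bool:
        """True only if no component has gone silent."""
        return not self.stale_components()
```

Each stage of the pipeline would call `heartbeat()` on a timer, and anything that takes automated action would check `safe_to_act()` first. If the ingestion or scoring stage crashes or stalls, its heartbeats stop and everything downstream stops with it.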
Whatever the case, it seems wise to build circuit breakers into your IoT solution and test them in non-production environments to ensure that they work.
What are some ideas you have for IoT circuit breakers?