Big Data and the Challenge of Complexity

Big Data is everywhere, and stakeholders in business and the public sector understand its potential. The next step is to implement strategies for capitalizing on these myriad pieces of information. A famous example is Google Flu Trends (http://www.google.org/flutrends/us/#US), a chart based on Google searches that predicts the rate of influenza cases in select countries (the US and Germany included) with astonishing accuracy, before official health statistics are available.

However, this widely publicized Big Data success story shows serious problems on closer inspection: at times it overestimated real data by as much as 100%, as a team of scientists from Northeastern University and Harvard reports (Lazer et al. 2014, http://gking.harvard.edu/files/gking/files/0314policyforumff.pdf). More specifically, the researchers point to issues with transparency and changing algorithms, and show how little Google’s Big Data application gains over traditional predictive methods. Consequently, Lazer et al. argue for using Big Data to “Understand the Unknown”: human interactions on a societal scale and complex spatial, temporal, and non-linear relationships. This is excellent advice, and it points to the challenge of complexity in Big Data.

While Big Data is complex, successful Big Data applications expose the simple truths hidden in the haystack of information, which can then be applied for business gain and government planning. Or so the story goes. Exactly this assumption is the major fallacy in many approaches. Let’s have a look at the underlying model:

Big Data (complex) > miracle algorithm > simple facts

What this model exposes is the tendency to oversimplify, to turn highly complex data into simple, linear narratives. This practice is questionable at best and is likely to produce false insights. In other words, when complexity is simplified, important aspects are lost. The challenge is to find a better way to make the insights of Big Data accessible without unduly reducing the rich intelligence that exists in the data sets.

In fact, there are methods for handling complex relationships, chief among them cybernetics, systems theory, and chaos theory. These scientific methods have been used to understand feedback loops (how to hit a moving target like a submarine while moving yourself), complex systems (e.g. how feeding bread to ducks will eventually destroy the ecosystem of a pond), and the influence of minuscule changes (the famous example of the butterfly over the Andes mountains that causes a tornado in Florida). Cybernetics, which emerged during WW2, marks the move from fixed artillery tables, in which soldiers looked up the numbers to dial into their guns to hit a target, to targeting machines that calculated the firing parameters from current data. Applying such a systemic approach avoids many of the problems inherent in the first model. Instead of creating a miracle algorithm, a systems approach builds a dynamic representation connected to an input that accepts simple queries.
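To make the contrast between a fixed table and a feedback loop concrete, here is a toy sketch in Python. It is not a model of any real fire-control system: the table values, the gain of 0.5, and the function names `fixed_table_aim` and `feedback_aim` are all assumptions made up for illustration.

```python
def fixed_table_aim(distance_m: float) -> float:
    """Fixed-table approach: return the precomputed setting for the
    nearest tabulated distance, ignoring anything observed later."""
    table = {1000: 5.0, 2000: 11.0, 3000: 18.0}  # hypothetical values
    nearest = min(table, key=lambda d: abs(d - distance_m))
    return table[nearest]


def feedback_aim(target_positions, gain: float = 0.5, start: float = 0.0):
    """Feedback approach: after every observation, move the aim by a
    fraction (gain) of the currently observed error."""
    aim = start
    history = []
    for target in target_positions:
        error = target - aim   # how far off we are right now
        aim += gain * error    # correct by a fraction of that error
        history.append(aim)
    return history


if __name__ == "__main__":
    moving_target = [10.0 + step for step in range(30)]
    tracked = feedback_aim(moving_target)
    print("table lookup for 1900 m:", fixed_table_aim(1900))
    print("last target position:", moving_target[-1])
    print("last aim via feedback:", round(tracked[-1], 3))
```

The table returns the same answer no matter what the target does next, while the feedback loop keeps following it. In this toy setup the aim trails a steadily moving target by a small, bounded amount instead of falling further and further behind. The same shift, from a fixed output to a computation over current data, underlies the following model.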

Big Data (complex) > dynamic representation > query input

This model also reduces complexity, but instead of a fixed, oversimplified output, it preserves a dynamic relationship with the original data. In addition, it introduces a feedback system that takes well-formed questions (queries) and produces dynamic results:

Big Data (complex) <> dynamic representation (systemic) <> query input > output (dynamic)
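As a rough illustration of this model, the following Python sketch contrasts a summary computed once with a representation that stays connected to its data and answers queries at the moment they are asked. The class name `DynamicRepresentation`, the record fields (`region`, `week`, `cases`), and the sample numbers are hypothetical and chosen only for this example.

```python
from statistics import mean


class DynamicRepresentation:
    """Keeps the underlying records and answers each query against
    whatever data is present at query time."""

    def __init__(self, records):
        self._records = list(records)

    def add(self, record):
        """New data flows into the representation."""
        self._records.append(record)

    def query(self, **conditions):
        """Average case count over records matching all conditions,
        or None if nothing matches."""
        matches = [
            r for r in self._records
            if all(r.get(key) == value for key, value in conditions.items())
        ]
        return mean(r["cases"] for r in matches) if matches else None


if __name__ == "__main__":
    records = [  # made-up sample data
        {"region": "north", "week": 1, "cases": 10},
        {"region": "north", "week": 2, "cases": 20},
        {"region": "south", "week": 1, "cases": 40},
    ]
    # "Miracle algorithm" style: one number, computed once.
    static_summary = mean(r["cases"] for r in records)

    rep = DynamicRepresentation(records)
    print("static summary:", static_summary)
    print("north, before new data:", rep.query(region="north"))
    rep.add({"region": "north", "week": 3, "cases": 30})
    print("north, after new data:", rep.query(region="north"))
```

The static summary never changes once it is computed, and it hides the difference between regions. The query-based version returns a different answer for each question and reflects new data as soon as it arrives.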

While this approach avoids the oversimplification problem and goes a long way towards a better understanding of complex systems, the results are not necessarily presented in an easily understood format. This is where interactive narrative comes into play. Narratives have always served as a means of knowledge transfer by packaging important pieces of information in the form of a story. This intuitive insight is supported by findings in the cognitive sciences, which have identified narrative as a built-in mechanism for comprehension and information storage. Interactive narratives enhance this age-old practice by introducing dynamic elements that react to a user’s choices, along with the ability to “replay” and see the consequences of different decisions. Instead of creating an abstract dynamic representation, this approach builds a dynamic narrative representation, which a user can understand more easily. Interaction with the system then turns into a narrative experience with choices and consequences:

Big Data (complex) <> dynamic narrative representation <> narrative choices > dynamic narrative experience
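The idea of choices, consequences, and replay can be sketched in a few lines of Python. The example borrows the duck pond mentioned earlier, but its rules are invented for illustration and are not an ecological model: the `play` function, the starting `water_quality` of 10, and the numbers by which it rises and falls are all assumptions.

```python
def play(choices, water_quality=10):
    """Run one playthrough. `choices` is a sequence of "feed" or "wait".
    Returns the story told so far and the final water quality."""
    story = []
    for day, choice in enumerate(choices, start=1):
        if choice == "feed":
            water_quality -= 3  # toy rule: leftover bread degrades the water
            story.append(f"Day {day}: you feed the ducks; the water grows murkier.")
        elif choice == "wait":
            water_quality = min(10, water_quality + 1)  # toy rule: slow recovery
            story.append(f"Day {day}: you hold back; the pond recovers a little.")
        else:
            raise ValueError(f"unknown choice: {choice!r}")
        if water_quality <= 0:
            story.append("The pond's ecosystem collapses.")
            break
    return story, water_quality


if __name__ == "__main__":
    # First playthrough: feed every day.
    for line in play(["feed", "feed", "feed", "feed"])[0]:
        print(line)
    print("--- replay with different choices ---")
    for line in play(["feed", "wait", "feed", "wait"])[0]:
        print(line)
```

Each call to `play` is one pass through the narrative, and calling it again with different choices is the “replay”. The user never sees the underlying numbers, only a story whose ending depends on the decisions made.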

Big Data represents a huge opportunity that will transform both the business world and society at large. However, successful applications depend on understanding the Challenge of Complexity and avoiding oversimplification. Interactive narrative, building on scientific methods like cybernetics and chaos theory, is an important tool in this regard.