1.3. What are Big Data sources?
Big data is not new. Most of the big data sources existed before, but the scale at which we use and apply them today has changed.
Big data is often boiled down to a few varieties of data generated by machines, people, and organizations. Machine-generated data comes from real-time sensors in industrial machinery or vehicles, logs that track user behaviour online, environmental sensors, personal health trackers, and many other sensor data sources. Human-generated data is the vast amount of social media content that people produce: status updates, tweets, photos, and other media. Organization-generated data is the more traditional type of data, including transaction information in databases and structured data often stored in data warehouses.
Big data can be structured, semi-structured, or unstructured. Real value often comes from combining these streams of big data with each other and analysing them to generate new insights, which then become big data themselves.
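As a rough illustration of the three forms and of combining them, the following Python sketch joins a structured table, a semi-structured record, and an unstructured text. All of the data, field names, and the combine function are hypothetical and chosen only for this example; real sources would be far larger and would need proper text analysis rather than a simple keyword check.

```python
import csv
import io
import json

# Structured: fixed columns, as in a transaction table (hypothetical data).
transactions_csv = "customer_id,amount\n17,25.50\n42,99.00\n"

# Semi-structured: self-describing JSON whose fields may vary per record.
sensor_json = '{"customer_id": 42, "device": "tracker", "steps": 8123}'

# Unstructured: free text with no schema.
status_update = "Customer 42: loving my new tracker, walked a lot today!"


def combine(transactions_text, sensor_text, free_text):
    """Join the three sources on customer_id into a single record."""
    rows = list(csv.DictReader(io.StringIO(transactions_text)))
    reading = json.loads(sensor_text)
    customer_id = reading["customer_id"]
    # Sum the structured transactions that belong to this customer.
    total_spent = sum(
        float(row["amount"]) for row in rows if int(row["customer_id"]) == customer_id
    )
    # Naive keyword check standing in for real text analysis.
    mentions_device = reading["device"] in free_text.lower()
    return {
        "customer_id": customer_id,
        "total_spent": total_spent,
        "steps": reading["steps"],
        "mentions_device": mentions_device,
    }


combined = combine(transactions_csv, sensor_json, status_update)
print(combined)
```

The combined record is itself a new piece of data that could be stored and analysed further, which is the feedback loop described above.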
1.3.1. Machine Generated Data
Machine data is the largest source of big data and the most complex. Machines collect data 24/7 via their built-in sensors, at both personal and industrial scales, which is why they are the largest of all the big data sources. In general, we call machines that provide some type of sensing capability smart.
There are three main properties of smart devices, based on what they do with their sensors and the things they encapsulate. They can connect to other devices or networks, they can execute services and collect data autonomously, and they have some knowledge of their environment. The widespread availability of smart devices and their interconnectivity led to a new term, “The Internet of Things”.
A Boeing 747 produces huge amounts of data on every flight. The sensors that contribute to this data include accelerometers that measure turbulence and many sensors built into the engines that measure temperature, pressure, and other factors in order to detect engine malfunctions. Constant real-time analysis of all the data collected helps with monitoring and problem detection at approximately 12,000 meters above ground. This type of analytical processing is called in-situ. Previously, in traditional relational database management systems, data was often moved to the computational space for processing. In the Big Data space, in-situ means bringing the computation to where the data is located or, in this case, generated. A key feature of these types of real-time notifications is that they enable real-time actions.
Using such a capability requires a different approach to applications. If there are plans to incorporate Big Data driven insights into an organization, defining a new strategy is necessary. Most Big Data centric businesses have updated their culture to be more real-time action oriented, refining real-time processes to handle anything from customer relations and fraud detection to system monitoring and control. In addition, the volume of real-time data and the analytical operations that need to take place require an increased use of scalable computing systems, which need to be part of the planning for an organizational Big Data strategy.
SCADA stands for Supervisory Control and Data Acquisition. SCADA is a type of industrial control system for remote monitoring and control of industrial processes that exist in the physical world, potentially spanning multiple sites and many types of sensors. SCADA systems can even be used in smart building applications to monitor and control heating, ventilation, and air conditioning (HVAC) systems, access, and energy consumption. In the Big Data case, it still needs to be decided how these processes are managed once trends, patterns, and anomalies are identified in real time.
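To make the monitor-and-control idea concrete, here is a small Python sketch of a supervisory check over several sites in a smart building setting. It is only an illustration: the site names, the setpoint, the tolerance, and the supervise function are hypothetical, and a real SCADA system would read from field devices and send commands over industrial protocols rather than work on an in-memory dictionary.

```python
def supervise(readings, setpoint_c, tolerance_c):
    """Return a control action for each site based on its temperature reading."""
    actions = {}
    for site, temp_c in readings.items():
        if temp_c > setpoint_c + tolerance_c:
            actions[site] = "cool"   # too warm: request cooling
        elif temp_c < setpoint_c - tolerance_c:
            actions[site] = "heat"   # too cold: request heating
        else:
            actions[site] = "hold"   # within tolerance: no change
    return actions


# Hypothetical temperature readings (degrees Celsius) from three sites.
site_temps_c = {"building_a": 26.5, "building_b": 21.0, "building_c": 17.2}
actions = supervise(site_temps_c, setpoint_c=21.0, tolerance_c=1.5)
print(actions)
```

The rule used here is deliberately simple. Deciding what the action should be once an anomaly is detected is exactly the management question raised above.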