This paper documents the concepts associated with big data. It attempts to consolidate the hitherto fragmented discourse on what constitutes big data, what metrics characterize the size and other qualities of big data, and what tools and technologies exist to harness its potential.
In the past 18 years, many organizations have seen growth in their data, on the order of terabytes, that has been acquired and stored but not sufficiently prepared, cleaned, managed, and analysed to provide competitive value (Armes and Refern 2013). Big Data can be prepared by placing it into a format that allows it to be cleaned. Cleaning Big Data refers to identifying and correcting incorrect, inaccurate, and duplicate data arising from the many heterogeneous sources that feed it (Saha and Srivastava 2014). Big Data error rates are typically in the range of 1% to 5%, with some spikes over 30%, and cleaning these errors accounts for around 30% to 80% of the time and money budgeted for developing systems. Management of Big Data refers to the overall process of preparing, cleaning, and analysing it. Analysis of Big Data refers to the process of extracting information that enables better organizational decisions. Organizations that analyse Big Data show roughly twice the profit performance of organizations that do not (LaValle, Lesser et al. 2011). Value to organizations comes from the analysis of Big Data, and this analysis may contribute to profits; however, many organizations have not analysed their Big Data at all, or only minimally (Najjar and Kettinger 2013). Organizations that have acquired Big Data need a new set of governance policies and architectures to minimize the risk of acquiring, storing, and analysing it (Alnafoosi and Steinbach 2013). Some data centres have seen over 100% yearly growth in organizational data. Tallon, Short et al. (2013) describe organizational risk as having three components: value, technology, and reputation.
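To illustrate the kind of cleaning described above, the following is a minimal sketch in Python using only the standard library. All field names and validation rules here are hypothetical examples; they stand in for the normalization, invalid-record filtering, and de-duplication steps that real Big Data pipelines perform at scale.

```python
# Illustrative cleaning of records drawn from heterogeneous sources:
# normalize fields, drop obviously invalid rows, and de-duplicate.
# Field names ("name", "email") and rules are hypothetical examples.

def clean_records(records):
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize: trim whitespace, unify letter case.
        name = rec.get("name", "").strip().title()
        email = rec.get("email", "").strip().lower()
        # Drop records missing key fields (incorrect/inaccurate data).
        if not name or "@" not in email:
            continue
        # De-duplicate on a normalized key.
        key = (name, email)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": "  alice smith ", "email": "Alice@Example.com"},
    {"name": "Alice Smith", "email": "alice@example.com"},  # duplicate
    {"name": "", "email": "bob@example.com"},               # invalid
]
print(clean_records(raw))  # only one cleaned record survives
```

Note that de-duplicating on a normalized key is what lets the two differently formatted "Alice Smith" records be recognized as the same entity; in production systems this step is far harder, involving fuzzy matching across sources.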
If an organization does not have the technology to acquire all of its Big Data accurately and in a timely manner, then the analytical value available to the organization may decrease and its reputation may diminish (Tallon, Short et al. 2013).
From corporate leaders to city planners and academics, big data is the subject of attention and, to some degree, apprehension. The sudden rise of big data has left many unprepared. In the past, new technological developments first appeared in technical and academic publications; the knowledge and its synthesis later permeated other avenues of knowledge mobilization, including books. The rapid development of big data technologies and the ready acceptance of the concept by the public and private sectors left little time for the discourse to develop and mature in the academic domain. Authors and practitioners leapfrogged to books and other electronic media for immediate and wide circulation of their work on big data. Thus, one finds several books on big data, including Big Data for Dummies, yet insufficient fundamental discourse in academic publications (Gandomi and Haider 2015).
A key contribution of this paper is to bring forth the oft-neglected dimensions of big data. The popular discourse on big data, dominated and influenced by the marketing efforts of large software and hardware vendors, focuses on predictive analytics and structured data, ignoring the largest component of big data. Consider the extent of detail and the surge of data and information now produced through advances in technology and the internet. With increased storage capacity and new methods of data collection, huge amounts of data have become readily available. Every second, more data is created that must be stored and analysed to extract value. Furthermore, data has become cheaper to store, so organizations seek to extract as much value as possible from the vast amounts of data they hold. The rapid change of such data requires a new type of big data analytics, as well as different storage and analysis methods. Such sheer volumes of big data must be properly analysed, and the pertinent information extracted.
This paper is organized as follows. We begin by defining big data and addressing the questions that definition raises: what qualifies data as big data, how do we classify data as big data, and how do we anticipate which kinds of data will be hard to process and which complexities we will face? We then examine the dimensions of data complexity (volume, variety, velocity, value, and veracity) and discuss how to handle the complexities and problems they pose.