Big Data Literature Review

There is no commonly accepted definition of Big Data. It can, however, be understood as a combination of five characteristics: volume, variety, velocity, value and veracity. Volume refers to the large amount of data stored and analysed. Variety considers the different types and sources of data, from well-defined and semi-structured data such as graphs and documents to unstructured data such as photographs and videos. Velocity indicates the speed at which data is generated. Value describes the benefit of the data to the organisation. Veracity concerns the correctness of the data (Gordon 2013). Big Data can be thought of as ‘different’ rather than simply ‘big’: it is not one huge collection of a single set of data but a growth in data created by connecting different data sets to produce more information. It represents an ever-expanding collection of data sets whose size, variety and speed of generation make them difficult to manage and to harness information from. Complexity arises because the sources range from data as simple as bank transactions to data as complex as Facebook photos and videos (Jackson 2013).

To indicate the volume of data involved, Batty (2013) gives the example of a system that collects swipe-card data on London's public transport. The system records 7 million trips a day on the Tube, which scales to around 200 million a month and 2.5 billion a year. In the past, data was handcrafted, gathered through expensive and time-consuming surveys carried out manually. Today, data is collected from the activities and actions recorded by devices such as computers and smartphones, creating overwhelming, seemingly endless streams of data (Batty 2013).
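As a rough check on how these figures scale (assuming a 30-day month and a 365-day year), 7 million trips a day gives approximately:

7,000,000 × 30 ≈ 210,000,000 trips per month
7,000,000 × 365 ≈ 2,555,000,000 trips per year

which is consistent with the rounded figures of 200 million and 2.5 billion cited from Batty (2013).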

Anderson (2008) claims that we have entered the Petabyte Age: kilobytes were stored on floppy disks, megabytes on hard disks, terabytes in disk arrays, and now petabytes in the cloud. He also argues that this massive amount of data is making the traditional scientific approach of hypothesising, modelling and testing to establish causation obsolete...
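For context, each of these storage prefixes is a factor of a thousand larger than the last: a kilobyte is 10^3 bytes, a megabyte 10^6, a gigabyte 10^9, a terabyte 10^12 and a petabyte 10^15 bytes, so a single petabyte corresponds to roughly a million gigabytes.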