What is Big Data?
Big Data is a term coined for collections of data assets so large that traditional methods of storage and analysis fall short of handling them effectively. The key challenges, or characteristics, that differentiate Big Data from regular data sets are termed the 3 Vs:
- Volume
- Velocity
- Variety
Let's understand each of these in more detail.
Volume
This deals with the sheer size of the data assets that must be handled. With the overall data available for consumption roughly doubling every year, there is a pressing need for frameworks and solutions that can tackle data assets at the petabyte or exabyte scale.
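To put that doubling rate in perspective: an archive holding 1 petabyte today would, doubling annually, reach roughly an exabyte in about ten years, since 2^10 = 1024.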
The amount of data an average person creates every day, through electronic interactions on mobiles, PCs, laptops and the internet, is a clear testament that the data volumes our traditional relational models were built to handle are being outgrown. In the new era that has already dawned, volume should not be a bottleneck.
Velocity
The days of batch processing are gone. One takes a chunk of data, submits a job to the server and waits for the result. That scheme works when the incoming data rate is slower than the batch processing rate and when the result is useful despite the delay. With new sources of data such as social and mobile applications, the batch process breaks down: data now streams into the server continuously, in real time, and the result is only useful if the delay is very short.
Modern business models depend on the speed at which data is made available; they need real-time data to plan customer interactions. For example, a targeted message should reach a prospective customer as he walks into the store, customized to his position in the store and to preferences harvested from sources such as his social media interactions.
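To make the batch-versus-streaming contrast concrete, here is a minimal Python sketch of event-at-a-time processing; the event source, field names and pacing are all invented for the illustration, not part of any particular product:

```python
import itertools
import random
import time

def event_stream():
    """Simulates a continuous feed of store check-in events.
    In a real deployment this would be a message queue or socket."""
    customers = ["alice", "bob", "carol"]
    while True:
        yield {"customer": random.choice(customers), "ts": time.time()}
        time.sleep(0.5)  # events trickle in continuously

def handle(event):
    """React the moment an event arrives; the value of the
    response decays quickly with delay."""
    print(f"send targeted offer to {event['customer']} now")

# Process each event as it arrives instead of waiting for a batch.
for event in itertools.islice(event_stream(), 5):
    handle(event)
```

The point is structural: there is no accumulate-then-submit step, so the latency between an event and the reaction is bounded by the handler alone.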
Variety
This is the most interesting of the 3 Vs that define Big Data. The earlier assumption that enterprise data resides in a structured (mostly relational) form is undergoing a paradigm shift: the schema-on-write ideology of force-fitting incoming data into a predefined relational schema is giving way to a schema-on-read approach, in which all data is received in its pure, original form and schema definitions are applied only at the time of provisioning or consumption, by business apps or analytics systems, according to their unique needs.
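As a rough illustration of the two ideologies, the Python sketch below lands raw JSON records untouched and lets each consumer project its own schema at read time; the file name and field names are made up for the example:

```python
import json

# Schema on write would reject or truncate these records up front;
# instead we land every record exactly as it arrives (schema on read).
raw_records = [
    '{"user": "alice", "clicked": "ad-42", "ts": 1700000000}',
    '{"user": "bob", "device": "mobile", "geo": [51.5, -0.1]}',
]
with open("landing_zone.jsonl", "w") as f:
    f.write("\n".join(raw_records))

def read_with_schema(path, fields):
    """Apply a consumer-specific schema at read time: keep only the
    fields this application cares about, defaulting missing ones."""
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            yield {field: record.get(field) for field in fields}

# Two consumers project different schemas onto the same raw data.
for row in read_with_schema("landing_zone.jsonl", ["user", "clicked"]):
    print(row)
for row in read_with_schema("landing_zone.jsonl", ["user", "geo"]):
    print(row)
```

Neither consumer forces its fields onto the other, and the raw landing file stays intact for future, as-yet-unknown schemas.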
Today's analytics runs against sensor logs, Twitter data, geospatial maps, handwritten documents, images, scanned documents and more. This brings us to the need for a new way of storing and interpreting these unstructured data assets.
Together, these phenomena, acting across the world, are driving the collective movement represented by Big Data and the solutions built to tackle it.