What is Big Data?
Big Data is a term coined for collections of data assets so large that traditional methods of storage and analysis fall short of handling them effectively. The key challenges, or characteristics, that differentiate Big Data from regular data sets are termed the 3 Vs:
- Volume
- Velocity
- Variety
Let's understand each of these in more detail.
Volume
This deals with the sheer size of the data assets that must be handled. With the overall data available for consumption roughly doubling every year, there is a pressing need for frameworks and solutions that can tackle data assets at the petabyte or exabyte scale.
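To put that doubling rate in perspective: an archive holding 1 petabyte today would, doubling annually, reach roughly an exabyte in about ten years, since 2^10 = 1024.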
The amount of data an average person creates every day, through electronic interactions on mobiles, PCs, laptops and the internet, is a clear testament that the data volumes our traditional relational models were built to handle are being outgrown. In the new era that has already dawned, volume should not be a bottleneck.
Velocity
The days of batch processing are gone. One takes a chunk of data, submits a job to the server and waits for the result. That scheme works when the incoming data rate is slower than the batch processing rate and when the result is useful despite the delay. With new sources of data such as social and mobile applications, the batch process breaks down: data now streams into the server continuously, in real time, and the result is only useful if the delay is very short.
Modern business models depend on the speed at which data is made available; they need real-time data to plan customer interactions. For example, a targeted message should reach a prospective customer as he walks into the store, customized to his position in the store and to preferences harvested from sources such as his social media interactions.
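To make the batch-versus-streaming contrast concrete, here is a minimal Python sketch of event-at-a-time processing; the event source, field names and pacing are all invented for the illustration, not part of any particular product:

```python
import itertools
import random
import time

def event_stream():
    """Simulates a continuous feed of store check-in events.
    In a real deployment this would be a message queue or socket."""
    customers = ["alice", "bob", "carol"]
    while True:
        yield {"customer": random.choice(customers), "ts": time.time()}
        time.sleep(0.5)  # events trickle in continuously

def handle(event):
    """React the moment an event arrives; the value of the
    response decays quickly with delay."""
    print(f"send targeted offer to {event['customer']} now")

# Process each event as it arrives instead of waiting for a batch.
for event in itertools.islice(event_stream(), 5):
    handle(event)
```

The point is structural: there is no accumulate-then-submit step, so the latency between an event and the reaction is bounded by the handler alone.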
Variety
This is the most interesting of the 3 Vs that define Big Data. The earlier assumption that enterprise data resides in a structured (mostly relational) form is undergoing a paradigm shift: the schema-on-write ideology of force-fitting incoming data into a predefined relational schema is giving way to a schema-on-read approach, in which all data is received in its pure, original form and schema definitions are applied only at the time of provisioning or consumption, by business apps or analytics systems, according to their unique needs.
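As a rough illustration of the two ideologies, the Python sketch below lands raw JSON records untouched and lets each consumer project its own schema at read time; the file name and field names are made up for the example:

```python
import json

# Schema on write would reject or truncate these records up front;
# instead we land every record exactly as it arrives (schema on read).
raw_records = [
    '{"user": "alice", "clicked": "ad-42", "ts": 1700000000}',
    '{"user": "bob", "device": "mobile", "geo": [51.5, -0.1]}',
]
with open("landing_zone.jsonl", "w") as f:
    f.write("\n".join(raw_records))

def read_with_schema(path, fields):
    """Apply a consumer-specific schema at read time: keep only the
    fields this application cares about, defaulting missing ones."""
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            yield {field: record.get(field) for field in fields}

# Two consumers project different schemas onto the same raw data.
for row in read_with_schema("landing_zone.jsonl", ["user", "clicked"]):
    print(row)
for row in read_with_schema("landing_zone.jsonl", ["user", "geo"]):
    print(row)
```

Neither consumer forces its fields onto the other, and the raw landing file stays intact for future, as-yet-unknown schemas.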
Today's analytics runs against sensor logs, Twitter data, geospatial maps, handwritten documents, images, scanned documents and more. This brings us to the need for a new way of storing and interpreting these unstructured data assets.
Together, these phenomena, acting across the world, are driving the collective movement represented by Big Data and the solutions built to tackle it.