A Brief Look at Big Data and How Google Handles Its Gigantic Amount of Data
When it comes to data, few companies have to store as much on their servers as Google. By one estimate, Google could be storing up to 15 exabytes of data on its servers. That is 15 million terabytes, roughly the equivalent of the data stored on 30 million personal computers.
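The unit conversion behind that estimate can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, and the half-terabyte-per-computer figure is simply what the comparison in the text implies, not a number taken from any source.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption;
# binary units would give slightly different figures).
TERABYTE = 10**12
EXABYTE = 10**18

estimated_storage_eb = 15          # estimate quoted in the text
personal_computers = 30_000_000    # comparison quoted in the text

total_bytes = estimated_storage_eb * EXABYTE
total_terabytes = total_bytes // TERABYTE
terabytes_per_pc = total_terabytes / personal_computers

print(f"{total_terabytes:,} TB in total")
print(f"{terabytes_per_pc} TB implied per personal computer")
```

In other words, the comparison assumes an average personal computer holding about half a terabyte.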
Google’s Mesa is a data warehousing environment that powers much of the Google ecosystem. To capture and serve content at high speed, Google may collect far more personal data about its users than they realize. Google accounts automatically delete this private data every 18 months. Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. Google stores and handles all of this data in data centers. Google does not operate the biggest data centers, but it still handles a huge amount of data. A data center normally holds petabytes to exabytes of data.
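As a rough sanity check on the search figures, the per-second rate can be scaled up to a day and a year. The sketch below assumes a constant rate of exactly 40,000 queries per second and a 365-day year, so it gives a lower bound on the "over 40,000" figure rather than an exact count.

```python
QUERIES_PER_SECOND = 40_000        # average rate quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365                # assumption: non-leap year

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
queries_per_year = queries_per_day * DAYS_PER_YEAR

print(f"{queries_per_day:,} queries per day")
print(f"{queries_per_year:,} queries per year")
```

The results, about 3.46 billion per day and 1.26 trillion per year, are consistent with the rounded figures above.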
Google has reported processing over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters. In September 2007, the average MapReduce job ran across approximately 400 machines, and together the jobs consumed approximately 11,000 machine-years of computing time in a single month.
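To make the MapReduce idea concrete, here is a minimal single-process sketch of the programming model, using the classic word-count example. It is not Google's implementation: the function names (map_phase, shuffle_phase, reduce_phase, run_job) are hypothetical, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(documents):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

documents = ["big data at Google", "Google handles big data"]
word_counts = run_job(documents)
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be split across machines, which is what lets a single job spread over hundreds of them.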
Big data is a term that describes the large volume of data — both structured and unstructured — that is created by businesses daily. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
What is Data?
Data are measured, collected, reported, and analyzed, whereupon they can be visualized using graphs, images, or other analysis tools. As a general concept, data refers to existing information or knowledge that is represented or coded in a form suitable for better usage or processing.
Types of Big Data
Structured data is one type of big data. By structured data, we mean information that can be processed, stored, and retrieved in a fixed format. It refers to highly organized information that can be readily and seamlessly stored in and accessed from a database by simple search algorithms. For instance, the employee table in a company database is structured: the employee details, their job positions, their salaries, and so on are all present in an organized manner.
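The employee table example can be sketched with Python's built-in sqlite3 module. The column names and sample rows below are made up for illustration; the point is only that every row follows the same fixed schema, so a simple query can retrieve exactly what is needed.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "position TEXT NOT NULL, "
    "salary REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (name, position, salary) VALUES (?, ?, ?)",
    [
        ("Asha", "Engineer", 85000.0),
        ("Ben", "Analyst", 62000.0),
        ("Chen", "Engineer", 91000.0),
    ],
)

# Because every row has the same columns, filtering is a one-line query.
rows = conn.execute(
    "SELECT name, salary FROM employee WHERE position = ? ORDER BY name",
    ("Engineer",),
).fetchall()
print(rows)
conn.close()
```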
Unstructured data refers to data that has no specific format or structure at all. This makes it difficult and time-consuming to analyze. Email is an example of unstructured data. Structured and unstructured are two important types of big data.
Semi-structured data is the third type of big data. It combines both of the formats mentioned above, that is, structured and unstructured data. To be precise, it refers to data that has not been organized into a specific repository (database), yet contains vital information or tags that separate the individual elements within the data.
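JSON is a common example of semi-structured data: each record carries its own field names (the "tags"), but records are not forced into one fixed schema. The records below are invented for illustration, and the sketch uses only Python's standard json module.

```python
import json

# Hypothetical records: each one is tagged with field names,
# but no two records share exactly the same set of fields.
raw_records = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Chen", "address": {"city": "Pune"}, "notes": "prefers email"}
]
"""

records = json.loads(raw_records)

# The union of field names shows there is no single fixed schema.
all_fields = sorted({field for record in records for field in record})
print(all_fields)

# Code reading the data must tolerate missing fields.
for record in records:
    print(record["id"], record.get("email", "<no email>"))
```

The tags make individual elements easy to pick out, while the varying fields are what distinguish this from a fully structured table.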
Characteristics of Big Data
Variety refers to the structured, unstructured, and semi-structured data that is gathered from multiple sources. In the past, data could only be gathered from spreadsheets and databases; today it comes in a variety of forms such as emails, PDFs, photos, videos, audio, social media posts, and much more.
Velocity essentially refers to the speed at which data is being created in real time. From a broader perspective, it comprises the rate of change, the linking of incoming data sets at varying speeds, and bursts of activity.
Volume is another characteristic of big data. Big Data implies huge volumes of data generated daily from various sources such as social media platforms, business processes, machines, networks, and human interactions. Such large amounts of data are stored in data warehouses.
Sectors using Big Data
How Google Handles Big Data
1) Google’s Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google’s Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple data centers and provides consistent and repeatable query answers at low latency, even when an entire data center fails.
2) MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records. MillWheel’s programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. It provides a unique combination of scalability, fault tolerance, and a versatile programming model that lends itself to a wide variety of problems at Google.
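The time-based aggregations mentioned for MillWheel can be illustrated with a small sketch. This is plain Python, not MillWheel's API: the window size, record layout, and function names are all assumptions. It groups records into fixed one-minute windows by the timestamp each record carries (its logical time), so records that arrive out of order still land in the correct window.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size

def window_start(event_time):
    """Return the start of the fixed window containing event_time."""
    return event_time - (event_time % WINDOW_SECONDS)

def count_per_window(records):
    """Count records per (window start, key), using each record's own timestamp."""
    counts = defaultdict(int)
    for event_time, key in records:
        counts[(window_start(event_time), key)] += 1
    return dict(counts)

# (event_time_in_seconds, key) pairs, deliberately out of arrival order.
records = [
    (5, "search"),
    (61, "search"),
    (12, "ads"),
    (59, "search"),
    (65, "ads"),
    (119, "search"),
]

window_counts = count_per_window(records)
for (start, key), count in sorted(window_counts.items()):
    print(f"window starting at {start}s, key={key}: {count}")
```

A real streaming system also has to decide when a window is complete and must persist its state across failures, which this sketch leaves out.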