AN INSIDE LOOK AT GOOGLE'S DATA STORAGE CENTERS: HOW DO THEY DO IT?

           WHERE DOES GOOGLE STORE ITS DATA?

For those of us who use the internet on a regular basis, Google is the great answerer of interrogatives. Have a question? Be it common (What is the difference between an acid and a base?) or more obscure (How do monkeys go about peeling bananas?), Google is sure to turn up an answer that – if nothing else – points you in the right direction. There are other search engines, of course, but none has been verbed (think “Google it”) or made an official term in the Oxford Dictionary, as Google has. There may have been a time when peers encouraged one another to “Ask Jeeves” or “Yahoo it,” but that time is long gone. Google is the search king in terms of U.S. market share at 65.6% (with Microsoft’s Bing at 16.5%).

Apart from search, Google operates its own social network (Google+), Gmail, an advertising business that fetched over $43 billion in revenue in 2012, and a host of other products and services. This raises the question: with all of these products and services and the unthinkable amount of data that comes with them, how does a company like Google go about storing its information? If we get a little meta and turn to Google with our question, we learn that our answer lies in the functionality of thousands upon thousands of servers. In August of 2011, Data Center Knowledge reported that the number is close to 900,000. Pretty remarkable, right?
These servers don’t all serve the same purpose, of course. Instead, each server has designated tasks. Let’s look at some of Google’s server types and the tasks they are responsible for carrying out.




                                      Web Servers

Google’s web servers are those that will probably resonate most with the common user, as they are responsible for handling the queries that we enter into Google Search. When a user enters a query, web servers interact with the other server types (index, spelling, and ad servers, for example), then return the results and serve ads in HTML format. Web servers are the ‘results-gathering’ servers, if you will.

                               

                                 Data-Gathering Servers

Data-gathering servers do the work of collecting and organizing information for Google. These servers “spider” or crawl the internet via Googlebot (Google’s web crawler), searching for newly-added and existing content. These servers have the responsibility of indexing content, updating the index and ranking pages based on Google’s search algorithms.

 

                                      Index Servers

Google’s index servers are where a lot of the “magic” behind Google Search happens. These servers are responsible for returning lists of document IDs that correspond to “documents” (or indexed web pages) wherein the user’s query is present.


                                   Document Servers

Document servers store the document version of web page content. Each page has content saved in the form of JPEG files, PDF files, and more, all of which is stored in several servers depending on the type of information. Document servers provide snippets of information to users based on the search terms entered and are capable of returning entire documents, as well.

The document IDs returned by index servers correspond to documents housed by these servers. Due to the influx of indexed documents each and every day, these servers require more disk space than others. If we were to answer the question – Where does Google store its data? – with one server type, it’d most certainly be the document server.
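
To make the division of labour between index servers and document servers concrete, here is a toy sketch in Python. It is not Google's implementation: the sample documents, the function names (build_index, lookup, snippet) and the simple word-matching rule are all assumptions made for illustration. The "index" maps words to document IDs, and the "document store" turns those IDs back into snippets.

```python
from collections import defaultdict

# Toy "document server": maps document IDs to stored page text (made-up data).
DOCUMENTS = {
    1: "An acid donates protons while a base accepts protons.",
    2: "Monkeys often peel bananas from the bottom end.",
    3: "Bananas are rich in potassium.",
}


def build_index(documents):
    """Toy "index server": map each lowercase word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,?!")].add(doc_id)
    return index


def lookup(index, query):
    """Return the sorted IDs of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


def snippet(documents, doc_id, width=40):
    """Return the first `width` characters of a stored document."""
    return documents[doc_id][:width]


INDEX = build_index(DOCUMENTS)
for doc_id in lookup(INDEX, "bananas"):
    print(doc_id, snippet(DOCUMENTS, doc_id))
```

The lookup step only ever returns IDs. The text itself is fetched in a second step, which mirrors the split between the two server types described above.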


                                      Ad Servers

Ad servers are vital to both Google’s revenue stream and the livelihood of thousands of businesses. These servers are responsible for managing the advertisements that are integral to Google’s AdWords and AdSense services. Web servers interact with these ad servers when deciding which ads (if any) should be displayed for a particular query.


                                   Spelling Servers

We didn’t all get A’s in spelling during school and some of us need a little help when searching. If you have ever searched for something in Google and the results came up with the phrase “Did you mean: [correct spelling],” know that a spelling server was at work. No matter how search terms are entered, spelling servers work to perform the search anyway, taking advantage of the opportunity to learn, correct and better locate what users seek.
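
As a rough illustration of the “Did you mean” idea, the sketch below uses Python's standard difflib module to suggest the closest known term. This is a stand-in technique, not how Google's spelling servers work. The vocabulary list, the did_you_mean function name and the 0.7 similarity cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical vocabulary of known search terms; a real system would learn
# its vocabulary from far more data than this.
KNOWN_TERMS = ["banana", "monkey", "acid", "base", "google"]


def did_you_mean(term, vocabulary=KNOWN_TERMS):
    """Return a suggested correction, or None if the term is known or nothing is close."""
    term = term.lower()
    if term in vocabulary:
        return None  # spelled correctly, nothing to suggest
    close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.7)
    return close[0] if close else None


suggestion = did_you_mean("bananna")
if suggestion:
    print(f"Did you mean: {suggestion}")
```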

  AN INSIDE LOOK AT GOOGLE DATA CENTERS

Very few people have stepped inside Google’s data centers, and for good reason: our first priority is the privacy and security of your data, and we go to great lengths to protect it, keeping our sites under close guard. While we’ve shared many of our designs and best practices, and we’ve been publishing our efficiency data since 2008, only a small set of employees have access to the server floor itself. 

Today, for the first time, you can see inside our data centers and pay them a virtual visit. On Where the Internet lives, our new site featuring beautiful photographs by Connie Zhou, you’ll get a never-before-seen look at the technology, the people and the places that keep Google running. 


 

Finally, we invited author and WIRED reporter Steven Levy to talk to the architects of our infrastructure and get an unprecedented look at its inner workings. His new story is an exploration of the history and evolution of our infrastructure, with a first-time-ever report from the floor of a Google data center. 



  


GOOGLE CLOUD STORAGE



Google Cloud Storage is an Internet service to store data in Google's cloud.
Google Cloud Storage allows world-wide storing and retrieval of any amount of data and at any time. It provides a simple programming interface which enables developers to take advantage of Google's own reliable and fast networking infrastructure to perform data operations in a secure and cost effective manner. If expansion needs arise, developers can benefit from the scalability provided by Google's infrastructure.


Building Blocks

  • Projects
    All data in Google Cloud Storage belongs inside a project. A project consists of a set of users, a set of APIs, billing, authentication, and monitoring settings for those APIs. You can have one project or multiple projects.
  • Buckets
    Buckets are the basic containers that hold your data. Everything that you store in Google Cloud Storage must be contained in a bucket. You can use buckets to organize your data and control access to your data, but unlike directories and folders, you cannot nest buckets. Buckets belong to a project and cannot be shared among projects. There is no limit on the number of buckets that you can create in a project.
  • Objects
    Objects are the individual pieces of data that you store in Google Cloud Storage. Objects have two components: object data and object metadata. The object data component is usually a file that you want to store in Google Cloud Storage. The object metadata component is a collection of name-value pairs that describe various object qualities. Objects belong to a bucket and cannot be shared among buckets.
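
The relationship between the three building blocks can be sketched as a small in-memory model. This is only an illustration of the hierarchy described above, not the service's API: the class names (Project, Bucket, StorageObject) and the methods create_bucket and put are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageObject:
    """An object: the data itself plus name-value metadata."""
    name: str
    data: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class Bucket:
    """A bucket holds objects only; it has no slot for other buckets, so buckets cannot nest."""
    name: str
    objects: dict = field(default_factory=dict)

    def put(self, name, data, **metadata):
        """Store an object in this bucket and return it."""
        self.objects[name] = StorageObject(name, data, metadata)
        return self.objects[name]


@dataclass
class Project:
    """A project owns its buckets; a bucket belongs to exactly one project."""
    name: str
    buckets: dict = field(default_factory=dict)

    def create_bucket(self, name):
        """Create a new, empty bucket inside this project."""
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists in this project")
        self.buckets[name] = Bucket(name)
        return self.buckets[name]


project = Project("demo-project")
photos = project.create_bucket("photos")
cat = photos.put("cat.jpg", b"\xff\xd8", content_type="image/jpeg")
print(cat.name, cat.metadata)
```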

Features

Google Cloud Storage provides several features and capabilities that make storing, sharing, and managing data efficient and reliable.
  1. High Capacity and Scalability
    Google Cloud Storage supports objects that can be terabytes in size. It also supports a large number of buckets per account.
  2. Strong Data Consistency
    Google Cloud Storage provides strong read-after-write consistency for all upload and delete operations. This means that after you upload an object successfully you can immediately download it, delete it, or get its metadata. Likewise, any attempt to access an object immediately after you successfully delete it results in a 404 Not Found status code. List operations are eventually consistent from anywhere on the Internet.
    From an availability standpoint, upload operations to Google Cloud Storage are atomic. When you upload an object, the object is not available until it is completely uploaded. Uploaded objects are never available in a corrupted state or as partial objects. Objects are either available or they are not.
    For more information, see Consistency.
  3. Google Developers Console Projects
    Google Cloud Storage is available as a service in the Google Developers Console where you can add project members, handle billing, manage authentication, and work with other APIs. You can have many projects and each project can have its own Google Cloud Storage instance.
  4. Bucket Locations
    Google Cloud Storage provides the ability to specify where your buckets are geographically stored. It is possible to specify that buckets are stored in Europe or in the US. For more information, see Specifying Bucket Locations.
  5. REST APIs
    Google Cloud Storage provides two RESTful programming interfaces (the XML API and the JSON API) so you don't have to rely on SOAP toolkits or RPC programming to create applications that store, share, and manage data on Google Cloud Storage. Instead, you can use standard HTTP methods, such as PUT, GET, POST, HEAD, and DELETE.
  6. OAuth 2.0 Authentication
    Google Cloud Storage uses OAuth 2.0 authentication and authorization to interact with the API. OAuth 2.0 authentication is a token-based authentication where you can issue tokens to applications to act on your behalf. You can set up OAuth 2.0 authentication for your applications by visiting the OAuth 2.0 authentication and authorization guide and reading about OAuth 2.0 authentication specific to Google Cloud Storage.
  7. Authenticated Browser Downloads
    Google Cloud Storage lets you provide browser-based authenticated downloads to individual Google account holders. You can provide authenticated browser-based downloads by first applying Google account-based ACLs to an object and then sending users a URL that is scoped to the object. The URL for authenticated browser downloads is:
    https://storage.cloud.google.com/bucket/object
    For more information, see Cookie-based Authentication in Authentication.
  8. Google Account Support for Sharing
    Google Cloud Storage uses ACLs to control access to objects and buckets. By configuring ACLs you can share your objects and buckets with the entire world, a Google group, a Google-hosted domain, or specific Google account holders. For more information, see Access Control.
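
The REST and OAuth 2.0 features above come down to ordinary HTTP requests that carry a token. The sketch below builds such a request with Python's standard library, without sending it. Several things here are assumptions and not taken from the text: the API host https://storage.googleapis.com, the path-style bucket/object URL layout, the function names, and the placeholder token. The browser download URL follows the pattern quoted in feature 7.

```python
from urllib.parse import quote
from urllib.request import Request

BROWSER_HOST = "https://storage.cloud.google.com"  # URL pattern quoted in the text
API_HOST = "https://storage.googleapis.com"        # assumed API endpoint (not in the text)
ALLOWED_METHODS = {"PUT", "GET", "POST", "HEAD", "DELETE"}


def browser_download_url(bucket, obj):
    """Build the authenticated browser download URL for an object."""
    return f"{BROWSER_HOST}/{quote(bucket)}/{quote(obj)}"


def build_object_request(method, bucket, obj, token):
    """Build (but do not send) an HTTP request carrying an OAuth 2.0 bearer token."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported HTTP method: {method}")
    url = f"{API_HOST}/{quote(bucket)}/{quote(obj)}"
    return Request(url, method=method, headers={"Authorization": f"Bearer {token}"})


# "PLACEHOLDER_TOKEN" stands in for a real OAuth 2.0 access token.
request = build_object_request("GET", "my-bucket", "photos/cat 1.jpg", "PLACEHOLDER_TOKEN")
print(request.get_method(), request.full_url)
print(browser_download_url("my-bucket", "photos/cat 1.jpg"))
```

Passing the request to urllib.request.urlopen would actually send it; that step is left out here because it needs a real bucket and a valid token.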

How to Use Google Cloud Storage

You can interface with Google Cloud Storage using the provided tools or programmatically. The tools allow you to perform data access and management operations.
  1. Activating Google Cloud Storage
    Before you can add buckets to your own project, you must activate the service for your project. For more information, see: Activate Google Cloud Storage.
  2. Managing Google Cloud Storage
    The Google Developers Console is a graphical user interface that allows you to:
    • Create and manage a project.
    • Enable the Google Cloud Storage service for a project.
    • Enable the Google Cloud Storage JSON API for a project for programmatic access.
    • Use drag-and-drop features to manage your Google Cloud Storage buckets and objects in a project.
  3. Using the Service from the Command Line
    The gsutil tool lets you access Google Cloud Storage service from the command line. You can use it for a wide range of bucket and object management tasks, including:
    • Creating and deleting buckets.
    • Uploading, downloading, and deleting objects.
    • Listing buckets and objects.
    • Moving, copying, and renaming objects.
    • Setting object and bucket ACLs.
    You can use the gsutil tool to access Google Cloud Storage and for your general Python development. The tool comes with a Python library (boto), an authentication client, and other libraries you can use to build Python applications.
    The following examples show how to use the gsutil tool to access Google Cloud Storage:
    • Hello Google Cloud Storage. Using gsutil to perform data access operations.
    • Access Public Data. Using gsutil to access public data.
  4. Using the Service Programmatically
    Google Cloud Storage gives you a range of programming languages to choose from when creating applications. These languages are supported by client libraries that allow your application to communicate with Google Cloud Storage. The libraries take care of the HTTP protocol details when using the Google Cloud Storage APIs.
    Depending on the language and library you choose, your application is bound to either the XML API or the JSON API. Which API to choose depends on your requirements. The XML API was the first API supported. The JSON API was added for compatibility with other Google products.
    For more information about supported libraries and examples of using both XML and JSON API, see Overview of Client Libraries for Google Cloud Storage.
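
For the command-line route, the gsutil tasks listed in step 3 can be driven from a script. The sketch below is one possible wrapper, assuming gsutil is installed and already authenticated. The helper names gsutil_args and run_gsutil, the action labels, and the bucket name example-bucket are hypothetical; the subcommands themselves (mb, rb, cp, ls, mv, rm) are standard gsutil commands.

```python
import shutil
import subprocess

# Map readable action names (hypothetical labels) to gsutil subcommands.
GSUTIL_COMMANDS = {
    "make_bucket": "mb",
    "remove_bucket": "rb",
    "copy": "cp",
    "list": "ls",
    "move": "mv",
    "remove": "rm",
}


def gsutil_args(action, *operands):
    """Build the argument list for a gsutil call without running it."""
    if action not in GSUTIL_COMMANDS:
        raise ValueError(f"unknown action: {action}")
    return ["gsutil", GSUTIL_COMMANDS[action], *operands]


def run_gsutil(action, *operands):
    """Run gsutil and return its standard output; requires gsutil on PATH."""
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil is not installed or not on PATH")
    result = subprocess.run(
        gsutil_args(action, *operands),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Building the commands is safe anywhere; running them needs a configured gsutil.
print(gsutil_args("make_bucket", "gs://example-bucket"))
print(gsutil_args("copy", "report.csv", "gs://example-bucket/"))
print(gsutil_args("list", "gs://example-bucket"))
```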

Getting Started with Google Cloud Storage

Getting started with Google Cloud Storage requires very simple steps:
  1. Activating Google Cloud Storage
    For instructions about activating Google Cloud Storage see: Activate Google Cloud Storage.
  2. Creating a bucket
    The bucket is the container for your data (objects). You can select the region where your bucket resides to reduce costs, speed up access, or satisfy local requirements.
  3. Uploading data
    Google Cloud Storage stores and replicates your data (objects) allowing a high level of persistence. For more information on the service.
  4. Controlling access
    You can control access to your data from anywhere on the Internet.



