Before getting to know what Hadoop is, let us take a quick look at Big Data, because Hadoop is something that works on Big Data.

What is Big Data?
Usually, on a daily basis, we work with a few gigabytes of data. But at a much larger scale, such as government institutions, social media sites, and other platforms where millions of users reside and share data, it is no longer the usual data we encounter in our daily lives. It is a huge amount of data, containing hundreds of millions of pieces of information and measured in terabytes. This is Big Data. Big Data is difficult to store, collect, maintain, analyze, and visualize.

Introduction
Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment of interconnected systems, where several operations are performed on Big Data.

It was developed by Doug Cutting and Mike Cafarella in 2006 and was inspired by Google's MapReduce programming model. Hadoop is capable of working on a cluster, which is a network of thousands of interconnected machines known as NODES, and can handle thousands of terabytes of data, i.e., Big Data.

Hadoop provides rapid data transfer among the nodes through Java-based programs, with which developers perform several Big Data operations such as storage, maintenance, analysis, and visualization.

How does Hadoop work on Big Data?
Hadoop works on a Distributed File System (DFS): it consists of a cluster of thousands of interconnected machines, which are the nodes. In a cluster there are two types of nodes, namely the Name Node and the Data Nodes. At any given time there is only one active Name Node and thousands of Data Nodes. This is also known as a Master-Slave system, where the Name Node acts as the Master Node and all the Data Nodes work as Slave Nodes. Both types of nodes have specific functions to perform, described below.

Hadoop Distributed File System (HDFS) architecture – it has two types of nodes:
1. Name Node – it is known as the Master Node of the Master-Slave system.
• It does not actually store the data; instead it stores metadata about the machines, i.e., which data block was sent to which particular Data Node. It has information about all the Data Nodes in the cluster.
• It knows the set of blocks and their locations for any given file in HDFS; with this information it knows how to reconstruct a file from its blocks.
• It is usually configured with a lot of memory (RAM), because the block locations are held in main memory.

2. Data Node – it is known as the Slave Node of the Master-Slave system.
• It is responsible for storing the actual data in HDFS.
• Every Data Node in the cluster remains in contact with the Name Node at all times through periodic heartbeats.
• When a Data Node goes down, it does not affect the availability of data in the cluster: the Name Node arranges replication of the blocks managed by the Data Node that is no longer available.

There is also another component in a Hadoop cluster, known as Yet Another Resource Negotiator (YARN). The job of YARN is to manage the compute resources of the cluster and to schedule jobs across it, rescheduling the tasks of failed machines onto other machines, because in a cluster of thousands of interconnected machines there are plenty of chances for a machine to fail. (Replacing the lost data of a failed Data Node, as described above, is handled by the Name Node.)

NOTE: Hadoop is very resilient, and there is very little chance of the whole interconnected cluster failing, because whenever a Data Node fails, its data is re-replicated to other Data Nodes. So we can say that the failure of any Data Node cannot be the failure of the whole cluster. But if the Name Node, which is at the head of all the Data Nodes, fails, the whole system fails: the Name Node is the single point of failure of the system. To prevent this, another standby Name Node can be kept, which takes charge on the failure of the primary Name Node. But it should be noted that at any given time in a system, only one node works as the active (primary) Name Node.
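The Name Node bookkeeping and re-replication behavior described above can be sketched as a toy simulation. This is illustrative Python, not Hadoop's real (Java) API; the only detail borrowed from HDFS is the default replication factor of 3, and all class and node names here are made up:

```python
# Toy sketch (not the real Hadoop API): a NameNode-style metadata map
# that tracks which DataNodes hold each block, and re-replicates the
# blocks of a failed DataNode onto the surviving ones.

REPLICATION = 3  # HDFS replicates each block 3 times by default

class ToyNameNode:
    def __init__(self, datanodes):
        self.datanodes = set(datanodes)      # live DataNodes
        self.block_locations = {}            # block id -> set of holders

    def store_block(self, block):
        # Record placement on REPLICATION distinct DataNodes. Metadata
        # only: the real NameNode never stores block contents itself.
        self.block_locations[block] = set(sorted(self.datanodes)[:REPLICATION])

    def datanode_failed(self, node):
        # The node stopped heartbeating: drop it, then bring every
        # under-replicated block back up to REPLICATION copies using
        # the surviving replicas.
        self.datanodes.discard(node)
        for holders in self.block_locations.values():
            holders.discard(node)
            spares = self.datanodes - holders
            while len(holders) < REPLICATION and spares:
                holders.add(spares.pop())

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
nn.store_block("blk_001")                      # lands on dn1, dn2, dn3
nn.datanode_failed("dn1")
print(sorted(nn.block_locations["blk_001"]))   # ['dn2', 'dn3', 'dn4']
```

The point of the sketch is the division of labor from the bullets above: the master holds only the block-to-node map in memory, and a slave failure is repaired from metadata alone, without the master ever touching block contents.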
The standby Name Node takes over only in the case of failure of the primary Name Node. (A note on terminology: in Hadoop, the node actually called the "Secondary Name Node" is a checkpointing helper that periodically merges the Name Node's edit log into its namespace image; a hot standby that takes over on failure is configured separately, as the Standby Name Node of a high-availability setup.)
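The failover idea can be sketched in a few lines. Again this is an illustrative Python toy, not Hadoop code, and the node names are assumptions; the one property it demonstrates is the invariant stated above, that only one node is active at a time:

```python
# Toy failover sketch (illustrative, not Hadoop code): the standby
# holds a copy of the namespace metadata and is promoted only when
# the active Name Node stops responding.

class ToyHACluster:
    def __init__(self):
        self.active = "nn-primary"    # serves all client requests
        self.standby = "nn-standby"   # kept in sync, serves nothing

    def heartbeat_lost(self):
        # Active node stopped responding: promote the standby.
        # At any moment, exactly one node is active.
        self.active, self.standby = self.standby, None

cluster = ToyHACluster()
cluster.heartbeat_lost()
print(cluster.active)  # nn-standby
```

Because the standby already mirrors the namespace metadata, promotion is just a role change, which is what removes the Name Node as a single point of failure.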