Energy Efficient Clustering Scheme for Wireless Sensor Networks: A Survey

Wireless sensor networks are application-specific networks composed of a large number of sensor nodes. Because sensor nodes have limited energy, efficient energy consumption is the main design issue. Energy efficiency is pursued at every level, from hardware to network protocols. Clustering of nodes is an effective approach to reducing the energy consumption of nodes. Clustering algorithms group nodes into independent clusters, each with at least one cluster head. Nodes send data to their respective cluster heads, and cluster heads send data to the base station. Clustering algorithms prolong network lifetime by avoiding long-distance communication between nodes and the base station. Various clustering approaches have been proposed in the literature. This paper discusses the operation of a few of them and distinguishes them according to operational mode and state of clustering, which helps in understanding the classification of clustering schemes.


Introduction
A wireless sensor network [1] is a collection of a large number of sensor nodes deployed to monitor an area. Sensor nodes sense the area and send the sensed data to a base station via single-hop or multi-hop communication. Because sensor nodes are small and inexpensive, wireless sensor networks have a large and rapidly growing range of applications, such as military surveillance [3], environment monitoring [4], agriculture [5], health monitoring [6], automotive [7], and industry [8].
Wireless sensor networks are deployed in harsh environments. Once deployed in the field, sensor nodes work unattended. Sensor nodes have limited energy in their onboard batteries, and because of the harsh working area it is nearly impossible to recharge or replace them. All tasks performed by sensor nodes, i.e. sensing, data processing, and data communication, consume energy; among these, data communication consumes the most. The lifetime of a wireless sensor network depends on its sensor nodes, so their energy should be consumed economically and efficiently. Hence the energy efficiency of nodes is a key design issue for wireless sensor networks [9].
Clustering of nodes [10,11] is an energy-efficient approach for wireless sensor networks. In clustering, nodes are grouped to form clusters. Each cluster has at least one cluster head (CH). Instead of sending data directly to the base station (BS), nodes send data to their corresponding CH via single-hop or multi-hop communication. The CH receives the data of all nodes in its cluster and aggregates it, then sends the aggregated data to the BS, again via single or multiple hops. After a certain time period (the round time), re-clustering of nodes is performed. Clustering avoids long-distance communication between nodes and the BS: only a few nodes, the CHs, send data over long distances, which preserves the energy of sensor nodes, while the reduction of data through aggregation conserves the energy of CHs. Clustering schemes use a TDMA schedule for intra-cluster communication. Each node is assigned a slot for sending data. Nodes conserve energy by switching to the sleep state during the slots of other nodes, which avoids idle listening and overhearing. Because nodes send data only in their assigned slots, collisions are also avoided. Avoiding collisions, idle listening, and overhearing further conserves the energy of nodes.
There are various survey papers on clustering algorithms in the literature. The motivation of this paper is to describe the issues of the clustering approach and to provide adequate and up-to-date knowledge of clustering schemes for wireless sensor networks, with their advantages and disadvantages with respect to cluster formation, cluster head selection, etc. Section 2 describes issues related to clustering schemes and Section 3 describes various clustering approaches. Section 4 concludes the paper.

Clustering Issues
The work in [12] suggests that a clustering approach is necessary to provide low-energy data processing and intra- and inter-cluster communication. In a clustering approach, a wireless sensor network can be considered to have the following parts:
• Cluster: A cluster is a group of sensor nodes.
• Member Nodes: The nodes in a cluster are members of that cluster.
• Cluster Head: In most approaches, there is one cluster head for each cluster. The CH manages the operation of member nodes.
• Base Station: The base station is the relay between the network and the end user.

Load Balancing
A CH does much more work than the member nodes of its cluster, so its energy is consumed faster. To balance the load across the network, the role of CH should be rotated among the nodes. The size of clusters should be optimized to ensure equal energy consumption across clusters. The round time for re-clustering is the same for all clusters, so clusters with fewer nodes do more work than larger clusters [12]. An upper limit, a lower limit, or both should be imposed on the number of nodes per cluster to optimize the clusters and balance the network load.

Fault Tolerance
A selected CH might not have enough memory to complete its work, or might run out of energy in the middle of a round, in which case the data of that cluster is lost. A clustering protocol should be prepared for these kinds of faults. Immediate re-clustering is one way to provide fault tolerance, but it places an energy burden on the other, still-working clusters. A back-up CH or a set of CHs is an easy and effective alternative. Rotating the role of CH among nodes also brings the cluster back into operation after the completion of that round.

Cost of Clustering
During cluster formation, nodes exchange control and status messages. The overhead of these messages consumes the scarce energy of nodes, so frequent re-clustering consumes energy unnecessarily. Clustering schemes should therefore lower the frequency of the re-clustering process.

Number of Clusters
A high number of CHs creates more clusters and reduces the energy efficiency of clustering. Too few clusters overburden the CHs. Clustering schemes should therefore optimize the number of clusters.

Classification of Clustering Schemes
The classification of clustering schemes for wireless sensor networks is shown in Fig. 1. According to operational mode, clustering schemes can be classified as distributed or centralized. In the distributed approach, nodes exchange information locally to select CHs and form clusters. In the centralized approach, a central node, such as the base station, controls the selection of CHs and the formation of clusters.
According to the state of clusters, clustering schemes can be classified as static or dynamic. In static clustering, clusters are formed once and CHs are selected from among the nodes of each cluster. In dynamic clustering, clusters are re-formed after the completion of each round.
According to the characteristics of sensor nodes, clustering algorithms can be categorized as homogeneous or heterogeneous. Heterogeneous clustering algorithms classify nodes as normal nodes and super nodes. Super nodes have much higher energy than normal nodes and have a higher chance of being selected as CH. Homogeneous clustering schemes do not distinguish between nodes even if super nodes are present; all nodes have an equal chance of CH selection.

Clustering Algorithm for Wireless Sensor Networks
The energy of nodes is the most valuable resource for wireless sensor network protocols. Various clustering schemes that conserve the energy of nodes have been proposed in the literature. This section explains a few of these schemes and discusses their advantages and disadvantages. At the end of the section, the schemes are summarized according to the classification described in Section 2.2.
LEACH (Low Energy Adaptive Clustering Hierarchy) [13] is a clustering-based protocol that minimizes energy dissipation in sensor networks. LEACH selects sensor nodes as CHs at random, so that the high energy dissipation of communicating with the BS is spread over all sensor nodes in the network. A round of LEACH is divided into two phases, the set-up phase and the steady phase. During the set-up phase, each sensor node chooses a random number between 0 and 1. If this number is less than the threshold T(n), the node becomes a cluster head. T(n) is calculated as

$$T(n) = \begin{cases} \dfrac{P}{1 - P\,\left(r \bmod \frac{1}{P}\right)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases}$$

where P is the desired percentage of cluster heads; r, the current round; and G, the set of nodes that have not been selected as cluster head in the last 1/P rounds. CHs advertise their status to the network. Sensor nodes receive the advertisements and decide which CH to join according to the received signal strength of the status message. Once it knows its member nodes, each CH decides on a TDMA schedule and broadcasts it to the cluster. During the steady phase, sensor nodes sense and transmit data to the CHs. The CHs aggregate the data before sending it to the base station. After a certain period of time in the steady phase, the network enters the set-up phase again and begins another round of cluster-head selection.
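The set-up phase election can be illustrated with a minimal Python sketch. It assumes the threshold form commonly given for LEACH, T(n) = P / (1 - P (r mod 1/P)) for nodes in G and 0 otherwise; the function names, the `eligible` set standing in for G, and the injected random generator are illustrative assumptions, not part of the protocol description.

```python
import random

def leach_threshold(p: float, r: int, in_g: bool) -> float:
    """Threshold T(n) for round r; nodes outside G get 0 and cannot win."""
    if not in_g:
        return 0.0
    period = round(1 / p)  # number of rounds in one rotation epoch (1/P)
    return p / (1 - p * (r % period))

def elect_cluster_heads(node_ids, eligible, p, r, rng):
    """Each node draws a number in [0, 1) and becomes CH if it is below T(n)."""
    return [n for n in node_ids
            if rng.random() < leach_threshold(p, r, n in eligible)]

# Example: 20 nodes, P = 0.1, round 4, only nodes 1, 5 and 7 still in G.
heads = elect_cluster_heads(range(20), {1, 5, 7}, 0.1, 4, random.Random(0))
```

The threshold grows as the epoch progresses, so nodes that have not yet served are increasingly likely to be elected, which is what rotates the CH role.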
LEACH is a fully distributed scheme. The role of CH is rotated among all sensor nodes to balance the network load. However, the protocol guarantees neither an equal number of CHs in each round nor an equal number of member nodes in each cluster, and clusters of uneven size unbalance the network load. The authors of [14] propose an improvement over LEACH in which CHs are selected not at random but by considering remaining energy once the energy level drops below 50% of the initial energy. The choice of which cluster head to join is determined not only by received signal strength but also by the remaining energy of the cluster head. A node sends data only if the data satisfies a predefined condition. However, the scheme does not address non-uniformity among clusters.
LEACH-C [15] is a centralized algorithm for forming clusters and assigning the cluster-head duty. During the set-up phase, nodes send information about their location and energy level to the BS. The BS forms clusters using a simulated annealing algorithm [16]. In addition to forming good clusters, the BS has to balance the load. To do so, it calculates the average node energy, and nodes with energy below the average do not participate in CH selection. The algorithm chooses CHs such that nodes minimize their transmission distance and conserve energy. After the clusters and cluster heads have been formed, the BS broadcasts a message that contains the CH ID for each node. The steady phase is the same as in LEACH.
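The load-balancing filter applied by the BS can be sketched as follows. This covers only the average-energy check, not the simulated annealing step; the function name and the dictionary of reported energies are illustrative assumptions.

```python
def eligible_ch_candidates(energy: dict) -> list:
    """Return the nodes whose reported energy is not below the network average."""
    average = sum(energy.values()) / len(energy)
    # Nodes below the average are excluded from CH selection for this round.
    return sorted(node for node, e in energy.items() if e >= average)

# Example: energies reported to the BS during the set-up phase.
candidates = eligible_ch_candidates({"a": 1.0, "b": 3.0, "c": 2.0})
```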
In LEACH-F [15], clusters are formed using the centralized cluster formation algorithm developed for LEACH-C. The BS uses simulated annealing to determine optimal clusters and broadcasts the cluster information to the nodes. This broadcast message includes the cluster ID for each node, from which the nodes can determine the TDMA schedule and the order in which to rotate the CH position. The first node listed in the cluster becomes CH for the first round, the second node listed becomes CH for the second round, and so forth. With LEACH-F, no set-up is required for the different rounds.
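The fixed rotation order can be written in a few lines. The sketch below assumes zero-based round indices and that the rotation wraps around once every listed node has served; the wrap-around is an assumption, since the text only describes the first rounds.

```python
def cluster_head_for_round(cluster_nodes: list, round_index: int) -> str:
    """Pick the CH by list position; wrapping around is an assumed convention."""
    return cluster_nodes[round_index % len(cluster_nodes)]

# Example: node order as broadcast by the BS for one cluster.
order = ["n3", "n7", "n1"]
first, second, fourth = (cluster_head_for_round(order, i) for i in (0, 1, 3))
```

Because every node can compute this locally from the broadcast list, no messages are exchanged to change the CH.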
Adaptive Decentralized Re-clustering Protocol (ADRP) [17] selects a CH and a set of next heads for the next few rounds based on the residual energy of each node and the average energy of the cluster. A round in ADRP has two phases, as shown in Fig. 2: the initial phase and the cycle phase.
In the initial phase, nodes send their energy status and location to the BS. The BS partitions the network into clusters and selects a CH for each cluster. The BS also selects a set of nodes as next heads to avoid re-clustering for a few rounds. A node in a cluster can therefore be in one of three states: cluster head, next head, or member. In the cycle phase, each CH constitutes a TDMA schedule and distributes it to its nodes. Nodes send data to the CH in their allotted time slots. The CH receives and aggregates the data and sends it to the BS. In the re-cluster stage, nodes from the set of next heads take over as cluster heads without any assistance from the BS, and the previous CHs become member nodes. If the set of next heads is empty, the initial phase is executed again.
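The re-cluster stage amounts to a small state transition per cluster. The following sketch is one possible reading: the class name, the first-in-first-out promotion order, and the boolean return value that signals "run the initial phase again" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AdrpCluster:
    head: str
    next_heads: list = field(default_factory=list)  # chosen by the BS
    members: set = field(default_factory=set)

    def re_cluster(self) -> bool:
        """Promote the next head locally; False means the BS must rerun the initial phase."""
        if not self.next_heads:
            return False
        self.members.add(self.head)         # previous CH becomes a member node
        self.head = self.next_heads.pop(0)  # assumed FIFO promotion order
        return True

cluster = AdrpCluster("a", ["b", "c"], {"d"})
promoted = cluster.re_cluster()  # "b" takes over without contacting the BS
```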
The set of next heads avoids re-clustering for a few rounds and conserves the energy of nodes. However, no new nodes can be added until the next initial phase. If a node in the next-head list dies, the sets of the clusters become uneven, and it is unclear when the initial phase will then be executed.
Energy-Balanced Unequal Clustering (EBUC) [18] is a centralized protocol that organizes the network into unequal clusters, in which CHs relay the data of other CHs via multi-hop routing. Particle swarm optimization (PSO) is applied at the BS to select high-energy nodes for the CH role and to form clusters with unequal numbers of nodes, as shown in Fig. 3. Clusters closer to the BS are made smaller so that they consume less intra-cluster energy and can afford the energy consumption of inter-cluster communication.
At the start of the first set-up phase, nodes send their energy and location information to the BS. Clustering is performed by the BS. The BS computes the average energy level of the nodes and uses this information for CH selection and for assigning cluster member nodes. At the end of each round, the BS estimates the energy consumption of each node and uses the estimate for the next round, so nodes do not need to send the information message to the BS again. Inter-cluster multi-hop routing depends on a cost function that uses the distance between CHs, the distance from the relay CH to the BS, and the residual energy of the relay CH. At the end of the set-up phase, the BS broadcasts information about the clusters and the multi-hop routes. At the beginning of the steady phase, each CH broadcasts a TDMA schedule to its cluster. Nodes send data to the CH in their allotted TDMA time slots. The CH aggregates the collected data and sends it to the BS via multi-hop inter-cluster routing.
Because the BS estimates the residual energy of nodes, the overhead of sending status messages again is avoided. However, the protocol seems to work only when the BS is located outside the area of interest. Because clusters differ in size while the round time is fixed, the nodes of smaller clusters send more data to their CH and consume more energy than the nodes of bigger clusters.
Energy-Aware Routing Protocol (EAP) [19] proposes a new routing scheme for inter-cluster communication and new parameters for cluster head selection to handle heterogeneous node energy. Each node maintains a table of the residual energy of the neighbouring nodes within its cluster range and uses it to calculate the average residual energy of these nodes. A node whose residual energy is higher than this average has a high probability of being selected as cluster head.
Each round of EAP starts with a set-up phase that organizes nodes into clusters and constructs a routing tree for the cluster heads. In the steady phase, the sensed data of nodes flows to the BS via the CHs. At the beginning of the set-up phase, nodes update their tables of neighbouring nodes' residual energy by exchanging information. Each node calculates its broadcasting delay for competing for the cluster-head role, which depends on the average residual energy of the nodes in its cluster range and on its own residual energy. This allows EAP to handle heterogeneous energy. An intra-cluster coverage scheme selects, among all nodes in a cluster, the active nodes that cover the expected area of the cluster. The scheme reduces the number of redundant nodes, so the TDMA schedule has fewer nodes to accommodate. For the CHs, a routing tree is constructed according to a weight calculated for each CH.
The scheme has the advantage of being applicable to both homogeneous and heterogeneous nodes, since it handles the heterogeneous energy issue. However, the protocol consumes node memory by maintaining a table of neighbouring nodes, and updating the table consumes node energy.
Distributed Energy-Efficient Clustering (DEEC) [20] is an energy-efficient clustering scheme for heterogeneous wireless sensor networks. In DEEC, CH selection is probabilistic, based on the ratio of the residual energy of each node to the average energy of the network, so nodes with high residual energy are more likely to be selected as CH than low-energy nodes. DEEC considers a two-level heterogeneous network, in which heterogeneity is based on node energy. There are two types of sensor nodes: advanced nodes and normal nodes. Advanced nodes have higher initial energy than normal nodes.
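An energy-ratio-based selection probability can be sketched as below. This is a simplified illustration, assuming a reference probability `p_opt` that is scaled by the ratio of residual to average energy and capped at 1; the parameter name and the cap are assumptions, and the weighting used for advanced versus normal nodes is not modelled.

```python
def deec_probability(residual: float, network_avg: float, p_opt: float) -> float:
    """Scale a reference probability by residual / average energy (capped at 1)."""
    return min(1.0, p_opt * residual / network_avg)

# A node with twice the average energy is twice as likely to become CH.
high = deec_probability(2.0, 1.0, 0.1)
low = deec_probability(0.5, 1.0, 0.1)
```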
In DEEC, the initial and residual energy levels of the nodes are considered for CH selection, which would require global knowledge of the network. To avoid requiring each node to have this global knowledge, DEEC estimates the ideal value of the network lifetime, which is used to compute the reference energy that each node expends during a round.
The Energy Efficient Heterogeneous Clustered (EEHC) scheme [21] extends node heterogeneity to three levels. Nodes are of three types: super nodes, advanced nodes, and normal nodes. Super nodes have the highest energy of all and hence the highest chance of being selected as CH.
EEPSC (Energy-Efficient Protocol with Static Clustering) [22] is a base-station-assisted static clustering scheme. Each round of EEPSC consists of a set-up phase, a responsible node selection phase, and a steady phase. The set-up phase is executed once at the beginning of network operation to partition the network. For k desired CHs, the base station broadcasts k-1 different messages with different transmission powers. All sensor nodes that receive message i (1 ≤ i ≤ k-1) set their cluster ID to i. The remaining sensor nodes, which have not joined any cluster, set their cluster ID to k, as shown in Fig. 4, and inform the base station.

Figure 4. Network partition into clusters
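The EEPSC partition into k clusters can be approximated with a short sketch. It makes two assumptions that the description does not state: each of the k-1 transmission powers is modelled as a circular radio range around the BS, and a node keeps the ID of the lowest-power message that reaches it. The function and parameter names are hypothetical.

```python
import math

def eepsc_cluster_id(node_xy, bs_xy, ranges):
    """ranges: ascending radio ranges assumed for the k-1 broadcast messages."""
    distance = math.dist(node_xy, bs_xy)
    for i, reach in enumerate(ranges, start=1):
        if distance <= reach:
            return i            # first (lowest-power) message that reaches the node
    return len(ranges) + 1      # cluster ID k for nodes that no message reached

# k = 3 clusters: two messages with ranges 10 and 20 around a BS at the origin.
ids = [eepsc_cluster_id(p, (0, 0), [10, 20]) for p in [(5, 0), (15, 0), (30, 0)]]
```

Under these assumptions the clusters form concentric rings around the BS, with the outermost region as cluster k.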
After cluster formation, the BS selects one temporary CH for each cluster and broadcasts the selection to the whole network. In addition, the BS sets up a TDMA schedule for each cluster. The set-up phase is complete after the TDMA broadcast. Cluster formation and TDMA scheduling are done only once. In the responsible node selection phase, the temporary CH and the CH are selected for the round. At the beginning of each round, the temporary CH receives the energy information of each node. In each cluster, the node with the highest energy is selected as CH and the node with the lowest energy as temporary CH. In the steady phase, nodes send field information to the CH, which aggregates the data and sends it to the BS.
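The responsible node selection step reduces to picking the extremes of the reported energies. A minimal sketch, with a hypothetical function name and an energy dictionary standing in for the reports collected by the temporary CH:

```python
def select_responsible_nodes(energy: dict) -> tuple:
    """Return (CH, temporary CH): highest- and lowest-energy node of one cluster."""
    cluster_head = max(energy, key=energy.get)
    temporary_head = min(energy, key=energy.get)
    return cluster_head, temporary_head

ch, temp_ch = select_responsible_nodes({"a": 0.5, "b": 0.9, "c": 0.2})
```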
The scheme conserves energy by avoiding the overhead of re-clustering. The clusters formed are stable and the selected CH has the highest energy. However, the scheme cannot add new nodes during operation because clusters are formed only once. If a node runs out of energy, its allotted time slot is not reused by any other node, which increases delay and consumes extra energy at the CH.
CP [23] uses the idea of the covering problem, which aims to cover an area with the minimum number of circular disks. CP selects the smallest set of CHs such that every node belongs to a cluster. The protocol is concerned only with cluster formation and cluster head selection. Nodes in the network can be in one of three states: unclustered, clustered, or cluster head. A node can become a cluster head only when its distance to the nearest cluster head is greater than a threshold Th.
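The distance rule for becoming a cluster head can be expressed directly. In this sketch the Euclidean distance, the function name, and the treatment of an empty head list (the node is allowed to become CH) are assumptions.

```python
import math

def may_become_cluster_head(node_xy, head_positions, th: float) -> bool:
    """True if the nearest existing CH is farther away than the threshold Th."""
    if not head_positions:
        return True  # assumed: with no CH yet, nothing blocks the node
    nearest = min(math.dist(node_xy, h) for h in head_positions)
    return nearest > th

allowed = may_become_cluster_head((0, 0), [(3, 4)], 4.0)  # nearest CH is 5 away
```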
Cluster formation is started by an initiator, and the BS is always designated as the initiator. Initially all nodes are unclustered. The initiator forms its cluster with a selected orientation by broadcasting a cluster head advertisement (CHA) limited to 2 hops. On receiving a CHA directly from a CH, an unclustered node accepts that node as its CH, changes its status to clustered, and forwards the CHA to other nodes. A node that receives a CHA indirectly calculates its distance d to the centre of the cluster and sets a timer to t = f(d). If it receives a CHA directly from a CH before the timer expires, the node changes its status to clustered and sets that node as its CH. If it does not, the node considers itself a CH. The value of f(d) depends on the density of the network.
The protocol can be combined with other clustering schemes to provide uniform clusters. However, the scheme does not consider the energy of nodes in cluster head selection, and the positions of nodes must be pre-engineered. The scheme is applicable if the BS is located inside the network.
Energy-efficient Data Aggregation Protocol Based on Static Clustering (EDASC) [24] applies the Hausdorff distance to clustering. The BS starts the clustering process by appointing an initiator. Each node learns all its neighbors by broadcasting a topology discovery message. The initiator broadcasts a clustering message and constructs its cluster by adding a node when two conditions are satisfied:
1) The Hausdorff distance between the node and the cluster is smaller than r (the intra-cluster distance).
2) If the node is admitted, the Hausdorff distance between two neighboring clusters is no longer than 3r.
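The two admission conditions can be checked with a small Hausdorff distance routine. The sketch below treats clusters as lists of (x, y) positions and uses the standard symmetric Hausdorff distance over Euclidean distances; the function names and the way neighboring clusters are passed in are assumptions.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def can_admit(node, cluster, neighbour_clusters, r):
    # Condition 1: the node must lie within distance r of the cluster.
    if hausdorff([node], cluster) >= r:
        return False
    # Condition 2: after admission, neighboring clusters stay within 3r.
    grown = cluster + [node]
    return all(hausdorff(grown, other) <= 3 * r for other in neighbour_clusters)

ok = can_admit((1, 0), [(0, 0)], [[(2, 0)]], r=1.5)
```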
After cluster formation, the operation of EDASC is divided into rounds. Each round starts with the selection of a CH, followed by the construction of a data aggregation tree in each cluster. Data is transmitted from node to CH to BS. A minimum spanning tree is constructed among the CHs, with the CH closest to the BS as root.
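A rooted spanning tree over the CHs can be built with Prim's algorithm, as in the sketch below. It assumes that edge weights are Euclidean distances between CH positions, which the description does not specify, and it uses a simple quadratic search suited to a small number of CHs. The function name and the parent-map return format are illustrative.

```python
import math

def ch_spanning_tree(heads: dict, bs_xy):
    """heads maps CH name -> (x, y). Returns (root, parent map) of a Prim-style MST."""
    root = min(heads, key=lambda h: math.dist(heads[h], bs_xy))
    parent = {root: None}
    while len(parent) < len(heads):
        # Cheapest edge from the tree built so far to a CH not yet in it.
        u, v = min(
            ((u, v) for u in parent for v in heads if v not in parent),
            key=lambda e: math.dist(heads[e[0]], heads[e[1]]),
        )
        parent[v] = u
    return root, parent

root, parent = ch_spanning_tree({"a": (1, 0), "b": (3, 0), "c": (6, 0)}, (0, 0))
```

Each CH forwards aggregated data to its parent, so traffic converges on the root, which then delivers it to the BS.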
The protocol makes clusters more stable and conserves the energy of nodes by avoiding re-clustering overhead. However, keeping neighbor knowledge up to date consumes node energy. Clusters close to the BS consume more energy because their CHs are always the root of the CH tree.
The approach proposed in [25] is controlled by the base station, and a head-set is maintained for each cluster to distribute the load of the cluster head. The base station knows the active nodes in the field and the energy level of each node, and uses this information to determine a suitable number of clusters. The BS broadcasts the information about the CHs. The CHs then construct their clusters and determine the head-sets. At any one time, only one member of a head-set is active and receives data from nodes. The task of transmitting aggregated data to the base station is distributed uniformly among the members of the head-sets. Along with the data, the energy information of nodes is also sent to the BS for CH selection in the next round.
The main concern of QoS-based Adaptive Clustering (QAC) [26] is to increase the reliability of the network along with its lifetime. The QAC algorithm improves reliability by applying the concept of a master CH and a slave CH in a cluster, as shown in Fig. 5. QAC assumes that initial interim cluster heads are deployed together with the network. The BS wakes these interim CHs, and they become the first master CHs. During the set-up phase, each master CH broadcasts an advertisement message about its status. A master CH selects a slave CH among the nodes in its interim cluster if the number of nodes in the interim cluster exceeds a given threshold, and broadcasts a message for slave cluster formation. Nodes in the interim cluster then choose between the master CH and the slave CH as their CH. TDMA schedules are broadcast for both the master and the slave cluster. If either the master CH or the slave CH stops working, its duty is transferred to the other, which increases the reliability of the network. After time T_w, the master CH automatically becomes an interim CH and selects a new master CH for the next round.

Conclusions
Applications of wireless sensor networks range from military surveillance to home applications, industry, and environment monitoring. The limited energy of sensor nodes distinguishes these networks from other networks. The operation of a wireless sensor network depends on the energy of its sensor nodes, which makes the energy efficiency of nodes a critical issue for protocol design. The aim of this paper was to provide an overview of the clustering schemes proposed in the literature. The survey suggests that clustering of nodes is an energy-efficient approach because it avoids long-distance communication as well as intra-cluster collisions, idle listening, and overhearing. The paper explains the operation, merits, and demerits of clustering schemes; it does not single out any one of them as the best.