Various multistage ensembles for prediction of heating energy consumption

Feedforward neural network models are created for prediction of daily heating energy consumption of the NTNU university campus Gløshaugen, using actual measured data for training and testing. Improvement of prediction accuracy is proposed by using a neural network ensemble. Previously trained feedforward neural networks are first separated into clusters using the k-means algorithm, and then the best network of each cluster is chosen as a member of the ensemble. Two conventional averaging methods for obtaining the ensemble output are applied: simple and weighted. In order to achieve better prediction results, a multistage ensemble is investigated. As the second level, adaptive neuro-fuzzy inference systems with various clustering and membership functions are used to aggregate the selected ensemble members. A feedforward neural network in the second stage is also analyzed. It is shown that an ensemble of neural networks can predict heating energy consumption with better accuracy than the best trained single neural network, while the best results are achieved with the multistage ensemble.


Introduction
The study of building energy demand has become a topic of great importance because of the significant increase of interest in energy sustainability, especially after the issuance of the EPB European Directive. In Europe, buildings account for 40% of total energy use and 36% of total CO2 emissions Council (2010). According to Bergesen et al. (2013), space heating accounts for 66% of the total energy consumption of Norwegian residential buildings. Therefore, the estimation or prediction of building energy consumption plays a very important role in building energy management, since it can help to indicate above-normal energy use and/or diagnose the possible causes, provided that enough historical data have been gathered. Scientists and engineers have lately been moving from calculating energy consumption toward analyzing the real energy use of buildings. One of the reasons is that, due to the complexity of building energy systems and behavior, non-calibrated models cannot predict building energy consumption well, so there is a need for a real-time picture of energy use (using measured and analyzed data).
The classic approach to estimating building energy use is based on the application of a model with known system structure and properties as well as forcing variables (forward approach). Using different software tools, such as EnergyPlus, TRNSYS, BLAST, ESP-r, HAP and APACHE, requires detailed knowledge of numerous building parameters (constructions, systems) and behavior, which are usually not available. Perera et al. (2014) developed a continuous-time mathematical heating model for a building unit based on first principles. The developed model was implemented in a MATLAB environment, and a mainly theoretical approach was used to validate it for a residential building unit. The model was also validated using experimental data.
In recent years, considerable attention has been given to a different approach for building energy analysis, which is based on the so-called "inverse" or data-driven models Kusiak et al. (2010). In a data-driven approach, the input and output variables must be known and measured, and the development of the "inverse" model consists in determining a mathematical description of the relationship between the independent variables and the dependent one. The data-driven approach is useful when the building (or a system) is already built, and actual consumption (or performance) data are measured and available. For this approach, different statistical methods can be used. Artificial neural networks (ANN) are the most used artificial intelligence models for different types of prediction. The main advantages of an ANN model are its self-learning capability and the fact that it can approximate a nonlinear relationship between the input variables and the output of a complicated system. Feedforward neural networks are the most widely used in energy consumption prediction. Ekici and Aksoy (2009) proposed a backpropagation three-layered ANN for the prediction of the heating energy requirements of different building samples. Dombaycı (2010) used hourly heating energy consumption for a model house, calculated by the degree-hour method, for training and testing the ANN model. In Ekonomou (2010), actual recorded input and output data that influence Greek long-term energy consumption were used in the training, validation and testing process. Li et al. (2011) proposed a hybrid genetic algorithm-adaptive network-based fuzzy inference system (ANFIS), which combined fuzzy if-then rules into a neural network-like structure, for the prediction of energy consumption in a library building. The calculated results indicated better performance than ANN in terms of forecasting accuracy. An excellent review of the different neural network models used for building energy use prediction was done by Kumar et al. (2013).
The ensemble of neural networks is a very successful technique in which the outputs of a set of separately trained neural networks are combined to form one unified prediction, Zhou et al. (2002). Since an ensemble is often more accurate than its members, this paradigm has become a hot topic in recent years and has already been successfully applied to time series prediction Melin et al. (2012), weather forecasting Taylor and Buizza (2002), and load prediction in a power system Siwek et al. (2009).
The main idea of this paper is to propose a multistage neural network ensemble for prediction of heating energy use. The ensemble members are chosen among 50 separately trained feedforward neural networks using k-means clustering, and different ANFIS models are used in the second stage to combine their outputs.

Feedforward neural network (FFNN)
The artificial neural network (ANN) method is a computational intelligence technique based on the information processing system of the human brain, which may be used as an alternative method in engineering analysis and prediction. ANNs work as a black-box model, so it is not necessary to have detailed information about the system. Instead, they learn the relationship between input and output variables from historical data, similar to the way a nonlinear regression might perform Karatasou et al. (2006). The FFNN architecture consists of an input layer, an output layer, and one or more hidden layers of interconnected processors called neurons. Each layer has a number of neurons, and each neuron is fully interconnected with adaptable weighted connections to the neurons in the subsequent layer. Each neuron therefore receives input signals from other neurons or external stimuli, processes them locally through an activation function, and produces a transformed output signal to other neurons or external outputs. The nonlinear activation functions in the hidden layer neurons enable the neural network to be a universal approximator. Training the network means adjusting the weights so that the network produces the desired response to the given inputs. Different training algorithms can be applied to minimize the error function, but the most widely used are the backpropagation algorithm and the algorithms derived from it. They use a gradient descent technique to minimize the cost function, which is the mean square difference between the desired and the actual network outputs. In this study, a multilayer feedforward network with a single hidden layer and a backpropagation learning algorithm (BPNN) is used.
In BPNN, the learning algorithm has two phases. First, a training input data set is presented to the network input layer. The network then propagates the input data set from layer to layer until the output data set is generated by the output layer. If this data set is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
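The two phases described above can be illustrated with a minimal sketch. It assumes a tanh hidden layer, a linear output neuron, and plain batch gradient descent on the mean squared error; the function names `train_bpnn` and `predict`, the learning rate, and the number of epochs are illustrative choices, not values from the study.

```python
import numpy as np

def train_bpnn(X, y, hidden=10, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network (tanh hidden, linear output) by
    batch gradient descent on the mean squared error (illustrative only)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Phase 1: propagate the inputs forward, layer by layer.
        H = np.tanh(X @ W1 + b1)
        out = H @ W2 + b2
        err = out - t
        losses.append(float(np.mean(err ** 2)))
        # Phase 2: propagate the error backwards and adjust the weights.
        g_out = 2.0 * err / len(X)                # dMSE/d(out)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)   # through the tanh layer
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Forward pass of the trained network."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```

The list `losses` records the mean squared error per epoch, which makes it easy to check that the weight updates actually reduce the cost function.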

Artificial neural network ensembles
Many engineering problems, especially in energy use prediction, have proven to be too complex for a single neural network. Researchers have shown that simply combining the outputs of many neural networks can generate more accurate predictions and significantly better generalization ability than any of the individual networks Hansen and Salamon (1990). Theoretical and empirical work has shown that a good ensemble is one where the individual networks have both accuracy and diversity, namely the individual networks make their errors on different parts of the input space Qiang et al. (2005). An important problem is, then, how to select the aggregate members in order to have an optimal compromise between these two conflicting conditions Granitto et al. (2005) (2000), genetic algorithm (GASEN) proposed by Zhou et al. (2001) and PSO based approach proposed by Fu et al. (2004). Clustering can be used to divide all networks into groups (clusters) according to the similarity of the networks. Then, the most accurate individual in each group on the validation set is selected, and finally, all selected individuals form the ensemble.

K-means for selecting ensemble members
In Qiang et al. (2005), the k-means algorithm is used to divide the trained networks into clusters. Ease of implementation, simplicity, efficiency, and empirical success are the main reasons for the popularity of k-means. The technique is based on a distance matrix, using Euclidean distance as the criterion. It starts with m initial cluster centers; for every data point, the Euclidean distance from each cluster center is calculated, after which the data points are assigned to the closest cluster center. This procedure is repeated until the squared error between the empirical mean of a cluster and the points in the cluster is minimized. When using k-means for selecting neural network ensemble members, the goal is to divide the prediction data achieved by the individual networks, $y = \{y_1, \ldots, y_r\}$, into m clusters, where the number of elements in cluster i is $n_i$ and the center of the cluster is $c_i$. Clustering is thus achieved by finding the centers $c_i$ that minimize the sum of squared Euclidean distances between the points of each cluster and its center. After clustering, the diversity between networks in different cluster groups is greater than that between networks within the same group. The diversity is maintained by choosing the most accurate network in each group as a member of the ensemble. In the k-means algorithm, the cluster number m must be determined in advance. To select the best value of m, the prediction indices of the created ensembles can be compared. Linear combination of the outputs of ensemble members (simple or weighted) is one of the most popular approaches for combining the selected network outputs. A different approach uses a neural network to combine the selected ensemble members. Ilić et al. (2012) proposed a system comprised of two ANNs assembled in a hierarchical order. In this paper, an adaptive neuro-fuzzy inference system (ANFIS) is proposed for the second stage.
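As a sketch of the selection procedure described above, the following code clusters the networks by their prediction vectors on a validation set and keeps the most accurate network of each cluster. The names `kmeans` and `select_members`, the random initialization, and the use of mean squared error as the accuracy measure are assumptions made for illustration; the clustering minimizes the usual objective, the sum over clusters of the squared distances between each member and its cluster center.

```python
import numpy as np

def kmeans(points, m, iters=100, seed=0):
    """Plain k-means: minimizes the sum over clusters i of ||y - c_i||^2
    for the points y assigned to cluster i."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=m, replace=False)].copy()
    for _ in range(iters):
        # Assign every point to its closest center (Euclidean distance).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its cluster.
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(m)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def select_members(preds, y_val, m, seed=0):
    """preds has one row per trained network (its validation predictions).
    Returns the index of the most accurate network in each cluster."""
    labels, _ = kmeans(preds, m, seed=seed)
    mse = np.mean((preds - y_val) ** 2, axis=1)
    chosen = []
    for i in range(m):
        idx = np.flatnonzero(labels == i)
        if idx.size:
            chosen.append(int(idx[np.argmin(mse[idx])]))
    return sorted(chosen)
```

Running `select_members` for several values of m and comparing the resulting ensembles' prediction indices is one way to choose the cluster number, in line with the comparison described in the text.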

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The process of fuzzy inference involves membership functions, fuzzy logic operators, and if-then rules. Fuzzy inference systems (FIS) have been successfully applied in fields such as automatic control Skullestad et al. (2001), monitoring and maintenance Cibulka et al. (2012), data classification, decision analysis, expert systems, and computer vision. An overview of possible applications of fuzzy logic in modeling, identification and control can be found in Zadeh (1994). The adaptive network-based fuzzy inference system (ANFIS) proposed by Jang (1993) is one of the most commonly used fuzzy inference systems, and its architecture is obtained by embedding the FIS into the framework of adaptive networks. The generalization capability of fuzzy logic is very poor because it uses heuristic algorithms for defuzzification, rule evolution and antecedent processing. The main disadvantage of a neural network is the difficulty of determining the proper size and optimal structure of the network. Also, the relationship between weight changes and input-output behavior during training, and the way the trained system uses the weights to generate the correct output, are very complicated to understand, like a "black box". Combining fuzzy logic and neural networks is an effective way to overcome the disadvantages of both techniques. Neural networks are used to tune the membership functions of fuzzy systems even for complex systems, Singh et al. (2012). The outstanding property of ANFIS is that it compensates for the disadvantage of FIS with the learning mechanism of NN. The architecture of the ANFIS used in this study is based on the first-order Takagi-Sugeno model Takagi and Sugeno (1985). For a simple MISO system (multi-input, single-output) with two inputs ($x_1$, $x_2$) and one output ($y$), a typical rule set consists of if-then rules whose premises are fuzzy sets of the inputs and whose consequents are linear functions of the inputs. The ANFIS architecture is shown in Figure 1. It is composed of five layers, where each layer contains several nodes described by the node function. Let $O^j_i$ denote the output of the i-th node in layer j.
Layer 1: In the first layer, all the nodes are adaptive nodes. The outputs of layer 1 are the fuzzy membership grades of the inputs, given by the membership functions $\mu_{A_i}$ and $\mu_{B_i}$ of the linguistic labels $A_i$ and $B_i$, respectively. Any continuous and piecewise differentiable function, such as the commonly used trapezoidal, triangular, Gaussian or generalized bell membership functions, can be used as a node function in this layer. The outputs of this layer therefore form the membership values of the premise part, and the parameters contained in the membership functions of the fuzzy sets are called premise parameters.
Layer 2: In contrast to layer 1, the nodes in this layer are fixed. The output $O^2_i$ of node i is the firing strength $w_i$ of the corresponding rule.
Layer 3: In this layer, where the normalization process is performed, the nodes are fixed. The ratio of the i-th rule's firing strength to the sum of all rules' firing strengths is calculated for the corresponding node, and thus the outputs of this layer are called normalized firing strengths.
Layer 4: The fourth layer deals with the consequent part of the fuzzy rule. Every node i in this layer is an adaptive node, and it calculates the contribution of the i-th rule to the model output, which for the first-order Takagi-Sugeno method is the normalized firing strength multiplied by a linear function of the inputs with the parameter set $\{a_i, b_i, c_i\}$. Parameters in this layer are referred to as consequent parameters.
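For reference, under the standard formulation of Jang (1993) with two rules, the rule set and the outputs of layers 1 to 4 can be written as follows. The product as the fuzzy AND operator in layer 2 and the symbol $y_i$ for the output of rule i are assumptions made here; the remaining symbols follow the text.

```latex
\begin{align*}
&\text{Rule } i:\ \text{if } x_1 \text{ is } A_i \text{ and } x_2 \text{ is } B_i,
  \text{ then } y_i = a_i x_1 + b_i x_2 + c_i, && i = 1, 2 \\
&O^1_i = \mu_{A_i}(x_1), \quad i = 1, 2; \qquad
  O^1_i = \mu_{B_{i-2}}(x_2), \quad i = 3, 4 \\
&O^2_i = w_i = \mu_{A_i}(x_1)\, \mu_{B_i}(x_2), && i = 1, 2 \\
&O^3_i = \bar{w}_i = \frac{w_i}{w_1 + w_2}, && i = 1, 2 \\
&O^4_i = \bar{w}_i\, y_i = \bar{w}_i \left( a_i x_1 + b_i x_2 + c_i \right), && i = 1, 2
\end{align*}
```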
Layer 5: This is the summation layer, which consists of a single fixed node. It sums up all the incoming signals and produces the output. From the proposed ANFIS architecture, it is observed that, given the values of the premise parameters, the overall output can be expressed as a linear combination of the consequent parameters. More precisely, the output y is a weighted sum of the rules' linear consequents, with the normalized firing strengths as weights. In the training process, the least squares method (forward pass) is used to optimize the consequent parameters with the premise parameters fixed. Once the optimal consequent parameters are found, the backward pass starts immediately. The gradient descent method (backward pass) is used to adjust the premise parameters corresponding to the fuzzy sets in the input domain. The output of the ANFIS is calculated by employing the consequent parameters found in the forward pass. The output error is used to adapt the premise parameters by means of a standard backpropagation algorithm.
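Because the output is linear in the consequent parameters once the premise parameters are fixed, the forward pass reduces to an ordinary least squares problem. The sketch below illustrates this for a two-input system; it assumes Gaussian membership functions and the product as the fuzzy AND operator, and the names `normalized_firing`, `design_matrix`, `fit_consequents` and `anfis_output` are hypothetical. The gradient-descent update of the premise parameters (backward pass) is not shown.

```python
import numpy as np

def gaussmf(x, sigma, c):
    """Gaussian membership function with width sigma and center c."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def normalized_firing(X, premise):
    """Layers 1-3. premise["A"][i] and premise["B"][i] hold (sigma, c)
    for the fuzzy sets of rule i on inputs x1 and x2."""
    x1, x2 = X[:, 0], X[:, 1]
    w = np.stack(
        [gaussmf(x1, *premise["A"][i]) * gaussmf(x2, *premise["B"][i])
         for i in range(len(premise["A"]))],
        axis=1,
    )
    return w / w.sum(axis=1, keepdims=True)

def design_matrix(X, wbar):
    """Columns wbar_i*x1, wbar_i*x2, wbar_i for every rule i, so that
    y = design_matrix @ [a_1, b_1, c_1, a_2, b_2, c_2, ...]."""
    cols = []
    for i in range(wbar.shape[1]):
        cols += [wbar[:, i] * X[:, 0], wbar[:, i] * X[:, 1], wbar[:, i]]
    return np.column_stack(cols)

def fit_consequents(X, y, premise):
    """Forward pass: least squares estimate of the consequent parameters
    with the premise parameters held fixed."""
    wbar = normalized_firing(X, premise)
    theta, *_ = np.linalg.lstsq(design_matrix(X, wbar), y, rcond=None)
    return theta

def anfis_output(X, premise, theta):
    """Layers 4-5: weighted sum of the rules' linear consequents."""
    return design_matrix(X, normalized_firing(X, premise)) @ theta
```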

Case study
University campuses are specific groups of diverse buildings with significant energy consumption, Sretenovic (2013). They consist of many different buildings, each campus representing a small-scale town in itself. Therefore, they provide an excellent testbed to characterize and understand the energy consumption of a group of mixed-use buildings. The Norwegian University of Science and Technology (NTNU) campus Gløshaugen consists of 35 buildings, with a total area of approximately 300,000 m². A Building and Energy Management System (BEMS) and a web-based Energy Monitoring System (Energy Remote Monitoring, ERM) are available at NTNU. The Schneider ERM system is an Automatic Monitoring and Targeting system with advanced analysis features, which receives main meter and submeter consumption data and provides system energy reporting, alarming, monitoring and analysis. There are 46 heating meters installed on campus. Hourly heat and electricity consumption from all meters can be collected on ERM (EnergyRemoteMonitoring (2014)). The district heating network is organized in the form of a ring, while the main heat exchanger is installed in the Old Electric Building (Figure 2). The main meter is installed by the district heating supplier, so it was taken as the relevant one. Daily heating energy consumption is analyzed in this paper. Creating a model of energy use helps in future building planning; it can provide useful information about the most probable energy consumption for similar buildings, or predict energy use in different conditions. These models can also be used to show the impact of possible energy saving measures and help in finding the optimal way of reducing energy costs. It is also very important to have correct and reliable measured data. If a part of a building is leased to other users (which is the case in campus Gløshaugen), it is necessary to calculate bills for each tenant. There is increased interest in data error analysis and in developing methods that can point out possible meter malfunctions. Also, without correct measured data it is not possible to monitor and prove the benefits of applying energy saving measures for increasing energy efficiency. Creating a representative model of heating energy consumption can also indicate errors in measured data. After the analysis, the database is divided as follows:

Figure 2: The Old Electric Building
• Cold period: from January 1st until March 31st and from November 1st until December 31st
• Mild period: from April 1st until June 15th and from September 16th until October 31st
• Warm period (outside of the heating season): from June 15th until September 15th; excluded from the analysis

This implies that better prediction results can be obtained using separate network models for each period than using one network for the whole year. In this paper, only the cold period (with the largest heating energy consumption) is analyzed.
The daily heating consumption is analyzed in terms of the type of day. The correlation with the mean daily outside temperature for each day of the week for the year 2012 is shown in Figure 5. The analysis showed that there is no specific difference between the working days (heating consumption for Monday to Friday follows similar trendlines), while the regression lines for Saturday and Sunday lie below them, as expected. In NTNU campus Gløshaugen, heating is not switched off during the weekends; only the design set-point is lowered, so the heating consumption on Monday is not significantly different from that on the other working days. The analysis of the daily heating consumption also showed that during the holidays and exam periods, heating operation is at the same level as for the working days (heating is maintained at the design set-point). These conclusions imply that two separate networks should be created: one for the working days and another for the weekend. In this paper, the network for the working days is analyzed.
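The selection of samples by period and day type can be sketched as follows. The function names are hypothetical, and two assumptions are made: June 15th is assigned to the mild period, so the warm period is taken to start on June 16th, and holidays that fall on Monday to Friday are treated as working days, following the observation above.

```python
from datetime import date

def heating_period(d):
    """Classify a date as 'cold', 'mild' or 'warm' using the period limits
    from the text (assumption: June 15th counts as mild)."""
    md = (d.month, d.day)
    if md <= (3, 31) or md >= (11, 1):
        return "cold"
    if (4, 1) <= md <= (6, 15) or (9, 16) <= md <= (10, 31):
        return "mild"
    return "warm"

def is_working_day(d):
    """Monday to Friday; holidays are not treated separately."""
    return d.weekday() < 5

def in_study_subset(d):
    """Samples analyzed in the paper: working days in the cold period."""
    return heating_period(d) == "cold" and is_working_day(d)
```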

ANN models development
The most important task in building an ANN prediction model is the selection of input variables. Many different studies dealing with the impact of various variables on energy consumption can be found in the literature. Empirical research on the influence of hourly values of solar radiation and wind speed on the heating demand of a building complex heated by a district heating system was conducted in Wojdyga (2008). The results confirmed that heat demand increases at higher wind speeds and decreases on sunny days during the heating season. All input variables for the neural network model considered in this study are: mean daily outside temperature [  6). Therefore, the heating consumption of the previous day is selected as an additional input variable. In that way, the prediction is always done for one day ahead. For a long-term forecast, it is necessary to use this model to perform the prediction day by day. However, in that case the prediction error accumulates. Even in the case of a static model, where the values of the input and/or output variables for the previous day are not used, prediction for a longer period in advance requires the input variables (temperature, wind speed, etc.) for that period. One way is to develop models that separately predict these input variables and then use them to predict consumption, which would again result in accumulated error.
The ANN architecture used in this study is a three-layer FFNN composed of one input layer, one output layer and one hidden layer, with the Levenberg-Marquardt learning algorithm. In the hidden layer and the output layer, tansig (sigmoidal) and linear (purelin) activation functions are used, respectively. During the study, many different numbers of hidden neurons were examined by trial and error, and the best results were achieved with one hidden layer with 10 neurons. For training the models, data for the working days in the cold period (from January 1st until March 31st and from November 1st until December 31st) for the years 2009, 2010 and 2011 were used (318 samples in total), and for testing the data for 2012 (100 samples). Data with obvious errors and heat meter malfunctions were removed from the dataset. To ensure that no special factor is dominant over the others, all inputs and outputs are normalized to the interval (0, 1) by a linear scaling function. The prediction accuracy of all proposed models is measured by the coefficient of determination (R²), root mean square error (RMSE) and mean absolute percentage error (MAPE).
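The normalization and the three prediction indices can be sketched as below. The min-max form of the linear scaling (with limits taken from the training data) and the definition of R² as one minus the ratio of residual to total sum of squares are assumptions; the paper does not state which variants were used.

```python
import numpy as np

def minmax_scale(x, lo, hi):
    """Linear scaling to the unit interval; lo and hi would typically be
    the minimum and maximum of the training data (assumption)."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def minmax_inverse(z, lo, hi):
    """Map scaled values back to physical units."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo

def r2(y, p):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, p):
    """Root mean square error."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

def mape(y, p):
    """Mean absolute percentage error, in percent (requires y != 0)."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(100.0 * np.mean(np.abs((y - p) / y)))
```

For measured values 100, 200, 400 and predictions 110, 190, 400, the relative errors are 10%, 5% and 0%, so `mape` returns 5.0.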

Neural network ensemble
Possible improvement of the prediction accuracy by using a network ensemble is examined. The application of an ensemble technique is divided into two steps. As the first step, after training numerous FFNNs, 50 networks with satisfactory accuracy are selected as possible members. The second step is the adequate combination of the outputs of the ensemble members to produce the most appropriate output. In order to improve ensemble efficiency, both the accuracy of the networks and the diversity between individuals need to be ensured. The diversity is achieved by appropriate selection of members from many previously trained networks. Considering the difficulty of securing diversity and accuracy at the same time, an easier method can be applied to achieve both goals gradually. First, clustering is employed to divide all networks into groups (clusters) according to the similarity of the networks. Then, the most accurate individual in each group on the validation set is selected. Finally, all selected individuals form the ensemble, as can be seen in Figure 7.
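For the second step, the simplest way to combine the outputs of the selected members is averaging. The sketch below shows a simple average and a weighted average; the weighting by inverse validation error is an assumption made for illustration, since the weighting scheme is not specified here, and the function names are hypothetical.

```python
import numpy as np

def simple_average(preds):
    """preds has one row per ensemble member; returns the mean prediction."""
    return np.mean(preds, axis=0)

def inverse_mse_weights(preds_val, y_val):
    """Weights proportional to 1 / MSE on a validation set, normalized to
    sum to one (assumed scheme; requires nonzero errors)."""
    mse = np.mean((preds_val - y_val) ** 2, axis=1)
    w = 1.0 / mse
    return w / w.sum()

def weighted_average(preds, weights):
    """Weighted combination of the members' predictions."""
    return weights @ preds
```

With two members whose validation MSEs are 1 and 9, the weights become 0.9 and 0.1, so the more accurate network dominates the combined output.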
There are different methods for combining the outputs. The conventional approach is to use averaging: simple or weighted. The other method, the multistage approach, which is expected to give an even better improvement in accuracy, is to use another neural network as an integrator of the individual networks. Two different network architectures are proposed for the second stage: FFNN and ANFIS. Generally, structure identification in fuzzy modelling involves several tasks: selecting input variables, partitioning the input space, choosing the number and types of membership functions for the inputs, creating fuzzy rules, and selecting initial parameters for the membership functions. In this paper, different ANFIS models are constructed using three different identification methods: grid partitioning, subtractive clustering and fuzzy C-means clustering. In the grid partitioning method, the domain of each antecedent variable is partitioned into equidistant and identically shaped membership functions, which are defined in advance. To demonstrate the effect of the choice of membership function (MF) on the model performance, three different functions are tested: the triangular MF (trimf), the generalized bell MF (gbellmf) and the Gaussian MF (gaussmf). The number of MFs for each input of the ANFIS is set to 2.
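The three membership functions named above can be written compactly as follows. The parameterization mirrors the usual definitions of trimf, gbellmf and gaussmf; the helper `grid_trimf_params`, which places equidistant triangular MFs over an input range, is a hypothetical illustration of grid partitioning and not necessarily the initialization used in the study.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gbellmf(x, a, b, c):
    """Generalized bell MF: 1 / (1 + |(x - c) / a|^(2b))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2.0 * b))

def gaussmf(x, sigma, c):
    """Gaussian MF with width sigma and center c."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def grid_trimf_params(lo, hi, n_mf=2):
    """Equidistant, identically shaped triangular MFs covering [lo, hi]."""
    centers = np.linspace(lo, hi, n_mf)
    step = (hi - lo) / (n_mf - 1)
    return [(float(c - step), float(c), float(c + step)) for c in centers]
```

With two MFs per input on the normalized range, `grid_trimf_params(0.0, 1.0)` gives one triangle peaking at 0 and one peaking at 1, so the two membership grades of any input in the range sum to one.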
Fuzzy C-means (FCM), developed by Dunn (1973) and improved by Bezdek (1981), clusters the data by minimizing the total distance of each data point to the cluster centers, wherein each data point belongs to a cluster to some degree that is specified by a membership grade. Subtractive clustering, proposed by Chiu (1994), is one of the automated data-driven methods for constructing the primary fuzzy models. It is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers in a set of data, based on the density of data points in the input space. As a result, a fuzzy model with a minimum number of rules is obtained. As the FFNN in the second stage, a network with the same architecture as in the first stage is used. All proposed ensembles are analyzed and compared with the best trained single FFNN:
1. Conventional ensemble:
• simple average
• weighted average
2. ANFIS multistage:
• ANFIS trimf
The number of networks in the ensemble is equal to the cluster number, because the one best network in each cluster is selected to join the ensemble. Table 1 and Table 2 present the prediction indices for training and testing the models, respectively.
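The fuzzy C-means clustering used as one of the ANFIS identification methods can be sketched with the standard alternating updates of cluster centers and membership grades. The fuzzifier value of 2, the random initialization and the function name `fuzzy_c_means` are assumptions for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, fuzzifier=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM: every point belongs to every cluster with a membership
    grade; the grades of one point sum to one."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # Centers are means of the data weighted by membership^fuzzifier.
        Um = U ** fuzzifier
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Memberships follow from the relative distances to the centers.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a center
        inv = d ** (-2.0 / (fuzzifier - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U
```

In an ANFIS built this way, each cluster would typically give rise to one fuzzy rule, which keeps the rule base small compared with grid partitioning over many inputs.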
The presented results show that all neural network models can predict heating consumption with satisfactory accuracy. Even the single best trained FFNN gives satisfactory values of 0.9773 for R² and 6.3049% for MAPE in the testing period. Both conventional methods for creating the ensemble show an improvement in accuracy: with the weighted average for 6 clusters (ensemble members), R² rises to 0.9818, while MAPE is 5.6270%. Further enhancement can be achieved by a second layer network, either FFNN or ANFIS, used to combine the outputs of the individual ensemble members. The main idea of this paper is not to specify the best second stage NN architecture, optimal membership function, or ANFIS clustering type, but to show the general ability of various multistage ensembles to successfully predict heating consumption. The best result for R² is 0.9829, achieved with the multistage FFNN with 6 clusters, and the lowest MAPE is 5.3383%, achieved with the ANFIS-FCM multistage ensemble. In all models, the ensemble has proven superior to the best trained single FFNN. The accuracy of the proposed models in terms of coefficient of determination (R²), while varying ensemble

Conclusion
For the prediction of heating energy consumption in NTNU campus Gløshaugen, 50 different FFNNs are trained on the cold period of the years 2009, 2010 and 2011 (318 samples) and tested on the year 2012 (100 samples). Improvement in prediction accuracy using a neural network ensemble is investigated. The main task in this method is achieving both accuracy and diversity of the ensemble members. The accuracy is obtained by using an adequate training algorithm and selecting the number of neurons in the hidden layer by trial and error. K-means, one of the most used clustering techniques, is used for separating the trained networks into groups (clusters), and the best network in each cluster is selected as an ensemble member. Members are then aggregated into an ensemble using various techniques: conventional methods (simple and weighted averaging) and multistage methods. Averaging the predictions of these networks resulted in an improvement in accuracy over the predictions of the best trained individual FFNN. Further improvement is obtained by training a new neural network to combine the predictions of the original networks. In the second level, two different neural networks are analyzed: FFNN and ANFIS. Different ANFIS models are constructed using various identification methods: different membership functions (trimf, gbellmf, gaussmf), fuzzy C-means clustering (FCM) and subtractive clustering (SUB). All ensembles are trained and tested for various numbers of clusters. The multistage model using ANFIS in the second level is proven to be the most effective. In this paper we have demonstrated that multistage ensembles, where the adaptive properties of a second layer network are used to combine the outputs of the individual ensemble members, offer enhanced performance over conventional combining methods and the best trained single network.

Figure 5: Correlation of the daily heating energy consumption with mean daily outside temperature for the year 2012

Figure 6: Autocorrelation function

Figure 8: Prediction results of multistage ensemble ANFIS FCM with 5 clusters for training period

Figure 13: Prediction results of multistage ensemble ANFIS FCM with 5 clusters for testing period

Figure 12: MAPE for different cluster number for testing set

into m groups. Even though k-means was first proposed over 50 years ago, it is still one of the most widely used algorithms for clustering.

Table 1: Prediction indices for training networks

Table 2: Prediction indices for testing networks