Open access peer-reviewed chapter - ONLINE FIRST

Forecasting Energy Consumption in Educational Buildings with Big Data Analytics

Written By

Houda Daki, Basma Saad, Asmaa El Hannani, Abdelfatteh Haidine and Hassan Ouahmane

Submitted: 18 January 2024 Reviewed: 06 March 2024 Published: 14 May 2024

DOI: 10.5772/intechopen.1005142

ICT for Smart Grid - Recent Advances, New Perspectives, and Applications IntechOpen

From the Edited Volume

ICT for Smart Grid - Recent Advances, New Perspectives, and Applications [Working Title]

Dr. Abdelfatteh Haidine


Abstract

This chapter delves into the realm of “Big Data and Analytics in Smart Grid”, focusing specifically on forecasting energy consumption in educational institution buildings. The chapter starts with an overview of smart grids and of electrical energy consumption forecasting in several areas, and then describes forecasting models for buildings of educational institutions. Drawing on an extensive review of prior studies, it surveys the forecasting models used for this kind of prediction, classifies them, and discusses their potential. The chapter also examines the practical advantages and challenges that big data brings to optimizing energy forecasting for educational institutions. The exploration covers the entire big data pipeline in smart grid, including data selection, preparation, and the crucial phases of training and testing.

Keywords

  • smart grid
  • big data
  • machine learning
  • electrical consumption forecasting
  • predictive analytics

1. Introduction

A smart grid refers to an intelligent electricity distribution system designed to optimize the generation, distribution, and consumption of electricity. This optimization is achieved through the integration of information and communication technologies (ICT) into the traditional electricity grid [1, 2, 3]. Essentially, smart grids instigate significant transformations in the underlying information systems. This involves the incorporation of new data streams originating from the electricity grid, the inclusion of emerging participants like decentralized renewable energy producers, the integration of novel applications such as electric vehicles and connected households, and the deployment of advanced communication tools like smart meters, sensors, and remote control points. Consequently, this evolution leads to an overwhelming influx of data that energy companies must effectively manage. Leveraging big data technologies becomes crucial for utilities in handling this data surge, though the strategic selection of the appropriate big data technology remains a pivotal decision.

The integration of big data and analytics has emerged as a transformative force, particularly within the intricate framework of smart grids [4]. In particular, predictive analytics has become a cornerstone in the quest for a reliable and efficient electric power supply [5]. This approach employs advanced methods to efficiently process, interpret, and analyze large datasets, enhancing their utility and value. Furthermore, electrical forecasting has emerged as a strategy for addressing the challenges associated with growing energy generation and consumption. By proactively addressing potential disruptions arising from critical factors that could affect energy usage during extreme events, forecasting improves the efficiency and accuracy of energy resource management.

In the quest for sustainable and efficient energy management, educational institutions find themselves at the crossroads of technological innovation and environmental responsibility. The integration of smart grid, big data, and advanced analytics has emerged as a pivotal solution to navigate these challenges. Educational facilities, with their diverse energy needs, face the imperative to balance operational efficiency with environmental consciousness. The predictive analytics paradigm becomes a beacon in this endeavor, offering a strategic approach to anticipate and optimize energy consumption. From forecasting energy demand patterns to real-time adjustments based on critical measures, the application of predictive analytics promises not only heightened efficiency but also a tangible contribution to sustainability goals. Join us as we navigate the intersection of technology and education, uncovering how predictive analytics transforms energy consumption predictions and paves the way for a more sustainable future within educational institutions.

In this chapter, we present the added value of predicting energy consumption within educational institutions, unraveling the potential of data-driven insights in reshaping their energy landscape. Many studies address this domain; for example, [6, 7, 8] develop forecasting models for electrical energy consumption on university campuses, aiming to optimize energy usage, reduce costs, and enhance sustainability efforts. The solutions proposed in these studies enable proactive energy management and efficient energy optimization, with added value that includes cost savings, contributions to environmental sustainability, and operational efficiency.

The chapter is organized as follows. Section 2 gives an overview of smart grids: it defines the smart grid concept, discusses the main components and systems of the field in more detail, and then gives a global view of the data management issues and challenges in this context. Section 3 summarizes the added value of big data technologies for this kind of data and discusses the technical requirements, the tools, and the main steps to implement big data solutions for smart grid systems. Section 4 presents a detailed study of the main techniques and technologies used in forecasting energy consumption in buildings. Section 5 then provides a thorough review of forecasting energy consumption in educational buildings, covering aspects such as data selection, preparation, training, and testing. Finally, the chapter concludes with a summary.


2. Smart grid and data management

2.1 Comprehensive overview of the smart grid value chain

A smart grid is characterized as an intelligent network utilizing new technologies, sensors, and equipment to effectively manage diverse energy resources, thereby enhancing the reliability, efficiency, and security of the entire energy value chain [9]. A key advantage of smart grids lies in their capacity to seamlessly integrate renewable energy sources into the system, overseeing energy consumption and production through a bidirectional flow of energy and data across power generation, distribution, and consumption, as illustrated in Figures 1 and 2.

Figure 1.

Smart grid value chain.

Figure 2.

Smart grid as ensemble of informatics (SW) and infrastructure (HW).

The smart grid value chain is characterized by a bidirectional flow of both energy and data among the key components of power generation, distribution, and consumption. This interconnected and dynamic system marks a departure from traditional one-way energy flow, allowing for enhanced coordination, efficiency, and responsiveness within the entire energy ecosystem. In this evolved value chain, power is not only generated and distributed but also intelligently monitored and managed through a seamless exchange of information, facilitating a more adaptive and resilient electricity infrastructure. This bidirectional flow serves as the backbone of the smart grid, enabling a holistic approach to energy management and optimization across its various stages. In this chapter, we focus on the information layer.

The initial stage in the smart grid value chain is power generation, encompassing sources like nuclear, hydropower, and renewables. This phase relies on wide-area monitoring and control technologies to establish communication with the subsequent step, known as power distribution. Power distribution operates on a proximity network linking consumers to the electricity grid, transmitting data through advanced metering infrastructure. The final stage in the smart grid value chain is power consumption, involving both residential and industrial electricity users. It is becoming increasingly common for consumers to generate electrical energy using alternative methods such as solar, biomass, and wind. Consequently, effective supervision of their consumption and production becomes crucial for optimizing the overall service.

The infrastructure of the smart grid is intricately structured, comprising multiple layers that operate in concert, as depicted in Figure 3 [10]. At the foundational level, the component layer is tasked with managing physical devices, acquiring functions, information, and communication capabilities from other layers. The communication layer employs various techniques and protocols to facilitate seamless data transfer among the diverse components of the grid. Meanwhile, the information layer delineates the data model and communication systems employed for the exchange of information. In parallel, the function layer outlines logical functions and applications independently of the underlying physical architecture. Lastly, the business layer is responsible for defining business models and adhering to regulatory requirements. The harmonious communication among these layers is essential for effective energy management and data transfer. Each layer interfaces with numerous systems, collectively contributing to the successful execution of its respective functions and responsibilities.

Figure 3.

Smart grid infrastructure layers.

The implementation of smart grids yields a multitude of added values, benefiting both utilities and customers.

Added value for utilities: Smart grids significantly enhance the grid management capabilities of utilities, empowering them to make timely and informed decisions. The incorporation of various optimization, control, and monitoring systems provides utilities with real-time, detailed insights into grid operations. The advantages for utilities are manifold, encompassing:

  • Enhanced System Management: Smart grids contribute to the overall improvement of production, transport, and distribution system management.

  • Increased Energy Independence: By seamlessly integrating renewable energies, smart grids play a pivotal role in advancing energy independence.

  • Optimized Energy Production: Smart grids enable utilities to efficiently manage and model available energy production capacities in response to real-time and spontaneous demand.

  • Network Balance Maintenance: Real-time management of under-voltage and over-voltage situations ensures the maintenance of network balance.

  • Improved Grid Security: Smart grids enhance the security of electricity grids, mitigating risks and reducing instances of fraud.

  • Elevated Service Quality: Utilities experience an enhancement in the quality of services offered, coupled with improvements in customer service, fostering a more robust and responsive energy ecosystem.

Added value for customers: Smart grids deliver a plethora of advantages to customers through the implementation of interactive and scalable models of power grid and energy demand. Customers, encompassing both residential and industrial users of electricity, experience a range of benefits:

  • Empowered Energy Options: Customers, now often prosumers producing their own electricity through alternative methods like solar energy, biomass, and wind, benefit from the flexibility offered by smart grids.

  • Real-time Interaction: Real-time communication with smart grid control and monitoring systems facilitates precise measurement and optimization of customers’ energy utilization within the grid.

  • Enhanced Control: The deployment of smart meters and other smart grid equipment empowers consumers to actively control their energy consumption in real time. This capability enables them to avoid peak loads, leveraging associated price benefits.

  • Cost Savings: Customers can strategically schedule high-energy-consuming activities, such as operating washing machines, dryers, and dishwashers, during off-peak times when energy prices are significantly lower. This not only results in cost savings for customers but also contributes to a more efficient use of energy resources.

  • Reduced Generation Capacity Demand: By optimizing their consumption patterns, customers not only save money but also contribute to a reduced demand for generation capacity during peak periods, fostering a more sustainable and balanced energy grid.
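As a rough illustration of the cost-savings and peak-reduction points above, the following Python sketch shifts a single appliance cycle to the cheapest start hour under a hypothetical two-rate time-of-use tariff. The peak window, prices, and appliance load (`PEAK_HOURS`, `PEAK_PRICE`, `OFF_PEAK_PRICE`, a 2 kW two-hour cycle) are illustrative assumptions, not data from this chapter.

```python
# Hypothetical time-of-use tariff (illustrative values, not real tariff data)
PEAK_HOURS = range(17, 21)               # assumed peak window: 17:00 to 20:59
PEAK_PRICE, OFF_PEAK_PRICE = 0.30, 0.12  # assumed price per kWh


def price_at(hour: int) -> float:
    """Assumed price per kWh for a given hour of the day (0-23)."""
    return PEAK_PRICE if hour in PEAK_HOURS else OFF_PEAK_PRICE


def run_cost(start_hour: int, duration_h: int, power_kw: float) -> float:
    """Cost of running a constant-power appliance for whole hours from start_hour."""
    return sum(power_kw * price_at((start_hour + h) % 24) for h in range(duration_h))


def peak_energy_kwh(start_hour: int, duration_h: int, power_kw: float) -> float:
    """Energy the cycle draws inside the assumed peak window."""
    return sum(power_kw for h in range(duration_h) if (start_hour + h) % 24 in PEAK_HOURS)


def cheapest_start(duration_h: int, power_kw: float) -> int:
    """Start hour with the lowest cost; min() keeps the earliest hour on ties."""
    return min(range(24), key=lambda s: run_cost(s, duration_h, power_kw))


# Example: a hypothetical 2 kW dishwasher cycle lasting 2 hours, started at 18:00
peak_cost = run_cost(18, 2, 2.0)
best_start = cheapest_start(2, 2.0)
shifted_cost = run_cost(best_start, 2, 2.0)
print(f"start 18:00 -> {peak_cost:.2f}, start {best_start:02d}:00 -> {shifted_cost:.2f}")
print("peak-window energy:", peak_energy_kwh(18, 2, 2.0), "->", peak_energy_kwh(best_start, 2, 2.0))
```

Shifting the cycle out of the assumed peak window lowers the customer's bill and removes its 4 kWh from the peak period, which is the aggregate effect that reduces the generation capacity needed at peak times.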

2.2 Smart grid systems

Smart grids rely on advanced communication and information infrastructure to enhance energy production, distribution, and storage, thereby reducing the cost and effort involved in management and planning. Our focus will predominantly be on the information systems integral to data management. This emphasis underscores the pivotal role of robust and efficient information systems in ensuring the seamless operation and optimization of smart grid functionalities.

Communication System: The communication system plays a critical role; it must provide high bandwidth capacity and speed, as well as data security and privacy. Data transmission in smart grid systems relies on a hierarchy of communication technologies: access network technologies such as PLC, ZigBee, and Wi-Fi; area network technologies such as M2M, cellular networks, and Ethernet; core network technologies such as IP and MPLS; and backbone network technologies based on fiber, microwave links, IP-based wavelength division multiplexing (WDM) networks, and other optical technologies. For more details about data transmission, see [11].

Information Systems: Information systems constitute vital elements within smart grids, working collaboratively to establish a flexible, scalable, and efficient grid, as depicted in Figure 4. These utility information systems play a crucial role in controlling and processing data sourced from utility substations and various electricity consumers, including those in commercial, residential, and industrial sectors. The information extracted is then utilized to derive valuable insights into the state of grid lines and equipment, energy consumption, consumption patterns, and more. Utility information systems encompass several key components:

  • Supervisory Control and Data Acquisition (SCADA) System [12]: This system collects data from the utility field and manages the electrical grid infrastructure. It communicates with other information systems to provide comprehensive reports about the network.

  • Customer Information System (CIS): CIS processes data from electricity consumers, fostering effective communication and data exchange among various information systems.

  • Geographic Information System (GIS): GIS aids in processing data related to the geographical aspects of the grid, enhancing spatial understanding and coordination.

  • Advanced Metering Infrastructure (AMI) [13]: AMI plays a crucial role in collecting and managing data from smart meters, facilitating real-time monitoring of energy consumption.

  • Meter Data Management System (MDMS): MDMS processes and manages data collected from meters, ensuring accurate and efficient handling of consumption information.

  • Demand Response Management System (DRMS): DRMS interacts with all other systems, ensuring effective coordination and response to demand fluctuations, ultimately contributing to grid stability.

  • Outage Management System (OMS): OMS is instrumental in managing and responding to outages promptly. Together with DRMS, it ensures a comprehensive view of the grid and enhances overall consumer satisfaction.

Figure 4.

Information systems in smart grid.

These systems collectively provide a holistic perspective on the grid and consumer interactions, emphasizing the interconnected nature of information systems within the smart grid infrastructure. The functionalities and roles of each system are summarized above.

2.3 Data management issues in smart grid

Smart grid systems generate substantial amounts of data, with systems like SCADA collecting data every 2–5 seconds and AMI systems capturing data at intervals of 1–15 minutes. Consequently, utilities are confronted with a multitude of challenges spanning from strategic considerations to operational performance in the realm of data management. The sheer volume and frequency of data acquisition pose complex challenges that utilities must navigate to effectively harness and leverage the wealth of information generated by smart grid systems. These challenges encompass strategic decision-making regarding data utilization, the establishment of robust data management strategies, and the optimization of performance in handling and analyzing the copious amounts of data flowing through the smart grid infrastructure. As utilities grapple with these challenges, finding innovative and efficient solutions becomes imperative to ensure the continued success and advancement of smart grid technologies.
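The sampling intervals quoted above translate directly into data volumes. The sketch below does the arithmetic per device per day; the fleet size and the 100-byte record size are hypothetical assumptions chosen only to show orders of magnitude.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def readings_per_day(interval_s: int) -> int:
    """Samples one device produces per day at a fixed sampling interval."""
    return SECONDS_PER_DAY // interval_s


def daily_volume_bytes(n_devices: int, interval_s: int, bytes_per_reading: int) -> int:
    """Raw daily data volume, ignoring compression, indexing, and replication."""
    return n_devices * readings_per_day(interval_s) * bytes_per_reading


# Interval ranges quoted in the text
for name, interval in [("SCADA @ 2 s", 2), ("SCADA @ 5 s", 5),
                       ("AMI @ 1 min", 60), ("AMI @ 15 min", 15 * 60)]:
    print(f"{name}: {readings_per_day(interval):,} readings/day per device")

# Hypothetical fleet: 100,000 meters, assumed 100 bytes per reading
N_METERS, RECORD_BYTES = 100_000, 100
print(daily_volume_bytes(N_METERS, 15 * 60, RECORD_BYTES) / 1e9, "GB/day at 15 min")
print(daily_volume_bytes(N_METERS, 60, RECORD_BYTES) / 1e9, "GB/day at 1 min")
```

Under these assumptions, a single SCADA point sampled every 2 seconds yields 43,200 readings per day, and 100,000 meters reporting every 15 minutes produce about 0.96 GB per day of raw records, rising to about 14.4 GB per day at 1-minute intervals, before any replication or indexing overhead.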

Standards and Interoperability: The smart grid represents a heterogeneous and intricate environment characterized by diverse devices, networks, systems, and data types. Within this framework, variations abound, encompassing networks with differing processing speeds, devices with or without energy constraints, interactive or non-interactive systems, and continuous or non-continuous data, among others. This diversity imposes a range of requirements and challenges on smart grids concerning data integration, including bandwidth constraints, error management, limited resources, and the demand for high scalability. In the absence of standardized protocols, utilities must work with disparate protocols featuring distinct definitions and communication techniques, which makes interoperability an arduous task. To address this challenge and establish standardization in smart grids, various information models have been developed. Notably, IEC 61850 serves as a communication standard for MDMS and related enterprise applications. The IEC 61970/61968 Common Information Model (CIM) relies on IEC 61850 as the foundation for information exchange and messaging. The evolution of smart grid technology has led to the integration of advanced protocols, such as IEC 61850-90-7, specifically tailored for smart inverters. Additionally, protocols like IEEE 1815 (DNP3) and IEEE 2030.5 (SEP2) offer advanced communication capabilities while allowing the use of existing communication infrastructures. These standardized protocols play a crucial role in facilitating seamless communication, interoperability, and efficient data management within the complex and dynamic landscape of smart grids [10].

Management of Massive Data Volume: The implementation of smart grids highlights the significant costs of storing and processing the substantial volume of data essential for effective grid management. Unfortunately, many utilities cannot fully leverage the new data they collect because of infrastructure limitations and/or insufficient data analysis skills. This underutilization not only limits the potential benefits for utilities but also affects customers: untapped customer data reduces consumers' opportunities to participate actively in the smart grid, specifically by controlling their energy consumption and capitalizing on price benefits by avoiding peak loads. To harness the full advantages of smart grids, consumers need to be aware of the extensive data available through these platforms and be educated about how to engage effectively with their meters and the various platforms within the grid. This knowledge empowers them to make informed choices that conserve energy and save costs. By raising awareness of the potential of smart grid data and providing consumers with the necessary skills, a more engaged and empowered consumer base can emerge, unlocking the true potential of smart grid technologies.

Security and Data Privacy: Within the intricate ecosystem of the smart grid, millions of interconnected devices communicate through networks, exposing the grid to potential vulnerabilities. The adoption of virtualization technologies, a cornerstone of cloud computing, allows electrical companies to run applications in virtual machines, reducing investment costs in hardware and energy [14]. However, this approach introduces security challenges because the platform is shared among multiple users. Furthermore, limited network bandwidth is a significant challenge, causing latency issues in real-time applications that require highly scalable, available, and fault-tolerant connections [15]. These ICT dimensions increase the risk of compromising the security objectives of the smart grid, which include availability, integrity, confidentiality, and accountability. Data confidentiality becomes a concern when there is no secure connectivity between devices [16]. Establishing secure communication channels requires authentication mechanisms [17]; the “authentication, authorization, and accounting” (AAA) mechanism is commonly employed for this purpose. Authentication verifies users through credentials, authorization delineates individual user permissions, and accounting oversees user activities [18]. These security measures are imperative for safeguarding the integrity and reliability of the smart grid communication infrastructure.
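The three AAA roles described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production design: the credential store, password, and permission table (`USERS`, `PERMISSIONS`, `operator1`) are hypothetical, and a real deployment would rely on dedicated AAA infrastructure.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

ITERATIONS = 100_000  # PBKDF2 work factor (illustrative choice)


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted PBKDF2-HMAC-SHA256 hash so plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical credential store and permission table
_salt = os.urandom(16)
USERS = {"operator1": (_salt, hash_password("s3cret", _salt))}
PERMISSIONS = {"operator1": {"read_meter", "send_dr_signal"}}
AUDIT_LOG: list = []  # accounting: one record per request


def authenticate(user: str, password: str) -> bool:
    """Authentication: check the supplied credentials against the stored hash."""
    record = USERS.get(user)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, hash_password(password, salt))


def authorize(user: str, action: str) -> bool:
    """Authorization: is this action in the user's permission set?"""
    return action in PERMISSIONS.get(user, set())


def handle_request(user: str, password: str, action: str) -> bool:
    """Run authentication, then authorization, and record the outcome (accounting)."""
    allowed = authenticate(user, password) and authorize(user, action)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "allowed": allowed,
    })
    return allowed
```

Every request is logged whether or not it succeeds, so the accounting record captures failed logins and denied actions as well as permitted ones.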

Data Availability: Availability is paramount in smart grid communications, ensuring uninterrupted electricity delivery and system functionality. Defending against attacks such as false data injection is crucial and requires real-time detection and mitigation mechanisms. The most commonly used mechanisms for ensuring data availability in smart grid systems are redundancy and fault tolerance [19, 20]. Redundancy involves duplicating critical components or data across multiple systems to mitigate the impact of failures or disruptions. Fault-tolerant techniques are also commonly employed to ensure continuous operation in the event of component failures or network disruptions.
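Redundancy at the data access level can be illustrated with a simple failover read: the client tries replicated copies in order and fails only if every copy is unavailable. The sketch below is a hypothetical illustration (`read_with_failover`, `ReplicaUnavailable`, and the two replica functions are invented names), not a specific smart grid product.

```python
from typing import Callable, Sequence


class ReplicaUnavailable(Exception):
    """Raised when a replica cannot serve a request (e.g., node down, network partition)."""


def read_with_failover(replicas: Sequence[Callable[[str], float]], key: str) -> float:
    """Try each redundant copy in order; fail only if every replica is unavailable."""
    errors = []
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaUnavailable as exc:
            errors.append(str(exc))  # remember why this copy failed, then try the next
    raise ReplicaUnavailable(f"all {len(replicas)} replicas failed: {errors}")


# Hypothetical replicas holding the same meter reading (kWh)
def primary(key: str) -> float:
    raise ReplicaUnavailable("primary offline")


def secondary(key: str) -> float:
    return {"meter-42": 3.7}[key]


print(read_with_failover([primary, secondary], "meter-42"))
```

The caller still receives the reading while the primary is offline; availability degrades only when all redundant copies fail at once.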

Data Integrity: Integrity is essential to the security of the smart grid infrastructure. Data integrity measures ensure that information remains unchanged and uncorrupted during transmission or storage. Robust encryption algorithms and digital signatures are commonly used to maintain data integrity, providing assurance against unauthorized alteration or tampering [21, 22].
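To show what integrity checking looks like in practice, the sketch below attaches a keyed hash (HMAC-SHA256 from Python's `hmac` module) to a meter reading and rejects any message whose payload was altered. Note that this swaps in a message authentication code for the digital signatures named above: a MAC uses a shared secret key rather than a public/private key pair, and it provides integrity without confidentiality. The key and the reading fields are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key; in practice each device would be provisioned with its own secret
SHARED_KEY = b"hypothetical-device-key"


def _canonical(reading: dict) -> bytes:
    """Serialize deterministically so sender and receiver hash identical bytes."""
    return json.dumps(reading, sort_keys=True, separators=(",", ":")).encode()


def sign_reading(reading: dict, key: bytes = SHARED_KEY) -> dict:
    """Attach an HMAC-SHA256 tag to a reading before transmission."""
    tag = hmac.new(key, _canonical(reading), hashlib.sha256).hexdigest()
    return {"payload": reading, "tag": tag}


def verify_reading(message: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare in constant time; False means altered or wrong key."""
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


msg = sign_reading({"meter": "m-42", "ts": "2024-01-18T10:15:00Z", "kwh": 1.25})
tampered = {"payload": dict(msg["payload"], kwh=0.25), "tag": msg["tag"]}
print(verify_reading(msg), verify_reading(tampered))
```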

Data Accountability: In smart grid systems, accountability refers to the principle of holding individuals or entities responsible for their actions and decisions within the grid infrastructure. It encompasses transparency, traceability, and responsibility for system operations, data management, and security measures. The authentication mechanism discussed above is the most commonly used means of verifying the identities of users and devices accessing the system.
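One well-known technique for supporting traceability is a tamper-evident, hash-chained audit log, in which each entry stores the hash of the previous one so that editing any past entry breaks the chain. The sketch below is an assumed illustration of that idea (`append_entry`, `verify_chain`, and the actors are hypothetical), not a mechanism prescribed by the sources cited here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(actor: str, action: str, prev: str) -> str:
    body = json.dumps({"actor": actor, "action": action, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> None:
    """Append an entry that commits to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"actor": actor, "action": action, "prev": prev,
                "hash": _digest(actor, action, prev)})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited entry breaks its own hash or the next link."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(entry["actor"], entry["action"], prev):
            return False
        prev = entry["hash"]
    return True


# Hypothetical actors and actions
audit = []
append_entry(audit, "operator1", "read_meter m-42")
append_entry(audit, "drms", "send_dr_signal zone-3")
append_entry(audit, "operator1", "ack_outage feeder-7")
print(verify_chain(audit))
```

The chain detects edits to past entries, but not silent removal of the most recent ones unless the latest hash is also recorded somewhere independent of the log.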


3. Big data for smart grid

3.1 Big data life cycle for smart grid systems

The term “big data” encompasses more than just the sheer quantity of datasets. Beyond (1) volume, it is characterized by (2) variety, representing diverse data formats (structured, semi-structured, or unstructured), (3) velocity, ensuring timely processing, (4) value, allowing the extraction of meaningful insights from collected datasets, (5) variability, accounting for the inconsistency in data, and (6) veracity, emphasizing the trustworthiness of the data. Figure 5 illustrates big data technologies for smart grids.

  • Data Source: In the realm of smart grids, a diverse array of data sources contributes to a comprehensive understanding of the electrical infrastructure. These data sources can be categorized into distinct classes based on the nature of the information they provide. Firstly, operational data encompasses critical electrical details, including real and reactive power flows, demand response capacity, and voltage levels. Secondly, non-operational data is unrelated to the power grid’s immediate functioning but includes essential information like master data, power quality, and reliability metrics. Another crucial category is meter usage data, offering insights into power consumption patterns, such as average usage, peak demand, and time-of-day variations. Event message data, the fourth type, originates from events detected by smart grid devices, such as voltage fluctuations or fault detection events. Lastly, metadata serves as a unifying element, facilitating the organization and interpretation of all other data types. These diverse data streams are sourced from a variety of points, including meters, sensors, devices, substations, mobile data terminals, control devices, intelligent electronic devices, distributed energy resources, customer devices, and historical data.

  • Data Integration: In the current landscape, the enhancement of smart grid reliability, persistence, efficiency, and performance is driven by the utilization of modern information and communication technologies along with advanced operational approaches. To ensure seamless data integration, various technologies are employed [23]:

    • Service Oriented Architecture (SOA): The multitude of software in enterprise systems often poses a challenge in terms of management. SOA addresses this issue by establishing a unified approach for software communication, making data integration more adaptable and streamlined. In the context of smart grids, SOA is particularly employed in on-demand systems.

    • Enterprise Service Bus (ESB): ESB facilitates communication management between diverse systems like GIS, OMS, CIS, etc. This technology yields numerous benefits, including cost reduction, time efficiency, and improved integration management and monitoring. In smart grids, ESB technologies are closely intertwined with SOA, reinforcing robustness and flexibility.

    • Common Information Models (CIM): CIM serves a critical role in smart grid persistence and integrated data architecture. Specifically designed for the electric power industry using UML models, CIM significantly contributes to energy management systems by ensuring data integration efficiency in terms of time and cost. In power systems, CIM is indispensable for ensuring data interoperability, especially when implementing various applications. Operating at the data transformation level, CIM collaborates with ESB to normalize and standardize data between smart grid systems.

    • Messaging: Messaging systems operate on the exchange of messages containing data and other information among various applications, managed by a messaging server. This approach enhances communication efficiency and is integral to the overall data management strategy.

Figure 5.

Big data architecture for smart grid.

In essence, these technologies and approaches play a pivotal role in optimizing smart grid operations, ensuring seamless communication, and promoting data interoperability across diverse systems.
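The messaging and ESB ideas above can be reduced to a small publish/subscribe sketch: producers publish to a topic, every subscribed system receives the message, and a normalization step maps raw data to a shared model before publication, loosely mirroring the role CIM plays alongside an ESB. Everything here (`MessageBus`, `to_common_model`, the topic name, and the field names) is hypothetical and far simpler than a real ESB or CIM mapping.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Minimal in-process publish/subscribe bus (illustrative; not a real ESB)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber of the topic; return delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def to_common_model(raw: dict) -> dict:
    """Hypothetical normalization step standing in for a CIM-style mapping."""
    return {"device_id": raw["id"], "value": float(raw["val"]), "unit": "kWh"}


# Two hypothetical consumer systems subscribing to the same topic
mdms_inbox: List[dict] = []
drms_inbox: List[dict] = []
bus = MessageBus()
bus.subscribe("meter.reading", mdms_inbox.append)
bus.subscribe("meter.reading", drms_inbox.append)

delivered = bus.publish("meter.reading", to_common_model({"id": "m-42", "val": "1.25"}))
print(delivered, mdms_inbox)
```

Because producers only know the topic, new consumer systems can be attached without changing the producer, which is the decoupling that makes ESB-based integration easier to manage.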

  • Data Storage: Data storage plays a pivotal role in the smart grid, acting as a crucial link between the collection of data from various sources and its delivery to analytics tools, which requires a high rate of input/output operations per second (IOPS). Handling big data requires a sophisticated and scalable data storage mechanism. One prevalent solution is the distributed file system (DFS), which facilitates file sharing and storage resource access across multiple users and machines. Operating on a client/server storage mechanism, a DFS allows each user to obtain a local copy of the stored data. Numerous solutions, including Google’s GFS, Quantcast File System, HDFS, Ceph, Lustre, GlusterFS, and PVFS, are based on DFS for effective data storage. Another approach to data storage is the use of NoSQL databases, which address the limitations of traditional relational SQL databases when dealing with massive datasets. NoSQL databases follow three primary architectures: key-value solutions such as Dynamo and Voldemort, column-oriented solutions including Cassandra and HBase, and document database solutions such as MongoDB and CouchDB. These diverse architectures provide flexibility and efficiency in managing and storing large volumes of data in smart grid applications.

  • Data analytics: In the smart grid ecosystem, data collection from diverse sources is extensive, resulting in a substantial dataset that is essential for analytics consumption. Analytics plays a pivotal role in augmenting the intelligence, efficiency, and profitability of the grid, with various types of analytics including signal analytics, event analytics, state analytics, engineering operations analytics, and customer analytics. Currently, several models integrate these diverse analytics classes, including descriptive, diagnostic, predictive, and prescriptive models. Descriptive models elucidate customer behaviors in demand response programs, offering a fundamental understanding of their practices. Following customer description, diagnostic models delve deeper to comprehend specific behaviors and analyze individual decisions. These models collectively contribute to predictive models that forecast future customer decisions. At the apex of smart grid analytics are prescriptive models, influencing marketing strategies, engagement tactics, and decision-making processes [24]. The processing of big data in smart grids can occur through two primary approaches. Batch processing handles data within specific time intervals and is suited for scenarios with relaxed response time requirements. On the other hand, stream processing is employed for real-time applications, demanding minimal latency in response time for effective and timely decision-making.

  • Data Visualization: Effective data visualization is integral to the assessment of smart grids. Various techniques, particularly multivariate high-dimensional visualization, play a significant role in presenting complex data. These techniques, including 2D and 3D visualization, address the challenge posed by the multitude of variables in smart grids. Visualization tools such as scatter diagrams, parallel coordinates, Andrews curves, and 3D power maps [25] simplify the representation of high-dimensional data and bring clarity to the assessment of smart grid information.

  • Data Transmission: Data transmission is a critical aspect of big data, influencing every phase of data processing. It must provide high bandwidth capacity and speed while ensuring data security and privacy. In the context of smart grids, data transmission relies on the communication technologies outlined under “Communication System” in Section 2.2. These include access network technologies such as PLC, ZigBee, and Wi-Fi; area network technologies involving M2M, cellular networks, and Ethernet; core network technologies using IP and MPLS; and backbone network technologies that leverage fiber, microwave links, IP-based wavelength division multiplexing (WDM) networks, and other optical technologies. The robustness and efficiency of data transmission systems are crucial for the seamless operation of smart grids.
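The Data Storage item above mentions distributed file systems and column-oriented NoSQL stores. The sketch below illustrates two ideas they share, partitioning data by a key and replicating each partition across several nodes, using plain Python dictionaries. The node names, the replication factor of 3, and the one-partition-per-meter-per-day key are assumptions for illustration; this is not the API of Cassandra, HBase, or HDFS.

```python
import hashlib
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 3                             # assumed number of copies


def partition_key(meter_id: str, ts: datetime) -> str:
    """One partition per meter per day keeps each partition bounded in size."""
    return f"{meter_id}:{ts.date().isoformat()}"


def replicas_for(key: str) -> List[str]:
    """Hash the key to a starting node, then place copies on the following nodes."""
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


# node -> partition key -> {ISO timestamp: kWh}
store: Dict[str, Dict[str, Dict[str, float]]] = defaultdict(dict)


def write(meter_id: str, ts: datetime, kwh: float) -> None:
    """Write the reading to every replica of its partition."""
    key = partition_key(meter_id, ts)
    for node in replicas_for(key):
        store[node].setdefault(key, {})[ts.isoformat()] = kwh


def read_day(meter_id: str, day: datetime) -> Dict[str, float]:
    """Read a whole partition from the first replica holding it, sorted by timestamp."""
    key = partition_key(meter_id, day)
    for node in replicas_for(key):
        if key in store[node]:
            return dict(sorted(store[node][key].items()))
    return {}


write("m-42", datetime(2024, 1, 18, 10, 15), 1.25)
write("m-42", datetime(2024, 1, 18, 10, 0), 1.10)
print(replicas_for("m-42:2024-01-18"), read_day("m-42", datetime(2024, 1, 18)))
```

Choosing the partition key is the main design decision: keying by meter and day keeps one day's readings together for fast range reads while spreading meters evenly across nodes.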
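The distinction drawn in the Data Analytics item between batch and stream processing can be made concrete with a minimal sketch: a batch job computes an average once the whole interval has been collected, while a streaming operator updates a sliding-window average as each reading arrives. The function and class names, the readings, and the window size are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Deque, List


def batch_average(readings: List[float]) -> float:
    """Batch processing: compute once, after the whole interval has been collected."""
    return mean(readings)


class StreamingAverage:
    """Stream processing: update a sliding-window average as each reading arrives."""

    def __init__(self, window: int) -> None:
        self.values: Deque[float] = deque(maxlen=window)
        self.total = 0.0

    def update(self, value: float) -> float:
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]  # oldest reading is about to be evicted
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


readings = [1.0, 2.0, 3.0, 4.0]  # hypothetical kWh readings
print("batch:", batch_average(readings))
stream = StreamingAverage(window=2)
print("stream:", [stream.update(r) for r in readings])
```

The streaming version does constant work per reading and emits a result immediately, which suits low-latency applications; the batch version waits for the full interval but can use any computation over the complete data.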
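Andrews curves, listed in the Data Visualization item, map each d-dimensional observation x = (x1, x2, ..., xd) to a Fourier series f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., plotted for t in [-π, π], so that similar observations produce similar curves. The Python sketch below evaluates that function; treating a vector of smart meter statistics as x is an illustrative assumption.

```python
import math
from typing import List, Sequence, Tuple


def andrews_curve(x: Sequence[float], t: float) -> float:
    """f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for i, value in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # harmonic order: 1, 1, 2, 2, 3, 3, ...
        total += value * (math.sin(k * t) if i % 2 == 1 else math.cos(k * t))
    return total


def sample_curve(x: Sequence[float], n: int = 100) -> List[Tuple[float, float]]:
    """Evaluate the curve at n evenly spaced points over [-pi, pi] for plotting."""
    step = 2 * math.pi / (n - 1)
    return [(-math.pi + j * step, andrews_curve(x, -math.pi + j * step)) for j in range(n)]


# Hypothetical standardized features for one meter (e.g., mean, peak, variance, ...)
features = [0.8, -0.2, 1.1, 0.4, -0.6]
points = sample_curve(features)
```

For actual plotting, pandas provides `pandas.plotting.andrews_curves`, which draws these curves for the rows of a DataFrame, colored by a class column.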

3.2 Criteria for choosing big data technologies

Big data technologies offer a wide variety of tools, so utilities must carefully select platforms and tools aligned with their specific goals. As highlighted in the preceding subsections, the big data life cycle encompasses five fundamental phases: data sources, data integration, data storage, data analytics, and data visualization. Among these, data analytics is the most crucial step. By first understanding the analytical processes they need, utilities can decide which types of data to acquire, how to store them efficiently, and which visualization techniques will best support their insights. This approach helps utilities derive meaningful knowledge from the vast datasets they handle.

When electrical companies embark on selecting solutions, a multitude of factors must be carefully considered. Criteria such as computation speed, compatibility, graphic capabilities, cloud compatibility, and more come into play. To navigate through these considerations, utilities often leverage multiple criteria decision making (MCDM) tools. Among these tools, the analytic hierarchy process (AHP) stands out as one of the most popular MCDM methods, valued for its ability to incorporate both quantitative and qualitative considerations.

In decision-making applications, AHP proves beneficial for selecting big data analytics platforms. This involves defining criteria across technical, social, cost, and policy perspectives. The AHP model facilitates a comprehensive evaluation of technical perspectives, encompassing hardware and resource configuration requirements. Table 1, as described in [26], outlines the technical perspectives relevant to big data analytics platform selection, providing a structured framework for decision-making in this complex landscape.

Technical perspective | Criteria
Availability and fault tolerance | Redundancy and resilience in networks, servers, physical storage, etc.
Scalability and flexibility | Tools must be evolutionary and scalable
Performance (latency) | Data processing time (single transaction and query request)
Computational complexity | Computation tools extension (data mining and business intelligence)
Distributed storage capacity and configurations | Storage system parameters, such as the number of storage nodes needed in terms of availability, periodic basis, etc.
Data processing modes | Batch, real-time, and hybrid processing
Data security | Security compliance according to the platform requirements

Table 1.

AHP model technical perspective.
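To make the AHP step concrete, the following minimal Python sketch derives criterion weights from a pairwise comparison matrix and checks its consistency. The three criteria are taken from Table 1, but the pairwise judgments (on Saaty's 1 to 9 scale) are hypothetical. The weights use the row geometric-mean approximation of the principal eigenvector, and the random index values are Saaty's commonly tabulated ones; this is one standard way to compute AHP priorities, not the procedure of [26].

```python
import math
from typing import List, Tuple

# Saaty's random consistency index RI for matrix sizes 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_priorities(matrix: List[List[float]]) -> Tuple[List[float], float]:
    """Return (weights, consistency ratio) for a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    # Row geometric means, normalized, approximate the principal eigenvector
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate lambda_max from A.w, then the consistency index CI and ratio CR = CI / RI
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(v / w for v, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr


criteria = ["Performance (latency)", "Scalability and flexibility", "Data security"]
# Hypothetical judgments: row i vs column j (3 = moderately more important, 5 = strongly)
judgments = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, cr = ahp_priorities(judgments)
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
print(f"CR = {cr:.3f}")
```

A consistency ratio below 0.10 is the threshold conventionally used to accept a set of judgments. The same function applies to more criteria from Table 1 as long as the matrix has at most nine rows, the size of the random index table used here.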

3.3 Big data implementation in smart grid

The emphasis will be placed on customer data analytics, given its association with the concept of smart consumers. In this paradigm, consumers are not only end-users but potential producers of clean energy, active participants in their consumption patterns, and pivotal actors in achieving a balance between production and consumption. The realm of customer data analytics presents a significant opportunity for utilities to gain profound insights into customer behavior. This understanding serves as a foundation for making strategic decisions that align with the evolving dynamics of smart grids and the active involvement of consumers in the energy ecosystem.

  • Added Value of Customer Data analytics: The implementation of big data analytics for customer data has shifted from being an option to a necessity for electrical companies. Consumers are actively participating in smart grids through the utilization of smart meters, granting them enhanced control over their energy consumption. Utilities leverage demand response (DR) programs to access real-time data on demand curves across various consumption points, enabling more precise calibration and forecasting. This, in turn, facilitates the more efficient regulation of production curves according to demand, minimizing losses from overproduction. The benefits of DR extend to real-time diagnostics of meters and equipment near consumers, triggering alarms, and executing self-healing systems. One of the motivations behind DR is the improvement of customer engagement, enabling utilities to interact with customers’ energy needs even during power outages. Dynamic pricing, integrated into DR, encourages consumers to monitor and adapt their usage in real time, particularly during peak periods, offering them the flexibility to align their consumption with cost-saving measures [13]. The successful implementation of these initiatives relies heavily on employing customer data analytics techniques, as illustrated in Table 2.

  • Big Data Tools for Customer Data Analytics: Customer data, often measured in terabytes and presented in diverse formats, demands high velocity, scalability, and fault tolerance in its processing, storage, and visualization. Implementing big data solutions involves leveraging various tools, with analytics tools standing out as crucial elements in business decision-making. The diverse sources of customer data, ranging from smart meters to devices and historical data, necessitate the use of integration tools to ensure data uniformity. Messaging tools prove to be highly efficient for integrating raw data and are thus aptly employed for customer data integration. When it comes to big data analytics, several processing modes can be employed:

    • Batch Processing Tools: Big data analytics offers various methods for data processing, and batch processing is a significant one. Hadoop is a suitable choice for batch analytics in smart grids. Because smart grid systems are geographically distributed, distributed file systems are invaluable. The Hadoop ecosystem includes HBase as a database system, the Hadoop distributed file system (HDFS) as a storage system, and MapReduce as a processing engine. However, Hadoop has limitations in handling modern information technology (IT) systems with respect to data velocity, scalability, and machine learning algorithms [27].

    • Real-Time Processing Tools: Real-time processing tools offer faster execution, particularly for data with higher velocity requirements. These tools employ stream processing or complex event processing systems. Notable solutions for real-time processing in smart grids include S4, Splunk, and Storm. Storm stands out as a fitting choice because it is open source, distributed, and fault tolerant. It offers numerous advantages for real-time processing in smart grids, including reliable message handling, parallel computation, and a simple programming model, and it can be integrated with Kafka for data integration and HBase for data storage.

    • Hybrid Processing Tools: Hybrid processing tools handle both batch and real-time processing. Spark, a versatile framework, is used for batch processing and also provides real-time processing through Spark Streaming. Spark is designed for large-scale data processing and includes useful libraries such as Spark SQL, Spark Streaming, a machine learning library (MLlib), and GraphX. These features make Spark well suited to the big data requirements of smart grids. Spark Streaming addresses velocity requirements by processing incoming data streams in small batches (micro-batches). For data storage in a Spark environment, HDFS or HBase can be used [28]. Another framework, Apache Flink, processes data in both batch and stream modes. Flink offers extensive APIs for transformation functions (map/reduce, group, etc.) and is scalable, easy to deploy, fault tolerant, and fast. Its dedicated machine learning library, FlinkML, supports machine learning workloads, and its built-in libraries for accessing HDFS let it integrate seamlessly with HDFS for data storage.
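The MapReduce programming model that Hadoop uses for batch analytics can be simulated in a few lines of plain Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The sketch below computes daily energy per customer from hypothetical smart meter records in a made-up CSV-like format. It runs in a single process and only mimics the programming model, not Hadoop's distributed execution or its Java API.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical smart meter record format: "customer_id,YYYY-MM-DD HH:MM,kWh"
Key = Tuple[str, str]  # (customer_id, date)


def map_phase(record: str) -> Iterator[Tuple[Key, float]]:
    """Map: parse one record and emit ((customer, day), kWh)."""
    customer, timestamp, kwh = record.split(",")
    yield (customer, timestamp.split(" ")[0]), float(kwh)


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Shuffle: group emitted values by key (performed by the framework in Hadoop)."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[float]) -> Tuple[Key, float]:
    """Reduce: total the readings of one customer on one day."""
    return key, sum(values)


records = [
    "C001,2024-01-08 08:00,1.5",
    "C001,2024-01-08 09:00,2.25",
    "C002,2024-01-08 08:00,0.5",
    "C001,2024-01-09 08:00,1.0",
]
mapped = chain.from_iterable(map_phase(r) for r in records)
daily = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(daily)
```

In Hadoop, map and reduce tasks run in parallel across the nodes holding the HDFS blocks; in Spark, the same aggregation would typically be expressed with `reduceByKey` on a key-value RDD.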

Analytics type | Methods | Example of use
Descriptive | Data mining; data aggregation | Demand response programs
Diagnostic | Correlation; cause-and-effect analysis | Analyze customer behavior and decisions
Predictive | Statistical models; forecasting techniques | Predict customer decisions
Prescriptive | Business rules; machine learning; computational modeling procedures | Marketing models and engagement strategies

Table 2.

Customer data analytics.


4. Forecasting energy consumption in buildings

4.1 Forecasting approaches

Forecasting is used in many domains for many purposes, and each use has its own horizon and granularity depending on the nature of the data and the desired results. Forecasting relies on probability and statistical techniques to predict future, unknown events, so it depends on models built from current and historical facts [29]. In general, energy forecasting models can be classified into three categories [30, 31, 32]:

  • White-box models are based on detailed physical information and use established knowledge, such as physics equations, to describe the system. They are not easy to apply and take longer to run, but despite these limitations they offer higher accuracy. These models are also known as physics-based models, the forward (classical) approach, the calibrated simulation approach, or engineering methods.

  • Black-box models are based on historical data and use data mining techniques, statistical analyses, and machine learning algorithms to learn the relationship between the inputs and the future output values. They are not easy to apply, but they run fast with good accuracy. These models are also known as intelligent techniques, statistical analyses, data-driven techniques, or time series techniques.

  • Gray-box models combine white-box and black-box approaches, for example by improving single data-driven techniques with optimization methods or by combining several machine learning algorithms. They are not easy to apply and take longer to run, but they offer high accuracy. These models are also known as hybrid models or improved models.

In this chapter, we focus on black-box models and describe their forecasting process, their underlying concepts, and a benchmark of the available models.

4.2 Data-driven approach

Data-driven methods seek to identify the relationship between inputs and outputs by applying data mining techniques, statistical analyses, and machine learning algorithms. The process has three stages: training, in which the algorithm is fitted on the training dataset; validation, in which the implemented algorithm is assessed on an evaluation dataset distinct from the training data; and testing, in which the performance of the final forecasting model is measured with accuracy metrics. These techniques can be divided into two categories [30, 33]:

  • One-Dimensional Data-driven Model: A one-dimensional data-driven model is a simple, basic forecasting technique because it implements only one predictive algorithm. The most widely used algorithms for building energy forecasting are as follows:

    • Linear Regression (LR): estimates, in a statistical manner, the relationship between a dependent variable and one or more independent variables, which makes it suitable for prediction and forecasting applications [34]. The model assumes a linear relationship between the inputs (x) and a single output (y), and it is useful for quantifying how the output varies with each input. In the simple one-input case, the goal of linear regression is to find the optimal values of a and b in the linear equation y = a + b*x [34].

    • Isotonic Regression (IR): also a regression algorithm, but unlike linear regression it does not assume any particular form for the target function. Isotonic regression minimizes the (weighted) squared error on the training data under the constraint that the fitted function is non-decreasing. Given the observed responses Y = y1, y2,…,yn of the training data, ordered by the input variable, the algorithm finds the fitted values X = x1, x2,…,xn. The resulting function, itself called the isotonic regression, is unique [35]. Formally, the fitted values minimize the following function, where the wk are positive weights:

      $$f(x) = \sum_{k=1}^{n} w_k\,(y_k - x_k)^2, \quad \text{subject to } x_1 \le x_2 \le \dots \le x_n \tag{1}$$

    • Decision Tree (DT): a supervised learning approach based on a tree structure that enumerates the possible outcomes of successive decisions under defined constraints. Decision trees are used to predict continuous targets (regression trees) or to classify discrete targets (classification trees) [36].

    • Support Vector Regression (SVR): applies the principle of the support vector machine to regression problems by introducing an ε-insensitive region around the function, called the ε-tube. The optimization problem is reformulated to find the tube that best approximates the continuous-valued function while balancing model complexity and prediction error. More specifically, SVR is formulated as an optimization problem that minimizes a convex ε-insensitive loss function, thereby finding the flattest tube that contains most of the training instances [37].

    • Gradient Boosting Trees (GBT): rely on three components: a loss function to be optimized, such as squared error for regression or logarithmic loss for classification; a weak learner (a decision tree) that makes predictions; and an additive model that adds weak learners one at a time to minimize the loss [38].

    • Artificial Neural Networks (ANN): use multiple layers of interconnected nodes, divided into an input layer, hidden layers, and an output layer. An ANN has exactly one input layer, whose size equals the length of the feature vector, and one output layer, whose size equals the number of labels (outputs). A single hidden layer is sufficient for many problems, and adding more can increase the training time [39].

  • Multi-Dimensional Data-driven Model: A multi-dimensional data-driven model is a more complex, advanced forecasting technique because it combines multiple models into a hybrid forecast.

    • Ensemble models: combine several single algorithms using one of two strategies: (i) homogeneous modeling or (ii) heterogeneous modeling. Random forest is a black-box model of this kind: a regression algorithm that builds multiple decision trees in parallel at training time and aggregates their outputs to reduce overfitting.

    • Improved models: combine single models with optimization techniques to improve prediction accuracy. The optimization techniques used include metaheuristics such as particle swarm optimization (a swarm intelligence algorithm) and genetic algorithms.
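The training, validation, and testing stages described at the start of this subsection, together with the simple linear regression model y = a + b*x, can be sketched in pure Python. Two assumptions apply: the data are a hypothetical series of daily mean temperatures and daily consumption values, and the split is chronological (oldest data for training, most recent for testing), a common choice for time series that avoids leaking future information into training. The 70/15/15 proportions are illustrative; in practice the validation error is used to compare model configurations before a single final test.

```python
from typing import Sequence, Tuple


def chronological_split(n: int, train_pct: int = 70, val_pct: int = 15) -> Tuple[range, range, range]:
    """Split indices 0..n-1 in time order: oldest data for training, newest for testing."""
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return range(0, n_train), range(n_train, n_train + n_val), range(n_train + n_val, n)


def fit_simple_lr(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares estimates of a (intercept) and b (slope) in y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical daily data for 20 days: mean outdoor temperature (°C) and consumption (kWh)
temp = [float(t) for t in range(10, 30)]
kwh = [200.0 + 5.0 * t + (3.0 if i % 2 else -3.0) for i, t in enumerate(temp)]

train_idx, val_idx, test_idx = chronological_split(len(temp))
a, b = fit_simple_lr([temp[i] for i in train_idx], [kwh[i] for i in train_idx])


def evaluate(idx: range) -> float:
    """MAE of the fitted line on the given (validation or test) indices."""
    return mae([kwh[i] for i in idx], [a + b * temp[i] for i in idx])


print(f"a = {a:.2f}, b = {b:.3f}")
print(f"validation MAE = {evaluate(val_idx):.3f}, test MAE = {evaluate(test_idx):.3f}")
```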
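The weighted least-squares problem behind isotonic regression is classically solved with the pool adjacent violators algorithm (PAVA): scan the responses in order and, whenever a value breaks the non-decreasing constraint, merge it with its predecessor into a block holding their weighted mean. The sketch below is a minimal pure-Python version that assumes the responses are already sorted by the input variable (here, hypothetical consumption values sorted by outdoor temperature). It is not the implementation of any particular library, although libraries such as Spark MLlib and scikit-learn provide isotonic regression.

```python
from typing import List, Optional, Sequence, Tuple


def isotonic_fit(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Pool Adjacent Violators: the non-decreasing x_1 <= ... <= x_n minimizing
    sum_k w_k * (y_k - x_k)^2, for responses y already sorted by the input variable."""
    weights = list(w) if w is not None else [1.0] * len(y)
    blocks: List[Tuple[float, float, int]] = []  # (weighted mean, total weight, size)
    for yk, wk in zip(y, weights):
        blocks.append((float(yk), float(wk), 1))
        # Pool the last two blocks while they violate the ordering constraint
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2))
    fitted: List[float] = []
    for mean, _, size in blocks:
        fitted.extend([mean] * size)  # each pooled block is one flat step
    return fitted


# Hypothetical daily consumption values (kWh), sorted by increasing outdoor temperature
print(isotonic_fit([120.0, 135.0, 130.0, 150.0, 145.0, 160.0]))
# [120.0, 132.5, 132.5, 147.5, 147.5, 160.0]
```

Each pooled block becomes one flat step of the resulting non-decreasing step function, which is why isotonic regression needs no assumption about the functional form.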
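The three ingredients of gradient boosting trees (a loss function, a weak learner, and an additive model) fit into a short pure-Python sketch. The assumptions are: squared-error loss ½(y − F)², whose negative gradient is simply the residual y − F; one-split regression trees ("stumps") as weak learners; a fixed learning rate; and a hypothetical hourly load profile of a school building. Production implementations such as XGBoost, LightGBM, or Spark MLlib's GBT add deeper trees, regularization, and many optimizations that are not shown here.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Stump:
    """Weak learner: the single split that minimizes squared error on the residuals r."""
    best_sse, best = float("inf"), (x[0], 0.0, 0.0)
    for thr in sorted(set(x))[:-1]:  # candidate thresholds (largest value excluded)
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, lv, rv)
    return best


def stump_predict(s: Stump, xi: float) -> float:
    thr, lv, rv = s
    return lv if xi <= thr else rv


def fit_gbt(x: Sequence[float], y: Sequence[float], n_rounds: int = 50, lr: float = 0.1):
    """Additive model F_m(x) = F_{m-1}(x) + lr * h_m(x) for the loss 0.5 * (y - F)^2."""
    base = sum(y) / len(y)  # F_0: the constant that minimizes squared error
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of the loss
        s = fit_stump(x, residuals)
        stumps.append(s)
        pred = [pi + lr * stump_predict(s, xi) for pi, xi in zip(pred, x)]
    return base, lr, stumps


def gbt_predict(model, xi: float) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, xi) for s in stumps)


# Hypothetical hourly load profile (kWh): low at night, high during class hours
hours = [float(h) for h in range(24)]
load = [10.0 if h < 8 else 30.0 if h < 18 else 12.0 for h in hours]
model = fit_gbt(hours, load)
train_mse = sum((gbt_predict(model, h) - l) ** 2 for h, l in zip(hours, load)) / len(hours)
print(f"training MSE: {train_mse:.4f}")
```

The small learning rate means each stump corrects only part of the remaining residual, which is the shrinkage that typically makes boosting more robust to overfitting than fitting large corrections at once.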

4.3 Machine learning for data-driven models

Machine learning aims to automate the acquisition of knowledge from experience and to improve the performance of computational methods and algorithms. It is therefore closely linked to data-driven models, which find relationships among the state variables of a system without modeling its physical behavior. The machine learning approaches used to select and extract features from input datasets and to train data-driven models include supervised, unsupervised, reinforcement, and transfer learning. Table 3 lists some machine learning tasks and the data-driven models associated with them. In general, applying machine learning and data-driven models to a problem follows a sequence of steps: defining objectives; collecting, exploring, processing, and visualizing data; and finally running, evaluating, and comparing the performance of the models.

Task | Type | Data-driven model
Classification | Supervised | Artificial neural networks; support vector machines; random forest
Regression | Supervised | Linear regression; decision tree; support vector regression
Clustering | Unsupervised | Gaussian mixture models; density-based spatial clustering; random forest
Anomaly detection | Unsupervised | Support vector machines

Table 3.

Machine learning tasks and associated data-driven models [40, 41].


5. Review of forecasting electrical consumption in educational buildings

5.1 Data selection and preparation

To improve prediction accuracy, the system must collect all relevant data that strongly influence school building energy consumption. Monitoring electrical consumption in a building is less accurate if it omits factors that significantly affect energy use. Focusing on the most relevant information and eliminating irrelevant data is therefore a central step in forecasting problems. To identify the most relevant data to collect in this context, we reviewed several studies on forecasting electrical consumption in educational institutions. Table 4 summarizes the data each study used to forecast electrical consumption. Most of these studies use temperature, humidity, and wind speed as meteorological data, but only a few use occupancy data, and even those give few details about it.

[42][43][44][45][46][47][48]
TemperatureXXXXXX
IrradiationXX
HumidityXXX
Wind speedXXX
Electricity usageXXXXXXX
Building typeX
Day of the weekXXX
Day of the yearXX
Hour of the dayX
Event dayX
Event typeX

Table 4.

Identification of relevant data of reviewed research papers.

Overall, many studies have highlighted the significance of occupancy and weather data for electricity consumption prediction. However, most of these works either do not integrate both data types or do not use detailed occupancy data. This work explores the use of both occupancy and meteorological data and relies on more detailed daily occupancy information than other works, which is expected to improve the accuracy and quality of the predictions. In addition to electrical consumption, the proposed solution uses:

  • Occupancy data: time-related data strongly linked to daily occupancy, together with schedule data based on holidays, the school year, semester periods, and weekends. Occupant behaviors and activities directly affect energy consumption patterns.

  • Meteorological data: several factors that must be taken into account because they strongly affect occupant behavior and equipment consumption. Temperature, humidity, wind speed, and pressure are the main factors to select as input data for our forecasting model. These data can be collected from several sources, such as sensors, weather data providers, and meteorological stations.
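The occupancy and meteorological inputs listed above can be assembled into one model-ready row per timestamp. The following Python sketch shows one possible way to do so, not the chapter's actual pipeline: the holiday set, the semester dates, the feature names, and the weather dictionary keys are all hypothetical and would be replaced by the institution's real calendar and data sources.

```python
from datetime import date, datetime
from typing import Dict, Optional, Set, Tuple

# Hypothetical academic calendar; replace with the institution's actual schedule
HOLIDAYS: Set[date] = {date(2024, 1, 1), date(2024, 5, 1)}
SEMESTERS: Dict[str, Tuple[date, date]] = {
    "S1": (date(2023, 9, 11), date(2024, 1, 31)),
    "S2": (date(2024, 2, 12), date(2024, 6, 28)),
}


def occupancy_features(ts: datetime) -> Dict[str, object]:
    """Calendar and schedule inputs that act as a proxy for building occupancy."""
    d = ts.date()
    semester: Optional[str] = next(
        (name for name, (start, end) in SEMESTERS.items() if start <= d <= end), None
    )
    is_weekend = d.weekday() >= 5  # Monday = 0, ..., Saturday = 5, Sunday = 6
    is_holiday = d in HOLIDAYS
    return {
        "hour_of_day": ts.hour,
        "day_of_week": d.weekday(),
        "day_of_year": d.timetuple().tm_yday,
        "is_weekend": is_weekend,
        "is_holiday": is_holiday,
        "semester": semester,
        "school_day": semester is not None and not is_weekend and not is_holiday,
    }


def build_row(ts: datetime, weather: Dict[str, float], kwh: float) -> Dict[str, object]:
    """Combine occupancy features, the four weather inputs, and the target consumption."""
    row = occupancy_features(ts)
    for key in ("temperature", "humidity", "wind_speed", "pressure"):
        row[key] = weather[key]
    row["kwh"] = kwh
    return row


row = build_row(
    datetime(2024, 1, 9, 10),
    {"temperature": 14.5, "humidity": 72.0, "wind_speed": 3.1, "pressure": 1016.0},
    kwh=185.0,
)
print(row)
```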

5.2 Model selection and training

According to the study by Runge et al. [49], 84% of energy consumption forecasting studies use artificial neural network (ANN) models applied as black-box models, followed by white-box models (12%) and gray-box models (4%). This work therefore focuses on black-box models, describing their forecasting process, their concepts, and a benchmark of the available models.

Bourdeau et al. [30] review data-driven building energy modeling techniques. They introduce the most prevalent techniques and provide an up-to-date overview of recent studies and advances in building energy consumption modeling and forecasting. Among the research papers they reviewed from 2007 to 2019 on supervised machine learning for building energy consumption modeling and forecasting, ANN is the most studied technique, followed by the support vector machine (SVM) and then regression models. Wei et al. [50] also review the prevailing data-driven approaches used in building energy analysis, including artificial neural networks, support vector machines, statistical regression, DT, and genetic algorithms, and conclude that ANN, SVM, and regression techniques are the most used. In the same context, Amasyali et al. [51] survey the algorithms most used for predicting energy consumption in buildings. They find that 47% of the energy consumption prediction models used ANN, 25% used SVM, 4% used DT, and 24% used other statistical models.

Some studies implement their own systems and run many algorithms on them to find the most appropriate model for forecasting electrical consumption in buildings. Grolinger et al. [43] explore several prediction intervals (daily, hourly, and 15-min) for electrical consumption in event-organizing venues and compare the forecasting accuracy of two machine learning approaches, ANN and SVM. They achieved higher prediction accuracy with daily data than with hourly or 15-min readings, and with the ANN model than with the SVM model. Amber et al. [52] also address daily forecasting of building electricity consumption, comparing the prediction capabilities of five intelligent system techniques: multiple regression (MR), genetic programming (GP), ANN, deep neural network (DNN), and SVM. As in Grolinger et al., their results show that ANN performs better than the other four techniques. Kim et al. [53] compare building electric energy prediction using a traditional statistical method (linear regression) and ANN algorithms; their results show that ANN modeling was more accurate and stable than linear regression. Overall, the majority of studies find that ANNs are the primary models used to evaluate and predict energy consumption [43, 52, 54, 55].

ANN covers a large number of models, but according to Runge et al. [49], most research uses the multi-layer perceptron (MLP): 61% of the ANN models reviewed use an MLP, a large portion of them with two hidden layers. MLP is followed by radial basis function neural networks, nonlinear autoregressive neural networks, general regression neural networks, and nonlinear autoregressive neural networks with exogenous inputs, with 2–7% each. The MLP is a feedforward artificial neural network that has been widely used for supervised regression in the building sector. Chammas et al. [56] propose an MLP-based system to predict the energy consumption of a building and compare it with four other algorithms: linear regression (LR), SVM, gradient boosting machine (GBM), and random forest (RF). They conclude that MLP is the best configuration for forecasting building energy consumption. Wahid et al. [57] also find that MLP is the best model to predict electrical consumption in buildings: comparing MLP and RF, they report an accuracy of 95.00% for MLP versus 90.83% for RF.

Like this chapter, the works listed above compare the efficiency of ANN with other machine learning models for building power consumption forecasting. Our work, however, compares MLP with many of the models most used in this field, and thus aims to give a more precise picture of MLP performance.
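To illustrate what an MLP regressor does, the sketch below implements a tiny multi-layer perceptron with one hidden layer in pure Python and trains it with stochastic gradient descent on squared error. It is not the configuration used in the studies cited above: the layer size, learning rate, tanh activation, number of epochs, and the synthetic, pre-scaled temperature and hour-of-day data are all assumptions chosen for illustration. As in the ANN description earlier, the input layer size equals the feature vector length (two here) and there is a single output node.

```python
import math
import random
from typing import List, Sequence, Tuple


class TinyMLP:
    """Multi-layer perceptron with one hidden layer (tanh) and one linear output node."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        """Return the hidden activations and the predicted value for input vector x."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x: Sequence[float], target: float, lr: float) -> None:
        """One stochastic gradient descent step on the loss 0.5 * (y - target)^2."""
        h, y = self.forward(x)
        err = y - target  # dLoss/dy
        for j, hj in enumerate(h):
            delta = err * self.w2[j] * (1.0 - hj * hj)  # backpropagate through tanh (old w2)
            self.w2[j] -= lr * err * hj                 # output-layer weight update
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi        # hidden-layer weight update
            self.b1[j] -= lr * delta
        self.b2 -= lr * err


def mse(model: TinyMLP, data) -> float:
    return sum((model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical pre-scaled samples: inputs (temperature, hour of day) in [0, 1] -> load
rng = random.Random(42)
data = []
for _ in range(200):
    temp, hour = rng.random(), rng.random()
    data.append(((temp, hour), 0.3 * temp + 0.5 * math.sin(math.pi * hour)))

model = TinyMLP(n_inputs=2, n_hidden=8, seed=1)
before = mse(model, data)
for epoch in range(100):
    for x, t in data:
        model.train_step(x, t, lr=0.05)
after = mse(model, data)
print(f"training MSE before: {before:.4f}, after: {after:.4f}")
```

A second hidden layer, as in many of the building-energy MLPs reported by Runge et al., would add one more weight matrix and one more backpropagation step; in practice, a library implementation with proper input scaling and a held-out validation set would be used instead.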

5.3 Model testing and validation

Accuracy metrics are used to validate the forecasting performance of data-driven algorithms. The validation step verifies the quality of the model, and error metrics measure the difference between the predicted and the observed values. Many metrics exist, and the appropriate choice depends on the use case and the objective of the work:

  • Mean Absolute Error (MAE) [58]: this evaluation metric is the average of the absolute differences between the forecast values and the real (observed) values of the same phenomenon.

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_{\mathrm{real},i} - Y_{\mathrm{pred},i}\right| \tag{2}$$

where N is the total number of data points.

  • Normalized Mean Absolute Error (NMAE) [59]: the MAE divided by the mean of the observed time series and expressed as a percentage; the result can be interpreted as a weighted mean absolute percentage error.

    $$\mathrm{NMAE} = \frac{\mathrm{MAE}}{\operatorname{mean}(Y_{\mathrm{real}})} \times 100 \tag{3}$$

  • Mean Square Error (MSE) [60]: the mean of the squared prediction errors over all instances in the test set, where each error is the difference between the real and the predicted value for an instance. MSE indicates how close the predicted results are to the real set of points; in general, the smaller the MSE, the more accurate the model.

    $$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_{\mathrm{real},i} - Y_{\mathrm{pred},i}\right)^2 \tag{4}$$

  • Root Mean Square Error (RMSE) [61]: the square root of the MSE, which indicates how concentrated the data are around the line of best fit. RMSE is commonly used in regression analysis for numerical predictions because it is expressed in the same units as the predicted variable and is a good measure of accuracy when comparing the prediction errors of different models or model configurations for a particular variable.

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{5}$$

  • R-squared (R2) [62]: also called the coefficient of determination, a statistical measure of how close the data are to the fitted regression line. It indicates the percentage of the variance in the dependent variable that the independent variables explain collectively and is usually expressed on a 0 to 100% scale. In general, the higher the R-squared, the better the model fits the data.

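$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(Y_{\mathrm{real},i} - Y_{\mathrm{pred},i}\right)^2}{\sum_{i=1}^{N}\left(Y_{\mathrm{real},i} - Y_{\mathrm{mean}}\right)^2} \tag{6}$$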

Bourdeau et al. [30] provide an overview of the most used metrics based on a review of many studies. They find that RMSE, the coefficient of variation of RMSE (CV-RMSE), and MAE are used in 47%, 38%, and 36% of the studies, respectively, whereas R2, MSE, the mean relative error (MRE), the mean bias error (MBE), and the normalized mean bias error (NMBE) are used in only 27%, 16%, 9%, 2%, and 4%, respectively. Zhang et al. [63] select CV-RMSE as the first performance measure, followed by other metrics: CV-RMSE is considered the most important metric, followed by RMSE; if these two values were unavailable, the mean absolute percentage error (MAPE) was selected; if MAPE was unavailable, R2 was selected; and if R2 was unavailable, the most relevant error metric presented was selected and reported as “others”. This order is not always followed in practice: Runge et al. [49] review the most used error metrics and find that MAPE is predominantly (38%) used as the main performance measure in forecasting papers, with CV-RMSE and R2 accounting for 17–20% of the performance metrics applied.
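The metrics defined in this subsection, plus the CV-RMSE and MAPE mentioned in the review above, are straightforward to compute. The following pure-Python sketch follows those definitions; CV-RMSE is implemented in its simple form (RMSE divided by the mean of the observed values, in percent), whereas some guidelines apply a degrees-of-freedom correction, and MAPE assumes that no observed value is zero. The example values are hypothetical.

```python
import math
from typing import Sequence


def _check(y_real: Sequence[float], y_pred: Sequence[float]) -> int:
    if len(y_real) != len(y_pred) or not y_real:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(y_real)


def mae(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    n = _check(y_real, y_pred)
    return sum(abs(r - p) for r, p in zip(y_real, y_pred)) / n


def nmae(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE normalized by the mean of the observed series, in percent."""
    return mae(y_real, y_pred) / (sum(y_real) / len(y_real)) * 100


def mse(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    n = _check(y_real, y_pred)
    return sum((r - p) ** 2 for r, p in zip(y_real, y_pred)) / n


def rmse(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(mse(y_real, y_pred))


def cv_rmse(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of variation of RMSE (simple form): RMSE / observed mean, in percent."""
    return rmse(y_real, y_pred) / (sum(y_real) / len(y_real)) * 100


def mape(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error; undefined if any observed value is zero."""
    n = _check(y_real, y_pred)
    return sum(abs((r - p) / r) for r, p in zip(y_real, y_pred)) / n * 100


def r2(y_real: Sequence[float], y_pred: Sequence[float]) -> float:
    _check(y_real, y_pred)
    mean = sum(y_real) / len(y_real)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_real, y_pred))
    ss_tot = sum((r - mean) ** 2 for r in y_real)
    return 1 - ss_res / ss_tot


# Hypothetical observed and forecast daily consumption (kWh)
y_real = [100.0, 120.0, 80.0, 100.0]
y_pred = [110.0, 115.0, 80.0, 95.0]
print(f"MAE={mae(y_real, y_pred)}, MSE={mse(y_real, y_pred)}, R2={r2(y_real, y_pred)}")
# MAE=5.0, MSE=37.5, R2=0.8125
```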


6. Conclusions

In conclusion, integrating big data and analytics into smart grids, with a specific focus on predicting energy consumption in educational institutions, represents a transformative step toward sustainable and efficient energy management. Predictive analytics based on advanced machine learning models has shown promising results in forecasting energy consumption patterns.

The comprehensive evaluation of these models across various spatial uses within educational institutions has shed light on their strengths and limitations. The insights gained from this study provide a foundation for guiding future applications in energy consumption prediction. As educational institutions increasingly adopt smart grid technologies, the lessons learned from this research can contribute to the development of effective strategies for optimizing energy usage. Furthermore, the exploration of data sources, storage mechanisms, and transmission technologies within the smart grid ecosystem underscores the complexity of the predictive analytics life cycle. The role of information systems, from SCADA to demand response management, is pivotal in ensuring seamless data integration and efficient energy management.

As the demand for energy-efficient operations continues to grow, the necessity for accurate and adaptable predictive models becomes paramount. This chapter not only highlights the significance of integrating big data analytics but also emphasizes the need for careful consideration in choosing the right tools and platforms. The utilization of multiple criteria decision making (MCDM) tools, such as the analytic hierarchy process (AHP), emerges as a valuable approach for making informed decisions based on technical, social, cost, and policy perspectives. In essence, the predictive analytics landscape in smart grids is dynamic and evolving, offering a plethora of opportunities for enhancing energy efficiency. The findings and considerations presented in this chapter contribute to a broader understanding of the predictive analytics ecosystem in smart grids, laying the groundwork for continued advancements in energy consumption prediction, particularly within the educational sector.


Conflict of interest

“The authors declare no conflict of interest.”


Appendices and nomenclature

AAA

authentication, authorization, and accounting

AHP

analytic hierarchy process

AMI

advanced metering infrastructure

ANN

artificial neural networks

CIM

common information models

CIS

customer information system

DFS

distributed file systems

DR

demand response

DRMS

demand response management system

DT

decision tree

GBT

gradient boosting trees

GIS

geographic information system

HDFS

Hadoop distributed file system

ICT

information and communication technologies

IOPS

input/output operations per second

IR

isotonic regression

IT

information technology

LR

linear regression

MAE

mean absolute error

MCDM

multiple criteria decision making

MDMS

meter data management system

MSE

mean square error

NMAE

normalized mean absolute error

NMBE

normalized mean bias error

OMS

outage management system

R2

coefficient of determination

RF

random forest

RMSE

root mean square error

SCADA

supervisory control and data acquisition

SVM

support vector machine

SVR

support vector regression

WDM

wavelength division multiplexing

References

  1. Agarwal V, Lefteri HT. Smart grids: importance of power quality. In: Energy-Efficient Computing and Networking: First International Conference. Athens, Greece; 2011
  2. Amin SM. Smart grid: Overview, issues and opportunities. advances and challenges in sensing, modeling, simulation, optimization and control. European Journal of Control. 2011;17(5-6):547-567
  3. Yan Y, Qian Y, Sharif H. A survey on smart grid communication infrastructures: Motivations, requirements and challenges. IEEE Communications Surveys & Tutorials. 2012;15(1):5-20
  4. Wang K, Xu C, Zhang Y. Robust big data analytics for electricity price forecasting in the smart grid. IEEE Transactions on Big Data. 2017;5(1):34-45
  5. Al-Dahidi S, Ayadi O, Alrbai M. Ensemble approach of optimized artificial neural networks for solar photovoltaic power prediction. IEEE Access. 2019;7:81741-81758
  6. Jihui Y, Craig F, Chikako A, Kazuo E. Predictive artificial neural network models to forecast the seasonal hourly electricity consumption for a University Campus. Sustainable Cities and Society. 2018;42:82-92
  7. Jihoon M, Jinwoong P, Eenjun H, Sanghoon J. Forecasting power consumption for higher educational institutions based on machine learning. The Journal of Supercomputing. 2018;74:3778-3800
  8. Paul AA, Stephen A, Nkosinathi M, Obafemi OO. Hybrid adaptive neuro-fuzzy inference system (ANFIS) for a multi-campus university energy consumption forecast. International Journal of Ambient Energy. 2022;43:1685-1694
  9. Wang W, Lu Z. Cyber security in the smart grid: survey and challenges. Computer Networks. 2013;57:1344-1371
  10. McGranaghan M, Schmitt DHL, Cleveland F, Lambert E. Enabling the integrated grid: leveraging data to integrate distributed resources and customers. IEEE Power and Energy Magazine. 2016;14:83-93
  11. Chen M, Shiwen M, Yunhao L. Big data: A survey. Mobile Networks and Applications. 2014;19:171-209
  12. Yan Y, Qian Y, Sharif H, Tipper D. A survey on smart grid communication infrastructures: motivations, requirements and challenges. In: IEEE Commun Surv Tutor. Vol. 15. NYC, USA: IEEE; 2013. pp. 5-20
  13. Siano P. Demand response and smart grids—a survey. Renewable and Sustainable Energy Reviews. 2014;30:461-478
  14. Fang B, Yin X, Tan Y, Li C, Gao Y, Cao Y, et al. The contributions of cloud technologies to smart grid. Renewable and Sustainable Energy Reviews. 2016;59:1326-1331
  15. Jaradat M, Jarrah M, Bousselham A, Jararweh Y, Al-Ayyoub M. The internet of energy: smart sensor networks and big data management for smart grid. Procedia Computer Science. 2015;56:592-597
  16. Singla A, Sachdeva R. Review on security issues and attacks in wireless sensor networks. International Journal of Advance Research in Computer Science Software Engineering. 2013;3:529-534
  17. Khushboo G, Vaishali S. Design issues and challenges in wireless sensor networks. International Journal of Computers and Applications. 2015:112-126
  18. Malhotra J. Review on security issues and attacks in wireless sensor networks. International Journal of Future Generation Communication Network. 2015;8:81-88
  19. Alidaee B, Wang H, Huang J, Sua LS. Integrating Statistical Simulation and Optimization for Redundancy Allocation in Smart Grid Infrastructure. Energies. 2023;17(1):255
  20. Attarha S, Narayan A, Hage HB, Krüger C, Castro F, Babazadeh D, et al. Virtualization management concept for flexible and fault-tolerant smart grid service provision. Energies. 2020;13(9):2196
  21. Fengjun L, Luo B. Preserving data integrity for smart grid data aggregation. In: IEEE Third International Conference on Smart Grid Communications (SmartGridComm). 2012
  22. Shukla S, Thakur S, Breslin JG. Secure communication in smart meters using elliptic curve cryptography and digital signature algorithm. In: IEEE International Conference on Cyber Security and Resilience. 2021
  23. Vera BA, Colomo PR, Molloy O. Business process analytics using a big data approach. IT Professional. 2013;15:29-35
  24. Stimmel CL. Big Data analytics strategies for the smart grid. Boca Raton: CRC Press; 2014
  25. Nga DV, See OH, Quang DN, Xuen CY, Chee LL. Visualization techniques in smart grid. Smart Grid and Renewable Energy. 2012;3:175
  26. Lněnička M. AHP model for the big data analytics platform selection. Acta Informatica Pragensia. 2015;4:108-121
  27. Shyam R, Ganesh HBB, Kumar SS, Prabaharan P, Soman KP. Apache spark a big data analytics platform for smart grid. Procedia Technology. 2015;21:171-178
  28. Liu G, Zhu W, Saunders C, Gao F, Yu Y. Real-time complex event processing and analytics for smart grid. Procedia Computer Science. 2015;61:113-119
  29. Voyant C, Notton G, Kalogirou S, Nivet ML, Paoli C, Motte F, et al. Machine learning methods for solar radiation forecasting: A review. Renewable Energy. 2017;105:569-582
  30. Bourdeau M, Qiang ZX, Nefzaoui E, Guo X, Chatellier P. Modeling and forecasting building energy consumption: A review of data-driven techniques. Sustainable Cities and Society. 2019;48:101533
  31. Foucquier A, Robert S, Suard F, Stéphan L, Jay A. State of the art in building modelling and energy performances prediction: A review. Renewable and Sustainable Energy Reviews. 2013;23:272-288
  32. Ordiano JAG, Bartschat A, Ludwig N, Braun E, Waczowicz S, Renkamp NP, et al. Concept and benchmark results for big data energy forecasting based on apache spark. Journal of Big Data. 2018;5:11
  33. Liu H. Smart cities: Big data prediction methods and applications. Berlin, Germany: Springer Nature; 2020
  34. Hyndman RJ, Athanasopoulos G. Forecasting: Principles and practice. Melbourne, Australia: OTexts; 2018
  35. Chambers B, Zaharia M. Spark: The definitive guide: Big data processing made simple. California, U.S: O'Reilly Media, Inc; 2018
  36. Sharma H, Kumar S. A survey on decision tree algorithms of classification in data mining. International Journal of Science and Research (IJSR). 2019;5:2094-2097
  37. Kazemzadeh MR, Amjadian A, Amraee T. A hybrid data mining driven algorithm for long term electric peak load and energy demand forecasting. Energy. 2020;204:117948
  38. Chen Y, Jia Z, Mercola D, Xie X. A gradient boosting algorithm for survival analysis via direct optimization of concordance index. Computational and Mathematical Methods in Medicine. 2013;2013:18-27
  39. Grabusts P, Zorins A. The influence of hidden neurons factor on neural network training quality assurance. In: Proceedings of the 10th International Scientific and Practical Conference. Latvia: Rezekne Academy of Technologies; 2015
  40. Duraisamy K, Zhang ZJ, Singh AP. New approaches in turbulence and transition modeling using data-driven techniques. In: 53rd AIAA Aerospace Sciences Meeting. VA, U.S: AIAA; 2015
  41. Parish EJ, Duraisamy K. A paradigm for data-driven predictive modeling using field inversion and machine learning. Journal of Computational Physics. 2016;305:758-774
  42. Amber KP, Aslam MW, Mahmood A, Kousar A, Yamin YM, Akbar B, et al. Energy consumption forecasting for university sector buildings. Energies. 2017;10:1579
  43. Grolinger K, Capretz MAM, Seewald L. Energy consumption prediction with big data: Balancing prediction accuracy and computational resources. In: IEEE International Congress on Big Data (BigData Congress). Washington, U.S; 2016. p. 2016
  44. Ruiz LGB, Rueda R, Cuéllar MP, Pegalajar MC. Energy consumption forecasting based on Elman neural networks with evolutive optimization. Expert Systems with Applications. 2018;92:380-389
  45. Moon J, Park J, Hwang E, Jun S. Forecasting power consumption for higher educational institutions based on machine learning. The Journal of Supercomputing. 2018;74:3778-3800
  46. Allab Y, Pellegrino M, Guo X, Nefzaoui E, Kindinis A. Energy and comfort assessment in educational building: Case study in a French university campus. Energy and Buildings. 2017;143:202-219
  47. Amber KP, Aslam MW, Hussain SK. Electricity consumption forecasting models for administration buildings of the UK higher education sector. Energy and Buildings. 2015;90:127-136
  48. Liang Y, Niu D, Wei CC. Short term load forecasting based on feature extraction and improved general regression neural network model. Energy. 2019;166:653-663
  49. Runge J, Zmeureanu R. Forecasting energy use in buildings using artificial neural networks: A review. Energies. 2019;12:3257
  50. Wei Y, Zhang X, Shi Y, Xia L, Pan S, Wu J, et al. A review of data-driven approaches for prediction and classification of building energy consumption. Renewable and Sustainable Energy Reviews. 2018;82:1027-1047
  51. Amasyali K, El-Gohary NM. A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews. 2018;81:1192-1205
  52. Amber KP, Ahmad R, Aslam MW, Kousar A, Usman M, Khan MS. Intelligent techniques for forecasting electricity consumption of buildings. Energy. 2018;157:886-893
  53. Kim MK, Kim YS, Srebric J. Predictions of electricity consumption in a campus building using occupant rates and weather elements with sensitivity analysis: Artificial neural network vs. linear regression. Sustainable Cities and Society. 2020;62:102385
  54. Ahmad T, Chen H, Guo Y, Wang J. A comprehensive overview on the data driven and large scale based approaches for forecasting of building energy demand: A review. Energy and Buildings. 2018;165:301-320
  55. Ai S, Chakravorty A, Rong C. Household power demand prediction using evolutionary ensemble neural network pool with multiple network structures. Sensors. 2019;19:721
  56. Chammas M, Makhoul A, Demerjian J. An efficient data model for energy prediction using wireless sensors. Computers and Electrical Engineering. 2019;76:249-257
  57. Wahid F, Ghazali R, Shah AS, Fayaz M. Prediction of energy consumption in the buildings using multi-layer perceptron and random forest. IJAST. 2017;101:13-22
  58. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research. 2005;30:79-82
  59. Kolassa S, Schutz W. Advantages of the MAD/MEAN ratio over the MAPE. Foresight: The International Journal of Applied Forecasting. 2007:40-43
  60. Lu Y, Wang W. Analysis of the mean absolute error (MAE) and the root mean square error (RMSE) in assessing rounding model. In: IOP conference series: materials science and engineering. 2018
  61. Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geoscientific Model Development. 2014;7:1247-1250
  62. Acharya MS, Armaan A, Antony AS. A comparison of regression models for prediction of graduate admissions. In: 2019 international conference on computational intelligence in data science (ICCIDS). 2019. pp. 1-5
  63. Zhang Y, Yang Q. A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering. 2021;34(12):5586-5609