Open access peer-reviewed chapter - ONLINE FIRST

Algorithmic Innovations in Multi-Agent Reinforcement Learning: A Pathway for Smart Cities

Written By

Igor Agbossou

Submitted: 24 September 2023 Reviewed: 09 October 2023 Published: 09 December 2023

DOI: 10.5772/intechopen.113933


From the Annual Volume

Multi-Agent Systems [Working Title]

Mehmet Emin Aydin


Abstract

The concept of smart cities has emerged as an instrumental solution to tackle the intricate challenges associated with urbanization in the twenty-first century. Among the myriad of issues that smart cities aim to address, key concerns such as efficient traffic management, sustainable energy usage, resilient infrastructure, and enhanced public safety are at the forefront. Notably, the implementation of multi-agent reinforcement learning (MARL) has garnered significant attention for its potential role in realizing the vision of smart cities. This chapter serves as an exploration of the frontiers of algorithmic innovation within MARL and its direct applicability to address the complex challenges of urban smart grid systems. The integration of MARL principles is vital in comprehensively modeling the intricate, interdependent urban systems that underpin the smart city framework. Particularly, we emphasize the relevance of MARL in providing adaptive solutions to the intricate dynamics of the urban smart grid, ensuring effective management of energy resources and demand-side management.

Keywords

  • smart cities
  • multi-agent reinforcement learning
  • urban smart grid system
  • sustainable energy usage
  • deep deterministic policy gradient

1. Introduction

The rapid growth of urbanization across the globe presents a profound challenge for the design and management of modern cities. As the world’s population continues to gravitate towards urban areas, there is an ever-increasing demand for cities to operate efficiently, sustainably, and intelligently. Smart cities, a vision that combines advanced technologies with data-driven decision-making, offer a promising solution to address these complex urban challenges [1, 2]. At the heart of the smart city concept [3, 4, 5, 6] lies the ability to optimize and coordinate various urban systems in real-time, from traffic management [7] and energy distribution [8] to public safety and transportation [7]. Achieving this level of sophistication necessitates cutting-edge technologies, and among these, artificial intelligence (AI) stands out as a crucial enabler. Within the AI domain, multi-agent reinforcement learning (MARL) has emerged as a powerful paradigm for modeling and solving complex decision-making problems involving multiple autonomous agents [9]. MARL represents a fusion of reinforcement learning (RL) [10] and multi-agent systems (MAS) [11, 12] making it well-suited for addressing the intricate and interconnected challenges found in urban environments [3, 5, 6, 11, 13].

This chapter embarks on a journey into the realm of algorithmic innovation within the context of MARL, elucidating its potential as a transformative pathway for the development of smart cities. The motivation behind this exploration is rooted in the pressing need to harness the capabilities of MARL to enhance the quality of urban life. The modern cityscape is a dynamic and ever-evolving ecosystem, characterized by its complex web of interactions between diverse entities such as vehicles, buildings, infrastructure, and, most crucially, its residents. Managing these interactions optimally is a monumental task, one that has traditionally required extensive human intervention and resource allocation. Fortunately, the advent of MARL has opened new possibilities. By endowing agents, whether they are autonomous vehicles, energy management systems, or emergency response units, with the ability to learn and adapt to their environments through interactions, we can pave the way for more efficient, sustainable, and resilient urban systems. The implications extend far beyond mere automation; they encompass the potential to create cities that are responsive, data-driven, and capable of continuously optimizing their operations in real-time. The objective pursued in this chapter is twofold: (1) Comprehend MARL fundamentals: we will establish a solid foundation by elucidating the fundamentals of reinforcement learning and multi-agent systems, providing insights into the unique challenges and complexities of MARL in the framework of smart cities. (2) Explore algorithmic innovations: we will delve into the state-of-the-art algorithms and techniques that have emerged in the field of MARL, emphasizing their relevance and potential in the context of smart city development.

In pursuit of our goals, we will direct our attention towards the challenge of implementing intelligent automation for energy management within urban settings, employing a comprehensive multi-scalar approach that spans from individual households to the entire cityscape. It is crucial to acknowledge that the energy consumption in both residential and commercial buildings is steadily on the rise. This upward trajectory can be attributed, in part, to the escalating proliferation of household appliances, which in turn drives a substantial surge in domestic electricity demand. Consequently, effective planning for residential energy utilization becomes imperative for achieving optimal energy management within the framework of urban electricity distribution networks, that is, the smart grid. This intricate challenge is effectively addressed through the application of MARL models. The rest of the chapter is structured as follows: Section 2 focuses on the context and overview of the smart city. Section 3 introduces the fundamentals of MARL, providing a basis for understanding its application in smart cities.

In Section 4, materials and methods, we present a comprehensive exposition on the multi-agent formalization for reinforcement learning pertaining to the automated energy management within urban buildings. This discussion is contextualized within the purview of smart grids, and it extensively expounds upon the methodological nuances underlying the modeling of diverse agents, both at the micro-level of individual buildings and at the macro-level of the entire city infrastructure. Building upon this framework, Section 5 is dedicated to an in-depth exploration of scenario testing implementation and subsequent results analysis. Here, we dissect the findings and their implications, shedding light on the intricate dynamics of urban energy management and the role played by MARL models. Finally, we conclude by emphasizing the significant potential that Collaborative Intelligence embodies in reshaping our urban landscapes, thereby contributing to the growing body of knowledge in the realm of MARL’s pivotal role in shaping the cities of the future.


2. Background and smart cities overview

Across the globe, cities are confronting and surmounting multifaceted challenges by embracing a rich tapestry of inventive concepts [3, 4, 5, 6]. This diverse spectrum of initiatives encompasses novel energy and transportation paradigms [7], groundbreaking innovations in residential construction [3], the proliferation of shared services [11], digitization of administrative processes [4], and a myriad of other pioneering endeavors. These groundbreaking initiatives are a collaborative effort involving cities, established corporations, nimble startups, non-profit organizations, and engaged citizens, all working collectively to advance innovative solutions. In recent years, a significant portion of these initiatives, particularly those harnessing the power of emerging information and communication technologies, have coalesced under the encompassing banner of “smart city” endeavors.

A smart city is one that methodically applies digital technologies to optimize resource utilization, elevate the quality of life for its residents, and bolster the regional economy’s competitiveness in a sustainable fashion. It’s a holistic approach that deploys intelligent solutions across various facets of urban life, encompassing infrastructure, energy management, housing, mobility, services, and security. These solutions are rooted in integrated sensor technology, seamless connectivity, data-driven analytics, and self-sustaining value-added processes.

However, it’s important to recognize that the path to a truly smart city is not without its challenges and complexities. Smart city projects often unfold as intricate, costly, occasionally chaotic, and time-consuming endeavors. They present a unique set of demands, necessitating two distinct competencies: an astute understanding of the ramifications of embedding digital technologies within the fabric of urban development, and the capacity to conceive integrative solutions that transcend traditional departmental boundaries. Confronted by these formidable demands (Figure 1), many decision-makers and implementers find themselves at an impasse, unsure of where to commence or how to navigate the labyrinthine journey ahead. Consequently, a significant reservoir of untapped potential languishes on the precipice of realization.

Figure 1.

General framework of smart city. Source: Adapted from Ref. [13].

2.1 The essence of smart cities

Smart cities epitomize the methodical harnessing of digital technologies’ boundless potential, seamlessly interwoven with a holistic embrace of users, inhabitants, and all stakeholders alike. At its core, the smart city endeavor is driven by a grand vision: to realize urban landscapes characterized by optimized resource utilization, elevated living standards, and a sustainable boost in competitiveness. To achieve these paramount goals, a predominantly digital transformation across the domains of infrastructure, energy management, housing, mobility, services, and security becomes not just preferable, but imperative. Central to this transformation lies the concept of a city’s “digital shadow” [14], a foundational element that serves as the bedrock for the city’s digital evolution. It’s important to distinguish this concept from the notion of a “digital twin” [15]. This digital shadow is the vessel through which intelligent solutions are infused into the urban fabric. Rooted in integrated sensor technologies, seamless connectivity, adept data analysis, and self-sustaining value-added processes, the digital shadow marks the inception of a profound change. The transformation towards an intelligent, interconnected urban ecosystem commences with the transformation of physical products, processes, and services into their digital counterparts. As these entities evolve into intelligent, autonomous, and integrated entities, a digital shadow emerges, propelling ecological and social betterment into the forefront. At the vanguard of this transformation stands the Internet of Things (IoT), serving as the conduit between the digital realm and the physical world [16]. Notably, emerging technologies such as Blockchain are poised to revolutionize secure transactions and identity management within cities, fostering trust and transparency [17, 18, 19, 20].

Powering this transformative voyage is the bedrock of modern data analytics, often encapsulated within the realm of Artificial Intelligence (AI). Through sophisticated algorithms and machine learning, AI sifts through vast datasets to uncover intricate patterns, autonomously refining systems with minimal human intervention. For instance, the intricate road system in Los Angeles has learned to optimize traffic flows through this autonomous self-improvement process, a testament to the potential unleashed by AI-driven urban innovations [11, 15, 16].

However, the digital shadow, in its nascent form, remains neutral and devoid of purpose. Therefore, it is imperative for self-learning systems to be deeply attuned to their surroundings, attentively considering the needs and aspirations of city residents. The digital shadow, while a pivotal prerequisite for smart cities, stands incomplete on its own. It is only through a laser-focused commitment to the city’s holistic milieu that it can fulfill its true potential.

2.2 Main constituents of smart city perspective evolution

In the annals of urban development, the year 2008 marked a transformative juncture when IBM introduced the visionary “smarter planet and smart city concept” worldwide. This groundbreaking concept took root in select cities, driven by the transformative potential of Information Communication Technology (ICT). Leading the charge, nations like Japan, Singapore, and China embarked on the ambitious journey of crafting smart cities, underpinned by the prowess of ICT [17].

Over the past decade, the landscape of smart cities has undergone a remarkable metamorphosis, with ICT serving as the catalyst for change. These cities have evolved to encompass a spectrum of characteristics that together define the essence of “smart cities.” These constituents are encapsulated in the following pillars: “smart governance,” “smart education,” “smart living,” “smart mobility,” “smart environment,” “smart energy,” “smart healthcare,” and “smart citizens” [21, 22]. It is through the harmonious convergence of these elements that the symphony of a smart city comes to life. The smart city narrative continues to unfold, with technology acting as an enabler for greater intelligence and efficiency. Pioneering technologies such as the IoT, AI, and the transformative potential of big data analytics have emerged as formidable allies in the quest for smarter urban landscapes [22, 23, 24, 25]. These technologies are the building blocks of a new urban paradigm, where data-driven insights, automation, and connectivity redefine the way cities operate and flourish. Each of these key components plays a unique role, interlocking with others to create a tapestry of innovation and progress. For a detailed breakdown of these components by domain, please refer to Table 1.

Key domain | Sectors and services associated with domains | References
Governance | Online citizens portals, efficient and fast public services, effective resource management, innovative planning approaches, public asset management, e-services, connecting people through social media. | [4, 20, 23, 26]
Education | Smart infrastructure with closed-circuit television surveillance, GPS tracking of school buses, smart learning through video conferencing lectures, teacher-student management solutions, virtual labs. | [4, 20, 23, 26]
Living | Public security tools, safety alarms at public places in panic situations, community network management, safety of senior citizens. | [20, 21, 23, 26]
Mobility | A smart toll collection system, community carpooling system, charging points for electric vehicles, smart parking system, connected and autonomous vehicles. | [2, 3, 20, 21, 23, 26]
Environment | Traffic management, vehicle monitoring, water quality management, air quality management, smart water storage and purification system, wastewater management, pollution sensors, disaster management, green and clean environment, smart waste management. | [3, 20, 21, 23, 26]
Energy | Smart meters, efficient utilization of energy subsystems, energy distribution through sensors. | [3, 20, 23, 26]
Healthcare | E-health records, diagnostic analytics portals, emergency medical services. | [20, 23, 26]
Citizenship | Privacy and security of citizens, social engagement of the people, raising awareness of smart solutions, community interactions. | [3, 20, 21, 23, 26]

Table 1.

Main required components of smart cities.

The advent of smart cities has ushered in a paradigm shift in urban planning and management. At its core, this transformation embraces the concept of agent-based modeling (ABM), a powerful computational framework that treats various urban entities as intelligent agents. These agents, which can represent a wide spectrum of urban systems, including traffic signals, public transportation networks, energy grids, and waste management systems, interact with one another and their environment, making informed decisions to optimize their respective functions. In the context of smart cities, MARL allows agents to adapt to the ever-changing dynamics of urban environments, making decisions that lead to more efficient, sustainable, and responsive city operations [25, 26, 27].


3. Fundamentals of multi-agent reinforcement learning

In recent times, MARL has emerged as a focal point of research, particularly in the realm of multi-agent systems operating in expansive, large-scale environments. This surge in interest can be attributed to its remarkable success, particularly in the domain of strategic games […]. At its core, Reinforcement Learning (RL) draws inspiration from the mechanisms of animal learning in psychology […]. It embodies a trial-and-error learning process where an agent strives to learn an action policy that maximizes cumulative rewards over time through its interactions with an environment […]. Urban environments, on the other hand, are characterized by unparalleled complexity and dynamism. They comprise a multitude of interconnected components and features, perpetually influencing one another.

3.1 Reinforcement learning basics

RL stands as a prominent and widely recognized subset of machine learning methods, specifically tailored to the art of acquiring the skill to accomplish specific tasks by engaging in dynamic interactions with the surrounding environment. This pivotal task often hinges on the presence of a reward mechanism, serving as a beacon guiding the intelligent agent towards optimal performance. In essence, RL casts the intelligent agent in the role of a decision-maker, requiring it to navigate a spectrum of situations by selecting actions strategically. The accumulation of rewards based on these actions functions as a compass, guiding the agent towards more proficient decision-making in the future. The overarching goal is to amass rewards as efficiently as possible over an extended period, ultimately steering the agent's behavior towards a state of optimal performance in the long term. The focal point of RL revolves around the quest to uncover a control policy capable of achieving predefined objectives. In this context, RL takes on the formalized structure of a Markov decision process [28], serving as the bedrock upon which iterative learning is built. Within each iteration, an RL agent carefully chooses an action (a_t) based on the prevailing policy and the current state (s_t). This selected action is then executed within the given environment, ushering in a transition to a new state (s_{t+1}) and the concomitant bestowal of a reward (r_{t+1}), as elucidated in Figure 2.

Figure 2.

RL paradigm in a Markov decision process for an agent.

Through its continuous interaction with the environment, the policy undergoes iterative refinement via RL methodologies, all aimed at maximizing the cumulative long-term reward. To compute the optimal policy, a diverse array of techniques is at the RL practitioner’s disposal, with value-based and policy-based approaches emerging as the most prominent contenders [29, 30]. When confronted with intricate challenges, such as those outlined in Table 1, neural networks emerge as a formidable tool, empowering RL to effectively predict the optimal policy or value function [31]. Yet, the scope of RL extends beyond individual agents tackling isolated problems. In scenarios marked by complexity and interdependency, as underscored in Table 1, RL seamlessly extends its domain to encompass multiple agents coexisting within the same environment. Here, these agents can engage in collaborative or competitive interactions, ushering in the realm of MARL. In the following, we will delve deeper into the intricacies of MARL and explore how these algorithmic innovations are poised to chart a transformative pathway for smart cities, where a consortium of learning agents collaboratively engages with urban environments.
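To make the agent-environment loop of Figure 2 concrete, the following minimal Python sketch runs the state-action-reward cycle against a Gym-style environment. The gymnasium package and the CartPole task are illustrative stand-ins, not part of this chapter's experiments; a trained policy would replace the random action sampling.

```python
import gymnasium as gym

# Illustrative stand-in: any Gym-compatible environment exposes the
# reset()/step() cycle depicted in Figure 2.
env = gym.make("CartPole-v1")

state, _ = env.reset(seed=0)
total_reward = 0.0

for t in range(200):
    # An RL agent would draw a_t from its current policy pi(a|s);
    # uniform sampling is used here as a placeholder.
    action = env.action_space.sample()

    # Executing a_t yields the next state s_{t+1} and the reward r_{t+1}.
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break

print(f"Cumulative reward: {total_reward}")
```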

3.2 From Bellman equation to iterative policy evaluation and improvement

The quest to understand and approximate the value function of a given policy π has long been a central pursuit in the realm of RL. At its heart, this endeavor hinges on unraveling the intricate relationship between the value of a state and the values of its successor states. This foundational relationship is encapsulated by the state value function, which adheres to the following Eq. (1):

$$v_\pi(s_t) = \sum_{a_t} \pi(a_t \mid s_t) \sum_{s_{t+1}} P_{sa}\left[r(s_t, a_t, s_{t+1}) + \mu\, v_\pi(s_{t+1})\right] \tag{1}$$

This equation elegantly expresses the interplay between the value of a state and the expected values of its future states. It serves as the cornerstone upon which numerous methods for computing and learning the state value function v_π are constructed. Not to be outdone, the state-action value function q_π also adheres to a recursive relationship known as the Bellman equation [32]:

$$q_\pi(s_t, a_t) = \sum_{s_{t+1}} P_{sa}\left[r(s_t, a_t, s_{t+1}) + \mu \sum_{a_{t+1}} \pi(a_{t+1} \mid s_{t+1})\, q_\pi(s_{t+1}, a_{t+1})\right] \tag{2}$$

The Bellman equation for q_π embodies a similar spirit, delineating the value of a state-action pair (s, a) as a function of the rewards, transitions, and values of the ensuing states. The specific challenge of computing the state value function v_π for a given policy, termed policy evaluation, takes center stage. To tackle this challenge, we adopt an iterative approach, bypassing direct computational methods in favor of a more computationally efficient strategy, and we assume complete knowledge of the environment's dynamics, implying familiarity with the transition probabilities for each pair (s_t, a_t). Below, we present the pseudo-code outlining the iterative policy evaluation algorithm:

The pursuit of computing the value function for a given policy is not merely an intellectual exercise; rather, it serves as a pivotal stepping stone towards the enhancement and optimization of existing policies. Consider a deterministic policy π for which the associated value function v_π has been diligently calculated.

This foundational knowledge becomes a catalyst for the creation of an improved policy, denoted as π′. So, how does one transition from π to π′? The process unfolds as follows:

$$\pi'(s_t) = \arg\max_{a} q_\pi(s_t, a) \tag{3}$$

The Policy Improvement Theorem [33], a fundamental result in the realm of RL, underpins the process of policy refinement. According to this theorem, the new policy π′ derived from an existing policy π is inherently superior or, at the very least, equivalent in performance. This transformative procedure, which entails evolving an old policy into a more optimal one by aligning it with the insights garnered from the value function, is formally recognized as “policy improvement.” The steps to execute this method are concisely outlined below:

3.3 CityLearn: enabling algorithms implementation and execution

In this section, we present the fundamental implementation tools and aspects of our research through CityLearn [34]. Eqs. (1)-(3) are operationalized through the pseudo-coded algorithms presented in Tables 2 and 3, forming the core of our agent coordination policies. In the context of MARL, these policies are the controllers. Leveraging the OpenAI Gym standard, CityLearn serves as a platform for deploying MARL algorithms for urban energy management, load-shaping, and demand response [34, 35]. CityLearn operates primarily in a decentralized control mode, which we utilized for our MARL controllers. To ensure reproducibility, we outline its four key functionalities: (1) Facilitation of MARL implementations. (2) Full customizability of the reward function, enabling the use of a multi-output function for MARL applications. (3) Modular and open-source design, accommodating diverse energy system classes, including options for user-created classes, particularly for city-scale energy systems. (4) Provision for users to generate their datasets (e.g., weather data, building energy demand, EV schedules) and integrate them into CityLearn. Further guidelines on dataset creation can be accessed through the GitHub repository [36].
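As a hedged sketch of how such a decentralized control loop looks in practice, the snippet below follows the Gym standard described above. The import path, schema name, and central_agent flag are assumptions about the installed citylearn version and may differ; consult the GitHub repository [36] for the authoritative interface.

```python
# Assumed import path and constructor signature for the citylearn package;
# verify against the GitHub repository [36] for your installed version.
from citylearn.citylearn import CityLearnEnv

env = CityLearnEnv(
    schema="citylearn_challenge_2022_phase_1",  # assumed bundled dataset name
    central_agent=False,                        # decentralized control mode
)

observations = env.reset()
done = False
while not done:
    # One action entry per building agent; a trained MARL controller
    # would replace this random sampling.
    actions = [space.sample() for space in env.action_space]
    observations, rewards, done, info = env.step(actions)
```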

Initialize v(s) arbitrarily for all s ∈ S
Repeat:
    Δ ← 0
    For each s ∈ S:
        v_old ← v(s)
        v(s) ← Σ_a π(a|s) Σ_{s′} P_sa [r(s, a, s′) + μ v(s′)]
        Δ ← max(Δ, |v_old − v(s)|)
Until Δ < θ (a small positive threshold)

Table 2.

Iterative policy evaluation algorithm.

policy-stable ← true
For each s ∈ S do
    old-action ← π(s)
    π(s) ← argmax_a q_π(s, a)
    if old-action ≠ π(s) then
        policy-stable ← false
    end
end
if policy-stable = true then
    Stop and return π
else
    Go to Table 2 (iterative policy evaluation)
end

Table 3.

Iterative policy improvement algorithm.
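For readers who prefer executable code, the following self-contained Python sketch implements the iterative policy evaluation and improvement of Tables 2 and 3 as a full policy iteration loop. The three-state MDP is a made-up toy example, and μ plays the role of the discount factor from Eqs. (1)-(3).

```python
import numpy as np

# Toy MDP (illustrative only): 3 states, 2 actions.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(1.0, 2, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state
}
states, actions = list(P), [0, 1]
mu, theta = 0.9, 1e-8  # discount factor and convergence threshold


def evaluate(policy):
    """Iterative policy evaluation (Table 2)."""
    v = np.zeros(len(states))
    while True:
        delta = 0.0
        for s in states:
            v_old = v[s]
            a = policy[s]
            v[s] = sum(p * (r + mu * v[s2]) for p, s2, r in P[s][a])
            delta = max(delta, abs(v_old - v[s]))
        if delta < theta:
            return v


def improve(policy, v):
    """Greedy policy improvement (Table 3); returns new policy and stability flag."""
    stable = True
    new_policy = {}
    for s in states:
        q = {a: sum(p * (r + mu * v[s2]) for p, s2, r in P[s][a]) for a in actions}
        new_policy[s] = max(q, key=q.get)
        stable = stable and (new_policy[s] == policy[s])
    return new_policy, stable


policy = {s: 0 for s in states}
while True:
    v = evaluate(policy)
    policy, stable = improve(policy, v)
    if stable:
        break

print("Optimal policy:", policy)    # {0: 0, 1: 1, 2: 0}
print("State values:", v.round(3))  # [0.9, 1.0, 0.0]
```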


4. Materials and methods

MAS serve as the linchpin in a diverse array of applications within smart cities. These applications encompass a broad spectrum of technologies, including integration with the IoT and autonomous systems. Examples of such autonomous systems are intelligent robots, unmanned aerial, underwater, or surface vehicles, self-driving cars, and advanced transportation and healthcare systems. In these contexts, MAS agents engage in distributed interactions to fulfill specific tasks and objectives. In the ideal smart city scenario, MAS should exhibit robust, decentralized, and collaborative behaviors. These systems are expected to make intelligent and cognitive decisions while devising efficient solutions to data-driven challenges. Consequently, they facilitate decentralized information management and decision-making processes [37] that underpin the dynamics of smart cities. The remainder of this chapter is devoted to resolving the predicament of efficient energy system management within urban settings, employing the MARL paradigm. This section aims to dissect the complexities associated with energy resource allocation and optimization, elucidating the role of MARL in enabling efficient and effective urban smart grid management strategies.

4.1 Multi-agent model design for urban smart grid

One of the primary catalysts for grid decarbonization is the seamless integration of renewable energy systems (RES) into the grid's supply chain. Within the residential domain, the Home Energy Management System (HEMS) proves to be a highly effective tool for automating energy management, as illustrated in Figure 3.

Figure 3.

HEMS agent at the heart of urban smart grid.

In the realm of MARL for smart cities, the selection of appropriate sensors plays a pivotal role in the acquisition of accurate and high-quality data [38]. As technology continues to advance, an extensive array of sensors has become available for gathering geospatial data within urban environments [39, 40, 41, 42]. In recent years, mobile sensors such as smartphones and tablets have gained significant popularity [43, 44]. The concept of demand response (DR) is central to an effective energy management strategy (Figure 4), offering consumers and prosumers the ability to provide the grid with the much-needed flexibility. This is achieved by reducing energy consumption through load management, shifting energy consumption to off-peak times, or generating and storing energy when grid conditions are favorable. In return, consumers and prosumers typically experience a reduction in their energy bills.

Figure 4.

Multi-agent coordination in demand response within smart grid (taken from the CityLearn GitHub repository).

In fact, HEMS functions as a distributed intelligent agent that empowers users to partake in local energy trading and realize energy savings at the household level. By acting as a strategic agent, HEMS contributes to holistic load management, ensuring that the energy needs of each household are met in an efficient and cost-effective manner. The HEMS system operates as an intelligent agent [45, 46] that crafts optimal energy plans for home appliances, considering individual customer energy consumption plans and comfort requirements. By adapting and learning from customer interactions, these agents become adept at predicting and optimizing individual customer decision-making patterns, thereby ensuring effective and efficient urban energy management.
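To illustrate how a HEMS agent of this kind might score its hourly decisions, the hedged sketch below combines an electricity-cost term with a comfort penalty. The weighting, comfort band, and quadratic penalty are illustrative assumptions, not a calibrated reward drawn from the experiments in Section 5.

```python
def hems_reward(net_consumption_kwh: float,
                price_per_kwh: float,
                indoor_temp_c: float,
                comfort_band=(20.0, 24.0),
                comfort_weight: float = 0.5) -> float:
    """Illustrative HEMS reward: negative energy cost minus a comfort penalty.

    All constants are assumptions chosen for demonstration only.
    """
    cost = price_per_kwh * max(net_consumption_kwh, 0.0)  # energy bought from the grid
    low, high = comfort_band
    # Quadratic penalty once the indoor temperature leaves the comfort band.
    deviation = max(low - indoor_temp_c, 0.0, indoor_temp_c - high)
    return -cost - comfort_weight * deviation ** 2


# Example: 2 kWh drawn at 0.25 currency units/kWh, room 1 degree too warm.
print(hems_reward(2.0, 0.25, 25.0))  # -0.5 - 0.5 * 1**2 = -1.0
```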

4.2 MARL coordination policy assessment

Coordination policy takes the principles of single-agent RL and extends them to facilitate distributed decision-making at a larger scale.

  • Multi-agent training for decentralized execution: In MARL, the challenge lies in training multiple agents to execute strategies in a decentralized manner [47]. To model this, we formulate the problem as a partially observable Markov decision problem within the framework of Deep Reinforcement Learning (DRL) [48]. However, when the environment is dynamic, the learning task becomes considerably more challenging. In such cases, each agent within a multi-agent system often perceives all other learning agents as part of the environment, creating a non-stationary scenario. This non-stationarity can lead to suboptimal policies, as they are developed in a distributed fashion and may not be sufficiently robust.

  • Addressing non-stationarity through centralized training with decentralized execution: To mitigate the non-stationarity challenge, various approaches have been proposed [49, 50], focusing on centralized training with decentralized execution. These approaches encompass different strategies: (1) Shared critic networks: Actor-critic algorithms such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [49] train on an ensemble of policies that encourages more robust multi-agent behavior; the shared critic network plays a crucial role in coordinating the actions of multiple agents. (2) Q-value mixing (QMIX): QMIX [51] utilizes a mixing network to estimate joint action values while ensuring monotonicity per agent. This guarantees tractability and consistency between centralized and decentralized policies, fostering better coordination. (3) Inter-agent communication: Some methods leverage communication between agents to enhance scalability. For instance, Differentiable Inter-Agent Learning (DIAL) [50] and CommNet [52] employ deep neural networks to learn end-to-end communication protocols, which is particularly useful in complex environments with partial observability. Agents in these approaches exchange information and backpropagate error derivatives, combining centralized learning with decentralized execution to improve overall performance. (4) Macro-actions: Another approach, introduced in [53], creates abstractions known as macro-actions. These abstractions improve scalability by allowing agents to reason and plan at a higher level, thus reducing the complexity of decision-making. A minimal code sketch of the shared-critic idea in (1) is given after this list.
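The following minimal PyTorch sketch captures the shared-critic idea behind centralized training with decentralized execution: each actor maps its local observation to an action, while a single critic scores the joint observation-action vector during training. Network sizes and the agent count are illustrative assumptions, and the sketch deliberately omits the replay buffers, target networks, and gradient updates that a full MADDPG implementation requires.

```python
import torch
import torch.nn as nn

# Illustrative sizes; a CityLearn-style deployment would match these to
# the building observation and action spaces.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2


class Actor(nn.Module):
    """Decentralized policy: local observation -> local action."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # actions in [-1, 1]

    def forward(self, obs):
        return self.net(obs)


class CentralCritic(nn.Module):
    """Centralized Q-function: joint observations and actions -> scalar value."""

    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Training: the critic conditions on every agent's observation and action...
obs = torch.randn(1, N_AGENTS, OBS_DIM)
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q_value = critic(obs.flatten(1), acts.flatten(1))

# ...execution: each agent acts from its own observation alone.
local_action = actors[0](obs[:, 0])
```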


5. Experimental settings and results

CityLearn stands out as a comprehensive and user-friendly OpenAI Gym environment tailored for the seamless implementation of MARL techniques within the intricate domain of urban energy systems. This innovative platform is designed to bring about significant transformations in the aggregated electricity demand curve by orchestrating the energy storage capabilities of a diverse array of buildings within a district [34, 35]. The primary goal of CityLearn is to simplify and standardize the evaluation of RL agents, thereby serving as an invaluable benchmarking tool for a wide spectrum of algorithms.

5.1 Key features of CityLearn

Key features of CityLearn are: (1) Diverse Energy Models: At the heart of CityLearn lies an extensive library of energy models, encompassing vital components such as air-to-water heat pumps, electric heaters, chilled water (CHW) systems, domestic hot water (DHW) heaters, and electricity energy storage devices. These models enable the simulation of a wide range of building energy systems, reflecting the rich heterogeneity found in real-world urban environments. (2) Building Energy Systems: Each building in the simulation is equipped with an air-to-water heat pump for cooling and an electric heater for DHW heating. Additionally, buildings have the flexibility to incorporate various combinations of CHW, DHW, and electricity storage devices. These components work collaboratively to offset cooling, DHW heating, and electrical loads drawn from the grid. (3) Storage Capacity: CityLearn introduces the concept of storage capacity, defined as multiples of the hours that storage devices can meet the maximum annual cooling and DHW demand when fully charged. This parameterization allows for precise control and optimization of energy storage strategies. (4) Grid Interaction: The framework encompasses a dynamic interaction between buildings and the main grid. Besides the energy storage devices, other electric equipment, and appliances (non-shiftable loads) also draw electricity from the grid. For sustainable energy practices, CityLearn supports the integration of Photovoltaic (PV) systems within buildings, enabling them to generate their own electricity and reduce dependency on the grid.

CityLearn has emerged as a versatile tool with a wide array of applications. Researchers have harnessed its capabilities to explore incentive-based Demand Response (DR) mechanisms [54], coordinate energy management across multiple buildings [55], and conduct rigorous benchmarking of MARL algorithms [36, 56]. Its flexibility, coupled with its diverse energy models, makes CityLearn an indispensable asset for advancing our understanding of urban energy systems and shaping the future of sustainable urban development.

5.2 Simulation setup for smart grid

In our research endeavor, we leverage the sophisticated CityLearn environment as a crucial testbed to rigorously assess the performance of our MARL algorithm in orchestrating the actions of multiple agents, as detailed in Section 3.2 of this chapter. Our primary objective is to delve into the behavioral dynamics of these agents concerning varying durations of offline training, relying on predefined policies. We postulate that an extended offline training period equips the agents with expert-level knowledge of optimal actions, thereby leading to enhanced overall performance when they transition to an online setting. Within the simulated smart city environment, each individual building is endowed with its dedicated Reinforcement Learning – Deep Deterministic Policy Gradient (RL-DDPG) controller. The number of these controllers aligns with the total count of buildings under consideration, a scenario featuring ten buildings in this specific case study. Notably, we explore the performance of two distinct RL controllers: the conventional DDPG controller and its multi-agent variant, known as MADDPG.

Here are some key insights into our simulation setup:

  • DDPG controller: In the DDPG setup, each agent operates in relative isolation, possessing knowledge solely of its own states and actions. This encapsulation reflects a scenario where agents are not privy to the states or actions of their peers, operating independently.

  • MADDPG controller: In contrast, the MADDPG controller extends the agents' awareness to include not only their own states and actions but also those of every other agent in the system. This elevated level of information sharing fosters collaborative decision-making among the agents.

  • State representation: As states, our framework incorporates key variables, including the hour of the day, outdoor temperature, and the state of charge within each Chilled Water (CHW) storage tank. These states provide valuable context and information for the agents to make informed decisions.

  • Action space: The controllers' actions pertain to the energy storage and release strategies they employ during hourly intervals. These actions are pivotal in optimizing energy utilization and management within the smart city. Table 4 shows a dataset sample that captures the MARL implementation details within the CityLearn environment.

Building ID | Controller | Hour of the day | Outdoor temperature (°C) | CHW state (%) | Action taken
1 | DDPG | 15:00 | 28 | 60 | Store
2 | DDPG | 14:00 | 25 | 40 | Release
3 | DDPG | 12:00 | 30 | 75 | Store
4 | DDPG | 11:00 | 20 | 55 | Release
5 | DDPG | 10:00 | 27 | 70 | Store
6 | MADDPG | 18:00 | 29 | 50 | Release
7 | MADDPG | 17:00 | 24 | 45 | Store
8 | MADDPG | 16:00 | 26 | 80 | Release
9 | MADDPG | 13:00 | 22 | 65 | Store
10 | MADDPG | 09:00 | 31 | 35 | Release

Table 4.

Dataset sample that captures the MARL implementation.

The columns of Table 4 are read as follows:

  • Building ID: the unique identifier for each building within the simulation, facilitating the differentiation and tracking of individual buildings.
  • Controller: the type of controller utilized in each building, either DDPG or MADDPG. The distinction is crucial, as it influences the decision-making process and the overall behavior of the building within the smart city environment.
  • Hour of the day: the specific hour at which the action is taken, showing how the controllers operate at different times of the day.
  • Outdoor temperature: the outdoor temperature in Celsius, a crucial environmental factor impacting the energy management decisions of the buildings and highlighting the influence of external conditions on the controllers' operational strategies.
  • CHW state: the state of charge of the CHW storage tank within each building, expressed as a percentage; it directly influences the decision to store or release energy.
  • Action taken: the action chosen by the controller in response to the current state of the CHW storage tank, categorized as "Store" or "Release" according to the prevailing conditions and objectives.

A brief code sketch of this state and action encoding follows.
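The hedged sketch below shows one plausible way to normalize the three state variables of Table 4 into a controller input vector and to map a continuous DDPG output onto the logged Store/Release labels; the normalization constants and the sign convention are assumptions made for illustration.

```python
import numpy as np


def encode_state(hour: int, outdoor_temp_c: float, chw_soc_pct: float) -> np.ndarray:
    """Normalize the Table 4 state variables; the scaling constants are assumptions."""
    return np.array([
        hour / 23.0,             # hour of the day
        outdoor_temp_c / 40.0,   # outdoor temperature
        chw_soc_pct / 100.0,     # CHW storage state of charge
    ])


def decode_action(u: float) -> str:
    """Map a continuous controller output u in [-1, 1] to the logged labels."""
    return "Store" if u >= 0.0 else "Release"


# Row 1 of Table 4: building 1 at 15:00, 28 degrees C, CHW tank at 60%.
state = encode_state(15, 28.0, 60.0)
print(state, decode_action(0.4))  # a positive output corresponds to "Store"
```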

This information is instrumental in understanding the efficacy of the MARL algorithms and their implications for optimizing energy utilization and management within the smart city context. Our study capitalizes on the CityLearn environment to conduct a meticulous investigation into the performance of MARL algorithms, with a particular focus on the impact of offline training durations.

We assess the efficacy of both individualized and collaborative decision-making strategies, shedding light on the potential advantages of multi-agent coordination in urban energy systems.

5.3 Policy controller results

In our comprehensive evaluation of controllers for smart grid management within the smart city context, we employed the total cooling cost as the primary performance metric to assess their effectiveness. Figure 5 provides a visual representation of our findings, where we compared the performance of two Reinforcement Learning – Deep Deterministic Policy Gradient (RL-DDPG) controllers against two rule-based controllers (RBC).

Figure 5.

MARL policy evaluation for energy distribution.

Here are the key aspects of our evaluation and observations:

5.3.1 Controller comparison

  • DDPG and MADDPG controllers: We subjected both the DDPG and MADDPG controllers, representing advanced machine learning-driven decision-making, to a rigorous assessment. These controllers were tasked with optimizing cooling costs within the simulated environment.

  • Rule-based controllers (RBC): To provide a benchmark for comparison, we introduced two rule-based controllers. The first RBC followed a standard rule-based approach, activating cooling when the water temperature in the tank reached a predefined maximum threshold. The second RBC was manually tuned to minimize total electricity costs, representing a more refined rule-based strategy.

5.3.2 Performance outcomes

  • DDPG and MADDPG outperform standard RBC: Notably, both the DDPG and MADDPG controllers demonstrated superior performance compared to the standard RBC. This improvement was achieved without necessitating explicit system modeling and maintained the adaptive potential inherent to Reinforcement Learning (RL) methods.

  • Manually optimized RBC: Interestingly, the manually optimized RBC, which was relatively straightforward to fine-tune due to the uniformity of energy systems across all buildings, outperformed both the DDPG and MADDPG controllers.

The study findings, as depicted in Figure 5, indicate that within this specific environment, the MADDPG controller did not exhibit a substantial advantage over the DDPG controller. In essence, sharing information among the agents did not yield significant performance improvements. This observation raises the question of whether coordination efforts are necessary in this context, considering that similar savings can be achieved without explicit coordination. We acknowledge that these results may vary in more complex environments characterized by diverse energy systems and potentially differing optimal policies for individual buildings. Further research is warranted to explore the dynamics of coordination in such intricate settings.

5.4 Evaluation metrics comparison across key algorithms

To comprehensively uncover the origins of algorithmic innovations in the pursuit of optimizing multi-agent coordination policies for addressing challenges within smart cities, it is imperative to undertake a global assessment of the performance of the various algorithms employed during this research endeavor. This segment presents a thorough juxtaposition of key evaluation metrics for the Deep Deterministic Policy Gradient (DDPG) algorithm, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, the Rule-Based Controller (RBC) algorithm, and the Manually Optimized Rule-Based Controller (MORBC) algorithm. The comparison encompasses crucial metrics, including energy cost savings, network stability, computational complexity, adaptability, scalability, decision-making intricacy, real-time responsiveness, and implementation complexity. By scrutinizing these metrics, we can elucidate the strengths and limitations inherent in each approach as applied to the simulation of smart grid management through CityLearn. Table 5 offers a succinct overview of our comprehensive evaluation findings.

Metric | DDPG | MADDPG | RBC | MORBC
Energy cost savings | 0.921 | 0.853 | 0.652 | 0.881
Grid stability | 0.854 | 0.908 | 0.603 | 0.781
Computational complexity | 0.702 | 0.605 | 0.852 | 0.753
Adaptability | 0.801 | 0.952 | 0.605 | 0.607
Scalability | 0.903 | 0.851 | 0.804 | 0.801
Decision-making complexity | 0.705 | 0.953 | 0.601 | 0.604
Real-time responsiveness | 0.854 | 0.906 | 0.600 | 0.850
Implementation complexity | 0.751 | 0.801 | 0.853 | 0.870

Table 5.

Innovation metrics comparison across key algorithms.

By contextualizing the evaluation within the parameters of a smart grid experimental environment, these results provide insight into the performance capabilities of each algorithm, highlighting algorithmic innovation strategies for implementing robust multi-agent coordination policies adapted to efficient and sustainable smart city projects.
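As a small aid for readers who want to manipulate these scores, the snippet below loads the Table 5 values into a pandas DataFrame and reports the best-scoring algorithm per metric. The numbers are copied verbatim from Table 5; treating a higher score as better for every metric, including the complexity-type rows, is an assumption about how the normalized scores were constructed.

```python
import pandas as pd

# Scores copied verbatim from Table 5; higher is assumed to be better.
scores = pd.DataFrame(
    {
        "DDPG":   [0.921, 0.854, 0.702, 0.801, 0.903, 0.705, 0.854, 0.751],
        "MADDPG": [0.853, 0.908, 0.605, 0.952, 0.851, 0.953, 0.906, 0.801],
        "RBC":    [0.652, 0.603, 0.852, 0.605, 0.804, 0.601, 0.600, 0.853],
        "MORBC":  [0.881, 0.781, 0.753, 0.607, 0.801, 0.604, 0.850, 0.870],
    },
    index=[
        "Energy cost savings", "Grid stability", "Computational complexity",
        "Adaptability", "Scalability", "Decision-making complexity",
        "Real-time responsiveness", "Implementation complexity",
    ],
)

print(scores.idxmax(axis=1))  # best-scoring algorithm for each metric
```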


6. Conclusion and future directions

MARL stands as a transformative force with the potential to revolutionize urban environments and address the multifaceted challenges that smart cities face. By facilitating decentralized decision-making processes, fostering cooperation, and introducing elements of healthy competition among agents, MARL paves the way for more efficient resource allocation and an enhanced quality of life for the denizens of urban landscapes. While the road ahead is not without its hurdles, the ongoing dedication to research and development in the field of MARL is indispensable in realizing the boundless possibilities that smart cities can offer in the future.

This chapter serves as a resounding call to action, resonating not only with researchers and technologists but also with urban visionaries, policymakers, and the architects of tomorrow’s cities. It urges us to unite and harness the collective intelligence of autonomous agents, coupled with the adaptability and resilience embedded in reinforcement learning. Together, we can drive our cities forward into a future where they not only navigate the complexities of urban life but thrive as shining examples of sustainability and innovation.

In our evaluation, we have illuminated the promise of RL-based controllers in the realm of smart grid management. This has underscored the capacity of RL methods to surpass conventional rule-based strategies, particularly as the intricacy of systems escalates. Moreover, our findings beckon further exploration, especially in understanding the role of coordination within the intricate and multifaceted energy systems that characterize modern smart cities.

As we venture into the dynamic realm of smart city development, these results serve as a guiding beacon, urging us to embrace MARL as an essential instrument in sculpting the future of our urban environments.

References

  1. Caprioli C, Bottero M. Addressing complex challenges in transformations and planning: A fuzzy spatial multicriteria analysis for identifying suitable locations for urban infrastructures. Land Use Policy. 2021;102:105147. DOI: 10.1016/j.landusepol.2020.105147
  2. Li J, Wu X, Fan J, Liu Y, Xu M. Overcoming driving challenges in complex urban traffic: A multi-objective eco-driving strategy via safety model based reinforcement learning. Energy. 2023;284:128517. DOI: 10.1016/j.energy.2023.128517
  3. Bakıcı T, Almirall E, Wareham J. A smart city initiative: The case of Barcelona. Journal of the Knowledge Economy. 2013;4:135-148. DOI: 10.1007/s13132-012-0084-9
  4. Anthopoulos LG. Understanding Smart Cities: A Tool for Smart Government or an Industrial Trick? (Public Administration and Information Technology). Vol. 22. Cham: Springer Nature; 2017. DOI: 10.1007/978-3-319-57015-0
  5. Fernandez-Anez V, Fernández-Güell JM, Giffinger R. Smart city implementation and discourses: An integrated conceptual model. The case of Vienna. Cities. 2018;78:4-16. DOI: 10.1016/j.cities.2017.12.004
  6. Mardacany E. Smart cities characteristics: Importance of built environment components. In: Proceedings of IET Conference on Future Intelligent Cities 2014. London: IET; 2014. pp. 1-6. DOI: 10.1049/ic.2014.0045
  7. Taresh AAR, Zghair NAK. Redesign of the communications network based on high availability of traffic management technologies to improve the communication. Measurement: Sensors. 2023;27:100776. DOI: 10.1016/j.measen.2023.100776
  8. Hu L, Tian Q, Zou C, Huang J, Ye Y, Wu X. A study on energy distribution strategy of electric vehicle hybrid energy storage system considering driving style based on real urban driving data. Renewable and Sustainable Energy Reviews. 2022;162:112416. DOI: 10.1016/j.rser.2022.112416
  9. Kober J, Bagnell JA, Peters J. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research. 2013;32(11):1238-1274. DOI: 10.1177/0278364913495721
  10. Singh B, Kumar R, Singh VP. Reinforcement learning in robotic applications: A comprehensive survey. Artificial Intelligence Review. 2022;55(2):945-990. DOI: 10.1007/s10462-021-09997-9
  11. Casavola A, Franzè G, Gagliardi G, Tedesco F. A multi-agent trust and reputation mechanisms for the management of smart urban lighting systems. IFAC-PapersOnLine. 2022;55(6):545-550. DOI: 10.1016/j.ifacol.2022.07.185
  12. Sinyabe E, Kamla V, Tchappi I, Najjar Y, Galland S. Shapefile-based multi-agent geosimulation and visualization of building evacuation scenario. Procedia Computer Science. 2023;220:519-526. DOI: 10.1016/j.procs.2023.03.066
  13. Hayes K, Ghosh S, Gnenz W, Annett J, Bryne MB. Smart city Edmonton. In: Augusto JC, editor. Handbook of Smart Cities. Cham: Springer; 2021. DOI: 10.1007/978-3-030-69698-6_17
  14. Bergs T, Gierlings S, Auerbach T, Klink A, Schraknepper D, Augspurger T. The concept of digital twin and digital shadow in manufacturing. Procedia CIRP. 2021;101:81-84. DOI: 10.1016/j.procir.2021.02.010
  15. Yoon S. Building digital twinning: Data, information, and models. Journal of Building Engineering. 2023;76:107021. DOI: 10.1016/j.jobe.2023.107021
  16. Keegan BJ, McCarthy IP, Kietzmann J, Canhoto AI. On your marks, headset, go! Understanding the building blocks of metaverse realms. Business Horizons. 2023. DOI: 10.1016/j.bushor.2023.09.002
  17. Guo M, Liu Y, Yu H, Hu B, Sang Z. An overview of smart city in China. China Communications. 2016;13(5):203-211. DOI: 10.1109/CC.2016.7489987
  18. Das RK, Misra H. Smart city and E-governance: Exploring the connect in the context of local development in India. In: Fourth International Conference on eDemocracy & eGovernment (ICEDEG). Quito, Ecuador: IEEE; 2017. pp. 232-233. DOI: 10.1109/icedeg.2017.7962540
  19. Sang Z, Li K. ITU-T standardization activities on smart sustainable cities. IET Smart Cities. 2019;1(1):3-9. DOI: 10.1049/iet-smc.2019.0023
  20. Rehena Z, Janssen M. The smart city of Pune. In: Smart City Emergence. Elsevier; 2019. pp. 261-282. DOI: 10.1016/B978-0-12-816169-2.00012-2
  21. Ismagilova E, Hughes L, Dwivedi YK, Raman KR. Smart cities: Advances in research—An information systems perspective. International Journal of Information Management. 2019;47:88-100. DOI: 10.1016/j.ijinfomgt.2019.01.004
  22. Vinod Kumar TM, Dahiya B. Smart economy in smart cities. In: Vinod Kumar TM, editor. Smart Economy in Smart Cities. Advances in 21st Century Human Settlements. Singapore: Springer; 2017. DOI: 10.1007/978-981-10-1610-3_1
  23. Appio FP, Lima M, Paroutis S. Understanding smart cities: Innovation ecosystems, technological advancements, and societal challenges. Technological Forecasting and Social Change. 2019;142:1-14. DOI: 10.1016/j.techfore.2018.12.018
  24. Anthopoulos LG, Reddick CG. Smart city and smart government: Synonymous or complementary? In: Proceedings of the 25th International Conference Companion on World Wide Web (WWW '16 Companion). Geneva: International World Wide Web Conferences Steering Committee; 2016. pp. 351-355. DOI: 10.1145/2872518.2888615
  25. Vinod Kumar TM, editor. Smart Metropolitan Regional Development. Advances in 21st Century Human Settlements. Singapore: Springer; 2019. DOI: 10.1007/978-981-10-8588-8
  26. Yigitcanlar T, Kamruzzaman M, Foth M, Sabatini-Marques J, da Costa E, Ioppolo G. Can cities become smart without being sustainable? A systematic review of the literature. Sustainable Cities and Society. 2019;45:348-365. DOI: 10.1016/j.scs.2018.11.033
  27. Sarkheyli A, Sarkheyli E. Smart megaprojects in smart cities, dimensions, and challenges. In: Smart Cities Cybersecurity and Privacy. New York, NY: Elsevier; 2019. pp. 269-277. DOI: 10.1016/B978-0-12-815032-0.00019-6
  28. Vázquez-Canteli JR, Nagy Z. Reinforcement learning for demand response: A review of algorithms and modeling techniques. Applied Energy. 2019;235:1072-1089. DOI: 10.1016/j.apenergy.2018.11.002
  29. Vinyals O, Babuschkin I, Czarnecki WM, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature. 2019;575:350-354. DOI: 10.1038/s41586-019-1724-z
  30. Shakya AK, Pillai G, Chakrabarty S. Reinforcement learning algorithms: A brief survey. Expert Systems with Applications. 2023;231:120495. DOI: 10.1016/j.eswa.2023.120495
  31. Vázquez-Canteli JR, Ulyanin S, Kämpf J, Nagy Z. Fusing TensorFlow with building energy simulation for intelligent energy management in smart cities. Sustainable Cities and Society. 2019;45:243-257. DOI: 10.1016/j.scs.2018.11.021
  32. Jones M, Peet M. A generalization of Bellman's equation with application to path planning, obstacle avoidance and invariant set estimation. Automatica. 2021;127:109510. DOI: 10.1016/j.automatica.2021.109510
  33. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press; 2015. Available from: https://inst.eecs.berkeley.edu/∼cs188/sp20/assets/files/SuttonBartoIPRLBook2ndEd.pdf [Accessed: September 7, 2023]
  34. GitHub. n.d. Available from: https://github.com/intelligent-environments-lab/CityLearn
  35. Vázquez-Canteli JR, Kämpf J, Henze G, Nagy Z. CityLearn v1.0: An OpenAI Gym environment for demand response with deep reinforcement learning. In: BuildSys 2019 – Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation. New York, NY: ACM; 2019. pp. 356-357. DOI: 10.1145/3360322.3360998
  36. Dhamankar G, Vazquez-Canteli JR, Nagy Z. Benchmarking multi-agent deep reinforcement learning algorithms on a building energy demand coordination task. In: RLEM 2020 – Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings and Cities. New York, NY: ACM; 2020. pp. 15-19. DOI: 10.1145/3427773.3427870
  37. Kovařík V, Schmid M, Burch N, Bowling M, Lisý V. Rethinking formal models of partially observable multiagent decision making. Artificial Intelligence. 2022;303:103645. DOI: 10.1016/j.artint.2021.103645
  38. Biljecki F, Ledoux H, Stoter J, Vosselman G. The variants of an LOD of a 3D building model and their influence on spatial analyses. ISPRS Journal of Photogrammetry and Remote Sensing. 2016;116:42-54. DOI: 10.1016/j.isprsjprs.2016.03.003
  39. Verma JK, Paul S, editors. Advances in Augmented Reality and Virtual Reality. Singapore: Springer; 2022. p. 312. DOI: 10.1007/978-981-16-7220-0
  40. Edelsbrunner J, et al. Procedural modeling of architecture with round geometry. Computers & Graphics. 2017;64:14-25. DOI: 10.1016/j.cag.2017.01.004
  41. Peeters A, Etzion Y. Automated recognition of urban objects for morphological urban analysis. Computers, Environment and Urban Systems. 2012;36(6):573-582
  42. Biljecki F, Ledoux H, Stoter J. Generating 3D city models without elevation data. Computers, Environment and Urban Systems. 2017;64:1-18
  43. Swathika OVG, Karthikeyan K, Padmanaban S. Smart Buildings Digitalization: Case Studies on Data Centers and Automation. CRC Press; 2022. p. 314. DOI: 10.1201/9781003240853
  44. Cherdo L. The 8 Best 3D Scanning Apps for Smartphones and iPads in 2019. 2019. Available from: https://www.aniwaa.com/buyers-guide/3d-scanners/best-3d-scanning-apps-smartphones/ [Accessed: December 5, 2022]
  45. Epstein JM. Remarks on the foundations of agent-based generative social science. In: Tesfatsion L, Judd KL, editors. Handbook of Computational Economics. Vol. 2. Elsevier; 2006. pp. 1585-1604. DOI: 10.1016/S1574-0021(05)02034-4
  46. Jiang F, Ma J, Webster CJ, Chiaradia A, Zhou Y, Zhao Z, et al. Generative urban design: A systematic review on problem formulation, design generation, and decision-making. Progress in Planning. 2023:100795. DOI: 10.1016/j.progress.2023.100795
  47. Shoham Y, Leyton-Brown K. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Revision 1.1. Cambridge: Cambridge University Press; 2010. Available from: http://www.masfoundations.org/mas.pdf [Accessed: August 28, 2023]
  48. Palanisamy P. Multi-agent connected autonomous driving using deep reinforcement learning. In: 2020 International Joint Conference on Neural Networks (IJCNN). IEEE; 2020. pp. 1-7. DOI: 10.48550/arXiv.1911.04175
  49. Tian Y, Kladny K-R, Wang Q, Huang Z, Fink O. Multi-agent actor-critic with time dynamical opponent model. Neurocomputing. 2023;517:165-172. DOI: 10.48550/arXiv.2204.05576
  50. Foerster J, Assael IA, De Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems. 2016;29:2137-2145. DOI: 10.48550/arXiv.1605.06676
  51. Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: International Conference on Machine Learning. PMLR; 2018. pp. 4295-4304. DOI: 10.48550/arXiv.1803.11485
  52. Sukhbaatar S, Fergus R, Szlam A. Learning multiagent communication with backpropagation. Advances in Neural Information Processing Systems. 2016;29:2244-2252. DOI: 10.48550/arXiv.1605.07736
  53. Amato C, Konidaris G, Kaelbling LP, How JP. Modeling and planning with macro-actions in decentralized POMDPs. Journal of Artificial Intelligence Research. 2019;64:817-859. DOI: 10.1613/jair.1.11418
  54. Deltetto D, Coraci D, Pinto G, Piscitelli MS, Capozzoli A. Exploring the potentialities of deep reinforcement learning for incentive-based demand response in a cluster of small commercial buildings. Energies. 2021;14(10):2933. DOI: 10.3390/en14102933
  55. Glatt R, Silva FL, Soper B, Dawson W, Rusu E, Goldhahn R. Collaborative energy demand response with decentralized actor and centralized critic. In: Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation. New York, NY: ACM; 2021. pp. 333-337. DOI: 10.1145/3486611.3488732
  56. Qin R, Gao S, Zhang X, Xu Z, Huang S, Li Z, et al. NeoRL: A near real-world benchmark for offline reinforcement learning. 2021. DOI: 10.48550/arXiv.2102.00714
