The Performance of Reinforcement Learning for Indoor Climate Control Devices according to the Level of Outdoor Air Particulate Matters

Kim, Sun Ho; Moon, Hyeun Jun

doi:10.3390/buildings13123062


by Sun Ho Kim and Hyeun Jun Moon *

Department of Architectural Engineering, Dankook University, Yong-in 448-701, Republic of Korea

* Author to whom correspondence should be addressed.

Buildings 2023, 13(12), 3062; https://doi.org/10.3390/buildings13123062

Submission received: 7 November 2023 / Revised: 28 November 2023 / Accepted: 6 December 2023 / Published: 8 December 2023

(This article belongs to the Special Issue AI and Data Analytics for Energy-Efficient and Healthy Buildings)


Abstract

As people spend more than 90% of their time indoors, indoor environmental quality (IEQ) has become an important factor in maintaining a healthy space for the occupants. There are many indoor climate control devices for improving IEQ. However, it is difficult to maintain an appropriate IEQ with changing outdoor air conditions and occupant behavior in a building. This study proposes a reinforcement learning (RL) algorithm to maintain indoor air quality (IAQ) with low energy consumption in a residential environment by optimally operating indoor climate control devices such as ventilation systems, air purifiers, and kitchen hoods. The proposed artificial intelligence algorithm (AI2C2) employs DDQN (double deep Q-network) to determine the optimal operation of various indoor climate control devices, reflecting IAQ and energy consumption via different outdoor levels of particulate matter. This approach considers the outdoor air condition and occupant activities in training the developed algorithm, which are the most significant factors affecting IEQ and building energy performance. A co-simulation platform using Python and EnergyPlus is applied to train and evaluate the model. As a result, the proposed approach reduced energy consumption and maintained good IAQ. The developed RL algorithm for energy and IAQ showed different performances based on the outdoor PM 2.5 level. The results showed the RL-based control can be more effective when the outdoor PM 2.5 level is higher (or unhealthy) compared to moderate (or healthy) conditions.

Keywords:

reinforcement learning; energy efficiency; indoor air quality; occupant activity; double deep Q-network; co-simulation platform

1. Introduction

Building energy consumption has been increasing consistently for several years. The US Energy Information Administration reported that the increase in electric usage in residential buildings is faster than in any other sector and that electric energy sources will account for half of all household energy use by 2050 [1]. Therefore, reducing the electricity consumption in residential buildings is essential.

Indoor environmental quality (IEQ) has a significant impact on the health, morale, productivity, working efficiency, and satisfaction of occupants. IEQ can be divided into four factors: thermal comfort, indoor air quality (IAQ), visual comfort, and aural comfort [2]. After COVID-19, preventing the spread of infectious viruses in the built environment has also become a very important issue [3]. Pollutants affecting IAQ include chemicals, particulates, and biological substances, such as particulate matter, carbon monoxide (CO), carbon dioxide (CO₂), formaldehyde, radon, and ozone [4]. CO₂ is an important indicator of proper ventilation. Exposure to CO₂ can produce various health effects such as respiratory system effects, neurological symptoms, cognitive performance effects, and physiological responses [5]. Satish et al. [6] showed experimentally that the indoor CO₂ level affects decision-making performance: at a high CO₂ concentration of 2500 ppm, statistically significant reductions occurred in seven decision-making scales, with focused activity being the exception. Erdmann and Apte [7] analyzed the US Environmental Protection Agency office data and found that dCO₂ (the difference between indoor and outdoor CO₂ concentrations) positively correlates with sore throat, dry eyes, sneezing, nose/sinus congestion, and wheezing symptoms.

Particulate matter consists of microscopic solid particles or liquid droplets that are sufficiently small to be inhaled and cause serious health problems. It is classified according to its diameter for air quality regulation purposes. Kim and Kabir [8] reported that exposure to particulate matter increases hospitalizations, emergency visits, and respiratory symptoms; worsens cardiovascular diseases and chronic respiratory conditions; reduces lung function; and can cause premature death. Sofer et al. [9] used methylation array data to study the correlation between particulate matter and gene methylation in patients with asthma. They found that PM 10 decreased lung function by approximately 3–6%, as evaluated by the peak expiratory flow. These particulate matter problems can occur not only outdoors but also indoors, because particulate matter is introduced from outdoors through infiltration and natural/mechanical ventilation and is also produced by occupant activities such as cooking, indoor cleaning, smoking, and walking [10].

There are various control strategies to improve IAQ while keeping energy consumption low. These methods can be divided into classical control and advanced control based on the core operating principle of the control [11]. Classical control refers to relatively intuitive control methods, including simple on–off control, proportional integral derivative (PID) control, and rule-based control (RBC). However, classical control cannot deal with the complexity of building systems or account for changes in indoor/outdoor weather conditions, occupant activity, and the status of control devices. To overcome these limitations, advanced controls have been proposed, such as model predictive control (MPC) and reinforcement learning (RL) control.

In classical control strategies, RBC rules are usually prescribed with predetermined upper/lower threshold limits and a simple control algorithm based on the knowledge and experience of building engineers and facility managers. Kim et al. [12] developed an integrated control algorithm for a couple of indoor climate control devices for heating, cooling, and ventilation. The developed algorithm could maintain thermal comfort and IAQ simultaneously by employing RBC according to the least upper and lower bound. However, a simple control strategy causes inefficient operation of the building owing to instabilities caused by interactions between multiple indoor climate control devices being operated at the same time. Moreover, it is difficult to use this method under varying environmental conditions [13].

Regarding advanced control methods, model-based control strategies have been studied to reflect the building’s thermal dynamics and perform building control using a simulation model. Recently, MPC has been studied as an appropriate alternative to RBC [14]. Berouine et al. [15] demonstrated that MPC showed better performance than PID and state feedback (SF) controllers to improve IAQ and energy efficiency. The energy consumption using MPC is lower than those of PI and SF by 33.3% and 18.9%, respectively. However, the effectiveness of MPC depends on the quality of the simulation model due to the complexity of the thermal dynamics and the various influencing variables in buildings [16]. Model-free methods using RL were developed to overcome the disadvantages of model-based methods [17]. The deep Q-network (DQN), which integrates RL with a deep neural network (DNN), was developed [18]. An, Niu, and Chen [19] demonstrated that the health risk related to PM 2.5 and energy consumption of air cleaners decreased by 3.2–46.7% and by 2.4–43.7%, respectively, for all 18 cases by applying the DQN model. An et al. [20] developed a window controller with DQN to reduce the indoor PM 2.5 level. The developed control method decreased the mean indoor PM 2.5 concentration by 12.80% in a virtual building and by 9.11% in a real apartment. Yu et al. [21] proposed a control algorithm to optimize the operation of the air conditioner and exhaust fan through DQN. They showed that DQN decreased the energy usage of an air conditioner by 43% while maintaining indoor thermal comfort. Also, indoor CO₂ concentration was reduced by about 24%. However, DQN has the problem of overestimating the action value [22,23]. Van Hasselt et al. introduced the double deep Q-network (DDQN), employing distinct networks [24]. The current Q-network is used for choosing the subsequent greedy actions, while the target network assesses the selected action. Valladares et al. [25] utilized the DDQN to enhance thermal comfort and IAQ while minimizing the energy consumption of air conditioners and fans. Consequently, the proposed control algorithm maintains a constant predicted mean vote (PMV) value of between 0.07 and 0.1, and an average CO₂ concentration under 800 ppm. This approach resulted in a reduction of energy consumption by 4–5%. DDQN has been utilized in a study with a bio-sensor for human skin temperature to propose a building control algorithm. The proposed method led to a 59% enhancement in thermal satisfaction [26]. Liu et al. [27] applied DDQN to home management to find the best schedule for home appliances. As a result, the study demonstrated a more effective reduction in energy costs compared to the PSO (particle swarm optimization).

However, most previous studies operated only one indoor climate control device to improve IEQ; thus, the previous control method cannot reflect interactions among multiple indoor climate control devices. Also, previous studies did not consider occupant activities and outdoor conditions, even though occupants and outdoor air are the most important factors affecting building energy consumption and indoor environments [28,29,30,31,32].

Therefore, we developed an advanced control method for indoor climate control devices to reflect interactions among indoor climate control devices and changes in occupant activities and the outdoor environment. To reflect the interactions of multiple indoor climate control devices, we utilized various indoor climate control devices such as a ventilation system, a kitchen hood, and an air purifier to satisfy two IAQ factors, namely PM 2.5 and CO₂. Also, we selected seven occupant activities (i.e., sleeping, eating meals/snacks, working, cooking (oven, grilling, and frying), indoor cleaning, exercising, and resting) to reflect the influence of the occupants on indoor PM 2.5 and CO₂ concentrations. To consider the outdoor air condition, we trained and evaluated the algorithm separately for Case 1 (Unhealthy-Very unhealthy) and Case 2 (Good-Moderate), which are defined by the outdoor PM 2.5 concentration. A DDQN is implemented to reflect various factors related to building control. The developed algorithm is called artificial intelligence-integrated clean control (AI2C2). Figure 1 shows a schematic of AI2C2.

2. Double Deep Q-Network (DDQN)

IEQ could be influenced by indoor and outdoor sources, including the number of people, activities, use of home appliances, and weather. It can be challenging to incorporate all these factors and derive an optimum control signal with a simple on–off control method, especially in a situation of rapidly changing conditions such as the outdoor concentration of particulate matter [13]. The proposed method can overcome these limitations by employing the DDQN to consider the various influencing factors.

Mnih et al. [18] proposed a DQN that integrates RL with DNN. By employing artificial neural networks, the process of parameterizing the Q-value no longer needs high computing power. This approach, however, was shown to be unstable when neural networks were employed to represent the Q-values, primarily owing to correlated training samples. A small update to the Q-value could cause significant variances in the policy and data distribution. The relationship between the Q-value and target value also generates instabilities. These instabilities could be mitigated when DQN incorporated two significant concepts, the target network and experience memory. However, Q-learning and DQN have the drawback of overestimating action values, resulting in suboptimal policies [22,23], because the max operator uses the same values for both selecting and evaluating actions. Van Hasselt et al. [29] developed the DDQN algorithm to overcome this limitation. DDQN uses the current Q-network to select the action with the largest Q-value, whereas the target network assesses the selected action. Equation (1) expresses the loss function of the DDQN.

L_{i}(θ_{i}) = \mathbb{E}\left[\left(r + γ Q\left(s', \arg\max_{a'} Q(s', a'; θ_{i}); θ_{i}^{-}\right) - Q(s, a; θ_{i})\right)^{2}\right]

(1)
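As an illustration of the selection/evaluation split described above, the following minimal sketch computes DDQN regression targets for a minibatch with NumPy. The function names, array shapes, and toy Q-values are assumptions made for this example and are not taken from the AI2C2 implementation; terminal-state handling is omitted.

```python
import numpy as np

def ddqn_targets(rewards, q_next_online, q_next_target, gamma=0.99):
    """Double-DQN targets r + gamma * Q(s', argmax_a' Q(s', a'; theta); theta^-).

    q_next_online: Q(s', .; theta)   from the current network, shape (batch, n_actions)
    q_next_target: Q(s', .; theta^-) from the target network,  shape (batch, n_actions)
    """
    # The current (online) network selects the greedy next action ...
    greedy = np.argmax(q_next_online, axis=1)
    # ... and the target network evaluates that selected action.
    q_eval = q_next_target[np.arange(len(greedy)), greedy]
    return rewards + gamma * q_eval

def ddqn_loss(targets, q_taken):
    """Mean squared difference between targets and Q(s, a; theta)."""
    return float(np.mean((targets - q_taken) ** 2))

# Toy minibatch of two transitions with two actions (illustrative values only).
rewards = np.array([1.0, -1.0])
q_online = np.array([[0.2, 0.9], [0.5, 0.1]])
q_target = np.array([[0.4, 0.3], [0.7, 0.8]])
targets = ddqn_targets(rewards, q_online, q_target)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (0.3) rather than its own maximum (0.4), which is how the overestimation is reduced.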

3. Materials and Methods

3.1. Testbed and a Simulation Model

The training data were collected from a test chamber located at the Dankook University campus in Korea. A first principle-based model was also constructed for integration with the simulation. The test chamber has various indoor climate control systems, multiple sensors, and meters for electric power, gas, and water. Figure 2 shows a plan of the testbed and images of the climate control devices.

EnergyPlus was utilized to model the testbed and simulate the energy consumption and indoor air quality (CO₂ concentration) in the testbed. EnergyPlus is an energy simulation program to simulate energy consumption in buildings by integrating functions from DOE-2 and BLAST [33]. The size of the testbed is 4.0 m × 5.0 m × 2.4 m, and it is constructed with a urethane panel with gypsum lapping. Table 1 shows the specifications of the indoor climate control systems.

3.2. IAQ Standards

We selected PM 2.5 and CO₂ as IAQ indicators. The guidelines for particulate matter and CO₂ vary according to the country, environmental ministry, and health organization. In this study, 25 µg/m³ and 1000 ppm were used as the acceptable PM 2.5 and CO₂ levels to satisfy most IAQ guidelines. A detailed explanation of the IAQ guidelines consulted to determine these acceptable levels is provided in a previous study [34].

3.3. Numerical Model for Simulating IAQ

EnergyPlus was employed to simulate the CO₂ concentration and energy consumption. Equation (2) expresses the numerical model for CO₂ concentration [35].

ρ_{air} V_{Z} C_{CO_2} \frac{d C_{z}^{t}}{d t} = \sum_{i = 1}^{N_{sl}} kg_{mass_{sched\,load}} \times 10^{6} + \sum_{i = 1}^{N_{zones}} \dot{m}_{i} (C_{zi} - C_{z}^{t}) + \dot{m}_{inf} (C_{\infty} - C_{z}^{t}) + \dot{m}_{sys} (C_{sup} - C_{z}^{t})

(2)

In Equation (2), ρ_air V_Z C_CO₂ (dC_z^t/dt) (kg/s) is the CO₂ storage term in the zone air; ρ_air (kg/m³) is the zone air density; V_Z (m³) is the interior volume; C_CO₂ (-) is the CO₂ capacity multiplier; C_z^t (ppm) is the zone air CO₂ concentration at the current timestep; Σ kg_mass,sched load (kg/s) is the sum of scheduled internal CO₂ loads, which the factor of 10⁶ converts to ppm-kg/s; Σ ṁ_i (C_zi − C_z^t) (ppm-kg/s) is the CO₂ transfer by interzone air mixing; C_zi (ppm) is the CO₂ concentration in the zone air being transferred into the zone; ṁ_inf (C_∞ − C_z^t) (ppm-kg/s) is the CO₂ transfer by infiltration and ventilation of outdoor air; C_∞ (ppm) is the outdoor air CO₂ concentration; ṁ_sys (C_sup − C_z^t) (ppm-kg/s) is the CO₂ transfer by system supply; C_sup (ppm) is the CO₂ concentration in the system supply air; and ṁ_sys (kg/s) is the system supply air mass flow rate.
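To make the balance in Equation (2) concrete, the sketch below advances the zone CO₂ concentration by one explicit Euler step for a single zone. It is an illustration only: EnergyPlus uses its own solution algorithms, the interzone mixing term is dropped (single-zone assumption), and the air density of 1.2 kg/m³, the default supply concentration, and the function name `co2_step` are assumptions. The volume of 48 m³ follows from the 4.0 m × 5.0 m × 2.4 m testbed.

```python
def co2_step(c_zone, dt_s, source_kg_s, m_inf, m_sys,
             c_out=412.7, c_sup=412.7,
             rho_air=1.2, volume=48.0, cap_mult=1.0):
    """One explicit Euler step of the single-zone CO2 balance (ppm).

    c_zone      : current zone CO2 concentration (ppm)
    dt_s        : time step (s)
    source_kg_s : scheduled internal CO2 load (kg/s)
    m_inf       : infiltration/ventilation mass flow of outdoor air (kg/s)
    m_sys       : system supply air mass flow (kg/s)
    """
    # Right-hand side of Equation (2) in ppm-kg/s (interzone mixing omitted).
    gain = (source_kg_s * 1.0e6
            + m_inf * (c_out - c_zone)
            + m_sys * (c_sup - c_zone))
    # Divide by the storage coefficient rho * V * C_CO2 to obtain ppm/s.
    return c_zone + dt_s * gain / (rho_air * volume * cap_mult)

# Example: a 1 min step with an occupant source and no air exchange
# (the source value is illustrative, not taken from Table 3).
c_next = co2_step(800.0, 60.0, 1.0e-5, 0.0, 0.0)
```

With no source and a zone already at the outdoor concentration, the step leaves the concentration unchanged, which is a quick sanity check on the signs of the exchange terms.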

EnergyPlus has no class for simulating indoor particle dynamics. Thus, we employed a particle dynamics model to simulate the concentration of indoor particulate matter [36]. The indoor particulate matter is affected by the quantity of outdoor particulate matter that is brought into the indoor environment through infiltration and natural and mechanical ventilation, the amount of indoor particulate matter generated by occupant activities such as cooking, indoor cleaning, and smoking, and the amount of indoor particulate matter deposited or resuspended on indoor surfaces. Nazaroff’s particle dynamics model is expressed as Equation (3).

\frac{d (C_{i} V)}{d t} = E + C_{O} [Q_{S} (1 - η_{S}) + Q_{N} + (Q_{L} + Q_{H}) P] - C_{i} [Q_{F} η_{F} + β V + (Q_{S} + Q_{N} + Q_{L} + Q_{H})]

(3)

In Equation (3), C_i (µg/m³) denotes the indoor concentration of the particle property; V (m³) is the interior volume; E (µg/min) is the emission rate in the room; C_O (µg/m³) denotes the outdoor concentration of particulate matter; Q_S, Q_N, and Q_L represent the airflow rates of mechanical ventilation, natural ventilation, and infiltration, respectively; Q_F is an additional flow through particle-control filters, which in this study is produced by the air purifier; η_S and η_F are the removal efficiencies of the filters in the mechanical supply flow and the air purifier, respectively; β (min⁻¹) is the deposition rate of particles onto the room surfaces; and P (-) represents the particle fraction in the flow path of the infiltration. The kitchen hood was also considered for removing indoor particulate matter; thus, Q_H, which is the flow rate of the kitchen hood, was added to the particle dynamics model.
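The particle balance in Equation (3) can be sketched in a few lines of Python. The version below assumes a constant room volume, so that d(C_i V)/dt = V dC_i/dt, and assumes all flows are given in m³/min to match E in µg/min and β in min⁻¹. The function names and the numbers in the example are hypothetical and are not the Table 2 inputs.

```python
def pm_terms(c_out, *, E, V, Q_S, Q_N, Q_L, Q_H, Q_F, eta_S, eta_F, beta, P):
    """Return (source in ug/min, removal coefficient in m3/min) of Equation (3)."""
    source = E + c_out * (Q_S * (1.0 - eta_S) + Q_N + (Q_L + Q_H) * P)
    removal = Q_F * eta_F + beta * V + (Q_S + Q_N + Q_L + Q_H)
    return source, removal

def pm_derivative(c_in, c_out, **params):
    """dC_i/dt in ug/m3 per minute, assuming a constant volume V."""
    source, removal = pm_terms(c_out, **params)
    return (source - c_in * removal) / params["V"]

def pm_steady_state(c_out, **params):
    """Indoor concentration at which dC_i/dt = 0."""
    source, removal = pm_terms(c_out, **params)
    return source / removal

# Hypothetical inputs: infiltration only, no deposition, no indoor emission.
base = dict(E=0.0, V=48.0, Q_S=0.0, Q_N=0.0, Q_L=1.0, Q_H=0.0,
            Q_F=0.0, eta_S=0.0, eta_F=0.0, beta=0.0, P=1.0)
# Same room with an air purifier running (flow and efficiency are assumed).
with_purifier = dict(base, Q_F=2.0, eta_F=0.9)
```

Comparing `pm_steady_state(30.0, **base)` with `pm_steady_state(30.0, **with_purifier)` shows how the Q_F η_F term lowers the indoor level without changing the air exchange.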

Table 2 displays the input values for the indoor particle dynamics in the testbed. In this study, the measured values presented in other studies were used for the emission rate in the room [30,37], particle fractions in the flow path of infiltration, and deposition rate of particles onto room surfaces [38], because these values are difficult to measure in the testbed. The outdoor PM 2.5 concentration was divided into two cases based on the air quality indices of the Ministry of Environment of Korea. In Case 1, the outdoor PM 2.5 concentration is distributed between unhealthy and very unhealthy, and in Case 2, the outdoor PM 2.5 concentration is between good and moderate. Detailed information about the outdoor PM 2.5 concentration is given in Section 4.1. The actual measured values in the testbed were used for the volume of the room and airflow rate of infiltration. The airflow rate and filter efficiency of all indoor climate control devices were obtained from the product specification sheets.

3.4. Occupant Activity Schedule

The occupant activities were grouped into seven categories, namely sleeping, eating meals/snacks, working, cooking (oven, grilling, and frying), indoor cleaning, exercising, and resting. This selection considered the occupant’s emission rate of particulate matter and CO₂. Table 3 lists the PM 2.5 and CO₂ emission rates based on occupant activities [30,37,39]. Previous surveys, such as the ICATUS 2016 report [40] and the Living time survey 2019 [41], were referenced to determine when each activity occurs and how long it lasts.

Figure 3 depicts the PM 2.5 and CO₂ emission rates based on the occupant activity. The blue line represents the changes of PM 2.5 concentration according to the occupant activities. As displayed in Table 3 and Figure 3, the occupants generate PM 2.5 during cooking and indoor cleaning. Cooking activities are classified into three types, that is, oven cooking, grilling, and frying, because there is a significant difference in the amount of PM 2.5 generated depending on the type of cooking. It is assumed that the first cooking step is by oven (10 µg/min), the second is grilling (283 µg/min), and the third is frying (1483 µg/min). Indoor cleaning is performed twice a day, with the same PM 2.5 emission rate of 70 µg/min each time. The red line in Figure 3 signifies the change in CO₂ concentration according to the occupant activities. According to the activity level of the occupants, the emission rate of CO₂ is set from a minimum of 2.75 × 10⁻⁶ m³/s (sleeping) to a maximum of 1.62 × 10⁻⁵ m³/s (exercising). Figure 4 shows the scheme of the testbed that includes the input variables of the numerical model for IAQ.

3.5. Co-Simulation Platform for AI2C2

Figure 5 illustrates the schematic of the co-simulation platform. EnergyPlus simulates the indoor CO₂ concentration in the testbed. We employed the Eppy Python library to exchange the values between the DDQN and EnergyPlus [42]. The implementation of DDQN was carried out using the Keras library. The current state, which is simulated by EnergyPlus and the indoor particle dynamics model, is transferred to Python, where the DDQN determines the optimal control action that satisfies the IAQ and energy efficiency simultaneously according to the input states. The chosen optimal control action is transferred to EnergyPlus and Nazaroff’s indoor particle dynamics as the control variables. Simulations and calculations are then conducted to create the next state value. In this study, we varied the indoor CO₂/PM 2.5 emission rates and the outdoor PM 2.5 concentration to consider variations in occupant activity and outdoor air condition.

3.6. DDQN Training for AI2C2

3.6.1. State Variables

The agent acquires the state from the environment and this information is utilized as the input variables for each DDQN training step. To train AI2C2, we selected 11 state variables related to the outdoor/indoor environment, states of indoor climate control devices, and physical states. Table 4 presents the information for each state. We employed min-max normalization to scale the state values to the range between 0 and 1.
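A minimal sketch of min-max normalization for a state vector is given below. The bounds used in the example are hypothetical placeholders and do not reproduce the ranges of the 11 state variables in Table 4; clipping to [0, 1] is an added safeguard for values outside the assumed bounds, not something stated in this study.

```python
import numpy as np

def min_max_normalize(x, lower, upper):
    """Scale each element of x to [0, 1] using per-variable bounds."""
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    scaled = (x - lower) / (upper - lower)
    # Assumption: values outside the bounds are clipped rather than extrapolated.
    return np.clip(scaled, 0.0, 1.0)

# Hypothetical two-variable state: indoor PM 2.5 (ug/m3) and indoor CO2 (ppm).
state = [12.5, 700.0]
normalized = min_max_normalize(state, lower=[0.0, 0.0], upper=[50.0, 1400.0])
```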

3.6.2. Control Action

The action is the set of possible operations that the DDQN agent can execute within the environment. The DDQN optimizes the agent to determine the optimal action among all possible actions. As displayed in Table 5, the control action is selected for the on/off state and airflow volume of each indoor climate control device. For the air volume of each indoor climate control device, the actual values of the devices installed in the testbed were used, as listed in Table 1. There are 48 possible actions (4 × 3 × 4), obtained by multiplying the number of possible actions for each device. Min–max normalization was used to normalize the control variables.
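The size of the joint action space can be checked by enumerating every combination of per-device settings. In the sketch below, the keys `device_a`, `device_b`, and `device_c` are placeholders: which device has four settings and which has three is specified in Table 5 and is not assumed here.

```python
from itertools import product

# Number of settings per device (off plus airflow levels); names are placeholders.
LEVELS = {"device_a": 4, "device_b": 3, "device_c": 4}

# Every joint action is one tuple of per-device setting indices.
ACTIONS = list(product(*(range(n) for n in LEVELS.values())))

def decode_action(index):
    """Map a discrete DDQN output index to per-device setting indices."""
    return dict(zip(LEVELS, ACTIONS[index]))

n_actions = len(ACTIONS)
```

A flat index of this kind lets the Q-network use a single output layer with one unit per joint action.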

3.6.3. Reward Function

As expressed in Equation (4), three reward factors, namely r_PM_2.5, r_CO₂, and r_EC, were used to consider the IAQ (PM 2.5 and CO₂ concentrations) and energy usage simultaneously. The best setting for the weight factor of each reward function (w_PM_2.5, w_CO₂, and w_EC) was selected through several iterations and trial and error.

r_{total} = w_{PM2.5} r_{PM2.5} + w_{CO_2} r_{CO_2} + w_{EC} r_{EC}

(4)

Equations (5) and (6) express the rewards for PM 2.5 and CO₂, respectively. For each indicator, a positive reward of +1 is assigned if the indoor concentration does not exceed its upper limit (25 µg/m³ for PM 2.5 and 1000 ppm for CO₂), indicating that good IAQ is established. On the contrary, if the concentration exceeds the upper limit, a negative reward of −1 is assigned as a penalty.

r_{PM2.5} = \begin{cases} +1 & \text{if } PM2.5_{in} \leq 25\ µg/m^{3} \\ -1 & \text{if } PM2.5_{in} > 25\ µg/m^{3} \end{cases}

(5)

r_{CO_2} = \begin{cases} +1 & \text{if indoor } CO_{2} \leq 1000\ ppm \\ -1 & \text{if indoor } CO_{2} > 1000\ ppm \end{cases}

(6)

The total energy consumption reward (r_EC) includes the electricity usage (kWh) by the ventilation system, air purifier, and kitchen hood. To minimize energy consumption, r_EC is applied as a penalty, as expressed in Equation (7).

r_{EC} = - (E_{VS} + E_{AP} + E_{KH})

(7)
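Equations (4) to (7) can be combined into a single reward function, as sketched below. The default weights follow the 1:1:3 ratio reported for Case 1 in Section 4.2; their absolute scale, the per-step energy units, and the function names are assumptions made for this illustration.

```python
PM25_LIMIT = 25.0    # ug/m3, acceptable indoor PM 2.5 level
CO2_LIMIT = 1000.0   # ppm, acceptable indoor CO2 level

def iaq_reward(value, limit):
    """+1 if the concentration is at or below its limit, otherwise -1."""
    return 1.0 if value <= limit else -1.0

def total_reward(pm25_in, co2_in, e_vs, e_ap, e_kh,
                 w_pm25=1.0, w_co2=1.0, w_ec=3.0):
    """Weighted sum of the PM 2.5, CO2, and energy rewards (Equation (4))."""
    r_pm25 = iaq_reward(pm25_in, PM25_LIMIT)   # Equation (5)
    r_co2 = iaq_reward(co2_in, CO2_LIMIT)      # Equation (6)
    r_ec = -(e_vs + e_ap + e_kh)               # Equation (7), energy as a penalty
    return w_pm25 * r_pm25 + w_co2 * r_co2 + w_ec * r_ec

# Good IAQ with all devices off gives the maximum reward of 2 under these weights.
best = total_reward(20.0, 900.0, 0.0, 0.0, 0.0)
```

Because the IAQ terms are bounded at ±1 while the energy term scales with consumption, the energy weight determines how much electricity the agent is willing to spend to avoid a −1 penalty.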

3.7. RBC for IAQ Control

In this study, the proposed method was evaluated by comparing its performance with that of RBC. The RBC employs the simple on/off control according to the prescribed thresholds. If the indoor CO₂ concentration is over the upper limit of 1000 ppm, RBC operates only the ventilation system to reduce the indoor CO₂ by exchanging outdoor and indoor air. The RBC operates the ventilation system, kitchen hood, and air purifier simultaneously to lower the indoor particulate matter when the indoor PM 2.5 concentration is over the upper limit of 25 µg/m³. Table 6 lists the operation status of the indoor climate control devices based on the indoor CO₂ and PM 2.5 concentrations when these rules are applied.
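The two rules described above can be written as a small decision function. This sketch covers only the on/off state; the airflow level used when a device is on and the exact combinations in Table 6 are not reproduced. Treating the case where both limits are exceeded like the PM 2.5 rule is an assumption, made because that rule already switches the ventilation system on.

```python
def rbc_action(pm25_in, co2_in, pm25_limit=25.0, co2_limit=1000.0):
    """Return the on/off state of each device under the threshold rules."""
    if pm25_in > pm25_limit:
        # PM 2.5 above the limit: all three devices run simultaneously.
        return {"ventilation": True, "kitchen_hood": True, "air_purifier": True}
    if co2_in > co2_limit:
        # Only CO2 above the limit: ventilation alone exchanges indoor and outdoor air.
        return {"ventilation": True, "kitchen_hood": False, "air_purifier": False}
    # Both indicators acceptable: everything stays off.
    return {"ventilation": False, "kitchen_hood": False, "air_purifier": False}

# Example: high CO2 during exercise with acceptable PM 2.5.
state_during_exercise = rbc_action(pm25_in=10.0, co2_in=1200.0)
```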

3.8. Evaluation Factor

In this study, the proposed method was evaluated using the total energy consumption and healthy air ratios. The ventilation system, kitchen hood, and air purifier were utilized to improve IAQ. The total energy usage of these environmental devices was selected as an evaluation factor. The healthy air ratio indicates how well the IAQ is maintained in a healthy condition based on an acceptable level. The healthy air ratio is determined by the ratio of time spent under the acceptable air quality level to the reference time [43]. The healthy air ratios of each IAQ factor are defined by Equations (8) and (9).

\text{Healthy air ratio (PM 2.5) (\%)} = \frac{\text{The duration spent under the acceptable PM 2.5 level}}{(24 \times 60)\ \text{minute}}

(8)

\text{Healthy air ratio (CO}_{2}\text{) (\%)} = \frac{\text{The duration spent under the acceptable CO}_{2}\text{ level}}{(24 \times 60)\ \text{minute}}

(9)
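For a one-day series sampled every minute, the healthy air ratio can be computed as in the sketch below. Two details are assumptions: a minute counts as acceptable when the concentration is at or below the limit (matching the reward definition), and the result is multiplied by 100 to express it as a percentage.

```python
def healthy_air_ratio(concentrations, limit):
    """Percentage of time steps at or below the acceptable level.

    concentrations: one value per minute; a full day has 24 * 60 = 1440 values.
    """
    minutes_ok = sum(1 for value in concentrations if value <= limit)
    return 100.0 * minutes_ok / len(concentrations)

# Synthetic day: half of the minutes at 20 ug/m3, half at 30 ug/m3.
day = [20.0] * 720 + [30.0] * 720
ratio_pm25 = healthy_air_ratio(day, limit=25.0)
```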

4. Results

4.1. Test Cases

A simulation was conducted using the co-simulation platform to evaluate the AI2C2. As listed in Table 7, two simulation cases were selected based on the outdoor PM 2.5 and CO₂ concentration. Case 1 assumed that the outdoor PM 2.5 concentration was between unhealthy and very unhealthy, and Case 2 assumed that it was between good and moderate. The outdoor PM 2.5 concentration was obtained from a weather station installed at Dankook University. The outdoor CO₂ concentration was set to a fixed value of 412.7 ppm because the change in outdoor CO₂ concentration with time is not large. This fixed value was the average CO₂ concentration in Korea (June 2020) [44].

Figure 6, Figure 7 and Figure 8 illustrate the indoor PM 2.5 and CO₂ concentrations of each case as well as the occupant activities when the devices were not operating. These simulation cases were selected for three reasons.

First, as shown by the yellow background in Figure 6, when the outdoor PM 2.5 concentration is distributed between unhealthy and very unhealthy (Case 1), the indoor PM 2.5 concentration exceeds the upper limit even during activities with no indoor PM 2.5 emission, such as eating meals/snacks and working. On the contrary, in Case 2 (outdoor PM 2.5 concentration: Good-Moderate), the indoor PM 2.5 concentration does not exceed the upper limit under the same indoor PM 2.5 emission condition, as shown by the yellow background in Figure 7. These contrasting indoor and outdoor conditions are suitable for demonstrating the efficiency of the AI2C2 algorithm.

Second, as shown in Figure 6, the indoor PM 2.5 concentration changes significantly according to the occupant activity. For example, when the occupant activity is cooking (oven) with a high outdoor PM 2.5 concentration (Case 1), the indoor PM 2.5 concentration increases to 31.2 µg/m³, which is slightly higher than the upper limit. On the contrary, when the occupant activity is cooking (oven) with a low outdoor PM 2.5 concentration (Case 2), the indoor PM 2.5 concentration increases to 9.9 µg/m³, which is lower than the upper limit. However, when the occupant emits a large amount of particulate matter, such as by cooking (frying), the indoor PM 2.5 concentration exceeds the upper limit regardless of the outdoor PM 2.5 concentration (Case 1: 412.8 µg/m³ (Figure 6); Case 2: 392.3 µg/m³ (Figure 7)). This situation can be used to demonstrate that AI2C2 operates the indoor environmental systems effectively to remove indoor PM 2.5 with consideration of occupant activities.

Third, as shown in Figure 8, when none of the indoor climate control systems is operated, the indoor CO₂ concentration changes of Case 1 and Case 2 are the same. This is because the outdoor CO₂ concentration and the CO₂ emission rate according to occupant activity are the same in both cases. As displayed in Figure 8, the indoor CO₂ concentration varies with the occupant activity. When the occupant activity is sleeping, the indoor CO₂ concentration decreases to 782.7 ppm, which satisfies the CO₂ guideline. This is because the activity level is relatively low when the occupants are asleep; thus, the emission rate of CO₂ is also low. However, as depicted in Figure 8, all occupant activities after waking up have higher activity levels than sleeping; thus, the indoor CO₂ concentration increases and exceeds 1000 ppm. In particular, the indoor CO₂ concentration reaches a maximum of 1487.6 ppm during exercise, significantly exceeding the upper limit. This situation can be used to demonstrate that AI2C2 operates the ventilation system effectively to decrease the indoor CO₂ concentration while considering occupant activities.

4.2. Training of AI2C2

The simulation time step was set to 1 min (60 steps per hour). Thus, 1440 simulations and calculations (1 day = 1440 min) were conducted for one day using EnergyPlus and the indoor particle dynamics model, respectively. A single episode was defined as the simulation and calculation running for one day, and this process was repeated 20,000 times to find the optimal DDQN policy. In other words, the proposed method learned the best control value by repeating the control of indoor climate devices for the selected days (Case 1 and Case 2) 20,000 times.

In this study, we employed the Adam optimizer [45] to optimize AI2C2 with a learning rate of 0.25 × 10⁻³. The size of the minibatch was set to 32, and the discount factor (γ) was set to 0.99. We updated the target network at the end of every 10 episodes. We employed a rectified linear unit as the activation function for the hidden layers of the neural network and a linear activation function for the output layer. The neural network had two hidden layers with 30 neurons, a number identified based on the equation in [46]. We set the experience memory size to 1.44 × 10⁷ to accumulate the results from EnergyPlus and the indoor particle dynamics model. The agent’s experience (i.e., state, action, reward, and next state) was accumulated in the experience memory at each time step. To update the Q-network, DDQN selected experience samples randomly from the experience memory.
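The training settings above can be collected in one place together with a simple experience memory, as in the sketch below. The dictionary keys, the `ReplayMemory` class, and the reading of "30 neurons" as 30 per hidden layer are assumptions for illustration; the Keras model itself is not reproduced here.

```python
import random
from collections import deque

HPARAMS = {
    "learning_rate": 0.25e-3,        # Adam learning rate
    "batch_size": 32,                # minibatch size
    "gamma": 0.99,                   # discount factor
    "target_update_episodes": 10,    # target network sync interval
    "hidden_layers": (30, 30),       # assumption: 30 neurons in each hidden layer
    "memory_size": 14_400_000,       # 1.44e7 transitions
    "steps_per_episode": 1440,       # one day at 1 min resolution
    "episodes": 20_000,
}

class ReplayMemory:
    """Fixed-size experience memory; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        indices = random.sample(range(len(self.buffer)), batch_size)
        return [self.buffer[i] for i in indices]

    def __len__(self):
        return len(self.buffer)

# Small demonstration with a capacity of 3 transitions.
memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(state=t, action=0, reward=0.0, next_state=t + 1)
batch = memory.sample(2)
```

With these settings, the memory of 1.44 × 10⁷ transitions corresponds to 10,000 episodes of 1440 steps, that is, half of the 20,000 training episodes.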

The three reward factors, namely r_PM2.5, r_CO2, and r_EC, were used to consider the IAQ (PM 2.5 and CO₂) and energy usage simultaneously. Assigning an appropriate weight to each reward is necessary for enhancing the performance of the DDQN model. The selected weight ratio was 1:1:3 (w_PM2.5:w_CO2:w_EC) for Case 1 and 1:1:5 for Case 2. These values were obtained through repeated trial and error to find the setting that produces the best DDQN model.

Figure 9 illustrates the average Q-values of AI2C2 for each episode throughout 20,000 episodes. The Q-value increases as episodes progress, indicating that the AI2C2 algorithm learns how to control the devices optimally. The fluctuations in the Q-values also stabilize as learning progresses, which suggests that AI2C2 has been trained sufficiently.

4.3. Evaluation of AI2C2

As shown in Table 8, we evaluated RBC and the proposed method by comparing total energy consumptions, healthy air ratios, and IAQ with device states using RBC (Figure 10 and Figure 11) and the proposed method (Figure 12 and Figure 13).

4.3.1. Case 1 (Outdoor PM 2.5 Concentration: Unhealthy-Very Unhealthy)

Case 1 supposed that the outdoor PM 2.5 concentration was between unhealthy and very unhealthy, and the outdoor CO₂ concentration was fixed at 412.7 ppm. Figure 12 illustrates the indoor environmental condition based on the device state of the proposed method in episode 19,981.

With respect to energy consumption, the total energy consumption of the proposed method is 317.8 Wh, which is 15.3% lower than that of RBC (375.3 Wh). The air purifier consumes more energy under the proposed method than under RBC (RBC: 54 Wh vs. AI2C2: 194.4 Wh). However, this increase is counterbalanced by the lower energy consumption of the other devices (i.e., the ventilation system and kitchen hood). For the ventilation system, the proposed method consumes 64.3% less energy than RBC. This result indicates that the proposed method reflects the characteristics of the ventilation system, which consumes more energy than the air purifier and kitchen hood, by operating it only when ventilation is necessary. For example, as shown in Figure 12 by the yellow background, the proposed method operates the ventilation system to prevent the indoor CO₂ concentration from exceeding the upper limit when the occupant is exercising, an activity with a high CO₂ emission rate. The proposed method also operates the ventilation system to reduce both indoor CO₂ and PM 2.5 when the occupant activity increases the emission rates of CO₂ and PM 2.5 simultaneously, such as during indoor cleaning and cooking (Figure 12, blue background).
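The reported percentages can be checked directly from the energy totals quoted above. The short sketch below recomputes them; the helper name `percent_change` is hypothetical, and the last line assumes that the total is the sum of the three devices' consumption.

```python
def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Case 1 values reported in the text (Wh).
rbc_total, ai2c2_total = 375.3, 317.8
rbc_purifier, ai2c2_purifier = 54.0, 194.4

total_change = percent_change(rbc_total, ai2c2_total)            # about -15.3 %
purifier_change = percent_change(rbc_purifier, ai2c2_purifier)   # about +260 %

# Assumption: total = air purifier + ventilation system + kitchen hood.
other_devices_saving = (rbc_total - rbc_purifier) - (ai2c2_total - ai2c2_purifier)
```

Under that assumption, the ventilation system and kitchen hood together save about 197.9 Wh, which more than offsets the 140.4 Wh increase of the air purifier.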

As depicted in Figure 12, the proposed method operates the air purifier more often than the ventilation system and kitchen hood because the air purifier consumes less energy than either of them. Furthermore, the operation time of the air purifier is increased to compensate for the reduced indoor PM 2.5 removal that results from the shorter operation time of the ventilation system. The kitchen hood also consumes more energy than the air purifier; thus, the proposed method minimizes the operation of the kitchen hood, except during indoor cleaning and cooking, which have high PM 2.5 emission rates (Figure 12, green background). As a result, the kitchen hood consumes 52.4% less energy under the proposed method than under RBC. This shows that the DDQN control can efficiently operate the indoor environmental systems based on the characteristics of indoor climate control devices, indoor/outdoor environments, and occupant activities to decrease the total energy consumption.
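The energy ranking of the devices follows from the rated powers in Table 1. The sketch below is a simplified illustration that assumes each device draws its rated power whenever it is on, regardless of fan speed; the study's energy model may differ, and the function names are hypothetical.

```python
# Rated power from Table 1 (W).
RATED_POWER_W = {
    "ventilation_system": 400.0,
    "kitchen_hood": 51.0,
    "air_purifier": 30.0,
}


def energy_wh(device, minutes_on):
    """Energy use, assuming the device draws its rated power whenever it is on."""
    return RATED_POWER_W[device] * minutes_on / 60.0


def minutes_for_energy(device, wh):
    """Running time implied by an energy total, under the same assumption."""
    return wh / RATED_POWER_W[device] * 60.0


# Energy for one hour of operation of each device (Wh).
one_hour = {device: energy_wh(device, 60) for device in RATED_POWER_W}

# Running time implied by the air purifier's 194.4 Wh in Case 1 (proposed method).
purifier_minutes = minutes_for_energy("air_purifier", 194.4)
```

Under this assumption, one hour of ventilation costs as much energy as more than 13 hours of air purifier operation, which is consistent with the agent shifting PM 2.5 removal toward the air purifier.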

In terms of IAQ, neither the proposed method nor RBC could satisfy the PM 2.5 standard when the PM 2.5 emission rate is high, such as during cooking (grilling and frying), as shown in Figure 10a,b and Figure 12a,b. With the proposed algorithm, the indoor PM 2.5 concentration exceeds the upper limit for 35 min while the occupant is grilling (Figure 12a). This duration is longer than the excess time of RBC (20 min, Figure 10a) because the proposed method prioritizes running the air purifier to increase energy efficiency, unlike RBC, which operates all devices to remove PM 2.5. The same pattern appears when the occupant is frying (Figure 10b and Figure 12b). Therefore, the healthy air ratio (PM 2.5) of the proposed method is lower than that of RBC during cooking (grilling and frying).

However, under RBC, there are times when the indoor PM 2.5 concentration exceeds the upper limit during occupant activities other than cooking (grilling and frying). This happens because RBC operates the systems only after the concentration has exceeded the acceptable level of 25 μg/m³. This inefficient operation of RBC is illustrated in Figure 10c. In contrast, the proposed method operates the indoor climate control devices before the indoor PM 2.5 concentration exceeds the upper limit, as displayed in Figure 12c. This effective operation leads to an improved IAQ; thus, the healthy air ratio (PM 2.5) of the proposed method is 94.1%, which is 1.6% higher than that of RBC.
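The reactive behavior of RBC can be illustrated with a deliberately simplified threshold rule. This is a sketch of the idea only, not the RBC logic used in the study: the rule below considers PM 2.5 alone, the function names are hypothetical, and the concentration trace is synthetic.

```python
PM25_LIMIT = 25.0  # µg/m³, acceptable level stated in the text


def rbc_devices_on(pm25):
    """Reactive rule: the devices run only once the limit has been exceeded."""
    return pm25 > PM25_LIMIT


def first_on_step(trace):
    """Index of the first time step at which the rule switches the devices on."""
    for step, pm25 in enumerate(trace):
        if rbc_devices_on(pm25):
            return step
    return None


# Synthetic rising PM2.5 trace (µg/m³), for illustration only.
trace = [12.0, 18.0, 24.0, 27.0, 31.0]
switch_step = first_on_step(trace)
```

With such a rule the devices start at the step where the concentration is already 27 µg/m³, so at least one time step above the limit is unavoidable, whereas a learned policy can act while the concentration is still rising toward the limit.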

The healthy air ratio (CO₂) of the proposed method is 99.7%, a slight improvement of 0.5% over that of RBC. In RBC, the indoor CO₂ concentration does not satisfy the IAQ standard of 1000 ppm when the activity level is high, as illustrated in Figure 10d. In contrast, the proposed method achieves a higher healthy air ratio (CO₂) than RBC (Figure 12, yellow background) because AI2C2 activates the ventilation system before the indoor CO₂ concentration surpasses the acceptable level.
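For readers who want to reproduce this kind of metric, a healthy air ratio can be computed as the share of time steps in which a pollutant stays within its limit. The definition below is an assumption made for illustration (the study's exact definition is given elsewhere in the paper), and the sample series is synthetic.

```python
PM25_LIMIT = 25.0   # µg/m³
CO2_LIMIT = 1000.0  # ppm


def healthy_air_ratio(concentrations, limit):
    """Percentage of time steps at or below the limit (assumed definition)."""
    if not concentrations:
        raise ValueError("empty concentration series")
    healthy = sum(1 for c in concentrations if c <= limit)
    return 100.0 * healthy / len(concentrations)


# Synthetic series for illustration only.
pm25_ratio = healthy_air_ratio([10.0, 20.0, 30.0, 25.0], PM25_LIMIT)
co2_ratio = healthy_air_ratio([600.0, 950.0, 1000.0, 980.0], CO2_LIMIT)
```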

4.3.2. Case 2 (Outdoor PM 2.5 Concentration: Good-Moderate)

Case 2 assumed that the outdoor PM 2.5 concentration was between good and moderate, and the outdoor CO₂ concentration was fixed at 412.7 ppm. As with Case 1, the performance of the proposed method was evaluated using the average value over the last 50 episodes. Figure 13 depicts the indoor climate condition based on the device status of the proposed method in episode 19,992 of the DDQN learning. As shown in Table 8, the total energy usage of the proposed method is 160.9 Wh, which is 9.6% lower than that of RBC (177.9 Wh) when the outdoor PM 2.5 concentration is between good and moderate. The changes in energy consumption by device follow the same trend as in Case 1. Compared to RBC, the proposed method utilizes the air purifier more actively than the ventilation system and kitchen hood: the energy consumption of the air purifier increases by 166.5%, whereas that of the ventilation system and kitchen hood decreases by 30.2% and 50.2%, respectively. This operation, which actively utilizes the air purifier, decreases the total energy consumption and offsets the reduction in PM 2.5 removal caused by the decreased operation of the ventilation system and kitchen hood. This shows that RL can improve energy efficiency by reflecting the characteristics of climate control devices.

As shown in Figure 11 by the yellow background, when an occupant activity emits a large amount of particulate matter, the indoor PM 2.5 concentration does not satisfy the acceptable level even when all devices are operating. Under the proposed method, the concentration exceeds the upper limit for 38 min (Figure 13a) and 45 min (Figure 13b) when the occupant activity is cooking (grilling and frying, respectively). This excess time is 23 min longer than that of RBC in the case of grilling and 27 min longer in the case of frying. This is because the proposed method primarily utilizes the air purifier for energy-efficient operation, unlike RBC, which utilizes all indoor climate devices to remove indoor PM 2.5 (Figure 13, yellow background). Instead, as shown in Figure 13 by the purple background, the proposed method rapidly starts to operate the ventilation system and kitchen hood to eliminate indoor PM 2.5 after the occupant has finished cooking. In other words, when the PM 2.5 emission rate is so high that operating all devices still does not satisfy the particulate matter standard, the proposed method operates the devices after the particulate matter generation has decreased. Thus, the proposed method shows slightly lower performance than RBC. Specifically, the healthy air ratio (PM 2.5) of the proposed method is 94.2%, which is 2.8% lower than that of RBC. This control strategy of the proposed method is similarly observed in Case 1 (Figure 12, purple background).

4.4. The Performance of AI2C2 according to Reward Weights

In RL applications, selecting proper hyperparameters has a significant impact on model performance, and finding the best hyperparameter setting is a laborious process that requires many iterations and much time [47]. The reward function and the weight assigned to each reward are also hyperparameters that affect model performance. We repeated several trials to determine the best setting for producing the optimal DDQN model. In selecting the weight factor for the proposed method, our objective was to improve IAQ while saving energy compared to RBC. For example, Table 9 presents the performance of the proposed method in Case 1 according to the reward weights. First, we set the weight factor ratio to 2:2:1 (w_PM2.5:w_CO2:w_EC) to increase the importance of IAQ. As a result, AI2C2 improved on RBC by 3.2% in the healthy air ratio (PM 2.5) and by 0.5% in the healthy air ratio (CO₂). However, the total energy consumption of the proposed method increased by 41.3% compared to RBC, indicating lower energy efficiency. Therefore, we adjusted the weight factor ratio to 1:1:1 to reduce the importance assigned to IAQ. With this ratio, the proposed method showed better energy performance, with a total energy consumption of 244.9 Wh; however, the healthy air ratio (PM 2.5) was 91.4%, which is lower than that of RBC (92.5%). In other words, depending on the weight factor ratio, the proposed method favors either energy savings or IAQ improvement. After many further attempts to find a ratio that improves IAQ while decreasing the total energy consumption relative to RBC, the weight factor ratio was finally set to 1:1:3 (w_PM2.5:w_CO2:w_EC), as explained in Section 3.2. This process demonstrates the importance of hyperparameter selection in the DDQN.
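The selection criterion described above (better IAQ and lower energy use than RBC) can be expressed as a simple filter over the candidate weight ratios. The sketch below uses the Case 1 figures quoted in the text; the two values for the 2:2:1 ratio are derived from the reported relative changes, reading the 3.2% improvement as percentage points, which is an assumption. Only the PM 2.5 healthy air ratio is checked, because the CO₂ ratio is not quoted for every candidate.

```python
RBC = {"energy_wh": 375.3, "pm25_ratio": 92.5}  # Case 1 baseline from the text

# Results per weight ratio (w_PM2.5:w_CO2:w_EC). The 2:2:1 entries are derived
# from the reported changes (+41.3 % energy, +3.2 points PM2.5 ratio).
CANDIDATES = {
    "2:2:1": {"energy_wh": 375.3 * 1.413, "pm25_ratio": 92.5 + 3.2},
    "1:1:1": {"energy_wh": 244.9, "pm25_ratio": 91.4},
    "1:1:3": {"energy_wh": 317.8, "pm25_ratio": 94.1},
}


def beats_baseline(result, baseline):
    """True when energy is lower and the PM2.5 healthy air ratio is higher."""
    return (result["energy_wh"] < baseline["energy_wh"]
            and result["pm25_ratio"] > baseline["pm25_ratio"])


selected = [ratio for ratio, result in CANDIDATES.items()
            if beats_baseline(result, RBC)]
```

Of the three ratios quoted, only 1:1:3 passes both checks, which matches the ratio finally adopted for Case 1.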

5. Discussion

In summary, this study confirmed that applying RL to building control can maintain IAQ with energy efficiency. The importance of this study is that it goes beyond previous research that operated only one indoor climate device, which represents a limited situation [48], such as HVAC [49,50], ventilation systems [51,52], or air purifiers [19]. Instead, it utilizes various indoor environmental systems to improve IAQ while decreasing the total energy consumption.

Furthermore, this study examined how occupant activities affect indoor PM 2.5 and CO₂ concentrations by classifying the occupant activities in detail, and it showed that the proposed method can operate indoor environmental systems effectively by considering occupant activities. Valladares et al. [25] included the number of occupants in the state variables to control the air conditioner and ventilation fans by employing a DQN and DDQN. The number of people was changed based on a schedule in a laboratory room with approximately 2–10 occupants, as well as in a classroom with up to 60 occupants, and the occupant activity was fixed as sitting on a chair. The developed algorithm achieved a superior PMV and 10% lower CO₂ levels while consuming approximately 4–5% less energy. Building control strategies for floor heating that reflect the occupant status can also be found in the literature. Zhang and Lam [53] utilized the occupancy status (absence/presence) as one of the state values. This control method decreased the heating demand by 16.6–18.2% compared to RBC over a three-month period. Deng et al. [54] also used an occupant flag (absence/presence) to develop a DQN model for optimal HVAC control. They demonstrated that thermal comfort and energy saving increased by 9% and 13%, respectively. These studies show that it is possible to improve the indoor environment while saving energy using relatively simple occupant-related parameters such as occupancy and the number of occupants. However, unlike this study, they did not consider changes in the indoor environment according to the occupant activity. By considering these changes, the proposed method enabled precise control to maintain acceptable PM 2.5 and CO₂ concentrations. The proposed method also showed more energy-efficient device control than RBC in both cases by reflecting occupant activity and indoor climate control device characteristics simultaneously.

In this study, we conducted the simulation for a single zone using the numerical model (EnergyPlus) for the CO₂ concentration and Nazaroff’s particle dynamics model for PM 2.5. Although IAQ can be calculated with a relatively simple method for such a single zone, it is difficult to apply this simple method to actual buildings because most spaces where occupants live are not single-zone but multi-zone. Therefore, simulation programs such as CONTAM [55] and COMIS [56], which consider airflows and pollutant concentrations across various zones, should be used to extend the algorithm to multi-zone spaces. In addition, the numerical models utilized in this study predicted average concentrations. However, in real residential environments, PM 2.5 concentrations vary between spaces (e.g., kitchen and bedroom), and the flow of pollutants from contamination sources changes based on the emission points. Thus, it is necessary to employ computational fluid dynamics to improve the proposed method so that it reflects the different IAQ in different spaces.
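To show what a single-zone, average-concentration calculation looks like, the sketch below advances a generic well-mixed PM 2.5 mass balance by one explicit Euler step. It is not the formulation used in the study: infiltration is ignored, the filter efficiencies and deposition rate are hypothetical parameters, and only the interior volume (68 m³), the grilling emission rate (283 µg/min), and the air purifier's high airflow rate (0.08 m³/s) are taken from Tables 1 and 2.

```python
def step_pm25(c_in, c_out, emission_ug_min, dt_min=1.0, volume_m3=68.0,
              vent_flow_m3s=0.0, purifier_flow_m3s=0.0,
              supply_filter_eff=0.0, purifier_eff=1.0, deposition_per_min=0.0):
    """One explicit Euler step of a well-mixed single-zone PM2.5 balance.

    Concentrations in µg/m³, emission in µg/min, airflow rates in m³/s.
    """
    q_vent = vent_flow_m3s * 60.0     # balanced supply/exhaust flow, m³/min
    q_pur = purifier_flow_m3s * 60.0  # recirculated flow, m³/min
    # Sources: indoor emission plus outdoor particles passing the supply filter.
    gain = emission_ug_min + q_vent * (1.0 - supply_filter_eff) * c_out
    # Sinks: exhaust air, purifier filtration, and surface deposition.
    loss = (q_vent + q_pur * purifier_eff + deposition_per_min * volume_m3) * c_in
    return c_in + dt_min * (gain - loss) / volume_m3


# Grilling for one minute with all devices off.
after_grilling = step_pm25(c_in=10.0, c_out=50.0, emission_ug_min=283.0)

# One minute of air purifier operation at high speed with no indoor source.
after_purifier = step_pm25(c_in=10.0, c_out=50.0, emission_ug_min=0.0,
                           purifier_flow_m3s=0.08)
```

Because the zone is treated as perfectly mixed, the result is a single average concentration, which is exactly the limitation noted above for kitchens and bedrooms with different local concentrations.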

RL could also be utilized to enhance IEQ factors other than IAQ (i.e., thermal, visual, and acoustic comfort) [57]. Thus, this study can be expanded to include other indoor climate systems, including an air conditioner, humidifier, dehumidifier, lighting, and blinds.

6. Conclusions

DDQN was employed to develop the proposed algorithm by reflecting variations in indoor and outdoor air conditions, the operating state of the indoor climate control devices, and occupant activities that affect IAQ and energy consumption. The proposed method operates multiple indoor climate control devices simultaneously, such as ventilation systems, kitchen hoods, and air purifiers. The proposed method was evaluated based on the total energy usage and healthy air ratios.

In this study, the performance of the proposed method was evaluated through two simulation cases based on the outdoor PM 2.5 concentration. The proposed method was trained to use all indoor environmental systems. In Case 1 (unhealthy outdoor condition), which assumed that the outdoor PM 2.5 concentration was between unhealthy and very unhealthy, the proposed method reduced the total energy usage by 15.3% compared to RBC. Although AI2C2 increased the energy consumption of the air purifier, this increase was counterbalanced by the lower energy usage of the other devices. This is because the air purifier has the lowest power consumption among the indoor environmental systems, and DDQN control can efficiently operate the indoor environmental systems for energy conservation. Concerning IAQ, the healthy air ratios of the proposed method were slightly higher than those of RBC, by 1.6% (PM 2.5) and 0.5% (CO₂). This is because RBC operated the indoor environmental systems only after the IAQ standard had been exceeded, and this inefficient operation decreased the healthy air ratio. In contrast, the proposed method operated the indoor environmental systems before the indoor PM 2.5 and CO₂ concentrations exceeded the acceptable level. This effective operation resulted in better IAQ for the space.

In Case 2 (moderate outdoor condition), in which the outdoor PM 2.5 concentration was between good and moderate, the proposed method decreased the total energy consumption by 9.6% compared to RBC (160.9 Wh vs. 177.9 Wh). The healthy air ratios of the proposed method were slightly lower than those of RBC. This is because the proposed method operated devices with low energy consumption (e.g., air purifier) after the occupant activity was over, rather than operating indoor climate control devices simultaneously to remove indoor particulate matter while the occupant activity was in progress.

In summary, the main contributions of this study include:

The proposed method performed energy-efficient control by reflecting the characteristics of indoor climate control devices. It actively operated the air purifier, which has low energy consumption, instead of the ventilation system and kitchen hood, which have relatively high energy consumption. It used the ventilation system only in situations where ventilation was necessary (e.g., cooking, indoor cleaning, exercising).

The proposed method also reflected occupant activities to improve IAQ by operating the indoor climate control devices before the indoor PM 2.5 and CO₂ concentrations exceeded the IAQ standard. However, when the indoor PM 2.5 concentration exceeded the upper limit even if all indoor climate control devices were operated simultaneously, such as during cooking (grilling and frying), the proposed method selected a strategy of energy saving rather than IAQ improvement by operating the air purifier after the occupant activity was over. This operation reduced the total energy consumption.

The performance of the proposed method varied based on the reward weight. When a higher weight was given to energy saving, energy efficiency improved, but IAQ performance decreased. Conversely, when more weight was given to IAQ, IAQ improved, but the control was inefficient in terms of energy. This shows the importance of hyperparameter selection in the DDQN and indicates that the performance of the proposed method can be adjusted according to the stakeholders’ preferences.

The effectiveness of the proposed algorithm varied depending on the outdoor PM 2.5 concentration. When the outdoor PM 2.5 concentration was between unhealthy and very unhealthy, the proposed method operated the indoor climate devices effectively, leading to improvement in both IAQ and energy efficiency. In contrast, when the outdoor PM 2.5 concentration was between good and moderate, the proposed method demonstrated improved performance in terms of energy efficiency but showed similar performance in terms of IAQ. This is because the proposed method operated devices after activities that generate a large amount of particulate matter. When the outdoor PM 2.5 concentration is good, the IAQ improvement is not significant because the indoor PM 2.5 level is influenced more by occupant activities (PM 2.5 emission rate) than by the outdoor air condition. This result showed that RL-based control is more effective when outdoor PM 2.5 levels are high, because it improves energy efficiency and IAQ simultaneously.

In recent years, various types of building monitoring data have been collected through building energy management systems. Moreover, indoor climate control devices, e.g., air conditioners, ventilation systems, air purifiers, blinds, and dimming switches, have become more commonly available as standards of living have risen. At the same time, rapidly changing outdoor weather and unpredictable occupant activities must be considered. Hence, the implementation of the proposed method can result in a more effective and sophisticated control model for future smart buildings.

Author Contributions

Data curation, S.H.K.; Formal analysis, S.H.K.; Investigation, S.H.K.; Methodology, S.H.K. and H.J.M.; Visualization, S.H.K.; Writing—original draft, S.H.K. and H.J.M.; Writing—review and editing, H.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (No. 20212020800120). This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1A2B5B02002699).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

IEQ	Indoor environmental quality
RBC	Rule-based control
RL	Reinforcement learning
DNN	Deep neural network
DDQN	Double deep Q-network
AI2C2	Artificial intelligence–integrated clean control
IAQ	Indoor air quality
PI	Proportional integral
MPC	Model predictive control
SF	State feedback
DQN	Deep Q-network
PMV	Predicted mean vote

References

1. EIA. International Energy Outlook 2021; EIA: Washington, DC, USA, 2022.
2. Sarbu, I.; Sebarchievici, C. Aspects of Indoor Environmental Quality Assessment in Buildings. Energy Build. 2013, 60, 410–419.
3. Elsaid, A.M.; Ahmed, M.S. Indoor Air Quality Strategies for Air-Conditioning and Ventilation Systems with the Spread of the Global Coronavirus (COVID-19) Epidemic: Improvements and Recommendations. Environ. Res. 2021, 199, 111314.
4. Jones, A.P. Indoor Air Quality and Health. Atmos. Environ. 2000, 34, 2663.
5. Lowther, S.D.; Dimitroulopoulou, S.; Foxall, K.; Shrubsole, C.; Cheek, E.; Gadeberg, B.; Sepai, O. Low Level Carbon Dioxide Indoors—A Pollution Indicator or a Pollutant? A Health-Based Perspective. Environments 2021, 8, 125.
6. Satish, U.; Mendell, M.J.; Shekhar, K.; Hotchi, T.; Sullivan, D.; Streufert, S.; Fisk, W.J. Is CO₂ an Indoor Pollutant? Direct Effects of Low-to-Moderate CO₂ Concentrations on Human Decision-Making Performance. Environ. Health Perspect. 2012, 120, 1671–1677.
7. Erdmann, C.A.; Apte, M.G. Mucous Membrane and Lower Respiratory Building Related Symptoms in Relation to Indoor Carbon Dioxide Concentrations in the 100-Building BASE Dataset. Indoor Air Suppl. 2004, 14, 127–134.
8. Kim, K.H.; Kabir, E.; Kabir, S. A Review on the Human Health Impact of Airborne Particulate Matter. Environ. Int. 2015, 74, 136–143.
9. Sofer, T.; Baccarelli, A.; Cantone, L.; Coull, B.; Maity, A.; Lin, X.; Schwartz, J. Exposure to Airborne Particulate Matter Is Associated with Methylation Pattern in the Asthma Pathway. Epigenomics 2013, 5, 147–154.
10. Zhu, Y.; Hinds, W.C.; Krudysz, M.; Kuhn, T.; Froines, J.; Sioutas, C. Penetration of Freeway Ultrafine Particles into Indoor Environments. J. Aerosol Sci. 2005, 36, 303–322.
11. Ceccolini, C.; Sangi, R. Benchmarking Approaches for Assessing the Performance of Building Control Strategies: A Review. Energies 2022, 15, 1270.
12. Kim, S.H.; Yoon, Y.R.; Kim, J.W.; Moon, H.J. An Integrated Operation of Indoor Environmental Systems Considering Occupants Behaviour to Maintain Thermal Comfort and the Concentration of Indoor Particulate Matters. J. Korean Soc. Living Environ. Syst. 2021, 28, 8–21.
13. Shaikh, P.H.; Nor, N.B.M.; Nallagownden, P.; Elamvazuthi, I.; Ibrahim, T. A Review on Optimized Control Systems for Building Energy and Comfort Management of Smart Sustainable Buildings. Renew. Sustain. Energy Rev. 2014, 34, 409–429.
14. Serale, G.; Fiorentini, M.; Capozzoli, A.; Bernardini, D.; Bemporad, A. Model Predictive Control (MPC) for Enhancing Building and HVAC System Energy Efficiency: Problem Formulation, Applications and Opportunities. Energies 2018, 11, 631.
15. Berouine, A.; Ouladsine, R.; Bakhouya, M.; Lachhab, F.; Essaaidi, M. A Model Predictive Approach for Ventilation System Control in Energy Efficient Buildings. In Proceedings of the 2019 4th World Conference on Complex Systems (WCCS), Ouarzazate, Morocco, 22–25 April 2019; pp. 9–14.
16. Wei, T.; Wang, Y.; Zhu, Q. Deep Reinforcement Learning for Building HVAC Control. In Proceedings of the 54th Annual Design Automation Conference 2017, Austin, TX, USA, 18–22 June 2017. Part 12828.
17. Li, B.; Xia, L. A Multi-Grid Reinforcement Learning Method for Energy Conservation and Comfort of HVAC in Buildings. In Proceedings of the 2015 IEEE International Conference on Automation Science and Engineering (CASE), Gothenburg, Sweden, 24–28 August 2015; pp. 444–449.
18. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
19. An, Y.; Niu, Z.; Chen, C. Smart Control of Window and Air Cleaner for Mitigating Indoor PM2.5 with Reduced Energy Consumption Based on Deep Reinforcement Learning. Build. Environ. 2022, 224, 109583.
20. An, Y.; Xia, T.; You, R.; Lai, D.; Liu, J.; Chen, C. A Reinforcement Learning Approach for Control of Window Behavior to Reduce Indoor PM2.5 Concentrations in Naturally Ventilated Buildings. Build. Environ. 2021, 200, 107978.
21. Yu, K.H.; Chen, Y.A.; Jaimes, E.; Wu, W.C.; Liao, K.K.; Liao, J.C.; Lu, K.C.; Sheu, W.J.; Wang, C.C. Optimization of Thermal Comfort, Indoor Quality, and Energy-Saving in Campus Classroom through Deep Q Learning. Case Stud. Therm. Eng. 2021, 24, 100842.
22. Thrun, S.; Schwartz, A. Issues in Using Function Approximation for Reinforcement Learning. In Proceedings of the 1993 Connectionist Models Summer School; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993; pp. 1–9.
23. Van Hasselt, H. Insights in Reinforcement Learning: Formal Analysis and Empirical Evaluation of Temporal-Difference Learning Algorithms. Ph.D. Thesis, Utrecht University, Utrecht, The Netherlands, 2011. ISBN 9789039354964.
24. Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2094–2100.
25. Valladares, W.; Galindo, M.; Gutiérrez, J.; Wu, W.C.; Liao, K.K.; Liao, J.C.; Lu, K.C.; Wang, C.C. Energy Optimization Associated with Thermal Comfort and Indoor Air Control via a Deep Reinforcement Learning Algorithm. Build. Environ. 2019, 155, 105–117.
26. Zhang, C.; Zhang, Z.; Loftness, V. Bio-Sensing and Reinforcement Learning Approaches for Occupant-Centric Control. ASHRAE Trans. 2019, 125, 364–371.
27. Liu, Y.; Zhang, D.; Gooi, H.B. Optimization Strategy Based on Deep Reinforcement Learning for Home Energy Management. CSEE J. Power Energy Syst. 2020, 6, 572–582.
28. Andersen, R.V.; Toftum, J.; Andersen, K.K.; Olesen, B.W. Survey of Occupant Behaviour and Control of Indoor Environment in Danish Dwellings. Energy Build. 2009, 41, 11–16.
29. Patton, A.P.; Calderon, L.; Xiong, Y.; Wang, Z.; Senick, J.; Allacci, M.S.; Plotnik, D.; Wener, R.; Andrews, C.J.; Krogmann, U.; et al. Airborne Particulate Matter in Two Multi-Family Green Buildings: Concentrations and Effect of Ventilation and Occupant Behavior. Int. J. Environ. Res. Public Health 2016, 13, 144.
30. He, C.; Morawska, L.; Hitchins, J.; Gilbert, D. Contribution from Indoor Sources to Particle Number and Mass Concentrations in Residential Houses. Atmos. Environ. 2004, 38, 3405–3415.
31. Persily, A.; de Jonge, L. Carbon Dioxide Generation Rates for Building Occupants. Indoor Air 2017, 27, 868–879.
32. Leung, D.Y.C. Outdoor-Indoor Air Pollution in Urban Environment: Challenges and Opportunity. Front. Environ. Sci. 2015, 2, 69.
33. Crawley, D.B.; Lawrie, L.K.; Winkelmann, F.C.; Buhl, W.F.; Huang, Y.J.; Pedersen, C.O.; Strand, R.K.; Liesen, R.J.; Fisher, D.E.; Witte, M.J.; et al. EnergyPlus: Creating a New-Generation Building Energy Simulation Program. Energy Build. 2001, 33, 319–331.
34. Kim, S.H.; Kim, J.W.; Moon, H.J. Advanced Optimal Control of Indoor Environmental Devices for Indoor Air Quality Using Reinforcement Learning. In Proceedings of the 42nd AIVC - 10th TightVent - 8th venticool Conference, Rotterdam, The Netherlands, 5–6 October 2022.
35. U.S. Department of Energy. EnergyPlus™ Engineering Reference; U.S. Department of Energy: Golden, CO, USA, 2020.
36. Nazaroff, W.W. Indoor Particle Dynamics. Indoor Air Suppl. 2004, 14, 175–183.
37. Hu, T.; Singer, B.C.; Logue, J.M. Compilation of Published PM2.5 Emission Rates for Cooking, Candles and Incense for Use in Modeling of Exposures in Residences; LBNL-5890E; Lawrence Berkeley National Lab. (LBNL): Berkeley, CA, USA, 2012; pp. 1–29.
38. Ji Hye, K. Development of Indoor Particulate Matter Concentration and Emission Prediction Model. Soc. Air-Cond. Refrig. Eng. Korea 2018, 47, 44–48.
39. U.S. Department of Energy. Input Output Reference; U.S. Department of Energy: Golden, CO, USA, 2019.
40. United Nations Statistics Division (UNSD). International Classification of Activities for Time Use Statistics 2016; United Nations: New York, NY, USA, 2021; ISBN 978-92-1-161639-2.
41. Statistics Korea. Korean Statistical Information Service Living Time Survey: Results of 2019. Korean Stat. Inf. Serv. 2020. Available online: https://kostat.go.kr/synap/skin/doc.html?fn=1d14134bd2ed11f716f4bdf2f7d155c0828c7657118d63cd586a46007cbc9d3c&rs=/synap/preview/board/220/ (accessed on 6 November 2023).
42. Eppy Documentation. Available online: https://eppy.readthedocs.io/en/latest/index.html (accessed on 5 December 2023).
43. Kim, S.H.; Yoon, Y.R.; Kim, J.W.; Moon, H.J. Reinforcement Learning for Integrated Optimal Control of Ventilation System and an Air Purifier to Maintain Healthy IAQ. J. Korean Soc. Living Environ. Syst. 2022, 29, 176–190.
44. Korea Meteorological Administration Climate Change Monitoring. Available online: http://www.climate.go.kr/home/09_monitoring/ (accessed on 23 November 2022).
45. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
46. Paola, J.D. Neural Network Classification of Multispectral Imagery. Ph.D. Thesis, The University of Arizona, Tucson, AZ, USA, 1994.
47. Kiran, M.; Ozyildirim, M. Hyperparameter Tuning for Deep Reinforcement Learning Applications. arXiv 2022, arXiv:2201.11182.
48. Pargfrieder, J.; Jörgl, H.P. An Integrated Control System for Optimizing the Energy Consumption and User Comfort in Buildings. In Proceedings of the IEEE International Symposium on Computer Aided Control System Design, Glasgow, UK, 20 September 2002; pp. 127–132.
49. Yu, L.; Sun, Y.; Xu, Z.; Shen, C.; Yue, D.; Jiang, T.; Guan, X. Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings. IEEE Trans. Smart Grid 2021, 12, 407–419.
50. Baghaee, S.; Ulusoy, I. User Comfort and Energy Efficiency in HVAC Systems by Q-Learning. In Proceedings of the 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2–5 May 2018; pp. 1–4.
51. Heo, S.; Nam, K.; Loy-Benitez, J.; Li, Q.; Lee, S.; Yoo, C. A Deep Reinforcement Learning-Based Autonomous Ventilation Control System for Smart Indoor Air Quality Management in a Subway Station. Energy Build. 2019, 202, 109440.
52. Ganesh, H.S.; Seo, K.; Fritz, H.E.; Edgar, T.F.; Novoselac, A.; Baldea, M. Indoor Air Quality and Energy Management in Buildings Using Combined Moving Horizon Estimation and Model Predictive Control. J. Build. Eng. 2021, 33, 101552.
53. Zhang, Z.; Lam, K.P. Practical Implementation and Evaluation of Deep Reinforcement Learning Control for a Radiant Heating System. In Proceedings of the 5th Conference on Systems for Built Environments, Shenzhen, China, 7–8 November 2018; pp. 148–157.
54. Deng, X.; Zhang, Y.; Qi, H. Towards Optimal HVAC Control in Non-Stationary Building Environments Combining Active Change Detection and Deep Reinforcement Learning. Build. Environ. 2022, 211, 108680.
55. Dols, W.S.; Polidoro, B. CONTAM User Guide and Program Documentation: Version 3.2; US Department of Commerce, National Institute of Standards and Technology: Gaithersburg, MD, USA, 2015.
56. Haas, A.; Weber, A.; Dorer, V.; Keilholz, W.; Pelletret, R. COMIS v3.1 Simulation Environment for Multizone Air Flow and Pollutant Transport Modelling. Energy Build. 2002, 34, 873–888.
57. Dalamagkidis, K.; Kolokotsa, D. Reinforcement Learning for Building Environmental Control. In Reinforcement Learning; IntechOpen: London, UK, 2008.

Figure 1. Concept of the proposed RL-based approach.

Figure 2. Plan of the test chamber and indoor climate control devices.

Figure 3. Time and duration of occupant activities in a residential building (updated from [34]).

Figure 4. A schematic diagram of the testbed and sources of indoor contaminants for the numerical IAQ model.

Figure 5. Co-simulation platform for AI2C2.

Figure 6. Indoor PM 2.5 concentrations when all indoor climate control systems are off (Case 1).

Figure 7. Indoor PM 2.5 concentrations when all indoor climate control systems are off (Case 2).

Figure 8. Indoor CO₂ concentrations when all indoor climate control systems are off (all cases).

Figure 9. Convergence of AI2C2 (left: Case 1, right: Case 2).

Figure 10. IAQ and operation of the indoor climate control devices using RBC (Case 1).

Figure 11. IAQ and operation of the indoor climate control devices using RBC (Case 2).

Figure 12. IAQ and operation of the indoor climate control devices using AI2C2 (Case 1, episode 19,981).

Figure 13. IAQ and operation of the indoor climate control devices using AI2C2 (Case 2).

Table 1. Specification of the indoor climate control systems.

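Indoor Environmental Control System	Parameter	Level	Value
Ventilation system	Supply/exhaust airflow rate (m³/s)	Low	0.042
Ventilation system	Supply/exhaust airflow rate (m³/s)	Medium	0.057
Ventilation system	Supply/exhaust airflow rate (m³/s)	High	0.07
Ventilation system	Rated power (W)	-	400
Kitchen hood	Exhaust airflow rate (m³/s)	Low	0.045
Kitchen hood	Exhaust airflow rate (m³/s)	High	0.055
Kitchen hood	Rated power (W)	-	51
Air purifier	Airflow rate (m³/s)	Low	0.042
Air purifier	Airflow rate (m³/s)	Medium	0.053
Air purifier	Airflow rate (m³/s)	High	0.08
Air purifier	Rated power (W)	-	30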

Table 2. Input value for indoor particle dynamics in the testbed.

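Parameter	Condition	Input Value
Emission rate in the room (µg/min)	Indoor cleaning (vacuuming)	70
Emission rate in the room (µg/min)	Cooking (oven)	10
Emission rate in the room (µg/min)	Cooking (grilling)	283
Emission rate in the room (µg/min)	Cooking (frying)	1483
Interior volume (m³)	-	68
Outdoor PM 2.5 concentration (µg/m³)	Case 1	36–135
Outdoor PM 2.5 concentration (µg/m³)	Case 2	12–25
Mechanical ventilation flow rate (m³/min)	Off	0
Mechanical ventilation flow rate (m³/min)	Low	2.5
Mechanical ventilation flow rate (m³/min)	Medium	3.4
Mechanical ventilation flow rate (m³/min)	High	4.2
Mechanical supply filter efficiency (-)	-	0.9
Kitchen hood flow rate (m³/min)	Off	0
Kitchen hood flow rate (m³/min)	Low	2.7
Kitchen hood flow rate (m³/min)	High	3.3
Air purifier flow rate (m³/min)	Off	0
Air purifier flow rate (m³/min)	Low	2.5
Air purifier flow rate (m³/min)	Medium	3.2
Air purifier flow rate (m³/min)	High	4.8
Air purifier filter efficiency (-)	-	0.9
Natural ventilation flow rate (m³/min)	-	0
Natural infiltration flow rate (m³/min)	-	0.56
Particle fractions in the infiltration flow path (-)	-	0.7
Deposition rate of particles onto room surfaces (min⁻¹)	-	0.0067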
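
The inputs in Table 2 are the kind of parameters a single-zone particle balance needs. The sketch below is not the numerical IAQ model used in this study, which is not reproduced in this part of the article. It is a generic well-mixed mass balance advanced with an explicit Euler step, shown only to illustrate how the tabulated values could enter such a model. The function name `pm25_step`, the class `ZoneParams`, and the assumption that air exhausted by the kitchen hood is made up through the infiltration path are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneParams:
    """Constant inputs taken from Table 2 (flows in m³/min)."""
    volume: float = 68.0               # interior volume, m³
    q_infiltration: float = 0.56       # natural infiltration flow rate, m³/min
    penetration: float = 0.7           # particle fraction in the infiltration path
    deposition: float = 0.0067         # deposition rate onto surfaces, 1/min
    supply_filter_eff: float = 0.9     # mechanical supply filter efficiency
    purifier_filter_eff: float = 0.9   # air purifier filter efficiency


def pm25_step(c_in, c_out, emission, q_vent=0.0, q_hood=0.0,
              q_purifier=0.0, params=ZoneParams(), dt=1.0):
    """Advance indoor PM2.5 (µg/m³) by dt minutes with explicit Euler.

    c_in, c_out : indoor / outdoor concentration, µg/m³
    emission    : indoor emission rate, µg/min
    q_*         : device flow rates, m³/min
    """
    p = params
    # Assumption: hood exhaust is replaced by air leaking in through
    # the same path as natural infiltration.
    q_leak = p.q_infiltration + q_hood
    # Particle mass entering or generated per minute (µg/min).
    source = (p.penetration * q_leak * c_out
              + (1.0 - p.supply_filter_eff) * q_vent * c_out
              + emission)
    # Equivalent clean-air flow removing particles from the room (m³/min).
    removal_flow = q_leak + q_vent + p.purifier_filter_eff * q_purifier
    dc_dt = (source - removal_flow * c_in) / p.volume - p.deposition * c_in
    return c_in + dt * dc_dt


# One minute of frying (1483 µg/min) with every device off.
after_frying = pm25_step(c_in=10.0, c_out=12.0, emission=1483.0)
# The same minute with the air purifier on its high setting (4.8 m³/min).
with_purifier = pm25_step(c_in=10.0, c_out=12.0, emission=1483.0, q_purifier=4.8)
```

With a one-minute step and the flow rates in Table 2, the total removal rate stays well below 1 min⁻¹, so the explicit step remains stable. A shorter step or an implicit scheme would be the safer choice if larger flows were used.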

Table 3. PM 2.5 and CO₂ emission rates per occupant activity (updated from [34]).

Pollutant	Parameter	Sleeping	Eating Meals/Snacks	Working	Cooking (Oven)	Cooking (Grilling)	Cooking (Frying)	Indoor Cleaning	Exercising	Resting
PM 2.5	Emission rate (µg/min)	0	0	0	10	283	1483	70	0	0
CO₂	Number of people (-)	1	1	1	1	1	1	1	1	1
CO₂	Activity level (W)	72	108	117	207	207	207	360	423	108
CO₂	Emission rate (m³/s)	2.75 × 10⁻⁶	4.13 × 10⁻⁶	4.47 × 10⁻⁶	7.91 × 10⁻⁶	7.91 × 10⁻⁶	7.91 × 10⁻⁶	1.38 × 10⁻⁵	1.62 × 10⁻⁵	4.13 × 10⁻⁶

Table 4. States for AI2C2.

State Category	Device	State Variable	Unit
Outdoor environmental state	-	Outdoor PM 2.5 concentration	µg/m³
Outdoor environmental state	-	Outdoor CO₂ concentration	ppm
Indoor environmental state	-	Indoor PM 2.5 concentration	µg/m³
Indoor environmental state	-	Indoor CO₂ concentration	ppm
Indoor environmental state	-	Emission rate of PM 2.5	µg/min
Indoor environmental state	-	Emission rate of CO₂	m³/s
State of indoor climate control devices	Ventilation system	Airflow rate	m³/s
State of indoor climate control devices	Kitchen hood	Airflow rate	m³/s
State of indoor climate control devices	Air purifier	Airflow rate	m³/s
Physical state	-	Time	-
Physical state	-	Occupant activity	-
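
As an illustration of how the eleven entries in Table 4 might be packed into a single observation for the agent, the following sketch defines a flat state record. The field names, the encoding of time as hour of day, and the encoding of occupant activity as an integer index are assumptions made for this example. The table does not specify how these two dimensionless states are encoded.

```python
from dataclasses import dataclass, astuple

# Hypothetical ordering of activities, following the columns of Table 3.
ACTIVITIES = ["sleeping", "eating", "working", "cooking_oven",
              "cooking_grilling", "cooking_frying", "cleaning",
              "exercising", "resting"]


@dataclass(frozen=True)
class State:
    outdoor_pm25: float      # µg/m³
    outdoor_co2: float       # ppm
    indoor_pm25: float       # µg/m³
    indoor_co2: float        # ppm
    pm25_emission: float     # µg/min
    co2_emission: float      # m³/s
    ventilation_flow: float  # m³/s
    hood_flow: float         # m³/s
    purifier_flow: float     # m³/s
    hour_of_day: float       # assumed encoding of "Time"
    activity: int            # assumed encoding: index into ACTIVITIES


def to_vector(state):
    """Flatten a State into a list of floats in Table 4 order."""
    return [float(x) for x in astuple(state)]


# Illustrative observation: occupant sleeping at 03:00, all devices off.
# Indoor concentrations here are placeholder values, not results.
example = State(36.0, 412.7, 12.0, 600.0, 0.0, 2.75e-6,
                0.0, 0.0, 0.0, 3.0, ACTIVITIES.index("sleeping"))
vector = to_vector(example)
```

In practice the entries span several orders of magnitude (ppm against m³/s), so some form of scaling before the network input would likely be needed. That step is left out here.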

Table 5. Actions for AI2C2.

Action (airflow rate, m³/s)	Ventilation System	Kitchen Hood	Air Purifier
Off	0	0	0
On, low	0.042	0.045	0.042
On, medium	0.057	-	0.053
On, high	0.07	0.055	0.08
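
The settings in Table 5 can be enumerated as a discrete action set. The sketch below assumes that one action is a joint choice of one setting per device, which gives 4 × 3 × 4 = 48 combinations. Whether AI2C2 indexes its actions this way is not stated in this table, so the names `ACTIONS` and `decode_action` are illustrative only.

```python
from itertools import product

# Airflow rates in m³/s from Table 5; "off" is 0 for every device.
VENTILATION = {"off": 0.0, "low": 0.042, "medium": 0.057, "high": 0.07}
KITCHEN_HOOD = {"off": 0.0, "low": 0.045, "high": 0.055}  # no medium setting
AIR_PURIFIER = {"off": 0.0, "low": 0.042, "medium": 0.053, "high": 0.08}

# Every joint setting as (ventilation, kitchen hood, air purifier) in m³/s.
ACTIONS = list(product(VENTILATION.values(),
                       KITCHEN_HOOD.values(),
                       AIR_PURIFIER.values()))


def decode_action(index):
    """Map a discrete action index to the three airflow rates (m³/s)."""
    return ACTIONS[index]


def to_m3_per_min(flow_m3_per_s):
    """Convert m³/s (Tables 1 and 5) to m³/min (Table 2)."""
    return flow_m3_per_s * 60.0
```

For example, `decode_action(0)` is the all-off setting `(0.0, 0.0, 0.0)`, and `to_m3_per_min(0.07)` gives roughly 4.2 m³/min, which matches the high ventilation flow rate in Table 2.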

Table 6. Operation state of indoor environmental devices in RBC.

CO₂ Concentration	PM 2.5 Concentration > 25 µg/m³	PM 2.5 Concentration ≤ 25 µg/m³
>1000 ppm	All systems on	Ventilation system on; kitchen hood off; air purifier off
≤1000 ppm	All systems on	All systems off
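
The rule-based control in Table 6 reduces to two threshold checks. The following is a minimal sketch of that logic, assuming the thresholds are applied to the indoor concentrations. The function and type names are hypothetical, and the fan level used when a device is on is not given in the table, so only on/off states are returned.

```python
from typing import NamedTuple

PM25_LIMIT = 25.0    # µg/m³, threshold from Table 6
CO2_LIMIT = 1000.0   # ppm, threshold from Table 6


class DeviceStates(NamedTuple):
    ventilation_on: bool
    kitchen_hood_on: bool
    air_purifier_on: bool


def rbc_action(pm25, co2):
    """Return on/off states for the three devices following Table 6."""
    if pm25 > PM25_LIMIT:
        # High PM2.5: all systems on, regardless of CO2.
        return DeviceStates(True, True, True)
    if co2 > CO2_LIMIT:
        # PM2.5 acceptable but CO2 high: ventilation only.
        return DeviceStates(True, False, False)
    # Both within limits: everything off.
    return DeviceStates(False, False, False)
```

Because the PM 2.5 check comes first, a reading above 25 µg/m³ switches every device on whatever the CO₂ level is, which reproduces the identical left-hand column of the table.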

Table 7. Outdoor conditions of the simulation cases.

Case	Outdoor PM 2.5	Outdoor CO₂
Case 1	36–135 µg/m³ (Unhealthy-Very unhealthy)	412.7 ppm (Fixed)
Case 2	12–25 µg/m³ (Good-Moderate)	412.7 ppm (Fixed)

Table 8. Comparison of RBC and DDQN.

Case	Control Method	Energy Consumption, Ventilation System (Wh)	Energy Consumption, Kitchen Hood (Wh)	Energy Consumption, Air Purifier (Wh)	Energy Consumption, Total (Wh)	Healthy Air Ratio, PM 2.5 (%)	Healthy Air Ratio, CO₂ (%)
Case 1	RBC	249.3	72	54	375.3	92.5	99.2
Case 1	AI2C2 *	89.1 (±4.69)	34.3 (±0.04)	194.4 (±1.36)	317.8 (±6.00)	94.1 (±0.03)	99.7 (±0.03)
Case 2	RBC	127.8	28.7	21.5	177.9	97	98.7
Case 2	AI2C2 *	89.2 (±7.77)	14.3 (±1.58)	57.3 (±2.74)	160.9 (±8.23)	94.2 (±0.01)	97.5 (±0.01)

* The average value and standard deviation of the last 50 episodes.
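
To make the comparison in Table 8 easier to read, the relative energy reduction and the change in PM 2.5 healthy air ratio can be computed directly from the tabulated means. The short script below is only a worked calculation on those published values. The helper name `energy_reduction_pct` is introduced here for illustration.

```python
# Mean values copied from Table 8 (standard deviations omitted).
TABLE_8 = {
    "Case 1": {"RBC": {"total_wh": 375.3, "pm25_ratio": 92.5},
               "AI2C2": {"total_wh": 317.8, "pm25_ratio": 94.1}},
    "Case 2": {"RBC": {"total_wh": 177.9, "pm25_ratio": 97.0},
               "AI2C2": {"total_wh": 160.9, "pm25_ratio": 94.2}},
}


def energy_reduction_pct(rbc_wh, rl_wh):
    """Energy saved by the RL controller relative to RBC, in percent."""
    return (rbc_wh - rl_wh) / rbc_wh * 100.0


for case, rows in TABLE_8.items():
    saving = energy_reduction_pct(rows["RBC"]["total_wh"],
                                  rows["AI2C2"]["total_wh"])
    # Difference in healthy air ratio, in percentage points.
    delta = rows["AI2C2"]["pm25_ratio"] - rows["RBC"]["pm25_ratio"]
    print(f"{case}: energy {saving:.1f}% lower, "
          f"PM2.5 healthy air ratio {delta:+.1f} points")
```

With these inputs the energy reduction is about 15.3% in Case 1 and 9.6% in Case 2, while the PM 2.5 healthy air ratio changes by +1.6 and -2.8 percentage points, respectively.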

Table 9. Comparison of RBC and AI2C2 (Case 1).

Metric	Pollutant	RBC	AI2C2 (w_PM2.5:w_CO2:w_EC = 2:2:1)	AI2C2 (w_PM2.5:w_CO2:w_EC = 1:1:1)	AI2C2 (w_PM2.5:w_CO2:w_EC = 1:1:3)
Total energy consumption (Wh)	-	375.3	639.7 (±6.69)	244.9 (±2.33)	317.8 (±6.00)
Healthy air ratio (%)	PM 2.5	92.5	95.7 (±0.06)	91.4 (±0.40)	94.1 (±0.03)
Healthy air ratio (%)	CO₂	99.2	99.7 (±0.03)	99.4 (±0.20)	99.7 (±0.03)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

