Introduction
Most schools in Portugal currently lack sensors for collecting environmental information inside classrooms. Several studies carried out over the years indicate that poor indoor air quality (IAQ) in enclosed spaces can directly affect well-being, causing concentration difficulties, fatigue, headaches and, in extreme cases, loss of consciousness.
In this context, this study set out to develop a modular, easily replicable, scalable and low-cost monitoring system that makes it possible to accurately collect environmental data from classrooms located in different educational establishments. To achieve this goal, monitoring boxes were equipped with a range of energy-efficient sensors. The system also collects data on the occupancy of each monitored space, thanks to the collaboration of the teachers in each school. This makes it possible to map the occupancy of spaces without resorting to cameras or other intrusive methods that could compromise the privacy of those involved.
High Level Design
The Airmon System architecture was conceived with scalability, fault tolerance and security in mind, so that it can efficiently support country-wide monitoring of indoor environmental data in schools, using a common design for large-scale IoT systems.
The architecture, presented in Figure 1, is structured into three layers: i) a central cloud instance (Cloud-Node) that aggregates all collected data and provides dashboard facilities; ii) local Fog-Node instances on school premises, which aggregate the data collected by the Edge-Nodes in the various classrooms and provide the means for displaying and analysing local datasets; iii) low-cost DIY monitoring Edge-Nodes (Motes) installed in each classroom, which combine several environmental sensors and are powered from a wall socket. The data collected by the Motes is sent via WiFi to the school's local Fog-Node. Each Fog-Node stores the data from each Edge-Node locally and displays it in the local school dashboard. Fog-Nodes are also responsible for sending the collected data to the central Cloud-Node, which aggregates the datasets from all monitored schools and thus provides a system-wide, real-time overview of all collected data.
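The three-layer data flow can be illustrated with a small conceptual model. This is only a sketch: in the system itself these roles are filled by Home Assistant instances, and every class, field and school name below is hypothetical. The upload queue is likewise an assumption about how a Fog-Node might buffer data before forwarding it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reading:
    """One sample sent by a classroom Edge-Node (field names are hypothetical)."""
    school: str
    room: str
    timestamp: float
    values: Dict[str, float]  # e.g. {"co2_ppm": 612.0}


@dataclass
class CloudNode:
    """Central instance that aggregates readings from every school."""
    store: List[Reading] = field(default_factory=list)

    def ingest(self, batch: List[Reading]) -> None:
        self.store.extend(batch)


@dataclass
class FogNode:
    """Per-school instance: stores data locally, then forwards it to the cloud."""
    school: str
    cloud: CloudNode
    local_store: List[Reading] = field(default_factory=list)
    pending: List[Reading] = field(default_factory=list)

    def receive(self, room: str, timestamp: float, values: Dict[str, float]) -> None:
        reading = Reading(self.school, room, timestamp, values)
        self.local_store.append(reading)  # kept for the local school dashboard
        self.pending.append(reading)      # queued for upload (assumed buffering)

    def sync(self) -> int:
        """Forward queued readings to the Cloud-Node; return how many were sent."""
        sent = len(self.pending)
        self.cloud.ingest(self.pending)
        self.pending = []
        return sent


cloud = CloudNode()
fog_a = FogNode("school-a", cloud)
fog_b = FogNode("school-b", cloud)
fog_a.receive("room-1", 0.0, {"co2_ppm": 612.0})
fog_a.receive("room-2", 0.0, {"co2_ppm": 840.0})
fog_b.receive("room-1", 0.0, {"co2_ppm": 455.0})
fog_a.sync()
fog_b.sync()
```

After both calls to sync, each Fog-Node still holds its own school's readings, while the Cloud-Node holds the readings of both schools.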
Figure 1
Hardware design
Edge-Node:
Figure 2 shows an installation carried out at the Fernando Pessoa University. The monitoring box has side openings, designed to prevent microclimates from forming inside the box and thus avoid possible errors in the readings.
Each monitoring box consists of four components:
• a DHT22 sensor, which measures humidity and temperature
• an MH-Z19 sensor, which measures CO2 concentration
• a PMS5003 sensor, which measures particulate matter
• an ESP32, which sends all the collected data to the school’s local server.
Figure 2
Software Design
Fog-Node:
Each server is based on Home Assistant and some of the integrations available for it.
The monitoring boxes were configured with the ESPHome integration, which uses YAML files to define parameters such as the device name, the measurement interval and which pins on the board to use.
InfluxDB was chosen to store all the collected information because of its efficiency in handling large volumes of data.
Grafana was used to visualize the collected information. It allows the data to be viewed over time and exported in CSV format.
Google Drive Backup was used to back up all the servers to a remote drive.
Figure 3
Cloud-Node:
The Cloud-Node is also a Home Assistant instance, but it differs from the Fog-Nodes in two major ways: it holds a much larger volume of data, since it is where all the information collected by the other nodes converges, and it is the only node exposed to the Internet, through the NGINX integration. Figure 4 shows an example of its dashboard.
Figure 4
To classify room occupancy using machine learning, datasets were created from the environmental data collected by the node at the Fernando Pessoa University, together with the room occupancy information that teachers provided by filling in forms. The following operations were carried out on these datasets:
- Removal of the first 15 min and last 15 min of each class
- Removal of columns not needed for the problem, such as teachers’ emails
- Creation of CO2 derivatives
- Elimination of sensor noise using the Ordinary Least Squares technique
- Conversion of Yes/No and Open/Closed values to numerical format (1 and 0)
- Aggregation of occupancy into classes with intervals of 3 and 5 people
- Creation of class periods in the datasets
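Several of the operations listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the processing code used in this work: timestamps are assumed to be in minutes, the CO2 derivative is taken as a backward difference, the Ordinary Least Squares step is interpreted as a sliding-window straight-line fit, and the class boundaries (an empty room as class 0, then 1 to 3 people as class 1, and so on) are hypothetical. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Mapping for the Yes/No and Open/Closed form fields
BINARY = {"Yes": 1, "No": 0, "Open": 1, "Closed": 0}


def trim_class(rows: Sequence[Tuple[float, float]], start: float, end: float,
               margin: float = 15.0) -> List[Tuple[float, float]]:
    """Drop samples from the first and last `margin` minutes of a class."""
    return [(t, v) for t, v in rows if start + margin <= t <= end - margin]


def co2_derivative(times: Sequence[float], co2: Sequence[float]) -> List[float]:
    """Backward-difference rate of change (ppm per minute); first sample is 0."""
    if not co2:
        return []
    rates = [0.0]
    for i in range(1, len(co2)):
        rates.append((co2[i] - co2[i - 1]) / (times[i] - times[i - 1]))
    return rates


def ols_smooth(values: Sequence[float], half_window: int = 2) -> List[float]:
    """Assumed denoising: fit a line by OLS around each sample, keep the fit."""
    smoothed = []
    n = len(values)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        xs = list(range(lo, hi))
        ys = values[lo:hi]
        x_mean = sum(xs) / len(xs)
        y_mean = sum(ys) / len(ys)
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx if sxx else 0.0
        smoothed.append(y_mean + slope * (i - x_mean))
    return smoothed


def occupancy_class(count: int, width: int) -> int:
    """Hypothetical binning: 0 = empty, then one class per `width` people."""
    return 0 if count == 0 else (count - 1) // width + 1


rows = [(float(t), 400.0 + t) for t in range(0, 91, 15)]  # one 90-minute class
kept = trim_class(rows, start=0.0, end=90.0)
rates = co2_derivative([0.0, 1.0, 2.0], [400.0, 420.0, 450.0])
denoised = ols_smooth([400.0, 400.0, 430.0, 400.0, 400.0])
window_state = BINARY["Open"]
```

With intervals of 3 people, for example, occupancy_class(4, 3) places a room with four people in class 2 under the assumed boundaries.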
Results
The overall dataset comprises 52,164 rows covering the three monitored rooms (Room 106, Room 204 and Room 210). Data was collected between 23/03/23 and 18/06/23, during which 65 classes were recorded in Room 106, 70 in Room 204 and 82 in Room 210, for a total of 217 classes. Figure 5 shows a histogram of the occupancies recorded in the various rooms.
Figure 5
Machine Learning:
Machine learning techniques were used to classify occupancy into classes, with the readings of each environmental parameter as model inputs.
To do this, the datasets were randomly divided into 80% for training and 20% for testing.
Two neural network techniques, CNN and MLP, were chosen to explore their performance in the context of this work, alongside two more traditional techniques, RFClassifier and KNN.
The results obtained were evaluated using the accuracy metric.
The test flow consisted of training a specific model for each room and a general model on the data from all rooms, in order to see how well the latter could generalize.
All models were first trained with all parameters and then retrained with some parameters removed.
In both cases, they were trained for each of the two occupancy class definitions (intervals of 3 and 5 people).
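The evaluation flow described above (random 80/20 split, one model per room plus a general model, retraining with parameters removed, accuracy as the metric) can be sketched as follows. This is an illustration only: it uses a plain k-nearest-neighbours classifier written with the standard library, not the CNN, MLP or RFClassifier models, and the data is synthetic and well separated, so the accuracies it produces say nothing about the results reported here. Function names, the seed and the value of k are assumptions.

```python
import random
from collections import Counter
from math import dist
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature vector, occupancy class)


def split_80_20(samples: Sequence[Sample], seed: int = 0):
    """Shuffle reproducibly, then use 80% for training and 20% for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train: Sequence[Sample], features: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k training samples closest to `features`."""
    nearest = sorted(train, key=lambda s: dist(s[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: Sequence[Sample], test: Sequence[Sample], k: int = 3) -> float:
    """Fraction of test samples whose class is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def keep_features(samples: Sequence[Sample], columns: Sequence[int]) -> List[Sample]:
    """Retain only the chosen feature columns (the 'parameters removed' runs)."""
    return [(tuple(x[c] for c in columns), y) for x, y in samples]


def synthetic_room(rng: random.Random, n_per_class: int = 20) -> List[Sample]:
    """Synthetic, well-separated clusters; NOT the data collected in this work."""
    return [
        ((3.0 * label + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)), label)
        for label in range(3)
        for _ in range(n_per_class)
    ]


rng = random.Random(42)
rooms = {name: synthetic_room(rng) for name in ("106", "204", "210")}

results: Dict[str, float] = {}
for name, samples in rooms.items():           # one specific model per room
    train, test = split_80_20(samples)
    results[name] = accuracy(train, test)

pooled = [s for samples in rooms.values() for s in samples]
train, test = split_80_20(pooled)             # general model on all rooms
results["global"] = accuracy(train, test)

reduced = keep_features(pooled, columns=[0])  # retrain with one parameter removed
train_r, test_r = split_80_20(reduced)
results["global_reduced"] = accuracy(train_r, test_r)
print(results)
```

Running the same loop once per occupancy class definition would reproduce the full grid of test cases: room or global model, full or reduced parameters, intervals of 3 or 5 people.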
Example of a test case for the Global model:
Using the CNN technique to classify occupancy into the 5-person classes with all the environmental parameters, the model obtained an accuracy of 86%.
Figure 6
Example of a test case for the Room 204 model:
Using the same test parameters with a model trained specifically for Room 204, the accuracy increased to 93%.
Figure 7
Conclusion
All the objectives proposed in this work were achieved: the system was implemented and tested in two different schools.
We created publicly available, representative datasets (https://github.com/jotaSVV/AirmonSystem-Datasets).
We also obtained promising results when classifying occupancy with machine learning algorithms, building not only specific models for each monitored classroom but also a general model trained with all the information collected.