Abstract
Weather forecasting plays a vital
role in agriculture, transportation, disaster management, and daily planning.
Accurate temperature prediction is particularly critical for mitigating the
adverse effects of extreme weather events. This study presents the development
of a web-based weather forecasting system that utilizes Random Forest
Regression, a machine learning algorithm, to predict temperature based on key
meteorological parameters. The system was trained using the cleaned_weather
dataset obtained from Kaggle, which consists of 52,697 records and initially 21
variables. After feature selection, six parameters were used for model training:
Pressure (p), Relative Humidity (rh), Wind Velocity (wv), Wind Direction (wd),
Shortwave Downward Radiation (SWDR), and Photosynthetically Active Radiation (PAR),
with temperature (T) as the target. The dataset was split into 80% training and
20% testing samples. The model achieved an R-squared score of 0.6843 on the
training set, indicating a moderate fit, but a negative score (-0.3002) on the test
set. This means it predicted unseen data less accurately than the test-set mean would,
which points to overfitting. The trained model was integrated into a web interface
using HTML, CSS, PHP, and Python, and hosted locally on XAMPP, allowing users
to input meteorological data and receive real-time temperature predictions.
Multiple test cases demonstrated that the system accurately interpreted input
variables and produced realistic predictions, while validation mechanisms
ensured data integrity. The system evaluation confirmed that the platform is
efficient, reliable, and user-friendly, making it suitable for educational and
practical applications in weather monitoring and short-term forecasting. This
study indicates that machine learning regression models can make weather
prediction more accessible and, once generalization is improved, more accurate. Future improvements could involve the
use of deep learning models, integration with real-time weather APIs, and
deployment on cloud platforms to expand usability and predictive performance.
TABLE OF CONTENTS
Title Page
Certification
Dedication
Acknowledgements
Abstract
Table of Contents
CHAPTER ONE
1.0 Introduction
1.1 Statement of the Problem
1.2 Aim and Objectives of the Study
1.3 Significance of the Study
1.4 Definition of Terms
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 Conceptual Framework
2.2 Theoretical Framework
2.3 Review of Related Literature
2.4 Summary of Literature Review
CHAPTER THREE
Results, Findings and Discussion
3.0 Introduction
3.1 Research Methodology
3.1.1 Research Design
3.1.2 Data Source
3.1.3 Data Collection and Preprocessing
3.1.4 Model Development Methodology
3.1.5 Justification for the Methodology
3.2 System Analysis
3.2.1 Existing System Description
3.2.2 Problem of the Existing System
3.2.3 Proposed System Design
3.3 System Design Tools
3.3.2 Use Case Diagram
3.3.3 System Flowchart
3.3.4 Entity Relationship Diagram (ERD)
3.4 Hardware and Software Requirements
3.4.1 Hardware Requirements
3.4.2 Software Requirements
3.4.3 Justification for the Choice of Tools
3.5.1 Data Layer
3.5.2 Application Layer
3.5.3 Presentation Layer
3.5.4 Architectural Flow Description
CHAPTER FOUR
Results, Findings and Discussion
4.0 Introduction
4.1 System Implementation
4.3 Programming Languages and Tools Used
4.4 System Testing
4.4.1 Purpose of System Testing
4.4.2 Testing Approach
4.4.3 Test Environment
4.4.4 Test Data and Results
4.4.5 System Response
4.4.6 Error Handling
4.4.7 Overall System Performance
4.5 System Evaluation
4.5.1 Evaluation Objectives
4.5.2 Evaluation Criteria
4.5.3 Evaluation Results
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATION
5.0 Introduction
5.1 Conclusion
References
CHAPTER ONE
1.0 Introduction
Weather forecasting plays a
crucial role in the socioeconomic development of any nation, influencing
activities in agriculture, aviation, construction, energy management, and
disaster preparedness. Accurate weather prediction enables individuals,
organizations, and governments to make informed decisions that can minimize
losses and optimize resource allocation. However, due to the chaotic and
nonlinear nature of atmospheric processes, achieving reliable forecasts remains
a persistent challenge for researchers and meteorological institutions
worldwide. Traditional forecasting methods, such as numerical weather
prediction (NWP), rely heavily on physical models and complex mathematical
equations that simulate atmospheric behavior. While these methods have been
widely used, they often require high computational resources and may struggle
to capture short-term fluctuations in weather variables (Wang et al., 2023).
In recent years, the advancement
of data-driven techniques and the availability of large-scale meteorological
datasets have opened new possibilities for applying machine learning (ML)
algorithms to weather forecasting. These approaches can learn hidden patterns
and nonlinear relationships between different atmospheric parameters, allowing
for more efficient and adaptive predictive modeling. Among the various ML
techniques, regression-based models, particularly ensemble methods such as
Random Forest Regression, have demonstrated significant potential in improving
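forecasting accuracy and robustness (Li & Zhang, 2024). Such models
combine many decision trees, each trained on a random bootstrap sample of the data, and average
their outputs. This reduces overfitting compared with a single tree and lets them handle large
datasets with complex inter-variable dependencies effectively.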
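The averaging idea described above can be made concrete with a small sketch. It fits scikit-learn's `RandomForestRegressor` on synthetic nonlinear data, not the cleaned_weather dataset. It then checks that the forest's prediction is the mean of its individual trees' predictions and compares test error against a single fully grown decision tree. The data-generating formula, noise level, and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data with noise (illustrative only, not the study's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2] + rng.normal(0, 0.5, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A single fully grown tree tends to memorize noise in the training data.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# A forest fits many trees on bootstrap samples and averages their outputs.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# The forest prediction equals the mean of the individual tree predictions.
per_tree = np.stack([t.predict(X_test) for t in forest.estimators_])
averaged = per_tree.mean(axis=0)
assert np.allclose(averaged, forest.predict(X_test))

tree_mse = mean_squared_error(y_test, tree.predict(X_test))
forest_mse = mean_squared_error(y_test, forest.predict(X_test))
print(f"single tree test MSE:   {tree_mse:.3f}")
print(f"random forest test MSE: {forest_mse:.3f}")
```

On noisy data like this, the averaged forest usually shows a noticeably lower test error than the single tree. That variance reduction is the property the paragraph above relies on.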
This study, titled “Weather
Forecasting Using Time Series and Regression Model,” explores the use of a
Random Forest Regression algorithm to predict weather conditions, specifically air temperature,
from historical meteorological data. The dataset used for this study was obtained from
Kaggle under the search term “Weather Long-term Time Series
Forecasting,” specifically the dataset titled “cleaned_weather.” It
comprises 52,697 observations and 21 weather-related features, including
atmospheric pressure (p), temperature (T), relative humidity (rh), wind
velocity (wv), wind direction (wd), solar shortwave downward radiation (SWDR),
and photosynthetically active radiation (PAR). After preprocessing, the most
relevant features were selected to train the model, focusing on those with the
highest predictive influence while removing highly correlated and less
informative variables.
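The study does not state the exact rule used to remove highly correlated variables. One possible approach is a greedy filter, sketched below. It keeps a candidate feature only if its absolute Pearson correlation with every feature already kept stays at or below a threshold. The helper name `drop_correlated`, the 0.9 threshold, the synthetic values, and the hypothetical near-duplicate column `p_dup` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Return the columns kept after greedily dropping highly correlated ones.

    Columns are visited in their original order; a column is kept only if its
    absolute Pearson correlation with every previously kept column is <= threshold.
    """
    corr = df.corr().abs()
    kept: list[str] = []
    for col in df.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept


# Synthetic stand-in for a few weather columns (values are illustrative only).
rng = np.random.default_rng(42)
n = 500
features = pd.DataFrame({
    "p": rng.normal(990, 8, n),
    "rh": rng.uniform(20, 100, n),
    "wv": rng.gamma(2.0, 1.0, n),
})
# Hypothetical near-duplicate of pressure, standing in for a redundant variable.
features["p_dup"] = features["p"] + rng.normal(0, 0.5, n)

print(drop_correlated(features))  # ['p', 'rh', 'wv']
```

Because the filter is greedy, column order decides which member of a correlated pair survives. Placing the more physically meaningful variable first is one simple way to control that choice.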
The choice of Random Forest
Regression for this project was influenced by its proven effectiveness in
handling nonlinear relationships, which traditional linear models cannot capture, and by its
ability to reduce prediction variance by averaging many decision trees. Moreover, it remains
stable on noisy data and offers some interpretability through feature-importance scores. The
implementation was carried out using Python in Jupyter Notebook, employing
essential libraries such as pandas, NumPy, scikit-learn, matplotlib,
and joblib. The model was later integrated into a web-based interface developed
using HTML, CSS, and PHP and hosted locally on XAMPP, allowing user interaction with the model
and visualization of forecast results.
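The paragraph above names pandas, scikit-learn, and joblib. A minimal sketch of how they could fit together follows. It trains a `RandomForestRegressor` on the six selected features, saves it with `joblib.dump`, and reloads it with `joblib.load`, so a separate prediction script can reuse the model without retraining. The synthetic data, the hypothetical target formula, the file name `weather_rf.joblib`, and the hyperparameters are assumptions, not the study's actual settings.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["p", "rh", "wv", "wd", "SWDR", "PAR"]  # the six features named in the study

# Synthetic stand-in data (illustrative only; not the cleaned_weather dataset).
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "p": rng.normal(990, 8, n),
    "rh": rng.uniform(20, 100, n),
    "wv": rng.gamma(2.0, 1.0, n),
    "wd": rng.uniform(0, 360, n),
    "SWDR": rng.uniform(0, 800, n),
    "PAR": rng.uniform(0, 1600, n),
})
# Hypothetical target: temperature loosely driven by radiation and humidity.
y = 5 + 0.02 * X["SWDR"] - 0.1 * X["rh"] + rng.normal(0, 1, n)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[FEATURES], y)

# Persist the fitted model; the file name is a hypothetical choice.
path = os.path.join(tempfile.gettempdir(), "weather_rf.joblib")
joblib.dump(model, path)

# Later, e.g. in the script the web interface calls, reload and predict.
restored = joblib.load(path)
sample = X[FEATURES].head(3)
print(restored.predict(sample))
```

Keeping the feature list in one constant helps the training code and the prediction code pass columns in the same order.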
By combining time series analysis
with machine learning, this project aims to enhance the accuracy and
accessibility of weather forecasting systems, particularly for developing
regions where computational resources and access to specialized meteorological
equipment may be limited. The outcomes of this research are expected to
contribute to the growing field of intelligent weather forecasting systems that
support climate-sensitive sectors and improve early warning mechanisms for
extreme weather events (Kumar & Singh, 2024; Ahmed et al., 2023).
1.1 Statement of the Problem
Weather forecasting remains a
complex and uncertain process due to the dynamic, nonlinear, and chaotic nature
of atmospheric systems. Despite significant progress in meteorological science,
many forecasting techniques still struggle to achieve high precision,
particularly in short-term predictions and local weather variations.
Traditional numerical weather prediction (NWP) models, though effective at a
large scale, often require massive computational resources and extensive domain
expertise to interpret, making them less practical for rapid or localized
forecasting applications (Zhang et al., 2024). Additionally, these models are
highly sensitive to input errors and boundary conditions, which can lead to
cumulative inaccuracies over time.
In developing regions, including
parts of Africa, access to sophisticated weather forecasting infrastructure and
high-resolution datasets remains limited. This has led to persistent challenges
in delivering reliable forecasts for sectors that depend heavily on weather
conditions, such as agriculture, transportation, and renewable energy
management (Adebayo & Musa, 2023). The inability to obtain accurate
forecasts can result in poor planning, economic losses, and increased
vulnerability to extreme weather events such as floods, droughts, and
heatwaves. Furthermore, the manual interpretation of meteorological data can be
time-consuming and prone to human error, reducing efficiency and timeliness in
weather-related decision-making.
The rise of machine learning (ML)
and artificial intelligence (AI) has provided an alternative data-driven
approach to weather forecasting that can overcome some of the limitations of
conventional methods. However, many existing studies focus on deep learning
models that require vast datasets and specialized computational environments,
which are not always feasible for small institutions or local meteorological
centers (Chen & Li, 2024). There is, therefore, a pressing need for
efficient and computationally feasible ML-based forecasting systems that can
process historical meteorological data to predict future weather conditions
with acceptable accuracy.
This study addresses these
challenges by applying a Random Forest Regression model to forecast
weather using historical time series data obtained from the cleaned_weather
dataset on Kaggle. The goal is to evaluate the model’s ability to learn
patterns from key atmospheric parameters such as pressure, humidity, wind
speed, and solar radiation, and to generate accurate predictions without the
heavy computational burden of traditional models. By implementing the system
through a web-based platform, the project also aims to demonstrate how such
predictive models can be made more accessible and interactive for end-users.
Thus, the problem this research seeks to solve is the limited availability of
accurate, efficient, and accessible weather forecasting tools that leverage
machine learning techniques to improve short-term predictive accuracy and
decision-making in weather-dependent sectors (Ahmed et al., 2023; Li &
Zhang, 2024).
1.2 Aim and Objectives of the Study
The main aim of this study is to
develop a weather forecasting model using a Random Forest Regression approach
that can accurately predict future weather conditions based on historical
meteorological data. The study seeks to demonstrate the potential of machine
learning in improving the accuracy, efficiency, and accessibility of weather
prediction systems.
To achieve this aim, the study is
guided by the following specific objectives:
- To train and evaluate a Random Forest Regression model using key meteorological parameters such as atmospheric pressure, relative humidity, wind velocity, wind direction, solar radiation, and photosynthetically active radiation.
- To assess the performance of the developed model using statistical evaluation metrics and compare its predictive accuracy between training and testing datasets.
- To design and implement a web-based interface that allows users to interact with the trained model and visualize weather forecasts in a user-friendly environment.
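The first two objectives above can be sketched in a few lines. The sketch uses the 80/20 train/test split reported for this study and computes R² on both partitions. A training score far above the test score is a common sign of overfitting. The synthetic data and the helper name `evaluate_split` are assumptions for illustration. The scores reported in the abstract come from the real dataset and are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def evaluate_split(X: np.ndarray, y: np.ndarray, seed: int = 42) -> tuple[float, float]:
    """Fit a random forest on 80% of the data and return (train R2, test R2)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    return train_r2, test_r2


# Synthetic data with six inputs standing in for p, rh, wv, wd, SWDR and PAR.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 6))
y = 2 * X[:, 0] - X[:, 1] + np.sin(X[:, 4]) + rng.normal(0, 0.5, size=1000)

train_r2, test_r2 = evaluate_split(X, y)
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
# A train score far above the test score suggests the model is overfitting.
```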
1.3 Significance of the Study
The significance of this study
lies in its contribution to the growing field of data-driven weather
forecasting by demonstrating how machine learning techniques can be effectively
applied to predict meteorological conditions with improved accuracy and
efficiency. Traditional weather forecasting models often depend on complex
physical simulations that require advanced computational facilities and expert
interpretation. In contrast, this study utilizes a Random Forest Regression
model, which provides a practical, efficient, and interpretable alternative for
processing large datasets and identifying nonlinear relationships among weather
variables (Wang et al., 2023).
By employing machine learning
techniques, the study seeks to enhance the precision of short-term weather
forecasts, which is critical for sectors that rely heavily on climate and
environmental data. For instance, accurate predictions can assist farmers in
planning irrigation and harvesting, help aviation authorities ensure flight
safety, and support energy providers in balancing electricity supply from
renewable sources such as solar and wind (Ahmed et al., 2023). Therefore, the
outcomes of this research can contribute to minimizing losses caused by
unexpected weather changes and improving decision-making processes in
weather-sensitive industries.
Furthermore, the integration of
the forecasting model into a web-based interface developed with HTML, CSS, and
PHP, and hosted locally using XAMPP, increases accessibility and usability.
This approach allows users, especially those in developing regions with limited
access to advanced meteorological systems, to easily interact with and visualize
forecast data. The system’s design demonstrates how open-source tools and
modern programming frameworks can be leveraged to build intelligent forecasting
solutions that are both affordable and scalable.
Ultimately, this study is
significant because it bridges the gap between theoretical meteorological
modeling and practical implementation through modern computational
intelligence. It not only contributes to academic knowledge but also offers a
foundation for future research and innovation in weather analytics, artificial
intelligence applications, and environmental informatics (Li & Zhang, 2024;
Chen & Li, 2024).
1.4 Definition of Terms
i. Weather Forecasting: The process of predicting future atmospheric conditions such as temperature, humidity, and wind speed based on current and historical weather data.
ii. Time Series: A sequence of data points recorded at specific time intervals, often used to analyze trends and patterns over time.
iii. Regression Model: A type of statistical or machine learning model that predicts a continuous output variable based on one or more input variables.
iv. Random Forest Regression: An ensemble learning algorithm that uses multiple decision trees to improve prediction accuracy and reduce overfitting in regression tasks.
v. Machine Learning: A field of artificial intelligence that enables computers to learn from data and make predictions or decisions without being explicitly programmed.
vi. Dataset: A structured collection of data used for analysis, training, and testing of models.
vii. Atmospheric Pressure (p): The force per unit area exerted by the weight of the air above a given surface, usually measured in hectopascals (hPa).
viii. Relative Humidity (rh): The ratio of the current amount of water vapor in the air to the maximum amount the air can hold at a given temperature, expressed as a percentage.
ix. Wind Velocity (wv): The speed of the wind measured in meters per second, indicating the rate of air movement in the atmosphere.
x. Wind Direction (wd): The direction from which the wind originates, typically measured in degrees from the north.
xi. Solar Shortwave Downward Radiation (SWDR): The incoming solar radiation received per unit area at the Earth’s surface, typically measured in watts per square meter (W/m²); it influences temperature and evaporation rates.
xii. Photosynthetically Active Radiation (PAR): The portion of solar radiation that plants use for photosynthesis, typically measured in micromoles per square meter per second.
xiii. Feature Selection: The process of choosing the most relevant variables from a dataset to improve model accuracy and reduce computation time.
xiv. Model Training: The phase where a machine learning algorithm learns patterns and relationships from input data.
xv. Model Testing: The evaluation of a trained model using new, unseen data to measure its predictive performance.
xvi. Overfitting: A modeling error that occurs when a machine learning model performs well on training data but poorly on unseen test data because it has memorized noise or irrelevant details.
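xvii. R-squared (R²): A statistical measure that indicates how well the predicted values of a model correspond to the actual data, showing the proportion of variance explained by the model. Its maximum value is 1. A value of 0 means the model does no better than always predicting the mean of the actual values, and a negative value means it does worse.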
xviii. Preprocessing: A data preparation step involving cleaning, transforming, and normalizing data before model training.
xix. Web Interface: A graphical platform that allows users to interact with a system or model through a web browser using technologies like HTML, CSS, and PHP.
xx. XAMPP: A local web server package that combines Apache, MariaDB (earlier versions shipped MySQL), PHP, and Perl, used to host and test web applications locally on a computer.
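The R-squared definition above can be written as R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors, and SS_tot is the sum of squared deviations of the actual values from their mean.

The short sketch below computes R² directly and shows why a score can be negative, as with the test-set score reported in the abstract. A model that always predicts the mean scores exactly 0, and anything worse scores below 0. The function name `r_squared` and the sample values are illustrative.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


actual = np.array([10.0, 12.0, 14.0, 16.0])    # e.g. observed temperatures in degC
perfect = actual.copy()
mean_only = np.full_like(actual, actual.mean())
poor = np.array([16.0, 14.0, 12.0, 10.0])      # predictions in reverse order

print(r_squared(actual, perfect))    # 1.0
print(r_squared(actual, mean_only))  # 0.0
print(r_squared(actual, poor))       # -3.0
```

In this example SS_tot is 20. The reversed predictions give SS_res = 80, so R² = 1 - 80/20 = -3.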