ABSTRACT
Due to technological advancements, relationship marketing has become a reality
in recent years. Technologies such as data warehousing, data mining, and
campaign management software have made customer relationship management a new
area where firms can gain a competitive advantage. In particular, through data
mining (the extraction of hidden and useful information from large databases),
companies can identify valuable customers, predict future behavior from
customers' previous purchase patterns, and make proactive, knowledge-driven
decisions.
We propose a system that will help retailers understand the dependencies among
goods purchased by customers, that is, which goods are purchased together with
others and which sets of goods are purchased as a group, so as to maximize
profit.
CHAPTER ONE
1.0 INTRODUCTION
1.1 PROBLEM STATEMENT
1.2 SIGNIFICANCE OF THE STUDY
1.3 AIM AND OBJECTIVE OF THE STUDY
1.4 METHODOLOGY
1.5 SCOPE OF THE STUDY
1.6 LIMITATION OF THE STUDY
CHAPTER TWO
LITERATURE REVIEW
2.1 INTRODUCTION
2.2 BACKGROUND OF THE PROBLEM AREA
2.3 APRIORI ALGORITHM
2.4 ASSOCIATION RULE
2.5 MARKET BASKET ANALYSIS
2.6 ADVANTAGES OF MARKET BASKET ANALYSIS
2.7 DISADVANTAGES OF MARKET BASKET ANALYSIS
2.8 REVIEW OF RELATED WORKS
CHAPTER THREE
SYSTEM DESIGN/ DESIGN METHODOLOGY
3.0 INTRODUCTION
3.1 SYSTEM ANALYSIS
3.2 WATERFALL MODEL
3.3 PROPOSED SYSTEM DESIGN
3.4 PROGRAMMING LANGUAGE
3.3 SYSTEM DESIGN
3.4.2 USER REQUIREMENTS
3.5 DESIGN APPROACH
3.5.1 USE CASE DIAGRAM
3.5.2 SEQUENCE DIAGRAM
3.5.3 FLOW CHART DIAGRAM
3.6 DATABASE
3.6.1 DATABASE DESIGN
3.6.2 TABLES
CHAPTER FOUR
SYSTEM IMPLEMENTATION AND TESTING
4.0 INTRODUCTION
4.3 PROGRAM TESTING
4.4 SOFTWARE TESTING
4.5 SYSTEM DESIGN DIAGRAM
4.6 CHOICE OF PROGRAMMING LANGUAGE
4.7 PROGRAMMING ENVIRONMENT
4.7.1 HARDWARE REQUIREMENT
4.7.2 SOFTWARE REQUIREMENT
4.8 SYSTEM IMPLEMENTATION
CHAPTER FIVE
5.0 INTRODUCTION
5.1 CONSTRAINTS OF THE STUDY
5.2 SUMMARY
5.3 CONCLUSION
5.4 RECOMMENDATIONS
CHAPTER ONE
1.0 INTRODUCTION
Data mining is the extraction of hidden, useful information from large
databases. It is also a technique that encompasses a wide range of applied
mathematics and computational techniques, such as link analysis, clustering,
classification, knowledge summarization, and regression analysis. Data mining
tools predict future trends and behaviors, allowing businesses to make
knowledge-driven decisions. The automated, prospective analyses offered by data
mining go beyond the analysis of past events. Data mining tools answer business
questions that were previously too time-consuming to resolve. They search
databases for hidden patterns, finding useful information that is beyond the
reach of specialists.
Data mining techniques can be implemented quickly on existing software and
hardware platforms to enhance the value of existing information resources, and
can be integrated with new products and systems as they are brought online.
When implemented on high-performance client/server or multiprocessing
computers, data mining tools can analyze huge databases to answer questions
such as "Which goods do consumers tend to buy the most, and which goods go
along with them?"
Coenen (2010), in his publication "Data Mining: Past, Present and Future",
discussed the history of data mining, which can be dated back to the late
1980s, when the term began to be used, at least within the research community.
Broadly, data mining can be defined as a set of mechanisms and techniques,
realised in software, to extract hidden information from data. The word
"hidden" in this definition is important, because it differentiates data mining
from SQL-style querying. By the early 1990s, data mining was commonly
recognised as a sub-process within a larger process called Knowledge Discovery
in Databases (KDD). The most commonly used definition of KDD is that of Fayyad
et al.: "the nontrivial process of identifying valid, novel, potentially useful
and ultimately understandable patterns in data" (Fayyad et al. 1996).
As such, data mining should be viewed as the sub-process, within the overall
KDD process, concerned with the discovery of hidden information. Other
sub-processes that form part of the KDD process are data preparation
(warehousing, data cleaning, pre-processing, and so on) and the analysis and
visualisation of results. For many practical purposes KDD and data mining are
seen as synonymous, but technically one is a sub-process of the other. The data
that data mining techniques were originally directed at was tabular data and,
given the processing power available at the time, computational efficiency was
of significant concern. As the amount of processing power generally available
increased, processing became less of a concern and was replaced by a desire for
accuracy and a desire to mine ever larger data collections. Today, in the
context of tabular data, we have a well-established range of data mining
techniques available.
It is well within the capabilities of many commercial enterprises and
researchers to mine tabular data, using software such as Weka, on standard
desktop machines. However, the amount of electronic data collected by all kinds
of institutions and commercial enterprises continues to grow year on year, and
thus there is still a need for effective mechanisms to mine ever larger data
sets. The popularity of data mining increased significantly in the 1990s,
notably with the establishment of a number of dedicated conferences: the ACM
SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining) annual
conference in 1995, the European PKDD (Principles and Practice of Knowledge
Discovery in Databases) conference, and the Pacific-Asia PAKDD (Pacific-Asia
Conference on Knowledge Discovery and Data Mining) conference. This increase in
popularity can be attributed to advances in technology: the computer processing
power and data storage capabilities available meant that processing large
volumes of data on desktop machines became a realistic possibility. It became
commonplace for commercial enterprises to maintain data in computer-readable
form. In most cases this was primarily to support commercial activities; the
idea that the data could be mined often came second. The 1990s also saw the
introduction of customer loyalty cards, which allowed enterprises to record
customer purchases; the resulting data could then be mined to identify customer
purchasing patterns. Data mining is the process of searching large volumes of
data for patterns using methods such as classification, association rule
mining, and clustering. It is related to fields such as machine learning and
pattern recognition. Data mining techniques are the result of a long process of
research and product development.
I am in my final year. I was bright, and my family was optimistic about me;
they thought highly of me, but I had a fault: I hated compiler construction. I
have struggled with calculations all my life, though I have been lucky and did
well all the same. However, I still had to write my final exam. I gathered the
compiler construction past questions for each year, compared them, and sorted
them. Guess what I discovered: over 35% of the questions were repetitions. I
had hit the jackpot. I carefully and thoroughly checked the answer pages, and
from then on I revised only the repeated questions. I have a good grade to show
for the data mining I performed.
There is a huge amount of data available in the information industry. This data
is of no use until it is converted into useful information, so it is necessary
to analyze it and extract useful information from it. Extraction is not the
only process we need to perform; the work also involves other processes such as
data pre-processing (data cleaning, data integration, data transformation),
data mining, pattern evaluation, and data presentation. Once all these
processes are complete, we are in a position to use this information in many
applications, such as fraud detection, market analysis, production control, and
science exploration.
1.1 PROBLEM STATEMENT
Through in-depth research and observations carried out at a supermarket, we
have discovered that retailers want to know which products are purchased
together, whether as pairs or as larger groups of items. This knowledge can
support their decisions on product placement and on the timing and extent of
product promotions, and it can give them a better understanding of customer
purchasing habits by grouping customers with their transactions.
This project is aimed at designing and implementing a well-structured market
basket analysis software tool to solve the problem stated above, and at
comparing its results with those of an existing software package called WEKA.
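The retailers' question of which products are purchased together can be
illustrated with a minimal Python sketch that counts how often each pair of
items appears in the same basket. The transactions and the function name
count_pairs are hypothetical and serve only as an illustration; they are not
taken from the supermarket data or from the software developed in this project.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each inner list is one customer's basket.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]


def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicate items and fixes the pair order,
        # so ("bread", "milk") and ("milk", "bread") are counted together.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(transactions)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for adjacent shelf placement or joint
promotion. Raw counts alone do not say how strong a dependency is, which is why
measures such as support and confidence are normally used as well.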
1.2 SIGNIFICANCE OF THE STUDY
1.3 AIM AND OBJECTIVE OF THE STUDY
The aim of the study is to maximize profit for retailers by providing better
services to consumers.
The objectives of this study are:
· Cross-market analysis: data mining identifies associations and correlations
between product sales.
· Identifying customer requirements: data mining helps identify the best
products for different customers, and uses prediction to find the factors that
may attract new customers.
· Customer profiling: data mining helps determine what kind of people buy what
kind of products.
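The association between product sales named in the first objective is commonly
quantified with support, confidence, and lift. The following Python sketch
shows these standard measures on hypothetical baskets; the data and the
function names are assumptions made for illustration and do not come from this
study.

```python
# Hypothetical baskets used only to illustrate the measures.
baskets = [
    ["rice", "beans"],
    ["rice", "oil"],
    ["rice", "beans", "oil"],
    ["beans"],
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(b) for b in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is undefined
    return support(baskets, set(antecedent) | set(consequent)) / base


def lift(baskets, antecedent, consequent):
    """Confidence relative to how common the consequent is on its own."""
    base = support(baskets, consequent)
    if base == 0:
        return 0.0
    return confidence(baskets, antecedent, consequent) / base


rule_confidence = confidence(baskets, ["rice"], ["beans"])
rule_lift = lift(baskets, ["rice"], ["beans"])
print(rule_confidence, rule_lift)
```

A lift above 1 suggests the two products are bought together more often than
chance would predict, while a lift below 1 suggests the opposite.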
1.4 METHODOLOGY
I. Data Pre-Processing
The data we obtain is raw data, and raw data in the real world may be
incomplete, so it has to be pre-processed. The raw data goes through data
cleaning, data integration, data normalization, and data reduction, because
without quality data there will be no quality mining results.
Ø Data cleaning: filling in missing values and resolving inconsistencies in the
raw data.
Ø Data integration: combining data from multiple sources and providing the user
with a unified view of the data.
Ø Normalization: organizing the data to minimize redundancy.
Ø Data reduction: obtaining a reduced representation of the data set that is
much smaller in volume yet yields the same analytical results.
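The cleaning and integration steps above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions: the two sources
(till_a and till_b) and the function names are hypothetical, and missing item
names are dropped rather than filled in, on the assumption that a missing item
in a basket cannot be inferred.

```python
def clean_basket(raw_items):
    """Resolve inconsistent item names and drop missing or duplicate entries."""
    cleaned = []
    for item in raw_items:
        if item is None:
            continue  # assumption: a missing item name cannot be recovered
        name = str(item).strip().lower()  # "Bread " and "bread" become one item
        if name and name not in cleaned:
            cleaned.append(name)
    return cleaned


def integrate(*sources):
    """Combine baskets from several sources into one unified, cleaned list."""
    unified = []
    for source in sources:
        for raw in source:
            basket = clean_basket(raw)
            if basket:  # skip baskets that are empty after cleaning
                unified.append(basket)
    return unified


# Hypothetical raw data from two tills.
till_a = [["Bread ", "milk", None], ["", "  "]]
till_b = [["MILK", "milk", "Butter"]]

baskets = integrate(till_a, till_b)
print(baskets)
```

Keeping cleaning separate from integration means each source can be checked on
its own before the unified view is passed to the mining step.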
1.5 SCOPE OF THE STUDY
The study focuses on Babcock Ventures supermarket, and the scope of this
project includes the following:
1. We aim to develop our own market basket analysis software, which will be
used in Babcock University.
2. The software will have a colorful GUI (graphical user interface).
3. The software will be based on the Apriori algorithm.
4. We intend to conduct research into the branches of science on which this
software will be based, such as artificial intelligence.
5. We will develop software that will eventually stand out among other data
mining software.
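Since the software is to be based on Apriori, the level-wise idea behind that
algorithm can be sketched in Python as shown here. This is a simplified
illustration, not the implementation developed in this project: the function
name apriori, the sample transactions, and the min_support value are all
assumptions.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    if n == 0:
        return {}

    def frequent(candidates):
        # Keep only the candidates whose support reaches the threshold.
        result = {}
        for c in candidates:
            s = sum(c <= b for b in baskets) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = frequent({frozenset([i]) for i in items})
    all_frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        all_frequent.update(current)
        k += 1
    return all_frequent


# Hypothetical transactions used only for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent_itemsets = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), s)
```

The prune step relies on the property that every subset of a frequent itemset
must itself be frequent, which is what keeps the number of candidates
manageable on large transaction sets.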
1.6 LIMITATION OF THE STUDY
The limitations of this software include:
2. Data restrictions: this is a major factor that stands in the way of the
execution of this project. Since there is no data on households and individual
consumers, we do not consider such purchases.
3. Time constraints: this is also a major factor. The software cannot work
reliably on a small amount of raw data, because the results tend to mislead the
retailer; in a nutshell, the software is intended for large volumes of data.