ABSTRACT
The proliferation of smartphones has made
speech-to-text technology a vital tool for enhancing human-computer
interaction, offering a faster, more natural alternative to traditional typing.
This project focuses on the design and implementation of a speech-to-text
application for the Android platform. The application aims to convert spoken
language into written text accurately and efficiently, thereby addressing
challenges such as the difficulty of typing on mobile devices, especially for
individuals with disabilities or those in need of hands-free operation.
The system leverages the Android operating system's
built-in SpeechRecognizer API, alongside consideration of
offline-capable engines like Vosk, to process user audio input. The development
follows a structured methodology encompassing requirements gathering, system
design, API integration, implementation in Java/Kotlin, and rigorous testing.
Key functionalities include real-time speech capture, audio pre-processing,
conversion to text, and the ability for users to edit, save, and export the transcribed
text in various formats.
The primary objective is to create a user-friendly
and accessible application that mitigates the limitations of existing systems,
such as dependency on internet connectivity and poor performance in noisy
environments or with diverse accents. Evaluation metrics, including Word Error
Rate (WER), accuracy, and latency, are used to assess the system's performance.
This project demonstrates the practical development of a mobile application
that harnesses speech recognition technology to facilitate seamless communication
and improve digital accessibility for a broad range of users.
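As a rough illustration of the Word Error Rate metric mentioned above, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The class and method names (WordErrorRate, tokenize, editDistance, wer) are hypothetical and are not taken from the project's source code. The whitespace-only, case-insensitive tokenization is also an assumption; a real evaluation would likely normalize punctuation as well.

```java
import java.util.Locale;

// Illustrative sketch only: names and tokenization rules are assumptions.
final class WordErrorRate {

    // Lower-cases the text and splits it on whitespace (punctuation is not handled).
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Word-level Levenshtein distance: the minimum number of substitutions,
    // deletions, and insertions needed to turn ref into hyp.
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // empty reference prefix: j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // empty hypothesis prefix: i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(deletion, insertion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.length];
    }

    // WER = (S + D + I) / N, where N is the number of words in the reference.
    static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, hyp) / ref.length;
    }
}
```

For example, with the reference "send a message to john" and the recognized text "send the message john", there is one substitution and one deletion against five reference words, so WordErrorRate.wer returns 0.4.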
TABLE OF CONTENTS
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
CHAPTER ONE: INTRODUCTION
1.1 BACKGROUND OF THE STUDY
1.2 STATEMENT OF THE PROBLEM
1.3 AIM AND OBJECTIVES OF THE STUDY
1.4 SIGNIFICANCE OF THE STUDY
1.5 SCOPE AND LIMITATIONS OF THE STUDY
1.6 DEFINITION OF TERMS
CHAPTER TWO: LITERATURE REVIEW
2.1 OVERVIEW OF SPEECH RECOGNITION TECHNOLOGIES
2.2 ANDROID DEVELOPMENT AND MOBILE APPLICATION
2.3 EXISTING SPEECH-TO-TEXT SYSTEMS
2.4 GAPS IN EXISTING SYSTEMS
2.5 SUMMARY OF LITERATURE REVIEW
2.6 TABULAR FORMAT OF RELATED WORK REVIEW
CHAPTER THREE: SYSTEM INVESTIGATION AND ANALYSIS
3.1 PROBLEM DEFINITION
3.2 METHODOLOGY
3.3 WORKING PRINCIPLES
3.3.1 VOICE INPUT COLLECTION
3.3.2 PRE-PROCESSING OF INPUT
3.3.3 SPEECH-TO-TEXT CONVERSION
3.3.4 RESULT DISPLAY AND EXPORT
3.3.5 EVALUATION METRICS
3.4 SYSTEM DESIGN DIAGRAMS
3.4.1 USE CASE DIAGRAM
3.4.2 DATA FLOW DIAGRAM
3.4.3 SYSTEM ARCHITECTURE
3.5 SYSTEM REQUIREMENTS
3.5.1 SOFTWARE REQUIREMENTS
3.5.2 HARDWARE REQUIREMENTS
CHAPTER FOUR: SYSTEM DEVELOPMENT
4.1 SYSTEM DESIGN
4.1.1 OUTPUT DESIGN
a) REPORTS TO BE GENERATED
b) SCREEN FORMS OF REPORTS
c) FILES USED TO PRODUCE REPORTS
4.1.2 INPUT DESIGN
a) LIST OF INPUT ITEMS REQUIRED
b) DATA CAPTURE SCREEN FORMS FOR INPUT
c) FILES USED TO RETAIN INPUTS
4.1.3 PROCESS DESIGN
a) LIST ALL PROGRAMMING ACTIVITIES NECESSARY
b) PROGRAM MODULES TO BE DEVELOPED
c) VIRTUAL TABLE OF CONTENT
4.1.4 STORAGE DESIGN
a) DESCRIPTION OF THE DATABASE USED
b) DESCRIPTION OF THE FILES USED
4.1.5 DESIGN SUMMARY
a) HIERARCHICAL INPUT PROCESSING OUTPUT (HIPO) CHART
4.2 SYSTEM IMPLEMENTATION
4.2.1 PROGRAM DEVELOPMENT ACTIVITY
a) PROGRAMMING LANGUAGE USED
b) ENVIRONMENT USED FOR DEVELOPMENT
c) SOURCE CODE
4.2.2 PROGRAM TESTING
a) CODING PROBLEMS ENCOUNTERED
b) USE OF SAMPLE DATA
4.2.3 SYSTEM DEVELOPMENT
a) SYSTEM REQUIREMENT
b) TASKS PRIOR TO IMPLEMENTATION
c) USER GUIDANCE
4.3 SYSTEM DOCUMENTATION
4.3.1 FUNCTIONS OF PROGRAM MODULES
4.3.2 USER’S MANUAL
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATION
5.1 SUMMARY
5.2 CONCLUSION
5.3 RECOMMENDATION
REFERENCES
APPENDIX I
APPENDIX II
CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF THE STUDY
In
our everyday lives, speaking is one of the most common and natural ways people
communicate. We use speech to share our thoughts, ideas, and feelings with
others. It is faster and easier than writing or typing, which is why it plays
an important role in how we interact.
With the growth of technology,
speech is now being used to control devices and applications. One important
technology that makes this possible is called speech-to-text. This means turning spoken words into written text.
It is very useful for sending messages, writing notes, or even searching the
internet just by talking to your phone.
Speech-to-text
is part of a bigger field called speech
processing. This includes things like recognizing who is speaking
(speaker recognition), making a computer talk (speech synthesis), or
understanding what someone is saying (speech recognition). Among these, speech
recognition is very popular because it helps people use their devices without
needing to type.
Smartphones,
especially Android phones, are now very common and have made it easier for
people to use speech-to-text apps. Google has a voice recognition tool that
lets users talk to their phones to send texts, search online, or give commands.
This is helpful not just for convenience, but also for people who have
difficulty typing—like those with disabilities or injuries.
Some examples of voice assistants are Google Voice Actions and Apple’s Siri.
They let users send messages, make phone calls, and more using only their
voice. However, Siri is available only on Apple devices, while Google’s
speech-to-text feature is available on most Android phones.
This project aims to build a simple Android app that takes speech as input and
converts it into text. The app can help people who find it hard to type or who
simply prefer to speak. It can also be useful for people who are deaf or hard of
hearing, because it lets them read spoken words as text.
In
short, this project will help make communication easier and faster for many
people. It will also show how speech technology can be used to build helpful
tools for everyday use.
1.2 STATEMENT OF THE PROBLEM
Typing on a mobile phone can
sometimes be slow, stressful, or difficult—especially for people who have
disabilities, are in a hurry, or don’t know how to type well. Many users want a
faster and easier way to write messages, notes, or search online without using
a keyboard.
Even
though some voice-to-text applications already exist, they often require an
internet connection, may not support all Android phones, or may not understand
natural speech very well. Also, some of these apps are not user-friendly or
only work on specific devices like the iPhone.
People
with hearing or speaking difficulties also face communication challenges. They
need tools that can help them turn speech into text quickly and clearly, so
they can interact with others and access information easily.
This project addresses these problems by designing a simple Android
application that converts speech into text. The goal is to make
communication faster, easier, and more accessible for everyone, especially those
who find typing difficult or impossible.
1.3 AIM AND OBJECTIVES OF THE STUDY
Aim:
The aim of this project is to design and develop a speech-to-text Android
application that allows users to speak into their phones and have their words
converted into written text.
Objectives:
To achieve this aim, the study will:
- Develop an Android app that can take speech as input and convert it into text.
- Use a speech recognition engine (like Google’s) to process spoken words.
- Make the app user-friendly and easy to use for all users, including those with disabilities.
- Test the app to make sure it works well and accurately converts speech to text.
- Reduce the time and effort needed to type by using voice instead.
1.4 SIGNIFICANCE OF THE STUDY
This study focuses on the design
and development of a speech-to-text mobile application specifically for Android
smartphones. The application aims to utilize Google’s speech recognition
service or any other suitable engine compatible with Android devices. Its core
functionality is to enable users to speak into their phone and have their
speech automatically converted into written text displayed on the screen. The
app will primarily support the English language, as well as any other languages
supported by the selected speech engine. Additional features include the
ability to send the converted text as a message, such as through SMS.
However, there are certain
limitations associated with the development and deployment of this application.
One major limitation is the possible requirement for internet connectivity,
particularly when using Google’s speech recognition services, which may hinder
offline usage. Furthermore, the accuracy of speech recognition may be affected
by background noise, strong or unclear accents, or poor microphone quality.
Language support is also dependent on the capabilities of the chosen speech
engine and may not cover all desired languages. Lastly, due to constraints in
time and resources, comprehensive testing across all Android devices may not be
feasible, potentially affecting compatibility and performance consistency.
1.5 SCOPE AND LIMITATIONS OF THE STUDY
This study
focuses on the design and development of a speech-to-text mobile application
for Android smartphones. The application is intended to convert spoken words
into written text using Google's speech recognition service or any other
compatible engine available for Android devices. It will support English and
other languages that the chosen speech engine can recognize. Core features of
the app will include real-time speech-to-text conversion and the ability to
send the converted text as a message, such as an SMS.
However, there are certain
limitations to the study. The application may require internet connectivity to
function effectively, especially when relying on cloud-based speech recognition
services like Google’s. Additionally, the app's performance may decline in
noisy environments or when used by individuals with heavy accents or unclear
speech, potentially affecting accuracy. Language support will be limited to
what the speech engine offers, and due to constraints in time and resources,
the app may not be tested on all models of Android smartphones, which could
affect its generalizability and performance across different devices.
1.6 DEFINITION OF TERMS
- Speech Recognition: A technology that allows a device to understand and process spoken language.
- Text: Written words that can be displayed on a screen or sent as a message.
- Android: A mobile operating system used by many smartphones.
- Application (App): A software program designed to perform specific tasks on a mobile device.
- Speech-to-Text: The process of turning spoken words into written text.
- User Interface: The part of an app that users interact with (buttons, screens, etc.).
- Voice Input: Giving a command or message to a device by speaking instead of typing.