Intelligent gloves: An IT intervention for deaf-mute people

Abstract: Deaf-mute people have much potential to contribute to society. However, communication between deaf-mutes and non-deaf-mutes is a problem that isolates deaf-mutes from society and prevents them from interacting with others. In this study, an information technology intervention, intelligent gloves (IG), a prototype of a two-way communication glove, was developed to facilitate communication between deaf-mutes and non-deaf-mutes. IG consists of a pair of gloves, flex sensors, an Arduino nano, a screen with a built-in microphone, a speaker, and an SD card module. To facilitate communication from the deaf-mutes to the non-deaf-mutes, the flex sensors sense the hand gestures and transmit the hand movement signals through connected wires to the Arduino nano, where they are translated into words and sentences. The output is displayed on a small screen attached to the gloves and is also issued as voice from the speakers attached to the gloves. For communication from the non-deaf-mutes to the deaf-mutes, the built-in microphone in the screen senses the voice, which is then transmitted to the Arduino nano to translate it into sentences and sign language, which are displayed on the screen using a 3D avatar. Unit testing of IG has shown that it performs as expected without errors. In addition, IG was tested on ten participants and has been shown to be both usable and accepted by the target users.


Introduction
Listening and speaking are natural abilities of humans. Unfortunately, there are many people who do not have these abilities and cannot easily communicate with others. The World Health Organization stated that approximately 70 million people in the world are deaf-mutes. A total of 360 million people are deaf, and 32 million of them are children.

Related work
Hand gestures can assist and facilitate communication among people by providing a meaningful interaction [5]. There are several applications that embed hand gestures, such as vision-based recognition systems, game control systems, and human-robot interactions [6]. Usually, researchers use wearable sensors to collect hand movements. Then, the data are processed with one of the hand gesture recognition approaches [5]. In the literature, there are mainly two approaches: vision-based and sensor-based [7,8]. The vision-based approach relies on processing digital images and videos while using machine learning and deep learning approaches for gesture recognition. A real-time system for hand gesture recognition based on You Only Look Once (YOLO) v3 and DarkNet-53 convolutional neural networks (CNNs) was proposed in ref. [9]. The system was evaluated on a labeled dataset of hand gesture images in both Pascal VOC and YOLO formats and attained an accuracy of 97.68%. A hybrid automated system that translates sign language to text and speech was built in ref. [10]. The system uses CNNs, natural language processing, language translation, and text-to-speech algorithms to recognize, interpret, express, and convert the hand gesture images to speech in ten different languages. The system is based on a pre-existing dataset of American Sign Language (ASL) and attained an accuracy of 99.63%. A hand gesture recognition system to predict the emergency signs of Indian Sign Language (ISL) was developed in ref. [11]. The system uses a three-dimensional convolutional neural network (3D CNN) and a combination of a pre-trained CNN (VGG-16), long short-term memory, and the YOLO v3 algorithm to detect hand gestures. The system is based on a video dataset of ISL [12] and attained 99.6% mean average precision for detecting the hand gestures. The vision-based approach requires using cameras, which are usually available on smartphones, to capture images or videos of the hand gestures.
Despite the low cost of cameras, the main drawback of this approach lies in the complex and time-consuming processing required to recognize the hand gestures, which is negatively affected by background noise and illumination conditions.
Sensor-based gesture recognition involves using sensors, such as flex sensors, to measure the bending angle, movements, orientations, and alignments of the fingers and the positioning of the palm, and uses these measurements to recognize the gestures. A system called "QuadSquad," consisting of two parts, gloves and a mobile application running on Windows Mobile (Windows 7 and 8), was developed by a research team in ref. [13]. The gloves consisted of flex sensors that would capture hand gestures compliant with ASL and transmit them via Bluetooth to the mobile application. The application would recognize and accept the input data from the glove sensors and transform the data to text that would then appear on the device screen, and to speech using a text-to-speech engine [13]. A glove with a simple design that included several sensors, a small screen, and a mini speaker attached to the glove was developed in ref. [14]. The glove sensors would translate the hand gestures to text that appears on the screen and to speech via the speaker using a "text-to-speech chip" [14]. SignAloud is another glove developed to translate sign language to text and speech displayed on a computer [15,16]. The glove would record the hand positions via a collection of sensors. It would send the recorded data wirelessly via Bluetooth from the Arduino controller on the glove to the Arduino controller linked to the computer screen. If the data matched one of the gestures saved on the computer, then the word associated with the gesture would be spoken through a speaker [15,16]. A glove with attached flex sensors used to convert ASL to audio via an Arduino circuit board was developed by Rajapandian et al. [17]. Furthermore, the board would convert the audio to text using analog-to-digital converters (ADCs) to be displayed on an LCD screen [17]. Likewise, a glove consisting of flex sensors, an Arduino board, and an LCD was developed in ref. [18].
The sensors would transmit the sign language to the Arduino board, which would process the entered data using a microcontroller and then send the processed data to be displayed on the LCD [18]. Although the sensors might be slightly more expensive than cameras, the main advantages of the sensor-based approach are its high accuracy and that it does not require heavy data processing, as the data needed for gesture recognition are directly obtained from the sensors' readings. The accuracy and fast processing of the sensor-based approach offset its increased cost, since the speed of communication between the deaf-mutes and the non-deaf-mutes is an important factor. Thus, the authors in this current study chose the sensor-based approach over the vision-based one.
In regard to enabling two-way communication between the deaf-mutes and non-deaf-mutes, the authors in ref. [19] developed two separate Android applications for communication between non-deaf-mutes and deaf-mutes. The non-deaf-mute application would convert speech to visual contexts and gestures. The speaking person would give a speech input to the system. The system would convert the speech to text using the Google speech-to-text application programming interface (API). Then, it would define the keywords from the text and show some visual images and gestures to the deaf-mute based on the determined keywords. With the deaf-mute application, the user would enter gestures or vibrotactile input to the system using a mobile interface, and the application would then convert the input to speech. The system would match the given gesture with the gestures stored in the system's database and present the predefined words associated with the gestures as speech to the non-deaf-mute person using the Google text-to-speech API. The two applications are connected via Bluetooth to pass information between the devices. One of the main drawbacks of this system is the reliance on the mobile interface to input the user gestures, which might be less accurate than the glove sensor-based approach.
All the mentioned IT interventions were mainly designed to simplify communication and to assist deaf-mute people to be heard. The vision-based approach, which utilizes machine and deep learning algorithms, is complex in processing, and it incurs additional recognition time that might be detrimental to the speed of communication between the deaf-mute and the non-deaf-mute. In addition, the vision-based approach requires the development of an accurate model to recognize the hand gestures when dynamic images are used as input, which is hard to achieve [10]. Despite its cost, using the sensory approach for sign language translation systems is considered the most accurate approach for capturing input data. Sensors are not affected by the external environment when collecting hand gesture data, and therefore, they provide improved recognition accuracy. To the best of our knowledge, none of the proposed sensor-based systems have integrated two-way communication between the deaf-mutes and the non-deaf-mutes, which involves translation from sign language to text and voice and vice versa, in a single system. Thus, this work provides a sensor-based glove prototype integrating two-way communication between the deaf-mutes and the non-deaf-mutes without the need for Bluetooth technology, thereby reducing the power consumption of the system.

Research methodology
The methodology of this study's system design focuses mainly on addressing the research aim of facilitating two-way communication between deaf-mute and non-deaf-mute individuals. Therefore, the following phases were adopted: feasibility analysis, intervention design and implementation, and testing. All phases are described in detail below.

Feasibility analysis
A questionnaire was created and published to understand whether there is a clear need for this system and whether the stakeholders would use the proposed solution. The questionnaire questions are presented in Table 1. The questionnaire was distributed randomly through different channels, such as email and social media, targeting people in Saudi Arabia, and was answered by 335 deaf-mute and non-deaf-mute participants. Three hundred and nineteen (95.2%) of the responses were from non-deaf-mutes, 246 (77%) of whom had difficulty communicating with deaf-mute people. According to the questionnaire's results, 228 (71.5%) communicate with deaf-mutes by hand gestures, 162 (50.8%) by writing, and 42 (13.2%) by photographs. Around 255 (80%) of the non-deaf-mute respondents see the potential of the proposed solution to assist with the communication issue with deaf people. Approximately 16 (5%) of the respondents were deaf-mutes, 11 (68.8%) of whom had difficulty communicating with others. Around 9 (56.3%) of the deaf-mute respondents communicate with others by hand gestures, 9 (56.3%) by writing, and 4 (25%) by photographs. In total, 223 (70%) of the respondents said that this product would simplify communication between deaf-mute and non-deaf-mute people. Therefore, the proposed solution would assist in raising the level of communication between deaf and non-deaf individuals.

Table 1: Questionnaire questions
1. Are you deaf and mute?
2. Do you have difficulty communicating with (non-)deaf and mute people?
3. What is your method of communication with (non-)deaf and mute people?
4. If there is a product that converts deaf-mute gestures to voice and written text and at the same time converts the speech of non-deaf-mute people to written text and deaf-mute gestures, do you think this product will help communication between deaf-mute and non-deaf-mute people?

Intervention design and implementation
To enable bi-directional communication between the deaf-mutes and the non-deaf-mutes, a glove instrument was designed. The two-way interaction between the deaf-mutes and the non-deaf-mutes through the glove instrument is shown in Figure 1. The deaf-mutes use sign language gestures. These gestures are converted by the instrument to voices and sentences that are audible to the non-deaf-mutes. The latter can interact with the deaf-mutes through the same instrument using voice, which is then converted into on-screen text and sign language movements, shown on the attached screen using an avatar. Either user, deaf-mute or non-deaf-mute, can start a conversation at any time, and the corresponding output will be presented by the device.
The main components of the IG are the gloves, which are made of fabric, ten flex sensors, an Arduino nano board, a small screen with a built-in microphone, a speaker, and a secure digital (SD) card module, where all the components are attached to the fabric of the gloves. The flex sensor senses the movements of the user's hand and fingers, and it measures the amount of deviation, bending, and movement angles with a high level of accuracy. It is lightweight and can be easily attached to the fabric. Figure 2 shows the flex sensors, which are attached to the gloves and connected to the Arduino using wires. The Arduino nano board collects the hand gestures received from the flex sensors and converts them to an output: voice and written words on the screen. The SD memory card module is attached to the Arduino board; a database stored on the SD card is used to match the hand gestures of the deaf-mutes and convert them into written words and voice for the non-deaf-mutes, or to match the words spoken by the non-deaf-mutes and convert them to written words and visual representations/video by an avatar for the deaf-mute. The small screen and speaker are used to display and play back the data obtained from the SD card module. Finally, the microphone is used to capture the voice of words spoken by the non-deaf-mutes. Figure 3(a) shows how the two gloves are connected to each other and to the LCD. The final prototype of the IG system is shown in Figure 3(b). The circuit diagram of the proposed IG system is presented in Figure 4. The circuit contains ten flex sensors (five for the right hand and five for the left hand), a speaker, an SD card, an Arduino nano board, and a battery. The flex sensors are powered by connecting them to the VCC (5 V) and ground (GND) pins available on the Arduino board.
To read the flex sensors, they are connected to the analog pins of the Arduino nano board, where the right-hand flex sensors are connected to pins A0-A4 and the left-hand flex sensors are connected to pins A5-A10. In addition, a static resistor is connected to each wire supplying the positive charge, creating a voltage divider so that a variable voltage can be measured by the ADC of the microcontroller. The other terminal of each flex sensor is connected to the GND pins of the Arduino nano board for the negative charge.
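The voltage-divider arithmetic can be sketched numerically. In the following C++ snippet, the 5 V supply, the 10 kΩ fixed resistor, the 10-bit ADC range, and the flex-sensor resistance values are illustrative assumptions for the example, not the values of the actual IG hardware:

```cpp
#include <cassert>
#include <cmath>

// Illustrative voltage-divider constants (assumed values, not the IG's).
constexpr double VCC = 5.0;          // supply voltage (V)
constexpr double R_FIXED = 10000.0;  // static resistor in the divider (ohms)
constexpr int ADC_MAX = 1023;        // 10-bit ADC full-scale count

// Voltage at the analog pin for a given flex-sensor resistance, with the
// fixed resistor on the ground side of the divider.
double dividerVoltage(double rFlex) {
    return VCC * R_FIXED / (R_FIXED + rFlex);
}

// 10-bit ADC count the microcontroller would report for that voltage.
int adcCount(double rFlex) {
    return static_cast<int>(std::round(dividerVoltage(rFlex) / VCC * ADC_MAX));
}
```

Because the sensor's resistance rises as it bends, the measured voltage, and hence the ADC count, falls, giving the microcontroller a distinct reading for each degree of bending.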
Flex sensors measure the degree of bending. As a flex sensor bends, its resistance changes with the radius of bending and increases as the sensor bends further. The flex sensors transmit the measured angle of bending as a voltage signal to the Arduino board. The conversion from analog to digital signal is done by the microcontroller on the board, which runs a program that converts the analog voltage detected by the flex sensor to digital form using the ADC. The digital form of the flex sensor readings is mapped to text by indexing into a file containing the text equivalent of that reading. After that, the text is mapped to the equivalent voice playback, which is stored on the SD card under the same index.
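The reading-to-text mapping described above can be illustrated with a small C++ sketch. The gesture table, ADC ranges, keyword texts, and voice-clip indices below are hypothetical examples invented for illustration, not entries from the actual IG database:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical gesture table: each entry pairs an inclusive ADC range with
// its text equivalent and the index of the matching voice clip on the SD card.
struct GestureEntry {
    int adcLow, adcHigh;   // inclusive ADC range produced by the gesture
    std::string text;      // text equivalent shown on the screen
    int voiceClipIndex;    // index of the recorded voice on the SD card
};

const std::vector<GestureEntry> kGestureTable = {
    {300, 449, "Hello", 0},
    {450, 599, "Thank you", 1},
    {600, 749, "Where is she", 2},
};

// Returns the text for a digitized flex reading, or "" if no entry matches.
std::string lookupGesture(int adcReading) {
    for (const auto& e : kGestureTable)
        if (adcReading >= e.adcLow && adcReading <= e.adcHigh)
            return e.text;
    return "";
}
```

The same index that selects the text would then select the voice clip, so the screen output and the speaker output always stay in sync.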
Figure 6 shows the design of the system, which follows a three-layer architecture. This allows each layer to be modified separately without affecting the other layers, making the system easy to maintain and extend in the future. The first layer is the interface layer, which includes the gloves, the screen, the speaker, and the microphone. The second layer is the logic layer, and it includes all the functional components of the system, which are written in C++, the Arduino programming language, and Java, such as utilizing the sensors to translate movements to sentences. The functional components are as follows: (1) translating sign language to voice, (2) translating voice to text, and (3) translating voice to sign language. The third layer is the data access layer, which includes the database of sign language gestures and written and spoken words for matching the input from the gloves or the microphone and displaying the corresponding output. The database used in the IG system consists of text keywords and their representations in different formats: sequences of digits representing the gesture readings of the keywords, recorded voices of the keywords, and animated videos using an avatar.
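A record in the data access layer, as described above, might be sketched as follows. The struct fields, file names, and digit sequences here are illustrative assumptions, not the actual schema of the IG database:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of one keyword record with its three stored representations:
// the gesture digit sequence, the voice clip, and the avatar video.
struct KeywordRecord {
    std::string keyword;               // text keyword, e.g. "Hello"
    std::vector<int> gestureReadings;  // digit sequence of the ten flex readings
    std::string voiceFile;             // recorded voice clip on the SD card
    std::string avatarVideo;           // animated avatar video for the keyword
};

// Illustrative sample database (invented entries).
const std::vector<KeywordRecord> kSampleDb = {
    {"Hello",     {1, 2, 3, 4, 5, 1, 2, 3, 4, 5}, "hello.wav",  "hello.mp4"},
    {"Thank you", {2, 2, 2, 2, 2, 3, 3, 3, 3, 3}, "thanks.wav", "thanks.mp4"},
};

// Match a sequence of glove readings against the stored records;
// returns nullptr when no record matches.
const KeywordRecord* matchGesture(const std::vector<KeywordRecord>& db,
                                  const std::vector<int>& readings) {
    for (const auto& rec : db)
        if (rec.gestureReadings == readings) return &rec;
    return nullptr;
}
```

Keeping all representations of a keyword in one record is what lets the logic layer serve either direction of the conversation from a single lookup.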
The study followed the Scrum framework in the design of the gloves. The work was broken down into smaller parts known as sprints, each providing high-level details of the desired functionality for a specific user. The sprints were arranged from the highest to the lowest priority. The backlog of the IG contains four sprints: A, B, C, and D.
Sprints A and B cover the deaf-mute parts of the system. They include programming the Arduino in the gloves. Sprint A is the programming of the Arduino to translate hand gestures to voice coming from the speaker. The Arduino was programmed using the Arduino programming language and C++ in the Android Studio environment to monitor hand gestures in the form of a range of numbers for each movement; the numbers were compared with the words stored on the SD card, and the matching word or sentence was then pronounced. Sprint B is the programming of the Arduino using the Arduino programming language and the Java programming language in the Android Studio environment to translate hand gestures to text that appears on the screen. Sprints C and D cover the non-deaf-mute parts of the system. They include screen programming. Sprint C involves programming the interface using Java in the Android Studio environment to enable the screen to display the non-deaf-mute's voice translated to text. Sprint D involves programming the interface using Java in the Android Studio environment to enable the screen to display an avatar that matches the translated text obtained from sprint C. The duration of each sprint was 2 weeks, resulting in a total of 8 weeks for the design and implementation of the gloves. Figure 7 shows the prototype scenario of each sprint.

Communication between deaf-mutes and non-deaf-mutes
Normally, deaf-mute people utilize sign language to communicate with non-deaf-mute people. Deaf-mutes and non-deaf-mutes communicate with each other using very simple sentences, which are composed of a few nouns and verbs. The system in this study was designed for basic communication between deaf-mutes and non-deaf-mutes. The database in our system consists of simple keywords related to basic conversation sentences, which would be used by users of different age groups in different contexts of their daily life activities. Using the IG system does not require the non-deaf-mutes to learn sign language, since the sign language is converted by the system into on-screen text and voice. However, non-deaf-mutes need to know that sign language is composed of simple sentences with uncomplicated vocabularies of nouns and verbs. The communication process starts when the deaf-mute user wears the IG. Then, the user's hand gestures and finger movements while using sign language are sensed by the flex sensors and sent to the SD card module, where they are matched with gestures stored in the database. Using this database, gestures are converted into words, which are then presented on the small screen as text and as a voice that is audible to the non-deaf-mutes through the speakers. From the other side, the system converts the words uttered by the non-deaf-mutes into written words on the screen and sign language representations shown on the screen using an avatar. The process starts when the non-deaf-mutes speak the words into the microphone. The voices sensed through the microphone are sent to the SD card module, where they are matched with the voices of words stored in the database. The corresponding sign language gestures are then shown on the screen using an avatar.

Evaluation
To evaluate the IG intervention, functional testing and usability testing were performed. Functional testing was done to ensure that all the system components work as expected, and usability testing was done to ensure that the system is easy to use by the target users. The evaluations done in this study are similar to the evaluation of the systems proposed in refs [18,19]. Functional testing focuses on the overall validation of the system behavior and its ability to perform as expected; it involved unit and integration testing. In software development, unit testing is often performed as a first step in testing any software [20]. Unit testing is a practical method that is used to increase a piece of software's correctness and quality [21]. One of the main advantages of performing unit testing is that it assists in fixing bugs and saving costs during the development cycle [21]. Thus, each function in the IG intervention was considered a unit. Each unit was tested individually during the development phase and before recruiting the participants. After ensuring the correctness of each unit, as a next step, the developers needed to test the integration of two or more units, which is called integration testing. The main aim of integration testing is to ensure that all parts interact with each other in their intended environment to match the expected results [20]. Converting sign language to voice, converting sign language to text, converting voice to text, and converting voice to sign language represented by an avatar are some of the integrations that were tested. The findings indicated that there were no errors in the system, as all components interacted correctly and matched the expected results, showing that the IG intervention was technically working. Table 2 shows some examples of the performed unit testing results.
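As an illustration of the unit-testing style described above, the following C++ sketch checks a hypothetical voice-to-text lookup function, a stand-in for the real IG components that run on the device, against its expected outputs, mirroring the input/expected/actual/result structure of Table 2. The function name and voice identifiers are invented for the example:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical unit under test: maps a recognized voice identifier to the
// text that should appear on the screen (stand-in for the IG lookup).
std::string voiceToText(const std::string& voiceId) {
    static const std::map<std::string, std::string> kVoiceDb = {
        {"voice_hello", "Hello"},
        {"voice_where_is_she", "Where is she"},
    };
    auto it = kVoiceDb.find(voiceId);
    return it == kVoiceDb.end() ? "" : it->second;
}

// One unit test per behavior: a known input must yield the expected text
// (the "Pass" rows of Table 2), and an unknown input must yield no text.
void testVoiceToText() {
    assert(voiceToText("voice_hello") == "Hello");   // expected == actual: pass
    assert(voiceToText("voice_unknown").empty());    // no-match case: pass
}
```

Each such test exercises one unit in isolation; only after all units pass are they combined for the integration tests described above.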
To test the usability of the IG system, the convenience sampling technique was used to recruit ten participants during the testing phase: five deaf-mutes and five non-deaf-mutes. The participants were college undergraduate students, and their ages ranged from 18 to 23 years. The non-deaf-mutes had not interacted with deaf-mutes before. Before the user testing, simple instructions about the operation of the IG system were given to the participants in written format. The deaf-mutes were asked to wear the IG gloves and use sign language gestures to communicate with the non-deaf-mutes; the matching voices could then be heard by the non-deaf-mutes, and the matching text words appeared on the screen. The non-deaf-mute participants were asked to use the built-in microphone to communicate with the deaf-mutes; the matching text word and avatar then appeared on the screen, as seen in Figure 9(a and b).
Participants were asked to perform the scenarios mentioned in Table 2. The system was able to recognize the signs performed by the deaf-mute participants in scenarios A and B and display the matching voices and texts. In addition, the system effectively recognized the words said by the non-deaf-mute participants in scenarios C and D and displayed the matching text and animated video. Some users expressed discomfort when using the system due to its cabling. Nonetheless, all the participants agreed that the IG system is overall easy to use and suitable for two-way translation of sign language.
As for the comparison between the proposed prototype and existing IT gloves, unlike the systems developed in refs [13,18,19], which focused on only one side of the communication, from the deaf-mutes to the non-deaf-mutes, the proposed approach focuses on investigating the feasibility of integrating the two-way communication in a single system that would help facilitate the interaction between the deaf-mutes and the non-deaf-mutes. Results of evaluating the IG system show that it is overall feasible to integrate a two-way communication system in one device, and that the system is easy to use by the target users. In addition, unlike some of the proposed systems in refs [13,16,18,19], in this study, a questionnaire was administered to collect the system requirements and assess its acceptability by the users.

Table 2 (excerpt): Examples of the performed unit testing results
Scenario C: Converting voice to words/sentences that appear on the screen.
1. Translating the voice to words/sentences that appear on the screen. Non-deaf-mute person says: "Hello"; expected output: "Hello" displays on the screen; actual output: "Hello". Result: Pass.
2. Translating the voice to words/sentences that appear on the screen. Non-deaf-mute person says: "Where is she?"; expected output: "Where is she" displays on the screen; actual output: "Where is she". Result: Pass.
Scenario D: Converting voice to sign language represented by an animated video of an avatar on the screen.
1. Translating the voice to a sentence on the screen, as well as to sign language by the animated video of an avatar. Non-deaf-mute person says: "Relax". Result: Pass.

Conclusion and future work
Hearing and speech loss is a major issue that many people around the world face. More than 5% of the world's population needs rehabilitation to assist them with this disability, and this percentage is estimated to almost double by 2050. This shows the urgent need to design and develop IT interventions that may aid these people in communicating with others normally, without the need for a translator in the middle. The proposed IT intervention, IG, is a promising solution that would help deaf-mute and non-deaf-mute people communicate easily. The system works in both directions, as it translates from sign language to voice and text, and vice versa. However, the current IG system is wired, which might constrain the user's freedom of mobility. In the future, the system could enable more flexible movement of the users by utilizing wireless communication such as Bluetooth and WiFi. In addition, reducing the price of the IG by replacing some of its components with cheaper parts would make the system more accessible. The researchers also plan to add a customization feature, so that the avatar will, for example, change based on the user's gender. Additionally, the avatar may be customized based on the user's age for better usability of the system across different age groups. Finally, we will add an accelerometer sensor to improve the detection accuracy achieved in this current study.