1 Introduction
Activity recognition is an active research area [1, 4–6, 8]; however, it has not yet reached satisfactory performance or produced a standard method . One of the objectives in this area is the recognition of distress situations among elderly or disabled persons in their homes, for monitoring purposes. Providing remote help to elderly and disabled persons by means of information and communication technologies, together with artificial intelligence techniques and tools, is part of what we call “health smart homes” (HSH). Several tools are used to meet this objective, such as infrared sensors, cameras, and microphones; however, given the high cost of these tools, the large number of sensors required, and the need for interaction, research tends toward the use of the audio channel. An example is the AUDITHIS system , which analyzes sound and speech in the HSH from eight microphones.
The overall aim of our work is to build a system for classifying and separating audio sources in a home, for telemonitoring elderly or disabled persons. However, several problems exist , such as the choice of parameters and classification methods, sound quality (presence of noise), the volume of information (several signals from different channels are acquired simultaneously), and finally the problem of defining a database of everyday life sounds, which is the focus of our research.
In this article, we first present the key concepts of the research area, and then we present works and projects that address recognition of activities and detection of distress situations in an HSH. Thereafter, we describe the overall system architecture and its main modules. Finally, we discuss the different sounds used in a telemonitoring application, which inspired us to create our database.
2 Key Definitions
In this section, we present definitions of telemedicine, HSHs, and sound and speech recognition, which are the key concepts of our research area.
2.1 Telemedicine
The term “telemedicine” encompasses systems that use video, audio, digital information, and other communication tools to transmit information and data relating to medical diagnosis and treatment, and to provide care and health services to patients located in remote physical environments . Originally, the definition applied to advisory services delivered primarily through interactive video; however, since the advent of the Internet and multimedia, telemedicine has evolved into “telehealth,” which has a wider scope than telemedicine networks and covers not only patient education, disease prevention, and therapeutic decision-making but also administrative resources, patients’ physiological data, and medical databases .
2.2 Health Smart Homes
The “smart home” concept, which derives from information and communication technology, has received great interest from the scientific community in recent years. A smart home is a habitat equipped with information and communication facilities designed to work together to anticipate and meet the needs of its occupants, promoting their comfort, safety, and entertainment while preserving their natural interaction with the environment .
An important application of the smart home is the HSH. In the medical field, the main challenges of HSHs are home health care, telemedicine, and sociomedical tele-assistance. The target populations are elderly or isolated persons, patients with chronic conditions (e.g., physically disabled persons and cardiac patients), and persons under temporary medical supervision (e.g., women with high-risk pregnancies).
2.3 Sound and Speech Recognition
Sound recognition is a challenge that has been explored for many years with various machine-learning techniques . It can be used for home automation and assistive aids, and in many applications inside the home, such as quantifying water use or detecting distress situations.
Speech and sound recognition have also been applied to assistive technologies, such as controlling a wheelchair with a given set of voice commands .
3 Works and Projects Related to Recognition of Activities and Detection of Distress Situations
Recognition of activities and distress situations relies on the information provided by the sensors installed in the HSH. A popular trend is to use as many sensors as possible in order to acquire more information. An opposite trend is to use as few sensors as possible  while still designing the most effective system possible at a lower cost. Because we are interested in the second trend, we chose the audio channel. Although few systems are able to recognize audio [23, 28], we present here a series of works that recognize activities and detect distress situations on the basis of sound information.
Maunder et al.  built a database of sounds of daily living acquired with two microphones in a kitchen. They tried to distinguish between sounds such as a telephone ringing, a cup falling, and a spoon dropping to the floor. Another group  collected sounds in an office environment and applied unsupervised algorithms to classify the sounds of everyday working life. The work presented in Reference  aimed to recognize distress situations at home with an embedded implementation, using surveillance tools built on affordable hardware (standard sound cards and microphones). Another trend  combines speech synthesis, speech recognition, and the construction of a coherent dialogue with a person. Such research has applications in robotics, where robots keep a person company in order to reduce loneliness.
Reference  presents a complete sound recognition system that identifies the different sounds in an apartment in order to recognize the activities of daily living being performed, combined with a French speech recognition system that looks for distress keywords in the measured signal. Similarly, the CARE project , which uses many sensors (localization, temperature, etc.), recognizes activities such as “going to the toilet” and “exiting the apartment.” In Britain, Hong et al.  created a model based on evidence theory that distinguishes the activity of preparing a hot or cold beverage from health-related activities. Fleury et al.  presented the selection and placement of a set of sensors (infrared detectors, door switches, microphones, etc.) in an apartment, used to classify seven activities of daily living: resting, dressing, casual walking, dining, communicating, maintaining hygiene, and disposing of garbage. Litvak et al.  used floor-vibration and acoustic sensing (microphones) to detect a particular kind of distress: a fall. Their solution uses a pattern recognition algorithm to discriminate between fall events involving a human and those involving an inanimate object.
4 Overall System Architecture
A sound recognition application is composed of two main modules: a feature extraction module, which extracts the most relevant acoustic parameters from the processed signal, and a recognition module, which assigns the input sound to the corresponding (most likely) class; this is the identification phase of the input sound.
Our system is composed of two subsystems:
The first subsystem detects sounds in the presence of noise and separates them when several sounds are mixed; its output is the input to the second subsystem.
The second subsystem is a classifier, which may be based on Support Vector Machines (SVMs), Hidden Markov Models (HMMs), or other types of classifiers, and classifies the sounds produced by the first subsystem. Figure 1 shows the overall architecture of the system.
Implementing such a system first requires building an everyday life sound database. Pattern recognition systems usually have standard databases on which proposed systems are assessed. However, the construction of an everyday life sound database in real conditions (with environmental noise) has been much less explored.
In practice, the sounds of interest are acquired together with sounds from other sources. If n microphones are installed in the apartment, then at any time t we obtain n signals, which may carry the same information with different signal power (e.g., the sound of an object falling in the kitchen can be acquired simultaneously by all microphones in the apartment, in the bathroom, living room, etc., but it is strongest in the kitchen). This holds even if there is no noise in the apartment. Moreover, several sounds can be acquired simultaneously, e.g., television sound in the sitting room, water flowing in the bathroom, and the inhabitant coughing. It is therefore important to identify which sound to take into account, i.e., to choose the sound that provides the most information.
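As a rough illustration of the falling-object example above, the sketch below keeps, among n equal-length microphone windows covering the same time span, the one with the highest RMS energy, assuming that the microphone closest to the event records it most strongly. The helpers `rms_energy` and `select_channel` are hypothetical names introduced here, not the authors' method, and the sketch only covers a single event; simultaneous sounds such as television plus running water still call for actual source separation.

```python
import math

def rms_energy(samples):
    """Root-mean-square amplitude of one channel's samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def select_channel(channels):
    """Return (index, rms) of the channel with the highest RMS energy.

    `channels` holds one list of samples per microphone, all covering
    the same time window.
    """
    energies = [rms_energy(ch) for ch in channels]
    best = max(range(len(energies)), key=energies.__getitem__)
    return best, energies[best]

# Hypothetical example: the same fall captured by three microphones.
kitchen = [0.0, 0.8, -0.7, 0.5, -0.3]
living_room = [0.0, 0.3, -0.25, 0.2, -0.1]
bathroom = [0.0, 0.1, -0.08, 0.05, -0.02]
idx, level = select_channel([kitchen, living_room, bathroom])
print(idx)  # 0 (the kitchen microphone)
```

In a real deployment the comparison would presumably run over short synchronized windows, and a minimum energy threshold could reject windows that contain only background noise.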
The detection and separation phase is very important because it makes it possible to:
Avoid processing duplicate information, and thus save time and storage space;
Determine reliably whether a distress situation has occurred, provided the phase is carried out carefully.
Once the signals to be processed have been selected, they are classified. Classification is done at two levels:
Classification of each signal as an everyday life sound or as speech;
Recognition of the sound class, or recognition of the speech.
Depending on the recognized sound class or speech, the system decides whether or not to send an alarm.
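The two-level decision and the alarm step can be sketched as follows. This is a minimal illustration under stated assumptions: `process_signal` is a hypothetical function, the speech detector, sound classifier, and transcriber are passed in as stand-ins for trained models, and the sets of critical sounds and distress keywords are illustrative placeholders rather than the system's actual lists.

```python
from typing import Callable, Sequence, Tuple

Signal = Sequence[float]

# Hypothetical alarm triggers; a real system would derive these from the database.
CRITICAL_SOUNDS = {"glass_breaking", "object_falling", "scream"}
DISTRESS_KEYWORDS = {"help", "au secours"}  # "au secours" is French for "help"

def process_signal(
    signal: Signal,
    is_speech: Callable[[Signal], bool],
    classify_sound: Callable[[Signal], str],
    transcribe: Callable[[Signal], str],
) -> Tuple[str, str, bool]:
    """Two-level decision; returns (kind, label, alarm)."""
    # Level 1: is the signal speech or an everyday life sound?
    if is_speech(signal):
        # Level 2 (speech): recognize the words, then look for distress keywords.
        # Plain substring matching is a simplification ("help" also matches "helpful").
        text = transcribe(signal).lower()
        return "speech", text, any(k in text for k in DISTRESS_KEYWORDS)
    # Level 2 (sound): recognize the class, then check whether it is critical.
    label = classify_sound(signal)
    return "sound", label, label in CRITICAL_SOUNDS

# Usage with stand-in models (lambdas replace real classifiers).
print(process_signal([0.0, 0.9, -0.9], lambda s: False, lambda s: "glass_breaking", lambda s: ""))
# ('sound', 'glass_breaking', True)
print(process_signal([0.1, 0.2], lambda s: True, lambda s: "", lambda s: "Au secours !"))
# ('speech', 'au secours !', True)
```

Passing the models as functions keeps the decision logic independent of whether the second level is SVM-based, HMM-based, or something else.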
5 Everyday Life Sound Databases
In this section, we present some existing databases of everyday life sounds and how these sounds are divided into categories.
In References [10, 25, 27], Fleury, Vacher, and coworkers defined six categories of sounds, excluding speech: (i) human sounds; (ii) object and supply manipulation, linked to the activity of a person; (iii) outside sounds; (iv) sound of devices; (v) sound of running water, a category that provides interesting information on activities such as disposal, hygiene, and meal preparation; and (vi) other sounds. Examples of each category are presented in Table 1.
In the framework of the RESIDE-HIS project, sounds are divided into two categories : (i) useful sounds (impulsive and short), including falling objects, glass breaking, door slamming, etc.; and (ii) environmental noise (long and stationary), such as water flowing, hair dryer, electric shaver, etc.
Istrate , for example, created a database of everyday life sounds for telemonitoring elderly and disabled persons in a habitat. The labeling and description files of this database followed the method used for speech databases. The sound database was organized as follows: 15% of the sounds were recorded in a studio, 15% were taken from a film sound-effects CD , and 70% came from the CD “Sound scene database in real acoustical environments” (Real World Computing Partnership, ATR Laboratories) . In Reference , Istrate and coworkers defined seven classes of sound: door slamming, glass breaking, phone ringing, footsteps, screams, dishes, and door locking. The corpus contains both sounds associated with distress situations, such as glass breaking, objects falling, and screams, and usual sounds such as footsteps, door slamming, and dishes. The database must also contain environmental sounds, such as television, radio, hair dryer, and water flowing, which are considered noise .
The everyday life sounds database of Istrate  consists of the following sounds: stapler sounds, clapping, a chair moving, human sounds (sneezing, yawning, laughing, snoring, coughing), opening a pressure vessel, sound of paper crumpling, footstep sounds, punching sounds, slamming of different doors (entrance door, cabinet, refrigerator), electric shaver, hair dryer, door locking, dishes, a chair or a book falling, screams, water flowing in the sink, shower running and glass breaking, ringing tones, and HSH background noise.
The information source of sound sensors can be 
A direct source for a distress situation:
Recognition of a distress call;
Detection of suspicious sounds (objects falling, screaming);
Long absence of any sound activity during the day.
An indirect source: analyzing sequences of everyday life sounds over a long period can reveal a pathological symptom such as nocturnal urinary disorders. A lack of activity during the day can also indicate a serious distress situation.
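The "long absence of any sound activity during the day" criterion above can be sketched as a search for the longest silent interval between detected sound events. The function names, the monitoring window, and the alert threshold below are hypothetical; an appropriate threshold would presumably depend on the monitored person's habits.

```python
from datetime import datetime, timedelta

def longest_silence(event_times, day_start, day_end):
    """Longest interval within [day_start, day_end] containing no detected sound event."""
    times = sorted(t for t in event_times if day_start <= t <= day_end)
    bounds = [day_start] + times + [day_end]
    return max(b - a for a, b in zip(bounds, bounds[1:]))

def inactivity_alert(event_times, day_start, day_end, threshold):
    """True if the day contains a silence at least as long as `threshold`."""
    return longest_silence(event_times, day_start, day_end) >= threshold

# Hypothetical day: monitoring from 08:00 to 20:00, three detected sound events.
day = datetime(2024, 1, 15)
start, end = day.replace(hour=8), day.replace(hour=20)
events = [day.replace(hour=8, minute=30), day.replace(hour=9, minute=10), day.replace(hour=15)]
print(longest_silence(events, start, end))                       # 5:50:00
print(inactivity_alert(events, start, end, timedelta(hours=4)))  # True
```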
6 Creating Our Database
6.1 Description of the Sound Database
Sounds may be speech or everyday life sounds. To create our database, we divided everyday life sounds into two categories:
Normal sounds (related to usual activities): e.g., door slamming, door locking, door opening, phone ringing, sound of footsteps, and sound of dishes.
Critical sounds (indicating a possible distress situation): e.g., glass breaking, objects falling, and screaming.
Normal sounds are further divided into two categories:
Useful sounds, which may help detect a distress situation when combined with other information;
Disturbance sounds, which are considered as noise, such as television and radio sounds.
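The category scheme above can be represented as a simple lookup table, which lets disturbance sounds be filtered out early and critical sounds be flagged. The label strings are hypothetical identifiers, and placing all of the "normal" examples in the useful subcategory is an assumption, since the text lists only television and radio as disturbance sounds.

```python
from enum import Enum

class Category(Enum):
    CRITICAL = "critical"                      # possible distress situation
    NORMAL_USEFUL = "normal/useful"            # helps when combined with other information
    NORMAL_DISTURBANCE = "normal/disturbance"  # treated as noise

# Illustrative label -> category table (labels are hypothetical identifiers).
SOUND_CATEGORY = {
    "glass_breaking": Category.CRITICAL,
    "object_falling": Category.CRITICAL,
    "scream": Category.CRITICAL,
    "door_slamming": Category.NORMAL_USEFUL,
    "door_locking": Category.NORMAL_USEFUL,
    "door_opening": Category.NORMAL_USEFUL,
    "phone_ringing": Category.NORMAL_USEFUL,
    "footsteps": Category.NORMAL_USEFUL,
    "dishes": Category.NORMAL_USEFUL,
    "television": Category.NORMAL_DISTURBANCE,
    "radio": Category.NORMAL_DISTURBANCE,
}

def is_noise(label: str) -> bool:
    """Disturbance sounds are treated as noise and can be discarded early."""
    return SOUND_CATEGORY.get(label) is Category.NORMAL_DISTURBANCE

def is_critical(label: str) -> bool:
    """Critical sounds may indicate a distress situation."""
    return SOUND_CATEGORY.get(label) is Category.CRITICAL

print([label for label in SOUND_CATEGORY if is_noise(label)])  # ['television', 'radio']
```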
As described in Reference , one part of our database is obtained by recording different types of sounds, a second part comes from commercial CDs, and a third part is obtained by extracting sound from mp3 or video files. A fourth part will be taken from existing databases such as those reported in References [3, 14, 26].
Our choice of sounds follows two principles: first, to cover as wide a range of sounds as possible, so as to ensure a high recognition rate and thus reduce the error rate; second, to exploit this database for other objectives such as activity recognition. Consequently, even a sound that does not seem significant on its own can contribute to activity recognition when combined with other information.
The list of sounds we need for our application is summarized in Table 2.
6.2 Recording Parameters
We chose a sampling frequency of 44.1 kHz to reproduce the signal faithfully after digitization. We chose the “.wav” format for sound files because it is a standard format that can be read by various software and is easy to convert to other formats .
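As a minimal sketch of producing files with these parameters, the standard-library `wave` module can write mono PCM audio at 44.1 kHz. The 16-bit sample width and the `write_wav` helper are assumptions made here for illustration (the bit depth is not specified); a full 20 s file would contain 44,100 × 20 = 882,000 samples per channel.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, the sampling frequency chosen for the database
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM); an assumption

def write_wav(fileobj, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] as 16-bit little-endian PCM WAV."""
    frames = b"".join(
        struct.pack("<h", round(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(sample_rate)
        w.writeframes(frames)

# Usage: a 1 s, 440 Hz test tone written to an in-memory buffer and read back.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
buf = io.BytesIO()
write_wav(buf, tone)
buf.seek(0)
with wave.open(buf, "rb") as r:
    print(r.getframerate(), r.getnframes())  # 44100 44100
```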
The signal-to-noise ratio (SNR) of the recordings will take several values ranging from 10 dB up to 40–70 dB; the exact values are not yet defined. Each sound file is 20 s long. This length was chosen by taking into account the maximum length of the sounds to be processed and the fact that some algorithms need an initialization time of ≈5 s . Figures 2–5 show examples of recorded sounds, plotted in a Matlab environment.
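The SNR is conventionally defined as SNR = 10·log10(P_s / P_n), where P_s and P_n are the mean powers of the sound and of the noise. One way to obtain recordings at controlled SNR values (an assumption; the text does not say how the SNR will be set) is to record clean sounds and noise separately and add the noise after multiplying it by the gain g = sqrt(P_s / (P_n · 10^(SNR/10))). The helper names and the synthetic tone and Gaussian noise below are illustrative only.

```python
import math
import random

def power(x):
    """Mean power of a signal (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    """SNR in dB: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(power(signal) / power(noise))

def scale_noise(signal, noise, target_db):
    """Return `noise` rescaled so that snr_db(signal, result) equals target_db."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (target_db / 10)))
    return [gain * v for v in noise]

def mix(signal, noise):
    """Sample-wise sum of a clean sound and a (scaled) noise."""
    return [s + n for s, n in zip(signal, noise)]

rng = random.Random(0)
fs = 8000
clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
for target in (10, 40, 70):
    scaled = scale_noise(clean, noise, target)
    noisy = mix(clean, scaled)  # the mixture that would be written to a .wav file
    print(target, round(snr_db(clean, scaled), 6))
```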
7 Experiment
In this section, we present a practical implementation for classifying the various environmental sounds in a habitat. The application was developed in the Matlab environment.
In this experiment, we used a database consisting of the following sounds: knocking on doors, phone ringing, screaming, objects falling, sound of dishes, and a muezzin’s call to prayer, with 10 repetitions of each type of sound. The files are in .wav format, with a sampling frequency of 8 kHz.
In the parameterization phase, we used two parameters:
Zero crossing rate (ZCR):
With these two parameters, we calculated the acoustical vectors of each sound of the database and also of the sounds to be classified.
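To make the parameterization concrete, the sketch below computes one acoustic vector per frame from the ZCR and the short-time energy (energy is among the parameters listed in Section 7.2). It assumes the common definition ZCR = (1/(N−1)) Σ_{n=1}^{N−1} 1[sgn x(n) ≠ sgn x(n−1)], i.e., the fraction of consecutive samples whose signs differ, and E = (1/N) Σ x(n)². The frame length of 256 samples (32 ms at 8 kHz), the hop of 128 samples, and the function names are illustrative assumptions.

```python
def frames(x, size, hop):
    """Split a signal into overlapping frames of `size` samples; drops a short tail."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]

def zcr(frame):
    """Fraction of consecutive sample pairs whose signs differ (0 counts as positive).

    Requires len(frame) >= 2.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def energy(frame):
    """Short-time energy: mean of the squared samples."""
    return sum(v * v for v in frame) / len(frame)

def acoustic_vectors(x, size=256, hop=128):
    """One [ZCR, energy] vector per frame."""
    return [[zcr(f), energy(f)] for f in frames(x, size, hop)]

# Usage: an alternating signal crosses zero at every sample; a constant never does.
print(len(acoustic_vectors([1.0, -1.0] * 512)))  # 7
print(acoustic_vectors([0.5] * 300))             # [[0.0, 0.25]]
```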
The classification compares the previously calculated acoustic parameters of the database with the parameters of the .wav file to be classified.
Our system compares the acoustic vectors of each segment with the acoustic vectors of the database. By calculating the difference between the vectors of our database and those of the .wav file being classified, and also by likelihood, the system decides whether a sound segment belongs to a given class. The statistical function developed in Matlab classifies the sound into one of the classes in the database (doors knocking, shouting, phone ringing, etc.); this function compares statistical parameters and selects the class at the closest distance.
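The decision is described above only as a comparison of statistical parameters that selects the closest distance. One plausible reading, sketched here under that assumption, is a minimum-distance (nearest-mean) classifier: each class is summarized by the mean of its training vectors, each segment is assigned to the class with the nearest mean, and the segment labels are aggregated into class percentages such as those the GUI displays. The likelihood component is not modeled, and the class names and feature values are invented for illustration.

```python
import math
from collections import Counter

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_templates(database):
    """Map {class: [training vectors]} to {class: mean vector}."""
    return {label: mean_vector(vecs) for label, vecs in database.items()}

def classify(vector, templates):
    """Return the class whose mean vector is nearest (Euclidean distance)."""
    return min(templates, key=lambda label: math.dist(vector, templates[label]))

def class_percentages(segments, templates):
    """Classify each segment and return each class's share of segments, in %."""
    labels = [classify(v, templates) for v in segments]
    return {label: 100 * n / len(labels) for label, n in Counter(labels).items()}

# Hypothetical [ZCR, energy] training vectors for two classes.
templates = train_templates({
    "door_knock": [[0.10, 0.80], [0.12, 0.90]],
    "phone_ring": [[0.50, 0.20], [0.55, 0.25]],
})
segments = [[0.10, 0.85], [0.52, 0.20], [0.11, 0.80], [0.09, 0.90]]
pct = class_percentages(segments, templates)
print(pct)                    # {'door_knock': 75.0, 'phone_ring': 25.0}
print(max(pct, key=pct.get))  # door_knock
```

Because ZCR and energy live on different scales, the features would normally be normalized (e.g., to zero mean and unit variance) before computing distances.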
Figure 6 presents the graphical user interface (GUI) developed with Matlab software to simulate the application of the classification of environmental sounds.
We tested the application on several sounds, and we present here the classification results for three of them: knocking on doors, screams, and speech. For each sound to be classified, the application displays information such as its duration, sampling frequency, temporal representation of the signal, spectrogram, the distribution of class percentages, and the classification of signal segments.
7.2 Results and Discussion
The aim of this experiment is to present the overall idea of our work and to explain why a database of everyday life sounds is needed, as we are only at the first step of our work: database creation. We used a simple classifier with the following parameters: ZCR, energy, and formants. As shown in the figures, the recognition rate is encouraging and the different sounds are recognized.
8 Conclusion
In this article, we provided an overview of HSHs and telemedicine, and then presented various works related to the recognition of activities and the detection of distress situations. Subsequently, we described the existing databases created to validate systems for telemonitoring elderly or disabled persons through the detection of distress situations. We then presented our goal of creating a sound database, which is currently in progress; the detailed steps of database creation are not covered here because it is not yet completed. Finally, we presented our experiment, a practical implementation, in Matlab, of a classifier of various environmental sounds in a habitat. As mentioned above, a database of everyday life sounds is necessary because it will be used both to choose the relevant parameters and in the classification phase. The recognition rate is encouraging; however, other classifiers and other parameters may give different results. Therefore, the goal of our research is to find the most relevant parameters, together with an appropriate classifier, to obtain a better recognition rate.
References
F. Albinali, N. Davies and A. Friday, Structural learning of activities from sparse datasets, in: 5th IEEE Int. Conference on Pervasive Computing and Communications, pp. 221–228, 2007.
S. Bonhomme, Méthodologie et outils pour la conception d’un habitat intelligent (Methodology and tools for the design of an intelligent home), PhD Thesis, p. 19, July 2008, France.
E. Castelli, M. Vacher, D. Istrate, L. Besacier and J. F. Sérignat, Habitat telemonitoring system based on the sound surveillance, in: 1st International Conference on Information Communication Technologies in Health, ISBN 960-813-17-1, pp. 141–146, Greece, July 2003.
S. Dalal, M. Alwan, R. Seifrafi, S. Kell and D. Brown, A rule-based approach to the analysis of elders’ activity data: detection of health and possible emergency conditions, in: AAAI 2005 Fall Symposium, Workshop on Caring Machines: AI in Eldercare, Arlington, Virginia, November 4–6, 2005.
M. Fezari and M. Bousbia-Salah, Speech and sensor in guiding an electric wheelchair, Autom. Control Comp. Sci. 41 (2007), 39–43.
A. Fleury, Détection de motifs temporels dans les environnements multi-perceptifs – application à la classification des Activités de la Vie Quotidienne d’une Personne Suivie à Domicile par Télémédecine (Detection of temporal patterns in multi-sensory environments – application to the classification of the activities of daily living of a person monitored at home by telemedicine), PhD Thesis, University Joseph Fourier, Grenoble, 2008.
A. Fleury, N. Noury and M. Vacher, Application des SVM à la classification des Activités de la Vie Quotidienne d’une personne à partir des capteurs d’un Habitat Intelligent pour la Santé (Application of SVMs to the classification of a person’s activities of daily living from the sensors of a health smart home), hal-00422566, version October 1–7, 2009.
A. Fleury, M. Vacher, F. Portet, P. Chahuara and N. Noury, A multimodal corpus recorded in a health smart home, in: Proceedings of the Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, LREC, Malta, May 18–21, 2010, pp. 99–105.
A. Harma, M. McKinney and J. Skowronek, Automatic surveillance of the acoustic activity in our living environment, in: Proc. IEEE Int. Conference on Multimedia and Expo ICME 2005, pp. 634–637, July 6–9, 2005.
R. Harper, Inside the Smart Home, Springer, London, 2003.
D. M. Istrate, Détection et reconnaissance des sons pour la surveillance médicale (Detection and recognition of sounds for medical surveillance), PhD Thesis, INPG, SIPT specialty, EEATS Doctoral School, Grenoble, December 16, 2003.
D. Istrate, M. Vacher, E. Castelli and C.-P. Nguyen, Sound processing for health smart home, in: 2nd International Conference on Smart Homes and Health Telematics, p. 8, Singapore, September 15–17, 2004.
D. Istrate, M. Vacher and J.-F. Serignat, Embedded implementation of distress situation identification through sound analysis, J. Inform. Technol. Healthcare 6 (2008), 204–211.
B. Kröse, T. Van Kasteren, C. Gibson and T. Van den Dool, CARE: context awareness in residences for elderly, in: International Conference of the International Society for Gerontechnology, Pisa, Tuscany, Italy, June 4–7, 2008.
D. Litvak, Y. Zigel and I. Gannot, Fall detection of elderly through floor vibrations and sound, in: Proc. 30th Annual Int. Conference of the IEEE-EMBS, pp. 4632–4635, 2008.
M. M. Maheu and A. Allen, August 20, 2007. Available from: http://telehealth.net/glossary.html. Accessed August 20, 2007.
D. Maunder, E. Ambikairajah, J. Epps and B. Celler, Dual-microphone sounds of daily life classification for telemonitoring in a noisy environment, in: Proc. 30th Annual International Conference of the IEEE-EMBS 2008, vols. 1–8, August 20–24, 2008, Vancouver, Canada, pp. 4636–4639.
Real World Computing Partnership, CD – Sound scene database in real acoustical environments, 1998–2001. Available from: http://tosa.mri.co.jp/sounddb/indexe.htm. Accessed 1998–2001.
S. Sciascia, CD – bruitages (sound effects), vol. 3, 1992.
M. Stäger, P. Lukowicz and G. Tröster, Power and accuracy tradeoffs in sound-based context recognition systems, Perv. Mobile Comput. 3 (2007), 300–327.
S.-Y. Takahashi, T. Morimoto, S. Maeda and N. Tsuruta, Dialogue experiment for elderly people in home health care system, in: 6th International Conference, TSD 2003, České Budéjovice, Czech Republic, September 8–12, 2003, Proceedings, pp. 418–423.
M. Vacher, Analyse sonore et multimodale dans le domaine de l’assistance à domicile (Sound and multimodal analysis in the field of home assistance), HDR dissertation (Mémoire HDR), October 2011.
M. Vacher, A. Fleury, F. Portet, J.-F. Serignat and N. Noury, Complete sound and speech recognition system for health smart homes: application to the recognition of activities of daily living, in: New Developments in Biomedical Engineering, Domenico Campolo, ed., pp. 645–673, 2010.
M. Vacher, F. Portet, A. Fleury and N. Noury, Development of audio sensing technology for ambient assisted living: applications and challenges, Int. J. E-Health Med. Commun. 2 (2011), 35–54.