
Pronunciation learning technique assisted by smartphone


Vishnu C Mohan


Department of Computer Science and Engineering,

Adi Shankara Institute of Engineering and Technology,

Kalady, Ernakulam. Pin:683574



Mrs. Neetha K Nataraj

Department of Computer Science and Engineering,

Adi Shankara Institute of Engineering and Technology,

Kalady, Ernakulam. Pin:683574


Abstract—Communication is a basic requirement in every field, and a message is conveyed properly only when it is pronounced clearly. This creates a need for pronunciation training methods that help learners speak a language the way native speakers do. Of the available methods, we discuss the most prominent and efficient ones. These methods operate in an ambient intelligence (AmI) environment, in which the electronic devices that make up the Internet of Things (IoT) network work together seamlessly to provide a wide variety of applications and intelligent services to users. Within such an environment, the methods train users to improve their vocabulary and fluency in a language by pointing out errors and areas for improvement, so that they can convey their message as effectively as possible. Conventional computer-assisted pronunciation training (CAPT) methods can be brought into an AmI environment, but their computational cost is high and they are not portable. We therefore introduce a smartphone-assisted pronunciation training (SAPT) strategy, which is portable and can be implemented efficiently with the members of the IoT network using a lightweight word recommendation method.

Keywords—ambient intelligence, Internet of Things, computer-assisted pronunciation training system, bag of phonemes.

I.    Introduction

In computing, ambient intelligence (AmI) refers to electronic environments that are sensitive and responsive to the presence of people [3], [4]. Ambient intelligence is a vision of the future of consumer electronics, telecommunications, and computing that was originally developed in the late 1990s by Eli Zelkha and his team at Palo Alto Ventures for the time frame 2010–2020. In an ambient intelligence world, devices work in concert to support people in carrying out their everyday activities, tasks, and rituals in an easy, natural way, using information and intelligence hidden in the network that connects these devices.

To make the AmI environment more user-friendly, a wide variety of applications, such as traditional Internet applications, are provided either explicitly (on request) or implicitly (based on a prediction from the current situation). Computer-assisted pronunciation training (CAPT) is one such popular Internet application.

CAPT is a computer-based language learning technology that enables users to self-correct their pronunciation through an automatically generated training process. It helps users who are uncomfortable with oral participation and removes the difficulty of finding bilingual tutors who are native speakers [1]. Thus, CAPT is an effective alternative to traditional pronunciation training.

CAPT systems provide feedback by detecting unacceptable pronunciation in user speech samples. For this purpose, an automatic speech recognition system is trained to identify errors. After the unacceptable parts are recognized, the user is asked to pronounce a set of phonemes, words, or sentences provided by the system, which in turn improves the user’s pronunciation. However, adapting CAPT systems to an IoT environment is challenging because of their high computational requirements [2].

We therefore propose a smartphone-assisted pronunciation learning technique (SAPT) that uses a lightweight word recommendation method. The technique can be implemented on a smartphone with low computational capacity, which significantly improves its applicability to an AmI environment.

II.    The System Proposed

A.    Smartphone Assisted Pronunciation Learning

First, the user speaks the test words displayed on the smartphone, a typical IoT device accessed by the user. SAPT must then analyze the pronounced words to identify the user’s mistakes, but performing this analysis on the smartphone would require an unacceptable amount of resources. Instead, the proposed system transfers the speech signals to an IoT member that executes the speech recognition processing. The system then receives a collection of words from the IoT member and executes lightweight word recommendation on the smartphone using these words. Finally, a set of effective words is displayed for the user to practice with. This process is shown in Fig. 2.1.

To improve the user’s pronunciation, the likelihood of a person pronouncing a similar word incorrectly is taken into consideration. To accomplish this, the pronunciation of each word is represented as a bag of phonemes. The recommended words are then read out by the user.


Figure 2.2: Lightweight word recommendation technique for SAPT. (a) Pronunciation test. (b) Bag of phonemes. (c) Selection probability. (d) Recommendation and practice.


Based on the pronunciation, the pronunciation characteristics of the user are identified. Next, the correlation of each phoneme with the test results is evaluated. Based on this correlation analysis, the system assigns a selection probability to each word in order to recommend a word set to be practiced. The words with higher selection probability are then used in the practice phase to train the user. The procedure followed in the proposed system is shown in Fig. 2.2.

B.    Pronunciation testing by IoT member

To identify the user’s current level of pronunciation, the system requires a set of words for evaluating the pronunciation and a speech recognition system to determine whether each pronunciation is acceptable. The Google Voice Search (GVS) system is used as the speech recognition system.
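The acceptance check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the recognizer has already returned one transcript per trial, that a trial is accepted when the transcript matches the target word exactly (ignoring case and surrounding spaces), and that a word counts as acceptable when at least `min_accepted` trials are accepted. The function names, the matching rule, and the threshold are all hypothetical.

```python
def accepted_trials(target, transcripts):
    """Count the trials whose recognized text equals the target word."""
    wanted = target.strip().lower()
    return sum(1 for heard in transcripts if heard.strip().lower() == wanted)


def label_words(trial_transcripts, min_accepted=1):
    """Map each word to 1 (unacceptable) or 0 (acceptable).

    trial_transcripts: dict of target word -> list of recognizer transcripts.
    min_accepted: assumed threshold; the text does not specify one.
    """
    return {
        word: 0 if accepted_trials(word, heard) >= min_accepted else 1
        for word, heard in trial_transcripts.items()
    }


# Made-up transcripts for three trials per word.
sample = {
    "cone": ["cone", "cone", "cone"],
    "grove": ["grow", "groove", "grove"],
    "thin": ["tin", "sin", "tin"],
}
lenient = label_words(sample)                  # one accepted trial is enough
strict = label_words(sample, min_accepted=3)   # every trial must be accepted
```

With the lenient setting only "thin" is flagged as an error word; with the strict setting "grove" is flagged as well, which shows how much the assumed threshold shapes the test results.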

C.    Lightweight word recommendation method

To identify the characteristics and weaknesses of the user’s pronunciation, the user’s pronunciations are modeled as a bag of phonemes. The relationship between the error words and the phonemes they contain is then determined. Finally, a word-selection strategy recommends words that can help improve pronunciation quality.
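The bag-of-phonemes model mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the phoneme symbols and the two transcriptions are made up for the example, and the function name `build_bag_of_phonemes` is hypothetical. A real system would take its transcriptions from a pronunciation dictionary.

```python
def build_bag_of_phonemes(pronunciations):
    """Encode each word as a 0/1 vector over the phoneme inventory.

    pronunciations: dict of word -> list of phoneme symbols.
    Returns (inventory, bag), where bag maps word -> list of 0/1 flags,
    one flag per phoneme in the sorted inventory.
    """
    inventory = sorted({p for phones in pronunciations.values() for p in phones})
    bag = {
        word: [1 if p in phones else 0 for p in inventory]
        for word, phones in pronunciations.items()
    }
    return inventory, bag


# Illustrative transcriptions only (assumed, not taken from the source).
pronunciations = {
    "cone": ["k", "ow", "n"],
    "grove": ["g", "r", "ow", "v"],
}
inventory, bag = build_bag_of_phonemes(pronunciations)
```

Because the encoding only records whether a phoneme occurs, the order and repetition of phonemes within a word are discarded, which keeps the representation small enough for a smartphone.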



The system produces a bag of phonemes and test results. Each word is represented as a vector of phonemes, where 1 indicates that the phoneme is included in the word and 0 otherwise. The test result of each word is recorded in the same way: 1 for an unacceptable pronunciation and 0 otherwise. From the bag of phonemes and test results, the proposed system learns the user’s pronunciation characteristics.

To measure the relationship between the occurrence of a phoneme and the test results, we employ the Pearson correlation coefficient [5]. In statistics, the Pearson correlation is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. For the occurrence Pj of the j-th phoneme and the test results T, the correlation is

$$C(P_j, T) = \frac{\mathrm{cov}(P_j, T)}{\sqrt{\mathrm{var}(P_j)\,\mathrm{var}(T)}}$$

where cov(Pj,T) denotes the covariance between Pj and T, and var(Pj) and var(T) are the variances of Pj and T, respectively. In the example considered, the phoneme /k/ is inversely correlated with unacceptable pronunciation. Phonemes with a high degree of correlation are therefore included more frequently in the recommended word set.

1.    Word Recommendation by Selection Probability

The word recommendation method is based on the assumption that practicing words whose phonemes occur frequently in error words gives a higher probability of improving the pronunciation of those error words.

The method calculates the selection probability of a word from the correlations of its phonemes, where the probability of each word represents the opportunity for that word to be selected to improve the user’s pronunciation. Let a column vector U = {u1, …, uj, …, ud}^T be the set of correlation values, where uj = C(Pj,T). The importance value of the i-th word wi as a candidate for the recommended word set is then calculated from U and the phoneme vector of wi. In the example, the word “grove” is selected 41 times more frequently than the word “cone” to compose the recommended word set.
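The correlation and selection steps above can be sketched as follows. This is a hedged illustration, not the authors' method. The Pearson formula follows the definition in the text, but the importance formula is an assumption made only for this sketch: it takes the importance of a word to be the exponential of the summed correlations of its phonemes and normalizes the importances into probabilities. The words, phoneme flags, test results, and function names are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation: cov(x, y) / sqrt(var(x) * var(y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def phoneme_correlations(bag, results):
    """Return U, one correlation per phoneme column of the bag."""
    columns = range(len(bag[0]))
    return [pearson([row[j] for row in bag], results) for j in columns]


def selection_probabilities(bag, u):
    """ASSUMED importance: exp(sum of correlations of the word's phonemes)."""
    importance = [math.exp(sum(f * c for f, c in zip(row, u))) for row in bag]
    total = sum(importance)
    return [v / total for v in importance]


def recommend(words, probabilities, k, seed=None):
    """Draw k practice words (with replacement) by selection probability."""
    rng = random.Random(seed)
    return rng.choices(words, weights=probabilities, k=k)


# Made-up data: columns are the phonemes /k/, /r/, /g/; 1 = unacceptable.
words = ["cone", "grove", "crow", "go"]
bag = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
results = [0, 1, 0, 1]

u = phoneme_correlations(bag, results)
probs = selection_probabilities(bag, u)
practice = recommend(words, probs, k=5, seed=0)
```

In this toy data /k/ only appears in accepted words and /g/ only in error words, so words containing /g/ receive a higher selection probability than words containing /k/. A different importance formula would change the ratios but not this ordering.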
III.    Comparison with other techniques

In the proposed system, users pronounce each word in W three times before and after the practice to determine their pronunciation skill. Fig. 3.1 shows the number of words accepted in the three pronunciation trials before and after practice, |BP| and |AP|, respectively. The increase in accepted words, |AP|-|BP|, is shown at the top of the red bar graph. The experimental results show that the number of accepted words increased significantly, by 49 words (7.0%) on average across the four users, after the proposed practice.

Figure 3.2: Improvement in the number of acceptably pronounced words for each user after practice using random word recommendations.

Fig. 3.2 shows that, after practice using random recommendation, the number of acceptably pronounced words increased by 11 words (1.6%) on average for the four users. Compared with the increase of 49 words (7.0%) shown in Fig. 3.1, this indicates that the proposed selection strategy is more effective than the random selection strategy.

IV.    Future Scope

A more detailed analysis can be conducted using data mining techniques such as sequential pattern analysis and sequence alignment. Sequential pattern analysis will allow us to investigate unacceptable pronunciation in more detail, because the pronunciation of a word is generated by a sequential process over its phonemes. By using sequence alignment to compare a user’s pronunciation with the correct pronunciation, the system will be able to detect phoneme-level errors from the matching results for each phoneme.
We also expect that this will make activities in the ambient intelligence environment more intelligent, delivering better experiences to the user.

V.    Conclusion

The proposed system avoids the impractical computational cost of analyzing users’ speech signals on the smartphone by delegating that analysis to members of the IoT network, without modifying their functionality.

A lightweight word recommendation system using the bag of phonemes and correlation analysis was employed to provide personalized pronunciation training after the analysis of data from IoT members.

The proposed pronunciation training system considers the individual pronunciation characteristics of each user and effectively supports diverse word recommendation based on those characteristics.

The training process resulted in acceptable pronunciation of 7.0% more words on average, out of an average of 700 test words.

Our analysis demonstrated that the correction of the users’ pronunciation was mainly attributable to the recommended words, whose phonemes frequently occurred in the users’ erroneous pronunciation.

References

[1] W. K. Leung, K. W. Yuen, K. H. Wong, and H. Meng, “Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device,” in Proc. IEEE Int. Conf. Cognit. Infocommun., Budapest, Hungary, Dec. 2013, pp. 583–588.

[2] X. Qian, H. Meng, and F. Soong, “Capturing L2 segmental mispronunciations with joint-sequence models in computer-aided pronunciation training (CAPT),” in Proc. Int. Symp. Chin. Spoken Lang. Process., Nov./Dec. 2010, pp. 84–88.

[3] H. Hagras, D. Alghazzawi, and G. Aldabbagh, “Employing type-2 fuzzy logic systems in the efforts to realize ambient intelligent environments application notes,” IEEE Comput. Intell. Mag., vol. 10, no. 1, pp. 44–51, Feb. 2015.

[4] N. Kumar, N. Chilamkurti, and S. C. Misra, “Bayesian coalition game for the Internet of Things: An ambient intelligence-based evaluation,” IEEE Commun. Mag., vol. 53, no. 1, pp. 48–55, Jan. 2015.

[5] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Jan. 2003.

