Word-Final Phoneme Segmentation Using Cross-Correlation
This paper presents an automated method for segmenting a target phoneme in word-final position. The method is based on cross-correlation coefficients computed between a reference sound wave and a sample sound wave. Most existing Speech Sound Disorder (SSD) screening solutions require some degree of human intervention and rely on segmentation methods based on hard-coded time frames. Moreover, existing solutions extract features from the frequency domain, which demands substantial computational power and hinders real-time feedback. The pre-processing algorithm proposed in this paper is implemented as a Python 3.7 script. It automatically generates two new .wav files containing the phonemes found in word-final position in the initial sound waves. These .wav files are intended as valid, homogeneous input to a subsequent classification stage. That stage aims to rigorously discriminate mispronunciations of the target phoneme and to assist Speech-Language Pathologists (SLPs) with SSD screening.
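The cross-correlation step can be illustrated with a minimal NumPy sketch. It is not the authors' script, and it rests on several assumptions. The function names `normalized_xcorr`, `locate_phoneme` and `write_segment` are hypothetical. The "cross-correlation coefficient" is taken to be the Pearson (normalized) correlation between the reference and each equally long window of the sample. The extracted segment is taken to be the window at the lag with the highest coefficient. Restricting the search to the end of the word, and the exact .wav format used, are also not specified in the source and are assumed here.

```python
import wave

import numpy as np


def normalized_xcorr(reference, sample):
    """Pearson correlation between `reference` and every window of `sample`
    of the same length (one coefficient per valid lag, values in [-1, 1])."""
    ref = np.asarray(reference, dtype=np.float64)
    sig = np.asarray(sample, dtype=np.float64)
    n = ref.size
    if n == 0 or n > sig.size:
        raise ValueError("reference must be non-empty and no longer than sample")

    ref = ref - ref.mean()
    ref_norm = np.sqrt(np.sum(ref ** 2))

    # Numerator: since the centred reference sums to zero, correlating it with
    # the raw sample equals correlating it with each mean-removed window.
    num = np.correlate(sig, ref, mode="valid")

    # Per-window energy around the window mean, via running sums.
    cs = np.concatenate(([0.0], np.cumsum(sig)))
    cs2 = np.concatenate(([0.0], np.cumsum(sig ** 2)))
    win_sum = cs[n:] - cs[:-n]
    win_sq = cs2[n:] - cs2[:-n]
    win_var = np.clip(win_sq - win_sum ** 2 / n, 0.0, None)

    denom = ref_norm * np.sqrt(win_var)
    coeffs = np.zeros_like(num)
    ok = denom > 1e-12  # silent windows or a flat reference get coefficient 0
    coeffs[ok] = num[ok] / denom[ok]
    return coeffs


def locate_phoneme(reference, sample):
    """Return (start_index, coefficient) of the best-matching window."""
    coeffs = normalized_xcorr(reference, sample)
    start = int(np.argmax(coeffs))
    return start, float(coeffs[start])


def write_segment(path, segment, rate):
    """Write a float segment in [-1, 1] as a mono 16-bit PCM .wav file
    (the output format is an assumption)."""
    pcm = (np.clip(segment, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())


if __name__ == "__main__":
    # Synthetic demo: embed a noise-like "phoneme" near the end of a signal.
    rng = np.random.default_rng(0)
    rate = 8000
    reference = 0.5 * rng.standard_normal(400)
    sample = 0.01 * rng.standard_normal(4000)
    sample[3200:3600] += reference

    start, coeff = locate_phoneme(reference, sample)
    print(f"best lag: {start}, coefficient: {coeff:.4f}")
    write_segment("segment.wav", sample[start:start + reference.size], rate)
```

Computing the window statistics with running sums keeps the cost close to a single `np.correlate` call. This matters if the aim, as stated above, is to avoid heavy frequency-domain processing and keep feedback near real time.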