Biologically Motivated Speech Recognition

Speech perception is uniquely human, so applying theories of human psycholinguistic processing to computer speech recognition may improve performance. In the natural model, computations are performed by the brain using data from the ears. Effective speech recognition might therefore be achieved by a brain-like machine learning algorithm receiving ear-like inputs. It has also been theorized that human listeners translate speech sounds into the lip and tongue movements that produced them, so a speech recognizer that can relate sound to articulation might perform well. This project aimed to use Hierarchical Temporal Memory (HTM), a brain-like algorithm, to identify ear-transformed audio with the help of X-ray articulation data. An experimental architecture was developed, but reliable HTM results were not obtained.

Research Proposal

Research Paper