The demonstrator service (running on a PC) will contain a simple user interface and a restricted but realistic database of POI information. It will give a flavor of what the envisaged service can offer to the user, and it will also be used as a vehicle for testing the benefits of the newly developed speech technology in a realistic setting, involving tests with end users at strategic moments during the project.
The automatic recognition of spoken Points of Interest (POI) names in a Dutch/Flemish service is extremely difficult, and no satisfactory solution exists today. Two reasons for this are that multiple pronunciations are frequently used for the same name, and that important multilingual aspects have to be accounted for. The Automatic Speech Recognition engine must, for instance, be able to cope with pronunciations of foreign names by Dutch/Flemish speakers and with pronunciations of Dutch/Flemish names by non-native speakers of Dutch. To deal adequately with such pronunciations, one will eventually need a recognizer that can work with a multilingual phoneme set.
The research work packages of the project will investigate whether it is possible to improve POI name recognition in two steps. The first step is to improve the Dutch baseline transcriptions by means of the g2p-p2p (grapheme-to-phoneme and phoneme-to-phoneme) converter technology that was developed in the Autonomata I project. The second step is to introduce pronunciation variants by means of automata trained on the relations between the baseline transcriptions and the observed (i.e. auditorily verified) phonetic transcriptions of the name tokens found in the spoken name corpus, which was also developed in the Autonomata I project. The project will thus make use of all the resources that were developed in the Autonomata I project.
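As an illustration of the variant-generation idea described above, the following is a minimal Python sketch, not the project's g2p-p2p converter. It assumes that the learned relations between baseline and observed transcriptions can be approximated by context-free, per-phoneme rewrite rules with probabilities. The rule table `P2P_RULES`, the function `generate_variants`, the phoneme symbols and all probabilities are hypothetical and chosen only for the example.

```python
from itertools import product

# Hypothetical rewrite rules: baseline phoneme -> list of (replacement, probability).
# In a real system these relations would be learned from paired baseline and
# observed transcriptions; the symbols and numbers here are invented.
P2P_RULES = {
    "r": [("r", 0.7), ("R", 0.3)],   # two realizations of /r/
    "@": [("@", 0.8), ("", 0.2)],    # schwa kept or deleted
}


def generate_variants(baseline, rules=P2P_RULES, max_variants=3):
    """Return up to max_variants (phoneme tuple, probability) pairs,
    ordered from most to least probable."""
    # Phonemes without a rule are copied unchanged with probability 1.
    options = [rules.get(p, [(p, 1.0)]) for p in baseline]
    variants = {}
    for combo in product(*options):
        # Drop empty replacements, which model phoneme deletion.
        phones = tuple(ph for ph, _ in combo if ph)
        prob = 1.0
        for _, p in combo:
            prob *= p
        # Different rule choices may yield the same variant: sum their mass.
        variants[phones] = variants.get(phones, 0.0) + prob
    ranked = sorted(variants.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_variants]


if __name__ == "__main__":
    for phones, prob in generate_variants(["b", "r", "@"]):
        print(" ".join(phones), round(prob, 2))
```

The resulting variants could be added to the recognition lexicon next to the baseline transcription. Limiting the output with `max_variants` reflects the usual trade-off that every added variant may also increase confusability between names.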
The aim of this application-oriented research project is to build a demonstrator version of a Dutch/Flemish business service that provides Points of Interest (POI) information, and to investigate new pronunciation modeling technologies that can help to bring the spoken name recognition component of such a service to the required level of accuracy.