Automatic sign language recognition inspired by human sign perception


Abstract

Automatic sign language recognition is a relatively new field of research (since ca. 1990). Its objective is to automatically analyze sign language utterances. Several issues within the research area merit investigation: how to capture the utterances (cameras, magnetic sensors, instrumented gloves), how to extract relevant information from the captured data, and how to classify signs or sentences automatically using the extracted information. These issues are of an immediate and basic nature, and must be solved before any automatic recognition of sign language can be achieved. But other issues, pertaining to the nature of sign language and human recognition, are no less interesting: which elements of a sign are important for the meaning of an utterance? How do consecutive signs influence one another? Why are certain types of variation unimportant while others change the meaning of a sign?

Automatic sign language recognition has, until recently, mostly focused on the first set of issues. In this thesis, we attempt to integrate knowledge about sign languages and human sign recognition into the automatic sign recognition process. Research on the (psycho)linguistics of sign languages is itself quite young (since ca. 1960), and many questions remain unanswered. For this reason, we conduct our own studies of human sign language recognition, and the knowledge gained from these experiments is applied in an existing automatic sign language recognition system. The thesis is divided into two parts: the first part describes the experiments conducted with human signers; the second part describes experiments investigating the possibilities of integrating such knowledge in the automatic recognizer. This recognizer is meant to be used in an interactive environment in which young children practice sign language vocabulary. For this reason, it is vision-based (and therefore unobtrusive) and only handles isolated signs.

The experiments in part I of the thesis investigate the information content of various sign elements: fragments of a sign in time (chapter 2), and the sign aspects handshape and hand orientation (chapter 3). In time, the central phase of a sign is the most informative one: it is as informative as the entire sign. Recognition based on other phases is also possible to a certain extent, and the transition from the preparation phase to the central phase appears to be a salient moment. As for the aspects, handshape proves more useful for recognition than hand orientation. Chapter 4 gives an overview of the human recognition research and discusses possibilities for application.

In part II, the possibilities of utilizing the results of part I in the recognition system are investigated. Chapter 5 describes the addition of the handshape feature to the system (which chapter 3 showed to be the most interesting feature to add); adding handshape gives a small improvement in recognition performance. In chapter 6, the salience of the sign fragments used in chapter 2 is investigated for the automatic recognizer. The central phase proves to be the most informative one, as it was for human signers. Chapter 7 describes experiments in which a small set of frames is used to represent a sign (a toy version of this representation is sketched below). The results show a deterioration in recognition performance; strict demands on the correctness of the remaining frames are probably partly responsible for this decrease.
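As an illustration of the frame-subset idea from chapter 7, the sketch below represents a sign by a handful of evenly spaced keyframes and classifies it by nearest-neighbour matching. It is a minimal toy, not the recognizer used in the thesis: the feature extraction, keyframe-selection strategy, and vocabulary are all invented for the example.

```python
import numpy as np

def select_keyframes(frames, n_keyframes=5):
    """Pick a small, evenly spaced subset of frames to stand in for the
    whole sign (cf. the frame-subset experiments of chapter 7)."""
    idx = np.linspace(0, len(frames) - 1, n_keyframes).round().astype(int)
    return [frames[i] for i in idx]

def sign_descriptor(frames, n_keyframes=5):
    """Concatenate the per-frame feature vectors of the selected keyframes
    into one fixed-length descriptor for the whole sign."""
    return np.concatenate(select_keyframes(frames, n_keyframes))

def classify(descriptor, templates):
    """Nearest-neighbour lookup against labelled template descriptors."""
    label, _ = min(templates, key=lambda t: np.linalg.norm(descriptor - t[1]))
    return label

# Toy usage: 3 signs, each a sequence of 30 frames of 4 features
# (e.g. hand x/y position, a handshape code, an orientation code).
rng = np.random.default_rng(0)
templates = [(name, sign_descriptor(rng.normal(size=(30, 4))))
             for name in ("MILK", "HOUSE", "BALL")]
query = templates[1][1] + rng.normal(scale=0.05, size=templates[1][1].shape)
print(classify(query, templates))  # -> "HOUSE"
```

Because the descriptor now rests on only a few frames, a tracking error in any one of them corrupts a large share of the representation, which is consistent with the sensitivity to frame correctness noted above.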
In conclusion, applying human knowledge in automatic sign language recognition is a complex task: conclusions about human sign recognition do not necessarily hold for the automatic recognizer as well. The most important obstacles to utilizing this information successfully seem to be: 1) data acquisition: computer vision is not as accomplished as human observers at capturing the complex, dynamic hand and face motions that form sign language. Information that is present in a sign movement for a human being may therefore not be (correctly) observed by an automatic vision analysis system. Thus, the data that humans work with is not necessarily identical to the data the recognizer works with, and this may cause techniques that are successful for human signers to fail in the automatic system. And 2) differences in basic system architecture: research into human sign recognition is still ongoing, and there is no clear model of human sign recognition yet. This makes it more difficult to translate observations from human sign recognition to the automatic recognizer: human signers may use techniques that are not compatible with the current architecture of the recognizer. For example, human signers may process sign aspects independently; if the recognition system processes all data as a single stream, such a technique cannot be implemented (the contrast is sketched below). A more thorough understanding of human sign recognition, more sophisticated computer vision techniques, and close co-operation between the fields of automatic sign language recognition and human sign perception seem the best way to overcome these obstacles.
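To make the architectural contrast concrete: in a single-stream recognizer, all aspects are fused into one feature vector before classification, so evidence cannot be weighed per aspect. A parallel design, hinted at by the human data, scores each aspect independently and combines the results afterwards. The sketch below uses invented likelihoods and a naive independence assumption; it is not the architecture of the recognizer used in this thesis.

```python
import numpy as np

# Hypothetical per-aspect likelihoods: each independent classifier scores,
# for every sign in a 3-sign vocabulary, how well its own aspect matches
# the observed data. All numbers are invented for illustration.
handshape   = np.array([0.70, 0.20, 0.10])
orientation = np.array([0.40, 0.50, 0.10])
movement    = np.array([0.60, 0.30, 0.10])

def combine_aspects(*aspect_scores):
    """Parallel architecture: sum per-aspect log-scores (independence
    assumption), so weak evidence from one aspect can be outweighed by
    strong evidence from another."""
    return np.sum(np.log(np.stack(aspect_scores)), axis=0)

combined = combine_aspects(handshape, orientation, movement)
print("recognized sign index:", int(np.argmax(combined)))  # -> 0
```

In a single-stream design the three vectors above would never exist separately, which is why an aspect-wise strategy observed in human signers cannot simply be bolted onto such a system.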
