J.R.T.E. van der Hout
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
Evaluating Image2Speech
The evaluation of automatically generated phoneme captions for images
Image2Speech is the relatively new task of generating a spoken description of an image. Like Automatic Image Captioning, it focuses on describing images; however, it avoids the use of textual resources. An Image2Speech system produces a sequence of phonemes instead of (written) words, which makes the task applicable to languages that do not have a standardized writing system. This thesis presents an investigation into the evaluation of the Image2Speech task. The Image2Speech output is evaluated by human evaluators as well as with multiple objective evaluation metrics. These metrics, such as BLEU, METEOR, and PER, are often used in the field of Natural Language Processing and can give an indication of the semantic similarity between two sentences of words. Since humans are the end users of Image2Speech systems, the objective evaluation metrics are correlated with human evaluation in order to determine which metric can best evaluate an Image2Speech system with the end users in mind. To this end, an Image2Speech system was first implemented that generates image captions consisting of phoneme sequences. This system outperformed the original Image2Speech system on the Flickr8k corpus, a dataset containing 8,000 images, each with five written and spoken captions. Subsequently, these phoneme captions were converted into sentences of words so that human evaluators could interpret them more easily. The captions were rated by human evaluators on how well they describe the image, and the ratings were correlated with the objective evaluation metrics. Although BLEU4 does not correlate perfectly with human ratings, it obtained the highest correlation among the investigated metrics and is the best currently existing metric for automatically evaluating the Image2Speech task. Current metrics are limited by the fact that they assume their input to be words. A more appropriate metric for the Image2Speech task should instead assume its input to be parts of words, e.g. phonemes.
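As an illustration of one of the metrics named in the abstract, the following is a minimal sketch of a phoneme error rate (PER) computation. It assumes the common definition of PER as the edit distance between a reference and a hypothesis phoneme sequence, divided by the reference length; the abstract does not state how the thesis computes it. The function names and the ARPAbet-style symbols are hypothetical and chosen only for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists or strings)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative phoneme sequences (assumed symbols, not data from the thesis).
reference = ["DH", "AH", "D", "AO", "G"]
hypothesis = ["DH", "AH", "D", "AA", "G"]
print(phoneme_error_rate(reference, hypothesis))
```

Because this operates directly on phoneme tokens, it does not require converting captions to words first, which is the property the abstract asks of a more appropriate metric.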
Sense Umbrella Connection and Desensitisation
Weather Witness
Bachelor thesis (2017)
Ronald van Driel, Justin van der Hout, Marissa van der Wel, Marco Houtman, Cynthia Liem
There is currently a lack of data about rainfall in cities. To collect such data, IBM started the Sense Umbrella Connection and Desensitisation project. For this project, an umbrella was equipped with a piezoelectric sensor and a Bluetooth device to record the rain that falls on its surface. One downside of this umbrella is that, besides rain, it also records other sounds, including recognisable human speech. Because IBM values privacy, one of the tasks was to make sure that no recording containing recognisable human speech would be uploaded to a server.
This bachelor project focuses on creating a mobile application that connects to the umbrella via Bluetooth and saves the audio recordings of rain together with GPS data. Originally, the saved data was to be analysed for the presence of rain, and all human speech had to be removed before the data was sent from the phone to the server. Over the course of the project it became clear that, because of the scarce sample data and the limited availability of audio processing libraries on Android, it would be difficult to process audio on Android devices. For this reason, the decision was made to upload all data, and the processing and analysis were moved to an external device supporting Java.
The raw data is now uploaded to a server, which creates a database entry that includes the GPS data and saves the audio file. In the end, an app was developed that is able to gather data from the umbrella and send it to an external server. Additionally, an implementation was made that analyses the gathered audio data to classify which parts may contain rain.
Due to time constraints, this project focused on developing an Android app and not on other operating systems.