{"id":575,"date":"2018-09-03T22:40:20","date_gmt":"2018-09-03T20:40:20","guid":{"rendered":"https:\/\/craftcoders.app\/?p=575"},"modified":"2024-08-14T14:27:54","modified_gmt":"2024-08-14T12:27:54","slug":"quickstart-get-ready-for-3rd-spoken-call-shared-task","status":"publish","type":"post","link":"https:\/\/craftcoders.app\/quickstart-get-ready-for-3rd-spoken-call-shared-task\/","title":{"rendered":"Quickstart: Get ready for 3rd Spoken CALL Shared Task"},"content":{"rendered":"\r\n

The Interspeech 2018 conference is taking place these days, and I’m invited as a speaker. As they write on their website<\/a>…<\/p>\r\n\r\n\r\n\r\n

Interspeech is the world\u2019s largest and most comprehensive conference on the science and technology of spoken language processing.<\/blockquote>\r\n\r\n\r\n\r\n

This coming Wednesday, the results and systems of the 2nd Spoken CALL Shared Task (ST2) will be presented and discussed in a special session of the conference. Chances are that these discussions will lead to a third edition of the shared task.<\/p>\r\n\r\n\r\n\r\n

\u00a0<\/div>\r\n\r\n\r\n\r\n

With this blog post, I want to address all newcomers and provide a short, comprehensible introduction<\/strong> to the most important challenges you will face if you want to participate in the Spoken CALL Shared Task<\/strong>. If you like “hard fun”, take a look at my research group’s tutorial paper<\/a>. There will be unexplained technical terms and many abbreviations combined with academic language in a condensed font for you \ud83d\ude42<\/p>\r\n\r\n\r\n\r\n

\u00a0<\/div>\r\n\r\n\r\n\r\n

What is this all about?<\/h2>\r\n\r\n\r\n\r\n

The Spoken CALL Shared Task aims to create an automated prompt-response system that helps German-speaking children learn English. The exercise for a student is to translate a request or sentence into spoken English. Ideally, the automated system accepts a correct response, or rejects a faulty one and offers relevant support or feedback. There are a number of prompts (given as text in German, preceded by a short animated clip in English), each asking the student to make a statement or ask a question regarding a particular item. A baseline system (def<\/a>)<\/em> is provided on the website of the project<\/a>. The final output of the system should be a judgment on the correctness of the utterance. A wide range of answers is allowed in response, which adds to the difficulty of giving automated feedback. Responses are incorrect because of incorrect vocabulary usage, incorrect grammar, bad pronunciation, or poor recording quality.<\/p>\r\n\r\n\r\n\r\n

\u00a0<\/div>\r\n\r\n\r\n\r\n

How to get started<\/h2>\r\n\r\n\r\n\r\n

A day may come when you have to dig into papers, when you have to understand how others built their systems, but it is not this day. As someone who is new to the field of natural language processing (NLP), you have to understand the basics of machine learning and scientific work first. Here is what we will cover in this post:<\/p>\r\n\r\n\r\n\r\n

    \r\n
  1. Machine learning concepts<\/li>\r\n
  2. Running the baseline system<\/li>\r\n
  3. Creating your own system<\/li>\r\n<\/ol>\r\n\r\n\r\n\r\n
    \u00a0<\/div>\r\n\r\n\r\n\r\n

    Machine learning concepts<\/h3>\r\n\r\n\r\n\r\n

    When my research group and I first started to work on the shared task, we read the papers and understood barely anything. So we collected all the technical terms we didn’t understand and created a dictionary<\/a> with short explanations. Furthermore, we learned about several concepts you should take a look at:<\/p>\r\n\r\n\r\n\r\n

    \u00a0<\/div>\r\n\r\n\r\n\r\n

    Training data usage<\/h4>\r\n\r\n\r\n\r\n

    For the 2nd edition, there was a corpus (= training data) containing 12,916 data points (in our case speech utterances) that we could use to create a system. A machine learning algorithm needs training data to extract features from it. These features can be used for classification, and the more varied your data is, the better the classification tends to be.<\/p>\r\n\r\n\r\n\r\n


    But you can’t use all that data for training. You have to set part of your data points aside so you can validate that your system can classify data it has never seen before. This held-out part is called the validation <\/strong>set<\/strong>, and the other part is called the training set<\/strong>. A rookie mistake (which we made) is to use the test set\u00a0<\/strong>as the validation set. The test set is a separate corpus, which you should use only at the very end of development to compare your system with others. For a more detailed explanation take a look at
    this blog post<\/a>.<\/p>\r\n\r\n\r\n\r\n
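To make the split concrete, here is a minimal sketch in Python. The function name, the 20% validation fraction, the seed and the placeholder utterance names are assumptions chosen for illustration. They are not part of the shared task or the baseline system.

```python
import random

def split_train_validation(utterances, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the data and hold out a fraction for validation."""
    shuffled = list(utterances)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    validation_set = shuffled[:n_validation]
    training_set = shuffled[n_validation:]
    return training_set, validation_set

# Hypothetical stand-ins for the 12,916 utterances of the corpus.
corpus = ['utterance_%d' % i for i in range(12916)]
training_set, validation_set = split_train_validation(corpus)
print(len(training_set), len(validation_set))
```

The test set never appears in this code on purpose: it stays untouched until the very end of development.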

    If you don’t have a separate validation set (as in our case), you can use cross-validation instead, which is explained here<\/a>. Furthermore, you should try to have an equal distribution between correct and incorrect utterances in your sets. If this is not the case, e.g. if you have 75% correct utterances and 25% incorrect utterances in your training set, your system will tend to accept everything during validation.<\/p>\r\n\r\n\r\n\r\n
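As a rough sketch of how cross-validation could look, the following Python snippet builds k folds by hand. The function name, the labels and the choice of five folds are hypothetical. Note that this stratified variant only keeps the correct/incorrect ratio the same in every fold. It does not balance the classes, so that would be a separate step (for example downsampling the larger class).

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each data point index to one of k folds, class by class,
    so that every fold keeps roughly the overall label distribution."""
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        # Deal the indices of this class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Hypothetical labels with the 75% / 25% imbalance mentioned above.
labels = ['correct'] * 750 + ['incorrect'] * 250
folds = stratified_folds(labels, k=5)

for held_out in range(len(folds)):
    validation_indices = folds[held_out]
    training_indices = [i for f, fold in enumerate(folds)
                        if f != held_out for i in fold]
    # Train on training_indices and evaluate on validation_indices here.
    n_incorrect = sum(labels[i] == 'incorrect' for i in validation_indices)
    print(held_out, len(validation_indices), n_incorrect)
```

Every data point is used for validation exactly once, and the results of the k runs are typically averaged.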

    \u00a0<\/div>\r\n\r\n\r\n\r\n

    Metrics<\/h4>\r\n\r\n\r\n\r\n

    Metrics are used to measure how well a system performs. They are based on the system’s results, which are generally displayed as a confusion matrix:<\/p>\r\n\r\n\r\n\r\n

    \r\n
    \"\"<\/figure>\r\n<\/div>\r\n\r\n\r\n\r\n

     <\/p>\r\n\r\n\r\n\r\n
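The four cells of such a confusion matrix can be counted with a few lines of Python. This is only a sketch: the function name, the cell names and the six example decisions are made up for illustration and are not taken from the shared task data.

```python
from collections import Counter

def confusion_counts(is_correct, is_accepted):
    """Count the four outcomes of an accept/reject system."""
    counts = Counter()
    for correct, accepted in zip(is_correct, is_accepted):
        if correct and accepted:
            counts['correct_accept'] += 1
        elif correct and not accepted:
            counts['false_reject'] += 1
        elif not correct and accepted:
            counts['false_accept'] += 1
        else:
            counts['correct_reject'] += 1
    return counts

# Hypothetical gold labels and system decisions for six utterances.
is_correct = [True, True, True, False, False, True]
is_accepted = [True, False, True, True, False, True]
counts = confusion_counts(is_correct, is_accepted)
print(dict(counts))
```

Here `is_correct` holds the human judgment for each utterance and `is_accepted` holds the decision of the system, so each pair falls into exactly one cell.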