Solving the Knapsack Problem with the Jenetics Library

According to its official documentation, Jenetics is a library for programming evolutionary algorithms in Java. It is implemented using the Java Stream interface, so it works smoothly with the rest of the Java Stream API. Evolutionary algorithms have their roots in biology, as they use mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. If you want to learn more about the theory behind evolutionary algorithms, I'd suggest reading Introduction to Evolutionary Algorithms first.

Disclaimer: This blog post is based on Introduction to Jenetics Library from Baeldung, but it uses the current library version (6.2.0) and a more complex example: the knapsack problem, without using the library's ready-made classes for that problem.

The knapsack problem

Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.

Wikipedia

Defining the problem in code

In the following example, we have a class called "Knapsack" that represents our problem. The class defines the items that could go into the knapsack, each consisting of a size and a value (possibleKnapsackItems). These items are initialized with random sizes and values between 0 and 9 and put into a list. Furthermore, the class defines the maximum size the knapsack can hold. Attention: don't mix up the size of the knapsack (Knapsack.getKnapsackSize) with the number of items we could put into it (Knapsack.getItemCount). Which items actually end up in the knapsack will be decided later by our evolutionary algorithm.

import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Requires comes from Jenetics' internal utilities; a plain manual check would work as well
import io.jenetics.internal.util.Requires;

public final class Knapsack {
    private final List<Item> possibleKnapsackItems; // items that *might* end up in the knapsack, depending on chromosome
    private final int knapsackSize;

    public Knapsack(List<Item> possibleItems, int knapsackSize) {
        this.possibleKnapsackItems = possibleItems;
        this.knapsackSize = knapsackSize;
    }

    public static Knapsack initializeWithRandomItems(int size, int knapsackSize) {
        Random random = new Random(123); // fixed seed, so results are reproducible
        List<Item> items = Stream.generate(() ->
                new Item((int) (random.nextDouble()*10), (int) (random.nextDouble()*10)))
                .limit(size)
                .collect(Collectors.toList());
        return new Knapsack(items, knapsackSize);
    }

    public Item getItemByIndex(int index) { return this.possibleKnapsackItems.get(index); }
    public int getItemCount() { return this.possibleKnapsackItems.size(); }
    public int getKnapsackSize() { return this.knapsackSize; }

    public static final class Item {
        private final int size;
        private final int value;

        public Item(final int size, final int value) {
            this.size = Requires.nonNegative(size);
            this.value = Requires.nonNegative(value);
        }

        public int getSize() { return size; }
        public int getValue() { return value; }
    }
}

Let’s get started with the Jenetics Library

In order to use Jenetics, we need to add the following dependency into our build.gradle:

implementation 'io.jenetics:jenetics:6.2.0'

Next we create a runnable class App that will use the Jenetics library and our Knapsack class to run a genetic algorithm. First, let's make use of our previously created class: we create a knapsack with a size of 100 and 80 items to pick from.

public class App {
    private final static int ITEM_COUNT = 80;
    private final static int KNAPSACK_SIZE = 100;
    private final static int POPULATION_SIZE = 500;

    private final Knapsack knapsack = Knapsack.initializeWithRandomItems(ITEM_COUNT, KNAPSACK_SIZE);

    public static void main(String[] args) {
        new App().run(POPULATION_SIZE);
    }

    public void run(int populationSize) {
        // TODO Run the genetic algorithm
    }
}

Let's work on the run() function. We need to convert the knapsack problem into a representation a genetic algorithm can work with: a chromosome. Indeed, we can transform it into a so-called binary problem, where each 1 in a bit string represents an item we put into the knapsack and each 0 an item we leave out.

Using the Jenetics library we can create a BitChromosome with a length of 80, which is equal to the number of items we can choose from (ITEM_COUNT), and a probability of 0.3 for each bit to be 1. These BitChromosomes are accessible via a factory, meaning we can generate as many randomly initialized chromosomes as we want our population size to be.

final Factory<Genotype<BitGene>> gtf =
        Genotype.of(BitChromosome.of(this.knapsack.getItemCount(), 0.3));

Now, let’s create the execution environment:

final Engine<BitGene, Integer> engine = Engine
        .builder(this::fitness, gtf)
        .populationSize(populationSize)
        .build();

The Engine will run our genetic algorithm and needs a couple of pieces of information:

  1. The factory we just created, which produces our random chromosomes
  2. The number of random chromosomes we want to create and compare (called populationSize)
  3. Last but not least, a fitness function, which we haven't defined yet

The Fitness Function

The fitness function calculates the fitness of each chromosome. In the case of the knapsack problem, the fitness is the sum of the values of the items we place in our knapsack (i.e. the items whose corresponding bit in the chromosome is 1). One thing to keep in mind: if the summed size of the chosen items exceeds the knapsack size, the solution is invalid and should get a fitness of 0, otherwise the engine will happily pack everything. How to put that into code is something you can think about now 😉

private Integer fitness(Genotype<BitGene> gt) {
    BitChromosome chromosome = gt.chromosome().as(BitChromosome.class);
    int fitness = 0;
    // TODO: Calculate fitness
    return fitness;
}
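If you tried it and want to check your idea against a reference, here is the logic sketched in Python (illustrative only, not Jenetics code; translating it into the Java fitness method above is still your exercise):

# Illustrative sketch: `bits` is the chromosome as 0/1 values,
# `items` mirrors Knapsack.possibleKnapsackItems
def fitness(bits, items, knapsack_size):
    chosen = [item for item, bit in zip(items, bits) if bit]
    total_size = sum(item.size for item in chosen)
    total_value = sum(item.value for item in chosen)
    # an overpacked knapsack is invalid and gets the worst possible fitness
    return total_value if total_size <= knapsack_size else 0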

A first run

In the final step, in our run function, we add some basic statistics, start the evolution and collect the results:

final EvolutionStatistics<Integer, ?> statistics = EvolutionStatistics.ofNumber();
final Phenotype<BitGene, Integer> best = engine.stream()
        // Truncate the evolution stream after 10 "steady"
        // generations.
        .limit(bySteadyFitness(10))
        // Update the evaluation statistics after
        // each generation
        .peek(statistics)
        // Collect (reduce) the evolution stream to
        // its best phenotype.
        .collect(toBestPhenotype());

System.out.println(statistics);
System.out.println(best);

If you put everything together and implemented the fitness function correctly, you should end up with a result looking like this:

+---------------------------------------------------------------------------+
 |  Time statistics                                                          |
 +---------------------------------------------------------------------------+
 |             Selection: sum=0,029213700000 s; mean=0,000811491667 s        |
 |              Altering: sum=0,120244900000 s; mean=0,003340136111 s        |
 |   Fitness calculation: sum=0,054355500000 s; mean=0,001509875000 s        |
 |     Overall execution: sum=0,199033900000 s; mean=0,005528719444 s        |
 +---------------------------------------------------------------------------+
 |  Evolution statistics                                                     |
 +---------------------------------------------------------------------------+
 |           Generations: 36                                                 |
 |               Altered: sum=133.010; mean=3694,722222222                   |
 |                Killed: sum=0; mean=0,000000000                            |
 |              Invalids: sum=0; mean=0,000000000                            |
 +---------------------------------------------------------------------------+
 |  Population statistics                                                    |
 +---------------------------------------------------------------------------+
 |                   Age: max=14; mean=2,183056; var=7,349621                |
 |               Fitness:                                                    |
 |                      min  = 0,000000000000                                |
 |                      max  = 188,000000000000                              |
 |                      mean = 134,464166666667                              |
 |                      var  = 4503,017550280571                             |
 |                      std  = 67,104527047589                               |
 +---------------------------------------------------------------------------+
 [11101010|00000100|11000101|10001000|10001111|10100000|01010010|10110000|11000101|10000101] -> 188

If so, congratulations! You made it.

Further Optimization

So up until now, we told the engine to use a population of 500 chromosomes and to stop after 10 steady generations, and we let it decide on its own how to do mutation, recombination, and selection. Of course, if you want to improve the quality of your best phenotype you can configure these things yourself. An easy thing to do is to let the evolution run longer, e.g. for 5000 generations, and your results will probably improve. But you can also tweak several things like mutation yourself:

final Engine<BitGene, Integer> engine = Engine
        .builder(this::fitness, gtf)
        .populationSize(populationSize)
        .survivorsSelector(new TournamentSelector<>(5))                    
        .offspringSelector(new RouletteWheelSelector<>())                   
        .alterers(
            new Mutator<>(0.115),
            new SinglePointCrossover<>(0.16))
        .build();

But gaining real improvements from your own configuration is pretty time-consuming and would need another blog post, so I'll leave that to you 😀

Greetings,

Domi

Quickstart: Get ready for 3rd Spoken CALL Shared Task

These days the Interspeech 2018 conference is taking place, where I'm invited as a speaker, and as they write on their website:

Interspeech is the world’s largest and most comprehensive conference on the science and technology of spoken language processing.

This coming Wednesday, the results and systems of the 2nd Spoken CALL Shared Task (ST2) will be presented and discussed in a special session of the conference. Chances are that these discussions will lead to a third edition of the shared task.

With this blog post, I want to address all newcomers and provide a short, comprehensible introduction to the most important challenges you will face if you want to participate in the Spoken CALL shared task. If you like "hard fun", take a look at my research group's tutorial paper. There you'll find unexplained technical terms and many abbreviations combined with academic language in a condensed font 🙂

What is this all about?

The Spoken CALL Shared Task aims to create an automated prompt-response system for German-speaking children learning English. The exercise for a student is to translate a request or sentence into spoken English. The automated system should accept a correct response, or reject a faulty one and offer relevant support or feedback. There are a number of prompts (given as text in German, preceded by a short animated clip in English) asking the student to make a statement or a question about a particular item. A baseline system is provided on the project's website. The final output of the system is a judgment call as to the correctness of the utterance. A wide range of answers has to be allowed in response, which adds to the difficulty of giving automated feedback. Incorrect responses are due to incorrect vocabulary, incorrect grammar, or bad pronunciation and recording quality.

How to get started

A day may come when you have to dig into papers to understand how others built their systems, but it is not this day. As someone who is new to the field of natural language processing (NLP), you first have to understand the basics of machine learning and scientific work. Here are the things we will cover in this post:

  1. Machine learning concepts
  2. Running the baseline system
  3. Creating your own system

Machine learning concepts

When my research group and I first started to work on the shared task, we read the papers and understood barely anything. So we collected all the technical terms we didn't understand and created a dictionary with short explanations. Furthermore, we learned about different concepts you should take a look at:

Training data usage

For the 2nd edition, there was a corpus (= training data) containing 12,916 data points (in our case speech utterances) that we could use to create a system. A machine learning algorithm needs training data to extract features from it. These features can be used for classification, and the more varied data you have, the better the classification will be.


But you can't use all that data for training. You have to keep a part of your data points aside so you can validate that your system can classify data it has never seen before. This held-out part is called the validation set, the other part the training set. A rookie mistake (which we made) is to use the test set as the validation set. The test set is a separate corpus which you should use only at the very end of development, to compare your system with others. For a more detailed explanation take a look at this blog post.

If you don't have a separate validation set (like in our case) you can use cross-validation instead, which is explained here. Furthermore, you should aim for an equal distribution of correct and incorrect utterances in your sets. If this is not the case, e.g. if your training set contains 75% correct and only 25% incorrect utterances, your system will tend to accept everything during validation. A stratified split takes care of this, as sketched below.
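Here is a minimal sketch using scikit-learn (not part of the official baseline; the dummy data just stands in for your utterance/label pairs):

from sklearn.model_selection import train_test_split

# dummy data standing in for your (utterance, label) pairs
utterances = ["can i pay by credit card", "i want a coffee",
              "i would like the steak please", "me wants coffee"] * 25
labels = ["correct", "correct", "correct", "incorrect"] * 25

X_train, X_val, y_train, y_val = train_test_split(
    utterances, labels,
    test_size=0.2,      # keep 20% of the data aside for validation
    stratify=labels,    # preserve the correct/incorrect ratio in both sets
    random_state=42)    # fixed seed, so the split is reproducible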

Metrics

Metrics are used to measure how well a system performs. They are based on the system's results, which are generally displayed as a confusion matrix:

                     classified correct   classified incorrect
utterance correct            TP                    FN
utterance faulty             FP                    TN

  • TP: True positive (a correct utterance has been classified as correct)
  • FP: False positive (a faulty utterance has been classified as correct)
  • TN: True negative (a faulty utterance has been classified as incorrect)
  • FN: False negative (a correct utterance has been classified as incorrect)

Based on the confusion matrix there are four frequently used metrics: accuracy, precision, recall and F1. When to use which is explained thoroughly here, and the sketch below shows how they derive from the matrix counts. For the shared task, there's an additional metric called D-score. It evaluates the system's performance with a bias that penalizes different kinds of classification mistakes differently. More details about the D-score can be found in our tutorial paper.
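To make the four basic metrics concrete (the counts here are made up for illustration; for the D-score weighting please refer to the tutorial paper):

TP, FP, TN, FN = 420, 35, 180, 65  # made-up counts for illustration

accuracy = (TP + TN) / (TP + FP + TN + FN)   # share of all decisions that were right
precision = TP / (TP + FP)                   # of all accepted utterances, how many deserved it
recall = TP / (TP + FN)                      # of all correct utterances, how many were accepted
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.3f} prec={precision:.3f} rec={recall:.3f} f1={f1:.3f}")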

Running the baseline system

If you open the data download page you can see an important distinction: on the one hand you can download the speech processing system (also called the ASR or Kaldi system), and on the other hand the text processing system. Basically, you have two independent baseline systems you can work on.

For the ASR system to work you have to install several applications. This is one of the pain points, so be careful here! Kaldi is the speech processing framework our baseline system is built on. The things you need for Kaldi are Python, TensorFlow, CUDA, and cuDNN. The latter two are for Nvidia graphics cards; cuDNN depends on CUDA, so make sure the versions you install match. Furthermore, Kaldi and TensorFlow must be able to use the installed Nvidia software versions. To find out if everything went well you can try Kaldi's yes/no example as described in:

kaldi/egs/yesno/README.md

The text processing system can be run using just Python and is pretty minimal 😉 At least it was during ST2. You can either check if there's a new official baseline system for text processing, or you can use one of our CSU-K systems as a basis:

https://github.com/Snow-White-Group/CSU-K-Toolkit

Creating your own system

To create your own system you first have to decide whether you want to start with text or speech processing. If you are a complete beginner in the field, I would advise you to start with text processing because it is easier. If you want to start with speech processing, take a look at Kaldi for dummies, which will teach you the basics.

The Kaldi system takes the training data audio files as input and produces text output which looks like this:

user-008_2014-05-11_17-57-14_utt_015 CAN I PAY BY CREDIT CARD 
user-023_2014-11-03_09-47-09_utt_009 I WANT A COFFEE 
user-023_2014-11-03_09-47-09_utt_010 I WOULD LIKE A COFFEE 
user-023_2014-11-03_09-47-09_utt_011 I WANT THE STEAK 

The ASR output can be used as input for the text processing system, which takes the given sentence and produces a classification (language: correct/incorrect, meaning: correct/incorrect) as output.
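Gluing the two systems together then boils down to parsing lines like the ones above. A minimal sketch (the file name is made up):

def read_asr_output(path):
    """Yield (utterance_id, recognized_text) pairs from a Kaldi output file."""
    with open(path) as f:
        for line in f:
            utt_id, _, text = line.strip().partition(" ")
            if text:
                yield utt_id, text

for utt_id, text in read_asr_output("asr_output.txt"):  # hypothetical file name
    print(utt_id, "->", text)  # feed `text` into your text processing system here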

Now you should be at a point where you understand most of the things described in the papers, except for the concrete architectures and algorithms used. Read through them to collect ideas and dig deeper into the things that seem interesting to you 🙂 Furthermore, here are some keywords you should have heard of:

  • POS Tagging, dependency parsing
  • NER Tagging
  • Word2Vec, Doc2Vec
  • speech recognition
  • information retrieval

I hope you are motivated to participate and if so, see you at the next conference 🙂

Greets,

Domi

Rasa Core & NLU: Conversational AI for dummies

AI is a sought-after topic, but most developers face two hurdles that prevent them from programming anything with it:

  1. It is a complex field in which a lot of experience is needed to achieve good results
  2. Although there are good network topologies and models for a problem, there is often a lack of training data (corpora) without which most neural networks cannot achieve good results

Especially in the up-and-coming natural language processing (NLP) sector, there is a lack of data in many areas. With this blog post we are going to discuss a simple yet powerful solution to address this problem in the context of conversational AI.

Leon presented a simple solution on our blog a few weeks ago: with AI as a Service, reliable language processing systems can be developed in a short time without having to hassle around with datasets and neural networks. However, this type of technology has one significant drawback: dependence on the operator of the service. On the one hand the service may cost money; on the other hand your own, possibly sensitive, data has to be passed on to the service operator. Especially for companies this is usually a showstopper. That's where Rasa enters the stage.

The Rasa Stack

Rasa is an open source (see GitHub) conversational AI that is fully free for everyone and can be used in-house. There is no dependence on a service from Rasa or any other company. It consists of a two-part stack whose individual parts seem to perform similar tasks at first glance, but on closer inspection each solves its own problem. Rasa NLU is the language understanding AI we are going to dig into soon. It is used to understand what the user is trying to say and which additional information he provides. Rasa Core is the context-aware AI for conversational flow, which is used to build dialog systems, e.g. chatbots like this one. It uses the information from Rasa NLU to find out what the user wants and what other information is needed to achieve it. For example, for a weather report you need both the date and the place.

Digging deeper into Rasa NLU

The following paragraphs deal with the development of language understanding. Its basics are already extensively documented, which is why I will keep that part brief and instead present the optimization possibilities in more depth. If you have never coded anything with Rasa, it makes sense to work through the restaurant example (see also the GitHub code template) first to get a basic understanding of the framework.

The processing pipeline is the core element of Rasa NLU, and the decisions you make there have a huge influence on the system's quality. In the restaurant example the pipeline is already given: the two NLU frameworks spaCy and scikit-learn are used for text processing. Good results can be achieved with very little domain-specific training data (10 to 20 formulations per intent), which you can collect easily using the Rasa Trainer. The amount can be so small because transfer learning combines your own training data with spaCy's high-quality pre-trained models to create a neural net. Besides spaCy, there are other ways to process your data, which we will discover now!
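For reference, training and using such a pipeline takes only a few lines with the Rasa NLU Python API (a sketch based on the 0.x API that was current at the time of writing; the file paths come from the restaurant example and may differ in your setup):

from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config

training_data = load_data("data/examples/rasa/demo-rasa.json")  # your labeled examples
trainer = Trainer(config.load("config_spacy.yml"))              # the pipeline definition
interpreter = trainer.train(training_data)

print(interpreter.parse("I am looking for Chinese food"))  # returns intent + entities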

Unlock the full potential

Instead of spaCy you can also use MIT Information Extraction (MITIE). MITIE can likewise be used for intent recognition and named entity recognition (NER). Both backends perform the same tasks and are therefore interchangeable. The difference lies in the algorithms and models they use. You are thus not bound to spaCy or MITIE alone; you can also use scikit-learn for intent classification.

Which backend works best for your project is individual and should be tested. As you will see in the next paragraph, the pipeline offers some precious showpieces that work particularly well. The cross-validation that is already included should be used to evaluate the quality of the system.

The processing pipeline

You should understand how the pipeline works to develop a good system for your special problem.

  1. The tokenizer is used to transform input words, sentences or paragraphs into single word tokens; unnecessary punctuation and stop words can be removed in the process.
  2. The featurizer is used to create input vectors from the tokens, which serve as features for the neural net. The simplest form of an input vector is one-hot (see the sketch after this list).
  3. The intent classifier is the part of the neural net responsible for decision making. It decides which intent the user most likely meant. This is called multiclass classification.
  4. Finally, named entity recognition can be used to extract information like e-mail addresses from a text. In the context of Rasa (and dialogue systems) this is called entity extraction.
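To make the featurizer step a bit more tangible, here is what a naive one-hot encoding of a token looks like (illustrative only; Rasa's actual featurizers are more sophisticated):

vocabulary = ["i", "am", "looking", "for", "chinese", "food"]

def one_hot(token):
    # a vector of zeros with a single 1 at the token's vocabulary index
    vec = [0] * len(vocabulary)
    if token in vocabulary:
        vec[vocabulary.index(token)] = 1
    return vec

print(one_hot("chinese"))  # [0, 0, 0, 0, 1, 0]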

In the following example (from Rasa) you can see how the single parts work together to provide information about intent and entity:

{
    "text": "I am looking for Chinese food",
    "entities": [
        {"start": 8, "end": 15, "value": "chinese", "entity": "cuisine", "extractor": "ner_crf", "confidence": 0.864}
    ],
    "intent": {"confidence": 0.6485910906220309, "name": "restaurant_search"},
    "intent_ranking": [
        {"confidence": 0.6485910906220309, "name": "restaurant_search"},
        {"confidence": 0.1416153159565678, "name": "affirm"}
    ]
}

As mentioned by Rasa itself, intent_classifier_tensorflow_embedding can be used for intent classification. It is based on the StarSpace: Embed All The Things! paper published by Facebook Research, which presents a completely new approach to meaning similarity that generates awesome results!

For named entity recognition you have to make a decision: either you use common pre-trained entities, or you use custom entities like "type_of_coffee". Pre-trained entities can be one of the following:

  • ner_spaCy: Places, Dates, People, Organisations
  • ner_duckling: Dates, Amounts of Money, Durations, Distances, Ordinals

Those two algorithms perform very well at recognizing the given types, but if you need custom entities they perform rather badly. Instead you should use ner_mitie or ner_crf and collect some more training data than usual. If your entities have a specific structure that is parsable by a regex, make sure to integrate intent_entity_featurizer_regex into your pipeline! In this GitHub Gist I provide a short script that helps you create training samples for a custom entity: you pass in some sentences for an intent, combine them with sample values of your custom entity, and it creates training samples for each of your sample values.
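The idea behind the script, as a simplified sketch: combine template sentences with sample entity values, so that a handful of templates yields many training samples in Rasa's format (the entity name and values below are just examples):

templates = ["I would like to order a {coffee}", "One {coffee} please"]
coffee_types = ["flat white", "cappuccino", "cold brew"]

samples = []
for template in templates:
    for value in coffee_types:
        text = template.format(coffee=value)
        start = text.index(value)
        # Rasa training format: the text plus the entity's character span
        samples.append({
            "text": text,
            "entities": [{"start": start, "end": start + len(value),
                          "value": value, "entity": "type_of_coffee"}],
        })

print(samples[0])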

That’s it 🙂 If you have any questions about Rasa or this blogpost don’t hesitate to contact me! Have a nice week and stay tuned for our next post.

Greets,
Domi

]]>
AI as a Service – a Field Report

In this blog post I describe my experiences with AI services from Microsoft and how we (team Billige Plätze) were able to create hackathon-award-winning prototypes with them in roughly 24 hours. The post is structured according to the hackathons we participated in and used AI services at. As of now (11.06.2018) there are two of them, but I'm sure there are a lot more to come in the future. The first one was the BlackForest Hackathon, which took place in autumn 2017 in Offenburg, and the second one was the Zeiss hackathon in Munich, which took place in January 2018.
This post is not intended to be a guide on integrating said services (Microsoft has nice documentation for all their products 🙂 ), but rather a field report on how these services can be used to realize cool use cases.

Artificial Intelligence as a Service (AIaaS) is the third-party offering of artificial intelligence, accessible via an API. So people get to take advantage of AI without spending too much money, or, if you're lucky and a student, no money at all.

The Microsoft AI Platform

As already mentioned above, we used several Microsoft services to build our prototypes, including some Microsoft Cognitive Services and the Azure Bot Service. All of them are part of the Microsoft AI platform, where you can find services, infrastructure and tools for machine learning and artificial intelligence that you can use for your personal or business projects.

we used only some of the services of the AI platform

BlackForest Hackathon

The BlackForest Hackathon was the first hackathon ever for our team, and we were quite excited to participate.

The theme proposed by the organizers of the hackathon was "your personal digital assistant", and after some time brainstorming we came up with the idea of an intelligent bot which assists you with creating your learning schedule. Most of us are serious procrastinators (including me :P), so we thought that such a bot could help us stick to our learning schedule and motivate us along the way to the exam.
The features we wanted to implement for our prototype were

  • asking the user about his habits (usual breakfast, lunch and dinner time as well as sleep schedule),
  • asking the user about his due exams (lecture, date and the amount of time the user wants to learn for the exam) and
  • automatic creation of learning appointments, based on the user input, within the users Google calendar.

With the topic of the hackathon in mind we wanted the bot to gather the user input via a dialog as natural as possible.

Billige Plätze in Action

The technology stack

Cem and I had experimented a little with the Azure Bot Service before the hackathon, and we thought it a perfect match for the task of writing the bot. We also wanted the bot to process natural language and stumbled upon LUIS, a machine-learning-based service for natural language processing, which can be integrated seamlessly into the bot framework (because it is from Microsoft, too).
Our stack consisted, as expected, mainly of Microsoft technologies. We used

  • C#,
  • .Net core,
  • Visual Studio,
  • Azure Bot Service,
  • LUIS and
  • Azure.

The combination of C#, the bot service and LUIS provided the core functionality of our bot and we were able to deploy it to Azure within one click.

The Azure Bot Service

The Bot Service provides an integrated environment that is purpose-built for bot development, enabling you to build, connect, test, deploy, and manage intelligent bots, all from one place. Bot Service leverages the Bot Builder SDK with support for .NET and Node.js.

Overview of the Bot Service

The bot service consists of the concepts of

  • channels, which connect the bot service with a messaging platform of your choice,
  • the bot connector, which connects your actual bot code with one or more channels and handles the message exchange between the channels and the bot via
  • activity objects.

Dialogs, another core concept, help organize the logic in your bot and manage conversation flow. Dialogs are arranged in a stack, and the top dialog in the stack processes all incoming messages until it is closed or a different dialog is invoked.

General Conversation flow of the Bot Service

By using the bot service we were able to focus on programming the actual conversation with the Bot Builder SDK. To use the bot service you just have to create a new bot in Azure and connect it to the channels of your choice (Telegram worked like a charm), also via the web app.
After creating your bot in Azure you can start coding right away using a template provided by Visual Studio; you just have to type your bot credentials into the configuration file and you're good to go. Because we didn't have to worry about where to host the bot and how to set it up, we were able to quickly create a series of dialogs (which involved serious copy-pasting :P), test our conversation flow right away using the Bot Framework Emulator, and, when we were happy with the results, publish the bot to Azure within one click in Visual Studio.
We didn't have to worry about getting the user input from the messaging platform, and integrating natural language understanding into our bot was very easy, because we used Microsoft LUIS. We were seriously impressed by the simplicity of the bot service.

Microsoft LUIS

LUIS is a machine-learning-based service to build natural language into apps, bots, and IoT devices.
You can create your own LUIS app on the LUIS website. A LUIS app is basically a language model designed by you, specific to your domain. Your model is composed of utterances, intents and entities: utterances are example phrases users could type into your bot, intents are the user's intentions you want to extract from an utterance, and entities represent relevant information in the utterance, much like variables in a programming language.
An example utterance from our bot could be "I write my exam on the 23rd of August 2018". Based on this utterance, our model is able to extract the intent "ExamDateCreation" as well as the entity "Date" (23.08.2018); for a more detailed explanation, visit the LUIS documentation. Once you have defined all the intents and entities needed for your domain in the LUIS web application and provided enough sample utterances, you can test and publish your LUIS app. After publishing you can access your app via a REST API, or in our case through the Azure bot service.
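The raw answer you get back from the LUIS REST API looks roughly like this (schematic and abbreviated from memory of the v2 API; check the LUIS documentation for the exact field names):

{
  "query": "I write my exam on the 23rd of August 2018",
  "topScoringIntent": { "intent": "ExamDateCreation", "score": 0.97 },
  "entities": [
    { "entity": "23rd of august 2018", "type": "Date",
      "startIndex": 23, "endIndex": 41, "score": 0.95 }
  ]
}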

the LUIS web application

LUIS is tightly integrated into the bot service: to integrate our model, all we had to do was add an annotation to a class and extend the prefabricated LuisDialog to get access to our intents and entities.

[Serializable]
[LuisModel("your-application-id", "your-api-key")]
public class ExamSubjectLuisDialog : LuisDialog<object>
{
    [LuisIntent("Klausurfach")]
    private async Task ExamPowerInput(IDialogContext context, IAwaitable<object> result,
        Microsoft.Bot.Builder.Luis.Models.LuisResult luisResult)
    {
        ...
    }
}

There's nothing else to do to integrate LUIS into your bot. The bot connector service handles the message exchange between your bot code and the LUIS REST API and converts the JSON into C# objects you can use directly. Fun fact: the integration of the Google calendar into our bot took us several hours, a lot of nerves and around 300 lines of code, whereas the integration of our LUIS model took around 5 minutes plus a few lines of code for every LuisDialog we created.

Summary

By using the Azure bot service in combination with a custom LUIS model, we were able to create, in roughly 24 hours, a functional prototype of a conversational bot that assists you in creating your learning schedule by adding appointments to your Google calendar, all while being able to understand and process natural language. With the power of the bot service, the bot is available on a number of channels, including Telegram, Slack, Facebook Messenger and Cortana.

It was a real pleasure to use these technologies because they work together seamlessly. Ready-to-use dialogs for LUIS sped up the development process enormously, as did being able to deploy the bot to Azure with one click out of Visual Studio. Because the bot is no longer in operation, I included a little demonstration video of it below.

demo part 2

Zeiss Hackathon Munich

The second hackathon we participated in took place in January 2018. It was organized by Zeiss and sponsored by Microsoft.

The theme of this hackathon was "VISIONary ideas wanted", and most of the teams did something with VR/AR. One of the special guests of the hackathon was a teenager with a congenital defect that left him with only about 10% of his eyesight. The question asked was "can AI improve his life?", so we sat down with him and asked him about his daily struggles of being almost blind. One problem he faces regularly, as do other blind people according to our internet research, is the withdrawal of cash from an ATM.
So after some time brainstorming we came up with the idea of a simple Android app which guides you through the withdrawal process via auditory instructions. Almost everyone has a smartphone, and smartphones' disability support is actually pretty good, so a simple app is a good option for a small, accessible life improvement.

We figured out 3 essential things we needed to develop our app:
1. Image classification, to distinguish between different types of ATM (for our prototype we focused on differentiating ATMs with a touch screen from ATMs with buttons on the sides),
2. Optical character recognition (OCR), to read the text on the screen of the ATM, detect the stage of the withdrawal process the user is in, and generate auditory commands via
3. Text to speech, which comes out of the box in Android.

The Technology Stack

We wanted to develop an Android app, so we used Android in combination with Android Studio. We chose to develop an Android app simply because some of us were familiar with it and I always wanted to do something with Android.

For both the image classification and the OCR we again relied on Microsoft Services.
The Microsoft Vision API provides a REST endpoint for OCR, so we got ourselves an API key and were ready to go.

For the image classification, Microsoft provides the Custom Vision service, where you can train your own model.

Plugging the parts together

The flow of our app is quite simple: it takes a picture every 5 seconds, converts it into a byte array and first sends it to our custom image classifier to detect the ATM type. After the successful classification of the ATM, all further images are sent to the Vision API for optical character recognition. We get a JSON response with all the text the Vision API was able to extract from the image. The app then matches keywords (we focused on the withdrawal flow of Sparkasse ATMs) against the extracted text to detect the current stage of the withdrawal process, and then creates auditory commands via text to speech. This time we didn't even have to rely on frameworks by Microsoft like in the first hackathon; all we needed was to call the REST APIs of the services and process the responses.
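The stage detection itself is plain keyword matching. Sketched in Python for brevity (the actual app was written for Android, and the keywords here are invented):

# invented keywords identifying each stage of the withdrawal flow
STAGES = {
    "insert_card": ["karte einführen"],
    "enter_pin": ["geheimzahl", "pin eingeben"],
    "choose_amount": ["betrag", "auszahlung"],
    "take_card": ["karte entnehmen"],
}

def detect_stage(ocr_text):
    # return the first stage whose keywords appear in the OCR result
    text = ocr_text.lower()
    for stage, keywords in STAGES.items():
        if any(keyword in text for keyword in keywords):
            return stage
    return None

print(detect_stage("Bitte Geheimzahl eingeben"))  # -> "enter_pin"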

Summary

Much like at the first hackathon, we were really impressed by how well the services work out of the box. As you can imagine, the images of the ATM were oftentimes pretty shaky, but the OCR worked pretty well nevertheless. And because we only matched important but distinct keywords for every stage of the withdrawal process, the app could handle wrongly extracted sentences to a good degree. Our custom image classifier was able to differentiate between ATMs with touch screens and ATMs with buttons pretty reliably with only 18(!) sample pictures.

our custom ATM type classifier

After the 24 hours of coding (minus 2 hours of sleep :/ ), we had a functioning prototype and were able to pitch our idea to the jury with a "living ATM" played by Danny :).

The jury was impressed by the prototype and our idea of an "analog" pitch (we had no PowerPoint at all) and awarded us the Microsoft prize. We all got a brand new Xbox One X, a book by Satya Nadella and an IoT kit, which was pretty cool (sorry for bragging :P).

Billige Plätze and their new Xboxes

Takeaways

I think there are 3 main points you can take away from this blog post and my experiences with AI as a service.

  1. You can create working prototypes in less than 24 hours
  2. You don't have to start from scratch
  3. It is no shame to use ready-made models. When you need more, there is always time!

Before using these services I thought AI to be very time-consuming and difficult to get right, but we were able to create working prototypes in less than 24 hours! The models provided by Microsoft are very good and you can integrate them seamlessly into your application, be it a conversational bot or an Android app.
All projects are available in our GitHub organization, but beware: the code is very dirty, as usual for hackathons!

I hope I was able to inspire you to create your own application using AI services, and take away some of your fears of the whole AI and Machine Learning thing.

Have fun writing your own application and see you in our next blog post!

Analyzing a Telegram Groupchat with Machine Learning using an unsupervised NLP approach

Hey guys! When it came to finding a topic for my first blog post, I first thought of another topic… but then I remembered my student research project and thought it was actually fun! The student research project was about NLP (natural language processing); maybe I'll tell you more about it in another blog post.
Almost as interesting as NLP is our Telegram group! Since last week we have our own stickers (based on photos of us). Well, this escalated quickly 😛 This kind of communication is interesting, and I wonder if my basic knowledge of NLP is enough to get some interesting insights from our group conversations…
That's exactly what we're about to find out! In this blog post, we will extract a Telegram chat, create a domain-specific Word2Vec model and analyze it! You can follow along step by step!
But before we start, we should think about how to reach our goal: finding interesting insights in a chat group without any mathematical shenanigans. For that we have to do three things:

  • convert a Telegram chat into a format suitable for the computer,
  • train our own Word2Vec model based on the dataset obtained in the first step, and
  • use the Word2Vec model to find insights.

Exporting A Telegram Chat

That's the easy part! First, we need a Unix/Linux system, because we will use the Telegram messenger CLI to extract the chat. I personally use a Windows 10 installation, so I installed an openSUSE WSL (Windows Subsystem for Linux) from the store. Once openSUSE is installed, we start it and clone the repo:

git clone --recursive https://github.com/vysheng/tg.git && cd tg

As you can see from the README of the project, we need additional libraries to build the project. That’s why we simply install all dependencies:

sudo zypper in lua-devel libconfig-devel readline-devel libevent-devel libjansson-devel python-devel libopenssl-devel

Then we can build the project:

./configure
make

Good job! We have already come quite a bit closer to our goal! Now we still have to install telegram-history-dump. This is best done in another window, so that the Telegram messenger CLI can keep running! Once you have installed telegram-history-dump, you start the Telegram messenger CLI like this:

telegram-cli --json -P 9009

Now we can execute the telegram-history-dump to get a JSON from our chats (ensure the Telegram messenger CLI is running):

ruby telegram-history-dump.rb

Very good. Our Telegram chats are now saved as .jsonl. Personally, I prefer to work with .csv files, so I adapted a small script which converts the .jsonl into a .csv file. You can download the script from our GitHub repo and use it as follows:

python3 jsonl2csv.py <your>.jsonl chat.csv

Finished! Now we have gained a small dataset from Telegram. Time for the next step!

Train an own Word2Vec model

Word2Vec is an unsupervised machine learning algorithm developed by a team of researchers led by Tomas Mikolov at Google, which takes a word as input and returns a vector representation of this word as output. Unlike a simple word-count vector, a Word2Vec vector captures semantic characteristics of the word. The special thing about Word2Vec is that it is able to learn this representation completely unsupervised; all the algorithm needs is a corpus, i.e. the whole body of text it is trained on. Word2Vec then tries to learn the meaning of a word based on the words that appear near it. You can also imagine a Word2Vec vector as a weighting of dimensions of different meanings.
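To see what "semantic characteristics" means in practice: with a model trained on enough text, vector arithmetic captures analogies. Our tiny chat model won't manage this, but with gensim and a large pre-trained model (the file name below is an assumption) it looks like this:

from gensim.models import KeyedVectors

# hypothetical path to a large pre-trained model, e.g. the Google News vectors
vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman ≈ queen: semantics encoded as vector arithmetic
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))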

However, I will not go into the exact workings of Word2Vec now, because that could fill a blog post of its own! I think I will write one soon, so subscribe to our blog! Today we will only look at how to create our own Word2Vec model with the help of gensim and then investigate it a little. Along the way we will also apply typical data preprocessing steps of NLP tasks. I will try to explain them as briefly as possible, but I must ask you to read up on the details somewhere else, otherwise this blog post would be far too long. I prepared this blog post with Kaggle, and I recommend using Kaggle if you want to try it yourself. Each code snippet corresponds to a cell in Kaggle.

So let's start with the annoying part. First, we need to import a number of Python libraries, which include all the functions we need for our task:

import pandas as pd
import csv
from nltk.tokenize.casual import TweetTokenizer
import nltk
from nltk.corpus import stopwords
from gensim.models import Word2Vec
from gensim.models import KeyedVectors
from sklearn.manifold import TSNE  # used later for dimensionality reduction
import matplotlib.pyplot as plt    # used later for plotting

Good work… again :). Now we are finally ready and can get to the fun part. First, we have to read our previously created dataset. The perfect data type for this is the pandas dataframe, which is considered the standard for data science tasks in Python. The great thing: we can create a dataframe directly from a CSV file. Additionally, we need a list, corpus, in which we will store the token list of each message. Simplified, you can think of every word being assigned a token; for example, the word "hello" is mapped to the hello-token.

df = pd.read_csv('../input/chat.csv',  delimiter=' ',  quotechar='|', quoting=csv.QUOTE_MINIMAL)
corpus = []  # the training loop below appends the token lists to this corpus

Now we have to break down each sentence, or more specifically each Telegram message, into tokens and add them to our corpus. Therefore we iterate over our dataframe and get the value in 'text'; we can ignore the other columns for this example. We also filter out all messages labeled with 'no text', which are mostly GIFs we've sent.

One problem we have to deal with is the fact that we have a very, very small dataset, therefore we try to keep the variety of possible tokens as small as possible. First of all we remove all stopwords. In the context of NLP, stopwords are words that do not provide a semantic contribution to a statement, for example articles: they are of great importance for grammar, but carry no meaning. Since Word2Vec maps the semantic meaning of a word into a vector, we can remove them. We also stem the remaining words to reduce the token variety further. To generate the tokens we use a ready-made tokenizer from the NLTK package, which is specially designed to create tokens from tweets. Since tweets are probably very similar to chat messages, we use it for our chat messages too.

deu_stops = stopwords.words('german')  # run nltk.download('stopwords') once if these are missing
sno = nltk.stem.SnowballStemmer('german')
for index, row in df.iterrows():
    text = row['text']
    if not text == 'no text':
        tokens = TweetTokenizer(preserve_case=True).tokenize(text)
        tokens_without_stopwords = [sno.stem(x) for x in tokens if x not in deu_stops]
        corpus.append(tokens_without_stopwords)

I'm afraid I must show you some magic now. The following lines define some parameters which are needed for the training of our Word2Vec model. Since I haven't explained the inner workings of Word2Vec in detail in this blog post, it doesn't make sense to explain these parameters now; see them as God-given 🙂 (the comments give you a rough idea).

num_features = 300   # dimensionality of the word vectors
min_word_count = 3   # ignore words that occur less often than this
num_workers = 2      # number of training threads
window_size = 1      # context window around the target word
subsampling = 1e-3   # downsampling of very frequent words

Ready! Okay, maybe not yet, but now we can generate our Word2Vec model with a single line of code:

model = Word2Vec(
    corpus,
    workers=num_workers,
    size=num_features,
    min_count=min_word_count,
    window=window_size,
    sample=subsampling)

Done! We have created our own Word2Vec model. Nice! So better save it quickly:

model.init_sims(replace=True)
model_name = "billigeplaetze_specific_word2vec_model"
model.save(model_name)

Finding insights

So finally it is time… we can look for interesting facts. The first thing we want to find out: how broad is our vocabulary? Allegedly, vocabulary correlates with education, so how educated are the Billige Plätze? I suspect nothing good :O.
Attentive readers will ask themselves why we first created a Word2Vec model to determine the vocabulary. Yes, we could have counted the unique tokens earlier, but what interests us more is how many words our Word2Vec model contains. This will be less than the number of unique tokens, as many messages are too short to be used for training, so words can get lost.

len(model.wv.vocab)
>>> 3517

Fascinating! The Word2Vec model we created from the Billige Plätze group chat contains only 3517 words. Attention: emojis count towards this number as well!

Now we are also able to answer the important questions in life! Who's more of a dude, Leon or me? What do you think? Ah, no matter: we can ask our Word2Vec model! (Remember: a smaller distance means the two words are semantically closer.)

print(model.wv.distance('leon','dude'))
print(model.wv.distance('cem','dude'))
>>> 0.00175533877401
>>> 0.000874914626021

I always knew 🙂 Nice. But we can do more. In the Billige Plätze group we often use the word 'penis', but which other words is it associated with? Let's ask our Word2Vec model!

model.wv.most_similar('penis')
>>> [('gleich', 0.996833324432373),
 ('Marc', 0.9967494010925293),
 ('abend', 0.9967489242553711),
 ('hast', 0.9967312812805176),
 ('gemacht', 0.9967218637466431),
 ('gerade', 0.996717631816864),
 ('kannst', 0.9967092275619507),
 ('Ja', 0.9967082142829895),
 (':P', 0.9967063069343567),
 ('?', 0.9967035055160522)]

We are even able to visualize our Word2Vec model to make the "core language" visible! We use t-SNE for dimensionality reduction to get two-dimensional vectors which we can then visualize in a scatter plot. With scikit-learn, the code is very manageable!

vocab = list(model.wv.vocab)
X = model[vocab]
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(X)
df = pd.DataFrame(X_tsne, index=vocab, columns=['x', 'y']).sample(1000)

and now we can plot it with matplotlib:


plt.rcParams["figure.figsize"] = (20,20)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

ax.scatter(df['x'], df['y'])
for word, pos in df.iterrows():
    ax.annotate(word, pos)

We can see very clearly how certain words gather and others are very far away. So you can see which words are related to each other and which are not.

The End

So people, let's recap what we've done: first we used the WSL (Windows Subsystem for Linux) to run a Linux program and export our Telegram chats! We then used NLTK (the Natural Language Toolkit) to convert the exported chat messages into tokens. With the help of gensim we trained our own Word2Vec model based on these tokens. Finally, we used scikit-learn's t-SNE and matplotlib to visualize our model!

I hope you guys enjoyed it as much as I did! See you soon on our blog again!
