
TensorFlow

1/2020 · mlconference.ai

MAGAZINE

A Deep Dive into TensorFlow 2.0

Continuous Delivery for Machine Learning

Tutorial: Explainable Machine Learning

Go with the

Interview with Noam Chomsky

http://www.mlconference.ai


ToolsMAGAZINE

mlconference.ai mlconference Machine Learning Conference

Editorial · Machine Learning Is Going Places

Tools

All Done: TensorFlow 2.0 · TensorFlow 2.0 offers a good introduction to deep learning · Hoang Tu Nguyen

Explainable Machine Learning with Python and SHAP · Take a look inside the black box · Natalie Beyer

Deep Learning: Not Only in Python · Training TensorFlow models with JVM languages · Christoph Henkelmann

Strategy

Continuous Delivery for Machine Learning · Learn how to combine ML and CD · Danilo Sato, Arif Wider and Christoph Windheuser

AIOps in 2020 · How industry consolidation is transforming the AIOps value proposition · Deepak Jannu

“TensorFlow Brings ML Models to All Devices” · Interview with Meike Hammer and Maksim Moiseikin

AI & Ethics

What Data Should AI Be Trained on to Avoid Bias? · The ethics of AI and ML · Dean Chester

“Democracy Has Suffered as Power Concentrates” · Alexander Görlach talks to Noam Chomsky

Thinking AI’s Voices: Gender and Identity · How an AI’s identity is embodied in its voice · Aude Gouaux-Langlois and Belinda Sykora

CONTENT

http://mlconference.ai

https://www.facebook.com/mlconference/

https://twitter.com/mlconference

https://www.youtube.com/channel/UCWoVlB63O0951q0j4Vkheiw



Machine Learning Is Going Places

Welcome to the very first issue of ML Magazine, presented by ML Conference! That’s not the only exciting news: we are also proud to announce that ML Conference is going global in 2020! In addition to our usual stops in Berlin and Munich, we will also be heading to Singapore in September.

The world of machine learning and artificial intelligence is full of surprises, new developments and tools, and a vast number of ethical implications, and the authors in this magazine will give you a thorough picture of the cutting edge.

Would you like to dive deeper into TensorFlow 2.0 and see how it compares to TensorFlow 1.x? Are you interested in learning how ML and continuous delivery go together? Or would you rather learn how to make your machine learning algorithms more explainable? We’ve got you covered! Other exciting topics include how to train TensorFlow models with JVM languages (yes, you read that correctly!) and where AIOps is heading in 2020.

Since machine learning and artificial intelligence have social and ethical implications, ML Magazine also lets critical voices be heard. You can look forward to a groundbreaking interview with Noam Chomsky, one of the most cited scholars alive, who shares his insights on how technology can be used in opposite ways: either to implement restrictive measures such as surveillance, or to liberate people and enable communication. After all, as the world-renowned linguist and social critic points out, technology is a tool that is used differently depending on who controls it.

In another article dealing with AI and bias, you will see why it is important to take a closer look and ask whether it is even possible to develop an AI algorithm without bias in certain areas, and when it may be advisable to avoid AI entirely. And lastly, “who” is this AI? The final article brings into focus how we shape an AI’s identity and our interactions with it by giving it a voice.

The broad range of topics in ML Magazine should provide you with lots of inspiration and new insights into the world of machine learning.

I hope you enjoy this inaugural issue of ML Magazine, and see you at ML Conference!

@mlconference

Maika Möbus Editor, Advisory Board

EDITORIAL








by Hoang Tu Nguyen

Slowly but surely, machine learning is establishing itself in many companies and becoming an integral part of the tool repertoire whenever data is to be analyzed and processes are to be automated using predictive algorithms. Deep learning, a subfield of machine learning, is an important enabler, especially for image and speech recognition. More and more open source frameworks and libraries are being published to simplify the use of deep learning. One of these frameworks is TensorFlow [1]. Since its publication in 2015, it has found many supporters, especially in university research, industry and medical practice. For example, specially designed algorithms can be used to carry out preliminary screening to detect skin cancer early on, or to digitize handwritten characters. The possibilities with TensorFlow are extremely versatile. From a methodological point of view, it is various kinds of artificial neural networks that make such applications possible, and TensorFlow is a programming library for implementing these neural networks.

Although TensorFlow is one of the most popular ML frameworks, its low-level programming model makes it confusing and inconvenient for untrained users and beginners. This was one of the main reasons for Google to release a more mature version, TensorFlow 2.0. The APIs were cleaned up, and several modes, such as eager execution, were made the default to simplify programming. The focus of TensorFlow 2.0 is clearly on making it more intuitive and faster to use.

What’s new in TensorFlow 2.0?

TensorFlow 1.x is a framework for dataflow-oriented programming. In a design phase, a computation graph is created that consists of nodes and edges: the nodes represent the operations, and the edges carry the data flowing between the nodes. Afterwards, in the execution phase, the graph is fed with data and run. The advantages of dataflow programming are that operations can be parallelized, execution can be distributed across different systems and devices, and the approach scales well and is highly portable. All these advantages undeniably have their value. Especially when performance is important, these features can be leveraged to develop a model that is optimized for the specific application.
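To make the two phases concrete, the following sketch reproduces the 1.x style through the tf.compat.v1 module that ships with TensorFlow 2.x. The graph, the placeholder name x and the values are illustrative only; this is not the code behind Fig. 2.

```python
import tensorflow as tf

# Design phase: build a static graph. Nothing is computed yet.
graph = tf.Graph()
with graph.as_default():
    # A placeholder is an input slot that is filled only at execution time.
    x = tf.compat.v1.placeholder(tf.float32, shape=(None,), name="x")
    w = tf.constant(2.0, name="w")
    y = w * x + 1.0  # symbolic tensor: a node in the graph, not a value

# Execution phase: a session feeds data into the graph and runs it.
with tf.compat.v1.Session(graph=graph) as sess:
    result = sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]})

print(result)  # [3. 5. 7.]
```

Defining y computes nothing; only sess.run() executes the graph with the data passed in feed_dict.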

However, TensorFlow 1.x is not the best choice for quick tasks such as a simple test or a fast model setup. Also, the separation between design and execution phase is unusual for the majority of Python users, who are used to imperative programming. Another confusing fact is that some functions exist twice, which makes the choice and use of functions inconsistent and redundant (Table 1).

Strengths | Weaknesses
scalable | low-level programming
parallelizable | redundant functions
large community | cluster loads
many platforms | not Pythonic
open source (Google) | difficult to debug

Table 1: Strengths and weaknesses of TensorFlow 1.x

TensorFlow 2.0 offers a good introduction to deep learning

All Done: TensorFlow 2.0

TensorFlow is one of the most important frameworks in the area of deep learning. The architecture has been thoroughly revised for version 2.0, so this release offers extensive possibilities for deep learning.

With TensorFlow 2.0, these weaknesses have been eliminated, and Keras has become the standardized and recommended high-level API. Additionally, eager execution, which was introduced with version 1.5, is now the default mode. It allows imperative programming without calling Session.run(), which makes TensorFlow code much easier to understand and debug. Another simplification is the removal of global namespaces. In TensorFlow 1.x, variables that were defined for a graph and later deleted in Python still remained in the default graph, which led to complications and misunderstandings when accessing variables with similar names. This mechanism was therefore removed in TensorFlow 2.0 (Table 2).

In the following, we first introduce Keras and use it to create a simple neural network. Afterwards, we discuss the new default mechanisms before using them in an advanced example.

Keras in TensorFlow

Keras is a machine learning library for Python that allows you to define models in an easy and user-friendly way. Originally, Keras was a separate library that provided a higher level of abstraction for artificial neural networks on top of Theano, Microsoft Cognitive Toolkit (CNTK) or TensorFlow, and it needs only one of these backends. While Theano no longer has a large community and Keras will not support CNTK in the future, TensorFlow is now the first and only real option for Keras users. With TensorFlow 1.4, Keras was introduced into the TensorFlow core API, but there it still competed with other APIs. In TensorFlow 2.0, Keras is now the standardized and recommended high-level API.

Loading the MNIST sample data with Keras in TensorFlow

In this tutorial, we will create a model that classifies handwritten numbers into the corresponding digits. For this, we will build a simple neural network with tf.keras. The data used in the example is also included in Keras.

# import modules
import tensorflow as tf

The MNIST dataset is a collection of handwritten digits that have already been labeled. Because of this preliminary work, and because each image is only 28 x 28 pixels, MNIST is certainly one of the best-known entry-level datasets; it is also available in other machine learning libraries such as scikit-learn. The following example is essentially the "Hello World" of deep learning.

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale the gray values from 0..255 to 0.0..1.0
x_train, x_test = x_train / 255.0, x_test / 255.0

There are 60 000 training samples and 10 000 test samples. Each of the 28 x 28 pixels holds a gray value from 0 to 255; dividing by 255.0 normalizes the pixels to the value range from 0.0 to 1.0.
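A short NumPy sketch shows what the division does to a single image. Instead of a real MNIST digit, it uses a synthetic 28 x 28 uint8 array that contains every gray value from 0 to 255.

```python
import numpy as np

# Synthetic stand-in for one MNIST image: 28 x 28 gray values of type uint8.
image = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)

# Dividing by the float 255.0 promotes the array to float64 and maps 0..255 to 0.0..1.0.
normalized = image / 255.0

print(normalized.dtype)                    # float64
print(normalized.min(), normalized.max())  # 0.0 1.0
```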

x_train.shape
>> (60000, 28, 28)
x_test.shape
>> (10000, 28, 28)

To get a visual impression of the digits, we can also visualize them (Fig. 1).

import matplotlib.pyplot as plt
plt.imshow(x_train[0], cmap='binary')
plt.show()

In doing so, we can also check our own human digit recognition, because the corresponding entry in y_train confirms:

y_train[0]
>> 5

Training and testing a classical neural network

In tf.keras, a simple model can be created with Sequential, and any number of layers can be added to it. The model we build should contain the following layers:

Fig. 1: Visualized digits

Weaknesses in TensorFlow 1.x | Improvements in TensorFlow 2.0
low-level programming | Keras as high-level programming
redundant functions | clear assignment of functions
cluster loads | eager execution
not Pythonic | functions instead of sessions
difficult to debug |

Table 2: TensorFlow versions compared side by side







• Flatten layer: Each normalized 28 x 28 array is converted into a vector of 784 values.

• Hidden layer: The vector is fed into a fully connected (dense) hidden layer with 128 neurons and the ReLU activation function.

• Output layer: The last layer assigns each input to one of the ten digits using the softmax activation function.
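As a quick sanity check on the size of this network: a dense layer has inputs × units weights plus one bias per unit, and the Flatten layer has no parameters at all. The following sketch works this out; the helper name dense_params is our own, and the total should agree with what model.summary() reports for the model in Listing 1.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

flatten_size = 28 * 28                    # Flatten only reshapes, no parameters
hidden = dense_params(flatten_size, 128)  # 784 * 128 + 128
output = dense_params(128, 10)            # 128 * 10 + 10
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```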

The model is then compiled with the Adam optimizer, and the loss is calculated with sparse_categorical_crossentropy, which expects the labels as integers from 0 to 9 rather than as one-hot vectors (Listing 1).
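What the loss computes can be sketched in a few lines of NumPy: for each sample it takes the probability the model assigns to the true class and averages the negative logarithm. The two toy predictions below are made up; the sketch also shows that the integer ("sparse") form gives the same value as the one-hot form.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class).

    labels: integer class indices, shape (n,)
    probs:  softmax outputs, shape (n, num_classes)
    """
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

# Two hand-made predictions over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer labels, as in y_train

sparse = sparse_categorical_crossentropy(labels, probs)

# Categorical crossentropy on one-hot encoded labels yields the same value.
one_hot = np.eye(3)[labels]
dense = -np.mean(np.sum(one_hot * np.log(probs), axis=1))

print(sparse, np.isclose(sparse, dense))
```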

Once the fully connected network has been set up, it still needs to be trained with the training data. Training runs for five epochs, and the model is then evaluated with the test data. We obtain an accuracy of just over 97% (Listing 2). Let’s take a look at other modules, starting with eager execution.

Eager execution

In TensorFlow 1.x, it was necessary to create a computation graph that was executed later in a separate environment with Session.run(). In eager execution mode, TensorFlow operations are executed imperatively and immediately, just as every data scientist is used to from NumPy.

Fig. 2 shows a simple summation of two constants in TensorFlow 1.x and in TensorFlow 2.0. The calculation on the left was written for TensorFlow 1.x: as mentioned before, the first step is to create a computation graph, which is then executed with Session.run(). Because eager mode is the default in TensorFlow 2.0, TensorFlow data types can easily interact with data types outside the TensorFlow environment, such as NumPy arrays. This simplifies working with TensorFlow and makes its use more intuitive.
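A minimal sketch of the eager side of this comparison, plus the NumPy interoperability mentioned above. The values are illustrative; Fig. 2’s exact code is not reproduced here.

```python
import numpy as np
import tensorflow as tf

# Eager execution is on by default in TensorFlow 2.x.
print(tf.executing_eagerly())  # True

# Summing two constants: the result is available immediately,
# without building a graph or calling Session.run().
a = tf.constant(2)
b = tf.constant(3)
c = a + b
print(c.numpy())  # 5

# Tensors and NumPy arrays interoperate in eager mode.
t = tf.constant([1.0, 2.0, 3.0])                         # float32 tensor
doubled = np.multiply(t, 2.0)                            # NumPy op on a tensor -> ndarray
shifted = tf.add(doubled, np.ones(3, dtype=np.float32))  # TF op on ndarrays -> tensor
print(shifted.numpy())  # [3. 5. 7.]
```

The explicit float32 dtype for np.ones keeps both operands of tf.add the same type, since TensorFlow does not silently mix float32 and float64.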

Listing 1
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Listing 2
model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test, y_test)
>> loss: 0.0424 - accuracy: 0.9751

# Predicted class = index of the highest softmax probability
model.predict(x_test[0:10]).argmax(axis=1)
>> array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9], dtype=int64)
y_test[0:10]
>> array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9], dtype=uint8)

Fig. 2: Comparison of TensorFlow 1.x in graph mode and TensorFlow 2.0 in eager mode

Session

Playing Doom with TF-Agents and TensorFlow 2.0
Andreas Eberle (arconsis IT-Solutions GmbH)

During recent years, enormous advances in Reinforcement Learning (RL) have been showcased by DeepMind, OpenAI and others. Their AIs achieve similar-to-human or even super-human capabilities in various games, including Atari games, the Chinese board game Go, Dota 2 and StarCraft. Although major deep learning frameworks have reached new levels of maturity and usability for supervised learning, RL solutions often remain hard to get started with and even harder to get right. In this talk we will look at the new TF-Agents framework, which drastically simplifies the development of RL solutions with TensorFlow 2.0. Using the example of an AI playing Doom, we will present the relevant steps and code parts to get you started.

Also visit this session at ML Conference Singapore

https://mlconference.ai/tools-apis-frameworks/playing-doom-with-tf-agents-and-tensorflow-2-0/?utm_source=pdf&utm_medium=referral&utm_campaign=mlmagazine120

GradientTape

When the computation graph was created in TensorFlow 1.x, a loss function was inserted during the design phase. During the execution phase, this loss function passed its information directly to the optimizer, which then updated the weights. In eager mode, by contrast, values are calculated immediately as they appear in the code, so no static graph with a loss function is prepared to drive backpropagation and optimization. For optimization tasks, it is therefore necessary to record the calculation steps in eager mode so that gradients can be computed from them afterwards. This is what tf.GradientTape is for: it records every operation executed within its context and then computes the gradients from this record (Fig. 3).
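A minimal sketch of what the tape does, using a single illustrative variable rather than the MNIST model: the tape records the forward computation, and tape.gradient() returns the derivative that an optimizer would then apply.

```python
import tensorflow as tf

w = tf.Variable(3.0)  # a trainable weight
x = tf.constant(2.0)

# Operations inside the `with` block are recorded on the tape.
with tf.GradientTape() as tape:
    y = w * w * x  # y = x * w^2

# dy/dw = 2 * x * w = 2 * 2.0 * 3.0 = 12.0
grad = tape.gradient(y, w)
print(grad.numpy())  # 12.0

# One manual gradient-descent step with a learning rate of 0.1.
w.assign_sub(0.1 * grad)
print(w.numpy())  # 1.8
```

In the CNN example, optimizer.apply_gradients() takes over the role of the manual assign_sub() step.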

Using GradientTape and AutoGraph

After discussing the most important changes to the basic functions, we now create a second example (Listing 3) of an artificial neural network, this time a convolutional neural network (CNN).

For training the CNN, we will divide the data into batches. To do this, we must ensure that the batches do not always contain the same kind of examples, so the data must be shuffled rather than passed in a fixed order. With tf.data, we can combine images and labels, shuffle them and create batches in one go. Batches are subsets of the training or test data that are passed through the network one at a time. Instead of using all data at once for each calculation, it is advantageous to use only parts of it: for many application scenarios, the memory required at runtime would simply be too large if we used all the training data at once.

The advantage of this so-called mini-batch method is that less runtime memory is required, because fewer data points are involved in each calculation of the training process. In addition, artificial neural networks trained with small batches often learn faster, because the weights are updated after every batch rather than once per pass over the whole dataset. On the other hand, the smaller a batch is, the less accurate the gradient estimates are. This is why data scientists speak of stochastic gradient descent in this context.
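The following NumPy sketch illustrates mini-batch stochastic gradient descent on a toy linear-regression problem rather than on the CNN. The data, the learning rate and the number of epochs are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 3x + 1 plus a little noise.
X = rng.uniform(-1.0, 1.0, size=1000)
y = 3.0 * X + 1.0 + rng.normal(0.0, 0.1, size=1000)

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32

for epoch in range(20):
    # Shuffle once per epoch so batches do not always contain the same examples.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        error = (w * xb + b) - yb
        # Gradient of the mean squared error, estimated on this batch only.
        grad_w = 2.0 * np.mean(error * xb)
        grad_b = 2.0 * np.mean(error)
        # The weights are updated after every batch, not once per epoch.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to 3.0 and 1.0
```

Each update uses only 32 points, so individual steps are noisy, but over many updates the estimates settle near the true slope and intercept.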

The data is shuffled using a buffer of 10 000 elements, and each batch contains 32 data points.

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10_000).batch(32)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
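To check what such a pipeline produces, here is a small sketch with 100 dummy images instead of MNIST. The argument of shuffle() is the size of the buffer from which elements are drawn at random, and batch(32) yields a shorter final batch when the number of examples is not a multiple of 32.

```python
import numpy as np
import tensorflow as tf

# 100 dummy images with a channel dimension, plus integer labels.
images = np.zeros((100, 28, 28, 1), dtype=np.float32)
labels = np.arange(100) % 10

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .shuffle(buffer_size=100)  # buffer size, not a number of shuffle passes
      .batch(32))

batch_sizes = [int(x.shape[0]) for x, y in ds]
print(batch_sizes)  # [32, 32, 32, 4]
```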

Fig. 3: GradientTape in action

Listing 3
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Input
from tensorflow.keras import Model, Sequential

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channels dimension
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

Listing 4
model_seq = Sequential([
  Input(shape=(28, 28, 1)),
  Conv2D(8, 3, activation='relu'),
  Flatten(),
  Dense(128, activation='relu'),
  Dense(10, activation='softmax')
])
model_seq.build(input_shape=(None, 28, 28, 1))
model_seq.summary()

Model setup

We will also create the following model as a class. In addition to the layers of the first example, it contains a two-dimensional convolutional layer in front of the fully connected layers. Convolutional layers create filters that learn to recognize patterns in the images; the filters are also optimized during backpropagation. Each filter is an n x n array that moves across the image. At every position, the scalar product of the image section and the filter is calculated and stored in a so-called feature map.
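To make the sliding scalar product concrete, here is a plain NumPy sketch of a single filter with stride 1 and no padding, which corresponds to the Keras default padding='valid'. The image and filter values are made up; a trained Conv2D layer learns its own filter values.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide an n x n filter over the image (stride 1, no padding) and store
    the scalar product at every position in a feature map. Like Keras' Conv2D,
    this is technically a cross-correlation: the kernel is not flipped."""
    n = kernel.shape[0]
    out_h = image.shape[0] - n + 1
    out_w = image.shape[1] - n + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + n, j:j + n]
            feature_map[i, j] = np.sum(patch * kernel)  # scalar product
    return feature_map

# Toy image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made vertical edge filter.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (4, 4)
print(fmap[0])     # [0. 3. 3. 0.]

# On a 28 x 28 MNIST-sized input, a 3 x 3 filter yields a 26 x 26 feature map.
print(conv2d_valid(np.zeros((28, 28)), kernel).shape)  # (26, 26)
```

The feature map responds only where the window straddles the dark-to-bright edge. Conv2D(8, 3) in Listing 4 learns eight such filters, giving an output of shape 26 x 26 x 8.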

Under tf.keras, there are three ways to create a model: Sequential API, Functional API and model subclassing.

Listing 5
class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__(name='subclass')
        self.conv1 = Conv2D(8, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

    def summary(self):
        x = Input(shape=(28, 28, 1))
        return Model(inputs=[x], outputs=self.call(x)).summary()

# Create an instance of the model
model_sub = MyModel()

model_sub.summary()

Session

Future of HR: Implementation of RPA and AI in HR Processes Using Python and TensorFlow
Prof. Raul Rodríguez (Woxsen School of Business)

The future of HR is focused on implementing and perfecting the AI approach within HR settings by utilizing Python and TensorFlow to achieve a 72% bias-free recruitment process and employee attrition system that is able to track and monitor employees in relation to NLP, facial recognition and sentiment analysis. The attendees will be able to understand how AI-induced robotics get actively involved in talent acquisition and generic HR practices.


Building a model with the Sequential API

The easiest way to build a model is the Sequential API, where the layers are stacked one after another (Listing 4).

Building a model with the Functional API

With the Functional API, more complex models are possible because multiple inputs and outputs are allowed. This approach is therefore generally preferable.

inputs = Input(shape=(28, 28, 1))
conv1 = Conv2D(8, 3, activation='relu')(inputs)
flatten = Flatten()(conv1)
d1 = Dense(128, activation='relu')(flatten)
d2 = Dense(10, activation='softmax')(d1)
model_func = Model(inputs=inputs, outputs=d2, name='functional')
model_func.summary()
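To illustrate the multiple inputs and outputs mentioned above, here is a hypothetical extension of the functional model: besides the image, it takes three extra numeric features, and besides the digit, it predicts whether the digit is even. The second input, the is_even output and all layer names are invented for this sketch.

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Concatenate, Conv2D, Dense, Flatten, Input

# Two inputs: an image and three extra numeric features.
image_in = Input(shape=(28, 28, 1), name="image")
extra_in = Input(shape=(3,), name="extra")

x = Conv2D(8, 3, activation="relu")(image_in)
x = Flatten()(x)
x = Concatenate()([x, extra_in])  # merge the two input branches
x = Dense(128, activation="relu")(x)

# Two outputs: the digit class and a binary "is even" prediction.
digit_out = Dense(10, activation="softmax", name="digit")(x)
even_out = Dense(1, activation="sigmoid", name="is_even")(x)

model_multi = Model(inputs=[image_in, extra_in], outputs=[digit_out, even_out])

# Forward pass on dummy data to check the output shapes.
digit_pred, even_pred = model_multi([tf.zeros((2, 28, 28, 1)), tf.zeros((2, 3))])
print(digit_pred.shape, even_pred.shape)  # (2, 10) (2, 1)
```

Such a graph of layers cannot be expressed as a single stack, which is why the Sequential API is not enough here.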

Building a model with subclassing

Keras provides a Model base class that can be subclassed to define model architectures. This method offers the greatest flexibility, since the forward pass is written as ordinary Python code and is completely customizable. However, the complexity also increases, as the code is more difficult to write and debug (Listing 5).

Selecting the loss function and the optimizer

The model should output a digit, i.e. a class. The loss function tf.keras.losses.SparseCategoricalCrossentropy() can be used for this, since the labels are integers. As an optimizer, Adam() is a good default choice.

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

To report progress, we also need metrics: the loss and the accuracy of the model are useful for classification.

Listing 6
def make_steps(model, optimizer):
    # tf.function does not retrace when a global variable such as `model` is
    # reassigned, so every model gets its own freshly traced step functions.

    @tf.function
    def train_step(images, labels):
        with tf.GradientTape() as tape:
            predictions = model(images)
            loss = loss_object(labels, predictions)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))

        train_loss(loss)
        train_accuracy(labels, predictions)

    @tf.function
    def test_step(images, labels):
        predictions = model(images)
        t_loss = loss_object(labels, predictions)

        test_loss(t_loss)
        test_accuracy(labels, predictions)

    return train_step, test_step

https://mlconference.ai/machine-learning-business-strategy/future-of-hr-implementation-of-rpa-and-ai-in-hr-processes-using-python-and-tensorflow/?utm_source=pdf&utm_medium=referral&utm_campaign=mlmagazine120

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

In the training phase, we need the gradients to optimize the model, and the GradientTape presented above is used for this purpose. In eager execution mode, we write imperative Python code, and each operation is handed over to TensorFlow, executed there, and its result passed back to Python. The more operations a step contains, the more of these transfers take place, and they cost time. To reduce these transfers, TensorFlow 2.0 offers tf.function together with AutoGraph.

With tf.function, entire code sections are compiled: we only need to define a function and decorate it with @tf.function, and AutoGraph converts Python control flow such as if statements and for loops into graph operations. Instead of transferring, executing and returning single lines of code, TensorFlow then creates a computation graph from the imperative function (Listing 6).

Running the model

After the models have been defined, we can run them. We train each model for five epochs, so we set the number of epochs to 5. In every epoch, we first train the model and then evaluate it with the test data (Listing 7).

Listing 7
EPOCHS = 5
models = [model_seq, model_func, model_sub]

for model in models:
    print('\n' + model.name)
    print('----------------')
    # Fresh optimizer state and freshly traced step functions for every model
    optimizer = tf.keras.optimizers.Adam()
    train_step, test_step = make_steps(model, optimizer)

    for epoch in range(EPOCHS):
        for images, labels in train_ds:
            train_step(images, labels)

        for test_images, test_labels in test_ds:
            test_step(test_images, test_labels)

        template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
        print(template.format(epoch + 1,
                              train_loss.result(),
                              train_accuracy.result() * 100,
                              test_loss.result(),
                              test_accuracy.result() * 100))

        # Reset the metrics for the next epoch
        train_loss.reset_states()
        train_accuracy.reset_states()
        test_loss.reset_states()
        test_accuracy.reset_states()

OUTPUT:
-------------------------------------------------------------------
sequential
----------------
>>> ...
>>> Epoch 5, Loss: 0.012651719152927399, Accuracy: 99.56832885742188,
>>> Test Loss: 0.06541633605957031, Test Accuracy: 98.1500015258789

functional
----------------
>>> ...
>>> Epoch 5, Loss: 0.00463454145938158, Accuracy: 99.8499984741211,
>>> Test Loss: 0.07032663375139236, Test Accuracy: 98.4000015258789

subclass
----------------
>>> ...
>>> Epoch 5, Loss: 0.0034947844687849283, Accuracy: 99.87666320800781,
>>> Test Loss: 0.08777650445699692, Test Accuracy: 98.3699951171875

Conclusion

All three variants return similar results, since they represent the same model. In summary:

The Sequential API can be used for simple models where a basic step-by-step approach is sufficient and multiple inputs and outputs are not required.

With the Functional API, more complex models are possible, which can still be built easily with little additional effort.

Model subclassing gives you full control over the model. However, the code is also harder to write, and debugging is more difficult as well due to the architecture.

Hoang Tu Nguyen is a Data Scientist at DATANOMIQ GmbH. He studied mechanical engineering at TU Dresden, where he discovered his passion for data and quantitative correlations. He is currently working as a consultant for Business Intelligence, Data Science and Machine Learning.

Links & Literature

[1] https://www.tensorflow.org/overview/








by Natalie Beyer

Machine learning is used in a lot of contexts nowadays. We get offers for different products, recommendations on what to watch tonight, and much more. Sometimes the predictions fit our needs, and we buy or watch what was offered; sometimes the predictions are wrong. And sometimes those predictions are made in far more sensitive contexts than watching a show or buying a certain product, for example when an algorithm that is supposed to automate hiring decisions discriminates against a group. Amazon’s recruiters used an algorithm that systematically rejected women before they were invited to job interviews [1].

To make sure that we know what the algorithms we use actually do, we have to take a closer look at what we are predicting. New methods of explainable machine learning make it possible to explore which factors the algorithm relied on most heavily to arrive at its predictions. These methods can lead to a better understanding of what the algorithm is actually doing and whether it emphasizes columns that should not carry much information.

Example

To get a clearer picture of explainable AI, we will go through an example. The dataset used consists of Kickstarter projects and can be downloaded here [2].

Kickstarter is a crowdfunding platform where people can upload a video or a description of their planned projects. Anyone who would like to support a project can donate money to it. In this example, I would like to guide you through a machine learning algorithm that predicts whether a given project is going to be successful or not. The interesting part is that we are going to take a look at why the algorithm came to a certain decision.

This explainable machine learning example is written in Python, so first we need to import a few packages (Listing 1). pandas, NumPy, scikit-learn and Matplotlib are frequently used in data science projects. CatBoost [3] is a gradient-boosting library based on decision trees that deals excellently with categorical data and performs well even with its default settings [4]. SHAP [5] is the package by Scott M. Lundberg that implements the approach we use to interpret machine learning outcomes.

Used versions of the packages:

• pandas 0.25.0
• NumPy 1.16.4
• Matplotlib 3.0.3
• scikit-learn 0.19.1
• CatBoost 0.18.1
• SHAP 0.28.3

Let’s take a look at the downloaded dataset in Figure 1 with kickstarter.head():

Listing 1
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import catboost
from catboost import CatBoostClassifier, Pool, cv
import shap

Take a look inside the black box

Explainable Machine Learning with Python and SHAP

Machine learning algorithms can cause the “black box” problem, which means we don’t always know exactly what they are predicting. This may lead to unwanted consequences. In the following tutorial, Natalie Beyer will show you how to use the SHAP (SHapley Additive exPlanations) package in Python to get closer to explainable machine learning results.







The first column is the identification number of each project. The name column contains the name of the Kickstarter project. category assigns each project to one of 159 different categories, which can be grouped into 15 main categories. Next is the currency of the project. The column deadline is the last possible date to support the project. pledged is the amount of money that was pledged to support the project. state is the state of the project after the deadline. backers is the number of supporters of the project. The last column contains the country in which the project was launched.

We are only going to use the states failed and successful, as the other states, such as canceled, do not seem to be very interesting.

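# Keep only finished projects, then encode the target as 0/1
kickstarter = kickstarter[kickstarter["state"].isin(["failed", "successful"])].copy()
kickstarter["state"] = kickstarter["state"].map({"failed": 0, "successful": 1})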
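The filter matters because replace() only touches the listed values and leaves every other state untouched. A small sketch with made-up rows shows what would otherwise end up in the target column.

```python
import pandas as pd

# Tiny made-up stand-in for the Kickstarter data.
kickstarter = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "state": ["failed", "successful", "canceled", "successful"],
})

# Without filtering, "canceled" stays in the target as a string.
replaced = kickstarter["state"].replace({"failed": 0, "successful": 1})
print(replaced.tolist())

# Filtering first leaves a clean binary target.
filtered = kickstarter[kickstarter["state"].isin(["failed", "successful"])].copy()
filtered["state"] = filtered["state"].map({"failed": 0, "successful": 1})
print(filtered["state"].tolist())  # [0, 1, 1]
```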

First machine learning model

We are going to start with a machine learning model that uses the following columns:

kickstarter_first = kickstarter[ [ "category", "main_category", "currency", "deadline", "goal", "launched", "backers", "country", "state", ] ]

The last column is going to be our target column, y. All the other columns form the feature matrix X.

X = kickstarter_first[kickstarter_first.columns[:-1]]
y = kickstarter_first[kickstarter_first.columns[-1:]]

We are going to split the dataset so that 10% of it becomes the test dataset and 90% the training dataset.

X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.1, random_state=42 )
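A small sketch of the column slicing and the 90/10 split on a made-up frame, so the resulting sizes can be checked without downloading the data. Note that columns[-1:] keeps y as a one-column DataFrame rather than a Series.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up frame with two feature columns and the target in the last column.
df = pd.DataFrame({
    "goal": range(100),
    "country": ["US", "GB"] * 50,
    "state": [0, 1] * 50,
})

X = df[df.columns[:-1]]  # all columns except the last -> features
y = df[df.columns[-1:]]  # last column only -> target (kept as a DataFrame)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)
print(len(X_train), len(X_test))  # 90 10
```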

As our classifier, I chose CatBoost, as it deals very well with categorical data. We are going to use the default settings of the algorithm, and 150 iterations are enough for our purposes.

model = CatBoostClassifier( random_seed=42, logging_level="Silent", iterations=150 )

In order to use CatBoost properly, we need to define which columns are categorical. In our case, those are all columns that have the type object.

categorical_features_indices = np.where(X.dtypes == object)[0]
X.dtypes
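The expression returns the positions of all columns with the generic object dtype, which in the pandas version used here is how string columns are stored. A sketch on a made-up frame; the string columns are cast to object explicitly so that the example behaves the same in newer pandas versions that may use a dedicated string dtype.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "category": ["Poetry", "Music"],
    "goal": [1000.0, 2500.0],
    "currency": ["GBP", "USD"],
    "backers": [0, 15],
}).astype({"category": object, "currency": object})

# Positions (not names) of the object columns, as CatBoost's cat_features accepts.
categorical_features_indices = np.where(X.dtypes == object)[0]
print(categorical_features_indices)  # [0 2]
```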

We can see in Figure 2 that all columns except goal and backers are object columns and should be treated as categorical. After fitting the model, we see a pretty good result:

model.fit( X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_test, y_test), )

Fig. 2: Categorical data

Fig. 1: Dataset







With this first model, we are able to classify 93% of our test dataset correctly (Figure 3). Let’s not get too excited, though, and check what we are actually predicting. With the SHAP package (Listing 2), we can see in a bar plot which factors were mostly responsible for the predictions (Figure 4).

The bar plot shows that the column backers contributes the most to the prediction! Oh no! We have put a proxy for the target column (state failed or successful) into our model: if your Kickstarter project has a lot of backers, it is most likely going to be successful. Let’s give it another go, this time using only columns that do not reveal too much about the outcome.

Second machine learning model

In the extended dataset kickstarter_extended = kickstarter.copy(), we are going to do some feature engineering. Looking through the data, one can see that some projects use special characters in their name. We are going to add a new column number_special_character_name that counts the number of special characters per name:

kickstarter_extended[ "number_special_character_name" ] = kickstarter_extended.name.str.count('[-()"#/@;:<>{}`+=~|.!?,]')
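str.count() interprets its argument as a regular expression, so the pattern above is a character class that matches any one of the listed characters. A sketch on three made-up project names:

```python
import pandas as pd

names = pd.Series(["Paint the town!", "Project (Phase 2): Go", "Plain name"])

# Inside the character class, characters such as . ? ( ) are literal,
# and the leading "-" is literal too, so each occurrence is counted once.
special = names.str.count('[-()"#/@;:<>{}`+=~|.!?,]')
print(special.tolist())  # [1, 3, 0]
```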

Fig. 3: Result of the first model

Fig. 4: SHAP bar plot of the first model

Listing 2
shap_values = model.get_feature_importance(
    Pool(X_test, label=y_test, cat_features=categorical_features_indices),
    type="ShapValues",
)
# The last column holds the expected (base) value, not a feature
shap_values = shap_values[:, :-1]
shap.summary_plot(shap_values, X_test, plot_type="bar")
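Listing 2 drops the last column because it holds the expected (base) value rather than a feature. SHAP values are additive: for each prediction, the base value plus the SHAP values of all features equals the model’s raw output. For a linear model with independent features this can be computed by hand, since each feature’s SHAP value is its weight times the feature’s deviation from its mean. The following NumPy sketch with made-up weights only illustrates that property; it is not how the shap package or CatBoost compute values for tree models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up linear model f(x) = w . x + b on three features.
w = np.array([0.8, -1.5, 0.3])
b = 0.2
X_background = rng.normal(size=(500, 3))  # data used to estimate E[x]
x = np.array([1.0, 0.5, -2.0])            # the prediction we want to explain

def f(data):
    return data @ w + b

mean_x = X_background.mean(axis=0)
base_value = f(mean_x)             # expected model output (linear model)
shap_values = w * (x - mean_x)     # one value per feature

# Additivity: base value + sum of SHAP values reproduces the prediction.
print(np.isclose(base_value + shap_values.sum(), f(x)))  # True
```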

Workshop

Machine Learning 101++ Using Python
Dr. Pieter Buteneers (Chatlayer.ai)

Machine learning is often hyped, but how does it work? We will show you hands-on how you can do data inspection, prediction, build a simple recommender system, and so on.

Using realistic datasets and partially programmed code we will make you accustomed to machine learning concepts such as regression, classification, overfitting, cross-validation and many more. This tutorial is accessible for anyone with some basic Python knowledge who’s eager to learn the core concepts of machine learning.

Also visit this workshop at ML Conference Munich & Singapore

We also add the column word_count, which holds the number of words in the name:

kickstarter_extended["word_count"] = kickstarter_extended["name"].str.split().map(len)

We also convert the deadline and launched columns from type object to datetime, replacing the original columns. This lets us compute the new column delta_days, which holds the number of days between the “launched” date and the “deadline” date:

kickstarter_extended["deadline"] = pd.to_datetime(kickstarter_extended["deadline"])
kickstarter_extended["launched"] = pd.to_datetime(kickstarter_extended["launched"])
kickstarter_extended["delta_days"] = (
    kickstarter_extended["deadline"] - kickstarter_extended["launched"]
).dt.days
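What .dt.days returns can be checked with the standard library: subtracting two datetimes gives a timedelta, and its days attribute counts whole days and drops the remaining hours. A small sketch with made-up timestamps, assuming the dates are ISO-style strings:

```python
from datetime import datetime


def delta_days(launched: str, deadline: str) -> int:
    """Whole days between launch and deadline, like (deadline - launched).dt.days."""
    start = datetime.fromisoformat(launched)
    end = datetime.fromisoformat(deadline)
    return (end - start).days


# 58 days and about 12 hours -> 58
print(delta_days("2015-08-11 12:12:28", "2015-10-09"))
```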

It is also interesting to see whether projects are more successful in certain months. Therefore, we build the new column launched_month, and do the same for the day of the week and the year:

kickstarter_extended["launched_month"] = kickstarter_extended["launched"].dt.month
kickstarter_extended["day_of_week_launched"] = kickstarter_extended.launched.dt.dayofweek
kickstarter_extended["year_launched"] = kickstarter_extended.launched.dt.year
kickstarter_extended.drop(["deadline", "launched"], inplace=True, axis=1)
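The three new columns correspond to standard datetime attributes. A minimal sketch with the standard library, assuming an ISO-style timestamp string (the function name is hypothetical); note that pandas' dayofweek, like Python's weekday(), counts Monday as 0 and Sunday as 6:

```python
from datetime import datetime


def launch_features(launched: str) -> dict:
    """Month, day of week and year of a launch timestamp."""
    ts = datetime.fromisoformat(launched)
    return {
        "launched_month": ts.month,
        # weekday(): Monday == 0 ... Sunday == 6, same convention as dt.dayofweek
        "day_of_week_launched": ts.weekday(),
        "year_launched": ts.year,
    }


print(launch_features("2015-08-11 12:12:28"))  # a Tuesday in August 2015
```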

The new dataset kickstarter_extended now consists of the following columns:

kickstarter_extended = kickstarter_extended[
    [
        "ID", "category", "main_category", "currency", "goal", "country",
        "number_special_character_name", "word_count", "delta_days",
        "launched_month", "day_of_week_launched", "year_launched", "state",
    ]
]

Again, we build the training and test datasets:

X = kickstarter_extended[kickstarter_extended.columns[:-1]]
y = kickstarter_extended[kickstarter_extended.columns[-1:]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)
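train_test_split shuffles the rows with a fixed seed (random_state=42) and holds out 10% of them for testing, which makes the split reproducible. The following simplified sketch shows that idea in plain Python; it is not scikit-learn's algorithm and will not select the same rows as train_test_split. The function name is hypothetical.

```python
import math
import random


def split_indices(n_rows, test_size=0.1, seed=42):
    """Shuffle row positions with a fixed seed and hold out a test fraction."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same order every run
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
```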

Next, we initialize the new model, determine the indices of the categorical columns and fit the model (Listing 3):

The current model is a little worse than the first try (Figure 5), but the assumption is that we are now actually predicting on a more realistic basis. A quick look at the bar plot with the current feature importances, generated by Listing 4, shows that goal is now the most informative column (Figure 6).

Fig. 6: SHAP bar plot of the second model

Listing 3
import numpy as np
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    random_seed=42, logging_level="Silent", iterations=150
)
# object (string) columns are categorical; np.object was removed in NumPy 1.24
categorical_features_indices = np.where(X_train.dtypes == object)[0]

model.fit(
    X_train,
    y_train,
    cat_features=categorical_features_indices,
    eval_set=(X_test, y_test),
)
model.score(X_test, y_test)

Listing 4
shap_values_ks = model.get_feature_importance(
    Pool(X_test, label=y_test, cat_features=categorical_features_indices),
    type="ShapValues",
)
shap_values_ks = shap_values_ks[:, :-1]
shap.summary_plot(shap_values_ks, X_test, plot_type="bar")

Fig. 5: Result of the second model







Fig. 8: SHAP force plot of a successful project

Fig. 7: SHAP result of the second model

Blog

Machine Learning with Python
Dr. Shirin Glander (codecentric)

Image classification models are intended to sort images into classes. We usually want to divide them into groups that reflect which objects are in a picture. For example, we can train an image classification model that can distinguish "dog" from "cat", but of course much more complex classifications into significantly more classes are possible.

So far, the SHAP package has not shown anything that other algorithm libraries cannot do: XGBoost and CatBoost have been able to show feature importances for several versions. But now let’s make SHAP shine. We enter shap.summary_plot(shap_values_ks, X_test) and receive the following summary plot (Figure 7):

In this summary plot, the order of the columns still represents how much information each column accounts for in the prediction. Each dot in the visualization represents one prediction. The color encodes the actual value of the data point: if the value in the dataset was high, the dot is pink; blue indicates a low value. Grey represents categorical values, which cannot be ranked as high or low; the package maintainers are working on this. The x-axis represents the SHAP value, which is the impact on the model output. The model output 1 stands for the prediction "successful", 0 for the prediction that the project is going to fail, so positive SHAP values push a prediction toward success.
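SHAP values are Shapley values from game theory: a feature's value is its average marginal contribution to the model output over all orders in which the features can be added. For a model with only a handful of features, they can be computed exactly by brute force. The sketch below does this for a made-up two-feature scoring function and uses a single baseline point as the reference, which is a simplification; the SHAP package averages over background data or, for tree models such as CatBoost, uses much faster specialized algorithms. It also demonstrates the property the plots rely on: baseline output plus all SHAP values equals the model output.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition are replaced by their baseline value.
    Brute force: the cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # share of feature orderings in which exactly s precedes feature i
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical score built from goal and word_count."""
    goal, word_count = features
    return -0.5 * goal + 0.25 * word_count + 0.1 * goal * word_count


x, baseline = [2.0, 4.0], [1.0, 2.0]
phi = shapley_values(toy_model, x, baseline)
# Additivity: baseline output + sum of contributions equals the output for x
print(phi, toy_model(baseline) + sum(phi), toy_model(x))
```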

Let’s take a look at the first row of the summary plot. If a Kickstarter project owner set the goal high (pink dots), the model output was likely 0 (negative SHAP value, not successful). This makes sense: if you set the bar for the funding goal too high, you may not reach it. On the other hand, if you set it very low, you are likely to achieve it by asking just a few of your friends. The column word_count also shows a clear relationship: few words in the name have a negative impact on the model output, in the sense that the project is likely to fail. Perhaps more words in the name deliver more information, so that potential supporters become interested after reading just the title. The other columns show a more complex picture, as there are pink dots in mainly blue areas and vice versa.

Listing 5
shap_values = model.get_feature_importance(
    Pool(X_test, label=y_test, cat_features=categorical_features_indices),
    type="ShapValues",
)
# the last column holds the model's expected (base) value
expected_value = shap_values[0, -1]
shap_values = shap_values[:, :-1]
shap.initjs()
shap.force_plot(expected_value, shap_values[10, :], X_test.iloc[10, :])








The great thing about the SHAP package is that it lets us dive even deeper into the exploration of our model: it gives us the feature contributions for every single prediction (Listing 5).

In the force plot (Figure 8), we can see the row at position 10 of our test dataset. This was a correct prediction of a successful project. Features shown in pink push the model output higher, that is, toward predicting the success of the Kickstarter project. Blue parts of the visualization push the model output lower, toward predicting a failed project. The biggest block here is the feature category, which in this case is Tabletop Games. Therefore, for this particular project, being a tabletop game is the most informative feature for the model. The short period of 28 days during which the project was online also contributes toward the prediction of success.
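A force plot is a visual form of SHAP's additivity: starting from the expected value, the pink features push the output up and the blue features push it down, and the bars end at the actual model output. A minimal sketch with invented contributions (the numbers are illustrative and are not the values behind Figure 8; the function name is hypothetical):

```python
def force_plot_summary(expected_value, contributions):
    """Order one prediction's SHAP values the way a force plot stacks them.

    contributions maps feature name -> SHAP value. Positive values (drawn pink)
    push the output up, negative values (drawn blue) push it down.
    """
    up = sorted((item for item in contributions.items() if item[1] > 0),
                key=lambda item: item[1], reverse=True)
    down = sorted((item for item in contributions.items() if item[1] < 0),
                  key=lambda item: item[1])
    output = expected_value + sum(contributions.values())
    return output, up, down


# Hypothetical SHAP values for a single successful project
output, up, down = force_plot_summary(
    -0.4, {"category": 1.2, "delta_days": 0.3, "goal": -0.2, "word_count": 0.1}
)
print(round(output, 6), [name for name, _ in up], [name for name, _ in down])
```

The first entry of up corresponds to the biggest pink block in the plot.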

Another example is row 33161 of the test dataset, which was a correct prediction of a failed project. As we can see in the force plot (Figure 9), generated by the following call, the biggest block is the feature goal: apparently, the goal of $25,000 was set too high.

shap.force_plot(expected_value, shap_values[33161, :], X_test.iloc[33161, :])

So now we have a better look at our model for this Kickstarter dataset. One could also explore the false predictions, that is, the false positives and false negatives, to get an even deeper understanding of the model. There, you could see which features the model concentrated on when it produced an incorrect output. There are also many other visualizations, such as interaction values. Check out the documentation [6] if you are interested.

Fig. 9: SHAP force plot of a failed project

Fig. 10: SHAP package for image recognition
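Finding the rows to inspect is straightforward once true labels and predictions are available as sequences. A small sketch, assuming the labels are encoded as 1 (successful) and 0 (failed); the function name is hypothetical, and the resulting positions could be passed to shap.force_plot in place of the row numbers used above:

```python
def error_indices(y_true, y_pred, positive=1):
    """Return the row positions of false positives and false negatives."""
    pairs = list(enumerate(zip(y_true, y_pred)))
    false_pos = [i for i, (t, p) in pairs if p == positive and t != positive]
    false_neg = [i for i, (t, p) in pairs if p != positive and t == positive]
    return false_pos, false_neg


fp, fn = error_indices([1, 0, 1, 0, 1], [1, 1, 0, 0, 1])
print(fp, fn)  # [1] [2]
```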

Outlook

The SHAP package is also useful in other machine learning tasks, for example image recognition. In Figure 10 [7], you can see which pixels contributed to which model output.

SHAP gives us the opportunity to better understand the model and which features contributed to which prediction. The package allows us to check whether the model only takes into account features that make sense. This is the first step toward preventing models from making predictions based on the wrong input features. Thus, machine learning becomes less of a “black box”, and we are getting closer to explainable machine learning.

Natalie Beyer is a co-founder of the data science consultancy LAVRIO.solutions. The company builds customized and precise solutions for its clients’ needs. With her background in psychology and statistics, she is responsible for model building, data exploration and visualization of results.

Links & Literature

[1] https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

[2] https://www.kaggle.com/kemical/kickstarter-projects

[3] https://catboost.ai/

[4] https://catboost.ai/#benchmark

[5] https://github.com/slundberg/shap

[6] https://github.com/slundberg/shap#shap-interaction-values

[7] https://github.com/slundberg/shap#deep-learning-example-with-gradientexplainer-tensorflowkeraspytorch-models



Deep Learning: Not Only in Python

Training TensorFlow models with JVM languages

Although there are powerful and comprehensive machine learning solutions for the JVM with frameworks such as DL4J, it may be necessary to use TensorFlow in practice. This can, for example, be the case if a certain algorithm exists only in a TensorFlow implementation and the effort to port the algorithm to another framework is too high. Although you interact with TensorFlow via a Python API, the underlying engine is written in C++. Using the TensorFlow Java wrapper library, you can train TensorFlow models and run inference with them from the JVM without having to rely on Python. Existing interfaces, data sources, and infrastructures can be integrated with TensorFlow without leaving the JVM.

by Christoph Henkelmann

AI and deep learning are certainly hot topics at the moment, and despite some initial setbacks, e.g. in the field of self-driving cars, the potential of deep learning is far from exhausted. But there are still many areas of IT in which the topic is only just gaining momentum. It is therefore particularly important to investigate how deep learning systems can be implemented on the JVM, as Java (both the language and the platform) is still the dominant technology in the enterprise sector.

TensorFlow is one of the most important frameworks in the field of deep learning. Despite the increasing popularity of Keras, it is still hard to imagine doing without TensorFlow, especially as Google, the top dog in AI, continues to drive its development forward. This article shows how TensorFlow can be used on the JVM to train TensorFlow models and to run inference with them.

What is the combination of TensorFlow and JVM suitable for?

DL4J is the only professional deep learning framework that is really at home on the JVM, so if you would like to use deep learning on the JVM, DL4J is usually the best choice. TensorFlow, like many machine learning frameworks, is mainly used with Python. However, there are reasons to use TensorFlow within a JVM context:

• You want to use a method that has an implementation in TensorFlow, but not in DL4J, and the porting effort is too high.

• You are working with a data science team that is used to working with TensorFlow and Python, but the target infrastructure runs on the JVM.

• The data needed for training lies within a Java infrastructure (databases, custom data formats, APIs), and in order to get to the data, existing interface code would have to be ported from Java to Python.

The JVM/TensorFlow combination is therefore useful whenever an existing Java environment is available and, for personnel or project-related reasons, TensorFlow has to be used for deep learning (see the box: "TensorFlow and JVM – always a good idea?").

How does TensorFlow work?

Before you start to use a new framework, it is important to take a look at what happens under the hood (see the box: "TensorFlow cheat sheet"). When thinking of TensorFlow, the first things that come to mind are AI and neural networks. But from a technical point of view, TensorFlow is mainly a framework that can execute complex, iterative, parallel calculations on tensors, GPU-accelerated where possible. Although deep learning is the main field of application for TensorFlow, it can also be used for any other calculation.

A TensorFlow program, or more precisely the configuration of a calculation, is always structured as a graph in TensorFlow. The nodes of the graph represent operations, such as adding or multiplying, but also loading and saving. Everything that TensorFlow does takes place in the nodes of a previously defined calculation graph.

The nodes (operations) of the graph are connected by edges through which the data flows in the form of tensors. Hence the name TensorFlow.

All calculations in TensorFlow take place in a so-called session. In the session, either a finished graph is loaded, or a new graph is created piece by piece by API calls. Special nodes in the graph can contain variables. In order for the graph to work, these must be initialized. Once this has happened and a session with a finished, initialized graph exists, we interact with TensorFlow only by calling operations in the graph. What is calculated depends on which output nodes of the graph are queried. Thus, not the entire graph is executed, but only the operations that provide input for the queried node, then their input nodes, and so on, back to the input operations, which must be filled with the necessary input tensors.
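The idea that only the queried node and its ancestors are computed can be illustrated with a toy graph evaluator. This is a deliberately simplified sketch in plain Python, not how TensorFlow is implemented; the class and node names are hypothetical. Each node stores an operation and the names of its inputs, input placeholders are supplied as a feed dictionary, and run() records which operations actually executed.

```python
class ToyGraph:
    """Minimal computation graph: node name -> (operation, input node names)."""

    def __init__(self):
        self.ops = {}

    def add(self, name, fn, *inputs):
        self.ops[name] = (fn, inputs)

    def run(self, fetch, feeds):
        """Evaluate `fetch`, executing only the operations it depends on."""
        values = dict(feeds)  # input placeholders filled by the caller
        executed = []

        def evaluate(name):
            if name not in values:
                fn, inputs = self.ops[name]  # KeyError: a needed input was not fed
                values[name] = fn(*(evaluate(i) for i in inputs))
                executed.append(name)
            return values[name]

        return evaluate(fetch), executed


graph = ToyGraph()
graph.add("sum", lambda a, b: a + b, "x", "y")
graph.add("product", lambda a, b: a * b, "x", "y")
graph.add("double_sum", lambda s: 2 * s, "sum")
graph.add("square_x", lambda a: a * a, "x")

print(graph.run("double_sum", {"x": 3, "y": 4}))  # (14, ['sum', 'double_sum'])
print(graph.run("square_x", {"x": 3}))           # (9, ['square_x']); y is not needed
```

The product node is never executed in either call, and the second call works without feeding y because square_x does not depend on it.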

An important point is that TensorFlow automatically differentiates all operations for the user, which is needed for training neural networks. However, the user can safely ignore this, since it happens automatically.
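Automatic differentiation means that derivatives are obtained from the operations themselves rather than written by hand. TensorFlow uses reverse-mode differentiation (backpropagation); the sketch below swaps in the simpler forward mode with dual numbers, which shows the same principle in a few lines of plain Python. The class Dual is our own illustration and not part of any library.

```python
class Dual:
    """A value paired with its derivative (forward-mode automatic differentiation)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f at x with derivative seed 1 and return f'(x)."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return x * x + 3 * x  # f'(x) = 2x + 3


print(derivative(f, 2.0))  # 7.0
```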

Usually, the graph is defined via a Python API. It can be represented graphically with auxiliary programs (Fig. 1). But such representations serve only for debugging: unlike in a visual programming language such as LabVIEW, the graph is not programmed graphically.

Although Python is used to interact with TensorFlow in most examples, the actual engine is written in C/C++. Therefore, you can use TensorFlow with any language that can call C functions. Thus, you can also perform calculations in TensorFlow from the JVM.

TensorFlow training and inference with Python

The training of a TensorFlow model with Python (box: “tf.data or feeding?”) can be separated into the following steps:

• Create the graph, either via several API calls that compose the graph or by loading a *.pb file that contains the graph

• Create a session for the graph

• Initialize the graph variables, either by calling a special operation in the graph that fills the variables with default values or by loading a pre-trained model

After these three steps, we have an executable TensorFlow session with a functioning model. If we want to (further) train it, the following three steps are executed in a loop until the model has learned enough, either for a fixed number of training steps defined beforehand or until the training error drops below a certain level:

• Package the input data in arrays and assign them to the input tensors

• Select the output nodes and pack them into a list

• Execute the session: a special command causes the session to perform the operations necessary to generate the selected outputs

Fig. 1: A (small) section of a TensorFlow graph: The numbers on the edges indicate the size of the tensor flowing through them, the arrows indicate the direction.

TensorFlow and JVM – always a good idea?

Although there may be good reasons for this combination, it is also important to mention what may speak against it. The choice of TensorFlow in particular should be well considered:

■ TensorFlow is not a suitable framework for deep learning or machine learning beginners.

■ TensorFlow is not user-friendly: The API changes quickly, and amid the mass of instructions it is often not clear which path is best.

■ TensorFlow isn’t better just because it is made by Google: Deep learning is math, and math is the same for everyone. TensorFlow does not create “smarter” AIs than other frameworks. It is also not faster than the alternatives (but also not “dumber” or slower).

If you want to get into deep learning and stay on the JVM, DL4J is strongly recommended, especially for professional enterprise projects. But if you want to look beyond the JVM and try out a bit of Python, it is worth trying out alternatives to TensorFlow. Here, you are currently better off with Keras, thanks to its much more convenient API.
But where does the training take place? It happens by executing the correct output nodes. For TensorFlow, there is no difference between training and inference: mathematical operations are simply performed in the calculation graph. We speak of training if these operations lead to a neural network that learns better weights to solve a problem. However, the API calls for training and for any other type of usage are the same.

During training, our input consists of the data that is to be learned (for example, an image as a two-dimensional tensor and the label "dog" or "cat" in the form of an integer ID in a zero-dimensional tensor). By running the correct nodes, TensorFlow updates some variables in the graph to improve the prediction. The main difference between training and inference is that during training we periodically save the current state of the graph variables, which are constantly changing, while this is pointless during inference because they remain constant.
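Treating a tensor as a nested array makes the dimensions in this example easy to check. A small sketch in plain Python, assuming rectangular nested lists; the function name and the 28 x 28 image size are arbitrary choices for illustration:

```python
def shape(tensor):
    """Shape of a rectangular nested-list "tensor"; a plain number has shape ()."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


image = [[0.0] * 28 for _ in range(28)]  # hypothetical 28 x 28 grayscale image
label = 1                                # integer class ID, e.g. 1 = "cat"
batch = [image, image]                   # several images trained at once

print(shape(image), shape(label), shape(batch))  # (28, 28) () (2, 28, 28)
```

The number of dimensions (the rank) is simply the length of the shape: 2 for the image, 0 for the label.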

The TensorFlow Java API

Since TensorFlow is implemented internally in C/C++, all operations that we call from Python for training or inference can also be called via JNI.

Fortunately, we no longer have to bother wrapping the low-level C API with JNI ourselves, as Google has already done this for us. The necessary libraries are, as usual, available on Maven Central. There are four different artifacts, all in the group org.tensorflow:

• tensorflow: A metapackage with dependencies on libtensorflow and libtensorflow_jni; in order to avoid confusion, it should not be used.

• libtensorflow: The API against which you program in Java; this is the compile and runtime dependency and the central entry point.

• libtensorflow_jni: Contains the native CPU dependencies for libtensorflow; this artifact is needed at runtime on a machine without a GPU. It contains native code for Windows, Linux and Mac, and TensorFlow is completely included: you don't have to install Python or TensorFlow on the target system.

• libtensorflow_jni_gpu: The GPU equivalent of libtensorflow_jni; you should use this dependency on a computer with an NVIDIA GPU on which CUDA and cuDNN are installed correctly. It only works under Windows and Linux; there is no GPU support for TensorFlow under macOS.

The version numbers of the Java wrappers correspond to the version number of the included TensorFlow version. Here we should always use the newest stable release. We only have to pay attention if the code is supposed to be executed on a computer with a GPU (box: "Selecting the GPU to be used"). Not every TensorFlow version supports every CUDA and cuDNN version (CUDA is NVIDIA's platform for parallel calculations on graphics cards, cuDNN is a CUDA-based library for neural networks). We must ensure that the CUDA and TensorFlow versions match. At the time of writing, all TensorFlow versions from 1.13 on support the same CUDA version: 10.0. With a Java-based solution, we already have a great advantage over Python software when installing the finished software. Thanks to Maven, our resulting artifact already includes all dependencies. Neither Python nor TensorFlow nor any Python libraries have to be pre-installed, nor do their installations have to be managed with a tool like Anaconda.

TensorFlow cheat sheet

■ Tensor: The basis for calculations in TensorFlow. A tensor is actually an object from linear algebra, but for our purposes it is completely sufficient to consider a tensor as a multidimensional array (mostly of float or double values, sometimes also char or boolean). TensorFlow uses tensors for everything: all data that TensorFlow consumes, produces, and uses internally is packaged in tensors, hence the name.

■ Graph: The definition of a TensorFlow calculation procedure, usually stored in a file called graph.pb in a binary ProtoBuf format, similar to a Java .class file.

■ Training: When training a machine learning method, data and expected results are presented to the algorithm over and over again, whereupon the algorithm adjusts the internal parameters of the model to improve the result. This is sometimes called "learning", although it has little to do with human learning.

■ Inference: Depending on the application, you may want to use a machine learning process to classify, predict, translate, create content, and much more. All these applications are summarized under the term inference, which roughly means "using a procedure to obtain a result". This is what we want to do most of the time in live use after training. During inference, a procedure does not learn.

■ Model: The learned parameters of a machine learning procedure, for example a neural net. This is the result of the learning process and is necessary to obtain results (the variable state of the graph, so to speak). It is distributed over several files: one *.index file and one or more *.data files, for example *.data-00000-of-00001. The first number is the consecutive number of the file (counting from 0), the second the total number of files.

■ Session: The context in which TensorFlow is executed, comparable to a running JVM instance. In order to use TensorFlow, we need to create a session in which a graph is loaded that is initialized with a model. In Java, by analogy, a JVM instance must be started in which classes are loaded that are instantiated with constructor parameters.
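The naming scheme of the checkpoint files can be written down as a tiny helper. This sketch assumes the usual TensorFlow checkpoint layout: one prefix.index file plus data shards whose index and total count are zero-padded to five digits. The helper name is hypothetical.

```python
def checkpoint_files(prefix: str, num_shards: int = 1) -> list:
    """Names of the index and data files of a TensorFlow checkpoint with this prefix."""
    names = [f"{prefix}.index"]
    names += [f"{prefix}.data-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]
    return names


print(checkpoint_files("model"))
# ['model.index', 'model.data-00000-of-00001']
```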

You should not use the top-level dependency tensorflow; it is better to directly use libtensorflow and one of the *_jni implementations. The reason is that the tensorflow artifact has a dependency on libtensorflow_jni (the CPU variant). If we now add libtensorflow_jni_gpu, the native CPU code is still used, and one wonders why everything runs so slowly despite the GPU. The Gradle dependencies for TensorFlow training on the GPU look like this:

dependencies {
    // Java API: needed at compile time and at runtime
    implementation "org.tensorflow:libtensorflow:1.14.0"
    // native GPU code: only needed at runtime
    runtimeOnly "org.tensorflow:libtensorflow_jni_gpu:1.14.0"
}

tf.data or feeding?

There are two ways to load training data into the graph when you train a TensorFlow model in Python: the tf.data API or so-called "feeding", i.e. passing in individual data for each calculation step. The tf.data API is implemented internally in C++, integrated directly into the graph, and therefore very fast, but it is also complicated to use and very difficult to debug. The feeding method is easy to use and understand, but you need Python code at runtime. As a result, Python usually slows down the more expensive graphics card, and valuable GPU capacity goes unused. But which approach do we take in Java? Fortunately, Java is orders of magnitude faster than Python, so here we get the best of both worlds: we can use the easy-to-understand feeding method and still get full performance. That is why we leave the tf.data API out of this article; we just don't need it.

Listing 1
import tensorflow as tf  # TensorFlow 1.x API

# This Python command creates a node for initialization
init_op = tf.global_variables_initializer()
# The saver is an auxiliary class that stores a model in Python.
saver = tf.train.Saver()
# Saving is a graph operation
# and can only be executed within a session
with tf.Session() as sess:
    # Initialize the variables
    sess.run(init_op)
    # Save the state (filename is the checkpoint prefix)
    save_path = saver.save(sess, filename)

Speaker

Christoph Henkelmann
DIVISIO

Christoph Henkelmann holds a degree in Computer Science from the University of Bonn. He is currently working at DIVISIO, an AI company from Cologne, where he is CTO and co-founder. At DIVISIO, he combines practical knowledge from two decades of server and mobile development with proven AI and ML technology. In his spare time he grows cacti, practices the piano and plays video games.

The required Java API for training and inference is simple and manageable. Only four classes are important: Graph, Session, Tensor and Tensors. We can now see how to use them correctly by rebuilding the typical Python training steps in Java.

TensorFlow training in Java

The first step in training is to define the graph. Unfortunately, we have to make our first and only compromise right at the beginning. A graph can also be built step by step using the Java API, but for many node types, the Python API automatically generates numerous helper nodes that are required for using the graph smoothly. In order to build these in Java, we would need a very detailed knowledge of the Python API internals. This step must therefore be done once in advance in Python. We then store the resulting graph file as a Java resource in order to load it back into the JVM later. Saving the current graph in Python is very easy:

with open(filename, 'wb') as f:
    f.write(tf.get_default_graph().as_graph_def().SerializeToString())

Important: Even though the method used here is called SerializeToString(), the result is a binary file. For convenience, we should also save the initialized variables here (Listing 1). Although initializing the variables in the graph from the JVM would be easy, always following the procedure shown here makes it easier to do transfer training with complex models later. In transfer training, an already existing state of a model is trained further and adapted.

Now we have saved the graph and the model and can execute and train the graph in Java. For the sake of brevity, the following examples are in Kotlin, but they can be transferred to any JVM language:

// create an empty graph
val graph = Graph()
// load the *.pb file, either from a file or from the resources
val graphDefBytes = javaClass.getResource(resourceName).readBytes()
// reconstruct the graph from the file contents
graph.importGraphDef(graphDefBytes)

Session

Multi Tasking Deep Learning for Natural Language Processing – Transfer Learning
Debjyoti Paul (Amazon)

In this talk, we will cover how to model different natural language processing tasks. In current NLP tasks like word-based or sentence-based classification, sentence generation and question answering, it is a challenge to train models with little domain information. The key solution is to use a pre-trained model and transfer learning. BERT from Google and MT-DNN from Microsoft have been breaking all benchmarks in recent years. Understanding how to use transfer learning and multi-tasking is key to building a model for the task. In this talk, we will discuss different models like ULMFiT, GPT and BERT, which are popular for transfer learning, and then analyze how multi-tasking can immensely improve this task, as well as different ways of doing multi-tasking.

Now we have loaded the TensorFlow graph into the JVM. In order to do something with it, we need a session:

val session = Session(graph)

We only have to load the latest version of the variables before we can really get started. This can be either the file initially saved in Python or the last state of a previous training run, for example in order to continue training. Loading the variables is just an operation in the TensorFlow graph, and this operation needs a string packed into a tensor. The string contains the name of the *.index file without the suffix, so foo instead of foo.index.
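Deriving that string from the index file name takes one line; the small Python sketch below (function name hypothetical) also guards against accidentally passing a path without the .index suffix:

```python
from pathlib import PurePosixPath


def checkpoint_prefix(index_file: str) -> str:
    """Strip the .index suffix: 'models/foo.index' -> 'models/foo'."""
    path = PurePosixPath(index_file)
    if path.suffix != ".index":
        raise ValueError(f"expected a *.index file, got {index_file!r}")
    return str(path.with_suffix(""))  # removes only the last suffix


print(checkpoint_prefix("models/foo.index"))  # models/foo
```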

Here, we need the Tensors class for the first time. This class contains helper functions that package Java data types into Tensor objects, automatically making sure that the tensor has the correct shape. Important for every Tensor object: it contains memory that has been allocated outside the JVM. Therefore, it must be closed manually, for which it implements the AutoCloseable interface. In Java, each tensor needs its own try { ... } finally { tensor.close(); } block (or a try-with-resources statement). Fortunately, this is much easier in Kotlin with use:

Tensors.create(path).use { pathTensor ->
    session.runner()
        .feed("save/Const", pathTensor)
        .addTarget("save/restore_all")
        .run()
}

Here we can see all the necessary parts of a TensorFlow action on the JVM:

• A runner is created for the session; this class has a builder API that defines what is supposed to be executed.

• The input node for loading and saving ("save/Const") is filled with the tensor that contains the file name.

• The restore operation ("save/restore_all") is set as the target.

• The action is executed.

The trick for all operations is to know their names. But since we build the graph ourselves beforehand and can define the name of a node when creating it, we can choose the names ourselves. The exceptions are the nodes for loading and saving, which by default have the names stated here.

Now we have seen all the operations needed to interact with TensorFlow from the JVM. Carrying out a training step is now very easy. Let's assume that our input is an array of loaded images. The grayscale values of the pixels are converted to float values in the range 0 to 1. Each image belongs to a class defined by an int value, for example 0 = dog, 1 = cat. The input for a batch (multiple images are always trained at once) is then a float[][] array containing the images and an int[] array containing the classes to learn. A training step can now be executed as follows (Listing 2).
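The preparation of such a batch can be sketched in a few lines, here in Python for brevity. The sketch assumes 8-bit pixel values (0 to 255) and flattens each image into one row of the float[][] array; both are assumptions, since the article does not fix the image format, and the function name is hypothetical.

```python
def prepare_batch(images, labels):
    """Turn 8-bit grayscale images and class IDs into a float batch and an int array.

    Each image (a list of pixel rows) is flattened into one row of floats in [0, 1].
    """
    batch = [[pixel / 255.0 for row in image for pixel in row] for image in images]
    return batch, [int(label) for label in labels]


images = [[[0, 255], [51, 102]]]  # one hypothetical 2 x 2 image
batch, classes = prepare_batch(images, [1])  # 1 = cat
print(batch, classes)  # [[0.0, 1.0, 0.2, 0.4]] [1]
```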

We see the same pattern again: a runner is created, the inputs are packaged into tensors, the target ("optimize") is selected and the action is executed. But now there is something new: we get values back. The names of the nodes whose output is to be returned are defined with fetch. The names contain the suffix ":0". A node can have several outputs, and the :0 suffix means that the output with index 0 of the node should be returned.

Selecting the GPU to be used

Sometimes we don't want to block all GPUs on systems with multiple GPUs, for example to run multiple trainings in parallel. For this, we can configure the TensorFlow graph, which normally allocates the GPU or GPUs automatically, so that only one GPU is used. The big disadvantage, though, is that the graph is then "hard-wired" to a certain GPU and can be used only on this GPU. It is much more convenient to show or hide the GPUs via an environment variable before starting the JVM. This can easily be done with the environment variable CUDA_VISIBLE_DEVICES, which takes a comma-separated list of the CUDA devices that should be visible in the current shell. Caution: the numbering starts at 0, not at 1. The following console command, for example, activates only the second graphics card for TensorFlow (or other deep learning frameworks):

export CUDA_VISIBLE_DEVICES=1
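The name:index convention is easy to take apart. A minimal Python sketch; the function is hypothetical, and treating a name without a suffix as output 0 is an assumption of this sketch:

```python
def parse_tensor_name(name: str):
    """Split a tensor name like 'total_loss:0' into (operation name, output index)."""
    op_name, sep, index = name.rpartition(":")
    if not sep:
        return name, 0  # assumption: no suffix means the first output
    return op_name, int(index)


print(parse_tensor_name("total_loss:0"))  # ('total_loss', 0)
print(parse_tensor_name("split:1"))       # ('split', 1)
```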

The output is a list of Tensor objects. These can be converted into various primitive types and arrays to make the result available. Important: the Tensor objects created by the API also have to be closed. Normally, we would have to iterate over the entries in the list and close them in a finally block. However, this is very inconvenient and difficult to read. Therefore, it is useful to define an extended use API in Kotlin, with which several objects within a block are marked with use, or with useAll for lists of closeable objects, and are then closed safely (Listing 3).

This useful trick allows you to close all tensors within a TensorFlow call conveniently and safely.

Inference in Java is even easier. Remember: every action on the TensorFlow graph is performed by filling input nodes with input tensors and querying the correct output nodes. For our example above, this means that the code remains the same, except that we don't set the inputs for the correct solution (labels). This makes sense because we don't know them yet. We also neither fetch the nodes for the error calculation (total_loss:0, accuracy:0) nor run the target that updates the neural net (optimize), so no learning takes place. Instead, we only query the result (prediction). Since the solutions are not needed to calculate the result, everything works just like before: there is no error because the part of the graph that trains the neural net remains inactive.
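The difference between a training call and an inference call can be made explicit by writing down which nodes each call feeds, fetches and runs. The `GraphCall` class below is a hypothetical illustration, not part of the TensorFlow Java API; the node names are those used in Listing 2.

```kotlin
// Hypothetical description of one session.runner() call:
// which input nodes are fed, which outputs are fetched, which operations run as targets
data class GraphCall(
    val feeds: Set<String>,
    val fetches: List<String>,
    val targets: Set<String>
)

// Training (as in Listing 2): images and labels in; loss, accuracy and prediction out;
// "optimize" runs the weight update
val trainingCall = GraphCall(
    feeds = setOf("inputs", "labels"),
    fetches = listOf("total_loss:0", "accuracy:0", "prediction"),
    targets = setOf("optimize")
)

// Inference: only images in, only the prediction out, no target,
// so the training part of the graph is never evaluated
val inferenceCall = GraphCall(
    feeds = setOf("inputs"),
    fetches = listOf("prediction"),
    targets = emptySet()
)
```

The inference call is a strict subset of the training call, which is why the same graph can serve both purposes.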

Practical experiences
The method presented here is not only an interesting experiment; the author has already used it successfully in several commercial projects. Several advantages have emerged in practical use:

• The Java API is fast and efficient: there is no performance loss compared to the pure Python application. On the contrary: since Java is much faster than Python for tasks like data import and pre-processing, it is even easier to implement a high-performance training process.

• Training runs completely stably over several days; Google's Java implementation has proven to be very reliable.

• The deployment of the finished product is much easier than that of Python-based products, since only a Java runtime environment and the correct CUDA drivers need to be present – all dependencies are part of the Java TensorFlow library.

• TensorFlow's low-level persistence API (as presented here) is easier to use than many of the "official" methods, such as estimators.

The only real drawback is that part of the project is still Python-based – the definition of the graph. So you need a team that is at least partly at home in the Python world.

Listing 2

```kotlin
fun train(inputs: Array<FloatArray>, labels: IntArray) {
    withResources {
        val results: List<Tensor<*>> = session.runner()
            .feed("inputs", Tensors.create(inputs).use())
            .feed("labels", Tensors.create(labels).use())
            .fetch("total_loss:0")
            .fetch("accuracy:0")
            .fetch("prediction")
            .addTarget("optimize")
            .run()
            .useAll()
        val trainingError = results[0].floatValue()
        val accuracy = results[1].floatValue()
        // intValue() only works for scalar tensors; a per-image prediction
        // vector would have to be read with copyTo() instead
        val prediction = results[2].intValue()
    }
}
```

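Listing 3

```kotlin
class Resources : AutoCloseable {
    private val resources = mutableListOf<AutoCloseable>()

    // Registers a single resource to be closed when the block ends
    fun <T : AutoCloseable> T.use(): T {
        resources += this
        return this
    }

    // Registers every element of a collection, e.g. the tensors returned by run()
    fun <T : Collection<AutoCloseable>> T.useAll(): T {
        resources.addAll(this)
        return this
    }

    // Closes in reverse order of registration; the first failure is rethrown,
    // later failures are attached to it as suppressed exceptions
    override fun close() {
        var exception: Exception? = null
        for (resource in resources.reversed()) {
            try {
                resource.close()
            } catch (closeException: Exception) {
                if (exception == null) {
                    exception = closeException
                } else {
                    exception.addSuppressed(closeException)
                }
            }
        }
        if (exception != null) throw exception
    }
}

// Runs the block in a fresh Resources scope and closes everything registered in it
inline fun <T> withResources(block: Resources.() -> T): T = Resources().use(block)
```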

Christoph Henkelmann holds a degree in Computer Science from the University of Bonn. He is currently working at DIVISIO (https://divis.io), an AI company from Cologne, where he is CTO and co-founder. At DIVISIO, he combines practical knowledge from two decades of server and mobile development with proven AI and ML technology. In his spare time he grows cacti, practices the piano and plays video games.





Continuous Delivery for Machine Learning
Learn how to combine ML and CD

In modern software development, we've grown to expect that new software features and enhancements will simply appear incrementally, on any given day. This applies to consumer applications such as mobile, web, and desktop apps, as well as modern enterprise software. We're no longer tolerant of big, disruptive software deployments. ThoughtWorks has been a pioneer in Continuous Delivery (CD), a set of principles and practices that improve the throughput of delivering software to production in a safe and reliable way.

by Danilo Sato, Arif Wider and Christoph Windheuser

As organizations move to become more “data-driven” or “AI-driven”, it's increasingly important to incorporate data science and data engineering approaches into the software development process to avoid silos that hinder efficient collaboration and alignment. However, this integration also brings new challenges when compared to traditional software development. These include:

A higher number of changing artifacts: Not only do we have to manage the software code artifacts, but also the data sets, the machine learning models, and the parameters and hyperparameters used by such models. All these artifacts have to be managed, versioned, and promoted through different stages until they're deployed to production. It's harder to achieve versioning, quality control, reliability, repeatability and auditability in that process.

Size and portability: Training data and machine learning models usually come in volumes that are orders of magnitude larger than the software code. As such, they require different tools that are able to handle them efficiently. These tools impede the use of a single unified format to share those artifacts along the path to production, which can lead to a “throw it over the wall” attitude between different teams.

Different skills and working processes in the workforce: To develop machine learning applications, experts with complementary skills are necessary, and they sometimes have contradicting goals, approaches, and working processes:

Data scientists look into the data, extract features and try to find models which best fit the data to achieve the predictive and prescriptive insights they seek. They prefer a scientific approach, defining hypotheses and verifying or rejecting them based on the data. They need tools for data wrangling, parallel experimentation, rapid prototyping, data visualization, and for training multiple models at scale.

Developers and machine learning engineers aim for a clear path to incorporate and use the models in a real application or service. They want to ensure that these models run as reliably, securely, efficiently and scalably as possible.

Data engineers do the work needed to ensure that the right data is always up-to-date and accessible in the required amount, shape, speed, and granularity, as well as with high quality and minimal cost.

Business representatives define the outcomes to guide the data scientists' research and exploration, and the KPIs to evaluate whether the machine learning system is achieving the desired results at the desired quality levels.

Continuous Delivery for Machine Learning (CD4ML) is the technical approach to solve these challenges, bringing these groups together to develop, deliver, and continuously improve machine learning applications.

The Continuous Intelligence Cycle
In the first article [1] of The Intelligent Enterprise series, we introduced the Continuous Intelligence cycle (see Figure 2).

This is a fundamental cycle of transforming data into information, insights and actions that support an organization as it moves towards data-driven decision making. In traditional organizations, this cycle relies on legacy systems (e.g. data warehouses, ERP systems) and human decision making. In these organizations, the process is slow and contains many friction points: machine learning applications are often developed in isolation and never leave the proof of concept phase. If they make it into production, this is often a one-time ad-hoc process that makes it difficult to update and re-train them, leading to stale and outdated models.

Intelligent Enterprises implement ways to speed up the Continuous Intelligence cycle and remove the different friction points along the way. CD4ML is the technical approach to accelerate the value generation of machine learning applications as part of the Continuous Intelligence cycle. It enables you to move away from offline or bench models and manual deployments: to automate the end-to-end process of gathering information and insights out of data, to productionize decisions and actions based on those insights, and to collect more data to measure the outcomes once actions have been taken. This allows the Continuous Intelligence cycle to run faster and produce higher quality outcomes at lower risk by allowing feedback to be incorporated into the process.

What is CD4ML?
To understand CD4ML, we first need to understand Continuous Delivery (CD) and where its principles originated. Continuous Delivery, as Jez Humble and David Farley defined it in their seminal book, is: “… a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time”, which can be achieved if you “…create a repeatable, reliable process for releasing software, automate almost everything and build quality in.” They also state: “Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes, and experiments — into production, or into the hands of users, safely and quickly in a sustainable way.” Changes to machine learning models are just another type of change that needs to be managed and released into production. Besides the code, however, such changes require our CD toolset to be extended so that it can handle new types of artifacts. What's more, the whole process of producing software in short cycles becomes more complex because there is more variety in the team's skill sets (data scientists, data engineers, developers and machine learning engineers), with each following different workflows.

Fig. 1: Continuous Delivery for Machine Learning (CD4ML) integrates the different development processes and workflows of roles with different skill sets for the development of machine learning applications

Fig. 2: The Continuous Intelligence Cycle

ThoughtWorks has further developed the Continuous Delivery approach to overcome these challenges and make it applicable to machine learning applications, and calls this new approach Continuous Delivery for Machine Learning (CD4ML). It allows us to extend the Continuous Delivery definition to incorporate the new elements required to speed up the Continuous Intelligence cycle:

Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.

This definition contains all the basic principles:

Software engineering approach. It enables teams to efficiently produce high quality software.

Cross-functional team. Experts with different skill sets and workflows across data engineering, data science, development, operations, and other knowledge areas are working together in a collaborative way, emphasizing the skills and strengths of each team member.

Producing software based on code, data, and machine learning models. All artifacts of the software production process (code, data, models, parameters) require different tools and workflows and must be managed accordingly.

Small and safe increments. The release of software artifacts is divided into small increments; this provides visibility and control over the levels of variance of the outcomes, adding safety into the process.

Reproducible and reliable software release. The process of releasing software into production is reliable and reproducible, leveraging automation as much as possible. This means that all artifacts (code, data, models, parameters) are versioned appropriately.

Software release at any time. It's important that the software could be delivered into production at any time. Even if organizations don't want to deliver software all the time, being ready for release makes the decision about when to release a business decision instead of a technical one.

Short adaptation cycles. Short cycles mean that development cycles are on the order of days or even hours, not weeks, months, or even years. To achieve this, you want to automate the process, with quality safeguards built in. This creates a feedback loop that enables you to adapt your models as you learn from their behavior in production.

How it all works together
CD4ML aims to automate the end-to-end machine learning lifecycle and ensures a continuous and frictionless process from data capture, modeling, experimentation, and governance, to production deployment. Figure 3 gives an overview of the whole process.

Starting at the left side of the cycle, data scientists work on data they discover and access from data sources. They wrangle the data, perform feature extraction, split the data into training and test data, build data models and experiment with all of them. They write code to train the models (often in Python or R) and tune them by choosing parameters and hyperparameters. As these models are trained, the data scientists are constantly evaluating them. This means looking at the model's error rate, the confusion matrix, the number of false positives and false negatives, or running certain test scripts, for example, for chatbots. The tests should be as automated as possible with the help of test environments, test scripts or test programs. Once a good model is found, it's ready to be productionized. The model has to be adapted to the production environment. This could mean containerization of the model code or even transforming it to a high-performance language like Java or C++, either manually or using automatic transformation tools. The productionized version of the model has to be tested again in conjunction with other components of the overall architecture before it can be deployed to production.
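Splitting data into training and test sets is one of the steps that has to be reproducible if models are to be rebuilt later. A minimal Kotlin sketch, assuming the records fit in memory and are always supplied in the same order; the function name and the fixed seed are illustrative choices, not part of the process described by the authors.

```kotlin
import kotlin.random.Random

// Split records into (train, test) reproducibly: the same records in the same
// order, with the same seed, always give the same split
fun <T> trainTestSplit(records: List<T>, testFraction: Double, seed: Int): Pair<List<T>, List<T>> {
    require(testFraction in 0.0..1.0) { "testFraction must be between 0 and 1" }
    val shuffled = records.shuffled(Random(seed))
    val testSize = (records.size * testFraction).toInt()
    // The first testSize shuffled records form the test set, the rest the training set
    val test = shuffled.take(testSize)
    val train = shuffled.drop(testSize)
    return train to test
}
```

Recording the seed alongside the data version makes the split itself a reproducible artifact rather than a side effect of one particular run.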

Speaker
Samar Singla, Jungleworks
Samar is a serial entrepreneur and a physicist by education. He has previously worked as a researcher at IBM and CERN. He is one of the industry's foremost speakers. Samar founded Jugnoo in 2014 with the vision to transform the Indian auto-rickshaw sector. Apart from Jugnoo, Samar also founded Click-Labs, a profitable SaaS technology solution provider of a business suite called ‘JungleWorks’. He is an avid traveller and amateur photographer who likes to document the everyday world. Samar's personal website calls him ‘Someone somewhere in a garage’, which clearly portrays his love and penchant for building new things.
https://mlconference.ai/speaker/samar-singla/?utm_source=pdf&utm_medium=referral&utm_campaign=mlmagazine120

In production, we have to observe and monitor how the model behaves “in the wild”. Metrics like usage, model input, model output, and possible model bias provide important information about the model's performance. This data can be fed back to the first stage of the process to enable further improvement: the whole Continuous Intelligence cycle starts again. The transportation of the artifacts (source code, executables, training and test data, or model parameters) between the different process stages is controlled via pipelines that are executed by a CD orchestration tool. Every artifact is versioned, enabling reproducibility and auditability, so prior versions can be rebuilt or redeployed if required. The CD orchestration tool ensures the smooth and frictionless operation of the whole process and also allows governance and compliance, so certain quality standards and fairness checks are built into the process.
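One way to make “every artifact is versioned” concrete is to derive a model's version identifier from everything that produced it: the code revision, the training data, and the hyperparameters. The sketch below hashes these inputs with SHA-256 via `java.security.MessageDigest`; the choice of inputs, the encoding, and the function names are assumptions for illustration, not a description of ThoughtWorks' tooling.

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest

// Feed one field into the digest, prefixed with its length, so that
// ("ab", "c") and ("a", "bc") produce different hashes
fun MessageDigest.updateField(bytes: ByteArray) {
    update(ByteBuffer.allocate(4).putInt(bytes.size).array())
    update(bytes)
}

// Derive a deterministic version id from everything that produced a model
fun modelVersion(codeRevision: String, trainingData: ByteArray, hyperparameters: Map<String, String>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    digest.updateField(codeRevision.toByteArray(Charsets.UTF_8))
    digest.updateField(trainingData)
    // Sort keys so that the map's iteration order does not change the result
    for ((key, value) in hyperparameters.toSortedMap()) {
        digest.updateField(key.toByteArray(Charsets.UTF_8))
        digest.updateField(value.toByteArray(Charsets.UTF_8))
    }
    // Hex-encode the 32-byte hash
    return digest.digest().joinToString("") { "%02x".format(it) }
}
```

Identical inputs always give the same identifier, so a deployed model can be traced back to, and rebuilt from, the exact code, data and parameters behind it.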

CD4ML in Action
We want to demonstrate the approach in practice based on a real client project delivered by ThoughtWorks. In fact, our current notion of CD4ML first emerged several years ago when we first applied Continuous Delivery to a user-facing machine learning application. You can read about it in detail here [2]. Our challenge was to build a price estimation engine for a leading European online car marketplace. The engine needed to be able to give a realistic estimate for anybody looking to buy or sell a car. That price estimate would be based on past car sales within the marketplace. As the market for used cars is constantly changing, the price estimation model has to be continuously re-trained on new data. A perfect case for CD4ML.

Figure 4 shows the overall CD4ML flow for this specific case. The data scientists train the model using data from the marketplace — such as car specs, asking price and actual sales price. The model then predicts a price based on the car model, age, mileage, engine type, equipment, etc. Before training a model, there’s a lot of data cleanup work to be done: detecting outliers, wrong listings, or dirty data. This is the first quality gate to be automated — is there enough good data to even provide a prediction model for a certain car model?

Once the trained model can make sufficiently accurate price estimates, it's exported as a productionizable artifact, such as a JAR or a pickle file. This is the second quality gate: is the model's error rate acceptable? This prediction model is then transformed into a format matching the target platform, then packaged, wrapped, and integrated into a deployable artifact: a prediction service JAR containing a web server, or a container image that can be readily deployed into a production environment. This deployment artifact is now tested again, this time in an end-to-end fashion: is it still producing the same results as the original, non-integrated prediction model? Does it behave correctly in a production environment, for instance, does it adhere to contracts specified by other consuming services? This is the third quality gate. If all three quality gates succeed, a new re-trained price prediction service is deployed and released. Importantly, all of those steps should be automated so that re-training to reflect the latest market changes happens without manual intervention as long as all quality gates are satisfied.
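The three quality gates described above can be expressed as an ordered list of checks that must all pass before a re-trained model is released. In the sketch below, the report fields, the thresholds, and all names are hypothetical, chosen only to illustrate the structure, not taken from the car marketplace project.

```kotlin
// Hypothetical summary of one re-training run
data class RunReport(
    val cleanListings: Int,          // listings left after outlier and dirty-data removal
    val modelErrorPercent: Double,   // error of the trained model on test data
    val serviceMaxDeviation: Double  // largest difference between packaged service and original model
)

data class Gate(val name: String, val passes: (RunReport) -> Boolean)

// Gate 1: enough good data; gate 2: acceptable error; gate 3: end-to-end consistency
val gates = listOf(
    Gate("enough clean data") { it.cleanListings >= 500 },
    Gate("model error acceptable") { it.modelErrorPercent <= 10.0 },
    Gate("service matches model") { it.serviceMaxDeviation <= 0.01 }
)

// Gates are checked in order; returns the first failing gate, or null if release may proceed
fun firstFailedGate(report: RunReport): String? =
    gates.firstOrNull { !it.passes(report) }?.name
```

A pipeline step would release the new prediction service only when `firstFailedGate` returns null, which is what lets re-training run without manual intervention.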

Fig. 3: Continuous Delivery for Machine Learning in action

Finally, the live price prediction is continuously monitored: how do the sellers react to the price recommendations? How much does the listing price deviate from the suggestion? How close is the price prediction to the final buying price of the respective vehicle? Is the overall conversion and user experience being impacted, for instance by rising complaints or direct positive feedback? In some cases, it makes sense to deploy the new model next to the old version to compare their performance. All this new data then informs the next iteration of training the prediction model, either directly through new data from cars that were sold or by tweaking the model's hyperparameters based on user feedback, which closes the Continuous Intelligence cycle.
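Comparing a new model with the one already in production, as mentioned above, can start as simply as scoring both on the same recently sold cars. A minimal sketch using mean absolute error, assuming each prediction can be paired with the actual sales price; the metric and the function names are illustrative choices, not the metric used in the project.

```kotlin
import kotlin.math.abs

// Mean absolute error between predicted and actual prices
fun meanAbsoluteError(predicted: List<Double>, actual: List<Double>): Double {
    require(predicted.size == actual.size && actual.isNotEmpty()) { "need matching, non-empty lists" }
    return predicted.zip(actual).sumOf { (p, a) -> abs(p - a) } / actual.size
}

// Keep the current model unless the candidate is strictly better on the same cars
fun preferCandidate(current: List<Double>, candidate: List<Double>, actual: List<Double>): Boolean =
    meanAbsoluteError(candidate, actual) < meanAbsoluteError(current, actual)
```

In practice, such an offline comparison would be complemented by the business signals listed above, such as seller reactions and conversion.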

Opportunities of CD4ML and the road ahead
Adopting Continuous Delivery for Machine Learning creates new opportunities to become an Intelligent Enterprise. By automating the end-to-end process from experimentation to deployment, to monitoring in production, CD4ML becomes a strategic enabler for the business. It creates a technological capability that yields a competitive advantage. It allows your organization to incorporate learning and feedback into the process, towards a path of continuous improvement. This approach also breaks down the silos between different teams and skill sets, shifting towards a cross-functional and collaborative structure to deliver value. It allows you to rethink your organizational structures and technology landscape to create teams and systems aligned to business outcomes. In subsequent articles in the series, we'll explore how to bring product thinking into the data and machine learning world, as well as the importance of creating a culture that supports Continuous Intelligence. Another key opportunity to implement CD4ML successfully is to apply platform thinking at the data infrastructure level. This enables teams to quickly build and release new machine learning and insight products without having to reinvent or duplicate efforts to build common components from scratch. We'll dedicate an entire article to the technical components, tools, techniques, and automation infrastructure that can help you to implement CD4ML.

Fig. 4: A CD4ML end-to-end process in a real-world example

Speaker
Dr. Pieter Buteneers, Chatlayer.ai
Pieter Buteneers started his career in academia, first as a PhD student and later as a post-doc, where he did research on Machine Learning, Deep Learning, Brain Computer Interfaces and Epilepsy. Together with a team of machine learners from Ghent University, he won the first prize in the biggest Deep Learning competition of 2015: the National Data Science Bowl hosted on kaggle.com. In 2016, he finished his MBA at Flanders Business School. Over the last couple of years he has consulted many companies, training managers and developers to build and implement new strategies to extract value from data using Machine Learning. Now he is making chatbots work for you as the CTO of Chatlayer.ai.

Finally, leveraging automation and open standards, CD4ML can provide the means to build a robust data and architecture governance process within the organization. It allows introducing processes to check fairness, bias, compliance, or other quality attributes within your models on their path to production. Like Continuous Delivery for software development, CD4ML allows you to manage the risks of releasing changes to production at speed, in a safe and reliable fashion.

All in all, Continuous Delivery for Machine Learning moves the development of such applications from proof-of-concept programming to professional state-of-the-art software engineering.

This article was first published on ThoughtWorks.com [3].

We are a software company (ThoughtWorks.com) and a community of passionate, purpose-led individuals. We think disruptively to deliver technology to address our clients' toughest challenges, all while seeking to revolutionize the IT industry and create positive social change.

Links & Literature

[1] https://www.thoughtworks.com/insights/articles/intelligent-enterprise-series-models-enterprise-intelligence

[2] https://www.thoughtworks.com/insights/blog/getting-smart-applying-continuous-delivery-data-science-drive-car-sales

[3] https://www.thoughtworks.com/


https://mlconference.ai/speaker/pieter-buteneers/?utm_source=pdf&utm_medium=referral&utm_campaign=mlmagazine120





AIOps in 2020
How industry consolidation is transforming the AIOps value proposition

Four years ago, Gartner coined the phrase AIOps as an approach to the issue of IT service outages. Now, in 2020, AIOps has entered the enterprise in modern IT teams. Where do we go from here? Here is a quick overview of the different tools that enterprises have used to maintain and optimize their mission-critical IT services and a short history of AIOps buyouts.

by Deepak Jannu

Gartner coined the phrase AIOps in 2016 to introduce IT decision-makers to a modern approach to the age-old problem of IT service outages. The discipline of IT incident management has seen a huge leap with the application of data science, machine learning, and combinatorial optimization techniques for service restoration. Here is a quick overview of the different tools that enterprises have used to maintain and optimize their mission-critical IT services:

1. Event Correlation and Analysis (ECA)
In the early 2000s, event correlation and analysis tools (aka event consoles and manager of managers) emerged to streamline incident response workflows. ECA tools ingested events from different IT infrastructure elements and processed them using causal rules and event filters. While ECA tools worked well in an era of predictable IT infrastructure, technologists struggled to make them work for dynamic apps and infrastructure. Examples of ECA tools include BMC ProactiveNet, CA Spectrum, EMC Smarts, HP OMi, and IBM Tivoli Netcool.

2. IT Operations Analytics (ITOA)
ITOA tools capitalized on the popularity of big data analytics in the 2010s to deliver holistic insights into the performance of modern IT services. While ITOA tools were great at uncovering hidden patterns across event datasets, their focus on historical data analysis (as opposed to real-time analysis) limited their appeal to technology operations teams. ITOA vendors of note include Appnomic, Evolven, Nastel, Netuitive, and Prelert.

3. Artificial Intelligence for IT Operations (AIOps)
Artificial Intelligence for IT Operations (AIOps) tools combine “big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination.” [1] The purpose of AIOps tools is to tame event floods by extracting relevant insights for incident response using historical and real-time analysis. Popular AIOps vendors include Moogsoft, BigPanda, Loom Systems, and FixStream.

A short history of AIOps buyouts
In January 2020, larger IT operations management (ITOM) players acquired three different AIOps start-ups (Unomaly, Nyansa, and Loom Systems), signaling rapid industry consolidation. Since 2015, there have been a total of 11 exits in the ITOA/AIOps market, demonstrating incumbent interest in new approaches to incident remediation. Industry leaders like ServiceNow, VMware, New Relic, HP, Cisco, Elastic, PagerDuty, and Splunk have all acquired AIOps tools to complement product portfolios, launch new product lines, build intellectual property, or gain access to outstanding talent (Fig. 1).

Fig. 1: Acquisitions of AIOps tools.

A quick analysis of these 11 buyouts shows that monitoring (application, infrastructure, network, and logging) companies accounted for more than half of the overall acquisitions in the AIOps space. Here are some of the reasons cited by these acquirers for their purchase:

ServiceNow on the rationale for buying Loom Systems [2]: “With Loom Systems, ServiceNow will increase customers’ ability to apply AI to their knowledge base of issues and fixes for better insights into root causes and allow them to automate remediation tasks, reducing the number of Level 1 IT incidents.”

Resolve Systems on the FixStream purchase [3]: “Ultimately, the long-term vision for the combined Resolve and FixStream solution is to aid customers in achieving the long-awaited promise of ‘self-healing IT’.”

New Relic on the SignifAI transaction [4]: “To deliver reliable software at scale, DevOps teams need to leverage machine learning to help them predict and detect issues early and reduce alert fatigue.”

PagerDuty on acquiring Event Enrichment HQ [5]: “Integrating the Event Enrichment Platform with PagerDuty reduces noise, surfaces actionable alerts and delivers context-rich remediation information so businesses resolve critical incidents faster.”

AIOps in the enterprise: Where do we go from here?
What do recent acquisitions portend for the future of AIOps as a stand-alone ITOM category? While AIOps approaches will become increasingly widespread for incident management and response, it is unclear if enterprise IT teams will invest heavily in pure-play AIOps tools for the limited value (event correlation and root cause analysis) that they currently deliver.

Here are four reasons why we will see a transformation in the AIOps value proposition so that modern IT operations teams realize greater value from their investments:

1. It's all about business-service impact
AIOps tools have no way to contextualize the event impact for an IT service without native instrumentation. Machine learning algorithms on their own cannot establish the priority and urgency of a specific event without service context. AIOps tools require service-centricity [6] so that IT teams can focus on the critical incidents that matter to their business and safely ignore the rest.

2. Stand-alone AIOps tools address a limited part of the incident workflow
While AIOps tools offer actionable event intelligence, they need to connect back to a system of record (service desk or alert management systems) for prompt incident escalation and response. If the incident requires a standard response, AIOps tools again need to rely on IT process automation tools for rapid remediation.

3. Domain centricity matters
AIOps tools use machine learning and data science methods to uncover patterns from IT events. IT teams will see great value in marrying AIOps insights with domain-centric tools, like performance monitoring, so that they can view, analyze, and act with the right context across the entire incident lifecycle.

4. Machine learning is now table stakes for digital operations management
Machine data intelligence is no longer a competitive differentiator for AIOps tools, as most hybrid monitoring tools have incorporated native machine learning algorithms for reduced event noise and faster root cause analysis for performance management.

Conclusion
In 2020, there are only a few stand-alone AIOps vendors left in the market. IT decision-makers will prioritize AIOps tools that can not only ingest different types of performance data through open integrations and APIs, but also deliver the right business context for incident response using native instrumentation for so-called full-stack observability. Expect more blurring of technology categories, from ITOM to ITSM, as algorithmic event management, contextual alerting, and self-healing workflows become part of the default toolkit for incident management teams.

Deepak Jannu is a B2B technology marketer with expertise in product marketing, corporate communications, sales enablement, and digital marketing. As the director of product marketing at OpsRamp, he is responsible for high-impact product positioning and messaging that drives awareness of its disruptive ITOM platform.

Links & Literature

[1] https://www.gartner.com/en/information-technology/glossary/aiops-artificial-intelligence-operations

[2] https://www.businesswire.com/news/home/20200122005228/en/ServiceNow-Acquire-Loom-Systems

[3] https://fixstream.com/blog/resolve-acquires-fixstream/

[4] https://newrelic.com/press-release/20190206-2

[5] https://www.pagerduty.com/newsroom/pagerduty-acquires-event-enrichment-hq/

[6] https://blog.opsramp.com/service-centric-aiops-opsq

Speaker
Laurent Picard, Google
Laurent is a developer passionate about software, hardware, science, and everything shaping the future. He works for Google Cloud where he enjoys exploring and sharing what's possible. In a prior life, he pioneered the ebook industry, co-created the 1st European ebook reader, and co-founded Bookeen. You can reach Laurent on Twitter at @PicardParis.




Interview with Meike Hammer and Maksim Moiseikin

“TensorFlow Brings ML Models to All Devices”

Machine learning and deep learning are no longer used only on desktop PCs, but also on less powerful devices such as smartphones, tablets or even smartwatches. We spoke to Meike Hammer and Maksim Moiseikin (arconsis IT-Solutions GmbH) to gain insights into this newer ML area. In the interview, they point out what aspects need to be considered, what difficulties can arise, and why TensorFlow is particularly well suited for the task.

Session

Predictive Maintenance – How Does Data Science Revolutionize the World of Machines?
Victoriya Kalmanovich (Navy)

In today’s world of machines, there are two leading maintenance techniques to support a standard machine lifecycle. Predictive maintenance revolutionizes the future of machines. It tracks each machine’s unique lifecycle and doesn’t generalize. It allows us to know in advance if our machines need to be attended to.

Victoriya shares a special maritime case study and discusses the big promise of predictive maintenance. In a world full of machines, we need to be the bridge connecting the methods of the past to the opportunities of the future.

ML Magazine: By now, machine learning has found its way into user applications on devices such as smartwatches, smartphones and tablets. In which cases is ML used there?

Meike Hammer and Maksim Moiseikin: Machine learning is becoming increasingly important on mobile devices. Many people are already accustomed to using apps with ML technologies, although they usually don’t know it. For example, the latest Android versions use machine learning to improve battery life: depending on how they are used, applications are divided into several classes, and each class has its own restrictions on background activity. Fitness apps read information from accelerometers and other phone sensors and apply machine learning to this data to identify activities and count steps.
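As a rough illustration of how accelerometer data can feed an activity classifier, the sketch below computes two features per window of readings (mean and spread of the acceleration magnitude) and assigns a new window to the nearest class centroid learned from labeled examples. Nearest-centroid classification is a deliberately simple stand-in chosen for brevity; the interviewees do not say which models fitness apps use, and the sample readings are invented.

```python
import math
from statistics import mean, pstdev


def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer reading, in m/s^2."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def features(window):
    """Two simple features per window: mean and spread of the magnitude."""
    mags = [magnitude(s) for s in window]
    return (mean(mags), pstdev(mags))


def train_centroids(labeled_windows):
    """Average the feature vectors of all training windows for each activity label."""
    grouped = {}
    for label, window in labeled_windows:
        grouped.setdefault(label, []).append(features(window))
    return {
        label: (mean(f[0] for f in feats), mean(f[1] for f in feats))
        for label, feats in grouped.items()
    }


def classify(window, centroids):
    """Assign the window to the activity whose centroid is nearest (Euclidean distance)."""
    f = features(window)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


if __name__ == "__main__":
    # Invented training windows with four readings each.
    still = [(0.0, 0.0, 9.8)] * 4  # phone lying flat, only gravity
    walking = [(0.5, 1.0, 9.0), (1.5, 3.0, 12.0), (0.2, 0.5, 8.0), (1.8, 2.5, 13.0)]
    centroids = train_centroids([("still", still), ("walking", walking)])
    print(classify([(0.1, 0.0, 9.7)] * 4, centroids))
```

Real windows would contain many more samples, and production apps would likely use richer features and models; the point here is only the pipeline from raw sensor readings to features to a learned decision.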

In Natural Language Processing (NLP), two application scenarios are particularly common. Voice assistants use deep learning models to understand natural language and find meaningful answers. Intelligent keyboards can predict the next words based on user input, correct typos and increase input speed. In all these use cases, machine learning/deep learning models show better results than classical algorithms, which is why they are used on mobile devices.
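Next-word prediction can be illustrated, in a heavily simplified form, with a bigram model: count which word follows which in previously typed text and suggest the most frequent followers. This is a classical statistical baseline swapped in for brevity, not the deep learning models the interviewees describe, and the typing history is invented.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, how often each other word directly follows it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return followers


def suggest(followers, word, k=3):
    """Return up to k likely next words; equal counts keep first-seen order."""
    counts = followers.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


if __name__ == "__main__":
    history = [
        "see you at home",
        "see you tomorrow",
        "see you at the office",
        "thank you so much",
    ]
    model = train_bigrams(history)
    print(suggest(model, "you"))
    print(suggest(model, "see"))
```

With this toy history, `suggest(model, "you")` returns `["at", "tomorrow", "so"]`, because "at" follows "you" twice and the other two words once each. A neural model replaces these raw counts with learned representations that also take longer context into account.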

Furthermore, there are many applications in the field of computer vision and computational photography. Camera apps use ML, for example, to make photos brighter and more stable in low-light environments, and to suppress image noise. For this reason, images from high-end smartphones are often compared to DSLR images today. Social networks offer apps that detect specific facial key points in selfies in real time to apply various filters or masks.


ML Magazine: What are the particular difficulties of ML on these devices?

Hammer and Moiseikin: In many scenarios, it would be easier to send data to a powerful server and process it there. But, of course, this is not always possible or desired: to protect the user’s privacy, some data must never be sent to the server. In addition, there is sometimes too much data to send to the server fast enough. And some calculations need to run offline when there is no internet connection. Therefore, on-device machine learning is very important nowadays.

The biggest challenge lies in the performance of the devices. Smartphones are equipped with energy-efficient processors that have much lower performance than PCs or notebooks. RAM size is also very limited, and ML models and algorithms are very resource intensive. This means we need to find a balance between prediction quality and performance on all devices, not only on high-end devices. During development, for example, we struggled a lot with smartwatches overheating, and we regularly had to cool the test devices down in the refrigerator.

ML Magazine: Are certain machine learning frameworks particularly well suited for devices such as smartwatches and smartphones?

Hammer and Moiseikin: Currently, we consider TensorFlow to be the best library for deploying ML/DL models on smartphones. TensorFlow offers ways to deploy ML models to any device. In many cases, dedicated tools and optimizations are provided to compress and accelerate models. TensorFlow Lite on Android and iOS can even use the smartphone GPU for calculations. Additionally, models from Keras/TensorFlow can be converted into the Core ML format.

The latest PyTorch versions also offer experimental support for Android and iOS devices and look promising. So far, however, we have not been able to gain any experience with them in production use.

ML Magazine: What experience have you gained in the practical implementation of TensorFlow on smartphones and smartwatches?

Hammer and Moiseikin: TensorFlow is a good library and helps train ML models and run them on end devices. But TensorFlow is not perfect either. Many APIs are too complex, are defined multiple times in different packages (tf.layers, tf.nn, tf.slim, tf.keras, ...), and are sometimes not compatible with each other. This makes both development and code reuse more difficult. Some operators are not supported in all environments (GPU, TPU). The library works by first defining a graph, which is filled with values later. This makes analysis and debugging very difficult. Furthermore, we encountered incompatibilities with external libraries: with an old version of the Core ML converter, we could not convert models from newer TensorFlow versions and therefore could not update TensorFlow.

TensorFlow 2 has solved many of these problems. Keras is now provided as the recommended API and as part of the TensorFlow library, and old APIs were cleaned up. An important change is Eager Execution, which is now enabled by default. Operations are evaluated immediately and return actual values, not just graph definitions. This simplifies development and allows you to use the standard Python debugger.

ML Magazine: What tips can you give for integrating machine learning into a user application?

Hammer and Moiseikin: With machine learning, it is very easy to underestimate the complexity of the task. With the help of modern libraries and development tools, you can develop a prototype of your model very quickly, which was impossible before. However, it takes a lot of time, modifications, experiments and optimizations to get from the first version to a finished product. So, start with very simple models to verify your idea and improve the model iteratively. In most cases, it does not make sense to implement everything from scratch. There are now many good open source libraries that help with many tasks and usage scenarios.

Transfer learning can be quite useful: downloading a pre-trained model, using it as a basis and fine-tuning it with a task-specific data set to achieve better results and reduce training time.

And the training data is extremely important. A large amount of data that is as realistic as possible is required to achieve good results in the final application.

Thank you for the interview!
Interview questions by Maika Möbus

Meike Hammer has been working as a Software Engineer at arconsis IT-Solutions GmbH in Karlsruhe since 2016. She is responsible for the development of mobile apps, the implementation of which increasingly requires technologies such as machine learning. Her focus is especially on Natural Language Processing for realizing language-based software components as Natural User Interfaces.

Maksim Moiseikin has been supporting arconsis as a working student since 2015. Since February 2019, he has been a permanent member of the team. In his bachelor thesis he dealt with the area of deep learning in computer vision. At arconsis, Maksim develops and trains machine learning/deep learning models and integrates them into applications on various platforms. In addition, he also enjoys developing applications with Kotlin.

Shorttalk

Back to Basics: Approaches to Machine Learning
Jigyasa Grover (Twitter, Inc.)

In the contemporary world of learning algorithms, with more and more domains applying Machine Learning, the complexity of the models themselves is swelling. Thus it is important to approach Machine Learning in a conceptual way. In this talk I will present an informal taxonomy of Machine Learning algorithms, grouped mainly by their different mathematical abstractions. I will cover Logical Models with tree-based or rule-based concepts, Geometric Models including linear and distance-based approaches, and Probabilistic Models. I will go over the fundamentals of each type of model and discuss their positives, limitations and use cases, along with a very simple hands-on example. The talk will conclude with some pointers on how to explore the data and choose a model to make the learning more efficient. This talk aims to introduce different categories of Machine Learning models from a mathematical point of view and encourage budding ML enthusiasts to reflect on the implications of each for their domain. By better understanding these types of models, attendees will be empowered to design intelligent solutions.

Also visit this shorttalk at ML Conference Munich


The ethics of AI and ML

What Data Should AI Be Trained on to Avoid Bias?

Humans are introducing their own biases and prejudices into machine learning. As advanced as AI can be, having been built by humans, it can still share some of our own ethical shortcomings. The usage of proper databases during training is one of the ways to help prevent biases from developing within artificial intelligence.

by Dean Chester

As AI and machine learning permeate every sphere of our lives today, it gets easier to celebrate these technologies. From entertainment to customer support to law enforcement, they provide humans with considerable help. Certain things they are capable of are so amazing that they seem almost like magic to an outside observer. However, it’s necessary to remember that, as astonishing as machine learning-powered tech advancements are, they are still a product created by us humans. And we can’t simply shed our personalities when developing anything, much less an AI, an algorithm that has to think on its own. While developers’ personal experiences and beliefs are an indispensable asset in creating ML algorithms, they sometimes come at a cost.

A brief overview of bias in AI

No AI, sadly, can stay 100% impartial to everything. There are and always will be biases in it, just like in any product made by humans, especially one as sophisticated as a machine learning algorithm. Over the last few years, we have seen artificial intelligence exhibit quite human prejudice more than once.

In cases where AI is used by the police, it can lead to very dire consequences. A 2019 study performed by the UK’s Royal United Services Institute for Defence and Security Studies (RUSI) paints a grim picture. It is concerned with the biases of machine learning, in this case machine learning used by the police for data analysis. As those algorithms are trained on databases compiled by the police, they are bound to share the police force’s biases. According to a police officer quoted in the paper [1], “young black men” are stopped by the police more frequently than Caucasian men from the same age group. An AI trained on reports that reflect such a situation will also see the black population as more likely to commit crimes and analyze data accordingly, thus perpetuating the oppression.

Blog

How UX can demystify AI: “We need more than just technical transparency”
Ward Van Laer (IxorThink)

Can UX demystify AI? Ward Van Laer answered this question in his session at the ML Conference 2019. We invited him for an interview and asked him how to solve the black box problem in machine learning by merely improving the user experience.


Session

AI for Decision Makers
Uri Eliabayev (Machine & Deep Learning Israel)

AI is everywhere. We get it. But still, many (non-technical) managers can’t see how they can harness the huge leap we have made in the field of Machine Learning for their own companies or organizations.

This special lecture will give you the whole picture when thinking about adding AI capabilities to your product or service. In this lecture, you will learn the basic terms of AI, you will be introduced to all the latest technologies in the field and, finally, you will examine real use cases from global brands that use AI as part of their strategy.


Also visit this session at ML Conference Munich

The necessity of proper databases in AI training

Even in a hypothetical situation where just one person develops artificial intelligence algorithms, the problem with bias can be avoided by having sufficient databases to train the algorithms on. As long as every ethnicity and race is properly represented in the database, there should be no issue. However, what proportional representation is fair? After all, African Americans account for only 12.7% [4] of the US population. It’s obvious, though, that if they are represented in an AI training database in that exact proportion, the facial recognition algorithm is going to be less precise for the black population. Therefore, paradoxically, to ensure that AI doesn’t discriminate against minorities, they need to be overrepresented in the databases. The situation with police databases is harder because they are directly based on officers’ behavior, which can be skewed against a particular minority or another subset of the population. It appears that strict control over which information acquired by the police ends up in the machine-training database is necessary.
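The arithmetic behind the overrepresentation argument can be made explicit. The sketch compares how many images of a group a training set would contain under strictly proportional sampling versus one simple form of oversampling, an equal split between groups. Only the 12.7% share comes from the text; the dataset size of 10,000 images, the two-way split into "minority_group" and "rest", and the equal-split scheme are hypothetical assumptions for illustration, not a recommendation from the author or any study.

```python
def proportional_counts(total, shares):
    """Images per group if the dataset mirrors population shares exactly.

    Rounding means the counts may not add up to `total` exactly.
    """
    return {group: round(total * share) for group, share in shares.items()}


def equal_counts(total, groups):
    """Images per group if every group gets the same number (one way to oversample)."""
    per_group = total // len(groups)
    return {group: per_group for group in groups}


def oversampling_factor(total, share, count):
    """How many times more images a group gets than its population share implies."""
    return count / (total * share)


if __name__ == "__main__":
    total = 10_000  # hypothetical dataset size
    # 12.7% is the share quoted in the text; "rest" lumps all other groups together.
    shares = {"minority_group": 0.127, "rest": 0.873}
    proportional = proportional_counts(total, shares)
    balanced = equal_counts(total, list(shares))
    print(proportional)
    print(balanced)
    factor = oversampling_factor(total, shares["minority_group"], balanced["minority_group"])
    print(f"oversampling factor: {factor:.2f}")
```

Under these assumptions, proportional sampling yields 1,270 images of the minority group, while an equal two-way split yields 5,000, roughly 3.9 times what its population share implies.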

However, the criteria for how it should be implemented are almost impossible to determine, because those who will develop and apply them are also bound to be subject to their own biases. So the best solution for AI bias in law enforcement is, ironically, not to use AI in that sphere at all, given the potential problems it causes for the police force and the population.

Dean Chester is a practicing cybersecurity expert and author of numerous articles on Cooltechzone and other tech websites such as Sensorstechforum, Bdtechtalks, AT&T, OpenVPN, etc. Dean is a fan of all topics related to data privacy and cybersecurity. He usually takes part in various tech tutorials, forums, conferences, etc.

Racial and gender biases are not the only ones that plague AI. The same machine learning algorithms used by the police forces of England and Wales create another situation in which overreliance on them can lead to devastating consequences. An example that the RUSI paper gives is AI assigning risk categories to individuals who have had problems with the law: someone whose likelihood of returning to a life of crime is rated “low” may still require additional help and guidance to avoid another slip. Machine learning algorithms do not fully understand that, and by labeling such an individual as “low-risk”, they give the police a false sense of safety. Similarly, AI biases are dangerous in cyberbullying prevention [2]. In this sphere of data analysis, context plays a huge part: hate groups and support communities often use the same neutral terms and phrases. Another example of bias is facial recognition algorithms. As it turns out, these algorithms have a harder time distinguishing the faces of African American people than those of Caucasians. Interestingly enough, the recognition AI in question wasn’t developed by amateurs, and neither was it a single case: programs developed and sold by Amazon, Microsoft, and IBM all showed signs of racial bias according to the research conducted last year [3].

The research concluded that this state of affairs is caused by the overwhelming majority of employees at the mentioned companies being Caucasian. However, this alone shouldn’t make AI unable to recognize the faces of people of other ethnicities. A much bigger problem is the data that AI is trained on to do its job.

Links & Literature

[1] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/831750/RUSI_Report_-_Algorithms_and_Bias_in_Policing.pdf

[2] https://cooltechzone.com/internet-safety-guide

[3] https://time.com/5520558/artificial-intelligence-racial-gender-bias/

[4] https://www.minorityhealth.hhs.gov/omh/browse.aspx?lvl=3&lvlid=61




Alexander Görlach, editor in chief of the magazine conditiohumana.io, talks to Noam Chomsky

“Democracy Has Suffered as Power Concentrates”

Speaker

Prof. Raul Rodríguez
Woxsen School of Business

Dr. Raul Villamarin Rodriguez is Assistant Professor of AI and ML and Robotics Head at Woxsen School of Business, and holds a Ph.D. in Artificial Intelligence and Robotics Automation applications in Human Resources.

He is the former CEO and HR Manager of Irians Research Institute, a research institute specializing in neuromarketing, AI, ML, cybersecurity and market research. He has also collaborated in the organization of symposiums at several educational institutions, including Oxford Brookes University, UK, and European Union government bodies in Brussels, Belgium. He is a registered expert in Artificial Intelligence, Intelligent Systems and Multi-agent Systems at the European Commission and a nominee for the Forbes 30 Under 30 Europe 2020 list. He has co-authored two reference books: “New Age Leadership: A Critical Insight” and “Retail Store’e”.

Alexander Görlach: Mr. Chomsky, you have spent a good part of your life advocating an America that is fundamentally different from the one that you live in now. What is most frustrating to you?

Noam Chomsky: One of the saddest things is that the death rate, for the first time in decades, is increasing in a particular part of the population: white, working class, working age, roughly 25 to 50 years old. That hasn’t happened since the great flu epidemic a century ago. And then we read in the papers every day that we live in a wonderfully functioning economy.

Alexander Görlach: What’s the reason for that?

Noam Chomsky: The last forty years were dominated by neoliberal principles, which were a stark departure from the previous period. Neoliberalism emphasizes markets and disregards social needs and demands. Individuals, it argues, should be placed in a market, where they are asked to survive without social supports such as welfare systems, benefits, unions and other forms of association. Over the last few years, the consequences have become increasingly clear. Neoliberalism increased the power of those who already had it and who can now exploit it. Wealth is greatly concentrated, corporate power has greatly increased, and individuals have, generally, suffered.

Alexander Görlach: Where do you see the most dramatic reversals in the last few years?

Noam Chomsky: Probably with regard to climate change. The Environmental Protection Agency is removing regulations so that corporations can maximize profit. In fact, the United States is the only country that is increasing its use of fossil fuels. Trump wants to increase it further while also cutting back on regulations for the automotive industry. The ones who suffer are our grandchildren, who will inhabit an earth in which an organized society won’t be able to survive. That’s something we don’t discuss.

Alexander Görlach: Fairness and equality are two topics that have been at the heart of your work for decades. Over the years, what role has technology played in bringing about a more equal world?

Noam Chomsky: Unfortunately, inequality across the world is still rising. The rich inhabit a world in which they have no responsibility to their own countries. If you look at East Asia, which fifty years ago was as developed as most parts of sub-Saharan Africa, the role of technology in developing economies is obvious. They built their new wealth on the Japanese model: state-directed development of technology, investments, and imports of capital goods from Europe. Compare that to Latin America. It has much richer resources and none of the foreign threats. Yet it hasn’t developed nearly in the same way. It maintains more of a colonial relationship with the colonial powers, increasingly with China. The imports into Latin America are luxury goods, and while in East Asia there are barriers against the export of capital, Latin America doesn’t have that.

Alexander Görlach: Has technology lately become more of a hindrance to equality and participation? Does it further inhibit the ability of people to participate in the public sphere?

Noam Chomsky: Technology is neutral; it doesn’t care how you use it. A hammer doesn’t care whether you use it to build a house or crush somebody’s skull. That’s none of the hammer’s business. New technology is pretty much the same. You can use it, as China does, to impose an incredible system of surveillance and control over the population. Or you can use it to liberate people, give them the space to communicate and discuss how they want to organize their world. The way it tends to be used depends on whoever controls it. If it’s controlled by a big corporation or an autocratic state, it will be used for gaining extensive information to then sell that data to advertisers, or to control people.

Alexander Görlach: But isn’t there a qualitative difference between something like a hammer, which is indeed neutral, and machine learning?

Noam Chomsky: First of all, we should appreciate that machine learning is a way to explore things within fixed domains. At the same time, there’s a lot of hype about it and a lot of people exaggerate the impact it can have. All the talk of superhuman machines that will control our lives is hot air. Such systems are useful for certain things. Google Translate, for example, is a handy device that I use a lot. Its brute force, however, doesn’t tell you anything about human beings, cognition or how to bring about change.

Alexander Görlach: What change do you think is necessary, both in the United States and across the world?

Noam Chomsky: We accept the fact that people spend their lives under the tyranny of private enterprises. That isn’t a law of nature; it can be changed, and it should be. The way out of it is popular democracy. Democracy has suffered as power concentrates and grants the rich and powerful more influence over the democratic system. We need to fully take control of the institutions of our societies and then change those institutions in a way that is conducive to our lives. And to be perfectly honest, if you look across the globe right now, with protests in Chile, Venezuela, Hong Kong and elsewhere, that change is already happening.

Avram Noam Chomsky (born December 7, 1928) is an American linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called “the father of modern linguistics”, Chomsky is also a major figure in analytic philosophy and one of the founders of the field of cognitive science. He holds a joint appointment as Institute Professor Emeritus at the Massachusetts Institute of Technology (MIT) and Laureate Professor at the University of Arizona, and is the author of more than 100 books on topics such as linguistics, war, politics, and mass media. Ideologically, he aligns with anarcho-syndicalism and libertarian socialism.

Session

Production nightmare of building AI systems at scale – The last mile
Dr. Danish Rafique (TerraLoupe GmbH)

AI is no longer a secret ritual performed by digital-native organizations. Indeed, there was a time when legacy industries would brush off the need for AI systems right at the outset. The times have changed!

This talk is not about how to build an AI use case, how to get management buy-in, or how the lack of talent hampers such initiatives. Neither does it focus on how to get the necessary data, nor on building DL models. While a few organizations continue to be challenged by these concerns, most are facing a completely new predicament: a business-driven dilemma of operationalizing AI systems beyond prototypes. That’s what we are going to talk about. I will share my vision, blueprint and personal stories across the telecom, manufacturing and automotive industries, including both corporate and start-up experiences. I will tell you how to go from a shiny proof-of-concept to AI production systems, what challenges we faced, and the best practices to avoid the pitfalls.

Also visit this session at ML Conference Munich


by Aude Gouaux-Langlois and Belinda Sykora

In our daily life we are surrounded by voices. The con-stant stream of voices is made of our inner voice, the voices of others (like our coworkers, friends or people in our environment), recorded voices of the elevator or the voice mail, as well as computer generated voices like Siri or Alexa. The voice acts as a medium and embodiment of the Artificial Intelligence. When we are dealing with AI, we use our voice as a tool to interact with this kind of technology and it is then manifesting itself through sound. The question then arises as to which sound is giv-en to AI, in other words, its materiality. What could AI sound like? Do variations exist? Because AI is computer generated, its gender can be decided: either masculine, feminine or gender neutral. Is it possible to give AI a gender-neutral sound? In the modern age of program-ming, the feeling that we are starting on a blank page is noticeable. But is that really so?

Before the invention of the phonograph, the voice was confronted with the written. The voice stood for the liv-ing, the immediate, the floating in contrast to writing, which embodied rigidity, stability, and permanence. When Edison recorded the voice and played it for the first time, people felt uncomfortable, unheimlich. In-deed, the voices of dead people could be heard and re-called. Bodiless voices were floating among the living.

With the possibility of reproducing voice and its history, which can be traced back for over a hundred years, we are now confronted with a previously unknown phe-nomenon: a variety of bodiless voices constantly buzz around us and feed us with information. One of these voices is the voice of AI, which will be increasingly im-portant in the future.

In order to give a brief overview of the dimension of voice and how it is perceived, the possibilities of the voice will be described first. The voice as a performa-tive phenomenon can be described in its appearance with different characteristics. It carries temporality and spatiality in itself. It is immediately ephemeral, a fleeting event, nothing rigid nor reproducible, retriev-able or repetitive. The voice represents itself in the sound and it is wrapped in sound. The fact that the voice can be described in terms of tonalities and the feelings it conveys does not mean that we can simply define it by a list of characteristics. It is perceived as a sensual, psychological, physical, semiotical as well as mediating content. It has an intersubjective effect as a separating and connecting element. The voice does not only create identity for an individual, but at the same time forms community and conventions. There-fore, contrariness is always an essential characteristic: the voice is a paradox par excellence. The AI’s voice is created in the style of the human voice in which we

How an AI’s identity is embodied in its voice

Thinking AI’s Voices: Gender and Identity Driving home from an appointment in town, you turn on the GPS of your car to help you find your way. “Arrived at destination”, says the voice. In the elevator, you send a voice message to your friend to confirm dinner plans: “I’ll be there at 8pm”. Even though you don’t pay attention to it anymore, the indication “Third floor” resonates in the elevator just before the doors open to your apartment’s floor. Once home, you are greeted by Alexa who immediately responds by the affirmative when you asked her to play “It’s a hard day’s night” by The Beatles. The speakers blast the song and you dis-tinctly say “Alexa, softer”, which results in a more comfortable loudness.





36



Speaker

Jon BratsethVerizon

Jon Bratseth is a distinguished architect in the Big Data and AI group of Verizon, and the architect and one of the main contribu-tors to Vespa.ai, the open big data serving engine. Jon has 20 years experience as

architect and programmer on large distributed systems, and a frequent public speaker. He has a master in computer science from the Norwegian University of Science and Technology.

Being always named and described as “man” or “woman” also constricts a person in the norm.

Performance in Butler’s sense is not a one-time original event, but repetitive and constrained by norms and conventions.

perceive femininity and masculinity as well as the role we attribute to both.

Judith Butler developed the concept of the “performativity of gender” at the beginning of the 1990s. She challenged biological gender as a category for defining identity: gender is determined not only by biological sex but also by acts of speaking and doing. In Gender Trouble, she questions the binary gender categories “man” and “woman” and states that they are constructions of language. According to Butler, the construction of gender starts when a baby is born. At birth, it is immediately named and categorized, going from being “it” to “she” or “he” [1]. From then on, the repetition starts and never ends. Being constantly named and described as “man” or “woman” also confines a person to the norm. Performance in Butler’s sense is not a one-time original event, but is repetitive and constrained by norms and conventions: “Performativity is not a singular act, but a repetition and a ritual, which achieves its effects through its naturalization in the context of a body, understood, in part, as a culturally sustained temporal duration” [2].

We consciously or unconsciously expose ourselves to repetitive auditory information encapsulated in the voice and process it unconsciously. As human beings, we learn through repetition: by repeatedly hearing reproducible, bodiless voices, we also uncritically absorb their sound characteristics. Accordingly, stereotypical gender roles, among other things, can be spread, repeated, and perceived again and again.

The sound of a voice in relation to biological sex is subject to certain evaluations depending on the socio-cultural imprint of a society, and this applies in particular to voice pitch. There is a tendency to attribute certain characteristics to the female voice, such as “emotional”, “loving”, or “helpful”. By contrast, the male voice is attributed characteristics such as “dominance”, “assertiveness”, or “competence”.

A voice in the frequency range between 175 Hz and 262 Hz is considered female, while the male voice tends to lie between 98 Hz and 131 Hz [3]. Depending on social norms, a male voice that speaks in higher frequencies is associated with female characteristics, and a female voice in lower frequencies with male characteristics. A deep female voice, for example, is considered more competent than a high-pitched female voice, while a high-pitched male voice is considered less competent. In order to avoid such evaluations and categorizations, attempts are now being made to develop a gender-neutral voice for AI. The logical conclusion is that a neutral voice must lie in the frequency range between the two. But is that really the case? “Q”, a gender-neutral voice developed by the agency Virtue together with the Danish linguist Anna Jørgensen, lies in the frequency range from 145 to 175 Hz [4]. It therefore extends up to the lowest female frequencies but does not reach down to the upper end of the male range. This particular range was defined on the basis of five voices chosen because they did not sound typically male or female. These voices were played to test listeners several times and adjusted again and again until most of them were perceived as gender-neutral. In this study, the research method is based on how people perceive the voice. In other words, perception is the key element in deciding how a neutral voice should sound. Thus, the way we program the voice of AI reflects the way we relate to voice in our daily lives and in society.

Two famous AI voices are those of Alexa and Siri, two AI-based tools with an assistant function. Their clear service purpose, combined with their recognisable female voices, shows that the female voice still carries outdated stereotypical ideas, even in new technologies such as AI voices.

However, this approach of course falls short if one assumes a globally networked voice since, as already mentioned, voices are evaluated differently depending on the culture and development of a society. The question therefore arises whether a neutral voice is necessary in order to break up old evaluation structures. On the one hand, using a gender-neutral voice would forgo the opportunity to reinterpret and take advantage of the diversity of feminine and masculine voices. On the other hand, a neutral sound would dissolve the gender gap and make it possible to distinguish a human being from a computer.

Is it even possible to find a global answer?

The work of the artist Holly Herndon points in the direction of symbiosis, a fusion of the masculine and the feminine. Holly Herndon collaborated with the AI expert Jules LaPlace to create Spawn, a neural network that contributes to her composition process. Herndon writes a score, records it with her ensemble of five singers, and feeds it into Spawn. During Herndon’s concert at the Volksbühne in Berlin, the ensemble started a call and response with the audience. The recorded result was then fed into Spawn and helped to “teach her” [5]. Interestingly, the AI is fed with a multiplicity of singing voices, resulting in a genderless mixed choir. This artistic approach can inspire us to develop AI voices in a humanistic way.

The embodied identity of AI is carried within the voice. For this reason, it is very important to understand the consequences of the voice we give to AI. We have outlined why it is important to open up a new way of thinking about voice, identity, and AI. As stated earlier, we are perpetuating stereotypes in a new technology. Is the non-binary voice a solution? We consider it a step, but not yet the end of the thinking process. Following Butler, we see gender as a spectrum, and the AI voice can likewise embrace this idea. Aware that we are entering a relatively new field, we like the idea of shaping a voice that integrates the feminine and the masculine as a spectrum in order to escape binary role models. The feminine and the masculine are present in various forms in every human being, and we now have the opportunity to think further and sketch a new path that includes ethical thinking, artistic approaches, and cultural dialogue.

Aude Gouaux-Langlois is a composer, musician and sound artist from France, working with different sound sources that she removes from their context and combines with her own voice. Her work merges music, sound design and technology in an organic way.

Belinda Sykora lives and works as an artist, musician and theorist in Berlin and Vienna. Her works deal with language and sound in an interdisciplinary way, using different technical means to play with the perception of the recipients. These include binaural sound walks, sound installations, radio plays and performances. Aude Gouaux-Langlois and Belinda Sykora founded the artist collective Ekheo in 2016 during their master’s degree in “Sound Studies” at UdK Berlin. Their work about the voice includes artistic research in the field of auditory culture, performance, radio art and experimental music.

Links & Literature

[1] Butler, J. (1993). Bodies That Matter: On the Discursive Limits of “Sex”. New York: Routledge.

[2] Butler, J. (1999). Gender Trouble: Feminism and the Subversion of Identity. New York: Routledge.

[3] Habermann, G. (1978, 1985). Stimme und Sprache. Eine Einführung in ihre Psychologie und Hygiene. Stuttgart: Georg Thieme Verlag.

[4] https://www.virtueworldwide.com/case-studies/meet-q

[5] Holly Herndon uses the pronoun she/her to refer to Spawn.

Blog

The Ethics of AI – dealing with difficult choices in a non-binary world
Eric Reiss (FatDUX Group)

In the field of machine learning, many ethical questions are taking on new meaning: On what basis does artificial intelligence make decisions? How can we avoid the transfer of social prejudices to machine learning models? What responsibility do developers have for the results of their algorithms? In his keynote from the Machine Learning Conference 2019, Eric Reiss examines dark patterns in the ethics of machine learning and looks for a better answer than “My company won’t let me do that.”

Imprint

Publisher: Software & Support Media GmbH

Editorial Office Address: Software & Support Media, Schwedlerstraße 8, 60314 Frankfurt, Germany

Tel: +49 (0) 69 630089-0
Fax: +49 (0) 69 630089-69
Web: www.sandsmedia.com
Layout: meat* – concept and design

Entire contents copyright © 2020 Software & Support Media GmbH. All rights reserved. No part of this publication may be reproduced, redistributed, posted online, or reused by any means in any form, including print, electronic, photocopy, internal network, Web or any other method, without prior written permission of Software & Support Media GmbH.

The views expressed are solely those of the authors and do not reflect the views or position of their firm, any of their clients, or Publisher. Regarding the information, Publisher disclaims all warranties as to the accuracy, completeness, or adequacy of any information, and is not responsible for any errors, omissions, inadequacies, misuse, or the consequences of using any information provided by Publisher.

Rights of disposal of rewarded articles belong to Publisher. All mentioned trademarks and service marks are copyrighted by their respective owners.
