I develop a voice assistant for digital logistics solutions

Clemens Holzhüter

The review of leolab projects 2019

This year we built several prototypes to improve and simplify logistics processes through voice control and voice assistants. Voice control itself is nothing new in logistics: pick-by-voice in the warehouse and voice-guided navigation software are established examples. The next level is the voice UI, where the user interacts with software solely via speech.

The motivation

We would like to develop a voice assistant that enables “Hands Free, Eyes Off” scenarios in an industrial environment. Fields of application for voice control include, for example:

  • Control rooms, in which the current operating status is summarized by voice command
  • Intelligent systems (artificial intelligence) that propose and execute alternative or downstream activities via “dialogue”
  • Barrier-free working

The use cases are implemented as rapid prototypes, using established hardware and software that is familiar from consumer use.

The use of Amazon Alexa

Amazon's voice assistant Alexa is very well suited for a prototype. The hardware, such as the third-generation Echo, is inexpensive and very widely used. Speech recognition is included and a cloud API is available, which makes it easy to connect your own cloud services. In our case, accounts and AWS experience were already available, as myleo / dsc also runs on AWS.

Application examples in the field of warehouse logistics

How does a voice assistant support me in my daily work? See a few examples here:

Video: leogistics SAP EWM – Alexa Skill (warehouse movement)

Video: leogistics SAP EWM Alexa Skill (material flow control)

The functionalities of Alexa

A prototype using the example of myleo / dsc

The use case is easy to describe: “In a track-and-trace scenario, a message is to be sent from the control room to a driver.”

The preparation

You will need an Echo and an AWS developer account. The Echo can be set up quickly and easily. If you have no previous experience, it is recommended to set up the device with the standard settings first and, for example, start your favorite radio station by voice command. The AWS developer account is a “must have”. It is required for developing and deploying the skill, as well as for the Lambda function. More on this in a moment.

The resulting skill does not need to be published in the Amazon skill store and does not need to go through Amazon's review. The skill can therefore be used directly for testing purposes.

That’s it? Then let’s get started.

Video: Alexa prototype in myleo / dsc

The implementation

At the outset, it is important to have a blueprint of the most important architectural decisions:

Figure: How an Alexa skill works (schematic diagram of the voice assistant)

The myleo / dsc use case

In our case, the Echo records the utterance “Send to ‘HH-LE-1100’ Request for callback”. The myleo / dsc skill interprets “Send message” as the intent and “HH-LE-1100” as the recipient. The Lambda function executes the API call to the myleo / dsc backend and sends “Request callback” to the truck with the license plate “HH-LE-1100”. If the call succeeds, the text “Message sent” is generated and spoken; in case of an error, the output is “Error while sending the message”.

The skill development

Put simply, an Alexa skill is a scheme for a dialogue with a computer, also called an interaction model. As in Star Trek, Alexa only reacts when a wake word is spoken, followed by a verbal command. This command must follow a defined structure for Alexa to recognize it.

The sentence “Alexa, say myleo, send message to ‘HH-LE-1100’ ‘Request for callback’” corresponds to the following scheme:

“<Alexa activation command>, say <invocation>, <intent>, <slot1> <slot2>”

If you do not follow this scheme, Alexa will not recognize what you want. Fortunately, there are options for variations, follow-up questions and dialogues, with which you can imitate natural language behaviour.
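To make the scheme more concrete, here is a minimal sketch of how such an interaction model could be written down, shown as a Python dictionary that mirrors the JSON structure of an Alexa interaction model. The intent name SendMessageIntent, the slot names licensePlate and messageText, the two custom slot types and the spoken form of the license plate are assumptions for illustration and are not taken from the actual myleo / dsc skill. Built-in intents such as help or stop are left out for brevity. The small helper only checks that every slot used in a sample utterance has been declared.

```python
import json
import re

# Hypothetical names throughout: SendMessageIntent, licensePlate, messageText,
# LicensePlate and MessageText are illustrative assumptions.
INTERACTION_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "myleo",  # <invocation>
            "intents": [
                {
                    "name": "SendMessageIntent",  # <intent>
                    "slots": [  # <slot1>, <slot2>
                        {"name": "licensePlate", "type": "LicensePlate"},
                        {"name": "messageText", "type": "MessageText"},
                    ],
                    # Several samples give the speaker room for variation.
                    "samples": [
                        "send message to {licensePlate} {messageText}",
                        "send to {licensePlate} {messageText}",
                        "send {messageText} to {licensePlate}",
                    ],
                }
            ],
            "types": [
                {
                    "name": "LicensePlate",
                    # Spoken form of "HH-LE-1100"; the spelling is an assumption.
                    "values": [{"name": {"value": "H. H. L. E. one one zero zero"}}],
                },
                {
                    "name": "MessageText",
                    "values": [{"name": {"value": "request for callback"}}],
                },
            ],
        }
    }
}


def undeclared_slots(model):
    """Return (intent, slot) pairs used in samples but missing from 'slots'."""
    problems = []
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        declared = {slot["name"] for slot in intent.get("slots", [])}
        for sample in intent.get("samples", []):
            for used in re.findall(r"\{(\w+)\}", sample):
                if used not in declared:
                    problems.append((intent["name"], used))
    return problems


if __name__ == "__main__":
    print(json.dumps(INTERACTION_MODEL, indent=2))
```

Each additional sample utterance is one more phrasing that Alexa can map to the same intent, which is how the variations mentioned above are expressed in the model.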

The use of slots

Slot variables, such as the license plate in our example, are a greater challenge in industrial environments. Alexa has a long list of predefined slot types that can be used here (phone numbers, PINs, city lists etc.). Complex identifiers such as license plates or document numbers are not included, and making them easy to recognize is a challenge. For example, for an ID “IH5001”, “India Hotel five thousand one” should be just as valid as “I. H. five zero zero one”. A great solution for this would be “dynamic variables”, i.e. lists that are supplied by your own backend. Unfortunately, these are limited to 100 entries, which is not enough for large namespaces. Hopefully Amazon will improve this.
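One way to approach the recognition problem is to normalize the spoken text in your own backend after Alexa has transcribed it. The following is a minimal sketch under that assumption; the function name normalize_id is hypothetical, and the number handling is deliberately limited to single digits and the pattern “<digit> thousand [<digit>]”, which is just enough for the example ID above.

```python
import re

# NATO spelling alphabet, keyed by the lower-case spoken word.
NATO = {
    "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliett": "J", "juliet": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "xray": "X", "yankee": "Y", "zulu": "Z",
}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
DIGITS = {word: value for value, word in enumerate(DIGIT_WORDS)}


def normalize_id(spoken: str) -> str:
    """Turn a spoken identifier into its compact written form (hypothetical helper)."""
    # Lower-case and drop punctuation, so "I." becomes the token "i".
    tokens = re.findall(r"[a-z0-9]+", spoken.lower())
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in NATO:
            out.append(NATO[tok])
        elif tok in DIGITS and nxt == "thousand":
            # Limited number phrase: "<digit> thousand [<digit>]".
            value = DIGITS[tok] * 1000
            i += 1  # skip "thousand"
            if i + 1 < len(tokens) and tokens[i + 1] in DIGITS:
                value += DIGITS[tokens[i + 1]]
                i += 1
            out.append(str(value))
        elif tok in DIGITS:
            out.append(str(DIGITS[tok]))
        elif tok.isdigit() or (len(tok) == 1 and tok.isalpha()):
            # Already a digit string or a single spelled letter.
            out.append(tok.upper())
        else:
            raise ValueError(f"cannot interpret {tok!r}")
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(normalize_id("India Hotel five thousand one"))
    print(normalize_id("I. H. five zero zero one"))
```

Both spoken variants from the paragraph above resolve to the same ID. A production version would need a complete number grammar and a check of the result against the IDs that actually exist in the backend.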

The backend communication system

Once you have an interaction model, Alexa is able to recognize the intent and the slot values and send them to your own backend. There, the voice output is generated according to the voice input and sent back to Amazon. Amazon recommends an AWS Lambda function for this. It is quick to set up and use, and the existing myleo / dsc backend does not need to be extended. Especially when developing prototypes, this saves a lot of time.
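The Lambda function described here can be sketched in a few lines. This version works directly on the JSON request and response format of a custom skill, without the Alexa SDK. The intent name SendMessageIntent, the slot names licensePlate and messageText, and the function send_to_backend are assumptions for illustration; the actual myleo / dsc API call is not shown in this article and is therefore left as a placeholder.

```python
def build_response(text):
    """Wrap a plain-text answer in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def send_to_backend(license_plate, message):
    """Placeholder (assumption): the real HTTP call to the backend goes here."""
    raise NotImplementedError("backend call not implemented in this sketch")


def lambda_handler(event, context, send=send_to_backend):
    """Entry point called by AWS Lambda with the Alexa request as 'event'."""
    request = event.get("request", {})
    is_send_intent = (
        request.get("type") == "IntentRequest"
        and request["intent"]["name"] == "SendMessageIntent"  # hypothetical name
    )
    if not is_send_intent:
        return build_response("Sorry, I did not understand that.")

    # Slot names are hypothetical and must match the interaction model.
    slots = request["intent"].get("slots", {})
    plate = slots.get("licensePlate", {}).get("value")
    message = slots.get("messageText", {}).get("value")
    try:
        if not plate or not message:
            raise ValueError("missing slot value")
        send(plate, message)
    except Exception:
        return build_response("Error while sending the message")
    return build_response("Message sent")
```

The send parameter exists only so that the backend call can be replaced by a stub during testing; Lambda itself calls the handler with the event and context arguments.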

A tip: Many tutorials on the net are based on an older version of the “Alexa SDK” and use the “skillinator.io” service to create a boilerplate Node.js Lambda function from a model. The current SDK version is v2, and the Skillinator is no longer available. A better option is the tooling offered by Amazon itself, which also supports the current SDK.

The authentication

Since the myleo / dsc backend is of course not usable without authentication, the Alexa device must authenticate itself. Alexa offers secure and convenient OAuth2 support for this. It also supports a reduced variant, the “Implicit Grant”, with which you can connect your backend quickly and securely without running a full OAuth2 service. However, this method requires regular re-logins. That is negligible for rapid prototyping, but the method is not recommended for productive use. For this login, a separate web page had to be implemented, because the existing login page could not be used. By the way, this was also the only development work done directly in myleo / dsc; everything else was implemented with AWS services. Really efficient!
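Once the user has linked their account, Alexa includes the access token in every request sent to the skill, and the Lambda function can pass it on to the backend. The following is a minimal sketch of that step. Passing the token as a Bearer header to the myleo / dsc backend is an assumption, as are the function names; the location of the token in the request and the LinkAccount card follow the standard Alexa request and response format.

```python
def get_access_token(event):
    """Read the account-linking token from the Alexa request, if present."""
    user = event.get("context", {}).get("System", {}).get("user", {})
    return user.get("accessToken")


def link_account_response():
    """Ask the user to log in; the LinkAccount card appears in the Alexa app."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Please link your account in the Alexa app.",
            },
            "card": {"type": "LinkAccount"},
            "shouldEndSession": True,
        },
    }


def authorization_header(event):
    """Build the header for the backend call (Bearer scheme is an assumption)."""
    token = get_access_token(event)
    if not token:
        return None
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    linked = {"context": {"System": {"user": {"accessToken": "example-token"}}}}
    print(authorization_header(linked))
    print(authorization_header({}) or link_account_response())
```

With the Implicit Grant there is no refresh token, so the token eventually expires and the skill has to send the user back to the login page. This is the regular re-login mentioned above.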

Final review of the prototype

POSITIVE

  • You can easily and quickly create a prototype for a use case.
  • It is easy to convince a potential user of the advantages of a voice assistant, because the hardware makes a very good impression.
  • Voice assistants are still on the rise; the topic is actively maintained and pushed by Amazon.
  • Voice assistants are in vogue and are well known from home use.
  • With the opening of the logistics market for cloud solutions, many more fields of application become accessible.

NEGATIVE

  • There are limitations regarding more complex codes, and the use of IDs (license plates, container numbers) is cumbersome. In some use cases this makes productive use difficult.
  • Security issue: Under certain circumstances, sensitive data may become visible to Amazon (unlike, for example, all other data within myleo / dsc, which is completely encrypted and cannot be viewed by the cloud operator).

Were we able to arouse your interest in the use of voice assistants? Do you have a use case for a Voice UI, with Amazon Alexa or your own voice assistant? Feel free to contact us!

If you have any questions about this or other topics on the blog, please contact blog@leogistics.com.

Clemens Holzhüter
Jan-Philipp Horstmann
Digital Supply Chain
