Markus Dreyer

Machine Learning Scientist

Amazon.com (Alexa)


Markus Dreyer is a Machine Learning Scientist at Amazon. He has been part of the Alexa group since 2015, leading projects in natural language understanding and question answering. He has published on transfer learning, semi-supervised learning, graphical models, semantic parsing, neural lattice parsing, and entity linking.


  • Natural Language Processing
  • Artificial Intelligence/Deep Learning
  • Machine Learning


  • PhD in Computer Science

    Johns Hopkins University

  • M.Sc. in Computer Science

    Johns Hopkins University

  • M.A. in Computational Linguistics, Linguistics, Philosophy

    Heidelberg University



Machine Learning Scientist


March 2015 – Present Seattle
Question answering, summarization, and natural language understanding (NLU).

Research Scientist

SDL (Language Weaver)

January 2011 – February 2015 Los Angeles, California
Algorithms and statistical models for large-scale machine translation (MT).

Research Assistant

CLSP, Johns Hopkins

January 2004 – December 2010 Baltimore, Maryland
Semi-supervised learning of nonconcatenative morphology, based on graphical models, the Dirichlet process and log-linear parameterization of finite-state machines.

Research Scientist

IBM, Speech Group

May 1999 – February 2003 Heidelberg, Germany
Text-to-speech (TTS) technology and parsing methods.

Recent Publications

Quickly discover relevant content by filtering publications.

Multi-Task Networks with Universe, Group, and Task Feature Learning

We present methods for multi-task learning that take advantage of natural groupings of related tasks. Task groups may be defined along known properties of the tasks, such as task domain or language. Such task groups represent supervised information at the inter-task level and can be encoded into the model. We investigate two variants of neural network architectures that accomplish this, learning different feature spaces at the levels of individual tasks, task groups, as well as the universe of all tasks: (1) parallel architectures encode each input simultaneously into feature spaces at different levels; (2) serial architectures encode each input successively into feature spaces at different levels in the task hierarchy. We demonstrate the methods on natural language understanding (NLU) tasks, where a grouping of tasks into different task domains leads to improved performance on ATIS, Snips, and a large in-house dataset.

Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding

This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK) a large scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon’s virtual assistant, Alexa. At Amazon, the infrastructure powers over 25,000 skills deployed through the ASK, as well as AWS’s Amazon Lex SLU Service. The ASK emphasizes flexibility, predictability and a rapid iteration cycle for third party developers. It imposes inductive biases that allow it to learn robust SLU models from extremely small and sparse datasets and, in doing so, removes significant barriers to entry for software developers and dialogue systems researchers.

Transfer Learning for Neural Semantic Parsing

The goal of semantic parsing is to map natural language to a machine interpretable meaning representation language (MRL). One of the constraints that limits full exploration of deep learning technologies for semantic parsing is the lack of sufficient annotation training data. In this paper, we propose using sequence-to-sequence in a multi-task setup for semantic parsing with focus on transfer learning. We explore three multi-task architectures for sequence-to-sequence model and compare their performance with the independently trained model. Our experiments show that the multi-task setup aids transfer learning from an auxiliary task with large labeled data to the target task with smaller labeled data. We see an absolute accuracy gain ranging from 1.0% to 4.4% in in our in-house data set and we also see good gains ranging from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic and semantic auxiliary tasks.

Zero-Shot Learning Across Heterogeneous Overlapping Domains

We present a zero-shot learning approach for text classification, predicting which natural language understanding domain can handle a given utterance. Our approach can predict domains at runtime that did not exist at training time. We achieve this extensibility by learning to project utterances and domains into the same embedding space while generating each domain-specific embedding from a set of attributes that characterize the domain. Our model is a neural network trained via ranking loss. We evaluate the performance of this zero-shot approach on a subset of a virtual assistant’s third-party domains and show the effectiveness of the technique on new domains not observed during training. We compare to generative baselines and show that our approach requires less storage and performs better on new domains.

LatticeRnn: Recurrent Neural Networks Over Lattices

We present a new model called LatticeRnn, which generalizes recurrent neural networks (RNNs) to process weighted lattices as input, instead of sequences. A LatticeRnn can encode the complete structure of a lattice into a dense representation, which makes it suitable to a variety of problems, including rescoring, classifying, parsing, or translating lattices using deep neural networks (DNNs). In this paper, we use LatticeRnns for a classification task: each lattice represents the output from an automatic speech recognition (ASR) component of a spoken language understanding (SLU) system, and we classify the intent of the spoken utterance based on the lattice embedding computed by a LatticeRnn. We show that making decisions based on the full ASR output lattice, as opposed to 1-best or n-best hypotheses, makes SLU systems more robust to ASR errors. Our experiments yield improvements of 13% over a baseline RNN system trained on transcriptions and 10% over an n-best list rescoring system for intent classification.