Robust and scalable probabilistic machine learning methods with applications to the airline industry

Candela, Rosa

In the airline industry, price prediction plays a significant role both for customers and
travel companies. On the customer side, due to the dynamic pricing strategies adopted
by airlines, prices fluctuate over time and the resulting uncertainty causes worries among
travellers. On the other side, travel companies are interested in performing accurate short- and long-term price forecasts to build attractive offers and maximize their revenue margin.
However, price time-series comprise time-evolving complex patterns and non-stationarities,
which make forecasting models deteriorate over time. Efficient monitoring of model
performance is therefore key to ensuring accurate forecasts. Besides this, the steady growth of airline
traffic leads to the development of massive datasets, which call for automatic procedures.
In this thesis we introduce two machine learning applications which are crucial for the airline
industry. The first one addresses the question of when best to buy a ticket, using
specific metrics that help travellers decide whether to buy or wait. The second one
focuses on the problem of model performance monitoring in industrial applications
and consists of a data-driven framework, built on
top of the existing forecasting models, which estimates model performance and performs
dynamic model selection.
Stochastic Gradient Descent (SGD) represents the workhorse optimization method in the
field of machine learning. When dealing with complex models and massive datasets, we
resort to distributed systems, whereby multiple nodes compute stochastic gradients using
partitions of the dataset and model parameters are updated aggregating information from
all nodes. In asynchronous systems, convergence of SGD is challenging because distributed
workers might produce gradient updates for a loss computed on stale versions of the current
model iterates. In this context, one technique that has achieved remarkable results, albeit in
synchronous setups, is sparsification, which reduces communication overhead. In this
thesis we fill the gap in the literature and study sparsification methods in asynchronous
settings. For the first time, we provide a concise and simple convergence rate analysis when
the joint effects of sparsification and asynchrony are taken into account, and show that
sparsified SGD converges at the same rate as standard SGD.
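As a rough illustration of the setting described above, the sketch below applies top-k sparsification to stale gradients on a toy quadratic objective. It is not the thesis's algorithm or analysis: the top-k operator, the fixed staleness, the objective, and the names `top_k_sparsify` and `sparsified_stale_sgd` are all assumptions chosen for illustration.

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest."""
    if k <= 0:
        return np.zeros_like(grad)
    if k >= grad.size:
        return grad.copy()
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of the k largest |g_i|
    sparse[idx] = grad[idx]
    return sparse

def sparsified_stale_sgd(dim=20, k=5, staleness=3, steps=2000, lr=0.05, seed=0):
    """Minimise f(w) = 0.5 * ||w||^2 with noisy, stale, top-k sparsified gradients.

    Staleness is simulated (an assumption): each gradient is evaluated on the
    iterate from `staleness` steps earlier, as a delayed worker would do.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    history = [w.copy()]
    for _ in range(steps):
        stale_w = history[max(0, len(history) - 1 - staleness)]
        grad = stale_w + 0.01 * rng.normal(size=dim)  # gradient of f plus noise
        w = w - lr * top_k_sparsify(grad, k)          # only k coordinates are sent
        history.append(w.copy())
    return np.linalg.norm(history[0]), np.linalg.norm(w)

start_norm, end_norm = sparsified_stale_sgd()
print(start_norm, end_norm)
```

In this toy run only a quarter of the coordinates are communicated per step, yet the iterate norm still shrinks towards the noise floor. It says nothing about convergence rates, which is what the thesis analyses.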
Recently, SGD has also played an important role as a way to perform approximate Bayesian
inference, which is a principled way of developing probabilistic models. Stochastic gradient
MCMC algorithms indeed use SGD with a constant learning rate to obtain samples from
the posterior distribution. Despite mathematical elegance and some promising results
restricted to simple models, most existing works fall short of easily dealing with the
complexity of the loss landscape of deep models, for which stochastic optimization poses
serious challenges. Existing methods are hence often impractical, because they require
ad-hoc, sophisticated vanishing learning rate schedules, and hyper-parameter tuning. In
this thesis we introduce a novel, practical approach to posterior sampling, which makes the
stochastic gradient (SG) noise isotropic using a fixed learning rate and requires weaker
assumptions than existing algorithms.
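To make the idea of sampling with stochastic gradients concrete, here is a minimal sketch of stochastic gradient Langevin dynamics (SGLD), a well-known SG-MCMC algorithm, on a one-dimensional Gaussian model whose posterior is known in closed form. It is not the approach introduced in the thesis. The model, the prior, the fixed step size (SGLD is usually stated with a decreasing one), and the name `sgld_gaussian_mean` are assumptions made for illustration.

```python
import numpy as np

def sgld_gaussian_mean(x, step=1e-4, batch=50, n_steps=6000, burn_in=1000,
                       prior_var=100.0, seed=0):
    """Sample the mean theta of N(theta, 1) data under a N(0, prior_var) prior."""
    rng = np.random.default_rng(seed)
    n = x.size
    theta = 0.0
    samples = []
    for t in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        # Stochastic gradient of the log posterior: prior term plus the
        # minibatch likelihood term rescaled to the full dataset size.
        grad = -theta / prior_var + (n / batch) * np.sum(x[idx] - theta)
        # Langevin step: half a gradient step plus Gaussian noise of variance `step`.
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.normal()
        if t >= burn_in:
            samples.append(theta)
    return np.array(samples)

data_rng = np.random.default_rng(1)
data = data_rng.normal(loc=2.0, scale=1.0, size=1000)
draws = sgld_gaussian_mean(data)
# Closed-form posterior mean for this conjugate model (prior variance 100).
exact_mean = data.sum() / (data.size + 1.0 / 100.0)
print(draws.mean(), exact_mean, draws.var())
```

With a fixed step size, the minibatch gradient noise adds to the injected Gaussian noise, so the spread of the samples is generally wider than that of the exact posterior. This is one simple view of why the structure of the SG noise matters for posterior sampling.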

Data Science
Eurecom Ref:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :