In the airline industry, price prediction plays a significant role for both customers and travel companies. On the customer side, the dynamic pricing strategies adopted by airlines cause prices to fluctuate over time, and the resulting uncertainty worries travellers. On the other side, travel companies are interested in performing accurate short- and long-term price forecasts to build attractive offers and maximize their revenue margins. However, price time-series comprise complex, time-evolving patterns and non-stationarities, which make forecasting models deteriorate over time; efficient model performance monitoring is therefore key to ensuring accurate forecasts. Besides this, the steady growth of airline traffic produces massive datasets, which call for automatic procedures. In this thesis we introduce two machine learning applications that are crucial for the airline industry. The first answers the question of the best moment to buy a ticket, through specific metrics that help travellers decide whether to buy or wait. The second addresses the problem of model performance monitoring in industrial applications: a data-driven framework, built on top of existing forecasting models, which estimates model performance and performs dynamic model selection.

Stochastic Gradient Descent (SGD) is the workhorse optimization method in machine learning. When dealing with complex models and massive datasets, we resort to distributed systems, in which multiple nodes compute stochastic gradients on partitions of the dataset and model parameters are updated by aggregating information from all nodes. In asynchronous systems, convergence of SGD is challenging because distributed workers might produce gradient updates for a loss computed on stale versions of the current model iterates.
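As a minimal illustration of the staleness effect described above (not the analysis developed in this thesis), asynchrony can be simulated by applying each worker's gradient to parameters that are several steps old. All names, the round-robin worker schedule, and the constants below are hypothetical; on a well-conditioned least-squares problem, SGD still converges under mild staleness:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem, split across hypothetical workers.
n_workers, n_per_worker, d = 4, 50, 5
X = rng.normal(size=(n_workers * n_per_worker, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=len(X))
parts = np.array_split(np.arange(len(X)), n_workers)

def grad(w, idx):
    """Stochastic gradient of the squared loss on one worker's partition."""
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

# Asynchronous SGD simulation: each gradient is computed on a model
# copy that is `delay` steps stale, mimicking a slow worker.
w = np.zeros(d)
history = [w.copy()]
lr, delay = 0.05, 3
for t in range(200):
    stale_w = history[max(0, len(history) - 1 - delay)]
    k = t % n_workers  # round-robin stand-in for asynchronous arrivals
    w = w - lr * grad(stale_w, parts[k])
    history.append(w.copy())
```

With a small learning rate and bounded staleness, the iterates still approach the least-squares solution; larger delays or step sizes would destabilize the recursion, which is precisely what makes the asynchronous analysis delicate.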
In this context, one technique that has achieved remarkable results, albeit in synchronous setups, is sparsification, as it reduces communication overheads. In this thesis we fill this gap in the literature and study sparsification methods in asynchronous settings. For the first time, we provide a concise and simple convergence rate analysis that accounts for the joint effects of sparsification and asynchrony, and show that sparsified SGD converges at the same rate as standard SGD.

Recently, SGD has also played an important role as a way to perform approximate Bayesian inference, which is a principled way of developing probabilistic models. Stochastic gradient MCMC algorithms indeed use SGD with a constant learning rate to obtain samples from the posterior distribution. Despite their mathematical elegance and some promising results restricted to simple models, most existing works fall short of easily handling the complexity of the loss landscape of deep models, for which stochastic optimization poses serious challenges. Existing methods are hence often impractical, because they require ad-hoc, sophisticated vanishing learning-rate schedules and hyper-parameter tuning. In this thesis we introduce a novel, practical approach to posterior sampling, which makes the SG noise isotropic using a fixed learning rate and requires weaker assumptions than existing algorithms.
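To make the constant-learning-rate sampling idea concrete, a standard stochastic gradient Langevin dynamics (SGLD) step adds Gaussian noise of variance equal to the learning rate to an SGD update; this is the baseline family of SG-MCMC algorithms mentioned above, not the isotropic-noise method proposed in this thesis. The toy conjugate-Gaussian model and all constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: y_i ~ N(theta, 1), prior theta ~ N(0, 1).
N = 1000
theta_true = 2.0
y = theta_true + rng.normal(size=N)

# Closed-form posterior for reference: N(post_mean, post_var).
post_var = 1.0 / (1.0 + N)
post_mean = post_var * y.sum()

def grad_log_post_est(theta, batch):
    # Unbiased stochastic gradient of the log-posterior:
    # grad log-prior + (N / |batch|) * sum of minibatch grad log-likelihoods.
    return -theta + (N / len(batch)) * np.sum(batch - theta)

# SGLD with a *constant* learning rate: an SGD step on the log-posterior
# plus injected Gaussian noise with variance equal to the learning rate.
lr, n_batch = 1e-4, 50
theta = 0.0
samples = []
for t in range(5000):
    batch = rng.choice(y, size=n_batch, replace=False)
    theta += 0.5 * lr * grad_log_post_est(theta, batch) + np.sqrt(lr) * rng.normal()
    if t >= 1000:  # discard burn-in
        samples.append(theta)
samples = np.array(samples)
```

On this simple model the sample mean tracks the analytic posterior mean, while the minibatch gradient noise slightly inflates the sample variance; with a fixed learning rate this bias does not vanish, which is one reason constant-rate SG-MCMC requires careful treatment on harder models.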
© EURECOM. Personal use of this material is permitted. The definitive version of this thesis was published and is available at: