DATA TALK: "Tuning-up Data Systems with Deep Learning"

Jarek Szlichta -
Data Science

Date: -
Location: Eurecom

Abstract: Modern data systems, such as IBM Db2, are equipped with hundreds of configuration parameters. Tuning these parameters can significantly affect the performance of business queries. They range from the allocation of physical memory resources within the system to the optimization levels of the query optimizer, which dictate the decisions it makes. Traditionally, configuration parameter tuning has been carried out by a system administrator or an expert from the vendor. This manual tuning is, however, a laborious and time-consuming task. To alleviate the human burden, our system uses deep reinforcement learning with advantage actor-critic neural networks to optimize configuration parameters automatically. The core of our approach involves translating high-dimensional query execution plans (QEPs) into a lower-dimensional embedding space called QEP2Vec, which serves as input to the ML models. To scale to large query workloads, we bootstrap the training process through transfer learning. We first train our model using the estimated costs of queries, and then fine-tune it on actual query execution times. This two-step training approach enhances the model's ability to adapt to real-world scenarios, leading to improved tuning performance.

Bio: Jarek Szlichta is an Associate Professor at York University. He is also a Research Faculty Fellow at the IBM Centre for Advanced Studies (CAS) and an Adjunct Professor at the University of Waterloo. Prior to that, he was a Postdoctoral Fellow at the University of Toronto. His research concerns various topics in data science, with special interests in data-driven systems, large-scale machine learning, graph data, and responsible AI for obtaining trustworthy insights from data. He is a recipient of the CeBIT Computer Expo Business Award for his work on the OCEAN GenRap analytic reporting tool and of the runner-up IBM Project of the Year Award for automatic tuning of the IBM Db2 system. His research grants come from both government (NSERC, MITACS and SOSCIP) and industry (IBM and AT&T). He received his PhD from York University, during which he held a three-year student fellowship at IBM CAS (with the IBM Research Student of the Year Award).