Towards a systematic multi-modal representation learning for network data

Ben Houidi, Zied; Azorin, Raphaël; Gallo, Massimo; Finamore, Alessandro; Rossi, Dario

HOTNETS 2022, 20th ACM, Workshop on Hot Topics in Networks, 14-15 November 2022, Austin, TX, USA

Learning the right representations from complex input data is the key ability of successful machine learning (ML) models. Such models are often tailored to a specific data modality. For example, recurrent neural networks (RNNs) were designed with sequential data in mind, while convolutional neural networks (CNNs) were designed to exploit spatial correlation in images. Unlike computer vision (CV) and natural language processing (NLP), each of which targets a single well-defined modality, network ML problems often have a mixture of data modalities as input. Yet, instead of exploiting this abundance, practitioners tend to rely on a subset of the available features, reducing the problem to a single modality for the sake of simplicity. In this paper, we advocate for exploiting all the modalities naturally present in network data. As a first step, we observe that network data systematically exhibits a mixture of quantities (e.g., measurements) and entities (e.g., IP addresses and names). Whereas the former are generally well exploited, the latter are often underused or poorly represented (e.g., with one-hot encoding). We propose to systematically leverage language models to learn entity representations whenever significant sequences of such entities are historically observed. Through two diverse use cases, we show that such entity encoding can benefit and naturally augment classic quantity-based features.
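To make the idea of entity encoding concrete, here is a minimal sketch, not the method used in the paper. It assumes that entities seen in similar contexts should get similar vectors, and it swaps the language model for plain co-occurrence counting so that the example stays dependency-free. The function names, the window size, the example IP addresses (taken from documentation ranges), and the quantity values are all hypothetical.

```python
from math import sqrt


def cooccurrence_embeddings(sequences, window=2):
    """Represent each entity by its co-occurrence counts with every other entity.

    This is a simple stand-in for a learned language-model embedding.
    """
    vocab = sorted({entity for seq in sequences for entity in seq})
    index = {entity: i for i, entity in enumerate(vocab)}
    vectors = {entity: [0.0] * len(vocab) for entity in vocab}
    for seq in sequences:
        for i, center in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[center][index[seq[j]]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment(quantities, entity, embeddings):
    """Concatenate quantity-based features with the entity's vector."""
    return list(quantities) + embeddings[entity]


# Hypothetical sequences of destination IPs, one sequence per client.
sequences = [
    ["192.0.2.1", "192.0.2.2", "198.51.100.7"],
    ["192.0.2.1", "192.0.2.2", "198.51.100.7"],
    ["203.0.113.5", "203.0.113.9"],
]
embeddings = cooccurrence_embeddings(sequences)

# Entities contacted together end up closer than entities never seen together.
similar = cosine(embeddings["192.0.2.1"], embeddings["192.0.2.2"])
unrelated = cosine(embeddings["192.0.2.1"], embeddings["203.0.113.5"])

# Hypothetical quantities (e.g., bytes and packet count) plus the entity vector.
features = augment([120.0, 3.0], "192.0.2.1", embeddings)
```

Unlike one-hot encoding, where every pair of distinct entities is equally dissimilar, vectors of this kind carry information about which entities tend to appear together. A trained language model would play the same role with denser, lower-dimensional vectors.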

Type: Conference

City: Austin

Date: 2022-11-14

Department: Data Science

Eurecom Ref: 7194

© ACM, 2022. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in HOTNETS 2022, 20th ACM, Workshop on Hot Topics in Networks, 14-15 November 2022, Austin, TX, USA https://doi.org/10.1145/3563766.3564108