Traffic classification: Application-based feature selection using logistic regression

En-Najjary, Taoufik; Urvoy-Keller, Guillaume; Pietrzyk, Marcin; Costeux, Jean-Laurent
Research report RR-10-234

Recently, several statistical techniques using flow features have been proposed to address the problem of traffic classification. These methods generally achieve high recognition rates for the dominant applications, but more erratic results for less popular ones. This stems from the process used to select the flow features fed to the statistical algorithm, which is biased toward those dominant applications. As a consequence, existing methods are difficult to adapt to the changing needs of network administrators, who may want either to quickly identify dominant applications such as P2P or HTTP-based applications, or to zoom in on specific applications that are less popular (in terms of bytes or flows) on a given site, for instance HTTP streaming or BitTorrent. We propose a new approach based on logistic regression to address these issues. Our technique has the following properties: (i) automatic selection of a distinct, per-application feature set that best separates each application from the rest of the traffic; (ii) real-time operation, as it only needs to inspect the first few packets of a flow to classify it; (iii) low computational cost, as logistic regression is implemented by comparing a linear combination of flow features with a fixed threshold value; (iv) the ability to handle application types that former methods failed to classify. We validate the method using two recent data sets collected on two ADSL platforms of a large ISP.
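
Point (iii) above says a flow is classified by comparing a linear combination of its features with a fixed threshold. The Python sketch below illustrates that idea for one application of interest against the rest of the traffic (one-vs-rest). It is only an illustration under stated assumptions: the two features (sizes of the first two packets, scaled by an assumed 1500-byte MTU), the toy Gaussian data, and the plain batch gradient-descent fit are hypothetical choices, not the report's feature set, data, or estimation procedure. The per-application feature selection step is not shown.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, b, x):
    """Linear combination w . x + b of a flow's feature vector x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit (w, b) by batch gradient descent on the mean log-loss.

    X: list of feature vectors, one per flow
    y: 1 if the flow belongs to the target application, 0 otherwise
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, b, xi)) - yi  # dLoss/dz for one flow
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_classifier(w, b, p=0.5):
    """Turn a fitted model into a fixed-threshold test.

    sigmoid(z) > p  <=>  z > log(p / (1 - p)), so classifying a flow needs
    only one dot product and one comparison, with no exp() at run time.
    """
    threshold = math.log(p / (1.0 - p))
    return lambda x: linear_score(w, b, x) > threshold


def toy_flows(rng, n, center):
    """Hypothetical toy data: sizes of the first two packets of a flow,
    scaled to [0, 1] by an assumed 1500-byte MTU, scattered around `center`."""
    return [[min(max(rng.gauss(c, 0.05), 0.0), 1.0) for c in center]
            for _ in range(n)]


rng = random.Random(0)
target = toy_flows(rng, 100, (0.1, 0.9))  # flows of the application of interest
others = toy_flows(rng, 100, (0.6, 0.2))  # rest of the traffic
X = target + others
y = [1] * len(target) + [0] * len(others)

w, b = fit_logistic(X, y)
is_target_app = make_classifier(w, b, p=0.5)

# Evaluate on fresh toy flows drawn from the same hypothetical distributions.
test_target = toy_flows(rng, 50, (0.1, 0.9))
test_others = toy_flows(rng, 50, (0.6, 0.2))
correct = (sum(is_target_app(x) for x in test_target)
           + sum(not is_target_app(x) for x in test_others))
accuracy = correct / (len(test_target) + len(test_others))
print(f"w={w}, b={b:.3f}, accuracy={accuracy:.2f}")
```

Because sigmoid(z) > p is equivalent to z > log(p / (1 - p)), the fitted probability model collapses into a fixed threshold on the linear score, which is what keeps the per-flow cost low; raising p trades recall for precision. Under this sketch's assumptions, a per-application deployment would train one such (w, b) pair per application, each on its own feature subset.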

Type:
Report
Date:
2010-03-03
Department:
Digital Security
Eurecom Ref:
3039
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research report RR-10-234 and is available at:

PERMALINK: https://www.eurecom.fr/publication/3039