Traffic classification: Application-based feature selection using logistic regression

En-Najjary, Taoufik; Urvoy-Keller, Guillaume; Pietrzyk, Marcin; Costeux, Jean-Laurent
Research report RR-10-234

Recently, several statistical techniques using flow features have been proposed to address the problem of traffic classification. These methods generally achieve high recognition rates for the dominant applications and more random results for less popular ones. This stems from the selection process of the flow features used as inputs to the statistical algorithm, which is biased toward those dominant applications. As a consequence, existing methods are difficult to adapt to the changing needs of network administrators, who might want to quickly identify dominant applications such as P2P or HTTP-based applications, or to zoom in on specific applications that are less popular (in terms of bytes or flows) on a given site, which could be HTTP streaming or BitTorrent, for instance. We propose a new approach, based on logistic regression, that aims to address the above-mentioned issues. Our technique incorporates the following features: (i) automatic selection of a distinct, per-application feature set that best separates the application from the rest of the traffic; (ii) real-time implementation, as it needs to inspect only the first few packets of a flow to classify it; (iii) low computation cost, as logistic regression is implemented by comparing a linear combination of a flow's features with a fixed threshold value; (iv) ability to handle application types that former methods failed to classify. We validate the method using two recent data sets collected on two ADSL platforms of a large ISP.
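
The decision rule described in point (iii) can be illustrated with a minimal Python sketch: one binary model per application, each with its own feature subset, and a classification made by comparing a linear combination of a flow's features with a fixed threshold. The application names, feature names (pkt1_size, dst_port_is_80, and so on), weights, and intercepts below are hypothetical values chosen for illustration; they are not taken from the report, and the report's actual feature selection and training procedure are not reproduced here.

```python
import math

# Hypothetical per-application models. Each application has its own feature
# subset, weights, and intercept. All names and numbers are illustrative
# assumptions, not values from the report.
MODELS = {
    "http_streaming": {
        "weights": {"pkt1_size": 0.004, "pkt2_size": 0.002, "dst_port_is_80": 1.5},
        "intercept": -4.0,
    },
    "bittorrent": {
        "weights": {"pkt1_size": -0.003, "pkt3_size": 0.001, "dst_port_is_80": -2.0},
        "intercept": 0.5,
    },
}


def linear_score(model, flow):
    """Return w . x + b using only the features selected for this application."""
    return model["intercept"] + sum(
        weight * flow.get(name, 0.0) for name, weight in model["weights"].items()
    )


def probability(model, flow):
    """Apply the logistic function to the linear score."""
    return 1.0 / (1.0 + math.exp(-linear_score(model, flow)))


def matches(model, flow, threshold=0.0):
    """Accept the flow if its linear score reaches a fixed threshold.

    A threshold of 0 on the score is equivalent to a probability cut-off of 0.5,
    so no exponential has to be evaluated at classification time.
    """
    return linear_score(model, flow) >= threshold


def classify(flow):
    """Return every application whose binary model accepts the flow."""
    return [app for app, model in MODELS.items() if matches(model, flow)]


if __name__ == "__main__":
    # Features assumed to be measured on the first few packets of a flow.
    flow = {"pkt1_size": 1000, "pkt2_size": 500, "dst_port_is_80": 1}
    print(classify(flow))
```

Because each application has its own model, a flow may be accepted by none or by several of them; how such cases are resolved is a design choice that this sketch leaves open.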



Digital Security
Eurecom Ref:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Research report RR-10-234 and is available at :