Traffic classification consists of associating network flows with the application that generated them. This subject, of crucial importance for both service providers and network managers, has already received substantial attention in the research community. Despite these efforts, a number of issues remain unsolved. This work is organized in three parts, each addressing a different aspect of the challenging task of traffic classification and its use cases.
The first part presents an in-depth study of state-of-the-art statistical classification methods, based on passive traces collected in the access network of an ISP offering ADSL access to residential users. We critically evaluate their performance, including the portability aspect, which has so far been overlooked in the community. Portability is defined as the ability of a classifier to perform well on sites (networks) other than the one on which it was trained.
The second part aims to remedy some of the problems uncovered in part one, mainly those concerning portability. We propose a self-learning hybrid classification method that enables synergy between a priori heterogeneous sources of information (e.g. statistical flow features and the presence of a signature). We first extensively evaluate its performance using the data sets from part one. We then report the main findings from deploying the tool on an operational ADSL platform, where the hybrid classifier monitored the network for more than half a year.
The last part presents a practical use case of traffic classification and focuses on building application-level profiles of the customers of an ISP.