Traffic classification is a key function for ISPs and companies in general. Several
different classes of methods have been proposed, especially deep packet inspection
(DPI) and machine-learning-based approaches. Each approach is generally
effective for some classes of applications. However, there is no one-size-fits-all
method, i.e., no method that offers the best performance for all applications.
In this paper, we propose a framework called Hybrid Traffic Identification
(HTI) that makes it possible to combine the merits of different approaches. Any
source of information (flow statistics, signatures, etc.) is encoded as a feature; the
actual classification is made by a machine learning algorithm. We demonstrate
that HTI does not depend on a specific machine learning algorithm, and that any
classification method can be incorporated into HTI, as its decision can be encoded
as a new feature.
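The feature-encoding idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the HTI implementation: the two payload signatures, the port-based helper `port_classifier_says_web`, the flow dictionaries, and the choice of a nearest-centroid learner (picked only to keep the example free of dependencies; any supervised learner could take its place).

```python
# Illustrative sketch only: signatures, feature choices, and flows are made up.

# Hypothetical payload signatures standing in for DPI rules.
SIGNATURES = [b"BitTorrent protocol", b"HTTP/1.1"]


def port_classifier_says_web(flow):
    """A hypothetical port-based classifier; its decision becomes one feature."""
    return flow["dst_port"] == 80


def encode(flow):
    """Turn every source of information into one numeric feature vector."""
    features = [
        flow["mean_pkt_size"] / 1500.0,                  # flow statistic
        1.0 if port_classifier_says_web(flow) else 0.0,  # another method's decision
    ]
    # Each signature match is encoded as a binary feature.
    features += [1.0 if sig in flow["payload"] else 0.0 for sig in SIGNATURES]
    return features


def train(flows, labels):
    """Nearest-centroid learner: average the feature vectors of each class."""
    grouped = {}
    for flow, label in zip(flows, labels):
        grouped.setdefault(label, []).append(encode(flow))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(centroids, flow):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = encode(flow)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


training_flows = [
    {"mean_pkt_size": 900, "dst_port": 6881, "payload": b"\x13BitTorrent protocol"},
    {"mean_pkt_size": 1200, "dst_port": 51413, "payload": b"\x13BitTorrent protocol"},
    {"mean_pkt_size": 600, "dst_port": 80, "payload": b"GET / HTTP/1.1"},
    {"mean_pkt_size": 300, "dst_port": 80, "payload": b"POST /x HTTP/1.1"},
]
training_labels = ["bittorrent", "bittorrent", "web", "web"]
model = train(training_flows, training_labels)

# A flow that would mislead a purely port-based method: P2P payload on port 80.
evasive = {"mean_pkt_size": 1050, "dst_port": 80, "payload": b"\x13BitTorrent protocol"}
print(classify(model, evasive))
```

In this toy setting the port-based decision is only one input among several, so the learner can still rely on the signature feature when the port is misleading.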
Using multiple traces from a large ISP, we demonstrate that HTI outperforms
state-of-the-art methods because it can select, for each application, the sources
of information that best detect it. We further report on an ongoing live
experiment with our HTI instance in the production network of that ISP, which
already represents several weeks of continuous traffic classification.