Hybrid traffic identification

Pietrzyk, Marcin; En-Najjary, Taoufik; Urvoy-Keller, Guillaume; Costeux, Jean-Laurent

Research Report RR-10-238

Traffic classification is a key function for ISPs and for companies in general. Several classes of methods have been proposed, notably deep packet inspection (DPI) and machine-learning-based approaches. Each approach generally works well for some classes of applications. However, there is no one-size-fits-all method, i.e., no method that offers the best performance for all applications. In this paper, we propose a framework called Hybrid Traffic Identification (HTI) that makes it possible to take advantage of the merits of different approaches. Any source of information (flow statistics, signatures, etc.) is encoded as a feature; the actual classification is made by a machine learning algorithm. We demonstrate that HTI does not depend on a specific machine learning algorithm, and that any classification method can be incorporated into HTI, since its decision can be encoded as a new feature.
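The encoding idea can be illustrated with a minimal sketch. It is not the report's implementation. The feature set, the port list, the signature string, the toy flows, and the choice of logistic regression are all assumptions made here for illustration; the abstract only states that each information source becomes a feature and that the learning algorithm is interchangeable.

```python
import math

# Hypothetical signature: the BitTorrent handshake carries this string.
SIGNATURE = b"BitTorrent protocol"
WEB_PORTS = (80, 8080)  # assumed port list, for illustration only


def encode_flow(mean_pkt_size, dst_port, payload, other_says_p2p):
    """Encode heterogeneous evidence about one flow as a numeric feature vector."""
    return [
        1.0,                                      # bias term
        mean_pkt_size / 1500.0,                   # flow statistic, scaled to about [0, 1]
        1.0 if dst_port in WEB_PORTS else 0.0,    # port-based hint
        1.0 if SIGNATURE in payload else 0.0,     # DPI signature match
        1.0 if other_says_p2p else 0.0,           # decision of another classifier
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    """Probability that the flow belongs to the target application."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain stochastic gradient ascent."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = label - predict(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Invented toy flows: (mean packet size, dst port, payload, other classifier's vote)
flows = [
    (1200, 6881, b"\x13BitTorrent protocol", True),
    (900, 51413, b"\x13BitTorrent protocol", False),
    (1100, 6881, b"", True),   # encrypted payload: no signature, but the vote helps
    (400, 80, b"GET / HTTP/1.1", False),
    (1300, 80, b"GET /video HTTP/1.1", False),
    (200, 53, b"", False),
]
labels = [1, 1, 1, 0, 0, 0]   # 1 = target application, 0 = anything else

X = [encode_flow(*f) for f in flows]
weights = train(X, labels)

for flow, x in zip(flows, X):
    print(flow[1], round(predict(weights, x), 3))
```

Because every source is reduced to a column of the feature vector, swapping the learner or adding another classifier's decision only changes `train` or adds one entry in `encode_flow`; the rest of the pipeline stays as it is.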

Using multiple traces from a large ISP, we demonstrate that HTI outperforms state-of-the-art methods, as it can select the best sources of information for each application to maximize its ability to detect that application. We further report on an ongoing live experiment with our HTI instance in the production network of the large ISP, which already represents several weeks of continuous traffic classification.

Type: Report
Date: 2010-04-13
Department: Digital Security
Eurecom Ref: 3075