KAD trace

We crawled the peers in the region 0x5b of the KAD peer-to-peer network every 5 minutes for almost half a year. In addition, we crawled the full KAD network once a day for more than one year.

On this website you may download the full datasets.

 

Dataset of the Zonecrawl

Crawl started on: 2006-09-23
Crawl ended on: 2007-03-20
Crawl duration: 179 days
Crawl frequency: every 5 minutes (288 snapshots a day)
KAD IDs seen: 400,375
IP addresses seen: 3,228,890

 

download zonecrawl.txt.bz2 (33 MB; 39 GB uncompressed)

 

The file contains a compressed matrix. Under Linux it can be uncompressed with the command: bunzip2 zonecrawl.txt.bz2

The matrix contains

 

NEW: this dataset is now also available in the .avt format. For more information about the .avt format please have a look here: http://www.cs.illinois.edu/~pbg/availability/ .

 

download zonecrawl.avt.gz (54 MB; 147 MB uncompressed, compared with 39 GB for the text matrix). To uncompress it, use the command: gzip -d zonecrawl.avt.gz

 

Thanks to Lluís Pàmies i Juárez for providing the data in this format.

 

 

Dataset of the Fullcrawl

Crawl started on: 2007-03-20
Crawl ended on: 2008-05-25
Crawl duration: 433 days
Crawl frequency: once per day
KAD IDs seen: 64,146,397
IP addresses seen: 97,380,532

 

Since the full dataset (604,439,668 lines) is too big to be downloadable as a single file, we split it into 6 files of approximately 750 MB (4.8 GB uncompressed) each:

download

fullcrawl1.bz2 

fullcrawl2.bz2 

fullcrawl3.bz2 

fullcrawl4.bz2 

fullcrawl5.bz2 

fullcrawl6.bz2 

 

Each file contains a compressed list. Under Linux it can be uncompressed with the command (X being the file number, 1 to 6): bunzip2 fullcrawlX.txt.bz2
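Since each file expands to several gigabytes, it may be convenient to read the compressed list line by line without writing the uncompressed data to disk. The following is a minimal sketch in Python using the standard bz2 module; the function name iter_crawl_lines is hypothetical, and the ASCII text encoding is an assumption, not something stated on this page.

```python
import bz2
from typing import Iterator


def iter_crawl_lines(path: str) -> Iterator[str]:
    """Yield the non-empty lines of a bz2-compressed text file, one at a time.

    Assumption: the file is plain ASCII text with one entry per line.
    Undecodable bytes are replaced rather than raising an error.
    """
    # mode="rt" makes bz2.open decompress and decode on the fly,
    # so only a small buffer is held in memory at any time.
    with bz2.open(path, mode="rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line
```

A caller would pass the path of one downloaded file and loop over the result, for example to count the lines in that file with sum(1 for _ in iter_crawl_lines(path)).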

Every line contains a quadruple of <anonymized IP><port #><KAD ID 64 bit><date>. The entries are ordered by the KAD ID and the date.

The entry 234567891234 3456 ABCDEF12345678 12-03-2008 says that on March 12th 2008 the peer with the KAD ID ABCDEF12345678 (the trace contains only the first 64 bits of the 128 bit KAD IDs) was online using the anonymized IP address 234567891234 on port 3456. The anonymization scheme we used loses all prefix information of the IP address.
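The line format described above can be parsed as sketched below. This is an illustration under stated assumptions, not the tooling used to produce the trace: fields are assumed to be separated by whitespace and the date is assumed to be in DD-MM-YYYY order, both inferred from the example entry. The names CrawlEntry, parse_entry and days_seen_per_peer are hypothetical. The anonymized IP is kept as an opaque string because it carries no prefix information.

```python
from dataclasses import dataclass
from datetime import date, datetime
from itertools import groupby
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CrawlEntry:
    anonymized_ip: str  # opaque token, not a real address
    port: int
    kad_id: str         # hexadecimal prefix of the KAD ID
    seen_on: date


def parse_entry(line: str) -> CrawlEntry:
    """Parse one '<anonymized IP> <port> <KAD ID> <date>' line.

    Assumptions: whitespace-separated fields, date as DD-MM-YYYY.
    """
    ip, port, kad_id, day = line.split()
    seen_on = datetime.strptime(day, "%d-%m-%Y").date()
    return CrawlEntry(ip, int(port), kad_id.upper(), seen_on)


def days_seen_per_peer(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (KAD ID, number of distinct days seen) for each peer.

    Relies on the entries being ordered by KAD ID, as stated for the
    trace, so consecutive lines with the same ID form one group.
    """
    entries = (parse_entry(line) for line in lines if line.strip())
    for kad_id, group in groupby(entries, key=lambda entry: entry.kad_id):
        yield kad_id, len({entry.seen_on for entry in group})
```

Because groupby only merges adjacent lines, the per-peer counts are correct only when the input keeps the ordering by KAD ID; in exchange, the whole trace never has to be held in memory.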

 

 

Please cite the data set as follows:


Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack
A global view of KAD
Proc. of Internet Measurement Conference (IMC), October 2007, San Diego, USA

 


The most detailed analysis of the trace can be found in the following Technical Report:

 

Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack
Analyzing peer behavior in KAD
Institut Eurecom, October 2007
Technical Report RR-07-205
 

 

 

 

Previous publications on the crawl methodology and analysis of the results obtained:

 

Damiano Carra and Ernst W. Biersack
Building a Reliable P2P System Out of Unreliable P2P Clients: The Case of KAD
Proc. of the 3rd International Conference on Emerging Networking Experiments and Technologies (CoNEXT), December 2007, New York, NY, USA

 

Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack
A global view of KAD
Proc. of Internet Measurement Conference (IMC), October 2007, San Diego, USA

 

Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack
Exploiting KAD: possible uses and misuses
Computer Communication Review, Volume 37, No. 5, October 2007

 

Moritz Steiner, Ernst W. Biersack, and Taoufik En-Najjary
Actively monitoring peers in KAD
Proc. of the 6th International Workshop on Peer-to-Peer Systems (IPTPS), February 2007, Bellevue, USA

 

 

This site was last modified on 2010-04-17.