alshukri2011a - Incremental Web-Site Boundary Detection Using Random Walks

Incremental Web-Site Boundary Detection Using Random Walks

by A. Alshukri, F. Coenen, M. Zito


Abstract

The paper describes variations of the classical k-means clustering algorithm that can be used effectively to address the so-called Web-site Boundary Detection (WBD) problem. The suggested advantages of these techniques are that they can quickly identify most of the pages belonging to a web-site and, in the long run, return a solution with accuracy comparable to, if not better than, that of other clustering methods. We analyze our techniques on artificial clones of the web generated using a well-known preferential attachment method.

Reference

Incremental Web-Site Boundary Detection Using Random Walks (A. Alshukri, F. Coenen, M. Zito), In Proceedings of the 7th International Conference on Machine Learning and Data Mining (MLDM’11). 30th August-3rd September, New York, USA, Springer, 2011.

Bibtex Entry

@inproceedings{Alshukri2011a,
	author = {Alshukri, A. and Coenen, F. and Zito, M.},
	title = {Incremental Web-Site Boundary Detection Using Random Walks},
	booktitle = {Proceedings of the 7th International Conference on Machine Learning and Data Mining (MLDM'11). 30th August-3rd September, New York, USA},
	year = {2011},	
	address = {New York, USA},
	pages = {414--427},
	publisher = {Springer},
	isbn = {},
	series = {Lecture Notes in Computer Science},	
}

For further details see the full paper.