Incremental Web-Site Boundary Detection Using Random Walks

by A. Alshukri, F. Coenen, M. Zito
Abstract:
The paper describes variations of the classical k-means clustering algorithm that can be used effectively to address the so-called Web-site Boundary Detection (WBD) problem. The suggested advantages of these techniques are that they can quickly identify most of the pages belonging to a web-site and, in the long run, return a solution of comparable (if not better) accuracy than other clustering methods. We analyze our techniques on artificial clones of the web generated using a well-known preferential attachment method.
Reference:
Incremental Web-Site Boundary Detection Using Random Walks (A. Alshukri, F. Coenen, M. Zito), In Proceedings of the 7th International Conference on Machine Learning and Data Mining (MLDM’11). 30th August-3rd September, New York, USA, Springer, 2011.
Bibtex Entry:
@inproceedings{Alshukri2011a,
	author = {Alshukri, A. and Coenen, F. and Zito, M.},
	title = {Incremental Web-Site Boundary Detection Using Random Walks},
	abstract = {The paper describes variations of the classical k-means clustering algorithm that can be used effectively to address the so-called Web-site Boundary Detection (WBD) problem. The suggested advantages of these techniques are that they can quickly identify most of the pages belonging to a web-site and, in the long run, return a solution of comparable (if not better) accuracy than other clustering methods. We analyze our techniques on artificial clones of the web generated using a well-known preferential attachment method.},
	booktitle = {Proceedings of the 7th International Conference on Machine Learning and Data Mining (MLDM'11). 30th August-3rd September, New York, USA},
	year = {2011},	
	address = {New York, USA},
	pages = {414--427},
	publisher = {Springer},
	isbn = {},
	series = {Lecture Notes in Computer Science},	
	url = {http://link.springer.com/chapter/10.1007/978-3-642-14400-4_41},
	url = {http://www.csc.liv.ac.uk/~frans/PostScriptFiles/mldm2011alshukri.pdf},
	url = {http://cgi.csc.liv.ac.uk/~ash/Publications/Papers/Alshukri2011-MLDM_Incremental_Web-Site_Boundary_Detection_Using_Random_Walks.pdf},
	url = {/pubs/MLDM-ICDM/Alshukri2011-MLDM_Incremental_Web-Site_Boundary_Detection_Using_Random_Walks.pdf},		
}