Web-Site Boundary Detection Using Incremental Random Walk Clustering

by A. Alshukri, F. Coenen, M. Zito
Abstract:
In this paper we describe a random walk clustering technique to address the Website Boundary Detection (WBD) problem. The technique is fully described and compared with alternative (breadth-first and depth-first) approaches. The reported evaluation demonstrates that the random walk technique produces results comparable to or better than those of the alternative techniques, while visiting fewer ‘noise’ pages. To demonstrate that the good results are not simply a consequence of randomising the input data, we also compare against a random ordering technique.
Reference:
Web-Site Boundary Detection Using Incremental Random Walk Clustering (A. Alshukri, F. Coenen, M. Zito), In Proceedings of the 31st SGAI International Conference (SGAI’11), 13-15 December, Cambridge, England, UK, pp. 255-268, Springer, 2011.
Bibtex Entry:
@inproceedings{Alshukri2011b,
	author = {Alshukri, A. and Coenen, F. and Zito, M.},
	title = {Web-Site Boundary Detection Using Incremental Random Walk Clustering},
	abstract = {In this paper we describe a random walk clustering technique to address the Website Boundary Detection (WBD) problem. The technique is fully described and compared with alternative (breadth-first and depth-first) approaches. The reported evaluation demonstrates that the random walk technique produces results comparable to or better than those of the alternative techniques, while visiting fewer ‘noise’ pages. To demonstrate that the good results are not simply a consequence of randomising the input data, we also compare against a random ordering technique.},
	booktitle = {Proceedings of the 31st SGAI International Conference (SGAI'11), 13-15 December, Cambridge, England, UK},
	year = {2011},
	address = {Cambridge, England, UK},
	pages = {255--268},
	publisher = {Springer},
	doi = {10.1007/978-1-4471-2318-7_20},
	url = {http://link.springer.com/chapter/10.1007%2F978-1-4471-2318-7_20},
	note = {Author copies: \url{http://www.csc.liv.ac.uk/~frans/PostScriptFiles/sgai2011alshukri.pdf} and \url{http://cgi.csc.liv.ac.uk/~ash/Publications/Papers/Alshukri2011-SGAI_Web-Site_Boundary_Detection_Using_Incremental_Random_Walk_Clustering.pdf}}
}