alshukri2010 - Web-Site Boundary Detection

Web-Site Boundary Detection

by A. Alshukri, F. Coenen, M. Zito



Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task.

In this paper a web-page clustering approach to boundary detection is suggested. The principal issue is feature selection, hampered by the observation that there is no clear understanding of what a web-site is.

This paper proposes a definition of a web-site, founded on the principle of user intention, directed at the boundary detection problem; and then reports on a sequence of experiments, using a number of clustering techniques, and a wide range of features and combinations of features to identify web-site boundaries.

The preliminary results reported seem to indicate that, in general, a combination of features produces the most appropriate result.


Web-Site Boundary Detection (A. Alshukri, F. Coenen, M. Zito), In Proceedings of the 10th Industrial Conference on Data Mining (ICDM’10). 12-14 July, Berlin, Germany, Springer, 2010.

Bibtex Entry

@inproceedings{alshukri2010,
	author = {Alshukri, A. and Coenen, F. and Zito, M.},
	title = {Web-Site Boundary Detection},
	booktitle = {Proceedings of the 10th Industrial Conference on Data Mining (ICDM'10). 12-14 July, Berlin, Germany},
	year = {2010},
	address = {Berlin, Germany},
	pages = {529--543},
	publisher = {Springer},
	isbn = {978-3-642-14399-1},
	series = {Lecture Notes in Computer Science}
}

For further details see the full paper.