alshukri2014 - A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks

by A. Alshukri, F. Coenen

Abstract

This paper presents an investigation into the Website Boundary Detection (WBD) problem in the dynamic context. In the dynamic context (as opposed to the static context) the web data to be considered is not fully available prior to the start of the website boundary detection process. The dynamic approaches presented in this paper are all probabilistic and based on the concept of random walks; three variations are considered: (i) the standard Random Walk (RW), (ii) a Self Avoiding RW and (iii) the Metropolis Hastings RW. The reported evaluation demonstrates that the proposed technique produces good WBD solutions while at the same time reducing the amount of “noise” pages visited. The best performing variation was found to be a Metropolis Hastings RW.
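The abstract names three walk variants without describing them in detail. The following is a minimal illustrative sketch of textbook versions of each, not the authors' implementation. It assumes the web graph is an undirected adjacency dictionary in which every node has at least one neighbour. The function names, the toy page names, and the degree-based acceptance rule used for the Metropolis-Hastings variant are all assumptions made for illustration.

```python
import random


def random_walk(graph, start, steps, rng):
    """Standard random walk: move to a uniformly chosen neighbour each step."""
    node, visited = start, [start]
    for _ in range(steps):
        node = rng.choice(graph[node])
        visited.append(node)
    return visited


def self_avoiding_walk(graph, start, steps, rng):
    """Prefer neighbours not yet visited; fall back to any neighbour if stuck."""
    node, visited, seen = start, [start], {start}
    for _ in range(steps):
        fresh = [n for n in graph[node] if n not in seen]
        node = rng.choice(fresh or graph[node])
        seen.add(node)
        visited.append(node)
    return visited


def metropolis_hastings_walk(graph, start, steps, rng):
    """Propose a uniform neighbour; accept with probability min(1, deg(u)/deg(v)).

    Assumption: this is the common degree-correcting rule, which makes the
    walk less biased towards high-degree pages. On rejection the walk stays put.
    """
    node, visited = start, [start]
    for _ in range(steps):
        candidate = rng.choice(graph[node])
        accept = min(1.0, len(graph[node]) / len(graph[candidate]))
        if rng.random() < accept:
            node = candidate
        visited.append(node)
    return visited


# Hypothetical toy graph: page names are placeholders, not data from the paper.
toy_graph = {
    "home": ["about", "news", "external"],
    "about": ["home", "news"],
    "news": ["home", "about"],
    "external": ["home"],
}

if __name__ == "__main__":
    rng = random.Random(0)
    for walk in (random_walk, self_avoiding_walk, metropolis_hastings_walk):
        print(walk.__name__, walk(toy_graph, "home", 8, rng))
```

In a dynamic setting each step would correspond to fetching a page, so the list of visited nodes stands in for the pages actually downloaded during the walk.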

Reference

A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks (A. Alshukri, F. Coenen), In Proceedings of the Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference (WIC’14), 11–14 August 2014, Warsaw, Poland, IEEE Computer Society, 2014.

Bibtex Entry

@inproceedings{Alshukri2014,
	author = {Alshukri, A. and Coenen, F.},
	title = {A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks},
	booktitle = {Proceedings of the Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference (WIC'14), 11--14 August 2014, Warsaw, Poland},
	year = {2014},
	address = {Los Alamitos, CA, USA},
	pages = {10--14},
	publisher = {IEEE Computer Society},
}

For further details see the full paper.
