A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks

by A. Alshukri, F. Coenen
Abstract:
This paper presents an investigation into the Website Boundary Detection (WBD) problem in the dynamic context. In the dynamic context (as opposed to the static context) the web data to be considered is not fully available prior to the start of the website boundary detection process. The dynamic approaches presented in this paper are all probabilistic and based on the concept of random walks; three variations are considered: (i) the standard Random Walk (RW), (ii) a Self Avoiding RW and (iii) the Metropolis Hastings RW. The reported evaluation demonstrates that the proposed technique produces good WBD solutions while at the same time reducing the amount of “noise” pages visited. The best performing variation was found to be a Metropolis Hastings RW.
Reference:
A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks (A. Alshukri, F. Coenen), In Proceedings of the Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference (WIC’14), 11–14 August 2014, Warsaw, Poland, IEEE Computer Society, 2014.
Bibtex Entry:
@inproceedings{Alshukri2014,
	author = {Alshukri, A. and Coenen, F.},
	title = {A Dynamic Approach To The Website Boundary Detection Problem Using Random Walks},
	abstract = {This paper presents an investigation into the Website Boundary Detection (WBD) problem in the dynamic context. In the dynamic context (as opposed to the static context) the web data to be considered is not fully available prior to the start of the website boundary detection process. The dynamic approaches presented in this paper are all probabilistic and based on the concept of random walks; three variations are considered: (i) the standard Random Walk (RW), (ii) a Self Avoiding RW and (iii) the Metropolis Hastings RW. The reported evaluation demonstrates that the proposed technique produces good WBD solutions while at the same time reducing the amount of “noise” pages visited. The best performing variation was found to be a Metropolis Hastings RW.},
	booktitle = {Proceedings of the Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference (WIC'14), 11--14 August 2014, Warsaw, Poland},
	year = {2014},
	address = {Los Alamitos, CA, USA},
	pages = {10--14},
	publisher = {IEEE Computer Society},
	url = {https://cgi.csc.liv.ac.uk/~frans/PostScriptFiles/alshukriWI2014.pdf},	
	url = {http://cgi.csc.liv.ac.uk/~ash/Publications/Papers/Alshukri2014-WIC_A_Dynamic_Approach_To_The_Website_Boundary_Detection_Problem_Using_Random_Walks.pdf},
}