Specification and discovery of web patterns: A graph grammar approach

Amin Roudaki, Jun Kong*, Kang Zhang

*Corresponding author for this work

Research output: Contribution to journal; Journal Article; peer-review

20 Citations (Scopus)

Abstract

Finding useful information on the Web becomes increasingly difficult as the volume of Web data grows rapidly. To facilitate effective Web browsing, Web designers usually display the same type of information with a consistent layout (referred to as a Web pattern). Discovering Web patterns can benefit many applications, such as structured data extraction. This paper presents a generic framework, based on graph grammars, for discovering Web patterns and recognizing their instances (i.e., structured data). In our framework, a Web pattern is visually yet formally specified as a graph grammar, which is automatically induced by a grammar induction engine. The key feature of this engine is that it converts the problem of (2-dimensional) graph grammar induction into (1-dimensional) string induction. Based on the induced pattern, matching instances are recognized in Web pages through a graph parsing process. We have evaluated the framework on twenty-one e-commerce Web sites. The evaluation results are promising, with a high F1-score.
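The reduction from 2-dimensional layout to 1-dimensional string induction can be illustrated with a toy sketch. This is not the paper's induction engine: it only assumes that page elements can be modeled as typed boxes with coordinates, flattens them in reading order, and looks for the tandem repeat that covers the most tokens. The names `Box`, `linearize`, and `find_repeated_unit` are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rendered page element: a type label and its top-left corner (assumed model)."""
    kind: str
    x: int
    y: int


def linearize(boxes):
    """Flatten a 2-D layout into a 1-D token list in reading order (top-down, left-right)."""
    return [b.kind for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def find_repeated_unit(tokens, min_repeats=2):
    """Return (unit, start, repeats) for the tandem repeat covering the most tokens.

    Ties go to the shortest unit, then the earliest start. Returns None if no
    unit repeats at least `min_repeats` times back to back.
    """
    best = None
    best_covered = 0
    n = len(tokens)
    for size in range(1, n // min_repeats + 1):
        for start in range(0, n - size * min_repeats + 1):
            unit = tokens[start:start + size]
            repeats = 1
            # Count how many consecutive copies of `unit` follow the first one.
            while tokens[start + repeats * size:start + (repeats + 1) * size] == unit:
                repeats += 1
            covered = repeats * size
            if repeats >= min_repeats and covered > best_covered:
                best, best_covered = (unit, start, repeats), covered
    return best


# A made-up product listing: a header, three product rows, and a footer.
page = [Box("header", 0, 0), Box("footer", 0, 400)]
for row_y in (100, 200, 300):
    page += [Box("price", 400, row_y), Box("image", 0, row_y), Box("title", 120, row_y)]

tokens = linearize(page)
print(find_repeated_unit(tokens))
```

On this made-up page, the repeated unit is the image, title, price triple, which plays the role of a candidate Web pattern whose three occurrences are its instances. A real system would also need to handle nesting, optional elements, and spatial relations that a flat token list discards.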

Original language: English
Pages (from-to): 528-545
Number of pages: 18
Journal: Information Sciences
Volume: 328
Publication status: Published - 20 Jan 2016
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2015 Elsevier Inc. All rights reserved.

Keywords

  • Graph grammar induction
  • Spatial graph grammar
  • Web patterns
