Automated Semantic Analysis of Schematic Data
Description: Content in numerous data sources
is not directly amenable to machine processing.
This book describes techniques for automated semantic analysis of
schematic content, which is characterized by being populated from backend
Starting with a seed set of hand-labeled instances of semantic
concepts in a set of HTML documents, a technique is devised that
bootstraps an annotation process for automatic identification of
concept instances present in other documents. The technique exploits
the observation that semantically related items in schematic HTML
documents exhibit consistency in presentation style and spatial
locality to learn statistical concept models, using light-weight
semantic features. These models direct the annotation of diverse
Web documents possessing similar content semantics.
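
The bootstrapping idea described above can be illustrated with a minimal sketch. It is not the book's implementation: it assumes a naive Bayes classifier with add-one smoothing as the statistical concept model, and it assumes each page item has already been reduced to a (style, vertical position, text) triple. The names features, ConceptModel, bootstrap, and the threshold and rounds parameters are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(item):
    """Light-weight features: presentation style, coarse position, words."""
    style, position, text = item
    feats = [f"style={style}", f"pos={position // 100}"]  # 100px position buckets
    feats += [f"word={w.lower()}" for w in text.split()]
    return feats


class ConceptModel:
    """Naive Bayes over the features above (an assumed stand-in model)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def add(self, item, label):
        self.label_counts[label] += 1
        for f in features(item):
            self.feat_counts[label][f] += 1
            self.vocab.add(f)

    def posterior(self, item):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            # Add-one smoothing; the extra +1 reserves mass for unseen features.
            denom = sum(self.feat_counts[label].values()) + len(self.vocab) + 1
            score = math.log(n / total)
            for f in features(item):
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            scores[label] = score
        top = max(scores.values())
        norm = sum(math.exp(s - top) for s in scores.values())
        return {label: math.exp(s - top) / norm for label, s in scores.items()}


def bootstrap(seeds, unlabeled, threshold=0.75, rounds=3):
    """Train on hand-labeled seeds, then annotate confident items and
    fold them back into the model (self-training)."""
    model = ConceptModel()
    for item, label in seeds:
        model.add(item, label)
    annotations = {}
    pending = list(unlabeled)
    for _ in range(rounds):
        remaining = []
        for item in pending:
            post = model.posterior(item)
            label, prob = max(post.items(), key=lambda kv: kv[1])
            if prob >= threshold:
                annotations[item] = label
                model.add(item, label)  # confident guesses become training data
            else:
                remaining.append(item)
        if len(remaining) == len(pending):
            break  # nothing new was annotated in this round
        pending = remaining
    return annotations


# Hypothetical seed instances and unlabeled items from other documents.
seeds = [
    (("bold-large", 40, "Top Headlines"), "headline"),
    (("small-link", 520, "Sports Weather Travel"), "navigation"),
]
unlabeled = [
    ("bold-large", 60, "Markets Rally"),
    ("small-link", 540, "Business Travel"),
]
result = bootstrap(seeds, unlabeled)
```

In this sketch, items that share a style and a position bucket with a seed instance score higher for that seed's concept, which is one simple way to encode the consistency in presentation style and spatial locality mentioned above.
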
The power of these techniques is demonstrated through applications developed
for real-life problems that include
audio-based assistive browsing for non-visual Web access,
focused browsing on handhelds with semantic bookmarks, text data cleaning,
and accurate identification of remote homologs of
biological protein sequences.
About the author: Saikat Mukherjee received his Ph.D. from the State University of New York, Stony Brook, in 2005 and is currently a Research Scientist at Siemens Corporate Research, USA. He is interested in semantic analysis of unstructured text data using ontology-based techniques, machine learning, and natural language processing.
Subtitle: Learning-based Techniques for Scalable and Automated Semantic Understanding of Template Generated Schematic Web Content. Paperback. Language: English.
Publisher: VDM Verlag
Publication date: May 2008
Number of pages: 108