Structure-sensitive learning of text types
Author(s): | Geibel, P.; Krumnack, U.; Pustylnikov, O.; Mehler, A.; Gust, H.; Kühnberger, K.-U. |
Keywords: | Classification (of information); Learning systems; Trees (mathematics); Logical document structure; Structural features; Text processing |
Publication date: | 2007 |
Publisher: | Springer Verlag |
Journal: | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume: | 4830 LNAI |
Start page: | 642 |
End page: | 646 |
Abstract: | In this paper, we discuss the structure based classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method using predefined structural features and also four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles, and on the well-known SUSANNE corpus. We will demonstrate that, for the two corpora, many text types can be learned based on structural features only. © Springer-Verlag Berlin Heidelberg 2007. |
Description: | 20th Australian Joint Conference on Artificial Intelligence, AI 2007; Conference Date: 2 December 2007 through 6 December 2007; Conference Code: 71206 |
ISBN: | 9783540769262 |
ISSN: | 03029743 |
DOI: | 10.1007/978-3-540-76928-6_68 |
External URL: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-38349068707&doi=10.1007%2f978-3-540-76928-6_68&partnerID=40&md5=d0cda235df49be1177039638371d3d05 |