Correlation Analysis of Text Author Identification Results Based on N-Grams Frequency Distribution in Ukrainian Scientific and Technical Articles

Author(s): Vysotska, V.; Markiv, O.; Teslia, S.; Romanova, Y.; Pihulechko, I.
Editor(s): Lytvyn, V.; Sharonova, N.; Jonek-Kowalska, I.; Kowalska-Styczen, A.; Vysotska, V.; Kupriianov, Y.; Kanishcheva, O.; Cherednichenko, O.; Hamon, T.; Grabar, N.
Keywords: authorship definition; Computational linguistics; correlation analysis; Correlation methods; distribution function density; Distribution functions; exponential and median smoothing; Exponentials; linguometry; N-Grams; Natural language processing systems; NLP; stylometric analysis; Stylometrics; Ukrainian text
Publication date: 2022
Publisher: CEUR-WS
Published in: CEUR Workshop Proceedings
Volume: 3171
Pages: 277-314
Abstract:
This paper studies the results of experimental testing of the proposed content-monitoring method for determining author style in Ukrainian scientific and technical texts. Authorship identification systems typically rely on plagiarism and rewrite metrics, which establish whether a work has been borrowed fully or partially; consequently, they do not cover the case where the work has not yet been published. Quantitative content analysis of scientific and technical texts combines content monitoring with text analysis based on NLP, Web Mining, and stylometry methods to identify a set of authors whose writing styles are similar to the studied passages. This narrows the search for the subsequent stylometric methods that estimate the degree to which the analyzed text belongs to a particular author. The authorship determination method is decomposed into an analysis of speech coefficients such as lexical diversity, the degree (measure) of syntactic complexity, speech coherence, and the text exclusivity and concentration indices. In parallel, author style parameters are analyzed: the numbers of words, sentences, prepositions, and conjunctions in the text, and the numbers of words with a frequency of 1 and of 10 or more. © 2022 Copyright for this paper by its authors.
Description:
6th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2022), Volume I: Main; Conference date: 12-13 May 2022; Conference code: 180931
ISSN: 1613-0073
External URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85134754544&partnerID=40&md5=1c099000c8013dcc71b3933b4a1e0394
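The abstract names its speech coefficients and style parameters without giving formulas. The minimal Python sketch below shows one way such measures could be computed, assuming common linguometric definitions that may differ from the paper's: lexical diversity as distinct words over total words, syntactic complexity as conjunctions per sentence, coherence as prepositions plus conjunctions per sentence, and exclusivity and concentration as the shares of words occurring once and 10 or more times. The preposition and conjunction sets, all function names, and the Pearson correlation of character-bigram frequencies (one possible reading of the title's "correlation analysis of N-gram frequency distributions") are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical, deliberately small word lists; a real system would use a
# Ukrainian part-of-speech tagger instead of fixed sets.
PREPOSITIONS = {"в", "у", "на", "з", "із", "до", "від", "для", "про", "за", "по", "при"}
CONJUNCTIONS = {"і", "й", "та", "а", "але", "або", "що", "якщо", "бо"}


def tokenize(text):
    # Lowercased word tokens; \w matches Cyrillic letters in Python 3.
    return re.findall(r"\w+(?:['’]\w+)*", text.lower())


def split_sentences(text):
    # Naive split on terminal punctuation; drops empty fragments.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def style_coefficients(text):
    words = tokenize(text)
    n_words, n_sent = len(words), len(split_sentences(text))
    if n_words == 0 or n_sent == 0:
        raise ValueError("text must contain at least one word and one sentence")
    freq = Counter(words)
    n_prep = sum(1 for w in words if w in PREPOSITIONS)
    n_conj = sum(1 for w in words if w in CONJUNCTIONS)
    freq_1 = sum(1 for c in freq.values() if c == 1)
    freq_10_plus = sum(1 for c in freq.values() if c >= 10)
    return {
        "words": n_words,
        "sentences": n_sent,
        "prepositions": n_prep,
        "conjunctions": n_conj,
        "freq_1": freq_1,
        "freq_10_plus": freq_10_plus,
        # Assumed definitions; the paper's exact formulas may differ.
        "lexical_diversity": len(freq) / n_words,
        "syntactic_complexity": n_conj / n_sent,
        "coherence": (n_prep + n_conj) / n_sent,
        "exclusivity": freq_1 / n_words,
        "concentration": freq_10_plus / n_words,
    }


def char_ngrams(text, n=2):
    # Character n-grams over the normalized token stream.
    s = " ".join(tokenize(text))
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_correlation(text_a, text_b, n=2):
    # Pearson correlation of relative n-gram frequencies over the union of n-grams.
    a, b = char_ngrams(text_a, n), char_ngrams(text_b, n)
    if not a or not b:
        return 0.0
    keys = sorted(set(a) | set(b))
    ta, tb = sum(a.values()), sum(b.values())
    x = [a[k] / ta for k in keys]
    y = [b[k] / tb for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

For example, `style_coefficients("Метод працює добре. Метод працює на практиці!")` counts 7 words, 2 sentences, one preposition ("на"), and 3 words that occur once, and `ngram_correlation` of a text with itself is 1.0 up to floating-point rounding. Relative rather than raw n-gram counts are correlated so that texts of different lengths remain comparable.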
