Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

Ballout, Mohamad; Krumnack, Ulf; Heidemann, Gunther; Kühnberger, Kai-Uwe

Author(s):	Ballout, Mohamad; Krumnack, Ulf; Heidemann, Gunther; Kühnberger, Kai-Uwe
Editor(s):	Jayne, C.; Mandic, D.; Duro, R.
Keywords:	Computational linguistics; Cross-domain; deep learning; Language model; Language processing; Learning systems; multi-modal learning; natural language processing; Natural language processing systems; Natural languages; Performance; Pre-trained transformer; pre-trained transformers; transfer learning
Publication date:	2023
Publisher:	Elsevier B.V.
Journal:	Procedia Computer Science
Volume:	222
Pages:	114–126
Abstract:	Pre-trained language models have recently emerged as a powerful tool for fine-tuning on a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we investigate the ability of pre-trained language models to generalize to different non-language tasks. In particular, we test them on tasks from different domains such as computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models that we used, T5, BART, BERT, and GPT-2, achieve outstanding results. They all perform similarly, and they outperform transformers trained from scratch by a large margin. For instance, pre-trained language models reach an average accuracy of 58.7% on the Listops dataset, compared to an average accuracy of 29.0% for transformers trained from scratch. The significant improvement demonstrated across three types of datasets suggests that pre-training on language helps the models to acquire general knowledge, bringing us a step closer to general AI. We also show that reducing the number of parameters in pre-trained language models does not have a great impact, as the performance drops only slightly when using T5-Small instead of T5-Base. In fact, when using only 2% of the parameters, we achieve a great improvement compared to training from scratch. Finally, in contrast to prior work, we find that using pre-trained embeddings for the input layer is necessary to achieve the desired results. © 2023 The Authors. Published by Elsevier B.V.
Description:	Cited by: 0; Conference name: International Neural Network Society Workshop on Deep Learning Innovations and Applications, INNS DLIA 2023; Conference date: 18 June 2023 through 23 June 2023; Conference code: 192997; All Open Access, Gold Open Access, Green Open Access
ISSN:	1877-0509
DOI:	10.1016/j.procs.2023.08.147
External URL:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85175790999&doi=10.1016%2fj.procs.2023.08.147&partnerID=40&md5=d8881dcab29c4a78158ba23d27bddf38
