Deep Neural Network Models for Sequence Labeling and Coreference Tasks: dissertation and author's abstract, VAK RF specialty 05.13.01, Candidate of Sciences Le The Anh
- VAK RF specialty: 05.13.01
- Number of pages: 143
Dissertation table of contents, Candidate of Sciences Le The Anh
List of Figures
List of Tables
1 Introduction
1.1 Overview of Deep Learning
1.1.1 Artificial Intelligence, Machine Learning, and Deep Learning
1.1.2 Milestones in Deep Learning History
1.1.3 Types of Machine Learning Models
1.2 Brief Overview of Natural Language Processing
1.3 Dissertation Overview
1.3.1 Scientific Actuality of the Research
1.3.2 The Goal and Task of the Dissertation
1.3.3 Scientific Novelty
1.3.4 Theoretical and Practical Value of the Work in the Dissertation
1.3.5 Statements to be Defended
1.3.6 Presentations and Validation of the Research Results
1.3.7 Publications
1.3.8 Dissertation Structure
2 Deep Neural Network Models for NLP Tasks
2.1 Word Representation Models
2.1.1 Word Representation
2.1.2 Prediction-based Models
2.1.3 Count-based Models
2.2 Deep Neural Network Models
2.2.1 Convolutional Neural Network
2.2.2 Recurrent Neural Network
2.2.3 Long Short-Term Memory Cells
2.2.4 LSTM Networks
2.3 Pre-trained Language Models
2.3.1 ELMo
2.3.2 Transformer
2.3.3 OpenAI's GPT
2.3.4 Google's BERT
2.4 Summary
3 Sequence Labeling with Character-aware Deep Neural Networks and Language Models
3.1 Introduction to the Sequence Labeling Tasks
3.2 Related Work
3.2.1 Rule-based Models
3.2.2 Feature-based Models
3.2.3 Deep Learning-based Models
3.2.4 Related Work on Vietnamese Named Entity Recognition
3.2.5 Related Work on Russian Named Entity Recognition
3.3 Tagging Schemes
3.4 Evaluation Metrics
3.5 WCC-NN-CRF Models for Sequence Labeling Tasks
3.5.1 Backbone WCC-NN-CRF Architecture
3.5.2 Language Model-based Architecture
3.6 Application of WCC-NN-CRF Models for Named Entity Recognition
3.6.1 Overview of Named Entity Recognition Task
3.6.2 Datasets and Pre-trained Word Embeddings
3.6.3 Evaluation of Backbone WCC-NN-CRF Model
3.6.4 Evaluation of ELMo-based WCC-NN-CRF Model
3.6.5 Evaluation of BERT-based Multilingual WCC-NN-CRF Model
3.7 Application of WCC-NN-CRF Model for Sentence Boundary Detection
3.7.1 Introduction to the Sentence Boundary Detection Task
3.7.2 Sentence Boundary Detection as a Sequence Labeling Task
3.7.3 Evaluation of WCC-NN-CRF SBD Model
3.8 Conclusions
4 Coreference Resolution with Sentence-level Coreferential Scoring
4.1 The Coreference Resolution Task
4.2 Related Work
4.2.1 Rule-based Models
4.2.2 Deep Learning Models
4.3 Coreference Resolution Evaluation Metrics
4.4 Baseline Model Description
4.5 Sentence-level Coreferential Relation-based Model
4.6 BERT-based Coreference Model
4.7 Experiments and Results
4.7.1 Datasets
4.7.2 Evaluation of Proposed Models
4.8 Conclusions
5 Conclusions
5.1 Conclusions for Sequence Labeling Task
5.2 Conclusions for Coreference Resolution Task
5.3 Summary of the Main Contributions of the Dissertation
Dissertation introduction (part of the author's abstract) on the topic "Deep Neural Network Models for Sequence Labeling and Coreference Tasks"
Deep neural network models have recently received tremendous attention from both academia and industry and have achieved remarkable results in domains ranging from Computer Vision and Speech Recognition to Natural Language Processing (NLP). They have lifted the performance of machine learning-based systems to a new level, close to human-level performance. Accordingly, the number of deep learning projects has increased year by year. The IPavlov project, based at the Neural Networks and Deep Learning Lab of the Moscow Institute of Physics and Technology (MIPT), is one of them; it aims to build a set of pre-trained network models, predefined dialogue system components, and pipeline templates. This thesis is based on work carried out as part of this project and focuses on deep neural network models for Sequence Labeling and Coreference Resolution tasks.
This thesis consists of three main parts. First, we systematically review three key concepts in Deep Learning for NLP that are closely related to the work carried out in this thesis: (1) two approaches to word representation learning, (2) deep neural network models often used to address machine learning tasks in general and NLP tasks in particular, and (3) state-of-the-art pre-trained language models and their applications in downstream tasks. Second, we propose three deep neural network models for Sequence Labeling tasks: (1) a hybrid model consisting of three sub-networks that capture character-level and capitalization features as well as word context features, (2) a language model-based model, and (3) a multilingual model. These models were evaluated on the task of Named Entity Recognition. Experiments on six datasets covering four languages (Russian, Vietnamese, English, and Chinese) showed that our models achieved state-of-the-art performance. In addition, we reformulated Sentence Boundary Detection as a Sequence Labeling task and applied the proposed model to it. The results on two conversational datasets showed that the proposed model achieved high accuracy. Third, we propose two models for the Coreference Resolution task: (1) a Sentence-level Coreferential Relation-based model that takes as input a paragraph, a discourse, or even a document with hundreds of sentences and predicts the coreference relations between sentences, and (2) a language model-based model that leverages modern language models to boost performance. Experimental results on two Russian datasets, together with comparisons against existing models, showed that our models obtained state-of-the-art results on both Anaphora and Coreference Resolution tasks.
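To illustrate the reformulation of Sentence Boundary Detection as Sequence Labeling mentioned above, here is a minimal Python sketch. It is not the dissertation's implementation: the two-tag scheme ("E" for the last token of a sentence, "O" for every other token), the whitespace tokenization, and the function names `sentences_to_labels` and `labels_to_sentences` are all assumptions made for illustration.

```python
from typing import List, Tuple


def sentences_to_labels(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten sentences into one token stream with per-token boundary labels.

    Assumed (hypothetical) tag set: "E" marks the last token of a sentence,
    "O" marks every other token. Tokenization is plain whitespace splitting.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty sentences
        tokens.extend(words)
        labels.extend(["O"] * (len(words) - 1) + ["E"])
    return tokens, labels


def labels_to_sentences(tokens: List[str], labels: List[str]) -> List[str]:
    """Rebuild sentences from a token stream and (predicted) boundary labels."""
    sentences: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == "E":
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing tokens without a closing boundary label
        sentences.append(" ".join(current))
    return sentences
```

With training data in this shape, any sequence labeling model can be trained to predict the boundary tag for each token of an unpunctuated utterance, and `labels_to_sentences` turns its predictions back into sentences.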
Dissertation conclusion on the topic "System Analysis, Control, and Information Processing (by industry)", Le The Anh
5.3 Summary of the Main Contributions of the Dissertation
In conclusion, the main contributions of the dissertation are:
1. An original hybrid model for the sequence labeling task was proposed and studied. This model extends existing Bi-LSTM CRF architectures with (1) a trainable CNN that generates a character-level representation of the input sequence, and (2) a Bi-LSTM network that encodes capitalization features. The model achieved state-of-the-art performance on Russian and Vietnamese datasets, with F1 scores of 98.21% and 94.43% on NE3 and VLSP-2016, respectively. Ablation studies demonstrated that character-level encoding produces a larger improvement than capitalization encoding.
2. Extensions of the original architecture with encoders based on the language models ELMo and BERT were evaluated on Russian and English datasets. They obtained state-of-the-art F1 scores of 99.17% and 92.91% on NE3 and Gareev's dataset, respectively, and a comparable F1 score of 92.27% on CoNLL-2003.
3. Application of the proposed sequence labeling model to the sentence boundary detection task produced solid results: 89.99% F1 and 95.88% F1 on the Cornell Movie-Dialog and DailyDialog datasets, respectively.
4. Sentence-level coreferential relations can significantly improve performance on the coreference resolution task. Experiments on the OntoNotes dataset show that the quality of the solution can be improved by up to 5.84%.
5. An original model for learning sentence-level coreferential relationships was introduced. Incorporating this model into the baseline coreference architecture improved its performance for English.
6. Applying the model with the sentence coreference module achieved a state-of-the-art average F1 of 58.42% on the RuCor dataset.
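The named entity recognition results in items 1 and 2 are reported as F1 scores. This excerpt does not spell out the exact evaluation protocol, so the following Python sketch assumes the common entity-level, exact-match convention over BIO tags: a predicted entity counts only if both its span and its type match the gold annotation. The helper names `bio_to_spans` and `span_f1` are hypothetical. The average F1 used for coreference resolution in item 6 is a different metric and is not covered here.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence.

    A stray "I-X" tag with no open entity of type X is leniently treated
    as the start of a new entity (an assumption of this sketch).
    """
    spans: Set[Span] = set()
    start = None
    etype = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: harmonic mean of span precision and recall."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Scoring whole spans rather than individual tokens means that a partially recognized entity earns no credit, which is why entity-level F1 is usually stricter than token-level accuracy.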
