Reading Comprehension Question Answering: A Stronger Pretrained BERT Model

計算巖土力學 (Computational Geomechanics)

July 23, 2021, 10:21

1 Introduction

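In the earlier article "Transformers: Question Answering", we answered questions with mrm8488/bert-multi-cased-finetuned-xquadv1, a multilingual pretrained model: BERT (base-multilingual-cased) fine-tuned for multilingual Q&A. It was called through the simplest interface, pipeline(). As we saw there, that model's results were unsatisfactory, so this article explores a stronger pretrained model.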

2 Model description

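This article uses the bert-large-uncased-whole-word-masking-finetuned-squad checkpoint as the question-answering model. By default, the downloaded model is cached in the folder C:\Users\m\.cache\huggingface\transformers. The model is uncased (it does not distinguish upper-case from lower-case letters) and was pretrained on English text with a masked language modeling (MLM) objective. It can be used in a question-answering pipeline, or called directly to obtain the raw outputs for a given query and context. BERT was pretrained on BookCorpus, a dataset of 11,038 unpublished books, and on English Wikipedia (excluding lists, tables, and headers).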
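As a minimal sketch of the pipeline route mentioned above, the following wraps the Hugging Face pipeline() call for this checkpoint. The helper names build_qa_pipeline and ask are hypothetical and do not come from the original scripts; the import is placed inside the function so that the large model download only happens when the pipeline is actually built.

```python
MODEL_NAME = "bert-large-uncased-whole-word-masking-finetuned-squad"


def build_qa_pipeline(model_name=MODEL_NAME):
    """Build a question-answering pipeline (hypothetical helper).

    The first call downloads the checkpoint into the local cache,
    so it needs network access and a fair amount of disk space.
    """
    from transformers import pipeline  # imported lazily on purpose

    return pipeline("question-answering", model=model_name, tokenizer=model_name)


def ask(qa, question, context):
    """Run one question against one context.

    `qa` is any callable that accepts question= and context= keywords and
    returns a dict with 'answer' and 'score' keys, which is the shape the
    question-answering pipeline returns for a single input.
    """
    result = qa(question=question, context=context)
    return result["answer"], result["score"]
```

Calling ask(build_qa_pipeline(), question, context) returns the extracted answer text together with the pipeline's confidence score.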

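Unlike other BERT models, this one was trained with the Whole Word Masking technique: all the tokens that correspond to one word are masked at once, while the overall masking rate stays the same. Training is otherwise identical: each masked WordPiece token is still predicted independently. After pretraining, the model was fine-tuned on the SQuAD dataset with a fine-tuning script.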
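The idea of whole word masking can be sketched in a few lines. This is a simplified illustration, not the actual pretraining code: it assumes WordPiece continuation pieces are marked with a "##" prefix, always substitutes [MASK] (the real procedure sometimes keeps the token or swaps in a random one), and the function name whole_word_mask is made up for this example.

```python
import random


def whole_word_mask(tokens, mask_rate=0.15, rng=None):
    """Mask whole words: every WordPiece of a chosen word is masked together."""
    rng = rng or random.Random()
    # Group token positions into words; a "##" piece continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):
            continue  # special tokens are never masked
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    rng.shuffle(words)

    # The budget is counted in tokens, so the overall masking rate is unchanged.
    budget = max(1, round(len(tokens) * mask_rate))
    masked = list(tokens)
    covered = 0
    for word in words:
        if covered >= budget:
            break
        if covered + len(word) > budget:
            continue  # this word would exceed the budget; try another one
        for i in word:
            masked[i] = "[MASK]"
        covered += len(word)
    return masked
```

With a hand-written token list such as ["chu", "##qui", "##cama", "##ta", "mine"], the four pieces of the first word are either all replaced by [MASK] or all left untouched, never split.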

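BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on raw text only, with no human labeling of any kind (which is why it can use large amounts of publicly available data), and an automatic process generates the inputs and labels from that text. More precisely, it was pretrained with two objectives: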

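(1) Masked language modeling (MLM): given a sentence, the model randomly masks 15% of the words in the input, then runs the whole masked sentence through the model and predicts the masked words. This differs from traditional recurrent neural networks (RNNs), which usually see the words one after another, and from autoregressive models such as GPT, which internally mask the future tokens. MLM lets the model learn a bidirectional representation of the sentence.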

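(2) Next sentence prediction (NSP): during pretraining, the model concatenates two masked sentences as its input. Chosen at random, the two sentences are sometimes adjacent in the original text and sometimes not. The model then predicts whether the second sentence followed the first.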
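How an NSP training pair might be assembled can be sketched as follows. This is an illustration under stated assumptions: whitespace splitting stands in for WordPiece tokenization, the 50% chance of picking the true next sentence follows the original BERT setup, negatives are drawn from the same small list rather than from another document, and the function name make_nsp_example is hypothetical.

```python
import random


def make_nsp_example(sentences, index, rng, p_next=0.5):
    """Return (tokens, segment_ids, is_next) for the sentence at `index`."""
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to draw a negative")
    first = sentences[index].split()
    if rng.random() < p_next and index + 1 < len(sentences):
        # Positive pair: the sentence that really follows.
        second, is_next = sentences[index + 1].split(), True
    else:
        # Negative pair: any sentence except this one and its true successor.
        candidates = [i for i in range(len(sentences)) if i not in (index, index + 1)]
        second, is_next = sentences[rng.choice(candidates)].split(), False
    # BERT input layout: [CLS] sentence A [SEP] sentence B [SEP]
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    # Segment 0 covers [CLS], A and the first [SEP]; segment 1 covers B and the last [SEP].
    segment_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return tokens, segment_ids, is_next
```

The segment ids are what let the model tell the two sentences apart, and is_next is the label the NSP head is trained to predict.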

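In this way the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. For example, given a dataset of labeled sentences, a standard classifier can be trained using the features produced by the BERT model as inputs. The model has the following configuration: 24 layers, a hidden dimension of 1024, 16 attention heads, and 336M parameters.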
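The quoted parameter count can be checked with some rough arithmetic. The sketch below counts the weights and biases of a BERT encoder from its configuration. The vocabulary size (30522), maximum position count (512), and feed-forward width (4096) are not given in this article; they are assumed from the standard BERT-large uncased configuration. The function name is hypothetical.

```python
def bert_encoder_params(layers=24, hidden=1024, vocab=30522,
                        max_pos=512, type_vocab=2, intermediate=4096):
    """Approximate parameter count of a BERT encoder (weights plus biases)."""
    layer_norm = 2 * hidden  # scale and shift vectors
    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections, followed by one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden), followed by one LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_encoder_params()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the total comes out at roughly 335 million, in line with the 336M quoted above. The number of attention heads does not affect the count: 16 heads simply split the 1024-dimensional projections into slices of 64.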

3 Usage

The earlier article "Transformers: Question Answering" [transformers-pipeline-question-answering.py] used the pipeline() method; this example uses AutoTokenizer and AutoModelForQuestionAnswering instead [Transformers-AutoModelForQuestionAnswering.py].

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
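The listing above only loads the tokenizer and the model. One common way to turn the model's raw outputs into an answer is sketched below; it may differ from what the original script does. The model returns one start logit and one end logit per token, and the answer is the span whose start and end scores sum to the highest value. The helper names best_span and answer_question, and the max_answer_len limit of 30 tokens, are assumptions made for this example.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick (start, end) maximizing start_logit + end_logit with start <= end."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best


def answer_question(tokenizer, model, question, context):
    """Answer one question with an already loaded tokenizer and QA model."""
    import torch  # imported lazily so best_span stays dependency-free

    # Question and context are encoded together as a single sequence pair.
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist())
    # Map the chosen token span back to text.
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1])
```

Searching over valid spans avoids the case where taking the two argmax positions independently yields an end before the start. This sketch does not restrict the span to the context tokens, and it does not handle inputs longer than the model's 512-token limit.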

4 Test results

For comparison, we use the same passage as in the earlier article and ask the following four questions:

Context: '''The development of a step-path failure surface is mainly controlled by the orientation and spatial characteristics of the present major rock structure including major joints sets, shear planes and fault planes.'''

(1) Question: '''What kinds of factors controlled the development of a step-path failure surface?'''

Answer: orientation and spatial characteristics of the present major rock structure including major joints sets, shear planes and fault planes

This answer is more precise and more complete.

(2) Question: '''Please describe the major rock structure.'''

Answer: major joints sets, shear planes and fault planes

This answer is correct.

(3) Question: What is rock structure?

Answer: major joints sets, shear planes and fault planes

This answer is correct.

(4) Question: How many kinds of present major rock structure?

Answer: major joints sets, shear planes and fault planes (0.003)

This answer is correct.

So, as far as English is concerned, this model performs much better than the previous one.

5 A new test

Context: '''The Chuquicamata mine in northern Chile has one of the largest open pits in the world, measuring approximately 4 km long, 3 km wide, and 1 km deep. Removing ore and waste from the mine on conveyors or by truck, using the haul roads such as that illustrated in Fig. 25, is a complex and expensive process. Hence, planning started more than 10 years ago for a transition from open pit to block caving underground as the mining method.'''

Based on the passage above, we ask the following four questions:

(1) Question: Where is Chuquicamata mine located?

Answer: northern chile

(2) Question: What size is Chuquicamata mine?

Answer: one of the largest open pits in the world

(3) Question: What method is used in Chuquicamata mine?

Answer: block caving underground

(4) Question: How depth is Chuquicamata mine?

Answer: 1 km
