Comparison of Decoding Methods for the GPT2-Large Model

1 Introduction

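Over the past two years, open-ended language generation has matured rapidly, driven by the rise of large Transformer-based language models trained on millions of web pages, such as OpenAI's GPT2. In the article "Open-Ended Text Generation", sentences were produced with the Transformers "text-generation" pipeline. That approach is built on causal language modeling, unlike BERT, which uses masked language modeling. Beyond the improved Transformer architecture and large amounts of unsupervised training data, however, choosing a good decoding method is also crucial. This note briefly reviews different decoding strategies and, after extensive testing, narrows down a suitable range of parameter values. All of these decoding strategies are used for autoregressive language generation. Autoregressive language generation rests on the following assumption: the probability distribution of a word sequence can be decomposed into a product of conditional next-word distributions. At present, GPT2, XLNet, OpenAI-GPT, CTRL, TransfoXL, XLM, Bart, and T5 can all be used for autoregressive language generation.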
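The factorization assumption stated above can be written out as follows. The symbols are notation introduced here for illustration and do not appear in the original text: $W_0$ is the initial context (the prompt), $w_t$ is the word generated at step $t$, and $T$ is the length of the generated sequence.

```latex
P(w_{1:T} \mid W_0) \;=\; \prod_{t=1}^{T} P\left(w_t \mid w_{1:t-1},\, W_0\right),
\qquad w_{1:0} = \emptyset
```

Every decoding method discussed below is a different way of choosing each $w_t$ from the conditional distribution on the right-hand side.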


2 The GPT2-Large Model

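GPT2, developed by OpenAI, is a large-scale Transformer-based language model pre-trained on a large text corpus of more than 8 million high-quality web pages. As a result, it achieves competitive performance with pre-training alone, without fine-tuning. Because GPT2 is an autoregressive language model, it is well suited to language generation tasks. The experiments in this note use the GPT2-Large model (3.25 GB).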


3 Decoding Methods

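Four decoding methods are currently popular: (1) Greedy search; (2) Beam search; (3) Top-K sampling; and (4) Top-p sampling. This note tests greedy search and beam search, along with plain sampling; Top-K and Top-p sampling are left for a follow-up.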
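As a rough orientation, the methods listed above could be selected through keyword arguments of `model.generate` in the Transformers library. This is only a sketch under assumptions: the note does not state which settings were actually used, so the numeric values (`max_length`, `top_k=50`, `top_p=0.92`), the hub name `gpt2-large`, and the helper name `generate_text` are illustrative choices, not the author's configuration.

```python
# Keyword arguments for `model.generate`, one entry per decoding method.
# All numeric values are illustrative assumptions, not settings from the note.
DECODING_KWARGS = {
    "greedy": {"do_sample": False, "num_beams": 1},
    "beam": {"do_sample": False, "num_beams": 4, "early_stopping": True},
    # top_k=0 switches off top-k filtering, leaving plain sampling.
    "sampling": {"do_sample": True, "top_k": 0},
    "top_k": {"do_sample": True, "top_k": 50},
    "top_p": {"do_sample": True, "top_k": 0, "top_p": 0.92},
}


def generate_text(prompt, method="greedy", max_length=200, model_name="gpt2-large"):
    """Hypothetical helper: continue `prompt` with the chosen decoding method."""
    # Imported lazily so the table above can be inspected without the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_length=max_length,
        pad_token_id=tokenizer.eos_token_id,  # GPT2 defines no pad token
        **DECODING_KWARGS[method],
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate_text("landslide produced by earthquakes", method="beam")` would download the model on first use. Sampling-based methods give different text on each run unless a random seed is fixed.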

3.1 Greedy search

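Greedy search simply selects the word with the highest probability as the next word at each step, as shown in the figure below. Starting from the first word "The", the most probable next word is "nice", and the most probable word after that is "woman", for an overall sequence probability of 0.5 × 0.4 = 0.2. The generated sentence is therefore "The nice woman".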

[Figure 1: greedy search example]

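The main drawback of greedy search is that it misses high-probability words hidden behind a lower-probability word. In the figure above, "dog" has a probability of only 0.4, but the word "has" that follows "dog" has a probability of 0.9, so "The dog has" has an overall probability of 0.4 × 0.9 = 0.36 > 0.2. Greedy search cannot discover this, and as a result the generated sentences tend to repeat words and sentence structures that have already appeared.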
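The arithmetic above can be checked with a small toy model. The sketch below is a simplification in which each word depends only on the previous word. Only four probabilities come from the text ("nice" 0.5, "woman" 0.4, "dog" 0.4, "has" 0.9); every other entry in `TREE` is an invented placeholder so that each distribution sums to 1.

```python
# Toy next-word probabilities. Only nice/woman/dog/has come from the text;
# the remaining entries are invented placeholders.
TREE = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def greedy_search(start, steps):
    """Pick the single most probable next word at every step."""
    words, prob = [start], 1.0
    for _ in range(steps):
        candidates = TREE[words[-1]]
        word = max(candidates, key=candidates.get)
        words.append(word)
        prob *= candidates[word]
    return words, prob


def best_sequence(start, steps):
    """Enumerate every path and return the one with the highest probability."""
    paths = [([start], 1.0)]
    for _ in range(steps):
        paths = [
            (words + [nxt], prob * p)
            for words, prob in paths
            for nxt, p in TREE[words[-1]].items()
        ]
    return max(paths, key=lambda path: path[1])


greedy_words, greedy_prob = greedy_search("The", 2)
best_words, best_prob = best_sequence("The", 2)
print(" ".join(greedy_words), greedy_prob)
print(" ".join(best_words), best_prob)
```

Greedy search returns "The nice woman" with probability 0.2, while exhaustive enumeration finds "The dog has" with probability of about 0.36. Exhaustive enumeration is only feasible on a toy vocabulary like this one.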


Starting from the prompt "landslide produced by earthquakes", greedy search produces the following text:

landslide produced by earthquakes in the region. The study, published in the journal Geophysical Research Letters, found that the earthquake swarm was caused by a series of small earthquakes that occurred in the same area in the last few days. The researchers said that the swarm was triggered by a small earthquake that occurred on the same day as the swarm. "The swarm was produced by a sequence of small, shallow earthquakes that were triggered by the same earthquake swarm," said lead author Dr. David R. Smith, a geophysicist at the University of California, Santa Cruz. In the study, the scientists used a computer model to simulate the earthquake sequence. It was found that a series small earthquakes occurred in a short period of time, which caused the swarm to form. A series of smaller earthquakes occurred over a period of several days, which triggered the swarm, the study said.

The sentences are quite coherent, but somewhat redundant.


3.2 Beam search

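Beam search tries to overcome this drawback of greedy search by keeping the num_beams most likely hypotheses at each time step and finally choosing the one with the highest overall probability. Using the same prompt as above, the results are shown for three settings: num_beams = 2, 3, and 4.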

(1) num_beams=2

landslide produced by earthquakes. "We have a lot of work to do to understand how these earthquakes happen and how they can be prevented," he said. "But we have a pretty good idea of what's going on." New study finds that earthquakes are more likely to occur in areas with high seismic activity. More information: "The Role of Seismic Activity in the Occurrence of Large-Scale Seismicity in the United States," Science.


(2) num_beams=3

landslide produced by earthquakes in the Pacific Northwest. "It's a very exciting time to be a geophysicist," he said. "We're getting to the point where we can do a lot of things that we've never been able to do before." New study shows how earthquakes can trigger tsunamis. More information: The paper is available online.


(3) num_beams=4

landslide produced by earthquakes in the Pacific Northwest. "This is the first time we've seen this kind of landslide in the United States," he said. "It's a very rare event." The landslide occurred in the area of Mount St. Helens, a volcano in Washington state that erupted in 1980 and has been a source of concern for scientists for decades. The volcano is located in a seismically active area of the state, and scientists have been concerned about the possibility of an eruption in the future. The U.S. Geological Survey (USGS) has been monitoring the volcano for more than a decade, and the USGS has warned that an eruption could occur at any time. The USGS is also monitoring Mount Rainier, another volcano in the Cascade Range, which erupted in 1883 and is also located in an earthquake-prone area of Washington state.

In these three runs, the generated sentences become more meaningful as num_beams increases.
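The beam search procedure described in this section can be sketched on a toy model. This is a minimal illustration, not the implementation inside the Transformers library. Each word depends only on the previous word, and apart from "nice" 0.5, "woman" 0.4, "dog" 0.4, and "has" 0.9 (taken from the greedy search example), all probabilities in `TREE` are invented placeholders.

```python
# Toy next-word probabilities; entries other than nice/woman/dog/has
# are invented placeholders.
TREE = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(start, steps, num_beams):
    """Keep the `num_beams` most probable partial sequences at every step."""
    beams = [([start], 1.0)]
    for _ in range(steps):
        # Extend every surviving hypothesis by every possible next word.
        candidates = [
            (words + [nxt], prob * p)
            for words, prob in beams
            for nxt, p in TREE[words[-1]].items()
        ]
        # Prune to the highest-scoring hypotheses.
        candidates.sort(key=lambda cand: cand[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]  # hypothesis with the highest overall probability


one_beam = beam_search("The", 2, num_beams=1)
two_beams = beam_search("The", 2, num_beams=2)
print(one_beam)
print(two_beams)
```

With `num_beams=1` the procedure reduces to greedy search and returns "The nice woman". With `num_beams=2` the hypothesis "The dog" survives the first step, so the more probable "The dog has" is found.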


3.3 Sampling

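Sampling means randomly picking the next word according to its conditional probability distribution. In general, pure sampling makes the output nondeterministic, so it is not used on its own in practice. This method produces the following result: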

landslide produced by earthquakes. This was a serious development, and scientists were busy working towards a solution in the face of the uncertainties of the problem. By the early 1980s, the problem had been significantly reduced because physicists and engineers have been using a technique called seismoturbation to investigate fault sequences.
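Sampling as described in this section, drawing each next word at random according to its conditional distribution, can be sketched with the standard library. The toy table `TREE` is an assumption made for illustration: only "nice" 0.5, "woman" 0.4, "dog" 0.4, and "has" 0.9 come from the greedy search example, and each word depends only on the previous word.

```python
import random

# Toy next-word probabilities; entries other than nice/woman/dog/has
# are invented placeholders.
TREE = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def sample_sequence(start, steps, rng):
    """Draw each next word at random, weighted by its conditional probability."""
    words = [start]
    for _ in range(steps):
        candidates = TREE[words[-1]]
        word = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
        words.append(word)
    return words


rng = random.Random(0)  # fixed seed so a run can be repeated
samples = [tuple(sample_sequence("The", 2, rng)) for _ in range(200)]
print(len(set(samples)), "distinct sequences in 200 draws")
```

Repeated draws yield different sequences, which is the nondeterminism noted above. Fixing the seed makes a run repeatable but does not make any single output more reliable.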


4 Conclusion

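This note compared decoding methods for the GPT2-Large model. Of the three methods tested above, beam search produced relatively reasonable results, but they are still not fully satisfactory. The next methods to test are Top-K sampling and Top-p sampling.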

