๐Ÿญ

(211010) Diary: Multi & Cross-Lingual Language Model

211010 Diary BERT
ํšŒ์‚ฌ์—์„œ ํ•œ๊ธ€๋กœ ์ž‘์„ฑ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์„ํ•˜๊ณ , ๊ณผ๊ฑฐ ์—ฐ๊ตฌ์‹ค์—์„œ ์˜์–ด(ํ˜น์€ ํ•œ๊ธ€) Corpus๋งŒ์„ ๋‹ค๋ฃจ์—ˆ๋˜ ๋‚˜๋Š” ๋‹ค์–‘ํ•œ ์–ธ์–ด๋ฅผ ์ง€์›ํ•˜๋Š” LM(s)์— ๋Œ€ํ•ด ์ƒ๊ฐํ•ด ๋ณผ ๊ธฐํšŒ๊ฐ€ ๊ฑฐ์˜ ์—†์—ˆ๋‹ค. ์ตœ๊ทผ ๊ณต๋ถ€๋ฅผ ํ•˜๋˜ ์ค‘ XLM ๋…ผ๋ฌธ์„ ๋ณด๊ฒŒ ๋˜์—ˆ๊ณ , ์Šค์Šค๋กœ ํ•ด๋‹น ๋‚ด์šฉ(Multi, Cross-Lingual LM)์— ๊ด€ํ•œ ์ง€์‹์ด ๋ถ€์กฑํ•˜๋‹ค๊ณ  ํŒ๋‹จํ•˜์—ฌ ๊ด€๋ จ ์—ฐ๊ตฌ๋“ค์„ ์ฐพ์•„ ๊ฐ„๋žตํ•˜๊ฒŒ ์‚ดํŽด๋ณด์•˜๋‹ค.

Multilingual LM: BERT

KoBERT๋ฅผ ์‹œ์ž‘์œผ๋กœ KoELECTRA, KLUE-RoBERTa ๋“ฑ ์„ฑ๋Šฅ์ด ์ข‹์€ ํ•œ๊ตญ์–ด LM๋“ค์ด ๋“ฑ์žฅํ•˜๊ธฐ ์ด์ „์—๋Š” Google์—์„œ ๊ณต๊ฐœํ•œ Multilingual Cased BERT(M-BERT)๋ฅผ ์ฃผ๋กœ ์‚ฌ์šฉํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค. Multilingual LM์ด๋ž€ ๋ณต์ˆ˜์˜ ์–ธ์–ด๋กœ ๊ตฌ์„ฑ๋œ Corpus๋ฅผ ํ•™์Šตํ•œ ๋ชจ๋ธ๋กœ, M-BERT์˜ ๊ฒฝ์šฐ 100์—ฌ ๊ฐœ์˜ ์–ธ์–ด๋ฅผ ํ•™์Šตํ•˜๊ณ , ์–ธ์–ด๋“ค ๊ฐ„์— 110K Size์˜ WordPiece Vocab์„ ๊ณต์œ ํ•œ๋‹ค. M-BERT๋Š” ์ผ๋ฐ˜์ ์ธ BERT์™€ ๋™์ผํ•œ ๋ฐฉ์‹์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Œ์—๋„, Zero-Shot Cross-Lingual Transfer(Generalization)์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค.

Cross-Lingual LM: XLM

Cross-Lingual LM์€ ๋‹จ์ˆœํžˆ ๋ณต์ˆ˜์˜ ์–ธ์–ด๋ฅผ ํ•™์Šตํ•˜๋Š” Multilingualํ•œ ๋ฐฉ์‹+๋‹ค๋ฅธ ์–ธ์–ด์˜ ๋™์ผํ•œ ํ‘œํ˜„๋“ค์„ ์œ ์‚ฌํ•œ Embedding Space๋กœ Mappingํ•˜๋„๋ก ํ•™์Šตํ•œ ๋ชจ๋ธ์ด๋‹ค. M-BERT๋Š” ๋™์ผํ•œ Embedding Space์— ๋ชจ๋“  ์–ธ์–ด ํ‘œํ˜„๋“ค์„ Mappingํ•˜์ง€๋งŒ, ๊ฐ™์€ ์˜๋ฏธ์˜ ํ‘œํ˜„๋“ค์ด ์œ ์‚ฌํ•œ Embedding ๊ฐ’์„ ๊ฐ–๋„๋ก ํ•™์Šตํ•˜์ง€ ์•Š์•˜๊ธฐ์— Cross-Lingualํ•˜๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์—†๋‹ค. ๊ทธ๋Ÿผ์—๋„ Google ๋…ผ๋ฌธ์— ์˜ํ•˜๋ฉด, ์–ธ์–ด๋“ค์˜ Typological Features(SVO ์ˆœ์„œ ๋“ฑ)๊นŒ์ง€ Catchํ•˜๋Š” Multilingual Representation ๋Šฅ๋ ฅ์œผ๋กœ ์ธํ•ด Cross-Lingual Generalization์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ผ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.
Cross-Lingual Training Objective๋ฅผ ํ†ตํ•ด LM์„ ๋ช…์‹œ์ ์œผ๋กœ ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š”๋ฐ, Facebook์—์„œ ๊ด€๋ จ ์—ฐ๊ตฌ๋“ค์„ ๋งŽ์ด ์ˆ˜ํ–‰ํ•˜๋Š” ๋“ฏ ํ•˜๋‹ค. ์—ฐ๊ตฌ์‹ค์—์„œ Unsupervised Translation์„ ์œ„ํ•œ Facebook์˜ Cross-Lingual Word & Sentence Embedding ๋…ผ๋ฌธ์„ ์ฝ์€ ์ ์ด ์žˆ๋‹ค. XLM ์—ญ์‹œ ๋™์ผํ•œ ์ €์ž์˜ ์—ฐ๊ตฌ๋กœ, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ฐ„๋žตํžˆ ๋‚ด์šฉ์„ ์ถ”๋ฆด ์ˆ˜ ์žˆ๋‹ค.
โ€ข
Monolingual Data๋ฅผ ํ™œ์šฉํ•˜๋Š” Unsupervised Pre-Taining Objectives 2๊ฐœ: Casual LM(CLM), Masked LM(MLM) ์ œ์•ˆ
โ€ข
Parallel Data๋ฅผ ํ™œ์šฉํ•˜๋Š” Supervised Pre-Training Objective: Translation LM(TLM) ์ œ์•ˆ
โ€ข
CLM, MLM, MLM+TLM์œผ๋กœ XNLI, Supervised & Unsupervised Translation์—์„œ SOTA์˜ ์„ฑ๋Šฅ์„ ๋ƒ„
โ€ข
๋ฌด์—‡๋ณด๋‹ค Labeled Data๊ฐ€ ๋งŽ์€ ์–ธ์–ด์—์„œ ์ ์€ ์–ธ์–ด๋กœ Cross-Lingual Transfer๋ฅผ ํ†ตํ•ด Training Resource๊ฐ€ ๋ถ€์กฑํ•œ ์–ธ์–ด์˜ LM ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ