Reading the BERT Code, Part 1: Data Processing

 Whether a character is Chinese is decided from its Unicode code point.

# Check whether a character is Chinese. The basic CJK Unified Ideographs block starts at 0x4E00 and ends at 0x9FFF.
  def _is_chinese_char(self, cp):
    """Checks whether CP is the codepoint of a CJK character."""
    # This defines a "chinese character" as anything in the CJK Unicode block:
    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
    #
    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
    # despite its name. The modern Korean Hangul alphabet is a different block,
    # as is Japanese Hiragana and Katakana. Those alphabets are used to write
    # space-separated words, so they are not treated specially and handled
    # like the all of the other languages.
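    # CJK ranges for reference:
    #   0x4E00-0x9FFF    CJK Unified Ideographs: common characters, 20,992 code points
    #                    (assigned only up to 0x9FC3 in the Unicode version this table was taken from)
    #   0x3400-0x4DBF    CJK Unified Ideographs Extension A: rare characters, 6,592 code points
    #   0x20000-0x2A6DF  CJK Unified Ideographs Extension B: rare and historical characters, 42,720 code points
    #   0xF900-0xFAFF    CJK Compatibility Ideographs: duplicates, unifiable variants, shared characters, 512 code points
    #   0x2F800-0x2FA1F  CJK Compatibility Ideographs Supplement: unifiable variants, 544 code points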
    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
        (cp >= 0x3400 and cp <= 0x4DBF) or  #
        (cp >= 0x20000 and cp <= 0x2A6DF) or  #
        (cp >= 0x2A700 and cp <= 0x2B73F) or  #
        (cp >= 0x2B740 and cp <= 0x2B81F) or  #
        (cp >= 0x2B820 and cp <= 0x2CEAF) or
        (cp >= 0xF900 and cp <= 0xFAFF) or  #
        (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
      return True

    return False
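
The effect of this check on tokenization can be sketched as follows. This is a minimal standalone sketch, not the BERT source itself: `is_chinese_char` and `space_out_chinese_chars` are hypothetical names, and the sketch assumes that the tokenizer puts a space on both sides of every CJK character before splitting on whitespace, which is what lets each Chinese character become its own token.

```python
def is_chinese_char(cp: int) -> bool:
    """Return True if code point cp lies in one of the CJK ranges checked above."""
    ranges = (
        (0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0x20000, 0x2A6DF),
        (0x2A700, 0x2B73F), (0x2B740, 0x2B81F), (0x2B820, 0x2CEAF),
        (0xF900, 0xFAFF), (0x2F800, 0x2FA1F),
    )
    return any(lo <= cp <= hi for lo, hi in ranges)


def space_out_chinese_chars(text: str) -> str:
    """Surround every CJK character with spaces (hypothetical helper)."""
    out = []
    for ch in text:
        if is_chinese_char(ord(ch)):
            out.extend([" ", ch, " "])
        else:
            out.append(ch)
    return "".join(out)


print(space_out_chinese_chars("properly: 力加勝北區").split())
# → ['properly:', '力', '加', '勝', '北', '區']
```

Hiragana, Katakana and Hangul fall outside these ranges, so they are left to the ordinary whitespace and WordPiece handling.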

['this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '區', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত']
Text should be one-sentence-per-line, with empty lines between documents.

all_documents = [[['this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '區', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত'], ['text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.'], ['this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.']], [['the', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers',

instance =[tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]

all_documents = [[]]  # organized as a 2-D structure: number of documents × number of sentences

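How create_instances_from_document() works:

1. Select one document by its index. The maximum length of a sentence pair is set to 128; because three special tokens ([CLS], [SEP], [SEP]) have to be inserted, the usable length is 125.

2. To reduce the mismatch between pre-training and fine-tuning, with a small probability (short_seq_prob) a target length shorter than the maximum is drawn at random, so that some of the training sequences are short.

3. Gather a candidate set of consecutive sentences from the selected document, with the total length held to the configured maximum. Each training example is a sentence pair [A, B]: A comes first and B follows it. There are two cases: B really is the continuation of A, or B is not.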
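
Steps 1 to 3 can be sketched roughly as below. This is a simplified sketch under stated assumptions, not the original function: `choose_target_seq_length` and `collect_chunks` are hypothetical helper names, and closing a candidate set as soon as it reaches the target length (or the document ends) is how I read the original loop.

```python
import random


def choose_target_seq_length(max_seq_length, short_seq_prob, rng):
    """Return (max_num_tokens, target_seq_length) for one document."""
    # Reserve three positions for [CLS], [SEP], [SEP].
    max_num_tokens = max_seq_length - 3
    target_seq_length = max_num_tokens
    # Occasionally aim for a shorter sequence.
    if rng.random() < short_seq_prob:
        target_seq_length = rng.randint(2, max_num_tokens)
    return max_num_tokens, target_seq_length


def collect_chunks(document, target_seq_length):
    """Group consecutive sentences into candidate sets (hypothetical helper).

    `document` is a list of sentences; each sentence is a list of tokens.
    """
    chunks, current, current_len = [], [], 0
    for i, sentence in enumerate(document):
        current.append(sentence)
        current_len += len(sentence)
        # Close the candidate set at the target length or at the document end.
        if i == len(document) - 1 or current_len >= target_seq_length:
            chunks.append(current)
            current, current_len = [], 0
    return chunks


rng = random.Random(12345)
print(choose_target_seq_length(128, 0.1, rng))
print(collect_chunks([["a"] * 3, ["b"] * 3, ["c"] * 2], 5))
```

With max_seq_length=128 the first returned value is always 125, which matches the number quoted in step 1.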

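Building A: A may consist of several sentences. a_end marks the last sentence of A and is chosen at random within the candidate set. Once A is fixed, the next segment B is determined.

Building B: there are two cases. In the first, B is taken at random from another document, so it is not the real continuation of A. A different document is picked at random, a random starting sentence is picked inside it, and the sentences from that starting sentence toward the end of the document are appended until B reaches the maximum length minus the length of A. Pairs of this kind are produced with 50% probability. In the second case, B is the real continuation: the sentences of the candidate set that follow A are concatenated to form B. This is how the training set for predicting the next sentence from the previous one is built.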
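
The construction of A and B described above might look like the following sketch. `build_pair` is a hypothetical name, the sketch assumes there are at least two documents, and it simply picks the other document with `rng.choice` instead of reproducing the original selection loop.

```python
import random


def build_pair(all_documents, doc_index, chunk, target_seq_length, rng):
    """Split a candidate set into segment A and choose segment B."""
    # a_end: how many sentences of the candidate set go into A.
    a_end = 1
    if len(chunk) >= 2:
        a_end = rng.randint(1, len(chunk) - 1)
    tokens_a = [tok for sent in chunk[:a_end] for tok in sent]

    tokens_b = []
    # A single-sentence candidate set has no real continuation left.
    is_random_next = len(chunk) == 1 or rng.random() < 0.5
    if is_random_next:
        target_b_length = target_seq_length - len(tokens_a)
        # Simplification: pick any other document directly.
        others = [i for i in range(len(all_documents)) if i != doc_index]
        random_doc = all_documents[rng.choice(others)]
        start = rng.randint(0, len(random_doc) - 1)
        for sent in random_doc[start:]:
            tokens_b.extend(sent)
            if len(tokens_b) >= target_b_length:
                break
    else:
        # B is the real continuation: the rest of the candidate set.
        tokens_b = [tok for sent in chunk[a_end:] for tok in sent]
    return tokens_a, tokens_b, is_random_next


docs = [[["a1"], ["a2"], ["a3"]], [["b1"], ["b2"]]]
print(build_pair(docs, 0, docs[0], 125, random.Random(1)))
```

The returned flag corresponds to the is_random_next field printed in the instances further down.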

tokens ['[CLS]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
------------ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
============== [47, 3, 8, 11, 4, 46, 40, 7, 28, 30, 33, 26, 18, 12, 22, 39, 35, 21, 31, 42, 15, 1, 38, 34, 44, 29, 32, 19, 17, 43, 6, 37, 45, 27, 41, 36, 13, 20, 14, 23, 25, 9, 24, 48, 2, 10, 5]
 [MaskedLmInstance(index=47, label='##tenberg'), MaskedLmInstance(index=3, label='of'), MaskedLmInstance(index=8, label='seekers'), MaskedLmInstance(index=11, label='was'), MaskedLmInstance(index=4, label='his'), MaskedLmInstance(index=46, label='gut'), MaskedLmInstance(index=40, label='and'), MaskedLmInstance(index=7, label='-')]

Above are the masked positions and their labels, i.e. the original tokens at those positions.

The result after sorting by position:

 [MaskedLmInstance(index=3, label='of'), MaskedLmInstance(index=4, label='his'), MaskedLmInstance(index=7, label='-'), MaskedLmInstance(index=8, label='seekers'), MaskedLmInstance(index=11, label='was'), MaskedLmInstance(index=40, label='and'), MaskedLmInstance(index=46, label='gut'), MaskedLmInstance(index=47, label='##tenberg')]

Original tokens:

 ['[CLS]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']

Tokens after masking:

['[CLS]', 'like', 'most', '[MASK]', '[MASK]', 'fellow', 'gold', '[MASK]', '[MASK]', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', '[MASK]', 'was', 'randomly', 'selected', 'from', 'project', '[MASK]', '[MASK]', '.', '[SEP]']
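
The trace above (candidate positions, shuffled positions, selected positions, sorted positions, masked tokens) can be reproduced in outline with the sketch below. It is a simplified sketch, not the original function: `mask_tokens` is a hypothetical name, and the 80% / 10% / 10% split between [MASK], the unchanged token and a random vocabulary word is the scheme I am assuming here. That would explain why 'was' at index 11 is selected as a label but still appears unchanged in the masked tokens.

```python
import collections
import random

MaskedLmInstance = collections.namedtuple("MaskedLmInstance", ["index", "label"])


def mask_tokens(tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng):
    """Return (output_tokens, masked_positions, masked_labels)."""
    # [CLS] and [SEP] are never candidates for masking.
    cand_indexes = [i for i, tok in enumerate(tokens)
                    if tok not in ("[CLS]", "[SEP]")]
    rng.shuffle(cand_indexes)

    num_to_predict = min(max_predictions_per_seq,
                         max(1, int(round(len(tokens) * masked_lm_prob))))

    output_tokens = list(tokens)
    masked = []
    for index in cand_indexes[:num_to_predict]:
        r = rng.random()
        if r < 0.8:
            new_token = "[MASK]"                 # most of the time: mask
        elif r < 0.9:
            new_token = tokens[index]            # sometimes: keep the original
        else:
            new_token = rng.choice(vocab_words)  # sometimes: random word
        output_tokens[index] = new_token
        masked.append(MaskedLmInstance(index=index, label=tokens[index]))

    # Sort by position, as in the sorted list shown above.
    masked.sort(key=lambda m: m.index)
    return (output_tokens,
            [m.index for m in masked],
            [m.label for m in masked])


tokens = ["[CLS]", "like", "most", "of", "his", "[SEP]", "text", "should", "be", "[SEP]"]
print(mask_tokens(tokens, 0.15, 20, ["foo", "bar"], random.Random(12345)))
```

The labels always hold the original tokens, whatever replacement was written into the output sequence.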

A list of instances:

[tokens: [CLS] ceased [MASK] the gray streaks of morning at blazing star , and the [MASK] awoke to a [MASK] sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller [MASK] ut ##ens ##ils , where the heavy showers had washed away the [MASK] [MASK] dust heap ##s before the cabin doors . indeed , it [MASK] recorded in blazing star that a fortunate [MASK] rise ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain had freed from its inc ##umber ##ing soil , and [SEP] this text is [MASK] to [MASK] sure unicode is handled [MASK] : [MASK] 加 勝 北 區 ᴵ bobbie ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 6 8 14 18 36 44 49 50 62 70 83 93 103 105 110 112 118 125
masked_lm_labels: with of at settlement moral camp showers debris and was early of inc included make properly 力 ##ᴺ ##ণ

, tokens: [CLS] possibly this may have been the reason why early rise [MASK] [MASK] that locality , during the [MASK] season , adopted [MASK] thoughtful habit of body , and seldom lifted their eyes to the rift ##ed [MASK] [MASK] - ink washed skies above them . [SEP] [MASK] , [MASK] not with a view [MASK] discovery . a leak in his cabin roof , - - quite consistent with his careless , imp ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded " bunk " and wet blankets . the [MASK] [MASK] his wood pile independently to kind ##le a fire to ##ᵘ [MASK] bed - [MASK] , and he had rec honesty ##e to [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 11 12 14 18 22 37 38 47 49 54 73 102 103 107 114 115 118 124 125
masked_lm_labels: ##rs in locality rainy a or india morning but to ##rov chips from refused dry his clothes ##ours ##e

, tokens: [CLS] this was nearly opposite . mr . cass ##ius crossed the highway , and stopped suddenly . something glitter ##ed in the [MASK] red pool [MASK] him [MASK] gold , surely ! but [MASK] wonderful [MASK] [MASK] , not an irregular , shape ##less fragment of [MASK] ore , fresh from [MASK] ' s cr ##ucible , but a bit of jewel ##er ' s ⁻ ##ic [MASK] ##t in [MASK] form [MASK] a plain gold ring . [MASK] at it [MASK] at ##ten ##tively , he saw that it [MASK] the inscription , " may to cass . [MASK] [SEP] this sample text is public domain and [MASK] randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 23 26 28 34 36 37 47 52 66 68 71 73 79 82 91 96 100 109
masked_lm_labels: nearest before . , to relate crude nature hand ##raf the of looking more bore may " was

, tokens: [CLS] like most [MASK] [MASK] fellow gold [MASK] [MASK] , cass was super ##sti ##tious . [SEP] text should be one - sentence - per - line , with empty lines between documents . this sample text is public domain [MASK] was randomly selected from project [MASK] [MASK] . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 3 4 7 8 11 40 46 47
masked_lm_labels: of his - seekers was and gut ##tenberg

]
write_instance_to_example_files: writing the instances to file
Tokens are converted to IDs:
input_ids [101, 2023, 3793, 2003, 2443, 2000, 2191, 2469, 27260, 2003, 8971, 7919, 1024, 1778, 1779, 1780, 1781, 1782, 1493, 30030, 30031, 30032, 29893, 29894, 29895, 29896, 29897, 29898, 3793, 2323, 103, 2028, 1011, 6251, 1011, 103, 1011, 2240, 1010, 2007, 4064, 3210, 103, 5491, 1012, 102, 103, 7099, 3793, 2003, 2270, 5884, 1998, 2001, 103, 3479, 2013, 2622, 9535, 21806, 1012, 102]

input_mask:

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
features["input_ids"]
int64_list {
  value: 101
  value: 2023
  value: 3793
  value: 2003
  value: 2443
  value: 2000
  value: 2191
  value: 2469
  value: 27260
  value: 2003
  value: 8971
  value: 7919
  value: 1024
  value: 1778
  value: 1779
  value: 1780
  value: 1781
  value: 1782
  value: 1493
  value: 30030
  value: 30031
  value: 30032
  value: 29893
  value: 29894
  value: 29895
  value: 29896
  value: 29897
  value: 29898
  value: 3793
  value: 2323
  value: 103
  value: 2028
  value: 1011
  value: 6251
  value: 1011
  value: 103
  value: 1011
  value: 2240
  value: 1010
  value: 2007
  value: 4064
  value: 3210
  value: 103
  value: 5491
  value: 1012
  value: 102
  value: 103
  value: 7099
  value: 3793
  value: 2003
  value: 2270
  value: 5884
  value: 1998
  value: 2001
  value: 103
  value: 3479
  value: 2013
  value: 2622
  value: 9535
  value: 21806
  value: 1012
  value: 102
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
}
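
The run of zeros at the end of input_ids above is padding up to the maximum sequence length of 128. A minimal sketch of that padding step is given below, using plain Python lists rather than the TensorFlow feature objects shown in the dump; `pad_features` is a hypothetical name.

```python
def pad_features(input_ids, segment_ids, max_seq_length):
    """Pad input_ids / input_mask / segment_ids with zeros to max_seq_length."""
    input_ids = list(input_ids)
    segment_ids = list(segment_ids)
    # Real tokens get mask value 1, padding gets 0.
    input_mask = [1] * len(input_ids)
    assert len(input_ids) <= max_seq_length

    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    return {
        "input_ids": input_ids,
        "input_mask": input_mask,
        "segment_ids": segment_ids,
    }


features = pad_features([101, 2023, 3793, 102], [0, 0, 0, 0], 8)
print(features["input_ids"])   # → [101, 2023, 3793, 102, 0, 0, 0, 0]
print(features["input_mask"])  # → [1, 1, 1, 1, 0, 0, 0, 0]
```

An instance that already fills all 128 positions, like the second one in the dump that follows, gets no padding and an input_mask of all ones.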
----------features OrderedDict([('input_ids', int64_list {
  value: 101
  value: 4298
  value: 2023
  value: 2089
  value: 2031
  value: 2042
  value: 1996
  value: 3114
  value: 2339
  value: 2220
  value: 4125
  value: 103
  value: 103
  value: 2008
  value: 10246
  value: 1010
  value: 2076
  value: 1996
  value: 103
  value: 2161
  value: 1010
  value: 4233
  value: 103
  value: 16465
  value: 10427
  value: 1997
  value: 2303
  value: 1010
  value: 1998
  value: 15839
  value: 4196
  value: 2037
  value: 2159
  value: 2000
  value: 1996
  value: 16931
  value: 2098
  value: 103
  value: 103
  value: 1011
  value: 10710
  value: 8871
  value: 15717
  value: 2682
  value: 2068
  value: 1012
  value: 102
  value: 103
  value: 1010
  value: 103
  value: 2025
  value: 2007
  value: 1037
  value: 3193
  value: 103
  value: 5456
  value: 1012
  value: 1037
  value: 17271
  value: 1999
  value: 2010
  value: 6644
  value: 4412
  value: 1010
  value: 1011
  value: 1011
  value: 3243
  value: 8335
  value: 2007
  value: 2010
  value: 23358
  value: 1010
  value: 17727
  value: 12298
  value: 5178
  value: 3372
  value: 14243
  value: 1010
  value: 1011
  value: 1011
  value: 2018
  value: 27384
  value: 2094
  value: 2032
  value: 2012
  value: 1018
  value: 1037
  value: 1012
  value: 1049
  value: 1012
  value: 1010
  value: 2007
  value: 1037
  value: 10361
  value: 1000
  value: 25277
  value: 1000
  value: 1998
  value: 4954
  value: 15019
  value: 1012
  value: 1996
  value: 103
  value: 103
  value: 2010
  value: 3536
  value: 8632
  value: 9174
  value: 2000
  value: 2785
  value: 2571
  value: 1037
  value: 2543
  value: 2000
  value: 30042
  value: 103
  value: 2793
  value: 1011
  value: 103
  value: 1010
  value: 1998
  value: 2002
  value: 2018
  value: 28667
  value: 16718
  value: 2063
  value: 2000
  value: 102
}
), ('input_mask', int64_list {
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
}
), ('segment_ids', int64_list {
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
}
), ('masked_lm_positions', int64_list {
  value: 11
  value: 12
  value: 14
  value: 18
  value: 22
  value: 37
  value: 38
  value: 47
  value: 49
  value: 54
  value: 73
  value: 102
  value: 103
  value: 107
  value: 114
  value: 115
  value: 118
  value: 124
  value: 125
  value: 0
}
), ('masked_lm_ids', int64_list {
  value: 2869
  value: 1999
  value: 10246
  value: 16373
  value: 1037
  value: 2030
  value: 2634
  value: 2851
  value: 2021
  value: 2000
  value: 12298
  value: 11772
  value: 2013
  value: 4188
  value: 4318
  value: 2010
  value: 4253
  value: 22957
  value: 2063
  value: 0
}
), ('masked_lm_weights', float_list {
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 0.0
}
), ('next_sentence_labels', int64_list {
  value: 0
}
)])
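
The masked-LM fields in this dump are padded in the same spirit: 19 real predictions plus one padding slot with position 0, id 0 and weight 0.0, and next_sentence_labels of 0 for an instance whose is_random_next is False. A small sketch of that step, assuming a limit of 20 predictions per sequence as the dump suggests (`pad_masked_lm` is a hypothetical name):

```python
def pad_masked_lm(positions, label_ids, max_predictions_per_seq, is_random_next):
    """Pad the masked-LM fields and derive the next-sentence label."""
    positions = list(positions)
    label_ids = list(label_ids)
    # Real predictions carry weight 1.0; padding slots carry weight 0.0.
    weights = [1.0] * len(label_ids)

    while len(positions) < max_predictions_per_seq:
        positions.append(0)
        label_ids.append(0)
        weights.append(0.0)

    return {
        "masked_lm_positions": positions,
        "masked_lm_ids": label_ids,
        "masked_lm_weights": weights,
        "next_sentence_labels": [1 if is_random_next else 0],
    }


out = pad_masked_lm([11, 12, 14], [2869, 1999, 10246], 5, is_random_next=False)
print(out["masked_lm_positions"])   # → [11, 12, 14, 0, 0]
print(out["masked_lm_weights"])     # → [1.0, 1.0, 1.0, 0.0, 0.0]
print(out["next_sentence_labels"])  # → [0]
```

The zero weights let a loss function ignore the padding slots even though position 0 and id 0 are valid values in themselves.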

This concludes the part on constructing the data format.

tokens ['[CLS]', 'ancient', 'sage', '-', '-', 'the', 'name', 'is', 'un', '##im', '##port', '##ant', 'to', 'a', 'monk', '-', '-', 'pumped', 'water', 'nightly', 'that', 'he', 'might', 'study', 'by', 'day', ',', 'so', 'i', ',', 'the', 'guardian', 'of', 'cloak', '##s', 'and', 'para', '##sol', '##s', ',', 'at', 'the', 'sacred', 'doors', 'of', 'her', 'lecture', '-', 'room', ',', 'im', '##bib', '##e', 'celestial', 'knowledge', '.', 'from', 'my', 'youth', 'i', 'felt', 'in', 'me', 'a', '[SEP]', 'fallen', 'star', ',', 'i', 'am', ',', 'sir', '!', "'", 'continued', 'he', ',', 'pens', '##ively', ',', 'stroking', 'his', 'lean', 'stomach', '-', '-', "'", 'a', 'fallen', 'star', '!', '-', '-', 'fallen', ',', 'if', 'the', 'dignity', 'of', 'philosophy', 'will', 'allow', 'of', 'the', 'simi', '##le', ',', 'among', 'the', 'hog', '##s', 'of', 'the', 'lower', 'world', '-', '-', 'indeed', ',', 'even', 'into', 'the', 'hog', '-', 'bucket', 'itself', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

]
tokens ['[CLS]', 'there', 'is', 'a', 'phil', '##oso', '##phic', 'pleasure', 'in', 'opening', 'one', "'", 's', 'treasures', 'to', 'the', 'modest', 'young', '.', '[SEP]', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers', 'had', 'washed', 'away', 'the', 'debris', 'and', 'dust', 'heap', '##s', 'before', 'the', 'cabin', 'doors', '.', 'indeed', ',', 'it', 'was', 'recorded', 'in', 'blazing', 'star', 'that', 'a', 'fortunate', 'early', 'rise', '##r', 'had', 'once', 'picked', 'up', 'on', 'the', 'highway', 'a', 'solid', 'chunk', 'of', 'gold', 'quartz', 'which', 'the', 'rain', 'had', 'freed', 'from', 'its', 'inc', '##umber', '##ing', 'soil', ',', 'and', 'washed', 'into', 'immediate', 'and', 'glittering', 'popularity', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain

]
tokens ['[CLS]', 'perhaps', 'you', 'will', 'assist', 'me', 'by', 'carrying', 'this', 'basket', 'of', 'fruit', '?', "'", 'and', 'the', 'little', 'man', 'jumped', 'up', ',', 'put', 'his', 'basket', 'on', 'phil', '##am', '##mon', "'", 's', 'head', ',', 'and', 'tr', '##otted', 'off', 'up', 'a', 'neighbouring', 'street', '.', 'phil', '##am', '##mon', 'followed', ',', 'half', 'contempt', '##uous', ',', 'half', 'wondering', 'at', 'what', 'this', 'philosophy', 'might', 'be', ',', 'which', 'could', 'feed', 'the', 'self', '-', 'con', '##ce', '##it', 'of', 'anything', 'so', 'ab', '##ject', 'as', 'his', 'ragged', 'little', 'api', '##sh', 'guide', ';', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain

, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between

]
tokens ['[CLS]', 'of', 'the', 'street', ',', 'the', 'perpetual', 'stream', 'of', 'busy', 'faces', ',', 'the', 'line', 'of', 'cu', '##rri', '##cles', ',', 'pal', '##an', '##quin', '##s', ',', 'laden', 'ass', '##es', ',', 'camel', '##s', ',', 'elephants', ',', 'which', 'met', 'and', 'passed', 'him', ',', 'and', 'squeezed', 'him', 'up', 'steps', 'and', 'into', 'doorway', '##s', ',', 'as', 'they', 'threaded', 'their', 'way', 'through', 'the', 'great', 'moon', '-', 'gate', 'into', 'the', 'ample', 'street', 'beyond', ',', 'drove', 'everything', 'from', 'his', 'mind', 'but', 'wondering', 'curiosity', ',', 'and', 'a', 'vague', ',', 'helpless', 'dread', 'of', 'that', 'great', 'living', 'wilderness', ',', 'more', 'terrible', 'than', 'any', 'dead', 'wilderness', 'of', 'sand', 'which', 'he', 'had', 'left', '[SEP]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '區', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain

, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between

, tokens: [CLS] of [MASK] street , the perpetual stream of busy faces , the line of [MASK] 示 ##cles , pal ##an ##quin ##s , laden ass ##es , camel ##s , elephants [MASK] which met and passed him , [MASK] 1760 him [MASK] steps and into doorway ##s , as they threaded their 1887 through the great moon - association into the ample street beyond , drove everything from his mind but wondering curiosity , and a [MASK] , helpless dread of that great living wilderness , more terrible than any dead wilderness of sand which he had left [SEP] this [MASK] [MASK] included to make sure unicode [MASK] [MASK] properly : 力 加 勝 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 15 16 32 39 40 42 48 53 59 62 77 101 102 108 109 115 116 125
masked_lm_labels: the cu ##rri , and squeezed up , way gate ample vague text is is handled 北 區 ##ণ

]
tokens ['[CLS]', 'the', 'rep', '##ose', ',', 'the', 'silence', 'of', 'the', 'laura', '-', '-', 'for', 'faces', 'which', 'knew', 'him', 'and', 'smiled', 'upon', 'him', ';', 'but', 'it', 'was', 'too', 'late', 'to', 'turn', 'back', 'now', '.', 'his', 'guide', 'held', 'on', 'for', 'more', 'than', 'a', 'mile', 'up', 'the', 'great', 'main', 'street', ',', 'crossed', 'in', 'the', 'centre', 'of', 'the', 'city', ',', 'at', 'right', 'angles', ',', 'by', 'one', 'equally', 'magnificent', ',', 'at', 'each', 'end', 'of', 'which', ',', 'miles', 'away', ',', 'appeared', ',', 'dim', 'and', 'distant', 'over', 'the', 'heads', 'of', 'the', 'living', 'stream', 'of', 'passengers', ',', 'the', 'yellow', 'sand', '-', 'hills', 'of', 'the', 'desert', ';', 'while', 'at', 'the', 'end', 'of', 'the', 'vista', 'in', 'front', 'of', 'them', 'gleamed', 'the', 'blue', 'harbour', ',', 'through', 'a', 'network', 'of', 'countless', 'mast', '##s', '.', '[SEP]', 'this', 'was', 'nearly', 'opposite', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
---- [... (the 4 instances already shown above are omitted here)

, tokens: [CLS] the rep ##ose , [MASK] silence of [MASK] laura - - for [MASK] which [MASK] him [MASK] smiled upon him ; but [MASK] was too late to turn back now . his [MASK] held on [MASK] more [MASK] [unused731] mile up the great [MASK] street , crossed in the centre [MASK] the city , at right angles , by one equally magnificent , at each end of which , miles away , appeared , [MASK] and distant over the heads of the living stream of passengers [MASK] the yellow sand - [MASK] of the desert ; while at the end of the vista in front swaying them gleamed the blue harbour , through a network of countless mast ##s . [SEP] archaeologist was nearly opposite . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 5 8 13 15 17 23 33 36 38 39 44 51 71 75 76 87 92 106 122
masked_lm_labels: the the faces knew and it guide for than a main of away dim and , hills of this

]
tokens ['[CLS]', 'at', 'last', 'they', 'reached', 'the', 'quay', 'at', 'the', 'opposite', 'end', 'of', 'the', 'street', ';', '[SEP]', 'but', ',', 'wonderful', 'to', 'relate', ',', 'not', 'an', 'irregular', ',', 'shape', '##less', 'fragment', 'of', 'crude', 'ore', ',', 'fresh', 'from', 'nature', "'", 's', 'cr', '##ucible', ',', 'but', 'a', 'bit', 'of', 'jewel', '##er', "'", 's', 'hand', '##ic', '##raf', '##t', 'in', 'the', 'form', 'of', 'a', 'plain', 'gold', 'ring', '.', 'looking', 'at', 'it', 'more', 'at', '##ten', '##tively', ',', 'he', 'saw', 'that', 'it', 'bore', 'the', 'inscription', ',', '"', 'may', 'to', 'cass', '.', '"', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [... (the 5 instances already shown above are omitted here)

, tokens: [CLS] at [MASK] [unused513] reached the quay at the opposite [MASK] of the street ; [SEP] [MASK] , wonderful to relate [MASK] [MASK] an irregular , shape ##less [MASK] of crude ore [MASK] fresh from nature ' s disagreements ##ucible , muse a bit of jewel ##er ' s hand [MASK] ##raf ##t in the form of a plain gold ring . looking at [MASK] [MASK] at ##ten ##tively , he saw that it bore the inscription , " may to cass . " like most [MASK] his fellow gold - seekers , cass [MASK] super ##sti ##tious . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 3 10 16 21 22 28 32 38 41 50 64 65 86 94
masked_lm_labels: last they end but , not fragment , cr but ##ic it more of was

]
tokens ['[CLS]', 'and', 'there', 'burst', 'on', 'phil', '##am', '##mon', "'", 's', 'astonished', 'eyes', 'a', 'vast', 'semi', '##ci', '##rcle', 'of', 'blue', 'sea', ',', 'ring', '##ed', 'with', 'palaces', 'and', 'towers', '.', '[SEP]', 'he', 'stopped', 'in', '##vo', '##lun', '##tar', '##ily', ';', 'and', 'his', 'little', 'guide', 'stopped', 'also', ',', 'and', 'looked', 'ask', '##ance', 'at', 'the', 'young', 'monk', ',', 'to', 'watch', 'the', 'effect', 'which', 'that', 'grand', 'panorama', 'should', 'produce', 'on', 'him', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [... (the 6 instances already shown above are omitted here)

, tokens: [CLS] and there burst on phil ##am ##mon ' s astonished eyes a [MASK] semi ##ci ##rcle [MASK] blue [MASK] , ring ##ed with palaces and [MASK] . [SEP] he stopped in υ ##lun ##tar [MASK] ; and his little guide stopped also , and looked vuelta ##ance at the young monk , to watch [MASK] [MASK] which that grand panorama should produce on him . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 13 17 19 26 32 35 46 55 56
masked_lm_labels: burst vast of sea towers ##vo ##ily ask the effect

]
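
Every `tokens` / `segment_ids` pair printed in this log follows the same pattern: all positions up to and including the first `[SEP]` get 0, and everything after it gets 1. A minimal sketch of that rule is below. The helper name `build_segment_ids` and the short token list are made up for illustration and are not taken from the BERT source.

```python
def build_segment_ids(tokens):
    """Return 0 for segment A (through the first [SEP]) and 1 for segment B."""
    segment_ids = []
    current_segment = 0
    for token in tokens:
        segment_ids.append(current_segment)
        # The first [SEP] still belongs to segment A; what follows is segment B.
        if token == "[SEP]":
            current_segment = 1
    return segment_ids


# Shortened, illustrative input (not a full instance from the log).
tokens = ["[CLS]", "this", "sample", "[SEP]", "text", "is", "public", "[SEP]"]
segment_ids = build_segment_ids(tokens)
print(segment_ids)
```

Running the same function over any `tokens` list in the log should reproduce the `segment_ids` line printed right after it.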
tokens ['[CLS]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '區', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', '[SEP]', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] this text is included to make sure unicode is handled [MASK] : 力 加 勝 ##folk 區 ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ [MASK] ##ট ##ড ##ণ greasy text should be one [MASK] sentence - per - line , with empty lines [MASK] documents . [SEP] this sample text [MASK] public domain and was [MASK] selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 11 16 23 27 32 38 42 49 54
masked_lm_labels: properly 北 ##জ ##ত - , between is randomly

]
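
In each printed instance, `masked_lm_positions` and `masked_lm_labels` line up one to one: the i-th label is the original token at the i-th position. The sketch below shows how the unmasked sequence can be recovered from those two fields. The function name `restore_masked_tokens` and the shortened example are assumptions for illustration only.

```python
def restore_masked_tokens(tokens, masked_lm_positions, masked_lm_labels):
    """Put the original tokens back at the masked positions."""
    restored = list(tokens)  # copy so the masked input is left untouched
    for position, label in zip(masked_lm_positions, masked_lm_labels):
        restored[position] = label
    return restored


# Shortened, illustrative instance (not copied verbatim from the log).
masked = ["[CLS]", "this", "sample", "text", "[MASK]", "public", "domain",
          "and", "was", "[MASK]", "selected", "[SEP]"]
positions = [4, 9]
labels = ["is", "randomly"]

restored = restore_masked_tokens(masked, positions, labels)
print(" ".join(restored))
```

Note that a position listed in `masked_lm_positions` does not always show `[MASK]` in the printed tokens; some positions hold an unrelated word or the unchanged original, and the label still records what the model is expected to predict.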
tokens ['[CLS]', 'the', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers', 'had', 'washed', 'away', 'the', 'debris', 'and', 'dust', 'heap', '##s', 'before', 'the', 'cabin', 'doors', '.', '[SEP]', '##r', 'had', 'once', 'picked', 'up', 'on', 'the', 'highway', 'a', 'solid', 'chunk', 'of', 'gold', 'quartz', 'which', 'the', 'rain', 'had', 'freed', 'from', 'its', 'inc', '##umber', '##ing', 'soil', ',', 'and', 'washed', 'into', 'immediate', 'and', 'glittering', 'popularity', '.', 'possibly', 'this', 'may', 'have', 'been', 'the', 'reason', 'why', 'early', 'rise', '##rs', 'in', 'that', 'locality', ',', 'during', 'the', 'rainy', 'season', ',', 'adopted', 'a', 'thoughtful', 'habit', 'of', 'body', ',', 'and', 'seldom', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] the rain had only ceased with [MASK] [MASK] streaks of morning at blazing [MASK] , and the settlement awoke to a moral sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the heavy showers [MASK] washed away the debris and dust [MASK] ##s before the cabin [MASK] . [SEP] ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain [MASK] freed from its inc ##umber ##ing soil , and [MASK] into immediate and [MASK] popularity . possibly this may have been the reason why early rise ##rs [MASK] that locality [MASK] during [MASK] [MASK] [MASK] , adopted a thoughtful habit of body , and seldom [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 7 8 14 24 45 49 56 61 62 75 77 81 91 95 109 112 114 115 116
masked_lm_labels: the gray star of where had heap doors . of quartz had washed glittering in , the rainy season

]
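
The instances in this log show three different outcomes at the masked positions: most become `[MASK]`, some are replaced by an unrelated vocabulary word (for example `greasy` or `akron`), and some keep the original token. This matches the masking rule described for BERT (roughly 80% `[MASK]`, 10% unchanged, 10% random word). The sketch below is a simplified version of that rule, not the original function; the name `mask_tokens`, its default arguments, and the tiny stand-in vocabulary are assumptions.

```python
import random


def mask_tokens(tokens, vocab_words, rng, masked_lm_prob=0.15, max_predictions=20):
    """Pick positions to predict and corrupt them with the 80/10/10 rule."""
    # [CLS] and [SEP] are never candidates for masking.
    candidates = [i for i, t in enumerate(tokens) if t not in ("[CLS]", "[SEP]")]
    rng.shuffle(candidates)

    num_to_predict = min(max_predictions,
                         max(1, int(round(len(tokens) * masked_lm_prob))))
    positions = sorted(candidates[:num_to_predict])

    output = list(tokens)
    labels = []
    for i in positions:
        labels.append(tokens[i])  # the label is always the original token
        if rng.random() < 0.8:
            output[i] = "[MASK]"                   # 80%: mask token
        elif rng.random() < 0.5:
            output[i] = tokens[i]                  # 10%: keep the original
        else:
            output[i] = rng.choice(vocab_words)    # 10%: random vocabulary word
    return output, positions, labels


rng = random.Random(12345)
tokens = ("[CLS] this sample text is public domain [SEP] "
          "and was randomly selected [SEP]").split()
vocab_words = ["greasy", "akron", "muse", "1887"]  # stand-in vocabulary

output, positions, labels = mask_tokens(tokens, vocab_words, rng)
print(output)
print(positions)
print(labels)
```

With 13 tokens and a probability of 0.15, two positions are selected, which is consistent with the log, where the number of `masked_lm_positions` grows with the sequence length.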
tokens ['[CLS]', 'but', 'not', 'with', 'a', 'view', 'to', 'discovery', '.', 'a', 'leak', 'in', 'his', 'cabin', 'roof', ',', '-', '-', 'quite', 'consistent', 'with', 'his', 'careless', ',', 'imp', '##rov', '##ide', '##nt', 'habits', ',', '-', '-', 'had', 'rouse', '##d', 'him', 'at', '4', 'a', '.', 'm', '.', ',', 'with', 'a', 'flooded', '"', 'bunk', '"', 'and', 'wet', 'blankets', '.', 'the', 'chips', 'from', 'his', 'wood', 'pile', 'refused', 'to', 'kind', '##le', 'a', 'fire', 'to', 'dry', 'his', 'bed', '-', 'clothes', ',', 'and', 'he', 'had', 'rec', '##ours', '##e', 'to', 'a', 'more', 'provide', '##nt', 'neighbor', "'", 's', 'to', 'supply', 'the', 'deficiency', '.', 'this', 'was', 'nearly', 'opposite', '.', 'mr', '.', 'cass', '[SEP]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '勝', '北', '區', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [... (the 1 instance already shown above is omitted here)

, tokens: [CLS] but not with a view to discovery . a [MASK] in his cabin roof , - - [MASK] consistent with his careless , [MASK] ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded [MASK] bunk " and wet blankets . [MASK] chips from [MASK] wood pile ##arion to kind ##le a fire to dry his [MASK] - clothes , and he had [MASK] ##ours ##e to 167 [MASK] provide ##nt neighbor ' s to supply the deficiency . this was nearly opposite . apartments . cass [SEP] this text is included [MASK] make sure unicode is handled properly : 力 加 [MASK] 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড ##ণ [MASK] [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 15 16 18 24 46 53 56 59 60 68 75 79 80 96 104 114 116 126
masked_lm_labels: leak , - quite imp " the his refused to bed rec a more mr to 勝 區 ##ত

]
tokens ['[CLS]', 'something', 'glitter', '##ed', 'in', 'the', 'nearest', 'red', 'pool', 'before', 'him', '.', 'gold', ',', 'surely', '!', 'but', ',', 'wonderful', 'to', 'relate', ',', 'not', 'an', 'irregular', ',', 'shape', '##less', 'fragment', 'of', 'crude', 'ore', ',', 'fresh', 'from', 'nature', "'", 's', 'cr', '##ucible', ',', 'but', 'a', 'bit', 'of', 'jewel', '##er', "'", 's', 'hand', '##ic', '##raf', '##t', 'in', 'the', 'form', 'of', 'a', 'plain', 'gold', 'ring', '.', 'looking', 'at', 'it', 'more', 'at', '##ten', '##tively', ',', 'he', 'saw', 'that', 'it', 'bore', 'the', 'inscription', ',', '"', 'may', 'to', 'cass', '.', '"', '[SEP]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] the rain had only ceased with [MASK] [MASK] streaks of morning at blazing [MASK] , and the settlement awoke to a moral sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the heavy showers [MASK] washed away the debris and dust [MASK] ##s before the cabin [MASK] . [SEP] ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain [MASK] freed from its inc ##umber ##ing soil , and [MASK] into immediate and [MASK] popularity . possibly this may have been the reason why early rise ##rs [MASK] that locality [MASK] during [MASK] [MASK] [MASK] , adopted a thoughtful habit of body , and seldom [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 7 8 14 24 45 49 56 61 62 75 77 81 91 95 109 112 114 115 116
masked_lm_labels: the gray star of where had heap doors . of quartz had washed glittering in , the rainy season

, tokens: [CLS] but not with a view to discovery . a [MASK] in his cabin roof , - - [MASK] consistent with his careless , [MASK] ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded [MASK] bunk " and wet blankets . [MASK] chips from [MASK] wood pile ##arion to kind ##le a fire to dry his [MASK] - clothes , and he had [MASK] ##ours ##e to 167 [MASK] provide ##nt neighbor ' s to supply the deficiency . this was nearly opposite . apartments . cass [SEP] this text is included [MASK] make sure unicode is handled properly : 力 加 [MASK] 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড ##ণ [MASK] [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 15 16 18 24 46 53 56 59 60 68 75 79 80 96 104 114 116 126
masked_lm_labels: leak , - quite imp " the his refused to bed rec a more mr to 勝 區 ##ত

, tokens: [CLS] something glitter ##ed in the nearest red [MASK] before him . gold , surely ! but , wonderful to relate , [MASK] [MASK] irregular , shape ##less fragment of crude ore [MASK] [MASK] from nature ' s cr ##ucible , but a bit of jewel ##er ' s hand ##ic ##raf ##t in the form of a plain gold [MASK] . [MASK] at it more at ##ten ##tively , he saw that it bore the inscription , " [MASK] to cass . " [SEP] like most of his fellow gold [MASK] seekers , cass [MASK] super [MASK] ##tious . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 8 20 22 23 32 33 55 60 62 64 79 91 94 95 97
masked_lm_labels: pool relate not an , fresh form ring looking it may - cass was ##sti

]
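
In the dumps above, segment_ids is 0 for [CLS], the first segment and its [SEP], and 1 for the second segment and the final [SEP]. Here is a minimal sketch of that assembly step. The function name `build_pair` is hypothetical and used only for illustration; it mirrors the pattern visible in the output and is not the script's exact code.

```python
def build_pair(tokens_a, tokens_b):
    """Join two token lists into one BERT-style input with segment ids.

    Hypothetical helper for illustration only.
    """
    # [CLS] + first segment + [SEP] all belong to segment 0.
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)

    # Second segment + closing [SEP] belong to segment 1.
    tokens += list(tokens_b) + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)

    return tokens, segment_ids


tokens, segment_ids = build_pair(["cass", "was", "super"], ["gold", "ring"])
print(tokens)
print(segment_ids)
```

Masking is applied after this step, which is why [MASK] tokens in the dumps keep the segment id of the token they replaced.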
INFO:tensorflow:*** Example ***
INFO:tensorflow:tokens: [CLS] at [MASK] [unused513] reached the quay at the opposite [MASK] of the street ; [SEP] [MASK] , wonderful to relate [MASK] [MASK] an irregular , shape ##less [MASK] of crude ore [MASK] fresh from nature ' s disagreements ##ucible , muse a bit of jewel ##er ' s hand [MASK] ##raf ##t in the form of a plain gold ring . looking at [MASK] [MASK] at ##ten ##tively , he saw that it bore the inscription , " may to cass . " like most [MASK] his fellow gold - seekers , cass [MASK] super ##sti ##tious . [SEP]
INFO:tensorflow:input_ids: 101 2012 103 518 2584 1996 21048 2012 1996 4500 103 1997 1996 2395 1025 102 103 1010 6919 2000 14396 103 103 2019 12052 1010 4338 3238 103 1997 13587 10848 103 4840 2013 3267 1005 1055 23145 21104 1010 18437 1037 2978 1997 13713 2121 1005 1055 2192 103 27528 2102 1999 1996 2433 1997 1037 5810 2751 3614 1012 2559 2012 103 103 2012 6528 25499 1010 2002 2387 2008 2009 8501 1996 9315 1010 1000 2089 2000 16220 1012 1000 2066 2087 103 2010 3507 2751 1011 24071 1010 16220 103 3565 16643 20771 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:masked_lm_positions: 2 3 10 16 21 22 28 32 38 41 50 64 65 86 94 0 0 0 0 0
INFO:tensorflow:masked_lm_ids: 2197 2027 2203 2021 1010 2025 15778 1010 13675 2021 2594 2009 2062 1997 2001 0 0 0 0 0
INFO:tensorflow:masked_lm_weights: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
INFO:tensorflow:next_sentence_labels: 1

The log above shows one complete example as it is written to the TFRecord file.
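
The trailing zeros in the logged fields come from padding. The sketch below shows how one instance could be padded before serialization, in plain Python without TensorFlow. The function name `pad_instance` is hypothetical, and the default lengths of 128 and 20 are assumptions taken from the script's usual flag defaults; the logged masked_lm_positions line (15 real positions plus 5 zeros) is consistent with 20.

```python
def pad_instance(input_ids, segment_ids, masked_lm_positions, masked_lm_ids,
                 is_random_next, max_seq_length=128,
                 max_predictions_per_seq=20):
    """Pad one pretraining instance to fixed lengths (illustrative sketch)."""
    input_ids = list(input_ids)
    segment_ids = list(segment_ids)
    masked_lm_positions = list(masked_lm_positions)
    masked_lm_ids = list(masked_lm_ids)

    assert len(input_ids) == len(segment_ids)
    assert len(input_ids) <= max_seq_length
    assert len(masked_lm_positions) == len(masked_lm_ids)
    assert len(masked_lm_positions) <= max_predictions_per_seq

    # input_mask is 1 for real tokens and 0 for padding.
    input_mask = [1] * len(input_ids)
    pad = max_seq_length - len(input_ids)
    input_ids += [0] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad

    # masked_lm_weights is 1.0 for real predictions and 0.0 for padding,
    # so padded slots can be excluded from the masked-LM loss.
    masked_lm_weights = [1.0] * len(masked_lm_ids)
    pad_pred = max_predictions_per_seq - len(masked_lm_positions)
    masked_lm_positions += [0] * pad_pred
    masked_lm_ids += [0] * pad_pred
    masked_lm_weights += [0.0] * pad_pred

    return {
        "input_ids": input_ids,
        "input_mask": input_mask,
        "segment_ids": segment_ids,
        "masked_lm_positions": masked_lm_positions,
        "masked_lm_ids": masked_lm_ids,
        "masked_lm_weights": masked_lm_weights,
        # 1 means the second segment was sampled at random.
        "next_sentence_labels": 1 if is_random_next else 0,
    }


example = pad_instance([101, 2012, 103, 102], [0, 0, 1, 1], [2], [2197],
                       is_random_next=True, max_seq_length=8,
                       max_predictions_per_seq=3)
for name, value in example.items():
    print(name, value)
```

Note that padded positions in segment_ids are 0, the same value as the first segment, so input_mask is what distinguishes real tokens from padding.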
