BERT Code Walkthrough 1: Data Processing

Whether a character is a Chinese character is decided from its Unicode code point.

# Check whether a character is Chinese. The main CJK Unified Ideographs block runs from 0x4E00 to 0x9FFF.
  def _is_chinese_char(self, cp):
    """Checks whether CP is the codepoint of a CJK character."""
    # This defines a "chinese character" as anything in the CJK Unicode block:
    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
    #
    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
    # despite its name. The modern Korean Hangul alphabet is a different block,
    # as is Japanese Hiragana and Katakana. Those alphabets are used to write
    # space-separated words, so they are not treated specially and handled
    # like all of the other languages.
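    # Reference table of the CJK ranges:
    #   0x4e00-0x9fff    CJK Unified Ideographs, common characters, 20992 code points
    #                    (per this table, only assigned up to 0x9fc3)
    #   0x3400-0x4dbf    CJK Unified Ideographs Extension A, rare characters, 6592 code points
    #   0x20000-0x2a6df  CJK Unified Ideographs Extension B, rare and historical characters, 42720 code points
    #   0xf900-0xfaff    CJK Compatibility Ideographs, duplicates and unifiable variants, 512 code points
    #   0x2f800-0x2fa1f  CJK Compatibility Ideographs Supplement, unifiable variants, 544 code points
    if ((cp >= 0x4E00 and cp <= 0x9FFF) or
        (cp >= 0x3400 and cp <= 0x4DBF) or
        (cp >= 0x20000 and cp <= 0x2A6DF) or
        (cp >= 0x2A700 and cp <= 0x2B73F) or
        (cp >= 0x2B740 and cp <= 0x2B81F) or
        (cp >= 0x2B820 and cp <= 0x2CEAF) or
        (cp >= 0xF900 and cp <= 0xFAFF) or
        (cp >= 0x2F800 and cp <= 0x2FA1F)):
      return True

    return False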
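
To see why the check matters, here is a minimal standalone sketch (not the original class method) of the same range test, plus a helper that pads every CJK character with spaces so that each one ends up as its own token. The names `is_cjk_codepoint` and `pad_cjk_chars` are made up for this illustration.

```python
def is_cjk_codepoint(cp: int) -> bool:
    """Return True if cp falls in one of the CJK ideograph ranges."""
    cjk_ranges = (
        (0x4E00, 0x9FFF),    # CJK Unified Ideographs
        (0x3400, 0x4DBF),    # Extension A
        (0x20000, 0x2A6DF),  # Extension B
        (0x2A700, 0x2B73F),  # Extension C
        (0x2B740, 0x2B81F),  # Extension D
        (0x2B820, 0x2CEAF),  # Extension E
        (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
        (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
    )
    return any(lo <= cp <= hi for lo, hi in cjk_ranges)


def pad_cjk_chars(text: str) -> str:
    """Surround every CJK character with spaces so it splits into its own token."""
    out = []
    for ch in text:
        if is_cjk_codepoint(ord(ch)):
            out.extend([" ", ch, " "])
        else:
            out.append(ch)
    return "".join(out)


# Whitespace splitting now yields one token per Chinese character.
print(pad_cjk_chars("properly:力加胜").split())
```

Hiragana, Katakana and Hangul fall outside these ranges, so they pass through unchanged, which matches the comment in the original function.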

['this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '胜', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত']
Text should be one-sentence-per-line, with empty lines between documents.

all_documents = [[['this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '胜', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত'], ['text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.'], ['this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.']], [['the', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers',

instance =[tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]

all_documents = [[]]  # nested list: documents × sentences, where each sentence is a list of tokens
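
A minimal sketch of how such a nested list can be built from input that has one sentence per line and a blank line between documents. It is an illustration only: `build_all_documents` is a hypothetical name, and plain whitespace splitting stands in for the WordPiece tokenizer that produced the dumps above.

```python
def build_all_documents(lines, tokenize=str.split):
    """Group tokenized sentences into documents, splitting on blank lines."""
    all_documents = [[]]
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current document.
            all_documents.append([])
            continue
        tokens = tokenize(line)
        if tokens:
            all_documents[-1].append(tokens)
    # Drop empty documents, e.g. those caused by consecutive blank lines.
    return [doc for doc in all_documents if doc]


sample = [
    "this is the first sentence .",
    "this is the second sentence .",
    "",
    "another document starts here .",
]
docs = build_all_documents(sample)
print(len(docs), [len(doc) for doc in docs])
```

Indexing is then `docs[document][sentence][token]`, the same shape as the `all_documents` dump.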

How create_instances_from_document() works:

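1. Select a document by its index. The maximum length of a sentence pair is set to 128; since three special tokens have to be inserted ([CLS] and two [SEP]), 125 tokens remain for the text itself.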

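2. To reduce the mismatch between pretraining and fine-tuning, with a small probability (short_seq_prob in the script) the target length is drawn at random to be shorter than this maximum, so that some training sequences are short.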

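3. Collect a candidate set of consecutive sentences from the selected document, with their combined length kept within the configured maximum. Each training example is a sentence pair [a, b]: a comes first and b follows it. There are two cases: b is the actual next segment after a, or it is not.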

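Building a: a may consist of several sentences. a_end marks the end of a within the candidate set and is chosen at random. Once a is fixed, the next segment b is determined.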

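Building b: there are two cases. In the first, b is taken at random from another document: pick a random document, pick a random starting sentence in it, then take sentences from that start toward the end of the document until b reaches its target length, which is the maximum length minus the length of a. Such a b is not the true continuation of a; pairs of this kind are generated with 50% probability. In the second case, b is the true continuation: the sentences that actually follow a in the candidate set are concatenated to form b. This is how the training data for predicting the next sentence from the previous one is built.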
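
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the script's code: `choose_target_seq_length` and `make_sentence_pair` are hypothetical names, the sketch assumes at least two documents exist, and it leaves out details such as truncating the final pair to the maximum length.

```python
import random


def choose_target_seq_length(max_seq_length, short_seq_prob, rng):
    """Steps 1-2: reserve 3 special tokens, occasionally pick a shorter target."""
    max_num_tokens = max_seq_length - 3  # room for [CLS], [SEP], [SEP]
    if rng.random() < short_seq_prob:
        return rng.randint(2, max_num_tokens)
    return max_num_tokens


def make_sentence_pair(all_documents, doc_index, current_chunk, target_seq_length, rng):
    """Step 3: split the candidate chunk into a, then pick b (real or random)."""
    # a_end is the number of sentences from the chunk that go into a.
    a_end = 1
    if len(current_chunk) >= 2:
        a_end = rng.randint(1, len(current_chunk) - 1)
    tokens_a = [tok for sent in current_chunk[:a_end] for tok in sent]

    tokens_b = []
    # A one-sentence chunk has no real continuation, so b must be random.
    is_random_next = len(current_chunk) == 1 or rng.random() < 0.5
    if is_random_next:
        target_b_length = target_seq_length - len(tokens_a)
        # Pick a different document (assumes at least two documents).
        others = [i for i in range(len(all_documents)) if i != doc_index]
        random_document = all_documents[rng.choice(others)]
        random_start = rng.randint(0, len(random_document) - 1)
        for sent in random_document[random_start:]:
            tokens_b.extend(sent)
            if len(tokens_b) >= target_b_length:
                break
    else:
        # b is the true continuation: the rest of the chunk after a.
        for sent in current_chunk[a_end:]:
            tokens_b.extend(sent)
    return tokens_a, tokens_b, is_random_next


docs = [
    [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]],
    [["x1"], ["y1", "y2"]],
]
rng = random.Random(0)
target = choose_target_seq_length(128, 0.1, rng)
print(make_sentence_pair(docs, 0, docs[0], target, rng))
```

Passing a seeded `random.Random` keeps the sampling reproducible, which makes it easier to compare against debug dumps like the ones in this post.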

tokens ['[CLS]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
candidate positions (excluding [CLS] and [SEP]): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
candidate positions after shuffling: [47, 3, 8, 11, 4, 46, 40, 7, 28, 30, 33, 26, 18, 12, 22, 39, 35, 21, 31, 42, 15, 1, 38, 34, 44, 29, 32, 19, 17, 43, 6, 37, 45, 27, 41, 36, 13, 20, 14, 23, 25, 9, 24, 48, 2, 10, 5]
 [MaskedLmInstance(index=47, label='##tenberg'), MaskedLmInstance(index=3, label='of'), MaskedLmInstance(index=8, label='seekers'), MaskedLmInstance(index=11, label='was'), MaskedLmInstance(index=4, label='his'), MaskedLmInstance(index=46, label='gut'), MaskedLmInstance(index=40, label='and'), MaskedLmInstance(index=7, label='-')]

The list above gives the masked positions and their labels, i.e. the original tokens at those positions.

The same list after sorting by position:

 [MaskedLmInstance(index=3, label='of'), MaskedLmInstance(index=4, label='his'), MaskedLmInstance(index=7, label='-'), MaskedLmInstance(index=8, label='seekers'), MaskedLmInstance(index=11, label='was'), MaskedLmInstance(index=40, label='and'), MaskedLmInstance(index=46, label='gut'), MaskedLmInstance(index=47, label='##tenberg')]

Original tokens:

 ['[CLS]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']

Tokens after masking:

['[CLS]', 'like', 'most', '[MASK]', '[MASK]', 'fellow', 'gold', '[MASK]', '[MASK]', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', '[MASK]', 'was', 'randomly', 'selected', 'from', 'project', '[MASK]', '[MASK]', '.', '[SEP]']
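
The masking shown above can be sketched as follows. This is a simplified illustration rather than the script's code: the 15% masking rate, the cap of 20 predictions and the 80/10/10 split between [MASK], the original token and a random token are the values commonly used for BERT pretraining and are assumptions here. With 50 tokens, 15% is 7.5, which rounds to the 8 masked positions seen in the dump. Position 11 ('was') is in the label list but unchanged in the masked output, which is consistent with the keep-the-original case.

```python
import collections
import random

MaskedLmInstance = collections.namedtuple("MaskedLmInstance", ["index", "label"])


def create_masked_lm_predictions(tokens, masked_lm_prob, max_predictions, vocab_words, rng):
    """Pick random positions, replace their tokens, and record the originals."""
    # Special tokens are never candidates for masking.
    cand_indexes = [i for i, tok in enumerate(tokens) if tok not in ("[CLS]", "[SEP]")]
    rng.shuffle(cand_indexes)

    num_to_predict = min(max_predictions, max(1, int(round(len(tokens) * masked_lm_prob))))
    output_tokens = list(tokens)
    masked_lms = []
    for index in cand_indexes[:num_to_predict]:
        if rng.random() < 0.8:
            new_token = "[MASK]"                 # 80%: replace with [MASK]
        elif rng.random() < 0.5:
            new_token = tokens[index]            # 10%: keep the original token
        else:
            new_token = rng.choice(vocab_words)  # 10%: random vocabulary word
        output_tokens[index] = new_token
        masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))

    # Sort by position, as in the sorted list shown earlier.
    masked_lms.sort(key=lambda m: m.index)
    return output_tokens, masked_lms


# A synthetic 50-token pair: [CLS] + 15 tokens + [SEP] + 32 tokens + [SEP].
tokens = ["[CLS]"] + [f"w{i}" for i in range(15)] + ["[SEP]"] + [f"v{i}" for i in range(32)] + ["[SEP]"]
output_tokens, masked_lms = create_masked_lm_predictions(
    tokens, 0.15, 20, ["foo", "bar"], random.Random(12345))
print([m.index for m in masked_lms])
print([m.label for m in masked_lms])
```

The label always stores the original token, even when the position keeps its token or receives a random one, so the model still has to predict it.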

A list of instances:

[tokens: [CLS] ceased [MASK] the gray streaks of morning at blazing star , and the [MASK] awoke to a [MASK] sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller [MASK] ut ##ens ##ils , where the heavy showers had washed away the [MASK] [MASK] dust heap ##s before the cabin doors . indeed , it [MASK] recorded in blazing star that a fortunate [MASK] rise ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain had freed from its inc ##umber ##ing soil , and [SEP] this text is [MASK] to [MASK] sure unicode is handled [MASK] : [MASK] 加 胜 北 区 ᴵ bobbie ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 6 8 14 18 36 44 49 50 62 70 83 93 103 105 110 112 118 125
masked_lm_labels: with of at settlement moral camp showers debris and was early of inc included make properly 力 ##ᴺ ##ণ

, tokens: [CLS] possibly this may have been the reason why early rise [MASK] [MASK] that locality , during the [MASK] season , adopted [MASK] thoughtful habit of body , and seldom lifted their eyes to the rift ##ed [MASK] [MASK] - ink washed skies above them . [SEP] [MASK] , [MASK] not with a view [MASK] discovery . a leak in his cabin roof , - - quite consistent with his careless , imp ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded " bunk " and wet blankets . the [MASK] [MASK] his wood pile independently to kind ##le a fire to ##ᵘ [MASK] bed - [MASK] , and he had rec honesty ##e to [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 11 12 14 18 22 37 38 47 49 54 73 102 103 107 114 115 118 124 125
masked_lm_labels: ##rs in locality rainy a or india morning but to ##rov chips from refused dry his clothes ##ours ##e

, tokens: [CLS] this was nearly opposite . mr . cass ##ius crossed the highway , and stopped suddenly . something glitter ##ed in the [MASK] red pool [MASK] him [MASK] gold , surely ! but [MASK] wonderful [MASK] [MASK] , not an irregular , shape ##less fragment of [MASK] ore , fresh from [MASK] ' s cr ##ucible , but a bit of jewel ##er ' s ⁻ ##ic [MASK] ##t in [MASK] form [MASK] a plain gold ring . [MASK] at it [MASK] at ##ten ##tively , he saw that it [MASK] the inscription , " may to cass . [MASK] [SEP] this sample text is public domain and [MASK] randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 23 26 28 34 36 37 47 52 66 68 71 73 79 82 91 96 100 109
masked_lm_labels: nearest before . , to relate crude nature hand ##raf the of looking more bore may " was

, tokens: [CLS] like most [MASK] [MASK] fellow gold [MASK] [MASK] , cass was super ##sti ##tious . [SEP] text should be one - sentence - per - line , with empty lines between documents . this sample text is public domain [MASK] was randomly selected from project [MASK] [MASK] . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 3 4 7 8 11 40 46 47
masked_lm_labels: of his - seekers was and gut ##tenberg

]
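
Each instance printed above has the layout [CLS] a [SEP] b [SEP], with segment id 0 for [CLS], a and the first [SEP], and segment id 1 for b and the final [SEP]. A minimal sketch of that assembly, using the hypothetical helper name `assemble_pair`:

```python
def assemble_pair(tokens_a, tokens_b):
    """Join two segments with special tokens and build matching segment ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # [CLS] + a + [SEP] belong to segment 0; b + [SEP] belong to segment 1.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens_a = ["cass", "was", "super", "##sti", "##tious", "."]
tokens_b = ["this", "sample", "text", "is", "public", "domain", "."]
tokens, segment_ids = assemble_pair(tokens_a, tokens_b)
print(tokens)
print(segment_ids)
```

In the last instance above, a has 15 tokens, which gives the 17 leading zeros in its segment_ids.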
write_instance_to_example_files: writing the instances to file
Tokens converted to IDs:
input_ids [101, 2023, 3793, 2003, 2443, 2000, 2191, 2469, 27260, 2003, 8971, 7919, 1024, 1778, 1779, 1780, 1781, 1782, 1493, 30030, 30031, 30032, 29893, 29894, 29895, 29896, 29897, 29898, 3793, 2323, 103, 2028, 1011, 6251, 1011, 103, 1011, 2240, 1010, 2007, 4064, 3210, 103, 5491, 1012, 102, 103, 7099, 3793, 2003, 2270, 5884, 1998, 2001, 103, 3479, 2013, 2622, 9535, 21806, 1012, 102]

Input mask (input_mask):

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
features["input_ids"]
int64_list {
  value: 101
  value: 2023
  value: 3793
  value: 2003
  value: 2443
  value: 2000
  value: 2191
  value: 2469
  value: 27260
  value: 2003
  value: 8971
  value: 7919
  value: 1024
  value: 1778
  value: 1779
  value: 1780
  value: 1781
  value: 1782
  value: 1493
  value: 30030
  value: 30031
  value: 30032
  value: 29893
  value: 29894
  value: 29895
  value: 29896
  value: 29897
  value: 29898
  value: 3793
  value: 2323
  value: 103
  value: 2028
  value: 1011
  value: 6251
  value: 1011
  value: 103
  value: 1011
  value: 2240
  value: 1010
  value: 2007
  value: 4064
  value: 3210
  value: 103
  value: 5491
  value: 1012
  value: 102
  value: 103
  value: 7099
  value: 3793
  value: 2003
  value: 2270
  value: 5884
  value: 1998
  value: 2001
  value: 103
  value: 3479
  value: 2013
  value: 2622
  value: 9535
  value: 21806
  value: 1012
  value: 102
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
}
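
The dump above shows 62 real IDs followed by zeros up to a total length of 128. A minimal sketch of that conversion and padding step, written in plain Python without TensorFlow: `to_padded_features` is a hypothetical name, and `toy_vocab` is a tiny stand-in vocabulary whose IDs were read off the dumps in this post (101 for [CLS], 102 for [SEP], 103 for [MASK]).

```python
def to_padded_features(tokens, segment_ids, vocab, max_seq_length):
    """Map tokens to IDs and zero-pad all three lists to max_seq_length."""
    input_ids = [vocab[tok] for tok in tokens]
    input_mask = [1] * len(input_ids)  # 1 marks a real token, 0 marks padding
    segment_ids = list(segment_ids)
    if len(input_ids) > max_seq_length:
        raise ValueError("sequence is longer than max_seq_length")
    pad = max_seq_length - len(input_ids)
    input_ids += [0] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return input_ids, input_mask, segment_ids


# Toy vocabulary; the IDs are taken from the dumps in this post.
toy_vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "[MASK]": 103,
             "this": 2023, "text": 3793, "is": 2003}

tokens = ["[CLS]", "this", "text", "[SEP]", "is", "[MASK]", "[SEP]"]
segment_ids = [0, 0, 0, 0, 1, 1, 1]
input_ids, input_mask, padded_segment_ids = to_padded_features(
    tokens, segment_ids, toy_vocab, max_seq_length=10)
print(input_ids)
print(input_mask)
print(padded_segment_ids)
```

Padded positions have segment id 0 as well, so input_mask is what separates real tokens from padding.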
----------features OrderedDict([('input_ids', int64_list {
  value: 101
  value: 4298
  value: 2023
  value: 2089
  value: 2031
  value: 2042
  value: 1996
  value: 3114
  value: 2339
  value: 2220
  value: 4125
  value: 103
  value: 103
  value: 2008
  value: 10246
  value: 1010
  value: 2076
  value: 1996
  value: 103
  value: 2161
  value: 1010
  value: 4233
  value: 103
  value: 16465
  value: 10427
  value: 1997
  value: 2303
  value: 1010
  value: 1998
  value: 15839
  value: 4196
  value: 2037
  value: 2159
  value: 2000
  value: 1996
  value: 16931
  value: 2098
  value: 103
  value: 103
  value: 1011
  value: 10710
  value: 8871
  value: 15717
  value: 2682
  value: 2068
  value: 1012
  value: 102
  value: 103
  value: 1010
  value: 103
  value: 2025
  value: 2007
  value: 1037
  value: 3193
  value: 103
  value: 5456
  value: 1012
  value: 1037
  value: 17271
  value: 1999
  value: 2010
  value: 6644
  value: 4412
  value: 1010
  value: 1011
  value: 1011
  value: 3243
  value: 8335
  value: 2007
  value: 2010
  value: 23358
  value: 1010
  value: 17727
  value: 12298
  value: 5178
  value: 3372
  value: 14243
  value: 1010
  value: 1011
  value: 1011
  value: 2018
  value: 27384
  value: 2094
  value: 2032
  value: 2012
  value: 1018
  value: 1037
  value: 1012
  value: 1049
  value: 1012
  value: 1010
  value: 2007
  value: 1037
  value: 10361
  value: 1000
  value: 25277
  value: 1000
  value: 1998
  value: 4954
  value: 15019
  value: 1012
  value: 1996
  value: 103
  value: 103
  value: 2010
  value: 3536
  value: 8632
  value: 9174
  value: 2000
  value: 2785
  value: 2571
  value: 1037
  value: 2543
  value: 2000
  value: 30042
  value: 103
  value: 2793
  value: 1011
  value: 103
  value: 1010
  value: 1998
  value: 2002
  value: 2018
  value: 28667
  value: 16718
  value: 2063
  value: 2000
  value: 102
}
), ('input_mask', int64_list {
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
}
), ('segment_ids', int64_list {
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 0
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
  value: 1
}
), ('masked_lm_positions', int64_list {
  value: 11
  value: 12
  value: 14
  value: 18
  value: 22
  value: 37
  value: 38
  value: 47
  value: 49
  value: 54
  value: 73
  value: 102
  value: 103
  value: 107
  value: 114
  value: 115
  value: 118
  value: 124
  value: 125
  value: 0
}
), ('masked_lm_ids', int64_list {
  value: 2869
  value: 1999
  value: 10246
  value: 16373
  value: 1037
  value: 2030
  value: 2634
  value: 2851
  value: 2021
  value: 2000
  value: 12298
  value: 11772
  value: 2013
  value: 4188
  value: 4318
  value: 2010
  value: 4253
  value: 22957
  value: 2063
  value: 0
}
), ('masked_lm_weights', float_list {
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 1.0
  value: 0.0
}
), ('next_sentence_labels', int64_list {
  value: 0
}
)])
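
In the features above, masked_lm_positions holds 19 real positions followed by a single 0, masked_lm_weights ends in 0.0, and next_sentence_labels is 0 for an instance whose is_random_next is False. A minimal sketch of how such fixed-length label fields can be produced, assuming a cap of 20 predictions per sequence (inferred from the dump) and using the hypothetical helper name `pad_masked_lm`:

```python
def pad_masked_lm(masked_lm_positions, masked_lm_ids, max_predictions_per_seq, is_random_next):
    """Pad the masked-LM fields to a fixed length and derive the NSP label."""
    positions = list(masked_lm_positions)
    ids = list(masked_lm_ids)
    weights = [1.0] * len(ids)  # real predictions count fully in the loss
    while len(positions) < max_predictions_per_seq:
        positions.append(0)
        ids.append(0)
        weights.append(0.0)     # padded slots are ignored
    next_sentence_label = 1 if is_random_next else 0
    return positions, ids, weights, next_sentence_label


positions, ids, weights, label = pad_masked_lm(
    [11, 12, 14], [2869, 1999, 10246], max_predictions_per_seq=5, is_random_next=False)
print(positions, ids, weights, label)
```

The zero weights let every example have the same shape while only the real masked positions contribute to the training signal.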

This concludes the part on constructing the data format.

tokens ['[CLS]', 'ancient', 'sage', '-', '-', 'the', 'name', 'is', 'un', '##im', '##port', '##ant', 'to', 'a', 'monk', '-', '-', 'pumped', 'water', 'nightly', 'that', 'he', 'might', 'study', 'by', 'day', ',', 'so', 'i', ',', 'the', 'guardian', 'of', 'cloak', '##s', 'and', 'para', '##sol', '##s', ',', 'at', 'the', 'sacred', 'doors', 'of', 'her', 'lecture', '-', 'room', ',', 'im', '##bib', '##e', 'celestial', 'knowledge', '.', 'from', 'my', 'youth', 'i', 'felt', 'in', 'me', 'a', '[SEP]', 'fallen', 'star', ',', 'i', 'am', ',', 'sir', '!', "'", 'continued', 'he', ',', 'pens', '##ively', ',', 'stroking', 'his', 'lean', 'stomach', '-', '-', "'", 'a', 'fallen', 'star', '!', '-', '-', 'fallen', ',', 'if', 'the', 'dignity', 'of', 'philosophy', 'will', 'allow', 'of', 'the', 'simi', '##le', ',', 'among', 'the', 'hog', '##s', 'of', 'the', 'lower', 'world', '-', '-', 'indeed', ',', 'even', 'into', 'the', 'hog', '-', 'bucket', 'itself', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

]
tokens ['[CLS]', 'there', 'is', 'a', 'phil', '##oso', '##phic', 'pleasure', 'in', 'opening', 'one', "'", 's', 'treasures', 'to', 'the', 'modest', 'young', '.', '[SEP]', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers', 'had', 'washed', 'away', 'the', 'debris', 'and', 'dust', 'heap', '##s', 'before', 'the', 'cabin', 'doors', '.', 'indeed', ',', 'it', 'was', 'recorded', 'in', 'blazing', 'star', 'that', 'a', 'fortunate', 'early', 'rise', '##r', 'had', 'once', 'picked', 'up', 'on', 'the', 'highway', 'a', 'solid', 'chunk', 'of', 'gold', 'quartz', 'which', 'the', 'rain', 'had', 'freed', 'from', 'its', 'inc', '##umber', '##ing', 'soil', ',', 'and', 'washed', 'into', 'immediate', 'and', 'glittering', 'popularity', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain

]
tokens ['[CLS]', 'perhaps', 'you', 'will', 'assist', 'me', 'by', 'carrying', 'this', 'basket', 'of', 'fruit', '?', "'", 'and', 'the', 'little', 'man', 'jumped', 'up', ',', 'put', 'his', 'basket', 'on', 'phil', '##am', '##mon', "'", 's', 'head', ',', 'and', 'tr', '##otted', 'off', 'up', 'a', 'neighbouring', 'street', '.', 'phil', '##am', '##mon', 'followed', ',', 'half', 'contempt', '##uous', ',', 'half', 'wondering', 'at', 'what', 'this', 'philosophy', 'might', 'be', ',', 'which', 'could', 'feed', 'the', 'self', '-', 'con', '##ce', '##it', 'of', 'anything', 'so', 'ab', '##ject', 'as', 'his', 'ragged', 'little', 'api', '##sh', 'guide', ';', '[SEP]', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain

, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between

]
tokens ['[CLS]', 'of', 'the', 'street', ',', 'the', 'perpetual', 'stream', 'of', 'busy', 'faces', ',', 'the', 'line', 'of', 'cu', '##rri', '##cles', ',', 'pal', '##an', '##quin', '##s', ',', 'laden', 'ass', '##es', ',', 'camel', '##s', ',', 'elephants', ',', 'which', 'met', 'and', 'passed', 'him', ',', 'and', 'squeezed', 'him', 'up', 'steps', 'and', 'into', 'doorway', '##s', ',', 'as', 'they', 'threaded', 'their', 'way', 'through', 'the', 'great', 'moon', '-', 'gate', 'into', 'the', 'ample', 'street', 'beyond', ',', 'drove', 'everything', 'from', 'his', 'mind', 'but', 'wondering', 'curiosity', ',', 'and', 'a', 'vague', ',', 'helpless', 'dread', 'of', 'that', 'great', 'living', 'wilderness', ',', 'more', 'terrible', 'than', 'any', 'dead', 'wilderness', 'of', 'sand', 'which', 'he', 'had', 'left', '[SEP]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '胜', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] ancient sage [MASK] [MASK] the name kang un ##im [MASK] ##ant to a monk - - pumped water nightly that he might study by day , so i [MASK] the [MASK] of cloak ##s [MASK] para ##sol ##acies , at the sacred doors of her [MASK] - room [MASK] im ##bib ##e celestial knowledge . from my youth i felt in me a [SEP] fallen star , i am , bobbie ! ' continued he , [MASK] ##ively , stroking his lean [MASK] - - ' a fallen star ! - [MASK] fallen , if the dignity [MASK] philosophy will allow of the simi ##le , among the hog [MASK] of the lower world - [MASK] indeed , even into the hog - bucket itself . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 4 6 7 10 29 31 35 38 46 49 71 77 83 92 98 110 116 124
masked_lm_labels: - - name is ##port , guardian and ##s lecture , sir pens stomach - of ##s - bucket

, tokens: [CLS] there is a phil ##oso ##phic pleasure in opening [MASK] ' s treasures to the modest young . [SEP] rain had only ceased with [MASK] gray streaks of morning at blazing star , [MASK] the settlement awoke to a moral sense of clean akron 16th [MASK] the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the [MASK] showers had washed away the debris and dust heap [MASK] before the cabin doors . indeed [MASK] [MASK] was recorded in blazing [MASK] that a fortunate [MASK] [MASK] [MASK] had once picked up on [MASK] highway a solid chunk [MASK] [MASK] quartz which the [MASK] had freed from its inc ##umber ##ing soil , and washed into immediate and glittering popularity [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 25 34 44 45 46 61 65 75 82 83 88 92 93 94 100 105 106 110
masked_lm_labels: one the and ##liness , and ##ils heavy ##s , it star early rise ##r the of gold rain

, tokens: [CLS] perhaps murder will assist me by carrying this [MASK] of fruit ? ' [MASK] the little man jumped up , put his basket [MASK] phil ##am ##mon ' [MASK] head , and tr ##otted off up a neighbouring street . phil ##am ##mon followed , half contempt ##uous , half wondering at what this philosophy [MASK] be , which [MASK] [MASK] the self [MASK] con ##ce ##it of anything so ab ##ject as his ragged [MASK] api ##val guide ; [SEP] text should be [MASK] [MASK] sentence - per [MASK] line [MASK] with empty lines [MASK] documents . this sample text is public domain and was randomly selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 9 14 24 29 56 60 61 63 64 76 78 85 86 90 92 96
masked_lm_labels: you basket and on s might could feed self - little ##sh one - - , between

, tokens: [CLS] of [MASK] street , the perpetual stream of busy faces , the line of [MASK] 示 ##cles , pal ##an ##quin ##s , laden ass ##es , camel ##s , elephants [MASK] which met and passed him , [MASK] 1760 him [MASK] steps and into doorway ##s , as they threaded their 1887 through the great moon - association into the ample street beyond , drove everything from his mind but wondering curiosity , and a [MASK] , helpless dread of that great living wilderness , more terrible than any dead wilderness of sand which he had left [SEP] this [MASK] [MASK] included to make sure unicode [MASK] [MASK] properly : 力 加 胜 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড [MASK] ##ত [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 15 16 32 39 40 42 48 53 59 62 77 101 102 108 109 115 116 125
masked_lm_labels: the cu ##rri , and squeezed up , way gate ample vague text is is handled 北 区 ##ণ

]
tokens ['[CLS]', 'the', 'rep', '##ose', ',', 'the', 'silence', 'of', 'the', 'laura', '-', '-', 'for', 'faces', 'which', 'knew', 'him', 'and', 'smiled', 'upon', 'him', ';', 'but', 'it', 'was', 'too', 'late', 'to', 'turn', 'back', 'now', '.', 'his', 'guide', 'held', 'on', 'for', 'more', 'than', 'a', 'mile', 'up', 'the', 'great', 'main', 'street', ',', 'crossed', 'in', 'the', 'centre', 'of', 'the', 'city', ',', 'at', 'right', 'angles', ',', 'by', 'one', 'equally', 'magnificent', ',', 'at', 'each', 'end', 'of', 'which', ',', 'miles', 'away', ',', 'appeared', ',', 'dim', 'and', 'distant', 'over', 'the', 'heads', 'of', 'the', 'living', 'stream', 'of', 'passengers', ',', 'the', 'yellow', 'sand', '-', 'hills', 'of', 'the', 'desert', ';', 'while', 'at', 'the', 'end', 'of', 'the', 'vista', 'in', 'front', 'of', 'them', 'gleamed', 'the', 'blue', 'harbour', ',', 'through', 'a', 'network', 'of', 'countless', 'mast', '##s', '.', '[SEP]', 'this', 'was', 'nearly', 'opposite', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
---- [(first 4 instances are identical to the previous dump and are omitted here)
, tokens: [CLS] the rep ##ose , [MASK] silence of [MASK] laura - - for [MASK] which [MASK] him [MASK] smiled upon him ; but [MASK] was too late to turn back now . his [MASK] held on [MASK] more [MASK] [unused731] mile up the great [MASK] street , crossed in the centre [MASK] the city , at right angles , by one equally magnificent , at each end of which , miles away , appeared , [MASK] and distant over the heads of the living stream of passengers [MASK] the yellow sand - [MASK] of the desert ; while at the end of the vista in front swaying them gleamed the blue harbour , through a network of countless mast ##s . [SEP] archaeologist was nearly opposite . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 5 8 13 15 17 23 33 36 38 39 44 51 71 75 76 87 92 106 122
masked_lm_labels: the the faces knew and it guide for than a main of away dim and , hills of this

]
tokens ['[CLS]', 'at', 'last', 'they', 'reached', 'the', 'quay', 'at', 'the', 'opposite', 'end', 'of', 'the', 'street', ';', '[SEP]', 'but', ',', 'wonderful', 'to', 'relate', ',', 'not', 'an', 'irregular', ',', 'shape', '##less', 'fragment', 'of', 'crude', 'ore', ',', 'fresh', 'from', 'nature', "'", 's', 'cr', '##ucible', ',', 'but', 'a', 'bit', 'of', 'jewel', '##er', "'", 's', 'hand', '##ic', '##raf', '##t', 'in', 'the', 'form', 'of', 'a', 'plain', 'gold', 'ring', '.', 'looking', 'at', 'it', 'more', 'at', '##ten', '##tively', ',', 'he', 'saw', 'that', 'it', 'bore', 'the', 'inscription', ',', '"', 'may', 'to', 'cass', '.', '"', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [(first 5 instances were already printed above and are omitted here)
, tokens: [CLS] at [MASK] [unused513] reached the quay at the opposite [MASK] of the street ; [SEP] [MASK] , wonderful to relate [MASK] [MASK] an irregular , shape ##less [MASK] of crude ore [MASK] fresh from nature ' s disagreements ##ucible , muse a bit of jewel ##er ' s hand [MASK] ##raf ##t in the form of a plain gold ring . looking at [MASK] [MASK] at ##ten ##tively , he saw that it bore the inscription , " may to cass . " like most [MASK] his fellow gold - seekers , cass [MASK] super ##sti ##tious . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 2 3 10 16 21 22 28 32 38 41 50 64 65 86 94
masked_lm_labels: last they end but , not fragment , cr but ##ic it more of was

]
tokens ['[CLS]', 'and', 'there', 'burst', 'on', 'phil', '##am', '##mon', "'", 's', 'astonished', 'eyes', 'a', 'vast', 'semi', '##ci', '##rcle', 'of', 'blue', 'sea', ',', 'ring', '##ed', 'with', 'palaces', 'and', 'towers', '.', '[SEP]', 'he', 'stopped', 'in', '##vo', '##lun', '##tar', '##ily', ';', 'and', 'his', 'little', 'guide', 'stopped', 'also', ',', 'and', 'looked', 'ask', '##ance', 'at', 'the', 'young', 'monk', ',', 'to', 'watch', 'the', 'effect', 'which', 'that', 'grand', 'panorama', 'should', 'produce', 'on', 'him', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [(first 6 instances were already printed above and are omitted here)
, tokens: [CLS] and there burst on phil ##am ##mon ' s astonished eyes a [MASK] semi ##ci ##rcle [MASK] blue [MASK] , ring ##ed with palaces and [MASK] . [SEP] he stopped in υ ##lun ##tar [MASK] ; and his little guide stopped also , and looked vuelta ##ance at the young monk , to watch [MASK] [MASK] which that grand panorama should produce on him . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 3 13 17 19 26 32 35 46 55 56
masked_lm_labels: burst vast of sea towers ##vo ##ily ask the effect

]
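Each `tokens` / `segment_ids` pair printed in this output follows the same layout: `[CLS]`, the first segment, and the first `[SEP]` get id 0, while the second segment and the closing `[SEP]` get id 1. A minimal sketch of that layout is below. The helper name `build_tokens_and_segments` and the toy inputs are illustrative assumptions, not names taken from the BERT source.

```python
def build_tokens_and_segments(tokens_a, tokens_b):
    """Join two token lists the way the dumps above are laid out.

    Hypothetical helper for illustration: segment A (with [CLS] and the
    first [SEP]) is labelled 0, segment B (with the last [SEP]) is labelled 1.
    """
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    tokens += list(tokens_b) + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Toy input borrowed from the sample text in this output.
tokens, segment_ids = build_tokens_and_segments(
    ["this", "was", "nearly", "opposite", "."],
    ["mr", ".", "cass"],
)
print(tokens)
print(segment_ids)
```

The two lists always have the same length, so the position where `segment_ids` switches from 0 to 1 is the first token after the first `[SEP]`.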
tokens ['[CLS]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '胜', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', 'text', 'should', 'be', 'one', '-', 'sentence', '-', 'per', '-', 'line', ',', 'with', 'empty', 'lines', 'between', 'documents', '.', '[SEP]', 'this', 'sample', 'text', 'is', 'public', 'domain', 'and', 'was', 'randomly', 'selected', 'from', 'project', 'gut', '##tenberg', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] this text is included to make sure unicode is handled [MASK] : 力 加 胜 ##folk 区 ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ [MASK] ##ট ##ড ##ণ greasy text should be one [MASK] sentence - per - line , with empty lines [MASK] documents . [SEP] this sample text [MASK] public domain and was [MASK] selected from project gut ##tenberg . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 11 16 23 27 32 38 42 49 54
masked_lm_labels: properly 北 ##জ ##ত - , between is randomly

]
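The instance above can be read back into its unmasked form: `masked_lm_positions` lists the indices that were selected, and `masked_lm_labels` gives the original token at each one. Selected positions do not always show `[MASK]`. Here position 16 holds `##folk`, position 27 holds `greasy`, and position 38 still holds the original `,`. The sketch below restores the tokens and classifies each selected position. The function names are hypothetical, and the 80% figure in the comments is the masking rule described in the BERT paper, assumed to apply to this output.

```python
# Values copied from the instance printed above.
masked = (
    "[CLS] this text is included to make sure unicode is handled [MASK] : "
    "力 加 胜 ##folk 区 ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ [MASK] ##ট ##ড ##ণ greasy "
    "text should be one [MASK] sentence - per - line , with empty lines "
    "[MASK] documents . [SEP] this sample text [MASK] public domain and was "
    "[MASK] selected from project gut ##tenberg . [SEP]"
).split()
positions = [11, 16, 23, 27, 32, 38, 42, 49, 54]
labels = "properly 北 ##জ ##ত - , between is randomly".split()


def restore_tokens(masked_tokens, positions, labels):
    """Put each original token back at its recorded position."""
    restored = list(masked_tokens)
    for pos, label in zip(positions, labels):
        restored[pos] = label
    return restored


def classify(masked_tokens, positions, labels):
    """Say how each selected position was altered."""
    kinds = []
    for pos, label in zip(positions, labels):
        shown = masked_tokens[pos]
        if shown == "[MASK]":
            kinds.append("mask")    # the common case (80% in the paper)
        elif shown == label:
            kinds.append("kept")    # original token left in place
        else:
            kinds.append("random")  # swapped for a different vocabulary token
    return kinds


restored = restore_tokens(masked, positions, labels)
kinds = classify(masked, positions, labels)
print(" ".join(restored))
print(list(zip(positions, kinds)))
```

The model is trained to predict the label at every selected position, including the ones where the visible token was kept or replaced by a random token.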
tokens ['[CLS]', 'the', 'rain', 'had', 'only', 'ceased', 'with', 'the', 'gray', 'streaks', 'of', 'morning', 'at', 'blazing', 'star', ',', 'and', 'the', 'settlement', 'awoke', 'to', 'a', 'moral', 'sense', 'of', 'clean', '##liness', ',', 'and', 'the', 'finding', 'of', 'forgotten', 'knives', ',', 'tin', 'cups', ',', 'and', 'smaller', 'camp', 'ut', '##ens', '##ils', ',', 'where', 'the', 'heavy', 'showers', 'had', 'washed', 'away', 'the', 'debris', 'and', 'dust', 'heap', '##s', 'before', 'the', 'cabin', 'doors', '.', '[SEP]', '##r', 'had', 'once', 'picked', 'up', 'on', 'the', 'highway', 'a', 'solid', 'chunk', 'of', 'gold', 'quartz', 'which', 'the', 'rain', 'had', 'freed', 'from', 'its', 'inc', '##umber', '##ing', 'soil', ',', 'and', 'washed', 'into', 'immediate', 'and', 'glittering', 'popularity', '.', 'possibly', 'this', 'may', 'have', 'been', 'the', 'reason', 'why', 'early', 'rise', '##rs', 'in', 'that', 'locality', ',', 'during', 'the', 'rainy', 'season', ',', 'adopted', 'a', 'thoughtful', 'habit', 'of', 'body', ',', 'and', 'seldom', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] the rain had only ceased with [MASK] [MASK] streaks of morning at blazing [MASK] , and the settlement awoke to a moral sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the heavy showers [MASK] washed away the debris and dust [MASK] ##s before the cabin [MASK] . [SEP] ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain [MASK] freed from its inc ##umber ##ing soil , and [MASK] into immediate and [MASK] popularity . possibly this may have been the reason why early rise ##rs [MASK] that locality [MASK] during [MASK] [MASK] [MASK] , adopted a thoughtful habit of body , and seldom [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 7 8 14 24 45 49 56 61 62 75 77 81 91 95 109 112 114 115 116
masked_lm_labels: the gray star of where had heap doors . of quartz had washed glittering in , the rainy season

]
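The length of `masked_lm_positions` in these dumps tracks the sequence length: 19 positions for the 128-token instances, 15 for the 100-token one, 10 for the 67-token one, and 9 for the 62-token one. That is consistent with selecting about 15% of the tokens, capped at 20 per sequence. The sketch below assumes those two values (`masked_lm_prob=0.15`, `max_predictions_per_seq=20`), which are not printed in this output, and the function name is hypothetical.

```python
def num_to_predict(seq_len, masked_lm_prob=0.15, max_predictions_per_seq=20):
    """Number of positions selected for masking in one sequence.

    Both default values are assumptions; they are not printed in the output.
    seq_len counts every token, including [CLS] and [SEP].
    """
    return min(max_predictions_per_seq,
               max(1, int(round(seq_len * masked_lm_prob))))


# Sequence lengths taken from the instances shown above.
for seq_len in (128, 100, 67, 62):
    print(seq_len, num_to_predict(seq_len))
```

The `max(1, ...)` guard keeps at least one prediction for very short sequences, and the cap bounds the size of the per-instance label arrays.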
tokens ['[CLS]', 'but', 'not', 'with', 'a', 'view', 'to', 'discovery', '.', 'a', 'leak', 'in', 'his', 'cabin', 'roof', ',', '-', '-', 'quite', 'consistent', 'with', 'his', 'careless', ',', 'imp', '##rov', '##ide', '##nt', 'habits', ',', '-', '-', 'had', 'rouse', '##d', 'him', 'at', '4', 'a', '.', 'm', '.', ',', 'with', 'a', 'flooded', '"', 'bunk', '"', 'and', 'wet', 'blankets', '.', 'the', 'chips', 'from', 'his', 'wood', 'pile', 'refused', 'to', 'kind', '##le', 'a', 'fire', 'to', 'dry', 'his', 'bed', '-', 'clothes', ',', 'and', 'he', 'had', 'rec', '##ours', '##e', 'to', 'a', 'more', 'provide', '##nt', 'neighbor', "'", 's', 'to', 'supply', 'the', 'deficiency', '.', 'this', 'was', 'nearly', 'opposite', '.', 'mr', '.', 'cass', '[SEP]', 'this', 'text', 'is', 'included', 'to', 'make', 'sure', 'unicode', 'is', 'handled', 'properly', ':', '力', '加', '胜', '北', '区', 'ᴵ', '##ᴺ', '##ᵀ', '##ᵃ', '##ছ', '##জ', '##ট', '##ড', '##ণ', '##ত', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [(first instance is identical to the previous dump and is omitted here)
, tokens: [CLS] but not with a view to discovery . a [MASK] in his cabin roof , - - [MASK] consistent with his careless , [MASK] ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded [MASK] bunk " and wet blankets . [MASK] chips from [MASK] wood pile ##arion to kind ##le a fire to dry his [MASK] - clothes , and he had [MASK] ##ours ##e to 167 [MASK] provide ##nt neighbor ' s to supply the deficiency . this was nearly opposite . apartments . cass [SEP] this text is included [MASK] make sure unicode is handled properly : 力 加 [MASK] 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড ##ণ [MASK] [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 15 16 18 24 46 53 56 59 60 68 75 79 80 96 104 114 116 126
masked_lm_labels: leak , - quite imp " the his refused to bed rec a more mr to 胜 区 ##ত

]
tokens ['[CLS]', 'something', 'glitter', '##ed', 'in', 'the', 'nearest', 'red', 'pool', 'before', 'him', '.', 'gold', ',', 'surely', '!', 'but', ',', 'wonderful', 'to', 'relate', ',', 'not', 'an', 'irregular', ',', 'shape', '##less', 'fragment', 'of', 'crude', 'ore', ',', 'fresh', 'from', 'nature', "'", 's', 'cr', '##ucible', ',', 'but', 'a', 'bit', 'of', 'jewel', '##er', "'", 's', 'hand', '##ic', '##raf', '##t', 'in', 'the', 'form', 'of', 'a', 'plain', 'gold', 'ring', '.', 'looking', 'at', 'it', 'more', 'at', '##ten', '##tively', ',', 'he', 'saw', 'that', 'it', 'bore', 'the', 'inscription', ',', '"', 'may', 'to', 'cass', '.', '"', '[SEP]', 'like', 'most', 'of', 'his', 'fellow', 'gold', '-', 'seekers', ',', 'cass', 'was', 'super', '##sti', '##tious', '.', '[SEP]']
segment_ids [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---- [tokens: [CLS] the rain had only ceased with [MASK] [MASK] streaks of morning at blazing [MASK] , and the settlement awoke to a moral sense of clean ##liness , and the finding of forgotten knives , tin cups , and smaller camp ut ##ens ##ils , where the heavy showers [MASK] washed away the debris and dust [MASK] ##s before the cabin [MASK] . [SEP] ##r had once picked up on the highway a solid chunk [MASK] gold quartz which the rain [MASK] freed from its inc ##umber ##ing soil , and [MASK] into immediate and [MASK] popularity . possibly this may have been the reason why early rise ##rs [MASK] that locality [MASK] during [MASK] [MASK] [MASK] , adopted a thoughtful habit of body , and seldom [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 7 8 14 24 45 49 56 61 62 75 77 81 91 95 109 112 114 115 116
masked_lm_labels: the gray star of where had heap doors . of quartz had washed glittering in , the rainy season

, tokens: [CLS] but not with a view to discovery . a [MASK] in his cabin roof , - - [MASK] consistent with his careless , [MASK] ##rov ##ide ##nt habits , - - had rouse ##d him at 4 a . m . , with a flooded [MASK] bunk " and wet blankets . [MASK] chips from [MASK] wood pile ##arion to kind ##le a fire to dry his [MASK] - clothes , and he had [MASK] ##ours ##e to 167 [MASK] provide ##nt neighbor ' s to supply the deficiency . this was nearly opposite . apartments . cass [SEP] this text is included [MASK] make sure unicode is handled properly : 力 加 [MASK] 北 [MASK] ᴵ ##ᴺ ##ᵀ ##ᵃ ##ছ ##জ ##ট ##ড ##ণ [MASK] [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: True
masked_lm_positions: 10 15 16 18 24 46 53 56 59 60 68 75 79 80 96 104 114 116 126
masked_lm_labels: leak , - quite imp " the his refused to bed rec a more mr to 胜 区 ##ত

, tokens: [CLS] something glitter ##ed in the nearest red [MASK] before him . gold , surely ! but , wonderful to relate , [MASK] [MASK] irregular , shape ##less fragment of crude ore [MASK] [MASK] from nature ' s cr ##ucible , but a bit of jewel ##er ' s hand ##ic ##raf ##t in the form of a plain gold [MASK] . [MASK] at it more at ##ten ##tively , he saw that it bore the inscription , " [MASK] to cass . " [SEP] like most of his fellow gold [MASK] seekers , cass [MASK] super [MASK] ##tious . [SEP]
segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
is_random_next: False
masked_lm_positions: 8 20 22 23 32 33 55 60 62 64 79 91 94 95 97
masked_lm_labels: pool relate not an , fresh form ring looking it may - cass was ##sti

]
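The `tokens` / `segment_ids` pairs printed above follow a simple layout: `[CLS]`, the first segment and its `[SEP]` get segment id 0, while the second segment and the final `[SEP]` get segment id 1. Below is a minimal sketch of that layout. The helper name `build_pair` and the shortened token lists are made up for illustration; this is not the original script's code.

```python
def build_pair(tokens_a, tokens_b):
    """Join two token lists into one BERT-style input with segment ids.

    Hypothetical helper for illustration only.
    """
    # [CLS] + segment A + [SEP] all belong to segment 0.
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)

    # Segment B + the closing [SEP] belong to segment 1.
    tokens += list(tokens_b) + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Shortened token lists borrowed from the dump above.
tokens, segment_ids = build_pair(
    ["something", "glitter", "##ed"],
    ["cass", "was", "super", "##sti", "##tious", "."],
)
print(tokens)
print(segment_ids)
```

The two lists always have the same length, which is why the zeros in each `segment_ids` dump end exactly at the first `[SEP]`.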
INFO:tensorflow:*** Example ***
INFO:tensorflow:tokens: [CLS] at [MASK] [unused513] reached the quay at the opposite [MASK] of the street ; [SEP] [MASK] , wonderful to relate [MASK] [MASK] an irregular , shape ##less [MASK] of crude ore [MASK] fresh from nature ' s disagreements ##ucible , muse a bit of jewel ##er ' s hand [MASK] ##raf ##t in the form of a plain gold ring . looking at [MASK] [MASK] at ##ten ##tively , he saw that it bore the inscription , " may to cass . " like most [MASK] his fellow gold - seekers , cass [MASK] super ##sti ##tious . [SEP]
INFO:tensorflow:input_ids: 101 2012 103 518 2584 1996 21048 2012 1996 4500 103 1997 1996 2395 1025 102 103 1010 6919 2000 14396 103 103 2019 12052 1010 4338 3238 103 1997 13587 10848 103 4840 2013 3267 1005 1055 23145 21104 1010 18437 1037 2978 1997 13713 2121 1005 1055 2192 103 27528 2102 1999 1996 2433 1997 1037 5810 2751 3614 1012 2559 2012 103 103 2012 6528 25499 1010 2002 2387 2008 2009 8501 1996 9315 1010 1000 2089 2000 16220 1012 1000 2066 2087 103 2010 3507 2751 1011 24071 1010 16220 103 3565 16643 20771 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:masked_lm_positions: 2 3 10 16 21 22 28 32 38 41 50 64 65 86 94 0 0 0 0 0
INFO:tensorflow:masked_lm_ids: 2197 2027 2203 2021 1010 2025 15778 1010 13675 2021 2594 2009 2062 1997 2001 0 0 0 0 0
INFO:tensorflow:masked_lm_weights: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
INFO:tensorflow:next_sentence_labels: 1

The log above shows one complete example as written to the TFRecord file.
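The trailing zeros in `input_ids`, `input_mask`, `segment_ids`, `masked_lm_positions`, `masked_lm_ids` and `masked_lm_weights` are padding: every example is brought to a fixed length before it is serialized. The sketch below reproduces that padding in plain Python, without TensorFlow. The function name `pad_features` is hypothetical, and the defaults of 128 and 20 are an assumption read off the log above, not values confirmed by the post.

```python
def pad_features(input_ids, segment_ids, masked_lm_positions, masked_lm_ids,
                 max_seq_length=128, max_predictions_per_seq=20):
    """Pad one example to fixed lengths (hypothetical helper, for illustration)."""
    assert len(input_ids) == len(segment_ids) <= max_seq_length
    assert len(masked_lm_positions) == len(masked_lm_ids) <= max_predictions_per_seq

    # 1 marks a real token, 0 marks padding.
    input_mask = [1] * len(input_ids)
    seq_pad = [0] * (max_seq_length - len(input_ids))

    # 1.0 marks a real masked position, 0.0 a padded slot ignored by the loss.
    masked_lm_weights = [1.0] * len(masked_lm_ids)
    pred_pad = max_predictions_per_seq - len(masked_lm_ids)

    return {
        "input_ids": list(input_ids) + seq_pad,
        "input_mask": input_mask + seq_pad,
        "segment_ids": list(segment_ids) + seq_pad,
        "masked_lm_positions": list(masked_lm_positions) + [0] * pred_pad,
        "masked_lm_ids": list(masked_lm_ids) + [0] * pred_pad,
        "masked_lm_weights": masked_lm_weights + [0.0] * pred_pad,
    }


# Tiny example with short limits so the padding is easy to see.
features = pad_features(
    input_ids=[101, 2012, 103, 102],
    segment_ids=[0, 0, 0, 0],
    masked_lm_positions=[2],
    masked_lm_ids=[2197],
    max_seq_length=8,
    max_predictions_per_seq=3,
)
for name, values in features.items():
    print(name, values)
```

Because padded positions in `segment_ids` are also 0, `input_mask` is the only reliable way to tell real segment-0 tokens from padding.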
