SCUER committed
Commit b030b5e · verified
1 Parent(s): 0dbd08d

Upload tokenizer

Files changed (3)
  1. special_tokens_map.json +34 -0
  2. tokenizer.json +2428 -0
  3. tokenizer_config.json +75 -0
special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "additional_special_tokens": [
+     "<extra_id_0>",
+     "<extra_id_1>"
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
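Reviewer note: the map above mixes two shapes. The named roles (bos, eos, pad, unk) are objects with a "content" field, while "additional_special_tokens" is a plain list of strings. The sketch below is only an illustration of how such a file could be flattened for a quick sanity check. It uses the Python standard library only, embeds a copy of the JSON from this diff, and the helper name `special_token_strings` is hypothetical, not part of any library.

```python
import json

# Inline copy of the special_tokens_map.json content added in this commit.
SPECIAL_TOKENS_MAP = """
{
  "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>"],
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "eos_token": {"content": "</s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "<pad>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
"""


def special_token_strings(raw_json):
    """Hypothetical helper: map each role to its literal token string(s)."""
    data = json.loads(raw_json)
    result = {}
    for role, value in data.items():
        if isinstance(value, dict):
            # Named roles (bos/eos/pad/unk) keep the literal under "content".
            result[role] = value["content"]
        else:
            # "additional_special_tokens" is a plain list of strings here.
            result[role] = list(value)
    return result


tokens = special_token_strings(SPECIAL_TOKENS_MAP)
for role, value in tokens.items():
    print(role, value)
```

A check like this makes it easy to confirm that every role resolves to the literal string expected elsewhere in the tokenizer files, for example that "eos_token" is "</s>".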
tokenizer.json ADDED
@@ -0,0 +1,2428 @@
+ {
+   "version": "1.0",
+   "truncation": {
+     "direction": "Right",
+     "max_length": 256,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
+   "padding": {
+     "strategy": "BatchLongest",
+     "direction": "Right",
+     "pad_to_multiple_of": null,
+     "pad_id": 0,
+     "pad_type_id": 0,
+     "pad_token": "<pad>"
+   },
+   "added_tokens": [
+     {
+       "id": 0,
+       "content": "<pad>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 1,
+       "content": "<unk>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 2,
+       "content": "<s>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 3,
+       "content": "</s>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 512,
+       "content": "<extra_id_0>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 513,
+       "content": "<extra_id_1>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     }
+   ],
+   "normalizer": {
+     "type": "Sequence",
+     "normalizers": [
+       {
+         "type": "Replace",
+         "pattern": {
+           "String": "\\s+"
+         },
+         "content": ""
+       },
+       {
+         "type": "NFKC"
+       }
+     ]
+   },
+   "pre_tokenizer": {
+     "type": "Sequence",
+     "pretokenizers": [
+       {
+         "type": "Split",
+         "pattern": {
+ "String": "(N\\(\\[c\\]1ccccc1\\)\\(c1ccccc1\\)c1ccccc1|c1ccccc1C\\(=O\\)OOC\\(=O\\)c1ccccc1|CC\\(C\\)\\(CN\\)N=N\\[C\\]\\(CN\\)C\\(C\\)C|c1cc\\(N\\(\\[O\\]\\)c2ccccc2\\)cc1|N\\#CC\\(C\\)\\(C\\)N=NC\\(C\\)\\(C\\)C\\#N|CC\\(C\\)\\(C\\)N\\(\\[O\\]\\)C\\(C\\)\\(C\\)C|OC\\(=O\\)C\\(=S\\)SCCCC\\[CH2\\]|c1cc\\(\\[C\\]\\)c2ccccc2c1|c1c\\(\\[C\\]\\)c2ccccc2cc1|C\\[N\\]1N=CN\\(C\\)C1=\\[N\\]|CC\\(C\\)\\(C\\)OOC\\(C\\)\\(C\\)C|c1ccc\\(\\[C\\]\\(CH3\\)\\)cc1|N1C\\(=O\\)CCC1\\(=O\\)Br|c1ccccc1C\\(=O\\)O\\[C\\]|\\[Se\\]CC\\(c1ccccc1\\)|COC\\(=O\\)OOC\\(=O\\)OC|c1ccc\\(C\\[CH2\\]\\)cc1|BrC\\(=O\\)OC\\(C\\)\\(C\\)C|\\[Si\\]\\(c1ccccc1\\)3|c1ccc2\\[nH\\]ccc12|c1ccc2\\[nH\\]cnc12|c1cccc2ncccc12|\\[P\\]\\(c1ccccc1\\)2|C1N\\(C\\)C\\(=O\\)CC1|\\[CH2\\]c1ccccc1|c1cc\\(\\[C\\]\\)ccc1|c1ccc2cnccc12|c1ccc2ccccc12|\\[CH2\\]Si\\(CH3\\)3|CC=CC=CC\\[CH2\\]|c1ccc\\(\\[C\\]\\)cc1|c1c\\(\\[C\\]\\)cccc1|\\[CH2\\]CC=CC=CC|BrCCCCC\\[CH2\\]|\\[CH2\\]P\\(CH3\\)2|C1C2CCCCC1C2|C1CCCC\\[CH\\]C1|\\[CH2\\]C\\(=O\\)OC|c1ccc2sccc12|ClCCCCC\\[CH2\\]|\\[CH2\\]N\\(CH3\\)2|c1cn2ccnc2c1|c1ccc2occc12|\\[O\\]OC\\(C\\)\\(C\\)C|c1ccccc1\\[N\\]|\\[SiH2\\]C\\(C\\)C|\\[CH2\\]CCC=CC|S\\(=O\\)\\(=O\\)Cl|CC=CCC\\[CH2\\]|\\[c1ccccc1\\+\\]|\\[CH2\\]C\\#CC\\#C|CC\\(=O\\)\\[CH2\\]|N=\\[N\\+\\]=\\[N\\-\\]|C\\(=O\\)\\[C\\]\\(C\\)|c1ccccc1\\[O\\]|\\[CH2\\]C\\(=O\\)C|\\[CH2\\]CCCC\\#C|\\[C\\]\\(C\\)\\(C\\)CN|C\\(=O\\)O\\[CH2\\]|C\\(C\\)\\(C\\)O\\[C\\]|C1CCC\\[CH\\]C1|C\\#CCCC\\[CH2\\]|c1ccccc1SeC|\\[CH2\\]C\\(=O\\)O|c1ccccc1\\[S\\]|\\[C\\]\\(=O\\)\\[O\\-\\]|c1cnn\\[nH\\]1|C1C2CCC1C2|C1CC\\[CH\\]C1|c1c\\[c\\]ccc1|c1cn\\[nH\\]c1|c1cc\\[nH\\]c1|\\[CH2=CH2\\+\\]|c1cc\\[se\\]c1|\\[CH\\]\\(CH3\\)2|\\[Si\\]\\(CH3\\)3|\\[c\\]1ccccc1|OSi\\(C\\)\\(C\\)C|C\\(=O\\)N\\(C\\)C|\\[CH2\\]C=C=C|c1ccccc1C|C1C2CC1C2|CSC\\(=S\\)OC|\\[B\\]\\(CH3\\)2|\\[CH2\\]OCH3|S\\(=O\\)\\(=O\\)|\\[P\\]\\(CH3\\)2|\\[CH2\\]SiH3|C\\(=O\\)\\[C\\]H|\\[PH\\]C\\(C\\)C|\\[CH2\\]SCH3|\\[C\\]\\(CH3\\)3|c1nccnc1|C=C\\[CH2\\]|\\[SiH2\\]CC|\\[CH2\\]PH2|C1CCOCC1|C1CCNCC1|C1CCSC
C1|C\\(=S\\)\\[S\\]|\\[O\\]C\\(C\\)C|C1COCCO1|\\[CH2\\]C\\#C|c1cncnc1|\\[CH2\\]C=C|CN\\(\\[O\\]\\)C|\\[CH2\\]NO2|CC\\(=O\\)OC|c1ccncc1|C\\#C\\[CH2\\]|C=C=\\[CH\\]|CC\\[SiH2\\]|C1C\\[CH\\]1|\\[CH2\\]CH3|\\[CH2\\]CF3|\\[CH2\\]CCC|c1ccccc1|Si\\(CH3\\)3|c1ncncn1|\\[S\\]C\\(C\\)C|C1CCCCC1|C\\(=O\\)\\[O\\]|\\[AlH4\\-\\]|N\\(CH3\\)2|CC\\[CH2\\]|\\[BH\\]CH3|C1SCCN1|C1CCSC1|c1ccoc1|C\\(=O\\)SC|C1CCNC1|C1OCCN1|C1CCOC1|C1NCCN1|c1ccns1|C\\(=O\\)OC|C1SCCS1|C1OCOC1|c1ccno1|P\\(CH3\\)2|C\\(=O\\)Br|C\\(=O\\)Cl|CC\\(=O\\)O|c1ccsc1|\\[CH2\\]CN|c1cocn1|CS\\(=O\\)C|C\\[SiH2\\]|C\\(=O\\)SH|\\[PH\\]CC|P\\(=O\\)N|\\[81Br\\]|C1COC1|\\[BH4\\-\\]|\\[CH3\\-\\]|\\[35Cl\\]|\\[CH3\\+\\]|c1csn1|\\[C\\]Br3|C=CC\\#C|C\\(\\[O\\]\\)|\\[SiH2\\]|C\\(\\[C\\]\\)|\\[SiH4\\]|C=\\[CH\\]|\\[AlH2\\]|\\[79Br\\]|C1CNC1|\\[Fe\\+3\\]|\\[SnH3\\]|\\[Fe\\+2\\]|PO\\(C\\)C|CBrCl3|S\\(=O\\)N|C\\#CC\\#C|C\\#CC=C|PO\\(O\\)C|\\[Pt\\+2\\]|C\\(=N\\)N|CC\\[PH\\]|\\[C\\]Cl3|\\[CH4\\+\\]|\\[O\\]N=O|P\\(=O\\)O|\\[O\\]\\[O\\]|C\\[CH2\\]|C\\(=N\\)O|C1CSC1|\\[37Cl\\]|C\\(=O\\)I|\\[GeH3\\]|\\[Cu\\+1\\]|C\\(=S\\)N|C\\(=O\\)H|C\\(=O\\)N|\\[Pd\\+2\\]|C\\(\\[N\\]\\)|\\[SiH3\\]|C\\(=Si\\)|C\\(=O\\)O|CCCCCC|\\[GaH2\\]|\\[Cu\\+2\\]|C=CC=C|CHCl3|\\[12C\\]|\\[CH3\\]|CCOCC|\\[36S\\]|\\[16O\\]|C=C=C|\\[\\-\\-\\-\\]|C1NC1|NHCH3|C\\[PH\\]|\\[14C\\]|C1SC1|C\\(Cl\\)|\\[BH2\\]|C\\(=P\\)|\\[CH4\\]|\\[NH3\\]|\\[O\\]CC|CC\\[N\\]|\\[SeH\\]|SO2Cl|ClCCl|\\[CH2\\]|C\\(Br\\)|\\[PH2\\]|Si\\-Si|BrCBr|\\[S\\]CC|\\[NH4\\]|\\[13C\\]|\\[31P\\]|CC\\[O\\]|\\[PH3\\]|\\[O2\\-\\]|\\[NH2\\]|C\\(=O\\)|\\[15N\\]|PO3H2|\\[C@@\\]|N=C=S|N=C=O|\\[32S\\]|\\[S@@\\]|\\[34S\\]|\\[32P\\]|\\[33S\\]|CC\\[S\\]|C\\#\\[C\\]|C1CC1|\\[SiH\\]|C1OC1|\\[CH\\]N|\\[14N\\]|\\[17O\\]|\\[SH2\\]|\\[18O\\]|\\[P@@\\]|C\\(Si\\)|\\[OH2\\]|C\\(=S\\)|\\[N@@\\]|\\[35S\\]|Si=Si|\\[\\+\\+\\+\\]|C\\(S\\)|\\[C@\\]|C\\(B\\)|OCH3|\\[2H\\]|\\[Si\\]|CCl4|\\[N\\+\\]|CCCl|\\[PH\\]|\\[\\+1\\]|SCH3|CCCC|\\[\\-1\\]|\\[SH\\]|\\(CO\\)|Si\\-H|C\\(I\\)|CCF3|\\[3H\\]|C\\(P\\)|\\[\\-\\-\\]|\\[\\+\\+\\]|\\[NH\\]|\\[O\\-
\\]|\\[P\\+\\]|C\\[S\\]|SO3H|\\[CH\\]|\\[oH\\]|SO2N|ICHI|\\[Se\\]|\\[S\\-\\]|CCl3|C\\-Si|\\[P@\\]|C\\[O\\]|CC\\#N|\\[OH\\]|C\\[N\\]|C\\(F\\)|\\[nH\\]|\\[S@\\]|\\[\\-2\\]|\\[N@\\]|C\\(O\\)|\\[cH\\]|C\\(C\\)|\\[O\\]O|C\\(N\\)|\\[C\\]N|\\[BH\\]|\\[\\+2\\]|CBr3|CCBr|\\[1H\\]|P\\-P|N\\-H|N\\#N|B\\-H|C\\#N|CCN|CCI|C\\-P|C\\-C|CCl|\\[p\\]|S\\-S|\\[\\+\\]|\\[n\\]|C\\#C|C=S|\\[C\\]|N=N|P\\-H|CCF|FCF|\\[S\\]|CI3|CF3|CCO|\\[\\*\\]|CCC|CSi|CCS|\\[c\\]|C=C|\\[D\\]|\\[T\\]|O\\-O|\\[\\-\\]|C\\-X|\\[s\\]|\\[B\\]|C\\-O|\\[P\\]|N\\-N|CBr|\\[R\\]|\\[X\\]|P=O|S\\-H|O\\-H|C=O|N\\#C|C\\-S|C=N|C\\-N|NO2|\\[o\\]|N=O|C\\-H|\\[N\\]|P=P|S=O|C\\-B|\\[O\\]|Cu|Be|Br|Mg|Pt|CO|Cd|Sn|Sr|Al|Co|CC|Li|Au|Rb|CN|Ca|Ge|In|Zr|CP|Fe|Hg|CS|Cl|CF|Cs|Te|Pd|Ba|Mn|Bi|Zn|Sb|Pb|As|Se|Na|Ag|Cr|Si|Ni|CI|Ga|Ti|F|O|H|C|K|P|V|S|\\*|B|N|I)"
+         },
+         "behavior": "Isolated",
+         "invert": false
+       },
+       {
+         "type": "Split",
+         "pattern": {
+           "String": "(\\[.*?\\])"
+         },
+         "behavior": "Isolated",
+         "invert": false
+       },
+       {
+         "type": "Split",
+         "pattern": {
+           "String": "(Cl|Br)"
+         },
+         "behavior": "Isolated",
+         "invert": false
+       },
+       {
+         "type": "Split",
+         "pattern": {
+           "String": "(\\(|\\)|=|#|-|\\+|\\.)"
+         },
+         "behavior": "Isolated",
+         "invert": false
+       },
+       {
+         "type": "WhitespaceSplit"
+       }
+     ]
+   },
+   "post_processor": {
+     "type": "TemplateProcessing",
+     "single": [
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "</s>",
+           "type_id": 0
+         }
+       }
+     ],
+     "pair": [
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "</s>",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "B",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "</s>",
+           "type_id": 0
+         }
+       }
+     ],
+     "special_tokens": {
+       "</s>": {
+         "id": "</s>",
+         "ids": [
+           3
+         ],
+         "tokens": [
+           "</s>"
+         ]
+       }
+     }
+   },
+   "decoder": {
+     "type": "BPEDecoder",
+     "suffix": "</w>"
+   },
+   "model": {
+     "type": "BPE",
+     "dropout": null,
+     "unk_token": "<unk>",
+     "continuing_subword_prefix": null,
+     "end_of_word_suffix": null,
+     "fuse_unk": false,
+     "byte_fallback": false,
+     "ignore_merges": false,
+     "vocab": {
+       "<pad>": 0,
+       "<unk>": 1,
+       "<s>": 2,
+       "</s>": 3,
+       "#": 4,
+       "%": 5,
+       "(": 6,
+       ")": 7,
+       "*": 8,
+       "+": 9,
+       "-": 10,
+       ".": 11,
+       "/": 12,
+       "0": 13,
+       "1": 14,
+       "2": 15,
+       "3": 16,
+       "4": 17,
+       "5": 18,
+       "6": 19,
+       "7": 20,
+       "8": 21,
+       "9": 22,
+       ":": 23,
+       "=": 24,
+       "@": 25,
+       "A": 26,
+       "B": 27,
+       "C": 28,
+       "D": 29,
+       "E": 30,
+       "F": 31,
+       "G": 32,
+       "H": 33,
+       "I": 34,
+       "J": 35,
+       "K": 36,
+       "L": 37,
+       "M": 38,
+       "N": 39,
+       "O": 40,
+       "P": 41,
+       "Q": 42,
+       "R": 43,
+       "S": 44,
+       "T": 45,
+       "U": 46,
+       "V": 47,
+       "W": 48,
+       "X": 49,
+       "Y": 50,
+       "Z": 51,
+       "[": 52,
+       "\\": 53,
+       "]": 54,
+       "a": 55,
+       "b": 56,
+       "c": 57,
+       "d": 58,
+       "e": 59,
+       "f": 60,
+       "g": 61,
+       "h": 62,
+       "i": 63,
+       "j": 64,
+       "k": 65,
+       "l": 66,
+       "m": 67,
+       "n": 68,
+       "o": 69,
+       "p": 70,
+       "q": 71,
+       "r": 72,
+       "s": 73,
+       "t": 74,
+       "u": 75,
+       "v": 76,
+       "w": 77,
+       "x": 78,
+       "y": 79,
+       "z": 80,
+       "{": 81,
+       "}": 82,
+       "CC": 83,
+       "C(": 84,
+       "O)": 85,
+       "C(C": 86,
+       "=O)": 87,
+       "C(C)": 88,
+       "CCC": 89,
+       "CO": 90,
+       "C(=O)": 91,
+       "C=": 92,
+       "cc": 93,
+       "CCCC": 94,
+       "C=C": 95,
+       "[C": 96,
+       "[CH": 97,
+       "[O": 98,
+       "-]": 99,
+       "[O-]": 100,
+       "[N": 101,
+       "+]": 102,
+       "[N+]": 103,
+       "[CH]": 104,
+       "C(=O)O": 105,
+       "C(O)": 106,
+       "c1": 107,
+       "CCO": 108,
+       "Cl": 109,
+       "(=O)": 110,
+       "(C": 111,
+       "N=": 112,
+       "N=O": 113,
+       "2]": 114,
+       "[CH2]": 115,
+       "CC(=O)": 116,
+       "Br": 117,
+       "CC(=O)O": 118,
+       "cccc": 119,
+       "c1cccc": 120,
+       "C=O": 121,
+       "c1ccccc1": 122,
+       "(CO)": 123,
+       "C1": 124,
+       "C(=O)OC": 125,
+       "l)": 126,
+       "C(Cl)": 127,
+       "CCCCCC": 128,
+       "CCl": 129,
+       "(C)": 130,
+       "OO": 131,
+       "C#": 132,
+       "CC(=O)OC": 133,
+       "C=CC": 134,
+       "OC1": 135,
+       "C1OC1": 136,
+       "CN": 137,
+       "=C": 138,
+       "CS": 139,
+       "C#N": 140,
+       "C=CC=C": 141,
+       "C1CC": 142,
+       "(=O)[O-]": 143,
+       "[N+](=O)[O-]": 144,
+       "CCC1": 145,
+       "CBr": 146,
+       "F)": 147,
+       "CCN": 148,
+       "CC(": 149,
+       "Br)": 150,
+       "CC(C)": 151,
+       "C(F)": 152,
+       "C(Br)": 153,
+       "CC1": 154,
+       "C1CCCCC1": 155,
+       "CCCl": 156,
+       "O[N+](=O)[O-]": 157,
+       "CCS": 158,
+       "Si": 159,
+       "C#C": 160,
+       "[CH2]C=C": 161,
+       "S(=O)": 162,
+       "S(=O)(=O)": 163,
+       "C2": 164,
+       "C(=O)N": 165,
+       "Sn": 166,
+       "[CH2]CCC": 167,
+       "C(=": 168,
+       "CCOCC": 169,
+       "N=N": 170,
+       "c1ccccc1C": 171,
+       "C(C)(C)": 172,
+       "O=": 173,
+       "OO)": 174,
+       "#N": 175,
+       "(O)": 176,
+       "[Si": 177,
+       "C1CC1": 178,
+       "S)": 179,
+       "C(=S)": 180,
+       "[Si]": 181,
+       "c2": 182,
+       "O[N+](=O)[O-])": 183,
+       "Cl)": 184,
+       "CC#N": 185,
+       "Na": 186,
+       "#C": 187,
+       "H]": 188,
+       "N#C": 189,
+       "CCBr": 190,
+       "=O": 191,
+       "CC(O)": 192,
+       "OOO": 193,
+       "CO[N+](=O)[O-]": 194,
+       "N)": 195,
+       "COO": 196,
+       "CC(C)(C)": 197,
+       "(OO)": 198,
+       "[C@": 199,
+       "(C)C": 200,
+       "=CC": 201,
+       "CC)": 202,
+       "C[CH]": 203,
+       "c1cc": 204,
+       "C2O": 205,
+       "(OO": 206,
+       "CCC(C)": 207,
+       "C=C(C)": 208,
+       "OON=O": 209,
+       "C(CO)": 210,
+       "[O-])": 211,
+       "(Cl)": 212,
+       "/C=C": 213,
+       "CCCC1": 214,
+       "C)": 215,
+       "([O-])": 216,
+       "N=O)": 217,
+       "CC(Cl)": 218,
+       "=C(C)": 219,
+       "CC(C": 220,
+       "COOO": 221,
+       "OOO[N+](=O)[O-]": 222,
+       "ccccc2": 223,
+       "1)": 224,
+       "C(OO": 225,
+       "c2ccccc2": 226,
+       "[N+]([O-])": 227,
+       "c1ccccc1)": 228,
+       "N)N": 229,
+       "[C@]": 230,
+       "[S": 231,
+       "C=N": 232,
+       "CCCCCCCC": 233,
+       "C(=N)N": 234,
+       "O=[N+]([O-])": 235,
+       "[CH]N": 236,
+       "CO[N+](=O)[O-])": 237,
+       "CC(CO)": 238,
+       "C=C1": 239,
+       "COCCO": 240,
+       "CC1(C)": 241,
+       "C(C)(C)C": 242,
+       "C(N)": 243,
+       "C=CC(C)": 244,
+       "Mg": 245,
+       "C(C)=O": 246,
+       "[SH]": 247,
+       "COCCO1": 248,
+       "c(": 249,
+       "C=C=C": 250,
+       "C1COCCO1": 251,
+       "CCC(=O)": 252,
+       "cc1": 253,
+       "/C=C\\": 254,
+       "COC(=O)": 255,
+       "C(=O)OO": 256,
+       "1)C2O": 257,
+       "CC12": 258,
+       "I)": 259,
+       "C1O": 260,
+       "(CC)": 261,
+       "C1C2": 262,
+       "CC(C)(O)": 263,
+       "(CO[N+](=O)[O-])": 264,
+       "C(C)O": 265,
+       "CCC(": 266,
+       "C(I)": 267,
+       "COO)": 268,
+       "N1": 269,
+       "[CH2]C(=O)O": 270,
+       "CI": 271,
+       "C(C=O)": 272,
+       "CC(C=O)": 273,
+       "(O[N+](=O)[O-])": 274,
+       "Cc1cc": 275,
+       "OC": 276,
+       "c(O)": 277,
+       "C1(C)": 278,
+       "C=CC1": 279,
+       "C(=O)N(C)C": 280,
+       "COON=O": 281,
+       "COOO[N+](=O)[O-]": 282,
+       "CCC2": 283,
+       "CCCC(C)": 284,
+       "O=N": 285,
+       "C(=O)OC)": 286,
+       "(Cl)Cl": 287,
+       "=CC1": 288,
+       "c3": 289,
+       "CCOO": 290,
+       "C(=C": 291,
+       "C(C)C": 292,
+       "CCC12": 293,
+       "CC1=": 294,
+       "CC1(C)C2": 295,
+       "CCC(O)": 296,
+       "=C1": 297,
+       "CC[CH]": 298,
+       "C(C)(OO": 299,
+       "c1ccccc1)c1ccccc1": 300,
+       "O=NOO": 301,
+       "(F)": 302,
+       "C/": 303,
+       "CC2": 304,
+       "C=CC2": 305,
+       "CC1=CC": 306,
+       "CCCCC1": 307,
+       "C(F)(F)": 308,
+       "C(CO)OO": 309,
+       "C(OO1)C2O": 310,
+       "c2ccccc2)": 311,
+       "(Br)": 312,
+       "Cc1ccccc1": 313,
+       "CCC(C": 314,
+       "CCOC(=O)": 315,
+       "[CH2]C(=O)": 316,
+       "/C=C/": 317,
+       "O=[N+]([O-])O": 318,
+       "CCCCC": 319,
+       "CCCCO": 320,
+       "OC)": 321,
+       "[n": 322,
+       "CC=": 323,
+       "[CH2]c1ccccc1": 324,
+       "#N)": 325,
+       "[nH]": 326,
+       "[CH]O": 327,
+       "[CH]O)": 328,
+       "[CH2]C(=O)OC": 329,
+       "CC(C)O": 330,
+       "C(C)(C)O": 331,
+       "C(OO)": 332,
+       "C(CC)": 333,
+       "CC(C)(C)O": 334,
+       "(OON=O)": 335,
+       "[CH2]C(=O)C": 336,
+       "2)": 337,
+       "C(C)(O)": 338,
+       "CCOOO": 339,
+       "O=[N+]([O-])OOO": 340,
+       "c(C)": 341,
+       "[CH]1": 342,
+       "C(O)CO": 343,
+       "CC(O)(": 344,
+       "(CC": 345,
+       "C(C)O[N+](=O)[O-]": 346,
+       "C(=O)OCC": 347,
+       "CC(=": 348,
+       "CC(OO": 349,
+       "[C@H]": 350,
+       "C/C(=C": 351,
+       "(O": 352,
+       "CO)": 353,
+       "(C=O)": 354,
+       "OC1(C)": 355,
+       "CCCC(=O)": 356,
+       "C2(C)": 357,
+       "C(C)(C)C)": 358,
+       "O=C1": 359,
+       "c2ccccc21": 360,
+       "OC(C)=O": 361,
+       "C(Cl)(Cl)": 362,
+       "C12": 363,
+       "C(Cl)Cl": 364,
+       "C1CC2": 365,
+       "CC(CO[N+](=O)[O-])": 366,
+       "CC1=C(C)": 367,
+       "C=C(C)C1CC": 368,
+       "C(C=O)OO": 369,
+       "C(O[N+](=O)[O-])": 370,
+       "CCC=C(C)": 371,
+       "C=C(": 372,
+       "CCCC)": 373,
+       "CCCC2": 374,
+       "C=C[CH]": 375,
+       "C(O)COO": 376,
+       "OOC1": 377,
+       "C=CC=CC1": 378,
+       "CC(OO)": 379,
+       "C(CO[N+](=O)[O-])": 380,
+       "CO1": 381,
+       "C=CCC1": 382,
+       "OOC(C)": 383,
+       "C(COO)": 384,
+       "C(O)C(C)": 385,
+       "CCOC1": 386,
+       "(C#N)": 387,
+       "ccccc3": 388,
+       "C1=": 389,
+       "C#CCCC": 390,
+       "C=CC(O)": 391,
+       "CC(C)=": 392,
+       "[Si](C)": 393,
+       "CC(Cl)CC(Cl)": 394,
+       "CC(CO)(OO)": 395,
+       "N1CCCC1": 396,
+       "CCC2C1CC2": 397,
+       "c3ccccc3": 398,
+       "CCC2C1CC2(C)C": 399,
+       "Cc1cccc": 400,
+       "[Sn": 401,
+       "CCC(C)(C)": 402,
+       "C(C)=O)": 403,
+       "C(C)OO": 404,
+       "CC(CC(": 405,
+       "CC(C)(C#N)": 406,
+       "C2(C)C": 407,
+       "(OOO)": 408,
+       "(OOO[N+](=O)[O-])": 409,
+       "C(CO)O[N+](=O)[O-]": 410,
+       "c(OC)": 411,
+       "SC": 412,
+       "CCC(OO": 413,
+       "CCCC(O)": 414,
+       "OOC1(C)": 415,
+       "CC(O[N+](=O)[O-])": 416,
+       "CC(C)(C)OO": 417,
+       "CC(C[CH]": 418,
+       "CC1=CCC(": 419,
+       "(CCCC)": 420,
+       "C(O)C=O": 421,
+       "CC(=C": 422,
+       "CCCCCCCCCCCC": 423,
+       "[Sn]": 424,
+       "OOC1(C)C2O": 425,
+       "(C(=O)O": 426,
+       "(C(=O)OC)": 427,
+       "3]": 428,
+       "N#N": 429,
+       "OCCO": 430,
+       "S=O": 431,
+       "ncc1": 432,
+       "CCI": 433,
+       "C(C)OOO": 434,
+       "C(=O)SC": 435,
+       "[CH3]": 436,
+       "[OH]": 437,
+       "ClCCl": 438,
+       "[CH2]C(C)": 439,
+       "C1CCOC1": 440,
+       "C1OC1(C)": 441,
+       "CC(=CC": 442,
+       "[SiH]": 443,
+       "COON=O)": 444,
+       "c1ccncc1": 445,
+       "[S-]": 446,
+       "CC(C=O)(OO)": 447,
+       "C/C(=C/": 448,
+       "C#CCCC[CH2]": 449,
+       "CC(C)(C)OOC(C)(C)C": 450,
+       "CF": 451,
+       "C[CH2]": 452,
+       "CC[CH2]": 453,
+       "C(O": 454,
+       "O)OO": 455,
+       "C(C)OON=O": 456,
+       "COCO": 457,
+       "COC(=O)OO": 458,
+       "[Na": 459,
+       "ClC1": 460,
+       "C1OC1C=O": 461,
+       "CSC(=S)": 462,
+       "CC(CC": 463,
+       "CC1(": 464,
+       "CC1C2(C)C": 465,
+       "[CH2]CCC=CC": 466,
+       "C(C)(C)OO": 467,
+       "CC(O)(CO[N+](=O)[O-])": 468,
+       "(C)C)": 469,
+       "C(CO)OON=O": 470,
+       "c(O)c1": 471,
+       "C=CC2OOC1(C)C2O": 472,
+       "C(F)(F)C(F)": 473,
+       "COC(=O)OOC(=O)OC": 474,
+       "CSC(=S)OC": 475,
+       "(COO)": 476,
+       "@H]": 477,
+       "n1": 478,
+       "C(C)OOO[N+](=O)[O-]": 479,
+       "CCC=": 480,
+       "(C1": 481,
+       "C1=C": 482,
+       "C(Cl)(Cl)Cl": 483,
+       "CCCCCCCCC": 484,
+       "(C)cc": 485,
+       "[C@@H]": 486,
+       "c1ccc(": 487,
+       "CCC(O[N+](=O)[O-])": 488,
+       "CCOOO[N+](=O)[O-]": 489,
+       "/C=C(": 490,
+       "2(C)": 491,
+       "OCC": 492,
+       "[CH]CO": 493,
+       "CC(=O)C(O)": 494,
+       "CCC1=C(C)": 495,
+       "CC(C)=O": 496,
+       "C(C)(C)O[N+](=O)[O-]": 497,
+       "CCCCCCCC(=O)": 498,
+       "Cc1ccc(": 499,
+       "OOC1C2O": 500,
+       "OOC(C)(C1": 501,
+       "[Na]": 502,
+       "(CO": 503,
+       "(c1ccccc1)c1ccccc1": 504,
+       "C/C=C\\": 505,
+       "CCC(CO)": 506,
+       "C(COO": 507,
+       "C=CCC": 508,
+       "C=CCCC1": 509,
+       "C(O)OO": 510,
+       "CCO[N+](=O)[O-]": 511
+     },
709
+ "merges": [
710
+ [
711
+ "C",
712
+ "C"
713
+ ],
714
+ [
715
+ "C",
716
+ "("
717
+ ],
718
+ [
719
+ "O",
720
+ ")"
721
+ ],
722
+ [
723
+ "C(",
724
+ "C"
725
+ ],
726
+ [
727
+ "=",
728
+ "O)"
729
+ ],
730
+ [
731
+ "C(C",
732
+ ")"
733
+ ],
734
+ [
735
+ "CC",
736
+ "C"
737
+ ],
738
+ [
739
+ "C",
740
+ "O"
741
+ ],
742
+ [
743
+ "C(",
744
+ "=O)"
745
+ ],
746
+ [
747
+ "C",
748
+ "="
749
+ ],
750
+ [
751
+ "c",
752
+ "c"
753
+ ],
754
+ [
755
+ "CC",
756
+ "CC"
757
+ ],
758
+ [
759
+ "C=",
760
+ "C"
761
+ ],
762
+ [
763
+ "[",
764
+ "C"
765
+ ],
766
+ [
767
+ "[C",
768
+ "H"
769
+ ],
770
+ [
771
+ "[",
772
+ "O"
773
+ ],
774
+ [
775
+ "-",
776
+ "]"
777
+ ],
778
+ [
779
+ "[O",
780
+ "-]"
781
+ ],
782
+ [
783
+ "[",
784
+ "N"
785
+ ],
786
+ [
787
+ "+",
788
+ "]"
789
+ ],
790
+ [
791
+ "[N",
792
+ "+]"
793
+ ],
794
+ [
795
+ "[CH",
796
+ "]"
797
+ ],
798
+ [
799
+ "C(=O)",
800
+ "O"
801
+ ],
802
+ [
803
+ "C(",
804
+ "O)"
805
+ ],
806
+ [
807
+ "c",
808
+ "1"
809
+ ],
810
+ [
811
+ "CC",
812
+ "O"
813
+ ],
814
+ [
815
+ "C",
816
+ "l"
817
+ ],
818
+ [
819
+ "(",
820
+ "=O)"
821
+ ],
822
+ [
823
+ "(",
824
+ "C"
825
+ ],
826
+ [
827
+ "N",
828
+ "="
829
+ ],
830
+ [
831
+ "N=",
832
+ "O"
833
+ ],
834
+ [
835
+ "2",
836
+ "]"
837
+ ],
838
+ [
839
+ "[CH",
840
+ "2]"
841
+ ],
842
+ [
843
+ "CC",
844
+ "(=O)"
845
+ ],
846
+ [
847
+ "B",
848
+ "r"
849
+ ],
850
+ [
851
+ "CC(=O)",
852
+ "O"
853
+ ],
854
+ [
855
+ "cc",
856
+ "cc"
857
+ ],
858
+ [
859
+ "c1",
860
+ "cccc"
861
+ ],
862
+ [
863
+ "C=",
864
+ "O"
865
+ ],
866
+ [
867
+ "c1cccc",
868
+ "c1"
869
+ ],
870
+ [
871
+ "(C",
872
+ "O)"
873
+ ],
874
+ [
875
+ "C",
876
+ "1"
877
+ ],
878
+ [
879
+ "C(=O)O",
880
+ "C"
881
+ ],
882
+ [
883
+ "l",
884
+ ")"
885
+ ],
886
+ [
887
+ "C(C",
888
+ "l)"
889
+ ],
890
+ [
891
+ "CCCC",
892
+ "CC"
893
+ ],
894
+ [
895
+ "CC",
896
+ "l"
897
+ ],
898
+ [
899
+ "(C",
900
+ ")"
901
+ ],
902
+ [
903
+ "O",
904
+ "O"
905
+ ],
906
+ [
907
+ "C",
908
+ "#"
909
+ ],
910
+ [
911
+ "CC(=O)O",
912
+ "C"
913
+ ],
914
+ [
915
+ "C=",
916
+ "CC"
917
+ ],
918
+ [
919
+ "O",
920
+ "C1"
921
+ ],
922
+ [
923
+ "C1",
924
+ "OC1"
925
+ ],
926
+ [
927
+ "C",
928
+ "N"
929
+ ],
930
+ [
931
+ "=",
932
+ "C"
933
+ ],
934
+ [
935
+ "C",
936
+ "S"
937
+ ],
938
+ [
939
+ "C#",
940
+ "N"
941
+ ],
942
+ [
943
+ "C=CC",
944
+ "=C"
945
+ ],
946
+ [
947
+ "C1",
948
+ "CC"
949
+ ],
950
+ [
951
+ "(=O)",
952
+ "[O-]"
953
+ ],
954
+ [
955
+ "[N+]",
956
+ "(=O)[O-]"
957
+ ],
958
+ [
959
+ "CCC",
960
+ "1"
961
+ ],
962
+ [
963
+ "C",
964
+ "Br"
965
+ ],
966
+ [
967
+ "F",
968
+ ")"
969
+ ],
970
+ [
971
+ "CC",
972
+ "N"
973
+ ],
974
+ [
975
+ "CC",
976
+ "("
977
+ ],
978
+ [
979
+ "Br",
980
+ ")"
981
+ ],
982
+ [
983
+ "CC",
984
+ "(C)"
985
+ ],
986
+ [
987
+ "C(",
988
+ "F)"
989
+ ],
990
+ [
991
+ "C(",
992
+ "Br)"
993
+ ],
994
+ [
995
+ "CC",
996
+ "1"
997
+ ],
998
+ [
999
+ "C1CC",
1000
+ "CCC1"
1001
+ ],
1002
+ [
1003
+ "CCC",
1004
+ "l"
1005
+ ],
1006
+ [
1007
+ "O",
1008
+ "[N+](=O)[O-]"
1009
+ ],
1010
+ [
1011
+ "CC",
1012
+ "S"
1013
+ ],
1014
+ [
1015
+ "S",
1016
+ "i"
1017
+ ],
1018
+ [
1019
+ "C#",
1020
+ "C"
1021
+ ],
1022
+ [
1023
+ "[CH2]",
1024
+ "C=C"
1025
+ ],
1026
+ [
1027
+ "S",
1028
+ "(=O)"
1029
+ ],
1030
+ [
1031
+ "S(=O)",
1032
+ "(=O)"
1033
+ ],
1034
+ [
1035
+ "C",
1036
+ "2"
1037
+ ],
1038
+ [
1039
+ "C(=O)",
1040
+ "N"
1041
+ ],
1042
+ [
1043
+ "S",
1044
+ "n"
1045
+ ],
1046
+ [
1047
+ "[CH2]",
1048
+ "CCC"
1049
+ ],
1050
+ [
1051
+ "C(",
1052
+ "="
1053
+ ],
1054
+ [
1055
+ "CCO",
1056
+ "CC"
1057
+ ],
1058
+ [
1059
+ "N=",
1060
+ "N"
1061
+ ],
1062
+ [
1063
+ "c1ccccc1",
1064
+ "C"
1065
+ ],
1066
+ [
1067
+ "C(C)",
1068
+ "(C)"
1069
+ ],
1070
+ [
1071
+ "O",
1072
+ "="
1073
+ ],
1074
+ [
1075
+ "O",
1076
+ "O)"
1077
+ ],
1078
+ [
1079
+ "#",
1080
+ "N"
1081
+ ],
1082
+ [
1083
+ "(",
1084
+ "O)"
1085
+ ],
1086
+ [
1087
+ "[",
1088
+ "Si"
1089
+ ],
1090
+ [
1091
+ "C1CC",
1092
+ "1"
1093
+ ],
1094
+ [
1095
+ "S",
1096
+ ")"
1097
+ ],
1098
+ [
1099
+ "C(=",
1100
+ "S)"
1101
+ ],
1102
+ [
1103
+ "[Si",
1104
+ "]"
1105
+ ],
1106
+ [
1107
+ "c",
1108
+ "2"
1109
+ ],
1110
+ [
1111
+ "O[N+](=O)[O-]",
1112
+ ")"
1113
+ ],
1114
+ [
1115
+ "Cl",
1116
+ ")"
1117
+ ],
1118
+ [
1119
+ "CC",
1120
+ "#N"
1121
+ ],
1122
+ [
1123
+ "N",
1124
+ "a"
1125
+ ],
1126
+ [
1127
+ "#",
1128
+ "C"
1129
+ ],
1130
+ [
1131
+ "H",
1132
+ "]"
1133
+ ],
1134
+ [
1135
+ "N",
1136
+ "#C"
1137
+ ],
1138
+ [
1139
+ "CC",
1140
+ "Br"
1141
+ ],
1142
+ [
1143
+ "=",
1144
+ "O"
1145
+ ],
1146
+ [
1147
+ "CC(",
1148
+ "O)"
1149
+ ],
1150
+ [
1151
+ "OO",
1152
+ "O"
1153
+ ],
1154
+ [
1155
+ "CO",
1156
+ "[N+](=O)[O-]"
1157
+ ],
1158
+ [
1159
+ "N",
1160
+ ")"
1161
+ ],
1162
+ [
1163
+ "CO",
1164
+ "O"
1165
+ ],
1166
+ [
1167
+ "CC(C)",
1168
+ "(C)"
1169
+ ],
1170
+ [
1171
+ "(",
1172
+ "OO)"
1173
+ ],
1174
+ [
1175
+ "[C",
1176
+ "@"
1177
+ ],
1178
+ [
1179
+ "(C)",
1180
+ "C"
1181
+ ],
1182
+ [
1183
+ "=",
1184
+ "CC"
1185
+ ],
1186
+ [
1187
+ "CC",
1188
+ ")"
1189
+ ],
1190
+ [
1191
+ "C",
1192
+ "[CH]"
1193
+ ],
1194
+ [
1195
+ "c1",
1196
+ "cc"
1197
+ ],
1198
+ [
1199
+ "C2",
1200
+ "O"
1201
+ ],
1202
+ [
1203
+ "(",
1204
+ "OO"
1205
+ ],
1206
+ [
1207
+ "CC",
1208
+ "C(C)"
1209
+ ],
1210
+ [
1211
+ "C=",
1212
+ "C(C)"
1213
+ ],
1214
+ [
1215
+ "OO",
1216
+ "N=O"
1217
+ ],
1218
+ [
1219
+ "C(C",
1220
+ "O)"
1221
+ ],
1222
+ [
1223
+ "[O-]",
1224
+ ")"
1225
+ ],
1226
+ [
1227
+ "(",
1228
+ "Cl)"
1229
+ ],
1230
+ [
1231
+ "/",
1232
+ "C=C"
1233
+ ],
1234
+ [
1235
+ "CCCC",
1236
+ "1"
1237
+ ],
1238
+ [
1239
+ "C",
1240
+ ")"
1241
+ ],
1242
+ [
1243
+ "(",
1244
+ "[O-])"
1245
+ ],
1246
+ [
1247
+ "N",
1248
+ "=O)"
1249
+ ],
1250
+ [
1251
+ "CC(",
1252
+ "Cl)"
1253
+ ],
1254
+ [
1255
+ "=",
1256
+ "C(C)"
1257
+ ],
1258
+ [
1259
+ "CC",
1260
+ "(C"
1261
+ ],
1262
+ [
1263
+ "CO",
1264
+ "OO"
1265
+ ],
1266
+ [
1267
+ "OO",
1268
+ "O[N+](=O)[O-]"
1269
+ ],
1270
+ [
1271
+ "cccc",
1272
+ "c2"
1273
+ ],
1274
+ [
1275
+ "1",
1276
+ ")"
1277
+ ],
1278
+ [
1279
+ "C(",
1280
+ "OO"
1281
+ ],
1282
+ [
1283
+ "c2",
1284
+ "ccccc2"
1285
+ ],
1286
+ [
1287
+ "[N+]",
1288
+ "([O-])"
1289
+ ],
1290
+ [
1291
+ "c1ccccc1",
1292
+ ")"
1293
+ ],
1294
+ [
1295
+ "N)",
1296
+ "N"
1297
+ ],
1298
+ [
1299
+ "[C@",
1300
+ "]"
1301
+ ],
1302
+ [
1303
+ "[",
1304
+ "S"
1305
+ ],
1306
+ [
1307
+ "C=",
1308
+ "N"
1309
+ ],
1310
+ [
1311
+ "CCCC",
1312
+ "CCCC"
1313
+ ],
1314
+ [
1315
+ "C(=",
1316
+ "N)N"
1317
+ ],
1318
+ [
1319
+ "O=",
1320
+ "[N+]([O-])"
1321
+ ],
1322
+ [
1323
+ "[CH]",
1324
+ "N"
1325
+ ],
1326
+ [
1327
+ "CO[N+](=O)[O-]",
1328
+ ")"
1329
+ ],
1330
+ [
1331
+ "CC",
1332
+ "(CO)"
1333
+ ],
1334
+ [
1335
+ "C=C",
1336
+ "1"
1337
+ ],
1338
+ [
1339
+ "CO",
1340
+ "CCO"
1341
+ ],
1342
+ [
1343
+ "CC1",
1344
+ "(C)"
1345
+ ],
1346
+ [
1347
+ "C(C)(C)",
1348
+ "C"
1349
+ ],
1350
+ [
1351
+ "C(",
1352
+ "N)"
1353
+ ],
1354
+ [
1355
+ "C=CC",
1356
+ "(C)"
1357
+ ],
1358
+ [
1359
+ "M",
1360
+ "g"
1361
+ ],
1362
+ [
1363
+ "C(C)",
1364
+ "=O"
1365
+ ],
1366
+ [
1367
+ "[S",
1368
+ "H]"
1369
+ ],
1370
+ [
1371
+ "COCCO",
1372
+ "1"
1373
+ ],
1374
+ [
1375
+ "c",
1376
+ "("
1377
+ ],
1378
+ [
1379
+ "C=",
1380
+ "C=C"
1381
+ ],
1382
+ [
1383
+ "C1",
1384
+ "COCCO1"
1385
+ ],
1386
+ [
1387
+ "CC",
1388
+ "C(=O)"
1389
+ ],
1390
+ [
1391
+ "cc",
1392
+ "1"
1393
+ ],
1394
+ [
1395
+ "/C=C",
1396
+ "\\"
1397
+ ],
1398
+ [
1399
+ "CO",
1400
+ "C(=O)"
1401
+ ],
1402
+ [
1403
+ "C(=O)O",
1404
+ "O"
1405
+ ],
1406
+ [
1407
+ "1)",
1408
+ "C2O"
1409
+ ],
1410
+ [
1411
+ "CC1",
1412
+ "2"
1413
+ ],
1414
+ [
1415
+ "I",
1416
+ ")"
1417
+ ],
1418
+ [
1419
+ "C1",
1420
+ "O"
1421
+ ],
1422
+ [
1423
+ "(",
1424
+ "CC)"
1425
+ ],
1426
+ [
1427
+ "C1",
1428
+ "C2"
1429
+ ],
1430
+ [
1431
+ "CC(C)",
1432
+ "(O)"
1433
+ ],
1434
+ [
1435
+ "(",
1436
+ "CO[N+](=O)[O-])"
1437
+ ],
1438
+ [
1439
+ "C(C)",
1440
+ "O"
1441
+ ],
1442
+ [
1443
+ "CC",
1444
+ "C("
1445
+ ],
1446
+ [
1447
+ "C(",
1448
+ "I)"
1449
+ ],
1450
+ [
1451
+ "CO",
1452
+ "O)"
1453
+ ],
1454
+ [
1455
+ "N",
1456
+ "1"
1457
+ ],
1458
+ [
1459
+ "[CH2]",
1460
+ "C(=O)O"
1461
+ ],
1462
+ [
1463
+ "C",
1464
+ "I"
1465
+ ],
1466
+ [
1467
+ "C(C",
1468
+ "=O)"
1469
+ ],
1470
+ [
1471
+ "CC(C",
1472
+ "=O)"
1473
+ ],
1474
+ [
1475
+ "(",
1476
+ "O[N+](=O)[O-])"
1477
+ ],
1478
+ [
1479
+ "C",
1480
+ "c1cc"
1481
+ ],
1482
+ [
1483
+ "O",
1484
+ "C"
1485
+ ],
1486
+ [
1487
+ "c",
1488
+ "(O)"
1489
+ ],
1490
+ [
1491
+ "C1",
1492
+ "(C)"
1493
+ ],
1494
+ [
1495
+ "C=CC",
1496
+ "1"
1497
+ ],
1498
+ [
1499
+ "C(=O)N",
1500
+ "(C)C"
1501
+ ],
1502
+ [
1503
+ "COO",
1504
+ "N=O"
1505
+ ],
1506
+ [
1507
+ "COOO",
1508
+ "[N+](=O)[O-]"
1509
+ ],
1510
+ [
1511
+ "CCC",
1512
+ "2"
1513
+ ],
1514
+ [
1515
+ "CCCC",
1516
+ "(C)"
1517
+ ],
1518
+ [
1519
+ "O=",
1520
+ "N"
1521
+ ],
1522
+ [
1523
+ "C(=O)OC",
1524
+ ")"
1525
+ ],
1526
+ [
1527
+ "(Cl)",
1528
+ "Cl"
1529
+ ],
1530
+ [
1531
+ "=",
1532
+ "CC1"
1533
+ ],
1534
+ [
1535
+ "c",
1536
+ "3"
1537
+ ],
1538
+ [
1539
+ "CCO",
1540
+ "O"
1541
+ ],
1542
+ [
1543
+ "C(",
1544
+ "=C"
1545
+ ],
1546
+ [
1547
+ "C(C)",
1548
+ "C"
1549
+ ],
1550
+ [
1551
+ "CCC1",
1552
+ "2"
1553
+ ],
1554
+ [
1555
+ "CC1",
1556
+ "="
1557
+ ],
1558
+ [
1559
+ "CC1(C)",
1560
+ "C2"
1561
+ ],
1562
+ [
1563
+ "CC",
1564
+ "C(O)"
1565
+ ],
1566
+ [
1567
+ "=",
1568
+ "C1"
1569
+ ],
1570
+ [
1571
+ "CC",
1572
+ "[CH]"
1573
+ ],
1574
+ [
1575
+ "C(C)",
1576
+ "(OO"
1577
+ ],
1578
+ [
1579
+ "c1ccccc1)",
1580
+ "c1ccccc1"
1581
+ ],
1582
+ [
1583
+ "O=N",
1584
+ "OO"
1585
+ ],
1586
+ [
1587
+ "(",
1588
+ "F)"
1589
+ ],
1590
+ [
1591
+ "C",
1592
+ "/"
1593
+ ],
1594
+ [
1595
+ "CC",
1596
+ "2"
1597
+ ],
1598
+ [
1599
+ "C=CC",
1600
+ "2"
1601
+ ],
1602
+ [
1603
+ "CC1",
1604
+ "=CC"
1605
+ ],
1606
+ [
1607
+ "CC",
1608
+ "CCC1"
1609
+ ],
1610
+ [
1611
+ "C(F)",
1612
+ "(F)"
1613
+ ],
1614
+ [
1615
+ "C(CO)",
1616
+ "OO"
1617
+ ],
1618
+ [
1619
+ "C(OO",
1620
+ "1)C2O"
1621
+ ],
1622
+ [
1623
+ "c2ccccc2",
1624
+ ")"
1625
+ ],
1626
+ [
1627
+ "(",
1628
+ "Br)"
1629
+ ],
1630
+ [
1631
+ "C",
1632
+ "c1ccccc1"
1633
+ ],
1634
+ [
1635
+ "CC",
1636
+ "C(C"
1637
+ ],
1638
+ [
1639
+ "CCO",
1640
+ "C(=O)"
1641
+ ],
1642
+ [
1643
+ "[CH2]",
1644
+ "C(=O)"
1645
+ ],
1646
+ [
1647
+ "/C=C",
1648
+ "/"
1649
+ ],
1650
+ [
1651
+ "O=[N+]([O-])",
1652
+ "O"
1653
+ ],
1654
+ [
1655
+ "CC",
1656
+ "CCC"
1657
+ ],
1658
+ [
1659
+ "CCCC",
1660
+ "O"
1661
+ ],
1662
+ [
1663
+ "O",
1664
+ "C)"
1665
+ ],
1666
+ [
1667
+ "[",
1668
+ "n"
1669
+ ],
1670
+ [
1671
+ "CC",
1672
+ "="
1673
+ ],
1674
+ [
1675
+ "[CH2]",
1676
+ "c1ccccc1"
1677
+ ],
1678
+ [
1679
+ "#N",
1680
+ ")"
1681
+ ],
1682
+ [
1683
+ "[n",
1684
+ "H]"
1685
+ ],
1686
+ [
1687
+ "[CH]",
1688
+ "O"
1689
+ ],
1690
+ [
1691
+ "[CH]",
1692
+ "O)"
1693
+ ],
1694
+ [
1695
+ "[CH2]",
1696
+ "C(=O)OC"
1697
+ ],
1698
+ [
1699
+ "CC(C)",
1700
+ "O"
1701
+ ],
1702
+ [
1703
+ "C(C)(C)",
1704
+ "O"
1705
+ ],
1706
+ [
1707
+ "C(",
1708
+ "OO)"
1709
+ ],
1710
+ [
1711
+ "C(",
1712
+ "CC)"
1713
+ ],
1714
+ [
1715
+ "CC(C)(C)",
1716
+ "O"
1717
+ ],
1718
+ [
1719
+ "(OO",
1720
+ "N=O)"
1721
+ ],
1722
+ [
1723
+ "[CH2]C(=O)",
1724
+ "C"
1725
+ ],
1726
+ [
1727
+ "2",
1728
+ ")"
1729
+ ],
1730
+ [
1731
+ "C(C)",
1732
+ "(O)"
1733
+ ],
1734
+ [
1735
+ "CCO",
1736
+ "OO"
1737
+ ],
1738
+ [
1739
+ "O=[N+]([O-])",
1740
+ "OOO"
1741
+ ],
1742
+ [
1743
+ "c",
1744
+ "(C)"
1745
+ ],
1746
+ [
1747
+ "[CH]",
1748
+ "1"
1749
+ ],
1750
+ [
1751
+ "C(O)",
1752
+ "CO"
1753
+ ],
1754
+ [
1755
+ "CC(O)",
1756
+ "("
1757
+ ],
1758
+ [
1759
+ "(",
1760
+ "CC"
1761
+ ],
1762
+ [
1763
+ "C(C)",
1764
+ "O[N+](=O)[O-]"
1765
+ ],
1766
+ [
1767
+ "C(=O)O",
1768
+ "CC"
1769
+ ],
1770
+ [
1771
+ "CC(",
1772
+ "="
1773
+ ],
1774
+ [
1775
+ "CC(",
1776
+ "OO"
1777
+ ],
1778
+ [
1779
+ "[C@",
1780
+ "H]"
1781
+ ],
1782
+ [
1783
+ "C/",
1784
+ "C(=C"
1785
+ ],
1786
+ [
1787
+ "(",
1788
+ "O"
1789
+ ],
1790
+ [
1791
+ "C",
1792
+ "O)"
1793
+ ],
1794
+ [
1795
+ "(C",
1796
+ "=O)"
1797
+ ],
1798
+ [
1799
+ "OC1",
1800
+ "(C)"
1801
+ ],
1802
+ [
1803
+ "CCCC",
1804
+ "(=O)"
1805
+ ],
1806
+ [
1807
+ "C2",
1808
+ "(C)"
1809
+ ],
1810
+ [
1811
+ "C(C)(C)",
1812
+ "C)"
1813
+ ],
1814
+ [
1815
+ "O=",
1816
+ "C1"
1817
+ ],
1818
+ [
1819
+ "c2ccccc2",
1820
+ "1"
1821
+ ],
1822
+ [
1823
+ "O",
1824
+ "C(C)=O"
1825
+ ],
1826
+ [
1827
+ "C(Cl)",
1828
+ "(Cl)"
1829
+ ],
1830
+ [
1831
+ "C1",
1832
+ "2"
1833
+ ],
1834
+ [
1835
+ "C(Cl)",
1836
+ "Cl"
1837
+ ],
1838
+ [
1839
+ "C1CC",
1840
+ "2"
1841
+ ],
1842
+ [
1843
+ "CC(",
1844
+ "CO[N+](=O)[O-])"
1845
+ ],
1846
+ [
1847
+ "CC1",
1848
+ "=C(C)"
1849
+ ],
1850
+ [
1851
+ "C=C(C)",
1852
+ "C1CC"
1853
+ ],
1854
+ [
1855
+ "C(C=O)",
1856
+ "OO"
1857
+ ],
1858
+ [
1859
+ "C(",
1860
+ "O[N+](=O)[O-])"
1861
+ ],
1862
+ [
1863
+ "CCC",
1864
+ "=C(C)"
1865
+ ],
1866
+ [
1867
+ "C=",
1868
+ "C("
1869
+ ],
1870
+ [
1871
+ "CCCC",
1872
+ ")"
1873
+ ],
1874
+ [
1875
+ "CCCC",
1876
+ "2"
1877
+ ],
1878
+ [
1879
+ "C=C",
1880
+ "[CH]"
1881
+ ],
1882
+ [
1883
+ "C(O)",
1884
+ "COO"
1885
+ ],
1886
+ [
1887
+ "OO",
1888
+ "C1"
1889
+ ],
1890
+ [
1891
+ "C=CC",
1892
+ "=CC1"
1893
+ ],
1894
+ [
1895
+ "CC(",
1896
+ "OO)"
1897
+ ],
1898
+ [
1899
+ "C(C",
1900
+ "O[N+](=O)[O-])"
1901
+ ],
1902
+ [
1903
+ "CO",
1904
+ "1"
1905
+ ],
1906
+ [
1907
+ "C=",
1908
+ "CCC1"
1909
+ ],
1910
+ [
1911
+ "OO",
1912
+ "C(C)"
1913
+ ],
1914
+ [
1915
+ "C(C",
1916
+ "OO)"
1917
+ ],
1918
+ [
1919
+ "C(O)",
1920
+ "C(C)"
1921
+ ],
1922
+ [
1923
+ "CCO",
1924
+ "C1"
1925
+ ],
1926
+ [
1927
+ "(C",
1928
+ "#N)"
1929
+ ],
1930
+ [
1931
+ "cccc",
1932
+ "c3"
1933
+ ],
1934
+ [
1935
+ "C1",
1936
+ "="
1937
+ ],
1938
+ [
1939
+ "C#",
1940
+ "CCCC"
1941
+ ],
1942
+ [
1943
+ "C=CC",
1944
+ "(O)"
1945
+ ],
1946
+ [
1947
+ "CC(C)",
1948
+ "="
1949
+ ],
1950
+ [
1951
+ "[Si]",
1952
+ "(C)"
1953
+ ],
1954
+ [
1955
+ "CC(Cl)",
1956
+ "CC(Cl)"
1957
+ ],
1958
+ [
1959
+ "CC(CO)",
1960
+ "(OO)"
1961
+ ],
1962
+ [
1963
+ "N1",
1964
+ "CCCC1"
1965
+ ],
1966
+ [
1967
+ "CCC2",
1968
+ "C1CC2"
1969
+ ],
1970
+ [
1971
+ "c3",
1972
+ "ccccc3"
1973
+ ],
1974
+ [
1975
+ "CCC2C1CC2",
1976
+ "(C)C"
1977
+ ],
1978
+ [
1979
+ "C",
1980
+ "c1cccc"
1981
+ ],
1982
+ [
1983
+ "[",
1984
+ "Sn"
1985
+ ],
1986
+ [
1987
+ "CC",
1988
+ "C(C)(C)"
1989
+ ],
1990
+ [
1991
+ "C(C)",
1992
+ "=O)"
1993
+ ],
1994
+ [
1995
+ "C(C)",
1996
+ "OO"
1997
+ ],
1998
+ [
1999
+ "CC(",
2000
+ "CC("
2001
+ ],
2002
+ [
2003
+ "CC(C)",
2004
+ "(C#N)"
2005
+ ],
2006
+ [
2007
+ "C2",
2008
+ "(C)C"
2009
+ ],
2010
+ [
2011
+ "(OO",
2012
+ "O)"
2013
+ ],
2014
+ [
2015
+ "(OO",
2016
+ "O[N+](=O)[O-])"
2017
+ ],
2018
+ [
2019
+ "C(CO)",
2020
+ "O[N+](=O)[O-]"
2021
+ ],
2022
+ [
2023
+ "c(",
2024
+ "OC)"
2025
+ ],
2026
+ [
2027
+ "S",
2028
+ "C"
2029
+ ],
2030
+ [
2031
+ "CC",
2032
+ "C(OO"
2033
+ ],
2034
+ [
2035
+ "CCCC",
2036
+ "(O)"
2037
+ ],
2038
+ [
2039
+ "OO",
2040
+ "C1(C)"
2041
+ ],
2042
+ [
2043
+ "CC(",
2044
+ "O[N+](=O)[O-])"
2045
+ ],
2046
+ [
2047
+ "CC(C)(C)",
2048
+ "OO"
2049
+ ],
2050
+ [
2051
+ "CC(C",
2052
+ "[CH]"
2053
+ ],
2054
+ [
2055
+ "CC1=CC",
2056
+ "C("
2057
+ ],
2058
+ [
2059
+ "(",
2060
+ "CCCC)"
2061
+ ],
2062
+ [
2063
+ "C(O)",
2064
+ "C=O"
2065
+ ],
2066
+ [
2067
+ "CC(",
2068
+ "=C"
2069
+ ],
2070
+ [
2071
+ "CCCCCCCC",
2072
+ "CCCC"
2073
+ ],
2074
+ [
2075
+ "[Sn",
2076
+ "]"
2077
+ ],
2078
+ [
2079
+ "OOC1(C)",
2080
+ "C2O"
2081
+ ],
2082
+ [
2083
+ "(",
2084
+ "C(=O)O"
2085
+ ],
2086
+ [
2087
+ "(",
2088
+ "C(=O)OC)"
2089
+ ],
2090
+ [
2091
+ "3",
2092
+ "]"
2093
+ ],
2094
+ [
2095
+ "N",
2096
+ "#N"
2097
+ ],
2098
+ [
2099
+ "O",
2100
+ "CCO"
2101
+ ],
2102
+ [
2103
+ "S",
2104
+ "=O"
2105
+ ],
2106
+ [
2107
+ "n",
2108
+ "cc1"
2109
+ ],
2110
+ [
2111
+ "CC",
2112
+ "I"
2113
+ ],
2114
+ [
2115
+ "C(C)",
2116
+ "OOO"
2117
+ ],
2118
+ [
2119
+ "C(=O)",
2120
+ "SC"
2121
+ ],
2122
+ [
2123
+ "[CH",
2124
+ "3]"
2125
+ ],
2126
+ [
2127
+ "[O",
2128
+ "H]"
2129
+ ],
2130
+ [
2131
+ "Cl",
2132
+ "CCl"
2133
+ ],
2134
+ [
2135
+ "[CH2]",
2136
+ "C(C)"
2137
+ ],
2138
+ [
2139
+ "C1",
2140
+ "CCOC1"
2141
+ ],
2142
+ [
2143
+ "C1OC1",
2144
+ "(C)"
2145
+ ],
2146
+ [
2147
+ "CC(",
2148
+ "=CC"
2149
+ ],
2150
+ [
2151
+ "[Si",
2152
+ "H]"
2153
+ ],
2154
+ [
2155
+ "COO",
2156
+ "N=O)"
2157
+ ],
2158
+ [
2159
+ "c1cc",
2160
+ "ncc1"
2161
+ ],
2162
+ [
2163
+ "[S",
2164
+ "-]"
2165
+ ],
2166
+ [
2167
+ "CC(C=O)",
2168
+ "(OO)"
2169
+ ],
2170
+ [
2171
+ "C/C(=C",
2172
+ "/"
2173
+ ],
2174
+ [
2175
+ "C#CCCC",
2176
+ "[CH2]"
2177
+ ],
2178
+ [
2179
+ "CC(C)(C)OO",
2180
+ "C(C)(C)C"
2181
+ ],
2182
+ [
2183
+ "C",
2184
+ "F"
2185
+ ],
2186
+ [
2187
+ "C",
2188
+ "[CH2]"
2189
+ ],
2190
+ [
2191
+ "CC",
2192
+ "[CH2]"
2193
+ ],
2194
+ [
2195
+ "C(",
2196
+ "O"
2197
+ ],
2198
+ [
2199
+ "O)",
2200
+ "OO"
2201
+ ],
2202
+ [
2203
+ "C(C)",
2204
+ "OON=O"
2205
+ ],
2206
+ [
2207
+ "CO",
2208
+ "CO"
2209
+ ],
2210
+ [
2211
+ "CO",
2212
+ "C(=O)OO"
2213
+ ],
2214
+ [
2215
+ "[N",
2216
+ "a"
2217
+ ],
2218
+ [
2219
+ "Cl",
2220
+ "C1"
2221
+ ],
2222
+ [
2223
+ "C1OC1",
2224
+ "C=O"
2225
+ ],
2226
+ [
2227
+ "CS",
2228
+ "C(=S)"
2229
+ ],
2230
+ [
2231
+ "CC(",
2232
+ "CC"
2233
+ ],
2234
+ [
2235
+ "CC1",
2236
+ "("
2237
+ ],
2238
+ [
2239
+ "CC1",
2240
+ "C2(C)C"
2241
+ ],
2242
+ [
2243
+ "[CH2]CCC",
2244
+ "=CC"
2245
+ ],
2246
+ [
2247
+ "C(C)(C)",
2248
+ "OO"
2249
+ ],
2250
+ [
2251
+ "CC(O)",
2252
+ "(CO[N+](=O)[O-])"
2253
+ ],
2254
+ [
2255
+ "(C)C",
2256
+ ")"
2257
+ ],
2258
+ [
2259
+ "C(CO)",
2260
+ "OON=O"
2261
+ ],
2262
+ [
2263
+ "c(O)",
2264
+ "c1"
2265
+ ],
2266
+ [
2267
+ "C=CC2",
2268
+ "OOC1(C)C2O"
2269
+ ],
2270
+ [
2271
+ "C(F)(F)",
2272
+ "C(F)"
2273
+ ],
2274
+ [
2275
+ "COC(=O)OO",
2276
+ "C(=O)OC"
2277
+ ],
2278
+ [
2279
+ "CSC(=S)",
2280
+ "OC"
2281
+ ],
2282
+ [
2283
+ "(",
2284
+ "COO)"
2285
+ ],
2286
+ [
2287
+ "@",
2288
+ "H]"
2289
+ ],
2290
+ [
2291
+ "n",
2292
+ "1"
2293
+ ],
2294
+ [
2295
+ "C(C)",
2296
+ "OOO[N+](=O)[O-]"
2297
+ ],
2298
+ [
2299
+ "CCC",
2300
+ "="
2301
+ ],
2302
+ [
2303
+ "(C",
2304
+ "1"
2305
+ ],
2306
+ [
2307
+ "C1",
2308
+ "=C"
2309
+ ],
2310
+ [
2311
+ "C(Cl)",
2312
+ "(Cl)Cl"
2313
+ ],
2314
+ [
2315
+ "CCCCCC",
2316
+ "CCC"
2317
+ ],
2318
+ [
2319
+ "(C)",
2320
+ "cc"
2321
+ ],
2322
+ [
2323
+ "[C@",
2324
+ "@H]"
2325
+ ],
2326
+ [
2327
+ "c1cc",
2328
+ "c("
2329
+ ],
2330
+ [
2331
+ "CCC(",
2332
+ "O[N+](=O)[O-])"
2333
+ ],
2334
+ [
2335
+ "CCOOO",
2336
+ "[N+](=O)[O-]"
2337
+ ],
2338
+ [
2339
+ "/",
2340
+ "C=C("
2341
+ ],
2342
+ [
2343
+ "2",
2344
+ "(C)"
2345
+ ],
2346
+ [
2347
+ "O",
2348
+ "CC"
2349
+ ],
2350
+ [
2351
+ "[CH]",
2352
+ "CO"
2353
+ ],
2354
+ [
2355
+ "CC(=O)",
2356
+ "C(O)"
2357
+ ],
2358
+ [
2359
+ "CCC1",
2360
+ "=C(C)"
2361
+ ],
2362
+ [
2363
+ "CC(C)",
2364
+ "=O"
2365
+ ],
2366
+ [
2367
+ "C(C)(C)",
2368
+ "O[N+](=O)[O-]"
2369
+ ],
2370
+ [
2371
+ "CCCCCCCC",
2372
+ "(=O)"
2373
+ ],
2374
+ [
2375
+ "Cc1cc",
2376
+ "c("
2377
+ ],
2378
+ [
2379
+ "OOC1",
2380
+ "C2O"
2381
+ ],
2382
+ [
2383
+ "OOC(C)",
2384
+ "(C1"
2385
+ ],
2386
+ [
2387
+ "[Na",
2388
+ "]"
2389
+ ],
2390
+ [
2391
+ "(",
2392
+ "CO"
2393
+ ],
2394
+ [
2395
+ "(",
2396
+ "c1ccccc1)c1ccccc1"
2397
+ ],
2398
+ [
2399
+ "C",
2400
+ "/C=C\\"
2401
+ ],
2402
+ [
2403
+ "CC",
2404
+ "C(CO)"
2405
+ ],
2406
+ [
2407
+ "C(C",
2408
+ "OO"
2409
+ ],
2410
+ [
2411
+ "C=",
2412
+ "CCC"
2413
+ ],
2414
+ [
2415
+ "C=",
2416
+ "CCCC1"
2417
+ ],
2418
+ [
2419
+ "C(O)",
2420
+ "OO"
2421
+ ],
2422
+ [
2423
+ "CCO",
2424
+ "[N+](=O)[O-]"
2425
+ ]
2426
+ ]
2427
+ }
2428
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,75 @@
1
+ {
2
+ "add_prefix_space": null,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<pad>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<unk>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<s>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "</s>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "512": {
37
+ "content": "<extra_id_0>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "513": {
45
+ "content": "<extra_id_1>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ }
52
+ },
53
+ "additional_special_tokens": [
54
+ "<extra_id_0>",
55
+ "<extra_id_1>"
56
+ ],
57
+ "bos_token": "<s>",
58
+ "clean_up_tokenization_spaces": false,
59
+ "eos_token": "</s>",
60
+ "extra_ids": 2,
61
+ "extra_special_tokens": {},
62
+ "max_length": 256,
63
+ "model_max_length": 1000000000000000019884624838656,
64
+ "model_type": "t5",
65
+ "pad_to_multiple_of": null,
66
+ "pad_token": "<pad>",
67
+ "pad_token_type_id": 0,
68
+ "padding_side": "right",
69
+ "stride": 0,
70
+ "tokenizer_class": "T5TokenizerFast",
71
+ "truncation_side": "right",
72
+ "truncation_strategy": "longest_first",
73
+ "unk_token": "<unk>",
74
+ "vocab_size": 512
75
+ }
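
Note on the config above: `added_tokens_decoder` places `<extra_id_0>` and `<extra_id_1>` at ids 512 and 513, while `vocab_size` is 512. The following is a minimal sketch, not part of the commit, of how one might list added tokens whose ids fall at or beyond `vocab_size`. The helper name `tokens_beyond_vocab` is hypothetical, and the inline JSON is an assumption that mirrors only the relevant fields from the diff rather than the full file.

```python
import json

# Subset of tokenizer_config.json copied from the diff (only the fields used here).
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "0": {"content": "<pad>"},
    "1": {"content": "<unk>"},
    "2": {"content": "<s>"},
    "3": {"content": "</s>"},
    "512": {"content": "<extra_id_0>"},
    "513": {"content": "<extra_id_1>"}
  },
  "vocab_size": 512
}
"""


def tokens_beyond_vocab(config):
    """Return sorted (id, content) pairs for added tokens with id >= vocab_size.

    Hypothetical helper for illustration; not a transformers API.
    """
    size = config["vocab_size"]
    pairs = [
        (int(token_id), entry["content"])
        for token_id, entry in config["added_tokens_decoder"].items()
    ]
    return sorted(pair for pair in pairs if pair[0] >= size)


config = json.loads(CONFIG_TEXT)
outside = tokens_beyond_vocab(config)
print(outside)
```

Whether ids at or beyond `vocab_size` matter depends on the size of the model's embedding matrix, which is not part of this commit; the check only surfaces the tokens worth verifying.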