Modeling Module
modeling
calculate_document_entropy
calculate_document_entropy(model)
Calculates the entropy of the model's document-topic distribution.
Source code in MS2LDA/modeling.py
calculate_topic_entropy
calculate_topic_entropy(model)
Calculates the entropy of the model's topic-word distribution.
Source code in MS2LDA/modeling.py
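The two entropy functions above both reduce a matrix of probability distributions to an entropy value. The following is a minimal sketch of that idea in plain Python, not the MS2LDA implementation: the helper names `row_entropy` and `mean_entropy` are hypothetical, the example matrices are made up, and averaging over rows is an assumption, since the page does not say how the per-row values are aggregated.

```python
import math


def row_entropy(probabilities):
    """Shannon entropy (in nats) of a single probability distribution."""
    # Zero-probability entries contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def mean_entropy(distributions):
    """Average the per-row entropy over a list of distributions (assumed aggregation)."""
    return sum(row_entropy(row) for row in distributions) / len(distributions)


# Hypothetical document-topic matrix: 2 documents x 2 motifs.
doc_topic = [[0.5, 0.5], [1.0, 0.0]]
# Hypothetical topic-word matrix: 2 motifs x 4 features.
topic_word = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]]

print(mean_entropy(doc_topic))
print(mean_entropy(topic_word))
```

A flat distribution gives the highest entropy and a distribution concentrated on one entry gives zero, so a falling value during training indicates that documents (or motifs) are becoming more sharply assigned.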
check_convergence
check_convergence(entropy_history, epsilon=0.001, n=3)
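Checks whether the values in entropy_history have stabilised. The source docstring gives no description; the parameter names suggest that epsilon is the tolerance and n the number of recent values considered.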
Source code in MS2LDA/modeling.py
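Because check_convergence is undocumented here, the following is only a sketch of one plausible rule with the same signature, not the MS2LDA code. It assumes that convergence means the last n successive absolute changes in the history are all below epsilon; the function name `has_converged` is hypothetical.

```python
def has_converged(entropy_history, epsilon=0.001, n=3):
    """Assumed rule: the last n successive changes are all smaller than epsilon."""
    if len(entropy_history) < n + 1:
        return False  # not enough points to form n differences
    recent = entropy_history[-(n + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(change < epsilon for change in changes)


# A history that is still moving, and one whose tail has flattened out.
print(has_converged([2.0, 1.5, 1.2, 1.1]))
print(has_converged([2.0, 1.2, 1.2004, 1.2001, 1.2003]))
```

Requiring several consecutive small changes, rather than a single one, keeps a momentary plateau from ending training early.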
create_motif_spectra
create_motif_spectra(motif_features, charge=1, motifset_name="unknown", significant_digits=2)
Creates a matchms spectrum object for each motif found.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
motif_features | list | list of lists of tuples; each inner list holds the spectral features assigned to one motif together with their motif importance | required |
Returns:
Name | Type | Description |
---|---|---|
motif_spectra | list | list of matchms spectrum objects, one per motif |
Source code in MS2LDA/modeling.py
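Before peaks can be placed in a spectrum object, the (feature, importance) tuples of a motif have to be turned into numeric values. The sketch below illustrates that step in plain Python and deliberately stops short of building a matchms object. It rests on assumptions not confirmed by this page: that features are strings of the form `frag@<m/z>` or `loss@<m/z>`, and that significant_digits is the number of decimal places to round to. The helper name `split_motif_features` and the example values are hypothetical.

```python
def split_motif_features(motif, significant_digits=2):
    """Split one motif's (feature, importance) tuples into fragment and loss peaks."""
    fragments, losses = [], []
    for feature, importance in motif:
        # Assumed feature format: "<kind>@<m/z value>", e.g. "frag@105.03".
        kind, _, value = feature.partition("@")
        peak = (round(float(value), significant_digits), importance)
        if kind == "frag":
            fragments.append(peak)
        elif kind == "loss":
            losses.append(peak)
    # Sort by m/z, the order in which peak lists are normally stored.
    return sorted(fragments), sorted(losses)


motif = [("frag@105.03", 0.4), ("loss@18.01", 0.3), ("frag@77.04", 0.2)]
fragments, losses = split_motif_features(motif)
print(fragments)
print(losses)
```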
define_model
define_model(n_motifs, model_parameters={})
Creates an LDA model using the tomotopy library.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_motifs | int | number of motifs to generate | required |
model_parameters | dict | any further parameters accepted by the tomotopy LDA model (see https://bab2min.github.io/tomotopy/v0.12.6/en/#tomotopy.LDAModel) | {} |
Returns:
Name | Type | Description |
---|---|---|
model | tomotopy.LDAModel | tomotopy LDAModel class |
Source code in MS2LDA/modeling.py
extract_motifs
extract_motifs(model, top_n=50)
Extracts motifs from the trained LDA model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | tomotopy.LDAModel | trained tomotopy LDAModel class | required |
top_n | int | number of top-ranked features to extract per motif | 50 |
Returns:
Name | Type | Description |
---|---|---|
motif_features | list | list of lists of tuples; each inner list holds the spectral features assigned to one motif together with their motif importance |
Source code in MS2LDA/modeling.py
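The shape of the return value, a list of lists of (feature, importance) tuples, can be illustrated without a trained model. The sketch below ranks features per motif from a made-up topic-word matrix. The helper `top_features_per_motif`, the vocabulary, and the weights are hypothetical. tomotopy's `LDAModel.get_topic_words` returns comparable (word, probability) pairs, but whether extract_motifs relies on it is an assumption.

```python
def top_features_per_motif(vocabulary, topic_word, top_n=50):
    """Return, for each motif, its top_n (feature, weight) tuples by descending weight."""
    motifs = []
    for weights in topic_word:
        ranked = sorted(zip(vocabulary, weights), key=lambda pair: pair[1], reverse=True)
        motifs.append(ranked[:top_n])
    return motifs


# Hypothetical vocabulary and topic-word weights for two motifs.
vocabulary = ["frag@77.04", "frag@105.03", "loss@18.01"]
topic_word = [[0.1, 0.6, 0.3], [0.7, 0.2, 0.1]]

motif_features = top_features_per_motif(vocabulary, topic_word, top_n=2)
for motif in motif_features:
    print(motif)
```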
train_model
train_model(
    model,
    documents,
    iterations=100,
    train_parameters={},
    convergence_parameters={
        "type": "entropy_history_doc",
        "threshold": 0.01,
        "window_size": 3,
        "step_size": 10,
    },
)
Trains the LDA model on the given documents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | tomotopy.LDAModel | tomotopy LDAModel class | required |
documents | list | list of lists of frag@/loss@ strings representing spectral features | required |
iterations | int | number of training iterations | 100 |
train_parameters | dict | any further parameters accepted by the tomotopy training function (see https://bab2min.github.io/tomotopy/v0.12.6/en/#tomotopy.LDAModel.train) | {} |
Returns:
Name | Type | Description |
---|---|---|
model | tomotopy.LDAModel | tomotopy LDAModel class |
convergence_curve | list | list of model perplexity values recorded after every 10 iterations |
Source code in MS2LDA/modeling.py
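The default convergence_parameters (threshold, window_size, step_size) suggest that training proceeds in blocks of step_size iterations with a convergence check after each block. The sketch below shows one way such a loop could look. It is an assumption about the control flow, not the MS2LDA implementation, and it uses a hypothetical `StubModel` with a toy metric in place of a tomotopy model so that it runs on its own.

```python
class StubModel:
    """Hypothetical stand-in for a topic model; this is not the tomotopy API."""

    def __init__(self):
        self.iterations_done = 0

    def train(self, n_iterations):
        self.iterations_done += n_iterations

    def metric(self):
        # Toy score that flattens out as training progresses.
        return 1.0 + 1.0 / (1 + self.iterations_done)


def train_in_steps(model, iterations=100, threshold=0.01, window_size=3, step_size=10):
    """Train in blocks of step_size and stop once the metric has stabilised."""
    history = []
    for _ in range(0, iterations, step_size):
        model.train(step_size)
        history.append(model.metric())
        # Compare the last window_size successive changes against the threshold.
        recent = history[-(window_size + 1):]
        changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
        if len(changes) == window_size and all(c < threshold for c in changes):
            break
    return model, history


model, history = train_in_steps(StubModel())
print(model.iterations_done, len(history))
```

Checking only every step_size iterations keeps the cost of computing the metric low relative to the training itself.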