Preprocessing Module
Load and Clean
load_and_clean
clean_spectra
clean_spectra(spectra, preprocessing_parameters={})
uses matchms to normalize intensities, add information and add losses to the spectra
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spectra | generator | generator of matchms.Spectrum objects loaded via matchms in Python | required |
entropy_threshold | float | spectral entropy threshold used to filter out noisy spectra (see MoNA spectral entropy) | required |
Returns:
Name | Type | Description |
---|---|---|
cleaned_spectra | list | list of matchms.Spectrum objects; spectra that fail the cleaning criteria are removed |
Source code in MS2LDA/Preprocessing/load_and_clean.py
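The entropy_threshold parameter refers to spectral entropy. As a rough illustration, the sketch below computes the Shannon entropy of a peak list's normalized intensities and filters on it using only the standard library. The helper names (spectral_entropy, filter_by_entropy) are hypothetical, and dropping spectra whose entropy lies above the threshold is an assumption; the actual clean_spectra implementation relies on matchms and may differ.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of the normalized intensity distribution."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    probabilities = [i / total for i in intensities if i > 0]
    return -sum(p * math.log(p) for p in probabilities)


def filter_by_entropy(peak_lists, entropy_threshold):
    """Keep peak lists whose entropy does not exceed the threshold.

    Assumption: high entropy (many peaks of similar intensity) marks a noisy
    spectrum. The direction of the cutoff in clean_spectra is not documented here.
    """
    return [p for p in peak_lists if spectral_entropy(p) <= entropy_threshold]


flat = [1.0, 1.0, 1.0, 1.0]   # four equal peaks, entropy = ln(4)
peaked = [100.0, 1.0]         # one dominant peak, entropy close to 0
kept = filter_by_entropy([flat, peaked], entropy_threshold=1.0)
print(kept)                   # only the peaked spectrum remains
```

A flat intensity profile maximizes the entropy for a given number of peaks, which is why a single threshold can separate the two toy spectra here.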
load_mgf
load_mgf(spectra_path)
loads spectra from an MGF file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spectra_path | str | path to the spectra.mgf file | required |
Returns:
Name | Type | Description |
---|---|---|
spectra | generator | matchms generator object with the loaded spectra |
Source code in MS2LDA/Preprocessing/load_and_clean.py
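load_mgf returns a generator, so spectra are produced lazily and can be iterated only once. The sketch below illustrates that behavior with a small standard-library MGF reader. It is not the matchms parser that load_mgf is documented to use; iter_mgf and the dict layout are hypothetical, and only the basic BEGIN IONS / END IONS block structure is handled.

```python
def iter_mgf(lines):
    """Lazily yield one dict per BEGIN IONS / END IONS block."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"metadata": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None and line:
            if "=" in line:
                # Metadata line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                spectrum["metadata"][key.lower()] = value
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                spectrum["peaks"].append((float(mz), float(intensity)))


example = """BEGIN IONS
TITLE=example
PEPMASS=301.1
100.0 10.0
150.5 90.0
END IONS
"""

spectra = iter_mgf(example.splitlines())
first = next(spectra)        # spectra are parsed only when requested
remaining = list(spectra)    # the generator is now exhausted
```

Because a generator is single-pass, wrap the result in list(...) when the spectra have to be traversed more than once.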
load_msp
load_msp(spectra_path)
loads spectra from an msp file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spectra_path | str | path to the spectra.msp file | required |
Returns:
Name | Type | Description |
---|---|---|
spectra | generator | matchms generator object with the loaded spectra |
Source code in MS2LDA/Preprocessing/load_and_clean.py
load_mzml
load_mzml(spectra_path)
loads spectra from an mzML file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spectra_path | str | path to the spectra.mzML file | required |
Returns:
Name | Type | Description |
---|---|---|
spectra | generator | matchms generator object with the loaded spectra |
Source code in MS2LDA/Preprocessing/load_and_clean.py
Generate Corpus
generate_corpus
combine_features
combine_features(dataset_frag, dataset_loss)
combines fragments and losses for a list of spectra
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_frag | list | list of lists where each list represents the fragments from one spectrum | required |
dataset_loss | list | list of lists where each list represents the losses from one spectrum | required |
Returns:
Name | Type | Description |
---|---|---|
frag_and_loss | list | list of lists where each list represents the fragments and losses from one spectrum |
Source code in MS2LDA/Preprocessing/generate_corpus.py
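A minimal sketch of what combining fragments and losses per spectrum can look like, assuming both inputs are aligned lists of word lists (index i in each refers to the same spectrum). The function name combine_features_sketch and the frag@/loss@ word strings are hypothetical; the actual implementation in generate_corpus.py may differ.

```python
def combine_features_sketch(dataset_frag, dataset_loss):
    """Concatenate fragment and loss words spectrum by spectrum."""
    if len(dataset_frag) != len(dataset_loss):
        raise ValueError("dataset_frag and dataset_loss must describe the same spectra")
    return [frags + losses for frags, losses in zip(dataset_frag, dataset_loss)]


dataset_frag = [["frag@79.05", "frag@121.03"], ["frag@55.02"]]
dataset_loss = [["loss@18.01"], []]

frag_and_loss = combine_features_sketch(dataset_frag, dataset_loss)
print(frag_and_loss)
```

Each inner list of the result is one document: the fragments of a spectrum followed by its losses.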
features_to_words
features_to_words(spectra, significant_figures=2, acquisition_type='DDA')
generates a list of lists for fragments and losses for a dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spectra | list | list of matchms.Spectrum objects; they should be cleaned beforehand (e.g. intensities normalized, losses added) | required |
Returns:
Name | Type | Description |
---|---|---|
dataset_frag | list | list of lists where each list represents the fragments from one spectrum |
dataset_loss | list | list of lists where each list represents the losses from one spectrum |
Source code in MS2LDA/Preprocessing/generate_corpus.py
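To illustrate how peaks might become "words", the sketch below rounds m/z values to a fixed number of decimals and prefixes them with their feature type. Several things here are assumptions: significant_figures is interpreted as the number of decimal places, the frag@/loss@ token format is hypothetical, spectra are represented as plain (fragment_mzs, loss_mzs) tuples instead of matchms.Spectrum objects, and the acquisition_type argument is not modeled at all.

```python
def features_to_words_sketch(spectra, significant_figures=2):
    """Turn m/z values into string tokens, one list per spectrum."""
    dataset_frag, dataset_loss = [], []
    for fragment_mzs, loss_mzs in spectra:
        dataset_frag.append(
            [f"frag@{mz:.{significant_figures}f}" for mz in fragment_mzs]
        )
        dataset_loss.append(
            [f"loss@{mz:.{significant_figures}f}" for mz in loss_mzs]
        )
    return dataset_frag, dataset_loss


# One toy spectrum: two fragment m/z values and one loss m/z value
toy_spectra = [([79.0549, 121.0284], [18.0106])]
dataset_frag, dataset_loss = features_to_words_sketch(toy_spectra)
print(dataset_frag)  # [['frag@79.05', 'frag@121.03']]
print(dataset_loss)  # [['loss@18.01']]
```

Rounding makes peaks with nearly identical m/z values collapse into the same token, so they can be counted as the same word across spectra.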
map_doc2spec
map_doc2spec(feature_words, spectra)
generates hash keys to find the original spectrum for a generated document
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature_words | | | required |
metadata | | | required |
Returns:
Name | Type | Description |
---|---|---|
doc2spec_map | | |
Source code in MS2LDA/Preprocessing/generate_corpus.py
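One way to build such a lookup is to hash the words of each document and use the digest as a dictionary key. The sketch below assumes that feature_words and spectra are aligned lists; document_key, map_doc2spec_sketch, the SHA-256 choice, and the separator are all hypothetical, and the real function may hash differently.

```python
import hashlib


def document_key(words):
    """Hash a document (list of words) into a stable hex digest."""
    # A separator that cannot occur in the words avoids ambiguous joins
    joined = "\x1f".join(words)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def map_doc2spec_sketch(feature_words, spectra):
    """Map each document's hash key to the spectrum it was generated from."""
    return {
        document_key(words): spectrum
        for words, spectrum in zip(feature_words, spectra)
    }


feature_words = [["frag@79.05", "loss@18.01"], ["frag@55.02"]]
spectra = ["spectrum_A", "spectrum_B"]  # placeholders for spectrum objects

doc2spec_map = map_doc2spec_sketch(feature_words, spectra)
original = doc2spec_map[document_key(["frag@55.02"])]
print(original)  # spectrum_B
```

Two spectra that produce identical documents share one key in this sketch, so the later spectrum overwrites the earlier one.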