MS2LDA Parameter Settings

LDA parameters

  • The available parameters depend on whether the gensim or the tomotopy implementation is used. tomotopy is easier to understand and exposes interesting parameters (e.g. a threshold for word frequency), whereas gensim offers an "auto" setting for alpha and eta.
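
To make the comparison concrete, the fragment below lists keyword arguments one might pass to `gensim.models.LdaModel` and `tomotopy.LDAModel`. The parameter names are taken from those libraries; all values are illustrative assumptions, not MS2LDA defaults, and the mapping of "threshold for word frequency" to `min_cf`/`min_df` is an interpretation of the note above.

```python
# Sketch only: keyword arguments, not a full training script.
# All numeric values are placeholders chosen for illustration.

# gensim.models.LdaModel: alpha and eta can be learned from the data
# by passing the string "auto".
gensim_lda_kwargs = {
    "num_topics": 50,   # number of motifs (assumed value)
    "alpha": "auto",    # document-topic prior, learned during training
    "eta": "auto",      # topic-word prior, learned during training
    "passes": 10,       # passes over the corpus (assumed value)
}

# tomotopy.LDAModel: priors are fixed numbers, but words can be
# filtered by frequency before training.
tomotopy_lda_kwargs = {
    "k": 50,        # number of motifs (assumed value)
    "min_cf": 5,    # drop words with collection frequency below this
    "min_df": 2,    # drop words occurring in fewer documents than this
    "rm_top": 0,    # number of most frequent words to remove
    "alpha": 0.1,   # document-topic prior
    "eta": 0.01,    # topic-word prior
}
```

With the respective library installed, these would be used as `LdaModel(corpus=..., id2word=..., **gensim_lda_kwargs)` or `tomotopy.LDAModel(**tomotopy_lda_kwargs)`.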

Motif parameters

  • How many peaks and losses are included in a motif? This should be changed to include all of them.
  • Should the peaks be normalized or not? Advantage: everything ends up on a similar scale. Disadvantage: the relative importance gets lost.
  • Option: normalize over all motifs.
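
The trade-off between the two normalization options can be illustrated with a small sketch. It assumes a motif is a mapping from a feature name to a weight; the feature names and weights are made up, and scaling by the maximum is only one possible choice of normalization.

```python
def normalize_within_motif(motif):
    """Scale one motif so that its largest weight becomes 1."""
    peak = max(motif.values(), default=0.0)
    if peak == 0:
        return dict(motif)
    return {feature: weight / peak for feature, weight in motif.items()}


def normalize_across_motifs(motifs):
    """Scale every motif by the largest weight found in any motif."""
    global_peak = max((w for m in motifs for w in m.values()), default=0.0)
    if global_peak == 0:
        return [dict(m) for m in motifs]
    return [{f: w / global_peak for f, w in m.items()} for m in motifs]


# Hypothetical motifs: feature name -> weight
motifs = [
    {"frag@91.05": 0.40, "loss@18.01": 0.10},
    {"frag@105.03": 0.04, "loss@46.01": 0.01},
]

# Per-motif scaling: both motifs now peak at 1, so they look alike.
per_motif = [normalize_within_motif(m) for m in motifs]

# Global scaling: the second motif stays ten times weaker than the first.
global_scaled = normalize_across_motifs(motifs)
```

Per-motif scaling makes the two motifs directly comparable in shape, while global scaling preserves the fact that the second motif carries much smaller weights overall.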

Spec2Vec parameters

  • How many molecules should be retrieved?
  • Should the scores be normalized? If so, should the top 10 percent or the top 10 hits be retrieved?
  • Another idea: retrieve everything within 10% of the best hit, so that the predicted score can be kept.
  • Should duplicates of the same molecule be removed? Probably count them but keep only one for later steps. One could also combine the spectra of the same compound.
  • Include Spec2Vec in negative mode.
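
The retrieval options above can be compared side by side. The following is a minimal sketch that assumes hits arrive as (molecule, score) pairs; the function names and example data are hypothetical, and "best hit -10%" is interpreted here as a relative margin below the best raw score.

```python
def top_k(hits, k=10):
    """Keep the k highest-scoring hits."""
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


def top_fraction(hits, fraction=0.10):
    """Keep the best-scoring fraction of hits (always at least one)."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]


def within_margin_of_best(hits, margin=0.10):
    """Keep hits whose raw score lies within `margin` (relative) of the best."""
    if not hits:
        return []
    best = max(score for _, score in hits)
    cutoff = best - margin * abs(best)
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    return [h for h in ranked if h[1] >= cutoff]


def collapse_duplicates(hits):
    """Keep the best score per molecule and count how often it was hit."""
    best, counts = {}, {}
    for molecule, score in hits:
        counts[molecule] = counts.get(molecule, 0) + 1
        if molecule not in best or score > best[molecule]:
            best[molecule] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [(mol, score, counts[mol]) for mol, score in ranked]


# Made-up example hits: (molecule name, similarity score)
hits = [
    ("caffeine", 0.92),
    ("theobromine", 0.88),
    ("caffeine", 0.85),
    ("nicotine", 0.60),
    ("glucose", 0.30),
]

near_best = within_margin_of_best(hits)  # three hits scoring >= 0.828
unique_hits = collapse_duplicates(hits)  # caffeine kept once, count 2
```

Because the margin-based filter works on raw scores, no normalization is needed and the predicted scores stay interpretable.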

Cluster molecules

  • Use structural fingerprints to find similar motifs.
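
One way this could look is sketched below. It assumes each fingerprint is available as a set of "on" bit indices (in practice these would come from a cheminformatics toolkit), uses Tanimoto similarity, and groups molecules by single linkage; the threshold and the example fingerprints are illustrative assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_fingerprint(fingerprints, threshold=0.7):
    """Single-linkage grouping: link any pair at or above the threshold."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if tanimoto(fingerprints[first], fingerprints[second]) >= threshold:
                parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


# Hypothetical fingerprints: molecule name -> set of on-bit indices
fingerprints = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {10, 11, 12},
}
clusters = cluster_by_fingerprint(fingerprints, threshold=0.5)
```

Here `mol_a` and `mol_b` share three of five distinct bits (similarity 0.6) and end up in one cluster, while `mol_c` stays on its own.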

Masking

  • The current value is set to 1.
  • We need a Spearman threshold to decide whether two spectra are the same or not.
  • If more than two, then collect dissimilar and similar effects based on masking.
  • How many clusters? This could be determined iteratively, until the Spearman value within each cluster is high enough.
  • How does the finding influence the peak importance for a certain subcluster? How should subclusters be visualized?
  • How large must the difference between a masked spectrum and the original one be to count as significant?
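
The Spearman threshold and the open question about the number of clusters can be explored with a small sketch. It assumes spectra are already binned into equal-length intensity vectors, and it swaps the iterative scheme mentioned above for a simpler greedy grouping in which the number of clusters falls out of the threshold. The threshold of 0.9 and the example vectors are illustrative assumptions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0
    return cov / (vx * vy) ** 0.5


def group_by_spearman(spectra, threshold=0.9):
    """Greedy grouping: a spectrum joins the first cluster in which it
    correlates with every member at or above the threshold."""
    clusters = []  # each cluster is a list of indices into `spectra`
    for idx, spectrum in enumerate(spectra):
        for cluster in clusters:
            if all(spearman(spectrum, spectra[m]) >= threshold for m in cluster):
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Made-up binned intensity vectors
spectra = [
    [0.1, 0.5, 1.0, 0.2],
    [0.2, 0.6, 0.9, 0.3],  # same intensity ordering as the first
    [1.0, 0.4, 0.1, 0.8],  # reversed ordering
]
groups = group_by_spearman(spectra, threshold=0.9)
```

Every cluster produced this way satisfies the threshold for all of its pairs, so the result depends on the order in which spectra are processed but never on a preset cluster count.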

Screening

  • The level A, B, C, D schema needs to be re-evaluated to check whether it is suitable. The original paper only looks for fragments, whereas we also look for losses.