
Welcome to the MS2LDA Documentation

MS2LDA (Mass Spectrometry–Latent Dirichlet Allocation) is a framework that applies topic modeling to tandem mass spectrometry (MS/MS) data. It helps identify recurring fragmentation patterns, known as Mass2Motifs, that represent conserved molecular substructures across complex spectra.

What is MS2LDA?

Traditional mass spectrometry analysis depends on spectral libraries or manual curation. MS2LDA offers a machine-learning-based, data-driven, and unsupervised alternative that:

Detects latent fragmentation motifs across large datasets.
Aids structural elucidation of unknown compounds.
Bridges mass spectrometry and cheminformatics.

The MS2LDA framework applies Latent Dirichlet Allocation (LDA), a method originally developed for text analysis, to infer co‑occurring patterns of fragment ions and neutral losses. This allows the discovery of statistically significant patterns that often reflect chemical substructures. 🔍
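To make the text analogy concrete, here is a minimal sketch of how a spectrum might be turned into a "document" of words before topic modeling. It is not MS2LDA's actual API: the function name `spectrum_to_document`, the `frag@`/`loss@` word format, the two-decimal binning, the `min_loss` cutoff, and the toy m/z values are all assumptions made for illustration. The one fixed fact is that a neutral loss is the precursor m/z minus the fragment m/z.

```python
from collections import Counter


def spectrum_to_document(precursor_mz, peaks, decimals=2, min_loss=1.0):
    """Turn one MS/MS spectrum into a bag of 'words' (hypothetical helper).

    peaks is a list of (fragment_mz, intensity) pairs. Each fragment gives a
    'frag@<mz>' word and each precursor-to-fragment difference gives a
    'loss@<mz>' word. Intensities are ignored here to keep the sketch short.
    """
    words = Counter()
    for mz, _intensity in peaks:
        words[f"frag@{mz:.{decimals}f}"] += 1
        loss = precursor_mz - mz  # neutral loss relative to the precursor
        if loss >= min_loss:
            words[f"loss@{loss:.{decimals}f}"] += 1
    return words


# Toy spectra with made-up values: name -> (precursor m/z, peak list).
spectra = {
    "spec_a": (180.06, [(162.05, 100.0), (85.03, 40.0)]),
    "spec_b": (342.12, [(324.11, 80.0), (85.03, 55.0)]),
}

documents = {
    name: spectrum_to_document(precursor, peaks)
    for name, (precursor, peaks) in spectra.items()
}

# Words that occur in both documents are the kind of co-occurring features
# a topic model could group into a shared motif.
shared = set(documents["spec_a"]) & set(documents["spec_b"])
print(sorted(shared))
```

In this toy case the two spectra share one fragment word and one loss word near 18.01, even though their precursors differ. An LDA model fitted over many such documents would look for groups of words that recur together in this way.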

Key Features

🧠 Unsupervised learning of Mass2Motifs at unprecedented speed
🧬 Automated Mass2Motif Annotation Guidance (MAG) with Spec2Vec
🔗 Integration with MassQL-searchable MotifDB
📈 Visualization app for interactive exploration of Mass2Motifs
💻 Command-line access and Jupyter Notebooks for both scripted workflows and interactive data exploration

Documentation Sections

This site provides everything you need to get started:

User Guide: Overview, getting started, and how to use the visualization app and the command-line interface
Modules Reference: All available classes and functions
Examples & Tutorials: Practical use cases and annotated datasets

Developers & Contributors

MS2LDA is developed by a team led by Rosina Torres Ortega, Jonas Dietrich, and Joe Wandy, under the supervision of Justin J.J. van der Hooft at Wageningen University & Research.

📚 MS2LDA builds on the original work published in:

van der Hooft et al. PNAS, 2016 → https://doi.org/10.1073/pnas.1608041113

It also builds on MotifDB:

Rogers et al. Faraday Discussions, 2019 → https://doi.org/10.1039/C8FD00235E

📝 For methodology details and recent updates, please read our preprint:
Torres Ortega et al. bioRxiv, 2025 → https://doi.org/10.1101/2025.06.19.659491

Development continues in collaboration with the broader metabolomics and computational biology community. We welcome feedback, issues, and pull requests on our GitHub repository.

Questions? Open an issue or contact the development team directly 🤝

Acknowledgments


This work was carried out by the van der Hooft Computational Metabolomics Group. This work was supported by Wageningen University & Research.