Bioinformatics
AlphaFold is exciting.
Links
- Rosalind - Platform for learning bioinformatics and programming through problem solving.
- Awesome Bioinformatics
- STUDY WITH ME | Computational Biology (2019)
- Primer on statistical mechanics for biochemists
- Kindred - Python biomedical relation extraction package that uses a supervised approach (i.e. needs training data).
- fastq-rs - Can process fastq files at about the speed of the coreutils.
- nglview - Jupyter widget to interactively view molecular structures and trajectories.
- ngl - WebGL protein viewer.
- Mark Zuckerberg Live With Joe DeRisi & Steve Quake || The Future of Technology & Society (2019)
- The Biostar Handbook - Introduces readers to bioinformatics, the scientific discipline at the intersection of biology, computer science, and statistical data analytics dedicated to the digital processing of genomic information.
- Michael Levin | 2017 Allen Frontiers Symposium
- Lattice - Creates design-automation workflows that fundamentally change engineering biology. (GitHub)
- GuacaMol - Benchmarks for generative chemistry.
- Single-cell RNA-seq pseudotime estimation algorithms
- Allen Institute for Brain Science Toolkit
- Computer-Designed Organisms - Scalable pipeline for creating functional novel lifeforms.
- BioGrakn - Provides an intuitive way to query interconnected and heterogeneous biomedical data in one single place. (Article)
- AlphaFold: Using AI for scientific discovery (2020) (HN)
- Rust-Bio - Provides implementations of many algorithms and data structures that are useful for bioinformatics.
- Low-N protein engineering with data-efficient deep learning (2020)
- Machine Boss - Bioinformatics Open Source Sequence machine.
- atomium - Python molecular modeller (with .pdb/.cif/.mmtf parsing and production).
- PoincareMaps - Poincare maps recover continuous hierarchies in single-cell data.
- OpenMM - Toolkit for molecular simulation using high performance GPU code. (OpenMM Cookbook)
- Guide to help wet lab biologists learn computational biology (2020)
- BiGG Models - Search the database by model, reaction, metabolite, or gene. (Code)
- Systems Biology and Biotechnology Specialization course
- Path to a free self-taught education in Bioinformatics
- How to Build a Biotech (HN)
- Unified rational protein engineering with sequence-based deep representation learning (2019)
- Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks (2020) (HN)
- Computationally Comparing Biological Networks and Reconstructing Their Evolution (2012)
- Bioinformatics Specialization courses
- Scanpy - Single-Cell Analysis in Python.
- elPrep - High-performance tool for preparing .sam/.bam files for variant calling in sequencing pipelines.
- Announcing Sylph
- Interactive bioimage analysis with Python and Jupyter (2020)
- Biomedical Applications of Electrical Stimulation (2020)
- a16z invested bio companies
- MultiQC - Searches a given directory for analysis logs and compiles an HTML report. (Code)
- Diyhplus Wiki - Wiki for open source hardware, do-it-yourself biohacking and practical transhumanism. (Code)
- Tellurium - Python environment for reproducible dynamical modeling of biological networks. (Web)
- Chemlambda - Graph rewriting system derived from graphic lambda calculus which can be seen as a simple model of chemical or biological computing. (Code)
- FPbase - Fluorescent Protein Database. (HN)
- Cell Biology by the Numbers
- CReM - Open-source Python framework to generate chemical structures using a fragment-based approach.
- QSAR modeling software and virtual screening
- Axial - Founder-driven life sciences companies (2020)
- Manolis Kellis research
- Pumas AI - Platform for pharmaceutical modeling and simulation. (HN)
- Mitochondrial dynamics in postmitotic cells regulate neurogenesis (2020) (Tweet)
- PVDH Lab - Stem cell and Developmental neurobiology.
- Applying tech frameworks to biotech: key differences (HN)
- Europe’s biotech renaissance (2020)
- Awesome DeepBio
- Coming up with ideas for biotech startups (2020)
- Analyzing toehold sequences for synthetic biology (2020)
- R, Data Science, & Computational Biology (2020)
- Bioinformatics Algorithms book
- ASAP - Automatic Selection And Prediction tools for materials and molecules.
- The second decade of synthetic biology: 2010–2020 (HN)
- BioDesign Research Conference
- Cryo–electron microscopy breaks the atomic resolution barrier at last (2020) (HN)
- Deep Learning for Graphs in Chemistry and Biology
- Amorphous computing - Refers to computational systems that use very large numbers of identical, parallel processors each having limited computational ability and local interactions.
- Awesome Single Cell - Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
- MacLean Lab - Stem Cell Systems Biology Research.
- Cahan Lab - Stem cell engineering, developmental biology, and cancer biology research.
- Tufts Uni: Levin Biology Lab - Investigating information storage and processing in biological systems.
- Nextflow - Bioinformatics workflow manager that enables the development of portable and reproducible workflows. (Web) (GitHub)
- Quantum deep field for molecule
- AlQuraishi Lab at Columbia University - Machine Learning, Molecules, Systems Biology research. (GitHub)
- AlphaFold: a solution to a 50-year-old grand challenge in biology, protein folding (2020) (HN) (Explained) (Lex explains)
- AlphaFold2 @ CASP14: “It feels like one’s child has left home.” (2020) (Tweet) (HN)
- AlphaFold2 in PyTorch
- ProSPr: Protein Structure Prediction
- MiniFold - Deep Learning for Protein Structure Prediction inspired by DeepMind AlphaFold algorithm.
- What is protein folding? A brief explanation (2020) (HN)
- HH-suite3 for sensitive sequence searching - Software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
- Soding Lab - Quantitative and Computational Biology Research. (GitHub)
- Protein folding explained (2020)
- sfaira - Data and model repository for single-cell data.
- Theis Lab - Institute of Computational Biology. (GitHub)
- PostEra - Medicinal Chemistry Powered by Machine Learning.
- Bioregistry - Integrative meta-registry of biological databases, ontologies, and nomenclatures.
- Nature Biotechnology
- PyMOL - Molecular visualization system. (Code)
- Bioinformatics Chat - Podcast about computational biology, bioinformatics, and next generation sequencing.
- OpenCADD - Python library for structural cheminformatics.
- Ask HN: Best way to learn computational biology/immunology? (2021)
- Lumol - Classical molecular simulation engine that provides a solid base for developing new algorithms and methods. (Web)
- Tasks Assessing Protein Embeddings (TAPE) - Set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
- Center for Computational Biology
- Yun S. Song Research (Song Lab) (GitHub)
- MolStar - Comprehensive macromolecular library. (Web)
- How to Start a Biotech Company on a Budget (2021)
- BioGraPy - Biological Graphic tool in Python.
- FPbase - The Fluorescent Protein Database. (Code)
- Cell Ontology - Structured controlled vocabulary for cell types in animals. (Code)
- Squidpy - Spatial Molecular Data Analysis in Python.
- TorchProteinLibrary - PyTorch library of layers acting on protein representations.
- The Wilke Lab - Computational Evolutionary Biology.
- parasail - Pairwise Sequence Alignment Library.
- CellChat - R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data.
- BiLSTM-CNN-CRF architecture for sequence tagging
- scArches - Package to integrate newly produced single-cell datasets into integrated reference atlases.
- Biotech for the Biocurious (2021)
- The Century of Biology Newsletter
- Papers on machine learning for proteins
- Finding a "killer application" for novel biotech: AveXis case study (2020) (Tweet)
- BioModels - Vast repository of mathematical models of biological and biomedical systems.
- noodles - Bioinformatics I/O libraries in Rust.
- Datamol - Molecular Manipulation Made Easy.
- Role of Bioelectricity During Cell Proliferation in Different Cell Types (2020)
- tidysq - Contains tools for analysis and manipulation of biological sequences (including amino acid and nucleic acid – e.g. RNA, DNA – sequences).
- SeqKit - Cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Go.
- TaxonKit - Practical and Efficient NCBI Taxonomy Toolkit.
- unikmer - Toolkit for nucleic acid k-mer analysis, including set operations on k-mers (sketch), optionally with TaxIDs but without count information.
- ProteinSolver - Graph neural network capable of generating novel amino acid sequences that fold into proteins with predetermined topologies. (Docs)
- Learning from Protein Structure with Geometric Vector Perceptrons (2020) (Code)
- Dabble - Membrane protein builder and parameterizer.
- ffq - Fetch run information from the European Nucleotide Archive (ENA). (Tweet)
- MolPAL - Molecular Pool-based Active Learning.
- Delocalization-Induced Molecular Equality (2021)
- Molecular Assembly Index (2021)
- Depth-First Blog
- ChemCore - Cheminformatics toolkit for Rust. (Article)
- Biochemical Pathway Maps (HN)
- Awesome Biomedical Information Extraction
- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences (2020) (Code)
- Highly accurate protein structure prediction with AlphaFold (2021) (Code) (Tweet) (HN)
- AlphaFold 2 is here: what’s behind the structure prediction miracle (2021) (HN)
- AlphaFold Protein Structure Database (HN)
- The AlphaFold2 Method Paper: A Fount of Good Ideas (2021) (Tweet)
- AlphaFold Colab
- More Protein Folding Progress – What’s It Mean? (2021) (HN)
- smof - UNIX-style FASTA tools.
- AlphaFold2: Are attention and symmetries all you need? (2021)
- ColabFold - Making Protein folding accessible to all via Google Colab.
- BioLink API - API for linked biological knowledge.
- AlphaFold, a tentative review (2021)
- htsget-rs - Bioinformatic file formats accessible to the web.
- Synthace - Enabling life science, the way it should be done.
- scispacy - Full spaCy pipeline and models for scientific/biomedical documents. (Web)
- Bioinformatics Format Specimens - Collection of bioinformatics file format specimens to test against.
- Single-sequence protein structure prediction using language models from deep learning (2021) (Code)
- Robust, Universal Tree Balance Indices (2021) (Tweet)
- Julia for Biologists (2021)
- Hierarchical Generation of Molecular Graphs using Structural Motifs (2020) (Code)
- Broad Institute - Unique, collaborative community pioneering a new model of biomedical science. (Twitter) (GitHub)
- INDRA (Integrated Network and Dynamical Reasoning Assembler) - Automated model assembly system, originally developed for molecular systems biology and then generalized to other domains. (Web)
- Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers
- Neural Distance Embeddings for Biological Sequences (2021) (Code)
- Protein complex prediction with AlphaFold-Multimer (2021) (Tweet) (HN)
- A cryptography game-changer for biomedical research at scale (2021) (HN)
- MoleculeX - New and rapidly growing suite of machine learning methods and software tools for molecule exploration.
- Biomappings - Community curated and predicted equivalences and related mappings between named biological entities that are not available from primary sources.
- PheKnowLator - Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models.
- Computational Reproducibility with BioNix (2021)
- Self-Alignment Pretraining for Biomedical Entity Representations (2021) (Code)
- Bandage - Bioinformatics Application for Navigating De novo Assembly Graphs Easily. (Code)
- Viv - Multiscale visualization of high-resolution multiplexed bioimaging data on the web.
- PHATE - Visualizing Transitions and Structure for Biological Data Exploration.
- Artificial intelligence reveals nuclear pore complexity (2021) (Tweet)
- BioNix - Functional highly reproducible bioinformatics pipelines.
- Awesome Bioinformatics Formats
- BioShape Lab
- Bioinformatics repository with more and newer packages
- The History of Microbiology – A Personal Interpretation
- SeqFu - General-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files.
- Bioinformatics algorithms covered in Cambridge Uni Bio Course
- block aligner - SIMD-accelerated library for computing global and X-drop affine gap sequence alignments using an adaptive block-based algorithm.
- OpenFold - Trainable & open-source PyTorch reproduction of AlphaFold 2.
- HSM (Hierarchical Statistical Mechanical model) - Biophysical prediction of protein-peptide interactions and signaling networks using machine learning. (Web)
- Foregen and the Science of Regeneration with Tyler Drzod and Eric Cunningham (2021) (Links & Transcript)
- Foregen - Regenerating Foreskins. (Reddit) (Commentarium)
- DeepMind releases massive protein structure database (2021)
- ATOM3D: Tasks on Molecules in Three Dimensions (Code)
- 3D Infomax improves GNNs for Molecular Property Prediction (2021) (Code)
- Regenerated crustacean limbs are precise replicas (2021) (Tweet)
- Dockerfiles and documentation on tools for public health bioinformatics
- Arc Institute - New institution for curiosity-driven biomedical science and technology. (Tweet) (Twitter)
- SeqLike - Unified biological sequence manipulation in Python.
- Deep Learning for Molecules and Materials Book (Code)
- ProtTrans - State of the art pre-trained models for proteins.
- sandbox.bio - Interactive bioinformatics tutorials.
- Foldseek Search Server - Suite for searching and clustering protein structures. (Code) (Tweet)
- AlphaFold 2 & Equivariance (2020)
- AlphaFold-Powered Drug Discovery of a Novel CDK20 Inhibitor (2022) (HN)
- Graphein - Protein Graph Library.
- A backbone-centred energy function of neural networks for protein design (2022)
- EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction (2022) (Code)
- FastFold - Optimizing Protein Structure Prediction Model Training and Inference on GPU Clusters.
- Graphormer - Deep learning package that allows researchers and developers to train custom models for molecule modeling tasks.
- Strobealign - Fast short-read aligner. It achieves the speedup by using a dynamic seed size obtained from syncmer-thinned strobemers.
- Bioinformatics training materials
- 3 Key Questions to Think About When Designing Proteins Computationally (2022) (Tweet)
- AIMSim - Tool for visualizing diversity in your molecular data-set using structural fingerprints.
- ProteInfer - Approach for predicting the functional properties of protein sequences using deep neural networks.
- Learning inverse folding from millions of predicted structures (2022) (Code) (Tweet)
- What's next for AlphaFold and the AI protein-folding revolution (2022)
- AlphaFill - Algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models.
- tiwih - Simple bioinformatics command-line tools I wished I had.
- Awesome Bioinformatics Benchmarks
- GLUE (Graph-Linked Unified Embedding) - Graph-linked unified embedding for single-cell multi-omics data integration.
- Learning to Extend Molecular Scaffolds with Structural Motifs (2022) (Code)
- Functional regeneration and repair of tendons using biomimetic scaffolds loaded with recombinant periostin (2021)
- MUSCLE - Widely-used software for making multiple alignments of biological sequences.
- fastBio - Deep learning library for biological sequences. Extension of Fastai and PyTorch.
- glosim - Python package to compute similarities between molecules and structures.
- Bioontologies - Unified access to biomedical ontologies.
- PyBioPAX - Python implementation of the BioPAX object model.
- AIMNet - Atoms In Molecules Neural Network Potential.
- BigBIO: Biomedical Datasets - Tools for curating biomedical training data for large-scale language modeling.
- molcloud - Make a bunch of molecules.
- RITA: a Study on Scaling Up Generative Protein Sequence Models (2022) (Code)
- Awesome Molecular Generation
- Geometric Transformers for Protein Interface Contact Prediction (2022) (Code)
- DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction (2021) (Code)
- Carina - Fast proteomics search engine in 1000 lines of code.
- Cluster Tools - Distributed segmentation for bio-image-analysis.
- BCALM - Bioinformatics tool for constructing the compacted de Bruijn graph from sequencing data.
- pyfaidx - Efficient pythonic random access to fasta subsequences.
- AlphaFold reveals the structure of the protein universe (2022) (HN)
- K-mer File Format: a standardized and compact disk representation of sets of k-mers (2022) (Tweet)
- A Guide to Decentralized Biotech
- Saez-Rodriguez Group - Develop software tools for systems level analysis and mechanistic modeling of molecular and biomedical data. (GitHub)
- Collection of fold tools
- Bio Embeddings - Get protein embeddings from protein sequences.
- Deep Review - Collaboratively written review paper on deep learning, genomics, and precision medicine. (Web)
- Uni-Fold: an open-source platform for developing protein models beyond AlphaFold
- Fasten - Perform random operations on fastq files, using unix streaming.
- JVARKIT - Java utilities for Bioinformatics.
- Bioconda recipes - Conda recipes for the bioconda channel.
- foldingdiff - Diffusion model for protein backbone generation. (Demo)
- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking (2022) (Code)
- Allbase - Database for engineering organisms. Go REST API to replace Genbank, Uniprot, Rhea, and CHEMBL.
- MICAN - Non-sequential structural alignment program for protein structure.
- Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures (2022) (Code)
- List of papers about Proteins Design using Deep Learning
- ManyFold - Efficient and flexible library for training and validating protein folding models.
- AlphaFold’s new rival? Meta AI predicts shape of 600M proteins (2022) (HN)
- biogo - Bioinformatics library for Go. (Examples)
- Rust-Bio-Tools - Set of ultra fast and robust command line utilities for bioinformatics tasks based on Rust-Bio.
- fastqc-rs - Quality control tool for FASTQ files written in Rust.
- This is biology's century. We're not ready for it (2022)
- Foresight Institute - Catalyzing Transformative Technologies. (Twitter)
- Why use Rust for bioinformatics? Defining the problem space. (2022) (Reddit)
- Progres - Protein Graph Embedding Search.
- Mega-scale experimental analysis of protein folding stability in biology and protein design (2022) (Tweet)
- Representation Learning on Biomolecular Structures using Equivariant Graph Attention (2022)
- On-chip on-demand delivery of K+ for in vitro bioelectronics (2022)
- Nanoscale Instruments for Visualizing Small Proteins (2022)
- Latch SDK - Framework to build and deploy bioinformatics workflows.
- LatchBio - End-to-end biocomputing in the browser. (GitHub) (Twitter) (Hiring)
- Robust deep learning based protein sequence design using ProteinMPNN (2022) (Code)
- Vascularization, Survival, and Functionality of Tissue-Engineered Constructs
- OpenBioLink - Resource and evaluation framework for evaluating link prediction models on heterogeneous biomedical graph data.
- Bitesize Bio - Biotech how-tos.
- Fierce Biotech - Biotech industry news.
- Chroma PyTorch - Generative models of proteins using DDPM and GNNs, in PyTorch.
- Equiformer Diffusion - Implementation of Denoising Diffusion for protein design, but using the new Equiformer (successor to SE3 Transformers) with some additional improvements.
- BioSpaCy - SpaCy pipeline for processing biology texts.
- Molecular Property Prediction - Message Passing Neural Networks for Molecule Property Prediction.
- topiary - Python framework for doing ancestral sequence reconstruction.
- OngLai: An Algorithm to Classify Homologous Series
- BioGPT: generative pre-trained transformer for biomedical text generation and mining (2022) (Code)
- Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies (2022) (Code)
- Awesome Biomechanics
- Spateo - Spatiotemporal modeling of spatial transcriptomics.
- Awesome Cytodata
- Ankh - Optimized Protein Language Model Unlocks General-Purpose Modelling.
- Gotree - Set of command line tools and an API to manipulate phylogenetic trees.
- biopix - 3D Protein Structure Visualizer Built with Rust.
- vesta_vectors.py - Python 3 script to visualize atomic displacement using the Vesta file format.
- BioGPT: A language model pre-trained on large-scale biomedical literature (2023) (HN)
- EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation (2022) (Code)
- BioDynaMo - High-performance and modular, agent-based simulation platform.
- Systematic Survey of Molecular Pre-trained Models (Chemical Language Models)
- TRILL - Sandbox for Deep-Learning based Computational Protein Design.
- HTSlib - C library for high-throughput sequencing data formats.
- Bioinformatics one-liners
- List of molecular design using Generative AI and Deep Learning
- Molfeat - Hub for all your molecular featurizers.
- ChemSpacE - Interpretable and Interactive Chemical Space Exploration.
- biobear - Work with bioinformatic files using Polars.
- WHERE TRUE Technologies - Bioinformatics toolkits.
- Signal - Voice of Next Generation Biotech. (Intro)
- DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models (2023) (Code)
- Nice starters into bioinformatics (2023)
- lightmotif - Lightweight platform-accelerated library for biological motif scanning using position weight matrices.
- Rust Pseudoaligner - Single-Cell RNA-seq pseudo-aligner.
- Single-cell best practices
- indexedfasta-js - Read FASTA files indexed with .fai indexes. Also supports BGZIP+.gzi.
- AlphaLink - Predicts protein structures using deep learning given a sequence and a set of experimental contacts.
- Antisequence - Rust library for processing sequencing reads.
- Bioinformatics Discord
- AutoDock GPU - AutoDock for GPUs and other accelerators.
- Phanta - Workflow to rapidly quantify taxa from all domains of life, directly from short-read human gut metagenomes.
- g2p - Grapheme-to-Phoneme transductions that preserve input and output indices.
- ProstT5 - Bilingual Language Model for Protein Sequence and Structure.
- Machine Learning Coarse-Grained Potentials of Protein Thermodynamics (2022) (Code)
- chopper - Rust implementation of NanoFilt+NanoLyse.
- NanoFilt - Filtering and trimming of long read sequencing data.
- The complete sequence of a human Y chromosome (2023) (HN)
- fqtk - Toolkit for working with FASTQ files, written in Rust.