Genomics
Introduction to Genomics for Engineers is a nice intro.
Links
- Seq - Programming language for high-performance computational genomics. (Web) (HN)
- Open sourcing bioinstruments (2019) (HN)
- goleft - Collection of bioinformatics tools distributed under MIT license in a single static binary.
- When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data (2019)
- Handbook of Statistical Genomics (2019)
- Nature Reviews Genetics - Monthly review journal covering the full breadth of modern genetics.
- Bioinformatics and Functional Genomics 3rd Edition by Jonathan Pevsner
- Nucleus - Python and C++ code for reading and writing genomics data.
- Statistical Population Genomics
- Genome Informatics 2019 Lightning Talk: A. Sina Booeshaghi
- 10x Genomics (Code)
- Neher Lab - Pathogen evolution, genomics, and biophysics lab in Switzerland.
- BEDOPS - High-performance genomic feature operations.
- Introduction to Genetics and Evolution
- Open Targets - Partnership that uses human genetics and genomics data for systematic drug target identification and prioritisation.
- Next-Gen Sequence Analysis Workshop (2019)
- Korkin Lab
- Varlociraptor - Flexible, uncertainty-aware variant calling with parameter-free filtration via FDR control.
- genetools - Single-cell analysis recipes. (Code)
- Learn Genetics (HN)
- fgsea - Fast Gene Set Enrichment Analysis.
- Awesome Genome Visualization
- Computational Genomics Class Playlist at San Diego State University
- Computational Genomics Manual (Code)
- A Theory of Natural Universal Computation Through RNA (2020) (Tweet)
- Hail - Open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data. (References) (Web)
- gnomAD - Genome aggregation database.
- gnomad_methods - Hail helper functions for the gnomAD project and Translational Genomics Group.
- We can now edit the human genome – how far should we go? (HN)
- Scientific Background: A tool for genome editing (2020)
- Introduction to Social Science Genetics (2020)
- gplearn - Implements Genetic Programming in Python, with a scikit-learn inspired and compatible API.
- Genetic List - Browsable genetic marker lists.
- Cutevariant - Standalone and free application to explore genetic variations from VCF files.
- A hypothesis is a liability (2020)
- Awesome Imaging Genetics
- There is little chance CRISPR will ever be widely used to directly treat disease (2020) (HN)
- misha - Genomic data analysis suite.
- MetaCell R - Single-cell mRNA Analysis.
- nf-core/eager - Bioinformatics best-practice analysis pipeline for NGS sequencing based ancient DNA (aDNA) data analysis.
- GenomicSQLite - Genomics Extension for SQLite.
- gfabase - Command-line utility for random-access storage of Graphical Fragment Assembly (GFA) data.
- GenomeSpy - Genome visualization tool with a grammar of graphics and WebGL-powered fluid interactions. (Code)
- Decoding the Language of Genomes (2020) (HN)
- GREIN - GEO RNA-seq Experiments Interactive Navigator. (Code)
- FUMA: Functional mapping and annotation of genetic associations (Code)
- GenePattern Notebook - Platform for integrating genomic analysis with Jupyter Notebooks.
- SCDE - R package for analyzing single-cell RNA-seq data.
- Human Genome Idiogram Vector Art Library - Contains image files for each of the 24 primary human chromosomes, as well as one for the entire genome lined up.
- RNA Memory Hypothesis (2021) (HN)
- ELI5 Epigenetics
- Bioconductor - Provides tools for the analysis and comprehension of high-throughput genomic data. (GitHub)
- Orchestrating Single-Cell Analysis with Bioconductor (Code)
- Awesome CRISPR - List of software/websites/databases/papers for genome engineering.
- Genomics Workflows on AWS (Web)
- Pyne Lab - Single molecule biophysics of DNA interactions.
- Nextclade - Viral genome alignment, clade assignment, mutation calling, and quality checks. (Code)
- Shooting out the messenger—mRNA and how the pandemic advanced biotechnologies (2021) (Tweet)
- Cirrocumulus for Single-Cell Data Visualization (Code)
- On detecting gene-gene interactions (2020)
- An on-off switch for gene editing (2021)
- Scientists Catch Jumping Genes Rewiring Genomes (2021) (HN)
- regenie - C++ program for whole genome regression modelling of large genome-wide association studies.
- Edinburgh Genome Foundry (GitHub)
- Cultural Evolution of Genetic Heritability (2021) (Tweet)
- Pachter Lab - Develops computational and experimental methods for genomics. (GitHub)
- The complete sequence of a human genome (2021) (Reddit)
- perbase - Per-base per-nucleotide depth analysis.
- Computational genomics resources
- Seq-N-Slide - Sequencing data analysis pipelines. (Docs)
- Genomics Reddit
- How to sequence your genome at home (2021) (HN)
- Ask HN: What's an interesting DIY genetic engineering project? (2021)
- Illumina - Sequencing and array-based solutions for genetic research.
- Wochenende - Whole Genome/Metagenome Sequencing Alignment Pipeline.
- Using a Quadruplet Codon to Expand the Genetic Code of an Animal (2021)
- PingPong - Comparative genome analysis using sample-specific string detection in accurate long reads.
- RNA demethylation increases rice and potato yields 50% (2021) (HN)
- Amazon Genomics CLI (HN) (Code)
- The Specious Art of Single-Cell Genomics (2021) (Tweet) (Tweet)
- IGV: Integrative Genomics Viewer - Fast, efficient, scalable visualization tool for genomics data and annotations. (Code)
- Oxford Nanopore Technologies - Nanopores for single molecule (DNA/RNA, protein) analysis using the MinION, GridION and PromethION systems. (GitHub)
- nanoq - Ultra-fast quality control and summary reports for nanopore reads.
- Medaka - Tool to create consensus sequences and variant calls from nanopore sequencing data.
- alignment-nf - Whole Exome/Whole Genome Sequencing alignment pipeline.
- Stanford researchers develop an engineered 'mini' CRISPR genome editing system (2021) (HN)
- PiGx - Pipelines in genomics. (Web)
- snps - Tools for reading, writing, merging, and remapping SNPs.
- Sano Genetics - Upload DNA data to explore personal DNA reports on health, traits and genetic conditions.
- GenomePrep - To preprocess, quality control and prepare consumer DTC genomes for research. (Web)
- scikit-allel - Python package for exploring and analyzing genetic variation data.
- Centre for Genomics and Global Health (GitHub)
- pysamstats - Fast Python and command-line utility for extracting simple statistics against genome positions based on sequence alignments from a SAM or BAM file.
- Local PCA - Methods for examining PCA locally along the genome.
- List of gene lists for genomic analyses
- MacArthur Lab - Extracting useful information from large genomic datasets. (GitHub)
- Directed evolution of rRNA improves translation kinetics and recombinant protein yield (2021) (Tweet)
- classify-genomes - Classify a genome sequence according to the mOTUs/specI taxonomy.
- mOTU profiler - Computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
- Unicycler - Hybrid assembly pipeline for bacterial genomes.
- Trycycler - Tool for generating consensus long-read assemblies for bacterial genomes.
- Badread - Read simulator that can imitate many types of read problems.
- Polypolish - Short-read polishing tool for long-read assemblies.
- Holt Lab - Microbial genomics.
- Prodigal - Prodigal Gene Prediction Software.
- Predicting gene expression with AI | DeepMind (2021) (Code)
- Clonal dominance in excitable cell networks (2021)
- 'Useless Specks of Dust' Turn Out to Be Building Blocks of All Vertebrate Genomes (2021) (HN)
- RNA-responsive elements for eukaryotic translational control (2021)
- souporcell - Clustering scRNAseq by genotypes.
- Vireo - Variational Inference for Reconstructing Ensemble Origin by expressed SNPs in multiplexed scRNA-seq data.
- Naturally occurring modified ribonucleosides (2020)
- microPIPE - Pipeline for high-quality bacterial genome construction using ONT sequencing.
- Genomic Medicine and Statistics DPhil Programme
- RNA Takes Over (HN)
- Dense Depth Data Dump (D4) - Format and tool suite that provides an alternative to BigWig for fast analysis and compact storage of quantitative genomics datasets.
- bedtools - Swiss army knife for genome arithmetic.
- GEMINI - Integrative exploration of genetic variation and genome annotations.
- Quinlan Lab - Combine computational and genomic techniques to explore genome biology and the genetic basis of traits.
- vg - Tools for working with genome variation graphs.
- seqwish - Alignment to variation graph inducer.
- pggb - Pangenome graph construction pipeline that renders a collection of sequences into a pangenome graph.
- Alen - Command-line program to view DNA or protein alignments in FASTA format.
- gffutils - Python package for working with and manipulating the GFF and GTF format files typically used for genomic annotations.
- The origins and functional effects of postzygotic mutations throughout the human lifespan (2021)
- In genetic programming, does introducing changes (e.g., mutations) at lower levels of a genome tree generally makes for better or worse outcomes than making changes at a higher level in the tree?
- biowasm - WebAssembly modules for genomics.
- Aioli - Framework for building fast genomics web tools with WebAssembly and WebWorkers.
- 42basepairs - Better way to explore your genomics data.
- A catalogue of 1,167 genomes from the human gut archaeome (2021)
- KMCP: accurate metagenomic profiling of both prokaryotic and viral organisms by pseudo-mapping
- ViralMSA - Reference-guided multiple sequence alignment of viral genomes.
- sketchy - Genomic neighbor typing for lineage and genotype inference.
- GeneGrouper - CLI tool for finding gene clusters in many genomes and placing them in discrete groups based on gene content similarity.
- WhatsHap - Software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly.
- RNAflow - Effective and simple RNA-Seq differential gene expression pipeline using Nextflow.
- KneadData - Tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
- RiboDetector - Accurately yet rapidly detect and remove rRNA sequences from metagenomic, metatranscriptomic, and ncRNA sequencing data.
- NanoSim - Nanopore sequence read simulator.
- GenVisR - Genome data visualizations.
- Glow - Open-source toolkit for large-scale genomic analysis.
- EVcouplings - Evolutionary couplings from protein and RNA sequence alignments.
- alevin-fry - Efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
- Differential Gene Expression using RNA-Seq (Workflow)
- solo - Doublet detection via semi-supervised deep learning.
- scNym - Semi-supervised adversarial neural networks for single cell classification.
- FLAMES - Full-length transcriptome splicing and mutation analysis.
- Open Problems in Single-Cell Analysis - Formalizing and benchmarking open problems in single-cell genomics.
- HiGlass - Fast, flexible and extensible genome browser. (Code)
- Automated Genome Assembly
- mashtree - Create a tree using Mash distances.
- Orpheum - Python package for directly translating RNA-seq reads into coding protein sequence.
- SHAPEwarp - SHAPE-guided RNA structural homology search.
- The complete sequence of a human genome (Article) (HN)
- cuteSV - Long-read-based detection of human genomic structural variation.
- pyfastx - Robust Python module for fast random access to sequences from plain and gzipped FASTA/Q files.
- Winnowmap - Long read / genome alignment software.
- MGEfinder - Toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
- Assembled Genomes Compressor (AGC) - Tool designed to compress collections of de-novo assembled genomes. It can be used for various types of datasets: short genomes (viruses) as well as long (humans).
- GenomeHubs - Designed to make it easy to set up and host a core set of bioinformatics tools to help research communities share and access genomic datasets for non-model organisms.
- Ribbon - Genome browser that shows long reads and complex variants better. (Code)
- genomepy - Install and use genomes & gene annotations the easy way.
- sgkit - Statistical genetics toolkit in Python.
- Macrel - Predict AMPs in (meta)genomes and peptides.
- Human genetic engineering is coming (2022) (HN)
- gget - Enables efficient querying of genomic databases.
- Raman2RNA: Live-cell label-free prediction of single-cell RNA expression profiles by Raman microscopy (2021) - Taking a high dimensional image of the cell itself and mapping that to transcriptome training data. (Tweet)
- Awesome Nanopore - List of software packages for Nanopore sequencing data analysis, including basecalling and DNA/RNA modifications.
- GNNome Assembly - Learning to untangle genome assembly with graph neural networks.
- starfish - Scalable pipelines for image-based transcriptomics.
- alv - View your DNA or protein multiple-sequence alignments right at your command line.
- New CRISPR-based map ties every human gene to its function (2022) (HN)
- fgbio - Tools for working with genomic and high throughput sequencing data.
- Dagr - Task and pipeline execution system for directed acyclic graphs to support scientific, and more specifically genomic, analysis workflows.
- FASTQ ME
- GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis (2022) (Video)
- A time-resolved, multi-symbol molecular recorder via sequential genome editing (2022)
- Single-cell data structures in TileDB
- Hierarchical progressive learning of cell identities in single-cell data
- A guide to antigen processing and presentation (2022)
- Snakemake - Snakemake pipeline for variant calling from raw sequences.
- Whole Genome Sequencing (HN)
- dRep - Rapid comparison and dereplication of genomes.
- ProLIF - Interaction Fingerprints for protein-ligand complexes and more.
- Live-cell micromanipulation of a genomic locus reveals interphase chromatin mechanics (2022) (Tweet)
- VirSorter 2 - Customizable pipeline to identify viral sequences from (meta)genomic data.
- LightDock - Protein-protein, protein-peptide and protein-DNA docking framework based on the GSO algorithm.
- ADAM - Genomics analysis platform with specialized file formats.
- DoRothEA - R package to access DoRothEA's regulons.
- LANTERN - Interpretable genotype-phenotype landscape modeling.
- La Jolla Assembler - Tool for genome assembly from PacBio HiFi reads based on de Bruijn graphs.
- GENA-LM - Transformer masked language model trained on human DNA sequence.
- RagTag - Tools for fast and flexible genome assembly scaffolding and improvement.
- Two-layer design protects genes from mutations in their enhancers (2022)
- The Era of Fast, Cheap Genome Sequencing Is Here (2022)
- EggNOG-mapper - Tool for fast functional annotation of novel sequences.
- RNA Sequencing - Building your own pipeline from scratch (2022)
- SPALN - Genome mapping and spliced alignment of cDNA or amino acid sequences.
- Resistance Gene Identifier (RGI) - Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.
- Genes Can Leap from Snakes to Frogs in Madagascar (2022)
- StringTie - Transcript assembly and quantification for RNA-Seq.
- GffRead - GFF/GTF utility providing format conversions, filtering, FASTA sequence extraction and more.
- GffCompare - Classify, merge, track, and annotate GFF files by comparing them to a reference annotation GFF.
- GCLib - Genomic C++ Library.
- rust-alignbench - Pairwise nucleotide alignment benchmark of Rust bindings.
- umis - Tools for processing UMI RNA-tag data.
- bcbio-nextgen - Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis.
- HTS - SAM and BAM handling for Go.
- Introduction to Genomics for Engineers (Lobsters) (Code)
- Consider working on genomics (2022) (HN)
- Pinello Lab - Massachusetts General Hospital / Harvard Medical School. (GitHub)
- scvi-tools - Deep probabilistic analysis of single-cell omics data.
- gggenomes - Grammar of graphics for comparative genomics.
- sgdemux - Singular Genomics Demultiplexing Tool.
- snp-dists - Pairwise SNP distance matrix from a FASTA sequence alignment.
- T2T-Polish - Evaluation and polishing workflows for T2T genome assemblies.
- GraffiTE - Pipeline that finds polymorphic transposable elements in genome assemblies and genotypes the discovered polymorphisms in read sets using a pangenomic approach.
- PyEnsembl - Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl.
- Inseq - Interpretability for Sequence Generation Models.
- fqgrep - Grep for FASTQ files.
- Fulcrum Genomics (GitHub) (Twitter)
- You can’t take it with you: straight talk about epigenetics (2022)
- VISION - Signature Analysis and Visualization for Single-Cell RNA-seq.
- Seqkit - Toolkit for manipulating FASTA and SAM files.
- psvcp - Pan-genome Construction and Population Structure Variation Calling Pipeline.
- Griffith Lab (GitHub)
- plastiC - Snakemake workflow for recovery of plastid genomes from metagenomic samples.
- RGT - Regulatory Genomics Toolbox.
- Circlator - Tool to circularize genome assemblies.
- Goalign - Set of command line tools to manipulate multiple alignments.
- gos - Declarative library for Python designed to create interactive multi-scale visualizations of genomics and epigenomics data.
- bap - Bead-based scATAC-seq data Processing.
- mgatk - Mitochondrial genome analysis toolkit.
- Falco - C++ drop-in replacement of FastQC to assess the quality of sequence read data.
- Haplotype-based variant detection from short-read sequencing (2012) (Code)
- spacepile - Convert reads from repeated measures of the same piece of DNA into spaced matrices for deep learners.
- NGSNGS - Next Generation Simulator for Next Generation Sequencing Data.
- hgvs - Python library to parse, format, validate, normalize, and map sequence variants.
- Gattaca is still pertinent 25 years later (2022) (HN)
- skani - Fast, robust ANI and aligned fraction for metagenomic genomes and contigs.
- stRainy - Graph-based assembly phasing.
- sourmash - Quickly search, compare, and analyze genomic and metagenomic data sets.
- HAT - Tools for calling de novo variants from whole-genome sequencing data.
- The ODIN - DIY genetic engineering. (HN)
- scDrug - From scRNA-seq to Drug Repositioning.
- First UK child to receive gene therapy for fatal genetic disorder is now healthy (2023)
- uBin - Software for manual curation of genomes from metagenomes.
- GenomeScope: Fast genome analysis from unassembled short reads
- pafplot - Base-level sequence alignment rasterizer / dotplot generator.
- PanGenie - Short-read genotyper for various types of genetic variants (such as SNPs, indels and structural variants) represented in a pangenome graph.
- Lakeview - Python 3 library for creating publication-quality IGV-style genomic visualizations.
- The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics (2023) (Code)
- Plotsr - Tool to plot synteny and structural rearrangements between genomes.
- GraphBin - Refined binning of metagenomic contigs using assembly graphs.
- Grass - Genomics data manipulation and analysis system.
- Hidden RNA repair mechanism discovered in humans (2023) (HN)
- Splatter - Simple simulation of single-cell RNA sequencing data.
- VarTrix - Single-Cell Genotyping Tool.
- RNA Tools - Toolbox to analyze sequences, structures and simulations of RNA (and more).
- sleuth - Program for differential analysis of RNA-Seq data.
- seqspec - Machine-readable file format for genomic library sequence and structure.
- skc - Shared k-mer content between two genomes.
- Genomic Benchmarks - Benchmarks for classification of genomic sequences.
- Metabuli - Specific and sensitive metagenomic classification via joint analysis of DNA and amino acid.
- Bioframe - Pandas utilities for tab-delimited and other genomic data files.
- NextPolish2 - Repeat-aware polishing of genomes assembled using HiFi long reads.