Computer vision

LiT (Locked-image Tuning) paper is neat. Trying to understand Vision Transformers. Kornia & Scenic seem like great libraries. Imagen is fascinating.

Embedding Methods for Image Search & Computer Vision: Models, Learning, and Inference are nice reads.

Rerun is a great CV visualization tool.

Links

OpenCV - Open Source Computer Vision Library. (Web) (OpenCV Course)
Gluon CV Toolkit - Provides implementations of the state-of-the-art (SOTA) deep learning models in computer vision.
Pythia - Modular framework for vision and language multimodal research. Built on top of PyTorch.
video-object-removal - Just draw a bounding box and you can remove the object you want to remove.
GoCV - Go package for computer vision using OpenCV 4 and beyond.
Sandbox for training convolutional networks for computer vision
Get started with Computer Vision, Deep Learning, and OpenCV
TorchCV - PyTorch-Based Framework for Deep Learning in Computer Vision.
AI Habitat - Flexible, high-performance 3D simulator for Embodied AI research.
Kornia - Open Source Differentiable Computer Vision Library for PyTorch. (Web)
Roboflow - Raw images to trained computer vision model. (Article)
PySlowFast - Open source video understanding codebase from FAIR that provides state-of-the-art video classification models.
How to Convert a Picture to Numbers
Awesome Computer Vision
The Ancient Secrets of Computer Vision (2018)
Variational Methods for Computer Vision lectures (2013)
Classy Vision - New end-to-end, PyTorch-based framework for large-scale training of state-of-the-art image and video classification models.
Meshroom - 3D Reconstruction Software.
AliceVision - Photogrammetric Computer Vision Framework. (Code) (GitHub)
PyTorch3D - Provides efficient, reusable components for 3D Computer Vision research with PyTorch. (Web)
Face Recognition - World's simplest facial recognition API for Python and the command line.
Deep Hough Voting for 3D Object Detection in Point Clouds
Point Cloud Library - Standalone, large scale, open project for 2D/3D image and point cloud processing.
Disappearing-People - Removing people from complex backgrounds in real time using TensorFlow.js in the web browser. (HN)
Best Practices, code samples, and documentation for Computer Vision
Computer Vision Basics in Microsoft Excel
PolyGen: An Autoregressive Generative Model of 3D Meshes (2020)
Sophus - C++ implementation of Lie Groups using Eigen.
SOLT - Streaming over lightweight data transformations.
Awesome Interaction-aware Behavior and Trajectory Prediction
SynSin: End-to-end View Synthesis from a Single Image (2020) (Code)
Pixel2Mesh - Generating 3D Mesh Models from Single RGB Images.
First Order Motion Model for Image Animation (Code)
PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution
Learning to See Through Obstructions
Learning to Cluster Faces on an Affinity Graph (LTC)
Avatarify - Avatars for Zoom and Skype.
SPSR - PyTorch implementation of Structure-Preserving Super Resolution with Gradient Guidance.
OISR-PyTorch - PyTorch implementation of "ODE-inspired Network Design for Single Image Super-Resolution".
3D Photography using Context-aware Layered Depth Inpainting
CenterMask: Real-Time Anchor-Free Instance Segmentation
Interview with Dmytro Mishkin | Computer Vision Research | Kaggle, ML & Education (2020)
PyTorch code for ICLR-20 Paper "Learning to Explore using Active Neural SLAM"
FaceTracker - Real time deformable face tracking in C++ with OpenCV 3.
Awesome Super Resolution
Adversarial Latent Autoencoders
ElasticFusion - Real-time dense visual SLAM system capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera.
StegaStamp: Invisible Hyperlinks in Physical Photographs
Pose Animator - Takes a 2D vector illustration and animates its containing curves in real-time based on the recognition result from PoseNet and FaceMesh. (HN)
fvcore - Collection of common code that's shared among different research projects in the FAIR computer vision team.
Making Sense of Vision and Touch: Multimodal Representations for Contact-Rich Tasks (2020)
ScreenPoint - Project an image centroid to another image using OpenCV.
U^2-Net - Code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection".
TorchIO - Tools for medical image processing in deep learning.
Real time Image Animation in OpenCV using first order model (HN)
OpenMV (Open-Source Machine Vision) - Aims at making machine vision more accessible to beginners by developing a user-friendly, open-source, low-cost machine vision platform.
TSD - 1st place models in Google OpenImage Detection Challenge 2019.
Training-Time-Friendly Network for Real-Time Object Detection
Big Transfer (BiT): General Visual Representation Learning
Fast Human Pose Estimation CVPR2019
Deep High-Resolution Representation Learning for Human Pose Estimation
Background Matting: The World is Your Green Screen
DE⫶TR: End-to-End Object Detection with Transformers
PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
Tracking Objects as Points
VIBE - Video Inference for Human Body Pose and Shape Estimation.
SRZoo - Integrated repository for super-resolution using deep learning.
mAP (mean Average Precision) - Evaluates the performance of your neural net for object recognition.
Neural Pose Transfer by Spatially Adaptive Instance Normalization (2020)
Awesome Neural Rendering
Learning To Classify Images Without Labels
Deep Leakage From Gradients (2019)
3Dflow - Offers customized computer vision software solutions.
labelme - Image Polygonal Annotation with Python.
imgviz - Image Visualization Tools.
Attention-Guided Hierarchical Structure Aggregation for Image Matting
YOLOv5 Is Here: State-of-the-Art Object Detection at 140 FPS (2020) (HN) (Code)
DetectoRS - Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.
PyTorch implementation of paper Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs
VirTex: Learning Visual Representations from Textual Annotations
High-Resolution 3D Human Digitization from A Single Image
FairMOT - Simple baseline for one-shot multi-object tracking.
Implicit Neural Representations with Periodic Activation Functions (2020)
MSeg: A Composite Dataset for Multi-Domain Segmentation
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
MMDetection - OpenMMLab Detection Toolbox and Benchmark.
Fourier Feature Networks in TensorFlow 2
Computer Vision Lab | ETH Zurich
PyTorch Computer Vision Library for Experts and Beginners (2020)
Computer Vision Pretrained Models
Fawkes: Image “Cloaking” for Personal Privacy (HN)
Motion - Software motion detector.
Supervised 3D Mesh Reconstruction (2020)
NeRF in the Wild - Neural Radiance Fields for Unconstrained Photo Collections.
NASA: Neural Articulated Shape Approximation (2020)
An Overview of Deep Learning Architectures in Few-Shot Learning Domain (2020)
FutureMapping: The Computational Structure of Spatial AI Systems (2018) (Tweet)
Optimal Peanut Butter and Banana Sandwiches (2020) (Twitter)
Gesture Recognition with Line Integrals (Code)
Computer Vision: Looking Back to Look Forward (2020)
DAIN (Depth-Aware Video Frame Interpolation)
Picsellia - Development platform dedicated to Computer Vision.
Official implementation of "PifPaf: Composite Fields for Human Pose Estimation" in PyTorch
Object Recognition with Gradient-Based Learning (1999)
Imaginaire - NVIDIA PyTorch GAN library with distributed and mixed precision support. (Docs)
DeepBackSub - Virtual Video Device for Background Replacement with Deep Semantic Segmentation.
Awesome Tiny Object Detection
Flow-edge Guided Video Completion
5 Things to look for in a Computer Vision startup job (2020)
Transformers for Image Recognition at Scale (2020) (HN)
nnU-Net - Segmentation method designed to deal with dataset diversity.
batchgenerators - Framework for data augmentation for 2D and 3D image classification and segmentation.
Lookuq - App to create object detection projects without coding. (HN)
InsightFace - Face Analysis Project on MXNet. (Web)
PyTorch implementation of SwAV (Swapping Assignments between Views)
Asymmetric Loss For Multi-Label Classification in PyTorch
Antialiased CNNs - Making Convolutional Networks Shift-Invariant Again.
Perceptual Similarity Metric and Dataset - Unreasonable Effectiveness of Deep Features as a Perceptual Metric.
Deep Learning Anime Papers
Vision Transformer - Models from the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
Handsfree.js - Wrapper library around computer vision models for working with face pointers, assistive tech, and creative expression. (Web)
ZeroQ: A Novel Zero Shot Quantization Framework
SqueezeNext - Contains the Caffe implementation of SqueezeNext.
ANODE: Adjoint Based Neural ODEs
Python Video Stabilization using OpenCV
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
TorchCV - PyTorch vision library that mimics ChainerCV.
Vision Transformer in PyTorch
MedicalTorch - Medical imaging framework for PyTorch. (Docs)
imagecluster - Cluster images based on image content using a pre-trained deep neural network, optional time distance scaling and hierarchical clustering.
Detecto - Build fully-functioning computer vision models with PyTorch. (Docs)
EmoPy - Deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER).
PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder"
Label Decoupling Framework for Salient Object Detection
MONAI - PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. (Web)
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
Faster R-CNN Explained for Object Detection Tasks (2020)
How to Install OpenCV on a Raspberry Pi (2020)
Contextual Encoder-Decoder Network for Visual Saliency Prediction
PyImageSearch - Master Computer Vision, Deep Learning, and OpenCV.
Natural Adversarial Examples - Harder ImageNet Test Set.
How to upload 50 OpenCV frames into cloud storage within 1 second (2020)
Egocentric Videoconferencing (2020) - Method for egocentric videoconferencing that enables hands-free video calls, for instance by people wearing smart glasses or other mixed-reality devices. (Video overview)
gradslam - Open source differentiable dense SLAM library for PyTorch.
High-Resolution Daytime Translation Without Domain Labels
Holistically-Nested Edge Detection
pycls - Image classification codebase, written in PyTorch.
PyTorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression
How Useful is Self-Supervised Pretraining for Visual Tasks?
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image
Multi-object trackers in Python - Easy to use implementation of various multi-object tracking algorithms.
Stanford Vision and Learning Lab (GitHub)
Learning computer vision. Overview of methods and software (2018)
Image embeddings. Image similarity and building (2020) (Code)
All You Need to Know About Object Detection Systems (2020)
Lightly - Computer vision framework for self-supervised learning.
DISK: Learning local features with policy gradient (2020) (Code)
Caer - Lightweight Computer Vision library for high-performance AI research. (Intro)
Awesome Image to Image Translation Papers
EfficientDet: Scalable and Efficient Object Detection, in PyTorch
UNet: semantic segmentation with PyTorch
Exploring Simple Siamese Representation Learning (2020) (Code) (Code)
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Nerfies: Deformable Neural Radiance Fields (Code)
Timeception for Complex Action Recognition (2019) (Code)
Programming Computer Vision with Python (2014) (Code) (Notes)
Fast and Accurate One-Stage Space-Time Video Super-Resolution (2020)
pixelNeRF: Neural Radiance Fields from One or Few Images (2020) (Code)
vedadet - Single stage object detector toolbox based on PyTorch.
OneNet: End-to-End One-Stage Object Detection by Classification Cost
Consistent Video Depth Estimation - Estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.
Implicit Neural Representations with Periodic Activation Functions
Computational Imaging Stanford Lab
Trimap-Free Solution for Portrait Matting in Real Time
Local Light Field Fusion
Awesome Crowd Counting
Neural Sparse Voxel Fields (NSVF)
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (2020) (Tweet)
SharpAI DeepCamera - Open source stack for machine learning engineering with private deployment and AutoML for edge computing. (HN)
Contrastive learning of global and local features for medical image segmentation with limited annotations
Real-Time High-Resolution Background Matting (2020) (Code)
Torchreid - Deep learning person re-identification in PyTorch.
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation
SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection
PCT: Point Cloud Transformer (2020) (Code)
Learning Continuous Image Representation with Local Implicit Image Function (2020) (Code)
Computer Vision Annotation Tool (CVAT)
DeiT: Data-efficient Image Transformers
Awesome Implicit Neural Representations
ImageAI - Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities. (Web)
RAIVN Lab - Reasoning, AI and VisioN (RAIVN) Lab. (GitHub)
Norfair - Customizable lightweight Python library for real-time 2D object tracking.
Universal Style Transfer in PyTorch
NVIDIA Deep learning Dataset Synthesizer (NDDS)
Object Detection at 2530 FPS with TensorRT and 8-Bit Quantization (2020)
HTML4Vision - Simple HTML visualization tool for computer vision research.
Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
Taming Transformers for High-Resolution Image Synthesis
X-Temporal - Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs.
NanoDet - Super fast and lightweight anchor-free object detection model. Real-time on mobile devices.
PyTorch Image Models
Awesome Vision and Language - Curated list of awesome vision and language resources.
DropBlock: A regularization method for convolutional networks (2018) (Code)
Glasses - Compact, concise and customizable deep learning computer vision library. (Web)
Explorable Super Resolution (2019)
PySceneDetect - Python and OpenCV-based scene cut/transition detection program & library.
Best Practices for Building Computer Vision Models (2021)
TIDE - General Toolbox for Identifying Object Detection Errors.
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (2020) (Code)
Unsplash Image Search - Search photos on Unsplash using natural language.
Kimera Semantics - Real-Time 3D Semantic Reconstruction from 2D data.
Voxblox++ - Volumetric object-level semantic mapping framework.
Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces (Code)
Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video (2020) (Code)
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation (2019) (Code)
Awesome Neural Radiance Fields
D2Det: Towards High Quality Object Detection and Instance Segmentation (2020)
DetCo: Unsupervised Contrastive Learning for Object Detection (2021) (Code) (Code)
Computer Vision Video Lectures - Curated list of free, high-quality, university-level courses with video lectures related to the field of Computer Vision.
Cord - Training data toolbox for computer vision. (HN)
Text-Guided Editing of Images (Using CLIP and StyleGAN)
torchvision - Datasets, Transforms and Models specific to Computer Vision. (Web)
MeInGame: Create a Game Character Face from a Single Portrait (2021) (Code)
Awesome Deep Vision
dataset-tools - Tools for quickly normalizing image datasets.
Using Streamlit to visualize object detection output (2021)
Mobile Computer Vision @ Facebook
Opening the black box of vision AI algorithms (2021)
CompreFace - Free face recognition solution that can be easily integrated into any IT system without prior machine learning skills.
IBRNet: Learning Multi-View Image-Based Rendering (2021) (Code)
From Coarse to Fine: Robust Hierarchical Localization at Large Scale (2019) (Code)
Camera Response Function (2021)
I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image (2020) (Code)
SkipNet: Learning Dynamic Routing in Convolutional Networks (2018) (Code)
Mrcal - Camera Calibrations and More. (HN)
Digging Into Self-Supervised Monocular Depth Estimation (2019) (Code) (Code)
VISSL - FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images. (Web)
Zumo Labs - Generate custom synthetic data sets that result in more robust and reliable computer vision models. (GitHub)
Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors (2020) (Code)
Perceiver: General Perception with Iterative Attention (2021) (Code)
SEER: The start of a more powerful, flexible, and accessible era for computer vision (2021)
NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction (2021)
Neural 3D Video Synthesis
Involution: Inverting the Inherence of Convolution for Visual Recognition (2021) (Code)
Awesome Causality in Computer Vision
Vision Transformers for Dense Prediction (2021) (Code)
LoFTR: Detector-Free Local Feature Matching with Transformers (2021) (Code)
ccv - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.
Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (2020) (Code)
AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (2021) (Tweet)
Computer Vision and Embroidery (2021) (Code)
mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields (2021)
Python libraries I use every day for computer vision work (2021)
Awesome Temporal Sentence Grounding in Videos
The Affective Growth of Computer Vision
Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (2020) (Code)
End-to-End Video Instance Segmentation with Transformers (2021) (Code)
SAHI: Slicing Aided Hyper Inference
FOVO: A new 3D rendering technique based on human vision (2020) (HN)
Is Space-Time Attention All You Need for Video Understanding? (2021) (Code)
Awesome Visual-Transformer - Transformer with Computer-Vision (CV) papers.
PyTorchVideo - Deep learning library for video understanding research. (Web)
Self-supervised Video Object Segmentation by Motion Grouping (2021) (HN) (Code)
torchvideo - Datasets, transforms and samplers for video in PyTorch.
A General and Adaptive Robust Loss Function (2019) (Code)
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (2020) (Code)
MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation (2021)
Vizy - AI Camera.
MMPX Style-Preserving Pixel Art Magnification (2021) (HN)
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (Code)
SuperPoint: Self-Supervised Interest Point Detection and Description (2018) (Code)
Multi-Stage Progressive Image Restoration (2021) (Code)
COLMAP - General-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. (Docs)
Awesome Vision-based SLAM / Visual Odometry
Barlow Twins: Self-Supervised Learning via Redundancy Reduction (2021) (Code)
HIPCL - OpenCL/SPIR-V implementation of HIP.
MMCV - Foundational library for computer vision research that supports many research projects. (Docs)
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (2021) (Code)
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (2021) (Code) (Code)
Emerging Properties in Self-Supervised Vision Transformers (2021) (Code) (Tweet) (Tweet)
Geometry-Free View Synthesis: Transformers and no 3D Priors (2021) (Code)
Easily Transform Portraits of People into AI Aberrations Using StyleCLIP (2021)
DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates (2021) (Code)
Onepanel - Open and extensible integrated development environment (IDE) for computer vision. (Web)
Vector Neurons: A General Framework for SO(3)-Equivariant Networks (2021) (Code)
ISTR: End-to-End Instance Segmentation with Transformers (2021) (Code)
MLP-Mixer: An all-MLP Architecture for Vision (2021) (Code) (Code)
Self-attention building blocks for computer vision applications in PyTorch
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary (2021) (Web) (Code)
Neural Rendering: How Low Can You Go in Terms of Input? (2021)
Enhancing Photorealism Enhancement (2021) (Paper) (Code)
DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control (2021) (Code)
Omnimatte: Associating Objects and Their Effects in Video (2021)
Rethinking "Batch" in BatchNorm (2021)
Most popular metrics used to evaluate object detection algorithms
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation (2020) (Code)
Synthetic for Computer Vision - List of synthetic datasets and tools for computer vision.
vision_blender - Blender addon for generating synthetic ground truth data for Computer Vision applications.
Easy Few-Shot Learning - Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification.
BasicSR (Basic Super Restoration) - Open source image and video restoration toolbox based on PyTorch, such as super-resolution, denoise, deblurring, JPEG artifacts removal, etc.
Intriguing Properties of Vision Transformers (2021) (Reddit)
DIY Amazon Go – computer vision tutorial for cashierless checkout
Image Retrieval in the Wild (2020)
Awesome Transformer in CV papers
Sensor Calibration from Scratch with Rust (2021)
Tangram Vision - Integrate, Calibrate Perception Sensors For Robots, Drones & Automation. (Blog)
Rust CV - Project to implement computer vision algorithms, abstractions, and systems in Rust.
Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control (2021) (HN)
Robust Instance Segmentation through Reasoning about Multi-Object Occlusion (2021) (Code)
MERLOT: Multimodal Neural Script Knowledge Models (2021) (Tweet)
Scaling Vision Transformers (2021)
Self-Supervised Scene De-occlusion (2020) (Code)
Pivotal Tuning for Latent-based Editing of Real Images (2021) (Code)
FLAME: Articulated Expressive 3D Head Model (Code)
XCiT: Cross-Covariance Image Transformers (2021) (Code)
Robust Consistent Video Depth Estimation (2021) (Code)
cvpods - All-in-one Toolbox for Computer Vision Research.
CDFI: Compression-Driven Network Design for Frame Interpolation (2021) (Code)
NeRF--: Neural Radiance Fields Without Known Camera Parameters (2021) (Code) (Code)
Oxford Active Vision Laboratory (GitHub)
Computer Vision: Algorithms and Applications, 2nd ed.
motionEyeOS - Linux distribution that turns your single board computer into a video surveillance system.
Long-Short Transformer: Efficient Transformers for Language and Vision (2021) (Code)
Feature Visualization – How NNs understand images (2017)
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (2019) (Code)
Convolutional Hough Matching Networks (2021) (Code)
Efficient Self-Supervised Vision Transformers (EsViT)
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases (2021) (Code) (Paper Read) (Article)
CO3D: Common Objects In 3D - Tools for working with the Common Objects in 3D (CO3D) dataset.
ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition (2021) (Code)
Vision Transformer Architecture Search (2021) (Code)
TSIT: A Simple and Versatile Framework for Image-to-Image Translation (2020) (Code)
Recognizing People in Photos Through Private On-Device Machine Learning (2021)
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation (2021) (Code)
HPNet: Deep Primitive Segmentation Using Hybrid Representations (2021) (Code)
Portal - Fastest way to load and visualize your deep neural networks on images and videos.
Awesome Human Pose Estimation
Learning A Single Network for Scale-Arbitrary Super-Resolution (2021) (Code)
PyTorch implementation for Vision Transformer
Repulsive Curves - Model 2D & 3D curves while avoiding self-intersection. (Tweet) (Code) (HN)
SDEdit: Image Synthesis and Editing with Stochastic Differential Equations (Code)
Region Similarity Representation Learning (2021) (Code)
NeX: Real-time View Synthesis with Neural Basis Expansion (2021) (Code)
Convolutional Occupancy Networks (2020) (Code)
Learning Optical Flow from a Few Matches (2021) (Code)
Visual Parser: Representing Part-whole Hierarchies with Transformers (2021) (Code)
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation (Code)
On Generating Transferable Targeted Perturbations (2021) (Code)
Awesome Scene Understanding - List of papers for scene understanding.
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (2021) (Code)
DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks (2021) (Code)
Object Detection in an Hour (2021) (HN)
Fixing the train-test resolution discrepancy (2020) (Code)
Align Deep Features for Oriented Object Detection (2020) (Code)
Vision-Language Transformer and Query Generation for Referring Segmentation (2021) (Code)
Depth-supervised NeRF: Fewer Views and Faster Training for Free (2021) (Code)
SwinIR: Image Restoration Using Swin Transformer (2021) (Code)
You Only Learn One Representation: Unified Network for Multiple Tasks (2021) (Code)
Probabilistic Modeling for Human Mesh Recovery (2021) (Code)
BARF: Bundle-Adjusting Neural Radiance Fields (2021) (Code)
Self-Calibrating Neural Radiance Fields (2021) (Code)
Transformers-Tutorials - Demos I made with the Transformers library by HuggingFace.
3D Human Texture Estimation from a Single Image with Transformers (2021) (Code)
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (2021) (Code)
RAFT: Recurrent All Pairs Field Transforms for Optical Flow (2020) (Code)
Volume rendering + 3D implicit surface = Neural 3D Reconstruction
Hierarchical Deep Stereo Matching on High-resolution Images (2019) (Code)
Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering (2021) (Code)
Image Synthesis via Semantic Composition (2021) (Code)
Awesome-Edge-Detection-Papers
Awesome-Image-Colorization
Face Recognition - 2D and 3D Face alignment library built using PyTorch.
Awesome image retrieval papers
PeekingDuck - Modular framework built to simplify Computer Vision inference workloads.
Pri3D: Can 3D Priors Help 2D Representation Learning? (2021) (Code)
FaceXLib - Aims at providing ready-to-use face-related functions based on current SOTA open-source methods.
MMAction2 - Open-source toolbox for video understanding based on PyTorch.
Awesome Collision Detection
Video Super-Resolution Transformer (2021) (Code)
NeRF Atlas - Collection of NeRF extensions for fun and experimentation.
Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, BSRGAN, SwinIR
Uformer: A General U-Shaped Transformer for Image Restoration (2021) (Code) (Code)
Self-Supervised Pretraining Improves Self-Supervised Pretraining (2021) (Code)
SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes (2021) (Code)
HRFormer: High-Resolution Transformer for Dense Prediction, NeurIPS 2021
IceVision - Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come. (Docs)
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (2021) (Tweet)
Attention Gated Networks (Image Classification & Segmentation) in PyTorch
Full-Duplex Strategy for Video Object Segmentation (2021) (Code)
YoHa - Practical hand tracking engine. (HN) (Code)
Deep Learning for Face Anti-Spoofing: A Survey (2021) (Code)
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (2021) (Code)
Resolution-robust Large Mask Inpainting with Fourier Convolutions (2021) (Code)
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021) (Code) (Code) (HN)
ADOP: Approximate Differentiable One-Pixel Point Rendering (2021) (Tweet) (Tweet) (Code)
Patches Are All You Need? (2021) (Code)
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation (2020) (Code)
Video Panoptic Segmentation (2020) (Code)
Awesome-ICCV2021-Low-Level-Vision - Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation.
Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (2021) (Code)
Non-deep Networks (2021) (Code)
receptivefield - Gradient based receptive field estimation for Convolutional Neural Networks.
Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations (2021) (Code)
Neural Articulated Radiance Field (2021) (Code)
Efficient Visual Pretraining with Contrastive Detection (2021) (Code)
VoTT (Visual Object Tagging Tool) - Open source annotation and labeling tool for image and video assets.
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes (2021) (Code)
ByteTrack: Multi-Object Tracking by Associating Every Detection Box (2021) (Code)
Dense Video Captioning with Bi-modal Transformer (2020) (Code)
PyTorch-Encoding - CV toolkit for my papers. (Docs)
Space Time Recurrent Memory Network (2021) (Code)
CVNets - Library for training computer vision networks.
Scenic - Jax Library for Computer Vision Research and Beyond. (Paper)
CV Arxiv Daily (Code)
OpenVisionCapsules - Set of libraries for encapsulating smart vision algorithms.
MedMNIST: Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification (Code)
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (2021) (Code)
Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces (2021) (Code)
The 2021 Image Similarity Dataset and Challenge (2021) (Code)
K-Net: Towards Unified Image Segmentation (2021) (Code)
Yolov5 + Deep Sort with PyTorch
Shape As Points: A Differentiable Poisson Solver (2021) (Code)
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (2021) (Code)
Awesome Vision-Language Navigation
An Exploration of Embodied Visual Exploration (2021) (Code)
DVC: An End-to-end Deep Video Compression Framework (2019) (Code)
Pixray - Neural image generation.
Unsupervised Learning of Compositional Energy Concepts (2021) (Tweet)
Learning with Noisy Labels for Robust Point Cloud Segmentation (2021) (Code)
Kalidoface - Become a virtual character with just your webcam. (Web)
KalidoKit - Face, Pose, and Hand Tracking Kinematics.
The Ancient Secrets of Computer Vision
Unsupervised Real-world Image Super Resolution via Domain-distance Aware Training (2020) (Code)
PyGaze - Open source eye-tracking software and more. (HN)
Exploring Relational Context for Multi-Task Dense Prediction (2021) (Code)
Neural Scene Graphs for Dynamic Scenes (2021) (Code)
Image Super-Resolution via Iterative Refinement (HN) (Code)
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning (2021) (Code)
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers (2021) (Code)
Multimodal Virtual Point 3D Detection (2021) (Code)
SiT: Self-supervised vIsion Transformer
Attention Mechanisms in Computer Vision: A Survey (2021)
Awesome Vision Attention Papers
FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation (2021) (Code)
RenderNet: A deep convolutional network for differentiable rendering from 3D shapes (2018) (Code)
Masked Autoencoders Are Scalable Vision Learners (2021) (Code) (Code) (Code)
BoostingMonocularDepth
It's About Time: Analog Clock Reading in the Wild (2021) (Tweet) (Code)
Learning to Compose Visual Relations (2021) (Code)
LF-Net: Learning Local Features from Images (2018) (Code)
Aligning Pretraining for Detection via Object-Level Contrastive Learning (2021) (Code)
Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis (2021) (Code)
Deep unfolding network for image super-resolution (2020)
VOLO: Vision Outlooker for Visual Recognition (2021) (Code)
Direct Multi-view Multi-person 3D Pose Estimation (2021) (Code)
Image2Mesh: A learning framework for single image 3D reconstruction (2019) (Code)
GammaCV - WebGL accelerated Computer Vision library for modern web applications. (Web)
Localizing Objects with Self-Supervised Transformers and no Labels (2021) (Code)
Harvester - GenICam-based Image Acquisition Python Library.
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (2021) (Code) (PyTorch Code)
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (2021) (Code)
MetaFormer is Actually What You Need for Vision (2021) (Code)
ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators (2021) (Code)
Mesa: A Memory-saving Training Framework for Transformers (2021) (Code)
MMPose - Open-source toolbox for pose estimation based on PyTorch. (Docs)
An Empirical Study of Training End-to-End Vision-and-Language Transformers (2021) (Code)
Useful computer vision PhD resources
Tenyks - Data-centric Computer Vision.
Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation (2021) (Code)
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (2021) (Code)
Learning to See by Looking at Noise (2021) (Code)
iBOT: Image BERT Pre-Training with Online Tokenizer (2021) (Code)
Grounded Language-Image Pre-training (2021) (Code)
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction (2016) (Code)
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (Code)
Awesome Visual Grounding
Are Transformers More Robust Than CNNs? (2021) (Code)
Plenoxels: Radiance Fields without Neural Networks (2021) (Code) (Code)
GFPGAN - Developing Practical Algorithms for Real-world Face Restoration.
Awesome Video Stabilization
MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo (2021) (Code)
Tracking People with 3D Representations (2021) (Code)
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection (2019) (Code)
Learning to Stylize Novel Views (2021) (Code)
YOLOX - High-performance anchor-free YOLO. (Docs)
PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop (2021) (Code)
SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation (2021) (Code)
NeRD: Neural Reflectance Decomposition from Image Collections (2021) (Code)
Vector Quantized Diffusion Model for Text-to-Image Synthesis (2021) (Code) (Code) (Code)
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (2021) (Code)
SynthDet - End-to-end object detection pipeline using synthetic data.
MPViT: Multi-Path Vision Transformer for Dense Prediction (2021) (Code)
StyleSwin: Transformer-based GAN for High-resolution Image Generation (2021) (Code)
Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline (2021) (Code)
SLIP: Self-supervision meets Language-Image Pre-training (2021) (Code)
General Facial Representation Learning in a Visual-Linguistic Manner (2021) (Code) (Code)
HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields (Code) (HN)
Learning to Regress Bodies from Images using Differentiable Semantic Rendering (2021) (Code)
High-Resolution Image Synthesis with Latent Diffusion Models (2021) (Code)
Photorealistic Audio-driven Video Portraits (2020) (Code)
Awesome Hand Pose Estimation
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (2021) (Code)
Transformer Interpretability Beyond Attention Visualization (2021) (Code)
StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis (2021) (Code)
Light Field Image Super-Resolution with Transformers (2021) (Code)
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (2021) (Code)
DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample (2021) (Code)
RAFT-3D: Scene Flow using Rigid-Motion Embeddings (2021) (Code)
Unsupervised Indoor Depth Estimation (2020) (Code)
A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose (2021) (Code)
Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective (2021) (Code)
Sara - Easy-to-Use C++ Computer Vision Library.
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching (2021) (Code)
U-2-Net: Going Deeper with Nested U-Structure for Salient Object Detection (2020) (Code)
Language as Queries for Referring Video Object Segmentation (2022) (Code)
Localization with Sampling-Argmax (2021) (Code)
VOCA: Voice Operated Character Animation (Code)
CVZone - Computer vision package that makes it easy to run image processing and AI functions.
Deepface - Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python.
Location-aware Single Image Reflection Removal (2021) (Code)
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (2021) (Code)
Detecting Twenty-thousand Classes using Image-level Supervision (2022) (Code)
Language-driven Semantic Segmentation (2022) (Code)
Rethinking Nearest Neighbors for Visual Classification (2021) (Code)
Vision Transformer with Deformable Attention (2022) (Code) (Code)
KerasCV - Industry-strength Computer Vision workflows with Keras.
Instant Neural Graphics Primitives - Lightning fast NeRF and more.
Dynamic Head: Unifying Object Detection Heads with Attentions (2021) (Code)
ELSA: Enhanced Local Self-Attention for Vision Transformer (2021) (Code)
FFCV - Fast Forward Computer Vision (and other ML workloads!) (Web)
Awesome Vit - Curated list and survey of awesome Vision Transformers.
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding (2022) (Code) (Code) (Video Summary) (HN)
Road Extraction by Deep Residual U-Net (2017) (Code)
Single-Stage 6D Object Pose Estimation (2019) (Code)
Visual Task Adaptation Benchmark (VTAB)
TAda! Temporally-Adaptive Convolutions for Video Understanding (2022) (Code)
UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (2021) (Code)
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects (2020) (Code)
VRT: A Video Restoration Transformer (2021) (Code)
Unknown Object Segmentation from Stereo Images (2021) (Code)
Stacked Cross Attention for Image-Text Matching (2018) (Code)
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022) (Code)
DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows (2021) (Code)
DocFormer: End-to-End Transformer for Document Understanding (2022) (Code)
SeMask: Semantically Masked Transformers for Semantic Segmentation (2021) (Code)
Image Quality Assessment: Unifying Structure and Texture Similarity (2020) (Code)
Learning Super-Features for Image Retrieval (2022)
YOLOv7 - Framework Beyond Detection.
A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model (2021) (Code)
Single/Multiple Object Tracking and Segmentation
Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection (2021) (Code)
HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping (2021) (Code)
Scalable Large Scene Neural View Synthesis (2022) (HN)
Transformer Recipe - Quick recipe to learn all about Transformers.
NeROIC: Neural Rendering of Objects from Online Image Collections (2022) (Code)
DiffusionNet: Discretization Agnostic Learning on Surfaces (2022) (Code)
FILM: Frame Interpolation for Large Motion (2022) (Code) (HN)
Learning Signed Distance Field for Multi-view Surface Reconstruction (2021) (Code)
Deep Metric Learning in PyTorch
ICON: Implicit Clothed humans Obtained from Normals (2021) (Code)
CLIPasso: Semantically-Aware Object Sketching (2022) (Code)
BANMo: Building Animatable 3D Neural Models from Many Casual Videos (2022) (Code)
How Do Vision Transformers Work?
Top 10 Computer Vision Papers of 2021
Exploring Sparsity in Image Super-Resolution for Efficient Inference (2021) (Code)
AutoInt: Automatic Integration for Fast Neural Volume Rendering (2021)
Learning to Prompt for Vision-Language Models (2021) (Code)
Summarizing Videos with Attention (2019) (Code)
vkit - Toolkit designed for CV (Computer Vision) developers. (Docs)
Generative Adversarial Graph Convolutional Networks for Human Action Synthesis (2021) (Code)
Awesome Image Matting
Image-to-Markup Generation with Coarse-to-Fine Attention (Code)
Push-ups with Python, mediapipe and OpenCV (HN)
Lama-cleaner - Image inpainting tool powered by LaMa.
Vision-Language Pre-Training with Triple Contrastive Learning (2022) (Code)
3D Machine Learning resources/papers
FiftyOne - Open-source tool for building high-quality datasets and computer vision models.
Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut (2022) (Code)
Awesome Multiple object Tracking
Rethinking Coarse-to-Fine Approach in Single Image Deblurring (2021) (Code)
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling (2021) (Code)
As-ViT: Auto-scaling Vision Transformers without Training (2022) (Code)
Awesome 3D Body Papers
RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth (2021) (Code)
Image Similarity Challenge
Blended Diffusion for Text-driven Editing of Natural Images (2021) (Code)
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization (2021) (Code)
Awesome Object Pose
Video Enhancement papers/resources
PowerQE: An Open Framework for Quality Enhancement of Compressed Visual Data
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels (2022) (Code)
Accurate Image Alignment and Registration Using OpenCV (2022) (HN)
Video Grounding and Captioning
Awesome Detection Transformer
StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis (2021) (Code) (Web) (HN)
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition (2020) (Code)
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation (2021) (Code)
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (2022) (Code)
Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation (2022)
CycleMLP: A MLP-like Architecture for Dense Prediction (2022) (Code)
Image Quality Assessment Benchmark
StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation (2021) (Code)
Transformers, originally designed to handle language, are taking on vision (2022) (HN)
Fast Image Processing with Fully-Convolutional Networks (2017) (Code)
Efficient Attention: Attention with Linear Complexities (2020) (Code)
Label-Efficient Semantic Segmentation with Diffusion Models (2022) (Code)
hloc - Modular toolbox for state-of-the-art 6-DoF visual localization.
All Tokens Matter: Token Labeling for Training Better Vision Transformers (2021) (Code)
Deformable ConvNets v2: More Deformable, Better Results (2018) (Code)
Restormer: Efficient Transformer for High-Resolution Image Restoration (2021) (Code)
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice (2022) (Code)
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video (2021) (Code)
Awesome 3D Human Reconstruction
Awesome 3D Human Resources List
A ConvNet for the 2020s (2022) (Code) (Code)
Remote-sensing-image-semantic-segmentation - Uses improved U-Net-based networks to study semantic segmentation of remote sensing images. Built on Keras.
Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies (2021) (Code)
TensoRF: Tensorial Radiance Fields (2022) (Code)
Autoregressive Image Generation using Residual Quantization (2022) (Code) (Code)
Pix2Pix Timbre Transfer
One-Shot Adaptation of GAN in Just One CLIP (2022) (Code)
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds (2021) (Code)
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (2022) (Code)
Awesome Masked Image Modeling
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training (2022) (Code)
A Transformer-Based Siamese Network for Change Detection (2022) (Code)
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition (2021) (Code)
Robust fine-tuning of zero-shot models (2022) (Code)
DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision (2021) (Code)
GroupViT: Semantic Segmentation Emerges from Text Supervision (2022) (Code)
HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening (2022) (Code)
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing (2022) (Code)
DeepStream-Yolo - NVIDIA DeepStream SDK 6.0.1 configuration for YOLO models.
An Empirical Investigation of 3D Anomaly Detection and Segmentation (2022) (Code)
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation (2021) (Code)
Layered Neural Atlases for Consistent Video Editing (2021) (Code)
TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (2020)
Shape from Polarization for Complex Scenes in the Wild (2022) (Code)
Pix2Seq - General framework for turning RGB pixels into semantically meaningful sequences.
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark (2022) (Code)
Ensembling Hugging Face Transformers made easy
Relational Knowledge Distillation (2019) (Code)
NICE-SLAM: Neural Implicit Scalable Encoding for SLAM (2021) (Code)
Neural 3D Mesh Renderer (2017) (Code)
Large-scale Bilingual Language-Image Contrastive Learning (2022) (Code)
OpenMVG - Open Multiple View Geometry library. Basis for 3D computer vision and Structure from Motion.
Neural Points: Point Cloud Representation with Neural Fields (2021) (Code)
OpenCV JS Web Worker - Getting started with OpenCV compiled to WebAssembly and loaded in a worker.
Learning Graph Regularisation for Guided Super-Resolution (2022) (Code)
Video Polyp Segmentation: A Deep Learning Perspective (2022) (Code)
Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images (2022) (Code)
HybridNets: End-to-End Perception Network (2022) (Code)
HDR-NeRF: High Dynamic Range Neural Radiance Fields (2022) (Code) (HN)
AdaMixer: A Fast-Converging Query-Based Object Detector (2022) (Code)
MixFormer: End-to-End Tracking with Iterative Mixed Attention (2022) (Code)
Bringing Old Films Back to Life (2022) (Code)
Extracting Triangular 3D Models, Materials, and Lighting From Images (2022) (Code)
LiT: Zero-Shot Transfer with Locked-image text Tuning (2021) (Tweet)
LAFITE: Towards Language-Free Training for Text-to-Image Generation (2021) (Code)
Neural 3D Video Synthesis from Multi-view Video (2022) (Code)
ToFu: Topologically Consistent Multi-View Face Inference Using Volumetric Sampling (2021)
Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning (2019) (Code)
FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator (2021)
Reddit Place Script 2022 - Script to draw an image onto r/place.
A Unified Objective for Novel Class Discovery (2021) (Code)
Papers and Datasets about Point Cloud
On the Importance of Asymmetry for Siamese Representation Learning (2022) (Code)
REGTR: End-to-end Point Cloud Correspondences with Transformers
A Closer Look at Local Aggregation Operators in Point Cloud Analysis (2020) (Code)
Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries (2022) (Code)
Perception Prioritized Training of Diffusion Models (2022) (Code)
VisualBERT: A Simple and Performant Baseline for Vision and Language (2019) (Code)
MultiMAE: Multi-modal Multi-task Masked Autoencoders (2022) (Code)
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction (2021) (Code)
Towards Open World Object Detection (2021) (Code)
Transformer in Vision - Recent Transformer-based CV and related works.
Shunted Self-Attention via Multi-Scale Token Aggregation (2021) (Code)
Space-Time Correspondence as a Contrastive Random Walk (2020) (Code)
MaskGIT: Masked Generative Image Transformer (2022) (Code)
EasyCV - All-in-one computer vision toolbox based on PyTorch.
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection (2022) (Code)
EMOCA: Emotion Driven Monocular Face Capture and Animation (2022)
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation (2022) (Code)
FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (Code)
PointCLIP: Point Cloud Understanding by CLIP (2022) (Code)
DaViT: Dual Attention Vision Transformers (2022) (Code)
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers (2022) (Code)
Recovering 3D Human Mesh from Monocular Images: A Survey (2022) (Code)
Video Diffusion Models (2022) (Web) (Code)
MaxViT: Multi-Axis Vision Transformer (2022) (Code)
Unified Contrastive Learning in Image-Text-Label Space (2022) (Code)
RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering (2021) (Code)
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (2021) (Code)
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation (2022) (Code)
MAXIM: Multi-Axis MLP for Image Processing (2022) (Code)
Tensil tutorial for YOLO v4 Tiny on Ultra96 V2 (2022)
UNITER: UNiversal Image-TExt Representation Learning (2020) (Code)
Consistent Depth of Moving Objects in Video (2021) (Code)
Bridging Video-text Retrieval with Multiple Choice Questions (2022) (Code)
Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (2020) (Code)
BACON: Band-limited Coordinate Networks for Multiscale Scene Representation (2022) (Code)
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results (2022) (Code)
Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering (2021) (Code)
SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image (2022) (Code)
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions (2021) (Code)
Neighborhood Attention Transformer (2022) (Code)
3D Surface Reconstruction From Multi-Date Satellite Images (2021) (Code)
Decoupling Makes Weakly Supervised Local Feature Better (2022) (Code)
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic (2022) (Code)
EasyMocap - Open-source toolbox for markerless human motion capture from RGB videos.
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (2022) (Code)
PolarMask: Single Shot Instance Segmentation with Polar Representation (2019) (Code)
Latent Video Transformer (2020) (Code)
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (2020) (JAX Code)
A Latent Transformer for Disentangled Face Editing in Images and Videos (2021) (Code)
Photorealistic Style Transfer via Wavelet Transforms (2019) (Code)
Probing ViTs
Dense Depth Priors for Neural Radiance Fields from Sparse Input Views (2021) (Code)
Self-Supervised Models are Continual Learners (2021) (Code)
Mask Transfiner for High-Quality Instance Segmentation (2022) (Code)
An Extendable, Efficient and Effective Transformer-based Object Detector (2022)
Learned Queries for Efficient Local Attention (2021) (Code)
3D Human Pose Estimation with Spatial and Temporal Transformers (2021) (Code)
3D human pose estimation in video with temporal convolutions and semi-supervised training (2019) (Code)
MC-Calib: A generic and robust calibration toolbox for multi-camera systems (2022) (Code)
Understanding The Robustness in Vision Transformers (2022) (Code)
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation (2021) (Code)
Tackling multiple tasks with a single visual language model (2022) (Code) (Tweet)
Associating Objects with Transformers for Video Object Segmentation (2021) (Code)
Simple multi-dataset detection - Object detection on multiple datasets with an automatically learned unified label space.
Learning Texture Transformer Network for Image Super-Resolution (2020) (Code)
Balanced MSE for Imbalanced Visual Regression (2022) (Code)
Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions (2022) (Code)
Action-Conditioned 3D Human Motion Synthesis with Transformer VAE (2021) (Code)
CoMoGAN: continuous model-guided image-to-image translation (2021) (Code)
OpenMVS - Open Multi-View Stereo reconstruction library.
Sliced Recursive Transformer (2021) (Code)
Neural Dual Contouring (2022) (Code)
Awesome Deblurring - Curated list of resources for Image and Video Deblurring.
CoCa: Contrastive Captioners are Image-Text Foundation Models (2022) (Code)
Sequencer: Deep LSTM for Image Classification (2022)
Language Models Can See: Plugging Visual Controls in Text Generation (2022) (Code)
flyswot - CLI for Hugging Face Transformers image classification models.
Neural 3D Scene Reconstruction with the Manhattan-world Assumption (2022) (Code)
PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (2022) (Code)
What do the Vision Transformers learn? How do they encode anything useful for image recognition? (2022)
Integrative Few-Shot Learning for Classification and Segmentation (2022) (Code)
DeltaConv: Anisotropic Geometric Deep Learning with Exterior Calculus (2022) (Code)
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis (2021) (Code)
Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (2022) (Code)
ConvMAE: Masked Convolution Meets Masked Autoencoders (2022) (Code)
Deep Kernelized Dense Geometric Matching (2022) (Code)
Unsupervised Semantic Segmentation by Distilling Feature Correspondences (2022) (Code)
RecursiveMix: Mixed Learning with History (2022) (Code)
MMDetection3d - OpenMMLab's next-generation platform for general 3D object detection.
Imagen: Text-to-Image Diffusion Models (Tweet) (Code) (HN) (HN)
An End-to-End Transformer Model for 3D Object Detection (2021) (Code)
Neural 3D Reconstruction in the Wild (2022) (Code)
Body shape and pose estimation on 3D scans of people in clothing using Ceres Solver
A Survey of Visual Transformers (2021) (Code)
Nerfies: Deformable Neural Radiance Fields (2021) (Code)
Working notes on the role of vision papers in basic science (2022) (Tweet)
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers (2022) (HN)
Prompt-aligned Gradient for Prompt Tuning (2022) (Code)
Text2Human: Text-Driven Controllable Human Image Generation (2022) (Code)
OnePose: One-Shot Object Pose Estimation without CAD Models (2022) (Code)
OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models (2022) (Code)
PREF: Phasorial Embedding Fields for Compact Neural Representations (2022) (Code)
Optimizing Relevance Maps of Vision Transformers Improves Robustness (2022) (Code)
Exploring Visual Prompts for Adapting Large-Scale Models (2022) (Code)
Deepfake Offensive Toolkit - Makes real-time, controllable deepfakes ready for virtual camera injection. (HN)
Real-time Object Detection for Streaming Perception (2022) (Code)
Volumentations 3D - Library for 3D augmentations.
Awesome Learning with Label Noise
LIVE: Towards Layer-wise Image Vectorization (2022) (Code)
BEVT: BERT Pretraining of Video Transformers (2021) (Code)
Variable Bitrate Neural Fields (2022) (Code)
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation (2019) (Code)
Masked Unsupervised Self-training for Zero-shot Image Classification (2022) (Code)
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video (2022) (Code)
Awesome Implicit NeRF Robotics
EfficientFormer: Vision Transformers at MobileNet Speed (2022) (Code)
ARF: Artistic Radiance Fields (2022) (Code) (HN)
Patch2Pix: Epipolar-Guided Pixel-Level Correspondences (2020) (Code)
Translating Images into Maps (2022) (Code)
Instances as Queries (2021) (Code)
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction (2022) (Code)
CogView: Mastering Text-to-Image Generation via Transformers (2021) (Code)
All in One: Exploring Unified Video-Language Pre-training (2022) (Code)
Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization (2022) (Code)
Solving Inefficiency of Self-supervised Representation Learning (2021) (Code)
NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination (2021) (Code)
Trending in 3D Vision
ShapeFormer: Transformer-based Shape Completion via Sparse Representation (2022) (Code)
Awesome Prompting Papers in Computer Vision
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation (2022) (Code)
GenDR: A Generalized Differentiable Renderer (2022) (Code)
Elucidating the Design Space of Diffusion-Based Generative Models (2022) (Code) (Code)
IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images (2022) (Code)
Omnivore: A Single Model for Many Visual Modalities (2022) (Code)
Benchmarking and Analyzing Point Cloud Classification under Corruptions (2022) (Code)
DVGO: Direct Voxel Grid Optimization (Super-fast Convergence for Radiance Fields Reconstruction) (2022) (Code)
RegionCLIP: Region-based Language-Image Pretraining (2021) (Code)
Fast Light-Weight Near-Field Photometric Stereo (2022) (Code)
ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (2021) (Code)
RePaint: Inpainting using Denoising Diffusion Probabilistic Models
The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions (2022) (Code)
3D Moments from Near-Duplicate Photos (2022) (Code)
Prototypical Contrastive Language Image Pretraining (2022) (Code)
NeRV: Neural Representations for Videos (2021) (Code)
MT-YOLOv6 - Single-stage object detection framework dedicated to industrial applications.
Fast Point Transformer (2022) (Code)
FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation (2022) (Code)
Nettle Magic Project - Scanner for decks of cards with bar codes printed on card edges. (HN)
Image Quality Assessment using Contrastive Learning (2021) (Code)
Denoised MDPs: Learning World Models Better Than The World Itself (2022) (Code)
Sparse Instance Activation for Real-Time Instance Segmentation (2022) (Code)
Referring Image Matting (2022) (Code)
Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds (2022) (Code)
Contrastive Boundary Learning for Point Cloud Segmentation (2022) (Code)
Scaling up Kernels in 3D CNNs (2022) (Code)
Oriented RepPoints for Aerial Object Detection (2022) (Code)
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly (2022) (Code)
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications (2022) (Code)
Awesome Visual Diffusion Models
Vision Transformer Adapter for Dense Predictions (2022) (Code)
Activating More Pixels in Image Super-Resolution Transformer (2022) (Code)
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies (2022) (Code)
GMFlow: Learning Optical Flow via Global Matching (2022) (Code)
Vector-quantized Image Modeling with Improved VQGAN (2021) (JAX Code)
Learned Vertex Descent: A New Direction for 3D Human Model Fitting (2022) (Code)
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors (2022) (Code)
AITViewer - Set of tools to visualize and interact with sequences of 3D data.
Object-Compositional Neural Implicit Surfaces (Code)
Awesome Egocentric Vision
MonoScene: Monocular 3D Semantic Scene Completion (2022) (Code)
Visual Prompt Tuning (2022) (Code)
Unified Implicit Neural Stylization (2022) (Code)
3D-Aware Semantic-Guided Generative Model for Human Synthesis (2021) (Code)
Text2LIVE: Text-Driven Layered Image and Video Editing (2022) (HN)
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction (2022) (Code)
Generalization of Otsu's Method and Minimum Error Thresholding (2020)
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model (2022) (Code)
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation (2021) (Code)
Deformable Sprites for Unsupervised Video Decomposition (2022) (Code)
Topologically-Aware Deformation Fields for Single-View 3D Reconstruction (2022) (Code)
Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation (2021) (Code)
Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions (2022) (Code)
Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (2021) (Code)
Box-supervised Instance Segmentation with Level Set Evolution (2022)
Tent: Fully Test-Time Adaptation by Entropy Minimization (2021) (Code)
UniFormer: Unifying Convolution and Self-attention for Visual Recognition (2022) (Code)
MOTR: End-to-End Multiple-Object Tracking with Transformer (2022) (Code)
Towards Grand Unification of Object Tracking (2022) (Code)
Benchmarking Omni-Vision Representation through the Lens of Visual Realms (2022) (Code)
Color Histograms in Image Retrieval
SeqTR: A Simple yet Universal Network for Visual Grounding (2022) (Code)
Image Inpainting with External-internal Learning and Monochromic Bottleneck (2021) (Code)
Deep Image Homography Estimation (2016) (Code)
Illumination Adaptive Transformer (2022) (Code)
MotionCLIP: Exposing Human Motion Generation to CLIP Space (2022) (Code)
Awesome Image Composition
Scene Text Recognition with Permuted Autoregressive Sequence Models (2022) (Code)
Multimodal Masked Autoencoders Learn Transferable Representations (Code)
BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (2022) (Code)
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (2022) (Code)
AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields (2022) (Code)
Harmonizer: Learning to Perform White-Box Image and Video Harmonization (2022) (Code)
CVAT - Computer Vision Annotation Tool. (Code)
NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing (2022)
Monocular 3D Object Detection with Depth from Motion (2022) (Code)
Masked Discrimination for Self-Supervised Learning on Point Clouds (2022) (Code)
SORT - Simple, online, and real time tracking of multiple objects in a video sequence.
Local Color Distributions Prior for Image Enhancement (2022) (Code)
S2Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning (2022) (Code)
Is Attention All NeRF Needs? (2022) (Code)
Camouflaged/Concealed Object Detection
Accelerate Vision Transformer (ViT) with Quantization using Optimum (2022)
Optimizing Transformers for GPUs with Optimum (2022)
Photogrammetry Guide (HN)
Multi-View Mesh Reconstruction with Neural Deferred Shading (2022) (Code)
Initialization and Alignment for Adversarial Texture Optimization (2022) (Code)
DCT-Net: Domain-Calibrated Translation for Portrait Stylization (2022) (Code)
Pretraining is All You Need for Image-to-Image Translation (2022) (Code)
Vision-Centric BEV Perception: A Survey
Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency (2022) (Code)
Awesome Weakly Supervised Semantic Segmentation Papers
GAUDI: A Neural Architect for Immersive 3D Scene Generation (2022) (Code) (HN)
Multimodal Image Synthesis and Editing: A Survey (2021) (Code)
High-Resolution Image Synthesis with Latent Diffusion Models (2022) (Code)
ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters (2022) (Code)
3D Vision with Transformers: A Survey (2022)
Optical Flow Processing Stack
VideoX - Multi-modal Video Content Understanding.
Simple Baselines for Image Restoration (2022) (Code)
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning (2022)
Revisiting the Critical Factors of Augmentation-Invariant Representation Learning (2022) (Code)
Image Quality Related Papers
Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution (2022) (Code)
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (2022) (Code)
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise (2022) (Code)
Flexible Diffusion Modeling of Long Videos (2022) (Code)
MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries (2022) (Code)
Escaping the Big Data Paradigm with Compact Transformers (2021) (Code)
Towards Layer-wise Image Vectorization (2022) (Code)
Awesome Optical Flow
LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling (2022) (Code)
SimpleRecon: 3D Reconstruction Without 3D Convolutions (2022) (Code)
Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories (2022) (Code)
Lance - Columnar Data Format for Machine Learning and Computer Vision.
Strand-Braid - Live, low-latency 2D and 3D tracking from single or multiple high-speed cameras.
Multi-Domain Incremental Learning for Semantic Segmentation (2022) (Code)
ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning (2022) (Code)
Awesome Vision-and-Language Pre-Training
Deep Vision and Graphics course
OpenMixup - CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark.
MMEditing - Low-level vision toolbox based on PyTorch, supporting super-resolution, inpainting, matting, video interpolation, etc.
Accelerating DETR Convergence via Semantic-Aligned Matching (2022) (Code)
The Follower - Using open cameras and AI to find how an Instagram photo is taken (Tweet)
Image Segmentation Using Text and Image Prompts (2022) (Code)
Knowledge Distillation from A Stronger Teacher (2022) (Code)
Learning Pixel Trajectories with Multiscale Contrastive Random Walks (2022) (Code)
Text2Light: Zero-Shot Text-Driven HDR Panorama Generation (2022) (Code)
detrex - Open-source toolbox that provides state-of-the-art Transformer-based detection algorithms.
VToonify: Controllable High-Resolution Portrait Video Style Transfer (2022) (Code)
MMYOLO - Open source toolbox for YOLO series algorithms based on PyTorch and MMDetection.
Relighting4D: Neural Relightable Human from Videos (2022) (Code)
GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images (2022) (Code)
Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer (2019) (Code)
Ask HN: Any good self-hosted image recognition software? (2022)
LAVIS - One-stop Library for Language-Vision Intelligence.
CATs: Cost Aggregation Transformers for Visual Correspondence (2021) (Code)
Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation (2022) (Code)
SetFit - Efficient Few-shot Learning with Sentence Transformers.
Awesome Monocular 3D detection
Human Motion Diffusion Model (2022) (Code) (HN)
DreamFusion: Text-to-3D using 2D Diffusion (2022) (HN)
Recent Advances in Vision-and-Language Pre-training (2022)
DeepInteraction: 3D Object Detection via Modality Interaction (2022) (Code)
Vision OSC - Send (almost) all Apple Vision Framework's detection results via OSC.
Synergistic Self-supervised and Quantization Learning (2022) (Code)
Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models (2022) (Code)
StyleSwap: Style-Based Generator Empowers Robust Face Swapping (2022) (Code)
IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis (2022) (Code)
VMFormer: End-to-End Video Matting with Transformer (2022) (Code)
Image-Based CLIP-Guided Essence Transfer (2021) (Code)
Equivariant Point Network for 3D Point Cloud Analysis (2022) (Code)
Computer Vision in the Wild Readings
Nerfstudio - Collaboration friendly studio for NeRFs.
Learning Dexterous Manipulation from Exemplar Object Trajectories and Pre-Grasps (2022) (Code)
MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction (2022) (Code)
Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising (2022) (Code)
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds (2022) (Code)
PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation (2022) (Code)
UPIT - FastAI/PyTorch package for unpaired image-to-image translation.
On Distillation of Guided Diffusion Models (2022)
Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence (2022) (Code)
MaPLe: Multi-modal Prompt Learning (2022) (Code)
GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation (2022) (Code)
End2End Occluded Face Recognition by Masking Corrupted Features (2022) (Code)
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction (2021) (Code)
Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance (2022) (Code)
Adaptive Token Sampling For Efficient Vision Transformers (2022) (Code)
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models (2022) (Code)
Real-Time Neural Character Rendering with Pose-Guided Multiplane Images (2022) (Code)
Subspace Regularizers for Few-Shot Class Incremental Learning (2022) (Code)
Exploring Long-Sequence Masked Autoencoders (2022) (Code)
Awesome 3D-aware Image Synthesis – Papers, Codes and Datasets
An Improved One millisecond Mobile Backbone (2022) (Code)
Fuzzy Metaballs: Approximate Differentiable Rendering with Algebraic Surfaces (2022) (Code)
Focal Modulation Networks (2022)
Monocular Dynamic View Synthesis: A Reality Check (2022) (Code)
Pose Recognition With Cascade Transformers (2021) (Code)
Terran - Human perception library.
Pento - Boost your business with computer vision. (GitHub)
HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields (2022) (Code)
FastestDet - Newly designed ultra lightweight anchor free target detection algorithm.
Computer Vision, From 3D Reconstruction to Recognition Notes
Stanford University: Deep Learning for Computer Vision (Notes)
TAP-Vid: A Benchmark for Tracking Any Point in a Video (2022) (Code)
StyleNAT: Giving Each Head a New Perspective (2022) (Code)
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (2022) (Code)
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models (2022) (Code)
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery (2022) (Code)
OpenSeeFace - Robust real time face and facial landmark tracking on CPU with Unity integration.
GIT: A Generative Image-to-text Transformer for Vision and Language (2022) (Code)
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis (2022) (Code)
Paddle Detection - High-Efficient Development Toolkit for Object Detection based on PaddlePaddle.
Instant Neural Surface Reconstruction
All are Worth Words: A ViT Backbone for Diffusion Models (2022) (Code)
OneFormer: One Transformer to Rule Universal Image Segmentation (2022) (Code)
Visual Object Tracking
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification (2021) (Code)
Exploring CLIP for Assessing the Look and Feel of Images (2022) (Code)
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation (2022) (Code)
Tracking without bells and whistles (2019) (Code)
LaTr: Layout-Aware Transformer for Scene-Text VQA (2021) (Code)
ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers (2022)
SinDiffusion: Learning a Diffusion Model from a Single Natural Image (2022) (Code)
CLIP4Cir - CLIP for Conditioned image retrieval training code.
Physics-based Character Controllers Using Conditional VAEs (2022) (Code)
Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition (2021) (Code)
Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention (2022) (Code)
VLDet: Learning Object-Language Alignments for Open-Vocabulary Object Detection (2022)
NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views (2022) (Code)
Vision Transformers (ViT) Explained (2022) (HN)
Embedding Methods for Image Search
Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion (2022) (Code)
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild (2022) (Code)
ODaM - Object Detection and Monitoring.
Token Merging: Your ViT But Faster (2022) (Code)
NeuralUDF: Learning Unsigned Distance Fields for Multi-view Reconstruction of Surfaces with Arbitrary Topologies (2022) (Code)
Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars (2022) (Code)
PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking (2022) (Code)
Diffusion Models for Medical Image Analysis: A Comprehensive Survey (2022) (Code)
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data (2022) (Code)
Rerun - Open source visualization infrastructure for computer vision and robotics. (Code) (OSS Release) (HN)
ECON: Explicit Clothed humans Obtained from Normals (2022) (Code)
Monocular, One-stage, Regression of Multiple 3D People
Paint by Example: Exemplar-based Image Editing with Diffusion Models (2022) (Code)
Detection Transformers with Assignment (2022)
Splicing ViT Features for Semantic Appearance Transfer (2022) (Code)
Polynomial Neural Fields for Subband Decomposition and Manipulation (2022)
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet (2022) (Code)
RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs (2022)
Vision-and-Language Navigation Resources
HNeRV: A Hybrid Neural Representation for Videos (2022) (Code)
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation (2022) (Code)
Zero Shot Image Restoration Using Denoising Diffusion Null-Space Model (2022) (Code)
DifFace: Blind Face Restoration with Diffused Error Contraction
What do Vision Transformers Learn? A Visual Exploration
Images Speak in Images: A Generalist Painter for In-Context Visual Learning (2022) (Code)
ShuffleMixer: An Efficient ConvNet for Image Super-Resolution
SDFStudio: Unified Framework for Surface Reconstruction (Code)
SegViT: Semantic Segmentation with Plain Vision Transformers (2022) (Code)
MMEngine - Foundational library for training deep learning models based on PyTorch.
SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields (2022) (Code)
Great Computer Vision startups (2022)
CoVA: Context-aware Visual Attention for Webpage Information Extraction (2022)
Awesome 3D Object Detection
ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection (2022) (Code)
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis (2022) (Code)
FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos (2022) (Code)
SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution (2022)
Knowledge Condensation Distillation (2022) (Code)
NeuMan: Neural Human Radiance Field from a Single Video (2022) (Code)
ScaleNet: Searching for the Model to Scale (2022) (Code)
Very Recent Progress in 3D Hand Tasks
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (2022) (Code) (Code)
NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields (2022)
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures (2022) (Code)
Deep Architectures for Content Moderation and Movie Content Rating (2022) (Code)
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning (2022) (Code)
Magic3D: High-Resolution Text-to-3D Content Creation (2022) (Code)
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (2022) (Code)
Exploring Cross-Image Pixel Contrast for Semantic Segmentation (2021) (Code)
Towards Robust Blind Face Restoration with Codebook Lookup Transformer (2022) (Code)
Geo-Neus: Geometry-Consistent Neural Implicit Surfaces Learning for Multi-view Reconstruction (2022) (Code)
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders (2023) (Code)
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection (2022) (Code)
SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation (2022)
Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion (2022) (Code)
Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera (2022) (Code)
GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer (2022) (Code)
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models (2022)
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models (2022) (Code)
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations (2022) (Code)
Rethinking Resolution in the Context of Efficient Video Recognition (2022) (Code)
OpenCV Mobile
SINE: SINgle Image Editing with Text-to-Image Diffusion Models (2022) (Code)
PETR: Position Embedding Transformation for Multi-View 3D Object Detection (2022)
Awesome Deep Optics/End-to-end Optical Design
HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling (2023) (Code)
Image generation with MNIST (2022)
CiT: Curation in Training for Effective Vision-Language Data (2023) (Code)
MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare (2022) (Code)
Bidirectional Projection Network for Cross Dimension Scene Understanding (2021) (Code)
Image Distortion Correction - Curated list of resources on handling Rolling Shutter effects and Radial Distortions.
Ultralytics YOLOv8 - YOLOv8 in PyTorch > ONNX > CoreML > TFLite.
Neural Density-Distance Fields (2022) (Code)
Vision Transformers Are Good Mask Auto-Labelers (2023) (Code)
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition (2022) (Code)
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale (2022) (Code)
SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction (2022) (Code)
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling (2023) (Code)
Generalized Decoding for Pixel, Image, and Language (2022) (Code)
Global Context Vision Transformers (2022) (Code)
Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning (2023) (Code)
DensePose From WiFi (2022) (Tweet) (HN) (HN)
CHAIRS: Towards Full-Body Articulated Human-Object Interaction (2022) (Code)
MultiAct: Long-Term 3D Human Motion Generation from Multiple Action Labels (2023) (Code)
GLIGEN: Open-Set Grounded Text-to-Image Generation (2023) (Code)
T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (2023) (Code)
Multiview Compressive Coding for 3D Reconstruction (2023) (Code)
Deep Learning Object Detection Paper List
Efficient Neural Radiance Fields for Interactive Free-viewpoint Video (2022) (Code)
InstructPix2Pix: Learning to Follow Image Editing Instructions (2022) (Code)
Learned reconstructions for practical mask-based lensless imaging (Code)
NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos (2022) (Code)
Domain Expansion of Image Generators (2023) (Code)
Computer Vision: Models, Learning, and Inference
Reversible Column Networks (2022) (Code)
Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach (2021) (Code)
Long-tail Detection with Effective Class-Margins (2022) (Code)
Diffusion-SDF: Text-to-Shape via Voxelized Diffusion (2022) (Code)
Learning 3D-aware Image Synthesis with Unknown Pose Distribution (2023) (Code)
Video object detection in Elixir using Nx and Bumblebee (2023)
K-Planes: Explicit Radiance Fields in Space, Time, and Appearance (2023) (Code)
Text2LIVE: Text-Driven Layered Image and Video Editing (2022) (Code)
Disentangled Representation Learning for Text-Video Retrieval (2022) (Code)
Text-To-4D Dynamic Scene Generation (2023) (HN)
PhyCV - Physics-inspired Computer Vision Library.
Fast Dynamic Radiance Fields with Time-Aware Neural Voxels (2022) (Code)
Learning Customized Visual Models with Retrieval-Augmented Knowledge (2023) (Code)
Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models (2022) (Code)
Accelerating Guided Diffusion Sampling with Splitting Numerical Methods (2023) (Code)
SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections (2023) (Code)
STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation (2023) (Code)
Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline (2023) (Code)
Awesome Vision Transformer Collection
Compressed Vision for Efficient Video Understanding (2022) (Code)
Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation (2023) (Code)
SEGA: Instructing Diffusion using Semantic Dimensions (2023) (Code)
Dreamix: Video Diffusion Models are General Video Editors (2023) (Web)
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis (2023) (Code)
ESP-WHO - Face detection and recognition framework.
Egocentric Video-Language Pretraining (2022) (Code)
EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations (2022) (Code)
Revealing Single Frame Bias for Video-and-Language Learning (2022) (Code)
EVA3D: Compositional 3D Human Generation from 2D Image Collections (2022) (Code)
Zero-shot Image-to-Image Translation (2023) (Code)
MatteFormer: Transformer-Based Image Matting via Prior-Tokens (2022) (Code)
minREV - Simple minimal implementation of Reversible Vision Transformers.
Cut and Learn for Unsupervised Object Detection and Instance Segmentation (2023) (Code)
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models (2023) (Code)
3D-aware Conditional Image Synthesis (2023) (Code)
Deblur-NeRF: Neural Radiance Fields from Blurry Images (2021) (Code)
Learning When to Say "I Don't Know" (2022) (Code)
MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation (2023)
NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild (2021) (Code)
3D Shape Analysis Paper List
Generating Holistic 3D Human Motion from Speech (2022) (Code)
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation (2023) (Code)
Audio-Visual Face Reenactment (2022) (Code)
Awesome Distribution Shift
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval (2022) (Code)
TEXTure: Text-Guided Texturing of 3D Shapes (Code)
SIMPLI - Self-improving Multiplane-to-layer Images for Novel View Synthesis (2023)
Awesome Image Registration - Image registration related books, papers, videos, and toolboxes.
Learning Visual Representations via Language-Guided Sampling (2023) (Code)
RealFusion: 360° Reconstruction of Any Object from a Single Image (2023) (Code)
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment (2022) (Code)
Composer: Creative and Controllable Image Synthesis with Composable Conditions (2023) (Code)
Decoupling Human and Camera Motion from Videos in the Wild (2023) (Code)
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth (2023) (Code)
The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition (2023) (Code)
Image as Set of Points (2023) (Code)
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound (2022) (Code)
NeRF2Mesh: Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement (2023) (Code)
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction (2023) (Code)
MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices (2023) (Code)
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation (2022) (Code)
FFCV-SSL - Fast Forward Computer Vision for Self-Supervised Learning.
How computer vision is changing manufacturing in 2023 (HN)
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation (2023) (Code)
NeRFshop: Interactive Editing of Neural Radiance Fields
Blind Video Deflickering by Neural Filtering with a Flawed Atlas (2023)
Universal Instance Perception as Object Discovery and Retrieval (2023)
FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization (2023) (Code)
Vid2Seq: a pretrained visual language model for describing multi-event videos (2023) (HN)
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (2023) (Code)
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation (2023) (Code)
Generative Semantic Segmentation (2023) (Code)
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators (2023) (Code) (HN)
Diffusion-based Generation, Optimization, and Planning in 3D Scenes (2023) (Code)
Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes (2023) (Code)
Pointcept - Powerful and flexible codebase for point cloud perception research.
Conditional Image-to-Video Generation with Latent Flow Diffusion Models (2023) (Code)
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model (2023) (Code)
GlueStick - Joint Deep Matcher for Points and Lines.
GVision - Reverse image search app that uses the Google Cloud Vision API to detect landmarks and web entities from images.
Segment Anything (2023) (Code)
Detecting and Grounding Multi-Modal Media Manipulation (2023) (Code)
Better Aligning Text-to-Image Models with Human Preference (2023) (Code)
Scaling Language-Image Pre-training via Masking (2022) (Code)
From Zero to Hero: Convincing with Extremely Complicated Math (2023) (HN)
Zero-shot Generative Model Adaptation via Image-specific Prompt Learning (2023) (Code)
Awesome Digital Human - Collection of resources on digital human including clothed people digitalization, virtual try-on, and other related directions.
Grounded-Segment-Anything - Marrying Grounding DINO with Segment Anything - Detect and Segment Anything with Text Inputs.
VideoCrafter: Toolkit for Text-to-Video Generation and Editing
DiffMimic: Efficient Motion Mimicking with Differentiable Physics (2023) (Code)
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (2023) (Code)
MetaSeg: Packaged version of the Segment Anything repository
Segment Anything with CLIP
Hachi - Natural Language search for Videos and Images.
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (2023) (Code)
EditAnything - Segment Anything + ControlNet + BLIP2 + Stable Diffusion. (HN)
Segment Anything EO tools
PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models (2023) (Code)
Awesome-Anything - General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, etc.
Connect Segment-Anything with CLIP
Semantic Segment Anything - Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset.
Segment Anything Labelling Tool (SALT)
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval (2022) (Code)
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (2023) (Code)
SVDiff: Compact Parameter Space for Diffusion Fine-Tuning (2023) (Code)
Detection Transformer with Stable Matching (2023) (Code)
Caption Anything via Clicking
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition (2023) (Code)
Prompt-Segment-Anything - Implementation of zero-shot instance segmentation using Segment Anything.
Segment Anything for Stable Diffusion Webui
Semaphore - Full-body keyboard using gestures to type through computer vision. (HN)
CleanVision - Automatically find issues in image datasets and practice data-centric computer vision.
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior (2023) (Code)
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs (2023) (Code)
FAIR Animated Drawings (Code) (HN)
Anything-3D - Segment-Anything + 3D. Let's lift anything to 3D.
3D-Box via Segment Anything
Rich-Text-to-Image Generation
SEEM: Segment Everything Everywhere All at Once
DinoV2: Meta’s Open Source State-of-the-art computer vision models (2023) (HN) (Code)
Dynablox - Real-time detection of diverse dynamic objects in complex environments.
Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning (2023)
Inpaint Anything: Segment Anything Meets Image Inpainting
Transformer-Based Visual Segmentation: A Survey (2023)
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation (2023) (Code)
segment-geospatial - Python package for segmenting geospatial data with the Segment Anything Model (SAM).
I Hear Your True Colors: Image Guided Audio Generation
Contrastive Audio-Visual Masked Autoencoder (2023)
F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories (2023) (Code)
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements (2023) (Code)
Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields (2023) (Code)
Mask-Free Video Instance Segmentation (2023) (Code)
Fine-tuned CLIP models are efficient video learners (2023)
Track-Anything - Flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem.
Supervision - Easy-to-use utils that will come in handy in any Computer Vision project.
Roboflow Notebooks - Examples and tutorials on using SOTA computer vision models and techniques.
Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations (2023)
Awesome Segment Anything - Tracking and collecting papers/projects/others related to Segment Anything.
SuperGradients - Easily train or fine-tune SOTA computer vision models with one open source training library.
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking (2023)
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation (2023) (Code)
Shap-E: Generating Conditional 3D Implicit Functions (2023) (Code) (HN)
Personalize Segment Anything Model with One Shot (2023) (Code)
Segment Anything 3D
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision (2023) (Code)
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities (2023) (Code)
Denoising Diffusion Models: A Generative Learning Big Bang (2023) (Code)
LERF: Language Embedded Radiance Fields (2023) (Code)
Masked Diffusion Transformer is a Strong Image Synthesizer (2023) (Code)
Decentralization and Acceleration Enables Large-Scale Bundle Adjustment (2023) (Code)
Awesome-Visual-Instruction-Tuning
Awesome 3D Reconstruction Papers
Better Diffusion Models Further Improve Adversarial Training (2023) (Code)
Accelerated Coordinate Encoding: Learning to Relocalize in Minutes using RGB and Poses (2023) (Code)
MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model (2022) (Code)
DiM: Distilling Dataset into Generative Model (2023) (Code)
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation (2023) (Code)
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models (2023) (Code)
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising (2023) (Code)
Collaborative Diffusion for Multi-Modal Face Generation and Editing (2023) (Code)
Learning Attention as Disentangler for Compositional Zero-shot Learning (2023) (Code)
GRES: Generalized Referring Expression Segmentation (2023) (Code)
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles (2023) (Code)
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (2023) (Code)
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models (2023) (Code)
Text2Tex: Text-driven Texture Synthesis via Diffusion Models (2023) (Code)
Apple releasing segmentation/pose for humans and animals (HN)
Real-time 6K Image Rescaling with Rate-distortion Optimization (2023) (Code)
Awesome Talking Head Generation
Tracking Everything Everywhere All at Once (2023) (Code)
Towards Smooth Video Composition (2023) (Code)
FasterViT: Fast Vision Transformers with Hierarchical Attention
Matting Anything (2023) (HN)
ViTMatte - Boosting Image Matting with Pretrained Plain Vision Transformers.
Neural Kernel Surface Reconstruction (2023)
Inserting Anybody in Diffusion Models via Celeb Basis (2023) (Code)
SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions (2023) (Code)
Temporal Voyage: Code for "Neural Scene Chronology"
DynIBaR: Neural Dynamic Image-Based Rendering (2023) (Code)
RelTR: Relation Transformer for Scene Graph Generation (2022) (Code)
Progressively Optimized Local Radiance Fields for Robust View Synthesis (2023)
WebGLM: Towards An Efficient Web-enhanced Question Answering System with Human Preference (2023)
MIME: Human-Aware 3D Scene Generation (2023)
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (2023) (Code)
Language Segment-Anything
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation (2023) (Code)
Awesome Segment Anything
Cones 2: Customizable Image Synthesis with Multiple Subjects (2023) (Code)
View Synthesis with Sculpted Neural Points (2022) (Code)
NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action (2022) (Code)
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models (2022) (Code)
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (2023) (Code)
Matte Anything: Interactive Natural Image Matting with Segment Anything Models (2023) (Code)
Infinite Photorealistic Worlds using Procedural Generation (2023) (HN)
VideoComposer: Compositional Video Synthesis with Motion Controllability
Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation (2023)
Unpaired Image-to-Image Translation via Neural Schrödinger Bridge (2023) (Code)
Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials (2023) (Code)
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation (2023) (Code)
Multi-scale Attention Guided Pose Transfer (2022) (Code)
SUDS: Scalable Urban Dynamic Scenes
PVO: Panoptic Visual Odometry (2022) (Code)
Fast Segment Anything
FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction (2023) (Code)
DISCO: Disentangled Control for Referring Human Dance Generation in Real World (2023) (Code)
StyleDrop: Text-to-Image Generation in Any Style (2023) (Code)
Segment Anything Meets Point Tracking (2023) (Code)
Denoising Diffusion Models for Plug-and-Play Image Restoration (2023) (Code)
Final2x - Enhance Your Images with Effortless Cross-Platform Super-Resolution at Any Scale.
nr3d_lib - Modules, operators and utilities for 3D neural rendering in single-object, multi-object, categorical and large-scale scenes.
State of Computer Vision 2023
Awesome Object Pose Estimation and Reconstruction
Generative Pretraining in Multimodality (2023) (Code)
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs (2023) (Code)
Collaborative Score Distillation for Consistent Visual Synthesis (Code)
PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning (2022) (Code)
Meta-Transformer: A Unified Framework for Multimodal Learning (2023) (Code)
Awesome Embodied Vision
ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models (2023) (Code)
Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks
Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions (2022) (Code)
ImageNet Model Code
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (2023) (Code)
Zero-1-to-3: Zero-shot One Image to 3D Object (2023) (Code)
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (2023) (Code)
Thin-Plate Spline Motion Model for Image Animation (2022) (Code)
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors (Web) (HN)
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement (2023) (Code)
LISA: Reasoning Segmentation via Large Language Model (2023) (Code)
Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids (2023) (Code)
Key-Locked Rank One Editing for Text-to-Image Personalization (2023) (Code)
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars (2023) (Code)
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (2023) (Code)
PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning (2023) (Code)
Contrastive Model Adaptation for Cross-Condition Robustness in Semantic Segmentation (2023) (Code)
Neuralangelo: High-Fidelity Neural Surface Reconstruction (2023) (Code)
Box-X - Tool-box for efficient build and debug in Python. Especially for Scientific Computing and Computer Vision.
CityNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering (2022) (Code)
NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation (2023) (Code)
Color-NeuS: Reconstructing Neural Implicit Surfaces with Color (2023) (Code)
Inst-Inpaint: Instructing to Remove Objects with Diffusion Models (2023) (Code)
SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning (2023)
3D Gaussian Splatting for Real-Time Radiance Field Rendering (2023) (Code)
XMem++: Production-level Video Segmentation From Few Annotated Frames (2023) (Code)
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization (2023)
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (2023) (Code)
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans (2023) (Code)
Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis (2023) (Code)
Roboflow Inference - Opinionated tool for running inference on state-of-the-art computer vision models. (HN)
FaceFusion - Next generation face swapper and enhancer.
Awesome Adaptive Computation
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis (2023)
StableVideo: Text-driven Consistency-aware Diffusion Video Editing (2023)
Change-Aware Sampling and Contrastive Learning for Satellite Images (2023) (Code)
Dense Text-to-Image Generation with Attention Modulation (2023) (Code)
VisionScript - High-level programming language for using computer vision.
Aligning Pre-training and Fine-tuning in Object Detection (2023)
CoTracker: It is Better to Track Together - Model for tracking any point (pixel) in a video.
MagicEdit: High-Fidelity and Temporally Coherent Video Editing
SAM.cpp - Inference of Meta's Segment Anything Model in pure C/C++. (HN)
Queryable - Run OpenAI's CLIP model on iOS to search photos.
3D-LLM: Injecting the 3D World into Large Language Models
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection
Image Search using CLIP - Search images with a text or image query, using OpenAI's pretrained CLIP model.
ResFields: Residual Neural Fields for Spatiotemporal Signals
AdverseCleaner - Remove adversarial noise from images.
