Computer vision
LiT (Locked-image Tuning) paper is neat. Trying to understand Vision Transformers. Kornia & Scenic seem like great libraries. Imagen is fascinating.
Embedding Methods for Image Search & Computer Vision: Models, Learning, and Inference are nice reads.
Rerun is a great CV visualization tool.
Links
- OpenCV - Open Source Computer Vision Library. (Web) (OpenCV Course)
- Gluon CV Toolkit - Provides implementations of the state-of-the-art (SOTA) deep learning models in computer vision.
- Pythia - Modular framework for vision and language multimodal research. Built on top of PyTorch.
- video-object-removal - Just draw a bounding box and you can remove the object you want to remove.
- GoCV - Go package for computer vision using OpenCV 4 and beyond.
- Sandbox for training convolutional networks for computer vision
- Get started with Computer Vision, Deep Learning, and OpenCV
- TorchCV - PyTorch-Based Framework for Deep Learning in Computer Vision.
- AI Habitat - Flexible, high-performance 3D simulator for Embodied AI research.
- Kornia - Open Source Differentiable Computer Vision Library for PyTorch. (Web)
- Roboflow - Raw images to trained computer vision model. (Article)
- PySlowFast - Open source video understanding codebase from FAIR that provides state-of-the-art video classification models.
- How to Convert a Picture to Numbers
- Awesome Computer Vision
- The Ancient Secrets of Computer Vision (2018)
- Variational Methods for Computer Vision lectures (2013)
- Classy Vision - New end-to-end, PyTorch-based framework for large-scale training of state-of-the-art image and video classification models.
- Meshroom - 3D Reconstruction Software.
- AliceVision - Photogrammetric Computer Vision Framework. (Code) (GitHub)
- PyTorch3D - Provides efficient, reusable components for 3D Computer Vision research with PyTorch. (Web)
- Face Recognition - World's simplest facial recognition api for Python and the command line.
- Deep Hough Voting for 3D Object Detection in Point Clouds
- Point Cloud Library - Standalone, large scale, open project for 2D/3D image and point cloud processing.
- Disappearing-People - Removing people from complex backgrounds in real time using TensorFlow.js in the web browser. (HN)
- Best Practices, code samples, and documentation for Computer Vision
- Computer Vision Basics in Microsoft Excel
- PolyGen: An Autoregressive Generative Model of 3D Meshes (2020)
- Sophus - C++ implementation of Lie Groups using Eigen.
- SOLT - Streaming over lightweight data transformations.
- Awesome Interaction-aware Behavior and Trajectory Prediction
- SynSin: End-to-end View Synthesis from a Single Image (2020) (Code)
- Pixel2Mesh - Generating 3D Mesh Models from Single RGB Images.
- First Order Motion Model for Image Animation (Code)
- PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution
- Learning to See Through Obstructions
- Learning to Cluster Faces on an Affinity Graph (LTC)
- Avatarify - Avatars for Zoom and Skype.
- SPSR - PyTorch implementation of Structure-Preserving Super Resolution with Gradient Guidance.
- OISR-PyTorch - PyTorch implementation of "ODE-inspired Network Design for Single Image Super-Resolution".
- 3D Photography using Context-aware Layered Depth Inpainting
- CenterMask: Real-Time Anchor-Free Instance Segmentation
- Interview with Dmytro Mishkin | Computer Vision Research | Kaggle, ML & Education (2020)
- PyTorch code for ICLR-20 paper "Learning to Explore using Active Neural SLAM"
- FaceTracker - Real time deformable face tracking in C++ with OpenCV 3.
- Awesome Super Resolution
- Adversarial Latent Autoencoders
- ElasticFusion - Real-time dense visual SLAM system capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera.
- StegaStamp: Invisible Hyperlinks in Physical Photographs
- Pose Animator - Takes a 2D vector illustration and animates its containing curves in real-time based on the recognition result from PoseNet and FaceMesh. (HN)
- fvcore - Collection of common code that's shared among different research projects in FAIR computer vision team.
- Making Sense of Vision and Touch: Multimodal Representations for Contact-Rich Tasks (2020)
- ScreenPoint - Project an image centroid to another image using OpenCV.
- U^2-Net - Code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection".
- TorchIO - Tools for medical image processing in deep learning.
- Real time Image Animation in OpenCV using first order model (HN)
- OpenMV (Open-Source Machine Vision) - Aims at making machine vision more accessible to beginners by developing a user-friendly, open-source, low-cost machine vision platform.
- TSD - 1st place models in Google OpenImage Detection Challenge 2019.
- Training-Time-Friendly Network for Real-Time Object Detection
- Big Transfer (BiT): General Visual Representation Learning
- Fast Human Pose Estimation CVPR2019
- Deep High-Resolution Representation Learning for Human Pose Estimation
- Background Matting: The World is Your Green Screen
- DE⫶TR: End-to-End Object Detection with Transformers
- PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization
- Tracking Objects as Points
- VIBE - Video Inference for Human Body Pose and Shape Estimation.
- SRZoo - Integrated repository for super-resolution using deep learning.
- mAP (mean Average Precision) - Evaluates the performance of your neural net for object recognition.
- Neural Pose Transfer by Spatially Adaptive Instance Normalization (2020)
- Awesome Neural Rendering
- Learning To Classify Images Without Labels
- Deep Leakage From Gradients (2019)
- 3Dflow - Offers customized computer vision software solutions.
- labelme - Image Polygonal Annotation with Python.
- imgviz - Image Visualization Tools.
- Attention-Guided Hierarchical Structure Aggregation for Image Matting
- YOLOv5 Is Here: State-of-the-Art Object Detection at 140 FPS (2020) (HN) (Code)
- DetectoRS - Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.
- PyTorch implementation of paper Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs
- VirTex: Learning Visual Representations from Textual Annotations
- High-Resolution 3D Human Digitization from A Single Image
- FairMOT - Simple baseline for one-shot multi-object tracking.
- Implicit Neural Representations with Periodic Activation Functions (2020)
- MSeg: A Composite Dataset for Multi-Domain Segmentation
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
- MMDetection - OpenMMLab Detection Toolbox and Benchmark.
- Fourier Feature Networks in TensorFlow 2
- Computer Vision Lab | ETH Zurich
- PyTorch Computer Vision Library for Experts and Beginners (2020)
- Computer Vision Pretrained Models
- Fawkes: Image “Cloaking” for Personal Privacy (HN)
- Motion - Software motion detector.
- Supervised 3D Mesh Reconstruction (2020)
- NeRF in the Wild - Neural Radiance Fields for Unconstrained Photo Collections.
- NASA: Neural Articulated Shape Approximation (2020)
- An Overview of Deep Learning Architectures in Few-Shot Learning Domain (2020)
- FutureMapping: The Computational Structure of Spatial AI Systems (2018) (Tweet)
- Optimal Peanut Butter and Banana Sandwiches (2020) (Twitter)
- Gesture Recognition with Line Integrals (Code)
- Computer Vision: Looking Back to Look Forward (2020)
- DAIN (Depth-Aware Video Frame Interpolation)
- Picsellia - Development platform dedicated to Computer Vision.
- Official implementation of "PifPaf: Composite Fields for Human Pose Estimation" in PyTorch
- Object Recognition with Gradient-Based Learning (1999)
- Imaginaire - NVIDIA PyTorch GAN library with distributed and mixed precision support. (Docs)
- DeepBackSub - Virtual Video Device for Background Replacement with Deep Semantic Segmentation.
- Awesome Tiny Object Detection
- Flow-edge Guided Video Completion
- 5 Things to look for in a Computer Vision startup job (2020)
- Transformers for Image Recognition at Scale (2020) (HN)
- nnU-Net - Segmentation method designed to deal with dataset diversity.
- batchgenerators - Framework for data augmentation for 2D and 3D image classification and segmentation.
- Lookuq - App to create object detection projects without coding. (HN)
- InsightFace - Face Analysis Project on MXNet. (Web)
- PyTorch implementation of SwAV (Swapping Assignments between Views)
- Asymmetric Loss For Multi-Label Classification in PyTorch
- Antialiased CNNs - Making Convolutional Networks Shift-Invariant Again.
- Perceptual Similarity Metric and Dataset - Unreasonable Effectiveness of Deep Features as a Perceptual Metric.
- Deep Learning Anime Papers
- Vision Transformer - Models from the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- Handsfree.js - Wrapper library around computer vision models for working with face pointers, assistive tech, and creative expression. (Web)
- ZeroQ: A Novel Zero Shot Quantization Framework
- SqueezeNext - Contains the Caffe implementation of SqueezeNext.
- ANODE: Adjoint Based Neural ODEs
- Python Video Stabilization using OpenCV
- Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
- TorchCV - PyTorch vision library that mimics ChainerCV.
- Vision Transformer in PyTorch
- MedicalTorch - Medical imaging framework for PyTorch. (Docs)
- imagecluster - Cluster images based on image content using a pre-trained deep neural network, optional time distance scaling and hierarchical clustering.
- Detecto - Build fully-functioning computer vision models with PyTorch. (Docs)
- EmoPy - Deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER).
- PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder"
- Label Decoupling Framework for Salient Object Detection
- MONAI - PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. (Web)
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
- Faster R-CNN Explained for Object Detection Tasks (2020)
- How to Install OpenCV on a Raspberry Pi (2020)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction
- PyImageSearch - Master Computer Vision, Deep Learning, and OpenCV.
- Natural Adversarial Examples - Harder ImageNet Test Set.
- How to upload 50 OpenCV frames into cloud storage within 1 second (2020)
- Egocentric Videoconferencing (2020) - Method for egocentric videoconferencing that enables hands-free video calls, for instance by people wearing smart glasses or other mixed-reality devices. (Video overview)
- gradslam - Open source differentiable dense SLAM library for PyTorch.
- High-Resolution Daytime Translation Without Domain Labels
- Holistically-Nested Edge Detection
- pycls - Image classification codebase, written in PyTorch.
- PyTorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression
- How Useful is Self-Supervised Pretraining for Visual Tasks?
- PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
- InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image
- Multi-object trackers in Python - Easy to use implementation of various multi-object tracking algorithms.
- Stanford Vision and Learning Lab (GitHub)
- Learning computer vision. Overview of methods and software (2018)
- Image embeddings. Image similarity and building (2020) (Code)
- All You Need to Know About Object Detection Systems (2020)
- Lightly - Computer vision framework for self-supervised learning.
- DISK: Learning local features with policy gradient (2020) (Code)
- Caer - Lightweight Computer Vision library for high-performance AI research. (Intro)
- Awesome Image to Image Translation Papers
- EfficientDet: Scalable and Efficient Object Detection, in PyTorch
- UNet: semantic segmentation with PyTorch
- Exploring Simple Siamese Representation Learning (2020) (Code) (Code)
- Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
- Nerfies: Deformable Neural Radiance Fields (Code)
- Timeception for Complex Action Recognition (2019) (Code)
- Programming Computer Vision with Python (2014) (Code) (Notes)
- Fast and Accurate One-Stage Space-Time Video Super-Resolution (2020)
- pixelNeRF: Neural Radiance Fields from One or Few Images (2020) (Code)
- vedadet - Single stage object detector toolbox based on PyTorch.
- OneNet: End-to-End One-Stage Object Detection by Classification Cost
- Consistent Video Depth Estimation - Estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.
- Implicit Neural Representations with Periodic Activation Functions
- Computational Imaging Stanford Lab
- Trimap-Free Solution for Portrait Matting in Real Time
- Local Light Field Fusion
- Awesome Crowd Counting
- Neural Sparse Voxel Fields (NSVF)
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (2020) (Tweet)
- SharpAI DeepCamera - Open source stack for machine learning engineering with private deployment and AutoML for edge computing. (HN)
- Contrastive learning of global and local features for medical image segmentation with limited annotations
- Real-Time High-Resolution Background Matting (2020) (Code)
- Torchreid - Deep learning person re-identification in PyTorch.
- Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
- img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation
- SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection
- PCT: Point Cloud Transformer (2020) (Code)
- Learning Continuous Image Representation with Local Implicit Image Function (2020) (Code)
- Computer Vision Annotation Tool (CVAT)
- DeiT: Data-efficient Image Transformers
- Awesome Implicit Neural Representations
- ImageAI - Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities. (Web)
- RAIVN Lab - Reasoning, AI and VisioN (RAIVN) Lab. (GitHub)
- Norfair - Customizable lightweight Python library for real-time 2D object tracking.
- Universal Style Transfer in PyTorch
- NVIDIA Deep learning Dataset Synthesizer (NDDS)
- Object Detection at 2530 FPS with TensorRT and 8-Bit Quantization (2020)
- HTML4Vision - Simple HTML visualization tool for computer vision research.
- Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
- Taming Transformers for High-Resolution Image Synthesis
- X-Temporal - Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs.
- NanoDet - Super fast and lightweight anchor-free object detection model. Real-time on mobile devices.
- PyTorch Image Models
- Awesome Vision and Language - Curated list of awesome vision and language resources.
- DropBlock: A regularization method for convolutional networks (2018) (Code)
- Glasses - Compact, concise and customizable deep learning computer vision library. (Web)
- Explorable Super Resolution (2019)
- PySceneDetect - Python and OpenCV-based scene cut/transition detection program & library.
- Best Practices for Building Computer Vision Models (2021)
- TIDE - General Toolbox for Identifying Object Detection Errors.
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals (2020) (Code)
- Unsplash Image Search - Search photos on Unsplash using natural language.
- Kimera Semantics - Real-Time 3D Semantic Reconstruction from 2D data.
- Voxblox++ - Volumetric object-level semantic mapping framework.
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces (Code)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video (2020) (Code)
- DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation (2019) (Code)
- Awesome Neural Radiance Fields
- D2Det: Towards High Quality Object Detection and Instance Segmentation (2020)
- DetCo: Unsupervised Contrastive Learning for Object Detection (2021) (Code) (Code)
- Computer Vision Video Lectures - Curated list of free, high-quality, university-level courses with video lectures related to the field of Computer Vision.
- Cord - Training data toolbox for computer vision. (HN)
- Text-Guided Editing of Images (Using CLIP and StyleGAN)
- torchvision - Datasets, Transforms and Models specific to Computer Vision. (Web)
- MeInGame: Create a Game Character Face from a Single Portrait (2021) (Code)
- Awesome Deep Vision
- dataset-tools - Tools for quickly normalizing image datasets.
- Using Streamlit to visualize object detection output (2021)
- Mobile Computer Vision @ Facebook
- Opening the black box of vision AI algorithms (2021)
- CompreFace - Free face recognition solution that can be easily integrated into any IT system without prior machine learning skills.
- IBRNet: Learning Multi-View Image-Based Rendering (2021) (Code)
- From Coarse to Fine: Robust Hierarchical Localization at Large Scale (2019) (Code)
- Camera Response Function (2021)
- I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image (2020) (Code)
- SkipNet: Learning Dynamic Routing in Convolutional Networks (2018) (Code)
- Mrcal - Camera Calibrations and More. (HN)
- Digging Into Self-Supervised Monocular Depth Estimation (2019) (Code) (Code)
- VISSL - FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images. (Web)
- Zumo Labs - Generate custom synthetic data sets that result in more robust and reliable computer vision models. (GitHub)
- Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors (2020) (Code)
- Perceiver: General Perception with Iterative Attention (2021) (Code)
- SEER: The start of a more powerful, flexible, and accessible era for computer vision (2021)
- NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction (2021)
- Neural 3D Video Synthesis
- Involution: Inverting the Inherence of Convolution for Visual Recognition (2021) (Code)
- Awesome Causality in Computer Vision
- Vision Transformers for Dense Prediction (2021) (Code)
- LoFTR: Detector-Free Local Feature Matching with Transformers (2021) (Code)
- ccv - C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library.
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes (2020) (Code)
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (2021) (Tweet)
- Computer Vision and Embroidery (2021) (Code)
- mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields (2021)
- Python libraries I use every day for computer vision work (2021)
- Awesome Temporal Sentence Grounding in Videos
- The Affective Growth of Computer Vision
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (2020) (Code)
- End-to-End Video Instance Segmentation with Transformers (2021) (Code)
- SAHI: Slicing Aided Hyper Inference
- FOVO: A new 3D rendering technique based on human vision (2020) (HN)
- Is Space-Time Attention All You Need for Video Understanding? (2021) (Code)
- Awesome Visual-Transformer - Transformer with Computer-Vision (CV) papers.
- PyTorchVideo - Deep learning library for video understanding research. (Web)
- Self-supervised Video Object Segmentation by Motion Grouping (2021) (HN) (Code)
- torchvideo - Datasets, transforms and samplers for video in PyTorch.
- A General and Adaptive Robust Loss Function (2019) (Code)
- Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (2020) (Code)
- MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation (2021)
- Vizy - AI Camera.
- MMPX Style-Preserving Pixel Art Magnification (2021) (HN)
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion (Code)
- SuperPoint: Self-Supervised Interest Point Detection and Description (2018) (Code)
- Multi-Stage Progressive Image Restoration (2021) (Code)
- COLMAP - General-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. (Docs)
- Awesome Vision-based SLAM / Visual Odometry
- Barlow Twins: Self-Supervised Learning via Redundancy Reduction (2021) (Code)
- HIPCL - OpenCL/SPIR-V implementation of HIP.
- MMCV - Foundational library for computer vision research that supports many research projects. (Docs)
- MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (2021) (Code)
- Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (2021) (Code) (Code)
- Emerging Properties in Self-Supervised Vision Transformers (2021) (Code) (Tweet) (Tweet)
- Geometry-Free View Synthesis: Transformers and no 3D Priors (2021) (Code)
- Easily Transform Portraits of People into AI Aberrations Using StyleCLIP (2021)
- DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates (2021) (Code)
- Onepanel - Open and extensible integrated development environment (IDE) for computer vision. (Web)
- Vector Neurons: A General Framework for SO(3)-Equivariant Networks (2021) (Code)
- ISTR: End-to-End Instance Segmentation with Transformers (2021) (Code)
- MLP-Mixer: An all-MLP Architecture for Vision (2021) (Code) (Code)
- Self-attention building blocks for computer vision applications in PyTorch
- LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary (2021) (Web) (Code)
- Neural Rendering: How Low Can You Go in Terms of Input? (2021)
- Enhancing Photorealism Enhancement (2021) (Paper) (Code)
- DeepFaceEditing: Deep Face Generation and Editing with Disentangled Geometry and Appearance Control (2021) (Code)
- Omnimatte: Associating Objects and Their Effects in Video (2021)
- Rethinking "Batch" in BatchNorm (2021)
- Most popular metrics used to evaluate object detection algorithms
- UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation (2020) (Code)
- Synthetic for Computer Vision - List of synthetic datasets and tools for computer vision.
- vision_blender - Blender addon for generating synthetic ground truth data for Computer Vision applications.
- Easy Few-Shot Learning - Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification.
- BasicSR (Basic Super Restoration) - Open source image and video restoration toolbox based on PyTorch, covering super-resolution, denoising, deblurring, JPEG artifact removal, etc.
- Intriguing Properties of Vision Transformers (2021) (Reddit)
- DIY Amazon Go – computer vision tutorial for cashierless checkout
- Image Retrieval in the Wild (2020)
- Awesome Transformer in CV papers
- Sensor Calibration from Scratch with Rust (2021)
- Tangram Vision - Integrate, Calibrate Perception Sensors For Robots, Drones & Automation. (Blog)
- Rust CV - Project to implement computer vision algorithms, abstractions, and systems in Rust.
- Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control (2021) (HN)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion (2021) (Code)
- MERLOT: Multimodal Neural Script Knowledge Models (2021) (Tweet)
- Scaling Vision Transformers (2021)
- Self-Supervised Scene De-occlusion (2020) (Code)
- Pivotal Tuning for Latent-based Editing of Real Images (2021) (Code)
- FLAME: Articulated Expressive 3D Head Model (Code)
- XCiT: Cross-Covariance Image Transformers (2021) (Code)
- Robust Consistent Video Depth Estimation (2021) (Code)
- cvpods - All-in-one Toolbox for Computer Vision Research.
- CDFI: Compression-Driven Network Design for Frame Interpolation (2021) (Code)
- NeRF--: Neural Radiance Fields Without Known Camera Parameters (2021) (Code) (Code)
- Oxford Active Vision Laboratory (GitHub)
- Computer Vision: Algorithms and Applications, 2nd ed.
- motionEyeOS - Linux distribution that turns your single board computer into a video surveillance system.
- Long-Short Transformer: Efficient Transformers for Language and Vision (2021) (Code)
- Feature Visualization – How NNs understand images (2017)
- What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (2019) (Code)
- Convolutional Hough Matching Networks (2021) (Code)
- Efficient Self-Supervised Vision Transformers (EsViT)
- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases (2021) (Code) (Paper Read) (Article)
- CO3D: Common Objects In 3D - Tools for working with the Common Objects in 3D (CO3D) dataset.
- ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition (2021) (Code)
- Vision Transformer Architecture Search (2021) (Code)
- TSIT: A Simple and Versatile Framework for Image-to-Image Translation (2020) (Code)
- Recognizing People in Photos Through Private On-Device Machine Learning (2021)
- CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation (2021) (Code)
- HPNet: Deep Primitive Segmentation Using Hybrid Representations (2021) (Code)
- Portal - Fastest way to load and visualize your deep neural networks on images and videos.
- Awesome Human Pose Estimation
- Learning A Single Network for Scale-Arbitrary Super-Resolution (2021) (Code)
- PyTorch implementation for Vision Transformer
- Repulsive Curves - Model 2D & 3D curves while avoiding self-intersection. (Tweet) (Code) (HN)
- SDEdit: Image Synthesis and Editing with Stochastic Differential Equations (Code)
- Region Similarity Representation Learning (2021) (Code)
- NeX: Real-time View Synthesis with Neural Basis Expansion (2021) (Code)
- Convolutional Occupancy Networks (2020) (Code)
- Learning Optical Flow from a Few Matches (2021) (Code)
- Visual Parser: Representing Part-whole Hierarchies with Transformers (2021) (Code)
- Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation (Code)
- On Generating Transferable Targeted Perturbations (2021) (Code)
- Awesome Scene Understanding - List of papers for scene understanding.
- Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (2021) (Code)
- DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks (2021) (Code)
- Object Detection in an Hour (2021) (HN)
- Fixing the train-test resolution discrepancy (2020) (Code)
- Align Deep Features for Oriented Object Detection (2020) (Code)
- Vision-Language Transformer and Query Generation for Referring Segmentation (2021) (Code)
- Depth-supervised NeRF: Fewer Views and Faster Training for Free (2021) (Code)
- SwinIR: Image Restoration Using Swin Transformer (2021) (Code)
- You Only Learn One Representation: Unified Network for Multiple Tasks (2021) (Code)
- Probabilistic Modeling for Human Mesh Recovery (2021) (Code)
- BARF: Bundle-Adjusting Neural Radiance Fields (2021) (Code)
- Self-Calibrating Neural Radiance Fields (2021) (Code)
- Transformers-Tutorials - Demos I made with the Transformers library by HuggingFace.
- 3D Human Texture Estimation from a Single Image with Transformers (2021) (Code)
- CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval (2021) (Code)
- RAFT: Recurrent All Pairs Field Transforms for Optical Flow (2020) (Code)
- Volume rendering + 3D implicit surface = Neural 3D Reconstruction
- Hierarchical Deep Stereo Matching on High-resolution Images (2019) (Code)
- Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering (2021) (Code)
- Image Synthesis via Semantic Composition (2021) (Code)
- Awesome-Edge-Detection-Papers
- Awesome-Image-Colorization
- Face Recognition - 2D and 3D Face alignment library built using PyTorch.
- Awesome image retrieval papers
- PeekingDuck - Modular framework built to simplify Computer Vision inference workloads.
- Pri3D: Can 3D Priors Help 2D Representation Learning? (2021) (Code)
- FaceXLib - Aims at providing ready-to-use face-related functions based on current SOTA open-source methods.
- MMAction2 - Open-source toolbox for video understanding based on PyTorch.
- Awesome Collision Detection
- Video Super-Resolution Transformer (2021) (Code)
- NeRF Atlas - Collection of NeRF extensions for fun and experimentation.
- Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, BSRGAN, SwinIR
- Uformer: A General U-Shaped Transformer for Image Restoration (2021) (Code) (Code)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining (2021) (Code)
- SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes (2021) (Code)
- HRFormer: High-Resolution Transformer for Dense Prediction, NeurIPS 2021
- IceVision - Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come. (Docs)
- e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks (2021) (Tweet)
- Attention Gated Networks (Image Classification & Segmentation) in PyTorch
- Full-Duplex Strategy for Video Object Segmentation (2021) (Code)
- YoHa - Practical hand tracking engine. (HN) (Code)
- Deep Learning for Face Anti-Spoofing: A Survey (2021) (Code)
- A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (2021) (Code)
- Resolution-robust Large Mask Inpainting with Fourier Convolutions (2021) (Code)
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021) (Code) (Code) (HN)
- ADOP: Approximate Differentiable One-Pixel Point Rendering (2021) (Tweet) (Tweet) (Code)
- Patches Are All You Need? (2021) (Code)
- ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation (2020) (Code)
- Video Panoptic Segmentation (2020) (Code)
- Awesome-ICCV2021-Low-Level-Vision - Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation.
- Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (2021) (Code)
- Non-deep Networks (2021) (Code)
- receptivefield - Gradient based receptive field estimation for Convolutional Neural Networks.
- Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations (2021) (Code)
- Neural Articulated Radiance Field (2021) (Code)
- Efficient Visual Pretraining with Contrastive Detection (2021) (Code)
- VoTT (Visual Object Tagging Tool) - Open source annotation and labeling tool for image and video assets.
- FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes (2021) (Code)
- ByteTrack: Multi-Object Tracking by Associating Every Detection Box (2021) (Code)
- Dense Video Captioning with Bi-modal Transformer (2020) (Code)
- PyTorch-Encoding - CV toolkit for my papers. (Docs)
- Space Time Recurrent Memory Network (2021) (Code)
- CVNets - Library for training computer vision networks.
- Scenic - Jax Library for Computer Vision Research and Beyond. (Paper)
- CV Arxiv Daily (Code)
- OpenVisionCapsules - Set of libraries for encapsulating smart vision algorithms.
- MedMNIST: Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification (Code)
- Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (2021) (Code)
- Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces (2021) (Code)
- The 2021 Image Similarity Dataset and Challenge (2021) (Code)
- K-Net: Towards Unified Image Segmentation (2021) (Code)
- Yolov5 + Deep Sort with PyTorch
- Shape As Points: A Differentiable Poisson Solver (2021) (Code)
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm (2021) (Code)
- Awesome Vision-Language Navigation
- An Exploration of Embodied Visual Exploration (2021) (Code)
- DVC: An End-to-end Deep Video Compression Framework (2019) (Code)
- Pixray - Neural image generation.
- Unsupervised Learning of Compositional Energy Concepts (2021) (Tweet)
- Learning with Noisy Labels for Robust Point Cloud Segmentation (2021) (Code)
- Kalidoface - Become a virtual character with just your webcam. (Web)
- KalidoKit - Face, Pose, and Hand Tracking Kinematics.
- The Ancient Secrets of Computer Vision
- Unsupervised Real-world Image Super Resolution via Domain-distance Aware Training (2020) (Code)
- PyGaze - Open source eye-tracking software and more. (HN)
- Exploring Relational Context for Multi-Task Dense Prediction (2021) (Code)
- Neural Scene Graphs for Dynamic Scenes (2021) (Code)
- Image Super-Resolution via Iterative Refinement (HN) (Code)
- UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning (2021) (Code)
- Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers (2021) (Code)
- Multimodal Virtual Point 3D Detection (2021) (Code)
- SiT: Self-supervised vIsion Transformer
- Attention Mechanisms in Computer Vision: A Survey (2021)
- Awesome Vision Attention Papers
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation (2021) (Code)
- RenderNet: A deep convolutional network for differentiable rendering from 3D shapes (2018) (Code)
- Masked Autoencoders Are Scalable Vision Learners (2021) (Code) (Code) (Code)
- BoostingMonocularDepth
- It's About Time: Analog Clock Reading in the Wild (2021) (Tweet) (Code)
- Learning to Compose Visual Relations (2021) (Code)
- LF-Net: Learning Local Features from Images (2018) (Code)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning (2021) (Code)
- Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis (2021) (Code)
- Deep unfolding network for image super-resolution (2020)
- VOLO: Vision Outlooker for Visual Recognition (2021) (Code)
- Direct Multi-view Multi-person 3D Pose Estimation (2021) (Code)
- Image2Mesh: A learning framework for single image 3D reconstruction (2019) (Code)
- GammaCV - WebGL accelerated Computer Vision library for modern web applications. (Web)
- Localizing Objects with Self-Supervised Transformers and no Labels (2021) (Code)
- Harvester - GenICam-based Image Acquisition Python Library.
- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (2021) (Code) (PyTorch Code)
- ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (2021) (Code)
- MetaFormer is Actually What You Need for Vision (2021) (Code)
- ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators (2021) (Code)
- Mesa: A Memory-saving Training Framework for Transformers (2021) (Code)
- MMPose - Open-source toolbox for pose estimation based on PyTorch. (Docs)
- An Empirical Study of Training End-to-End Vision-and-Language Transformers (2021) (Code)
- Useful computer vision PhD resources
- Tenyks - Data-centric Computer Vision.
- Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation (2021) (Code)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields (2021) (Code)
- Learning to See by Looking at Noise (2021) (Code)
- iBOT: Image BERT Pre-Training with Online Tokenizer (2021) (Code)
- Grounded Language-Image Pre-training (2021) (Code)
- 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction (2016) (Code)
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks (Code)
- Awesome Visual Grounding
- Are Transformers More Robust Than CNNs? (2021) (Code)
- Plenoxels: Radiance Fields without Neural Networks (2021) (Code) (Code)
- GFPGAN - Developing Practical Algorithms for Real-world Face Restoration.
- Awesome Video Stabilization
- MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo (2021) (Code)
- Tracking People with 3D Representations (2021) (Code)
- Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection (2019) (Code)
- Learning to Stylize Novel Views (2021) (Code)
- YOLOX - High-performance anchor-free YOLO. (Docs)
- PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop (2021) (Code)
- SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation (2021) (Code)
- NeRD: Neural Reflectance Decomposition from Image Collections (2021) (Code)
- Vector Quantized Diffusion Model for Text-to-Image Synthesis (2021) (Code) (Code) (Code)
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (2021) (Code)
- SynthDet - End-to-end object detection pipeline using synthetic data.
- MPViT: Multi-Path Vision Transformer for Dense Prediction (2021) (Code)
- StyleSwin: Transformer-based GAN for High-resolution Image Generation (2021) (Code)
- Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline (2021) (Code)
- SLIP: Self-supervision meets Language-Image Pre-training (2021) (Code)
- General Facial Representation Learning in a Visual-Linguistic Manner (2021) (Code) (Code)
- HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields (Code) (HN)
- Learning to Regress Bodies from Images using Differentiable Semantic Rendering (2021) (Code)
- High-Resolution Image Synthesis with Latent Diffusion Models (2021) (Code)
- Photorealistic Audio-driven Video Portraits (2020) (Code)
- Awesome Hand Pose Estimation
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (2021) (Code)
- Transformer Interpretability Beyond Attention Visualization (2021) (Code)
- StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis (2021) (Code)
- Light Field Image Super-Resolution with Transformers (2021) (Code)
- Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (2021) (Code)
- DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample (2021) (Code)
- RAFT-3D: Scene Flow using Rigid-Motion Embeddings (2021) (Code)
- Unsupervised Indoor Depth Estimation (2020) (Code)
- A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose (2021) (Code)
- Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective (2021) (Code)
- Sara - Easy-to-Use C++ Computer Vision Library.
- RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching (2021) (Code)
- U-2-Net: Going Deeper with Nested U-Structure for Salient Object Detection (2020) (Code)
- Language as Queries for Referring Video Object Segmentation (2022) (Code)
- Localization with Sampling-Argmax (2021) (Code)
- VOCA: Voice Operated Character Animation (Code)
- CVZone - Computer vision package that makes it easy to run image processing and AI functions.
- Deepface - Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python.
- Location-aware Single Image Reflection Removal (2021) (Code)
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement (2021) (Code)
- Detecting Twenty-thousand Classes using Image-level Supervision (2022) (Code)
- Language-driven Semantic Segmentation (2022) (Code)
- Rethinking Nearest Neighbors for Visual Classification (2021) (Code)
- Vision Transformer with Deformable Attention (2022) (Code) (Code)
- KerasCV - Industry-strength Computer Vision workflows with Keras.
- Instant Neural Graphics Primitives - Lightning fast NeRF and more.
- Dynamic Head: Unifying Object Detection Heads with Attentions (2021) (Code)
- ELSA: Enhanced Local Self-Attention for Vision Transformer (2021) (Code)
- FFCV - Fast Forward Computer Vision (and other ML workloads!) (Web)
- Awesome Vit - Curated list and survey of awesome Vision Transformers.
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding (2022) (Code) (Code) (Video Summary) (HN)
- Road Extraction by Deep Residual U-Net (2017) (Code)
- Single-Stage 6D Object Pose Estimation (2019) (Code)
- Visual Task Adaptation Benchmark (VTAB)
- TAda! Temporally-Adaptive Convolutions for Video Understanding (2022) (Code)
- UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction (2021) (Code)
- Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects (2020) (Code)
- VRT: A Video Restoration Transformer (2021) (Code)
- Unknown Object Segmentation from Stereo Images (2021) (Code)
- Stacked Cross Attention for Image-Text Matching (2018) (Code)
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022) (Code)
- DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows (2021) (Code)
- DocFormer: End-to-End Transformer for Document Understanding (2022) (Code)
- SeMask: Semantically Masked Transformers for Semantic Segmentation (2021) (Code)
- Image Quality Assessment: Unifying Structure and Texture Similarity (2020) (Code)
- Learning Super-Features for Image Retrieval (2022)
- YOLOv7 - Framework Beyond Detection.
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model (2021) (Code)
- Single/Multiple Object Tracking and Segmentation
- Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection (2021) (Code)
- HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping (2021) (Code)
- Scalable Large Scene Neural View Synthesis (2022) (HN)
- Transformer Recipe - Quick recipe to learn all about Transformers.
- NeROIC: Neural Rendering of Objects from Online Image Collections (2022) (Code)
- DiffusionNet: Discretization Agnostic Learning on Surfaces (2022) (Code)
- FILM: Frame Interpolation for Large Motion (2022) (Code) (HN)
- Learning Signed Distance Field for Multi-view Surface Reconstruction (2021) (Code)
- Deep Metric Learning in PyTorch
- ICON: Implicit Clothed humans Obtained from Normals (2021) (Code)
- CLIPasso: Semantically-Aware Object Sketching (2022) (Code)
- BANMo: Building Animatable 3D Neural Models from Many Casual Videos (2022) (Code)
- How Do Vision Transformers Work?
- Top 10 Computer Vision Papers of 2021
- Exploring Sparsity in Image Super-Resolution for Efficient Inference (2021) (Code)
- AutoInt: Automatic Integration for Fast Neural Volume Rendering (2021)
- Learning to Prompt for Vision-Language Models (2021) (Code)
- Summarizing Videos with Attention (2019) (Code)
- vkit - Toolkit designed for CV (Computer Vision) developers. (Docs)
- Generative Adversarial Graph Convolutional Networks for Human Action Synthesis (2021) (Code)
- Awesome Image Matting
- Image-to-Markup Generation with Coarse-to-Fine Attention (Code)
- Push-ups with Python, mediapipe and OpenCV (HN)
- Lama-cleaner - Image inpainting tool powered by LaMa.
- Vision-Language Pre-Training with Triple Contrastive Learning (2022) (Code)
- 3D Machine Learning resources/papers
- FiftyOne - Open-source tool for building high-quality datasets and computer vision models.
- Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut (2022) (Code)
- Awesome Multiple Object Tracking
- Rethinking Coarse-to-Fine Approach in Single Image Deblurring (2021) (Code)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling (2021) (Code)
- As-ViT: Auto-scaling Vision Transformers without Training (2022) (Code)
- Awesome 3D Body Papers
- RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth (2021) (Code)
- Image Similarity Challenge
- Blended Diffusion for Text-driven Editing of Natural Images (2021) (Code)
- The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization (2021) (Code)
- Awesome Object Pose
- Video Enhancement papers/resources
- PowerQE: An Open Framework for Quality Enhancement of Compressed Visual Data
- Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels (2022) (Code)
- Accurate Image Alignment and Registration Using OpenCV (2022) (HN)
- Video Grounding and Captioning
- Awesome Detection Transformer
- StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis (2021) (Code) (Web) (HN)
- Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition (2020) (Code)
- MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation (2021) (Code)
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (2022) (Code)
- Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation (2022)
- CycleMLP: A MLP-like Architecture for Dense Prediction (2022) (Code)
- Image Quality Assessment Benchmark
- StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation (2021) (Code)
- Transformers, originally designed to handle language, are taking on vision (2022) (HN)
- Fast Image Processing with Fully-Convolutional Networks (2017) (Code)
- Efficient Attention: Attention with Linear Complexities (2020) (Code)
- Label-Efficient Semantic Segmentation with Diffusion Models (2022) (Code)
- hloc - Modular toolbox for state-of-the-art 6-DoF visual localization.
- All Tokens Matter: Token Labeling for Training Better Vision Transformers (2021) (Code)
- Deformable ConvNets v2: More Deformable, Better Results (2018) (Code)
- Restormer: Efficient Transformer for High-Resolution Image Restoration (2021) (Code)
- Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice (2022) (Code)
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video (2021) (Code)
- Awesome 3D Human Reconstruction
- Awesome 3D Human Resources List
- A ConvNet for the 2020s (2022) (Code) (Code)
- Remote-sensing-image-semantic-segmentation - Improved U-Net-based networks for semantic segmentation of remote sensing images, built on Keras.
- Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies (2021) (Code)
- TensoRF: Tensorial Radiance Fields (2022) (Code)
- Autoregressive Image Generation using Residual Quantization (2022) (Code) (Code)
- Pix2Pix Timbre Transfer
- One-Shot Adaptation of GAN in Just One CLIP (2022) (Code)
- PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds (2021) (Code)
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (2022) (Code)
- Awesome Masked Image Modeling
- BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training (2022) (Code)
- A Transformer-Based Siamese Network for Change Detection (2022) (Code)
- Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition (2021) (Code)
- Robust fine-tuning of zero-shot models (2022) (Code)
- DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision (2021) (Code)
- GroupViT: Semantic Segmentation Emerges from Text Supervision (2022) (Code)
- HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening (2022) (Code)
- TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing (2022) (Code)
- DeepStream-Yolo - NVIDIA DeepStream SDK 6.0.1 configuration for YOLO models.
- An Empirical Investigation of 3D Anomaly Detection and Segmentation (2022) (Code)
- Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation (2021) (Code)
- Layered Neural Atlases for Consistent Video Editing (2021) (Code)
- TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution (2020)
- Shape from Polarization for Complex Scenes in the Wild (2022) (Code)
- Pix2Seq - General framework for turning RGB pixels into semantically meaningful sequences.
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark (2022) (Code)
- Ensembling Hugging Face Transformers made easy
- Relational Knowledge Distillation (2019) (Code)
- NICE-SLAM: Neural Implicit Scalable Encoding for SLAM (2021) (Code)
- Neural 3D Mesh Renderer (2017) (Code)
- Large-scale Bilingual Language-Image Contrastive Learning (2022) (Code)
- OpenMVG - Open Multiple View Geometry library. Basis for 3D computer vision and Structure from Motion.
- Neural Points: Point Cloud Representation with Neural Fields (2021) (Code)
- OpenCV JS Web Worker - Getting started with OpenCV compiled to WebAssembly and loaded in a worker.
- Learning Graph Regularisation for Guided Super-Resolution (2022) (Code)
- Video Polyp Segmentation: A Deep Learning Perspective (2022) (Code)
- Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images (2022) (Code)
- HybridNets: End-to-End Perception Network (2022) (Code)
- HDR-NeRF: High Dynamic Range Neural Radiance Fields (2022) (Code) (HN)
- AdaMixer: A Fast-Converging Query-Based Object Detector (2022) (Code)
- MixFormer: End-to-End Tracking with Iterative Mixed Attention (2022) (Code)
- Bringing Old Films Back to Life (2022) (Code)
- Extracting Triangular 3D Models, Materials, and Lighting From Images (2022) (Code)
- LiT: Zero-Shot Transfer with Locked-image text Tuning (2021) (Tweet)
- LAFITE: Towards Language-Free Training for Text-to-Image Generation (2021) (Code)
- Neural 3D Video Synthesis from Multi-view Video (2022) (Code)
- ToFu: Topologically Consistent Multi-View Face Inference Using Volumetric Sampling (2021)
- Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning (2019) (Code)
- FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator (2021)
- Reddit Place Script 2022 - Script to draw an image onto r/place.
- A Unified Objective for Novel Class Discovery (2021) (Code)
- Papers and Datasets about Point Cloud
- On the Importance of Asymmetry for Siamese Representation Learning (2022) (Code)
- REGTR: End-to-end Point Cloud Correspondences with Transformers
- A Closer Look at Local Aggregation Operators in Point Cloud Analysis (2020) (Code)
- Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries (2022) (Code)
- Perception Prioritized Training of Diffusion Models (2022) (Code)
- VisualBERT: A Simple and Performant Baseline for Vision and Language (2019) (Code)
- MultiMAE: Multi-modal Multi-task Masked Autoencoders (2022) (Code)
- NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction (2021) (Code)
- Towards Open World Object Detection (2021) (Code)
- Transformer in Vision - Recent Transformer-based CV and related works.
- Shunted Self-Attention via Multi-Scale Token Aggregation (2021) (Code)
- Space-Time Correspondence as a Contrastive Random Walk (2020) (Code)
- MaskGIT: Masked Generative Image Transformer (2022) (Code)
- EasyCV - All-in-one computer vision toolbox based on PyTorch.
- Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection (2022) (Code)
- EMOCA: Emotion Driven Monocular Face Capture and Animation (2022)
- Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation (2022) (Code)
- FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (Code)
- PointCLIP: Point Cloud Understanding by CLIP (2022) (Code)
- DaViT: Dual Attention Vision Transformers (2022) (Code)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers (2022) (Code)
- Recovering 3D Human Mesh from Monocular Images: A Survey (2022) (Code)
- Video Diffusion Models (2022) (Web) (Code)
- MaxViT: Multi-Axis Vision Transformer (2022) (Code)
- Unified Contrastive Learning in Image-Text-Label Space (2022) (Code)
- RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering (2021) (Code)
- MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (2021) (Code)
- Learning What Not to Segment: A New Perspective on Few-Shot Segmentation (2022) (Code)
- MAXIM: Multi-Axis MLP for Image Processing (2022) (Code)
- Tensil tutorial for YOLO v4 Tiny on Ultra96 V2 (2022)
- UNITER: UNiversal Image-TExt Representation Learning (2020) (Code)
- Consistent Depth of Moving Objects in Video (2021) (Code)
- Bridging Video-text Retrieval with Multiple Choice Questions (2022) (Code)
- Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (2020) (Code)
- BACON: Band-limited Coordinate Networks for Multiscale Scene Representation (2022) (Code)
- Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results (2022) (Code)
- Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering (2021) (Code)
- SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image (2022) (Code)
- StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions (2021) (Code)
- Neighborhood Attention Transformer (2022) (Code)
- 3D Surface Reconstruction From Multi-Date Satellite Images (2021) (Code)
- Decoupling Makes Weakly Supervised Local Feature Better (2022) (Code)
- ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic (2022) (Code)
- EasyMocap - Open-source toolbox for markerless human motion capture from RGB videos.
- QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (2022) (Code)
- PolarMask: Single Shot Instance Segmentation with Polar Representation (2019) (Code)
- Latent Video Transformer (2020) (Code)
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (2020) (JAX Code)
- A Latent Transformer for Disentangled Face Editing in Images and Videos (2021) (Code)
- Photorealistic Style Transfer via Wavelet Transforms (2019) (Code)
- Probing ViTs
- Dense Depth Priors for Neural Radiance Fields from Sparse Input Views (2021) (Code)
- Self-Supervised Models are Continual Learners (2021) (Code)
- Mask Transfiner for High-Quality Instance Segmentation (2022) (Code)
- An Extendable, Efficient and Effective Transformer-based Object Detector (2022)
- Learned Queries for Efficient Local Attention (2021) (Code)
- 3D Human Pose Estimation with Spatial and Temporal Transformers (2021) (Code)
- 3D human pose estimation in video with temporal convolutions and semi-supervised training (2019) (Code)
- MC-Calib: A generic and robust calibration toolbox for multi-camera systems (2022) (Code)
- Understanding The Robustness in Vision Transformers (2022) (Code)
- Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation (2021) (Code)
- Tackling multiple tasks with a single visual language model (2022) (Code) (Tweet)
- Associating Objects with Transformers for Video Object Segmentation (2021) (Code)
- Simple multi-dataset detection - Object detection on multiple datasets with an automatically learned unified label space.
- Learning Texture Transformer Network for Image Super-Resolution (2020) (Code)
- Balanced MSE for Imbalanced Visual Regression (2022) (Code)
- Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions (2022) (Code)
- Action-Conditioned 3D Human Motion Synthesis with Transformer VAE (2021) (Code)
- CoMoGAN: continuous model-guided image-to-image translation (2021) (Code)
- OpenMVS - Open Multi-View Stereo reconstruction library.
- Sliced Recursive Transformer (2021) (Code)
- Neural Dual Contouring (2022) (Code)
- Awesome Deblurring - Curated list of resources for Image and Video Deblurring.
- CoCa: Contrastive Captioners are Image-Text Foundation Models (2022) (Code)
- Sequencer: Deep LSTM for Image Classification (2022)
- Language Models Can See: Plugging Visual Controls in Text Generation (2022) (Code)
- flyswot - CLI for Hugging Face Transformers image classification models.
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption (2022) (Code)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (2022) (Code)
- What do the Vision Transformers learn? How do they encode anything useful for image recognition? (2022)
- Integrative Few-Shot Learning for Classification and Segmentation (2022) (Code)
- DeltaConv: Anisotropic Geometric Deep Learning with Exterior Calculus (2022) (Code)
- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis (2021) (Code)
- Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (2022) (Code)
- ConvMAE: Masked Convolution Meets Masked Autoencoders (2022) (Code)
- Deep Kernelized Dense Geometric Matching (2022) (Code)
- Unsupervised Semantic Segmentation by Distilling Feature Correspondences (2022) (Code)
- RecursiveMix: Mixed Learning with History (2022) (Code)
- MMDetection3d - OpenMMLab's next-generation platform for general 3D object detection.
- Imagen: Text-to-Image Diffusion Models (Tweet) (Code) (HN) (HN)
- An End-to-End Transformer Model for 3D Object Detection (2021) (Code)
- Neural 3D Reconstruction in the Wild (2022) (Code)
- Body shape and pose estimation on 3D scans of people in clothing using Ceres Solver
- A Survey of Visual Transformers (2021) (Code)
- Nerfies: Deformable Neural Radiance Fields (2021) (Code)
- Working notes on the role of vision papers in basic science (2022) (Tweet)
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers (2022) (HN)
- Prompt-aligned Gradient for Prompt Tuning (2022) (Code)
- Text2Human: Text-Driven Controllable Human Image Generation (2022) (Code)
- OnePose: One-Shot Object Pose Estimation without CAD Models (2022) (Code)
- OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models (2022) (Code)
- PREF: Phasorial Embedding Fields for Compact Neural Representations (2022) (Code)
- Optimizing Relevance Maps of Vision Transformers Improves Robustness (2022) (Code)
- Exploring Visual Prompts for Adapting Large-Scale Models (2022) (Code)
- Deepfake Offensive Toolkit - Makes real-time, controllable deepfakes ready for virtual camera injection. (HN)
- Real-time Object Detection for Streaming Perception (2022) (Code)
- Volumentations 3D - Library for 3D augmentations.
- Awesome Learning with Label Noise
- LIVE: Towards Layer-wise Image Vectorization (2022) (Code)
- BEVT: BERT Pretraining of Video Transformers (2021) (Code)
- Variable Bitrate Neural Fields (2022) (Code)
- Gated-SCNN: Gated Shape CNNs for Semantic Segmentation (2019) (Code)
- Masked Unsupervised Self-training for Zero-shot Image Classification (2022) (Code)
- HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video (2022) (Code)
- Awesome Implicit NeRF Robotics
- EfficientFormer: Vision Transformers at MobileNet Speed (2022) (Code)
- ARF: Artistic Radiance Fields (2022) (Code) (HN)
- Patch2Pix: Epipolar-Guided Pixel-Level Correspondences (2020) (Code)
- Translating Images into Maps (2022) (Code)
- Instances as Queries (2021) (Code)
- OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction (2022) (Code)
- CogView: Mastering Text-to-Image Generation via Transformers (2021) (Code)
- All in One: Exploring Unified Video-Language Pre-training (2022) (Code)
- Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization (2022) (Code)
- Solving Inefficiency of Self-supervised Representation Learning (2021) (Code)
- NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination (2021) (Code)
- Trending in 3D Vision
- ShapeFormer: Transformer-based Shape Completion via Sparse Representation (2022) (Code)
- Awesome Prompting Papers in Computer Vision
- EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation (2022) (Code)
- GenDR: A Generalized Differentiable Renderer (2022) (Code)
- Elucidating the Design Space of Diffusion-Based Generative Models (2022) (Code) (Code)
- IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images (2022) (Code)
- Omnivore: A Single Model for Many Visual Modalities (2022) (Code)
- Benchmarking and Analyzing Point Cloud Classification under Corruptions (2022) (Code)
- DVGO: Direct Voxel Grid Optimization (Super-fast Convergence for Radiance Fields Reconstruction) (2022) (Code)
- RegionCLIP: Region-based Language-Image Pretraining (2021) (Code)
- Fast Light-Weight Near-Field Photometric Stereo (2022) (Code)
- ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (2021) (Code)
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models
- The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions (2022) (Code)
- 3D Moments from Near-Duplicate Photos (2022) (Code)
- Prototypical Contrastive Language Image Pretraining (2022) (Code)
- NeRV: Neural Representations for Videos (2021) (Code)
- MT-YOLOv6 - Single-stage object detection framework dedicated to industrial applications.
- Fast Point Transformer (2022) (Code)
- FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation (2022) (Code)
- Nettle Magic Project - Scanner for decks of cards with bar codes printed on card edges. (HN)
- Image Quality Assessment using Contrastive Learning (2021) (Code)
- Denoised MDPs: Learning World Models Better Than The World Itself (2022) (Code)
- Sparse Instance Activation for Real-Time Instance Segmentation (2022) (Code)
- Referring Image Matting (2022) (Code)
- Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds (2022) (Code)
- Contrastive Boundary Learning for Point Cloud Segmentation (2022) (Code)
- Scaling up Kernels in 3D CNNs (2022) (Code)
- Oriented RepPoints for Aerial Object Detection (2022) (Code)
- Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly (2022) (Code)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications (2022) (Code)
- Awesome Visual Diffusion Models
- Vision Transformer Adapter for Dense Predictions (2022) (Code)
- Activating More Pixels in Image Super-Resolution Transformer (2022) (Code)
- PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies (2022) (Code)
- GMFlow: Learning Optical Flow via Global Matching (2022) (Code)
- Vector-quantized Image Modeling with Improved VQGAN (2021) (JAX Code)
- Learned Vertex Descent: A New Direction for 3D Human Model Fitting (2022) (Code)
- YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors (2022) (Code)
- AITViewer - Set of tools to visualize and interact with sequences of 3D data.
- Object-Compositional Neural Implicit Surfaces (Code)
- Awesome Egocentric Vision
- MonoScene: Monocular 3D Semantic Scene Completion (2022) (Code)
- Visual Prompt Tuning (2022) (Code)
- Unified Implicit Neural Stylization (2022) (Code)
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis (2021) (Code)
- Text2LIVE: Text-Driven Layered Image and Video Editing (2022) (HN)
- HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction (2022) (Code)
- Generalization of Otsu's Method and Minimum Error Thresholding (2020)
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model (2022) (Code)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation (2021) (Code)
- Deformable Sprites for Unsupervised Video Decomposition (2022) (Code)
- Topologically-Aware Deformation Fields for Single-View 3D Reconstruction (2022) (Code)
- Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation (2021) (Code)
- Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions (2022) (Code)
- Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (2021) (Code)
- Box-supervised Instance Segmentation with Level Set Evolution (2022)
- Tent: Fully Test-Time Adaptation by Entropy Minimization (2021) (Code)
- UniFormer: Unifying Convolution and Self-attention for Visual Recognition (2022) (Code)
- MOTR: End-to-End Multiple-Object Tracking with Transformer (2022) (Code)
- Towards Grand Unification of Object Tracking (2022) (Code)
- Benchmarking Omni-Vision Representation through the Lens of Visual Realms (2022) (Code)
- Color Histograms in Image Retrieval
- SeqTR: A Simple yet Universal Network for Visual Grounding (2022) (Code)
- Image Inpainting with External-internal Learning and Monochromic Bottleneck (2021) (Code)
- Deep Image Homography Estimation (2016) (Code)
- Illumination Adaptive Transformer (2022) (Code)
- MotionCLIP: Exposing Human Motion Generation to CLIP Space (2022) (Code)
- Awesome Image Composition
- Scene Text Recognition with Permuted Autoregressive Sequence Models (2022) (Code)
- Multimodal Masked Autoencoders Learn Transferable Representations (Code)
- BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (2022) (Code)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (2022) (Code)
- AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields (2022) (Code)
- Harmonizer: Learning to Perform White-Box Image and Video Harmonization (2022) (Code)
- CVAT - Computer Vision Annotation Tool. (Code)
- NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing (2022)
- Monocular 3D Object Detection with Depth from Motion (2022) (Code)
- Masked Discrimination for Self-Supervised Learning on Point Clouds (2022) (Code)
- SORT - Simple, online, and real time tracking of multiple objects in a video sequence.
- Local Color Distributions Prior for Image Enhancement (2022) (Code)
- S2Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning (2022) (Code)
- Is Attention All NeRF Needs? (2022) (Code)
- Camouflaged/Concealed Object Detection
- Accelerate Vision Transformer (ViT) with Quantization using Optimum (2022)
- Optimizing Transformers for GPUs with Optimum (2022)
- Photogrammetry Guide (HN)
- Multi-View Mesh Reconstruction with Neural Deferred Shading (2022) (Code)
- Initialization and Alignment for Adversarial Texture Optimization (2022) (Code)
- DCT-Net: Domain-Calibrated Translation for Portrait Stylization (2022) (Code)
- Pretraining is All You Need for Image-to-Image Translation (2022) (Code)
- Vision-Centric BEV Perception: A Survey
- Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency (2022) (Code)
- Awesome Weakly Supervised Semantic Segmentation Papers
- GAUDI: A Neural Architect for Immersive 3D Scene Generation (2022) (Code) (HN)
- Multimodal Image Synthesis and Editing: A Survey (2021) (Code)
- High-Resolution Image Synthesis with Latent Diffusion Models (2022) (Code)
- ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters (2022) (Code)
- 3D Vision with Transformers: A Survey (2022)
- Optical Flow Processing Stack
- VideoX - Multi-modal Video Content Understanding
- Simple Baselines for Image Restoration (2022) (Code)
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning (2022)
- Revisiting the Critical Factors of Augmentation-Invariant Representation Learning (2022) (Code)
- Image Quality Related Papers
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution (2022) (Code)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (2022) (Code)
- Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise (2022) (Code)
- Flexible Diffusion Modeling of Long Videos (2022) (Code)
- MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries (2022) (Code)
- Escaping the Big Data Paradigm with Compact Transformers (2021) (Code)
- Towards Layer-wise Image Vectorization (2022) (Code)
- Awesome Optical Flow
- LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling (2022) (Code)
- SimpleRecon: 3D Reconstruction Without 3D Convolutions (2022) (Code)
- Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories (2022) (Code)
- Lance - Columnar Data Format for Machine Learning and Computer Vision.
- Strand-Braid - Live, low-latency 2D and 3D tracking from single or multiple high-speed cameras.
- Multi-Domain Incremental Learning for Semantic Segmentation (2022) (Code)
- ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning (2022) (Code)
- Awesome Vision-and-Language Pre-Training
- Deep Vision and Graphics course
- OpenMixup - CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark.
- MMEditing - Low-level vision toolbox based on PyTorch, supporting super-resolution, inpainting, matting, video interpolation, etc.
- Accelerating DETR Convergence via Semantic-Aligned Matching (2022) (Code)
- The Follower - Using open cameras and AI to find how an Instagram photo is taken (Tweet)
- Image Segmentation Using Text and Image Prompts (2022) (Code)
- Knowledge Distillation from A Stronger Teacher (2022) (Code)
- Learning Pixel Trajectories with Multiscale Contrastive Random Walks (2022) (Code)
- Text2Light: Zero-Shot Text-Driven HDR Panorama Generation (2022) (Code)
- detrex - Open-source toolbox that provides state-of-the-art Transformer-based detection algorithms.
- VToonify: Controllable High-Resolution Portrait Video Style Transfer (2022) (Code)
- MMYOLO - Open source toolbox for YOLO series algorithms based on PyTorch and MMDetection.
- Relighting4D: Neural Relightable Human from Videos (2022) (Code)
- GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images (2022) (Code)
- Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer (2019) (Code)
- Ask HN: Any good self-hosted image recognition software? (2022)
- LAVIS - One-stop Library for Language-Vision Intelligence.
- CATs: Cost Aggregation Transformers for Visual Correspondence (2021) (Code)
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation (2022) (Code)
- SetFit - Efficient Few-shot Learning with Sentence Transformers.
- Awesome Monocular 3D detection
- Human Motion Diffusion Model (2022) (Code) (HN)
- DreamFusion: Text-to-3D using 2D Diffusion (2022) (HN)
- Recent Advances in Vision-and-Language Pre-training (2022)
- DeepInteraction: 3D Object Detection via Modality Interaction (2022) (Code)
- Vision OSC - Send (almost) all Apple Vision Framework's detection results via OSC.
- Synergistic Self-supervised and Quantization Learning (2022) (Code)
- Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models (2022) (Code)
- StyleSwap: Style-Based Generator Empowers Robust Face Swapping (2022) (Code)
- IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis (2022) (Code)
- VMFormer: End-to-End Video Matting with Transformer (2022) (Code)
- Image-Based CLIP-Guided Essence Transfer (2021) (Code)
- Equivariant Point Network for 3D Point Cloud Analysis (2022) (Code)
- Computer Vision in the Wild Readings
- Nerfstudio - Collaboration friendly studio for NeRFs.
- Learning Dexterous Manipulation from Exemplar Object Trajectories and Pre-Grasps (2022) (Code)
- MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction (2022) (Code)
- Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising (2022) (Code)
- 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds (2022) (Code)
- PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation (2022) (Code)
- UPIT - FastAI/PyTorch package for unpaired image-to-image translation.
- On Distillation of Guided Diffusion Models (2022)
- Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence (2022) (Code)
- MaPLe: Multi-modal Prompt Learning (2022) (Code)
- GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation (2022) (Code)
- End2End Occluded Face Recognition by Masking Corrupted Features (2022) (Code)
- Paint Transformer: Feed Forward Neural Painting with Stroke Prediction (2021) (Code)
- Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance (2022) (Code)
- Adaptive Token Sampling For Efficient Vision Transformers (2022) (Code)
- Understanding Pure CLIP Guidance for Voxel Grid NeRF Models (2022) (Code)
- Real-Time Neural Character Rendering with Pose-Guided Multiplane Images (2022) (Code)
- Subspace Regularizers for Few-Shot Class Incremental Learning (2022) (Code)
- Exploring Long-Sequence Masked Autoencoders (2022) (Code)
- Awesome 3D-aware Image Synthesis – Papers, Codes and Datasets
- An Improved One millisecond Mobile Backbone (2021) (Code)
- Fuzzy Metaballs: Approximate Differentiable Rendering with Algebraic Surfaces (2022) (Code)
- Focal Modulation Networks (2022)
- Monocular Dynamic View Synthesis: A Reality Check (2022) (Code)
- Pose Recognition With Cascade Transformers (2021) (Code)
- Terran - Human perception library.
- Pento - Boost your business with computer vision. (GitHub)
- HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields (2022) (Code)
- FastestDet - Newly designed ultra lightweight anchor free target detection algorithm.
- Computer Vision, From 3D Reconstruction to Recognition Notes
- Stanford University: Deep Learning for Computer Vision (Notes)
- TAP-Vid: A Benchmark for Tracking Any Point in a Video (2022) (Code)
- StyleNAT: Giving Each Head a New Perspective (2022) (Code)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (2022) (Code)
- Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models (2022) (Code)
- SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery (2022) (Code)
- OpenSeeFace - Robust real time face and facial landmark tracking on CPU with Unity integration.
- GIT: A Generative Image-to-text Transformer for Vision and Language (2022) (Code)
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis (2022) (Code)
- Paddle Detection - High-Efficient Development Toolkit for Object Detection based on PaddlePaddle.
- Instant Neural Surface Reconstruction
- All are Worth Words: A ViT Backbone for Diffusion Models (2022) (Code)
- OneFormer: One Transformer to Rule Universal Image Segmentation (2022) (Code)
- Visual Object Tracking
- DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification (2021) (Code)
- Exploring CLIP for Assessing the Look and Feel of Images (2022) (Code)
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation (2022) (Code)
- Tracking without bells and whistles (2019) (Code)
- LaTr: Layout-Aware Transformer for Scene-Text VQA (2021) (Code)
- ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers (2022)
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image (2022) (Code)
- CLIP4Cir - CLIP for Conditioned image retrieval training code.
- Physics-based Character Controllers Using Conditional VAEs (2022) (Code)
- Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition (2021) (Code)
- Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention (2022) (Code)
- VLDet: Learning Object-Language Alignments for Open-Vocabulary Object Detection (2022)
- NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views (2022) (Code)
- Vision Transformers (ViT) Explained (2022) (HN)
- Embedding Methods for Image Search
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion (2022) (Code)
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild (2022) (Code)
- ODaM - Object detection and Monitoring.
- Token Merging: Your ViT But Faster (2022) (Code)
- NeuralUDF: Learning Unsigned Distance Fields for Multi-view Reconstruction of Surfaces with Arbitrary Topologies (2022) (Code)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars (2022) (Code)
- PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking (2022) (Code)
- Diffusion Models for Medical Image Analysis: A Comprehensive Survey (2022) (Code)
- RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data (2022) (Code)
- Rerun - Open source visualization infrastructure for computer vision and robotics. (Code) (OSS Release) (HN)
- ECON: Explicit Clothed humans Obtained from Normals (2022) (Code)
- Monocular, One-stage, Regression of Multiple 3D People
- Paint by Example: Exemplar-based Image Editing with Diffusion Models (2022) (Code)
- Detection Transformers with Assignment (2022)
- Splicing ViT Features for Semantic Appearance Transfer (2022) (Code)
- Polynomial Neural Fields for Subband Decomposition and Manipulation (2022)
- CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet (2022) (Code)
- RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value Pairs (2022)
- Vision-and-Language Navigation Resources
- HNeRV: A Hybrid Neural Representation for Videos (2022) (Code)
- GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation (2022) (Code)
- Zero Shot Image Restoration Using Denoising Diffusion Null-Space Model (2022) (Code)
- DifFace: Blind Face Restoration with Diffused Error Contraction
- What do Vision Transformers Learn? A Visual Exploration
- Images Speak in Images: A Generalist Painter for In-Context Visual Learning (2022) (Code)
- ShuffleMixer: An Efficient ConvNet for Image Super-Resolution
- SDFStudio: Unified Framework for Surface Reconstruction (Code)
- SegViT: Semantic Segmentation with Plain Vision Transformers (2022) (Code)
- MMEngine - Foundational library for training deep learning models based on PyTorch.
- SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields (2022) (Code)
- Great Computer Vision startups (2022)
- CoVA: Context-aware Visual Attention for Webpage Information Extraction (2022)
- Awesome 3D Object Detection
- ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection (2022) (Code)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis (2022) (Code)
- FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos (2022) (Code)
- SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution (2022)
- Knowledge Condensation Distillation (2022) (Code)
- NeuMan: Neural Human Radiance Field from a Single Video (2022) (Code)
- ScaleNet: Searching for the Model to Scale (2022) (Code)
- Very Recent Progress in 3D Hand Tasks
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (2022) (Code) (Code)
- NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields (2022)
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures (2022) (Code)
- Deep Architectures for Content Moderation and Movie Content Rating (2022) (Code)
- TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning (2022) (Code)
- Magic3D: High-Resolution Text-to-3D Content Creation (2022) (Code)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning (2022) (Code)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation (2021) (Code)
- Towards Robust Blind Face Restoration with Codebook Lookup Transformer (2022) (Code)
- Geo-Neus: Geometry-Consistent Neural Implicit Surfaces Learning for Multi-view Reconstruction (2022) (Code)
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders (2023) (Code)
- SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection (2022) (Code)
- SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation (2022)
- Guess What Moves: Unsupervised Video and Image Segmentation by Anticipating Motion (2022) (Code)
- Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera (2022) (Code)
- GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer (2022) (Code)
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models (2022)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models (2022) (Code)
- Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations (2022) (Code)
- Rethinking Resolution in the Context of Efficient Video Recognition (2022) (Code)
- OpenCV Mobile
- SINE: SINgle Image Editing with Text-to-Image Diffusion Models (2022) (Code)
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection (2022)
- Awesome Deep Optics/End-to-end Optical Design
- HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling (2023) (Code)
- Image generation with MNIST (2022)
- CiT: Curation in Training for Effective Vision-Language Data (2023) (Code)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare (2022) (Code)
- Bidirectional Projection Network for Cross Dimension Scene Understanding (2021) (Code)
- Image Distortion Correction - Curated list of resources on handling Rolling Shutter effects and Radial Distortions.
- Ultralytics YOLOv8 - YOLOv8 in PyTorch > ONNX > CoreML > TFLite.
- Neural Density-Distance Fields (2022) (Code)
- Vision Transformers Are Good Mask Auto-Labelers (2023) (Code)
- TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition (2022) (Code)
- EVA: Exploring the Limits of Masked Visual Representation Learning at Scale (2022) (Code)
- SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction (2022) (Code)
- Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling (2023) (Code)
- Generalized Decoding for Pixel, Image, and Language (2022) (Code)
- Global Context Vision Transformers (2022) (Code)
- Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning (2023) (Code)
- DensePose From WiFi (2022) (Tweet) (HN) (HN)
- CHAIRS: Towards Full-Body Articulated Human-Object Interaction (2022) (Code)
- MultiAct: Long-Term 3D Human Motion Generation from Multiple Action Labels (2023) (Code)
- GLIGEN: Open-Set Grounded Text-to-Image Generation (2023) (Code)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (2023) (Code)
- Multiview Compressive Coding for 3D Reconstruction (2023) (Code)
- Deep Learning Object Detection Paper List
- Efficient Neural Radiance Fields for Interactive Free-viewpoint Video (2022) (Code)
- InstructPix2Pix: Learning to Follow Image Editing Instructions (2022) (Code)
- Learned reconstructions for practical mask-based lensless imaging (Code)
- NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos (2022) (Code)
- Domain Expansion of Image Generators (2023) (Code)
- Computer Vision: Models, Learning, and Inference
- Reversible Column Networks (2022) (Code)
- Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach (2021) (Code)
- Long-tail Detection with Effective Class-Margins (2022) (Code)
- Diffusion-SDF: Text-to-Shape via Voxelized Diffusion (2022) (Code)
- Learning 3D-aware Image Synthesis with Unknown Pose Distribution (2023) (Code)
- Video object detection in Elixir using Nx and Bumblebee (2023)
- K-Planes: Explicit Radiance Fields in Space, Time, and Appearance (2023) (Code)
- Text2LIVE: Text-Driven Layered Image and Video Editing (2022) (Code)
- Disentangled Representation Learning for Text-Video Retrieval (2022) (Code)
- Text-To-4D Dynamic Scene Generation (2023) (HN)
- PhyCV - Physics-inspired Computer Vision Library.
- Fast Dynamic Radiance Fields with Time-Aware Neural Voxels (2022) (Code)
- Learning Customized Visual Models with Retrieval-Augmented Knowledge (2023) (Code)
- Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models (2022) (Code)
- Accelerating Guided Diffusion Sampling with Splitting Numerical Methods (2023) (Code)
- SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections (2023) (Code)
- STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation (2023) (Code)
- Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline (2023) (Code)
- Awesome Vision Transformer Collection
- Compressed Vision for Efficient Video Understanding (2022) (Code)
- Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation (2023) (Code)
- SEGA: Instructing Diffusion using Semantic Dimensions (2023) (Code)
- Dreamix: Video Diffusion Models are General Video Editors (2023) (Web)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis (2023) (Code)
- ESP-WHO - Face detection and recognition framework.
- Egocentric Video-Language Pretraining (2022) (Code)
- EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations (2022) (Code)
- Revealing Single Frame Bias for Video-and-Language Learning (2022) (Code)
- EVA3D: Compositional 3D Human Generation from 2D Image Collections (2022) (Code)
- Zero-shot Image-to-Image Translation (2023) (Code)
- MatteFormer: Transformer-Based Image Matting via Prior-Tokens (2022) (Code)
- minREV - Simple minimal implementation of Reversible Vision Transformers.
- Cut and Learn for Unsupervised Object Detection and Instance Segmentation (2023) (Code)
- T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models (2023) (Code)
- 3D-aware Conditional Image Synthesis (2023) (Code)
- Deblur-NeRF: Neural Radiance Fields from Blurry Images (2021) (Code)
- Learning When to Say "I Don't Know" (2022) (Code)
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation (2023)
- NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild (2021) (Code)
- 3D Shape Analysis Paper List
- Generating Holistic 3D Human Motion from Speech (2022) (Code)
- SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation (2023) (Code)
- Audio-Visual Face Reenactment (2022) (Code)
- Awesome Distribution Shift
- Everything at Once – Multi-modal Fusion Transformer for Video Retrieval (2022) (Code)
- TEXTure: Text-Guided Texturing of 3D Shapes (Code)
- SIMPLI - Self-improving Multiplane-to-layer Images for Novel View Synthesis (2023)
- Awesome Image Registration - Image registration related books, papers, videos, and toolboxes.
- Learning Visual Representations via Language-Guided Sampling (2023) (Code)
- RealFusion: 360° Reconstruction of Any Object from a Single Image (2023) (Code)
- Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment (2022) (Code)
- Composer: Creative and Controllable Image Synthesis with Composable Conditions (2023) (Code)
- Decoupling Human and Camera Motion from Videos in the Wild (2023) (Code)
- ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth (2023) (Code)
- The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition (2023) (Code)
- Image as Set of Points (2023) (Code)
- MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound (2022) (Code)
- NeRF2Mesh: Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement (2023) (Code)
- Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction (2023) (Code)
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices (2023) (Code)
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation (2022) (Code)
- FFCV-SSL - Fast Forward Computer Vision for Self-Supervised Learning.
- How computer vision is changing manufacturing in 2023 (HN)
- Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation (2023) (Code)
- NeRFshop: Interactive Editing of Neural Radiance Fields
- Blind Video Deflickering by Neural Filtering with a Flawed Atlas (2023)
- Universal Instance Perception as Object Discovery and Retrieval (2023)
- FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization (2023) (Code)
- Vid2Seq: a pretrained visual language model for describing multi-event videos (2023) (HN)
- Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (2023) (Code)
- Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation (2023) (Code)
- Generative Semantic Segmentation (2023) (Code)
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators (2023) (Code) (HN)
- Diffusion-based Generation, Optimization, and Planning in 3D Scenes (2023) (Code)
- Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes (2023) (Code)
- Pointcept - Powerful and flexible codebase for point cloud perception research.
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (2023) (Code)
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model (2023) (Code)
- GlueStick - Joint Deep Matcher for Points and Lines.
- GVision - Reverse image search app that uses Google Cloud Vision API to detect landmarks and web entities from images.
- Segment Anything (2023) (Code)
- Detecting and Grounding Multi-Modal Media Manipulation (2023) (Code)
- Better Aligning Text-to-Image Models with Human Preference (2023) (Code)
- Scaling Language-Image Pre-training via Masking (2022) (Code)
- From Zero to Hero: Convincing with Extremely Complicated Math (2023) (HN)
- Zero-shot Generative Model Adaptation via Image-specific Prompt Learning (2023) (Code)
- Awesome Digital Human - Collection of resources on digital human including clothed people digitalization, virtual try-on, and other related directions.
- Grounded-Segment-Anything - Marrying Grounding DINO with Segment Anything - Detect and Segment Anything with Text Inputs.
- VideoCrafter: Toolkit for Text-to-Video Generation and Editing
- DiffMimic: Efficient Motion Mimicking with Differentiable Physics (2023) (Code)
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (2023) (Code)
- MetaSeg: Packaged version of the Segment Anything repository
- Segment Anything with Clip
- Hachi - Natural Language search for Videos and Images.
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (2023) (Code)
- EditAnything - Segment Anything + ControlNet + BLIP2 + Stable Diffusion. (HN)
- Segment Anything EO tools
- PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models (2023) (Code)
- Awesome-Anything - General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, etc.
- Connect Segment-Anything with CLIP
- Semantic Segment Anything - Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset.
- Segment Anything Labelling Tool (SALT)
- CenterCLIP: Token Clustering for Efficient Text-Video Retrieval (2022) (Code)
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (2023) (Code)
- SVDiff: Compact Parameter Space for Diffusion Fine-Tuning (2023) (Code)
- Detection Transformer with Stable Matching (2023) (Code)
- Caption Anything via Clicking
- Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition (2023) (Code)
- Prompt-Segment-Anything - Implementation of zero-shot instance segmentation using Segment Anything.
- Segment Anything for Stable Diffusion Webui
- Semaphore - Full-body keyboard using gestures to type through computer vision. (HN)
- CleanVision - Automatically find issues in image datasets and practice data-centric computer vision.
- NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior (2023) (Code)
- Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs (2023) (Code)
- FAIR Animated Drawings (Code) (HN)
- Anything-3D - Segment-Anything + 3D. Let's lift the anything to 3D.
- 3D-Box via Segment Anything
- Rich-Text-to-Image Generation
- SEEM: Segment Everything Everywhere All at Once
- DINOv2: Meta’s Open Source State-of-the-art Computer Vision Models (2023) (HN) (Code)
- Dynablox - Real-time detection of diverse dynamic objects in complex environments.
- Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning (2023)
- Inpaint Anything: Segment Anything Meets Image Inpainting
- Transformer-Based Visual Segmentation: A Survey (2023)
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation (2023) (Code)
- segment-geospatial - Python package for segmenting geospatial data with the Segment Anything Model (SAM).
- I Hear Your True Colors: Image Guided Audio Generation
- Contrastive Audio-Visual Masked Autoencoder (2023)
- F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories (2023) (Code)
- Angler: Helping Machine Translation Practitioners Prioritize Model Improvements (2023) (Code)
- Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields (2023) (Code)
- Mask-Free Video Instance Segmentation (2023) (Code)
- Fine-tuned CLIP models are efficient video learners (2023)
- Track-Anything - Flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem.
- Supervision - Easy-to-use utils that will come in handy in any Computer Vision project.
- Roboflow Notebooks - Examples and tutorials on using SOTA computer vision models and techniques.
- Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations (2023)
- Awesome Segment Anything - Tracking and collecting papers/projects/others related to Segment Anything.
- SuperGradients - Easily train or fine-tune SOTA computer vision models with one open source training library.
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking (2023)
- DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation (2023) (Code)
- Shap-E: Generating Conditional 3D Implicit Functions (2023) (Code) (HN)
- Personalize Segment Anything Model with One Shot (2023) (Code)
- Segment Anything 3D
- FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
- ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision (2023) (Code)
- ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities (2023) (Code)
- Denoising Diffusion Models: A Generative Learning Big Bang (2023) (Code)
- LERF: Language Embedded Radiance Fields (2023) (Code)
- Masked Diffusion Transformer is a Strong Image Synthesizer (2023) (Code)
- Decentralization and Acceleration Enables Large-Scale Bundle Adjustment (2023) (Code)
- Awesome-Visual-Instruction-Tuning
- Awesome 3D Reconstruction Papers
- Better Diffusion Models Further Improve Adversarial Training (2023) (Code)
- Accelerated Coordinate Encoding: Learning to Relocalize in Minutes using RGB and Poses (2023) (Code)
- MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model (2022) (Code)
- DiM: Distilling Dataset into Generative Model (2023) (Code)
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation (2023) (Code)
- Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models (2023) (Code)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising (2023) (Code)
- Collaborative Diffusion for Multi-Modal Face Generation and Editing (2023) (Code)
- Learning Attention as Disentangler for Compositional Zero-shot Learning (2023) (Code)
- GRES: Generalized Referring Expression Segmentation (2023) (Code)
- Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles (2023) (Code)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (2023) (Code)
- Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models (2023) (Code)
- Text2Tex: Text-driven Texture Synthesis via Diffusion Models (2023) (Code)
- Apple releasing segmentation/pose for humans and animals (HN)
- Real-time 6K Image Rescaling with Rate-distortion Optimization (2023) (Code)
- Awesome Talking Head Generation
- Tracking Everything Everywhere All at Once (2023) (Code)
- Towards Smooth Video Composition (2023) (Code)
- FasterViT: Fast Vision Transformers with Hierarchical Attention
- Matting Anything (2023) (HN)
- ViTMatte - Boosting Image Matting with Pretrained Plain Vision Transformers.
- Neural Kernel Surface Reconstruction (2023)
- Inserting Anybody in Diffusion Models via Celeb Basis (2023) (Code)
- SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions (2023) (Code)
- Temporal Voyage: Code for "Neural Scene Chronology"
- DynIBaR: Neural Dynamic Image-Based Rendering (2023) (Code)
- RelTR: Relation Transformer for Scene Graph Generation (2022) (Code)
- Progressively Optimized Local Radiance Fields for Robust View Synthesis (2023)
- WebGLM: Towards An Efficient Web-enhanced Question Answering System with Human Preference (2023)
- MIME: Human-Aware 3D Scene Generation (2023)
- Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (2023) (Code)
- Language Segment-Anything
- Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation (2023) (Code)
- Awesome Segment Anything
- Cones 2: Customizable Image Synthesis with Multiple Subjects (2023) (Code)
- View Synthesis with Sculpted Neural Points (2022) (Code)
- NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action (2022) (Code)
- Zero-Shot Video Question Answering via Frozen Bidirectional Language Models (2022) (Code)
- DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (2023) (Code)
- Matte Anything: Interactive Natural Image Matting with Segment Anything Models (2023) (Code)
- Infinite Photorealistic Worlds using Procedural Generation (2023) (HN)
- VideoComposer: Compositional Video Synthesis with Motion Controllability
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation (2023)
- Unpaired Image-to-Image Translation via Neural Schrödinger Bridge (2023) (Code)
- Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials (2023) (Code)
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation (2023) (Code)
- Multi-scale Attention Guided Pose Transfer (2022) (Code)
- SUDS: Scalable Urban Dynamic Scenes
- PVO: Panoptic Visual Odometry (2022) (Code)
- Fast Segment Anything
- FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction (2023) (Code)
- DISCO: Disentangled Control for Referring Human Dance Generation in Real World (2023) (Code)
- StyleDrop: Text-to-Image Generation in Any Style (2023) (Code)
- Segment Anything Meets Point Tracking (2023) (Code)
- Denoising Diffusion Models for Plug-and-Play Image Restoration (2023) (Code)
- Final2x - Enhance Your Images with Effortless Cross-Platform Super-Resolution at Any Scale.
- nr3d_lib - Modules, operators and utilities for 3D neural rendering in single-object, multi-object, categorical and large-scale scenes.
- State of Computer Vision 2023
- Awesome Object Pose Estimation and Reconstruction
- Generative Pretraining in Multimodality (2023) (Code)
- mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs (2023) (Code)
- Collaborative Score Distillation for Consistent Visual Synthesis (Code)
- PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning (2022) (Code)
- Meta-Transformer: A Unified Framework for Multimodal Learning (2023) (Code)
- Awesome Embodied Vision
- ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models (2023) (Code)
- Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks
- Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions (2022) (Code)
- ImageNet Model Code
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (2023) (Code)
- Zero-1-to-3: Zero-shot One Image to 3D Object (2023) (Code)
- ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (2023) (Code)
- Thin-Plate Spline Motion Model for Image Animation (2022) (Code)
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors (Web) (HN)
- Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement (2023) (Code)
- LISA: Reasoning Segmentation via Large Language Model (2023) (Code)
- Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids (2023) (Code)
- Key-Locked Rank One Editing for Text-to-Image Personalization (2023) (Code)
- DreamWaltz: Make a Scene with Complex 3D Animatable Avatars (2023) (Code)
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (2023) (Code)
- PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning (2023) (Code)
- Contrastive Model Adaptation for Cross-Condition Robustness in Semantic Segmentation (2023) (Code)
- Neuralangelo: High-Fidelity Neural Surface Reconstruction (2023) (Code)
- Box-X - Toolbox for efficient building and debugging in Python, especially for scientific computing and computer vision.
- CityNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering (2022) (Code)
- NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation (2023) (Code)
- Color-NeuS: Reconstructing Neural Implicit Surfaces with Color (2023) (Code)
- Inst-Inpaint: Instructing to Remove Objects with Diffusion Models (2023) (Code)
- SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning (2023)
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering (2023) (Code)
- XMem++: Production-level Video Segmentation From Few Annotated Frames (2023) (Code)
- FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization (2023)
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (2023) (Code)
- TeCH: Text-guided Reconstruction of Lifelike Clothed Humans (2023) (Code)
- Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis (2023) (Code)
- Roboflow Inference - Opinionated tool for running inference on state-of-the-art computer vision models. (HN)
- FaceFusion - Next generation face swapper and enhancer.
- Awesome Adaptive Computation
- Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis (2023)
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing (2023)
- Change-Aware Sampling and Contrastive Learning for Satellite Images (2023) (Code)
- Dense Text-to-Image Generation with Attention Modulation (2023) (Code)
- VisionScript - High-level programming language for using computer vision.
- Aligning Pre-training and Fine-tuning in Object Detection (2023)
- CoTracker: It is Better to Track Together - Model for tracking any point (pixel) on a video.
- MagicEdit: High-Fidelity and Temporally Coherent Video Editing
- SAM.cpp - Inference of Meta's Segment Anything Model in pure C/C++. (HN)
- Queryable - Run OpenAI's CLIP model on iOS to search photos.
- 3D-LLM: Injecting the 3D World into Large Language Models
- YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection
- Image Search using CLIP - Search images with a text or image query, using OpenAI's pretrained CLIP model.
- ResFields: Residual Neural Fields for Spatiotemporal Signals
- AdverseCleaner - Remove adversarial noise from images.