Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

LiheYoung/Depth-Anything 19 Jan 2024

To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M images), which significantly enlarges the data coverage and thus is able to reduce the generalization error.

Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Data Augmentation, Monocular Depth Estimation (+1 more)

2,031 stars (16.04 stars / hour)
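The "collect and automatically annotate" step above follows the general teacher/pseudo-label pattern: a model fitted on labeled data annotates unlabeled inputs, and the union becomes a larger training set. The sketch below illustrates only that generic pattern. It assumes toy scalar inputs and a least-squares linear "teacher" in place of a depth network, and the function names are hypothetical. It is not the Depth Anything data engine.

```python
def fit_teacher(labeled):
    """Fit a toy linear teacher y = slope * x + intercept by least squares.

    Hypothetical stand-in for a depth model trained on labeled images.
    """
    n = len(labeled)
    mean_x = sum(x for x, _ in labeled) / n
    mean_y = sum(y for _, y in labeled) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in labeled)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in labeled)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def pseudo_label(teacher, unlabeled):
    """Annotate each unlabeled input with the teacher's prediction."""
    return [(x, teacher(x)) for x in unlabeled]


def build_training_set(labeled, unlabeled):
    """Combine ground-truth pairs with teacher-generated pseudo-labels."""
    teacher = fit_teacher(labeled)
    return list(labeled) + pseudo_label(teacher, unlabeled)


if __name__ == "__main__":
    data = build_training_set([(0, 1), (1, 3), (2, 5)], [3, 4])
    print(data)
```

A student model would then be trained on the combined set. The pseudo-labeled part grows with the amount of raw data collected, which is how this pattern enlarges data coverage without manual annotation.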

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

codium-ai/alphacodium 16 Jan 2024

Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks.

Code Generation, Prompt Engineering (+1 more)

1,876 stars (4.77 stars / hour)

InstantID: Zero-shot Identity-Preserving Generation in Seconds

instantid/instantid 15 Jan 2024

There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA.

Diffusion Personalization Tuning Free, Image Generation

2,914 stars (4.71 stars / hour)

Self-Rewarding Language Models

lucidrains/self-rewarding-lm-pytorch 18 Jan 2024

We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.

Instruction Following, Language Modelling

787 stars (3.54 stars / hour)

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

hustvl/vim 17 Jan 2024

The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to become the next-generation backbone for vision foundation models.

Object Detection (+3 more)

766 stars (2.69 stars / hour)
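The efficiency claim above comes from replacing pairwise attention (cost quadratic in sequence length) with state space recurrences that scan the token sequence once per direction (cost linear in length). The sketch below shows a bidirectional scan built from a scalar linear recurrence, h_t = a·h_{t-1} + b·x_t and y_t = c·h_t, run forward and backward, with the two outputs summed. This is a deliberate simplification under stated assumptions: scalar fixed parameters, plain summation to merge directions, and hypothetical function names. It is not Vim's actual block, which uses learned, input-dependent SSM parameters and additional gating.

```python
def ssm_scan(xs, a, b, c):
    """One left-to-right pass of h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Touches each token once, so time is O(L) and the state is O(1).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_scan(xs, fwd, bwd):
    """Run one scan forward and one backward, then sum per position.

    fwd and bwd are (a, b, c) parameter triples (hypothetical, scalar).
    The backward pass lets every position see tokens to its right,
    which a purely causal scan cannot do.
    """
    forward = ssm_scan(xs, *fwd)
    backward = ssm_scan(xs[::-1], *bwd)[::-1]
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    # A single impulse at position 0 decays forward and is seen again by the backward pass.
    print(bidirectional_scan([1.0, 0.0, 0.0], fwd=(0.5, 1.0, 1.0), bwd=(0.5, 1.0, 1.0)))
```

Both passes cost two linear scans in total. Full self-attention over the same L tokens would compare all L² pairs, which is the bottleneck for high-resolution images split into many patches.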

Efficiently Programming Large Language Models using SGLang

sgl-project/sglang 12 Dec 2023

SGLang is designed for the efficient programming of LLMs and incorporates primitives for common LLM programming patterns.

882 stars (1.99 stars / hour)

TaskWeaver: A Code-First Agent Framework

microsoft/taskweaver 29 Nov 2023

TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.

Natural Language Understanding

3,777 stars (1.91 stars / hour)

VMamba: Visual State Space Model

mzeromiko/vmamba 18 Jan 2024

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) stand as the two most popular foundation models for visual representation learning.

Representation Learning

335 stars (1.79 stars / hour)

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

TencentARC/PhotoMaker 7 Dec 2023

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts.

Diffusion Personalization Tuning Free

6,460 stars (1.70 stars / hour)

Scalable Pre-training of Large Autoregressive Image Models

apple/ml-aim 16 Jan 2024

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #322 on Image Classification on ImageNet (using extra training data)

Image Classification

512 stars (1.53 stars / hour)