To this end, we scale up the dataset by designing a data engine that collects and automatically annotates large-scale unlabeled data (~62M images), significantly enlarging data coverage and thus reducing generalization error.
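The data-engine idea can be sketched as pseudo-labeling: a trained teacher model annotates unlabeled images, and only confident predictions are kept as training targets. This is a minimal illustrative sketch, not the paper's actual pipeline; all function names and the confidence scheme are assumptions.

```python
def teacher_predict(image):
    """Stub for a trained monocular depth teacher.

    A real engine would run a depth network here; this placeholder
    returns a flat depth map and a fixed confidence score.
    """
    return [0.5] * len(image), 0.9

def build_pseudo_labels(unlabeled_images, confidence_threshold=0.8):
    """Auto-annotate unlabeled images, keeping only confident outputs."""
    labeled = []
    for image in unlabeled_images:
        depth, confidence = teacher_predict(image)
        if confidence >= confidence_threshold:
            labeled.append((image, depth))
    return labeled

# Usage: two toy "images" (pixel lists) are automatically annotated.
dataset = build_pseudo_labels([[1, 2, 3], [4, 5, 6]])
```

In practice the student is then trained on the union of the original labeled set and these pseudo-labeled pairs, which is what enlarges coverage.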
Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data).
Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks.
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA.
We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.
The results demonstrate that Vim overcomes the computation and memory constraints of Transformer-style understanding for high-resolution images, and that it has great potential to become the next-generation backbone for vision foundation models.
SGLang is designed for the efficient programming of LLMs and incorporates primitives for common LLM programming patterns.
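The "primitives for common LLM programming patterns" can be pictured as operations on a prompt state: append text, generate a completion, and fork the state for parallel branches. The toy class below mimics that pattern in plain Python; it is not SGLang's actual API, and the stub model stands in for a real LLM call.

```python
class PromptState:
    """Toy prompt state with append/gen/fork primitives (illustrative only)."""

    def __init__(self, text=""):
        self.text = text
        self.outputs = {}

    def append(self, text):
        self.text += text
        return self

    def gen(self, name, model):
        """Generate a completion from the current prompt and record it by name."""
        completion = model(self.text)
        self.outputs[name] = completion
        self.text += completion
        return self

    def fork(self):
        """Branch the current state so alternatives can be explored in parallel."""
        return PromptState(self.text)

# Stub model: a real system would call an LLM server here.
def echo_llm(prompt):
    return "[completion of %d-char prompt]" % len(prompt)

s = PromptState().append("Q: What is a KV cache?\nA: ").gen("answer", echo_llm)
branch = s.fork().append("\nFollow-up: ")
```

Forked states share the generated prefix, which is the kind of structure a runtime can exploit for prefix caching.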
TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) stand as the two most popular foundation models for visual representation learning.
Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts.
Specifically, we highlight two key findings: (1) the performance of the visual features scales with both model capacity and data quantity; (2) the value of the objective function correlates with the performance of the model on downstream tasks.
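Finding (2) is the kind of claim one checks by correlating the objective value with downstream accuracy across training runs. A minimal sketch with a hand-rolled Pearson correlation; the loss/accuracy numbers are fabricated for illustration and are not from the paper.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Fabricated runs: lower pretraining loss pairs with higher downstream
# accuracy, so the correlation comes out strongly negative.
losses = [2.1, 1.8, 1.5, 1.2, 1.0]
accuracies = [0.61, 0.66, 0.72, 0.78, 0.81]
r = pearson(losses, accuracies)
```

A strong (here negative, since lower loss is better) correlation is what would let the objective value serve as a cheap proxy for downstream performance.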
Ranked #322 on Image Classification on ImageNet (using extra training data).