Discover, Discuss, and Read arXiv papers

Discover new, recommended papers


22 Apr 2025
reinforcement-learning, reasoning, test-time-inference

A test-time reinforcement learning framework enables large language models to improve their mathematical reasoning capabilities using only unlabeled data, demonstrating a 159% improvement in pass@1 performance on AIME 2024 with Qwen-2.5-Math-7B while requiring no ground truth labels or human annotation.
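Test-time RL approaches of this kind typically derive a reward signal from the model's own samples rather than from labels: the majority answer across rollouts serves as a pseudo-label, and samples agreeing with it are rewarded. A minimal sketch of that majority-vote reward (an illustration of the general idea, not this paper's exact implementation):

```python
from collections import Counter

def majority_vote_rewards(answers: list[str]) -> list[float]:
    """Reward each sampled answer 1.0 if it matches the majority
    (pseudo-label) answer across all rollouts, else 0.0."""
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in answers]

# e.g. three rollouts for one problem, two of which agree
rewards = majority_vote_rewards(["42", "42", "17"])
```

No ground truth enters the loop; the reward quality depends entirely on how often the majority answer is actually correct.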

22 Apr 2025
reasoning, chain-of-thought

A comprehensive benchmark suite called PHYBench evaluates physical reasoning capabilities in large language models through 500 carefully curated physics problems and introduces a novel Expression Edit Distance (EED) Score metric, revealing significant performance gaps between current LLMs and human physics students across various domains of physics.

22 Apr 2025
video-understanding, vision-language-models, self-supervised-learning

NVIDIA and UC Berkeley researchers introduce DAM (Describe Anything Model), a vision-language architecture that generates detailed captions for specific regions in images and videos through a focal prompt mechanism and localized vision backbone, achieving state-of-the-art performance across 7 benchmarks while addressing data scarcity through a semi-supervised learning pipeline.

18 Apr 2025
reinforcement-learning, reasoning, imitation-learning

A comprehensive analysis reveals that Reinforcement Learning with Verifiable Rewards (RLVR) primarily improves sampling efficiency rather than expanding reasoning capabilities in Large Language Models, with base models outperforming RLVR-trained versions at higher sampling rates (k>256) across multiple benchmarks and model architectures.
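The pass@k comparison underlying this claim is usually computed with the standard unbiased estimator (draw n samples, count c correct, estimate the probability that at least one of k draws succeeds):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n total (c of them correct) is correct."""
    if n - c < k:  # too few incorrect samples to fill k draws: guaranteed hit
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

At k = 1 this rewards sampling efficiency; at k = 256+ it instead measures the breadth of solutions the model can reach at all, which is where the base models close the gap.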

21 Apr 2025
reinforcement-learning, reasoning, imitation-learning

LUFFY framework enhances large language models' reasoning capabilities by integrating off-policy guidance into zero-RL training, achieving +7.0 point improvements on math benchmarks and +6.2 points on out-of-distribution tasks compared to previous methods while maintaining exploration through policy shaping and regularized importance sampling.

22 Apr 2025
reinforcement-learning, chain-of-thought, agents

Google DeepMind researchers systematically analyze three key failure modes of LLMs in decision-making tasks (greediness, frequency bias, and knowing-doing gap), demonstrating how reinforcement learning fine-tuning with chain-of-thought reasoning can improve exploration and decision-making capabilities across multi-armed bandits, contextual bandits, and game-playing environments.

17 Apr 2025
generative-models, video-understanding, transformers

A compression technique called FramePack enables next-frame prediction models to generate longer videos by using variable kernel sizes to compress input frames based on their temporal importance, while anti-drifting sampling methods reduce error accumulation through bi-directional context and endpoint anchoring.

21 Apr 2025
vision-language-models, multi-modal-learning, computer-vision-security

A comprehensive benchmark called FG-BMK evaluates large vision-language models' capabilities on fine-grained image analysis through 3.49 million questions across 3.32 million images from 12 established datasets, revealing significant performance gaps in attribute recognition and category reasoning while demonstrating the superiority of contrastive training approaches over generative methods.

17 Apr 2025
contrastive-learning, multi-modal-learning, representation-learning

A unified visual encoding framework from Meta FAIR demonstrates that intermediate network layers contain the most effective visual embeddings, with a contrastive learning approach achieving state-of-the-art performance across multiple vision tasks through targeted alignment tuning and robust pretraining strategies.

22 Apr 2025
robotics-perception, vision-language-models, robotic-control

A Vision-Language-Action model called π0.5 enables mobile robots to perform complex household tasks in entirely new homes through co-training on heterogeneous data sources including multi-robot demonstrations, web data, and verbal instructions, while using a hierarchical inference approach that combines semantic subtask prediction with low-level action generation.