Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity

authored 11 papers 4 days ago

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

Paper • 2509.16972 • Published Sep 21, 2025 • 2

LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation

Paper • 2510.11063 • Published Oct 13, 2025 • 1

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

Paper • 2401.10228 • Published Jan 18, 2024

RecTok: Reconstruction Distillation along Rectified Flow

Paper • 2512.13421 • Published Dec 15, 2025 • 5

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing

Paper • 2512.11715 • Published Dec 12, 2025

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Paper • 2512.10958 • Published Dec 11, 2025 • 1

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Paper • 2512.16760 • Published Dec 18, 2025 • 15

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Paper • 2412.03255 • Published Dec 4, 2024

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published Jan 22 • 44

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

Paper • 2602.01842 • Published Feb 2 • 3

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Paper • 2312.07526 • Published Apr 8, 2024

upvoted a paper 4 days ago

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

Paper • 2606.07436 • Published 9 days ago • 23

upvoted 2 papers 7 days ago

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Paper • 2606.07433 • Published 10 days ago • 21

Towards One-to-Many Temporal Grounding

Paper • 2606.06294 • Published 11 days ago • 7

liked a model 9 days ago

MSALab/LoomVideo

Updated 9 days ago • 5

upvoted a paper 10 days ago

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Paper • 2606.06042 • Published 11 days ago • 24

liked a dataset about 1 month ago

marinero4972/VideoZeroBench

Preview • Updated May 6 • 207 • 4

upvoted 3 papers about 2 months ago

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

Paper • 2604.26951 • Published Apr 29 • 49

Context Unrolling in Omni Models

Paper • 2604.21921 • Published Apr 23 • 14

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

Paper • 2604.20796 • Published Apr 22 • 243