Datasets

This page collects public datasets and benchmarks associated with my publications. I include resources with public project pages, repositories, or dataset releases, and group them by research use case for easier browsing.

Generation and Design

ICE-Bench

ICCV 2025

A unified benchmark for image creating and editing. It organizes creating and editing tasks into four coarse task families, further divides them into 31 fine-grained tasks, and evaluates outputs across six dimensions, including quality, prompt following, consistency, and controllability.

Use case: Image generation and image editing evaluation

PKU PosterLayout

CVPR 2023

A benchmark for content-aware visual-textual presentation layout on non-empty canvases. The dataset contains 9,974 poster-layout pairs and 905 image canvases, making it useful for layout generation under realistic graphic design constraints.

Use case: Automatic poster and presentation layout generation

Cross-Modal Retrieval

PKU Real20M

ACM MM 2023

A large-scale e-commerce dataset for cross-domain retrieval between products and micro-videos. Built with a query-driven collection pipeline, it covers over 20 million e-commerce products and micro-videos with multimodal information.

Use case: Product-to-video and video-to-product retrieval
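As a rough illustration of the retrieval task this dataset targets, the sketch below ranks micro-video embeddings against a single product embedding by cosine similarity. It assumes both domains have already been encoded into a shared embedding space by some model. The embeddings, the vector size, and the function name `rank_videos` are hypothetical and are not part of the Real20M release or the paper's method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_videos(
    product_emb: Sequence[float],
    video_embs: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top-k (video_id, score) pairs for one product query.

    Assumes product and video embeddings live in the same (shared) space.
    """
    scored = [(vid, cosine(product_emb, emb)) for vid, emb in video_embs.items()]
    # Highest similarity first; Python's sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d embeddings, made up purely for illustration.
    product = [1.0, 0.0, 0.5]
    videos = {"v1": [0.9, 0.1, 0.4], "v2": [0.0, 1.0, 0.0], "v3": [1.0, 0.0, 0.5]}
    for vid, score in rank_videos(product, videos, k=3):
        print(vid, round(score, 3))
```

Video-to-product retrieval is the same computation with the roles swapped: one video embedding is the query and the product embeddings form the gallery. At the scale of tens of millions of items, an approximate nearest-neighbor index would normally replace this exhaustive loop.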

PKU FG-XMedia

ACM MM 2019

A fine-grained cross-media retrieval benchmark with 200 bird subcategories across four media types: image, text, video, and audio. It was introduced together with FGCrossNet for unified representation learning across all four modalities.

Use case: Fine-grained cross-media retrieval
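Cross-media retrieval results are commonly reported as mean average precision (mAP), where a retrieved item counts as relevant if it shares the query's subcategory, regardless of media type. The sketch below computes mAP from ranked label lists under that assumption. It is a generic implementation, not the official FG-XMedia evaluation code, and the function names are hypothetical. It also assumes each ranked list covers the full gallery, so every relevant item appears somewhere in the ranking.

```python
from typing import Hashable, List, Sequence


def average_precision(query_label: Hashable, ranked_labels: Sequence[Hashable]) -> float:
    """AP for one query: mean of precision@rank over the ranks of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:  # same subcategory => relevant
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    query_labels: Sequence[Hashable],
    ranked_label_lists: Sequence[Sequence[Hashable]],
) -> float:
    """mAP over all queries; each ranked list holds the gallery labels in retrieval order."""
    aps: List[float] = [
        average_precision(q, ranked) for q, ranked in zip(query_labels, ranked_label_lists)
    ]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy example: an image query and an audio query, with made-up bird subcategory labels.
    queries = ["cardinal", "sparrow"]
    rankings = [["cardinal", "sparrow", "cardinal"], ["cardinal", "sparrow"]]
    print(round(mean_average_precision(queries, rankings), 4))
```

In a four-modality setting like this one, the same function can be applied per query/gallery modality pair (for example, image-to-audio or text-to-video) and the scores reported separately or averaged.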