TL;DR: We model how artistic style flows over 500 years without relying on ground-truth pairs.
We curated a large-scale, unified dataset of 650k artworks annotated with creation year and other metadata, covering five centuries of diverse artistic styles.
Overview
We introduce a generative framework that models the temporal evolution of artistic styles as an optimal transport problem in a learned style space. By combining stochastic interpolants with diffusion implicit bridges, the model aligns artistic distributions across centuries without paired data, revealing how visual styles continuously flow and transform through time.
Unlike existing generative models that treat artworks as isolated instances, our approach captures stylistic dynamics and transitions across history. It enables re-synthesis of artworks from or to any era and quantitative analysis of evolving aesthetic patterns. To support this, we also curated a large-scale art dataset spanning 500 years, providing a foundation for studying the evolution of artistic modes and cross-cultural influences in an unsupervised manner.
How can we build correspondences without GT pairs?
In the above 2D example, two unconditional models are trained between Gaussian noise and data distributions $\mathcal{A}$ (squares) and $\mathcal{B}$ (circles),
without access to mode labels (colors).
Despite this, structure is preserved in noise space.
By mapping samples from $\mathcal{A}$ to noise via $v^{(a)}$ and then back to $\mathcal{B}$ via $v^{(b)}$,
we show that correspondences can emerge without paired supervision.
This also holds when we jump between multiple intermediate distributions,
where trajectories show the temporal evolution of data points across different conditions (x-axis).
While modes (colors) smoothly blend, the underlying optimal transport plan stays consistent.
Generated samples reliably match their correct target modes without explicit supervision,
demonstrating strong structural alignment. This robustness persists even when transitioning
through multiple intermediate distributions, where samples may be spatially distorted but
maintain accurate mode correspondence throughout the temporal progression.
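The round trip through noise can be sketched in one dimension, where the deterministic probability-flow encoder reduces to the monotone (quantile) map to a standard Gaussian. The snippet below is an illustrative stand-in for the trained velocity fields $v^{(a)}$ and $v^{(b)}$, not the paper's implementation: empirical-CDF maps play the role of the two flows, and mode labels are never used.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Distribution A: two modes (the "colors" are mode labels we never use)
a = np.concatenate([rng.normal(-2, 0.3, 500), rng.normal(2, 0.3, 500)])
# Distribution B: two modes at different locations
b = np.sort(np.concatenate([rng.normal(-5, 0.4, 500), rng.normal(5, 0.4, 500)]))

def to_noise(x, sample):
    # Empirical CDF -> standard-normal quantile: the 1D analogue of the
    # deterministic probability-flow encoder v^(a)
    ranks = np.searchsorted(np.sort(sample), x) / len(sample)
    return norm.ppf(np.clip(ranks, 1e-6, 1 - 1e-6))

def from_noise(z, sorted_sample):
    # Inverse empirical CDF of B: the decoder v^(b)
    u = norm.cdf(z)
    idx = np.clip((u * len(sorted_sample)).astype(int), 0, len(sorted_sample) - 1)
    return sorted_sample[idx]

x = np.array([-2.0, 2.0])   # one sample from each mode of A
z = to_noise(x, a)          # A -> noise
y = from_noise(z, b)        # noise -> B: each sample lands in the matching mode
print(y)
```

Because both maps are monotone, the left mode of $\mathcal{A}$ is transported to the left mode of $\mathcal{B}$ and likewise on the right, mirroring the correspondence observed in the 2D experiment.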
In the following, we show how this idea can be extended to model temporal flows in art.
Method
Inference
Our method trains stochastic interpolants to map between the style embedding space and structured noise, learning continuous style flows conditioned on an artwork's creation year. This temporal conditioning aligns artistic distributions across centuries, forming a coherent representation of stylistic evolution.
During inference, the model performs both forward and backward flows to visualize and analyze how artistic styles transition over time, enabling exploration of historical context, influence, and stylistic continuity.
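A minimal numpy sketch of the training objective helps fix ideas: for a linear interpolant $x_t = (1-t)\,x_0 + t\,x_1$ between noise and style embeddings, the velocity regression target is the constant drift $x_1 - x_0$. In the full model the velocity is a network conditioned on $(x_t, t)$ and the creation year; the function names and shapes below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolant(x0, x1, t):
    # Linear interpolant x_t = (1 - t) x0 + t x1 between the two endpoints
    # (a latent-noise term can be added for the stochastic variant)
    return (1 - t) * x0 + t * x1

def velocity_target(x0, x1):
    # d/dt x_t for the linear interpolant: the constant drift x1 - x0
    return x1 - x0

def interpolant_loss(v_pred, x0, x1):
    # Squared error between the predicted velocity and the target drift;
    # in the full model v_pred comes from a year-conditioned network
    return np.mean((v_pred - velocity_target(x0, x1)) ** 2)

# Toy check: Gaussian noise endpoint x0, stand-in "style embeddings" x1
x0 = rng.normal(size=(64, 2))
x1 = rng.normal(loc=3.0, size=(64, 2))
loss_oracle = interpolant_loss(velocity_target(x0, x1), x0, x1)
print(loss_oracle)
```

Integrating the learned velocity forward maps noise to a given era's style distribution; integrating it backward inverts an artwork into noise, from which it can be re-synthesized under a different year condition.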
🌊 Stylistic flows over time
We visualize stylistic flows for +20, +40, and +80 years (starting from the early 20th century). Blue quivers indicate the dominant movements of art pieces into the next time frame and visualize how the style distribution flows across time.
For each jump, orange distribution shows real artworks from the corresponding period; blue distribution is the fixed late-20th-century target.
Early-20th-century samples transported by our model (green distribution) first match each intermediate distribution and ultimately converge toward the target manifold.
🎨 Semantic and stylistic alignment through time, rather than 🧩 pixels
Qualitative comparison of editing methods across different historical styles.
Each row shows results from a different method.
The 1st and 5th columns show original artworks (Les Demoiselles d'Avignon - 1907 and Landscape near Chatou - 1904).
We maintain better semantic alignment through time while reflecting the correct style, whereas other methods prioritize
preserving the original pixels, resulting in outputs that are constrained by pixel-level fidelity rather than semantic or
stylistic coherence with the target period.
We transform a motorcyclist to the year 1800 with increasing flexibility (left to right, top row).
Low flexibility limits stylistic adaptation. This results in hybrid outputs,
such as a Steam Horse locomotive, that blend past and future characteristics without fully transitioning.
Higher values retain semantic identity with correct stylistic traits.
Whereas other methods struggle to adapt even at higher guidance scales, rigidly retaining "a man on a motorcycle with wheels",
our method flexibly transforms it into a "man on a horse", adapting the semantics to the stylistic
context of the target era.
Understanding of style in specific time periods
We computed the FID between generated samples and ground-truth artworks of the corresponding period, conditioning on time and using 1,000 samples per century.
We also fine-tuned SD 1.5 on our dataset with time as additional text input, allowing for more precise conditioning.
Our method exhibits stronger alignment with ground truth distributions.
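For reference, FID can be computed in closed form from Gaussian statistics fitted to two feature sets: $\|\mu_a - \mu_b\|^2 + \mathrm{Tr}\big(C_a + C_b - 2(C_a C_b)^{1/2}\big)$. The snippet below is a generic sketch, with random vectors standing in for Inception embeddings of artworks; it is not the paper's evaluation code.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    # Frechet distance between Gaussians fitted to the two feature sets
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(c_a @ c_b)
    if np.iscomplexobj(covmean):   # drop numerical imaginary residue
        covmean = covmean.real
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(c_a + c_b - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))            # stand-in features of real art
fake_near = rng.normal(size=(1000, 8))       # matching distribution
fake_far = rng.normal(loc=2.0, size=(1000, 8))  # shifted distribution
print(fid(real, fake_near), fid(real, fake_far))
```

A matching distribution scores near zero while a shifted one scores much higher, which is the behavior the per-century comparison relies on.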
Measuring style flow quality without ground truth pairs
Evaluating temporal transitions in artworks is challenging without exact ground-truth pairs. We leverage the assumption that artworks created near each other in time and style tend to evolve together rather than independently. This guides our evaluation of how well the model captures realistic style evolution.
Compactness (δ): Measures style coherence by comparing the variance of transformed samples to random samples from the target period. Lower δ indicates samples remain closely grouped with consistent style.
Triplet consistency (τ): Adapted from triplet loss, this verifies if relationships among anchor, similar (positive), and different (negative) samples are preserved after transformation, reflecting local style structure preservation.
We sampled 100 representative artworks per style and transferred clusters to random years within ±100 years, averaging metrics across transfers. For τ, the 25 nearest neighbors served as positives and neighbors ranked 50–75 as negatives for each anchor.
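The two metrics above can be sketched as follows. The neighbour ranks and the affine "transfer" are illustrative stand-ins for the paper's protocol (25 nearest positives, ranks 50-75 as negatives, and real style transfers); only the structure of the computation is meant to carry over.

```python
import numpy as np

rng = np.random.default_rng(0)

def compactness(transformed, target_period):
    # delta: total variance of the transported cluster relative to random
    # samples from the target period; < 1 means the cluster stays tight
    return transformed.var(axis=0).sum() / target_period.var(axis=0).sum()

def triplet_consistency(before, after, n_pos=5, n_neg_lo=10, n_neg_hi=15):
    # tau: fraction of (anchor, positive, negative) triplets whose ordering
    # d(a, p) < d(a, n) survives the transformation
    d0 = np.linalg.norm(before[:, None] - before[None], axis=-1)
    d1 = np.linalg.norm(after[:, None] - after[None], axis=-1)
    kept, total = 0, 0
    for i in range(len(before)):
        order = np.argsort(d0[i])
        pos = order[1:1 + n_pos]          # nearest neighbours as positives
        neg = order[n_neg_lo:n_neg_hi]    # mid-ranked neighbours as negatives
        for p in pos:
            for n in neg:
                total += 1
                kept += d1[i, p] < d1[i, n]
    return kept / total

before = rng.normal(size=(50, 4))                       # cluster pre-transfer
after = before @ np.diag([1.0, 0.9, 1.1, 1.0]) + 0.5    # gentle affine "transfer"
delta = compactness(after, rng.normal(scale=2.0, size=(200, 4)))
tau = triplet_consistency(before, after)
print(delta, tau)
```

A transformation that merely translates and mildly rescales the cluster keeps delta below 1 and tau near 1, whereas a transfer that scatters samples or shuffles neighbourhoods degrades both.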
Citation
@InProceedings{Ma_2025_artfm,
author = {Ma, Pingchuan and Gui, Ming and Schusterbauer, Johannes and Yang, Xiaopei and Grebenkova, Olga and Hu, Vincent Tao and Ommer, Bj\"orn},
title = {Stochastic Interpolants for Revealing Stylistic Flows across the History of Art},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {5867-5878}
}
Check out other works from our group at ICCV 2025.