ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control

Shishi Xiao¹, Tongyu Zhou², David H. Laidlaw¹, Gromit Yeuk-Yin Chan²

¹Brown University, ²Adobe Research

Abstract

A pictorial chart is an effective medium for visual storytelling, combining visual elements with data charts. However, generating such images is challenging because the flexibility of visual elements often conflicts with the rigid structure required to encode data. We present ChArtist, a domain-specific diffusion model for automatic pictorial chart generation with two complementary controls: (1) spatial control that aligns with chart structure, and (2) subject-driven control that preserves the visual characteristics of a reference image. To enable this, we introduce a skeleton-based spatial representation that encodes only the chart’s data structure, allowing flexible integration of reference visuals without rigid outline constraints. Our method is built on a Diffusion Transformer (DiT) with an adaptive positional encoding mechanism to coordinate these two control signals. We further propose Spatially Gated Attention to regulate the interaction between spatial and subject controls. To facilitate model training, we construct a dataset of 30,000 triplets (skeleton, reference image, pictorial chart) and introduce a unified data accuracy metric to evaluate data faithfulness in generated charts.

What can ChArtist do?

Starting from a base line chart, ChArtist supports two types of generation that make the chart more informative: text-driven and image-driven.

Figure panels: Text-driven Generation, Base Line Chart, Image-driven Generation.

Method

Task-Specific LoRA

We train two lightweight LoRA modules to control generation from different sources: a spatial LoRA that follows the chart skeleton and a subject LoRA that injects the visual appearance of a reference image. These controls can be used independently or combined to support flexible chart creation.
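As a rough illustration of how two LoRA modules can be used independently or together, the sketch below adds low-rank updates to a frozen linear layer. It is a minimal pure-Python sketch under stated assumptions, not the ChArtist implementation: the names `LoRA`, `adapted_linear`, and `matvec`, the rank, the scale, and the initialization are all hypothetical choices made for illustration.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRA:
    """Low-rank update delta(x) = scale * B (A x).

    A has shape (rank, d_in) and B has shape (d_out, rank).
    B starts at zero, so a fresh adapter leaves the base layer unchanged.
    (Hypothetical helper class, for illustration only.)
    """

    def __init__(self, d_in, d_out, rank, scale=1.0, seed=0):
        rng = random.Random(seed)
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = scale

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def adapted_linear(W, x, adapters=()):
    """Frozen base output W x plus the sum of all active adapter updates."""
    y = matvec(W, x)
    for adapter in adapters:
        y = [yi + di for yi, di in zip(y, adapter.delta(x))]
    return y
```

Passing only the spatial adapter, only the subject adapter, or both corresponds to the three usage modes described above. Summing both updates in this way is the naive parallel composition, with no coordination between the two conditions.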

Task-Specific LoRA Illustration

Challenge of Merging Multiple LoRAs

Naively composing multiple LoRAs in parallel introduces cross-condition interference. In pictorial charts, this often leads to structure misalignment or style leakage, where the generated visuals break the chart’s data structure.

Merging Challenge Illustration

Spatially Gated Attention

To address this challenge, we use Spatially Gated Attention to coordinate the spatial and subject controls. A spatial mask derived from the chart skeleton gates subject attention so that visual elements remain aligned with the chart structure.
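One way to picture the gating is as a mask on attention logits. The sketch below is an assumption about how such a gate could work, not the formulation used in ChArtist: each image query token carries a binary mask value (hypothetically derived from the skeleton), and subject keys are hidden from queries whose mask is 0. The function names and the binary, per-query form of the mask are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax; a logit of -inf receives weight 0."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_attention(q, k_img, v_img, k_sub, v_sub, mask):
    """Single-head attention over image tokens and subject tokens.

    q:      query vectors for the image tokens
    k_img:  image keys, v_img: image values (always visible)
    k_sub:  subject keys, v_sub: subject values (gated)
    mask:   mask[i] == 1 lets query i attend to subject tokens,
            mask[i] == 0 blocks them (assumed to come from the skeleton)
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    values = v_img + v_sub
    out = []
    for qi, mi in zip(q, mask):
        logits = [dot(qi, kj) * scale for kj in k_img]
        for kj in k_sub:
            # Gate: subject logits are set to -inf outside the masked region.
            logits.append(dot(qi, kj) * scale if mi else -math.inf)
        weights = softmax(logits)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out
```

With this gate, a query outside the mask receives no contribution from the reference image, while a query inside the mask blends image and subject information as usual.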

Spatially Gated Attention Illustration

Results

Bar chart

Text-driven Generation: a text prompt ("pagoda tower", "cherry blossom", or "ice cream") is combined with a skeleton to produce each result.
Image-driven Generation: a reference image is combined with a skeleton to produce each result; the first reference image is shown with three interchangeable skeletons (A, B, C).
Each result can be compared against its skeleton overlay.

Line chart

Text-driven Generation: a text prompt ("People in kimonos walking", "Surfing", or "Igloos") is combined with a skeleton to produce each result.
Image-driven Generation: a reference image is combined with a skeleton to produce each result; the first reference image is shown with three interchangeable skeletons (A, B, C).
Each result can be compared against its skeleton overlay.

Pie chart

Text-driven Generation: a text prompt ("lollipop", "Castle Turret", or "purple flower") is combined with a skeleton to produce each result.
Image-driven Generation: a reference image is combined with a skeleton to produce each result; the first reference image is shown with three interchangeable skeletons (A, B, C).
Each result can be compared against its skeleton overlay.

Metric of Data Faithfulness

Preserving the underlying data is critical for pictorial charts. We propose a unified data accuracy metric to measure how well the generated image follows the chart structure. The metric constructs a distance field along the data-encoding dimension and computes a weighted F1 score based on sampled points around the skeleton.
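To make the idea concrete, here is a simplified sketch for bar charts only, where the data-encoding dimension is bar height. It is not the unified metric from the paper: the function name `weighted_f1`, the sampling radius, the inverse-distance weighting, and the reduction of each bar to a single foreground height are all assumptions chosen for illustration.

```python
def weighted_f1(target_heights, generated_heights, radius=3):
    """Weighted F1 over points sampled around each bar top.

    target_heights:    bar heights given by the skeleton, in pixel rows
    generated_heights: foreground heights measured in the generated image
    radius:            number of rows sampled below and above each bar top

    A sampled row y is expected to be foreground when y < target and is
    predicted foreground when y < generated. Rows closer to the skeleton's
    bar top get a larger weight (assumed weighting: 1 / (1 + distance)).
    """
    tp = fp = fn = 0.0
    for t, g in zip(target_heights, generated_heights):
        for y in range(max(0, t - radius), t + radius):
            distance = abs(y + 0.5 - t)  # distance along the height axis
            w = 1.0 / (1.0 + distance)
            expected = y < t
            predicted = y < g
            if expected and predicted:
                tp += w
            elif predicted:
                fp += w
            elif expected:
                fn += w
    denom = 2.0 * tp + fp + fn
    # No sampled foreground anywhere counts as a perfect match.
    return 2.0 * tp / denom if denom else 1.0
```

Under these assumptions, a generated bar that matches its skeleton height scores 1.0, and the score drops as the generated height drifts above or below the target.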

Data Accuracy Evaluation Illustration

Application

Application Interface

BibTeX

@article{chartis2026,
  title={ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control},
  author={Shishi Xiao and Tongyu Zhou and David H. Laidlaw and Gromit Yeuk-Yin Chan},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}