Official PyTorch Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation".
[AAAI 2025] Click2Mask: Local Editing with Dynamic Mask Generation
Omer Regev, Omri Avrahami, Dani Lischinski
Given an image, a Click, and a prompt for an added object, a mask is generated dynamically, simultaneously with the object generation, throughout the diffusion process.

Current methods rely on existing objects/segments, or on user effort (masks or detailed text), to localize object additions. Our approach enables free-form editing, where the manipulated area is not well-defined, using just a Click for localization.
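As intuition only: the "dynamic mask" can be pictured as a region that starts at the clicked point and evolves over the denoising steps. In Click2Mask the evolution is driven by the diffusion process with Alpha-CLIP guidance; this toy sketch merely grows a disk to convey the idea (all names and numbers here are illustrative, not from the repo):

```python
import numpy as np

def grow_mask_from_click(h, w, click, steps, radius_per_step=4):
    """Toy stand-in for a dynamically evolving mask: a disk around the
    clicked point that expands once per 'diffusion step'. The real
    method updates the mask from the evolving latent, not a fixed disk."""
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = click
    return [((yy - cy) ** 2 + (xx - cx) ** 2 <= (t * radius_per_step) ** 2)
            for t in range(1, steps + 1)]

masks = grow_mask_from_click(64, 64, click=(32, 32), steps=5)
print([int(m.sum()) for m in masks])  # the masked area grows each step
```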
Try it instantly in your browser - no setup required:
- Launch Demo
- Open in Colab (includes both a Gradio interface and a command line for advanced usage)
Each example includes an input image with a Click, followed by outputs corresponding to the prompts below.

A brief glimpse into the qualitative comparison between the SoTA methods Emu Edit, MagicBrush, and InstructPix2Pix and our model, Click2Mask. The upper prompts were given to the baselines, and the lower (shorter) ones to Click2Mask. The inputs show the Click given to Click2Mask.

💡 Check out our Hugging Face Demo or Google Colab Demo for instant access without installing.
```shell
git clone https://github.com/omeregev/click2mask.git
cd click2mask
```

Option 1: Using pip (Recommended)

```shell
pip install -r requirements.txt
```

Option 2: Using Conda

If you prefer conda or need a more isolated environment (note: uses an older PyTorch version):

```shell
conda env create -f environment.yml
conda activate c2m
```

Download the Alpha-CLIP checkpoint from here (1.2GB):

```shell
mkdir checkpoints
wget -P checkpoints https://huggingface.co/omeregev/click2mask/resolve/main/clip_l14_336_grit1m_fultune_8xe.pth
```

If the above link is broken, you can use this Google Drive mirror.
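After downloading, you may want to sanity-check the file, since an interrupted `wget` often leaves a truncated file behind. This small helper is not part of the repo - just an optional check (the 1 GB threshold is a rough assumption for the ~1.2 GB weights):

```python
import os

def checkpoint_ok(path: str, min_bytes: int = 1_000_000_000) -> bool:
    """Return True if the file exists and is at least `min_bytes` large.

    Rough sanity check for the ~1.2 GB Alpha-CLIP checkpoint: an
    interrupted download typically leaves a much smaller file.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

ckpt = "checkpoints/clip_l14_336_grit1m_fultune_8xe.pth"
if not checkpoint_ok(ckpt):
    print("checkpoint missing or truncated - re-run the wget command above")
```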
Launch the interactive web interface:

```shell
python app.py
```

Then open your browser at the public URL printed to the console.
- Run:

```shell
python scripts/text_editing_click2mask.py --image_path "<path/to/input/image>" --prompt "<the prompt>" --output_dir "<path/to/output/directory>"
```

For example:

```shell
python scripts/text_editing_click2mask.py --image_path "examples/example1/img1.jpg" --prompt "a sea monster" --output_dir "outputs"
```

- A window will pop up so you can click a point on the input image. Once you have clicked with the mouse, press "Enter".
- The clicked point will be saved in the input directory as "path/to/input/image_click.jpg" for future use. For example:

```shell
python scripts/text_editing_click2mask.py --image_path "examples/example2_existing_click/img2.jpg" --prompt "a sea monster" --output_dir "outputs"
```

- If you wish to change the clicked point in future runs, delete it or add the argument "--refresh_click":

```shell
python scripts/text_editing_click2mask.py --image_path "examples/example1/img1.jpg" --refresh_click --prompt "a sea monster" --output_dir "outputs"
```

We introduce Edited Alpha-CLIP to evaluate mask-free methods by extracting a mask of the edited region
and using Alpha-CLIP to assess its alignment with the prompt.
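For intuition only, "extracting a mask of the edited region" can be caricatured as per-pixel differencing between input and output. The repo's actual extraction is more robust; `edited_region_mask` and its threshold below are illustrative assumptions, not the released code:

```python
import numpy as np

def edited_region_mask(img_in: np.ndarray, img_out: np.ndarray,
                       thresh: float = 25.0) -> np.ndarray:
    """Toy sketch: flag pixels whose mean channel difference exceeds a
    threshold. NOT the repo's extraction method, just the basic idea."""
    diff = np.abs(img_in.astype(np.int16) - img_out.astype(np.int16)).mean(axis=-1)
    return diff > thresh
```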
Examples of mask extractions: outputs are on the left, extracted masks (green overlay) on the right.

To run Edited Alpha-CLIP similarity tests for method comparison, here is a usage example (see the documentation in the script):

```python
from scripts.similarity_tests.edited_alpha_clip import EditedAlphaCLip

edited_ac = EditedAlphaCLip()
image_in_p = "examples/edited_alpha_clip/input.png"
image_out_p = "examples/edited_alpha_clip/magic_brush.jpg"
prompt = "A bench"
save_outs = "outputs/edited_alpha_clip/bench_mb"
similarity = edited_ac.edited_alpha_clip_sim(image_in_p, image_out_p, prompt, save_outs)
```

A higher result is better.
If you find this helpful for your research, please reference the following:
@inproceedings{regev2025click2mask,
  title={Click2Mask: Local Editing with Dynamic Mask Generation},
  author={Regev, Omer and Avrahami, Omri and Lischinski, Dani},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={7},
  pages={6713--6721},
  year={2025},
  url={https://arxiv.org/abs/2409.08272},
  note={Full version with appendices available on arXiv}
}

This code is based on Blended Latent Diffusion and Stable Diffusion, and utilizes Alpha-CLIP.
