⚠️ Disclaimer: Potentially Harmful Content ⚠️
This project page presents research on enhancing the safety of Vision-Language Models (VLMs) and benchmarking their robustness against harmful content. To effectively demonstrate our methods and results, this page may contain examples of content that some viewers may find offensive, disturbing, or otherwise harmful.

An example from HoliSafe, a comprehensive dataset that covers all combinations of image and text safeness (safe/unsafe image paired with safe/unsafe text), together with its corresponding evaluation benchmark, HoliSafe-Bench, which poses novel challenges to modern VLMs. Unlike other safety-tuned VLMs (e.g., VLGuard and SPA-VL), which remain susceptible to jailbreaks and unsafe responses, SafeLLaVA-7B robustly defends against such attacks.