

Fortifying AI Against Rogue Rewiring

As generative AI models move from massive cloud servers to phones and cars, they're stripped down to save power. But what gets trimmed can include the safeguards that stop them from spewing hate speech or offering roadmaps for criminal activity. To counter this threat, researchers at the University of California, Riverside, have developed a method that preserves these safeguards even when open-source AI models are slimmed down to run on lower-power devices.

The team's solution was to retrain the model's internal layers so that its ability to detect and block dangerous prompts is preserved even when key layers are removed. The approach does not rely on external filters or software patches. Instead, it changes how the model understands risky content at a fundamental level.

Link to the paper: https://arxiv.org/abs/2411.04291

Link to UCR News: https://news.ucr.edu/articles/2025/09/04/ucr-researchers-fortify-ai-against-rogue-rewiring