LLM News Digest

Tag: vision

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
Intermediate

Xinping Lei, Xinyu Che, Junqi Xiong, Chenchen Zhang, Yukai Huang, Chenyu Zhou, Haoyang Huang, Minghao Liu, Letian Zhu, Hongyi Ye, Jinhua Hao, Ken Deng, Zizheng Zhan, Han Li, Dailin Li, Yifan Yao, Ming Sun, Zhaoxiang Zhang, Jiaheng Liu

WebCompass introduces the first comprehensive benchmark for evaluating code language models on real web development workflows, spanning text, image, and video inputs across generation, editing, and repair tasks. This matters because existing benchmarks test only narrow slices of coding capability and miss visual fidelity and interaction quality, which are critical gaps if you're building or evaluating AI coding tools for web development.

Takeaways
  • Current coding benchmarks fail to capture the full lifecycle of web development, missing visual fidelity and interaction quality.
  • Real-world web coding requires multimodal understanding across text, image, and video inputs in iterative generation-editing-repair cycles.
  • LLM-as-a-judge evaluation with checklist guidance provides a practical methodology for assessing complex web development outputs.
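
To make the checklist-guided judging idea concrete, here is a minimal sketch of how per-item verdicts could be rolled up into one score. It is not WebCompass's actual pipeline: `ChecklistItem`, `build_prompt`, `score_with_checklist`, the YES/NO answer format, and the weighting scheme are all assumptions made for illustration, and `judge` stands in for whatever model call you would use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ChecklistItem:
    """One yes/no criterion the judge is asked about (hypothetical structure)."""
    description: str
    weight: float = 1.0


def build_prompt(task: str, output: str, item: ChecklistItem) -> str:
    """Ask the judge about a single criterion so each verdict stays focused."""
    return (
        f"Task: {task}\n"
        f"Candidate output:\n{output}\n"
        f"Criterion: {item.description}\n"
        "Answer YES or NO."
    )


def score_with_checklist(
    task: str,
    output: str,
    checklist: Sequence[ChecklistItem],
    judge: Callable[[str], str],
) -> float:
    """Return the weighted fraction of checklist items the judge accepts.

    `judge` is any callable mapping a prompt string to a text reply; in
    practice it would wrap an LLM call (assumption, not specified here).
    """
    total = sum(item.weight for item in checklist)
    if total <= 0:
        raise ValueError("checklist needs at least one item with positive weight")
    passed = 0.0
    for item in checklist:
        reply = judge(build_prompt(task, output, item)).strip().upper()
        if reply.startswith("YES"):
            passed += item.weight
    return passed / total
```

Scoring one criterion per call keeps each judgment narrow and makes the final number easy to audit, at the cost of more judge calls per sample.
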
from Apr 27, 2026 · via api-hf · arXiv:2604.18224

AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning
Intermediate

Guransh Singh

AEGIS targets a central problem in fine-tuning vision-language models for robotics: adapting them to control tasks without destroying their original capabilities. Current approaches either discard valuable continuous supervision or rely on LoRA adapters that still overwrite pre-trained knowledge. AEGIS instead uses orthogonal gradient projection to enable direct continuous learning while preserving the model's existing visual question answering abilities.

Takeaways
  • Fine-tuning VLMs for robotics typically destroys original capabilities due to gradient asymmetry between continuous control and discrete language training.
  • Orthogonal gradient projection enables continuous learning while preserving pre-trained manifolds better than LoRA or stop-gradient approaches.
  • The framework addresses the spectral mismatch between low-rank regression gradients and high-dimensional semantic representations.
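
As a rough illustration of orthogonal gradient projection in general, the sketch below removes from a gradient any component lying in the span of a set of "anchor" directions, so the update cannot move the parameters along them. This is a generic textbook construction, not AEGIS's published procedure: the names `orthonormal_basis` and `project_out`, the use of plain Python lists, and the way anchors are supplied are all assumptions. A real training loop would do the same arithmetic on framework tensors.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(anchors: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt over the anchor directions, skipping near-dependent ones."""
    basis: List[Vector] = []
    for anchor in anchors:
        v = list(anchor)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # drop anchors already covered by the basis
            basis.append([vi / norm for vi in v])
    return basis


def project_out(grad: Sequence[float], anchors: Sequence[Sequence[float]]) -> Vector:
    """Return grad minus its projection onto span(anchors).

    The result is orthogonal to every anchor, so a gradient step along it
    leaves the anchor directions untouched to first order.
    """
    g = list(grad)
    for b in orthonormal_basis(anchors):
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g
```

For example, with anchors `[1, 0, 0]` and `[1, 1, 0]`, the gradient `[1, 2, 3]` keeps only its third coordinate, since the first two axes are fully covered by the anchor span.
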
from Apr 20, 2026 · via api-arxiv · arXiv:2604.16067