ABSTRACT This study examines how subtitles and image visualizations influence gaze behavior, working alliance, and behavior change intentions in virtual health conversations with embodied conversational agents (ECAs). Visualizations refer to images shown on a 3D model TV and text on a virtual whiteboard, both of which reinforce key content conveyed by the ECA. Using a 2 × 2 factorial design, participants were randomly assigned to one of four conditions: no subtitles or visualizations (Control), subtitles only (SUB), visualizations only (VIS), or both subtitles and visualizations (VISSUB). Structural equation path modeling showed that SUB and VIS each reduced gaze toward the ECA, whereas VISSUB moderated this reduction, so that the gaze loss with both present was smaller than the sum of the two individual effects. Gaze behavior was positively associated with working alliance, and perceptions of enjoyment and appropriateness influenced engagement, which in turn predicted behavior change intentions. VIS was negatively associated with behavior change intentions, suggesting that excessive visual input may introduce cognitive trade-offs.