Stanford University researchers have published trial results showing that AI practice chatbots alone are not enough to train counselors, therapists or peer supporters in empathy and active listening.
Only when the practice partner is paired with a structured AI feedback mentor do users measurably improve their skills, according to a 90-person randomized controlled trial of the CARE system built at the Stanford Institute for Human-Centered AI.
The finding has direct consequences for EdTech and workforce training platforms building AI tutoring tools for human-facing roles, where practice chatbots without feedback loops are currently common.
The CARE project is led by Diyi Yang, assistant professor of computer science at Stanford, working with Ryan Louie, a postdoc in computer science at Stanford and first author on the project. Their team has built three systems under an AI Partner and AI Mentor framework: one for conflict resolution, one for basic peer counseling, and CARE for novice therapy skills.
Trial groups split on empathy and problem-solving instincts
The trial compared two groups: one practicing with the chatbot alone, and one practicing with the chatbot plus feedback from the AI mentor. The practice-only group defaulted to suggesting solutions to clients. The practice-plus-feedback group showed more client-centered behavior and greater empathy, closer to how trained therapists operate.
Yang says both groups gained something. “It helps people build confidence in their abilities”, Louie says of practice alone. “And feedback matters for building competence in specific skills.” He adds that the research sits in what he calls a useful niche for applied AI. “These AIs have enough capability to be helpful while functioning safely in this arena of helping the helpers.”
Rule sets were needed to stop chatbots behaving like chatbots
A core finding of the design process was that off-the-shelf LLMs are poor practice partners without heavy constraint. Yang’s team found that default chatbot behaviors, including excessive cooperation and early disclosure, prevented users from learning. “Out of the box, LLMs don’t know how to behave in a way that will allow a person to learn specific social skills”, Louie says.
The team co-designed rule sets covering 25 scenarios with experienced therapists and other domain experts. Louie describes the design brief as calibration rather than realism alone. “We had to find the Goldilocks zone”, he says. The system checks each output against its rule set before responding and regenerates any reply that breaks a rule.
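That check-then-regenerate pattern is straightforward to sketch. The snippet below is illustrative only, assuming hypothetical helpers (generate_reply, guarded_reply) and toy keyword rules; the article does not describe CARE's actual rule checks, which would likely use a classifier or an LLM-based judge.

```python
# Illustrative sketch of a guarded generation loop; not the CARE implementation.
MAX_ATTEMPTS = 3

EXAMPLE_RULES = [
    # Each rule is a predicate over a candidate reply. These crude keyword and
    # length checks stand in for whatever judge a real system would use.
    lambda reply: "here's what you should do" not in reply.lower(),  # client persona shouldn't hand out advice
    lambda reply: len(reply.split()) < 120,                          # keep turns short, as a client would
]

def generate_reply(conversation: list[str]) -> str:
    """Stand-in for a call to the underlying LLM playing the practice client."""
    return "I guess I've just been feeling stuck lately."

def guarded_reply(conversation: list[str], rules=EXAMPLE_RULES) -> str:
    """Generate a reply, re-rolling it until every rule passes or attempts run out."""
    for _ in range(MAX_ATTEMPTS):
        candidate = generate_reply(conversation)
        if all(rule(candidate) for rule in rules):
            return candidate
    # Fall back to a safe, neutral turn if no candidate passes the rule set.
    return "Sorry, could you say more about that?"

if __name__ == "__main__":
    print(guarded_reply(["Trainee: What brings you in today?"]))
```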
The mentor component was built from conversation transcripts annotated by supervisors at the Stanford School of Medicine, with the underlying model fine-tuned on that expert feedback. CARE is now being adapted for community mental health centers that train their own counselors, and Yang’s team has opened a collaboration in India to port the system into a different language and cultural context.
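One way such annotated transcripts could be shaped into fine-tuning data is sketched below. The JSONL layout, field names, and example text are assumptions for illustration; the article does not describe CARE's data format or training recipe.

```python
import json

# Hypothetical example: turn supervisor-annotated counseling turns into
# prompt/completion pairs for supervised fine-tuning of a feedback model.
annotated_transcripts = [
    {
        "excerpt": "Trainee: Have you tried just exercising more?",
        "supervisor_note": "This jumps straight to problem-solving; reflect the client's feelings first.",
    },
]

with open("mentor_finetune.jsonl", "w") as f:
    for row in annotated_transcripts:
        example = {
            # Input: the trainee's turn in context; target: mentor-style feedback.
            "prompt": f"Give feedback on this counseling turn:\n{row['excerpt']}",
            "completion": row["supervisor_note"],
        }
        f.write(json.dumps(example) + "\n")
```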