Evaluation of an LLM-based Chatbot
One relevant academic paper that evaluates an LLM-based chatbot is a study on AI chatbots for mental health support, published in the Journal of Artificial Intelligence and Autonomous Intelligence (DOI: 10.54364/JAIAI.2024.1105).
The paper investigates the effectiveness of a chatbot designed as a mental health coach. The evaluation was conducted using the User Experience Questionnaire (UEQ), which measures dimensions such as efficiency, dependability, stimulation, and novelty. The results show that users found the chatbot engaging and helpful, particularly in its motivational and supportive responses. However, slightly lower scores on efficiency and dependability point to limitations in maintaining a consistent conversational flow.
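To illustrate how UEQ results like these are typically computed, the sketch below averages per-item responses into per-dimension scale scores. The item responses here are invented for illustration and are not the paper's data; the standard UEQ uses a seven-point scale coded from -3 to +3, and a scale score is the mean of its items.

```python
from statistics import mean

# Hypothetical UEQ item responses on the standard -3..+3 scale,
# grouped by the dimensions the paper reports (illustrative data only).
responses = {
    "efficiency":    [1, 2, 0, 1],
    "dependability": [1, 0, 1, 2],
    "stimulation":   [2, 3, 2, 2],
    "novelty":       [2, 2, 1, 3],
}

# A UEQ scale score is the mean of the items belonging to that scale.
scores = {dim: mean(items) for dim, items in responses.items()}

# Report dimensions from highest to lowest score.
for dim, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{dim}: {score:+.2f}")
```

With this toy data, stimulation and novelty come out highest while efficiency and dependability lag, mirroring the pattern the paper reports.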
I selected this paper for three main reasons. First, it clearly involves a large language model-based chatbot in a specific context (mental health support), which aligns with the assignment requirements. Second, the paper includes a substantive evaluation using a structured questionnaire (UEQ) rather than merely describing the system. This makes the findings more reliable and measurable. Third, the evaluation considers multiple UX dimensions, allowing for a more comprehensive assessment of chatbot effectiveness.
This paper is particularly valuable because it demonstrates how LLM-based chatbots can positively influence user experience while also highlighting practical limitations. It provides a balanced perspective on both the strengths and weaknesses of chatbot systems in real-world applications.
Reference
Pan, X. (2024). Evaluation of AI-driven chatbots for user experience and task effectiveness. Journal of Artificial Intelligence and Autonomous Intelligence. https://doi.org/10.54364/JAIAI.2024.1105