Evaluation of an LLM-based Chatbot
One relevant academic paper that evaluates an LLM-based chatbot is a study on AI chatbots for mental health support, published in the Journal of Artificial Intelligence and Autonomous Intelligence (DOI: 10.54364/JAIAI.2024.1105).
The paper investigates the effectiveness of a chatbot designed as a mental health coach. The evaluation was conducted using the User Experience Questionnaire (UEQ), which measures dimensions such as efficiency, dependability, stimulation, and novelty. The results show that users found the chatbot engaging and helpful, particularly in its motivational and supportive responses. However, slightly lower scores on efficiency and dependability point to limitations in maintaining a consistent conversational flow.
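To illustrate how UEQ results like these are typically computed, the sketch below averages per-item responses into per-dimension scale scores. The item responses here are invented for illustration and are not the paper's data; the standard UEQ uses a seven-point scale coded from -3 to +3, and a scale score is the mean of its items.

```python
from statistics import mean

# Hypothetical UEQ item responses on the standard -3..+3 scale,
# grouped by the dimensions the paper reports (illustrative data only).
responses = {
    "efficiency":    [1, 2, 0, 1],
    "dependability": [1, 0, 1, 2],
    "stimulation":   [2, 3, 2, 2],
    "novelty":       [2, 2, 1, 3],
}

# A UEQ scale score is the mean of the items belonging to that scale.
scores = {dim: mean(items) for dim, items in responses.items()}

# Report dimensions from highest to lowest score.
for dim, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{dim}: {score:+.2f}")
```

With this toy data, stimulation and novelty come out highest while efficiency and dependability lag, mirroring the pattern the paper reports.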
I selected this paper for three main reasons. First, it clearly involves a large language model-based chatbot in a specific context (mental health support), which aligns with the assignment requirements. Second, the paper includes a substantive evaluation using a structured questionnaire (UEQ) rather than merely describing the system. This makes the findings more reliable and measurable. Third, the evaluation considers multiple UX dimensions, allowing for a more comprehensive assessment of chatbot effectiveness.
This paper is particularly valuable because it demonstrates how LLM-based chatbots can positively influence user experience while also highlighting practical limitations. It provides a balanced perspective on both the strengths and weaknesses of chatbot systems in real-world applications.
Reference
Pan, X. (2024). Evaluation of AI-driven chatbots for user experience and task effectiveness. Journal of Artificial Intelligence and Autonomous Intelligence. https://doi.org/10.54364/JAIAI.2024.1105