Yeah, as has been pointed out in this thread already, a huge issue with these language models in my experience is their inability to answer “I don’t know” or even to ask follow-up questions when they lack information. Instead, they just write out a plausible-sounding but sometimes completely wrong answer.
And the more niche the topic, the worse it gets. That’s why it looks really impressive when you only ask simple questions: the model usually has enough data to reply with something coherent. But as soon as you get into trickier topics, the results can get really poor really quickly.






