It is not known how large language models, such as ChatGPT, can be applied to assess the efficacy of medications, including those used for the prevention of migraine, or how well they can support such assessments with existing medical evidence.
We queried ChatGPT-3.5 on the efficacy of 47 medications for the prevention of migraine and then asked it to provide citations in support of its assessments. ChatGPT’s evaluations were compared with each medication's FDA approval status for this indication and with the American Academy of Neurology (AAN) 2012 evidence-based guidelines for the prevention of migraine. The citations ChatGPT generated were then assessed to determine whether they were real papers and whether they were relevant to the query.
ChatGPT affirmed that the 14 medications that have either received FDA approval for the prevention of migraine or have AAN Grade A/B evidence were effective for migraine. Its assessments of the other 33 medications were unreliable, including suggestions of possible efficacy for four medications that have never been used for the prevention of migraine. Critically, only 33/115 (29%) of the papers ChatGPT cited were real, while 76/115 (66%) were “hallucinated” papers that do not exist, and 6/115 (5%) shared the names of real papers but carried citation details that were not real.
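As a quick arithmetic check on the figures above, the following minimal sketch recomputes the reported percentages from the counts given in the results. The variable names and category labels are illustrative assumptions, not part of the study's methods.

```python
# Counts taken from the results above; the labels are illustrative only.
citation_counts = {
    "real": 33,
    "hallucinated": 76,
    "real_name_wrong_details": 6,
}

# Total number of citations ChatGPT produced across all queries.
total = sum(citation_counts.values())

# Percentage of each category, rounded to the nearest whole percent.
percentages = {
    label: round(100 * count / total)
    for label, count in citation_counts.items()
}

# Medication groups: 14 with FDA approval or AAN Grade A/B evidence,
# plus 33 others, should account for all queried medications.
medications_total = 14 + 33

for label, pct in percentages.items():
    print(f"{label}: {citation_counts[label]}/{total} ({pct}%)")
print(f"medications queried: {medications_total}")
```

The three categories account for all 115 citations, and the two medication groups account for all 47 queried medications.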
While ChatGPT produced tailored answers on the efficacy of the queried medications, the results were unreliable and inaccurate because of the overwhelming volume of “hallucinated” articles it generated and cited.