A recent study indicates that large language models (LLMs) can effectively generate treatment recommendations for early-stage hepatocellular carcinoma (HCC), aligning with established clinical guidelines. However, their effectiveness diminishes in more complex cases, particularly those involving advanced-stage liver cancer. This research, led by Ji Won Han from The Catholic University of Korea, was published in the open-access journal PLOS Medicine.
Choosing the most appropriate treatment for liver cancer is difficult. While international treatment guidelines provide a framework, clinicians must personalize their recommendations based on factors such as cancer stage, liver function, and existing comorbidities. To evaluate how LLMs perform in this domain, the researchers compared treatment suggestions generated by three LLMs (ChatGPT, Gemini, and Claude) with the treatments actually received by more than 13,000 newly diagnosed patients with HCC in South Korea.
For patients with early-stage HCC, the study found a notable correlation between the treatment recommendations made by LLMs and the treatments actually administered. In these cases, higher agreement between LLM suggestions and clinical practice was linked to better survival. For patients with advanced-stage liver cancer, the pattern reversed: greater alignment between LLM advice and the treatment given was associated with poorer outcomes.
One significant finding is that LLMs tended to prioritize tumor-related factors, such as the size and number of tumors, while physicians focused more on liver function. This divergence illustrates a critical limitation of LLMs in complex clinical scenarios. The authors emphasize that while LLMs may assist in straightforward treatment decisions, particularly in early-stage cases, they are not adequately equipped to guide care in more complicated situations that require nuanced clinical judgment.
The study highlights the importance of caution when using LLM-generated advice, regardless of the cancer stage. The authors note, “Our study shows that large language models can help support treatment decisions for early-stage liver cancer, but their performance is more limited in advanced disease. This highlights the importance of using LLMs as a complement to, rather than a replacement for, clinical expertise.”
As the role of artificial intelligence in healthcare continues to evolve, these findings suggest that LLMs could serve as useful tools for clinicians, especially in initial treatment planning for early-stage liver cancer. Professional judgment nonetheless remains essential to ensuring optimal patient care.
