HealthLLM Arxiv Daily

Updated on 2026.04.01

Usage instructions: here

Table of Contents

HealthLLM
UncertaintyLLM

HealthLLM

Publish Date	Title	Authors	PDF	Code
2025-07-23	From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes	Karen Zhou et.al.	2507.17717	null
2025-07-23	Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination	Mariam ALMutairi et.al.	2507.17134	null
2025-07-22	Multi-Label Classification with Generative AI Models in Healthcare: A Case Study of Suicidality and Risk Factors	Ming Huang et.al.	2507.17009	null
2025-07-22	AI-based Clinical Decision Support for Primary Care: A Real-World Study	Robert Korom et.al.	2507.16947	null
2025-07-22	AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation	Nima Fathi et.al.	2507.16940	null
2025-07-22	Depth Gives a False Sense of Privacy: LLM Internal States Inversion	Tian Dong et.al.	2507.16372	null
2025-07-22	Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens	Fred Mutisya et.al.	2507.16322	null
2025-07-22	Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery	Bo Wen et.al.	2507.16229	null
2025-07-22	SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting	Shuhao Mei et.al.	2507.16145	null
2025-07-21	BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning	Sahana Srinivasan et.al.	2507.15717	null
2025-07-21	ChiMed 2.0: Advancing Chinese Medical Dataset in Facilitating Large Language Modeling	Yuanhe Tian et.al.	2507.15275	null
2025-07-20	What Level of Automation is “Good Enough”? A Benchmark of Large Language Models for Meta-Analysis Data Extraction	Lingbo Li et.al.	2507.15152	null
2025-07-20	Redefining Elderly Care with Agentic AI: Challenges and Opportunities	Ruhul Amin Khalil et.al.	2507.14912	null
2025-07-20	Time-Aware Attention for Enhanced Electronic Health Records Modeling	Junhan Yu et.al.	2507.14847	null
2025-07-19	Investigating the Role of LLMs Hyperparameter Tuning and Prompt Engineering to Support Domain Modeling	Vladyslav Bulhakov et.al.	2507.14735	null
2025-07-19	Rethinking Suicidal Ideation Detection: A Trustworthy Annotation Framework and Cross-Lingual Model Evaluation	Amina Dzafic et.al.	2507.14693	null
2025-07-19	Large Language Models as Medical Codes Selectors: a benchmark using the International Classification of Primary Care	Vinicius Anjos de Almeida et.al.	2507.14681	null
2025-07-19	Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper	Fred Mutisya et.al.	2507.14615	null
2025-07-18	Leveraging LLMs for Formal Software Requirements – Challenges and Prospects	Arshad Beg et.al.	2507.14330	null
2025-07-17	Language Models Change Facts Based on the Way You Talk	Matthew Kearney et.al.	2507.14238	null
2025-07-18	DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits	Garapati Keerthana et.al.	2507.14079	null
2025-07-18	Cross-modal Causal Intervention for Alzheimer’s Disease Prediction	Yutao Jin et.al.	2507.13956	null
2025-07-18	RAG-based Architectures for Drug Side Effect Retrieval in LLMs	Shad Nygren et.al.	2507.13822	null
2025-07-18	DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs	Ye Tian et.al.	2507.13737	null
2025-07-17	Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines	Muhammad Javed et.al.	2507.12840	null
2025-07-17	Emotional Support with LLM-based Empathetic Dialogue Generation	Shiquan Wang et.al.	2507.12820	null
2025-07-17	A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models	Weijieying Ren et.al.	2507.12774	null
2025-07-16	Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes	Johann Frei et.al.	2507.12261	null
2025-07-15	LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer	Yaoxian Dong et.al.	2507.11457	null
2025-07-20	Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian	Andrei Niculae et.al.	2507.11299	null
2025-07-15	LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP	Haowei Yang et.al.	2507.11052	null
2025-07-15	Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation	Yicong Wu et.al.	2507.10911	null
2025-07-14	Automated Thematic Analyses Using LLMs: Xylazine Wound Management Social Media Chatter Use Case	JaMor Hairston et.al.	2507.10803	null
2025-07-14	Exploring User Security and Privacy Attitudes and Concerns Toward the Use of General-Purpose LLM Chatbots for Mental Health	Jabari Kwesi et.al.	2507.10695	null
2025-07-11	Transforming Sensitive Documents into Quantitative Data: An AI-Based Preprocessing Toolchain for Structured and Privacy-Conscious Analysis	Anders Ledberg et.al.	2507.10582	null
2025-07-11	An Offline Mobile Conversational Agent for Mental Health Support: Learning from Emotional Dialogues and Psychological Texts with Student-Centered Evaluation	Vimaleswar A et.al.	2507.10580	null
2025-07-14	Towards Emotion Co-regulation with LLM-powered Socially Assistive Robots: Integrating LLM Prompts and Robotic Behaviors to Support Parent-Neurodivergent Child Dyads	Jing Li et.al.	2507.10427	null
2025-07-22	Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models	Anita Kriz et.al.	2507.09279	null
2025-07-12	AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data	Mohammad Abolnejadian et.al.	2507.09100	null
2025-07-11	ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making	Bharadwaj Ravichandran et.al.	2507.09037	null
2025-07-11	Evaluating LLMs in Medicine: A Call for Rigor, Transparency	Mahmoud Alwakeel et.al.	2507.08916	null
2025-07-10	Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial	Neil K. R. Sehgal et.al.	2507.08211	null
2025-07-14	Beyond Scale: Small Language Models are Comparable to GPT-4 in Mental Health Understanding	Hong Jia et.al.	2507.08031	null
2025-07-08	A Systematic Analysis of Declining Medical Safety Messaging in Generative AI Models	Sonali Sharma et.al.	2507.08030	null
2025-07-10	Automating Expert-Level Medical Reasoning Evaluation of Large Language Models	Shuang Zhou et.al.	2507.07988	null
2025-07-10	Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology	Sabine Felde et.al.	2507.07983	null
2025-07-10	DocCHA: Towards LLM-Augmented Interactive Online diagnosis System	Xinyi Liu et.al.	2507.07870	null
2025-07-11	Measuring AI Alignment with Human Flourishing	Elizabeth Hilliard et.al.	2507.07787	null
2025-07-10	Learnable Retrieval Enhanced Visual-Text Alignment and Fusion for Radiology Report Generation	Qin Zhou et.al.	2507.07568	null
2025-07-10	SynthEHR-Eviction: Enhancing Eviction SDoH Detection with LLM-Augmented Synthetic EHR Data	Zonghai Yao et.al.	2507.07421	null
2025-07-10	MedReadCtrl: Personalizing medical text generation with readability-controlled instruction learning	Hieu Tran et.al.	2507.07419	null
2025-07-09	Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation	Anirban Saha Anik et.al.	2507.07307	null
2025-07-09	Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning	Juejing Liu et.al.	2507.07293	null
2025-07-11	Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings	Jean-Philippe Corbeil et.al.	2507.07248	null
2025-07-09	MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation	Qilong Xing et.al.	2507.06992	null
2025-07-09	CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs	Garapati Keerthana et.al.	2507.06715	null
2025-07-08	A Semantic Parsing Framework for End-to-End Time Normalization	Xin Su et.al.	2507.06450	null
2025-07-08	Large Language Models Predict Human Well-being – But Not Equally Everywhere	Pat Pataranutaporn et.al.	2507.06141	null
2025-07-08	Development and Evaluation of HopeBot: an LLM-based chatbot for structured and interactive PHQ-9 depression screening	Zhijun Guo et.al.	2507.05984	null
2025-07-08	Affective-ROPTester: Capability and Bias Analysis of LLMs in Predicting Retinopathy of Prematurity	Shuai Zhao et.al.	2507.05816	null
2025-07-08	MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models	Wei Zhang et.al.	2507.05591	null
2025-07-07	SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation	Shovito Barua Soumma et.al.	2507.05541	null
2025-07-09	Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications	Jean-Philippe Corbeil et.al.	2507.05517	null
2025-07-07	GLOSS: Group of LLMs for Open-Ended Sensemaking of Passive Sensing Data for Health and Wellbeing	Akshat Choube et.al.	2507.05461	null
2025-07-07	LCDS: A Logic-Controlled Discharge Summary Generation System Supporting Source Attribution and Expert Review	Cheng Yuan et.al.	2507.05319	null
2025-07-07	DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine	Zewen Sun et.al.	2507.04877	null
2025-07-06	Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models	Pranta Saha et.al.	2507.04432	null
2025-07-08	MedGellan: LLM-Generated Medical Guidance to Support Physicians	Debodeep Banerjee et.al.	2507.04431	null
2025-07-06	Large Language Models’ Varying Accuracy in Recognizing Risk-Promoting and Health-Supporting Sentiments in Public Health Discourse: The Cases of HPV Vaccination and Heated Tobacco Products	Soojong Kim et.al.	2507.04364	null
2025-07-06	Computed Tomography Visual Question Answering with Cross-modal Feature Graphing	Yuanhe Tian et.al.	2507.04333	null
2025-07-06	M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding	Shenxi Liu et.al.	2507.04289	null
2025-07-05	Dissecting Clinical Reasoning in Language Models: A Comparative Study of Prompts and Model Adaptation Strategies	Mael Jullien et.al.	2507.04142	null
2025-07-05	Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs	Haifeng Zhao et.al.	2507.03908	null
2025-07-05	Enhancing Adaptive Behavioral Interventions with LLM Inference from Participant-Described States	Karine Karine et.al.	2507.03871	null
2025-07-04	ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays	Shehroz S. Khan et.al.	2507.03739	null
2025-07-04	Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation	Tao Tang et.al.	2507.03585	null
2025-07-04	Improving Social Determinants of Health Documentation in French EHRs Using Large Language Models	Adrien Bazoge et.al.	2507.03433	null
2025-07-04	Conformal Information Pursuit for Interactively Guiding Large Language Models	Kwan Ho Ryan Chan et.al.	2507.03279	null
2025-07-03	How Much Content Do LLMs Generate That Induces Cognitive Bias in Users?	Abeer Alessa et.al.	2507.03194	null
2025-07-03	Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case	Alvaro Riquelme et.al.	2507.03067	null
2025-07-03	Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation	Johnson Thomas et.al.	2507.03033	null
2025-07-02	CLUES: Collaborative High-Quality Data Selection for LLMs via Training Dynamics	Wanru Zhao et.al.	2507.03004	null
2025-07-02	Evaluating Hierarchical Clinical Document Classification Using Reasoning-Based LLMs	Akram Mustafa et.al.	2507.03001	null
2025-07-01	Text-Guided Multi-Instance Learning for Scoliosis Screening via Gait Video Analysis	Haiqing Li et.al.	2507.02996	null
2025-07-01	MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization	Huihui Xu et.al.	2507.02994	null
2025-07-01	‘For Argument’s Sake, Show Me How to Harm Myself!’: Jailbreaking LLMs in Suicide and Self-Harm Contexts	Annika M Schoene et.al.	2507.02990	null
2025-07-01	Truth, Trust, and Trouble: Medical AI on the Edge	Mohammad Anas Azeez et.al.	2507.02983	null
2025-07-08	Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria	Keita Kiuchi et.al.	2507.02950	null
2025-06-25	Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction	Reza Samimi et.al.	2507.02920	null
2025-07-03	SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model	Wencheng Zhang et.al.	2507.02822	null
2025-07-06	KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs	Yuzhang Xie et.al.	2507.02773	null
2025-07-03	Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data	Irena Girshovitz et.al.	2507.02628	null
2025-07-03	DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making	Tianqi Shang et.al.	2507.02616	null
2025-07-02	BACTA-GPT: An AI-Based Bayesian Adaptive Clinical Trial Architect	Krishna Padmanabhan et.al.	2507.02130	null
2025-07-10	The Thin Line Between Comprehension and Persuasion in LLMs	Adrian de Wynter et.al.	2507.01936	null
2025-07-02	Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America	Dorian Peters et.al.	2507.01719	null
2025-07-02	Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care	Matthew JY Kang et.al.	2507.01282	null
2025-07-02	Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening	Cindy Lie Tabuse et.al.	2507.01278	null
2025-07-01	Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept	Edouard Lansiaux et.al.	2507.01080	null
2025-06-27	Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis	Rafi Al Attrach et.al.	2507.01053	null
2025-07-01	Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection	Yifan Gao et.al.	2507.00693	null
2025-07-02	$μ^2$ Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation	Siyou Li et.al.	2507.00316	null
2025-06-25	VSF-Med: A Vulnerability Scoring Framework for Medical Vision-Language Models	Binesh Sadanandan et.al.	2507.00052	null
2025-06-30	Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning	Seungjun Yi et.al.	2506.23998	null
2025-07-02	Positioning AI Tools to Support Online Harm Reduction Practice: Applications and Design Directions	Kaixuan Wang et.al.	2506.22941	null
2025-06-28	Coordinated 2D-3D Visualization of Volumetric Medical Data in XR with Multimodal Interactions	Qixuan Liu et.al.	2506.22926	null
2025-06-28	MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs	Jianhui Wei et.al.	2506.22808	null
2025-06-22	Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines	Wenhao Li et.al.	2506.21615	null
2025-06-18	Overview of the ClinIQLink 2025 Shared Task on Medical Question-Answering	Brandon Colelough et.al.	2506.21597	null
2025-06-26	“What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets	Akshay Paruchuri et.al.	2506.21532	null
2025-06-26	MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification	Shadman Sobhan et.al.	2506.21199	null
2025-06-25	Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation	Md Toufique Hasan et.al.	2506.20869	null
2025-06-25	An Agentic System for Rare Disease Diagnosis with Traceable Reasoning	Weike Zhao et.al.	2506.20430	null
2025-06-25	ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset	Yilin Wang et.al.	2506.20093	null
2025-06-24	DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction	Weijieying Ren et.al.	2506.20059	null
2025-06-24	Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks	Konstantinos Vrettos et.al.	2506.20009	null
2025-06-24	MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration	Yucheng Zhou et.al.	2506.19835	null
2025-06-24	LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis	Lei Kang et.al.	2506.19702	null
2025-06-26	Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance	Xuesong Li et.al.	2506.19683	null
2025-06-24	Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation	Yuanhe Tian et.al.	2506.19665	null
2025-06-24	Automatic Posology Structuration: What role for LLMs?	Natalia Bobkova et.al.	2506.19525	null
2025-06-24	EmoStage: A Framework for Accurate Empathetic Response Generation via Perspective-Taking and Phase Recognition	Zhiyang Qi et.al.	2506.19279	null
2025-06-23	Spiritual-LLM: Gita Inspired Mental Health Therapy In the Era of LLMs	Janak Kapuriya et.al.	2506.19185	null
2025-06-23	GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model	Amir Faiyaz et.al.	2506.19164	null
2025-06-23	Enhancing Biosecurity in Tamper-Resistant Large Language Models With Quantum Gradient Descent	Fahmida Hai et.al.	2506.19086	null
2025-06-23	FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation	Nitish Nagesh et.al.	2506.19082	null
2025-06-23	RWESummary: A Framework and Test for Choosing Large Language Models to Summarize Real-World Evidence (RWE) Studies	Arjun Mukerji et.al.	2506.18819	null
2025-06-23	MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Yuting Zhang et.al.	2506.18512	null
2025-06-23	Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics	Yousang Cho et.al.	2506.18387	null
2025-06-27	Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team	Weilun Yu et.al.	2506.18348	null
2025-06-24	Co-persona: Leveraging LLMs and Expert Collaboration to Understand User Personas through Social Media Data Analysis	Min Yin et.al.	2506.18269	null
2025-06-22	Programming Quantum Computers with Large Language Models	Elena R. Henderson et.al.	2506.18125	null
2025-06-22	Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives	Batool Haider et.al.	2506.18116	null
2025-06-22	Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster	Fenghe Tang et.al.	2506.18034	null
2025-06-22	SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model	Guankun Wang et.al.	2506.17873	null
2025-06-21	Engagement and Disclosures in LLM-Powered Cognitive Behavioral Therapy Exercises: A Factorial Design Comparing the Influence of a Robot vs. Chatbot Over Time	Mina Kian et.al.	2506.17831	null
2025-06-21	Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs	Catarina Pires et.al.	2506.17782	null
2025-06-21	Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages	Matthias Schöffel et.al.	2506.17715	null
2025-06-21	LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning	Haoxuan Che et.al.	2506.17562	null
2025-06-20	Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation	Hao Guan et.al.	2506.17442	null
2025-07-01	Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases	Yubeen Bae et.al.	2506.17336	link
2025-06-14	Automating Financial Statement Audits with Large Language Models	Rushi Wang et.al.	2506.17282	null
2025-06-20	The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making	Abinitha Gourabathina et.al.	2506.17163	null
2025-06-20	DistillNote: LLM-based clinical note summaries improve heart failure diagnosis	Heloisa Oss Boll et.al.	2506.16777	null
2025-06-19	Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System	Jianlin Shi et.al.	2506.16628	null
2025-06-19	A Scoping Review of Synthetic Data Generation for Biomedical Research and Applications	Hanshu Rao et.al.	2506.16594	null
2025-06-19	Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support	Sophie Chiang et.al.	2506.16473	null
2025-06-23	From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents	Mohammad Amaan Sayeed et.al.	2506.15911	null
2025-06-18	Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning	Chunlei Li et.al.	2506.15477	null
2025-06-18	DeVisE: Behavioral Testing of Medical Large Language Models	Camila Zurdo Tagliabue et.al.	2506.15339	null
2025-06-18	Universal Laboratory Model: prognosis of abnormal clinical outcomes based on routine tests	Pavel Karpov et.al.	2506.15330	link
2025-06-18	Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment	Shrestha Ghosh et.al.	2506.15301	null
2025-06-18	Mapping Caregiver Needs to AI Chatbot Design: Strengths and Gaps in Mental Health Support for Alzheimer’s and Dementia Caregivers	Jiayue Melissa Shi et.al.	2506.15047	null
2025-06-17	From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction?	Shadman Sakib et.al.	2506.14949	link
2025-06-17	A Vision for Geo-Temporal Deep Research Systems: Towards Comprehensive, Transparent, and Reproducible Geo-Temporal Information Synthesis	Bruno Martins et.al.	2506.14345	null
2025-06-17	Abstract Meaning Representation for Hospital Discharge Summarization	Paul Landes et.al.	2506.14101	link
2025-06-17	InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking	Rahul Seetharaman et.al.	2506.14086	null
2025-06-13	Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases	Bonam Mingole et.al.	2506.13805	null
2025-06-13	Enhancing Clinical Decision Support and EHR Insights through LLMs and the Model Context Protocol: An Open-Source MCP-FHIR Framework	Abul Ehtesham et.al.	2506.13800	null
2025-06-18	The NordDRG AI Benchmark for Large Language Models	Tapio Pitkäranta et.al.	2506.13790	link
2025-06-16	Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems	Shang-Chi Tsai et.al.	2506.13692	null
2025-06-16	Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning	David Bani-Harouni et.al.	2506.13474	null
2025-06-16	Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models	James Chua et.al.	2506.13206	null
2025-06-16	Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs	Gyutaek Oh et.al.	2506.13102	null
2025-06-15	CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation	Naihao Deng et.al.	2506.12936	null
2025-06-15	Towards Visualizing Electronic Medical Records via Natural Language Queries	Haodi Zhang et.al.	2506.12837	null
2025-06-14	Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders	Ananya Joshi et.al.	2506.12576	link
2025-06-14	Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare	Yubin Kim et.al.	2506.12482	null
2025-06-14	Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs	Erica Cai et.al.	2506.12367	null
2025-06-20	Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning	Xiaotian Zhang et.al.	2506.12307	null
2025-06-13	Semantic Scheduling for LLM Inference	Wenyue Hua et.al.	2506.12204	link
2025-06-13	Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs	Chenqian Le et.al.	2506.12182	null
2025-06-10	Risks & Benefits of LLMs & GenAI for Platform Integrity, Healthcare Diagnostics, Cybersecurity, Privacy & AI Safety: A Comprehensive Survey, Roadmap & Implementation Blueprint	Kiarash Ahi et.al.	2506.12088	null
2025-06-16	Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making	Claudio Fanconi et.al.	2506.11887	null
2025-06-13	Converting Annotated Clinical Cases into Structured Case Report Forms	Pietro Ferrazzi et.al.	2506.11666	null
2025-06-24	RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning	Yu Wang et.al.	2506.11555	null
2025-06-13	Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs	Wei Li et.al.	2506.11512	link
2025-06-13	Predicting Early-Onset Colorectal Cancer with Large Language Models	Wilson Lau et.al.	2506.11410	null
2025-06-13	Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning	Liying Wang et.al.	2506.11376	null
2025-06-12	LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic	Weibing Zheng et.al.	2506.11221	link
2025-06-11	Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning	Ji Young Byun et.al.	2506.11166	null
2025-06-16	ADAgent: LLM Agent for Alzheimer’s Disease Analysis with Collaborative Coordinator	Wenlong Hou et.al.	2506.11150	null
2025-06-19	Autonomous Computer Vision Development with Agentic AI	Jin Kim et.al.	2506.11140	link
2025-06-10	Scalable Medication Extraction and Discontinuation Identification from Electronic Health Records Using Large Language Models	Chong Shao et.al.	2506.11137	null
2025-06-10	Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK	Carlos Garcia-Fernandez et.al.	2506.11129	null
2025-06-09	KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations	Junyu Liu et.al.	2506.11114	null
2025-06-16	Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation	Uttej Kallakurik et.al.	2506.11105	null
2025-06-12	The Role of Generative AI in Facilitating Social Interactions: A Scoping Review	T. T. J. E. Arets et.al.	2506.10927	null
2025-06-12	Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs	Alberto Testoni et.al.	2506.10769	null
2025-06-12	Large Language Models for Detection of Life-Threatening Texts	Thanh Thi Nguyen et.al.	2506.10687	null
2025-06-11	HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding	Yanzhao Shi et.al.	2506.09634	null
2025-06-11	ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning	Yu Sun et.al.	2506.09513	link
2025-06-11	Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers	Ilanit Sobol et.al.	2506.09495	null
2025-06-11	“Is This Really a Human Peer Supporter?”: Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions	Kellie Yu Hui Sim et.al.	2506.09354	null
2025-06-10	The Curious Language Model: Strategic Test-Time Information Acquisition	Michael Cooper et.al.	2506.09173	null
2025-06-10	CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling	Yahan Li et.al.	2506.08584	link
2025-06-10	RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being	Rahatara Ferdousi et.al.	2506.08486	null
2025-06-10	Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving	Yuxuan Zhou et.al.	2506.08349	link
2025-06-09	Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework	Melissa Estevez et.al.	2506.08231	null
2025-06-09	Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System	Fan Yang et.al.	2506.07997	null
2025-06-11	MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models	Philip R. Liu et.al.	2506.07400	link
2025-06-08	Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance	Mohammadreza Chavoshi et.al.	2506.07273	null
2025-06-07	AI PsyRoom: Artificial Intelligence Platform for Segmented Yearning and Reactive Outcome Optimization Method	Yigui Feng et.al.	2506.06740	null
2025-06-07	C-PATH: Conversational Patient Assistance and Triage in Healthcare System	Qi Shi et.al.	2506.06737	null
2025-06-07	DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains	Zhihui Chen et.al.	2506.06705	null
2025-06-07	Interpretable Depression Detection from Social Media Text Using LLM-Derived Embeddings	Samuel Kim et.al.	2506.06616	null
2025-06-07	MedCite: Can Language Models Generate Verifiable Text for Medicine?	Xiao Wang et.al.	2506.06605	null
2025-06-14	RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints	Tan-Hanh Pham et.al.	2506.06600	null
2025-06-02	Large Language Models for EEG: A Comprehensive Survey and Taxonomy	Naseem Babu et.al.	2506.06353	null
2025-06-01	Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support	Wu Hao Ran et.al.	2506.06340	null
2025-06-06	Building Models of Neurological Language	Henry Watkins et.al.	2506.06208	null
2025-06-09	MIRIAD: Augmenting LLMs with millions of medical query-response pairs	Qinyue Zheng et.al.	2506.06091	null
2025-06-06	BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions	Saptarshi Sengupta et.al.	2506.05766	null
2025-06-06	Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning	Yangui Fang et.al.	2506.05671	null
2025-06-06	Can LLMs Express Personality Across Cultures? Introducing CulturalPersonas for Evaluating Trait Alignment	Priyanka Dey et.al.	2506.05670	null
2025-06-05	Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction	Zhihao Tang et.al.	2506.05428	null
2025-06-03	Beyond RAG: Reinforced Reasoning Augmented Generation for Clinical Notes	Lo Pang-Yun Ting et.al.	2506.05386	null
2025-06-05	Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation	Soumitra Ghosh et.al.	2506.05073	null
2025-06-05	From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs	Chantal Pellegrini et.al.	2506.04831	null
2025-06-05	A MISMATCHED Benchmark for Scientific Natural Language Inference	Firoz Shaik et.al.	2506.04603	null
2025-06-04	Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification	Payel Bhattacharjee et.al.	2506.04450	null
2025-06-04	MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale	Ran Xu et.al.	2506.04405	null
2025-06-04	AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents	Fengze Liu et.al.	2506.04293	null
2025-06-04	A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization	Sarvesh Soni et.al.	2506.04156	null
2025-06-13	LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation	Ming Zhang et.al.	2506.04078	link
2025-06-04	AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data	Sina Rashidian et.al.	2506.04032	null
2025-06-04	Trustworthy Medical Question Answering: An Evaluation-Centric Survey	Yinuo Wang et.al.	2506.03659	null
2025-06-04	VChatter: Exploring Generative Conversational Agents for Simulating Exposure Therapy to Reduce Social Anxiety	Han Zhang et.al.	2506.03520	null
2025-06-05	Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing	Shigeng Chen et.al.	2506.03490	link
2025-06-04	Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer’s Disease Detection	Chuyuan Li et.al.	2506.03476	null
2025-06-03	Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems	Michael E. Garcia-Alcoser et.al.	2506.03259	null
2025-06-03	Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis	Richard Armitage et.al.	2506.02987	null
2025-06-03	FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models	Yan Gao et.al.	2506.02961	null
2025-06-03	A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning	Xuejiao Zhao et.al.	2506.02470	link
2025-06-02	A Dynamic Framework for Semantic Grouping of Common Data Elements (CDE) Using Embeddings and Clustering	Madan Krishnamurthy et.al.	2506.02160	null
2025-06-04	The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning	Edward Y. Chang et.al.	2506.02139	null
2025-06-02	Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis	Chi-Jane Chen et.al.	2506.01918	null
2025-06-02	Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation	Jiaxi Sheng et.al.	2506.01841	null
2025-06-02	Reasoning-Based Approach with Chain-of-Thought for Alzheimer’s Detection Using Speech and Large Language Models	Chanwoo Park et.al.	2506.01683	null
2025-06-02	Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents	Manan Suri et.al.	2506.01344	null
2025-06-02	Evaluating Large Language Models in Crisis Detection: A Real-World Benchmark from Psychological Support Hotlines	Guifeng Deng et.al.	2506.01329	null
2025-06-02	DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models	Jiancheng Ye et.al.	2506.01257	null
2025-06-02	MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine	Shufeng Kong et.al.	2506.01252	null
2025-06-01	Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation	Pimchanok Sukjai et.al.	2506.01118	null
2025-06-03	Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation	Running Yang et.al.	2506.00612	null
2025-05-31	AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation	Ming Wang et.al.	2506.00551	link
2025-05-31	Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization	Suhas BN et.al.	2506.00448	null
2025-05-31	Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees’ Dialogue to Facilitate Nurse Communication Training	Keyeun Lee et.al.	2506.00386	null
2025-05-30	MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform	Hayoung Jung et.al.	2506.00308	null
2025-06-03	PersianMedQA: Language-Centric Evaluation of LLMs in the Persian Medical Domain	Mohammad Javad Ranjbar Kalahroodi et.al.	2506.00250	null
2025-05-30	Structuring Radiology Reports: Challenging LLMs with Lightweight Models	Johannes Moll et.al.	2506.00200	null
2025-05-30	Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models	Fardin Ahsan Sakib et.al.	2506.00134	null
2025-06-04	ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases	Yuchong Li et.al.	2506.00095	null
2025-05-30	Artificial Empathy: AI based Mental Health	Aditya Naik et.al.	2506.00081	null
2025-05-29	Evaluating Prompt Engineering Techniques for Accuracy and Confidence Elicitation in Medical LLMs	Nariman Naderi et.al.	2506.00072	null
2025-05-29	Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports	Sina Amirrajab et.al.	2506.00060	null
2025-05-30	Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs	Juraj Vladika et.al.	2505.24830	null
2025-05-30	A survey of using EHR as real-world evidence for discovering and validating new drug indications	Nabasmita Talukdar et.al.	2505.24767	null
2025-06-06	LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews	Christian Jaumann et.al.	2505.24757	link
2025-06-02	Automated Structured Radiology Report Generation	Jean-Benoit Delbrouck et.al.	2505.24223	null
2025-05-30	Semi-structured LLM Reasoners Can Be Rigorously Audited	Jixuan Leng et.al.	2505.24217	null
2025-05-30	Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning	Jiacheng Lin et.al.	2505.24105	null
2025-05-29	MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering	Yuexing Hao et.al.	2505.24040	null
2025-05-28	Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction	Mai Ali et.al.	2505.23822	null
2025-05-27	MedOrchestra: A Hybrid Cloud-Local LLM Approach for Clinical Data Interpretation	Sihyeon Lee et.al.	2505.23806	null
2025-06-02	MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks	Suhana Bedi et.al.	2505.23802	null
2025-06-03	Can Large Language Models Challenge CNNs in Medical Image Analysis?	Shibbir Ahmed et.al.	2505.23503	null
2025-05-29	Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons	Krithik Vishwanath et.al.	2505.23477	null
2025-05-29	Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble	Amit Kumthekar et.al.	2505.23075	null
2025-05-29	CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents	Zhen Xiang et.al.	2505.23055	link
2025-05-29	Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction	Guangyi Liu et.al.	2505.23034	null
2025-05-29	Exploring Scaling Laws for EHR Foundation Models	Sheng Zhang et.al.	2505.22964	null
2025-05-29	LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements	Jianwei Wang et.al.	2505.22959	link
2025-05-30	ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room	Nikita Mehandru et.al.	2505.22919	null
2025-05-28	Can Large Language Models Match the Conclusions of Systematic Reviews?	Christopher Polzak et.al.	2505.22787	link
2025-05-28	Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation	Yunsoo Kim et.al.	2505.22222	null
2025-05-28	Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection	Jinming Zhang et.al.	2505.22029	link
2025-05-28	Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning	Qihuang Zhong et.al.	2505.21958	null
2025-05-28	Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding	Hanyin Wang et.al.	2505.21908	null
2025-05-29	Query, Don’t Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries	Josefa Lia Stoisser et.al.	2505.21801	null
2025-05-27	BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum	Yubin Kim et.al.	2505.21757	null
2025-05-27	Counterfactual Simulatability of LLM Explanations for Generation Tasks	Marvin Limpijankit et.al.	2505.21740	null
2025-05-24	Vision Meets Language: A RAG-Augmented YOLOv8 Framework for Coffee Disease Diagnosis and Farmer Assistance	Semanto Mondal et.al.	2505.21544	link
2025-05-27	Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making	Yihan Wang et.al.	2505.21503	null
2025-05-27	Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery	Lina Zhao et.al.	2505.21418	null
2025-05-27	Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts	Yuxin Zhu et.al.	2505.21324	null
2025-05-27	Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings	Gunjan Balde et.al.	2505.21242	null
2025-05-27	Simulating Ethics: Using LLM Debate Panels to Model Deliberation on Medical Dilemmas	Hazem Zohny et.al.	2505.21112	null
2025-05-27	MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems	Kai Chen et.al.	2505.20824	link
2025-05-27	Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients	Hyungjun Park et.al.	2505.20609	null
2025-05-26	In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts	Filomene Roquefort et.al.	2505.20491	null
2025-05-24	Do LLMs have a Gender (Entropy) Bias?	Sonal Prabhune et.al.	2505.20343	null
2025-05-23	PMOA-TTS: Introducing the PubMed Open Access Textual Times Series Corpus	Shahriar Noroozizadeh et.al.	2505.20323	null
2025-05-23	Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP	Satya Narayana Cheetirala et.al.	2505.20320	null
2025-05-26	Fine-grained List-wise Alignment for Generative Medication Recommendation	Chenxiao Fan et.al.	2505.20218	link
2025-05-28	Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations	Mohit Chandra et.al.	2505.20201	null
2025-05-26	Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare	Natallia Kokash et.al.	2505.20020	null
2025-05-26	Does Rationale Quality Matter? Enhancing Mental Disorder Detection via Selective Reasoning Distillation	Hoyun Song et.al.	2505.20014	link
2025-05-26	An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning	Andrew Zamai et.al.	2505.19954	null
2025-05-30	FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks	Atsunori Moteki et.al.	2505.19662	null
2025-05-26	DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue	Yichun Feng et.al.	2505.19630	link
2025-05-26	AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare	Ying Xiao et.al.	2505.19562	link
2025-05-25	Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning	Shaohao Rui et.al.	2505.19213	null
2025-05-25	CardioCoT: Hierarchical Reasoning for Multimodal Survival Analysis	Shaohao Rui et.al.	2505.19195	null
2025-05-25	The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework	Feiran Liu et.al.	2505.19139	null
2025-05-25	Toward Human Centered Interactive Clinical Question Answering System	Dina Albassam et.al.	2505.18928	null
2025-05-24	TULUN: Transparent and Adaptable Low-resource Machine Translation	Raphaël Merx et.al.	2505.18683	null
2025-05-24	DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation	Zhihao Jia et.al.	2505.18630	null
2025-05-24	CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs	Yiqing Zhang et.al.	2505.18527	null
2025-05-24	From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data	Ugur Kursuncu et.al.	2505.18464	null
2025-05-24	MedScore: Factuality Evaluation of Free-Form Medical Answers	Heyuan Huang et.al.	2505.18452	link
2025-05-23	Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering	Jessica Tang et.al.	2505.18412	link
2025-05-23	RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification	Praphul Singh et.al.	2505.18380	null
2025-05-23	Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need?	Waleed Reda et.al.	2505.18350	null
2025-05-23	PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language	Naghmeh Jamali et.al.	2505.18331	null
2025-05-23	TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification	Jianghao Wu et.al.	2505.18283	link
2025-05-23	Will Large Language Models Transform Clinical Prediction?	Yusuf Yildiz et.al.	2505.18246	null
2025-05-22	Towards medical AI misalignment: a preliminary study	Barbara Puccio et.al.	2505.18212	null
2025-05-23	Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL	Che Liu et.al.	2505.17952	null
2025-05-23	PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions	Daeun Kyung et.al.	2505.17818	null
2025-05-23	EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications	Ancheng Xu et.al.	2505.17654	null
2025-05-23	WiNGPT-3.0 Technical Report	Boqin Zhuang et.al.	2505.17387	link
2025-05-23	AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing	Yinghui Huang et.al.	2505.17380	null
2025-05-22	CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports	Xiao Yu Cindy Zhang et.al.	2505.17265	null
2025-05-22	CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation	Ibrahim Ethem Hamamci et.al.	2505.17167	null
2025-05-22	Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands	Kristin Qi et.al.	2505.17137	null
2025-05-21	Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models	Zongru Shao et.al.	2505.17119	null
2025-05-21	Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation	Kristine Ann M. Carandang et.al.	2505.17095	null
2025-05-18	Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases	Valentina Carbonari et.al.	2505.17065	null
2025-05-15	Assessing the Quality of AI-Generated Clinical Notes: A Validated Evaluation of a Large Language Model Scribe	Erin Palm et.al.	2505.17047	null
2025-05-22	MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning	Suhao Yu et.al.	2505.16964	null
2025-05-22	A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP	Issey Sukeda et.al.	2505.16661	link
2025-05-22	Collaboration among Multiple Large Language Models for Medical Question Answering	Kexin Shang et.al.	2505.16648	null
2025-05-22	No Black Boxes: Interpretable and Interactable Predictive Healthcare with Knowledge-Enhanced Agentic Causal Discovery	Xiaoxue Han et.al.	2505.16288	null
2025-05-22	Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools	Panagiotis Lymperopoulos et.al.	2505.16113	null
2025-05-23	Continually Self-Improving Language Models for Bariatric Surgery Question–Answering	Yash Kumar Atri et.al.	2505.16102	null
2025-05-22	TrialPanorama: Database and Benchmark for Systematic Review and Design of Clinical Trials	Zifeng Wang et.al.	2505.16097	null
2025-05-22	Multi-modal Integration Analysis of Alzheimer’s Disease Using Large Language Models and Knowledge Graphs	Kanan Kiguchi et.al.	2505.15747	null
2025-05-21	Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling	He Hu et.al.	2505.15715	null
2025-05-21	Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs	Lang Gao et.al.	2505.15524	null
2025-05-22	MentalMAC: Enhancing Large Language Models for Detecting Mental Manipulation via Multi-Task Anti-Curriculum Distillation	Yuansheng Gao et.al.	2505.15255	null
2025-05-21	AI Solutionism and Digital Self-Tracking with Wearables	Hannah R. Nolasco et.al.	2505.15162	null
2025-05-21	A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents	Ian Steenstra et.al.	2505.15108	null
2025-05-23	Diagnosing our datasets: How does my language model learn clinical information?	Furong Jia et.al.	2505.15024	null
2025-05-20	MedBrowseComp: Benchmarking Medical Deep Research and Computer Use	Shan Chen et.al.	2505.14963	null
2025-05-20	RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection	Wenjun Hou et.al.	2505.14318	link
2025-05-20	s3: You Don’t Need That Much Data to Train a Search Agent via RL	Pengcheng Jiang et.al.	2505.14146	link
2025-05-20	ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data	Xinzhe Zheng et.al.	2505.14038	null
2025-05-20	Fragments to Facts: Partial-Information Fragment Inference from LLMs	Lucas Rosenblatt et.al.	2505.13819	link
2025-05-19	VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation	Yubin Kim et.al.	2505.13577	null
2025-05-14	Source framing triggers systematic evaluation bias in Large Language Models	Federico Germani et.al.	2505.13488	null
2025-05-11	Evaluating Reasoning LLMs for Suicide Screening with the Columbia-Suicide Severity Rating Scale	Avinash Patil et.al.	2505.13480	link
2025-05-19	Learnware of Language Models: Specialized Small Language Models Can Do Big	Zhi-Hao Tan et.al.	2505.13425	link
2025-05-19	Dementia Through Different Eyes: Explainable Modeling of Human and LLM Perceptions for Early Awareness	Lotem Peled-Cohen et.al.	2505.13418	null
2025-05-19	Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice	Zhi Liu et.al.	2505.13156	null
2025-05-19	Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning	Xiaoyu Yang et.al.	2505.13081	null
2025-05-19	GAP: Graph-Assisted Prompts for Dialogue-based Medication Recommendation	Jialun Zhong et.al.	2505.12888	null
2025-05-19	EpiLLM: Unlocking the Potential of Large Language Models in Epidemic Forecasting	Chenghua Gong et.al.	2505.12738	null
2025-05-18	ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents	Navid Madani et.al.	2505.12531	null
2025-05-18	MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks	Yinghao Zhu et.al.	2505.12371	link
2025-05-18	PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs	Sriram Selvam et.al.	2505.12238	link
2025-05-17	AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation	Xiechi Zhang et.al.	2505.11887	null
2025-05-21	LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models	Ryan Chen et.al.	2505.11772	null
2025-05-20	MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports	Kevin Wu et.al.	2505.11733	link
2025-05-16	MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models	Xiaomin Li et.al.	2505.11613	null
2025-05-16	Heart2Mind: Human-Centered Contestable Psychiatric Disorder Diagnosis System using Wearable ECG Monitors	Hung Nguyen et.al.	2505.11612	link
2025-05-16	Disentangling Reasoning and Knowledge in Medical Large Language Models	Rahul Thapa et.al.	2505.11462	null
2025-05-16	CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs	Sijia Chen et.al.	2505.11413	null
2025-05-15	Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI	Agnik Saha et.al.	2505.10472	null
2025-05-20	AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges	Ranjan Sapkota et.al.	2505.10468	null
2025-05-15	Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation	Yue Guo et.al.	2505.10409	null
2025-05-15	From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making	Dubai Li et.al.	2505.10282	link
2025-05-15	The Evolving Landscape of Generative Large Language Models and Traditional Natural Language Processing in Medicine	Rui Yang et.al.	2505.10261	null
2025-05-15	What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs	Xinlan Yan et.al.	2505.10113	null
2025-05-14	Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models	Aditya Nagori et.al.	2505.09805	null
2025-05-14	A Multimodal Multi-Agent Framework for Radiology Report Generation	Ziruo Yi et.al.	2505.09787	null
2025-05-16	Tales of the 2025 Los Angeles Fire: Hotwash for Public Health Concerns in Reddit via LLM-Enhanced Topic Modeling	Sulong Zhou et.al.	2505.09665	null
2025-05-13	Performance Gains of LLMs With Humans in a World of LLMs Versus Humans	Lucas McCullum et.al.	2505.08902	null
2025-05-13	NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context	Ben Yao et.al.	2505.08734	null
2025-05-13	LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs	K M Sajjadul Islam et.al.	2505.08704	null
2025-05-13	TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching	Majd Abdallah et.al.	2505.08508	null
2025-05-13	Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions	Lata Pangtey et.al.	2505.08464	null
2025-05-13	Decoding Neighborhood Environments with Large Language Models	Andrew Cart et.al.	2505.08163	null
2025-05-13	Communication Styles and Reader Preferences of LLM and Human Experts in Explaining Health Information	Jiawei Zhou et.al.	2505.08143	null
2025-05-12	Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models	Weiyi Wu et.al.	2505.07968	null
2025-05-11	TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking	Ching Nam Hang et.al.	2505.07891	null
2025-05-07	A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas	Pranav Narayanan Venkit et.al.	2505.07850	null
2025-05-12	Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030	Mouxiao Bian et.al.	2505.07205	null
2025-05-12	KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification	Hajar Sakai et.al.	2505.07162	null
2025-05-11	Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI	Chao Ding et.al.	2505.06912	null
2025-05-10	Utilizing LLMs to Investigate the Disputed Role of Evidence in Electronic Cigarette Health Policy Formation in Australia and the UK	Damian Curran et.al.	2505.06782	null
2025-05-10	NeuroPal: A Clinically-Informed Multimodal LLM Assistant for Mental Health Combining Sleep Chronotherapy, Cognitive Behavioral Reframing, and Adaptive Phytochemical Intervention	Xiaoran Han et.al.	2505.06640	null
2025-05-10	Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning	H M Dipu Kabir et.al.	2505.06592	link
2025-05-07	Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs	Hung Manh Pham et.al.	2505.06296	null
2025-05-15	Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	Joshua Harris et.al.	2505.06046	null
2025-05-09	A Day in Their Shoes: Using LLM-Based Perspective-Taking Interactive Fiction to Reduce Stigma Toward Dirty Work	Xiangzhe Yuan et.al.	2505.05786	null
2025-05-09	Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications	Da Wu et.al.	2505.05736	link
2025-05-08	Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models	Wei Peng et.al.	2505.05189	link
2025-05-08	Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization	Ajwad Abrar et.al.	2505.05070	null
2025-05-07	Retrieval Augmented Generation Evaluation for Health Documents	Mario Ceresa et.al.	2505.04680	null
2025-05-06	Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction	Paul Landes et.al.	2505.04655	null
2025-05-06	Advancing Conversational Diagnostic AI with Multimodal Reasoning	Khaled Saab et.al.	2505.04653	null
2025-05-06	FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights	Chengzhang Yu et.al.	2505.04649	null
2025-05-05	ChatGPT for automated grading of short answer questions in mechanical ventilation	Tejas Jade et.al.	2505.04645	null
2025-05-07	The Aloe Family Recipe for Open and Specialized Healthcare LLMs	Dario Garcia-Gasulla et.al.	2505.04388	null
2025-05-07	Can Language Models Understand Social Behavior in Clinical Conversations?	Manas Satish Bedmutha et.al.	2505.04152	null
2025-05-07	Natural Language Generation in Healthcare: A Review of Methods and Applications	Mengxian Lyu et.al.	2505.04073	null
2025-04-30	Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding	Trilok Padhi et.al.	2505.03788	null
2025-04-30	mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging	Eleftherios Tzanis et.al.	2505.03785	link
2025-04-30	ALFRED: Ask a Large-language model For Reliable ECG Diagnosis	Jin Yu et.al.	2505.03781	null
2025-05-06	Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis	Shuang Zhou et.al.	2505.03467	null
2025-05-06	MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks	Mouath Abu Daoud et.al.	2505.03427	link
2025-05-06	Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation	Mohammad Shoaib Ansari et.al.	2505.03406	null
2025-05-06	Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback	Shijing Zhu et.al.	2505.03293	null
2025-05-02	Enhancing ML Model Interpretability: Leveraging Fine-Tuned Large Language Models for Better Understanding of AI	Jonas Bokstaller et.al.	2505.02859	null
2025-05-05	Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry	Junu Kim et.al.	2505.02722	link
2025-05-05	Structure Causal Models and LLMs Integration in Medical Visual Question Answering	Zibo Xu et.al.	2505.02703	null
2025-05-05	AI Standardized Patient Improves Human Conversations in Advanced Cancer Care	Kurtis Haut et.al.	2505.02694	link
2025-05-08	A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law	Qianjun Pan et.al.	2505.02665	null
2025-05-08	Bielik v3 Small: Technical Report	Krzysztof Ociepa et.al.	2505.02550	null
2025-05-05	Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors	Ryan Louie et.al.	2505.02428	null
2025-05-04	Generative AI in clinical practice: novel qualitative evidence of risk and responsible use of Google’s NotebookLM	Max Reuter et.al.	2505.01955	null
2025-05-03	Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings	Alexander Davis et.al.	2505.01711	null
2025-05-03	High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers	Brian Wong et.al.	2505.01693	null
2025-05-02	Emotions in the Loop: A Survey of Affective Computing for Emotional Support	Karishma Hegde et.al.	2505.01542	null
2025-05-12	Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications	Jiawei He et.al.	2505.01146	null
2025-05-10	SSRLBot: Designing and Developing a Large Language Model-based Agent using Socially Shared Regulated Learning	Xiaoshan Huang et.al.	2505.00945	null
2025-05-05	Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs	Dung Nguyen et.al.	2505.00744	null
2025-05-01	Red Teaming Large Language Models for Healthcare	Vahid Balazadeh et.al.	2505.00467	null
2025-05-01	KoACD: The First Korean Adolescent Dataset for Cognitive Distortion Analysis	JunSeo Kim et.al.	2505.00367	null
2025-05-01	AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care	Md Asaduzzaman Jabin et.al.	2505.00275	link
2025-04-28	MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis	Yuyang Sha et.al.	2505.00032	null
2025-04-21	Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models	Tri Nguyen et.al.	2505.00010	null
2025-04-30	TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments	Sichang Tu et.al.	2504.21851	null
2025-04-30	TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training	Shengqian Wang et.al.	2504.21735	null
2025-04-30	XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs	Marco Arazzi et.al.	2504.21700	null
2025-04-30	UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation	Linshan Wu et.al.	2504.21336	link
2025-04-30	Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA	Xuanzhao Dong et.al.	2504.21252	link
2025-04-29	A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces	Juliana Barbosa et.al.	2504.21211	null
2025-04-29	Multimodal Large Language Models for Medicine: A Comprehensive Survey	Jiarui Ye et.al.	2504.21051	null
2025-04-23	Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh	MD Thamed Bin Zaman Chowdhury et.al.	2504.21025	null
2025-04-29	Jekyll-and-Hyde Tipping Point in an AI’s Behavior	Neil F. Johnson et.al.	2504.20980	null
2025-04-29	ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification	Ziqing Fan et.al.	2504.20930	link
2025-04-29	Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records	Jesus Lovon et.al.	2504.20547	null
2025-04-30	Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging	Neil K. R. Sehgal et.al.	2504.20519	null
2025-04-29	“I’ve talked to ChatGPT about my issues last night.”: Examining Mental Health Conversations with Large Language Models through Reddit Analysis	Kyuha Jung et.al.	2504.20320	null
2025-04-28	OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis	Jinglin He et.al.	2504.20118	null
2025-04-28	Transforming Evidence Synthesis: A Systematic Review of the Evolution of Automated Meta-Analysis in the Age of AI	Lingbo Li et.al.	2504.20113	null
2025-04-15	Recommending Clinical Trials for Online Patient Cases using Artificial Intelligence	Joey Chan et.al.	2504.20059	null
2025-04-28	Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI	Hugo Georgenthum et.al.	2504.19918	null
2025-04-28	A Tripartite Perspective on GraphRAG	Michael Banf et.al.	2504.19667	null
2025-04-28	m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training	Meng Xiao et.al.	2504.19565	null
2025-05-01	BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text	Jiageng Wu et.al.	2504.19467	link
2025-04-27	HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer’s Disease	Qiuhui Chen et.al.	2504.19075	null
2025-04-27	Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models	Anindya Bijoy Das et.al.	2504.19061	null
2025-04-26	AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression	Dong Whi Yoo et.al.	2504.18932	null
2025-04-26	Clinical knowledge in LLMs does not translate to human interactions	Andrew M. Bean et.al.	2504.18919	link
2025-04-25	Proof-of-TBI – Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction	Ross Gore et.al.	2504.18671	null
2025-04-22	Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes	Guanchen Wu et.al.	2504.18569	null
2025-04-25	Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers	Jared Moore et.al.	2504.18412	link
2025-04-25	MAGI: Multi-Agent Guided Interview for Psychiatric Assessment	Guanqun Bi et.al.	2504.18260	null
2025-04-25	Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization	Wataru Kawakami et.al.	2504.18080	null
2025-05-05	Optimism, Expectation, or Sarcasm? Multi-Class Hope Speech Detection in Spanish and English	Sabur Butt et.al.	2504.17974	null
2025-04-24	LLM Agent Swarm for Hypothesis-Driven Drug Discovery	Kevin Song et.al.	2504.17967	null
2025-04-24	Replay to Remember: Retaining Domain Knowledge in Streaming Language Models	Sneh Pillai et.al.	2504.17780	null
2025-04-24	Towards a HIPAA Compliant Agentic AI System in Healthcare	Subash Neupane et.al.	2504.17669	null
2025-04-24	PatientDx: Merging Large Language Models for Protecting Data-Privacy in Healthcare	Jose G. Moreno et.al.	2504.17360	null
2025-04-24	Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues	Jinfeng Zhou et.al.	2504.17238	null
2025-04-25	The Rise of Small Language Models in Healthcare: A Comprehensive Survey	Muskan Garg et.al.	2504.17119	null
2025-04-23	Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study	Andy Li et.al.	2504.16601	null
2025-04-23	Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition	Zhenguang Zhong et.al.	2504.16504	null
2025-04-23	ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs	Fahmida Liza Piya et.al.	2504.16394	link
2025-04-22	Investigating LLMs in Clinical Triage: Promising Capabilities, Persistent Intersectional Biases	Joseph Lee et.al.	2504.16273	null
2025-04-21	Measuring Interest Group Positions on Legislation: An AI-Driven Analysis of Lobbying Reports	Jiseon Kim et.al.	2504.15333	link
2025-04-21	Med-CoDE: Medical Critique based Disagreement Evaluation Framework	Mohit Gupta et.al.	2504.15330	null
2025-04-21	POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications	Chunjing Gan et.al.	2504.14917	null
2025-04-25	A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs	Yihan Lin et.al.	2504.14657	null
2025-04-20	HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models	Fan Gao et.al.	2504.14594	null
2025-04-19	Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations	Katie Matton et.al.	2504.14150	link
2025-04-18	A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task	Laerdon Kim et.al.	2504.14066	null
2025-04-17	Deep literature reviews: an application of fine-tuned language models to migration research	Stefano M. Iacus et.al.	2504.13685	null
2025-04-18	LLM Sensitivity Evaluation Framework for Clinical Diagnosis	Chenwei Yan et.al.	2504.13475	null
2025-04-17	ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images	Sangwook Kim et.al.	2504.13023	null
2025-04-17	Chinese-Vicuna: A Chinese Instruction-following Llama-based Model	Chenghao Fan et.al.	2504.12737	null
2025-04-16	Leveraging Large Language Models for Multi-Class and Multi-Label Detection of Drug Use and Overdose Symptoms on Social Media	Muhammad Ahmad et.al.	2504.12355	null
2025-04-15	A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports	Jing Wang et.al.	2504.12350	null
2025-04-14	Paging Dr. GPT: Extracting Information from Clinical Notes to Enhance Patient Predictions	David Anderson et.al.	2504.12338	null
2025-04-14	“It Listens Better Than My Therapist”: Exploring Social Media Discourse on LLMs as Mental Health Tool	Anna-Carolina Haensch et.al.	2504.12337	null
2025-04-13	QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model	Zongxian Yang et.al.	2504.12334	null
2025-04-12	Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis	Shahriar Noroozizadeh et.al.	2504.12326	null
2025-04-18	Selective Attention Federated Learning: Improving Privacy and Efficiency for Clinical Text Classification	Yue Li et.al.	2504.11793	null
2025-04-16	Large Language Models for Drug Overdose Prediction from Longitudinal Medical Records	Md Sultan Al Nahian et.al.	2504.11792	null
2025-04-16	Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets	Yongpei Ma et.al.	2504.11777	null
2025-04-15	Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions	Wang Bill Zhu et.al.	2504.11373	link
2025-04-15	Learning to Be A Doctor: Searching for Effective Medical Agent Architectures	Yangyang Zhuang et.al.	2504.11301	null
2025-04-26	Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs	Yingjian Chen et.al.	2504.10982	null
2025-04-15	Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content	Yilang Peng et.al.	2504.10878	null
2025-04-13	Federated Learning with Layer Skipping: Efficient Training of Large Language Models for Healthcare NLP	Lihong Zhang et.al.	2504.10536	null
2025-04-08	Exposure to Content Written by Large Language Models Can Reduce Stigma Around Opioid Use Disorder in Online Communities	Shravika Mittal et.al.	2504.10501	null
2025-04-14	CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation	Jing Chen et.al.	2504.10418	null
2025-04-14	Performance of Large Language Models in Supporting Medical Diagnosis and Treatment	Diogo Sousa et.al.	2504.10405	null
2025-04-20	Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families	Shahriar Noroozizadeh et.al.	2504.10340	null
2025-04-20	Emotional Strain and Frustration in LLM Interactions in Software Engineering	Cristina Martinez Montes et.al.	2504.10050	null
2025-04-19	EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety	Jiahao Qiu et.al.	2504.09689	link
2025-04-15	ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model	Wuyang Lan et.al.	2504.09421	link
2025-04-12	Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries	Koustuv Saha et.al.	2504.09271	null
2025-04-04	The Lyme Disease Controversy: An AI-Driven Discourse Analysis of a Quarter Century of Academic Debate and Divides	Teo Susnjak et.al.	2504.08777	link
2025-04-01	Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation	Xiaofan Zhou et.al.	2504.08768	null
2025-04-11	MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models	Junmo Kim et.al.	2504.08329	link
2025-04-24	Can Reasoning LLMs Enhance Clinical Document Classification?	Akram Mustafa et.al.	2504.08040	null
2025-04-14	Psychological Health Knowledge-Enhanced LLM-based Social Network Crisis Intervention Text Transfer Recognition Method	Shurui Wu et.al.	2504.07983	null
2025-04-11	An LLM-Driven Multi-Agent Debate System for Mendelian Diseases	Xinyang Zhou et.al.	2504.07881	null
2025-04-10	MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval-Augmented Generation	Yixiang Chen et.al.	2504.07724	link
2025-04-17	PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization	Yang Jiao et.al.	2504.07717	null
2025-04-10	Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction	Kyoyun Choi et.al.	2504.07415	null
2025-04-09	Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging	Siyuan Dai et.al.	2504.07336	null
2025-04-09	A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation	Fatemeh Amrollahi et.al.	2504.07278	null
2025-04-09	Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis	Umakanta Maharana et.al.	2504.06581	link
2025-04-08	Human Trust in AI Search: A Large-Scale Experiment	Haiwen Li et.al.	2504.06435	null
2025-04-08	A Geometric-Aware Perspective and Beyond: Hybrid Quantum-Classical Machine Learning Methods	Azadeh Alavia et.al.	2504.06328	null
2025-04-08	LExT: Towards Evaluating Trustworthiness of Natural Language Explanations	Krithi Shailya et.al.	2504.06227	null
2025-04-08	TxGemma: Efficient and Agentic LLMs for Therapeutics	Eric Wang et.al.	2504.06196	null
2025-04-11	Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups	Rijul Magu et.al.	2504.06160	null
2025-04-08	How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM	Jirong Zha et.al.	2504.05786	null
2025-04-07	The challenge of uncertainty quantification of large language models in medicine	Zahra Atf et.al.	2504.05278	null
2025-04-07	On the Performance of an Explainable Language Model on PubMedQA	Venkat Srinivasan et.al.	2504.05074	null
2025-04-07	Leveraging Large Language Models for Cost-Effective, Multilingual Depression Detection and Severity Assessment	Longdi Xian et.al.	2504.04891	null
2025-04-07	Simulating Persuasive Dialogues on Meat Reduction with Generative Agents	Georg Ahnert et.al.	2504.04872	link
2025-04-08	Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide	Zhijie Duan et.al.	2504.04346	null
2025-04-06	MedM-VL: What Makes a Good Medical LVLM?	Yiming Shi et.al.	2504.04323	link
2025-04-05	AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs	Xinyu Mao et.al.	2504.04193	link
2025-04-05	A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models	Aviv Brokman et.al.	2504.04083	null
2025-04-15	Do “New Snow Tablets” Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs	Sifan Li et.al.	2504.03786	link
2025-04-02	Emerging Cyber Attack Risks of Medical AI Agents	Jianing Qiu et.al.	2504.03759	null
2025-04-03	AD-GPT: Large Language Models in Alzheimer’s Disease	Ziyu Liu et.al.	2504.03071	null
2025-04-03	Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models	Chengyang He et.al.	2504.03051	null
2025-04-03	Bias in Large Language Models Across Clinical Applications: A Systematic Review	Thanathip Suenghataiphorn et.al.	2504.02917	null
2025-04-16	OnRL-RAG: Real-Time Personalized Mental Health Dialogue System	Ahsan Bilal et.al.	2504.02894	null
2025-04-01	TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet	Luis Felipe et.al.	2504.02874	null
2025-04-01	Synthesized Annotation Guidelines are Knowledge-Lite Boosters for Clinical Information Extraction	Enshuo Hsu et.al.	2504.02871	null
2025-04-04	A Survey of Large Language Models in Mental Health Disorder Detection on Social Media	Zhuohan Ge et.al.	2504.02800	null
2025-04-03	AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology	Xiang Feng et.al.	2504.02404	link
2025-04-02	Trapped by Expectations: Functional Fixedness in LLM-Enabled Chat Search	Jiqun Liu et.al.	2504.02074	null
2025-04-02	Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment	Abdelrahaman A. Hassan et.al.	2504.01767	null
2025-04-01	Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models	Feng Chen et.al.	2504.01216	null
2025-04-01	Medical large language models are easily distracted	Krithik Vishwanath et.al.	2504.01201	link
2025-04-04	MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs	Juncheng Wu et.al.	2504.00993	link
2025-04-01	InformGen: An AI Copilot for Accurate and Compliant Clinical Research Consent Document Generation	Zifeng Wang et.al.	2504.00934	null
2025-04-01	m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models	Xiaoke Huang et.al.	2504.00869	null
2025-04-01	IHC-LLMiner: Automated extraction of tumour immunohistochemical profiles from PubMed abstracts using large language models	Yunsoo Kim et.al.	2504.00748	null
2025-03-31	Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology	Dou Liu et.al.	2504.00061	null
2025-03-31	Integrating Large Language Models with Human Expertise for Disease Detection in Electronic Health Records	Jie Pan et.al.	2504.00053	null
2025-03-27	Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1	Birger Moell et.al.	2504.00016	null
2025-03-31	A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG	Arshia Kermani et.al.	2503.24307	null
2025-03-31	IntelliCircos: A Data-driven and AI-powered Authoring Tool for Circos Plots	Mingyang Gu et.al.	2503.24021	null
2025-03-31	Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection	Mahdi Amiri et.al.	2503.23873	null
2025-03-30	When LLM Therapists Become Salespeople: Evaluating Large Language Models for Ethical Motivational Interviewing	Haein Kong et.al.	2503.23566	null
2025-04-01	A Scalable Framework for Evaluating Health Language Models	Neil Mallinar et.al.	2503.23339	null
2025-03-29	Prediction of 30-day hospital readmission with clinical notes and EHR information	Tiago Almeida et.al.	2503.23050	null
2025-04-03	Agentic Large Language Models, a survey	Aske Plaat et.al.	2503.23037	null
2025-03-29	A Retrieval-Augmented Knowledge Mining Method with Deep Thinking LLMs for Biomedical Research and Clinical Support	Yichun Feng et.al.	2503.23029	null
2025-03-29	Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective	Xinyu Yao et.al.	2503.22954	null
2025-03-28	MediTools – Medical Education Powered by LLMs	Amr Alshatnawi et.al.	2503.22769	link
2025-03-26	Susceptibility of Large Language Models to User-Driven Factors in Medical Queries	Kyung Ho Lim et.al.	2503.22746	null
2025-03-25	LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation	Sarah Martinson et.al.	2503.22719	link
2025-03-28	Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions	Mohammad Almansoori et.al.	2503.22678	null
2025-04-08	Modeling Challenging Patient Interactions: LLMs for Medical Communication Training	Anna Bodonhelyi et.al.	2503.22250	null
2025-03-31	PharmAgents: Building a Virtual Pharma with Large Language Model Agents	Bowen Gao et.al.	2503.22164	null
2025-03-28	Leveraging LLMs for Predicting Unknown Diagnoses from Clinical Notes	Dina Albassam et.al.	2503.22092	null
2025-03-27	Socially Constructed Treatment Plans: Analyzing Online Peer Interactions to Understand How Patients Navigate Complex Medical Conditions	Madhusudan Basak et.al.	2503.21986	null
2025-03-27	RedditESS: A Mental Health Social Support Interaction Dataset – Understanding Effective Social Support to Refine AI-Driven Support Tools	Zeyad Alghamdi et.al.	2503.21888	null
2025-03-27	Combining Artificial Users and Psychotherapist Assessment to Evaluate Large Language Model-based Mental Health Chatbots	Florian Onur Kuhlmeier et.al.	2503.21540	null
2025-03-27	Fine-Tuning LLMs on Small Medical Datasets: Text Classification and Normalization Effectiveness on Cardiology reports and Discharge records	Noah Losch et.al.	2503.21349	null
2025-03-26	Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters	Mahmoud Alwakeel et.al.	2503.21004	null
2025-03-26	Clean & Clear: Feasibility of Safe LLM Clinical Guidance	Julia Ive et.al.	2503.20953	null
2025-03-26	TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews	Huimin Xu et.al.	2503.20666	null
2025-03-26	TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes	Raj Sanjay Shah et.al.	2503.20648	null
2025-03-26	Low-resource Information Extraction with the European Clinical Case Corpus	Soumitra Ghosh et.al.	2503.20568	null
2025-03-26	Explainable ICD Coding via Entity Linking	Leonor Barreiros et.al.	2503.20508	null
2025-03-26	Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering	Zehui Liao et.al.	2503.20504	null
2025-03-25	Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder	Changye Li et.al.	2503.20103	link
2025-03-25	Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications	Ben Rahman et.al.	2503.19276	null
2025-03-25	PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping	Sarah Pungitore et.al.	2503.19265	null
2025-03-24	Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages	Tadesse Destaw Belay et.al.	2503.18253	null
2025-03-26	PG-SAM: Prior-Guided SAM with Medical for Multi-organ Segmentation	Yiheng Zhong et.al.	2503.18227	link
2025-03-23	AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs	Diwei Wang et.al.	2503.18141	null
2025-03-23	Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook	Xu Zheng et.al.	2503.18016	null
2025-03-23	Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA	Justice Ou et.al.	2503.17933	link
2025-03-23	MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation	Hsin-Ling Hsu et.al.	2503.17900	null
2025-03-22	Satisfactory Medical Consultation based on Terminology-Enhanced Information Retrieval and Emotional In-Context Learning	Kaiwen Zuo et.al.	2503.17876	null
2025-03-22	MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation	Xiaodan Zhang et.al.	2503.17784	link
2025-03-22	GPBench: A Comprehensive and Fine-Grained Benchmark for Evaluating Large Language Models as General Practitioners	Zheqing Li et.al.	2503.17599	null
2025-03-21	Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent	Humza Nusrat et.al.	2503.17553	null
2025-03-21	An LLM-Powered Clinical Calculator Chatbot Backed by Verifiable Clinical Calculators and their Metadata	Niranjan Kumar et.al.	2503.17550	null
2025-03-21	Reimagining Support: Exploring Autistic Individuals’ Visions for AI in Coping with Negative Self-Talk	Buse Carik et.al.	2503.17504	null
2025-03-21	Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP	Veysel Kocaman et.al.	2503.17425	null
2025-03-21	Understanding Social Support Needs in Questions: A Hybrid Approach Integrating Semi-Supervised Learning and LLM-based Data Augmentation	Junwei Kuang et.al.	2503.17421	null
2025-03-21	Automating Adjudication of Cardiovascular Events Using Large Language Models	Sonish Sivarajkumar et.al.	2503.17222	null
2025-03-20	Automated Harmfulness Testing for Code Large Language Models	Honghao Tan et.al.	2503.16740	null
2025-03-18	From Patient Consultations to Graphs: Leveraging LLMs for Patient Journey Knowledge Graph Construction	Hassan S. Al Khatib et.al.	2503.16533	null
2025-03-18	Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine	Chengfeng Dou et.al.	2503.16530	null
2025-03-20	OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Long Yuan et.al.	2503.16326	null
2025-03-21	Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1	Peiran Gu et.al.	2503.16304	null
2025-03-21	MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering	Feiyang Li et.al.	2503.16131	null
2025-03-20	BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models	Zenghui Yuan et.al.	2503.16023	null
2025-03-20	Towards Automatic Continual Learning: A Self-Adaptive Framework for Continual Instruction Tuning	Peiyi Lin et.al.	2503.15924	null
2025-03-20	DeepPsy-Agent: A Stage-Aware and Deep-Thinking Emotional Support Agent System	Kai Chen et.al.	2503.15876	null
2025-03-19	Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation	Hisashi Johno et.al.	2503.15664	null
2025-03-27	Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems	Yuelyu Ji et.al.	2503.15454	null
2025-03-19	Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data	Anatole Callies et.al.	2503.15374	link
2025-03-19	Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks	Yuting Guo et.al.	2503.15169	null
2025-03-28	Envisioning an AI-Enhanced Mental Health Ecosystem	Kellie Yu Hui Sim et.al.	2503.14883	null
2025-03-18	Generating Medically-Informed Explanations for Depression Detection using LLMs	Xiangyong Chen et.al.	2503.14671	null
2025-03-18	MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation	Kai Chen et.al.	2503.13856	null
2025-03-14	RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration	Hong Qing Yu et.al.	2503.13514	null
2025-03-13	It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education	Shrutika Singh et.al.	2503.13508	null
2025-03-17	Reliable and Efficient Amortized Model-based Evaluation	Sang Truong et.al.	2503.13335	null
2025-03-24	LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation	Xiaodi Li et.al.	2503.13281	null
2025-03-17	MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways	Zhen Chen et.al.	2503.13205	null
2025-03-16	From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations	Sarvesh Baskar et.al.	2503.12556	null
2025-03-15	Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes	Da Wu et.al.	2503.12286	null
2025-03-15	TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation	Mayank Kumar et.al.	2503.12217	null
2025-03-20	Applications of Large Language Model Reasoning in Feature Generation	Dharani Chandra et.al.	2503.11989	null
2025-03-14	Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages	Jiyeong Kim et.al.	2503.11384	null
2025-03-14	TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools	Shanghua Gao et.al.	2503.10970	link
2025-03-12	CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries	Zhiyuan Wang et.al.	2503.10707	null
2025-03-12	Medical Large Language Model Benchmarks Should Prioritize Construct Validity	Ahmed Alaa et.al.	2503.10694	null
2025-03-13	Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models	Afrar Jahin et.al.	2503.10573	null
2025-03-13	LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions	Gaurav Kumar Gupta et.al.	2503.10486	null
2025-03-13	Cognitive-Mental-LLM: Leveraging Reasoning in Large Language Models for Mental Health Prediction via Online Text	Avinash Patil et.al.	2503.10095	link
2025-03-12	Review GIDE – Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models	Timothy Laurence et.al.	2503.09743	null
2025-03-12	LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics	Jialiang Tang et.al.	2503.09656	null
2025-03-16	Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy	Abe Bohan Hou et.al.	2503.09639	null
2025-03-12	RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports	Jiushen Cai et.al.	2503.09358	null
2025-03-12	A Survey on Enhancing Causal Reasoning Ability of Large Language Models	Xin Li et.al.	2503.09326	null
2025-03-12	VaxGuard: A Multi-Generator, Multi-Type, and Multi-Role Dataset for Detecting LLM-Generated Vaccine Misinformation	Syed Talal Ahmad et.al.	2503.09103	null
2025-03-12	Teaching LLMs How to Learn with Contextual Fine-Tuning	Younwoo Choi et.al.	2503.09032	null
2025-03-11	Towards Scalable and Cross-Lingual Specialist Language Models for Oncology	Morteza Rohanian et.al.	2503.08323	null
2025-03-10	Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan	Matthias Schöffel et.al.	2503.07827	null
2025-03-20	MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning	Xiangru Tang et.al.	2503.07459	link
2025-03-10	Anatomy-Aware Conditional Image-Text Retrieval	Meng Zheng et.al.	2503.07456	null
2025-03-10	Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment	Xing Xie et.al.	2503.07334	link
2025-03-10	Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies	Luyi Jiang et.al.	2503.07306	null
2025-03-10	A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images	Xiaoyi Liang et.al.	2503.07094	null
2025-03-10	TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine	Tianai Huang et.al.	2503.07041	null
2025-03-10	Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation	Zhi Qin et.al.	2503.07032	null
2025-03-09	Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia	Sabeen Ahmed et.al.	2503.06797	null
2025-03-09	Why Pre-trained Models Fail: Feature Entanglement in Multi-modal Depression Detection	Xiangyu Zhang et.al.	2503.06620	null
2025-03-09	ExKG-LLM: Leveraging Large Language Models for Automated Expansion of Cognitive Neuroscience Knowledge Graphs	Ali Sarabadani et.al.	2503.06479	null
2025-03-09	AXAI-CDSS: An Affective Explainable AI-Driven Clinical Decision Support System for Cannabis Use	Tongze Zhang et.al.	2503.06463	null
2025-03-08	CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset	Oriel Perets et.al.	2503.06204	link
2025-03-08	Towards Conversational AI for Disease Management	Anil Palepu et.al.	2503.06074	null
2025-03-01	MedSimAI: Simulation and Formative Feedback Generation to Enhance Deliberate Practice in Medical Education	Yann Hicke et.al.	2503.05793	null
2025-03-07	Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering	Yusong Ke et.al.	2503.05505	null
2025-03-07	GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation	Zhenxuan Zhang et.al.	2503.05347	link
2025-03-06	HILGEN: Hierarchically-Informed Data Generation for Biomedical NER Using Knowledgebases and Large Language Models	Yao Ge et.al.	2503.04930	null
2025-03-10	Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases	Pengcheng Qiu et.al.	2503.04691	null
2025-03-06	Large Language Models in Bioinformatics: A Survey	Zhenyu Wang et.al.	2503.04490	null
2025-03-06	TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records	Hejie Cui et.al.	2503.04176	null
2025-03-06	KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease	Yongchao Long et.al.	2503.04153	link
2025-03-06	Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting	Jiyue Jiang et.al.	2503.04013	null
2025-03-06	RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models	Wenhui Zhu et.al.	2503.03987	null
2025-03-05	RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction	Fenglin Liu et.al.	2503.03802	link
2025-03-05	Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks	Zihao Zhao et.al.	2503.03687	link
2025-03-05	Psy-Copilot: Visual Chain of Thought for Counseling	Keqi Chen et.al.	2503.03645	null
2025-03-05	Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling	Keqi Chen et.al.	2503.03607	null
2025-03-05	Structured Outputs Enable General-Purpose LLMs to be Medical Experts	Guangfu Guo et.al.	2503.03194	null
2025-03-04	From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance	Jiacheng Tang et.al.	2503.02760	null
2025-03-04	The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats	William Brach et.al.	2503.02650	link
2025-03-04	BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA	Zhengyang Ji et.al.	2503.02476	link
2025-03-04	MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics	Haoan Jin et.al.	2503.02374	null
2025-03-06	EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports	Lama Moukheiber et.al.	2503.02365	null
2025-03-04	Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm	Zhuo Li et.al.	2503.02359	null
2025-03-03	Biomedical Foundation Model: A Survey	Xiangrui Liu et.al.	2503.02104	null
2025-02-28	PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice	Ruoxi Wang et.al.	2503.01903	null
2025-03-03	SHADE-AD: An LLM-Based Framework for Synthesizing Activity Data of Alzheimer’s Patients	Heming Fu et.al.	2503.01768	null
2025-03-03	Designing VR Simulation System for Clinical Communication Training with LLMs-Based Embodied Conversational Agents	Xiuqi Tommy Zhu et.al.	2503.01767	null
2025-03-03	Distilled Prompt Learning for Incomplete Multimodal Survival Prediction	Yingxue Xu et.al.	2503.01653	null
2025-03-03	Leveraging LLMs for Mental Health: Detection and Recommendations from Social Discussions	Vaishali Aggarwal et.al.	2503.01442	null
2025-03-03	Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation	Linhai Zhang et.al.	2503.01315	null
2025-03-03	Cancer Type, Stage and Prognosis Assessment from Pathology Reports using LLMs	Rachit Saluja et.al.	2503.01194	link
2025-03-03	Large Language Models for Healthcare Text Classification: A Systematic Review	Hajar Sakai et.al.	2503.01159	null
2025-03-02	Language-agnostic, automated assessment of listeners’ speech recall using large language models	Björn Herrmann et.al.	2503.01045	null
2025-03-02	FunBench: Benchmarking Fundus Reading Skills of MLLMs	Qijie Wei et.al.	2503.00901	null
2025-03-02	Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies	Tianyi Huang et.al.	2503.00724	null
2025-03-01	Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires	Kyle Gao et.al.	2503.00566	null
2025-03-01	NeuroSymAD: A Neuro-Symbolic Framework for Interpretable Alzheimer’s Disease Diagnosis	Yexiao He et.al.	2503.00510	null
2025-03-01	NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews	Vedant Khandelwal et.al.	2503.00278	null
2025-03-01	Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy	Jahan C. Penny-Dimri et.al.	2503.00269	null
2025-02-24	Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application	Carlos Luengo Vera et.al.	2503.00025	null
2025-02-28	A Non-contrast Head CT Foundation Model for Comprehensive Neuro-Trauma Triage	Youngjin Yoo et.al.	2502.21106	null
2025-02-28	Explainable Biomedical Claim Verification with Large Language Models	Siting Liang et.al.	2502.21014	null
2025-02-28	Merging Clinical Knowledge into Large Language Models for Medical Research and Applications: A Survey	Qiyuan Li et.al.	2502.20988	null
2025-02-28	ProAI: Proactive Multi-Agent Conversational AI with Structured Knowledge Base for Psychiatric Diagnosis	Yuqi Wu et.al.	2502.20689	null
2025-02-28	NutriGen: Personalized Meal Plan Generator Leveraging Large Language Models to Enhance Dietary and Nutritional Adherence	Saman Khamesian et.al.	2502.20601	link
2025-02-27	CoCa-CXR: Contrastive Captioners Learn Strong Temporal Structures for Chest X-Ray Vision-Language Understanding	Yixiong Chen et.al.	2502.20509	null
2025-02-27	KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model	Kai Zhang et.al.	2502.20350	null
2025-02-27	Expertise Is What We Want	Alan Ashworth et.al.	2502.20335	null
2025-02-27	MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue	Yujia Chen et.al.	2502.19860	null
2025-03-03	R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning	Minggui He et.al.	2502.19735	null
2025-02-27	Preference Learning Unlocks LLMs’ Psycho-Counseling Skills	Mian Zhang et.al.	2502.19731	null
2025-02-27	SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning	Mingsheng Cai et.al.	2502.19668	null
2025-02-26	Repurposing the scientific literature with vision-language models	Anton Alyakin et.al.	2502.19546	null
2025-02-26	Conversational Planning for Personal Plans	Konstantina Christakopoulou et.al.	2502.19500	null
2025-02-26	MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis	Daniel Rose et.al.	2502.19175	null
2025-02-26	Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection	Carter Adams et.al.	2502.18823	null
2025-02-26	TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation	Chenlu Ju et.al.	2502.18712	link
2025-02-23	RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis	Jianwei Wang et.al.	2502.18517	null
2025-02-26	Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support	Guoxin Wang et.al.	2502.18274	link
2025-02-25	DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning	Pusheng Xu et.al.	2502.17947	null
2025-02-25	Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation	Tong Li et.al.	2502.17899	null
2025-02-24	Wearable Meets LLM for Stress Management: A Duoethnographic Study Integrating Wearable-Triggered Stressors and LLM Chatbots for Personalized Interventions	Sameer Neupane et.al.	2502.17650	null
2025-02-24	Towards Conditioning Clinical Text Generation for User Control	Osman Alperen Koraş et.al.	2502.17571	null
2025-02-18	User Intent to Use DeepSeek for Healthcare Purposes and their Trust in the Large Language Model: Multinational Survey Study	Avishek Choudhury et.al.	2502.17487	null
2025-03-04	Large Language Models are Powerful EHR Encoders	Stefan Hegselmann et.al.	2502.17403	link
2025-02-24	Real-time Monitoring of Economic Shocks using Company Websites	Michael Koenig et.al.	2502.17161	null
2025-02-24	Applications of Large Models in Medicine	YunHe Su et.al.	2502.17132	null
2025-02-23	GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking	Yingjian Chen et.al.	2502.16514	null
2025-02-22	Large Language Model for Lossless Image Compression with Visual Prompts	Junhao Du et.al.	2502.16163	null
2025-02-25	Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation	Won Seok Jang et.al.	2502.16022	null
2025-02-21	AutoMedPrompt: A New Framework for Optimizing LLM Medical Prompts Using Textual Gradients	Sean Wu et.al.	2502.15944	null
2025-02-21	“Kya family planning after marriage hoti hai?”: Integrating Cultural Sensitivity in an LLM Chatbot for Reproductive Health	Roshini Deva et.al.	2502.15939	null
2025-02-21	CVE-LLM: Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models	Rikhiya Ghosh et.al.	2502.15932	null
2025-02-21	A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare	Manar Aljohani et.al.	2502.15871	null
2025-02-21	MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models	Suraj Racha et.al.	2502.15418	link
2025-02-20	Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson’s Disease	Elliot Schumacher et.al.	2502.15069	null
2025-02-20	Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning	Shuyue Stella Li et.al.	2502.14860	link
2025-02-20	Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning	Juraj Vladika et.al.	2502.14765	link
2025-02-21	Data-Constrained Synthesis of Training Data for De-Identification	Thomas Vakili et.al.	2502.14677	null
2025-02-20	FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis	Mingyi Jia et.al.	2502.14614	null
2025-02-20	MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models	Shrey Pandit et.al.	2502.14302	null
2025-02-20	Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment	Jiaxi Li et.al.	2502.14275	null
2025-03-03	QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification	Hajar Sakai et.al.	2502.14189	null
2025-02-18	Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics	Kabir Kumar et.al.	2502.13982	null
2025-02-19	Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health	Xingbo Wang et.al.	2502.13920	link
2025-02-19	VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare	Anudeex Shetty et.al.	2502.13775	null
2025-02-19	Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs	Yushi Feng et.al.	2502.13555	link
2025-02-19	Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion	Shuai Niu et.al.	2502.13509	null
2025-02-19	Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning	Yang Yan et.al.	2502.13447	null
2025-02-19	RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering	Sichu Liang et.al.	2502.13361	null
2025-02-18	Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare	Hiba Ahsan et.al.	2502.13319	null
2025-02-18	SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?	Yucheng Shi et.al.	2502.13233	null
2025-02-18	Private Text Generation by Seeding Large Language Model Prompts	Supriya Nagesh et.al.	2502.13193	null
2025-02-18	Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge	Mohammad Reza Rezaei et.al.	2502.13010	null
2025-02-18	An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation	Mohammad Feli et.al.	2502.12836	null
2025-02-18	Baichuan-M1: Pushing the Medical Capability of Large Language Models	Bingning Wang et.al.	2502.12671	null
2025-02-18	Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs: Evidence and Mechanisms for AI Agents to Inform Policy Decisions	Karthik Sreedhar et.al.	2502.12504	null
2025-02-18	USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner	Mingcong Chen et.al.	2502.12498	null
2025-02-14	Leveraging large language models for structured information extraction from pathology reports	Jeya Balaji Balasubramanian et.al.	2502.12183	link
2025-02-17	Exploring Large Language Models in Healthcare: Insights into Corpora Sources, Customization Strategies, and Evaluation Metrics	Shuqi Yang et.al.	2502.11861	null
2025-02-17	LLM Agents Making Agent Tools	Georg Wölflein et.al.	2502.11705	link
2025-02-17	CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation	Guangya Yu et.al.	2502.11703	null
2025-02-17	A Survey of Personalized Large Language Models: Progress and Future Directions	Jiahong Liu et.al.	2502.11528	link
2025-02-16	A Survey of LLM-based Agents in Medicine: How far are we from Baymax?	Wenxuan Wang et.al.	2502.11211	null
2025-02-16	Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications	Alexandru Lecu et.al.	2502.11108	link
2025-02-16	A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions	Hongbin Na et.al.	2502.11095	null
2025-02-16	SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information	Xiangyu Zhang et.al.	2502.10950	null
2025-02-15	Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia	Rohith Perumandla et.al.	2502.10896	null
2025-02-15	ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis	Xueshen Li et.al.	2502.10620	null
2025-02-14	Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes	Ezinne Nwankwo et.al.	2502.10605	null
2025-02-21	HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation	Tianwei Lin et.al.	2502.09838	link
2025-02-12	Cancer Vaccine Adjuvant Name Recognition from Biomedical Literature using Large Language Models	Hasin Rehana et.al.	2502.09659	null
2025-02-17	Zero-shot generation of synthetic neurosurgical data with large language models	Austin A. Barr et.al.	2502.09566	link
2025-02-13	Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs	Chang Liu et.al.	2502.09156	null
2025-02-13	Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech	Jonathan Pofcher et.al.	2502.09004	null
2025-02-13	Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning	Leon Nissen et.al.	2502.08954	link
2025-02-12	Assessing the Impact of the Quality of Textual Data on Feature Representation and Machine Learning Models	Tabinda Sarwar et.al.	2502.08669	null
2025-02-12	SycEval: Evaluating LLM Sycophancy	Aaron Fanous et.al.	2502.08177	null
2025-02-12	Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD)	Naomi Akhras et.al.	2502.08073	null
2025-02-11	Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?	Hye Sun Yun et.al.	2502.07963	link
2025-02-12	Beyond Prompting: Time2Lang – Bridging Time-Series Foundation Models and Large Language Models for Health Sensing	Arvind Pillai et.al.	2502.07608	link
2025-02-11	Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning	Jiayuan Zhu et.al.	2502.07143	null
2025-02-10	Interactive Data Harmonization with LLM Agents	Aécio Santos et.al.	2502.07132	null
2025-02-09	LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison	Gabriele De Vito et.al.	2502.06890	null
2025-02-06	Integrating Generative Artificial Intelligence in ADRD: A Framework for Streamlining Diagnosis and Care in Neurodegenerative Diseases	Andrew G. Breithaupt et.al.	2502.06842	null
2025-02-04	Diffusion Instruction Tuning	Chen Jin et.al.	2502.06814	null
2025-02-10	Automatic Evaluation of Healthcare LLMs Beyond Question-Answering	Anna Arias-Duart et.al.	2502.06666	null
2025-02-10	Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy	Kamyar Kazari et.al.	2502.06150	null
2025-02-09	HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents	Mohammad Amin Abbasi et.al.	2502.05982	null
2025-02-09	A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography	Nicholas Evans et.al.	2502.05926	null
2025-02-09	Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models	Shiyu Teng et.al.	2502.05879	null
2025-02-09	Large Language Model-based Nonnegative Matrix Factorization For Cardiorespiratory Sound Separation	Yasaman Torabi et.al.	2502.05757	null
2025-02-09	RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care	Ziqi Yang et.al.	2502.05740	null
2025-02-08	KMI: A Dataset of Korean Motivational Interviewing Dialogues for Psychotherapy	Hyunjong Kim et.al.	2502.05651	null
2025-02-08	ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports	Aynur Guluzade et.al.	2502.05638	link
2025-02-08	OntoTune: Ontology-Driven Self-training for Aligning Large Language Models	Zhiqiang Liu et.al.	2502.05478	link
2025-02-12	Safety at Scale: A Comprehensive Survey of Large Model Safety	Xingjun Ma et.al.	2502.05206	link
2025-02-07	“It Felt Like I Was Left in the Dark”: Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings	Shihan Fu et.al.	2502.05115	null
2025-02-07	Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy	Rishabh Uapadhyay et.al.	2502.04666	null
2025-02-05	Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning	Jonathan Kim et.al.	2502.04381	null
2025-02-04	Open Foundation Models in Healthcare: Challenges, Paradoxes, and Opportunities with GenAI Driven Personalized Prescription	Mahdi Alkaeed et.al.	2502.04356	null
2025-02-04	JingFang: A Traditional Chinese Medicine Large Language Model of Expert-Level Medical Diagnosis and Syndrome Differentiation-Based Treatment	Yehan Yan et.al.	2502.04345	null
2025-02-06	Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond	Mardhiyah Sanni et.al.	2502.03945	null
2025-02-05	A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause	Roshini Deva et.al.	2502.03579	null
2025-02-05	MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters	Amin Dada et.al.	2502.03298	null
2025-02-05	MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation	Seonok Kim et.al.	2502.03004	null
2025-02-05	CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration	Yizhe Yang et.al.	2502.02807	null
2025-02-04	Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation	Atharva Mangeshkumar Agrawal et.al.	2502.02249	null
2025-02-02	Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model	Hadas Ben-Atya et.al.	2502.01691	null
2025-02-03	OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology	Chengfeng Zhou et.al.	2502.01243	null
2025-02-02	Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale	Cliff Wong et.al.	2502.00943	null
2025-02-02	Generalization of Medical Large Language Models through Cross-Domain Weak Supervision	Robert Long et.al.	2502.00832	null
2025-01-31	Fairshare Data Pricing for Large Language Models	Luyang Zhang et.al.	2502.00198	null
2025-01-31	DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets	Abdurrahim Yilmaz et.al.	2502.00196	null
2025-02-04	AIN: The Arabic INclusive Large Multimodal Model	Ahmed Heakl et.al.	2502.00094	link
2025-01-30	A Multi-Layered Large Language Model Framework for Disease Prediction	Malak Mohamed et.al.	2502.00063	null
2025-01-21	Leveraging Large Language Models to Enhance Machine Learning Interpretability and Predictive Performance: A Case Study on Emergency Department Returns for Mental Health Patients	Abdulaziz Ahmed et.al.	2502.00025	null
2025-01-30	Survey and Improvement Strategies for Gene Prioritization with Large Language Models	Matthew Neeley et.al.	2501.18794	null
2025-01-30	Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning	Maya Kruse et.al.	2501.18724	null
2025-02-03	Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models	Manish Sanwal et.al.	2501.18645	null
2025-01-27	Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare	Hang Zhang et.al.	2501.18632	null
2025-01-30	GENIE: Generative Note Information Extraction model for structuring EHR data	Huaiyuan Ying et.al.	2501.18435	null
2025-01-30	Battery State of Health Estimation Using LLM Framework	Aybars Yunusoglu et.al.	2501.18123	null
2025-01-29	Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations	Zijie Liu et.al.	2501.17860	null
2025-01-29	LLM Assistance for Pediatric Depression	Mariia Ignashina et.al.	2501.17510	null
2025-01-28	Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction	Mingyu Derek Ma et.al.	2501.17326	null
2025-01-28	Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology	Peilong Wang et.al.	2501.17286	null
2025-01-28	Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care	Fengpei Yuan et.al.	2501.17206	null
2025-01-27	A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis	Aysegul Ucar et.al.	2501.17190	null
2025-01-28	Adapting Network Information to Semantics for Generalizable and Plug-and-Play Multi-Scenario Network Diagnosis	Tiao Tan et.al.	2501.16842	null
2025-01-28	VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records	Philip Chung et.al.	2501.16672	link
2025-01-27	A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain	Jorge del Pozo Lérida et.al.	2501.16533	null
2025-01-27	Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM	Payal Kamboj et.al.	2501.16481	link
2025-01-24	GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration	Ziwen Li et.al.	2501.16382	link
2025-01-18	An Integrated Approach to AI-Generated Content in e-health	Tasnim Ahmed et.al.	2501.16348	null
2025-01-27	A foundation model for human-AI collaboration in medical literature mining	Zifeng Wang et.al.	2501.16255	null
2025-01-27	Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models	Huayu Li et.al.	2501.16215	link
2025-01-27	MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer	Qi Chen et.al.	2501.15826	null
2025-01-26	Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals	Yinzhou Wang et.al.	2501.15599	null
2025-01-25	The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders?	Ayo Adedeji et.al.	2501.15310	null
2025-01-25	Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training	Xunxin Cai et.al.	2501.15108	null
2025-01-25	Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations	Harshita Chopra et.al.	2501.15056	null
2025-01-24	Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs	Hang Luo et.al.	2501.14892	link
2025-01-24	Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?	Ipek Baris Schlicht et.al.	2501.14719	null
2025-01-24	MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications	Yixing Jiang et.al.	2501.14654	link
2025-01-24	AI Chatbots as Professional Service Agents: Developing a Professional Identity	Wenwen Li et.al.	2501.14179	null
2025-01-23	MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning	Joshua Davis et.al.	2501.14105	link
2025-01-23	Leveraging Large Language Models to Analyze Emotional and Contextual Drivers of Teen Substance Use in Online Discussions	Jianfeng Zhu et.al.	2501.14037	null
2025-01-23	Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs	Bhumika Gupta et.al.	2501.13984	null
2025-01-21	Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs)	Jadon Geathers et.al.	2501.13957	null
2025-01-20	A Layered Multi-Expert Framework for Long-Context Mental Health Assessments	Jinwen Tang et.al.	2501.13951	null
2025-01-14	Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications	Arjun R. Malghan et.al.	2501.13936	null
2025-01-23	Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change	Mowafak Allaham et.al.	2501.13802	null
2025-01-22	Intelligent Exercise and Feedback System for Social Healthcare using LLMOps	Yeongrak Choi et.al.	2501.13723	null
2025-01-23	Question Answering on Patient Medical Records with Private Fine-Tuned LLMs	Sara Kothari et.al.	2501.13687	null
2025-01-23	How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization	Shezheng Song et.al.	2501.13669	null
2025-01-20	Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness	Ambreesh Parthasarathy et.al.	2501.13120	null
2025-01-21	Can open source large language models be used for tumor documentation in Germany? – An evaluation on urological doctors’ notes	Stefan Lenz et.al.	2501.12106	link
2025-01-23	Med-R$^2$: Crafting Trustworthy LLM Physicians through Retrieval and Reasoning of Evidence-Based Medicine	Keer Lu et.al.	2501.11885	link
2025-01-19	Clinical trial cohort selection using Large Language Models on n2c2 Challenges	Chi-en Amy Tai et.al.	2501.11114	null
2025-01-18	Iterative Tree Analysis for Medical Critics	Zenan Huang et.al.	2501.10642	null
2025-01-17	Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education	William Hersh et.al.	2501.10186	null
2025-01-17	Demo: Interactive Visualization of Semantic Relationships in a Biomedical Project’s Talent Knowledge Graph	Jiawei Xu et.al.	2501.09909	null
2025-01-17	Position: Open and Closed Large Language Models in Healthcare	Jiawei Xu et.al.	2501.09906	null
2025-01-16	Bridging Language Barriers in Healthcare: A Study on Arabic LLMs	Nada Saadi et.al.	2501.09825	null
2025-01-16	Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval	Jesus Lovon et.al.	2501.09384	link
2025-01-16	FineMedLM-o1: Enhancing the Medical Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training	Hongzhou Yu et.al.	2501.09213	link
2025-01-17	Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models	Emma Croxford et.al.	2501.08977	null
2025-01-26	Enhanced Large Language Models for Effective Screening of Depression and Anxiety	June M. Liu et.al.	2501.08769	null
2025-01-14	ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations	Ziyuan Huang et.al.	2501.08324	null
2025-01-14	ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems	Mohita Chowdhury et.al.	2501.08208	null
2025-01-13	Large Language Models for Interpretable Mental Health Diagnosis	Brian Hyeongseok Kim et.al.	2501.07653	null
2025-01-13	RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment	Difei Gu et.al.	2501.07525	link
2025-01-13	Combining LLM decision and RL action selection to improve RL policy for adaptive interventions	Karine Karine et.al.	2501.06980	null
2025-01-12	Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives	Xinyao Ma et.al.	2501.06964	null
2025-01-12	A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context	Noureldin Zahran et.al.	2501.06859	null
2025-01-12	Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation	Shunfan Zheng et.al.	2501.06741	null
2025-01-21	MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare	Ye Chen et.al.	2501.06465	null
2025-01-11	O1 Replication Journey – Part 3: Inference-time Scaling for Medical Reasoning	Zhongzhen Huang et.al.	2501.06458	link
2025-01-10	AFRIDOC-MT: Document-level MT Corpus for African Languages	Jesujoba O. Alabi et.al.	2501.06374	link
2025-01-10	Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts	Elizabeth Schaefer et.al.	2501.06365	null
2025-01-10	Large Language Models for Bioinformatics	Wei Ruan et.al.	2501.06271	null
2025-01-10	From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy	Elham Aghakhani et.al.	2501.06101	null
2025-01-07	Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding	John C. Rollman et.al.	2501.05479	null
2025-01-18	LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models	Hang Yang et.al.	2501.05464	null
2025-01-09	Investigating Numerical Translation with Large Language Models	Wei Tang et.al.	2501.04927	null
2025-01-07	LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment	Gaoussou Youssouf Kebe et.al.	2501.03624	null
2025-01-06	Existential Crisis: A Social Robot’s Reason for Being	Dora Medgyesy et.al.	2501.03376	null
2025-01-06	Design and implementation of tools to build an ontology of Security Requirements for Internet of Medical Things	Daniel Naro et.al.	2501.03067	null
2025-01-06	IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment	Yiming Zhang et.al.	2501.02869	null
2025-01-05	Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine	Yishen Liu et.al.	2501.02471	null
2025-01-05	Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications	Zhe Chen et.al.	2501.02460	null
2025-01-04	Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations	Kangyu Zhu et.al.	2501.02385	null
2025-01-04	Exploring the Capabilities and Limitations of Large Language Models for Radiation Oncology Decision Support	Florian Putz et.al.	2501.02346	null
2025-01-03	PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents	Jingoo Lee et.al.	2501.01594	null
2025-01-02	Large Language Models for Mental Health Diagnostic Assessments: Exploring The Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments – The Depression and Anxiety Case	Kaushik Roy et.al.	2501.01305	null
2025-01-02	Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice	Federico Ravenda et.al.	2501.00982	link
2024-12-31	CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care	Michael Gubanov et.al.	2501.00223	null
2024-12-31	An Empirical Evaluation of Large Language Models on Consumer Health Questions	Moaiz Abrar et.al.	2501.00208	null
2024-12-31	GPT-4 on Clinic Depression Assessment: An LLM-Based Pilot Study	Giuliano Lorenzoni et.al.	2501.00199	null
2024-12-30	Temporal reasoning for timeline summarisation in social media	Jiayu Song et.al.	2501.00152	null
2024-12-30	Tackling Cognitive Impairment Detection from Speech: A submission to the PROCESS Challenge	Catarina Botelho et.al.	2501.00145	null
2024-12-21	Distilling Large Language Models for Efficient Clinical Information Extraction	Karthik S. Vedula et.al.	2501.00031	null
2024-12-29	Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain	Shintaro Ozaki et.al.	2412.20309	link
2024-12-28	On the Compositional Generalization of Multimodal LLMs for Medical Imaging	Zhenyang Cai et.al.	2412.20070	link
2024-12-28	The Emotional Spectrum of LLMs: Leveraging Empathy and Emotion-Based Markers for Mental Health Support	Alessandro De Grandi et.al.	2412.20068	null
2025-01-02	MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes	Asma Ben Abacha et.al.	2412.19260	link
2025-01-03	MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models	Kaiwen Zuo et.al.	2412.18947	null
2024-12-25	HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs	Junying Chen et.al.	2412.18925	link
2024-12-24	Research on the Proximity Relationships of Psychosomatic Disease Knowledge Graph Modules Extracted by Large Language Models	Zihan Zhou et.al.	2412.18419	null
2024-12-24	Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine	Yu He Ke et.al.	2412.18096	null
2024-12-23	Generating Completions for Fragmented Broca’s Aphasic Sentences Using Large Language Models	Sijbren van Vaals et.al.	2412.17669	link
2024-12-23	Detecting anxiety and depression in dialogues: a multi-label and explainable approach	Francisco de Arriba-Pérez et.al.	2412.17651	null
2025-01-01	PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health	Huy Vu et.al.	2412.16882	link
2025-01-03	KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis	Kaiwen Zuo et.al.	2412.16833	null
2024-12-21	AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles	Aritra Kumar Lahiri et.al.	2412.16701	null
2024-12-21	Evaluating the Performance of Large Language Models in Scientific Claim Detection and Classification	Tanjim Bin Faruk et.al.	2412.16486	null
2024-12-21	Technical Report: Small Language Model for Japanese Clinical and Medicine	Shogo Watanabe et.al.	2412.16423	null
2024-12-21	Identifying Cyberbullying Roles in Social Media	Manuel Sandoval et.al.	2412.16417	null
2024-12-20	A Machine Learning Approach for Emergency Detection in Medical Scenarios Using Large Language Models	Ferit Akaybicen et.al.	2412.16341	null
2024-12-20	Improving Equity in Health Modeling with GPT4-Turbo Generated Synthetic Data: A Comparative Study	Daniel Smolyak et.al.	2412.16335	null
2024-12-20	Benchmarking LLMs and SLMs for patient reported outcomes	Matteo Marengo et.al.	2412.16291	null
2024-12-20	Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG	Hasan Md Tusfiqur Alam et.al.	2412.16086	link
2024-12-20	From General to Specific: Tailoring Large Language Models for Personalized Healthcare	Ruize Shi et.al.	2412.15957	null
2024-12-20	Linguistic Features Extracted by GPT-4 Improve Alzheimer’s Disease Detection based on Spontaneous Speech	Jonathan Heitz et.al.	2412.15772	link
2024-12-20	Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models	Shamus Sim et.al.	2412.15748	null
2024-12-20	NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning	Zheyuan Zhang et.al.	2412.15547	null
2024-12-17	A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models	Gongbo Zhang et.al.	2412.15271	null
2024-12-16	Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search	Edward Kim et.al.	2412.15256	null
2024-12-13	Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an “AI Therapist”	Robert Wasenmüller et.al.	2412.15242	null
2024-12-23	CareBot: A Pioneering Full-Process Open-Source Medical Language Model	Lulu Zhao et.al.	2412.15236	null
2024-12-18	Clinical Trials Ontology Engineering with Large Language Models	Berkan Çakır et.al.	2412.14387	null
2024-12-18	Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs	David Restrepo et.al.	2412.14304	null
2024-12-18	Discovering maximally consistent distribution of causal tournaments with Large Language Models	Federico Baldo et.al.	2412.14019	null
2024-12-18	Cognition Chain for Explainable Psychological Stress Detection on Social Media	Xin Wang et.al.	2412.14009	link
2025-01-08	Federated Learning and RAG Integration: A Scalable Approach for Medical Large Language Models	Jincheol Jung et.al.	2412.13720	null
2024-12-18	Exploring Multi-Modal Integration with Tool-Augmented LLM Agents for Precise Causal Discovery	ChengAo Shen et.al.	2412.13667	link
2024-12-18	PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling	Haojie Xie et.al.	2412.13660	link
2024-12-17	Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health	Vivek Kumar et.al.	2412.12981	link
2024-12-17	Process-Supervised Reward Models for Clinical Note Generation: A Scalable Approach Guided by Domain Expertise	Hanyin Wang et.al.	2412.12583	link
2024-12-17	RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment	Xuanzhong Chen et.al.	2412.12475	null
2024-12-17	Assessing the Limitations of Large Language Models in Clinical Fact Decomposition	Monica Munnangi et.al.	2412.12422	link
2024-12-16	Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments	Tuka Alhanai et.al.	2412.12417	link
2024-12-11	Performance of a large language model-Artificial Intelligence based chatbot for counseling patients with sexually transmitted infections and genital diseases	Nikhil Mehta et.al.	2412.12166	null
2024-12-16	LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts	Zhuhao Wang et.al.	2412.12001	link
2024-12-16	Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives	Sam Relins et.al.	2412.11878	link
2024-12-16	LLMs Can Simulate Standardized Patients via Agent Coevolution	Zhuoyun Du et.al.	2412.11716	link
2024-12-16	Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery	Ryuhaerang Choi et.al.	2412.11656	null
2024-12-16	ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models	Xiechi Zhang et.al.	2412.11453	null
2024-12-19	TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs	Lanxiang Hu et.al.	2412.11242	null
2024-12-15	AD-LLM: Benchmarking Large Language Models for Anomaly Detection	Tiankai Yang et.al.	2412.11142	link
2024-12-15	HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation	Tengfei Liu et.al.	2412.11070	link
2024-12-17	MedG-KRP: Medical Graph Knowledge Representation Probing	Gabriel R. Rosenbaum et.al.	2412.10982	null
2024-12-14	LLMs-in-the-Loop Part 2: Expert Small AI Models for Anonymization and De-identification of PHI Across Multiple Languages	Murat Gunay et.al.	2412.10918	null
2024-12-14	Superhuman performance of a large language model on the reasoning tasks of a physician	Peter G. Brodeur et.al.	2412.10849	null
2024-12-14	Large Language Models for Medical Forecasting – Foresight 2	Zeljko Kraljevic et.al.	2412.10848	null
2024-12-14	A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options	Peilong Wang et.al.	2412.10622	null
2024-12-09	Leveraging Audio and Text Modalities in Mental Health: A Study of LLMs Performance	Abdelrahman A. Ali et.al.	2412.10417	null
2024-12-09	Exploring Complex Mental Health Symptoms via Classifying Social Media Data with Explainable LLMs	Kexin Chen et.al.	2412.10414	null
2024-12-13	UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities	Muhammad Uzair Khattak et.al.	2412.10372	link
2024-12-12	MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation	Zheyuan Zhang et.al.	2412.08847	link
2024-12-11	Detecting Conversational Mental Manipulation with Intent-Aware Prompting	Jiayuan Ma et.al.	2412.08414	link
2024-12-10	BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities	Sahal Shaji Mullappilly et.al.	2412.07769	link
2024-12-10	Zero-Shot ATC Coding with Large Language Models for Clinical Assessments	Zijian Chen et.al.	2412.07743	null
2024-12-09	Balancing Efficiency and Effectiveness: An LLM-Infused Approach for Optimized CTR Prediction	Guoxiao Zhang et.al.	2412.06860	null
2024-12-06	Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System	Fang Zeng et.al.	2412.06828	null
2024-12-12	PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Qian Zhang et.al.	2412.06287	link
2024-12-09	MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization	Kangyu Zhu et.al.	2412.06141	link
2024-12-08	Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis	Aman Kassahun Wassie et.al.	2412.05862	null
2024-12-08	Are Clinical T5 Models Better for Clinical Text?	Yahan Li et.al.	2412.05845	link
2024-12-09	Enhancing FKG.in: automating Indian food composition analysis	Saransh Kumar Gupta et.al.	2412.05248	null
2024-12-06	SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot	Jinlin Wu et.al.	2412.05187	link
2024-12-06	A text-to-tabular approach to generate synthetic patient data using LLMs	Margaux Tornqvist et.al.	2412.05153	link
2024-12-05	Give me Some Hard Questions: Synthetic Data Generation for Clinical QA	Fan Bai et.al.	2412.04573	link
2024-12-04	Prompting Large Language Models for Clinical Temporal Relation Extraction	Jianping He et.al.	2412.04512	null
2024-12-05	Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots	Maria Paola Priola et.al.	2412.04235	null
2024-12-05	Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models	Abdelrahaman A. Hassan et.al.	2412.03796	null
2024-11-28	CovidLLM: A Robust Large Language Model with Missing Value Adaptation and Multi-Objective Learning Strategy for Predicting Disease Severity and Clinical Outcomes in COVID-19 Patients	Shengjun Zhu et.al.	2412.03593	link
2024-12-04	A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences	Gabriel Lino Garcia et.al.	2412.03531	null
2024-12-04	Advancing Conversational Psychotherapy: Integrating Privacy, Dual-Memory, and Domain Expertise with Large Language Models	XiuYu Zhang et.al.	2412.02987	null
2024-12-03	A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications	Yixiang Qu et.al.	2412.02868	null
2024-12-09	RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models	Hieu Tran et.al.	2412.02830	link
2024-12-03	Keeping Experts in the Loop: Expert-Guided Optimization for Clinical Data Classification using Large Language Models	Nader Karayanni et.al.	2412.02173	null
2024-12-04	The use of large language models to enhance cancer clinical trial educational materials	Mingye Gao et.al.	2412.01955	null
2024-12-02	Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking	Jie Liu et.al.	2412.01605	null
2024-12-02	Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models	Chayan Tank et.al.	2412.01353	null
2024-12-02	Best Practices for Large Language Models in Radiology	Christian Bluethgen et.al.	2412.01233	null
2024-12-01	Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages	Edward Bayes et.al.	2412.00948	null
2024-12-06	Opus: A Large Work Model for Complex Workflow Generation	Théo Fagnoni et.al.	2412.00573	null
2024-11-30	Polish Medical Exams: A new dataset for cross-lingual medical knowledge transfer assessment	Łukasz Grzybowski et.al.	2412.00559	null
2024-12-07	Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective	Yue Zhou et.al.	2412.00554	null
2024-11-30	CDEMapper: Enhancing NIH Common Data Element Normalization using Large Language Models	Yan Wang et.al.	2412.00491	null
2024-11-29	SSDM 2.0: Time-Accurate Speech Rich Transcription with Non-Fluencies	Jiachen Lian et.al.	2412.00265	null
2024-11-29	Fine Tuning Large Language Models to Deliver CBT for Depression	Talha Tahir et.al.	2412.00251	link
2024-11-24	Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis	Ferhat Ozgur Catak et.al.	2412.00056	null
2024-11-29	MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks	John Francis et.al.	2411.19689	null
2024-11-29	SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks	Kim-Celine Kahl et.al.	2411.19688	link
2024-11-28	ComViewer: An Interactive Visual Tool to Help Viewers Seek Social Support in Online Mental Health Communities	Shiwei Wu et.al.	2411.19169	link
2024-11-28	A Unified Platform for At-Home Post-Stroke Rehabilitation Enabled by Wearable Technologies and Artificial Intelligence	Chenyu Tang et.al.	2411.19000	null
2024-11-28	Rephrasing Electronic Health Records for Pretraining Clinical Language Models	Jinghui Liu et.al.	2411.18940	null
2024-11-28	Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer’s Disease	Junan Li et.al.	2411.18922	null
2024-12-06	LLM-ABBA: Understanding time series via symbolic approximation	Erin Carson et.al.	2411.18506	null
2024-11-28	Wearable intelligent throat enables natural speech in stroke patients with dysarthria	Chenyu Tang et.al.	2411.18266	null
2024-11-29	InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks	Xinyao Zheng et.al.	2411.18191	null
2024-11-27	Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track	Deepak Gupta et.al.	2411.18069	null
2024-11-27	QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions	Ramez Kouzy et.al.	2411.17967	link
2024-11-26	Synthetic Data Generation with LLM for Improved Depression Prediction	Andrea Kang et.al.	2411.17672	null
2024-11-26	Can artificial intelligence predict clinical trial outcomes?	Shuyi Jin et.al.	2411.17595	null
2024-11-26	The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations	Theodora Worledge et.al.	2411.17375	link
2024-12-10	Using Large Language Models for Expert Prior Elicitation in Predictive Modelling	Alexander Capstick et.al.	2411.17284	link
2024-11-28	Strategic Prompting for Conversational Tasks: A Comparative Analysis of Large Language Models Across Diverse Conversational Tasks	Ratnesh Kumar Joshi et.al.	2411.17204	null
2024-11-25	Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries	Harshavardhan Battula et.al.	2411.16818	null
2024-11-27	Creating Scalable AGI: the Open General Intelligence Framework	Daniel A. Dollinger et.al.	2411.15832	null
2024-11-24	RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements	Zaifu Zhan et.al.	2411.15700	null
2024-11-23	Ontology-Constrained Generation of Domain-Specific Clinical Summaries	Gaya Mehenni et.al.	2411.15666	link
2024-11-27	AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset	Tobi Olatunji et.al.	2411.15640	null
2024-11-23	Large Language Model with Region-guided Referring and Grounding for CT Report Generation	Zhixuan Chen et.al.	2411.15539	link
2024-11-23	The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges	Jiqun Liu et.al.	2411.15396	null
2024-11-22	Regulator-Manufacturer AI Agents Modeling: Mathematical Feedback-Driven Multi-Agent LLM Framework	Yu Han et.al.	2411.15356	null
2024-11-21	BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models	Taha Koleilat et.al.	2411.15232	link
2024-11-22	Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation	Colin Diggs et.al.	2411.14971	null
2024-11-22	De-biased Multimodal Electrocardiogram Analysis	Haitao Li et.al.	2411.14795	null
2024-11-22	Enhancing Clinical Trial Patient Matching through Knowledge Augmentation with Multi-Agents	Hanwen Shi et.al.	2411.14637	null
2024-11-20	Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine	Yifan Yang et.al.	2411.14487	null
2024-11-16	Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios	Shaochen Xu et.al.	2411.14461	null
2024-11-21	Logic Augmented Generation	Aldo Gangemi et.al.	2411.14012	null
2024-11-21	PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation	Zhijie Bao et.al.	2411.13902	link
2024-11-21	A Multimodal Approach to The Detection and Classification of Skin Diseases	Allen Yang et.al.	2411.13855	null
2024-11-19	Can ChatGPT Overcome Behavioral Biases in the Financial Sector? Classify-and-Rethink: Multi-Step Zero-Shot Reasoning in the Gold Investment	Shuoling Liu et.al.	2411.13599	null
2024-11-20	Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding	Nabeel Seedat et.al.	2411.13163	null
2024-11-19	DIETS: Diabetic Insulin Management System in Everyday Life	Hanyu Zeng et.al.	2411.12812	null
2024-11-19	Conversational Medical AI: Ready for Practice	Antoine Lizée et.al.	2411.12808	null
2024-11-19	Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs	Ahmed Akib Jawad Karim et.al.	2411.12712	null
2024-11-19	Performance of Large Language Models in Technical MRI Question Answering: A Comparative Study	Alan B McMillan et.al.	2411.12238	null
2024-11-18	Medical Video Generation for Disease Progression Simulation	Xu Cao et.al.	2411.11943	null
2024-11-04	Large language models for mental health	Andreas Triantafyllopoulos et.al.	2411.11880	null
2024-11-18	Membership Inference Attack against Long-Context Large Language Models	Zixiong Wang et.al.	2411.11424	null
2024-11-17	BianCang: A Traditional Chinese Medicine Large Language Model	Sibo Wei et.al.	2411.11027	link
2024-11-16	Can Generic LLMs Help Analyze Child-adult Interactions Involving Children with Autism in Clinical Observation?	Tiantian Feng et.al.	2411.10761	null
2024-11-16	Structured Dialogue System for Mental Health: An LLM Chatbot Leveraging the PM+ Guidelines	Yixiang Chen et.al.	2411.10681	link
2024-11-15	Evaluating the role of ‘Constitutions’ for learning from AI feedback	Saskia Redgate et.al.	2411.10168	null
2024-11-19	Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models?	Yan Hu et.al.	2411.10020	link
2024-11-15	JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging	Kaito Baba et.al.	2411.09933	null
2024-11-15	A Hybrid Artificial Intelligence System for Automated EEG Background Analysis and Report Generation	Chin-Sung Tung et.al.	2411.09874	link
2024-11-19	A Benchmark for Long-Form Medical Question Answering	Pedram Hosseini et.al.	2411.09834	null
2024-11-14	Script-centric behavior understanding for assisted autism spectrum disorder diagnosis	Wenxing Liu et.al.	2411.09413	null
2024-11-14	Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering	Nghia Trung Ngo et.al.	2411.09213	null
2024-11-13	The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models	Daniel P. Jeong et.al.	2411.08870	link
2024-11-14	Optimizing Automatic Summarization of Long Clinical Records Using Dynamic Context Extension: Testing and Evaluation of the NBCE Method	Guoqing Zhang et.al.	2411.08586	null
2024-11-12	Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease	Francesco Chiumento et.al.	2411.07871	null
2024-11-12	Multimodal Clinical Reasoning through Knowledge-augmented Rationale Generation	Shuai Niu et.al.	2411.07611	null
2024-11-11	Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews	Aakash Sorathiya et.al.	2411.07398	null
2024-11-11	A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19	Vedant Khandelwal et.al.	2411.07163	null
2024-11-11	Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models	Aniket Deroy et.al.	2411.06946	null
2024-11-11	Persuasion with Large Language Models: a Survey	Alexander Rogiers et.al.	2411.06837	null
2024-11-11	Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding	Zeyd Boukhers et.al.	2411.06823	null
2024-11-11	Ambient AI Scribing Support: Comparing the Performance of Specialized AI Agentic Architecture to Leading Foundational Models	Chanseo Lee et.al.	2411.06713	null
2024-11-10	In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages	Joseph Gatto et.al.	2411.06549	link
2024-11-10	ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?	Canyu Chen et.al.	2411.06469	null
2024-11-09	GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence	MD Ragib Shahriyear et.al.	2411.06264	null
2024-11-08	Humans Continue to Outperform Large Language Models in Complex Clinical Decision-Making: A Study with Medical Calculators	Nicholas Wan et.al.	2411.05897	null
2024-11-08	Identifying and Decomposing Compound Ingredients in Meal Plans Using Large Language Models	Leon Kopitar et.al.	2411.05892	null
2024-11-08	A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis	Cristiano Patrício et.al.	2411.05609	link
2024-11-08	Analyzing Logs of Large-Scale Software Systems using Time Curves Visualization	Dmytro Borysenkov et.al.	2411.05533	link
2024-11-14	SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark	Sithursan Sivasubramaniam et.al.	2411.05521	link
2024-11-08	Content Quality vs. Attention Allocation: An LLM-Based Case Study in Peer-to-peer Mental Health Networks	Teng Ye et.al.	2411.05328	null
2024-11-07	Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations	Joey Hong et.al.	2411.05194	null
2024-11-11	FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs?	Eric Wu et.al.	2411.05059	link
2024-11-07	Integrating Large Language Models for Genetic Variant Classification	Youssef Boulaimen et.al.	2411.05055	null
2024-11-07	Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability	Yanjun Gao et.al.	2411.04962	null
2024-11-19	Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?	Daniel P. Jeong et.al.	2411.04118	link
2024-11-07	MEG: Medical Knowledge-Augmented Large Language Models for Question Answering	Laura Cabello et.al.	2411.03883	link
2024-11-06	A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients	Yiming Li et.al.	2411.03805	null
2024-11-06	From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond	Harsha Nori et.al.	2411.03590	null
2024-11-05	Exploring Large Language Models for Specialist-level Oncology Care	Anil Palepu et.al.	2411.03395	null
2024-11-05	The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare	Souren Pashangpour et.al.	2411.03287	null
2024-11-05	[Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI	Maren Pielka et.al.	2411.02973	null
2024-11-04	Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge	Karthik Soman et.al.	2411.02657	link
2024-11-04	“It’s a conversation, not a quiz”: A Risk Taxonomy and Reflection Tool for LLM Adoption in Public Health	Jiawei Zhou et.al.	2411.02594	null
2024-11-01	Evaluating the Impact of Lab Test Results on Large Language Models Generated Differential Diagnoses from Clinical Case Vignettes	Balu Bhasuran et.al.	2411.02523	null
2024-11-01	Rationale-Guided Retrieval Augmented Generation for Medical Question Answering	Jiwoong Sohn et.al.	2411.00300	link
2024-11-16	RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models	Serena Zhang et.al.	2411.00299	null
2024-10-31	A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making	Yubin Kim et.al.	2411.00248	link
2024-10-31	Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning	John Wu et.al.	2411.00173	null
2024-10-28	A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges	Zifeng Wang et.al.	2411.00024	null
2024-10-31	Leveraging Large Language Models for Medical Information Extraction and Query Generation	Georgios Peikos et.al.	2410.23851	null
2024-10-31	Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding	Jinlong He et.al.	2410.23822	null
2024-10-31	The Potential of LLMs in Medical Education: Generating Questions and Answers for Qualification Exams	Yunqi Zhu et.al.	2410.23769	null
2024-11-01	Large Language Models for Patient Comments Multi-Label Classification	Hajar Sakai et.al.	2410.23528	null
2024-10-31	LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models	Hieu Tran et.al.	2410.23526	null
2024-10-29	Do Large Language Models Align with Core Mental Health Counseling Competencies?	Viet Cuong Nguyen et.al.	2410.22446	null
2024-10-29	Improving In-Context Learning with Small Language Model Ensembles	M. Mehdi Mojarradi et.al.	2410.21868	link
2024-10-28	Can Large Language Models Replace Data Scientists in Clinical Research?	Zifeng Wang et.al.	2410.21591	null
2024-10-28	LLM-Forest for Health Tabular Data Imputation	Xinrui He et.al.	2410.21520	null
2024-10-28	RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension	Abel Corrêa Dias et.al.	2410.21495	link
2024-11-01	“We do use it, but not how hearing people think”: How the Deaf and Hard of Hearing Community Uses Large Language Model Tools	Shuxu Huffman et.al.	2410.21358	null
2024-10-28	Large Language Model Benchmarks in Medical Tasks	Lawrence K. Q. Yan et.al.	2410.21348	null
2024-10-27	Language Models And A Second Opinion Use Case: The Pocket Professional	David Noever et.al.	2410.20636	null
2024-10-26	Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks	Annalisa Szymanski et.al.	2410.20266	null
2024-10-26	Infectious Disease Forecasting in India using LLM’s and Deep Learning	Chaitya Shah et.al.	2410.20168	null
2024-10-26	AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels	Lei Li et.al.	2410.20050	link
2024-10-25	DualMAR: Medical-Augmented Representation from Dual-Expertise Perspectives	Pengfei Hu et.al.	2410.19955	link
2024-10-18	Novel Development of LLM Driven mCODE Data Model for Improved Clinical Trial Matching to Enable Standardization and Interoperability in Oncology Research	Aarsh Shekhar et.al.	2410.19826	null
2024-10-24	Inference time LLM alignment in single and multidomain preference spectrum	Sadat Shahriar et.al.	2410.19206	null
2024-10-24	Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use	Mohit Chandra et.al.	2410.19155	link
2024-10-24	Watermarking Large Language Models and the Generated Content: Opportunities and Challenges	Ruisi Zhang et.al.	2410.19096	null
2024-10-24	BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning	Yujuan Velvin Fu et.al.	2410.18955	null
2024-10-24	Demystifying Large Language Models for Medicine: A Primer	Qiao Jin et.al.	2410.18856	link
2024-10-24	Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare	Yifan Yang et.al.	2410.18460	null
2024-10-23	ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents	Yusheng Liao et.al.	2410.17657	link
2024-10-22	DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR	Miguel Contreras et.al.	2410.17363	null
2024-10-22	DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization	John X. Morris et.al.	2410.17035	null
2024-10-22	SleepCoT: A Lightweight Personalized Sleep Health Model via Chain-of-Thought Distillation	Huimin Zheng et.al.	2410.16924	null
2024-10-22	Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective	Xiaolan Chen et.al.	2410.16662	null
2024-10-21	How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?	Kenza Benkirane et.al.	2410.16574	link
2024-10-21	Large language models enabled multiagent ensemble method for efficient EHR data labeling	Jingwei Huang et.al.	2410.16543	null
2024-10-17	SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques	Qiming Guo et.al.	2410.16322	null
2024-10-22	MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report	Samrajya Thapa et.al.	2410.16239	link
2024-10-21	Fine-Tuning LLMs for Reliable Medical Question-Answering Services	Ali Anaissi et.al.	2410.16088	null
2024-10-21	Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding	Derong Xu et.al.	2410.15702	null
2024-10-21	Resource-Efficient Medical Report Generation using Large Language Models	Abdullah et.al.	2410.15642	null
2024-10-20	Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini	Chanseo Lee et.al.	2410.15528	null
2024-10-20	Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training	Shahrad Mohammadzadeh et.al.	2410.15460	null
2024-10-19	AutoFLUKA: A Large Language Model Based Framework for Automating Monte Carlo Simulations in FLUKA	Zavier Ndum Ndum et.al.	2410.15222	null
2024-10-19	Fine-tuning foundational models to code diagnoses from veterinary health records	Mayla R. Boguslav et.al.	2410.15186	null
2024-10-19	Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs	Xiaocheng Zhang et.al.	2410.15135	null
2024-10-19	LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Xuechen Guo et.al.	2410.15074	null
2024-10-18	Enabling Scalable Evaluation of Bias Patterns in Medical LLMs	Hamed Fayyaz et.al.	2410.14763	link
2024-10-18	Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning	Jialu Tang et.al.	2410.14464	null
2024-10-18	ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM	Songheng Zhang et.al.	2410.14331	null
2024-10-18	LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs	Yujun Zhou et.al.	2410.14182	null
2024-10-17	RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs	Jiatan Huang et.al.	2410.13987	null
2024-10-17	HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings	Varun Gumma et.al.	2410.13671	null
2024-10-17	MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling	Yakun Zhu et.al.	2410.13610	null
2024-10-17	Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?	Che Liu et.al.	2410.13523	null
2024-10-17	MedINST: Meta Dataset of Biomedical Instructions	Wenhan Han et.al.	2410.13458	link
2024-10-17	Augmentation Policy Generation for Image Classification Using Large Language Models	Ant Duru et.al.	2410.13453	null
2024-10-17	Representation Learning of Structured Data for Medical Foundation Models	Vijay Prakash Dwivedi et.al.	2410.13351	null
2024-10-17	CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy	Mian Zhang et.al.	2410.13218	null
2024-10-17	LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch	Caigao Jiang et.al.	2410.13213	link
2024-10-18	MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback	Zonghai Yao et.al.	2410.13191	link
2024-10-16	Leveraging LLMs for Translating and Classifying Mental Health Data	Konstantinos Skianis et.al.	2410.12985	null
2024-10-16	AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning	Mohammad Reza Rezaei et.al.	2410.12886	link
2024-10-13	IMAS: A Comprehensive Agentic Approach to Rural Healthcare Delivery	Agasthya Gangavarapu et.al.	2410.12868	link
2024-10-11	LLMD: A Large Language Model for Interpreting Longitudinal Medical Records	Robert Porter et.al.	2410.12860	null
2024-10-11	Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis	Ameer Hamza Shakur et.al.	2410.12858	null
2024-10-10	Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions	Per Niklas Waaler et.al.	2410.12848	null
2024-10-17	Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2	Mohamad Abdi et.al.	2410.12686	null
2024-10-17	MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration	Jinjie Wei et.al.	2410.12532	null
2024-10-16	Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation	Zerui Xu et.al.	2410.12476	null
2024-10-06	SouLLMate: An Adaptive LLM-Driven System for Advanced Mental Health Support and Assessment, Based on a Systematic Application Survey	Qiming Guo et.al.	2410.11859	null
2024-10-15	Y-Mol: A Multiscale Biomedical Knowledge-Guided Large Language Model for Drug Development	Tengfei Ma et.al.	2410.11550	null
2024-10-15	AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data	Xinjie Zhao et.al.	2410.11531	null
2024-10-15	HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications	Weijie Xu et.al.	2410.11239	null
2024-10-13	3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation	Hongxin Ding et.al.	2410.10901	null
2024-10-08	Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging	Ryota Tozuka et.al.	2410.10869	null
2024-10-08	CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept	YuXuan Wu et.al.	2410.10866	null
2024-10-06	Mitigating Hallucinations Using Ensemble of Knowledge Graph and Vector Store in Large Language Models to Enhance Mental Health Support	Abdul Muqtadir et.al.	2410.10853	null
2024-10-06	On the Reliability of Large Language Models to Misinformed and Demographically-Informed Prompts	Toluwani Aremu et.al.	2410.10850	link
2024-10-14	Thinking LLMs: General Instruction Following with Thought Generation	Tianhao Wu et.al.	2410.10630	null
2024-10-14	Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts	Guorui Zheng et.al.	2410.10626	link
2024-10-14	MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media	Wei Zhai et.al.	2410.10323	link
2024-10-13	Adaptive Reasoning and Acting in Medical Language Agents	Abhishek Dutta et.al.	2410.10020	null
2024-10-15	MisinfoEval: Generative AI in the Era of “Alternative Facts”	Saadia Gabriel et.al.	2410.09949	null
2024-10-13	Equitable Access to Justice: Logical LLMs Show Promise	Manuj Kant et.al.	2410.09904	null
2024-10-13	MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions	Tavish Mankash et.al.	2410.09729	null
2024-10-12	Society of Medical Simplifiers	Chen Lyu et.al.	2410.09631	null
2024-10-12	Enhanced Electronic Health Records Text Summarization Using Large Language Models	Ruvarashe Madzime et.al.	2410.09628	null
2024-10-11	Fine-Tuning In-House Large Language Models to Infer Differential Diagnosis from Radiology Reports	Luoyao Chen et.al.	2410.09234	null
2024-10-04	Leveraging Social Determinants of Health in Alzheimer’s Research Using LLM-Augmented Literature Mining and Knowledge Graphs	Tianqi Shang et.al.	2410.09080	link
2024-10-11	Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness	Yu He Ke et.al.	2410.08431	null
2024-10-10	Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions	Kuleen Sasse et.al.	2410.07951	null
2024-10-09	MoDEM: Mixture of Domain Expert Models	Toby Simonds et.al.	2410.07490	null
2024-10-16	Mental Disorders Detection in the Era of Large Language Models	Gleb Kuzmin et.al.	2410.07129	null
2024-10-09	Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback	Dennis Hein et.al.	2410.07025	null
2024-10-09	Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare	Pardis Sadat Zahraei et.al.	2410.06566	null
2024-10-08	Exploring Large Language Models Through a Neurodivergent Lens: Use, Challenges, Community-Driven Workarounds, and Concerns	Buse Carik et.al.	2410.06336	null
2024-10-08	Linking Code and Documentation Churn: Preliminary Analysis	Ani Hovhannisyan et.al.	2410.05992	null
2024-10-10	KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server	Wenhao Wang et.al.	2410.05725	link
2024-10-10	Copiloting Diagnosis of Autism in Real Clinical Scenarios via LLMs	Yi Jiang et.al.	2410.05684	null
2024-10-07	RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction	Yuwei Zhang et.al.	2410.05361	null
2024-10-14	Mitigating the Risk of Health Inequity Exacerbated by Large Language Models	Yuelyu Ji et.al.	2410.05180	null
2024-10-07	Rule-based Data Selection for Large Language Models	Xiaomin Li et.al.	2410.04715	null
2024-10-07	Knowledge Graph Based Agent for Complex, Knowledge-Intensive QA in Medicine	Xiaorui Su et.al.	2410.04660	null
2024-10-06	CardioAI: A Multimodal AI-based System to Support Symptom Monitoring and Risk Detection of Cancer Treatment-Induced Cardiotoxicity	Siyi Wu et.al.	2410.04592	null
2024-10-06	Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval	Pengcheng Jiang et.al.	2410.04585	link
2024-10-06	MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration	Lai Wei et.al.	2410.04521	link
2024-10-06	Latent Feature Mining for Predictive Model Enhancement with Large Language Models	Bingxuan Li et.al.	2410.04347	null
2024-10-05	RoQLlama: A Lightweight Romanian Adapted Language Model	George-Andrei Dima et.al.	2410.04269	null
2024-10-05	DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech	Dominika Woszczyk et.al.	2410.04188	null
2024-10-05	Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment	Chengfeng Dou et.al.	2410.04112	null
2024-10-04	Searching for Best Practices in Medical Transcription with Large Language Model	Jiafeng Li et.al.	2410.03797	link
2024-10-01	Towards Democratization of Subspeciality Medical Expertise	Jack W. O’Sullivan et.al.	2410.03741	null
2024-10-01	Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model	Aidan Gilson et.al.	2410.03740	null
2024-10-04	Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs)	Abrar Rahman et.al.	2410.03568	null
2024-10-04	CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios	Zetian Ouyang et.al.	2410.03502	link
2024-10-04	Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity	Hyosoon Jang et.al.	2410.03138	null
2024-10-04	Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models	Yan Chen et.al.	2410.03134	null
2024-10-04	Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks	Grant Wardle et.al.	2410.03062	null
2024-10-03	HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router	Lingrui Mei et.al.	2410.02684	link
2024-10-03	ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration	Zixiang Wang et.al.	2410.02551	null
2024-10-04	MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation	Gurucharan Marthi Krishna Kumar et.al.	2410.02458	null
2024-10-02	Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics	Yuan Zhou et.al.	2410.02026	null
2024-09-27	A GEN AI Framework for Medical Note Generation	Hui Yi Leong et.al.	2410.01841	null
2024-10-02	DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning	Yebowen Hu et.al.	2410.01772	null
2024-10-03	Practicing Stress Relief for the Everyday: Designing Social Simulation Using VR, AR, and LLMs	Anna Fang et.al.	2410.01672	null
2024-10-02	MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework	Zonghai Yao et.al.	2410.01553	link
2024-10-01	FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks	Peiran Wu et.al.	2410.01089	null
2024-10-01	Deceptive Risks in LLM-enhanced Robots	Robert Ranisch et.al.	2410.00434	null
2024-10-01	CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset	Xiao Wang et.al.	2410.00379	link
2024-10-01	Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis	Chun-Hsiao Yeh et.al.	2410.00292	null
2024-09-30	A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification	Marina Ribeiro et.al.	2410.00250	null
2024-09-30	EEG Emotion Copilot: Pruning LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation	Hongyu Chen et.al.	2410.00166	null
2024-09-30	Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation	Pedro Henrique Paiola et.al.	2410.00163	null
2024-09-30	Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments	Iker De la Iglesia et.al.	2409.20565	null
2024-09-30	Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation	Shan Chen et.al.	2409.20385	null
2024-09-30	Classification of Radiological Text in Small and Imbalanced Datasets in a Non-English Language	Vincent Beliveau et.al.	2409.20147	link
2024-10-01	See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning	Chengxin Zheng et.al.	2409.19676	link
2024-09-29	MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models	Vibhor Agarwal et.al.	2409.19492	null
2024-10-11	HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations	Ziyu Wang et.al.	2409.19487	null
2024-09-28	INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning	Pablo Romero et.al.	2409.19467	link
2024-09-27	Confidential Prompting: Protecting User Prompts from Cloud LLM Providers	In Gim et.al.	2409.19134	link
2024-09-27	Secure Multiparty Generative AI	Manil Shrestha et.al.	2409.19120	null
2024-09-27	Outlining the Borders for LLM Applications in Patient Education: Developing an Expert-in-the-Loop LLM-Powered Chatbot for Prostate Cancer Patient Education	Yuexing Hao et.al.	2409.19100	null
2024-10-01	AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow	Huizi Yu et.al.	2409.18924	null
2024-09-27	Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications	Aditi Godbole et.al.	2409.18454	null
2024-09-26	Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model	Chuang Niu et.al.	2409.18319	link
2024-09-26	Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams	Yuexing Hao et.al.	2409.18290	link
2024-09-26	Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT	Natthanaphop Isaradech et.al.	2409.17683	null
2024-09-26	Digital Twin Ecosystem for Oncology Clinical Operations	Himanshu Pandey et.al.	2409.17650	null
2024-09-26	ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue	Zhangpu Li et.al.	2409.17610	null
2024-09-26	A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models	Syed Affan Daimi et.al.	2409.17581	link
2024-09-26	Dr. GPT in Campus Counseling: Understanding Higher Education Students’ Opinions on LLM-assisted Mental Health Services	Owen Xingjian Zhang et.al.	2409.17572	null
2024-09-26	Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Xun Zhu et.al.	2409.17508	link
2024-09-25	Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset	Konstantinos Skianis et.al.	2409.17397	null
2024-09-25	Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia	Azmul Asmar Irfan et.al.	2409.17054	null
2024-09-25	The Role of Language Models in Modern Healthcare: A Comprehensive Review	Amna Khalid et.al.	2409.16860	null
2024-10-04	“It Explains What I am Currently Going Through Perfectly to a Tee”: Understanding User Perceptions on LLM-Enhanced Narrative Interventions	Ananya Bhattacharjee et.al.	2409.16732	null
2024-09-25	In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results	Mike Thelwall et.al.	2409.16695	null
2024-09-25	Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels	Yishu Wei et.al.	2409.16563	null
2024-09-24	Design and Evaluation of a CDSS for Drug Allergy Management Using LLMs and Pharmaceutical Data Integration	Gabriele De Vito et.al.	2409.16395	null
2024-09-24	CHBench: A Chinese Dataset for Evaluating Health in Large Language Models	Chenlu Guo et.al.	2409.15766	link
2024-09-24	XTRUST: On the Multilingual Trustworthiness of Large Language Models	Yahan Li et.al.	2409.15762	link
2024-09-24	A Comprehensive Evaluation of Large Language Models on Mental Illnesses	Abdelrahman Hanafi et.al.	2409.15687	null
2024-09-23	Voice Assistants for Health Self-Management: Designing for and with Older Adults	Amama Mahmood et.al.	2409.15488	null
2024-09-20	Prompting Large Language Models for Supporting the Differential Diagnosis of Anemia	Elisa Castagnari et.al.	2409.15377	null
2024-09-23	A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?	Yunfei Xie et.al.	2409.15277	null
2024-09-23	Generative AI Is Not Ready for Clinical Use in Patient Education for Lower Back Pain Patients, Even With Retrieval-Augmented Generation	Yi-Fei Zhao et.al.	2409.15260	null
2024-09-24	PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models	Zhiyuan Wang et.al.	2409.15188	link
2024-09-23	Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies	Skatje Myers et.al.	2409.15163	null
2024-09-23	Boosting Healthcare LLMs Through Retrieved Context	Jordi Bayarri-Planas et.al.	2409.15127	link
2024-09-20	Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory	Kunyao Lan et.al.	2409.15084	null
2024-09-23	Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs	Clément Christophe et.al.	2409.14988	null
2024-09-23	Knowledge Planning in Large Language Models for Domain-Aligned Counseling Summarization	Aseem Srivastava et.al.	2409.14907	null
2024-09-24	Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding	Bokang Bi et.al.	2409.14638	null
2024-09-22	Can Large Language Models Logically Predict Myocardial Infarction? Evaluation based on UK Biobank Cohort	Yuxing Zhi et.al.	2409.14478	null
2024-09-22	PretextTrans: Investigating Medical Factual Knowledge Mastery of LLMs with Predicate-text Dual Transformation	Yuxuan Zhou et.al.	2409.14302	null
2024-09-21	Current Trends and Future Directions for Sexual Health Conversational Agents (CAs) for Youth: A Scoping Review	Jinkyung Katie Park et.al.	2409.14226	null
2024-09-20	Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology	Aidan Gilson et.al.	2409.13902	null
2024-09-20	Transfer Learning with Clinical Concept Embeddings from Large Language Models	Yuhe Gao et.al.	2409.13893	null
2024-09-11	A Simplified Retriever to Improve Accuracy of Phenotype Normalizations by Large Language Models	Daniel B. Hier et.al.	2409.13744	null
2024-09-20	Recent Advancement of Emotion Cognition in Large Language Models	Yuyan Chen et.al.	2409.13354	null
2024-09-20	SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation	Jinge Wu et.al.	2409.13321	link
2024-09-20	An adapted large language model facilitates multiple medical tasks in diabetes care	Lai Wei et.al.	2409.13191	link
2024-09-19	A New Perspective on ADHD Research: Knowledge Graph Construction with LLMs and Network Based Insights	Hakan T. Otal et.al.	2409.12853	link
2024-09-20	Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization	Thomas Savage et.al.	2409.12741	null
2024-09-11	Semantic Interoperability on Blockchain by Generating Smart Contracts Based on Knowledge Graphs	William Van Woensel et.al.	2409.12171	null
2024-09-19	Using Large Language Models to Generate Clinical Trial Tables and Figures	Yumeng Yang et.al.	2409.12046	null
2024-09-20	Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources	Issey Sukeda et.al.	2409.11783	link
2024-09-17	Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification	Fatema-E-Jannat et.al.	2409.11375	null
2024-09-17	ASHABot: An LLM-Powered Chatbot to Support the Informational Needs of Community Health Workers	Pragnya Ramjee et.al.	2409.10913	null
2024-09-16	GPT takes the SAT: Tracing changes in Test Difficulty and Math Performance of Students	Vikram Krishnaveti et.al.	2409.10750	null
2024-09-15	Veridical Data Science for Medical Foundation Models	Ahmed Alaa et.al.	2409.10580	null
2024-09-14	On the limits of agency in agent-based models	Ayush Chopra et.al.	2409.10568	link
2024-09-16	DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction	John Wu et.al.	2409.10504	null
2024-09-17	Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot	Bhuvan Sachdeva et.al.	2409.10354	null
2024-09-16	LLMs for clinical risk prediction	Mohamed Rezk et.al.	2409.10191	null
2024-09-16	MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM	Sijie Ji et.al.	2409.10064	null
2024-09-18	HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making	Sumera Anjum et.al.	2409.10011	link
2024-09-15	GP-GPT: Large Language Model for Gene-Phenotype Mapping	Yanjun Lyu et.al.	2409.09825	null
2024-09-15	AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs	Madhusudan Ghosh et.al.	2409.09704	link
2024-09-17	ExploreSelf: Fostering User-driven Exploration and Reflection on Personal Challenges with Adaptive Guidance by Large Language Models	Inhwa Song et.al.	2409.09662	null
2024-09-15	MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences	Subigya Nepal et.al.	2409.09570	null
2024-09-14	Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation	Hui Yi Leong et.al.	2409.09324	null
2024-09-24	Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases	Mercy Asiedu et.al.	2409.09201	null
2024-09-13	Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation	Cheng Charles Ma et.al.	2409.09135	null
2024-08-30	OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography	Youzhu Jin et.al.	2409.09052	null
2024-09-13	Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes	Luis Rita et.al.	2409.08792	null
2024-09-13	Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling	Jialu Tang et.al.	2409.08788	null
2024-09-13	Eir: Thai Medical Large Language Models	Yutthakorn Thiprak et.al.	2409.08523	null
2024-09-11	Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation	Gavin Butts et.al.	2409.07424	null
2024-09-11	MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications	Praveen K Kanithi et.al.	2409.07314	null
2024-09-11	Reranking Laws for Language Generation: A Communication-Theoretic Perspective	António Farinhas et.al.	2409.07131	null
2024-09-10	MAGDA: Multi-agent guideline-driven diagnostic assistance	David Bani-Harouni et.al.	2409.06351	null
2024-09-10	Can Large Language Models Unlock Novel Scientific Research Ideas?	Sandeep Kumar et.al.	2409.06185	link
2024-09-10	Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines	Yining Chen et.al.	2409.06164	link
2024-09-09	Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach	Meng Zhou et.al.	2409.05732	null
2024-09-09	The Influence of Task and Group Disparities over Users’ Attitudes Toward Using Large Language Models for Psychotherapy	Qihang He et.al.	2409.05703	null
2024-09-09	KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models	Yingshu Li et.al.	2409.05370	null
2024-09-06	Toward LLM-Powered Social Robots for Supporting Sensitive Disclosures of Stigmatized Health Conditions	Alemitu Bezabih et.al.	2409.04508	null
2024-09-06	Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials	Yizhen Zheng et.al.	2409.04481	null
2024-09-06	Towards Safer Online Spaces: Simulating and Assessing Intervention Strategies for Eating Disorder Discussions	Louis Penafiel et.al.	2409.04043	null
2024-09-05	CACER: Clinical Concept Annotations for Cancer Events and Relations	Yujuan Fu et.al.	2409.03905	link
2024-09-05	LLM-based event abstraction and integration for IoT-sourced logs	Mohsen Shirali et.al.	2409.03478	link
2024-09-05	Rx Strategist: Prescription Verification using LLM Agents System	Phuc Phan Van et.al.	2409.03440	null
2024-09-05	Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time	Francisco de Arriba-Pérez et.al.	2409.03375	null
2024-09-05	Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration	Jeremy Qin et.al.	2409.03225	link
2024-09-04	Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models	Chih-Yuan Li et.al.	2409.02530	null
2024-09-03	Therapy as an NLP Task: Psychologists’ Comparison of LLMs and Human Peers in CBT	Zainab Iftikhar et.al.	2409.02244	null
2024-09-03	Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation	Jack Krolik et.al.	2409.01941	null
2024-09-03	Training on the Benchmark Is Not All You Need	Shiwen Ni et.al.	2409.01790	link
2024-09-03	It is Time to Develop an Auditing Framework to Promote Value Aware Chatbots	Yanchen Wang et.al.	2409.01539	link
2024-09-02	DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models	Rajat Rawat et.al.	2409.01497	null
2024-09-01	Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering	Derian Boer et.al.	2409.00861	link
2024-09-01	Building FKG.in: a Knowledge Graph for Indian Food	Saransh Kumar Gupta et.al.	2409.00830	null
2024-08-31	Large Language Models-Enabled Digital Twins for Precision Medicine in Rare Gynecological Tumors	Jacqueline Lammert et.al.	2409.00544	link
2024-08-31	Chatting Up Attachment: Using LLMs to Predict Adult Bonds	Paulo Soares et.al.	2409.00347	null
2024-08-29	A Survey for Large Language Models in Biomedicine	Chong Wang et.al.	2409.00133	null
2024-08-27	Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy	Daniil Filienko et.al.	2409.00112	null
2024-08-27	Large Language Models for Disease Diagnosis: A Scoping Review	Shuang Zhou et.al.	2409.00097	null
2024-09-04	Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models	Seyed Amir Ahmad Safavi-Naini et.al.	2409.00084	link
2024-08-30	NDP: Next Distribution Prediction as a More Broad Target	Junhao Ruan et.al.	2408.17377	null
2024-08-29	Instruction-tuned Large Language Models for Machine Translation in the Medical Domain	Miguel Rios et.al.	2408.16440	null
2024-08-29	Enhancing AI-Driven Psychological Consultation: Layered Prompts with Large Language Models	Rafael Souza et.al.	2408.16276	null
2024-08-29	M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation	Jonggwon Park et.al.	2408.16213	null
2024-08-28	Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions	Huachuan Qiu et.al.	2408.15787	link
2024-08-28	A Survey on Evaluation of Multimodal Large Language Models	Jiaxing Huang et.al.	2408.15769	null
2024-08-26	Improving Clinical Note Generation from Complex Doctor-Patient Conversation	Yizhan Li et.al.	2408.14568	null
2024-09-06	MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues	Kuluhan Binici et.al.	2408.14418	null
2024-09-03	Foundation Models for Music: A Survey	Yinghao Ma et.al.	2408.14340	link
2024-08-25	Biomedical Large Language Models Seem not to be Superior to Generalist Models on Unseen Medical Data	Felix J. Dorfner et.al.	2408.13833	null
2024-08-25	Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models	Duy Khoa Pham et.al.	2408.13808	null
2024-08-23	IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models	Zhihao Yu et.al.	2408.13073	link
2024-08-23	Guiding IoT-Based Healthcare Alert Systems with Large Language Models	Yulan Gao et.al.	2408.13071	null
2024-08-23	Grounding Fallacies Misrepresenting Scientific Publications in Evidence	Max Glockner et.al.	2408.12812	link
2024-08-22	RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment	Xiaohan Wang et.al.	2408.12579	null
2024-09-05	Towards Evaluating and Building Versatile Large Language Models for Medicine	Chaoyi Wu et.al.	2408.12547	link
2024-08-22	MEDCO: Medical Education Copilots Based on A Multi-Agent Framework	Hao Wei et.al.	2408.12496	null
2024-08-22	Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations	Kai Tzu-iunn Ong et.al.	2408.12315	null
2024-08-22	LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction	Aishik Nagar et.al.	2408.12249	null
2024-08-22	MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient	Yanzeng Li et.al.	2408.12236	null
2024-08-22	Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards	Shresth Verma et.al.	2408.12112	null
2024-08-22	Aligning (Medical) LLMs for (Counterfactual) Fairness	Raphael Poulain et.al.	2408.12055	link
2024-08-21	Exploring Large Language Models for Feature Selection: A Data-centric Perspective	Dawei Li et.al.	2408.12025	null
2024-08-16	Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI	Arindam Sett et.al.	2408.11861	null
2024-08-15	When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?	Yanjun Gao et.al.	2408.11854	null
2024-08-13	MGH Radiology Llama: A Llama 3 70B Model for Radiology	Yucheng Shi et.al.	2408.11848	null
2024-09-01	Clinical Insights: A Comprehensive Review of Language Models in Medicine	Nikita Neveditsin et.al.	2408.11735	null
2024-08-21	BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports	Yuxuan Chen et.al.	2408.11334	null
2024-08-21	Probabilistic Medical Predictions of Large Language Models	Bowen Gu et.al.	2408.11316	null
2024-08-21	Applying and Evaluating Large Language Models in Mental Health Care: A Scoping Review of Human-Assessed Generative Tasks	Yining Hua et.al.	2408.11288	null
2024-08-21	BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation	Haotian Peng et.al.	2408.11281	link
2024-08-20	Public Health in Disaster: Emotional Health and Life Incidents Extraction during Hurricane Harvey	Thomas Hoang et.al.	2408.11133	null
2024-08-20	CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models	Michael Reinisch et.al.	2408.10995	null
2024-08-20	Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology	Yihao Hou et.al.	2408.10715	null
2024-08-20	Large Language Models for Multimodal Deformable Image Registration	Mingrui Ma et.al.	2408.10703	link
2024-08-19	Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory	Haoran Li et.al.	2408.10053	null
2024-08-29	MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis	Ruihui Hou et.al.	2408.10039	null
2024-08-19	Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions	Sebastian Heineking et.al.	2408.09831	link
2024-08-19	R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation	Xiao Wang et.al.	2408.09743	link
2024-08-18	Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities	Minh Duc Chu et.al.	2408.09366	null
2024-08-17	TC-RAG: Turing-Complete RAG’s Case study on Medical LLM Systems	Xinke Jiang et.al.	2408.09199	link
2024-08-17	AI Managed Emergency Documentation with a Pretrained Model	David Menzies et.al.	2408.09193	null
2024-08-16	Improving VTE Identification through Language Models from Radiology Reports: A Comparative Study of Mamba, Phi-3 Mini, and BERT	Jamie Deng et.al.	2408.09043	null
2024-08-16	HSDreport: Heart Sound Diagnosis with Echocardiography Reports	Zihan Zhao et.al.	2408.08669	null
2024-08-16	RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions	Gregory Kell et.al.	2408.08624	link
2024-08-15	Assessing and Enhancing Large Language Models in Rare Disease Question-answering	Guanchu Wang et.al.	2408.08422	null
2024-08-15	LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning	Jiajie Li et.al.	2408.07981	null
2024-08-15	The doctor will polygraph you now: ethical concerns with AI for fact-checking patients	James Anibal et.al.	2408.07896	null
2024-08-15	Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering	Yushi Yang et.al.	2408.07888	link
2024-08-14	MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis	Nimeesha Chan et.al.	2408.07773	link
2024-08-27	Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments	Seungjun Han et.al.	2408.07531	null
2024-08-14	Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health	Yongquan Hu et.al.	2408.07313	null
2024-07-24	Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition	Michele Fiori et.al.	2408.06352	null
2024-08-12	Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM	Trisha Das et.al.	2408.06285	null
2024-08-12	Med42-v2: A Suite of Clinical LLMs	Clément Christophe et.al.	2408.06142	null
2024-08-10	Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction	Jung Hoon Lim et.al.	2408.05555	null
2024-08-16	RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records	Sangjoon Park et.al.	2408.05074	null
2024-08-08	Hybrid Student-Teacher Large Language Model Refinement for Cancer Toxicity Symptom Extraction	Reza Khanmohammadi et.al.	2408.04775	null
2024-08-08	Dynamic Fog Computing for Enhanced LLM Execution in Medical Applications	Philipp Zagar et.al.	2408.04680	null
2024-08-03	Building Trust in Mental Health Chatbots: Safety Metrics and LLM-Based Evaluation Tools	Jung In Park et.al.	2408.04650	null
2024-08-08	Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation	Junde Wu et.al.	2408.04187	link
2024-08-08	Academic collaboration on large language model studies increases overall but varies across disciplines	Lingyao Li et.al.	2408.04163	link
2024-08-08	Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering	Haoran Yu et.al.	2408.04138	null
2024-08-07	Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology	Panagiotis Fytas et.al.	2408.04121	null
2024-08-07	Towards Multimodal Emotional Support Conversation Systems	Yuqi Chu et.al.	2408.03650	link
2024-08-06	Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation	Artur Guimarães et.al.	2408.03127	link
2024-08-06	Targeted Visual Prompting for Medical Visual Question Answering	Sergio Tascon-Morales et.al.	2408.03043	link
2024-08-06	Fact Finder – Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs	Daniel Steinigen et.al.	2408.03010	link
2024-08-07	Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval	Iman Azimi et.al.	2408.02964	link
2024-08-04	MedSyn: LLM-based Synthetic Medical Text Generation Framework	Gleb Kumichev et.al.	2408.02056	link
2024-08-06	DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models	Bowen Wang et.al.	2408.01933	link
2024-08-03	MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance	Jihye Choi et.al.	2408.01869	link
2024-07-27	AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools	Aditya Paul et.al.	2408.01459	null
2024-08-02	The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models	Hannah Chen et.al.	2408.01285	null
2024-08-05	Agentic LLM Workflows for Generating Patient-Friendly Medical Reports	Malavikha Sudarshan et.al.	2408.01112	link
2024-08-01	Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions	Guangzhi Xiong et.al.	2408.00727	link
2024-07-25	Closing the gap between open-source and commercial large language models for medical evidence summarization	Gongbo Zhang et.al.	2408.00588	null
2024-07-31	A Taxonomy of Stereotype Content in Large Language Models	Gandalf Nicolas et.al.	2408.00162	null
2024-07-31	A Course Shared Task on Evaluating LLM Output for Clinical Questions	Yufang Hou et.al.	2408.00122	link
2024-07-24	Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications	Cui Long et.al.	2407.21055	null
2024-07-23	An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice	Roma Shusterman et.al.	2407.21051	null
2024-08-12	Artificial Intelligence in Extracting Diagnostic Data from Dental Records	Yao-Shun Chuang et.al.	2407.21050	null
2024-07-30	Can LLMs be Fooled? Investigating Vulnerabilities in LLMs	Sara Abdali et.al.	2407.20529	null
2024-07-29	Exploring Large Language Models to generate Easy to Read content	Paloma Martínez et.al.	2407.20046	null
2024-07-30	CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare	Jingwei Zhu et.al.	2407.19705	link
2024-07-28	A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy	Meng Jiang et.al.	2407.19422	null
2024-07-27	The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations	Thanh-Dung Le et.al.	2407.19299	null
2024-07-27	Multi-Modal CLIP-Informed Protein Editing	Mingze Yin et.al.	2407.19296	null
2024-07-27	Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review	Tongyue Shi et.al.	2407.19256	null
2024-07-26	Large Language Models as Co-Pilots for Causal Inference in Medical Studies	Ahmed Alaa et.al.	2407.19118	null
2024-07-26	Towards Automated Solution Recipe Generation for Industrial Asset Management with LLM	Nianjun Zhou et.al.	2407.18992	null
2024-07-26	Is larger always better? Evaluating and prompting large language models for non-generative medical tasks	Yinghao Zhu et.al.	2407.18525	link
2024-07-24	Online Social Network Data-Driven Early Detection on Short-Form Video Addiction	Fang-Yu Kuo et.al.	2407.18277	null
2024-07-25	The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation	Eric Yang et.al.	2407.18044	null
2024-08-15	The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer	Danqing Hu et.al.	2407.17900	null
2024-07-25	Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?	Hao Shen et.al.	2407.17730	null
2024-07-24	IgnitionInnovators at “Discharge Me!”: Chain-of-Thought Instruction Finetuning Large Language Models for Discharge Summaries	An Quang Tang et.al.	2407.17636	link
2024-07-24	SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)	Bernardo Consoli et.al.	2407.17126	null
2024-07-23	Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models	Ioana Buhnila et.al.	2407.16565	link
2024-07-23	PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets	Jaeyoung Kim et.al.	2407.16329	null
2024-07-23	Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks	Yao-Shun Chuang et.al.	2407.16166	link
2024-07-16	Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis	Qiuhong Wei et.al.	2407.15862	null
2024-07-21	A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech	Gaurav Verma et.al.	2407.15227	null
2024-07-19	CVE-LLM: Automatic vulnerability evaluation in medical device industry using large language models	Rikhiya Ghosh et.al.	2407.14640	null
2024-07-19	Adversarial Databases Improve Success in Retrieval-based Large Language Models	Sean Wu et.al.	2407.14609	null
2024-07-19	Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis	Valentin Pelloin et.al.	2407.14180	link
2024-07-28	Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field	Tobias Kerner et.al.	2407.14076	null
2024-07-19	Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization	Md Sultan Al Nahian et.al.	2407.14000	null
2024-07-18	KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration	Youfu Yan et.al.	2407.13598	null
2024-07-18	Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks	Samy Ateia et.al.	2407.13511	link
2024-07-18	End-To-End Clinical Trial Matching with Large Language Models	Dyke Ferber et.al.	2407.13463	null
2024-07-18	CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis	Junying Chen et.al.	2407.13301	link
2024-07-18	TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models	Ling Yue et.al.	2407.13115	null
2024-07-03	Large Language Model Agents for Improving Engagement with Behavior Change Interventions: Application to Digital Mindfulness	Harsh Kumar et.al.	2407.13067	null
2024-07-17	Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models	Alexander R. Pelletier et.al.	2407.12888	link
2024-07-06	Large language models are good medical coders, if provided with tools	Keith Kwan et.al.	2407.12849	link
2024-07-04	NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions	Andong Hua et.al.	2407.12843	null
2024-07-02	Lightweight Large Language Model for Medication Enquiry: Med-Pal	Kabilan Elangovan et.al.	2407.12822	null
2024-07-18	Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions	Marcos Fernández-Pichel et.al.	2407.12468	link
2024-07-17	MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models	Thao Minh Nguyen Phan et.al.	2407.12309	null
2024-07-17	A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery	Jike Wang et.al.	2407.12296	null
2024-07-26	LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation	Bunyamin Keles et.al.	2407.12126	null
2024-06-30	Evaluation of Bias Towards Medical Professionals in Large Language Models	Xi Chen et.al.	2407.12031	null
2024-07-16	Schema Matching with Large Language Models: an Experimental Study	Marcel Parciak et.al.	2407.11852	link
2024-07-25	CCoE: A Compact LLM with Collaboration of Experts	Shaomang Huang et.al.	2407.11686	null
2024-07-16	Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise	Qimin Yang et.al.	2407.11536	null
2024-07-09	Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations	Rachael Fleurence et.al.	2407.11054	null
2024-06-25	Panacea: A foundation model for clinical trial search, summarization, design, and recruitment	Jiacheng Lin et.al.	2407.11007	link
2024-07-15	Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities	Nhat Le et.al.	2407.10785	null
2024-07-15	TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction	Xingzhi Zhou et.al.	2407.10510	null
2024-07-15	Enhancing Medication Recommendation with LLM Text Representation	Yu-Tzu Lee et.al.	2407.10453	null
2024-07-13	Causality extraction from medical text using Large Language Models (LLMs)	Seethalakshmi Gopalakrishnan et.al.	2407.10020	null
2024-07-13	PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models	Can Cui et.al.	2407.09979	null
2024-07-12	Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction	Chase Fensore et.al.	2407.09688	link
2024-07-12	Open (Clinical) LLMs are Sensitive to Instruction Phrasings	Alberto Mario Ceballos Arroyo et.al.	2407.09429	link
2024-07-12	STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs	Yiheng Huang et.al.	2407.09096	link
2024-07-11	Uncertainty Estimation of Large Language Models in Medical Question Answering	Jiaxin Wu et.al.	2407.08662	null
2024-07-11	Leveraging LLMs to Predict Affective States via Smartphone Sensor Features	Tianyi Zhang et.al.	2407.08240	null
2024-07-11	DALL-M: Context-Aware Clinical Data Augmentation with LLMs	Chihcheng Hsieh et.al.	2407.08227	link
2024-07-10	Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing	Ian Steenstra et.al.	2407.08095	link
2024-07-04	CaseGPT: a case reasoning framework based on language models and retrieval-augmented generation	Rui Yang et.al.	2407.07913	null
2024-07-10	A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability	Ting Fang Tan et.al.	2407.07666	null
2024-07-10	Interpretable Differential Diagnosis with Dual-Inference Large Language Models	Shuang Zhou et.al.	2407.07330	null
2024-07-09	Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges	Emilio Ferrara et.al.	2407.07196	null
2024-07-09	Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies	Inwon Kang et.al.	2407.07019	null
2024-07-09	End-To-End Causal Effect Estimation from Unstructured Natural Language Data	Nikita Dhawan et.al.	2407.07018	null
2024-07-08	Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities	Avinash Anand et.al.	2407.06125	null
2024-07-08	Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs	Sanjeet Singh et.al.	2407.05887	link
2024-07-08	PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation	Jinpeng Hu et.al.	2407.05721	link
2024-07-07	CLIMB: A Benchmark of Clinical Bias in Large Language Models	Yubo Zhang et.al.	2407.05250	link
2024-07-06	Leveraging Task-Specific Knowledge from LLM for Semi-Supervised 3D Medical Image Segmentation	Suruchi Kumari et.al.	2407.05088	null
2024-07-05	Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework	Reza Averly et.al.	2407.04629	null
2024-07-05	Using LLMs to label medical papers according to the CIViC evidence model	Markus Hisch et.al.	2407.04466	link
2024-07-04	Query-Guided Self-Supervised Summarization of Nursing Notes	Ya Gao et.al.	2407.04125	null
2024-07-04	Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval	Kazuaki Furumai et.al.	2407.03585	null
2024-07-03	Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory	Suyeon Lee et.al.	2407.03103	link
2024-07-03	SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research	Meghal Dani et.al.	2407.03004	null
2024-07-02	Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms	Viet Cuong Nguyen et.al.	2407.02662	null
2024-07-02	MMedAgent: Learning to Use Medical Tools with Multi-modal Agent	Binxu Li et.al.	2407.02483	link
2024-06-29	Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare	Forhan Bin Emdad et.al.	2407.01627	null
2024-07-14	Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles	Ryan Louie et.al.	2407.00870	null
2024-06-30	Large Language Models Struggle in Token-Level Clinical Named Entity Recognition	Qiuhao Lu et.al.	2407.00731	link
2024-06-29	Answering real-world clinical questions using large language model based systems	Yen Sia Low et.al.	2407.00541	null
2024-06-29	ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees	Zhiyuan Wang et.al.	2407.00499	link
2024-06-28	EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models	João Matos et.al.	2407.00242	link
2024-07-02	Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges	Mahmoud Ibrahim et.al.	2407.00116	null
2024-06-27	PathAlign: A vision-language model for whole slide images in histopathology	Faruk Ahmed et.al.	2406.19578	null
2024-06-27	PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models	Cathy Mengying Fang et.al.	2406.19283	null
2024-06-27	HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale	Junying Chen et.al.	2406.19280	link
2024-06-26	Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources	Yiming Li et.al.	2406.18049	null
2024-06-26	LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them	Wenya Xie et.al.	2406.18034	null
2024-06-26	Automated Clinical Data Extraction with Knowledge Conditioned LLMs	Diya Li et.al.	2406.18027	null
2024-07-11	Multi-step Inference over Unstructured Data	Aditya Kalyanpur et.al.	2406.17987	null
2024-06-25	Accelerating Clinical Evidence Synthesis with Large Language Models	Zifeng Wang et.al.	2406.17755	null
2024-07-06	MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation	Yusheng Liao et.al.	2406.17484	link
2024-06-25	Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis	Ajan Subramanian et.al.	2406.16252	null
2024-06-23	Effectiveness of ChatGPT in explaining complex medical reports to patients	Mengxuan Sun et.al.	2406.15963	null
2024-06-22	Real-time Speech Summarization for Medical Conversations	Khai Le-Duc et.al.	2406.15888	link
2024-06-16	WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist	Chenyu Ren et.al.	2406.15474	null
2024-06-15	Mental Disorder Classification via Temporal Representation of Text	Raja Kumar et.al.	2406.15470	null
2024-06-21	Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms	Santiago Berrezueta-Guzman et.al.	2406.15198	null
2024-06-21	Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction	Jinge Wu et.al.	2406.15045	null
2024-06-21	MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens	Yongqi Fan et.al.	2406.15019	link
2024-06-21	Human-AI collectives produce the most accurate differential diagnoses	N. Zöller et.al.	2406.14981	link
2024-06-21	70B-parameter large language models in Japanese medical question-answering	Issey Sukeda et.al.	2406.14882	null
2024-06-27	Efficient Continual Pre-training by Mitigating the Stability Gap	Yiduo Guo et.al.	2406.14833	null
2024-07-01	ACR: A Benchmark for Automatic Cohort Retrieval	Dung Ngoc Thai et.al.	2406.14780	null
2024-06-20	A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes	Syed I. Munzir et.al.	2406.14757	null
2024-06-20	medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs	Mingyi Jia et.al.	2406.14326	link
2024-06-19	ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World	Weixiang Yan et.al.	2406.13890	link
2024-06-24	The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis	Marcin Rządeczka et.al.	2406.13813	null
2024-06-19	Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health	Bo Wen et.al.	2406.13659	null
2024-06-19	Optimizing Psychological Counseling with Instruction-Tuned Large Language Models	Wenjie Li et.al.	2406.13617	null
2024-06-19	Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective	David Restrepo et.al.	2406.13152	null
2024-06-18	Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia	Ankit Aich et.al.	2406.12687	null
2024-06-18	Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics	Huan Xu et.al.	2406.12651	null
2024-06-20	Towards a Client-Centered Assessment of LLM Therapists by Client Simulation	Jiashuo Wang et.al.	2406.12266	link
2024-06-18	Adversarial Attacks on Large Language Models in Medicine	Yifan Yang et.al.	2406.12259	null
2024-06-18	Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models	Lulu Zhao et.al.	2406.12182	null
2024-06-19	Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks	Jack Gallifant et.al.	2406.12066	link
2024-06-28	WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions	Seyedali Mohammadi et.al.	2406.12058	link
2024-06-30	MedCalc-Bench: Evaluating Large Language Models for Medical Calculations	Nikhil Khandekar et.al.	2406.12036	link
2024-06-19	Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models	Yuqing Wang et.al.	2406.12033	link
2024-06-17	Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams	Zheheng Luo et.al.	2406.11328	link
2024-06-17	Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization	Minda Hu et.al.	2406.11258	null
2024-06-16	RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information	Zhiwei Liu et.al.	2406.11093	link
2024-06-15	SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task	Ziije Zhong et.al.	2406.10710	link
2024-06-15	We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation	Palash Moon et.al.	2406.10561	null
2024-06-15	CancerLLM: A Large Language Model in Cancer Domain	Mingchen Li et.al.	2406.10459	null
2024-06-14	Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework	Olivier Binette et.al.	2406.10366	null
2024-06-14	A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations	Jinqiang Wang et.al.	2406.10303	link
2024-06-13	Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark	Chufan Gao et.al.	2406.10292	null
2024-06-11	Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis	Matteo Esposito et.al.	2406.10273	null
2024-06-14	Detecting and Evaluating Medical Hallucinations in Large Vision Language Models	Jiawei Chen et.al.	2406.10185	null
2024-06-14	CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions	Mingyu Derek Ma et.al.	2406.09923	link
2024-06-13	Chain-of-Thought (CoT) prompting strategies for medical error detection and correction	Zhaolong Wu et.al.	2406.09103	null
2024-06-13	Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations	Jun-Woo Kim et.al.	2406.08718	null
2024-06-12	Large Language Model (LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization	Fengxiao Tang et.al.	2406.08305	null
2024-06-18	SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature	David Wadden et.al.	2406.07835	link
2024-06-12	Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images	Che Liu et.al.	2406.07146	null
2024-06-10	Large language models for generating rules, yay or nay?	Shangeetha Sivasothy et.al.	2406.06835	null
2024-06-10	Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing	Enshuo Hsu et.al.	2406.06723	null
2024-06-09	LLM Questionnaire Completion for Automatic Psychiatric Assessment	Gony Rosenman et.al.	2406.06636	null
2024-06-07	Transforming Dental Diagnostics with Artificial Intelligence: Advanced Integration of ChatGPT and Large Language Models for Patient Care	Masoumeh Farhadi Nia et.al.	2406.06616	null
2024-06-03	MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering	Robert Osazuwa Ness et.al.	2406.06573	null
2024-06-10	Towards a Personal Health Large Language Model	Justin Cosentino et.al.	2406.06474	null
2024-06-11	Transforming Wearable Data into Health Insights using Large Language Model Agents	Mike A. Merrill et.al.	2406.06464	null
2024-06-13	A Large Language Model Pipeline for Breast Cancer Oncology	Tristen Pool et.al.	2406.06455	null
2024-06-10	Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain	Brian Hu et.al.	2406.06435	link
2024-06-10	MedExQA: Medical Question Answering Benchmark with Multiple Explanations	Yunsoo Kim et.al.	2406.06331	link
2024-06-10	Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text	Avijit Mitra et.al.	2406.06056	link
2024-06-10	Enhancing Food Safety in Supply Chains: The Potential Role of Large Language Models in Preventing Campylobacter Contamination	Asaf Tzachor et.al.	2406.06049	null
2024-06-09	Zero-Shot End-To-End Spoken Question Answering In Medical Domain	Yanis Labrak et.al.	2406.05876	null
2024-06-09	MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering	Juraj Vladika et.al.	2406.05845	null
2024-06-08	Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification	Yunhe Gao et.al.	2406.05596	null
2024-06-07	TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models	Ping Yu et.al.	2406.04941	null
2024-06-06	On The Persona-based Summarization of Domain-Specific Documents	Ankan Mullick et.al.	2406.03986	link
2024-06-06	UltraMedical: Building Specialized Generalists in Biomedicine	Kaiyan Zhang et.al.	2406.03949	link
2024-06-06	Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As	Eden Avnat et.al.	2406.03855	null
2024-06-06	A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions	Lei Liu et.al.	2406.03712	null
2024-06-06	M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering	Anand Subramanian et.al.	2406.03699	link
2024-06-05	Missci: Reconstructing Fallacies in Misrepresented Science	Max Glockner et.al.	2406.03181	link
2024-06-05	MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge	Yuxuan Zhou et.al.	2406.02919	link
2024-06-04	Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data	Maxime Griot et.al.	2406.02394	link
2024-06-05	LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing	Maojun Sun et.al.	2406.02350	link
2024-06-04	Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study	Martin J. Hetz et.al.	2406.01428	null
2024-06-03	TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine	Wenjing Yue et.al.	2406.01126	null
2024-06-04	MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning	Shuyue Stella Li et.al.	2406.00922	link
2024-05-29	Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study	David Pissarra et.al.	2406.00062	null
2024-05-27	EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling	Yinghao Zhu et.al.	2406.00036	null
2024-05-22	KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR	Hajung Kim et.al.	2406.00014	null
2024-05-26	Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models	Xijie Huang et.al.	2405.20775	link
2024-05-31	GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models	Mohammed-Khalil Ghali et.al.	2405.20585	null
2024-05-30	PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals	Ruiyi Wang et.al.	2405.19660	link
2024-05-30	Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router	Akul Goel et.al.	2405.19631	null
2024-05-26	ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text	Han Yu et.al.	2405.19366	link
2024-05-29	Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models	Tianrun Chen et.al.	2405.19326	null
2024-06-03	PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications	Dingkang Yang et.al.	2405.19266	link
2024-05-28	Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation	Anjanava Biswas et.al.	2405.18346	null
2024-05-28	Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints	Aryo Pradipta Gema et.al.	2405.18028	null
2024-05-28	SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions	Juexiao Zhou et.al.	2405.18004	null
2024-05-26	Augmented Risk Prediction for the Onset of Alzheimer’s Disease from Electronic Health Records with Large Language Models	Jiankun Wang et.al.	2405.16413	null
2024-05-26	Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions	Man Luo et.al.	2405.16402	null
2024-05-29	Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data	Yuhao Chen et.al.	2405.16295	null
2024-05-28	Ensuring Ground Truth Accuracy in Healthcare with the EVINCE framework	Edward Y. Chang et.al.	2405.15808	null
2024-05-27	Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development	Pranab Sahoo et.al.	2405.15766	link
2024-05-24	Efficient Reinforcement Learning via Large Language Model-based Search	Siddhant Bhambri et.al.	2405.15194	null
2024-05-24	Generalizable and Scalable Multistage Biomedical Concept Normalization Leveraging Large Language Models	Nicholas J Dobbins et.al.	2405.15122	link
2024-05-23	Evaluating Large Language Models for Public Health Classification and Extraction Tasks	Joshua Harris et.al.	2405.14766	null
2024-05-23	Exploring the use of a Large Language Model for data extraction in systematic reviews: a rapid feasibility study	Lena Schmidt et.al.	2405.14445	null
2024-05-23	Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation	Zhusi Zhong et.al.	2405.14113	link
2024-05-22	Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation	Siyi Wu et.al.	2405.13803	null
2024-05-21	How Reliable AI Chatbots are for Disease Prediction from Patient Complaints?	Ayesha Siddika Nipu et.al.	2405.13219	null
2024-05-20	Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian	Rohitash Chandra et.al.	2405.13056	link
2024-05-20	Large Language Models for Medicine: A Survey	Yanxin Zheng et.al.	2405.13055	null
2024-05-12	Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data	Nan Miles Xi et.al.	2405.13005	null
2024-05-21	OLAPH: Improving Factuality in Biomedical Long-form Question Answering	Minbyul Jeong et.al.	2405.12701	link
2024-05-21	Exploration of Masked and Causal Language Modelling for Text Generation	Nicolo Micheletti et.al.	2405.12630	null
2024-05-21	DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge	Bufang Yang et.al.	2405.12541	null
2024-05-20	Can AI Relate: Testing Large Language Model Response for Mental Health Support	Saadia Gabriel et.al.	2405.12021	link
2024-05-19	Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning	Zishan Gu et.al.	2405.11640	null
2024-05-18	Can Public LLMs be used for Self-Diagnosis of Medical Conditions?	Nikil Sharan Prabahar Balasubramanian et.al.	2405.11407	null
2024-05-18	Automating PTSD Diagnostics in Clinical Interviews: Leveraging Large Language Models for Trauma Assessments	Sichang Tu et.al.	2405.11178	null
2024-05-17	From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT	Jace Grandinetti et.al.	2405.11040	null
2024-05-17	COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain	Dimitrios P. Panagoulias et.al.	2405.10893	null
2024-05-16	Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification	Jinge Wu et.al.	2405.10440	null
2024-05-14	PromptMind Team at EHRSQL-2024: Improving Reliability of SQL Generation using Ensemble LLMs	Satya K Gundabathula et.al.	2405.08839	null
2024-05-14	A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine	Hanguang Xiao et.al.	2405.08603	null
2024-05-14	PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles	Satya Kesav Gundabathula et.al.	2405.08373	null
2024-05-30	AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments	Samuel Schmidgall et.al.	2405.07960	null
2024-05-13	Evaluating large language models in medical applications: a survey	Xiaolan Chen et.al.	2405.07468	null
2024-05-10	A Global Data-Driven Model for The Hippocampus and Nucleus Accumbens of Rat From The Local Field Potential Recordings (LFP)	Maedeh Sadeghi et.al.	2405.06732	null
2024-05-09	Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses	Gaurav Kumar Gupta et.al.	2405.06712	null
2024-05-08	Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance	Goran Muric et.al.	2405.06703	null
2024-05-08	Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks	Chancellor R. Woolsey et.al.	2405.06695	null
2024-05-10	Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval	Mengjia Niu et.al.	2405.06545	null
2024-06-03	XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare	Fatemeh Nazary et.al.	2405.06270	null
2024-05-09	Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection	Bhawesh Kumar et.al.	2405.06093	null
2024-05-09	Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents	Matthew Jörke et.al.	2405.06061	null
2024-05-09	Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias	Shan Chen et.al.	2405.05506	link
2024-05-08	Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models	Aylin Gunal et.al.	2405.05060	null
2024-05-12	DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature	Dawei Li et.al.	2405.04819	link
2024-05-08	Empathy Through Multimodality in Conversational Interfaces	Mahyar Abbasian et.al.	2405.04777	null
2024-05-07	AffirmativeAI: Towards LGBTQ+ Friendly Audit Frameworks for Large Language Models	Yinru Long et.al.	2405.04652	null
2024-05-07	D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models	Duygu Altinok et.al.	2405.04170	link
2024-05-14	ERATTA: Extreme RAG for Table To Answers with Large Language Models	Sohini Roychowdhury et.al.	2405.03963	null
2024-05-08	How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	Muhammad Uzair Khattak et.al.	2405.03690	null
2024-05-06	MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline	Mohamed Yaseen Jabarulla et.al.	2405.03359	link
2024-05-06	Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines	Md Main Uddin Rony et.al.	2405.03153	null
2024-05-22	A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)	Lingyao Li et.al.	2405.03066	null
2024-05-05	Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents	Junkai Li et.al.	2405.02957	null
2024-05-05	Confidential and Protected Disease Classifier using Fully Homomorphic Encryption	Aditya Malik et.al.	2405.02790	null
2024-05-04	A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare	Thomas Yu Chow Tam et.al.	2405.02559	null
2024-05-03	MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain	Chao Jiang et.al.	2405.02144	null
2024-05-03	CRCL at SemEval-2024 Task 2: Simple prompt optimizations	Clément Brutti-Mairesse et.al.	2405.01942	link
2024-05-03	Aloe: A Family of Fine-tuned Open Healthcare LLMs	Ashwin Kumar Gururajan et.al.	2405.01886	null
2024-05-02	Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models	Hye Sun Yun et.al.	2405.01686	link
2024-05-22	Leveraging Prompt-Learning for Structured Information Extraction from Crohn’s Disease Radiology Reports in a Low-Resource Language	Liam Hazan et.al.	2405.01682	null
2024-04-29	Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model	Seonhee Cho et.al.	2405.01591	null
2024-05-09	GPT-4 passes most of the 297 written Polish Board Certification Examinations	Jakub Pokrywka et.al.	2405.01589	null
2024-05-02	Prompt engineering paradigms for medical applications: scoping review and recommendations for better practices	Jamil Zaghir et.al.	2405.01249	null
2024-04-27	Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study	Dou Liu et.al.	2405.00728	null
2024-04-25	Large Language Models in Healthcare: A Comprehensive Benchmark	Andrew Liu et.al.	2405.00716	link
2024-04-25	Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation	Hanyin Wang et.al.	2405.00715	link
2024-04-23	Interactive Analysis of LLMs using Meaningful Counterfactuals	Furui Cheng et.al.	2405.00708	null
2024-05-15	“I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust	Sunnie S. Y. Kim et.al.	2405.00623	null
2024-05-01	Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning	Huan Xu et.al.	2405.00461	null
2024-05-01	DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training	Bhuvanesh Verma et.al.	2405.00321	null
2024-05-06	Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models	Scott Sumpter et.al.	2404.19713	null
2024-04-29	It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments	Petter Mæhlum et.al.	2404.18832	null
2024-04-29	Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models	Hongyi Zhu et.al.	2404.18746	null
2024-04-29	6G comprehensive intelligence: network operations and optimization based on Large Language Models	Sifan Long et.al.	2404.18373	null
2024-04-27	MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch	Nadia Saeed et.al.	2404.17999	link
2024-04-27	Advancing Healthcare Automation: Multi-Agent Systems for Medical Necessity Justification	Himanshu Pandey et.al.	2404.17977	null
2024-04-27	Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models	Zhongzhen Huang et.al.	2404.17897	null
2024-04-27	VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition	Junyi Bian et.al.	2404.17835	null
2024-04-25	A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs	Christian N. Mayemba et.al.	2404.16921	null
2024-04-25	Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare	Emre Can Acikgoz et.al.	2404.16621	link
2024-04-26	Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums	Isabelle Lorge et.al.	2404.16461	null
2024-04-25	LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications	Saranya Krishnamoorthy et.al.	2404.16294	link
2024-04-26	Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions	Divyansh Agarwal et.al.	2404.16251	null
2024-05-05	A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry	Yining Huang et.al.	2404.15777	null
2024-04-27	PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models	Shashi Kant Gupta et.al.	2404.15549	null
2024-04-23	IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents	Jean-Philippe Corbeil et.al.	2404.15488	link
2024-04-22	Adaptive Collaboration Strategy for LLMs in Medical Decision Making	Yubin Kim et.al.	2404.15155	link
2024-04-23	Bias patterns in the application of LLMs for clinical decision support: A comprehensive study	Raphael Poulain et.al.	2404.15149	link
2024-04-23	Med42 – Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches	Clément Christophe et.al.	2404.14779	null
2024-04-23	CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning	Ling Yue et.al.	2404.14777	null
2024-04-22	WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models	Ronald Xie et.al.	2404.14567	null
2024-04-22	WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction	Augustin Toma et.al.	2404.14544	null
2024-04-22	No General Code of Ethics for All: Ethical Considerations in Human-bot Psycho-counseling	Lizhi Ma et.al.	2404.14070	null
2024-04-20	“I Wish There Were an AI”: Challenges and AI Potential in Cancer Patient-Provider Communication	Ziqi Yang et.al.	2404.13409	null
2024-04-20	UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions	Ana-Cristina Rogoz et.al.	2404.13343	link
2024-04-20	Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions	Soumyadeep Roy et.al.	2404.13307	link
2024-05-03	LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models	Mouhamed Amine Bouchiha et.al.	2404.13236	link
2024-04-19	Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging	Chia-Hsuan Chang et.al.	2404.13149	null
2024-04-25	Leveraging Large Language Model as Simulated Patients for Clinical Education	Yanzeng Li et.al.	2404.13066	null
2024-04-19	Data Alignment for Zero-Shot Concept Generation in Dermatology AI	Soham Gadgil et.al.	2404.13043	null
2024-04-19	Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation	Guanhua Chen et.al.	2404.12879	null
2024-04-17	Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM	Hongzhao Li et.al.	2404.11209	null
2024-04-15	Numerical Attributes Learning for Cardiac Failure Diagnostic from Clinical Narratives – A LESA-CamemBERT-bio Approach	Boammani Aser Lompo et.al.	2404.10171	null
2024-04-14	Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms	Diandian Guo et.al.	2404.09231	null
2024-04-13	Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model	Zita Lifelo et.al.	2404.09045	null
2024-04-11	Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions	Agasthya Gangavarapu et.al.	2404.08705	null
2024-04-11	Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain	Iker García-Ferrero et.al.	2404.07613	null
2024-04-11	CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models	Sheng Wang et.al.	2404.07424	null
2024-04-10	LLMs in Biomedicine: A study on clinical Named Entity Recognition	Masoud Monajatipoor et.al.	2404.07376	link
2024-04-10	Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study	Hongru Du et.al.	2404.06962	link
2024-04-10	Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination	Soojong Kim et.al.	2404.06731	null
2024-04-10	Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology	Shashi Kant Gupta et.al.	2404.06680	null
2024-04-09	Comparing Two Model Designs for Clinical Note Generation; Is an LLM a Useful Evaluator of Consistency?	Nathan Brake et.al.	2404.06503	null
2024-04-08	MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering	Iñigo Alonso et.al.	2404.05590	null
2024-04-15	Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations	Yiming Li et.al.	2404.05415	null
2024-04-08	Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients	HyoJe Jung et.al.	2404.05144	null
2024-04-07	Clinical Trials Protocol Authoring using LLMs	Morteza Maleki et.al.	2404.05044	null
2024-04-07	SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials	Mael Jullien et.al.	2404.04963	null
2024-04-07	PairAug: What Can Augmented Image-Text Pairs Do for Radiology?	Yutong Xie et.al.	2404.04960	link
2024-04-06	Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology	Dyke Ferber et.al.	2404.04667	null
2024-04-06	IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials	Shreyasi Mandal et.al.	2404.04510	link
2024-04-04	Conversational Disease Diagnosis via External Planner-Controlled Large Language Models	Zhoujian Sun et.al.	2404.04292	link
2024-04-11	CLUE: A Clinical Language Understanding Evaluation for LLMs	Amin Dada et.al.	2404.04067	link
2024-04-04	Personalized LLM Response Generation with Parameterized Memory Injection	Kai Zhang et.al.	2404.03565	link
2024-04-02	Classifying Cancer Stage with Open-Source Clinical Large Language Models	Chia-Hsuan Chang et.al.	2404.01589	null
2024-04-01	Towards a potential paradigm shift in health data collection and analysis	David Josef Herzog et.al.	2404.01403	null
2024-04-01	Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models	Yi-Lin Tuan et.al.	2404.01295	null
2024-04-01	Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided	Hongli Zhan et.al.	2404.01288	link
2024-04-01	Generating Faithful and Complete Hospital-Course Summaries from the Electronic Health Record	Griffin Adams et.al.	2404.01189	null
2024-04-01	LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation	Zilong Wang et.al.	2404.00998	null
2024-04-05	How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey	Zhonghao Shi et.al.	2404.00938	null
2024-04-04	Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods	Yujuan Fu et.al.	2404.00826	link
2024-03-30	Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4	Aryo Pradipta Gema et.al.	2404.00484	link
2024-03-29	Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain	Burcu Sayin et.al.	2403.20288	link
2024-04-04	Fine-tuning Large Language Models for Automated Diagnostic Screening Summaries	Manjeet Yadav et.al.	2403.20145	null
2024-03-28	Developing Healthcare Language Model Embedding Spaces	Niall Taylor et.al.	2403.19802	null
2024-03-28	Bespoke Large Language Models for Digital Triage Assistance in Mental Health Care	Niall Taylor et.al.	2403.19790	null
2024-03-28	A Benchmark Evaluation of Clinical Named Entity Recognition in French	Nesrine Bannour et.al.	2403.19726	null
2024-03-28	BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation	Yuhong He et.al.	2403.19414	null
2024-03-27	Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data	Yuting Guo et.al.	2403.19031	null
2024-03-27	Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers	Laura Bergomi et.al.	2403.18938	link
2024-03-27	BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models	Haitao Li et.al.	2403.18365	null
2024-03-26	Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach	Andrea Ferrario et.al.	2403.17873	null
2024-03-26	Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization	Jae-hee So et.al.	2403.17428	link
2024-03-27	SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies	Akshat Choube et.al.	2403.17219	link
2024-03-25	Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model	Braja Gopal Patra et.al.	2403.17199	link
2024-03-25	Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data	Shinka Mori et.al.	2403.16909	link
2024-03-25	Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm	Lei Liu et.al.	2403.16446	null
2024-03-25	Dia-LLaMA: Towards Large Language Model-driven CT Report Generation	Zhixuan Chen et.al.	2403.16386	null
2024-03-26	Large Language Models in Biomedical and Health Informatics: A Bibliometric Review	Huizi Yu et.al.	2403.16303	null
2024-03-24	CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering	Hongbin Na et.al.	2403.16008	null
2024-03-23	LLMs Instruct LLMs: An Extraction and Editing Method	Xin Zhang et.al.	2403.15736	null
2024-03-20	Large language models can help boost food production, but be mindful of their risks	Djavan De Clercq et.al.	2403.15475	null
2024-03-19	LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction	Hejie Cui et.al.	2403.15464	null
2024-03-29	WoLF: Wide-scope Large Language Model Framework for CXR Understanding	Seil Kang et.al.	2403.15456	null
2024-03-26	The opportunities and risks of large language models in mental health	Hannah R. Lawrence et.al.	2403.14814	null
2024-04-02	Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis	Junyoung Kim et.al.	2403.14801	null
2024-03-27	Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature	Jeremy R. Harper et.al.	2403.14721	null
2024-03-21	Large Language Models for Multi-Choice Question Classification of Medical Subjects	Víctor Ponce-López et.al.	2403.14582	null
2024-03-20	Polaris: A Safety-focused LLM Constellation Architecture for Healthcare	Subhabrata Mukherjee et.al.	2403.13313	null
2024-03-19	Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning	Mengxian Lyu et.al.	2403.13089	null
2024-03-19	Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning	Cheng Peng et.al.	2403.12374	null
2024-03-18	Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach	Maria Mahbub et.al.	2403.12297	null
2024-03-18	A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models	Stephen R. Pfohl et.al.	2403.12025	link
2024-04-02	CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification	Korbinian Randl et.al.	2403.11904	link
2024-03-18	Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure	Ziyi Chen et.al.	2403.11425	link
2024-03-17	Cheap Ways of Extracting Clinical Markers from Texts	Anastasia Sandu et.al.	2403.11227	link
2024-03-17	Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping	Haoxi Zhang et.al.	2403.11073	null
2024-03-21	Do Large Language Models understand Medical Codes?	Simon A. Lee et.al.	2403.10822	null
2024-03-16	LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices	Jingping Nie et.al.	2403.10779	null
2024-03-16	Depression Detection on Social Media with Large Language Models	Xiaochong Lan et.al.	2403.10750	null
2024-03-15	Neural Erosion: Emulating Controlled Neurodegeneration and Aging in AI Systems	Antonios Alexos et.al.	2403.10596	null
2024-03-22	Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction	Chen Chen et.al.	2403.10581	link
2024-03-15	Trusting the Search: Unraveling Human Trust in Health Information from Google and ChatGPT	Xin Sun et.al.	2403.09987	null
2024-03-08	A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health	Alexander Marrapese et.al.	2403.09705	null
2024-03-14	Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge	Li Yizhen et.al.	2403.09164	null
2024-04-01	A Continued Pretrained LLM Approach for Automatic Medical Note Generation	Dong Yuan et.al.	2403.09057	null
2024-03-15	AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic	Emad A. Alghamdi et.al.	2403.09017	null
2024-03-14	Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records	Erlend Frayling et.al.	2403.08664	null
2024-03-13	MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models	Subash Neupane et.al.	2403.08607	null
2024-03-14	Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator	Yusheng Liao et.al.	2403.08495	link
2024-03-12	SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models	Yu Yang et.al.	2403.07384	link
2024-03-11	Real-Time Multimodal Cognitive Assistant for Emergency Medical Services	Keshara Weerasinghe et.al.	2403.06734	link
2024-03-11	Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement	Che Liu et.al.	2403.06659	link
2024-03-11	MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding	Jiageng Wu et.al.	2403.06611	null
2024-03-11	Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds	Jiageng WU et.al.	2403.06609	link
2024-03-11	Can LLMs’ Tuning Methods Work in Medical Multimodal Domain?	Jiawei Chen et.al.	2403.06407	link
2024-03-10	ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes	Shengxin Hong et.al.	2403.06294	null
2024-03-10	FedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning	Zhuo Zhang et.al.	2403.06131	null
2024-03-19	KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques	Rui Yang et.al.	2403.05881	link
2024-03-08	A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries	Asad Aali et.al.	2403.05720	link
2024-03-08	Decomposing Vision-based LLM Predictions for Auto-Evaluation with GPT-4	Qingqing Zhu et.al.	2403.05680	null
2024-03-11	Tell me the truth: A system to measure the trustworthiness of Large Language Models	Carlo Lipizzi et.al.	2403.04964	null
2024-03-13	Electrocardiogram Instruction Tuning for Report Generation	Zhongwei Wan et.al.	2403.04945	null
2024-03-07	Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering	Ojas Gramopadhye et.al.	2403.04890	link
2024-03-06	Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification	Ricardo Bigolin Lanfredi et.al.	2403.04024	link
2024-03-06	Towards Safe and Aligned Large Language Models for Medicine	Tessa Han et.al.	2403.03744	link
2024-03-09	Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People	Xidong Wang et.al.	2403.03640	link
2024-03-05	Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse	Joseph Gatto et.al.	2403.03336	null
2024-03-05	Socratic Reasoning Improves Positive Text Rewriting	Anmol Goel et.al.	2403.03029	null
2024-03-05	Towards Training A Chinese Large Language Model for Anesthesiology	Zhonghai Wang et.al.	2403.02742	null
2024-03-05	Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research	Brenda Y. Miao et.al.	2403.02558	link
2024-03-16	SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction	Jiahuan Yan et.al.	2403.01570	null
2024-03-01	Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries	Zelalem Gero et.al.	2403.01002	link
2024-03-01	Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language	Xiaohan Ding et.al.	2403.00994	null
2024-03-01	AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models	Lang Cao et.al.	2403.00953	null
2024-03-01	SoftTiger: A Clinical Foundation Model for Healthcare Workflows	Ye Chen et.al.	2403.00868	link
2024-02-29	EyeGPT: Ophthalmic Assistant with Large Language Models	Xiaolan Chen et.al.	2403.00840	null
2024-02-28	MedAide: Leveraging Large Language Models for On-Premise Medical Assistance on Edge Devices	Abdul Basit et.al.	2403.00830	null
2024-02-18	ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework	Zhongqi Yang et.al.	2403.00781	null
2024-02-29	OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models	Jenish Maharjan et.al.	2402.19371	null
2024-02-29	Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study	Prottay Kumar Adhikary et.al.	2402.19052	null
2024-02-28	Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models	Derong Xu et.al.	2402.18099	link
2024-03-13	Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Hanjie Chen et.al.	2402.18060	link
2024-03-02	JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability	Junda Wang et.al.	2402.17887	link
2024-02-28	Prescribing Large Language Models for Perioperative Care: What’s The Right Dose for Pre-trained Models?	Bing Xue et.al.	2402.17493	link
2024-02-27	A Piece of Theatre: Investigating How Teachers Design LLM Chatbots to Assist Adolescent Cyberbullying Education	Michael A. Hedderich et.al.	2402.17456	null
2024-02-27	Deep Learning Based Named Entity Recognition Models for Recipes	Mansi Goel et.al.	2402.17447	null
2024-02-26	OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)	Fujian Jia et.al.	2402.16810	null
2024-02-26	LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery	Kexin Chen et.al.	2402.16664	link
2024-02-26	LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification	Yiping Song et.al.	2402.16515	null
2024-02-26	From RAGs to riches: Using large language models to write documents for clinical trials	Nigel Markey et.al.	2402.16406	null
2024-02-25	HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Cem Uluoglakci et.al.	2402.16211	link
2024-02-27	EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings	Sunjun Kweon et.al.	2402.16040	link
2024-02-24	Predicting Outcomes in Video Games with Long Short Term Memory Networks	Kittimate Chulajata et.al.	2402.15923	link
2024-02-24	Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study	Zhaoyue Sun et.al.	2402.15663	link
2024-02-23	Enhancing ICU Patient Recovery: Using LLMs to Assist Nurses in Diary Writing	Samuel Kernan Freire et.al.	2402.15205	null
2024-02-21	Automatic Histograms: Leveraging Language Models for Text Dataset Exploration	Emily Reif et.al.	2402.14880	link
2024-02-20	A Dual-Prompting for Interpretable Mental Health Language Models	Hyolim Jeon et.al.	2402.14854	null
2024-02-19	RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning	Congyun Jin et.al.	2402.14840	null
2024-02-23	A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health	Nikhil Behari et.al.	2402.14807	null
2024-02-22	Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond	Zhiyuan Wang et.al.	2402.14259	null
2024-02-22	Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology	Nur Yildirim et.al.	2402.14252	null
2024-02-21	On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study	Minh-Hao Van et.al.	2402.14162	null
2024-02-21	EXACT-Net: EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy	Hamed Hooshangnejad et.al.	2402.14099	null
2024-02-26	Towards Building Multilingual Language Model for Medicine	Pengcheng Qiu et.al.	2402.13963	link
2024-02-21	SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization	Prakamya Mishra et.al.	2402.13919	link
2024-02-21	Factual Consistency Evaluation of Summarisation in the Era of Large Language Models	Zheheng Luo et.al.	2402.13758	null
2024-02-20	Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation	Zhiyao Ren et.al.	2402.13408	null
2024-02-17	When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection	Xiangyu Zhang et.al.	2402.13276	null
2024-02-20	BiMediX: Bilingual Medical Mixture of Experts LLM	Sara Pieri et.al.	2402.13253	link
2024-02-23	Benchmarking Retrieval-Augmented Generation for Medicine	Guangzhi Xiong et.al.	2402.13178	link
2024-02-20	Few shot clinical entity recognition in three languages: Masked language models outperform LLM prompting	Marco Naguib et.al.	2402.12801	null
2024-02-20	Me LLaMA: Foundation Large Language Models for Medical Applications	Qianqian Xie et.al.	2402.12749	link
2024-02-19	LLM Agents for Psychology: A Study on Gamified Assessments	Qisen Yang et.al.	2402.12326	null
2024-02-19	Automatic Evaluation for Mental Health Counseling using LLMs	Anqi Li et.al.	2402.11958	null
2024-02-19	The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth	Shir Lissak et.al.	2402.11886	link
2024-02-19	NOTE: Notable generation Of patient Text summaries through Efficient approach based on direct preference optimization	Imjin Ahn et.al.	2402.11882	null
2024-02-20	MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs	Yavuz Faruk Bakman et.al.	2402.11756	link
2024-02-18	DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics	YiQiu Guo et.al.	2402.11481	null
2024-02-18	FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence	Sebastian Antony Joseph et.al.	2402.11456	link
2024-02-20	Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis	Shaochen Xu et.al.	2402.11398	null
2024-02-17	Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention	Eunkyung Jo et.al.	2402.11353	null
2024-02-17	KnowTuning: Knowledge-aware Fine-tuning for Large Language Models	Yougang Lyu et.al.	2402.11176	link
2024-02-24	Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model	Salman Rahman et.al.	2402.10965	null
2024-02-10	DAEDRA: A language model for predicting outcomes in passive pharmacovigilance reporting	Chris von Csefalvay et.al.	2402.10951	null
2024-02-09	Zero-shot Explainable Mental Health Analysis on Social Media by incorporating Mental Scales	Wenyu Li et.al.	2402.10948	null
2024-02-16	Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks	Niall Taylor et.al.	2402.10597	null
2024-02-15	BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains	Yanis Labrak et.al.	2402.10373	link
2024-02-28	Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients	Mahyar Abbasian et.al.	2402.10153	null
2024-02-15	Towards Reducing Diagnostic Errors with Interpretable Risk Prediction	Denis Jered McInerney et.al.	2402.10109	null
2024-02-15	Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4	Ting Fang Tan et.al.	2402.10083	null
2024-02-21	AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis	Zhihao Fan et.al.	2402.09742	link
2024-02-15	GPT-4’s assessment of its performance in a USMLE-based case study	Uttam Dhakal et.al.	2402.09654	null
2024-02-14	Probabilistic Reasoning in Generative Large Language Models	Aliakbar Nafar et.al.	2402.09614	link
2024-02-16	Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and Indications	David Oniani et.al.	2402.09588	null
2024-02-14	Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support	Zilin Ma et.al.	2402.09260	null
2024-02-13	Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy	Gioele Barabucci et.al.	2402.08806	null
2024-02-13	JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models	Jillian Fisher et.al.	2402.08761	link
2024-02-13	The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting	David Haag et.al.	2402.08658	null
2024-02-20	Addressing cognitive bias in medical language models	Samuel Schmidgall et.al.	2402.08113	link
2024-02-02	Exploring patient trust in clinical advice from AI-driven LLMs like ChatGPT for self-diagnosis	Delong Du et.al.	2402.07920	null
2024-02-12	CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity	Norbert Tihanyi et.al.	2402.07688	null
2024-02-12	The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models	Ayo Adedeji et.al.	2402.07658	null
2024-02-12	Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models	Isabelle Lorge et.al.	2402.07645	link
2024-02-10	Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations	Ankit Pal et.al.	2402.07023	link
2024-02-10	REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models	Yinghao Zhu et.al.	2402.07016	null
2024-02-09	RareBench: Can LLMs Serve as Rare Diseases Specialists?	Xuanzhong Chen et.al.	2402.06341	link
2024-02-08	FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs	Eun Cheol Choi et.al.	2402.05904	link
2024-02-05	Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering	Aryan Agrawal et.al.	2402.05127	null
2024-02-05	Zero-Shot Clinical Trial Patient Matching with LLMs	Michael Wornow et.al.	2402.05125	null
2024-02-07	CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients	Pragnya Ramjee et.al.	2402.04620	link
2024-02-06	Measuring Implicit Bias in Explicitly Unbiased Large Language Models	Xuechunzi Bai et.al.	2402.04105	link
2024-02-06	The Use of a Large Language Model for Cyberbullying Detection	Bayode Ogunleye et.al.	2402.04088	null
2024-02-06	Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models	Reza Khanmohammadi et.al.	2402.04075	null
2024-02-05	Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach	Sergi Blanco-Cuaresma et.al.	2402.03435	null
2024-02-05	Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models	Zhiyuan Hu et.al.	2402.03271	link
2024-02-05	Large Language Model Distilling Medication Recommendation Model	Qidong Liu et.al.	2402.02803	link
2024-02-05	RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews	Satpreet Harcharan Singh et.al.	2402.02656	link
2024-02-03	How well do LLMs cite relevant medical references? An evaluation framework and analyses	Kevin Wu et.al.	2402.02008	null
2024-02-02	Leveraging Large Language Models for Analyzing Blood Pressure Variations Across Biological Sex from Scientific Literature	Yuting Guo et.al.	2402.01826	null
2024-02-01	Hierarchical Multi-Label Classification of Online Vaccine Concerns	Chloe Qinyu Zhu et.al.	2402.01783	null
2024-01-30	Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer’s Dementia	Balamurali B T et.al.	2402.01751	null
2024-01-29	Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties	Jasmine Chiat Ling Ong et.al.	2402.01741	null
2024-01-29	Development and Testing of Retrieval Augmented Generation in Large Language Models – A Case Study Report	YuHe Ke et.al.	2402.01733	null
2024-01-28	Evaluating LLM – Generated Multimodal Diagnosis from Medical Images and Symptom Analysis	Dimitrios P. Panagoulias et.al.	2402.01730	null
2024-02-10	Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data	Yinghao Zhu et.al.	2402.01713	link
2024-01-25	LLM on FHIR – Demystifying Health Records	Paul Schmiedmayer et.al.	2402.01711	null
2024-01-23	Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study	Zhe He et.al.	2402.01693	null
2024-02-01	HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent	Weijie Xu et.al.	2402.01018	link
2024-02-13	Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model	Mingyu Jin et.al.	2402.00746	link
2024-02-01	SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models	Tianhan Xu et.al.	2402.00474	null
2024-01-31	Multimodal Clinical Pseudo-notes for Emergency Department Prediction Tasks using Multiple Embedding Model for EHR (MEME)	Simon A. Lee et.al.	2402.00160	link
2024-01-30	GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries	Yuyuan Feng et.al.	2402.00068	null
2024-02-03	EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation	Jonathan W. Kim et.al.	2401.18006	null
2024-01-31	Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning	Yuelyu Ji et.al.	2401.17602	link
2024-01-30	Detecting mental disorder on social media: a ChatGPT-augmented explainable approach	Loris Belcastro et.al.	2401.17477	link
2024-02-02	Leveraging Professional Radiologists’ Expertise to Enhance LLMs’ Evaluation for Radiology Reports	Qingqing Zhu et.al.	2401.16578	null
2024-01-29	InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification	Jan Trienes et.al.	2401.16475	link
2024-02-16	Combining Hierachical VAEs with LLMs for clinically meaningful timeline summarisation in social media	Jiayu Song et.al.	2401.16240	null
2024-01-29	“You tell me”: A Dataset of GPT-4-Based Behaviour Change Support Conversations	Selina Meyer et.al.	2401.16167	null
2024-01-29	Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis	Haochun Wang et.al.	2401.16107	null
2024-01-29	Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning	Kenta Izumi et.al.	2401.15966	null
2024-01-28	AI as a Medical Ally: Evaluating ChatGPT’s Usage and Impact in Indian Healthcare	Aryaman Raina et.al.	2401.15605	null
2024-01-27	Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models	Minbyul Jeong et.al.	2401.15269	link
2024-01-26	Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning	Md Mushfiqur Rahman et.al.	2401.15043	link
2024-01-26	Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias	Yu He Ke et.al.	2401.14589	null
2024-01-25	K-QA: A Real-World Medical Q&A Benchmark	Itay Manes et.al.	2401.14493	link
2024-01-25	LongHealth: A Question Answering Benchmark with Long Clinical Documents	Lisa Adams et.al.	2401.14490	link
2024-01-25	The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support	Inhwa Song et.al.	2401.14362	null
2024-01-25	A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification	Madhumita Sushil et.al.	2401.13887	null
2024-01-24	Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes	Darren Liu et.al.	2401.13588	null
2024-01-20	Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA	Xi Chen et.al.	2401.12998	null
2024-01-10	A General-purpose AI Avatar in Healthcare	Nicholas Yan et.al.	2401.12981	null
2024-01-22	CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation	Zhihong Chen et.al.	2401.12208	null
2024-01-22	CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark	Ge Zhang et.al.	2401.11944	null
2024-01-21	MedLM: Exploring Language Models for Medical Question Answering Systems	Niraj Yagnik et.al.	2401.11389	link
2024-01-23	Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines	David Oniani et.al.	2401.11120	null
2024-01-19	BioFinBERT: Finetuning Large Language Models (LLMs) to Analyze Sentiment of Press Releases and Financial Text Around Inflection Points of Biotech Stocks	Valentina Aparicio et.al.	2401.11011	null
2024-01-19	Dynamic Q&A of Clinical Documents with Large Language Models	Ran Elgedawy et.al.	2401.10733	null
2024-01-17	Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study	Niklas Mannhardt et.al.	2401.09637	null
2024-01-16	Gene-associated Disease Discovery Powered by Large Language Models	Jiayu Chang et.al.	2401.09490	null
2024-01-17	Understanding the concerns and choices of public when using large language models for healthcare	Yunpeng Xiao et.al.	2401.09090	null
2024-01-16	Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration	Simone Balloccu et.al.	2401.08420	link
2024-01-14	Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study	Ahmadul Karim Chowdhury et.al.	2401.07310	link
2024-01-13	EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records	Wenqi Shi et.al.	2401.07128	link
2024-01-13	NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey	B. Ross Katz et.al.	2401.06967	link
2024-01-12	Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data	Yubin Kim et.al.	2401.06866	link
2023-12-12	Large language models in healthcare and medical domain: A review	Zabir Al Nazi et.al.	2401.06775	null
2024-01-11	Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models	K M Sajjadul Islam et.al.	2401.06088	null
2024-01-11	EpilepsyLLM: Domain-Specific Large Language Model Fine-tuned with Epilepsy Medical Knowledge	Xuyang Zhao et.al.	2401.05908	null
2024-01-11	Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback	Chengfeng Dou et.al.	2401.05695	link
2024-01-11	Towards Conversational Diagnostic AI	Tao Tu et.al.	2401.05654	null
2024-01-18	MISS: A Generative Pretraining and Finetuning Approach for Med-VQA	Jiawei Chen et.al.	2401.05163	link
2024-01-01	Large Language Models in Mental Health Care: a Scoping Review	Yining Hua et.al.	2401.02984	null
2024-01-05	Generative Large Language Models are autonomous practitioners of evidence-based medicine	Akhil Vaid et.al.	2401.02851	null
2024-01-04	SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval	Griffin Adams et.al.	2401.02369	null
2024-01-04	Text2MDT: Extracting Medical Decision Trees from Medical Texts	Wei Zhu et.al.	2401.02034	null
2024-01-06	Generalist embedding models are better at short-context clinical semantic search than specialized embedding models	Jean-Baptiste Excoffier et.al.	2401.01943	link
2024-01-03	MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries	Akash Ghosh et.al.	2401.01596	link
2024-01-06	Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review	Luoma Ke et.al.	2401.01519	null
2024-01-03	Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation	Walid Saba et.al.	2401.01469	null
2024-01-08	A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models	S. M Towhidul Islam Tonmoy et.al.	2401.01313	null
2024-01-01	A Computational Framework for Behavioral Assessment of LLM Therapists	Yu Ying Chiu et.al.	2401.00820	link
2023-12-31	An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks	Yash Bingi et.al.	2401.00582	null
2023-12-31	Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing	Omid Rohanian et.al.	2401.00579	null
2023-12-29	K-PERM: Personalized Response Generation Using Dynamic Knowledge Retrieval and Persona-Adaptive Queries	Kanak Raj et.al.	2312.17748	link
2023-12-29	Overview of the PromptCBLUE Shared Task in CHIP2023	Wei Zhu et.al.	2312.17522	link
2023-12-29	Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning	Xiao-Yang Liu et.al.	2312.17493	null
2023-12-29	EHR Interaction Between Patients and AI: NoteAid EHR Interaction	Xiaocheng Zhang et.al.	2312.17475	null
2023-12-29	LLM Factoscope: Uncovering LLMs’ Factual Discernment through Inner States Analysis	Jinwen He et.al.	2312.16374	null
2023-12-26	Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models	Xinke Jiang et.al.	2312.15883	null
2023-12-25	IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models	Zhihao Chen et.al.	2312.15663	null
2023-12-23	Multimodal Machine Learning Combining Facial Images and Clinical Texts Improves Diagnosis of Rare Genetic Diseases	Da Wu et.al.	2312.15320	link
2023-12-06	Empowering ChatGPT-Like Large-Scale Language Models with Local Knowledge Base for Industrial Prognostics and Health Management	Huan Wang et.al.	2312.14945	null
2023-12-22	Robust Knowledge Extraction from Large Language Models using Social Choice Theory	Nico Potyka et.al.	2312.14877	link
2023-12-22	Zero-shot Causal Graph Extrapolation from Text via LLMs	Alessandro Antonucci et.al.	2312.14670	link
2023-12-19	Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning	Xiaodan Zhang et.al.	2312.14184	null
2023-12-20	Exploring Multimodal Large Language Models for Radiology Report Error-checking	Jinge Wu et.al.	2312.13103	null
2023-12-20	MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models	Yan Cai et.al.	2312.12806	null
2023-12-20	Fine-tuning Large Language Models for Adaptive Machine Translation	Yasmin Moslem et.al.	2312.12740	link
2023-12-20	Mini-GPTs: Efficient Large Language Models through Contextual Pruning	Tim Valicenti et.al.	2312.12682	null
2023-12-19	Can ChatGPT be Your Personal Medical Assistant?	Md. Rafiul Biswas et.al.	2312.12006	null
2023-12-19	Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health	Maria Antoniak et.al.	2312.11803	link
2023-12-16	CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare	Akash Ghosh et.al.	2312.11541	link
2023-12-16	A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers	Feida Gu et.al.	2312.10419	null
2023-12-15	GPT-doctor: Customizing Large Language Models for Medical Consultation	Wen Wang et.al.	2312.10225	null
2023-12-15	Low-resource classification of mobility functioning information in clinical sentences using large language models	Tuan Dung Le et.al.	2312.10202	null
2023-12-06	Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk	Colleen Chan et.al.	2312.10072	null
2023-12-15	Distilling Large Language Models for Matching Patients to Clinical Trials	Mauro Nievas et.al.	2312.09958	null
2024-01-07	RJUA-QA: A Comprehensive QA Dataset for Urology	Shiwei Lyu et.al.	2312.09785	link
2023-12-14	Evaluating Large Language Models for Health-related Queries with Presuppositions	Navreet Kaur et.al.	2312.08800	link
2023-12-15	High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models	Songchi Zhou et.al.	2312.08274	null
2023-12-13	CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs	Huaiyuan Ying et.al.	2312.08036	link
2023-12-12	Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales	Taeyoon Kwon et.al.	2312.07399	link
2023-12-12	Efficient Few-Shot Clinical Task Adaptation with Large Language Models	Kaipeng Zheng et.al.	2312.07125	null
2023-12-12	SM70: A Large Language Model for Medical Devices	Anubhav Bhatti et.al.	2312.06974	null
2023-12-05	Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety	Manas Gaur et.al.	2312.06798	null
2023-12-11	Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping	Will E. Thompson et.al.	2312.06457	null
2023-12-11	Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need	Cheng Peng et.al.	2312.06099	null
2023-12-09	Enhancing Medical Specialty Assignment to Patients using NLP Techniques	Chris Solomou et.al.	2312.05585	null
2023-11-10	Holistic Evaluation of GPT-4V for Biomedical Imaging	Zhengliang Liu et.al.	2312.05256	null
2023-12-08	Ophtha-LLaMA2: A Large Language Model for Ophthalmology	Huan Zhao et.al.	2312.04906	null
2023-12-07	AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making	Shusen Liu et.al.	2312.04494	null
2023-12-08	Methods to Estimate Large Language Model Confidence	Maia Kotelanski et.al.	2312.03733	null
2023-12-06	XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering	Joel Stremmel et.al.	2312.03567	null
2023-12-05	Breast Ultrasound Report Generation using LangChain	Jaeyoung Huh et.al.	2312.03013	null
2023-12-05	MedDM:LLM-executable clinical guidance tree for clinical decision-making	Binbin Li et.al.	2312.02441	null
2023-12-04	LLMs Accelerate Annotation for Medical Information Extraction	Akshay Goel et.al.	2312.02296	null
2023-12-04	MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model	Ling Yang et.al.	2312.02233	null
2023-12-03	Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation	Yuzhe Lu et.al.	2312.01504	null
2023-12-18	From Beginner to Expert: Modeling Medical Knowledge into General LLMs	Qiang Li et.al.	2312.01040	null
2023-12-01	Explanatory Argument Extraction of Correct Answers in Resident Medical Exams	Iakes Goenaga et.al.	2312.00567	link
2023-11-30	Towards Accurate Differential Diagnosis with Large Language Models	Daniel McDuff et.al.	2312.00164	null
2023-11-30	RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance	Chantal Pellegrini et.al.	2311.18681	link
2023-11-29	Are we going MAD? Benchmarking Multi-Agent Debate between Language Models for Medical Q&A	Andries Smit et.al.	2311.17371	link
2023-11-27	MEDITRON-70B: Scaling Medical Pretraining for Large Language Models	Zeming Chen et.al.	2311.16079	link
2023-11-27	BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights	François Remy et.al.	2311.16075	null
2023-11-27	RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization	Kwanyoung Kim et.al.	2311.15876	null
2023-11-28	The effect of source disclosure on evaluation of AI-generated messages: A two-part study	Sue Lim et.al.	2311.15544	null
2023-11-25	Walking a Tightrope – Evaluating Large Language Models in High-Risk Domains	Chia-Chien Hung et.al.	2311.14966	null
2023-11-20	MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer’s Care Via Unleashing Generative AI	Lifei Zheng et.al.	2311.14730	null
2023-11-10	ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management	Angela Zhang et.al.	2311.14703	null
2023-11-07	Benefits and Harms of Large Language Models in Digital Mental Health	Munmun De Choudhury et.al.	2311.14693	null
2023-11-23	Challenges of Large Language Models for Mental Health Counseling	Neo Christopher Chung et.al.	2311.13857	null
2023-11-22	Surpassing GPT-4 Medical Coding with a Two-Stage Approach	Zhichao Yang et.al.	2311.13735	null
2023-11-22	Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting	Daphne van Zandvoort et.al.	2311.13274	null
2023-11-25	From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models	Zachary Englhardt et.al.	2311.13063	link
2023-10-28	Overview of Current Applications of Large Language Models in Various Medical Specialities	Ummara Mumtaz et.al.	2311.12882	null
2023-11-21	ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models	Jiankai Tang et.al.	2311.12524	link
2023-11-20	Web News Timeline Generation with Extended Task Prompting	Sha Wang et.al.	2311.11652	null
2023-12-17	Rethinking Large Language Models in Mental Health Applications	Shaoxiong Ji et.al.	2311.11267	null
2023-11-18	Designing Interpretable ML System to Enhance Trustworthy AI in Healthcare: A Systematic Review of the Last Decade to A Proposed Robust Framework	Elham Nasarian et.al.	2311.11055	null
2023-11-17	PEFT-MedAware: Large Language Model for Medical Awareness	Keivalya Pandya et.al.	2311.10697	null
2023-11-17	Countering Misinformation via Emotional Response Generation	Daniel Russo et.al.	2311.10587	link
2023-11-16	MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning	Xiangru Tang et.al.	2311.10537	link
2023-11-16	ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond	Kanhai S. Amin et.al.	2311.10075	null
2023-11-16	HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs	Junying Chen et.al.	2311.09774	link
2023-11-16	CARE: Extracting Experimental Findings From Clinical Literature	Aakanksha Naik et.al.	2311.09736	null
2023-11-16	Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation	Zonghai Yao et.al.	2311.09684	link
2023-11-16	LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks	Mihir Parmar et.al.	2311.09564	link
2023-11-12	Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling	Yujin Cho et.al.	2311.09243	null
2023-11-15	PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health	Haoan Jin et.al.	2311.09189	link
2023-11-14	Fine-tuning Language Models for Factuality	Katherine Tian et.al.	2311.08401	null
2023-11-14	Extrinsically-Focused Evaluation of Omissions in Medical Summarization	Elliot Schumacher et.al.	2311.08303	link
2023-11-14	Insights into Classifying and Mitigating LLMs’ Hallucinations	Alessandro Bruno et.al.	2311.08117	null
2023-11-13	It’s Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models	Nishant Balepur et.al.	2311.07532	link
2023-11-13	Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer	Narmada Naik et.al.	2311.07191	null
2023-11-12	Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations?	Antonio Zaitoun et.al.	2311.06858	link
2023-11-23	ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences	Yuanhe Tian et.al.	2311.06025	link
2023-11-09	A Survey of Large Language Models in Medicine: Progress, Application, and Challenge	Hongjian Zhou et.al.	2311.05112	link
2023-11-08	DEMASQ: Unmasking the ChatGPT Wordsmith	Kavita Kumari et.al.	2311.05019	null
2023-11-07	Evaluating Large Language Models in Ophthalmology	Jason Holmes et.al.	2311.04933	null
2023-11-07	Evaluating multiple large language models in pediatric ophthalmology	Jason Holmes et.al.	2311.04368	null
2023-11-08	An Introduction to Natural Language Processing Techniques and Framework for Clinical Implementation in Radiation Oncology	Reza Khanmohammadi et.al.	2311.02205	null
2023-11-03	Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review	Mingze Yuan et.al.	2311.01918	link
2023-11-27	LLM-driven Multimodal Target Volume Contouring in Radiation Oncology	Yujin Oh et.al.	2311.01908	link
2023-11-01	Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models	Ran Xu et.al.	2311.00287	link
2023-10-31	Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision	Jiaxin Zhang et.al.	2310.20153	null
2023-11-03	Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization	Prakamya Mishra et.al.	2310.20033	link
2023-10-30	EHRTutor: Enhancing Patient Understanding of Discharge Instructions	Zihao Zhang et.al.	2310.19212	null
2023-10-23	Health Disparities through Generative AI Models: A Comparison Study Using A Domain Specific large language model	Yohn Jairo Parra Bautista et.al.	2310.18355	null
2023-10-21	MOELoRA: An MOE-based Parameter Efficient Fine-Tuning Method for Multi-task Medical Applications	Qidong Liu et.al.	2310.18339	link
2023-11-01	Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare	Junling Liu et.al.	2310.17956	link
2023-10-31	Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting	Benjamin Yan et.al.	2310.17811	null
2023-10-25	An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives	Young Min Cho et.al.	2310.17017	link
2023-10-24	Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature	Alejandro Lozano et.al.	2310.16146	link
2023-10-24	NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes	Junda Wang et.al.	2310.15959	link
2023-10-24	BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT	Yirong Chen et.al.	2310.15896	link
2023-10-24	BLESS: Benchmarking Large Language Models on Sentence Simplification	Tannon Kew et.al.	2310.15773	link
2023-10-23	AlpaCare:Instruction-tuned Large Language Models for Medical Application	Xinlu Zhang et.al.	2310.14558	link
2023-10-22	PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain	Wei Zhu et.al.	2310.14151	link
2023-10-23	Explainable Depression Symptom Detection in Social Media	Eliseo Bao Souto et.al.	2310.13664	null
2023-10-23	Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries	Yiqiao Jin et.al.	2310.13132	link
2023-10-19	Causal-structure Driven Augmentations for Text OOD Generalization	Amir Feder et.al.	2310.12803	null
2023-10-18	On the Benefit of Generative Foundation Models for Human Activity Recognition	Zikang Leng et.al.	2310.12085	null
2023-10-17	Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models	Khushboo Verma et.al.	2310.11266	null
2023-10-16	JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning	Issey Sukeda et.al.	2310.10083	null
2023-10-13	Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation	Eun Cheol Choi et.al.	2310.09223	null
2023-10-13	Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model	Qichen Ye et.al.	2310.09089	link

UncertaintyLLM

Publish Date	Title	Authors	PDF	Code
2025-07-23	BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems	Malsha Ashani Mahawatta Dona et.al.	2507.17722	null
2025-07-23	Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks	Ilias Chatzistefanidis et.al.	2507.17695	null
2025-07-23	An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models	Haoran Sun et.al.	2507.17477	null
2025-07-23	Each to Their Own: Exploring the Optimal Embedding in RAG	Shiting Chen et.al.	2507.17442	null
2025-07-23	R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning	Zhuokun Chen et.al.	2507.17307	null
2025-07-23	HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery	Haoran Jiang et.al.	2507.17209	null
2025-07-23	SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs	Zhiqiang Liu et.al.	2507.17178	null
2025-07-23	Resilient Multi-Agent Negotiation for Medical Supply Chains:Integrating LLMs and Blockchain for Transparent Coordination	Mariam ALMutairi et.al.	2507.17134	null
2025-07-23	Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance	Yufei He et.al.	2507.17131	null
2025-07-22	Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems	Chengxuan Xia et.al.	2507.17061	null
2025-07-22	Harnessing RLHF for Robust Unanswerability Recognition and Trustworthy Response Generation in LLMs	Shuyuan Lin et.al.	2507.16951	null
2025-07-22	CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage	Na Li et.al.	2507.16872	null
2025-07-22	Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support	Fangjian Lei et.al.	2507.16754	null
2025-07-23	Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints	Zhenyun Yin et.al.	2507.16727	null
2025-07-22	ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs	Zhenliang Zhang et.al.	2507.16488	null
2025-07-22	Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework	Hongyi Tang et.al.	2507.16414	null
2025-07-23	WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking	Zipeng Ling et.al.	2507.16199	null
2025-07-21	Efficient Compositional Multi-tasking for On-device Large Language Models	Ondrej Bohdal et.al.	2507.16083	null
2025-07-21	Towards Mitigation of Hallucination for LLM-empowered Agents: Progressive Generalization Bound Exploration and Watchdog Monitor	Siyuan Liu et.al.	2507.15903	null
2025-07-21	Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks	Hope Schroeder et.al.	2507.15821	null
2025-07-21	LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra	Seth Karten et.al.	2507.15815	null
2025-07-21	Interleaved LLM and Motion Planning for Generalized Multi-Object Collection in Large Scene Graphs	Ruochu Yang et.al.	2507.15782	null
2025-07-21	On the Inevitability of Left-Leaning Political Bias in Aligned Language Models	Thilo Hagendorff et.al.	2507.15328	null
2025-07-21	Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems	Qian Xiong et.al.	2507.15296	null
2025-07-20	MUR: Momentum Uncertainty guided Reasoning for Large Language Models	Hang Yan et.al.	2507.14958	null
2025-07-20	Byzantine-Robust Decentralized Coordination of LLM Agents	Yongrae Jo et.al.	2507.14928	null
2025-07-20	InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis	Jiale Liu et.al.	2507.14899	null
2025-07-19	Large Language Models as Medical Codes Selectors: a benchmark using the International Classification of Primary Care	Vinicius Anjos de Almeida et.al.	2507.14681	null
2025-07-19	Cleanse: Uncertainty Estimation Approach Using Clustering-based Semantic Consistency in LLMs	Minsuh Joo et.al.	2507.14649	null
2025-07-18	Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering	Michael J. Zellinger et.al.	2507.14406	null
2025-07-18	DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation	Ziqi Wang et.al.	2507.14267	null
2025-07-14	DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base	Song Mao et.al.	2507.14189	null
2025-07-18	Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors	Jochen Wulf et.al.	2507.14034	null
2025-07-18	Preprint: Did I Just Browse A Website Written by LLMs?	Sichang “Steven” He et.al.	2507.13933	null
2025-07-18	RAG-based Architectures for Drug Side Effect Retrieval in LLMs	Shad Nygren et.al.	2507.13822	null
2025-07-17	GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models	Eduardo C. Garrido-Merchán et.al.	2507.13550	null
2025-07-17	Aligning Knowledge Graphs and Language Models for Factual Accuracy	Nur A Zarin Nishat et.al.	2507.13411	null
2025-07-17	DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning	Rahel Rickenbach et.al.	2507.12855	null
2025-07-17	Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines	Muhammad Javed et.al.	2507.12840	null
2025-07-16	LLM-Based Config Synthesis requires Disambiguation	Rajdeep Mondal et.al.	2507.12443	null
2025-07-16	From Static to Intelligent: Evolving SaaS Pricing with LLMs	Francisco Javier Cavero et.al.	2507.12104	null
2025-07-16	Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning	Tosin Adewumi et.al.	2507.12079	null
2025-07-16	PoTPTQ: A Two-step Power-of-Two Post-training for LLMs	Xinyu Wang et.al.	2507.11959	null
2025-07-15	CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks	Meng Li et.al.	2507.11742	null
2025-07-15	LLM-based ambiguity detection in natural language instructions for collaborative surgical robots	Ana Davila et.al.	2507.11525	null
2025-07-15	Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces	Yunhao Yang et.al.	2507.11352	null
2025-07-15	Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems	Dany Moshkovich et.al.	2507.11277	null
2025-07-15	An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling	Anh Nguyen-Duc et.al.	2507.11272	null
2025-07-15	An Agentic Flow for Finite State Machine Extraction using Prompt Chaining	Fares Wael et.al.	2507.11222	null
2025-07-15	Mixture of Experts in Large Language Models	Danyang Zhang et.al.	2507.11181	null
2025-07-15	What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests	Dimitri Staufer et.al.	2507.11128	null
2025-07-15	LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP	Haowei Yang et.al.	2507.11052	null
2025-07-15	Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment	Adam Yang et.al.	2507.11042	null
2025-07-15	First-Order Error Matters: Accurate Compensation for Quantized Large Language Models	Xingyu Zheng et.al.	2507.11017	null
2025-07-14	Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs	Ye Yang et.al.	2507.10630	null
2025-07-16	GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning	Ziru Liu et.al.	2507.10628	null
2025-07-11	Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing	Dennis Ulmer et.al.	2507.10587	null
2025-07-11	AutoRAG-LoRA: Hallucination-Triggered Knowledge Retuning via Lightweight Adapters	Kaushik Dwivedi et.al.	2507.10586	null
2025-07-14	Referential ambiguity and clarification requests: comparing human and LLM behaviour	Chris Madge et.al.	2507.10445	null
2025-07-14	DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs	Jiahe Zhao et.al.	2507.10302	null
2025-07-14	The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents	Lixu Wang et.al.	2507.10016	null
2025-07-14	Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning	Zijun Chen et.al.	2507.10007	null
2025-07-13	Prompting for Performance: Exploring LLMs for Configuring Software	Helge Spieker et.al.	2507.09790	null
2025-07-16	Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs	Yangning Li et.al.	2507.09477	null
2025-07-12	LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing	Quanyan Zhu et.al.	2507.09407	null
2025-07-22	Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models	Anita Kriz et.al.	2507.09279	null
2025-07-12	StockSim: A Dual-Mode Order-Level Simulator for Evaluating Multi-Agent LLMs in Financial Markets	Charidimos Papadakis et.al.	2507.09255	null
2025-07-12	Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models	Ameen Ali et.al.	2507.09185	null
2025-07-12	Position Paper: Programming Language Techniques for Bridging LLM Code Generation Semantic Gaps	Yalong Du et.al.	2507.09135	null
2025-07-11	SetupBench: Assessing Software Engineering Agents’ Ability to Bootstrap Development Environments	Avi Arora et.al.	2507.09063	null
2025-07-11	GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval	Savini Kashmira et.al.	2507.08945	null
2025-07-09	RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation	Tianzhe Zhao et.al.	2507.08862	null
2025-07-11	Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study	Marina Luketina et.al.	2507.08468	null
2025-07-10	TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs	Duygu Nur Yaldiz et.al.	2507.08203	null
2025-07-10	CTRLS: Chain-of-Thought Reasoning via Latent State-Transition	Junda Wu et.al.	2507.08182	null
2025-07-10	Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores	Vivek Chari et.al.	2507.08143	null
2025-07-10	TableReasoner: Advancing Table Reasoning Framework with Large Language Models	Sishi Xiong et.al.	2507.08046	null
2025-07-09	Integrating External Tools with Large Language Models to Improve Accuracy	Nripesh Niketan et.al.	2507.08034	null
2025-07-10	Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models	Chen Feng et.al.	2507.07877	null
2025-07-10	DocCHA: Towards LLM-Augmented Interactive Online diagnosis System	Xinyi Liu et.al.	2507.07870	null
2025-07-10	From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems	Youngjoon Jang et.al.	2507.07847	null
2025-07-10	When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance	Peizhang Shao et.al.	2507.07748	null
2025-07-10	Prompt Engineering for Requirements Engineering: A Literature Review and Roadmap	Kaicheng Huang et.al.	2507.07682	null
2025-07-15	Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models	Varin Sikka et.al.	2507.07505	null
2025-07-10	Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models	Kaiqu Liang et.al.	2507.07484	null
2025-07-09	Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery	Malikussaid et.al.	2507.07328	null
2025-07-09	An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation	Maya Kruse et.al.	2507.07236	null
2025-07-09	Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics	Xueqing Xu et.al.	2507.07155	null
2025-07-07	DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning	Shreyas Vinaya Sathyanarayana et.al.	2507.07060	null
2025-07-09	5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage	Ugur Ari et.al.	2507.07045	null
2025-07-09	First Return, Entropy-Eliciting Explore	Tianyu Zheng et.al.	2507.07017	null
2025-07-09	Investigating the Robustness of Retrieval-Augmented Generation at the Query Level	Sezen Perçin et.al.	2507.06956	null
2025-07-09	On the Effect of Uncertainty on Layer-wise Inference Dynamics	Sunwoo Kim et.al.	2507.06722	null
2025-07-10	The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production	Juan B. Gutiérrez et.al.	2507.06565	null
2025-07-09	On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks	Stephen Obadinma et.al.	2507.06489	null
2025-07-08	Humans overrely on overconfident language models, across languages	Neil Rathi et.al.	2507.06306	null
2025-07-08	Differential Mamba	Nadav Schneider et.al.	2507.06204	null
2025-07-08	UQLM: A Python Package for Uncertainty Quantification in Large Language Models	Dylan Bouchard et.al.	2507.06196	null
2025-07-08	KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation	Zeyuan Meng et.al.	2507.05863	null
2025-07-08	Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik’s Cube	Chongshan Fan et.al.	2507.05607	null
2025-07-07	“Lost-in-the-Later”: Framework for Quantifying Contextual Grounding in Large Language Models	Yufei Tao et.al.	2507.05424	null
2025-07-07	On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study	Riccardo Alberghi et.al.	2507.05362	null
2025-07-07	LCDS: A Logic-Controlled Discharge Summary Generation System Supporting Source Attribution and Expert Review	Cheng Yuan et.al.	2507.05319	null
2025-07-04	ReservoirChat: Interactive Documentation Enhanced with LLM and Knowledge Graph for ReservoirPy	Virgile Boraud et.al.	2507.05279	null
2025-07-07	CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale	Jonathan Hyun et.al.	2507.05178	null
2025-07-07	What Shapes User Trust in ChatGPT? A Mixed-Methods Study of User Attributes, Trust Dimensions, Task Context, and Societal Perceptions among University Students	Kadija Bouyzourn et.al.	2507.05046	null
2025-07-07	MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction	Kaleem Ullah Qasim et.al.	2507.04893	null
2025-07-07	Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs	Swayamjit Saha et.al.	2507.04625	null
2025-07-07	any4: Learned 4-bit Numeric Representation for LLMs	Mostafa Elhoushi et.al.	2507.04610	null
2025-07-06	Unveiling the Potential of Diffusion Large Language Model in Controllable Generation	Zhen Xiong et.al.	2507.04504	null
2025-07-06	The role of large language models in UI/UX design: A systematic literature review	Ammar Ahmed et.al.	2507.04469	null
2025-07-06	Data Discovery using LLMs – A Study of Data User Behaviour	Christin Katharina Kreutz et.al.	2507.04444	null
2025-07-06	Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models	Pranta Saha et.al.	2507.04432	null
2025-07-06	AutoLayout: Closed-Loop Layout Synthesis via Slow-Fast Collaborative Reasoning	Weixing Chen et.al.	2507.04293	null
2025-07-10	DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth	Zheng Lian et.al.	2507.04278	null
2025-07-05	SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding	Runcong Zhao et.al.	2507.04189	null
2025-07-05	Token Level Hallucination Detection via Variance in Language Models	Keshav Kumar et.al.	2507.04137	null
2025-07-05	Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing	Jinwei Hu et.al.	2507.04105	null
2025-07-05	Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features	Thuy An Ha et.al.	2507.03998	null
2025-07-05	CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate	Yiliu Sun et.al.	2507.03928	null
2025-07-05	KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis	Reilly Haskins et.al.	2507.03847	null
2025-07-09	Skewed Score: A statistical framework to assess autograders	Magda Dubois et.al.	2507.03772	null
2025-07-04	Roadmap for using large language models (LLMs) to accelerate cross-disciplinary research with an example from computational biology	Ruian Ke et.al.	2507.03722	null
2025-07-04	Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy	Francisca Lemos et.al.	2507.03620	null
2025-07-04	REAL: Benchmarking Abilities of Large Language Models for Housing Transactions and Services	Kexin Zhu et.al.	2507.03477	null
2025-07-04	Conformal Information Pursuit for Interactively Guiding Large Language Models	Kwan Ho Ryan Chan et.al.	2507.03279	null
2025-07-04	KinyaColBERT: A Lexically Grounded Retrieval Model for Low-Resource Retrieval-Augmented Generation	Antoine Nzeyimana et.al.	2507.03241	null
2025-07-03	How Much Content Do LLMs Generate That Induces Cognitive Bias in Users?	Abeer Alessa et.al.	2507.03194	null
2025-07-03	How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models	Dharshan Kumaran et.al.	2507.03120	null
2025-07-03	Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case	Alvaro Riquelme et.al.	2507.03067	null
2025-07-03	Cautious Next Token Prediction	Yizhou Wang et.al.	2507.03038	null
2025-07-03	Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation	Johnson Thomas et.al.	2507.03033	null
2025-07-01	GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models	Seshu Tirupathi et.al.	2507.02986	null
2025-07-06	KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs	Yuzhang Xie et.al.	2507.02773	null
2025-07-03	Who’s Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots	Zahra Ashktorab et.al.	2507.02745	null
2025-07-03	Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory	Kenneth Payne et.al.	2507.02618	null
2025-07-03	MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion	Xin Guan et.al.	2507.02595	null
2025-07-03	WebSailor: Navigating Super-human Reasoning for Web Agent	Kuan Li et.al.	2507.02592	null
2025-07-04	Introducing a New Brexit-Related Uncertainty Index: Its Evolution and Economic Consequences	Ismet Gocer et.al.	2507.02439	null
2025-07-03	Uncertainty-aware Reward Design Process	Yang Yang et.al.	2507.02256	null
2025-07-03	DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs	Mohammad Akyash et.al.	2507.02226	null
2025-07-02	The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems	Reza Yousefi Maragheh et.al.	2507.02097	null
2025-07-02	Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs	Mohammad Ali Alomrani et.al.	2507.02076	null
2025-07-02	SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars	Xiaosheng Zhao et.al.	2507.01939	null
2025-07-02	High-Layer Attention Pruning with Rescaling	Songtao Liu et.al.	2507.01900	null
2025-07-02	Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks	Hanlin Cai et.al.	2507.01694	null
2025-07-02	Efficient Out-of-Scope Detection in Dialogue Systems via Uncertainty-Driven LLM Routing	Álvaro Zaera et.al.	2507.01541	null
2025-07-02	Using multi-agent architecture to mitigate the risk of LLM hallucinations	Abd Elrahman Amer et.al.	2507.01446	null
2025-07-07	Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading	Yoonseok Yang et.al.	2507.01431	null
2025-07-02	Penalizing Transparency? How AI Disclosure and Author Demographics Shape Human and AI Judgments About Writing	Inyoung Cheong et.al.	2507.01418	null
2025-07-02	ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks	Zhiyao Ren et.al.	2507.01321	null
2025-07-02	Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care	Matthew JY Kang et.al.	2507.01282	null
2025-07-01	Good Enough to Learn: LLM-based Anomaly Detection in ECU Logs without Reliable Labels	Bogdan Bogdan et.al.	2507.01077	null
2025-07-01	On the Surprising Efficacy of LLMs for Penetration-Testing	Andreas Happe et.al.	2507.00829	null
2025-07-01	Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding	Guangyi Zhang et.al.	2507.00605	null
2025-07-01	TUM-MiKaNi at SemEval-2025 Task 3: Towards Multilingual and Knowledge-Aware Non-factual Hallucination Identification	Miriam Anschütz et.al.	2507.00579	null
2025-07-01	Reliable Annotations with Less Effort: Evaluating LLM-Human Collaboration in Search Clarifications	Leila Tavakoli et.al.	2507.00543	null
2025-06-30	Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission	Faranaksadat Solat et.al.	2507.00082	null
2025-06-26	Estimating Correctness Without Oracles in LLM-Based Code Generation	Thomas Valentin et.al.	2507.00057	null
2025-06-25	VSF-Med: A Vulnerability Scoring Framework for Medical Vision-Language Models	Binesh Sadanandan et.al.	2507.00052	null
2025-06-30	Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice	Akshit Kumar et.al.	2506.23924	null
2025-06-30	Large Language Models for Statistical Inference: Context Augmentation with Applications to the Two-Sample Problem and Regression	Marc Ratkovic et.al.	2506.23862	null
2025-06-30	Leveraging a Multi-Agent LLM-Based System to Educate Teachers in Hate Incidents Management	Ewelina Gajewska et.al.	2506.23774	null
2025-06-30	The Confidence Paradox: Can LLM Know When It’s Wrong	Sahil Tripathi et.al.	2506.23464	null
2025-06-29	Do LLMs Dream of Discrete Algorithms?	Claudionor Coelho Jr et.al.	2506.23408	null
2025-07-01	Learning-to-Context Slope: Evaluating In-Context Learning Effectiveness Beyond Performance Illusions	Dingzriui Wang et.al.	2506.23146	null
2025-06-29	LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation	Shadman Sobhan et.al.	2506.23136	null
2025-06-28	Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration	Ramya Hebbalaguppe et.al.	2506.22819	null
2025-06-28	Enhancing Android Malware Detection with Retrieval-Augmented Generation	Saraga S. et.al.	2506.22750	null
2025-06-28	RAILS: Retrieval-Augmented Intelligence for Learning Software Development	Wali Mohammad Abdullah et.al.	2506.22742	null
2025-06-27	ReCo: Reminder Composition Mitigates Hallucinations in Vision-Language Models	Sotirios Panagiotis Chytas et.al.	2506.22636	null
2025-06-26	Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation	Deyu Zou et.al.	2506.22518	null
2025-06-25	Mitigating Gambling-Like Risk-Taking Behaviors in Large Language Models: A Behavioral Economics Approach to AI Safety	Y. Du et.al.	2506.22496	null
2025-06-24	Hallucination Detection with Small Language Models	Ming Cheung et.al.	2506.22486	null
2025-06-27	Probabilistic Optimality for Inference-time Scaling	Youkang Wang et.al.	2506.22376	null
2025-06-27	Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics	Michael A. Riegler et.al.	2506.21964	null
2025-06-27	The Consistency Hypothesis in Uncertainty Quantification for Large Language Models	Quan Xiao et.al.	2506.21849	null
2025-06-26	MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models	Yifan Liu et.al.	2506.21784	null
2025-06-26	Evaluating List Construction and Temporal Understanding capabilities of Large Language Models	Alexandru Dumitru et.al.	2506.21783	null
2025-06-26	THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?	Xin Wang et.al.	2506.21763	null
2025-06-22	Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines	Wenhao Li et.al.	2506.21615	null
2025-06-20	CORE-KG: An LLM-Driven Knowledge Graph Construction Framework for Human Smuggling Networks	Dipak Meher et.al.	2506.21607	null
2025-06-26	Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection	Ali Şenol et.al.	2506.21443	null
2025-06-26	Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference	Colin Samplawski et.al.	2506.21408	null
2025-06-26	Small Encoders Can Rival Large Decoders in Detecting Groundedness	Istabrak Abbes et.al.	2506.21288	null
2025-06-26	BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services	Zhaojiacheng Zhou et.al.	2506.21033	null
2025-06-26	Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers	Martin Ruskov et.al.	2506.20982	null
2025-06-25	Towards Probabilistic Question Answering Over Tabular Data	Chen Shen et.al.	2506.20747	null
2025-06-25	Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges	Alexander D. Kalian et.al.	2506.20598	null
2025-06-26	TAPS: Tool-Augmented Personalisation via Structured Tagging	Ekaterina Taktasheva et.al.	2506.20409	null
2025-06-25	Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models	Kejia Chen et.al.	2506.20251	null
2025-06-25	DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs	Ruokai Yin et.al.	2506.20194	null
2025-06-24	KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality	Baochang Ren et.al.	2506.19807	null
2025-06-24	LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis	Lei Kang et.al.	2506.19702	null
2025-06-24	Correcting Hallucinations in News Summaries: Exploration of Self-Correcting LLM Methods with External Knowledge	Juraj Vladika et.al.	2506.19607	null
2025-06-24	Automatic Posology Structuration: What role for LLMs?	Natalia Bobkova et.al.	2506.19525	null
2025-06-24	Inference-Time Reward Hacking in Large Language Models	Hadi Khalaf et.al.	2506.19248	null
2025-06-23	AgenticControl: An Automated Control Design Framework Using Large Language Models	Mohammad Narimani et.al.	2506.19160	null
2025-06-23	Human-Aligned Faithfulness in Toxicity Explanations of LLMs	Ramaravind K. Mothilal et.al.	2506.19113	null
2025-06-23	Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge	Sahil Kale et.al.	2506.18998	null
2025-06-23	AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs	Piotr Matys et.al.	2506.18628	null
2025-06-23	ReFrame: Rectification Framework for Image Explaining Architectures	Debjyoti Das Adhikary et.al.	2506.18272	null
2025-06-24	Understanding Reasoning in Thinking Language Models via Steering Vectors	Constantin Venhoff et.al.	2506.18167	null
2025-06-22	Mechanistic Interpretability in the Presence of Architectural Obfuscation	Marcos Florencio et.al.	2506.18053	null
2025-06-22	QueueEDIT: Structural Self-Correction for Sequential Model Editing in LLMs	Taolin Zhang et.al.	2506.17864	null
2025-06-21	Is Your Automated Software Engineer Trustworthy?	Noble Saji Mathews et.al.	2506.17812	null
2025-06-30	KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation	Dalong Zhang et.al.	2506.17728	null
2025-06-21	Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering	Binquan Ji et.al.	2506.17692	null
2025-06-21	Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models	Yukun Huang et.al.	2506.17585	null
2025-06-20	OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections	Manasa Bharadwaj et.al.	2506.17449	null
2025-06-20	UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making	Jinhao Duan et.al.	2506.17419	null
2025-06-20	Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs	Zongjie Li et.al.	2506.17353	null
2025-06-18	Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study	Chuanlei Li et.al.	2506.17311	null
2025-06-17	Semantic uncertainty in advanced decoding methods for LLM generation	Darius Foodeei et.al.	2506.17296	null
2025-06-20	Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction	Jiekai Ma et.al.	2506.17203	null
2025-06-20	Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation	Jiahao Cheng et.al.	2506.17088	null
2025-06-20	Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond	Antonin Berthon et.al.	2506.16982	null
2025-06-20	DistillNote: LLM-based clinical note summaries improve heart failure diagnosis	Heloisa Oss Boll et.al.	2506.16777	null
2025-06-20	eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing	Isaac Shi et.al.	2506.16768	null
2025-06-20	The Role of Model Confidence on Bias Effects in Measured Uncertainties	Xinyi Liu et.al.	2506.16724	null
2025-06-19	Grounding Language Models with Semantic Digital Twins for Robotic Planning	Mehreen Naeem et.al.	2506.16493	null
2025-06-19	Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation	Guilherme Guerino et.al.	2506.16345	null
2025-06-19	SGIC: A Self-Guided Iterative Calibration Framework for RAG	Guanhua Chen et.al.	2506.16172	null
2025-06-19	Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior	Hao Li et.al.	2506.16163	link
2025-06-19	Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning	Duc Hieu Ho et.al.	2506.16064	null
2025-06-19	DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling	Fei Wang et.al.	2506.16043	null
2025-06-18	Understanding Online Polarization Through Human-Agent Interaction in a Synthetic LLM-Based Social Network	Tim Donkers et.al.	2506.15866	null
2025-06-18	PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection	Wenhao Li et.al.	2506.15656	null
2025-06-18	Context-Informed Grounding Supervision	Hyunji Lee et.al.	2506.15480	link
2025-06-18	Unlocking Post-hoc Dataset Inference with Synthetic Data	Bihe Zhao et.al.	2506.15271	null
2025-06-18	Robust Instant Policy: Leveraging Student’s t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation	Hanbit Oh et.al.	2506.15157	null
2025-06-18	HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models	Trishna Chakraborty et.al.	2506.15065	null
2025-06-17	Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning	Wassim Bouaziz et.al.	2506.14913	null
2025-06-17	Issue Retrieval and Verification Enhanced Supplementary Code Comment Generation	Yanzhen Zou et.al.	2506.14649	link
2025-06-17	Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees	Ahmed Heakl et.al.	2506.14606	null
2025-06-17	RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition	Tim Cofala et.al.	2506.14412	null
2025-06-17	Don’t Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning	William F. Shen et.al.	2506.14387	null
2025-06-17	AviationLLM: An LLM-based Knowledge System for Aviation Training	Jia’ang Wan et.al.	2506.14336	null
2025-06-17	Improving LoRA with Variational Learning	Bai Cong et.al.	2506.14280	null
2025-06-17	DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization	Chengyu Huang et.al.	2506.14157	link
2025-06-17	Abstract Meaning Representation for Hospital Discharge Summarization	Paul Landes et.al.	2506.14101	link
2025-06-20	Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs	Hen Davidov et.al.	2506.13593	link
2025-06-16	Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning	David Bani-Harouni et.al.	2506.13474	null
2025-06-17	ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models	Junho Yoon et.al.	2506.13472	null
2025-06-16	From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs	Alsharif Abuadbba et.al.	2506.13434	null
2025-06-16	Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs	Houcheng Jiang et.al.	2506.13285	null
2025-06-16	IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation	Zijie Lin et.al.	2506.13229	link
2025-06-16	SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists	Lynn Khellaf et.al.	2506.13188	null
2025-06-16	Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning	Danny Hoang et.al.	2506.13026	null
2025-06-17	Surprise Calibration for Better In-Context Learning	Zhihang Tan et.al.	2506.12796	null
2025-06-15	Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning	Alexis R. Tudor et.al.	2506.12667	null
2025-06-14	Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics	Jiarui Liu et.al.	2506.12657	null
2025-06-14	GenControl: Generative AI-Driven Autonomous Design of Control Algorithms	Chenggang Cui et.al.	2506.12554	null
2025-06-14	RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking	Shuo Yang et.al.	2506.12538	null
2025-06-14	Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation	Xiangyan Chen et.al.	2506.12496	null
2025-06-14	MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination	Ao Jia et.al.	2506.12483	null
2025-06-13	Uncovering Bias Paths with LLM-guided Causal Discovery: An Active Learning and Dynamic Scoring Approach	Khadija Zanna et.al.	2506.12227	null
2025-06-13	A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions	Stephen Mell et.al.	2506.12202	null
2025-06-13	Maximally-Informative Retrieval for State Space Model Generation	Evan Becker et.al.	2506.12149	null
2025-06-12	LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model’s Response for Vulnerability Analysis	Reza Fayyazi et.al.	2506.12100	link
2025-06-13	LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?	Zihan Zheng et.al.	2506.11928	null
2025-06-13	TreeRL: LLM Reinforcement Learning with On-Policy Tree Search	Zhenyu Hou et.al.	2506.11902	link
2025-06-16	Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making	Claudio Fanconi et.al.	2506.11887	null
2025-06-13	Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study	Hawau Olamide Toyin et.al.	2506.11602	null
2025-06-13	Augmenting the Generality and Performance of Large Language Models for Software Engineering	Fabian C. Peña et.al.	2506.11548	null
2025-06-11	Digitization of Document and Information Extraction using OCR	Rasha Sinha et.al.	2506.11156	null
2025-06-11	From over-reliance to smart integration: using Large-Language Models as translators between specialized modeling and simulation tools	Philippe J. Giabbanelli et.al.	2506.11141	null
2025-06-10	Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK	Carlos Garcia-Fernandez et.al.	2506.11129	null
2025-06-14	Farseer: A Refined Scaling Law in Large Language Models	Houyi Li et.al.	2506.10972	link
2025-06-12	Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers	Yixiao Huang et.al.	2506.10887	null
2025-06-13	Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles	Qingyan Wei et.al.	2506.10848	link
2025-06-12	Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs	Alberto Testoni et.al.	2506.10769	null
2025-06-12	Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs	Yilin Xiao et.al.	2506.10508	null
2025-06-12	PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier	Yuhua Jiang et.al.	2506.10406	null
2025-06-12	AutoGEEval++: A Multi-Level and Multi-Geospatial-Modality Automated Evaluation Framework for Large Language Models in Geospatial Code Generation on Google Earth Engine	Shuyang Hou et.al.	2506.10365	null
2025-06-12	TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree	Yu-Yang Qian et.al.	2506.10355	link
2025-06-12	Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements	Seyed Moein Abtahi et.al.	2506.10330	null
2025-06-12	WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models	Qiyue Yin et.al.	2506.10264	null
2025-06-11	ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs	Xiyao Wang et.al.	2506.10128	link
2025-06-11	Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection	David Farr et.al.	2506.10104	null
2025-06-11	Textual Bayes: Quantifying Uncertainty in LLM-Based Systems	Brendan Leigh Ross et.al.	2506.10060	null
2025-06-10	Evaluation empirique de la sécurisation et de l’alignement de ChatGPT et Gemini: analyse comparative des vulnérabilités par expérimentations de jailbreaks	Rafaël Nouailles et.al.	2506.10029	null
2025-06-16	Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs	Hiroshi Matsuda et.al.	2506.09983	link
2025-06-11	Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs	Rodion Oblovatny et.al.	2506.09886	null
2025-06-11	Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?	Andreas Säuberli et.al.	2506.09796	null
2025-06-11	Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models	Haoyi Song et.al.	2506.09684	link
2025-06-11	Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering	Tianjun Yao et.al.	2506.09645	link
2025-06-11	HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding	Yanzhao Shi et.al.	2506.09634	null
2025-06-11	From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies	Blaž Škrlj et.al.	2506.09566	null
2025-06-11	DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts	Yuchen Feng et.al.	2506.09351	null
2025-06-11	Know What You Don’t Know: Uncertainty Calibration of Process Reward Models	Young-Jin Park et.al.	2506.09338	null
2025-06-10	G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration	Samuel Holt et.al.	2506.09272	null
2025-06-10	Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation	Karl Löwenmark et.al.	2506.09247	null
2025-06-10	The Curious Language Model: Strategic Test-Time Information Acquisition	Michael Cooper et.al.	2506.09173	null
2025-06-10	Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models	Xinyuan Wang et.al.	2506.09084	null
2025-06-10	FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making	Jiaxiang Chen et.al.	2506.09080	null
2025-06-10	AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions	Polina Kirichenko et.al.	2506.09038	link
2025-06-11	Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance	Kaifeng He et.al.	2506.08980	null
2025-06-10	The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation	Francisco Vargas et.al.	2506.08827	null
2025-06-12	ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization	Hee Suk Yoon et.al.	2506.08712	null
2025-06-10	RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being	Rahatara Ferdousi et.al.	2506.08486	null
2025-06-10	Olica: Efficient Structured Pruning of Large Language Models without Retraining	Jiujun He et.al.	2506.08436	link
2025-06-11	Transforming Expert Knowledge into Scalable Ontology via Large Language Models	Ikkei Itoku et.al.	2506.08422	null
2025-06-09	Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic	Zhenjiang Mao et.al.	2506.08243	null
2025-06-09	Conservative Bias in Large Language Models: Measuring Relation Predictions	Toyin Aguda et.al.	2506.08120	null
2025-06-10	Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jiaxiang Chen et.al.	2506.07820	null
2025-06-09	Language-Vision Planner and Executor for Text-to-Visual Reasoning	Yichang Xu et.al.	2506.07778	null
2025-06-09	QUITE: A Query Rewrite System Beyond Rules with LLM Agents	Yuyang Song et.al.	2506.07675	null
2025-06-09	Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models	Ruiyang Zhang et.al.	2506.07575	null
2025-06-09	SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition	Mengsong Wu et.al.	2506.07557	null
2025-06-09	CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models	Guang Liu et.al.	2506.07463	null
2025-06-09	From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered	Siddartha Devic et.al.	2506.07461	null
2025-06-09	Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs	T. Duy Nguyen-Hien et.al.	2506.07448	null
2025-06-11	MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models	Philip R. Liu et.al.	2506.07400	link
2025-06-10	ARGUS: Hallucination and Omission Evaluation in Video-LLMs	Ruchit Rawal et.al.	2506.07371	null
2025-06-08	ConfQA: Answer Only If You Are Confident	Yin Huang et.al.	2506.07309	null
2025-06-08	Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance	Mohammadreza Chavoshi et.al.	2506.07273	null
2025-06-08	Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis	Yaping Chai et.al.	2506.07148	null
2025-06-08	Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models	Samir Abdaljalil et.al.	2506.07106	null
2025-06-08	Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models	Kai Xiong et.al.	2506.07064	null
2025-06-08	AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint	Leheng Sheng et.al.	2506.07022	link
2025-06-07	Quantile Regression with Large Language Models for Price Prediction	Nikhita Vedula et.al.	2506.06657	null
2025-06-07	QuantMCP: Grounding Large Language Models in Verifiable Financial Reality	Yifan Zeng et.al.	2506.06622	null
2025-06-06	Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques	Adarsh Prasad Behera et.al.	2506.06579	null
2025-06-06	Beyond Facts: Evaluating Intent Hallucination in Large Language Models	Yijie Hao et.al.	2506.06539	null
2025-06-11	Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models	Pengyi Li et.al.	2506.06395	null
2025-06-04	On the Fundamental Impossibility of Hallucination Control in Large Language Models	Michał P. Karpowicz et.al.	2506.06382	null
2025-06-06	Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge	Yi Sui et.al.	2506.06240	null
2025-06-06	Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach	James Ford et.al.	2506.06175	null
2025-06-06	Recommender systems, stigmergy, and the tyranny of popularity	Zackary Okun Dunivin et.al.	2506.06162	null
2025-06-09	MIRIAD: Augmenting LLMs with millions of medical query-response pairs	Qinyue Zheng et.al.	2506.06091	null
2025-06-06	AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search	Yu Li et.al.	2506.06017	null
2025-06-06	Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques	Xiaofei Xu et.al.	2506.05924	null
2025-06-06	Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness	Rongzhe Wei et.al.	2506.05735	null
2025-06-09	Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models	Zefan Zeng et.al.	2506.05675	null
2025-06-05	When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding	Yan Shu et.al.	2506.05551	null
2025-06-05	Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models	Sima Noorani et.al.	2506.05497	null
2025-06-05	CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection	Ron Eliav et.al.	2506.05243	null
2025-06-05	On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools	Shivani Upadhyay et.al.	2506.05182	link
2025-06-05	When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models	Kai Wang et.al.	2506.04909	null
2025-06-05	Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights	Giorgio Biancini et.al.	2506.04851	null
2025-06-05	Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models	Changyue Wang et.al.	2506.04832	link
2025-06-05	A Reasoning-Based Approach to Cryptic Crossword Clue Solving	Martin Andrews et.al.	2506.04824	null
2025-06-05	GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval	Lingyuan Liu et.al.	2506.04762	link
2025-06-05	Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning	Zhiyuan Ma et.al.	2506.04625	null
2025-06-05	Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification	Chengwu Liu et.al.	2506.04592	null
2025-06-04	AuthGuard: Generalizable Deepfake Detection via Language Guidance	Guangyu Shen et.al.	2506.04501	null
2025-06-04	“Don’t Do That!”: Guiding Embodied Systems through Large Language Model-based Constraint Generation	Aladin Djuhera et.al.	2506.04500	null
2025-06-04	Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification	Payel Bhattacharjee et.al.	2506.04450	null
2025-06-06	TracLLM: A Generic Framework for Attributing Long Context LLMs	Yanting Wang et.al.	2506.04202	link
2025-06-04	N²: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion	Caleb Chin et.al.	2506.04166	link
2025-06-04	A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization	Sarvesh Soni et.al.	2506.04156	null
2025-06-04	High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning	Tim Franzmeyer et.al.	2506.04051	null
2025-06-04	Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization	Jiulong Wu et.al.	2506.04039	null
2025-06-05	Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems	Yuxin Zhang et.al.	2506.03901	null
2025-06-04	Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation	Mingxuan Xia et.al.	2506.03857	null
2025-06-04	From Theory to Practice: Real-World Use Cases on Trustworthy LLM-Driven Process Modeling, Prediction and Automation	Peter Pfeiffer et.al.	2506.03801	null
2025-06-04	Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision	Chaeyun Jang et.al.	2506.03723	null
2025-06-04	AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism	Zhepei Wei et.al.	2506.03700	link
2025-06-04	Robust Preference Optimization via Dynamic Target Margins	Jie Sun et.al.	2506.03690	null
2025-06-04	Trustworthy Medical Question Answering: An Evaluation-Centric Survey	Yinuo Wang et.al.	2506.03659	null
2025-06-04	Learning to Insert [PAUSE] Tokens for Better Reasoning	Eunki Kim et.al.	2506.03616	null
2025-06-04	Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering	Zhuo Zhuo et.al.	2506.03504	null
2025-06-03	Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior	Yue Gong et.al.	2506.03444	null
2025-06-03	Sampling Preferences Yields Simple Trustworthiness Scores	Sean Steinle et.al.	2506.03399	null
2025-06-03	Ask a Local: Detecting Hallucinations With Specialized Model Divergence	Aldan Creo et.al.	2506.03357	null
2025-06-03	Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows	Yifei Ming et.al.	2506.03332	null
2025-06-03	FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes	Christodoulos Constantinides et.al.	2506.03278	link
2025-06-03	Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech	Florian Ludwig et.al.	2506.03009	null
2025-06-03	Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation	Li Zhang et.al.	2506.02992	null
2025-06-03	Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation	Dingwei Chen et.al.	2506.02973	null
2025-06-04	A Multi-agent LLM-based JUnit Test Generation with Strong Oracles	Qinghua Xu et.al.	2506.02943	null
2025-06-03	Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs	Shangmin Guo et.al.	2506.02918	null
2025-06-03	Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs	Wenjing Tang et.al.	2506.02860	null
2025-06-03	Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations	Jinyuan Luo et.al.	2506.02696	null
2025-06-04	Computational Thinking Reasoning in Large Language Models	Kechi Zhang et.al.	2506.02658	null
2025-06-03	In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration	Jiajie Fu et.al.	2506.02509	null
2025-06-03	Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning	Haowen Xu et.al.	2506.02485	null
2025-06-02	Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation	Priyaranjan Pattnayak et.al.	2506.02097	null
2025-06-02	DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation	Jennifer Chen et.al.	2506.01954	null
2025-06-02	Self-ensemble: Mitigating Confidence Distortion for Large Language Models	Zicheng Xu et.al.	2506.01951	null
2025-06-02	WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue	Yaoyao Qian et.al.	2506.01881	link
2025-06-02	Benford’s Curse: Tracing Digit Bias to Numerical Hallucination in LLMs	Jiandong Shao et.al.	2506.01734	null
2025-06-02	Fairness Dynamics During Training	Krishna Patel et.al.	2506.01709	null
2025-06-02	When LLMs Team Up: The Emergence of Collaborative Affective Computing	Wenna Lai et.al.	2506.01698	null
2025-06-02	MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments	Xiao Yang et.al.	2506.01616	null
2025-06-02	Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes	Meng Li et.al.	2506.01512	null
2025-06-02	MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations	Kensuke Mitsuzawa et.al.	2506.01367	null
2025-06-02	Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents	Manan Suri et.al.	2506.01344	null
2025-06-02	Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model	Yuanhe Tian et.al.	2506.01266	null
2025-06-01	Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation	Pimchanok Sukjai et.al.	2506.01118	null
2025-06-01	ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation	Xinyi Liu et.al.	2506.01116	null
2025-06-01	Reconsidering LLM Uncertainty Estimation Methods in the Wild	Yavuz Bakman et.al.	2506.01114	null
2025-06-01	Contextual Candor: Enhancing LLM Trustworthiness Through Hierarchical Unanswerability Detection	Steven Robinson et.al.	2506.01104	null
2025-06-01	Taming LLMs by Scaling Learning Rates with Gradient Grouping	Siyuan Li et.al.	2506.01049	null
2025-06-01	Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks	Yuntai Bao et.al.	2506.00823	link
2025-06-01	One for All: Update Parameterized Knowledge Across Multiple Models	Weitao Ma et.al.	2506.00817	null
2025-06-01	Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision	Jiahui Zhou et.al.	2506.00807	null
2025-06-01	KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision	Rong Wu et.al.	2506.00783	null
2025-06-01	Do not Abstain! Identify and Solve the Uncertainty	Jingyu Liu et.al.	2506.00780	null
2025-05-31	Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection	Yeshwanth Venkatesha et.al.	2506.00743	null
2025-05-31	Pitfalls in Evaluating Language Model Forecasters	Daniel Paleka et.al.	2506.00723	null
2025-06-03	Measuring Faithfulness and Abstention: An Automated Pipeline for Evaluating LLM-Generated 3-ply Case-Based Legal Arguments	Li Zhang et.al.	2506.00694	null
2025-05-31	Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs	Chenjun Xu et.al.	2506.00582	link
2025-05-31	AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs	Nicholas E. Corrado et.al.	2506.00569	null
2025-06-03	CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention	Yuxi Sun et.al.	2506.00519	null
2025-05-31	Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering	Linhao Ye et.al.	2506.00491	null
2025-05-31	Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization	Suhas BN et.al.	2506.00448	null
2025-05-31	Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy	Jie Ren et.al.	2506.00359	null
2025-05-31	Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs	Sungjae Lee et.al.	2506.00344	null
2025-05-31	TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering	Boyi Zhang et.al.	2506.00331	null
2025-05-31	Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning	Sara Ghazanfari et.al.	2506.00318	null
2025-05-30	Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity	Dang Nguyen et.al.	2506.00245	null
2025-05-30	MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs	Gabrielle Kaili-May Liu et.al.	2505.24858	link
2025-05-30	Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs	Juraj Vladika et.al.	2505.24830	null
2025-06-02	Guiding Generative Storytelling with Knowledge Graphs	Zhijun Pan et.al.	2505.24803	null
2025-05-30	Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty?	Jiayu Liu et.al.	2505.24778	link
2025-05-30	Can LLMs and humans be friends? Uncovering factors affecting human-AI intimacy formation	Yeseon Hong et.al.	2505.24658	null
2025-05-30	The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models	Junyi Li et.al.	2505.24630	link
2025-05-30	LLM Inference Enhanced by External Knowledge: A Survey	Yu-Hsuan Lin et.al.	2505.24377	link
2025-05-30	ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration	Xianglong Yan et.al.	2505.24357	null
2025-05-30	Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction	Yangui Fang et.al.	2505.24347	null
2025-05-30	LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization	Zirui Shang et.al.	2505.24282	null
2025-06-02	MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM	Bowen Dong et.al.	2505.24238	null
2025-05-30	ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction	Murari Ambati et.al.	2505.24230	null
2025-05-30	Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling	Yimin Du et.al.	2505.24199	null
2025-05-29	Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model	Nokimul Hasan Arif et.al.	2505.24007	null
2025-05-29	Fitting the Message to the Moment: Designing Calendar-Aware Stress Messaging with Large Language Models	Pranav Rao et.al.	2505.23997	null
2025-05-29	Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	Yinong Oliver Wang et.al.	2505.23996	null
2025-05-29	FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression	Jiayi Tian et.al.	2505.23966	link
2025-05-29	Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation	Caiqi Zhang et.al.	2505.23912	null
2025-05-29	Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems	Winstead Zhu et.al.	2505.23908	null
2025-05-29	Revisiting Uncertainty Estimation and Calibration of Large Language Models	Linwei Tao et.al.	2505.23854	null
2025-05-28	Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs	Jakub Podolak et.al.	2505.23845	null
2025-05-28	SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context	Hairu Wang et.al.	2505.23841	null
2025-05-29	SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models	Zixiang Xu et.al.	2505.23713	link
2025-06-02	Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation	Hongxiang Zhang et.al.	2505.23657	null
2025-06-01	Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms	Jane Cleland-Huang et.al.	2505.23576	null
2025-05-30	EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions	Xiaorui Wu et.al.	2505.23473	null
2025-06-01	A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy	Ahmad Mohsin et.al.	2505.23397	null
2025-05-29	Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs	Julia Belikova et.al.	2505.23299	null
2025-05-29	Daunce: Data Attribution through Uncertainty Estimation	Xingyuan Pan et.al.	2505.23223	null
2025-05-29	DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes	Sungjune Park et.al.	2505.23179	null
2025-05-29	AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models	Jinchuan Zhang et.al.	2505.23020	link
2025-05-28	Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents	Michael Kirchhof et.al.	2505.22655	null
2025-05-28	The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason	Ang Lv et.al.	2505.22653	null
2025-05-30	Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs	Ziling Cheng et.al.	2505.22630	null
2025-05-28	Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding	Chengyue Wu et.al.	2505.22618	null
2025-05-28	Does Johnny Get the Message? Evaluating Cybersecurity Notifications for Everyday Users	Victor Jüttner et.al.	2505.22435	null
2025-05-28	AI Trust Reshaping Administrative Burdens: Understanding Trust-Burden Dynamics in LLM-Assisted Benefits Systems	Jeongwon Jo et.al.	2505.22418	null
2025-05-28	Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation	Yunsoo Kim et.al.	2505.22222	null
2025-05-31	iDSE: Navigating Design Space Exploration in High-Level Synthesis Using LLMs	Runkai Li et.al.	2505.22086	null
2025-05-28	Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?	Yujin Choi et.al.	2505.22061	null
2025-05-28	Legal Assist AI: Leveraging Transformer-Based Model for Effective Legal Assistance	Jatin Gupta et.al.	2505.22003	null
2025-05-28	ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning	Zhendong Mi et.al.	2505.21987	null
2025-05-28	Judging LLMs on a Simplex	Patrick Vossler et.al.	2505.21972	null
2025-05-28	Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning	Qihuang Zhong et.al.	2505.21958	null
2025-05-27	Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation	Tharindu Kumarage et.al.	2505.21784	null
2025-05-27	Calibrating LLM Confidence by Probing Perturbed Representation Stability	Reza Khanmohammadi et.al.	2505.21772	null
2025-05-30	Do We Know What LLMs Don’t Know? A Study of Consistency in Knowledge Probing	Raoyuan Zhao et.al.	2505.21701	null
2025-05-27	The Feasibility of Topic-Based Watermarking on Academic Peer Reviews	Alexander Nemecek et.al.	2505.21636	null
2025-05-27	Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems	Young-Min Cho et.al.	2505.21588	null
2025-05-27	Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making	Yihan Wang et.al.	2505.21503	null
2025-05-27	Can Large Reasoning Models Self-Train?	Sheikh Shafayat et.al.	2505.21444	null
2025-05-27	Pretrained LLMs Learn Multiple Types of Uncertainty	Roi Cohen et.al.	2505.21218	null
2025-05-27	Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA	Sergey Pletenev et.al.	2505.21115	null
2025-05-27	A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction	Bogdan Bogachov et.al.	2505.21109	null
2025-05-27	Thinker: Learning to Think Fast and Slow	Stephen Chung et.al.	2505.21097	null
2025-05-28	Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation	Ekaterina Fadeeva et.al.	2505.21072	null
2025-05-27	Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking	Lingyi Cai et.al.	2505.21045	null
2025-05-27	Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA	Xiangqing Shen et.al.	2505.20971	null
2025-05-27	IRCopilot: Automated Incident Response with Large Language Models	Xihuan Lin et.al.	2505.20945	null
2025-05-27	Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration?	Ziming Wang et.al.	2505.20903	null
2025-05-27	MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection	Baraa Hikal et.al.	2505.20880	null
2025-05-27	Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG	Xin Sun et.al.	2505.20871	null
2025-05-27	AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding	Chaeyoung Jung et.al.	2505.20862	null
2025-05-27	Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation	Wooseong Yang et.al.	2505.20773	null
2025-05-30	CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models	Xiaqiang Tang et.al.	2505.20767	link
2025-05-27	RRO: LLM Agent Optimization Through Rising Reward Trajectories	Zilong Wang et.al.	2505.20737	null
2025-05-26	Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting	Ana Rita Ortigoso et.al.	2505.20521	null
2025-05-26	InFact: Informativeness Alignment for Improved LLM Factuality	Roi Cohen et.al.	2505.20487	null
2025-05-26	HAMburger: Accelerating LLM Inference via Token Smashing	Jingyu Liu et.al.	2505.20438	null
2025-05-26	GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation	Zihong Chen et.al.	2505.20416	link
2025-05-26	GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining	Simin Fan et.al.	2505.20380	null
2025-05-26	Reasoning LLMs are Wandering Solution Explorers	Jiahao Lu et.al.	2505.20296	null
2025-05-26	Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?	Michael Kirchhof et.al.	2505.20295	null
2025-05-26	Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models	Weihao Xuan et.al.	2505.20236	null
2025-05-27	Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning	Xiaorong Wang et.al.	2505.20195	null
2025-05-26	From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data	Chun-Yi Kuan et.al.	2505.20166	null
2025-05-26	Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities	Chuangtao Ma et.al.	2505.20099	link
2025-05-26	Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks	Debargha Ganguly et.al.	2505.20047	null
2025-05-26	Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs	Artem Vazhentsev et.al.	2505.20045	null
2025-05-26	DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response	Bilel Cherif et.al.	2505.19973	null
2025-05-26	CP-Router: An Uncertainty-Aware Router Between LLM and LRM	Jiayuan Su et.al.	2505.19970	null
2025-05-26	Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision	Tej Deep Pala et.al.	2505.19706	link
2025-05-26	Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement	Liqin Ye et.al.	2505.19675	link
2025-05-26	DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue	Yichun Feng et.al.	2505.19630	link
2025-05-26	Learning to Reason without External Rewards	Xuandong Zhao et.al.	2505.19590	link
2025-05-26	Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models	Jianxing Liao et.al.	2505.19490	null
2025-05-26	Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection	Mohammad Mahdi Moradi et.al.	2505.19475	null
2025-05-26	Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents	Ye Ye et.al.	2505.19436	link
2025-05-26	Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering	Jiajun Zhu et.al.	2505.19410	null
2025-05-26	VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation	Ethan TS. Liu et.al.	2505.19395	link
2025-05-25	Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales	Charles Godfrey et.al.	2505.19334	null
2025-05-25	LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models	Aida Kostikova et.al.	2505.19240	null
2025-05-25	GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling	Jialong Zhou et.al.	2505.19234	null
2025-05-25	LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling	Yang Xiao et.al.	2505.19187	link
2025-05-27	When Two LLMs Debate, Both Think They’ll Win	Pradyumna Shyama Prasad et.al.	2505.19184	null
2025-05-25	Do Large Language Models (Really) Need Statistical Foundations?	Weijie Su et.al.	2505.19145	null
2025-05-25	CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models	Yongheng Zhang et.al.	2505.19108	link
2025-05-25	Towards Harmonized Uncertainty Estimation for Large Language Models	Rui Li et.al.	2505.19073	null
2025-05-25	UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models	Roman Vashurin et.al.	2505.19060	null
2025-05-25	Online Knowledge Distillation with Reward Guidance	Chen Jia et.al.	2505.18952	null
2025-05-25	LLM-Guided Taxonomy and Hierarchical Uncertainty for 3D Point Cloud Active Learning	Chenxi Li et.al.	2505.18924	null
2025-05-24	Mitigating Deceptive Alignment via Self-Monitoring	Jiaming Ji et.al.	2505.18807	null
2025-05-24	PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs	Tengxuan Liu et.al.	2505.18610	link
2025-05-24	Response Uncertainty and Probe Modeling: Two Sides of the Same Coin in LLM Interpretability?	Yongjie Wang et.al.	2505.18575	null
2025-05-24	B-score: Detecting biases in large language models using response history	An Vo et.al.	2505.18545	null
2025-05-24	Benchmarking Poisoning Attacks against Retrieval-Augmented Generation	Baolei Zhang et.al.	2505.18543	null
2025-05-24	RoleRAG: Enhancing LLM Role-Playing via Graph Guided Retrieval	Yongjie Wang et.al.	2505.18541	null
2025-05-24	AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking	Soyoung Yoon et.al.	2505.18512	link
2025-05-24	MedScore: Factuality Evaluation of Free-Form Medical Answers	Heyuan Huang et.al.	2505.18452	link
2025-05-23	Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps	Khandakar Ashrafi Akbar et.al.	2505.18426	null
2025-05-23	Model Editing with Graph-Based External Memory	Yash Kumar Atri et.al.	2505.18343	null
2025-05-23	NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache	Donghyun Son et.al.	2505.18231	null
2025-05-23	Evidence-Grounded Multimodal Misinformation Detection with Attention-Based GNNs	Sharad Duwal et.al.	2505.18221	null
2025-05-26	Outcome-based Reinforcement Learning to Predict the Future	Benjamin Turtel et.al.	2505.17989	null
2025-05-23	LLM Meeting Decision Trees on Tabular Data	Hangting Ye et.al.	2505.17918	null
2025-05-23	Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour	Bálint Gyevnár et.al.	2505.17801	null
2025-05-23	C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models	Amir Hossein Rahmati et.al.	2505.17773	null
2025-05-23	But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors	Leon Eshuijs et.al.	2505.17760	null
2025-05-23	Get Experience from Practice: LLM Agents with Record & Replay	Erhu Feng et.al.	2505.17716	null
2025-05-23	Distilling LLM Agent into Small Models with Retrieval and Code Tools	Minki Kang et.al.	2505.17612	link
2025-05-23	Dynamic Text Bundling Supervision for Zero-Shot Inference on Text-Attributed Graphs	Yusheng Zhao et.al.	2505.17599	null
2025-05-23	Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection	Shrey Pandit et.al.	2505.17558	null
2025-05-23	How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception	Shiyu Ni et.al.	2505.17537	null
2025-05-23	CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents	Minsoo Khang et.al.	2505.17503	null
2025-05-23	keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection	Saketh Reddy Vemula et.al.	2505.17485	link
2025-05-23	Self-Training Large Language Models with Confident Reasoning	Hyosoon Jang et.al.	2505.17454	null
2025-05-23	A Fully Generative Motivational Interviewing Counsellor Chatbot for Moving Smokers Towards the Decision to Quit	Zafarullah Mahmood et.al.	2505.17362	link
2025-05-22	GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints	Soren DeHaan et.al.	2505.17327	null
2025-05-22	Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty	Peilin Wu et.al.	2505.17281	null
2025-05-22	Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG)	Clayton Cohn et.al.	2505.17238	null
2025-05-22	LLM-Powered Agents for Navigating Venice’s Historical Cadastre	Tristan Karch et.al.	2505.17148	null
2025-05-22	When can isotropy help adapt LLMs’ next word prediction to numerical domains?	Rashed Shelim et.al.	2505.17135	null
2025-05-21	NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction	Soyeon Kim et.al.	2505.17125	null
2025-05-22	R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning	Huatong Song et.al.	2505.17005	link
2025-05-22	UNCLE: Uncertainty Expressions in Long-Form Generation	Ruihan Yang et.al.	2505.16922	null
2025-05-22	Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs	Zeyu Wei et.al.	2505.16894	null
2025-05-22	Walk&Retrieve: Simple Yet Effective Zero-shot Retrieval-Augmented Generation via Knowledge Graph Walks	Martin Böckling et.al.	2505.16849	link
2025-05-22	Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement	Kexin Zhang et.al.	2505.16806	null
2025-05-22	Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs	Zeping Yu et.al.	2505.16703	null
2025-05-22	Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator	Beier Luo et.al.	2505.16690	null
2025-05-22	Collaboration among Multiple Large Language Models for Medical Question Answering	Kexin Shang et.al.	2505.16648	null
2025-05-22	Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering	Bowen Jiang et.al.	2505.16591	null
2025-05-22	Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs	Giovanni Servedio et.al.	2505.16520	null
2025-05-24	Recursive Offloading for LLM Serving in Multi-tier Networks	Zhiyuan Wu et.al.	2505.16502	link
2025-05-22	Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery	Yanbo Zhang et.al.	2505.16477	null
2025-05-22	MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM	Siwei Meng et.al.	2505.16456	null
2025-05-22	Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems	Hongru Song et.al.	2505.16367	null
2025-05-22	HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation	Shijie Zhang et.al.	2505.16281	null
2025-05-22	Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation	Derong Xu et.al.	2505.16237	null
2025-05-22	Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models	Menschikov Mikhail et.al.	2505.16134	null
2025-05-22	Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning	Junhong Lin et.al.	2505.16122	null
2025-05-22	LLM-Powered AI Agent Systems and Their Applications in Industry	Guannan Liang et.al.	2505.16120	null
2025-05-22	Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools	Panagiotis Lymperopoulos et.al.	2505.16113	null
2025-05-23	Continually Self-Improving Language Models for Bariatric Surgery Question–Answering	Yash Kumar Atri et.al.	2505.16102	null
2025-05-21	Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation	Ruijie Xi et.al.	2505.16065	null
2025-05-21	SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models	Roland Daynauth et.al.	2505.16003	null
2025-05-22	HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving	Zhiwen Chen et.al.	2505.15793	null
2025-05-21	Long-Form Information Alignment Evaluation Beyond Atomic Facts	Danna Zheng et.al.	2505.15792	null
2025-05-21	Large Language Models as Computable Approximations to Solomonoff Induction	Jun Wan et.al.	2505.15784	null
2025-05-21	KaFT: Knowledge-aware Fine-tuning for Boosting LLMs’ Domain-specific Question-Answering Performance	Qihuang Zhong et.al.	2505.15480	null
2025-05-21	AdUE: Improving uncertainty estimation head for LoRA adapters in LLMs	Artem Zabolotnyi et.al.	2505.15443	null
2025-05-21	RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection	Yiming Huang et.al.	2505.15386	null
2025-05-21	Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack	Silvia Cappelletti et.al.	2505.15323	null
2025-05-21	Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization	Joonho Yang et.al.	2505.15291	null
2025-05-21	Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs	Zihao Pan et.al.	2505.15265	null
2025-05-22	Adaptive Plan-Execute Framework for Smart Contract Security Auditing	Zhiyuan Wei et.al.	2505.15242	null
2025-05-21	Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge	Yassir Fathullah et.al.	2505.15240	null
2025-05-21	Multilingual Prompting for Improving LLM Generation Diversity	Qihan Wang et.al.	2505.15229	null
2025-05-21	Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs	Jie Ma et.al.	2505.15210	link
2025-05-21	ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection	Jeonghye Kim et.al.	2505.15182	null
2025-05-21	Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning	Jinghui Lu et.al.	2505.15154	null
2025-05-21	The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning	Shivam Agarwal et.al.	2505.15134	link
2025-05-21	RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals	Xuanliang Zhang et.al.	2505.15110	null
2025-05-21	Cost-aware LLM-based Online Dataset Annotation	Eray Can Elumar et.al.	2505.15101	null
2025-05-21	PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration	Yingming Pu et.al.	2505.15047	link
2025-05-21	Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models	Zhihao Wen et.al.	2505.14992	null
2025-05-20	JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation	Ghasem Pasandi et.al.	2505.14978	null
2025-05-20	Foundations of Unknown-aware Machine Learning	Xuefeng Du et.al.	2505.14933	null
2025-05-20	LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization	Chih-Yu Chang et.al.	2505.14756	link
2025-05-20	Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models	Guangzhi Xiong et.al.	2505.14599	link
2025-05-20	Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples	Chun-Yi Kuan et.al.	2505.14518	null
2025-05-20	Reasoning Models Better Express Their Confidence	Dongkeun Yoon et.al.	2505.14489	link
2025-05-21	Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis	Haoming Huang et.al.	2505.14406	null
2025-05-20	Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs	Jiawen Wang et.al.	2505.14368	null
2025-05-20	Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents	Wei Fan et.al.	2505.14104	null
2025-05-20	MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations	Ernests Lavrinovics et.al.	2505.14101	link
2025-05-20	Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering	Yihua Zhu et.al.	2505.14099	null
2025-05-20	ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data	Xinzhe Zheng et.al.	2505.14038	null
2025-05-21	When LLMs meet open-world graph learning: a new perspective for unlabeled data uncertainty	Yanzhe Wen et.al.	2505.13989	null
2025-05-20	The Hallucination Tax of Reinforcement Finetuning	Linxin Song et.al.	2505.13988	null
2025-05-20	MLZero: A Multi-Agent System for End-to-end Machine Learning Automation	Haoyang Fang et.al.	2505.13941	link
2025-05-20	DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery	Kun Li et.al.	2505.13940	link
2025-05-20	Preference Learning with Lie Detectors can Induce Honesty or Evasion	Chris Cundy et.al.	2505.13787	link
2025-05-19	Incentivizing Truthful Language Models via Peer Elicitation Games	Baiting Chen et.al.	2505.13636	link
2025-05-19	Selective Code Generation for Functional Guarantees	Jaewoo Jeong et.al.	2505.13553	null
2025-05-19	Exploring Federated Pruning for Large Language Models	Pengxin Guo et.al.	2505.13547	link
2025-05-19	Know Or Not: a library for evaluating out-of-knowledge base robustness	Jessica Foo et.al.	2505.13545	link
2025-05-16	An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents	Ayesha Amjad et.al.	2505.13504	null
2025-05-19	GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection	Zhijie Deng et.al.	2505.13312	null
2025-05-19	Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice	Zhi Liu et.al.	2505.13156	null
2025-05-19	Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning	Debarpan Bhattacharya et.al.	2505.13115	link
2025-05-19	Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs	Shmulik Markovich-Golan et.al.	2505.13060	null
2025-05-19	Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering	Jianfeng Cai et.al.	2505.12826	null
2025-05-19	LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries	Kenya Abe et.al.	2505.12694	link
2025-05-19	Know3-RAG: A Knowledge-aware RAG Framework with Adaptive Retrieval, Generation, and Filtering	Xukai Liu et.al.	2505.12662	link
2025-05-18	UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection	Yang Zhao et.al.	2505.12457	null
2025-05-18	VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning	Qi Wang et.al.	2505.12434	link
2025-05-18	PSC: Extending Context Window of Large Language Models via Phase Shift Calibration	Wenqiao Zhu et.al.	2505.12423	link
2025-05-18	SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization	Minghan Chen et.al.	2505.12346	null
2025-05-18	Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge	Luyu Chen et.al.	2505.12301	null
2025-05-18	The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models	Linghan Huang et.al.	2505.12287	null
2025-05-18	Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation	Chengwei Qin et.al.	2505.12265	null
2025-05-17	The Impact of Emerging Phishing Threats: Assessing Quishing and LLM-generated Phishing Emails against Organizations	Marie Weinz et.al.	2505.12104	null
2025-05-20	MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities	Jingxue Chen et.al.	2505.12043	null
2025-05-17	SOCIA: An End-to-End Agentic Framework for Automated Cyber-Physical-Social Simulator Generation	Yuncheng Hua et.al.	2505.12006	null
2025-05-17	TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text	Ahmed Lekssays et.al.	2505.11988	link
2025-05-17	CCNU at SemEval-2025 Task 3: Leveraging Internal and External Knowledge of Large Language Models for Multilingual Hallucination Annotation	Xu Liu et.al.	2505.11965	null
2025-05-17	Fine-Grained ECG-Text Contrastive Learning via Waveform Understanding Enhancement	Haitao Li et.al.	2505.11939	null
2025-05-17	Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?	Zihao Dongfang et.al.	2505.11907	null
2025-05-17	When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research	Guijin Son et.al.	2505.11855	null
2025-05-17	Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs	Xuannan Liu et.al.	2505.11842	link
2025-05-17	Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling	Yitian Chen et.al.	2505.11792	null
2025-05-17	Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission	Seungeun Oh et.al.	2505.11788	null
2025-05-16	Token-Level Uncertainty Estimation for Large Language Model Reasoning	Tunyu Zhang et.al.	2505.11737	null
2025-05-16	Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models	Harshil Vejendla et.al.	2505.11731	null
2025-05-16	Terminators: Terms of Service Parsing and Auditing Agents	Maruf Ahmed Mridul et.al.	2505.11672	null
2025-05-16	EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	Bohao Xing et.al.	2505.11405	link
2025-05-19	Phare: A Safety Probe for Large Language Models	Pierre Le Jeune et.al.	2505.11365	link
2025-05-16	The Way We Prompt: Conceptual Blending, Neural Dynamics, and Prompt-Induced Transitions in LLMs	Makoto Sato et.al.	2505.10948	null
2025-05-19	Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation	Zhan Peng Lee et.al.	2505.10792	link
2025-05-19	Mitigate Language Priors in Large Vision-Language Models by Cross-Images Contrastive Decoding	Jianfei Zhao et.al.	2505.10634	null
2025-05-14	The Impact of Large Language Models on Task Automation in Manufacturing Services	Jochen Wulf et.al.	2505.10581	null
2025-05-20	AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges	Ranjan Sapkota et.al.	2505.10468	null
2025-05-15	GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs	Longchao Da et.al.	2505.10143	null
2025-05-16	Leveraging Graph Retrieval-Augmented Generation to Support Learners’ Understanding of Knowledge Concepts in MOOCs	Mohamed Abdelmagied et.al.	2505.10074	null
2025-05-15	Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis	Bingda Tang et.al.	2505.10046	link
2025-05-15	Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph	Deeksha Prahlad et.al.	2505.09945	link
2025-05-15	Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks	Ziyuan Zhang et.al.	2505.09901	link
2025-05-14	A Multimodal Multi-Agent Framework for Radiology Report Generation	Ziruo Yi et.al.	2505.09787	null
2025-05-14	Trustless Autonomy: Understanding Motivations, Benefits and Governance Dilemma in Self-Sovereign Decentralized AI Agents	Botao Amber Hu et.al.	2505.09757	null
2025-05-15	SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation	Achref Doula et.al.	2505.09427	null
2025-05-14	Statistical Modeling and Uncertainty Estimation of LLM Inference Systems	Kaustabha Ray et.al.	2505.09319	null
2025-05-14	Atomic Consistency Preference Optimization for Long-Form Question Answering	Jingfeng Chen et.al.	2505.09039	link
2025-05-13	Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification	Adarsh Kumar et.al.	2505.09031	null
2025-05-13	Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training	Yangyi Chen et.al.	2505.08971	link
2025-05-13	CellTypeAgent: Trustworthy cell type annotation with Large Language Models	Jiawen Chen et.al.	2505.08844	link
2025-05-13	Adaptive Schema-aware Event Extraction with Retrieval-Augmented Generation	Sheng Liang et.al.	2505.08690	null
2025-05-13	RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models	Fujun Zhang et.al.	2505.08463	null
2025-05-13	A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs	Artem Shelmanov et.al.	2505.08200	null
2025-05-12	LLMs to Support K-12 Teachers in Culturally Relevant Pedagogy: An AI Literacy Example	Jiayi Wang et.al.	2505.08083	null
2025-05-11	TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking	Ching Nam Hang et.al.	2505.07891	null
2025-05-10	Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints	Jian-Qiao Zhu et.al.	2505.07883	null
2025-05-09	Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy	A M Muntasir Rahman et.al.	2505.07871	null
2025-05-12	Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding	Yifeng Di et.al.	2505.07768	link
2025-05-12	KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation	Ching Han Chen et.al.	2505.07618	null
2025-05-12	Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent	Ziyang Huang et.al.	2505.07596	null
2025-05-12	Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models	Bahram Mohammadi et.al.	2505.07500	null
2025-05-12	Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis	Heydar Soudani et.al.	2505.07459	null
2025-05-12	LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning	Xiaotian Lin et.al.	2505.07437	link
2025-05-12	Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data	David de-Fitero-Dominguez et.al.	2505.07372	null
2025-05-12	Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection	Pei-Fu Guo et.al.	2505.07309	null
2025-05-12	Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs	Yifan Wei et.al.	2505.07184	link
2025-05-13	Exploring Anthropomorphism in Conversational Agents for Environmental Sustainability	Mathyas Giudici et.al.	2505.07142	null
2025-05-14	RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models	Hanzheng Dai et.al.	2505.07089	null
2025-05-10	POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models	Yangguang Shao et.al.	2505.06579	link
2025-05-10	LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus	Peihan Li et.al.	2505.06513	null
2025-05-09	Evolutionary thoughts: integration of large language models and evolutionary algorithms	Antonio Jimeno Yepes et.al.	2505.05756	link
2025-05-08	Adaptive Stress Testing Black-Box LLM Planners	Neeloy Chakraborty et.al.	2505.05665	null
2025-05-08	HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics	Lennart Luettgau et.al.	2505.05602	link
2025-05-08	FLAM: Frame-Wise Language-Audio Modeling	Yusong Wu et.al.	2505.05335	null
2025-05-08	MARK: Memory Augmented Refinement of Knowledge	Anish Ganguli et.al.	2505.05177	null
2025-05-08	A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network	Haoxiang Luo et.al.	2505.05103	null
2025-05-08	Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware	Yujia Chen et.al.	2505.05057	link
2025-05-08	An Open-Source Dual-Loss Embedding Model for Semantic Retrieval in Higher Education	Ramteja Sajja et.al.	2505.04916	null
2025-05-07	Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards	Manveer Singh Tamber et.al.	2505.04847	link
2025-05-07	Osiris: A Lightweight Open-Source Hallucination Detection System	Alex Shan et.al.	2505.04844	null
2025-05-07	A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models	Pedro Pinacho-Davidson et.al.	2505.04784	null
2025-05-07	The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems	Sutapa Dey Tithi et.al.	2505.04736	null
2025-05-06	Advancing Conversational Diagnostic AI with Multimodal Reasoning	Khaled Saab et.al.	2505.04653	null
2025-05-06	Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions	Adithya Kulkarni et.al.	2505.04651	null
2025-05-09	MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection	Zhihao Zhang et.al.	2505.04594	null
2025-05-07	Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters	David Exler et.al.	2505.04393	null
2025-05-07	Benchmarking LLMs’ Swarm intelligence	Kai Ruan et.al.	2505.04364	link
2025-05-07	LLM-Independent Adaptive RAG: Let the Question Speak for Itself	Maria Marina et.al.	2505.04253	null
2025-05-07	Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications	Yuanai Xie et.al.	2505.04068	null
2025-05-02	Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs	Ganghua Wang et.al.	2505.03814	null
2025-05-02	MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance	Xing Hu et.al.	2505.03804	null
2025-05-02	Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth	Changhai Zhou et.al.	2505.03802	null
2025-04-30	Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding	Trilok Padhi et.al.	2505.03788	null
2025-05-06	A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning	Kolawole E. Ogunsina et.al.	2505.03553	null
2025-05-06	Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis	Shuang Zhou et.al.	2505.03467	null
2025-05-06	Automatic Calibration for Membership Inference Attack on Large Language Models	Saleh Zare Zade et.al.	2505.03392	link
2025-05-06	Interpretable Zero-shot Learning with Infinite Class Concepts	Zihan Ye et.al.	2505.03361	null
2025-05-06	Artificial Behavior Intelligence: Technology, Challenges, and Future Directions	Kanghyun Jo et.al.	2505.03315	null
2025-05-06	A Trustworthy Multi-LLM Network: Challenges, Solutions, and A Use Case	Haoxiang Luo et.al.	2505.03196	null
2025-05-06	Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering	Joshua Owotogbe et.al.	2505.03096	null
2025-05-05	Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models	Zhengliang Shi et.al.	2505.03075	link
2025-05-05	UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output	Sicong Huang et.al.	2505.03030	null
2025-05-05	Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?	Guangzhi Sun et.al.	2505.02884	null
2025-05-05	Phase transitions in AI-human interaction networks: statistics, computation, and probabilistic modeling	Jackson George et.al.	2505.02879	null
2025-05-08	ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations	Dmitriy Shopkhoev et.al.	2505.02819	link
2025-05-05	Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing	Diji Yang et.al.	2505.02811	link
2025-05-06	Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation	Gerard Pons et.al.	2505.02737	null
2025-05-04	SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation	Tanguy Herserant et.al.	2505.02235	null
2025-05-12	LLM-Guided Probabilistic Program Induction for POMDP Model Estimation	Aidan Curtis et.al.	2505.02216	null
2025-05-04	Large Language Models are overconfident and amplify human bias	Fengfei Sun et.al.	2505.02151	null
2025-05-04	VECSR: Virtually Embodied Common Sense Reasoning System	Alexis R. Tudor et.al.	2505.02144	link
2025-05-06	Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation	Chenxi Liu et.al.	2505.02138	link
2025-05-04	Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach	Jiancong Xiao et.al.	2505.01997	null
2025-05-03	High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers	Brian Wong et.al.	2505.01693	null
2025-05-02	Always Tell Me The Odds: Fine-grained Conditional Probability Estimation	Liaoyaqi Wang et.al.	2505.01595	null
2025-05-02	Retrieval Augmented Learning: A Retrial-based Large Language Model Self-Supervised Learning and Autonomous Knowledge Generation	Zongyuan Li et.al.	2505.01073	null
2025-05-02	Multi-agents based User Values Mining for Recommendation	Lijian Chen et.al.	2505.00981	null
2025-05-01	Multivariate Conformal Selection	Tian Bai et.al.	2505.00917	null
2025-05-08	SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation	Quang P. M. Pham et.al.	2505.00831	link
2025-05-01	HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models	Zhaoxing Li et.al.	2505.00820	null
2025-05-01	A Survey on Large Language Model based Human-Agent Systems	Henry Peng Zou et.al.	2505.00753	link
2025-05-05	Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs	Dung Nguyen et.al.	2505.00744	null
2025-05-01	Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models	Makoto Sato et.al.	2505.00557	null
2025-05-01	HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection	Deanna Emery et.al.	2505.00506	null
2025-05-01	Distributed Retrieval-Augmented Generation	Chenhao Xu et.al.	2505.00443	link
2025-04-30	Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs	Jinyan Su et.al.	2505.00127	null
2025-04-30	Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5	Jeho Choi et.al.	2505.00060	null
2025-04-24	An Empirical Study on Prompt Compression for Large Language Models	Zheng Zhang et.al.	2505.00019	link
2025-04-30	MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness	Junsheng Huang et.al.	2504.21773	null
2025-04-30	Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA	Xuanzhao Dong et.al.	2504.21252	link
2025-05-01	AI-in-the-Loop Planning for Transportation Electrification: Case Studies from Austin, Texas	Seung Jun Choi et.al.	2504.21185	null
2025-04-29	LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge	Naheed Rayhan et.al.	2504.21132	null
2025-04-22	ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees	Jun Wang et.al.	2504.21022	null
2025-04-22	Context-Enhanced Contrastive Search for Improved LLM Text Generation	Jaydip Sen et.al.	2504.21020	null
2025-04-29	Jekyll-and-Hyde Tipping Point in an AI’s Behavior	Neil F. Johnson et.al.	2504.20980	null
2025-04-29	SetKE: Knowledge Editing for Knowledge Elements Overlap	Yifan Wei et.al.	2504.20972	null
2025-04-29	Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models	Maryna Vyshnyvetska et.al.	2504.20951	null
2025-04-29	DYNAMAX: Dynamic computing for Transformers and Mamba based architectures	Miguel Nogales et.al.	2504.20922	null
2025-04-29	Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges	Yunseo Lee et.al.	2504.20799	null
2025-04-29	Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think	Hasan Abed Al Kader Hammoud et.al.	2504.20708	null
2025-04-29	Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?	Evangelia Gogoulou et.al.	2504.20699	null
2025-04-29	Identifying Uncertainty in Self-Adaptive Robotics with Large Language Models	Hassan Sartaj et.al.	2504.20684	null
2025-04-30	TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data	Qi Wang et.al.	2504.20462	null
2025-04-28	Towards Large Language Models for Lunar Mission Planning and In Situ Resource Utilization	Michael Pekala et.al.	2504.20125	null
2025-04-24	RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning	Zihan Wang et.al.	2504.20073	link
2025-04-28	Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages	Pritika Rohera et.al.	2504.20022	null
2025-04-28	Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models	Xin Wang et.al.	2504.20020	null
2025-04-28	GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets	Mingqian He et.al.	2504.19898	null
2025-04-28	A Tripartite Perspective on GraphRAG	Michael Banf et.al.	2504.19667	null
2025-04-28	An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination	Dixiao Wei et.al.	2504.19480	null
2025-04-28	Towards Long Context Hallucination Detection	Siyi Liu et.al.	2504.19457	null
2025-04-27	Bi-directional Model Cascading with Proxy Confidence	David Warren et.al.	2504.19391	null
2025-04-27	The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach	Chad Coleman et.al.	2504.19255	null
2025-04-30	Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers	Dylan Bouchard et.al.	2504.19254	link
2025-04-27	Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models	Anindya Bijoy Das et.al.	2504.19061	null
2025-04-26	Calibrating Translation Decoding with Quality Estimation on LLMs	Di Wu et.al.	2504.19044	link
2025-04-26	AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression	Dong Whi Yoo et.al.	2504.18932	null
2025-04-26	Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning	Abdellah Ghassel et.al.	2504.18839	null
2025-04-25	Span-Level Hallucination Detection for LLM-Generated Answers	Passant Elchafei et.al.	2504.18639	null
2025-04-24	Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach	Iizalaarab Elhaimeur et.al.	2504.18603	null
2025-04-25	LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection	Rajesh Yarra et.al.	2504.18423	null
2025-04-25	Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review	Toghrul Abbasli et.al.	2504.18346	null
2025-04-25	Evaluating Evaluation Metrics – The Mirage of Hallucination Detection	Atharva Kulkarni et.al.	2504.18114	null
2025-04-25	Random-Set Large Language Models	Muhammad Mubashar et.al.	2504.18085	null
2025-04-25	Validating Network Protocol Parsers with Traceable RFC Document Interpretation	Mingwei Zheng et.al.	2504.18050	null
2025-04-24	LLM Agent Swarm for Hypothesis-Driven Drug Discovery	Kevin Song et.al.	2504.17967	null
2025-04-24	HalluLens: LLM Hallucination Benchmark	Yejin Bang et.al.	2504.17550	null
2025-04-24	Combining Static and Dynamic Approaches for Mining and Testing Constraints for RESTful API Testing	Hieu Huynh et.al.	2504.17287	null
2025-04-23	How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study	Rendi Chevi et.al.	2504.17083	null
2025-04-23	Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models	Shariar Kabir et.al.	2504.17052	null
2025-04-23	(Im)possibility of Automated Hallucination Detection in Large Language Models	Amin Karbasi et.al.	2504.17004	null
2025-04-18	SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments	Dachun Sun et.al.	2504.16947	null
2025-04-23	Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models	Xuyang Zhu et.al.	2504.16883	null
2025-04-23	Monte Carlo Planning with Large Language Model for Text-Based Game Agents	Zijing Shi et.al.	2504.16855	null
2025-04-23	Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories	Mareike Lisker et.al.	2504.16604	null
2025-04-23	ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving	Jie JW Wu et.al.	2504.16331	null
2025-04-23	Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations	Nikhil Khandalkar et.al.	2504.15903	null
2025-04-22	Dynamic Early Exit in Reasoning Models	Chenxu Yang et.al.	2504.15895	link
2025-04-22	Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback	Ning Wang et.al.	2504.15804	null
2025-04-22	Grounded in Context: Retrieval-Based Method for Hallucination Detection	Assaf Gerner et.al.	2504.15771	null
2025-04-20	PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind	Yajie Yu et.al.	2504.15313	null
2025-04-21	Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning	Ehsan Ahmadi et.al.	2504.15263	null
2025-04-21	Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges	Nandan Thakur et.al.	2504.15205	null
2025-04-21	The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models	Ronak Pradeep et.al.	2504.15068	null
2025-04-23	aiXamine: Simplified LLM Safety and Security	Fatih Deniz et.al.	2504.14985	null
2025-04-21	POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications	Chunjing Gan et.al.	2504.14917	null
2025-04-21	CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs	Yingming Zheng et.al.	2504.14905	link
2025-04-20	HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis	Kangwei Xu et.al.	2504.14641	null
2025-04-20	A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models	Hongming Tan et.al.	2504.14620	null
2025-04-20	a1: Steep Test-time Scaling Law via Environment Augmented Generation	Lingrui Mei et.al.	2504.14597	null
2025-04-20	Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey	Ahsan Bilal et.al.	2504.14520	null
2025-04-20	VizTA: Enhancing Comprehension of Distributional Visualization with Visual-Lexical Fused Conversational Interface	Liangwei Wang et.al.	2504.14507	null
2025-04-20	CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge	Armin Toroghi et.al.	2504.14462	null
2025-04-20	Information Diffusion and Preferential Attachment in a Network of Large Language Models	Adit Jain et.al.	2504.14438	null
2025-04-20	ResNetVLLM-2: Addressing ResNetVLLM’s Multi-Modal Hallucinations	Ahmad Khalil et.al.	2504.14429	null
2025-04-19	Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts	Kun Qian et.al.	2504.14375	null
2025-04-19	Density Measures for Language Generation	Jon Kleinberg et.al.	2504.14370	null
2025-04-19	Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model	Youngbin Lee et.al.	2504.14345	link
2025-04-19	A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models	Chengming Wang et.al.	2504.14241	null
2025-04-18	Metacognition and Uncertainty Communication in Humans and Large Language Models	Mark Steyvers et.al.	2504.14045	null
2025-04-18	Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy	Regan Bolton et.al.	2504.14044	null
2025-04-18	Going Whole Hog: A Philosophical Defense of AI Cognition	Herman Cappelen et.al.	2504.13988	null
2025-04-18	Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations	Chenghao Xiao et.al.	2504.13816	link
2025-04-18	Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results	Andrea Santilli et.al.	2504.13677	null
2025-04-18	Do Prompt Patterns Affect Code Quality? A First Empirical Assessment of ChatGPT-Generated Code	Antonio Della Porta et.al.	2504.13656	null
2025-04-18	Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs	Gabriel Freedman et.al.	2504.13644	link
2025-04-18	Long-context Non-factoid Question Answering in Indic Languages	Ritwik Mishra et.al.	2504.13615	link
2025-04-18	Continual Pre-Training is (not) What You Need in Domain Adaption	Pin-Er Chen et.al.	2504.13603	null
2025-04-18	Trust, but verify	Michael J. Yuan et.al.	2504.13443	null
2025-04-17	Energy-Based Reward Models for Robust Language Model Alignment	Anamika Lochab et.al.	2504.13134	link
2025-04-17	VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	Haojian Huang et.al.	2504.13122	link
2025-04-17	Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild	Jiatai Wang et.al.	2504.12982	null
2025-04-17	QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?	Zhouyang Jiang et.al.	2504.12961	null
2025-04-18	Customizing Emotional Support: How Do Individuals Construct and Interact With LLM-Powered Chatbots	Xi Zheng et.al.	2504.12943	null
2025-04-17	Explainable AI in Usable Privacy and Security: Challenges and Opportunities	Vincent Freiberger et.al.	2504.12931	null
2025-04-17	Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration	Yicheng Pan et.al.	2504.12773	link
2025-04-17	Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations	Yiyou Sun et.al.	2504.12691	link
2025-04-17	Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models	Liyi Zhang et.al.	2504.12585	link
2025-04-16	PlanGlow: Personalized Study Planning with an Explainable and Controllable LLM-Driven System	Jiwon Chun et.al.	2504.12452	link
2025-04-16	Don’t Just Translate, Agitate: Using Large Language Models as Devil’s Advocates for AI Explanations	Ashley Suh et.al.	2504.12424	null
2025-04-16	Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study	Harry Li et.al.	2504.12422	null
2025-04-16	Gauging Overprecision in LLMs: An Empirical Study	Adil Bahaj et.al.	2504.12098	null
2025-04-16	Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models	Kris Pilcher et.al.	2504.12012	null
2025-04-16	SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes	Raúl Vázquez et.al.	2504.11975	null
2025-04-16	Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading	Kihyun Kim et.al.	2504.11816	link
2025-04-16	Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming	Paul Denny et.al.	2504.11723	null
2025-04-15	From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs	Guocong Li et.al.	2504.11277	null
2025-04-16	Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR	Yulong Zhang et.al.	2504.11101	null
2025-04-15	MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique	Shuhang Liu et.al.	2504.11009	null
2025-04-14	CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates	Ankit Kumar Shaw et.al.	2504.10738	null
2025-04-14	HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving	Avinash Kumar et.al.	2504.10724	null
2025-04-14	EMAFusion: A Self-Optimizing System for Seamless LLM Selection and Integration	Soham Shah et.al.	2504.10681	null
2025-04-14	Efficient Process Reward Model Training via Active Learning	Keyu Duan et.al.	2504.10559	link
2025-04-09	Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion	Jakub Podolak et.al.	2504.10509	null
2025-04-14	Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling?	Olha Shaposhnyk et.al.	2504.10397	null
2025-04-16	Heimdall: test-time scaling on the generative verification	Wenlei Shi et.al.	2504.10337	null
2025-04-14	From Prompting to Alignment: A Generative Framework for Query Recommendation	Erxue Min et.al.	2504.10208	null
2025-04-14	DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation	Hanghui Guo et.al.	2504.10198	null
2025-04-14	HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection	Mohamed A. Abdallah et.al.	2504.10168	null
2025-04-14	C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation	Xu Zhang et.al.	2504.10167	null
2025-04-14	The Human Visual System Can Inspire New Interaction Paradigms for LLMs	Diana Robinson et.al.	2504.10101	null
2025-04-14	Hallucination Detection in LLMs via Topological Divergence on Attention Graphs	Alexandra Bazarova et.al.	2504.10063	null
2025-04-15	Emotional Strain and Frustration in LLM Interactions in Software Engineering	Cristina Martinez Montes et.al.	2504.10050	null
2025-04-14	DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify	Zhengxuan Zhang et.al.	2504.10036	null
2025-04-14	EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control	Hanwen Wan et.al.	2504.10030	link
2025-04-14	KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference	Yuxuan Tian et.al.	2504.09936	null
2025-04-14	Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data	Shuai Zhao et.al.	2504.09895	null
2025-04-14	Reasoning Models Can Be Effective Without Thinking	Wenjie Ma et.al.	2504.09858	null
2025-04-14	RAKG: Document-level Retrieval Augmented Knowledge Graph Construction	Hairong Zhang et.al.	2504.09823	link
2025-04-14	Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning	Jingtian Wu et.al.	2504.09781	null
2025-04-13	DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training	Zhenting Wang et.al.	2504.09710	link
2025-04-17	Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws	Zhixuan Pan et.al.	2504.09597	null
2025-04-17	ControlNET: A Firewall for RAG-based LLM System	Hongwei Yao et.al.	2504.09593	null
2025-04-13	How new data permeates LLM knowledge and how to dilute it	Chen Sun et.al.	2504.09522	null
2025-04-13	HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs	Sharanya Dasgupta et.al.	2504.09482	link
2025-04-13	Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection	MingShan Liu et.al.	2504.09440	null
2025-04-12	Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation	Pengcheng Zhou et.al.	2504.09301	null
2025-04-12	SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders	Ashmi Banerjee et.al.	2504.09277	null
2025-04-12	Towards More Efficient, Robust, Instance-adaptive, and Generalizable Online Learning	Zhiyong Wang et.al.	2504.09192	null
2025-04-11	Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation	Terrence Neumann et.al.	2504.08954	null
2025-04-11	Knowledge Graph-extended Retrieval Augmented Generation for Question Answering	Jasper Linders et.al.	2504.08893	null
2025-04-11	Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning	Fangzhi Xu et.al.	2504.08672	link
2025-04-11	MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation	Tao Zhang et.al.	2504.08621	link
2025-04-16	Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks	Ye Ye et.al.	2504.08525	link
2025-04-07	SEAL: Steerable Reasoning Calibration of Large Language Models for Free	Runjin Chen et.al.	2504.07986	link
2025-04-10	Token Level Routing Inference System for Edge Devices	Jianshu She et.al.	2504.07878	null
2025-04-10	Robust Hallucination Detection in LLMs via Adaptive Token Selection	Mengjia Niu et.al.	2504.07863	null
2025-04-17	PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization	Yang Jiao et.al.	2504.07717	null
2025-04-10	Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations	Sheila Castilho et.al.	2504.07680	null
2025-04-10	Enhancing Large Language Models through Neuro-Symbolic Integration and Ontological Reasoning	Ruslan Idelfonso Magana Vsevolodovna et.al.	2504.07640	link
2025-04-11	Malware analysis assisted by AI with R2AI	Axelle Apvrille et.al.	2504.07574	null
2025-04-10	A taxonomy of epistemic injustice in the context of AI and the case for generative hermeneutical erasure	Warmhold Jan Thomas Mollema et.al.	2504.07531	null
2025-04-10	Supervised Optimism Correction: Be Confident When LLMs Are Sure	Junjie Zhang et.al.	2504.07527	null
2025-04-10	Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction	Kyoyun Choi et.al.	2504.07415	null
2025-04-10	Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression	Hanqi Xiao et.al.	2504.07389	link
2025-04-11	Alice: Proactive Learning with Teacher’s Demonstrations for Weak-to-Strong Generalization	Shujin Wu et.al.	2504.07316	link
2025-04-09	HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification	Bibek Paudel et.al.	2504.07069	null
2025-04-11	Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration	Kostas Hatalis et.al.	2504.06943	null
2025-04-09	Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program	Minghe Gao et.al.	2504.06606	link
2025-04-09	Do Reasoning Models Show Better Verbalized Calibration?	Qingcheng Zeng et.al.	2504.06564	null
2025-04-08	Don’t Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning	Yuehan Qin et.al.	2504.06438	null
2025-04-08	Human Trust in AI Search: A Large-Scale Experiment	Haiwen Li et.al.	2504.06435	null
2025-04-09	GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization	Bojana Ranković et.al.	2504.06265	link
2025-04-08	VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs	Dongjun Qian et.al.	2504.05673	null
2025-04-08	On the Impact of Language Nuances on Sentiment Analysis with Large Language Models: Paraphrasing, Sarcasm, and Emojis	Naman Bhargava et.al.	2504.05603	null
2025-04-07	GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases	Alfred Clemedtson et.al.	2504.05478	link
2025-04-07	The challenge of uncertainty quantification of large language models in medicine	Zahra Atf et.al.	2504.05278	null
2025-04-07	DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation	Xinglin Lyu et.al.	2504.05122	link
2025-04-07	On the Performance of an Explainable Language Model on PubMedQA	Venkat Srinivasan et.al.	2504.05074	null
2025-04-07	Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning	Sugyeong Eo et.al.	2504.05047	null
2025-04-07	A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models	Carlos Peláez-González et.al.	2504.04976	null
2025-04-07	A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization	Wenyuan Xu et.al.	2504.04950	null
2025-04-06	Capturing AI’s Attention: Physics of Repetition, Hallucination, Bias and Beyond	Frank Yingjie Huo et.al.	2504.04600	null
2025-04-06	Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models	Rui Gan et.al.	2504.04562	link
2025-04-06	VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT	Zhuo Zhi et.al.	2504.04471	null
2025-04-06	An overview of model uncertainty and variability in LLM-based sentiment analysis. Challenges, mitigation strategies and the role of explainability	David Herrera-Poyatos et.al.	2504.04462	null
2025-04-09	How Accurately Do Large Language Models Understand Code?	Sabaat Haroon et.al.	2504.04372	null
2025-04-06	Generative Large Language Models Trained for Detecting Errors in Radiology Reports	Cong Sun et.al.	2504.04336	null
2025-04-09	Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks	Marios Kokkodis et.al.	2504.04277	null
2025-04-05	Adaptive Elicitation of Latent Information Using Natural Language	Jimmy Wang et.al.	2504.04204	null
2025-04-04	Structured Extraction of Process Structure Properties Relationships in Materials Science	Amit K Verma et.al.	2504.03979	null
2025-04-04	Bridging LMS and Generative AI: Dynamic Course Content Integration (DCCI) for Connecting LLMs to Course Content – The Ask ME Assistant	Kovan Mzwri et.al.	2504.03966	null
2025-04-04	Practical Poisoning Attacks against Retrieval-Augmented Generation	Baolei Zhang et.al.	2504.03957	null
2025-04-04	The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning	Virilo Tejedor et.al.	2504.03823	null
2025-04-04	Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy	Kamil Ciosek et.al.	2504.03579	null
2025-04-04	Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej	Shubham Kumar Nigam et.al.	2504.03486	null
2025-04-07	LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications	Botao Zhu et.al.	2504.03444	null
2025-04-04	Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models	Mirko Borszukovszki et.al.	2504.03440	null
2025-04-04	Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models	Afshin Khadangi et.al.	2504.03302	link
2025-04-04	Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations	Maik Larooij et.al.	2504.03274	null
2025-04-04	Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation	Weitao Li et.al.	2504.03165	link
2025-04-03	How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence	Hongzhe Du et.al.	2504.02904	null
2025-04-03	Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models	Liangjie Huang et.al.	2504.02902	null
2025-04-01	Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications	Hongliu Cao et.al.	2504.02867	null
2025-04-01	The Illusionist’s Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances	Yining Wang et.al.	2504.02865	null
2025-04-03	A Memory-Augmented LLM-Driven Method for Autonomous Merging of 3D Printing Work Orders	Yuhao Liu et.al.	2504.02509	null
2025-04-03	Cognitive Memory in Large Language Models	Lianlei Shan et.al.	2504.02441	null
2025-04-02	Achieving Unanimous Consensus in Decision Making Using Multi-Agents	Apurba Pokharel et.al.	2504.02128	null
2025-04-02	Aligned Better, Listen Better for Audio-Visual Large Language Models	Yuxin Guo et.al.	2504.02061	null
2025-04-03	Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation	Baban Gain et.al.	2504.01919	null
2025-04-02	LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution	Zhuoran Yang et.al.	2504.01533	null
2025-04-03	Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding	Sakhinana Sagar Srinivas et.al.	2504.01281	null
2025-04-01	Grade Guard: A Smart System for Short Answer Automated Grading	Niharika Dadu et.al.	2504.01253	null
2025-04-01	Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models	Rafael Giebisch et.al.	2504.01248	null
2025-04-01	Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery	Nicholas Clark et.al.	2504.01205	null
2025-04-01	μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models	Zian Su et.al.	2504.01196	null
2025-04-01	Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations	Mahjabin Nahar et.al.	2504.01153	link
2025-04-01	MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs)	Bikash Saha et.al.	2504.01145	link
2025-04-01	Investigating Large Language Models in Diagnosing Students’ Cognitive Skills in Math Problem-solving	Hyoungwook Jin et.al.	2504.00843	null
2025-04-01	Aplicação de Large Language Models na Análise e Síntese de Documentos Jurídicos: Uma Revisão de Literatura	Matheus Belarmino et.al.	2504.00725	null
2025-04-01	GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments	Enjun Du et.al.	2504.00711	null
2025-04-01	DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism	Dengchun Li et.al.	2504.00661	link
2025-04-01	Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences	Xiangyang Liu et.al.	2504.00473	null
2025-04-01	Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics	Shide Zhou et.al.	2504.00446	null
2025-04-01	Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding	Mohanakrishnan Hariharan et.al.	2504.00409	null
2025-04-01	When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR)	Mahak Agarwal et.al.	2504.00374	null
2025-03-31	SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning	Shiyue Zhao et.al.	2504.00115	null
2025-03-30	Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge	Agam Shah et.al.	2504.00042	null
2025-03-27	Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1	Birger Moell et.al.	2504.00016	null
2025-03-31	SQuat: Subspace-orthogonal KV Cache Quantization	Hao Wang et.al.	2503.24358	null
2025-03-31	Model Hemorrhage and the Robustness Limits of Large Language Models	Ziyang Ma et.al.	2503.23924	null
2025-03-31	Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement	Yuqiao Tan et.al.	2503.23895	link
2025-03-31	Adaptive Layer-skipping in Pre-trained LLMs	Xuan Luo et.al.	2503.23798	null
2025-03-31	MKA: Leveraging Cross-Lingual Consensus for Model Abstention	Sharad Duwal et.al.	2503.23687	link
2025-03-30	RARE: Retrieval-Augmented Reasoning Modeling	Zhengren Wang et.al.	2503.23513	link
2025-03-30	SCORE: Story Coherence and Retrieval Enhancement for AI Narratives	Qiang Yi et.al.	2503.23512	null
2025-03-30	Re-Aligning Language to Visual Objects with an Agentic Workflow	Yuming Chen et.al.	2503.23508	null
2025-03-30	An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering	Alexander Murphy et.al.	2503.23415	null
2025-03-30	Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation	Jiwon Jeong et.al.	2503.23363	link
2025-03-30	Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base	Linxin Song et.al.	2503.23361	null
2025-03-29	Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus	Claas Beger et.al.	2503.23229	link
2025-03-29	Large Language Models are Unreliable for Cyber Threat Intelligence	Emanuele Mezzi et.al.	2503.23175	null
2025-03-29	Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments	Yifan Xu et.al.	2503.23105	null
2025-03-29	DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation	Hsin-Ling Hsu et.al.	2503.23013	null
2025-03-29	Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective	Xinyu Yao et.al.	2503.22954	null
2025-03-29	Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering	Yugen Sato et.al.	2503.22941	null
2025-04-02	Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use	Nicholas Roth et.al.	2503.22931	null
2025-03-28	Identifying and Mitigating API Misuse in Large Language Models	Terry Yue Zhuo et.al.	2503.22821	null
2025-03-26	InfoBid: A Simulation Framework for Studying Information Disclosure in Auctions with Large Language Model-based Agents	Yue Yin et.al.	2503.22726	null
2025-03-25	Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models	Bowei Tian et.al.	2503.22720	null
2025-03-25	LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation	Sarah Martinson et.al.	2503.22719	link
2025-03-31	Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning	Abdullah Vanlioglu et.al.	2503.22456	null
2025-03-28	Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs	Yuan He et.al.	2503.22362	link
2025-03-28	Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions	Yubo Li et.al.	2503.22353	null
2025-03-28	BanglAssist: A Bengali-English Generative AI Chatbot for Code-Switching and Dialect-Handling in Customer Service	Francesco Kruk et.al.	2503.22283	null
2025-03-28	Learning to Instruct for Visual Instruction Tuning	Zhihan Zhou et.al.	2503.22215	null
2025-03-28	Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models	Zhanke Zhou et.al.	2503.22165	link
2025-03-27	Entropy-Aware Branching for Improved Mathematical Reasoning	Xianzhi Li et.al.	2503.21961	null
2025-03-25	OAEI-LLM-T: A TBox Benchmark Dataset for Understanding LLM Hallucinations in Ontology Matching Systems	Zhangcheng Qiang et.al.	2503.21813	null
2025-03-27	Cooking Task Planning using LLM and Verified by Graph Network	Ryunosuke Takebayashi et.al.	2503.21564	null
2025-03-27	SWI: Speaking with Intent in Large Language Models	Yuwei Yin et.al.	2503.21544	link
2025-04-02	Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?	Ashish Sardana et.al.	2503.21157	null
2025-03-27	Alleviating LLM-based Generative Retrieval Hallucination in Alipay Search	Yedan Shen et.al.	2503.21098	null
2025-03-26	Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework	Thomson Yen et.al.	2503.21023	link
2025-03-26	Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring	Fraol Batole et.al.	2503.20934	null
2025-03-26	Exploring CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation	Zhiwei Yang et.al.	2503.20826	link
2025-03-26	Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy	Joonhyun Jeong et.al.	2503.20823	link
2025-03-26	MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search	Yunhai Hu et.al.	2503.20757	null
2025-03-26	TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes	Raj Sanjay Shah et.al.	2503.20648	null
2025-03-26	Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering	Zehui Liao et.al.	2503.20504	null
2025-03-26	GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization	Zhouhong Gu et.al.	2503.20194	link
2025-03-25	FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs	Carlos Plou et.al.	2503.19850	null
2025-03-25	HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection	Maryam Bala et.al.	2503.19650	null
2025-03-25	KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models	Zhiwei Wang et.al.	2503.19482	null
2025-03-25	VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU	Zhongchun Zheng et.al.	2503.19449	null
2025-03-25	QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition	Yuxuan Hu et.al.	2503.19353	link
2025-03-24	Language Model Uncertainty Quantification with Attention Chain	Yinghao Li et.al.	2503.19168	link
2025-03-24	Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models	Nariman Naderi et.al.	2503.18562	null
2025-03-24	Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions	Dong Jing et.al.	2503.18320	null
2025-03-23	ShED-HD: A Shannon Entropy Distribution Framework for Lightweight Hallucination Detection on Edge Devices	Aneesh Vathul et.al.	2503.18242	null
2025-03-23	GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks	Varvara Krechetova et.al.	2503.18129	link
2025-03-23	SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA	V Venktesh et.al.	2503.17990	null
2025-03-22	A Modular Dataset to Demonstrate LLM Abstraction Capability	Adam Atanas et.al.	2503.17645	null
2025-03-22	ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently	Jaeyeon Lee et.al.	2503.17587	link
2025-03-21	Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring	Khadija Zanna et.al.	2503.17569	null
2025-03-21	Judge Anything: MLLM as a Judge Across Any Modality	Shu Pu et.al.	2503.17489	null
2025-03-21	LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language	Kun Chu et.al.	2503.17309	link
2025-03-21	FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs	Albert Sawczyn et.al.	2503.17229	null
2025-03-20	Investigating Retrieval-Augmented Generation in Quranic Studies: A Study of 13 Open-Source Large Language Models	Zahra Khalila et.al.	2503.16581	null
2025-03-26	Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models	Hanzhi Zhang et.al.	2503.16541	null
2025-03-18	Do Multimodal Large Language Models Understand Welding?	Grigorii Khvatskii et.al.	2503.16537	null
2025-03-18	Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine	Chengfeng Dou et.al.	2503.16530	null
2025-03-18	HDLCoRe: A Training-Free Framework for Mitigating Hallucinations in LLM-Generated HDL	Heng Ping et.al.	2503.16528	null
2025-03-20	Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data	Zijian Li et.al.	2503.16260	null
2025-03-20	Towards Lighter and Robust Evaluation for Retrieval Augmented Generation	Alex-Razvan Ispas et.al.	2503.16161	link
2025-03-20	ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph	Langming Liu et.al.	2503.15990	null
2025-03-20	Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models	Baolong Bi et.al.	2503.15888	link
2025-03-21	Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance	Hui Liu et.al.	2503.15886	null
2025-03-20	MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations	Kyungho Bae et.al.	2503.15871	null
2025-03-20	Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey	Xiaoou Liu et.al.	2503.15850	null
2025-03-20	Entropy-based Exploration Conduction for Multi-step Reasoning	Jinghan Zhang et.al.	2503.15848	null
2025-03-23	DNA Bench: When Silence is Smarter – Benchmarking Over-Reasoning in Reasoning LLMs	Masoud Hashemi et.al.	2503.15793	null
2025-03-19	R$^2$: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs	Zefeng Lin et.al.	2503.15655	null
2025-03-19	How Well Can AI Build SD Models?	William Schoenberg et.al.	2503.15580	null
2025-03-19	Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs	Yuqi Zhu et.al.	2503.15341	null
2025-03-19	Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning?	Roberto Araya et.al.	2503.15268	null
2025-03-19	Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems	Sejong Kim et.al.	2503.15191	link
2025-03-19	Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks	Yuting Guo et.al.	2503.15169	null
2025-03-19	ELTEX: A Framework for Domain-Driven Synthetic Data Generation	Arina Razmyslovich et.al.	2503.15055	link
2025-03-18	Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence	Sophia Hager et.al.	2503.14749	null
2025-03-18	Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving	Priscylla Silva et.al.	2503.14630	link
2025-03-18	Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations	Ziwei Ji et.al.	2503.14477	null
2025-03-18	From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models	Qiantong Wang et.al.	2503.14392	null
2025-03-18	How much do LLMs learn from negative examples?	Shadi Hamdan et.al.	2503.14391	link
2025-03-18	On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller?	Pouria Sarhadi et.al.	2503.14379	link
2025-03-18	Learning on LLM Output Signatures for gray-box LLM Behavior Analysis	Guy Bar-Shalom et.al.	2503.14043	link
2025-03-18	Predicting Human Choice Between Textually Described Lotteries	Eyal Marantz et.al.	2503.14004	null
2025-03-18	FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks	Siqi Zhang et.al.	2503.13966	null
2025-03-19	Enabling Inclusive Systematic Reviews: Incorporating Preprint Articles with Large Language Model-Driven Evaluations	Rui Yang et.al.	2503.13857	null
2025-03-18	Empowering GraphRAG with Knowledge Filtering and Integration	Kai Guo et.al.	2503.13804	null
2025-03-18	Mapping the Trust Terrain: LLMs in Software Engineering – Insights and Perspectives	Dipin Khati et.al.	2503.13793	null
2025-03-17	Pareidolic Illusions of Meaning: ChatGPT, Pseudolaw and the Triumph of Form over Substance	Joe McIntyre et.al.	2503.13556	null
2025-03-14	RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration	Hong Qing Yu et.al.	2503.13514	null
2025-03-17	MetaScale: Test-Time Scaling with Evolving Meta-Thoughts	Qin Liu et.al.	2503.13447	null
2025-03-17	Managing Hybrid Solid-State Drives Using Large Language Models	Qian Wei et.al.	2503.13105	null
2025-03-17	Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning	Junming Liu et.al.	2503.12972	null
2025-03-17	MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting	Rui Pu et.al.	2503.12931	null
2025-03-17	HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models	Xinyan Jiang et.al.	2503.12908	link
2025-03-16	Can LLMs Formally Reason as Abstract Interpreters for Program Analysis?	Jacqueline L. Mitchell et.al.	2503.12686	null
2025-03-16	From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations	Sarvesh Baskar et.al.	2503.12556	null
2025-03-21	LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation	Yuqi Sun et.al.	2503.12547	null
2025-03-18	SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?	Jianzhu Yao et.al.	2503.12349	null
2025-03-15	PredicateFix: Repairing Static Analysis Alerts with Bridging Predicates	Yuan-An Xiao et.al.	2503.12205	null
2025-03-20	Applications of Large Language Model Reasoning in Feature Generation	Dharani Chandra et.al.	2503.11989	null
2025-03-14	LLM Agents for Education: Advances and Applications	Zhendong Chu et.al.	2503.11733	null
2025-03-14	Neutralizing Bias in LLM Reasoning using Entailment Graphs	Liang Cheng et.al.	2503.11614	link
2025-03-14	D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning	Jia Zhang et.al.	2503.11441	null
2025-03-14	Modeling Subjectivity in Cognitive Appraisal with Language Models	Yuxiang Zhou et.al.	2503.11381	null
2025-03-14	Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches	Panggih Kusuma Ningrum et.al.	2503.11376	null
2025-03-14	AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation	Fengyu Li et.al.	2503.11346	link
2025-03-14	Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models	Aissatou Diallo et.al.	2503.11336	null
2025-03-14	Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries	Sahil Kale et.al.	2503.11256	link
2025-03-14	Collaboration is all you need: LLM Assisted Safe Code Translation	Rabimba Karanjai et.al.	2503.11237	null
2025-03-13	Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations	Piyush Gupta et.al.	2503.10941	null
2025-03-13	HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust	Yu Luo et.al.	2503.10793	null
2025-03-12	CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries	Zhiyuan Wang et.al.	2503.10707	null
2025-03-12	Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models	Shahnewaz Karim Sakib et.al.	2503.10690	null
2025-03-13	TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention	Jinhao Duan et.al.	2503.10602	link
2025-03-13	SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models	Sahar Admoni et.al.	2503.10509	null
2025-03-13	LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions	Gaurav Kumar Gupta et.al.	2503.10486	null
2025-03-13	Collaborative Speculative Inference for Efficient LLM Inference Serving	Luyao Gao et.al.	2503.10325	null
2025-03-13	StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error	Shu-Xun Yang et.al.	2503.10105	link
2025-03-13	Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model	Qiyuan Deng et.al.	2503.10093	null
2025-03-12	Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets	Zahra Abbasiantaeb et.al.	2503.09902	link
2025-03-12	Probabilistic Reasoning with LLMs for k-anonymity Estimation	Jonathan Zheng et.al.	2503.09674	null
2025-03-12	CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection	Richard A. Dubniczky et.al.	2503.09433	link
2025-03-12	NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model	Yuzhi Lai et.al.	2503.09335	link
2025-03-12	Token Weighting for Long-Range Language Modeling	Falko Helm et.al.	2503.09202	link
2025-03-12	Is LLMs Hallucination Usable? LLM-based Negative Reasoning for Fake News Detection	Chaowei Zhang et.al.	2503.09153	null
2025-03-11	Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation	Yu Wang et.al.	2503.08963	null
2025-03-11	CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving	Changxing Liu et.al.	2503.08683	link
2025-03-11	DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process	Minjun Zhu et.al.	2503.08569	null
2025-03-11	Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework	Zhuo Zhi et.al.	2503.08308	null
2025-03-11	FASIONAD++: Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback	Kangan Qian et.al.	2503.08162	null
2025-03-11	LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification	Siyuan Wang et.al.	2503.07937	null
2025-03-10	Safety Guardrails for LLM-Enabled Robots	Zachary Ravichandran et.al.	2503.07885	null
2025-03-10	HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations	Samir Abdaljalil et.al.	2503.07833	null
2025-03-07	SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs	Jaewoo Song et.al.	2503.07657	null
2025-03-07	MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration	Jinguang Wang et.al.	2503.07654	null
2025-03-10	Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review	Samuel Ferino et.al.	2503.07556	null
2025-03-10	Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies	Luyi Jiang et.al.	2503.07306	null
2025-03-10	Quantizing Large Language Models for Code Generation: A Differentiated Replication	Alessandro Giagnorio et.al.	2503.07103	null
2025-03-10	CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation	Runqi Sui et.al.	2503.06950	null
2025-03-09	Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia	Sabeen Ahmed et.al.	2503.06797	null
2025-03-09	Delusions of Large Language Models	Hongshen Xu et.al.	2503.06709	null
2025-03-09	Alignment for Efficient Tool Calling of Large Language Models	Hongshen Xu et.al.	2503.06708	null
2025-03-09	Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform	Chenyu Huang et.al.	2503.06676	null
2025-03-09	Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving	Yao Cheng et.al.	2503.06567	null
2025-03-09	Graph Retrieval-Augmented LLM for Conversational Recommendation Systems	Zhangchi Qiu et.al.	2503.06430	null
2025-03-09	Performant LLM Agentic Framework for Conversational AI	Alex Casella et.al.	2503.06410	null
2025-03-08	Sample-aware Adaptive Structured Pruning for Large Language Models	Jun Kong et.al.	2503.06184	null
2025-03-08	Wireless Hallucination in Generative AI-enabled Communications: Concepts, Issues, and Solutions	Xudong Wang et.al.	2503.06149	link
2025-03-08	A Survey on Post-training of Large Language Models	Guiyao Tie et.al.	2503.06072	link
2025-03-07	SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs	Samir Abdaljalil et.al.	2503.05980	null
2025-03-07	TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator	Deepak Vungarala et.al.	2503.05951	null
2025-03-04	I Think, Therefore I Hallucinate: Minds, Machines, and the Art of Being Wrong	Sebastian Barros et.al.	2503.05806	null
2025-03-07	R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning	Huatong Song et.al.	2503.05592	null
2025-03-07	Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information	Junbo Zhao et.al.	2503.05543	null
2025-03-07	Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering	Yusong Ke et.al.	2503.05505	null
2025-03-07	Maximum Hallucination Standards for Domain-Specific Large Language Models	Tingmingke Lu et.al.	2503.05481	null
2025-03-07	An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning	Navdeep Kaur et.al.	2503.05439	null
2025-03-07	GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation	Zhenxuan Zhang et.al.	2503.05347	link
2025-03-07	Path Pooling: Train-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation	Hairu Wang et.al.	2503.05203	null
2025-03-07	RocketEval: Efficient Automated LLM Evaluation via Grading Checklist	Tianjun Wei et.al.	2503.05142	link
2025-03-06	LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression	Souvik Kundu et.al.	2503.04982	null
2025-03-10	Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents	Jingying Zeng et.al.	2503.04830	null
2025-03-07	START: Self-taught Reasoner with Tools	Chengpeng Li et.al.	2503.04625	null
2025-03-06	HalluCounter: Reference-free LLM Hallucination Detection in the Wild!	Ashok Urlana et.al.	2503.04615	null
2025-03-06	Benchmarking Reasoning Robustness in Large Language Models	Tong Yu et.al.	2503.04550	null
2025-03-06	TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction	Chao Wang et.al.	2503.04457	null
2025-03-06	On Fact and Frequency: LLM Responses to Misinformation Expressed with Uncertainty	Yana van de Sande et.al.	2503.04271	null
2025-03-06	Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation	Ziqiang Cui et.al.	2503.04162	null
2025-03-06	KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease	Yongchao Long et.al.	2503.04153	link
2025-03-05	Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis	Ahmad Hafez et.al.	2503.03911	link
2025-03-07	LEWIS (LayEr WIse Sparsity) – A Training Free Guided Model Merging Approach	Hetarth Chopra et.al.	2503.03874	null
2025-03-04	BotUmc: An Uncertainty-Aware Twitter Bot Detection with Multi-view Causal Inference	Tao Yang et.al.	2503.03775	null
2025-03-05	The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems	Richard Ren et.al.	2503.03750	null
2025-03-05	Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models	Bar Karov et.al.	2503.03669	link
2025-03-05	Structured Outputs Enable General-Purpose LLMs to be Medical Experts	Guangfu Guo et.al.	2503.03194	null
2025-03-04	SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs	Samir Abdaljalil et.al.	2503.03032	null
2025-03-04	Effectively Steer LLM To Follow Preference via Building Confident Directions	Bingqing Song et.al.	2503.02989	null
2025-03-04	Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework	Ziang Zhou et.al.	2503.02863	null
2025-03-04	Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers	Zicong He et.al.	2503.02851	link
2025-03-04	Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs	Yuzhe Gu et.al.	2503.02846	link
2025-03-04	FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting	Congluo Xu et.al.	2503.02692	null
2025-03-04	MPO: Boosting LLM Agents with Meta Plan Optimization	Weimin Xiong et.al.	2503.02682	link
2025-03-04	Multidimensional Consistency Improves Reasoning in Language Models	Huiyuan Lai et.al.	2503.02670	null
2025-03-05	Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models	Paul Stangel et.al.	2503.02623	null
2025-03-04	AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection	Dimitra Karkani et.al.	2503.02442	null
2025-03-04	Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling	Hang Zheng et.al.	2503.02233	null
2025-03-04	DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models	Saeed Ranjbar Alvar et.al.	2503.02175	link
2025-03-03	OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments	Qianwei Wang et.al.	2503.02106	null
2025-03-05	HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs	Tin Nguyen et.al.	2503.02003	link
2025-03-02	NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SelfCheckGPT	Jiaying Hong et.al.	2503.01921	link
2025-03-01	How to Steer LLM Latents for Hallucination Detection?	Seongheon Park et.al.	2503.01917	null
2025-03-03	Can (A)I Change Your Mind?	Miriam Havin et.al.	2503.01844	link
2025-03-04	Position: Don’t use the CLT in LLM evals with fewer than a few hundred datapoints	Sam Bowyer et.al.	2503.01747	null
2025-03-03	Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution	Kun Li et.al.	2503.01695	null
2025-03-03	When an LLM is apprehensive about its answers – and when its uncertainty is justified	Petr Sychev et.al.	2503.01688	link
2025-03-03	Evaluating LLMs’ Assessment of Mixed-Context Hallucination Through the Lens of Summarization	Siya Qi et.al.	2503.01670	link
2025-03-03	Detecting Stylistic Fingerprints of Large Language Models	Yehonatan Bitton et.al.	2503.01659	null
2025-03-03	Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning	Wenjie Wu et.al.	2503.01642	null
2025-03-03	Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering	Zhanghao Hu et.al.	2503.01606	null
2025-03-03	None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering	Zhi Rui Tam et.al.	2503.01550	null
2025-03-03	Revisiting Large Language Model Pruning using Neuron Semantic Attribution	Yizhuo Ding et.al.	2503.01542	null
2025-03-03	What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret	Yufeng Yuan et.al.	2503.01491	null
2025-03-03	Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation	Linhai Zhang et.al.	2503.01315	null
2025-03-03	LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains	Ling Xiao et.al.	2503.01236	null
2025-03-06	CE-U: Cross Entropy Unlearning	Bo Yang et.al.	2503.01224	null
2025-03-03	Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG	Wenbin Wang et.al.	2503.01222	link
2025-03-04	Can Large Language Models Help Experimental Design for Causal Discovery?	Junyi Li et.al.	2503.01139	null
2025-03-02	Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies	Tianyi Huang et.al.	2503.00724	null
2025-03-02	GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development	Leming Shen et.al.	2503.00686	link
2025-03-02	From Prompting to Partnering: Personalization Features for Human-LLM Interactions	Si Thu et.al.	2503.00681	null
2025-03-01	Embracing Diversity: A Multi-Perspective Approach with Soft Labels	Benedetta Muscato et.al.	2503.00489	null
2025-03-01	U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack	Yunfan Gao et.al.	2503.00353	link
2025-03-01	Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy	Jahan C. Penny-Dimri et.al.	2503.00269	null
2025-02-28	A Survey of Uncertainty Estimation Methods on Large Language Models	Zhiqiu Xia et.al.	2503.00172	null
2025-02-27	Societal Alignment Frameworks Can Improve LLM Alignment	Karolina Stańczak et.al.	2503.00069	null
2025-03-04	Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs	Xiaomin Li et.al.	2502.21239	null
2025-02-28	PASemiQA: Plan-Assisted Agent for Question Answering on Semi-Structured Data with Text and Relational Information	Hansi Yang et.al.	2502.21087	null
2025-03-03	A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation	Xujie Yuan et.al.	2502.20854	null
2025-02-28	Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow	Jiaqi Bai et.al.	2502.20750	link
2025-02-28	Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models	Colleen Gilhuly et.al.	2502.20647	null
2025-02-28	Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems	Jędrzej Warczyński et.al.	2502.20609	null
2025-02-27	Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization	Ryan C. Barron et.al.	2502.20364	link
2025-02-27	Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models	Yi Jing et.al.	2502.20344	null
2025-02-27	Expertise Is What We Want	Alan Ashworth et.al.	2502.20335	null
2025-02-27	Conformal Tail Risk Control for Large Language Model Alignment	Catherine Yu-Chi Chen et.al.	2502.20285	null
2025-02-27	Similarity-Distance-Magnitude Universal Verification	Allen Schmaltz et.al.	2502.20167	link
2025-03-04	ProAPO: Progressively Automatic Prompt Optimization for Visual Classification	Xiangyan Qu et.al.	2502.19844	link
2025-02-27	Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences	Linzhuo Li et.al.	2502.19679	null
2025-02-26	Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review	Sungduk Yu et.al.	2502.19614	null
2025-02-26	Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems	Nayoung Choi et.al.	2502.19596	null
2025-02-26	Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents	Ashley Lewis et.al.	2502.19545	null
2025-02-26	Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices	Xinru Wang et.al.	2502.19410	null
2025-02-26	Verde: Verification via Refereed Delegation for Machine Learning Programs	Arasu Arun et.al.	2502.19405	null
2025-02-26	Efficient Federated Search for Retrieval-Augmented Generation	Rachid Guerraoui et.al.	2502.19280	null
2025-02-26	Bi’an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation	Zhouyu Jiang et.al.	2502.19209	null
2025-02-26	Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement	Siyuan Zhang et.al.	2502.19127	null
2025-02-26	Talking like Piping and Instrumentation Diagrams (P&IDs)	Achmad Anggawirya Alimin et.al.	2502.18928	null
2025-02-26	Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models	Shuliang Liu et.al.	2502.18817	null
2025-02-26	Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science	Xiaohua Wu et.al.	2502.18729	null
2025-02-25	Scalable Best-of-N Selection for Large Language Models via Self-Certainty	Zhewei Kang et.al.	2502.18581	link
2025-02-25	Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions	Yizhe Zhang et.al.	2502.18435	null
2025-02-25	Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods	Nicola Cecere et.al.	2502.18389	null
2025-02-25	BRIDO: Bringing Democratic Order to Abstractive Summarization	Junhyun Lee et.al.	2502.18342	null
2025-02-25	Can LLMs Explain Themselves Counterfactually?	Zahra Dehghanighobadi et.al.	2502.18156	null
2025-02-25	LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers	Zhuocheng Zhang et.al.	2502.18139	link
2025-02-25	Verdict: A Library for Scaling Judge-Time Compute	Nimit Kalra et.al.	2502.18018	link
2025-02-27	LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction	Suozhi Huang et.al.	2502.17925	null
2025-02-25	An Overview of Large Language Models for Statisticians	Wenlong Ji et.al.	2502.17814	null
2025-02-25	Uncertainty Quantification for LLM-Based Survey Simulations	Chengpiao Huang et.al.	2502.17773	null
2025-02-24	Hallucination Detection in LLMs Using Spectral Features of Attention Maps	Jakub Binkowski et.al.	2502.17598	link
2025-02-24	Towards Conditioning Clinical Text Generation for User Control	Osman Alperen Koraş et.al.	2502.17571	null
2025-02-22	SAE-V: Interpreting Multimodal Models for Enhanced Alignment	Hantao Lou et.al.	2502.17514	null
2025-02-24	CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought	Boxuan Zhang et.al.	2502.17214	link
2025-02-24	IGDA: Interactive Graph Discovery through Large Language Model Agents	Alex Havrilla et.al.	2502.17189	null
2025-02-24	LettuceDetect: A Hallucination Detection Framework for RAG Applications	Ádám Kovács et.al.	2502.17125	link
2025-02-27	LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences	Sijia Yao et.al.	2502.17057	link
2025-02-24	Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology	Longchao Da et.al.	2502.17026	null
2025-02-24	Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning	Jiaheng Li et.al.	2502.16896	null
2025-02-24	Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models	Yaqi Sun et.al.	2502.16842	null
2025-02-25	Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses	Tiejin Chen et.al.	2502.16820	null
2025-02-23	Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT	Nidhal Jegham et.al.	2502.16428	null
2025-02-23	Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications	Feng Ma et.al.	2502.16402	null
2025-02-22	An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning	Masoud Shokrnezhad et.al.	2502.16198	null
2025-02-22	EPERM: An Evidence Path Enhanced Reasoning Model for Knowledge Graph Question and Answering	Xiao Long et.al.	2502.16171	null
2025-02-22	ZiGong 1.0: A Large Language Model for Financial Credit	Yu Lei et.al.	2502.16159	null
2025-02-22	The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination	Yuji Zhang et.al.	2502.16143	null
2025-02-22	Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals	Linda Zeng et.al.	2502.16101	null
2025-02-21	Position: Standard Benchmarks Fail – LLM Agents Present Overlooked Risks for Financial Applications	Zichen Chen et.al.	2502.15865	null
2025-02-20	Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection	Yihao Xue et.al.	2502.15845	null
2025-02-20	Hallucination Detection in Large Language Models with Metamorphic Relations	Borui Yang et.al.	2502.15844	null
2025-02-21	AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind	Zhining Zhang et.al.	2502.15676	link
2025-02-24	Empowering LLMs with Logical Reasoning: A Comprehensive Survey	Fengxiang Cheng et.al.	2502.15652	null
2025-02-21	A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany	Ina Dormuth et.al.	2502.15568	null
2025-02-21	PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning	Pengcheng Huang et.al.	2502.15543	link
2025-02-21	Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making	Eunhye Kim et.al.	2502.15395	null
2025-02-21	Evaluating Social Biases in LLM Reasoning	Xuyang Wu et.al.	2502.15361	null
2025-02-21	From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants	Manisha Mukherjee et.al.	2502.15237	null
2025-02-20	Using tournaments to calculate AUROC for zero-shot classification with LLMs	Wonjin Yoon et.al.	2502.15018	null
2025-02-19	OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment	Xiangjin Xie et.al.	2502.14913	null
2025-02-19	EvoP: Robust LLM Inference via Evolutionary Pruning	Shangyu Wu et.al.	2502.14910	null
2025-02-19	KOALA: Knowledge Conflict Augmentations for Robustness in Vision Language Models	Peter Carragher et.al.	2502.14908	link
2025-02-20	Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning	Shuyue Stella Li et.al.	2502.14860	link
2025-02-20	Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs	Zongxia Li et.al.	2502.14748	null
2025-02-20	CER: Confidence Enhanced Reasoning in LLMs	Ali Razghandi et.al.	2502.14634	link
2025-02-20	Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery	Minh-Quyet Ha et.al.	2502.14631	null
2025-02-20	ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification	Hyunseok Lee et.al.	2502.14565	null
2025-02-20	Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation	Austin A. Barr et.al.	2502.14523	link
2025-02-25	How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?	Sergey Pletenev et.al.	2502.14502	link
2025-02-20	Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models	Artem Vazhentsev et.al.	2502.14427	link
2025-02-20	ParallelComp: Parallel Long-Context Compressor for Length Extrapolation	Jing Xiong et.al.	2502.14317	null
2025-02-20	MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models	Shrey Pandit et.al.	2502.14302	null
2025-02-20	STeCa: Step-level Trajectory Calibration for LLM Agent Learning	Hanlin Wang et.al.	2502.14276	link
2025-02-20	Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment	Jiaxi Li et.al.	2502.14275	null
2025-02-20	PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant	Congrui Yin et.al.	2502.14271	null
2025-02-20	MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels	Xiaoou Liu et.al.	2502.14268	null
2025-02-20	Multi-Faceted Studies on Data Poisoning can Advance LLM Development	Pengfei He et.al.	2502.14182	link
2025-02-19	SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation	Song Duong et.al.	2502.13674	null
2025-02-19	C2T: A Classifier-Based Tree Construction Method in Speculative Decoding	Feiye Huo et.al.	2502.13652	null
2025-02-19	REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models	DongGeon Lee et.al.	2502.13622	null
2025-02-19	What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis	Peiran Wang et.al.	2502.13490	null
2025-02-19	LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models	Ruiming Tang et.al.	2502.13481	null
2025-02-19	TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation	Jialin Ouyang et.al.	2502.13442	link
2025-02-19	Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning	Ningke Li et.al.	2502.13416	null
2025-02-19	Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval	Aditya Sharma et.al.	2502.13369	null
2025-02-18	SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?	Yucheng Shi et.al.	2502.13233	null
2025-02-17	Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment	Yuze Zhao et.al.	2502.13170	link
2025-02-18	Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Shuo Xing et.al.	2502.13146	link
2025-02-18	Understanding and Rectifying Safety Perception Distortion in VLMs	Xiaohan Zou et.al.	2502.13095	null
2025-02-18	LAMD: Context-driven Android Malware Detection and Classification with LLMs	Xingzhi Qian et.al.	2502.13055	null
2025-02-20	Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation	Sha Li et.al.	2502.13019	null
2025-02-18	Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs	Adi Simhi et.al.	2502.12964	null
2025-02-18	Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models	Elena Stringli et.al.	2502.12821	null
2025-02-20	How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild	Saad Obaid ul Islam et.al.	2502.12769	link
2025-02-18	R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs	Sumin Jo et.al.	2502.12767	link
2025-02-18	“I know myself better, but not really greatly”: Using LLMs to Detect and Explain LLM-Generated Texts	Jiazhou Ji et.al.	2502.12743	null
2025-02-18	R.R.: Unveiling LLM Training Privacy through Recollection and Ranking	Wenlong Meng et.al.	2502.12658	link
2025-02-18	COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation	Sean Wang et.al.	2502.12601	null
2025-02-18	EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning	Xiaoqian Liu et.al.	2502.12486	null
2025-02-18	Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking	Alireza S. Ziabari et.al.	2502.12470	null
2025-02-18	MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation	Yutong Wang et.al.	2502.12468	null
2025-02-17	Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs	Kan Zhu et.al.	2502.12216	null
2025-02-17	Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control	Jinyan Su et.al.	2502.12145	link
2025-02-17	KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs	Qi Zhao et.al.	2502.12029	null
2025-02-17	SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities	Fengqing Jiang et.al.	2502.12025	null
2025-02-17	Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning	Tianyi Wu et.al.	2502.11962	null
2025-02-17	Can Your Uncertainty Scores Detect Hallucinated Entity?	Min-Hsuan Yeh et.al.	2502.11948	null
2025-02-17	Cognitive-Aligned Document Selection for Retrieval-augmented Generation	Bingyu Wan et.al.	2502.11770	null
2025-02-17	ReviewEval: An Evaluation Framework for AI-Generated Reviews	Chavvi Kirtani et.al.	2502.11736	null
2025-02-17	Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception	Shiyu Ni et.al.	2502.11677	null
2025-02-17	Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation	Arindam Sharma et.al.	2502.11620	null
2025-02-17	Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs?	Hanxing Ding et.al.	2502.11400	null
2025-02-17	“Nuclear Deployed!”: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents	Rongwu Xu et.al.	2502.11355	link
2025-02-16	Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation	Hieu Nguyen et.al.	2502.11306	null
2025-02-16	Uncertainty-Aware Step-wise Verification with Generative Reward Models	Zihuiwen Ye et.al.	2502.11250	null
2025-02-16	A Survey of LLM-based Agents in Medicine: How far are we from Baymax?	Wenxuan Wang et.al.	2502.11211	null
2025-02-16	Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs	Fei Yu et.al.	2502.11155	null
2025-02-18	Valuable Hallucinations: Realizable Non-realistic Propositions	Qiucheng Chen et.al.	2502.11113	null
2025-02-16	Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications	Alexandru Lecu et.al.	2502.11108	link
2025-02-16	Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models	Prateek Chhikara et.al.	2502.11028	link
2025-02-16	Leveraging Uncertainty Estimation for Efficient LLM Routing	Tuo Zhang et.al.	2502.11021	null
2025-02-16	Agentic LLM Framework for Adaptive Decision Discourse	Antoine Dolant et.al.	2502.10978	null
2025-02-16	SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information	Xiangyu Zhang et.al.	2502.10950	null
2025-02-15	Towards Effective Extraction and Evaluation of Factual Claims	Dasha Metropolitansky et.al.	2502.10855	null
2025-02-15	An Empirical Analysis of Uncertainty in Large Language Model Evaluations	Qiujie Xie et.al.	2502.10709	link
2025-02-15	LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization	Erica Zhang et.al.	2502.10648	link
2025-02-14	Post-training an LLM for RAG? Train on Self-Generated Demonstrations	Matthew Finlayson et.al.	2502.10596	null
2025-02-14	Can Large Language Model Agents Balance Energy Systems?	Xinxing Ren et.al.	2502.10557	link
2025-02-14	A novel approach to data generation in generative model	JaeHong Kim et.al.	2502.10092	null
2025-02-14	Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos	Weirui Ye et.al.	2502.09886	null
2025-02-14	Automated Hypothesis Validation with Agentic Sequential Falsifications	Kexin Huang et.al.	2502.09858	link
2025-02-13	Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes	Taylan G. Topcu et.al.	2502.09690	null
2025-02-13	LP-LM: No Hallucinations in Question Answering with Logic Programming	Katherine Wu et.al.	2502.09212	link
2025-02-13	Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York	Sanskar Sehgal et.al.	2502.09204	null
2025-02-13	Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables	Xuzhao Geng et.al.	2502.09073	null
2025-02-13	Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models	Xin Zhou et.al.	2502.08922	null
2025-02-13	MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training	Xinxin You et.al.	2502.08904	null
2025-02-12	Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation	Mohammad Mahdi Abootorabi et.al.	2502.08826	link
2025-02-11	Hallucination, Monofacts, and Miscalibration: An Empirical Investigation	Muqing Miao et.al.	2502.08666	link
2025-02-10	Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis	Emanuele Ricco et.al.	2502.08663	null
2025-02-09	Few-shot LLM Synthetic Data with Distribution Matching	Jiyuan Ren et.al.	2502.08661	link
2025-02-08	Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions	Jingxin Xu et.al.	2502.08657	null
2025-02-12	Ensemble based approach to quantifying uncertainty of LLM based classifications	Srijith Rajamohan et.al.	2502.08631	null
2025-02-12	Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding	Konstantin Berestizshevsky et.al.	2502.08363	link
2025-02-17	Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG	Kushagra Bhushan et.al.	2502.08356	link
2025-02-12	Compromising Honesty and Harmlessness in Language Models via Deception Attacks	Laurène Vaugrante et.al.	2502.08301	null
2025-02-12	Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis	Changhua Pei et.al.	2502.08224	null
2025-02-12	Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences	Shanshan Han et.al.	2502.08142	null
2025-02-12	HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses	Sujeong Lee et.al.	2502.08109	null
2025-02-12	Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD)	Naomi Akhras et.al.	2502.08073	null
2025-02-11	From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems	Yining Hong et.al.	2502.07974	null
2025-02-11	Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning	Rujing Yao et.al.	2502.07912	link
2025-02-11	Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights	Ahilan Ayyachamy Nadar Ponnusamy et.al.	2502.07835	link
2025-02-17	Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering	Shuzheng Si et.al.	2502.07340	link
2025-02-11	When More is Less: Understanding Chain-of-Thought Length in LLMs	Yuyang Wu et.al.	2502.07266	null
2025-02-11	Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs	Sina Salimian et.al.	2502.07186	null
2025-02-11	Refine Knowledge of Large Language Models via Adaptive Contrastive Learning	Yinghui Li et.al.	2502.07184	null
2025-02-11	Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning	Feng Chen et.al.	2502.07154	link
2025-02-11	Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning	Jiayuan Zhu et.al.	2502.07143	null
2025-02-08	Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models	Sina Tayebati et.al.	2502.06884	link
2025-02-08	Group Reasoning Emission Estimation Networks	Yanming Guo et.al.	2502.06874	null
2025-02-08	Knowledge Graph-Guided Retrieval Augmented Generation	Xiangrong Zhu et.al.	2502.06864	link
2025-02-07	LLM-Supported Natural Language to Bash Translation	Finnian Westenfelder et.al.	2502.06858	link
2025-02-11	Calibrating LLMs with Information-Theoretic Evidential Deep Learning	Yawei Li et.al.	2502.06351	link
2025-02-10	Expect the Unexpected: FailSafe Long Context QA for Finance	Kiran Kamble et.al.	2502.06329	null
2025-02-10	Emergent Response Planning in LLM	Zhichen Dong et.al.	2502.06258	null
2025-02-10	Confidence Improves Self-Consistency in LLMs	Amir Taubenfeld et.al.	2502.06233	null
2025-02-10	Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement	Junyu Lu et.al.	2502.06207	link
2025-02-10	Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis	Sanket Jantre et.al.	2502.06173	null
2025-02-09	GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation	Runchuan Zhu et.al.	2502.05911	null
2025-02-09	Self-Training Large Language Models for Tool-Use Without Demonstrations	Ne Luo et.al.	2502.05867	null
2025-02-09	Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models	Cheng Peng Huang et.al.	2502.05825	null
2025-02-09	Assessing confidence in frontier AI safety cases	Stephen Barrett et.al.	2502.05791	null
2025-02-09	Visual Text Mining with Progressive Taxonomy Construction for Environmental Studies	Sam Yu-Te Lee et.al.	2502.05731	link
2025-02-07	SEER: Self-Explainability Enhancement of Large Language Models’ Representations	Guanxu Chen et.al.	2502.05242	null
2025-02-07	ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework	Xiaoyu Deng et.al.	2502.05084	null
2025-02-07	Aligning Black-box Language Models with Human Judgments	Gerrit J. J. van den Burg et.al.	2502.04997	null
2025-02-11	CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs	Roman Vashurin et.al.	2502.04964	null
2025-02-07	Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks	Jing Yang et.al.	2502.04797	link
2025-02-10	Confidence Elicitation: A New Attack Vector for Large Language Models	Brian Formento et.al.	2502.04643	link
2025-02-06	TruthFlow: Truthful LLM Generation via Representation Flow Correction	Hanyu Wang et.al.	2502.04556	null
2025-02-06	Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization	Yu-Neng Chuang et.al.	2502.04428	null
2025-02-06	KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference	Xing Li et.al.	2502.04420	link
2025-02-11	Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing	Kunfeng Lai et.al.	2502.04411	null
2025-02-06	FAS: Fast ANN-SNN Conversion for Spiking Large Language Models	Long Chen et.al.	2502.04405	link
2025-02-05	Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning	Jonathan Kim et.al.	2502.04381	null
2025-02-05	MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction	Xiao Hu et.al.	2502.04360	null
2025-02-04	LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving	Md Sifat Hossain et.al.	2502.04355	null
2025-02-06	Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software	Andreas Baumann et.al.	2502.03916	null
2025-02-06	BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation	Bo Pang et.al.	2502.03860	null
2025-02-12	Syntriever: How to Train Your Retriever with Synthetic Data from LLMs	Minsang Kim et.al.	2502.03824	link
2025-02-10	Large Language Models for Multi-Robot Systems: A Survey	Peihan Li et.al.	2502.03814	link
2025-02-08	Enhancing Hallucination Detection through Noise Injection	Litian Liu et.al.	2502.03799	null
2025-02-06	Adaptive Semantic Prompt Caching with VectorQ	Luis Gaspar Schroeder et.al.	2502.03771	null
2025-02-06	Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models	Rui Cai et.al.	2502.03715	null
2025-02-06	MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers	Nicole Cho et.al.	2502.03711	null
2025-02-06	Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers	Daniel Beaglehole et.al.	2502.03708	null
2025-02-06	LLM Alignment as Retriever Optimization: An Information Retrieval Perspective	Bowen Jin et.al.	2502.03699	null
2025-02-05	Reflection-Window Decoding: Text Generation with Selective Refinement	Zeyu Tang et.al.	2502.03678	null
2025-02-05	Advancing Reasoning in Large Language Models: Promising Methods and Approaches	Avinash Patil et.al.	2502.03671	null
2025-02-04	Artificial Intelligence and Legal Analysis: Implications for Legal Education and the Profession	Lee Peoples et.al.	2502.03487	null
2025-02-05	A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs)	Yiye Chen et.al.	2502.03450	null
2025-02-05	SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs	Ben Liu et.al.	2502.03283	null
2025-02-05	Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models	Jialiang Wu et.al.	2502.03199	null
2025-02-05	IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates	Aissatou Diallo et.al.	2502.03080	null
2025-02-04	An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification	Riddhi More et.al.	2502.02715	null
2025-02-04	EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization	Yize Wu et.al.	2502.02493	null
2025-02-04	Activation-Informed Merging of Large Language Models	Amin Heyrani Nobari et.al.	2502.02421	link
2025-02-04	From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing	Siwei Luo et.al.	2502.02025	null
2025-02-03	SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models	Diyana Muhammed et.al.	2502.01812	null
2025-02-03	Position: Towards a Responsible LLM-empowered Multi-Agent Systems	Jinwei Hu et.al.	2502.01714	null
2025-02-02	Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model	Hadas Ben-Atya et.al.	2502.01691	null
2025-02-02	LIBRA: Measuring Bias of Large Language Model from a Local Context	Bo Pang et.al.	2502.01679	null
2025-02-01	Benchmark on Peer Review Toxic Detection: A Challenging Task with a New Dataset	Man Luo et.al.	2502.01676	null
2025-02-03	CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering	Zongxi Li et.al.	2502.01523	null
2025-02-03	Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant	Gaole He et.al.	2502.01390	link
2025-02-03	PSSD: Making Large Language Models Self-denial via Human Psyche Structure	Jinzhi Liao et.al.	2502.01344	link
2025-02-03	Human-Agent Interaction in Synthetic Social Networks: A Framework for Studying Online Polarization	Tim Donkers et.al.	2502.01340	null
2025-02-03	DeepRAG: Thinking to Retrieval Step by Step for Large Language Models	Xinyan Guan et.al.	2502.01142	null
2025-02-03	Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning	Guanlin Li et.al.	2502.01116	null
2025-02-03	ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution	Kanika Goswami et.al.	2502.00989	null
2025-02-03	Context-Aware Hierarchical Merging for Long Document Summarization	Litu Ou et.al.	2502.00977	null
2025-02-02	Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications	Yixin Wu et.al.	2502.00808	link
2025-02-02	Generative AI for Analyzing Participatory Rural Appraisal Data: An Exploratory Case Study in Gender Research	Srividya Sheshadri et.al.	2502.00763	null
2025-02-02	MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction	Chao Wang et.al.	2502.00717	null
2025-02-01	Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation	Stuart Armstrong et.al.	2502.00580	link
2025-02-01	Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning	Zhi Zhou et.al.	2502.00511	null
2025-02-01	Estimating LLM Uncertainty with Logits	Huan Ma et.al.	2502.00290	link
2025-01-31	DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets	Abdurrahim Yilmaz et.al.	2502.00196	null
2025-01-31	Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models	Alina Shutova et.al.	2501.19392	link
2025-01-31	Towards Adaptive Self-Improvement for Smarter Energy Systems	Alexander Sommer et.al.	2501.19340	null
2025-01-31	Homogeneity Bias as Differential Sampling Uncertainty in Language Models	Messi H. J. Lee et.al.	2501.19337	null
2025-01-31	Offline Learning for Combinatorial Multi-armed Bandits	Xutong Liu et.al.	2501.19300	null
2025-01-31	Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs	Kejia Zhang et.al.	2501.19164	null
2025-01-31	Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities	Arjun Krishna et.al.	2501.19012	null
2025-01-30	Survey and Improvement Strategies for Gene Prioritization with Large Language Models	Matthew Neeley et.al.	2501.18794	null
2025-01-30	Differentially Private Steering for Large Language Model Alignment	Anmol Goel et.al.	2501.18532	link
2025-01-30	CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization	Yanxia Deng et.al.	2501.18475	null
2025-01-31	RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing	Jinyao Guo et.al.	2501.18160	link
2025-01-29	Large Language Models Think Too Fast To Explore Effectively	Lan Pan et.al.	2501.18009	null
2025-01-29	Uncertainty Quantification and Decomposition for LLM-based Recommendation	Wonbin Kweon et.al.	2501.17630	link
2025-01-29	Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis	Kunrong Li et.al.	2501.17598	null
2025-01-29	CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs	Amey Hengle et.al.	2501.17581	null
2025-01-28	Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization	Zilu Tang et.al.	2501.17295	null
2025-01-26	Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics	Jin Hyun Park et.al.	2501.17187	link
2025-02-01	LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering	Beiming Liu et.al.	2501.17183	null
2025-01-28	FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data	Deren Lei et.al.	2501.17144	link
2025-01-28	MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search	Shuozhi Yuan et.al.	2501.16607	null
2025-01-27	Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models	Huayu Li et.al.	2501.16215	link
2025-01-27	Parametric Retrieval Augmented Generation	Weihang Su et.al.	2501.15915	link
2025-01-26	Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis	Robinson Umeike et.al.	2501.15370	null
2025-01-26	Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection	Bo Yang et.al.	2501.15355	null
2025-01-25	You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning	Ayan Sengupta et.al.	2501.15296	null
2025-01-25	Can Large Language Models Be Trusted as Black-Box Evolutionary Optimizers for Combinatorial Problems?	Jie Zhao et.al.	2501.15081	null
2025-01-25	Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations	Harshita Chopra et.al.	2501.15056	null
2025-01-25	Federated Retrieval Augmented Generation for Multi-Product Question Answering	Parshin Shojaee et.al.	2501.14998	null
2025-01-24	Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing	Madeline Anderson et.al.	2501.14905	null
2025-01-24	Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs	Hang Luo et.al.	2501.14892	link
2025-01-24	Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains	Xu Chu et.al.	2501.14431	null
2025-01-24	Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph	Xujian Liang et.al.	2501.14300	link
2025-01-24	Humanity’s Last Exam	Long Phan et.al.	2501.14249	null
2025-01-24	AI Chatbots as Professional Service Agents: Developing a Professional Identity	Wenwen Li et.al.	2501.14179	null
2025-01-23	OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting	Xing Hu et.al.	2501.13987	link
2025-01-23	Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs	Bhumika Gupta et.al.	2501.13984	null
2025-01-20	A Layered Multi-Expert Framework for Long-Context Mental Health Assessments	Jinwen Tang et.al.	2501.13951	null
2025-01-23	CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation	Guofeng Cui et.al.	2501.13927	null
2025-01-23	On the Reasoning Capacity of AI Models and How to Quantify It	Santosh Kumar Radha et.al.	2501.13833	null
2025-01-23	Hallucinations Can Improve Large Language Models in Drug Discovery	Shuzhou Yuan et.al.	2501.13824	null
2025-01-22	OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models	Chongren Sun et.al.	2501.12975	link
2025-01-22	FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces	Zhenran Xu et.al.	2501.12909	null
2025-01-22	Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home	Viktor Moskvoretskii et.al.	2501.12835	null
2025-01-30	EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering	Chang Zong et.al.	2501.12746	null
2025-01-25	Online Preference Alignment for Language Models via Count-based Exploration	Chenjia Bai et.al.	2501.12735	link
2025-01-22	Paradigm-Based Automatic HDL Code Generation Using LLMs	Wenhao Sun et.al.	2501.12702	null
2025-01-19	AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model	Lipeng Ma et.al.	2501.11031	link
2025-01-18	Iterative Tree Analysis for Medical Critics	Zenan Huang et.al.	2501.10642	null
2025-01-18	Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks	Xin Yi et.al.	2501.10639	link
2025-01-17	4bit-Quantization in Vector-Embedding for RAG	Taehee Jeong et.al.	2501.10534	link
2025-01-17	Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling	Suvodip Dey et.al.	2501.10316	link
2025-01-17	Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions	Zhijie Tan et.al.	2501.10011	null
2025-01-17	Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models	Qiang Liu et.al.	2501.09997	null
2025-01-22	FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs	Zengyi Gao et.al.	2501.09957	null
2025-01-17	Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs	Reham Omar et.al.	2501.09928	link
2025-01-17	Towards A Litmus Test for Common Sense	Hugo Latapie et.al.	2501.09913	null
2025-01-17	FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis	Zhe Chen et.al.	2501.09887	null
2025-01-16	Bridging Language Barriers in Healthcare: A Study on Arabic LLMs	Nada Saadi et.al.	2501.09825	null
2025-01-16	Enhancing Generalization in Chain of Thought Reasoning for Smaller Models	Maxwell J. Yin et.al.	2501.09804	null
2025-01-24	Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong	Tairan Fu et.al.	2501.09775	null
2025-01-16	Confidence Estimation for Error Detection in Text-to-SQL Systems	Oleg Somov et.al.	2501.09527	link
2025-01-16	A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy	Huandong Wang et.al.	2501.09431	null
2025-01-16	Rational Tuning of LLM Cascades via Probabilistic Modeling	Michael J. Zellinger et.al.	2501.09345	null
2025-01-16	To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation	Kaustubh D. Dhole et.al.	2501.09292	null
2025-01-15	Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach	Alireza Ghaffari et.al.	2501.09107	null
2025-01-15	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Ruixiang Jiang et.al.	2501.09012	link
2025-01-15	Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching	Chuangtao Ma et.al.	2501.08686	link
2025-01-14	SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models	Anurag Kumar et.al.	2501.08421	null
2025-01-14	OptiChat: Bridging Optimization Models and Practitioners with Large Language Models	Hao Chen et.al.	2501.08406	link
2025-01-14	HALoGEN: Fantastic LLM Hallucinations and Where to Find Them	Abhilasha Ravichander et.al.	2501.08292	null
2025-01-14	Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering	Feijie Wu et.al.	2501.07813	null
2025-01-13	GPT as a Monte Carlo Language Tree: A Probabilistic Perspective	Kun-Peng Ning et.al.	2501.07641	null
2025-01-13	SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models	Fabien Bernier et.al.	2501.07639	null
2025-01-13	RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment	Difei Gu et.al.	2501.07525	link
2025-01-13	Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection	Xin Yin et.al.	2501.07425	null
2025-01-13	ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training	Jiayang Wu et.al.	2501.07078	link
2025-01-11	Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering	Yinghao Hu et.al.	2501.06521	link
2025-01-11	First Token Probability Guided RAG for Telecom Question Answering	Tingwei Chen et.al.	2501.06468	null
2025-01-21	MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare	Ye Chen et.al.	2501.06465	null
2025-01-10	Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea	Eunjung Cho et.al.	2501.05981	null
2025-01-10	Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models	Sungjae Lee et.al.	2501.05752	null
2025-01-09	Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning	Laura Puccioni et.al.	2501.05248	null
2025-01-09	Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments	Yifan Xu et.al.	2501.04947	null
2025-01-09	HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers	Yiyao Yang et.al.	2501.04908	link
2025-01-09	SUGAR: Leveraging Contextual Confidence for Smarter Retrieval	Hanna Zubkova et.al.	2501.04899	null
2025-01-08	Re-ranking the Context for Multimodal Retrieval Augmented Generation	Matin Mortaheb et.al.	2501.04695	null
2025-01-08	Multi-task retriever fine-tuning for domain-specific and efficient RAG	Patrice Béchard et.al.	2501.04652	null
2025-01-16	Knowledge Retrieval Based on Generative AI	Te-Lun Yang et.al.	2501.04635	null
2025-01-07	RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance	Matin Mortaheb et.al.	2501.03995	null
2025-01-07	Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles	Yuxi Xia et.al.	2501.03991	null
2025-01-07	Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States	Jurgita Kapočiūtė-Dzikienė et.al.	2501.03952	null
2025-01-08	A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval	Shuo Tong et.al.	2501.03295	null
2025-01-06	CALM: Curiosity-Driven Auditing for Large Language Models	Xiang Zheng et.al.	2501.02997	link
2025-01-19	FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models	Zhuo Chen et.al.	2501.02968	null
2025-01-09	InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion	Zhaoyi Yan et.al.	2501.02795	null
2025-01-06	QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance	Binita Saha et.al.	2501.02702	null
2025-01-06	EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models	Andrés Villa et.al.	2501.02699	null
2025-01-05	Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications	Zhe Chen et.al.	2501.02460	null
2025-01-04	Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation	Shijie Wang et.al.	2501.02226	null
2025-01-04	EvoPath: Evolutionary Meta-path Discovery with Large Language Models for Complex Heterogeneous Information Networks	Shixuan Liu et.al.	2501.02192	null
2025-01-04	The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit	Huixue Zhou et.al.	2501.02173	null
2025-01-02	Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection	Kedi Chen et.al.	2501.02020	null
2025-01-03	Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification	Xiangxiang Dai et.al.	2501.01849	link
2025-01-03	LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries	Michal Kuk et.al.	2501.01711	null
2025-01-03	(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges	Mohamed Hisham Abdellatif et.al.	2501.01588	null
2025-01-02	BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery	Kanishk Gandhi et.al.	2501.01540	link
2025-01-02	Aligning Large Language Models for Faithful Integrity Against Opposing Argument	Yong Zhao et.al.	2501.01336	link
2025-01-02	Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension	Yanbo Fang et.al.	2501.01332	null
2025-01-03	Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking	Xiaoxue Cheng et.al.	2501.01306	null
2025-01-02	Large Language Model-Enhanced Symbolic Reasoning for Knowledge Base Completion	Qiyuan He et.al.	2501.01246	null
2025-01-02	SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization	Yongle Huang et.al.	2501.01245	link
2025-01-02	Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method	Ruichen Zhang et.al.	2501.01141	null
2025-01-02	Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models	Yanwen Huang et.al.	2501.01059	null
2025-01-02	Dynamic Scaling of Unit Tests for Code Reward Modeling	Zeyao Ma et.al.	2501.01054	null
2025-01-07	LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management	Yichen Luo et.al.	2501.00826	null
2025-01-01	NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model	Yuzhi Lai et.al.	2501.00785	null
2024-12-31	Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs	Harit Vishwakarma et.al.	2501.00555	null
2024-12-31	A review of faithfulness metrics for hallucination assessment in Large Language Models	Ben Malin et.al.	2501.00269	null
2024-12-31	CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care	Michael Gubanov et.al.	2501.00223	null
2024-12-30	CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions	Mourad Heddaya et.al.	2501.00097	null
2024-12-30	Facilitating large language model Russian adaptation with Learned Embedding Propagation	Mikhail Tikhomirov et.al.	2412.21140	link
2024-12-30	KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation	Siyuan Fang et.al.	2412.20995	null
2024-12-30	Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs’ Memory	Xingjian Tao et.al.	2412.20846	null
2024-12-30	UBER: Uncertainty-Based Evolution with Large Language Models for Automatic Heuristic Design	Zijie Chen et.al.	2412.20694	link
2025-01-05	Distilling Desired Comments for Enhanced Code Review with Large Language Models	Yongda Yu et.al.	2412.20340	null
2024-12-29	Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain	Shintaro Ozaki et.al.	2412.20309	link
2024-12-28	ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty	Qing Zong et.al.	2412.20251	link
2024-12-27	Toward Adaptive Reasoning in Large Language Models with Thought Rollback	Sijia Chen et.al.	2412.19707	link
2024-12-27	Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs	Zhe Yang et.al.	2412.19513	link
2024-12-27	MBQ: Modality-Balanced Quantization for Large Vision-Language Models	Shiyao Li et.al.	2412.19509	link
2024-12-26	RAG with Differential Privacy	Nicolas Grislain et.al.	2412.19291	link
2025-01-03	MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models	Kaiwen Zuo et.al.	2412.18947	null
2025-01-06	Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation	Derong Xu et.al.	2412.18537	link
2024-12-24	Is Large Language Model Good at Triple Set Prediction? An Empirical Study	Yuan Yuan et.al.	2412.18443	null
2024-12-24	Annotating References to Mythological Entities in French Literature	Thierry Poibeau et.al.	2412.18270	null
2024-12-24	Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine	Yu He Ke et.al.	2412.18096	null
2024-12-23	Trustworthy and Efficient LLMs Meet Databases	Kyoungmin Kim et.al.	2412.18022	null
2024-12-22	The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM’s Internal States	Fabian Ridder et.al.	2412.17056	link
2024-12-22	Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs	Alexander von Recum et.al.	2412.16974	null
2024-12-28	Lillama: Large Language Models Compression via Low-Rank Feature Distillation	Yaya Sy et.al.	2412.16719	null
2024-12-21	Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks	Jinyan Su et.al.	2412.16708	link
2024-12-21	AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles	Aritra Kumar Lahiri et.al.	2412.16701	null
2024-12-21	Internalized Self-Correction for Large Language Models	Nishanth Upadhyaya et.al.	2412.16653	null
2024-12-21	Identifying Cyberbullying Roles in Social Media	Manuel Sandoval et.al.	2412.16417	null
2024-12-20	Towards Safe and Honest AI Agents with Neural Self-Other Overlap	Marc Carauleanu et.al.	2412.16325	null
2024-12-20	Logical Consistency of Large Language Models in Fact-checking	Bishwamittra Ghosh et.al.	2412.16100	null
2024-12-20	To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models	Jessica Y. Bo et.al.	2412.15584	null
2024-12-24	Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage	Saehyung Lee et.al.	2412.15484	null
2024-12-19	Systematic Evaluation of Long-Context LLMs on Financial Concepts	Lavanya Gupta et.al.	2412.15386	null
2024-12-19	Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models	Nishtha N. Vaidya et.al.	2412.15309	null
2024-12-19	A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation	Bhaskarjit Sarmah et.al.	2412.15298	null
2024-12-19	Confidence in the Reasoning of Large Language Models	Yudi Pawitan et.al.	2412.15296	link
2024-12-17	SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation	Yuzheng Cai et.al.	2412.15272	link
2024-12-17	A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models	Gongbo Zhang et.al.	2412.15271	null
2024-12-15	LLMs for Literature Review: Are we there yet?	Shubham Agarwal et.al.	2412.15249	null
2024-12-19	Rethinking Uncertainty Estimation in Natural Language Generation	Lukas Aichberger et.al.	2412.15176	null
2024-12-19	Adaptive Pruning for Large Language Models with Structural Importance Awareness	Haotian Zheng et.al.	2412.15127	null
2024-12-19	Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability	Xiangsen Chen et.al.	2412.15101	null
2024-12-19	RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response	Junyu Luo et.al.	2412.14922	link
2024-12-19	Dehallucinating Parallel Context Extension for Retrieval-Augmented Generation	Zexiong Ma et.al.	2412.14905	null
2024-12-19	Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling	Junyi Li et.al.	2412.14860	null
2024-12-19	Query pipeline optimization for cancer patient question answering systems	Maolin He et.al.	2412.14751	null
2024-12-19	On Verbalized Confidence Scores for LLMs	Daniel Yang et.al.	2412.14737	link
2024-12-25	Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models	Zijun Chen et.al.	2412.14660	link
2024-12-19	Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment	Teng Xiao et.al.	2412.14516	link
2024-12-19	FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis	Abdullah Khan et.al.	2412.14492	link
2024-12-18	LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis	Chengpeng Wang et.al.	2412.14399	null
2024-12-18	Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets	Simon Thorne et.al.	2412.14062	null
2024-12-18	Discovering maximally consistent distribution of causal tournaments with Large Language Models	Federico Baldo et.al.	2412.14019	null
2024-12-27	Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence	Jinghan He et.al.	2412.13949	null
2024-12-29	Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection	Le Yang et.al.	2412.13817	link
2024-12-18	Meta-Reflection: A Feedback-Free Reflection Learning Framework	Yaoke Wang et.al.	2412.13781	null
2024-12-18	Are LLMs Good Literature Review Writers? Evaluating the Literature Review Writing Ability of Large Language Models	Xuemei Tang et.al.	2412.13612	null
2024-12-18	Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement	Qianyue Wang et.al.	2412.13575	link
2024-12-18	C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System	Parker Addison et.al.	2412.13163	null
2024-12-17	Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health	Vivek Kumar et.al.	2412.12981	link
2024-12-17	A Survey of Calibration Process for Black-Box LLMs	Liangru Xie et.al.	2412.12767	null
2024-12-18	Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models	Seungeun Oh et.al.	2412.12687	null
2024-12-17	What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context	Zhiyuan Chang et.al.	2412.12632	null
2024-12-17	Jailbreaking? One Step Is Enough!	Weixiong Zheng et.al.	2412.12621	null
2024-12-17	When to Speak, When to Abstain: Contrastive Decoding with Abstention	Hyuhng Joon Kim et.al.	2412.12527	null
2024-12-12	Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off	Eoin M. Kenny et.al.	2412.12169	link
2024-12-11	SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration	Yuanhao Shen et.al.	2412.12151	link
2024-12-16	LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts	Zhuhao Wang et.al.	2412.12001	link
2024-12-16	RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation	Xiaoxi Li et.al.	2412.11919	link
2024-12-16	Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments	Andrii Nikolaiev et.al.	2412.11908	null
2024-12-16	A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection	Simon Hachmeier et.al.	2412.11851	link
2024-12-16	UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models	Boyang Xue et.al.	2412.11803	link
2024-12-16	Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods	Diana Bar-Or Nirman et.al.	2412.11625	null
2024-12-16	Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes	Antonio Carlos Rivera et.al.	2412.11396	null
2024-12-15	CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation	Kurando IIDA et.al.	2412.11261	null
2024-12-15	Do Tutors Learn from Equity Training and Can Generative AI Assess It?	Danielle R. Thomas et.al.	2412.11255	link
2024-12-15	Task-Oriented Dialog Systems for the Senegalese Wolof Language	Derguene Mbaye et.al.	2412.11203	null
2024-12-15	Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning	Shengqiong Wu et.al.	2412.11124	null
2024-12-15	Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning	Yun Qu et.al.	2412.11120	link
2024-12-15	Empowering LLMs to Understand and Generate Complex Vector Graphics	Ximing Xing et.al.	2412.11102	null
2024-12-17	MedG-KRP: Medical Graph Knowledge Representation Probing	Gabriel R. Rosenbaum et.al.	2412.10982	null
2024-12-14	Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data	Xue Wu et.al.	2412.10654	null
2024-12-13	Benchmarking large language models for materials synthesis: the case of atomic layer deposition	Angel Yanguas-Gil et.al.	2412.10477	null
2024-12-13	Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts	Hazel Kim et.al.	2412.10246	null
2024-12-13	How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives	Timour Ichmoukhamedov et.al.	2412.10220	link
2024-12-13	TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering	Bingru Li et.al.	2412.10139	null
2024-12-13	ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL	Yang Qin et.al.	2412.10138	link
2024-12-12	DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction	Yu Feng et.al.	2412.09572	null
2024-12-12	Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion	Ben Liu et.al.	2412.09094	link
2024-12-12	Dial-In LLM: Human-Aligned Dialogue Intent Clustering with LLM-in-the-loop	Mengze Hong et.al.	2412.09049	null
2024-12-12	Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning	Wenna Lai et.al.	2412.09046	null
2024-12-12	ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty	Meizhi Zhong et.al.	2412.09036	null
2024-12-11	Learning to Reason via Self-Iterative Process Feedback for Small Language Models	Kaiyuan Chen et.al.	2412.08393	null
2024-12-11	What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models	Bangshuo Zhu et.al.	2412.08098	null
2024-12-10	HalluCana: Fixing LLM Hallucination with A Canary Lookahead	Tianyi Li et.al.	2412.07965	null
2024-12-10	Forking Paths in Neural Text Generation	Eric Bigelow et.al.	2412.07961	null
2024-12-10	Low-Rank Correction for Quantized LLMs	Meyer Scetbon et.al.	2412.07902	null
2024-12-08	Language Model as Visual Explainer	Xingyi Yang et.al.	2412.07802	null
2024-12-16	Granite Guardian	Inkit Padhi et.al.	2412.07724	link
2024-12-10	Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation	Qinhong Lin et.al.	2412.07255	null
2024-12-10	Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay	Ruiheng Liu et.al.	2412.07246	null
2024-12-10	MAPLE: A Framework for Active Preference Learning Guided by Large Language Models	Saaduddin Mahmud et.al.	2412.07207	null
2024-12-10	When Graph Meets Retrieval Augmented Generation for Wireless Networks: A Tutorial and Case Study	Yang Xiong et.al.	2412.07189	null
2024-12-10	Post-Training Statistical Calibration for Higher Activation Sparsity	Vui Seng Chua et.al.	2412.07174	link
2024-12-11	ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models	Jieyu Zhang et.al.	2412.07012	link
2024-12-09	Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study	Ehsan Shareghi et.al.	2412.06272	null
2024-12-09	MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization	Kangyu Zhu et.al.	2412.06141	link
2024-12-08	Hallucination-aware Optimization for Large Language Model-empowered Communications	Yinqiu Liu et.al.	2412.06007	link
2024-12-07	Training-Free Bayesianization for Low-Rank Adapters of Large Language Models	Haizhou Shi et.al.	2412.05723	link
2024-12-07	Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent	Ziyuan Qin et.al.	2412.05722	null
2024-12-07	A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions	Ola Shorinwa et.al.	2412.05563	null
2024-12-07	Ranking of Large Language Model with Nonparametric Prompts	Zebin Wang et.al.	2412.05506	null
2024-12-06	Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization	Subhojyoti Mukherjee et.al.	2412.05469	null
2024-12-06	A Graph-Based Approach for Conversational AI-Driven Personal Memory Capture and Retrieval in a Real-world Application	Savini Kashmira et.al.	2412.05447	null
2024-12-06	HiVeGen – Hierarchical LLM-based Verilog Generation for Scalable Chip Design	Jinwei Tang et.al.	2412.05393	null
2024-12-09	Enhancing FKG.in: automating Indian food composition analysis	Saransh Kumar Gupta et.al.	2412.05248	null
2024-12-06	100% Hallucination Elimination Using Acurai	Michael C. Wood et.al.	2412.05223	link
2024-12-06	Steps are all you need: Rethinking STEM Education with Prompt Engineering	Krishnasai Addala et.al.	2412.05023	null
2024-12-06	Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance	Xuchan Bao et.al.	2412.04746	null
2024-12-06	LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs	Xuan Chen et.al.	2412.04690	null
2024-12-05	HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning	Manish Bhattarai et.al.	2412.04661	link
2024-12-10	Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates	Li Shi et.al.	2412.04629	null
2024-12-05	Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion	Jiuhai Chen et.al.	2412.04424	link
2024-12-05	Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation	Xuying Li et.al.	2412.04415	null
2024-12-05	Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots	Maria Paola Priola et.al.	2412.04235	null
2024-12-05	Reducing Tool Hallucination via Reliability Alignment	Hongshen Xu et.al.	2412.04141	null
2024-12-04	A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences	Gabriel Lino Garcia et.al.	2412.03531	null
2024-12-04	You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks?	Dominic Lohr et.al.	2412.03516	null
2024-12-03	Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning	Ranganath Krishnan et.al.	2412.02904	null
2024-12-03	An Evolutionary Large Language Model for Hallucination Mitigation	Abdennour Boulesnane et.al.	2412.02790	null
2024-12-03	OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation	Junyuan Zhang et.al.	2412.02592	link
2024-12-03	Semantic Tokens in Retrieval Augmented Generation	Joel Suro et.al.	2412.02563	null
2024-12-04	The use of large language models to enhance cancer clinical trial educational materials	Mingye Gao et.al.	2412.01955	null
2024-12-04	The Reality of AI and Biorisk	Aidan Peppin et.al.	2412.01946	null
2024-12-02	R-Bot: An LLM-based Query Rewrite System	Zhaoyan Sun et.al.	2412.01661	null
2024-12-02	Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input	Francesco Taioli et.al.	2412.01250	null
2024-12-02	SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages	Jia Guo et.al.	2412.01186	link
2024-12-02	SAUP: Situation Awareness Uncertainty Propagation on LLM Agent	Qiwei Zhao et.al.	2412.01033	null
2024-12-02	AI Benchmarks and Datasets for LLM Evaluation	Todor Ivanov et.al.	2412.01020	null
2024-12-06	Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection	Shanu Kumar et.al.	2412.00353	null
2024-11-30	Human-Like Code Quality Evaluation through LLM-based Recursive Semantic Comprehension	Fangzhou Xu et.al.	2412.00314	null
2024-11-29	An AI-Driven Data Mesh Architecture Enhancing Decision-Making in Infrastructure Construction and Public Procurement	Saurabh Mishra et.al.	2412.00224	null
2024-11-24	Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis	Ferhat Ozgur Catak et.al.	2412.00056	null
2024-12-02	Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis	Alessandro Scirè et.al.	2411.19655	link
2024-11-29	RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation	Xianfeng Tan et.al.	2411.19528	null
2024-11-29	Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems	Shengming Zhao et.al.	2411.19463	null
2024-11-28	Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs	Anirudh Phukan et.al.	2411.19187	null
2024-11-28	Mars-PO: Multi-Agent Reasoning System Preference Optimization	Xiaoxuan Lou et.al.	2411.19039	null
2024-11-28	AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models	Jisheng Bai et.al.	2411.18953	link
2024-11-27	Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students	Tiffany Zhu et.al.	2411.18708	null
2024-11-27	Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track	Deepak Gupta et.al.	2411.18069	null
2024-11-26	MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation	Sankalp Sinha et.al.	2411.17945	link
2024-11-26	AI2T: Building Trustable AI Tutors by Interactively Teaching a Self-Aware Learning Agent	Daniel Weitekamp et.al.	2411.17924	null
2024-11-26	$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs	Selim Furkan Tekin et.al.	2411.17792	link
2024-11-26	MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation	Harsh Singh et.al.	2411.17636	null
2024-11-26	One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models	Pengfei Cao et.al.	2411.17401	null
2024-11-26	Can LLMs be Good Graph Judger for Knowledge Graph Construction?	Haoyu Huang et.al.	2411.17388	link
2024-11-26	Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning	Milena Chadimová et.al.	2411.17304	null
2024-11-26	HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator	Fan Yang et.al.	2411.17261	null
2024-11-25	Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries	Harshavardhan Battula et.al.	2411.16818	null
2024-11-25	Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models	Alireza Amiri-Margavi et.al.	2411.16797	null
2024-11-25	VidHal: Benchmarking Temporal Hallucinations in Vision LLMs	Wey Yeh Choong et.al.	2411.16771	link
2024-11-23	Text-to-SQL Calibration: No Need to Ask – Just Rescale Model Probabilities	Ashwin Ramachandran et.al.	2411.16742	null
2024-11-23	Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction	Mitchell Rosser et.al.	2411.16723	null
2024-11-28	Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation	Sanjana Ramprasad et.al.	2411.16638	null
2024-12-03	AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning	Amy Xin et.al.	2411.16495	link
2024-11-25	Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models	Zhihua Duan et.al.	2411.16189	null
2024-11-24	Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown	Lifu Tu et.al.	2411.15993	null
2024-11-23	Ontology-Constrained Generation of Domain-Specific Clinical Summaries	Gaya Mehenni et.al.	2411.15666	link
2024-11-23	MC-NEST – Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree	Gollam Rabby et.al.	2411.15645	link
2024-11-23	“All that Glitters”: Approaches to Evaluations with Unreliable Model and Human Annotations	Michael Hardy et.al.	2411.15634	link
2024-11-22	Sycophancy in Large Language Models: Causes and Mitigations	Lars Malmqvist et.al.	2411.15287	null
2024-11-18	Can Open-source LLMs Enhance Data Augmentation for Toxic Detection?: An Experimental Study	Zheng Hui et.al.	2411.15175	null
2024-11-22	Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation	Colin Diggs et.al.	2411.14971	null
2024-11-22	SwissADT: An Audio Description Translation System for Swiss Languages	Lukas Fischer et.al.	2411.14967	null
2024-12-01	G-RAG: Knowledge Expansion in Material Science	Radeen Mostafa et.al.	2411.14592	link
2024-11-20	The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz	David Noever et.al.	2411.14486	null
2024-11-19	Why you don’t overfit, and don’t need Bayes if you only train for one epoch	Laurence Aitchison et.al.	2411.14478	null
2024-11-18	Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning	Elizaveta Reganova et.al.	2411.14465	null
2024-11-15	Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models	Maryam Shoaeinaeini et.al.	2411.14457	null
2024-11-21	Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance	Haozhe Zhao et.al.	2411.14279	null
2024-11-21	Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective	Ernests Lavrinovics et.al.	2411.14258	null
2024-11-21	RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks	Changyue Jiang et.al.	2411.14110	null
2024-11-21	XAgents: A Framework for Interpretable Rule-Based Multi-Agents Cooperation	Hailong Yang et.al.	2411.13932	null
2024-11-21	Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels	Jianhao Yan et.al.	2411.13775	link
2024-11-20	Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics	Ryan Mok et.al.	2411.13685	link
2024-11-21	Disentangling Memory and Reasoning Ability in Large Language Models	Mingyu Jin et.al.	2411.13504	link
2024-11-20	Fact-Level Confidence Calibration and Self-Correction	Yige Yuan et.al.	2411.13343	link
2024-11-20	Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding	Nabeel Seedat et.al.	2411.13163	null
2024-11-16	A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery	Grace Sng et.al.	2411.12759	null
2024-11-19	Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs	Malay Kumar et.al.	2411.12685	null
2024-11-15	Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination	Haojie Zheng et.al.	2411.12591	link
2024-11-19	Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering	Aryan Keluskar et.al.	2411.12395	null
2024-11-28	VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation	Ruiyang Zhang et.al.	2411.11919	null
2024-11-07	Deploying Large Language Models With Retrieval Augmented Generation	Sonal Prabhune et.al.	2411.11895	link
2024-11-18	Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality	Viktoriia Chekalina et.al.	2411.11531	null
2024-11-18	Membership Inference Attack against Long-Context Large Language Models	Zixiong Wang et.al.	2411.11424	null
2024-11-29	Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?	Rosalia Tufano et.al.	2411.11401	link
2024-11-17	Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering	Zeping Yu et.al.	2411.10950	link
2024-11-16	Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation	Shuyang Hou et.al.	2411.10753	null
2024-11-16	I’m Spartacus, No, I’m Spartacus: Measuring and Understanding LLM Identity Confusion	Kun Li et.al.	2411.10683	null
2024-11-15	Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering	Buntaro Hiraki et.al.	2411.10583	null
2024-11-15	On the Privacy Risk of In-context Learning	Haonan Duan et.al.	2411.10512	null
2024-11-15	Understanding The Effect Of Temperature On Alignment With Human Opinions	Maja Pavlovic et.al.	2411.10080	null
2024-11-15	Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity	Zichen Song et.al.	2411.10069	null
2024-11-15	Experiences from Using LLMs for Repository Mining Studies in Empirical Software Engineering	Vincenzo de Martino et.al.	2411.09974	null
2024-11-15	AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference	Janghwan Lee et.al.	2411.09909	null
2024-11-14	LLM Hallucination Reasoning with Zero-shot Knowledge Test	Seongmin Lee et.al.	2411.09689	null
2024-11-14	DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine	Jean Seo et.al.	2411.09255	link
2024-11-14	Toward Democratized Generative AI in Next-Generation Mobile Edge Networks	Ruichen Zhang et.al.	2411.09148	null
2024-11-13	The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models	Daniel P. Jeong et.al.	2411.08870	link
2024-11-04	QCG-Rerank: Chunks Graph Rerank with Query Expansion in Retrieval-Augmented LLMs for Tourism Domain	Qikai Wei et.al.	2411.08724	null
2024-11-13	Neural Topic Modeling with Large Language Models in the Loop	Xiaohao Yang et.al.	2411.08534	null
2024-11-13	Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach	Shangfeng Chen et.al.	2411.08348	null
2024-11-13	Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering	Farouq Sammour et.al.	2411.08320	null
2024-11-12	Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data	Juanhui Li et.al.	2411.08028	null
2024-11-12	From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents	Chuyi Kong et.al.	2411.07965	null
2024-11-13	Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders	Xiaofeng Zhu et.al.	2411.07870	null
2024-11-12	Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models	Yusen Zhang et.al.	2411.07858	link
2024-11-12	OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework	Jiaxi Li et.al.	2411.07711	link
2024-11-12	DecoPrompt: Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises	Nan Xu et.al.	2411.07457	link
2024-11-16	Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation	Ziwei Liu et.al.	2411.07021	null
2024-11-11	LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help?	Rikiya Takehi et.al.	2411.06877	link
2024-11-11	AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant	Yujia Zhou et.al.	2411.06805	link
2024-11-11	Anchor Attention, Small Cache: Code Generation with Large Language Models	Xiangyu Zhang et.al.	2411.06680	link
2024-11-10	CriticAL: Critic Automation with Language Models	Michael Y. Li et.al.	2411.06590	null
2024-11-10	Epistemic Integrity in Large Language Models	Bijean Ghafouri et.al.	2411.06528	link
2024-11-10	Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques	Daniil Sulimov et.al.	2411.06445	null
2024-11-09	Sufficient Context: A New Lens on Retrieval Augmented Generation Systems	Hailey Joren et.al.	2411.06037	null
2024-11-12	Game-theoretic LLM: Agent Workflow for Negotiation Games	Wenyue Hua et.al.	2411.05990	link
2024-11-08	FactLens: Benchmarking Fine-Grained Fact Verification	Kushan Mitra et.al.	2411.05980	null
2024-11-08	Mitigating Hallucination with ZeroG: An Advanced Knowledge Management Engine	Anantha Sharma et.al.	2411.05936	null
2024-11-08	The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent	Leon O. H. Kroczek et.al.	2411.05653	null
2024-11-16	Web Archives Metadata Generation with GPT-4o: Challenges and Insights	Abigail Yongping Huang et.al.	2411.05409	link
2024-11-08	Seeing Through the Fog: A Cost-Effectiveness Analysis of Hallucination Detection Systems	Alexander Thomas et.al.	2411.05270	null
2024-11-07	Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability	Yanjun Gao et.al.	2411.04962	null
2024-11-07	Prompt-Guided Internal States for Hallucination Detection of Large Language Models	Fujie Zhang et.al.	2411.04847	link
2024-11-07	Self-Calibrated Listwise Reranking with Large Language Models	Ruiyang Ren et.al.	2411.04602	null
2024-11-07	LLM-R: A Framework for Domain-Adaptive Maintenance Scheme Generation Combining Hierarchical Agents and RAG	Laifa Tao et.al.	2411.04476	null
2024-11-07	Bayesian Calibration of Win Rate Estimation with LLM Evaluators	Yicheng Gao et.al.	2411.04424	link
2024-11-06	A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI	Melusi Malinga et.al.	2411.04316	null
2024-11-06	Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?	Daniel P. Jeong et.al.	2411.04118	link
2024-11-06	Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation	Yuhang Liu et.al.	2411.03957	null
2024-11-06	EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning	Kiran Purohit et.al.	2411.03877	link
2024-11-06	QUILL: Quotation Generation Enhancement of Large Language Models	Jin Xiao et.al.	2411.03675	link
2024-11-05	Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature	Viviane Torres da Silva et.al.	2411.03484	null
2024-11-05	VERITAS: A Unified Approach to Reliability Evaluation	Rajkumar Ramamurthy et.al.	2411.03300	null
2024-11-05	Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities	Ryosuke Takata et.al.	2411.03252	null
2024-11-05	HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems	Jiejun Tan et.al.	2411.02959	link
2024-11-05	Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning	Tao Zhang et.al.	2411.02864	null
2024-11-05	V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization	Yuxi Xie et.al.	2411.02712	link
2024-11-07	FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees	Fan Nie et.al.	2411.02603	null
2024-11-03	Graph-based Confidence Calibration for Large Language Models	Yukun Li et.al.	2411.02454	null
2024-11-03	Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models	Aliyah R. Hsu et.al.	2411.02448	link
2024-11-04	Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models	Guangzhi Xiong et.al.	2411.02382	null
2024-11-04	Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI	Ramneet Kaur et.al.	2411.02381	null
2024-11-04	“Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization	Eldar Kurtic et.al.	2411.02355	null
2024-11-03	Autoformulation of Mathematical Optimization Models Using LLMs	Nicolás Astorga et.al.	2411.01679	null
2024-11-03	Ontology Population using LLMs	Sanaz Saki Norouzi et.al.	2411.01612	null
2024-11-02	AMREx: AMR for Explainable Fact Verification	Chathuri Jayaweera et.al.	2411.01343	null
2024-11-01	Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output	Hithesh Sankararaman et.al.	2411.01022	null
2024-10-30	FPE-LLM: Highly Intelligent Time-Series Forecasting and Language Interaction LLM in Energy Systems	Zihang Qiu et.al.	2411.00852	null
2024-10-30	GWQ: Gradient-Aware Weight Quantization for Large Language Models	Yihua Shao et.al.	2411.00850	null
2024-11-01	CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation	Ziting Wang et.al.	2411.00744	null
2024-11-01	Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval	Qingfei Zhao et.al.	2411.00689	null
2024-11-01	Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation	Bohan Lyu et.al.	2411.00412	null
2024-11-01	Beyond Utility: Evaluating LLM as Recommender	Chumeng Jiang et.al.	2411.00331	link
2024-11-01	Rationale-Guided Retrieval Augmented Generation for Medical Question Answering	Jiwoong Sohn et.al.	2411.00300	link
2024-11-01	RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models	Sraavya Sambara et.al.	2411.00299	null
2024-10-29	Problem Categorization Can Help Large Language Models Solve Math Problems	Amogh Akella et.al.	2411.00042	null
2024-10-28	A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges	Zifeng Wang et.al.	2411.00024	null
2024-11-04	Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models	Ognjen et.al.	2411.00023	null
2024-10-31	Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs	Liyi Chen et.al.	2410.23875	link
2024-10-31	Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs	Shuyang Yu et.al.	2410.23605	null
2024-10-31	Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval	Sheryl Hsu et.al.	2410.23214	null
2024-10-30	VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning	Jingkun Ma et.al.	2410.22995	null
2024-10-30	Retrieval-Augmented Generation with Estimation of Source Reliability	Jeongyeon Hwang et.al.	2410.22954	null
2024-10-30	Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations	Leonardo Ranaldi et.al.	2410.22874	null
2024-10-30	Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot	Sejin Lee et.al.	2410.22767	link
2024-10-30	Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings	Yashvir S. Grewal et.al.	2410.22685	null
2024-10-29	Distinguishing Ignorance from Error in LLM Hallucinations	Adi Simhi et.al.	2410.22071	link
2024-10-29	Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications	Monica Riedler et.al.	2410.21943	link
2024-10-29	MARCO: Multi-Agent Real-time Chat Orchestration	Anubhav Shrimal et.al.	2410.21784	null
2024-10-28	LLM-Forest for Health Tabular Data Imputation	Xinrui He et.al.	2410.21520	null
2024-10-28	EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation	Shih-Yang Liu et.al.	2410.21271	null
2024-10-28	CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models	Meiqi Chen et.al.	2410.21067	null
2024-10-28	Reward Modeling with Weak Supervision for Language Models	Ben Hauptvogel et.al.	2410.20869	link
2024-10-28	Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation	Jaechang Kim et.al.	2410.20811	null
2024-10-28	Graph-based Uncertainty Metrics for Long-form Language Model Outputs	Mingjian Jiang et.al.	2410.20783	link
2024-10-28	Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation	Dongryeol Lee et.al.	2410.20774	link
2024-10-28	Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation	Mufei Li et.al.	2410.20724	link
2024-10-27	Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains	Jiemin Wu et.al.	2410.20340	null
2024-10-26	Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models	Mohammad Beigi et.al.	2410.20199	null
2024-10-26	Uncertainty-Penalized Direct Preference Optimization	Sam Houliston et.al.	2410.20187	null
2024-10-26	Mask-based Membership Inference Attacks for Retrieval-Augmented Generation	Mingrui Liu et.al.	2410.20142	null
2024-10-26	Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics	Mikhail Rumiantsau et.al.	2410.20024	null
2024-10-25	FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning	Nicole Cho et.al.	2410.19727	null
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702	null
2024-10-30	ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems	Ishneet Sukhvinder Singh et.al.	2410.19572	null
2024-11-01	Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization	Anthony Cui et.al.	2410.19499	null
2024-10-25	A Debate-Driven Experiment on LLM Hallucinations and Accuracy	Ray Li et.al.	2410.19485	null
2024-10-25	Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models	Liam Barkley et.al.	2410.19385	null
2024-10-25	Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning	Yujian Liu et.al.	2410.19290	link
2024-10-24	Prebunking Elections Rumors: Artificial Intelligence Assisted Interventions Increase Confidence in American Elections	Mitchell Linegar et.al.	2410.19202	null
2024-10-24	AlignCap: Aligning Speech Emotion Captioning to Human Preferences	Ziqi Liang et.al.	2410.19134	null
2024-10-24	LLM Tree Search	Dylan Wilson et.al.	2410.19117	null
2024-10-30	Dynamic Vocabulary Pruning in Early-Exit LLMs	Jort Vincenti et.al.	2410.18952	link
2024-10-24	DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations	Aryo Pradipta Gema et.al.	2410.18860	link
2024-10-25	An LLM Agent for Automatic Geospatial Data Analysis	Yuxing Chen et.al.	2410.18792	null
2024-10-24	Task Calibration: Calibrating Large Language Models on Inference Tasks	Yingjie Li et.al.	2410.18764	null
2024-10-24	LLM-Slice: Dedicated Wireless Network Slicing for Large Language Models	Boyi Liu et.al.	2410.18499	null
2024-10-23	AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models	Kim Sung-Bin et.al.	2410.18325	link
2024-10-23	Multilingual Hallucination Gaps in Large Language Models	Cléa Chataigner et.al.	2410.18270	null
2024-10-23	Beware of Calibration Data for Pruning Large Language Models	Yixin Ji et.al.	2410.17711	null
2024-10-23	MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models	Guijin Son et.al.	2410.17578	link
2024-10-29	Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination	Jerry Huang et.al.	2410.17477	null
2024-10-22	ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs	Reza Fayyazi et.al.	2410.17406	link
2024-10-22	DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR	Miguel Contreras et.al.	2410.17363	null
2024-10-22	Are Large Language Models Ready for Travel Planning?	Ruiping Ren et.al.	2410.17333	null
2024-10-22	Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy	Benedict Aaron Tjandra et.al.	2410.17234	null
2024-10-23	GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks	Shuyang Hou et.al.	2410.17031	null
2024-10-22	SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine	Xiaochen Wang et.al.	2410.17021	null
2024-10-22	Combining Ontological Knowledge and Large Language Model for User-Friendly Service Robots	Haru Nakajima et.al.	2410.16804	null
2024-10-21	Large language models enabled multiagent ensemble method for efficient EHR data labeling	Jingwei Huang et.al.	2410.16543	null
2024-10-21	Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models’ Reasoning with Formal Logic	Jason Chan et.al.	2410.16502	null
2024-10-18	Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs	Rui Pu et.al.	2410.16327	null
2024-10-29	Can Knowledge Editing Really Correct Hallucinations?	Baixiang Huang et.al.	2410.16251	link
2024-10-21	Analyzing Context Contributions in LLM-based Machine Translation	Emmanouil Zaranis et.al.	2410.16246	null
2024-10-23	IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems	Yihuan Mao et.al.	2410.16237	null
2024-10-21	Information for Conversation Generation: Proposals Utilising Knowledge Graphs	Alex Clay et.al.	2410.16196	null
2024-10-22	Reducing Hallucinations in Vision-Language Models via Latent Space Steering	Sheng Liu et.al.	2410.15778	link
2024-10-21	Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding	Derong Xu et.al.	2410.15702	null
2024-10-21	Students Rather Than Experts: A New AI For Education Pipeline To Model More Human-Like And Personalised Early Adolescences	Yiping Ma et.al.	2410.15701	null
2024-10-21	NetSafe: Exploring the Topological Safety of Multi-agent Networks	Miao Yu et.al.	2410.15686	null
2024-10-21	Bayesian Concept Bottleneck Models with LLM Priors	Jean Feng et.al.	2410.15555	link
2024-10-20	Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini	Chanseo Lee et.al.	2410.15528	null
2024-10-22	Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence	Norbert Tihanyi et.al.	2410.15490	null
2024-10-20	Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training	Shahrad Mohammadzadeh et.al.	2410.15460	null
2024-10-20	CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges	Haitao Li et.al.	2410.15393	link
2024-10-20	A Survey of Hallucination in Large Visual Language Models	Wei Lan et.al.	2410.15359	null
2024-10-20	Modality-Fair Preference Optimization for Trustworthy MLLM Alignment	Songtao Jiang et.al.	2410.15334	null
2024-10-20	A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice	Hsiu-Yuan Huang et.al.	2410.15326	null
2024-10-20	Causality for Large Language Models	Anpeng Wu et.al.	2410.15319	link
2024-10-20	MAD: Move AI Decompiler to Improve Transparency and Auditability on Non-Open-Source Blockchain Smart Contract	Eason Chen et.al.	2410.15275	null
2024-10-19	Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction	Yinhan He et.al.	2410.15165	link
2024-10-19	MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification	Yin Li et.al.	2410.15154	link
2024-10-22	Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization	Zihui Wu et.al.	2410.15052	link
2024-10-19	“Ghost of the past”: identifying and resolving privacy leakage from LLM’s memory through proactive user interaction	Shuning Zhang et.al.	2410.14931	null
2024-10-18	FedSpaLLM: Federated Pruning of Large Language Models	Guangji Bai et.al.	2410.14852	null
2024-10-18	Enabling Scalable Evaluation of Bias Patterns in Medical LLMs	Hamed Fayyaz et.al.	2410.14763	link
2024-10-22	ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries	Kishan Maharaj et.al.	2410.14748	null
2024-10-17	Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors	Anthony Sicilia et.al.	2410.14744	null
2024-10-18	Enhancing Large Language Models’ Situated Faithfulness to External Contexts	Yukun Huang et.al.	2410.14675	link
2024-10-22	Do LLMs estimate uncertainty well in instruction-following?	Juyeon Heo et.al.	2410.14582	link
2024-10-18	Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models	James Vo et.al.	2410.14480	null
2024-10-18	Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Josiah Aklilu et.al.	2410.14340	null
2024-10-18	Critical Questions Generation: Motivation and Challenges	Blanca Calvo Figueras et.al.	2410.14335	link
2024-10-18	ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM	Songheng Zhang et.al.	2410.14331	null
2024-10-18	LoGU: Long-form Generation with Uncertainty Expressions	Ruihan Yang et.al.	2410.14309	link
2024-10-22	Good Parenting is all you need – Multi-agentic LLM Hallucination Mitigation	Ted Kwartler et.al.	2410.14262	null
2024-10-18	Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models	Olga Loginova et.al.	2410.14248	null
2024-10-21	Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning	Xingyu Tan et.al.	2410.14211	null
2024-10-18	Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment	Chenhang Cui et.al.	2410.14148	null
2024-10-17	From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization	Catarina G. Belem et.al.	2410.13961	link
2024-10-17	Goal Inference from Open-Ended Dialog	Rachel Ma et.al.	2410.13957	null
2024-10-17	RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards	Xinze Li et.al.	2410.13509	link
2024-10-17	Advancing Large Language Model Attribution through Self-Improving	Lei Huang et.al.	2410.13298	null
2024-10-17	Learning to Route with Confidence Tokens	Yu-Neng Chuang et.al.	2410.13284	null
2024-10-17	Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning	Minseok Choi et.al.	2410.13274	null
2024-10-17	Atomic Calibration of LLMs in Long-Form Generations	Caiqi Zhang et.al.	2410.13246	null
2024-10-17	LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch	Caigao Jiang et.al.	2410.13213	link
2024-10-17	FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs	Forrest Sheng Bao et.al.	2410.13210	link
2024-10-18	MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback	Zonghai Yao et.al.	2410.13191	link
2024-10-21	Utilizing Large Language Models in An Iterative Paradigm with Domain Feedback for Molecule Optimization	Khiem Le et.al.	2410.13147	null
2024-10-17	Trust but Verify: Programmatic VLM Evaluation in the Wild	Viraj Prabhu et.al.	2410.13121	null
2024-10-17	Learning to Summarize from LLM-generated Feedback	Hwanjun Song et.al.	2410.13116	null
2024-10-16	Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models	Jie Ren et.al.	2410.13088	null
2024-10-16	Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models	Linhao Luo et.al.	2410.13080	link
2024-10-16	PromptExp: Multi-granularity Prompt Explanation of Large Language Models	Ximing Dong et.al.	2410.13073	null
2024-10-16	LLM Confidence Evaluation Measures in Zero-Shot CSS Classification	David Farr et.al.	2410.13047	null
2024-10-16	When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems	Asir Saadat et.al.	2410.13029	null
2024-10-16	LLM Chain Ensembles for Scalable and Accurate Data Annotation	David Farr et.al.	2410.13006	link
2024-10-16	REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models	Ambuje Gupta et.al.	2410.12890	null
2024-10-16	On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs	Herun Wan et.al.	2410.12600	null
2024-10-16	A Claim Decomposition Benchmark for Long-form Answer Verification	Zhihao Zhang et.al.	2410.12558	link
2024-10-17	MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration	Jinjie Wei et.al.	2410.12532	null
2024-10-16	RosePO: Aligning LLM-based Recommenders with Human Values	Jiayi Liao et.al.	2410.12519	null
2024-10-16	KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs	Yongqin Xu et.al.	2410.12480	null
2024-10-18	MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models	Boyang Xue et.al.	2410.12478	link
2024-10-16	ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs	Jingming Zhuo et.al.	2410.12405	link
2024-10-17	Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs	Lei Sun et.al.	2410.12298	null
2024-10-16	Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors	Linwei Tao et.al.	2410.12295	null
2024-10-17	LLM-based Cognitive Models of Students with Misconceptions	Shashank Sonkar et.al.	2410.12294	null
2024-10-16	An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation	Junjie Chen et.al.	2410.12265	null
2024-10-16	CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity	Jintao Liu et.al.	2410.12248	link
2024-10-16	On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation	Xiaonan Jing et.al.	2410.12222	null
2024-10-16	Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning	Huiwen Wu et.al.	2410.12130	null
2024-10-15	Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction	Kaiqiao Han et.al.	2410.12040	link
2024-10-15	Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents	Bolun Sun et.al.	2410.11906	null
2024-10-15	Zero-shot Model-based Reinforcement Learning using Large Language Models	Abdelhakim Benechehab et.al.	2410.11711	link
2024-10-15	Black-box Uncertainty Quantification Method for LLM-as-a-Judge	Nico Wagner et.al.	2410.11594	null
2024-10-15	AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data	Xinjie Zhao et.al.	2410.11531	null
2024-10-15	ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability	Zhongxiang Sun et.al.	2410.11414	null
2024-10-15	LargePiG: Your Large Language Model is Secretly a Pointer Generator	Zhongxiang Sun et.al.	2410.11366	null
2024-10-15	Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs	Shuo Li et.al.	2410.11302	null
2024-10-15	On the Capacity of Citation Generation by Large Language Models	Haosheng Qian et.al.	2410.11217	null
2024-10-14	LLM Unlearning via Loss Adjustment with Only Forget Data	Yaxuan Wang et.al.	2410.11143	null
2024-10-14	Can Structured Data Reduce Epistemic Uncertainty?	Shriram M S et.al.	2410.11141	null
2024-10-14	Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only	Jihan Yao et.al.	2410.11055	link
2024-10-13	3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation	Hongxin Ding et.al.	2410.10901	null
2024-10-14	Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance	Sachin Goyal et.al.	2410.10796	link
2024-10-16	SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators	Rasoul Shafipour et.al.	2410.10714	null
2024-10-14	On Calibration of LLM-based Guard Models for Reliable Content Moderation	Hongfu Liu et.al.	2410.10414	link
2024-10-14	Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion	Xinping Zhao et.al.	2410.10408	null
2024-10-14	Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search	Chenglin Li et.al.	2410.10392	null
2024-10-14	Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning	Yongxin Xu et.al.	2410.10360	null
2024-10-14	SkillAggregation: Reference-free LLM-Dependent Aggregation	Guangzhi Sun et.al.	2410.10215	null
2024-10-13	A Multi-LLM Orchestration Engine for Personalized, Context-Rich Assistance	Sumedh Rasal et.al.	2410.10039	null
2024-10-13	Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code	Nan Jiang et.al.	2410.09997	null
2024-10-15	LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Han Qiu et.al.	2410.09962	link
2024-10-13	Can Large Language Models Generate Geospatial Code?	Shuyang Hou et.al.	2410.09738	null
2024-10-13	Taming Overconfidence in LLMs: Reward Calibration in RLHF	Jixuan Leng et.al.	2410.09724	link
2024-10-13	Honest AI: Fine-Tuning “Small” Language Models to Say “I Don’t Know”, and Reducing Hallucination in RAG	Xinxi Chen et.al.	2410.09699	null
2024-10-13	Integrating Reinforcement Learning and Large Language Models for Crop Production Process Management Optimization and Control through A New Knowledge-Based Deep Learning Paradigm	Dong Chen et.al.	2410.09680	null
2024-10-12	FlatQuant: Flatness Matters for LLM Quantization	Yuxuan Sun et.al.	2410.09426	link
2024-10-12	LLM $\times$ MapReduce: Simplified Long-Sequence Processing using Large Language Models	Zihan Zhou et.al.	2410.09342	link
2024-10-15	Nudging: Inference-time Alignment via Model Collaboration	Yu Fei et.al.	2410.09300	null
2024-10-11	Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective	Bo Ni et.al.	2410.08985	null
2024-10-11	NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models	Zheng Yi Ho et.al.	2410.08970	null
2024-10-11	Decoding Secret Memorization in Code LLMs Through Token-Level Characterization	Yuqing Nie et.al.	2410.08858	null
2024-10-11	Measuring the Inconsistency of Large Language Models in Preferential Ranking	Xiutian Zhao et.al.	2410.08851	null
2024-10-11	Unveiling Molecular Secrets: An LLM-Augmented Linear Model for Explainable and Calibratable Molecular Property Prediction	Zhuoran Li et.al.	2410.08829	link
2024-10-11	Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation	Ruobing Wang et.al.	2410.08821	link
2024-10-11	VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding	Houlun Chen et.al.	2410.08593	link
2024-10-11	Humanity in AI: Detecting the Personality of Large Language Models	Baohua Zhan et.al.	2410.08545	null
2024-10-11	Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both	Abhijnan Nath et.al.	2410.08458	null
2024-10-11	Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness	Yu He Ke et.al.	2410.08431	null
2024-10-10	Large Airfoil Models	Howon Lee et.al.	2410.08392	null
2024-10-10	Think Beyond Size: Dynamic Prompting for More Effective Reasoning	Kamesh R et.al.	2410.08130	null
2024-10-10	A Closer Look at Machine Unlearning for Large Language Models	Xiaojian Yuan et.al.	2410.08109	link
2024-10-10	Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering	Yuan Sui et.al.	2410.08085	null
2024-10-10	Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses	Pranav Senthilkumar et.al.	2410.07826	null
2024-10-10	Mitigating Gender Bias in Code Large Language Models via Model Editing	Zhanyue Qin et.al.	2410.07820	null
2024-10-10	Automatic Curriculum Expert Iteration for Reliable LLM Reasoning	Zirui Zhao et.al.	2410.07627	link
2024-10-10	No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users	Mengxuan Hu et.al.	2410.07589	null
2024-10-10	OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting	Xukai Liu et.al.	2410.07549	link
2024-10-10	MKGL: Mastery of a Three-Word Language	Lingbing Guo et.al.	2410.07526	null
2024-10-09	Localizing Factual Inconsistencies in Attributable Text Generation	Arie Cattan et.al.	2410.07473	link
2024-10-09	Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning	Abhinav Bandari et.al.	2410.07461	link
2024-10-09	Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Manling Li et.al.	2410.07166	link
2024-10-09	Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization	Chengtao Jian et.al.	2410.07018	null
2024-10-09	Self-Boosting Large Language Models with Synthetic Preference Data	Qingxiu Dong et.al.	2410.06961	null
2024-10-09	AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation	Huanxi Liu et.al.	2410.06943	null
2024-10-09	Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning	Runchuan Zhu et.al.	2410.06913	link
2024-10-09	Calibrating Verbalized Probabilities for Large Language Models	Cheng Wang et.al.	2410.06707	null
2024-10-09	Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack	Leo McKee-Reid et.al.	2410.06491	null
2024-10-09	Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders	David Noever et.al.	2410.06462	null
2024-10-09	Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs	Ruijia Niu et.al.	2410.06431	null
2024-10-08	Validation of the Scientific Literature via Chemputation Augmented by Large Language Models	Sebastian Pagel et.al.	2410.06384	null
2024-10-08	Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Ruosen Li et.al.	2410.06304	null
2024-10-08	EVOLvE: Evaluating and Optimizing LLMs For Exploration	Allen Nie et.al.	2410.06238	null
2024-10-08	ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution	Corban Rivera et.al.	2410.06108	null
2024-10-10	LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs	Vincent Emonet et.al.	2410.06062	link
2024-10-08	Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models	Bozhou Li et.al.	2410.05802	null
2024-10-08	Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition	Zheyang Xiong et.al.	2410.05603	null
2024-10-07	Self-rationalization improves LLM as a fine-grained judge	Prapti Trivedi et.al.	2410.05495	null
2024-10-07	ESPACE: Dimensionality Reduction of Activations for Model Compression	Charbel Sakr et.al.	2410.05437	null
2024-10-05	PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms	Yilong Li et.al.	2410.05315	null
2024-10-07	SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe	Yuxin Xiao et.al.	2410.05248	null
2024-10-07	Precise Model Benchmarking with Only a Few Observations	Riccardo Fogliato et.al.	2410.05222	null
2024-10-07	Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality	Guanyu Zhou et.al.	2410.04780	link
2024-10-07	Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering	Zimu Wang et.al.	2410.04752	null
2024-10-06	Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval	Pengcheng Jiang et.al.	2410.04585	link
2024-10-06	DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination	Xuan Gong et.al.	2410.04514	null
2024-10-05	DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech	Dominika Woszczyk et.al.	2410.04188	null
2024-10-04	dZiner: Rational Inverse Design of Materials with AI Agents	Mehrad Ansari et.al.	2410.03963	link
2024-10-03	Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge	Aparna Elangovan et.al.	2410.03775	link
2024-10-04	Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores	Robert E. Blackwell et.al.	2410.03492	null
2024-10-04	Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation	Tobias Leemann et.al.	2410.03461	null
2024-10-08	Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs	Louis Serrano et.al.	2410.03437	null
2024-10-04	Towards a Benchmark for Large Language Models for Business Process Management Tasks	Kiran Busch et.al.	2410.03255	link
2024-10-04	Showing LLM-Generated Code Selectively Based on Confidence of LLMs	Jia Li et.al.	2410.03234	null
2024-10-04	ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering	Huayang Li et.al.	2410.03227	null
2024-10-04	Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback	Kyuyoung Kim et.al.	2410.03145	link
2024-10-04	SAG: Style-Aligned Article Generation via Model Collaboration	Chenning Xu et.al.	2410.03137	null
2024-10-10	ARB-LLM: Alternating Refined Binarizations for Large Language Models	Zhiteng Li et.al.	2410.03129	link
2024-10-04	UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference	Jing Xiong et.al.	2410.03090	null
2024-10-04	Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues	Shilin Qu et.al.	2410.03049	null
2024-10-03	Characterizing Context Influence and Hallucination in Summarization	James Flemings et.al.	2410.03026	link
2024-10-03	Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review	Sungduk Yu et.al.	2410.03019	null
2024-09-30	Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG	Chenhao Fang et.al.	2410.02825	null
2024-10-09	CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation	Han He et.al.	2410.02748	link
2024-10-03	Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization	Lei Xu et.al.	2410.02741	link
2024-10-03	Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization	Ryan C. Barron et.al.	2410.02721	null
2024-10-07	LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations	Hadas Orgad et.al.	2410.02707	link
2024-10-03	Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers	Shijie Chen et.al.	2410.02642	null
2024-10-03	Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration	Yun Qu et.al.	2410.02511	link
2024-10-03	AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models	Junfeng Fang et.al.	2410.02355	link
2024-10-04	How Much Can RAG Help the Reasoning of LLM?	Jingyu Liu et.al.	2410.02338	null
2024-10-03	Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference	Wei Cheng et.al.	2410.02210	null
2024-10-03	Efficiently Deploying LLMs with Controlled Risk	Michael J. Zellinger et.al.	2410.02173	null
2024-10-03	Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments	Amogh Mannekote et.al.	2410.02110	link
2024-10-02	DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection	Daiki Chiba et.al.	2410.02095	null
2024-10-02	DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning	Yebowen Hu et.al.	2410.01772	null
2024-10-02	CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs	Kangsheng Wang et.al.	2410.01696	null
2024-10-02	FactAlign: Long-form Factuality Alignment of Large Language Models	Chao-Wei Huang et.al.	2410.01691	link
2024-10-02	Intent Detection in the Age of LLMs	Gaurav Arora et.al.	2410.01627	null
2024-10-02	Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration	Kangxi Wu et.al.	2410.01285	null
2024-10-02	BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation	Bryan Li et.al.	2410.01171	link
2024-10-01	Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability	Weitong Zhang et.al.	2410.01064	null
2024-10-01	Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown	Xingzhou Lou et.al.	2410.00847	null
2024-10-01	Dynamic Planning for LLM-based Graphical User Interface Automation	Shaoqing Zhang et.al.	2410.00467	link
2024-10-01	UniAdapt: A Universal Adapter for Knowledge Calibration	Tai D. Nguyen et.al.	2410.00454	null
2024-10-01	Are LLMs Aware that Some Questions are not Open-ended?	Dongjie Yang et.al.	2410.00423	null
2024-10-01	Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation	Bhargav Shandilya et.al.	2410.00387	null
2024-09-30	A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification	Marina Ribeiro et.al.	2410.00250	null
2024-09-30	LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation	Ziyao Zhang et.al.	2409.20550	link
2024-09-30	Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models	Arpan Mukherjee et.al.	2409.20512	null
2024-10-04	VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	Ruotong Liao et.al.	2409.20365	link
2024-09-30	MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants	Zeyu Zhang et.al.	2409.20163	link
2024-09-30	Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation	Huangyu Dai et.al.	2409.19877	null
2024-09-29	Calibrating Language Models with Adaptive Temperature Scaling	Johnathan Xie et.al.	2409.19817	link
2024-09-29	MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models	Vibhor Agarwal et.al.	2409.19492	null
2024-09-28	Overriding Safety protections of Open-source Models	Sachin Kumar et.al.	2409.19476	link
2024-09-28	SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models	Yi Wu et.al.	2409.19471	link
2024-09-28	Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks	Chenxi Wang et.al.	2409.19338	null
2024-09-28	DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning	Kazuki Matsuda et.al.	2409.19255	null
2024-09-27	Secure Multiparty Generative AI	Manil Shrestha et.al.	2409.19120	null
2024-09-27	A Survey on the Honesty of Large Language Models	Siheng Li et.al.	2409.18786	link
2024-10-02	Model-based Preference Optimization in Abstractive Summarization without Human Feedback	Jaepill Choi et.al.	2409.18618	link
2024-09-26	Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model	Chuang Niu et.al.	2409.18319	link
2024-09-26	Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT	Natthanaphop Isaradech et.al.	2409.17683	null
2024-09-26	A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models	Syed Affan Daimi et.al.	2409.17581	link
2024-09-26	HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection	Xuefeng Du et.al.	2409.17504	null
2024-09-25	Post-hoc Reward Calibration: A Case Study on Length Bias	Zeyu Huang et.al.	2409.17407	link
2024-09-25	Search for Efficient Large Language Models	Xuan Shen et.al.	2409.17372	link
2024-09-20	A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models	Satoshi Munakata et.al.	2409.17173	null
2024-09-25	Mitigating the Bias of Large Language Model Evaluation	Hongli Zhou et.al.	2409.16788	link
2024-09-25	RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems	Yihong Tang et.al.	2409.16727	null
2024-09-25	EventHallusion: Diagnosing Event Hallucinations in Video LLMs	Jiacheng Zhang et.al.	2409.16597	link
2024-09-25	Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels	Yishu Wei et.al.	2409.16563	null
2024-09-24	MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment	Venkata Naren Devarakonda et.al.	2409.16455	null
2024-09-24	Automated test generation to evaluate tool-augmented LLMs as conversational AI agents	Samuel Arcadinho et.al.	2409.15934	null
2024-09-24	Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts	Sukai Huang et.al.	2409.15915	null
2024-09-24	Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection	Xingyu Ma et.al.	2409.15907	null
2024-09-24	XTRUST: On the Multilingual Trustworthiness of Large Language Models	Yahan Li et.al.	2409.15762	link
2024-09-23	Parse Trees Guided LLM Prompt Compression	Wenhao Mao et.al.	2409.15395	link
2024-09-18	VERA: Validation and Enhancement for Retrieval Augmented systems	Nitin Aravind Birur et.al.	2409.15364	null
2024-09-18	Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning	Essa Jan et.al.	2409.15361	null
2024-09-27	Reward-Robust RLHF in LLMs	Yuzi Yan et.al.	2409.15360	null
2024-09-23	A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?	Yunfei Xie et.al.	2409.15277	null
2024-09-26	A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models	Yixi Wu et.al.	2409.15228	null
2024-09-23	Boosting Healthcare LLMs Through Retrieved Context	Jordi Bayarri-Planas et.al.	2409.15127	link
2024-09-23	Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications	Sean Kim et.al.	2409.15076	null
2024-09-23	InterMind: A Doctor-Patient-Family Interactive Depression Assessment System Empowered by Large Language Models	Zhiyuan Zhou et.al.	2409.14878	null
2024-09-23	Past Meets Present: Creating Historical Analogy with Large Language Models	Nianqi Li et.al.	2409.14820	link
2024-09-28	Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method	Weichao Zhang et.al.	2409.14781	link
2024-09-23	zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning	Zixiang Xian et.al.	2409.14644	null
2024-09-22	Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization	Minyi Zhao et.al.	2409.14484	null
2024-09-22	Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses	Hung-Ting Su et.al.	2409.14324	link
2024-09-21	OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching	Zhangcheng Qiang et.al.	2409.14038	null
2024-09-20	Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology	Aidan Gilson et.al.	2409.13902	null
2024-09-20	FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs	Bowen Yan et.al.	2409.13612	null
2024-09-20	ChainBuddy: An AI Agent System for Generating LLM Pipelines	Jingyue Zhang et.al.	2409.13588	null
2024-09-23	AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit	Mohanna Hoveyda et.al.	2409.13447	link
2024-09-20	Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey	Sourav Verma et.al.	2409.13385	link
2024-09-20	Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems	Andrea Colombo et.al.	2409.13252	null
2024-09-19	Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models	Peiyi Zhang et.al.	2409.12739	null
2024-09-19	LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks	Malsha Ashani Mahawatta Dona et.al.	2409.12580	null
2024-09-19	Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation	Chen Liang et.al.	2409.12411	null
2024-09-19	On the Effectiveness of LLMs for Manual Test Verifications	Myron David Lucena Campos Peixoto et.al.	2409.12405	null
2024-09-18	RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models	Abhinav Jain et.al.	2409.12294	null
2024-09-18	Finetuning Language Models to Emit Linguistic Expressions of Uncertainty	Arslan Chaudhry et.al.	2409.12180	null
2024-09-05	LitFM: A Retrieval Augmented Structure-aware Foundation Model For Citation Graphs	Jiasheng Zhang et.al.	2409.12177	null
2024-09-18	Combating Phone Scams with LLM-based Detection: Where Do We Stand?	Zitong Shen et.al.	2409.11643	null
2024-09-17	HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection	Theo King et.al.	2409.11579	link
2024-09-17	What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts	Shuaiyu Chen et.al.	2409.11540	null
2024-09-17	CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration	Jiahui Gao et.al.	2409.11365	null
2024-09-17	THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models	Mengfei Liang et.al.	2409.11353	link
2024-09-25	Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling	Xinyue Fang et.al.	2409.11283	null
2024-09-17	Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models	Bishwash Khanal et.al.	2409.11233	null
2024-09-17	Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization	Jianing Wang et.al.	2409.11212	link
2024-09-17	A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B	Jemin Lee et.al.	2409.11055	link
2024-09-16	Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering	Qingru Zhang et.al.	2409.10790	null
2024-09-16	“The Data Says Otherwise”-Towards Automated Fact-checking and Communication of Data Claims	Yu Fu et.al.	2409.10713	null
2024-09-17	Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot	Bhuvan Sachdeva et.al.	2409.10354	null
2024-09-16	Trustworthiness in Retrieval-Augmented Generation Systems: A Survey	Yujia Zhou et.al.	2409.10102	link
2024-09-16	Benchmarking Large Language Model Uncertainty for Prompt Optimization	Pei-Fu Guo et.al.	2409.10044	link
2024-09-18	HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making	Sumera Anjum et.al.	2409.10011	link
2024-09-23	Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations	Abe Bohan Hou et.al.	2409.09947	link
2024-09-16	Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges	Vinay Samuel et.al.	2409.09927	link
2024-09-16	SFR-RAG: Towards Contextually Faithful LLMs	Xuan-Phi Nguyen et.al.	2409.09916	null
2024-09-15	ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing	Suhyeon Yoo et.al.	2409.09760	null
2024-09-15	ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts	Che Wang et.al.	2409.09661	link
2024-09-21	Confidence Estimation for LLM-Based Dialogue State Tracking	Yi-Jyun Sun et.al.	2409.09629	link
2024-09-14	VernaCopter: Disambiguated Natural-Language-Driven Robot via Formal Specifications	Teun van de Laar et.al.	2409.09536	link
2024-09-14	Hacking, The Lazy Way: LLM Augmented Pentesting	Dhruva Goyal et.al.	2409.09493	null
2024-09-19	The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection	Yi Yang et.al.	2409.09380	null
2024-09-13	Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions	Zahra Ashktorab et.al.	2409.08937	null
2024-09-23	When Context Leads but Parametric Memory Follows in Large Language Models	Yufei Tao et.al.	2409.08435	link
2024-09-12	Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT	Irene Weber et.al.	2409.07732	link
2024-09-11	MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications	Praveen K Kanithi et.al.	2409.07314	null
2024-09-11	Reranking Laws for Language Generation: A Communication-Theoretic Perspective	António Farinhas et.al.	2409.07131	null
2024-09-11	Understanding Knowledge Drift in LLMs through Misinformation	Alina Fastowski et.al.	2409.07085	link
2024-09-11	Representation Tuning	Christopher M. Ackerman et.al.	2409.06927	link
2024-09-10	Semi-Supervised Reward Modeling via Iterative Self-Training	Yifei He et.al.	2409.06903	link
2024-09-10	Geometric-Averaged Preference Optimization for Soft Preference Labels	Hiroki Furuta et.al.	2409.06691	null
2024-09-10	Alleviating Hallucinations in Large Language Models with Scepticism Modeling	Yetao Wu et.al.	2409.06601	null
2024-09-10	GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering	Sacha Muller et.al.	2409.06595	link
2024-09-10	Automate Strategy Finding with LLM in Quant investment	Zhizhuo Kou et.al.	2409.06289	null
2024-09-14	ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog	Yujian Gan et.al.	2409.06097	link
2024-09-09	$\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding	Shuai Wang et.al.	2409.05923	null
2024-09-09	Benchmarking Chinese Knowledge Rectification in Large Language Models	Tianhe Lu et.al.	2409.05806	link
2024-09-09	LLMs Will Always Hallucinate, and We Need to Live With This	Sourav Banerjee et.al.	2409.05746	null
2024-09-07	LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs	Yongxin Deng et.al.	2409.04744	null
2024-09-03	Here’s Charlie! Realising the Semantic Web vision of Agents in the age of LLMs	Jesse Wright et.al.	2409.04465	null
2024-09-06	Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering	Larissa Pusch et.al.	2409.04181	null
2024-09-13	Safeguarding AI Agents: Developing and Analyzing Safety Architectures	Ishaan Domkundwar et.al.	2409.03793	null
2024-09-06	RAG based Question-Answering for Contextual Response Prediction System	Sriram Veturi et.al.	2409.03708	null
2024-09-05	Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration	Jeremy Qin et.al.	2409.03225	link
2024-09-05	Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models	Jie Ma et.al.	2409.03155	link
2024-09-04	CLUE: Concept-Level Uncertainty Estimation for Large Language Models	Yu-Hsiang Wang et.al.	2409.03021	null
2024-09-04	Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models	Gabriel Y. Arteaga et.al.	2409.02976	link
2024-09-10	LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA	Jiajie Zhang et.al.	2409.02897	link
2024-09-04	Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs	Ruoyu Wang et.al.	2409.02686	null
2024-09-03	Initial Development and Evaluation of the Creative Artificial Intelligence through Recurring Developments and Determinations (CAIRDD) System	Jeremy Straub et.al.	2409.02291	null
2024-09-03	Physical Rule-Guided Convolutional Neural Network	Kishor Datta Gupta et.al.	2409.02081	null
2024-09-03	RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer	Jiangyi Deng et.al.	2409.02074	null
2024-08-25	Path-Consistency: Prefix Enhancement for Efficient Inference in LLM	Jiace Zhu et.al.	2409.01281	null
2024-09-02	Statically Contextualizing Large Language Models with Typed Holes	Andrew Blinn et.al.	2409.00921	null
2024-09-01	Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering	Derian Boer et.al.	2409.00861	link
2024-09-04	Learning to Ask: When LLMs Meet Unclear Instruction	Wenxuan Wang et.al.	2409.00557	null
2024-08-31	Does Alignment Tuning Really Break LLMs’ Internal Confidence?	Hongseok Oh et.al.	2409.00352	link
2024-09-08	ProGRes: Prompted Generative Rescoring on ASR n-Best	Ada Defne Tur et.al.	2409.00217	link
2024-08-30	LLMs hallucinate graphs too: a structural perspective	Erwan Le Merrer et.al.	2409.00159	null
2024-08-29	HoneyComb: A Flexible LLM-Based Agent System for Materials Science	Huan Zhang et.al.	2409.00135	null
2024-09-04	Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs	Ziyan Cui et.al.	2409.00128	null
2024-09-08	Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning	Momin Abbas et.al.	2409.00124	null
2024-09-04	Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation	Mohammad Nadeem et.al.	2409.00105	null
2024-08-26	Evaluating ChatGPT on Nuclear Domain-Specific Data	Muhammad Anwar et.al.	2409.00090	null
2024-08-26	Watermarking Techniques for Large Language Models: A Survey	Yuqing Liang et.al.	2409.00089	null
2024-08-30	Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain	Francesca Grasso et.al.	2408.17362	link
2024-08-30	Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling	Guangya Wan et.al.	2408.17017	null
2024-09-05	UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches	Chao Wang et.al.	2408.16966	null
2024-09-04	Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies	Zhiyang Qi et.al.	2408.16586	null
2024-08-29	LoraMap: Harnessing the Power of LoRA Connections	Hyeryun Park et.al.	2408.16264	null
2024-08-28	Logic-Enhanced Language Model Agents for Trustworthy Social Simulations	Agnieszka Mensfelt et.al.	2408.16081	link
2024-08-28	WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration	Yao Zhang et.al.	2408.15978	null
2024-09-07	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Yuncheng Yang et.al.	2408.15915	link
2024-08-28	Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization	Léo Hemamou et.al.	2408.15801	null
2024-08-28	An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation	Thai Tang Quoc et.al.	2408.15658	null
2024-08-28	Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation	Lujun Gui et.al.	2408.15562	null
2024-08-29	LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation	Haichuan Hu et.al.	2408.15533	link
2024-08-28	Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression	Haowen Hou et.al.	2408.15491	link
2024-08-27	The Uniqueness of LLaMA3-70B with Per-Channel Quantization: An Empirical Study	Minghai Qin et.al.	2408.15301	null
2024-08-27	Can Unconfident LLM Annotations Be Used for Confident Conclusions?	Kristina Gligorić et.al.	2408.15204	link
2024-08-27	Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation	N. E. Kriman et.al.	2408.15171	null
2024-08-27	Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering	Haowei Du et.al.	2408.15037	null
2024-08-28	Language-specific Calibration for Pruning Multilingual Language Models	Simon Kurz et.al.	2408.14398	null
2024-08-26	Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders	Cong Xu et.al.	2408.14238	link
2024-08-25	CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction	Guangya Wan et.al.	2408.13940	null
2024-08-25	Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models	Duy Khoa Pham et.al.	2408.13808	null
2024-08-25	Poor-Supervised Evaluation for SuperLLM via Mutual Consistency	Peiwen Yuan et.al.	2408.13738	null
2024-08-25	LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models	Aoxiao Zhong et.al.	2408.13727	null
2024-08-24	Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models	Jinyang Wu et.al.	2408.13533	null
2024-08-27	Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning	Hourui Deng et.al.	2408.13184	null
2024-08-23	IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models	Zhihao Yu et.al.	2408.13073	link
2024-08-23	Internal and External Knowledge Interactive Refinement Framework for Knowledge-Intensive Question Answering	Haowei Du et.al.	2408.12979	null
2024-08-22	SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection	Mengya Hu et.al.	2408.12748	link
2024-08-22	Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning	Mushui Liu et.al.	2408.12469	null
2024-08-22	A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation	Weijia Zhang et.al.	2408.12398	null
2024-09-04	Graph Retrieval Augmented Trustworthiness Reasoning	Ying Zhu et.al.	2408.12333	link
2024-08-22	Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models	Meiyun Wang et.al.	2408.12326	link
2024-08-22	Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators	Dingkang Yang et.al.	2408.12325	link
2024-08-22	MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient	Yanzeng Li et.al.	2408.12236	null
2024-08-22	FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation	KaShun Shum et.al.	2408.12168	link
2024-08-22	ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM	Zhaochen Su et.al.	2408.12076	link
2024-08-21	Understanding Epistemic Language with a Bayesian Theory of Mind	Lance Ying et.al.	2408.12022	null
2024-08-21	RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization	Jinhu Qi et.al.	2408.12003	null
2024-08-21	Automatic knowledge-graph creation from historical documents: The Chilean dictatorship as a case study	Camila Díaz et.al.	2408.11975	null
2024-08-23	Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy	Priyanka Mandikal et.al.	2408.11903	link
2024-08-17	How Susceptible are LLMs to Influence in Prompts?	Sotiris Anagnostidis et.al.	2408.11865	null
2024-08-21	DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework	Zhifei Xie et.al.	2408.11788	null
2024-08-21	EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning	Zhihao Li et.al.	2408.11397	null
2024-08-21	First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models	Chi Ma et.al.	2408.11393	null
2024-08-21	RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation	Xuanwang Zhang et.al.	2408.11381	link
2024-08-20	A Little Confidence Goes a Long Way	John Scoville et.al.	2408.11239	null
2024-08-20	Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model	Chenhan Yuan et.al.	2408.10764	null
2024-08-20	Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models	Artem Vazhentsev et.al.	2408.10692	null
2024-08-20	Analysis of Plan-based Retrieval for Grounded Text Generation	Ameya Godbole et.al.	2408.10490	null
2024-08-20	LeCov: Multi-level Testing Criteria for Large Language Models	Xuan Xie et.al.	2408.10474	null
2024-08-19	Enhanced document retrieval with topic embeddings	Kavsar Huseynova et.al.	2408.10435	null
2024-08-19	LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain	Nicholas Pipitone et.al.	2408.10343	link
2024-08-19	Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models	Tianyu Zhang et.al.	2408.10124	link
2024-08-19	MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation	Ching-Wen Yang et.al.	2408.09865	null
2024-08-19	Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence?	Shiyu Ni et.al.	2408.09773	null
2024-08-19	A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification	Claudio M. V. de Andrade et.al.	2408.09629	null
2024-08-17	TC-RAG: Turing-Complete RAG’s Case study on Medical LLM Systems	Xinke Jiang et.al.	2408.09199	link
2024-08-17	Chinese Metaphor Recognition Using a Multi-stage Prompting Large Language Model	Jie Wang et.al.	2408.09177	null
2024-08-17	Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making	Siyu Wu et.al.	2408.09176	null
2024-08-24	Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection	Hsiu-Yuan Huang et.al.	2408.09172	null
2024-08-15	Graph Retrieval-Augmented Generation: A Survey	Boci Peng et.al.	2408.08921	link
2024-08-12	Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection	Chengyu Song et.al.	2408.08902	null
2024-08-22	Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions	Chenming Tang et.al.	2408.08780	null
2024-08-16	Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused	Dingwei Chen et.al.	2408.08769	null
2024-08-16	MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector	Wenjie Fu et.al.	2408.08661	link
2024-08-16	PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code	Ziyou Jiang et.al.	2408.08619	null
2024-08-16	SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models	Kaushal Kumar Maurya et.al.	2408.08545	null
2024-08-15	Plan with Code: Comparing approaches for robust NL to DSL generation	Nastaran Bassamzadeh et.al.	2408.08335	null
2024-08-14	CodeMirage: Hallucinations in Code Generated by Large Language Models	Vibhor Agarwal et.al.	2408.08333	null
2024-08-16	Covert Bias: The Severity of Social Views’ Unalignment in Language Models Towards Implicit and Explicit Opinion	Abeer Aldayel et.al.	2408.08212	null
2024-08-15	LLM4DSR: Leveraging Large Language Model for Denoising Sequential Recommendation	Bohao Wang et.al.	2408.08208	null
2024-08-15	Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy	Shaojun Xu et.al.	2408.08188	null
2024-08-15	Confidence-weighted integration of human and machine judgments for superior decision-making	Felipe Yáñez et.al.	2408.08083	link
2024-08-15	LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning	Jiajie Li et.al.	2408.07981	null
2024-08-14	Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization	Yuxin Jiang et.al.	2408.07471	link
2024-08-13	MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty	Yongjin Yang et.al.	2408.06816	link
2024-08-12	A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution	Sampath Rajapaksha et.al.	2408.06272	null
2024-08-12	On Effects of Steering Latent Representation for Large Language Model Unlearning	Dang Huu-Tien et.al.	2408.06223	link
2024-08-11	Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models	Wenbo Zhang et.al.	2408.05873	link
2024-08-10	Can LLMs Replace Manual Annotation of Software Engineering Artifacts?	Toufique Ahmed et.al.	2408.05534	null
2024-08-19	SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning	Yuze Zhao et.al.	2408.05517	link
2024-08-09	FiST-Financial Style Transfer with Hallucination and Creativity Control Framework	Sohini Roychowdhury et.al.	2408.05365	null
2024-08-09	A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning	Ye Yuan et.al.	2408.05141	null
2024-08-16	Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models	Zikai Xie et.al.	2408.05093	link
2024-08-08	Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews	Samantha Chan et.al.	2408.04681	link
2024-08-06	Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)	Avshalom Manevich et.al.	2408.04664	null
2024-08-08	Arctic-TILT. Business Document Understanding at Sub-Billion Scale	Łukasz Borchmann et.al.	2408.04632	null
2024-08-08	Learning Fine-Grained Grounded Citations for Attributed Large Language Models	Lei Huang et.al.	2408.04568	link
2024-08-20	Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate	Yiqun Zhang et.al.	2408.04472	link
2024-08-07	Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology	Panagiotis Fytas et.al.	2408.04121	null
2024-08-07	Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks	Zizhang Chen et.al.	2408.03732	null
2024-08-19	KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models	Ruizhe Zhang et.al.	2408.03297	null
2024-08-05	An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs	Dongming Jin et.al.	2408.02450	null
2024-08-05	SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models	Shujuan Zhao et.al.	2408.02302	null
2024-08-07	SpecRover: Code Intent Extraction via LLMs	Haifeng Ruan et.al.	2408.02232	null
2024-08-05	ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning	Yuxuan Wang et.al.	2408.02210	null
2024-08-04	Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process	Peng Wang et.al.	2408.02103	null
2024-08-04	Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference	Ke Shen et.al.	2408.01935	null
2024-08-03	TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation	Xingpeng Sun et.al.	2408.01867	null
2024-08-03	WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization	Liwenhan Xie et.al.	2408.01703	null
2024-08-02	Analyzing LLMs’ Capabilities to Establish Implicit User Sentiment of Software Desirability	Sherri Weitl-Harms et.al.	2408.01527	null
2024-07-28	Faculty Perspectives on the Potential of RAG in Computer Science Higher Education	Sagnik Dakshit et.al.	2408.01462	null
2024-08-18	RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework	Kunlun Zhu et.al.	2408.01262	link
2024-08-02	Misinforming LLMs: vulnerabilities, challenges and opportunities	Bo Zhou et.al.	2408.01168	null
2024-08-01	Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection	Steven Fincke et.al.	2408.00914	null
2024-07-26	ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model	Ning Xu et.al.	2408.00804	null
2024-08-01	Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions	Guangzhi Xiong et.al.	2408.00727	link
2024-08-01	Future of Artificial Intelligence in Agile Software Development	Mariyam Mahboob et.al.	2408.00703	null
2024-07-25	Closing the gap between open-source and commercial large language models for medical evidence summarization	Gongbo Zhang et.al.	2408.00588	null
2024-08-01	Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation	Xiaoye Qu et.al.	2408.00555	null
2024-08-01	Jailbreaking Text-to-Image Models with LLM-Based Agents	Yingkai Dong et.al.	2408.00523	null
2024-08-01	DeliLaw: A Chinese Legal Counselling System Based on a Large Language Model	Nan Xie et.al.	2408.00357	null
2024-07-31	Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation	Valdemar Danry et.al.	2408.00024	null
2024-07-30	WebApp1K: A Practical Code-Generation Benchmark for Web App Development	Yi Cui et.al.	2408.00019	link
2024-07-31	Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs	Shi Liu et.al.	2407.21771	null
2024-07-31	Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency	Taiji Li et.al.	2407.21443	null
2024-08-09	Cost-Effective Hallucination Detection for LLMs	Simon Valentin et.al.	2407.21424	null
2024-07-31	Towards interfacing large language models with ASR systems using confidence measures and prompting	Maryam Naderi et.al.	2407.21414	null
2024-07-31	Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs	Elan Markowitz et.al.	2407.21358	link
2024-07-30	Accelerating Large Language Model Inference with Self-Supervised Early Exits	Florian Valade et.al.	2407.21082	null
2024-07-25	Multi-group Uncertainty Quantification for Long-form Text Generation	Terrance Liu et.al.	2407.21057	null
2024-07-24	Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications	Cui Long et.al.	2407.21055	null
2024-07-30	Automated Review Generation Method Based on Large Language Models	Shican Wu et.al.	2407.20906	link
2024-07-30	How to Measure the Intelligence of Large Language Models?	Nils Körber et.al.	2407.20828	null
2024-07-30	Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian	Serena Auriemma et.al.	2407.20654	null
2024-07-25	An Efficient Inference Framework for Early-exit Large Language Models	Ruijie Miao et.al.	2407.20272	null
2024-07-17	Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies	Lachlan McGinness et.al.	2407.20244	null
2024-08-02	Improving Retrieval Augmented Language Model with Self-Reasoning	Yuan Xia et.al.	2407.19813	null
2024-07-29	SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages	Wenxuan Zhang et.al.	2407.19672	link
2024-07-27	Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review	Tongyue Shi et.al.	2407.19256	null
2024-07-26	OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation	Zilong Wang et.al.	2407.19056	link
2024-08-08	Know Your Limits: A Survey of Abstention in Large Language Models	Bingbing Wen et.al.	2407.18418	null
2024-07-25	Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement	Jaehun Jung et.al.	2407.18370	null
2024-07-25	The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation	Eric Yang et.al.	2407.18044	null
2024-07-24	WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries	Wenting Zhao et.al.	2407.17468	null
2024-07-24	ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering	Xiuying Chen et.al.	2407.16931	null
2024-07-23	Generation Constraint Scaling Can Mitigate Hallucination	Georgios Kollias et.al.	2407.16908	null
2024-07-23	TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class	Anishka IIITD et.al.	2407.16805	link
2024-07-23	Shared Imagination: LLMs Hallucinate Alike	Yilun Zhou et.al.	2407.16604	null
2024-07-23	Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs	Yifan Xia et.al.	2407.16576	null
2024-07-23	Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models	Ioana Buhnila et.al.	2407.16565	link
2024-07-25	Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models	Kenza Benkirane et.al.	2407.16470	link
2024-07-23	Enhancing LLM’s Cognition via Structurization	Kai Liu et.al.	2407.16434	link
2024-07-23	LawLuo: A Chinese Law Firm Co-run by LLM Agents	Jingyun Sun et.al.	2407.16252	link
2024-07-23	Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models	Nishanth Madhusudhan et.al.	2407.16221	null
2024-07-22	Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned	Song Wang et.al.	2407.15441	null
2024-07-22	MAVEN-Fact: A Large-scale Event Factuality Detection Dataset	Chunyang Li et.al.	2407.15352	link
2024-07-20	Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models	Ze Yu Zhang et.al.	2407.14845	null
2024-07-19	Internal Consistency and Self-Feedback in Large Language Models: A Survey	Xun Liang et.al.	2407.14507	link
2024-07-19	Prompted Aspect Key Point Analysis for Quantitative Review Summarization	An Quang Tang et.al.	2407.14049	link
2024-07-18	CoDefeater: Using LLMs To Find Defeaters in Assurance Cases	Usman Gohar et.al.	2407.13717	link
2024-08-01	Prover-Verifier Games improve legibility of LLM outputs	Jan Hendrik Kirchner et.al.	2407.13692	null
2024-07-18	BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models	Moon Ye-Bin et.al.	2407.13442	null
2024-07-18	CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis	Junying Chen et.al.	2407.13301	link
2024-07-19	AI-Assisted SQL Authoring at Industry Scale	Chandra Maddila et.al.	2407.13280	null
2024-07-19	Retrieval-Augmented Generation for Natural Language Processing: A Survey	Shangyu Wu et.al.	2407.13193	null
2024-07-18	Translate-and-Revise: Boosting Large Language Models for Constrained Translation	Pengcheng Huang et.al.	2407.13164	null
2024-07-17	Halu-J: Critique-Based Hallucination Judge	Binjie Wang et.al.	2407.12943	link
2024-08-01	Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild	Nicolas Richet et.al.	2407.12927	link
2024-07-17	Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models	Alexander R. Pelletier et.al.	2407.12888	link
2024-07-17	LLM-based query paraphrasing for video search	Jiaxin Wu et.al.	2407.12341	null
2024-07-17	Optimizing Query Generation for Enhanced Document Retrieval in RAG	Hamin Koo et.al.	2407.12325	null
2024-07-11	NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2	Tengfei Xue et.al.	2407.12057	null
2024-07-16	What’s Wrong? Refining Meeting Summaries with LLM Feedback	Frederic Kirstein et.al.	2407.11919	null
2024-07-16	LoFTI: Localization and Factuality Transfer to Indian Locales	Sona Elza Simon et.al.	2407.11833	link
2024-07-16	A Framework for Evaluating Appropriateness, Trustworthiness, and Safety in Mental Wellness AI Chatbots	Lucia Chen et.al.	2407.11387	null
2024-07-19	Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models	Qingcheng Zeng et.al.	2407.11282	link
2024-07-15	AstroMLab 1: Who Wins Astronomy Jeopardy!?	Yuan-Sen Ting et.al.	2407.11194	null
2024-07-15	Inertial Confinement Fusion Forecasting via LLMs	Mingkai Chen et.al.	2407.11098	null
2024-07-15	Leveraging LLM-Respondents for Item Evaluation: a Psychometric Analysis	Yunting Liu et.al.	2407.10899	null
2024-07-24	MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs	Quang H. Nguyen et.al.	2407.10834	link
2024-07-15	Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval	Shengjie Ma et.al.	2407.10805	link
2024-07-15	GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework	Hannah Sansford et.al.	2407.10793	null
2024-07-15	CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses	Jing Yao et.al.	2407.10725	null
2024-07-15	Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews	Lucas Joos et.al.	2407.10652	null
2024-07-14	GenSco: Can Question Decomposition based Passage Alignment improve Question Answering?	Barah Fazili et.al.	2407.10245	null
2024-07-14	Look Within, Why LLMs Hallucinate: A Causal Perspective	He Li et.al.	2407.10153	null
2024-07-13	Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues	KuanChao Chu et.al.	2407.09897	null
2024-07-13	Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks	Shengbin Yue et.al.	2407.09893	link
2024-07-13	On Mitigating Code LLM Hallucinations with API Documentation	Nihal Jain et.al.	2407.09726	null
2024-07-22	Mitigating Entity-Level Hallucination in Large Language Models	Weihang Su et.al.	2407.09417	link
2024-07-12	PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents	Saber Zerhoudi et.al.	2407.09394	link
2024-07-12	DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection	Sangpil Youm et.al.	2407.09283	null
2024-07-12	The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs	Anh Thu Maria Bui et.al.	2407.09152	null
2024-07-12	Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors	Nico Daheim et.al.	2407.09136	link
2024-07-12	Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations	David N. Palacio et.al.	2407.08983	null
2024-07-15	Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation	Biqing Qi et.al.	2407.08940	link
2024-07-12	Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?	Yingming Pu et.al.	2407.08922	link
2024-07-11	Evaluating Nuanced Bias in Large Language Model Free Response Answers	Jennifer Healey et.al.	2407.08842	null
2024-07-11	Proving that Cryptic Crossword Clue Answers are Correct	Martin Andrews et.al.	2407.08824	link
2024-07-11	Uncertainty Estimation of Large Language Models in Medical Question Answering	Jiaxin Wu et.al.	2407.08662	null
2024-07-11	$β$-DPO: Direct Preference Optimization with Dynamic $β$	Junkang Wu et.al.	2407.08639	link
2024-07-11	On the Universal Truthfulness Hyperplane Inside LLMs	Junteng Liu et.al.	2407.08582	link
2024-07-22	Lynx: An Open Source Hallucination Evaluation Model	Selvan Sunitha Ravi et.al.	2407.08488	null
2024-07-11	On the attribution of confidence to large language models	Geoff Keeling et.al.	2407.08388	null