Updated on 2025.06.28

Table of Contents
  1. HealthLLM
  2. UncertaintyLLM

HealthLLM

Publish Date Title Authors PDF Code
2025-06-26 “What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets Akshay Paruchuri et.al. 2506.21532 null
2025-06-26 MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification Shadman Sobhan et.al. 2506.21199 null
2025-06-25 Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation Md Toufique Hasan et.al. 2506.20869 null
2025-06-25 An Agentic System for Rare Disease Diagnosis with Traceable Reasoning Weike Zhao et.al. 2506.20430 null
2025-06-25 ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset Yilin Wang et.al. 2506.20093 null
2025-06-24 DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction Weijieying Ren et.al. 2506.20059 null
2025-06-24 Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks Konstantinos Vrettos et.al. 2506.20009 null
2025-06-24 MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration Yucheng Zhou et.al. 2506.19835 null
2025-06-24 LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis Lei Kang et.al. 2506.19702 null
2025-06-26 Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance Xuesong Li et.al. 2506.19683 null
2025-06-24 Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation Yuanhe Tian et.al. 2506.19665 null
2025-06-24 Automatic Posology Structuration : What role for LLMs? Natalia Bobkova et.al. 2506.19525 null
2025-06-24 EmoStage: A Framework for Accurate Empathetic Response Generation via Perspective-Taking and Phase Recognition Zhiyang Qi et.al. 2506.19279 null
2025-06-23 Spiritual-LLM : Gita Inspired Mental Health Therapy In the Era of LLMs Janak Kapuriya et.al. 2506.19185 null
2025-06-23 GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model Amir Faiyaz et.al. 2506.19164 null
2025-06-23 Enhancing Biosecurity in Tamper-Resistant Large Language Models With Quantum Gradient Descent Fahmida Hai et.al. 2506.19086 null
2025-06-23 FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation Nitish Nagesh et.al. 2506.19082 null
2025-06-23 RWESummary: A Framework and Test for Choosing Large Language Models to Summarize Real-World Evidence (RWE) Studies Arjun Mukerji et.al. 2506.18819 null
2025-06-23 MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis Yuting Zhang et.al. 2506.18512 null
2025-06-23 Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics Yousang Cho et.al. 2506.18387 null
2025-06-23 Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team Weilun Yu et.al. 2506.18348 null
2025-06-24 Co-persona: Leveraging LLMs and Expert Collaboration to Understand User Personas through Social Media Data Analysis Min Yin et.al. 2506.18269 null
2025-06-22 Programming Quantum Computers with Large Language Models Elena R. Henderson et.al. 2506.18125 null
2025-06-22 Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives Batool Haider et.al. 2506.18116 null
2025-06-22 Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster Fenghe Tang et.al. 2506.18034 null
2025-06-22 SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model Guankun Wang et.al. 2506.17873 null
2025-06-21 Engagement and Disclosures in LLM-Powered Cognitive Behavioral Therapy Exercises: A Factorial Design Comparing the Influence of a Robot vs. Chatbot Over Time Mina Kian et.al. 2506.17831 null
2025-06-21 Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs Catarina Pires et.al. 2506.17782 null
2025-06-21 Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages Matthias Schöffel et.al. 2506.17715 null
2025-06-21 LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning Haoxuan Che et.al. 2506.17562 null
2025-06-20 Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation Hao Guan et.al. 2506.17442 null
2025-06-19 Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases Yubeen Bae et.al. 2506.17336 link
2025-06-14 Automating Financial Statement Audits with Large Language Models Rushi Wang et.al. 2506.17282 null
2025-06-20 The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making Abinitha Gourabathina et.al. 2506.17163 null
2025-06-20 DistillNote: LLM-based clinical note summaries improve heart failure diagnosis Heloisa Oss Boll et.al. 2506.16777 null
2025-06-19 Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System Jianlin Shi et.al. 2506.16628 null
2025-06-19 A Scoping Review of Synthetic Data Generation for Biomedical Research and Applications Hanshu Rao et.al. 2506.16594 null
2025-06-19 Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support Sophie Chiang et.al. 2506.16473 null
2025-06-23 From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents Mohammad Amaan Sayeed et.al. 2506.15911 null
2025-06-18 Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning Chunlei Li et.al. 2506.15477 null
2025-06-18 DeVisE: Behavioral Testing of Medical Large Language Models Camila Zurdo Tagliabue et.al. 2506.15339 null
2025-06-18 Universal Laboratory Model: prognosis of abnormal clinical outcomes based on routine tests Pavel Karpov et.al. 2506.15330 link
2025-06-18 Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment Shrestha Ghosh et.al. 2506.15301 null
2025-06-18 Mapping Caregiver Needs to AI Chatbot Design: Strengths and Gaps in Mental Health Support for Alzheimer’s and Dementia Caregivers Jiayue Melissa Shi et.al. 2506.15047 null
2025-06-17 From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction? Shadman Sakib et.al. 2506.14949 link
2025-06-17 A Vision for Geo-Temporal Deep Research Systems: Towards Comprehensive, Transparent, and Reproducible Geo-Temporal Information Synthesis Bruno Martins et.al. 2506.14345 null
2025-06-17 Abstract Meaning Representation for Hospital Discharge Summarization Paul Landes et.al. 2506.14101 link
2025-06-17 InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking Rahul Seetharaman et.al. 2506.14086 null
2025-06-13 Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases Bonam Mingole et.al. 2506.13805 null
2025-06-13 Enhancing Clinical Decision Support and EHR Insights through LLMs and the Model Context Protocol: An Open-Source MCP-FHIR Framework Abul Ehtesham et.al. 2506.13800 null
2025-06-18 The NordDRG AI Benchmark for Large Language Models Tapio Pitkäranta et.al. 2506.13790 link
2025-06-16 Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems Shang-Chi Tsai et.al. 2506.13692 null
2025-06-16 Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning David Bani-Harouni et.al. 2506.13474 null
2025-06-16 Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models James Chua et.al. 2506.13206 null
2025-06-16 Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs Gyutaek Oh et.al. 2506.13102 null
2025-06-15 CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation Naihao Deng et.al. 2506.12936 null
2025-06-15 Towards Visualizing Electronic Medical Records via Natural Language Queries Haodi Zhang et.al. 2506.12837 null
2025-06-14 Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders Ananya Joshi et.al. 2506.12576 link
2025-06-14 Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare Yubin Kim et.al. 2506.12482 null
2025-06-14 Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs Erica Cai et.al. 2506.12367 null
2025-06-20 Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning Xiaotian Zhang et.al. 2506.12307 null
2025-06-13 Semantic Scheduling for LLM Inference Wenyue Hua et.al. 2506.12204 link
2025-06-13 Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs Chenqian Le et.al. 2506.12182 null
2025-06-10 Risks & Benefits of LLMs & GenAI for Platform Integrity, Healthcare Diagnostics, Cybersecurity, Privacy & AI Safety: A Comprehensive Survey, Roadmap & Implementation Blueprint Kiarash Ahi et.al. 2506.12088 null
2025-06-16 Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making Claudio Fanconi et.al. 2506.11887 null
2025-06-13 Converting Annotated Clinical Cases into Structured Case Report Forms Pietro Ferrazzi et.al. 2506.11666 null
2025-06-24 RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning Yu Wang et.al. 2506.11555 null
2025-06-13 Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs Wei Li et.al. 2506.11512 link
2025-06-13 Predicting Early-Onset Colorectal Cancer with Large Language Models Wilson Lau et.al. 2506.11410 null
2025-06-13 Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning Liying Wang et.al. 2506.11376 null
2025-06-12 LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic Weibing Zheng et.al. 2506.11221 link
2025-06-11 Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning Ji Young Byun et.al. 2506.11166 null
2025-06-16 ADAgent: LLM Agent for Alzheimer’s Disease Analysis with Collaborative Coordinator Wenlong Hou et.al. 2506.11150 null
2025-06-19 Autonomous Computer Vision Development with Agentic AI Jin Kim et.al. 2506.11140 link
2025-06-10 Scalable Medication Extraction and Discontinuation Identification from Electronic Health Records Using Large Language Models Chong Shao et.al. 2506.11137 null
2025-06-10 Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK Carlos Garcia-Fernandez et.al. 2506.11129 null
2025-06-09 KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations Junyu Liu et.al. 2506.11114 null
2025-06-16 Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation Uttej Kallakurik et.al. 2506.11105 null
2025-06-12 The Role of Generative AI in Facilitating Social Interactions: A Scoping Review T. T. J. E. Arets et.al. 2506.10927 null
2025-06-12 Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs Alberto Testoni et.al. 2506.10769 null
2025-06-12 Large Language Models for Detection of Life-Threatening Texts Thanh Thi Nguyen et.al. 2506.10687 null
2025-06-11 HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding Yanzhao Shi et.al. 2506.09634 null
2025-06-11 ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Yu Sun et.al. 2506.09513 link
2025-06-11 Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers Ilanit Sobol et.al. 2506.09495 null
2025-06-11 “Is This Really a Human Peer Supporter?”: Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions Kellie Yu Hui Sim et.al. 2506.09354 null
2025-06-10 The Curious Language Model: Strategic Test-Time Information Acquisition Michael Cooper et.al. 2506.09173 null
2025-06-10 CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling Yahan Li et.al. 2506.08584 link
2025-06-10 RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being Rahatara Ferdousi et.al. 2506.08486 null
2025-06-10 Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving Yuxuan Zhou et.al. 2506.08349 link
2025-06-09 Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework Melissa Estevez et.al. 2506.08231 null
2025-06-09 Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System Fan Yang et.al. 2506.07997 null
2025-06-11 MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models Philip R. Liu et.al. 2506.07400 link
2025-06-08 Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance Mohammadreza Chavoshi et.al. 2506.07273 null
2025-06-07 AI PsyRoom: Artificial Intelligence Platform for Segmented Yearning and Reactive Outcome Optimization Method Yigui Feng et.al. 2506.06740 null
2025-06-07 C-PATH: Conversational Patient Assistance and Triage in Healthcare System Qi Shi et.al. 2506.06737 null
2025-06-07 DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains Zhihui Chen et.al. 2506.06705 null
2025-06-07 Interpretable Depression Detection from Social Media Text Using LLM-Derived Embeddings Samuel Kim et.al. 2506.06616 null
2025-06-07 MedCite: Can Language Models Generate Verifiable Text for Medicine? Xiao Wang et.al. 2506.06605 null
2025-06-14 RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints Tan-Hanh Pham et.al. 2506.06600 null
2025-06-02 Large Language Models for EEG: A Comprehensive Survey and Taxonomy Naseem Babu et.al. 2506.06353 null
2025-06-01 Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support Wu Hao Ran et.al. 2506.06340 null
2025-06-06 Building Models of Neurological Language Henry Watkins et.al. 2506.06208 null
2025-06-09 MIRIAD: Augmenting LLMs with millions of medical query-response pairs Qinyue Zheng et.al. 2506.06091 null
2025-06-06 BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions Saptarshi Sengupta et.al. 2506.05766 null
2025-06-06 Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning Yangui Fang et.al. 2506.05671 null
2025-06-06 Can LLMs Express Personality Across Cultures? Introducing CulturalPersonas for Evaluating Trait Alignment Priyanka Dey et.al. 2506.05670 null
2025-06-05 Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction Zhihao Tang et.al. 2506.05428 null
2025-06-03 Beyond RAG: Reinforced Reasoning Augmented Generation for Clinical Notes Lo Pang-Yun Ting et.al. 2506.05386 null
2025-06-05 Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation Soumitra Ghosh et.al. 2506.05073 null
2025-06-05 From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs Chantal Pellegrini et.al. 2506.04831 null
2025-06-05 A MISMATCHED Benchmark for Scientific Natural Language Inference Firoz Shaik et.al. 2506.04603 null
2025-06-04 Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification Payel Bhattacharjee et.al. 2506.04450 null
2025-06-04 MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale Ran Xu et.al. 2506.04405 null
2025-06-04 AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents Fengze Liu et.al. 2506.04293 null
2025-06-04 A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization Sarvesh Soni et.al. 2506.04156 null
2025-06-13 LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation Ming Zhang et.al. 2506.04078 link
2025-06-04 AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data Sina Rashidian et.al. 2506.04032 null
2025-06-04 Trustworthy Medical Question Answering: An Evaluation-Centric Survey Yinuo Wang et.al. 2506.03659 null
2025-06-04 VChatter: Exploring Generative Conversational Agents for Simulating Exposure Therapy to Reduce Social Anxiety Han Zhang et.al. 2506.03520 null
2025-06-05 Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing Shigeng Chen et.al. 2506.03490 link
2025-06-04 Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer’s Disease Detection Chuyuan Li et.al. 2506.03476 null
2025-06-03 Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems Michael E. Garcia-Alcoser et.al. 2506.03259 null
2025-06-03 Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis Richard Armitage et.al. 2506.02987 null
2025-06-03 FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models Yan Gao et.al. 2506.02961 null
2025-06-03 A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning Xuejiao Zhao et.al. 2506.02470 link
2025-06-02 A Dynamic Framework for Semantic Grouping of Common Data Elements (CDE) Using Embeddings and Clustering Madan Krishnamurthy et.al. 2506.02160 null
2025-06-04 The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning Edward Y. Chang et.al. 2506.02139 null
2025-06-02 Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis Chi-Jane Chen et.al. 2506.01918 null
2025-06-02 Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation Jiaxi Sheng et.al. 2506.01841 null
2025-06-02 Reasoning-Based Approach with Chain-of-Thought for Alzheimer’s Detection Using Speech and Large Language Models Chanwoo Park et.al. 2506.01683 null
2025-06-02 Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents Manan Suri et.al. 2506.01344 null
2025-06-02 Evaluating Large Language Models in Crisis Detection: A Real-World Benchmark from Psychological Support Hotlines Guifeng Deng et.al. 2506.01329 null
2025-06-02 DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models Jiancheng Ye et.al. 2506.01257 null
2025-06-02 MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine Shufeng Kong et.al. 2506.01252 null
2025-06-01 Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation Pimchanok Sukjai et.al. 2506.01118 null
2025-06-03 Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation Running Yang et.al. 2506.00612 null
2025-05-31 AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation Ming Wang et.al. 2506.00551 link
2025-05-31 Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization Suhas BN et.al. 2506.00448 null
2025-05-31 Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees’ Dialogue to Facilitate Nurse Communication Training Keyeun Lee et.al. 2506.00386 null
2025-05-30 MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform Hayoung Jung et.al. 2506.00308 null
2025-06-03 PersianMedQA: Language-Centric Evaluation of LLMs in the Persian Medical Domain Mohammad Javad Ranjbar Kalahroodi et.al. 2506.00250 null
2025-05-30 Structuring Radiology Reports: Challenging LLMs with Lightweight Models Johannes Moll et.al. 2506.00200 null
2025-05-30 Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models Fardin Ahsan Sakib et.al. 2506.00134 null
2025-06-04 ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases Yuchong Li et.al. 2506.00095 null
2025-05-30 Artificial Empathy: AI based Mental Health Aditya Naik et.al. 2506.00081 null
2025-05-29 Evaluating Prompt Engineering Techniques for Accuracy and Confidence Elicitation in Medical LLMs Nariman Naderi et.al. 2506.00072 null
2025-05-29 Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports Sina Amirrajab et.al. 2506.00060 null
2025-05-30 Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs Juraj Vladika et.al. 2505.24830 null
2025-05-30 A survey of using EHR as real-world evidence for discovering and validating new drug indications Nabasmita Talukdar et.al. 2505.24767 null
2025-06-06 LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews Christian Jaumann et.al. 2505.24757 link
2025-06-02 Automated Structured Radiology Report Generation Jean-Benoit Delbrouck et.al. 2505.24223 null
2025-05-30 Semi-structured LLM Reasoners Can Be Rigorously Audited Jixuan Leng et.al. 2505.24217 null
2025-05-30 Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning Jiacheng Lin et.al. 2505.24105 null
2025-05-29 MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering Yuexing Hao et.al. 2505.24040 null
2025-05-28 Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction Mai Ali et.al. 2505.23822 null
2025-05-27 MedOrchestra: A Hybrid Cloud-Local LLM Approach for Clinical Data Interpretation Sihyeon Lee et.al. 2505.23806 null
2025-06-02 MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks Suhana Bedi et.al. 2505.23802 null
2025-06-03 Can Large Language Models Challenge CNNs in Medical Image Analysis? Shibbir Ahmed et.al. 2505.23503 null
2025-05-29 Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons Krithik Vishwanath et.al. 2505.23477 null
2025-05-29 Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble Amit Kumthekar et.al. 2505.23075 null
2025-05-29 CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents Zhen Xiang et.al. 2505.23055 link
2025-05-29 Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction Guangyi Liu et.al. 2505.23034 null
2025-05-29 Exploring Scaling Laws for EHR Foundation Models Sheng Zhang et.al. 2505.22964 null
2025-05-29 LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements Jianwei Wang et.al. 2505.22959 link
2025-05-30 ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room Nikita Mehandru et.al. 2505.22919 null
2025-05-28 Can Large Language Models Match the Conclusions of Systematic Reviews? Christopher Polzak et.al. 2505.22787 link
2025-05-28 Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation Yunsoo Kim et.al. 2505.22222 null
2025-05-28 Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection Jinming Zhang et.al. 2505.22029 link
2025-05-28 Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning Qihuang Zhong et.al. 2505.21958 null
2025-05-28 Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding Hanyin Wang et.al. 2505.21908 null
2025-05-29 Query, Don’t Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries Josefa Lia Stoisser et.al. 2505.21801 null
2025-05-27 BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum Yubin Kim et.al. 2505.21757 null
2025-05-27 Counterfactual Simulatability of LLM Explanations for Generation Tasks Marvin Limpijankit et.al. 2505.21740 null
2025-05-24 Vision Meets Language: A RAG-Augmented YOLOv8 Framework for Coffee Disease Diagnosis and Farmer Assistance Semanto Mondal et.al. 2505.21544 link
2025-05-27 Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making Yihan Wang et.al. 2505.21503 null
2025-05-27 Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery Lina Zhao et.al. 2505.21418 null
2025-05-27 Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts Yuxin Zhu et.al. 2505.21324 null
2025-05-27 Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings Gunjan Balde et.al. 2505.21242 null
2025-05-27 Simulating Ethics: Using LLM Debate Panels to Model Deliberation on Medical Dilemmas Hazem Zohny et.al. 2505.21112 null
2025-05-27 MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems Kai Chen et.al. 2505.20824 link
2025-05-27 Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients Hyungjun Park et.al. 2505.20609 null
2025-05-26 In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts Filomene Roquefort et.al. 2505.20491 null
2025-05-24 Do LLMs have a Gender (Entropy) Bias? Sonal Prabhune et.al. 2505.20343 null
2025-05-23 PMOA-TTS: Introducing the PubMed Open Access Textual Times Series Corpus Shahriar Noroozizadeh et.al. 2505.20323 null
2025-05-23 Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP Satya Narayana Cheetirala et.al. 2505.20320 null
2025-05-26 Fine-grained List-wise Alignment for Generative Medication Recommendation Chenxiao Fan et.al. 2505.20218 link
2025-05-28 Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations Mohit Chandra et.al. 2505.20201 null
2025-05-26 Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare Natallia Kokash et.al. 2505.20020 null
2025-05-26 Does Rationale Quality Matter? Enhancing Mental Disorder Detection via Selective Reasoning Distillation Hoyun Song et.al. 2505.20014 link
2025-05-26 An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning Andrew Zamai et.al. 2505.19954 null
2025-05-30 FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks Atsunori Moteki et.al. 2505.19662 null
2025-05-26 DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue Yichun Feng et.al. 2505.19630 link
2025-05-26 AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare Ying Xiao et.al. 2505.19562 link
2025-05-25 Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning Shaohao Rui et.al. 2505.19213 null
2025-05-25 CardioCoT: Hierarchical Reasoning for Multimodal Survival Analysis Shaohao Rui et.al. 2505.19195 null
2025-05-25 The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework Feiran Liu et.al. 2505.19139 null
2025-05-25 Toward Human Centered Interactive Clinical Question Answering System Dina Albassam et.al. 2505.18928 null
2025-05-24 TULUN: Transparent and Adaptable Low-resource Machine Translation Raphaël Merx et.al. 2505.18683 null
2025-05-24 DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation Zhihao Jia et.al. 2505.18630 null
2025-05-24 CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs Yiqing Zhang et.al. 2505.18527 null
2025-05-24 From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data Ugur Kursuncu et.al. 2505.18464 null
2025-05-24 MedScore: Factuality Evaluation of Free-Form Medical Answers Heyuan Huang et.al. 2505.18452 link
2025-05-23 Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering Jessica Tang et.al. 2505.18412 link
2025-05-23 RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification Praphul Singh et.al. 2505.18380 null
2025-05-23 Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? Waleed Reda et.al. 2505.18350 null
2025-05-23 PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language Naghmeh Jamali et.al. 2505.18331 null
2025-05-23 TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification Jianghao Wu et.al. 2505.18283 link
2025-05-23 Will Large Language Models Transform Clinical Prediction? Yusuf Yildiz et.al. 2505.18246 null
2025-05-22 Towards medical AI misalignment: a preliminary study Barbara Puccio et.al. 2505.18212 null
2025-05-23 Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Che Liu et.al. 2505.17952 null
2025-05-23 PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions Daeun Kyung et.al. 2505.17818 null
2025-05-23 EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications Ancheng Xu et.al. 2505.17654 null
2025-05-23 WiNGPT-3.0 Technical Report Boqin Zhuang et.al. 2505.17387 link
2025-05-23 AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing Yinghui Huang et.al. 2505.17380 null
2025-05-22 CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports Xiao Yu Cindy Zhang et.al. 2505.17265 null
2025-05-22 CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation Ibrahim Ethem Hamamci et.al. 2505.17167 null
2025-05-22 Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands Kristin Qi et.al. 2505.17137 null
2025-05-21 Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models Zongru Shao et.al. 2505.17119 null
2025-05-21 Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation Kristine Ann M. Carandang et.al. 2505.17095 null
2025-05-18 Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases Valentina Carbonari et.al. 2505.17065 null
2025-05-15 Assessing the Quality of AI-Generated Clinical Notes: A Validated Evaluation of a Large Language Model Scribe Erin Palm et.al. 2505.17047 null
2025-05-22 MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning Suhao Yu et.al. 2505.16964 null
2025-05-22 A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP Issey Sukeda et.al. 2505.16661 link
2025-05-22 Collaboration among Multiple Large Language Models for Medical Question Answering Kexin Shang et.al. 2505.16648 null
2025-05-22 No Black Boxes: Interpretable and Interactable Predictive Healthcare with Knowledge-Enhanced Agentic Causal Discovery Xiaoxue Han et.al. 2505.16288 null
2025-05-22 Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools Panagiotis Lymperopoulos et.al. 2505.16113 null
2025-05-23 Continually Self-Improving Language Models for Bariatric Surgery Question–Answering Yash Kumar Atri et.al. 2505.16102 null
2025-05-22 TrialPanorama: Database and Benchmark for Systematic Review and Design of Clinical Trials Zifeng Wang et.al. 2505.16097 null
2025-05-22 Multi-modal Integration Analysis of Alzheimer’s Disease Using Large Language Models and Knowledge Graphs Kanan Kiguchi et.al. 2505.15747 null
2025-05-21 Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling He Hu et.al. 2505.15715 null
2025-05-21 Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs Lang Gao et.al. 2505.15524 null
2025-05-22 MentalMAC: Enhancing Large Language Models for Detecting Mental Manipulation via Multi-Task Anti-Curriculum Distillation Yuansheng Gao et.al. 2505.15255 null
2025-05-21 AI Solutionism and Digital Self-Tracking with Wearables Hannah R. Nolasco et.al. 2505.15162 null
2025-05-21 A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents Ian Steenstra et.al. 2505.15108 null
2025-05-23 Diagnosing our datasets: How does my language model learn clinical information? Furong Jia et.al. 2505.15024 null
2025-05-20 MedBrowseComp: Benchmarking Medical Deep Research and Computer Use Shan Chen et.al. 2505.14963 null
2025-05-20 RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection Wenjun Hou et.al. 2505.14318 link
2025-05-20 s3: You Don’t Need That Much Data to Train a Search Agent via RL Pengcheng Jiang et.al. 2505.14146 link
2025-05-20 ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data Xinzhe Zheng et.al. 2505.14038 null
2025-05-20 Fragments to Facts: Partial-Information Fragment Inference from LLMs Lucas Rosenblatt et.al. 2505.13819 link
2025-05-19 VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation Yubin Kim et.al. 2505.13577 null
2025-05-14 Source framing triggers systematic evaluation bias in Large Language Models Federico Germani et.al. 2505.13488 null
2025-05-11 Evaluating Reasoning LLMs for Suicide Screening with the Columbia-Suicide Severity Rating Scale Avinash Patil et.al. 2505.13480 link
2025-05-19 Learnware of Language Models: Specialized Small Language Models Can Do Big Zhi-Hao Tan et.al. 2505.13425 link
2025-05-19 Dementia Through Different Eyes: Explainable Modeling of Human and LLM Perceptions for Early Awareness Lotem Peled-Cohen et.al. 2505.13418 null
2025-05-19 Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice Zhi Liu et.al. 2505.13156 null
2025-05-19 Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning Xiaoyu Yang et.al. 2505.13081 null
2025-05-19 GAP: Graph-Assisted Prompts for Dialogue-based Medication Recommendation Jialun Zhong et.al. 2505.12888 null
2025-05-19 EpiLLM: Unlocking the Potential of Large Language Models in Epidemic Forecasting Chenghua Gong et.al. 2505.12738 null
2025-05-18 ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents Navid Madani et.al. 2505.12531 null
2025-05-18 MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks Yinghao Zhu et.al. 2505.12371 link
2025-05-18 PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs Sriram Selvam et.al. 2505.12238 link
2025-05-17 AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation Xiechi Zhang et.al. 2505.11887 null
2025-05-21 LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models Ryan Chen et.al. 2505.11772 null
2025-05-20 MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports Kevin Wu et.al. 2505.11733 link
2025-05-16 MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models Xiaomin Li et.al. 2505.11613 null
2025-05-16 Heart2Mind: Human-Centered Contestable Psychiatric Disorder Diagnosis System using Wearable ECG Monitors Hung Nguyen et.al. 2505.11612 link
2025-05-16 Disentangling Reasoning and Knowledge in Medical Large Language Models Rahul Thapa et.al. 2505.11462 null
2025-05-16 CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs Sijia Chen et.al. 2505.11413 null
2025-05-15 Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI Agnik Saha et.al. 2505.10472 null
2025-05-20 AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges Ranjan Sapkota et.al. 2505.10468 null
2025-05-15 Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation Yue Guo et.al. 2505.10409 null
2025-05-15 From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making Dubai Li et.al. 2505.10282 link
2025-05-15 The Evolving Landscape of Generative Large Language Models and Traditional Natural Language Processing in Medicine Rui Yang et.al. 2505.10261 null
2025-05-15 What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs Xinlan Yan et.al. 2505.10113 null
2025-05-14 Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models Aditya Nagori et.al. 2505.09805 null
2025-05-14 A Multimodal Multi-Agent Framework for Radiology Report Generation Ziruo Yi et.al. 2505.09787 null
2025-05-16 Tales of the 2025 Los Angeles Fire: Hotwash for Public Health Concerns in Reddit via LLM-Enhanced Topic Modeling Sulong Zhou et.al. 2505.09665 null
2025-05-13 Performance Gains of LLMs With Humans in a World of LLMs Versus Humans Lucas McCullum et.al. 2505.08902 null
2025-05-13 NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context Ben Yao et.al. 2505.08734 null
2025-05-13 LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs K M Sajjadul Islam et.al. 2505.08704 null
2025-05-13 TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching Majd Abdallah et.al. 2505.08508 null
2025-05-13 Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions Lata Pangtey et.al. 2505.08464 null
2025-05-13 Decoding Neighborhood Environments with Large Language Models Andrew Cart et.al. 2505.08163 null
2025-05-13 Communication Styles and Reader Preferences of LLM and Human Experts in Explaining Health Information Jiawei Zhou et.al. 2505.08143 null
2025-05-12 Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models Weiyi Wu et.al. 2505.07968 null
2025-05-11 TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking Ching Nam Hang et.al. 2505.07891 null
2025-05-07 A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas Pranav Narayanan Venkit et.al. 2505.07850 null
2025-05-12 Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030 Mouxiao Bian et.al. 2505.07205 null
2025-05-12 KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification Hajar Sakai et.al. 2505.07162 null
2025-05-11 Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI Chao Ding et.al. 2505.06912 null
2025-05-10 Utilizing LLMs to Investigate the Disputed Role of Evidence in Electronic Cigarette Health Policy Formation in Australia and the UK Damian Curran et.al. 2505.06782 null
2025-05-10 NeuroPal: A Clinically-Informed Multimodal LLM Assistant for Mental Health Combining Sleep Chronotherapy, Cognitive Behavioral Reframing, and Adaptive Phytochemical Intervention Xiaoran Han et.al. 2505.06640 null
2025-05-10 Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning H M Dipu Kabir et.al. 2505.06592 link
2025-05-07 Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs Hung Manh Pham et.al. 2505.06296 null
2025-05-15 Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information Joshua Harris et.al. 2505.06046 null
2025-05-09 A Day in Their Shoes: Using LLM-Based Perspective-Taking Interactive Fiction to Reduce Stigma Toward Dirty Work Xiangzhe Yuan et.al. 2505.05786 null
2025-05-09 Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications Da Wu et.al. 2505.05736 link
2025-05-08 Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models Wei Peng et.al. 2505.05189 link
2025-05-08 Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization Ajwad Abrar et.al. 2505.05070 null
2025-05-07 Retrieval Augmented Generation Evaluation for Health Documents Mario Ceresa et.al. 2505.04680 null
2025-05-06 Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction Paul Landes et.al. 2505.04655 null
2025-05-06 Advancing Conversational Diagnostic AI with Multimodal Reasoning Khaled Saab et.al. 2505.04653 null
2025-05-06 FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights Chengzhang Yu et.al. 2505.04649 null
2025-05-05 ChatGPT for automated grading of short answer questions in mechanical ventilation Tejas Jade et.al. 2505.04645 null
2025-05-07 The Aloe Family Recipe for Open and Specialized Healthcare LLMs Dario Garcia-Gasulla et.al. 2505.04388 null
2025-05-07 Can Language Models Understand Social Behavior in Clinical Conversations? Manas Satish Bedmutha et.al. 2505.04152 null
2025-05-07 Natural Language Generation in Healthcare: A Review of Methods and Applications Mengxian Lyu et.al. 2505.04073 null
2025-04-30 Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding Trilok Padhi et.al. 2505.03788 null
2025-04-30 mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging Eleftherios Tzanis et.al. 2505.03785 link
2025-04-30 ALFRED: Ask a Large-language model For Reliable ECG Diagnosis Jin Yu et.al. 2505.03781 null
2025-05-06 Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis Shuang Zhou et.al. 2505.03467 null
2025-05-06 MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks Mouath Abu Daoud et.al. 2505.03427 link
2025-05-06 Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation Mohammad Shoaib Ansari et.al. 2505.03406 null
2025-05-06 Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback Shijing Zhu et.al. 2505.03293 null
2025-05-02 Enhancing ML Model Interpretability: Leveraging Fine-Tuned Large Language Models for Better Understanding of AI Jonas Bokstaller et.al. 2505.02859 null
2025-05-05 Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry Junu Kim et.al. 2505.02722 link
2025-05-05 Structure Causal Models and LLMs Integration in Medical Visual Question Answering Zibo Xu et.al. 2505.02703 null
2025-05-05 AI Standardized Patient Improves Human Conversations in Advanced Cancer Care Kurtis Haut et.al. 2505.02694 link
2025-05-08 A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law Qianjun Pan et.al. 2505.02665 null
2025-05-08 Bielik v3 Small: Technical Report Krzysztof Ociepa et.al. 2505.02550 null
2025-05-05 Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors Ryan Louie et.al. 2505.02428 null
2025-05-04 Generative AI in clinical practice: novel qualitative evidence of risk and responsible use of Google’s NotebookLM Max Reuter et.al. 2505.01955 null
2025-05-03 Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings Alexander Davis et.al. 2505.01711 null
2025-05-03 High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers Brian Wong et.al. 2505.01693 null
2025-05-02 Emotions in the Loop: A Survey of Affective Computing for Emotional Support Karishma Hegde et.al. 2505.01542 null
2025-05-12 Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications Jiawei He et.al. 2505.01146 null
2025-05-10 SSRLBot: Designing and Developing a Large Language Model-based Agent using Socially Shared Regulated Learning Xiaoshan Huang et.al. 2505.00945 null
2025-05-05 Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs Dung Nguyen et.al. 2505.00744 null
2025-05-01 Red Teaming Large Language Models for Healthcare Vahid Balazadeh et.al. 2505.00467 null
2025-05-01 KoACD: The First Korean Adolescent Dataset for Cognitive Distortion Analysis JunSeo Kim et.al. 2505.00367 null
2025-05-01 AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care Md Asaduzzaman Jabin et.al. 2505.00275 link
2025-04-28 MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis Yuyang Sha et.al. 2505.00032 null
2025-04-21 Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models Tri Nguyen et.al. 2505.00010 null
2025-04-30 TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments Sichang Tu et.al. 2504.21851 null
2025-04-30 TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training Shengqian Wang et.al. 2504.21735 null
2025-04-30 XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs Marco Arazzi et.al. 2504.21700 null
2025-04-30 UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation Linshan Wu et.al. 2504.21336 link
2025-04-30 Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA Xuanzhao Dong et.al. 2504.21252 link
2025-04-29 A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces Juliana Barbosa et.al. 2504.21211 null
2025-04-29 Multimodal Large Language Models for Medicine: A Comprehensive Survey Jiarui Ye et.al. 2504.21051 null
2025-04-23 Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh MD Thamed Bin Zaman Chowdhury et.al. 2504.21025 null
2025-04-29 Jekyll-and-Hyde Tipping Point in an AI’s Behavior Neil F. Johnson et.al. 2504.20980 null
2025-04-29 ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification Ziqing Fan et.al. 2504.20930 link
2025-04-29 Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records Jesus Lovon et.al. 2504.20547 null
2025-04-30 Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging Neil K. R. Sehgal et.al. 2504.20519 null
2025-04-29 “I’ve talked to ChatGPT about my issues last night.”: Examining Mental Health Conversations with Large Language Models through Reddit Analysis Kyuha Jung et.al. 2504.20320 null
2025-04-28 OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis Jinglin He et.al. 2504.20118 null
2025-04-28 Transforming Evidence Synthesis: A Systematic Review of the Evolution of Automated Meta-Analysis in the Age of AI Lingbo Li et.al. 2504.20113 null
2025-04-15 Recommending Clinical Trials for Online Patient Cases using Artificial Intelligence Joey Chan et.al. 2504.20059 null
2025-04-28 Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI Hugo Georgenthum et.al. 2504.19918 null
2025-04-28 A Tripartite Perspective on GraphRAG Michael Banf et.al. 2504.19667 null
2025-04-28 m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training Meng Xiao et.al. 2504.19565 null
2025-05-01 BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text Jiageng Wu et.al. 2504.19467 link
2025-04-27 HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer’s Disease Qiuhui Chen et.al. 2504.19075 null
2025-04-27 Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models Anindya Bijoy Das et.al. 2504.19061 null
2025-04-26 AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression Dong Whi Yoo et.al. 2504.18932 null
2025-04-26 Clinical knowledge in LLMs does not translate to human interactions Andrew M. Bean et.al. 2504.18919 link
2025-04-25 Proof-of-TBI – Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction Ross Gore et.al. 2504.18671 null
2025-04-22 Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes Guanchen Wu et.al. 2504.18569 null
2025-04-25 Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers Jared Moore et.al. 2504.18412 link
2025-04-25 MAGI: Multi-Agent Guided Interview for Psychiatric Assessment Guanqun Bi et.al. 2504.18260 null
2025-04-25 Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization Wataru Kawakami et.al. 2504.18080 null
2025-05-05 Optimism, Expectation, or Sarcasm? Multi-Class Hope Speech Detection in Spanish and English Sabur Butt et.al. 2504.17974 null
2025-04-24 LLM Agent Swarm for Hypothesis-Driven Drug Discovery Kevin Song et.al. 2504.17967 null
2025-04-24 Replay to Remember: Retaining Domain Knowledge in Streaming Language Models Sneh Pillai et.al. 2504.17780 null
2025-04-24 Towards a HIPAA Compliant Agentic AI System in Healthcare Subash Neupane et.al. 2504.17669 null
2025-04-24 PatientDx: Merging Large Language Models for Protecting Data-Privacy in Healthcare Jose G. Moreno et.al. 2504.17360 null
2025-04-24 Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues Jinfeng Zhou et.al. 2504.17238 null
2025-04-25 The Rise of Small Language Models in Healthcare: A Comprehensive Survey Muskan Garg et.al. 2504.17119 null
2025-04-23 Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study Andy Li et.al. 2504.16601 null
2025-04-23 Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition Zhenguang Zhong et.al. 2504.16504 null
2025-04-23 ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs Fahmida Liza Piya et.al. 2504.16394 link
2025-04-22 Investigating LLMs in Clinical Triage: Promising Capabilities, Persistent Intersectional Biases Joseph Lee et.al. 2504.16273 null
2025-04-21 Measuring Interest Group Positions on Legislation: An AI-Driven Analysis of Lobbying Reports Jiseon Kim et.al. 2504.15333 link
2025-04-21 Med-CoDE: Medical Critique based Disagreement Evaluation Framework Mohit Gupta et.al. 2504.15330 null
2025-04-21 POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications Chunjing Gan et.al. 2504.14917 null
2025-04-25 A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs Yihan Lin et.al. 2504.14657 null
2025-04-20 HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models Fan Gao et.al. 2504.14594 null
2025-04-19 Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations Katie Matton et.al. 2504.14150 link
2025-04-18 A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task Laerdon Kim et.al. 2504.14066 null
2025-04-17 Deep literature reviews: an application of fine-tuned language models to migration research Stefano M. Iacus et.al. 2504.13685 null
2025-04-18 LLM Sensitivity Evaluation Framework for Clinical Diagnosis Chenwei Yan et.al. 2504.13475 null
2025-04-17 ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images Sangwook Kim et.al. 2504.13023 null
2025-04-17 Chinese-Vicuna: A Chinese Instruction-following Llama-based Model Chenghao Fan et.al. 2504.12737 null
2025-04-16 Leveraging Large Language Models for Multi-Class and Multi-Label Detection of Drug Use and Overdose Symptoms on Social Media Muhammad Ahmad et.al. 2504.12355 null
2025-04-15 A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports Jing Wang et.al. 2504.12350 null
2025-04-14 Paging Dr. GPT: Extracting Information from Clinical Notes to Enhance Patient Predictions David Anderson et.al. 2504.12338 null
2025-04-14 “It Listens Better Than My Therapist”: Exploring Social Media Discourse on LLMs as Mental Health Tool Anna-Carolina Haensch et.al. 2504.12337 null
2025-04-13 QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model Zongxian Yang et.al. 2504.12334 null
2025-04-12 Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis Shahriar Noroozizadeh et.al. 2504.12326 null
2025-04-18 Selective Attention Federated Learning: Improving Privacy and Efficiency for Clinical Text Classification Yue Li et.al. 2504.11793 null
2025-04-16 Large Language Models for Drug Overdose Prediction from Longitudinal Medical Records Md Sultan Al Nahian et.al. 2504.11792 null
2025-04-16 Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets Yongpei Ma et.al. 2504.11777 null
2025-04-15 Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions Wang Bill Zhu et.al. 2504.11373 link
2025-04-15 Learning to Be A Doctor: Searching for Effective Medical Agent Architectures Yangyang Zhuang et.al. 2504.11301 null
2025-04-26 Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs Yingjian Chen et.al. 2504.10982 null
2025-04-15 Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content Yilang Peng et.al. 2504.10878 null
2025-04-13 Federated Learning with Layer Skipping: Efficient Training of Large Language Models for Healthcare NLP Lihong Zhang et.al. 2504.10536 null
2025-04-08 Exposure to Content Written by Large Language Models Can Reduce Stigma Around Opioid Use Disorder in Online Communities Shravika Mittal et.al. 2504.10501 null
2025-04-14 CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation Jing Chen et.al. 2504.10418 null
2025-04-14 Performance of Large Language Models in Supporting Medical Diagnosis and Treatment Diogo Sousa et.al. 2504.10405 null
2025-04-20 Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families Shahriar Noroozizadeh et.al. 2504.10340 null
2025-04-20 Emotional Strain and Frustration in LLM Interactions in Software Engineering Cristina Martinez Montes et.al. 2504.10050 null
2025-04-19 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety Jiahao Qiu et.al. 2504.09689 link
2025-04-15 ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model Wuyang Lan et.al. 2504.09421 link
2025-04-12 Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries Koustuv Saha et.al. 2504.09271 null
2025-04-04 The Lyme Disease Controversy: An AI-Driven Discourse Analysis of a Quarter Century of Academic Debate and Divides Teo Susnjak et.al. 2504.08777 link
2025-04-01 Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation Xiaofan Zhou et.al. 2504.08768 null
2025-04-11 MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models Junmo Kim et.al. 2504.08329 link
2025-04-24 Can Reasoning LLMs Enhance Clinical Document Classification? Akram Mustafa et.al. 2504.08040 null
2025-04-14 Psychological Health Knowledge-Enhanced LLM-based Social Network Crisis Intervention Text Transfer Recognition Method Shurui Wu et.al. 2504.07983 null
2025-04-11 An LLM-Driven Multi-Agent Debate System for Mendelian Diseases Xinyang Zhou et.al. 2504.07881 null
2025-04-10 MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval-Augmented Generation Yixiang Chen et.al. 2504.07724 link
2025-04-17 PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization Yang Jiao et.al. 2504.07717 null
2025-04-10 Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction Kyoyun Choi et.al. 2504.07415 null
2025-04-09 Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging Siyuan Dai et.al. 2504.07336 null
2025-04-09 A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation Fatemeh Amrollahi et.al. 2504.07278 null
2025-04-09 Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis Umakanta Maharana et.al. 2504.06581 link
2025-04-08 Human Trust in AI Search: A Large-Scale Experiment Haiwen Li et.al. 2504.06435 null
2025-04-08 A Geometric-Aware Perspective and Beyond: Hybrid Quantum-Classical Machine Learning Methods Azadeh Alavia et.al. 2504.06328 null
2025-04-08 LExT: Towards Evaluating Trustworthiness of Natural Language Explanations Krithi Shailya et.al. 2504.06227 null
2025-04-08 TxGemma: Efficient and Agentic LLMs for Therapeutics Eric Wang et.al. 2504.06196 null
2025-04-11 Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups Rijul Magu et.al. 2504.06160 null
2025-04-08 How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM Jirong Zha et.al. 2504.05786 null
2025-04-07 The challenge of uncertainty quantification of large language models in medicine Zahra Atf et.al. 2504.05278 null
2025-04-07 On the Performance of an Explainable Language Model on PubMedQA Venkat Srinivasan et.al. 2504.05074 null
2025-04-07 Leveraging Large Language Models for Cost-Effective, Multilingual Depression Detection and Severity Assessment Longdi Xian et.al. 2504.04891 null
2025-04-07 Simulating Persuasive Dialogues on Meat Reduction with Generative Agents Georg Ahnert et.al. 2504.04872 link
2025-04-08 Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide Zhijie Duan et.al. 2504.04346 null
2025-04-06 MedM-VL: What Makes a Good Medical LVLM? Yiming Shi et.al. 2504.04323 link
2025-04-05 AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs Xinyu Mao et.al. 2504.04193 link
2025-04-05 A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models Aviv Brokman et.al. 2504.04083 null
2025-04-15 Do “New Snow Tablets” Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs Sifan Li et.al. 2504.03786 link
2025-04-02 Emerging Cyber Attack Risks of Medical AI Agents Jianing Qiu et.al. 2504.03759 null
2025-04-03 AD-GPT: Large Language Models in Alzheimer’s Disease Ziyu Liu et.al. 2504.03071 null
2025-04-03 Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models Chengyang He et.al. 2504.03051 null
2025-04-03 Bias in Large Language Models Across Clinical Applications: A Systematic Review Thanathip Suenghataiphorn et.al. 2504.02917 null
2025-04-16 OnRL-RAG: Real-Time Personalized Mental Health Dialogue System Ahsan Bilal et.al. 2504.02894 null
2025-04-01 TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet Luis Felipe et.al. 2504.02874 null
2025-04-01 Synthesized Annotation Guidelines are Knowledge-Lite Boosters for Clinical Information Extraction Enshuo Hsu et.al. 2504.02871 null
2025-04-04 A Survey of Large Language Models in Mental Health Disorder Detection on Social Media Zhuohan Ge et.al. 2504.02800 null
2025-04-03 AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology Xiang Feng et.al. 2504.02404 link
2025-04-02 Trapped by Expectations: Functional Fixedness in LLM-Enabled Chat Search Jiqun Liu et.al. 2504.02074 null
2025-04-02 Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment Abdelrahaman A. Hassan et.al. 2504.01767 null
2025-04-01 Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models Feng Chen et.al. 2504.01216 null
2025-04-01 Medical large language models are easily distracted Krithik Vishwanath et.al. 2504.01201 link
2025-04-04 MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs Juncheng Wu et.al. 2504.00993 link
2025-04-01 InformGen: An AI Copilot for Accurate and Compliant Clinical Research Consent Document Generation Zifeng Wang et.al. 2504.00934 null
2025-04-01 m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models Xiaoke Huang et.al. 2504.00869 null
2025-04-01 IHC-LLMiner: Automated extraction of tumour immunohistochemical profiles from PubMed abstracts using large language models Yunsoo Kim et.al. 2504.00748 null
2025-03-31 Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology Dou Liu et.al. 2504.00061 null
2025-03-31 Integrating Large Language Models with Human Expertise for Disease Detection in Electronic Health Records Jie Pan et.al. 2504.00053 null
2025-03-27 Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1 Birger Moell et.al. 2504.00016 null
2025-03-31 A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG Arshia Kermani et.al. 2503.24307 null
2025-03-31 IntelliCircos: A Data-driven and AI-powered Authoring Tool for Circos Plots Mingyang Gu et.al. 2503.24021 null
2025-03-31 Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection Mahdi Amiri et.al. 2503.23873 null
2025-03-30 When LLM Therapists Become Salespeople: Evaluating Large Language Models for Ethical Motivational Interviewing Haein Kong et.al. 2503.23566 null
2025-04-01 A Scalable Framework for Evaluating Health Language Models Neil Mallinar et.al. 2503.23339 null
2025-03-29 Prediction of 30-day hospital readmission with clinical notes and EHR information Tiago Almeida et.al. 2503.23050 null
2025-04-03 Agentic Large Language Models, a survey Aske Plaat et.al. 2503.23037 null
2025-03-29 A Retrieval-Augmented Knowledge Mining Method with Deep Thinking LLMs for Biomedical Research and Clinical Support Yichun Feng et.al. 2503.23029 null
2025-03-29 Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective Xinyu Yao et.al. 2503.22954 null
2025-03-28 MediTools – Medical Education Powered by LLMs Amr Alshatnawi et.al. 2503.22769 link
2025-03-26 Susceptibility of Large Language Models to User-Driven Factors in Medical Queries Kyung Ho Lim et.al. 2503.22746 null
2025-03-25 LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation Sarah Martinson et.al. 2503.22719 link
2025-03-28 Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions Mohammad Almansoori et.al. 2503.22678 null
2025-04-08 Modeling Challenging Patient Interactions: LLMs for Medical Communication Training Anna Bodonhelyi et.al. 2503.22250 null
2025-03-31 PharmAgents: Building a Virtual Pharma with Large Language Model Agents Bowen Gao et.al. 2503.22164 null
2025-03-28 Leveraging LLMs for Predicting Unknown Diagnoses from Clinical Notes Dina Albassam et.al. 2503.22092 null
2025-03-27 Socially Constructed Treatment Plans: Analyzing Online Peer Interactions to Understand How Patients Navigate Complex Medical Conditions Madhusudan Basak et.al. 2503.21986 null
2025-03-27 RedditESS: A Mental Health Social Support Interaction Dataset – Understanding Effective Social Support to Refine AI-Driven Support Tools Zeyad Alghamdi et.al. 2503.21888 null
2025-03-27 Combining Artificial Users and Psychotherapist Assessment to Evaluate Large Language Model-based Mental Health Chatbots Florian Onur Kuhlmeier et.al. 2503.21540 null
2025-03-27 Fine-Tuning LLMs on Small Medical Datasets: Text Classification and Normalization Effectiveness on Cardiology reports and Discharge records Noah Losch et.al. 2503.21349 null
2025-03-26 Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters Mahmoud Alwakeel et.al. 2503.21004 null
2025-03-26 Clean & Clear: Feasibility of Safe LLM Clinical Guidance Julia Ive et.al. 2503.20953 null
2025-03-26 TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews Huimin Xu et.al. 2503.20666 null
2025-03-26 TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes Raj Sanjay Shah et.al. 2503.20648 null
2025-03-26 Low-resource Information Extraction with the European Clinical Case Corpus Soumitra Ghosh et.al. 2503.20568 null
2025-03-26 Explainable ICD Coding via Entity Linking Leonor Barreiros et.al. 2503.20508 null
2025-03-26 Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering Zehui Liao et.al. 2503.20504 null
2025-03-25 Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder Changye Li et.al. 2503.20103 link
2025-03-25 Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications Ben Rahman et.al. 2503.19276 null
2025-03-25 PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping Sarah Pungitore et.al. 2503.19265 null
2025-03-24 Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages Tadesse Destaw Belay et.al. 2503.18253 null
2025-03-26 PG-SAM: Prior-Guided SAM with Medical for Multi-organ Segmentation Yiheng Zhong et.al. 2503.18227 link
2025-03-23 AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs Diwei Wang et.al. 2503.18141 null
2025-03-23 Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook Xu Zheng et.al. 2503.18016 null
2025-03-23 Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA Justice Ou et.al. 2503.17933 link
2025-03-23 MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation Hsin-Ling Hsu et.al. 2503.17900 null
2025-03-22 Satisfactory Medical Consultation based on Terminology-Enhanced Information Retrieval and Emotional In-Context Learning Kaiwen Zuo et.al. 2503.17876 null
2025-03-22 MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation Xiaodan Zhang et.al. 2503.17784 link
2025-03-22 GPBench: A Comprehensive and Fine-Grained Benchmark for Evaluating Large Language Models as General Practitioners Zheqing Li et.al. 2503.17599 null
2025-03-21 Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent Humza Nusrat et.al. 2503.17553 null
2025-03-21 An LLM-Powered Clinical Calculator Chatbot Backed by Verifiable Clinical Calculators and their Metadata Niranjan Kumar et.al. 2503.17550 null
2025-03-21 Reimagining Support: Exploring Autistic Individuals’ Visions for AI in Coping with Negative Self-Talk Buse Carik et.al. 2503.17504 null
2025-03-21 Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP Veysel Kocaman et.al. 2503.17425 null
2025-03-21 Understanding Social Support Needs in Questions: A Hybrid Approach Integrating Semi-Supervised Learning and LLM-based Data Augmentation Junwei Kuang et.al. 2503.17421 null
2025-03-21 Automating Adjudication of Cardiovascular Events Using Large Language Models Sonish Sivarajkumar et.al. 2503.17222 null
2025-03-20 Automated Harmfulness Testing for Code Large Language Models Honghao Tan et.al. 2503.16740 null
2025-03-18 From Patient Consultations to Graphs: Leveraging LLMs for Patient Journey Knowledge Graph Construction Hassan S. Al Khatib et.al. 2503.16533 null
2025-03-18 Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine Chengfeng Dou et.al. 2503.16530 null
2025-03-20 OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence Long Yuan et.al. 2503.16326 null
2025-03-21 Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 Peiran Gu et.al. 2503.16304 null
2025-03-21 MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering Feiyang Li et.al. 2503.16131 null
2025-03-20 BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models Zenghui Yuan et.al. 2503.16023 null
2025-03-20 Towards Automatic Continual Learning: A Self-Adaptive Framework for Continual Instruction Tuning Peiyi Lin et.al. 2503.15924 null
2025-03-20 DeepPsy-Agent: A Stage-Aware and Deep-Thinking Emotional Support Agent System Kai Chen et.al. 2503.15876 null
2025-03-19 Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation Hisashi Johno et.al. 2503.15664 null
2025-03-27 Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems Yuelyu Ji et.al. 2503.15454 null
2025-03-19 Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data Anatole Callies et.al. 2503.15374 link
2025-03-19 Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks Yuting Guo et.al. 2503.15169 null
2025-03-28 Envisioning an AI-Enhanced Mental Health Ecosystem Kellie Yu Hui Sim et.al. 2503.14883 null
2025-03-18 Generating Medically-Informed Explanations for Depression Detection using LLMs Xiangyong Chen et.al. 2503.14671 null
2025-03-18 MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation Kai Chen et.al. 2503.13856 null
2025-03-14 RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration Hong Qing Yu et.al. 2503.13514 null
2025-03-13 It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education Shrutika Singh et.al. 2503.13508 null
2025-03-17 Reliable and Efficient Amortized Model-based Evaluation Sang Truong et.al. 2503.13335 null
2025-03-24 LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation Xiaodi Li et.al. 2503.13281 null
2025-03-17 MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways Zhen Chen et.al. 2503.13205 null
2025-03-16 From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations Sarvesh Baskar et.al. 2503.12556 null
2025-03-15 Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes Da Wu et.al. 2503.12286 null
2025-03-15 TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation Mayank Kumar et.al. 2503.12217 null
2025-03-20 Applications of Large Language Model Reasoning in Feature Generation Dharani Chandra et.al. 2503.11989 null
2025-03-14 Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages Jiyeong Kim et.al. 2503.11384 null
2025-03-14 TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools Shanghua Gao et.al. 2503.10970 link
2025-03-12 CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries Zhiyuan Wang et.al. 2503.10707 null
2025-03-12 Medical Large Language Model Benchmarks Should Prioritize Construct Validity Ahmed Alaa et.al. 2503.10694 null
2025-03-13 Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models Afrar Jahin et.al. 2503.10573 null
2025-03-13 LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions Gaurav Kumar Gupta et.al. 2503.10486 null
2025-03-13 Cognitive-Mental-LLM: Leveraging Reasoning in Large Language Models for Mental Health Prediction via Online Text Avinash Patil et.al. 2503.10095 link
2025-03-12 Review GIDE – Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models Timothy Laurence et.al. 2503.09743 null
2025-03-12 LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics Jialiang Tang et.al. 2503.09656 null
2025-03-16 Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy Abe Bohan Hou et.al. 2503.09639 null
2025-03-12 RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports Jiushen Cai et.al. 2503.09358 null
2025-03-12 A Survey on Enhancing Causal Reasoning Ability of Large Language Models Xin Li et.al. 2503.09326 null
2025-03-12 VaxGuard: A Multi-Generator, Multi-Type, and Multi-Role Dataset for Detecting LLM-Generated Vaccine Misinformation Syed Talal Ahmad et.al. 2503.09103 null
2025-03-12 Teaching LLMs How to Learn with Contextual Fine-Tuning Younwoo Choi et.al. 2503.09032 null
2025-03-11 Towards Scalable and Cross-Lingual Specialist Language Models for Oncology Morteza Rohanian et.al. 2503.08323 null
2025-03-10 Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan Matthias Schöffel et.al. 2503.07827 null
2025-03-20 MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Xiangru Tang et.al. 2503.07459 link
2025-03-10 Anatomy-Aware Conditional Image-Text Retrieval Meng Zheng et.al. 2503.07456 null
2025-03-10 Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment Xing Xie et.al. 2503.07334 link
2025-03-10 Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies Luyi Jiang et.al. 2503.07306 null
2025-03-10 A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images Xiaoyi Liang et.al. 2503.07094 null
2025-03-10 TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine Tianai Huang et.al. 2503.07041 null
2025-03-10 Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation Zhi Qin et.al. 2503.07032 null
2025-03-09 Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia Sabeen Ahmed et.al. 2503.06797 null
2025-03-09 Why Pre-trained Models Fail: Feature Entanglement in Multi-modal Depression Detection Xiangyu Zhang et.al. 2503.06620 null
2025-03-09 ExKG-LLM: Leveraging Large Language Models for Automated Expansion of Cognitive Neuroscience Knowledge Graphs Ali Sarabadani et.al. 2503.06479 null
2025-03-09 AXAI-CDSS: An Affective Explainable AI-Driven Clinical Decision Support System for Cannabis Use Tongze Zhang et.al. 2503.06463 null
2025-03-08 CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset Oriel Perets et.al. 2503.06204 link
2025-03-08 Towards Conversational AI for Disease Management Anil Palepu et.al. 2503.06074 null
2025-03-01 MedSimAI: Simulation and Formative Feedback Generation to Enhance Deliberate Practice in Medical Education Yann Hicke et.al. 2503.05793 null
2025-03-07 Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering Yusong Ke et.al. 2503.05505 null
2025-03-07 GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation Zhenxuan Zhang et.al. 2503.05347 link
2025-03-06 HILGEN: Hierarchically-Informed Data Generation for Biomedical NER Using Knowledgebases and Large Language Models Yao Ge et.al. 2503.04930 null
2025-03-10 Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases Pengcheng Qiu et.al. 2503.04691 null
2025-03-06 Large Language Models in Bioinformatics: A Survey Zhenyu Wang et.al. 2503.04490 null
2025-03-06 TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records Hejie Cui et.al. 2503.04176 null
2025-03-06 KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease Yongchao Long et.al. 2503.04153 link
2025-03-06 Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting Jiyue Jiang et.al. 2503.04013 null
2025-03-06 RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models Wenhui Zhu et.al. 2503.03987 null
2025-03-05 RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction Fenglin Liu et.al. 2503.03802 link
2025-03-05 Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks Zihao Zhao et.al. 2503.03687 link
2025-03-05 Psy-Copilot: Visual Chain of Thought for Counseling Keqi Chen et.al. 2503.03645 null
2025-03-05 Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling Keqi Chen et.al. 2503.03607 null
2025-03-05 Structured Outputs Enable General-Purpose LLMs to be Medical Experts Guangfu Guo et.al. 2503.03194 null
2025-03-04 From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance Jiacheng Tang et.al. 2503.02760 null
2025-03-04 The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats William Brach et.al. 2503.02650 link
2025-03-04 BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA Zhengyang Ji et.al. 2503.02476 link
2025-03-04 MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics Haoan Jin et.al. 2503.02374 null
2025-03-06 EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports Lama Moukheiber et.al. 2503.02365 null
2025-03-04 Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm Zhuo Li et.al. 2503.02359 null
2025-03-03 Biomedical Foundation Model: A Survey Xiangrui Liu et.al. 2503.02104 null
2025-02-28 PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice Ruoxi Wang et.al. 2503.01903 null
2025-03-03 SHADE-AD: An LLM-Based Framework for Synthesizing Activity Data of Alzheimer’s Patients Heming Fu et.al. 2503.01768 null
2025-03-03 Designing VR Simulation System for Clinical Communication Training with LLMs-Based Embodied Conversational Agents Xiuqi Tommy Zhu et.al. 2503.01767 null
2025-03-03 Distilled Prompt Learning for Incomplete Multimodal Survival Prediction Yingxue Xu et.al. 2503.01653 null
2025-03-03 Leveraging LLMs for Mental Health: Detection and Recommendations from Social Discussions Vaishali Aggarwal et.al. 2503.01442 null
2025-03-03 Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation Linhai Zhang et.al. 2503.01315 null
2025-03-03 Cancer Type, Stage and Prognosis Assessment from Pathology Reports using LLMs Rachit Saluja et.al. 2503.01194 link
2025-03-03 Large Language Models for Healthcare Text Classification: A Systematic Review Hajar Sakai et.al. 2503.01159 null
2025-03-02 Language-agnostic, automated assessment of listeners’ speech recall using large language models Björn Herrmann et.al. 2503.01045 null
2025-03-02 FunBench: Benchmarking Fundus Reading Skills of MLLMs Qijie Wei et.al. 2503.00901 null
2025-03-02 Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies Tianyi Huang et.al. 2503.00724 null
2025-03-01 Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires Kyle Gao et.al. 2503.00566 null
2025-03-01 NeuroSymAD: A Neuro-Symbolic Framework for Interpretable Alzheimer’s Disease Diagnosis Yexiao He et.al. 2503.00510 null
2025-03-01 NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews Vedant Khandelwal et.al. 2503.00278 null
2025-03-01 Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy Jahan C. Penny-Dimri et.al. 2503.00269 null
2025-02-24 Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application Carlos Luengo Vera et.al. 2503.00025 null
2025-02-28 A Non-contrast Head CT Foundation Model for Comprehensive Neuro-Trauma Triage Youngjin Yoo et.al. 2502.21106 null
2025-02-28 Explainable Biomedical Claim Verification with Large Language Models Siting Liang et.al. 2502.21014 null
2025-02-28 Merging Clinical Knowledge into Large Language Models for Medical Research and Applications: A Survey Qiyuan Li et.al. 2502.20988 null
2025-02-28 ProAI: Proactive Multi-Agent Conversational AI with Structured Knowledge Base for Psychiatric Diagnosis Yuqi Wu et.al. 2502.20689 null
2025-02-28 NutriGen: Personalized Meal Plan Generator Leveraging Large Language Models to Enhance Dietary and Nutritional Adherence Saman Khamesian et.al. 2502.20601 link
2025-02-27 CoCa-CXR: Contrastive Captioners Learn Strong Temporal Structures for Chest X-Ray Vision-Language Understanding Yixiong Chen et.al. 2502.20509 null
2025-02-27 KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model Kai Zhang et.al. 2502.20350 null
2025-02-27 Expertise Is What We Want Alan Ashworth et.al. 2502.20335 null
2025-02-27 MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue Yujia Chen et.al. 2502.19860 null
2025-03-03 R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning Minggui He et.al. 2502.19735 null
2025-02-27 Preference Learning Unlocks LLMs’ Psycho-Counseling Skills Mian Zhang et.al. 2502.19731 null
2025-02-27 SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning Mingsheng Cai et.al. 2502.19668 null
2025-02-26 Repurposing the scientific literature with vision-language models Anton Alyakin et.al. 2502.19546 null
2025-02-26 Conversational Planning for Personal Plans Konstantina Christakopoulou et.al. 2502.19500 null
2025-02-26 MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis Daniel Rose et.al. 2502.19175 null
2025-02-26 Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection Carter Adams et.al. 2502.18823 null
2025-02-26 TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation Chenlu Ju et.al. 2502.18712 link
2025-02-23 RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis Jianwei Wang et.al. 2502.18517 null
2025-02-26 Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support Guoxin Wang et.al. 2502.18274 link
2025-02-25 DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning Pusheng Xu et.al. 2502.17947 null
2025-02-25 Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation Tong Li et.al. 2502.17899 null
2025-02-24 Wearable Meets LLM for Stress Management: A Duoethnographic Study Integrating Wearable-Triggered Stressors and LLM Chatbots for Personalized Interventions Sameer Neupane et.al. 2502.17650 null
2025-02-24 Towards Conditioning Clinical Text Generation for User Control Osman Alperen Koraş et.al. 2502.17571 null
2025-02-18 User Intent to Use DeepSeek for Healthcare Purposes and their Trust in the Large Language Model: Multinational Survey Study Avishek Choudhury et.al. 2502.17487 null
2025-03-04 Large Language Models are Powerful EHR Encoders Stefan Hegselmann et.al. 2502.17403 link
2025-02-24 Real-time Monitoring of Economic Shocks using Company Websites Michael Koenig et.al. 2502.17161 null
2025-02-24 Applications of Large Models in Medicine YunHe Su et.al. 2502.17132 null
2025-02-23 GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking Yingjian Chen et.al. 2502.16514 null
2025-02-22 Large Language Model for Lossless Image Compression with Visual Prompts Junhao Du et.al. 2502.16163 null
2025-02-25 Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation Won Seok Jang et.al. 2502.16022 null
2025-02-21 AutoMedPrompt: A New Framework for Optimizing LLM Medical Prompts Using Textual Gradients Sean Wu et.al. 2502.15944 null
2025-02-21 “Kya family planning after marriage hoti hai?”: Integrating Cultural Sensitivity in an LLM Chatbot for Reproductive Health Roshini Deva et.al. 2502.15939 null
2025-02-21 CVE-LLM: Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models Rikhiya Ghosh et.al. 2502.15932 null
2025-02-21 A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare Manar Aljohani et.al. 2502.15871 null
2025-02-21 MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models Suraj Racha et.al. 2502.15418 link
2025-02-20 Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson’s Disease Elliot Schumacher et.al. 2502.15069 null
2025-02-20 Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning Shuyue Stella Li et.al. 2502.14860 link
2025-02-20 Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning Juraj Vladika et.al. 2502.14765 link
2025-02-21 Data-Constrained Synthesis of Training Data for De-Identification Thomas Vakili et.al. 2502.14677 null
2025-02-20 FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis Mingyi Jia et.al. 2502.14614 null
2025-02-20 MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models Shrey Pandit et.al. 2502.14302 null
2025-02-20 Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment Jiaxi Li et.al. 2502.14275 null
2025-03-03 QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification Hajar Sakai et.al. 2502.14189 null
2025-02-18 Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics Kabir Kumar et.al. 2502.13982 null
2025-02-19 Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health Xingbo Wang et.al. 2502.13920 link
2025-02-19 VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare Anudeex Shetty et.al. 2502.13775 null
2025-02-19 Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs Yushi Feng et.al. 2502.13555 link
2025-02-19 Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion Shuai Niu et.al. 2502.13509 null
2025-02-19 Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning Yang Yan et.al. 2502.13447 null
2025-02-19 RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering Sichu Liang et.al. 2502.13361 null
2025-02-18 Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare Hiba Ahsan et.al. 2502.13319 null
2025-02-18 SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Yucheng Shi et.al. 2502.13233 null
2025-02-18 Private Text Generation by Seeding Large Language Model Prompts Supriya Nagesh et.al. 2502.13193 null
2025-02-18 Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge Mohammad Reza Rezaei et.al. 2502.13010 null
2025-02-18 An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation Mohammad Feli et.al. 2502.12836 null
2025-02-18 Baichuan-M1: Pushing the Medical Capability of Large Language Models Bingning Wang et.al. 2502.12671 null
2025-02-18 Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs: Evidence and Mechanisms for AI Agents to Inform Policy Decisions Karthik Sreedhar et.al. 2502.12504 null
2025-02-18 USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner Mingcong Chen et.al. 2502.12498 null
2025-02-14 Leveraging large language models for structured information extraction from pathology reports Jeya Balaji Balasubramanian et.al. 2502.12183 link
2025-02-17 Exploring Large Language Models in Healthcare: Insights into Corpora Sources, Customization Strategies, and Evaluation Metrics Shuqi Yang et.al. 2502.11861 null
2025-02-17 LLM Agents Making Agent Tools Georg Wölflein et.al. 2502.11705 link
2025-02-17 CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation Guangya Yu et.al. 2502.11703 null
2025-02-17 A Survey of Personalized Large Language Models: Progress and Future Directions Jiahong Liu et.al. 2502.11528 link
2025-02-16 A Survey of LLM-based Agents in Medicine: How far are we from Baymax? Wenxuan Wang et.al. 2502.11211 null
2025-02-16 Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications Alexandru Lecu et.al. 2502.11108 link
2025-02-16 A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions Hongbin Na et.al. 2502.11095 null
2025-02-16 SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information Xiangyu Zhang et.al. 2502.10950 null
2025-02-15 Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia Rohith Perumandla et.al. 2502.10896 null
2025-02-15 ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis Xueshen Li et.al. 2502.10620 null
2025-02-14 Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes Ezinne Nwankwo et.al. 2502.10605 null
2025-02-21 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation Tianwei Lin et.al. 2502.09838 link
2025-02-12 Cancer Vaccine Adjuvant Name Recognition from Biomedical Literature using Large Language Models Hasin Rehana et.al. 2502.09659 null
2025-02-17 Zero-shot generation of synthetic neurosurgical data with large language models Austin A. Barr et.al. 2502.09566 link
2025-02-13 Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs Chang Liu et.al. 2502.09156 null
2025-02-13 Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech Jonathan Pofcher et.al. 2502.09004 null
2025-02-13 Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning Leon Nissen et.al. 2502.08954 link
2025-02-12 Assessing the Impact of the Quality of Textual Data on Feature Representation and Machine Learning Models Tabinda Sarwar et.al. 2502.08669 null
2025-02-12 SycEval: Evaluating LLM Sycophancy Aaron Fanous et.al. 2502.08177 null
2025-02-12 Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD) Naomi Akhras et.al. 2502.08073 null
2025-02-11 Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature? Hye Sun Yun et.al. 2502.07963 link
2025-02-12 Beyond Prompting: Time2Lang – Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Arvind Pillai et.al. 2502.07608 link
2025-02-11 Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning Jiayuan Zhu et.al. 2502.07143 null
2025-02-10 Interactive Data Harmonization with LLM Agents Aécio Santos et.al. 2502.07132 null
2025-02-09 LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison Gabriele De Vito et.al. 2502.06890 null
2025-02-06 Integrating Generative Artificial Intelligence in ADRD: A Framework for Streamlining Diagnosis and Care in Neurodegenerative Diseases Andrew G. Breithaupt et.al. 2502.06842 null
2025-02-04 Diffusion Instruction Tuning Chen Jin et.al. 2502.06814 null
2025-02-10 Automatic Evaluation of Healthcare LLMs Beyond Question-Answering Anna Arias-Duart et.al. 2502.06666 null
2025-02-10 Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy Kamyar Kazari et.al. 2502.06150 null
2025-02-09 HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents Mohammad Amin Abbasi et.al. 2502.05982 null
2025-02-09 A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography Nicholas Evans et.al. 2502.05926 null
2025-02-09 Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models Shiyu Teng et.al. 2502.05879 null
2025-02-09 Large Language Model-based Nonnegative Matrix Factorization For Cardiorespiratory Sound Separation Yasaman Torabi et.al. 2502.05757 null
2025-02-09 RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care Ziqi Yang et.al. 2502.05740 null
2025-02-08 KMI: A Dataset of Korean Motivational Interviewing Dialogues for Psychotherapy Hyunjong Kim et.al. 2502.05651 null
2025-02-08 ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports Aynur Guluzade et.al. 2502.05638 link
2025-02-08 OntoTune: Ontology-Driven Self-training for Aligning Large Language Models Zhiqiang Liu et.al. 2502.05478 link
2025-02-12 Safety at Scale: A Comprehensive Survey of Large Model Safety Xingjun Ma et.al. 2502.05206 link
2025-02-07 “It Felt Like I Was Left in the Dark”: Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings Shihan Fu et.al. 2502.05115 null
2025-02-07 Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy Rishabh Upadhyay et.al. 2502.04666 null
2025-02-05 Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning Jonathan Kim et.al. 2502.04381 null
2025-02-04 Open Foundation Models in Healthcare: Challenges, Paradoxes, and Opportunities with GenAI Driven Personalized Prescription Mahdi Alkaeed et.al. 2502.04356 null
2025-02-04 JingFang: A Traditional Chinese Medicine Large Language Model of Expert-Level Medical Diagnosis and Syndrome Differentiation-Based Treatment Yehan Yan et.al. 2502.04345 null
2025-02-06 Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond Mardhiyah Sanni et.al. 2502.03945 null
2025-02-05 A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause Roshini Deva et.al. 2502.03579 null
2025-02-05 MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters Amin Dada et.al. 2502.03298 null
2025-02-05 MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation Seonok Kim et.al. 2502.03004 null
2025-02-05 CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration Yizhe Yang et.al. 2502.02807 null
2025-02-04 Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation Atharva Mangeshkumar Agrawal et.al. 2502.02249 null
2025-02-02 Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model Hadas Ben-Atya et.al. 2502.01691 null
2025-02-03 OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology Chengfeng Zhou et.al. 2502.01243 null
2025-02-02 Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale Cliff Wong et.al. 2502.00943 null
2025-02-02 Generalization of Medical Large Language Models through Cross-Domain Weak Supervision Robert Long et.al. 2502.00832 null
2025-01-31 Fairshare Data Pricing for Large Language Models Luyang Zhang et.al. 2502.00198 null
2025-01-31 DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets Abdurrahim Yilmaz et.al. 2502.00196 null
2025-02-04 AIN: The Arabic INclusive Large Multimodal Model Ahmed Heakl et.al. 2502.00094 link
2025-01-30 A Multi-Layered Large Language Model Framework for Disease Prediction Malak Mohamed et.al. 2502.00063 null
2025-01-21 Leveraging Large Language Models to Enhance Machine Learning Interpretability and Predictive Performance: A Case Study on Emergency Department Returns for Mental Health Patients Abdulaziz Ahmed et.al. 2502.00025 null
2025-01-30 Survey and Improvement Strategies for Gene Prioritization with Large Language Models Matthew Neeley et.al. 2501.18794 null
2025-01-30 Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning Maya Kruse et.al. 2501.18724 null
2025-02-03 Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models Manish Sanwal et.al. 2501.18645 null
2025-01-27 Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare Hang Zhang et.al. 2501.18632 null
2025-01-30 GENIE: Generative Note Information Extraction model for structuring EHR data Huaiyuan Ying et.al. 2501.18435 null
2025-01-30 Battery State of Health Estimation Using LLM Framework Aybars Yunusoglu et.al. 2501.18123 null
2025-01-29 Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations Zijie Liu et.al. 2501.17860 null
2025-01-29 LLM Assistance for Pediatric Depression Mariia Ignashina et.al. 2501.17510 null
2025-01-28 Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction Mingyu Derek Ma et.al. 2501.17326 null
2025-01-28 Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology Peilong Wang et.al. 2501.17286 null
2025-01-28 Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care Fengpei Yuan et.al. 2501.17206 null
2025-01-27 A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis Aysegul Ucar et.al. 2501.17190 null
2025-01-28 Adapting Network Information to Semantics for Generalizable and Plug-and-Play Multi-Scenario Network Diagnosis Tiao Tan et.al. 2501.16842 null
2025-01-28 VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records Philip Chung et.al. 2501.16672 link
2025-01-27 A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain Jorge del Pozo Lérida et.al. 2501.16533 null
2025-01-27 Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM Payal Kamboj et.al. 2501.16481 link
2025-01-24 GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration Ziwen Li et.al. 2501.16382 link
2025-01-18 An Integrated Approach to AI-Generated Content in e-health Tasnim Ahmed et.al. 2501.16348 null
2025-01-27 A foundation model for human-AI collaboration in medical literature mining Zifeng Wang et.al. 2501.16255 null
2025-01-27 Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models Huayu Li et.al. 2501.16215 link
2025-01-27 MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer Qi Chen et.al. 2501.15826 null
2025-01-26 Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals Yinzhou Wang et.al. 2501.15599 null
2025-01-25 The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? Ayo Adedeji et.al. 2501.15310 null
2025-01-25 Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training Xunxin Cai et.al. 2501.15108 null
2025-01-25 Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations Harshita Chopra et.al. 2501.15056 null
2025-01-24 Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs Hang Luo et.al. 2501.14892 link
2025-01-24 Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? Ipek Baris Schlicht et.al. 2501.14719 null
2025-01-24 MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications Yixing Jiang et.al. 2501.14654 link
2025-01-24 AI Chatbots as Professional Service Agents: Developing a Professional Identity Wenwen Li et.al. 2501.14179 null
2025-01-23 MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning Joshua Davis et.al. 2501.14105 link
2025-01-23 Leveraging Large Language Models to Analyze Emotional and Contextual Drivers of Teen Substance Use in Online Discussions Jianfeng Zhu et.al. 2501.14037 null
2025-01-23 Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs Bhumika Gupta et.al. 2501.13984 null
2025-01-21 Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) Jadon Geathers et.al. 2501.13957 null
2025-01-20 A Layered Multi-Expert Framework for Long-Context Mental Health Assessments Jinwen Tang et.al. 2501.13951 null
2025-01-14 Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications Arjun R. Malghan et.al. 2501.13936 null
2025-01-23 Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change Mowafak Allaham et.al. 2501.13802 null
2025-01-22 Intelligent Exercise and Feedback System for Social Healthcare using LLMOps Yeongrak Choi et.al. 2501.13723 null
2025-01-23 Question Answering on Patient Medical Records with Private Fine-Tuned LLMs Sara Kothari et.al. 2501.13687 null
2025-01-23 How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization Shezheng Song et.al. 2501.13669 null
2025-01-20 Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness Ambreesh Parthasarathy et.al. 2501.13120 null
2025-01-21 Can open source large language models be used for tumor documentation in Germany? – An evaluation on urological doctors’ notes Stefan Lenz et.al. 2501.12106 link
2025-01-23 Med-R$^2$: Crafting Trustworthy LLM Physicians through Retrieval and Reasoning of Evidence-Based Medicine Keer Lu et.al. 2501.11885 link
2025-01-19 Clinical trial cohort selection using Large Language Models on n2c2 Challenges Chi-en Amy Tai et.al. 2501.11114 null
2025-01-18 Iterative Tree Analysis for Medical Critics Zenan Huang et.al. 2501.10642 null
2025-01-17 Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education William Hersh et.al. 2501.10186 null
2025-01-17 Demo: Interactive Visualization of Semantic Relationships in a Biomedical Project’s Talent Knowledge Graph Jiawei Xu et.al. 2501.09909 null
2025-01-17 Position: Open and Closed Large Language Models in Healthcare Jiawei Xu et.al. 2501.09906 null
2025-01-16 Bridging Language Barriers in Healthcare: A Study on Arabic LLMs Nada Saadi et.al. 2501.09825 null
2025-01-16 Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval Jesus Lovon et.al. 2501.09384 link
2025-01-16 FineMedLM-o1: Enhancing the Medical Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training Hongzhou Yu et.al. 2501.09213 link
2025-01-17 Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models Emma Croxford et.al. 2501.08977 null
2025-01-26 Enhanced Large Language Models for Effective Screening of Depression and Anxiety June M. Liu et.al. 2501.08769 null
2025-01-14 ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations Ziyuan Huang et.al. 2501.08324 null
2025-01-14 ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems Mohita Chowdhury et.al. 2501.08208 null
2025-01-13 Large Language Models for Interpretable Mental Health Diagnosis Brian Hyeongseok Kim et.al. 2501.07653 null
2025-01-13 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Difei Gu et.al. 2501.07525 link
2025-01-13 Combining LLM decision and RL action selection to improve RL policy for adaptive interventions Karine Karine et.al. 2501.06980 null
2025-01-12 Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives Xinyao Ma et.al. 2501.06964 null
2025-01-12 A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context Noureldin Zahran et.al. 2501.06859 null
2025-01-12 Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation Shunfan Zheng et.al. 2501.06741 null
2025-01-21 MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare Ye Chen et.al. 2501.06465 null
2025-01-11 O1 Replication Journey – Part 3: Inference-time Scaling for Medical Reasoning Zhongzhen Huang et.al. 2501.06458 link
2025-01-10 AFRIDOC-MT: Document-level MT Corpus for African Languages Jesujoba O. Alabi et.al. 2501.06374 link
2025-01-10 Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts Elizabeth Schaefer et.al. 2501.06365 null
2025-01-10 Large Language Models for Bioinformatics Wei Ruan et.al. 2501.06271 null
2025-01-10 From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy Elham Aghakhani et.al. 2501.06101 null
2025-01-07 Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding John C. Rollman et.al. 2501.05479 null
2025-01-18 LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models Hang Yang et.al. 2501.05464 null
2025-01-09 Investigating Numerical Translation with Large Language Models Wei Tang et.al. 2501.04927 null
2025-01-07 LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment Gaoussou Youssouf Kebe et.al. 2501.03624 null
2025-01-06 Existential Crisis: A Social Robot’s Reason for Being Dora Medgyesy et.al. 2501.03376 null
2025-01-06 Design and implementation of tools to build an ontology of Security Requirements for Internet of Medical Things Daniel Naro et.al. 2501.03067 null
2025-01-06 IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment Yiming Zhang et.al. 2501.02869 null
2025-01-05 Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine Yishen Liu et.al. 2501.02471 null
2025-01-05 Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications Zhe Chen et.al. 2501.02460 null
2025-01-04 Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations Kangyu Zhu et.al. 2501.02385 null
2025-01-04 Exploring the Capabilities and Limitations of Large Language Models for Radiation Oncology Decision Support Florian Putz et.al. 2501.02346 null
2025-01-03 PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents Jingoo Lee et.al. 2501.01594 null
2025-01-02 Large Language Models for Mental Health Diagnostic Assessments: Exploring The Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments – The Depression and Anxiety Case Kaushik Roy et.al. 2501.01305 null
2025-01-02 Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice Federico Ravenda et.al. 2501.00982 link
2024-12-31 CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care Michael Gubanov et.al. 2501.00223 null
2024-12-31 An Empirical Evaluation of Large Language Models on Consumer Health Questions Moaiz Abrar et.al. 2501.00208 null
2024-12-31 GPT-4 on Clinic Depression Assessment: An LLM-Based Pilot Study Giuliano Lorenzoni et.al. 2501.00199 null
2024-12-30 Temporal reasoning for timeline summarisation in social media Jiayu Song et.al. 2501.00152 null
2024-12-30 Tackling Cognitive Impairment Detection from Speech: A submission to the PROCESS Challenge Catarina Botelho et.al. 2501.00145 null
2024-12-21 Distilling Large Language Models for Efficient Clinical Information Extraction Karthik S. Vedula et.al. 2501.00031 null
2024-12-29 Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain Shintaro Ozaki et.al. 2412.20309 link
2024-12-28 On the Compositional Generalization of Multimodal LLMs for Medical Imaging Zhenyang Cai et.al. 2412.20070 link
2024-12-28 The Emotional Spectrum of LLMs: Leveraging Empathy and Emotion-Based Markers for Mental Health Support Alessandro De Grandi et.al. 2412.20068 null
2025-01-02 MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes Asma Ben Abacha et.al. 2412.19260 link
2025-01-03 MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models Kaiwen Zuo et.al. 2412.18947 null
2024-12-25 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Junying Chen et.al. 2412.18925 link
2024-12-24 Research on the Proximity Relationships of Psychosomatic Disease Knowledge Graph Modules Extracted by Large Language Models Zihan Zhou et.al. 2412.18419 null
2024-12-24 Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine Yu He Ke et.al. 2412.18096 null
2024-12-23 Generating Completions for Fragmented Broca’s Aphasic Sentences Using Large Language Models Sijbren van Vaals et.al. 2412.17669 link
2024-12-23 Detecting anxiety and depression in dialogues: a multi-label and explainable approach Francisco de Arriba-Pérez et.al. 2412.17651 null
2025-01-01 PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health Huy Vu et.al. 2412.16882 link
2025-01-03 KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis Kaiwen Zuo et.al. 2412.16833 null
2024-12-21 AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles Aritra Kumar Lahiri et.al. 2412.16701 null
2024-12-21 Evaluating the Performance of Large Language Models in Scientific Claim Detection and Classification Tanjim Bin Faruk et.al. 2412.16486 null
2024-12-21 Technical Report: Small Language Model for Japanese Clinical and Medicine Shogo Watanabe et.al. 2412.16423 null
2024-12-21 Identifying Cyberbullying Roles in Social Media Manuel Sandoval et.al. 2412.16417 null
2024-12-20 A Machine Learning Approach for Emergency Detection in Medical Scenarios Using Large Language Models Ferit Akaybicen et.al. 2412.16341 null
2024-12-20 Improving Equity in Health Modeling with GPT4-Turbo Generated Synthetic Data: A Comparative Study Daniel Smolyak et.al. 2412.16335 null
2024-12-20 Benchmarking LLMs and SLMs for patient reported outcomes Matteo Marengo et.al. 2412.16291 null
2024-12-20 Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG Hasan Md Tusfiqur Alam et.al. 2412.16086 link
2024-12-20 From General to Specific: Tailoring Large Language Models for Personalized Healthcare Ruize Shi et.al. 2412.15957 null
2024-12-20 Linguistic Features Extracted by GPT-4 Improve Alzheimer’s Disease Detection based on Spontaneous Speech Jonathan Heitz et.al. 2412.15772 link
2024-12-20 Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models Shamus Sim et.al. 2412.15748 null
2024-12-20 NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning Zheyuan Zhang et.al. 2412.15547 null
2024-12-17 A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models Gongbo Zhang et.al. 2412.15271 null
2024-12-16 Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search Edward Kim et.al. 2412.15256 null
2024-12-13 Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an “AI Therapist” Robert Wasenmüller et.al. 2412.15242 null
2024-12-23 CareBot: A Pioneering Full-Process Open-Source Medical Language Model Lulu Zhao et.al. 2412.15236 null
2024-12-18 Clinical Trials Ontology Engineering with Large Language Models Berkan Çakır et.al. 2412.14387 null
2024-12-18 Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs David Restrepo et.al. 2412.14304 null
2024-12-18 Discovering maximally consistent distribution of causal tournaments with Large Language Models Federico Baldo et.al. 2412.14019 null
2024-12-18 Cognition Chain for Explainable Psychological Stress Detection on Social Media Xin Wang et.al. 2412.14009 link
2025-01-08 Federated Learning and RAG Integration: A Scalable Approach for Medical Large Language Models Jincheol Jung et.al. 2412.13720 null
2024-12-18 Exploring Multi-Modal Integration with Tool-Augmented LLM Agents for Precise Causal Discovery ChengAo Shen et.al. 2412.13667 link
2024-12-18 PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling Haojie Xie et.al. 2412.13660 link
2024-12-17 Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health Vivek Kumar et.al. 2412.12981 link
2024-12-17 Process-Supervised Reward Models for Clinical Note Generation: A Scalable Approach Guided by Domain Expertise Hanyin Wang et.al. 2412.12583 link
2024-12-17 RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment Xuanzhong Chen et.al. 2412.12475 null
2024-12-17 Assessing the Limitations of Large Language Models in Clinical Fact Decomposition Monica Munnangi et.al. 2412.12422 link
2024-12-16 Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments Tuka Alhanai et.al. 2412.12417 link
2024-12-11 Performance of a large language model-Artificial Intelligence based chatbot for counseling patients with sexually transmitted infections and genital diseases Nikhil Mehta et.al. 2412.12166 null
2024-12-16 LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts Zhuhao Wang et.al. 2412.12001 link
2024-12-16 Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives Sam Relins et.al. 2412.11878 link
2024-12-16 LLMs Can Simulate Standardized Patients via Agent Coevolution Zhuoyun Du et.al. 2412.11716 link
2024-12-16 Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery Ryuhaerang Choi et.al. 2412.11656 null
2024-12-16 ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models Xiechi Zhang et.al. 2412.11453 null
2024-12-19 TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs Lanxiang Hu et.al. 2412.11242 null
2024-12-15 AD-LLM: Benchmarking Large Language Models for Anomaly Detection Tiankai Yang et.al. 2412.11142 link
2024-12-15 HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation Tengfei Liu et.al. 2412.11070 link
2024-12-17 MedG-KRP: Medical Graph Knowledge Representation Probing Gabriel R. Rosenbaum et.al. 2412.10982 null
2024-12-14 LLMs-in-the-Loop Part 2: Expert Small AI Models for Anonymization and De-identification of PHI Across Multiple Languages Murat Gunay et.al. 2412.10918 null
2024-12-14 Superhuman performance of a large language model on the reasoning tasks of a physician Peter G. Brodeur et.al. 2412.10849 null
2024-12-14 Large Language Models for Medical Forecasting – Foresight 2 Zeljko Kraljevic et.al. 2412.10848 null
2024-12-14 A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options Peilong Wang et.al. 2412.10622 null
2024-12-09 Leveraging Audio and Text Modalities in Mental Health: A Study of LLMs Performance Abdelrahman A. Ali et.al. 2412.10417 null
2024-12-09 Exploring Complex Mental Health Symptoms via Classifying Social Media Data with Explainable LLMs Kexin Chen et.al. 2412.10414 null
2024-12-13 UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities Muhammad Uzair Khattak et.al. 2412.10372 link
2024-12-12 MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation Zheyuan Zhang et.al. 2412.08847 link
2024-12-11 Detecting Conversational Mental Manipulation with Intent-Aware Prompting Jiayuan Ma et.al. 2412.08414 link
2024-12-10 BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Sahal Shaji Mullappilly et.al. 2412.07769 link
2024-12-10 Zero-Shot ATC Coding with Large Language Models for Clinical Assessments Zijian Chen et.al. 2412.07743 null
2024-12-09 Balancing Efficiency and Effectiveness: An LLM-Infused Approach for Optimized CTR Prediction Guoxiao Zhang et.al. 2412.06860 null
2024-12-06 Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System Fang Zeng et.al. 2412.06828 null
2024-12-12 PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models Qian Zhang et.al. 2412.06287 link
2024-12-09 MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization Kangyu Zhu et.al. 2412.06141 link
2024-12-08 Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis Aman Kassahun Wassie et.al. 2412.05862 null
2024-12-08 Are Clinical T5 Models Better for Clinical Text? Yahan Li et.al. 2412.05845 link
2024-12-09 Enhancing FKG.in: automating Indian food composition analysis Saransh Kumar Gupta et.al. 2412.05248 null
2024-12-06 SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot Jinlin Wu et.al. 2412.05187 link
2024-12-06 A text-to-tabular approach to generate synthetic patient data using LLMs Margaux Tornqvist et.al. 2412.05153 link
2024-12-05 Give me Some Hard Questions: Synthetic Data Generation for Clinical QA Fan Bai et.al. 2412.04573 link
2024-12-04 Prompting Large Language Models for Clinical Temporal Relation Extraction Jianping He et.al. 2412.04512 null
2024-12-05 Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots Maria Paola Priola et.al. 2412.04235 null
2024-12-05 Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models Abdelrahaman A. Hassan et.al. 2412.03796 null
2024-11-28 CovidLLM: A Robust Large Language Model with Missing Value Adaptation and Multi-Objective Learning Strategy for Predicting Disease Severity and Clinical Outcomes in COVID-19 Patients Shengjun Zhu et.al. 2412.03593 link
2024-12-04 A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences Gabriel Lino Garcia et.al. 2412.03531 null
2024-12-04 Advancing Conversational Psychotherapy: Integrating Privacy, Dual-Memory, and Domain Expertise with Large Language Models XiuYu Zhang et.al. 2412.02987 null
2024-12-03 A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications Yixiang Qu et.al. 2412.02868 null
2024-12-09 RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models Hieu Tran et.al. 2412.02830 link
2024-12-03 Keeping Experts in the Loop: Expert-Guided Optimization for Clinical Data Classification using Large Language Models Nader Karayanni et.al. 2412.02173 null
2024-12-04 The use of large language models to enhance cancer clinical trial educational materials Mingye Gao et.al. 2412.01955 null
2024-12-02 Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking Jie Liu et.al. 2412.01605 null
2024-12-02 Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models Chayan Tank et.al. 2412.01353 null
2024-12-02 Best Practices for Large Language Models in Radiology Christian Bluethgen et.al. 2412.01233 null
2024-12-01 Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages Edward Bayes et.al. 2412.00948 null
2024-12-06 Opus: A Large Work Model for Complex Workflow Generation Théo Fagnoni et.al. 2412.00573 null
2024-11-30 Polish Medical Exams: A new dataset for cross-lingual medical knowledge transfer assessment Łukasz Grzybowski et.al. 2412.00559 null
2024-12-07 Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective Yue Zhou et.al. 2412.00554 null
2024-11-30 CDEMapper: Enhancing NIH Common Data Element Normalization using Large Language Models Yan Wang et.al. 2412.00491 null
2024-11-29 SSDM 2.0: Time-Accurate Speech Rich Transcription with Non-Fluencies Jiachen Lian et.al. 2412.00265 null
2024-11-29 Fine Tuning Large Language Models to Deliver CBT for Depression Talha Tahir et.al. 2412.00251 link
2024-11-24 Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis Ferhat Ozgur Catak et.al. 2412.00056 null
2024-11-29 MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks John Francis et.al. 2411.19689 null
2024-11-29 SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks Kim-Celine Kahl et.al. 2411.19688 link
2024-11-28 ComViewer: An Interactive Visual Tool to Help Viewers Seek Social Support in Online Mental Health Communities Shiwei Wu et.al. 2411.19169 link
2024-11-28 A Unified Platform for At-Home Post-Stroke Rehabilitation Enabled by Wearable Technologies and Artificial Intelligence Chenyu Tang et.al. 2411.19000 null
2024-11-28 Rephrasing Electronic Health Records for Pretraining Clinical Language Models Jinghui Liu et.al. 2411.18940 null
2024-11-28 Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer’s Disease Junan Li et.al. 2411.18922 null
2024-12-06 LLM-ABBA: Understanding time series via symbolic approximation Erin Carson et.al. 2411.18506 null
2024-11-28 Wearable intelligent throat enables natural speech in stroke patients with dysarthria Chenyu Tang et.al. 2411.18266 null
2024-11-29 InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks Xinyao Zheng et.al. 2411.18191 null
2024-11-27 Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track Deepak Gupta et.al. 2411.18069 null
2024-11-27 QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions Ramez Kouzy et.al. 2411.17967 link
2024-11-26 Synthetic Data Generation with LLM for Improved Depression Prediction Andrea Kang et.al. 2411.17672 null
2024-11-26 Can artificial intelligence predict clinical trial outcomes? Shuyi Jin et.al. 2411.17595 null
2024-11-26 The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations Theodora Worledge et.al. 2411.17375 link
2024-12-10 Using Large Language Models for Expert Prior Elicitation in Predictive Modelling Alexander Capstick et.al. 2411.17284 link
2024-11-28 Strategic Prompting for Conversational Tasks: A Comparative Analysis of Large Language Models Across Diverse Conversational Tasks Ratnesh Kumar Joshi et.al. 2411.17204 null
2024-11-25 Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries Harshavardhan Battula et.al. 2411.16818 null
2024-11-27 Creating Scalable AGI: the Open General Intelligence Framework Daniel A. Dollinger et.al. 2411.15832 null
2024-11-24 RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements Zaifu Zhan et.al. 2411.15700 null
2024-11-23 Ontology-Constrained Generation of Domain-Specific Clinical Summaries Gaya Mehenni et.al. 2411.15666 link
2024-11-27 AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset Tobi Olatunji et.al. 2411.15640 null
2024-11-23 Large Language Model with Region-guided Referring and Grounding for CT Report Generation Zhixuan Chen et.al. 2411.15539 link
2024-11-23 The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges Jiqun Liu et.al. 2411.15396 null
2024-11-22 Regulator-Manufacturer AI Agents Modeling: Mathematical Feedback-Driven Multi-Agent LLM Framework Yu Han et.al. 2411.15356 null
2024-11-21 BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models Taha Koleilat et.al. 2411.15232 link
2024-11-22 Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation Colin Diggs et.al. 2411.14971 null
2024-11-22 De-biased Multimodal Electrocardiogram Analysis Haitao Li et.al. 2411.14795 null
2024-11-22 Enhancing Clinical Trial Patient Matching through Knowledge Augmentation with Multi-Agents Hanwen Shi et.al. 2411.14637 null
2024-11-20 Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine Yifan Yang et.al. 2411.14487 null
2024-11-16 Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios Shaochen Xu et.al. 2411.14461 null
2024-11-21 Logic Augmented Generation Aldo Gangemi et.al. 2411.14012 null
2024-11-21 PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation Zhijie Bao et.al. 2411.13902 link
2024-11-21 A Multimodal Approach to The Detection and Classification of Skin Diseases Allen Yang et.al. 2411.13855 null
2024-11-19 Can ChatGPT Overcome Behavioral Biases in the Financial Sector? Classify-and-Rethink: Multi-Step Zero-Shot Reasoning in the Gold Investment Shuoling Liu et.al. 2411.13599 null
2024-11-20 Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding Nabeel Seedat et.al. 2411.13163 null
2024-11-19 DIETS: Diabetic Insulin Management System in Everyday Life Hanyu Zeng et.al. 2411.12812 null
2024-11-19 Conversational Medical AI: Ready for Practice Antoine Lizée et.al. 2411.12808 null
2024-11-19 Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs Ahmed Akib Jawad Karim et.al. 2411.12712 null
2024-11-19 Performance of Large Language Models in Technical MRI Question Answering: A Comparative Study Alan B McMillan et.al. 2411.12238 null
2024-11-18 Medical Video Generation for Disease Progression Simulation Xu Cao et.al. 2411.11943 null
2024-11-04 Large language models for mental health Andreas Triantafyllopoulos et.al. 2411.11880 null
2024-11-18 Membership Inference Attack against Long-Context Large Language Models Zixiong Wang et.al. 2411.11424 null
2024-11-17 BianCang: A Traditional Chinese Medicine Large Language Model Sibo Wei et.al. 2411.11027 link
2024-11-16 Can Generic LLMs Help Analyze Child-adult Interactions Involving Children with Autism in Clinical Observation? Tiantian Feng et.al. 2411.10761 null
2024-11-16 Structured Dialogue System for Mental Health: An LLM Chatbot Leveraging the PM+ Guidelines Yixiang Chen et.al. 2411.10681 link
2024-11-15 Evaluating the role of ‘Constitutions’ for learning from AI feedback Saskia Redgate et.al. 2411.10168 null
2024-11-19 Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models? Yan Hu et.al. 2411.10020 link
2024-11-15 JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging Kaito Baba et.al. 2411.09933 null
2024-11-15 A Hybrid Artificial Intelligence System for Automated EEG Background Analysis and Report Generation Chin-Sung Tung et.al. 2411.09874 link
2024-11-19 A Benchmark for Long-Form Medical Question Answering Pedram Hosseini et.al. 2411.09834 null
2024-11-14 Script-centric behavior understanding for assisted autism spectrum disorder diagnosis Wenxing Liu et.al. 2411.09413 null
2024-11-14 Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering Nghia Trung Ngo et.al. 2411.09213 null
2024-11-13 The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Daniel P. Jeong et.al. 2411.08870 link
2024-11-14 Optimizing Automatic Summarization of Long Clinical Records Using Dynamic Context Extension: Testing and Evaluation of the NBCE Method Guoqing Zhang et.al. 2411.08586 null
2024-11-12 Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease Francesco Chiumento et.al. 2411.07871 null
2024-11-12 Multimodal Clinical Reasoning through Knowledge-augmented Rationale Generation Shuai Niu et.al. 2411.07611 null
2024-11-11 Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews Aakash Sorathiya et.al. 2411.07398 null
2024-11-11 A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 Vedant Khandelwal et.al. 2411.07163 null
2024-11-11 Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models Aniket Deroy et.al. 2411.06946 null
2024-11-11 Persuasion with Large Language Models: a Survey Alexander Rogiers et.al. 2411.06837 null
2024-11-11 Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding Zeyd Boukhers et.al. 2411.06823 null
2024-11-11 Ambient AI Scribing Support: Comparing the Performance of Specialized AI Agentic Architecture to Leading Foundational Models Chanseo Lee et.al. 2411.06713 null
2024-11-10 In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages Joseph Gatto et.al. 2411.06549 link
2024-11-10 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Canyu Chen et.al. 2411.06469 null
2024-11-09 GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence MD Ragib Shahriyear et.al. 2411.06264 null
2024-11-08 Humans Continue to Outperform Large Language Models in Complex Clinical Decision-Making: A Study with Medical Calculators Nicholas Wan et.al. 2411.05897 null
2024-11-08 Identifying and Decomposing Compound Ingredients in Meal Plans Using Large Language Models Leon Kopitar et.al. 2411.05892 null
2024-11-08 A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis Cristiano Patrício et.al. 2411.05609 link
2024-11-08 Analyzing Logs of Large-Scale Software Systems using Time Curves Visualization Dmytro Borysenkov et.al. 2411.05533 link
2024-11-14 SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark Sithursan Sivasubramaniam et.al. 2411.05521 link
2024-11-08 Content Quality vs. Attention Allocation: An LLM-Based Case Study in Peer-to-peer Mental Health Networks Teng Ye et.al. 2411.05328 null
2024-11-07 Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations Joey Hong et.al. 2411.05194 null
2024-11-11 FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? Eric Wu et.al. 2411.05059 link
2024-11-07 Integrating Large Language Models for Genetic Variant Classification Youssef Boulaimen et.al. 2411.05055 null
2024-11-07 Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability Yanjun Gao et.al. 2411.04962 null
2024-11-19 Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? Daniel P. Jeong et.al. 2411.04118 link
2024-11-07 MEG: Medical Knowledge-Augmented Large Language Models for Question Answering Laura Cabello et.al. 2411.03883 link
2024-11-06 A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients Yiming Li et.al. 2411.03805 null
2024-11-06 From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Harsha Nori et.al. 2411.03590 null
2024-11-05 Exploring Large Language Models for Specialist-level Oncology Care Anil Palepu et.al. 2411.03395 null
2024-11-05 The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare Souren Pashangpour et.al. 2411.03287 null
2024-11-05 [Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI Maren Pielka et.al. 2411.02973 null
2024-11-04 Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge Karthik Soman et.al. 2411.02657 link
2024-11-04 “It’s a conversation, not a quiz”: A Risk Taxonomy and Reflection Tool for LLM Adoption in Public Health Jiawei Zhou et.al. 2411.02594 null
2024-11-01 Evaluating the Impact of Lab Test Results on Large Language Models Generated Differential Diagnoses from Clinical Case Vignettes Balu Bhasuran et.al. 2411.02523 null
2024-11-01 Rationale-Guided Retrieval Augmented Generation for Medical Question Answering Jiwoong Sohn et.al. 2411.00300 link
2024-11-16 RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models Serena Zhang et.al. 2411.00299 null
2024-10-31 A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making Yubin Kim et.al. 2411.00248 link
2024-10-31 Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning John Wu et.al. 2411.00173 null
2024-10-28 A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges Zifeng Wang et.al. 2411.00024 null
2024-10-31 Leveraging Large Language Models for Medical Information Extraction and Query Generation Georgios Peikos et.al. 2410.23851 null
2024-10-31 Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding Jinlong He et.al. 2410.23822 null
2024-10-31 The Potential of LLMs in Medical Education: Generating Questions and Answers for Qualification Exams Yunqi Zhu et.al. 2410.23769 null
2024-11-01 Large Language Models for Patient Comments Multi-Label Classification Hajar Sakai et.al. 2410.23528 null
2024-10-31 LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models Hieu Tran et.al. 2410.23526 null
2024-10-29 Do Large Language Models Align with Core Mental Health Counseling Competencies? Viet Cuong Nguyen et.al. 2410.22446 null
2024-10-29 Improving In-Context Learning with Small Language Model Ensembles M. Mehdi Mojarradi et.al. 2410.21868 link
2024-10-28 Can Large Language Models Replace Data Scientists in Clinical Research? Zifeng Wang et.al. 2410.21591 null
2024-10-28 LLM-Forest for Health Tabular Data Imputation Xinrui He et.al. 2410.21520 null
2024-10-28 RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension Abel Corrêa Dias et.al. 2410.21495 link
2024-11-01 “We do use it, but not how hearing people think”: How the Deaf and Hard of Hearing Community Uses Large Language Model Tools Shuxu Huffman et.al. 2410.21358 null
2024-10-28 Large Language Model Benchmarks in Medical Tasks Lawrence K. Q. Yan et.al. 2410.21348 null
2024-10-27 Language Models And A Second Opinion Use Case: The Pocket Professional David Noever et.al. 2410.20636 null
2024-10-26 Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks Annalisa Szymanski et.al. 2410.20266 null
2024-10-26 Infectious Disease Forecasting in India using LLM’s and Deep Learning Chaitya Shah et.al. 2410.20168 null
2024-10-26 AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels Lei Li et.al. 2410.20050 link
2024-10-25 DualMAR: Medical-Augmented Representation from Dual-Expertise Perspectives Pengfei Hu et.al. 2410.19955 link
2024-10-18 Novel Development of LLM Driven mCODE Data Model for Improved Clinical Trial Matching to Enable Standardization and Interoperability in Oncology Research Aarsh Shekhar et.al. 2410.19826 null
2024-10-24 Inference time LLM alignment in single and multidomain preference spectrum Sadat Shahriar et.al. 2410.19206 null
2024-10-24 Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use Mohit Chandra et.al. 2410.19155 link
2024-10-24 Watermarking Large Language Models and the Generated Content: Opportunities and Challenges Ruisi Zhang et.al. 2410.19096 null
2024-10-24 BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning Yujuan Velvin Fu et.al. 2410.18955 null
2024-10-24 Demystifying Large Language Models for Medicine: A Primer Qiao Jin et.al. 2410.18856 link
2024-10-24 Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare Yifan Yang et.al. 2410.18460 null
2024-10-23 ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents Yusheng Liao et.al. 2410.17657 link
2024-10-22 DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR Miguel Contreras et.al. 2410.17363 null
2024-10-22 DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization John X. Morris et.al. 2410.17035 null
2024-10-22 SleepCoT: A Lightweight Personalized Sleep Health Model via Chain-of-Thought Distillation Huimin Zheng et.al. 2410.16924 null
2024-10-22 Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective Xiaolan Chen et.al. 2410.16662 null
2024-10-21 How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? Kenza Benkirane et.al. 2410.16574 link
2024-10-21 Large language models enabled multiagent ensemble method for efficient EHR data labeling Jingwei Huang et.al. 2410.16543 null
2024-10-17 SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques Qiming Guo et.al. 2410.16322 null
2024-10-22 MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report Samrajya Thapa et.al. 2410.16239 link
2024-10-21 Fine-Tuning LLMs for Reliable Medical Question-Answering Services Ali Anaissi et.al. 2410.16088 null
2024-10-21 Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding Derong Xu et.al. 2410.15702 null
2024-10-21 Resource-Efficient Medical Report Generation using Large Language Models Abdullah et.al. 2410.15642 null
2024-10-20 Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini Chanseo Lee et.al. 2410.15528 null
2024-10-20 Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training Shahrad Mohammadzadeh et.al. 2410.15460 null
2024-10-19 AutoFLUKA: A Large Language Model Based Framework for Automating Monte Carlo Simulations in FLUKA Zavier Ndum Ndum et.al. 2410.15222 null
2024-10-19 Fine-tuning foundational models to code diagnoses from veterinary health records Mayla R. Boguslav et.al. 2410.15186 null
2024-10-19 Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs Xiaocheng Zhang et.al. 2410.15135 null
2024-10-19 LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound Xuechen Guo et.al. 2410.15074 null
2024-10-18 Enabling Scalable Evaluation of Bias Patterns in Medical LLMs Hamed Fayyaz et.al. 2410.14763 link
2024-10-18 Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning Jialu Tang et.al. 2410.14464 null
2024-10-18 ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM Songheng Zhang et.al. 2410.14331 null
2024-10-18 LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs Yujun Zhou et.al. 2410.14182 null
2024-10-17 RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs Jiatan Huang et.al. 2410.13987 null
2024-10-17 HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings Varun Gumma et.al. 2410.13671 null
2024-10-17 MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling Yakun Zhu et.al. 2410.13610 null
2024-10-17 Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? Che Liu et.al. 2410.13523 null
2024-10-17 MedINST: Meta Dataset of Biomedical Instructions Wenhan Han et.al. 2410.13458 link
2024-10-17 Augmentation Policy Generation for Image Classification Using Large Language Models Ant Duru et.al. 2410.13453 null
2024-10-17 Representation Learning of Structured Data for Medical Foundation Models Vijay Prakash Dwivedi et.al. 2410.13351 null
2024-10-17 CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy Mian Zhang et.al. 2410.13218 null
2024-10-17 LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch Caigao Jiang et.al. 2410.13213 link
2024-10-18 MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback Zonghai Yao et.al. 2410.13191 link
2024-10-16 Leveraging LLMs for Translating and Classifying Mental Health Data Konstantinos Skianis et.al. 2410.12985 null
2024-10-16 AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning Mohammad Reza Rezaei et.al. 2410.12886 link
2024-10-13 IMAS: A Comprehensive Agentic Approach to Rural Healthcare Delivery Agasthya Gangavarapu et.al. 2410.12868 link
2024-10-11 LLMD: A Large Language Model for Interpreting Longitudinal Medical Records Robert Porter et.al. 2410.12860 null
2024-10-11 Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis Ameer Hamza Shakur et.al. 2410.12858 null
2024-10-10 Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions Per Niklas Waaler et.al. 2410.12848 null
2024-10-17 Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 Mohamad Abdi et.al. 2410.12686 null
2024-10-17 MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration Jinjie Wei et.al. 2410.12532 null
2024-10-16 Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation Zerui Xu et.al. 2410.12476 null
2024-10-06 SouLLMate: An Adaptive LLM-Driven System for Advanced Mental Health Support and Assessment, Based on a Systematic Application Survey Qiming Guo et.al. 2410.11859 null
2024-10-15 Y-Mol: A Multiscale Biomedical Knowledge-Guided Large Language Model for Drug Development Tengfei Ma et.al. 2410.11550 null
2024-10-15 AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data Xinjie Zhao et.al. 2410.11531 null
2024-10-15 HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications Weijie Xu et.al. 2410.11239 null
2024-10-13 3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation Hongxin Ding et.al. 2410.10901 null
2024-10-08 Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging Ryota Tozuka et.al. 2410.10869 null
2024-10-08 CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept YuXuan Wu et.al. 2410.10866 null
2024-10-06 Mitigating Hallucinations Using Ensemble of Knowledge Graph and Vector Store in Large Language Models to Enhance Mental Health Support Abdul Muqtadir et.al. 2410.10853 null
2024-10-06 On the Reliability of Large Language Models to Misinformed and Demographically-Informed Prompts Toluwani Aremu et.al. 2410.10850 link
2024-10-14 Thinking LLMs: General Instruction Following with Thought Generation Tianhao Wu et.al. 2410.10630 null
2024-10-14 Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Guorui Zheng et.al. 2410.10626 link
2024-10-14 MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media Wei Zhai et.al. 2410.10323 link
2024-10-13 Adaptive Reasoning and Acting in Medical Language Agents Abhishek Dutta et.al. 2410.10020 null
2024-10-15 MisinfoEval: Generative AI in the Era of “Alternative Facts” Saadia Gabriel et.al. 2410.09949 null
2024-10-13 Equitable Access to Justice: Logical LLMs Show Promise Manuj Kant et.al. 2410.09904 null
2024-10-13 MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions Tavish Mankash et.al. 2410.09729 null
2024-10-12 Society of Medical Simplifiers Chen Lyu et.al. 2410.09631 null
2024-10-12 Enhanced Electronic Health Records Text Summarization Using Large Language Models Ruvarashe Madzime et.al. 2410.09628 null
2024-10-11 Fine-Tuning In-House Large Language Models to Infer Differential Diagnosis from Radiology Reports Luoyao Chen et.al. 2410.09234 null
2024-10-04 Leveraging Social Determinants of Health in Alzheimer’s Research Using LLM-Augmented Literature Mining and Knowledge Graphs Tianqi Shang et.al. 2410.09080 link
2024-10-11 Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness Yu He Ke et.al. 2410.08431 null
2024-10-10 Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions Kuleen Sasse et.al. 2410.07951 null
2024-10-09 MoDEM: Mixture of Domain Expert Models Toby Simonds et.al. 2410.07490 null
2024-10-16 Mental Disorders Detection in the Era of Large Language Models Gleb Kuzmin et.al. 2410.07129 null
2024-10-09 Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback Dennis Hein et.al. 2410.07025 null
2024-10-09 Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare Pardis Sadat Zahraei et.al. 2410.06566 null
2024-10-08 Exploring Large Language Models Through a Neurodivergent Lens: Use, Challenges, Community-Driven Workarounds, and Concerns Buse Carik et.al. 2410.06336 null
2024-10-08 Linking Code and Documentation Churn: Preliminary Analysis Ani Hovhannisyan et.al. 2410.05992 null
2024-10-10 KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server Wenhao Wang et.al. 2410.05725 link
2024-10-10 Copiloting Diagnosis of Autism in Real Clinical Scenarios via LLMs Yi Jiang et.al. 2410.05684 null
2024-10-07 RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction Yuwei Zhang et.al. 2410.05361 null
2024-10-14 Mitigating the Risk of Health Inequity Exacerbated by Large Language Models Yuelyu Ji et.al. 2410.05180 null
2024-10-07 Rule-based Data Selection for Large Language Models Xiaomin Li et.al. 2410.04715 null
2024-10-07 Knowledge Graph Based Agent for Complex, Knowledge-Intensive QA in Medicine Xiaorui Su et.al. 2410.04660 null
2024-10-06 CardioAI: A Multimodal AI-based System to Support Symptom Monitoring and Risk Detection of Cancer Treatment-Induced Cardiotoxicity Siyi Wu et.al. 2410.04592 null
2024-10-06 Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval Pengcheng Jiang et.al. 2410.04585 link
2024-10-06 MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration Lai Wei et.al. 2410.04521 link
2024-10-06 Latent Feature Mining for Predictive Model Enhancement with Large Language Models Bingxuan Li et.al. 2410.04347 null
2024-10-05 RoQLlama: A Lightweight Romanian Adapted Language Model George-Andrei Dima et.al. 2410.04269 null
2024-10-05 DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech Dominika Woszczyk et.al. 2410.04188 null
2024-10-05 Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment Chengfeng Dou et.al. 2410.04112 null
2024-10-04 Searching for Best Practices in Medical Transcription with Large Language Model Jiafeng Li et.al. 2410.03797 link
2024-10-01 Towards Democratization of Subspeciality Medical Expertise Jack W. O’Sullivan et.al. 2410.03741 null
2024-10-01 Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model Aidan Gilson et.al. 2410.03740 null
2024-10-04 Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) Abrar Rahman et.al. 2410.03568 null
2024-10-04 CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios Zetian Ouyang et.al. 2410.03502 link
2024-10-04 Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity Hyosoon Jang et.al. 2410.03138 null
2024-10-04 Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models Yan Chen et.al. 2410.03134 null
2024-10-04 Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks Grant Wardle et.al. 2410.03062 null
2024-10-03 HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router Lingrui Mei et.al. 2410.02684 link
2024-10-03 ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration Zixiang Wang et.al. 2410.02551 null
2024-10-04 MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Gurucharan Marthi Krishna Kumar et.al. 2410.02458 null
2024-10-02 Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics Yuan Zhou et.al. 2410.02026 null
2024-09-27 A GEN AI Framework for Medical Note Generation Hui Yi Leong et.al. 2410.01841 null
2024-10-02 DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning Yebowen Hu et.al. 2410.01772 null
2024-10-03 Practicing Stress Relief for the Everyday: Designing Social Simulation Using VR, AR, and LLMs Anna Fang et.al. 2410.01672 null
2024-10-02 MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework Zonghai Yao et.al. 2410.01553 link
2024-10-01 FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks Peiran Wu et.al. 2410.01089 null
2024-10-01 Deceptive Risks in LLM-enhanced Robots Robert Ranisch et.al. 2410.00434 null
2024-10-01 CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset Xiao Wang et.al. 2410.00379 link
2024-10-01 Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis Chun-Hsiao Yeh et.al. 2410.00292 null
2024-09-30 A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification Marina Ribeiro et.al. 2410.00250 null
2024-09-30 EEG Emotion Copilot: Pruning LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation Hongyu Chen et.al. 2410.00166 null
2024-09-30 Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation Pedro Henrique Paiola et.al. 2410.00163 null
2024-09-30 Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments Iker De la Iglesia et.al. 2409.20565 null
2024-09-30 Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation Shan Chen et.al. 2409.20385 null
2024-09-30 Classification of Radiological Text in Small and Imbalanced Datasets in a Non-English Language Vincent Beliveau et.al. 2409.20147 link
2024-10-01 See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning Chengxin Zheng et.al. 2409.19676 link
2024-09-29 MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models Vibhor Agarwal et.al. 2409.19492 null
2024-10-11 HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations Ziyu Wang et.al. 2409.19487 null
2024-09-28 INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning Pablo Romero et.al. 2409.19467 link
2024-09-27 Confidential Prompting: Protecting User Prompts from Cloud LLM Providers In Gim et.al. 2409.19134 link
2024-09-27 Secure Multiparty Generative AI Manil Shrestha et.al. 2409.19120 null
2024-09-27 Outlining the Borders for LLM Applications in Patient Education: Developing an Expert-in-the-Loop LLM-Powered Chatbot for Prostate Cancer Patient Education Yuexing Hao et.al. 2409.19100 null
2024-10-01 AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow Huizi Yu et.al. 2409.18924 null
2024-09-27 Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications Aditi Godbole et.al. 2409.18454 null
2024-09-26 Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model Chuang Niu et.al. 2409.18319 link
2024-09-26 Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams Yuexing Hao et.al. 2409.18290 link
2024-09-26 Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT Natthanaphop Isaradech et.al. 2409.17683 null
2024-09-26 Digital Twin Ecosystem for Oncology Clinical Operations Himanshu Pandey et.al. 2409.17650 null
2024-09-26 ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue Zhangpu Li et.al. 2409.17610 null
2024-09-26 A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models Syed Affan Daimi et.al. 2409.17581 link
2024-09-26 Dr. GPT in Campus Counseling: Understanding Higher Education Students’ Opinions on LLM-assisted Mental Health Services Owen Xingjian Zhang et.al. 2409.17572 null
2024-09-26 Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE Xun Zhu et.al. 2409.17508 link
2024-09-25 Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset Konstantinos Skianis et.al. 2409.17397 null
2024-09-25 Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia Azmul Asmar Irfan et.al. 2409.17054 null
2024-09-25 The Role of Language Models in Modern Healthcare: A Comprehensive Review Amna Khalid et.al. 2409.16860 null
2024-10-04 “It Explains What I am Currently Going Through Perfectly to a Tee”: Understanding User Perceptions on LLM-Enhanced Narrative Interventions Ananya Bhattacharjee et.al. 2409.16732 null
2024-09-25 In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results Mike Thelwall et.al. 2409.16695 null
2024-09-25 Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels Yishu Wei et.al. 2409.16563 null
2024-09-24 Design and Evaluation of a CDSS for Drug Allergy Management Using LLMs and Pharmaceutical Data Integration Gabriele De Vito et.al. 2409.16395 null
2024-09-24 CHBench: A Chinese Dataset for Evaluating Health in Large Language Models Chenlu Guo et.al. 2409.15766 link
2024-09-24 XTRUST: On the Multilingual Trustworthiness of Large Language Models Yahan Li et.al. 2409.15762 link
2024-09-24 A Comprehensive Evaluation of Large Language Models on Mental Illnesses Abdelrahman Hanafi et.al. 2409.15687 null
2024-09-23 Voice Assistants for Health Self-Management: Designing for and with Older Adults Amama Mahmood et.al. 2409.15488 null
2024-09-20 Prompting Large Language Models for Supporting the Differential Diagnosis of Anemia Elisa Castagnari et.al. 2409.15377 null
2024-09-23 A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Yunfei Xie et.al. 2409.15277 null
2024-09-23 Generative AI Is Not Ready for Clinical Use in Patient Education for Lower Back Pain Patients, Even With Retrieval-Augmented Generation Yi-Fei Zhao et.al. 2409.15260 null
2024-09-24 PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models Zhiyuan Wang et.al. 2409.15188 link
2024-09-23 Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies Skatje Myers et.al. 2409.15163 null
2024-09-23 Boosting Healthcare LLMs Through Retrieved Context Jordi Bayarri-Planas et.al. 2409.15127 link
2024-09-20 Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory Kunyao Lan et.al. 2409.15084 null
2024-09-23 Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs Clément Christophe et.al. 2409.14988 null
2024-09-23 Knowledge Planning in Large Language Models for Domain-Aligned Counseling Summarization Aseem Srivastava et.al. 2409.14907 null
2024-09-24 Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding Bokang Bi et.al. 2409.14638 null
2024-09-22 Can Large Language Models Logically Predict Myocardial Infarction? Evaluation based on UK Biobank Cohort Yuxing Zhi et.al. 2409.14478 null
2024-09-22 PretextTrans: Investigating Medical Factual Knowledge Mastery of LLMs with Predicate-text Dual Transformation Yuxuan Zhou et.al. 2409.14302 null
2024-09-21 Current Trends and Future Directions for Sexual Health Conversational Agents (CAs) for Youth: A Scoping Review Jinkyung Katie Park et.al. 2409.14226 null
2024-09-20 Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology Aidan Gilson et.al. 2409.13902 null
2024-09-20 Transfer Learning with Clinical Concept Embeddings from Large Language Models Yuhe Gao et.al. 2409.13893 null
2024-09-11 A Simplified Retriever to Improve Accuracy of Phenotype Normalizations by Large Language Models Daniel B. Hier et.al. 2409.13744 null
2024-09-20 Recent Advancement of Emotion Cognition in Large Language Models Yuyan Chen et.al. 2409.13354 null
2024-09-20 SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation Jinge Wu et.al. 2409.13321 link
2024-09-20 An adapted large language model facilitates multiple medical tasks in diabetes care Lai Wei et.al. 2409.13191 link
2024-09-19 A New Perspective on ADHD Research: Knowledge Graph Construction with LLMs and Network Based Insights Hakan T. Otal et.al. 2409.12853 link
2024-09-20 Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization Thomas Savage et.al. 2409.12741 null
2024-09-11 Semantic Interoperability on Blockchain by Generating Smart Contracts Based on Knowledge Graphs William Van Woensel et.al. 2409.12171 null
2024-09-19 Using Large Language Models to Generate Clinical Trial Tables and Figures Yumeng Yang et.al. 2409.12046 null
2024-09-20 Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources Issey Sukeda et.al. 2409.11783 link
2024-09-17 Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification Fatema-E-Jannat et.al. 2409.11375 null
2024-09-17 ASHABot: An LLM-Powered Chatbot to Support the Informational Needs of Community Health Workers Pragnya Ramjee et.al. 2409.10913 null
2024-09-16 GPT takes the SAT: Tracing changes in Test Difficulty and Math Performance of Students Vikram Krishnaveti et.al. 2409.10750 null
2024-09-15 Veridical Data Science for Medical Foundation Models Ahmed Alaa et.al. 2409.10580 null
2024-09-14 On the limits of agency in agent-based models Ayush Chopra et.al. 2409.10568 link
2024-09-16 DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction John Wu et.al. 2409.10504 null
2024-09-17 Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot Bhuvan Sachdeva et.al. 2409.10354 null
2024-09-16 LLMs for clinical risk prediction Mohamed Rezk et.al. 2409.10191 null
2024-09-16 MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM Sijie Ji et.al. 2409.10064 null
2024-09-18 HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making Sumera Anjum et.al. 2409.10011 link
2024-09-15 GP-GPT: Large Language Model for Gene-Phenotype Mapping Yanjun Lyu et.al. 2409.09825 null
2024-09-15 AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs Madhusudan Ghosh et.al. 2409.09704 link
2024-09-17 ExploreSelf: Fostering User-driven Exploration and Reflection on Personal Challenges with Adaptive Guidance by Large Language Models Inhwa Song et.al. 2409.09662 null
2024-09-15 MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences Subigya Nepal et.al. 2409.09570 null
2024-09-14 Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation Hui Yi Leong et.al. 2409.09324 null
2024-09-24 Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases Mercy Asiedu et.al. 2409.09201 null
2024-09-13 Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation Cheng Charles Ma et.al. 2409.09135 null
2024-08-30 OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography Youzhu Jin et.al. 2409.09052 null
2024-09-13 Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes Luis Rita et.al. 2409.08792 null
2024-09-13 Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling Jialu Tang et.al. 2409.08788 null
2024-09-13 Eir: Thai Medical Large Language Models Yutthakorn Thiprak et.al. 2409.08523 null
2024-09-11 Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation Gavin Butts et.al. 2409.07424 null
2024-09-11 MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Praveen K Kanithi et.al. 2409.07314 null
2024-09-11 Reranking Laws for Language Generation: A Communication-Theoretic Perspective António Farinhas et.al. 2409.07131 null
2024-09-10 MAGDA: Multi-agent guideline-driven diagnostic assistance David Bani-Harouni et.al. 2409.06351 null
2024-09-10 Can Large Language Models Unlock Novel Scientific Research Ideas? Sandeep Kumar et.al. 2409.06185 link
2024-09-10 Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines Yining Chen et.al. 2409.06164 link
2024-09-09 Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach Meng Zhou et.al. 2409.05732 null
2024-09-09 The Influence of Task and Group Disparities over Users’ Attitudes Toward Using Large Language Models for Psychotherapy Qihang He et.al. 2409.05703 null
2024-09-09 KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models Yingshu Li et.al. 2409.05370 null
2024-09-06 Toward LLM-Powered Social Robots for Supporting Sensitive Disclosures of Stigmatized Health Conditions Alemitu Bezabih et.al. 2409.04508 null
2024-09-06 Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials Yizhen Zheng et.al. 2409.04481 null
2024-09-06 Towards Safer Online Spaces: Simulating and Assessing Intervention Strategies for Eating Disorder Discussions Louis Penafiel et.al. 2409.04043 null
2024-09-05 CACER: Clinical Concept Annotations for Cancer Events and Relations Yujuan Fu et.al. 2409.03905 link
2024-09-05 LLM-based event abstraction and integration for IoT-sourced logs Mohsen Shirali et.al. 2409.03478 link
2024-09-05 Rx Strategist: Prescription Verification using LLM Agents System Phuc Phan Van et.al. 2409.03440 null
2024-09-05 Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time Francisco de Arriba-Pérez et.al. 2409.03375 null
2024-09-05 Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration Jeremy Qin et.al. 2409.03225 link
2024-09-04 Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models Chih-Yuan Li et.al. 2409.02530 null
2024-09-03 Therapy as an NLP Task: Psychologists’ Comparison of LLMs and Human Peers in CBT Zainab Iftikhar et.al. 2409.02244 null
2024-09-03 Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation Jack Krolik et.al. 2409.01941 null
2024-09-03 Training on the Benchmark Is Not All You Need Shiwen Ni et.al. 2409.01790 link
2024-09-03 It is Time to Develop an Auditing Framework to Promote Value Aware Chatbots Yanchen Wang et.al. 2409.01539 link
2024-09-02 DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models Rajat Rawat et.al. 2409.01497 null
2024-09-01 Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering Derian Boer et.al. 2409.00861 link
2024-09-01 Building FKG.in: a Knowledge Graph for Indian Food Saransh Kumar Gupta et.al. 2409.00830 null
2024-08-31 Large Language Models-Enabled Digital Twins for Precision Medicine in Rare Gynecological Tumors Jacqueline Lammert et.al. 2409.00544 link
2024-08-31 Chatting Up Attachment: Using LLMs to Predict Adult Bonds Paulo Soares et.al. 2409.00347 null
2024-08-29 A Survey for Large Language Models in Biomedicine Chong Wang et.al. 2409.00133 null
2024-08-27 Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy Daniil Filienko et.al. 2409.00112 null
2024-08-27 Large Language Models for Disease Diagnosis: A Scoping Review Shuang Zhou et.al. 2409.00097 null
2024-09-04 Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models Seyed Amir Ahmad Safavi-Naini et.al. 2409.00084 link
2024-08-30 NDP: Next Distribution Prediction as a More Broad Target Junhao Ruan et.al. 2408.17377 null
2024-08-29 Instruction-tuned Large Language Models for Machine Translation in the Medical Domain Miguel Rios et.al. 2408.16440 null
2024-08-29 Enhancing AI-Driven Psychological Consultation: Layered Prompts with Large Language Models Rafael Souza et.al. 2408.16276 null
2024-08-29 M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation Jonggwon Park et.al. 2408.16213 null
2024-08-28 Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions Huachuan Qiu et.al. 2408.15787 link
2024-08-28 A Survey on Evaluation of Multimodal Large Language Models Jiaxing Huang et.al. 2408.15769 null
2024-08-26 Improving Clinical Note Generation from Complex Doctor-Patient Conversation Yizhan Li et.al. 2408.14568 null
2024-09-06 MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues Kuluhan Binici et.al. 2408.14418 null
2024-09-03 Foundation Models for Music: A Survey Yinghao Ma et.al. 2408.14340 link
2024-08-25 Biomedical Large Language Models Seem not to be Superior to Generalist Models on Unseen Medical Data Felix J. Dorfner et.al. 2408.13833 null
2024-08-25 Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models Duy Khoa Pham et.al. 2408.13808 null
2024-08-23 IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models Zhihao Yu et.al. 2408.13073 link
2024-08-23 Guiding IoT-Based Healthcare Alert Systems with Large Language Models Yulan Gao et.al. 2408.13071 null
2024-08-23 Grounding Fallacies Misrepresenting Scientific Publications in Evidence Max Glockner et.al. 2408.12812 link
2024-08-22 RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment Xiaohan Wang et.al. 2408.12579 null
2024-09-05 Towards Evaluating and Building Versatile Large Language Models for Medicine Chaoyi Wu et.al. 2408.12547 link
2024-08-22 MEDCO: Medical Education Copilots Based on A Multi-Agent Framework Hao Wei et.al. 2408.12496 null
2024-08-22 Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations Kai Tzu-iunn Ong et.al. 2408.12315 null
2024-08-22 LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction Aishik Nagar et.al. 2408.12249 null
2024-08-22 MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient Yanzeng Li et.al. 2408.12236 null
2024-08-22 Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards Shresth Verma et.al. 2408.12112 null
2024-08-22 Aligning (Medical) LLMs for (Counterfactual) Fairness Raphael Poulain et.al. 2408.12055 link
2024-08-21 Exploring Large Language Models for Feature Selection: A Data-centric Perspective Dawei Li et.al. 2408.12025 null
2024-08-16 Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI Arindam Sett et.al. 2408.11861 null
2024-08-15 When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? Yanjun Gao et.al. 2408.11854 null
2024-08-13 MGH Radiology Llama: A Llama 3 70B Model for Radiology Yucheng Shi et.al. 2408.11848 null
2024-09-01 Clinical Insights: A Comprehensive Review of Language Models in Medicine Nikita Neveditsin et.al. 2408.11735 null
2024-08-21 BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports Yuxuan Chen et.al. 2408.11334 null
2024-08-21 Probabilistic Medical Predictions of Large Language Models Bowen Gu et.al. 2408.11316 null
2024-08-21 Applying and Evaluating Large Language Models in Mental Health Care: A Scoping Review of Human-Assessed Generative Tasks Yining Hua et.al. 2408.11288 null
2024-08-21 BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation Haotian Peng et.al. 2408.11281 link
2024-08-20 Public Health in Disaster: Emotional Health and Life Incidents Extraction during Hurricane Harvey Thomas Hoang et.al. 2408.11133 null
2024-08-20 CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models Michael Reinisch et.al. 2408.10995 null
2024-08-20 Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology Yihao Hou et.al. 2408.10715 null
2024-08-20 Large Language Models for Multimodal Deformable Image Registration Mingrui Ma et.al. 2408.10703 link
2024-08-19 Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory Haoran Li et.al. 2408.10053 null
2024-08-29 MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis Ruihui Hou et.al. 2408.10039 null
2024-08-19 Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions Sebastian Heineking et.al. 2408.09831 link
2024-08-19 R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation Xiao Wang et.al. 2408.09743 link
2024-08-18 Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities Minh Duc Chu et.al. 2408.09366 null
2024-08-17 TC-RAG: Turing-Complete RAG’s Case study on Medical LLM Systems Xinke Jiang et.al. 2408.09199 link
2024-08-17 AI Managed Emergency Documentation with a Pretrained Model David Menzies et.al. 2408.09193 null
2024-08-16 Improving VTE Identification through Language Models from Radiology Reports: A Comparative Study of Mamba, Phi-3 Mini, and BERT Jamie Deng et.al. 2408.09043 null
2024-08-16 HSDreport: Heart Sound Diagnosis with Echocardiography Reports Zihan Zhao et.al. 2408.08669 null
2024-08-16 RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions Gregory Kell et.al. 2408.08624 link
2024-08-15 Assessing and Enhancing Large Language Models in Rare Disease Question-answering Guanchu Wang et.al. 2408.08422 null
2024-08-15 LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning Jiajie Li et.al. 2408.07981 null
2024-08-15 The doctor will polygraph you now: ethical concerns with AI for fact-checking patients James Anibal et.al. 2408.07896 null
2024-08-15 Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering Yushi Yang et.al. 2408.07888 link
2024-08-14 MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis Nimeesha Chan et.al. 2408.07773 link
2024-08-27 Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments Seungjun Han et.al. 2408.07531 null
2024-08-14 Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health Yongquan Hu et.al. 2408.07313 null
2024-07-24 Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition Michele Fiori et.al. 2408.06352 null
2024-08-12 Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM Trisha Das et.al. 2408.06285 null
2024-08-12 Med42-v2: A Suite of Clinical LLMs Clément Christophe et.al. 2408.06142 null
2024-08-10 Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction Jung Hoon Lim et.al. 2408.05555 null
2024-08-16 RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records Sangjoon Park et.al. 2408.05074 null
2024-08-08 Hybrid Student-Teacher Large Language Model Refinement for Cancer Toxicity Symptom Extraction Reza Khanmohammadi et.al. 2408.04775 null
2024-08-08 Dynamic Fog Computing for Enhanced LLM Execution in Medical Applications Philipp Zagar et.al. 2408.04680 null
2024-08-03 Building Trust in Mental Health Chatbots: Safety Metrics and LLM-Based Evaluation Tools Jung In Park et.al. 2408.04650 null
2024-08-08 Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation Junde Wu et.al. 2408.04187 link
2024-08-08 Academic collaboration on large language model studies increases overall but varies across disciplines Lingyao Li et.al. 2408.04163 link
2024-08-08 Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering Haoran Yu et.al. 2408.04138 null
2024-08-07 Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology Panagiotis Fytas et.al. 2408.04121 null
2024-08-07 Towards Multimodal Emotional Support Conversation Systems Yuqi Chu et.al. 2408.03650 link
2024-08-06 Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation Artur Guimarães et.al. 2408.03127 link
2024-08-06 Targeted Visual Prompting for Medical Visual Question Answering Sergio Tascon-Morales et.al. 2408.03043 link
2024-08-06 Fact Finder – Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs Daniel Steinigen et.al. 2408.03010 link
2024-08-07 Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval Iman Azimi et.al. 2408.02964 link
2024-08-04 MedSyn: LLM-based Synthetic Medical Text Generation Framework Gleb Kumichev et.al. 2408.02056 link
2024-08-06 DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models Bowen Wang et.al. 2408.01933 link
2024-08-03 MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance Jihye Choi et.al. 2408.01869 link
2024-07-27 AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools Aditya Paul et.al. 2408.01459 null
2024-08-02 The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models Hannah Chen et.al. 2408.01285 null
2024-08-05 Agentic LLM Workflows for Generating Patient-Friendly Medical Reports Malavikha Sudarshan et.al. 2408.01112 link
2024-08-01 Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions Guangzhi Xiong et.al. 2408.00727 link
2024-07-25 Closing the gap between open-source and commercial large language models for medical evidence summarization Gongbo Zhang et.al. 2408.00588 null
2024-07-31 A Taxonomy of Stereotype Content in Large Language Models Gandalf Nicolas et.al. 2408.00162 null
2024-07-31 A Course Shared Task on Evaluating LLM Output for Clinical Questions Yufang Hou et.al. 2408.00122 link
2024-07-24 Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications Cui Long et.al. 2407.21055 null
2024-07-23 An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice Roma Shusterman et.al. 2407.21051 null
2024-08-12 Artificial Intelligence in Extracting Diagnostic Data from Dental Records Yao-Shun Chuang et.al. 2407.21050 null
2024-07-30 Can LLMs be Fooled? Investigating Vulnerabilities in LLMs Sara Abdali et.al. 2407.20529 null
2024-07-29 Exploring Large Language Models to generate Easy to Read content Paloma Martínez et.al. 2407.20046 null
2024-07-30 CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare Jingwei Zhu et.al. 2407.19705 link
2024-07-28 A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy Meng Jiang et.al. 2407.19422 null
2024-07-27 The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations Thanh-Dung Le et.al. 2407.19299 null
2024-07-27 Multi-Modal CLIP-Informed Protein Editing Mingze Yin et.al. 2407.19296 null
2024-07-27 Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review Tongyue Shi et.al. 2407.19256 null
2024-07-26 Large Language Models as Co-Pilots for Causal Inference in Medical Studies Ahmed Alaa et.al. 2407.19118 null
2024-07-26 Towards Automated Solution Recipe Generation for Industrial Asset Management with LLM Nianjun Zhou et.al. 2407.18992 null
2024-07-26 Is larger always better? Evaluating and prompting large language models for non-generative medical tasks Yinghao Zhu et.al. 2407.18525 link
2024-07-24 Online Social Network Data-Driven Early Detection on Short-Form Video Addiction Fang-Yu Kuo et.al. 2407.18277 null
2024-07-25 The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation Eric Yang et.al. 2407.18044 null
2024-08-15 The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer Danqing Hu et.al. 2407.17900 null
2024-07-25 Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy? Hao Shen et.al. 2407.17730 null
2024-07-24 IgnitionInnovators at “Discharge Me!”: Chain-of-Thought Instruction Finetuning Large Language Models for Discharge Summaries An Quang Tang et.al. 2407.17636 link
2024-07-24 SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH) Bernardo Consoli et.al. 2407.17126 null
2024-07-23 Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models Ioana Buhnila et.al. 2407.16565 link
2024-07-23 PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets Jaeyoung Kim et.al. 2407.16329 null
2024-07-23 Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks Yao-Shun Chuang et.al. 2407.16166 link
2024-07-16 Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis Qiuhong Wei et.al. 2407.15862 null
2024-07-21 A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech Gaurav Verma et.al. 2407.15227 null
2024-07-19 CVE-LLM: Automatic vulnerability evaluation in medical device industry using large language models Rikhiya Ghosh et.al. 2407.14640 null
2024-07-19 Adversarial Databases Improve Success in Retrieval-based Large Language Models Sean Wu et.al. 2407.14609 null
2024-07-19 Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis Valentin Pelloin et.al. 2407.14180 link
2024-07-28 Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field Tobias Kerner et.al. 2407.14076 null
2024-07-19 Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization Md Sultan Al Nahian et.al. 2407.14000 null
2024-07-18 KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration Youfu Yan et.al. 2407.13598 null
2024-07-18 Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks Samy Ateia et.al. 2407.13511 link
2024-07-18 End-To-End Clinical Trial Matching with Large Language Models Dyke Ferber et.al. 2407.13463 null
2024-07-18 CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis Junying Chen et.al. 2407.13301 link
2024-07-18 TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models Ling Yue et.al. 2407.13115 null
2024-07-03 Large Language Model Agents for Improving Engagement with Behavior Change Interventions: Application to Digital Mindfulness Harsh Kumar et.al. 2407.13067 null
2024-07-17 Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models Alexander R. Pelletier et.al. 2407.12888 link
2024-07-06 Large language models are good medical coders, if provided with tools Keith Kwan et.al. 2407.12849 link
2024-07-04 NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions Andong Hua et.al. 2407.12843 null
2024-07-02 Lightweight Large Language Model for Medication Enquiry: Med-Pal Kabilan Elangovan et.al. 2407.12822 null
2024-07-18 Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions Marcos Fernández-Pichel et.al. 2407.12468 link
2024-07-17 MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models Thao Minh Nguyen Phan et.al. 2407.12309 null
2024-07-17 A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery Jike Wang et.al. 2407.12296 null
2024-07-26 LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation Bunyamin Keles et.al. 2407.12126 null
2024-06-30 Evaluation of Bias Towards Medical Professionals in Large Language Models Xi Chen et.al. 2407.12031 null
2024-07-16 Schema Matching with Large Language Models: an Experimental Study Marcel Parciak et.al. 2407.11852 link
2024-07-25 CCoE: A Compact LLM with Collaboration of Experts Shaomang Huang et.al. 2407.11686 null
2024-07-16 Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise Qimin Yang et.al. 2407.11536 null
2024-07-09 Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations Rachael Fleurence et.al. 2407.11054 null
2024-06-25 Panacea: A foundation model for clinical trial search, summarization, design, and recruitment Jiacheng Lin et.al. 2407.11007 link
2024-07-15 Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities Nhat Le et.al. 2407.10785 null
2024-07-15 TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction Xingzhi Zhou et.al. 2407.10510 null
2024-07-15 Enhancing Medication Recommendation with LLM Text Representation Yu-Tzu Lee et.al. 2407.10453 null
2024-07-13 Causality extraction from medical text using Large Language Models (LLMs) Seethalakshmi Gopalakrishnan et.al. 2407.10020 null
2024-07-13 PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models Can Cui et.al. 2407.09979 null
2024-07-12 Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction Chase Fensore et.al. 2407.09688 link
2024-07-12 Open (Clinical) LLMs are Sensitive to Instruction Phrasings Alberto Mario Ceballos Arroyo et.al. 2407.09429 link
2024-07-12 STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs Yiheng Huang et.al. 2407.09096 link
2024-07-11 Uncertainty Estimation of Large Language Models in Medical Question Answering Jiaxin Wu et.al. 2407.08662 null
2024-07-11 Leveraging LLMs to Predict Affective States via Smartphone Sensor Features Tianyi Zhang et.al. 2407.08240 null
2024-07-11 DALL-M: Context-Aware Clinical Data Augmentation with LLMs Chihcheng Hsieh et.al. 2407.08227 link
2024-07-10 Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing Ian Steenstra et.al. 2407.08095 link
2024-07-04 CaseGPT: a case reasoning framework based on language models and retrieval-augmented generation Rui Yang et.al. 2407.07913 null
2024-07-10 A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability Ting Fang Tan et.al. 2407.07666 null
2024-07-10 Interpretable Differential Diagnosis with Dual-Inference Large Language Models Shuang Zhou et.al. 2407.07330 null
2024-07-09 Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges Emilio Ferrara et.al. 2407.07196 null
2024-07-09 Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies Inwon Kang et.al. 2407.07019 null
2024-07-09 End-To-End Causal Effect Estimation from Unstructured Natural Language Data Nikita Dhawan et.al. 2407.07018 null
2024-07-08 Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities Avinash Anand et.al. 2407.06125 null
2024-07-08 Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs Sanjeet Singh et.al. 2407.05887 link
2024-07-08 PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation Jinpeng Hu et.al. 2407.05721 link
2024-07-07 CLIMB: A Benchmark of Clinical Bias in Large Language Models Yubo Zhang et.al. 2407.05250 link
2024-07-06 Leveraging Task-Specific Knowledge from LLM for Semi-Supervised 3D Medical Image Segmentation Suruchi Kumari et.al. 2407.05088 null
2024-07-05 Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework Reza Averly et.al. 2407.04629 null
2024-07-05 Using LLMs to label medical papers according to the CIViC evidence model Markus Hisch et.al. 2407.04466 link
2024-07-04 Query-Guided Self-Supervised Summarization of Nursing Notes Ya Gao et.al. 2407.04125 null
2024-07-04 Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval Kazuaki Furumai et.al. 2407.03585 null
2024-07-03 Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory Suyeon Lee et.al. 2407.03103 link
2024-07-03 SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research Meghal Dani et.al. 2407.03004 null
2024-07-02 Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms Viet Cuong Nguyen et.al. 2407.02662 null
2024-07-02 MMedAgent: Learning to Use Medical Tools with Multi-modal Agent Binxu Li et.al. 2407.02483 link
2024-06-29 Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare Forhan Bin Emdad et.al. 2407.01627 null
2024-07-14 Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles Ryan Louie et.al. 2407.00870 null
2024-06-30 Large Language Models Struggle in Token-Level Clinical Named Entity Recognition Qiuhao Lu et.al. 2407.00731 link
2024-06-29 Answering real-world clinical questions using large language model based systems Yen Sia Low et.al. 2407.00541 null
2024-06-29 ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees Zhiyuan Wang et.al. 2407.00499 link
2024-06-28 EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models João Matos et.al. 2407.00242 link
2024-07-02 Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges Mahmoud Ibrahim et.al. 2407.00116 null
2024-06-27 PathAlign: A vision-language model for whole slide images in histopathology Faruk Ahmed et.al. 2406.19578 null
2024-06-27 PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models Cathy Mengying Fang et.al. 2406.19283 null
2024-06-27 HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Junying Chen et.al. 2406.19280 link
2024-06-26 Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources Yiming Li et.al. 2406.18049 null
2024-06-26 LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them Wenya Xie et.al. 2406.18034 null
2024-06-26 Automated Clinical Data Extraction with Knowledge Conditioned LLMs Diya Li et.al. 2406.18027 null
2024-07-11 Multi-step Inference over Unstructured Data Aditya Kalyanpur et.al. 2406.17987 null
2024-06-25 Accelerating Clinical Evidence Synthesis with Large Language Models Zifeng Wang et.al. 2406.17755 null
2024-07-06 MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation Yusheng Liao et.al. 2406.17484 link
2024-06-25 Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis Ajan Subramanian et.al. 2406.16252 null
2024-06-23 Effectiveness of ChatGPT in explaining complex medical reports to patients Mengxuan Sun et.al. 2406.15963 null
2024-06-22 Real-time Speech Summarization for Medical Conversations Khai Le-Duc et.al. 2406.15888 link
2024-06-16 WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist Chenyu Ren et.al. 2406.15474 null
2024-06-15 Mental Disorder Classification via Temporal Representation of Text Raja Kumar et.al. 2406.15470 null
2024-06-21 Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms Santiago Berrezueta-Guzman et.al. 2406.15198 null
2024-06-21 Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction Jinge Wu et.al. 2406.15045 null
2024-06-21 MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens Yongqi Fan et.al. 2406.15019 link
2024-06-21 Human-AI collectives produce the most accurate differential diagnoses N. Zöller et.al. 2406.14981 link
2024-06-21 70B-parameter large language models in Japanese medical question-answering Issey Sukeda et.al. 2406.14882 null
2024-06-27 Efficient Continual Pre-training by Mitigating the Stability Gap Yiduo Guo et.al. 2406.14833 null
2024-07-01 ACR: A Benchmark for Automatic Cohort Retrieval Dung Ngoc Thai et.al. 2406.14780 null
2024-06-20 A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes Syed I. Munzir et.al. 2406.14757 null
2024-06-20 medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs Mingyi Jia et.al. 2406.14326 link
2024-06-19 ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World Weixiang Yan et.al. 2406.13890 link
2024-06-24 The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis Marcin Rządeczka et.al. 2406.13813 null
2024-06-19 Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health Bo Wen et.al. 2406.13659 null
2024-06-19 Optimizing Psychological Counseling with Instruction-Tuned Large Language Models Wenjie Li et.al. 2406.13617 null
2024-06-19 Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective David Restrepo et.al. 2406.13152 null
2024-06-18 Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia Ankit Aich et.al. 2406.12687 null
2024-06-18 Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics Huan Xu et.al. 2406.12651 null
2024-06-20 Towards a Client-Centered Assessment of LLM Therapists by Client Simulation Jiashuo Wang et.al. 2406.12266 link
2024-06-18 Adversarial Attacks on Large Language Models in Medicine Yifan Yang et.al. 2406.12259 null
2024-06-18 Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models Lulu Zhao et.al. 2406.12182 null
2024-06-19 Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Jack Gallifant et.al. 2406.12066 link
2024-06-28 WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions Seyedali Mohammadi et.al. 2406.12058 link
2024-06-30 MedCalc-Bench: Evaluating Large Language Models for Medical Calculations Nikhil Khandekar et.al. 2406.12036 link
2024-06-19 Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models Yuqing Wang et.al. 2406.12033 link
2024-06-17 Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams Zheheng Luo et.al. 2406.11328 link
2024-06-17 Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization Minda Hu et.al. 2406.11258 null
2024-06-16 RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information Zhiwei Liu et.al. 2406.11093 link
2024-06-15 SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task Ziije Zhong et.al. 2406.10710 link
2024-06-15 We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation Palash Moon et.al. 2406.10561 null
2024-06-15 CancerLLM: A Large Language Model in Cancer Domain Mingchen Li et.al. 2406.10459 null
2024-06-14 Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework Olivier Binette et.al. 2406.10366 null
2024-06-14 A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations Jinqiang Wang et.al. 2406.10303 link
2024-06-13 Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark Chufan Gao et.al. 2406.10292 null
2024-06-11 Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis Matteo Esposito et.al. 2406.10273 null
2024-06-14 Detecting and Evaluating Medical Hallucinations in Large Vision Language Models Jiawei Chen et.al. 2406.10185 null
2024-06-14 CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions Mingyu Derek Ma et.al. 2406.09923 link
2024-06-13 Chain-of-Thought (CoT) prompting strategies for medical error detection and correction Zhaolong Wu et.al. 2406.09103 null
2024-06-13 Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations Jun-Woo Kim et.al. 2406.08718 null
2024-06-12 Large Language Model (LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization Fengxiao Tang et.al. 2406.08305 null
2024-06-18 SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature David Wadden et.al. 2406.07835 link
2024-06-12 Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images Che Liu et.al. 2406.07146 null
2024-06-10 Large language models for generating rules, yay or nay? Shangeetha Sivasothy et.al. 2406.06835 null
2024-06-10 Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing Enshuo Hsu et.al. 2406.06723 null
2024-06-09 LLM Questionnaire Completion for Automatic Psychiatric Assessment Gony Rosenman et.al. 2406.06636 null
2024-06-07 Transforming Dental Diagnostics with Artificial Intelligence: Advanced Integration of ChatGPT and Large Language Models for Patient Care Masoumeh Farhadi Nia et.al. 2406.06616 null
2024-06-03 MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering Robert Osazuwa Ness et.al. 2406.06573 null
2024-06-10 Towards a Personal Health Large Language Model Justin Cosentino et.al. 2406.06474 null
2024-06-11 Transforming Wearable Data into Health Insights using Large Language Model Agents Mike A. Merrill et.al. 2406.06464 null
2024-06-13 A Large Language Model Pipeline for Breast Cancer Oncology Tristen Pool et.al. 2406.06455 null
2024-06-10 Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain Brian Hu et.al. 2406.06435 link
2024-06-10 MedExQA: Medical Question Answering Benchmark with Multiple Explanations Yunsoo Kim et.al. 2406.06331 link
2024-06-10 Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text Avijit Mitra et.al. 2406.06056 link
2024-06-10 Enhancing Food Safety in Supply Chains: The Potential Role of Large Language Models in Preventing Campylobacter Contamination Asaf Tzachor et.al. 2406.06049 null
2024-06-09 Zero-Shot End-To-End Spoken Question Answering In Medical Domain Yanis Labrak et.al. 2406.05876 null
2024-06-09 MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering Juraj Vladika et.al. 2406.05845 null
2024-06-08 Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification Yunhe Gao et.al. 2406.05596 null
2024-06-07 TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models Ping Yu et.al. 2406.04941 null
2024-06-06 On The Persona-based Summarization of Domain-Specific Documents Ankan Mullick et.al. 2406.03986 link
2024-06-06 UltraMedical: Building Specialized Generalists in Biomedicine Kaiyan Zhang et.al. 2406.03949 link
2024-06-06 Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As Eden Avnat et.al. 2406.03855 null
2024-06-06 A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions Lei Liu et.al. 2406.03712 null
2024-06-06 M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering Anand Subramanian et.al. 2406.03699 link
2024-06-05 Missci: Reconstructing Fallacies in Misrepresented Science Max Glockner et.al. 2406.03181 link
2024-06-05 MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge Yuxuan Zhou et.al. 2406.02919 link
2024-06-04 Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data Maxime Griot et.al. 2406.02394 link
2024-06-05 LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing Maojun Sun et.al. 2406.02350 link
2024-06-04 Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study Martin J. Hetz et.al. 2406.01428 null
2024-06-03 TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine Wenjing Yue et.al. 2406.01126 null
2024-06-04 MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning Shuyue Stella Li et.al. 2406.00922 link
2024-05-29 Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study David Pissarra et.al. 2406.00062 null
2024-05-27 EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling Yinghao Zhu et.al. 2406.00036 null
2024-05-22 KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR Hajung Kim et.al. 2406.00014 null
2024-05-26 Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models Xijie Huang et.al. 2405.20775 link
2024-05-31 GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models Mohammed-Khalil Ghali et.al. 2405.20585 null
2024-05-30 PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals Ruiyi Wang et.al. 2405.19660 link
2024-05-30 Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router Akul Goel et.al. 2405.19631 null
2024-05-26 ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text Han Yu et.al. 2405.19366 link
2024-05-29 Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models Tianrun Chen et.al. 2405.19326 null
2024-06-03 PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications Dingkang Yang et.al. 2405.19266 link
2024-05-28 Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation Anjanava Biswas et.al. 2405.18346 null
2024-05-28 Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints Aryo Pradipta Gema et.al. 2405.18028 null
2024-05-28 SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions Juexiao Zhou et.al. 2405.18004 null
2024-05-26 Augmented Risk Prediction for the Onset of Alzheimer’s Disease from Electronic Health Records with Large Language Models Jiankun Wang et.al. 2405.16413 null
2024-05-26 Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions Man Luo et.al. 2405.16402 null
2024-05-29 Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data Yuhao Chen et.al. 2405.16295 null
2024-05-28 Ensuring Ground Truth Accuracy in Healthcare with the EVINCE framework Edward Y. Chang et.al. 2405.15808 null
2024-05-27 Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development Pranab Sahoo et.al. 2405.15766 link
2024-05-24 Efficient Reinforcement Learning via Large Language Model-based Search Siddhant Bhambri et.al. 2405.15194 null
2024-05-24 Generalizable and Scalable Multistage Biomedical Concept Normalization Leveraging Large Language Models Nicholas J Dobbins et.al. 2405.15122 link
2024-05-23 Evaluating Large Language Models for Public Health Classification and Extraction Tasks Joshua Harris et.al. 2405.14766 null
2024-05-23 Exploring the use of a Large Language Model for data extraction in systematic reviews: a rapid feasibility study Lena Schmidt et.al. 2405.14445 null
2024-05-23 Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation Zhusi Zhong et.al. 2405.14113 link
2024-05-22 Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation Siyi Wu et.al. 2405.13803 null
2024-05-21 How Reliable AI Chatbots are for Disease Prediction from Patient Complaints? Ayesha Siddika Nipu et.al. 2405.13219 null
2024-05-20 Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian Rohitash Chandra et.al. 2405.13056 link
2024-05-20 Large Language Models for Medicine: A Survey Yanxin Zheng et.al. 2405.13055 null
2024-05-12 Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data Nan Miles Xi et.al. 2405.13005 null
2024-05-21 OLAPH: Improving Factuality in Biomedical Long-form Question Answering Minbyul Jeong et.al. 2405.12701 link
2024-05-21 Exploration of Masked and Causal Language Modelling for Text Generation Nicolo Micheletti et.al. 2405.12630 null
2024-05-21 DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge Bufang Yang et.al. 2405.12541 null
2024-05-20 Can AI Relate: Testing Large Language Model Response for Mental Health Support Saadia Gabriel et.al. 2405.12021 link
2024-05-19 Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning Zishan Gu et.al. 2405.11640 null
2024-05-18 Can Public LLMs be used for Self-Diagnosis of Medical Conditions? Nikil Sharan Prabahar Balasubramanian et.al. 2405.11407 null
2024-05-18 Automating PTSD Diagnostics in Clinical Interviews: Leveraging Large Language Models for Trauma Assessments Sichang Tu et.al. 2405.11178 null
2024-05-17 From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT Jace Grandinetti et.al. 2405.11040 null
2024-05-17 COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain Dimitrios P. Panagoulias et.al. 2405.10893 null
2024-05-16 Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification Jinge Wu et.al. 2405.10440 null
2024-05-14 PromptMind Team at EHRSQL-2024: Improving Reliability of SQL Generation using Ensemble LLMs Satya K Gundabathula et.al. 2405.08839 null
2024-05-14 A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine Hanguang Xiao et.al. 2405.08603 null
2024-05-14 PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles Satya Kesav Gundabathula et.al. 2405.08373 null
2024-05-30 AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments Samuel Schmidgall et.al. 2405.07960 null
2024-05-13 Evaluating large language models in medical applications: a survey Xiaolan Chen et.al. 2405.07468 null
2024-05-10 A Global Data-Driven Model for The Hippocampus and Nucleus Accumbens of Rat From The Local Field Potential Recordings (LFP) Maedeh Sadeghi et.al. 2405.06732 null
2024-05-09 Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses Gaurav Kumar Gupta et.al. 2405.06712 null
2024-05-08 Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance Goran Muric et.al. 2405.06703 null
2024-05-08 Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks Chancellor R. Woolsey et.al. 2405.06695 null
2024-05-10 Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval Mengjia Niu et.al. 2405.06545 null
2024-06-03 XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare Fatemeh Nazary et.al. 2405.06270 null
2024-05-09 Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection Bhawesh Kumar et.al. 2405.06093 null
2024-05-09 Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents Matthew Jörke et.al. 2405.06061 null
2024-05-09 Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias Shan Chen et.al. 2405.05506 link
2024-05-08 Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models Aylin Gunal et.al. 2405.05060 null
2024-05-12 DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature Dawei Li et.al. 2405.04819 link
2024-05-08 Empathy Through Multimodality in Conversational Interfaces Mahyar Abbasian et.al. 2405.04777 null
2024-05-07 AffirmativeAI: Towards LGBTQ+ Friendly Audit Frameworks for Large Language Models Yinru Long et.al. 2405.04652 null
2024-05-07 D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models Duygu Altinok et.al. 2405.04170 link
2024-05-14 ERATTA: Extreme RAG for Table To Answers with Large Language Models Sohini Roychowdhury et.al. 2405.03963 null
2024-05-08 How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs Muhammad Uzair Khattak et.al. 2405.03690 null
2024-05-06 MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline Mohamed Yaseen Jabarulla et.al. 2405.03359 link
2024-05-06 Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines Md Main Uddin Rony et.al. 2405.03153 null
2024-05-22 A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs) Lingyao Li et.al. 2405.03066 null
2024-05-05 Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents Junkai Li et.al. 2405.02957 null
2024-05-05 Confidential and Protected Disease Classifier using Fully Homomorphic Encryption Aditya Malik et.al. 2405.02790 null
2024-05-04 A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare Thomas Yu Chow Tam et.al. 2405.02559 null
2024-05-03 MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain Chao Jiang et.al. 2405.02144 null
2024-05-03 CRCL at SemEval-2024 Task 2: Simple prompt optimizations Clément Brutti-Mairesse et.al. 2405.01942 link
2024-05-03 Aloe: A Family of Fine-tuned Open Healthcare LLMs Ashwin Kumar Gururajan et.al. 2405.01886 null
2024-05-02 Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models Hye Sun Yun et.al. 2405.01686 link
2024-05-22 Leveraging Prompt-Learning for Structured Information Extraction from Crohn’s Disease Radiology Reports in a Low-Resource Language Liam Hazan et.al. 2405.01682 null
2024-04-29 Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model Seonhee Cho et.al. 2405.01591 null
2024-05-09 GPT-4 passes most of the 297 written Polish Board Certification Examinations Jakub Pokrywka et.al. 2405.01589 null
2024-05-02 Prompt engineering paradigms for medical applications: scoping review and recommendations for better practices Jamil Zaghir et.al. 2405.01249 null
2024-04-27 Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study Dou Liu et.al. 2405.00728 null
2024-04-25 Large Language Models in Healthcare: A Comprehensive Benchmark Andrew Liu et.al. 2405.00716 link
2024-04-25 Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation Hanyin Wang et.al. 2405.00715 link
2024-04-23 Interactive Analysis of LLMs using Meaningful Counterfactuals Furui Cheng et.al. 2405.00708 null
2024-05-15 “I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust Sunnie S. Y. Kim et.al. 2405.00623 null
2024-05-01 Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning Huan Xu et.al. 2405.00461 null
2024-05-01 DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training Bhuvanesh Verma et.al. 2405.00321 null
2024-05-06 Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models Scott Sumpter et.al. 2404.19713 null
2024-04-29 It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments Petter Mæhlum et.al. 2404.18832 null
2024-04-29 Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models Hongyi Zhu et.al. 2404.18746 null
2024-04-29 6G comprehensive intelligence: network operations and optimization based on Large Language Models Sifan Long et.al. 2404.18373 null
2024-04-27 MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch Nadia Saeed et.al. 2404.17999 link
2024-04-27 Advancing Healthcare Automation: Multi-Agent Systems for Medical Necessity Justification Himanshu Pandey et.al. 2404.17977 null
2024-04-27 Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models Zhongzhen Huang et.al. 2404.17897 null
2024-04-27 VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition Junyi Bian et.al. 2404.17835 null
2024-04-25 A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs Christian N. Mayemba et.al. 2404.16921 null
2024-04-25 Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare Emre Can Acikgoz et.al. 2404.16621 link
2024-04-26 Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums Isabelle Lorge et.al. 2404.16461 null
2024-04-25 LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications Saranya Krishnamoorthy et.al. 2404.16294 link
2024-04-26 Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions Divyansh Agarwal et.al. 2404.16251 null
2024-05-05 A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry Yining Huang et.al. 2404.15777 null
2024-04-27 PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models Shashi Kant Gupta et.al. 2404.15549 null
2024-04-23 IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents Jean-Philippe Corbeil et.al. 2404.15488 link
2024-04-22 Adaptive Collaboration Strategy for LLMs in Medical Decision Making Yubin Kim et.al. 2404.15155 link
2024-04-23 Bias patterns in the application of LLMs for clinical decision support: A comprehensive study Raphael Poulain et.al. 2404.15149 link
2024-04-23 Med42 – Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches Clément Christophe et.al. 2404.14779 null
2024-04-23 CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning Ling Yue et.al. 2404.14777 null
2024-04-22 WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models Ronald Xie et.al. 2404.14567 null
2024-04-22 WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction Augustin Toma et.al. 2404.14544 null
2024-04-22 No General Code of Ethics for All: Ethical Considerations in Human-bot Psycho-counseling Lizhi Ma et.al. 2404.14070 null
2024-04-20 “I Wish There Were an AI”: Challenges and AI Potential in Cancer Patient-Provider Communication Ziqi Yang et.al. 2404.13409 null
2024-04-20 UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions Ana-Cristina Rogoz et.al. 2404.13343 link
2024-04-20 Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions Soumyadeep Roy et.al. 2404.13307 link
2024-05-03 LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models Mouhamed Amine Bouchiha et.al. 2404.13236 link
2024-04-19 Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging Chia-Hsuan Chang et.al. 2404.13149 null
2024-04-25 Leveraging Large Language Model as Simulated Patients for Clinical Education Yanzeng Li et.al. 2404.13066 null
2024-04-19 Data Alignment for Zero-Shot Concept Generation in Dermatology AI Soham Gadgil et.al. 2404.13043 null
2024-04-19 Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation Guanhua Chen et.al. 2404.12879 null
2024-04-17 Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM Hongzhao Li et.al. 2404.11209 null
2024-04-15 Numerical Attributes Learning for Cardiac Failure Diagnostic from Clinical Narratives – A LESA-CamemBERT-bio Approach Boammani Aser Lompo et.al. 2404.10171 null
2024-04-14 Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms Diandian Guo et.al. 2404.09231 null
2024-04-13 Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model Zita Lifelo et.al. 2404.09045 null
2024-04-11 Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions Agasthya Gangavarapu et.al. 2404.08705 null
2024-04-11 Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain Iker García-Ferrero et.al. 2404.07613 null
2024-04-11 CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models Sheng Wang et.al. 2404.07424 null
2024-04-10 LLMs in Biomedicine: A study on clinical Named Entity Recognition Masoud Monajatipoor et.al. 2404.07376 link
2024-04-10 Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study Hongru Du et.al. 2404.06962 link
2024-04-10 Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination Soojong Kim et.al. 2404.06731 null
2024-04-10 Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology Shashi Kant Gupta et.al. 2404.06680 null
2024-04-09 Comparing Two Model Designs for Clinical Note Generation; Is an LLM a Useful Evaluator of Consistency? Nathan Brake et.al. 2404.06503 null
2024-04-08 MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering Iñigo Alonso et.al. 2404.05590 null
2024-04-15 Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations Yiming Li et.al. 2404.05415 null
2024-04-08 Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients HyoJe Jung et.al. 2404.05144 null
2024-04-07 Clinical Trials Protocol Authoring using LLMs Morteza Maleki et.al. 2404.05044 null
2024-04-07 SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials Mael Jullien et.al. 2404.04963 null
2024-04-07 PairAug: What Can Augmented Image-Text Pairs Do for Radiology? Yutong Xie et.al. 2404.04960 link
2024-04-06 Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology Dyke Ferber et.al. 2404.04667 null
2024-04-06 IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials Shreyasi Mandal et.al. 2404.04510 link
2024-04-04 Conversational Disease Diagnosis via External Planner-Controlled Large Language Models Zhoujian Sun et.al. 2404.04292 link
2024-04-11 CLUE: A Clinical Language Understanding Evaluation for LLMs Amin Dada et.al. 2404.04067 link
2024-04-04 Personalized LLM Response Generation with Parameterized Memory Injection Kai Zhang et.al. 2404.03565 link
2024-04-02 Classifying Cancer Stage with Open-Source Clinical Large Language Models Chia-Hsuan Chang et.al. 2404.01589 null
2024-04-01 Towards a potential paradigm shift in health data collection and analysis David Josef Herzog et.al. 2404.01403 null
2024-04-01 Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models Yi-Lin Tuan et.al. 2404.01295 null
2024-04-01 Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided Hongli Zhan et.al. 2404.01288 link
2024-04-01 Generating Faithful and Complete Hospital-Course Summaries from the Electronic Health Record Griffin Adams et.al. 2404.01189 null
2024-04-01 LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation Zilong Wang et.al. 2404.00998 null
2024-04-05 How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey Zhonghao Shi et.al. 2404.00938 null
2024-04-04 Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods Yujuan Fu et.al. 2404.00826 link
2024-03-30 Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4 Aryo Pradipta Gema et.al. 2404.00484 link
2024-03-29 Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain Burcu Sayin et.al. 2403.20288 link
2024-04-04 Fine-tuning Large Language Models for Automated Diagnostic Screening Summaries Manjeet Yadav et.al. 2403.20145 null
2024-03-28 Developing Healthcare Language Model Embedding Spaces Niall Taylor et.al. 2403.19802 null
2024-03-28 Bespoke Large Language Models for Digital Triage Assistance in Mental Health Care Niall Taylor et.al. 2403.19790 null
2024-03-28 A Benchmark Evaluation of Clinical Named Entity Recognition in French Nesrine Bannour et.al. 2403.19726 null
2024-03-28 BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation Yuhong He et.al. 2403.19414 null
2024-03-27 Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data Yuting Guo et.al. 2403.19031 null
2024-03-27 Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers Laura Bergomi et.al. 2403.18938 link
2024-03-27 BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models Haitao Li et.al. 2403.18365 null
2024-03-26 Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach Andrea Ferrario et.al. 2403.17873 null
2024-03-26 Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization Jae-hee So et.al. 2403.17428 link
2024-03-27 SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies Akshat Choube et.al. 2403.17219 link
2024-03-25 Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model Braja Gopal Patra et.al. 2403.17199 link
2024-03-25 Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data Shinka Mori et.al. 2403.16909 link
2024-03-25 Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm Lei Liu et.al. 2403.16446 null
2024-03-25 Dia-LLaMA: Towards Large Language Model-driven CT Report Generation Zhixuan Chen et.al. 2403.16386 null
2024-03-26 Large Language Models in Biomedical and Health Informatics: A Bibliometric Review Huizi Yu et.al. 2403.16303 null
2024-03-24 CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering Hongbin Na et.al. 2403.16008 null
2024-03-23 LLMs Instruct LLMs: An Extraction and Editing Method Xin Zhang et.al. 2403.15736 null
2024-03-20 Large language models can help boost food production, but be mindful of their risks Djavan De Clercq et.al. 2403.15475 null
2024-03-19 LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction Hejie Cui et.al. 2403.15464 null
2024-03-29 WoLF: Wide-scope Large Language Model Framework for CXR Understanding Seil Kang et.al. 2403.15456 null
2024-03-26 The opportunities and risks of large language models in mental health Hannah R. Lawrence et.al. 2403.14814 null
2024-04-02 Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis Junyoung Kim et.al. 2403.14801 null
2024-03-27 Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature Jeremy R. Harper et.al. 2403.14721 null
2024-03-21 Large Language Models for Multi-Choice Question Classification of Medical Subjects Víctor Ponce-López et.al. 2403.14582 null
2024-03-20 Polaris: A Safety-focused LLM Constellation Architecture for Healthcare Subhabrata Mukherjee et.al. 2403.13313 null
2024-03-19 Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning Mengxian Lyu et.al. 2403.13089 null
2024-03-19 Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning Cheng Peng et.al. 2403.12374 null
2024-03-18 Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach Maria Mahbub et.al. 2403.12297 null
2024-03-18 A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models Stephen R. Pfohl et.al. 2403.12025 link
2024-04-02 CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification Korbinian Randl et.al. 2403.11904 link
2024-03-18 Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure Ziyi Chen et.al. 2403.11425 link
2024-03-17 Cheap Ways of Extracting Clinical Markers from Texts Anastasia Sandu et.al. 2403.11227 link
2024-03-17 Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping Haoxi Zhang et.al. 2403.11073 null
2024-03-21 Do Large Language Models understand Medical Codes? Simon A. Lee et.al. 2403.10822 null
2024-03-16 LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices Jingping Nie et.al. 2403.10779 null
2024-03-16 Depression Detection on Social Media with Large Language Models Xiaochong Lan et.al. 2403.10750 null
2024-03-15 Neural Erosion: Emulating Controlled Neurodegeneration and Aging in AI Systems Antonios Alexos et.al. 2403.10596 null
2024-03-22 Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction Chen Chen et.al. 2403.10581 link
2024-03-15 Trusting the Search: Unraveling Human Trust in Health Information from Google and ChatGPT Xin Sun et.al. 2403.09987 null
2024-03-08 A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health Alexander Marrapese et.al. 2403.09705 null
2024-03-14 Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge Li Yizhen et.al. 2403.09164 null
2024-04-01 A Continued Pretrained LLM Approach for Automatic Medical Note Generation Dong Yuan et.al. 2403.09057 null
2024-03-15 AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic Emad A. Alghamdi et.al. 2403.09017 null
2024-03-14 Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records Erlend Frayling et.al. 2403.08664 null
2024-03-13 MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models Subash Neupane et.al. 2403.08607 null
2024-03-14 Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator Yusheng Liao et.al. 2403.08495 link
2024-03-12 SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models Yu Yang et.al. 2403.07384 link
2024-03-11 Real-Time Multimodal Cognitive Assistant for Emergency Medical Services Keshara Weerasinghe et.al. 2403.06734 link
2024-03-11 Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement Che Liu et.al. 2403.06659 link
2024-03-11 MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding Jiageng Wu et.al. 2403.06611 null
2024-03-11 Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds Jiageng Wu et.al. 2403.06609 link
2024-03-11 Can LLMs’ Tuning Methods Work in Medical Multimodal Domain? Jiawei Chen et.al. 2403.06407 link
2024-03-10 ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes Shengxin Hong et.al. 2403.06294 null
2024-03-10 FedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning Zhuo Zhang et.al. 2403.06131 null
2024-03-19 KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques Rui Yang et.al. 2403.05881 link
2024-03-08 A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries Asad Aali et.al. 2403.05720 link
2024-03-08 Decomposing Vision-based LLM Predictions for Auto-Evaluation with GPT-4 Qingqing Zhu et.al. 2403.05680 null
2024-03-11 Tell me the truth: A system to measure the trustworthiness of Large Language Models Carlo Lipizzi et.al. 2403.04964 null
2024-03-13 Electrocardiogram Instruction Tuning for Report Generation Zhongwei Wan et.al. 2403.04945 null
2024-03-07 Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering Ojas Gramopadhye et.al. 2403.04890 link
2024-03-06 Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification Ricardo Bigolin Lanfredi et.al. 2403.04024 link
2024-03-06 Towards Safe and Aligned Large Language Models for Medicine Tessa Han et.al. 2403.03744 link
2024-03-09 Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People Xidong Wang et.al. 2403.03640 link
2024-03-05 Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse Joseph Gatto et.al. 2403.03336 null
2024-03-05 Socratic Reasoning Improves Positive Text Rewriting Anmol Goel et.al. 2403.03029 null
2024-03-05 Towards Training A Chinese Large Language Model for Anesthesiology Zhonghai Wang et.al. 2403.02742 null
2024-03-05 Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research Brenda Y. Miao et.al. 2403.02558 link
2024-03-16 SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction Jiahuan Yan et.al. 2403.01570 null
2024-03-01 Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries Zelalem Gero et.al. 2403.01002 link
2024-03-01 Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language Xiaohan Ding et.al. 2403.00994 null
2024-03-01 AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models Lang Cao et.al. 2403.00953 null
2024-03-01 SoftTiger: A Clinical Foundation Model for Healthcare Workflows Ye Chen et.al. 2403.00868 link
2024-02-29 EyeGPT: Ophthalmic Assistant with Large Language Models Xiaolan Chen et.al. 2403.00840 null
2024-02-28 MedAide: Leveraging Large Language Models for On-Premise Medical Assistance on Edge Devices Abdul Basit et.al. 2403.00830 null
2024-02-18 ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework Zhongqi Yang et.al. 2403.00781 null
2024-02-29 OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models Jenish Maharjan et.al. 2402.19371 null
2024-02-29 Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study Prottay Kumar Adhikary et.al. 2402.19052 null
2024-02-28 Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models Derong Xu et.al. 2402.18099 link
2024-03-13 Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions Hanjie Chen et.al. 2402.18060 link
2024-03-02 JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability Junda Wang et.al. 2402.17887 link
2024-02-28 Prescribing Large Language Models for Perioperative Care: What’s The Right Dose for Pre-trained Models? Bing Xue et.al. 2402.17493 link
2024-02-27 A Piece of Theatre: Investigating How Teachers Design LLM Chatbots to Assist Adolescent Cyberbullying Education Michael A. Hedderich et.al. 2402.17456 null
2024-02-27 Deep Learning Based Named Entity Recognition Models for Recipes Mansi Goel et.al. 2402.17447 null
2024-02-26 OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA) Fujian Jia et.al. 2402.16810 null
2024-02-26 LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery Kexin Chen et.al. 2402.16664 link
2024-02-26 LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification Yiping Song et.al. 2402.16515 null
2024-02-26 From RAGs to riches: Using large language models to write documents for clinical trials Nigel Markey et.al. 2402.16406 null
2024-02-25 HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs Cem Uluoglakci et.al. 2402.16211 link
2024-02-27 EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings Sunjun Kweon et.al. 2402.16040 link
2024-02-24 Predicting Outcomes in Video Games with Long Short Term Memory Networks Kittimate Chulajata et.al. 2402.15923 link
2024-02-24 Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study Zhaoyue Sun et.al. 2402.15663 link
2024-02-23 Enhancing ICU Patient Recovery: Using LLMs to Assist Nurses in Diary Writing Samuel Kernan Freire et.al. 2402.15205 null
2024-02-21 Automatic Histograms: Leveraging Language Models for Text Dataset Exploration Emily Reif et.al. 2402.14880 link
2024-02-20 A Dual-Prompting for Interpretable Mental Health Language Models Hyolim Jeon et.al. 2402.14854 null
2024-02-19 RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning Congyun Jin et.al. 2402.14840 null
2024-02-23 A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health Nikhil Behari et.al. 2402.14807 null
2024-02-22 Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond Zhiyuan Wang et.al. 2402.14259 null
2024-02-22 Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology Nur Yildirim et.al. 2402.14252 null
2024-02-21 On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study Minh-Hao Van et.al. 2402.14162 null
2024-02-21 EXACT-Net: EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy Hamed Hooshangnejad et.al. 2402.14099 null
2024-02-26 Towards Building Multilingual Language Model for Medicine Pengcheng Qiu et.al. 2402.13963 link
2024-02-21 SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization Prakamya Mishra et.al. 2402.13919 link
2024-02-21 Factual Consistency Evaluation of Summarisation in the Era of Large Language Models Zheheng Luo et.al. 2402.13758 null
2024-02-20 Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation Zhiyao Ren et.al. 2402.13408 null
2024-02-17 When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection Xiangyu Zhang et.al. 2402.13276 null
2024-02-20 BiMediX: Bilingual Medical Mixture of Experts LLM Sara Pieri et.al. 2402.13253 link
2024-02-23 Benchmarking Retrieval-Augmented Generation for Medicine Guangzhi Xiong et.al. 2402.13178 link
2024-02-20 Few shot clinical entity recognition in three languages: Masked language models outperform LLM prompting Marco Naguib et.al. 2402.12801 null
2024-02-20 Me LLaMA: Foundation Large Language Models for Medical Applications Qianqian Xie et.al. 2402.12749 link
2024-02-19 LLM Agents for Psychology: A Study on Gamified Assessments Qisen Yang et.al. 2402.12326 null
2024-02-19 Automatic Evaluation for Mental Health Counseling using LLMs Anqi Li et.al. 2402.11958 null
2024-02-19 The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth Shir Lissak et.al. 2402.11886 link
2024-02-19 NOTE: Notable generation Of patient Text summaries through Efficient approach based on direct preference optimization Imjin Ahn et.al. 2402.11882 null
2024-02-20 MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs Yavuz Faruk Bakman et.al. 2402.11756 link
2024-02-18 DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics YiQiu Guo et.al. 2402.11481 null
2024-02-18 FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence Sebastian Antony Joseph et.al. 2402.11456 link
2024-02-20 Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis Shaochen Xu et.al. 2402.11398 null
2024-02-17 Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention Eunkyung Jo et.al. 2402.11353 null
2024-02-17 KnowTuning: Knowledge-aware Fine-tuning for Large Language Models Yougang Lyu et.al. 2402.11176 link
2024-02-24 Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model Salman Rahman et.al. 2402.10965 null
2024-02-10 DAEDRA: A language model for predicting outcomes in passive pharmacovigilance reporting Chris von Csefalvay et.al. 2402.10951 null
2024-02-09 Zero-shot Explainable Mental Health Analysis on Social Media by incorporating Mental Scales Wenyu Li et.al. 2402.10948 null
2024-02-16 Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks Niall Taylor et.al. 2402.10597 null
2024-02-15 BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains Yanis Labrak et.al. 2402.10373 link
2024-02-28 Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients Mahyar Abbasian et.al. 2402.10153 null
2024-02-15 Towards Reducing Diagnostic Errors with Interpretable Risk Prediction Denis Jered McInerney et.al. 2402.10109 null
2024-02-15 Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4 Ting Fang Tan et.al. 2402.10083 null
2024-02-21 AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis Zhihao Fan et.al. 2402.09742 link
2024-02-15 GPT-4’s assessment of its performance in a USMLE-based case study Uttam Dhakal et.al. 2402.09654 null
2024-02-14 Probabilistic Reasoning in Generative Large Language Models Aliakbar Nafar et.al. 2402.09614 link
2024-02-16 Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and Indications David Oniani et.al. 2402.09588 null
2024-02-14 Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support Zilin Ma et.al. 2402.09260 null
2024-02-13 Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy Gioele Barabucci et.al. 2402.08806 null
2024-02-13 JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models Jillian Fisher et.al. 2402.08761 link
2024-02-13 The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting David Haag et.al. 2402.08658 null
2024-02-20 Addressing cognitive bias in medical language models Samuel Schmidgall et.al. 2402.08113 link
2024-02-02 Exploring patient trust in clinical advice from AI-driven LLMs like ChatGPT for self-diagnosis Delong Du et.al. 2402.07920 null
2024-02-12 CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity Norbert Tihanyi et.al. 2402.07688 null
2024-02-12 The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models Ayo Adedeji et.al. 2402.07658 null
2024-02-12 Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models Isabelle Lorge et.al. 2402.07645 link
2024-02-10 Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations Ankit Pal et.al. 2402.07023 link
2024-02-10 REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models Yinghao Zhu et.al. 2402.07016 null
2024-02-09 RareBench: Can LLMs Serve as Rare Diseases Specialists? Xuanzhong Chen et.al. 2402.06341 link
2024-02-08 FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs Eun Cheol Choi et.al. 2402.05904 link
2024-02-05 Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering Aryan Agrawal et.al. 2402.05127 null
2024-02-05 Zero-Shot Clinical Trial Patient Matching with LLMs Michael Wornow et.al. 2402.05125 null
2024-02-07 CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients Pragnya Ramjee et.al. 2402.04620 link
2024-02-06 Measuring Implicit Bias in Explicitly Unbiased Large Language Models Xuechunzi Bai et.al. 2402.04105 link
2024-02-06 The Use of a Large Language Model for Cyberbullying Detection Bayode Ogunleye et.al. 2402.04088 null
2024-02-06 Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models Reza Khanmohammadi et.al. 2402.04075 null
2024-02-05 Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach Sergi Blanco-Cuaresma et.al. 2402.03435 null
2024-02-05 Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models Zhiyuan Hu et.al. 2402.03271 link
2024-02-05 Large Language Model Distilling Medication Recommendation Model Qidong Liu et.al. 2402.02803 link
2024-02-05 RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews Satpreet Harcharan Singh et.al. 2402.02656 link
2024-02-03 How well do LLMs cite relevant medical references? An evaluation framework and analyses Kevin Wu et.al. 2402.02008 null
2024-02-02 Leveraging Large Language Models for Analyzing Blood Pressure Variations Across Biological Sex from Scientific Literature Yuting Guo et.al. 2402.01826 null
2024-02-01 Hierarchical Multi-Label Classification of Online Vaccine Concerns Chloe Qinyu Zhu et.al. 2402.01783 null
2024-01-30 Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer’s Dementia Balamurali B T et.al. 2402.01751 null
2024-01-29 Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties Jasmine Chiat Ling Ong et.al. 2402.01741 null
2024-01-29 Development and Testing of Retrieval Augmented Generation in Large Language Models – A Case Study Report YuHe Ke et.al. 2402.01733 null
2024-01-28 Evaluating LLM – Generated Multimodal Diagnosis from Medical Images and Symptom Analysis Dimitrios P. Panagoulias et.al. 2402.01730 null
2024-02-10 Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data Yinghao Zhu et.al. 2402.01713 link
2024-01-25 LLM on FHIR – Demystifying Health Records Paul Schmiedmayer et.al. 2402.01711 null
2024-01-23 Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study Zhe He et.al. 2402.01693 null
2024-02-01 HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent Weijie Xu et.al. 2402.01018 link
2024-02-13 Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model Mingyu Jin et.al. 2402.00746 link
2024-02-01 SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models Tianhan Xu et.al. 2402.00474 null
2024-01-31 Multimodal Clinical Pseudo-notes for Emergency Department Prediction Tasks using Multiple Embedding Model for EHR (MEME) Simon A. Lee et.al. 2402.00160 link
2024-01-30 GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries Yuyuan Feng et.al. 2402.00068 null
2024-02-03 EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation Jonathan W. Kim et.al. 2401.18006 null
2024-01-31 Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning Yuelyu Ji et.al. 2401.17602 link
2024-01-30 Detecting mental disorder on social media: a ChatGPT-augmented explainable approach Loris Belcastro et.al. 2401.17477 link
2024-02-02 Leveraging Professional Radiologists’ Expertise to Enhance LLMs’ Evaluation for Radiology Reports Qingqing Zhu et.al. 2401.16578 null
2024-01-29 InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification Jan Trienes et.al. 2401.16475 link
2024-02-16 Combining Hierachical VAEs with LLMs for clinically meaningful timeline summarisation in social media Jiayu Song et.al. 2401.16240 null
2024-01-29 “You tell me”: A Dataset of GPT-4-Based Behaviour Change Support Conversations Selina Meyer et.al. 2401.16167 null
2024-01-29 Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis Haochun Wang et.al. 2401.16107 null
2024-01-29 Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning Kenta Izumi et.al. 2401.15966 null
2024-01-28 AI as a Medical Ally: Evaluating ChatGPT’s Usage and Impact in Indian Healthcare Aryaman Raina et.al. 2401.15605 null
2024-01-27 Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models Minbyul Jeong et.al. 2401.15269 link
2024-01-26 Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning Md Mushfiqur Rahman et.al. 2401.15043 link
2024-01-26 Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias Yu He Ke et.al. 2401.14589 null
2024-01-25 K-QA: A Real-World Medical Q&A Benchmark Itay Manes et.al. 2401.14493 link
2024-01-25 LongHealth: A Question Answering Benchmark with Long Clinical Documents Lisa Adams et.al. 2401.14490 link
2024-01-25 The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support Inhwa Song et.al. 2401.14362 null
2024-01-25 A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification Madhumita Sushil et.al. 2401.13887 null
2024-01-24 Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes Darren Liu et.al. 2401.13588 null
2024-01-20 Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA Xi Chen et.al. 2401.12998 null
2024-01-10 A General-purpose AI Avatar in Healthcare Nicholas Yan et.al. 2401.12981 null
2024-01-22 CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation Zhihong Chen et.al. 2401.12208 null
2024-01-22 CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark Ge Zhang et.al. 2401.11944 null
2024-01-21 MedLM: Exploring Language Models for Medical Question Answering Systems Niraj Yagnik et.al. 2401.11389 link
2024-01-23 Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines David Oniani et.al. 2401.11120 null
2024-01-19 BioFinBERT: Finetuning Large Language Models (LLMs) to Analyze Sentiment of Press Releases and Financial Text Around Inflection Points of Biotech Stocks Valentina Aparicio et.al. 2401.11011 null
2024-01-19 Dynamic Q&A of Clinical Documents with Large Language Models Ran Elgedawy et.al. 2401.10733 null
2024-01-17 Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study Niklas Mannhardt et.al. 2401.09637 null
2024-01-16 Gene-associated Disease Discovery Powered by Large Language Models Jiayu Chang et.al. 2401.09490 null
2024-01-17 Understanding the concerns and choices of public when using large language models for healthcare Yunpeng Xiao et.al. 2401.09090 null
2024-01-16 Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration Simone Balloccu et.al. 2401.08420 link
2024-01-14 Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study Ahmadul Karim Chowdhury et.al. 2401.07310 link
2024-01-13 EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records Wenqi Shi et.al. 2401.07128 link
2024-01-13 NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey B. Ross Katz et.al. 2401.06967 link
2024-01-12 Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data Yubin Kim et.al. 2401.06866 link
2023-12-12 Large language models in healthcare and medical domain: A review Zabir Al Nazi et.al. 2401.06775 null
2024-01-11 Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models K M Sajjadul Islam et.al. 2401.06088 null
2024-01-11 EpilepsyLLM: Domain-Specific Large Language Model Fine-tuned with Epilepsy Medical Knowledge Xuyang Zhao et.al. 2401.05908 null
2024-01-11 Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback Chengfeng Dou et.al. 2401.05695 link
2024-01-11 Towards Conversational Diagnostic AI Tao Tu et.al. 2401.05654 null
2024-01-18 MISS: A Generative Pretraining and Finetuning Approach for Med-VQA Jiawei Chen et.al. 2401.05163 link
2024-01-01 Large Language Models in Mental Health Care: a Scoping Review Yining Hua et.al. 2401.02984 null
2024-01-05 Generative Large Language Models are autonomous practitioners of evidence-based medicine Akhil Vaid et.al. 2401.02851 null
2024-01-04 SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval Griffin Adams et.al. 2401.02369 null
2024-01-04 Text2MDT: Extracting Medical Decision Trees from Medical Texts Wei Zhu et.al. 2401.02034 null
2024-01-06 Generalist embedding models are better at short-context clinical semantic search than specialized embedding models Jean-Baptiste Excoffier et.al. 2401.01943 link
2024-01-03 MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries Akash Ghosh et.al. 2401.01596 link
2024-01-06 Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review Luoma Ke et.al. 2401.01519 null
2024-01-03 Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation Walid Saba et.al. 2401.01469 null
2024-01-08 A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models S. M Towhidul Islam Tonmoy et.al. 2401.01313 null
2024-01-01 A Computational Framework for Behavioral Assessment of LLM Therapists Yu Ying Chiu et.al. 2401.00820 link
2023-12-31 An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks Yash Bingi et.al. 2401.00582 null
2023-12-31 Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing Omid Rohanian et.al. 2401.00579 null
2023-12-29 K-PERM: Personalized Response Generation Using Dynamic Knowledge Retrieval and Persona-Adaptive Queries Kanak Raj et.al. 2312.17748 link
2023-12-29 Overview of the PromptCBLUE Shared Task in CHIP2023 Wei Zhu et.al. 2312.17522 link
2023-12-29 Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning Xiao-Yang Liu et.al. 2312.17493 null
2023-12-29 EHR Interaction Between Patients and AI: NoteAid EHR Interaction Xiaocheng Zhang et.al. 2312.17475 null
2023-12-29 LLM Factoscope: Uncovering LLMs’ Factual Discernment through Inner States Analysis Jinwen He et.al. 2312.16374 null
2023-12-26 Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models Xinke Jiang et.al. 2312.15883 null
2023-12-25 IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models Zhihao Chen et.al. 2312.15663 null
2023-12-23 Multimodal Machine Learning Combining Facial Images and Clinical Texts Improves Diagnosis of Rare Genetic Diseases Da Wu et.al. 2312.15320 link
2023-12-06 Empowering ChatGPT-Like Large-Scale Language Models with Local Knowledge Base for Industrial Prognostics and Health Management Huan Wang et.al. 2312.14945 null
2023-12-22 Robust Knowledge Extraction from Large Language Models using Social Choice Theory Nico Potyka et.al. 2312.14877 link
2023-12-22 Zero-shot Causal Graph Extrapolation from Text via LLMs Alessandro Antonucci et.al. 2312.14670 link
2023-12-19 Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning Xiaodan Zhang et.al. 2312.14184 null
2023-12-20 Exploring Multimodal Large Language Models for Radiology Report Error-checking Jinge Wu et.al. 2312.13103 null
2023-12-20 MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models Yan Cai et.al. 2312.12806 null
2023-12-20 Fine-tuning Large Language Models for Adaptive Machine Translation Yasmin Moslem et.al. 2312.12740 link
2023-12-20 Mini-GPTs: Efficient Large Language Models through Contextual Pruning Tim Valicenti et.al. 2312.12682 null
2023-12-19 Can ChatGPT be Your Personal Medical Assistant? Md. Rafiul Biswas et.al. 2312.12006 null
2023-12-19 Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health Maria Antoniak et.al. 2312.11803 link
2023-12-16 CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare Akash Ghosh et.al. 2312.11541 link
2023-12-16 A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers Feida Gu et.al. 2312.10419 null
2023-12-15 GPT-doctor: Customizing Large Language Models for Medical Consultation Wen Wang et.al. 2312.10225 null
2023-12-15 Low-resource classification of mobility functioning information in clinical sentences using large language models Tuan Dung Le et.al. 2312.10202 null
2023-12-06 Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk Colleen Chan et.al. 2312.10072 null
2023-12-15 Distilling Large Language Models for Matching Patients to Clinical Trials Mauro Nievas et.al. 2312.09958 null
2024-01-07 RJUA-QA: A Comprehensive QA Dataset for Urology Shiwei Lyu et.al. 2312.09785 link
2023-12-14 Evaluating Large Language Models for Health-related Queries with Presuppositions Navreet Kaur et.al. 2312.08800 link
2023-12-15 High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models Songchi Zhou et.al. 2312.08274 null
2023-12-13 CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs Huaiyuan Ying et.al. 2312.08036 link
2023-12-12 Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales Taeyoon Kwon et.al. 2312.07399 link
2023-12-12 Efficient Few-Shot Clinical Task Adaptation with Large Language Models Kaipeng Zheng et.al. 2312.07125 null
2023-12-12 SM70: A Large Language Model for Medical Devices Anubhav Bhatti et.al. 2312.06974 null
2023-12-05 Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety Manas Gaur et.al. 2312.06798 null
2023-12-11 Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping Will E. Thompson et.al. 2312.06457 null
2023-12-11 Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need Cheng Peng et.al. 2312.06099 null
2023-12-09 Enhancing Medical Specialty Assignment to Patients using NLP Techniques Chris Solomou et.al. 2312.05585 null
2023-11-10 Holistic Evaluation of GPT-4V for Biomedical Imaging Zhengliang Liu et.al. 2312.05256 null
2023-12-08 Ophtha-LLaMA2: A Large Language Model for Ophthalmology Huan Zhao et.al. 2312.04906 null
2023-12-07 AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making Shusen Liu et.al. 2312.04494 null
2023-12-08 Methods to Estimate Large Language Model Confidence Maia Kotelanski et.al. 2312.03733 null
2023-12-06 XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering Joel Stremmel et.al. 2312.03567 null
2023-12-05 Breast Ultrasound Report Generation using LangChain Jaeyoung Huh et.al. 2312.03013 null
2023-12-05 MedDM: LLM-executable clinical guidance tree for clinical decision-making Binbin Li et.al. 2312.02441 null
2023-12-04 LLMs Accelerate Annotation for Medical Information Extraction Akshay Goel et.al. 2312.02296 null
2023-12-04 MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model Ling Yang et.al. 2312.02233 null
2023-12-03 Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation Yuzhe Lu et.al. 2312.01504 null
2023-12-18 From Beginner to Expert: Modeling Medical Knowledge into General LLMs Qiang Li et.al. 2312.01040 null
2023-12-01 Explanatory Argument Extraction of Correct Answers in Resident Medical Exams Iakes Goenaga et.al. 2312.00567 link
2023-11-30 Towards Accurate Differential Diagnosis with Large Language Models Daniel McDuff et.al. 2312.00164 null
2023-11-30 RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance Chantal Pellegrini et.al. 2311.18681 link
2023-11-29 Are we going MAD? Benchmarking Multi-Agent Debate between Language Models for Medical Q&A Andries Smit et.al. 2311.17371 link
2023-11-27 MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Zeming Chen et.al. 2311.16079 link
2023-11-27 BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights François Remy et.al. 2311.16075 null
2023-11-27 RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization Kwanyoung Kim et.al. 2311.15876 null
2023-11-28 The effect of source disclosure on evaluation of AI-generated messages: A two-part study Sue Lim et.al. 2311.15544 null
2023-11-25 Walking a Tightrope – Evaluating Large Language Models in High-Risk Domains Chia-Chien Hung et.al. 2311.14966 null
2023-11-20 MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer’s Care Via Unleashing Generative AI Lifei Zheng et.al. 2311.14730 null
2023-11-10 ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management Angela Zhang et.al. 2311.14703 null
2023-11-07 Benefits and Harms of Large Language Models in Digital Mental Health Munmun De Choudhury et.al. 2311.14693 null
2023-11-23 Challenges of Large Language Models for Mental Health Counseling Neo Christopher Chung et.al. 2311.13857 null
2023-11-22 Surpassing GPT-4 Medical Coding with a Two-Stage Approach Zhichao Yang et.al. 2311.13735 null
2023-11-22 Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting Daphne van Zandvoort et.al. 2311.13274 null
2023-11-25 From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models Zachary Englhardt et.al. 2311.13063 link
2023-10-28 Overview of Current Applications of Large Language Models in Various Medical Specialities Ummara Mumtaz et.al. 2311.12882 null
2023-11-21 ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models Jiankai Tang et.al. 2311.12524 link
2023-11-20 Web News Timeline Generation with Extended Task Prompting Sha Wang et.al. 2311.11652 null
2023-12-17 Rethinking Large Language Models in Mental Health Applications Shaoxiong Ji et.al. 2311.11267 null
2023-11-18 Designing Interpretable ML System to Enhance Trustworthy AI in Healthcare: A Systematic Review of the Last Decade to A Proposed Robust Framework Elham Nasarian et.al. 2311.11055 null
2023-11-17 PEFT-MedAware: Large Language Model for Medical Awareness Keivalya Pandya et.al. 2311.10697 null
2023-11-17 Countering Misinformation via Emotional Response Generation Daniel Russo et.al. 2311.10587 link
2023-11-16 MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning Xiangru Tang et.al. 2311.10537 link
2023-11-16 ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond Kanhai S. Amin et.al. 2311.10075 null
2023-11-16 HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs Junying Chen et.al. 2311.09774 link
2023-11-16 CARE: Extracting Experimental Findings From Clinical Literature Aakanksha Naik et.al. 2311.09736 null
2023-11-16 Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation Zonghai Yao et.al. 2311.09684 link
2023-11-16 LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks Mihir Parmar et.al. 2311.09564 link
2023-11-12 Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling Yujin Cho et.al. 2311.09243 null
2023-11-15 PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health Haoan Jin et.al. 2311.09189 link
2023-11-14 Fine-tuning Language Models for Factuality Katherine Tian et.al. 2311.08401 null
2023-11-14 Extrinsically-Focused Evaluation of Omissions in Medical Summarization Elliot Schumacher et.al. 2311.08303 link
2023-11-14 Insights into Classifying and Mitigating LLMs’ Hallucinations Alessandro Bruno et.al. 2311.08117 null
2023-11-13 It’s Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models Nishant Balepur et.al. 2311.07532 link
2023-11-13 Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer Narmada Naik et.al. 2311.07191 null
2023-11-12 Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations? Antonio Zaitoun et.al. 2311.06858 link
2023-11-23 ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences Yuanhe Tian et.al. 2311.06025 link
2023-11-09 A Survey of Large Language Models in Medicine: Progress, Application, and Challenge Hongjian Zhou et.al. 2311.05112 link
2023-11-08 DEMASQ: Unmasking the ChatGPT Wordsmith Kavita Kumari et.al. 2311.05019 null
2023-11-07 Evaluating Large Language Models in Ophthalmology Jason Holmes et.al. 2311.04933 null
2023-11-07 Evaluating multiple large language models in pediatric ophthalmology Jason Holmes et.al. 2311.04368 null
2023-11-08 An Introduction to Natural Language Processing Techniques and Framework for Clinical Implementation in Radiation Oncology Reza Khanmohammadi et.al. 2311.02205 null
2023-11-03 Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review Mingze Yuan et.al. 2311.01918 link
2023-11-27 LLM-driven Multimodal Target Volume Contouring in Radiation Oncology Yujin Oh et.al. 2311.01908 link
2023-11-01 Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models Ran Xu et.al. 2311.00287 link
2023-10-31 Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision Jiaxin Zhang et.al. 2310.20153 null
2023-11-03 Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization Prakamya Mishra et.al. 2310.20033 link
2023-10-30 EHRTutor: Enhancing Patient Understanding of Discharge Instructions Zihao Zhang et.al. 2310.19212 null
2023-10-23 Health Disparities through Generative AI Models: A Comparison Study Using A Domain Specific large language model Yohn Jairo Parra Bautista et.al. 2310.18355 null
2023-10-21 MOELoRA: An MOE-based Parameter Efficient Fine-Tuning Method for Multi-task Medical Applications Qidong Liu et.al. 2310.18339 link
2023-11-01 Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare Junling Liu et.al. 2310.17956 link
2023-10-31 Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting Benjamin Yan et.al. 2310.17811 null
2023-10-25 An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives Young Min Cho et.al. 2310.17017 link
2023-10-24 Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature Alejandro Lozano et.al. 2310.16146 link
2023-10-24 NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes Junda Wang et.al. 2310.15959 link
2023-10-24 BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT Yirong Chen et.al. 2310.15896 link
2023-10-24 BLESS: Benchmarking Large Language Models on Sentence Simplification Tannon Kew et.al. 2310.15773 link
2023-10-23 AlpaCare: Instruction-tuned Large Language Models for Medical Application Xinlu Zhang et.al. 2310.14558 link
2023-10-22 PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain Wei Zhu et.al. 2310.14151 link
2023-10-23 Explainable Depression Symptom Detection in Social Media Eliseo Bao Souto et.al. 2310.13664 null
2023-10-23 Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries Yiqiao Jin et.al. 2310.13132 link
2023-10-19 Causal-structure Driven Augmentations for Text OOD Generalization Amir Feder et.al. 2310.12803 null
2023-10-18 On the Benefit of Generative Foundation Models for Human Activity Recognition Zikang Leng et.al. 2310.12085 null
2023-10-17 Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models Khushboo Verma et.al. 2310.11266 null
2023-10-16 JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning Issey Sukeda et.al. 2310.10083 null
2023-10-13 Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation Eun Cheol Choi et.al. 2310.09223 null
2023-10-13 Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model Qichen Ye et.al. 2310.09089 link

UncertaintyLLM

Publish Date Title Authors PDF Code
2025-06-26 Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection Ali Şenol et.al. 2506.21443 null
2025-06-26 Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference Colin Samplawski et.al. 2506.21408 null
2025-06-26 Small Encoders Can Rival Large Decoders in Detecting Groundedness Istabrak Abbes et.al. 2506.21288 null
2025-06-26 BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services Zhaojiacheng Zhou et.al. 2506.21033 null
2025-06-26 Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers Martin Ruskov et.al. 2506.20982 null
2025-06-25 Towards Probabilistic Question Answering Over Tabular Data Chen Shen et.al. 2506.20747 null
2025-06-25 Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges Alexander D. Kalian et.al. 2506.20598 null
2025-06-26 TAPS: Tool-Augmented Personalisation via Structured Tagging Ekaterina Taktasheva et.al. 2506.20409 null
2025-06-25 Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models Kejia Chen et.al. 2506.20251 null
2025-06-25 DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs Ruokai Yin et.al. 2506.20194 null
2025-06-24 KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Baochang Ren et.al. 2506.19807 null
2025-06-24 LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis Lei Kang et.al. 2506.19702 null
2025-06-24 Correcting Hallucinations in News Summaries: Exploration of Self-Correcting LLM Methods with External Knowledge Juraj Vladika et.al. 2506.19607 null
2025-06-24 Automatic Posology Structuration : What role for LLMs? Natalia Bobkova et.al. 2506.19525 null
2025-06-24 Inference-Time Reward Hacking in Large Language Models Hadi Khalaf et.al. 2506.19248 null
2025-06-23 AgenticControl: An Automated Control Design Framework Using Large Language Models Mohammad Narimani et.al. 2506.19160 null
2025-06-23 Human-Aligned Faithfulness in Toxicity Explanations of LLMs Ramaravind K. Mothilal et.al. 2506.19113 null
2025-06-23 Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge Sahil Kale et.al. 2506.18998 null
2025-06-23 AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs Piotr Matys et.al. 2506.18628 null
2025-06-23 ReFrame: Rectification Framework for Image Explaining Architectures Debjyoti Das Adhikary et.al. 2506.18272 null
2025-06-24 Understanding Reasoning in Thinking Language Models via Steering Vectors Constantin Venhoff et.al. 2506.18167 null
2025-06-22 Mechanistic Interpretability in the Presence of Architectural Obfuscation Marcos Florencio et.al. 2506.18053 null
2025-06-22 QueueEDIT: Structural Self-Correction for Sequential Model Editing in LLMs Taolin Zhang et.al. 2506.17864 null
2025-06-21 Is Your Automated Software Engineer Trustworthy? Noble Saji Mathews et.al. 2506.17812 null
2025-06-24 KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation Dalong Zhang et.al. 2506.17728 null
2025-06-21 Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering Binquan Ji et.al. 2506.17692 null
2025-06-21 Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models Yukun Huang et.al. 2506.17585 null
2025-06-20 OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections Manasa Bharadwaj et.al. 2506.17449 null
2025-06-20 UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making Jinhao Duan et.al. 2506.17419 null
2025-06-20 Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs Zongjie Li et.al. 2506.17353 null
2025-06-18 Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study Chuanlei Li et.al. 2506.17311 null
2025-06-17 Semantic uncertainty in advanced decoding methods for LLM generation Darius Foodeei et.al. 2506.17296 null
2025-06-20 Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction Jiekai Ma et.al. 2506.17203 null
2025-06-20 Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation Jiahao Cheng et.al. 2506.17088 null
2025-06-20 Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond Antonin Berthon et.al. 2506.16982 null
2025-06-20 DistillNote: LLM-based clinical note summaries improve heart failure diagnosis Heloisa Oss Boll et.al. 2506.16777 null
2025-06-20 eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing Isaac Shi et.al. 2506.16768 null
2025-06-20 The Role of Model Confidence on Bias Effects in Measured Uncertainties Xinyi Liu et.al. 2506.16724 null
2025-06-19 Grounding Language Models with Semantic Digital Twins for Robotic Planning Mehreen Naeem et.al. 2506.16493 null
2025-06-19 Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation Guilherme Guerino et.al. 2506.16345 null
2025-06-19 SGIC: A Self-Guided Iterative Calibration Framework for RAG Guanhua Chen et.al. 2506.16172 null
2025-06-19 Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior Hao Li et.al. 2506.16163 link
2025-06-19 Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning Duc Hieu Ho et.al. 2506.16064 null
2025-06-19 DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling Fei Wang et.al. 2506.16043 null
2025-06-18 Understanding Online Polarization Through Human-Agent Interaction in a Synthetic LLM-Based Social Network Tim Donkers et.al. 2506.15866 null
2025-06-18 PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection Wenhao Li et.al. 2506.15656 null
2025-06-18 Context-Informed Grounding Supervision Hyunji Lee et.al. 2506.15480 link
2025-06-18 Unlocking Post-hoc Dataset Inference with Synthetic Data Bihe Zhao et.al. 2506.15271 null
2025-06-18 Robust Instant Policy: Leveraging Student’s t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation Hanbit Oh et.al. 2506.15157 null
2025-06-18 HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models Trishna Chakraborty et.al. 2506.15065 null
2025-06-17 Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning Wassim Bouaziz et.al. 2506.14913 null
2025-06-17 Issue Retrieval and Verification Enhanced Supplementary Code Comment Generation Yanzhen Zou et.al. 2506.14649 link
2025-06-17 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees Ahmed Heakl et.al. 2506.14606 null
2025-06-17 RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition Tim Cofala et.al. 2506.14412 null
2025-06-17 Don’t Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning William F. Shen et.al. 2506.14387 null
2025-06-17 AviationLLM: An LLM-based Knowledge System for Aviation Training Jia’ang Wan et.al. 2506.14336 null
2025-06-17 Improving LoRA with Variational Learning Bai Cong et.al. 2506.14280 null
2025-06-17 DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization Chengyu Huang et.al. 2506.14157 link
2025-06-17 Abstract Meaning Representation for Hospital Discharge Summarization Paul Landes et.al. 2506.14101 link
2025-06-20 Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs Hen Davidov et.al. 2506.13593 link
2025-06-16 Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning David Bani-Harouni et.al. 2506.13474 null
2025-06-17 ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models Junho Yoon et.al. 2506.13472 null
2025-06-16 From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs Alsharif Abuadbba et.al. 2506.13434 null
2025-06-16 Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs Houcheng Jiang et.al. 2506.13285 null
2025-06-16 IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation Zijie Lin et.al. 2506.13229 link
2025-06-16 SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists Lynn Khellaf et.al. 2506.13188 null
2025-06-16 Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning Danny Hoang et.al. 2506.13026 null
2025-06-17 Surprise Calibration for Better In-Context Learning Zhihang Tan et.al. 2506.12796 null
2025-06-15 Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning Alexis R. Tudor et.al. 2506.12667 null
2025-06-14 Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics Jiarui Liu et.al. 2506.12657 null
2025-06-14 GenControl: Generative AI-Driven Autonomous Design of Control Algorithms Chenggang Cui et.al. 2506.12554 null
2025-06-14 RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking Shuo Yang et.al. 2506.12538 null
2025-06-14 Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation Xiangyan Chen et.al. 2506.12496 null
2025-06-14 MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination Ao Jia et.al. 2506.12483 null
2025-06-13 Uncovering Bias Paths with LLM-guided Causal Discovery: An Active Learning and Dynamic Scoring Approach Khadija Zanna et.al. 2506.12227 null
2025-06-13 A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions Stephen Mell et.al. 2506.12202 null
2025-06-13 Maximally-Informative Retrieval for State Space Model Generation Evan Becker et.al. 2506.12149 null
2025-06-12 LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model’s Response for Vulnerability Analysis Reza Fayyazi et.al. 2506.12100 link
2025-06-13 LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Zihan Zheng et.al. 2506.11928 null
2025-06-13 TreeRL: LLM Reinforcement Learning with On-Policy Tree Search Zhenyu Hou et.al. 2506.11902 link
2025-06-16 Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making Claudio Fanconi et.al. 2506.11887 null
2025-06-13 Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study Hawau Olamide Toyin et.al. 2506.11602 null
2025-06-13 Augmenting the Generality and Performance of Large Language Models for Software Engineering Fabian C. Peña et.al. 2506.11548 null
2025-06-11 Digitization of Document and Information Extraction using OCR Rasha Sinha et.al. 2506.11156 null
2025-06-11 From over-reliance to smart integration: using Large-Language Models as translators between specialized modeling and simulation tools Philippe J. Giabbanelli et.al. 2506.11141 null
2025-06-10 Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK Carlos Garcia-Fernandez et.al. 2506.11129 null
2025-06-14 Farseer: A Refined Scaling Law in Large Language Models Houyi Li et.al. 2506.10972 link
2025-06-12 Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers Yixiao Huang et.al. 2506.10887 null
2025-06-13 Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles Qingyan Wei et.al. 2506.10848 link
2025-06-12 Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs Alberto Testoni et.al. 2506.10769 null
2025-06-12 Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs Yilin Xiao et.al. 2506.10508 null
2025-06-12 PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier Yuhua Jiang et.al. 2506.10406 null
2025-06-12 AutoGEEval++: A Multi-Level and Multi-Geospatial-Modality Automated Evaluation Framework for Large Language Models in Geospatial Code Generation on Google Earth Engine Shuyang Hou et.al. 2506.10365 null
2025-06-12 TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree Yu-Yang Qian et.al. 2506.10355 link
2025-06-12 Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements Seyed Moein Abtahi et.al. 2506.10330 null
2025-06-12 WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models Qiyue Yin et.al. 2506.10264 null
2025-06-11 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Xiyao Wang et.al. 2506.10128 link
2025-06-11 Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection David Farr et.al. 2506.10104 null
2025-06-11 Textual Bayes: Quantifying Uncertainty in LLM-Based Systems Brendan Leigh Ross et.al. 2506.10060 null
2025-06-10 Evaluation empirique de la sécurisation et de l’alignement de ChatGPT et Gemini: analyse comparative des vulnérabilités par expérimentations de jailbreaks Rafaël Nouailles et.al. 2506.10029 null
2025-06-16 Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs Hiroshi Matsuda et.al. 2506.09983 link
2025-06-11 Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs Rodion Oblovatny et.al. 2506.09886 null
2025-06-11 Do LLMs Give Psychometrically Plausible Responses in Educational Assessments? Andreas Säuberli et.al. 2506.09796 null
2025-06-11 Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models Haoyi Song et.al. 2506.09684 link
2025-06-11 Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering Tianjun Yao et.al. 2506.09645 link
2025-06-11 HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding Yanzhao Shi et.al. 2506.09634 null
2025-06-11 From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies Blaž Škrlj et.al. 2506.09566 null
2025-06-11 DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts Yuchen Feng et.al. 2506.09351 null
2025-06-11 Know What You Don’t Know: Uncertainty Calibration of Process Reward Models Young-Jin Park et.al. 2506.09338 null
2025-06-10 G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration Samuel Holt et.al. 2506.09272 null
2025-06-10 Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation Karl Löwenmark et.al. 2506.09247 null
2025-06-10 The Curious Language Model: Strategic Test-Time Information Acquisition Michael Cooper et.al. 2506.09173 null
2025-06-10 Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models Xinyuan Wang et.al. 2506.09084 null
2025-06-10 FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making Jiaxiang Chen et.al. 2506.09080 null
2025-06-10 AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions Polina Kirichenko et.al. 2506.09038 link
2025-06-11 Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance Kaifeng He et.al. 2506.08980 null
2025-06-10 The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation Francisco Vargas et.al. 2506.08827 null
2025-06-12 ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization Hee Suk Yoon et.al. 2506.08712 null
2025-06-10 RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being Rahatara Ferdousi et.al. 2506.08486 null
2025-06-10 Olica: Efficient Structured Pruning of Large Language Models without Retraining Jiujun He et.al. 2506.08436 link
2025-06-11 Transforming Expert Knowledge into Scalable Ontology via Large Language Models Ikkei Itoku et.al. 2506.08422 null
2025-06-09 Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic Zhenjiang Mao et.al. 2506.08243 null
2025-06-09 Conservative Bias in Large Language Models: Measuring Relation Predictions Toyin Aguda et.al. 2506.08120 null
2025-06-10 Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation Jiaxiang Chen et.al. 2506.07820 null
2025-06-09 Language-Vision Planner and Executor for Text-to-Visual Reasoning Yichang Xu et.al. 2506.07778 null
2025-06-09 QUITE: A Query Rewrite System Beyond Rules with LLM Agents Yuyang Song et.al. 2506.07675 null
2025-06-09 Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models Ruiyang Zhang et.al. 2506.07575 null
2025-06-09 SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition Mengsong Wu et.al. 2506.07557 null
2025-06-09 CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models Guang Liu et.al. 2506.07463 null
2025-06-09 From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered Siddartha Devic et.al. 2506.07461 null
2025-06-09 Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs T. Duy Nguyen-Hien et.al. 2506.07448 null
2025-06-11 MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models Philip R. Liu et.al. 2506.07400 link
2025-06-10 ARGUS: Hallucination and Omission Evaluation in Video-LLMs Ruchit Rawal et.al. 2506.07371 null
2025-06-08 ConfQA: Answer Only If You Are Confident Yin Huang et.al. 2506.07309 null
2025-06-08 Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance Mohammadreza Chavoshi et.al. 2506.07273 null
2025-06-08 Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis Yaping Chai et.al. 2506.07148 null
2025-06-08 Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models Samir Abdaljalil et.al. 2506.07106 null
2025-06-08 Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models Kai Xiong et.al. 2506.07064 null
2025-06-08 AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint Leheng Sheng et.al. 2506.07022 link
2025-06-07 Quantile Regression with Large Language Models for Price Prediction Nikhita Vedula et.al. 2506.06657 null
2025-06-07 QuantMCP: Grounding Large Language Models in Verifiable Financial Reality Yifan Zeng et.al. 2506.06622 null
2025-06-06 Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques Adarsh Prasad Behera et.al. 2506.06579 null
2025-06-06 Beyond Facts: Evaluating Intent Hallucination in Large Language Models Yijie Hao et.al. 2506.06539 null
2025-06-11 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Pengyi Li et.al. 2506.06395 null
2025-06-04 On the Fundamental Impossibility of Hallucination Control in Large Language Models Michał P. Karpowicz et.al. 2506.06382 null
2025-06-06 Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge Yi Sui et.al. 2506.06240 null
2025-06-06 Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach James Ford et.al. 2506.06175 null
2025-06-06 Recommender systems, stigmergy, and the tyranny of popularity Zackary Okun Dunivin et.al. 2506.06162 null
2025-06-09 MIRIAD: Augmenting LLMs with millions of medical query-response pairs Qinyue Zheng et.al. 2506.06091 null
2025-06-06 AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search Yu Li et.al. 2506.06017 null
2025-06-06 Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques Xiaofei Xu et.al. 2506.05924 null
2025-06-06 Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness Rongzhe Wei et.al. 2506.05735 null
2025-06-09 Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models Zefan Zeng et.al. 2506.05675 null
2025-06-05 When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Yan Shu et.al. 2506.05551 null
2025-06-05 Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models Sima Noorani et.al. 2506.05497 null
2025-06-05 CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection Ron Eliav et.al. 2506.05243 null
2025-06-05 On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools Shivani Upadhyay et.al. 2506.05182 link
2025-06-05 When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models Kai Wang et.al. 2506.04909 null
2025-06-05 Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights Giorgio Biancini et.al. 2506.04851 null
2025-06-05 Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models Changyue Wang et.al. 2506.04832 link
2025-06-05 A Reasoning-Based Approach to Cryptic Crossword Clue Solving Martin Andrews et.al. 2506.04824 null
2025-06-05 GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval Lingyuan Liu et.al. 2506.04762 link
2025-06-05 Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning Zhiyuan Ma et.al. 2506.04625 null
2025-06-05 Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification Chengwu Liu et.al. 2506.04592 null
2025-06-04 AuthGuard: Generalizable Deepfake Detection via Language Guidance Guangyu Shen et.al. 2506.04501 null
2025-06-04 “Don’t Do That!”: Guiding Embodied Systems through Large Language Model-based Constraint Generation Aladin Djuhera et.al. 2506.04500 null
2025-06-04 Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification Payel Bhattacharjee et.al. 2506.04450 null
2025-06-06 TracLLM: A Generic Framework for Attributing Long Context LLMs Yanting Wang et.al. 2506.04202 link
2025-06-04 N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion Caleb Chin et.al. 2506.04166 link
2025-06-04 A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization Sarvesh Soni et.al. 2506.04156 null
2025-06-04 High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning Tim Franzmeyer et.al. 2506.04051 null
2025-06-04 Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization Jiulong Wu et.al. 2506.04039 null
2025-06-05 Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems Yuxin Zhang et.al. 2506.03901 null
2025-06-04 Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation Mingxuan Xia et.al. 2506.03857 null
2025-06-04 From Theory to Practice: Real-World Use Cases on Trustworthy LLM-Driven Process Modeling, Prediction and Automation Peter Pfeiffer et.al. 2506.03801 null
2025-06-04 Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision Chaeyun Jang et.al. 2506.03723 null
2025-06-04 AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism Zhepei Wei et.al. 2506.03700 link
2025-06-04 Robust Preference Optimization via Dynamic Target Margins Jie Sun et.al. 2506.03690 null
2025-06-04 Trustworthy Medical Question Answering: An Evaluation-Centric Survey Yinuo Wang et.al. 2506.03659 null
2025-06-04 Learning to Insert [PAUSE] Tokens for Better Reasoning Eunki Kim et.al. 2506.03616 null
2025-06-04 Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering Zhuo Zhuo et.al. 2506.03504 null
2025-06-03 Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior Yue Gong et.al. 2506.03444 null
2025-06-03 Sampling Preferences Yields Simple Trustworthiness Scores Sean Steinle et.al. 2506.03399 null
2025-06-03 Ask a Local: Detecting Hallucinations With Specialized Model Divergence Aldan Creo et.al. 2506.03357 null
2025-06-03 Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows Yifei Ming et.al. 2506.03332 null
2025-06-03 FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes Christodoulos Constantinides et.al. 2506.03278 link
2025-06-03 Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech Florian Ludwig et.al. 2506.03009 null
2025-06-03 Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation Li Zhang et.al. 2506.02992 null
2025-06-03 Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation Dingwei Chen et.al. 2506.02973 null
2025-06-04 A Multi-agent LLM-based JUnit Test Generation with Strong Oracles Qinghua Xu et.al. 2506.02943 null
2025-06-03 Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs Shangmin Guo et.al. 2506.02918 null
2025-06-03 Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs Wenjing Tang et.al. 2506.02860 null
2025-06-03 Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations Jinyuan Luo et.al. 2506.02696 null
2025-06-04 Computational Thinking Reasoning in Large Language Models Kechi Zhang et.al. 2506.02658 null
2025-06-03 In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration Jiajie Fu et.al. 2506.02509 null
2025-06-03 Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning Haowen Xu et.al. 2506.02485 null
2025-06-02 Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation Priyaranjan Pattnayak et.al. 2506.02097 null
2025-06-02 DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation Jennifer Chen et.al. 2506.01954 null
2025-06-02 Self-ensemble: Mitigating Confidence Distortion for Large Language Models Zicheng Xu et.al. 2506.01951 null
2025-06-02 WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue Yaoyao Qian et.al. 2506.01881 link
2025-06-02 Benford’s Curse: Tracing Digit Bias to Numerical Hallucination in LLMs Jiandong Shao et.al. 2506.01734 null
2025-06-02 Fairness Dynamics During Training Krishna Patel et.al. 2506.01709 null
2025-06-02 When LLMs Team Up: The Emergence of Collaborative Affective Computing Wenna Lai et.al. 2506.01698 null
2025-06-02 MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments Xiao Yang et.al. 2506.01616 null
2025-06-02 Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes Meng Li et.al. 2506.01512 null
2025-06-02 MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations Kensuke Mitsuzawa et.al. 2506.01367 null
2025-06-02 Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents Manan Suri et.al. 2506.01344 null
2025-06-02 Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model Yuanhe Tian et.al. 2506.01266 null
2025-06-01 Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation Pimchanok Sukjai et.al. 2506.01118 null
2025-06-01 ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation Xinyi Liu et.al. 2506.01116 null
2025-06-01 Reconsidering LLM Uncertainty Estimation Methods in the Wild Yavuz Bakman et.al. 2506.01114 null
2025-06-01 Contextual Candor: Enhancing LLM Trustworthiness Through Hierarchical Unanswerability Detection Steven Robinson et.al. 2506.01104 null
2025-06-01 Taming LLMs by Scaling Learning Rates with Gradient Grouping Siyuan Li et.al. 2506.01049 null
2025-06-01 Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks Yuntai Bao et.al. 2506.00823 link
2025-06-01 One for All: Update Parameterized Knowledge Across Multiple Models Weitao Ma et.al. 2506.00817 null
2025-06-01 Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision Jiahui Zhou et.al. 2506.00807 null
2025-06-01 KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision Rong Wu et.al. 2506.00783 null
2025-06-01 Do not Abstain! Identify and Solve the Uncertainty Jingyu Liu et.al. 2506.00780 null
2025-05-31 Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection Yeshwanth Venkatesha et.al. 2506.00743 null
2025-05-31 Pitfalls in Evaluating Language Model Forecasters Daniel Paleka et.al. 2506.00723 null
2025-06-03 Measuring Faithfulness and Abstention: An Automated Pipeline for Evaluating LLM-Generated 3-ply Case-Based Legal Arguments Li Zhang et.al. 2506.00694 null
2025-05-31 Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs Chenjun Xu et.al. 2506.00582 link
2025-05-31 AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs Nicholas E. Corrado et.al. 2506.00569 null
2025-06-03 CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention Yuxi Sun et.al. 2506.00519 null
2025-05-31 Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering Linhao Ye et.al. 2506.00491 null
2025-05-31 Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization Suhas BN et.al. 2506.00448 null
2025-05-31 Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy Jie Ren et.al. 2506.00359 null
2025-05-31 Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs Sungjae Lee et.al. 2506.00344 null
2025-05-31 TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering Boyi Zhang et.al. 2506.00331 null
2025-05-31 Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning Sara Ghazanfari et.al. 2506.00318 null
2025-05-30 Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity Dang Nguyen et.al. 2506.00245 null
2025-05-30 MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs Gabrielle Kaili-May Liu et.al. 2505.24858 link
2025-05-30 Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs Juraj Vladika et.al. 2505.24830 null
2025-06-02 Guiding Generative Storytelling with Knowledge Graphs Zhijun Pan et.al. 2505.24803 null
2025-05-30 Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty? Jiayu Liu et.al. 2505.24778 link
2025-05-30 Can LLMs and humans be friends? Uncovering factors affecting human-AI intimacy formation Yeseon Hong et.al. 2505.24658 null
2025-05-30 The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models Junyi Li et.al. 2505.24630 link
2025-05-30 LLM Inference Enhanced by External Knowledge: A Survey Yu-Hsuan Lin et.al. 2505.24377 link
2025-05-30 ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration Xianglong Yan et.al. 2505.24357 null
2025-05-30 Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction Yangui Fang et.al. 2505.24347 null
2025-05-30 LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization Zirui Shang et.al. 2505.24282 null
2025-06-02 MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM Bowen Dong et.al. 2505.24238 null
2025-05-30 ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction Murari Ambati et.al. 2505.24230 null
2025-05-30 Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling Yimin Du et.al. 2505.24199 null
2025-05-29 Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model Nokimul Hasan Arif et.al. 2505.24007 null
2025-05-29 Fitting the Message to the Moment: Designing Calendar-Aware Stress Messaging with Large Language Models Pranav Rao et.al. 2505.23997 null
2025-05-29 Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs Yinong Oliver Wang et.al. 2505.23996 null
2025-05-29 FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression Jiayi Tian et.al. 2505.23966 link
2025-05-29 Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation Caiqi Zhang et.al. 2505.23912 null
2025-05-29 Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems Winstead Zhu et.al. 2505.23908 null
2025-05-29 Revisiting Uncertainty Estimation and Calibration of Large Language Models Linwei Tao et.al. 2505.23854 null
2025-05-28 Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs Jakub Podolak et.al. 2505.23845 null
2025-05-28 SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context Hairu Wang et.al. 2505.23841 null
2025-05-29 SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models Zixiang Xu et.al. 2505.23713 link
2025-06-02 Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation Hongxiang Zhang et.al. 2505.23657 null
2025-06-01 Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms Jane Cleland-Huang et.al. 2505.23576 null
2025-05-30 EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions Xiaorui Wu et.al. 2505.23473 null
2025-06-01 A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy Ahmad Mohsin et.al. 2505.23397 null
2025-05-29 Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs Julia Belikova et.al. 2505.23299 null
2025-05-29 Daunce: Data Attribution through Uncertainty Estimation Xingyuan Pan et.al. 2505.23223 null
2025-05-29 DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes Sungjune Park et.al. 2505.23179 null
2025-05-29 AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models Jinchuan Zhang et.al. 2505.23020 link
2025-05-28 Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents Michael Kirchhof et.al. 2505.22655 null
2025-05-28 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Ang Lv et.al. 2505.22653 null
2025-05-30 Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs Ziling Cheng et.al. 2505.22630 null
2025-05-28 Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Chengyue Wu et.al. 2505.22618 null
2025-05-28 Does Johnny Get the Message? Evaluating Cybersecurity Notifications for Everyday Users Victor Jüttner et.al. 2505.22435 null
2025-05-28 AI Trust Reshaping Administrative Burdens: Understanding Trust-Burden Dynamics in LLM-Assisted Benefits Systems Jeongwon Jo et.al. 2505.22418 null
2025-05-28 Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation Yunsoo Kim et.al. 2505.22222 null
2025-05-31 iDSE: Navigating Design Space Exploration in High-Level Synthesis Using LLMs Runkai Li et.al. 2505.22086 null
2025-05-28 Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home? Yujin Choi et.al. 2505.22061 null
2025-05-28 Legal Assist AI: Leveraging Transformer-Based Model for Effective Legal Assistance Jatin Gupta et.al. 2505.22003 null
2025-05-28 ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning Zhendong Mi et.al. 2505.21987 null
2025-05-28 Judging LLMs on a Simplex Patrick Vossler et.al. 2505.21972 null
2025-05-28 Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning Qihuang Zhong et.al. 2505.21958 null
2025-05-27 Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation Tharindu Kumarage et.al. 2505.21784 null
2025-05-27 Calibrating LLM Confidence by Probing Perturbed Representation Stability Reza Khanmohammadi et.al. 2505.21772 null
2025-05-30 Do We Know What LLMs Don’t Know? A Study of Consistency in Knowledge Probing Raoyuan Zhao et.al. 2505.21701 null
2025-05-27 The Feasibility of Topic-Based Watermarking on Academic Peer Reviews Alexander Nemecek et.al. 2505.21636 null
2025-05-27 Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems Young-Min Cho et.al. 2505.21588 null
2025-05-27 Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making Yihan Wang et.al. 2505.21503 null
2025-05-27 Can Large Reasoning Models Self-Train? Sheikh Shafayat et.al. 2505.21444 null
2025-05-27 Pretrained LLMs Learn Multiple Types of Uncertainty Roi Cohen et.al. 2505.21218 null
2025-05-27 Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Sergey Pletenev et.al. 2505.21115 null
2025-05-27 A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction Bogdan Bogachov et.al. 2505.21109 null
2025-05-27 Thinker: Learning to Think Fast and Slow Stephen Chung et.al. 2505.21097 null
2025-05-28 Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation Ekaterina Fadeeva et.al. 2505.21072 null
2025-05-27 Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking Lingyi Cai et.al. 2505.21045 null
2025-05-27 Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA Xiangqing Shen et.al. 2505.20971 null
2025-05-27 IRCopilot: Automated Incident Response with Large Language Models Xihuan Lin et.al. 2505.20945 null
2025-05-27 Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration? Ziming Wang et.al. 2505.20903 null
2025-05-27 MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection Baraa Hikal et.al. 2505.20880 null
2025-05-27 Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG Xin Sun et.al. 2505.20871 null
2025-05-27 AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding Chaeyoung Jung et.al. 2505.20862 null
2025-05-27 Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation Wooseong Yang et.al. 2505.20773 null
2025-05-30 CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models Xiaqiang Tang et.al. 2505.20767 link
2025-05-27 RRO: LLM Agent Optimization Through Rising Reward Trajectories Zilong Wang et.al. 2505.20737 null
2025-05-26 Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting Ana Rita Ortigoso et.al. 2505.20521 null
2025-05-26 InFact: Informativeness Alignment for Improved LLM Factuality Roi Cohen et.al. 2505.20487 null
2025-05-26 HAMburger: Accelerating LLM Inference via Token Smashing Jingyu Liu et.al. 2505.20438 null
2025-05-26 GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation Zihong Chen et.al. 2505.20416 link
2025-05-26 GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining Simin Fan et.al. 2505.20380 null
2025-05-26 Reasoning LLMs are Wandering Solution Explorers Jiahao Lu et.al. 2505.20296 null
2025-05-26 Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? Michael Kirchhof et.al. 2505.20295 null
2025-05-26 Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models Weihao Xuan et.al. 2505.20236 null
2025-05-27 Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning Xiaorong Wang et.al. 2505.20195 null
2025-05-26 From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data Chun-Yi Kuan et.al. 2505.20166 null
2025-05-26 Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities Chuangtao Ma et.al. 2505.20099 link
2025-05-26 Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks Debargha Ganguly et.al. 2505.20047 null
2025-05-26 Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs Artem Vazhentsev et.al. 2505.20045 null
2025-05-26 DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response Bilel Cherif et.al. 2505.19973 null
2025-05-26 CP-Router: An Uncertainty-Aware Router Between LLM and LRM Jiayuan Su et.al. 2505.19970 null
2025-05-26 Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision Tej Deep Pala et.al. 2505.19706 link
2025-05-26 Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement Liqin Ye et.al. 2505.19675 link
2025-05-26 DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue Yichun Feng et.al. 2505.19630 link
2025-05-26 Learning to Reason without External Rewards Xuandong Zhao et.al. 2505.19590 link
2025-05-26 Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models Jianxing Liao et.al. 2505.19490 null
2025-05-26 Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection Mohammad Mahdi Moradi et.al. 2505.19475 null
2025-05-26 Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents Ye Ye et.al. 2505.19436 link
2025-05-26 Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering Jiajun Zhu et.al. 2505.19410 null
2025-05-26 VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation Ethan TS. Liu et.al. 2505.19395 link
2025-05-25 Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales Charles Godfrey et.al. 2505.19334 null
2025-05-25 LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models Aida Kostikova et.al. 2505.19240 null
2025-05-25 GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling Jialong Zhou et.al. 2505.19234 null
2025-05-25 LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling Yang Xiao et.al. 2505.19187 link
2025-05-27 When Two LLMs Debate, Both Think They’ll Win Pradyumna Shyama Prasad et.al. 2505.19184 null
2025-05-25 Do Large Language Models (Really) Need Statistical Foundations? Weijie Su et.al. 2505.19145 null
2025-05-25 CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models Yongheng Zhang et.al. 2505.19108 link
2025-05-25 Towards Harmonized Uncertainty Estimation for Large Language Models Rui Li et.al. 2505.19073 null
2025-05-25 UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models Roman Vashurin et.al. 2505.19060 null
2025-05-25 Online Knowledge Distillation with Reward Guidance Chen Jia et.al. 2505.18952 null
2025-05-25 LLM-Guided Taxonomy and Hierarchical Uncertainty for 3D Point Cloud Active Learning Chenxi Li et.al. 2505.18924 null
2025-05-24 Mitigating Deceptive Alignment via Self-Monitoring Jiaming Ji et.al. 2505.18807 null
2025-05-24 PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs Tengxuan Liu et.al. 2505.18610 link
2025-05-24 Response Uncertainty and Probe Modeling: Two Sides of the Same Coin in LLM Interpretability? Yongjie Wang et.al. 2505.18575 null
2025-05-24 B-score: Detecting biases in large language models using response history An Vo et.al. 2505.18545 null
2025-05-24 Benchmarking Poisoning Attacks against Retrieval-Augmented Generation Baolei Zhang et.al. 2505.18543 null
2025-05-24 RoleRAG: Enhancing LLM Role-Playing via Graph Guided Retrieval Yongjie Wang et.al. 2505.18541 null
2025-05-24 AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking Soyoung Yoon et.al. 2505.18512 link
2025-05-24 MedScore: Factuality Evaluation of Free-Form Medical Answers Heyuan Huang et.al. 2505.18452 link
2025-05-23 Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps Khandakar Ashrafi Akbar et.al. 2505.18426 null
2025-05-23 Model Editing with Graph-Based External Memory Yash Kumar Atri et.al. 2505.18343 null
2025-05-23 NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache Donghyun Son et.al. 2505.18231 null
2025-05-23 Evidence-Grounded Multimodal Misinformation Detection with Attention-Based GNNs Sharad Duwal et.al. 2505.18221 null
2025-05-26 Outcome-based Reinforcement Learning to Predict the Future Benjamin Turtel et.al. 2505.17989 null
2025-05-23 LLM Meeting Decision Trees on Tabular Data Hangting Ye et.al. 2505.17918 null
2025-05-23 Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour Bálint Gyevnár et.al. 2505.17801 null
2025-05-23 C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models Amir Hossein Rahmati et.al. 2505.17773 null
2025-05-23 But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors Leon Eshuijs et.al. 2505.17760 null
2025-05-23 Get Experience from Practice: LLM Agents with Record & Replay Erhu Feng et.al. 2505.17716 null
2025-05-23 Distilling LLM Agent into Small Models with Retrieval and Code Tools Minki Kang et.al. 2505.17612 link
2025-05-23 Dynamic Text Bundling Supervision for Zero-Shot Inference on Text-Attributed Graphs Yusheng Zhao et.al. 2505.17599 null
2025-05-23 Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection Shrey Pandit et.al. 2505.17558 null
2025-05-23 How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception Shiyu Ni et.al. 2505.17537 null
2025-05-23 CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents Minsoo Khang et.al. 2505.17503 null
2025-05-23 keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection Saketh Reddy Vemula et.al. 2505.17485 link
2025-05-23 Self-Training Large Language Models with Confident Reasoning Hyosoon Jang et.al. 2505.17454 null
2025-05-23 A Fully Generative Motivational Interviewing Counsellor Chatbot for Moving Smokers Towards the Decision to Quit Zafarullah Mahmood et.al. 2505.17362 link
2025-05-22 GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints Soren DeHaan et.al. 2505.17327 null
2025-05-22 Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty Peilin Wu et.al. 2505.17281 null
2025-05-22 Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG) Clayton Cohn et.al. 2505.17238 null
2025-05-22 LLM-Powered Agents for Navigating Venice’s Historical Cadastre Tristan Karch et.al. 2505.17148 null
2025-05-22 When can isotropy help adapt LLMs’ next word prediction to numerical domains? Rashed Shelim et.al. 2505.17135 null
2025-05-21 NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction Soyeon Kim et.al. 2505.17125 null
2025-05-22 R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning Huatong Song et.al. 2505.17005 link
2025-05-22 UNCLE: Uncertainty Expressions in Long-Form Generation Ruihan Yang et.al. 2505.16922 null
2025-05-22 Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs Zeyu Wei et.al. 2505.16894 null
2025-05-22 Walk&Retrieve: Simple Yet Effective Zero-shot Retrieval-Augmented Generation via Knowledge Graph Walks Martin Böckling et.al. 2505.16849 link
2025-05-22 Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement Kexin Zhang et.al. 2505.16806 null
2025-05-22 Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs Zeping Yu et.al. 2505.16703 null
2025-05-22 Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator Beier Luo et.al. 2505.16690 null
2025-05-22 Collaboration among Multiple Large Language Models for Medical Question Answering Kexin Shang et.al. 2505.16648 null
2025-05-22 Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering Bowen Jiang et.al. 2505.16591 null
2025-05-22 Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs Giovanni Servedio et.al. 2505.16520 null
2025-05-24 Recursive Offloading for LLM Serving in Multi-tier Networks Zhiyuan Wu et.al. 2505.16502 link
2025-05-22 Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery Yanbo Zhang et.al. 2505.16477 null
2025-05-22 MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM Siwei Meng et.al. 2505.16456 null
2025-05-22 Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems Hongru Song et.al. 2505.16367 null
2025-05-22 HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation Shijie Zhang et.al. 2505.16281 null
2025-05-22 Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation Derong Xu et.al. 2505.16237 null
2025-05-22 Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models Menschikov Mikhail et.al. 2505.16134 null
2025-05-22 Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning Junhong Lin et.al. 2505.16122 null
2025-05-22 LLM-Powered AI Agent Systems and Their Applications in Industry Guannan Liang et.al. 2505.16120 null
2025-05-22 Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools Panagiotis Lymperopoulos et.al. 2505.16113 null
2025-05-23 Continually Self-Improving Language Models for Bariatric Surgery Question–Answering Yash Kumar Atri et.al. 2505.16102 null
2025-05-21 Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation Ruijie Xi et.al. 2505.16065 null
2025-05-21 SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models Roland Daynauth et.al. 2505.16003 null
2025-05-22 HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving Zhiwen Chen et.al. 2505.15793 null
2025-05-21 Long-Form Information Alignment Evaluation Beyond Atomic Facts Danna Zheng et.al. 2505.15792 null
2025-05-21 Large Language Models as Computable Approximations to Solomonoff Induction Jun Wan et.al. 2505.15784 null
2025-05-21 KaFT: Knowledge-aware Fine-tuning for Boosting LLMs’ Domain-specific Question-Answering Performance Qihuang Zhong et.al. 2505.15480 null
2025-05-21 AdUE: Improving uncertainty estimation head for LoRA adapters in LLMs Artem Zabolotnyi et.al. 2505.15443 null
2025-05-21 RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection Yiming Huang et.al. 2505.15386 null
2025-05-21 Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack Silvia Cappelletti et.al. 2505.15323 null
2025-05-21 Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization Joonho Yang et.al. 2505.15291 null
2025-05-21 Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs Zihao Pan et.al. 2505.15265 null
2025-05-22 Adaptive Plan-Execute Framework for Smart Contract Security Auditing Zhiyuan Wei et.al. 2505.15242 null
2025-05-21 Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge Yassir Fathullah et.al. 2505.15240 null
2025-05-21 Multilingual Prompting for Improving LLM Generation Diversity Qihan Wang et.al. 2505.15229 null
2025-05-21 Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs Jie Ma et.al. 2505.15210 link
2025-05-21 ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection Jeonghye Kim et.al. 2505.15182 null
2025-05-21 Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning Jinghui Lu et.al. 2505.15154 null
2025-05-21 The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning Shivam Agarwal et.al. 2505.15134 link
2025-05-21 RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals Xuanliang Zhang et.al. 2505.15110 null
2025-05-21 Cost-aware LLM-based Online Dataset Annotation Eray Can Elumar et.al. 2505.15101 null
2025-05-21 PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration Yingming Pu et.al. 2505.15047 link
2025-05-21 Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models Zhihao Wen et.al. 2505.14992 null
2025-05-20 JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation Ghasem Pasandi et.al. 2505.14978 null
2025-05-20 Foundations of Unknown-aware Machine Learning Xuefeng Du et.al. 2505.14933 null
2025-05-20 LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization Chih-Yu Chang et.al. 2505.14756 link
2025-05-20 Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models Guangzhi Xiong et.al. 2505.14599 link
2025-05-20 Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples Chun-Yi Kuan et.al. 2505.14518 null
2025-05-20 Reasoning Models Better Express Their Confidence Dongkeun Yoon et.al. 2505.14489 link
2025-05-21 Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis Haoming Huang et.al. 2505.14406 null
2025-05-20 Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs Jiawen Wang et.al. 2505.14368 null
2025-05-20 Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents Wei Fan et.al. 2505.14104 null
2025-05-20 MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations Ernests Lavrinovics et.al. 2505.14101 link
2025-05-20 Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering Yihua Zhu et.al. 2505.14099 null
2025-05-20 ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data Xinzhe Zheng et.al. 2505.14038 null
2025-05-21 When LLMs meet open-world graph learning: a new perspective for unlabeled data uncertainty Yanzhe Wen et.al. 2505.13989 null
2025-05-20 The Hallucination Tax of Reinforcement Finetuning Linxin Song et.al. 2505.13988 null
2025-05-20 MLZero: A Multi-Agent System for End-to-end Machine Learning Automation Haoyang Fang et.al. 2505.13941 link
2025-05-20 DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery Kun Li et.al. 2505.13940 link
2025-05-20 Preference Learning with Lie Detectors can Induce Honesty or Evasion Chris Cundy et.al. 2505.13787 link
2025-05-19 Incentivizing Truthful Language Models via Peer Elicitation Games Baiting Chen et.al. 2505.13636 link
2025-05-19 Selective Code Generation for Functional Guarantees Jaewoo Jeong et.al. 2505.13553 null
2025-05-19 Exploring Federated Pruning for Large Language Models Pengxin Guo et.al. 2505.13547 link
2025-05-19 Know Or Not: a library for evaluating out-of-knowledge base robustness Jessica Foo et.al. 2505.13545 link
2025-05-16 An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents Ayesha Amjad et.al. 2505.13504 null
2025-05-19 GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection Zhijie Deng et.al. 2505.13312 null
2025-05-19 Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice Zhi Liu et.al. 2505.13156 null
2025-05-19 Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning Debarpan Bhattacharya et.al. 2505.13115 link
2025-05-19 Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs Shmulik Markovich-Golan et.al. 2505.13060 null
2025-05-19 Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering Jianfeng Cai et.al. 2505.12826 null
2025-05-19 LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries Kenya Abe et.al. 2505.12694 link
2025-05-19 Know3-RAG: A Knowledge-aware RAG Framework with Adaptive Retrieval, Generation, and Filtering Xukai Liu et.al. 2505.12662 link
2025-05-18 UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection Yang Zhao et.al. 2505.12457 null
2025-05-18 VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning Qi Wang et.al. 2505.12434 link
2025-05-18 PSC: Extending Context Window of Large Language Models via Phase Shift Calibration Wenqiao Zhu et.al. 2505.12423 link
2025-05-18 SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization Minghan Chen et.al. 2505.12346 null
2025-05-18 Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge Luyu Chen et.al. 2505.12301 null
2025-05-18 The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models Linghan Huang et.al. 2505.12287 null
2025-05-18 Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation Chengwei Qin et.al. 2505.12265 null
2025-05-17 The Impact of Emerging Phishing Threats: Assessing Quishing and LLM-generated Phishing Emails against Organizations Marie Weinz et.al. 2505.12104 null
2025-05-20 MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities Jingxue Chen et.al. 2505.12043 null
2025-05-17 SOCIA: An End-to-End Agentic Framework for Automated Cyber-Physical-Social Simulator Generation Yuncheng Hua et.al. 2505.12006 null
2025-05-17 TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text Ahmed Lekssays et.al. 2505.11988 link
2025-05-17 CCNU at SemEval-2025 Task 3: Leveraging Internal and External Knowledge of Large Language Models for Multilingual Hallucination Annotation Xu Liu et.al. 2505.11965 null
2025-05-17 Fine-Grained ECG-Text Contrastive Learning via Waveform Understanding Enhancement Haitao Li et.al. 2505.11939 null
2025-05-17 Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? Zihao Dongfang et.al. 2505.11907 null
2025-05-17 When AI Co-Scientists Fail: SPOT - a Benchmark for Automated Verification of Scientific Research Guijin Son et.al. 2505.11855 null
2025-05-17 Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs Xuannan Liu et.al. 2505.11842 link
2025-05-17 Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling Yitian Chen et.al. 2505.11792 null
2025-05-17 Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission Seungeun Oh et.al. 2505.11788 null
2025-05-16 Token-Level Uncertainty Estimation for Large Language Model Reasoning Tunyu Zhang et.al. 2505.11737 null
2025-05-16 Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models Harshil Vejendla et.al. 2505.11731 null
2025-05-16 Terminators: Terms of Service Parsing and Auditing Agents Maruf Ahmed Mridul et.al. 2505.11672 null
2025-05-16 EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models Bohao Xing et.al. 2505.11405 link
2025-05-19 Phare: A Safety Probe for Large Language Models Pierre Le Jeune et.al. 2505.11365 link
2025-05-16 The Way We Prompt: Conceptual Blending, Neural Dynamics, and Prompt-Induced Transitions in LLMs Makoto Sato et.al. 2505.10948 null
2025-05-19 Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation Zhan Peng Lee et.al. 2505.10792 link
2025-05-19 Mitigate Language Priors in Large Vision-Language Models by Cross-Images Contrastive Decoding Jianfei Zhao et.al. 2505.10634 null
2025-05-14 The Impact of Large Language Models on Task Automation in Manufacturing Services Jochen Wulf et.al. 2505.10581 null
2025-05-20 AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges Ranjan Sapkota et.al. 2505.10468 null
2025-05-15 GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs Longchao Da et.al. 2505.10143 null
2025-05-16 Leveraging Graph Retrieval-Augmented Generation to Support Learners’ Understanding of Knowledge Concepts in MOOCs Mohamed Abdelmagied et.al. 2505.10074 null
2025-05-15 Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Bingda Tang et.al. 2505.10046 link
2025-05-15 Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph Deeksha Prahlad et.al. 2505.09945 link
2025-05-15 Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks Ziyuan Zhang et.al. 2505.09901 link
2025-05-14 A Multimodal Multi-Agent Framework for Radiology Report Generation Ziruo Yi et.al. 2505.09787 null
2025-05-14 Trustless Autonomy: Understanding Motivations, Benefits and Governance Dilemma in Self-Sovereign Decentralized AI Agents Botao Amber Hu et.al. 2505.09757 null
2025-05-15 SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation Achref Doula et.al. 2505.09427 null
2025-05-14 Statistical Modeling and Uncertainty Estimation of LLM Inference Systems Kaustabha Ray et.al. 2505.09319 null
2025-05-14 Atomic Consistency Preference Optimization for Long-Form Question Answering Jingfeng Chen et.al. 2505.09039 link
2025-05-13 Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification Adarsh Kumar et.al. 2505.09031 null
2025-05-13 Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training Yangyi Chen et.al. 2505.08971 link
2025-05-13 CellTypeAgent: Trustworthy cell type annotation with Large Language Models Jiawen Chen et.al. 2505.08844 link
2025-05-13 Adaptive Schema-aware Event Extraction with Retrieval-Augmented Generation Sheng Liang et.al. 2505.08690 null
2025-05-13 RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models Fujun Zhang et.al. 2505.08463 null
2025-05-13 A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs Artem Shelmanov et.al. 2505.08200 null
2025-05-12 LLMs to Support K-12 Teachers in Culturally Relevant Pedagogy: An AI Literacy Example Jiayi Wang et.al. 2505.08083 null
2025-05-11 TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking Ching Nam Hang et.al. 2505.07891 null
2025-05-10 Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints Jian-Qiao Zhu et.al. 2505.07883 null
2025-05-09 Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy A M Muntasir Rahman et.al. 2505.07871 null
2025-05-12 Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding Yifeng Di et.al. 2505.07768 link
2025-05-12 KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation Ching Han Chen et.al. 2505.07618 null
2025-05-12 Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent Ziyang Huang et.al. 2505.07596 null
2025-05-12 Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models Bahram Mohammadi et.al. 2505.07500 null
2025-05-12 Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis Heydar Soudani et.al. 2505.07459 null
2025-05-12 LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning Xiaotian Lin et.al. 2505.07437 link
2025-05-12 Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data David de-Fitero-Dominguez et.al. 2505.07372 null
2025-05-12 Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection Pei-Fu Guo et.al. 2505.07309 null
2025-05-12 Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs Yifan Wei et.al. 2505.07184 link
2025-05-13 Exploring Anthropomorphism in Conversational Agents for Environmental Sustainability Mathyas Giudici et.al. 2505.07142 null
2025-05-14 RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models Hanzheng Dai et.al. 2505.07089 null
2025-05-10 POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models Yangguang Shao et.al. 2505.06579 link
2025-05-10 LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus Peihan Li et.al. 2505.06513 null
2025-05-09 Evolutionary thoughts: integration of large language models and evolutionary algorithms Antonio Jimeno Yepes et.al. 2505.05756 link
2025-05-08 Adaptive Stress Testing Black-Box LLM Planners Neeloy Chakraborty et.al. 2505.05665 null
2025-05-08 HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics Lennart Luettgau et.al. 2505.05602 link
2025-05-08 FLAM: Frame-Wise Language-Audio Modeling Yusong Wu et.al. 2505.05335 null
2025-05-08 MARK: Memory Augmented Refinement of Knowledge Anish Ganguli et.al. 2505.05177 null
2025-05-08 A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network Haoxiang Luo et.al. 2505.05103 null
2025-05-08 Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware Yujia Chen et.al. 2505.05057 link
2025-05-08 An Open-Source Dual-Loss Embedding Model for Semantic Retrieval in Higher Education Ramteja Sajja et.al. 2505.04916 null
2025-05-07 Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards Manveer Singh Tamber et.al. 2505.04847 link
2025-05-07 Osiris: A Lightweight Open-Source Hallucination Detection System Alex Shan et.al. 2505.04844 null
2025-05-07 A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models Pedro Pinacho-Davidson et.al. 2505.04784 null
2025-05-07 The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems Sutapa Dey Tithi et.al. 2505.04736 null
2025-05-06 Advancing Conversational Diagnostic AI with Multimodal Reasoning Khaled Saab et.al. 2505.04653 null
2025-05-06 Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions Adithya Kulkarni et.al. 2505.04651 null
2025-05-09 MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection Zhihao Zhang et.al. 2505.04594 null
2025-05-07 Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters David Exler et.al. 2505.04393 null
2025-05-07 Benchmarking LLMs’ Swarm intelligence Kai Ruan et.al. 2505.04364 link
2025-05-07 LLM-Independent Adaptive RAG: Let the Question Speak for Itself Maria Marina et.al. 2505.04253 null
2025-05-07 Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications Yuanai Xie et.al. 2505.04068 null
2025-05-02 Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs Ganghua Wang et.al. 2505.03814 null
2025-05-02 MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance Xing Hu et.al. 2505.03804 null
2025-05-02 Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth Changhai Zhou et.al. 2505.03802 null
2025-04-30 Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding Trilok Padhi et.al. 2505.03788 null
2025-05-06 A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning Kolawole E. Ogunsina et.al. 2505.03553 null
2025-05-06 Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis Shuang Zhou et.al. 2505.03467 null
2025-05-06 Automatic Calibration for Membership Inference Attack on Large Language Models Saleh Zare Zade et.al. 2505.03392 link
2025-05-06 Interpretable Zero-shot Learning with Infinite Class Concepts Zihan Ye et.al. 2505.03361 null
2025-05-06 Artificial Behavior Intelligence: Technology, Challenges, and Future Directions Kanghyun Jo et.al. 2505.03315 null
2025-05-06 A Trustworthy Multi-LLM Network: Challenges, Solutions, and A Use Case Haoxiang Luo et.al. 2505.03196 null
2025-05-06 Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering Joshua Owotogbe et.al. 2505.03096 null
2025-05-05 Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models Zhengliang Shi et.al. 2505.03075 link
2025-05-05 UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output Sicong Huang et.al. 2505.03030 null
2025-05-05 Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? Guangzhi Sun et.al. 2505.02884 null
2025-05-05 Phase transitions in AI-human interaction networks: statistics, computation, and probabilistic modeling Jackson George et.al. 2505.02879 null
2025-05-08 ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations Dmitriy Shopkhoev et.al. 2505.02819 link
2025-05-05 Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing Diji Yang et.al. 2505.02811 link
2025-05-06 Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation Gerard Pons et.al. 2505.02737 null
2025-05-04 SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation Tanguy Herserant et.al. 2505.02235 null
2025-05-12 LLM-Guided Probabilistic Program Induction for POMDP Model Estimation Aidan Curtis et.al. 2505.02216 null
2025-05-04 Large Language Models are overconfident and amplify human bias Fengfei Sun et.al. 2505.02151 null
2025-05-04 VECSR: Virtually Embodied Common Sense Reasoning System Alexis R. Tudor et.al. 2505.02144 link
2025-05-06 Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation Chenxi Liu et.al. 2505.02138 link
2025-05-04 Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach Jiancong Xiao et.al. 2505.01997 null
2025-05-03 High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers Brian Wong et.al. 2505.01693 null
2025-05-02 Always Tell Me The Odds: Fine-grained Conditional Probability Estimation Liaoyaqi Wang et.al. 2505.01595 null
2025-05-02 Retrieval Augmented Learning: A Retrial-based Large Language Model Self-Supervised Learning and Autonomous Knowledge Generation Zongyuan Li et.al. 2505.01073 null
2025-05-02 Multi-agents based User Values Mining for Recommendation Lijian Chen et.al. 2505.00981 null
2025-05-01 Multivariate Conformal Selection Tian Bai et.al. 2505.00917 null
2025-05-08 SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation Quang P. M. Pham et.al. 2505.00831 link
2025-05-01 HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models Zhaoxing Li et.al. 2505.00820 null
2025-05-01 A Survey on Large Language Model based Human-Agent Systems Henry Peng Zou et.al. 2505.00753 link
2025-05-05 Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs Dung Nguyen et.al. 2505.00744 null
2025-05-01 Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models Makoto Sato et.al. 2505.00557 null
2025-05-01 HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection Deanna Emery et.al. 2505.00506 null
2025-05-01 Distributed Retrieval-Augmented Generation Chenhao Xu et.al. 2505.00443 link
2025-04-30 Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs Jinyan Su et.al. 2505.00127 null
2025-04-30 Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 Jeho Choi et.al. 2505.00060 null
2025-04-24 An Empirical Study on Prompt Compression for Large Language Models Zheng Zhang et.al. 2505.00019 link
2025-04-30 MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness Junsheng Huang et.al. 2504.21773 null
2025-04-30 Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA Xuanzhao Dong et.al. 2504.21252 link
2025-05-01 AI-in-the-Loop Planning for Transportation Electrification: Case Studies from Austin, Texas Seung Jun Choi et.al. 2504.21185 null
2025-04-29 LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge Naheed Rayhan et.al. 2504.21132 null
2025-04-22 ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees Jun Wang et.al. 2504.21022 null
2025-04-22 Context-Enhanced Contrastive Search for Improved LLM Text Generation Jaydip Sen et.al. 2504.21020 null
2025-04-29 Jekyll-and-Hyde Tipping Point in an AI’s Behavior Neil F. Johnson et.al. 2504.20980 null
2025-04-29 SetKE: Knowledge Editing for Knowledge Elements Overlap Yifan Wei et.al. 2504.20972 null
2025-04-29 Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models Maryna Vyshnyvetska et.al. 2504.20951 null
2025-04-29 DYNAMAX: Dynamic computing for Transformers and Mamba based architectures Miguel Nogales et.al. 2504.20922 null
2025-04-29 Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges Yunseo Lee et.al. 2504.20799 null
2025-04-29 Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think Hasan Abed Al Kader Hammoud et.al. 2504.20708 null
2025-04-29 Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? Evangelia Gogoulou et.al. 2504.20699 null
2025-04-29 Identifying Uncertainty in Self-Adaptive Robotics with Large Language Models Hassan Sartaj et.al. 2504.20684 null
2025-04-30 TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data Qi Wang et.al. 2504.20462 null
2025-04-28 Towards Large Language Models for Lunar Mission Planning and In Situ Resource Utilization Michael Pekala et.al. 2504.20125 null
2025-04-24 RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning Zihan Wang et.al. 2504.20073 link
2025-04-28 Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages Pritika Rohera et.al. 2504.20022 null
2025-04-28 Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models Xin Wang et.al. 2504.20020 null
2025-04-28 GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets Mingqian He et.al. 2504.19898 null
2025-04-28 A Tripartite Perspective on GraphRAG Michael Banf et.al. 2504.19667 null
2025-04-28 An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination Dixiao Wei et.al. 2504.19480 null
2025-04-28 Towards Long Context Hallucination Detection Siyi Liu et.al. 2504.19457 null
2025-04-27 Bi-directional Model Cascading with Proxy Confidence David Warren et.al. 2504.19391 null
2025-04-27 The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach Chad Coleman et.al. 2504.19255 null
2025-04-30 Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers Dylan Bouchard et.al. 2504.19254 link
2025-04-27 Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models Anindya Bijoy Das et.al. 2504.19061 null
2025-04-26 Calibrating Translation Decoding with Quality Estimation on LLMs Di Wu et.al. 2504.19044 link
2025-04-26 AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression Dong Whi Yoo et.al. 2504.18932 null
2025-04-26 Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning Abdellah Ghassel et.al. 2504.18839 null
2025-04-25 Span-Level Hallucination Detection for LLM-Generated Answers Passant Elchafei et.al. 2504.18639 null
2025-04-24 Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach Iizalaarab Elhaimeur et.al. 2504.18603 null
2025-04-25 LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection Rajesh Yarra et.al. 2504.18423 null
2025-04-25 Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review Toghrul Abbasli et.al. 2504.18346 null
2025-04-25 Evaluating Evaluation Metrics – The Mirage of Hallucination Detection Atharva Kulkarni et.al. 2504.18114 null
2025-04-25 Random-Set Large Language Models Muhammad Mubashar et.al. 2504.18085 null
2025-04-25 Validating Network Protocol Parsers with Traceable RFC Document Interpretation Mingwei Zheng et.al. 2504.18050 null
2025-04-24 LLM Agent Swarm for Hypothesis-Driven Drug Discovery Kevin Song et.al. 2504.17967 null
2025-04-24 HalluLens: LLM Hallucination Benchmark Yejin Bang et.al. 2504.17550 null
2025-04-24 Combining Static and Dynamic Approaches for Mining and Testing Constraints for RESTful API Testing Hieu Huynh et.al. 2504.17287 null
2025-04-23 How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study Rendi Chevi et.al. 2504.17083 null
2025-04-23 Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models Shariar Kabir et.al. 2504.17052 null
2025-04-23 (Im)possibility of Automated Hallucination Detection in Large Language Models Amin Karbasi et.al. 2504.17004 null
2025-04-18 SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments Dachun Sun et.al. 2504.16947 null
2025-04-23 Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models Xuyang Zhu et.al. 2504.16883 null
2025-04-23 Monte Carlo Planning with Large Language Model for Text-Based Game Agents Zijing Shi et.al. 2504.16855 null
2025-04-23 Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories Mareike Lisker et.al. 2504.16604 null
2025-04-23 ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving Jie JW Wu et.al. 2504.16331 null
2025-04-23 Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations Nikhil Khandalkar et.al. 2504.15903 null
2025-04-22 Dynamic Early Exit in Reasoning Models Chenxu Yang et.al. 2504.15895 link
2025-04-22 Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback Ning Wang et.al. 2504.15804 null
2025-04-22 Grounded in Context: Retrieval-Based Method for Hallucination Detection Assaf Gerner et.al. 2504.15771 null
2025-04-20 PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind Yajie Yu et.al. 2504.15313 null
2025-04-21 Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning Ehsan Ahmadi et.al. 2504.15263 null
2025-04-21 Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges Nandan Thakur et.al. 2504.15205 null
2025-04-21 The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models Ronak Pradeep et.al. 2504.15068 null
2025-04-23 aiXamine: Simplified LLM Safety and Security Fatih Deniz et.al. 2504.14985 null
2025-04-21 POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications Chunjing Gan et.al. 2504.14917 null
2025-04-21 CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs Yingming Zheng et.al. 2504.14905 link
2025-04-20 HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis Kangwei Xu et.al. 2504.14641 null
2025-04-20 A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models Hongming Tan et.al. 2504.14620 null
2025-04-20 a1: Steep Test-time Scaling Law via Environment Augmented Generation Lingrui Mei et.al. 2504.14597 null
2025-04-20 Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey Ahsan Bilal et.al. 2504.14520 null
2025-04-20 VizTA: Enhancing Comprehension of Distributional Visualization with Visual-Lexical Fused Conversational Interface Liangwei Wang et.al. 2504.14507 null
2025-04-20 CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge Armin Toroghi et.al. 2504.14462 null
2025-04-20 Information Diffusion and Preferential Attachment in a Network of Large Language Models Adit Jain et.al. 2504.14438 null
2025-04-20 ResNetVLLM-2: Addressing ResNetVLLM’s Multi-Modal Hallucinations Ahmad Khalil et.al. 2504.14429 null
2025-04-19 Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts Kun Qian et.al. 2504.14375 null
2025-04-19 Density Measures for Language Generation Jon Kleinberg et.al. 2504.14370 null
2025-04-19 Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model Youngbin Lee et.al. 2504.14345 link
2025-04-19 A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models Chengming Wang et.al. 2504.14241 null
2025-04-18 Metacognition and Uncertainty Communication in Humans and Large Language Models Mark Steyvers et.al. 2504.14045 null
2025-04-18 Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy Regan Bolton et.al. 2504.14044 null
2025-04-18 Going Whole Hog: A Philosophical Defense of AI Cognition Herman Cappelen et.al. 2504.13988 null
2025-04-18 Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Chenghao Xiao et.al. 2504.13816 link
2025-04-18 Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results Andrea Santilli et.al. 2504.13677 null
2025-04-18 Do Prompt Patterns Affect Code Quality? A First Empirical Assessment of ChatGPT-Generated Code Antonio Della Porta et.al. 2504.13656 null
2025-04-18 Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs Gabriel Freedman et.al. 2504.13644 link
2025-04-18 Long-context Non-factoid Question Answering in Indic Languages Ritwik Mishra et.al. 2504.13615 link
2025-04-18 Continual Pre-Training is (not) What You Need in Domain Adaption Pin-Er Chen et.al. 2504.13603 null
2025-04-18 Trust, but verify Michael J. Yuan et.al. 2504.13443 null
2025-04-17 Energy-Based Reward Models for Robust Language Model Alignment Anamika Lochab et.al. 2504.13134 link
2025-04-17 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Haojian Huang et.al. 2504.13122 link
2025-04-17 Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild Jiatai Wang et.al. 2504.12982 null
2025-04-17 QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? Zhouyang Jiang et.al. 2504.12961 null
2025-04-18 Customizing Emotional Support: How Do Individuals Construct and Interact With LLM-Powered Chatbots Xi Zheng et.al. 2504.12943 null
2025-04-17 Explainable AI in Usable Privacy and Security: Challenges and Opportunities Vincent Freiberger et.al. 2504.12931 null
2025-04-17 Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration Yicheng Pan et.al. 2504.12773 link
2025-04-17 Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations Yiyou Sun et.al. 2504.12691 link
2025-04-17 Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models Liyi Zhang et.al. 2504.12585 link
2025-04-16 PlanGlow: Personalized Study Planning with an Explainable and Controllable LLM-Driven System Jiwon Chun et.al. 2504.12452 link
2025-04-16 Don’t Just Translate, Agitate: Using Large Language Models as Devil’s Advocates for AI Explanations Ashley Suh et.al. 2504.12424 null
2025-04-16 Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study Harry Li et.al. 2504.12422 null
2025-04-16 Gauging Overprecision in LLMs: An Empirical Study Adil Bahaj et.al. 2504.12098 null
2025-04-16 Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models Kris Pilcher et.al. 2504.12012 null
2025-04-16 SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes Raúl Vázquez et.al. 2504.11975 null
2025-04-16 Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading Kihyun Kim et.al. 2504.11816 link
2025-04-16 Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming Paul Denny et.al. 2504.11723 null
2025-04-15 From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs Guocong Li et.al. 2504.11277 null
2025-04-16 Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR Yulong Zhang et.al. 2504.11101 null
2025-04-15 MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique Shuhang Liu et.al. 2504.11009 null
2025-04-14 CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates Ankit Kumar Shaw et.al. 2504.10738 null
2025-04-14 HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving Avinash Kumar et.al. 2504.10724 null
2025-04-14 EMAFusion: A Self-Optimizing System for Seamless LLM Selection and Integration Soham Shah et.al. 2504.10681 null
2025-04-14 Efficient Process Reward Model Training via Active Learning Keyu Duan et.al. 2504.10559 link
2025-04-09 Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion Jakub Podolak et.al. 2504.10509 null
2025-04-14 Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? Olha Shaposhnyk et.al. 2504.10397 null
2025-04-16 Heimdall: test-time scaling on the generative verification Wenlei Shi et.al. 2504.10337 null
2025-04-14 From Prompting to Alignment: A Generative Framework for Query Recommendation Erxue Min et.al. 2504.10208 null
2025-04-14 DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation Hanghui Guo et.al. 2504.10198 null
2025-04-14 HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection Mohamed A. Abdallah et.al. 2504.10168 null
2025-04-14 C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation Xu Zhang et.al. 2504.10167 null
2025-04-14 The Human Visual System Can Inspire New Interaction Paradigms for LLMs Diana Robinson et.al. 2504.10101 null
2025-04-14 Hallucination Detection in LLMs via Topological Divergence on Attention Graphs Alexandra Bazarova et.al. 2504.10063 null
2025-04-15 Emotional Strain and Frustration in LLM Interactions in Software Engineering Cristina Martinez Montes et.al. 2504.10050 null
2025-04-14 DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify Zhengxuan Zhang et.al. 2504.10036 null
2025-04-14 EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control Hanwen Wan et.al. 2504.10030 link
2025-04-14 KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference Yuxuan Tian et.al. 2504.09936 null
2025-04-14 Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data Shuai Zhao et.al. 2504.09895 null
2025-04-14 Reasoning Models Can Be Effective Without Thinking Wenjie Ma et.al. 2504.09858 null
2025-04-14 RAKG: Document-level Retrieval Augmented Knowledge Graph Construction Hairong Zhang et.al. 2504.09823 link
2025-04-14 Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning Jingtian Wu et.al. 2504.09781 null
2025-04-13 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training Zhenting Wang et.al. 2504.09710 link
2025-04-17 Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws Zhixuan Pan et.al. 2504.09597 null
2025-04-17 ControlNET: A Firewall for RAG-based LLM System Hongwei Yao et.al. 2504.09593 null
2025-04-13 How new data permeates LLM knowledge and how to dilute it Chen Sun et.al. 2504.09522 null
2025-04-13 HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs Sharanya Dasgupta et.al. 2504.09482 link
2025-04-13 Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection MingShan Liu et.al. 2504.09440 null
2025-04-12 Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation Pengcheng Zhou et.al. 2504.09301 null
2025-04-12 SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders Ashmi Banerjee et.al. 2504.09277 null
2025-04-12 Towards More Efficient, Robust, Instance-adaptive, and Generalizable Online Learning Zhiyong Wang et.al. 2504.09192 null
2025-04-11 Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation Terrence Neumann et.al. 2504.08954 null
2025-04-11 Knowledge Graph-extended Retrieval Augmented Generation for Question Answering Jasper Linders et.al. 2504.08893 null
2025-04-11 Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Fangzhi Xu et.al. 2504.08672 link
2025-04-11 MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation Tao Zhang et.al. 2504.08621 link
2025-04-16 Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks Ye Ye et.al. 2504.08525 link
2025-04-07 SEAL: Steerable Reasoning Calibration of Large Language Models for Free Runjin Chen et.al. 2504.07986 link
2025-04-10 Token Level Routing Inference System for Edge Devices Jianshu She et.al. 2504.07878 null
2025-04-10 Robust Hallucination Detection in LLMs via Adaptive Token Selection Mengjia Niu et.al. 2504.07863 null
2025-04-17 PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization Yang Jiao et.al. 2504.07717 null
2025-04-10 Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations Sheila Castilho et.al. 2504.07680 null
2025-04-10 Enhancing Large Language Models through Neuro-Symbolic Integration and Ontological Reasoning Ruslan Idelfonso Magana Vsevolodovna et.al. 2504.07640 link
2025-04-11 Malware analysis assisted by AI with R2AI Axelle Apvrille et.al. 2504.07574 null
2025-04-10 A taxonomy of epistemic injustice in the context of AI and the case for generative hermeneutical erasure Warmhold Jan Thomas Mollema et.al. 2504.07531 null
2025-04-10 Supervised Optimism Correction: Be Confident When LLMs Are Sure Junjie Zhang et.al. 2504.07527 null
2025-04-10 Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction Kyoyun Choi et.al. 2504.07415 null
2025-04-10 Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression Hanqi Xiao et.al. 2504.07389 link
2025-04-11 Alice: Proactive Learning with Teacher’s Demonstrations for Weak-to-Strong Generalization Shujin Wu et.al. 2504.07316 link
2025-04-09 HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification Bibek Paudel et.al. 2504.07069 null
2025-04-11 Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration Kostas Hatalis et.al. 2504.06943 null
2025-04-09 Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program Minghe Gao et.al. 2504.06606 link
2025-04-09 Do Reasoning Models Show Better Verbalized Calibration? Qingcheng Zeng et.al. 2504.06564 null
2025-04-08 Don’t Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning Yuehan Qin et.al. 2504.06438 null
2025-04-08 Human Trust in AI Search: A Large-Scale Experiment Haiwen Li et.al. 2504.06435 null
2025-04-09 GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization Bojana Ranković et.al. 2504.06265 link
2025-04-08 VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs Dongjun Qian et.al. 2504.05673 null
2025-04-08 On the Impact of Language Nuances on Sentiment Analysis with Large Language Models: Paraphrasing, Sarcasm, and Emojis Naman Bhargava et.al. 2504.05603 null
2025-04-07 GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases Alfred Clemedtson et.al. 2504.05478 link
2025-04-07 The challenge of uncertainty quantification of large language models in medicine Zahra Atf et.al. 2504.05278 null
2025-04-07 DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation Xinglin Lyu et.al. 2504.05122 link
2025-04-07 On the Performance of an Explainable Language Model on PubMedQA Venkat Srinivasan et.al. 2504.05074 null
2025-04-07 Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning Sugyeong Eo et.al. 2504.05047 null
2025-04-07 A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models Carlos Peláez-González et.al. 2504.04976 null
2025-04-07 A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization Wenyuan Xu et.al. 2504.04950 null
2025-04-06 Capturing AI’s Attention: Physics of Repetition, Hallucination, Bias and Beyond Frank Yingjie Huo et.al. 2504.04600 null
2025-04-06 Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models Rui Gan et.al. 2504.04562 link
2025-04-06 VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT Zhuo Zhi et.al. 2504.04471 null
2025-04-06 An overview of model uncertainty and variability in LLM-based sentiment analysis. Challenges, mitigation strategies and the role of explainability David Herrera-Poyatos et.al. 2504.04462 null
2025-04-09 How Accurately Do Large Language Models Understand Code? Sabaat Haroon et.al. 2504.04372 null
2025-04-06 Generative Large Language Models Trained for Detecting Errors in Radiology Reports Cong Sun et.al. 2504.04336 null
2025-04-09 Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks Marios Kokkodis et.al. 2504.04277 null
2025-04-05 Adaptive Elicitation of Latent Information Using Natural Language Jimmy Wang et.al. 2504.04204 null
2025-04-04 Structured Extraction of Process Structure Properties Relationships in Materials Science Amit K Verma et.al. 2504.03979 null
2025-04-04 Bridging LMS and Generative AI: Dynamic Course Content Integration (DCCI) for Connecting LLMs to Course Content – The Ask ME Assistant Kovan Mzwri et.al. 2504.03966 null
2025-04-04 Practical Poisoning Attacks against Retrieval-Augmented Generation Baolei Zhang et.al. 2504.03957 null
2025-04-04 The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning Virilo Tejedor et.al. 2504.03823 null
2025-04-04 Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy Kamil Ciosek et.al. 2504.03579 null
2025-04-04 Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej Shubham Kumar Nigam et.al. 2504.03486 null
2025-04-07 LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications Botao Zhu et.al. 2504.03444 null
2025-04-04 Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models Mirko Borszukovszki et.al. 2504.03440 null
2025-04-04 Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models Afshin Khadangi et.al. 2504.03302 link
2025-04-04 Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations Maik Larooij et.al. 2504.03274 null
2025-04-04 Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation Weitao Li et.al. 2504.03165 link
2025-04-03 How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence Hongzhe Du et.al. 2504.02904 null
2025-04-03 Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models Liangjie Huang et.al. 2504.02902 null
2025-04-01 Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications Hongliu Cao et.al. 2504.02867 null
2025-04-01 The Illusionist’s Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances Yining Wang et.al. 2504.02865 null
2025-04-03 A Memory-Augmented LLM-Driven Method for Autonomous Merging of 3D Printing Work Orders Yuhao Liu et.al. 2504.02509 null
2025-04-03 Cognitive Memory in Large Language Models Lianlei Shan et.al. 2504.02441 null
2025-04-02 Achieving Unanimous Consensus in Decision Making Using Multi-Agents Apurba Pokharel et.al. 2504.02128 null
2025-04-02 Aligned Better, Listen Better for Audio-Visual Large Language Models Yuxin Guo et.al. 2504.02061 null
2025-04-03 Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation Baban Gain et.al. 2504.01919 null
2025-04-02 LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution Zhuoran Yang et.al. 2504.01533 null
2025-04-03 Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding Sakhinana Sagar Srinivas et.al. 2504.01281 null
2025-04-01 Grade Guard: A Smart System for Short Answer Automated Grading Niharika Dadu et.al. 2504.01253 null
2025-04-01 Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models Rafael Giebisch et.al. 2504.01248 null
2025-04-01 Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery Nicholas Clark et.al. 2504.01205 null
2025-04-01 μKE: Matryoshka Unstructured Knowledge Editing of Large Language Models Zian Su et.al. 2504.01196 null
2025-04-01 Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations Mahjabin Nahar et.al. 2504.01153 link
2025-04-01 MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs) Bikash Saha et.al. 2504.01145 link
2025-04-01 Investigating Large Language Models in Diagnosing Students’ Cognitive Skills in Math Problem-solving Hyoungwook Jin et.al. 2504.00843 null
2025-04-01 Aplicação de Large Language Models na Análise e Síntese de Documentos Jurídicos: Uma Revisão de Literatura Matheus Belarmino et.al. 2504.00725 null
2025-04-01 GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments Enjun Du et.al. 2504.00711 null
2025-04-01 DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism Dengchun Li et.al. 2504.00661 link
2025-04-01 Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences Xiangyang Liu et.al. 2504.00473 null
2025-04-01 Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics Shide Zhou et.al. 2504.00446 null
2025-04-01 Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding Mohanakrishnan Hariharan et.al. 2504.00409 null
2025-04-01 When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) Mahak Agarwal et.al. 2504.00374 null
2025-03-31 SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning Shiyue Zhao et.al. 2504.00115 null
2025-03-30 Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge Agam Shah et.al. 2504.00042 null
2025-03-27 Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1 Birger Moell et.al. 2504.00016 null
2025-03-31 SQuat: Subspace-orthogonal KV Cache Quantization Hao Wang et.al. 2503.24358 null
2025-03-31 Model Hemorrhage and the Robustness Limits of Large Language Models Ziyang Ma et.al. 2503.23924 null
2025-03-31 Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement Yuqiao Tan et.al. 2503.23895 link
2025-03-31 Adaptive Layer-skipping in Pre-trained LLMs Xuan Luo et.al. 2503.23798 null
2025-03-31 MKA: Leveraging Cross-Lingual Consensus for Model Abstention Sharad Duwal et.al. 2503.23687 link
2025-03-30 RARE: Retrieval-Augmented Reasoning Modeling Zhengren Wang et.al. 2503.23513 link
2025-03-30 SCORE: Story Coherence and Retrieval Enhancement for AI Narratives Qiang Yi et.al. 2503.23512 null
2025-03-30 Re-Aligning Language to Visual Objects with an Agentic Workflow Yuming Chen et.al. 2503.23508 null
2025-03-30 An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering Alexander Murphy et.al. 2503.23415 null
2025-03-30 Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation Jiwon Jeong et.al. 2503.23363 link
2025-03-30 Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base Linxin Song et.al. 2503.23361 null
2025-03-29 Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus Claas Beger et.al. 2503.23229 link
2025-03-29 Large Language Models are Unreliable for Cyber Threat Intelligence Emanuele Mezzi et.al. 2503.23175 null
2025-03-29 Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments Yifan Xu et.al. 2503.23105 null
2025-03-29 DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation Hsin-Ling Hsu et.al. 2503.23013 null
2025-03-29 Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective Xinyu Yao et.al. 2503.22954 null
2025-03-29 Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering Yugen Sato et.al. 2503.22941 null
2025-04-02 Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use Nicholas Roth et.al. 2503.22931 null
2025-03-28 Identifying and Mitigating API Misuse in Large Language Models Terry Yue Zhuo et.al. 2503.22821 null
2025-03-26 InfoBid: A Simulation Framework for Studying Information Disclosure in Auctions with Large Language Model-based Agents Yue Yin et.al. 2503.22726 null
2025-03-25 Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models Bowei Tian et.al. 2503.22720 null
2025-03-25 LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation Sarah Martinson et.al. 2503.22719 link
2025-03-31 Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning Abdullah Vanlioglu et.al. 2503.22456 null
2025-03-28 Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs Yuan He et.al. 2503.22362 link
2025-03-28 Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions Yubo Li et.al. 2503.22353 null
2025-03-28 BanglAssist: A Bengali-English Generative AI Chatbot for Code-Switching and Dialect-Handling in Customer Service Francesco Kruk et.al. 2503.22283 null
2025-03-28 Learning to Instruct for Visual Instruction Tuning Zhihan Zhou et.al. 2503.22215 null
2025-03-28 Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models Zhanke Zhou et.al. 2503.22165 link
2025-03-27 Entropy-Aware Branching for Improved Mathematical Reasoning Xianzhi Li et.al. 2503.21961 null
2025-03-25 OAEI-LLM-T: A TBox Benchmark Dataset for Understanding LLM Hallucinations in Ontology Matching Systems Zhangcheng Qiang et.al. 2503.21813 null
2025-03-27 Cooking Task Planning using LLM and Verified by Graph Network Ryunosuke Takebayashi et.al. 2503.21564 null
2025-03-27 SWI: Speaking with Intent in Large Language Models Yuwei Yin et.al. 2503.21544 link
2025-04-02 Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? Ashish Sardana et.al. 2503.21157 null
2025-03-27 Alleviating LLM-based Generative Retrieval Hallucination in Alipay Search Yedan Shen et.al. 2503.21098 null
2025-03-26 Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework Thomson Yen et.al. 2503.21023 link
2025-03-26 Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring Fraol Batole et.al. 2503.20934 null
2025-03-26 Exploring CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation Zhiwei Yang et.al. 2503.20826 link
2025-03-26 Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy Joonhyun Jeong et.al. 2503.20823 link
2025-03-26 MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search Yunhai Hu et.al. 2503.20757 null
2025-03-26 TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes Raj Sanjay Shah et.al. 2503.20648 null
2025-03-26 Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering Zehui Liao et.al. 2503.20504 null
2025-03-26 GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization Zhouhong Gu et.al. 2503.20194 link
2025-03-25 FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs Carlos Plou et.al. 2503.19850 null
2025-03-25 HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection Maryam Bala et.al. 2503.19650 null
2025-03-25 KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models Zhiwei Wang et.al. 2503.19482 null
2025-03-25 VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU Zhongchun Zheng et.al. 2503.19449 null
2025-03-25 QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition Yuxuan Hu et.al. 2503.19353 link
2025-03-24 Language Model Uncertainty Quantification with Attention Chain Yinghao Li et.al. 2503.19168 link
2025-03-24 Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models Nariman Naderi et.al. 2503.18562 null
2025-03-24 Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions Dong Jing et.al. 2503.18320 null
2025-03-23 ShED-HD: A Shannon Entropy Distribution Framework for Lightweight Hallucination Detection on Edge Devices Aneesh Vathul et.al. 2503.18242 null
2025-03-23 GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks Varvara Krechetova et.al. 2503.18129 link
2025-03-23 SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA V Venktesh et.al. 2503.17990 null
2025-03-22 A Modular Dataset to Demonstrate LLM Abstraction Capability Adam Atanas et.al. 2503.17645 null
2025-03-22 ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently Jaeyeon Lee et.al. 2503.17587 link
2025-03-21 Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring Khadija Zanna et.al. 2503.17569 null
2025-03-21 Judge Anything: MLLM as a Judge Across Any Modality Shu Pu et.al. 2503.17489 null
2025-03-21 LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language Kun Chu et.al. 2503.17309 link
2025-03-21 FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs Albert Sawczyn et.al. 2503.17229 null
2025-03-20 Investigating Retrieval-Augmented Generation in Quranic Studies: A Study of 13 Open-Source Large Language Models Zahra Khalila et.al. 2503.16581 null
2025-03-26 Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models Hanzhi Zhang et.al. 2503.16541 null
2025-03-18 Do Multimodal Large Language Models Understand Welding? Grigorii Khvatskii et.al. 2503.16537 null
2025-03-18 Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine Chengfeng Dou et.al. 2503.16530 null
2025-03-18 HDLCoRe: A Training-Free Framework for Mitigating Hallucinations in LLM-Generated HDL Heng Ping et.al. 2503.16528 null
2025-03-20 Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Zijian Li et.al. 2503.16260 null
2025-03-20 Towards Lighter and Robust Evaluation for Retrieval Augmented Generation Alex-Razvan Ispas et.al. 2503.16161 link
2025-03-20 ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph Langming Liu et.al. 2503.15990 null
2025-03-20 Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models Baolong Bi et.al. 2503.15888 link
2025-03-21 Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance Hui Liu et.al. 2503.15886 null
2025-03-20 MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations Kyungho Bae et.al. 2503.15871 null
2025-03-20 Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey Xiaoou Liu et.al. 2503.15850 null
2025-03-20 Entropy-based Exploration Conduction for Multi-step Reasoning Jinghan Zhang et.al. 2503.15848 null
2025-03-23 DNA Bench: When Silence is Smarter – Benchmarking Over-Reasoning in Reasoning LLMs Masoud Hashemi et.al. 2503.15793 null
2025-03-19 R²: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs Zefeng Lin et.al. 2503.15655 null
2025-03-19 How Well Can AI Build SD Models? William Schoenberg et.al. 2503.15580 null
2025-03-19 Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs Yuqi Zhu et.al. 2503.15341 null
2025-03-19 Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning? Roberto Araya et.al. 2503.15268 null
2025-03-19 Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems Sejong Kim et.al. 2503.15191 link
2025-03-19 Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks Yuting Guo et.al. 2503.15169 null
2025-03-19 ELTEX: A Framework for Domain-Driven Synthetic Data Generation Arina Razmyslovich et.al. 2503.15055 link
2025-03-18 Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence Sophia Hager et.al. 2503.14749 null
2025-03-18 Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving Priscylla Silva et.al. 2503.14630 link
2025-03-18 Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations Ziwei Ji et.al. 2503.14477 null
2025-03-18 From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models Qiantong Wang et.al. 2503.14392 null
2025-03-18 How much do LLMs learn from negative examples? Shadi Hamdan et.al. 2503.14391 link
2025-03-18 On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller? Pouria Sarhadi et.al. 2503.14379 link
2025-03-18 Learning on LLM Output Signatures for gray-box LLM Behavior Analysis Guy Bar-Shalom et.al. 2503.14043 link
2025-03-18 Predicting Human Choice Between Textually Described Lotteries Eyal Marantz et.al. 2503.14004 null
2025-03-18 FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks Siqi Zhang et.al. 2503.13966 null
2025-03-19 Enabling Inclusive Systematic Reviews: Incorporating Preprint Articles with Large Language Model-Driven Evaluations Rui Yang et.al. 2503.13857 null
2025-03-18 Empowering GraphRAG with Knowledge Filtering and Integration Kai Guo et.al. 2503.13804 null
2025-03-18 Mapping the Trust Terrain: LLMs in Software Engineering – Insights and Perspectives Dipin Khati et.al. 2503.13793 null
2025-03-17 Pareidolic Illusions of Meaning: ChatGPT, Pseudolaw and the Triumph of Form over Substance Joe McIntyre et.al. 2503.13556 null
2025-03-14 RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration Hong Qing Yu et.al. 2503.13514 null
2025-03-17 MetaScale: Test-Time Scaling with Evolving Meta-Thoughts Qin Liu et.al. 2503.13447 null
2025-03-17 Managing Hybrid Solid-State Drives Using Large Language Models Qian Wei et.al. 2503.13105 null
2025-03-17 Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning Junming Liu et.al. 2503.12972 null
2025-03-17 MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting Rui Pu et.al. 2503.12931 null
2025-03-17 HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models Xinyan Jiang et.al. 2503.12908 link
2025-03-16 Can LLMs Formally Reason as Abstract Interpreters for Program Analysis? Jacqueline L. Mitchell et.al. 2503.12686 null
2025-03-16 From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations Sarvesh Baskar et.al. 2503.12556 null
2025-03-21 LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation Yuqi Sun et.al. 2503.12547 null
2025-03-18 SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? Jianzhu Yao et.al. 2503.12349 null
2025-03-15 PredicateFix: Repairing Static Analysis Alerts with Bridging Predicates Yuan-An Xiao et.al. 2503.12205 null
2025-03-20 Applications of Large Language Model Reasoning in Feature Generation Dharani Chandra et.al. 2503.11989 null
2025-03-14 LLM Agents for Education: Advances and Applications Zhendong Chu et.al. 2503.11733 null
2025-03-14 Neutralizing Bias in LLM Reasoning using Entailment Graphs Liang Cheng et.al. 2503.11614 link
2025-03-14 D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning Jia Zhang et.al. 2503.11441 null
2025-03-14 Modeling Subjectivity in Cognitive Appraisal with Language Models Yuxiang Zhou et.al. 2503.11381 null
2025-03-14 Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches Panggih Kusuma Ningrum et.al. 2503.11376 null
2025-03-14 AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation Fengyu Li et.al. 2503.11346 link
2025-03-14 Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models Aissatou Diallo et.al. 2503.11336 null
2025-03-14 Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries Sahil Kale et.al. 2503.11256 link
2025-03-14 Collaboration is all you need: LLM Assisted Safe Code Translation Rabimba Karanjai et.al. 2503.11237 null
2025-03-13 Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations Piyush Gupta et.al. 2503.10941 null
2025-03-13 HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust Yu Luo et.al. 2503.10793 null
2025-03-12 CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries Zhiyuan Wang et.al. 2503.10707 null
2025-03-12 Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models Shahnewaz Karim Sakib et.al. 2503.10690 null
2025-03-13 TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention Jinhao Duan et.al. 2503.10602 link
2025-03-13 SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models Sahar Admoni et.al. 2503.10509 null
2025-03-13 LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions Gaurav Kumar Gupta et.al. 2503.10486 null
2025-03-13 Collaborative Speculative Inference for Efficient LLM Inference Serving Luyao Gao et.al. 2503.10325 null
2025-03-13 StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error Shu-Xun Yang et.al. 2503.10105 link
2025-03-13 Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model Qiyuan Deng et.al. 2503.10093 null
2025-03-12 Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets Zahra Abbasiantaeb et.al. 2503.09902 link
2025-03-12 Probabilistic Reasoning with LLMs for k-anonymity Estimation Jonathan Zheng et.al. 2503.09674 null
2025-03-12 CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection Richard A. Dubniczky et.al. 2503.09433 link
2025-03-12 NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model Yuzhi Lai et.al. 2503.09335 link
2025-03-12 Token Weighting for Long-Range Language Modeling Falko Helm et.al. 2503.09202 link
2025-03-12 Is LLMs Hallucination Usable? LLM-based Negative Reasoning for Fake News Detection Chaowei Zhang et.al. 2503.09153 null
2025-03-11 Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation Yu Wang et.al. 2503.08963 null
2025-03-11 CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving Changxing Liu et.al. 2503.08683 link
2025-03-11 DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process Minjun Zhu et.al. 2503.08569 null
2025-03-11 Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework Zhuo Zhi et.al. 2503.08308 null
2025-03-11 FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback Kangan Qian et.al. 2503.08162 null
2025-03-11 LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification Siyuan Wang et.al. 2503.07937 null
2025-03-10 Safety Guardrails for LLM-Enabled Robots Zachary Ravichandran et.al. 2503.07885 null
2025-03-10 HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations Samir Abdaljalil et.al. 2503.07833 null
2025-03-07 SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs Jaewoo Song et.al. 2503.07657 null
2025-03-07 MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration Jinguang Wang et.al. 2503.07654 null
2025-03-10 Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review Samuel Ferino et.al. 2503.07556 null
2025-03-10 Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies Luyi Jiang et.al. 2503.07306 null
2025-03-10 Quantizing Large Language Models for Code Generation: A Differentiated Replication Alessandro Giagnorio et.al. 2503.07103 null
2025-03-10 CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation Runqi Sui et.al. 2503.06950 null
2025-03-09 Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia Sabeen Ahmed et.al. 2503.06797 null
2025-03-09 Delusions of Large Language Models Hongshen Xu et.al. 2503.06709 null
2025-03-09 Alignment for Efficient Tool Calling of Large Language Models Hongshen Xu et.al. 2503.06708 null
2025-03-09 Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform Chenyu Huang et.al. 2503.06676 null
2025-03-09 Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving Yao Cheng et.al. 2503.06567 null
2025-03-09 Graph Retrieval-Augmented LLM for Conversational Recommendation Systems Zhangchi Qiu et.al. 2503.06430 null
2025-03-09 Performant LLM Agentic Framework for Conversational AI Alex Casella et.al. 2503.06410 null
2025-03-08 Sample-aware Adaptive Structured Pruning for Large Language Models Jun Kong et.al. 2503.06184 null
2025-03-08 Wireless Hallucination in Generative AI-enabled Communications: Concepts, Issues, and Solutions Xudong Wang et.al. 2503.06149 link
2025-03-08 A Survey on Post-training of Large Language Models Guiyao Tie et.al. 2503.06072 link
2025-03-07 SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs Samir Abdaljalil et.al. 2503.05980 null
2025-03-07 TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator Deepak Vungarala et.al. 2503.05951 null
2025-03-04 I Think, Therefore I Hallucinate: Minds, Machines, and the Art of Being Wrong Sebastian Barros et.al. 2503.05806 null
2025-03-07 R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Huatong Song et.al. 2503.05592 null
2025-03-07 Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information Junbo Zhao et.al. 2503.05543 null
2025-03-07 Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering Yusong Ke et.al. 2503.05505 null
2025-03-07 Maximum Hallucination Standards for Domain-Specific Large Language Models Tingmingke Lu et.al. 2503.05481 null
2025-03-07 An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning Navdeep Kaur et.al. 2503.05439 null
2025-03-07 GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation Zhenxuan Zhang et.al. 2503.05347 link
2025-03-07 Path Pooling: Train-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation Hairu Wang et.al. 2503.05203 null
2025-03-07 RocketEval: Efficient Automated LLM Evaluation via Grading Checklist Tianjun Wei et.al. 2503.05142 link
2025-03-06 LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression Souvik Kundu et.al. 2503.04982 null
2025-03-10 Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents Jingying Zeng et.al. 2503.04830 null
2025-03-07 START: Self-taught Reasoner with Tools Chengpeng Li et.al. 2503.04625 null
2025-03-06 HalluCounter: Reference-free LLM Hallucination Detection in the Wild! Ashok Urlana et.al. 2503.04615 null
2025-03-06 Benchmarking Reasoning Robustness in Large Language Models Tong Yu et.al. 2503.04550 null
2025-03-06 TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction Chao Wang et.al. 2503.04457 null
2025-03-06 On Fact and Frequency: LLM Responses to Misinformation Expressed with Uncertainty Yana van de Sande et.al. 2503.04271 null
2025-03-06 Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation Ziqiang Cui et.al. 2503.04162 null
2025-03-06 KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease Yongchao Long et.al. 2503.04153 link
2025-03-05 Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis Ahmad Hafez et.al. 2503.03911 link
2025-03-07 LEWIS (LayEr WIse Sparsity) – A Training Free Guided Model Merging Approach Hetarth Chopra et.al. 2503.03874 null
2025-03-04 BotUmc: An Uncertainty-Aware Twitter Bot Detection with Multi-view Causal Inference Tao Yang et.al. 2503.03775 null
2025-03-05 The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems Richard Ren et.al. 2503.03750 null
2025-03-05 Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models Bar Karov et.al. 2503.03669 link
2025-03-05 Structured Outputs Enable General-Purpose LLMs to be Medical Experts Guangfu Guo et.al. 2503.03194 null
2025-03-04 SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs Samir Abdaljalil et.al. 2503.03032 null
2025-03-04 Effectively Steer LLM To Follow Preference via Building Confident Directions Bingqing Song et.al. 2503.02989 null
2025-03-04 Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework Ziang Zhou et.al. 2503.02863 null
2025-03-04 Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers Zicong He et.al. 2503.02851 link
2025-03-04 Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs Yuzhe Gu et.al. 2503.02846 link
2025-03-04 FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting Congluo Xu et.al. 2503.02692 null
2025-03-04 MPO: Boosting LLM Agents with Meta Plan Optimization Weimin Xiong et.al. 2503.02682 link
2025-03-04 Multidimensional Consistency Improves Reasoning in Language Models Huiyuan Lai et.al. 2503.02670 null
2025-03-05 Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models Paul Stangel et.al. 2503.02623 null
2025-03-04 AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection Dimitra Karkani et.al. 2503.02442 null
2025-03-04 Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling Hang Zheng et.al. 2503.02233 null
2025-03-04 DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models Saeed Ranjbar Alvar et.al. 2503.02175 link
2025-03-03 OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments Qianwei Wang et.al. 2503.02106 null
2025-03-05 HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Tin Nguyen et.al. 2503.02003 link
2025-03-02 NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SelfCheckGPT Jiaying Hong et.al. 2503.01921 link
2025-03-01 How to Steer LLM Latents for Hallucination Detection? Seongheon Park et.al. 2503.01917 null
2025-03-03 Can (A)I Change Your Mind? Miriam Havin et.al. 2503.01844 link
2025-03-04 Position: Don’t use the CLT in LLM evals with fewer than a few hundred datapoints Sam Bowyer et.al. 2503.01747 null
2025-03-03 Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution Kun Li et.al. 2503.01695 null
2025-03-03 When an LLM is apprehensive about its answers – and when its uncertainty is justified Petr Sychev et.al. 2503.01688 link
2025-03-03 Evaluating LLMs’ Assessment of Mixed-Context Hallucination Through the Lens of Summarization Siya Qi et.al. 2503.01670 link
2025-03-03 Detecting Stylistic Fingerprints of Large Language Models Yehonatan Bitton et.al. 2503.01659 null
2025-03-03 Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning Wenjie Wu et.al. 2503.01642 null
2025-03-03 Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering Zhanghao Hu et.al. 2503.01606 null
2025-03-03 None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering Zhi Rui Tam et.al. 2503.01550 null
2025-03-03 Revisiting Large Language Model Pruning using Neuron Semantic Attribution Yizhuo Ding et.al. 2503.01542 null
2025-03-03 What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret Yufeng Yuan et.al. 2503.01491 null
2025-03-03 Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation Linhai Zhang et.al. 2503.01315 null
2025-03-03 LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains Ling Xiao et.al. 2503.01236 null
2025-03-06 CE-U: Cross Entropy Unlearning Bo Yang et.al. 2503.01224 null
2025-03-03 Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG Wenbin Wang et.al. 2503.01222 link
2025-03-04 Can Large Language Models Help Experimental Design for Causal Discovery? Junyi Li et.al. 2503.01139 null
2025-03-02 Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies Tianyi Huang et.al. 2503.00724 null
2025-03-02 GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development Leming Shen et.al. 2503.00686 link
2025-03-02 From Prompting to Partnering: Personalization Features for Human-LLM Interactions Si Thu et.al. 2503.00681 null
2025-03-01 Embracing Diversity: A Multi-Perspective Approach with Soft Labels Benedetta Muscato et.al. 2503.00489 null
2025-03-01 U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack Yunfan Gao et.al. 2503.00353 link
2025-03-01 Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy Jahan C. Penny-Dimri et.al. 2503.00269 null
2025-02-28 A Survey of Uncertainty Estimation Methods on Large Language Models Zhiqiu Xia et.al. 2503.00172 null
2025-02-27 Societal Alignment Frameworks Can Improve LLM Alignment Karolina Stańczak et.al. 2503.00069 null
2025-03-04 Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs Xiaomin Li et.al. 2502.21239 null
2025-02-28 PASemiQA: Plan-Assisted Agent for Question Answering on Semi-Structured Data with Text and Relational Information Hansi Yang et.al. 2502.21087 null
2025-03-03 A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation Xujie Yuan et.al. 2502.20854 null
2025-02-28 Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow Jiaqi Bai et.al. 2502.20750 link
2025-02-28 Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models Colleen Gilhuly et.al. 2502.20647 null
2025-02-28 Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems Jędrzej Warczyński et.al. 2502.20609 null
2025-02-27 Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization Ryan C. Barron et.al. 2502.20364 link
2025-02-27 Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models Yi Jing et.al. 2502.20344 null
2025-02-27 Expertise Is What We Want Alan Ashworth et.al. 2502.20335 null
2025-02-27 Conformal Tail Risk Control for Large Language Model Alignment Catherine Yu-Chi Chen et.al. 2502.20285 null
2025-02-27 Similarity-Distance-Magnitude Universal Verification Allen Schmaltz et.al. 2502.20167 link
2025-03-04 ProAPO: Progressively Automatic Prompt Optimization for Visual Classification Xiangyan Qu et.al. 2502.19844 link
2025-02-27 Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences Linzhuo Li et.al. 2502.19679 null
2025-02-26 Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review Sungduk Yu et.al. 2502.19614 null
2025-02-26 Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems Nayoung Choi et.al. 2502.19596 null
2025-02-26 Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents Ashley Lewis et.al. 2502.19545 null
2025-02-26 Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices Xinru Wang et.al. 2502.19410 null
2025-02-26 Verde: Verification via Refereed Delegation for Machine Learning Programs Arasu Arun et.al. 2502.19405 null
2025-02-26 Efficient Federated Search for Retrieval-Augmented Generation Rachid Guerraoui et.al. 2502.19280 null
2025-02-26 Bi’an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation Zhouyu Jiang et.al. 2502.19209 null
2025-02-26 Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement Siyuan Zhang et.al. 2502.19127 null
2025-02-26 Talking like Piping and Instrumentation Diagrams (P&IDs) Achmad Anggawirya Alimin et.al. 2502.18928 null
2025-02-26 Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models Shuliang Liu et.al. 2502.18817 null
2025-02-26 Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science Xiaohua Wu et.al. 2502.18729 null
2025-02-25 Scalable Best-of-N Selection for Large Language Models via Self-Certainty Zhewei Kang et.al. 2502.18581 link
2025-02-25 Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions Yizhe Zhang et.al. 2502.18435 null
2025-02-25 Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods Nicola Cecere et.al. 2502.18389 null
2025-02-25 BRIDO: Bringing Democratic Order to Abstractive Summarization Junhyun Lee et.al. 2502.18342 null
2025-02-25 Can LLMs Explain Themselves Counterfactually? Zahra Dehghanighobadi et.al. 2502.18156 null
2025-02-25 LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers Zhuocheng Zhang et.al. 2502.18139 link
2025-02-25 Verdict: A Library for Scaling Judge-Time Compute Nimit Kalra et.al. 2502.18018 link
2025-02-27 LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction Suozhi Huang et.al. 2502.17925 null
2025-02-25 An Overview of Large Language Models for Statisticians Wenlong Ji et.al. 2502.17814 null
2025-02-25 Uncertainty Quantification for LLM-Based Survey Simulations Chengpiao Huang et.al. 2502.17773 null
2025-02-24 Hallucination Detection in LLMs Using Spectral Features of Attention Maps Jakub Binkowski et.al. 2502.17598 link
2025-02-24 Towards Conditioning Clinical Text Generation for User Control Osman Alperen Koraş et.al. 2502.17571 null
2025-02-22 SAE-V: Interpreting Multimodal Models for Enhanced Alignment Hantao Lou et.al. 2502.17514 null
2025-02-24 CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought Boxuan Zhang et.al. 2502.17214 link
2025-02-24 IGDA: Interactive Graph Discovery through Large Language Model Agents Alex Havrilla et.al. 2502.17189 null
2025-02-24 LettuceDetect: A Hallucination Detection Framework for RAG Applications Ádám Kovács et.al. 2502.17125 link
2025-02-27 LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences Sijia Yao et.al. 2502.17057 link
2025-02-24 Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology Longchao Da et.al. 2502.17026 null
2025-02-24 Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning Jiaheng Li et.al. 2502.16896 null
2025-02-24 Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models Yaqi Sun et.al. 2502.16842 null
2025-02-25 Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses Tiejin Chen et.al. 2502.16820 null
2025-02-23 Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT Nidhal Jegham et.al. 2502.16428 null
2025-02-23 Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications Feng Ma et.al. 2502.16402 null
2025-02-22 An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning Masoud Shokrnezhad et.al. 2502.16198 null
2025-02-22 EPERM: An Evidence Path Enhanced Reasoning Model for Knowledge Graph Question and Answering Xiao Long et.al. 2502.16171 null
2025-02-22 ZiGong 1.0: A Large Language Model for Financial Credit Yu Lei et.al. 2502.16159 null
2025-02-22 The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination Yuji Zhang et.al. 2502.16143 null
2025-02-22 Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals Linda Zeng et.al. 2502.16101 null
2025-02-21 Position: Standard Benchmarks Fail – LLM Agents Present Overlooked Risks for Financial Applications Zichen Chen et.al. 2502.15865 null
2025-02-20 Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection Yihao Xue et.al. 2502.15845 null
2025-02-20 Hallucination Detection in Large Language Models with Metamorphic Relations Borui Yang et.al. 2502.15844 null
2025-02-21 AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind Zhining Zhang et.al. 2502.15676 link
2025-02-24 Empowering LLMs with Logical Reasoning: A Comprehensive Survey Fengxiang Cheng et.al. 2502.15652 null
2025-02-21 A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany Ina Dormuth et.al. 2502.15568 null
2025-02-21 PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning Pengcheng Huang et.al. 2502.15543 link
2025-02-21 Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making Eunhye Kim et.al. 2502.15395 null
2025-02-21 Evaluating Social Biases in LLM Reasoning Xuyang Wu et.al. 2502.15361 null
2025-02-21 From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants Manisha Mukherjee et.al. 2502.15237 null
2025-02-20 Using tournaments to calculate AUROC for zero-shot classification with LLMs Wonjin Yoon et.al. 2502.15018 null
2025-02-19 OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment Xiangjin Xie et.al. 2502.14913 null
2025-02-19 EvoP: Robust LLM Inference via Evolutionary Pruning Shangyu Wu et.al. 2502.14910 null
2025-02-19 KOALA: Knowledge Conflict Augmentations for Robustness in Vision Language Models Peter Carragher et.al. 2502.14908 link
2025-02-20 Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning Shuyue Stella Li et.al. 2502.14860 link
2025-02-20 Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs Zongxia Li et.al. 2502.14748 null
2025-02-20 CER: Confidence Enhanced Reasoning in LLMs Ali Razghandi et.al. 2502.14634 link
2025-02-20 Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery Minh-Quyet Ha et.al. 2502.14631 null
2025-02-20 ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification Hyunseok Lee et.al. 2502.14565 null
2025-02-20 Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation Austin A. Barr et.al. 2502.14523 link
2025-02-25 How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Sergey Pletenev et.al. 2502.14502 link
2025-02-20 Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models Artem Vazhentsev et.al. 2502.14427 link
2025-02-20 ParallelComp: Parallel Long-Context Compressor for Length Extrapolation Jing Xiong et.al. 2502.14317 null
2025-02-20 MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models Shrey Pandit et.al. 2502.14302 null
2025-02-20 STeCa: Step-level Trajectory Calibration for LLM Agent Learning Hanlin Wang et.al. 2502.14276 link
2025-02-20 Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment Jiaxi Li et.al. 2502.14275 null
2025-02-20 PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant Congrui Yin et.al. 2502.14271 null
2025-02-20 MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels Xiaoou Liu et.al. 2502.14268 null
2025-02-20 Multi-Faceted Studies on Data Poisoning can Advance LLM Development Pengfei He et.al. 2502.14182 link
2025-02-19 SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation Song Duong et.al. 2502.13674 null
2025-02-19 C2T: A Classifier-Based Tree Construction Method in Speculative Decoding Feiye Huo et.al. 2502.13652 null
2025-02-19 REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models DongGeon Lee et.al. 2502.13622 null
2025-02-19 What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis Peiran Wang et.al. 2502.13490 null
2025-02-19 LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models Ruiming Tang et.al. 2502.13481 null
2025-02-19 TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation Jialin Ouyang et.al. 2502.13442 link
2025-02-19 Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning Ningke Li et.al. 2502.13416 null
2025-02-19 Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval Aditya Sharma et.al. 2502.13369 null
2025-02-18 SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Yucheng Shi et.al. 2502.13233 null
2025-02-17 Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment Yuze Zhao et.al. 2502.13170 link
2025-02-18 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Shuo Xing et.al. 2502.13146 link
2025-02-18 Understanding and Rectifying Safety Perception Distortion in VLMs Xiaohan Zou et.al. 2502.13095 null
2025-02-18 LAMD: Context-driven Android Malware Detection and Classification with LLMs Xingzhi Qian et.al. 2502.13055 null
2025-02-20 Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation Sha Li et.al. 2502.13019 null
2025-02-18 Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs Adi Simhi et.al. 2502.12964 null
2025-02-18 Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models Elena Stringli et.al. 2502.12821 null
2025-02-20 How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild Saad Obaid ul Islam et.al. 2502.12769 link
2025-02-18 R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs Sumin Jo et.al. 2502.12767 link
2025-02-18 “I know myself better, but not really greatly”: Using LLMs to Detect and Explain LLM-Generated Texts Jiazhou Ji et.al. 2502.12743 null
2025-02-18 R.R.: Unveiling LLM Training Privacy through Recollection and Ranking Wenlong Meng et.al. 2502.12658 link
2025-02-18 COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation Sean Wang et.al. 2502.12601 null
2025-02-18 EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning Xiaoqian Liu et.al. 2502.12486 null
2025-02-18 Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking Alireza S. Ziabari et.al. 2502.12470 null
2025-02-18 MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation Yutong Wang et.al. 2502.12468 null
2025-02-17 Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs Kan Zhu et.al. 2502.12216 null
2025-02-17 Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control Jinyan Su et.al. 2502.12145 link
2025-02-17 KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs Qi Zhao et.al. 2502.12029 null
2025-02-17 SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities Fengqing Jiang et.al. 2502.12025 null
2025-02-17 Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning Tianyi Wu et.al. 2502.11962 null
2025-02-17 Can Your Uncertainty Scores Detect Hallucinated Entity? Min-Hsuan Yeh et.al. 2502.11948 null
2025-02-17 Cognitive-Aligned Document Selection for Retrieval-augmented Generation Bingyu Wan et.al. 2502.11770 null
2025-02-17 ReviewEval: An Evaluation Framework for AI-Generated Reviews Chavvi Kirtani et.al. 2502.11736 null
2025-02-17 Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception Shiyu Ni et.al. 2502.11677 null
2025-02-17 Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation Arindam Sharma et.al. 2502.11620 null
2025-02-17 Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? Hanxing Ding et.al. 2502.11400 null
2025-02-17 “Nuclear Deployed!”: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents Rongwu Xu et.al. 2502.11355 link
2025-02-16 Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation Hieu Nguyen et.al. 2502.11306 null
2025-02-16 Uncertainty-Aware Step-wise Verification with Generative Reward Models Zihuiwen Ye et.al. 2502.11250 null
2025-02-16 A Survey of LLM-based Agents in Medicine: How far are we from Baymax? Wenxuan Wang et.al. 2502.11211 null
2025-02-16 Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs Fei Yu et.al. 2502.11155 null
2025-02-18 Valuable Hallucinations: Realizable Non-realistic Propositions Qiucheng Chen et.al. 2502.11113 null
2025-02-16 Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications Alexandru Lecu et.al. 2502.11108 link
2025-02-16 Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models Prateek Chhikara et.al. 2502.11028 link
2025-02-16 Leveraging Uncertainty Estimation for Efficient LLM Routing Tuo Zhang et.al. 2502.11021 null
2025-02-16 Agentic LLM Framework for Adaptive Decision Discourse Antoine Dolant et.al. 2502.10978 null
2025-02-16 SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information Xiangyu Zhang et.al. 2502.10950 null
2025-02-15 Towards Effective Extraction and Evaluation of Factual Claims Dasha Metropolitansky et.al. 2502.10855 null
2025-02-15 An Empirical Analysis of Uncertainty in Large Language Model Evaluations Qiujie Xie et.al. 2502.10709 link
2025-02-15 LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization Erica Zhang et.al. 2502.10648 link
2025-02-14 Post-training an LLM for RAG? Train on Self-Generated Demonstrations Matthew Finlayson et.al. 2502.10596 null
2025-02-14 Can Large Language Model Agents Balance Energy Systems? Xinxing Ren et.al. 2502.10557 link
2025-02-14 A novel approach to data generation in generative model JaeHong Kim et.al. 2502.10092 null
2025-02-14 Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos Weirui Ye et.al. 2502.09886 null
2025-02-14 Automated Hypothesis Validation with Agentic Sequential Falsifications Kexin Huang et.al. 2502.09858 link
2025-02-13 Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes Taylan G. Topcu et.al. 2502.09690 null
2025-02-13 LP-LM: No Hallucinations in Question Answering with Logic Programming Katherine Wu et.al. 2502.09212 link
2025-02-13 Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York Sanskar Sehgal et.al. 2502.09204 null
2025-02-13 Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables Xuzhao Geng et.al. 2502.09073 null
2025-02-13 Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models Xin Zhou et.al. 2502.08922 null
2025-02-13 MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training Xinxin You et.al. 2502.08904 null
2025-02-12 Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Mohammad Mahdi Abootorabi et.al. 2502.08826 link
2025-02-11 Hallucination, Monofacts, and Miscalibration: An Empirical Investigation Muqing Miao et.al. 2502.08666 link
2025-02-10 Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis Emanuele Ricco et.al. 2502.08663 null
2025-02-09 Few-shot LLM Synthetic Data with Distribution Matching Jiyuan Ren et.al. 2502.08661 link
2025-02-08 Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions Jingxin Xu et.al. 2502.08657 null
2025-02-12 Ensemble based approach to quantifying uncertainty of LLM based classifications Srijith Rajamohan et.al. 2502.08631 null
2025-02-12 Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding Konstantin Berestizshevsky et.al. 2502.08363 link
2025-02-17 Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG Kushagra Bhushan et.al. 2502.08356 link
2025-02-12 Compromising Honesty and Harmlessness in Language Models via Deception Attacks Laurène Vaugrante et.al. 2502.08301 null
2025-02-12 Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis Changhua Pei et.al. 2502.08224 null
2025-02-12 Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences Shanshan Han et.al. 2502.08142 null
2025-02-12 HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses Sujeong Lee et.al. 2502.08109 null
2025-02-12 Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD) Naomi Akhras et.al. 2502.08073 null
2025-02-11 From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems Yining Hong et.al. 2502.07974 null
2025-02-11 Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning Rujing Yao et.al. 2502.07912 link
2025-02-11 Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights Ahilan Ayyachamy Nadar Ponnusamy et.al. 2502.07835 link
2025-02-17 Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering Shuzheng Si et.al. 2502.07340 link
2025-02-11 When More is Less: Understanding Chain-of-Thought Length in LLMs Yuyang Wu et.al. 2502.07266 null
2025-02-11 Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs Sina Salimian et.al. 2502.07186 null
2025-02-11 Refine Knowledge of Large Language Models via Adaptive Contrastive Learning Yinghui Li et.al. 2502.07184 null
2025-02-11 Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning Feng Chen et.al. 2502.07154 link
2025-02-11 Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning Jiayuan Zhu et.al. 2502.07143 null
2025-02-08 Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models Sina Tayebati et.al. 2502.06884 link
2025-02-08 Group Reasoning Emission Estimation Networks Yanming Guo et.al. 2502.06874 null
2025-02-08 Knowledge Graph-Guided Retrieval Augmented Generation Xiangrong Zhu et.al. 2502.06864 link
2025-02-07 LLM-Supported Natural Language to Bash Translation Finnian Westenfelder et.al. 2502.06858 link
2025-02-11 Calibrating LLMs with Information-Theoretic Evidential Deep Learning Yawei Li et.al. 2502.06351 link
2025-02-10 Expect the Unexpected: FailSafe Long Context QA for Finance Kiran Kamble et.al. 2502.06329 null
2025-02-10 Emergent Response Planning in LLM Zhichen Dong et.al. 2502.06258 null
2025-02-10 Confidence Improves Self-Consistency in LLMs Amir Taubenfeld et.al. 2502.06233 null
2025-02-10 Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement Junyu Lu et.al. 2502.06207 link
2025-02-10 Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis Sanket Jantre et.al. 2502.06173 null
2025-02-09 GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation Runchuan Zhu et.al. 2502.05911 null
2025-02-09 Self-Training Large Language Models for Tool-Use Without Demonstrations Ne Luo et.al. 2502.05867 null
2025-02-09 Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models Cheng Peng Huang et.al. 2502.05825 null
2025-02-09 Assessing confidence in frontier AI safety cases Stephen Barrett et.al. 2502.05791 null
2025-02-09 Visual Text Mining with Progressive Taxonomy Construction for Environmental Studies Sam Yu-Te Lee et.al. 2502.05731 link
2025-02-07 SEER: Self-Explainability Enhancement of Large Language Models’ Representations Guanxu Chen et.al. 2502.05242 null
2025-02-07 ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework Xiaoyu Deng et.al. 2502.05084 null
2025-02-07 Aligning Black-box Language Models with Human Judgments Gerrit J. J. van den Burg et.al. 2502.04997 null
2025-02-11 CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs Roman Vashurin et.al. 2502.04964 null
2025-02-07 Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks Jing Yang et.al. 2502.04797 link
2025-02-10 Confidence Elicitation: A New Attack Vector for Large Language Models Brian Formento et.al. 2502.04643 link
2025-02-06 TruthFlow: Truthful LLM Generation via Representation Flow Correction Hanyu Wang et.al. 2502.04556 null
2025-02-06 Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization Yu-Neng Chuang et.al. 2502.04428 null
2025-02-06 KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Xing Li et.al. 2502.04420 link
2025-02-11 Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing Kunfeng Lai et.al. 2502.04411 null
2025-02-06 FAS: Fast ANN-SNN Conversion for Spiking Large Language Models Long Chen et.al. 2502.04405 link
2025-02-05 Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning Jonathan Kim et.al. 2502.04381 null
2025-02-05 MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction Xiao Hu et.al. 2502.04360 null
2025-02-04 LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving Md Sifat Hossain et.al. 2502.04355 null
2025-02-06 Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software Andreas Baumann et.al. 2502.03916 null
2025-02-06 BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Bo Pang et.al. 2502.03860 null
2025-02-12 Syntriever: How to Train Your Retriever with Synthetic Data from LLMs Minsang Kim et.al. 2502.03824 link
2025-02-10 Large Language Models for Multi-Robot Systems: A Survey Peihan Li et.al. 2502.03814 link
2025-02-08 Enhancing Hallucination Detection through Noise Injection Litian Liu et.al. 2502.03799 null
2025-02-06 Adaptive Semantic Prompt Caching with VectorQ Luis Gaspar Schroeder et.al. 2502.03771 null
2025-02-06 Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models Rui Cai et.al. 2502.03715 null
2025-02-06 MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers Nicole Cho et.al. 2502.03711 null
2025-02-06 Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers Daniel Beaglehole et.al. 2502.03708 null
2025-02-06 LLM Alignment as Retriever Optimization: An Information Retrieval Perspective Bowen Jin et.al. 2502.03699 null
2025-02-05 Reflection-Window Decoding: Text Generation with Selective Refinement Zeyu Tang et.al. 2502.03678 null
2025-02-05 Advancing Reasoning in Large Language Models: Promising Methods and Approaches Avinash Patil et.al. 2502.03671 null
2025-02-04 Artificial Intelligence and Legal Analysis: Implications for Legal Education and the Profession Lee Peoples et.al. 2502.03487 null
2025-02-05 A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) Yiye Chen et.al. 2502.03450 null
2025-02-05 SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs Ben Liu et.al. 2502.03283 null
2025-02-05 Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models Jialiang Wu et.al. 2502.03199 null
2025-02-05 IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates Aissatou Diallo et.al. 2502.03080 null
2025-02-04 An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification Riddhi More et.al. 2502.02715 null
2025-02-04 EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Yize Wu et.al. 2502.02493 null
2025-02-04 Activation-Informed Merging of Large Language Models Amin Heyrani Nobari et.al. 2502.02421 link
2025-02-04 From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing Siwei Luo et.al. 2502.02025 null
2025-02-03 SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models Diyana Muhammed et.al. 2502.01812 null
2025-02-03 Position: Towards a Responsible LLM-empowered Multi-Agent Systems Jinwei Hu et.al. 2502.01714 null
2025-02-02 Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model Hadas Ben-Atya et.al. 2502.01691 null
2025-02-02 LIBRA: Measuring Bias of Large Language Model from a Local Context Bo Pang et.al. 2502.01679 null
2025-02-01 Benchmark on Peer Review Toxic Detection: A Challenging Task with a New Dataset Man Luo et.al. 2502.01676 null
2025-02-03 CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering Zongxi Li et.al. 2502.01523 null
2025-02-03 Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant Gaole He et.al. 2502.01390 link
2025-02-03 PSSD: Making Large Language Models Self-denial via Human Psyche Structure Jinzhi Liao et.al. 2502.01344 link
2025-02-03 Human-Agent Interaction in Synthetic Social Networks: A Framework for Studying Online Polarization Tim Donkers et.al. 2502.01340 null
2025-02-03 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models Xinyan Guan et.al. 2502.01142 null
2025-02-03 Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning Guanlin Li et.al. 2502.01116 null
2025-02-03 ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution Kanika Goswami et.al. 2502.00989 null
2025-02-03 Context-Aware Hierarchical Merging for Long Document Summarization Litu Ou et.al. 2502.00977 null
2025-02-02 Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications Yixin Wu et.al. 2502.00808 link
2025-02-02 Generative AI for Analyzing Participatory Rural Appraisal Data: An Exploratory Case Study in Gender Research Srividya Sheshadri et.al. 2502.00763 null
2025-02-02 MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction Chao Wang et.al. 2502.00717 null
2025-02-01 Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation Stuart Armstrong et.al. 2502.00580 link
2025-02-01 Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning Zhi Zhou et.al. 2502.00511 null
2025-02-01 Estimating LLM Uncertainty with Logits Huan Ma et.al. 2502.00290 link
2025-01-31 DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets Abdurrahim Yilmaz et.al. 2502.00196 null
2025-01-31 Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models Alina Shutova et.al. 2501.19392 link
2025-01-31 Towards Adaptive Self-Improvement for Smarter Energy Systems Alexander Sommer et.al. 2501.19340 null
2025-01-31 Homogeneity Bias as Differential Sampling Uncertainty in Language Models Messi H. J. Lee et.al. 2501.19337 null
2025-01-31 Offline Learning for Combinatorial Multi-armed Bandits Xutong Liu et.al. 2501.19300 null
2025-01-31 Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs Kejia Zhang et.al. 2501.19164 null
2025-01-31 Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities Arjun Krishna et.al. 2501.19012 null
2025-01-30 Survey and Improvement Strategies for Gene Prioritization with Large Language Models Matthew Neeley et.al. 2501.18794 null
2025-01-30 Differentially Private Steering for Large Language Model Alignment Anmol Goel et.al. 2501.18532 link
2025-01-30 CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization Yanxia Deng et.al. 2501.18475 null
2025-01-31 RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing Jinyao Guo et.al. 2501.18160 link
2025-01-29 Large Language Models Think Too Fast To Explore Effectively Lan Pan et.al. 2501.18009 null
2025-01-29 Uncertainty Quantification and Decomposition for LLM-based Recommendation Wonbin Kweon et.al. 2501.17630 link
2025-01-29 Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis Kunrong Li et.al. 2501.17598 null
2025-01-29 CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs Amey Hengle et.al. 2501.17581 null
2025-01-28 Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization Zilu Tang et.al. 2501.17295 null
2025-01-26 Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics Jin Hyun Park et.al. 2501.17187 link
2025-02-01 LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering Beiming Liu et.al. 2501.17183 null
2025-01-28 FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data Deren Lei et.al. 2501.17144 link
2025-01-28 MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search Shuozhi Yuan et.al. 2501.16607 null
2025-01-27 Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models Huayu Li et.al. 2501.16215 link
2025-01-27 Parametric Retrieval Augmented Generation Weihang Su et.al. 2501.15915 link
2025-01-26 Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis Robinson Umeike et.al. 2501.15370 null
2025-01-26 Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection Bo Yang et.al. 2501.15355 null
2025-01-25 You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning Ayan Sengupta et.al. 2501.15296 null
2025-01-25 Can Large Language Models Be Trusted as Black-Box Evolutionary Optimizers for Combinatorial Problems? Jie Zhao et.al. 2501.15081 null
2025-01-25 Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations Harshita Chopra et.al. 2501.15056 null
2025-01-25 Federated Retrieval Augmented Generation for Multi-Product Question Answering Parshin Shojaee et.al. 2501.14998 null
2025-01-24 Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing Madeline Anderson et.al. 2501.14905 null
2025-01-24 Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs Hang Luo et.al. 2501.14892 link
2025-01-24 Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains Xu Chu et.al. 2501.14431 null
2025-01-24 Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph Xujian Liang et.al. 2501.14300 link
2025-01-24 Humanity’s Last Exam Long Phan et.al. 2501.14249 null
2025-01-24 AI Chatbots as Professional Service Agents: Developing a Professional Identity Wenwen Li et.al. 2501.14179 null
2025-01-23 OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting Xing Hu et.al. 2501.13987 link
2025-01-23 Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs Bhumika Gupta et.al. 2501.13984 null
2025-01-20 A Layered Multi-Expert Framework for Long-Context Mental Health Assessments Jinwen Tang et.al. 2501.13951 null
2025-01-23 CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation Guofeng Cui et.al. 2501.13927 null
2025-01-23 On the Reasoning Capacity of AI Models and How to Quantify It Santosh Kumar Radha et.al. 2501.13833 null
2025-01-23 Hallucinations Can Improve Large Language Models in Drug Discovery Shuzhou Yuan et.al. 2501.13824 null
2025-01-22 OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models Chongren Sun et.al. 2501.12975 link
2025-01-22 FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Zhenran Xu et.al. 2501.12909 null
2025-01-22 Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home Viktor Moskvoretskii et.al. 2501.12835 null
2025-01-30 EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering Chang Zong et.al. 2501.12746 null
2025-01-25 Online Preference Alignment for Language Models via Count-based Exploration Chenjia Bai et.al. 2501.12735 link
2025-01-22 Paradigm-Based Automatic HDL Code Generation Using LLMs Wenhao Sun et.al. 2501.12702 null
2025-01-19 AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model Lipeng Ma et.al. 2501.11031 link
2025-01-18 Iterative Tree Analysis for Medical Critics Zenan Huang et.al. 2501.10642 null
2025-01-18 Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks Xin Yi et.al. 2501.10639 link
2025-01-17 4bit-Quantization in Vector-Embedding for RAG Taehee Jeong et.al. 2501.10534 link
2025-01-17 Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling Suvodip Dey et.al. 2501.10316 link
2025-01-17 Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions Zhijie Tan et.al. 2501.10011 null
2025-01-17 Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models Qiang Liu et.al. 2501.09997 null
2025-01-22 FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs Zengyi Gao et.al. 2501.09957 null
2025-01-17 Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs Reham Omar et.al. 2501.09928 link
2025-01-17 Towards A Litmus Test for Common Sense Hugo Latapie et.al. 2501.09913 null
2025-01-17 FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis Zhe Chen et.al. 2501.09887 null
2025-01-16 Bridging Language Barriers in Healthcare: A Study on Arabic LLMs Nada Saadi et.al. 2501.09825 null
2025-01-16 Enhancing Generalization in Chain of Thought Reasoning for Smaller Models Maxwell J. Yin et.al. 2501.09804 null
2025-01-24 Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong Tairan Fu et.al. 2501.09775 null
2025-01-16 Confidence Estimation for Error Detection in Text-to-SQL Systems Oleg Somov et.al. 2501.09527 link
2025-01-16 A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy Huandong Wang et.al. 2501.09431 null
2025-01-16 Rational Tuning of LLM Cascades via Probabilistic Modeling Michael J. Zellinger et.al. 2501.09345 null
2025-01-16 To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation Kaustubh D. Dhole et.al. 2501.09292 null
2025-01-15 Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach Alireza Ghaffari et.al. 2501.09107 null
2025-01-15 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Ruixiang Jiang et.al. 2501.09012 link
2025-01-15 Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching Chuangtao Ma et.al. 2501.08686 link
2025-01-14 SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models Anurag Kumar et.al. 2501.08421 null
2025-01-14 OptiChat: Bridging Optimization Models and Practitioners with Large Language Models Hao Chen et.al. 2501.08406 link
2025-01-14 HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Abhilasha Ravichander et.al. 2501.08292 null
2025-01-14 Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering Feijie Wu et.al. 2501.07813 null
2025-01-13 GPT as a Monte Carlo Language Tree: A Probabilistic Perspective Kun-Peng Ning et.al. 2501.07641 null
2025-01-13 SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models Fabien Bernier et.al. 2501.07639 null
2025-01-13 RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Difei Gu et.al. 2501.07525 link
2025-01-13 Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection Xin Yin et.al. 2501.07425 null
2025-01-13 ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training Jiayang Wu et.al. 2501.07078 link
2025-01-11 Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering Yinghao Hu et.al. 2501.06521 link
2025-01-11 First Token Probability Guided RAG for Telecom Question Answering Tingwei Chen et.al. 2501.06468 null
2025-01-21 MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare Ye Chen et.al. 2501.06465 null
2025-01-10 Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea Eunjung Cho et.al. 2501.05981 null
2025-01-10 Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models Sungjae Lee et.al. 2501.05752 null
2025-01-09 Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning Laura Puccioni et.al. 2501.05248 null
2025-01-09 Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments Yifan Xu et.al. 2501.04947 null
2025-01-09 HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers Yiyao Yang et.al. 2501.04908 link
2025-01-09 SUGAR: Leveraging Contextual Confidence for Smarter Retrieval Hanna Zubkova et.al. 2501.04899 null
2025-01-08 Re-ranking the Context for Multimodal Retrieval Augmented Generation Matin Mortaheb et.al. 2501.04695 null
2025-01-08 Multi-task retriever fine-tuning for domain-specific and efficient RAG Patrice Béchard et.al. 2501.04652 null
2025-01-16 Knowledge Retrieval Based on Generative AI Te-Lun Yang et.al. 2501.04635 null
2025-01-07 RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance Matin Mortaheb et.al. 2501.03995 null
2025-01-07 Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles Yuxi Xia et.al. 2501.03991 null
2025-01-07 Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States Jurgita Kapočiūtė-Dzikienė et.al. 2501.03952 null
2025-01-08 A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval Shuo Tong et.al. 2501.03295 null
2025-01-06 CALM: Curiosity-Driven Auditing for Large Language Models Xiang Zheng et.al. 2501.02997 link
2025-01-19 FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models Zhuo Chen et.al. 2501.02968 null
2025-01-09 InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion Zhaoyi Yan et.al. 2501.02795 null
2025-01-06 QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance Binita Saha et.al. 2501.02702 null
2025-01-06 EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models Andrés Villa et.al. 2501.02699 null
2025-01-05 Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications Zhe Chen et.al. 2501.02460 null
2025-01-04 Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation Shijie Wang et.al. 2501.02226 null
2025-01-04 EvoPath: Evolutionary Meta-path Discovery with Large Language Models for Complex Heterogeneous Information Networks Shixuan Liu et.al. 2501.02192 null
2025-01-04 The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit Huixue Zhou et.al. 2501.02173 null
2025-01-02 Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection Kedi Chen et.al. 2501.02020 null
2025-01-03 Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification Xiangxiang Dai et.al. 2501.01849 link
2025-01-03 LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries Michal Kuk et.al. 2501.01711 null
2025-01-03 (WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges Mohamed Hisham Abdellatif et.al. 2501.01588 null
2025-01-02 BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery Kanishk Gandhi et.al. 2501.01540 link
2025-01-02 Aligning Large Language Models for Faithful Integrity Against Opposing Argument Yong Zhao et.al. 2501.01336 link
2025-01-02 Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension Yanbo Fang et.al. 2501.01332 null
2025-01-03 Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking Xiaoxue Cheng et.al. 2501.01306 null
2025-01-02 Large Language Model-Enhanced Symbolic Reasoning for Knowledge Base Completion Qiyuan He et.al. 2501.01246 null
2025-01-02 SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization Yongle Huang et.al. 2501.01245 link
2025-01-02 Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method Ruichen Zhang et.al. 2501.01141 null
2025-01-02 Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models Yanwen Huang et.al. 2501.01059 null
2025-01-02 Dynamic Scaling of Unit Tests for Code Reward Modeling Zeyao Ma et.al. 2501.01054 null
2025-01-07 LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management Yichen Luo et.al. 2501.00826 null
2025-01-01 NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model Yuzhi Lai et.al. 2501.00785 null
2024-12-31 Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs Harit Vishwakarma et.al. 2501.00555 null
2024-12-31 A review of faithfulness metrics for hallucination assessment in Large Language Models Ben Malin et.al. 2501.00269 null
2024-12-31 CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care Michael Gubanov et.al. 2501.00223 null
2024-12-30 CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions Mourad Heddaya et.al. 2501.00097 null
2024-12-30 Facilitating large language model Russian adaptation with Learned Embedding Propagation Mikhail Tikhomirov et.al. 2412.21140 link
2024-12-30 KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation Siyuan Fang et.al. 2412.20995 null
2024-12-30 Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs’ Memory Xingjian Tao et.al. 2412.20846 null
2024-12-30 UBER: Uncertainty-Based Evolution with Large Language Models for Automatic Heuristic Design Zijie Chen et.al. 2412.20694 link
2025-01-05 Distilling Desired Comments for Enhanced Code Review with Large Language Models Yongda Yu et.al. 2412.20340 null
2024-12-29 Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain Shintaro Ozaki et.al. 2412.20309 link
2024-12-28 ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty Qing Zong et.al. 2412.20251 link
2024-12-27 Toward Adaptive Reasoning in Large Language Models with Thought Rollback Sijia Chen et.al. 2412.19707 link
2024-12-27 Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs Zhe Yang et.al. 2412.19513 link
2024-12-27 MBQ: Modality-Balanced Quantization for Large Vision-Language Models Shiyao Li et.al. 2412.19509 link
2024-12-26 RAG with Differential Privacy Nicolas Grislain et.al. 2412.19291 link
2025-01-03 MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models Kaiwen Zuo et.al. 2412.18947 null
2025-01-06 Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation Derong Xu et.al. 2412.18537 link
2024-12-24 Is Large Language Model Good at Triple Set Prediction? An Empirical Study Yuan Yuan et.al. 2412.18443 null
2024-12-24 Annotating References to Mythological Entities in French Literature Thierry Poibeau et.al. 2412.18270 null
2024-12-24 Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine Yu He Ke et.al. 2412.18096 null
2024-12-23 Trustworthy and Efficient LLMs Meet Databases Kyoungmin Kim et.al. 2412.18022 null
2024-12-22 The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM’s Internal States Fabian Ridder et.al. 2412.17056 link
2024-12-22 Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs Alexander von Recum et.al. 2412.16974 null
2024-12-28 Lillama: Large Language Models Compression via Low-Rank Feature Distillation Yaya Sy et.al. 2412.16719 null
2024-12-21 Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks Jinyan Su et.al. 2412.16708 link
2024-12-21 AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles Aritra Kumar Lahiri et.al. 2412.16701 null
2024-12-21 Internalized Self-Correction for Large Language Models Nishanth Upadhyaya et.al. 2412.16653 null
2024-12-21 Identifying Cyberbullying Roles in Social Media Manuel Sandoval et.al. 2412.16417 null
2024-12-20 Towards Safe and Honest AI Agents with Neural Self-Other Overlap Marc Carauleanu et.al. 2412.16325 null
2024-12-20 Logical Consistency of Large Language Models in Fact-checking Bishwamittra Ghosh et.al. 2412.16100 null
2024-12-20 To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models Jessica Y. Bo et.al. 2412.15584 null
2024-12-24 Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage Saehyung Lee et.al. 2412.15484 null
2024-12-19 Systematic Evaluation of Long-Context LLMs on Financial Concepts Lavanya Gupta et.al. 2412.15386 null
2024-12-19 Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models Nishtha N. Vaidya et.al. 2412.15309 null
2024-12-19 A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation Bhaskarjit Sarmah et.al. 2412.15298 null
2024-12-19 Confidence in the Reasoning of Large Language Models Yudi Pawitan et.al. 2412.15296 link
2024-12-17 SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation Yuzheng Cai et.al. 2412.15272 link
2024-12-17 A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models Gongbo Zhang et.al. 2412.15271 null
2024-12-15 LLMs for Literature Review: Are we there yet? Shubham Agarwal et.al. 2412.15249 null
2024-12-19 Rethinking Uncertainty Estimation in Natural Language Generation Lukas Aichberger et.al. 2412.15176 null
2024-12-19 Adaptive Pruning for Large Language Models with Structural Importance Awareness Haotian Zheng et.al. 2412.15127 null
2024-12-19 Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability Xiangsen Chen et.al. 2412.15101 null
2024-12-19 RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Junyu Luo et.al. 2412.14922 link
2024-12-19 Dehallucinating Parallel Context Extension for Retrieval-Augmented Generation Zexiong Ma et.al. 2412.14905 null
2024-12-19 Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling Junyi Li et.al. 2412.14860 null
2024-12-19 Query pipeline optimization for cancer patient question answering systems Maolin He et.al. 2412.14751 null
2024-12-19 On Verbalized Confidence Scores for LLMs Daniel Yang et.al. 2412.14737 link
2024-12-25 Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models Zijun Chen et.al. 2412.14660 link
2024-12-19 Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment Teng Xiao et.al. 2412.14516 link
2024-12-19 FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis Abdullah Khan et.al. 2412.14492 link
2024-12-18 LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis Chengpeng Wang et.al. 2412.14399 null
2024-12-18 Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets Simon Thorne et.al. 2412.14062 null
2024-12-18 Discovering maximally consistent distribution of causal tournaments with Large Language Models Federico Baldo et.al. 2412.14019 null
2024-12-27 Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence Jinghan He et.al. 2412.13949 null
2024-12-29 Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection Le Yang et.al. 2412.13817 link
2024-12-18 Meta-Reflection: A Feedback-Free Reflection Learning Framework Yaoke Wang et.al. 2412.13781 null
2024-12-18 Are LLMs Good Literature Review Writers? Evaluating the Literature Review Writing Ability of Large Language Models Xuemei Tang et.al. 2412.13612 null
2024-12-18 Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement Qianyue Wang et.al. 2412.13575 link
2024-12-18 C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System Parker Addison et.al. 2412.13163 null
2024-12-17 Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health Vivek Kumar et.al. 2412.12981 link
2024-12-17 A Survey of Calibration Process for Black-Box LLMs Liangru Xie et.al. 2412.12767 null
2024-12-18 Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models Seungeun Oh et.al. 2412.12687 null
2024-12-17 What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context Zhiyuan Chang et.al. 2412.12632 null
2024-12-17 Jailbreaking? One Step Is Enough! Weixiong Zheng et.al. 2412.12621 null
2024-12-17 When to Speak, When to Abstain: Contrastive Decoding with Abstention Hyuhng Joon Kim et.al. 2412.12527 null
2024-12-12 Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off Eoin M. Kenny et.al. 2412.12169 link
2024-12-11 SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration Yuanhao Shen et.al. 2412.12151 link
2024-12-16 LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts Zhuhao Wang et.al. 2412.12001 link
2024-12-16 RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Xiaoxi Li et.al. 2412.11919 link
2024-12-16 Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments Andrii Nikolaiev et.al. 2412.11908 null
2024-12-16 A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection Simon Hachmeier et.al. 2412.11851 link
2024-12-16 UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models Boyang Xue et.al. 2412.11803 link
2024-12-16 Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods Diana Bar-Or Nirman et.al. 2412.11625 null
2024-12-16 Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes Antonio Carlos Rivera et.al. 2412.11396 null
2024-12-15 CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation Kurando IIDA et.al. 2412.11261 null
2024-12-15 Do Tutors Learn from Equity Training and Can Generative AI Assess It? Danielle R. Thomas et.al. 2412.11255 link
2024-12-15 Task-Oriented Dialog Systems for the Senegalese Wolof Language Derguene Mbaye et.al. 2412.11203 null
2024-12-15 Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning Shengqiong Wu et.al. 2412.11124 null
2024-12-15 Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning Yun Qu et.al. 2412.11120 link
2024-12-15 Empowering LLMs to Understand and Generate Complex Vector Graphics Ximing Xing et.al. 2412.11102 null
2024-12-17 MedG-KRP: Medical Graph Knowledge Representation Probing Gabriel R. Rosenbaum et.al. 2412.10982 null
2024-12-14 Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data Xue Wu et.al. 2412.10654 null
2024-12-13 Benchmarking large language models for materials synthesis: the case of atomic layer deposition Angel Yanguas-Gil et.al. 2412.10477 null
2024-12-13 Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts Hazel Kim et.al. 2412.10246 null
2024-12-13 How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives Timour Ichmoukhamedov et.al. 2412.10220 link
2024-12-13 TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering Bingru Li et.al. 2412.10139 null
2024-12-13 ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL Yang Qin et.al. 2412.10138 link
2024-12-12 DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction Yu Feng et.al. 2412.09572 null
2024-12-12 Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion Ben Liu et.al. 2412.09094 link
2024-12-12 Dial-In LLM: Human-Aligned Dialogue Intent Clustering with LLM-in-the-loop Mengze Hong et.al. 2412.09049 null
2024-12-12 Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning Wenna Lai et.al. 2412.09046 null
2024-12-12 ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty Meizhi Zhong et.al. 2412.09036 null
2024-12-11 Learning to Reason via Self-Iterative Process Feedback for Small Language Models Kaiyuan Chen et.al. 2412.08393 null
2024-12-11 What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models Bangshuo Zhu et.al. 2412.08098 null
2024-12-10 HalluCana: Fixing LLM Hallucination with A Canary Lookahead Tianyi Li et.al. 2412.07965 null
2024-12-10 Forking Paths in Neural Text Generation Eric Bigelow et.al. 2412.07961 null
2024-12-10 Low-Rank Correction for Quantized LLMs Meyer Scetbon et.al. 2412.07902 null
2024-12-08 Language Model as Visual Explainer Xingyi Yang et.al. 2412.07802 null
2024-12-16 Granite Guardian Inkit Padhi et.al. 2412.07724 link
2024-12-10 Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation Qinhong Lin et.al. 2412.07255 null
2024-12-10 Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay Ruiheng Liu et.al. 2412.07246 null
2024-12-10 MAPLE: A Framework for Active Preference Learning Guided by Large Language Models Saaduddin Mahmud et.al. 2412.07207 null
2024-12-10 When Graph Meets Retrieval Augmented Generation for Wireless Networks: A Tutorial and Case Study Yang Xiong et.al. 2412.07189 null
2024-12-10 Post-Training Statistical Calibration for Higher Activation Sparsity Vui Seng Chua et.al. 2412.07174 link
2024-12-11 ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models Jieyu Zhang et.al. 2412.07012 link
2024-12-09 Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study Ehsan Shareghi et.al. 2412.06272 null
2024-12-09 MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization Kangyu Zhu et.al. 2412.06141 link
2024-12-08 Hallucination-aware Optimization for Large Language Model-empowered Communications Yinqiu Liu et.al. 2412.06007 link
2024-12-07 Training-Free Bayesianization for Low-Rank Adapters of Large Language Models Haizhou Shi et.al. 2412.05723 link
2024-12-07 Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent Ziyuan Qin et.al. 2412.05722 null
2024-12-07 A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions Ola Shorinwa et.al. 2412.05563 null
2024-12-07 Ranking of Large Language Model with Nonparametric Prompts Zebin Wang et.al. 2412.05506 null
2024-12-06 Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization Subhojyoti Mukherjee et.al. 2412.05469 null
2024-12-06 A Graph-Based Approach for Conversational AI-Driven Personal Memory Capture and Retrieval in a Real-world Application Savini Kashmira et.al. 2412.05447 null
2024-12-06 HiVeGen – Hierarchical LLM-based Verilog Generation for Scalable Chip Design Jinwei Tang et.al. 2412.05393 null
2024-12-09 Enhancing FKG.in: automating Indian food composition analysis Saransh Kumar Gupta et.al. 2412.05248 null
2024-12-06 100% Hallucination Elimination Using Acurai Michael C. Wood et.al. 2412.05223 link
2024-12-06 Steps are all you need: Rethinking STEM Education with Prompt Engineering Krishnasai Addala et.al. 2412.05023 null
2024-12-06 Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance Xuchan Bao et.al. 2412.04746 null
2024-12-06 LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs Xuan Chen et.al. 2412.04690 null
2024-12-05 HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning Manish Bhattarai et.al. 2412.04661 link
2024-12-10 Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates Li Shi et.al. 2412.04629 null
2024-12-05 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen et.al. 2412.04424 link
2024-12-05 Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation Xuying Li et.al. 2412.04415 null
2024-12-05 Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots Maria Paola Priola et.al. 2412.04235 null
2024-12-05 Reducing Tool Hallucination via Reliability Alignment Hongshen Xu et.al. 2412.04141 null
2024-12-04 A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences Gabriel Lino Garcia et.al. 2412.03531 null
2024-12-04 You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? Dominic Lohr et.al. 2412.03516 null
2024-12-03 Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning Ranganath Krishnan et.al. 2412.02904 null
2024-12-03 An Evolutionary Large Language Model for Hallucination Mitigation Abdennour Boulesnane et.al. 2412.02790 null
2024-12-03 OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Junyuan Zhang et.al. 2412.02592 link
2024-12-03 Semantic Tokens in Retrieval Augmented Generation Joel Suro et.al. 2412.02563 null
2024-12-04 The use of large language models to enhance cancer clinical trial educational materials Mingye Gao et.al. 2412.01955 null
2024-12-04 The Reality of AI and Biorisk Aidan Peppin et.al. 2412.01946 null
2024-12-02 R-Bot: An LLM-based Query Rewrite System Zhaoyan Sun et.al. 2412.01661 null
2024-12-02 Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input Francesco Taioli et.al. 2412.01250 null
2024-12-02 SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages Jia Guo et.al. 2412.01186 link
2024-12-02 SAUP: Situation Awareness Uncertainty Propagation on LLM Agent Qiwei Zhao et.al. 2412.01033 null
2024-12-02 AI Benchmarks and Datasets for LLM Evaluation Todor Ivanov et.al. 2412.01020 null
2024-12-06 Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection Shanu Kumar et.al. 2412.00353 null
2024-11-30 Human-Like Code Quality Evaluation through LLM-based Recursive Semantic Comprehension Fangzhou Xu et.al. 2412.00314 null
2024-11-29 An AI-Driven Data Mesh Architecture Enhancing Decision-Making in Infrastructure Construction and Public Procurement Saurabh Mishra et.al. 2412.00224 null
2024-11-24 Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis Ferhat Ozgur Catak et.al. 2412.00056 null
2024-12-02 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis Alessandro Scirè et.al. 2411.19655 link
2024-11-29 RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation Xianfeng Tan et.al. 2411.19528 null
2024-11-29 Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems Shengming Zhao et.al. 2411.19463 null
2024-11-28 Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs Anirudh Phukan et.al. 2411.19187 null
2024-11-28 Mars-PO: Multi-Agent Reasoning System Preference Optimization Xiaoxuan Lou et.al. 2411.19039 null
2024-11-28 AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models Jisheng Bai et.al. 2411.18953 link
2024-11-27 Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students Tiffany Zhu et.al. 2411.18708 null
2024-11-27 Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track Deepak Gupta et.al. 2411.18069 null
2024-11-26 MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation Sankalp Sinha et.al. 2411.17945 link
2024-11-26 AI2T: Building Trustable AI Tutors by Interactively Teaching a Self-Aware Learning Agent Daniel Weitekamp et.al. 2411.17924 null
2024-11-26 $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Selim Furkan Tekin et.al. 2411.17792 link
2024-11-26 MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation Harsh Singh et.al. 2411.17636 null
2024-11-26 One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models Pengfei Cao et.al. 2411.17401 null
2024-11-26 Can LLMs be Good Graph Judger for Knowledge Graph Construction? Haoyu Huang et.al. 2411.17388 link
2024-11-26 Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning Milena Chadimová et.al. 2411.17304 null
2024-11-26 HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator Fan Yang et.al. 2411.17261 null
2024-11-25 Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries Harshavardhan Battula et.al. 2411.16818 null
2024-11-25 Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models Alireza Amiri-Margavi et.al. 2411.16797 null
2024-11-25 VidHal: Benchmarking Temporal Hallucinations in Vision LLMs Wey Yeh Choong et.al. 2411.16771 link
2024-11-23 Text-to-SQL Calibration: No Need to Ask – Just Rescale Model Probabilities Ashwin Ramachandran et.al. 2411.16742 null
2024-11-23 Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction Mitchell Rosser et.al. 2411.16723 null
2024-11-28 Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation Sanjana Ramprasad et.al. 2411.16638 null
2024-12-03 AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning Amy Xin et.al. 2411.16495 link
2024-11-25 Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models Zhihua Duan et.al. 2411.16189 null
2024-11-24 Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown Lifu Tu et.al. 2411.15993 null
2024-11-23 Ontology-Constrained Generation of Domain-Specific Clinical Summaries Gaya Mehenni et.al. 2411.15666 link
2024-11-23 MC-NEST – Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree Gollam Rabby et.al. 2411.15645 link
2024-11-23 “All that Glitters”: Approaches to Evaluations with Unreliable Model and Human Annotations Michael Hardy et.al. 2411.15634 link
2024-11-22 Sycophancy in Large Language Models: Causes and Mitigations Lars Malmqvist et.al. 2411.15287 null
2024-11-18 Can Open-source LLMs Enhance Data Augmentation for Toxic Detection?: An Experimental Study Zheng Hui et.al. 2411.15175 null
2024-11-22 Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation Colin Diggs et.al. 2411.14971 null
2024-11-22 SwissADT: An Audio Description Translation System for Swiss Languages Lukas Fischer et.al. 2411.14967 null
2024-12-01 G-RAG: Knowledge Expansion in Material Science Radeen Mostafa et.al. 2411.14592 link
2024-11-20 The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz David Noever et.al. 2411.14486 null
2024-11-19 Why you don’t overfit, and don’t need Bayes if you only train for one epoch Laurence Aitchison et.al. 2411.14478 null
2024-11-18 Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning Elizaveta Reganova et.al. 2411.14465 null
2024-11-15 Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models Maryam Shoaeinaeini et.al. 2411.14457 null
2024-11-21 Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance Haozhe Zhao et.al. 2411.14279 null
2024-11-21 Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective Ernests Lavrinovics et.al. 2411.14258 null
2024-11-21 RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks Changyue Jiang et.al. 2411.14110 null
2024-11-21 XAgents: A Framework for Interpretable Rule-Based Multi-Agents Cooperation Hailong Yang et.al. 2411.13932 null
2024-11-21 Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels Jianhao Yan et.al. 2411.13775 link
2024-11-20 Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics Ryan Mok et.al. 2411.13685 link
2024-11-21 Disentangling Memory and Reasoning Ability in Large Language Models Mingyu Jin et.al. 2411.13504 link
2024-11-20 Fact-Level Confidence Calibration and Self-Correction Yige Yuan et.al. 2411.13343 link
2024-11-20 Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding Nabeel Seedat et.al. 2411.13163 null
2024-11-16 A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery Grace Sng et.al. 2411.12759 null
2024-11-19 Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs Malay Kumar et.al. 2411.12685 null
2024-11-15 Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination Haojie Zheng et.al. 2411.12591 link
2024-11-19 Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering Aryan Keluskar et.al. 2411.12395 null
2024-11-28 VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation Ruiyang Zhang et.al. 2411.11919 null
2024-11-07 Deploying Large Language Models With Retrieval Augmented Generation Sonal Prabhune et.al. 2411.11895 link
2024-11-18 Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality Viktoriia Chekalina et.al. 2411.11531 null
2024-11-18 Membership Inference Attack against Long-Context Large Language Models Zixiong Wang et.al. 2411.11424 null
2024-11-29 Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword? Rosalia Tufano et.al. 2411.11401 link
2024-11-17 Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu et.al. 2411.10950 link
2024-11-16 Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation Shuyang Hou et.al. 2411.10753 null
2024-11-16 I’m Spartacus, No, I’m Spartacus: Measuring and Understanding LLM Identity Confusion Kun Li et.al. 2411.10683 null
2024-11-15 Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering Buntaro Hiraki et.al. 2411.10583 null
2024-11-15 On the Privacy Risk of In-context Learning Haonan Duan et.al. 2411.10512 null
2024-11-15 Understanding The Effect Of Temperature On Alignment With Human Opinions Maja Pavlovic et.al. 2411.10080 null
2024-11-15 Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity Zichen Song et.al. 2411.10069 null
2024-11-15 Experiences from Using LLMs for Repository Mining Studies in Empirical Software Engineering Vincenzo de Martino et.al. 2411.09974 null
2024-11-15 AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference Janghwan Lee et.al. 2411.09909 null
2024-11-14 LLM Hallucination Reasoning with Zero-shot Knowledge Test Seongmin Lee et.al. 2411.09689 null
2024-11-14 DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine Jean Seo et.al. 2411.09255 link
2024-11-14 Toward Democratized Generative AI in Next-Generation Mobile Edge Networks Ruichen Zhang et.al. 2411.09148 null
2024-11-13 The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models Daniel P. Jeong et.al. 2411.08870 link
2024-11-04 QCG-Rerank: Chunks Graph Rerank with Query Expansion in Retrieval-Augmented LLMs for Tourism Domain Qikai Wei et.al. 2411.08724 null
2024-11-13 Neural Topic Modeling with Large Language Models in the Loop Xiaohao Yang et.al. 2411.08534 null
2024-11-13 Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach Shangfeng Chen et.al. 2411.08348 null
2024-11-13 Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering Farouq Sammour et.al. 2411.08320 null
2024-11-12 Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data Juanhui Li et.al. 2411.08028 null
2024-11-12 From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents Chuyi Kong et.al. 2411.07965 null
2024-11-13 Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders Xiaofeng Zhu et.al. 2411.07870 null
2024-11-12 Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models Yusen Zhang et.al. 2411.07858 link
2024-11-12 OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework Jiaxi Li et.al. 2411.07711 link
2024-11-12 DecoPrompt: Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises Nan Xu et.al. 2411.07457 link
2024-11-16 Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation Ziwei Liu et.al. 2411.07021 null
2024-11-11 LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help? Rikiya Takehi et.al. 2411.06877 link
2024-11-11 AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant Yujia Zhou et.al. 2411.06805 link
2024-11-11 Anchor Attention, Small Cache: Code Generation with Large Language Models Xiangyu Zhang et.al. 2411.06680 link
2024-11-10 CriticAL: Critic Automation with Language Models Michael Y. Li et.al. 2411.06590 null
2024-11-10 Epistemic Integrity in Large Language Models Bijean Ghafouri et.al. 2411.06528 link
2024-11-10 Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques Daniil Sulimov et.al. 2411.06445 null
2024-11-09 Sufficient Context: A New Lens on Retrieval Augmented Generation Systems Hailey Joren et.al. 2411.06037 null
2024-11-12 Game-theoretic LLM: Agent Workflow for Negotiation Games Wenyue Hua et.al. 2411.05990 link
2024-11-08 FactLens: Benchmarking Fine-Grained Fact Verification Kushan Mitra et.al. 2411.05980 null
2024-11-08 Mitigating Hallucination with ZeroG: An Advanced Knowledge Management Engine Anantha Sharma et.al. 2411.05936 null
2024-11-08 The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent Leon O. H. Kroczek et.al. 2411.05653 null
2024-11-16 Web Archives Metadata Generation with GPT-4o: Challenges and Insights Abigail Yongping Huang et.al. 2411.05409 link
2024-11-08 Seeing Through the Fog: A Cost-Effectiveness Analysis of Hallucination Detection Systems Alexander Thomas et.al. 2411.05270 null
2024-11-07 Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability Yanjun Gao et.al. 2411.04962 null
2024-11-07 Prompt-Guided Internal States for Hallucination Detection of Large Language Models Fujie Zhang et.al. 2411.04847 link
2024-11-07 Self-Calibrated Listwise Reranking with Large Language Models Ruiyang Ren et.al. 2411.04602 null
2024-11-07 LLM-R: A Framework for Domain-Adaptive Maintenance Scheme Generation Combining Hierarchical Agents and RAG Laifa Tao et.al. 2411.04476 null
2024-11-07 Bayesian Calibration of Win Rate Estimation with LLM Evaluators Yicheng Gao et.al. 2411.04424 link
2024-11-06 A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI Melusi Malinga et.al. 2411.04316 null
2024-11-06 Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? Daniel P. Jeong et.al. 2411.04118 link
2024-11-06 Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation Yuhang Liu et.al. 2411.03957 null
2024-11-06 EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning Kiran Purohit et.al. 2411.03877 link
2024-11-06 QUILL: Quotation Generation Enhancement of Large Language Models Jin Xiao et.al. 2411.03675 link
2024-11-05 Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature Viviane Torres da Silva et.al. 2411.03484 null
2024-11-05 VERITAS: A Unified Approach to Reliability Evaluation Rajkumar Ramamurthy et.al. 2411.03300 null
2024-11-05 Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities Ryosuke Takata et.al. 2411.03252 null
2024-11-05 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Jiejun Tan et.al. 2411.02959 link
2024-11-05 Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning Tao Zhang et.al. 2411.02864 null
2024-11-05 V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization Yuxi Xie et.al. 2411.02712 link
2024-11-07 FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees Fan Nie et.al. 2411.02603 null
2024-11-03 Graph-based Confidence Calibration for Large Language Models Yukun Li et.al. 2411.02454 null
2024-11-03 Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models Aliyah R. Hsu et.al. 2411.02448 link
2024-11-04 Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models Guangzhi Xiong et.al. 2411.02382 null
2024-11-04 Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI Ramneet Kaur et.al. 2411.02381 null
2024-11-04 “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization Eldar Kurtic et.al. 2411.02355 null
2024-11-03 Autoformulation of Mathematical Optimization Models Using LLMs Nicolás Astorga et.al. 2411.01679 null
2024-11-03 Ontology Population using LLMs Sanaz Saki Norouzi et.al. 2411.01612 null
2024-11-02 AMREx: AMR for Explainable Fact Verification Chathuri Jayaweera et.al. 2411.01343 null
2024-11-01 Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output Hithesh Sankararaman et.al. 2411.01022 null
2024-10-30 FPE-LLM: Highly Intelligent Time-Series Forecasting and Language Interaction LLM in Energy Systems Zihang Qiu et.al. 2411.00852 null
2024-10-30 GWQ: Gradient-Aware Weight Quantization for Large Language Models Yihua Shao et.al. 2411.00850 null
2024-11-01 CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation Ziting Wang et.al. 2411.00744 null
2024-11-01 Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval Qingfei Zhao et.al. 2411.00689 null
2024-11-01 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation Bohan Lyu et.al. 2411.00412 null
2024-11-01 Beyond Utility: Evaluating LLM as Recommender Chumeng Jiang et.al. 2411.00331 link
2024-11-01 Rationale-Guided Retrieval Augmented Generation for Medical Question Answering Jiwoong Sohn et.al. 2411.00300 link
2024-11-01 RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models Sraavya Sambara et.al. 2411.00299 null
2024-10-29 Problem Categorization Can Help Large Language Models Solve Math Problems Amogh Akella et.al. 2411.00042 null
2024-10-28 A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges Zifeng Wang et.al. 2411.00024 null
2024-11-04 Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models Ognjen et.al. 2411.00023 null
2024-10-31 Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs Liyi Chen et.al. 2410.23875 link
2024-10-31 Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs Shuyang Yu et.al. 2410.23605 null
2024-10-31 Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval Sheryl Hsu et.al. 2410.23214 null
2024-10-30 VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning Jingkun Ma et.al. 2410.22995 null
2024-10-30 Retrieval-Augmented Generation with Estimation of Source Reliability Jeongyeon Hwang et.al. 2410.22954 null
2024-10-30 Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations Leonardo Ranaldi et.al. 2410.22874 null
2024-10-30 Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot Sejin Lee et.al. 2410.22767 link
2024-10-30 Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings Yashvir S. Grewal et.al. 2410.22685 null
2024-10-29 Distinguishing Ignorance from Error in LLM Hallucinations Adi Simhi et.al. 2410.22071 link
2024-10-29 Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications Monica Riedler et.al. 2410.21943 link
2024-10-29 MARCO: Multi-Agent Real-time Chat Orchestration Anubhav Shrimal et.al. 2410.21784 null
2024-10-28 LLM-Forest for Health Tabular Data Imputation Xinrui He et.al. 2410.21520 null
2024-10-28 EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation Shih-Yang Liu et.al. 2410.21271 null
2024-10-28 CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models Meiqi Chen et.al. 2410.21067 null
2024-10-28 Reward Modeling with Weak Supervision for Language Models Ben Hauptvogel et.al. 2410.20869 link
2024-10-28 Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation Jaechang Kim et.al. 2410.20811 null
2024-10-28 Graph-based Uncertainty Metrics for Long-form Language Model Outputs Mingjian Jiang et.al. 2410.20783 link
2024-10-28 Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation Dongryeol Lee et.al. 2410.20774 link
2024-10-28 Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation Mufei Li et.al. 2410.20724 link
2024-10-27 Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains Jiemin Wu et.al. 2410.20340 null
2024-10-26 Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models Mohammad Beigi et.al. 2410.20199 null
2024-10-26 Uncertainty-Penalized Direct Preference Optimization Sam Houliston et.al. 2410.20187 null
2024-10-26 Mask-based Membership Inference Attacks for Retrieval-Augmented Generation Mingrui Liu et.al. 2410.20142 null
2024-10-26 Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics Mikhail Rumiantsau et.al. 2410.20024 null
2024-10-25 FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning Nicole Cho et.al. 2410.19727 null
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702 null
2024-10-30 ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems Ishneet Sukhvinder Singh et.al. 2410.19572 null
2024-11-01 Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization Anthony Cui et.al. 2410.19499 null
2024-10-25 A Debate-Driven Experiment on LLM Hallucinations and Accuracy Ray Li et.al. 2410.19485 null
2024-10-25 Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models Liam Barkley et.al. 2410.19385 null
2024-10-25 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning Yujian Liu et.al. 2410.19290 link
2024-10-24 Prebunking Elections Rumors: Artificial Intelligence Assisted Interventions Increase Confidence in American Elections Mitchell Linegar et.al. 2410.19202 null
2024-10-24 AlignCap: Aligning Speech Emotion Captioning to Human Preferences Ziqi Liang et.al. 2410.19134 null
2024-10-24 LLM Tree Search Dylan Wilson et.al. 2410.19117 null
2024-10-30 Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti et.al. 2410.18952 link
2024-10-24 DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations Aryo Pradipta Gema et.al. 2410.18860 link
2024-10-25 An LLM Agent for Automatic Geospatial Data Analysis Yuxing Chen et.al. 2410.18792 null
2024-10-24 Task Calibration: Calibrating Large Language Models on Inference Tasks Yingjie Li et.al. 2410.18764 null
2024-10-24 LLM-Slice: Dedicated Wireless Network Slicing for Large Language Models Boyi Liu et.al. 2410.18499 null
2024-10-23 AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models Kim Sung-Bin et.al. 2410.18325 link
2024-10-23 Multilingual Hallucination Gaps in Large Language Models Cléa Chataigner et.al. 2410.18270 null
2024-10-23 Beware of Calibration Data for Pruning Large Language Models Yixin Ji et.al. 2410.17711 null
2024-10-23 MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models Guijin Son et.al. 2410.17578 link
2024-10-29 Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination Jerry Huang et.al. 2410.17477 null
2024-10-22 ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs Reza Fayyazi et.al. 2410.17406 link
2024-10-22 DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR Miguel Contreras et.al. 2410.17363 null
2024-10-22 Are Large Language Models Ready for Travel Planning? Ruiping Ren et.al. 2410.17333 null
2024-10-22 Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy Benedict Aaron Tjandra et.al. 2410.17234 null
2024-10-23 GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks Shuyang Hou et.al. 2410.17031 null
2024-10-22 SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine Xiaochen Wang et.al. 2410.17021 null
2024-10-22 Combining Ontological Knowledge and Large Language Model for User-Friendly Service Robots Haru Nakajima et.al. 2410.16804 null
2024-10-21 Large language models enabled multiagent ensemble method for efficient EHR data labeling Jingwei Huang et.al. 2410.16543 null
2024-10-21 Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models’ Reasoning with Formal Logic Jason Chan et.al. 2410.16502 null
2024-10-18 Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs Rui Pu et.al. 2410.16327 null
2024-10-29 Can Knowledge Editing Really Correct Hallucinations? Baixiang Huang et.al. 2410.16251 link
2024-10-21 Analyzing Context Contributions in LLM-based Machine Translation Emmanouil Zaranis et.al. 2410.16246 null
2024-10-23 IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems Yihuan Mao et.al. 2410.16237 null
2024-10-21 Information for Conversation Generation: Proposals Utilising Knowledge Graphs Alex Clay et.al. 2410.16196 null
2024-10-22 Reducing Hallucinations in Vision-Language Models via Latent Space Steering Sheng Liu et.al. 2410.15778 link
2024-10-21 Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding Derong Xu et.al. 2410.15702 null
2024-10-21 Students Rather Than Experts: A New AI For Education Pipeline To Model More Human-Like And Personalised Early Adolescences Yiping Ma et.al. 2410.15701 null
2024-10-21 NetSafe: Exploring the Topological Safety of Multi-agent Networks Miao Yu et.al. 2410.15686 null
2024-10-21 Bayesian Concept Bottleneck Models with LLM Priors Jean Feng et.al. 2410.15555 link
2024-10-20 Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini Chanseo Lee et.al. 2410.15528 null
2024-10-22 Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence Norbert Tihanyi et.al. 2410.15490 null
2024-10-20 Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training Shahrad Mohammadzadeh et.al. 2410.15460 null
2024-10-20 CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges Haitao Li et.al. 2410.15393 link
2024-10-20 A Survey of Hallucination in Large Visual Language Models Wei Lan et.al. 2410.15359 null
2024-10-20 Modality-Fair Preference Optimization for Trustworthy MLLM Alignment Songtao Jiang et.al. 2410.15334 null
2024-10-20 A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice Hsiu-Yuan Huang et.al. 2410.15326 null
2024-10-20 Causality for Large Language Models Anpeng Wu et.al. 2410.15319 link
2024-10-20 MAD: Move AI Decompiler to Improve Transparency and Auditability on Non-Open-Source Blockchain Smart Contract Eason Chen et.al. 2410.15275 null
2024-10-19 Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction Yinhan He et.al. 2410.15165 link
2024-10-19 MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification Yin Li et.al. 2410.15154 link
2024-10-22 Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization Zihui Wu et.al. 2410.15052 link
2024-10-19 “Ghost of the past”: identifying and resolving privacy leakage from LLM’s memory through proactive user interaction Shuning Zhang et.al. 2410.14931 null
2024-10-18 FedSpaLLM: Federated Pruning of Large Language Models Guangji Bai et.al. 2410.14852 null
2024-10-18 Enabling Scalable Evaluation of Bias Patterns in Medical LLMs Hamed Fayyaz et.al. 2410.14763 link
2024-10-22 ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries Kishan Maharaj et.al. 2410.14748 null
2024-10-17 Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors Anthony Sicilia et.al. 2410.14744 null
2024-10-18 Enhancing Large Language Models’ Situated Faithfulness to External Contexts Yukun Huang et.al. 2410.14675 link
2024-10-22 Do LLMs estimate uncertainty well in instruction-following? Juyeon Heo et.al. 2410.14582 link
2024-10-18 Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models James Vo et.al. 2410.14480 null
2024-10-18 Zero-shot Action Localization via the Confidence of Large Vision-Language Models Josiah Aklilu et.al. 2410.14340 null
2024-10-18 Critical Questions Generation: Motivation and Challenges Blanca Calvo Figueras et.al. 2410.14335 link
2024-10-18 ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM Songheng Zhang et.al. 2410.14331 null
2024-10-18 LoGU: Long-form Generation with Uncertainty Expressions Ruihan Yang et.al. 2410.14309 link
2024-10-22 Good Parenting is all you need – Multi-agentic LLM Hallucination Mitigation Ted Kwartler et.al. 2410.14262 null
2024-10-18 Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models Olga Loginova et.al. 2410.14248 null
2024-10-21 Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning Xingyu Tan et.al. 2410.14211 null
2024-10-18 Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment Chenhang Cui et.al. 2410.14148 null
2024-10-17 From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization Catarina G. Belem et.al. 2410.13961 link
2024-10-17 Goal Inference from Open-Ended Dialog Rachel Ma et.al. 2410.13957 null
2024-10-17 RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards Xinze Li et.al. 2410.13509 link
2024-10-17 Advancing Large Language Model Attribution through Self-Improving Lei Huang et.al. 2410.13298 null
2024-10-17 Learning to Route with Confidence Tokens Yu-Neng Chuang et.al. 2410.13284 null
2024-10-17 Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning Minseok Choi et.al. 2410.13274 null
2024-10-17 Atomic Calibration of LLMs in Long-Form Generations Caiqi Zhang et.al. 2410.13246 null
2024-10-17 LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch Caigao Jiang et.al. 2410.13213 link
2024-10-17 FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs Forrest Sheng Bao et.al. 2410.13210 link
2024-10-18 MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback Zonghai Yao et.al. 2410.13191 link
2024-10-21 Utilizing Large Language Models in An Iterative Paradigm with Domain Feedback for Molecule Optimization Khiem Le et.al. 2410.13147 null
2024-10-17 Trust but Verify: Programmatic VLM Evaluation in the Wild Viraj Prabhu et.al. 2410.13121 null
2024-10-17 Learning to Summarize from LLM-generated Feedback Hwanjun Song et.al. 2410.13116 null
2024-10-16 Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models Jie Ren et.al. 2410.13088 null
2024-10-16 Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models Linhao Luo et.al. 2410.13080 link
2024-10-16 PromptExp: Multi-granularity Prompt Explanation of Large Language Models Ximing Dong et.al. 2410.13073 null
2024-10-16 LLM Confidence Evaluation Measures in Zero-Shot CSS Classification David Farr et.al. 2410.13047 null
2024-10-16 When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems Asir Saadat et.al. 2410.13029 null
2024-10-16 LLM Chain Ensembles for Scalable and Accurate Data Annotation David Farr et.al. 2410.13006 link
2024-10-16 REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models Ambuje Gupta et.al. 2410.12890 null
2024-10-16 On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs Herun Wan et.al. 2410.12600 null
2024-10-16 A Claim Decomposition Benchmark for Long-form Answer Verification Zhihao Zhang et.al. 2410.12558 link
2024-10-17 MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration Jinjie Wei et.al. 2410.12532 null
2024-10-16 RosePO: Aligning LLM-based Recommenders with Human Values Jiayi Liao et.al. 2410.12519 null
2024-10-16 KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs Yongqin Xu et.al. 2410.12480 null
2024-10-18 MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models Boyang Xue et.al. 2410.12478 link
2024-10-16 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Jingming Zhuo et.al. 2410.12405 link
2024-10-17 Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs Lei Sun et.al. 2410.12298 null
2024-10-16 Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors Linwei Tao et.al. 2410.12295 null
2024-10-17 LLM-based Cognitive Models of Students with Misconceptions Shashank Sonkar et.al. 2410.12294 null
2024-10-16 An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation Junjie Chen et.al. 2410.12265 null
2024-10-16 CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity Jintao Liu et.al. 2410.12248 link
2024-10-16 On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation Xiaonan Jing et.al. 2410.12222 null
2024-10-16 Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning Huiwen Wu et.al. 2410.12130 null
2024-10-15 Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction Kaiqiao Han et.al. 2410.12040 link
2024-10-15 Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents Bolun Sun et.al. 2410.11906 null
2024-10-15 Zero-shot Model-based Reinforcement Learning using Large Language Models Abdelhakim Benechehab et.al. 2410.11711 link
2024-10-15 Black-box Uncertainty Quantification Method for LLM-as-a-Judge Nico Wagner et.al. 2410.11594 null
2024-10-15 AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data Xinjie Zhao et.al. 2410.11531 null
2024-10-15 ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability Zhongxiang Sun et.al. 2410.11414 null
2024-10-15 LargePiG: Your Large Language Model is Secretly a Pointer Generator Zhongxiang Sun et.al. 2410.11366 null
2024-10-15 Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs Shuo Li et.al. 2410.11302 null
2024-10-15 On the Capacity of Citation Generation by Large Language Models Haosheng Qian et.al. 2410.11217 null
2024-10-14 LLM Unlearning via Loss Adjustment with Only Forget Data Yaxuan Wang et.al. 2410.11143 null
2024-10-14 Can Structured Data Reduce Epistemic Uncertainty? Shriram M S et.al. 2410.11141 null
2024-10-14 Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only Jihan Yao et.al. 2410.11055 link
2024-10-13 3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation Hongxin Ding et.al. 2410.10901 null
2024-10-14 Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance Sachin Goyal et.al. 2410.10796 link
2024-10-16 SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators Rasoul Shafipour et.al. 2410.10714 null
2024-10-14 On Calibration of LLM-based Guard Models for Reliable Content Moderation Hongfu Liu et.al. 2410.10414 link
2024-10-14 Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion Xinping Zhao et.al. 2410.10408 null
2024-10-14 Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search Chenglin Li et.al. 2410.10392 null
2024-10-14 Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning Yongxin Xu et.al. 2410.10360 null
2024-10-14 SkillAggregation: Reference-free LLM-Dependent Aggregation Guangzhi Sun et.al. 2410.10215 null
2024-10-13 A Multi-LLM Orchestration Engine for Personalized, Context-Rich Assistance Sumedh Rasal et.al. 2410.10039 null
2024-10-13 Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code Nan Jiang et.al. 2410.09997 null
2024-10-15 LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models Han Qiu et.al. 2410.09962 link
2024-10-13 Can Large Language Models Generate Geospatial Code? Shuyang Hou et.al. 2410.09738 null
2024-10-13 Taming Overconfidence in LLMs: Reward Calibration in RLHF Jixuan Leng et.al. 2410.09724 link
2024-10-13 Honest AI: Fine-Tuning “Small” Language Models to Say “I Don’t Know”, and Reducing Hallucination in RAG Xinxi Chen et.al. 2410.09699 null
2024-10-13 Integrating Reinforcement Learning and Large Language Models for Crop Production Process Management Optimization and Control through A New Knowledge-Based Deep Learning Paradigm Dong Chen et.al. 2410.09680 null
2024-10-12 FlatQuant: Flatness Matters for LLM Quantization Yuxuan Sun et.al. 2410.09426 link
2024-10-12 LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models Zihan Zhou et.al. 2410.09342 link
2024-10-15 Nudging: Inference-time Alignment via Model Collaboration Yu Fei et.al. 2410.09300 null
2024-10-11 Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective Bo Ni et.al. 2410.08985 null
2024-10-11 NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models Zheng Yi Ho et.al. 2410.08970 null
2024-10-11 Decoding Secret Memorization in Code LLMs Through Token-Level Characterization Yuqing Nie et.al. 2410.08858 null
2024-10-11 Measuring the Inconsistency of Large Language Models in Preferential Ranking Xiutian Zhao et.al. 2410.08851 null
2024-10-11 Unveiling Molecular Secrets: An LLM-Augmented Linear Model for Explainable and Calibratable Molecular Property Prediction Zhuoran Li et.al. 2410.08829 link
2024-10-11 Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation Ruobing Wang et.al. 2410.08821 link
2024-10-11 VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding Houlun Chen et.al. 2410.08593 link
2024-10-11 Humanity in AI: Detecting the Personality of Large Language Models Baohua Zhan et.al. 2410.08545 null
2024-10-11 Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both Abhijnan Nath et.al. 2410.08458 null
2024-10-11 Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness Yu He Ke et.al. 2410.08431 null
2024-10-10 Large Airfoil Models Howon Lee et.al. 2410.08392 null
2024-10-10 Think Beyond Size: Dynamic Prompting for More Effective Reasoning Kamesh R et.al. 2410.08130 null
2024-10-10 A Closer Look at Machine Unlearning for Large Language Models Xiaojian Yuan et.al. 2410.08109 link
2024-10-10 Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering Yuan Sui et.al. 2410.08085 null
2024-10-10 Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses Pranav Senthilkumar et.al. 2410.07826 null
2024-10-10 Mitigating Gender Bias in Code Large Language Models via Model Editing Zhanyue Qin et.al. 2410.07820 null
2024-10-10 Automatic Curriculum Expert Iteration for Reliable LLM Reasoning Zirui Zhao et.al. 2410.07627 link
2024-10-10 No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users Mengxuan Hu et.al. 2410.07589 null
2024-10-10 OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting Xukai Liu et.al. 2410.07549 link
2024-10-10 MKGL: Mastery of a Three-Word Language Lingbing Guo et.al. 2410.07526 null
2024-10-09 Localizing Factual Inconsistencies in Attributable Text Generation Arie Cattan et.al. 2410.07473 link
2024-10-09 Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning Abhinav Bandari et.al. 2410.07461 link
2024-10-09 Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making Manling Li et.al. 2410.07166 link
2024-10-09 Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization Chengtao Jian et.al. 2410.07018 null
2024-10-09 Self-Boosting Large Language Models with Synthetic Preference Data Qingxiu Dong et.al. 2410.06961 null
2024-10-09 AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation Huanxi Liu et.al. 2410.06943 null
2024-10-09 Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning Runchuan Zhu et.al. 2410.06913 link
2024-10-09 Calibrating Verbalized Probabilities for Large Language Models Cheng Wang et.al. 2410.06707 null
2024-10-09 Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack Leo McKee-Reid et.al. 2410.06491 null
2024-10-09 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders David Noever et.al. 2410.06462 null
2024-10-09 Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs Ruijia Niu et.al. 2410.06431 null
2024-10-08 Validation of the Scientific Literature via Chemputation Augmented by Large Language Models Sebastian Pagel et.al. 2410.06384 null
2024-10-08 Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning Ruosen Li et.al. 2410.06304 null
2024-10-08 EVOLvE: Evaluating and Optimizing LLMs For Exploration Allen Nie et.al. 2410.06238 null
2024-10-08 ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution Corban Rivera et.al. 2410.06108 null
2024-10-10 LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs Vincent Emonet et.al. 2410.06062 link
2024-10-08 Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models Bozhou Li et.al. 2410.05802 null
2024-10-08 Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition Zheyang Xiong et.al. 2410.05603 null
2024-10-07 Self-rationalization improves LLM as a fine-grained judge Prapti Trivedi et.al. 2410.05495 null
2024-10-07 ESPACE: Dimensionality Reduction of Activations for Model Compression Charbel Sakr et.al. 2410.05437 null
2024-10-05 PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms Yilong Li et.al. 2410.05315 null
2024-10-07 SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe Yuxin Xiao et.al. 2410.05248 null
2024-10-07 Precise Model Benchmarking with Only a Few Observations Riccardo Fogliato et.al. 2410.05222 null
2024-10-07 Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality Guanyu Zhou et.al. 2410.04780 link
2024-10-07 Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering Zimu Wang et.al. 2410.04752 null
2024-10-06 Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval Pengcheng Jiang et.al. 2410.04585 link
2024-10-06 DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination Xuan Gong et.al. 2410.04514 null
2024-10-05 DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech Dominika Woszczyk et.al. 2410.04188 null
2024-10-04 dZiner: Rational Inverse Design of Materials with AI Agents Mehrad Ansari et.al. 2410.03963 link
2024-10-03 Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge Aparna Elangovan et.al. 2410.03775 link
2024-10-04 Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores Robert E. Blackwell et.al. 2410.03492 null
2024-10-04 Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation Tobias Leemann et.al. 2410.03461 null
2024-10-08 Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs Louis Serrano et.al. 2410.03437 null
2024-10-04 Towards a Benchmark for Large Language Models for Business Process Management Tasks Kiran Busch et.al. 2410.03255 link
2024-10-04 Showing LLM-Generated Code Selectively Based on Confidence of LLMs Jia Li et.al. 2410.03234 null
2024-10-04 ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering Huayang Li et.al. 2410.03227 null
2024-10-04 Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback Kyuyoung Kim et.al. 2410.03145 link
2024-10-04 SAG: Style-Aligned Article Generation via Model Collaboration Chenning Xu et.al. 2410.03137 null
2024-10-10 ARB-LLM: Alternating Refined Binarizations for Large Language Models Zhiteng Li et.al. 2410.03129 link
2024-10-04 UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference Jing Xiong et.al. 2410.03090 null
2024-10-04 Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues Shilin Qu et.al. 2410.03049 null
2024-10-03 Characterizing Context Influence and Hallucination in Summarization James Flemings et.al. 2410.03026 link
2024-10-03 Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review Sungduk Yu et.al. 2410.03019 null
2024-09-30 Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG Chenhao Fang et.al. 2410.02825 null
2024-10-09 CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation Han He et.al. 2410.02748 link
2024-10-03 Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization Lei Xu et.al. 2410.02741 link
2024-10-03 Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization Ryan C. Barron et.al. 2410.02721 null
2024-10-07 LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Hadas Orgad et.al. 2410.02707 link
2024-10-03 Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers Shijie Chen et.al. 2410.02642 null
2024-10-03 Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration Yun Qu et.al. 2410.02511 link
2024-10-03 AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models Junfeng Fang et.al. 2410.02355 link
2024-10-04 How Much Can RAG Help the Reasoning of LLM? Jingyu Liu et.al. 2410.02338 null
2024-10-03 Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference Wei Cheng et.al. 2410.02210 null
2024-10-03 Efficiently Deploying LLMs with Controlled Risk Michael J. Zellinger et.al. 2410.02173 null
2024-10-03 Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments Amogh Mannekote et.al. 2410.02110 link
2024-10-02 DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection Daiki Chiba et.al. 2410.02095 null
2024-10-02 DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning Yebowen Hu et.al. 2410.01772 null
2024-10-02 CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs Kangsheng Wang et.al. 2410.01696 null
2024-10-02 FactAlign: Long-form Factuality Alignment of Large Language Models Chao-Wei Huang et.al. 2410.01691 link
2024-10-02 Intent Detection in the Age of LLMs Gaurav Arora et.al. 2410.01627 null
2024-10-02 Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration Kangxi Wu et.al. 2410.01285 null
2024-10-02 BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation Bryan Li et.al. 2410.01171 link
2024-10-01 Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability Weitong Zhang et.al. 2410.01064 null
2024-10-01 Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown Xingzhou Lou et.al. 2410.00847 null
2024-10-01 Dynamic Planning for LLM-based Graphical User Interface Automation Shaoqing Zhang et.al. 2410.00467 link
2024-10-01 UniAdapt: A Universal Adapter for Knowledge Calibration Tai D. Nguyen et.al. 2410.00454 null
2024-10-01 Are LLMs Aware that Some Questions are not Open-ended? Dongjie Yang et.al. 2410.00423 null
2024-10-01 Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation Bhargav Shandilya et.al. 2410.00387 null
2024-09-30 A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification Marina Ribeiro et.al. 2410.00250 null
2024-09-30 LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation Ziyao Zhang et.al. 2409.20550 link
2024-09-30 Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models Arpan Mukherjee et.al. 2409.20512 null
2024-10-04 VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs Ruotong Liao et.al. 2409.20365 link
2024-09-30 MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants Zeyu Zhang et.al. 2409.20163 link
2024-09-30 Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation Huangyu Dai et.al. 2409.19877 null
2024-09-29 Calibrating Language Models with Adaptive Temperature Scaling Johnathan Xie et.al. 2409.19817 link
2024-09-29 MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models Vibhor Agarwal et.al. 2409.19492 null
2024-09-28 Overriding Safety protections of Open-source Models Sachin Kumar et.al. 2409.19476 link
2024-09-28 SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models Yi Wu et.al. 2409.19471 link
2024-09-28 Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks Chenxi Wang et.al. 2409.19338 null
2024-09-28 DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning Kazuki Matsuda et.al. 2409.19255 null
2024-09-27 Secure Multiparty Generative AI Manil Shrestha et.al. 2409.19120 null
2024-09-27 A Survey on the Honesty of Large Language Models Siheng Li et.al. 2409.18786 link
2024-10-02 Model-based Preference Optimization in Abstractive Summarization without Human Feedback Jaepill Choi et.al. 2409.18618 link
2024-09-26 Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model Chuang Niu et.al. 2409.18319 link
2024-09-26 Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT Natthanaphop Isaradech et.al. 2409.17683 null
2024-09-26 A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models Syed Affan Daimi et.al. 2409.17581 link
2024-09-26 HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection Xuefeng Du et.al. 2409.17504 null
2024-09-25 Post-hoc Reward Calibration: A Case Study on Length Bias Zeyu Huang et.al. 2409.17407 link
2024-09-25 Search for Efficient Large Language Models Xuan Shen et.al. 2409.17372 link
2024-09-20 A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models Satoshi Munakata et.al. 2409.17173 null
2024-09-25 Mitigating the Bias of Large Language Model Evaluation Hongli Zhou et.al. 2409.16788 link
2024-09-25 RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems Yihong Tang et.al. 2409.16727 null
2024-09-25 EventHallusion: Diagnosing Event Hallucinations in Video LLMs Jiacheng Zhang et.al. 2409.16597 link
2024-09-25 Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels Yishu Wei et.al. 2409.16563 null
2024-09-24 MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment Venkata Naren Devarakonda et.al. 2409.16455 null
2024-09-24 Automated test generation to evaluate tool-augmented LLMs as conversational AI agents Samuel Arcadinho et.al. 2409.15934 null
2024-09-24 Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts Sukai Huang et.al. 2409.15915 null
2024-09-24 Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection Xingyu Ma et.al. 2409.15907 null
2024-09-24 XTRUST: On the Multilingual Trustworthiness of Large Language Models Yahan Li et.al. 2409.15762 link
2024-09-23 Parse Trees Guided LLM Prompt Compression Wenhao Mao et.al. 2409.15395 link
2024-09-18 VERA: Validation and Enhancement for Retrieval Augmented systems Nitin Aravind Birur et.al. 2409.15364 null
2024-09-18 Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning Essa Jan et.al. 2409.15361 null
2024-09-27 Reward-Robust RLHF in LLMs Yuzi Yan et.al. 2409.15360 null
2024-09-23 A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Yunfei Xie et.al. 2409.15277 null
2024-09-26 A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models Yixi Wu et.al. 2409.15228 null
2024-09-23 Boosting Healthcare LLMs Through Retrieved Context Jordi Bayarri-Planas et.al. 2409.15127 link
2024-09-23 Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications Sean Kim et.al. 2409.15076 null
2024-09-23 InterMind: A Doctor-Patient-Family Interactive Depression Assessment System Empowered by Large Language Models Zhiyuan Zhou et.al. 2409.14878 null
2024-09-23 Past Meets Present: Creating Historical Analogy with Large Language Models Nianqi Li et.al. 2409.14820 link
2024-09-28 Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method Weichao Zhang et.al. 2409.14781 link
2024-09-23 zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning Zixiang Xian et.al. 2409.14644 null
2024-09-22 Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization Minyi Zhao et.al. 2409.14484 null
2024-09-22 Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses Hung-Ting Su et.al. 2409.14324 link
2024-09-21 OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching Zhangcheng Qiang et.al. 2409.14038 null
2024-09-20 Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology Aidan Gilson et.al. 2409.13902 null
2024-09-20 FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs Bowen Yan et.al. 2409.13612 null
2024-09-20 ChainBuddy: An AI Agent System for Generating LLM Pipelines Jingyue Zhang et.al. 2409.13588 null
2024-09-23 AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit Mohanna Hoveyda et.al. 2409.13447 link
2024-09-20 Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey Sourav Verma et.al. 2409.13385 link
2024-09-20 Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems Andrea Colombo et.al. 2409.13252 null
2024-09-19 Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models Peiyi Zhang et.al. 2409.12739 null
2024-09-19 LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks Malsha Ashani Mahawatta Dona et.al. 2409.12580 null
2024-09-19 Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation Chen Liang et.al. 2409.12411 null
2024-09-19 On the Effectiveness of LLMs for Manual Test Verifications Myron David Lucena Campos Peixoto et.al. 2409.12405 null
2024-09-18 RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models Abhinav Jain et.al. 2409.12294 null
2024-09-18 Finetuning Language Models to Emit Linguistic Expressions of Uncertainty Arslan Chaudhry et.al. 2409.12180 null
2024-09-05 LitFM: A Retrieval Augmented Structure-aware Foundation Model For Citation Graphs Jiasheng Zhang et.al. 2409.12177 null
2024-09-18 Combating Phone Scams with LLM-based Detection: Where Do We Stand? Zitong Shen et.al. 2409.11643 null
2024-09-17 HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection Theo King et.al. 2409.11579 link
2024-09-17 What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts Shuaiyu Chen et.al. 2409.11540 null
2024-09-17 CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration Jiahui Gao et.al. 2409.11365 null
2024-09-17 THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models Mengfei Liang et.al. 2409.11353 link
2024-09-25 Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling Xinyue Fang et.al. 2409.11283 null
2024-09-17 Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models Bishwash Khanal et.al. 2409.11233 null
2024-09-17 Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization Jianing Wang et.al. 2409.11212 link
2024-09-17 A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B Jemin Lee et.al. 2409.11055 link
2024-09-16 Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering Qingru Zhang et.al. 2409.10790 null
2024-09-16 “The Data Says Otherwise”-Towards Automated Fact-checking and Communication of Data Claims Yu Fu et.al. 2409.10713 null
2024-09-17 Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot Bhuvan Sachdeva et.al. 2409.10354 null
2024-09-16 Trustworthiness in Retrieval-Augmented Generation Systems: A Survey Yujia Zhou et.al. 2409.10102 link
2024-09-16 Benchmarking Large Language Model Uncertainty for Prompt Optimization Pei-Fu Guo et.al. 2409.10044 link
2024-09-18 HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making Sumera Anjum et.al. 2409.10011 link
2024-09-23 Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations Abe Bohan Hou et.al. 2409.09947 link
2024-09-16 Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges Vinay Samuel et.al. 2409.09927 link
2024-09-16 SFR-RAG: Towards Contextually Faithful LLMs Xuan-Phi Nguyen et.al. 2409.09916 null
2024-09-15 ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing Suhyeon Yoo et.al. 2409.09760 null
2024-09-15 ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts Che Wang et.al. 2409.09661 link
2024-09-21 Confidence Estimation for LLM-Based Dialogue State Tracking Yi-Jyun Sun et.al. 2409.09629 link
2024-09-14 VernaCopter: Disambiguated Natural-Language-Driven Robot via Formal Specifications Teun van de Laar et.al. 2409.09536 link
2024-09-14 Hacking, The Lazy Way: LLM Augmented Pentesting Dhruva Goyal et.al. 2409.09493 null
2024-09-19 The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection Yi Yang et.al. 2409.09380 null
2024-09-13 Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions Zahra Ashktorab et.al. 2409.08937 null
2024-09-23 When Context Leads but Parametric Memory Follows in Large Language Models Yufei Tao et.al. 2409.08435 link
2024-09-12 Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT Irene Weber et.al. 2409.07732 link
2024-09-11 MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Praveen K Kanithi et.al. 2409.07314 null
2024-09-11 Reranking Laws for Language Generation: A Communication-Theoretic Perspective António Farinhas et.al. 2409.07131 null
2024-09-11 Understanding Knowledge Drift in LLMs through Misinformation Alina Fastowski et.al. 2409.07085 link
2024-09-11 Representation Tuning Christopher M. Ackerman et.al. 2409.06927 link
2024-09-10 Semi-Supervised Reward Modeling via Iterative Self-Training Yifei He et.al. 2409.06903 link
2024-09-10 Geometric-Averaged Preference Optimization for Soft Preference Labels Hiroki Furuta et.al. 2409.06691 null
2024-09-10 Alleviating Hallucinations in Large Language Models with Scepticism Modeling Yetao Wu et.al. 2409.06601 null
2024-09-10 GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Sacha Muller et.al. 2409.06595 link
2024-09-10 Automate Strategy Finding with LLM in Quant investment Zhizhuo Kou et.al. 2409.06289 null
2024-09-14 ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog Yujian Gan et.al. 2409.06097 link
2024-09-09 $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding Shuai Wang et.al. 2409.05923 null
2024-09-09 Benchmarking Chinese Knowledge Rectification in Large Language Models Tianhe Lu et.al. 2409.05806 link
2024-09-09 LLMs Will Always Hallucinate, and We Need to Live With This Sourav Banerjee et.al. 2409.05746 null
2024-09-07 LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs Yongxin Deng et.al. 2409.04744 null
2024-09-03 Here’s Charlie! Realising the Semantic Web vision of Agents in the age of LLMs Jesse Wright et.al. 2409.04465 null
2024-09-06 Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering Larissa Pusch et.al. 2409.04181 null
2024-09-13 Safeguarding AI Agents: Developing and Analyzing Safety Architectures Ishaan Domkundwar et.al. 2409.03793 null
2024-09-06 RAG based Question-Answering for Contextual Response Prediction System Sriram Veturi et.al. 2409.03708 null
2024-09-05 Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration Jeremy Qin et.al. 2409.03225 link
2024-09-05 Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models Jie Ma et.al. 2409.03155 link
2024-09-04 CLUE: Concept-Level Uncertainty Estimation for Large Language Models Yu-Hsiang Wang et.al. 2409.03021 null
2024-09-04 Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models Gabriel Y. Arteaga et.al. 2409.02976 link
2024-09-10 LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Jiajie Zhang et.al. 2409.02897 link
2024-09-04 Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs Ruoyu Wang et.al. 2409.02686 null
2024-09-03 Initial Development and Evaluation of the Creative Artificial Intelligence through Recurring Developments and Determinations (CAIRDD) System Jeremy Straub et.al. 2409.02291 null
2024-09-03 Physical Rule-Guided Convolutional Neural Network Kishor Datta Gupta et.al. 2409.02081 null
2024-09-03 RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer Jiangyi Deng et.al. 2409.02074 null
2024-08-25 Path-Consistency: Prefix Enhancement for Efficient Inference in LLM Jiace Zhu et.al. 2409.01281 null
2024-09-02 Statically Contextualizing Large Language Models with Typed Holes Andrew Blinn et.al. 2409.00921 null
2024-09-01 Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering Derian Boer et.al. 2409.00861 link
2024-09-04 Learning to Ask: When LLMs Meet Unclear Instruction Wenxuan Wang et.al. 2409.00557 null
2024-08-31 Does Alignment Tuning Really Break LLMs’ Internal Confidence? Hongseok Oh et.al. 2409.00352 link
2024-09-08 ProGRes: Prompted Generative Rescoring on ASR n-Best Ada Defne Tur et.al. 2409.00217 link
2024-08-30 LLMs hallucinate graphs too: a structural perspective Erwan Le Merrer et.al. 2409.00159 null
2024-08-29 HoneyComb: A Flexible LLM-Based Agent System for Materials Science Huan Zhang et.al. 2409.00135 null
2024-09-04 Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs Ziyan Cui et.al. 2409.00128 null
2024-09-08 Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning Momin Abbas et.al. 2409.00124 null
2024-09-04 Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation Mohammad Nadeem et.al. 2409.00105 null
2024-08-26 Evaluating ChatGPT on Nuclear Domain-Specific Data Muhammad Anwar et.al. 2409.00090 null
2024-08-26 Watermarking Techniques for Large Language Models: A Survey Yuqing Liang et.al. 2409.00089 null
2024-08-30 Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain Francesca Grasso et.al. 2408.17362 link
2024-08-30 Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling Guangya Wan et.al. 2408.17017 null
2024-09-05 UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches Chao Wang et.al. 2408.16966 null
2024-09-04 Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies Zhiyang Qi et.al. 2408.16586 null
2024-08-29 LoraMap: Harnessing the Power of LoRA Connections Hyeryun Park et.al. 2408.16264 null
2024-08-28 Logic-Enhanced Language Model Agents for Trustworthy Social Simulations Agnieszka Mensfelt et.al. 2408.16081 link
2024-08-28 WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration Yao Zhang et.al. 2408.15978 null
2024-09-07 Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Yuncheng Yang et.al. 2408.15915 link
2024-08-28 Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization Léo Hemamou et.al. 2408.15801 null
2024-08-28 An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation Thai Tang Quoc et.al. 2408.15658 null
2024-08-28 Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation Lujun Gui et.al. 2408.15562 null
2024-08-29 LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation Haichuan Hu et.al. 2408.15533 link
2024-08-28 Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression Haowen Hou et.al. 2408.15491 link
2024-08-27 The Uniqueness of LLaMA3-70B with Per-Channel Quantization: An Empirical Study Minghai Qin et.al. 2408.15301 null
2024-08-27 Can Unconfident LLM Annotations Be Used for Confident Conclusions? Kristina Gligorić et.al. 2408.15204 link
2024-08-27 Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation N. E. Kriman et.al. 2408.15171 null
2024-08-27 Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering Haowei Du et.al. 2408.15037 null
2024-08-28 Language-specific Calibration for Pruning Multilingual Language Models Simon Kurz et.al. 2408.14398 null
2024-08-26 Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders Cong Xu et.al. 2408.14238 link
2024-08-25 CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction Guangya Wan et.al. 2408.13940 null
2024-08-25 Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models Duy Khoa Pham et.al. 2408.13808 null
2024-08-25 Poor-Supervised Evaluation for SuperLLM via Mutual Consistency Peiwen Yuan et.al. 2408.13738 null
2024-08-25 LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models Aoxiao Zhong et.al. 2408.13727 null
2024-08-24 Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models Jinyang Wu et.al. 2408.13533 null
2024-08-27 Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning Hourui Deng et.al. 2408.13184 null
2024-08-23 IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models Zhihao Yu et.al. 2408.13073 link
2024-08-23 Internal and External Knowledge Interactive Refinement Framework for Knowledge-Intensive Question Answering Haowei Du et.al. 2408.12979 null
2024-08-22 SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection Mengya Hu et.al. 2408.12748 link
2024-08-22 Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning Mushui Liu et.al. 2408.12469 null
2024-08-22 A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation Weijia Zhang et.al. 2408.12398 null
2024-09-04 Graph Retrieval Augmented Trustworthiness Reasoning Ying Zhu et.al. 2408.12333 link
2024-08-22 Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models Meiyun Wang et.al. 2408.12326 link
2024-08-22 Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators Dingkang Yang et.al. 2408.12325 link
2024-08-22 MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient Yanzeng Li et.al. 2408.12236 null
2024-08-22 FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation KaShun Shum et.al. 2408.12168 link
2024-08-22 ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM Zhaochen Su et.al. 2408.12076 link
2024-08-21 Understanding Epistemic Language with a Bayesian Theory of Mind Lance Ying et.al. 2408.12022 null
2024-08-21 RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization Jinhu Qi et.al. 2408.12003 null
2024-08-21 Automatic knowledge-graph creation from historical documents: The Chilean dictatorship as a case study Camila Díaz et.al. 2408.11975 null
2024-08-23 Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy Priyanka Mandikal et.al. 2408.11903 link
2024-08-17 How Susceptible are LLMs to Influence in Prompts? Sotiris Anagnostidis et.al. 2408.11865 null
2024-08-21 DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework Zhifei Xie et.al. 2408.11788 null
2024-08-21 EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning Zhihao Li et.al. 2408.11397 null
2024-08-21 First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models Chi Ma et.al. 2408.11393 null
2024-08-21 RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation Xuanwang Zhang et.al. 2408.11381 link
2024-08-20 A Little Confidence Goes a Long Way John Scoville et.al. 2408.11239 null
2024-08-20 Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model Chenhan Yuan et.al. 2408.10764 null
2024-08-20 Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models Artem Vazhentsev et.al. 2408.10692 null
2024-08-20 Analysis of Plan-based Retrieval for Grounded Text Generation Ameya Godbole et.al. 2408.10490 null
2024-08-20 LeCov: Multi-level Testing Criteria for Large Language Models Xuan Xie et.al. 2408.10474 null
2024-08-19 Enhanced document retrieval with topic embeddings Kavsar Huseynova et.al. 2408.10435 null
2024-08-19 LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain Nicholas Pipitone et.al. 2408.10343 link
2024-08-19 Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models Tianyu Zhang et.al. 2408.10124 link
2024-08-19 MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation Ching-Wen Yang et.al. 2408.09865 null
2024-08-19 Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? Shiyu Ni et.al. 2408.09773 null
2024-08-19 A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification Claudio M. V. de Andrade et.al. 2408.09629 null
2024-08-17 TC-RAG:Turing-Complete RAG’s Case study on Medical LLM Systems Xinke Jiang et.al. 2408.09199 link
2024-08-17 Chinese Metaphor Recognition Using a Multi-stage Prompting Large Language Model Jie Wang et.al. 2408.09177 null
2024-08-17 Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making Siyu Wu et.al. 2408.09176 null
2024-08-24 Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection Hsiu-Yuan Huang et.al. 2408.09172 null
2024-08-15 Graph Retrieval-Augmented Generation: A Survey Boci Peng et.al. 2408.08921 link
2024-08-12 Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection Chengyu Song et.al. 2408.08902 null
2024-08-22 Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions Chenming Tang et.al. 2408.08780 null
2024-08-16 Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused Dingwei Chen et.al. 2408.08769 null
2024-08-16 MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector Wenjie Fu et.al. 2408.08661 link
2024-08-16 PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code Ziyou Jiang et.al. 2408.08619 null
2024-08-16 SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models Kaushal Kumar Maurya et.al. 2408.08545 null
2024-08-15 Plan with Code: Comparing approaches for robust NL to DSL generation Nastaran Bassamzadeh et.al. 2408.08335 null
2024-08-14 CodeMirage: Hallucinations in Code Generated by Large Language Models Vibhor Agarwal et.al. 2408.08333 null
2024-08-16 Covert Bias: The Severity of Social Views’ Unalignment in Language Models Towards Implicit and Explicit Opinion Abeer Aldayel et.al. 2408.08212 null
2024-08-15 LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation Bohao Wang et.al. 2408.08208 null
2024-08-15 Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy Shaojun Xu et.al. 2408.08188 null
2024-08-15 Confidence-weighted integration of human and machine judgments for superior decision-making Felipe Yáñez et.al. 2408.08083 link
2024-08-15 LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning Jiajie Li et.al. 2408.07981 null
2024-08-14 Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization Yuxin Jiang et.al. 2408.07471 link
2024-08-13 MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty Yongjin Yang et.al. 2408.06816 link
2024-08-12 A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution Sampath Rajapaksha et.al. 2408.06272 null
2024-08-12 On Effects of Steering Latent Representation for Large Language Model Unlearning Dang Huu-Tien et.al. 2408.06223 link
2024-08-11 Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models Wenbo Zhang et.al. 2408.05873 link
2024-08-10 Can LLMs Replace Manual Annotation of Software Engineering Artifacts? Toufique Ahmed et.al. 2408.05534 null
2024-08-19 SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning Yuze Zhao et.al. 2408.05517 link
2024-08-09 FiST-Financial Style Transfer with Hallucination and Creativity Control Framework Sohini Roychowdhury et.al. 2408.05365 null
2024-08-09 A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning Ye Yuan et.al. 2408.05141 null
2024-08-16 Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models Zikai Xie et.al. 2408.05093 link
2024-08-08 Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews Samantha Chan et.al. 2408.04681 link
2024-08-06 Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD) Avshalom Manevich et.al. 2408.04664 null
2024-08-08 Arctic-TILT. Business Document Understanding at Sub-Billion Scale Łukasz Borchmann et.al. 2408.04632 null
2024-08-08 Learning Fine-Grained Grounded Citations for Attributed Large Language Models Lei Huang et.al. 2408.04568 link
2024-08-20 Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate Yiqun Zhang et.al. 2408.04472 link
2024-08-07 Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology Panagiotis Fytas et.al. 2408.04121 null
2024-08-07 Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks Zizhang Chen et.al. 2408.03732 null
2024-08-19 KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models Ruizhe Zhang et.al. 2408.03297 null
2024-08-05 An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs Dongming Jin et.al. 2408.02450 null
2024-08-05 SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models Shujuan Zhao et.al. 2408.02302 null
2024-08-07 SpecRover: Code Intent Extraction via LLMs Haifeng Ruan et.al. 2408.02232 null
2024-08-05 ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning Yuxuan Wang et.al. 2408.02210 null
2024-08-04 Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process Peng Wang et.al. 2408.02103 null
2024-08-04 Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference Ke Shen et.al. 2408.01935 null
2024-08-03 TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation Xingpeng Sun et.al. 2408.01867 null
2024-08-03 WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization Liwenhan Xie et.al. 2408.01703 null
2024-08-02 Analyzing LLMs’ Capabilities to Establish Implicit User Sentiment of Software Desirability Sherri Weitl-Harms et.al. 2408.01527 null
2024-07-28 Faculty Perspectives on the Potential of RAG in Computer Science Higher Education Sagnik Dakshit et.al. 2408.01462 null
2024-08-18 RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework Kunlun Zhu et.al. 2408.01262 link
2024-08-02 Misinforming LLMs: vulnerabilities, challenges and opportunities Bo Zhou et.al. 2408.01168 null
2024-08-01 Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection Steven Fincke et.al. 2408.00914 null
2024-07-26 ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model Ning Xu et.al. 2408.00804 null
2024-08-01 Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions Guangzhi Xiong et.al. 2408.00727 link
2024-08-01 Future of Artificial Intelligence in Agile Software Development Mariyam Mahboob et.al. 2408.00703 null
2024-07-25 Closing the gap between open-source and commercial large language models for medical evidence summarization Gongbo Zhang et.al. 2408.00588 null
2024-08-01 Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation Xiaoye Qu et.al. 2408.00555 null
2024-08-01 Jailbreaking Text-to-Image Models with LLM-Based Agents Yingkai Dong et.al. 2408.00523 null
2024-08-01 DeliLaw: A Chinese Legal Counselling System Based on a Large Language Model Nan Xie et.al. 2408.00357 null
2024-07-31 Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation Valdemar Danry et.al. 2408.00024 null
2024-07-30 WebApp1K: A Practical Code-Generation Benchmark for Web App Development Yi Cui et.al. 2408.00019 link
2024-07-31 Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs Shi Liu et.al. 2407.21771 null
2024-07-31 Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency Taiji Li et.al. 2407.21443 null
2024-08-09 Cost-Effective Hallucination Detection for LLMs Simon Valentin et.al. 2407.21424 null
2024-07-31 Towards interfacing large language models with ASR systems using confidence measures and prompting Maryam Naderi et.al. 2407.21414 null
2024-07-31 Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs Elan Markowitz et.al. 2407.21358 link
2024-07-30 Accelerating Large Language Model Inference with Self-Supervised Early Exits Florian Valade et.al. 2407.21082 null
2024-07-25 Multi-group Uncertainty Quantification for Long-form Text Generation Terrance Liu et.al. 2407.21057 null
2024-07-24 Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications Cui Long et.al. 2407.21055 null
2024-07-30 Automated Review Generation Method Based on Large Language Models Shican Wu et.al. 2407.20906 link
2024-07-30 How to Measure the Intelligence of Large Language Models? Nils Körber et.al. 2407.20828 null
2024-07-30 Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian Serena Auriemma et.al. 2407.20654 null
2024-07-25 An Efficient Inference Framework for Early-exit Large Language Models Ruijie Miao et.al. 2407.20272 null
2024-07-17 Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies Lachlan McGinness et.al. 2407.20244 null
2024-08-02 Improving Retrieval Augmented Language Model with Self-Reasoning Yuan Xia et.al. 2407.19813 null
2024-07-29 SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Wenxuan Zhang et.al. 2407.19672 link
2024-07-27 Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review Tongyue Shi et.al. 2407.19256 null
2024-07-26 OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation Zilong Wang et.al. 2407.19056 link
2024-08-08 Know Your Limits: A Survey of Abstention in Large Language Models Bingbing Wen et.al. 2407.18418 null
2024-07-25 Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement Jaehun Jung et.al. 2407.18370 null
2024-07-25 The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation Eric Yang et.al. 2407.18044 null
2024-07-24 WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries Wenting Zhao et.al. 2407.17468 null
2024-07-24 ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering Xiuying Chen et.al. 2407.16931 null
2024-07-23 Generation Constraint Scaling Can Mitigate Hallucination Georgios Kollias et.al. 2407.16908 null
2024-07-23 TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class Anishka IIITD et.al. 2407.16805 link
2024-07-23 Shared Imagination: LLMs Hallucinate Alike Yilun Zhou et.al. 2407.16604 null
2024-07-23 Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs Yifan Xia et.al. 2407.16576 null
2024-07-23 Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models Ioana Buhnila et.al. 2407.16565 link
2024-07-25 Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models Kenza Benkirane et.al. 2407.16470 link
2024-07-23 Enhancing LLM’s Cognition via Structurization Kai Liu et.al. 2407.16434 link
2024-07-23 LawLuo: A Chinese Law Firm Co-run by LLM Agents Jingyun Sun et.al. 2407.16252 link
2024-07-23 Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models Nishanth Madhusudhan et.al. 2407.16221 null
2024-07-22 Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned Song Wang et.al. 2407.15441 null
2024-07-22 MAVEN-Fact: A Large-scale Event Factuality Detection Dataset Chunyang Li et.al. 2407.15352 link
2024-07-20 Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models Ze Yu Zhang et.al. 2407.14845 null
2024-07-19 Internal Consistency and Self-Feedback in Large Language Models: A Survey Xun Liang et.al. 2407.14507 link
2024-07-19 Prompted Aspect Key Point Analysis for Quantitative Review Summarization An Quang Tang et.al. 2407.14049 link
2024-07-18 CoDefeater: Using LLMs To Find Defeaters in Assurance Cases Usman Gohar et.al. 2407.13717 link
2024-08-01 Prover-Verifier Games improve legibility of LLM outputs Jan Hendrik Kirchner et.al. 2407.13692 null
2024-07-18 BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models Moon Ye-Bin et.al. 2407.13442 null
2024-07-18 CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis Junying Chen et.al. 2407.13301 link
2024-07-19 AI-Assisted SQL Authoring at Industry Scale Chandra Maddila et.al. 2407.13280 null
2024-07-19 Retrieval-Augmented Generation for Natural Language Processing: A Survey Shangyu Wu et.al. 2407.13193 null
2024-07-18 Translate-and-Revise: Boosting Large Language Models for Constrained Translation Pengcheng Huang et.al. 2407.13164 null
2024-07-17 Halu-J: Critique-Based Hallucination Judge Binjie Wang et.al. 2407.12943 link
2024-08-01 Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild Nicolas Richet et.al. 2407.12927 link
2024-07-17 Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models Alexander R. Pelletier et.al. 2407.12888 link
2024-07-17 LLM-based query paraphrasing for video search Jiaxin Wu et.al. 2407.12341 null
2024-07-17 Optimizing Query Generation for Enhanced Document Retrieval in RAG Hamin Koo et.al. 2407.12325 null
2024-07-11 NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2 Tengfei Xue et.al. 2407.12057 null
2024-07-16 What’s Wrong? Refining Meeting Summaries with LLM Feedback Frederic Kirstein et.al. 2407.11919 null
2024-07-16 LoFTI: Localization and Factuality Transfer to Indian Locales Sona Elza Simon et.al. 2407.11833 link
2024-07-16 A Framework for Evaluating Appropriateness, Trustworthiness, and Safety in Mental Wellness AI Chatbots Lucia Chen et.al. 2407.11387 null
2024-07-19 Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models Qingcheng Zeng et.al. 2407.11282 link
2024-07-15 AstroMLab 1: Who Wins Astronomy Jeopardy!? Yuan-Sen Ting et.al. 2407.11194 null
2024-07-15 Inertial Confinement Fusion Forecasting via LLMs Mingkai Chen et.al. 2407.11098 null
2024-07-15 Leveraging LLM-Respondents for Item Evaluation: a Psychometric Analysis Yunting Liu et.al. 2407.10899 null
2024-07-24 MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs Quang H. Nguyen et.al. 2407.10834 link
2024-07-15 Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval Shengjie Ma et.al. 2407.10805 link
2024-07-15 GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework Hannah Sansford et.al. 2407.10793 null
2024-07-15 CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses Jing Yao et.al. 2407.10725 null
2024-07-15 Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews Lucas Joos et.al. 2407.10652 null
2024-07-14 GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? Barah Fazili et.al. 2407.10245 null
2024-07-14 Look Within, Why LLMs Hallucinate: A Causal Perspective He Li et.al. 2407.10153 null
2024-07-13 Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues KuanChao Chu et.al. 2407.09897 null
2024-07-13 Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks Shengbin Yue et.al. 2407.09893 link
2024-07-13 On Mitigating Code LLM Hallucinations with API Documentation Nihal Jain et.al. 2407.09726 null
2024-07-22 Mitigating Entity-Level Hallucination in Large Language Models Weihang Su et.al. 2407.09417 link
2024-07-12 PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents Saber Zerhoudi et.al. 2407.09394 link
2024-07-12 DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection Sangpil Youm et.al. 2407.09283 null
2024-07-12 The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs Anh Thu Maria Bui et.al. 2407.09152 null
2024-07-12 Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors Nico Daheim et.al. 2407.09136 link
2024-07-12 Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations David N. Palacio et.al. 2407.08983 null
2024-07-15 Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation Biqing Qi et.al. 2407.08940 link
2024-07-12 Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? Yingming Pu et.al. 2407.08922 link
2024-07-11 Evaluating Nuanced Bias in Large Language Model Free Response Answers Jennifer Healey et.al. 2407.08842 null
2024-07-11 Proving that Cryptic Crossword Clue Answers are Correct Martin Andrews et.al. 2407.08824 link
2024-07-11 Uncertainty Estimation of Large Language Models in Medical Question Answering Jiaxin Wu et.al. 2407.08662 null
2024-07-11 $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ Junkang Wu et.al. 2407.08639 link
2024-07-11 On the Universal Truthfulness Hyperplane Inside LLMs Junteng Liu et.al. 2407.08582 link
2024-07-22 Lynx: An Open Source Hallucination Evaluation Model Selvan Sunitha Ravi et.al. 2407.08488 null
2024-07-11 On the attribution of confidence to large language models Geoff Keeling et.al. 2407.08388 null