Updated on 2025.06.28
Table of Contents
HealthLLM
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | “What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets | Akshay Paruchuri et.al. | 2506.21532 | null |
2025-06-26 | MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification | Shadman Sobhan et.al. | 2506.21199 | null |
2025-06-25 | Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation | Md Toufique Hasan et.al. | 2506.20869 | null |
2025-06-25 | An Agentic System for Rare Disease Diagnosis with Traceable Reasoning | Weike Zhao et.al. | 2506.20430 | null |
2025-06-25 | ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset | Yilin Wang et.al. | 2506.20093 | null |
2025-06-24 | DiaLLMs: EHR Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction | Weijieying Ren et.al. | 2506.20059 | null |
2025-06-24 | Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks | Konstantinos Vrettos et.al. | 2506.20009 | null |
2025-06-24 | MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | Yucheng Zhou et.al. | 2506.19835 | null |
2025-06-24 | LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis | Lei Kang et.al. | 2506.19702 | null |
2025-06-26 | Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance | Xuesong Li et.al. | 2506.19683 | null |
2025-06-24 | Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation | Yuanhe Tian et.al. | 2506.19665 | null |
2025-06-24 | Automatic Posology Structuration : What role for LLMs? | Natalia Bobkova et.al. | 2506.19525 | null |
2025-06-24 | EmoStage: A Framework for Accurate Empathetic Response Generation via Perspective-Taking and Phase Recognition | Zhiyang Qi et.al. | 2506.19279 | null |
2025-06-23 | Spiritual-LLM : Gita Inspired Mental Health Therapy In the Era of LLMs | Janak Kapuriya et.al. | 2506.19185 | null |
2025-06-23 | GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model | Amir Faiyaz et.al. | 2506.19164 | null |
2025-06-23 | Enhancing Biosecurity in Tamper-Resistant Large Language Models With Quantum Gradient Descent | Fahmida Hai et.al. | 2506.19086 | null |
2025-06-23 | FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation | Nitish Nagesh et.al. | 2506.19082 | null |
2025-06-23 | RWESummary: A Framework and Test for Choosing Large Language Models to Summarize Real-World Evidence (RWE) Studies | Arjun Mukerji et.al. | 2506.18819 | null |
2025-06-23 | MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis | Yuting Zhang et.al. | 2506.18512 | null |
2025-06-23 | Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics | Yousang Cho et.al. | 2506.18387 | null |
2025-06-23 | Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team | Weilun Yu et.al. | 2506.18348 | null |
2025-06-24 | Co-persona: Leveraging LLMs and Expert Collaboration to Understand User Personas through Social Media Data Analysis | Min Yin et.al. | 2506.18269 | null |
2025-06-22 | Programming Quantum Computers with Large Language Models | Elena R. Henderson et.al. | 2506.18125 | null |
2025-06-22 | Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives | Batool Haider et.al. | 2506.18116 | null |
2025-06-22 | Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster | Fenghe Tang et.al. | 2506.18034 | null |
2025-06-22 | SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model | Guankun Wang et.al. | 2506.17873 | null |
2025-06-21 | Engagement and Disclosures in LLM-Powered Cognitive Behavioral Therapy Exercises: A Factorial Design Comparing the Influence of a Robot vs. Chatbot Over Time | Mina Kian et.al. | 2506.17831 | null |
2025-06-21 | Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs | Catarina Pires et.al. | 2506.17782 | null |
2025-06-21 | Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages | Matthias Schöffel et.al. | 2506.17715 | null |
2025-06-21 | LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning | Haoxuan Che et.al. | 2506.17562 | null |
2025-06-20 | Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation | Hao Guan et.al. | 2506.17442 | null |
2025-06-19 | Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases | Yubeen Bae et.al. | 2506.17336 | link |
2025-06-14 | Automating Financial Statement Audits with Large Language Models | Rushi Wang et.al. | 2506.17282 | null |
2025-06-20 | The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making | Abinitha Gourabathina et.al. | 2506.17163 | null |
2025-06-20 | DistillNote: LLM-based clinical note summaries improve heart failure diagnosis | Heloisa Oss Boll et.al. | 2506.16777 | null |
2025-06-19 | Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System | Jianlin Shi et.al. | 2506.16628 | null |
2025-06-19 | A Scoping Review of Synthetic Data Generation for Biomedical Research and Applications | Hanshu Rao et.al. | 2506.16594 | null |
2025-06-19 | Do We Talk to Robots Like Therapists, and Do They Respond Accordingly? Language Alignment in AI Emotional Support | Sophie Chiang et.al. | 2506.16473 | null |
2025-06-23 | From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | Mohammad Amaan Sayeed et.al. | 2506.15911 | null |
2025-06-18 | Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning | Chunlei Li et.al. | 2506.15477 | null |
2025-06-18 | DeVisE: Behavioral Testing of Medical Large Language Models | Camila Zurdo Tagliabue et.al. | 2506.15339 | null |
2025-06-18 | Universal Laboratory Model: prognosis of abnormal clinical outcomes based on routine tests | Pavel Karpov et.al. | 2506.15330 | link |
2025-06-18 | Cohort Discovery: A Survey on LLM-Assisted Clinical Trial Recruitment | Shrestha Ghosh et.al. | 2506.15301 | null |
2025-06-18 | Mapping Caregiver Needs to AI Chatbot Design: Strengths and Gaps in Mental Health Support for Alzheimer’s and Dementia Caregivers | Jiayue Melissa Shi et.al. | 2506.15047 | null |
2025-06-17 | From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction? | Shadman Sakib et.al. | 2506.14949 | link |
2025-06-17 | A Vision for Geo-Temporal Deep Research Systems: Towards Comprehensive, Transparent, and Reproducible Geo-Temporal Information Synthesis | Bruno Martins et.al. | 2506.14345 | null |
2025-06-17 | Abstract Meaning Representation for Hospital Discharge Summarization | Paul Landes et.al. | 2506.14101 | link |
2025-06-17 | InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking | Rahul Seetharaman et.al. | 2506.14086 | null |
2025-06-13 | Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases | Bonam Mingole et.al. | 2506.13805 | null |
2025-06-13 | Enhancing Clinical Decision Support and EHR Insights through LLMs and the Model Context Protocol: An Open-Source MCP-FHIR Framework | Abul Ehtesham et.al. | 2506.13800 | null |
2025-06-18 | The NordDRG AI Benchmark for Large Language Models | Tapio Pitkäranta et.al. | 2506.13790 | link |
2025-06-16 | Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems | Shang-Chi Tsai et.al. | 2506.13692 | null |
2025-06-16 | Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning | David Bani-Harouni et.al. | 2506.13474 | null |
2025-06-16 | Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models | James Chua et.al. | 2506.13206 | null |
2025-06-16 | Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs | Gyutaek Oh et.al. | 2506.13102 | null |
2025-06-15 | CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation | Naihao Deng et.al. | 2506.12936 | null |
2025-06-15 | Towards Visualizing Electronic Medical Records via Natural Language Queries | Haodi Zhang et.al. | 2506.12837 | null |
2025-06-14 | Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders | Ananya Joshi et.al. | 2506.12576 | link |
2025-06-14 | Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare | Yubin Kim et.al. | 2506.12482 | null |
2025-06-14 | Understanding the Effect of Knowledge Graph Extraction Error on Downstream Graph Analyses: A Case Study on Affiliation Graphs | Erica Cai et.al. | 2506.12367 | null |
2025-06-20 | Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning | Xiaotian Zhang et.al. | 2506.12307 | null |
2025-06-13 | Semantic Scheduling for LLM Inference | Wenyue Hua et.al. | 2506.12204 | link |
2025-06-13 | Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | Chenqian Le et.al. | 2506.12182 | null |
2025-06-10 | Risks & Benefits of LLMs & GenAI for Platform Integrity, Healthcare Diagnostics, Cybersecurity, Privacy & AI Safety: A Comprehensive Survey, Roadmap & Implementation Blueprint | Kiarash Ahi et.al. | 2506.12088 | null |
2025-06-16 | Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making | Claudio Fanconi et.al. | 2506.11887 | null |
2025-06-13 | Converting Annotated Clinical Cases into Structured Case Report Forms | Pietro Ferrazzi et.al. | 2506.11666 | null |
2025-06-24 | RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning | Yu Wang et.al. | 2506.11555 | null |
2025-06-13 | Prioritizing Alignment Paradigms over Task-Specific Model Customization in Time-Series LLMs | Wei Li et.al. | 2506.11512 | link |
2025-06-13 | Predicting Early-Onset Colorectal Cancer with Large Language Models | Wilson Lau et.al. | 2506.11410 | null |
2025-06-13 | Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning | Liying Wang et.al. | 2506.11376 | null |
2025-06-12 | LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic | Weibing Zheng et.al. | 2506.11221 | link |
2025-06-11 | Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning | Ji Young Byun et.al. | 2506.11166 | null |
2025-06-16 | ADAgent: LLM Agent for Alzheimer’s Disease Analysis with Collaborative Coordinator | Wenlong Hou et.al. | 2506.11150 | null |
2025-06-19 | Autonomous Computer Vision Development with Agentic AI | Jin Kim et.al. | 2506.11140 | link |
2025-06-10 | Scalable Medication Extraction and Discontinuation Identification from Electronic Health Records Using Large Language Models | Chong Shao et.al. | 2506.11137 | null |
2025-06-10 | Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK | Carlos Garcia-Fernandez et.al. | 2506.11129 | null |
2025-06-09 | KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Junyu Liu et.al. | 2506.11114 | null |
2025-06-16 | Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation | Uttej Kallakurik et.al. | 2506.11105 | null |
2025-06-12 | The Role of Generative AI in Facilitating Social Interactions: A Scoping Review | T. T. J. E. Arets et.al. | 2506.10927 | null |
2025-06-12 | Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | Alberto Testoni et.al. | 2506.10769 | null |
2025-06-12 | Large Language Models for Detection of Life-Threatening Texts | Thanh Thi Nguyen et.al. | 2506.10687 | null |
2025-06-11 | HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding | Yanzhao Shi et.al. | 2506.09634 | null |
2025-06-11 | ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Yu Sun et.al. | 2506.09513 | link |
2025-06-11 | Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers | Ilanit Sobol et.al. | 2506.09495 | null |
2025-06-11 | “Is This Really a Human Peer Supporter?”: Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions | Kellie Yu Hui Sim et.al. | 2506.09354 | null |
2025-06-10 | The Curious Language Model: Strategic Test-Time Information Acquisition | Michael Cooper et.al. | 2506.09173 | null |
2025-06-10 | CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Yahan Li et.al. | 2506.08584 | link |
2025-06-10 | RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being | Rahatara Ferdousi et.al. | 2506.08486 | null |
2025-06-10 | Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving | Yuxuan Zhou et.al. | 2506.08349 | link |
2025-06-09 | Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | Melissa Estevez et.al. | 2506.08231 | null |
2025-06-09 | Supporting Construction Worker Well-Being with a Multi-Agent Conversational AI System | Fan Yang et.al. | 2506.07997 | null |
2025-06-11 | MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Philip R. Liu et.al. | 2506.07400 | link |
2025-06-08 | Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance | Mohammadreza Chavoshi et.al. | 2506.07273 | null |
2025-06-07 | AI PsyRoom: Artificial Intelligence Platform for Segmented Yearning and Reactive Outcome Optimization Method | Yigui Feng et.al. | 2506.06740 | null |
2025-06-07 | C-PATH: Conversational Patient Assistance and Triage in Healthcare System | Qi Shi et.al. | 2506.06737 | null |
2025-06-07 | DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains | Zhihui Chen et.al. | 2506.06705 | null |
2025-06-07 | Interpretable Depression Detection from Social Media Text Using LLM-Derived Embeddings | Samuel Kim et.al. | 2506.06616 | null |
2025-06-07 | MedCite: Can Language Models Generate Verifiable Text for Medicine? | Xiao Wang et.al. | 2506.06605 | null |
2025-06-14 | RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints | Tan-Hanh Pham et.al. | 2506.06600 | null |
2025-06-02 | Large Language Models for EEG: A Comprehensive Survey and Taxonomy | Naseem Babu et.al. | 2506.06353 | null |
2025-06-01 | Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support | Wu Hao Ran et.al. | 2506.06340 | null |
2025-06-06 | Building Models of Neurological Language | Henry Watkins et.al. | 2506.06208 | null |
2025-06-09 | MIRIAD: Augmenting LLMs with millions of medical query-response pairs | Qinyue Zheng et.al. | 2506.06091 | null |
2025-06-06 | BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions | Saptarshi Sengupta et.al. | 2506.05766 | null |
2025-06-06 | Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning | Yangui Fang et.al. | 2506.05671 | null |
2025-06-06 | Can LLMs Express Personality Across Cultures? Introducing CulturalPersonas for Evaluating Trait Alignment | Priyanka Dey et.al. | 2506.05670 | null |
2025-06-05 | Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction | Zhihao Tang et.al. | 2506.05428 | null |
2025-06-03 | Beyond RAG: Reinforced Reasoning Augmented Generation for Clinical Notes | Lo Pang-Yun Ting et.al. | 2506.05386 | null |
2025-06-05 | Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation | Soumitra Ghosh et.al. | 2506.05073 | null |
2025-06-05 | From EHRs to Patient Pathways: Scalable Modeling of Longitudinal Health Trajectories with LLMs | Chantal Pellegrini et.al. | 2506.04831 | null |
2025-06-05 | A MISMATCHED Benchmark for Scientific Natural Language Inference | Firoz Shaik et.al. | 2506.04603 | null |
2025-06-04 | Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification | Payel Bhattacharjee et.al. | 2506.04450 | null |
2025-06-04 | MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale | Ran Xu et.al. | 2506.04405 | null |
2025-06-04 | AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents | Fengze Liu et.al. | 2506.04293 | null |
2025-06-04 | A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization | Sarvesh Soni et.al. | 2506.04156 | null |
2025-06-13 | LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation | Ming Zhang et.al. | 2506.04078 | link |
2025-06-04 | AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data | Sina Rashidian et.al. | 2506.04032 | null |
2025-06-04 | Trustworthy Medical Question Answering: An Evaluation-Centric Survey | Yinuo Wang et.al. | 2506.03659 | null |
2025-06-04 | VChatter: Exploring Generative Conversational Agents for Simulating Exposure Therapy to Reduce Social Anxiety | Han Zhang et.al. | 2506.03520 | null |
2025-06-05 | Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing | Shigeng Chen et.al. | 2506.03490 | link |
2025-06-04 | Delta-KNN: Improving Demonstration Selection in In-Context Learning for Alzheimer’s Disease Detection | Chuyuan Li et.al. | 2506.03476 | null |
2025-06-03 | Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems | Michael E. Garcia-Alcoser et.al. | 2506.03259 | null |
2025-06-03 | Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis | Richard Armitage et.al. | 2506.02987 | null |
2025-06-03 | FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models | Yan Gao et.al. | 2506.02961 | null |
2025-06-03 | A Smart Multimodal Healthcare Copilot with Powerful LLM Reasoning | Xuejiao Zhao et.al. | 2506.02470 | link |
2025-06-02 | A Dynamic Framework for Semantic Grouping of Common Data Elements (CDE) Using Embeddings and Clustering | Madan Krishnamurthy et.al. | 2506.02160 | null |
2025-06-04 | The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning | Edward Y. Chang et.al. | 2506.02139 | null |
2025-06-02 | Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis | Chi-Jane Chen et.al. | 2506.01918 | null |
2025-06-02 | Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation | Jiaxi Sheng et.al. | 2506.01841 | null |
2025-06-02 | Reasoning-Based Approach with Chain-of-Thought for Alzheimer’s Detection Using Speech and Large Language Models | Chanwoo Park et.al. | 2506.01683 | null |
2025-06-02 | Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents | Manan Suri et.al. | 2506.01344 | null |
2025-06-02 | Evaluating Large Language Models in Crisis Detection: A Real-World Benchmark from Psychological Support Hotlines | Guifeng Deng et.al. | 2506.01329 | null |
2025-06-02 | DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models | Jiancheng Ye et.al. | 2506.01257 | null |
2025-06-02 | MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine | Shufeng Kong et.al. | 2506.01252 | null |
2025-06-01 | Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation | Pimchanok Sukjai et.al. | 2506.01118 | null |
2025-06-03 | Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation | Running Yang et.al. | 2506.00612 | null |
2025-05-31 | AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation | Ming Wang et.al. | 2506.00551 | link |
2025-05-31 | Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization | Suhas BN et.al. | 2506.00448 | null |
2025-05-31 | Adaptive-VP: A Framework for LLM-Based Virtual Patients that Adapts to Trainees’ Dialogue to Facilitate Nurse Communication Training | Keyeun Lee et.al. | 2506.00386 | null |
2025-05-30 | MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform | Hayoung Jung et.al. | 2506.00308 | null |
2025-06-03 | PersianMedQA: Language-Centric Evaluation of LLMs in the Persian Medical Domain | Mohammad Javad Ranjbar Kalahroodi et.al. | 2506.00250 | null |
2025-05-30 | Structuring Radiology Reports: Challenging LLMs with Lightweight Models | Johannes Moll et.al. | 2506.00200 | null |
2025-05-30 | Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models | Fardin Ahsan Sakib et.al. | 2506.00134 | null |
2025-06-04 | ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | Yuchong Li et.al. | 2506.00095 | null |
2025-05-30 | Artificial Empathy: AI based Mental Health | Aditya Naik et.al. | 2506.00081 | null |
2025-05-29 | Evaluating Prompt Engineering Techniques for Accuracy and Confidence Elicitation in Medical LLMs | Nariman Naderi et.al. | 2506.00072 | null |
2025-05-29 | Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports | Sina Amirrajab et.al. | 2506.00060 | null |
2025-05-30 | Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | Juraj Vladika et.al. | 2505.24830 | null |
2025-05-30 | A survey of using EHR as real-world evidence for discovering and validating new drug indications | Nabasmita Talukdar et.al. | 2505.24767 | null |
2025-06-06 | LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews | Christian Jaumann et.al. | 2505.24757 | link |
2025-06-02 | Automated Structured Radiology Report Generation | Jean-Benoit Delbrouck et.al. | 2505.24223 | null |
2025-05-30 | Semi-structured LLM Reasoners Can Be Rigorously Audited | Jixuan Leng et.al. | 2505.24217 | null |
2025-05-30 | Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning | Jiacheng Lin et.al. | 2505.24105 | null |
2025-05-29 | MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | Yuexing Hao et.al. | 2505.24040 | null |
2025-05-28 | Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction | Mai Ali et.al. | 2505.23822 | null |
2025-05-27 | MedOrchestra: A Hybrid Cloud-Local LLM Approach for Clinical Data Interpretation | Sihyeon Lee et.al. | 2505.23806 | null |
2025-06-02 | MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks | Suhana Bedi et.al. | 2505.23802 | null |
2025-06-03 | Can Large Language Models Challenge CNNs in Medical Image Analysis? | Shibbir Ahmed et.al. | 2505.23503 | null |
2025-05-29 | Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons | Krithik Vishwanath et.al. | 2505.23477 | null |
2025-05-29 | Second Opinion Matters: Towards Adaptive Clinical AI via the Consensus of Expert Model Ensemble | Amit Kumthekar et.al. | 2505.23075 | null |
2025-05-29 | CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents | Zhen Xiang et.al. | 2505.23055 | link |
2025-05-29 | Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction | Guangyi Liu et.al. | 2505.23034 | null |
2025-05-29 | Exploring Scaling Laws for EHR Foundation Models | Sheng Zhang et.al. | 2505.22964 | null |
2025-05-29 | LLM-based HSE Compliance Assessment: Benchmark, Performance, and Advancements | Jianwei Wang et.al. | 2505.22959 | link |
2025-05-30 | ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | Nikita Mehandru et.al. | 2505.22919 | null |
2025-05-28 | Can Large Language Models Match the Conclusions of Systematic Reviews? | Christopher Polzak et.al. | 2505.22787 | link |
2025-05-28 | Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation | Yunsoo Kim et.al. | 2505.22222 | null |
2025-05-28 | Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection | Jinming Zhang et.al. | 2505.22029 | link |
2025-05-28 | Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning | Qihuang Zhong et.al. | 2505.21958 | null |
2025-05-28 | Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding | Hanyin Wang et.al. | 2505.21908 | null |
2025-05-29 | Query, Don’t Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries | Josefa Lia Stoisser et.al. | 2505.21801 | null |
2025-05-27 | BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum | Yubin Kim et.al. | 2505.21757 | null |
2025-05-27 | Counterfactual Simulatability of LLM Explanations for Generation Tasks | Marvin Limpijankit et.al. | 2505.21740 | null |
2025-05-24 | Vision Meets Language: A RAG-Augmented YOLOv8 Framework for Coffee Disease Diagnosis and Farmer Assistance | Semanto Mondal et.al. | 2505.21544 | link |
2025-05-27 | Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | Yihan Wang et.al. | 2505.21503 | null |
2025-05-27 | Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery | Lina Zhao et.al. | 2505.21418 | null |
2025-05-27 | Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts | Yuxin Zhu et.al. | 2505.21324 | null |
2025-05-27 | Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings | Gunjan Balde et.al. | 2505.21242 | null |
2025-05-27 | Simulating Ethics: Using LLM Debate Panels to Model Deliberation on Medical Dilemmas | Hazem Zohny et.al. | 2505.21112 | null |
2025-05-27 | MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems | Kai Chen et.al. | 2505.20824 | link |
2025-05-27 | Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients | Hyungjun Park et.al. | 2505.20609 | null |
2025-05-26 | In-context learning capabilities of Large Language Models to detect suicide risk among adolescents from speech transcripts | Filomene Roquefort et.al. | 2505.20491 | null |
2025-05-24 | Do LLMs have a Gender (Entropy) Bias? | Sonal Prabhune et.al. | 2505.20343 | null |
2025-05-23 | PMOA-TTS: Introducing the PubMed Open Access Textual Times Series Corpus | Shahriar Noroozizadeh et.al. | 2505.20323 | null |
2025-05-23 | Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP | Satya Narayana Cheetirala et.al. | 2505.20320 | null |
2025-05-26 | Fine-grained List-wise Alignment for Generative Medication Recommendation | Chenxiao Fan et.al. | 2505.20218 | link |
2025-05-28 | Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations | Mohit Chandra et.al. | 2505.20201 | null |
2025-05-26 | Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare | Natallia Kokash et.al. | 2505.20020 | null |
2025-05-26 | Does Rationale Quality Matter? Enhancing Mental Disorder Detection via Selective Reasoning Distillation | Hoyun Song et.al. | 2505.20014 | link |
2025-05-26 | An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning | Andrew Zamai et.al. | 2505.19954 | null |
2025-05-30 | FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks | Atsunori Moteki et.al. | 2505.19662 | null |
2025-05-26 | DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | Yichun Feng et.al. | 2505.19630 | link |
2025-05-26 | AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Ying Xiao et.al. | 2505.19562 | link |
2025-05-25 | Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning | Shaohao Rui et.al. | 2505.19213 | null |
2025-05-25 | CardioCoT: Hierarchical Reasoning for Multimodal Survival Analysis | Shaohao Rui et.al. | 2505.19195 | null |
2025-05-25 | The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework | Feiran Liu et.al. | 2505.19139 | null |
2025-05-25 | Toward Human Centered Interactive Clinical Question Answering System | Dina Albassam et.al. | 2505.18928 | null |
2025-05-24 | TULUN: Transparent and Adaptable Low-resource Machine Translation | Raphaël Merx et.al. | 2505.18683 | null |
2025-05-24 | DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation | Zhihao Jia et.al. | 2505.18630 | null |
2025-05-24 | CLaDMoP: Learning Transferrable Models from Successful Clinical Trials via LLMs | Yiqing Zhang et.al. | 2505.18527 | null |
2025-05-24 | From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data | Ugur Kursuncu et.al. | 2505.18464 | null |
2025-05-24 | MedScore: Factuality Evaluation of Free-Form Medical Answers | Heyuan Huang et.al. | 2505.18452 | link |
2025-05-23 | Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering | Jessica Tang et.al. | 2505.18412 | link |
2025-05-23 | RedactOR: An LLM-Powered Framework for Automatic Clinical Data De-Identification | Praphul Singh et.al. | 2505.18380 | null |
2025-05-23 | Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | Waleed Reda et.al. | 2505.18350 | null |
2025-05-23 | PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language | Naghmeh Jamali et.al. | 2505.18331 | null |
2025-05-23 | TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification | Jianghao Wu et.al. | 2505.18283 | link |
2025-05-23 | Will Large Language Models Transform Clinical Prediction? | Yusuf Yildiz et.al. | 2505.18246 | null |
2025-05-22 | Towards medical AI misalignment: a preliminary study | Barbara Puccio et.al. | 2505.18212 | null |
2025-05-23 | Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL | Che Liu et.al. | 2505.17952 | null |
2025-05-23 | PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions | Daeun Kyung et.al. | 2505.17818 | null |
2025-05-23 | EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications | Ancheng Xu et.al. | 2505.17654 | null |
2025-05-23 | WiNGPT-3.0 Technical Report | Boqin Zhuang et.al. | 2505.17387 | link |
2025-05-23 | AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing | Yinghui Huang et.al. | 2505.17380 | null |
2025-05-22 | CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports | Xiao Yu Cindy Zhang et.al. | 2505.17265 | null |
2025-05-22 | CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation | Ibrahim Ethem Hamamci et.al. | 2505.17167 | null |
2025-05-22 | Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands | Kristin Qi et.al. | 2505.17137 | null |
2025-05-21 | Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models | Zongru Shao et.al. | 2505.17119 | null |
2025-05-21 | Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation | Kristine Ann M. Carandang et.al. | 2505.17095 | null |
2025-05-18 | Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases | Valentina Carbonari et.al. | 2505.17065 | null |
2025-05-15 | Assessing the Quality of AI-Generated Clinical Notes: A Validated Evaluation of a Large Language Model Scribe | Erin Palm et.al. | 2505.17047 | null |
2025-05-22 | MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning | Suhao Yu et.al. | 2505.16964 | null |
2025-05-22 | A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP | Issey Sukeda et.al. | 2505.16661 | link |
2025-05-22 | Collaboration among Multiple Large Language Models for Medical Question Answering | Kexin Shang et.al. | 2505.16648 | null |
2025-05-22 | No Black Boxes: Interpretable and Interactable Predictive Healthcare with Knowledge-Enhanced Agentic Causal Discovery | Xiaoxue Han et.al. | 2505.16288 | null |
2025-05-22 | Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools | Panagiotis Lymperopoulos et.al. | 2505.16113 | null |
2025-05-23 | Continually Self-Improving Language Models for Bariatric Surgery Question–Answering | Yash Kumar Atri et.al. | 2505.16102 | null |
2025-05-22 | TrialPanorama: Database and Benchmark for Systematic Review and Design of Clinical Trials | Zifeng Wang et.al. | 2505.16097 | null |
2025-05-22 | Multi-modal Integration Analysis of Alzheimer’s Disease Using Large Language Models and Knowledge Graphs | Kanan Kiguchi et.al. | 2505.15747 | null |
2025-05-21 | Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling | He Hu et.al. | 2505.15715 | null |
2025-05-21 | Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs | Lang Gao et.al. | 2505.15524 | null |
2025-05-22 | MentalMAC: Enhancing Large Language Models for Detecting Mental Manipulation via Multi-Task Anti-Curriculum Distillation | Yuansheng Gao et.al. | 2505.15255 | null |
2025-05-21 | AI Solutionism and Digital Self-Tracking with Wearables | Hannah R. Nolasco et.al. | 2505.15162 | null |
2025-05-21 | A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents | Ian Steenstra et.al. | 2505.15108 | null |
2025-05-23 | Diagnosing our datasets: How does my language model learn clinical information? | Furong Jia et.al. | 2505.15024 | null |
2025-05-20 | MedBrowseComp: Benchmarking Medical Deep Research and Computer Use | Shan Chen et.al. | 2505.14963 | null |
2025-05-20 | RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection | Wenjun Hou et.al. | 2505.14318 | link |
2025-05-20 | s3: You Don’t Need That Much Data to Train a Search Agent via RL | Pengcheng Jiang et.al. | 2505.14146 | link |
2025-05-20 | ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data | Xinzhe Zheng et.al. | 2505.14038 | null |
2025-05-20 | Fragments to Facts: Partial-Information Fragment Inference from LLMs | Lucas Rosenblatt et.al. | 2505.13819 | link |
2025-05-19 | VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation | Yubin Kim et.al. | 2505.13577 | null |
2025-05-14 | Source framing triggers systematic evaluation bias in Large Language Models | Federico Germani et.al. | 2505.13488 | null |
2025-05-11 | Evaluating Reasoning LLMs for Suicide Screening with the Columbia-Suicide Severity Rating Scale | Avinash Patil et.al. | 2505.13480 | link |
2025-05-19 | Learnware of Language Models: Specialized Small Language Models Can Do Big | Zhi-Hao Tan et.al. | 2505.13425 | link |
2025-05-19 | Dementia Through Different Eyes: Explainable Modeling of Human and LLM Perceptions for Early Awareness | Lotem Peled-Cohen et.al. | 2505.13418 | null |
2025-05-19 | Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice | Zhi Liu et.al. | 2505.13156 | null |
2025-05-19 | Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning | Xiaoyu Yang et.al. | 2505.13081 | null |
2025-05-19 | GAP: Graph-Assisted Prompts for Dialogue-based Medication Recommendation | Jialun Zhong et.al. | 2505.12888 | null |
2025-05-19 | EpiLLM: Unlocking the Potential of Large Language Models in Epidemic Forecasting | Chenghua Gong et.al. | 2505.12738 | null |
2025-05-18 | ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents | Navid Madani et.al. | 2505.12531 | null |
2025-05-18 | MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks | Yinghao Zhu et.al. | 2505.12371 | link |
2025-05-18 | PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs | Sriram Selvam et.al. | 2505.12238 | link |
2025-05-17 | AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation | Xiechi Zhang et.al. | 2505.11887 | null |
2025-05-21 | LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models | Ryan Chen et.al. | 2505.11772 | null |
2025-05-20 | MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Kevin Wu et.al. | 2505.11733 | link |
2025-05-16 | MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | Xiaomin Li et.al. | 2505.11613 | null |
2025-05-16 | Heart2Mind: Human-Centered Contestable Psychiatric Disorder Diagnosis System using Wearable ECG Monitors | Hung Nguyen et.al. | 2505.11612 | link |
2025-05-16 | Disentangling Reasoning and Knowledge in Medical Large Language Models | Rahul Thapa et.al. | 2505.11462 | null |
2025-05-16 | CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs | Sijia Chen et.al. | 2505.11413 | null |
2025-05-15 | Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI | Agnik Saha et.al. | 2505.10472 | null |
2025-05-20 | AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges | Ranjan Sapkota et.al. | 2505.10468 | null |
2025-05-15 | Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | Yue Guo et.al. | 2505.10409 | null |
2025-05-15 | From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making | Dubai Li et.al. | 2505.10282 | link |
2025-05-15 | The Evolving Landscape of Generative Large Language Models and Traditional Natural Language Processing in Medicine | Rui Yang et.al. | 2505.10261 | null |
2025-05-15 | What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | Xinlan Yan et.al. | 2505.10113 | null |
2025-05-14 | Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models | Aditya Nagori et.al. | 2505.09805 | null |
2025-05-14 | A Multimodal Multi-Agent Framework for Radiology Report Generation | Ziruo Yi et.al. | 2505.09787 | null |
2025-05-16 | Tales of the 2025 Los Angeles Fire: Hotwash for Public Health Concerns in Reddit via LLM-Enhanced Topic Modeling | Sulong Zhou et.al. | 2505.09665 | null |
2025-05-13 | Performance Gains of LLMs With Humans in a World of LLMs Versus Humans | Lucas McCullum et.al. | 2505.08902 | null |
2025-05-13 | NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context | Ben Yao et.al. | 2505.08734 | null |
2025-05-13 | LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs | K M Sajjadul Islam et.al. | 2505.08704 | null |
2025-05-13 | TrialMatchAI: An End-to-End AI-powered Clinical Trial Recommendation System to Streamline Patient-to-Trial Matching | Majd Abdallah et.al. | 2505.08508 | null |
2025-05-13 | Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions | Lata Pangtey et.al. | 2505.08464 | null |
2025-05-13 | Decoding Neighborhood Environments with Large Language Models | Andrew Cart et.al. | 2505.08163 | null |
2025-05-13 | Communication Styles and Reader Preferences of LLM and Human Experts in Explaining Health Information | Jiawei Zhou et.al. | 2505.08143 | null |
2025-05-12 | Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models | Weiyi Wu et.al. | 2505.07968 | null |
2025-05-11 | TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking | Ching Nam Hang et.al. | 2505.07891 | null |
2025-05-07 | A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas | Pranav Narayanan Venkit et.al. | 2505.07850 | null |
2025-05-12 | Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030 | Mouxiao Bian et.al. | 2505.07205 | null |
2025-05-12 | KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification | Hajar Sakai et.al. | 2505.07162 | null |
2025-05-11 | Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI | Chao Ding et.al. | 2505.06912 | null |
2025-05-10 | Utilizing LLMs to Investigate the Disputed Role of Evidence in Electronic Cigarette Health Policy Formation in Australia and the UK | Damian Curran et.al. | 2505.06782 | null |
2025-05-10 | NeuroPal: A Clinically-Informed Multimodal LLM Assistant for Mental Health Combining Sleep Chronotherapy, Cognitive Behavioral Reframing, and Adaptive Phytochemical Intervention | Xiaoran Han et.al. | 2505.06640 | null |
2025-05-10 | Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning | H M Dipu Kabir et.al. | 2505.06592 | link |
2025-05-07 | Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs | Hung Manh Pham et.al. | 2505.06296 | null |
2025-05-15 | Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | Joshua Harris et.al. | 2505.06046 | null |
2025-05-09 | A Day in Their Shoes: Using LLM-Based Perspective-Taking Interactive Fiction to Reduce Stigma Toward Dirty Work | Xiangzhe Yuan et.al. | 2505.05786 | null |
2025-05-09 | Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications | Da Wu et.al. | 2505.05736 | link |
2025-05-08 | Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models | Wei Peng et.al. | 2505.05189 | link |
2025-05-08 | Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization | Ajwad Abrar et.al. | 2505.05070 | null |
2025-05-07 | Retrieval Augmented Generation Evaluation for Health Documents | Mario Ceresa et.al. | 2505.04680 | null |
2025-05-06 | Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction | Paul Landes et.al. | 2505.04655 | null |
2025-05-06 | Advancing Conversational Diagnostic AI with Multimodal Reasoning | Khaled Saab et.al. | 2505.04653 | null |
2025-05-06 | FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights | Chengzhang Yu et.al. | 2505.04649 | null |
2025-05-05 | ChatGPT for automated grading of short answer questions in mechanical ventilation | Tejas Jade et.al. | 2505.04645 | null |
2025-05-07 | The Aloe Family Recipe for Open and Specialized Healthcare LLMs | Dario Garcia-Gasulla et.al. | 2505.04388 | null |
2025-05-07 | Can Language Models Understand Social Behavior in Clinical Conversations? | Manas Satish Bedmutha et.al. | 2505.04152 | null |
2025-05-07 | Natural Language Generation in Healthcare: A Review of Methods and Applications | Mengxian Lyu et.al. | 2505.04073 | null |
2025-04-30 | Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding | Trilok Padhi et.al. | 2505.03788 | null |
2025-04-30 | mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging | Eleftherios Tzanis et.al. | 2505.03785 | link |
2025-04-30 | ALFRED: Ask a Large-language model For Reliable ECG Diagnosis | Jin Yu et.al. | 2505.03781 | null |
2025-05-06 | Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis | Shuang Zhou et.al. | 2505.03467 | null |
2025-05-06 | MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | Mouath Abu Daoud et.al. | 2505.03427 | link |
2025-05-06 | Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation | Mohammad Shoaib Ansari et.al. | 2505.03406 | null |
2025-05-06 | Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback | Shijing Zhu et.al. | 2505.03293 | null |
2025-05-02 | Enhancing ML Model Interpretability: Leveraging Fine-Tuned Large Language Models for Better Understanding of AI | Jonas Bokstaller et.al. | 2505.02859 | null |
2025-05-05 | Enhancing LLMs’ Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry | Junu Kim et.al. | 2505.02722 | link |
2025-05-05 | Structure Causal Models and LLMs Integration in Medical Visual Question Answering | Zibo Xu et.al. | 2505.02703 | null |
2025-05-05 | AI Standardized Patient Improves Human Conversations in Advanced Cancer Care | Kurtis Haut et.al. | 2505.02694 | link |
2025-05-08 | A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | Qianjun Pan et.al. | 2505.02665 | null |
2025-05-08 | Bielik v3 Small: Technical Report | Krzysztof Ociepa et.al. | 2505.02550 | null |
2025-05-05 | Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors | Ryan Louie et.al. | 2505.02428 | null |
2025-05-04 | Generative AI in clinical practice: novel qualitative evidence of risk and responsible use of Google’s NotebookLM | Max Reuter et.al. | 2505.01955 | null |
2025-05-03 | Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | Alexander Davis et.al. | 2505.01711 | null |
2025-05-03 | High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers | Brian Wong et.al. | 2505.01693 | null |
2025-05-02 | Emotions in the Loop: A Survey of Affective Computing for Emotional Support | Karishma Hegde et.al. | 2505.01542 | null |
2025-05-12 | Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications | Jiawei He et.al. | 2505.01146 | null |
2025-05-10 | SSRLBot: Designing and Developing a Large Language Model-based Agent using Socially Shared Regulated Learning | Xiaoshan Huang et.al. | 2505.00945 | null |
2025-05-05 | Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Dung Nguyen et.al. | 2505.00744 | null |
2025-05-01 | Red Teaming Large Language Models for Healthcare | Vahid Balazadeh et.al. | 2505.00467 | null |
2025-05-01 | KoACD: The First Korean Adolescent Dataset for Cognitive Distortion Analysis | JunSeo Kim et.al. | 2505.00367 | null |
2025-05-01 | AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care | Md Asaduzzaman Jabin et.al. | 2505.00275 | link |
2025-04-28 | MDD-LLM: Towards Accuracy Large Language Models for Major Depressive Disorder Diagnosis | Yuyang Sha et.al. | 2505.00032 | null |
2025-04-21 | Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models | Tri Nguyen et.al. | 2505.00010 | null |
2025-04-30 | TRUST: An LLM-Based Dialogue System for Trauma Understanding and Structured Assessments | Sichang Tu et.al. | 2504.21851 | null |
2025-04-30 | TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training | Shengqian Wang et.al. | 2504.21735 | null |
2025-04-30 | XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Marco Arazzi et.al. | 2504.21700 | null |
2025-04-30 | UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation | Linshan Wu et.al. | 2504.21336 | link |
2025-04-30 | Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | Xuanzhao Dong et.al. | 2504.21252 | link |
2025-04-29 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces | Juliana Barbosa et.al. | 2504.21211 | null |
2025-04-29 | Multimodal Large Language Models for Medicine: A Comprehensive Survey | Jiarui Ye et.al. | 2504.21051 | null |
2025-04-23 | Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh | MD Thamed Bin Zaman Chowdhury et.al. | 2504.21025 | null |
2025-04-29 | Jekyll-and-Hyde Tipping Point in an AI’s Behavior | Neil F. Johnson et.al. | 2504.20980 | null |
2025-04-29 | ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Ziqing Fan et.al. | 2504.20930 | link |
2025-04-29 | Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records | Jesus Lovon et.al. | 2504.20547 | null |
2025-04-30 | Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging | Neil K. R. Sehgal et.al. | 2504.20519 | null |
2025-04-29 | “I’ve talked to ChatGPT about my issues last night.”: Examining Mental Health Conversations with Large Language Models through Reddit Analysis | Kyuha Jung et.al. | 2504.20320 | null |
2025-04-28 | OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis | Jinglin He et.al. | 2504.20118 | null |
2025-04-28 | Transforming Evidence Synthesis: A Systematic Review of the Evolution of Automated Meta-Analysis in the Age of AI | Lingbo Li et.al. | 2504.20113 | null |
2025-04-15 | Recommending Clinical Trials for Online Patient Cases using Artificial Intelligence | Joey Chan et.al. | 2504.20059 | null |
2025-04-28 | Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI | Hugo Georgenthum et.al. | 2504.19918 | null |
2025-04-28 | A Tripartite Perspective on GraphRAG | Michael Banf et.al. | 2504.19667 | null |
2025-04-28 | m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training | Meng Xiao et.al. | 2504.19565 | null |
2025-05-01 | BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Jiageng Wu et.al. | 2504.19467 | link |
2025-04-27 | HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer’s Disease | Qiuhui Chen et.al. | 2504.19075 | null |
2025-04-27 | Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models | Anindya Bijoy Das et.al. | 2504.19061 | null |
2025-04-26 | AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression | Dong Whi Yoo et.al. | 2504.18932 | null |
2025-04-26 | Clinical knowledge in LLMs does not translate to human interactions | Andrew M. Bean et.al. | 2504.18919 | link |
2025-04-25 | Proof-of-TBI – Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction | Ross Gore et.al. | 2504.18671 | null |
2025-04-22 | Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes | Guanchen Wu et.al. | 2504.18569 | null |
2025-04-25 | Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers | Jared Moore et.al. | 2504.18412 | link |
2025-04-25 | MAGI: Multi-Agent Guided Interview for Psychiatric Assessment | Guanqun Bi et.al. | 2504.18260 | null |
2025-04-25 | Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization | Wataru Kawakami et.al. | 2504.18080 | null |
2025-05-05 | Optimism, Expectation, or Sarcasm? Multi-Class Hope Speech Detection in Spanish and English | Sabur Butt et.al. | 2504.17974 | null |
2025-04-24 | LLM Agent Swarm for Hypothesis-Driven Drug Discovery | Kevin Song et.al. | 2504.17967 | null |
2025-04-24 | Replay to Remember: Retaining Domain Knowledge in Streaming Language Models | Sneh Pillai et.al. | 2504.17780 | null |
2025-04-24 | Towards a HIPAA Compliant Agentic AI System in Healthcare | Subash Neupane et.al. | 2504.17669 | null |
2025-04-24 | PatientDx: Merging Large Language Models for Protecting Data-Privacy in Healthcare | Jose G. Moreno et.al. | 2504.17360 | null |
2025-04-24 | Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues | Jinfeng Zhou et.al. | 2504.17238 | null |
2025-04-25 | The Rise of Small Language Models in Healthcare: A Comprehensive Survey | Muskan Garg et.al. | 2504.17119 | null |
2025-04-23 | Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study | Andy Li et.al. | 2504.16601 | null |
2025-04-23 | Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition | Zhenguang Zhong et.al. | 2504.16504 | null |
2025-04-23 | ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs | Fahmida Liza Piya et.al. | 2504.16394 | link |
2025-04-22 | Investigating LLMs in Clinical Triage: Promising Capabilities, Persistent Intersectional Biases | Joseph Lee et.al. | 2504.16273 | null |
2025-04-21 | Measuring Interest Group Positions on Legislation: An AI-Driven Analysis of Lobbying Reports | Jiseon Kim et.al. | 2504.15333 | link |
2025-04-21 | Med-CoDE: Medical Critique based Disagreement Evaluation Framework | Mohit Gupta et.al. | 2504.15330 | null |
2025-04-21 | POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications | Chunjing Gan et.al. | 2504.14917 | null |
2025-04-25 | A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs | Yihan Lin et.al. | 2504.14657 | null |
2025-04-20 | HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models | Fan Gao et.al. | 2504.14594 | null |
2025-04-19 | Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations | Katie Matton et.al. | 2504.14150 | link |
2025-04-18 | A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task | Laerdon Kim et.al. | 2504.14066 | null |
2025-04-17 | Deep literature reviews: an application of fine-tuned language models to migration research | Stefano M. Iacus et.al. | 2504.13685 | null |
2025-04-18 | LLM Sensitivity Evaluation Framework for Clinical Diagnosis | Chenwei Yan et.al. | 2504.13475 | null |
2025-04-17 | ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images | Sangwook Kim et.al. | 2504.13023 | null |
2025-04-17 | Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Chenghao Fan et.al. | 2504.12737 | null |
2025-04-16 | Leveraging Large Language Models for Multi-Class and Multi-Label Detection of Drug Use and Overdose Symptoms on Social Media | Muhammad Ahmad et.al. | 2504.12355 | null |
2025-04-15 | A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports | Jing Wang et.al. | 2504.12350 | null |
2025-04-14 | Paging Dr. GPT: Extracting Information from Clinical Notes to Enhance Patient Predictions | David Anderson et.al. | 2504.12338 | null |
2025-04-14 | “It Listens Better Than My Therapist”: Exploring Social Media Discourse on LLMs as Mental Health Tool | Anna-Carolina Haensch et.al. | 2504.12337 | null |
2025-04-13 | QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model | Zongxian Yang et.al. | 2504.12334 | null |
2025-04-12 | Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis | Shahriar Noroozizadeh et.al. | 2504.12326 | null |
2025-04-18 | Selective Attention Federated Learning: Improving Privacy and Efficiency for Clinical Text Classification | Yue Li et.al. | 2504.11793 | null |
2025-04-16 | Large Language Models for Drug Overdose Prediction from Longitudinal Medical Records | Md Sultan Al Nahian et.al. | 2504.11792 | null |
2025-04-16 | Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets | Yongpei Ma et.al. | 2504.11777 | null |
2025-04-15 | Cancer-Myth: Evaluating AI Chatbot on Patient Questions with False Presuppositions | Wang Bill Zhu et.al. | 2504.11373 | link |
2025-04-15 | Learning to Be A Doctor: Searching for Effective Medical Agent Architectures | Yangyang Zhuang et.al. | 2504.11301 | null |
2025-04-26 | Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs | Yingjian Chen et.al. | 2504.10982 | null |
2025-04-15 | Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content | Yilang Peng et.al. | 2504.10878 | null |
2025-04-13 | Federated Learning with Layer Skipping: Efficient Training of Large Language Models for Healthcare NLP | Lihong Zhang et.al. | 2504.10536 | null |
2025-04-08 | Exposure to Content Written by Large Language Models Can Reduce Stigma Around Opioid Use Disorder in Online Communities | Shravika Mittal et.al. | 2504.10501 | null |
2025-04-14 | CliniChat: A Multi-Source Knowledge-Driven Framework for Clinical Interview Dialogue Reconstruction and Evaluation | Jing Chen et.al. | 2504.10418 | null |
2025-04-14 | Performance of Large Language Models in Supporting Medical Diagnosis and Treatment | Diogo Sousa et.al. | 2504.10405 | null |
2025-04-20 | Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families | Shahriar Noroozizadeh et.al. | 2504.10340 | null |
2025-04-20 | Emotional Strain and Frustration in LLM Interactions in Software Engineering | Cristina Martinez Montes et.al. | 2504.10050 | null |
2025-04-19 | EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety | Jiahao Qiu et.al. | 2504.09689 | link |
2025-04-15 | ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model | Wuyang Lan et.al. | 2504.09421 | link |
2025-04-12 | Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries | Koustuv Saha et.al. | 2504.09271 | null |
2025-04-04 | The Lyme Disease Controversy: An AI-Driven Discourse Analysis of a Quarter Century of Academic Debate and Divides | Teo Susnjak et.al. | 2504.08777 | link |
2025-04-01 | Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation | Xiaofan Zhou et.al. | 2504.08768 | null |
2025-04-11 | MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models | Junmo Kim et.al. | 2504.08329 | link |
2025-04-24 | Can Reasoning LLMs Enhance Clinical Document Classification? | Akram Mustafa et.al. | 2504.08040 | null |
2025-04-14 | Psychological Health Knowledge-Enhanced LLM-based Social Network Crisis Intervention Text Transfer Recognition Method | Shurui Wu et.al. | 2504.07983 | null |
2025-04-11 | An LLM-Driven Multi-Agent Debate System for Mendelian Diseases | Xinyang Zhou et.al. | 2504.07881 | null |
2025-04-10 | MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval-Augmented Generation | Yixiang Chen et.al. | 2504.07724 | link |
2025-04-17 | PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | Yang Jiao et.al. | 2504.07717 | null |
2025-04-10 | Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction | Kyoyun Choi et.al. | 2504.07415 | null |
2025-04-09 | Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging | Siyuan Dai et.al. | 2504.07336 | null |
2025-04-09 | A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation | Fatemeh Amrollahi et.al. | 2504.07278 | null |
2025-04-09 | Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis | Umakanta Maharana et.al. | 2504.06581 | link |
2025-04-08 | Human Trust in AI Search: A Large-Scale Experiment | Haiwen Li et.al. | 2504.06435 | null |
2025-04-08 | A Geometric-Aware Perspective and Beyond: Hybrid Quantum-Classical Machine Learning Methods | Azadeh Alavi et.al. | 2504.06328 | null |
2025-04-08 | LExT: Towards Evaluating Trustworthiness of Natural Language Explanations | Krithi Shailya et.al. | 2504.06227 | null |
2025-04-08 | TxGemma: Efficient and Agentic LLMs for Therapeutics | Eric Wang et.al. | 2504.06196 | null |
2025-04-11 | Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups | Rijul Magu et.al. | 2504.06160 | null |
2025-04-08 | How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM | Jirong Zha et.al. | 2504.05786 | null |
2025-04-07 | The challenge of uncertainty quantification of large language models in medicine | Zahra Atf et.al. | 2504.05278 | null |
2025-04-07 | On the Performance of an Explainable Language Model on PubMedQA | Venkat Srinivasan et.al. | 2504.05074 | null |
2025-04-07 | Leveraging Large Language Models for Cost-Effective, Multilingual Depression Detection and Severity Assessment | Longdi Xian et.al. | 2504.04891 | null |
2025-04-07 | Simulating Persuasive Dialogues on Meat Reduction with Generative Agents | Georg Ahnert et.al. | 2504.04872 | link |
2025-04-08 | Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide | Zhijie Duan et.al. | 2504.04346 | null |
2025-04-06 | MedM-VL: What Makes a Good Medical LVLM? | Yiming Shi et.al. | 2504.04323 | link |
2025-04-05 | AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs | Xinyu Mao et.al. | 2504.04193 | link |
2025-04-05 | A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models | Aviv Brokman et.al. | 2504.04083 | null |
2025-04-15 | Do “New Snow Tablets” Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs | Sifan Li et.al. | 2504.03786 | link |
2025-04-02 | Emerging Cyber Attack Risks of Medical AI Agents | Jianing Qiu et.al. | 2504.03759 | null |
2025-04-03 | AD-GPT: Large Language Models in Alzheimer’s Disease | Ziyu Liu et.al. | 2504.03071 | null |
2025-04-03 | Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models | Chengyang He et.al. | 2504.03051 | null |
2025-04-03 | Bias in Large Language Models Across Clinical Applications: A Systematic Review | Thanathip Suenghataiphorn et.al. | 2504.02917 | null |
2025-04-16 | OnRL-RAG: Real-Time Personalized Mental Health Dialogue System | Ahsan Bilal et.al. | 2504.02894 | null |
2025-04-01 | TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet | Luis Felipe et.al. | 2504.02874 | null |
2025-04-01 | Synthesized Annotation Guidelines are Knowledge-Lite Boosters for Clinical Information Extraction | Enshuo Hsu et.al. | 2504.02871 | null |
2025-04-04 | A Survey of Large Language Models in Mental Health Disorder Detection on Social Media | Zhuohan Ge et.al. | 2504.02800 | null |
2025-04-03 | AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology | Xiang Feng et.al. | 2504.02404 | link |
2025-04-02 | Trapped by Expectations: Functional Fixedness in LLM-Enabled Chat Search | Jiqun Liu et.al. | 2504.02074 | null |
2025-04-02 | Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment | Abdelrahaman A. Hassan et.al. | 2504.01767 | null |
2025-04-01 | Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models | Feng Chen et.al. | 2504.01216 | null |
2025-04-01 | Medical large language models are easily distracted | Krithik Vishwanath et.al. | 2504.01201 | link |
2025-04-04 | MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Juncheng Wu et.al. | 2504.00993 | link |
2025-04-01 | InformGen: An AI Copilot for Accurate and Compliant Clinical Research Consent Document Generation | Zifeng Wang et.al. | 2504.00934 | null |
2025-04-01 | m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | Xiaoke Huang et.al. | 2504.00869 | null |
2025-04-01 | IHC-LLMiner: Automated extraction of tumour immunohistochemical profiles from PubMed abstracts using large language models | Yunsoo Kim et.al. | 2504.00748 | null |
2025-03-31 | Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology | Dou Liu et.al. | 2504.00061 | null |
2025-03-31 | Integrating Large Language Models with Human Expertise for Disease Detection in Electronic Health Records | Jie Pan et.al. | 2504.00053 | null |
2025-03-27 | Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1 | Birger Moell et.al. | 2504.00016 | null |
2025-03-31 | A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG | Arshia Kermani et.al. | 2503.24307 | null |
2025-03-31 | IntelliCircos: A Data-driven and AI-powered Authoring Tool for Circos Plots | Mingyang Gu et.al. | 2503.24021 | null |
2025-03-31 | Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection | Mahdi Amiri et.al. | 2503.23873 | null |
2025-03-30 | When LLM Therapists Become Salespeople: Evaluating Large Language Models for Ethical Motivational Interviewing | Haein Kong et.al. | 2503.23566 | null |
2025-04-01 | A Scalable Framework for Evaluating Health Language Models | Neil Mallinar et.al. | 2503.23339 | null |
2025-03-29 | Prediction of 30-day hospital readmission with clinical notes and EHR information | Tiago Almeida et.al. | 2503.23050 | null |
2025-04-03 | Agentic Large Language Models, a survey | Aske Plaat et.al. | 2503.23037 | null |
2025-03-29 | A Retrieval-Augmented Knowledge Mining Method with Deep Thinking LLMs for Biomedical Research and Clinical Support | Yichun Feng et.al. | 2503.23029 | null |
2025-03-29 | Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective | Xinyu Yao et.al. | 2503.22954 | null |
2025-03-28 | MediTools – Medical Education Powered by LLMs | Amr Alshatnawi et.al. | 2503.22769 | link |
2025-03-26 | Susceptibility of Large Language Models to User-Driven Factors in Medical Queries | Kyung Ho Lim et.al. | 2503.22746 | null |
2025-03-25 | LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation | Sarah Martinson et.al. | 2503.22719 | link |
2025-03-28 | Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions | Mohammad Almansoori et.al. | 2503.22678 | null |
2025-04-08 | Modeling Challenging Patient Interactions: LLMs for Medical Communication Training | Anna Bodonhelyi et.al. | 2503.22250 | null |
2025-03-31 | PharmAgents: Building a Virtual Pharma with Large Language Model Agents | Bowen Gao et.al. | 2503.22164 | null |
2025-03-28 | Leveraging LLMs for Predicting Unknown Diagnoses from Clinical Notes | Dina Albassam et.al. | 2503.22092 | null |
2025-03-27 | Socially Constructed Treatment Plans: Analyzing Online Peer Interactions to Understand How Patients Navigate Complex Medical Conditions | Madhusudan Basak et.al. | 2503.21986 | null |
2025-03-27 | RedditESS: A Mental Health Social Support Interaction Dataset – Understanding Effective Social Support to Refine AI-Driven Support Tools | Zeyad Alghamdi et.al. | 2503.21888 | null |
2025-03-27 | Combining Artificial Users and Psychotherapist Assessment to Evaluate Large Language Model-based Mental Health Chatbots | Florian Onur Kuhlmeier et.al. | 2503.21540 | null |
2025-03-27 | Fine-Tuning LLMs on Small Medical Datasets: Text Classification and Normalization Effectiveness on Cardiology reports and Discharge records | Noah Losch et.al. | 2503.21349 | null |
2025-03-26 | Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters | Mahmoud Alwakeel et.al. | 2503.21004 | null |
2025-03-26 | Clean & Clear: Feasibility of Safe LLM Clinical Guidance | Julia Ive et.al. | 2503.20953 | null |
2025-03-26 | TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews | Huimin Xu et.al. | 2503.20666 | null |
2025-03-26 | TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes | Raj Sanjay Shah et.al. | 2503.20648 | null |
2025-03-26 | Low-resource Information Extraction with the European Clinical Case Corpus | Soumitra Ghosh et.al. | 2503.20568 | null |
2025-03-26 | Explainable ICD Coding via Entity Linking | Leonor Barreiros et.al. | 2503.20508 | null |
2025-03-26 | Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Zehui Liao et.al. | 2503.20504 | null |
2025-03-25 | Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder | Changye Li et.al. | 2503.20103 | link |
2025-03-25 | Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications | Ben Rahman et.al. | 2503.19276 | null |
2025-03-25 | PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping | Sarah Pungitore et.al. | 2503.19265 | null |
2025-03-24 | Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages | Tadesse Destaw Belay et.al. | 2503.18253 | null |
2025-03-26 | PG-SAM: Prior-Guided SAM with Medical for Multi-organ Segmentation | Yiheng Zhong et.al. | 2503.18227 | link |
2025-03-23 | AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs | Diwei Wang et.al. | 2503.18141 | null |
2025-03-23 | Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Xu Zheng et.al. | 2503.18016 | null |
2025-03-23 | Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA | Justice Ou et.al. | 2503.17933 | link |
2025-03-23 | MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation | Hsin-Ling Hsu et.al. | 2503.17900 | null |
2025-03-22 | Satisfactory Medical Consultation based on Terminology-Enhanced Information Retrieval and Emotional In-Context Learning | Kaiwen Zuo et.al. | 2503.17876 | null |
2025-03-22 | MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation | Xiaodan Zhang et.al. | 2503.17784 | link |
2025-03-22 | GPBench: A Comprehensive and Fine-Grained Benchmark for Evaluating Large Language Models as General Practitioners | Zheqing Li et.al. | 2503.17599 | null |
2025-03-21 | Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent | Humza Nusrat et.al. | 2503.17553 | null |
2025-03-21 | An LLM-Powered Clinical Calculator Chatbot Backed by Verifiable Clinical Calculators and their Metadata | Niranjan Kumar et.al. | 2503.17550 | null |
2025-03-21 | Reimagining Support: Exploring Autistic Individuals’ Visions for AI in Coping with Negative Self-Talk | Buse Carik et.al. | 2503.17504 | null |
2025-03-21 | Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP | Veysel Kocaman et.al. | 2503.17425 | null |
2025-03-21 | Understanding Social Support Needs in Questions: A Hybrid Approach Integrating Semi-Supervised Learning and LLM-based Data Augmentation | Junwei Kuang et.al. | 2503.17421 | null |
2025-03-21 | Automating Adjudication of Cardiovascular Events Using Large Language Models | Sonish Sivarajkumar et.al. | 2503.17222 | null |
2025-03-20 | Automated Harmfulness Testing for Code Large Language Models | Honghao Tan et.al. | 2503.16740 | null |
2025-03-18 | From Patient Consultations to Graphs: Leveraging LLMs for Patient Journey Knowledge Graph Construction | Hassan S. Al Khatib et.al. | 2503.16533 | null |
2025-03-18 | Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine | Chengfeng Dou et.al. | 2503.16530 | null |
2025-03-20 | OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Long Yuan et.al. | 2503.16326 | null |
2025-03-21 | Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1 | Peiran Gu et.al. | 2503.16304 | null |
2025-03-21 | MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering | Feiyang Li et.al. | 2503.16131 | null |
2025-03-20 | BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | Zenghui Yuan et.al. | 2503.16023 | null |
2025-03-20 | Towards Automatic Continual Learning: A Self-Adaptive Framework for Continual Instruction Tuning | Peiyi Lin et.al. | 2503.15924 | null |
2025-03-20 | DeepPsy-Agent: A Stage-Aware and Deep-Thinking Emotional Support Agent System | Kai Chen et.al. | 2503.15876 | null |
2025-03-19 | Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation | Hisashi Johno et.al. | 2503.15664 | null |
2025-03-27 | Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems | Yuelyu Ji et.al. | 2503.15454 | null |
2025-03-19 | Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data | Anatole Callies et.al. | 2503.15374 | link |
2025-03-19 | Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks | Yuting Guo et.al. | 2503.15169 | null |
2025-03-28 | Envisioning an AI-Enhanced Mental Health Ecosystem | Kellie Yu Hui Sim et.al. | 2503.14883 | null |
2025-03-18 | Generating Medically-Informed Explanations for Depression Detection using LLMs | Xiangyong Chen et.al. | 2503.14671 | null |
2025-03-18 | MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation | Kai Chen et.al. | 2503.13856 | null |
2025-03-14 | RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration | Hong Qing Yu et.al. | 2503.13514 | null |
2025-03-13 | It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education | Shrutika Singh et.al. | 2503.13508 | null |
2025-03-17 | Reliable and Efficient Amortized Model-based Evaluation | Sang Truong et.al. | 2503.13335 | null |
2025-03-24 | LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation | Xiaodi Li et.al. | 2503.13281 | null |
2025-03-17 | MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | Zhen Chen et.al. | 2503.13205 | null |
2025-03-16 | From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations | Sarvesh Baskar et.al. | 2503.12556 | null |
2025-03-15 | Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes | Da Wu et.al. | 2503.12286 | null |
2025-03-15 | TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation | Mayank Kumar et.al. | 2503.12217 | null |
2025-03-20 | Applications of Large Language Model Reasoning in Feature Generation | Dharani Chandra et.al. | 2503.11989 | null |
2025-03-14 | Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages | Jiyeong Kim et.al. | 2503.11384 | null |
2025-03-14 | TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools | Shanghua Gao et.al. | 2503.10970 | link |
2025-03-12 | CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries | Zhiyuan Wang et.al. | 2503.10707 | null |
2025-03-12 | Medical Large Language Model Benchmarks Should Prioritize Construct Validity | Ahmed Alaa et.al. | 2503.10694 | null |
2025-03-13 | Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models | Afrar Jahin et.al. | 2503.10573 | null |
2025-03-13 | LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions | Gaurav Kumar Gupta et.al. | 2503.10486 | null |
2025-03-13 | Cognitive-Mental-LLM: Leveraging Reasoning in Large Language Models for Mental Health Prediction via Online Text | Avinash Patil et.al. | 2503.10095 | link |
2025-03-12 | Review GIDE – Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models | Timothy Laurence et.al. | 2503.09743 | null |
2025-03-12 | LLM-PS: Empowering Large Language Models for Time Series Forecasting with Temporal Patterns and Semantics | Jialiang Tang et.al. | 2503.09656 | null |
2025-03-16 | Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy | Abe Bohan Hou et.al. | 2503.09639 | null |
2025-03-12 | RetSTA: An LLM-Based Approach for Standardizing Clinical Fundus Image Reports | Jiushen Cai et.al. | 2503.09358 | null |
2025-03-12 | A Survey on Enhancing Causal Reasoning Ability of Large Language Models | Xin Li et.al. | 2503.09326 | null |
2025-03-12 | VaxGuard: A Multi-Generator, Multi-Type, and Multi-Role Dataset for Detecting LLM-Generated Vaccine Misinformation | Syed Talal Ahmad et.al. | 2503.09103 | null |
2025-03-12 | Teaching LLMs How to Learn with Contextual Fine-Tuning | Younwoo Choi et.al. | 2503.09032 | null |
2025-03-11 | Towards Scalable and Cross-Lingual Specialist Language Models for Oncology | Morteza Rohanian et.al. | 2503.08323 | null |
2025-03-10 | Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan | Matthias Schöffel et.al. | 2503.07827 | null |
2025-03-20 | MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Xiangru Tang et.al. | 2503.07459 | link |
2025-03-10 | Anatomy-Aware Conditional Image-Text Retrieval | Meng Zheng et.al. | 2503.07456 | null |
2025-03-10 | Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment | Xing Xie et.al. | 2503.07334 | link |
2025-03-10 | Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | Luyi Jiang et.al. | 2503.07306 | null |
2025-03-10 | A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images | Xiaoyi Liang et.al. | 2503.07094 | null |
2025-03-10 | TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine | Tianai Huang et.al. | 2503.07041 | null |
2025-03-10 | Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation | Zhi Qin et.al. | 2503.07032 | null |
2025-03-09 | Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia | Sabeen Ahmed et.al. | 2503.06797 | null |
2025-03-09 | Why Pre-trained Models Fail: Feature Entanglement in Multi-modal Depression Detection | Xiangyu Zhang et.al. | 2503.06620 | null |
2025-03-09 | ExKG-LLM: Leveraging Large Language Models for Automated Expansion of Cognitive Neuroscience Knowledge Graphs | Ali Sarabadani et.al. | 2503.06479 | null |
2025-03-09 | AXAI-CDSS: An Affective Explainable AI-Driven Clinical Decision Support System for Cannabis Use | Tongze Zhang et.al. | 2503.06463 | null |
2025-03-08 | CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Oriel Perets et.al. | 2503.06204 | link |
2025-03-08 | Towards Conversational AI for Disease Management | Anil Palepu et.al. | 2503.06074 | null |
2025-03-01 | MedSimAI: Simulation and Formative Feedback Generation to Enhance Deliberate Practice in Medical Education | Yann Hicke et.al. | 2503.05793 | null |
2025-03-07 | Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering | Yusong Ke et.al. | 2503.05505 | null |
2025-03-07 | GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation | Zhenxuan Zhang et.al. | 2503.05347 | link |
2025-03-06 | HILGEN: Hierarchically-Informed Data Generation for Biomedical NER Using Knowledgebases and Large Language Models | Yao Ge et.al. | 2503.04930 | null |
2025-03-10 | Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Pengcheng Qiu et.al. | 2503.04691 | null |
2025-03-06 | Large Language Models in Bioinformatics: A Survey | Zhenyu Wang et.al. | 2503.04490 | null |
2025-03-06 | TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records | Hejie Cui et.al. | 2503.04176 | null |
2025-03-06 | KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease | Yongchao Long et.al. | 2503.04153 | link |
2025-03-06 | Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting | Jiyue Jiang et.al. | 2503.04013 | null |
2025-03-06 | RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models | Wenhui Zhu et.al. | 2503.03987 | null |
2025-03-05 | RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction | Fenglin Liu et.al. | 2503.03802 | link |
2025-03-05 | Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks | Zihao Zhao et.al. | 2503.03687 | link |
2025-03-05 | Psy-Copilot: Visual Chain of Thought for Counseling | Keqi Chen et.al. | 2503.03645 | null |
2025-03-05 | Psy-Insight: Explainable Multi-turn Bilingual Dataset for Mental Health Counseling | Keqi Chen et.al. | 2503.03607 | null |
2025-03-05 | Structured Outputs Enable General-Purpose LLMs to be Medical Experts | Guangfu Guo et.al. | 2503.03194 | null |
2025-03-04 | From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance | Jiacheng Tang et.al. | 2503.02760 | null |
2025-03-04 | The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats | William Brach et.al. | 2503.02650 | link |
2025-03-04 | BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Zhengyang Ji et.al. | 2503.02476 | link |
2025-03-04 | MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics | Haoan Jin et.al. | 2503.02374 | null |
2025-03-06 | EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports | Lama Moukheiber et.al. | 2503.02365 | null |
2025-03-04 | Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm | Zhuo Li et.al. | 2503.02359 | null |
2025-03-03 | Biomedical Foundation Model: A Survey | Xiangrui Liu et.al. | 2503.02104 | null |
2025-02-28 | PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice | Ruoxi Wang et.al. | 2503.01903 | null |
2025-03-03 | SHADE-AD: An LLM-Based Framework for Synthesizing Activity Data of Alzheimer’s Patients | Heming Fu et.al. | 2503.01768 | null |
2025-03-03 | Designing VR Simulation System for Clinical Communication Training with LLMs-Based Embodied Conversational Agents | Xiuqi Tommy Zhu et.al. | 2503.01767 | null |
2025-03-03 | Distilled Prompt Learning for Incomplete Multimodal Survival Prediction | Yingxue Xu et.al. | 2503.01653 | null |
2025-03-03 | Leveraging LLMs for Mental Health: Detection and Recommendations from Social Discussions | Vaishali Aggarwal et.al. | 2503.01442 | null |
2025-03-03 | Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation | Linhai Zhang et.al. | 2503.01315 | null |
2025-03-03 | Cancer Type, Stage and Prognosis Assessment from Pathology Reports using LLMs | Rachit Saluja et.al. | 2503.01194 | link |
2025-03-03 | Large Language Models for Healthcare Text Classification: A Systematic Review | Hajar Sakai et.al. | 2503.01159 | null |
2025-03-02 | Language-agnostic, automated assessment of listeners’ speech recall using large language models | Björn Herrmann et.al. | 2503.01045 | null |
2025-03-02 | FunBench: Benchmarking Fundus Reading Skills of MLLMs | Qijie Wei et.al. | 2503.00901 | null |
2025-03-02 | Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies | Tianyi Huang et.al. | 2503.00724 | null |
2025-03-01 | Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires | Kyle Gao et.al. | 2503.00566 | null |
2025-03-01 | NeuroSymAD: A Neuro-Symbolic Framework for Interpretable Alzheimer’s Disease Diagnosis | Yexiao He et.al. | 2503.00510 | null |
2025-03-01 | NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews | Vedant Khandelwal et.al. | 2503.00278 | null |
2025-03-01 | Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy | Jahan C. Penny-Dimri et.al. | 2503.00269 | null |
2025-02-24 | Evaluating Large Language Models on the Spanish Medical Intern Resident (MIR) Examination 2024/2025: A Comparative Analysis of Clinical Reasoning and Knowledge Application | Carlos Luengo Vera et.al. | 2503.00025 | null |
2025-02-28 | A Non-contrast Head CT Foundation Model for Comprehensive Neuro-Trauma Triage | Youngjin Yoo et.al. | 2502.21106 | null |
2025-02-28 | Explainable Biomedical Claim Verification with Large Language Models | Siting Liang et.al. | 2502.21014 | null |
2025-02-28 | Merging Clinical Knowledge into Large Language Models for Medical Research and Applications: A Survey | Qiyuan Li et.al. | 2502.20988 | null |
2025-02-28 | ProAI: Proactive Multi-Agent Conversational AI with Structured Knowledge Base for Psychiatric Diagnosis | Yuqi Wu et.al. | 2502.20689 | null |
2025-02-28 | NutriGen: Personalized Meal Plan Generator Leveraging Large Language Models to Enhance Dietary and Nutritional Adherence | Saman Khamesian et.al. | 2502.20601 | link |
2025-02-27 | CoCa-CXR: Contrastive Captioners Learn Strong Temporal Structures for Chest X-Ray Vision-Language Understanding | Yixiong Chen et.al. | 2502.20509 | null |
2025-02-27 | KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model | Kai Zhang et.al. | 2502.20350 | null |
2025-02-27 | Expertise Is What We Want | Alan Ashworth et.al. | 2502.20335 | null |
2025-02-27 | MIND: Towards Immersive Psychological Healing with Multi-agent Inner Dialogue | Yujia Chen et.al. | 2502.19860 | null |
2025-03-03 | R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning | Minggui He et.al. | 2502.19735 | null |
2025-02-27 | Preference Learning Unlocks LLMs’ Psycho-Counseling Skills | Mian Zhang et.al. | 2502.19731 | null |
2025-02-27 | SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning | Mingsheng Cai et.al. | 2502.19668 | null |
2025-02-26 | Repurposing the scientific literature with vision-language models | Anton Alyakin et.al. | 2502.19546 | null |
2025-02-26 | Conversational Planning for Personal Plans | Konstantina Christakopoulou et.al. | 2502.19500 | null |
2025-02-26 | MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis | Daniel Rose et.al. | 2502.19175 | null |
2025-02-26 | Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection | Carter Adams et.al. | 2502.18823 | null |
2025-02-26 | TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation | Chenlu Ju et.al. | 2502.18712 | link |
2025-02-23 | RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis | Jianwei Wang et.al. | 2502.18517 | null |
2025-02-26 | Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support | Guoxin Wang et.al. | 2502.18274 | link |
2025-02-25 | DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning | Pusheng Xu et.al. | 2502.17947 | null |
2025-02-25 | Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation | Tong Li et.al. | 2502.17899 | null |
2025-02-24 | Wearable Meets LLM for Stress Management: A Duoethnographic Study Integrating Wearable-Triggered Stressors and LLM Chatbots for Personalized Interventions | Sameer Neupane et.al. | 2502.17650 | null |
2025-02-24 | Towards Conditioning Clinical Text Generation for User Control | Osman Alperen Koraş et.al. | 2502.17571 | null |
2025-02-18 | User Intent to Use DeepSeek for Healthcare Purposes and their Trust in the Large Language Model: Multinational Survey Study | Avishek Choudhury et.al. | 2502.17487 | null |
2025-03-04 | Large Language Models are Powerful EHR Encoders | Stefan Hegselmann et.al. | 2502.17403 | link |
2025-02-24 | Real-time Monitoring of Economic Shocks using Company Websites | Michael Koenig et.al. | 2502.17161 | null |
2025-02-24 | Applications of Large Models in Medicine | YunHe Su et.al. | 2502.17132 | null |
2025-02-23 | GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking | Yingjian Chen et.al. | 2502.16514 | null |
2025-02-22 | Large Language Model for Lossless Image Compression with Visual Prompts | Junhao Du et.al. | 2502.16163 | null |
2025-02-25 | Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation | Won Seok Jang et.al. | 2502.16022 | null |
2025-02-21 | AutoMedPrompt: A New Framework for Optimizing LLM Medical Prompts Using Textual Gradients | Sean Wu et.al. | 2502.15944 | null |
2025-02-21 | “Kya family planning after marriage hoti hai?”: Integrating Cultural Sensitivity in an LLM Chatbot for Reproductive Health | Roshini Deva et.al. | 2502.15939 | null |
2025-02-21 | CVE-LLM: Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models | Rikhiya Ghosh et.al. | 2502.15932 | null |
2025-02-21 | A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare | Manar Aljohani et.al. | 2502.15871 | null |
2025-02-21 | MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | Suraj Racha et.al. | 2502.15418 | link |
2025-02-20 | Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson’s Disease | Elliot Schumacher et.al. | 2502.15069 | null |
2025-02-20 | Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning | Shuyue Stella Li et.al. | 2502.14860 | link |
2025-02-20 | Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning | Juraj Vladika et.al. | 2502.14765 | link |
2025-02-21 | Data-Constrained Synthesis of Training Data for De-Identification | Thomas Vakili et.al. | 2502.14677 | null |
2025-02-20 | FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis | Mingyi Jia et.al. | 2502.14614 | null |
2025-02-20 | MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | Shrey Pandit et.al. | 2502.14302 | null |
2025-02-20 | Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment | Jiaxi Li et.al. | 2502.14275 | null |
2025-03-03 | QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification | Hajar Sakai et.al. | 2502.14189 | null |
2025-02-18 | Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Kabir Kumar et.al. | 2502.13982 | null |
2025-02-19 | Exploring Personalized Health Support through Data-Driven, Theory-Guided LLMs: A Case Study in Sleep Health | Xingbo Wang et.al. | 2502.13920 | link |
2025-02-19 | VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | Anudeex Shetty et.al. | 2502.13775 | null |
2025-02-19 | Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs | Yushi Feng et.al. | 2502.13555 | link |
2025-02-19 | Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion | Shuai Niu et.al. | 2502.13509 | null |
2025-02-19 | Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Yang Yan et.al. | 2502.13447 | null |
2025-02-19 | RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering | Sichu Liang et.al. | 2502.13361 | null |
2025-02-18 | Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare | Hiba Ahsan et.al. | 2502.13319 | null |
2025-02-18 | SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? | Yucheng Shi et.al. | 2502.13233 | null |
2025-02-18 | Private Text Generation by Seeding Large Language Model Prompts | Supriya Nagesh et.al. | 2502.13193 | null |
2025-02-18 | Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Mohammad Reza Rezaei et.al. | 2502.13010 | null |
2025-02-18 | An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation | Mohammad Feli et.al. | 2502.12836 | null |
2025-02-18 | Baichuan-M1: Pushing the Medical Capability of Large Language Models | Bingning Wang et.al. | 2502.12671 | null |
2025-02-18 | Simulating Cooperative Prosocial Behavior with Multi-Agent LLMs: Evidence and Mechanisms for AI Agents to Inform Policy Decisions | Karthik Sreedhar et.al. | 2502.12504 | null |
2025-02-18 | USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner | Mingcong Chen et.al. | 2502.12498 | null |
2025-02-14 | Leveraging large language models for structured information extraction from pathology reports | Jeya Balaji Balasubramanian et.al. | 2502.12183 | link |
2025-02-17 | Exploring Large Language Models in Healthcare: Insights into Corpora Sources, Customization Strategies, and Evaluation Metrics | Shuqi Yang et.al. | 2502.11861 | null |
2025-02-17 | LLM Agents Making Agent Tools | Georg Wölflein et.al. | 2502.11705 | link |
2025-02-17 | CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation | Guangya Yu et.al. | 2502.11703 | null |
2025-02-17 | A Survey of Personalized Large Language Models: Progress and Future Directions | Jiahong Liu et.al. | 2502.11528 | link |
2025-02-16 | A Survey of LLM-based Agents in Medicine: How far are we from Baymax? | Wenxuan Wang et.al. | 2502.11211 | null |
2025-02-16 | Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications | Alexandru Lecu et.al. | 2502.11108 | link |
2025-02-16 | A Survey of Large Language Models in Psychotherapy: Current Landscape and Future Directions | Hongbin Na et.al. | 2502.11095 | null |
2025-02-16 | SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information | Xiangyu Zhang et.al. | 2502.10950 | null |
2025-02-15 | Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia | Rohith Perumandla et.al. | 2502.10896 | null |
2025-02-15 | ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis | Xueshen Li et.al. | 2502.10620 | null |
2025-02-14 | Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes | Ezinne Nwankwo et.al. | 2502.10605 | null |
2025-02-21 | HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Tianwei Lin et.al. | 2502.09838 | link |
2025-02-12 | Cancer Vaccine Adjuvant Name Recognition from Biomedical Literature using Large Language Models | Hasin Rehana et.al. | 2502.09659 | null |
2025-02-17 | Zero-shot generation of synthetic neurosurgical data with large language models | Austin A. Barr et.al. | 2502.09566 | link |
2025-02-13 | Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs | Chang Liu et.al. | 2502.09156 | null |
2025-02-13 | Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech | Jonathan Pofcher et.al. | 2502.09004 | null |
2025-02-13 | Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning | Leon Nissen et.al. | 2502.08954 | link |
2025-02-12 | Assessing the Impact of the Quality of Textual Data on Feature Representation and Machine Learning Models | Tabinda Sarwar et.al. | 2502.08669 | null |
2025-02-12 | SycEval: Evaluating LLM Sycophancy | Aaron Fanous et.al. | 2502.08177 | null |
2025-02-12 | Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD) | Naomi Akhras et.al. | 2502.08073 | null |
2025-02-11 | Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature? | Hye Sun Yun et.al. | 2502.07963 | link |
2025-02-12 | Beyond Prompting: Time2Lang – Bridging Time-Series Foundation Models and Large Language Models for Health Sensing | Arvind Pillai et.al. | 2502.07608 | link |
2025-02-11 | Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning | Jiayuan Zhu et.al. | 2502.07143 | null |
2025-02-10 | Interactive Data Harmonization with LLM Agents | Aécio Santos et.al. | 2502.07132 | null |
2025-02-09 | LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison | Gabriele De Vito et.al. | 2502.06890 | null |
2025-02-06 | Integrating Generative Artificial Intelligence in ADRD: A Framework for Streamlining Diagnosis and Care in Neurodegenerative Diseases | Andrew G. Breithaupt et.al. | 2502.06842 | null |
2025-02-04 | Diffusion Instruction Tuning | Chen Jin et.al. | 2502.06814 | null |
2025-02-10 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering | Anna Arias-Duart et.al. | 2502.06666 | null |
2025-02-10 | Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy | Kamyar Kazari et.al. | 2502.06150 | null |
2025-02-09 | HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | Mohammad Amin Abbasi et.al. | 2502.05982 | null |
2025-02-09 | A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography | Nicholas Evans et.al. | 2502.05926 | null |
2025-02-09 | Enhancing Depression Detection with Chain-of-Thought Prompting: From Emotion to Reasoning Using Large Language Models | Shiyu Teng et.al. | 2502.05879 | null |
2025-02-09 | Large Language Model-based Nonnegative Matrix Factorization For Cardiorespiratory Sound Separation | Yasaman Torabi et.al. | 2502.05757 | null |
2025-02-09 | RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care | Ziqi Yang et.al. | 2502.05740 | null |
2025-02-08 | KMI: A Dataset of Korean Motivational Interviewing Dialogues for Psychotherapy | Hyunjong Kim et.al. | 2502.05651 | null |
2025-02-08 | ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports | Aynur Guluzade et.al. | 2502.05638 | link |
2025-02-08 | OntoTune: Ontology-Driven Self-training for Aligning Large Language Models | Zhiqiang Liu et.al. | 2502.05478 | link |
2025-02-12 | Safety at Scale: A Comprehensive Survey of Large Model Safety | Xingjun Ma et.al. | 2502.05206 | link |
2025-02-07 | “It Felt Like I Was Left in the Dark”: Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings | Shihan Fu et.al. | 2502.05115 | null |
2025-02-07 | Enhancing Health Information Retrieval with RAG by Prioritizing Topical Relevance and Factual Accuracy | Rishabh Uapadhyay et.al. | 2502.04666 | null |
2025-02-05 | Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning | Jonathan Kim et.al. | 2502.04381 | null |
2025-02-04 | Open Foundation Models in Healthcare: Challenges, Paradoxes, and Opportunities with GenAI Driven Personalized Prescription | Mahdi Alkaeed et.al. | 2502.04356 | null |
2025-02-04 | JingFang: A Traditional Chinese Medicine Large Language Model of Expert-Level Medical Diagnosis and Syndrome Differentiation-Based Treatment | Yehan Yan et.al. | 2502.04345 | null |
2025-02-06 | Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | Mardhiyah Sanni et.al. | 2502.03945 | null |
2025-02-05 | A Mixed-Methods Evaluation of LLM-Based Chatbots for Menopause | Roshini Deva et.al. | 2502.03579 | null |
2025-02-05 | MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters | Amin Dada et.al. | 2502.03298 | null |
2025-02-05 | MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation | Seonok Kim et.al. | 2502.03004 | null |
2025-02-05 | CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration | Yizhe Yang et.al. | 2502.02807 | null |
2025-02-04 | Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation | Atharva Mangeshkumar Agrawal et.al. | 2502.02249 | null |
2025-02-02 | Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model | Hadas Ben-Atya et.al. | 2502.01691 | null |
2025-02-03 | OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology | Chengfeng Zhou et.al. | 2502.01243 | null |
2025-02-02 | Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale | Cliff Wong et.al. | 2502.00943 | null |
2025-02-02 | Generalization of Medical Large Language Models through Cross-Domain Weak Supervision | Robert Long et.al. | 2502.00832 | null |
2025-01-31 | Fairshare Data Pricing for Large Language Models | Luyang Zhang et.al. | 2502.00198 | null |
2025-01-31 | DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets | Abdurrahim Yilmaz et.al. | 2502.00196 | null |
2025-02-04 | AIN: The Arabic INclusive Large Multimodal Model | Ahmed Heakl et.al. | 2502.00094 | link |
2025-01-30 | A Multi-Layered Large Language Model Framework for Disease Prediction | Malak Mohamed et.al. | 2502.00063 | null |
2025-01-21 | Leveraging Large Language Models to Enhance Machine Learning Interpretability and Predictive Performance: A Case Study on Emergency Department Returns for Mental Health Patients | Abdulaziz Ahmed et.al. | 2502.00025 | null |
2025-01-30 | Survey and Improvement Strategies for Gene Prioritization with Large Language Models | Matthew Neeley et.al. | 2501.18794 | null |
2025-01-30 | Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning | Maya Kruse et.al. | 2501.18724 | null |
2025-02-03 | Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models | Manish Sanwal et.al. | 2501.18645 | null |
2025-01-27 | Towards Safe AI Clinicians: A Comprehensive Study on Large Language Model Jailbreaking in Healthcare | Hang Zhang et.al. | 2501.18632 | null |
2025-01-30 | GENIE: Generative Note Information Extraction model for structuring EHR data | Huaiyuan Ying et.al. | 2501.18435 | null |
2025-01-30 | Battery State of Health Estimation Using LLM Framework | Aybars Yunusoglu et.al. | 2501.18123 | null |
2025-01-29 | Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations | Zijie Liu et.al. | 2501.17860 | null |
2025-01-29 | LLM Assistance for Pediatric Depression | Mariia Ignashina et.al. | 2501.17510 | null |
2025-01-28 | Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction | Mingyu Derek Ma et.al. | 2501.17326 | null |
2025-01-28 | Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology | Peilong Wang et.al. | 2501.17286 | null |
2025-01-28 | Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care | Fengpei Yuan et.al. | 2501.17206 | null |
2025-01-27 | A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis | Aysegul Ucar et.al. | 2501.17190 | null |
2025-01-28 | Adapting Network Information to Semantics for Generalizable and Plug-and-Play Multi-Scenario Network Diagnosis | Tiao Tan et.al. | 2501.16842 | null |
2025-01-28 | VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records | Philip Chung et.al. | 2501.16672 | link |
2025-01-27 | A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain | Jorge del Pozo Lérida et.al. | 2501.16533 | null |
2025-01-27 | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | Payal Kamboj et.al. | 2501.16481 | link |
2025-01-24 | GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration | Ziwen Li et.al. | 2501.16382 | link |
2025-01-18 | An Integrated Approach to AI-Generated Content in e-health | Tasnim Ahmed et.al. | 2501.16348 | null |
2025-01-27 | A foundation model for human-AI collaboration in medical literature mining | Zifeng Wang et.al. | 2501.16255 | null |
2025-01-27 | Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models | Huayu Li et.al. | 2501.16215 | link |
2025-01-27 | MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | Qi Chen et.al. | 2501.15826 | null |
2025-01-26 | Evaluating an LLM-Powered Chatbot for Cognitive Restructuring: Insights from Mental Health Professionals | Yinzhou Wang et.al. | 2501.15599 | null |
2025-01-25 | The Multicultural Medical Assistant: Can LLMs Improve Medical ASR Errors Across Borders? | Ayo Adedeji et.al. | 2501.15310 | null |
2025-01-25 | Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training | Xunxin Cai et.al. | 2501.15108 | null |
2025-01-25 | Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations | Harshita Chopra et.al. | 2501.15056 | null |
2025-01-24 | Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs | Hang Luo et.al. | 2501.14892 | link |
2025-01-24 | Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | Ipek Baris Schlicht et.al. | 2501.14719 | null |
2025-01-24 | MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications | Yixing Jiang et.al. | 2501.14654 | link |
2025-01-24 | AI Chatbots as Professional Service Agents: Developing a Professional Identity | Wenwen Li et.al. | 2501.14179 | null |
2025-01-23 | MedSlice: Fine-Tuned Large Language Models for Secure Clinical Note Sectioning | Joshua Davis et.al. | 2501.14105 | link |
2025-01-23 | Leveraging Large Language Models to Analyze Emotional and Contextual Drivers of Teen Substance Use in Online Discussions | Jianfeng Zhu et.al. | 2501.14037 | null |
2025-01-23 | Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs | Bhumika Gupta et.al. | 2501.13984 | null |
2025-01-21 | Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | Jadon Geathers et.al. | 2501.13957 | null |
2025-01-20 | A Layered Multi-Expert Framework for Long-Context Mental Health Assessments | Jinwen Tang et.al. | 2501.13951 | null |
2025-01-14 | Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications | Arjun R. Malghan et.al. | 2501.13936 | null |
2025-01-23 | Enhancing LLMs for Governance with Human Oversight: Evaluating and Aligning LLMs on Expert Classification of Climate Misinformation for Detecting False or Misleading Claims about Climate Change | Mowafak Allaham et.al. | 2501.13802 | null |
2025-01-22 | Intelligent Exercise and Feedback System for Social Healthcare using LLMOps | Yeongrak Choi et.al. | 2501.13723 | null |
2025-01-23 | Question Answering on Patient Medical Records with Private Fine-Tuned LLMs | Sara Kothari et.al. | 2501.13687 | null |
2025-01-23 | How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization | Shezheng Song et.al. | 2501.13669 | null |
2025-01-20 | Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness | Ambreesh Parthasarathy et.al. | 2501.13120 | null |
2025-01-21 | Can open source large language models be used for tumor documentation in Germany? – An evaluation on urological doctors’ notes | Stefan Lenz et.al. | 2501.12106 | link |
2025-01-23 | Med-R$^2$: Crafting Trustworthy LLM Physicians through Retrieval and Reasoning of Evidence-Based Medicine | Keer Lu et.al. | 2501.11885 | link |
2025-01-19 | Clinical trial cohort selection using Large Language Models on n2c2 Challenges | Chi-en Amy Tai et.al. | 2501.11114 | null |
2025-01-18 | Iterative Tree Analysis for Medical Critics | Zenan Huang et.al. | 2501.10642 | null |
2025-01-17 | Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education | William Hersh et.al. | 2501.10186 | null |
2025-01-17 | Demo: Interactive Visualization of Semantic Relationships in a Biomedical Project’s Talent Knowledge Graph | Jiawei Xu et.al. | 2501.09909 | null |
2025-01-17 | Position: Open and Closed Large Language Models in Healthcare | Jiawei Xu et.al. | 2501.09906 | null |
2025-01-16 | Bridging Language Barriers in Healthcare: A Study on Arabic LLMs | Nada Saadi et.al. | 2501.09825 | null |
2025-01-16 | Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval | Jesus Lovon et.al. | 2501.09384 | link |
2025-01-16 | FineMedLM-o1: Enhancing the Medical Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training | Hongzhou Yu et.al. | 2501.09213 | link |
2025-01-17 | Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models | Emma Croxford et.al. | 2501.08977 | null |
2025-01-26 | Enhanced Large Language Models for Effective Screening of Depression and Anxiety | June M. Liu et.al. | 2501.08769 | null |
2025-01-14 | ADAM-1: AI and Bioinformatics for Alzheimer’s Detection and Microbiome-Clinical Data Integrations | Ziyuan Huang et.al. | 2501.08324 | null |
2025-01-14 | ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems | Mohita Chowdhury et.al. | 2501.08208 | null |
2025-01-13 | Large Language Models for Interpretable Mental Health Diagnosis | Brian Hyeongseok Kim et.al. | 2501.07653 | null |
2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525 | link |
2025-01-13 | Combining LLM decision and RL action selection to improve RL policy for adaptive interventions | Karine Karine et.al. | 2501.06980 | null |
2025-01-12 | Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives | Xinyao Ma et.al. | 2501.06964 | null |
2025-01-12 | A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context | Noureldin Zahran et.al. | 2501.06859 | null |
2025-01-12 | Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | Shunfan Zheng et.al. | 2501.06741 | null |
2025-01-21 | MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare | Ye Chen et.al. | 2501.06465 | null |
2025-01-11 | O1 Replication Journey – Part 3: Inference-time Scaling for Medical Reasoning | Zhongzhen Huang et.al. | 2501.06458 | link |
2025-01-10 | AFRIDOC-MT: Document-level MT Corpus for African Languages | Jesujoba O. Alabi et.al. | 2501.06374 | link |
2025-01-10 | Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts | Elizabeth Schaefer et.al. | 2501.06365 | null |
2025-01-10 | Large Language Models for Bioinformatics | Wei Ruan et.al. | 2501.06271 | null |
2025-01-10 | From Conversation to Automation: Leveraging Large Language Models to Analyze Strategies in Problem Solving Therapy | Elham Aghakhani et.al. | 2501.06101 | null |
2025-01-07 | Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding | John C. Rollman et.al. | 2501.05479 | null |
2025-01-18 | LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models | Hang Yang et.al. | 2501.05464 | null |
2025-01-09 | Investigating Numerical Translation with Large Language Models | Wei Tang et.al. | 2501.04927 | null |
2025-01-07 | LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment | Gaoussou Youssouf Kebe et.al. | 2501.03624 | null |
2025-01-06 | Existential Crisis: A Social Robot’s Reason for Being | Dora Medgyesy et.al. | 2501.03376 | null |
2025-01-06 | Design and implementation of tools to build an ontology of Security Requirements for Internet of Medical Things | Daniel Naro et.al. | 2501.03067 | null |
2025-01-06 | IIMedGPT: Promoting Large Language Model Capabilities of Medical Tasks by Efficient Human Preference Alignment | Yiming Zhang et.al. | 2501.02869 | null |
2025-01-05 | Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine | Yishen Liu et.al. | 2501.02471 | null |
2025-01-05 | Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications | Zhe Chen et.al. | 2501.02460 | null |
2025-01-04 | Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations | Kangyu Zhu et.al. | 2501.02385 | null |
2025-01-04 | Exploring the Capabilities and Limitations of Large Language Models for Radiation Oncology Decision Support | Florian Putz et.al. | 2501.02346 | null |
2025-01-03 | PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | Jingoo Lee et.al. | 2501.01594 | null |
2025-01-02 | Large Language Models for Mental Health Diagnostic Assessments: Exploring The Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments – The Depression and Anxiety Case | Kaushik Roy et.al. | 2501.01305 | null |
2025-01-02 | Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice | Federico Ravenda et.al. | 2501.00982 | link |
2024-12-31 | CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care | Michael Gubanov et.al. | 2501.00223 | null |
2024-12-31 | An Empirical Evaluation of Large Language Models on Consumer Health Questions | Moaiz Abrar et.al. | 2501.00208 | null |
2024-12-31 | GPT-4 on Clinic Depression Assessment: An LLM-Based Pilot Study | Giuliano Lorenzoni et.al. | 2501.00199 | null |
2024-12-30 | Temporal reasoning for timeline summarisation in social media | Jiayu Song et.al. | 2501.00152 | null |
2024-12-30 | Tackling Cognitive Impairment Detection from Speech: A submission to the PROCESS Challenge | Catarina Botelho et.al. | 2501.00145 | null |
2024-12-21 | Distilling Large Language Models for Efficient Clinical Information Extraction | Karthik S. Vedula et.al. | 2501.00031 | null |
2024-12-29 | Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain | Shintaro Ozaki et.al. | 2412.20309 | link |
2024-12-28 | On the Compositional Generalization of Multimodal LLMs for Medical Imaging | Zhenyang Cai et.al. | 2412.20070 | link |
2024-12-28 | The Emotional Spectrum of LLMs: Leveraging Empathy and Emotion-Based Markers for Mental Health Support | Alessandro De Grandi et.al. | 2412.20068 | null |
2025-01-02 | MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes | Asma Ben Abacha et.al. | 2412.19260 | link |
2025-01-03 | MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models | Kaiwen Zuo et.al. | 2412.18947 | null |
2024-12-25 | HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Junying Chen et.al. | 2412.18925 | link |
2024-12-24 | Research on the Proximity Relationships of Psychosomatic Disease Knowledge Graph Modules Extracted by Large Language Models | Zihan Zhou et.al. | 2412.18419 | null |
2024-12-24 | Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine | Yu He Ke et.al. | 2412.18096 | null |
2024-12-23 | Generating Completions for Fragmented Broca’s Aphasic Sentences Using Large Language Models | Sijbren van Vaals et.al. | 2412.17669 | link |
2024-12-23 | Detecting anxiety and depression in dialogues: a multi-label and explainable approach | Francisco de Arriba-Pérez et.al. | 2412.17651 | null |
2025-01-01 | PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health | Huy Vu et.al. | 2412.16882 | link |
2025-01-03 | KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis | Kaiwen Zuo et.al. | 2412.16833 | null |
2024-12-21 | AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles | Aritra Kumar Lahiri et.al. | 2412.16701 | null |
2024-12-21 | Evaluating the Performance of Large Language Models in Scientific Claim Detection and Classification | Tanjim Bin Faruk et.al. | 2412.16486 | null |
2024-12-21 | Technical Report: Small Language Model for Japanese Clinical and Medicine | Shogo Watanabe et.al. | 2412.16423 | null |
2024-12-21 | Identifying Cyberbullying Roles in Social Media | Manuel Sandoval et.al. | 2412.16417 | null |
2024-12-20 | A Machine Learning Approach for Emergency Detection in Medical Scenarios Using Large Language Models | Ferit Akaybicen et.al. | 2412.16341 | null |
2024-12-20 | Improving Equity in Health Modeling with GPT4-Turbo Generated Synthetic Data: A Comparative Study | Daniel Smolyak et.al. | 2412.16335 | null |
2024-12-20 | Benchmarking LLMs and SLMs for patient reported outcomes | Matteo Marengo et.al. | 2412.16291 | null |
2024-12-20 | Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | Hasan Md Tusfiqur Alam et.al. | 2412.16086 | link |
2024-12-20 | From General to Specific: Tailoring Large Language Models for Personalized Healthcare | Ruize Shi et.al. | 2412.15957 | null |
2024-12-20 | Linguistic Features Extracted by GPT-4 Improve Alzheimer’s Disease Detection based on Spontaneous Speech | Jonathan Heitz et.al. | 2412.15772 | link |
2024-12-20 | Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models | Shamus Sim et.al. | 2412.15748 | null |
2024-12-20 | NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning | Zheyuan Zhang et.al. | 2412.15547 | null |
2024-12-17 | A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models | Gongbo Zhang et.al. | 2412.15271 | null |
2024-12-16 | Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search | Edward Kim et.al. | 2412.15256 | null |
2024-12-13 | Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an “AI Therapist” | Robert Wasenmüller et.al. | 2412.15242 | null |
2024-12-23 | CareBot: A Pioneering Full-Process Open-Source Medical Language Model | Lulu Zhao et.al. | 2412.15236 | null |
2024-12-18 | Clinical Trials Ontology Engineering with Large Language Models | Berkan Çakır et.al. | 2412.14387 | null |
2024-12-18 | Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs | David Restrepo et.al. | 2412.14304 | null |
2024-12-18 | Discovering maximally consistent distribution of causal tournaments with Large Language Models | Federico Baldo et.al. | 2412.14019 | null |
2024-12-18 | Cognition Chain for Explainable Psychological Stress Detection on Social Media | Xin Wang et.al. | 2412.14009 | link |
2025-01-08 | Federated Learning and RAG Integration: A Scalable Approach for Medical Large Language Models | Jincheol Jung et.al. | 2412.13720 | null |
2024-12-18 | Exploring Multi-Modal Integration with Tool-Augmented LLM Agents for Precise Causal Discovery | ChengAo Shen et.al. | 2412.13667 | link |
2024-12-18 | PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling | Haojie Xie et.al. | 2412.13660 | link |
2024-12-17 | Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health | Vivek Kumar et.al. | 2412.12981 | link |
2024-12-17 | Process-Supervised Reward Models for Clinical Note Generation: A Scalable Approach Guided by Domain Expertise | Hanyin Wang et.al. | 2412.12583 | link |
2024-12-17 | RareAgents: Autonomous Multi-disciplinary Team for Rare Disease Diagnosis and Treatment | Xuanzhong Chen et.al. | 2412.12475 | null |
2024-12-17 | Assessing the Limitations of Large Language Models in Clinical Fact Decomposition | Monica Munnangi et.al. | 2412.12422 | link |
2024-12-16 | Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments | Tuka Alhanai et.al. | 2412.12417 | link |
2024-12-11 | Performance of a large language model-Artificial Intelligence based chatbot for counseling patients with sexually transmitted infections and genital diseases | Nikhil Mehta et.al. | 2412.12166 | null |
2024-12-16 | LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts | Zhuhao Wang et.al. | 2412.12001 | link |
2024-12-16 | Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives | Sam Relins et.al. | 2412.11878 | link |
2024-12-16 | LLMs Can Simulate Standardized Patients via Agent Coevolution | Zhuoyun Du et.al. | 2412.11716 | link |
2024-12-16 | Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery | Ryuhaerang Choi et.al. | 2412.11656 | null |
2024-12-16 | ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models | Xiechi Zhang et.al. | 2412.11453 | null |
2024-12-19 | TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs | Lanxiang Hu et.al. | 2412.11242 | null |
2024-12-15 | AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Tiankai Yang et.al. | 2412.11142 | link |
2024-12-15 | HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation | Tengfei Liu et.al. | 2412.11070 | link |
2024-12-17 | MedG-KRP: Medical Graph Knowledge Representation Probing | Gabriel R. Rosenbaum et.al. | 2412.10982 | null |
2024-12-14 | LLMs-in-the-Loop Part 2: Expert Small AI Models for Anonymization and De-identification of PHI Across Multiple Languages | Murat Gunay et.al. | 2412.10918 | null |
2024-12-14 | Superhuman performance of a large language model on the reasoning tasks of a physician | Peter G. Brodeur et.al. | 2412.10849 | null |
2024-12-14 | Large Language Models for Medical Forecasting – Foresight 2 | Zeljko Kraljevic et.al. | 2412.10848 | null |
2024-12-14 | A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options | Peilong Wang et.al. | 2412.10622 | null |
2024-12-09 | Leveraging Audio and Text Modalities in Mental Health: A Study of LLMs Performance | Abdelrahman A. Ali et.al. | 2412.10417 | null |
2024-12-09 | Exploring Complex Mental Health Symptoms via Classifying Social Media Data with Explainable LLMs | Kexin Chen et.al. | 2412.10414 | null |
2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372 | link |
2024-12-12 | MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation | Zheyuan Zhang et.al. | 2412.08847 | link |
2024-12-11 | Detecting Conversational Mental Manipulation with Intent-Aware Prompting | Jiayuan Ma et.al. | 2412.08414 | link |
2024-12-10 | BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | Sahal Shaji Mullappilly et.al. | 2412.07769 | link |
2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743 | null |
2024-12-09 | Balancing Efficiency and Effectiveness: An LLM-Infused Approach for Optimized CTR Prediction | Guoxiao Zhang et.al. | 2412.06860 | null |
2024-12-06 | Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System | Fang Zeng et.al. | 2412.06828 | null |
2024-12-12 | PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Qian Zhang et.al. | 2412.06287 | link |
2024-12-09 | MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization | Kangyu Zhu et.al. | 2412.06141 | link |
2024-12-08 | Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis | Aman Kassahun Wassie et.al. | 2412.05862 | null |
2024-12-08 | Are Clinical T5 Models Better for Clinical Text? | Yahan Li et.al. | 2412.05845 | link |
2024-12-09 | Enhancing FKG.in: automating Indian food composition analysis | Saransh Kumar Gupta et.al. | 2412.05248 | null |
2024-12-06 | SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Jinlin Wu et.al. | 2412.05187 | link |
2024-12-06 | A text-to-tabular approach to generate synthetic patient data using LLMs | Margaux Tornqvist et.al. | 2412.05153 | link |
2024-12-05 | Give me Some Hard Questions: Synthetic Data Generation for Clinical QA | Fan Bai et.al. | 2412.04573 | link |
2024-12-04 | Prompting Large Language Models for Clinical Temporal Relation Extraction | Jianping He et.al. | 2412.04512 | null |
2024-12-05 | Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots | Maria Paola Priola et.al. | 2412.04235 | null |
2024-12-05 | Automated Multi-Label Annotation for Mental Health Illnesses Using Large Language Models | Abdelrahaman A. Hassan et.al. | 2412.03796 | null |
2024-11-28 | CovidLLM: A Robust Large Language Model with Missing Value Adaptation and Multi-Objective Learning Strategy for Predicting Disease Severity and Clinical Outcomes in COVID-19 Patients | Shengjun Zhu et.al. | 2412.03593 | link |
2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
2024-12-04 | Advancing Conversational Psychotherapy: Integrating Privacy, Dual-Memory, and Domain Expertise with Large Language Models | XiuYu Zhang et.al. | 2412.02987 | null |
2024-12-03 | A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications | Yixiang Qu et.al. | 2412.02868 | null |
2024-12-09 | RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models | Hieu Tran et.al. | 2412.02830 | link |
2024-12-03 | Keeping Experts in the Loop: Expert-Guided Optimization for Clinical Data Classification using Large Language Models | Nader Karayanni et.al. | 2412.02173 | null |
2024-12-04 | The use of large language models to enhance cancer clinical trial educational materials | Mingye Gao et.al. | 2412.01955 | null |
2024-12-02 | Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | Jie Liu et.al. | 2412.01605 | null |
2024-12-02 | Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models | Chayan Tank et.al. | 2412.01353 | null |
2024-12-02 | Best Practices for Large Language Models in Radiology | Christian Bluethgen et.al. | 2412.01233 | null |
2024-12-01 | Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | Edward Bayes et.al. | 2412.00948 | null |
2024-12-06 | Opus: A Large Work Model for Complex Workflow Generation | Théo Fagnoni et.al. | 2412.00573 | null |
2024-11-30 | Polish Medical Exams: A new dataset for cross-lingual medical knowledge transfer assessment | Łukasz Grzybowski et.al. | 2412.00559 | null |
2024-12-07 | Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective | Yue Zhou et.al. | 2412.00554 | null |
2024-11-30 | CDEMapper: Enhancing NIH Common Data Element Normalization using Large Language Models | Yan Wang et.al. | 2412.00491 | null |
2024-11-29 | SSDM 2.0: Time-Accurate Speech Rich Transcription with Non-Fluencies | Jiachen Lian et.al. | 2412.00265 | null |
2024-11-29 | Fine Tuning Large Language Models to Deliver CBT for Depression | Talha Tahir et.al. | 2412.00251 | link |
2024-11-24 | Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis | Ferhat Ozgur Catak et.al. | 2412.00056 | null |
2024-11-29 | MIMDE: Exploring the Use of Synthetic vs Human Data for Evaluating Multi-Insight Multi-Document Extraction Tasks | John Francis et.al. | 2411.19689 | null |
2024-11-29 | SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Kim-Celine Kahl et.al. | 2411.19688 | link |
2024-11-28 | ComViewer: An Interactive Visual Tool to Help Viewers Seek Social Support in Online Mental Health Communities | Shiwei Wu et.al. | 2411.19169 | link |
2024-11-28 | A Unified Platform for At-Home Post-Stroke Rehabilitation Enabled by Wearable Technologies and Artificial Intelligence | Chenyu Tang et.al. | 2411.19000 | null |
2024-11-28 | Rephrasing Electronic Health Records for Pretraining Clinical Language Models | Jinghui Liu et.al. | 2411.18940 | null |
2024-11-28 | Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer’s Disease | Junan Li et.al. | 2411.18922 | null |
2024-12-06 | LLM-ABBA: Understanding time series via symbolic approximation | Erin Carson et.al. | 2411.18506 | null |
2024-11-28 | Wearable intelligent throat enables natural speech in stroke patients with dysarthria | Chenyu Tang et.al. | 2411.18266 | null |
2024-11-29 | InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks | Xinyao Zheng et.al. | 2411.18191 | null |
2024-11-27 | Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | Deepak Gupta et.al. | 2411.18069 | null |
2024-11-27 | QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions | Ramez Kouzy et.al. | 2411.17967 | link |
2024-11-26 | Synthetic Data Generation with LLM for Improved Depression Prediction | Andrea Kang et.al. | 2411.17672 | null |
2024-11-26 | Can artificial intelligence predict clinical trial outcomes? | Shuyi Jin et.al. | 2411.17595 | null |
2024-11-26 | The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations | Theodora Worledge et.al. | 2411.17375 | link |
2024-12-10 | Using Large Language Models for Expert Prior Elicitation in Predictive Modelling | Alexander Capstick et.al. | 2411.17284 | link |
2024-11-28 | Strategic Prompting for Conversational Tasks: A Comparative Analysis of Large Language Models Across Diverse Conversational Tasks | Ratnesh Kumar Joshi et.al. | 2411.17204 | null |
2024-11-25 | Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries | Harshavardhan Battula et.al. | 2411.16818 | null |
2024-11-27 | Creating Scalable AGI: the Open General Intelligence Framework | Daniel A. Dollinger et.al. | 2411.15832 | null |
2024-11-24 | RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements | Zaifu Zhan et.al. | 2411.15700 | null |
2024-11-23 | Ontology-Constrained Generation of Domain-Specific Clinical Summaries | Gaya Mehenni et.al. | 2411.15666 | link |
2024-11-27 | AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Tobi Olatunji et.al. | 2411.15640 | null |
2024-11-23 | Large Language Model with Region-guided Referring and Grounding for CT Report Generation | Zhixuan Chen et.al. | 2411.15539 | link |
2024-11-23 | The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges | Jiqun Liu et.al. | 2411.15396 | null |
2024-11-22 | Regulator-Manufacturer AI Agents Modeling: Mathematical Feedback-Driven Multi-Agent LLM Framework | Yu Han et.al. | 2411.15356 | null |
2024-11-21 | BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models | Taha Koleilat et.al. | 2411.15232 | link |
2024-11-22 | Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation | Colin Diggs et.al. | 2411.14971 | null |
2024-11-22 | De-biased Multimodal Electrocardiogram Analysis | Haitao Li et.al. | 2411.14795 | null |
2024-11-22 | Enhancing Clinical Trial Patient Matching through Knowledge Augmentation with Multi-Agents | Hanwen Shi et.al. | 2411.14637 | null |
2024-11-20 | Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine | Yifan Yang et.al. | 2411.14487 | null |
2024-11-16 | Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios | Shaochen Xu et.al. | 2411.14461 | null |
2024-11-21 | Logic Augmented Generation | Aldo Gangemi et.al. | 2411.14012 | null |
2024-11-21 | PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | Zhijie Bao et.al. | 2411.13902 | link |
2024-11-21 | A Multimodal Approach to The Detection and Classification of Skin Diseases | Allen Yang et.al. | 2411.13855 | null |
2024-11-19 | Can ChatGPT Overcome Behavioral Biases in the Financial Sector? Classify-and-Rethink: Multi-Step Zero-Shot Reasoning in the Gold Investment | Shuoling Liu et.al. | 2411.13599 | null |
2024-11-20 | Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding | Nabeel Seedat et.al. | 2411.13163 | null |
2024-11-19 | DIETS: Diabetic Insulin Management System in Everyday Life | Hanyu Zeng et.al. | 2411.12812 | null |
2024-11-19 | Conversational Medical AI: Ready for Practice | Antoine Lizée et.al. | 2411.12808 | null |
2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712 | null |
2024-11-19 | Performance of Large Language Models in Technical MRI Question Answering: A Comparative Study | Alan B McMillan et.al. | 2411.12238 | null |
2024-11-18 | Medical Video Generation for Disease Progression Simulation | Xu Cao et.al. | 2411.11943 | null |
2024-11-04 | Large language models for mental health | Andreas Triantafyllopoulos et.al. | 2411.11880 | null |
2024-11-18 | Membership Inference Attack against Long-Context Large Language Models | Zixiong Wang et.al. | 2411.11424 | null |
2024-11-17 | BianCang: A Traditional Chinese Medicine Large Language Model | Sibo Wei et.al. | 2411.11027 | link |
2024-11-16 | Can Generic LLMs Help Analyze Child-adult Interactions Involving Children with Autism in Clinical Observation? | Tiantian Feng et.al. | 2411.10761 | null |
2024-11-16 | Structured Dialogue System for Mental Health: An LLM Chatbot Leveraging the PM+ Guidelines | Yixiang Chen et.al. | 2411.10681 | link |
2024-11-15 | Evaluating the role of ‘Constitutions’ for learning from AI feedback | Saskia Redgate et.al. | 2411.10168 | null |
2024-11-19 | Information Extraction from Clinical Notes: Are We Ready to Switch to Large Language Models? | Yan Hu et.al. | 2411.10020 | link |
2024-11-15 | JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging | Kaito Baba et.al. | 2411.09933 | null |
2024-11-15 | A Hybrid Artificial Intelligence System for Automated EEG Background Analysis and Report Generation | Chin-Sung Tung et.al. | 2411.09874 | link |
2024-11-19 | A Benchmark for Long-Form Medical Question Answering | Pedram Hosseini et.al. | 2411.09834 | null |
2024-11-14 | Script-centric behavior understanding for assisted autism spectrum disorder diagnosis | Wenxing Liu et.al. | 2411.09413 | null |
2024-11-14 | Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | Nghia Trung Ngo et.al. | 2411.09213 | null |
2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | link |
2024-11-14 | Optimizing Automatic Summarization of Long Clinical Records Using Dynamic Context Extension: Testing and Evaluation of the NBCE Method | Guoqing Zhang et.al. | 2411.08586 | null |
2024-11-12 | Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer’s Disease | Francesco Chiumento et.al. | 2411.07871 | null |
2024-11-12 | Multimodal Clinical Reasoning through Knowledge-augmented Rationale Generation | Shuai Niu et.al. | 2411.07611 | null |
2024-11-11 | Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews | Aakash Sorathiya et.al. | 2411.07398 | null |
2024-11-11 | A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 | Vedant Khandelwal et.al. | 2411.07163 | null |
2024-11-11 | Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models | Aniket Deroy et.al. | 2411.06946 | null |
2024-11-11 | Persuasion with Large Language Models: a Survey | Alexander Rogiers et.al. | 2411.06837 | null |
2024-11-11 | Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding | Zeyd Boukhers et.al. | 2411.06823 | null |
2024-11-11 | Ambient AI Scribing Support: Comparing the Performance of Specialized AI Agentic Architecture to Leading Foundational Models | Chanseo Lee et.al. | 2411.06713 | null |
2024-11-10 | In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages | Joseph Gatto et.al. | 2411.06549 | link |
2024-11-10 | ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? | Canyu Chen et.al. | 2411.06469 | null |
2024-11-09 | GuidelineGuard: An Agentic Framework for Medical Note Evaluation with Guideline Adherence | MD Ragib Shahriyear et.al. | 2411.06264 | null |
2024-11-08 | Humans Continue to Outperform Large Language Models in Complex Clinical Decision-Making: A Study with Medical Calculators | Nicholas Wan et.al. | 2411.05897 | null |
2024-11-08 | Identifying and Decomposing Compound Ingredients in Meal Plans Using Large Language Models | Leon Kopitar et.al. | 2411.05892 | null |
2024-11-08 | A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis | Cristiano Patrício et.al. | 2411.05609 | link |
2024-11-08 | Analyzing Logs of Large-Scale Software Systems using Time Curves Visualization | Dmytro Borysenkov et.al. | 2411.05533 | link |
2024-11-14 | SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark | Sithursan Sivasubramaniam et.al. | 2411.05521 | link |
2024-11-08 | Content Quality vs. Attention Allocation: An LLM-Based Case Study in Peer-to-peer Mental Health Networks | Teng Ye et.al. | 2411.05328 | null |
2024-11-07 | Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | Joey Hong et.al. | 2411.05194 | null |
2024-11-11 | FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? | Eric Wu et.al. | 2411.05059 | link |
2024-11-07 | Integrating Large Language Models for Genetic Variant Classification | Youssef Boulaimen et.al. | 2411.05055 | null |
2024-11-07 | Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability | Yanjun Gao et.al. | 2411.04962 | null |
2024-11-19 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | link |
2024-11-07 | MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Laura Cabello et.al. | 2411.03883 | link |
2024-11-06 | A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients | Yiming Li et.al. | 2411.03805 | null |
2024-11-06 | From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond | Harsha Nori et.al. | 2411.03590 | null |
2024-11-05 | Exploring Large Language Models for Specialist-level Oncology Care | Anil Palepu et.al. | 2411.03395 | null |
2024-11-05 | The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare | Souren Pashangpour et.al. | 2411.03287 | null |
2024-11-05 | [Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI | Maren Pielka et.al. | 2411.02973 | null |
2024-11-04 | Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge | Karthik Soman et.al. | 2411.02657 | link |
2024-11-04 | “It’s a conversation, not a quiz”: A Risk Taxonomy and Reflection Tool for LLM Adoption in Public Health | Jiawei Zhou et.al. | 2411.02594 | null |
2024-11-01 | Evaluating the Impact of Lab Test Results on Large Language Models Generated Differential Diagnoses from Clinical Case Vignettes | Balu Bhasuran et.al. | 2411.02523 | null |
2024-11-01 | Rationale-Guided Retrieval Augmented Generation for Medical Question Answering | Jiwoong Sohn et.al. | 2411.00300 | link |
2024-11-16 | RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models | Serena Zhang et.al. | 2411.00299 | null |
2024-10-31 | A Demonstration of Adaptive Collaboration of Large Language Models for Medical Decision-Making | Yubin Kim et.al. | 2411.00248 | link |
2024-10-31 | Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning | John Wu et.al. | 2411.00173 | null |
2024-10-28 | A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges | Zifeng Wang et.al. | 2411.00024 | null |
2024-10-31 | Leveraging Large Language Models for Medical Information Extraction and Query Generation | Georgios Peikos et.al. | 2410.23851 | null |
2024-10-31 | Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding | Jinlong He et.al. | 2410.23822 | null |
2024-10-31 | The Potential of LLMs in Medical Education: Generating Questions and Answers for Qualification Exams | Yunqi Zhu et.al. | 2410.23769 | null |
2024-11-01 | Large Language Models for Patient Comments Multi-Label Classification | Hajar Sakai et.al. | 2410.23528 | null |
2024-10-31 | LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models | Hieu Tran et.al. | 2410.23526 | null |
2024-10-29 | Do Large Language Models Align with Core Mental Health Counseling Competencies? | Viet Cuong Nguyen et.al. | 2410.22446 | null |
2024-10-29 | Improving In-Context Learning with Small Language Model Ensembles | M. Mehdi Mojarradi et.al. | 2410.21868 | link |
2024-10-28 | Can Large Language Models Replace Data Scientists in Clinical Research? | Zifeng Wang et.al. | 2410.21591 | null |
2024-10-28 | LLM-Forest for Health Tabular Data Imputation | Xinrui He et.al. | 2410.21520 | null |
2024-10-28 | RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension | Abel Corrêa Dias et.al. | 2410.21495 | link |
2024-11-01 | “We do use it, but not how hearing people think”: How the Deaf and Hard of Hearing Community Uses Large Language Model Tools | Shuxu Huffman et.al. | 2410.21358 | null |
2024-10-28 | Large Language Model Benchmarks in Medical Tasks | Lawrence K. Q. Yan et.al. | 2410.21348 | null |
2024-10-27 | Language Models And A Second Opinion Use Case: The Pocket Professional | David Noever et.al. | 2410.20636 | null |
2024-10-26 | Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks | Annalisa Szymanski et.al. | 2410.20266 | null |
2024-10-26 | Infectious Disease Forecasting in India using LLM’s and Deep Learning | Chaitya Shah et.al. | 2410.20168 | null |
2024-10-26 | AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | Lei Li et.al. | 2410.20050 | link |
2024-10-25 | DualMAR: Medical-Augmented Representation from Dual-Expertise Perspectives | Pengfei Hu et.al. | 2410.19955 | link |
2024-10-18 | Novel Development of LLM Driven mCODE Data Model for Improved Clinical Trial Matching to Enable Standardization and Interoperability in Oncology Research | Aarsh Shekhar et.al. | 2410.19826 | null |
2024-10-24 | Inference time LLM alignment in single and multidomain preference spectrum | Sadat Shahriar et.al. | 2410.19206 | null |
2024-10-24 | Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use | Mohit Chandra et.al. | 2410.19155 | link |
2024-10-24 | Watermarking Large Language Models and the Generated Content: Opportunities and Challenges | Ruisi Zhang et.al. | 2410.19096 | null |
2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955 | null |
2024-10-24 | Demystifying Large Language Models for Medicine: A Primer | Qiao Jin et.al. | 2410.18856 | link |
2024-10-24 | Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | Yifan Yang et.al. | 2410.18460 | null |
2024-10-23 | ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | Yusheng Liao et.al. | 2410.17657 | link |
2024-10-22 | DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR | Miguel Contreras et.al. | 2410.17363 | null |
2024-10-22 | DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization | John X. Morris et.al. | 2410.17035 | null |
2024-10-22 | SleepCoT: A Lightweight Personalized Sleep Health Model via Chain-of-Thought Distillation | Huimin Zheng et.al. | 2410.16924 | null |
2024-10-22 | Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | Xiaolan Chen et.al. | 2410.16662 | null |
2024-10-21 | How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? | Kenza Benkirane et.al. | 2410.16574 | link |
2024-10-21 | Large language models enabled multiagent ensemble method for efficient EHR data labeling | Jingwei Huang et.al. | 2410.16543 | null |
2024-10-17 | SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques | Qiming Guo et.al. | 2410.16322 | null |
2024-10-22 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239 | link |
2024-10-21 | Fine-Tuning LLMs for Reliable Medical Question-Answering Services | Ali Anaissi et.al. | 2410.16088 | null |
2024-10-21 | Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding | Derong Xu et.al. | 2410.15702 | null |
2024-10-21 | Resource-Efficient Medical Report Generation using Large Language Models | Abdullah et.al. | 2410.15642 | null |
2024-10-20 | Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini | Chanseo Lee et.al. | 2410.15528 | null |
2024-10-20 | Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training | Shahrad Mohammadzadeh et.al. | 2410.15460 | null |
2024-10-19 | AutoFLUKA: A Large Language Model Based Framework for Automating Monte Carlo Simulations in FLUKA | Zavier Ndum Ndum et.al. | 2410.15222 | null |
2024-10-19 | Fine-tuning foundational models to code diagnoses from veterinary health records | Mayla R. Boguslav et.al. | 2410.15186 | null |
2024-10-19 | Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs | Xiaocheng Zhang et.al. | 2410.15135 | null |
2024-10-19 | LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Xuechen Guo et.al. | 2410.15074 | null |
2024-10-18 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz et.al. | 2410.14763 | link |
2024-10-18 | Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning | Jialu Tang et.al. | 2410.14464 | null |
2024-10-18 | ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM | Songheng Zhang et.al. | 2410.14331 | null |
2024-10-18 | LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | Yujun Zhou et.al. | 2410.14182 | null |
2024-10-17 | RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs | Jiatan Huang et.al. | 2410.13987 | null |
2024-10-17 | HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in Real-World Multilingual Settings | Varun Gumma et.al. | 2410.13671 | null |
2024-10-17 | MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | Yakun Zhu et.al. | 2410.13610 | null |
2024-10-17 | Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? | Che Liu et.al. | 2410.13523 | null |
2024-10-17 | MedINST: Meta Dataset of Biomedical Instructions | Wenhan Han et.al. | 2410.13458 | link |
2024-10-17 | Augmentation Policy Generation for Image Classification Using Large Language Models | Ant Duru et.al. | 2410.13453 | null |
2024-10-17 | Representation Learning of Structured Data for Medical Foundation Models | Vijay Prakash Dwivedi et.al. | 2410.13351 | null |
2024-10-17 | CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | Mian Zhang et.al. | 2410.13218 | null |
2024-10-17 | LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch | Caigao Jiang et.al. | 2410.13213 | link |
2024-10-18 | MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Zonghai Yao et.al. | 2410.13191 | link |
2024-10-16 | Leveraging LLMs for Translating and Classifying Mental Health Data | Konstantinos Skianis et.al. | 2410.12985 | null |
2024-10-16 | AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning | Mohammad Reza Rezaei et.al. | 2410.12886 | link |
2024-10-13 | IMAS: A Comprehensive Agentic Approach to Rural Healthcare Delivery | Agasthya Gangavarapu et.al. | 2410.12868 | link |
2024-10-11 | LLMD: A Large Language Model for Interpreting Longitudinal Medical Records | Robert Porter et.al. | 2410.12860 | null |
2024-10-11 | Large Language Models for Medical OSCE Assessment: A Novel Approach to Transcript Analysis | Ameer Hamza Shakur et.al. | 2410.12858 | null |
2024-10-10 | Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions | Per Niklas Waaler et.al. | 2410.12848 | null |
2024-10-17 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Mohamad Abdi et.al. | 2410.12686 | null |
2024-10-17 | MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration | Jinjie Wei et.al. | 2410.12532 | null |
2024-10-16 | Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation | Zerui Xu et.al. | 2410.12476 | null |
2024-10-06 | SouLLMate: An Adaptive LLM-Driven System for Advanced Mental Health Support and Assessment, Based on a Systematic Application Survey | Qiming Guo et.al. | 2410.11859 | null |
2024-10-15 | Y-Mol: A Multiscale Biomedical Knowledge-Guided Large Language Model for Drug Development | Tengfei Ma et.al. | 2410.11550 | null |
2024-10-15 | AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data | Xinjie Zhao et.al. | 2410.11531 | null |
2024-10-15 | HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications | Weijie Xu et.al. | 2410.11239 | null |
2024-10-13 | 3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation | Hongxin Ding et.al. | 2410.10901 | null |
2024-10-08 | Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging | Ryota Tozuka et.al. | 2410.10869 | null |
2024-10-08 | CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept | YuXuan Wu et.al. | 2410.10866 | null |
2024-10-06 | Mitigating Hallucinations Using Ensemble of Knowledge Graph and Vector Store in Large Language Models to Enhance Mental Health Support | Abdul Muqtadir et.al. | 2410.10853 | null |
2024-10-06 | On the Reliability of Large Language Models to Misinformed and Demographically-Informed Prompts | Toluwani Aremu et.al. | 2410.10850 | link |
2024-10-14 | Thinking LLMs: General Instruction Following with Thought Generation | Tianhao Wu et.al. | 2410.10630 | null |
2024-10-14 | Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Guorui Zheng et.al. | 2410.10626 | link |
2024-10-14 | MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media | Wei Zhai et.al. | 2410.10323 | link |
2024-10-13 | Adaptive Reasoning and Acting in Medical Language Agents | Abhishek Dutta et.al. | 2410.10020 | null |
2024-10-15 | MisinfoEval: Generative AI in the Era of “Alternative Facts” | Saadia Gabriel et.al. | 2410.09949 | null |
2024-10-13 | Equitable Access to Justice: Logical LLMs Show Promise | Manuj Kant et.al. | 2410.09904 | null |
2024-10-13 | MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | Tavish Mankash et.al. | 2410.09729 | null |
2024-10-12 | Society of Medical Simplifiers | Chen Lyu et.al. | 2410.09631 | null |
2024-10-12 | Enhanced Electronic Health Records Text Summarization Using Large Language Models | Ruvarashe Madzime et.al. | 2410.09628 | null |
2024-10-11 | Fine-Tuning In-House Large Language Models to Infer Differential Diagnosis from Radiology Reports | Luoyao Chen et.al. | 2410.09234 | null |
2024-10-04 | Leveraging Social Determinants of Health in Alzheimer’s Research Using LLM-Augmented Literature Mining and Knowledge Graphs | Tianqi Shang et.al. | 2410.09080 | link |
2024-10-11 | Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness | Yu He Ke et.al. | 2410.08431 | null |
2024-10-10 | Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions | Kuleen Sasse et.al. | 2410.07951 | null |
2024-10-09 | MoDEM: Mixture of Domain Expert Models | Toby Simonds et.al. | 2410.07490 | null |
2024-10-16 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129 | null |
2024-10-09 | Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback | Dennis Hein et.al. | 2410.07025 | null |
2024-10-09 | Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare | Pardis Sadat Zahraei et.al. | 2410.06566 | null |
2024-10-08 | Exploring Large Language Models Through a Neurodivergent Lens: Use, Challenges, Community-Driven Workarounds, and Concerns | Buse Carik et.al. | 2410.06336 | null |
2024-10-08 | Linking Code and Documentation Churn: Preliminary Analysis | Ani Hovhannisyan et.al. | 2410.05992 | null |
2024-10-10 | KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server | Wenhao Wang et.al. | 2410.05725 | link |
2024-10-10 | Copiloting Diagnosis of Autism in Real Clinical Scenarios via LLMs | Yi Jiang et.al. | 2410.05684 | null |
2024-10-07 | RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction | Yuwei Zhang et.al. | 2410.05361 | null |
2024-10-14 | Mitigating the Risk of Health Inequity Exacerbated by Large Language Models | Yuelyu Ji et.al. | 2410.05180 | null |
2024-10-07 | Rule-based Data Selection for Large Language Models | Xiaomin Li et.al. | 2410.04715 | null |
2024-10-07 | Knowledge Graph Based Agent for Complex, Knowledge-Intensive QA in Medicine | Xiaorui Su et.al. | 2410.04660 | null |
2024-10-06 | CardioAI: A Multimodal AI-based System to Support Symptom Monitoring and Risk Detection of Cancer Treatment-Induced Cardiotoxicity | Siyi Wu et.al. | 2410.04592 | null |
2024-10-06 | Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Pengcheng Jiang et.al. | 2410.04585 | link |
2024-10-06 | MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Lai Wei et.al. | 2410.04521 | link |
2024-10-06 | Latent Feature Mining for Predictive Model Enhancement with Large Language Models | Bingxuan Li et.al. | 2410.04347 | null |
2024-10-05 | RoQLlama: A Lightweight Romanian Adapted Language Model | George-Andrei Dima et.al. | 2410.04269 | null |
2024-10-05 | DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech | Dominika Woszczyk et.al. | 2410.04188 | null |
2024-10-05 | Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment | Chengfeng Dou et.al. | 2410.04112 | null |
2024-10-04 | Searching for Best Practices in Medical Transcription with Large Language Model | Jiafeng Li et.al. | 2410.03797 | link |
2024-10-01 | Towards Democratization of Subspeciality Medical Expertise | Jack W. O’Sullivan et.al. | 2410.03741 | null |
2024-10-01 | Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model | Aidan Gilson et.al. | 2410.03740 | null |
2024-10-04 | Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) | Abrar Rahman et.al. | 2410.03568 | null |
2024-10-04 | CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios | Zetian Ouyang et.al. | 2410.03502 | link |
2024-10-04 | Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity | Hyosoon Jang et.al. | 2410.03138 | null |
2024-10-04 | Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models | Yan Chen et.al. | 2410.03134 | null |
2024-10-04 | Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks | Grant Wardle et.al. | 2410.03062 | null |
2024-10-03 | HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router | Lingrui Mei et.al. | 2410.02684 | link |
2024-10-03 | ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration | Zixiang Wang et.al. | 2410.02551 | null |
2024-10-04 | MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation | Gurucharan Marthi Krishna Kumar et.al. | 2410.02458 | null |
2024-10-02 | Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | Yuan Zhou et.al. | 2410.02026 | null |
2024-09-27 | A GEN AI Framework for Medical Note Generation | Hui Yi Leong et.al. | 2410.01841 | null |
2024-10-02 | DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning | Yebowen Hu et.al. | 2410.01772 | null |
2024-10-03 | Practicing Stress Relief for the Everyday: Designing Social Simulation Using VR, AR, and LLMs | Anna Fang et.al. | 2410.01672 | null |
2024-10-02 | MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Zonghai Yao et.al. | 2410.01553 | link |
2024-10-01 | FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Peiran Wu et.al. | 2410.01089 | null |
2024-10-01 | Deceptive Risks in LLM-enhanced Robots | Robert Ranisch et.al. | 2410.00434 | null |
2024-10-01 | CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | Xiao Wang et.al. | 2410.00379 | link |
2024-10-01 | Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis | Chun-Hsiao Yeh et.al. | 2410.00292 | null |
2024-09-30 | A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification | Marina Ribeiro et.al. | 2410.00250 | null |
2024-09-30 | EEG Emotion Copilot: Pruning LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation | Hongyu Chen et.al. | 2410.00166 | null |
2024-09-30 | Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation | Pedro Henrique Paiola et.al. | 2410.00163 | null |
2024-09-30 | Ranking Over Scoring: Towards Reliable and Robust Automated Evaluation of LLM-Generated Medical Explanatory Arguments | Iker De la Iglesia et.al. | 2409.20565 | null |
2024-09-30 | Wait, but Tylenol is Acetaminophen… Investigating and Improving Language Models’ Ability to Resist Requests for Misinformation | Shan Chen et.al. | 2409.20385 | null |
2024-09-30 | Classification of Radiological Text in Small and Imbalanced Datasets in a Non-English Language | Vincent Beliveau et.al. | 2409.20147 | link |
2024-10-01 | See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning | Chengxin Zheng et.al. | 2409.19676 | link |
2024-09-29 | MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models | Vibhor Agarwal et.al. | 2409.19492 | null |
2024-10-11 | HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations | Ziyu Wang et.al. | 2409.19487 | null |
2024-09-28 | INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning | Pablo Romero et.al. | 2409.19467 | link |
2024-09-27 | Confidential Prompting: Protecting User Prompts from Cloud LLM Providers | In Gim et.al. | 2409.19134 | link |
2024-09-27 | Secure Multiparty Generative AI | Manil Shrestha et.al. | 2409.19120 | null |
2024-09-27 | Outlining the Borders for LLM Applications in Patient Education: Developing an Expert-in-the-Loop LLM-Powered Chatbot for Prostate Cancer Patient Education | Yuexing Hao et.al. | 2409.19100 | null |
2024-10-01 | AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow | Huizi Yu et.al. | 2409.18924 | null |
2024-09-27 | Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | Aditi Godbole et.al. | 2409.18454 | null |
2024-09-26 | Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model | Chuang Niu et.al. | 2409.18319 | link |
2024-09-26 | Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams | Yuexing Hao et.al. | 2409.18290 | link |
2024-09-26 | Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT | Natthanaphop Isaradech et.al. | 2409.17683 | null |
2024-09-26 | Digital Twin Ecosystem for Oncology Clinical Operations | Himanshu Pandey et.al. | 2409.17650 | null |
2024-09-26 | ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue | Zhangpu Li et.al. | 2409.17610 | null |
2024-09-26 | A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models | Syed Affan Daimi et.al. | 2409.17581 | link |
2024-09-26 | Dr. GPT in Campus Counseling: Understanding Higher Education Students’ Opinions on LLM-assisted Mental Health Services | Owen Xingjian Zhang et.al. | 2409.17572 | null |
2024-09-26 | Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Xun Zhu et.al. | 2409.17508 | link |
2024-09-25 | Severity Prediction in Mental Health: LLM-based Creation, Analysis, Evaluation of a Novel Multilingual Dataset | Konstantinos Skianis et.al. | 2409.17397 | null |
2024-09-25 | Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia | Azmul Asmar Irfan et.al. | 2409.17054 | null |
2024-09-25 | The Role of Language Models in Modern Healthcare: A Comprehensive Review | Amna Khalid et.al. | 2409.16860 | null |
2024-10-04 | “It Explains What I am Currently Going Through Perfectly to a Tee”: Understanding User Perceptions on LLM-Enhanced Narrative Interventions | Ananya Bhattacharjee et.al. | 2409.16732 | null |
2024-09-25 | In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results | Mike Thelwall et.al. | 2409.16695 | null |
2024-09-25 | Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels | Yishu Wei et.al. | 2409.16563 | null |
2024-09-24 | Design and Evaluation of a CDSS for Drug Allergy Management Using LLMs and Pharmaceutical Data Integration | Gabriele De Vito et.al. | 2409.16395 | null |
2024-09-24 | CHBench: A Chinese Dataset for Evaluating Health in Large Language Models | Chenlu Guo et.al. | 2409.15766 | link |
2024-09-24 | XTRUST: On the Multilingual Trustworthiness of Large Language Models | Yahan Li et.al. | 2409.15762 | link |
2024-09-24 | A Comprehensive Evaluation of Large Language Models on Mental Illnesses | Abdelrahman Hanafi et.al. | 2409.15687 | null |
2024-09-23 | Voice Assistants for Health Self-Management: Designing for and with Older Adults | Amama Mahmood et.al. | 2409.15488 | null |
2024-09-20 | Prompting Large Language Models for Supporting the Differential Diagnosis of Anemia | Elisa Castagnari et.al. | 2409.15377 | null |
2024-09-23 | A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | Yunfei Xie et.al. | 2409.15277 | null |
2024-09-23 | Generative AI Is Not Ready for Clinical Use in Patient Education for Lower Back Pain Patients, Even With Retrieval-Augmented Generation | Yi-Fei Zhao et.al. | 2409.15260 | null |
2024-09-24 | PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models | Zhiyuan Wang et.al. | 2409.15188 | link |
2024-09-23 | Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies | Skatje Myers et.al. | 2409.15163 | null |
2024-09-23 | Boosting Healthcare LLMs Through Retrieved Context | Jordi Bayarri-Planas et.al. | 2409.15127 | link |
2024-09-20 | Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory | Kunyao Lan et.al. | 2409.15084 | null |
2024-09-23 | Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs | Clément Christophe et.al. | 2409.14988 | null |
2024-09-23 | Knowledge Planning in Large Language Models for Domain-Aligned Counseling Summarization | Aseem Srivastava et.al. | 2409.14907 | null |
2024-09-24 | Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding | Bokang Bi et.al. | 2409.14638 | null |
2024-09-22 | Can Large Language Models Logically Predict Myocardial Infarction? Evaluation based on UK Biobank Cohort | Yuxing Zhi et.al. | 2409.14478 | null |
2024-09-22 | PretextTrans: Investigating Medical Factual Knowledge Mastery of LLMs with Predicate-text Dual Transformation | Yuxuan Zhou et.al. | 2409.14302 | null |
2024-09-21 | Current Trends and Future Directions for Sexual Health Conversational Agents (CAs) for Youth: A Scoping Review | Jinkyung Katie Park et.al. | 2409.14226 | null |
2024-09-20 | Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology | Aidan Gilson et.al. | 2409.13902 | null |
2024-09-20 | Transfer Learning with Clinical Concept Embeddings from Large Language Models | Yuhe Gao et.al. | 2409.13893 | null |
2024-09-11 | A Simplified Retriever to Improve Accuracy of Phenotype Normalizations by Large Language Models | Daniel B. Hier et.al. | 2409.13744 | null |
2024-09-20 | Recent Advancement of Emotion Cognition in Large Language Models | Yuyan Chen et.al. | 2409.13354 | null |
2024-09-20 | SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation | Jinge Wu et.al. | 2409.13321 | link |
2024-09-20 | An adapted large language model facilitates multiple medical tasks in diabetes care | Lai Wei et.al. | 2409.13191 | link |
2024-09-19 | A New Perspective on ADHD Research: Knowledge Graph Construction with LLMs and Network Based Insights | Hakan T. Otal et.al. | 2409.12853 | link |
2024-09-20 | Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization | Thomas Savage et.al. | 2409.12741 | null |
2024-09-11 | Semantic Interoperability on Blockchain by Generating Smart Contracts Based on Knowledge Graphs | William Van Woensel et.al. | 2409.12171 | null |
2024-09-19 | Using Large Language Models to Generate Clinical Trial Tables and Figures | Yumeng Yang et.al. | 2409.12046 | null |
2024-09-20 | Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources | Issey Sukeda et.al. | 2409.11783 | link |
2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E-Jannat et.al. | 2409.11375 | null |
2024-09-17 | ASHABot: An LLM-Powered Chatbot to Support the Informational Needs of Community Health Workers | Pragnya Ramjee et.al. | 2409.10913 | null |
2024-09-16 | GPT takes the SAT: Tracing changes in Test Difficulty and Math Performance of Students | Vikram Krishnaveti et.al. | 2409.10750 | null |
2024-09-15 | Veridical Data Science for Medical Foundation Models | Ahmed Alaa et.al. | 2409.10580 | null |
2024-09-14 | On the limits of agency in agent-based models | Ayush Chopra et.al. | 2409.10568 | link |
2024-09-16 | DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction | John Wu et.al. | 2409.10504 | null |
2024-09-17 | Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | Bhuvan Sachdeva et.al. | 2409.10354 | null |
2024-09-16 | LLMs for clinical risk prediction | Mohamed Rezk et.al. | 2409.10191 | null |
2024-09-16 | MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM | Sijie Ji et.al. | 2409.10064 | null |
2024-09-18 | HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making | Sumera Anjum et.al. | 2409.10011 | link |
2024-09-15 | GP-GPT: Large Language Model for Gene-Phenotype Mapping | Yanjun Lyu et.al. | 2409.09825 | null |
2024-09-15 | AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs | Madhusudan Ghosh et.al. | 2409.09704 | link |
2024-09-17 | ExploreSelf: Fostering User-driven Exploration and Reflection on Personal Challenges with Adaptive Guidance by Large Language Models | Inhwa Song et.al. | 2409.09662 | null |
2024-09-15 | MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences | Subigya Nepal et.al. | 2409.09570 | null |
2024-09-14 | Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation | Hui Yi Leong et.al. | 2409.09324 | null |
2024-09-24 | Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases | Mercy Asiedu et.al. | 2409.09201 | null |
2024-09-13 | Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation | Cheng Charles Ma et.al. | 2409.09135 | null |
2024-08-30 | OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography | Youzhu Jin et.al. | 2409.09052 | null |
2024-09-13 | Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes | Luis Rita et.al. | 2409.08792 | null |
2024-09-13 | Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling | Jialu Tang et.al. | 2409.08788 | null |
2024-09-13 | Eir: Thai Medical Large Language Models | Yutthakorn Thiprak et.al. | 2409.08523 | null |
2024-09-11 | Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation | Gavin Butts et.al. | 2409.07424 | null |
2024-09-11 | MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | Praveen K Kanithi et.al. | 2409.07314 | null |
2024-09-11 | Reranking Laws for Language Generation: A Communication-Theoretic Perspective | António Farinhas et.al. | 2409.07131 | null |
2024-09-10 | MAGDA: Multi-agent guideline-driven diagnostic assistance | David Bani-Harouni et.al. | 2409.06351 | null |
2024-09-10 | Can Large Language Models Unlock Novel Scientific Research Ideas? | Sandeep Kumar et.al. | 2409.06185 | link |
2024-09-10 | Deep Learning and Large Language Models for Audio and Text Analysis in Predicting Suicidal Acts in Chinese Psychological Support Hotlines | Yining Chen et.al. | 2409.06164 | link |
2024-09-09 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou et.al. | 2409.05732 | null |
2024-09-09 | The Influence of Task and Group Disparities over Users’ Attitudes Toward Using Large Language Models for Psychotherapy | Qihang He et.al. | 2409.05703 | null |
2024-09-09 | KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models | Yingshu Li et.al. | 2409.05370 | null |
2024-09-06 | Toward LLM-Powered Social Robots for Supporting Sensitive Disclosures of Stigmatized Health Conditions | Alemitu Bezabih et.al. | 2409.04508 | null |
2024-09-06 | Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials | Yizhen Zheng et.al. | 2409.04481 | null |
2024-09-06 | Towards Safer Online Spaces: Simulating and Assessing Intervention Strategies for Eating Disorder Discussions | Louis Penafiel et.al. | 2409.04043 | null |
2024-09-05 | CACER: Clinical Concept Annotations for Cancer Events and Relations | Yujuan Fu et.al. | 2409.03905 | link |
2024-09-05 | LLM-based event abstraction and integration for IoT-sourced logs | Mohsen Shirali et.al. | 2409.03478 | link |
2024-09-05 | Rx Strategist: Prescription Verification using LLM Agents System | Phuc Phan Van et.al. | 2409.03440 | null |
2024-09-05 | Leveraging Large Language Models through Natural Language Processing to provide interpretable Machine Learning predictions of mental deterioration in real time | Francisco de Arriba-Pérez et.al. | 2409.03375 | null |
2024-09-05 | Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration | Jeremy Qin et.al. | 2409.03225 | link |
2024-09-04 | Understanding eGFR Trajectories and Kidney Function Decline via Large Multimodal Models | Chih-Yuan Li et.al. | 2409.02530 | null |
2024-09-03 | Therapy as an NLP Task: Psychologists’ Comparison of LLMs and Human Peers in CBT | Zainab Iftikhar et.al. | 2409.02244 | null |
2024-09-03 | Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation | Jack Krolik et.al. | 2409.01941 | null |
2024-09-03 | Training on the Benchmark Is Not All You Need | Shiwen Ni et.al. | 2409.01790 | link |
2024-09-03 | It is Time to Develop an Auditing Framework to Promote Value Aware Chatbots | Yanchen Wang et.al. | 2409.01539 | link |
2024-09-02 | DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models | Rajat Rawat et.al. | 2409.01497 | null |
2024-09-01 | Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering | Derian Boer et.al. | 2409.00861 | link |
2024-09-01 | Building FKG.in: a Knowledge Graph for Indian Food | Saransh Kumar Gupta et.al. | 2409.00830 | null |
2024-08-31 | Large Language Models-Enabled Digital Twins for Precision Medicine in Rare Gynecological Tumors | Jacqueline Lammert et.al. | 2409.00544 | link |
2024-08-31 | Chatting Up Attachment: Using LLMs to Predict Adult Bonds | Paulo Soares et.al. | 2409.00347 | null |
2024-08-29 | A Survey for Large Language Models in Biomedicine | Chong Wang et.al. | 2409.00133 | null |
2024-08-27 | Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy | Daniil Filienko et.al. | 2409.00112 | null |
2024-08-27 | Large Language Models for Disease Diagnosis: A Scoping Review | Shuang Zhou et.al. | 2409.00097 | null |
2024-09-04 | Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Seyed Amir Ahmad Safavi-Naini et.al. | 2409.00084 | link |
2024-08-30 | NDP: Next Distribution Prediction as a More Broad Target | Junhao Ruan et.al. | 2408.17377 | null |
2024-08-29 | Instruction-tuned Large Language Models for Machine Translation in the Medical Domain | Miguel Rios et.al. | 2408.16440 | null |
2024-08-29 | Enhancing AI-Driven Psychological Consultation: Layered Prompts with Large Language Models | Rafael Souza et.al. | 2408.16276 | null |
2024-08-29 | M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation | Jonggwon Park et.al. | 2408.16213 | null |
2024-08-28 | Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | Huachuan Qiu et.al. | 2408.15787 | link |
2024-08-28 | A Survey on Evaluation of Multimodal Large Language Models | Jiaxing Huang et.al. | 2408.15769 | null |
2024-08-26 | Improving Clinical Note Generation from Complex Doctor-Patient Conversation | Yizhan Li et.al. | 2408.14568 | null |
2024-09-06 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et.al. | 2408.14418 | null |
2024-09-03 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-25 | Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | Felix J. Dorfner et.al. | 2408.13833 | null |
2024-08-25 | Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models | Duy Khoa Pham et.al. | 2408.13808 | null |
2024-08-23 | IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models | Zhihao Yu et.al. | 2408.13073 | link |
2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
2024-08-23 | Grounding Fallacies Misrepresenting Scientific Publications in Evidence | Max Glockner et.al. | 2408.12812 | link |
2024-08-22 | RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | Xiaohan Wang et.al. | 2408.12579 | null |
2024-09-05 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu et.al. | 2408.12547 | link |
2024-08-22 | MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | Hao Wei et.al. | 2408.12496 | null |
2024-08-22 | Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | Kai Tzu-iunn Ong et.al. | 2408.12315 | null |
2024-08-22 | LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction | Aishik Nagar et.al. | 2408.12249 | null |
2024-08-22 | MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient | Yanzeng Li et.al. | 2408.12236 | null |
2024-08-22 | Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards | Shresth Verma et.al. | 2408.12112 | null |
2024-08-22 | Aligning (Medical) LLMs for (Counterfactual) Fairness | Raphael Poulain et.al. | 2408.12055 | link |
2024-08-21 | Exploring Large Language Models for Feature Selection: A Data-centric Perspective | Dawei Li et.al. | 2408.12025 | null |
2024-08-16 | Speaking the Same Language: Leveraging LLMs in Standardizing Clinical Data for AI | Arindam Sett et.al. | 2408.11861 | null |
2024-08-15 | When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? | Yanjun Gao et.al. | 2408.11854 | null |
2024-08-13 | MGH Radiology Llama: A Llama 3 70B Model for Radiology | Yucheng Shi et.al. | 2408.11848 | null |
2024-09-01 | Clinical Insights: A Comprehensive Review of Language Models in Medicine | Nikita Neveditsin et.al. | 2408.11735 | null |
2024-08-21 | BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports | Yuxuan Chen et.al. | 2408.11334 | null |
2024-08-21 | Probabilistic Medical Predictions of Large Language Models | Bowen Gu et.al. | 2408.11316 | null |
2024-08-21 | Applying and Evaluating Large Language Models in Mental Health Care: A Scoping Review of Human-Assessed Generative Tasks | Yining Hua et.al. | 2408.11288 | null |
2024-08-21 | BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation | Haotian Peng et.al. | 2408.11281 | link |
2024-08-20 | Public Health in Disaster: Emotional Health and Life Incidents Extraction during Hurricane Harvey | Thomas Hoang et.al. | 2408.11133 | null |
2024-08-20 | CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | Michael Reinisch et.al. | 2408.10995 | null |
2024-08-20 | Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology | Yihao Hou et.al. | 2408.10715 | null |
2024-08-20 | Large Language Models for Multimodal Deformable Image Registration | Mingrui Ma et.al. | 2408.10703 | link |
2024-08-19 | Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory | Haoran Li et.al. | 2408.10053 | null |
2024-08-29 | MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis | Ruihui Hou et.al. | 2408.10039 | null |
2024-08-19 | Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions | Sebastian Heineking et.al. | 2408.09831 | link |
2024-08-19 | R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation | Xiao Wang et.al. | 2408.09743 | link |
2024-08-18 | Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities | Minh Duc Chu et.al. | 2408.09366 | null |
2024-08-17 | TC-RAG: Turing-Complete RAG’s Case study on Medical LLM Systems | Xinke Jiang et.al. | 2408.09199 | link |
2024-08-17 | AI Managed Emergency Documentation with a Pretrained Model | David Menzies et.al. | 2408.09193 | null |
2024-08-16 | Improving VTE Identification through Language Models from Radiology Reports: A Comparative Study of Mamba, Phi-3 Mini, and BERT | Jamie Deng et.al. | 2408.09043 | null |
2024-08-16 | HSDreport: Heart Sound Diagnosis with Echocardiography Reports | Zihan Zhao et.al. | 2408.08669 | null |
2024-08-16 | RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions | Gregory Kell et.al. | 2408.08624 | link |
2024-08-15 | Assessing and Enhancing Large Language Models in Rare Disease Question-answering | Guanchu Wang et.al. | 2408.08422 | null |
2024-08-15 | LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Jiajie Li et.al. | 2408.07981 | null |
2024-08-15 | The doctor will polygraph you now: ethical concerns with AI for fact-checking patients | James Anibal et.al. | 2408.07896 | null |
2024-08-15 | Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering | Yushi Yang et.al. | 2408.07888 | link |
2024-08-14 | MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis | Nimeesha Chan et.al. | 2408.07773 | link |
2024-08-27 | Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | Seungjun Han et.al. | 2408.07531 | null |
2024-08-14 | Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health | Yongquan Hu et.al. | 2408.07313 | null |
2024-07-24 | Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition | Michele Fiori et.al. | 2408.06352 | null |
2024-08-12 | Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM | Trisha Das et.al. | 2408.06285 | null |
2024-08-12 | Med42-v2: A Suite of Clinical LLMs | Clément Christophe et.al. | 2408.06142 | null |
2024-08-10 | Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction | Jung Hoon Lim et.al. | 2408.05555 | null |
2024-08-16 | RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records | Sangjoon Park et.al. | 2408.05074 | null |
2024-08-08 | Hybrid Student-Teacher Large Language Model Refinement for Cancer Toxicity Symptom Extraction | Reza Khanmohammadi et.al. | 2408.04775 | null |
2024-08-08 | Dynamic Fog Computing for Enhanced LLM Execution in Medical Applications | Philipp Zagar et.al. | 2408.04680 | null |
2024-08-03 | Building Trust in Mental Health Chatbots: Safety Metrics and LLM-Based Evaluation Tools | Jung In Park et.al. | 2408.04650 | null |
2024-08-08 | Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Junde Wu et.al. | 2408.04187 | link |
2024-08-08 | Academic collaboration on large language model studies increases overall but varies across disciplines | Lingyao Li et.al. | 2408.04163 | link |
2024-08-08 | Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering | Haoran Yu et.al. | 2408.04138 | null |
2024-08-07 | Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology | Panagiotis Fytas et.al. | 2408.04121 | null |
2024-08-07 | Towards Multimodal Emotional Support Conversation Systems | Yuqi Chu et.al. | 2408.03650 | link |
2024-08-06 | Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation | Artur Guimarães et.al. | 2408.03127 | link |
2024-08-06 | Targeted Visual Prompting for Medical Visual Question Answering | Sergio Tascon-Morales et.al. | 2408.03043 | link |
2024-08-06 | Fact Finder – Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs | Daniel Steinigen et.al. | 2408.03010 | link |
2024-08-07 | Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval | Iman Azimi et.al. | 2408.02964 | link |
2024-08-04 | MedSyn: LLM-based Synthetic Medical Text Generation Framework | Gleb Kumichev et.al. | 2408.02056 | link |
2024-08-06 | DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models | Bowen Wang et.al. | 2408.01933 | link |
2024-08-03 | MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | Jihye Choi et.al. | 2408.01869 | link |
2024-07-27 | AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools | Aditya Paul et.al. | 2408.01459 | null |
2024-08-02 | The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models | Hannah Chen et.al. | 2408.01285 | null |
2024-08-05 | Agentic LLM Workflows for Generating Patient-Friendly Medical Reports | Malavikha Sudarshan et.al. | 2408.01112 | link |
2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | link |
2024-07-25 | Closing the gap between open-source and commercial large language models for medical evidence summarization | Gongbo Zhang et.al. | 2408.00588 | null |
2024-07-31 | A Taxonomy of Stereotype Content in Large Language Models | Gandalf Nicolas et.al. | 2408.00162 | null |
2024-07-31 | A Course Shared Task on Evaluating LLM Output for Clinical Questions | Yufang Hou et.al. | 2408.00122 | link |
2024-07-24 | Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications | Cui Long et.al. | 2407.21055 | null |
2024-07-23 | An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice | Roma Shusterman et.al. | 2407.21051 | null |
2024-08-12 | Artificial Intelligence in Extracting Diagnostic Data from Dental Records | Yao-Shun Chuang et.al. | 2407.21050 | null |
2024-07-30 | Can LLMs be Fooled? Investigating Vulnerabilities in LLMs | Sara Abdali et.al. | 2407.20529 | null |
2024-07-29 | Exploring Large Language Models to generate Easy to Read content | Paloma Martínez et.al. | 2407.20046 | null |
2024-07-30 | CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare | Jingwei Zhu et.al. | 2407.19705 | link |
2024-07-28 | A Generic Review of Integrating Artificial Intelligence in Cognitive Behavioral Therapy | Meng Jiang et.al. | 2407.19422 | null |
2024-07-27 | The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations | Thanh-Dung Le et.al. | 2407.19299 | null |
2024-07-27 | Multi-Modal CLIP-Informed Protein Editing | Mingze Yin et.al. | 2407.19296 | null |
2024-07-27 | Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review | Tongyue Shi et.al. | 2407.19256 | null |
2024-07-26 | Large Language Models as Co-Pilots for Causal Inference in Medical Studies | Ahmed Alaa et.al. | 2407.19118 | null |
2024-07-26 | Towards Automated Solution Recipe Generation for Industrial Asset Management with LLM | Nianjun Zhou et.al. | 2407.18992 | null |
2024-07-26 | Is larger always better? Evaluating and prompting large language models for non-generative medical tasks | Yinghao Zhu et.al. | 2407.18525 | link |
2024-07-24 | Online Social Network Data-Driven Early Detection on Short-Form Video Addiction | Fang-Yu Kuo et.al. | 2407.18277 | null |
2024-07-25 | The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | Eric Yang et.al. | 2407.18044 | null |
2024-08-15 | The Power of Combining Data and Knowledge: GPT-4o is an Effective Interpreter of Machine Learning Models in Predicting Lymph Node Metastasis of Lung Cancer | Danqing Hu et.al. | 2407.17900 | null |
2024-07-25 | Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy? | Hao Shen et.al. | 2407.17730 | null |
2024-07-24 | IgnitionInnovators at “Discharge Me!”: Chain-of-Thought Instruction Finetuning Large Language Models for Discharge Summaries | An Quang Tang et.al. | 2407.17636 | link |
2024-07-24 | SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH) | Bernardo Consoli et.al. | 2407.17126 | null |
2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | link |
2024-07-23 | PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets | Jaeyoung Kim et.al. | 2407.16329 | null |
2024-07-23 | Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks | Yao-Shun Chuang et.al. | 2407.16166 | link |
2024-07-16 | Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis | Qiuhong Wei et.al. | 2407.15862 | null |
2024-07-21 | A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech | Gaurav Verma et.al. | 2407.15227 | null |
2024-07-19 | CVE-LLM: Automatic vulnerability evaluation in medical device industry using large language models | Rikhiya Ghosh et.al. | 2407.14640 | null |
2024-07-19 | Adversarial Databases Improve Success in Retrieval-based Large Language Models | Sean Wu et.al. | 2407.14609 | null |
2024-07-19 | Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis | Valentin Pelloin et.al. | 2407.14180 | link |
2024-07-28 | Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field | Tobias Kerner et.al. | 2407.14076 | null |
2024-07-19 | Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization | Md Sultan Al Nahian et.al. | 2407.14000 | null |
2024-07-18 | KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration | Youfu Yan et.al. | 2407.13598 | null |
2024-07-18 | Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks | Samy Ateia et.al. | 2407.13511 | link |
2024-07-18 | End-To-End Clinical Trial Matching with Large Language Models | Dyke Ferber et.al. | 2407.13463 | null |
2024-07-18 | CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | Junying Chen et.al. | 2407.13301 | link |
2024-07-18 | TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models | Ling Yue et.al. | 2407.13115 | null |
2024-07-03 | Large Language Model Agents for Improving Engagement with Behavior Change Interventions: Application to Digital Mindfulness | Harsh Kumar et.al. | 2407.13067 | null |
2024-07-17 | Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models | Alexander R. Pelletier et.al. | 2407.12888 | link |
2024-07-06 | Large language models are good medical coders, if provided with tools | Keith Kwan et.al. | 2407.12849 | link |
2024-07-04 | NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions | Andong Hua et.al. | 2407.12843 | null |
2024-07-02 | Lightweight Large Language Model for Medication Enquiry: Med-Pal | Kabilan Elangovan et.al. | 2407.12822 | null |
2024-07-18 | Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions | Marcos Fernández-Pichel et.al. | 2407.12468 | link |
2024-07-17 | MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models | Thao Minh Nguyen Phan et.al. | 2407.12309 | null |
2024-07-17 | A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery | Jike Wang et.al. | 2407.12296 | null |
2024-07-26 | LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation | Bunyamin Keles et.al. | 2407.12126 | null |
2024-06-30 | Evaluation of Bias Towards Medical Professionals in Large Language Models | Xi Chen et.al. | 2407.12031 | null |
2024-07-16 | Schema Matching with Large Language Models: an Experimental Study | Marcel Parciak et.al. | 2407.11852 | link |
2024-07-25 | CCoE: A Compact LLM with Collaboration of Experts | Shaomang Huang et.al. | 2407.11686 | null |
2024-07-16 | Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise | Qimin Yang et.al. | 2407.11536 | null |
2024-07-09 | Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations | Rachael Fleurence et.al. | 2407.11054 | null |
2024-06-25 | Panacea: A foundation model for clinical trial search, summarization, design, and recruitment | Jiacheng Lin et.al. | 2407.11007 | link |
2024-07-15 | Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities | Nhat Le et.al. | 2407.10785 | null |
2024-07-15 | TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction | Xingzhi Zhou et.al. | 2407.10510 | null |
2024-07-15 | Enhancing Medication Recommendation with LLM Text Representation | Yu-Tzu Lee et.al. | 2407.10453 | null |
2024-07-13 | Causality extraction from medical text using Large Language Models (LLMs) | Seethalakshmi Gopalakrishnan et.al. | 2407.10020 | null |
2024-07-13 | PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models | Can Cui et.al. | 2407.09979 | null |
2024-07-12 | Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction | Chase Fensore et.al. | 2407.09688 | link |
2024-07-12 | Open (Clinical) LLMs are Sensitive to Instruction Phrasings | Alberto Mario Ceballos Arroyo et.al. | 2407.09429 | link |
2024-07-12 | STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs | Yiheng Huang et.al. | 2407.09096 | link |
2024-07-11 | Uncertainty Estimation of Large Language Models in Medical Question Answering | Jiaxin Wu et.al. | 2407.08662 | null |
2024-07-11 | Leveraging LLMs to Predict Affective States via Smartphone Sensor Features | Tianyi Zhang et.al. | 2407.08240 | null |
2024-07-11 | DALL-M: Context-Aware Clinical Data Augmentation with LLMs | Chihcheng Hsieh et.al. | 2407.08227 | link |
2024-07-10 | Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing | Ian Steenstra et.al. | 2407.08095 | link |
2024-07-04 | CaseGPT: a case reasoning framework based on language models and retrieval-augmented generation | Rui Yang et.al. | 2407.07913 | null |
2024-07-10 | A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability | Ting Fang Tan et.al. | 2407.07666 | null |
2024-07-10 | Interpretable Differential Diagnosis with Dual-Inference Large Language Models | Shuang Zhou et.al. | 2407.07330 | null |
2024-07-09 | Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges | Emilio Ferrara et.al. | 2407.07196 | null |
2024-07-09 | Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | Inwon Kang et.al. | 2407.07019 | null |
2024-07-09 | End-To-End Causal Effect Estimation from Unstructured Natural Language Data | Nikita Dhawan et.al. | 2407.07018 | null |
2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125 | null |
2024-07-08 | Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs | Sanjeet Singh et.al. | 2407.05887 | link |
2024-07-08 | PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation | Jinpeng Hu et.al. | 2407.05721 | link |
2024-07-07 | CLIMB: A Benchmark of Clinical Bias in Large Language Models | Yubo Zhang et.al. | 2407.05250 | link |
2024-07-06 | Leveraging Task-Specific Knowledge from LLM for Semi-Supervised 3D Medical Image Segmentation | Suruchi Kumari et.al. | 2407.05088 | null |
2024-07-05 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework | Reza Averly et.al. | 2407.04629 | null |
2024-07-05 | Using LLMs to label medical papers according to the CIViC evidence model | Markus Hisch et.al. | 2407.04466 | link |
2024-07-04 | Query-Guided Self-Supervised Summarization of Nursing Notes | Ya Gao et.al. | 2407.04125 | null |
2024-07-04 | Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval | Kazuaki Furumai et.al. | 2407.03585 | null |
2024-07-03 | Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory | Suyeon Lee et.al. | 2407.03103 | link |
2024-07-03 | SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research | Meghal Dani et.al. | 2407.03004 | null |
2024-07-02 | Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms | Viet Cuong Nguyen et.al. | 2407.02662 | null |
2024-07-02 | MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Binxu Li et.al. | 2407.02483 | link |
2024-06-29 | Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare | Forhan Bin Emdad et.al. | 2407.01627 | null |
2024-07-14 | Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles | Ryan Louie et.al. | 2407.00870 | null |
2024-06-30 | Large Language Models Struggle in Token-Level Clinical Named Entity Recognition | Qiuhao Lu et.al. | 2407.00731 | link |
2024-06-29 | Answering real-world clinical questions using large language model based systems | Yen Sia Low et.al. | 2407.00541 | null |
2024-06-29 | ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees | Zhiyuan Wang et.al. | 2407.00499 | link |
2024-06-28 | EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models | João Matos et.al. | 2407.00242 | link |
2024-07-02 | Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | Mahmoud Ibrahim et.al. | 2407.00116 | null |
2024-06-27 | PathAlign: A vision-language model for whole slide images in histopathology | Faruk Ahmed et.al. | 2406.19578 | null |
2024-06-27 | PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models | Cathy Mengying Fang et.al. | 2406.19283 | null |
2024-06-27 | HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Junying Chen et.al. | 2406.19280 | link |
2024-06-26 | Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources | Yiming Li et.al. | 2406.18049 | null |
2024-06-26 | LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them | Wenya Xie et.al. | 2406.18034 | null |
2024-06-26 | Automated Clinical Data Extraction with Knowledge Conditioned LLMs | Diya Li et.al. | 2406.18027 | null |
2024-07-11 | Multi-step Inference over Unstructured Data | Aditya Kalyanpur et.al. | 2406.17987 | null |
2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
2024-07-06 | MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation | Yusheng Liao et.al. | 2406.17484 | link |
2024-06-25 | Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis | Ajan Subramanian et.al. | 2406.16252 | null |
2024-06-23 | Effectiveness of ChatGPT in explaining complex medical reports to patients | Mengxuan Sun et.al. | 2406.15963 | null |
2024-06-22 | Real-time Speech Summarization for Medical Conversations | Khai Le-Duc et.al. | 2406.15888 | link |
2024-06-16 | WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist | Chenyu Ren et.al. | 2406.15474 | null |
2024-06-15 | Mental Disorder Classification via Temporal Representation of Text | Raja Kumar et.al. | 2406.15470 | null |
2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
2024-06-21 | Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction | Jinge Wu et.al. | 2406.15045 | null |
2024-06-21 | MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens | Yongqi Fan et.al. | 2406.15019 | link |
2024-06-21 | Human-AI collectives produce the most accurate differential diagnoses | N. Zöller et.al. | 2406.14981 | link |
2024-06-21 | 70B-parameter large language models in Japanese medical question-answering | Issey Sukeda et.al. | 2406.14882 | null |
2024-06-27 | Efficient Continual Pre-training by Mitigating the Stability Gap | Yiduo Guo et.al. | 2406.14833 | null |
2024-07-01 | ACR: A Benchmark for Automatic Cohort Retrieval | Dung Ngoc Thai et.al. | 2406.14780 | null |
2024-06-20 | A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes | Syed I. Munzir et.al. | 2406.14757 | null |
2024-06-20 | medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs | Mingyi Jia et.al. | 2406.14326 | link |
2024-06-19 | ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World | Weixiang Yan et.al. | 2406.13890 | link |
2024-06-24 | The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis | Marcin Rządeczka et.al. | 2406.13813 | null |
2024-06-19 | Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health | Bo Wen et.al. | 2406.13659 | null |
2024-06-19 | Optimizing Psychological Counseling with Instruction-Tuned Large Language Models | Wenjie Li et.al. | 2406.13617 | null |
2024-06-19 | Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective | David Restrepo et.al. | 2406.13152 | null |
2024-06-18 | Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia | Ankit Aich et.al. | 2406.12687 | null |
2024-06-18 | Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics | Huan Xu et.al. | 2406.12651 | null |
2024-06-20 | Towards a Client-Centered Assessment of LLM Therapists by Client Simulation | Jiashuo Wang et.al. | 2406.12266 | link |
2024-06-18 | Adversarial Attacks on Large Language Models in Medicine | Yifan Yang et.al. | 2406.12259 | null |
2024-06-18 | Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models | Lulu Zhao et.al. | 2406.12182 | null |
2024-06-19 | Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks | Jack Gallifant et.al. | 2406.12066 | link |
2024-06-28 | WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions | Seyedali Mohammadi et.al. | 2406.12058 | link |
2024-06-30 | MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Nikhil Khandekar et.al. | 2406.12036 | link |
2024-06-19 | Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models | Yuqing Wang et.al. | 2406.12033 | link |
2024-06-17 | Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams | Zheheng Luo et.al. | 2406.11328 | link |
2024-06-17 | Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization | Minda Hu et.al. | 2406.11258 | null |
2024-06-16 | RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information | Zhiwei Liu et.al. | 2406.11093 | link |
2024-06-15 | SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task | Ziije Zhong et.al. | 2406.10710 | link |
2024-06-15 | We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation | Palash Moon et.al. | 2406.10561 | null |
2024-06-15 | CancerLLM: A Large Language Model in Cancer Domain | Mingchen Li et.al. | 2406.10459 | null |
2024-06-14 | Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | Olivier Binette et.al. | 2406.10366 | null |
2024-06-14 | A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations | Jinqiang Wang et.al. | 2406.10303 | link |
2024-06-13 | Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark | Chufan Gao et.al. | 2406.10292 | null |
2024-06-11 | Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis | Matteo Esposito et.al. | 2406.10273 | null |
2024-06-14 | Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jiawei Chen et.al. | 2406.10185 | null |
2024-06-14 | CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions | Mingyu Derek Ma et.al. | 2406.09923 | link |
2024-06-13 | Chain-of-Thought (CoT) prompting strategies for medical error detection and correction | Zhaolong Wu et.al. | 2406.09103 | null |
2024-06-13 | Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations | Jun-Woo Kim et.al. | 2406.08718 | null |
2024-06-12 | Large Language Model (LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization | Fengxiao Tang et.al. | 2406.08305 | null |
2024-06-18 | SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature | David Wadden et.al. | 2406.07835 | link |
2024-06-12 | Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images | Che Liu et.al. | 2406.07146 | null |
2024-06-10 | Large language models for generating rules, yay or nay? | Shangeetha Sivasothy et.al. | 2406.06835 | null |
2024-06-10 | Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing | Enshuo Hsu et.al. | 2406.06723 | null |
2024-06-09 | LLM Questionnaire Completion for Automatic Psychiatric Assessment | Gony Rosenman et.al. | 2406.06636 | null |
2024-06-07 | Transforming Dental Diagnostics with Artificial Intelligence: Advanced Integration of ChatGPT and Large Language Models for Patient Care | Masoumeh Farhadi Nia et.al. | 2406.06616 | null |
2024-06-03 | MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering | Robert Osazuwa Ness et.al. | 2406.06573 | null |
2024-06-10 | Towards a Personal Health Large Language Model | Justin Cosentino et.al. | 2406.06474 | null |
2024-06-11 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464 | null |
2024-06-13 | A Large Language Model Pipeline for Breast Cancer Oncology | Tristen Pool et.al. | 2406.06455 | null |
2024-06-10 | Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain | Brian Hu et.al. | 2406.06435 | link |
2024-06-10 | MedExQA: Medical Question Answering Benchmark with Multiple Explanations | Yunsoo Kim et.al. | 2406.06331 | link |
2024-06-10 | Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text | Avijit Mitra et.al. | 2406.06056 | link |
2024-06-10 | Enhancing Food Safety in Supply Chains: The Potential Role of Large Language Models in Preventing Campylobacter Contamination | Asaf Tzachor et.al. | 2406.06049 | null |
2024-06-09 | Zero-Shot End-To-End Spoken Question Answering In Medical Domain | Yanis Labrak et.al. | 2406.05876 | null |
2024-06-09 | MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering | Juraj Vladika et.al. | 2406.05845 | null |
2024-06-08 | Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification | Yunhe Gao et.al. | 2406.05596 | null |
2024-06-07 | TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models | Ping Yu et.al. | 2406.04941 | null |
2024-06-06 | On The Persona-based Summarization of Domain-Specific Documents | Ankan Mullick et.al. | 2406.03986 | link |
2024-06-06 | UltraMedical: Building Specialized Generalists in Biomedicine | Kaiyan Zhang et.al. | 2406.03949 | link |
2024-06-06 | Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As | Eden Avnat et.al. | 2406.03855 | null |
2024-06-06 | A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions | Lei Liu et.al. | 2406.03712 | null |
2024-06-06 | M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Anand Subramanian et.al. | 2406.03699 | link |
2024-06-05 | Missci: Reconstructing Fallacies in Misrepresented Science | Max Glockner et.al. | 2406.03181 | link |
2024-06-05 | MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge | Yuxuan Zhou et.al. | 2406.02919 | link |
2024-06-04 | Multiple Choice Questions and Large Language Models: A Case Study with Fictional Medical Data | Maxime Griot et.al. | 2406.02394 | link |
2024-06-05 | LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing | Maojun Sun et.al. | 2406.02350 | link |
2024-06-04 | Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Martin J. Hetz et.al. | 2406.01428 | null |
2024-06-03 | TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine | Wenjing Yue et.al. | 2406.01126 | null |
2024-06-04 | MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning | Shuyue Stella Li et.al. | 2406.00922 | link |
2024-05-29 | Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study | David Pissarra et.al. | 2406.00062 | null |
2024-05-27 | EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling | Yinghao Zhu et.al. | 2406.00036 | null |
2024-05-22 | KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR | Hajung Kim et.al. | 2406.00014 | null |
2024-05-26 | Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | Xijie Huang et.al. | 2405.20775 | link |
2024-05-31 | GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models | Mohammed-Khalil Ghali et.al. | 2405.20585 | null |
2024-05-30 | PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals | Ruiyi Wang et.al. | 2405.19660 | link |
2024-05-30 | Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router | Akul Goel et.al. | 2405.19631 | null |
2024-05-26 | ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text | Han Yu et.al. | 2405.19366 | link |
2024-05-29 | Reasoning3D – Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
2024-06-03 | PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications | Dingkang Yang et.al. | 2405.19266 | link |
2024-05-28 | Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation | Anjanava Biswas et.al. | 2405.18346 | null |
2024-05-28 | Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | Aryo Pradipta Gema et.al. | 2405.18028 | null |
2024-05-28 | SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions | Juexiao Zhou et.al. | 2405.18004 | null |
2024-05-26 | Augmented Risk Prediction for the Onset of Alzheimer’s Disease from Electronic Health Records with Large Language Models | Jiankun Wang et.al. | 2405.16413 | null |
2024-05-26 | Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions | Man Luo et.al. | 2405.16402 | null |
2024-05-29 | Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data | Yuhao Chen et.al. | 2405.16295 | null |
2024-05-28 | Ensuring Ground Truth Accuracy in Healthcare with the EVINCE framework | Edward Y. Chang et.al. | 2405.15808 | null |
2024-05-27 | Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development | Pranab Sahoo et.al. | 2405.15766 | link |
2024-05-24 | Efficient Reinforcement Learning via Large Language Model-based Search | Siddhant Bhambri et.al. | 2405.15194 | null |
2024-05-24 | Generalizable and Scalable Multistage Biomedical Concept Normalization Leveraging Large Language Models | Nicholas J Dobbins et.al. | 2405.15122 | link |
2024-05-23 | Evaluating Large Language Models for Public Health Classification and Extraction Tasks | Joshua Harris et.al. | 2405.14766 | null |
2024-05-23 | Exploring the use of a Large Language Model for data extraction in systematic reviews: a rapid feasibility study | Lena Schmidt et.al. | 2405.14445 | null |
2024-05-23 | Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation | Zhusi Zhong et.al. | 2405.14113 | link |
2024-05-22 | Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation | Siyi Wu et.al. | 2405.13803 | null |
2024-05-21 | How Reliable AI Chatbots are for Disease Prediction from Patient Complaints? | Ayesha Siddika Nipu et.al. | 2405.13219 | null |
2024-05-20 | Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian | Rohitash Chandra et.al. | 2405.13056 | link |
2024-05-20 | Large Language Models for Medicine: A Survey | Yanxin Zheng et.al. | 2405.13055 | null |
2024-05-12 | Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data | Nan Miles Xi et.al. | 2405.13005 | null |
2024-05-21 | OLAPH: Improving Factuality in Biomedical Long-form Question Answering | Minbyul Jeong et.al. | 2405.12701 | link |
2024-05-21 | Exploration of Masked and Causal Language Modelling for Text Generation | Nicolo Micheletti et.al. | 2405.12630 | null |
2024-05-21 | DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge | Bufang Yang et.al. | 2405.12541 | null |
2024-05-20 | Can AI Relate: Testing Large Language Model Response for Mental Health Support | Saadia Gabriel et.al. | 2405.12021 | link |
2024-05-19 | Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | Zishan Gu et.al. | 2405.11640 | null |
2024-05-18 | Can Public LLMs be used for Self-Diagnosis of Medical Conditions? | Nikil Sharan Prabahar Balasubramanian et.al. | 2405.11407 | null |
2024-05-18 | Automating PTSD Diagnostics in Clinical Interviews: Leveraging Large Language Models for Trauma Assessments | Sichang Tu et.al. | 2405.11178 | null |
2024-05-17 | From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | Jace Grandinetti et.al. | 2405.11040 | null |
2024-05-17 | COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | Dimitrios P. Panagoulias et.al. | 2405.10893 | null |
2024-05-16 | Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification | Jinge Wu et.al. | 2405.10440 | null |
2024-05-14 | PromptMind Team at EHRSQL-2024: Improving Reliability of SQL Generation using Ensemble LLMs | Satya K Gundabathula et.al. | 2405.08839 | null |
2024-05-14 | A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine | Hanguang Xiao et.al. | 2405.08603 | null |
2024-05-14 | PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles | Satya Kesav Gundabathula et.al. | 2405.08373 | null |
2024-05-30 | AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | Samuel Schmidgall et.al. | 2405.07960 | null |
2024-05-13 | Evaluating large language models in medical applications: a survey | Xiaolan Chen et.al. | 2405.07468 | null |
2024-05-10 | A Global Data-Driven Model for The Hippocampus and Nucleus Accumbens of Rat From The Local Field Potential Recordings (LFP) | Maedeh Sadeghi et.al. | 2405.06732 | null |
2024-05-09 | Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses | Gaurav Kumar Gupta et.al. | 2405.06712 | null |
2024-05-08 | Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance | Goran Muric et.al. | 2405.06703 | null |
2024-05-08 | Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks | Chancellor R. Woolsey et.al. | 2405.06695 | null |
2024-05-10 | Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | Mengjia Niu et.al. | 2405.06545 | null |
2024-06-03 | XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare | Fatemeh Nazary et.al. | 2405.06270 | null |
2024-05-09 | Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection | Bhawesh Kumar et.al. | 2405.06093 | null |
2024-05-09 | Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents | Matthew Jörke et.al. | 2405.06061 | null |
2024-05-09 | Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | Shan Chen et.al. | 2405.05506 | link |
2024-05-08 | Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models | Aylin Gunal et.al. | 2405.05060 | null |
2024-05-12 | DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature | Dawei Li et.al. | 2405.04819 | link |
2024-05-08 | Empathy Through Multimodality in Conversational Interfaces | Mahyar Abbasian et.al. | 2405.04777 | null |
2024-05-07 | AffirmativeAI: Towards LGBTQ+ Friendly Audit Frameworks for Large Language Models | Yinru Long et.al. | 2405.04652 | null |
2024-05-07 | D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models | Duygu Altinok et.al. | 2405.04170 | link |
2024-05-14 | ERATTA: Extreme RAG for Table To Answers with Large Language Models | Sohini Roychowdhury et.al. | 2405.03963 | null |
2024-05-08 | How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690 | null |
2024-05-06 | MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline | Mohamed Yaseen Jabarulla et.al. | 2405.03359 | link |
2024-05-06 | Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines | Md Main Uddin Rony et.al. | 2405.03153 | null |
2024-05-22 | A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs) | Lingyao Li et.al. | 2405.03066 | null |
2024-05-05 | Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | Junkai Li et.al. | 2405.02957 | null |
2024-05-05 | Confidential and Protected Disease Classifier using Fully Homomorphic Encryption | Aditya Malik et.al. | 2405.02790 | null |
2024-05-04 | A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare | Thomas Yu Chow Tam et.al. | 2405.02559 | null |
2024-05-03 | MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain | Chao Jiang et.al. | 2405.02144 | null |
2024-05-03 | CRCL at SemEval-2024 Task 2: Simple prompt optimizations | Clément Brutti-Mairesse et.al. | 2405.01942 | link |
2024-05-03 | Aloe: A Family of Fine-tuned Open Healthcare LLMs | Ashwin Kumar Gururajan et.al. | 2405.01886 | null |
2024-05-02 | Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models | Hye Sun Yun et.al. | 2405.01686 | link |
2024-05-22 | Leveraging Prompt-Learning for Structured Information Extraction from Crohn’s Disease Radiology Reports in a Low-Resource Language | Liam Hazan et.al. | 2405.01682 | null |
2024-04-29 | Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model | Seonhee Cho et.al. | 2405.01591 | null |
2024-05-09 | GPT-4 passes most of the 297 written Polish Board Certification Examinations | Jakub Pokrywka et.al. | 2405.01589 | null |
2024-05-02 | Prompt engineering paradigms for medical applications: scoping review and recommendations for better practices | Jamil Zaghir et.al. | 2405.01249 | null |
2024-04-27 | Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study | Dou Liu et.al. | 2405.00728 | null |
2024-04-25 | Large Language Models in Healthcare: A Comprehensive Benchmark | Andrew Liu et.al. | 2405.00716 | link |
2024-04-25 | Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation | Hanyin Wang et.al. | 2405.00715 | link |
2024-04-23 | Interactive Analysis of LLMs using Meaningful Counterfactuals | Furui Cheng et.al. | 2405.00708 | null |
2024-05-15 | “I’m Not Sure, But…”: Examining the Impact of Large Language Models’ Uncertainty Expression on User Reliance and Trust | Sunnie S. Y. Kim et.al. | 2405.00623 | null |
2024-05-01 | Enhancing Surgical Robots with Embodied Intelligence for Autonomous Ultrasound Scanning | Huan Xu et.al. | 2405.00461 | null |
2024-05-01 | DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training | Bhuvanesh Verma et.al. | 2405.00321 | null |
2024-05-06 | Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models | Scott Sumpter et.al. | 2404.19713 | null |
2024-04-29 | It’s Difficult to be Neutral – Human and LLM-based Sentiment Annotation of Patient Comments | Petter Mæhlum et.al. | 2404.18832 | null |
2024-04-29 | Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Hongyi Zhu et.al. | 2404.18746 | null |
2024-04-29 | 6G comprehensive intelligence: network operations and optimization based on Large Language Models | Sifan Long et.al. | 2404.18373 | null |
2024-04-27 | MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch | Nadia Saeed et.al. | 2404.17999 | link |
2024-04-27 | Advancing Healthcare Automation: Multi-Agent Systems for Medical Necessity Justification | Himanshu Pandey et.al. | 2404.17977 | null |
2024-04-27 | Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models | Zhongzhen Huang et.al. | 2404.17897 | null |
2024-04-27 | VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition | Junyi Bian et.al. | 2404.17835 | null |
2024-04-25 | A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs | Christian N. Mayemba et.al. | 2404.16921 | null |
2024-04-25 | Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | Emre Can Acikgoz et.al. | 2404.16621 | link |
2024-04-26 | Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums | Isabelle Lorge et.al. | 2404.16461 | null |
2024-04-25 | LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications | Saranya Krishnamoorthy et.al. | 2404.16294 | link |
2024-04-26 | Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions | Divyansh Agarwal et.al. | 2404.16251 | null |
2024-05-05 | A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry | Yining Huang et.al. | 2404.15777 | null |
2024-04-27 | PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models | Shashi Kant Gupta et.al. | 2404.15549 | null |
2024-04-23 | IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents | Jean-Philippe Corbeil et.al. | 2404.15488 | link |
2024-04-22 | Adaptive Collaboration Strategy for LLMs in Medical Decision Making | Yubin Kim et.al. | 2404.15155 | link |
2024-04-23 | Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Raphael Poulain et.al. | 2404.15149 | link |
2024-04-23 | Med42 – Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches | Clément Christophe et.al. | 2404.14779 | null |
2024-04-23 | CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning | Ling Yue et.al. | 2404.14777 | null |
2024-04-22 | WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models | Ronald Xie et.al. | 2404.14567 | null |
2024-04-22 | WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction | Augustin Toma et.al. | 2404.14544 | null |
2024-04-22 | No General Code of Ethics for All: Ethical Considerations in Human-bot Psycho-counseling | Lizhi Ma et.al. | 2404.14070 | null |
2024-04-20 | “I Wish There Were an AI”: Challenges and AI Potential in Cancer Patient-Provider Communication | Ziqi Yang et.al. | 2404.13409 | null |
2024-04-20 | UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions | Ana-Cristina Rogoz et.al. | 2404.13343 | link |
2024-04-20 | Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions | Soumyadeep Roy et.al. | 2404.13307 | link |
2024-05-03 | LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models | Mouhamed Amine Bouchiha et.al. | 2404.13236 | link |
2024-04-19 | Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging | Chia-Hsuan Chang et.al. | 2404.13149 | null |
2024-04-25 | Leveraging Large Language Model as Simulated Patients for Clinical Education | Yanzeng Li et.al. | 2404.13066 | null |
2024-04-19 | Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil et.al. | 2404.13043 | null |
2024-04-19 | Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation | Guanhua Chen et.al. | 2404.12879 | null |
2024-04-17 | Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM | Hongzhao Li et.al. | 2404.11209 | null |
2024-04-15 | Numerical Attributes Learning for Cardiac Failure Diagnostic from Clinical Narratives – A LESA-CamemBERT-bio Approach | Boammani Aser Lompo et.al. | 2404.10171 | null |
2024-04-14 | Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms | Diandian Guo et.al. | 2404.09231 | null |
2024-04-13 | Adapting Mental Health Prediction Tasks for Cross-lingual Learning via Meta-Training and In-context Learning with Large Language Model | Zita Lifelo et.al. | 2404.09045 | null |
2024-04-11 | Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions | Agasthya Gangavarapu et.al. | 2404.08705 | null |
2024-04-11 | Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain | Iker García-Ferrero et.al. | 2404.07613 | null |
2024-04-11 | CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models | Sheng Wang et.al. | 2404.07424 | null |
2024-04-10 | LLMs in Biomedicine: A study on clinical Named Entity Recognition | Masoud Monajatipoor et.al. | 2404.07376 | link |
2024-04-10 | Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study | Hongru Du et.al. | 2404.06962 | link |
2024-04-10 | Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination | Soojong Kim et.al. | 2404.06731 | null |
2024-04-10 | Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology | Shashi Kant Gupta et.al. | 2404.06680 | null |
2024-04-09 | Comparing Two Model Designs for Clinical Note Generation; Is an LLM a Useful Evaluator of Consistency? | Nathan Brake et.al. | 2404.06503 | null |
2024-04-08 | MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Iñigo Alonso et.al. | 2404.05590 | null |
2024-04-15 | Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations | Yiming Li et.al. | 2404.05415 | null |
2024-04-08 | Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients | HyoJe Jung et.al. | 2404.05144 | null |
2024-04-07 | Clinical Trials Protocol Authoring using LLMs | Morteza Maleki et.al. | 2404.05044 | null |
2024-04-07 | SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials | Mael Jullien et.al. | 2404.04963 | null |
2024-04-07 | PairAug: What Can Augmented Image-Text Pairs Do for Radiology? | Yutong Xie et.al. | 2404.04960 | link |
2024-04-06 | Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology | Dyke Ferber et.al. | 2404.04667 | null |
2024-04-06 | IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials | Shreyasi Mandal et.al. | 2404.04510 | link |
2024-04-04 | Conversational Disease Diagnosis via External Planner-Controlled Large Language Models | Zhoujian Sun et.al. | 2404.04292 | link |
2024-04-11 | CLUE: A Clinical Language Understanding Evaluation for LLMs | Amin Dada et.al. | 2404.04067 | link |
2024-04-04 | Personalized LLM Response Generation with Parameterized Memory Injection | Kai Zhang et.al. | 2404.03565 | link |
2024-04-02 | Classifying Cancer Stage with Open-Source Clinical Large Language Models | Chia-Hsuan Chang et.al. | 2404.01589 | null |
2024-04-01 | Towards a potential paradigm shift in health data collection and analysis | David Josef Herzog et.al. | 2404.01403 | null |
2024-04-01 | Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models | Yi-Lin Tuan et.al. | 2404.01295 | null |
2024-04-01 | Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided | Hongli Zhan et.al. | 2404.01288 | link |
2024-04-01 | Generating Faithful and Complete Hospital-Course Summaries from the Electronic Health Record | Griffin Adams et.al. | 2404.01189 | null |
2024-04-01 | LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation | Zilong Wang et.al. | 2404.00998 | null |
2024-04-05 | How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey | Zhonghao Shi et.al. | 2404.00938 | null |
2024-04-04 | Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods | Yujuan Fu et.al. | 2404.00826 | link |
2024-03-30 | Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4 | Aryo Pradipta Gema et.al. | 2404.00484 | link |
2024-03-29 | Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain | Burcu Sayin et.al. | 2403.20288 | link |
2024-04-04 | Fine-tuning Large Language Models for Automated Diagnostic Screening Summaries | Manjeet Yadav et.al. | 2403.20145 | null |
2024-03-28 | Developing Healthcare Language Model Embedding Spaces | Niall Taylor et.al. | 2403.19802 | null |
2024-03-28 | Bespoke Large Language Models for Digital Triage Assistance in Mental Health Care | Niall Taylor et.al. | 2403.19790 | null |
2024-03-28 | A Benchmark Evaluation of Clinical Named Entity Recognition in French | Nesrine Bannour et.al. | 2403.19726 | null |
2024-03-28 | BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation | Yuhong He et.al. | 2403.19414 | null |
2024-03-27 | Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data | Yuting Guo et.al. | 2403.19031 | null |
2024-03-27 | Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers | Laura Bergomi et.al. | 2403.18938 | link |
2024-03-27 | BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models | Haitao Li et.al. | 2403.18365 | null |
2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et.al. | 2403.17873 | null |
2024-03-26 | Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization | Jae-hee So et.al. | 2403.17428 | link |
2024-03-27 | SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies | Akshat Choube et.al. | 2403.17219 | link |
2024-03-25 | Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model | Braja Gopal Patra et.al. | 2403.17199 | link |
2024-03-25 | Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data | Shinka Mori et.al. | 2403.16909 | link |
2024-03-25 | Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm | Lei Liu et.al. | 2403.16446 | null |
2024-03-25 | Dia-LLaMA: Towards Large Language Model-driven CT Report Generation | Zhixuan Chen et.al. | 2403.16386 | null |
2024-03-26 | Large Language Models in Biomedical and Health Informatics: A Bibliometric Review | Huizi Yu et.al. | 2403.16303 | null |
2024-03-24 | CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering | Hongbin Na et.al. | 2403.16008 | null |
2024-03-23 | LLMs Instruct LLMs: An Extraction and Editing Method | Xin Zhang et.al. | 2403.15736 | null |
2024-03-20 | Large language models can help boost food production, but be mindful of their risks | Djavan De Clercq et.al. | 2403.15475 | null |
2024-03-19 | LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction | Hejie Cui et.al. | 2403.15464 | null |
2024-03-29 | WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Seil Kang et.al. | 2403.15456 | null |
2024-03-26 | The opportunities and risks of large language models in mental health | Hannah R. Lawrence et.al. | 2403.14814 | null |
2024-04-02 | Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis | Junyoung Kim et.al. | 2403.14801 | null |
2024-03-27 | Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature | Jeremy R. Harper et.al. | 2403.14721 | null |
2024-03-21 | Large Language Models for Multi-Choice Question Classification of Medical Subjects | Víctor Ponce-López et.al. | 2403.14582 | null |
2024-03-20 | Polaris: A Safety-focused LLM Constellation Architecture for Healthcare | Subhabrata Mukherjee et.al. | 2403.13313 | null |
2024-03-19 | Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning | Mengxian Lyu et.al. | 2403.13089 | null |
2024-03-19 | Improving Generalizability of Extracting Social Determinants of Health Using Large Language Models through Prompt-tuning | Cheng Peng et.al. | 2403.12374 | null |
2024-03-18 | Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach | Maria Mahbub et.al. | 2403.12297 | null |
2024-03-18 | A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models | Stephen R. Pfohl et.al. | 2403.12025 | link |
2024-04-02 | CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification | Korbinian Randl et.al. | 2403.11904 | link |
2024-03-18 | Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure | Ziyi Chen et.al. | 2403.11425 | link |
2024-03-17 | Cheap Ways of Extracting Clinical Markers from Texts | Anastasia Sandu et.al. | 2403.11227 | link |
2024-03-17 | Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping | Haoxi Zhang et.al. | 2403.11073 | null |
2024-03-21 | Do Large Language Models understand Medical Codes? | Simon A. Lee et.al. | 2403.10822 | null |
2024-03-16 | LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices | Jingping Nie et.al. | 2403.10779 | null |
2024-03-16 | Depression Detection on Social Media with Large Language Models | Xiaochong Lan et.al. | 2403.10750 | null |
2024-03-15 | Neural Erosion: Emulating Controlled Neurodegeneration and Aging in AI Systems | Antonios Alexos et.al. | 2403.10596 | null |
2024-03-22 | Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction | Chen Chen et.al. | 2403.10581 | link |
2024-03-15 | Trusting the Search: Unraveling Human Trust in Health Information from Google and ChatGPT | Xin Sun et.al. | 2403.09987 | null |
2024-03-08 | A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health | Alexander Marrapese et.al. | 2403.09705 | null |
2024-03-14 | Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | Li Yizhen et.al. | 2403.09164 | null |
2024-04-01 | A Continued Pretrained LLM Approach for Automatic Medical Note Generation | Dong Yuan et.al. | 2403.09057 | null |
2024-03-15 | AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic | Emad A. Alghamdi et.al. | 2403.09017 | null |
2024-03-14 | Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records | Erlend Frayling et.al. | 2403.08664 | null |
2024-03-13 | MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models | Subash Neupane et.al. | 2403.08607 | null |
2024-03-14 | Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator | Yusheng Liao et.al. | 2403.08495 | link |
2024-03-12 | SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Yu Yang et.al. | 2403.07384 | link |
2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | link |
2024-03-11 | Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement | Che Liu et.al. | 2403.06659 | link |
2024-03-11 | MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding | Jiageng Wu et.al. | 2403.06611 | null |
2024-03-11 | Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds | Jiageng Wu et.al. | 2403.06609 | link |
2024-03-11 | Can LLMs’ Tuning Methods Work in Medical Multimodal Domain? | Jiawei Chen et.al. | 2403.06407 | link |
2024-03-10 | ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes | Shengxin Hong et.al. | 2403.06294 | null |
2024-03-10 | FedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning | Zhuo Zhang et.al. | 2403.06131 | null |
2024-03-19 | KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques | Rui Yang et.al. | 2403.05881 | link |
2024-03-08 | A Benchmark of Domain-Adapted Large Language Models for Generating Brief Hospital Course Summaries | Asad Aali et.al. | 2403.05720 | link |
2024-03-08 | Decomposing Vision-based LLM Predictions for Auto-Evaluation with GPT-4 | Qingqing Zhu et.al. | 2403.05680 | null |
2024-03-11 | Tell me the truth: A system to measure the trustworthiness of Large Language Models | Carlo Lipizzi et.al. | 2403.04964 | null |
2024-03-13 | Electrocardiogram Instruction Tuning for Report Generation | Zhongwei Wan et.al. | 2403.04945 | null |
2024-03-07 | Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering | Ojas Gramopadhye et.al. | 2403.04890 | link |
2024-03-06 | Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification | Ricardo Bigolin Lanfredi et.al. | 2403.04024 | link |
2024-03-06 | Towards Safe and Aligned Large Language Models for Medicine | Tessa Han et.al. | 2403.03744 | link |
2024-03-09 | Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People | Xidong Wang et.al. | 2403.03640 | link |
2024-03-05 | Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse | Joseph Gatto et.al. | 2403.03336 | null |
2024-03-05 | Socratic Reasoning Improves Positive Text Rewriting | Anmol Goel et.al. | 2403.03029 | null |
2024-03-05 | Towards Training A Chinese Large Language Model for Anesthesiology | Zhonghai Wang et.al. | 2403.02742 | null |
2024-03-05 | Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research | Brenda Y. Miao et.al. | 2403.02558 | link |
2024-03-16 | SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction | Jiahuan Yan et.al. | 2403.01570 | null |
2024-03-01 | Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries | Zelalem Gero et.al. | 2403.01002 | link |
2024-03-01 | Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language | Xiaohan Ding et.al. | 2403.00994 | null |
2024-03-01 | AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models | Lang Cao et.al. | 2403.00953 | null |
2024-03-01 | SoftTiger: A Clinical Foundation Model for Healthcare Workflows | Ye Chen et.al. | 2403.00868 | link |
2024-02-29 | EyeGPT: Ophthalmic Assistant with Large Language Models | Xiaolan Chen et.al. | 2403.00840 | null |
2024-02-28 | MedAide: Leveraging Large Language Models for On-Premise Medical Assistance on Edge Devices | Abdul Basit et.al. | 2403.00830 | null |
2024-02-18 | ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework | Zhongqi Yang et.al. | 2403.00781 | null |
2024-02-29 | OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Jenish Maharjan et.al. | 2402.19371 | null |
2024-02-29 | Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study | Prottay Kumar Adhikary et.al. | 2402.19052 | null |
2024-02-28 | Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Derong Xu et.al. | 2402.18099 | link |
2024-03-13 | Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Hanjie Chen et.al. | 2402.18060 | link |
2024-03-02 | JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability | Junda Wang et.al. | 2402.17887 | link |
2024-02-28 | Prescribing Large Language Models for Perioperative Care: What’s The Right Dose for Pre-trained Models? | Bing Xue et.al. | 2402.17493 | link |
2024-02-27 | A Piece of Theatre: Investigating How Teachers Design LLM Chatbots to Assist Adolescent Cyberbullying Education | Michael A. Hedderich et.al. | 2402.17456 | null |
2024-02-27 | Deep Learning Based Named Entity Recognition Models for Recipes | Mansi Goel et.al. | 2402.17447 | null |
2024-02-26 | OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA) | Fujian Jia et.al. | 2402.16810 | null |
2024-02-26 | LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Kexin Chen et.al. | 2402.16664 | link |
2024-02-26 | LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification | Yiping Song et.al. | 2402.16515 | null |
2024-02-26 | From RAGs to riches: Using large language models to write documents for clinical trials | Nigel Markey et.al. | 2402.16406 | null |
2024-02-25 | HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Cem Uluoglakci et.al. | 2402.16211 | link |
2024-02-27 | EHRNoteQA: A Patient-Specific Question Answering Benchmark for Evaluating Large Language Models in Clinical Settings | Sunjun Kweon et.al. | 2402.16040 | link |
2024-02-24 | Predicting Outcomes in Video Games with Long Short Term Memory Networks | Kittimate Chulajata et.al. | 2402.15923 | link |
2024-02-24 | Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study | Zhaoyue Sun et.al. | 2402.15663 | link |
2024-02-23 | Enhancing ICU Patient Recovery: Using LLMs to Assist Nurses in Diary Writing | Samuel Kernan Freire et.al. | 2402.15205 | null |
2024-02-21 | Automatic Histograms: Leveraging Language Models for Text Dataset Exploration | Emily Reif et.al. | 2402.14880 | link |
2024-02-20 | A Dual-Prompting for Interpretable Mental Health Language Models | Hyolim Jeon et.al. | 2402.14854 | null |
2024-02-19 | RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | Congyun Jin et.al. | 2402.14840 | null |
2024-02-23 | A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health | Nikhil Behari et.al. | 2402.14807 | null |
2024-02-22 | Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond | Zhiyuan Wang et.al. | 2402.14259 | null |
2024-02-22 | Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology | Nur Yildirim et.al. | 2402.14252 | null |
2024-02-21 | On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study | Minh-Hao Van et.al. | 2402.14162 | null |
2024-02-21 | EXACT-Net: EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy | Hamed Hooshangnejad et.al. | 2402.14099 | null |
2024-02-26 | Towards Building Multilingual Language Model for Medicine | Pengcheng Qiu et.al. | 2402.13963 | link |
2024-02-21 | SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization | Prakamya Mishra et.al. | 2402.13919 | link |
2024-02-21 | Factual Consistency Evaluation of Summarisation in the Era of Large Language Models | Zheheng Luo et.al. | 2402.13758 | null |
2024-02-20 | Healthcare Copilot: Eliciting the Power of General LLMs for Medical Consultation | Zhiyao Ren et.al. | 2402.13408 | null |
2024-02-17 | When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection | Xiangyu Zhang et.al. | 2402.13276 | null |
2024-02-20 | BiMediX: Bilingual Medical Mixture of Experts LLM | Sara Pieri et.al. | 2402.13253 | link |
2024-02-23 | Benchmarking Retrieval-Augmented Generation for Medicine | Guangzhi Xiong et.al. | 2402.13178 | link |
2024-02-20 | Few shot clinical entity recognition in three languages: Masked language models outperform LLM prompting | Marco Naguib et.al. | 2402.12801 | null |
2024-02-20 | Me LLaMA: Foundation Large Language Models for Medical Applications | Qianqian Xie et.al. | 2402.12749 | link |
2024-02-19 | LLM Agents for Psychology: A Study on Gamified Assessments | Qisen Yang et.al. | 2402.12326 | null |
2024-02-19 | Automatic Evaluation for Mental Health Counseling using LLMs | Anqi Li et.al. | 2402.11958 | null |
2024-02-19 | The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth | Shir Lissak et.al. | 2402.11886 | link |
2024-02-19 | NOTE: Notable generation Of patient Text summaries through Efficient approach based on direct preference optimization | Imjin Ahn et.al. | 2402.11882 | null |
2024-02-20 | MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs | Yavuz Faruk Bakman et.al. | 2402.11756 | link |
2024-02-18 | DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics | YiQiu Guo et.al. | 2402.11481 | null |
2024-02-18 | FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence | Sebastian Antony Joseph et.al. | 2402.11456 | link |
2024-02-20 | Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis | Shaochen Xu et.al. | 2402.11398 | null |
2024-02-17 | Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention | Eunkyung Jo et.al. | 2402.11353 | null |
2024-02-17 | KnowTuning: Knowledge-aware Fine-tuning for Large Language Models | Yougang Lyu et.al. | 2402.11176 | link |
2024-02-24 | Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model | Salman Rahman et.al. | 2402.10965 | null |
2024-02-10 | DAEDRA: A language model for predicting outcomes in passive pharmacovigilance reporting | Chris von Csefalvay et.al. | 2402.10951 | null |
2024-02-09 | Zero-shot Explainable Mental Health Analysis on Social Media by incorporating Mental Scales | Wenyu Li et.al. | 2402.10948 | null |
2024-02-16 | Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks | Niall Taylor et.al. | 2402.10597 | null |
2024-02-15 | BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains | Yanis Labrak et.al. | 2402.10373 | link |
2024-02-28 | Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients | Mahyar Abbasian et.al. | 2402.10153 | null |
2024-02-15 | Towards Reducing Diagnostic Errors with Interpretable Risk Prediction | Denis Jered McInerney et.al. | 2402.10109 | null |
2024-02-15 | Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4 | Ting Fang Tan et.al. | 2402.10083 | null |
2024-02-21 | AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis | Zhihao Fan et.al. | 2402.09742 | link |
2024-02-15 | GPT-4’s assessment of its performance in a USMLE-based case study | Uttam Dhakal et.al. | 2402.09654 | null |
2024-02-14 | Probabilistic Reasoning in Generative Large Language Models | Aliakbar Nafar et.al. | 2402.09614 | link |
2024-02-16 | Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and Indications | David Oniani et.al. | 2402.09588 | null |
2024-02-14 | Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support | Zilin Ma et.al. | 2402.09260 | null |
2024-02-13 | Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy | Gioele Barabucci et.al. | 2402.08806 | null |
2024-02-13 | JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models | Jillian Fisher et.al. | 2402.08761 | link |
2024-02-13 | The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting | David Haag et.al. | 2402.08658 | null |
2024-02-20 | Addressing cognitive bias in medical language models | Samuel Schmidgall et.al. | 2402.08113 | link |
2024-02-02 | Exploring patient trust in clinical advice from AI-driven LLMs like ChatGPT for self-diagnosis | Delong Du et.al. | 2402.07920 | null |
2024-02-12 | CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity | Norbert Tihanyi et.al. | 2402.07688 | null |
2024-02-12 | The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models | Ayo Adedeji et.al. | 2402.07658 | null |
2024-02-12 | Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models | Isabelle Lorge et.al. | 2402.07645 | link |
2024-02-10 | Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations | Ankit Pal et.al. | 2402.07023 | link |
2024-02-10 | REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models | Yinghao Zhu et.al. | 2402.07016 | null |
2024-02-09 | RareBench: Can LLMs Serve as Rare Diseases Specialists? | Xuanzhong Chen et.al. | 2402.06341 | link |
2024-02-08 | FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs | Eun Cheol Choi et.al. | 2402.05904 | link |
2024-02-05 | Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering | Aryan Agrawal et.al. | 2402.05127 | null |
2024-02-05 | Zero-Shot Clinical Trial Patient Matching with LLMs | Michael Wornow et.al. | 2402.05125 | null |
2024-02-07 | CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients | Pragnya Ramjee et.al. | 2402.04620 | link |
2024-02-06 | Measuring Implicit Bias in Explicitly Unbiased Large Language Models | Xuechunzi Bai et.al. | 2402.04105 | link |
2024-02-06 | The Use of a Large Language Model for Cyberbullying Detection | Bayode Ogunleye et.al. | 2402.04088 | null |
2024-02-06 | Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models | Reza Khanmohammadi et.al. | 2402.04075 | null |
2024-02-05 | Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach | Sergi Blanco-Cuaresma et.al. | 2402.03435 | null |
2024-02-05 | Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models | Zhiyuan Hu et.al. | 2402.03271 | link |
2024-02-05 | Large Language Model Distilling Medication Recommendation Model | Qidong Liu et.al. | 2402.02803 | link |
2024-02-05 | RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews | Satpreet Harcharan Singh et.al. | 2402.02656 | link |
2024-02-03 | How well do LLMs cite relevant medical references? An evaluation framework and analyses | Kevin Wu et.al. | 2402.02008 | null |
2024-02-02 | Leveraging Large Language Models for Analyzing Blood Pressure Variations Across Biological Sex from Scientific Literature | Yuting Guo et.al. | 2402.01826 | null |
2024-02-01 | Hierarchical Multi-Label Classification of Online Vaccine Concerns | Chloe Qinyu Zhu et.al. | 2402.01783 | null |
2024-01-30 | Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer’s Dementia | Balamurali B T et.al. | 2402.01751 | null |
2024-01-29 | Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties | Jasmine Chiat Ling Ong et.al. | 2402.01741 | null |
2024-01-29 | Development and Testing of Retrieval Augmented Generation in Large Language Models – A Case Study Report | YuHe Ke et.al. | 2402.01733 | null |
2024-01-28 | Evaluating LLM – Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | Dimitrios P. Panagoulias et.al. | 2402.01730 | null |
2024-02-10 | Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record Data | Yinghao Zhu et.al. | 2402.01713 | link |
2024-01-25 | LLM on FHIR – Demystifying Health Records | Paul Schmiedmayer et.al. | 2402.01711 | null |
2024-01-23 | Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study | Zhe He et.al. | 2402.01693 | null |
2024-02-01 | HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent | Weijie Xu et.al. | 2402.01018 | link |
2024-02-13 | Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model | Mingyu Jin et.al. | 2402.00746 | link |
2024-02-01 | SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models | Tianhan Xu et.al. | 2402.00474 | null |
2024-01-31 | Multimodal Clinical Pseudo-notes for Emergency Department Prediction Tasks using Multiple Embedding Model for EHR (MEME) | Simon A. Lee et.al. | 2402.00160 | link |
2024-01-30 | GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries | Yuyuan Feng et.al. | 2402.00068 | null |
2024-02-03 | EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation | Jonathan W. Kim et.al. | 2401.18006 | null |
2024-01-31 | Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning | Yuelyu Ji et.al. | 2401.17602 | link |
2024-01-30 | Detecting mental disorder on social media: a ChatGPT-augmented explainable approach | Loris Belcastro et.al. | 2401.17477 | link |
2024-02-02 | Leveraging Professional Radiologists’ Expertise to Enhance LLMs’ Evaluation for Radiology Reports | Qingqing Zhu et.al. | 2401.16578 | null |
2024-01-29 | InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification | Jan Trienes et.al. | 2401.16475 | link |
2024-02-16 | Combining Hierachical VAEs with LLMs for clinically meaningful timeline summarisation in social media | Jiayu Song et.al. | 2401.16240 | null |
2024-01-29 | “You tell me”: A Dataset of GPT-4-Based Behaviour Change Support Conversations | Selina Meyer et.al. | 2401.16167 | null |
2024-01-29 | Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis | Haochun Wang et.al. | 2401.16107 | null |
2024-01-29 | Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning | Kenta Izumi et.al. | 2401.15966 | null |
2024-01-28 | AI as a Medical Ally: Evaluating ChatGPT’s Usage and Impact in Indian Healthcare | Aryaman Raina et.al. | 2401.15605 | null |
2024-01-27 | Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | Minbyul Jeong et.al. | 2401.15269 | link |
2024-01-26 | Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning | Md Mushfiqur Rahman et.al. | 2401.15043 | link |
2024-01-26 | Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | Yu He Ke et.al. | 2401.14589 | null |
2024-01-25 | K-QA: A Real-World Medical Q&A Benchmark | Itay Manes et.al. | 2401.14493 | link |
2024-01-25 | LongHealth: A Question Answering Benchmark with Long Clinical Documents | Lisa Adams et.al. | 2401.14490 | link |
2024-01-25 | The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support | Inhwa Song et.al. | 2401.14362 | null |
2024-01-25 | A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification | Madhumita Sushil et.al. | 2401.13887 | null |
2024-01-24 | Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes | Darren Liu et.al. | 2401.13588 | null |
2024-01-20 | Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA | Xi Chen et.al. | 2401.12998 | null |
2024-01-10 | A General-purpose AI Avatar in Healthcare | Nicholas Yan et.al. | 2401.12981 | null |
2024-01-22 | CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation | Zhihong Chen et.al. | 2401.12208 | null |
2024-01-22 | CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | Ge Zhang et.al. | 2401.11944 | null |
2024-01-21 | MedLM: Exploring Language Models for Medical Question Answering Systems | Niraj Yagnik et.al. | 2401.11389 | link |
2024-01-23 | Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines | David Oniani et.al. | 2401.11120 | null |
2024-01-19 | BioFinBERT: Finetuning Large Language Models (LLMs) to Analyze Sentiment of Press Releases and Financial Text Around Inflection Points of Biotech Stocks | Valentina Aparicio et.al. | 2401.11011 | null |
2024-01-19 | Dynamic Q&A of Clinical Documents with Large Language Models | Ran Elgedawy et.al. | 2401.10733 | null |
2024-01-17 | Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods Study | Niklas Mannhardt et.al. | 2401.09637 | null |
2024-01-16 | Gene-associated Disease Discovery Powered by Large Language Models | Jiayu Chang et.al. | 2401.09490 | null |
2024-01-17 | Understanding the concerns and choices of public when using large language models for healthcare | Yunpeng Xiao et.al. | 2401.09090 | null |
2024-01-16 | Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration | Simone Balloccu et.al. | 2401.08420 | link |
2024-01-14 | Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study | Ahmadul Karim Chowdhury et.al. | 2401.07310 | link |
2024-01-13 | EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records | Wenqi Shi et.al. | 2401.07128 | link |
2024-01-13 | NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey | B. Ross Katz et.al. | 2401.06967 | link |
2024-01-12 | Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data | Yubin Kim et.al. | 2401.06866 | link |
2023-12-12 | Large language models in healthcare and medical domain: A review | Zabir Al Nazi et.al. | 2401.06775 | null |
2024-01-11 | Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models | K M Sajjadul Islam et.al. | 2401.06088 | null |
2024-01-11 | EpilepsyLLM: Domain-Specific Large Language Model Fine-tuned with Epilepsy Medical Knowledge | Xuyang Zhao et.al. | 2401.05908 | null |
2024-01-11 | Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback | Chengfeng Dou et.al. | 2401.05695 | link |
2024-01-11 | Towards Conversational Diagnostic AI | Tao Tu et.al. | 2401.05654 | null |
2024-01-18 | MISS: A Generative Pretraining and Finetuning Approach for Med-VQA | Jiawei Chen et.al. | 2401.05163 | link |
2024-01-01 | Large Language Models in Mental Health Care: a Scoping Review | Yining Hua et.al. | 2401.02984 | null |
2024-01-05 | Generative Large Language Models are autonomous practitioners of evidence-based medicine | Akhil Vaid et.al. | 2401.02851 | null |
2024-01-04 | SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval | Griffin Adams et.al. | 2401.02369 | null |
2024-01-04 | Text2MDT: Extracting Medical Decision Trees from Medical Texts | Wei Zhu et.al. | 2401.02034 | null |
2024-01-06 | Generalist embedding models are better at short-context clinical semantic search than specialized embedding models | Jean-Baptiste Excoffier et.al. | 2401.01943 | link |
2024-01-03 | MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries | Akash Ghosh et.al. | 2401.01596 | link |
2024-01-06 | Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review | Luoma Ke et.al. | 2401.01519 | null |
2024-01-03 | Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented Generation | Walid Saba et.al. | 2401.01469 | null |
2024-01-08 | A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models | S. M Towhidul Islam Tonmoy et.al. | 2401.01313 | null |
2024-01-01 | A Computational Framework for Behavioral Assessment of LLM Therapists | Yu Ying Chiu et.al. | 2401.00820 | link |
2023-12-31 | An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks | Yash Bingi et.al. | 2401.00582 | null |
2023-12-31 | Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing | Omid Rohanian et.al. | 2401.00579 | null |
2023-12-29 | K-PERM: Personalized Response Generation Using Dynamic Knowledge Retrieval and Persona-Adaptive Queries | Kanak Raj et.al. | 2312.17748 | link |
2023-12-29 | Overview of the PromptCBLUE Shared Task in CHIP2023 | Wei Zhu et.al. | 2312.17522 | link |
2023-12-29 | Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning | Xiao-Yang Liu et.al. | 2312.17493 | null |
2023-12-29 | EHR Interaction Between Patients and AI: NoteAid EHR Interaction | Xiaocheng Zhang et.al. | 2312.17475 | null |
2023-12-29 | LLM Factoscope: Uncovering LLMs’ Factual Discernment through Inner States Analysis | Jinwen He et.al. | 2312.16374 | null |
2023-12-26 | Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models | Xinke Jiang et.al. | 2312.15883 | null |
2023-12-25 | IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models | Zhihao Chen et.al. | 2312.15663 | null |
2023-12-23 | Multimodal Machine Learning Combining Facial Images and Clinical Texts Improves Diagnosis of Rare Genetic Diseases | Da Wu et.al. | 2312.15320 | link |
2023-12-06 | Empowering ChatGPT-Like Large-Scale Language Models with Local Knowledge Base for Industrial Prognostics and Health Management | Huan Wang et.al. | 2312.14945 | null |
2023-12-22 | Robust Knowledge Extraction from Large Language Models using Social Choice Theory | Nico Potyka et.al. | 2312.14877 | link |
2023-12-22 | Zero-shot Causal Graph Extrapolation from Text via LLMs | Alessandro Antonucci et.al. | 2312.14670 | link |
2023-12-19 | Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning | Xiaodan Zhang et.al. | 2312.14184 | null |
2023-12-20 | Exploring Multimodal Large Language Models for Radiology Report Error-checking | Jinge Wu et.al. | 2312.13103 | null |
2023-12-20 | MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models | Yan Cai et.al. | 2312.12806 | null |
2023-12-20 | Fine-tuning Large Language Models for Adaptive Machine Translation | Yasmin Moslem et.al. | 2312.12740 | link |
2023-12-20 | Mini-GPTs: Efficient Large Language Models through Contextual Pruning | Tim Valicenti et.al. | 2312.12682 | null |
2023-12-19 | Can ChatGPT be Your Personal Medical Assistant? | Md. Rafiul Biswas et.al. | 2312.12006 | null |
2023-12-19 | Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health | Maria Antoniak et.al. | 2312.11803 | link |
2023-12-16 | CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare | Akash Ghosh et.al. | 2312.11541 | link |
2023-12-16 | A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers | Feida Gu et.al. | 2312.10419 | null |
2023-12-15 | GPT-doctor: Customizing Large Language Models for Medical Consultation | Wen Wang et.al. | 2312.10225 | null |
2023-12-15 | Low-resource classification of mobility functioning information in clinical sentences using large language models | Tuan Dung Le et.al. | 2312.10202 | null |
2023-12-06 | Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk | Colleen Chan et.al. | 2312.10072 | null |
2023-12-15 | Distilling Large Language Models for Matching Patients to Clinical Trials | Mauro Nievas et.al. | 2312.09958 | null |
2024-01-07 | RJUA-QA: A Comprehensive QA Dataset for Urology | Shiwei Lyu et.al. | 2312.09785 | link |
2023-12-14 | Evaluating Large Language Models for Health-related Queries with Presuppositions | Navreet Kaur et.al. | 2312.08800 | link |
2023-12-15 | High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models | Songchi Zhou et.al. | 2312.08274 | null |
2023-12-13 | CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs | Huaiyuan Ying et.al. | 2312.08036 | link |
2023-12-12 | Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales | Taeyoon Kwon et.al. | 2312.07399 | link |
2023-12-12 | Efficient Few-Shot Clinical Task Adaptation with Large Language Models | Kaipeng Zheng et.al. | 2312.07125 | null |
2023-12-12 | SM70: A Large Language Model for Medical Devices | Anubhav Bhatti et.al. | 2312.06974 | null |
2023-12-05 | Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety | Manas Gaur et.al. | 2312.06798 | null |
2023-12-11 | Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping | Will E. Thompson et.al. | 2312.06457 | null |
2023-12-11 | Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need | Cheng Peng et.al. | 2312.06099 | null |
2023-12-09 | Enhancing Medical Specialty Assignment to Patients using NLP Techniques | Chris Solomou et.al. | 2312.05585 | null |
2023-11-10 | Holistic Evaluation of GPT-4V for Biomedical Imaging | Zhengliang Liu et.al. | 2312.05256 | null |
2023-12-08 | Ophtha-LLaMA2: A Large Language Model for Ophthalmology | Huan Zhao et.al. | 2312.04906 | null |
2023-12-07 | AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making | Shusen Liu et.al. | 2312.04494 | null |
2023-12-08 | Methods to Estimate Large Language Model Confidence | Maia Kotelanski et.al. | 2312.03733 | null |
2023-12-06 | XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering | Joel Stremmel et.al. | 2312.03567 | null |
2023-12-05 | Breast Ultrasound Report Generation using LangChain | Jaeyoung Huh et.al. | 2312.03013 | null |
2023-12-05 | MedDM: LLM-executable clinical guidance tree for clinical decision-making | Binbin Li et.al. | 2312.02441 | null |
2023-12-04 | LLMs Accelerate Annotation for Medical Information Extraction | Akshay Goel et.al. | 2312.02296 | null |
2023-12-04 | MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model | Ling Yang et.al. | 2312.02233 | null |
2023-12-03 | Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation | Yuzhe Lu et.al. | 2312.01504 | null |
2023-12-18 | From Beginner to Expert: Modeling Medical Knowledge into General LLMs | Qiang Li et.al. | 2312.01040 | null |
2023-12-01 | Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Iakes Goenaga et.al. | 2312.00567 | link |
2023-11-30 | Towards Accurate Differential Diagnosis with Large Language Models | Daniel McDuff et.al. | 2312.00164 | null |
2023-11-30 | RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance | Chantal Pellegrini et.al. | 2311.18681 | link |
2023-11-29 | Are we going MAD? Benchmarking Multi-Agent Debate between Language Models for Medical Q&A | Andries Smit et.al. | 2311.17371 | link |
2023-11-27 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | Zeming Chen et.al. | 2311.16079 | link |
2023-11-27 | BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights | François Remy et.al. | 2311.16075 | null |
2023-11-27 | RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization | Kwanyoung Kim et.al. | 2311.15876 | null |
2023-11-28 | The effect of source disclosure on evaluation of AI-generated messages: A two-part study | Sue Lim et.al. | 2311.15544 | null |
2023-11-25 | Walking a Tightrope – Evaluating Large Language Models in High-Risk Domains | Chia-Chien Hung et.al. | 2311.14966 | null |
2023-11-20 | MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer’s Care Via Unleashing Generative AI | Lifei Zheng et.al. | 2311.14730 | null |
2023-11-10 | ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management | Angela Zhang et.al. | 2311.14703 | null |
2023-11-07 | Benefits and Harms of Large Language Models in Digital Mental Health | Munmun De Choudhury et.al. | 2311.14693 | null |
2023-11-23 | Challenges of Large Language Models for Mental Health Counseling | Neo Christopher Chung et.al. | 2311.13857 | null |
2023-11-22 | Surpassing GPT-4 Medical Coding with a Two-Stage Approach | Zhichao Yang et.al. | 2311.13735 | null |
2023-11-22 | Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting | Daphne van Zandvoort et.al. | 2311.13274 | null |
2023-11-25 | From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models | Zachary Englhardt et.al. | 2311.13063 | link |
2023-10-28 | Overview of Current Applications of Large Language Models in Various Medical Specialities | Ummara Mumtaz et.al. | 2311.12882 | null |
2023-11-21 | ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models | Jiankai Tang et.al. | 2311.12524 | link |
2023-11-20 | Web News Timeline Generation with Extended Task Prompting | Sha Wang et.al. | 2311.11652 | null |
2023-12-17 | Rethinking Large Language Models in Mental Health Applications | Shaoxiong Ji et.al. | 2311.11267 | null |
2023-11-18 | Designing Interpretable ML System to Enhance Trustworthy AI in Healthcare: A Systematic Review of the Last Decade to A Proposed Robust Framework | Elham Nasarian et.al. | 2311.11055 | null |
2023-11-17 | PEFT-MedAware: Large Language Model for Medical Awareness | Keivalya Pandya et.al. | 2311.10697 | null |
2023-11-17 | Countering Misinformation via Emotional Response Generation | Daniel Russo et.al. | 2311.10587 | link |
2023-11-16 | MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | Xiangru Tang et.al. | 2311.10537 | link |
2023-11-16 | ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond | Kanhai S. Amin et.al. | 2311.10075 | null |
2023-11-16 | HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs | Junying Chen et.al. | 2311.09774 | link |
2023-11-16 | CARE: Extracting Experimental Findings From Clinical Literature | Aakanksha Naik et.al. | 2311.09736 | null |
2023-11-16 | Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation | Zonghai Yao et.al. | 2311.09684 | link |
2023-11-16 | LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks | Mihir Parmar et.al. | 2311.09564 | link |
2023-11-12 | Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling | Yujin Cho et.al. | 2311.09243 | null |
2023-11-15 | PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health | Haoan Jin et.al. | 2311.09189 | link |
2023-11-14 | Fine-tuning Language Models for Factuality | Katherine Tian et.al. | 2311.08401 | null |
2023-11-14 | Extrinsically-Focused Evaluation of Omissions in Medical Summarization | Elliot Schumacher et.al. | 2311.08303 | link |
2023-11-14 | Insights into Classifying and Mitigating LLMs’ Hallucinations | Alessandro Bruno et.al. | 2311.08117 | null |
2023-11-13 | It’s Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models | Nishant Balepur et.al. | 2311.07532 | link |
2023-11-13 | Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer | Narmada Naik et.al. | 2311.07191 | null |
2023-11-12 | Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations? | Antonio Zaitoun et.al. | 2311.06858 | link |
2023-11-23 | ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences | Yuanhe Tian et.al. | 2311.06025 | link |
2023-11-09 | A Survey of Large Language Models in Medicine: Progress, Application, and Challenge | Hongjian Zhou et.al. | 2311.05112 | link |
2023-11-08 | DEMASQ: Unmasking the ChatGPT Wordsmith | Kavita Kumari et.al. | 2311.05019 | null |
2023-11-07 | Evaluating Large Language Models in Ophthalmology | Jason Holmes et.al. | 2311.04933 | null |
2023-11-07 | Evaluating multiple large language models in pediatric ophthalmology | Jason Holmes et.al. | 2311.04368 | null |
2023-11-08 | An Introduction to Natural Language Processing Techniques and Framework for Clinical Implementation in Radiation Oncology | Reza Khanmohammadi et.al. | 2311.02205 | null |
2023-11-03 | Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review | Mingze Yuan et.al. | 2311.01918 | link |
2023-11-27 | LLM-driven Multimodal Target Volume Contouring in Radiation Oncology | Yujin Oh et.al. | 2311.01908 | link |
2023-11-01 | Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models | Ran Xu et.al. | 2311.00287 | link |
2023-10-31 | Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision | Jiaxin Zhang et.al. | 2310.20153 | null |
2023-11-03 | Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization | Prakamya Mishra et.al. | 2310.20033 | link |
2023-10-30 | EHRTutor: Enhancing Patient Understanding of Discharge Instructions | Zihao Zhang et.al. | 2310.19212 | null |
2023-10-23 | Health Disparities through Generative AI Models: A Comparison Study Using A Domain Specific large language model | Yohn Jairo Parra Bautista et.al. | 2310.18355 | null |
2023-10-21 | MOELoRA: An MOE-based Parameter Efficient Fine-Tuning Method for Multi-task Medical Applications | Qidong Liu et.al. | 2310.18339 | link |
2023-11-01 | Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare | Junling Liu et.al. | 2310.17956 | link |
2023-10-31 | Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting | Benjamin Yan et.al. | 2310.17811 | null |
2023-10-25 | An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives | Young Min Cho et.al. | 2310.17017 | link |
2023-10-24 | Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature | Alejandro Lozano et.al. | 2310.16146 | link |
2023-10-24 | NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes | Junda Wang et.al. | 2310.15959 | link |
2023-10-24 | BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT | Yirong Chen et.al. | 2310.15896 | link |
2023-10-24 | BLESS: Benchmarking Large Language Models on Sentence Simplification | Tannon Kew et.al. | 2310.15773 | link |
2023-10-23 | AlpaCare: Instruction-tuned Large Language Models for Medical Application | Xinlu Zhang et.al. | 2310.14558 | link |
2023-10-22 | PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain | Wei Zhu et.al. | 2310.14151 | link |
2023-10-23 | Explainable Depression Symptom Detection in Social Media | Eliseo Bao Souto et.al. | 2310.13664 | null |
2023-10-23 | Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries | Yiqiao Jin et.al. | 2310.13132 | link |
2023-10-19 | Causal-structure Driven Augmentations for Text OOD Generalization | Amir Feder et.al. | 2310.12803 | null |
2023-10-18 | On the Benefit of Generative Foundation Models for Human Activity Recognition | Zikang Leng et.al. | 2310.12085 | null |
2023-10-17 | Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models | Khushboo Verma et.al. | 2310.11266 | null |
2023-10-16 | JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning | Issey Sukeda et.al. | 2310.10083 | null |
2023-10-13 | Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation | Eun Cheol Choi et.al. | 2310.09223 | null |
2023-10-13 | Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model | Qichen Ye et.al. | 2310.09089 | link |
UncertaintyLLM
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-26 | Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection | Ali Şenol et.al. | 2506.21443 | null |
2025-06-26 | Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference | Colin Samplawski et.al. | 2506.21408 | null |
2025-06-26 | Small Encoders Can Rival Large Decoders in Detecting Groundedness | Istabrak Abbes et.al. | 2506.21288 | null |
2025-06-26 | BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services | Zhaojiacheng Zhou et.al. | 2506.21033 | null |
2025-06-26 | Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers | Martin Ruskov et.al. | 2506.20982 | null |
2025-06-25 | Towards Probabilistic Question Answering Over Tabular Data | Chen Shen et.al. | 2506.20747 | null |
2025-06-25 | Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges | Alexander D. Kalian et.al. | 2506.20598 | null |
2025-06-26 | TAPS: Tool-Augmented Personalisation via Structured Tagging | Ekaterina Taktasheva et.al. | 2506.20409 | null |
2025-06-25 | Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models | Kejia Chen et.al. | 2506.20251 | null |
2025-06-25 | DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs | Ruokai Yin et.al. | 2506.20194 | null |
2025-06-24 | KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | Baochang Ren et.al. | 2506.19807 | null |
2025-06-24 | LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis | Lei Kang et.al. | 2506.19702 | null |
2025-06-24 | Correcting Hallucinations in News Summaries: Exploration of Self-Correcting LLM Methods with External Knowledge | Juraj Vladika et.al. | 2506.19607 | null |
2025-06-24 | Automatic Posology Structuration: What role for LLMs? | Natalia Bobkova et.al. | 2506.19525 | null |
2025-06-24 | Inference-Time Reward Hacking in Large Language Models | Hadi Khalaf et.al. | 2506.19248 | null |
2025-06-23 | AgenticControl: An Automated Control Design Framework Using Large Language Models | Mohammad Narimani et.al. | 2506.19160 | null |
2025-06-23 | Human-Aligned Faithfulness in Toxicity Explanations of LLMs | Ramaravind K. Mothilal et.al. | 2506.19113 | null |
2025-06-23 | Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge | Sahil Kale et.al. | 2506.18998 | null |
2025-06-23 | AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs | Piotr Matys et.al. | 2506.18628 | null |
2025-06-23 | ReFrame: Rectification Framework for Image Explaining Architectures | Debjyoti Das Adhikary et.al. | 2506.18272 | null |
2025-06-24 | Understanding Reasoning in Thinking Language Models via Steering Vectors | Constantin Venhoff et.al. | 2506.18167 | null |
2025-06-22 | Mechanistic Interpretability in the Presence of Architectural Obfuscation | Marcos Florencio et.al. | 2506.18053 | null |
2025-06-22 | QueueEDIT: Structural Self-Correction for Sequential Model Editing in LLMs | Taolin Zhang et.al. | 2506.17864 | null |
2025-06-21 | Is Your Automated Software Engineer Trustworthy? | Noble Saji Mathews et.al. | 2506.17812 | null |
2025-06-24 | KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation | Dalong Zhang et.al. | 2506.17728 | null |
2025-06-21 | Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering | Binquan Ji et.al. | 2506.17692 | null |
2025-06-21 | Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models | Yukun Huang et.al. | 2506.17585 | null |
2025-06-20 | OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections | Manasa Bharadwaj et.al. | 2506.17449 | null |
2025-06-20 | UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making | Jinhao Duan et.al. | 2506.17419 | null |
2025-06-20 | Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs | Zongjie Li et.al. | 2506.17353 | null |
2025-06-18 | Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study | Chuanlei Li et.al. | 2506.17311 | null |
2025-06-17 | Semantic uncertainty in advanced decoding methods for LLM generation | Darius Foodeei et.al. | 2506.17296 | null |
2025-06-20 | Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction | Jiekai Ma et.al. | 2506.17203 | null |
2025-06-20 | Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation | Jiahao Cheng et.al. | 2506.17088 | null |
2025-06-20 | Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond | Antonin Berthon et.al. | 2506.16982 | null |
2025-06-20 | DistillNote: LLM-based clinical note summaries improve heart failure diagnosis | Heloisa Oss Boll et.al. | 2506.16777 | null |
2025-06-20 | eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing | Isaac Shi et.al. | 2506.16768 | null |
2025-06-20 | The Role of Model Confidence on Bias Effects in Measured Uncertainties | Xinyi Liu et.al. | 2506.16724 | null |
2025-06-19 | Grounding Language Models with Semantic Digital Twins for Robotic Planning | Mehreen Naeem et.al. | 2506.16493 | null |
2025-06-19 | Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation | Guilherme Guerino et.al. | 2506.16345 | null |
2025-06-19 | SGIC: A Self-Guided Iterative Calibration Framework for RAG | Guanhua Chen et.al. | 2506.16172 | null |
2025-06-19 | Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior | Hao Li et.al. | 2506.16163 | link |
2025-06-19 | Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning | Duc Hieu Ho et.al. | 2506.16064 | null |
2025-06-19 | DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling | Fei Wang et.al. | 2506.16043 | null |
2025-06-18 | Understanding Online Polarization Through Human-Agent Interaction in a Synthetic LLM-Based Social Network | Tim Donkers et.al. | 2506.15866 | null |
2025-06-18 | PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection | Wenhao Li et.al. | 2506.15656 | null |
2025-06-18 | Context-Informed Grounding Supervision | Hyunji Lee et.al. | 2506.15480 | link |
2025-06-18 | Unlocking Post-hoc Dataset Inference with Synthetic Data | Bihe Zhao et.al. | 2506.15271 | null |
2025-06-18 | Robust Instant Policy: Leveraging Student’s t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation | Hanbit Oh et.al. | 2506.15157 | null |
2025-06-18 | HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models | Trishna Chakraborty et.al. | 2506.15065 | null |
2025-06-17 | Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning | Wassim Bouaziz et.al. | 2506.14913 | null |
2025-06-17 | Issue Retrieval and Verification Enhanced Supplementary Code Comment Generation | Yanzhen Zou et.al. | 2506.14649 | link |
2025-06-17 | Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees | Ahmed Heakl et.al. | 2506.14606 | null |
2025-06-17 | RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition | Tim Cofala et.al. | 2506.14412 | null |
2025-06-17 | Don’t Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning | William F. Shen et.al. | 2506.14387 | null |
2025-06-17 | AviationLLM: An LLM-based Knowledge System for Aviation Training | Jia’ang Wan et.al. | 2506.14336 | null |
2025-06-17 | Improving LoRA with Variational Learning | Bai Cong et.al. | 2506.14280 | null |
2025-06-17 | DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization | Chengyu Huang et.al. | 2506.14157 | link |
2025-06-17 | Abstract Meaning Representation for Hospital Discharge Summarization | Paul Landes et.al. | 2506.14101 | link |
2025-06-20 | Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs | Hen Davidov et.al. | 2506.13593 | link |
2025-06-16 | Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning | David Bani-Harouni et.al. | 2506.13474 | null |
2025-06-17 | ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models | Junho Yoon et.al. | 2506.13472 | null |
2025-06-16 | From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs | Alsharif Abuadbba et.al. | 2506.13434 | null |
2025-06-16 | Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs | Houcheng Jiang et.al. | 2506.13285 | null |
2025-06-16 | IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation | Zijie Lin et.al. | 2506.13229 | link |
2025-06-16 | SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists | Lynn Khellaf et.al. | 2506.13188 | null |
2025-06-16 | Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning | Danny Hoang et.al. | 2506.13026 | null |
2025-06-17 | Surprise Calibration for Better In-Context Learning | Zhihang Tan et.al. | 2506.12796 | null |
2025-06-15 | Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning | Alexis R. Tudor et.al. | 2506.12667 | null |
2025-06-14 | Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics | Jiarui Liu et.al. | 2506.12657 | null |
2025-06-14 | GenControl: Generative AI-Driven Autonomous Design of Control Algorithms | Chenggang Cui et.al. | 2506.12554 | null |
2025-06-14 | RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking | Shuo Yang et.al. | 2506.12538 | null |
2025-06-14 | Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation | Xiangyan Chen et.al. | 2506.12496 | null |
2025-06-14 | MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination | Ao Jia et.al. | 2506.12483 | null |
2025-06-13 | Uncovering Bias Paths with LLM-guided Causal Discovery: An Active Learning and Dynamic Scoring Approach | Khadija Zanna et.al. | 2506.12227 | null |
2025-06-13 | A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions | Stephen Mell et.al. | 2506.12202 | null |
2025-06-13 | Maximally-Informative Retrieval for State Space Model Generation | Evan Becker et.al. | 2506.12149 | null |
2025-06-12 | LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model’s Response for Vulnerability Analysis | Reza Fayyazi et.al. | 2506.12100 | link |
2025-06-13 | LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? | Zihan Zheng et.al. | 2506.11928 | null |
2025-06-13 | TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Zhenyu Hou et.al. | 2506.11902 | link |
2025-06-16 | Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making | Claudio Fanconi et.al. | 2506.11887 | null |
2025-06-13 | Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study | Hawau Olamide Toyin et.al. | 2506.11602 | null |
2025-06-13 | Augmenting the Generality and Performance of Large Language Models for Software Engineering | Fabian C. Peña et.al. | 2506.11548 | null |
2025-06-11 | Digitization of Document and Information Extraction using OCR | Rasha Sinha et.al. | 2506.11156 | null |
2025-06-11 | From over-reliance to smart integration: using Large-Language Models as translators between specialized modeling and simulation tools | Philippe J. Giabbanelli et.al. | 2506.11141 | null |
2025-06-10 | Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK | Carlos Garcia-Fernandez et.al. | 2506.11129 | null |
2025-06-14 | Farseer: A Refined Scaling Law in Large Language Models | Houyi Li et.al. | 2506.10972 | link |
2025-06-12 | Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers | Yixiao Huang et.al. | 2506.10887 | null |
2025-06-13 | Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles | Qingyan Wei et.al. | 2506.10848 | link |
2025-06-12 | Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | Alberto Testoni et.al. | 2506.10769 | null |
2025-06-12 | Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs | Yilin Xiao et.al. | 2506.10508 | null |
2025-06-12 | PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier | Yuhua Jiang et.al. | 2506.10406 | null |
2025-06-12 | AutoGEEval++: A Multi-Level and Multi-Geospatial-Modality Automated Evaluation Framework for Large Language Models in Geospatial Code Generation on Google Earth Engine | Shuyang Hou et.al. | 2506.10365 | null |
2025-06-12 | TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree | Yu-Yang Qian et.al. | 2506.10355 | link |
2025-06-12 | Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements | Seyed Moein Abtahi et.al. | 2506.10330 | null |
2025-06-12 | WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models | Qiyue Yin et.al. | 2506.10264 | null |
2025-06-11 | ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Xiyao Wang et.al. | 2506.10128 | link |
2025-06-11 | Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection | David Farr et.al. | 2506.10104 | null |
2025-06-11 | Textual Bayes: Quantifying Uncertainty in LLM-Based Systems | Brendan Leigh Ross et.al. | 2506.10060 | null |
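2025-06-10 | Evaluation empirique de la sécurisation et de l’alignement de ChatGPT et Gemini: analyse comparative des vulnérabilités par expérimentations de jailbreaks [Empirical Evaluation of the Security and Alignment of ChatGPT and Gemini: A Comparative Analysis of Vulnerabilities through Jailbreak Experiments] | Rafaël Nouailles et.al. | 2506.10029 | null |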
2025-06-16 | Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs | Hiroshi Matsuda et.al. | 2506.09983 | link |
2025-06-11 | Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs | Rodion Oblovatny et.al. | 2506.09886 | null |
2025-06-11 | Do LLMs Give Psychometrically Plausible Responses in Educational Assessments? | Andreas Säuberli et.al. | 2506.09796 | null |
2025-06-11 | Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models | Haoyi Song et.al. | 2506.09684 | link |
2025-06-11 | Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering | Tianjun Yao et.al. | 2506.09645 | link |
2025-06-11 | HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding | Yanzhao Shi et.al. | 2506.09634 | null |
2025-06-11 | From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies | Blaž Škrlj et.al. | 2506.09566 | null |
2025-06-11 | DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts | Yuchen Feng et.al. | 2506.09351 | null |
2025-06-11 | Know What You Don’t Know: Uncertainty Calibration of Process Reward Models | Young-Jin Park et.al. | 2506.09338 | null |
2025-06-10 | G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration | Samuel Holt et.al. | 2506.09272 | null |
2025-06-10 | Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation | Karl Löwenmark et.al. | 2506.09247 | null |
2025-06-10 | The Curious Language Model: Strategic Test-Time Information Acquisition | Michael Cooper et.al. | 2506.09173 | null |
2025-06-10 | Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models | Xinyuan Wang et.al. | 2506.09084 | null |
2025-06-10 | FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making | Jiaxiang Chen et.al. | 2506.09080 | null |
2025-06-10 | AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Polina Kirichenko et.al. | 2506.09038 | link |
2025-06-11 | Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance | Kaifeng He et.al. | 2506.08980 | null |
2025-06-10 | The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation | Francisco Vargas et.al. | 2506.08827 | null |
2025-06-12 | ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization | Hee Suk Yoon et.al. | 2506.08712 | null |
2025-06-10 | RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being | Rahatara Ferdousi et.al. | 2506.08486 | null |
2025-06-10 | Olica: Efficient Structured Pruning of Large Language Models without Retraining | Jiujun He et.al. | 2506.08436 | link |
2025-06-11 | Transforming Expert Knowledge into Scalable Ontology via Large Language Models | Ikkei Itoku et.al. | 2506.08422 | null |
2025-06-09 | Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic | Zhenjiang Mao et.al. | 2506.08243 | null |
2025-06-09 | Conservative Bias in Large Language Models: Measuring Relation Predictions | Toyin Aguda et.al. | 2506.08120 | null |
2025-06-10 | Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jiaxiang Chen et.al. | 2506.07820 | null |
2025-06-09 | Language-Vision Planner and Executor for Text-to-Visual Reasoning | Yichang Xu et.al. | 2506.07778 | null |
2025-06-09 | QUITE: A Query Rewrite System Beyond Rules with LLM Agents | Yuyang Song et.al. | 2506.07675 | null |
2025-06-09 | Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models | Ruiyang Zhang et.al. | 2506.07575 | null |
2025-06-09 | SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition | Mengsong Wu et.al. | 2506.07557 | null |
2025-06-09 | CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models | Guang Liu et.al. | 2506.07463 | null |
2025-06-09 | From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered | Siddartha Devic et.al. | 2506.07461 | null |
2025-06-09 | Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs | T. Duy Nguyen-Hien et.al. | 2506.07448 | null |
2025-06-11 | MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models | Philip R. Liu et.al. | 2506.07400 | link |
2025-06-10 | ARGUS: Hallucination and Omission Evaluation in Video-LLMs | Ruchit Rawal et.al. | 2506.07371 | null |
2025-06-08 | ConfQA: Answer Only If You Are Confident | Yin Huang et.al. | 2506.07309 | null |
2025-06-08 | Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance | Mohammadreza Chavoshi et.al. | 2506.07273 | null |
2025-06-08 | Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis | Yaping Chai et.al. | 2506.07148 | null |
2025-06-08 | Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models | Samir Abdaljalil et.al. | 2506.07106 | null |
2025-06-08 | Com$^2$: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models | Kai Xiong et.al. | 2506.07064 | null |
2025-06-08 | AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint | Leheng Sheng et.al. | 2506.07022 | link |
2025-06-07 | Quantile Regression with Large Language Models for Price Prediction | Nikhita Vedula et.al. | 2506.06657 | null |
2025-06-07 | QuantMCP: Grounding Large Language Models in Verifiable Financial Reality | Yifan Zeng et.al. | 2506.06622 | null |
2025-06-06 | Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques | Adarsh Prasad Behera et.al. | 2506.06579 | null |
2025-06-06 | Beyond Facts: Evaluating Intent Hallucination in Large Language Models | Yijie Hao et.al. | 2506.06539 | null |
2025-06-11 | Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | Pengyi Li et.al. | 2506.06395 | null |
2025-06-04 | On the Fundamental Impossibility of Hallucination Control in Large Language Models | Michał P. Karpowicz et.al. | 2506.06382 | null |
2025-06-06 | Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge | Yi Sui et.al. | 2506.06240 | null |
2025-06-06 | Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach | James Ford et.al. | 2506.06175 | null |
2025-06-06 | Recommender systems, stigmergy, and the tyranny of popularity | Zackary Okun Dunivin et.al. | 2506.06162 | null |
2025-06-09 | MIRIAD: Augmenting LLMs with millions of medical query-response pairs | Qinyue Zheng et.al. | 2506.06091 | null |
2025-06-06 | AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Yu Li et.al. | 2506.06017 | null |
2025-06-06 | Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques | Xiaofei Xu et.al. | 2506.05924 | null |
2025-06-06 | Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness | Rongzhe Wei et.al. | 2506.05735 | null |
2025-06-09 | Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models | Zefan Zeng et.al. | 2506.05675 | null |
2025-06-05 | When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding | Yan Shu et.al. | 2506.05551 | null |
2025-06-05 | Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models | Sima Noorani et.al. | 2506.05497 | null |
2025-06-05 | CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection | Ron Eliav et.al. | 2506.05243 | null |
2025-06-05 | On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools | Shivani Upadhyay et.al. | 2506.05182 | link |
2025-06-05 | When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models | Kai Wang et.al. | 2506.04909 | null |
2025-06-05 | Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights | Giorgio Biancini et.al. | 2506.04851 | null |
2025-06-05 | Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models | Changyue Wang et.al. | 2506.04832 | link |
2025-06-05 | A Reasoning-Based Approach to Cryptic Crossword Clue Solving | Martin Andrews et.al. | 2506.04824 | null |
2025-06-05 | GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval | Lingyuan Liu et.al. | 2506.04762 | link |
2025-06-05 | Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning | Zhiyuan Ma et.al. | 2506.04625 | null |
2025-06-05 | Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Chengwu Liu et.al. | 2506.04592 | null |
2025-06-04 | AuthGuard: Generalizable Deepfake Detection via Language Guidance | Guangyu Shen et.al. | 2506.04501 | null |
2025-06-04 | “Don’t Do That!”: Guiding Embodied Systems through Large Language Model-based Constraint Generation | Aladin Djuhera et.al. | 2506.04500 | null |
2025-06-04 | Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification | Payel Bhattacharjee et.al. | 2506.04450 | null |
2025-06-06 | TracLLM: A Generic Framework for Attributing Long Context LLMs | Yanting Wang et.al. | 2506.04202 | link |
2025-06-04 | N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion | Caleb Chin et.al. | 2506.04166 | link |
2025-06-04 | A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization | Sarvesh Soni et.al. | 2506.04156 | null |
2025-06-04 | High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning | Tim Franzmeyer et.al. | 2506.04051 | null |
2025-06-04 | Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization | Jiulong Wu et.al. | 2506.04039 | null |
2025-06-05 | Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems | Yuxin Zhang et.al. | 2506.03901 | null |
2025-06-04 | Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation | Mingxuan Xia et.al. | 2506.03857 | null |
2025-06-04 | From Theory to Practice: Real-World Use Cases on Trustworthy LLM-Driven Process Modeling, Prediction and Automation | Peter Pfeiffer et.al. | 2506.03801 | null |
2025-06-04 | Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision | Chaeyun Jang et.al. | 2506.03723 | null |
2025-06-04 | AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism | Zhepei Wei et.al. | 2506.03700 | link |
2025-06-04 | Robust Preference Optimization via Dynamic Target Margins | Jie Sun et.al. | 2506.03690 | null |
2025-06-04 | Trustworthy Medical Question Answering: An Evaluation-Centric Survey | Yinuo Wang et.al. | 2506.03659 | null |
2025-06-04 | Learning to Insert [PAUSE] Tokens for Better Reasoning | Eunki Kim et.al. | 2506.03616 | null |
2025-06-04 | Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering | Zhuo Zhuo et.al. | 2506.03504 | null |
2025-06-03 | Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior | Yue Gong et.al. | 2506.03444 | null |
2025-06-03 | Sampling Preferences Yields Simple Trustworthiness Scores | Sean Steinle et.al. | 2506.03399 | null |
2025-06-03 | Ask a Local: Detecting Hallucinations With Specialized Model Divergence | Aldan Creo et.al. | 2506.03357 | null |
2025-06-03 | Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows | Yifei Ming et.al. | 2506.03332 | null |
2025-06-03 | FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes | Christodoulos Constantinides et.al. | 2506.03278 | link |
2025-06-03 | Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech | Florian Ludwig et.al. | 2506.03009 | null |
2025-06-03 | Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation | Li Zhang et.al. | 2506.02992 | null |
2025-06-03 | Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation | Dingwei Chen et.al. | 2506.02973 | null |
2025-06-04 | A Multi-agent LLM-based JUnit Test Generation with Strong Oracles | Qinghua Xu et.al. | 2506.02943 | null |
2025-06-03 | Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs | Shangmin Guo et.al. | 2506.02918 | null |
2025-06-03 | Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs | Wenjing Tang et.al. | 2506.02860 | null |
2025-06-03 | Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations | Jinyuan Luo et.al. | 2506.02696 | null |
2025-06-04 | Computational Thinking Reasoning in Large Language Models | Kechi Zhang et.al. | 2506.02658 | null |
2025-06-03 | In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration | Jiajie Fu et.al. | 2506.02509 | null |
2025-06-03 | Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning | Haowen Xu et.al. | 2506.02485 | null |
2025-06-02 | Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation | Priyaranjan Pattnayak et.al. | 2506.02097 | null |
2025-06-02 | DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation | Jennifer Chen et.al. | 2506.01954 | null |
2025-06-02 | Self-ensemble: Mitigating Confidence Distortion for Large Language Models | Zicheng Xu et.al. | 2506.01951 | null |
2025-06-02 | WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue | Yaoyao Qian et.al. | 2506.01881 | link |
2025-06-02 | Benford’s Curse: Tracing Digit Bias to Numerical Hallucination in LLMs | Jiandong Shao et.al. | 2506.01734 | null |
2025-06-02 | Fairness Dynamics During Training | Krishna Patel et.al. | 2506.01709 | null |
2025-06-02 | When LLMs Team Up: The Emergence of Collaborative Affective Computing | Wenna Lai et.al. | 2506.01698 | null |
2025-06-02 | MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments | Xiao Yang et.al. | 2506.01616 | null |
2025-06-02 | Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes | Meng Li et.al. | 2506.01512 | null |
2025-06-02 | MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations | Kensuke Mitsuzawa et.al. | 2506.01367 | null |
2025-06-02 | Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents | Manan Suri et.al. | 2506.01344 | null |
2025-06-02 | Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model | Yuanhe Tian et.al. | 2506.01266 | null |
2025-06-01 | Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation | Pimchanok Sukjai et.al. | 2506.01118 | null |
2025-06-01 | ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation | Xinyi Liu et.al. | 2506.01116 | null |
2025-06-01 | Reconsidering LLM Uncertainty Estimation Methods in the Wild | Yavuz Bakman et.al. | 2506.01114 | null |
2025-06-01 | Contextual Candor: Enhancing LLM Trustworthiness Through Hierarchical Unanswerability Detection | Steven Robinson et.al. | 2506.01104 | null |
2025-06-01 | Taming LLMs by Scaling Learning Rates with Gradient Grouping | Siyuan Li et.al. | 2506.01049 | null |
2025-06-01 | Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks | Yuntai Bao et.al. | 2506.00823 | link |
2025-06-01 | One for All: Update Parameterized Knowledge Across Multiple Models | Weitao Ma et.al. | 2506.00817 | null |
2025-06-01 | Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision | Jiahui Zhou et.al. | 2506.00807 | null |
2025-06-01 | KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision | Rong Wu et.al. | 2506.00783 | null |
2025-06-01 | Do not Abstain! Identify and Solve the Uncertainty | Jingyu Liu et.al. | 2506.00780 | null |
2025-05-31 | Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection | Yeshwanth Venkatesha et.al. | 2506.00743 | null |
2025-05-31 | Pitfalls in Evaluating Language Model Forecasters | Daniel Paleka et.al. | 2506.00723 | null |
2025-06-03 | Measuring Faithfulness and Abstention: An Automated Pipeline for Evaluating LLM-Generated 3-ply Case-Based Legal Arguments | Li Zhang et.al. | 2506.00694 | null |
2025-05-31 | Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs | Chenjun Xu et.al. | 2506.00582 | link |
2025-05-31 | AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs | Nicholas E. Corrado et.al. | 2506.00569 | null |
2025-06-03 | CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention | Yuxi Sun et.al. | 2506.00519 | null |
2025-05-31 | Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering | Linhao Ye et.al. | 2506.00491 | null |
2025-05-31 | Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization | Suhas BN et.al. | 2506.00448 | null |
2025-05-31 | Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy | Jie Ren et.al. | 2506.00359 | null |
2025-05-31 | Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs | Sungjae Lee et.al. | 2506.00344 | null |
2025-05-31 | TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering | Boyi Zhang et.al. | 2506.00331 | null |
2025-05-31 | Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning | Sara Ghazanfari et.al. | 2506.00318 | null |
2025-05-30 | Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity | Dang Nguyen et.al. | 2506.00245 | null |
2025-05-30 | MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs | Gabrielle Kaili-May Liu et.al. | 2505.24858 | link |
2025-05-30 | Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | Juraj Vladika et.al. | 2505.24830 | null |
2025-06-02 | Guiding Generative Storytelling with Knowledge Graphs | Zhijun Pan et.al. | 2505.24803 | null |
2025-05-30 | Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty? | Jiayu Liu et.al. | 2505.24778 | link |
2025-05-30 | Can LLMs and humans be friends? Uncovering factors affecting human-AI intimacy formation | Yeseon Hong et.al. | 2505.24658 | null |
2025-05-30 | The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | Junyi Li et.al. | 2505.24630 | link |
2025-05-30 | LLM Inference Enhanced by External Knowledge: A Survey | Yu-Hsuan Lin et.al. | 2505.24377 | link |
2025-05-30 | ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration | Xianglong Yan et.al. | 2505.24357 | null |
2025-05-30 | Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction | Yangui Fang et.al. | 2505.24347 | null |
2025-05-30 | LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization | Zirui Shang et.al. | 2505.24282 | null |
2025-06-02 | MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | Bowen Dong et.al. | 2505.24238 | null |
2025-05-30 | ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction | Murari Ambati et.al. | 2505.24230 | null |
2025-05-30 | Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling | Yimin Du et.al. | 2505.24199 | null |
2025-05-29 | Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | Nokimul Hasan Arif et.al. | 2505.24007 | null |
2025-05-29 | Fitting the Message to the Moment: Designing Calendar-Aware Stress Messaging with Large Language Models | Pranav Rao et.al. | 2505.23997 | null |
2025-05-29 | Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Yinong Oliver Wang et.al. | 2505.23996 | null |
2025-05-29 | FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression | Jiayi Tian et.al. | 2505.23966 | link |
2025-05-29 | Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation | Caiqi Zhang et.al. | 2505.23912 | null |
2025-05-29 | Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems | Winstead Zhu et.al. | 2505.23908 | null |
2025-05-29 | Revisiting Uncertainty Estimation and Calibration of Large Language Models | Linwei Tao et.al. | 2505.23854 | null |
2025-05-28 | Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs | Jakub Podolak et.al. | 2505.23845 | null |
2025-05-28 | SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context | Hairu Wang et.al. | 2505.23841 | null |
2025-05-29 | SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models | Zixiang Xu et.al. | 2505.23713 | link |
2025-06-02 | Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation | Hongxiang Zhang et.al. | 2505.23657 | null |
2025-06-01 | Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms | Jane Cleland-Huang et.al. | 2505.23576 | null |
2025-05-30 | EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions | Xiaorui Wu et.al. | 2505.23473 | null |
2025-06-01 | A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy | Ahmad Mohsin et.al. | 2505.23397 | null |
2025-05-29 | Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs | Julia Belikova et.al. | 2505.23299 | null |
2025-05-29 | Daunce: Data Attribution through Uncertainty Estimation | Xingyuan Pan et.al. | 2505.23223 | null |
2025-05-29 | DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes | Sungjune Park et.al. | 2505.23179 | null |
2025-05-29 | AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models | Jinchuan Zhang et.al. | 2505.23020 | link |
2025-05-28 | Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents | Michael Kirchhof et.al. | 2505.22655 | null |
2025-05-28 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Ang Lv et.al. | 2505.22653 | null |
2025-05-30 | Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs | Ziling Cheng et.al. | 2505.22630 | null |
2025-05-28 | Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | Chengyue Wu et.al. | 2505.22618 | null |
2025-05-28 | Does Johnny Get the Message? Evaluating Cybersecurity Notifications for Everyday Users | Victor Jüttner et.al. | 2505.22435 | null |
2025-05-28 | AI Trust Reshaping Administrative Burdens: Understanding Trust-Burden Dynamics in LLM-Assisted Benefits Systems | Jeongwon Jo et.al. | 2505.22418 | null |
2025-05-28 | Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation | Yunsoo Kim et.al. | 2505.22222 | null |
2025-05-31 | iDSE: Navigating Design Space Exploration in High-Level Synthesis Using LLMs | Runkai Li et.al. | 2505.22086 | null |
2025-05-28 | Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home? | Yujin Choi et.al. | 2505.22061 | null |
2025-05-28 | Legal Assist AI: Leveraging Transformer-Based Model for Effective Legal Assistance | Jatin Gupta et.al. | 2505.22003 | null |
2025-05-28 | ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning | Zhendong Mi et.al. | 2505.21987 | null |
2025-05-28 | Judging LLMs on a Simplex | Patrick Vossler et.al. | 2505.21972 | null |
2025-05-28 | Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning | Qihuang Zhong et.al. | 2505.21958 | null |
2025-05-27 | Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation | Tharindu Kumarage et.al. | 2505.21784 | null |
2025-05-27 | Calibrating LLM Confidence by Probing Perturbed Representation Stability | Reza Khanmohammadi et.al. | 2505.21772 | null |
2025-05-30 | Do We Know What LLMs Don’t Know? A Study of Consistency in Knowledge Probing | Raoyuan Zhao et.al. | 2505.21701 | null |
2025-05-27 | The Feasibility of Topic-Based Watermarking on Academic Peer Reviews | Alexander Nemecek et.al. | 2505.21636 | null |
2025-05-27 | Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems | Young-Min Cho et.al. | 2505.21588 | null |
2025-05-27 | Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | Yihan Wang et.al. | 2505.21503 | null |
2025-05-27 | Can Large Reasoning Models Self-Train? | Sheikh Shafayat et.al. | 2505.21444 | null |
2025-05-27 | Pretrained LLMs Learn Multiple Types of Uncertainty | Roi Cohen et.al. | 2505.21218 | null |
2025-05-27 | Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA | Sergey Pletenev et.al. | 2505.21115 | null |
2025-05-27 | A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction | Bogdan Bogachov et.al. | 2505.21109 | null |
2025-05-27 | Thinker: Learning to Think Fast and Slow | Stephen Chung et.al. | 2505.21097 | null |
2025-05-28 | Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation | Ekaterina Fadeeva et.al. | 2505.21072 | null |
2025-05-27 | Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking | Lingyi Cai et.al. | 2505.21045 | null |
2025-05-27 | Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA | Xiangqing Shen et.al. | 2505.20971 | null |
2025-05-27 | IRCopilot: Automated Incident Response with Large Language Models | Xihuan Lin et.al. | 2505.20945 | null |
2025-05-27 | Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration? | Ziming Wang et.al. | 2505.20903 | null |
2025-05-27 | MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection | Baraa Hikal et.al. | 2505.20880 | null |
2025-05-27 | Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG | Xin Sun et.al. | 2505.20871 | null |
2025-05-27 | AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding | Chaeyoung Jung et.al. | 2505.20862 | null |
2025-05-27 | Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation | Wooseong Yang et.al. | 2505.20773 | null |
2025-05-30 | CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Xiaqiang Tang et.al. | 2505.20767 | link |
2025-05-27 | RRO: LLM Agent Optimization Through Rising Reward Trajectories | Zilong Wang et.al. | 2505.20737 | null |
2025-05-26 | Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting | Ana Rita Ortigoso et.al. | 2505.20521 | null |
2025-05-26 | InFact: Informativeness Alignment for Improved LLM Factuality | Roi Cohen et.al. | 2505.20487 | null |
2025-05-26 | HAMburger: Accelerating LLM Inference via Token Smashing | Jingyu Liu et.al. | 2505.20438 | null |
2025-05-26 | GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | Zihong Chen et.al. | 2505.20416 | link |
2025-05-26 | GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining | Simin Fan et.al. | 2505.20380 | null |
2025-05-26 | Reasoning LLMs are Wandering Solution Explorers | Jiahao Lu et.al. | 2505.20296 | null |
2025-05-26 | Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? | Michael Kirchhof et.al. | 2505.20295 | null |
2025-05-26 | Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models | Weihao Xuan et.al. | 2505.20236 | null |
2025-05-27 | Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning | Xiaorong Wang et.al. | 2505.20195 | null |
2025-05-26 | From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | Chun-Yi Kuan et.al. | 2505.20166 | null |
2025-05-26 | Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities | Chuangtao Ma et.al. | 2505.20099 | link |
2025-05-26 | Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks | Debargha Ganguly et.al. | 2505.20047 | null |
2025-05-26 | Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs | Artem Vazhentsev et.al. | 2505.20045 | null |
2025-05-26 | DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response | Bilel Cherif et.al. | 2505.19973 | null |
2025-05-26 | CP-Router: An Uncertainty-Aware Router Between LLM and LRM | Jiayuan Su et.al. | 2505.19970 | null |
2025-05-26 | Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | Tej Deep Pala et.al. | 2505.19706 | link |
2025-05-26 | Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | Liqin Ye et.al. | 2505.19675 | link |
2025-05-26 | DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | Yichun Feng et.al. | 2505.19630 | link |
2025-05-26 | Learning to Reason without External Rewards | Xuandong Zhao et.al. | 2505.19590 | link |
2025-05-26 | Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models | Jianxing Liao et.al. | 2505.19490 | null |
2025-05-26 | Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection | Mohammad Mahdi Moradi et.al. | 2505.19475 | null |
2025-05-26 | Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents | Ye Ye et.al. | 2505.19436 | link |
2025-05-26 | Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering | Jiajun Zhu et.al. | 2505.19410 | null |
2025-05-26 | VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation | Ethan TS. Liu et.al. | 2505.19395 | link |
2025-05-25 | Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales | Charles Godfrey et.al. | 2505.19334 | null |
2025-05-25 | LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models | Aida Kostikova et.al. | 2505.19240 | null |
2025-05-25 | GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling | Jialong Zhou et.al. | 2505.19234 | null |
2025-05-25 | LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling | Yang Xiao et.al. | 2505.19187 | link |
2025-05-27 | When Two LLMs Debate, Both Think They’ll Win | Pradyumna Shyama Prasad et.al. | 2505.19184 | null |
2025-05-25 | Do Large Language Models (Really) Need Statistical Foundations? | Weijie Su et.al. | 2505.19145 | null |
2025-05-25 | CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models | Yongheng Zhang et.al. | 2505.19108 | link |
2025-05-25 | Towards Harmonized Uncertainty Estimation for Large Language Models | Rui Li et.al. | 2505.19073 | null |
2025-05-25 | UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models | Roman Vashurin et.al. | 2505.19060 | null |
2025-05-25 | Online Knowledge Distillation with Reward Guidance | Chen Jia et.al. | 2505.18952 | null |
2025-05-25 | LLM-Guided Taxonomy and Hierarchical Uncertainty for 3D Point Cloud Active Learning | Chenxi Li et.al. | 2505.18924 | null |
2025-05-24 | Mitigating Deceptive Alignment via Self-Monitoring | Jiaming Ji et.al. | 2505.18807 | null |
2025-05-24 | PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs | Tengxuan Liu et.al. | 2505.18610 | link |
2025-05-24 | Response Uncertainty and Probe Modeling: Two Sides of the Same Coin in LLM Interpretability? | Yongjie Wang et.al. | 2505.18575 | null |
2025-05-24 | B-score: Detecting biases in large language models using response history | An Vo et.al. | 2505.18545 | null |
2025-05-24 | Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | Baolei Zhang et.al. | 2505.18543 | null |
2025-05-24 | RoleRAG: Enhancing LLM Role-Playing via Graph Guided Retrieval | Yongjie Wang et.al. | 2505.18541 | null |
2025-05-24 | AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking | Soyoung Yoon et.al. | 2505.18512 | link |
2025-05-24 | MedScore: Factuality Evaluation of Free-Form Medical Answers | Heyuan Huang et.al. | 2505.18452 | link |
2025-05-23 | Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps | Khandakar Ashrafi Akbar et.al. | 2505.18426 | null |
2025-05-23 | Model Editing with Graph-Based External Memory | Yash Kumar Atri et.al. | 2505.18343 | null |
2025-05-23 | NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | Donghyun Son et.al. | 2505.18231 | null |
2025-05-23 | Evidence-Grounded Multimodal Misinformation Detection with Attention-Based GNNs | Sharad Duwal et.al. | 2505.18221 | null |
2025-05-26 | Outcome-based Reinforcement Learning to Predict the Future | Benjamin Turtel et.al. | 2505.17989 | null |
2025-05-23 | LLM Meeting Decision Trees on Tabular Data | Hangting Ye et.al. | 2505.17918 | null |
2025-05-23 | Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour | Bálint Gyevnár et.al. | 2505.17801 | null |
2025-05-23 | C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models | Amir Hossein Rahmati et.al. | 2505.17773 | null |
2025-05-23 | But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors | Leon Eshuijs et.al. | 2505.17760 | null |
2025-05-23 | Get Experience from Practice: LLM Agents with Record & Replay | Erhu Feng et.al. | 2505.17716 | null |
2025-05-23 | Distilling LLM Agent into Small Models with Retrieval and Code Tools | Minki Kang et.al. | 2505.17612 | link |
2025-05-23 | Dynamic Text Bundling Supervision for Zero-Shot Inference on Text-Attributed Graphs | Yusheng Zhao et.al. | 2505.17599 | null |
2025-05-23 | Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection | Shrey Pandit et.al. | 2505.17558 | null |
2025-05-23 | How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception | Shiyu Ni et.al. | 2505.17537 | null |
2025-05-23 | CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents | Minsoo Khang et.al. | 2505.17503 | null |
2025-05-23 | keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection | Saketh Reddy Vemula et.al. | 2505.17485 | link |
2025-05-23 | Self-Training Large Language Models with Confident Reasoning | Hyosoon Jang et.al. | 2505.17454 | null |
2025-05-23 | A Fully Generative Motivational Interviewing Counsellor Chatbot for Moving Smokers Towards the Decision to Quit | Zafarullah Mahmood et.al. | 2505.17362 | link |
2025-05-22 | GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints | Soren DeHaan et.al. | 2505.17327 | null |
2025-05-22 | Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty | Peilin Wu et.al. | 2505.17281 | null |
2025-05-22 | Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG) | Clayton Cohn et.al. | 2505.17238 | null |
2025-05-22 | LLM-Powered Agents for Navigating Venice’s Historical Cadastre | Tristan Karch et.al. | 2505.17148 | null |
2025-05-22 | When can isotropy help adapt LLMs’ next word prediction to numerical domains? | Rashed Shelim et.al. | 2505.17135 | null |
2025-05-21 | NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction | Soyeon Kim et.al. | 2505.17125 | null |
2025-05-22 | R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Huatong Song et.al. | 2505.17005 | link |
2025-05-22 | UNCLE: Uncertainty Expressions in Long-Form Generation | Ruihan Yang et.al. | 2505.16922 | null |
2025-05-22 | Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs | Zeyu Wei et.al. | 2505.16894 | null |
2025-05-22 | Walk&Retrieve: Simple Yet Effective Zero-shot Retrieval-Augmented Generation via Knowledge Graph Walks | Martin Böckling et.al. | 2505.16849 | link |
2025-05-22 | Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement | Kexin Zhang et.al. | 2505.16806 | null |
2025-05-22 | Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs | Zeping Yu et.al. | 2505.16703 | null |
2025-05-22 | Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator | Beier Luo et.al. | 2505.16690 | null |
2025-05-22 | Collaboration among Multiple Large Language Models for Medical Question Answering | Kexin Shang et.al. | 2505.16648 | null |
2025-05-22 | Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering | Bowen Jiang et.al. | 2505.16591 | null |
2025-05-22 | Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs | Giovanni Servedio et.al. | 2505.16520 | null |
2025-05-24 | Recursive Offloading for LLM Serving in Multi-tier Networks | Zhiyuan Wu et.al. | 2505.16502 | link |
2025-05-22 | Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery | Yanbo Zhang et.al. | 2505.16477 | null |
2025-05-22 | MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | Siwei Meng et.al. | 2505.16456 | null |
2025-05-22 | Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems | Hongru Song et.al. | 2505.16367 | null |
2025-05-22 | HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation | Shijie Zhang et.al. | 2505.16281 | null |
2025-05-22 | Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation | Derong Xu et.al. | 2505.16237 | null |
2025-05-22 | Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models | Menschikov Mikhail et.al. | 2505.16134 | null |
2025-05-22 | Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | Junhong Lin et.al. | 2505.16122 | null |
2025-05-22 | LLM-Powered AI Agent Systems and Their Applications in Industry | Guannan Liang et.al. | 2505.16120 | null |
2025-05-22 | Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools | Panagiotis Lymperopoulos et.al. | 2505.16113 | null |
2025-05-23 | Continually Self-Improving Language Models for Bariatric Surgery Question–Answering | Yash Kumar Atri et.al. | 2505.16102 | null |
2025-05-21 | Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation | Ruijie Xi et.al. | 2505.16065 | null |
2025-05-21 | SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models | Roland Daynauth et.al. | 2505.16003 | null |
2025-05-22 | HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving | Zhiwen Chen et.al. | 2505.15793 | null |
2025-05-21 | Long-Form Information Alignment Evaluation Beyond Atomic Facts | Danna Zheng et.al. | 2505.15792 | null |
2025-05-21 | Large Language Models as Computable Approximations to Solomonoff Induction | Jun Wan et.al. | 2505.15784 | null |
2025-05-21 | KaFT: Knowledge-aware Fine-tuning for Boosting LLMs’ Domain-specific Question-Answering Performance | Qihuang Zhong et.al. | 2505.15480 | null |
2025-05-21 | AdUE: Improving uncertainty estimation head for LoRA adapters in LLMs | Artem Zabolotnyi et.al. | 2505.15443 | null |
2025-05-21 | RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection | Yiming Huang et.al. | 2505.15386 | null |
2025-05-21 | Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | Silvia Cappelletti et.al. | 2505.15323 | null |
2025-05-21 | Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization | Joonho Yang et.al. | 2505.15291 | null |
2025-05-21 | Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs | Zihao Pan et.al. | 2505.15265 | null |
2025-05-22 | Adaptive Plan-Execute Framework for Smart Contract Security Auditing | Zhiyuan Wei et.al. | 2505.15242 | null |
2025-05-21 | Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge | Yassir Fathullah et.al. | 2505.15240 | null |
2025-05-21 | Multilingual Prompting for Improving LLM Generation Diversity | Qihan Wang et.al. | 2505.15229 | null |
2025-05-21 | Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs | Jie Ma et.al. | 2505.15210 | link |
2025-05-21 | ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection | Jeonghye Kim et.al. | 2505.15182 | null |
2025-05-21 | Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning | Jinghui Lu et.al. | 2505.15154 | null |
2025-05-21 | The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | Shivam Agarwal et.al. | 2505.15134 | link |
2025-05-21 | RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals | Xuanliang Zhang et.al. | 2505.15110 | null |
2025-05-21 | Cost-aware LLM-based Online Dataset Annotation | Eray Can Elumar et.al. | 2505.15101 | null |
2025-05-21 | PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration | Yingming Pu et.al. | 2505.15047 | link |
2025-05-21 | Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models | Zhihao Wen et.al. | 2505.14992 | null |
2025-05-20 | JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation | Ghasem Pasandi et.al. | 2505.14978 | null |
2025-05-20 | Foundations of Unknown-aware Machine Learning | Xuefeng Du et.al. | 2505.14933 | null |
2025-05-20 | LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization | Chih-Yu Chang et.al. | 2505.14756 | link |
2025-05-20 | Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models | Guangzhi Xiong et.al. | 2505.14599 | link |
2025-05-20 | Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples | Chun-Yi Kuan et.al. | 2505.14518 | null |
2025-05-20 | Reasoning Models Better Express Their Confidence | Dongkeun Yoon et.al. | 2505.14489 | link |
2025-05-21 | Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis | Haoming Huang et.al. | 2505.14406 | null |
2025-05-20 | Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs | Jiawen Wang et.al. | 2505.14368 | null |
2025-05-20 | Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents | Wei Fan et.al. | 2505.14104 | null |
2025-05-20 | MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | Ernests Lavrinovics et.al. | 2505.14101 | link |
2025-05-20 | Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering | Yihua Zhu et.al. | 2505.14099 | null |
2025-05-20 | ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data | Xinzhe Zheng et.al. | 2505.14038 | null |
2025-05-21 | When LLMs meet open-world graph learning: a new perspective for unlabeled data uncertainty | Yanzhe Wen et.al. | 2505.13989 | null |
2025-05-20 | The Hallucination Tax of Reinforcement Finetuning | Linxin Song et.al. | 2505.13988 | null |
2025-05-20 | MLZero: A Multi-Agent System for End-to-end Machine Learning Automation | Haoyang Fang et.al. | 2505.13941 | link |
2025-05-20 | DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery | Kun Li et.al. | 2505.13940 | link |
2025-05-20 | Preference Learning with Lie Detectors can Induce Honesty or Evasion | Chris Cundy et.al. | 2505.13787 | link |
2025-05-19 | Incentivizing Truthful Language Models via Peer Elicitation Games | Baiting Chen et.al. | 2505.13636 | link |
2025-05-19 | Selective Code Generation for Functional Guarantees | Jaewoo Jeong et.al. | 2505.13553 | null |
2025-05-19 | Exploring Federated Pruning for Large Language Models | Pengxin Guo et.al. | 2505.13547 | link |
2025-05-19 | Know Or Not: a library for evaluating out-of-knowledge base robustness | Jessica Foo et.al. | 2505.13545 | link |
2025-05-16 | An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | Ayesha Amjad et.al. | 2505.13504 | null |
2025-05-19 | GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection | Zhijie Deng et.al. | 2505.13312 | null |
2025-05-19 | Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice | Zhi Liu et.al. | 2505.13156 | null |
2025-05-19 | Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | Debarpan Bhattacharya et.al. | 2505.13115 | link |
2025-05-19 | Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs | Shmulik Markovich-Golan et.al. | 2505.13060 | null |
2025-05-19 | Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering | Jianfeng Cai et.al. | 2505.12826 | null |
2025-05-19 | LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries | Kenya Abe et.al. | 2505.12694 | link |
2025-05-19 | Know3-RAG: A Knowledge-aware RAG Framework with Adaptive Retrieval, Generation, and Filtering | Xukai Liu et.al. | 2505.12662 | link |
2025-05-18 | UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection | Yang Zhao et.al. | 2505.12457 | null |
2025-05-18 | VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning | Qi Wang et.al. | 2505.12434 | link |
2025-05-18 | PSC: Extending Context Window of Large Language Models via Phase Shift Calibration | Wenqiao Zhu et.al. | 2505.12423 | link |
2025-05-18 | SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | Minghan Chen et.al. | 2505.12346 | null |
2025-05-18 | Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge | Luyu Chen et.al. | 2505.12301 | null |
2025-05-18 | The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models | Linghan Huang et.al. | 2505.12287 | null |
2025-05-18 | Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation | Chengwei Qin et.al. | 2505.12265 | null |
2025-05-17 | The Impact of Emerging Phishing Threats: Assessing Quishing and LLM-generated Phishing Emails against Organizations | Marie Weinz et.al. | 2505.12104 | null |
2025-05-20 | MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities | Jingxue Chen et.al. | 2505.12043 | null |
2025-05-17 | SOCIA: An End-to-End Agentic Framework for Automated Cyber-Physical-Social Simulator Generation | Yuncheng Hua et.al. | 2505.12006 | null |
2025-05-17 | TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text | Ahmed Lekssays et.al. | 2505.11988 | link |
2025-05-17 | CCNU at SemEval-2025 Task 3: Leveraging Internal and External Knowledge of Large Language Models for Multilingual Hallucination Annotation | Xu Liu et.al. | 2505.11965 | null |
2025-05-17 | Fine-Grained ECG-Text Contrastive Learning via Waveform Understanding Enhancement | Haitao Li et.al. | 2505.11939 | null |
2025-05-17 | Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | Zihao Dongfang et.al. | 2505.11907 | null |
2025-05-17 | When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research | Guijin Son et.al. | 2505.11855 | null |
2025-05-17 | Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs | Xuannan Liu et.al. | 2505.11842 | link |
2025-05-17 | Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling | Yitian Chen et.al. | 2505.11792 | null |
2025-05-17 | Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission | Seungeun Oh et.al. | 2505.11788 | null |
2025-05-16 | Token-Level Uncertainty Estimation for Large Language Model Reasoning | Tunyu Zhang et.al. | 2505.11737 | null |
2025-05-16 | Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models | Harshil Vejendla et.al. | 2505.11731 | null |
2025-05-16 | Terminators: Terms of Service Parsing and Auditing Agents | Maruf Ahmed Mridul et.al. | 2505.11672 | null |
2025-05-16 | EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | Bohao Xing et.al. | 2505.11405 | link |
2025-05-19 | Phare: A Safety Probe for Large Language Models | Pierre Le Jeune et.al. | 2505.11365 | link |
2025-05-16 | The Way We Prompt: Conceptual Blending, Neural Dynamics, and Prompt-Induced Transitions in LLMs | Makoto Sato et.al. | 2505.10948 | null |
2025-05-19 | Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation | Zhan Peng Lee et.al. | 2505.10792 | link |
2025-05-19 | Mitigate Language Priors in Large Vision-Language Models by Cross-Images Contrastive Decoding | Jianfei Zhao et.al. | 2505.10634 | null |
2025-05-14 | The Impact of Large Language Models on Task Automation in Manufacturing Services | Jochen Wulf et.al. | 2505.10581 | null |
2025-05-20 | AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges | Ranjan Sapkota et.al. | 2505.10468 | null |
2025-05-15 | GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs | Longchao Da et.al. | 2505.10143 | null |
2025-05-16 | Leveraging Graph Retrieval-Augmented Generation to Support Learners’ Understanding of Knowledge Concepts in MOOCs | Mohamed Abdelmagied et.al. | 2505.10074 | null |
2025-05-15 | Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis | Bingda Tang et.al. | 2505.10046 | link |
2025-05-15 | Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph | Deeksha Prahlad et.al. | 2505.09945 | link |
2025-05-15 | Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks | Ziyuan Zhang et.al. | 2505.09901 | link |
2025-05-14 | A Multimodal Multi-Agent Framework for Radiology Report Generation | Ziruo Yi et.al. | 2505.09787 | null |
2025-05-14 | Trustless Autonomy: Understanding Motivations, Benefits and Governance Dilemma in Self-Sovereign Decentralized AI Agents | Botao Amber Hu et.al. | 2505.09757 | null |
2025-05-15 | SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation | Achref Doula et.al. | 2505.09427 | null |
2025-05-14 | Statistical Modeling and Uncertainty Estimation of LLM Inference Systems | Kaustabha Ray et.al. | 2505.09319 | null |
2025-05-14 | Atomic Consistency Preference Optimization for Long-Form Question Answering | Jingfeng Chen et.al. | 2505.09039 | link |
2025-05-13 | Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification | Adarsh Kumar et.al. | 2505.09031 | null |
2025-05-13 | Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training | Yangyi Chen et.al. | 2505.08971 | link |
2025-05-13 | CellTypeAgent: Trustworthy cell type annotation with Large Language Models | Jiawen Chen et.al. | 2505.08844 | link |
2025-05-13 | Adaptive Schema-aware Event Extraction with Retrieval-Augmented Generation | Sheng Liang et.al. | 2505.08690 | null |
2025-05-13 | RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models | Fujun Zhang et.al. | 2505.08463 | null |
2025-05-13 | A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs | Artem Shelmanov et.al. | 2505.08200 | null |
2025-05-12 | LLMs to Support K-12 Teachers in Culturally Relevant Pedagogy: An AI Literacy Example | Jiayi Wang et.al. | 2505.08083 | null |
2025-05-11 | TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking | Ching Nam Hang et.al. | 2505.07891 | null |
2025-05-10 | Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints | Jian-Qiao Zhu et.al. | 2505.07883 | null |
2025-05-09 | Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy | A M Muntasir Rahman et.al. | 2505.07871 | null |
2025-05-12 | Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | Yifeng Di et.al. | 2505.07768 | link |
2025-05-12 | KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation | Ching Han Chen et.al. | 2505.07618 | null |
2025-05-12 | Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | Ziyang Huang et.al. | 2505.07596 | null |
2025-05-12 | Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models | Bahram Mohammadi et.al. | 2505.07500 | null |
2025-05-12 | Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis | Heydar Soudani et.al. | 2505.07459 | null |
2025-05-12 | LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning | Xiaotian Lin et.al. | 2505.07437 | link |
2025-05-12 | Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data | David de-Fitero-Dominguez et.al. | 2505.07372 | null |
2025-05-12 | Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection | Pei-Fu Guo et.al. | 2505.07309 | null |
2025-05-12 | Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Yifan Wei et.al. | 2505.07184 | link |
2025-05-13 | Exploring Anthropomorphism in Conversational Agents for Environmental Sustainability | Mathyas Giudici et.al. | 2505.07142 | null |
2025-05-14 | RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models | Hanzheng Dai et.al. | 2505.07089 | null |
2025-05-10 | POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models | Yangguang Shao et.al. | 2505.06579 | link |
2025-05-10 | LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus | Peihan Li et.al. | 2505.06513 | null |
2025-05-09 | Evolutionary thoughts: integration of large language models and evolutionary algorithms | Antonio Jimeno Yepes et.al. | 2505.05756 | link |
2025-05-08 | Adaptive Stress Testing Black-Box LLM Planners | Neeloy Chakraborty et.al. | 2505.05665 | null |
2025-05-08 | HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics | Lennart Luettgau et.al. | 2505.05602 | link |
2025-05-08 | FLAM: Frame-Wise Language-Audio Modeling | Yusong Wu et.al. | 2505.05335 | null |
2025-05-08 | MARK: Memory Augmented Refinement of Knowledge | Anish Ganguli et.al. | 2505.05177 | null |
2025-05-08 | A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network | Haoxiang Luo et.al. | 2505.05103 | null |
2025-05-08 | Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware | Yujia Chen et.al. | 2505.05057 | link |
2025-05-08 | An Open-Source Dual-Loss Embedding Model for Semantic Retrieval in Higher Education | Ramteja Sajja et.al. | 2505.04916 | null |
2025-05-07 | Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Manveer Singh Tamber et.al. | 2505.04847 | link |
2025-05-07 | Osiris: A Lightweight Open-Source Hallucination Detection System | Alex Shan et.al. | 2505.04844 | null |
2025-05-07 | A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models | Pedro Pinacho-Davidson et.al. | 2505.04784 | null |
2025-05-07 | The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems | Sutapa Dey Tithi et.al. | 2505.04736 | null |
2025-05-06 | Advancing Conversational Diagnostic AI with Multimodal Reasoning | Khaled Saab et.al. | 2505.04653 | null |
2025-05-06 | Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions | Adithya Kulkarni et.al. | 2505.04651 | null |
2025-05-09 | MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection | Zhihao Zhang et.al. | 2505.04594 | null |
2025-05-07 | Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters | David Exler et.al. | 2505.04393 | null |
2025-05-07 | Benchmarking LLMs’ Swarm intelligence | Kai Ruan et.al. | 2505.04364 | link |
2025-05-07 | LLM-Independent Adaptive RAG: Let the Question Speak for Itself | Maria Marina et.al. | 2505.04253 | null |
2025-05-07 | Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications | Yuanai Xie et.al. | 2505.04068 | null |
2025-05-02 | Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs | Ganghua Wang et.al. | 2505.03814 | null |
2025-05-02 | MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance | Xing Hu et.al. | 2505.03804 | null |
2025-05-02 | Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | Changhai Zhou et.al. | 2505.03802 | null |
2025-04-30 | Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding | Trilok Padhi et.al. | 2505.03788 | null |
2025-05-06 | A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning | Kolawole E. Ogunsina et.al. | 2505.03553 | null |
2025-05-06 | Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis | Shuang Zhou et.al. | 2505.03467 | null |
2025-05-06 | Automatic Calibration for Membership Inference Attack on Large Language Models | Saleh Zare Zade et.al. | 2505.03392 | link |
2025-05-06 | Interpretable Zero-shot Learning with Infinite Class Concepts | Zihan Ye et.al. | 2505.03361 | null |
2025-05-06 | Artificial Behavior Intelligence: Technology, Challenges, and Future Directions | Kanghyun Jo et.al. | 2505.03315 | null |
2025-05-06 | A Trustworthy Multi-LLM Network: Challenges, Solutions, and A Use Case | Haoxiang Luo et.al. | 2505.03196 | null |
2025-05-06 | Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering | Joshua Owotogbe et.al. | 2505.03096 | null |
2025-05-05 | Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | Zhengliang Shi et.al. | 2505.03075 | link |
2025-05-05 | UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output | Sicong Huang et.al. | 2505.03030 | null |
2025-05-05 | Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? | Guangzhi Sun et.al. | 2505.02884 | null |
2025-05-05 | Phase transitions in AI-human interaction networks: statistics, computation, and probabilistic modeling | Jackson George et.al. | 2505.02879 | null |
2025-05-08 | ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations | Dmitriy Shopkhoev et.al. | 2505.02819 | link |
2025-05-05 | Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing | Diji Yang et.al. | 2505.02811 | link |
2025-05-06 | Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation | Gerard Pons et.al. | 2505.02737 | null |
2025-05-04 | SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation | Tanguy Herserant et.al. | 2505.02235 | null |
2025-05-12 | LLM-Guided Probabilistic Program Induction for POMDP Model Estimation | Aidan Curtis et.al. | 2505.02216 | null |
2025-05-04 | Large Language Models are overconfident and amplify human bias | Fengfei Sun et.al. | 2505.02151 | null |
2025-05-04 | VECSR: Virtually Embodied Common Sense Reasoning System | Alexis R. Tudor et.al. | 2505.02144 | link |
2025-05-06 | Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation | Chenxi Liu et.al. | 2505.02138 | link |
2025-05-04 | Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach | Jiancong Xiao et.al. | 2505.01997 | null |
2025-05-03 | High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers | Brian Wong et.al. | 2505.01693 | null |
2025-05-02 | Always Tell Me The Odds: Fine-grained Conditional Probability Estimation | Liaoyaqi Wang et.al. | 2505.01595 | null |
2025-05-02 | Retrieval Augmented Learning: A Retrial-based Large Language Model Self-Supervised Learning and Autonomous Knowledge Generation | Zongyuan Li et.al. | 2505.01073 | null |
2025-05-02 | Multi-agents based User Values Mining for Recommendation | Lijian Chen et.al. | 2505.00981 | null |
2025-05-01 | Multivariate Conformal Selection | Tian Bai et.al. | 2505.00917 | null |
2025-05-08 | SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation | Quang P. M. Pham et.al. | 2505.00831 | link |
2025-05-01 | HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models | Zhaoxing Li et.al. | 2505.00820 | null |
2025-05-01 | A Survey on Large Language Model based Human-Agent Systems | Henry Peng Zou et.al. | 2505.00753 | link |
2025-05-05 | Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Dung Nguyen et.al. | 2505.00744 | null |
2025-05-01 | Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models | Makoto Sato et.al. | 2505.00557 | null |
2025-05-01 | HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection | Deanna Emery et.al. | 2505.00506 | null |
2025-05-01 | Distributed Retrieval-Augmented Generation | Chenhao Xu et.al. | 2505.00443 | link |
2025-04-30 | Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs | Jinyan Su et.al. | 2505.00127 | null |
2025-04-30 | Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Jeho Choi et.al. | 2505.00060 | null |
2025-04-24 | An Empirical Study on Prompt Compression for Large Language Models | Zheng Zhang et.al. | 2505.00019 | link |
2025-04-30 | MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness | Junsheng Huang et.al. | 2504.21773 | null |
2025-04-30 | Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | Xuanzhao Dong et.al. | 2504.21252 | link |
2025-05-01 | AI-in-the-Loop Planning for Transportation Electrification: Case Studies from Austin, Texas | Seung Jun Choi et.al. | 2504.21185 | null |
2025-04-29 | LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge | Naheed Rayhan et.al. | 2504.21132 | null |
2025-04-22 | ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees | Jun Wang et.al. | 2504.21022 | null |
2025-04-22 | Context-Enhanced Contrastive Search for Improved LLM Text Generation | Jaydip Sen et.al. | 2504.21020 | null |
2025-04-29 | Jekyll-and-Hyde Tipping Point in an AI’s Behavior | Neil F. Johnson et.al. | 2504.20980 | null |
2025-04-29 | SetKE: Knowledge Editing for Knowledge Elements Overlap | Yifan Wei et.al. | 2504.20972 | null |
2025-04-29 | Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models | Maryna Vyshnyvetska et.al. | 2504.20951 | null |
2025-04-29 | DYNAMAX: Dynamic computing for Transformers and Mamba based architectures | Miguel Nogales et.al. | 2504.20922 | null |
2025-04-29 | Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges | Yunseo Lee et.al. | 2504.20799 | null |
2025-04-29 | Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think | Hasan Abed Al Kader Hammoud et.al. | 2504.20708 | null |
2025-04-29 | Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? | Evangelia Gogoulou et.al. | 2504.20699 | null |
2025-04-29 | Identifying Uncertainty in Self-Adaptive Robotics with Large Language Models | Hassan Sartaj et.al. | 2504.20684 | null |
2025-04-30 | TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data | Qi Wang et.al. | 2504.20462 | null |
2025-04-28 | Towards Large Language Models for Lunar Mission Planning and In Situ Resource Utilization | Michael Pekala et.al. | 2504.20125 | null |
2025-04-24 | RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | Zihan Wang et.al. | 2504.20073 | link |
2025-04-28 | Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages | Pritika Rohera et.al. | 2504.20022 | null |
2025-04-28 | Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models | Xin Wang et.al. | 2504.20020 | null |
2025-04-28 | GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets | Mingqian He et.al. | 2504.19898 | null |
2025-04-28 | A Tripartite Perspective on GraphRAG | Michael Banf et.al. | 2504.19667 | null |
2025-04-28 | An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination | Dixiao Wei et.al. | 2504.19480 | null |
2025-04-28 | Towards Long Context Hallucination Detection | Siyi Liu et.al. | 2504.19457 | null |
2025-04-27 | Bi-directional Model Cascading with Proxy Confidence | David Warren et.al. | 2504.19391 | null |
2025-04-27 | The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach | Chad Coleman et.al. | 2504.19255 | null |
2025-04-30 | Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Dylan Bouchard et.al. | 2504.19254 | link |
2025-04-27 | Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models | Anindya Bijoy Das et.al. | 2504.19061 | null |
2025-04-26 | Calibrating Translation Decoding with Quality Estimation on LLMs | Di Wu et.al. | 2504.19044 | link |
2025-04-26 | AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression | Dong Whi Yoo et.al. | 2504.18932 | null |
2025-04-26 | Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning | Abdellah Ghassel et.al. | 2504.18839 | null |
2025-04-25 | Span-Level Hallucination Detection for LLM-Generated Answers | Passant Elchafei et.al. | 2504.18639 | null |
2025-04-24 | Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach | Iizalaarab Elhaimeur et.al. | 2504.18603 | null |
2025-04-25 | LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | Rajesh Yarra et.al. | 2504.18423 | null |
2025-04-25 | Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review | Toghrul Abbasli et.al. | 2504.18346 | null |
2025-04-25 | Evaluating Evaluation Metrics – The Mirage of Hallucination Detection | Atharva Kulkarni et.al. | 2504.18114 | null |
2025-04-25 | Random-Set Large Language Models | Muhammad Mubashar et.al. | 2504.18085 | null |
2025-04-25 | Validating Network Protocol Parsers with Traceable RFC Document Interpretation | Mingwei Zheng et.al. | 2504.18050 | null |
2025-04-24 | LLM Agent Swarm for Hypothesis-Driven Drug Discovery | Kevin Song et.al. | 2504.17967 | null |
2025-04-24 | HalluLens: LLM Hallucination Benchmark | Yejin Bang et.al. | 2504.17550 | null |
2025-04-24 | Combining Static and Dynamic Approaches for Mining and Testing Constraints for RESTful API Testing | Hieu Huynh et.al. | 2504.17287 | null |
2025-04-23 | How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study | Rendi Chevi et.al. | 2504.17083 | null |
2025-04-23 | Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models | Shariar Kabir et.al. | 2504.17052 | null |
2025-04-23 | (Im)possibility of Automated Hallucination Detection in Large Language Models | Amin Karbasi et.al. | 2504.17004 | null |
2025-04-18 | SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments | Dachun Sun et.al. | 2504.16947 | null |
2025-04-23 | Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models | Xuyang Zhu et.al. | 2504.16883 | null |
2025-04-23 | Monte Carlo Planning with Large Language Model for Text-Based Game Agents | Zijing Shi et.al. | 2504.16855 | null |
2025-04-23 | Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories | Mareike Lisker et.al. | 2504.16604 | null |
2025-04-23 | ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving | Jie JW Wu et.al. | 2504.16331 | null |
2025-04-23 | Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations | Nikhil Khandalkar et.al. | 2504.15903 | null |
2025-04-22 | Dynamic Early Exit in Reasoning Models | Chenxu Yang et.al. | 2504.15895 | link |
2025-04-22 | Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback | Ning Wang et.al. | 2504.15804 | null |
2025-04-22 | Grounded in Context: Retrieval-Based Method for Hallucination Detection | Assaf Gerner et.al. | 2504.15771 | null |
2025-04-20 | PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind | Yajie Yu et.al. | 2504.15313 | null |
2025-04-21 | Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning | Ehsan Ahmadi et.al. | 2504.15263 | null |
2025-04-21 | Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges | Nandan Thakur et.al. | 2504.15205 | null |
2025-04-21 | The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models | Ronak Pradeep et.al. | 2504.15068 | null |
2025-04-23 | aiXamine: Simplified LLM Safety and Security | Fatih Deniz et.al. | 2504.14985 | null |
2025-04-21 | POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications | Chunjing Gan et.al. | 2504.14917 | null |
2025-04-21 | CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs | Yingming Zheng et.al. | 2504.14905 | link |
2025-04-20 | HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis | Kangwei Xu et.al. | 2504.14641 | null |
2025-04-20 | A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models | Hongming Tan et.al. | 2504.14620 | null |
2025-04-20 | a1: Steep Test-time Scaling Law via Environment Augmented Generation | Lingrui Mei et.al. | 2504.14597 | null |
2025-04-20 | Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey | Ahsan Bilal et.al. | 2504.14520 | null |
2025-04-20 | VizTA: Enhancing Comprehension of Distributional Visualization with Visual-Lexical Fused Conversational Interface | Liangwei Wang et.al. | 2504.14507 | null |
2025-04-20 | CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge | Armin Toroghi et.al. | 2504.14462 | null |
2025-04-20 | Information Diffusion and Preferential Attachment in a Network of Large Language Models | Adit Jain et.al. | 2504.14438 | null |
2025-04-20 | ResNetVLLM-2: Addressing ResNetVLLM’s Multi-Modal Hallucinations | Ahmad Khalil et.al. | 2504.14429 | null |
2025-04-19 | Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts | Kun Qian et.al. | 2504.14375 | null |
2025-04-19 | Density Measures for Language Generation | Jon Kleinberg et.al. | 2504.14370 | null |
2025-04-19 | Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model | Youngbin Lee et.al. | 2504.14345 | link |
2025-04-19 | A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models | Chengming Wang et.al. | 2504.14241 | null |
2025-04-18 | Metacognition and Uncertainty Communication in Humans and Large Language Models | Mark Steyvers et.al. | 2504.14045 | null |
2025-04-18 | Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy | Regan Bolton et.al. | 2504.14044 | null |
2025-04-18 | Going Whole Hog: A Philosophical Defense of AI Cognition | Herman Cappelen et.al. | 2504.13988 | null |
2025-04-18 | Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations | Chenghao Xiao et.al. | 2504.13816 | link |
2025-04-18 | Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results | Andrea Santilli et.al. | 2504.13677 | null |
2025-04-18 | Do Prompt Patterns Affect Code Quality? A First Empirical Assessment of ChatGPT-Generated Code | Antonio Della Porta et.al. | 2504.13656 | null |
2025-04-18 | Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs | Gabriel Freedman et.al. | 2504.13644 | link |
2025-04-18 | Long-context Non-factoid Question Answering in Indic Languages | Ritwik Mishra et.al. | 2504.13615 | link |
2025-04-18 | Continual Pre-Training is (not) What You Need in Domain Adaption | Pin-Er Chen et.al. | 2504.13603 | null |
2025-04-18 | Trust, but verify | Michael J. Yuan et.al. | 2504.13443 | null |
2025-04-17 | Energy-Based Reward Models for Robust Language Model Alignment | Anamika Lochab et.al. | 2504.13134 | link |
2025-04-17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Haojian Huang et.al. | 2504.13122 | link |
2025-04-17 | Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild | Jiatai Wang et.al. | 2504.12982 | null |
2025-04-17 | QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? | Zhouyang Jiang et.al. | 2504.12961 | null |
2025-04-18 | Customizing Emotional Support: How Do Individuals Construct and Interact With LLM-Powered Chatbots | Xi Zheng et.al. | 2504.12943 | null |
2025-04-17 | Explainable AI in Usable Privacy and Security: Challenges and Opportunities | Vincent Freiberger et.al. | 2504.12931 | null |
2025-04-17 | Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Yicheng Pan et.al. | 2504.12773 | link |
2025-04-17 | Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations | Yiyou Sun et.al. | 2504.12691 | link |
2025-04-17 | Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models | Liyi Zhang et.al. | 2504.12585 | link |
2025-04-16 | PlanGlow: Personalized Study Planning with an Explainable and Controllable LLM-Driven System | Jiwon Chun et.al. | 2504.12452 | link |
2025-04-16 | Don’t Just Translate, Agitate: Using Large Language Models as Devil’s Advocates for AI Explanations | Ashley Suh et.al. | 2504.12424 | null |
2025-04-16 | Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study | Harry Li et.al. | 2504.12422 | null |
2025-04-16 | Gauging Overprecision in LLMs: An Empirical Study | Adil Bahaj et.al. | 2504.12098 | null |
2025-04-16 | Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models | Kris Pilcher et.al. | 2504.12012 | null |
2025-04-16 | SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes | Raúl Vázquez et.al. | 2504.11975 | null |
2025-04-16 | Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim et.al. | 2504.11816 | link |
2025-04-16 | Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming | Paul Denny et.al. | 2504.11723 | null |
2025-04-15 | From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs | Guocong Li et.al. | 2504.11277 | null |
2025-04-16 | Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR | Yulong Zhang et.al. | 2504.11101 | null |
2025-04-15 | MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique | Shuhang Liu et.al. | 2504.11009 | null |
2025-04-14 | CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates | Ankit Kumar Shaw et.al. | 2504.10738 | null |
2025-04-14 | HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving | Avinash Kumar et.al. | 2504.10724 | null |
2025-04-14 | EMAFusion: A Self-Optimizing System for Seamless LLM Selection and Integration | Soham Shah et.al. | 2504.10681 | null |
2025-04-14 | Efficient Process Reward Model Training via Active Learning | Keyu Duan et.al. | 2504.10559 | link |
2025-04-09 | Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion | Jakub Podolak et.al. | 2504.10509 | null |
2025-04-14 | Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? | Olha Shaposhnyk et.al. | 2504.10397 | null |
2025-04-16 | Heimdall: test-time scaling on the generative verification | Wenlei Shi et.al. | 2504.10337 | null |
2025-04-14 | From Prompting to Alignment: A Generative Framework for Query Recommendation | Erxue Min et.al. | 2504.10208 | null |
2025-04-14 | DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation | Hanghui Guo et.al. | 2504.10198 | null |
2025-04-14 | HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection | Mohamed A. Abdallah et.al. | 2504.10168 | null |
2025-04-14 | C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation | Xu Zhang et.al. | 2504.10167 | null |
2025-04-14 | The Human Visual System Can Inspire New Interaction Paradigms for LLMs | Diana Robinson et.al. | 2504.10101 | null |
2025-04-14 | Hallucination Detection in LLMs via Topological Divergence on Attention Graphs | Alexandra Bazarova et.al. | 2504.10063 | null |
2025-04-15 | Emotional Strain and Frustration in LLM Interactions in Software Engineering | Cristina Martinez Montes et.al. | 2504.10050 | null |
2025-04-14 | DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify | Zhengxuan Zhang et.al. | 2504.10036 | null |
2025-04-14 | EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control | Hanwen Wan et.al. | 2504.10030 | link |
2025-04-14 | KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference | Yuxuan Tian et.al. | 2504.09936 | null |
2025-04-14 | Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data | Shuai Zhao et.al. | 2504.09895 | null |
2025-04-14 | Reasoning Models Can Be Effective Without Thinking | Wenjie Ma et.al. | 2504.09858 | null |
2025-04-14 | RAKG: Document-level Retrieval Augmented Knowledge Graph Construction | Hairong Zhang et.al. | 2504.09823 | link |
2025-04-14 | Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning | Jingtian Wu et.al. | 2504.09781 | null |
2025-04-13 | DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | Zhenting Wang et.al. | 2504.09710 | link |
2025-04-17 | Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws | Zhixuan Pan et.al. | 2504.09597 | null |
2025-04-17 | ControlNET: A Firewall for RAG-based LLM System | Hongwei Yao et.al. | 2504.09593 | null |
2025-04-13 | How new data permeates LLM knowledge and how to dilute it | Chen Sun et.al. | 2504.09522 | null |
2025-04-13 | HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs | Sharanya Dasgupta et.al. | 2504.09482 | link |
2025-04-13 | Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection | MingShan Liu et.al. | 2504.09440 | null |
2025-04-12 | Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation | Pengcheng Zhou et.al. | 2504.09301 | null |
2025-04-12 | SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders | Ashmi Banerjee et.al. | 2504.09277 | null |
2025-04-12 | Towards More Efficient, Robust, Instance-adaptive, and Generalizable Online Learning | Zhiyong Wang et.al. | 2504.09192 | null |
2025-04-11 | Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation | Terrence Neumann et.al. | 2504.08954 | null |
2025-04-11 | Knowledge Graph-extended Retrieval Augmented Generation for Question Answering | Jasper Linders et.al. | 2504.08893 | null |
2025-04-11 | Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | Fangzhi Xu et.al. | 2504.08672 | link |
2025-04-11 | MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation | Tao Zhang et.al. | 2504.08621 | link |
2025-04-16 | Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks | Ye Ye et.al. | 2504.08525 | link |
2025-04-07 | SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Runjin Chen et.al. | 2504.07986 | link |
2025-04-10 | Token Level Routing Inference System for Edge Devices | Jianshu She et.al. | 2504.07878 | null |
2025-04-10 | Robust Hallucination Detection in LLMs via Adaptive Token Selection | Mengjia Niu et.al. | 2504.07863 | null |
2025-04-17 | PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization | Yang Jiao et.al. | 2504.07717 | null |
2025-04-10 | Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations | Sheila Castilho et.al. | 2504.07680 | null |
2025-04-10 | Enhancing Large Language Models through Neuro-Symbolic Integration and Ontological Reasoning | Ruslan Idelfonso Magana Vsevolodovna et.al. | 2504.07640 | link |
2025-04-11 | Malware analysis assisted by AI with R2AI | Axelle Apvrille et.al. | 2504.07574 | null |
2025-04-10 | A taxonomy of epistemic injustice in the context of AI and the case for generative hermeneutical erasure | Warmhold Jan Thomas Mollema et.al. | 2504.07531 | null |
2025-04-10 | Supervised Optimism Correction: Be Confident When LLMs Are Sure | Junjie Zhang et.al. | 2504.07527 | null |
2025-04-10 | Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction | Kyoyun Choi et.al. | 2504.07415 | null |
2025-04-10 | Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Hanqi Xiao et.al. | 2504.07389 | link |
2025-04-11 | Alice: Proactive Learning with Teacher’s Demonstrations for Weak-to-Strong Generalization | Shujin Wu et.al. | 2504.07316 | link |
2025-04-09 | HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification | Bibek Paudel et.al. | 2504.07069 | null |
2025-04-11 | Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration | Kostas Hatalis et.al. | 2504.06943 | null |
2025-04-09 | Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Minghe Gao et.al. | 2504.06606 | link |
2025-04-09 | Do Reasoning Models Show Better Verbalized Calibration? | Qingcheng Zeng et.al. | 2504.06564 | null |
2025-04-08 | Don’t Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning | Yuehan Qin et.al. | 2504.06438 | null |
2025-04-08 | Human Trust in AI Search: A Large-Scale Experiment | Haiwen Li et.al. | 2504.06435 | null |
2025-04-09 | GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization | Bojana Ranković et.al. | 2504.06265 | link |
2025-04-08 | VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs | Dongjun Qian et.al. | 2504.05673 | null |
2025-04-08 | On the Impact of Language Nuances on Sentiment Analysis with Large Language Models: Paraphrasing, Sarcasm, and Emojis | Naman Bhargava et.al. | 2504.05603 | null |
2025-04-07 | GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases | Alfred Clemedtson et.al. | 2504.05478 | link |
2025-04-07 | The challenge of uncertainty quantification of large language models in medicine | Zahra Atf et.al. | 2504.05278 | null |
2025-04-07 | DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation | Xinglin Lyu et.al. | 2504.05122 | link |
2025-04-07 | On the Performance of an Explainable Language Model on PubMedQA | Venkat Srinivasan et.al. | 2504.05074 | null |
2025-04-07 | Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning | Sugyeong Eo et.al. | 2504.05047 | null |
2025-04-07 | A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models | Carlos Peláez-González et.al. | 2504.04976 | null |
2025-04-07 | A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization | Wenyuan Xu et.al. | 2504.04950 | null |
2025-04-06 | Capturing AI’s Attention: Physics of Repetition, Hallucination, Bias and Beyond | Frank Yingjie Huo et.al. | 2504.04600 | null |
2025-04-06 | Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models | Rui Gan et.al. | 2504.04562 | link |
2025-04-06 | VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT | Zhuo Zhi et.al. | 2504.04471 | null |
2025-04-06 | An overview of model uncertainty and variability in LLM-based sentiment analysis. Challenges, mitigation strategies and the role of explainability | David Herrera-Poyatos et.al. | 2504.04462 | null |
2025-04-09 | How Accurately Do Large Language Models Understand Code? | Sabaat Haroon et.al. | 2504.04372 | null |
2025-04-06 | Generative Large Language Models Trained for Detecting Errors in Radiology Reports | Cong Sun et.al. | 2504.04336 | null |
2025-04-09 | Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks | Marios Kokkodis et.al. | 2504.04277 | null |
2025-04-05 | Adaptive Elicitation of Latent Information Using Natural Language | Jimmy Wang et.al. | 2504.04204 | null |
2025-04-04 | Structured Extraction of Process Structure Properties Relationships in Materials Science | Amit K Verma et.al. | 2504.03979 | null |
2025-04-04 | Bridging LMS and Generative AI: Dynamic Course Content Integration (DCCI) for Connecting LLMs to Course Content – The Ask ME Assistant | Kovan Mzwri et.al. | 2504.03966 | null |
2025-04-04 | Practical Poisoning Attacks against Retrieval-Augmented Generation | Baolei Zhang et.al. | 2504.03957 | null |
2025-04-04 | The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning | Virilo Tejedor et.al. | 2504.03823 | null |
2025-04-04 | Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy | Kamil Ciosek et.al. | 2504.03579 | null |
2025-04-04 | Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej | Shubham Kumar Nigam et.al. | 2504.03486 | null |
2025-04-07 | LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications | Botao Zhu et.al. | 2504.03444 | null |
2025-04-04 | Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models | Mirko Borszukovszki et.al. | 2504.03440 | null |
2025-04-04 | Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models | Afshin Khadangi et.al. | 2504.03302 | link |
2025-04-04 | Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations | Maik Larooij et.al. | 2504.03274 | null |
2025-04-04 | Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation | Weitao Li et.al. | 2504.03165 | link |
2025-04-03 | How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence | Hongzhe Du et.al. | 2504.02904 | null |
2025-04-03 | Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models | Liangjie Huang et.al. | 2504.02902 | null |
2025-04-01 | Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications | Hongliu Cao et.al. | 2504.02867 | null |
2025-04-01 | The Illusionist’s Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances | Yining Wang et.al. | 2504.02865 | null |
2025-04-03 | A Memory-Augmented LLM-Driven Method for Autonomous Merging of 3D Printing Work Orders | Yuhao Liu et.al. | 2504.02509 | null |
2025-04-03 | Cognitive Memory in Large Language Models | Lianlei Shan et.al. | 2504.02441 | null |
2025-04-02 | Achieving Unanimous Consensus in Decision Making Using Multi-Agents | Apurba Pokharel et.al. | 2504.02128 | null |
2025-04-02 | Aligned Better, Listen Better for Audio-Visual Large Language Models | Yuxin Guo et.al. | 2504.02061 | null |
2025-04-03 | Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation | Baban Gain et.al. | 2504.01919 | null |
2025-04-02 | LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution | Zhuoran Yang et.al. | 2504.01533 | null |
2025-04-03 | Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding | Sakhinana Sagar Srinivas et.al. | 2504.01281 | null |
2025-04-01 | Grade Guard: A Smart System for Short Answer Automated Grading | Niharika Dadu et.al. | 2504.01253 | null |
2025-04-01 | Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models | Rafael Giebisch et.al. | 2504.01248 | null |
2025-04-01 | Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery | Nicholas Clark et.al. | 2504.01205 | null |
2025-04-01 | $μ$KE: Matryoshka Unstructured Knowledge Editing of Large Language Models | Zian Su et.al. | 2504.01196 | null |
2025-04-01 | Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations | Mahjabin Nahar et.al. | 2504.01153 | link |
2025-04-01 | MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs) | Bikash Saha et.al. | 2504.01145 | link |
2025-04-01 | Investigating Large Language Models in Diagnosing Students’ Cognitive Skills in Math Problem-solving | Hyoungwook Jin et.al. | 2504.00843 | null |
2025-04-01 | Aplicação de Large Language Models na Análise e Síntese de Documentos Jurídicos: Uma Revisão de Literatura | Matheus Belarmino et.al. | 2504.00725 | null |
2025-04-01 | GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments | Enjun Du et.al. | 2504.00711 | null |
2025-04-01 | DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism | Dengchun Li et.al. | 2504.00661 | link |
2025-04-01 | Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences | Xiangyang Liu et.al. | 2504.00473 | null |
2025-04-01 | Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics | Shide Zhou et.al. | 2504.00446 | null |
2025-04-01 | Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding | Mohanakrishnan Hariharan et.al. | 2504.00409 | null |
2025-04-01 | When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) | Mahak Agarwal et.al. | 2504.00374 | null |
2025-03-31 | SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning | Shiyue Zhao et.al. | 2504.00115 | null |
2025-03-30 | Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge | Agam Shah et.al. | 2504.00042 | null |
2025-03-27 | Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1 | Birger Moell et.al. | 2504.00016 | null |
2025-03-31 | SQuat: Subspace-orthogonal KV Cache Quantization | Hao Wang et.al. | 2503.24358 | null |
2025-03-31 | Model Hemorrhage and the Robustness Limits of Large Language Models | Ziyang Ma et.al. | 2503.23924 | null |
2025-03-31 | Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement | Yuqiao Tan et.al. | 2503.23895 | link |
2025-03-31 | Adaptive Layer-skipping in Pre-trained LLMs | Xuan Luo et.al. | 2503.23798 | null |
2025-03-31 | MKA: Leveraging Cross-Lingual Consensus for Model Abstention | Sharad Duwal et.al. | 2503.23687 | link |
2025-03-30 | RARE: Retrieval-Augmented Reasoning Modeling | Zhengren Wang et.al. | 2503.23513 | link |
2025-03-30 | SCORE: Story Coherence and Retrieval Enhancement for AI Narratives | Qiang Yi et.al. | 2503.23512 | null |
2025-03-30 | Re-Aligning Language to Visual Objects with an Agentic Workflow | Yuming Chen et.al. | 2503.23508 | null |
2025-03-30 | An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering | Alexander Murphy et.al. | 2503.23415 | null |
2025-03-30 | Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation | Jiwon Jeong et.al. | 2503.23363 | link |
2025-03-30 | Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base | Linxin Song et.al. | 2503.23361 | null |
2025-03-29 | Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus | Claas Beger et.al. | 2503.23229 | link |
2025-03-29 | Large Language Models are Unreliable for Cyber Threat Intelligence | Emanuele Mezzi et.al. | 2503.23175 | null |
2025-03-29 | Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments | Yifan Xu et.al. | 2503.23105 | null |
2025-03-29 | DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation | Hsin-Ling Hsu et.al. | 2503.23013 | null |
2025-03-29 | Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective | Xinyu Yao et.al. | 2503.22954 | null |
2025-03-29 | Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering | Yugen Sato et.al. | 2503.22941 | null |
2025-04-02 | Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use | Nicholas Roth et.al. | 2503.22931 | null |
2025-03-28 | Identifying and Mitigating API Misuse in Large Language Models | Terry Yue Zhuo et.al. | 2503.22821 | null |
2025-03-26 | InfoBid: A Simulation Framework for Studying Information Disclosure in Auctions with Large Language Model-based Agents | Yue Yin et.al. | 2503.22726 | null |
2025-03-25 | Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models | Bowei Tian et.al. | 2503.22720 | null |
2025-03-25 | LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation | Sarah Martinson et.al. | 2503.22719 | link |
2025-03-31 | Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning | Abdullah Vanlioglu et.al. | 2503.22456 | null |
2025-03-28 | Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs | Yuan He et.al. | 2503.22362 | link |
2025-03-28 | Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions | Yubo Li et.al. | 2503.22353 | null |
2025-03-28 | BanglAssist: A Bengali-English Generative AI Chatbot for Code-Switching and Dialect-Handling in Customer Service | Francesco Kruk et.al. | 2503.22283 | null |
2025-03-28 | Learning to Instruct for Visual Instruction Tuning | Zhihan Zhou et.al. | 2503.22215 | null |
2025-03-28 | Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models | Zhanke Zhou et.al. | 2503.22165 | link |
2025-03-27 | Entropy-Aware Branching for Improved Mathematical Reasoning | Xianzhi Li et.al. | 2503.21961 | null |
2025-03-25 | OAEI-LLM-T: A TBox Benchmark Dataset for Understanding LLM Hallucinations in Ontology Matching Systems | Zhangcheng Qiang et.al. | 2503.21813 | null |
2025-03-27 | Cooking Task Planning using LLM and Verified by Graph Network | Ryunosuke Takebayashi et.al. | 2503.21564 | null |
2025-03-27 | SWI: Speaking with Intent in Large Language Models | Yuwei Yin et.al. | 2503.21544 | link |
2025-04-02 | Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? | Ashish Sardana et.al. | 2503.21157 | null |
2025-03-27 | Alleviating LLM-based Generative Retrieval Hallucination in Alipay Search | Yedan Shen et.al. | 2503.21098 | null |
2025-03-26 | Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework | Thomson Yen et.al. | 2503.21023 | link |
2025-03-26 | Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring | Fraol Batole et.al. | 2503.20934 | null |
2025-03-26 | Exploring CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2503.20826 | link |
2025-03-26 | Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy | Joonhyun Jeong et.al. | 2503.20823 | link |
2025-03-26 | MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | Yunhai Hu et.al. | 2503.20757 | null |
2025-03-26 | TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes | Raj Sanjay Shah et.al. | 2503.20648 | null |
2025-03-26 | Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Zehui Liao et.al. | 2503.20504 | null |
2025-03-26 | GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization | Zhouhong Gu et.al. | 2503.20194 | link |
2025-03-25 | FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs | Carlos Plou et.al. | 2503.19850 | null |
2025-03-25 | HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection | Maryam Bala et.al. | 2503.19650 | null |
2025-03-25 | KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models | Zhiwei Wang et.al. | 2503.19482 | null |
2025-03-25 | VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU | Zhongchun Zheng et.al. | 2503.19449 | null |
2025-03-25 | QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition | Yuxuan Hu et.al. | 2503.19353 | link |
2025-03-24 | Language Model Uncertainty Quantification with Attention Chain | Yinghao Li et.al. | 2503.19168 | link |
2025-03-24 | Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models | Nariman Naderi et.al. | 2503.18562 | null |
2025-03-24 | Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions | Dong Jing et.al. | 2503.18320 | null |
2025-03-23 | ShED-HD: A Shannon Entropy Distribution Framework for Lightweight Hallucination Detection on Edge Devices | Aneesh Vathul et.al. | 2503.18242 | null |
2025-03-23 | GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Varvara Krechetova et.al. | 2503.18129 | link |
2025-03-23 | SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA | V Venktesh et.al. | 2503.17990 | null |
2025-03-22 | A Modular Dataset to Demonstrate LLM Abstraction Capability | Adam Atanas et.al. | 2503.17645 | null |
2025-03-22 | ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently | Jaeyeon Lee et.al. | 2503.17587 | link |
2025-03-21 | Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring | Khadija Zanna et.al. | 2503.17569 | null |
2025-03-21 | Judge Anything: MLLM as a Judge Across Any Modality | Shu Pu et.al. | 2503.17489 | null |
2025-03-21 | LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language | Kun Chu et.al. | 2503.17309 | link |
2025-03-21 | FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs | Albert Sawczyn et.al. | 2503.17229 | null |
2025-03-20 | Investigating Retrieval-Augmented Generation in Quranic Studies: A Study of 13 Open-Source Large Language Models | Zahra Khalila et.al. | 2503.16581 | null |
2025-03-26 | Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models | Hanzhi Zhang et.al. | 2503.16541 | null |
2025-03-18 | Do Multimodal Large Language Models Understand Welding? | Grigorii Khvatskii et.al. | 2503.16537 | null |
2025-03-18 | Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine | Chengfeng Dou et.al. | 2503.16530 | null |
2025-03-18 | HDLCoRe: A Training-Free Framework for Mitigating Hallucinations in LLM-Generated HDL | Heng Ping et.al. | 2503.16528 | null |
2025-03-20 | Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data | Zijian Li et.al. | 2503.16260 | null |
2025-03-20 | Towards Lighter and Robust Evaluation for Retrieval Augmented Generation | Alex-Razvan Ispas et.al. | 2503.16161 | link |
2025-03-20 | ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph | Langming Liu et.al. | 2503.15990 | null |
2025-03-20 | Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models | Baolong Bi et.al. | 2503.15888 | link |
2025-03-21 | Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance | Hui Liu et.al. | 2503.15886 | null |
2025-03-20 | MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations | Kyungho Bae et.al. | 2503.15871 | null |
2025-03-20 | Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey | Xiaoou Liu et.al. | 2503.15850 | null |
2025-03-20 | Entropy-based Exploration Conduction for Multi-step Reasoning | Jinghan Zhang et.al. | 2503.15848 | null |
2025-03-23 | DNA Bench: When Silence is Smarter – Benchmarking Over-Reasoning in Reasoning LLMs | Masoud Hashemi et.al. | 2503.15793 | null |
2025-03-19 | R$^2$: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs | Zefeng Lin et.al. | 2503.15655 | null |
2025-03-19 | How Well Can AI Build SD Models? | William Schoenberg et.al. | 2503.15580 | null |
2025-03-19 | Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs | Yuqi Zhu et.al. | 2503.15341 | null |
2025-03-19 | Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning? | Roberto Araya et.al. | 2503.15268 | null |
2025-03-19 | Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems | Sejong Kim et.al. | 2503.15191 | link |
2025-03-19 | Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks | Yuting Guo et.al. | 2503.15169 | null |
2025-03-19 | ELTEX: A Framework for Domain-Driven Synthetic Data Generation | Arina Razmyslovich et.al. | 2503.15055 | link |
2025-03-18 | Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence | Sophia Hager et.al. | 2503.14749 | null |
2025-03-18 | Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving | Priscylla Silva et.al. | 2503.14630 | link |
2025-03-18 | Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations | Ziwei Ji et.al. | 2503.14477 | null |
2025-03-18 | From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models | Qiantong Wang et.al. | 2503.14392 | null |
2025-03-18 | How much do LLMs learn from negative examples? | Shadi Hamdan et.al. | 2503.14391 | link |
2025-03-18 | On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller? | Pouria Sarhadi et.al. | 2503.14379 | link |
2025-03-18 | Learning on LLM Output Signatures for gray-box LLM Behavior Analysis | Guy Bar-Shalom et.al. | 2503.14043 | link |
2025-03-18 | Predicting Human Choice Between Textually Described Lotteries | Eyal Marantz et.al. | 2503.14004 | null |
2025-03-18 | FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks | Siqi Zhang et.al. | 2503.13966 | null |
2025-03-19 | Enabling Inclusive Systematic Reviews: Incorporating Preprint Articles with Large Language Model-Driven Evaluations | Rui Yang et.al. | 2503.13857 | null |
2025-03-18 | Empowering GraphRAG with Knowledge Filtering and Integration | Kai Guo et.al. | 2503.13804 | null |
2025-03-18 | Mapping the Trust Terrain: LLMs in Software Engineering – Insights and Perspectives | Dipin Khati et.al. | 2503.13793 | null |
2025-03-17 | Pareidolic Illusions of Meaning: ChatGPT, Pseudolaw and the Triumph of Form over Substance | Joe McIntyre et.al. | 2503.13556 | null |
2025-03-14 | RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration | Hong Qing Yu et.al. | 2503.13514 | null |
2025-03-17 | MetaScale: Test-Time Scaling with Evolving Meta-Thoughts | Qin Liu et.al. | 2503.13447 | null |
2025-03-17 | Managing Hybrid Solid-State Drives Using Large Language Models | Qian Wei et.al. | 2503.13105 | null |
2025-03-17 | Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning | Junming Liu et.al. | 2503.12972 | null |
2025-03-17 | MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting | Rui Pu et.al. | 2503.12931 | null |
2025-03-17 | HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models | Xinyan Jiang et.al. | 2503.12908 | link |
2025-03-16 | Can LLMs Formally Reason as Abstract Interpreters for Program Analysis? | Jacqueline L. Mitchell et.al. | 2503.12686 | null |
2025-03-16 | From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations | Sarvesh Baskar et.al. | 2503.12556 | null |
2025-03-21 | LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation | Yuqi Sun et.al. | 2503.12547 | null |
2025-03-18 | SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | Jianzhu Yao et.al. | 2503.12349 | null |
2025-03-15 | PredicateFix: Repairing Static Analysis Alerts with Bridging Predicates | Yuan-An Xiao et.al. | 2503.12205 | null |
2025-03-20 | Applications of Large Language Model Reasoning in Feature Generation | Dharani Chandra et.al. | 2503.11989 | null |
2025-03-14 | LLM Agents for Education: Advances and Applications | Zhendong Chu et.al. | 2503.11733 | null |
2025-03-14 | Neutralizing Bias in LLM Reasoning using Entailment Graphs | Liang Cheng et.al. | 2503.11614 | link |
2025-03-14 | D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning | Jia Zhang et.al. | 2503.11441 | null |
2025-03-14 | Modeling Subjectivity in Cognitive Appraisal with Language Models | Yuxiang Zhou et.al. | 2503.11381 | null |
2025-03-14 | Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches | Panggih Kusuma Ningrum et.al. | 2503.11376 | null |
2025-03-14 | AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation | Fengyu Li et.al. | 2503.11346 | link |
2025-03-14 | Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | Aissatou Diallo et.al. | 2503.11336 | null |
2025-03-14 | Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries | Sahil Kale et.al. | 2503.11256 | link |
2025-03-14 | Collaboration is all you need: LLM Assisted Safe Code Translation | Rabimba Karanjai et.al. | 2503.11237 | null |
2025-03-13 | Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations | Piyush Gupta et.al. | 2503.10941 | null |
2025-03-13 | HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust | Yu Luo et.al. | 2503.10793 | null |
2025-03-12 | CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries | Zhiyuan Wang et.al. | 2503.10707 | null |
2025-03-12 | Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models | Shahnewaz Karim Sakib et.al. | 2503.10690 | null |
2025-03-13 | TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention | Jinhao Duan et.al. | 2503.10602 | link |
2025-03-13 | SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models | Sahar Admoni et.al. | 2503.10509 | null |
2025-03-13 | LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions | Gaurav Kumar Gupta et.al. | 2503.10486 | null |
2025-03-13 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Luyao Gao et.al. | 2503.10325 | null |
2025-03-13 | StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error | Shu-Xun Yang et.al. | 2503.10105 | link |
2025-03-13 | Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model | Qiyuan Deng et.al. | 2503.10093 | null |
2025-03-12 | Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets | Zahra Abbasiantaeb et.al. | 2503.09902 | link |
2025-03-12 | Probabilistic Reasoning with LLMs for k-anonymity Estimation | Jonathan Zheng et.al. | 2503.09674 | null |
2025-03-12 | CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Richard A. Dubniczky et.al. | 2503.09433 | link |
2025-03-12 | NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model | Yuzhi Lai et.al. | 2503.09335 | link |
2025-03-12 | Token Weighting for Long-Range Language Modeling | Falko Helm et.al. | 2503.09202 | link |
2025-03-12 | Is LLMs Hallucination Usable? LLM-based Negative Reasoning for Fake News Detection | Chaowei Zhang et.al. | 2503.09153 | null |
2025-03-11 | Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation | Yu Wang et.al. | 2503.08963 | null |
2025-03-11 | CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving | Changxing Liu et.al. | 2503.08683 | link |
2025-03-11 | DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process | Minjun Zhu et.al. | 2503.08569 | null |
2025-03-11 | Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Zhuo Zhi et.al. | 2503.08308 | null |
2025-03-11 | FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback | Kangan Qian et.al. | 2503.08162 | null |
2025-03-11 | LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification | Siyuan Wang et.al. | 2503.07937 | null |
2025-03-10 | Safety Guardrails for LLM-Enabled Robots | Zachary Ravichandran et.al. | 2503.07885 | null |
2025-03-10 | HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations | Samir Abdaljalil et.al. | 2503.07833 | null |
2025-03-07 | SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs | Jaewoo Song et.al. | 2503.07657 | null |
2025-03-07 | MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration | Jinguang Wang et.al. | 2503.07654 | null |
2025-03-10 | Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review | Samuel Ferino et.al. | 2503.07556 | null |
2025-03-10 | Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | Luyi Jiang et.al. | 2503.07306 | null |
2025-03-10 | Quantizing Large Language Models for Code Generation: A Differentiated Replication | Alessandro Giagnorio et.al. | 2503.07103 | null |
2025-03-10 | CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation | Runqi Sui et.al. | 2503.06950 | null |
2025-03-09 | Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia | Sabeen Ahmed et.al. | 2503.06797 | null |
2025-03-09 | Delusions of Large Language Models | Hongshen Xu et.al. | 2503.06709 | null |
2025-03-09 | Alignment for Efficient Tool Calling of Large Language Models | Hongshen Xu et.al. | 2503.06708 | null |
2025-03-09 | Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform | Chenyu Huang et.al. | 2503.06676 | null |
2025-03-09 | Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving | Yao Cheng et.al. | 2503.06567 | null |
2025-03-09 | Graph Retrieval-Augmented LLM for Conversational Recommendation Systems | Zhangchi Qiu et.al. | 2503.06430 | null |
2025-03-09 | Performant LLM Agentic Framework for Conversational AI | Alex Casella et.al. | 2503.06410 | null |
2025-03-08 | Sample-aware Adaptive Structured Pruning for Large Language Models | Jun Kong et.al. | 2503.06184 | null |
2025-03-08 | Wireless Hallucination in Generative AI-enabled Communications: Concepts, Issues, and Solutions | Xudong Wang et.al. | 2503.06149 | link |
2025-03-08 | A Survey on Post-training of Large Language Models | Guiyao Tie et.al. | 2503.06072 | link |
2025-03-07 | SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs | Samir Abdaljalil et.al. | 2503.05980 | null |
2025-03-07 | TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator | Deepak Vungarala et.al. | 2503.05951 | null |
2025-03-04 | I Think, Therefore I Hallucinate: Minds, Machines, and the Art of Being Wrong | Sebastian Barros et.al. | 2503.05806 | null |
2025-03-07 | R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Huatong Song et.al. | 2503.05592 | null |
2025-03-07 | Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information | Junbo Zhao et.al. | 2503.05543 | null |
2025-03-07 | Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering | Yusong Ke et.al. | 2503.05505 | null |
2025-03-07 | Maximum Hallucination Standards for Domain-Specific Large Language Models | Tingmingke Lu et.al. | 2503.05481 | null |
2025-03-07 | An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | Navdeep Kaur et.al. | 2503.05439 | null |
2025-03-07 | GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation | Zhenxuan Zhang et.al. | 2503.05347 | link |
2025-03-07 | Path Pooling: Train-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation | Hairu Wang et.al. | 2503.05203 | null |
2025-03-07 | RocketEval: Efficient Automated LLM Evaluation via Grading Checklist | Tianjun Wei et.al. | 2503.05142 | link |
2025-03-06 | LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Souvik Kundu et.al. | 2503.04982 | null |
2025-03-10 | Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents | Jingying Zeng et.al. | 2503.04830 | null |
2025-03-07 | START: Self-taught Reasoner with Tools | Chengpeng Li et.al. | 2503.04625 | null |
2025-03-06 | HalluCounter: Reference-free LLM Hallucination Detection in the Wild! | Ashok Urlana et.al. | 2503.04615 | null |
2025-03-06 | Benchmarking Reasoning Robustness in Large Language Models | Tong Yu et.al. | 2503.04550 | null |
2025-03-06 | TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction | Chao Wang et.al. | 2503.04457 | null |
2025-03-06 | On Fact and Frequency: LLM Responses to Misinformation Expressed with Uncertainty | Yana van de Sande et.al. | 2503.04271 | null |
2025-03-06 | Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation | Ziqiang Cui et.al. | 2503.04162 | null |
2025-03-06 | KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease | Yongchao Long et.al. | 2503.04153 | link |
2025-03-05 | Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis | Ahmad Hafez et.al. | 2503.03911 | link |
2025-03-07 | LEWIS (LayEr WIse Sparsity) – A Training Free Guided Model Merging Approach | Hetarth Chopra et.al. | 2503.03874 | null |
2025-03-04 | BotUmc: An Uncertainty-Aware Twitter Bot Detection with Multi-view Causal Inference | Tao Yang et.al. | 2503.03775 | null |
2025-03-05 | The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems | Richard Ren et.al. | 2503.03750 | null |
2025-03-05 | Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models | Bar Karov et.al. | 2503.03669 | link |
2025-03-05 | Structured Outputs Enable General-Purpose LLMs to be Medical Experts | Guangfu Guo et.al. | 2503.03194 | null |
2025-03-04 | SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs | Samir Abdaljalil et.al. | 2503.03032 | null |
2025-03-04 | Effectively Steer LLM To Follow Preference via Building Confident Directions | Bingqing Song et.al. | 2503.02989 | null |
2025-03-04 | Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework | Ziang Zhou et.al. | 2503.02863 | null |
2025-03-04 | Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers | Zicong He et.al. | 2503.02851 | link |
2025-03-04 | Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | Yuzhe Gu et.al. | 2503.02846 | link |
2025-03-04 | FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting | Congluo Xu et.al. | 2503.02692 | null |
2025-03-04 | MPO: Boosting LLM Agents with Meta Plan Optimization | Weimin Xiong et.al. | 2503.02682 | link |
2025-03-04 | Multidimensional Consistency Improves Reasoning in Language Models | Huiyuan Lai et.al. | 2503.02670 | null |
2025-03-05 | Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | Paul Stangel et.al. | 2503.02623 | null |
2025-03-04 | AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection | Dimitra Karkani et.al. | 2503.02442 | null |
2025-03-04 | Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling | Hang Zheng et.al. | 2503.02233 | null |
2025-03-04 | DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models | Saeed Ranjbar Alvar et.al. | 2503.02175 | link |
2025-03-03 | OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments | Qianwei Wang et.al. | 2503.02106 | null |
2025-03-05 | HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs | Tin Nguyen et.al. | 2503.02003 | link |
2025-03-02 | NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SelfCheckGPT | Jiaying Hong et.al. | 2503.01921 | link |
2025-03-01 | How to Steer LLM Latents for Hallucination Detection? | Seongheon Park et.al. | 2503.01917 | null |
2025-03-03 | Can (A)I Change Your Mind? | Miriam Havin et.al. | 2503.01844 | link |
2025-03-04 | Position: Don’t use the CLT in LLM evals with fewer than a few hundred datapoints | Sam Bowyer et.al. | 2503.01747 | null |
2025-03-03 | Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution | Kun Li et.al. | 2503.01695 | null |
2025-03-03 | When an LLM is apprehensive about its answers – and when its uncertainty is justified | Petr Sychev et.al. | 2503.01688 | link |
2025-03-03 | Evaluating LLMs’ Assessment of Mixed-Context Hallucination Through the Lens of Summarization | Siya Qi et.al. | 2503.01670 | link |
2025-03-03 | Detecting Stylistic Fingerprints of Large Language Models | Yehonatan Bitton et.al. | 2503.01659 | null |
2025-03-03 | Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning | Wenjie Wu et.al. | 2503.01642 | null |
2025-03-03 | Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering | Zhanghao Hu et.al. | 2503.01606 | null |
2025-03-03 | None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering | Zhi Rui Tam et.al. | 2503.01550 | null |
2025-03-03 | Revisiting Large Language Model Pruning using Neuron Semantic Attribution | Yizhuo Ding et.al. | 2503.01542 | null |
2025-03-03 | What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret | Yufeng Yuan et.al. | 2503.01491 | null |
2025-03-03 | Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation | Linhai Zhang et.al. | 2503.01315 | null |
2025-03-03 | LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains | Ling Xiao et.al. | 2503.01236 | null |
2025-03-06 | CE-U: Cross Entropy Unlearning | Bo Yang et.al. | 2503.01224 | null |
2025-03-03 | Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG | Wenbin Wang et.al. | 2503.01222 | link |
2025-03-04 | Can Large Language Models Help Experimental Design for Causal Discovery? | Junyi Li et.al. | 2503.01139 | null |
2025-03-02 | Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies | Tianyi Huang et.al. | 2503.00724 | null |
2025-03-02 | GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development | Leming Shen et.al. | 2503.00686 | link |
2025-03-02 | From Prompting to Partnering: Personalization Features for Human-LLM Interactions | Si Thu et.al. | 2503.00681 | null |
2025-03-01 | Embracing Diversity: A Multi-Perspective Approach with Soft Labels | Benedetta Muscato et.al. | 2503.00489 | null |
2025-03-01 | U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack | Yunfan Gao et.al. | 2503.00353 | link |
2025-03-01 | Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy | Jahan C. Penny-Dimri et.al. | 2503.00269 | null |
2025-02-28 | A Survey of Uncertainty Estimation Methods on Large Language Models | Zhiqiu Xia et.al. | 2503.00172 | null |
2025-02-27 | Societal Alignment Frameworks Can Improve LLM Alignment | Karolina Stańczak et.al. | 2503.00069 | null |
2025-03-04 | Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs | Xiaomin Li et.al. | 2502.21239 | null |
2025-02-28 | PASemiQA: Plan-Assisted Agent for Question Answering on Semi-Structured Data with Text and Relational Information | Hansi Yang et.al. | 2502.21087 | null |
2025-03-03 | A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation | Xujie Yuan et.al. | 2502.20854 | null |
2025-02-28 | Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow | Jiaqi Bai et.al. | 2502.20750 | link |
2025-02-28 | Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models | Colleen Gilhuly et.al. | 2502.20647 | null |
2025-02-28 | Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems | Jędrzej Warczyński et.al. | 2502.20609 | null |
2025-02-27 | Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Ryan C. Barron et.al. | 2502.20364 | link |
2025-02-27 | Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models | Yi Jing et.al. | 2502.20344 | null |
2025-02-27 | Expertise Is What We Want | Alan Ashworth et.al. | 2502.20335 | null |
2025-02-27 | Conformal Tail Risk Control for Large Language Model Alignment | Catherine Yu-Chi Chen et.al. | 2502.20285 | null |
2025-02-27 | Similarity-Distance-Magnitude Universal Verification | Allen Schmaltz et.al. | 2502.20167 | link |
2025-03-04 | ProAPO: Progressively Automatic Prompt Optimization for Visual Classification | Xiangyan Qu et.al. | 2502.19844 | link |
2025-02-27 | Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences | Linzhuo Li et.al. | 2502.19679 | null |
2025-02-26 | Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review | Sungduk Yu et.al. | 2502.19614 | null |
2025-02-26 | Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems | Nayoung Choi et.al. | 2502.19596 | null |
2025-02-26 | Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents | Ashley Lewis et.al. | 2502.19545 | null |
2025-02-26 | Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices | Xinru Wang et.al. | 2502.19410 | null |
2025-02-26 | Verde: Verification via Refereed Delegation for Machine Learning Programs | Arasu Arun et.al. | 2502.19405 | null |
2025-02-26 | Efficient Federated Search for Retrieval-Augmented Generation | Rachid Guerraoui et.al. | 2502.19280 | null |
2025-02-26 | Bi’an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation | Zhouyu Jiang et.al. | 2502.19209 | null |
2025-02-26 | Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement | Siyuan Zhang et.al. | 2502.19127 | null |
2025-02-26 | Talking like Piping and Instrumentation Diagrams (P&IDs) | Achmad Anggawirya Alimin et.al. | 2502.18928 | null |
2025-02-26 | Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models | Shuliang Liu et.al. | 2502.18817 | null |
2025-02-26 | Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science | Xiaohua Wu et.al. | 2502.18729 | null |
2025-02-25 | Scalable Best-of-N Selection for Large Language Models via Self-Certainty | Zhewei Kang et.al. | 2502.18581 | link |
2025-02-25 | Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions | Yizhe Zhang et.al. | 2502.18435 | null |
2025-02-25 | Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods | Nicola Cecere et.al. | 2502.18389 | null |
2025-02-25 | BRIDO: Bringing Democratic Order to Abstractive Summarization | Junhyun Lee et.al. | 2502.18342 | null |
2025-02-25 | Can LLMs Explain Themselves Counterfactually? | Zahra Dehghanighobadi et.al. | 2502.18156 | null |
2025-02-25 | LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers | Zhuocheng Zhang et.al. | 2502.18139 | link |
2025-02-25 | Verdict: A Library for Scaling Judge-Time Compute | Nimit Kalra et.al. | 2502.18018 | link |
2025-02-27 | LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction | Suozhi Huang et.al. | 2502.17925 | null |
2025-02-25 | An Overview of Large Language Models for Statisticians | Wenlong Ji et.al. | 2502.17814 | null |
2025-02-25 | Uncertainty Quantification for LLM-Based Survey Simulations | Chengpiao Huang et.al. | 2502.17773 | null |
2025-02-24 | Hallucination Detection in LLMs Using Spectral Features of Attention Maps | Jakub Binkowski et.al. | 2502.17598 | link |
2025-02-24 | Towards Conditioning Clinical Text Generation for User Control | Osman Alperen Koraş et.al. | 2502.17571 | null |
2025-02-22 | SAE-V: Interpreting Multimodal Models for Enhanced Alignment | Hantao Lou et.al. | 2502.17514 | null |
2025-02-24 | CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought | Boxuan Zhang et.al. | 2502.17214 | link |
2025-02-24 | IGDA: Interactive Graph Discovery through Large Language Model Agents | Alex Havrilla et.al. | 2502.17189 | null |
2025-02-24 | LettuceDetect: A Hallucination Detection Framework for RAG Applications | Ádám Kovács et.al. | 2502.17125 | link |
2025-02-27 | LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences | Sijia Yao et.al. | 2502.17057 | link |
2025-02-24 | Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology | Longchao Da et.al. | 2502.17026 | null |
2025-02-24 | Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning | Jiaheng Li et.al. | 2502.16896 | null |
2025-02-24 | Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models | Yaqi Sun et.al. | 2502.16842 | null |
2025-02-25 | Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses | Tiejin Chen et.al. | 2502.16820 | null |
2025-02-23 | Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT | Nidhal Jegham et.al. | 2502.16428 | null |
2025-02-23 | Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications | Feng Ma et.al. | 2502.16402 | null |
2025-02-22 | An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | Masoud Shokrnezhad et.al. | 2502.16198 | null |
2025-02-22 | EPERM: An Evidence Path Enhanced Reasoning Model for Knowledge Graph Question and Answering | Xiao Long et.al. | 2502.16171 | null |
2025-02-22 | ZiGong 1.0: A Large Language Model for Financial Credit | Yu Lei et.al. | 2502.16159 | null |
2025-02-22 | The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination | Yuji Zhang et.al. | 2502.16143 | null |
2025-02-22 | Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals | Linda Zeng et.al. | 2502.16101 | null |
2025-02-21 | Position: Standard Benchmarks Fail – LLM Agents Present Overlooked Risks for Financial Applications | Zichen Chen et.al. | 2502.15865 | null |
2025-02-20 | Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection | Yihao Xue et.al. | 2502.15845 | null |
2025-02-20 | Hallucination Detection in Large Language Models with Metamorphic Relations | Borui Yang et.al. | 2502.15844 | null |
2025-02-21 | AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind | Zhining Zhang et.al. | 2502.15676 | link |
2025-02-24 | Empowering LLMs with Logical Reasoning: A Comprehensive Survey | Fengxiang Cheng et.al. | 2502.15652 | null |
2025-02-21 | A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany | Ina Dormuth et.al. | 2502.15568 | null |
2025-02-21 | PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | Pengcheng Huang et.al. | 2502.15543 | link |
2025-02-21 | Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making | Eunhye Kim et.al. | 2502.15395 | null |
2025-02-21 | Evaluating Social Biases in LLM Reasoning | Xuyang Wu et.al. | 2502.15361 | null |
2025-02-21 | From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants | Manisha Mukherjee et.al. | 2502.15237 | null |
2025-02-20 | Using tournaments to calculate AUROC for zero-shot classification with LLMs | Wonjin Yoon et.al. | 2502.15018 | null |
2025-02-19 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment | Xiangjin Xie et.al. | 2502.14913 | null |
2025-02-19 | EvoP: Robust LLM Inference via Evolutionary Pruning | Shangyu Wu et.al. | 2502.14910 | null |
2025-02-19 | KOALA: Knowledge Conflict Augmentations for Robustness in Vision Language Models | Peter Carragher et.al. | 2502.14908 | link |
2025-02-20 | Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning | Shuyue Stella Li et.al. | 2502.14860 | link |
2025-02-20 | Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs | Zongxia Li et.al. | 2502.14748 | null |
2025-02-20 | CER: Confidence Enhanced Reasoning in LLMs | Ali Razghandi et.al. | 2502.14634 | link |
2025-02-20 | Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery | Minh-Quyet Ha et.al. | 2502.14631 | null |
2025-02-20 | ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification | Hyunseok Lee et.al. | 2502.14565 | null |
2025-02-20 | Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation | Austin A. Barr et.al. | 2502.14523 | link |
2025-02-25 | How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? | Sergey Pletenev et.al. | 2502.14502 | link |
2025-02-20 | Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models | Artem Vazhentsev et.al. | 2502.14427 | link |
2025-02-20 | ParallelComp: Parallel Long-Context Compressor for Length Extrapolation | Jing Xiong et.al. | 2502.14317 | null |
2025-02-20 | MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | Shrey Pandit et.al. | 2502.14302 | null |
2025-02-20 | STeCa: Step-level Trajectory Calibration for LLM Agent Learning | Hanlin Wang et.al. | 2502.14276 | link |
2025-02-20 | Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment | Jiaxi Li et.al. | 2502.14275 | null |
2025-02-20 | PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant | Congrui Yin et.al. | 2502.14271 | null |
2025-02-20 | MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels | Xiaoou Liu et.al. | 2502.14268 | null |
2025-02-20 | Multi-Faceted Studies on Data Poisoning can Advance LLM Development | Pengfei He et.al. | 2502.14182 | link |
2025-02-19 | SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation | Song Duong et.al. | 2502.13674 | null |
2025-02-19 | C2T: A Classifier-Based Tree Construction Method in Speculative Decoding | Feiye Huo et.al. | 2502.13652 | null |
2025-02-19 | REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models | DongGeon Lee et.al. | 2502.13622 | null |
2025-02-19 | What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis | Peiran Wang et.al. | 2502.13490 | null |
2025-02-19 | LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models | Ruiming Tang et.al. | 2502.13481 | null |
2025-02-19 | TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Jialin Ouyang et.al. | 2502.13442 | link |
2025-02-19 | Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning | Ningke Li et.al. | 2502.13416 | null |
2025-02-19 | Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval | Aditya Sharma et.al. | 2502.13369 | null |
2025-02-18 | SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? | Yucheng Shi et.al. | 2502.13233 | null |
2025-02-17 | Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment | Yuze Zhao et.al. | 2502.13170 | link |
2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
2025-02-18 | Understanding and Rectifying Safety Perception Distortion in VLMs | Xiaohan Zou et.al. | 2502.13095 | null |
2025-02-18 | LAMD: Context-driven Android Malware Detection and Classification with LLMs | Xingzhi Qian et.al. | 2502.13055 | null |
2025-02-20 | Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation | Sha Li et.al. | 2502.13019 | null |
2025-02-18 | Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs | Adi Simhi et.al. | 2502.12964 | null |
2025-02-18 | Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models | Elena Stringli et.al. | 2502.12821 | null |
2025-02-20 | How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild | Saad Obaid ul Islam et.al. | 2502.12769 | link |
2025-02-18 | R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | Sumin Jo et.al. | 2502.12767 | link |
2025-02-18 | “I know myself better, but not really greatly”: Using LLMs to Detect and Explain LLM-Generated Texts | Jiazhou Ji et.al. | 2502.12743 | null |
2025-02-18 | R.R.: Unveiling LLM Training Privacy through Recollection and Ranking | Wenlong Meng et.al. | 2502.12658 | link |
2025-02-18 | COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation | Sean Wang et.al. | 2502.12601 | null |
2025-02-18 | EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning | Xiaoqian Liu et.al. | 2502.12486 | null |
2025-02-18 | Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking | Alireza S. Ziabari et.al. | 2502.12470 | null |
2025-02-18 | MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation | Yutong Wang et.al. | 2502.12468 | null |
2025-02-17 | Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs | Kan Zhu et.al. | 2502.12216 | null |
2025-02-17 | Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control | Jinyan Su et.al. | 2502.12145 | link |
2025-02-17 | KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs | Qi Zhao et.al. | 2502.12029 | null |
2025-02-17 | SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities | Fengqing Jiang et.al. | 2502.12025 | null |
2025-02-17 | Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning | Tianyi Wu et.al. | 2502.11962 | null |
2025-02-17 | Can Your Uncertainty Scores Detect Hallucinated Entity? | Min-Hsuan Yeh et.al. | 2502.11948 | null |
2025-02-17 | Cognitive-Aligned Document Selection for Retrieval-augmented Generation | Bingyu Wan et.al. | 2502.11770 | null |
2025-02-17 | ReviewEval: An Evaluation Framework for AI-Generated Reviews | Chavvi Kirtani et.al. | 2502.11736 | null |
2025-02-17 | Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception | Shiyu Ni et.al. | 2502.11677 | null |
2025-02-17 | Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation | Arindam Sharma et.al. | 2502.11620 | null |
2025-02-17 | Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? | Hanxing Ding et.al. | 2502.11400 | null |
2025-02-17 | “Nuclear Deployed!”: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Rongwu Xu et.al. | 2502.11355 | link |
2025-02-16 | Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation | Hieu Nguyen et.al. | 2502.11306 | null |
2025-02-16 | Uncertainty-Aware Step-wise Verification with Generative Reward Models | Zihuiwen Ye et.al. | 2502.11250 | null |
2025-02-16 | A Survey of LLM-based Agents in Medicine: How far are we from Baymax? | Wenxuan Wang et.al. | 2502.11211 | null |
2025-02-16 | Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | Fei Yu et.al. | 2502.11155 | null |
2025-02-18 | Valuable Hallucinations: Realizable Non-realistic Propositions | Qiucheng Chen et.al. | 2502.11113 | null |
2025-02-16 | Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications | Alexandru Lecu et.al. | 2502.11108 | link |
2025-02-16 | Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models | Prateek Chhikara et.al. | 2502.11028 | link |
2025-02-16 | Leveraging Uncertainty Estimation for Efficient LLM Routing | Tuo Zhang et.al. | 2502.11021 | null |
2025-02-16 | Agentic LLM Framework for Adaptive Decision Discourse | Antoine Dolant et.al. | 2502.10978 | null |
2025-02-16 | SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information | Xiangyu Zhang et.al. | 2502.10950 | null |
2025-02-15 | Towards Effective Extraction and Evaluation of Factual Claims | Dasha Metropolitansky et.al. | 2502.10855 | null |
2025-02-15 | An Empirical Analysis of Uncertainty in Large Language Model Evaluations | Qiujie Xie et.al. | 2502.10709 | link |
2025-02-15 | LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization | Erica Zhang et.al. | 2502.10648 | link |
2025-02-14 | Post-training an LLM for RAG? Train on Self-Generated Demonstrations | Matthew Finlayson et.al. | 2502.10596 | null |
2025-02-14 | Can Large Language Model Agents Balance Energy Systems? | Xinxing Ren et.al. | 2502.10557 | link |
2025-02-14 | A novel approach to data generation in generative model | JaeHong Kim et.al. | 2502.10092 | null |
2025-02-14 | Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos | Weirui Ye et.al. | 2502.09886 | null |
2025-02-14 | Automated Hypothesis Validation with Agentic Sequential Falsifications | Kexin Huang et.al. | 2502.09858 | link |
2025-02-13 | Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes | Taylan G. Topcu et.al. | 2502.09690 | null |
2025-02-13 | LP-LM: No Hallucinations in Question Answering with Logic Programming | Katherine Wu et.al. | 2502.09212 | link |
2025-02-13 | Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York | Sanskar Sehgal et.al. | 2502.09204 | null |
2025-02-13 | Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables | Xuzhao Geng et.al. | 2502.09073 | null |
2025-02-13 | Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models | Xin Zhou et.al. | 2502.08922 | null |
2025-02-13 | MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training | Xinxin You et.al. | 2502.08904 | null |
2025-02-12 | Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | Mohammad Mahdi Abootorabi et.al. | 2502.08826 | link |
2025-02-11 | Hallucination, Monofacts, and Miscalibration: An Empirical Investigation | Muqing Miao et.al. | 2502.08666 | link |
2025-02-10 | Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis | Emanuele Ricco et.al. | 2502.08663 | null |
2025-02-09 | Few-shot LLM Synthetic Data with Distribution Matching | Jiyuan Ren et.al. | 2502.08661 | link |
2025-02-08 | Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions | Jingxin Xu et.al. | 2502.08657 | null |
2025-02-12 | Ensemble based approach to quantifying uncertainty of LLM based classifications | Srijith Rajamohan et.al. | 2502.08631 | null |
2025-02-12 | Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding | Konstantin Berestizshevsky et.al. | 2502.08363 | link |
2025-02-17 | Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG | Kushagra Bhushan et.al. | 2502.08356 | link |
2025-02-12 | Compromising Honesty and Harmlessness in Language Models via Deception Attacks | Laurène Vaugrante et.al. | 2502.08301 | null |
2025-02-12 | Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis | Changhua Pei et.al. | 2502.08224 | null |
2025-02-12 | Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences | Shanshan Han et.al. | 2502.08142 | null |
2025-02-12 | HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses | Sujeong Lee et.al. | 2502.08109 | null |
2025-02-12 | Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD) | Naomi Akhras et.al. | 2502.08073 | null |
2025-02-11 | From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems | Yining Hong et.al. | 2502.07974 | null |
2025-02-11 | Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning | Rujing Yao et.al. | 2502.07912 | link |
2025-02-11 | Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights | Ahilan Ayyachamy Nadar Ponnusamy et.al. | 2502.07835 | link |
2025-02-17 | Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering | Shuzheng Si et.al. | 2502.07340 | link |
2025-02-11 | When More is Less: Understanding Chain-of-Thought Length in LLMs | Yuyang Wu et.al. | 2502.07266 | null |
2025-02-11 | Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs | Sina Salimian et.al. | 2502.07186 | null |
2025-02-11 | Refine Knowledge of Large Language Models via Adaptive Contrastive Learning | Yinghui Li et.al. | 2502.07184 | null |
2025-02-11 | Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feng Chen et.al. | 2502.07154 | link |
2025-02-11 | Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning | Jiayuan Zhu et.al. | 2502.07143 | null |
2025-02-08 | Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models | Sina Tayebati et.al. | 2502.06884 | link |
2025-02-08 | Group Reasoning Emission Estimation Networks | Yanming Guo et.al. | 2502.06874 | null |
2025-02-08 | Knowledge Graph-Guided Retrieval Augmented Generation | Xiangrong Zhu et.al. | 2502.06864 | link |
2025-02-07 | LLM-Supported Natural Language to Bash Translation | Finnian Westenfelder et.al. | 2502.06858 | link |
2025-02-11 | Calibrating LLMs with Information-Theoretic Evidential Deep Learning | Yawei Li et.al. | 2502.06351 | link |
2025-02-10 | Expect the Unexpected: FailSafe Long Context QA for Finance | Kiran Kamble et.al. | 2502.06329 | null |
2025-02-10 | Emergent Response Planning in LLM | Zhichen Dong et.al. | 2502.06258 | null |
2025-02-10 | Confidence Improves Self-Consistency in LLMs | Amir Taubenfeld et.al. | 2502.06233 | null |
2025-02-10 | Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement | Junyu Lu et.al. | 2502.06207 | link |
2025-02-10 | Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis | Sanket Jantre et.al. | 2502.06173 | null |
2025-02-09 | GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation | Runchuan Zhu et.al. | 2502.05911 | null |
2025-02-09 | Self-Training Large Language Models for Tool-Use Without Demonstrations | Ne Luo et.al. | 2502.05867 | null |
2025-02-09 | Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models | Cheng Peng Huang et.al. | 2502.05825 | null |
2025-02-09 | Assessing confidence in frontier AI safety cases | Stephen Barrett et.al. | 2502.05791 | null |
2025-02-09 | Visual Text Mining with Progressive Taxonomy Construction for Environmental Studies | Sam Yu-Te Lee et.al. | 2502.05731 | link |
2025-02-07 | SEER: Self-Explainability Enhancement of Large Language Models’ Representations | Guanxu Chen et.al. | 2502.05242 | null |
2025-02-07 | ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework | Xiaoyu Deng et.al. | 2502.05084 | null |
2025-02-07 | Aligning Black-box Language Models with Human Judgments | Gerrit J. J. van den Burg et.al. | 2502.04997 | null |
2025-02-11 | CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs | Roman Vashurin et.al. | 2502.04964 | null |
2025-02-07 | Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks | Jing Yang et.al. | 2502.04797 | link |
2025-02-10 | Confidence Elicitation: A New Attack Vector for Large Language Models | Brian Formento et.al. | 2502.04643 | link |
2025-02-06 | TruthFlow: Truthful LLM Generation via Representation Flow Correction | Hanyu Wang et.al. | 2502.04556 | null |
2025-02-06 | Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization | Yu-Neng Chuang et.al. | 2502.04428 | null |
2025-02-06 | KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Xing Li et.al. | 2502.04420 | link |
2025-02-11 | Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing | Kunfeng Lai et.al. | 2502.04411 | null |
2025-02-06 | FAS: Fast ANN-SNN Conversion for Spiking Large Language Models | Long Chen et.al. | 2502.04405 | link |
2025-02-05 | Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning | Jonathan Kim et.al. | 2502.04381 | null |
2025-02-05 | MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction | Xiao Hu et.al. | 2502.04360 | null |
2025-02-04 | LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving | Md Sifat Hossain et.al. | 2502.04355 | null |
2025-02-06 | Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software | Andreas Baumann et.al. | 2502.03916 | null |
2025-02-06 | BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | Bo Pang et.al. | 2502.03860 | null |
2025-02-12 | Syntriever: How to Train Your Retriever with Synthetic Data from LLMs | Minsang Kim et.al. | 2502.03824 | link |
2025-02-10 | Large Language Models for Multi-Robot Systems: A Survey | Peihan Li et.al. | 2502.03814 | link |
2025-02-08 | Enhancing Hallucination Detection through Noise Injection | Litian Liu et.al. | 2502.03799 | null |
2025-02-06 | Adaptive Semantic Prompt Caching with VectorQ | Luis Gaspar Schroeder et.al. | 2502.03771 | null |
2025-02-06 | Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models | Rui Cai et.al. | 2502.03715 | null |
2025-02-06 | MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers | Nicole Cho et.al. | 2502.03711 | null |
2025-02-06 | Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers | Daniel Beaglehole et.al. | 2502.03708 | null |
2025-02-06 | LLM Alignment as Retriever Optimization: An Information Retrieval Perspective | Bowen Jin et.al. | 2502.03699 | null |
2025-02-05 | Reflection-Window Decoding: Text Generation with Selective Refinement | Zeyu Tang et.al. | 2502.03678 | null |
2025-02-05 | Advancing Reasoning in Large Language Models: Promising Methods and Approaches | Avinash Patil et.al. | 2502.03671 | null |
2025-02-04 | Artificial Intelligence and Legal Analysis: Implications for Legal Education and the Profession | Lee Peoples et.al. | 2502.03487 | null |
2025-02-05 | A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | Yiye Chen et.al. | 2502.03450 | null |
2025-02-05 | SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs | Ben Liu et.al. | 2502.03283 | null |
2025-02-05 | Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models | Jialiang Wu et.al. | 2502.03199 | null |
2025-02-05 | IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates | Aissatou Diallo et.al. | 2502.03080 | null |
2025-02-04 | An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification | Riddhi More et.al. | 2502.02715 | null |
2025-02-04 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Yize Wu et.al. | 2502.02493 | null |
2025-02-04 | Activation-Informed Merging of Large Language Models | Amin Heyrani Nobari et.al. | 2502.02421 | link |
2025-02-04 | From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing | Siwei Luo et.al. | 2502.02025 | null |
2025-02-03 | SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models | Diyana Muhammed et.al. | 2502.01812 | null |
2025-02-03 | Position: Towards a Responsible LLM-empowered Multi-Agent Systems | Jinwei Hu et.al. | 2502.01714 | null |
2025-02-02 | Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model | Hadas Ben-Atya et.al. | 2502.01691 | null |
2025-02-02 | LIBRA: Measuring Bias of Large Language Model from a Local Context | Bo Pang et.al. | 2502.01679 | null |
2025-02-01 | Benchmark on Peer Review Toxic Detection: A Challenging Task with a New Dataset | Man Luo et.al. | 2502.01676 | null |
2025-02-03 | CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering | Zongxi Li et.al. | 2502.01523 | null |
2025-02-03 | Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant | Gaole He et.al. | 2502.01390 | link |
2025-02-03 | PSSD: Making Large Language Models Self-denial via Human Psyche Structure | Jinzhi Liao et.al. | 2502.01344 | link |
2025-02-03 | Human-Agent Interaction in Synthetic Social Networks: A Framework for Studying Online Polarization | Tim Donkers et.al. | 2502.01340 | null |
2025-02-03 | DeepRAG: Thinking to Retrieval Step by Step for Large Language Models | Xinyan Guan et.al. | 2502.01142 | null |
2025-02-03 | Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning | Guanlin Li et.al. | 2502.01116 | null |
2025-02-03 | ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution | Kanika Goswami et.al. | 2502.00989 | null |
2025-02-03 | Context-Aware Hierarchical Merging for Long Document Summarization | Litu Ou et.al. | 2502.00977 | null |
2025-02-02 | Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications | Yixin Wu et.al. | 2502.00808 | link |
2025-02-02 | Generative AI for Analyzing Participatory Rural Appraisal Data: An Exploratory Case Study in Gender Research | Srividya Sheshadri et.al. | 2502.00763 | null |
2025-02-02 | MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | Chao Wang et.al. | 2502.00717 | null |
2025-02-01 | Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation | Stuart Armstrong et.al. | 2502.00580 | link |
2025-02-01 | Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning | Zhi Zhou et.al. | 2502.00511 | null |
2025-02-01 | Estimating LLM Uncertainty with Logits | Huan Ma et.al. | 2502.00290 | link |
2025-01-31 | DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets | Abdurrahim Yilmaz et.al. | 2502.00196 | null |
2025-01-31 | Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models | Alina Shutova et.al. | 2501.19392 | link |
2025-01-31 | Towards Adaptive Self-Improvement for Smarter Energy Systems | Alexander Sommer et.al. | 2501.19340 | null |
2025-01-31 | Homogeneity Bias as Differential Sampling Uncertainty in Language Models | Messi H. J. Lee et.al. | 2501.19337 | null |
2025-01-31 | Offline Learning for Combinatorial Multi-armed Bandits | Xutong Liu et.al. | 2501.19300 | null |
2025-01-31 | Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs | Kejia Zhang et.al. | 2501.19164 | null |
2025-01-31 | Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities | Arjun Krishna et.al. | 2501.19012 | null |
2025-01-30 | Survey and Improvement Strategies for Gene Prioritization with Large Language Models | Matthew Neeley et.al. | 2501.18794 | null |
2025-01-30 | Differentially Private Steering for Large Language Model Alignment | Anmol Goel et.al. | 2501.18532 | link |
2025-01-30 | CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | Yanxia Deng et.al. | 2501.18475 | null |
2025-01-31 | RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing | Jinyao Guo et.al. | 2501.18160 | link |
2025-01-29 | Large Language Models Think Too Fast To Explore Effectively | Lan Pan et.al. | 2501.18009 | null |
2025-01-29 | Uncertainty Quantification and Decomposition for LLM-based Recommendation | Wonbin Kweon et.al. | 2501.17630 | link |
2025-01-29 | Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis | Kunrong Li et.al. | 2501.17598 | null |
2025-01-29 | CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs | Amey Hengle et.al. | 2501.17581 | null |
2025-01-28 | Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization | Zilu Tang et.al. | 2501.17295 | null |
2025-01-26 | Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics | Jin Hyun Park et.al. | 2501.17187 | link |
2025-02-01 | LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | Beiming Liu et.al. | 2501.17183 | null |
2025-01-28 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei et.al. | 2501.17144 | link |
2025-01-28 | MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search | Shuozhi Yuan et.al. | 2501.16607 | null |
2025-01-27 | Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models | Huayu Li et.al. | 2501.16215 | link |
2025-01-27 | Parametric Retrieval Augmented Generation | Weihang Su et.al. | 2501.15915 | link |
2025-01-26 | Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis | Robinson Umeike et.al. | 2501.15370 | null |
2025-01-26 | Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection | Bo Yang et.al. | 2501.15355 | null |
2025-01-25 | You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning | Ayan Sengupta et.al. | 2501.15296 | null |
2025-01-25 | Can Large Language Models Be Trusted as Black-Box Evolutionary Optimizers for Combinatorial Problems? | Jie Zhao et.al. | 2501.15081 | null |
2025-01-25 | Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations | Harshita Chopra et.al. | 2501.15056 | null |
2025-01-25 | Federated Retrieval Augmented Generation for Multi-Product Question Answering | Parshin Shojaee et.al. | 2501.14998 | null |
2025-01-24 | Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Madeline Anderson et.al. | 2501.14905 | null |
2025-01-24 | Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs | Hang Luo et.al. | 2501.14892 | link |
2025-01-24 | Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains | Xu Chu et.al. | 2501.14431 | null |
2025-01-24 | Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph | Xujian Liang et.al. | 2501.14300 | link |
2025-01-24 | Humanity’s Last Exam | Long Phan et.al. | 2501.14249 | null |
2025-01-24 | AI Chatbots as Professional Service Agents: Developing a Professional Identity | Wenwen Li et.al. | 2501.14179 | null |
2025-01-23 | OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | Xing Hu et.al. | 2501.13987 | link |
2025-01-23 | Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs | Bhumika Gupta et.al. | 2501.13984 | null |
2025-01-20 | A Layered Multi-Expert Framework for Long-Context Mental Health Assessments | Jinwen Tang et.al. | 2501.13951 | null |
2025-01-23 | CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation | Guofeng Cui et.al. | 2501.13927 | null |
2025-01-23 | On the Reasoning Capacity of AI Models and How to Quantify It | Santosh Kumar Radha et.al. | 2501.13833 | null |
2025-01-23 | Hallucinations Can Improve Large Language Models in Drug Discovery | Shuzhou Yuan et.al. | 2501.13824 | null |
2025-01-22 | OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models | Chongren Sun et.al. | 2501.12975 | link |
2025-01-22 | FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | Zhenran Xu et.al. | 2501.12909 | null |
2025-01-22 | Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home | Viktor Moskvoretskii et.al. | 2501.12835 | null |
2025-01-30 | EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering | Chang Zong et.al. | 2501.12746 | null |
2025-01-25 | Online Preference Alignment for Language Models via Count-based Exploration | Chenjia Bai et.al. | 2501.12735 | link |
2025-01-22 | Paradigm-Based Automatic HDL Code Generation Using LLMs | Wenhao Sun et.al. | 2501.12702 | null |
2025-01-19 | AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model | Lipeng Ma et.al. | 2501.11031 | link |
2025-01-18 | Iterative Tree Analysis for Medical Critics | Zenan Huang et.al. | 2501.10642 | null |
2025-01-18 | Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks | Xin Yi et.al. | 2501.10639 | link |
2025-01-17 | 4bit-Quantization in Vector-Embedding for RAG | Taehee Jeong et.al. | 2501.10534 | link |
2025-01-17 | Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling | Suvodip Dey et.al. | 2501.10316 | link |
2025-01-17 | Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions | Zhijie Tan et.al. | 2501.10011 | null |
2025-01-17 | Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models | Qiang Liu et.al. | 2501.09997 | null |
2025-01-22 | FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs | Zengyi Gao et.al. | 2501.09957 | null |
2025-01-17 | Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs | Reham Omar et.al. | 2501.09928 | link |
2025-01-17 | Towards A Litmus Test for Common Sense | Hugo Latapie et.al. | 2501.09913 | null |
2025-01-17 | FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Zhe Chen et.al. | 2501.09887 | null |
2025-01-16 | Bridging Language Barriers in Healthcare: A Study on Arabic LLMs | Nada Saadi et.al. | 2501.09825 | null |
2025-01-16 | Enhancing Generalization in Chain of Thought Reasoning for Smaller Models | Maxwell J. Yin et.al. | 2501.09804 | null |
2025-01-24 | Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong | Tairan Fu et.al. | 2501.09775 | null |
2025-01-16 | Confidence Estimation for Error Detection in Text-to-SQL Systems | Oleg Somov et.al. | 2501.09527 | link |
2025-01-16 | A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy | Huandong Wang et.al. | 2501.09431 | null |
2025-01-16 | Rational Tuning of LLM Cascades via Probabilistic Modeling | Michael J. Zellinger et.al. | 2501.09345 | null |
2025-01-16 | To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation | Kaustubh D. Dhole et.al. | 2501.09292 | null |
2025-01-15 | Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach | Alireza Ghaffari et.al. | 2501.09107 | null |
2025-01-15 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Ruixiang Jiang et.al. | 2501.09012 | link |
2025-01-15 | Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching | Chuangtao Ma et.al. | 2501.08686 | link |
2025-01-14 | SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models | Anurag Kumar et.al. | 2501.08421 | null |
2025-01-14 | OptiChat: Bridging Optimization Models and Practitioners with Large Language Models | Hao Chen et.al. | 2501.08406 | link |
2025-01-14 | HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | Abhilasha Ravichander et.al. | 2501.08292 | null |
2025-01-14 | Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | Feijie Wu et.al. | 2501.07813 | null |
2025-01-13 | GPT as a Monte Carlo Language Tree: A Probabilistic Perspective | Kun-Peng Ning et.al. | 2501.07641 | null |
2025-01-13 | SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models | Fabien Bernier et.al. | 2501.07639 | null |
2025-01-13 | RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | Difei Gu et.al. | 2501.07525 | link |
2025-01-13 | Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection | Xin Yin et.al. | 2501.07425 | null |
2025-01-13 | ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training | Jiayang Wu et.al. | 2501.07078 | link |
2025-01-11 | Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering | Yinghao Hu et.al. | 2501.06521 | link |
2025-01-11 | First Token Probability Guided RAG for Telecom Question Answering | Tingwei Chen et.al. | 2501.06468 | null |
2025-01-21 | MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare | Ye Chen et.al. | 2501.06465 | null |
2025-01-10 | Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea | Eunjung Cho et.al. | 2501.05981 | null |
2025-01-10 | Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | Sungjae Lee et.al. | 2501.05752 | null |
2025-01-09 | Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning | Laura Puccioni et.al. | 2501.05248 | null |
2025-01-09 | Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments | Yifan Xu et.al. | 2501.04947 | null |
2025-01-09 | HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers | Yiyao Yang et.al. | 2501.04908 | link |
2025-01-09 | SUGAR: Leveraging Contextual Confidence for Smarter Retrieval | Hanna Zubkova et.al. | 2501.04899 | null |
2025-01-08 | Re-ranking the Context for Multimodal Retrieval Augmented Generation | Matin Mortaheb et.al. | 2501.04695 | null |
2025-01-08 | Multi-task retriever fine-tuning for domain-specific and efficient RAG | Patrice Béchard et.al. | 2501.04652 | null |
2025-01-16 | Knowledge Retrieval Based on Generative AI | Te-Lun Yang et.al. | 2501.04635 | null |
2025-01-07 | RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance | Matin Mortaheb et.al. | 2501.03995 | null |
2025-01-07 | Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles | Yuxi Xia et.al. | 2501.03991 | null |
2025-01-07 | Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jurgita Kapočiūtė-Dzikienė et.al. | 2501.03952 | null |
2025-01-08 | A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval | Shuo Tong et.al. | 2501.03295 | null |
2025-01-06 | CALM: Curiosity-Driven Auditing for Large Language Models | Xiang Zheng et.al. | 2501.02997 | link |
2025-01-19 | FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models | Zhuo Chen et.al. | 2501.02968 | null |
2025-01-09 | InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | Zhaoyi Yan et.al. | 2501.02795 | null |
2025-01-06 | QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance | Binita Saha et.al. | 2501.02702 | null |
2025-01-06 | EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models | Andrés Villa et.al. | 2501.02699 | null |
2025-01-05 | Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications | Zhe Chen et.al. | 2501.02460 | null |
2025-01-04 | Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation | Shijie Wang et.al. | 2501.02226 | null |
2025-01-04 | EvoPath: Evolutionary Meta-path Discovery with Large Language Models for Complex Heterogeneous Information Networks | Shixuan Liu et.al. | 2501.02192 | null |
2025-01-04 | The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit | Huixue Zhou et.al. | 2501.02173 | null |
2025-01-02 | Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection | Kedi Chen et.al. | 2501.02020 | null |
2025-01-03 | Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification | Xiangxiang Dai et.al. | 2501.01849 | link |
2025-01-03 | LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries | Michal Kuk et.al. | 2501.01711 | null |
2025-01-03 | (WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Mohamed Hisham Abdellatif et.al. | 2501.01588 | null |
2025-01-02 | BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Kanishk Gandhi et.al. | 2501.01540 | link |
2025-01-02 | Aligning Large Language Models for Faithful Integrity Against Opposing Argument | Yong Zhao et.al. | 2501.01336 | link |
2025-01-02 | Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension | Yanbo Fang et.al. | 2501.01332 | null |
2025-01-03 | Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking | Xiaoxue Cheng et.al. | 2501.01306 | null |
2025-01-02 | Large Language Model-Enhanced Symbolic Reasoning for Knowledge Base Completion | Qiyuan He et.al. | 2501.01246 | null |
2025-01-02 | SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization | Yongle Huang et.al. | 2501.01245 | link |
2025-01-02 | Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method | Ruichen Zhang et.al. | 2501.01141 | null |
2025-01-02 | Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models | Yanwen Huang et.al. | 2501.01059 | null |
2025-01-02 | Dynamic Scaling of Unit Tests for Code Reward Modeling | Zeyao Ma et.al. | 2501.01054 | null |
2025-01-07 | LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management | Yichen Luo et.al. | 2501.00826 | null |
2025-01-01 | NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model | Yuzhi Lai et.al. | 2501.00785 | null |
2024-12-31 | Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs | Harit Vishwakarma et.al. | 2501.00555 | null |
2024-12-31 | A review of faithfulness metrics for hallucination assessment in Large Language Models | Ben Malin et.al. | 2501.00269 | null |
2024-12-31 | CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care | Michael Gubanov et.al. | 2501.00223 | null |
2024-12-30 | CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions | Mourad Heddaya et.al. | 2501.00097 | null |
2024-12-30 | Facilitating large language model Russian adaptation with Learned Embedding Propagation | Mikhail Tikhomirov et.al. | 2412.21140 | link |
2024-12-30 | KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation | Siyuan Fang et.al. | 2412.20995 | null |
2024-12-30 | Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs’ Memory | Xingjian Tao et.al. | 2412.20846 | null |
2024-12-30 | UBER: Uncertainty-Based Evolution with Large Language Models for Automatic Heuristic Design | Zijie Chen et.al. | 2412.20694 | link |
2025-01-05 | Distilling Desired Comments for Enhanced Code Review with Large Language Models | Yongda Yu et.al. | 2412.20340 | null |
2024-12-29 | Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain | Shintaro Ozaki et.al. | 2412.20309 | link |
2024-12-28 | ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty | Qing Zong et.al. | 2412.20251 | link |
2024-12-27 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Sijia Chen et.al. | 2412.19707 | link |
2024-12-27 | Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs | Zhe Yang et.al. | 2412.19513 | link |
2024-12-27 | MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Shiyao Li et.al. | 2412.19509 | link |
2024-12-26 | RAG with Differential Privacy | Nicolas Grislain et.al. | 2412.19291 | link |
2025-01-03 | MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models | Kaiwen Zuo et.al. | 2412.18947 | null |
2025-01-06 | Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | Derong Xu et.al. | 2412.18537 | link |
2024-12-24 | Is Large Language Model Good at Triple Set Prediction? An Empirical Study | Yuan Yuan et.al. | 2412.18443 | null |
2024-12-24 | Annotating References to Mythological Entities in French Literature | Thierry Poibeau et.al. | 2412.18270 | null |
2024-12-24 | Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine | Yu He Ke et.al. | 2412.18096 | null |
2024-12-23 | Trustworthy and Efficient LLMs Meet Databases | Kyoungmin Kim et.al. | 2412.18022 | null |
2024-12-22 | The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM’s Internal States | Fabian Ridder et.al. | 2412.17056 | link |
2024-12-22 | Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs | Alexander von Recum et.al. | 2412.16974 | null |
2024-12-28 | Lillama: Large Language Models Compression via Low-Rank Feature Distillation | Yaya Sy et.al. | 2412.16719 | null |
2024-12-21 | Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks | Jinyan Su et.al. | 2412.16708 | link |
2024-12-21 | AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles | Aritra Kumar Lahiri et.al. | 2412.16701 | null |
2024-12-21 | Internalized Self-Correction for Large Language Models | Nishanth Upadhyaya et.al. | 2412.16653 | null |
2024-12-21 | Identifying Cyberbullying Roles in Social Media | Manuel Sandoval et.al. | 2412.16417 | null |
2024-12-20 | Towards Safe and Honest AI Agents with Neural Self-Other Overlap | Marc Carauleanu et.al. | 2412.16325 | null |
2024-12-20 | Logical Consistency of Large Language Models in Fact-checking | Bishwamittra Ghosh et.al. | 2412.16100 | null |
2024-12-20 | To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models | Jessica Y. Bo et.al. | 2412.15584 | null |
2024-12-24 | Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | Saehyung Lee et.al. | 2412.15484 | null |
2024-12-19 | Systematic Evaluation of Long-Context LLMs on Financial Concepts | Lavanya Gupta et.al. | 2412.15386 | null |
2024-12-19 | Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models | Nishtha N. Vaidya et.al. | 2412.15309 | null |
2024-12-19 | A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation | Bhaskarjit Sarmah et.al. | 2412.15298 | null |
2024-12-19 | Confidence in the Reasoning of Large Language Models | Yudi Pawitan et.al. | 2412.15296 | link |
2024-12-17 | SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation | Yuzheng Cai et.al. | 2412.15272 | link |
2024-12-17 | A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models | Gongbo Zhang et.al. | 2412.15271 | null |
2024-12-15 | LLMs for Literature Review: Are we there yet? | Shubham Agarwal et.al. | 2412.15249 | null |
2024-12-19 | Rethinking Uncertainty Estimation in Natural Language Generation | Lukas Aichberger et.al. | 2412.15176 | null |
2024-12-19 | Adaptive Pruning for Large Language Models with Structural Importance Awareness | Haotian Zheng et.al. | 2412.15127 | null |
2024-12-19 | Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability | Xiangsen Chen et.al. | 2412.15101 | null |
2024-12-19 | RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response | Junyu Luo et.al. | 2412.14922 | link |
2024-12-19 | Dehallucinating Parallel Context Extension for Retrieval-Augmented Generation | Zexiong Ma et.al. | 2412.14905 | null |
2024-12-19 | Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling | Junyi Li et.al. | 2412.14860 | null |
2024-12-19 | Query pipeline optimization for cancer patient question answering systems | Maolin He et.al. | 2412.14751 | null |
2024-12-19 | On Verbalized Confidence Scores for LLMs | Daniel Yang et.al. | 2412.14737 | link |
2024-12-25 | Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Zijun Chen et.al. | 2412.14660 | link |
2024-12-19 | Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment | Teng Xiao et.al. | 2412.14516 | link |
2024-12-19 | FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis | Abdullah Khan et.al. | 2412.14492 | link |
2024-12-18 | LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis | Chengpeng Wang et.al. | 2412.14399 | null |
2024-12-18 | Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets | Simon Thorne et.al. | 2412.14062 | null |
2024-12-18 | Discovering maximally consistent distribution of causal tournaments with Large Language Models | Federico Baldo et.al. | 2412.14019 | null |
2024-12-27 | Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence | Jinghan He et.al. | 2412.13949 | null |
2024-12-29 | Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection | Le Yang et.al. | 2412.13817 | link |
2024-12-18 | Meta-Reflection: A Feedback-Free Reflection Learning Framework | Yaoke Wang et.al. | 2412.13781 | null |
2024-12-18 | Are LLMs Good Literature Review Writers? Evaluating the Literature Review Writing Ability of Large Language Models | Xuemei Tang et.al. | 2412.13612 | null |
2024-12-18 | Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement | Qianyue Wang et.al. | 2412.13575 | link |
2024-12-18 | C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System | Parker Addison et.al. | 2412.13163 | null |
2024-12-17 | Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health | Vivek Kumar et.al. | 2412.12981 | link |
2024-12-17 | A Survey of Calibration Process for Black-Box LLMs | Liangru Xie et.al. | 2412.12767 | null |
2024-12-18 | Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models | Seungeun Oh et.al. | 2412.12687 | null |
2024-12-17 | What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context | Zhiyuan Chang et.al. | 2412.12632 | null |
2024-12-17 | Jailbreaking? One Step Is Enough! | Weixiong Zheng et.al. | 2412.12621 | null |
2024-12-17 | When to Speak, When to Abstain: Contrastive Decoding with Abstention | Hyuhng Joon Kim et.al. | 2412.12527 | null |
2024-12-12 | Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off | Eoin M. Kenny et.al. | 2412.12169 | link |
2024-12-11 | SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration | Yuanhao Shen et.al. | 2412.12151 | link |
2024-12-16 | LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts | Zhuhao Wang et.al. | 2412.12001 | link |
2024-12-16 | RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation | Xiaoxi Li et.al. | 2412.11919 | link |
2024-12-16 | Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments | Andrii Nikolaiev et.al. | 2412.11908 | null |
2024-12-16 | A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection | Simon Hachmeier et.al. | 2412.11851 | link |
2024-12-16 | UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models | Boyang Xue et.al. | 2412.11803 | link |
2024-12-16 | Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods | Diana Bar-Or Nirman et.al. | 2412.11625 | null |
2024-12-16 | Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes | Antonio Carlos Rivera et.al. | 2412.11396 | null |
2024-12-15 | CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation | Kurando IIDA et.al. | 2412.11261 | null |
2024-12-15 | Do Tutors Learn from Equity Training and Can Generative AI Assess It? | Danielle R. Thomas et.al. | 2412.11255 | link |
2024-12-15 | Task-Oriented Dialog Systems for the Senegalese Wolof Language | Derguene Mbaye et.al. | 2412.11203 | null |
2024-12-15 | Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning | Shengqiong Wu et.al. | 2412.11124 | null |
2024-12-15 | Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning | Yun Qu et.al. | 2412.11120 | link |
2024-12-15 | Empowering LLMs to Understand and Generate Complex Vector Graphics | Ximing Xing et.al. | 2412.11102 | null |
2024-12-17 | MedG-KRP: Medical Graph Knowledge Representation Probing | Gabriel R. Rosenbaum et.al. | 2412.10982 | null |
2024-12-14 | Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data | Xue Wu et.al. | 2412.10654 | null |
2024-12-13 | Benchmarking large language models for materials synthesis: the case of atomic layer deposition | Angel Yanguas-Gil et.al. | 2412.10477 | null |
2024-12-13 | Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts | Hazel Kim et.al. | 2412.10246 | null |
2024-12-13 | How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives | Timour Ichmoukhamedov et.al. | 2412.10220 | link |
2024-12-13 | TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering | Bingru Li et.al. | 2412.10139 | null |
2024-12-13 | ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL | Yang Qin et.al. | 2412.10138 | link |
2024-12-12 | DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | Yu Feng et.al. | 2412.09572 | null |
2024-12-12 | Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Ben Liu et.al. | 2412.09094 | link |
2024-12-12 | Dial-In LLM: Human-Aligned Dialogue Intent Clustering with LLM-in-the-loop | Mengze Hong et.al. | 2412.09049 | null |
2024-12-12 | Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning | Wenna Lai et.al. | 2412.09046 | null |
2024-12-12 | ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty | Meizhi Zhong et.al. | 2412.09036 | null |
2024-12-11 | Learning to Reason via Self-Iterative Process Feedback for Small Language Models | Kaiyuan Chen et.al. | 2412.08393 | null |
2024-12-11 | What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models | Bangshuo Zhu et.al. | 2412.08098 | null |
2024-12-10 | HalluCana: Fixing LLM Hallucination with A Canary Lookahead | Tianyi Li et.al. | 2412.07965 | null |
2024-12-10 | Forking Paths in Neural Text Generation | Eric Bigelow et.al. | 2412.07961 | null |
2024-12-10 | Low-Rank Correction for Quantized LLMs | Meyer Scetbon et.al. | 2412.07902 | null |
2024-12-08 | Language Model as Visual Explainer | Xingyi Yang et.al. | 2412.07802 | null |
2024-12-16 | Granite Guardian | Inkit Padhi et.al. | 2412.07724 | link |
2024-12-10 | Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation | Qinhong Lin et.al. | 2412.07255 | null |
2024-12-10 | Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay | Ruiheng Liu et.al. | 2412.07246 | null |
2024-12-10 | MAPLE: A Framework for Active Preference Learning Guided by Large Language Models | Saaduddin Mahmud et.al. | 2412.07207 | null |
2024-12-10 | When Graph Meets Retrieval Augmented Generation for Wireless Networks: A Tutorial and Case Study | Yang Xiong et.al. | 2412.07189 | null |
2024-12-10 | Post-Training Statistical Calibration for Higher Activation Sparsity | Vui Seng Chua et.al. | 2412.07174 | link |
2024-12-11 | ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Jieyu Zhang et.al. | 2412.07012 | link |
2024-12-09 | Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study | Ehsan Shareghi et.al. | 2412.06272 | null |
2024-12-09 | MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization | Kangyu Zhu et.al. | 2412.06141 | link |
2024-12-08 | Hallucination-aware Optimization for Large Language Model-empowered Communications | Yinqiu Liu et.al. | 2412.06007 | link |
2024-12-07 | Training-Free Bayesianization for Low-Rank Adapters of Large Language Models | Haizhou Shi et.al. | 2412.05723 | link |
2024-12-07 | Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent | Ziyuan Qin et.al. | 2412.05722 | null |
2024-12-07 | A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions | Ola Shorinwa et.al. | 2412.05563 | null |
2024-12-07 | Ranking of Large Language Model with Nonparametric Prompts | Zebin Wang et.al. | 2412.05506 | null |
2024-12-06 | Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization | Subhojyoti Mukherjee et.al. | 2412.05469 | null |
2024-12-06 | A Graph-Based Approach for Conversational AI-Driven Personal Memory Capture and Retrieval in a Real-world Application | Savini Kashmira et.al. | 2412.05447 | null |
2024-12-06 | HiVeGen – Hierarchical LLM-based Verilog Generation for Scalable Chip Design | Jinwei Tang et.al. | 2412.05393 | null |
2024-12-09 | Enhancing FKG.in: automating Indian food composition analysis | Saransh Kumar Gupta et.al. | 2412.05248 | null |
2024-12-06 | 100% Hallucination Elimination Using Acurai | Michael C. Wood et.al. | 2412.05223 | link |
2024-12-06 | Steps are all you need: Rethinking STEM Education with Prompt Engineering | Krishnasai Addala et.al. | 2412.05023 | null |
2024-12-06 | Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance | Xuchan Bao et.al. | 2412.04746 | null |
2024-12-06 | LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs | Xuan Chen et.al. | 2412.04690 | null |
2024-12-05 | HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning | Manish Bhattarai et.al. | 2412.04661 | link |
2024-12-10 | Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates | Li Shi et.al. | 2412.04629 | null |
2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
2024-12-05 | Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Xuying Li et.al. | 2412.04415 | null |
2024-12-05 | Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots | Maria Paola Priola et.al. | 2412.04235 | null |
2024-12-05 | Reducing Tool Hallucination via Reliability Alignment | Hongshen Xu et.al. | 2412.04141 | null |
2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
2024-12-04 | You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? | Dominic Lohr et.al. | 2412.03516 | null |
2024-12-03 | Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning | Ranganath Krishnan et.al. | 2412.02904 | null |
2024-12-03 | An Evolutionary Large Language Model for Hallucination Mitigation | Abdennour Boulesnane et.al. | 2412.02790 | null |
2024-12-03 | OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Junyuan Zhang et.al. | 2412.02592 | link |
2024-12-03 | Semantic Tokens in Retrieval Augmented Generation | Joel Suro et.al. | 2412.02563 | null |
2024-12-04 | The use of large language models to enhance cancer clinical trial educational materials | Mingye Gao et.al. | 2412.01955 | null |
2024-12-04 | The Reality of AI and Biorisk | Aidan Peppin et.al. | 2412.01946 | null |
2024-12-02 | R-Bot: An LLM-based Query Rewrite System | Zhaoyan Sun et.al. | 2412.01661 | null |
2024-12-02 | Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input | Francesco Taioli et.al. | 2412.01250 | null |
2024-12-02 | SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages | Jia Guo et.al. | 2412.01186 | link |
2024-12-02 | SAUP: Situation Awareness Uncertainty Propagation on LLM Agent | Qiwei Zhao et.al. | 2412.01033 | null |
2024-12-02 | AI Benchmarks and Datasets for LLM Evaluation | Todor Ivanov et.al. | 2412.01020 | null |
2024-12-06 | Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection | Shanu Kumar et.al. | 2412.00353 | null |
2024-11-30 | Human-Like Code Quality Evaluation through LLM-based Recursive Semantic Comprehension | Fangzhou Xu et.al. | 2412.00314 | null |
2024-11-29 | An AI-Driven Data Mesh Architecture Enhancing Decision-Making in Infrastructure Construction and Public Procurement | Saurabh Mishra et.al. | 2412.00224 | null |
2024-11-24 | Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis | Ferhat Ozgur Catak et.al. | 2412.00056 | null |
2024-12-02 | Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis | Alessandro Scirè et.al. | 2411.19655 | link |
2024-11-29 | RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation | Xianfeng Tan et.al. | 2411.19528 | null |
2024-11-29 | Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems | Shengming Zhao et.al. | 2411.19463 | null |
2024-11-28 | Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | Anirudh Phukan et.al. | 2411.19187 | null |
2024-11-28 | Mars-PO: Multi-Agent Reasoning System Preference Optimization | Xiaoxuan Lou et.al. | 2411.19039 | null |
2024-11-28 | AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Jisheng Bai et.al. | 2411.18953 | link |
2024-11-27 | Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students | Tiffany Zhu et.al. | 2411.18708 | null |
2024-11-27 | Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | Deepak Gupta et.al. | 2411.18069 | null |
2024-11-26 | MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation | Sankalp Sinha et.al. | 2411.17945 | link |
2024-11-26 | AI2T: Building Trustable AI Tutors by Interactively Teaching a Self-Aware Learning Agent | Daniel Weitekamp et.al. | 2411.17924 | null |
2024-11-26 | $H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs | Selim Furkan Tekin et.al. | 2411.17792 | link |
2024-11-26 | MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | Harsh Singh et.al. | 2411.17636 | null |
2024-11-26 | One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models | Pengfei Cao et.al. | 2411.17401 | null |
2024-11-26 | Can LLMs be Good Graph Judger for Knowledge Graph Construction? | Haoyu Huang et.al. | 2411.17388 | link |
2024-11-26 | Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning | Milena Chadimová et.al. | 2411.17304 | null |
2024-11-26 | HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator | Fan Yang et.al. | 2411.17261 | null |
2024-11-25 | Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries | Harshavardhan Battula et.al. | 2411.16818 | null |
2024-11-25 | Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models | Alireza Amiri-Margavi et.al. | 2411.16797 | null |
2024-11-25 | VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | Wey Yeh Choong et.al. | 2411.16771 | link |
2024-11-23 | Text-to-SQL Calibration: No Need to Ask – Just Rescale Model Probabilities | Ashwin Ramachandran et.al. | 2411.16742 | null |
2024-11-23 | Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction | Mitchell Rosser et.al. | 2411.16723 | null |
2024-11-28 | Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation | Sanjana Ramprasad et.al. | 2411.16638 | null |
2024-12-03 | AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning | Amy Xin et.al. | 2411.16495 | link |
2024-11-25 | Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models | Zhihua Duan et.al. | 2411.16189 | null |
2024-11-24 | Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown | Lifu Tu et.al. | 2411.15993 | null |
2024-11-23 | Ontology-Constrained Generation of Domain-Specific Clinical Summaries | Gaya Mehenni et.al. | 2411.15666 | link |
2024-11-23 | MC-NEST – Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree | Gollam Rabby et.al. | 2411.15645 | link |
2024-11-23 | “All that Glitters”: Approaches to Evaluations with Unreliable Model and Human Annotations | Michael Hardy et.al. | 2411.15634 | link |
2024-11-22 | Sycophancy in Large Language Models: Causes and Mitigations | Lars Malmqvist et.al. | 2411.15287 | null |
2024-11-18 | Can Open-source LLMs Enhance Data Augmentation for Toxic Detection?: An Experimental Study | Zheng Hui et.al. | 2411.15175 | null |
2024-11-22 | Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation | Colin Diggs et.al. | 2411.14971 | null |
2024-11-22 | SwissADT: An Audio Description Translation System for Swiss Languages | Lukas Fischer et.al. | 2411.14967 | null |
2024-12-01 | G-RAG: Knowledge Expansion in Material Science | Radeen Mostafa et.al. | 2411.14592 | link |
2024-11-20 | The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz | David Noever et.al. | 2411.14486 | null |
2024-11-19 | Why you don’t overfit, and don’t need Bayes if you only train for one epoch | Laurence Aitchison et.al. | 2411.14478 | null |
2024-11-18 | Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning | Elizaveta Reganova et.al. | 2411.14465 | null |
2024-11-15 | Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models | Maryam Shoaeinaeini et.al. | 2411.14457 | null |
2024-11-21 | Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance | Haozhe Zhao et.al. | 2411.14279 | null |
2024-11-21 | Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective | Ernests Lavrinovics et.al. | 2411.14258 | null |
2024-11-21 | RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks | Changyue Jiang et.al. | 2411.14110 | null |
2024-11-21 | XAgents: A Framework for Interpretable Rule-Based Multi-Agents Cooperation | Hailong Yang et.al. | 2411.13932 | null |
2024-11-21 | Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels | Jianhao Yan et.al. | 2411.13775 | link |
2024-11-20 | Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics | Ryan Mok et.al. | 2411.13685 | link |
2024-11-21 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
2024-11-20 | Fact-Level Confidence Calibration and Self-Correction | Yige Yuan et.al. | 2411.13343 | link |
2024-11-20 | Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding | Nabeel Seedat et.al. | 2411.13163 | null |
2024-11-16 | A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery | Grace Sng et.al. | 2411.12759 | null |
2024-11-19 | Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs | Malay Kumar et.al. | 2411.12685 | null |
2024-11-15 | Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Haojie Zheng et.al. | 2411.12591 | link |
2024-11-19 | Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering | Aryan Keluskar et.al. | 2411.12395 | null |
2024-11-28 | VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation | Ruiyang Zhang et.al. | 2411.11919 | null |
2024-11-07 | Deploying Large Language Models With Retrieval Augmented Generation | Sonal Prabhune et.al. | 2411.11895 | link |
2024-11-18 | Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality | Viktoriia Chekalina et.al. | 2411.11531 | null |
2024-11-18 | Membership Inference Attack against Long-Context Large Language Models | Zixiong Wang et.al. | 2411.11424 | null |
2024-11-29 | Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword? | Rosalia Tufano et.al. | 2411.11401 | link |
2024-11-17 | Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Zeping Yu et.al. | 2411.10950 | link |
2024-11-16 | Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation | Shuyang Hou et.al. | 2411.10753 | null |
2024-11-16 | I’m Spartacus, No, I’m Spartacus: Measuring and Understanding LLM Identity Confusion | Kun Li et.al. | 2411.10683 | null |
2024-11-15 | Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering | Buntaro Hiraki et.al. | 2411.10583 | null |
2024-11-15 | On the Privacy Risk of In-context Learning | Haonan Duan et.al. | 2411.10512 | null |
2024-11-15 | Understanding The Effect Of Temperature On Alignment With Human Opinions | Maja Pavlovic et.al. | 2411.10080 | null |
2024-11-15 | Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity | Zichen Song et.al. | 2411.10069 | null |
2024-11-15 | Experiences from Using LLMs for Repository Mining Studies in Empirical Software Engineering | Vincenzo de Martino et.al. | 2411.09974 | null |
2024-11-15 | AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Janghwan Lee et.al. | 2411.09909 | null |
2024-11-14 | LLM Hallucination Reasoning with Zero-shot Knowledge Test | Seongmin Lee et.al. | 2411.09689 | null |
2024-11-14 | DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Jean Seo et.al. | 2411.09255 | link |
2024-11-14 | Toward Democratized Generative AI in Next-Generation Mobile Edge Networks | Ruichen Zhang et.al. | 2411.09148 | null |
2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | link |
2024-11-04 | QCG-Rerank: Chunks Graph Rerank with Query Expansion in Retrieval-Augmented LLMs for Tourism Domain | Qikai Wei et.al. | 2411.08724 | null |
2024-11-13 | Neural Topic Modeling with Large Language Models in the Loop | Xiaohao Yang et.al. | 2411.08534 | null |
2024-11-13 | Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach | Shangfeng Chen et.al. | 2411.08348 | null |
2024-11-13 | Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering | Farouq Sammour et.al. | 2411.08320 | null |
2024-11-12 | Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data | Juanhui Li et.al. | 2411.08028 | null |
2024-11-12 | From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents | Chuyi Kong et.al. | 2411.07965 | null |
2024-11-13 | Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders | Xiaofeng Zhu et.al. | 2411.07870 | null |
2024-11-12 | Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models | Yusen Zhang et.al. | 2411.07858 | link |
2024-11-12 | OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework | Jiaxi Li et.al. | 2411.07711 | link |
2024-11-12 | DecoPrompt: Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises | Nan Xu et.al. | 2411.07457 | link |
2024-11-16 | Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation | Ziwei Liu et.al. | 2411.07021 | null |
2024-11-11 | LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help? | Rikiya Takehi et.al. | 2411.06877 | link |
2024-11-11 | AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant | Yujia Zhou et.al. | 2411.06805 | link |
2024-11-11 | Anchor Attention, Small Cache: Code Generation with Large Language Models | Xiangyu Zhang et.al. | 2411.06680 | link |
2024-11-10 | CriticAL: Critic Automation with Language Models | Michael Y. Li et.al. | 2411.06590 | null |
2024-11-10 | Epistemic Integrity in Large Language Models | Bijean Ghafouri et.al. | 2411.06528 | link |
2024-11-10 | Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques | Daniil Sulimov et.al. | 2411.06445 | null |
2024-11-09 | Sufficient Context: A New Lens on Retrieval Augmented Generation Systems | Hailey Joren et.al. | 2411.06037 | null |
2024-11-12 | Game-theoretic LLM: Agent Workflow for Negotiation Games | Wenyue Hua et.al. | 2411.05990 | link |
2024-11-08 | FactLens: Benchmarking Fine-Grained Fact Verification | Kushan Mitra et.al. | 2411.05980 | null |
2024-11-08 | Mitigating Hallucination with ZeroG: An Advanced Knowledge Management Engine | Anantha Sharma et.al. | 2411.05936 | null |
2024-11-08 | The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent | Leon O. H. Kroczek et.al. | 2411.05653 | null |
2024-11-16 | Web Archives Metadata Generation with GPT-4o: Challenges and Insights | Abigail Yongping Huang et.al. | 2411.05409 | link |
2024-11-08 | Seeing Through the Fog: A Cost-Effectiveness Analysis of Hallucination Detection Systems | Alexander Thomas et.al. | 2411.05270 | null |
2024-11-07 | Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability | Yanjun Gao et.al. | 2411.04962 | null |
2024-11-07 | Prompt-Guided Internal States for Hallucination Detection of Large Language Models | Fujie Zhang et.al. | 2411.04847 | link |
2024-11-07 | Self-Calibrated Listwise Reranking with Large Language Models | Ruiyang Ren et.al. | 2411.04602 | null |
2024-11-07 | LLM-R: A Framework for Domain-Adaptive Maintenance Scheme Generation Combining Hierarchical Agents and RAG | Laifa Tao et.al. | 2411.04476 | null |
2024-11-07 | Bayesian Calibration of Win Rate Estimation with LLM Evaluators | Yicheng Gao et.al. | 2411.04424 | link |
2024-11-06 | A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Language Models and Explainable AI | Melusi Malinga et.al. | 2411.04316 | null |
2024-11-06 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | link |
2024-11-06 | Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation | Yuhang Liu et.al. | 2411.03957 | null |
2024-11-06 | EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning | Kiran Purohit et.al. | 2411.03877 | link |
2024-11-06 | QUILL: Quotation Generation Enhancement of Large Language Models | Jin Xiao et.al. | 2411.03675 | link |
2024-11-05 | Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature | Viviane Torres da Silva et.al. | 2411.03484 | null |
2024-11-05 | VERITAS: A Unified Approach to Reliability Evaluation | Rajkumar Ramamurthy et.al. | 2411.03300 | null |
2024-11-05 | Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities | Ryosuke Takata et.al. | 2411.03252 | null |
2024-11-05 | HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | Jiejun Tan et.al. | 2411.02959 | link |
2024-11-05 | Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning | Tao Zhang et.al. | 2411.02864 | null |
2024-11-05 | V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Yuxi Xie et.al. | 2411.02712 | link |
2024-11-07 | FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | Fan Nie et.al. | 2411.02603 | null |
2024-11-03 | Graph-based Confidence Calibration for Large Language Models | Yukun Li et.al. | 2411.02454 | null |
2024-11-03 | Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models | Aliyah R. Hsu et.al. | 2411.02448 | link |
2024-11-04 | Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models | Guangzhi Xiong et.al. | 2411.02382 | null |
2024-11-04 | Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI | Ramneet Kaur et.al. | 2411.02381 | null |
2024-11-04 | “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization | Eldar Kurtic et.al. | 2411.02355 | null |
2024-11-03 | Autoformulation of Mathematical Optimization Models Using LLMs | Nicolás Astorga et.al. | 2411.01679 | null |
2024-11-03 | Ontology Population using LLMs | Sanaz Saki Norouzi et.al. | 2411.01612 | null |
2024-11-02 | AMREx: AMR for Explainable Fact Verification | Chathuri Jayaweera et.al. | 2411.01343 | null |
2024-11-01 | Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output | Hithesh Sankararaman et.al. | 2411.01022 | null |
2024-10-30 | FPE-LLM: Highly Intelligent Time-Series Forecasting and Language Interaction LLM in Energy Systems | Zihang Qiu et.al. | 2411.00852 | null |
2024-10-30 | GWQ: Gradient-Aware Weight Quantization for Large Language Models | Yihua Shao et.al. | 2411.00850 | null |
2024-11-01 | CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation | Ziting Wang et.al. | 2411.00744 | null |
2024-11-01 | Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval | Qingfei Zhao et.al. | 2411.00689 | null |
2024-11-01 | Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation | Bohan Lyu et.al. | 2411.00412 | null |
2024-11-01 | Beyond Utility: Evaluating LLM as Recommender | Chumeng Jiang et.al. | 2411.00331 | link |
2024-11-01 | Rationale-Guided Retrieval Augmented Generation for Medical Question Answering | Jiwoong Sohn et.al. | 2411.00300 | link |
2024-11-01 | RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models | Sraavya Sambara et.al. | 2411.00299 | null |
2024-10-29 | Problem Categorization Can Help Large Language Models Solve Math Problems | Amogh Akella et.al. | 2411.00042 | null |
2024-10-28 | A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges | Zifeng Wang et.al. | 2411.00024 | null |
2024-11-04 | Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models | Ognjen et.al. | 2411.00023 | null |
2024-10-31 | Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs | Liyi Chen et.al. | 2410.23875 | link |
2024-10-31 | Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs | Shuyang Yu et.al. | 2410.23605 | null |
2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
2024-10-30 | VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Jingkun Ma et.al. | 2410.22995 | null |
2024-10-30 | Retrieval-Augmented Generation with Estimation of Source Reliability | Jeongyeon Hwang et.al. | 2410.22954 | null |
2024-10-30 | Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations | Leonardo Ranaldi et.al. | 2410.22874 | null |
2024-10-30 | Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot | Sejin Lee et.al. | 2410.22767 | link |
2024-10-30 | Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings | Yashvir S. Grewal et.al. | 2410.22685 | null |
2024-10-29 | Distinguishing Ignorance from Error in LLM Hallucinations | Adi Simhi et.al. | 2410.22071 | link |
2024-10-29 | Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Monica Riedler et.al. | 2410.21943 | link |
2024-10-29 | MARCO: Multi-Agent Real-time Chat Orchestration | Anubhav Shrimal et.al. | 2410.21784 | null |
2024-10-28 | LLM-Forest for Health Tabular Data Imputation | Xinrui He et.al. | 2410.21520 | null |
2024-10-28 | EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | Shih-Yang Liu et.al. | 2410.21271 | null |
2024-10-28 | CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | Meiqi Chen et.al. | 2410.21067 | null |
2024-10-28 | Reward Modeling with Weak Supervision for Language Models | Ben Hauptvogel et.al. | 2410.20869 | link |
2024-10-28 | Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation | Jaechang Kim et.al. | 2410.20811 | null |
2024-10-28 | Graph-based Uncertainty Metrics for Long-form Language Model Outputs | Mingjian Jiang et.al. | 2410.20783 | link |
2024-10-28 | Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation | Dongryeol Lee et.al. | 2410.20774 | link |
2024-10-28 | Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Mufei Li et.al. | 2410.20724 | link |
2024-10-27 | Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains | Jiemin Wu et.al. | 2410.20340 | null |
2024-10-26 | Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models | Mohammad Beigi et.al. | 2410.20199 | null |
2024-10-26 | Uncertainty-Penalized Direct Preference Optimization | Sam Houliston et.al. | 2410.20187 | null |
2024-10-26 | Mask-based Membership Inference Attacks for Retrieval-Augmented Generation | Mingrui Liu et.al. | 2410.20142 | null |
2024-10-26 | Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics | Mikhail Rumiantsau et.al. | 2410.20024 | null |
2024-10-25 | FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning | Nicole Cho et.al. | 2410.19727 | null |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | null |
2024-10-30 | ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems | Ishneet Sukhvinder Singh et.al. | 2410.19572 | null |
2024-11-01 | Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization | Anthony Cui et.al. | 2410.19499 | null |
2024-10-25 | A Debate-Driven Experiment on LLM Hallucinations and Accuracy | Ray Li et.al. | 2410.19485 | null |
2024-10-25 | Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models | Liam Barkley et.al. | 2410.19385 | null |
2024-10-25 | Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning | Yujian Liu et.al. | 2410.19290 | link |
2024-10-24 | Prebunking Elections Rumors: Artificial Intelligence Assisted Interventions Increase Confidence in American Elections | Mitchell Linegar et.al. | 2410.19202 | null |
2024-10-24 | AlignCap: Aligning Speech Emotion Captioning to Human Preferences | Ziqi Liang et.al. | 2410.19134 | null |
2024-10-24 | LLM Tree Search | Dylan Wilson et.al. | 2410.19117 | null |
2024-10-30 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
2024-10-24 | DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations | Aryo Pradipta Gema et.al. | 2410.18860 | link |
2024-10-25 | An LLM Agent for Automatic Geospatial Data Analysis | Yuxing Chen et.al. | 2410.18792 | null |
2024-10-24 | Task Calibration: Calibrating Large Language Models on Inference Tasks | Yingjie Li et.al. | 2410.18764 | null |
2024-10-24 | LLM-Slice: Dedicated Wireless Network Slicing for Large Language Models | Boyi Liu et.al. | 2410.18499 | null |
2024-10-23 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | Kim Sung-Bin et.al. | 2410.18325 | link |
2024-10-23 | Multilingual Hallucination Gaps in Large Language Models | Cléa Chataigner et.al. | 2410.18270 | null |
2024-10-23 | Beware of Calibration Data for Pruning Large Language Models | Yixin Ji et.al. | 2410.17711 | null |
2024-10-23 | MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models | Guijin Son et.al. | 2410.17578 | link |
2024-10-29 | Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination | Jerry Huang et.al. | 2410.17477 | null |
2024-10-22 | ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs | Reza Fayyazi et.al. | 2410.17406 | link |
2024-10-22 | DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR | Miguel Contreras et.al. | 2410.17363 | null |
2024-10-22 | Are Large Language Models Ready for Travel Planning? | Ruiping Ren et.al. | 2410.17333 | null |
2024-10-22 | Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | Benedict Aaron Tjandra et.al. | 2410.17234 | null |
2024-10-23 | GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks | Shuyang Hou et.al. | 2410.17031 | null |
2024-10-22 | SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine | Xiaochen Wang et.al. | 2410.17021 | null |
2024-10-22 | Combining Ontological Knowledge and Large Language Model for User-Friendly Service Robots | Haru Nakajima et.al. | 2410.16804 | null |
2024-10-21 | Large language models enabled multiagent ensemble method for efficient EHR data labeling | Jingwei Huang et.al. | 2410.16543 | null |
2024-10-21 | Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models’ Reasoning with Formal Logic | Jason Chan et.al. | 2410.16502 | null |
2024-10-18 | Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs | Rui Pu et.al. | 2410.16327 | null |
2024-10-29 | Can Knowledge Editing Really Correct Hallucinations? | Baixiang Huang et.al. | 2410.16251 | link |
2024-10-21 | Analyzing Context Contributions in LLM-based Machine Translation | Emmanouil Zaranis et.al. | 2410.16246 | null |
2024-10-23 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237 | null |
2024-10-21 | Information for Conversation Generation: Proposals Utilising Knowledge Graphs | Alex Clay et.al. | 2410.16196 | null |
2024-10-22 | Reducing Hallucinations in Vision-Language Models via Latent Space Steering | Sheng Liu et.al. | 2410.15778 | link |
2024-10-21 | Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding | Derong Xu et.al. | 2410.15702 | null |
2024-10-21 | Students Rather Than Experts: A New AI For Education Pipeline To Model More Human-Like And Personalised Early Adolescences | Yiping Ma et.al. | 2410.15701 | null |
2024-10-21 | NetSafe: Exploring the Topological Safety of Multi-agent Networks | Miao Yu et.al. | 2410.15686 | null |
2024-10-21 | Bayesian Concept Bottleneck Models with LLM Priors | Jean Feng et.al. | 2410.15555 | link |
2024-10-20 | Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini | Chanseo Lee et.al. | 2410.15528 | null |
2024-10-22 | Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence | Norbert Tihanyi et.al. | 2410.15490 | null |
2024-10-20 | Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training | Shahrad Mohammadzadeh et.al. | 2410.15460 | null |
2024-10-20 | CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges | Haitao Li et.al. | 2410.15393 | link |
2024-10-20 | A Survey of Hallucination in Large Visual Language Models | Wei Lan et.al. | 2410.15359 | null |
2024-10-20 | Modality-Fair Preference Optimization for Trustworthy MLLM Alignment | Songtao Jiang et.al. | 2410.15334 | null |
2024-10-20 | A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice | Hsiu-Yuan Huang et.al. | 2410.15326 | null |
2024-10-20 | Causality for Large Language Models | Anpeng Wu et.al. | 2410.15319 | link |
2024-10-20 | MAD: Move AI Decompiler to Improve Transparency and Auditability on Non-Open-Source Blockchain Smart Contract | Eason Chen et.al. | 2410.15275 | null |
2024-10-19 | Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction | Yinhan He et.al. | 2410.15165 | link |
2024-10-19 | MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification | Yin Li et.al. | 2410.15154 | link |
2024-10-22 | Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization | Zihui Wu et.al. | 2410.15052 | link |
2024-10-19 | “Ghost of the past”: identifying and resolving privacy leakage from LLM’s memory through proactive user interaction | Shuning Zhang et.al. | 2410.14931 | null |
2024-10-18 | FedSpaLLM: Federated Pruning of Large Language Models | Guangji Bai et.al. | 2410.14852 | null |
2024-10-18 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz et.al. | 2410.14763 | link |
2024-10-22 | ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries | Kishan Maharaj et.al. | 2410.14748 | null |
2024-10-17 | Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors | Anthony Sicilia et.al. | 2410.14744 | null |
2024-10-18 | Enhancing Large Language Models’ Situated Faithfulness to External Contexts | Yukun Huang et.al. | 2410.14675 | link |
2024-10-22 | Do LLMs estimate uncertainty well in instruction-following? | Juyeon Heo et.al. | 2410.14582 | link |
2024-10-18 | Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models | James Vo et.al. | 2410.14480 | null |
2024-10-18 | Zero-shot Action Localization via the Confidence of Large Vision-Language Models | Josiah Aklilu et.al. | 2410.14340 | null |
2024-10-18 | Critical Questions Generation: Motivation and Challenges | Blanca Calvo Figueras et.al. | 2410.14335 | link |
2024-10-18 | ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM | Songheng Zhang et.al. | 2410.14331 | null |
2024-10-18 | LoGU: Long-form Generation with Uncertainty Expressions | Ruihan Yang et.al. | 2410.14309 | link |
2024-10-22 | Good Parenting is all you need – Multi-agentic LLM Hallucination Mitigation | Ted Kwartler et.al. | 2410.14262 | null |
2024-10-18 | Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | Olga Loginova et.al. | 2410.14248 | null |
2024-10-21 | Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning | Xingyu Tan et.al. | 2410.14211 | null |
2024-10-18 | Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment | Chenhang Cui et.al. | 2410.14148 | null |
2024-10-17 | From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization | Catarina G. Belem et.al. | 2410.13961 | link |
2024-10-17 | Goal Inference from Open-Ended Dialog | Rachel Ma et.al. | 2410.13957 | null |
2024-10-17 | RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards | Xinze Li et.al. | 2410.13509 | link |
2024-10-17 | Advancing Large Language Model Attribution through Self-Improving | Lei Huang et.al. | 2410.13298 | null |
2024-10-17 | Learning to Route with Confidence Tokens | Yu-Neng Chuang et.al. | 2410.13284 | null |
2024-10-17 | Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning | Minseok Choi et.al. | 2410.13274 | null |
2024-10-17 | Atomic Calibration of LLMs in Long-Form Generations | Caiqi Zhang et.al. | 2410.13246 | null |
2024-10-17 | LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch | Caigao Jiang et.al. | 2410.13213 | link |
2024-10-17 | FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs | Forrest Sheng Bao et.al. | 2410.13210 | link |
2024-10-18 | MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Zonghai Yao et.al. | 2410.13191 | link |
2024-10-21 | Utilizing Large Language Models in An Iterative Paradigm with Domain Feedback for Molecule Optimization | Khiem Le et.al. | 2410.13147 | null |
2024-10-17 | Trust but Verify: Programmatic VLM Evaluation in the Wild | Viraj Prabhu et.al. | 2410.13121 | null |
2024-10-17 | Learning to Summarize from LLM-generated Feedback | Hwanjun Song et.al. | 2410.13116 | null |
2024-10-16 | Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models | Jie Ren et.al. | 2410.13088 | null |
2024-10-16 | Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models | Linhao Luo et.al. | 2410.13080 | link |
2024-10-16 | PromptExp: Multi-granularity Prompt Explanation of Large Language Models | Ximing Dong et.al. | 2410.13073 | null |
2024-10-16 | LLM Confidence Evaluation Measures in Zero-Shot CSS Classification | David Farr et.al. | 2410.13047 | null |
2024-10-16 | When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems | Asir Saadat et.al. | 2410.13029 | null |
2024-10-16 | LLM Chain Ensembles for Scalable and Accurate Data Annotation | David Farr et.al. | 2410.13006 | link |
2024-10-16 | REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models | Ambuje Gupta et.al. | 2410.12890 | null |
2024-10-16 | On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs | Herun Wan et.al. | 2410.12600 | null |
2024-10-16 | A Claim Decomposition Benchmark for Long-form Answer Verification | Zhihao Zhang et.al. | 2410.12558 | link |
2024-10-17 | MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration | Jinjie Wei et.al. | 2410.12532 | null |
2024-10-16 | RosePO: Aligning LLM-based Recommenders with Human Values | Jiayi Liao et.al. | 2410.12519 | null |
2024-10-16 | KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs | Yongqin Xu et.al. | 2410.12480 | null |
2024-10-18 | MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models | Boyang Xue et.al. | 2410.12478 | link |
2024-10-16 | ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs | Jingming Zhuo et.al. | 2410.12405 | link |
2024-10-17 | Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs | Lei Sun et.al. | 2410.12298 | null |
2024-10-16 | Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors | Linwei Tao et.al. | 2410.12295 | null |
2024-10-17 | LLM-based Cognitive Models of Students with Misconceptions | Shashank Sonkar et.al. | 2410.12294 | null |
2024-10-16 | An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation | Junjie Chen et.al. | 2410.12265 | null |
2024-10-16 | CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity | Jintao Liu et.al. | 2410.12248 | link |
2024-10-16 | On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation | Xiaonan Jing et.al. | 2410.12222 | null |
2024-10-16 | Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Huiwen Wu et.al. | 2410.12130 | null |
2024-10-15 | Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction | Kaiqiao Han et.al. | 2410.12040 | link |
2024-10-15 | Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents | Bolun Sun et.al. | 2410.11906 | null |
2024-10-15 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab et.al. | 2410.11711 | link |
2024-10-15 | Black-box Uncertainty Quantification Method for LLM-as-a-Judge | Nico Wagner et.al. | 2410.11594 | null |
2024-10-15 | AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data | Xinjie Zhao et.al. | 2410.11531 | null |
2024-10-15 | ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability | Zhongxiang Sun et.al. | 2410.11414 | null |
2024-10-15 | LargePiG: Your Large Language Model is Secretly a Pointer Generator | Zhongxiang Sun et.al. | 2410.11366 | null |
2024-10-15 | Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs | Shuo Li et.al. | 2410.11302 | null |
2024-10-15 | On the Capacity of Citation Generation by Large Language Models | Haosheng Qian et.al. | 2410.11217 | null |
2024-10-14 | LLM Unlearning via Loss Adjustment with Only Forget Data | Yaxuan Wang et.al. | 2410.11143 | null |
2024-10-14 | Can Structured Data Reduce Epistemic Uncertainty? | Shriram M S et.al. | 2410.11141 | null |
2024-10-14 | Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only | Jihan Yao et.al. | 2410.11055 | link |
2024-10-13 | 3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation | Hongxin Ding et.al. | 2410.10901 | null |
2024-10-14 | Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance | Sachin Goyal et.al. | 2410.10796 | link |
2024-10-16 | SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators | Rasoul Shafipour et.al. | 2410.10714 | null |
2024-10-14 | On Calibration of LLM-based Guard Models for Reliable Content Moderation | Hongfu Liu et.al. | 2410.10414 | link |
2024-10-14 | Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion | Xinping Zhao et.al. | 2410.10408 | null |
2024-10-14 | Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search | Chenglin Li et.al. | 2410.10392 | null |
2024-10-14 | Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning | Yongxin Xu et.al. | 2410.10360 | null |
2024-10-14 | SkillAggregation: Reference-free LLM-Dependent Aggregation | Guangzhi Sun et.al. | 2410.10215 | null |
2024-10-13 | A Multi-LLM Orchestration Engine for Personalized, Context-Rich Assistance | Sumedh Rasal et.al. | 2410.10039 | null |
2024-10-13 | Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code | Nan Jiang et.al. | 2410.09997 | null |
2024-10-15 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Han Qiu et.al. | 2410.09962 | link |
2024-10-13 | Can Large Language Models Generate Geospatial Code? | Shuyang Hou et.al. | 2410.09738 | null |
2024-10-13 | Taming Overconfidence in LLMs: Reward Calibration in RLHF | Jixuan Leng et.al. | 2410.09724 | link |
2024-10-13 | Honest AI: Fine-Tuning “Small” Language Models to Say “I Don’t Know”, and Reducing Hallucination in RAG | Xinxi Chen et.al. | 2410.09699 | null |
2024-10-13 | Integrating Reinforcement Learning and Large Language Models for Crop Production Process Management Optimization and Control through A New Knowledge-Based Deep Learning Paradigm | Dong Chen et.al. | 2410.09680 | null |
2024-10-12 | FlatQuant: Flatness Matters for LLM Quantization | Yuxuan Sun et.al. | 2410.09426 | link |
2024-10-12 | LLM $\times$ MapReduce: Simplified Long-Sequence Processing using Large Language Models | Zihan Zhou et.al. | 2410.09342 | link |
2024-10-15 | Nudging: Inference-time Alignment via Model Collaboration | Yu Fei et.al. | 2410.09300 | null |
2024-10-11 | Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective | Bo Ni et.al. | 2410.08985 | null |
2024-10-11 | NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Zheng Yi Ho et.al. | 2410.08970 | null |
2024-10-11 | Decoding Secret Memorization in Code LLMs Through Token-Level Characterization | Yuqing Nie et.al. | 2410.08858 | null |
2024-10-11 | Measuring the Inconsistency of Large Language Models in Preferential Ranking | Xiutian Zhao et.al. | 2410.08851 | null |
2024-10-11 | Unveiling Molecular Secrets: An LLM-Augmented Linear Model for Explainable and Calibratable Molecular Property Prediction | Zhuoran Li et.al. | 2410.08829 | link |
2024-10-11 | Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation | Ruobing Wang et.al. | 2410.08821 | link |
2024-10-11 | VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding | Houlun Chen et.al. | 2410.08593 | link |
2024-10-11 | Humanity in AI: Detecting the Personality of Large Language Models | Baohua Zhan et.al. | 2410.08545 | null |
2024-10-11 | Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both | Abhijnan Nath et.al. | 2410.08458 | null |
2024-10-11 | Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness | Yu He Ke et.al. | 2410.08431 | null |
2024-10-10 | Large Airfoil Models | Howon Lee et.al. | 2410.08392 | null |
2024-10-10 | Think Beyond Size: Dynamic Prompting for More Effective Reasoning | Kamesh R et.al. | 2410.08130 | null |
2024-10-10 | A Closer Look at Machine Unlearning for Large Language Models | Xiaojian Yuan et.al. | 2410.08109 | link |
2024-10-10 | Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering | Yuan Sui et.al. | 2410.08085 | null |
2024-10-10 | Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses | Pranav Senthilkumar et.al. | 2410.07826 | null |
2024-10-10 | Mitigating Gender Bias in Code Large Language Models via Model Editing | Zhanyue Qin et.al. | 2410.07820 | null |
2024-10-10 | Automatic Curriculum Expert Iteration for Reliable LLM Reasoning | Zirui Zhao et.al. | 2410.07627 | link |
2024-10-10 | No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users | Mengxuan Hu et.al. | 2410.07589 | null |
2024-10-10 | OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting | Xukai Liu et.al. | 2410.07549 | link |
2024-10-10 | MKGL: Mastery of a Three-Word Language | Lingbing Guo et.al. | 2410.07526 | null |
2024-10-09 | Localizing Factual Inconsistencies in Attributable Text Generation | Arie Cattan et.al. | 2410.07473 | link |
2024-10-09 | Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning | Abhinav Bandari et.al. | 2410.07461 | link |
2024-10-09 | Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Manling Li et.al. | 2410.07166 | link |
2024-10-09 | Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization | Chengtao Jian et.al. | 2410.07018 | null |
2024-10-09 | Self-Boosting Large Language Models with Synthetic Preference Data | Qingxiu Dong et.al. | 2410.06961 | null |
2024-10-09 | AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation | Huanxi Liu et.al. | 2410.06943 | null |
2024-10-09 | Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning | Runchuan Zhu et.al. | 2410.06913 | link |
2024-10-09 | Calibrating Verbalized Probabilities for Large Language Models | Cheng Wang et.al. | 2410.06707 | null |
2024-10-09 | Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack | Leo McKee-Reid et.al. | 2410.06491 | null |
2024-10-09 | Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | David Noever et.al. | 2410.06462 | null |
2024-10-09 | Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | Ruijia Niu et.al. | 2410.06431 | null |
2024-10-08 | Validation of the Scientific Literature via Chemputation Augmented by Large Language Models | Sebastian Pagel et.al. | 2410.06384 | null |
2024-10-08 | Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Ruosen Li et.al. | 2410.06304 | null |
2024-10-08 | EVOLvE: Evaluating and Optimizing LLMs For Exploration | Allen Nie et.al. | 2410.06238 | null |
2024-10-08 | ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution | Corban Rivera et.al. | 2410.06108 | null |
2024-10-10 | LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs | Vincent Emonet et.al. | 2410.06062 | link |
2024-10-08 | Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models | Bozhou Li et.al. | 2410.05802 | null |
2024-10-08 | Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition | Zheyang Xiong et.al. | 2410.05603 | null |
2024-10-07 | Self-rationalization improves LLM as a fine-grained judge | Prapti Trivedi et.al. | 2410.05495 | null |
2024-10-07 | ESPACE: Dimensionality Reduction of Activations for Model Compression | Charbel Sakr et.al. | 2410.05437 | null |
2024-10-05 | PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms | Yilong Li et.al. | 2410.05315 | null |
2024-10-07 | SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | Yuxin Xiao et.al. | 2410.05248 | null |
2024-10-07 | Precise Model Benchmarking with Only a Few Observations | Riccardo Fogliato et.al. | 2410.05222 | null |
2024-10-07 | Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Guanyu Zhou et.al. | 2410.04780 | link |
2024-10-07 | Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering | Zimu Wang et.al. | 2410.04752 | null |
2024-10-06 | Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Pengcheng Jiang et.al. | 2410.04585 | link |
2024-10-06 | DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination | Xuan Gong et.al. | 2410.04514 | null |
2024-10-05 | DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech | Dominika Woszczyk et.al. | 2410.04188 | null |
2024-10-04 | dZiner: Rational Inverse Design of Materials with AI Agents | Mehrad Ansari et.al. | 2410.03963 | link |
2024-10-03 | Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge | Aparna Elangovan et.al. | 2410.03775 | link |
2024-10-04 | Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores | Robert E. Blackwell et.al. | 2410.03492 | null |
2024-10-04 | Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation | Tobias Leemann et.al. | 2410.03461 | null |
2024-10-08 | Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs | Louis Serrano et.al. | 2410.03437 | null |
2024-10-04 | Towards a Benchmark for Large Language Models for Business Process Management Tasks | Kiran Busch et.al. | 2410.03255 | link |
2024-10-04 | Showing LLM-Generated Code Selectively Based on Confidence of LLMs | Jia Li et.al. | 2410.03234 | null |
2024-10-04 | ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering | Huayang Li et.al. | 2410.03227 | null |
2024-10-04 | Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback | Kyuyoung Kim et.al. | 2410.03145 | link |
2024-10-04 | SAG: Style-Aligned Article Generation via Model Collaboration | Chenning Xu et.al. | 2410.03137 | null |
2024-10-10 | ARB-LLM: Alternating Refined Binarizations for Large Language Models | Zhiteng Li et.al. | 2410.03129 | link |
2024-10-04 | UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference | Jing Xiong et.al. | 2410.03090 | null |
2024-10-04 | Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues | Shilin Qu et.al. | 2410.03049 | null |
2024-10-03 | Characterizing Context Influence and Hallucination in Summarization | James Flemings et.al. | 2410.03026 | link |
2024-10-03 | Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review | Sungduk Yu et.al. | 2410.03019 | null |
2024-09-30 | Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG | Chenhao Fang et.al. | 2410.02825 | null |
2024-10-09 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | link |
2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | link |
2024-10-03 | Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization | Ryan C. Barron et.al. | 2410.02721 | null |
2024-10-07 | LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Hadas Orgad et.al. | 2410.02707 | link |
2024-10-03 | Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers | Shijie Chen et.al. | 2410.02642 | null |
2024-10-03 | Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration | Yun Qu et.al. | 2410.02511 | link |
2024-10-03 | AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | Junfeng Fang et.al. | 2410.02355 | link |
2024-10-04 | How Much Can RAG Help the Reasoning of LLM? | Jingyu Liu et.al. | 2410.02338 | null |
2024-10-03 | Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference | Wei Cheng et.al. | 2410.02210 | null |
2024-10-03 | Efficiently Deploying LLMs with Controlled Risk | Michael J. Zellinger et.al. | 2410.02173 | null |
2024-10-03 | Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments | Amogh Mannekote et.al. | 2410.02110 | link |
2024-10-02 | DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection | Daiki Chiba et.al. | 2410.02095 | null |
2024-10-02 | DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning | Yebowen Hu et.al. | 2410.01772 | null |
2024-10-02 | CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs | Kangsheng Wang et.al. | 2410.01696 | null |
2024-10-02 | FactAlign: Long-form Factuality Alignment of Large Language Models | Chao-Wei Huang et.al. | 2410.01691 | link |
2024-10-02 | Intent Detection in the Age of LLMs | Gaurav Arora et.al. | 2410.01627 | null |
2024-10-02 | Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration | Kangxi Wu et.al. | 2410.01285 | null |
2024-10-02 | BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation | Bryan Li et.al. | 2410.01171 | link |
2024-10-01 | Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability | Weitong Zhang et.al. | 2410.01064 | null |
2024-10-01 | Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown | Xingzhou Lou et.al. | 2410.00847 | null |
2024-10-01 | Dynamic Planning for LLM-based Graphical User Interface Automation | Shaoqing Zhang et.al. | 2410.00467 | link |
2024-10-01 | UniAdapt: A Universal Adapter for Knowledge Calibration | Tai D. Nguyen et.al. | 2410.00454 | null |
2024-10-01 | Are LLMs Aware that Some Questions are not Open-ended? | Dongjie Yang et.al. | 2410.00423 | null |
2024-10-01 | Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation | Bhargav Shandilya et.al. | 2410.00387 | null |
2024-09-30 | A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification | Marina Ribeiro et.al. | 2410.00250 | null |
2024-09-30 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang et.al. | 2409.20550 | link |
2024-09-30 | Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models | Arpan Mukherjee et.al. | 2409.20512 | null |
2024-10-04 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao et.al. | 2409.20365 | link |
2024-09-30 | MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants | Zeyu Zhang et.al. | 2409.20163 | link |
2024-09-30 | Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation | Huangyu Dai et.al. | 2409.19877 | null |
2024-09-29 | Calibrating Language Models with Adaptive Temperature Scaling | Johnathan Xie et.al. | 2409.19817 | link |
2024-09-29 | MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models | Vibhor Agarwal et.al. | 2409.19492 | null |
2024-09-28 | Overriding Safety protections of Open-source Models | Sachin Kumar et.al. | 2409.19476 | link |
2024-09-28 | SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | Yi Wu et.al. | 2409.19471 | link |
2024-09-28 | Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks | Chenxi Wang et.al. | 2409.19338 | null |
2024-09-28 | DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning | Kazuki Matsuda et.al. | 2409.19255 | null |
2024-09-27 | Secure Multiparty Generative AI | Manil Shrestha et.al. | 2409.19120 | null |
2024-09-27 | A Survey on the Honesty of Large Language Models | Siheng Li et.al. | 2409.18786 | link |
2024-10-02 | Model-based Preference Optimization in Abstractive Summarization without Human Feedback | Jaepill Choi et.al. | 2409.18618 | link |
2024-09-26 | Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model | Chuang Niu et.al. | 2409.18319 | link |
2024-09-26 | Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT | Natthanaphop Isaradech et.al. | 2409.17683 | null |
2024-09-26 | A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models | Syed Affan Daimi et.al. | 2409.17581 | link |
2024-09-26 | HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection | Xuefeng Du et.al. | 2409.17504 | null |
2024-09-25 | Post-hoc Reward Calibration: A Case Study on Length Bias | Zeyu Huang et.al. | 2409.17407 | link |
2024-09-25 | Search for Efficient Large Language Models | Xuan Shen et.al. | 2409.17372 | link |
2024-09-20 | A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models | Satoshi Munakata et.al. | 2409.17173 | null |
2024-09-25 | Mitigating the Bias of Large Language Model Evaluation | Hongli Zhou et.al. | 2409.16788 | link |
2024-09-25 | RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems | Yihong Tang et.al. | 2409.16727 | null |
2024-09-25 | EventHallusion: Diagnosing Event Hallucinations in Video LLMs | Jiacheng Zhang et.al. | 2409.16597 | link |
2024-09-25 | Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels | Yishu Wei et.al. | 2409.16563 | null |
2024-09-24 | MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment | Venkata Naren Devarakonda et.al. | 2409.16455 | null |
2024-09-24 | Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | Samuel Arcadinho et.al. | 2409.15934 | null |
2024-09-24 | Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts | Sukai Huang et.al. | 2409.15915 | null |
2024-09-24 | Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection | Xingyu Ma et.al. | 2409.15907 | null |
2024-09-24 | XTRUST: On the Multilingual Trustworthiness of Large Language Models | Yahan Li et.al. | 2409.15762 | link |
2024-09-23 | Parse Trees Guided LLM Prompt Compression | Wenhao Mao et.al. | 2409.15395 | link |
2024-09-18 | VERA: Validation and Enhancement for Retrieval Augmented systems | Nitin Aravind Birur et.al. | 2409.15364 | null |
2024-09-18 | Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning | Essa Jan et.al. | 2409.15361 | null |
2024-09-27 | Reward-Robust RLHF in LLMs | Yuzi Yan et.al. | 2409.15360 | null |
2024-09-23 | A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | Yunfei Xie et.al. | 2409.15277 | null |
2024-09-26 | A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models | Yixi Wu et.al. | 2409.15228 | null |
2024-09-23 | Boosting Healthcare LLMs Through Retrieved Context | Jordi Bayarri-Planas et.al. | 2409.15127 | link |
2024-09-23 | Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications | Sean Kim et.al. | 2409.15076 | null |
2024-09-23 | InterMind: A Doctor-Patient-Family Interactive Depression Assessment System Empowered by Large Language Models | Zhiyuan Zhou et.al. | 2409.14878 | null |
2024-09-23 | Past Meets Present: Creating Historical Analogy with Large Language Models | Nianqi Li et.al. | 2409.14820 | link |
2024-09-28 | Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method | Weichao Zhang et.al. | 2409.14781 | link |
2024-09-23 | zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning | Zixiang Xian et.al. | 2409.14644 | null |
2024-09-22 | Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization | Minyi Zhao et.al. | 2409.14484 | null |
2024-09-22 | Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses | Hung-Ting Su et.al. | 2409.14324 | link |
2024-09-21 | OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching | Zhangcheng Qiang et.al. | 2409.14038 | null |
2024-09-20 | Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology | Aidan Gilson et.al. | 2409.13902 | null |
2024-09-20 | FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs | Bowen Yan et.al. | 2409.13612 | null |
2024-09-20 | ChainBuddy: An AI Agent System for Generating LLM Pipelines | Jingyue Zhang et.al. | 2409.13588 | null |
2024-09-23 | AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit | Mohanna Hoveyda et.al. | 2409.13447 | link |
2024-09-20 | Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey | Sourav Verma et.al. | 2409.13385 | link |
2024-09-20 | Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems | Andrea Colombo et.al. | 2409.13252 | null |
2024-09-19 | Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Peiyi Zhang et.al. | 2409.12739 | null |
2024-09-19 | LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks | Malsha Ashani Mahawatta Dona et.al. | 2409.12580 | null |
2024-09-19 | Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation | Chen Liang et.al. | 2409.12411 | null |
2024-09-19 | On the Effectiveness of LLMs for Manual Test Verifications | Myron David Lucena Campos Peixoto et.al. | 2409.12405 | null |
2024-09-18 | RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models | Abhinav Jain et.al. | 2409.12294 | null |
2024-09-18 | Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Arslan Chaudhry et.al. | 2409.12180 | null |
2024-09-05 | LitFM: A Retrieval Augmented Structure-aware Foundation Model For Citation Graphs | Jiasheng Zhang et.al. | 2409.12177 | null |
2024-09-18 | Combating Phone Scams with LLM-based Detection: Where Do We Stand? | Zitong Shen et.al. | 2409.11643 | null |
2024-09-17 | HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection | Theo King et.al. | 2409.11579 | link |
2024-09-17 | What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts | Shuaiyu Chen et.al. | 2409.11540 | null |
2024-09-17 | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao et.al. | 2409.11365 | null |
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | link |
2024-09-25 | Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling | Xinyue Fang et.al. | 2409.11283 | null |
2024-09-17 | Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models | Bishwash Khanal et.al. | 2409.11233 | null |
2024-09-17 | Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization | Jianing Wang et.al. | 2409.11212 | link |
2024-09-17 | A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B | Jemin Lee et.al. | 2409.11055 | link |
2024-09-16 | Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering | Qingru Zhang et.al. | 2409.10790 | null |
2024-09-16 | “The Data Says Otherwise”-Towards Automated Fact-checking and Communication of Data Claims | Yu Fu et.al. | 2409.10713 | null |
2024-09-17 | Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | Bhuvan Sachdeva et.al. | 2409.10354 | null |
2024-09-16 | Trustworthiness in Retrieval-Augmented Generation Systems: A Survey | Yujia Zhou et.al. | 2409.10102 | link |
2024-09-16 | Benchmarking Large Language Model Uncertainty for Prompt Optimization | Pei-Fu Guo et.al. | 2409.10044 | link |
2024-09-18 | HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making | Sumera Anjum et.al. | 2409.10011 | link |
2024-09-23 | Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations | Abe Bohan Hou et.al. | 2409.09947 | link |
2024-09-16 | Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges | Vinay Samuel et.al. | 2409.09927 | link |
2024-09-16 | SFR-RAG: Towards Contextually Faithful LLMs | Xuan-Phi Nguyen et.al. | 2409.09916 | null |
2024-09-15 | ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing | Suhyeon Yoo et.al. | 2409.09760 | null |
2024-09-15 | ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts | Che Wang et.al. | 2409.09661 | link |
2024-09-21 | Confidence Estimation for LLM-Based Dialogue State Tracking | Yi-Jyun Sun et.al. | 2409.09629 | link |
2024-09-14 | VernaCopter: Disambiguated Natural-Language-Driven Robot via Formal Specifications | Teun van de Laar et.al. | 2409.09536 | link |
2024-09-14 | Hacking, The Lazy Way: LLM Augmented Pentesting | Dhruva Goyal et.al. | 2409.09493 | null |
2024-09-19 | The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection | Yi Yang et.al. | 2409.09380 | null |
2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
2024-09-23 | When Context Leads but Parametric Memory Follows in Large Language Models | Yufei Tao et.al. | 2409.08435 | link |
2024-09-12 | Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT | Irene Weber et.al. | 2409.07732 | link |
2024-09-11 | MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | Praveen K Kanithi et.al. | 2409.07314 | null |
2024-09-11 | Reranking Laws for Language Generation: A Communication-Theoretic Perspective | António Farinhas et.al. | 2409.07131 | null |
2024-09-11 | Understanding Knowledge Drift in LLMs through Misinformation | Alina Fastowski et.al. | 2409.07085 | link |
2024-09-11 | Representation Tuning | Christopher M. Ackerman et.al. | 2409.06927 | link |
2024-09-10 | Semi-Supervised Reward Modeling via Iterative Self-Training | Yifei He et.al. | 2409.06903 | link |
2024-09-10 | Geometric-Averaged Preference Optimization for Soft Preference Labels | Hiroki Furuta et.al. | 2409.06691 | null |
2024-09-10 | Alleviating Hallucinations in Large Language Models with Scepticism Modeling | Yetao Wu et.al. | 2409.06601 | null |
2024-09-10 | GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | Sacha Muller et.al. | 2409.06595 | link |
2024-09-10 | Automate Strategy Finding with LLM in Quant investment | Zhizhuo Kou et.al. | 2409.06289 | null |
2024-09-14 | ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog | Yujian Gan et.al. | 2409.06097 | link |
2024-09-09 | $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding | Shuai Wang et.al. | 2409.05923 | null |
2024-09-09 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu et.al. | 2409.05806 | link |
2024-09-09 | LLMs Will Always Hallucinate, and We Need to Live With This | Sourav Banerjee et.al. | 2409.05746 | null |
2024-09-07 | LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs | Yongxin Deng et.al. | 2409.04744 | null |
2024-09-03 | Here’s Charlie! Realising the Semantic Web vision of Agents in the age of LLMs | Jesse Wright et.al. | 2409.04465 | null |
2024-09-06 | Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering | Larissa Pusch et.al. | 2409.04181 | null |
2024-09-13 | Safeguarding AI Agents: Developing and Analyzing Safety Architectures | Ishaan Domkundwar et.al. | 2409.03793 | null |
2024-09-06 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
2024-09-05 | Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration | Jeremy Qin et.al. | 2409.03225 | link |
2024-09-05 | Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models | Jie Ma et.al. | 2409.03155 | link |
2024-09-04 | CLUE: Concept-Level Uncertainty Estimation for Large Language Models | Yu-Hsiang Wang et.al. | 2409.03021 | null |
2024-09-04 | Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models | Gabriel Y. Arteaga et.al. | 2409.02976 | link |
2024-09-10 | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Jiajie Zhang et.al. | 2409.02897 | link |
2024-09-04 | Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs | Ruoyu Wang et.al. | 2409.02686 | null |
2024-09-03 | Initial Development and Evaluation of the Creative Artificial Intelligence through Recurring Developments and Determinations (CAIRDD) System | Jeremy Straub et.al. | 2409.02291 | null |
2024-09-03 | Physical Rule-Guided Convolutional Neural Network | Kishor Datta Gupta et.al. | 2409.02081 | null |
2024-09-03 | RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer | Jiangyi Deng et.al. | 2409.02074 | null |
2024-08-25 | Path-Consistency: Prefix Enhancement for Efficient Inference in LLM | Jiace Zhu et.al. | 2409.01281 | null |
2024-09-02 | Statically Contextualizing Large Language Models with Typed Holes | Andrew Blinn et.al. | 2409.00921 | null |
2024-09-01 | Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering | Derian Boer et.al. | 2409.00861 | link |
2024-09-04 | Learning to Ask: When LLMs Meet Unclear Instruction | Wenxuan Wang et.al. | 2409.00557 | null |
2024-08-31 | Does Alignment Tuning Really Break LLMs’ Internal Confidence? | Hongseok Oh et.al. | 2409.00352 | link |
2024-09-08 | ProGRes: Prompted Generative Rescoring on ASR n-Best | Ada Defne Tur et.al. | 2409.00217 | link |
2024-08-30 | LLMs hallucinate graphs too: a structural perspective | Erwan Le Merrer et.al. | 2409.00159 | null |
2024-08-29 | HoneyComb: A Flexible LLM-Based Agent System for Materials Science | Huan Zhang et.al. | 2409.00135 | null |
2024-09-04 | Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs | Ziyan Cui et.al. | 2409.00128 | null |
2024-09-08 | Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning | Momin Abbas et.al. | 2409.00124 | null |
2024-09-04 | Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation | Mohammad Nadeem et.al. | 2409.00105 | null |
2024-08-26 | Evaluating ChatGPT on Nuclear Domain-Specific Data | Muhammad Anwar et.al. | 2409.00090 | null |
2024-08-26 | Watermarking Techniques for Large Language Models: A Survey | Yuqing Liang et.al. | 2409.00089 | null |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
2024-08-30 | Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling | Guangya Wan et.al. | 2408.17017 | null |
2024-09-05 | UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches | Chao Wang et.al. | 2408.16966 | null |
2024-09-04 | Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies | Zhiyang Qi et.al. | 2408.16586 | null |
2024-08-29 | LoraMap: Harnessing the Power of LoRA Connections | Hyeryun Park et.al. | 2408.16264 | null |
2024-08-28 | Logic-Enhanced Language Model Agents for Trustworthy Social Simulations | Agnieszka Mensfelt et.al. | 2408.16081 | link |
2024-08-28 | WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration | Yao Zhang et.al. | 2408.15978 | null |
2024-09-07 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
2024-08-28 | Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization | Léo Hemamou et.al. | 2408.15801 | null |
2024-08-28 | An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation | Thai Tang Quoc et.al. | 2408.15658 | null |
2024-08-28 | Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Lujun Gui et.al. | 2408.15562 | null |
2024-08-29 | LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation | Haichuan Hu et.al. | 2408.15533 | link |
2024-08-28 | Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression | Haowen Hou et.al. | 2408.15491 | link |
2024-08-27 | The Uniqueness of LLaMA3-70B with Per-Channel Quantization: An Empirical Study | Minghai Qin et.al. | 2408.15301 | null |
2024-08-27 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Kristina Gligorić et.al. | 2408.15204 | link |
2024-08-27 | Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation | N. E. Kriman et.al. | 2408.15171 | null |
2024-08-27 | Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering | Haowei Du et.al. | 2408.15037 | null |
2024-08-28 | Language-specific Calibration for Pruning Multilingual Language Models | Simon Kurz et.al. | 2408.14398 | null |
2024-08-26 | Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders | Cong Xu et.al. | 2408.14238 | link |
2024-08-25 | CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction | Guangya Wan et.al. | 2408.13940 | null |
2024-08-25 | Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models | Duy Khoa Pham et.al. | 2408.13808 | null |
2024-08-25 | Poor-Supervised Evaluation for SuperLLM via Mutual Consistency | Peiwen Yuan et.al. | 2408.13738 | null |
2024-08-25 | LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models | Aoxiao Zhong et.al. | 2408.13727 | null |
2024-08-24 | Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models | Jinyang Wu et.al. | 2408.13533 | null |
2024-08-27 | Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Hourui Deng et.al. | 2408.13184 | null |
2024-08-23 | IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models | Zhihao Yu et.al. | 2408.13073 | link |
2024-08-23 | Internal and External Knowledge Interactive Refinement Framework for Knowledge-Intensive Question Answering | Haowei Du et.al. | 2408.12979 | null |
2024-08-22 | SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection | Mengya Hu et.al. | 2408.12748 | link |
2024-08-22 | Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning | Mushui Liu et.al. | 2408.12469 | null |
2024-08-22 | A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation | Weijia Zhang et.al. | 2408.12398 | null |
2024-09-04 | Graph Retrieval Augmented Trustworthiness Reasoning | Ying Zhu et.al. | 2408.12333 | link |
2024-08-22 | Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models | Meiyun Wang et.al. | 2408.12326 | link |
2024-08-22 | Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators | Dingkang Yang et.al. | 2408.12325 | link |
2024-08-22 | MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient | Yanzeng Li et.al. | 2408.12236 | null |
2024-08-22 | FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation | KaShun Shum et.al. | 2408.12168 | link |
2024-08-22 | ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM | Zhaochen Su et.al. | 2408.12076 | link |
2024-08-21 | Understanding Epistemic Language with a Bayesian Theory of Mind | Lance Ying et.al. | 2408.12022 | null |
2024-08-21 | RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization | Jinhu Qi et.al. | 2408.12003 | null |
2024-08-21 | Automatic knowledge-graph creation from historical documents: The Chilean dictatorship as a case study | Camila Díaz et.al. | 2408.11975 | null |
2024-08-23 | Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy | Priyanka Mandikal et.al. | 2408.11903 | link |
2024-08-17 | How Susceptible are LLMs to Influence in Prompts? | Sotiris Anagnostidis et.al. | 2408.11865 | null |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
2024-08-21 | EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning | Zhihao Li et.al. | 2408.11397 | null |
2024-08-21 | First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models | Chi Ma et.al. | 2408.11393 | null |
2024-08-21 | RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation | Xuanwang Zhang et.al. | 2408.11381 | link |
2024-08-20 | A Little Confidence Goes a Long Way | John Scoville et.al. | 2408.11239 | null |
2024-08-20 | Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model | Chenhan Yuan et.al. | 2408.10764 | null |
2024-08-20 | Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models | Artem Vazhentsev et.al. | 2408.10692 | null |
2024-08-20 | Analysis of Plan-based Retrieval for Grounded Text Generation | Ameya Godbole et.al. | 2408.10490 | null |
2024-08-20 | LeCov: Multi-level Testing Criteria for Large Language Models | Xuan Xie et.al. | 2408.10474 | null |
2024-08-19 | Enhanced document retrieval with topic embeddings | Kavsar Huseynova et.al. | 2408.10435 | null |
2024-08-19 | LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain | Nicholas Pipitone et.al. | 2408.10343 | link |
2024-08-19 | Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Tianyu Zhang et.al. | 2408.10124 | link |
2024-08-19 | MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation | Ching-Wen Yang et.al. | 2408.09865 | null |
2024-08-19 | Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? | Shiyu Ni et.al. | 2408.09773 | null |
2024-08-19 | A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification | Claudio M. V. de Andrade et.al. | 2408.09629 | null |
2024-08-17 | TC-RAG: Turing-Complete RAG’s Case study on Medical LLM Systems | Xinke Jiang et.al. | 2408.09199 | link |
2024-08-17 | Chinese Metaphor Recognition Using a Multi-stage Prompting Large Language Model | Jie Wang et.al. | 2408.09177 | null |
2024-08-17 | Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making | Siyu Wu et.al. | 2408.09176 | null |
2024-08-24 | Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection | Hsiu-Yuan Huang et.al. | 2408.09172 | null |
2024-08-15 | Graph Retrieval-Augmented Generation: A Survey | Boci Peng et.al. | 2408.08921 | link |
2024-08-12 | Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | Chengyu Song et.al. | 2408.08902 | null |
2024-08-22 | Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Chenming Tang et.al. | 2408.08780 | null |
2024-08-16 | Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Dingwei Chen et.al. | 2408.08769 | null |
2024-08-16 | MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector | Wenjie Fu et.al. | 2408.08661 | link |
2024-08-16 | PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code | Ziyou Jiang et.al. | 2408.08619 | null |
2024-08-16 | SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models | Kaushal Kumar Maurya et.al. | 2408.08545 | null |
2024-08-15 | Plan with Code: Comparing approaches for robust NL to DSL generation | Nastaran Bassamzadeh et.al. | 2408.08335 | null |
2024-08-14 | CodeMirage: Hallucinations in Code Generated by Large Language Models | Vibhor Agarwal et.al. | 2408.08333 | null |
2024-08-16 | Covert Bias: The Severity of Social Views’ Unalignment in Language Models Towards Implicit and Explicit Opinion | Abeer Aldayel et.al. | 2408.08212 | null |
2024-08-15 | LLM4DSR: Leveraging Large Language Model for Denoising Sequential Recommendation | Bohao Wang et.al. | 2408.08208 | null |
2024-08-15 | Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy | Shaojun Xu et.al. | 2408.08188 | null |
2024-08-15 | Confidence-weighted integration of human and machine judgments for superior decision-making | Felipe Yáñez et.al. | 2408.08083 | link |
2024-08-15 | LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Jiajie Li et.al. | 2408.07981 | null |
2024-08-14 | Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Yuxin Jiang et.al. | 2408.07471 | link |
2024-08-13 | MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Yongjin Yang et.al. | 2408.06816 | link |
2024-08-12 | A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | Sampath Rajapaksha et.al. | 2408.06272 | null |
2024-08-12 | On Effects of Steering Latent Representation for Large Language Model Unlearning | Dang Huu-Tien et.al. | 2408.06223 | link |
2024-08-11 | Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models | Wenbo Zhang et.al. | 2408.05873 | link |
2024-08-10 | Can LLMs Replace Manual Annotation of Software Engineering Artifacts? | Toufique Ahmed et.al. | 2408.05534 | null |
2024-08-19 | SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning | Yuze Zhao et.al. | 2408.05517 | link |
2024-08-09 | FiST-Financial Style Transfer with Hallucination and Creativity Control Framework | Sohini Roychowdhury et.al. | 2408.05365 | null |
2024-08-09 | A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning | Ye Yuan et.al. | 2408.05141 | null |
2024-08-16 | Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models | Zikai Xie et.al. | 2408.05093 | link |
2024-08-08 | Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews | Samantha Chan et.al. | 2408.04681 | link |
2024-08-06 | Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD) | Avshalom Manevich et.al. | 2408.04664 | null |
2024-08-08 | Arctic-TILT. Business Document Understanding at Sub-Billion Scale | Łukasz Borchmann et.al. | 2408.04632 | null |
2024-08-08 | Learning Fine-Grained Grounded Citations for Attributed Large Language Models | Lei Huang et.al. | 2408.04568 | link |
2024-08-20 | Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | Yiqun Zhang et.al. | 2408.04472 | link |
2024-08-07 | Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology | Panagiotis Fytas et.al. | 2408.04121 | null |
2024-08-07 | Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks | Zizhang Chen et.al. | 2408.03732 | null |
2024-08-19 | KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models | Ruizhe Zhang et.al. | 2408.03297 | null |
2024-08-05 | An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs | Dongming Jin et.al. | 2408.02450 | null |
2024-08-05 | SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models | Shujuan Zhao et.al. | 2408.02302 | null |
2024-08-07 | SpecRover: Code Intent Extraction via LLMs | Haifeng Ruan et.al. | 2408.02232 | null |
2024-08-05 | ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning | Yuxuan Wang et.al. | 2408.02210 | null |
2024-08-04 | Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process | Peng Wang et.al. | 2408.02103 | null |
2024-08-04 | Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference | Ke Shen et.al. | 2408.01935 | null |
2024-08-03 | TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation | Xingpeng Sun et.al. | 2408.01867 | null |
2024-08-03 | WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization | Liwenhan Xie et.al. | 2408.01703 | null |
2024-08-02 | Analyzing LLMs’ Capabilities to Establish Implicit User Sentiment of Software Desirability | Sherri Weitl-Harms et.al. | 2408.01527 | null |
2024-07-28 | Faculty Perspectives on the Potential of RAG in Computer Science Higher Education | Sagnik Dakshit et.al. | 2408.01462 | null |
2024-08-18 | RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Kunlun Zhu et.al. | 2408.01262 | link |
2024-08-02 | Misinforming LLMs: vulnerabilities, challenges and opportunities | Bo Zhou et.al. | 2408.01168 | null |
2024-08-01 | Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection | Steven Fincke et.al. | 2408.00914 | null |
2024-07-26 | ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model | Ning Xu et.al. | 2408.00804 | null |
2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | link |
2024-08-01 | Future of Artificial Intelligence in Agile Software Development | Mariyam Mahboob et.al. | 2408.00703 | null |
2024-07-25 | Closing the gap between open-source and commercial large language models for medical evidence summarization | Gongbo Zhang et.al. | 2408.00588 | null |
2024-08-01 | Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | Xiaoye Qu et.al. | 2408.00555 | null |
2024-08-01 | Jailbreaking Text-to-Image Models with LLM-Based Agents | Yingkai Dong et.al. | 2408.00523 | null |
2024-08-01 | DeliLaw: A Chinese Legal Counselling System Based on a Large Language Model | Nan Xie et.al. | 2408.00357 | null |
2024-07-31 | Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation | Valdemar Danry et.al. | 2408.00024 | null |
2024-07-30 | WebApp1K: A Practical Code-Generation Benchmark for Web App Development | Yi Cui et.al. | 2408.00019 | link |
2024-07-31 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Shi Liu et.al. | 2407.21771 | null |
2024-07-31 | Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency | Taiji Li et.al. | 2407.21443 | null |
2024-08-09 | Cost-Effective Hallucination Detection for LLMs | Simon Valentin et.al. | 2407.21424 | null |
2024-07-31 | Towards interfacing large language models with ASR systems using confidence measures and prompting | Maryam Naderi et.al. | 2407.21414 | null |
2024-07-31 | Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs | Elan Markowitz et.al. | 2407.21358 | link |
2024-07-30 | Accelerating Large Language Model Inference with Self-Supervised Early Exits | Florian Valade et.al. | 2407.21082 | null |
2024-07-25 | Multi-group Uncertainty Quantification for Long-form Text Generation | Terrance Liu et.al. | 2407.21057 | null |
2024-07-24 | Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications | Cui Long et.al. | 2407.21055 | null |
2024-07-30 | Automated Review Generation Method Based on Large Language Models | Shican Wu et.al. | 2407.20906 | link |
2024-07-30 | How to Measure the Intelligence of Large Language Models? | Nils Körber et.al. | 2407.20828 | null |
2024-07-30 | Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian | Serena Auriemma et.al. | 2407.20654 | null |
2024-07-25 | An Efficient Inference Framework for Early-exit Large Language Models | Ruijie Miao et.al. | 2407.20272 | null |
2024-07-17 | Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies | Lachlan McGinness et.al. | 2407.20244 | null |
2024-08-02 | Improving Retrieval Augmented Language Model with Self-Reasoning | Yuan Xia et.al. | 2407.19813 | null |
2024-07-29 | SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages | Wenxuan Zhang et.al. | 2407.19672 | link |
2024-07-27 | Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review | Tongyue Shi et.al. | 2407.19256 | null |
2024-07-26 | OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation | Zilong Wang et.al. | 2407.19056 | link |
2024-08-08 | Know Your Limits: A Survey of Abstention in Large Language Models | Bingbing Wen et.al. | 2407.18418 | null |
2024-07-25 | Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement | Jaehun Jung et.al. | 2407.18370 | null |
2024-07-25 | The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | Eric Yang et.al. | 2407.18044 | null |
2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
2024-07-24 | ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering | Xiuying Chen et.al. | 2407.16931 | null |
2024-07-23 | Generation Constraint Scaling Can Mitigate Hallucination | Georgios Kollias et.al. | 2407.16908 | null |
2024-07-23 | TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class | Anishka IIITD et.al. | 2407.16805 | link |
2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | link |
2024-07-25 | Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models | Kenza Benkirane et.al. | 2407.16470 | link |
2024-07-23 | Enhancing LLM’s Cognition via Structurization | Kai Liu et.al. | 2407.16434 | link |
2024-07-23 | LawLuo: A Chinese Law Firm Co-run by LLM Agents | Jingyun Sun et.al. | 2407.16252 | link |
2024-07-23 | Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | Nishanth Madhusudhan et.al. | 2407.16221 | null |
2024-07-22 | Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned | Song Wang et.al. | 2407.15441 | null |
2024-07-22 | MAVEN-Fact: A Large-scale Event Factuality Detection Dataset | Chunyang Li et.al. | 2407.15352 | link |
2024-07-20 | Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models | Ze Yu Zhang et.al. | 2407.14845 | null |
2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507 | link |
2024-07-19 | Prompted Aspect Key Point Analysis for Quantitative Review Summarization | An Quang Tang et.al. | 2407.14049 | link |
2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | link |
2024-08-01 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692 | null |
2024-07-18 | BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | Moon Ye-Bin et.al. | 2407.13442 | null |
2024-07-18 | CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | Junying Chen et.al. | 2407.13301 | link |
2024-07-19 | AI-Assisted SQL Authoring at Industry Scale | Chandra Maddila et.al. | 2407.13280 | null |
2024-07-19 | Retrieval-Augmented Generation for Natural Language Processing: A Survey | Shangyu Wu et.al. | 2407.13193 | null |
2024-07-18 | Translate-and-Revise: Boosting Large Language Models for Constrained Translation | Pengcheng Huang et.al. | 2407.13164 | null |
2024-07-17 | Halu-J: Critique-Based Hallucination Judge | Binjie Wang et.al. | 2407.12943 | link |
2024-08-01 | Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild | Nicolas Richet et.al. | 2407.12927 | link |
2024-07-17 | Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models | Alexander R. Pelletier et.al. | 2407.12888 | link |
2024-07-17 | LLM-based query paraphrasing for video search | Jiaxin Wu et.al. | 2407.12341 | null |
2024-07-17 | Optimizing Query Generation for Enhanced Document Retrieval in RAG | Hamin Koo et.al. | 2407.12325 | null |
2024-07-11 | NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2 | Tengfei Xue et.al. | 2407.12057 | null |
2024-07-16 | What’s Wrong? Refining Meeting Summaries with LLM Feedback | Frederic Kirstein et.al. | 2407.11919 | null |
2024-07-16 | LoFTI: Localization and Factuality Transfer to Indian Locales | Sona Elza Simon et.al. | 2407.11833 | link |
2024-07-16 | A Framework for Evaluating Appropriateness, Trustworthiness, and Safety in Mental Wellness AI Chatbots | Lucia Chen et.al. | 2407.11387 | null |
2024-07-19 | Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Qingcheng Zeng et.al. | 2407.11282 | link |
2024-07-15 | AstroMLab 1: Who Wins Astronomy Jeopardy!? | Yuan-Sen Ting et.al. | 2407.11194 | null |
2024-07-15 | Inertial Confinement Fusion Forecasting via LLMs | Mingkai Chen et.al. | 2407.11098 | null |
2024-07-15 | Leveraging LLM-Respondents for Item Evaluation: a Psychometric Analysis | Yunting Liu et.al. | 2407.10899 | null |
2024-07-24 | MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs | Quang H. Nguyen et.al. | 2407.10834 | link |
2024-07-15 | Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval | Shengjie Ma et.al. | 2407.10805 | link |
2024-07-15 | GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework | Hannah Sansford et.al. | 2407.10793 | null |
2024-07-15 | CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses | Jing Yao et.al. | 2407.10725 | null |
2024-07-15 | Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews | Lucas Joos et.al. | 2407.10652 | null |
2024-07-14 | GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? | Barah Fazili et.al. | 2407.10245 | null |
2024-07-14 | Look Within, Why LLMs Hallucinate: A Causal Perspective | He Li et.al. | 2407.10153 | null |
2024-07-13 | Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | KuanChao Chu et.al. | 2407.09897 | null |
2024-07-13 | Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | Shengbin Yue et.al. | 2407.09893 | link |
2024-07-13 | On Mitigating Code LLM Hallucinations with API Documentation | Nihal Jain et.al. | 2407.09726 | null |
2024-07-22 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su et.al. | 2407.09417 | link |
2024-07-12 | PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Saber Zerhoudi et.al. | 2407.09394 | link |
2024-07-12 | DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection | Sangpil Youm et.al. | 2407.09283 | null |
2024-07-12 | The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs | Anh Thu Maria Bui et.al. | 2407.09152 | null |
2024-07-12 | Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Nico Daheim et.al. | 2407.09136 | link |
2024-07-12 | Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations | David N. Palacio et.al. | 2407.08983 | null |
2024-07-15 | Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation | Biqing Qi et.al. | 2407.08940 | link |
2024-07-12 | Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Yingming Pu et.al. | 2407.08922 | link |
2024-07-11 | Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jennifer Healey et.al. | 2407.08842 | null |
2024-07-11 | Proving that Cryptic Crossword Clue Answers are Correct | Martin Andrews et.al. | 2407.08824 | link |
2024-07-11 | Uncertainty Estimation of Large Language Models in Medical Question Answering | Jiaxin Wu et.al. | 2407.08662 | null |
2024-07-11 | $β$-DPO: Direct Preference Optimization with Dynamic $β$ | Junkang Wu et.al. | 2407.08639 | link |
2024-07-11 | On the Universal Truthfulness Hyperplane Inside LLMs | Junteng Liu et.al. | 2407.08582 | link |
2024-07-22 | Lynx: An Open Source Hallucination Evaluation Model | Selvan Sunitha Ravi et.al. | 2407.08488 | null |
2024-07-11 | On the attribution of confidence to large language models | Geoff Keeling et.al. | 2407.08388 | null |