2025-07-23 |
BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems |
Malsha Ashani Mahawatta Dona et.al. |
2507.17722 |
null |
2025-07-23 |
Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks |
Ilias Chatzistefanidis et.al. |
2507.17695 |
null |
2025-07-23 |
An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models |
Haoran Sun et.al. |
2507.17477 |
null |
2025-07-23 |
Each to Their Own: Exploring the Optimal Embedding in RAG |
Shiting Chen et.al. |
2507.17442 |
null |
2025-07-23 |
R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning |
Zhuokun Chen et.al. |
2507.17307 |
null |
2025-07-23 |
HypoChainer: A Collaborative System Combining LLMs and Knowledge Graphs for Hypothesis-Driven Scientific Discovery |
Haoran Jiang et.al. |
2507.17209 |
null |
2025-07-23 |
SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs |
Zhiqiang Liu et.al. |
2507.17178 |
null |
2025-07-23 |
Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination |
Mariam ALMutairi et.al. |
2507.17134 |
null |
2025-07-23 |
Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance |
Yufei He et.al. |
2507.17131 |
null |
2025-07-22 |
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems |
Chengxuan Xia et.al. |
2507.17061 |
null |
2025-07-22 |
Harnessing RLHF for Robust Unanswerability Recognition and Trustworthy Response Generation in LLMs |
Shuyuan Lin et.al. |
2507.16951 |
null |
2025-07-22 |
CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage |
Na Li et.al. |
2507.16872 |
null |
2025-07-22 |
Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support |
Fangjian Lei et.al. |
2507.16754 |
null |
2025-07-23 |
Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints |
Zhenyun Yin et.al. |
2507.16727 |
null |
2025-07-22 |
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs |
Zhenliang Zhang et.al. |
2507.16488 |
null |
2025-07-22 |
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework |
Hongyi Tang et.al. |
2507.16414 |
null |
2025-07-23 |
WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking |
Zipeng Ling et.al. |
2507.16199 |
null |
2025-07-21 |
Efficient Compositional Multi-tasking for On-device Large Language Models |
Ondrej Bohdal et.al. |
2507.16083 |
null |
2025-07-21 |
Towards Mitigation of Hallucination for LLM-empowered Agents: Progressive Generalization Bound Exploration and Watchdog Monitor |
Siyuan Liu et.al. |
2507.15903 |
null |
2025-07-21 |
Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks |
Hope Schroeder et.al. |
2507.15821 |
null |
2025-07-21 |
LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra |
Seth Karten et.al. |
2507.15815 |
null |
2025-07-21 |
Interleaved LLM and Motion Planning for Generalized Multi-Object Collection in Large Scene Graphs |
Ruochu Yang et.al. |
2507.15782 |
null |
2025-07-21 |
On the Inevitability of Left-Leaning Political Bias in Aligned Language Models |
Thilo Hagendorff et.al. |
2507.15328 |
null |
2025-07-21 |
Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems |
Qian Xiong et.al. |
2507.15296 |
null |
2025-07-20 |
MUR: Momentum Uncertainty guided Reasoning for Large Language Models |
Hang Yan et.al. |
2507.14958 |
null |
2025-07-20 |
Byzantine-Robust Decentralized Coordination of LLM Agents |
Yongrae Jo et.al. |
2507.14928 |
null |
2025-07-20 |
InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis |
Jiale Liu et.al. |
2507.14899 |
null |
2025-07-19 |
Large Language Models as Medical Codes Selectors: a benchmark using the International Classification of Primary Care |
Vinicius Anjos de Almeida et.al. |
2507.14681 |
null |
2025-07-19 |
Cleanse: Uncertainty Estimation Approach Using Clustering-based Semantic Consistency in LLMs |
Minsuh Joo et.al. |
2507.14649 |
null |
2025-07-18 |
Fail Fast, or Ask: Mitigating the Deficiencies of Reasoning LLMs with Human-in-the-Loop Systems Engineering |
Michael J. Zellinger et.al. |
2507.14406 |
null |
2025-07-18 |
DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation |
Ziqi Wang et.al. |
2507.14267 |
null |
2025-07-14 |
DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base |
Song Mao et.al. |
2507.14189 |
null |
2025-07-18 |
Architecting Human-AI Cocreation for Technical Services – Interaction Modes and Contingency Factors |
Jochen Wulf et.al. |
2507.14034 |
null |
2025-07-18 |
Preprint: Did I Just Browse A Website Written by LLMs? |
Sichang “Steven” He et.al. |
2507.13933 |
null |
2025-07-18 |
RAG-based Architectures for Drug Side Effect Retrieval in LLMs |
Shad Nygren et.al. |
2507.13822 |
null |
2025-07-17 |
GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models |
Eduardo C. Garrido-Merchán et.al. |
2507.13550 |
null |
2025-07-17 |
Aligning Knowledge Graphs and Language Models for Factual Accuracy |
Nur A Zarin Nishat et.al. |
2507.13411 |
null |
2025-07-17 |
DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning |
Rahel Rickenbach et.al. |
2507.12855 |
null |
2025-07-17 |
Bridging the Gap: Leveraging Retrieval-Augmented Generation to Better Understand Public Concerns about Vaccines |
Muhammad Javed et.al. |
2507.12840 |
null |
2025-07-16 |
LLM-Based Config Synthesis requires Disambiguation |
Rajdeep Mondal et.al. |
2507.12443 |
null |
2025-07-16 |
From Static to Intelligent: Evolving SaaS Pricing with LLMs |
Francisco Javier Cavero et.al. |
2507.12104 |
null |
2025-07-16 |
Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning |
Tosin Adewumi et.al. |
2507.12079 |
null |
2025-07-16 |
PoTPTQ: A Two-step Power-of-Two Post-training for LLMs |
Xinyu Wang et.al. |
2507.11959 |
null |
2025-07-15 |
CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks |
Meng Li et.al. |
2507.11742 |
null |
2025-07-15 |
LLM-based ambiguity detection in natural language instructions for collaborative surgical robots |
Ana Davila et.al. |
2507.11525 |
null |
2025-07-15 |
Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces |
Yunhao Yang et.al. |
2507.11352 |
null |
2025-07-15 |
Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems |
Dany Moshkovich et.al. |
2507.11277 |
null |
2025-07-15 |
An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling |
Anh Nguyen-Duc et.al. |
2507.11272 |
null |
2025-07-15 |
An Agentic Flow for Finite State Machine Extraction using Prompt Chaining |
Fares Wael et.al. |
2507.11222 |
null |
2025-07-15 |
Mixture of Experts in Large Language Models |
Danyang Zhang et.al. |
2507.11181 |
null |
2025-07-15 |
What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests |
Dimitri Staufer et.al. |
2507.11128 |
null |
2025-07-15 |
LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP |
Haowei Yang et.al. |
2507.11052 |
null |
2025-07-15 |
Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment |
Adam Yang et.al. |
2507.11042 |
null |
2025-07-15 |
First-Order Error Matters: Accurate Compensation for Quantized Large Language Models |
Xingyu Zheng et.al. |
2507.11017 |
null |
2025-07-14 |
Enhancing the Capabilities of Large Language Models for API calls through Knowledge Graphs |
Ye Yang et.al. |
2507.10630 |
null |
2025-07-16 |
GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning |
Ziru Liu et.al. |
2507.10628 |
null |
2025-07-11 |
Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing |
Dennis Ulmer et.al. |
2507.10587 |
null |
2025-07-11 |
AutoRAG-LoRA: Hallucination-Triggered Knowledge Retuning via Lightweight Adapters |
Kaushik Dwivedi et.al. |
2507.10586 |
null |
2025-07-14 |
Referential ambiguity and clarification requests: comparing human and LLM behaviour |
Chris Madge et.al. |
2507.10445 |
null |
2025-07-14 |
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs |
Jiahe Zhao et.al. |
2507.10302 |
null |
2025-07-14 |
The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents |
Lixu Wang et.al. |
2507.10016 |
null |
2025-07-14 |
Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning |
Zijun Chen et.al. |
2507.10007 |
null |
2025-07-13 |
Prompting for Performance: Exploring LLMs for Configuring Software |
Helge Spieker et.al. |
2507.09790 |
null |
2025-07-16 |
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs |
Yangning Li et.al. |
2507.09477 |
null |
2025-07-12 |
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing |
Quanyan Zhu et.al. |
2507.09407 |
null |
2025-07-22 |
Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models |
Anita Kriz et.al. |
2507.09279 |
null |
2025-07-12 |
StockSim: A Dual-Mode Order-Level Simulator for Evaluating Multi-Agent LLMs in Financial Markets |
Charidimos Papadakis et.al. |
2507.09255 |
null |
2025-07-12 |
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models |
Ameen Ali et.al. |
2507.09185 |
null |
2025-07-12 |
Position Paper: Programming Language Techniques for Bridging LLM Code Generation Semantic Gaps |
Yalong Du et.al. |
2507.09135 |
null |
2025-07-11 |
SetupBench: Assessing Software Engineering Agents’ Ability to Bootstrap Development Environments |
Avi Arora et.al. |
2507.09063 |
null |
2025-07-11 |
GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval |
Savini Kashmira et.al. |
2507.08945 |
null |
2025-07-09 |
RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation |
Tianzhe Zhao et.al. |
2507.08862 |
null |
2025-07-11 |
Using Large Language Models for Legal Decision-Making in Austrian Value-Added Tax Law: An Experimental Study |
Marina Luketina et.al. |
2507.08468 |
null |
2025-07-10 |
TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs |
Duygu Nur Yaldiz et.al. |
2507.08203 |
null |
2025-07-10 |
CTRLS: Chain-of-Thought Reasoning via Latent State-Transition |
Junda Wu et.al. |
2507.08182 |
null |
2025-07-10 |
Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores |
Vivek Chari et.al. |
2507.08143 |
null |
2025-07-10 |
TableReasoner: Advancing Table Reasoning Framework with Large Language Models |
Sishi Xiong et.al. |
2507.08046 |
null |
2025-07-09 |
Integrating External Tools with Large Language Models to Improve Accuracy |
Nripesh Niketan et.al. |
2507.08034 |
null |
2025-07-10 |
Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models |
Chen Feng et.al. |
2507.07877 |
null |
2025-07-10 |
DocCHA: Towards LLM-Augmented Interactive Online diagnosis System |
Xinyi Liu et.al. |
2507.07870 |
null |
2025-07-10 |
From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems |
Youngjoon Jang et.al. |
2507.07847 |
null |
2025-07-10 |
When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance |
Peizhang Shao et.al. |
2507.07748 |
null |
2025-07-10 |
Prompt Engineering for Requirements Engineering: A Literature Review and Roadmap |
Kaicheng Huang et.al. |
2507.07682 |
null |
2025-07-15 |
Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models |
Varin Sikka et.al. |
2507.07505 |
null |
2025-07-10 |
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models |
Kaiqu Liang et.al. |
2507.07484 |
null |
2025-07-09 |
Bridging the Plausibility-Validity Gap by Fine-Tuning a Reasoning-Enhanced LLM for Chemical Synthesis and Discovery |
Malikussaid et.al. |
2507.07328 |
null |
2025-07-09 |
An Information-Theoretic Perspective on Multi-LLM Uncertainty Estimation |
Maya Kruse et.al. |
2507.07236 |
null |
2025-07-09 |
Evaluating Retrieval-Augmented Generation Agents for Autonomous Scientific Discovery in Astrophysics |
Xueqing Xu et.al. |
2507.07155 |
null |
2025-07-07 |
DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning |
Shreyas Vinaya Sathyanarayana et.al. |
2507.07060 |
null |
2025-07-09 |
5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage |
Ugur Ari et.al. |
2507.07045 |
null |
2025-07-09 |
First Return, Entropy-Eliciting Explore |
Tianyu Zheng et.al. |
2507.07017 |
null |
2025-07-09 |
Investigating the Robustness of Retrieval-Augmented Generation at the Query Level |
Sezen Perçin et.al. |
2507.06956 |
null |
2025-07-09 |
On the Effect of Uncertainty on Layer-wise Inference Dynamics |
Sunwoo Kim et.al. |
2507.06722 |
null |
2025-07-10 |
The Flaws of Others: An LLM-driven Framework for Scientific Knowledge Production |
Juan B. Gutiérrez et.al. |
2507.06565 |
null |
2025-07-09 |
On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks |
Stephen Obadinma et.al. |
2507.06489 |
null |
2025-07-08 |
Humans overrely on overconfident language models, across languages |
Neil Rathi et.al. |
2507.06306 |
null |
2025-07-08 |
Differential Mamba |
Nadav Schneider et.al. |
2507.06204 |
null |
2025-07-08 |
UQLM: A Python Package for Uncertainty Quantification in Large Language Models |
Dylan Bouchard et.al. |
2507.06196 |
null |
2025-07-08 |
KERAG_R: Knowledge-Enhanced Retrieval-Augmented Generation for Recommendation |
Zeyuan Meng et.al. |
2507.05863 |
null |
2025-07-08 |
Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik’s Cube |
Chongshan Fan et.al. |
2507.05607 |
null |
2025-07-07 |
“Lost-in-the-Later”: Framework for Quantifying Contextual Grounding in Large Language Models |
Yufei Tao et.al. |
2507.05424 |
null |
2025-07-07 |
On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study |
Riccardo Alberghi et.al. |
2507.05362 |
null |
2025-07-07 |
LCDS: A Logic-Controlled Discharge Summary Generation System Supporting Source Attribution and Expert Review |
Cheng Yuan et.al. |
2507.05319 |
null |
2025-07-04 |
ReservoirChat: Interactive Documentation Enhanced with LLM and Knowledge Graph for ReservoirPy |
Virgile Boraud et.al. |
2507.05279 |
null |
2025-07-07 |
CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale |
Jonathan Hyun et.al. |
2507.05178 |
null |
2025-07-07 |
What Shapes User Trust in ChatGPT? A Mixed-Methods Study of User Attributes, Trust Dimensions, Task Context, and Societal Perceptions among University Students |
Kadija Bouyzourn et.al. |
2507.05046 |
null |
2025-07-07 |
MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction |
Kaleem Ullah Qasim et.al. |
2507.04893 |
null |
2025-07-07 |
Knowledge-Aware Self-Correction in Language Models via Structured Memory Graphs |
Swayamjit Saha et.al. |
2507.04625 |
null |
2025-07-07 |
any4: Learned 4-bit Numeric Representation for LLMs |
Mostafa Elhoushi et.al. |
2507.04610 |
null |
2025-07-06 |
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation |
Zhen Xiong et.al. |
2507.04504 |
null |
2025-07-06 |
The role of large language models in UI/UX design: A systematic literature review |
Ammar Ahmed et.al. |
2507.04469 |
null |
2025-07-06 |
Data Discovery using LLMs – A Study of Data User Behaviour |
Christin Katharina Kreutz et.al. |
2507.04444 |
null |
2025-07-06 |
Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models |
Pranta Saha et.al. |
2507.04432 |
null |
2025-07-06 |
AutoLayout: Closed-Loop Layout Synthesis via Slow-Fast Collaborative Reasoning |
Weixing Chen et.al. |
2507.04293 |
null |
2025-07-10 |
DMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth |
Zheng Lian et.al. |
2507.04278 |
null |
2025-07-05 |
SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding |
Runcong Zhao et.al. |
2507.04189 |
null |
2025-07-05 |
Token Level Hallucination Detection via Variance in Language Models |
Keshav Kumar et.al. |
2507.04137 |
null |
2025-07-05 |
Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing |
Jinwei Hu et.al. |
2507.04105 |
null |
2025-07-05 |
Toward Better Generalisation in Uncertainty Estimators: Leveraging Data-Agnostic Features |
Thuy An Ha et.al. |
2507.03998 |
null |
2025-07-05 |
CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate |
Yiliu Sun et.al. |
2507.03928 |
null |
2025-07-05 |
KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis |
Reilly Haskins et.al. |
2507.03847 |
null |
2025-07-09 |
Skewed Score: A statistical framework to assess autograders |
Magda Dubois et.al. |
2507.03772 |
null |
2025-07-04 |
Roadmap for using large language models (LLMs) to accelerate cross-disciplinary research with an example from computational biology |
Ruian Ke et.al. |
2507.03722 |
null |
2025-07-04 |
Is It Time To Treat Prompts As Code? A Multi-Use Case Study For Prompt Optimization Using DSPy |
Francisca Lemos et.al. |
2507.03620 |
null |
2025-07-04 |
REAL: Benchmarking Abilities of Large Language Models for Housing Transactions and Services |
Kexin Zhu et.al. |
2507.03477 |
null |
2025-07-04 |
Conformal Information Pursuit for Interactively Guiding Large Language Models |
Kwan Ho Ryan Chan et.al. |
2507.03279 |
null |
2025-07-04 |
KinyaColBERT: A Lexically Grounded Retrieval Model for Low-Resource Retrieval-Augmented Generation |
Antoine Nzeyimana et.al. |
2507.03241 |
null |
2025-07-03 |
How Much Content Do LLMs Generate That Induces Cognitive Bias in Users? |
Abeer Alessa et.al. |
2507.03194 |
null |
2025-07-03 |
How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models |
Dharshan Kumaran et.al. |
2507.03120 |
null |
2025-07-03 |
Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case |
Alvaro Riquelme et.al. |
2507.03067 |
null |
2025-07-03 |
Cautious Next Token Prediction |
Yizhou Wang et.al. |
2507.03038 |
null |
2025-07-03 |
Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation |
Johnson Thomas et.al. |
2507.03033 |
null |
2025-07-01 |
GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models |
Seshu Tirupathi et.al. |
2507.02986 |
null |
2025-07-06 |
KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs |
Yuzhang Xie et.al. |
2507.02773 |
null |
2025-07-03 |
Who’s Sorry Now: User Preferences Among Rote, Empathic, and Explanatory Apologies from LLM Chatbots |
Zahra Ashktorab et.al. |
2507.02745 |
null |
2025-07-03 |
Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory |
Kenneth Payne et.al. |
2507.02618 |
null |
2025-07-03 |
MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion |
Xin Guan et.al. |
2507.02595 |
null |
2025-07-03 |
WebSailor: Navigating Super-human Reasoning for Web Agent |
Kuan Li et.al. |
2507.02592 |
null |
2025-07-04 |
Introducing a New Brexit-Related Uncertainty Index: Its Evolution and Economic Consequences |
Ismet Gocer et.al. |
2507.02439 |
null |
2025-07-03 |
Uncertainty-aware Reward Design Process |
Yang Yang et.al. |
2507.02256 |
null |
2025-07-03 |
DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs |
Mohammad Akyash et.al. |
2507.02226 |
null |
2025-07-02 |
The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems |
Reza Yousefi Maragheh et.al. |
2507.02097 |
null |
2025-07-02 |
Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs |
Mohammad Ali Alomrani et.al. |
2507.02076 |
null |
2025-07-02 |
SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars |
Xiaosheng Zhao et.al. |
2507.01939 |
null |
2025-07-02 |
High-Layer Attention Pruning with Rescaling |
Songtao Liu et.al. |
2507.01900 |
null |
2025-07-02 |
Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks |
Hanlin Cai et.al. |
2507.01694 |
null |
2025-07-02 |
Efficient Out-of-Scope Detection in Dialogue Systems via Uncertainty-Driven LLM Routing |
Álvaro Zaera et.al. |
2507.01541 |
null |
2025-07-02 |
Using multi-agent architecture to mitigate the risk of LLM hallucinations |
Abd Elrahman Amer et.al. |
2507.01446 |
null |
2025-07-07 |
Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading |
Yoonseok Yang et.al. |
2507.01431 |
null |
2025-07-02 |
Penalizing Transparency? How AI Disclosure and Author Demographics Shape Human and AI Judgments About Writing |
Inyoung Cheong et.al. |
2507.01418 |
null |
2025-07-02 |
ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks |
Zhiyao Ren et.al. |
2507.01321 |
null |
2025-07-02 |
Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care |
Matthew JY Kang et.al. |
2507.01282 |
null |
2025-07-01 |
Good Enough to Learn: LLM-based Anomaly Detection in ECU Logs without Reliable Labels |
Bogdan Bogdan et.al. |
2507.01077 |
null |
2025-07-01 |
On the Surprising Efficacy of LLMs for Penetration-Testing |
Andreas Happe et.al. |
2507.00829 |
null |
2025-07-01 |
Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding |
Guangyi Zhang et.al. |
2507.00605 |
null |
2025-07-01 |
TUM-MiKaNi at SemEval-2025 Task 3: Towards Multilingual and Knowledge-Aware Non-factual Hallucination Identification |
Miriam Anschütz et.al. |
2507.00579 |
null |
2025-07-01 |
Reliable Annotations with Less Effort: Evaluating LLM-Human Collaboration in Search Clarifications |
Leila Tavakoli et.al. |
2507.00543 |
null |
2025-06-30 |
Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission |
Faranaksadat Solat et.al. |
2507.00082 |
null |
2025-06-26 |
Estimating Correctness Without Oracles in LLM-Based Code Generation |
Thomas Valentin et.al. |
2507.00057 |
null |
2025-06-25 |
VSF-Med: A Vulnerability Scoring Framework for Medical Vision-Language Models |
Binesh Sadanandan et.al. |
2507.00052 |
null |
2025-06-30 |
Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice |
Akshit Kumar et.al. |
2506.23924 |
null |
2025-06-30 |
Large Language Models for Statistical Inference: Context Augmentation with Applications to the Two-Sample Problem and Regression |
Marc Ratkovic et.al. |
2506.23862 |
null |
2025-06-30 |
Leveraging a Multi-Agent LLM-Based System to Educate Teachers in Hate Incidents Management |
Ewelina Gajewska et.al. |
2506.23774 |
null |
2025-06-30 |
The Confidence Paradox: Can LLM Know When It’s Wrong |
Sahil Tripathi et.al. |
2506.23464 |
null |
2025-06-29 |
Do LLMs Dream of Discrete Algorithms? |
Claudionor Coelho Jr et.al. |
2506.23408 |
null |
2025-07-01 |
Learning-to-Context Slope: Evaluating In-Context Learning Effectiveness Beyond Performance Illusions |
Dingzriui Wang et.al. |
2506.23146 |
null |
2025-06-29 |
LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation |
Shadman Sobhan et.al. |
2506.23136 |
null |
2025-06-28 |
Prompting without Panic: Attribute-aware, Zero-shot, Test-Time Calibration |
Ramya Hebbalaguppe et.al. |
2506.22819 |
null |
2025-06-28 |
Enhancing Android Malware Detection with Retrieval-Augmented Generation |
Saraga S. et.al. |
2506.22750 |
null |
2025-06-28 |
RAILS: Retrieval-Augmented Intelligence for Learning Software Development |
Wali Mohammad Abdullah et.al. |
2506.22742 |
null |
2025-06-27 |
ReCo: Reminder Composition Mitigates Hallucinations in Vision-Language Models |
Sotirios Panagiotis Chytas et.al. |
2506.22636 |
null |
2025-06-26 |
Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation |
Deyu Zou et.al. |
2506.22518 |
null |
2025-06-25 |
Mitigating Gambling-Like Risk-Taking Behaviors in Large Language Models: A Behavioral Economics Approach to AI Safety |
Y. Du et.al. |
2506.22496 |
null |
2025-06-24 |
Hallucination Detection with Small Language Models |
Ming Cheung et.al. |
2506.22486 |
null |
2025-06-27 |
Probabilistic Optimality for Inference-time Scaling |
Youkang Wang et.al. |
2506.22376 |
null |
2025-06-27 |
Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics |
Michael A. Riegler et.al. |
2506.21964 |
null |
2025-06-27 |
The Consistency Hypothesis in Uncertainty Quantification for Large Language Models |
Quan Xiao et.al. |
2506.21849 |
null |
2025-06-26 |
MobiVerse: Scaling Urban Mobility Simulation with Hybrid Lightweight Domain-Specific Generator and Large Language Models |
Yifan Liu et.al. |
2506.21784 |
null |
2025-06-26 |
Evaluating List Construction and Temporal Understanding capabilities of Large Language Models |
Alexandru Dumitru et.al. |
2506.21783 |
null |
2025-06-26 |
THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? |
Xin Wang et.al. |
2506.21763 |
null |
2025-06-22 |
Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines |
Wenhao Li et.al. |
2506.21615 |
null |
2025-06-20 |
CORE-KG: An LLM-Driven Knowledge Graph Construction Framework for Human Smuggling Networks |
Dipak Meher et.al. |
2506.21607 |
null |
2025-06-26 |
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection |
Ali Şenol et.al. |
2506.21443 |
null |
2025-06-26 |
Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference |
Colin Samplawski et.al. |
2506.21408 |
null |
2025-06-26 |
Small Encoders Can Rival Large Decoders in Detecting Groundedness |
Istabrak Abbes et.al. |
2506.21288 |
null |
2025-06-26 |
BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services |
Zhaojiacheng Zhou et.al. |
2506.21033 |
null |
2025-06-26 |
Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers |
Martin Ruskov et.al. |
2506.20982 |
null |
2025-06-25 |
Towards Probabilistic Question Answering Over Tabular Data |
Chen Shen et.al. |
2506.20747 |
null |
2025-06-25 |
Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges |
Alexander D. Kalian et.al. |
2506.20598 |
null |
2025-06-26 |
TAPS: Tool-Augmented Personalisation via Structured Tagging |
Ekaterina Taktasheva et.al. |
2506.20409 |
null |
2025-06-25 |
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models |
Kejia Chen et.al. |
2506.20251 |
null |
2025-06-25 |
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs |
Ruokai Yin et.al. |
2506.20194 |
null |
2025-06-24 |
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality |
Baochang Ren et.al. |
2506.19807 |
null |
2025-06-24 |
LLM-Driven Medical Document Analysis: Enhancing Trustworthy Pathology and Differential Diagnosis |
Lei Kang et.al. |
2506.19702 |
null |
2025-06-24 |
Correcting Hallucinations in News Summaries: Exploration of Self-Correcting LLM Methods with External Knowledge |
Juraj Vladika et.al. |
2506.19607 |
null |
2025-06-24 |
Automatic Posology Structuration : What role for LLMs? |
Natalia Bobkova et.al. |
2506.19525 |
null |
2025-06-24 |
Inference-Time Reward Hacking in Large Language Models |
Hadi Khalaf et.al. |
2506.19248 |
null |
2025-06-23 |
AgenticControl: An Automated Control Design Framework Using Large Language Models |
Mohammad Narimani et.al. |
2506.19160 |
null |
2025-06-23 |
Human-Aligned Faithfulness in Toxicity Explanations of LLMs |
Ramaravind K. Mothilal et.al. |
2506.19113 |
null |
2025-06-23 |
Mirage of Mastery: Memorization Tricks LLMs into Artificially Inflated Self-Knowledge |
Sahil Kale et.al. |
2506.18998 |
null |
2025-06-23 |
AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs |
Piotr Matys et.al. |
2506.18628 |
null |
2025-06-23 |
ReFrame: Rectification Framework for Image Explaining Architectures |
Debjyoti Das Adhikary et.al. |
2506.18272 |
null |
2025-06-24 |
Understanding Reasoning in Thinking Language Models via Steering Vectors |
Constantin Venhoff et.al. |
2506.18167 |
null |
2025-06-22 |
Mechanistic Interpretability in the Presence of Architectural Obfuscation |
Marcos Florencio et.al. |
2506.18053 |
null |
2025-06-22 |
QueueEDIT: Structural Self-Correction for Sequential Model Editing in LLMs |
Taolin Zhang et.al. |
2506.17864 |
null |
2025-06-21 |
Is Your Automated Software Engineer Trustworthy? |
Noble Saji Mathews et.al. |
2506.17812 |
null |
2025-06-30 |
KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation |
Dalong Zhang et.al. |
2506.17728 |
null |
2025-06-21 |
Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering |
Binquan Ji et.al. |
2506.17692 |
null |
2025-06-21 |
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models |
Yukun Huang et.al. |
2506.17585 |
null |
2025-06-20 |
OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections |
Manasa Bharadwaj et.al. |
2506.17449 |
null |
2025-06-20 |
UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making |
Jinhao Duan et.al. |
2506.17419 |
null |
2025-06-20 |
Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs |
Zongjie Li et.al. |
2506.17353 |
null |
2025-06-18 |
Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study |
Chuanlei Li et.al. |
2506.17311 |
null |
2025-06-17 |
Semantic uncertainty in advanced decoding methods for LLM generation |
Darius Foodeei et.al. |
2506.17296 |
null |
2025-06-20 |
Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction |
Jiekai Ma et.al. |
2506.17203 |
null |
2025-06-20 |
Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation |
Jiahao Cheng et.al. |
2506.17088 |
null |
2025-06-20 |
Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond |
Antonin Berthon et.al. |
2506.16982 |
null |
2025-06-20 |
DistillNote: LLM-based clinical note summaries improve heart failure diagnosis |
Heloisa Oss Boll et.al. |
2506.16777 |
null |
2025-06-20 |
eSapiens: A Real-World NLP Framework for Multimodal Document Understanding and Enterprise Knowledge Processing |
Isaac Shi et.al. |
2506.16768 |
null |
2025-06-20 |
The Role of Model Confidence on Bias Effects in Measured Uncertainties |
Xinyi Liu et.al. |
2506.16724 |
null |
2025-06-19 |
Grounding Language Models with Semantic Digital Twins for Robotic Planning |
Mehreen Naeem et.al. |
2506.16493 |
null |
2025-06-19 |
Can GPT-4o Evaluate Usability Like Human Experts? A Comparative Study on Issue Identification in Heuristic Evaluation |
Guilherme Guerino et.al. |
2506.16345 |
null |
2025-06-19 |
SGIC: A Self-Guided Iterative Calibration Framework for RAG |
Guanhua Chen et.al. |
2506.16172 |
null |
2025-06-19 |
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior |
Hao Li et.al. |
2506.16163 |
link |
2025-06-19 |
Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning |
Duc Hieu Ho et.al. |
2506.16064 |
null |
2025-06-19 |
DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling |
Fei Wang et.al. |
2506.16043 |
null |
2025-06-18 |
Understanding Online Polarization Through Human-Agent Interaction in a Synthetic LLM-Based Social Network |
Tim Donkers et.al. |
2506.15866 |
null |
2025-06-18 |
PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection |
Wenhao Li et.al. |
2506.15656 |
null |
2025-06-18 |
Context-Informed Grounding Supervision |
Hyunji Lee et.al. |
2506.15480 |
link |
2025-06-18 |
Unlocking Post-hoc Dataset Inference with Synthetic Data |
Bihe Zhao et.al. |
2506.15271 |
null |
2025-06-18 |
Robust Instant Policy: Leveraging Student’s t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation |
Hanbit Oh et.al. |
2506.15157 |
null |
2025-06-18 |
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models |
Trishna Chakraborty et.al. |
2506.15065 |
null |
2025-06-17 |
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning |
Wassim Bouaziz et.al. |
2506.14913 |
null |
2025-06-17 |
Issue Retrieval and Verification Enhanced Supplementary Code Comment Generation |
Yanzhen Zou et.al. |
2506.14649 |
link |
2025-06-17 |
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees |
Ahmed Heakl et.al. |
2506.14606 |
null |
2025-06-17 |
RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition |
Tim Cofala et.al. |
2506.14412 |
null |
2025-06-17 |
Don’t Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning |
William F. Shen et.al. |
2506.14387 |
null |
2025-06-17 |
AviationLLM: An LLM-based Knowledge System for Aviation Training |
Jia’ang Wan et.al. |
2506.14336 |
null |
2025-06-17 |
Improving LoRA with Variational Learning |
Bai Cong et.al. |
2506.14280 |
null |
2025-06-17 |
DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization |
Chengyu Huang et.al. |
2506.14157 |
link |
2025-06-17 |
Abstract Meaning Representation for Hospital Discharge Summarization |
Paul Landes et.al. |
2506.14101 |
link |
2025-06-20 |
Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs |
Hen Davidov et.al. |
2506.13593 |
link |
2025-06-16 |
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning |
David Bani-Harouni et.al. |
2506.13474 |
null |
2025-06-17 |
ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models |
Junho Yoon et.al. |
2506.13472 |
null |
2025-06-16 |
From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs |
Alsharif Abuadbba et.al. |
2506.13434 |
null |
2025-06-16 |
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs |
Houcheng Jiang et.al. |
2506.13285 |
null |
2025-06-16 |
IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation |
Zijie Lin et.al. |
2506.13229 |
link |
2025-06-16 |
SPOT: Bridging Natural Language and Geospatial Search for Investigative Journalists |
Lynn Khellaf et.al. |
2506.13188 |
null |
2025-06-16 |
Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning |
Danny Hoang et.al. |
2506.13026 |
null |
2025-06-17 |
Surprise Calibration for Better In-Context Learning |
Zhihang Tan et.al. |
2506.12796 |
null |
2025-06-15 |
Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning |
Alexis R. Tudor et.al. |
2506.12667 |
null |
2025-06-14 |
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics |
Jiarui Liu et.al. |
2506.12657 |
null |
2025-06-14 |
GenControl: Generative AI-Driven Autonomous Design of Control Algorithms |
Chenggang Cui et.al. |
2506.12554 |
null |
2025-06-14 |
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking |
Shuo Yang et.al. |
2506.12538 |
null |
2025-06-14 |
Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation |
Xiangyan Chen et.al. |
2506.12496 |
null |
2025-06-14 |
MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination |
Ao Jia et.al. |
2506.12483 |
null |
2025-06-13 |
Uncovering Bias Paths with LLM-guided Causal Discovery: An Active Learning and Dynamic Scoring Approach |
Khadija Zanna et.al. |
2506.12227 |
null |
2025-06-13 |
A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions |
Stephen Mell et.al. |
2506.12202 |
null |
2025-06-13 |
Maximally-Informative Retrieval for State Space Model Generation |
Evan Becker et.al. |
2506.12149 |
null |
2025-06-12 |
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model’s Response for Vulnerability Analysis |
Reza Fayyazi et.al. |
2506.12100 |
link |
2025-06-13 |
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? |
Zihan Zheng et.al. |
2506.11928 |
null |
2025-06-13 |
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search |
Zhenyu Hou et.al. |
2506.11902 |
link |
2025-06-16 |
Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making |
Claudio Fanconi et.al. |
2506.11887 |
null |
2025-06-13 |
Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study |
Hawau Olamide Toyin et.al. |
2506.11602 |
null |
2025-06-13 |
Augmenting the Generality and Performance of Large Language Models for Software Engineering |
Fabian C. Peña et.al. |
2506.11548 |
null |
2025-06-11 |
Digitization of Document and Information Extraction using OCR |
Rasha Sinha et.al. |
2506.11156 |
null |
2025-06-11 |
From over-reliance to smart integration: using Large-Language Models as translators between specialized modeling and simulation tools |
Philippe J. Giabbanelli et.al. |
2506.11141 |
null |
2025-06-10 |
Trustworthy AI for Medicine: Continuous Hallucination Detection and Elimination with CHECK |
Carlos Garcia-Fernandez et.al. |
2506.11129 |
null |
2025-06-14 |
Farseer: A Refined Scaling Law in Large Language Models |
Houyi Li et.al. |
2506.10972 |
link |
2025-06-12 |
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers |
Yixiao Huang et.al. |
2506.10887 |
null |
2025-06-13 |
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles |
Qingyan Wei et.al. |
2506.10848 |
link |
2025-06-12 |
Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs |
Alberto Testoni et.al. |
2506.10769 |
null |
2025-06-12 |
Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs |
Yilin Xiao et.al. |
2506.10508 |
null |
2025-06-12 |
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier |
Yuhua Jiang et.al. |
2506.10406 |
null |
2025-06-12 |
AutoGEEval++: A Multi-Level and Multi-Geospatial-Modality Automated Evaluation Framework for Large Language Models in Geospatial Code Generation on Google Earth Engine |
Shuyang Hou et.al. |
2506.10365 |
null |
2025-06-12 |
TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree |
Yu-Yang Qian et.al. |
2506.10355 |
link |
2025-06-12 |
Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements |
Seyed Moein Abtahi et.al. |
2506.10330 |
null |
2025-06-12 |
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models |
Qiyue Yin et.al. |
2506.10264 |
null |
2025-06-11 |
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs |
Xiyao Wang et.al. |
2506.10128 |
link |
2025-06-11 |
Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection |
David Farr et.al. |
2506.10104 |
null |
2025-06-11 |
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems |
Brendan Leigh Ross et.al. |
2506.10060 |
null |
2025-06-10 |
Evaluation empirique de la sécurisation et de l’alignement de ChatGPT et Gemini: analyse comparative des vulnérabilités par expérimentations de jailbreaks |
Rafaël Nouailles et.al. |
2506.10029 |
null |
2025-06-16 |
Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs |
Hiroshi Matsuda et.al. |
2506.09983 |
link |
2025-06-11 |
Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs |
Rodion Oblovatny et.al. |
2506.09886 |
null |
2025-06-11 |
Do LLMs Give Psychometrically Plausible Responses in Educational Assessments? |
Andreas Säuberli et.al. |
2506.09796 |
null |
2025-06-11 |
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models |
Haoyi Song et.al. |
2506.09684 |
link |
2025-06-11 |
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering |
Tianjun Yao et.al. |
2506.09645 |
link |
2025-06-11 |
HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding |
Yanzhao Shi et.al. |
2506.09634 |
null |
2025-06-11 |
From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies |
Blaž Škrlj et.al. |
2506.09566 |
null |
2025-06-11 |
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts |
Yuchen Feng et.al. |
2506.09351 |
null |
2025-06-11 |
Know What You Don’t Know: Uncertainty Calibration of Process Reward Models |
Young-Jin Park et.al. |
2506.09338 |
null |
2025-06-10 |
G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration |
Samuel Holt et.al. |
2506.09272 |
null |
2025-06-10 |
Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation |
Karl Löwenmark et.al. |
2506.09247 |
null |
2025-06-10 |
The Curious Language Model: Strategic Test-Time Information Acquisition |
Michael Cooper et.al. |
2506.09173 |
null |
2025-06-10 |
Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models |
Xinyuan Wang et.al. |
2506.09084 |
null |
2025-06-10 |
FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making |
Jiaxiang Chen et.al. |
2506.09080 |
null |
2025-06-10 |
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions |
Polina Kirichenko et.al. |
2506.09038 |
link |
2025-06-11 |
Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance |
Kaifeng He et.al. |
2506.08980 |
null |
2025-06-10 |
The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation |
Francisco Vargas et.al. |
2506.08827 |
null |
2025-06-12 |
ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization |
Hee Suk Yoon et.al. |
2506.08712 |
null |
2025-06-10 |
RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being |
Rahatara Ferdousi et.al. |
2506.08486 |
null |
2025-06-10 |
Olica: Efficient Structured Pruning of Large Language Models without Retraining |
Jiujun He et.al. |
2506.08436 |
link |
2025-06-11 |
Transforming Expert Knowledge into Scalable Ontology via Large Language Models |
Ikkei Itoku et.al. |
2506.08422 |
null |
2025-06-09 |
Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic |
Zhenjiang Mao et.al. |
2506.08243 |
null |
2025-06-09 |
Conservative Bias in Large Language Models: Measuring Relation Predictions |
Toyin Aguda et.al. |
2506.08120 |
null |
2025-06-10 |
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation |
Jiaxiang Chen et.al. |
2506.07820 |
null |
2025-06-09 |
Language-Vision Planner and Executor for Text-to-Visual Reasoning |
Yichang Xu et.al. |
2506.07778 |
null |
2025-06-09 |
QUITE: A Query Rewrite System Beyond Rules with LLM Agents |
Yuyang Song et.al. |
2506.07675 |
null |
2025-06-09 |
Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models |
Ruiyang Zhang et.al. |
2506.07575 |
null |
2025-06-09 |
SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition |
Mengsong Wu et.al. |
2506.07557 |
null |
2025-06-09 |
CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models |
Guang Liu et.al. |
2506.07463 |
null |
2025-06-09 |
From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered |
Siddartha Devic et.al. |
2506.07461 |
null |
2025-06-09 |
Extending Epistemic Uncertainty Beyond Parameters Would Assist in Designing Reliable LLMs |
T. Duy Nguyen-Hien et.al. |
2506.07448 |
null |
2025-06-11 |
MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models |
Philip R. Liu et.al. |
2506.07400 |
link |
2025-06-10 |
ARGUS: Hallucination and Omission Evaluation in Video-LLMs |
Ruchit Rawal et.al. |
2506.07371 |
null |
2025-06-08 |
ConfQA: Answer Only If You Are Confident |
Yin Huang et.al. |
2506.07309 |
null |
2025-06-08 |
Impact of Label Noise from Large Language Models Generated Annotations on Evaluation of Diagnostic Model Performance |
Mohammadreza Chavoshi et.al. |
2506.07273 |
null |
2025-06-08 |
Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis |
Yaping Chai et.al. |
2506.07148 |
null |
2025-06-08 |
Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models |
Samir Abdaljalil et.al. |
2506.07106 |
null |
2025-06-08 |
Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models |
Kai Xiong et.al. |
2506.07064 |
null |
2025-06-08 |
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint |
Leheng Sheng et.al. |
2506.07022 |
link |
2025-06-07 |
Quantile Regression with Large Language Models for Price Prediction |
Nikhita Vedula et.al. |
2506.06657 |
null |
2025-06-07 |
QuantMCP: Grounding Large Language Models in Verifiable Financial Reality |
Yifan Zeng et.al. |
2506.06622 |
null |
2025-06-06 |
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques |
Adarsh Prasad Behera et.al. |
2506.06579 |
null |
2025-06-06 |
Beyond Facts: Evaluating Intent Hallucination in Large Language Models |
Yijie Hao et.al. |
2506.06539 |
null |
2025-06-11 |
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models |
Pengyi Li et.al. |
2506.06395 |
null |
2025-06-04 |
On the Fundamental Impossibility of Hallucination Control in Large Language Models |
Michał P. Karpowicz et.al. |
2506.06382 |
null |
2025-06-06 |
Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge |
Yi Sui et.al. |
2506.06240 |
null |
2025-06-06 |
Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach |
James Ford et.al. |
2506.06175 |
null |
2025-06-06 |
Recommender systems, stigmergy, and the tyranny of popularity |
Zackary Okun Dunivin et.al. |
2506.06162 |
null |
2025-06-09 |
MIRIAD: Augmenting LLMs with millions of medical query-response pairs |
Qinyue Zheng et.al. |
2506.06091 |
null |
2025-06-06 |
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search |
Yu Li et.al. |
2506.06017 |
null |
2025-06-06 |
Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques |
Xiaofei Xu et.al. |
2506.05924 |
null |
2025-06-06 |
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness |
Rongzhe Wei et.al. |
2506.05735 |
null |
2025-06-09 |
Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models |
Zefan Zeng et.al. |
2506.05675 |
null |
2025-06-05 |
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding |
Yan Shu et.al. |
2506.05551 |
null |
2025-06-05 |
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models |
Sima Noorani et.al. |
2506.05497 |
null |
2025-06-05 |
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection |
Ron Eliav et.al. |
2506.05243 |
null |
2025-06-05 |
On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools |
Shivani Upadhyay et.al. |
2506.05182 |
link |
2025-06-05 |
When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models |
Kai Wang et.al. |
2506.04909 |
null |
2025-06-05 |
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights |
Giorgio Biancini et.al. |
2506.04851 |
null |
2025-06-05 |
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models |
Changyue Wang et.al. |
2506.04832 |
link |
2025-06-05 |
A Reasoning-Based Approach to Cryptic Crossword Clue Solving |
Martin Andrews et.al. |
2506.04824 |
null |
2025-06-05 |
GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval |
Lingyuan Liu et.al. |
2506.04762 |
link |
2025-06-05 |
Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning |
Zhiyuan Ma et.al. |
2506.04625 |
null |
2025-06-05 |
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification |
Chengwu Liu et.al. |
2506.04592 |
null |
2025-06-04 |
AuthGuard: Generalizable Deepfake Detection via Language Guidance |
Guangyu Shen et.al. |
2506.04501 |
null |
2025-06-04 |
“Don’t Do That!”: Guiding Embodied Systems through Large Language Model-based Constraint Generation |
Aladin Djuhera et.al. |
2506.04500 |
null |
2025-06-04 |
Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification |
Payel Bhattacharjee et.al. |
2506.04450 |
null |
2025-06-06 |
TracLLM: A Generic Framework for Attributing Long Context LLMs |
Yanting Wang et.al. |
2506.04202 |
link |
2025-06-04 |
N²: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion |
Caleb Chin et.al. |
2506.04166 |
link |
2025-06-04 |
A Dataset for Addressing Patient’s Information Needs related to Clinical Course of Hospitalization |
Sarvesh Soni et.al. |
2506.04156 |
null |
2025-06-04 |
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning |
Tim Franzmeyer et.al. |
2506.04051 |
null |
2025-06-04 |
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization |
Jiulong Wu et.al. |
2506.04039 |
null |
2025-06-05 |
Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems |
Yuxin Zhang et.al. |
2506.03901 |
null |
2025-06-04 |
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation |
Mingxuan Xia et.al. |
2506.03857 |
null |
2025-06-04 |
From Theory to Practice: Real-World Use Cases on Trustworthy LLM-Driven Process Modeling, Prediction and Automation |
Peter Pfeiffer et.al. |
2506.03801 |
null |
2025-06-04 |
Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision |
Chaeyun Jang et.al. |
2506.03723 |
null |
2025-06-04 |
AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism |
Zhepei Wei et.al. |
2506.03700 |
link |
2025-06-04 |
Robust Preference Optimization via Dynamic Target Margins |
Jie Sun et.al. |
2506.03690 |
null |
2025-06-04 |
Trustworthy Medical Question Answering: An Evaluation-Centric Survey |
Yinuo Wang et.al. |
2506.03659 |
null |
2025-06-04 |
Learning to Insert [PAUSE] Tokens for Better Reasoning |
Eunki Kim et.al. |
2506.03616 |
null |
2025-06-04 |
Beyond C/C++: Probabilistic and LLM Methods for Next-Generation Software Reverse Engineering |
Zhuo Zhuo et.al. |
2506.03504 |
null |
2025-06-03 |
Exploiting LLMs for Automatic Hypothesis Assessment via a Logit-Based Calibrated Prior |
Yue Gong et.al. |
2506.03444 |
null |
2025-06-03 |
Sampling Preferences Yields Simple Trustworthiness Scores |
Sean Steinle et.al. |
2506.03399 |
null |
2025-06-03 |
Ask a Local: Detecting Hallucinations With Specialized Model Divergence |
Aldan Creo et.al. |
2506.03357 |
null |
2025-06-03 |
Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows |
Yifei Ming et.al. |
2506.03332 |
null |
2025-06-03 |
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes |
Christodoulos Constantinides et.al. |
2506.03278 |
link |
2025-06-03 |
Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech |
Florian Ludwig et.al. |
2506.03009 |
null |
2025-06-03 |
Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation |
Li Zhang et.al. |
2506.02992 |
null |
2025-06-03 |
Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation |
Dingwei Chen et.al. |
2506.02973 |
null |
2025-06-04 |
A Multi-agent LLM-based JUnit Test Generation with Strong Oracles |
Qinghua Xu et.al. |
2506.02943 |
null |
2025-06-03 |
Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs |
Shangmin Guo et.al. |
2506.02918 |
null |
2025-06-03 |
Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs |
Wenjing Tang et.al. |
2506.02860 |
null |
2025-06-03 |
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations |
Jinyuan Luo et.al. |
2506.02696 |
null |
2025-06-04 |
Computational Thinking Reasoning in Large Language Models |
Kechi Zhang et.al. |
2506.02658 |
null |
2025-06-03 |
In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration |
Jiajie Fu et.al. |
2506.02509 |
null |
2025-06-03 |
Generative AI for Predicting 2D and 3D Wildfire Spread: Beyond Physics-Based Models and Traditional Deep Learning |
Haowen Xu et.al. |
2506.02485 |
null |
2025-06-02 |
Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation |
Priyaranjan Pattnayak et.al. |
2506.02097 |
null |
2025-06-02 |
DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation |
Jennifer Chen et.al. |
2506.01954 |
null |
2025-06-02 |
Self-ensemble: Mitigating Confidence Distortion for Large Language Models |
Zicheng Xu et.al. |
2506.01951 |
null |
2025-06-02 |
WHEN TO ACT, WHEN TO WAIT: Modeling Structural Trajectories for Intent Triggerability in Task-Oriented Dialogue |
Yaoyao Qian et.al. |
2506.01881 |
link |
2025-06-02 |
Benford’s Curse: Tracing Digit Bias to Numerical Hallucination in LLMs |
Jiandong Shao et.al. |
2506.01734 |
null |
2025-06-02 |
Fairness Dynamics During Training |
Krishna Patel et.al. |
2506.01709 |
null |
2025-06-02 |
When LLMs Team Up: The Emergence of Collaborative Affective Computing |
Wenna Lai et.al. |
2506.01698 |
null |
2025-06-02 |
MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments |
Xiao Yang et.al. |
2506.01616 |
null |
2025-06-02 |
Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes |
Meng Li et.al. |
2506.01512 |
null |
2025-06-02 |
MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations |
Kensuke Mitsuzawa et.al. |
2506.01367 |
null |
2025-06-02 |
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents |
Manan Suri et.al. |
2506.01344 |
null |
2025-06-02 |
Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model |
Yuanhe Tian et.al. |
2506.01266 |
null |
2025-06-01 |
Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation |
Pimchanok Sukjai et.al. |
2506.01118 |
null |
2025-06-01 |
ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation |
Xinyi Liu et.al. |
2506.01116 |
null |
2025-06-01 |
Reconsidering LLM Uncertainty Estimation Methods in the Wild |
Yavuz Bakman et.al. |
2506.01114 |
null |
2025-06-01 |
Contextual Candor: Enhancing LLM Trustworthiness Through Hierarchical Unanswerability Detection |
Steven Robinson et.al. |
2506.01104 |
null |
2025-06-01 |
Taming LLMs by Scaling Learning Rates with Gradient Grouping |
Siyuan Li et.al. |
2506.01049 |
null |
2025-06-01 |
Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks |
Yuntai Bao et.al. |
2506.00823 |
link |
2025-06-01 |
One for All: Update Parameterized Knowledge Across Multiple Models |
Weitao Ma et.al. |
2506.00817 |
null |
2025-06-01 |
Enhancing LLM Reasoning for Time Series Classification by Tailored Thinking and Fused Decision |
Jiahui Zhou et.al. |
2506.00807 |
null |
2025-06-01 |
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision |
Rong Wu et.al. |
2506.00783 |
null |
2025-06-01 |
Do not Abstain! Identify and Solve the Uncertainty |
Jingyu Liu et.al. |
2506.00780 |
null |
2025-05-31 |
Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection |
Yeshwanth Venkatesha et.al. |
2506.00743 |
null |
2025-05-31 |
Pitfalls in Evaluating Language Model Forecasters |
Daniel Paleka et.al. |
2506.00723 |
null |
2025-06-03 |
Measuring Faithfulness and Abstention: An Automated Pipeline for Evaluating LLM-Generated 3-ply Case-Based Legal Arguments |
Li Zhang et.al. |
2506.00694 |
null |
2025-05-31 |
Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs |
Chenjun Xu et.al. |
2506.00582 |
link |
2025-05-31 |
AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs |
Nicholas E. Corrado et.al. |
2506.00569 |
null |
2025-06-03 |
CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention |
Yuxi Sun et.al. |
2506.00519 |
null |
2025-05-31 |
Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering |
Linhao Ye et.al. |
2506.00491 |
null |
2025-05-31 |
Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization |
Suhas BN et.al. |
2506.00448 |
null |
2025-05-31 |
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy |
Jie Ren et.al. |
2506.00359 |
null |
2025-05-31 |
Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs |
Sungjae Lee et.al. |
2506.00344 |
null |
2025-05-31 |
TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering |
Boyi Zhang et.al. |
2506.00331 |
null |
2025-05-31 |
Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning |
Sara Ghazanfari et.al. |
2506.00318 |
null |
2025-05-30 |
Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity |
Dang Nguyen et.al. |
2506.00245 |
null |
2025-05-30 |
MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs |
Gabrielle Kaili-May Liu et.al. |
2505.24858 |
link |
2025-05-30 |
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs |
Juraj Vladika et.al. |
2505.24830 |
null |
2025-06-02 |
Guiding Generative Storytelling with Knowledge Graphs |
Zhijun Pan et.al. |
2505.24803 |
null |
2025-05-30 |
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty? |
Jiayu Liu et.al. |
2505.24778 |
link |
2025-05-30 |
Can LLMs and humans be friends? Uncovering factors affecting human-AI intimacy formation |
Yeseon Hong et.al. |
2505.24658 |
null |
2025-05-30 |
The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models |
Junyi Li et.al. |
2505.24630 |
link |
2025-05-30 |
LLM Inference Enhanced by External Knowledge: A Survey |
Yu-Hsuan Lin et.al. |
2505.24377 |
link |
2025-05-30 |
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration |
Xianglong Yan et.al. |
2505.24357 |
null |
2025-05-30 |
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction |
Yangui Fang et.al. |
2505.24347 |
null |
2025-05-30 |
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization |
Zirui Shang et.al. |
2505.24282 |
null |
2025-06-02 |
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM |
Bowen Dong et.al. |
2505.24238 |
null |
2025-05-30 |
ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction |
Murari Ambati et.al. |
2505.24230 |
null |
2025-05-30 |
Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling |
Yimin Du et.al. |
2505.24199 |
null |
2025-05-29 |
Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model |
Nokimul Hasan Arif et.al. |
2505.24007 |
null |
2025-05-29 |
Fitting the Message to the Moment: Designing Calendar-Aware Stress Messaging with Large Language Models |
Pranav Rao et.al. |
2505.23997 |
null |
2025-05-29 |
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs |
Yinong Oliver Wang et.al. |
2505.23996 |
null |
2025-05-29 |
FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression |
Jiayi Tian et.al. |
2505.23966 |
link |
2025-05-29 |
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation |
Caiqi Zhang et.al. |
2505.23912 |
null |
2025-05-29 |
Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems |
Winstead Zhu et.al. |
2505.23908 |
null |
2025-05-29 |
Revisiting Uncertainty Estimation and Calibration of Large Language Models |
Linwei Tao et.al. |
2505.23854 |
null |
2025-05-28 |
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs |
Jakub Podolak et.al. |
2505.23845 |
null |
2025-05-28 |
SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context |
Hairu Wang et.al. |
2505.23841 |
null |
2025-05-29 |
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models |
Zixiang Xu et.al. |
2505.23713 |
link |
2025-06-02 |
Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation |
Hongxiang Zhang et.al. |
2505.23657 |
null |
2025-06-01 |
Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms |
Jane Cleland-Huang et.al. |
2505.23576 |
null |
2025-05-30 |
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions |
Xiaorui Wu et.al. |
2505.23473 |
null |
2025-06-01 |
A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy |
Ahmad Mohsin et.al. |
2505.23397 |
null |
2025-05-29 |
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs |
Julia Belikova et.al. |
2505.23299 |
null |
2025-05-29 |
Daunce: Data Attribution through Uncertainty Estimation |
Xingyuan Pan et.al. |
2505.23223 |
null |
2025-05-29 |
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes |
Sungjune Park et.al. |
2505.23179 |
null |
2025-05-29 |
AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models |
Jinchuan Zhang et.al. |
2505.23020 |
link |
2025-05-28 |
Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents |
Michael Kirchhof et.al. |
2505.22655 |
null |
2025-05-28 |
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason |
Ang Lv et.al. |
2505.22653 |
null |
2025-05-30 |
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs |
Ziling Cheng et.al. |
2505.22630 |
null |
2025-05-28 |
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding |
Chengyue Wu et.al. |
2505.22618 |
null |
2025-05-28 |
Does Johnny Get the Message? Evaluating Cybersecurity Notifications for Everyday Users |
Victor Jüttner et.al. |
2505.22435 |
null |
2025-05-28 |
AI Trust Reshaping Administrative Burdens: Understanding Trust-Burden Dynamics in LLM-Assisted Benefits Systems |
Jeongwon Jo et.al. |
2505.22418 |
null |
2025-05-28 |
Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation |
Yunsoo Kim et.al. |
2505.22222 |
null |
2025-05-31 |
iDSE: Navigating Design Space Exploration in High-Level Synthesis Using LLMs |
Runkai Li et.al. |
2505.22086 |
null |
2025-05-28 |
Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home? |
Yujin Choi et.al. |
2505.22061 |
null |
2025-05-28 |
Legal Assist AI: Leveraging Transformer-Based Model for Effective Legal Assistance |
Jatin Gupta et.al. |
2505.22003 |
null |
2025-05-28 |
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning |
Zhendong Mi et.al. |
2505.21987 |
null |
2025-05-28 |
Judging LLMs on a Simplex |
Patrick Vossler et.al. |
2505.21972 |
null |
2025-05-28 |
Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning |
Qihuang Zhong et.al. |
2505.21958 |
null |
2025-05-27 |
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation |
Tharindu Kumarage et.al. |
2505.21784 |
null |
2025-05-27 |
Calibrating LLM Confidence by Probing Perturbed Representation Stability |
Reza Khanmohammadi et.al. |
2505.21772 |
null |
2025-05-30 |
Do We Know What LLMs Don’t Know? A Study of Consistency in Knowledge Probing |
Raoyuan Zhao et.al. |
2505.21701 |
null |
2025-05-27 |
The Feasibility of Topic-Based Watermarking on Academic Peer Reviews |
Alexander Nemecek et.al. |
2505.21636 |
null |
2025-05-27 |
Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems |
Young-Min Cho et.al. |
2505.21588 |
null |
2025-05-27 |
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making |
Yihan Wang et.al. |
2505.21503 |
null |
2025-05-27 |
Can Large Reasoning Models Self-Train? |
Sheikh Shafayat et.al. |
2505.21444 |
null |
2025-05-27 |
Pretrained LLMs Learn Multiple Types of Uncertainty |
Roi Cohen et.al. |
2505.21218 |
null |
2025-05-27 |
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA |
Sergey Pletenev et.al. |
2505.21115 |
null |
2025-05-27 |
A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction |
Bogdan Bogachov et.al. |
2505.21109 |
null |
2025-05-27 |
Thinker: Learning to Think Fast and Slow |
Stephen Chung et.al. |
2505.21097 |
null |
2025-05-28 |
Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation |
Ekaterina Fadeeva et.al. |
2505.21072 |
null |
2025-05-27 |
Large Language Model-enhanced Reinforcement Learning for Low-Altitude Economy Networking |
Lingyi Cai et.al. |
2505.21045 |
null |
2025-05-27 |
Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA |
Xiangqing Shen et.al. |
2505.20971 |
null |
2025-05-27 |
IRCopilot: Automated Incident Response with Large Language Models |
Xihuan Lin et.al. |
2505.20945 |
null |
2025-05-27 |
Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration? |
Ziming Wang et.al. |
2505.20903 |
null |
2025-05-27 |
MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection |
Baraa Hikal et.al. |
2505.20880 |
null |
2025-05-27 |
Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG |
Xin Sun et.al. |
2505.20871 |
null |
2025-05-27 |
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding |
Chaeyoung Jung et.al. |
2505.20862 |
null |
2025-05-27 |
Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation |
Wooseong Yang et.al. |
2505.20773 |
null |
2025-05-30 |
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models |
Xiaqiang Tang et.al. |
2505.20767 |
link |
2025-05-27 |
RRO: LLM Agent Optimization Through Rising Reward Trajectories |
Zilong Wang et.al. |
2505.20737 |
null |
2025-05-26 |
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting |
Ana Rita Ortigoso et.al. |
2505.20521 |
null |
2025-05-26 |
InFact: Informativeness Alignment for Improved LLM Factuality |
Roi Cohen et.al. |
2505.20487 |
null |
2025-05-26 |
HAMburger: Accelerating LLM Inference via Token Smashing |
Jingyu Liu et.al. |
2505.20438 |
null |
2025-05-26 |
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation |
Zihong Chen et.al. |
2505.20416 |
link |
2025-05-26 |
GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining |
Simin Fan et.al. |
2505.20380 |
null |
2025-05-26 |
Reasoning LLMs are Wandering Solution Explorers |
Jiahao Lu et.al. |
2505.20296 |
null |
2025-05-26 |
Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution? |
Michael Kirchhof et.al. |
2505.20295 |
null |
2025-05-26 |
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models |
Weihao Xuan et.al. |
2505.20236 |
null |
2025-05-27 |
Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning |
Xiaorong Wang et.al. |
2505.20195 |
null |
2025-05-26 |
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data |
Chun-Yi Kuan et.al. |
2505.20166 |
null |
2025-05-26 |
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities |
Chuangtao Ma et.al. |
2505.20099 |
link |
2025-05-26 |
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks |
Debargha Ganguly et.al. |
2505.20047 |
null |
2025-05-26 |
Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs |
Artem Vazhentsev et.al. |
2505.20045 |
null |
2025-05-26 |
DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response |
Bilel Cherif et.al. |
2505.19973 |
null |
2025-05-26 |
CP-Router: An Uncertainty-Aware Router Between LLM and LRM |
Jiayuan Su et.al. |
2505.19970 |
null |
2025-05-26 |
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision |
Tej Deep Pala et.al. |
2505.19706 |
link |
2025-05-26 |
Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement |
Liqin Ye et.al. |
2505.19675 |
link |
2025-05-26 |
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue |
Yichun Feng et.al. |
2505.19630 |
link |
2025-05-26 |
Learning to Reason without External Rewards |
Xuandong Zhao et.al. |
2505.19590 |
link |
2025-05-26 |
Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models |
Jianxing Liao et.al. |
2505.19490 |
null |
2025-05-26 |
Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection |
Mohammad Mahdi Moradi et.al. |
2505.19475 |
null |
2025-05-26 |
Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents |
Ye Ye et.al. |
2505.19436 |
link |
2025-05-26 |
Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering |
Jiajun Zhu et.al. |
2505.19410 |
null |
2025-05-26 |
VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation |
Ethan TS. Liu et.al. |
2505.19395 |
link |
2025-05-25 |
Likert or Not: LLM Absolute Relevance Judgments on Fine-Grained Ordinal Scales |
Charles Godfrey et.al. |
2505.19334 |
null |
2025-05-25 |
LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models |
Aida Kostikova et.al. |
2505.19240 |
null |
2025-05-25 |
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling |
Jialong Zhou et.al. |
2505.19234 |
null |
2025-05-25 |
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling |
Yang Xiao et.al. |
2505.19187 |
link |
2025-05-27 |
When Two LLMs Debate, Both Think They’ll Win |
Pradyumna Shyama Prasad et.al. |
2505.19184 |
null |
2025-05-25 |
Do Large Language Models (Really) Need Statistical Foundations? |
Weijie Su et.al. |
2505.19145 |
null |
2025-05-25 |
CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models |
Yongheng Zhang et.al. |
2505.19108 |
link |
2025-05-25 |
Towards Harmonized Uncertainty Estimation for Large Language Models |
Rui Li et.al. |
2505.19073 |
null |
2025-05-25 |
UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models |
Roman Vashurin et.al. |
2505.19060 |
null |
2025-05-25 |
Online Knowledge Distillation with Reward Guidance |
Chen Jia et.al. |
2505.18952 |
null |
2025-05-25 |
LLM-Guided Taxonomy and Hierarchical Uncertainty for 3D Point Cloud Active Learning |
Chenxi Li et.al. |
2505.18924 |
null |
2025-05-24 |
Mitigating Deceptive Alignment via Self-Monitoring |
Jiaming Ji et.al. |
2505.18807 |
null |
2025-05-24 |
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs |
Tengxuan Liu et.al. |
2505.18610 |
link |
2025-05-24 |
Response Uncertainty and Probe Modeling: Two Sides of the Same Coin in LLM Interpretability? |
Yongjie Wang et.al. |
2505.18575 |
null |
2025-05-24 |
B-score: Detecting biases in large language models using response history |
An Vo et.al. |
2505.18545 |
null |
2025-05-24 |
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation |
Baolei Zhang et.al. |
2505.18543 |
null |
2025-05-24 |
RoleRAG: Enhancing LLM Role-Playing via Graph Guided Retrieval |
Yongjie Wang et.al. |
2505.18541 |
null |
2025-05-24 |
AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking |
Soyoung Yoon et.al. |
2505.18512 |
link |
2025-05-24 |
MedScore: Factuality Evaluation of Free-Form Medical Answers |
Heyuan Huang et.al. |
2505.18452 |
link |
2025-05-23 |
Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps |
Khandakar Ashrafi Akbar et.al. |
2505.18426 |
null |
2025-05-23 |
Model Editing with Graph-Based External Memory |
Yash Kumar Atri et.al. |
2505.18343 |
null |
2025-05-23 |
NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache |
Donghyun Son et.al. |
2505.18231 |
null |
2025-05-23 |
Evidence-Grounded Multimodal Misinformation Detection with Attention-Based GNNs |
Sharad Duwal et.al. |
2505.18221 |
null |
2025-05-26 |
Outcome-based Reinforcement Learning to Predict the Future |
Benjamin Turtel et.al. |
2505.17989 |
null |
2025-05-23 |
LLM Meeting Decision Trees on Tabular Data |
Hangting Ye et.al. |
2505.17918 |
null |
2025-05-23 |
Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour |
Bálint Gyevnár et.al. |
2505.17801 |
null |
2025-05-23 |
C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models |
Amir Hossein Rahmati et.al. |
2505.17773 |
null |
2025-05-23 |
But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors |
Leon Eshuijs et.al. |
2505.17760 |
null |
2025-05-23 |
Get Experience from Practice: LLM Agents with Record & Replay |
Erhu Feng et.al. |
2505.17716 |
null |
2025-05-23 |
Distilling LLM Agent into Small Models with Retrieval and Code Tools |
Minki Kang et.al. |
2505.17612 |
link |
2025-05-23 |
Dynamic Text Bundling Supervision for Zero-Shot Inference on Text-Attributed Graphs |
Yusheng Zhao et.al. |
2505.17599 |
null |
2025-05-23 |
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection |
Shrey Pandit et.al. |
2505.17558 |
null |
2025-05-23 |
How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception |
Shiyu Ni et.al. |
2505.17537 |
null |
2025-05-23 |
CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents |
Minsoo Khang et.al. |
2505.17503 |
null |
2025-05-23 |
keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection |
Saketh Reddy Vemula et.al. |
2505.17485 |
link |
2025-05-23 |
Self-Training Large Language Models with Confident Reasoning |
Hyosoon Jang et.al. |
2505.17454 |
null |
2025-05-23 |
A Fully Generative Motivational Interviewing Counsellor Chatbot for Moving Smokers Towards the Decision to Quit |
Zafarullah Mahmood et.al. |
2505.17362 |
link |
2025-05-22 |
GPT Editors, Not Authors: The Stylistic Footprint of LLMs in Academic Preprints |
Soren DeHaan et.al. |
2505.17327 |
null |
2025-05-22 |
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty |
Peilin Wu et.al. |
2505.17281 |
null |
2025-05-22 |
Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG) |
Clayton Cohn et.al. |
2505.17238 |
null |
2025-05-22 |
LLM-Powered Agents for Navigating Venice’s Historical Cadastre |
Tristan Karch et.al. |
2505.17148 |
null |
2025-05-22 |
When can isotropy help adapt LLMs’ next word prediction to numerical domains? |
Rashed Shelim et.al. |
2505.17135 |
null |
2025-05-21 |
NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction |
Soyeon Kim et.al. |
2505.17125 |
null |
2025-05-22 |
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning |
Huatong Song et.al. |
2505.17005 |
link |
2025-05-22 |
UNCLE: Uncertainty Expressions in Long-Form Generation |
Ruihan Yang et.al. |
2505.16922 |
null |
2025-05-22 |
Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs |
Zeyu Wei et.al. |
2505.16894 |
null |
2025-05-22 |
Walk&Retrieve: Simple Yet Effective Zero-shot Retrieval-Augmented Generation via Knowledge Graph Walks |
Martin Böckling et.al. |
2505.16849 |
link |
2025-05-22 |
Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement |
Kexin Zhang et.al. |
2505.16806 |
null |
2025-05-22 |
Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs |
Zeping Yu et.al. |
2505.16703 |
null |
2025-05-22 |
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator |
Beier Luo et.al. |
2505.16690 |
null |
2025-05-22 |
Collaboration among Multiple Large Language Models for Medical Question Answering |
Kexin Shang et.al. |
2505.16648 |
null |
2025-05-22 |
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering |
Bowen Jiang et.al. |
2505.16591 |
null |
2025-05-22 |
Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs |
Giovanni Servedio et.al. |
2505.16520 |
null |
2025-05-24 |
Recursive Offloading for LLM Serving in Multi-tier Networks |
Zhiyuan Wu et.al. |
2505.16502 |
link |
2025-05-22 |
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery |
Yanbo Zhang et.al. |
2505.16477 |
null |
2025-05-22 |
MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM |
Siwei Meng et.al. |
2505.16456 |
null |
2025-05-22 |
Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems |
Hongru Song et.al. |
2505.16367 |
null |
2025-05-22 |
HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation |
Shijie Zhang et.al. |
2505.16281 |
null |
2025-05-22 |
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation |
Derong Xu et.al. |
2505.16237 |
null |
2025-05-22 |
Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models |
Menschikov Mikhail et.al. |
2505.16134 |
null |
2025-05-22 |
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning |
Junhong Lin et.al. |
2505.16122 |
null |
2025-05-22 |
LLM-Powered AI Agent Systems and Their Applications in Industry |
Guannan Liang et.al. |
2505.16120 |
null |
2025-05-22 |
Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools |
Panagiotis Lymperopoulos et.al. |
2505.16113 |
null |
2025-05-23 |
Continually Self-Improving Language Models for Bariatric Surgery Question–Answering |
Yash Kumar Atri et.al. |
2505.16102 |
null |
2025-05-21 |
Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation |
Ruijie Xi et.al. |
2505.16065 |
null |
2025-05-21 |
SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models |
Roland Daynauth et.al. |
2505.16003 |
null |
2025-05-22 |
HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving |
Zhiwen Chen et.al. |
2505.15793 |
null |
2025-05-21 |
Long-Form Information Alignment Evaluation Beyond Atomic Facts |
Danna Zheng et.al. |
2505.15792 |
null |
2025-05-21 |
Large Language Models as Computable Approximations to Solomonoff Induction |
Jun Wan et.al. |
2505.15784 |
null |
2025-05-21 |
KaFT: Knowledge-aware Fine-tuning for Boosting LLMs’ Domain-specific Question-Answering Performance |
Qihuang Zhong et.al. |
2505.15480 |
null |
2025-05-21 |
AdUE: Improving uncertainty estimation head for LoRA adapters in LLMs |
Artem Zabolotnyi et.al. |
2505.15443 |
null |
2025-05-21 |
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection |
Yiming Huang et.al. |
2505.15386 |
null |
2025-05-21 |
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack |
Silvia Cappelletti et.al. |
2505.15323 |
null |
2025-05-21 |
Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization |
Joonho Yang et.al. |
2505.15291 |
null |
2025-05-21 |
Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs |
Zihao Pan et.al. |
2505.15265 |
null |
2025-05-22 |
Adaptive Plan-Execute Framework for Smart Contract Security Auditing |
Zhiyuan Wei et.al. |
2505.15242 |
null |
2025-05-21 |
Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge |
Yassir Fathullah et.al. |
2505.15240 |
null |
2025-05-21 |
Multilingual Prompting for Improving LLM Generation Diversity |
Qihan Wang et.al. |
2505.15229 |
null |
2025-05-21 |
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs |
Jie Ma et.al. |
2505.15210 |
link |
2025-05-21 |
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection |
Jeonghye Kim et.al. |
2505.15182 |
null |
2025-05-21 |
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning |
Jinghui Lu et.al. |
2505.15154 |
null |
2025-05-21 |
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning |
Shivam Agarwal et.al. |
2505.15134 |
link |
2025-05-21 |
RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals |
Xuanliang Zhang et.al. |
2505.15110 |
null |
2025-05-21 |
Cost-aware LLM-based Online Dataset Annotation |
Eray Can Elumar et.al. |
2505.15101 |
null |
2025-05-21 |
PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration |
Yingming Pu et.al. |
2505.15047 |
link |
2025-05-21 |
Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models |
Zhihao Wen et.al. |
2505.14992 |
null |
2025-05-20 |
JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation |
Ghasem Pasandi et.al. |
2505.14978 |
null |
2025-05-20 |
Foundations of Unknown-aware Machine Learning |
Xuefeng Du et.al. |
2505.14933 |
null |
2025-05-20 |
$\texttt{LLINBO}$: Trustworthy LLM-in-the-Loop Bayesian Optimization |
Chih-Yu Chang et.al. |
2505.14756 |
link |
2025-05-20 |
Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models |
Guangzhi Xiong et.al. |
2505.14599 |
link |
2025-05-20 |
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples |
Chun-Yi Kuan et.al. |
2505.14518 |
null |
2025-05-20 |
Reasoning Models Better Express Their Confidence |
Dongkeun Yoon et.al. |
2505.14489 |
link |
2025-05-21 |
Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis |
Haoming Huang et.al. |
2505.14406 |
null |
2025-05-20 |
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs |
Jiawen Wang et.al. |
2505.14368 |
null |
2025-05-20 |
Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents |
Wei Fan et.al. |
2505.14104 |
null |
2025-05-20 |
MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations |
Ernests Lavrinovics et.al. |
2505.14101 |
link |
2025-05-20 |
Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering |
Yihua Zhu et.al. |
2505.14099 |
null |
2025-05-20 |
ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data |
Xinzhe Zheng et.al. |
2505.14038 |
null |
2025-05-21 |
When LLMs meet open-world graph learning: a new perspective for unlabeled data uncertainty |
Yanzhe Wen et.al. |
2505.13989 |
null |
2025-05-20 |
The Hallucination Tax of Reinforcement Finetuning |
Linxin Song et.al. |
2505.13988 |
null |
2025-05-20 |
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation |
Haoyang Fang et.al. |
2505.13941 |
link |
2025-05-20 |
DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery |
Kun Li et.al. |
2505.13940 |
link |
2025-05-20 |
Preference Learning with Lie Detectors can Induce Honesty or Evasion |
Chris Cundy et.al. |
2505.13787 |
link |
2025-05-19 |
Incentivizing Truthful Language Models via Peer Elicitation Games |
Baiting Chen et.al. |
2505.13636 |
link |
2025-05-19 |
Selective Code Generation for Functional Guarantees |
Jaewoo Jeong et.al. |
2505.13553 |
null |
2025-05-19 |
Exploring Federated Pruning for Large Language Models |
Pengxin Guo et.al. |
2505.13547 |
link |
2025-05-19 |
Know Or Not: a library for evaluating out-of-knowledge base robustness |
Jessica Foo et.al. |
2505.13545 |
link |
2025-05-16 |
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents |
Ayesha Amjad et.al. |
2505.13504 |
null |
2025-05-19 |
GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection |
Zhijie Deng et.al. |
2505.13312 |
null |
2025-05-19 |
Tianyi: A Traditional Chinese Medicine all-rounder language model and its Real-World Clinical Practice |
Zhi Liu et.al. |
2505.13156 |
null |
2025-05-19 |
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning |
Debarpan Bhattacharya et.al. |
2505.13115 |
link |
2025-05-19 |
Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs |
Shmulik Markovich-Golan et.al. |
2505.13060 |
null |
2025-05-19 |
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering |
Jianfeng Cai et.al. |
2505.12826 |
null |
2025-05-19 |
LLM-based Query Expansion Fails for Unfamiliar and Ambiguous Queries |
Kenya Abe et.al. |
2505.12694 |
link |
2025-05-19 |
Know3-RAG: A Knowledge-aware RAG Framework with Adaptive Retrieval, Generation, and Filtering |
Xukai Liu et.al. |
2505.12662 |
link |
2025-05-18 |
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection |
Yang Zhao et.al. |
2505.12457 |
null |
2025-05-18 |
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning |
Qi Wang et.al. |
2505.12434 |
link |
2025-05-18 |
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration |
Wenqiao Zhu et.al. |
2505.12423 |
link |
2025-05-18 |
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization |
Minghan Chen et.al. |
2505.12346 |
null |
2025-05-18 |
Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge |
Luyu Chen et.al. |
2505.12301 |
null |
2025-05-18 |
The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models |
Linghan Huang et.al. |
2505.12287 |
null |
2025-05-18 |
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation |
Chengwei Qin et.al. |
2505.12265 |
null |
2025-05-17 |
The Impact of Emerging Phishing Threats: Assessing Quishing and LLM-generated Phishing Emails against Organizations |
Marie Weinz et.al. |
2505.12104 |
null |
2025-05-20 |
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities |
Jingxue Chen et.al. |
2505.12043 |
null |
2025-05-17 |
SOCIA: An End-to-End Agentic Framework for Automated Cyber-Physical-Social Simulator Generation |
Yuncheng Hua et.al. |
2505.12006 |
null |
2025-05-17 |
TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text |
Ahmed Lekssays et.al. |
2505.11988 |
link |
2025-05-17 |
CCNU at SemEval-2025 Task 3: Leveraging Internal and External Knowledge of Large Language Models for Multilingual Hallucination Annotation |
Xu Liu et.al. |
2505.11965 |
null |
2025-05-17 |
Fine-Grained ECG-Text Contrastive Learning via Waveform Understanding Enhancement |
Haitao Li et.al. |
2505.11939 |
null |
2025-05-17 |
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? |
Zihao Dongfang et.al. |
2505.11907 |
null |
2025-05-17 |
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research |
Guijin Son et.al. |
2505.11855 |
null |
2025-05-17 |
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs |
Xuannan Liu et.al. |
2505.11842 |
link |
2025-05-17 |
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling |
Yitian Chen et.al. |
2505.11792 |
null |
2025-05-17 |
Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission |
Seungeun Oh et.al. |
2505.11788 |
null |
2025-05-16 |
Token-Level Uncertainty Estimation for Large Language Model Reasoning |
Tunyu Zhang et.al. |
2505.11737 |
null |
2025-05-16 |
Efficient Uncertainty Estimation via Distillation of Bayesian Large Language Models |
Harshil Vejendla et.al. |
2505.11731 |
null |
2025-05-16 |
Terminators: Terms of Service Parsing and Auditing Agents |
Maruf Ahmed Mridul et.al. |
2505.11672 |
null |
2025-05-16 |
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models |
Bohao Xing et.al. |
2505.11405 |
link |
2025-05-19 |
Phare: A Safety Probe for Large Language Models |
Pierre Le Jeune et.al. |
2505.11365 |
link |
2025-05-16 |
The Way We Prompt: Conceptual Blending, Neural Dynamics, and Prompt-Induced Transitions in LLMs |
Makoto Sato et.al. |
2505.10948 |
null |
2025-05-19 |
Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation |
Zhan Peng Lee et.al. |
2505.10792 |
link |
2025-05-19 |
Mitigate Language Priors in Large Vision-Language Models by Cross-Images Contrastive Decoding |
Jianfei Zhao et.al. |
2505.10634 |
null |
2025-05-14 |
The Impact of Large Language Models on Task Automation in Manufacturing Services |
Jochen Wulf et.al. |
2505.10581 |
null |
2025-05-20 |
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges |
Ranjan Sapkota et.al. |
2505.10468 |
null |
2025-05-15 |
GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs |
Longchao Da et.al. |
2505.10143 |
null |
2025-05-16 |
Leveraging Graph Retrieval-Augmented Generation to Support Learners’ Understanding of Knowledge Concepts in MOOCs |
Mohamed Abdelmagied et.al. |
2505.10074 |
null |
2025-05-15 |
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis |
Bingda Tang et.al. |
2505.10046 |
link |
2025-05-15 |
Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph |
Deeksha Prahlad et.al. |
2505.09945 |
link |
2025-05-15 |
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Tasks |
Ziyuan Zhang et.al. |
2505.09901 |
link |
2025-05-14 |
A Multimodal Multi-Agent Framework for Radiology Report Generation |
Ziruo Yi et.al. |
2505.09787 |
null |
2025-05-14 |
Trustless Autonomy: Understanding Motivations, Benefits and Governance Dilemma in Self-Sovereign Decentralized AI Agents |
Botao Amber Hu et.al. |
2505.09757 |
null |
2025-05-15 |
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation |
Achref Doula et.al. |
2505.09427 |
null |
2025-05-14 |
Statistical Modeling and Uncertainty Estimation of LLM Inference Systems |
Kaustabha Ray et.al. |
2505.09319 |
null |
2025-05-14 |
Atomic Consistency Preference Optimization for Long-Form Question Answering |
Jingfeng Chen et.al. |
2505.09039 |
link |
2025-05-13 |
Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification |
Adarsh Kumar et.al. |
2505.09031 |
null |
2025-05-13 |
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training |
Yangyi Chen et.al. |
2505.08971 |
link |
2025-05-13 |
CellTypeAgent: Trustworthy cell type annotation with Large Language Models |
Jiawen Chen et.al. |
2505.08844 |
link |
2025-05-13 |
Adaptive Schema-aware Event Extraction with Retrieval-Augmented Generation |
Sheng Liang et.al. |
2505.08690 |
null |
2025-05-13 |
RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models |
Fujun Zhang et.al. |
2505.08463 |
null |
2025-05-13 |
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs |
Artem Shelmanov et.al. |
2505.08200 |
null |
2025-05-12 |
LLMs to Support K-12 Teachers in Culturally Relevant Pedagogy: An AI Literacy Example |
Jiayi Wang et.al. |
2505.08083 |
null |
2025-05-11 |
TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking |
Ching Nam Hang et.al. |
2505.07891 |
null |
2025-05-10 |
Recovering Event Probabilities from Large Language Model Embeddings via Axiomatic Constraints |
Jian-Qiao Zhu et.al. |
2505.07883 |
null |
2025-05-09 |
Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy |
A M Muntasir Rahman et.al. |
2505.07871 |
null |
2025-05-12 |
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding |
Yifeng Di et.al. |
2505.07768 |
link |
2025-05-12 |
KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation |
Ching Han Chen et.al. |
2505.07618 |
null |
2025-05-12 |
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent |
Ziyang Huang et.al. |
2505.07596 |
null |
2025-05-12 |
Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models |
Bahram Mohammadi et.al. |
2505.07500 |
null |
2025-05-12 |
Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis |
Heydar Soudani et.al. |
2505.07459 |
null |
2025-05-12 |
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning |
Xiaotian Lin et.al. |
2505.07437 |
link |
2025-05-12 |
Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data |
David de-Fitero-Dominguez et.al. |
2505.07372 |
null |
2025-05-12 |
Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection |
Pei-Fu Guo et.al. |
2505.07309 |
null |
2025-05-12 |
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs |
Yifan Wei et.al. |
2505.07184 |
link |
2025-05-13 |
Exploring Anthropomorphism in Conversational Agents for Environmental Sustainability |
Mathyas Giudici et.al. |
2505.07142 |
null |
2025-05-14 |
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models |
Hanzheng Dai et.al. |
2505.07089 |
null |
2025-05-10 |
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models |
Yangguang Shao et.al. |
2505.06579 |
link |
2025-05-10 |
LLM-Flock: Decentralized Multi-Robot Flocking via Large Language Models and Influence-Based Consensus |
Peihan Li et.al. |
2505.06513 |
null |
2025-05-09 |
Evolutionary thoughts: integration of large language models and evolutionary algorithms |
Antonio Jimeno Yepes et.al. |
2505.05756 |
link |
2025-05-08 |
Adaptive Stress Testing Black-Box LLM Planners |
Neeloy Chakraborty et.al. |
2505.05665 |
null |
2025-05-08 |
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics |
Lennart Luettgau et.al. |
2505.05602 |
link |
2025-05-08 |
FLAM: Frame-Wise Language-Audio Modeling |
Yusong Wu et.al. |
2505.05335 |
null |
2025-05-08 |
MARK: Memory Augmented Refinement of Knowledge |
Anish Ganguli et.al. |
2505.05177 |
null |
2025-05-08 |
A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network |
Haoxiang Luo et.al. |
2505.05103 |
null |
2025-05-08 |
Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware |
Yujia Chen et.al. |
2505.05057 |
link |
2025-05-08 |
An Open-Source Dual-Loss Embedding Model for Semantic Retrieval in Higher Education |
Ramteja Sajja et.al. |
2505.04916 |
null |
2025-05-07 |
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards |
Manveer Singh Tamber et.al. |
2505.04847 |
link |
2025-05-07 |
Osiris: A Lightweight Open-Source Hallucination Detection System |
Alex Shan et.al. |
2505.04844 |
null |
2025-05-07 |
A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models |
Pedro Pinacho-Davidson et.al. |
2505.04784 |
null |
2025-05-07 |
The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems |
Sutapa Dey Tithi et.al. |
2505.04736 |
null |
2025-05-06 |
Advancing Conversational Diagnostic AI with Multimodal Reasoning |
Khaled Saab et.al. |
2505.04653 |
null |
2025-05-06 |
Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions |
Adithya Kulkarni et.al. |
2505.04651 |
null |
2025-05-09 |
MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection |
Zhihao Zhang et.al. |
2505.04594 |
null |
2025-05-07 |
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters |
David Exler et.al. |
2505.04393 |
null |
2025-05-07 |
Benchmarking LLMs’ Swarm intelligence |
Kai Ruan et.al. |
2505.04364 |
link |
2025-05-07 |
LLM-Independent Adaptive RAG: Let the Question Speak for Itself |
Maria Marina et.al. |
2505.04253 |
null |
2025-05-07 |
Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications |
Yuanai Xie et.al. |
2505.04068 |
null |
2025-05-02 |
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs |
Ganghua Wang et.al. |
2505.03814 |
null |
2025-05-02 |
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance |
Xing Hu et.al. |
2505.03804 |
null |
2025-05-02 |
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth |
Changhai Zhou et.al. |
2505.03802 |
null |
2025-04-30 |
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding |
Trilok Padhi et.al. |
2505.03788 |
null |
2025-05-06 |
A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning |
Kolawole E. Ogunsina et.al. |
2505.03553 |
null |
2025-05-06 |
Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis |
Shuang Zhou et.al. |
2505.03467 |
null |
2025-05-06 |
Automatic Calibration for Membership Inference Attack on Large Language Models |
Saleh Zare Zade et.al. |
2505.03392 |
link |
2025-05-06 |
Interpretable Zero-shot Learning with Infinite Class Concepts |
Zihan Ye et.al. |
2505.03361 |
null |
2025-05-06 |
Artificial Behavior Intelligence: Technology, Challenges, and Future Directions |
Kanghyun Jo et.al. |
2505.03315 |
null |
2025-05-06 |
A Trustworthy Multi-LLM Network: Challenges, Solutions, and A Use Case |
Haoxiang Luo et.al. |
2505.03196 |
null |
2025-05-06 |
Assessing and Enhancing the Robustness of LLM-based Multi-Agent Systems Through Chaos Engineering |
Joshua Owotogbe et.al. |
2505.03096 |
null |
2025-05-05 |
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models |
Zhengliang Shi et.al. |
2505.03075 |
link |
2025-05-05 |
UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output |
Sicong Huang et.al. |
2505.03030 |
null |
2025-05-05 |
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? |
Guangzhi Sun et.al. |
2505.02884 |
null |
2025-05-05 |
Phase transitions in AI-human interaction networks: statistics, computation, and probabilistic modeling |
Jackson George et.al. |
2505.02879 |
null |
2025-05-08 |
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations |
Dmitriy Shopkhoev et.al. |
2505.02819 |
link |
2025-05-05 |
Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing |
Diji Yang et.al. |
2505.02811 |
link |
2025-05-06 |
Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation |
Gerard Pons et.al. |
2505.02737 |
null |
2025-05-04 |
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation |
Tanguy Herserant et.al. |
2505.02235 |
null |
2025-05-12 |
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation |
Aidan Curtis et.al. |
2505.02216 |
null |
2025-05-04 |
Large Language Models are overconfident and amplify human bias |
Fengfei Sun et.al. |
2505.02151 |
null |
2025-05-04 |
VECSR: Virtually Embodied Common Sense Reasoning System |
Alexis R. Tudor et.al. |
2505.02144 |
link |
2025-05-06 |
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation |
Chenxi Liu et.al. |
2505.02138 |
link |
2025-05-04 |
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach |
Jiancong Xiao et.al. |
2505.01997 |
null |
2025-05-03 |
High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers |
Brian Wong et.al. |
2505.01693 |
null |
2025-05-02 |
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation |
Liaoyaqi Wang et.al. |
2505.01595 |
null |
2025-05-02 |
Retrieval Augmented Learning: A Retrial-based Large Language Model Self-Supervised Learning and Autonomous Knowledge Generation |
Zongyuan Li et.al. |
2505.01073 |
null |
2025-05-02 |
Multi-agents based User Values Mining for Recommendation |
Lijian Chen et.al. |
2505.00981 |
null |
2025-05-01 |
Multivariate Conformal Selection |
Tian Bai et.al. |
2505.00917 |
null |
2025-05-08 |
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation |
Quang P. M. Pham et.al. |
2505.00831 |
link |
2025-05-01 |
HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models |
Zhaoxing Li et.al. |
2505.00820 |
null |
2025-05-01 |
A Survey on Large Language Model based Human-Agent Systems |
Henry Peng Zou et.al. |
2505.00753 |
link |
2025-05-05 |
Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs |
Dung Nguyen et.al. |
2505.00744 |
null |
2025-05-01 |
Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models |
Makoto Sato et.al. |
2505.00557 |
null |
2025-05-01 |
HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection |
Deanna Emery et.al. |
2505.00506 |
null |
2025-05-01 |
Distributed Retrieval-Augmented Generation |
Chenhao Xu et.al. |
2505.00443 |
link |
2025-04-30 |
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs |
Jinyan Su et.al. |
2505.00127 |
null |
2025-04-30 |
Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 |
Jeho Choi et.al. |
2505.00060 |
null |
2025-04-24 |
An Empirical Study on Prompt Compression for Large Language Models |
Zheng Zhang et.al. |
2505.00019 |
link |
2025-04-30 |
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness |
Junsheng Huang et.al. |
2504.21773 |
null |
2025-04-30 |
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA |
Xuanzhao Dong et.al. |
2504.21252 |
link |
2025-05-01 |
AI-in-the-Loop Planning for Transportation Electrification: Case Studies from Austin, Texas |
Seung Jun Choi et.al. |
2504.21185 |
null |
2025-04-29 |
LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge |
Naheed Rayhan et.al. |
2504.21132 |
null |
2025-04-22 |
ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees |
Jun Wang et.al. |
2504.21022 |
null |
2025-04-22 |
Context-Enhanced Contrastive Search for Improved LLM Text Generation |
Jaydip Sen et.al. |
2504.21020 |
null |
2025-04-29 |
Jekyll-and-Hyde Tipping Point in an AI’s Behavior |
Neil F. Johnson et.al. |
2504.20980 |
null |
2025-04-29 |
SetKE: Knowledge Editing for Knowledge Elements Overlap |
Yifan Wei et.al. |
2504.20972 |
null |
2025-04-29 |
Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models |
Maryna Vyshnyvetska et.al. |
2504.20951 |
null |
2025-04-29 |
DYNAMAX: Dynamic computing for Transformers and Mamba based architectures |
Miguel Nogales et.al. |
2504.20922 |
null |
2025-04-29 |
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges |
Yunseo Lee et.al. |
2504.20799 |
null |
2025-04-29 |
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think |
Hasan Abed Al Kader Hammoud et.al. |
2504.20708 |
null |
2025-04-29 |
Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? |
Evangelia Gogoulou et.al. |
2504.20699 |
null |
2025-04-29 |
Identifying Uncertainty in Self-Adaptive Robotics with Large Language Models |
Hassan Sartaj et.al. |
2504.20684 |
null |
2025-04-30 |
TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data |
Qi Wang et.al. |
2504.20462 |
null |
2025-04-28 |
Towards Large Language Models for Lunar Mission Planning and In Situ Resource Utilization |
Michael Pekala et.al. |
2504.20125 |
null |
2025-04-24 |
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning |
Zihan Wang et.al. |
2504.20073 |
link |
2025-04-28 |
Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages |
Pritika Rohera et.al. |
2504.20022 |
null |
2025-04-28 |
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models |
Xin Wang et.al. |
2504.20020 |
null |
2025-04-28 |
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets |
Mingqian He et.al. |
2504.19898 |
null |
2025-04-28 |
A Tripartite Perspective on GraphRAG |
Michael Banf et.al. |
2504.19667 |
null |
2025-04-28 |
An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination |
Dixiao Wei et.al. |
2504.19480 |
null |
2025-04-28 |
Towards Long Context Hallucination Detection |
Siyi Liu et.al. |
2504.19457 |
null |
2025-04-27 |
Bi-directional Model Cascading with Proxy Confidence |
David Warren et.al. |
2504.19391 |
null |
2025-04-27 |
The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach |
Chad Coleman et.al. |
2504.19255 |
null |
2025-04-30 |
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers |
Dylan Bouchard et.al. |
2504.19254 |
link |
2025-04-27 |
Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models |
Anindya Bijoy Das et.al. |
2504.19061 |
null |
2025-04-26 |
Calibrating Translation Decoding with Quality Estimation on LLMs |
Di Wu et.al. |
2504.19044 |
link |
2025-04-26 |
AI Chatbots for Mental Health: Values and Harms from Lived Experiences of Depression |
Dong Whi Yoo et.al. |
2504.18932 |
null |
2025-04-26 |
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning |
Abdellah Ghassel et.al. |
2504.18839 |
null |
2025-04-25 |
Span-Level Hallucination Detection for LLM-Generated Answers |
Passant Elchafei et.al. |
2504.18639 |
null |
2025-04-24 |
Toward Personalizing Quantum Computing Education: An Evolutionary LLM-Powered Approach |
Iizalaarab Elhaimeur et.al. |
2504.18603 |
null |
2025-04-25 |
LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection |
Rajesh Yarra et.al. |
2504.18423 |
null |
2025-04-25 |
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review |
Toghrul Abbasli et.al. |
2504.18346 |
null |
2025-04-25 |
Evaluating Evaluation Metrics – The Mirage of Hallucination Detection |
Atharva Kulkarni et.al. |
2504.18114 |
null |
2025-04-25 |
Random-Set Large Language Models |
Muhammad Mubashar et.al. |
2504.18085 |
null |
2025-04-25 |
Validating Network Protocol Parsers with Traceable RFC Document Interpretation |
Mingwei Zheng et.al. |
2504.18050 |
null |
2025-04-24 |
LLM Agent Swarm for Hypothesis-Driven Drug Discovery |
Kevin Song et.al. |
2504.17967 |
null |
2025-04-24 |
HalluLens: LLM Hallucination Benchmark |
Yejin Bang et.al. |
2504.17550 |
null |
2025-04-24 |
Combining Static and Dynamic Approaches for Mining and Testing Constraints for RESTful API Testing |
Hieu Huynh et.al. |
2504.17287 |
null |
2025-04-23 |
How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study |
Rendi Chevi et.al. |
2504.17083 |
null |
2025-04-23 |
Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models |
Shariar Kabir et.al. |
2504.17052 |
null |
2025-04-23 |
(Im)possibility of Automated Hallucination Detection in Large Language Models |
Amin Karbasi et.al. |
2504.17004 |
null |
2025-04-18 |
SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments |
Dachun Sun et.al. |
2504.16947 |
null |
2025-04-23 |
Enhancing Critical Thinking with AI: A Tailored Warning System for RAG Models |
Xuyang Zhu et.al. |
2504.16883 |
null |
2025-04-23 |
Monte Carlo Planning with Large Language Model for Text-Based Game Agents |
Zijing Shi et.al. |
2504.16855 |
null |
2025-04-23 |
Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories |
Mareike Lisker et.al. |
2504.16604 |
null |
2025-04-23 |
ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving |
Jie JW Wu et.al. |
2504.16331 |
null |
2025-04-23 |
Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations |
Nikhil Khandalkar et.al. |
2504.15903 |
null |
2025-04-22 |
Dynamic Early Exit in Reasoning Models |
Chenxu Yang et.al. |
2504.15895 |
link |
2025-04-22 |
Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback |
Ning Wang et.al. |
2504.15804 |
null |
2025-04-22 |
Grounded in Context: Retrieval-Based Method for Hallucination Detection |
Assaf Gerner et.al. |
2504.15771 |
null |
2025-04-20 |
PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind |
Yajie Yu et.al. |
2504.15313 |
null |
2025-04-21 |
Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning |
Ehsan Ahmadi et.al. |
2504.15263 |
null |
2025-04-21 |
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges |
Nandan Thakur et.al. |
2504.15205 |
null |
2025-04-21 |
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models |
Ronak Pradeep et.al. |
2504.15068 |
null |
2025-04-23 |
aiXamine: Simplified LLM Safety and Security |
Fatih Deniz et.al. |
2504.14985 |
null |
2025-04-21 |
POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications |
Chunjing Gan et.al. |
2504.14917 |
null |
2025-04-21 |
CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs |
Yingming Zheng et.al. |
2504.14905 |
link |
2025-04-20 |
HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis |
Kangwei Xu et.al. |
2504.14641 |
null |
2025-04-20 |
A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models |
Hongming Tan et.al. |
2504.14620 |
null |
2025-04-20 |
a1: Steep Test-time Scaling Law via Environment Augmented Generation |
Lingrui Mei et.al. |
2504.14597 |
null |
2025-04-20 |
Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey |
Ahsan Bilal et.al. |
2504.14520 |
null |
2025-04-20 |
VizTA: Enhancing Comprehension of Distributional Visualization with Visual-Lexical Fused Conversational Interface |
Liangwei Wang et.al. |
2504.14507 |
null |
2025-04-20 |
CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge |
Armin Toroghi et.al. |
2504.14462 |
null |
2025-04-20 |
Information Diffusion and Preferential Attachment in a Network of Large Language Models |
Adit Jain et.al. |
2504.14438 |
null |
2025-04-20 |
ResNetVLLM-2: Addressing ResNetVLLM’s Multi-Modal Hallucinations |
Ahmad Khalil et.al. |
2504.14429 |
null |
2025-04-19 |
Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts |
Kun Qian et.al. |
2504.14375 |
null |
2025-04-19 |
Density Measures for Language Generation |
Jon Kleinberg et.al. |
2504.14370 |
null |
2025-04-19 |
Integrating LLM-Generated Views into Mean-Variance Optimization Using the Black-Litterman Model |
Youngbin Lee et.al. |
2504.14345 |
link |
2025-04-19 |
A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models |
Chengming Wang et.al. |
2504.14241 |
null |
2025-04-18 |
Metacognition and Uncertainty Communication in Humans and Large Language Models |
Mark Steyvers et.al. |
2504.14045 |
null |
2025-04-18 |
Multi-Stage Retrieval for Operational Technology Cybersecurity Compliance Using Large Language Models: A Railway Casestudy |
Regan Bolton et.al. |
2504.14044 |
null |
2025-04-18 |
Going Whole Hog: A Philosophical Defense of AI Cognition |
Herman Cappelen et.al. |
2504.13988 |
null |
2025-04-18 |
Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations |
Chenghao Xiao et.al. |
2504.13816 |
link |
2025-04-18 |
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results |
Andrea Santilli et.al. |
2504.13677 |
null |
2025-04-18 |
Do Prompt Patterns Affect Code Quality? A First Empirical Assessment of ChatGPT-Generated Code |
Antonio Della Porta et.al. |
2504.13656 |
null |
2025-04-18 |
Exploring the Potential for Large Language Models to Demonstrate Rational Probabilistic Beliefs |
Gabriel Freedman et.al. |
2504.13644 |
link |
2025-04-18 |
Long-context Non-factoid Question Answering in Indic Languages |
Ritwik Mishra et.al. |
2504.13615 |
link |
2025-04-18 |
Continual Pre-Training is (not) What You Need in Domain Adaption |
Pin-Er Chen et.al. |
2504.13603 |
null |
2025-04-18 |
Trust, but verify |
Michael J. Yuan et.al. |
2504.13443 |
null |
2025-04-17 |
Energy-Based Reward Models for Robust Language Model Alignment |
Anamika Lochab et.al. |
2504.13134 |
link |
2025-04-17 |
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models |
Haojian Huang et.al. |
2504.13122 |
link |
2025-04-17 |
Accommodate Knowledge Conflicts in Retrieval-augmented LLMs: Towards Reliable Response Generation in the Wild |
Jiatai Wang et.al. |
2504.12982 |
null |
2025-04-17 |
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? |
Zhouyang Jiang et.al. |
2504.12961 |
null |
2025-04-18 |
Customizing Emotional Support: How Do Individuals Construct and Interact With LLM-Powered Chatbots |
Xi Zheng et.al. |
2504.12943 |
null |
2025-04-17 |
Explainable AI in Usable Privacy and Security: Challenges and Opportunities |
Vincent Freiberger et.al. |
2504.12931 |
null |
2025-04-17 |
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration |
Yicheng Pan et.al. |
2504.12773 |
link |
2025-04-17 |
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations |
Yiyou Sun et.al. |
2504.12691 |
link |
2025-04-17 |
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models |
Liyi Zhang et.al. |
2504.12585 |
link |
2025-04-16 |
PlanGlow: Personalized Study Planning with an Explainable and Controllable LLM-Driven System |
Jiwon Chun et.al. |
2504.12452 |
link |
2025-04-16 |
Don’t Just Translate, Agitate: Using Large Language Models as Devil’s Advocates for AI Explanations |
Ashley Suh et.al. |
2504.12424 |
null |
2025-04-16 |
Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study |
Harry Li et.al. |
2504.12422 |
null |
2025-04-16 |
Gauging Overprecision in LLMs: An Empirical Study |
Adil Bahaj et.al. |
2504.12098 |
null |
2025-04-16 |
Purposefully Induced Psychosis (PIP): Embracing Hallucination as Imagination in Large Language Models |
Kris Pilcher et.al. |
2504.12012 |
null |
2025-04-16 |
SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes |
Raúl Vázquez et.al. |
2504.11975 |
null |
2025-04-16 |
Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading |
Kihyun Kim et.al. |
2504.11816 |
link |
2025-04-16 |
Probing the Unknown: Exploring Student Interactions with Probeable Problems at Scale in Introductory Programming |
Paul Denny et.al. |
2504.11723 |
null |
2025-04-15 |
From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs |
Guocong Li et.al. |
2504.11277 |
null |
2025-04-16 |
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR |
Yulong Zhang et.al. |
2504.11101 |
null |
2025-04-15 |
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique |
Shuhang Liu et.al. |
2504.11009 |
null |
2025-04-14 |
CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates |
Ankit Kumar Shaw et.al. |
2504.10738 |
null |
2025-04-14 |
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving |
Avinash Kumar et.al. |
2504.10724 |
null |
2025-04-14 |
EMAFusion: A Self-Optimizing System for Seamless LLM Selection and Integration |
Soham Shah et.al. |
2504.10681 |
null |
2025-04-14 |
Efficient Process Reward Model Training via Active Learning |
Keyu Duan et.al. |
2504.10559 |
link |
2025-04-09 |
Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion |
Jakub Podolak et.al. |
2504.10509 |
null |
2025-04-14 |
Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? |
Olha Shaposhnyk et.al. |
2504.10397 |
null |
2025-04-16 |
Heimdall: test-time scaling on the generative verification |
Wenlei Shi et.al. |
2504.10337 |
null |
2025-04-14 |
From Prompting to Alignment: A Generative Framework for Query Recommendation |
Erxue Min et.al. |
2504.10208 |
null |
2025-04-14 |
DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation |
Hanghui Guo et.al. |
2504.10198 |
null |
2025-04-14 |
HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection |
Mohamed A. Abdallah et.al. |
2504.10168 |
null |
2025-04-14 |
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation |
Xu Zhang et.al. |
2504.10167 |
null |
2025-04-14 |
The Human Visual System Can Inspire New Interaction Paradigms for LLMs |
Diana Robinson et.al. |
2504.10101 |
null |
2025-04-14 |
Hallucination Detection in LLMs via Topological Divergence on Attention Graphs |
Alexandra Bazarova et.al. |
2504.10063 |
null |
2025-04-15 |
Emotional Strain and Frustration in LLM Interactions in Software Engineering |
Cristina Martinez Montes et.al. |
2504.10050 |
null |
2025-04-14 |
DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify |
Zhengxuan Zhang et.al. |
2504.10036 |
null |
2025-04-14 |
EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control |
Hanwen Wan et.al. |
2504.10030 |
link |
2025-04-14 |
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference |
Yuxuan Tian et.al. |
2504.09936 |
null |
2025-04-14 |
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data |
Shuai Zhao et.al. |
2504.09895 |
null |
2025-04-14 |
Reasoning Models Can Be Effective Without Thinking |
Wenjie Ma et.al. |
2504.09858 |
null |
2025-04-14 |
RAKG: Document-level Retrieval Augmented Knowledge Graph Construction |
Hairong Zhang et.al. |
2504.09823 |
link |
2025-04-14 |
Reasoning Court: Combining Reasoning, Action, and Judgment for Multi-Hop Reasoning |
Jingtian Wu et.al. |
2504.09781 |
null |
2025-04-13 |
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training |
Zhenting Wang et.al. |
2504.09710 |
link |
2025-04-17 |
Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws |
Zhixuan Pan et.al. |
2504.09597 |
null |
2025-04-17 |
ControlNET: A Firewall for RAG-based LLM System |
Hongwei Yao et.al. |
2504.09593 |
null |
2025-04-13 |
How new data permeates LLM knowledge and how to dilute it |
Chen Sun et.al. |
2504.09522 |
null |
2025-04-13 |
HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs |
Sharanya Dasgupta et.al. |
2504.09482 |
link |
2025-04-13 |
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection |
MingShan Liu et.al. |
2504.09440 |
null |
2025-04-12 |
Continuum-Interaction-Driven Intelligence: Human-Aligned Neural Architecture via Crystallized Reasoning and Fluid Generation |
Pengcheng Zhou et.al. |
2504.09301 |
null |
2025-04-12 |
SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders |
Ashmi Banerjee et.al. |
2504.09277 |
null |
2025-04-12 |
Towards More Efficient, Robust, Instance-adaptive, and Generalizable Online Learning |
Zhiyong Wang et.al. |
2504.09192 |
null |
2025-04-11 |
Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation |
Terrence Neumann et.al. |
2504.08954 |
null |
2025-04-11 |
Knowledge Graph-extended Retrieval Augmented Generation for Question Answering |
Jasper Linders et.al. |
2504.08893 |
null |
2025-04-11 |
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning |
Fangzhi Xu et.al. |
2504.08672 |
link |
2025-04-11 |
MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation |
Tao Zhang et.al. |
2504.08621 |
link |
2025-04-16 |
Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks |
Ye Ye et.al. |
2504.08525 |
link |
2025-04-07 |
SEAL: Steerable Reasoning Calibration of Large Language Models for Free |
Runjin Chen et.al. |
2504.07986 |
link |
2025-04-10 |
Token Level Routing Inference System for Edge Devices |
Jianshu She et.al. |
2504.07878 |
null |
2025-04-10 |
Robust Hallucination Detection in LLMs via Adaptive Token Selection |
Mengjia Niu et.al. |
2504.07863 |
null |
2025-04-17 |
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization |
Yang Jiao et.al. |
2504.07717 |
null |
2025-04-10 |
Synthetic Fluency: Hallucinations, Confabulations, and the Creation of Irish Words in LLM-Generated Translations |
Sheila Castilho et.al. |
2504.07680 |
null |
2025-04-10 |
Enhancing Large Language Models through Neuro-Symbolic Integration and Ontological Reasoning |
Ruslan Idelfonso Magana Vsevolodovna et.al. |
2504.07640 |
link |
2025-04-11 |
Malware analysis assisted by AI with R2AI |
Axelle Apvrille et.al. |
2504.07574 |
null |
2025-04-10 |
A taxonomy of epistemic injustice in the context of AI and the case for generative hermeneutical erasure |
Warmhold Jan Thomas Mollema et.al. |
2504.07531 |
null |
2025-04-10 |
Supervised Optimism Correction: Be Confident When LLMs Are Sure |
Junjie Zhang et.al. |
2504.07527 |
null |
2025-04-10 |
Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction |
Kyoyun Choi et.al. |
2504.07415 |
null |
2025-04-10 |
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression |
Hanqi Xiao et.al. |
2504.07389 |
link |
2025-04-11 |
Alice: Proactive Learning with Teacher’s Demonstrations for Weak-to-Strong Generalization |
Shujin Wu et.al. |
2504.07316 |
link |
2025-04-09 |
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification |
Bibek Paudel et.al. |
2504.07069 |
null |
2025-04-11 |
Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration |
Kostas Hatalis et.al. |
2504.06943 |
null |
2025-04-09 |
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program |
Minghe Gao et.al. |
2504.06606 |
link |
2025-04-09 |
Do Reasoning Models Show Better Verbalized Calibration? |
Qingcheng Zeng et.al. |
2504.06564 |
null |
2025-04-08 |
Don’t Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning |
Yuehan Qin et.al. |
2504.06438 |
null |
2025-04-08 |
Human Trust in AI Search: A Large-Scale Experiment |
Haiwen Li et.al. |
2504.06435 |
null |
2025-04-09 |
GOLLuM: Gaussian Process Optimized LLMs – Reframing LLM Finetuning through Bayesian Optimization |
Bojana Ranković et.al. |
2504.06265 |
link |
2025-04-08 |
VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs |
Dongjun Qian et.al. |
2504.05673 |
null |
2025-04-08 |
On the Impact of Language Nuances on Sentiment Analysis with Large Language Models: Paraphrasing, Sarcasm, and Emojis |
Naman Bhargava et.al. |
2504.05603 |
null |
2025-04-07 |
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases |
Alfred Clemedtson et.al. |
2504.05478 |
link |
2025-04-07 |
The challenge of uncertainty quantification of large language models in medicine |
Zahra Atf et.al. |
2504.05278 |
null |
2025-04-07 |
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation |
Xinglin Lyu et.al. |
2504.05122 |
link |
2025-04-07 |
On the Performance of an Explainable Language Model on PubMedQA |
Venkat Srinivasan et.al. |
2504.05074 |
null |
2025-04-07 |
Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning |
Sugyeong Eo et.al. |
2504.05047 |
null |
2025-04-07 |
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models |
Carlos Peláez-González et.al. |
2504.04976 |
null |
2025-04-07 |
A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization |
Wenyuan Xu et.al. |
2504.04950 |
null |
2025-04-06 |
Capturing AI’s Attention: Physics of Repetition, Hallucination, Bias and Beyond |
Frank Yingjie Huo et.al. |
2504.04600 |
null |
2025-04-06 |
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models |
Rui Gan et.al. |
2504.04562 |
link |
2025-04-06 |
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT |
Zhuo Zhi et.al. |
2504.04471 |
null |
2025-04-06 |
An overview of model uncertainty and variability in LLM-based sentiment analysis. Challenges, mitigation strategies and the role of explainability |
David Herrera-Poyatos et.al. |
2504.04462 |
null |
2025-04-09 |
How Accurately Do Large Language Models Understand Code? |
Sabaat Haroon et.al. |
2504.04372 |
null |
2025-04-06 |
Generative Large Language Models Trained for Detecting Errors in Radiology Reports |
Cong Sun et.al. |
2504.04336 |
null |
2025-04-09 |
Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks |
Marios Kokkodis et.al. |
2504.04277 |
null |
2025-04-05 |
Adaptive Elicitation of Latent Information Using Natural Language |
Jimmy Wang et.al. |
2504.04204 |
null |
2025-04-04 |
Structured Extraction of Process Structure Properties Relationships in Materials Science |
Amit K Verma et.al. |
2504.03979 |
null |
2025-04-04 |
Bridging LMS and Generative AI: Dynamic Course Content Integration (DCCI) for Connecting LLMs to Course Content – The Ask ME Assistant |
Kovan Mzwri et.al. |
2504.03966 |
null |
2025-04-04 |
Practical Poisoning Attacks against Retrieval-Augmented Generation |
Baolei Zhang et.al. |
2504.03957 |
null |
2025-04-04 |
The H-Elena Trojan Virus to Infect Model Weights: A Wake-Up Call on the Security Risks of Malicious Fine-Tuning |
Virilo Tejedor et.al. |
2504.03823 |
null |
2025-04-04 |
Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy |
Kamil Ciosek et.al. |
2504.03579 |
null |
2025-04-04 |
Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej |
Shubham Kumar Nigam et.al. |
2504.03486 |
null |
2025-04-07 |
LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications |
Botao Zhu et.al. |
2504.03444 |
null |
2025-04-04 |
Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models |
Mirko Borszukovszki et.al. |
2504.03440 |
null |
2025-04-04 |
Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models |
Afshin Khadangi et.al. |
2504.03302 |
link |
2025-04-04 |
Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations |
Maik Larooij et.al. |
2504.03274 |
null |
2025-04-04 |
Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation |
Weitao Li et.al. |
2504.03165 |
link |
2025-04-03 |
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence |
Hongzhe Du et.al. |
2504.02904 |
null |
2025-04-03 |
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models |
Liangjie Huang et.al. |
2504.02902 |
null |
2025-04-01 |
Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications |
Hongliu Cao et.al. |
2504.02867 |
null |
2025-04-01 |
The Illusionist’s Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances |
Yining Wang et.al. |
2504.02865 |
null |
2025-04-03 |
A Memory-Augmented LLM-Driven Method for Autonomous Merging of 3D Printing Work Orders |
Yuhao Liu et.al. |
2504.02509 |
null |
2025-04-03 |
Cognitive Memory in Large Language Models |
Lianlei Shan et.al. |
2504.02441 |
null |
2025-04-02 |
Achieving Unanimous Consensus in Decision Making Using Multi-Agents |
Apurba Pokharel et.al. |
2504.02128 |
null |
2025-04-02 |
Aligned Better, Listen Better for Audio-Visual Large Language Models |
Yuxin Guo et.al. |
2504.02061 |
null |
2025-04-03 |
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation |
Baban Gain et.al. |
2504.01919 |
null |
2025-04-02 |
LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution |
Zhuoran Yang et.al. |
2504.01533 |
null |
2025-04-03 |
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding |
Sakhinana Sagar Srinivas et.al. |
2504.01281 |
null |
2025-04-01 |
Grade Guard: A Smart System for Short Answer Automated Grading |
Niharika Dadu et.al. |
2504.01253 |
null |
2025-04-01 |
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models |
Rafael Giebisch et.al. |
2504.01248 |
null |
2025-04-01 |
Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery |
Nicholas Clark et.al. |
2504.01205 |
null |
2025-04-01 |
$μ$KE: Matryoshka Unstructured Knowledge Editing of Large Language Models |
Zian Su et.al. |
2504.01196 |
null |
2025-04-01 |
Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations |
Mahjabin Nahar et.al. |
2504.01153 |
link |
2025-04-01 |
MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs) |
Bikash Saha et.al. |
2504.01145 |
link |
2025-04-01 |
Investigating Large Language Models in Diagnosing Students’ Cognitive Skills in Math Problem-solving |
Hyoungwook Jin et.al. |
2504.00843 |
null |
2025-04-01 |
Aplicação de Large Language Models na Análise e Síntese de Documentos Jurídicos: Uma Revisão de Literatura |
Matheus Belarmino et.al. |
2504.00725 |
null |
2025-04-01 |
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments |
Enjun Du et.al. |
2504.00711 |
null |
2025-04-01 |
DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism |
Dengchun Li et.al. |
2504.00661 |
link |
2025-04-01 |
Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences |
Xiangyang Liu et.al. |
2504.00473 |
null |
2025-04-01 |
Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics |
Shide Zhou et.al. |
2504.00446 |
null |
2025-04-01 |
Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding |
Mohanakrishnan Hariharan et.al. |
2504.00409 |
null |
2025-04-01 |
When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) |
Mahak Agarwal et.al. |
2504.00374 |
null |
2025-03-31 |
SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning |
Shiyue Zhao et.al. |
2504.00115 |
null |
2025-03-30 |
Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge |
Agam Shah et.al. |
2504.00042 |
null |
2025-03-27 |
Medical Reasoning in LLMs: An In-Depth Analysis of DeepSeek R1 |
Birger Moell et.al. |
2504.00016 |
null |
2025-03-31 |
SQuat: Subspace-orthogonal KV Cache Quantization |
Hao Wang et.al. |
2503.24358 |
null |
2025-03-31 |
Model Hemorrhage and the Robustness Limits of Large Language Models |
Ziyang Ma et.al. |
2503.23924 |
null |
2025-03-31 |
Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement |
Yuqiao Tan et.al. |
2503.23895 |
link |
2025-03-31 |
Adaptive Layer-skipping in Pre-trained LLMs |
Xuan Luo et.al. |
2503.23798 |
null |
2025-03-31 |
MKA: Leveraging Cross-Lingual Consensus for Model Abstention |
Sharad Duwal et.al. |
2503.23687 |
link |
2025-03-30 |
RARE: Retrieval-Augmented Reasoning Modeling |
Zhengren Wang et.al. |
2503.23513 |
link |
2025-03-30 |
SCORE: Story Coherence and Retrieval Enhancement for AI Narratives |
Qiang Yi et.al. |
2503.23512 |
null |
2025-03-30 |
Re-Aligning Language to Visual Objects with an Agentic Workflow |
Yuming Chen et.al. |
2503.23508 |
null |
2025-03-30 |
An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering |
Alexander Murphy et.al. |
2503.23415 |
null |
2025-03-30 |
Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation |
Jiwon Jeong et.al. |
2503.23363 |
link |
2025-03-30 |
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base |
Linxin Song et.al. |
2503.23361 |
null |
2025-03-29 |
Citegeist: Automated Generation of Related Work Analysis on the arXiv Corpus |
Claas Beger et.al. |
2503.23229 |
link |
2025-03-29 |
Large Language Models are Unreliable for Cyber Threat Intelligence |
Emanuele Mezzi et.al. |
2503.23175 |
null |
2025-03-29 |
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments |
Yifan Xu et.al. |
2503.23105 |
null |
2025-03-29 |
DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation |
Hsin-Ling Hsu et.al. |
2503.23013 |
null |
2025-03-29 |
Can LLMs Support Medical Knowledge Imputation? An Evaluation-Based Perspective |
Xinyu Yao et.al. |
2503.22954 |
null |
2025-03-29 |
Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering |
Yugen Sato et.al. |
2503.22941 |
null |
2025-04-02 |
Factored Agents: Decoupling In-Context Learning and Memorization for Robust Tool Use |
Nicholas Roth et.al. |
2503.22931 |
null |
2025-03-28 |
Identifying and Mitigating API Misuse in Large Language Models |
Terry Yue Zhuo et.al. |
2503.22821 |
null |
2025-03-26 |
InfoBid: A Simulation Framework for Studying Information Disclosure in Auctions with Large Language Model-based Agents |
Yue Yin et.al. |
2503.22726 |
null |
2025-03-25 |
Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models |
Bowei Tian et.al. |
2503.22720 |
null |
2025-03-25 |
LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation |
Sarah Martinson et.al. |
2503.22719 |
link |
2025-03-31 |
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning |
Abdullah Vanlioglu et.al. |
2503.22456 |
null |
2025-03-28 |
Supposedly Equivalent Facts That Aren’t? Entity Frequency in Pre-training Induces Asymmetry in LLMs |
Yuan He et.al. |
2503.22362 |
link |
2025-03-28 |
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions |
Yubo Li et.al. |
2503.22353 |
null |
2025-03-28 |
BanglAssist: A Bengali-English Generative AI Chatbot for Code-Switching and Dialect-Handling in Customer Service |
Francesco Kruk et.al. |
2503.22283 |
null |
2025-03-28 |
Learning to Instruct for Visual Instruction Tuning |
Zhihan Zhou et.al. |
2503.22215 |
null |
2025-03-28 |
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models |
Zhanke Zhou et.al. |
2503.22165 |
link |
2025-03-27 |
Entropy-Aware Branching for Improved Mathematical Reasoning |
Xianzhi Li et.al. |
2503.21961 |
null |
2025-03-25 |
OAEI-LLM-T: A TBox Benchmark Dataset for Understanding LLM Hallucinations in Ontology Matching Systems |
Zhangcheng Qiang et.al. |
2503.21813 |
null |
2025-03-27 |
Cooking Task Planning using LLM and Verified by Graph Network |
Ryunosuke Takebayashi et.al. |
2503.21564 |
null |
2025-03-27 |
SWI: Speaking with Intent in Large Language Models |
Yuwei Yin et.al. |
2503.21544 |
link |
2025-04-02 |
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? |
Ashish Sardana et.al. |
2503.21157 |
null |
2025-03-27 |
Alleviating LLM-based Generative Retrieval Hallucination in Alipay Search |
Yedan Shen et.al. |
2503.21098 |
null |
2025-03-26 |
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework |
Thomson Yen et.al. |
2503.21023 |
link |
2025-03-26 |
Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring |
Fraol Batole et.al. |
2503.20934 |
null |
2025-03-26 |
Exploring CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation |
Zhiwei Yang et.al. |
2503.20826 |
link |
2025-03-26 |
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy |
Joonhyun Jeong et.al. |
2503.20823 |
link |
2025-03-26 |
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search |
Yunhai Hu et.al. |
2503.20757 |
null |
2025-03-26 |
TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes |
Raj Sanjay Shah et.al. |
2503.20648 |
null |
2025-03-26 |
Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering |
Zehui Liao et.al. |
2503.20504 |
null |
2025-03-26 |
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization |
Zhouhong Gu et.al. |
2503.20194 |
link |
2025-03-25 |
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs |
Carlos Plou et.al. |
2503.19850 |
null |
2025-03-25 |
HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection |
Maryam Bala et.al. |
2503.19650 |
null |
2025-03-25 |
KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models |
Zhiwei Wang et.al. |
2503.19482 |
null |
2025-03-25 |
VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU |
Zhongchun Zheng et.al. |
2503.19449 |
null |
2025-03-25 |
QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition |
Yuxuan Hu et.al. |
2503.19353 |
link |
2025-03-24 |
Language Model Uncertainty Quantification with Attention Chain |
Yinghao Li et.al. |
2503.19168 |
link |
2025-03-24 |
Self-Reported Confidence of Large Language Models in Gastroenterology: Analysis of Commercial, Open-Source, and Quantized Models |
Nariman Naderi et.al. |
2503.18562 |
null |
2025-03-24 |
Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions |
Dong Jing et.al. |
2503.18320 |
null |
2025-03-23 |
ShED-HD: A Shannon Entropy Distribution Framework for Lightweight Hallucination Detection on Edge Devices |
Aneesh Vathul et.al. |
2503.18242 |
null |
2025-03-23 |
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks |
Varvara Krechetova et.al. |
2503.18129 |
link |
2025-03-23 |
SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA |
V Venktesh et.al. |
2503.17990 |
null |
2025-03-22 |
A Modular Dataset to Demonstrate LLM Abstraction Capability |
Adam Atanas et.al. |
2503.17645 |
null |
2025-03-22 |
ConSol: Sequential Probability Ratio Testing to Find Consistent LLM Reasoning Paths Efficiently |
Jaeyeon Lee et.al. |
2503.17587 |
link |
2025-03-21 |
Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring |
Khadija Zanna et.al. |
2503.17569 |
null |
2025-03-21 |
Judge Anything: MLLM as a Judge Across Any Modality |
Shu Pu et.al. |
2503.17489 |
null |
2025-03-21 |
LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language |
Kun Chu et.al. |
2503.17309 |
link |
2025-03-21 |
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs |
Albert Sawczyn et.al. |
2503.17229 |
null |
2025-03-20 |
Investigating Retrieval-Augmented Generation in Quranic Studies: A Study of 13 Open-Source Large Language Models |
Zahra Khalila et.al. |
2503.16581 |
null |
2025-03-26 |
Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models |
Hanzhi Zhang et.al. |
2503.16541 |
null |
2025-03-18 |
Do Multimodal Large Language Models Understand Welding? |
Grigorii Khvatskii et.al. |
2503.16537 |
null |
2025-03-18 |
Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine |
Chengfeng Dou et.al. |
2503.16530 |
null |
2025-03-18 |
HDLCoRe: A Training-Free Framework for Mitigating Hallucinations in LLM-Generated HDL |
Heng Ping et.al. |
2503.16528 |
null |
2025-03-20 |
Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data |
Zijian Li et.al. |
2503.16260 |
null |
2025-03-20 |
Towards Lighter and Robust Evaluation for Retrieval Augmented Generation |
Alex-Razvan Ispas et.al. |
2503.16161 |
link |
2025-03-20 |
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph |
Langming Liu et.al. |
2503.15990 |
null |
2025-03-20 |
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models |
Baolong Bi et.al. |
2503.15888 |
link |
2025-03-21 |
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance |
Hui Liu et.al. |
2503.15886 |
null |
2025-03-20 |
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations |
Kyungho Bae et.al. |
2503.15871 |
null |
2025-03-20 |
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey |
Xiaoou Liu et.al. |
2503.15850 |
null |
2025-03-20 |
Entropy-based Exploration Conduction for Multi-step Reasoning |
Jinghan Zhang et.al. |
2503.15848 |
null |
2025-03-23 |
DNA Bench: When Silence is Smarter – Benchmarking Over-Reasoning in Reasoning LLMs |
Masoud Hashemi et.al. |
2503.15793 |
null |
2025-03-19 |
R$^2$: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs |
Zefeng Lin et.al. |
2503.15655 |
null |
2025-03-19 |
How Well Can AI Build SD Models? |
William Schoenberg et.al. |
2503.15580 |
null |
2025-03-19 |
Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs |
Yuqi Zhu et.al. |
2503.15341 |
null |
2025-03-19 |
Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning? |
Roberto Araya et.al. |
2503.15268 |
null |
2025-03-19 |
Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems |
Sejong Kim et.al. |
2503.15191 |
link |
2025-03-19 |
Comparing Llama3 and DeepSeekR1 on Biomedical Text Classification Tasks |
Yuting Guo et.al. |
2503.15169 |
null |
2025-03-19 |
ELTEX: A Framework for Domain-Driven Synthetic Data Generation |
Arina Razmyslovich et.al. |
2503.15055 |
link |
2025-03-18 |
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence |
Sophia Hager et.al. |
2503.14749 |
null |
2025-03-18 |
Assessing Large Language Models for Automated Feedback Generation in Learning Programming Problem Solving |
Priscylla Silva et.al. |
2503.14630 |
link |
2025-03-18 |
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations |
Ziwei Ji et.al. |
2503.14477 |
null |
2025-03-18 |
From “Hallucination” to “Suture”: Insights from Language Philosophy to Enhance Large Language Models |
Qiantong Wang et.al. |
2503.14392 |
null |
2025-03-18 |
How much do LLMs learn from negative examples? |
Shadi Hamdan et.al. |
2503.14391 |
link |
2025-03-18 |
On the Standard Performance Criteria for Applied Control Design: PID, MPC or Machine Learning Controller? |
Pouria Sarhadi et.al. |
2503.14379 |
link |
2025-03-18 |
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis |
Guy Bar-Shalom et.al. |
2503.14043 |
link |
2025-03-18 |
Predicting Human Choice Between Textually Described Lotteries |
Eyal Marantz et.al. |
2503.14004 |
null |
2025-03-18 |
FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks |
Siqi Zhang et.al. |
2503.13966 |
null |
2025-03-19 |
Enabling Inclusive Systematic Reviews: Incorporating Preprint Articles with Large Language Model-Driven Evaluations |
Rui Yang et.al. |
2503.13857 |
null |
2025-03-18 |
Empowering GraphRAG with Knowledge Filtering and Integration |
Kai Guo et.al. |
2503.13804 |
null |
2025-03-18 |
Mapping the Trust Terrain: LLMs in Software Engineering – Insights and Perspectives |
Dipin Khati et.al. |
2503.13793 |
null |
2025-03-17 |
Pareidolic Illusions of Meaning: ChatGPT, Pseudolaw and the Triumph of Form over Substance |
Joe McIntyre et.al. |
2503.13556 |
null |
2025-03-14 |
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration |
Hong Qing Yu et.al. |
2503.13514 |
null |
2025-03-17 |
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts |
Qin Liu et.al. |
2503.13447 |
null |
2025-03-17 |
Managing Hybrid Solid-State Drives Using Large Language Models |
Qian Wei et.al. |
2503.13105 |
null |
2025-03-17 |
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning |
Junming Liu et.al. |
2503.12972 |
null |
2025-03-17 |
MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting |
Rui Pu et.al. |
2503.12931 |
null |
2025-03-17 |
HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models |
Xinyan Jiang et.al. |
2503.12908 |
link |
2025-03-16 |
Can LLMs Formally Reason as Abstract Interpreters for Program Analysis? |
Jacqueline L. Mitchell et.al. |
2503.12686 |
null |
2025-03-16 |
From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations |
Sarvesh Baskar et.al. |
2503.12556 |
null |
2025-03-21 |
LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation |
Yuqi Sun et.al. |
2503.12547 |
null |
2025-03-18 |
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? |
Jianzhu Yao et.al. |
2503.12349 |
null |
2025-03-15 |
PredicateFix: Repairing Static Analysis Alerts with Bridging Predicates |
Yuan-An Xiao et.al. |
2503.12205 |
null |
2025-03-20 |
Applications of Large Language Model Reasoning in Feature Generation |
Dharani Chandra et.al. |
2503.11989 |
null |
2025-03-14 |
LLM Agents for Education: Advances and Applications |
Zhendong Chu et.al. |
2503.11733 |
null |
2025-03-14 |
Neutralizing Bias in LLM Reasoning using Entailment Graphs |
Liang Cheng et.al. |
2503.11614 |
link |
2025-03-14 |
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning |
Jia Zhang et.al. |
2503.11441 |
null |
2025-03-14 |
Modeling Subjectivity in Cognitive Appraisal with Language Models |
Yuxiang Zhou et.al. |
2503.11381 |
null |
2025-03-14 |
Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches |
Panggih Kusuma Ningrum et.al. |
2503.11376 |
null |
2025-03-14 |
AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation |
Fengyu Li et.al. |
2503.11346 |
link |
2025-03-14 |
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models |
Aissatou Diallo et.al. |
2503.11336 |
null |
2025-03-14 |
Line of Duty: Evaluating LLM Self-Knowledge via Consistency in Feasibility Boundaries |
Sahil Kale et.al. |
2503.11256 |
link |
2025-03-14 |
Collaboration is all you need: LLM Assisted Safe Code Translation |
Rabimba Karanjai et.al. |
2503.11237 |
null |
2025-03-13 |
Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations |
Piyush Gupta et.al. |
2503.10941 |
null |
2025-03-13 |
HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust |
Yu Luo et.al. |
2503.10793 |
null |
2025-03-12 |
CALLM: Context-Aware Emotion Analysis in Cancer Survivors Using LLMs and Retrieval-Augmented Mobile Diaries |
Zhiyuan Wang et.al. |
2503.10707 |
null |
2025-03-12 |
Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models |
Shahnewaz Karim Sakib et.al. |
2503.10690 |
null |
2025-03-13 |
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention |
Jinhao Duan et.al. |
2503.10602 |
link |
2025-03-13 |
SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models |
Sahar Admoni et.al. |
2503.10509 |
null |
2025-03-13 |
LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions |
Gaurav Kumar Gupta et.al. |
2503.10486 |
null |
2025-03-13 |
Collaborative Speculative Inference for Efficient LLM Inference Serving |
Luyao Gao et.al. |
2503.10325 |
null |
2025-03-13 |
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error |
Shu-Xun Yang et.al. |
2503.10105 |
link |
2025-03-13 |
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model |
Qiyuan Deng et.al. |
2503.10093 |
null |
2025-03-12 |
Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets |
Zahra Abbasiantaeb et.al. |
2503.09902 |
link |
2025-03-12 |
Probabilistic Reasoning with LLMs for k-anonymity Estimation |
Jonathan Zheng et.al. |
2503.09674 |
null |
2025-03-12 |
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection |
Richard A. Dubniczky et.al. |
2503.09433 |
link |
2025-03-12 |
NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model |
Yuzhi Lai et.al. |
2503.09335 |
link |
2025-03-12 |
Token Weighting for Long-Range Language Modeling |
Falko Helm et.al. |
2503.09202 |
link |
2025-03-12 |
Is LLMs Hallucination Usable? LLM-based Negative Reasoning for Fake News Detection |
Chaowei Zhang et.al. |
2503.09153 |
null |
2025-03-11 |
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation |
Yu Wang et.al. |
2503.08963 |
null |
2025-03-11 |
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving |
Changxing Liu et.al. |
2503.08683 |
link |
2025-03-11 |
DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process |
Minjun Zhu et.al. |
2503.08569 |
null |
2025-03-11 |
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework |
Zhuo Zhi et.al. |
2503.08308 |
null |
2025-03-11 |
FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback |
Kangan Qian et.al. |
2503.08162 |
null |
2025-03-11 |
LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification |
Siyuan Wang et.al. |
2503.07937 |
null |
2025-03-10 |
Safety Guardrails for LLM-Enabled Robots |
Zachary Ravichandran et.al. |
2503.07885 |
null |
2025-03-10 |
HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations |
Samir Abdaljalil et.al. |
2503.07833 |
null |
2025-03-07 |
SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs |
Jaewoo Song et.al. |
2503.07657 |
null |
2025-03-07 |
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration |
Jinguang Wang et.al. |
2503.07654 |
null |
2025-03-10 |
Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review |
Samuel Ferino et.al. |
2503.07556 |
null |
2025-03-10 |
Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies |
Luyi Jiang et.al. |
2503.07306 |
null |
2025-03-10 |
Quantizing Large Language Models for Code Generation: A Differentiated Replication |
Alessandro Giagnorio et.al. |
2503.07103 |
null |
2025-03-10 |
CtrlRAG: Black-box Adversarial Attacks Based on Masked Language Models in Retrieval-Augmented Language Generation |
Runqi Sui et.al. |
2503.06950 |
null |
2025-03-09 |
Multimodal AI-driven Biomarker for Early Detection of Cancer Cachexia |
Sabeen Ahmed et.al. |
2503.06797 |
null |
2025-03-09 |
Delusions of Large Language Models |
Hongshen Xu et.al. |
2503.06709 |
null |
2025-03-09 |
Alignment for Efficient Tool Calling of Large Language Models |
Hongshen Xu et.al. |
2503.06708 |
null |
2025-03-09 |
Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform |
Chenyu Huang et.al. |
2503.06676 |
null |
2025-03-09 |
Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving |
Yao Cheng et.al. |
2503.06567 |
null |
2025-03-09 |
Graph Retrieval-Augmented LLM for Conversational Recommendation Systems |
Zhangchi Qiu et.al. |
2503.06430 |
null |
2025-03-09 |
Performant LLM Agentic Framework for Conversational AI |
Alex Casella et.al. |
2503.06410 |
null |
2025-03-08 |
Sample-aware Adaptive Structured Pruning for Large Language Models |
Jun Kong et.al. |
2503.06184 |
null |
2025-03-08 |
Wireless Hallucination in Generative AI-enabled Communications: Concepts, Issues, and Solutions |
Xudong Wang et.al. |
2503.06149 |
link |
2025-03-08 |
A Survey on Post-training of Large Language Models |
Guiyao Tie et.al. |
2503.06072 |
link |
2025-03-07 |
SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs |
Samir Abdaljalil et.al. |
2503.05980 |
null |
2025-03-07 |
TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator |
Deepak Vungarala et.al. |
2503.05951 |
null |
2025-03-04 |
I Think, Therefore I Hallucinate: Minds, Machines, and the Art of Being Wrong |
Sebastian Barros et.al. |
2503.05806 |
null |
2025-03-07 |
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning |
Huatong Song et.al. |
2503.05592 |
null |
2025-03-07 |
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information |
Junbo Zhao et.al. |
2503.05543 |
null |
2025-03-07 |
Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering |
Yusong Ke et.al. |
2503.05505 |
null |
2025-03-07 |
Maximum Hallucination Standards for Domain-Specific Large Language Models |
Tingmingke Lu et.al. |
2503.05481 |
null |
2025-03-07 |
An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning |
Navdeep Kaur et.al. |
2503.05439 |
null |
2025-03-07 |
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation |
Zhenxuan Zhang et.al. |
2503.05347 |
link |
2025-03-07 |
Path Pooling: Train-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation |
Hairu Wang et.al. |
2503.05203 |
null |
2025-03-07 |
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist |
Tianjun Wei et.al. |
2503.05142 |
link |
2025-03-06 |
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression |
Souvik Kundu et.al. |
2503.04982 |
null |
2025-03-10 |
Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents |
Jingying Zeng et.al. |
2503.04830 |
null |
2025-03-07 |
START: Self-taught Reasoner with Tools |
Chengpeng Li et.al. |
2503.04625 |
null |
2025-03-06 |
HalluCounter: Reference-free LLM Hallucination Detection in the Wild! |
Ashok Urlana et.al. |
2503.04615 |
null |
2025-03-06 |
Benchmarking Reasoning Robustness in Large Language Models |
Tong Yu et.al. |
2503.04550 |
null |
2025-03-06 |
TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction |
Chao Wang et.al. |
2503.04457 |
null |
2025-03-06 |
On Fact and Frequency: LLM Responses to Misinformation Expressed with Uncertainty |
Yana van de Sande et.al. |
2503.04271 |
null |
2025-03-06 |
Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation |
Ziqiang Cui et.al. |
2503.04162 |
null |
2025-03-06 |
KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease |
Yongchao Long et.al. |
2503.04153 |
link |
2025-03-05 |
Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis |
Ahmad Hafez et.al. |
2503.03911 |
link |
2025-03-07 |
LEWIS (LayEr WIse Sparsity) – A Training Free Guided Model Merging Approach |
Hetarth Chopra et.al. |
2503.03874 |
null |
2025-03-04 |
BotUmc: An Uncertainty-Aware Twitter Bot Detection with Multi-view Causal Inference |
Tao Yang et.al. |
2503.03775 |
null |
2025-03-05 |
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems |
Richard Ren et.al. |
2503.03750 |
null |
2025-03-05 |
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models |
Bar Karov et.al. |
2503.03669 |
link |
2025-03-05 |
Structured Outputs Enable General-Purpose LLMs to be Medical Experts |
Guangfu Guo et.al. |
2503.03194 |
null |
2025-03-04 |
SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs |
Samir Abdaljalil et.al. |
2503.03032 |
null |
2025-03-04 |
Effectively Steer LLM To Follow Preference via Building Confident Directions |
Bingqing Song et.al. |
2503.02989 |
null |
2025-03-04 |
Calibrating LLM Confidence with Semantic Steering: A Multi-Prompt Aggregation Framework |
Ziang Zhou et.al. |
2503.02863 |
null |
2025-03-04 |
Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers |
Zicong He et.al. |
2503.02851 |
link |
2025-03-04 |
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs |
Yuzhe Gu et.al. |
2503.02846 |
link |
2025-03-04 |
FinArena: A Human-Agent Collaboration Framework for Financial Market Analysis and Forecasting |
Congluo Xu et.al. |
2503.02692 |
null |
2025-03-04 |
MPO: Boosting LLM Agents with Meta Plan Optimization |
Weimin Xiong et.al. |
2503.02682 |
link |
2025-03-04 |
Multidimensional Consistency Improves Reasoning in Language Models |
Huiyuan Lai et.al. |
2503.02670 |
null |
2025-03-05 |
Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models |
Paul Stangel et.al. |
2503.02623 |
null |
2025-03-04 |
AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection |
Dimitra Karkani et.al. |
2503.02442 |
null |
2025-03-04 |
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling |
Hang Zheng et.al. |
2503.02233 |
null |
2025-03-04 |
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models |
Saeed Ranjbar Alvar et.al. |
2503.02175 |
link |
2025-03-03 |
OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments |
Qianwei Wang et.al. |
2503.02106 |
null |
2025-03-05 |
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs |
Tin Nguyen et.al. |
2503.02003 |
link |
2025-03-02 |
NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SeflCheckGPT |
Jiaying Hong et.al. |
2503.01921 |
link |
2025-03-01 |
How to Steer LLM Latents for Hallucination Detection? |
Seongheon Park et.al. |
2503.01917 |
null |
2025-03-03 |
Can (A)I Change Your Mind? |
Miriam Havin et.al. |
2503.01844 |
link |
2025-03-04 |
Position: Don’t use the CLT in LLM evals with fewer than a few hundred datapoints |
Sam Bowyer et.al. |
2503.01747 |
null |
2025-03-03 |
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution |
Kun Li et.al. |
2503.01695 |
null |
2025-03-03 |
When an LLM is apprehensive about its answers – and when its uncertainty is justified |
Petr Sychev et.al. |
2503.01688 |
link |
2025-03-03 |
Evaluating LLMs’ Assessment of Mixed-Context Hallucination Through the Lens of Summarization |
Siya Qi et.al. |
2503.01670 |
link |
2025-03-03 |
Detecting Stylistic Fingerprints of Large Language Models |
Yehonatan Bitton et.al. |
2503.01659 |
null |
2025-03-03 |
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning |
Wenjie Wu et.al. |
2503.01642 |
null |
2025-03-03 |
Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering |
Zhanghao Hu et.al. |
2503.01606 |
null |
2025-03-03 |
None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering |
Zhi Rui Tam et.al. |
2503.01550 |
null |
2025-03-03 |
Revisiting Large Language Model Pruning using Neuron Semantic Attribution |
Yizhuo Ding et.al. |
2503.01542 |
null |
2025-03-03 |
What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret |
Yufeng Yuan et.al. |
2503.01491 |
null |
2025-03-03 |
Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation |
Linhai Zhang et.al. |
2503.01315 |
null |
2025-03-03 |
LLM-Advisor: An LLM Benchmark for Cost-efficient Path Planning across Multiple Terrains |
Ling Xiao et.al. |
2503.01236 |
null |
2025-03-06 |
CE-U: Cross Entropy Unlearning |
Bo Yang et.al. |
2503.01224 |
null |
2025-03-03 |
Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG |
Wenbin Wang et.al. |
2503.01222 |
link |
2025-03-04 |
Can Large Language Models Help Experimental Design for Causal Discovery? |
Junyi Li et.al. |
2503.01139 |
null |
2025-03-02 |
Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies |
Tianyi Huang et.al. |
2503.00724 |
null |
2025-03-02 |
GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development |
Leming Shen et.al. |
2503.00686 |
link |
2025-03-02 |
From Prompting to Partnering: Personalization Features for Human-LLM Interactions |
Si Thu et.al. |
2503.00681 |
null |
2025-03-01 |
Embracing Diversity: A Multi-Perspective Approach with Soft Labels |
Benedetta Muscato et.al. |
2503.00489 |
null |
2025-03-01 |
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack |
Yunfan Gao et.al. |
2503.00353 |
link |
2025-03-01 |
Reducing Large Language Model Safety Risks in Women’s Health using Semantic Entropy |
Jahan C. Penny-Dimri et.al. |
2503.00269 |
null |
2025-02-28 |
A Survey of Uncertainty Estimation Methods on Large Language Models |
Zhiqiu Xia et.al. |
2503.00172 |
null |
2025-02-27 |
Societal Alignment Frameworks Can Improve LLM Alignment |
Karolina Stańczak et.al. |
2503.00069 |
null |
2025-03-04 |
Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs |
Xiaomin Li et.al. |
2502.21239 |
null |
2025-02-28 |
PASemiQA: Plan-Assisted Agent for Question Answering on Semi-Structured Data with Text and Relational Information |
Hansi Yang et.al. |
2502.21087 |
null |
2025-03-03 |
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation |
Xujie Yuan et.al. |
2502.20854 |
null |
2025-02-28 |
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow |
Jiaqi Bai et.al. |
2502.20750 |
link |
2025-02-28 |
Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models |
Colleen Gilhuly et.al. |
2502.20647 |
null |
2025-02-28 |
Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems |
Jędrzej Warczyński et.al. |
2502.20609 |
null |
2025-02-27 |
Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization |
Ryan C. Barron et.al. |
2502.20364 |
link |
2025-02-27 |
Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models |
Yi Jing et.al. |
2502.20344 |
null |
2025-02-27 |
Expertise Is What We Want |
Alan Ashworth et.al. |
2502.20335 |
null |
2025-02-27 |
Conformal Tail Risk Control for Large Language Model Alignment |
Catherine Yu-Chi Chen et.al. |
2502.20285 |
null |
2025-02-27 |
Similarity-Distance-Magnitude Universal Verification |
Allen Schmaltz et.al. |
2502.20167 |
link |
2025-03-04 |
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification |
Xiangyan Qu et.al. |
2502.19844 |
link |
2025-02-27 |
Old Experience Helps: Leveraging Survey Methodology to Improve AI Text Annotation Reliability in Social Sciences |
Linzhuo Li et.al. |
2502.19679 |
null |
2025-02-26 |
Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review |
Sungduk Yu et.al. |
2502.19614 |
null |
2025-02-26 |
Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems |
Nayoung Choi et.al. |
2502.19596 |
null |
2025-02-26 |
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents |
Ashley Lewis et.al. |
2502.19545 |
null |
2025-02-26 |
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices |
Xinru Wang et.al. |
2502.19410 |
null |
2025-02-26 |
Verde: Verification via Refereed Delegation for Machine Learning Programs |
Arasu Arun et.al. |
2502.19405 |
null |
2025-02-26 |
Efficient Federated Search for Retrieval-Augmented Generation |
Rachid Guerraoui et.al. |
2502.19280 |
null |
2025-02-26 |
Bi’an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation |
Zhouyu Jiang et.al. |
2502.19209 |
null |
2025-02-26 |
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement |
Siyuan Zhang et.al. |
2502.19127 |
null |
2025-02-26 |
Talking like Piping and Instrumentation Diagrams (P&IDs) |
Achmad Anggawirya Alimin et.al. |
2502.18928 |
null |
2025-02-26 |
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models |
Shuliang Liu et.al. |
2502.18817 |
null |
2025-02-26 |
Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science |
Xiaohua Wu et.al. |
2502.18729 |
null |
2025-02-25 |
Scalable Best-of-N Selection for Large Language Models via Self-Certainty |
Zhewei Kang et.al. |
2502.18581 |
link |
2025-02-25 |
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions |
Yizhe Zhang et.al. |
2502.18435 |
null |
2025-02-25 |
Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods |
Nicola Cecere et.al. |
2502.18389 |
null |
2025-02-25 |
BRIDO: Bringing Democratic Order to Abstractive Summarization |
Junhyun Lee et.al. |
2502.18342 |
null |
2025-02-25 |
Can LLMs Explain Themselves Counterfactually? |
Zahra Dehghanighobadi et.al. |
2502.18156 |
null |
2025-02-25 |
LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers |
Zhuocheng Zhang et.al. |
2502.18139 |
link |
2025-02-25 |
Verdict: A Library for Scaling Judge-Time Compute |
Nimit Kalra et.al. |
2502.18018 |
link |
2025-02-27 |
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction |
Suozhi Huang et.al. |
2502.17925 |
null |
2025-02-25 |
An Overview of Large Language Models for Statisticians |
Wenlong Ji et.al. |
2502.17814 |
null |
2025-02-25 |
Uncertainty Quantification for LLM-Based Survey Simulations |
Chengpiao Huang et.al. |
2502.17773 |
null |
2025-02-24 |
Hallucination Detection in LLMs Using Spectral Features of Attention Maps |
Jakub Binkowski et.al. |
2502.17598 |
link |
2025-02-24 |
Towards Conditioning Clinical Text Generation for User Control |
Osman Alperen Koraş et.al. |
2502.17571 |
null |
2025-02-22 |
SAE-V: Interpreting Multimodal Models for Enhanced Alignment |
Hantao Lou et.al. |
2502.17514 |
null |
2025-02-24 |
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought |
Boxuan Zhang et.al. |
2502.17214 |
link |
2025-02-24 |
IGDA: Interactive Graph Discovery through Large Language Model Agents |
Alex Havrilla et.al. |
2502.17189 |
null |
2025-02-24 |
LettuceDetect: A Hallucination Detection Framework for RAG Applications |
Ádám Kovács et.al. |
2502.17125 |
link |
2025-02-27 |
LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences |
Sijia Yao et.al. |
2502.17057 |
link |
2025-02-24 |
Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology |
Longchao Da et.al. |
2502.17026 |
null |
2025-02-24 |
Zero-shot Load Forecasting for Integrated Energy Systems: A Large Language Model-based Framework with Multi-task Learning |
Jiaheng Li et.al. |
2502.16896 |
null |
2025-02-24 |
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models |
Yaqi Sun et.al. |
2502.16842 |
null |
2025-02-25 |
Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses |
Tiejin Chen et.al. |
2502.16820 |
null |
2025-02-23 |
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT |
Nidhal Jegham et.al. |
2502.16428 |
null |
2025-02-23 |
Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications |
Feng Ma et.al. |
2502.16402 |
null |
2025-02-22 |
An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning |
Masoud Shokrnezhad et.al. |
2502.16198 |
null |
2025-02-22 |
EPERM: An Evidence Path Enhanced Reasoning Model for Knowledge Graph Question and Answering |
Xiao Long et.al. |
2502.16171 |
null |
2025-02-22 |
ZiGong 1.0: A Large Language Model for Financial Credit |
Yu Lei et.al. |
2502.16159 |
null |
2025-02-22 |
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination |
Yuji Zhang et.al. |
2502.16143 |
null |
2025-02-22 |
Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals |
Linda Zeng et.al. |
2502.16101 |
null |
2025-02-21 |
Position: Standard Benchmarks Fail – LLM Agents Present Overlooked Risks for Financial Applications |
Zichen Chen et.al. |
2502.15865 |
null |
2025-02-20 |
Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection |
Yihao Xue et.al. |
2502.15845 |
null |
2025-02-20 |
Hallucination Detection in Large Language Models with Metamorphic Relations |
Borui Yang et.al. |
2502.15844 |
null |
2025-02-21 |
AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind |
Zhining Zhang et.al. |
2502.15676 |
link |
2025-02-24 |
Empowering LLMs with Logical Reasoning: A Comprehensive Survey |
Fengxiang Cheng et.al. |
2502.15652 |
null |
2025-02-21 |
A Cautionary Tale About “Neutrally” Informative AI Tools Ahead of the 2025 Federal Elections in Germany |
Ina Dormuth et.al. |
2502.15568 |
null |
2025-02-21 |
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning |
Pengcheng Huang et.al. |
2502.15543 |
link |
2025-02-21 |
Beyond Tools: Understanding How Heavy Users Integrate LLMs into Everyday Tasks and Decision-Making |
Eunhye Kim et.al. |
2502.15395 |
null |
2025-02-21 |
Evaluating Social Biases in LLM Reasoning |
Xuyang Wu et.al. |
2502.15361 |
null |
2025-02-21 |
From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants |
Manisha Mukherjee et.al. |
2502.15237 |
null |
2025-02-20 |
Using tournaments to calculate AUROC for zero-shot classification with LLMs |
Wonjin Yoon et.al. |
2502.15018 |
null |
2025-02-19 |
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment |
Xiangjin Xie et.al. |
2502.14913 |
null |
2025-02-19 |
EvoP: Robust LLM Inference via Evolutionary Pruning |
Shangyu Wu et.al. |
2502.14910 |
null |
2025-02-19 |
KOALA: Knowledge Conflict Augmentations for Robustness in Vision Language Models |
Peter Carragher et.al. |
2502.14908 |
link |
2025-02-20 |
Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning |
Shuyue Stella Li et.al. |
2502.14860 |
link |
2025-02-20 |
Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of LLMs |
Zongxia Li et.al. |
2502.14748 |
null |
2025-02-20 |
CER: Confidence Enhanced Reasoning in LLMs |
Ali Razghandi et.al. |
2502.14634 |
link |
2025-02-20 |
Synergistic Fusion of Multi-Source Knowledge via Evidence Theory for High-Entropy Alloy Discovery |
Minh-Quyet Ha et.al. |
2502.14631 |
null |
2025-02-20 |
ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification |
Hyunseok Lee et.al. |
2502.14565 |
null |
2025-02-20 |
Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation |
Austin A. Barr et.al. |
2502.14523 |
link |
2025-02-25 |
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? |
Sergey Pletenev et.al. |
2502.14502 |
link |
2025-02-20 |
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models |
Artem Vazhentsev et.al. |
2502.14427 |
link |
2025-02-20 |
ParallelComp: Parallel Long-Context Compressor for Length Extrapolation |
Jing Xiong et.al. |
2502.14317 |
null |
2025-02-20 |
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models |
Shrey Pandit et.al. |
2502.14302 |
null |
2025-02-20 |
STeCa: Step-level Trajectory Calibration for LLM Agent Learning |
Hanlin Wang et.al. |
2502.14276 |
link |
2025-02-20 |
Fact or Guesswork? Evaluating Large Language Model’s Medical Knowledge with Structured One-Hop Judgment |
Jiaxi Li et.al. |
2502.14275 |
null |
2025-02-20 |
PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant |
Congrui Yin et.al. |
2502.14271 |
null |
2025-02-20 |
MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels |
Xiaoou Liu et.al. |
2502.14268 |
null |
2025-02-20 |
Multi-Faceted Studies on Data Poisoning can Advance LLM Development |
Pengfei He et.al. |
2502.14182 |
link |
2025-02-19 |
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation |
Song Duong et.al. |
2502.13674 |
null |
2025-02-19 |
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding |
Feiye Huo et.al. |
2502.13652 |
null |
2025-02-19 |
REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models |
DongGeon Lee et.al. |
2502.13622 |
null |
2025-02-19 |
What are Models Thinking about? Understanding Large Language Model Hallucinations “Psychology” through Model Inner State Analysis |
Peiran Wang et.al. |
2502.13490 |
null |
2025-02-19 |
LLM4Tag: Automatic Tagging System for Information Retrieval via Large Language Models |
Ruiming Tang et.al. |
2502.13481 |
null |
2025-02-19 |
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation |
Jialin Ouyang et.al. |
2502.13442 |
link |
2025-02-19 |
Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning |
Ningke Li et.al. |
2502.13416 |
null |
2025-02-19 |
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval |
Aditya Sharma et.al. |
2502.13369 |
null |
2025-02-18 |
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? |
Yucheng Shi et.al. |
2502.13233 |
null |
2025-02-17 |
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment |
Yuze Zhao et.al. |
2502.13170 |
link |
2025-02-18 |
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization |
Shuo Xing et.al. |
2502.13146 |
link |
2025-02-18 |
Understanding and Rectifying Safety Perception Distortion in VLMs |
Xiaohan Zou et.al. |
2502.13095 |
null |
2025-02-18 |
LAMD: Context-driven Android Malware Detection and Classification with LLMs |
Xingzhi Qian et.al. |
2502.13055 |
null |
2025-02-20 |
Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation |
Sha Li et.al. |
2502.13019 |
null |
2025-02-18 |
Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs |
Adi Simhi et.al. |
2502.12964 |
null |
2025-02-18 |
Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models |
Elena Stringli et.al. |
2502.12821 |
null |
2025-02-20 |
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild |
Saad Obaid ul Islam et.al. |
2502.12769 |
link |
2025-02-18 |
R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs |
Sumin Jo et.al. |
2502.12767 |
link |
2025-02-18 |
“I know myself better, but not really greatly”: Using LLMs to Detect and Explain LLM-Generated Texts |
Jiazhou Ji et.al. |
2502.12743 |
null |
2025-02-18 |
R.R.: Unveiling LLM Training Privacy through Recollection and Ranking |
Wenlong Meng et.al. |
2502.12658 |
link |
2025-02-18 |
COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation |
Sean Wang et.al. |
2502.12601 |
null |
2025-02-18 |
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning |
Xiaoqian Liu et.al. |
2502.12486 |
null |
2025-02-18 |
Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking |
Alireza S. Ziabari et.al. |
2502.12470 |
null |
2025-02-18 |
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation |
Yutong Wang et.al. |
2502.12468 |
null |
2025-02-17 |
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs |
Kan Zhu et.al. |
2502.12216 |
null |
2025-02-17 |
Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control |
Jinyan Su et.al. |
2502.12145 |
link |
2025-02-17 |
KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs |
Qi Zhao et.al. |
2502.12029 |
null |
2025-02-17 |
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities |
Fengqing Jiang et.al. |
2502.12025 |
null |
2025-02-17 |
Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning |
Tianyi Wu et.al. |
2502.11962 |
null |
2025-02-17 |
Can Your Uncertainty Scores Detect Hallucinated Entity? |
Min-Hsuan Yeh et.al. |
2502.11948 |
null |
2025-02-17 |
Cognitive-Aligned Document Selection for Retrieval-augmented Generation |
Bingyu Wan et.al. |
2502.11770 |
null |
2025-02-17 |
ReviewEval: An Evaluation Framework for AI-Generated Reviews |
Chavvi Kirtani et.al. |
2502.11736 |
null |
2025-02-17 |
Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception |
Shiyu Ni et.al. |
2502.11677 |
null |
2025-02-17 |
Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation |
Arindam Sharma et.al. |
2502.11620 |
null |
2025-02-17 |
Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? |
Hanxing Ding et.al. |
2502.11400 |
null |
2025-02-17 |
“Nuclear Deployed!”: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents |
Rongwu Xu et.al. |
2502.11355 |
link |
2025-02-16 |
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation |
Hieu Nguyen et.al. |
2502.11306 |
null |
2025-02-16 |
Uncertainty-Aware Step-wise Verification with Generative Reward Models |
Zihuiwen Ye et.al. |
2502.11250 |
null |
2025-02-16 |
A Survey of LLM-based Agents in Medicine: How far are we from Baymax? |
Wenxuan Wang et.al. |
2502.11211 |
null |
2025-02-16 |
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs |
Fei Yu et.al. |
2502.11155 |
null |
2025-02-18 |
Valuable Hallucinations: Realizable Non-realistic Propositions |
Qiucheng Chen et.al. |
2502.11113 |
null |
2025-02-16 |
Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications |
Alexandru Lecu et.al. |
2502.11108 |
link |
2025-02-16 |
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models |
Prateek Chhikara et.al. |
2502.11028 |
link |
2025-02-16 |
Leveraging Uncertainty Estimation for Efficient LLM Routing |
Tuo Zhang et.al. |
2502.11021 |
null |
2025-02-16 |
Agentic LLM Framework for Adaptive Decision Discourse |
Antoine Dolant et.al. |
2502.10978 |
null |
2025-02-16 |
SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information |
Xiangyu Zhang et.al. |
2502.10950 |
null |
2025-02-15 |
Towards Effective Extraction and Evaluation of Factual Claims |
Dasha Metropolitansky et.al. |
2502.10855 |
null |
2025-02-15 |
An Empirical Analysis of Uncertainty in Large Language Model Evaluations |
Qiujie Xie et.al. |
2502.10709 |
link |
2025-02-15 |
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization |
Erica Zhang et.al. |
2502.10648 |
link |
2025-02-14 |
Post-training an LLM for RAG? Train on Self-Generated Demonstrations |
Matthew Finlayson et.al. |
2502.10596 |
null |
2025-02-14 |
Can Large Language Model Agents Balance Energy Systems? |
Xinxing Ren et.al. |
2502.10557 |
link |
2025-02-14 |
A novel approach to data generation in generative model |
JaeHong Kim et.al. |
2502.10092 |
null |
2025-02-14 |
Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos |
Weirui Ye et.al. |
2502.09886 |
null |
2025-02-14 |
Automated Hypothesis Validation with Agentic Sequential Falsifications |
Kexin Huang et.al. |
2502.09858 |
link |
2025-02-13 |
Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes |
Taylan G. Topcu et.al. |
2502.09690 |
null |
2025-02-13 |
LP-LM: No Hallucinations in Question Answering with Logic Programming |
Katherine Wu et.al. |
2502.09212 |
link |
2025-02-13 |
Logical Lease Litigation: Prolog and LLMs for Rental Law Compliance in New York |
Sanskar Sehgal et.al. |
2502.09204 |
null |
2025-02-13 |
Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables |
Xuzhao Geng et.al. |
2502.09073 |
null |
2025-02-13 |
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models |
Xin Zhou et.al. |
2502.08922 |
null |
2025-02-13 |
MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training |
Xinxin You et.al. |
2502.08904 |
null |
2025-02-12 |
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation |
Mohammad Mahdi Abootorabi et.al. |
2502.08826 |
link |
2025-02-11 |
Hallucination, Monofacts, and Miscalibration: An Empirical Investigation |
Muqing Miao et.al. |
2502.08666 |
link |
2025-02-10 |
Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis |
Emanuele Ricco et.al. |
2502.08663 |
null |
2025-02-09 |
Few-shot LLM Synthetic Data with Distribution Matching |
Jiyuan Ren et.al. |
2502.08661 |
link |
2025-02-08 |
Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions |
Jingxin Xu et.al. |
2502.08657 |
null |
2025-02-12 |
Ensemble based approach to quantifying uncertainty of LLM based classifications |
Srijith Rajamohan et.al. |
2502.08631 |
null |
2025-02-12 |
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding |
Konstantin Berestizshevsky et.al. |
2502.08363 |
link |
2025-02-17 |
Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG |
Kushagra Bhushan et.al. |
2502.08356 |
link |
2025-02-12 |
Compromising Honesty and Harmlessness in Language Models via Deception Attacks |
Laurène Vaugrante et.al. |
2502.08301 |
null |
2025-02-12 |
Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis |
Changhua Pei et.al. |
2502.08224 |
null |
2025-02-12 |
Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences |
Shanshan Han et.al. |
2502.08142 |
null |
2025-02-12 |
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses |
Sujeong Lee et.al. |
2502.08109 |
null |
2025-02-12 |
Large language models perpetuate bias in palliative care: development and analysis of the Palliative Care Adversarial Dataset (PCAD) |
Naomi Akhras et.al. |
2502.08073 |
null |
2025-02-11 |
From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems |
Yining Hong et.al. |
2502.07974 |
null |
2025-02-11 |
Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning |
Rujing Yao et.al. |
2502.07912 |
link |
2025-02-11 |
Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights |
Ahilan Ayyachamy Nadar Ponnusamy et.al. |
2502.07835 |
link |
2025-02-17 |
Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering |
Shuzheng Si et.al. |
2502.07340 |
link |
2025-02-11 |
When More is Less: Understanding Chain-of-Thought Length in LLMs |
Yuyang Wu et.al. |
2502.07266 |
null |
2025-02-11 |
Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs |
Sina Salimian et.al. |
2502.07186 |
null |
2025-02-11 |
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning |
Yinghui Li et.al. |
2502.07184 |
null |
2025-02-11 |
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning |
Feng Chen et.al. |
2502.07154 |
link |
2025-02-11 |
Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning |
Jiayuan Zhu et.al. |
2502.07143 |
null |
2025-02-08 |
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models |
Sina Tayebati et.al. |
2502.06884 |
link |
2025-02-08 |
Group Reasoning Emission Estimation Networks |
Yanming Guo et.al. |
2502.06874 |
null |
2025-02-08 |
Knowledge Graph-Guided Retrieval Augmented Generation |
Xiangrong Zhu et.al. |
2502.06864 |
link |
2025-02-07 |
LLM-Supported Natural Language to Bash Translation |
Finnian Westenfelder et.al. |
2502.06858 |
link |
2025-02-11 |
Calibrating LLMs with Information-Theoretic Evidential Deep Learning |
Yawei Li et.al. |
2502.06351 |
link |
2025-02-10 |
Expect the Unexpected: FailSafe Long Context QA for Finance |
Kiran Kamble et.al. |
2502.06329 |
null |
2025-02-10 |
Emergent Response Planning in LLM |
Zhichen Dong et.al. |
2502.06258 |
null |
2025-02-10 |
Confidence Improves Self-Consistency in LLMs |
Amir Taubenfeld et.al. |
2502.06233 |
null |
2025-02-10 |
Unveiling the Capabilities of Large Language Models in Detecting Offensive Language with Annotation Disagreement |
Junyu Lu et.al. |
2502.06207 |
link |
2025-02-10 |
Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis |
Sanket Jantre et.al. |
2502.06173 |
null |
2025-02-09 |
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation |
Runchuan Zhu et.al. |
2502.05911 |
null |
2025-02-09 |
Self-Training Large Language Models for Tool-Use Without Demonstrations |
Ne Luo et.al. |
2502.05867 |
null |
2025-02-09 |
Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models |
Cheng Peng Huang et.al. |
2502.05825 |
null |
2025-02-09 |
Assessing confidence in frontier AI safety cases |
Stephen Barrett et.al. |
2502.05791 |
null |
2025-02-09 |
Visual Text Mining with Progressive Taxonomy Construction for Environmental Studies |
Sam Yu-Te Lee et.al. |
2502.05731 |
link |
2025-02-07 |
SEER: Self-Explainability Enhancement of Large Language Models’ Representations |
Guanxu Chen et.al. |
2502.05242 |
null |
2025-02-07 |
ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework |
Xiaoyu Deng et.al. |
2502.05084 |
null |
2025-02-07 |
Aligning Black-box Language Models with Human Judgments |
Gerrit J. J. van den Burg et.al. |
2502.04997 |
null |
2025-02-11 |
CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs |
Roman Vashurin et.al. |
2502.04964 |
null |
2025-02-07 |
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks |
Jing Yang et.al. |
2502.04797 |
link |
2025-02-10 |
Confidence Elicitation: A New Attack Vector for Large Language Models |
Brian Formento et.al. |
2502.04643 |
link |
2025-02-06 |
TruthFlow: Truthful LLM Generation via Representation Flow Correction |
Hanyu Wang et.al. |
2502.04556 |
null |
2025-02-06 |
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization |
Yu-Neng Chuang et.al. |
2502.04428 |
null |
2025-02-06 |
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference |
Xing Li et.al. |
2502.04420 |
link |
2025-02-11 |
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing |
Kunfeng Lai et.al. |
2502.04411 |
null |
2025-02-06 |
FAS: Fast ANN-SNN Conversion for Spiking Large Language Models |
Long Chen et.al. |
2502.04405 |
link |
2025-02-05 |
Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning |
Jonathan Kim et.al. |
2502.04381 |
null |
2025-02-05 |
MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction |
Xiao Hu et.al. |
2502.04360 |
null |
2025-02-04 |
LLM-ProS: Analyzing Large Language Models’ Performance in Competitive Problem Solving |
Md Sifat Hossain et.al. |
2502.04355 |
null |
2025-02-06 |
Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software |
Andreas Baumann et.al. |
2502.03916 |
null |
2025-02-06 |
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation |
Bo Pang et.al. |
2502.03860 |
null |
2025-02-12 |
Syntriever: How to Train Your Retriever with Synthetic Data from LLMs |
Minsang Kim et.al. |
2502.03824 |
link |
2025-02-10 |
Large Language Models for Multi-Robot Systems: A Survey |
Peihan Li et.al. |
2502.03814 |
link |
2025-02-08 |
Enhancing Hallucination Detection through Noise Injection |
Litian Liu et.al. |
2502.03799 |
null |
2025-02-06 |
Adaptive Semantic Prompt Caching with VectorQ |
Luis Gaspar Schroeder et.al. |
2502.03771 |
null |
2025-02-06 |
Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models |
Rui Cai et.al. |
2502.03715 |
null |
2025-02-06 |
MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers |
Nicole Cho et.al. |
2502.03711 |
null |
2025-02-06 |
Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers |
Daniel Beaglehole et.al. |
2502.03708 |
null |
2025-02-06 |
LLM Alignment as Retriever Optimization: An Information Retrieval Perspective |
Bowen Jin et.al. |
2502.03699 |
null |
2025-02-05 |
Reflection-Window Decoding: Text Generation with Selective Refinement |
Zeyu Tang et.al. |
2502.03678 |
null |
2025-02-05 |
Advancing Reasoning in Large Language Models: Promising Methods and Approaches |
Avinash Patil et.al. |
2502.03671 |
null |
2025-02-04 |
Artificial Intelligence and Legal Analysis: Implications for Legal Education and the Profession |
Lee Peoples et.al. |
2502.03487 |
null |
2025-02-05 |
A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) |
Yiye Chen et.al. |
2502.03450 |
null |
2025-02-05 |
SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs |
Ben Liu et.al. |
2502.03283 |
null |
2025-02-05 |
Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models |
Jialiang Wu et.al. |
2502.03199 |
null |
2025-02-05 |
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates |
Aissatou Diallo et.al. |
2502.03080 |
null |
2025-02-04 |
An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification |
Riddhi More et.al. |
2502.02715 |
null |
2025-02-04 |
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization |
Yize Wu et.al. |
2502.02493 |
null |
2025-02-04 |
Activation-Informed Merging of Large Language Models |
Amin Heyrani Nobari et.al. |
2502.02421 |
link |
2025-02-04 |
From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing |
Siwei Luo et.al. |
2502.02025 |
null |
2025-02-03 |
SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models |
Diyana Muhammed et.al. |
2502.01812 |
null |
2025-02-03 |
Position: Towards a Responsible LLM-empowered Multi-Agent Systems |
Jinwei Hu et.al. |
2502.01714 |
null |
2025-02-02 |
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model |
Hadas Ben-Atya et.al. |
2502.01691 |
null |
2025-02-02 |
LIBRA: Measuring Bias of Large Language Model from a Local Context |
Bo Pang et.al. |
2502.01679 |
null |
2025-02-01 |
Benchmark on Peer Review Toxic Detection: A Challenging Task with a New Dataset |
Man Luo et.al. |
2502.01676 |
null |
2025-02-03 |
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering |
Zongxi Li et.al. |
2502.01523 |
null |
2025-02-03 |
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant |
Gaole He et.al. |
2502.01390 |
link |
2025-02-03 |
PSSD: Making Large Language Models Self-denial via Human Psyche Structure |
Jinzhi Liao et.al. |
2502.01344 |
link |
2025-02-03 |
Human-Agent Interaction in Synthetic Social Networks: A Framework for Studying Online Polarization |
Tim Donkers et.al. |
2502.01340 |
null |
2025-02-03 |
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models |
Xinyan Guan et.al. |
2502.01142 |
null |
2025-02-03 |
Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning |
Guanlin Li et.al. |
2502.01116 |
null |
2025-02-03 |
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution |
Kanika Goswami et.al. |
2502.00989 |
null |
2025-02-03 |
Context-Aware Hierarchical Merging for Long Document Summarization |
Litu Ou et.al. |
2502.00977 |
null |
2025-02-02 |
Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications |
Yixin Wu et.al. |
2502.00808 |
link |
2025-02-02 |
Generative AI for Analyzing Participatory Rural Appraisal Data: An Exploratory Case Study in Gender Research |
Srividya Sheshadri et.al. |
2502.00763 |
null |
2025-02-02 |
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction |
Chao Wang et.al. |
2502.00717 |
null |
2025-02-01 |
Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation |
Stuart Armstrong et.al. |
2502.00580 |
link |
2025-02-01 |
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning |
Zhi Zhou et.al. |
2502.00511 |
null |
2025-02-01 |
Estimating LLM Uncertainty with Logits |
Huan Ma et.al. |
2502.00290 |
link |
2025-01-31 |
DermaSynth: Rich Synthetic Image-Text Pairs Using Open Access Dermatology Datasets |
Abdurrahim Yilmaz et.al. |
2502.00196 |
null |
2025-01-31 |
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models |
Alina Shutova et.al. |
2501.19392 |
link |
2025-01-31 |
Towards Adaptive Self-Improvement for Smarter Energy Systems |
Alexander Sommer et.al. |
2501.19340 |
null |
2025-01-31 |
Homogeneity Bias as Differential Sampling Uncertainty in Language Models |
Messi H. J. Lee et.al. |
2501.19337 |
null |
2025-01-31 |
Offline Learning for Combinatorial Multi-armed Bandits |
Xutong Liu et.al. |
2501.19300 |
null |
2025-01-31 |
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs |
Kejia Zhang et.al. |
2501.19164 |
null |
2025-01-31 |
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities |
Arjun Krishna et.al. |
2501.19012 |
null |
2025-01-30 |
Survey and Improvement Strategies for Gene Prioritization with Large Language Models |
Matthew Neeley et.al. |
2501.18794 |
null |
2025-01-30 |
Differentially Private Steering for Large Language Model Alignment |
Anmol Goel et.al. |
2501.18532 |
link |
2025-01-30 |
CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization |
Yanxia Deng et.al. |
2501.18475 |
null |
2025-01-31 |
RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing |
Jinyao Guo et.al. |
2501.18160 |
link |
2025-01-29 |
Large Language Models Think Too Fast To Explore Effectively |
Lan Pan et.al. |
2501.18009 |
null |
2025-01-29 |
Uncertainty Quantification and Decomposition for LLM-based Recommendation |
Wonbin Kweon et.al. |
2501.17630 |
link |
2025-01-29 |
Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis |
Kunrong Li et.al. |
2501.17598 |
null |
2025-01-29 |
CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs |
Amey Hengle et.al. |
2501.17581 |
null |
2025-01-28 |
Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization |
Zilu Tang et.al. |
2501.17295 |
null |
2025-01-26 |
Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics |
Jin Hyun Park et.al. |
2501.17187 |
link |
2025-02-01 |
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering |
Beiming Liu et.al. |
2501.17183 |
null |
2025-01-28 |
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data |
Deren Lei et.al. |
2501.17144 |
link |
2025-01-28 |
MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search |
Shuozhi Yuan et.al. |
2501.16607 |
null |
2025-01-27 |
Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models |
Huayu Li et.al. |
2501.16215 |
link |
2025-01-27 |
Parametric Retrieval Augmented Generation |
Weihang Su et.al. |
2501.15915 |
link |
2025-01-26 |
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis |
Robinson Umeike et.al. |
2501.15370 |
null |
2025-01-26 |
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection |
Bo Yang et.al. |
2501.15355 |
null |
2025-01-25 |
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning |
Ayan Sengupta et.al. |
2501.15296 |
null |
2025-01-25 |
Can Large Language Models Be Trusted as Black-Box Evolutionary Optimizers for Combinatorial Problems? |
Jie Zhao et.al. |
2501.15081 |
null |
2025-01-25 |
Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations |
Harshita Chopra et.al. |
2501.15056 |
null |
2025-01-25 |
Federated Retrieval Augmented Generation for Multi-Product Question Answering |
Parshin Shojaee et.al. |
2501.14998 |
null |
2025-01-24 |
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing |
Madeline Anderson et.al. |
2501.14905 |
null |
2025-01-24 |
Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs |
Hang Luo et.al. |
2501.14892 |
link |
2025-01-24 |
Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains |
Xu Chu et.al. |
2501.14431 |
null |
2025-01-24 |
Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph |
Xujian Liang et.al. |
2501.14300 |
link |
2025-01-24 |
Humanity’s Last Exam |
Long Phan et.al. |
2501.14249 |
null |
2025-01-24 |
AI Chatbots as Professional Service Agents: Developing a Professional Identity |
Wenwen Li et.al. |
2501.14179 |
null |
2025-01-23 |
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting |
Xing Hu et.al. |
2501.13987 |
link |
2025-01-23 |
Comprehensive Modeling and Question Answering of Cancer Clinical Practice Guidelines using LLMs |
Bhumika Gupta et.al. |
2501.13984 |
null |
2025-01-20 |
A Layered Multi-Expert Framework for Long-Context Mental Health Assessments |
Jinwen Tang et.al. |
2501.13951 |
null |
2025-01-23 |
CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation |
Guofeng Cui et.al. |
2501.13927 |
null |
2025-01-23 |
On the Reasoning Capacity of AI Models and How to Quantify It |
Santosh Kumar Radha et.al. |
2501.13833 |
null |
2025-01-23 |
Hallucinations Can Improve Large Language Models in Drug Discovery |
Shuzhou Yuan et.al. |
2501.13824 |
null |
2025-01-22 |
OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models |
Chongren Sun et.al. |
2501.12975 |
link |
2025-01-22 |
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces |
Zhenran Xu et.al. |
2501.12909 |
null |
2025-01-22 |
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home |
Viktor Moskvoretskii et.al. |
2501.12835 |
null |
2025-01-30 |
EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering |
Chang Zong et.al. |
2501.12746 |
null |
2025-01-25 |
Online Preference Alignment for Language Models via Count-based Exploration |
Chenjia Bai et.al. |
2501.12735 |
link |
2025-01-22 |
Paradigm-Based Automatic HDL Code Generation Using LLMs |
Wenhao Sun et.al. |
2501.12702 |
null |
2025-01-19 |
AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model |
Lipeng Ma et.al. |
2501.11031 |
link |
2025-01-18 |
Iterative Tree Analysis for Medical Critics |
Zenan Huang et.al. |
2501.10642 |
null |
2025-01-18 |
Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks |
Xin Yi et.al. |
2501.10639 |
link |
2025-01-17 |
4bit-Quantization in Vector-Embedding for RAG |
Taehee Jeong et.al. |
2501.10534 |
link |
2025-01-17 |
Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling |
Suvodip Dey et.al. |
2501.10316 |
link |
2025-01-17 |
Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions |
Zhijie Tan et.al. |
2501.10011 |
null |
2025-01-17 |
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models |
Qiang Liu et.al. |
2501.09997 |
null |
2025-01-22 |
FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs |
Zengyi Gao et.al. |
2501.09957 |
null |
2025-01-17 |
Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs |
Reham Omar et.al. |
2501.09928 |
link |
2025-01-17 |
Towards A Litmus Test for Common Sense |
Hugo Latapie et.al. |
2501.09913 |
null |
2025-01-17 |
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis |
Zhe Chen et.al. |
2501.09887 |
null |
2025-01-16 |
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs |
Nada Saadi et.al. |
2501.09825 |
null |
2025-01-16 |
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models |
Maxwell J. Yin et.al. |
2501.09804 |
null |
2025-01-24 |
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong |
Tairan Fu et.al. |
2501.09775 |
null |
2025-01-16 |
Confidence Estimation for Error Detection in Text-to-SQL Systems |
Oleg Somov et.al. |
2501.09527 |
link |
2025-01-16 |
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy |
Huandong Wang et.al. |
2501.09431 |
null |
2025-01-16 |
Rational Tuning of LLM Cascades via Probabilistic Modeling |
Michael J. Zellinger et.al. |
2501.09345 |
null |
2025-01-16 |
To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation |
Kaustubh D. Dhole et.al. |
2501.09292 |
null |
2025-01-15 |
Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach |
Alireza Ghaffari et.al. |
2501.09107 |
null |
2025-01-15 |
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot |
Ruixiang Jiang et.al. |
2501.09012 |
link |
2025-01-15 |
Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching |
Chuangtao Ma et.al. |
2501.08686 |
link |
2025-01-14 |
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models |
Anurag Kumar et.al. |
2501.08421 |
null |
2025-01-14 |
OptiChat: Bridging Optimization Models and Practitioners with Large Language Models |
Hao Chen et.al. |
2501.08406 |
link |
2025-01-14 |
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them |
Abhilasha Ravichander et.al. |
2501.08292 |
null |
2025-01-14 |
Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering |
Feijie Wu et.al. |
2501.07813 |
null |
2025-01-13 |
GPT as a Monte Carlo Language Tree: A Probabilistic Perspective |
Kun-Peng Ning et.al. |
2501.07641 |
null |
2025-01-13 |
SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models |
Fabien Bernier et.al. |
2501.07639 |
null |
2025-01-13 |
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment |
Difei Gu et.al. |
2501.07525 |
link |
2025-01-13 |
Enhancing LLM’s Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection |
Xin Yin et.al. |
2501.07425 |
null |
2025-01-13 |
ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training |
Jiayang Wu et.al. |
2501.07078 |
link |
2025-01-11 |
Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering |
Yinghao Hu et.al. |
2501.06521 |
link |
2025-01-11 |
First Token Probability Guided RAG for Telecom Question Answering |
Tingwei Chen et.al. |
2501.06468 |
null |
2025-01-21 |
MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare |
Ye Chen et.al. |
2501.06465 |
null |
2025-01-10 |
Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea |
Eunjung Cho et.al. |
2501.05981 |
null |
2025-01-10 |
Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models |
Sungjae Lee et.al. |
2501.05752 |
null |
2025-01-09 |
Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning |
Laura Puccioni et.al. |
2501.05248 |
null |
2025-01-09 |
Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments |
Yifan Xu et.al. |
2501.04947 |
null |
2025-01-09 |
HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers |
Yiyao Yang et.al. |
2501.04908 |
link |
2025-01-09 |
SUGAR: Leveraging Contextual Confidence for Smarter Retrieval |
Hanna Zubkova et.al. |
2501.04899 |
null |
2025-01-08 |
Re-ranking the Context for Multimodal Retrieval Augmented Generation |
Matin Mortaheb et.al. |
2501.04695 |
null |
2025-01-08 |
Multi-task retriever fine-tuning for domain-specific and efficient RAG |
Patrice Béchard et.al. |
2501.04652 |
null |
2025-01-16 |
Knowledge Retrieval Based on Generative AI |
Te-Lun Yang et.al. |
2501.04635 |
null |
2025-01-07 |
RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance |
Matin Mortaheb et.al. |
2501.03995 |
null |
2025-01-07 |
Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles |
Yuxi Xia et.al. |
2501.03991 |
null |
2025-01-07 |
Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States |
Jurgita Kapočiūtė-Dzikienė et.al. |
2501.03952 |
null |
2025-01-08 |
A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval |
Shuo Tong et.al. |
2501.03295 |
null |
2025-01-06 |
CALM: Curiosity-Driven Auditing for Large Language Models |
Xiang Zheng et.al. |
2501.02997 |
link |
2025-01-19 |
FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models |
Zhuo Chen et.al. |
2501.02968 |
null |
2025-01-09 |
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion |
Zhaoyi Yan et.al. |
2501.02795 |
null |
2025-01-06 |
QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance |
Binita Saha et.al. |
2501.02702 |
null |
2025-01-06 |
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models |
Andrés Villa et.al. |
2501.02699 |
null |
2025-01-05 |
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications |
Zhe Chen et.al. |
2501.02460 |
null |
2025-01-04 |
Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation |
Shijie Wang et.al. |
2501.02226 |
null |
2025-01-04 |
EvoPath: Evolutionary Meta-path Discovery with Large Language Models for Complex Heterogeneous Information Networks |
Shixuan Liu et.al. |
2501.02192 |
null |
2025-01-04 |
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit |
Huixue Zhou et.al. |
2501.02173 |
null |
2025-01-02 |
Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection |
Kedi Chen et.al. |
2501.02020 |
null |
2025-01-03 |
Multi-Agent Conversational Online Learning for Adaptive LLM Response Identification |
Xiangxiang Dai et.al. |
2501.01849 |
link |
2025-01-03 |
LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries |
Michal Kuk et.al. |
2501.01711 |
null |
2025-01-03 |
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges |
Mohamed Hisham Abdellatif et.al. |
2501.01588 |
null |
2025-01-02 |
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery |
Kanishk Gandhi et.al. |
2501.01540 |
link |
2025-01-02 |
Aligning Large Language Models for Faithful Integrity Against Opposing Argument |
Yong Zhao et.al. |
2501.01336 |
link |
2025-01-02 |
Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension |
Yanbo Fang et.al. |
2501.01332 |
null |
2025-01-03 |
Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking |
Xiaoxue Cheng et.al. |
2501.01306 |
null |
2025-01-02 |
Large Language Model-Enhanced Symbolic Reasoning for Knowledge Base Completion |
Qiyuan He et.al. |
2501.01246 |
null |
2025-01-02 |
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization |
Yongle Huang et.al. |
2501.01245 |
link |
2025-01-02 |
Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method |
Ruichen Zhang et.al. |
2501.01141 |
null |
2025-01-02 |
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models |
Yanwen Huang et.al. |
2501.01059 |
null |
2025-01-02 |
Dynamic Scaling of Unit Tests for Code Reward Modeling |
Zeyao Ma et.al. |
2501.01054 |
null |
2025-01-07 |
LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management |
Yichen Luo et.al. |
2501.00826 |
null |
2025-01-01 |
NMM-HRI: Natural Multi-modal Human-Robot Interaction with Voice and Deictic Posture via Large Language Model |
Yuzhi Lai et.al. |
2501.00785 |
null |
2024-12-31 |
Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs |
Harit Vishwakarma et.al. |
2501.00555 |
null |
2024-12-31 |
A review of faithfulness metrics for hallucination assessment in Large Language Models |
Ben Malin et.al. |
2501.00269 |
null |
2024-12-31 |
CancerKG.ORG A Web-scale, Interactive, Verifiable Knowledge Graph-LLM Hybrid for Assisting with Optimal Cancer Treatment and Care |
Michael Gubanov et.al. |
2501.00223 |
null |
2024-12-30 |
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions |
Mourad Heddaya et.al. |
2501.00097 |
null |
2024-12-30 |
Facilitating large language model Russian adaptation with Learned Embedding Propagation |
Mikhail Tikhomirov et.al. |
2412.21140 |
link |
2024-12-30 |
KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation |
Siyuan Fang et.al. |
2412.20995 |
null |
2024-12-30 |
Are LLMs Really Not Knowledgable? Mining the Submerged Knowledge in LLMs’ Memory |
Xingjian Tao et.al. |
2412.20846 |
null |
2024-12-30 |
UBER: Uncertainty-Based Evolution with Large Language Models for Automatic Heuristic Design |
Zijie Chen et.al. |
2412.20694 |
link |
2025-01-05 |
Distilling Desired Comments for Enhanced Code Review with Large Language Models |
Yongda Yu et.al. |
2412.20340 |
null |
2024-12-29 |
Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain |
Shintaro Ozaki et.al. |
2412.20309 |
link |
2024-12-28 |
ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty |
Qing Zong et.al. |
2412.20251 |
link |
2024-12-27 |
Toward Adaptive Reasoning in Large Language Models with Thought Rollback |
Sijia Chen et.al. |
2412.19707 |
link |
2024-12-27 |
Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs |
Zhe Yang et.al. |
2412.19513 |
link |
2024-12-27 |
MBQ: Modality-Balanced Quantization for Large Vision-Language Models |
Shiyao Li et.al. |
2412.19509 |
link |
2024-12-26 |
RAG with Differential Privacy |
Nicolas Grislain et.al. |
2412.19291 |
link |
2025-01-03 |
MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models |
Kaiwen Zuo et.al. |
2412.18947 |
null |
2025-01-06 |
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation |
Derong Xu et.al. |
2412.18537 |
link |
2024-12-24 |
Is Large Language Model Good at Triple Set Prediction? An Empirical Study |
Yuan Yuan et.al. |
2412.18443 |
null |
2024-12-24 |
Annotating References to Mythological Entities in French Literature |
Thierry Poibeau et.al. |
2412.18270 |
null |
2024-12-24 |
Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) – a Large Language Model Chatbot for Perioperative Medicine |
Yu He Ke et.al. |
2412.18096 |
null |
2024-12-23 |
Trustworthy and Efficient LLMs Meet Databases |
Kyoungmin Kim et.al. |
2412.18022 |
null |
2024-12-22 |
The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM’s Internal States |
Fabian Ridder et.al. |
2412.17056 |
link |
2024-12-22 |
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs |
Alexander von Recum et.al. |
2412.16974 |
null |
2024-12-28 |
Lillama: Large Language Models Compression via Low-Rank Feature Distillation |
Yaya Sy et.al. |
2412.16719 |
null |
2024-12-21 |
Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks |
Jinyan Su et.al. |
2412.16708 |
link |
2024-12-21 |
AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles |
Aritra Kumar Lahiri et.al. |
2412.16701 |
null |
2024-12-21 |
Internalized Self-Correction for Large Language Models |
Nishanth Upadhyaya et.al. |
2412.16653 |
null |
2024-12-21 |
Identifying Cyberbullying Roles in Social Media |
Manuel Sandoval et.al. |
2412.16417 |
null |
2024-12-20 |
Towards Safe and Honest AI Agents with Neural Self-Other Overlap |
Marc Carauleanu et.al. |
2412.16325 |
null |
2024-12-20 |
Logical Consistency of Large Language Models in Fact-checking |
Bishwamittra Ghosh et.al. |
2412.16100 |
null |
2024-12-20 |
To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models |
Jessica Y. Bo et.al. |
2412.15584 |
null |
2024-12-24 |
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage |
Saehyung Lee et.al. |
2412.15484 |
null |
2024-12-19 |
Systematic Evaluation of Long-Context LLMs on Financial Concepts |
Lavanya Gupta et.al. |
2412.15386 |
null |
2024-12-19 |
Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models |
Nishtha N. Vaidya et.al. |
2412.15309 |
null |
2024-12-19 |
A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation |
Bhaskarjit Sarmah et.al. |
2412.15298 |
null |
2024-12-19 |
Confidence in the Reasoning of Large Language Models |
Yudi Pawitan et.al. |
2412.15296 |
link |
2024-12-17 |
SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation |
Yuzheng Cai et.al. |
2412.15272 |
link |
2024-12-17 |
A MapReduce Approach to Effectively Utilize Long Context Information in Retrieval Augmented Language Models |
Gongbo Zhang et.al. |
2412.15271 |
null |
2024-12-15 |
LLMs for Literature Review: Are we there yet? |
Shubham Agarwal et.al. |
2412.15249 |
null |
2024-12-19 |
Rethinking Uncertainty Estimation in Natural Language Generation |
Lukas Aichberger et.al. |
2412.15176 |
null |
2024-12-19 |
Adaptive Pruning for Large Language Models with Structural Importance Awareness |
Haotian Zheng et.al. |
2412.15127 |
null |
2024-12-19 |
Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability |
Xiangsen Chen et.al. |
2412.15101 |
null |
2024-12-19 |
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response |
Junyu Luo et.al. |
2412.14922 |
link |
2024-12-19 |
Dehallucinating Parallel Context Extension for Retrieval-Augmented Generation |
Zexiong Ma et.al. |
2412.14905 |
null |
2024-12-19 |
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling |
Junyi Li et.al. |
2412.14860 |
null |
2024-12-19 |
Query pipeline optimization for cancer patient question answering systems |
Maolin He et.al. |
2412.14751 |
null |
2024-12-19 |
On Verbalized Confidence Scores for LLMs |
Daniel Yang et.al. |
2412.14737 |
link |
2024-12-25 |
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models |
Zijun Chen et.al. |
2412.14660 |
link |
2024-12-19 |
Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment |
Teng Xiao et.al. |
2412.14516 |
link |
2024-12-19 |
FaultExplainer: Leveraging Large Language Models for Interpretable Fault Detection and Diagnosis |
Abdullah Khan et.al. |
2412.14492 |
link |
2024-12-18 |
LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis |
Chengpeng Wang et.al. |
2412.14399 |
null |
2024-12-18 |
Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets |
Simon Thorne et.al. |
2412.14062 |
null |
2024-12-18 |
Discovering maximally consistent distribution of causal tournaments with Large Language Models |
Federico Baldo et.al. |
2412.14019 |
null |
2024-12-27 |
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence |
Jinghan He et.al. |
2412.13949 |
null |
2024-12-29 |
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection |
Le Yang et.al. |
2412.13817 |
link |
2024-12-18 |
Meta-Reflection: A Feedback-Free Reflection Learning Framework |
Yaoke Wang et.al. |
2412.13781 |
null |
2024-12-18 |
Are LLMs Good Literature Review Writers? Evaluating the Literature Review Writing Ability of Large Language Models |
Xuemei Tang et.al. |
2412.13612 |
null |
2024-12-18 |
Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement |
Qianyue Wang et.al. |
2412.13575 |
link |
2024-12-18 |
C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System |
Parker Addison et.al. |
2412.13163 |
null |
2024-12-17 |
Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health |
Vivek Kumar et.al. |
2412.12981 |
link |
2024-12-17 |
A Survey of Calibration Process for Black-Box LLMs |
Liangru Xie et.al. |
2412.12767 |
null |
2024-12-18 |
Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models |
Seungeun Oh et.al. |
2412.12687 |
null |
2024-12-17 |
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context |
Zhiyuan Chang et.al. |
2412.12632 |
null |
2024-12-17 |
Jailbreaking? One Step Is Enough! |
Weixiong Zheng et.al. |
2412.12621 |
null |
2024-12-17 |
When to Speak, When to Abstain: Contrastive Decoding with Abstention |
Hyuhng Joon Kim et.al. |
2412.12527 |
null |
2024-12-12 |
Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off |
Eoin M. Kenny et.al. |
2412.12169 |
link |
2024-12-11 |
SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration |
Yuanhao Shen et.al. |
2412.12151 |
link |
2024-12-16 |
LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts |
Zhuhao Wang et.al. |
2412.12001 |
link |
2024-12-16 |
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation |
Xiaoxi Li et.al. |
2412.11919 |
link |
2024-12-16 |
Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments |
Andrii Nikolaiev et.al. |
2412.11908 |
null |
2024-12-16 |
A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection |
Simon Hachmeier et.al. |
2412.11851 |
link |
2024-12-16 |
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models |
Boyang Xue et.al. |
2412.11803 |
link |
2024-12-16 |
Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods |
Diana Bar-Or Nirman et.al. |
2412.11625 |
null |
2024-12-16 |
Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes |
Antonio Carlos Rivera et.al. |
2412.11396 |
null |
2024-12-15 |
CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation |
Kurando IIDA et.al. |
2412.11261 |
null |
2024-12-15 |
Do Tutors Learn from Equity Training and Can Generative AI Assess It? |
Danielle R. Thomas et.al. |
2412.11255 |
link |
2024-12-15 |
Task-Oriented Dialog Systems for the Senegalese Wolof Language |
Derguene Mbaye et.al. |
2412.11203 |
null |
2024-12-15 |
Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning |
Shengqiong Wu et.al. |
2412.11124 |
null |
2024-12-15 |
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning |
Yun Qu et.al. |
2412.11120 |
link |
2024-12-15 |
Empowering LLMs to Understand and Generate Complex Vector Graphics |
Ximing Xing et.al. |
2412.11102 |
null |
2024-12-17 |
MedG-KRP: Medical Graph Knowledge Representation Probing |
Gabriel R. Rosenbaum et.al. |
2412.10982 |
null |
2024-12-14 |
Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data |
Xue Wu et.al. |
2412.10654 |
null |
2024-12-13 |
Benchmarking large language models for materials synthesis: the case of atomic layer deposition |
Angel Yanguas-Gil et.al. |
2412.10477 |
null |
2024-12-13 |
Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts |
Hazel Kim et.al. |
2412.10246 |
null |
2024-12-13 |
How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives |
Timour Ichmoukhamedov et.al. |
2412.10220 |
link |
2024-12-13 |
TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering |
Bingru Li et.al. |
2412.10139 |
null |
2024-12-13 |
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL |
Yang Qin et.al. |
2412.10138 |
link |
2024-12-12 |
DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction |
Yu Feng et.al. |
2412.09572 |
null |
2024-12-12 |
Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion |
Ben Liu et.al. |
2412.09094 |
link |
2024-12-12 |
Dial-In LLM: Human-Aligned Dialogue Intent Clustering with LLM-in-the-loop |
Mengze Hong et.al. |
2412.09049 |
null |
2024-12-12 |
Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning |
Wenna Lai et.al. |
2412.09046 |
null |
2024-12-12 |
ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty |
Meizhi Zhong et.al. |
2412.09036 |
null |
2024-12-11 |
Learning to Reason via Self-Iterative Process Feedback for Small Language Models |
Kaiyuan Chen et.al. |
2412.08393 |
null |
2024-12-11 |
What You See Is Not Always What You Get: An Empirical Study of Code Comprehension by Large Language Models |
Bangshuo Zhu et.al. |
2412.08098 |
null |
2024-12-10 |
HalluCana: Fixing LLM Hallucination with A Canary Lookahead |
Tianyi Li et.al. |
2412.07965 |
null |
2024-12-10 |
Forking Paths in Neural Text Generation |
Eric Bigelow et.al. |
2412.07961 |
null |
2024-12-10 |
Low-Rank Correction for Quantized LLMs |
Meyer Scetbon et.al. |
2412.07902 |
null |
2024-12-08 |
Language Model as Visual Explainer |
Xingyi Yang et.al. |
2412.07802 |
null |
2024-12-16 |
Granite Guardian |
Inkit Padhi et.al. |
2412.07724 |
link |
2024-12-10 |
Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation |
Qinhong Lin et.al. |
2412.07255 |
null |
2024-12-10 |
Filling Memory Gaps: Enhancing Continual Semantic Parsing via SQL Syntax Variance-Guided LLMs without Real Data Replay |
Ruiheng Liu et.al. |
2412.07246 |
null |
2024-12-10 |
MAPLE: A Framework for Active Preference Learning Guided by Large Language Models |
Saaduddin Mahmud et.al. |
2412.07207 |
null |
2024-12-10 |
When Graph Meets Retrieval Augmented Generation for Wireless Networks: A Tutorial and Case Study |
Yang Xiong et.al. |
2412.07189 |
null |
2024-12-10 |
Post-Training Statistical Calibration for Higher Activation Sparsity |
Vui Seng Chua et.al. |
2412.07174 |
link |
2024-12-11 |
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models |
Jieyu Zhang et.al. |
2412.07012 |
link |
2024-12-09 |
Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study |
Ehsan Shareghi et.al. |
2412.06272 |
null |
2024-12-09 |
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization |
Kangyu Zhu et.al. |
2412.06141 |
link |
2024-12-08 |
Hallucination-aware Optimization for Large Language Model-empowered Communications |
Yinqiu Liu et.al. |
2412.06007 |
link |
2024-12-07 |
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models |
Haizhou Shi et.al. |
2412.05723 |
link |
2024-12-07 |
Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent |
Ziyuan Qin et.al. |
2412.05722 |
null |
2024-12-07 |
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions |
Ola Shorinwa et.al. |
2412.05563 |
null |
2024-12-07 |
Ranking of Large Language Model with Nonparametric Prompts |
Zebin Wang et.al. |
2412.05506 |
null |
2024-12-06 |
Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization |
Subhojyoti Mukherjee et.al. |
2412.05469 |
null |
2024-12-06 |
A Graph-Based Approach for Conversational AI-Driven Personal Memory Capture and Retrieval in a Real-world Application |
Savini Kashmira et.al. |
2412.05447 |
null |
2024-12-06 |
HiVeGen – Hierarchical LLM-based Verilog Generation for Scalable Chip Design |
Jinwei Tang et.al. |
2412.05393 |
null |
2024-12-09 |
Enhancing FKG.in: automating Indian food composition analysis |
Saransh Kumar Gupta et.al. |
2412.05248 |
null |
2024-12-06 |
100% Hallucination Elimination Using Acurai |
Michael C. Wood et.al. |
2412.05223 |
link |
2024-12-06 |
Steps are all you need: Rethinking STEM Education with Prompt Engineering |
Krishnasai Addala et.al. |
2412.05023 |
null |
2024-12-06 |
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance |
Xuchan Bao et.al. |
2412.04746 |
null |
2024-12-06 |
LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs |
Xuan Chen et.al. |
2412.04690 |
null |
2024-12-05 |
HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning |
Manish Bhattarai et.al. |
2412.04661 |
link |
2024-12-10 |
Argumentative Experience: Reducing Confirmation Bias on Controversial Issues through LLM-Generated Multi-Persona Debates |
Li Shi et.al. |
2412.04629 |
null |
2024-12-05 |
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion |
Jiuhai Chen et.al. |
2412.04424 |
link |
2024-12-05 |
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation |
Xuying Li et.al. |
2412.04415 |
null |
2024-12-05 |
Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots |
Maria Paola Priola et.al. |
2412.04235 |
null |
2024-12-05 |
Reducing Tool Hallucination via Reliability Alignment |
Hongshen Xu et.al. |
2412.04141 |
null |
2024-12-04 |
A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences |
Gabriel Lino Garcia et.al. |
2412.03531 |
null |
2024-12-04 |
You’re (Not) My Type – Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? |
Dominic Lohr et.al. |
2412.03516 |
null |
2024-12-03 |
Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning |
Ranganath Krishnan et.al. |
2412.02904 |
null |
2024-12-03 |
An Evolutionary Large Language Model for Hallucination Mitigation |
Abdennour Boulesnane et.al. |
2412.02790 |
null |
2024-12-03 |
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation |
Junyuan Zhang et.al. |
2412.02592 |
link |
2024-12-03 |
Semantic Tokens in Retrieval Augmented Generation |
Joel Suro et.al. |
2412.02563 |
null |
2024-12-04 |
The use of large language models to enhance cancer clinical trial educational materials |
Mingye Gao et.al. |
2412.01955 |
null |
2024-12-04 |
The Reality of AI and Biorisk |
Aidan Peppin et.al. |
2412.01946 |
null |
2024-12-02 |
R-Bot: An LLM-based Query Rewrite System |
Zhaoyan Sun et.al. |
2412.01661 |
null |
2024-12-02 |
Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input |
Francesco Taioli et.al. |
2412.01250 |
null |
2024-12-02 |
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages |
Jia Guo et.al. |
2412.01186 |
link |
2024-12-02 |
SAUP: Situation Awareness Uncertainty Propagation on LLM Agent |
Qiwei Zhao et.al. |
2412.01033 |
null |
2024-12-02 |
AI Benchmarks and Datasets for LLM Evaluation |
Todor Ivanov et.al. |
2412.01020 |
null |
2024-12-06 |
Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection |
Shanu Kumar et.al. |
2412.00353 |
null |
2024-11-30 |
Human-Like Code Quality Evaluation through LLM-based Recursive Semantic Comprehension |
Fangzhou Xu et.al. |
2412.00314 |
null |
2024-11-29 |
An AI-Driven Data Mesh Architecture Enhancing Decision-Making in Infrastructure Construction and Public Procurement |
Saurabh Mishra et.al. |
2412.00224 |
null |
2024-11-24 |
Improving Medical Diagnostics with Vision-Language Models: Convex Hull-Based Uncertainty Analysis |
Ferhat Ozgur Catak et.al. |
2412.00056 |
null |
2024-12-02 |
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis |
Alessandro Scirè et.al. |
2411.19655 |
link |
2024-11-29 |
RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation |
Xianfeng Tan et.al. |
2411.19528 |
null |
2024-11-29 |
Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems |
Shengming Zhao et.al. |
2411.19463 |
null |
2024-11-28 |
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs |
Anirudh Phukan et.al. |
2411.19187 |
null |
2024-11-28 |
Mars-PO: Multi-Agent Reasoning System Preference Optimization |
Xiaoxuan Lou et.al. |
2411.19039 |
null |
2024-11-28 |
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models |
Jisheng Bai et.al. |
2411.18953 |
link |
2024-11-27 |
Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students |
Tiffany Zhu et.al. |
2411.18708 |
null |
2024-11-27 |
Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track |
Deepak Gupta et.al. |
2411.18069 |
null |
2024-11-26 |
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation |
Sankalp Sinha et.al. |
2411.17945 |
link |
2024-11-26 |
AI2T: Building Trustable AI Tutors by Interactively Teaching a Self-Aware Learning Agent |
Daniel Weitekamp et.al. |
2411.17924 |
null |
2024-11-26 |
$H^3$ Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs |
Selim Furkan Tekin et.al. |
2411.17792 |
link |
2024-11-26 |
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation |
Harsh Singh et.al. |
2411.17636 |
null |
2024-11-26 |
One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models |
Pengfei Cao et.al. |
2411.17401 |
null |
2024-11-26 |
Can LLMs be Good Graph Judger for Knowledge Graph Construction? |
Haoyu Huang et.al. |
2411.17388 |
link |
2024-11-26 |
Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning |
Milena Chadimová et.al. |
2411.17304 |
null |
2024-11-26 |
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator |
Fan Yang et.al. |
2411.17261 |
null |
2024-11-25 |
Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries |
Harshavardhan Battula et.al. |
2411.16818 |
null |
2024-11-25 |
Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models |
Alireza Amiri-Margavi et.al. |
2411.16797 |
null |
2024-11-25 |
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs |
Wey Yeh Choong et.al. |
2411.16771 |
link |
2024-11-23 |
Text-to-SQL Calibration: No Need to Ask – Just Rescale Model Probabilities |
Ashwin Ramachandran et.al. |
2411.16742 |
null |
2024-11-23 |
Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction |
Mitchell Rosser et.al. |
2411.16723 |
null |
2024-11-28 |
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation |
Sanjana Ramprasad et.al. |
2411.16638 |
null |
2024-12-03 |
AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning |
Amy Xin et.al. |
2411.16495 |
link |
2024-11-25 |
Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models |
Zhihua Duan et.al. |
2411.16189 |
null |
2024-11-24 |
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown |
Lifu Tu et.al. |
2411.15993 |
null |
2024-11-23 |
Ontology-Constrained Generation of Domain-Specific Clinical Summaries |
Gaya Mehenni et.al. |
2411.15666 |
link |
2024-11-23 |
MC-NEST – Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree |
Gollam Rabby et.al. |
2411.15645 |
link |
2024-11-23 |
“All that Glitters”: Approaches to Evaluations with Unreliable Model and Human Annotations |
Michael Hardy et.al. |
2411.15634 |
link |
2024-11-22 |
Sycophancy in Large Language Models: Causes and Mitigations |
Lars Malmqvist et.al. |
2411.15287 |
null |
2024-11-18 |
Can Open-source LLMs Enhance Data Augmentation for Toxic Detection?: An Experimental Study |
Zheng Hui et.al. |
2411.15175 |
null |
2024-11-22 |
Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation |
Colin Diggs et.al. |
2411.14971 |
null |
2024-11-22 |
SwissADT: An Audio Description Translation System for Swiss Languages |
Lukas Fischer et.al. |
2411.14967 |
null |
2024-12-01 |
G-RAG: Knowledge Expansion in Material Science |
Radeen Mostafa et.al. |
2411.14592 |
link |
2024-11-20 |
The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz |
David Noever et.al. |
2411.14486 |
null |
2024-11-19 |
Why you don’t overfit, and don’t need Bayes if you only train for one epoch |
Laurence Aitchison et.al. |
2411.14478 |
null |
2024-11-18 |
Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning |
Elizaveta Reganova et.al. |
2411.14465 |
null |
2024-11-15 |
Guiding Reinforcement Learning Using Uncertainty-Aware Large Language Models |
Maryam Shoaeinaeini et.al. |
2411.14457 |
null |
2024-11-21 |
Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance |
Haozhe Zhao et.al. |
2411.14279 |
null |
2024-11-21 |
Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective |
Ernests Lavrinovics et.al. |
2411.14258 |
null |
2024-11-21 |
RAG-Thief: Scalable Extraction of Private Data from Retrieval-Augmented Generation Applications with Agent-based Attacks |
Changyue Jiang et.al. |
2411.14110 |
null |
2024-11-21 |
XAgents: A Framework for Interpretable Rule-Based Multi-Agents Cooperation |
Hailong Yang et.al. |
2411.13932 |
null |
2024-11-21 |
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels |
Jianhao Yan et.al. |
2411.13775 |
link |
2024-11-20 |
Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics |
Ryan Mok et.al. |
2411.13685 |
link |
2024-11-21 |
Disentangling Memory and Reasoning Ability in Large Language Models |
Mingyu Jin et.al. |
2411.13504 |
link |
2024-11-20 |
Fact-Level Confidence Calibration and Self-Correction |
Yige Yuan et.al. |
2411.13343 |
link |
2024-11-20 |
Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding |
Nabeel Seedat et.al. |
2411.13163 |
null |
2024-11-16 |
A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery |
Grace Sng et.al. |
2411.12759 |
null |
2024-11-19 |
Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs |
Malay Kumar et.al. |
2411.12685 |
null |
2024-11-15 |
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination |
Haojie Zheng et.al. |
2411.12591 |
link |
2024-11-19 |
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering |
Aryan Keluskar et.al. |
2411.12395 |
null |
2024-11-28 |
VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation |
Ruiyang Zhang et.al. |
2411.11919 |
null |
2024-11-07 |
Deploying Large Language Models With Retrieval Augmented Generation |
Sonal Prabhune et.al. |
2411.11895 |
link |
2024-11-18 |
Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality |
Viktoriia Chekalina et.al. |
2411.11531 |
null |
2024-11-18 |
Membership Inference Attack against Long-Context Large Language Models |
Zixiong Wang et.al. |
2411.11424 |
null |
2024-11-29 |
Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword? |
Rosalia Tufano et.al. |
2411.11401 |
link |
2024-11-17 |
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering |
Zeping Yu et.al. |
2411.10950 |
link |
2024-11-16 |
Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation |
Shuyang Hou et.al. |
2411.10753 |
null |
2024-11-16 |
I’m Spartacus, No, I’m Spartacus: Measuring and Understanding LLM Identity Confusion |
Kun Li et.al. |
2411.10683 |
null |
2024-11-15 |
Personalization of Code Readability Evaluation Based on LLM Using Collaborative Filtering |
Buntaro Hiraki et.al. |
2411.10583 |
null |
2024-11-15 |
On the Privacy Risk of In-context Learning |
Haonan Duan et.al. |
2411.10512 |
null |
2024-11-15 |
Understanding The Effect Of Temperature On Alignment With Human Opinions |
Maja Pavlovic et.al. |
2411.10080 |
null |
2024-11-15 |
Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity |
Zichen Song et.al. |
2411.10069 |
null |
2024-11-15 |
Experiences from Using LLMs for Repository Mining Studies in Empirical Software Engineering |
Vincenzo de Martino et.al. |
2411.09974 |
null |
2024-11-15 |
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference |
Janghwan Lee et.al. |
2411.09909 |
null |
2024-11-14 |
LLM Hallucination Reasoning with Zero-shot Knowledge Test |
Seongmin Lee et.al. |
2411.09689 |
null |
2024-11-14 |
DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine |
Jean Seo et.al. |
2411.09255 |
link |
2024-11-14 |
Toward Democratized Generative AI in Next-Generation Mobile Edge Networks |
Ruichen Zhang et.al. |
2411.09148 |
null |
2024-11-13 |
The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models |
Daniel P. Jeong et.al. |
2411.08870 |
link |
2024-11-04 |
QCG-Rerank: Chunks Graph Rerank with Query Expansion in Retrieval-Augmented LLMs for Tourism Domain |
Qikai Wei et.al. |
2411.08724 |
null |
2024-11-13 |
Neural Topic Modeling with Large Language Models in the Loop |
Xiaohao Yang et.al. |
2411.08534 |
null |
2024-11-13 |
Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach |
Shangfeng Chen et.al. |
2411.08348 |
null |
2024-11-13 |
Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering |
Farouq Sammour et.al. |
2411.08320 |
null |
2024-11-12 |
Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data |
Juanhui Li et.al. |
2411.08028 |
null |
2024-11-12 |
From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents |
Chuyi Kong et.al. |
2411.07965 |
null |
2024-11-13 |
Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders |
Xiaofeng Zhu et.al. |
2411.07870 |
null |
2024-11-12 |
Verbosity $\neq$ Veracity: Demystify Verbosity Compensation Behavior of Large Language Models |
Yusen Zhang et.al. |
2411.07858 |
link |
2024-11-12 |
OWLed: Outlier-weighed Layerwise Pruning for Efficient Autonomous Driving Framework |
Jiaxi Li et.al. |
2411.07711 |
link |
2024-11-12 |
DecoPrompt: Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises |
Nan Xu et.al. |
2411.07457 |
link |
2024-11-16 |
Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation |
Ziwei Liu et.al. |
2411.07021 |
null |
2024-11-11 |
LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help? |
Rikiya Takehi et.al. |
2411.06877 |
link |
2024-11-11 |
AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant |
Yujia Zhou et.al. |
2411.06805 |
link |
2024-11-11 |
Anchor Attention, Small Cache: Code Generation with Large Language Models |
Xiangyu Zhang et.al. |
2411.06680 |
link |
2024-11-10 |
CriticAL: Critic Automation with Language Models |
Michael Y. Li et.al. |
2411.06590 |
null |
2024-11-10 |
Epistemic Integrity in Large Language Models |
Bijean Ghafouri et.al. |
2411.06528 |
link |
2024-11-10 |
Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques |
Daniil Sulimov et.al. |
2411.06445 |
null |
2024-11-09 |
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems |
Hailey Joren et.al. |
2411.06037 |
null |
2024-11-12 |
Game-theoretic LLM: Agent Workflow for Negotiation Games |
Wenyue Hua et.al. |
2411.05990 |
link |
2024-11-08 |
FactLens: Benchmarking Fine-Grained Fact Verification |
Kushan Mitra et.al. |
2411.05980 |
null |
2024-11-08 |
Mitigating Hallucination with ZeroG: An Advanced Knowledge Management Engine |
Anantha Sharma et.al. |
2411.05936 |
null |
2024-11-08 |
The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent |
Leon O. H. Kroczek et.al. |
2411.05653 |
null |
2024-11-16 |
Web Archives Metadata Generation with GPT-4o: Challenges and Insights |
Abigail Yongping Huang et.al. |
2411.05409 |
link |
2024-11-08 |
Seeing Through the Fog: A Cost-Effectiveness Analysis of Hallucination Detection Systems |
Alexander Thomas et.al. |
2411.05270 |
null |
2024-11-07 |
Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability |
Yanjun Gao et.al. |
2411.04962 |
null |
2024-11-07 |
Prompt-Guided Internal States for Hallucination Detection of Large Language Models |
Fujie Zhang et.al. |
2411.04847 |
link |
2024-11-07 |
Self-Calibrated Listwise Reranking with Large Language Models |
Ruiyang Ren et.al. |
2411.04602 |
null |
2024-11-07 |
LLM-R: A Framework for Domain-Adaptive Maintenance Scheme Generation Combining Hierarchical Agents and RAG |
Laifa Tao et.al. |
2411.04476 |
null |
2024-11-07 |
Bayesian Calibration of Win Rate Estimation with LLM Evaluators |
Yicheng Gao et.al. |
2411.04424 |
link |
2024-11-06 |
A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Language Models and Explainable AI |
Melusi Malinga et.al. |
2411.04316 |
null |
2024-11-06 |
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? |
Daniel P. Jeong et.al. |
2411.04118 |
link |
2024-11-06 |
Fine-Grained Guidance for Retrievers: Leveraging LLMs’ Feedback in Retrieval-Augmented Generation |
Yuhang Liu et.al. |
2411.03957 |
null |
2024-11-06 |
EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning |
Kiran Purohit et.al. |
2411.03877 |
link |
2024-11-06 |
QUILL: Quotation Generation Enhancement of Large Language Models |
Jin Xiao et.al. |
2411.03675 |
link |
2024-11-05 |
Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature |
Viviane Torres da Silva et.al. |
2411.03484 |
null |
2024-11-05 |
VERITAS: A Unified Approach to Reliability Evaluation |
Rajkumar Ramamurthy et.al. |
2411.03300 |
null |
2024-11-05 |
Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities |
Ryosuke Takata et.al. |
2411.03252 |
null |
2024-11-05 |
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems |
Jiejun Tan et.al. |
2411.02959 |
link |
2024-11-05 |
Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning |
Tao Zhang et.al. |
2411.02864 |
null |
2024-11-05 |
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization |
Yuxi Xie et.al. |
2411.02712 |
link |
2024-11-07 |
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees |
Fan Nie et.al. |
2411.02603 |
null |
2024-11-03 |
Graph-based Confidence Calibration for Large Language Models |
Yukun Li et.al. |
2411.02454 |
null |
2024-11-03 |
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models |
Aliyah R. Hsu et.al. |
2411.02448 |
link |
2024-11-04 |
Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models |
Guangzhi Xiong et.al. |
2411.02382 |
null |
2024-11-04 |
Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI |
Ramneet Kaur et.al. |
2411.02381 |
null |
2024-11-04 |
“Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization |
Eldar Kurtic et.al. |
2411.02355 |
null |
2024-11-03 |
Autoformulation of Mathematical Optimization Models Using LLMs |
Nicolás Astorga et.al. |
2411.01679 |
null |
2024-11-03 |
Ontology Population using LLMs |
Sanaz Saki Norouzi et.al. |
2411.01612 |
null |
2024-11-02 |
AMREx: AMR for Explainable Fact Verification |
Chathuri Jayaweera et.al. |
2411.01343 |
null |
2024-11-01 |
Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output |
Hithesh Sankararaman et.al. |
2411.01022 |
null |
2024-10-30 |
FPE-LLM: Highly Intelligent Time-Series Forecasting and Language Interaction LLM in Energy Systems |
Zihang Qiu et.al. |
2411.00852 |
null |
2024-10-30 |
GWQ: Gradient-Aware Weight Quantization for Large Language Models |
Yihua Shao et.al. |
2411.00850 |
null |
2024-11-01 |
CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation |
Ziting Wang et.al. |
2411.00744 |
null |
2024-11-01 |
Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval |
Qingfei Zhao et.al. |
2411.00689 |
null |
2024-11-01 |
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation |
Bohan Lyu et.al. |
2411.00412 |
null |
2024-11-01 |
Beyond Utility: Evaluating LLM as Recommender |
Chumeng Jiang et.al. |
2411.00331 |
link |
2024-11-01 |
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering |
Jiwoong Sohn et.al. |
2411.00300 |
link |
2024-11-01 |
RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models |
Sraavya Sambara et.al. |
2411.00299 |
null |
2024-10-29 |
Problem Categorization Can Help Large Language Models Solve Math Problems |
Amogh Akella et.al. |
2411.00042 |
null |
2024-10-28 |
A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges |
Zifeng Wang et.al. |
2411.00024 |
null |
2024-11-04 |
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models |
Ognjen et.al. |
2411.00023 |
null |
2024-10-31 |
Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs |
Liyi Chen et.al. |
2410.23875 |
link |
2024-10-31 |
Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs |
Shuyang Yu et.al. |
2410.23605 |
null |
2024-10-31 |
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval |
Sheryl Hsu et.al. |
2410.23214 |
null |
2024-10-30 |
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning |
Jingkun Ma et.al. |
2410.22995 |
null |
2024-10-30 |
Retrieval-Augmented Generation with Estimation of Source Reliability |
Jeongyeon Hwang et.al. |
2410.22954 |
null |
2024-10-30 |
Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations |
Leonardo Ranaldi et.al. |
2410.22874 |
null |
2024-10-30 |
Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot |
Sejin Lee et.al. |
2410.22767 |
link |
2024-10-30 |
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings |
Yashvir S. Grewal et.al. |
2410.22685 |
null |
2024-10-29 |
Distinguishing Ignorance from Error in LLM Hallucinations |
Adi Simhi et.al. |
2410.22071 |
link |
2024-10-29 |
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications |
Monica Riedler et.al. |
2410.21943 |
link |
2024-10-29 |
MARCO: Multi-Agent Real-time Chat Orchestration |
Anubhav Shrimal et.al. |
2410.21784 |
null |
2024-10-28 |
LLM-Forest for Health Tabular Data Imputation |
Xinrui He et.al. |
2410.21520 |
null |
2024-10-28 |
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation |
Shih-Yang Liu et.al. |
2410.21271 |
null |
2024-10-28 |
CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models |
Meiqi Chen et.al. |
2410.21067 |
null |
2024-10-28 |
Reward Modeling with Weak Supervision for Language Models |
Ben Hauptvogel et.al. |
2410.20869 |
link |
2024-10-28 |
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation |
Jaechang Kim et.al. |
2410.20811 |
null |
2024-10-28 |
Graph-based Uncertainty Metrics for Long-form Language Model Outputs |
Mingjian Jiang et.al. |
2410.20783 |
link |
2024-10-28 |
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation |
Dongryeol Lee et.al. |
2410.20774 |
link |
2024-10-28 |
Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation |
Mufei Li et.al. |
2410.20724 |
link |
2024-10-27 |
Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains |
Jiemin Wu et.al. |
2410.20340 |
null |
2024-10-26 |
Rethinking the Uncertainty: A Critical Review and Analysis in the Era of Large Language Models |
Mohammad Beigi et.al. |
2410.20199 |
null |
2024-10-26 |
Uncertainty-Penalized Direct Preference Optimization |
Sam Houliston et.al. |
2410.20187 |
null |
2024-10-26 |
Mask-based Membership Inference Attacks for Retrieval-Augmented Generation |
Mingrui Liu et.al. |
2410.20142 |
null |
2024-10-26 |
Beyond Fine-Tuning: Effective Strategies for Mitigating Hallucinations in Large Language Models for Data Analytics |
Mikhail Rumiantsau et.al. |
2410.20024 |
null |
2024-10-25 |
FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning |
Nicole Cho et.al. |
2410.19727 |
null |
2024-10-25 |
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning |
Xiangyu Zeng et.al. |
2410.19702 |
null |
2024-10-30 |
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems |
Ishneet Sukhvinder Singh et.al. |
2410.19572 |
null |
2024-11-01 |
Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization |
Anthony Cui et.al. |
2410.19499 |
null |
2024-10-25 |
A Debate-Driven Experiment on LLM Hallucinations and Accuracy |
Ray Li et.al. |
2410.19485 |
null |
2024-10-25 |
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models |
Liam Barkley et.al. |
2410.19385 |
null |
2024-10-25 |
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning |
Yujian Liu et.al. |
2410.19290 |
link |
2024-10-24 |
Prebunking Elections Rumors: Artificial Intelligence Assisted Interventions Increase Confidence in American Elections |
Mitchell Linegar et.al. |
2410.19202 |
null |
2024-10-24 |
AlignCap: Aligning Speech Emotion Captioning to Human Preferences |
Ziqi Liang et.al. |
2410.19134 |
null |
2024-10-24 |
LLM Tree Search |
Dylan Wilson et.al. |
2410.19117 |
null |
2024-10-30 |
Dynamic Vocabulary Pruning in Early-Exit LLMs |
Jort Vincenti et.al. |
2410.18952 |
link |
2024-10-24 |
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations |
Aryo Pradipta Gema et.al. |
2410.18860 |
link |
2024-10-25 |
An LLM Agent for Automatic Geospatial Data Analysis |
Yuxing Chen et.al. |
2410.18792 |
null |
2024-10-24 |
Task Calibration: Calibrating Large Language Models on Inference Tasks |
Yingjie Li et.al. |
2410.18764 |
null |
2024-10-24 |
LLM-Slice: Dedicated Wireless Network Slicing for Large Language Models |
Boyi Liu et.al. |
2410.18499 |
null |
2024-10-23 |
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models |
Kim Sung-Bin et.al. |
2410.18325 |
link |
2024-10-23 |
Multilingual Hallucination Gaps in Large Language Models |
Cléa Chataigner et.al. |
2410.18270 |
null |
2024-10-23 |
Beware of Calibration Data for Pruning Large Language Models |
Yixin Ji et.al. |
2410.17711 |
null |
2024-10-23 |
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models |
Guijin Son et.al. |
2410.17578 |
link |
2024-10-29 |
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination |
Jerry Huang et.al. |
2410.17477 |
null |
2024-10-22 |
ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs |
Reza Fayyazi et.al. |
2410.17406 |
link |
2024-10-22 |
DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR |
Miguel Contreras et.al. |
2410.17363 |
null |
2024-10-22 |
Are Large Language Models Ready for Travel Planning? |
Ruiping Ren et.al. |
2410.17333 |
null |
2024-10-22 |
Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy |
Benedict Aaron Tjandra et.al. |
2410.17234 |
null |
2024-10-23 |
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks |
Shuyang Hou et.al. |
2410.17031 |
null |
2024-10-22 |
SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine |
Xiaochen Wang et.al. |
2410.17021 |
null |
2024-10-22 |
Combining Ontological Knowledge and Large Language Model for User-Friendly Service Robots |
Haru Nakajima et.al. |
2410.16804 |
null |
2024-10-21 |
Large language models enabled multiagent ensemble method for efficient EHR data labeling |
Jingwei Huang et.al. |
2410.16543 |
null |
2024-10-21 |
Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models’ Reasoning with Formal Logic |
Jason Chan et.al. |
2410.16502 |
null |
2024-10-18 |
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs |
Rui Pu et.al. |
2410.16327 |
null |
2024-10-29 |
Can Knowledge Editing Really Correct Hallucinations? |
Baixiang Huang et.al. |
2410.16251 |
link |
2024-10-21 |
Analyzing Context Contributions in LLM-based Machine Translation |
Emmanouil Zaranis et.al. |
2410.16246 |
null |
2024-10-23 |
IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems |
Yihuan Mao et.al. |
2410.16237 |
null |
2024-10-21 |
Information for Conversation Generation: Proposals Utilising Knowledge Graphs |
Alex Clay et.al. |
2410.16196 |
null |
2024-10-22 |
Reducing Hallucinations in Vision-Language Models via Latent Space Steering |
Sheng Liu et.al. |
2410.15778 |
link |
2024-10-21 |
Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding |
Derong Xu et.al. |
2410.15702 |
null |
2024-10-21 |
Students Rather Than Experts: A New AI For Education Pipeline To Model More Human-Like And Personalised Early Adolescences |
Yiping Ma et.al. |
2410.15701 |
null |
2024-10-21 |
NetSafe: Exploring the Topological Safety of Multi-agent Networks |
Miao Yu et.al. |
2410.15686 |
null |
2024-10-21 |
Bayesian Concept Bottleneck Models with LLM Priors |
Jean Feng et.al. |
2410.15555 |
link |
2024-10-20 |
Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini |
Chanseo Lee et.al. |
2410.15528 |
null |
2024-10-22 |
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence |
Norbert Tihanyi et.al. |
2410.15490 |
null |
2024-10-20 |
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training |
Shahrad Mohammadzadeh et.al. |
2410.15460 |
null |
2024-10-20 |
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges |
Haitao Li et.al. |
2410.15393 |
link |
2024-10-20 |
A Survey of Hallucination in Large Visual Language Models |
Wei Lan et.al. |
2410.15359 |
null |
2024-10-20 |
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment |
Songtao Jiang et.al. |
2410.15334 |
null |
2024-10-20 |
A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice |
Hsiu-Yuan Huang et.al. |
2410.15326 |
null |
2024-10-20 |
Causality for Large Language Models |
Anpeng Wu et.al. |
2410.15319 |
link |
2024-10-20 |
MAD: Move AI Decompiler to Improve Transparency and Auditability on Non-Open-Source Blockchain Smart Contract |
Eason Chen et.al. |
2410.15275 |
null |
2024-10-19 |
Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction |
Yinhan He et.al. |
2410.15165 |
link |
2024-10-19 |
MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification |
Yin Li et.al. |
2410.15154 |
link |
2024-10-22 |
Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization |
Zihui Wu et.al. |
2410.15052 |
link |
2024-10-19 |
“Ghost of the past”: identifying and resolving privacy leakage from LLM’s memory through proactive user interaction |
Shuning Zhang et.al. |
2410.14931 |
null |
2024-10-18 |
FedSpaLLM: Federated Pruning of Large Language Models |
Guangji Bai et.al. |
2410.14852 |
null |
2024-10-18 |
Enabling Scalable Evaluation of Bias Patterns in Medical LLMs |
Hamed Fayyaz et.al. |
2410.14763 |
link |
2024-10-22 |
ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries |
Kishan Maharaj et.al. |
2410.14748 |
null |
2024-10-17 |
Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors |
Anthony Sicilia et.al. |
2410.14744 |
null |
2024-10-18 |
Enhancing Large Language Models’ Situated Faithfulness to External Contexts |
Yukun Huang et.al. |
2410.14675 |
link |
2024-10-22 |
Do LLMs estimate uncertainty well in instruction-following? |
Juyeon Heo et.al. |
2410.14582 |
link |
2024-10-18 |
Combining Entropy and Matrix Nuclear Norm for Enhanced Evaluation of Language Models |
James Vo et.al. |
2410.14480 |
null |
2024-10-18 |
Zero-shot Action Localization via the Confidence of Large Vision-Language Models |
Josiah Aklilu et.al. |
2410.14340 |
null |
2024-10-18 |
Critical Questions Generation: Motivation and Challenges |
Blanca Calvo Figueras et.al. |
2410.14335 |
link |
2024-10-18 |
ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM |
Songheng Zhang et.al. |
2410.14331 |
null |
2024-10-18 |
LoGU: Long-form Generation with Uncertainty Expressions |
Ruihan Yang et.al. |
2410.14309 |
link |
2024-10-22 |
Good Parenting is all you need – Multi-agentic LLM Hallucination Mitigation |
Ted Kwartler et.al. |
2410.14262 |
null |
2024-10-18 |
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models |
Olga Loginova et.al. |
2410.14248 |
null |
2024-10-21 |
Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning |
Xingyu Tan et.al. |
2410.14211 |
null |
2024-10-18 |
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment |
Chenhang Cui et.al. |
2410.14148 |
null |
2024-10-17 |
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization |
Catarina G. Belem et.al. |
2410.13961 |
link |
2024-10-17 |
Goal Inference from Open-Ended Dialog |
Rachel Ma et.al. |
2410.13957 |
null |
2024-10-17 |
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards |
Xinze Li et.al. |
2410.13509 |
link |
2024-10-17 |
Advancing Large Language Model Attribution through Self-Improving |
Lei Huang et.al. |
2410.13298 |
null |
2024-10-17 |
Learning to Route with Confidence Tokens |
Yu-Neng Chuang et.al. |
2410.13284 |
null |
2024-10-17 |
Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning |
Minseok Choi et.al. |
2410.13274 |
null |
2024-10-17 |
Atomic Calibration of LLMs in Long-Form Generations |
Caiqi Zhang et.al. |
2410.13246 |
null |
2024-10-17 |
LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch |
Caigao Jiang et.al. |
2410.13213 |
link |
2024-10-17 |
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs |
Forrest Sheng Bao et.al. |
2410.13210 |
link |
2024-10-18 |
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback |
Zonghai Yao et.al. |
2410.13191 |
link |
2024-10-21 |
Utilizing Large Language Models in An Iterative Paradigm with Domain Feedback for Molecule Optimization |
Khiem Le et.al. |
2410.13147 |
null |
2024-10-17 |
Trust but Verify: Programmatic VLM Evaluation in the Wild |
Viraj Prabhu et.al. |
2410.13121 |
null |
2024-10-17 |
Learning to Summarize from LLM-generated Feedback |
Hwanjun Song et.al. |
2410.13116 |
null |
2024-10-16 |
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models |
Jie Ren et.al. |
2410.13088 |
null |
2024-10-16 |
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models |
Linhao Luo et.al. |
2410.13080 |
link |
2024-10-16 |
PromptExp: Multi-granularity Prompt Explanation of Large Language Models |
Ximing Dong et.al. |
2410.13073 |
null |
2024-10-16 |
LLM Confidence Evaluation Measures in Zero-Shot CSS Classification |
David Farr et.al. |
2410.13047 |
null |
2024-10-16 |
When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems |
Asir Saadat et.al. |
2410.13029 |
null |
2024-10-16 |
LLM Chain Ensembles for Scalable and Accurate Data Annotation |
David Farr et.al. |
2410.13006 |
link |
2024-10-16 |
REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models |
Ambuje Gupta et.al. |
2410.12890 |
null |
2024-10-16 |
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs |
Herun Wan et.al. |
2410.12600 |
null |
2024-10-16 |
A Claim Decomposition Benchmark for Long-form Answer Verification |
Zhihao Zhang et.al. |
2410.12558 |
link |
2024-10-17 |
MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration |
Jinjie Wei et.al. |
2410.12532 |
null |
2024-10-16 |
RosePO: Aligning LLM-based Recommenders with Human Values |
Jiayi Liao et.al. |
2410.12519 |
null |
2024-10-16 |
KcMF: A Knowledge-compliant Framework for Schema and Entity Matching with Fine-tuning-free LLMs |
Yongqin Xu et.al. |
2410.12480 |
null |
2024-10-18 |
MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models |
Boyang Xue et.al. |
2410.12478 |
link |
2024-10-16 |
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs |
Jingming Zhuo et.al. |
2410.12405 |
link |
2024-10-17 |
Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs |
Lei Sun et.al. |
2410.12298 |
null |
2024-10-16 |
Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors |
Linwei Tao et.al. |
2410.12295 |
null |
2024-10-17 |
LLM-based Cognitive Models of Students with Misconceptions |
Shashank Sonkar et.al. |
2410.12294 |
null |
2024-10-16 |
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation |
Junjie Chen et.al. |
2410.12265 |
null |
2024-10-16 |
CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity |
Jintao Liu et.al. |
2410.12248 |
link |
2024-10-16 |
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation |
Xiaonan Jing et.al. |
2410.12222 |
null |
2024-10-16 |
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning |
Huiwen Wu et.al. |
2410.12130 |
null |
2024-10-15 |
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction |
Kaiqiao Han et.al. |
2410.12040 |
link |
2024-10-15 |
Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents |
Bolun Sun et.al. |
2410.11906 |
null |
2024-10-15 |
Zero-shot Model-based Reinforcement Learning using Large Language Models |
Abdelhakim Benechehab et.al. |
2410.11711 |
link |
2024-10-15 |
Black-box Uncertainty Quantification Method for LLM-as-a-Judge |
Nico Wagner et.al. |
2410.11594 |
null |
2024-10-15 |
AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data |
Xinjie Zhao et.al. |
2410.11531 |
null |
2024-10-15 |
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability |
Zhongxiang Sun et.al. |
2410.11414 |
null |
2024-10-15 |
LargePiG: Your Large Language Model is Secretly a Pointer Generator |
Zhongxiang Sun et.al. |
2410.11366 |
null |
2024-10-15 |
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs |
Shuo Li et.al. |
2410.11302 |
null |
2024-10-15 |
On the Capacity of Citation Generation by Large Language Models |
Haosheng Qian et.al. |
2410.11217 |
null |
2024-10-14 |
LLM Unlearning via Loss Adjustment with Only Forget Data |
Yaxuan Wang et.al. |
2410.11143 |
null |
2024-10-14 |
Can Structured Data Reduce Epistemic Uncertainty? |
Shriram M S et.al. |
2410.11141 |
null |
2024-10-14 |
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only |
Jihan Yao et.al. |
2410.11055 |
link |
2024-10-13 |
3DS: Decomposed Difficulty Data Selection’s Case Study on LLM Medical Domain Adaptation |
Hongxin Ding et.al. |
2410.10901 |
null |
2024-10-14 |
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance |
Sachin Goyal et.al. |
2410.10796 |
link |
2024-10-16 |
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators |
Rasoul Shafipour et.al. |
2410.10714 |
null |
2024-10-14 |
On Calibration of LLM-based Guard Models for Reliable Content Moderation |
Hongfu Liu et.al. |
2410.10414 |
link |
2024-10-14 |
Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion |
Xinping Zhao et.al. |
2410.10408 |
null |
2024-10-14 |
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search |
Chenglin Li et.al. |
2410.10392 |
null |
2024-10-14 |
Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning |
Yongxin Xu et.al. |
2410.10360 |
null |
2024-10-14 |
SkillAggregation: Reference-free LLM-Dependent Aggregation |
Guangzhi Sun et.al. |
2410.10215 |
null |
2024-10-13 |
A Multi-LLM Orchestration Engine for Personalized, Context-Rich Assistance |
Sumedh Rasal et.al. |
2410.10039 |
null |
2024-10-13 |
Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code |
Nan Jiang et.al. |
2410.09997 |
null |
2024-10-15 |
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models |
Han Qiu et.al. |
2410.09962 |
link |
2024-10-13 |
Can Large Language Models Generate Geospatial Code? |
Shuyang Hou et.al. |
2410.09738 |
null |
2024-10-13 |
Taming Overconfidence in LLMs: Reward Calibration in RLHF |
Jixuan Leng et.al. |
2410.09724 |
link |
2024-10-13 |
Honest AI: Fine-Tuning “Small” Language Models to Say “I Don’t Know”, and Reducing Hallucination in RAG |
Xinxi Chen et.al. |
2410.09699 |
null |
2024-10-13 |
Integrating Reinforcement Learning and Large Language Models for Crop Production Process Management Optimization and Control through A New Knowledge-Based Deep Learning Paradigm |
Dong Chen et.al. |
2410.09680 |
null |
2024-10-12 |
FlatQuant: Flatness Matters for LLM Quantization |
Yuxuan Sun et.al. |
2410.09426 |
link |
2024-10-12 |
LLM $\times$ MapReduce: Simplified Long-Sequence Processing using Large Language Models |
Zihan Zhou et.al. |
2410.09342 |
link |
2024-10-15 |
Nudging: Inference-time Alignment via Model Collaboration |
Yu Fei et.al. |
2410.09300 |
null |
2024-10-11 |
Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective |
Bo Ni et.al. |
2410.08985 |
null |
2024-10-11 |
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models |
Zheng Yi Ho et.al. |
2410.08970 |
null |
2024-10-11 |
Decoding Secret Memorization in Code LLMs Through Token-Level Characterization |
Yuqing Nie et.al. |
2410.08858 |
null |
2024-10-11 |
Measuring the Inconsistency of Large Language Models in Preferential Ranking |
Xiutian Zhao et.al. |
2410.08851 |
null |
2024-10-11 |
Unveiling Molecular Secrets: An LLM-Augmented Linear Model for Explainable and Calibratable Molecular Property Prediction |
Zhuoran Li et.al. |
2410.08829 |
link |
2024-10-11 |
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation |
Ruobing Wang et.al. |
2410.08821 |
link |
2024-10-11 |
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding |
Houlun Chen et.al. |
2410.08593 |
link |
2024-10-11 |
Humanity in AI: Detecting the Personality of Large Language Models |
Baohua Zhan et.al. |
2410.08545 |
null |
2024-10-11 |
Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both |
Abhijnan Nath et.al. |
2410.08458 |
null |
2024-10-11 |
Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness |
Yu He Ke et.al. |
2410.08431 |
null |
2024-10-10 |
Large Airfoil Models |
Howon Lee et.al. |
2410.08392 |
null |
2024-10-10 |
Think Beyond Size: Dynamic Prompting for More Effective Reasoning |
Kamesh R et.al. |
2410.08130 |
null |
2024-10-10 |
A Closer Look at Machine Unlearning for Large Language Models |
Xiaojian Yuan et.al. |
2410.08109 |
link |
2024-10-10 |
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering |
Yuan Sui et.al. |
2410.08085 |
null |
2024-10-10 |
Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses |
Pranav Senthilkumar et.al. |
2410.07826 |
null |
2024-10-10 |
Mitigating Gender Bias in Code Large Language Models via Model Editing |
Zhanyue Qin et.al. |
2410.07820 |
null |
2024-10-10 |
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning |
Zirui Zhao et.al. |
2410.07627 |
link |
2024-10-10 |
No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users |
Mengxuan Hu et.al. |
2410.07589 |
null |
2024-10-10 |
OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting |
Xukai Liu et.al. |
2410.07549 |
link |
2024-10-10 |
MKGL: Mastery of a Three-Word Language |
Lingbing Guo et.al. |
2410.07526 |
null |
2024-10-09 |
Localizing Factual Inconsistencies in Attributable Text Generation |
Arie Cattan et.al. |
2410.07473 |
link |
2024-10-09 |
Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning |
Abhinav Bandari et.al. |
2410.07461 |
link |
2024-10-09 |
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making |
Manling Li et.al. |
2410.07166 |
link |
2024-10-09 |
Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization |
Chengtao Jian et.al. |
2410.07018 |
null |
2024-10-09 |
Self-Boosting Large Language Models with Synthetic Preference Data |
Qingxiu Dong et.al. |
2410.06961 |
null |
2024-10-09 |
AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation |
Huanxi Liu et.al. |
2410.06943 |
null |
2024-10-09 |
Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning |
Runchuan Zhu et.al. |
2410.06913 |
link |
2024-10-09 |
Calibrating Verbalized Probabilities for Large Language Models |
Cheng Wang et.al. |
2410.06707 |
null |
2024-10-09 |
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack |
Leo McKee-Reid et.al. |
2410.06491 |
null |
2024-10-09 |
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders |
David Noever et.al. |
2410.06462 |
null |
2024-10-09 |
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs |
Ruijia Niu et.al. |
2410.06431 |
null |
2024-10-08 |
Validation of the Scientific Literature via Chemputation Augmented by Large Language Models |
Sebastian Pagel et.al. |
2410.06384 |
null |
2024-10-08 |
Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning |
Ruosen Li et.al. |
2410.06304 |
null |
2024-10-08 |
EVOLvE: Evaluating and Optimizing LLMs For Exploration |
Allen Nie et.al. |
2410.06238 |
null |
2024-10-08 |
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution |
Corban Rivera et.al. |
2410.06108 |
null |
2024-10-10 |
LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs |
Vincent Emonet et.al. |
2410.06062 |
link |
2024-10-08 |
Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models |
Bozhou Li et.al. |
2410.05802 |
null |
2024-10-08 |
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition |
Zheyang Xiong et.al. |
2410.05603 |
null |
2024-10-07 |
Self-rationalization improves LLM as a fine-grained judge |
Prapti Trivedi et.al. |
2410.05495 |
null |
2024-10-07 |
ESPACE: Dimensionality Reduction of Activations for Model Compression |
Charbel Sakr et.al. |
2410.05437 |
null |
2024-10-05 |
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms |
Yilong Li et.al. |
2410.05315 |
null |
2024-10-07 |
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe |
Yuxin Xiao et.al. |
2410.05248 |
null |
2024-10-07 |
Precise Model Benchmarking with Only a Few Observations |
Riccardo Fogliato et.al. |
2410.05222 |
null |
2024-10-07 |
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality |
Guanyu Zhou et.al. |
2410.04780 |
link |
2024-10-07 |
Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering |
Zimu Wang et.al. |
2410.04752 |
null |
2024-10-06 |
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval |
Pengcheng Jiang et.al. |
2410.04585 |
link |
2024-10-06 |
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination |
Xuan Gong et.al. |
2410.04514 |
null |
2024-10-05 |
DiDOTS: Knowledge Distillation from Large-Language-Models for Dementia Obfuscation in Transcribed Speech |
Dominika Woszczyk et.al. |
2410.04188 |
null |
2024-10-04 |
dZiner: Rational Inverse Design of Materials with AI Agents |
Mehrad Ansari et.al. |
2410.03963 |
link |
2024-10-03 |
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge |
Aparna Elangovan et.al. |
2410.03775 |
link |
2024-10-04 |
Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores |
Robert E. Blackwell et.al. |
2410.03492 |
null |
2024-10-04 |
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation |
Tobias Leemann et.al. |
2410.03461 |
null |
2024-10-08 |
Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs |
Louis Serrano et.al. |
2410.03437 |
null |
2024-10-04 |
Towards a Benchmark for Large Language Models for Business Process Management Tasks |
Kiran Busch et.al. |
2410.03255 |
link |
2024-10-04 |
Showing LLM-Generated Code Selectively Based on Confidence of LLMs |
Jia Li et.al. |
2410.03234 |
null |
2024-10-04 |
ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering |
Huayang Li et.al. |
2410.03227 |
null |
2024-10-04 |
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback |
Kyuyoung Kim et.al. |
2410.03145 |
link |
2024-10-04 |
SAG: Style-Aligned Article Generation via Model Collaboration |
Chenning Xu et.al. |
2410.03137 |
null |
2024-10-10 |
ARB-LLM: Alternating Refined Binarizations for Large Language Models |
Zhiteng Li et.al. |
2410.03129 |
link |
2024-10-04 |
UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference |
Jing Xiong et.al. |
2410.03090 |
null |
2024-10-04 |
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues |
Shilin Qu et.al. |
2410.03049 |
null |
2024-10-03 |
Characterizing Context Influence and Hallucination in Summarization |
James Flemings et.al. |
2410.03026 |
link |
2024-10-03 |
Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review |
Sungduk Yu et.al. |
2410.03019 |
null |
2024-09-30 |
Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG |
Chenhao Fang et.al. |
2410.02825 |
null |
2024-10-09 |
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation |
Han He et.al. |
2410.02748 |
link |
2024-10-03 |
Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization |
Lei Xu et.al. |
2410.02741 |
link |
2024-10-03 |
Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization |
Ryan C. Barron et.al. |
2410.02721 |
null |
2024-10-07 |
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations |
Hadas Orgad et.al. |
2410.02707 |
link |
2024-10-03 |
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers |
Shijie Chen et.al. |
2410.02642 |
null |
2024-10-03 |
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration |
Yun Qu et.al. |
2410.02511 |
link |
2024-10-03 |
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models |
Junfeng Fang et.al. |
2410.02355 |
link |
2024-10-04 |
How Much Can RAG Help the Reasoning of LLM? |
Jingyu Liu et.al. |
2410.02338 |
null |
2024-10-03 |
Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference |
Wei Cheng et.al. |
2410.02210 |
null |
2024-10-03 |
Efficiently Deploying LLMs with Controlled Risk |
Michael J. Zellinger et.al. |
2410.02173 |
null |
2024-10-03 |
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments |
Amogh Mannekote et.al. |
2410.02110 |
link |
2024-10-02 |
DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection |
Daiki Chiba et.al. |
2410.02095 |
null |
2024-10-02 |
DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning |
Yebowen Hu et.al. |
2410.01772 |
null |
2024-10-02 |
CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs |
Kangsheng Wang et.al. |
2410.01696 |
null |
2024-10-02 |
FactAlign: Long-form Factuality Alignment of Large Language Models |
Chao-Wei Huang et.al. |
2410.01691 |
link |
2024-10-02 |
Intent Detection in the Age of LLMs |
Gaurav Arora et.al. |
2410.01627 |
null |
2024-10-02 |
Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration |
Kangxi Wu et.al. |
2410.01285 |
null |
2024-10-02 |
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation |
Bryan Li et.al. |
2410.01171 |
link |
2024-10-01 |
Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability |
Weitong Zhang et.al. |
2410.01064 |
null |
2024-10-01 |
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown |
Xingzhou Lou et.al. |
2410.00847 |
null |
2024-10-01 |
Dynamic Planning for LLM-based Graphical User Interface Automation |
Shaoqing Zhang et.al. |
2410.00467 |
link |
2024-10-01 |
UniAdapt: A Universal Adapter for Knowledge Calibration |
Tai D. Nguyen et.al. |
2410.00454 |
null |
2024-10-01 |
Are LLMs Aware that Some Questions are not Open-ended? |
Dongjie Yang et.al. |
2410.00423 |
null |
2024-10-01 |
Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation |
Bhargav Shandilya et.al. |
2410.00387 |
null |
2024-09-30 |
A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification |
Marina Ribeiro et.al. |
2410.00250 |
null |
2024-09-30 |
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation |
Ziyao Zhang et.al. |
2409.20550 |
link |
2024-09-30 |
Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models |
Arpan Mukherjee et.al. |
2409.20512 |
null |
2024-10-04 |
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs |
Ruotong Liao et.al. |
2409.20365 |
link |
2024-09-30 |
MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants |
Zeyu Zhang et.al. |
2409.20163 |
link |
2024-09-30 |
Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation |
Huangyu Dai et.al. |
2409.19877 |
null |
2024-09-29 |
Calibrating Language Models with Adaptive Temperature Scaling |
Johnathan Xie et.al. |
2409.19817 |
link |
2024-09-29 |
MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models |
Vibhor Agarwal et.al. |
2409.19492 |
null |
2024-09-28 |
Overriding Safety protections of Open-source Models |
Sachin Kumar et.al. |
2409.19476 |
link |
2024-09-28 |
SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models |
Yi Wu et.al. |
2409.19471 |
link |
2024-09-28 |
Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks |
Chenxi Wang et.al. |
2409.19338 |
null |
2024-09-28 |
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning |
Kazuki Matsuda et.al. |
2409.19255 |
null |
2024-09-27 |
Secure Multiparty Generative AI |
Manil Shrestha et.al. |
2409.19120 |
null |
2024-09-27 |
A Survey on the Honesty of Large Language Models |
Siheng Li et.al. |
2409.18786 |
link |
2024-10-02 |
Model-based Preference Optimization in Abstractive Summarization without Human Feedback |
Jaepill Choi et.al. |
2409.18618 |
link |
2024-09-26 |
Cross-Institutional Structured Radiology Reporting for Lung Cancer Screening Using a Dynamic Template-Constrained Large Language Model |
Chuang Niu et.al. |
2409.18319 |
link |
2024-09-26 |
Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT |
Natthanaphop Isaradech et.al. |
2409.17683 |
null |
2024-09-26 |
A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models |
Syed Affan Daimi et.al. |
2409.17581 |
link |
2024-09-26 |
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection |
Xuefeng Du et.al. |
2409.17504 |
null |
2024-09-25 |
Post-hoc Reward Calibration: A Case Study on Length Bias |
Zeyu Huang et.al. |
2409.17407 |
link |
2024-09-25 |
Search for Efficient Large Language Models |
Xuan Shen et.al. |
2409.17372 |
link |
2024-09-20 |
A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models |
Satoshi Munakata et.al. |
2409.17173 |
null |
2024-09-25 |
Mitigating the Bias of Large Language Model Evaluation |
Hongli Zhou et.al. |
2409.16788 |
link |
2024-09-25 |
RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems |
Yihong Tang et.al. |
2409.16727 |
null |
2024-09-25 |
EventHallusion: Diagnosing Event Hallucinations in Video LLMs |
Jiacheng Zhang et.al. |
2409.16597 |
link |
2024-09-25 |
Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels |
Yishu Wei et.al. |
2409.16563 |
null |
2024-09-24 |
MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment |
Venkata Naren Devarakonda et.al. |
2409.16455 |
null |
2024-09-24 |
Automated test generation to evaluate tool-augmented LLMs as conversational AI agents |
Samuel Arcadinho et.al. |
2409.15934 |
null |
2024-09-24 |
Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts |
Sukai Huang et.al. |
2409.15915 |
null |
2024-09-24 |
Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection |
Xingyu Ma et.al. |
2409.15907 |
null |
2024-09-24 |
XTRUST: On the Multilingual Trustworthiness of Large Language Models |
Yahan Li et.al. |
2409.15762 |
link |
2024-09-23 |
Parse Trees Guided LLM Prompt Compression |
Wenhao Mao et.al. |
2409.15395 |
link |
2024-09-18 |
VERA: Validation and Enhancement for Retrieval Augmented systems |
Nitin Aravind Birur et.al. |
2409.15364 |
null |
2024-09-18 |
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning |
Essa Jan et.al. |
2409.15361 |
null |
2024-09-27 |
Reward-Robust RLHF in LLMs |
Yuzi Yan et.al. |
2409.15360 |
null |
2024-09-23 |
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? |
Yunfei Xie et.al. |
2409.15277 |
null |
2024-09-26 |
A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models |
Yixi Wu et.al. |
2409.15228 |
null |
2024-09-23 |
Boosting Healthcare LLMs Through Retrieved Context |
Jordi Bayarri-Planas et.al. |
2409.15127 |
link |
2024-09-23 |
Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications |
Sean Kim et.al. |
2409.15076 |
null |
2024-09-23 |
InterMind: A Doctor-Patient-Family Interactive Depression Assessment System Empowered by Large Language Models |
Zhiyuan Zhou et.al. |
2409.14878 |
null |
2024-09-23 |
Past Meets Present: Creating Historical Analogy with Large Language Models |
Nianqi Li et.al. |
2409.14820 |
link |
2024-09-28 |
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method |
Weichao Zhang et.al. |
2409.14781 |
link |
2024-09-23 |
zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning |
Zixiang Xian et.al. |
2409.14644 |
null |
2024-09-22 |
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization |
Minyi Zhao et.al. |
2409.14484 |
null |
2024-09-22 |
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses |
Hung-Ting Su et.al. |
2409.14324 |
link |
2024-09-21 |
OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching |
Zhangcheng Qiang et.al. |
2409.14038 |
null |
2024-09-20 |
Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology |
Aidan Gilson et.al. |
2409.13902 |
null |
2024-09-20 |
FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs |
Bowen Yan et.al. |
2409.13612 |
null |
2024-09-20 |
ChainBuddy: An AI Agent System for Generating LLM Pipelines |
Jingyue Zhang et.al. |
2409.13588 |
null |
2024-09-23 |
AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit |
Mohanna Hoveyda et.al. |
2409.13447 |
link |
2024-09-20 |
Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey |
Sourav Verma et.al. |
2409.13385 |
link |
2024-09-20 |
Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems |
Andrea Colombo et.al. |
2409.13252 |
null |
2024-09-19 |
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models |
Peiyi Zhang et.al. |
2409.12739 |
null |
2024-09-19 |
LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks |
Malsha Ashani Mahawatta Dona et.al. |
2409.12580 |
null |
2024-09-19 |
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation |
Chen Liang et.al. |
2409.12411 |
null |
2024-09-19 |
On the Effectiveness of LLMs for Manual Test Verifications |
Myron David Lucena Campos Peixoto et.al. |
2409.12405 |
null |
2024-09-18 |
RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models |
Abhinav Jain et.al. |
2409.12294 |
null |
2024-09-18 |
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty |
Arslan Chaudhry et.al. |
2409.12180 |
null |
2024-09-05 |
LitFM: A Retrieval Augmented Structure-aware Foundation Model For Citation Graphs |
Jiasheng Zhang et.al. |
2409.12177 |
null |
2024-09-18 |
Combating Phone Scams with LLM-based Detection: Where Do We Stand? |
Zitong Shen et.al. |
2409.11643 |
null |
2024-09-17 |
HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection |
Theo King et.al. |
2409.11579 |
link |
2024-09-17 |
What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts |
Shuaiyu Chen et.al. |
2409.11540 |
null |
2024-09-17 |
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration |
Jiahui Gao et.al. |
2409.11365 |
null |
2024-09-17 |
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models |
Mengfei Liang et.al. |
2409.11353 |
link |
2024-09-25 |
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling |
Xinyue Fang et.al. |
2409.11283 |
null |
2024-09-17 |
Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models |
Bishwash Khanal et.al. |
2409.11233 |
null |
2024-09-17 |
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization |
Jianing Wang et.al. |
2409.11212 |
link |
2024-09-17 |
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B |
Jemin Lee et.al. |
2409.11055 |
link |
2024-09-16 |
Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering |
Qingru Zhang et.al. |
2409.10790 |
null |
2024-09-16 |
“The Data Says Otherwise”-Towards Automated Fact-checking and Communication of Data Claims |
Yu Fu et.al. |
2409.10713 |
null |
2024-09-17 |
Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot |
Bhuvan Sachdeva et.al. |
2409.10354 |
null |
2024-09-16 |
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey |
Yujia Zhou et.al. |
2409.10102 |
link |
2024-09-16 |
Benchmarking Large Language Model Uncertainty for Prompt Optimization |
Pei-Fu Guo et.al. |
2409.10044 |
link |
2024-09-18 |
HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making |
Sumera Anjum et.al. |
2409.10011 |
link |
2024-09-23 |
Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations |
Abe Bohan Hou et.al. |
2409.09947 |
link |
2024-09-16 |
Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges |
Vinay Samuel et.al. |
2409.09927 |
link |
2024-09-16 |
SFR-RAG: Towards Contextually Faithful LLMs |
Xuan-Phi Nguyen et.al. |
2409.09916 |
null |
2024-09-15 |
ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing |
Suhyeon Yoo et.al. |
2409.09760 |
null |
2024-09-15 |
ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts |
Che Wang et.al. |
2409.09661 |
link |
2024-09-21 |
Confidence Estimation for LLM-Based Dialogue State Tracking |
Yi-Jyun Sun et.al. |
2409.09629 |
link |
2024-09-14 |
VernaCopter: Disambiguated Natural-Language-Driven Robot via Formal Specifications |
Teun van de Laar et.al. |
2409.09536 |
link |
2024-09-14 |
Hacking, The Lazy Way: LLM Augmented Pentesting |
Dhruva Goyal et.al. |
2409.09493 |
null |
2024-09-19 |
The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection |
Yi Yang et.al. |
2409.09380 |
null |
2024-09-13 |
Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions |
Zahra Ashktorab et.al. |
2409.08937 |
null |
2024-09-23 |
When Context Leads but Parametric Memory Follows in Large Language Models |
Yufei Tao et.al. |
2409.08435 |
link |
2024-09-12 |
Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT |
Irene Weber et.al. |
2409.07732 |
link |
2024-09-11 |
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications |
Praveen K Kanithi et.al. |
2409.07314 |
null |
2024-09-11 |
Reranking Laws for Language Generation: A Communication-Theoretic Perspective |
António Farinhas et.al. |
2409.07131 |
null |
2024-09-11 |
Understanding Knowledge Drift in LLMs through Misinformation |
Alina Fastowski et.al. |
2409.07085 |
link |
2024-09-11 |
Representation Tuning |
Christopher M. Ackerman et.al. |
2409.06927 |
link |
2024-09-10 |
Semi-Supervised Reward Modeling via Iterative Self-Training |
Yifei He et.al. |
2409.06903 |
link |
2024-09-10 |
Geometric-Averaged Preference Optimization for Soft Preference Labels |
Hiroki Furuta et.al. |
2409.06691 |
null |
2024-09-10 |
Alleviating Hallucinations in Large Language Models with Scepticism Modeling |
Yetao Wu et.al. |
2409.06601 |
null |
2024-09-10 |
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering |
Sacha Muller et.al. |
2409.06595 |
link |
2024-09-10 |
Automate Strategy Finding with LLM in Quant investment |
Zhizhuo Kou et.al. |
2409.06289 |
null |
2024-09-14 |
ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog |
Yujian Gan et.al. |
2409.06097 |
link |
2024-09-09 |
$\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding |
Shuai Wang et.al. |
2409.05923 |
null |
2024-09-09 |
Benchmarking Chinese Knowledge Rectification in Large Language Models |
Tianhe Lu et.al. |
2409.05806 |
link |
2024-09-09 |
LLMs Will Always Hallucinate, and We Need to Live With This |
Sourav Banerjee et.al. |
2409.05746 |
null |
2024-09-07 |
LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs |
Yongxin Deng et.al. |
2409.04744 |
null |
2024-09-03 |
Here’s Charlie! Realising the Semantic Web vision of Agents in the age of LLMs |
Jesse Wright et.al. |
2409.04465 |
null |
2024-09-06 |
Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering |
Larissa Pusch et.al. |
2409.04181 |
null |
2024-09-13 |
Safeguarding AI Agents: Developing and Analyzing Safety Architectures |
Ishaan Domkundwar et.al. |
2409.03793 |
null |
2024-09-06 |
RAG based Question-Answering for Contextual Response Prediction System |
Sriram Veturi et.al. |
2409.03708 |
null |
2024-09-05 |
Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration |
Jeremy Qin et.al. |
2409.03225 |
link |
2024-09-05 |
Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models |
Jie Ma et.al. |
2409.03155 |
link |
2024-09-04 |
CLUE: Concept-Level Uncertainty Estimation for Large Language Models |
Yu-Hsiang Wang et.al. |
2409.03021 |
null |
2024-09-04 |
Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models |
Gabriel Y. Arteaga et.al. |
2409.02976 |
link |
2024-09-10 |
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA |
Jiajie Zhang et.al. |
2409.02897 |
link |
2024-09-04 |
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs |
Ruoyu Wang et.al. |
2409.02686 |
null |
2024-09-03 |
Initial Development and Evaluation of the Creative Artificial Intelligence through Recurring Developments and Determinations (CAIRDD) System |
Jeremy Straub et.al. |
2409.02291 |
null |
2024-09-03 |
Physical Rule-Guided Convolutional Neural Network |
Kishor Datta Gupta et.al. |
2409.02081 |
null |
2024-09-03 |
RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer |
Jiangyi Deng et.al. |
2409.02074 |
null |
2024-08-25 |
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM |
Jiace Zhu et.al. |
2409.01281 |
null |
2024-09-02 |
Statically Contextualizing Large Language Models with Typed Holes |
Andrew Blinn et.al. |
2409.00921 |
null |
2024-09-01 |
Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering |
Derian Boer et.al. |
2409.00861 |
link |
2024-09-04 |
Learning to Ask: When LLMs Meet Unclear Instruction |
Wenxuan Wang et.al. |
2409.00557 |
null |
2024-08-31 |
Does Alignment Tuning Really Break LLMs’ Internal Confidence? |
Hongseok Oh et.al. |
2409.00352 |
link |
2024-09-08 |
ProGRes: Prompted Generative Rescoring on ASR n-Best |
Ada Defne Tur et.al. |
2409.00217 |
link |
2024-08-30 |
LLMs hallucinate graphs too: a structural perspective |
Erwan Le Merrer et.al. |
2409.00159 |
null |
2024-08-29 |
HoneyComb: A Flexible LLM-Based Agent System for Materials Science |
Huan Zhang et.al. |
2409.00135 |
null |
2024-09-04 |
Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs |
Ziyan Cui et.al. |
2409.00128 |
null |
2024-09-08 |
Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning |
Momin Abbas et.al. |
2409.00124 |
null |
2024-09-04 |
Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation |
Mohammad Nadeem et.al. |
2409.00105 |
null |
2024-08-26 |
Evaluating ChatGPT on Nuclear Domain-Specific Data |
Muhammad Anwar et.al. |
2409.00090 |
null |
2024-08-26 |
Watermarking Techniques for Large Language Models: A Survey |
Yuqing Liang et.al. |
2409.00089 |
null |
2024-08-30 |
Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain |
Francesca Grasso et.al. |
2408.17362 |
link |
2024-08-30 |
Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling |
Guangya Wan et.al. |
2408.17017 |
null |
2024-09-05 |
UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches |
Chao Wang et.al. |
2408.16966 |
null |
2024-09-04 |
Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies |
Zhiyang Qi et.al. |
2408.16586 |
null |
2024-08-29 |
LoraMap: Harnessing the Power of LoRA Connections |
Hyeryun Park et.al. |
2408.16264 |
null |
2024-08-28 |
Logic-Enhanced Language Model Agents for Trustworthy Social Simulations |
Agnieszka Mensfelt et.al. |
2408.16081 |
link |
2024-08-28 |
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration |
Yao Zhang et.al. |
2408.15978 |
null |
2024-09-07 |
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models |
Yuncheng Yang et.al. |
2408.15915 |
link |
2024-08-28 |
Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization |
Léo Hemamou et.al. |
2408.15801 |
null |
2024-08-28 |
An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation |
Thai Tang Quoc et.al. |
2408.15658 |
null |
2024-08-28 |
Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation |
Lujun Gui et.al. |
2408.15562 |
null |
2024-08-29 |
LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation |
Haichuan Hu et.al. |
2408.15533 |
link |
2024-08-28 |
Enhancing and Accelerating Large Language Models via Instruction-Aware Contextual Compression |
Haowen Hou et.al. |
2408.15491 |
link |
2024-08-27 |
The Uniqueness of LLaMA3-70B with Per-Channel Quantization: An Empirical Study |
Minghai Qin et.al. |
2408.15301 |
null |
2024-08-27 |
Can Unconfident LLM Annotations Be Used for Confident Conclusions? |
Kristina Gligorić et.al. |
2408.15204 |
link |
2024-08-27 |
Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation |
N. E. Kriman et.al. |
2408.15171 |
null |
2024-08-27 |
Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering |
Haowei Du et.al. |
2408.15037 |
null |
2024-08-28 |
Language-specific Calibration for Pruning Multilingual Language Models |
Simon Kurz et.al. |
2408.14398 |
null |
2024-08-26 |
Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders |
Cong Xu et.al. |
2408.14238 |
link |
2024-08-25 |
CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction |
Guangya Wan et.al. |
2408.13940 |
null |
2024-08-25 |
Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models |
Duy Khoa Pham et.al. |
2408.13808 |
null |
2024-08-25 |
Poor-Supervised Evaluation for SuperLLM via Mutual Consistency |
Peiwen Yuan et.al. |
2408.13738 |
null |
2024-08-25 |
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models |
Aoxiao Zhong et.al. |
2408.13727 |
null |
2024-08-24 |
Pandora’s Box or Aladdin’s Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models |
Jinyang Wu et.al. |
2408.13533 |
null |
2024-08-27 |
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning |
Hourui Deng et.al. |
2408.13184 |
null |
2024-08-23 |
IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models |
Zhihao Yu et.al. |
2408.13073 |
link |
2024-08-23 |
Internal and External Knowledge Interactive Refinement Framework for Knowledge-Intensive Question Answering |
Haowei Du et.al. |
2408.12979 |
null |
2024-08-22 |
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection |
Mengya Hu et.al. |
2408.12748 |
link |
2024-08-22 |
Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning |
Mushui Liu et.al. |
2408.12469 |
null |
2024-08-22 |
A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation |
Weijia Zhang et.al. |
2408.12398 |
null |
2024-09-04 |
Graph Retrieval Augmented Trustworthiness Reasoning |
Ying Zhu et.al. |
2408.12333 |
link |
2024-08-22 |
Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models |
Meiyun Wang et.al. |
2408.12326 |
link |
2024-08-22 |
Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators |
Dingkang Yang et.al. |
2408.12325 |
link |
2024-08-22 |
MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient |
Yanzeng Li et.al. |
2408.12236 |
null |
2024-08-22 |
FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation |
KaShun Shum et.al. |
2408.12168 |
link |
2024-08-22 |
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM |
Zhaochen Su et.al. |
2408.12076 |
link |
2024-08-21 |
Understanding Epistemic Language with a Bayesian Theory of Mind |
Lance Ying et.al. |
2408.12022 |
null |
2024-08-21 |
RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization |
Jinhu Qi et.al. |
2408.12003 |
null |
2024-08-21 |
Automatic knowledge-graph creation from historical documents: The Chilean dictatorship as a case study |
Camila Díaz et.al. |
2408.11975 |
null |
2024-08-23 |
Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy |
Priyanka Mandikal et.al. |
2408.11903 |
link |
2024-08-17 |
How Susceptible are LLMs to Influence in Prompts? |
Sotiris Anagnostidis et.al. |
2408.11865 |
null |
2024-08-21 |
DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework |
Zhifei Xie et.al. |
2408.11788 |
null |
2024-08-21 |
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning |
Zhihao Li et.al. |
2408.11397 |
null |
2024-08-21 |
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models |
Chi Ma et.al. |
2408.11393 |
null |
2024-08-21 |
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation |
Xuanwang Zhang et.al. |
2408.11381 |
link |
2024-08-20 |
A Little Confidence Goes a Long Way |
John Scoville et.al. |
2408.11239 |
null |
2024-08-20 |
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model |
Chenhan Yuan et.al. |
2408.10764 |
null |
2024-08-20 |
Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models |
Artem Vazhentsev et.al. |
2408.10692 |
null |
2024-08-20 |
Analysis of Plan-based Retrieval for Grounded Text Generation |
Ameya Godbole et.al. |
2408.10490 |
null |
2024-08-20 |
LeCov: Multi-level Testing Criteria for Large Language Models |
Xuan Xie et.al. |
2408.10474 |
null |
2024-08-19 |
Enhanced document retrieval with topic embeddings |
Kavsar Huseynova et.al. |
2408.10435 |
null |
2024-08-19 |
LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain |
Nicholas Pipitone et.al. |
2408.10343 |
link |
2024-08-19 |
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models |
Tianyu Zhang et.al. |
2408.10124 |
link |
2024-08-19 |
MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation |
Ching-Wen Yang et.al. |
2408.09865 |
null |
2024-08-19 |
Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? |
Shiyu Ni et.al. |
2408.09773 |
null |
2024-08-19 |
A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification |
Claudio M. V. de Andrade et.al. |
2408.09629 |
null |
2024-08-17 |
TC-RAG:Turing-Complete RAG’s Case study on Medical LLM Systems |
Xinke Jiang et.al. |
2408.09199 |
link |
2024-08-17 |
Chinese Metaphor Recognition Using a Multi-stage Prompting Large Language Model |
Jie Wang et.al. |
2408.09177 |
null |
2024-08-17 |
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making |
Siyu Wu et.al. |
2408.09176 |
null |
2024-08-24 |
Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection |
Hsiu-Yuan Huang et.al. |
2408.09172 |
null |
2024-08-15 |
Graph Retrieval-Augmented Generation: A Survey |
Boci Peng et.al. |
2408.08921 |
link |
2024-08-12 |
Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection |
Chengyu Song et.al. |
2408.08902 |
null |
2024-08-22 |
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions |
Chenming Tang et.al. |
2408.08780 |
null |
2024-08-16 |
Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused |
Dingwei Chen et.al. |
2408.08769 |
null |
2024-08-16 |
MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector |
Wenjie Fu et.al. |
2408.08661 |
link |
2024-08-16 |
PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code |
Ziyou Jiang et.al. |
2408.08619 |
null |
2024-08-16 |
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models |
Kaushal Kumar Maurya et.al. |
2408.08545 |
null |
2024-08-15 |
Plan with Code: Comparing approaches for robust NL to DSL generation |
Nastaran Bassamzadeh et.al. |
2408.08335 |
null |
2024-08-14 |
CodeMirage: Hallucinations in Code Generated by Large Language Models |
Vibhor Agarwal et.al. |
2408.08333 |
null |
2024-08-16 |
Covert Bias: The Severity of Social Views’ Unalignment in Language Models Towards Implicit and Explicit Opinion |
Abeer Aldayel et.al. |
2408.08212 |
null |
2024-08-15 |
LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation |
Bohao Wang et.al. |
2408.08208 |
null |
2024-08-15 |
Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy |
Shaojun Xu et.al. |
2408.08188 |
null |
2024-08-15 |
Confidence-weighted integration of human and machine judgments for superior decision-making |
Felipe Yáñez et.al. |
2408.08083 |
link |
2024-08-15 |
LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning |
Jiajie Li et.al. |
2408.07981 |
null |
2024-08-14 |
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization |
Yuxin Jiang et.al. |
2408.07471 |
link |
2024-08-13 |
MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty |
Yongjin Yang et.al. |
2408.06816 |
link |
2024-08-12 |
A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution |
Sampath Rajapaksha et.al. |
2408.06272 |
null |
2024-08-12 |
On Effects of Steering Latent Representation for Large Language Model Unlearning |
Dang Huu-Tien et.al. |
2408.06223 |
link |
2024-08-11 |
Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models |
Wenbo Zhang et.al. |
2408.05873 |
link |
2024-08-10 |
Can LLMs Replace Manual Annotation of Software Engineering Artifacts? |
Toufique Ahmed et.al. |
2408.05534 |
null |
2024-08-19 |
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning |
Yuze Zhao et.al. |
2408.05517 |
link |
2024-08-09 |
FiST-Financial Style Transfer with Hallucination and Creativity Control Framework |
Sohini Roychowdhury et.al. |
2408.05365 |
null |
2024-08-09 |
A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning |
Ye Yuan et.al. |
2408.05141 |
null |
2024-08-16 |
Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models |
Zikai Xie et.al. |
2408.05093 |
link |
2024-08-08 |
Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews |
Samantha Chan et.al. |
2408.04681 |
link |
2024-08-06 |
Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD) |
Avshalom Manevich et.al. |
2408.04664 |
null |
2024-08-08 |
Arctic-TILT. Business Document Understanding at Sub-Billion Scale |
Łukasz Borchmann et.al. |
2408.04632 |
null |
2024-08-08 |
Learning Fine-Grained Grounded Citations for Attributed Large Language Models |
Lei Huang et.al. |
2408.04568 |
link |
2024-08-20 |
Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate |
Yiqun Zhang et.al. |
2408.04472 |
link |
2024-08-07 |
Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology |
Panagiotis Fytas et.al. |
2408.04121 |
null |
2024-08-07 |
Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks |
Zizhang Chen et.al. |
2408.03732 |
null |
2024-08-19 |
KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models |
Ruizhe Zhang et.al. |
2408.03297 |
null |
2024-08-05 |
An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs |
Dongming Jin et.al. |
2408.02450 |
null |
2024-08-05 |
SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models |
Shujuan Zhao et.al. |
2408.02302 |
null |
2024-08-07 |
SpecRover: Code Intent Extraction via LLMs |
Haifeng Ruan et.al. |
2408.02232 |
null |
2024-08-05 |
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning |
Yuxuan Wang et.al. |
2408.02210 |
null |
2024-08-04 |
Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process |
Peng Wang et.al. |
2408.02103 |
null |
2024-08-04 |
Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference |
Ke Shen et.al. |
2408.01935 |
null |
2024-08-03 |
TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation |
Xingpeng Sun et.al. |
2408.01867 |
null |
2024-08-03 |
WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization |
Liwenhan Xie et.al. |
2408.01703 |
null |
2024-08-02 |
Analyzing LLMs’ Capabilities to Establish Implicit User Sentiment of Software Desirability |
Sherri Weitl-Harms et.al. |
2408.01527 |
null |
2024-07-28 |
Faculty Perspectives on the Potential of RAG in Computer Science Higher Education |
Sagnik Dakshit et.al. |
2408.01462 |
null |
2024-08-18 |
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework |
Kunlun Zhu et.al. |
2408.01262 |
link |
2024-08-02 |
Misinforming LLMs: vulnerabilities, challenges and opportunities |
Bo Zhou et.al. |
2408.01168 |
null |
2024-08-01 |
Granting GPT-4 License and Opportunity: Enhancing Accuracy and Confidence Estimation for Few-Shot Event Detection |
Steven Fincke et.al. |
2408.00914 |
null |
2024-07-26 |
ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model |
Ning Xu et.al. |
2408.00804 |
null |
2024-08-01 |
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions |
Guangzhi Xiong et.al. |
2408.00727 |
link |
2024-08-01 |
Future of Artificial Intelligence in Agile Software Development |
Mariyam Mahboob et.al. |
2408.00703 |
null |
2024-07-25 |
Closing the gap between open-source and commercial large language models for medical evidence summarization |
Gongbo Zhang et.al. |
2408.00588 |
null |
2024-08-01 |
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation |
Xiaoye Qu et.al. |
2408.00555 |
null |
2024-08-01 |
Jailbreaking Text-to-Image Models with LLM-Based Agents |
Yingkai Dong et.al. |
2408.00523 |
null |
2024-08-01 |
DeliLaw: A Chinese Legal Counselling System Based on a Large Language Model |
Nan Xie et.al. |
2408.00357 |
null |
2024-07-31 |
Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation |
Valdemar Danry et.al. |
2408.00024 |
null |
2024-07-30 |
WebApp1K: A Practical Code-Generation Benchmark for Web App Development |
Yi Cui et.al. |
2408.00019 |
link |
2024-07-31 |
Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs |
Shi Liu et.al. |
2407.21771 |
null |
2024-07-31 |
Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency |
Taiji Li et.al. |
2407.21443 |
null |
2024-08-09 |
Cost-Effective Hallucination Detection for LLMs |
Simon Valentin et.al. |
2407.21424 |
null |
2024-07-31 |
Towards interfacing large language models with ASR systems using confidence measures and prompting |
Maryam Naderi et.al. |
2407.21414 |
null |
2024-07-31 |
Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs |
Elan Markowitz et.al. |
2407.21358 |
link |
2024-07-30 |
Accelerating Large Language Model Inference with Self-Supervised Early Exits |
Florian Valade et.al. |
2407.21082 |
null |
2024-07-25 |
Multi-group Uncertainty Quantification for Long-form Text Generation |
Terrance Liu et.al. |
2407.21057 |
null |
2024-07-24 |
Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications |
Cui Long et.al. |
2407.21055 |
null |
2024-07-30 |
Automated Review Generation Method Based on Large Language Models |
Shican Wu et.al. |
2407.20906 |
link |
2024-07-30 |
How to Measure the Intelligence of Large Language Models? |
Nils Körber et.al. |
2407.20828 |
null |
2024-07-30 |
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian |
Serena Auriemma et.al. |
2407.20654 |
null |
2024-07-25 |
An Efficient Inference Framework for Early-exit Large Language Models |
Ruijie Miao et.al. |
2407.20272 |
null |
2024-07-17 |
Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies |
Lachlan McGinness et.al. |
2407.20244 |
null |
2024-08-02 |
Improving Retrieval Augmented Language Model with Self-Reasoning |
Yuan Xia et.al. |
2407.19813 |
null |
2024-07-29 |
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages |
Wenxuan Zhang et.al. |
2407.19672 |
link |
2024-07-27 |
Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review |
Tongyue Shi et.al. |
2407.19256 |
null |
2024-07-26 |
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation |
Zilong Wang et.al. |
2407.19056 |
link |
2024-08-08 |
Know Your Limits: A Survey of Abstention in Large Language Models |
Bingbing Wen et.al. |
2407.18418 |
null |
2024-07-25 |
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement |
Jaehun Jung et.al. |
2407.18370 |
null |
2024-07-25 |
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation |
Eric Yang et.al. |
2407.18044 |
null |
2024-07-24 |
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries |
Wenting Zhao et.al. |
2407.17468 |
null |
2024-07-24 |
ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering |
Xiuying Chen et.al. |
2407.16931 |
null |
2024-07-23 |
Generation Constraint Scaling Can Mitigate Hallucination |
Georgios Kollias et.al. |
2407.16908 |
null |
2024-07-23 |
TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class |
Anishka IIITD et.al. |
2407.16805 |
link |
2024-07-23 |
Shared Imagination: LLMs Hallucinate Alike |
Yilun Zhou et.al. |
2407.16604 |
null |
2024-07-23 |
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs |
Yifan Xia et.al. |
2407.16576 |
null |
2024-07-23 |
Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models |
Ioana Buhnila et.al. |
2407.16565 |
link |
2024-07-25 |
Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models |
Kenza Benkirane et.al. |
2407.16470 |
link |
2024-07-23 |
Enhancing LLM’s Cognition via Structurization |
Kai Liu et.al. |
2407.16434 |
link |
2024-07-23 |
LawLuo: A Chinese Law Firm Co-run by LLM Agents |
Jingyun Sun et.al. |
2407.16252 |
link |
2024-07-23 |
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models |
Nishanth Madhusudhan et.al. |
2407.16221 |
null |
2024-07-22 |
Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned |
Song Wang et.al. |
2407.15441 |
null |
2024-07-22 |
MAVEN-Fact: A Large-scale Event Factuality Detection Dataset |
Chunyang Li et.al. |
2407.15352 |
link |
2024-07-20 |
Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models |
Ze Yu Zhang et.al. |
2407.14845 |
null |
2024-07-19 |
Internal Consistency and Self-Feedback in Large Language Models: A Survey |
Xun Liang et.al. |
2407.14507 |
link |
2024-07-19 |
Prompted Aspect Key Point Analysis for Quantitative Review Summarization |
An Quang Tang et.al. |
2407.14049 |
link |
2024-07-18 |
CoDefeater: Using LLMs To Find Defeaters in Assurance Cases |
Usman Gohar et.al. |
2407.13717 |
link |
2024-08-01 |
Prover-Verifier Games improve legibility of LLM outputs |
Jan Hendrik Kirchner et.al. |
2407.13692 |
null |
2024-07-18 |
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models |
Moon Ye-Bin et.al. |
2407.13442 |
null |
2024-07-18 |
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis |
Junying Chen et.al. |
2407.13301 |
link |
2024-07-19 |
AI-Assisted SQL Authoring at Industry Scale |
Chandra Maddila et.al. |
2407.13280 |
null |
2024-07-19 |
Retrieval-Augmented Generation for Natural Language Processing: A Survey |
Shangyu Wu et.al. |
2407.13193 |
null |
2024-07-18 |
Translate-and-Revise: Boosting Large Language Models for Constrained Translation |
Pengcheng Huang et.al. |
2407.13164 |
null |
2024-07-17 |
Halu-J: Critique-Based Hallucination Judge |
Binjie Wang et.al. |
2407.12943 |
link |
2024-08-01 |
Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild |
Nicolas Richet et.al. |
2407.12927 |
link |
2024-07-17 |
Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models |
Alexander R. Pelletier et.al. |
2407.12888 |
link |
2024-07-17 |
LLM-based query paraphrasing for video search |
Jiaxin Wu et.al. |
2407.12341 |
null |
2024-07-17 |
Optimizing Query Generation for Enhanced Document Retrieval in RAG |
Hamin Koo et.al. |
2407.12325 |
null |
2024-07-11 |
NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2 |
Tengfei Xue et.al. |
2407.12057 |
null |
2024-07-16 |
What’s Wrong? Refining Meeting Summaries with LLM Feedback |
Frederic Kirstein et.al. |
2407.11919 |
null |
2024-07-16 |
LoFTI: Localization and Factuality Transfer to Indian Locales |
Sona Elza Simon et.al. |
2407.11833 |
link |
2024-07-16 |
A Framework for Evaluating Appropriateness, Trustworthiness, and Safety in Mental Wellness AI Chatbots |
Lucia Chen et.al. |
2407.11387 |
null |
2024-07-19 |
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models |
Qingcheng Zeng et.al. |
2407.11282 |
link |
2024-07-15 |
AstroMLab 1: Who Wins Astronomy Jeopardy!? |
Yuan-Sen Ting et.al. |
2407.11194 |
null |
2024-07-15 |
Inertial Confinement Fusion Forecasting via LLMs |
Mingkai Chen et.al. |
2407.11098 |
null |
2024-07-15 |
Leveraging LLM-Respondents for Item Evaluation: a Psychometric Analysis |
Yunting Liu et.al. |
2407.10899 |
null |
2024-07-24 |
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs |
Quang H. Nguyen et.al. |
2407.10834 |
link |
2024-07-15 |
Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval |
Shengjie Ma et.al. |
2407.10805 |
link |
2024-07-15 |
GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework |
Hannah Sansford et.al. |
2407.10793 |
null |
2024-07-15 |
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses |
Jing Yao et.al. |
2407.10725 |
null |
2024-07-15 |
Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews |
Lucas Joos et.al. |
2407.10652 |
null |
2024-07-14 |
GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? |
Barah Fazili et.al. |
2407.10245 |
null |
2024-07-14 |
Look Within, Why LLMs Hallucinate: A Causal Perspective |
He Li et.al. |
2407.10153 |
null |
2024-07-13 |
Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues |
KuanChao Chu et.al. |
2407.09897 |
null |
2024-07-13 |
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks |
Shengbin Yue et.al. |
2407.09893 |
link |
2024-07-13 |
On Mitigating Code LLM Hallucinations with API Documentation |
Nihal Jain et.al. |
2407.09726 |
null |
2024-07-22 |
Mitigating Entity-Level Hallucination in Large Language Models |
Weihang Su et.al. |
2407.09417 |
link |
2024-07-12 |
PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents |
Saber Zerhoudi et.al. |
2407.09394 |
link |
2024-07-12 |
DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection |
Sangpil Youm et.al. |
2407.09283 |
null |
2024-07-12 |
The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs |
Anh Thu Maria Bui et.al. |
2407.09152 |
null |
2024-07-12 |
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors |
Nico Daheim et.al. |
2407.09136 |
link |
2024-07-12 |
Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations |
David N. Palacio et.al. |
2407.08983 |
null |
2024-07-15 |
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation |
Biqing Qi et.al. |
2407.08940 |
link |
2024-07-12 |
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? |
Yingming Pu et.al. |
2407.08922 |
link |
2024-07-11 |
Evaluating Nuanced Bias in Large Language Model Free Response Answers |
Jennifer Healey et.al. |
2407.08842 |
null |
2024-07-11 |
Proving that Cryptic Crossword Clue Answers are Correct |
Martin Andrews et.al. |
2407.08824 |
link |
2024-07-11 |
Uncertainty Estimation of Large Language Models in Medical Question Answering |
Jiaxin Wu et.al. |
2407.08662 |
null |
2024-07-11 |
$β$-DPO: Direct Preference Optimization with Dynamic $β$ |
Junkang Wu et.al. |
2407.08639 |
link |
2024-07-11 |
On the Universal Truthfulness Hyperplane Inside LLMs |
Junteng Liu et.al. |
2407.08582 |
link |
2024-07-22 |
Lynx: An Open Source Hallucination Evaluation Model |
Selvan Sunitha Ravi et.al. |
2407.08488 |
null |
2024-07-11 |
On the attribution of confidence to large language models |
Geoff Keeling et.al. |
2407.08388 |
null |