Manan Suri

5108, 8125 Paint Branch Dr, College Park, MD 20742

I am a PhD student in Computer Science at the University of Maryland, College Park, advised by Prof. Dinesh Manocha at the GAMMA Lab.

My research interests span LLM-based agents across a range of applications, including document and multimodal understanding, code and software engineering workflows, and general-purpose API-driven agents. More broadly, I work on grounding language models through retrieval, attribution, and structured reasoning, with a focus on how context is constructed, selected, and used effectively in generation, decision-making, and complex task execution.

Previously, I worked on greenwashing detection as a Data Science for Social Good Fellow at the University of Warwick, collaborating with the Algorithmic Transparency Institute. I also contributed to fact attribution and document retrieval systems at Scalenut. Last summer, I interned as an Applied Science Intern at Amazon, working on software engineering agents.

This summer, I will be joining Meta Superintelligence Labs as a research intern in Menlo Park — looking forward to connecting with people in the Bay Area!

news

Apr 10, 2026	CodeScout and Structured Uncertainty Guided Clarification accepted at ACL 2026 (Findings)!
Jan 25, 2026	Started as a Research Assistant at NIST, working on robust software engineering agents.
Dec 1, 2025	I delivered a talk “From Language Models to Agents: Foundations, Frameworks, and Future Challenges” at Illinois Institute of Technology.
Nov 3, 2025	Presented our paper “Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents” at EMNLP 2025, main conference!
Aug 28, 2025	I gave a talk at Adobe World HQ, San Jose, titled “Fine-grained Visual Attribution”.

selected publications

2026

CodeScout: Contextual Problem Statement Enhancement for Software Agents

Manan Suri, Xiangci Li, Mehdi Shojaie, Songyang Han, Chao-Chun Hsu, Shweta Garg, Aniket Anand Deshmukh, and Varun Kumar

In Findings of the Association for Computational Linguistics: ACL 2026, Jul 2026

Current AI-powered code assistance tools struggle with poorly-defined problem statements lacking task context. CodeScout addresses this through contextual query refinement — converting underspecified requests into comprehensive problem statements via lightweight pre-exploration of target codebases. The method performs targeted context scoping, multi-perspective analysis examining potential fixes, and synthesizes insights into enhanced problem statements with reproduction steps and exploration hints. Evaluated on SWEBench-Verified, it demonstrated a 20% improvement in resolution rates with up to 27 additional issues resolved compared to baseline methods.
title = {{C}ode{S}cout: Contextual Problem Statement Enhancement for Software Agents}, author = {Suri, Manan and Li, Xiangci and Shojaie, Mehdi and Han, Songyang and Hsu, Chao-Chun and Garg, Shweta and Deshmukh, Aniket Anand and Kumar, Varun}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2026}, month = jul, year = {2026}, address = {San Diego, California, United States}, publisher = {Association for Computational Linguistics}, }

Structured Uncertainty Guided Clarification for LLM Agents

Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, and Dinesh Manocha

In Findings of the Association for Computational Linguistics: ACL 2026, Jul 2026

LLM agents extend large language models with tool-calling capabilities, but ambiguous user instructions often lead to incorrect invocations and task failures. We introduce a principled formulation of structured uncertainty over tool-call parameters, modeling joint tool-argument clarification as a POMDP with an Expected Value of Perfect Information (EVPI) objective for optimal question selection and aspect-based cost modeling to prevent redundancy. Our SAGE-Agent increases coverage on ambiguous tasks by 7–39% while reducing clarification questions by 1.5–2.7x compared to baselines. We also present ClarifyBench, the first multi-turn tool-augmented disambiguation benchmark with realistic LLM-based user simulation across diverse domains.
@inproceedings{suri-etal-2026-clarification, title = {Structured Uncertainty Guided Clarification for {LLM} Agents}, author = {Suri, Manan and Mathur, Puneet and Lipka, Nedim and Dernoncourt, Franck and Rossi, Ryan A. and Manocha, Dinesh}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2026}, month = jul, year = {2026}, address = {San Diego, California, United States}, publisher = {Association for Computational Linguistics}, }

2025

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, and Dinesh Manocha

In Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Apr 2025

ChartLens: Fine-grained Visual Attribution in Charts

Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, and Dinesh Manocha

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025

The growing capabilities of multimodal large language models (MLLMs) have advanced tasks like chart understanding. However, these models often suffer from hallucinations, where generated text sequences conflict with the provided visual data. To address this, we introduce Post-Hoc Visual Attribution for Charts, which identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects and employs set-of-marks prompting with MLLMs for fine-grained visual attribution. Additionally, we present ChartVA-Eval, a benchmark with synthetic and real-world charts from diverse domains like finance, policy, and economics, featuring fine-grained attribution annotations. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.

Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents

Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, and Dinesh Manocha

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Nov 2025

Flowcharts are a critical tool for visualizing decision-making processes. However, their non-linear structure and complex visual-textual relationships make it challenging to interpret them using LLMs, as vision-language models frequently hallucinate nonexistent connections and decision paths when analyzing these diagrams. This leads to compromised reliability for automated flowchart processing in critical domains such as logistics, health, and engineering. We introduce the task of Fine-grained Flowchart Attribution, which traces specific components grounding a flowchart-referring LLM response. Flowchart Attribution ensures the verifiability of LLM predictions and improves explainability by linking generated responses to the flowchart’s structure. We propose FlowPathAgent, a neurosymbolic agent that performs fine-grained post hoc attribution through graph-based reasoning. It first segments the flowchart, then converts it into a structured symbolic graph, and then employs an agentic approach to dynamically interact with the graph and generate attribution paths. Additionally, we present FlowExplainBench, a novel benchmark for evaluating flowchart attributions across diverse styles, domains, and question types. Experimental results show that FlowPathAgent mitigates visual hallucinations in LLM answers over flowchart QA, outperforming strong baselines by 10–14% on our proposed FlowExplainBench dataset.

2024

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Manan Suri, Puneet Mathur, Franck Dernoncourt, Rajiv Jain, Vlad I Morariu, Ramit Sawhney, Preslav Nakov, and Dinesh Manocha

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

Document structure editing involves manipulating localized textual, visual, and layout components in document images based on the user’s requests. Past works have shown that multimodal grounding of user requests in the document image and identifying the accurate structural components and their associated attributes remain key challenges for this task. To address these, we introduce the DocEditAgent, a novel framework that performs end-to-end document editing by leveraging Large Multimodal Models (LMMs). It consists of three novel components – (1) Doc2Command to simultaneously localize edit regions of interest (RoI) and disambiguate user edit requests into edit commands. (2) LLM-based Command Reformulation prompting to tailor edit commands originally intended for specialized software into edit instructions suitable for generalist LMMs. (3) Moreover, DocEditAgent processes these outputs via Large Multimodal Models like GPT-4V and Gemini, to parse the document layout, execute edits on grounded Region of Interest (RoI), and generate the edited document image. Extensive experiments on the DocEdit dataset show that DocEditAgent significantly outperforms strong baselines on edit command generation (2-33%), RoI bounding box detection (12-31%), and overall document editing (1-12%) tasks.

2023

ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER

Sreyan Ghosh, Utkarsh Tyagi, Manan Suri, Sonal Kumar, Ramaneswaran S, and Dinesh Manocha

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023

Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, we present ACLM (Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach based on conditional generation, to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from and often generates incoherent augmentations by placing complex named entities in the wrong context. ACLM builds on BART and is optimized on a novel text reconstruction or denoising task - we use selective masking (aided by attention maps) to retain the named entities and certain keywords in the input sentence that provide contextually relevant additional knowledge or hints about the named entities. Compared with other data augmentation strategies, ACLM can generate more diverse and coherent augmentations preserving the true word sense of complex entities in the sentence. We demonstrate the effectiveness of ACLM both qualitatively and quantitatively on monolingual, cross-lingual, and multilingual complex NER across various low-resource settings. ACLM outperforms all our neural baselines by a significant margin (1%-36%). In addition, we demonstrate the application of ACLM to other domains that suffer from data scarcity (e.g., biomedical). In practice, ACLM generates more effective and factual augmentations for these domains than prior methods.
@inproceedings{ghosh-etal-2023-aclm, title = {{ACLM}: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex {NER}}, author = {Ghosh, Sreyan and Tyagi, Utkarsh and Suri, Manan and Kumar, Sonal and S, Ramaneswaran and Manocha, Dinesh}, booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = jul, year = {2023}, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.acl-long.8}, doi = {10.18653/v1/2023.acl-long.8}, pages = {104--125}, }

CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network

Sreyan Ghosh, Manan Suri, Purva Chiniya, Utkarsh Tyagi, Sonal Kumar, and Dinesh Manocha

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023