Publications | Maharshi Gor

2025

NAACL 2025
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness

Debug - Last name: "Sung" Debug - Full last: "Sung" Debug - Suffix: "Sung" Debug - First name: "Yoo Yeon"
Yoo Yeon Sung
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor
Debug - Last name: "Fleisig" Debug - Full last: "Fleisig" Debug - Suffix: "Fleisig" Debug - First name: "Eve"
, Eve Fleisig
Debug - Last name: "Mondal" Debug - Full last: "Mondal" Debug - Suffix: "Mondal" Debug - First name: "Ishani"
, Ishani Mondal
Debug - Last name: "Boyd-Graber" Debug - Full last: "Boyd-Graber" Debug - Suffix: "Boyd-Graber" Debug - First name: "Jordan"
, and Jordan Boyd-Graber

In Nations of the Americas Chapter of the Association for Computational Linguistics, Apr 2025

Abs arXiv Bib PDF

Adversarial datasets should ensure AI robustness that matches human performance. However, as models evolve, datasets can become obsolete. Thus, adversarial datasets should be periodically updated based on their degradation in adversarialness. Given the lack of a standardized metric for measuring adversarialness, we propose AdvScore, a human-grounded evaluation metric. AdvScore assesses a dataset’s true adversarialness by capturing models’ and humans’ varying abilities, while also identifying poor examples. AdvScore then motivates a new dataset creation pipeline for realistic and high-quality adversarial samples, enabling us to collect an adversarial question answering (QA) dataset, AdvQA. We apply AdvScore using 9,347 human responses and ten language model predictions to track the models’ improvement over five years (from 2020 to 2024). AdvScore assesses whether adversarial datasets remain suitable for model evaluation, measures model improvements, and provides guidance for better alignment with human capabilities.
@inproceedings{sung2024advscore, title = {Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness}, author = {Sung, Yoo Yeon and Gor, Maharshi and Fleisig, Eve and Mondal, Ishani and Boyd-Graber, Jordan}, publisher = {Association for Computational Linguistics,}, booktitle = {Nations of the Americas Chapter of the Association for Computational Linguistics}, year = {2025}, month = apr, location = {Albuquerque, New Mexico}, url = {https://arxiv.org/abs/2406.16342}, }

2024

EMNLP 2024
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
Maharshi Gor
Debug - Last name: "Daumé" Debug - Full last: "Daumé III" Debug - Suffix: "III" Debug - First name: "Hal"
, Hal Daumé III
Debug - Last name: "Zhou" Debug - Full last: "Zhou" Debug - Suffix: "Zhou" Debug - First name: "Tianyi"
, Tianyi Zhou
Debug - Last name: "Boyd-Graber" Debug - Full last: "Boyd-Graber" Debug - Suffix: "Boyd-Graber" Debug - First name: "Jordan"
, and Jordan Boyd-Graber

In Empirical Methods in Natural Language Processing, Nov 2024

Abs arXiv Bib PDF Video

Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning. This work investigates these assertions by introducing CAIMIRA, a novel framework rooted in item response theory (IRT) that enables quantitative assessment and comparison of problem-solving abilities of question-answering (QA) agents: humans and AI systems. Through analysis of over 300,000 responses from 70 AI systems and 155 humans across thousands of quiz questions, CAIMIRA uncovers distinct proficiency patterns in knowledge domains and reasoning skills. Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning, while state-of-the-art LLMs like GPT-4 and LLaMA-3-70B show superior performance on targeted information retrieval and fact-based reasoning, particularly when information gaps are well-defined and addressable through pattern matching or data retrieval. These findings highlight the need for future QA tasks to focus on questions that challenge not only higher-order reasoning and scientific thinking, but also demand nuanced linguistic interpretation and cross-contextual knowledge application, helping advance AI developments that better emulate or complement human cognitive abilities in real-world problem-solving.
@inproceedings{gor2024caimira, title = {Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA}, author = {Gor, Maharshi and {Daum\'e III}, Hal and Zhou, Tianyi and Boyd-Graber, Jordan}, booktitle = {Empirical Methods in Natural Language Processing}, publisher = {Association for Computational Linguistics,}, year = {2024}, month = nov, location = {Miami, FL}, }

2022

NeurIPS 2022
Toward Efficient Robust Training against Union of $l_p$Threat Models

Debug - Last name: "Sriramanan" Debug - Full last: "Sriramanan" Debug - Suffix: "Sriramanan" Debug - First name: "Gaurang"
Gaurang Sriramanan
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor
Debug - Last name: "Feizi" Debug - Full last: "Feizi" Debug - Suffix: "Feizi" Debug - First name: "Soheil"
, and Soheil Feizi

In Advances in Neural Information Processing Systems, Dec 2022

Abs Bib PDF

The overwhelming vulnerability of deep neural networks to carefully crafted perturbations known as adversarial attacks has led to the development of various training techniques to produce robust models. While the primary focus of existing approaches has been directed toward addressing the worst-case performance achieved under a single-threat model, it is imperative that safety-critical systems are robust with respect to multiple threat models simultaneously. Existing approaches that address worst-case performance under the union of such threat models (e.g., \(\ell_∞), \ell_2, \ell_1) either utilize adversarial training methods that require multi-step attacks which are computationally expensive in practice, or rely upon fine-tuning of pre-trained models that are robust with respect to a single-threat model. In this work, we show that by carefully choosing the objective function used for robust training, it is possible to achieve similar, or even improved worst-case performance over a union of threat models while utilizing only single-step attacks during the training, thereby achieving a significant reduction in computational resources necessary for training. Furthermore, prior work showed that adversarial training against the \ell_1 threat model is relatively difficult, to the extent that even multi-step adversarially trained models were shown to be prone to gradient-masking and catastrophic over-fitting. However, our proposed method—when applied on the \ell_1 threat model specifically—enables us to obtain the first \ell_1$ robust model trained solely with single-step adversarial attacks. Finally, to demonstrate the merits of our approach, we utilize a modern set of attack evaluations to better estimate the worst-case performance under the considered union of threat models.
@inproceedings{sriramanan:gor:feizi-neurips2022, title = {Toward Efficient Robust Training against Union of $l_p$ Threat Models}, author = {Sriramanan, Gaurang and Gor, Maharshi and Feizi, Soheil}, booktitle = {Advances in Neural Information Processing Systems}, year = {2022}, month = dec, location = {New Orleans, LA}, }
Preprint
MetaDIP: Accelerating Deep Image Prior with Meta Learning

Debug - Last name: "Zhang" Debug - Full last: "Zhang" Debug - Suffix: "Zhang" Debug - First name: "Kevin"
Kevin Zhang
Debug - Last name: "Xie" Debug - Full last: "Xie" Debug - Suffix: "Xie" Debug - First name: "Mingyang"
, Mingyang Xie
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor
Debug - Last name: "Chen" Debug - Full last: "Chen" Debug - Suffix: "Chen" Debug - First name: "Yi-Ting"
, Yi-Ting Chen
Debug - Last name: "Zhou" Debug - Full last: "Zhou" Debug - Suffix: "Zhou" Debug - First name: "Yvonne"
, Yvonne Zhou, and 1 more author

arXiv preprint arXiv: 2209.08452, Dec 2022

Abs arXiv Bib PDF

Deep image prior (DIP) is a recently proposed technique for solving imaging inverse problems by fitting the reconstructed images to the output of an untrained convolutional neural network. Unlike pretrained feedforward neural networks, the same DIP can generalize to arbitrary inverse problems, from denoising to phase retrieval, while offering competitive performance at each task. The central disadvantage of DIP is that, while feedforward neural networks can reconstruct an image in a single pass, DIP must gradually update its weights over hundreds to thousands of iterations, at a significant computational cost. In this work we use meta-learning to massively accelerate DIP-based reconstructions. By learning a proper initialization for the DIP weights, we demonstrate a 10x improvement in runtimes across a range of inverse imaging tasks. Moreover, we demonstrate that a network trained to quickly reconstruct faces also generalizes to reconstructing natural image patches.
@article{zhang2022metadip, title = {MetaDIP: Accelerating Deep Image Prior with Meta Learning}, author = {Zhang, Kevin and Xie, Mingyang and Gor, Maharshi and Chen, Yi-Ting and Zhou, Yvonne and Metzler, Christopher A.}, journal = {arXiv preprint arXiv: 2209.08452}, year = {2022}, }

2021

EMNLP 2021
MATE: Multi-view Attention for Table Transformer Efficiency

Debug - Last name: "Eisenschlos" Debug - Full last: "Eisenschlos" Debug - Suffix: "Eisenschlos" Debug - First name: "Julian Martin"
Julian Martin Eisenschlos
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor
Debug - Last name: "Müller" Debug - Full last: "Müller" Debug - Suffix: "Müller" Debug - First name: "Thomas"
, Thomas Müller
Debug - Last name: "Cohen" Debug - Full last: "Cohen" Debug - Suffix: "Cohen" Debug - First name: "William Weston"
, and William Weston Cohen

In Empirical Methods in Natural Language Processing, Nov 2021

Abs arXiv Bib PDF

This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables. MATE uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. This architecture scales linearly with respect to speed and memory, and can handle documents containing more than 8000 tokens with current accelerators. MATE also has a more appropriate inductive bias for tabular data, and sets a new state-of-the-art for three table reasoning datasets. For HybridQA (Chen et al., 2020b), a dataset that involves large documents containing tables, we improve the best prior result by 19 points.
@inproceedings{eisenschlos2021mate, title = {MATE: Multi-view Attention for Table Transformer Efficiency}, author = {Eisenschlos, Julian Martin and Gor, Maharshi and M{\"u}ller, Thomas and Cohen, William Weston}, booktitle = {Empirical Methods in Natural Language Processing}, publisher = {Association for Computational Linguistics,}, month = nov, year = {2021}, location = {Punta Cana}, }
EMNLP 2021
Toward Deconfounding the Influence of Entity Demographics for Question Answering Accuracy

Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
Maharshi Gor
Debug - Last name: "Webster" Debug - Full last: "Webster" Debug - Suffix: "Webster" Debug - First name: "Kellie"
, Kellie Webster
Debug - Last name: "Boyd-Graber" Debug - Full last: "Boyd-Graber" Debug - Suffix: "Boyd-Graber" Debug - First name: "Jordan"
, and Jordan Boyd-Graber

In Empirical Methods in Natural Language Processing, Nov 2021

Abs arXiv Bib PDF

The goal of question answering (QA) is to answer any question. However, major QA datasets have skewed distributions over gender, profession, and nationality. Despite that skew, model accuracy analysis reveals little evidence that accuracy is lower for people based on gender or nationality; instead, there is more variation on professions (question topic). But QA’s lack of representation could itself hide evidence of bias, necessitating QA datasets that better represent global diversity.
@inproceedings{Gor:Webster:Boyd-Graber-2021, title = {Toward Deconfounding the Influence of Entity Demographics for Question Answering Accuracy}, author = {Gor, Maharshi and Webster, Kellie and Boyd-Graber, Jordan}, booktitle = {Empirical Methods in Natural Language Processing}, publisher = {Association for Computational Linguistics,}, month = nov, year = {2021}, location = {Punta Cana}, }

2019

ICCV 2019
GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions

Debug - Last name: "Kundu" Debug - Full last: "Kundu" Debug - Suffix: "Kundu" Debug - First name: "Jogendra Nath"
Jogendra Nath Kundu^*
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor^*
Debug - Last name: "Agrawal" Debug - Full last: "Agrawal" Debug - Suffix: "Agrawal" Debug - First name: "Dakshit"
, Dakshit Agrawal
Debug - Last name: "Babu" Debug - Full last: "Babu" Debug - Suffix: "Babu" Debug - First name: "R. Venkatesh"
, and R. Venkatesh Babu

In Proceedings of the IEEE/CVF International Conference on Computer Vision, Oct 2019

Abs arXiv Bib PDF Code

Despite the remarkable success of generative adversarial networks, their performance seems less impressive for diverse training sets, requiring learning of discontinuous mapping functions. Though multi-mode prior or multi-generator models have been proposed to alleviate this problem, such approaches may fail depending on the empirically chosen initial mode components. In contrast to such bottom-up approaches, we present GAN-Tree, which follows a hierarchical divisive strategy to address such discontinuous multi-modal data. Devoid of any assumption on the number of modes, GAN-Tree utilizes a novel mode-splitting algorithm to effectively split the parent mode to semantically cohesive children modes, facilitating unsupervised clustering. Further, it also enables incremental addition of new data modes to an already trained GAN-Tree, by updating only a single branch of the tree structure. As compared to prior approaches, the proposed framework offers a higher degree of flexibility in choosing a large variety of mutually exclusive and exhaustive tree nodes called GAN-Set. Extensive experiments on synthetic and natural image datasets including ImageNet demonstrate the superiority of GAN-Tree against the prior state-of-the-art.
@inproceedings{Kundu:Gor:Agrawal:Babu-ICCV2019, author = {Kundu, Jogendra Nath and Gor, Maharshi and Agrawal, Dakshit and Babu, R. Venkatesh}, title = {GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision}, month = oct, year = {2019}, location = {Seoul, South Korea}, first_author = {true}, }
AAAI 2019
BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN

Debug - Last name: "Kundu" Debug - Full last: "Kundu" Debug - Suffix: "Kundu" Debug - First name: "Jogendra Nath"
Jogendra Nath Kundu^*
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor^*
Debug - Last name: "Babu" Debug - Full last: "Babu" Debug - Suffix: "Babu" Debug - First name: "R. Venkatesh"
, and R. Venkatesh Babu

In Proceedings of the AAAI conference on Artificial Intelligence, Jul 2019

Abs arXiv Bib PDF Code

Human motion prediction model has applications in various fields of computer vision. Without taking into account the inherent stochasticity in the prediction of future pose dynamics, such methods often converges to a deterministic undesired mean of multiple probable outcomes. Devoid of this, we propose a novel probabilistic generative approach called Bidirectional Human motion prediction GAN, or BiHMP-GAN. To be able to generate multiple probable human-pose sequences, conditioned on a given starting sequence, we introduce a random extrinsic factor r, drawn from a predefined prior distribution. Furthermore, to enforce a direct content loss on the predicted motion sequence and also to avoid mode-collapse, a novel bidirectional framework is incorporated by modifying the usual discriminator architecture. The discriminator is trained also to regress this extrinsic factor r, which is used alongside with the intrinsic factor (encoded starting pose sequence) to generate a particular pose sequence. To further regularize the training, we introduce a novel recursive prediction strategy. In spite of being in a probabilistic framework, the enhanced discriminator architecture allows predictions of an intermediate part of pose sequence to be used as a conditioning for prediction of the latter part of the sequence. The bidirectional setup also provides a new direction to evaluate the prediction quality against a given test sequence. For a fair assessment of BiHMP-GAN, we report performance of the generated motion sequence using (i) a critic model trained to discriminate between real and fake motion sequence, and (ii) an action classifier trained on real human motion dynamics. Outcomes of both qualitative and quantitative evaluations, on the probabilistic generations of the model, demonstrate the superiority of BiHMP-GAN over previously available methods.
@inproceedings{Kundu_Gor_Babu_2019, title = {BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN}, author = {Kundu, Jogendra Nath and Gor, Maharshi and Babu, R. Venkatesh}, booktitle = {Proceedings of the AAAI conference on Artificial Intelligence}, year = {2019}, month = jul, volume = {33}, number = {01}, pages = {8553--8560}, first_author = {true}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/4874}, doi = {10.1609/aaai.v33i01.33018553}, }
WACV 2019
Unsupervised Feature Learning of Human Actions As Trajectories in Pose Embedding Manifold

Debug - Last name: "Kundu" Debug - Full last: "Kundu" Debug - Suffix: "Kundu" Debug - First name: "Jogendra Nath"
Jogendra Nath Kundu^*
Debug - Last name: "Gor" Debug - Full last: "Gor" Debug - Suffix: "Gor" Debug - First name: "Maharshi"
, Maharshi Gor^*
Debug - Last name: "Uppala" Debug - Full last: "Uppala" Debug - Suffix: "Uppala" Debug - First name: "Phani Krishna"
, Phani Krishna Uppala
Debug - Last name: "Babu" Debug - Full last: "Babu" Debug - Suffix: "Babu" Debug - First name: "R. Venkatesh"
, and R. Venkatesh Babu

In IEEE Winter Conference on Applications of Computer Vision (WACV), Jul 2019

tldr arXiv Bib PDF Code

This work introduces a novel unsupervised framework for human action modeling, which separates the learning of individual pose representations from sequences of poses. It employs a novel Encoder GAN (EnGAN) for continuous pose embedding and an RNN auto-encoder architecture, PoseRNN, for action modeling. It demonstrates superior transferability in action recognition tasks and provides qualitative insights through skeleton pose reconstructions and visualizations in the pose-embedding space.
@inproceedings{Kundu:Gor:Uppala:Babu-WACV2019, author = {Kundu, Jogendra Nath and Gor, Maharshi and Uppala, Phani Krishna and Babu, R. Venkatesh}, booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)}, title = {Unsupervised Feature Learning of Human Actions As Trajectories in Pose Embedding Manifold}, year = {2019}, volume = {}, number = {}, pages = {1459-1467}, first_author = {true}, doi = {10.1109/WACV.2019.00160}, tldr = {This work introduces a novel unsupervised framework for human action modeling, which separates the learning of individual pose representations from sequences of poses. It employs a novel Encoder GAN (EnGAN) for continuous pose embedding and an RNN auto-encoder architecture, PoseRNN, for action modeling. It demonstrates superior transferability in action recognition tasks and provides qualitative insights through skeleton pose reconstructions and visualizations in the pose-embedding space.} }