Wei Xu - Georgia Tech - College of Computing

Wei Xu
[phonetic pronunciation: way shoo]

Associate Professor
College of Computing
Georgia Institute of Technology
wei.xu@cc.gatech.edu
@cocoweixu

I am a faculty member in Computer Science at Georgia Tech’s School of Interactive Computing (one of four schools in the College of Computing) and the Machine Learning Center. My research focuses on advancing large language models across three areas:

(1) reinforcement learning & post-training: multilinguality, cultural adaptation, reasoning, temporal robustness;
(2) evaluation: long-context, multi-turn interactions, agent/user simulation, and personalization;
(3) interdisciplinary AI+X applications: education, privacy, law, healthcare, and beyond.

My work has been recognized with the NSF CAREER Award; Faculty Research Awards from Google, Sony, and Criteo; the CrowdFlower AI for Everyone Award; and Paper Awards at COLING’18 and ACL’24. My research lab is supported by grants from NSF, NIH, DARPA, and IARPA. I received my Ph.D. in Computer Science from New York University and my B.S./M.S. from Tsinghua University.

I plan to recruit 1–2 PhD students for Fall 2026 (please apply to the Machine Learning or CS PhD program and list me as a potential advisor). I also recruit research-oriented MS students (apply to the MSCS program and email me) and motivated undergraduates with sufficient time to commit to research. Although I do not normally respond to admission inquiries given the volume, a brief email after you submit your application can help ensure I don’t miss it in the system.

What's New

co-organizing the Workshop on Natural User-generated Text at EMNLP 2026.
May 2026, paper on multilingual reinforcement learning accepted to ICML
May 2026, Jonathan to start a summer internship at Bloomberg; Tarek to start a summer internship at Google
Apr 2026, co-organizing the Human-centered Evaluation and Auditing of Language Models (AI Agents in the Loop) Workshop at CHI 2026.
Apr 2026, talk at Oracle's Distinguished Lecture Series
Jan 2026, 2 papers accepted to ICLR 2026 and 1 paper to CHI 2026. Congrats to my students and collaborators!
Dec 2025, talk at NeurIPS 2025 Workshop on Efficient Reasoning (slides)
Sep 2025, paper on LLM probabilistic reasoning accepted to NeurIPS 2025.
Aug 2025, 4 papers accepted to EMNLP 2025 main conference
Aug 2025, talk at Apple ML research (virtual) "Probabilistic Reasoning and Multicultural Alignment in LLMs"
Jul 2025, talk at JPMorgan AI Research (virtual)
May 2025, talk at Sungkyunkwan University, South Korea
May 2025, keynote at PrivateNLP@NAACL "Empowering Everyday Users to Protect Their Privacy in the Age of AI".
Mar 2025, talk at University of Pennsylvania
Feb 2025, talk at University of Massachusetts, Lowell (virtual)
Feb 2025, talk at Google Research, Mountain View
Feb 2025, talk at University of California, Berkeley
Dec 2024, Chao Jiang successfully defended his PhD thesis and will join Apple AI/ML Research
Oct 2024, received an NIH R01 grant!
Oct 2024, talk at Bloomberg's CTO Data Science Speaker Series
Oct 2024, talk at Stony Brook University, New York
Oct 2024, 🏆 received the Google Academic Research Award!
Oct 2024, talk at Tokyo Institute of Technology on "Enhancing Multilingual Capabilities in LLMs" (slides)
Sep 2024, 4 long papers and 1 short paper accepted to EMNLP main conference.
Sep 2024, talk at MIT on "Cultural Biases, World Languages, and Privacy in Large Language Models" (slides).
Sep 2024, talk at Northeastern on "Human-AI Collaboration in Evaluating LLMs".
Aug 2024, 🏆 our paper on multicultural LLMs won the Best Social Impact Award at ACL 2024!
Aug 2024, tutorial at ACL 2024 on "Automatic and Human-AI Interactive Text Generation" (slides)
Aug 2024, my PhD advisor Ralph Grishman won the ACL Lifetime Achievement Award
Aug 2024, talk at Megagon (virtual)
Jul 2024, Yang Chen successfully defended his PhD thesis and will join NVIDIA as a research scientist.
June 2024, talk at NSF workshop on AI Text Production (virtual)
May 2024, 6 long papers accepted to ACL 2024 main conference!
May 2024, keynote at CHI 2024 HEAL Workshop on "Human-AI Collaboration in Evaluating LLMs" (slides).
May 2024, Yao Dou will start his summer internship at Microsoft Research; Chao Jiang will intern at Apple.
Apr 2024, 🏆 David Heineman won the CoC Outstanding Undergraduate Research Award!
Mar 2024, press coverage by VentureBeat on our new research about cultural biases in LLMs
Mar 2024, talk at USC and UCLA on "Amazing Multilingual Capabilities and Concerning Cultural Biases in LLMs"
Oct 2023, demo of Thresh 🌾 accepted to EMNLP 2023 -- a customizable tool for fine-grained human evaluation of LLM-generated texts (e.g., MT, summarization, text revision, and more)
Aug 2023, I was quoted in Business Insider about AI-generated content online.
Aug 2023, Mounica Maddela defended her PhD thesis and will join Bloomberg AI's LLM group
Jul 2023, our paper on multilingual text simplification received an Honorable Mention Award at ACL 2023!

Research Highlights

Multilingual Multicultural LLMs

While LLMs have demonstrated impressive performance, their success is largely concentrated in English and other high-resource languages. In contrast, many non-English languages remain underrepresented and underserved. Moreover, these models often reflect Western cultural biases and struggle to capture the nuances of non-Western cultural contexts (Naous et al., ACL 2024; Naous et al., NAACL 2025). We work on identifying and closing these gaps in performance and cultural adaptation. Addressing these challenges calls for a deeper analysis of pre-training data to identify and mitigate representational gaps, as well as language-routed reinforcement learning (Guo et al., ICML 2026), alignment (Guo et al., EMNLP 2025), and inference-time algorithms (Le et al., ICLR 2024) that can dynamically adapt model behavior to diverse linguistic and cultural contexts.

Robustness and Reasoning of LLMs

Artificial General Intelligence (AGI) benchmarks seek to assess an AI system’s capacity to perform tasks that require human-level intelligence, including reasoning, learning, and adapting to novel situations (Zheng et al., ACL 2024; Mendes et al., EMNLP 2024). While current systems fall short of true AGI, there is growing interest in moving beyond static benchmarks toward more realistic, dynamic evaluations. Our research focuses on designing real-world tasks that better reflect practical challenges faced by LLMs, and on developing innovative methods (Zheng et al., arXiv 2025) to enhance their robustness and performance in these complex settings.

Interdisciplinary NLP+X Research

We actively collaborate with researchers to explore impactful real-world applications of large language models in Human-Computer Interaction, Education, Security and Privacy, Healthcare, and Law (Jiang et al., EMNLP 2024; Dou et al., ACL 2024). As LLMs continue to advance, they offer exciting new capabilities across specialized domains. Many research opportunities remain, as LLMs often exhibit promising but inconsistent performance in domain-specific tasks, where precision, context sensitivity, and domain knowledge are critical.

NLP X Lab

photos together with Alan Ritter's group

    Yao Dou (CS PhD student; LLM evaluation, multi-turn interactions)
    Tarek Naous (ECE ML PhD; multilingual LLM / fairness)
    Jonathan Zheng (ML PhD; reasoning, robustness of LLM -- co-advisor: Alan Ritter)
    Geyang Guo (CS PhD; LLM alignment / RL -- co-advisor: Alan Ritter)
    Duong Minh Le (CS PhD; multilingual LLM -- co-advisor: Alan Ritter)
    Junmo Kang (CS PhD; efficiency -- co-advisor: Alan Ritter)
    Ivy He (CS PhD; multilingual LLM -- co-advisor: Alan Ritter)
    Jerry Zheng (BSMS, autumn 2025 -- human-AI interaction)
    Julie Young (BSMS, autumn 2025 -- media framing)
    Usneek Singh (CS MS, autumn 2025 -- user simulation)
    Yiren Wang (CS MS, autumn 2025 -- media framing)
    Poorvaja Kumar (CS MS, spring 2026 -- pragmatics)
    Parth Nanda (CS MS, spring 2026 -- pragmatics)
    Benjamin Mamut (Undergrad, autumn 2025 -- LLM evaluation)
    Guanjun Yan (Undergrad, autumn 2025)
    Alexey Plagov (Undergrad, autumn 2025)
    Sara Takagi (Undergrad, summer 2025)
    Jiayu Liu (Undergrad intern from UIUC, summer 2025)

Alumni (with theses) and Visitors

    Chao Jiang (PhD 2025 → Apple AI/ML research)
    Yang Chen (PhD 2024, co-advisor: Alan Ritter → Research Scientist at NVIDIA)
    Mounica Maddela (PhD 2023 → Bloomberg AI)
    Wuwei Lan (PhD 2021 → Applied Scientist at Amazon → Research Scientist at Meta)
    Xiaofeng Wu (MS 2025 → Baidu)
    Marcus Ma (MS 2024 → PhD student at USC)
    Anton Lavrouk (MS 2024 → Lockheed Martin → IMC Trading)
    David Heineman (BS 2024, CoC Outstanding Undergrad Research Award → PYI at AI2 → PhD student at Stanford)
    Jonathan Zheng (BS 2023 → PhD student at Georgia Tech)
    Michael Ryan (BS 2023 → PhD student at Stanford)
    Zirui Shao (visiting PhD student from Zhejiang University, 2025)

Publications

Preprints

Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization
Yao Dou, Wei Xu
arXiv, 2026
Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages
Tarek Naous, Anagha Savit, Carlos Rafael Catalan, Geyang Guo, Jaehyeok Lee, Kyungdon Lee, Lheane Marie Dizon, Mengyu Ye, Neel Kothari, Sahajpreet Singh, Sarah Masud, Tanish Patwa, Trung Thanh Tran, Zohaib Khan, Alan Ritter, JinYeong Bak, Keisuke Sakaguchi, Tanmoy Chakraborty, Yuki Arase, Wei Xu
arXiv, 2025

2026

Learning to Route Languages for Multilingual Preference Optimization
Geyang Guo, Hiromi Wakaki, Yuki Mitsufuji, Alan Ritter, Wei Xu
ICML 2026
GeoRC: A Benchmark for Geolocation Reasoning Chains
Mohit Talreja, Joshua Diao, Jim Thannikary James, Radu Casapu, Tejas Santanam, Ethan Mendes, Alan Ritter, Wei Xu, James Hays
ACL 2026
Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence
Kaijie Mo, Siddhartha Venkatayogi, Chantal Shaib, Ramez Kouzy, Wei Xu, Byron C. Wallace, Junyi Jessy Li
ACL Findings 2026
Flipping the Dialogue: Training and Evaluating User Language Models
Tarek Naous, Philippe Laban, Wei Xu, Jennifer Neville
ICLR 2026
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Ruixin Yang, Ethan Mendes, Arthur Wang, James Hays, Sauvik Das, Wei Xu, Alan Ritter
ICLR 2026
Supporting Informed Self-Disclosure: Design Principles for Presenting AI-Estimates of Privacy Risks to Users
Isadora Krsek, Meryl Ye, Wei Xu, Alan Ritter, Laura Dabbish, Sauvik Das
CHI 2026

2025

Probabilistic Reasoning with LLMs for Privacy Risk Estimation
Jonathan Zheng, Sauvik Das, Alan Ritter, Wei Xu
NeurIPS 2025
CARE: Multilingual Human Preference Learning for Cultural Awareness
Geyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, Wei Xu
EMNLP 2025
SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?
Yao Dou, Michel Galley, Baolin Peng, Chris Kedzie, Weixin Cai, Alan Ritter, Chris Quirk, Wei Xu, Jianfeng Gao
EMNLP 2025
What are Foundation Models Cooking in the Post-Soviet World?
Anton Lavrouk, Tarek Naous, Alan Ritter, Wei Xu
EMNLP 2025
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Ruohao Guo, Wei Xu, Alan Ritter
EMNLP 2025
Beyond the Reported Cutoff: Where Large Language Models Fall Short on Financial Knowledge
Agam Shah, Liqin Ye, Sebastian Jaskowski, Wei Xu, Sudheer Chava
COLM 2025
Evaluating LLMs on Chinese Idiom Translation
Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
COLM 2025
On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena
Tarek Naous, Wei Xu
NAACL 2025
The Impact of Visual Information in Chinese Characters
Xiaofeng Wu, Karl Stratos, Wei Xu
NAACL 2025
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy*, Pradyumna Tambwekar*, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, Matthew Gombolay
(* equal contribution)
ICLR 2025
CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark
Marcus Ma, Duong Minh Le, Junmo Kang, Yao Dou, John Cadigan, Dayne Freitag, Alan Ritter, Wei Xu
AAAI 2025
Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI
Isadora Krsek, Anubha Kabra, Yao Dou, Tarek Naous, Laura A. Dabbish, Alan Ritter, Wei Xu, Sauvik Das
CSCW 2025
Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges
Xiaofeng Wu, Alan Ritter, Wei Xu
arXiv, 2025

2024

Granular Privacy Control for Geolocation with Vision Language Models
Ethan Mendes, Yang Chen, James Hays, Sauvik Das, Wei Xu, Alan Ritter
EMNLP 2024
MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain
Chao Jiang, Wei Xu
EMNLP 2024
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment
Tarek Naous, Michael J. Ryan, Anton Lavrouk, Mohit Chandra, Wei Xu
EMNLP 2024
Improving Minimum Bayes Risk Decoding with Multi-Prompt
David Heineman, Yao Dou, Wei Xu
EMNLP 2024
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
Govind Ramesh, Yao Dou, Wei Xu
EMNLP 2024
ChatHF: Collecting Rich Human Feedback from Real-time Conversations [video]
Andrew Li, Zhenduo Wang, Ethan Mendes, Duong Minh Le, Wei Xu, Alan Ritter
EMNLP 2024 (Demo)
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu
ACL 2024 🏆 Best Social Impact Award
Press Coverage by VentureBeat
Reducing Privacy Risks in Online Self-Disclosures with Language Models
Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu
ACL 2024
NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms
Jonathan Zheng, Alan Ritter, Wei Xu
ACL 2024
Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
Ruohao Guo, Wei Xu, Alan Ritter
ACL 2024
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li
ACL 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li
ACL 2024
Automatic and Human-AI Interactive Text Generation (slides)
Yao Dou*, Philippe Laban*, Claire Gardent, Wei Xu (* equal contribution)
ACL 2024 (Tutorial)
Constrained Decoding for Cross-lingual Label Projection
Duong Minh Le, Yang Chen, Alan Ritter, Wei Xu
ICLR 2024
Design and Evaluation of an Automatic Text Simplification Prototype with Deaf and Hard-of-hearing Readers
Oliver Alonzo, Sooyeon Lee, Akhter Al Amin, Mounica Maddela, Wei Xu, Matt Huenerfauth
ASSETS 2024
Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation
Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu
EACL 2024 Workshop on Noisy User-generated Text

2023

Thresh 🌾: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation [code/demo]
David Heineman, Yao Dou, Wei Xu
EMNLP 2023 (Demo)
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
David Heineman, Yao Dou, Mounica Maddela, Wei Xu
EMNLP 2023
Multilingual Simplification of Medical Texts
Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh Ramanathan, Wei Xu, Byron Wallace, Junyi Jessy Li
EMNLP 2023
A Computational Interface to Infer Strategic Intent from Unstructured Language in a Low-Data Setting
Pradyumna Tambwekar, Lakshita Dodeja, Nathan Vaska, Wei Xu, Matthew Gombolay
EMNLP 2023 (Findings)
LENS 🔎 - A Learnable Evaluation Metric for Text Simplification [code/demo]
Mounica Maddela*, Yao Dou*, David Heineman, Wei Xu (* equal contribution)
ACL 2023
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Junmo Kang, Wei Xu, Alan Ritter
ACL 2023
Revisiting non-English Text Simplification: A Unified Multilingual Benchmark
Michael J. Ryan, Tarek Naous, Wei Xu
ACL 2023 🏆 Best Paper Award Honorable Mention
Improved Instruction Ordering in Recipe-Grounded Conversation
Duong Minh Le, Ruohao Guo, Wei Xu, Alan Ritter
ACL 2023
Press Coverage by GT News
Human-in-the-loop Evaluation for Early Misinformation Detection
Ethan Mendes, Yang Chen, Wei Xu, Alan Ritter
ACL 2023
Frustratingly Easy Label Projection for Cross-lingual Transfer
Yang Chen, Chao Jiang, Alan Ritter, Wei Xu
ACL 2023 (Findings)
Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification
Renliang Sun, Wei Xu, Xiaojun Wan
ACL 2023 (Findings)
Can Language Models be Instructed to Protect Personal Information?
Yang Chen*, Ethan Mendes*, Sauvik Das, Wei Xu, Alan Ritter (* equal contribution)
arXiv 2310.02224

2022 and before

Improving Large-scale Paraphrase Acquisition and Generation [data/leaderboard]
Yao Dou, Chao Jiang, Wei Xu
EMNLP 2022
🦕 Stanceosaurus: Classifying Stance Towards Multicultural Misinformation [data]
Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter
EMNLP 2022
arXivEdits: Understanding the Human Revision Process in Scientific Writing [data]
Chao Jiang, Wei Xu, Sam Stevens
EMNLP 2022
A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification
Oliver Alonzo, Sooyeon Lee, Mounica Maddela, Wei Xu, Matt Huenerfauth
EMNLP TSAR Workshop 2022
Extracting a Knowledge Base of COVID-19 Events from Social Media [data]
Shi Zong, Ashutosh Baheti, Wei Xu, Alan Ritter
COLING 2022
BiSECT: Learning to Split and Rephrase Sentences with Bitexts [data/code]
Joongwon Kim*, Mounica Maddela*, Reno Kriz, Wei Xu, Chris Callison-Burch (* equal contribution)
EMNLP 2021
Pre-train or Annotate? Domain Adaptation with a Constrained Budget [data/code]
Fan Bai, Alan Ritter, Wei Xu
EMNLP 2021
WIKIBIAS: Detecting Multi-Span Subjective Biases in Language [data] [code]
Yang Zhong, Jingfeng Yang, Wei Xu, Diyi Yang
EMNLP 2021 (Findings)
Neural semi-Markov CRF for Monolingual Word Alignment [code/data][slides][video]
Wuwei Lan*, Chao Jiang*, Wei Xu (* equal contribution)
ACL 2021
Controllable Text Simplification with Explicit Paraphrasing [data/code][slides] [poster]
Mounica Maddela, Fernando Alva-Manchego, Wei Xu
NAACL 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics [project website]
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
arXiv:2102.01672, ACL GEM Workshop 2021
An Empirical Study of Pre-trained Transformers for Arabic Information Extraction [pre-trained GigaBERT]
Wuwei Lan, Yang Chen, Wei Xu, Alan Ritter
EMNLP 2020
WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols [data]
Jeniya Tabassum, Sydney Lee, Wei Xu, Alan Ritter
EMNLP 2020 Workshop on Noisy User-generated Text (shared-task overview)
Neural CRF Model for Sentence Alignment in Text Simplification [code/data][slides][video]
Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu
ACL 2020
Code and Named Entity Recognition in StackOverflow [code/data][slides][video]
Jeniya Tabassum, Mounica Maddela, Wei Xu, Alan Ritter
ACL 2020
Generalizing Natural Language Analysis through Span-relation Representations [code/data]
Zhengbao Jiang, Wei Xu, Jun Araki, Graham Neubig
ACL 2020
Learning Relation Entailment with Structured and Textual Information
Zhengbao Jiang, Jun Araki, Donghan Yu, Ruohong Zhang, Wei Xu, Yiming Yang, Graham Neubig
AKBC 2020
Discourse Level Factors for Sentence Deletion in Text Simplification [poster][slides][data - email me]
Yang Zhong, Chao Jiang, Wei Xu, Junyi Jessy Li
AAAI 2020
Multi-task Pairwise Neural Ranking for Hashtag Segmentation [code/data][poster][bib][live demo]
Mounica Maddela, Wei Xu, Daniel Preoţiuc-Pietro
ACL 2019
A Word-Complexity Lexicon and a Neural Readability Ranking Model for Lexical Simplification [code/data][slides][video][bib]
Mounica Maddela, Wei Xu
EMNLP 2018
Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering [bib][code][slides]
Wuwei Lan, Wei Xu
COLING 2018 🏆 Best Paper Award
Character-based Neural Networks for Sentence Pair Modeling [bib][code][poster]
Wuwei Lan, Wei Xu
NAACL 2018
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols [bib][data (improved version)][poster]
Chaitanya Kulkarni, Wei Xu, Alan Ritter, Raghu Machiraju
NAACL 2018
A Continuously Growing Dataset of Sentential Paraphrases [bib][data][slides]
Wuwei Lan, Siyu Qiu, Hua He, Wei Xu
EMNLP 2017
From Shakespeare to Twitter: What are Language Styles all about? [bib][slides]
Wei Xu
EMNLP 2017 Workshop on Stylistic Variation
A Minimally Supervised Method for Recognizing and Normalizing Time Expressions in Twitter [bib][slides]
Jeniya Tabassum, Alan Ritter, Wei Xu
EMNLP 2016
Results of the WNUT16 Named Entity Recognition Shared Task [bib]
Benjamin Strauss, Bethany Toma, Alan Ritter, Marie-Catherine de Marneffe, Wei Xu
COLING 2016 Workshop on Noisy User-generated Text (shared-task overview)
Optimizing Statistical Machine Translation for Text Simplification [bib][data/code][slides][video]
Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch
TACL 2016, oral presentation at ACL 2016
Discovering User Attribute Stylistic Differences via Paraphrasing [bib] [data]
Daniel Preoţiuc-Pietro, Wei Xu, Lyle Ungar
AAAI 2016
Problems in Current Text Simplification Research: New Data Can Help [bib][data][slides][video]
Wei Xu, Chris Callison-Burch, Courtney Napoles
TACL 2015, oral presentation at EMNLP 2015
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition [bib]
Timothy Baldwin, Marie-Catherine de Marneffe, Bo Han, Young-Bum Kim, Alan Ritter, Wei Xu
ACL 2015 Workshop on Noisy User-generated Text (shared-task overview)
Cost Optimization for Crowdsourcing Translation [bib]
Mingkun Gao, Wei Xu, Chris Callison-Burch
NAACL 2015
SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT) [bib][data & code - email me]
Wei Xu, Chris Callison-Burch, William B. Dolan
SemEval 2015 (shared-task overview)
Extracting Lexically Divergent Paraphrases from Twitter [bib][code][video][data - email me]
Wei Xu, Alan Ritter, Chris Callison-Burch, William B. Dolan, Yangfeng Ji
TACL 2014, oral presentation at NAACL 2015
Poetry of the Crowd: A Human Computation Algorithm to Convert Prose into Rhyming Verse [bib]
Quanze Chen, Chenyang Lei, Wei Xu, Ellie Pavlick, Chris Callison-Burch
HCOMP 2014 (work-in-progress)
Infusion of Labeled Data into Distant Supervision for Relation Extraction [bib]
Maria Pershina, Bonan Min, Wei Xu, Ralph Grishman
ACL 2014
Data-driven Approaches for Paraphrasing Across Language Variations [bib]
Wei Xu
PhD Thesis
Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction [bib][data]
Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman
ACL 2013
Gathering and Generating Paraphrases from Twitter with Application to Normalization [bib][data]
Wei Xu, Alan Ritter, Ralph Grishman
ACL 2013 Workshop on Building and Using Comparable Corpora
A Preliminary Study of Tweet Summarization using Information Extraction [bib][data]
Wei Xu, Ralph Grishman, Adam Meyers, Alan Ritter
NAACL 2013 Workshop on Language Analysis in Social Media
Paraphrasing for Style [bib][data/code]
Wei Xu, Alan Ritter, Bill Dolan, Ralph Grishman, Colin Cherry
COLING 2012
Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models [bib]
Wei Xu, Joel Tetreault, Martin Chodorow, Ralph Grishman, Le Zhao
EMNLP 2011
Passage Retrieval for Information Extraction using Distant Supervision
Wei Xu, Ralph Grishman, Le Zhao
IJCNLP 2011
New York University 2011 System for KBP Slot Filling
Ang Sun, Ralph Grishman, Wei Xu, Bonan Min
TAC 2011
Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Kristen Parton, Kathleen R. McKeown, Bob Coyne, Mona T. Diab, Ralph Grishman, Dilek Hakkani-Tür, Mary Harper, Heng Ji, Wei Yun Ma, Adam Meyers, Sara Stolbach, Ang Sun, Gokhan Tur, Wei Xu, Sibel Yaman
ACL 2009
A Parse-and-Trim Approach with Information Significance for Chinese Sentence Compression
Wei Xu, Ralph Grishman
ACL Workshop on Language Generation and Summarisation 2009
Transducing Logical Relations from Automatic and Manual Annotation
Adam Meyers, Michiko Kosaka, Heng Ji, Nianwen Xue, Mary Harper, Ang Sun, Wei Xu, Shasha Liao
ACL Workshop on Linguistic Annotation 2009
Automatic Recognition of Logical Relations for English, Chinese and Japanese in the GLARF Framework
Adam Meyers, Michiko Kosaka, Nianwen Xue, Heng Ji, Ang Sun, Shasha Liao, Wei Xu
SemEval 2009
Using Non-Local Features to Improve Named Entity Recognition Recall
Xinnian Mao, Wei Xu, Yuan Dong, Haila Wang
PACLIC 2007
Domain Extension of Chinese Named Entity Recognition
Wei Xu, Bin Fu, Liu Liu, Chunfa Yuan, Wenjie Li
Frontiers of Content Computing 2007
Extractive Summarization using Inter- and Intra- Event Relevance
Wenjie Li, Wei Xu, Mingli Wu, Chunfa Yuan, Qin Lu
ACL 2006
Deriving Event Relevance from the Ontology Constructed with Formal Concept Analysis
Wei Xu, Wenjie Li, Mingli Wu, Wei Li, Chunfa Yuan
CICLing 2006
Building Document Graphs for Multiple News Articles Summarization: An Event-Based Approach
Wei Xu, Wenjie Li, Mingli Wu, Wei Li, Chunfa Yuan, Kam-Fai Wong
ICCPOL 2006
The Hong Kong Polytechnic University at ACE2005
Wenjie Li, Wei Li, Mingli Wu, Wei Xu
ACE 2005

Teaching

Current and Upcoming Offerings:

CS 8803-LLM (Georgia Tech) - Large Language Models (a research-oriented class - Spring 2026)
CS 4650 (Georgia Tech) - Natural Language Processing (undergraduate level - Autumn 2026)

Previous Offerings:

CS 8803-LLM (Georgia Tech) - Large Language Models (a new research-oriented class - Autumn 2024)
CS 8803-NLP (Georgia Tech) - Advanced NLP (a research-oriented class - Autumn 2023)
CS 7650 (Georgia Tech) - Natural Language Processing (graduate level - Autumn 2025, 2022, 2021; Spring 2024)
CS 4650 (Georgia Tech) - Natural Language Processing (undergraduate level - Spring 2025, 2023, 2022, 2021)
Speech and Language Processing (Spring 2020, 2017)
Social Media and Text Analytics (Autumn 2019, 2017, 2016)

Service

Executive board member: NAACL (2023-2024); nomination committee (2025)
Best paper award committee: ACL (2026); EMNLP (2024, 2022)
Senior area chair: NAACL (2025, 2022, 2021); EMNLP (2026, 2024, 2022); ACL (2026, 2020)
Area chair: COLM (2024); ACL (2023, 2019); EMNLP (2021, 2020, 2018, 2016); AAAI (2020); NAACL (2019); COLING (2018)
Workshop chair: ACL (2017)
Publicity chair: ACL (2026); EMNLP (2019); NAACL (2018, 2016)

Miscellaneous

When I have spare time, I enjoy visiting art museums, hiking, biking, and snowboarding.

I wrote a biography of my PhD advisor Ralph Grishman along with some early history of Information Extraction research in 2017. Ralph was named an ACL Fellow and later received the ACL Lifetime Achievement Award.

I also photographed and made a list of the best dressed NLP researchers in 2016/17, 2015 and 2014.
