I am a faculty member of the School of Interactive Computing and the Machine Learning Center at Georgia Tech. My research lies at the intersection of machine learning, natural language processing, and social media. I direct the NLP X Lab, which currently focuses on (1) large language models, including cultural bias, multilingual capability, temporal shifts, and personalization; (2) text generation, including constrained decoding and learnable evaluation metrics; and (3) interdisciplinary NLP applications that can make an impact in education, security, accessibility, and beyond. I received the NSF CAREER Award, Faculty Research Awards from Google, Sony, and Criteo, the CrowdFlower AI for Everyone Award, and Best Paper Awards at COLING'18 and ACL'24, as well as research funding from DARPA and IARPA. I am a member of the NAACL executive board. Previously, I was a postdoctoral researcher at the University of Pennsylvania. I received my PhD in Computer Science from New York University and my BS/MS from Tsinghua University.
I'm recruiting 1-2 PhD students every year (apply to the Machine Learning or CS PhD program and list me as a potential advisor; if you have an EE background, consider also applying to the ML ECE program). I also recruit MS students (apply to the MSCS program and email me) and undergraduates who have sufficient time and motivation to pursue research theses.
Mar 2024, talk at USC and UCLA on "Amazing Multilingual Capabilities and Concerning Cultural Biases in LLMs"
Dec 2023, invited talk on "Amplifying Multilingual LLM's Cross-lingual Ability" at the BrainLink event
Oct 2023, the demo of Thresh 🌾 was accepted to EMNLP 2023 -- a customizable tool for fine-grained human evaluation of LLM-generated texts (e.g., MT, summarization, text revision, and more)
Aug 2023, I was quoted in Business Insider about AI-generated content online.
Aug 2023, Mounica Maddela defended her PhD thesis and will join Bloomberg AI's LLM group
Multilingual Capability and Cultural Bias of LLMs
While LLMs have demonstrated impressive performance, their success is largely concentrated in English and other high-resource languages. In contrast, many non-English languages remain underrepresented and underserved. Moreover, these models often reflect Western cultural biases and struggle to capture the nuances of non-Western cultural contexts (Naous et al., ACL 2024; Naous et al., NAACL 2025). We work on identifying and closing these gaps in performance and cultural adaptation. Addressing these challenges calls for a deeper analysis of pre-training data to identify and mitigate representational gaps, as well as alignment (Guo et al., arXiv 2025) and inference-time algorithms (Le et al., ICLR 2024) that can dynamically adapt model behavior to diverse linguistic and cultural contexts.
Robustness and Reasoning of LLMs
Artificial General Intelligence (AGI) benchmarks seek to assess an AI system's capacity to perform tasks that require human-level intelligence, including reasoning, learning, and adapting to novel situations (Zheng et al., ACL 2024; Mendes et al., EMNLP 2024). While current systems fall short of true AGI, there is growing interest in moving beyond static benchmarks toward more realistic, dynamic evaluations. Our research focuses on designing real-world tasks that better reflect the practical challenges faced by LLMs, and on developing innovative methods (Zheng et al., arXiv 2025) to enhance their robustness and performance in these complex settings.
Interdisciplinary NLP+X Research
We actively collaborate with researchers to explore impactful real-world applications of large language models in Human-Computer Interaction, Security and Privacy, Healthcare, and Law (Jiang et al., EMNLP 2024; Dou et al., ACL 2024). As LLMs continue to advance, they offer exciting new capabilities across specialized domains. Many opportunities remain, as LLMs often exhibit promising but inconsistent performance on domain-specific tasks, where precision, context sensitivity, and domain knowledge are critical.
Chao Jiang (PhD 2025 → now at Apple AI/ML research)
Yang Chen (PhD 2024, co-advisor: Alan Ritter → now at NVIDIA)
Mounica Maddela (PhD 2023 → now at Bloomberg AI)
Wuwei Lan (PhD 2021 → now at Amazon)
Marcus Ma (MS 2024 → now PhD student at USC)
Anton Lavrouk (MS 2024 → now at Lockheed Martin)
David Heineman (BS 2024, CoC Outstanding Undergrad Research Award → predoctoral young investigator at AI2)
Jonathan Zheng (BS 2023 → now PhD student at Georgia Tech)
Michael Ryan (BS 2023 → now PhD student at Stanford)
I serve or have served as an executive board member of NAACL (2023-2024); a best paper award committee member for EMNLP 2022 and 2024; a senior area chair for EMNLP 2024 (resource and evaluation) and 2022 (generation), NAACL 2025 (generation), 2022 (machine learning for NLP), and 2021 (generation), and ACL 2020 (generation); an area chair for COLM 2024, ACL 2023 (semantics), EMNLP 2021 (computational social science), EMNLP 2020 (generation), AAAI 2020 (NLP), ACL 2019 (semantics), NAACL 2019 (generation), EMNLP 2018 (social media), COLING 2018 (semantics), and EMNLP 2016 (generation); a workshop chair for ACL 2017; and the publicity chair for EMNLP 2019, NAACL 2018, and NAACL 2016.
Miscellaneous
When I have spare time, I enjoy visiting art museums, hiking, biking, and snowboarding.