Wei Xu     

[phonetic pronunciation: way shoo]

Assistant Professor
Department of Computer Science and Engineering
The Ohio State University
495 Dreese Lab (2015 Neil Ave, Columbus, OH 43210)

My research lies at the intersection of machine learning, natural language processing, and social media. I focus on designing algorithms that learn semantics from large data for natural language understanding and for generation, in particular generation with stylistic variations. I recently received the NSF CRII Award, Criteo Faculty Research Award, CrowdFlower AI for Everyone Award, Best Paper Award at COLING'18, as well as research funds from DARPA. Previously, I was a postdoctoral researcher at the University of Pennsylvania. I received my PhD in Computer Science from New York University, where I was a MacCracken Fellow, and my MS and BS from Tsinghua University.

I am a senior area chair for ACL 2020 (generation), and an area chair for EMNLP 2020 (generation), AAAI 2020 (NLP), ACL 2019 (semantics), NAACL 2019 (generation), EMNLP 2018 (social media), COLING 2018 (semantics), EMNLP 2016 (generation), a workshop chair for ACL 2017, and the publicity chair for EMNLP 2019, NAACL 2018 and 2016. I also created the Twitter API tutorial and a new course on Social Media and Text Analytics.

  I am looking for one or a few new PhD students every year. Here is a note to prospective students.
What's New
  July 5-10, ACL 2020
  Nov 16-19, EMNLP 2020 - organizing the 6th Int'l Workshop on Noisy User-generated Text
Teaching
CSE 5539 Social Media and Text Analytics (Autumn 2019; Autumn 2017; Autumn 2016)
CSE 5522 Artificial Intelligence II: Advanced Techniques (Autumn 2018; Spring 2018)
CSE 5525 Speech and Language Processing (Spring 2020; Spring 2017)

Research Highlights

Natural Language Understanding / Semantics

My approach to natural language understanding is learning and modeling paraphrases on a much larger scale and with a much broader range than previous work, essentially by developing more robust machine learning models and leveraging social media data. These paraphrases can enable natural language systems to handle errors (e.g., “everytime” ↔ “every time”), lexical variations (e.g., “oscar nom’d doc” ↔ “Oscar-nominated documentary”), rare words (e.g., “NetsBulls series” ↔ “Nets and Bulls games”), and language shifts (e.g., “is bananas” ↔ “is great”). We designed a series of unsupervised and supervised learning approaches for paraphrase identification in social media data (also applicable to question/answer pairs [COLING'18] for QA systems), ranging from neural network models [COLING'18] [NAACL'18a] to multi-instance learning [TACL'14] [EMNLP'16], and crowdsourcing large-scale datasets [SemEval'15] [EMNLP'17].

Natural Language Generation / Stylistics

Many text-to-text generation problems can be thought of as sentential paraphrasing or monolingual machine translation. These problems face an exponential search space larger than that of bilingual translation, but a much smaller optimal solution space due to specific task requirements. I advocate for a text-to-text generation framework, building on top of machine translation technologies. My recent work uncovered multiple serious problems in previous research (2010 and 2014) on text simplification [TACL'15], designed a new tunable metric SARI [TACL'16] which is effective for evaluation and as a learning objective for training (now added by the Google AI group to TensorFlow), optimized syntax-based machine translation models [TACL'16], created pairwise neural ranking models for lexical simplification [EMNLP'18], and studied document-level simplification [AAAI'20]. Our newest Transformer-based model initialized with BERT is the current state-of-the-art for automatic text simplification [ACL'20a]. I am interested in text generation for style transfer [COLING'12] and stylistics in general (e.g., historic ↔ modern, non-standard ↔ standard [BUCC'13], feminine ↔ masculine [AAAI'16]).

Current Students:
    Wuwei Lan (PhD student, 2016 -- ; semantics/deep learning COLING'18 NAACL'18a EMNLP'17 )
    Mounica Maddela (PhD student, 2017 -- ; generation/neural ranking model ACL'19 EMNLP'18)
    Chao Jiang (PhD student, 2018 -- ; semantics/crowdsourcing ACL'20a NAACL'18)
    Yang Zhong (PhD student, 2019 -- ; stylistics AAAI'20 AAAI'19)
    Jeniya Tabassum (PhD student; social media/information extraction ACL'20b EMNLP'16 - co-advisor: Alan Ritter)
    Sydney Lee (Undergraduate @OSU, summer 2018 --)
    Sarah Flanagan (Undergraduate @OSU, autumn 2018 --)
    Ethan Lee (Undergraduate, spring 2020 -- )
    Sam Stevens (Undergraduate, autumn 2019 -- undergraduate research thesis)
    Yulu Qin (Undergraduate, summer 2020 -- )
    Kenneth Koepcke (Undergraduate, summer 2020 -- )
    Panya Bhinder (High school intern, summer 2020 -- )

Former Student Advisees:
    Jim Chen (Undergraduate @UPenn; crowdsourcing HCOMP'14 TACL'16 → PhD @University of Washington)
    Ray Lei (Undergraduate @UPenn; crowdsourcing HCOMP'14 → Microsoft)
    Mingkun Gao (Masters student @UPenn; crowdsourcing/machine translation NAACL'15 → PhD student @UIUC)
    Siyu Qiu (Masters student @UPenn; semantics EMNLP'17 → Hulu)
    Maria Pershina (PhD student @NYU; information extraction ACL'14 → Goldman Sachs → Bloomberg)
    Wenchao Du (Undergraduate @UWaterloo; dialog AAAI'17 SAP → Masters @CMU LTI)
    Chaitanya Kulkarni (PhD student @OSU; robotic instructions NAACL'18b - advisor: Raghu Machiraju)
    Piyush Ghai (Masters student @OSU; semantics → Amazon)
    Pravar Mahajan (Masters student @OSU; social media → IBM Research Almaden → Google)
    Rita Tong (Undergraduate @OSU → Master's student @UWisconsin-Madison)
    Lillian Chow (Undergraduate @OSU, summer 2018 - spring 2019)
    Raleigh Potluri (Undergraduate @OSU, autumn 2018 - summer 2019)
    Daniel Szoke (Undergraduate @OSU, autumn 2019 - spring 2020)
    Jaewook Lee (Undergraduate @OSU, autumn 2019 - spring 2020)

Professional Service
Workshop Chair:   ACL (2017)
Area Chair/Senior Area Chair:   ACL (2020, 2019), EMNLP (2020, 2018, 2016), AAAI (2020), NAACL (2019), COLING (2018)
Publicity Chair:   EMNLP (2019), NAACL (2018, 2016)
Organizer:
     - Workshop on Noisy User-generated Text (W-NUT) at ACL 2015, COLING 2016, EMNLP 2017, 2018, 2019, 2020
     - SemEval 2015 shared-task: Paraphrases and Semantic Similarity in Twitter
     - 2016 Mid-Atlantic Student Colloquium on Speech, Language and Learning
Program Committee:
     ACL (2018, 2017, 2015, 2014, 2013), NAACL (2018, 2015), EMNLP (2017, 2016, 2015, 2014), COLING (2016, 2014)
     WWW (2016, 2015), AAAI (2016, 2015, 2012), KDD (2015)
Journal Reviewer:
     Transactions of the Association for Computational Linguistics (TACL)
     Journal of Artificial Intelligence Research (JAIR)

Invited Talks

When I have spare time, I enjoy visiting art museums, swimming, running, and snowboarding.

In 2017, I wrote a biography of my PhD advisor Ralph Grishman, along with some early history of Information Extraction research.

I also made a list of the best dressed NLP researchers in 2016/17, 2015 and 2014.