Wei Xu     

[phonetic pronunciation: way shoo]

Assistant Professor
Department of Computer Science and Engineering
The Ohio State University
   495 Dreese Lab (2015 Neil Ave, Columbus, OH 43210)
   GHC 5715 - CMU

My research lies at the intersection of machine learning, natural language processing, and social media. I focus on designing algorithms for learning semantics from large data for natural language understanding, and for natural language generation, in particular with stylistic variations. I recently received the NSF CRII Award, Criteo Faculty Research Award, CrowdFlower AI for Everyone Award, and Best Paper Award at COLING'18, as well as research funds from DARPA. Previously, I was a postdoctoral researcher at the University of Pennsylvania. I received my PhD in Computer Science from New York University, where I was a MacCracken Fellow, and my MS and BS from Tsinghua University.

I am an area chair for ACL 2019 (semantics area), NAACL 2019 (generation area), EMNLP 2018 (social media area), COLING 2018 (semantics area), EMNLP 2016 (generation area), a workshop chair for ACL 2017, and the publicity chair for EMNLP 2019, NAACL 2018 and 2016. I also created the Twitter API tutorial and a new course on Social Media and Text Analytics.

  I am looking for one or a few new PhD students every year. Here is a note to prospective students.
What's New
  Jul 28-Aug 2, Florence, Italy - ACL conference
  Nov 3-7, Hong Kong - organizing the Workshop on Noisy User-generated Text at the EMNLP-IJCNLP conference
Teaching
CSE 5539 Social Media and Text Analytics (Autumn 2019; Autumn 2017; Autumn 2016)
CSE 5522 Artificial Intelligence II: Advanced Techniques (Autumn 2018; Spring 2018)
CSE 5525 Speech and Language Processing (Spring 2017)

Research Highlights

Natural Language Understanding / Semantics

We design machine learning algorithms to extract semantic or structured knowledge from large volumes of data. We have a series of works on learning web-scale paraphrases from Twitter that can enable natural language systems to handle errors (e.g. “everytime” ↔ “every time”), lexical variations (e.g. “oscar nom’d doc” ↔ “Oscar-nominated documentary”), rare words (e.g. “NetsBulls series” ↔ “Nets and Bulls games”), and language shifts (e.g. “is bananas” ↔ “is great”). Such lexically divergent paraphrases are difficult to capture with conventional similarity-based approaches. We design large-scale datasets [BUCC'13] [SemEval'15] [EMNLP'17], neural network models for sentence pair modeling [NAACL'18a] [COLING'18], and multi-instance learning models [TACL'14] [EMNLP'16] that jointly infer latent word-sentence relations.

Natural Language Generation / Stylistics

Many text-to-text generation problems can be thought of as sentential paraphrasing or monolingual machine translation. These problems face an exponential search space larger than that of bilingual translation, but a much smaller optimal solution space due to specific task requirements. I advocate for a text-to-text generation framework built on top of machine translation technologies. My recent work uncovered multiple serious problems in text simplification research between 2010 and 2014 [TACL'15], designed automatic evaluation metrics to optimize syntax-based machine translation models [TACL'16], and created neural ranking models that achieve new state-of-the-art results for lexical simplification [EMNLP'18]. I am interested in text generation with different language styles (e.g. historic ↔ modern [COLING'12], non-standard ↔ standard [BUCC'13], feminine ↔ masculine [AAAI'16]).

Current Students:
    Wuwei Lan (PhD student, 2016 -- ; semantics/deep learning EMNLP'17 NAACL'18a COLING'18)
    Mounica Maddela (PhD student, 2017 -- ; stylistics/neural ranking model EMNLP'18 ACL'19)
    Chao Jiang (PhD student, 2018 -- ; semantics NAACL'18)
    Jeniya Tabassum (PhD student; social media/information extraction EMNLP'16 - co-advisor: Alan Ritter)
    Sydney Lee (Undergraduate, summer 2018 --)
    Sarah Flanagan (Undergraduate, autumn 2018 --)
    Raleigh Potluri (Undergraduate, autumn 2018 --)
    Alex Wing (Undergraduate, summer 2019 --)

Former Student Advisees:
    Jim Chen (Undergraduate @UPenn; crowdsourcing HCOMP'14 TACL'16 - now PhD University of Washington)
    Ray Lei (Undergraduate @UPenn; crowdsourcing HCOMP'14 - now Microsoft Redmond)
    Mingkun Gao (Masters student @UPenn; crowdsourcing/machine translation NAACL'15 - now PhD UIUC)
    Siyu Qiu (Masters student @UPenn; semantics EMNLP'17 - now Hulu LA)
    Maria Pershina (PhD student @NYU; information extraction ACL'14 - now Goldman Sachs)
    Wenchao Du (Undergraduate @UWaterloo; dialog AAAI'17 SAP - now Masters student CMU LTI)
    Chaitanya Kulkarni (PhD student @OSU; robotic instructions NAACL'18b - advisor: Raghu Machiraju)
    Piyush Ghai (Masters student @OSU; semantics - now Amazon)
    Pravar Mahajan (Masters student @OSU; social media - now IBM Research Almaden)
    Rita Tong (Undergraduate @OSU - incoming MS student @UWisconsin-Madison)
    Lillian Chow (Undergraduate @OSU)

Professional Service
Workshop Chair:   ACL (2017)
Area Chair:   ACL (2019), NAACL (2019), COLING (2018), EMNLP (2018, 2016)
Publicity Chair:   EMNLP (2019), NAACL (2018, 2016)
Organizer:
     - Workshop on Noisy User-generated Text (W-NUT) at ACL 2015, COLING 2016, EMNLP 2017, 2018 and 2019
     - SemEval 2015 shared-task: Paraphrases and Semantic Similarity in Twitter
     - 2016 Mid-Atlantic Student Colloquium on Speech, Language and Learning
Program Committee:
     ACL (2018, 2017, 2015, 2014, 2013), NAACL (2018, 2015), EMNLP (2017, 2016, 2015, 2014), COLING (2016, 2014)
     WWW (2016, 2015), AAAI (2016, 2015, 2012), KDD (2015)
Journal Reviewer:
     Transactions of the Association for Computational Linguistics (TACL)
     Journal of Artificial Intelligence Research (JAIR)

Invited Talks

When I have spare time, I enjoy art, visiting museums, swimming and snowboarding.

I wrote a biography of my PhD advisor, Ralph Grishman, along with some early history of Information Extraction research, in 2017.

I also made a list of the best dressed NLP researchers in 2016/17, 2015 and 2014.