Wei Xu     

[phonetic pronunciation: way shoo]

Assistant Professor
Department of Computer Science and Engineering
The Ohio State University
   495 Dreese Lab (2015 Neil Ave, Columbus, OH 43210)

My research lies at the intersection of machine learning, natural language processing, and social media. I focus on designing algorithms that learn semantics from large data for natural language understanding and for natural language generation, in particular generation with stylistic variations. I recently received the NSF CRII Award, the Criteo Faculty Research Award, the CrowdFlower AI for Everyone Award, and the Best Paper Award at COLING'18, as well as research funds from DARPA. Previously, I was a postdoctoral researcher at the University of Pennsylvania. I received my PhD in Computer Science from New York University, where I was a MacCracken Fellow, and my MS and BS from Tsinghua University.

I am an area chair for ACL 2019 (semantics area), NAACL 2019 (generation area), EMNLP 2018 (social media area), COLING 2018 (semantics area), and EMNLP 2016 (generation area); a workshop chair for ACL 2017; and the publicity chair for EMNLP 2019 and for NAACL 2018 and 2016. I also created the Twitter API tutorial and a new course on Social Media and Text Analytics.

  I am looking for one or two new PhD students every year. Here is a note to prospective students.

What's New
  June 3-6, Minneapolis, MN - NAACL conference
  Nov 3-7, Hong Kong - EMNLP-IJCNLP conference

Teaching
CSE 5522 Artificial Intelligence II: Advanced Techniques (Autumn 2018; Spring 2018)
CSE 5539 Social Media and Text Analytics (Autumn 2017; Autumn 2016)
CSE 5525 Speech and Language Processing (Spring 2017)

Research Highlights

Natural Language Understanding / Semantics

We design machine learning algorithms to extract semantic or structured knowledge from large volumes of data. A series of our work learns web-scale paraphrases from Twitter, which enable natural language systems to handle errors (e.g. “everytime” ↔ “every time”), lexical variations (e.g. “oscar nom’d doc” ↔ “Oscar-nominated documentary”), rare words (e.g. “NetsBulls series” ↔ “Nets and Bulls games”), and language shifts (e.g. “is bananas” ↔ “is great”). Such lexically divergent paraphrases are difficult to capture with conventional similarity-based approaches. We build large-scale datasets [BUCC'13] [SemEval'15] [EMNLP'17], neural network models for sentence pair modeling [NAACL'18a] [COLING'18], and multi-instance learning models [TACL'14] [EMNLP'16] that jointly infer latent word-sentence relations.

Natural Language Generation / Stylistics

Many text-to-text generation problems can be thought of as sentential paraphrasing or monolingual machine translation. These problems face an exponential search space larger than that of bilingual translation, but a much smaller optimal solution space due to specific task requirements. I advocate for a text-to-text generation framework built on top of machine translation technologies. My recent work uncovered multiple serious problems in text simplification research published between 2010 and 2014 [TACL'15], designed automatic evaluation metrics to optimize syntax-based machine translation models [TACL'16], and created neural ranking models that achieve new state-of-the-art results for lexical simplification [EMNLP'18]. I am interested in text generation with different language styles (e.g. historic ↔ modern [COLING'12], non-standard ↔ standard [BUCC'13], feminine ↔ masculine [AAAI'16]).

Current Students:
    Wuwei Lan (PhD student, 2016 -- ; semantics/deep learning EMNLP'17 NAACL'18a COLING'18)
    Mounica Maddela (PhD student, 2017 -- ; stylistics/neural ranking model EMNLP'18 ACL'19)
    Chao Jiang (PhD student, 2018 -- ; semantics NAACL'18)
    Jeniya Tabassum (PhD student; social media/information extraction EMNLP'16 - co-advisor: Alan Ritter)
    Lillian Chow (Undergraduate, summer 2018 --)
    Sydney Lee (Undergraduate, summer 2018 --)
    Sarah Flanagan (Undergraduate, autumn 2018 --)
    Raleigh Potluri (Undergraduate, autumn 2018 --)
    Rita Tong (Undergraduate, autumn 2018 --)
    Bohan Zhang (Undergraduate, spring 2019 --)

Former Student Advisees:
    Jim Chen (Undergraduate @UPenn; crowdsourcing HCOMP'14 TACL'16 - now PhD University of Washington)
    Ray Lei (Undergraduate @UPenn; crowdsourcing HCOMP'14 - now Microsoft Redmond)
    Mingkun Gao (Masters student @UPenn; crowdsourcing/machine translation NAACL'15 - now PhD UIUC)
    Siyu Qiu (Masters student @UPenn; semantics EMNLP'17 - now Hulu LA)
    Maria Pershina (PhD student @NYU; information extraction ACL'14 - now Goldman Sachs)
    Wenchao Du (Undergraduate @UWaterloo; dialog AAAI'17 SAP - now Master CMU LTI)
    Chaitanya Kulkarni (PhD student @OSU; robotic instructions NAACL'18b - advisor: Raghu Machiraju)
    Piyush Ghai (Masters student @OSU; semantics - now Amazon)
    Pravar Mahajan (Masters student @OSU; social media - now IBM Research Almaden)

Professional Service
Workshop Chair:   ACL (2017)
Area Chair:   ACL (2019), NAACL (2019), COLING (2018), EMNLP (2018, 2016)
Publicity Chair:   EMNLP (2019), NAACL (2018, 2016)
     - Workshop on Noisy User-generated Text (W-NUT) at ACL 2015, COLING 2016, EMNLP 2017, 2018 and 2019
     - SemEval 2015 shared-task: Paraphrases and Semantic Similarity in Twitter
     - 2016 Mid-Atlantic Student Colloquium on Speech, Language and Learning
Program Committee:
     ACL (2018, 2017, 2015, 2014, 2013), NAACL (2018, 2015), EMNLP (2017, 2016, 2015, 2014), COLING (2016, 2014)
     WWW (2016, 2015), AAAI (2016, 2015, 2012), KDD (2015)
Journal Reviewer:
     Transactions of the Association for Computational Linguistics (TACL)
     Journal of Artificial Intelligence Research (JAIR)

When I have spare time, I enjoy art, visiting museums, swimming and snowboarding.

In 2017, I wrote a biography of my PhD advisor, Ralph Grishman, along with some early history of information extraction research.

I also made a list of the best dressed NLP researchers in 2016/17, 2015 and 2014.