Wei Xu     

[phonetic pronunciation: way shoo]

Assistant Professor
Department of Computer Science and Engineering
The Ohio State University
   weixu@cse.ohio-state.edu
   495 Dreese Lab (2015 Neil Ave, Columbus, OH 43210)

My research lies at the intersection of machine learning, natural language processing, and social media. I focus on designing algorithms that learn semantics from large-scale data for natural language understanding, and on natural language generation, in particular generation with stylistic variation. I recently received the NSF CRII Award, the Criteo Faculty Research Award, the CrowdFlower AI for Everyone Award, and the Best Paper Award at COLING'18, as well as research funding from DARPA. Previously, I was a postdoctoral researcher at the University of Pennsylvania. I received my PhD in Computer Science from New York University, where I was a MacCracken Fellow, and my MS and BS from Tsinghua University.

I am an area chair for NAACL 2019 and have served as an area chair for EMNLP 2018 (social media area), COLING 2018 (semantics area), and EMNLP 2016 (generation area), as a workshop chair for ACL 2017, and as the publicity chair for NAACL 2016 and 2018. I also created the Twitter API tutorial and a new course on Social Media and Text Analytics.

  I am looking for one or two new PhD students every year. Here is a note to prospective students.
What's New
  Our paper on neural network models for sentence pair modeling won a best paper award at COLING 2018!
  Congratulations to my PhD student Mounica Maddela for having a long paper accepted to EMNLP 2018!
Teaching
CSE 5522 Artificial Intelligence II: Advanced Techniques (Autumn 2018; Spring 2018)
CSE 5539 Social Media and Text Analytics (Autumn 2017; Autumn 2016)
CSE 5525 Speech and Language Processing (Spring 2017)

Research Highlights

Natural Language Understanding / Semantics

We design machine learning algorithms to extract semantic or structured knowledge from large volumes of data. We have a line of work on learning web-scale paraphrases from Twitter that enables natural language systems to handle errors (e.g. “everytime” ↔ “every time”), lexical variations (e.g. “oscar nom’d doc” ↔ “Oscar-nominated documentary”), rare words (e.g. “NetsBulls series” ↔ “Nets and Bulls games”), and language shifts (e.g. “is bananas” ↔ “is great”). Such lexically divergent paraphrases are difficult to capture with conventional similarity-based approaches. We build large-scale datasets [BUCC'13] [SemEval'15] [EMNLP'17], and design neural network models for sentence pair modeling [NAACL'18a] [COLING'18] and multi-instance learning models [TACL'14] [EMNLP'16] that jointly infer latent word-sentence relations.

Natural Language Generation / Stylistics

Many text-to-text generation problems can be thought of as sentential paraphrasing or monolingual machine translation. These problems face an exponential search space, larger than that of bilingual translation, but a much smaller space of optimal solutions due to task-specific requirements. I advocate for a text-to-text generation framework built on top of machine translation technologies. My recent work uncovered multiple serious problems in text simplification research between 2010 and 2014 [TACL'15], designed automatic evaluation metrics to optimize syntax-based machine translation models [TACL'16], and created neural ranking models that achieve new state-of-the-art results for lexical simplification [EMNLP'18]. I am interested in text generation with different language styles (e.g. historical ↔ modern [COLING'12], non-standard ↔ standard [BUCC'13], feminine ↔ masculine [AAAI'16]).

Publications
Students
Current Students:
    Wuwei Lan (PhD student, 2016 -- ; semantics/deep learning EMNLP'17 NAACL'18a COLING'18)
    Mounica Maddela (PhD student, 2017 -- ; stylistics/neural ranking model EMNLP'18)
    Chao Jiang (PhD student, 2018 --)
    Jeniya Tabassum (PhD student; social media/information extraction EMNLP'16 - co-advisor: Alan Ritter)
    Lillian Chow (Undergraduate, summer 2018 --)
    Sydney Lee (Undergraduate, summer 2018 --)
    Rita Tong (Undergraduate, autumn 2018 --)

Former Student Mentees:
    Jim Chen (Undergraduate @UPenn; crowdsourcing HCOMP'14 TACL'16 - now PhD University of Washington)
    Ray Lei (Undergraduate @UPenn; crowdsourcing HCOMP'14 - now Microsoft Redmond)
    Mingkun Gao (Masters student @UPenn; crowdsourcing/machine translation NAACL'15 - now PhD UIUC)
    Siyu Qiu (Masters student @UPenn; semantics EMNLP'17 - now Hulu LA)
    Maria Pershina (PhD student @NYU; information extraction ACL'14 - now Goldman Sachs)
    Wenchao Du (Undergraduate @UWaterloo; dialog AAAI'17 SAP - now Masters CMU LTI)
    Piyush Ghai (Masters student @OSU; semantics - now Amazon)
    Pravar Mahajan (Masters student @OSU; social media - now IBM Research Almaden)
    Chaitanya Kulkarni (PhD student @OSU; robotic instructions NAACL'18b - advisor: Raghu Machiraju)

Professional Service
Workshop Chair:   ACL (2017)
Area Chair:   NAACL (2019), COLING (2018), EMNLP (2018, 2016)
Publicity Chair:   NAACL (2018, 2016)
Organizer:
     - Workshop on Noisy User-generated Text (W-NUT) at ACL 2015, COLING 2016, EMNLP 2017 & 2018
     - SemEval 2015 shared-task: Paraphrases and Semantic Similarity in Twitter
     - 2016 Mid-Atlantic Student Colloquium on Speech, Language and Learning
Program Committee:
     ACL (2018, 2017, 2015, 2014, 2013), NAACL (2018, 2015), EMNLP (2017, 2016, 2015, 2014), COLING (2016, 2014)
     WWW (2016, 2015), AAAI (2016, 2015, 2012), KDD (2015)
Journal Reviewer:
     Transactions of the Association for Computational Linguistics (TACL)
     Journal of Artificial Intelligence Research (JAIR)

Invited Talks
Miscellaneous

When I have spare time, I enjoy traveling, swimming and snowboarding.

I also made a list of the best-dressed NLP researchers (2016/17), (2015), and (2014).