Location: 3401 Siebel Center
Abstract:
While “Big Data” technologies have achieved great success in unlocking knowledge from structured data, real-world data are largely unstructured and take the form of natural-language text. One of the grand challenges is to turn such massive text data into machine-actionable structures. Yet most existing systems rely heavily on human effort when dealing with text corpora of various kinds, which slows down the development of downstream applications.
In this talk, I will introduce a data-driven framework, Minimal-Effort StructMine, that extracts factual structures from massive text corpora with minimal human involvement. In particular, I will discuss how to apply Minimal-Effort StructMine to three subtasks: identifying typed entities in text, refining entity types into more fine-grained levels, and understanding the typed relationships between entities. Together, these three solutions form a clear roadmap for turning a massive corpus into a structured network that represents factual knowledge. Finally, I will share some future directions in mining corpus-specific structured networks for knowledge discovery.
Bio:
Xiang Ren is a Computer Science PhD candidate at the University of Illinois at Urbana-Champaign, working with Jiawei Han and the Data and Information Systems Lab. Xiang’s research develops data-driven methods for turning unstructured text data into machine-actionable structures. More broadly, his research interests span data mining, machine learning, and natural language processing, with a focus on making sense of massive text corpora. His research has been recognized with a Google PhD Fellowship, the Yahoo!-DAIS Research Excellence Award, and the C. W. Gear Outstanding Graduate Student Award, and his work has been transferred to the US Army Research Lab and Microsoft Bing.