Subject: "Text Mining: Opportunities and Challenges"
Date: Tue, 4 Jul 2000 15:02:47 +0200
From: "Ronen Feldman" <ronen@instinct-soft.com>
To: "Marko Grobelnik" <marko.grobelnik@ijs.si>
CC: "Natasa Milic-Frayling" <natasamf@microsoft.com>,
     "Dunja Mladenic" <Dunja.Mladenic@ijs.si>

"Text Mining: Opportunities and Challenges"

The information age has made it easy to receive and store large amounts of
data. The proliferation of documents available - on the Web, in corporate
intranets, on news wires and elsewhere - is overwhelming. However, while the
amount of data available to us is ever increasing, our ability to absorb
and process this information has remained constant. Search engines only
exacerbate the problem by making more and more documents available in a
matter of a few keystrokes; so-called "push" technology makes the problem
even worse by constantly reminding us that we are failing to follow critical
news, events, and trends. We experience information overload, and miss
important patterns even as they unfold before us.

Text Mining is a new and exciting area of research that tackles this problem
through techniques borrowed from data mining, machine learning, information
retrieval, natural-language understanding, case-based reasoning, statistics,
and knowledge management to help people gain rapid insight into large
quantities of semi-structured or unstructured text. Typically, it involves
preprocessing a document collection (e.g., through text categorization or
term extraction), storing, indexing, and analyzing the intermediate
representations (through distribution analysis, document clustering, trend
analysis, and association rule discovery), and presenting the results
graphically.
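
As a rough illustration of the pipeline just described, here is a minimal
Python sketch. It is not taken from the talk: the function names, the tiny
stopword list, the thresholds, and the four sample headlines are all
made up for illustration, and the association rules are limited to
single-term pairs. It shows term extraction as preprocessing, an inverted
index as the intermediate representation, and two simple analyses
(term distribution and association rule discovery) run over that index.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "on", "for"}

def extract_terms(text):
    """Preprocessing: lowercase, tokenize, drop stopwords; return a set of terms."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def build_index(docs):
    """Intermediate representation: inverted index, term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    return index

def term_distribution(index):
    """Distribution analysis: number of documents containing each term."""
    return Counter({term: len(ids) for term, ids in index.items()})

def association_rules(index, n_docs, min_support=0.5, min_confidence=0.8):
    """Association rules x -> y over term pairs, filtered by support and confidence."""
    rules = []
    for a, b in combinations(sorted(index), 2):
        both = len(index[a] & index[b])      # documents containing both terms
        support = both / n_docs
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = both / len(index[x])
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Made-up sample collection, for illustration only.
docs = {
    1: "Microsoft acquires a software company",
    2: "Microsoft releases new software",
    3: "Oil prices rise on supply fears",
    4: "Microsoft software update for servers",
}

index = build_index(docs)
dist = term_distribution(index)
rules = association_rules(index, len(docs))

print(dist.most_common(2))
for x, y, support, confidence in rules:
    print(f"{x} -> {y}  support={support:.2f}  confidence={confidence:.2f}")
```

A real system would replace each step with something far richer (for
example, linguistic term extraction or a trained categorizer in place of
the regular-expression tokenizer), but the overall shape of the pipeline
stays the same: preprocess, store and index, analyze, then present.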

In my talk I will present some of the new challenges facing the text mining
community, with particular focus on the representation of documents and the
ability to provide better insights into document collections. I will also
try to provide a consumer perspective on text mining while reviewing the
business opportunities that currently exist in this area.
