Date Posted: August 21, 2007
What is IBM Context-Oriented Information Retrieval?
Whether you are performing customer analysis, watching market trends, detecting fraud, or looking for cross-selling opportunities, you are looking for insights, and they are hidden in the data inside and outside the enterprise. Data is everywhere: in databases, e-mails, Web sites, and many more places.
Faced with growing knowledge management needs, enterprises increasingly recognize the importance of seamlessly integrating business information distributed across both structured and unstructured data sources in order to gain critical insights into customer trends, fraud, and market trends.
Existing solutions typically integrate structured and unstructured information by providing a single point of access to both kinds of data source. Applications must still retrieve the needed structured data themselves and identify a set of keywords for retrieving the related unstructured data. However, identifying an appropriate set of keywords related to a query is difficult, error-prone, and often impractical for the user.
IBM® Context-Oriented Information Retrieval, using a smart contextual keyword identification mechanism, automatically associates unstructured content with the results of an SQL query, thereby eliminating the need for the application to specify a set of keywords. This technology can work with an existing product or technology that interfaces with any relational database application (such as customer relationship management (CRM), backend business intelligence applications, and online banking portals) and, with minimal customization, extend it to retrieve relevant documents from an internal or external unstructured source such as a Web site or e-mail store.
How does it work?
This technology automatically associates related unstructured content with an SQL query result. It automatically computes the context of an SQL query from the result of the query and determines the relevant tables to explore in the database without any help from the user. The computed context is then used to retrieve the relevant unstructured content to be associated with the SQL query result.
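The idea of deriving keywords from a query result can be illustrated with a minimal Python sketch. It is not the IBM algorithm: it only counts terms in the text cells of a result set, and it does not explore related tables as the technology is described as doing. The `complaints` table, the sample rows, the stopword list, and the function names are all hypothetical.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "on", "at", "of", "to", "in"}


def contextual_keywords(rows, top_n=5):
    """Rank terms found in the text cells of a query result by frequency."""
    counts = Counter()
    for row in rows:
        for cell in row:
            if not isinstance(cell, str):
                continue  # skip numeric and NULL cells
            for term in re.findall(r"[a-z]+", cell.lower()):
                if len(term) > 2 and term not in STOPWORDS:
                    counts[term] += 1
    return [term for term, _ in counts.most_common(top_n)]


def demo_keywords():
    """Run a sample SQL query on an in-memory database and derive keywords."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE complaints (customer TEXT, product TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO complaints VALUES (?, ?, ?)",
        [
            ("Asha", "credit card", "disputed charge on credit card"),
            ("Ravi", "credit card", "card declined at merchant"),
            ("Mei", "mortgage", "question on mortgage rate"),
        ],
    )
    rows = conn.execute(
        "SELECT product, note FROM complaints WHERE product = 'credit card'"
    ).fetchall()
    conn.close()
    return contextual_keywords(rows)


if __name__ == "__main__":
    print(demo_keywords())
```

In this sketch the application supplies only the SQL query; the keywords come from the result itself, which is the property the technology aims for.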
The technology can be used in two ways:
- A servlet-based demonstration: This demonstration shows the use of the technology on a sample relational database and a set of unstructured documents. The user runs an SQL query against the relational database; the demonstration derives appropriate keywords from the query result and lets the user query the unstructured data using these keywords.
- Command line interface (CLI): The CLI lets the user specify a query on any database and returns the set of contextual keywords that are relevant to the SQL query result. The source code of the CLI is also provided, so the user can see how the technology can be accessed as an API.
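The second step described in both modes, using contextual keywords to find related unstructured content, might look like the following Python sketch. The keyword list stands in for whatever the CLI would return, and the document names, their text, and the scoring rule (count of distinct keywords present) are assumptions made for illustration, not part of the IBM technology.

```python
import re

# Hypothetical unstructured sources, keyed by document name.
SAMPLE_DOCS = {
    "email-001.txt": "Customer reports a disputed charge on her credit card statement.",
    "faq.html": "How to apply for a mortgage.",
    "email-002.txt": "Card was declined twice.",
}


def rank_documents(keywords, documents):
    """Return names of documents containing at least one keyword, best match first."""
    scored = []
    for name, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        score = sum(1 for kw in keywords if kw in words)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken by document name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    # Keywords hard-coded here; in practice they would be derived from an SQL result.
    print(rank_documents(["card", "credit", "disputed"], SAMPLE_DOCS))
```

A production system would more likely hand the keywords to a full-text search engine than scan documents directly; the scan is used here only to keep the sketch self-contained.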
About the technology author(s)

Ajay Gupta is a technical staff member in the Knowledge and Information Management group at IBM India Research Lab. He received his B.Tech. in computer science from the Indian Institute of Technology in Guwahati, India. Mr. Gupta's interests include databases and data integration.

Manish Bhide is a research staff member in the Information Management group at IBM India Research Lab. He received his M.Tech. in computer science and engineering from the Indian Institute of Technology in Bombay, India, in 2002. Mr. Bhide has more than five years of research and development experience in the domains of unstructured information management, data management, and autonomic computing applied to databases. He holds two patents and has six more pending in fields such as data management, policies for autonomic computing, and information retrieval. Mr. Bhide has published more than 15 papers in leading journals and conferences such as IEEE Transactions on Computers, SIGMOD, ICDE, ICAC, and ESORICS in the areas of data management, policies, and access control.

Mukesh Mohania received his Ph.D. in computer science and engineering from the Indian Institute of Technology in Bombay, India, in 1995. He was a faculty member at the University of South Australia and Western Michigan University from 1995 to 2001. Dr. Mohania was also associated with Kyoto University and Purdue University as a Senior Research Fellow from 1996 to 2001. Currently, he is a senior manager at IBM India Research Lab, where he leads the information and interaction team. He has worked extensively in the areas of rule processing in distributed databases, data warehousing, semi-structured and unstructured databases, information integration, data mining, and autonomic computing. Dr. Mohania is an IEEE Distinguished Speaker and an IEEE senior member.
