A good example of data mining is the analysis of transaction details held in relational databases, such as credit card payments or debit card (PIN) transactions. Data mining is better known than text mining. Such transactions may carry additional information: date, location, age of the card holder, salary, and so on. With this information, patterns of interest or behavior can be determined. On the other hand, we found that around 90% of information is unstructured, and the share of unstructured information is increasing daily. A raw unstructured text collection contains very little structured information. Most of the information that end users work with daily takes the form of e-mails, text documents, and multimedia files such as video, speech, and photos. Database or data mining methods cannot be used to find insights in or analyze this information, because such methods work only on structured information. A framework based on text mining techniques is therefore presented to gather technological intelligence in support of R&D management.
Preprocessing is the process of converting a document into a feature vector. As with text categorization, there is no agreement on how preprocessing should be divided into steps. This work uses text preprocessing to remove the words that lower the performance of learning models.
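As an illustration of turning a document into a feature vector, the following is a minimal sketch that uses plain term counts over a shared vocabulary. The function names (`tokenize`, `build_vocabulary`, `to_feature_vector`) and the choice of raw counts are assumptions made for this example; the project's own code may weight or select features differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words (illustrative tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct term in the collection to a fixed vector position."""
    terms = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {term: index for index, term in enumerate(terms)}


def to_feature_vector(document, vocabulary):
    """Return a term-count vector for one document (assumed representation)."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:  # terms outside the vocabulary are ignored
            vector[vocabulary[term]] = count
    return vector


if __name__ == "__main__":
    docs = ["text mining finds patterns", "data mining uses structured data"]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(to_feature_vector(doc, vocab))
```

Every document is mapped onto the same vocabulary positions, so the resulting vectors have equal length and can be passed to a learning model.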
Data preprocessing significantly reduces the size of the input text documents. It involves activities such as sentence boundary determination, language-specific stop-word elimination, and stemming. Stop words are function words that occur frequently in the language of the text (for example, a, the, an, and of in English), so they are not useful for classification. The whole document is read and all of its words are placed in the vector.
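The preprocessing activities named above can be sketched as follows. This is a simplified illustration, not the project's implementation: the stop-word list is a small assumed sample, sentence boundaries are found with a naive punctuation rule, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "are", "to"}


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def remove_stop_words(tokens):
    """Drop frequent function words that carry little value for classification."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Read the whole document and collect its remaining stemmed words in one list."""
    words = []
    for sentence in split_sentences(text):
        tokens = remove_stop_words(tokenize(sentence))
        words.extend(naive_stem(tok) for tok in tokens)
    return words


if __name__ == "__main__":
    print(preprocess("The miner is mining the documents. Patterns are extracted!"))
```

Stop-word removal runs before stemming so that the lookup is made against the original word forms. The crude stemmer can over-strip (for example, it reduces "mining" to "min"), which is one reason a proper stemming algorithm is normally preferred.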
| IEEE Base paper | |
| Doc | Complete project Word document |
| Read me | Complete read-me text file |
| Source Code | Complete code files |