A central issue in text mining of articles is categorization, since a newspaper or magazine contains many different kinds of news. Manual categorization needs readers, and among them experts who understand the topics well enough to assign each article to a category. This manual work increases processing time and decreases efficiency. To address this, many researchers are working on automatic categorization that matches the quality of manual reading while reducing the time needed to classify an article. In practice, a dictionary is usually maintained for each category, but that dictionary still has to be checked and proofed manually. The process is therefore only partially automatic and needs regular updates. There is no direct algorithm or fixed set of steps that gives accurate classification results.
Particle swarm optimization (PSO) was originally developed by Eberhart and Kennedy in 1995 and was inspired by the social behavior of a flock of birds. In the PSO algorithm, the birds in a flock are symbolically represented as particles. These particles can be considered as simple agents “flying” through a problem space. A particle’s location in the multi-dimensional problem space represents one solution to the problem. When a particle moves to a new location, a different solution is generated. Each solution is evaluated by a fitness function that provides a quantitative value of the solution’s utility.
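The idea that a particle’s location is one candidate solution scored by a fitness function can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes a particle encodes a set of cluster centroids, that documents are term-weight vectors, and that fitness is the mean cosine similarity between each document and its most similar centroid. The names `Particle`, `cosine_similarity`, and `fitness` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Particle:
    # Assumption: the particle's location is a list of centroid vectors,
    # so one particle is one candidate clustering of the documents.
    centroids: List[Vector]


def fitness(particle: Particle, documents: List[Vector]) -> float:
    """Mean similarity of each document to its most similar centroid (higher is better)."""
    best = [
        max(cosine_similarity(doc, c) for c in particle.centroids)
        for doc in documents
    ]
    return sum(best) / len(best)


# Toy term-weight vectors (illustrative data only).
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
good = Particle(centroids=[[1.0, 0.0], [0.0, 1.0]])
poor = Particle(centroids=[[1.0, 1.0], [1.0, 1.0]])
print(fitness(good, docs) > fitness(poor, docs))
```

Moving a particle changes its centroids, which changes the score, so the swarm can compare locations purely through the fitness value.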
Update Velocity and Position
The matrix D contains the distance from each document to each centroid; the maximum similarity is then found, which identifies the best possible solution. The velocity and direction of each particle along each dimension of the problem space are altered with each generation of movement. In combination, the particle’s personal experience, Pid, and its neighbors’ experience, Pgd, influence the movement of each particle through the problem space. The random values rand1 and rand2 are used for the sake of completeness, that is, to make sure that particles explore a wide search space before converging around the optimal solution. The values of c1 and c2 control the weight balance of Pid and Pgd in deciding the particle’s next movement velocity. At every generation, the particle’s new location is computed by adding the particle’s current velocity, vid, to its location, xid. Mathematically, given a multi-dimensional problem space, the ith particle changes its velocity and location according to the following equations:

vid = w * vid + c1 * rand1 * (Pid - xid) + c2 * rand2 * (Pgd - xid)
xid = xid + vid

where w is the inertia weight factor.

The whole clustering behavior of the PSO clustering algorithm can be divided into two stages: a global searching stage and a local refining stage. In the initial iterations, the particle’s initial velocity vid, the two randomly generated values (rand1, rand2) at each generation, and the inertia weight factor w in the velocity updating equation provide the necessary diversity to the particle swarm: they change the momentum of the particles so that they do not stagnate at local optima. These initial iterations form the global searching stage. After several iterations, the particle’s velocity gradually decreases and the area it explores shrinks as the particle approaches the optimal solution. The global searching stage thus gradually changes into the local refining stage.
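The velocity and position update described above can be sketched in a few lines. This is a hedged illustration of the standard inertia-weight PSO update (new velocity = w * vid + c1 * rand1 * (Pid - xid) + c2 * rand2 * (Pgd - xid), new location = xid + new velocity), not the authors’ code. The function names, the toy fitness, the search range, and the parameter values w = 0.72 and c1 = c2 = 1.49 are assumptions chosen for the example.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def update_particle(x: Vector, v: Vector, p_best: Vector, g_best: Vector,
                    w: float, c1: float, c2: float,
                    rng: random.Random) -> Tuple[Vector, Vector]:
    """Apply one velocity and position update to a single particle."""
    new_x, new_v = [], []
    for d in range(len(x)):
        rand1, rand2 = rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * rand1 * (p_best[d] - x[d])   # pull toward personal best (Pid)
              + c2 * rand2 * (g_best[d] - x[d]))  # pull toward swarm best (Pgd)
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v


def pso(fitness: Callable[[Vector], float], dim: int, n_particles: int = 20,
        iterations: int = 100, w: float = 0.72, c1: float = 1.49,
        c2: float = 1.49, seed: int = 0) -> Tuple[Vector, float]:
    """Maximize `fitness` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g][:], p_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            xs[i], vs[i] = update_particle(xs[i], vs[i], p_best[i], g_best,
                                           w, c1, c2, rng)
            f = fitness(xs[i])
            if f > p_fit[i]:              # update personal experience
                p_best[i], p_fit[i] = xs[i][:], f
                if f > g_fit:             # update the swarm's best experience
                    g_best, g_fit = xs[i][:], f
    return g_best, g_fit


def toy_fitness(x: Vector) -> float:
    """Illustrative fitness with its maximum (0.0) at the point (1, 1, ...)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


best_x, best_f = pso(toy_fitness, dim=2)
print(best_x, best_f)
```

In a document-clustering setting the toy fitness would be replaced by a similarity-based score over the documents. Early iterations move particles widely (global searching), and as personal and swarm bests converge the velocity terms shrink, which corresponds to the local refining stage.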
The proposed work aims to improve classification accuracy by using a genetic algorithm. In some previous work, document classification relies on prior information about the content provider. This work overcomes that limitation by classifying the whole set of documents without any background information. A genetic algorithm is proposed that classifies text documents efficiently. Particle swarm optimization, a population-based approach in the same family as genetic algorithms, is used as the learning algorithm for classification. The proposed approach classifies the data on the basis of term features. Once the documents have been classified, documents are retrieved according to a text query.