CLASSIFICATION METHOD
Classification techniques in data mining can process large amounts of data. Classification predicts categorical class labels: a model is built from a training set whose class labels are known, and that model is then used to classify newly available data. In its broadest sense, the term covers any context in which a decision or forecast is made on the basis of currently available information, and a classification procedure is a formal method for repeatedly making such decisions in new situations. Here we assume that the problem concerns the construction of a procedure that will be applied to a continuing sequence of cases, in which each new case must be assigned to one of a set of predefined classes on the basis of its observed features. Constructing a classification procedure from a set of data for which the true classes are known in advance is called pattern recognition or supervised learning.
Contexts in which a classification task is fundamental include, for example, assigning individuals to a credit status on the basis of financial and other personal information, and making the initial diagnosis of a patient's disease in order to select immediate treatment while awaiting definitive test results. Some of the most critical problems arising in science, industry and commerce can be regarded as classification or decision problems.
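The supervised-learning setup described above (learn a procedure from cases whose classes are known, then assign each new case to one of the predefined classes) can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from this post: it uses a nearest-centroid rule chosen for brevity, and the credit data, the feature choice (income and debt) and the function names `fit_centroids` and `classify` are all hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(features, labels):
    """Learn one mean feature vector (centroid) per class from labeled cases."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[y][i] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(centroids, x):
    """Assign a new case to the class whose centroid is closest."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical training set: (income, existing debt), both in thousands.
train_x = [(60, 5), (75, 10), (80, 2), (20, 15), (25, 30), (30, 22)]
train_y = ["good", "good", "good", "poor", "poor", "poor"]

model = fit_centroids(train_x, train_y)   # training: classes known in advance
new_applicant = (70, 8)                   # a new, unlabeled case
prediction = classify(model, new_applicant)
print(prediction)
```

The two functions mirror the two phases in the text: `fit_centroids` builds the procedure from labeled data, and `classify` applies it to each new case as it arrives.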
The Algorithms:
- 1. C4.5
- 2. k-means
- 3. Support vector machines
- 4. Apriori
- 5. EM
- 6. PageRank
- 7. AdaBoost
- 8. kNN
- 9. Naive Bayes
- 10. CART
- C4.5: constructs a classifier in the form of a decision tree. To do this, C4.5 is given a set of data representing things that are already classified. A classifier is a data mining tool that takes data representing things we want to classify and attempts to predict which class the new data belongs to.
- K-means: creates k groups from a set of objects so that the members of a group are more similar to each other than to members of other groups. It is a popular cluster analysis technique for exploring a dataset. It can be used to pre-cluster a massive dataset, followed by a more expensive cluster analysis on the sub-clusters. k-means can also be used to rapidly "play" with k and explore whether there are overlooked patterns or relationships in the dataset.
- SVM: learns a hyperplane to classify data into 2 classes. At a high level, SVM performs a task similar to C4.5, except that SVM does not use decision trees at all.
- Apriori: learns association rules and is applied to a database containing a large number of transactions. Association rule learning is a data mining technique for learning correlations and relations among variables in a database.
- EM: expectation-maximization (EM) is generally used as a clustering algorithm for knowledge discovery. The EM algorithm iterates to optimize the likelihood of the observed data while estimating the parameters of a statistical model with unobserved variables.
- PageRank: a link analysis algorithm designed to determine the relative importance of an object linked within a network of objects.
- AdaBoost: a boosting algorithm which constructs a classifier. Boosting is an ensemble learning technique which takes multiple learning algorithms (e.g. decision trees) and combines them. The goal is to take an ensemble, or group, of weak learners and combine them to create a single strong learner.
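- kNN: k-Nearest Neighbors is a classification algorithm. However, it differs from the classifiers described above because it is a lazy learner: it does nothing at training time except store the training data, and does the real work only when a new case has to be classified.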
- Naive Bayes: not a single algorithm, but a family of classification algorithms that share one common assumption: every feature of the data being classified is independent of all other features, given the class. Naive Bayes involves simple arithmetic. It is just tallying up counts, multiplying and dividing.
- CART: classification and regression trees. It is a decision tree learning technique that outputs either classification or regression trees. Like C4.5, CART is a classifier. CART is a supervised learning technique, since it is given a labeled training dataset in order to construct the classification or regression tree model.
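The remark that Naive Bayes is "just tallying up counts, multiplying and dividing" can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the weather-style data set, the categorical features and the function names `train_naive_bayes` and `predict_naive_bayes` are hypothetical, and no smoothing is applied, so a feature value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Tally class counts and per-class counts of each feature value."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict_naive_bayes(model, row):
    """Return (best label, unnormalized score of every label)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior: P(class)
        for i, value in enumerate(row):
            # P(feature value | class); features are treated as independent
            score *= value_counts[(i, label)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (outlook, temperature) -> play outside?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"),
    ("sunny", "cool"), ("rainy", "mild"), ("rainy", "cool"),
    ("overcast", "mild"), ("overcast", "hot"),
]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

nb_model = train_naive_bayes(rows, labels)
label_a, scores_a = predict_naive_bayes(nb_model, ("sunny", "hot"))
label_b, scores_b = predict_naive_bayes(nb_model, ("rainy", "mild"))
print(label_a, label_b)
```

Training is nothing more than counting, and prediction multiplies the class prior by one count ratio per feature. Practical implementations usually add smoothing and work with logarithms to avoid zero and underflowing products.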
Advantages of a Classification Model:
- Predictive accuracy (hit rate)
- Speed (model building; predicting)
- Robustness
- Scalability
- Interpretability (transparency, explainability)
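Two entries in the list above, predictive accuracy (hit rate) and speed, can be measured directly. The sketch below shows one plausible way to do so in Python; the helper names `accuracy` and `timed`, the stand-in `threshold_classifier` and the test data are all hypothetical and are not taken from any particular library.

```python
import time


def accuracy(predicted, actual):
    """Fraction of cases whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def threshold_classifier(values):
    """Hypothetical stand-in model: label a number by a fixed threshold."""
    return ["high" if v >= 50 else "low" for v in values]


# Hypothetical held-out test cases with their true labels.
test_values = [10, 45, 50, 70, 90]
true_labels = ["low", "low", "low", "high", "high"]

predictions, seconds = timed(threshold_classifier, test_values)
hit_rate = accuracy(predictions, true_labels)
print(f"hit rate: {hit_rate:.2f}, prediction time: {seconds:.6f} s")
```

The same `timed` wrapper can be applied to the model-building step, which gives the two speed figures (model building and predicting) separately.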
Conclusion
Classification methods are typically strong at modeling interactions. Each of these methods suits different situations: one may be useful where another is not. These classification algorithms can be applied to many types of data sets, such as share market data, patient data and financial data. These classification techniques show how data can be labeled and grouped when a new set of data becomes available.