# Data Clustering in C++

## An Object-Oriented Approach

#### By **Guojun Gan**

Chapman and Hall/CRC – 2011 – 520 pages

**Series:** Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms.

Using object-oriented design and programming techniques, **Data Clustering in C++** exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered.
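As a rough illustration of what "exploiting commonalities" can look like in code, the sketch below combines two of the design patterns the book reviews, Strategy and Template Method. It is not taken from the book: the names `Point`, `Distance`, `EuclideanDistance`, `Algorithm`, and `NearestCenter` are hypothetical, and the code assumes a C++11 compiler rather than the book's actual class library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical record type: one numeric vector per object.
using Point = std::vector<double>;

// Strategy: any dissimilarity measure can be plugged in behind this interface.
class Distance {
public:
    virtual ~Distance() = default;
    virtual double operator()(const Point& a, const Point& b) const = 0;
};

class EuclideanDistance : public Distance {
public:
    double operator()(const Point& a, const Point& b) const override {
        assert(a.size() == b.size());  // both records need the same dimension
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            sum += d * d;
        }
        return std::sqrt(sum);
    }
};

// Template Method: the public entry point is shared by all algorithms,
// and only the private run() step differs between them.
class Algorithm {
public:
    virtual ~Algorithm() = default;
    std::vector<std::size_t> clusterize(const std::vector<Point>& data) const {
        std::vector<std::size_t> labels = run(data);
        assert(labels.size() == data.size());  // one label per record
        return labels;
    }
private:
    virtual std::vector<std::size_t> run(const std::vector<Point>& data) const = 0;
};

// A deliberately trivial algorithm: assign each record to its nearest fixed center.
class NearestCenter : public Algorithm {
public:
    NearestCenter(std::vector<Point> centers, const Distance& dist)
        : centers_(std::move(centers)), dist_(dist) {}
private:
    std::vector<std::size_t> run(const std::vector<Point>& data) const override {
        std::vector<std::size_t> labels;
        labels.reserve(data.size());
        for (const Point& x : data) {
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::infinity();
            for (std::size_t k = 0; k < centers_.size(); ++k) {
                const double d = dist_(x, centers_[k]);
                if (d < bestDist) { bestDist = d; best = k; }
            }
            labels.push_back(best);
        }
        return labels;
    }
    std::vector<Point> centers_;
    const Distance& dist_;  // caller keeps the distance object alive
};
```

In this arrangement, swapping the distance measure or adding a new algorithm means writing one small subclass, while the calling code keeps using `clusterize` unchanged.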

This book is divided into three parts:

- *Data Clustering and C++ Preliminaries:* A review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns
- *A C++ Data Clustering Framework:* The development of data clustering base classes
- *Data Clustering Algorithms:* The implementation of several popular data clustering algorithms

A key to learning a clustering algorithm is to implement it and experiment with it. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as on the book's CD-ROM. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.

Dedication

Preface

List of Tables

List of Figures

**I Data Clustering and C++ Preliminaries**

**Introduction to Data Clustering**

Data Clustering

Data Types

Dissimilarity and Similarity Measures

Hierarchical Clustering Algorithms

Partitional Clustering Algorithms

Cluster Validity

Clustering Applications

Literature of Clustering Algorithms

Summary

**The Unified Modeling Language**

Package Diagrams

Class Diagrams

Use Case Diagrams

Activity Diagrams

Notes

Summary

**Object-Oriented Programming and C++**

Object-Oriented Programming

The C++ Programming Language

Encapsulation

Inheritance

Polymorphism

Exception Handling

Summary

**Design Patterns**

Singleton

Composite

Prototype

Strategy

Template Method

Visitor

Summary

**C++ Libraries and Tools**

The Standard Template Library

Boost C++ Libraries

GNU Build System

Cygwin

Summary

**II A C++ Data Clustering Framework**

**The Clustering Library**

Directory Structure and Filenames

Specification Files

Macros and typedef Declarations

Error Handling

Unit Testing

Compilation and Installation

Summary

**Datasets**

Attributes

Records

Datasets

A Dataset Example

Summary

**Clusters**

Clusters

Partitional Clustering

Hierarchical Clustering

Summary

**Dissimilarity Measures**

The Distance Base Class

Minkowski Distance

Euclidean Distance

Simple Matching Distance

Mixed Distance

Mahalanobis Distance

Summary

**Clustering Algorithms**

Arguments

Results

Algorithms

A Dummy Clustering Algorithm

Summary

**Utility Classes**

The Container Class

The Double-key Map Class

The Dataset Adapters

The Node Visitors

The Dendrogram Class

The DendrogramVisitor

Summary

**III Data Clustering Algorithms**

**Agglomerative Hierarchical Algorithms**

Description of the Algorithm

Implementation

Examples

Summary

**DIANA**

Description of the Algorithm

Implementation

Examples

Summary

**The k-means Algorithm**

Description of the Algorithm

Implementation

Examples

Summary

**The c-means Algorithm**

Description of the Algorithm

Implementation

Examples

Summary

**The k-prototypes Algorithm**

Description of the Algorithm

Implementation

Examples

Summary

**The Genetic k-modes Algorithm**

Description of the Algorithm

Implementation

Examples

Summary

**The FSC Algorithm**

Description of the Algorithm

Implementation

Examples

Summary

**The Gaussian Mixture Algorithm**

Description of the Algorithm

Implementation

Examples

Summary

**A Parallel k-means Algorithm**

Message Passing Interface

Description of the Algorithm

Implementation

Examples

Summary

**A Exercises and Projects**

**B Listings**

**C Software**

**Bibliography**

**Author Index**

**Subject Index**

Guojun Gan, Manulife Financial, Toronto, Canada

Name: Data Clustering in C++: An Object-Oriented Approach (Hardback) – Chapman and Hall/CRC

Categories: Statistics & Computing, Data Preparation & Mining