
Data Clustering in C++

An Object-Oriented Approach

By Guojun Gan

Chapman and Hall/CRC – 2011 – 520 pages

Series: Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Purchasing Options:

  • Hardback: $98.95
    978-1-43-986223-0
    March 28th 2011

Description

Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms.

Using object-oriented design and programming techniques, Data Clustering in C++ exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered.

This book is divided into three parts:

  • Data Clustering and C++ Preliminaries: A review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns
  • A C++ Data Clustering Framework: The development of data clustering base classes
  • Data Clustering Algorithms: The implementation of several popular data clustering algorithms

A key to learning a clustering algorithm is to implement it and experiment with it. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as on the book's CD-ROM. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.

Contents

Dedication

Preface

List of Tables

List of Figures

I Data Clustering and C++ Preliminaries

Introduction to Data Clustering

Data Clustering

Data Types

Dissimilarity and Similarity Measures

Hierarchical Clustering Algorithms

Partitional Clustering Algorithms

Cluster Validity

Clustering Applications

Literature of Clustering Algorithms

Summary

The Unified Modeling Language

Package Diagrams

Class Diagrams

Use Case Diagrams

Activity Diagrams

Notes

Summary

Object-Oriented Programming and C++

Object-Oriented Programming

The C++ Programming Language

Encapsulation

Inheritance

Polymorphism

Exception Handling

Summary

Design Patterns

Singleton

Composite

Prototype

Strategy

Template Method

Visitor

Summary

C++ Libraries and Tools

The Standard Template Library

Boost C++ Libraries

GNU Build System

Cygwin

Summary

II A C++ Data Clustering Framework

The Clustering Library

Directory Structure and Filenames

Specification Files

Macros and typedef Declarations

Error Handling

Unit Testing

Compilation and Installation

Summary

Datasets

Attributes

Records

Datasets

A Dataset Example

Summary

Clusters

Clusters

Partitional Clustering

Hierarchical Clustering

Summary

Dissimilarity Measures

The Distance Base Class

Minkowski Distance

Euclidean Distance

Simple Matching Distance

Mixed Distance

Mahalanobis Distance

Summary

Clustering Algorithms

Arguments

Results

Algorithms

A Dummy Clustering Algorithm

Summary

Utility Classes

The Container Class

The Double-Key Map Class

The Dataset Adapters

The Node Visitors

The Dendrogram Class

The Dendrogram Visitor

Summary

III Data Clustering Algorithms

Agglomerative Hierarchical Algorithms

Description of the Algorithm

Implementation

Examples

Summary

DIANA

Description of the Algorithm

Implementation

Examples

Summary

The k-means Algorithm

Description of the Algorithm

Implementation

Examples

Summary

The c-means Algorithm

Description of the Algorithm

Implementation

Examples

Summary

The k-prototypes Algorithm

Description of the Algorithm

Implementation

Examples

Summary

The Genetic k-modes Algorithm

Description of the Algorithm

Implementation

Examples

Summary

The FSC Algorithm

Description of the Algorithm

Implementation

Examples

Summary

The Gaussian Mixture Algorithm

Description of the Algorithm

Implementation

Examples

Summary

A Parallel k-means Algorithm

Message Passing Interface

Description of the Algorithm

Implementation

Examples

Summary

A Exercises and Projects

B Listings

C Software

Bibliography

Author Index

Subject Index

Author Bio

Guojun Gan, Manulife Financial, Toronto, Canada

Categories: Statistics & Computing, Statistical Computing