Introduction to Data Technologies

By Paul Murrell

Series Editors: David Madigan, Fionn Murtagh, Padhraic Smyth

Chapman and Hall/CRC – 2009 – 418 pages

Series: Chapman & Hall/CRC Computer Science & Data Analysis

Purchasing Options:

  • Hardback: $83.95
    978-1-42-006517-6
    February 22nd 2009

Description

Providing key information on how to work with research data, Introduction to Data Technologies presents ideas and techniques for performing critical, behind-the-scenes tasks that take up so much time and effort yet typically receive little attention in formal education. With a focus on computational tools, the book shows readers how to improve their awareness of what tasks can be achieved and describes the correct approach to perform these tasks.

Practical examples demonstrate the most important points

The author first discusses how to write computer code using HTML as a concrete example. He then covers a variety of data storage topics, including different file formats, XML, and the structure and design issues of relational databases. After illustrating how to extract data from a relational database using SQL, the book presents tools and techniques for searching, sorting, tabulating, and manipulating data. It also introduces some very basic programming concepts as well as the R language for statistical computing. Each of these topics has supporting chapters that offer reference material on HTML, CSS, XML, DTD, SQL, R, and regular expressions.

One-stop shop of introductory computing information

Written by a member of the R Development Core Team, this resource shows readers how to apply data technologies to tasks within a research setting. Collecting material otherwise scattered across many books and the web, it explores how to publish information via the web, how to access information stored in different formats, and how to write small programs to automate simple, repetitive tasks.

Reviews

Paul Murrell, best known for his R Graphics book, has delivered a second masterpiece for people who have the difficult task to clean and prepare raw data for further use in common statistical software packages. … provides the perfect basis for a course on data literacy … Moreover, the book also is an excellent basis for advanced M.S. and Ph.D. students as well as practitioners in academia and industry who are confronted with the task to clean and preprocess their own or their colleagues’ data.

—Jürgen Symanzik, Technometrics, May 2011

Introduction to Data Technologies introduces various computer-related topics, including markup languages, statistical computing languages, coding, storage, and querying, in a systematic manner. … the book may serve as an introduction to readers with general interest who plan to supplement their knowledge in specific computer-related topics, in addition to R programming.

Journal of the American Statistical Association, Vol. 105, No. 492, December 2010

This is a very gentle book. It enables students and statisticians, particularly those just entering the profession, to begin to familiarize themselves with important concepts and tools from the world of databases … it is encouraging that such topics are finding their way into statistics courses at all. … I found the style of the book very engaging … It has the Paul Murrell light touch, first evident to me in his eminently readable and comprehensive book on R graphics. Like that one, the present book has interesting, occasionally slightly unusual examples and an easy and elegant writing style. The book does not hesitate to offer plain, direct advice in contexts in which other authors might simply let readers exercise their personal preferences. For students, particularly, I think this is a good thing. …

—Bill Venables, CSIRO, Australian & New Zealand Journal of Statistics, 2010

Contents

Introduction

  • Case Study: Point Nemo

Writing Computer Code

  • Case Study: Point Nemo (continued)
  • Syntax
  • Semantics
  • Writing Code
  • Checking Code
  • Running Code
  • The DRY Principle

HTML Reference

  • HTML Syntax
  • HTML Semantics

CSS Reference

  • CSS Syntax
  • CSS Semantics
  • Linking CSS to HTML
  • CSS Tips

Data Storage

  • Case Study: YBC 7289
  • Plain Text Formats
  • Binary Formats
  • Spreadsheets
  • XML
  • Databases

XML Reference

  • XML Syntax
  • Document Type Definitions

Data Queries

  • Case Study: The Data Expo (continued)
  • Querying Databases
  • Querying XML

SQL Reference

  • SQL Syntax
  • SQL Queries
  • Other SQL Commands

Data Processing

  • Case Study: The Population Clock
  • The R Environment
  • The R Language
  • Data Types and Data Structures
  • Subsetting
  • More on Data Structures
  • Data Import/Export
  • Data Manipulation
  • Text Processing
  • Data Display
  • Programming
  • Other Software

R Reference

  • R Syntax
  • Data Types and Data Structures
  • Functions
  • Getting Help
  • Packages
  • Searching for Functions

Regular Expressions Reference

  • Literals
  • Metacharacters

Conclusion

Attributions

Bibliography

Index

Further Reading appears at the end of each chapter.

Author Bio

Paul Murrell is a Senior Lecturer in the Department of Statistics at the University of Auckland, New Zealand. Author of the bestselling R Graphics (2006), he is also part of the development team for the R and Omegahat statistical computing projects. Dr. Murrell’s research interests include computational and graphical statistics.

Categories: Statistics for the Biological Sciences, Statistics & Computing, Databases, Data Preparation & Mining