Data Management Using Stata

A Practical Handbook

By Michael N. Mitchell

Stata Press – 2010 – 387 pages

Purchasing Options:

  • Paperback: $69.95
    978-1-59718-076-4
    May 23rd 2010

Description

Using simple language and illustrative examples, this book comprehensively covers data management tasks that bridge the gap between raw data and statistical analysis. Rather than focus on clusters of commands, the author takes a modular approach that enables readers to quickly identify and implement the necessary task without having to access background information first. Each section in the chapters presents a self-contained lesson that illustrates a particular data management task via examples, such as creating data variables and automating error checking. The text also discusses common pitfalls and how to avoid them and provides strategic data management advice. Ideal for both beginning statisticians and experienced users, this handy book helps readers solve problems and learn comprehensive data management skills.

Reviews

The author uses a "learning by example" approach in the book. Overall this works well …

—Morteza Marzjarani, The American Statistician, November 2011

Contents

Introduction

Using this book

Overview of this book

Listing observations in this book

Reading and Writing Datasets

Introduction

Reading Stata datasets

Saving Stata datasets

Reading comma-separated and tab-separated files

Reading space-separated files

Reading fixed-column files

Reading fixed-column files with multiple lines of raw data per observation

Reading SAS XPORT files

Common errors reading files

Entering data directly into the Stata Data Editor

Saving comma-separated and tab-separated files

Saving space-separated files

Saving SAS XPORT files

Data Cleaning

Introduction

Double data entry

Checking individual variables

Checking categorical by categorical variables

Checking categorical by continuous variables

Checking continuous by continuous variables

Correcting errors in data

Identifying duplicates

Final thoughts on data cleaning

Labeling Datasets

Introduction

Describing datasets

Labeling variables

Labeling values

Labeling utilities

Labeling variables and values in different languages

Adding comments to your dataset using notes

Formatting the display of variables

Changing the order of variables in a dataset

Creating Variables

Introduction

Creating and changing variables

Numeric expressions and functions

String expressions and functions

Recoding

Coding missing values

Dummy variables

Date variables

Date-and-time variables

Computations across variables

Computations across observations

More examples using the egen command

Converting string variables to numeric variables

Converting numeric variables to string variables

Renaming and ordering variables

Combining Datasets

Introduction

Appending: Appending datasets

Appending: Problems

Merging: One-to-one match-merging

Merging: One-to-many match-merging

Merging: Merging multiple datasets

Merging: Update merges

Merging: Additional options when merging datasets

Merging: Problems merging datasets

Joining datasets

Crossing datasets

Processing Observations across Subgroups

Introduction

Obtaining separate results for subgroups

Computing values separately by subgroups

Computing values within subgroups: Subscripting observations

Computing values within subgroups: Computations across observations

Computing values within subgroups: Running sums

Computing values within subgroups: More examples

Comparing the by and tsset commands

Changing the Shape of Your Data

Introduction

Wide and long datasets

Introduction to reshaping long to wide

Reshaping long to wide: Problems

Introduction to reshaping wide to long

Reshaping wide to long: Problems

Multilevel datasets

Collapsing datasets

Programming for Data Management

Introduction

Tips on long-term goals in data management

Executing do-files and making log files

Automating data checking

Combining do-files

Introducing Stata macros

Manipulating Stata macros

Repeating commands by looping over variables

Repeating commands by looping over numbers

Repeating commands by looping over anything

Accessing results saved from Stata commands

Saving results of estimation commands as data

Writing Stata programs

Additional Resources

Online resources for this book

Finding and installing additional programs

More online resources

Appendix: Common elements

Index

Author Bio

Michael N. Mitchell is a senior statistician in health services research. For 12 years, he worked in the Statistical Consulting Group of the UCLA Academic Technology Services.

Categories: Software Engineering & Systems Development, Mathematics & Statistics for Engineers, Statistics & Computing, Statistics