Short course (Sunday, July 1, 2007)

Models for repeated discrete data



Names and addresses of presenters:

Geert Verbeke, Biostatistical Centre, K.U.Leuven

Kapucijnenvoer 35, B-3000 Leuven, Belgium

Email: geert.verbeke@med.kuleuven.be

Tel: +32-16-336891, Sec: +32-16-336892, Fax: +32-16-337015

Web: http://www.kuleuven.ac.be/biostat

Geert Molenberghs, Center for Statistics, Limburgs Universitair Centrum

Universitaire Campus, Building D, B-3590 Diepenbeek, Belgium

Email: geert.molenberghs@uhasselt.be

Tel: +32-11-268238, Sec: +32-11-268202, Fax: +32-11-268299

Web: http://www.censtat.uhasselt.be

Abstract of the course

Starting from a brief introduction to the linear mixed model for continuous longitudinal data, extensions will be formulated to model outcomes of a categorical nature, including counts and binary data. Based on Verbeke and Molenberghs (2005), several families of models will be discussed and compared from both an interpretational and a computational point of view. First, models will be discussed for the full marginal distribution of the outcome vector. These allow model fitting to be based on maximum likelihood principles, which immediately provides inferential tools for all parameters in the models. The main disadvantage of such models is that they require complete specification of all higher-order interactions, which often rests on unrealistic assumptions and frequently leads to computational problems, especially in examples with many repeated measurements per subject. Therefore, alternatives have been formulated in the statistical literature. Following the reasoning of linear mixed models, a full marginal model can be obtained from a random-effects approach, in which the association between repeated measurements on the same subject is assumed to be generated by underlying, unobserved random effects. Alternatively, semiparametric methods can be used that no longer require full specification of the likelihood, but only of the first moments, or of the first and second moments. This leads to the so-called generalized estimating equations (GEE). For both approaches, estimation and inference will be discussed and illustrated in full detail, and it will be argued extensively that the two approaches yield parameters with completely different interpretations. The advantages and disadvantages of each will also be compared. Finally, when analysing longitudinal data, one is often confronted with missing observations, i.e., scheduled measurements that were not made, for a variety of (known or unknown) reasons.
It will be shown that, if no appropriate measures are taken, missing data can seriously bias results and cause interpretational difficulties. Methods to properly analyse incomplete data under flexible assumptions will be presented, and key concepts of sensitivity analysis will be introduced. Without placing too much emphasis on software, examples will show how the different approaches can be implemented in the SAS software package. Throughout the course, participants are assumed to be familiar with basic statistical modelling, including linear models (regression and analysis of variance) and generalized linear models (logistic and Poisson regression). Prerequisite knowledge should also include general estimation and testing theory (maximum likelihood, likelihood ratio tests).

Outline

Four sessions of 1.5 hours are planned as follows:

  1. Session 1 (9:00-10:30) (Geert Verbeke): The linear mixed model

    • Introduction: Examples of longitudinal studies; Cross-sectional versus longitudinal studies; Merits of longitudinal studies
    • Introduction of linear mixed models
    • Inference based on the marginal model
    • Inference for random effects: Empirical Bayes estimates

  2. Session 2 (11:00-12:30) (Geert Molenberghs): Models for repeated categorical data

    • Key examples
    • Introduction to generalized linear models
    • Introduction to modelling frameworks: marginal, conditional, and random-effects models
    • Likelihood-based marginal models
    • Non-likelihood marginal modelling
    • The concepts of generalized estimating equations (GEE), alternating logistic regression, and pseudo-likelihood

  3. Session 3 (14:30-16:00) (Geert Verbeke): Generalized linear mixed models and generalized estimating equations

    • Generalized linear models with random effects
    • The marginal likelihood; (adaptive) Gaussian quadrature
    • More on marginal models with only first-order moments specified: GEE
    • Working correlation
    • Model-based and empirically corrected standard errors
    • Parameter interpretation and relation to generalized linear mixed models

  4. Session 4 (16:15-17:45) (Geert Molenberghs): Missing data

    • Missing values in longitudinal data: Examples
    • The impact on interpretation, efficiency and inference
    • Some simple, naive solutions
    • Direct likelihood, EM, and multiple imputation
    • Introduction to selection models and pattern-mixture models
    • Sensitivity analysis
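
To make two of the Session 3 topics more concrete (Gaussian quadrature for the marginal likelihood, and the relation between random-effects and marginal parameters), here is a minimal Python sketch. It is not taken from the course materials, which use SAS. It assumes a hypothetical random-intercept logistic model, logit P(Y = 1 | x, b) = β0 + β1 x + b with b ~ N(0, σ²), integrates b out with plain (non-adaptive) Gauss-Hermite quadrature, and returns the population-averaged log odds ratio for x = 1 versus x = 0. The function names, coefficient values, and the 20-node rule are illustrative choices. The closed-form comparison uses the approximation β_marginal ≈ β1 / sqrt(1 + c²σ²) with c = 16√3/(15π), commonly attributed to Zeger, Liang and Albert (1988).

```python
import math

def gauss_hermite(n):
    """Nodes x_i and weights w_i with sum(w_i * f(x_i)) ~ integral of f(x) * exp(-x^2) dx."""
    def hermite(x):
        # Orthonormal Hermite recurrence; returns (p_n(x), p_{n-1}(x)).
        p_prev, p = 0.0, math.pi ** -0.25
        for j in range(1, n + 1):
            p_prev, p = p, math.sqrt(2.0 / j) * x * p - math.sqrt((j - 1) / j) * p_prev
        return p, p_prev

    # Locate the n roots of p_n by scanning for sign changes, then bisecting each bracket.
    half_width = math.sqrt(2 * n + 1) + 1.0  # all roots lie inside (-half_width, half_width)
    step, nodes = 1e-2, []
    a, fa = -half_width, hermite(-half_width)[0]
    while a < half_width:
        b = a + step
        fb = hermite(b)[0]
        if fa == 0.0:
            nodes.append(a)
        elif fa * fb < 0.0:
            lo, hi, flo = a, b, fa
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fmid = hermite(mid)[0]
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    # Weights for the orthonormal recurrence: w_i = 1 / (n * p_{n-1}(x_i)^2).
    weights = [1.0 / (n * hermite(x)[1] ** 2) for x in nodes]
    return nodes, weights

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, nodes, weights):
    """P(Y=1) = E[expit(eta + b)] with b ~ N(0, sigma^2), using the substitution b = sqrt(2)*sigma*x."""
    s = sum(w * expit(eta + math.sqrt(2.0) * sigma * x) for x, w in zip(nodes, weights))
    return s / math.sqrt(math.pi)

def marginal_log_or(beta0, beta1, sigma, n_nodes=20):
    """Population-averaged log odds ratio (x=1 vs x=0) implied by the random-intercept model."""
    nodes, weights = gauss_hermite(n_nodes)
    p0 = marginal_prob(beta0, sigma, nodes, weights)
    p1 = marginal_prob(beta0 + beta1, sigma, nodes, weights)
    return logit(p1) - logit(p0)

if __name__ == "__main__":
    c2 = (16 * math.sqrt(3) / (15 * math.pi)) ** 2  # squared Zeger-Liang-Albert constant
    for sigma in (0.0, 1.0, 2.0, 4.0):
        numeric = marginal_log_or(-1.0, 1.0, sigma)  # hypothetical beta0 = -1, beta1 = 1
        approx = 1.0 / math.sqrt(1.0 + c2 * sigma ** 2)
        print(f"sigma={sigma:.1f}  marginal log-OR={numeric:.3f}  approximation={approx:.3f}")
```

With σ = 0 the two parameters coincide; as σ grows, the marginal log odds ratio shrinks toward zero, so a GEE estimate and a mixed-model estimate of the same covariate effect target different quantities.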

The target audience includes applied statisticians and biomedical researchers in industry, public health organizations, contract research organizations, and academia.

Learning outcomes and instructional methods:

As a result of the course, participants should be able to perform a basic analysis of a given longitudinal data set. Based on a selection of exploratory tools, the nature of the data, and the research questions to be answered, they should be able to construct an appropriate statistical model, fit it within the SAS framework, and interpret the results. Further, participants should be aware not only of the possibilities and strengths of a selected approach, but also of its drawbacks in comparison with other methods.

The course will be explanatory rather than mathematically rigorous. The emphasis is on giving participants a general overview of frequently used approaches, with their advantages and disadvantages, while referring to other sources where more detailed information is available. The course will also explain in detail how the different approaches can be implemented in the SAS package and how the resulting output should be interpreted.

Presenters:

Geert Verbeke is Professor in Biostatistics at the Biostatistical Centre of the Katholieke Universiteit Leuven in Belgium. He wrote his dissertation, as well as a number of methodological papers, on various aspects of linear mixed models for longitudinal data.

Geert Molenberghs is Professor in Biostatistics at the Limburgs Universitair Centrum in Belgium. He has published methodological work on repeated categorical data and on the analysis of nonresponse in clinical and epidemiological studies.

Both presenters are editors and authors of three books on the analysis of longitudinal data (Springer Lecture Notes 1997, Springer Series in Statistics 2000, Springer Series in Statistics 2005), and they have taught several (short) courses on the topic at universities as well as in industry. They received the 2002 and 2004 CE awards for courses taught at the Joint Statistical Meetings in New York and Toronto.

 
This file last modified 06/26/07