Simulating multivariate normally distributed data in R

In my graduate class on path analysis, we do a lot of analysis on our own data. This year, I suggested that people consider analyzing simulated data based upon the statistics of their data. This way they’ll use a data set that looks like their data, but they aren’t doing a lot of model fitting on data they care about and want to use in real research. Thus, today I typed up a quick guide to simulating multivariate normal data in R for use in our class.
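
As a rough illustration of the idea (not the class guide itself), here is a minimal sketch that assumes the MASS package and uses made-up means, standard deviations, and correlations in place of the statistics of your own data. The variable names x1 to x3 and all the numbers are hypothetical.

```r
library(MASS)  # provides mvrnorm()

set.seed(548)  # arbitrary seed so the draw is reproducible

# Hypothetical summary statistics standing in for the statistics of your data
vars  <- c("x1", "x2", "x3")
means <- c(x1 = 10, x2 = 5, x3 = 0)
sds   <- c(2, 1.5, 1)
R     <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0),
                nrow = 3, dimnames = list(vars, vars))

# Turn correlations and SDs into a covariance matrix: Sigma = D R D
Sigma <- diag(sds) %*% R %*% diag(sds)
dimnames(Sigma) <- list(vars, vars)

# Draw n observations from a multivariate normal with these parameters
n   <- 300
sim <- as.data.frame(mvrnorm(n = n, mu = means, Sigma = Sigma))

# Compare the simulated sample with the targets
round(colMeans(sim), 2)
round(cov(sim), 2)
```

The sample means and covariances will differ a little from the targets because of sampling error. If you would rather have the simulated data reproduce the targets exactly, mvrnorm() has an empirical = TRUE argument for that.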

If you find typos, errors, etc., please let me know.

Significance testing, p-values, and confidence intervals

You have to enjoy the introduction of Sander Greenland, et al.’s article in the supplemental material posted with the American Statistical Association’s statement on p-values (full text here):

“Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so—and yet these misinterpretations dominate much of the scientific literature.” (p. 1, emphasis mine)

Working scientists should be able to handle this.

Should I pursue a Ph.D. in Quantitative Psychology?

Every Fall, I get emails that read something like this:

I’m writing to inquire about pursuing a Ph.D. in Quantitative Psychology at UW. I’ve always been interested in statistics (data analysis, measurement, etc.). I think that studying quantitative with you would prepare me very well to do research in [insert substantive area here].

I believe I understand the reasoning these students have. “If I learn really good stats., I’ll be able to do really cool research in social/memory/psychopathology/etc.” Here’s what I wrote a couple weeks ago to someone:

Thanks for your inquiry. You have the right person. However, before I talk about the program, let me ask you something about your email. You wrote that you are interested in stats, but your research interest is on learning and memory. That suggests to me that you should go to a cognitive program where you can focus your research on learning and memory. One of your criteria for choosing a grad. program then might be your ability to get good training in quant. methods and research design.

Why do I raise this? I do because grad. school in Psychology is largely about learning how to do good research. If you were to come to grad. school in Quant. with me, you’d learn how to do research in psychometrics, modeling, and applied stats. That research is performed rather differently than substantive research on learning and memory. Getting a PhD in quant. would not then necessarily put you in a good position to do research on learning and memory. Whereas, if you go to a program and work with an expert in some domain of learning and memory that interests you, you should come out with a solid basis for doing that work. And if you are motivated, you’ll get training in methods that you want. I hope this makes sense. If it doesn’t, feel free to ask clarifying questions.

Let’s make this clearer. If you want to study some substantive psychological domain, you want to learn how to design studies in that domain, how to measure the behavior in question, and how to analyze the acquired data. In Quant. Psychology, you’ll learn how to set up and conduct a simulation study, math stats. and probability theory underlying statistical decisions, computer programming, etc. You’ll also likely do analyses of real data, but often the primary interests in quantitative research are about how a quantity or algorithm performs, less about the substantive implications of the specific results. The skills and knowledge for doing good substantive research do not (in my opinion) largely overlap with the skills and knowledge for doing good quantitative research.

In summary, I don’t think pursuing a Ph.D. in Quant. is necessarily a good way to try to do great research in some substantive scientific domain. However, if you want to do research on how we measure, model, and study behavior, that is, the methods and models used to design our studies and analyze our data, then I think a Ph.D. program in Quant. sounds like a good fit.

Moving from Catalyst to various tools

As UW-IT removes the old Catalyst tools, I’m moving stuff to new places, as well as updating all the pointers to that material. This blog site is my replacement for what was my Catalyst Commonview faculty webpage. I’ll be learning WordPress, as well as moving and adding information here over the coming months.

Teaching PSYCH 548: CFA and SEM in Fall of 2018

I’ll be teaching Confirmatory Factor Analysis and Structural Equation Modeling next fall (listed as PSYCH 548 for 5.0 credits).

First, if you don’t know, I encourage you to bring your own data to use in the class. You have to be able to share some form of it with me, like a covariance matrix. I won’t share, distribute, or use it for my own work. You’ll submit your R syntax and the data, so I can help debug and provide feedback. Second, I’m planning to stick with R, although maybe look at some other packages besides lavaan. In the past, I’ve let people use other programs, like Mplus, but I don’t think I’m going to do that anymore. Third, I’m planning to reinstate writing three research papers (one for each major topic: observed variable path analysis, confirmatory factor analysis, and latent variable path analysis), although with a peer review component. I’m also thinking about adding some work on simulation and power analysis. In the past, people have turned the class papers into thesis chapters or publications, so if you plan for it, you may be able to do the same thing.
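
If sharing a covariance matrix rather than raw data is the route you take, the following is one possible sketch in base R. The object name mydata, the file name, and the use of the built-in mtcars data are all placeholders for illustration, not a required format for the class.

```r
# 'mydata' is a hypothetical stand-in for your own data frame;
# a few columns of the built-in mtcars data are used for illustration
mydata <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# Covariance matrix based on complete cases, plus the sample size behind it
S <- cov(mydata, use = "complete.obs")
n <- sum(complete.cases(mydata))

# Write the matrix to a file that can be shared instead of the raw data
# (a temporary directory is used here; pick your own location in practice)
cov_file <- file.path(tempdir(), "cov_matrix.csv")
write.csv(S, cov_file)

# Read it back in and confirm it matches what was written
S2 <- as.matrix(read.csv(cov_file, row.names = 1))
all.equal(S, S2)
```

Remember to report the sample size along with the matrix. lavaan's fitting functions can take summary statistics through their sample.cov and sample.nobs arguments, so a matrix like this plus n is enough for many of the models in the class.
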
For the class to be useful to you, you’ll want the following:
  • hypotheses (or the ability to create such) about how your data may be structured and tested; this is not an exploratory data analysis class.
  • if you have many more observations than variables, you need a “large” sample (probably greater than 100, over 300 better); in some cases as few as 80 people will work, but the class can be more challenging/frustrating and/or the models quite limited.
  • if you have many more variables than observations (e.g., time series, physiological, and/or neuroscience data), you’ll need to think about intra-individual covariance structure and pooling (or not) across people.
  • either way, you want to think about “redundant measures”: items or measurements that are “getting at the same thing” and can be structured.
  • however, if you have tiny cross-sectional or two time point data sets (say, N=20) with few variables on each respondent, the latent variables part of this class probably won’t work for you.