Andy Blog

Summary

I asked ChatGPT4o to simulate a psychology study with a complex multilevel design (here's the final code). I wanted to check whether an effect we observed was down to chance, by seeing whether it arose regularly across 1000s of simulated studies (it did not!). ChatGPT did a good job, but required:

20 minutes further specifying the study design until it summarised it correctly
2 hours of modifying the Python code:
- So that it simulated many studies as opposed to 1
  - It took me 20 minutes to fix an error in the data-summarising step that it couldn't correct itself
- So that it outputted a graph with error bars showing the summarised data

Takeaway

If I were less familiar with Python coding, I would have struggled to identify and correct ChatGPT's mistakes.
I estimate it would have taken me a day to write the same code by hand. Via the iterative process with ChatGPT described below, it took 3 hours.

Surprises

I was floored that it could understand my methods section and be useful in Psychology research!
I could copy/paste errors from my Python console directly into ChatGPT, and it often could use this to diagnose issues.

In depth

We are writing up a study, and I decided to simulate it (that is, recreate it in code and re-run it 1000s of times with random data) to double-check a worry that our (nice!) findings might have arisen purely by chance (here's the final code). I rather like coding, and I occasionally simulate my studies like this to reassure myself about some aspect of what I am working on.
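To make the idea concrete, here is a minimal sketch of this kind of chance check. It is not the code ChatGPT produced, and everything in it is an assumption for illustration: two conditions with identical true accuracy, made-up group sizes, and a hypothetical `observed_diff` standing in for the effect being checked.

```python
import random


def simulate_null_study(n_per_group=50, n_items=15, p_correct=0.6, rng=random):
    """Simulate one study in which both conditions share the same true accuracy.

    Returns the difference in mean accuracy between the two groups, which can
    only be non-zero by chance. All default values are hypothetical.
    """
    def group_mean():
        scores = [
            sum(rng.random() < p_correct for _ in range(n_items)) / n_items
            for _ in range(n_per_group)
        ]
        return sum(scores) / len(scores)

    return group_mean() - group_mean()


def chance_rate(observed_diff, n_sims=2000, seed=1):
    """Proportion of null studies whose absolute difference reaches observed_diff."""
    rng = random.Random(seed)
    hits = sum(
        abs(simulate_null_study(rng=rng)) >= observed_diff
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    # 0.05 is a placeholder effect size, not a value from our study.
    print(chance_rate(observed_diff=0.05))
```

If the printed proportion is small, an effect of that size rarely turns up when nothing real is going on, which is the reassurance I was after.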

Our design is pretty complicated so I wondered if ChatGPT4o could help me. I naively asked ChatGPT:

I want to simulate a psychology study we ran. By simulate, I want to write python code that generates mock data based on the design of the study. If I told you about the study, do you think you could simulate it?

And to my surprise, it said (in a few more words), sure, no problem, tell me your experimental design. So I copy/pasted the draft design from our writeup into ChatGPT and… it summarised our design rather well! It also provided working Python code. I needed to clarify some issues, however, and went through several rounds of changes. For example, asking:

Some updates are needed. There are 20 experiences, with each Cohort been allocated 5 of these experiences. There are 3 levels to the Motion -- you have missed out Month 0, at zero months. All participants do the experience and answer questions at Month 0, and then some participants answer the questions again at Month 3, whilst others answer the questions again at Month 8. There are 3 questions participants are asked per experience, and these are either correct or incorrect.

I also needed to build some flexibility in terms of the parameters I wanted to vary for the simulation.

Can we assume that the questions differ in terms of initial difficulty, with some being harder to answer correctly than others.
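A rough sketch of what a single simulated study along these lines might look like is below. It is not the final code. The 20 experiences, 5 per cohort, 3 questions per experience and the Month 0 / 3 / 8 structure come from the prompts above; everything else is an assumption I have labelled in the comments: the number of cohorts and participants, AR versus Video being assigned per participant, the logistic model, the forgetting rate, and the absence of any true condition effect.

```python
import numpy as np
import pandas as pd


def simulate_study(n_cohorts=4, n_per_cohort=30, n_experiences=20,
                   experiences_per_cohort=5, n_questions=3,
                   base_logit=1.0, difficulty_sd=0.8, decay_per_month=0.1,
                   seed=None):
    """Generate one mock dataset, one row per participant x month x question."""
    rng = np.random.default_rng(seed)
    # Each question gets its own difficulty offset (larger = harder).
    difficulty = rng.normal(0.0, difficulty_sd, size=(n_experiences, n_questions))
    rows = []
    participant = 0
    for cohort in range(n_cohorts):
        # Assumption: each cohort's 5 experiences are drawn at random.
        experiences = rng.choice(n_experiences, size=experiences_per_cohort,
                                 replace=False)
        for _ in range(n_per_cohort):
            # Assumption: condition is assigned per participant, at random.
            condition = str(rng.choice(["AR", "Video"]))
            # Everyone answers at Month 0, then again at Month 3 or Month 8.
            follow_up = int(rng.choice([3, 8]))
            for month in (0, follow_up):
                for exp in experiences:
                    for q in range(n_questions):
                        # Assumption: logistic model, no true condition effect.
                        logit = (base_logit - difficulty[exp, q]
                                 - decay_per_month * month)
                        p = 1.0 / (1.0 + np.exp(-logit))
                        rows.append({
                            "participant": participant,
                            "cohort": cohort,
                            "condition": condition,
                            "month": month,
                            "experience": int(exp),
                            "question": q,
                            "correct": int(rng.random() < p),
                        })
            participant += 1
    return pd.DataFrame(rows)


if __name__ == "__main__":
    df = simulate_study(seed=1)
    print(df.groupby(["condition", "month"])["correct"].mean())
```

Exposing things like `difficulty_sd` and `decay_per_month` as arguments is the kind of flexibility I mean: they can be varied from run to run without touching the body of the function.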

Now that we had code that could simulate the study once, I needed to introduce the ability to simulate many studies:

I want to run the above simulation n times. Let's have n=10 to start with. I am interested in the mean accuracy scores for AR and Video conditions, over months. Can you store these scores for each simulation. Can you plot a chart showing these mean scores (x-axis is months, y-axis is accuracy). Can you include confidence intervals centered around these mean scores. Thanks
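The request above can be sketched roughly as follows. Again, this is not the final code: `simulate_once` is a hypothetical stand-in that just draws random accuracies, and the interval uses a normal approximation (1.96 standard errors), which is an assumption and will be a little narrow with only 10 simulations.

```python
import numpy as np
import pandas as pd

MONTHS = [0, 3, 8]
CONDITIONS = ["AR", "Video"]


def simulate_once(rng, n_responses=200, p_correct=0.7):
    """Hypothetical stand-in for one study: mean accuracy per condition and month."""
    rows = []
    for condition in CONDITIONS:
        for month in MONTHS:
            correct = rng.random(n_responses) < p_correct
            rows.append({"condition": condition, "month": month,
                         "accuracy": float(correct.mean())})
    return pd.DataFrame(rows)


def run_simulations(n=10, seed=0):
    """Run n simulated studies and stack their mean scores in one long table."""
    rng = np.random.default_rng(seed)
    runs = [simulate_once(rng).assign(sim=i) for i in range(n)]
    return pd.concat(runs, ignore_index=True)


def summarise(results):
    """Mean accuracy and approximate 95% CI half-width per condition and month."""
    grouped = results.groupby(["condition", "month"])["accuracy"]
    summary = grouped.agg(["mean", "std", "count"]).reset_index()
    summary["ci95"] = 1.96 * summary["std"] / np.sqrt(summary["count"])
    return summary


def plot_summary(summary, path="accuracy_by_month.png"):
    """Save a chart of months (x) against accuracy (y) with CI error bars."""
    import matplotlib
    matplotlib.use("Agg")  # draw to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for condition, grp in summary.groupby("condition"):
        ax.errorbar(grp["month"], grp["mean"], yerr=grp["ci95"],
                    marker="o", capsize=3, label=condition)
    ax.set_xlabel("Month")
    ax.set_ylabel("Accuracy")
    ax.legend()
    fig.savefig(path)


if __name__ == "__main__":
    print(summarise(run_simulations(n=10)))
```

Calling `plot_summary(summarise(run_simulations(n=10)))` writes the chart to a PNG file. Keeping the results in a long table with explicit `condition` and `month` columns means the per-condition means never get mixed together.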

At several points in the above steps I had to ask for fixes to the Python code. Below is one example:

I get an error on this line: plt.plot(months, mean_scores_overall['AR'], label='AR', color='blue'). The error is KeyError: 'AR'

Unfortunately, although ChatGPT provided new code that should have fixed the error, it did not manage to solve the problem. Sometimes I had to go through several iterations of errors. For example:

line 129 i get this error now: TypeError: Index must be a MultiIndex

I ended up identifying this error myself and telling ChatGPT what it was:

I am getting the same sort of error. I think your mistake is in this line: mean_scores_overall = mean_scores_df.mean(axis=1). You are not providing mean scores for both AR and Video. Just a combined mean score

Unfortunately, it could not come up with a solution to this problem so I ended up solving it myself.
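For the curious, here is a small reconstruction of how this kind of error can arise. The layout of `mean_scores_df` here is my assumption for illustration (one row per simulation, with columns indexed by condition and month), and the fix shown is one plausible approach rather than a record of exactly what I changed.

```python
import numpy as np
import pandas as pd

months = [0, 3, 8]
# Assumed layout: one row per simulation, one column per (condition, month).
columns = pd.MultiIndex.from_product(
    [["AR", "Video"], months], names=["condition", "month"]
)
rng = np.random.default_rng(0)
mean_scores_df = pd.DataFrame(
    rng.uniform(0.5, 0.9, size=(10, len(columns))), columns=columns
)

# The problem: axis=1 averages ACROSS the columns, so AR and Video are
# collapsed into a single combined score per simulation.
combined = mean_scores_df.mean(axis=1)
raised = False
try:
    combined["AR"]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# One possible fix: average DOWN the rows (over simulations) instead, which
# keeps a separate mean for every condition and month.
mean_scores_overall = mean_scores_df.mean(axis=0)
ar_by_month = mean_scores_overall["AR"]        # indexed by month
video_by_month = mean_scores_overall["Video"]  # indexed by month
print(ar_by_month)
```

With the per-condition means separated like this, a call such as `plt.plot(months, mean_scores_overall['AR'], ...)` has an 'AR' entry to find.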

andytwoods

Simulating a psychology study with ChatGPT4o
