Oh Model. Why won’t you work.


So…my models aren’t working out the way that I want them to.  I proposed a statistical mediation model, and my X isn’t even predicting my Y, much less being related to it through an M between the two of them.

I know this doesn’t mean I won’t graduate (lol), but I am a bit bummed because it’ll be harder to get any publishable articles out of this.  I also need to explore some alternate stuff.  I started exploring the relationship between the M and the Y, and there’s none of that, either.  I still haven’t looked at a fourth variable I’m interested in – there may be some stuff there, and I plan to explore it a little tomorrow morning and Friday – but jeez, it’s a bit depressing.

I’m also sick now, so of course I didn’t feel like doing anything today besides listlessly clicking through websites and shit.





Question 1: Can I still graduate if I don’t have significant results?!

Question 2: WTF is this nonsense?  I have a statistical program that can do what I need to do, but it’s brand new to me, so I wanted to cross-check my work with a program I already know how to use (and for which my advisor provided syntax).  When I went to run it, I realized that I need an advanced-models add-on.  The add-on is $600!  On a program that by itself costs $200 for the student version and close to $1,000 for the corporate/university version.

The geekery surrounding Stata continues.


Today I’m teaching myself to use Stata 13’s SEM commands.  Generally in Stata I use the command syntax, because it’s faster and easier and I understand it.  The commands are also very, very simple (compared to SAS’s wtf command lines, which don’t make sense to me).

I started out learning how to do a simple mediation with the sem command and then a multilevel mediation with the gsem command.  Easy!
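
For anyone curious what that actually looks like, here’s a minimal sketch of both (variable names x, m, y, and the cluster identifier id are all made up; the gsem version assumes you want a random intercept per cluster):

```stata
* Simple mediation: x -> m -> y, plus the direct path x -> y
sem (m <- x) (y <- m x)

* Report direct, indirect, and total effects after fitting
estat teffects

* Multilevel mediation: same paths, with a random intercept
* for each cluster (M1 and M2 are latent intercepts at the id level)
gsem (m <- x M1[id]) (y <- m x M2[id])
```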

But Stata also has a graphical SEM builder.  Their documentation is SO handy – it tells you how to use it step by step.  Not only did I get the same results, but I also got a nifty diagram with the coefficients on it:


Which is fucking amazing.  I wonder if I can get it to flag my significant paths?  All of the paths in this diagram are significant, so maybe they just don’t show nonsignificant ones.

(This isn’t my dissertation data, btw; Stata has freely available datasets on their website that you can use to learn the techniques.  The best thing is that Stata can automatically download these datasets, so you don’t have to poke around looking for them.  You just type “use [url here]” and it GETS it for you.)
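
If you want to see the auto-download in action, it looks something like this (I’m using the classic auto dataset as the example; the exact dataset name is just illustrative, but the pattern is the point):

```stata
* Load a practice dataset straight from the Stata Press site
use http://www.stata-press.com/data/r13/auto, clear

* Or use the shortcut command, which fills in the URL for you
webuse auto, clear
```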

As for the lesson in this, I realized today that what takes a dissertation so fucking long isn’t the actual process of data analysis and writing.  That part is relatively easy.  It’s the learning.  I’m going to write a full post on this later, but the dissertation process is simply an alternative way of learning something – different from taking a class, kind of akin to taking comprehensive exams.  It’s struggling through the shit you don’t know that takes the longest amount of time.

Grrr (recoding and cleaning)


For those of you who may be humanities scholars and/or don’t deal with datasets often, “data cleaning” is really just the weird misnomer quant scientists use for prepping the data.  Sometimes we have data that needs to be coded into groups, so we use statistical software to do that; sometimes we need to clean up the way the data is already coded, which was my problem today.  Data cleaning is the most tedious and usually the longest process when preparing to analyze data; I’d wager that as far as analyses go, quant grad students spend about 2/3 of their analysis time cleaning data and about 1/3 doing actual analyses.  (Of course, this varies depending on how complex your data is.)
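
To make “coding into groups” concrete, here’s what it looks like in Stata (all variable names and cutoffs here are hypothetical):

```stata
* Collapse a continuous age variable into labeled groups
recode age (18/29 = 1 "18-29") (30/44 = 2 "30-44") (45/max = 3 "45+"), gen(agegrp)

* Label the new variable so output is readable, then eyeball it
label variable agegrp "Age group"
tab agegrp
```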

One of the things I hate is when I sit down to do some data work after I thought I cleaned my data and realize that there’s still a lot of cleaning and recoding to do.  It’s inevitable, of course; you always find that there’s something wrong or something you forgot to do once you begin to actually manipulate the data.  My goal today was just to explore my main variables, look at distributions and bivariate relationships (relationships between two variables, like race and age or race and scores on some measure) before moving on to constructing my model.  Of course, I found issues before I could do that.
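
The exploration itself is only a handful of commands (again, variable names are placeholders for my actual measures):

```stata
* Distributions: detailed summary stats and a quick histogram
summarize score, detail
histogram score, normal

* Bivariate relationships: pairwise correlations with p-values
pwcorr score age, sig

* Categorical by categorical, e.g. race by group
tab race group, chi2
```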

First, I hate when scales are scored in nonsensical ways.  Let’s say that I have a score called “Skill at Underwater Basketweaving.”  I want higher scores on that scale to reflect more skill in underwater basketweaving.  I don’t want higher scores to be LESS skill, because that’s confusing.  Unfortunately I found out that three of my scales were scored that way (WTF?  In my defense, I did not score them myself).  I found out when I went to run bivariate correlations and found weird negative correlations I didn’t expect.  I don’t know if they were intended to be that way or if they were entered wrong in the survey, but I reverse scored them so that higher scores meant higher amounts of the thing the score measures.
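
The reverse-scoring fix is a one-liner once you know the scale’s endpoints: new score = (min + max) − old score.  For a hypothetical 1-to-5 skill item, that’s:

```stata
* Reverse-score a 1-to-5 item so that higher = more skill
* (min + max) - old score flips the scale: 1 becomes 5, 5 becomes 1
gen skill_rev = 6 - skill

* Sanity check: the reversed item should correlate -1 with the original
pwcorr skill skill_rev
```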

Luckily I was using Stata (a statistical software package) which I have to say is awesome.  I just started using it – in the past I did all my data management in SPSS and then most of my analyses with SAS (both also statistical software packages), but Stata has an SEM package (the kind of analysis I am using in my dissertation) included in it now and so I was able to get my hands on it.  I imported the data into Stata – mostly clean data already – and when I found issues, it was so much faster to fix them in Stata.  Even just moving variables around was faster, although that’s likely because in Stata I am far more likely to look up the syntax to do something (and then fuck it up 4 times in a row, burning into my mind how to do it correctly*).

Then I started playing around with demographics to see if there are important differences between groups. I don’t want there to be differences between my groups, because I am doing within-person analyses for my dissertation – I want to see if people change within themselves over time, and I’m not yet really interested in the differences between groups of people.  I am especially not interested – yet – in differences based on demographics, so thankfully, on the important indicators, there don’t seem to be differences in the variables I am interested in.  I forgot to look at two particularly important variables, though, so I will save those for tomorrow.
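
Checking for group differences is quick, too – a t-test for two groups, a one-way ANOVA for more (variables hypothetical again):

```stata
* Two groups: t-test on the outcome by a binary demographic
ttest score, by(female)

* More than two groups: one-way ANOVA across, say, racial groups
oneway score race, tabulate
```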

So I did get something done today.  Yay!

WordPress.com has been really slow lately.  I’ll type something and it takes several seconds to show up.  It’s annoying!

*As a side note, this is the way I learn statistics and/or a new software program, and in my opinion the best way.  You sit down with a dataset, and you do stuff to it.  You fuck up multiple times, and then you learn not to do that dumb shit again.  I recall syntax so much better when I’ve typed it slightly wrong 5 times in a row before getting it right (fucking capitals, how do they work) than I do when I just copy and alter it.

Yesterday I made an outline of my dissertation.

Five years ago I would’ve never outlined anything – I would’ve just barreled into the project and started writing.  That’s a great way to get lost, I now realize.  My outline estimates my dissertation at 90-120 pages, which is a short dissertation but a long paper!  So trying to approach this from the perspective of “I’m going to work on my dissertation today,” or even “I’m going to work on my literature review (30-45 pages) today” is not a good idea.  I’ve also discovered that I write best in a non-linear fashion, because I build up momentum as I work.  So it’s easier for me to start with a section that I know I can whip out faster – like the methods section, which is easy to write because we already did the project and it’s straightforward.  That gets some words on the paper, which gives me confidence when I go to write the much more nebulous (but also really fun) literature review.

So now I can say “I am going to complete my subsection on the communal aspects of green reed underwater basketweaving today,” which is a 2-page section and something I can realistically complete in one day.

I used my own experiences to help my students when teaching them writing – I taught them to do outlines over the summer.  Some of them balked until they started writing and then came up to tell me how useful their outlines were.  It keeps you from getting lost in your words!  It’s like a road map.  You don’t have to stick rigidly to it – in fact, you may find a better/shorter way to go, and alter your path.  But when you get lost and don’t know where to go next, the outline helps you find your way back.  I outline all of my papers now, even if briefly and loosely.

The literature review and methods section are much better outlined than the results and discussion, which makes sense given that I haven’t done the analysis yet.  I have my lit review chapters organized into major sections that are 5 to 12 pages in length; those sections are then organized into subsections that are typically 2-4 pages in length.  My methods section is projected at 15-20 pages total, but I also have that split into 5 major sections that are on average 2-3 pages in length, although some are projected to be slightly longer.

I’m a super dork, but I’m actually excited about this, especially writing the literature review (which previously was my least favorite section to write).  I’m excited because I get to read about the history of the work done in this field plus the fresh work, and synthesize it all together.  And in the last few years I’ve learned how fun learning by doing is.  Coursework kind of sucks, but I didn’t mind qualifying exams at all – the studying process was stressful but intriguing – and this process has been a learning experience, too.

I also got new running sneakers yesterday.  I am so over the moon about these running sneakers.  I went running in them yesterday and I feel like a new person.  My body wasn’t crying out in protest after the run was over (just my lungs).  I also found out that they are the same sneakers Wendy Davis wore during her 11-hour filibuster, which made me happy.