In my previous post, I wrote about a Second Language Acquisition experiment I worked on last semester with the director of USC’s Spanish language program. In this follow-up, I’ll discuss how I analyzed the data. In other words, get ready for some pictures. 📊 🤓
I had the opportunity to work with the director of USC’s Spanish program recently on an experiment. She was interested in seeing if making tweaks to a teaching intervention resulted in better understanding of the grammar used in conditional sentences (“If I study, I will get an A on the exam”). She’s the expert in Second Language Acquisition (SLA), so she designed the experiment. I helped her record and edit the videos, wrangle the data, and analyze the results. In this first post, I’ll discuss how I imported and cleaned the data.
Last week, I helped out with a Software Carpentry workshop at Caltech, and it reminded me that I hadn’t posted this:
In my last two posts (here and here), I discussed my efforts to scrape the DataCamp website so that I could have a record of the courses I’ve taken. In this final post, I’m putting all the pieces together, discussing how the script works and showing what the final product looks like.
I was so psyched about scraping the DataCamp website that I dove right into the second part of the project. In my previous post, I showed how I was able to iterate through all the chapters and lessons from the course landing page. In this post, I’ll focus on how I extracted all the lesson info.
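As a rough sketch of that kind of extraction, here’s how lesson titles could be pulled out of a page with the standard library’s `html.parser`. The `lesson-title` class name and the sample markup are hypothetical stand-ins; the real selectors depend on the DataCamp page’s actual HTML.

```python
from html.parser import HTMLParser

class LessonTitleParser(HTMLParser):
    """Collect the text of elements tagged with a (hypothetical) lesson-title class."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if ("class", "lesson-title") in attrs:
            self._capture = True

    def handle_data(self, data):
        if self._capture:
            self.titles.append(data.strip())
            self._capture = False

sample = '<li><h5 class="lesson-title">Loading data</h5></li>'
parser = LessonTitleParser()
parser.feed(sample)
print(parser.titles)  # ['Loading data']
```

In practice a library like BeautifulSoup makes this much less verbose, but the idea is the same: find the elements that mark each lesson and pull out their text.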
Most of what I’ve learned about data science has come from the courses I’ve taken at DataCamp. Until now, every time I took a course, I copied and pasted the course outline into a Markdown file. This gives me an archive I can refer to when I face similar problems down the road. But it takes time — time I could be using to take more DataCamp courses!
The reason I love all the data science and programming techniques I’ve been learning is that it feels like magic. There’s a task that I have to do over and over again. I write a few lines of code. I never have to do it again. And everything I learn ends up helping me somewhere, often in ways I can’t anticipate. It’s like learning a language when you’re living abroad. As soon as you know a new word, you hear it everywhere, and a little piece of the world is suddenly unlocked.
Every time we do a round of placement tests at USC, we upload the results to our student information system (SIS) as a fixed-width text file. In the past, that file was created manually in Excel, so one of the first things I did when I arrived was create a script to automate that process. It’s definitely made things a lot quicker, but there’s still a problem: bad birthdays.
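Catching a bad birthday before it reaches the SIS comes down to checking whether the raw string parses as a real calendar date. Here’s a minimal sketch; the `%m%d%Y` format is an assumption, since the actual layout depends on the SIS spec.

```python
from datetime import datetime

def is_valid_birthday(raw, fmt="%m%d%Y"):
    """Return True if raw parses as a real calendar date in the given format.

    The MMDDYYYY format is a hypothetical example, not the actual SIS layout.
    """
    try:
        datetime.strptime(raw, fmt)
        return True
    except ValueError:
        return False

print(is_valid_birthday("02292000"))  # Feb 29, 2000 — a leap day, so True
print(is_valid_birthday("02292001"))  # 2001 wasn't a leap year, so False
```

`strptime` rejects impossible dates like February 30 for free, which makes it a handy validator even when you don’t need the parsed `datetime` object.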
A friend of mine was complaining to me the other day about trying to get data from the American Community Survey. Fortunately, it turns out that the Census has a fantastic and well-documented API. So I volunteered to help. I read through this guide, found a helpful page for looking up geocodes from the Missouri Census Data Center, and looked through a rather cumbersome list of the variables you can request.
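A Census API query is just a URL with a `get` list of variable codes and a `for` clause naming the geography. As a sketch, here’s how one might assemble a request for total population (variable `B01003_001E`) by state from the 2019 ACS 5-year estimates — the dataset year and variable code are illustrative choices, not the ones from my friend’s project.

```python
from urllib.parse import urlencode

BASE = "https://api.census.gov/data/2019/acs/acs5"

def build_acs_url(variables, geography, key=None):
    """Assemble a Census API query URL from a list of variable codes.

    Dataset path and variable codes here are illustrative examples.
    """
    params = {"get": ",".join(variables), "for": geography}
    if key:
        params["key"] = key
    # keep ':', '*', and ',' literal, as the Census API expects
    return f"{BASE}?{urlencode(params, safe=':*,')}"

url = build_acs_url(["NAME", "B01003_001E"], "state:*")
print(url)
# Fetching is then a plain GET, e.g. rows = requests.get(url).json(),
# where the first row of the response is the header.
```

The response comes back as a JSON array of arrays, which drops straight into `pandas.DataFrame` with the first row as column names.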
This week, I began exploring our backlog of language placement exams. I think the best way to talk about this is to walk you through the process of answering a sample question. For instance, how many students have taken our Spanish exam since we started collecting data?
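Once the exam records are in a tabular format, a question like that reduces to counting rows by language. Here’s a small sketch using only the standard library; the column names and sample rows are hypothetical, not the actual exam file layout.

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical sample of the exam log; real files have one row per test taker.
sample = """student_id,language,exam_date
001,Spanish,2017-01-15
002,French,2017-01-15
003,Spanish,2017-08-20
004,Spanish,2018-01-10
"""

# Tally how many records exist for each language
counts = Counter(row["language"] for row in csv.DictReader(StringIO(sample)))
print(counts["Spanish"])  # → 3
```

With pandas the same question is a one-liner (`df["language"].value_counts()`), which is what I actually reach for once the data is loaded.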
I’ve created a GitHub repository for the scripts that I’m writing for my work in the Language Center. You can access it here. So far, there are two scripts: one for assembling test results into a fixed-width text file, and one for going in the other direction.
In my previous post, I discussed the process of converting the results from our language placement test into a fixed-width text file that’s compatible with our student information system. But what about going in the other direction? We have years’ worth of data in text files, and they’re ripe for analysis!
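Reading a fixed-width file is just slicing each line at known column boundaries. Here’s a minimal sketch — the field names and widths below are hypothetical placeholders, since the real layout comes from the SIS file spec.

```python
# Hypothetical layout: student ID in columns 0-9, language code in 10-12,
# score in 13-16. The actual widths come from the SIS specification.
FIELDS = {
    "student_id": slice(0, 10),
    "language": slice(10, 13),
    "score": slice(13, 17),
}

def parse_line(line):
    """Slice one fixed-width record into a dict of stripped field values."""
    return {name: line[span].strip() for name, span in FIELDS.items()}

record = parse_line("1234567890SPA 650")
print(record)  # {'student_id': '1234567890', 'language': 'SPA', 'score': '650'}
```

For anything beyond a quick script, `pandas.read_fwf` does the same job and hands you a DataFrame directly.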
An important part of my job at the USC Language Center is administering placement tests and making the results available to students, advisors, and other administrators. Several times during the year, students take our tests using Scantron forms, and I end up with several CSV files — one for each of the languages we offer. I then need to make sure that all those results end up in a single, fixed-width text file that’s compatible with the university’s student information system. It’s one of those data management tasks that are perfect for automating with Python.
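The core of that conversion is padding each field to a fixed column width. Python’s format specifiers handle the alignment; the widths below are hypothetical examples, since the real ones are dictated by the SIS file specification.

```python
# Hypothetical column widths; the real widths come from the SIS file spec.
def format_record(student_id, language, score):
    """Left-pad or right-pad each field to its fixed width and join them."""
    return f"{student_id:<10}{language:<3}{score:>4}"

line = format_record("1234567", "SPA", 650)
print(repr(line))  # '1234567   SPA 650'
print(len(line))   # 17 — every record is exactly the same width
```

Writing the file is then just looping over the CSV rows and emitting one formatted line per student, which is exactly the kind of repetitive task Python excels at.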