The reason I love all the data science and programming techniques I’ve been learning is that they feel like magic. There’s a task that I have to do over and over again. I write a few lines of code. I never have to do it again. And everything I learn ends up helping me somewhere, often in ways I can’t anticipate. It’s like learning a language when you’re living abroad. As soon as you know a new word, you hear it everywhere, and a little piece of the world is suddenly unlocked.
Every time we do a round of placement tests at USC, we upload the results to our student information system (SIS) as a fixed-width text file. In the past, that file was created manually in Excel, so one of the first things I did when I arrived was create a script to automate that process. It’s definitely made things a lot quicker, but there’s still a problem: bad birthdays.
A friend of mine was complaining to me the other day about trying to get data from the American Community Survey. Fortunately, it turns out that the Census has a fantastic and well-documented API. So I volunteered to help. I read through this guide, found a helpful page for looking up geocodes from the Missouri Census Data Center, and looked through a rather cumbersome list of the variables you can request.
This week, I began exploring our backlog of language placement exams. I think the best way to talk about this is to walk you through the process of answering a sample question. For instance, how many students have taken our Spanish exam since we started collecting data?
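As a rough illustration of that kind of question, here is a minimal sketch that counts distinct students per language. It assumes the results have already been loaded as a list of dictionaries with `student_id` and `language` keys; those field names and the `count_students` helper are hypothetical, not taken from the actual scripts.

```python
def count_students(records, language):
    """Count distinct students who took the exam for one language.

    `records` is assumed to be a list of dicts with hypothetical
    'student_id' and 'language' keys. A student who took the same
    exam more than once is counted only once.
    """
    wanted = language.strip().lower()
    student_ids = {
        record["student_id"]
        for record in records
        if record["language"].strip().lower() == wanted
    }
    return len(student_ids)


# Made-up sample data, for illustration only.
sample = [
    {"student_id": "001", "language": "Spanish"},
    {"student_id": "002", "language": "French"},
    {"student_id": "001", "language": "Spanish"},  # a retake by the same student
    {"student_id": "003", "language": "spanish"},
]
spanish_count = count_students(sample, "Spanish")
```

Using a set of IDs rather than a simple tally is a design choice: it answers "how many students" rather than "how many exams".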
I’ve created a GitHub repository for the scripts that I’m writing for my work in the Language Center. You can access it here. So far, there are two scripts: one for assembling test results into a fixed-width text file, and one for turning a fixed-width text file into a
In my previous post, I discussed the process of converting the results from our language placement test into a fixed-width text file that’s compatible with our student information system. But what about going in the other direction? We have years’ worth of data in text files, and they’re ripe for analysis!
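Going in that other direction can be sketched by slicing each line at known column positions. The layout below (field names and widths) is invented for illustration; the real SIS format isn't described here, so treat `LAYOUT` and `parse_fixed_width` as assumptions.

```python
# Hypothetical layout: (field name, width in characters).
LAYOUT = [("student_id", 10), ("language", 12), ("score", 4)]


def parse_fixed_width(text, layout=LAYOUT):
    """Turn fixed-width text into a list of dicts, one per non-blank line."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = {}
        start = 0
        for name, width in layout:
            # Slice out this field and drop the padding spaces.
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records
```

Once the lines are dictionaries, they can be handed to `csv.DictWriter` or loaded into whatever analysis tool you prefer.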
An important part of my job at the USC Language Center is administering placement tests and making the results available to students, advisors, and other administrators. Several times during the year, students take our tests using Scantron forms, and I end up with several CSV files, one for each of the languages we offer. I then need to make sure that all those results end up in a single, fixed-width text file that’s compatible with the university’s student information system. It’s one of those data management tasks that are perfect for automation with Python.
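To give a flavor of the task, here is a minimal sketch that merges several CSV inputs into one fixed-width block of text. The column names, widths, and function names are hypothetical, since the actual SIS layout isn't given here, and the CSVs are passed as strings so the example runs without any files.

```python
import csv
import io

# Hypothetical layout: (CSV column name, width in characters).
LAYOUT = [("student_id", 10), ("language", 12), ("score", 4)]


def to_fixed_width(row, layout=LAYOUT):
    """Truncate or pad each field to its width and join them into one line."""
    return "".join(str(row[name])[:width].ljust(width) for name, width in layout)


def csvs_to_fixed_width(csv_texts, layout=LAYOUT):
    """Combine several CSV documents (one per language) into fixed-width text."""
    lines = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            lines.append(to_fixed_width(row, layout))
    return "\n".join(lines) + "\n"


# Made-up sample inputs, for illustration only.
spanish_csv = "student_id,language,score\n1234567890,Spanish,87\n"
french_csv = "student_id,language,score\n42,French,5\n"
combined = csvs_to_fixed_width([spanish_csv, french_csv])
```

Every output line has the same length (26 characters with this layout), which is what lets a downstream system read fields by position.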