Testing the Tests
How a test is designed affects the score it produces and whether that number is a faithful measure of a student’s actual knowledge.
[Illustration by Dave Cutler]
A test aims to evaluate a student’s real knowledge of a subject; how closely the score matches that knowledge determines the exam’s fidelity. And given the growing emphasis on standardized tests, designing good ones is more important than ever.
Psychometrics is the name for the theory and mathematics behind the science of crafting tests, and UT is a national leader in the field. UT experts help all kinds of private and public entities improve their tests.
“Every field uses statistics and psychometrics,” says educational psychology professor Barbara Dodd. “Almost every profession has certification or licensing exams, and they need people who can create accurate assessment tools and analyze the data.”
For years, Dodd, director of the Pearson Center for Applied Psychometric Research, has been advising state and federal agencies on better ways to build tests. In general, she says, giving more tests yields more accurate results.
When it comes to high-stakes testing, such as the TAKS, Dodd believes the future lies in computerized adaptive testing, something many private testing programs, such as the GRE, already use. Computerized adaptive testing re-estimates the test-taker’s ability after each response and delivers questions better matched to that ability.
“That way, he or she does not get frustrated by a question that’s too hard or waste time on one that’s too easy,” Dodd says.
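For readers curious how that re-estimation works, the sketch below shows the basic loop under a simple one-parameter (Rasch) response model. The item bank, function names, and simulated test-taker are hypothetical stand-ins for illustration only; this is not the engine behind the GRE, the TAKS, or any other operational exam.

```python
import math
import random


def p_correct(theta, b):
    """Rasch model: probability of a correct answer given ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, theta=0.0, iterations=10):
    """Re-estimate ability with Newton-Raphson on the Rasch log-likelihood.

    responses: list of (item_difficulty, 1 if correct else 0) pairs.
    """
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b, _ in responses]
        gradient = sum(u - p for (_, u), p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        if information < 1e-6:  # all answers correct or all wrong: stop before the estimate diverges
            break
        theta = max(-4.0, min(4.0, theta + gradient / information))
    return theta


def run_adaptive_test(item_difficulties, answer_fn, n_items=20):
    """Administer items adaptively: pick the unused item whose difficulty is
    closest to the current ability estimate, then re-estimate after the response."""
    theta, responses = 0.0, []
    remaining = list(item_difficulties)
    for _ in range(min(n_items, len(remaining))):
        item = min(remaining, key=lambda b: abs(b - theta))  # most informative item under the Rasch model
        remaining.remove(item)
        correct = answer_fn(item)  # deliver the question and observe the answer
        responses.append((item, 1 if correct else 0))
        theta = estimate_theta(responses, theta)  # re-estimate ability after every response
    return theta


if __name__ == "__main__":
    true_ability = 1.2
    bank = [d / 10.0 for d in range(-30, 31)]  # hypothetical item difficulties from -3.0 to +3.0
    simulated_student = lambda b: random.random() < p_correct(true_ability, b)
    print(f"estimated ability: {run_adaptive_test(bank, simulated_student):.2f}")
```

Each pass picks the unanswered question whose difficulty sits closest to the current estimate, which under this model is the most informative choice, then updates the estimate with the new answer. That is what keeps the test-taker from facing questions that are far too hard or far too easy.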
Minnesota is the only state to have tried computerized adaptive testing. The other 49 have stuck with paper and pencil for technical or political reasons, Dodd says: either a computerized test isn’t feasible because there isn’t a computer for every student, or policymakers find it too hard to explain to constituents why no two students get the exact same questions on an exam.
Even without computerized testing, the challenge with big standardized exams is ensuring that the questions measure how well students are learning what a particular governing body says they need to know. In Texas’ case, the charge is to ensure that the TAKS accurately tests how well a student has learned the Texas Essential Knowledge and Skills curriculum set by the Texas Education Agency.
The job gets more complicated as the questions get more specific. A recent version of the SAT with a writing prompt about the television show Jersey Shore sparked outrage from students and parents, who complained the question was unfair to students who had never seen the show or who don’t watch TV at all.
Some policymakers have questioned whether part of the racial achievement gap in Texas test scores could be attributed to cultural bias in the TAKS questions themselves.
They’re all fair questions for experts in psychometrics, particularly as Texas transitions to the new STAAR test, which arrives in the 2011-12 school year. And UT’s Center for Teaching and Learning is helping teachers design better tests in their own classrooms.
“Everywhere you go, people are trying to certify and measure,” Dodd says. “How do you measure a trait and apply a value to it that’s accurate?”