How does the Educational Testing Service write Scholastic Aptitude Test questions?

The anxious high school student faced with taking the dreaded Scholastic Aptitude Test (SAT) on a Saturday morning has every reason to wonder just who writes those tests and how.

For in many cases the tests are the single largest factor in determining where, or even whether, a student will go on to college. Although much controversy has arisen over the effectiveness, fairness, and thoroughness of the tests, they are not taken lightly, nor are they written overnight.

The development of the Scholastic Aptitude Test, which consists of mathematical and verbal sections, is a long, careful process, taking about eighteen months from the time a question is written until it is ready for actual use.

Questions are written by test specialists at the nonprofit Educational Testing Service (ETS) in Princeton, New Jersey, by high school and college teachers, and by other individuals with particular training.

Although no one at ETS is employed exclusively to write questions, five out of about ten employees in the math area, and ten out of about twenty people in the verbal area, spend a substantial part of their time doing so. Many of the thirty or so freelance test writers used by ETS are former staff members; others have attended workshops at ETS in which the ground rules for writing test questions are discussed.

Each question on an SAT is reviewed three times by the staff at ETS. First, several test specialists each review a series of questions and prepare a list of correct answers; the lists are then compared to verify that the specialists do, indeed, agree about the correct answer.

Other reviewers judge the questions for “fairness to various minority groups and to males and females.” A test editor may also suggest further changes, which the specialists then evaluate. Furthermore, an outside ten-member SAT advisory committee reviews new editions of the test on a rotating schedule; three members see each test. This committee is selected to draw on the expertise and differing points of view of people in diverse occupations and geographical areas. At present, the committee includes a principal of a high school in Dallas, Texas; a professor of mathematics education at the University of Georgia; an administrator at Wellesley College; a reading specialist at Columbia University; a high school math teacher from an inner-city Chicago school; and a psychometrician from Hunter College.

Questions that the various reviewers consider “acceptable” are then included in an experimental section of the actual SAT. These do not count toward the student’s score.

The student, of course, doesn’t know which questions are experimental; he may very well slave for ten minutes over a tricky math problem that has no bearing on his final score. Each such question, tried out under standard testing conditions by representative samples of students, is analyzed statistically for its “effectiveness,” which is actually determined by the performance of the students themselves.

For instance, if students do poorly throughout the test but answer the trial question correctly, the question is apparently too easy. Conversely, if the best students cannot correctly answer the trial question, it is probably too difficult, hence inappropriate.

ETS determines the average ability level of each group taking the test, tabulates the precise number of correct and incorrect answers and omissions in the experimental section, and arrives at a computer score indicating how each group performed relative to the others. (According to James Braswell, Test Development coordinator, ETS is aware of students’ “tricks of the trade,” such as choosing an answer because it stands out significantly from the others or choosing the longest answer; ETS tries, of course, to eliminate questions with such answers.) Finally, satisfactory questions become part of a pool of questions from which a new edition of the SAT is assembled.
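The statistical screening of trial questions described above can be illustrated with a short sketch. It is not ETS's actual procedure, which the article does not spell out; the function name, the quarter-sized comparison groups, and the two cutoff values are all assumptions chosen for illustration. The sketch tabulates correct answers, incorrect answers, and omissions for one trial question, then compares how the weakest and strongest students (ranked by their score on the rest of the test) fared on it.

```python
from typing import Optional, Sequence


def analyze_trial_item(
    responses: Sequence[Optional[bool]],  # True = correct, False = incorrect, None = omitted
    total_scores: Sequence[float],        # each student's score on the scored sections
    group_fraction: float = 0.25,         # assumed size of the low and high groups
    easy_cutoff: float = 0.90,            # assumed: flag if the low group mostly gets it right
    hard_cutoff: float = 0.20,            # assumed: flag if the high group mostly gets it wrong
) -> dict:
    """Tabulate one trial question and give a rough verdict (illustrative only)."""
    if not responses or len(responses) != len(total_scores):
        raise ValueError("responses and total_scores must be non-empty and equal in length")

    # Tabulate correct answers, incorrect answers, and omissions.
    correct = sum(1 for r in responses if r is True)
    incorrect = sum(1 for r in responses if r is False)
    omitted = sum(1 for r in responses if r is None)

    # Rank students from lowest to highest score on the rest of the test.
    order = sorted(range(len(responses)), key=lambda i: total_scores[i])
    k = max(1, int(len(order) * group_fraction))
    low_group, high_group = order[:k], order[-k:]

    def share_correct(indices: Sequence[int]) -> float:
        return sum(1 for i in indices if responses[i] is True) / len(indices)

    p_low = share_correct(low_group)
    p_high = share_correct(high_group)

    # Weak students answering correctly suggests the question is too easy;
    # strong students failing suggests it is too difficult.
    if p_low >= easy_cutoff:
        verdict = "too easy"
    elif p_high <= hard_cutoff:
        verdict = "too difficult"
    else:
        verdict = "candidate for the pool"

    return {
        "correct": correct,
        "incorrect": incorrect,
        "omitted": omitted,
        "p_low": p_low,
        "p_high": p_high,
        "verdict": verdict,
    }
```

With eight hypothetical students, a question that the two lowest scorers miss and the two highest scorers answer correctly would come back as a candidate for the pool, while one that everybody answers correctly would be flagged as too easy.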

At this point, one might wonder just what all those specialists, with all their checks and balances, are trying to accomplish with the SAT: what do they intend to test, and what are their guidelines for content?

ETS says the SAT is “a test of developed ability, not of innate intelligence.” But it goes on to deny that this “developed ability” is a direct reflection of the quality of high school education.

“There is minimal dependence on curriculum in the SAT,” says Braswell, “particularly in the verbal portion of the test. The mathematical portion does, however, depend heavily on first year algebra, and to a lesser extent on geometry taught in junior high or earlier.” The test, then, attempts to draw on abilities developed both in and out of school.

Critics, however, feel the tests simply evaluate a student’s ability to take tests, which may derive from intelligence, background, ambition, fear of failure, or, simply, quick metabolism.
