NYC’s gifted and talented dilemma: a window into the utility of psychometric testing

An article in the New York Times chronicles a dilemma in New York City’s gifted and talented programs, particularly its kindergarten and 1st grade programs. The issue is test prep, which some say is inflating the number of kids qualifying for those programs.

I took a closer look at the article and found that it underscores the role of psychometric testing and IQ.

Let’s look at some excerpts:

Assessing students has always been a fraught process, especially 4-year-olds, a mercurial and unpredictable lot by nature, who are vying for increasingly precious seats in kindergarten gifted programs.

In New York, it has now become an endless contest in which administrators seeking authentic measures of intelligence are barely able to keep ahead of companies whose aim is to bring out the genius in every young child.
“It’s something the schools know has been corrupted,” said Dr. Samuel J. Meisels, an early-childhood education expert who gave a presentation in the fall to private school officials, encouraging them to abandon the test. Excessive test preparation, he said, “invalidates inferences that can be drawn” about children’s “learning potential and intellect and achievement.”

(Emphasis mine.)

Psychometric tests (read: IQ tests) measure one’s cognitive ability, both overall and specialized (e.g. verbal, visuospatial, and quantitative). They are most accurate when test takers haven’t been exposed to similar problems; that is, an ideal IQ test features novel problems. Enter the test-prep dilemma: with test prep, prospective test takers gain exposure to the types of problems featured on the entry exams. This dilutes the test’s efficacy, since prior knowledge of the problems interferes with the test’s ability to assess raw cognitive ability.

Also consider that the test takers are lil’ kids (often age 4 or 5); environmental factors (e.g. test prep) account for roughly 60% of the variance in IQ at these ages. Seeing a larger-than-usual number of kids qualifying for gifted and talented (hereafter GT) programs is thus unsurprising. However, environmental contributions to IQ variance decrease with age, so it’s highly unlikely that this increase in test scores represents a lasting gain (though I could be wrong if the Flynn Effect continues unabated – which even Flynn believes won’t happen). Consider this next excerpt:
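To see why a modest test-prep boost can double qualification counts, here is a minimal simulation sketch. It assumes the ~60% environmental variance share cited above; every other number (the 5-point prep boost, the 130 cutoff) is a made-up illustration, not data from the article.

```python
import random

random.seed(0)

# Illustrative model (not real data): a child's score is a genetic
# component plus an environmental component, on the usual IQ scale
# (mean 100, SD 15). Environment gets ~60% of the variance, per the
# figure for ages 4-5 cited above.
N = 100_000
ENV_VAR_SHARE = 0.60
TOTAL_SD = 15
env_sd = (ENV_VAR_SHARE * TOTAL_SD**2) ** 0.5
gen_sd = ((1 - ENV_VAR_SHARE) * TOTAL_SD**2) ** 0.5

PREP_BOOST = 5    # hypothetical uniform score boost from test prep
CUTOFF = 130      # a typical "gifted" cutoff (~top 2% of scores)

genetic = [random.gauss(0, gen_sd) for _ in range(N)]
env = [random.gauss(0, env_sd) for _ in range(N)]

no_prep = [100 + g + e for g, e in zip(genetic, env)]
with_prep = [s + PREP_BOOST for s in no_prep]

qual_no_prep = sum(s >= CUTOFF for s in no_prep)
qual_with_prep = sum(s >= CUTOFF for s in with_prep)

print(f"qualify without prep: {qual_no_prep}")
print(f"qualify with prep:    {qual_with_prep}")
```

Because the cutoff sits in the tail of the distribution, even a small uniform boost shifts a disproportionate number of kids over the line – which is consistent with qualifier counts doubling in a few years without any change in underlying ability.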

Scores had been soaring. For the 2012-13 school year, nearly 5,000 children qualified for gifted and talented kindergarten seats in New York City public schools. That was more than double the number five years ago. “We were concerned enough about our definition of giftedness being affected by test prep — as we were prior school experience, primary spoken language, socioeconomic background and culture — that we changed the assessment,” Adina Lopatin, a deputy chief academic officer in the Education Department, said.

(Emphasis mine.)

In other words, the kids weren’t getting smarter – they became more knowledgeable (there is a difference!).

Some background: for many years, GT programs in NYC required high marks on the so-called “ERB” (named after the Educational Records Bureau, which administers the test; the test itself is formally the Early Childhood Admissions Assessment, or ECAA). Because of these test-prep problems, a new exam – the Naglieri Nonverbal Ability Test (NNAT) – became the standard GT admissions test. However, even this test isn’t without its problems; here’s an excerpt from an older Wall Street Journal article:

City officials hailed the new test as a vast improvement. It relies on abstract spatial thinking and largely eliminates language, even from the instructions, an approach that officials said better captures intelligence, is more appropriate for the city’s multilingual population and is less vulnerable to test preparation.

As a result, they expressed the hope that it would “improve the diversity of students that are recognized as gifted and talented,” said Adina Lopatin, the deputy chief academic officer for the city’s Department of Education. City officials said they were currently compiling data on the program’s racial breakdown but students who qualified tended to be concentrated in wealthier districts. Areas such as the South Bronx produced few candidates.

Some experts have raised doubts about the NNAT’s ability to create a racially balanced class. Several studies show the test produces significant scoring gaps between wealthier white and Asian children and their poor, minority counterparts.

(Emphasis mine.)

Note the familiar racial “achievement gap” and socioeconomic status (SES) issues. I wouldn’t say the test itself produces the so-called “scoring gaps” – especially if the test’s design is “culture-fair.” What we’re really seeing here is HBD in action, with test prep and/or academic environs as possible confounding factors. I’ll also add that designing a psychometric test around racial diversity is an exercise in futility, since doing so assumes all races are equal on average, which they aren’t. (I previously blogged about racial disparities in several well-known exams while defending the Specialized High School Admissions Test, or SHSAT; this, combined with the NNAT results, provides more evidence of cognitive differences between groups.) Anyhow, let’s return to the NYT article:

“Every time these tests change, there’s a lot of demand,” Bige Doruk, founder of Bright Kids, said. She said she did not accept the argument that admissions tests had been invalidated by test prep. “It is not a validity issue, it’s a competitive issue,” she said. “Parents will always do what they can for their children.” And not all children who take preparation courses do well, she said. The test requires that 4-year-olds sit with a stranger for nearly an hour — skills that extend beyond the scope of I.Q. or school readiness.

But does “sitting with a stranger for nearly an hour” really extend beyond IQ’s scope? If this is an implicit reference to a child’s behavior, several studies show significant links between IQ and various life and behavioral outcomes (whether for children, adolescents, teens, or adults). However, there is debate over the sources of IQ variance (biological/genetic vs. environmental – i.e. how much IQ variance is truly attributable to each factor); see, for instance, the paper by Nisbett et al.

Notwithstanding, it’s clear one can’t ignore IQ.

The rest of the NYT article offers more insight into the psychometric GT testing/test-prep dilemma. Needless to say, if test prep significantly alters a psychometric exam’s results, then test prep denudes those results (or, put another way, the exam isn’t psychometric enough). The result, sad to say, is some kids qualifying as GT who actually aren’t. At the same time, however, this situation highlights psychometric testing’s benefits (note that the Education Department moved to a more psychometric test in response to the test prep); methinks such a move underscores psychometric exams’ utility as markers for cognitive ability. Finally, consider Hunter College Elementary School – a GT school (once more from the NYT article):

Hunter, a public school for gifted children that is part of the City University of New York, requires applicants to take the Stanford-Binet V intelligence test, and until last year, families could pick from 1 of 16 psychologists to administer the test. Uncovering who was the “best tester,” one who might give children more time to answer, or pose questions different ways, was a popular parlor game among parents.

But for this year’s admission process, the school announced that every family would be required to choose from only four testers. Randy Collins, Hunter’s principal, said the change was not related to families’ flocking to “easy” testers, but rather an attempt to ease the scheduling process. “We have seen no evidence that some are easy and some are tough, that some give extra time,” he said. And yet the decision seems to have had an impact: after several years in which scores rose, Mr. Collins said, scores did not go up this year.

(Emphasis mine.)

If the “easy testers” situation is real (that is, testers who give children extra time or who “guide” children as they test), then it too can denude a psychometric exam’s efficacy. That Hunter Elementary’s admission scores didn’t rise this year might evince this; of course, this is pure speculation – we’d need to wait and see whether the change sparks a new trend.

Discuss (Be respectful. No trolling or threats allowed; violators subject to moderation or ban. Thanks!)
