An AI tutor helped Harvard students learn more physics in less time (Sept. 16, 2024)

A student’s view of PS2 Pal, the AI tutor used in a learning experiment inside Harvard’s physics department. (Screenshot courtesy of Gregory Kestin)

We are still in the early days of understanding the promise and peril of using generative AI in education. Very few researchers have evaluated whether students are benefiting, and one well-designed study showed that using ChatGPT for math actually harmed student achievement.

The first scientific proof I’ve seen that ChatGPT can actually help students learn more was posted online earlier this year. It’s a small experiment, involving fewer than 200 undergraduates.  All were Harvard students taking an introductory physics class in the fall of 2023, so the findings may not be widely applicable. But students learned more than twice as much in less time when they used an AI tutor in their dorm compared with attending their usual physics class in person. Students also reported that they felt more engaged and motivated. They learned more and they liked it. 

A paper about the experiment has not yet been published in a peer-reviewed journal, but other physicists at Harvard University praised it as a well-designed experiment. Students were randomly assigned to learn a topic as usual in class, or stay “home” in their dorm and learn it through an AI tutor powered by ChatGPT. Students took brief tests at the beginning and the end of class, or their AI sessions, to measure how much they learned. The following week, the in-class students learned the next topic through the AI tutor in their dorms, and the AI-tutored students went back to class. Each student learned both ways, and for both lessons – one on surface tension and one on fluid flow –  the AI-tutored students learned a lot more. 

To avoid AI “hallucinations,” the tendency of chatbots to make up stuff that isn’t true, the AI tutor was given all the correct solutions. But other developers of AI tutors have also supplied their bots with answer keys. Gregory Kestin, a physics lecturer at Harvard and developer of the AI tutor used in this study, argues that his effort succeeded while others have failed because he and his colleagues fine-tuned it with pedagogical best practices. For example, the Harvard scientists instructed this AI tutor to be brief, using no more than a few sentences, to avoid cognitive overload. Otherwise, he explained, ChatGPT has a tendency to be “long-winded.”

The tutor, which Kestin calls “PS2 Pal,” after the Physical Sciences 2 class he teaches, was told to only give away one step at a time and not to divulge the full solution in a single message. PS2 Pal was also instructed to encourage students to think and give it a try themselves before revealing the answer. 

Unguided use of ChatGPT, the Harvard scientists argue, lets students complete assignments without engaging in critical thinking. 

Kestin doesn’t deliver traditional lectures. Like many physicists at Harvard, he teaches through a method called “active learning,” where students first work with peers on in-class problem sets as the lecturer gives feedback. Direct explanations or mini-lectures come after a bit of trial, error and struggle. Kestin sought to reproduce aspects of this teaching style with the AI tutor. Students toiled on the same set of activities and Kestin fed the AI tutor the same feedback notes that he planned to deliver in class.

Kestin provocatively titled his paper about the experiment, “AI Tutoring Outperforms Active Learning,” but in an interview he told me that he doesn’t mean to suggest that AI should replace professors or traditional in-person classes. 

“I don’t think that this is an argument for replacing any human interaction,” said Kestin. “This allows for the human interaction to be much richer.”

Kestin says he intends to continue teaching through in-person classes, and he remains convinced that students learn a lot from each other by discussing how to solve problems in groups. He believes the best use of this AI tutor would be to introduce a new topic ahead of class – much like professors assign reading in advance. That way students with less background knowledge won’t be as behind and can participate more fully in class activities. Kestin hopes his AI tutor will allow him to spend less time on vocabulary and basics and devote more time to creative activities and advanced problems during class.

Of course, the benefits of an AI tutor depend on students actually using it. In other efforts, students often didn’t want to use earlier versions of education technology and computerized tutors. In this experiment, the “at-home” sessions with PS2 Pal were scheduled and proctored over Zoom. It’s not clear that even highly motivated Harvard students will find it engaging enough to use regularly on their own initiative. Cute emojis – another element that the Harvard scientists prompted their AI tutor to use – may not be enough to sustain long-term interest. 

Kestin’s next step is to test the tutor bot for an entire semester. He’s also been testing PS2 Pal as a study assistant with homework. Kestin said he’s seeing promising signs that it’s helpful for basic but not advanced problems. 

The irony is that AI tutors may not be that effective at what we generally think of as tutoring. Kestin doesn’t think that current AI technology is good at anything that requires knowing a lot about a person, such as what the student already learned in class or what kind of explanatory metaphor might work.

“Humans have a lot of context that you can use along with your judgment in order to guide a student better than an AI can,” he said. In contrast, AI is good at introducing students to new material because you only need “limited context” about someone and “minimal judgment” for how best to teach it. 

Contact staff writer Jill Barshay at (212) 678-3595 or barshay@hechingerreport.org.

This story about an AI tutor was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

Students aren’t benefiting much from tutoring, one new study shows (Sept. 9, 2024)

Matthew Kraft, an associate professor of education and economics at Brown University, was an early proponent of giving tutors — ordinarily a luxury for the rich — to the masses after the pandemic. The research evidence was strong; more than a hundred studies had shown remarkable academic gains for students who were frequently tutored every week at school. Sometimes, they caught up two grade levels in a single year. 

After Covid shuttered schools in the spring of 2020, Kraft, along with a small group of academics, lobbied the Biden administration to urge schools to invest in this kind of intensive tutoring across the nation to help students catch up from pandemic learning losses. Many schools did — or tried to do so. Now, in a moment of scholarly honesty and reflection, Kraft has produced a study showing that tutoring the masses isn’t so easy — even with billions of dollars from Uncle Sam.

The study, which was posted online in late August 2024, tracked almost 7,000 students who were tutored in Nashville, Tennessee, and calculated how much of their academic progress could be attributed to the sessions of tutoring they received at school between 2021 and 2023. Kraft and his research team found that tutoring produced only a small boost to reading test scores, on average, and no improvement in math. Tutoring failed to lift course grades in either subject.

“These results are not as large as many in the education sector had hoped,” said Kraft in an interview. That’s something of an academic understatement. The one and only positive result for students was a tiny fraction of what earlier tutoring studies had found.

“I was and continue to be incredibly impressed with the rigorous and wide body of evidence that exists for tutoring and the large average effects that those studies produced,” said Kraft. “I don’t think I paid as much attention to whether those tutoring programs were as applicable to post-Covid era tutoring at scale.”

Going forward, Kraft said he and other researchers need to “recalibrate” or adjust expectations around the “eye-popping” or very large impacts that previous small-scale tutoring programs have achieved.

Kraft described the Nashville program as “multiple orders of magnitude” larger than the pre-Covid tutoring studies. Those studies often involved fewer than 50 students, though some had a few hundred. Only a handful included over 1,000 students. Nashville’s tutoring program reached almost 7,000 students, roughly 10 percent of the district’s student population.

Tennessee was a trailblazer in tutoring after the pandemic. State lawmakers appropriated extra funding to schools to launch large tutoring programs, even before the Biden administration urged schools around the nation to do the same with their federal Covid recovery funds. Nashville partnered with researchers, including Kraft, to study its ramp up and outcomes for students to help advise on improvements along the way. 

As with the launch of any big new program, Nashville hit a series of snags. Early on, administrators were overwhelmed with “14 bazillion emails,” as educators described them to researchers in the study, before the district hired enough staff to coordinate the tutoring program. The district first tried online tutoring. But too much time and effort was wasted setting kids up on computers, coping with software problems, and searching for missing headphones. Some children had to sit in the hallway with their tablets and headphones; it was hard to concentrate.

Meanwhile, remote tutors were frustrated that they couldn’t talk with teachers regularly. Sessions were often redundant, with tutors told to teach the same topics students were already learning in class.

The content of the tutoring lessons was in turmoil, too. The city scrapped its math curriculum midway. Different grades required different reading curricula. For each of them, Nashville educators needed to create tutor guides and student workbooks from scratch.

Eventually the city switched course and replaced its remote tutors, who were college student volunteers, with teachers at the school who could tutor in-person. That eliminated the headaches of troublesome technology. Also, teachers could adjust the tutoring lessons to avoid repeating exactly what they had taught in class. 

But school teachers were fewer in number and couldn’t serve as many students as an army of remote volunteers. Instead of one tutor for each student, teachers worked with three or four students at a time. Even after tripling and quadrupling up, there weren’t enough teachers to tutor everyone during school hours. Half the students had their tutoring sessions scheduled immediately before or right after school.

In interviews, teachers said they enjoyed the stronger relationships they were building with their students. But there were tradeoffs. The extra tutoring work raised concerns about teacher burnout.

Despite the flux, some things improved as the tutoring program evolved. The average number of tutoring sessions that students attended increased from 16 sessions in the earlier semesters to 24 sessions per semester by spring of 2023. 

Why the academic gains for students weren’t stronger is unclear. One of Kraft’s theories is that Nashville asked tutors to teach grade-level skills and topics, similar to what the children were also learning in their classrooms and what the state tests would assess. But many students were months, even years behind grade level, and may have needed to learn rudimentary skills before being able to grasp more advanced topics. (This problem surprised me because I thought the whole purpose of tutoring was to fill in missing skills and knowledge!) In the data, average students in the middle of the achievement distribution showed the greatest gains from Nashville’s tutoring program. Students at the bottom and top didn’t progress much, or at all. (See the graph below.)

“What’s most important is that we figure out what tutoring programs and design features work best for which students,” Kraft said. 

Average students in the middle of the achievement distribution gained the most from Nashville’s tutoring program, while students who were the most behind did not catch up much

Source: Kraft, Matthew A., Danielle Sanderson Edwards, and Marisa Cannata. (2024). The Scaling Dynamics and Causal Effects of a District-Operated Tutoring Program.

Another reason for the disappointing academic gains from tutoring may be related to the individualized attention that many students were also receiving at Nashville’s schools. Tutoring often took place during frequently scheduled periods of “Personalized Learning Time” for students, and even students not selected for tutoring received other instruction during this period, such as small-group work with a teacher or individual services for children with special needs. Another set of students was assigned independent practice work using advanced educational software that adapts to a student’s level. To demonstrate positive results in this study, tutoring would have had to outperform all these other interventions. It’s possible that these other interventions are as powerful as tutoring. Earlier pre-Covid studies of tutoring generally compared the gains against those of students who had nothing more than traditional whole class instruction. That’s a starker comparison. (To be sure, one would still have hoped to see stronger results for tutoring as the Nashville program migrated outside of school hours; students who received both tutoring and personalized learning time should have meaningfully outperformed students who had only the personalized learning time.)

Other post-pandemic tutoring research has been rosier. A smaller study of frequent in-school tutoring in Chicago and Atlanta, released in March 2024, found giant gains for students in math, enough to totally undo learning losses for the average student. However, those results reflect only the roughly three-quarters of the 800 students assigned to tutoring who actually attended sessions.*

Kraft argued that schools should not abandon tutoring just because it’s not a silver bullet for academic recovery after Covid. “I worry,” he said, “that we may excuse ourselves from the hard work of iterative experimentation and continuous improvement by saying that we didn’t get the eye-popping results that we had hoped for right out of the gate, and therefore it’s not the solution that we should continue to invest in.”

The business world innovates iteratively, too. I’m a former business reporter, and this rocky effort to bring tutoring to schools reminds me of how Levi’s introduced custom-made jeans for the masses in the 1990s. These “personal pairs” didn’t cost much more than traditional mass-produced jeans, but it was time-consuming for clerks to take measurements, the jeans often didn’t fit, and reorders were a hassle. Levi’s pulled the plug in 2003. Eventually it brought back custom jeans — truly bespoke ones made by a master tailor at $750 or more a pop. For the masses? Maybe not.

I wonder if customized instruction can be accomplished at scale at an affordable price. To really help students who are behind, tutors will need to diagnose each student’s learning gaps, and then develop a customized learning plan for each student. That’s pricey, and maybe impossible to do for millions of students all over the country. 

*Correction: An earlier version of this story incorrectly stated how many students were assigned to receive tutoring in the Chicago and Atlanta experiment. Only 784 students were to be tutored out of 1,540 students in the study. About three-quarters of those 784 students received tutoring. The sentence was also revised to clarify which students’ math outcomes drove the results.

This story about tutoring research was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

Kids who use ChatGPT as a study assistant do worse on tests (Sept. 2, 2024)

Does AI actually help students learn? A recent experiment in a high school provides a cautionary tale. 

Researchers at the University of Pennsylvania found that Turkish high school students who had access to ChatGPT while doing practice math problems did worse on a math test compared with students who didn’t have access to ChatGPT. Those with ChatGPT solved 48 percent more of the practice problems correctly, but they ultimately scored 17 percent worse on a test of the topic that the students were learning. 

A third group of students had access to a revised version of ChatGPT that functioned more like a tutor. This chatbot was programmed to provide hints without directly divulging the answer. The students who used it did spectacularly better on the practice problems, solving 127 percent more of them correctly compared with students who did their practice work without any high-tech aids. But on a test afterwards, these AI-tutored students did no better. Students who did their practice problems the old-fashioned way — on their own — scored just as well.

The researchers titled their paper, “Generative AI Can Harm Learning,” to make clear to parents and educators that the current crop of freely available AI chatbots can “substantially inhibit learning.” Even a fine-tuned version of ChatGPT designed to mimic a tutor doesn’t necessarily help.

The researchers believe the problem is that students are using the chatbot as a “crutch.” When they analyzed the questions that students typed into ChatGPT, students often simply asked for the answer. Students were not building the skills that come from solving the problems themselves. 

ChatGPT’s errors also may have been a contributing factor. The chatbot only answered the math problems correctly half of the time. Its arithmetic computations were wrong 8 percent of the time, but the bigger problem was that its step-by-step approach for how to solve a problem was wrong 42 percent of the time. The tutoring version of ChatGPT was directly fed the correct solutions and these errors were minimized.

A draft paper about the experiment was posted on the website of SSRN, formerly known as the Social Science Research Network, in July 2024. The paper has not yet been published in a peer-reviewed journal and could still be revised. 

This is just one experiment in another country, and more studies will be needed to confirm its findings. But this experiment was a large one, involving nearly a thousand students in grades nine through 11 during the fall of 2023. Teachers first reviewed a previously taught lesson with the whole classroom, and then their classrooms were randomly assigned to practice the math in one of three ways: with access to ChatGPT, with access to an AI tutor powered by ChatGPT or with no high-tech aids at all. Students in each grade were assigned the same practice problems with or without AI. Afterwards, they took a test to see how well they learned the concept. Researchers conducted four cycles of this, giving students four 90-minute sessions of practice time in four different math topics to understand whether AI tends to help, harm or do nothing.

ChatGPT also seems to produce overconfidence. In surveys that accompanied the experiment, students said they did not think that ChatGPT caused them to learn less even though they had. Students with the AI tutor thought they had done significantly better on the test even though they did not. (It’s also another good reminder to all of us that our perceptions of how much we’ve learned are often wrong.)

The authors likened the problem of learning with ChatGPT to autopilot. They recounted how an overreliance on autopilot led the Federal Aviation Administration to recommend that pilots minimize their use of this technology. Regulators wanted to make sure that pilots still know how to fly when autopilot fails to function correctly. 

ChatGPT is not the first technology to present a tradeoff in education. Typewriters and computers reduce the need for handwriting. Calculators reduce the need for arithmetic. When students have access to ChatGPT, they might answer more problems correctly, but learn less. Getting the right result to one problem won’t help them with the next one.

This story about using ChatGPT to practice math was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

Researchers combat AI hallucinations in math (Aug. 26, 2024)

Two University of California, Berkeley, researchers documented how they tamed AI hallucinations in math by asking ChatGPT to solve the same problem 10 times. Credit: Eugene Mymrin/ Moment via Getty Images

One of the biggest problems with using AI in education is that the technology hallucinates. That’s the word the artificial intelligence community uses to describe how its newest large language models make up stuff that doesn’t exist or isn’t true. Math is a particular land of make-believe for AI chatbots. Several months ago, I tested Khan Academy’s chatbot, which is powered by ChatGPT. The bot, called Khanmigo, told me I had answered a basic high school Algebra 2 problem involving negative exponents wrong. I knew my answer was right. After typing in the same correct answer three times, Khanmigo finally agreed with me. It was frustrating.

Errors matter. Kids could memorize incorrect solutions that are hard to unlearn, or become more confused about a topic. I also worry about teachers using ChatGPT and other generative AI models to write quizzes or lesson plans. At least a teacher has the opportunity to vet what AI spits out before giving or teaching it to students. It’s riskier when you’re asking students to learn directly from AI. 

Computer scientists are attempting to combat these errors in a process they call “mitigating AI hallucinations.” Two researchers from the University of California, Berkeley, recently documented how they successfully reduced ChatGPT’s instructional errors to near zero in algebra. They were not as successful with statistics, where their techniques still left errors 13 percent of the time. Their paper was published in May 2024 in the peer-reviewed journal PLOS One.

In the experiment, Zachary Pardos, a computer scientist at the Berkeley School of Education, and one of his students, Shreya Bhandari, first asked ChatGPT to show how it would solve an algebra or statistics problem. They discovered that ChatGPT was “naturally verbose” and they did not have to prompt the large language model to explain its steps. But all those words didn’t help with accuracy. On average, ChatGPT’s methods and answers were wrong a third of the time. In other words, ChatGPT would earn a D if it were a student.

Current AI models are bad at math because they’re built to predict probable sequences of words, not to follow rules. Math calculations are all about rules. It’s ironic: earlier versions of AI could follow rules but couldn’t write or summarize. Now we have the opposite.

The Berkeley researchers took advantage of the fact that ChatGPT, like humans, is erratic. They asked ChatGPT to answer the same math problem 10 times in a row. I was surprised that a machine might answer the same question differently, but that is what these large language models do.  Often the step-by-step process and the answer were the same, but the exact wording differed. Sometimes the methods were bizarre and the results were dead wrong. (See an example in the illustration below.)

Researchers grouped similar answers together. When they assessed the accuracy of the most common answer among the 10 solutions, ChatGPT was astonishingly good. For basic high-school algebra, AI’s error rate fell from 25 percent to zero. For intermediate algebra, the error rate fell from 47 percent to 2 percent. For college algebra, it fell from 27 percent to 2 percent. 

ChatGPT answered the same algebra question three different ways, but it landed on the right response seven out of 10 times in this example

Source: Pardos and Bhandari, “ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills,” PLOS ONE, May 2024

However, when the scientists applied this method, which they call “self-consistency,” to statistics, it did not work as well. ChatGPT’s error rate fell from 29 percent to 13 percent, but still more than one out of 10 answers was wrong. I think that’s too many errors for students who are learning math. 
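
For readers who want the gist of the method, here is a minimal sketch of that “self-consistency” approach in Python: pose the same problem several times, group the final answers, and keep the most common one. The ask_model function below is a hypothetical stand-in for whatever chatbot API call is used; this is an illustration of the idea, not the researchers’ actual code.

```python
from collections import Counter

def ask_model(question: str) -> str:
    """Hypothetical wrapper around a chatbot API call; assumed to
    return the model's final answer as a normalized string."""
    raise NotImplementedError

def self_consistent_answer(question: str, samples: int = 10) -> str:
    # Ask the same question repeatedly; the step-by-step wording may
    # vary between runs, so only the final answers are compared.
    answers = [ask_model(question) for _ in range(samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```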

The big question, of course, is whether these ChatGPT solutions help students learn math better than traditional teaching. In a second part of this study, researchers recruited 274 adults online to solve math problems and randomly assigned a third of them to see the ChatGPT solutions as a “hint” if they needed one. (ChatGPT’s wrong answers were removed first.) On a short test afterwards, these adults improved 17 percent, compared to less than 12 percent learning gains for the adults who could see a different group of hints written by undergraduate math tutors. Those who weren’t offered any hints scored about the same on a post-test as they did on a pre-test.

Those impressive learning results for ChatGPT prompted the study authors to boldly predict that “completely autonomous generation” of an effective computerized tutoring system is “around the corner.” In theory, ChatGPT could instantly digest a book chapter or a video lecture and then immediately turn around and tutor a student on it.

Before I embrace that optimism, I’d like to see how much real students – not just adults recruited online – use these automated tutoring systems. Even in this study, where adults were paid to do math problems, 120 of the roughly 400 participants didn’t complete the work, and so their results had to be thrown out. For many kids, and especially students who are struggling in a subject, learning from a computer just isn’t engaging.

This story about AI hallucinations was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: Why are kids still struggling in school four years after the pandemic? (Aug. 19, 2024)

Four years after the pandemic shuttered schools, we all want to be done with COVID. But the latest analyses from three assessment companies paint a grim picture of where U.S. children are academically, and that merits coverage. While there are isolated bright spots, the general trend is stagnation.

One report documented that U.S. students did not make progress in catching up in the most recent 2023-24 school year and slid even further behind in math and reading, exacerbating pandemic learning losses.

“At the end of 2021-22, we optimistically concluded that the worst was behind us and that recovery had begun,” wrote Karyn Lewis, a researcher at NWEA, one of the assessment companies.  “Unfortunately, data from the past two school years no longer support this conclusion. Growth has slowed to lag pre-pandemic rates, resulting in achievement gaps that continue to widen, and in some cases, now surpass what we had previously deemed as the low point.” 

The starkest example is eighth grade students, who were in fourth grade when the pandemic first erupted in March of 2020. They now need nine months of additional school to catch up, according to NWEA’s analysis, released in July 2024.  “This is a crisis moment with middle schoolers,” said Lewis. “Where are we going to find an additional year to make up for these kiddos before they leave the education system?” 

All three analyses were produced by for-profit companies that sell assessments to schools. Teachers or parents may be familiar with them by the names of their tests:  MAP, i-Ready and Star. Unlike annual state tests, these interim assessments are administered at least twice a year to millions of students around the nation to help track progress, or learning, during the year. These companies may have a business motive in sounding an alarm to sell more of their products, but the reports are produced by well-regarded education statisticians.

Curriculum Associates did not detect as much deterioration as NWEA, but did find widespread stagnation in 2023-24, according to a report released on August 19, 2024. Their researcher Kristen Huff described the numerical differences as tiny, stemming from the fact that these are different tests, taken by different students, analyzed with different methods for crunching the numbers. The main takeaway from all the reports, she said, is the same. “As a nation, we are still seeing the lasting impact of the disruption to schooling and learning,” said Huff, vice president of assessment and research at Curriculum Associates.

In short, children remain behind and haven’t recovered. That matters for these children’s future employment prospects and standard of living. Ultimately, a less productive labor force could hamper the U.S. economy, according to projections from economists and consulting firms. 

It’s important to emphasize that individual students haven’t regressed; they don’t know less now than they used to. The average sixth grader knows more today in 2024 than he or she did in first grade in 2019. But the pace of learning, or rate of academic growth, has been rocky since 2020, with some students missing many months of instruction. Sixth graders in 2024, on average, know far less than sixth graders did back in 2019.

Renaissance, a third company, found a mottled pattern of recovery, stagnation and deterioration depending upon the grade and the subject. (The company shared its preliminary mid-year results with me via email on Aug. 14, 2024.) Most concerning, it found that the math skills of older students in grades eight to 12 are progressing so slowly that they are even further behind than they were after the initial pandemic losses. These students were in grades four through eight when the pandemic first hit in March 2020.

On the bright side, the Renaissance analysis found that first grade students in 2023-24 had completely recovered and their performance matched what first graders used to be able to do before the pandemic. Elementary school students in grades two to six were making slow progress, and remained behind.

Curriculum Associates pointed to two unexpected bright spots in its assessment results. One is phonics. At the end of the 2023-24 school year, nearly as many kindergarteners were on grade level for phonics skills as kindergarteners in 2019. That’s four out of five kindergarteners. The company also found that schools where the majority of students are Black were showing relatively better catch-up progress. “It’s small, and disparities still exist, but it’s a sign of hope,” said Curriculum Associates’s Huff. 

Here are three charts and tables from the three different testing companies that provide different snapshots of where we are.

Months of additional school required to catch up to pre-pandemic achievement levels on NWEA’s MAP tests

The bars show the difference between MAP test scores before the pandemic and in the spring of 2024 for each grade. The green line translates those deficits into months of additional schooling, based on how much students typically learned in a school year before COVID hit.  For example, fifth graders would need an additional 3.9 months of math instruction over and above the usual school year to catch up to where fifth graders were before the virus. Source: Figure 3 “Recovery still elusive: 2023–24 student achievement highlights persistent achievement gaps and a long road ahead,” NWEA (July 2024). 
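
As a back-of-the-envelope illustration of that conversion — not NWEA’s actual methodology or scale — the arithmetic looks like the sketch below. The score numbers are invented to reproduce the 3.9-month fifth grade math example in the caption.

```python
def months_to_catch_up(score_deficit: float,
                       typical_annual_growth: float,
                       months_per_school_year: float = 9.0) -> float:
    """Convert a test-score deficit into months of additional schooling,
    assuming learning is spread evenly across a 9-month school year."""
    growth_per_month = typical_annual_growth / months_per_school_year
    return score_deficit / growth_per_month

# Illustrative numbers only: a 4.3-point deficit against typical
# pre-pandemic growth of 10 points per school year works out to
# roughly 3.9 months of extra instruction.
print(round(months_to_catch_up(4.3, 10.0), 1))  # 3.9
```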

Percentage of students below grade level by grade and year according to Curriculum Associates’s i-Ready tests

Almost one out of every five third graders is below grade level in reading, a big increase from one out of every eight students before the pandemic. Source: Figure 2, “State of Student Learning in 2024” Curriculum Associates (Aug. 19, 2024)

The number of students who are below grade level in math is higher than it used to be before the pandemic in grades one through eight. Source: Figure 11, “State of Student Learning in 2024” Curriculum Associates (Aug. 19, 2024)

Catch-up progress as of the winter of 2023-24, according to Renaissance, maker of the Star assessments

Renaissance analysis of Star tests taken between December 2023 and March 2024 (shared with The Hechinger Report in August 2024). Final spring scores were not yet analyzed.

Understanding why recovery is stagnating and sometimes worsening over the past year is difficult. These test score analyses don’t offer explanations, but researchers shared a range of theories. 

One is that once students have a lot of holes in their foundational skills, it’s really hard for them to learn new grade-level topics each year. 

“I think this is a problem that’s growing and building on itself,” said NWEA’s Lewis. She cited the example of a sixth grader who is still struggling to read. “Does a sixth-grade teacher have the same skills and tools to teach reading that a second or third grade teacher does? I doubt that’s the case.”

Curriculum Associates’s Huff speculated that the whole classroom changes when a high percentage of students are behind. A teacher may have been able to give more individual attention to a small group of students who are struggling, but it’s harder to attend to individual gaps when so many students have them. It’s also harder to keep up with the traditional pace of instruction when so many students are behind. 

One high school math teacher told me that she thinks learning failed to recover and continued to deteriorate because schools didn’t rush to fill the gaps right away. This teacher said that when in-person school resumed in her city in 2021, administrators discouraged her from reviewing old topics that students had missed and told her to move forward with grade-level material. 

“The word that was going around was ‘acceleration not remediation’,” the teacher said. “These kids just missed 18 months of school. Maybe you can do that in social studies. But math builds upon itself. If I miss sixth, seventh and eighth grade, how am I going to do quadratic equations? How am I going to factor? The worst thing they ever did was not provide that remediation as soon as they walked back in the door.” This educator quit her public school teaching job in 2022 and has since been tutoring students to help them catch up from pandemic learning losses.

Chronic absenteeism is another big factor. If you don’t show up to school, you’re not likely to catch up. More than one in four students in the 2022-23 school year were chronically absent, missing at least 10 percent of the school year. 

Deteriorating mental health is also a leading theory for school struggles. A study by researchers at the University of Southern California, released Aug. 15, 2024, documented widespread psychological distress among teenage girls and preteen boys since the pandemic. Preteen boys were likely to struggle with hyperactivity, inattentiveness and conduct, such as losing their temper and fighting. These mental health struggles correlated with absenteeism and low grades. 

It’s easy to jump to the conclusion that the $190 billion that the federal government gave to schools for pandemic recovery didn’t work. (The deadline for signing contracts to spend whatever is left of that money is September 2024.) But that doesn’t tell the whole story. Most of the spending was targeted at reopening schools and upgrading heating, cooling and air ventilation systems. A much smaller amount went to academic recovery, such as tutoring or summer school. Earlier this summer two separate groups of academic researchers concluded that this money led to modest academic gains for students. The problem is that so much more is still needed.

This story about academic recovery was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: Nearly 6 out of 10 middle and high school grades are wrong, study finds (Aug. 12, 2024)

If we graded schools on how accurately they grade students, they’d fail. Nearly six out of 10 course grades are inaccurate, according to a new study of grades that teachers gave to 22,000 middle and high school students in 2022 and 2023.

The Equitable Grading Project, a nonprofit organization that seeks to change grading practices, compared 33,000 course grades with students’ scores on standardized exams, including Advanced Placement tests and annual state assessments. The organization considered a course grade to be inaccurate if a student’s test score indicated a level of knowledge that was at least a letter grade off from what the teacher had issued. For example, a grade was classified as inaccurate if a student’s test score indicated a C-level of skills and knowledge, but the student received an A or a B in the course. In this example, a D or an F grade would also be inaccurate.
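
To make that classification rule concrete, here is a minimal sketch, assuming plain letter grades on a simple ordinal scale (no plus or minus grades). The mapping and thresholds are illustrative, not the study’s exact method.

```python
GRADE_ORDER = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}

def classify_grade(course_grade: str, test_level: str) -> str:
    """Compare a transcript grade with the letter-grade level implied
    by the student's standardized test score."""
    gap = GRADE_ORDER[course_grade] - GRADE_ORDER[test_level]
    if gap >= 1:
        return "inflated"    # course grade at least a full letter above the test
    if gap <= -1:
        return "depressed"   # course grade at least a full letter below the test
    return "accurate"

# The example from the text: a test indicating C-level mastery makes
# an A or B inflated, and a D or F depressed.
assert classify_grade("A", "C") == "inflated"
assert classify_grade("D", "C") == "depressed"
assert classify_grade("C", "C") == "accurate"
```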

Inflated grades were more common than depressed grades. In this analysis, over 40 percent of the 33,000 grades analyzed – more than 13,000 transcript grades – were higher than they should have been, while only 16 percent, or 5,300 grades, were lower than they should have been. In other words, two out of five transcript grades indicated that students were more competent in the course than they actually were, while nearly one out of six grades was lower than the student’s true understanding of the course content.

FRPL refers to low-income students whose families qualify for the national free or reduced price lunch program. Source: Equitable Grading Project, “Can We Trust the Transcript?” July 2024.

The discrepancy matters, the white paper says, because inaccurate grades make it harder to figure out which students are prepared for advanced coursework or ready for college. With inflated grades, students can be promoted to difficult courses without the foundation or extra help they need to succeed. Depressed grades can discourage a student from pursuing a subject or prompt them to drop out of school altogether. 

“This data suggests that hundreds, perhaps thousands, of students in this study may have been denied, or not even offered, opportunities that they were prepared and eligible for,” the white paper said.

This analysis is evidence that widespread grade inflation, which has also been documented by the ACT, the National Center for Education Statistics and independent scholars, has persisted through 2023. In this transcript analysis, grade inflation occurred more frequently for Black and Hispanic students than Asian and white students. It was also more common for low-income students. 

Large discrepancies were documented. Almost 4,800 of the inflated grades were two letters higher than the student’s test score would indicate. An AP exam might have indicated a D-level of mastery, but the student earned a B in the class. On the flip side, more than 1,000 students received grades that were two letter grades lower than their assessment score. 

The report rejected the possibility that test anxiety is the main culprit for such widespread and large discrepancies, and laid out a list of other reasons why grades don’t reflect a student’s skills and content mastery. Some teachers feel pressure from parents and school administrators to give high grades. Many teachers factor in participation, behavior and handing in homework assignments – things that have little to do with what a student has learned or knows. Meanwhile, grades can be depressed when teachers make deductions for late work or when students fail to turn in assignments. Group projects that are weighted heavily in the final grade can swing a student’s final transcript grade up or down. In the report, one superintendent described how teachers in his district awarded students points toward their grade based on whether their parents attended Back to School Night.

Reasonable people can debate how much grades should be used to promote good behavior. The Equitable Grading Project argues that schools should use other rewards and consequences, and keep grades tied to academic achievement. 

However, solutions aren’t quick or easy. The organization worked with over 260 teachers during the 2022-23 school year to implement a version of “mastery-based grading,” which excludes homework, class assignments and student behavior from the final grade, but uses a range of assessments – not only tests and papers – to ascertain a student’s proficiency. Teachers were encouraged to allow students multiple retakes.  After five workshops and four coaching sessions, teachers’ grading accuracy improved by only 3 percentage points, from 37.6 percent of their grades accurately reflecting student proficiency to 40.6 percent. 

Part of the challenge may be changing the minds of teachers, who tend to think that their own grades are fine but the problem lies with their colleagues. In a survey of almost 1,200 teachers that accompanied this quantitative study, more than 4 out of 5 teachers agreed or somewhat agreed that their grades accurately reflect student learning and academic readiness. But nearly half of those same teachers doubted the accuracy of grades assigned by other teachers in their own school and department.

Grading practices are an area where schools and teachers could really use some research on what works. I’ll be keeping my eye out for solutions with evidence behind them. 

This story about the Equitable Grading Project was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: A little parent math talk with kids might really add up, a new body of education research suggests (Aug. 5, 2024)

Parents know they should talk and read to their young children. Dozens of nonprofit organizations have promoted the research evidence that it will help their children do better in school.

But the focus has been on improving literacy. Are there similar things that parents can do with their children to lay the foundation for success in math? 

That’s important because Americans struggle with math, ranking toward the bottom on international assessments. Weak math skills impede a child’s progress later in life, preventing them from getting through college, a vocational program or even high school. Math skills, or the lack of them, can open or close the doors to lucrative science and technology fields.

A new wave of research over the past decade has looked at how much parents talk about numbers and shapes with their children, and whether these spontaneous and natural conversations help children learn the subject. Encouraging parents to talk about numbers could be a cheap and easy way to improve the nation’s dismal math performance. 

Researchers from the University of Pittsburgh and the University of California, Irvine, teamed up to summarize the evidence from 22 studies conducted between 2010 and 2022. Their meta-analysis was published in the July 2024 issue of the Journal of Experimental Child Psychology.

Here are four takeaways:

There’s a link between parent math talk and higher math skills

After looking at 22 studies, researchers found that the more parents talked about math with their children, the stronger their children’s math skills. In these studies, researchers typically observed parents and children interacting in a university lab, a school, a museum or at home and kept track of how often parents mentioned numbers or shapes. Ordinary sentences that included numbers counted. An example could be: “Hand me three potato chips.” Researchers also gave children a math test and found that children who scored higher tended to have parents who talked about math more during the observation period. 

The link between parents’ math talk and a child’s math skills was strongest between ages three and five. During these preschool years, parents who talked more about numbers and shapes tended to have children with higher math achievement. Parents who didn’t talk as much about numbers and shapes tended to have children with lower math achievement. 

With older children, the amount of time that parents spent talking about math was not as closely related to their math achievement. Researchers speculated that this was because once children start school, their math abilities are influenced more by the instruction they receive from their teachers. 

None of these studies proves that talking to your preschooler about math causes their math skills to improve. Parents who talk more about math may also have higher incomes and more education. Stronger math skills could be the result of all the other things that wealthier and more educated parents are giving their kids –  nutritious meals, a good night’s sleep, visits to museums and vacations –  and not the math talk per se. So far, studies haven’t been able to disentangle math talk from everything else that parents do for their children.

“What the research is showing at this point is that talking more about math tends to be associated with better outcomes for children,” said Alex Silver, a psychologist at the University of Pittsburgh who led the meta-analysis. “It’s an easy way to bring math concepts into your day to day life that doesn’t require buying special equipment, or setting aside time to tutor your child and try to teach them arithmetic.” 

Keep it natural

The strongest link between parent talk about math and a child’s math performance was detected when researchers didn’t tell parents to do a math activity. Parents who naturally brought up numbers or shapes in a normal conversation had children who scored higher on math assessments. When researchers had parents do a math exercise with children, the amount of math-related words that a parent used wasn’t as strongly associated with better math performance for their children. 

Silver, a postdoctoral research associate at the University of Pittsburgh’s Learning Research & Development Center, recommends bringing math into something that the child is paying attention to, rather than doing flashcards or workbooks. It could be as simple as asking  “How many?” Here’s an example Silver gave me:  “Oh, look, you have a whole lot of cars. How many cars do you have? Let’s count them. You have one, two, three. There’s three cars there.”

When you’re doing a puzzle together, turn the shape in a different direction and talk about what it looks like. Setting the dinner table, grocery shopping and keeping track of money are opportunities to talk about numbers or shapes.

“The idea is to make it fun and playful,” said Silver. “As you’re cooking, say, ‘We need to add two eggs. Oh wait, we’re doubling the recipe, so we need two more eggs. How many is that all together?’ ”

I asked Silver about the many early childhood math apps and exercises on the market, and whether parents should be spending time doing them with their children. Silver said they can be helpful for parents who don’t know where to start, but she said parents shouldn’t feel guilty if they’re not doing math drills with their kids. “It’s enough to just talk about it naturally, to find ways to bring up numbers and shapes in the context of what you’re already doing.”

Quality may matter more than quantity

In the 22 studies, more math talk was associated with higher math achievement. But researchers are unable to advise parents on exactly how much or how often to talk about math during the day. Silver said 10 utterances a day about math is probably more beneficial than just one mention a day. “Right now the evidence is that more is better, but at some point it’s so much math, you need to talk about something else now,” she said. The point of diminishing returns is unknown.

Ultimately, the quantity of math talk may not be as important as how parents talk about math, Silver said. Reading a math textbook to your child probably wouldn’t be helpful, she added; it’s not just about saying a bunch of math words. Still, researchers don’t know if asking questions or just talking about numbers is what makes a difference. It’s also not clear how important it is to tailor the number talk to where a child is in his or her math development. These are important areas of future research.

Technology may help. The latest studies are using wearable audio recorders, enabling researchers to “listen” to hours of conversations inside homes, and analyzing these conversations with natural language processing algorithms to get a more accurate understanding of parents’ math talk. The 22 studies in this meta-analysis captured as little as three minutes and as much as almost 14 hours of parent-child interactions, and these snippets of life, often recorded in a lab setting, may not reflect how parents and children talk about math in a typical week.

Low-income kids appear to benefit as much from math talk as high-income kids

Perhaps the most inspiring conclusion from this meta-analysis is that the association between a parent’s math talk and a child’s math performance was as strong for a low-income child as it was for a high-income child. 

“That’s a happy thing to see that this transcends other circumstances,” said Silver. “Targeting the amount of math input that a child receives is hopefully going to be easier, and more malleable than changing broader, systemic challenges.”

While there are many questions left to answer, Silver is already putting her research into practice with her own three-year-old son. She’s asked counting questions so many times that her little one has begun to tease her. Every time he sees a group of things, he pretends to be Mommy and asks, “How many? Let’s count them!”

“It’s very funny,” Silver said. “I’m like, ‘Wow, Mommy really drilled that one into you, huh?’ Buddy knows what you’re up to.”

This story about math with preschoolers was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: New studies of online tutoring highlight troubles with attendance and larger tutoring groups
https://hechingerreport.org/proof-points-studies-online-tutoring-troubles-attendance-larger-groups/ (Mon, 15 Jul 2024)

Ever since the pandemic shut down schools in the spring of 2020, education researchers have pointed to tutoring as the most promising way to help kids catch up academically. Evidence from almost 100 studies overwhelmingly favored a particular kind of tutoring, called high-dosage tutoring, in which students focus on either reading or math three to five times a week.

But until recently, there has been little good evidence for the effectiveness of online tutoring, where students and tutors interact via video, text chat and whiteboards. The virtual version has boomed since the federal government handed schools nearly $190 billion of pandemic recovery aid and specifically encouraged them to spend it on tutoring. Now, some new U.S. studies could offer useful guidance to educators.

Online attendance is a struggle

In the spring of 2023, almost 1,000 Northern California elementary school children in grades 1 to 4 were randomly assigned to receive online reading tutoring during the school day. Students were supposed to get 20 to 30 sessions each, but only one in five students received that many. The other 80 percent didn’t, and they didn’t do much better than the 800 students in the comparison group who got no tutoring, according to a draft paper by researchers from Teachers College, Columbia University, which was posted to the Annenberg Institute website at Brown University in April 2024. (The Hechinger Report is an independent news organization based at Teachers College, Columbia University.)

Researchers have previously found that it is important to schedule in-person tutoring sessions during the school day, when attendance is mandatory. The lesson here with online tutoring is that attendance can be rocky even during the school day. Often, students end up with a low dose of tutoring instead of the high dose that schools have paid for.

However, online tutoring can be effective when students participate regularly. In this Northern California study, reading achievement increased substantially, in line with in-person tutoring, for the roughly 200 students who got at least 20 sessions across 10 weeks.

The students who logged in regularly might have been more motivated in the first place, the researchers warned, which suggests it could be hard to reproduce such large academic benefits for all students. During the periods when children were supposed to receive tutoring, researchers observed that some children – often slightly higher-achieving ones – regularly logged on as scheduled while others didn’t. The paper doesn’t explain this difference in behavior or what the absent students were doing instead. Students also seemed to log in more frequently when certain staff members were overseeing the tutoring and less frequently with others.

Small group tutoring doesn’t work as well online

The large math and reading gains that researchers documented in small groups of students with in-person tutors aren’t always translating to the virtual world. 

Another study of more than 2,000 elementary school children in Texas tested the difference between one-to-one and two-to-one online tutoring during the 2022-23 school year. These were young, low-income children, in kindergarten through 2nd grade, who were just learning to read. Children who were randomly assigned to get one-to-one tutoring four times a week posted small gains on one test, but not on another, compared to students in a comparison group who didn’t get tutoring. First graders assigned to one-to-one tutoring gained the equivalent of 30 additional days of school. By contrast, children who had been tutored in pairs were statistically no different in reading than the comparison group of untutored children. A draft paper about this study, led by researchers from Stanford University, was posted to the Annenberg website in May 2024. 
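
How does a test-score gain become “30 additional days of school”? The draft paper’s exact effect size isn’t quoted in this story, but the conversion researchers typically use works roughly like the back-of-envelope sketch below, in which the 0.25 standard deviation effect and the 1.5 standard deviation annual-growth benchmark are illustrative assumptions, not figures from the study.

    # Back-of-envelope: converting an effect size into "days of learning."
    # Both inputs below are illustrative assumptions, not figures from the study.
    SCHOOL_YEAR_DAYS = 180
    annual_growth_sd = 1.5   # assumed typical first-grade annual reading growth, in standard deviations
    effect_size_sd = 0.25    # hypothetical tutoring effect

    extra_days = effect_size_sd / annual_growth_sd * SCHOOL_YEAR_DAYS
    print(round(extra_days))  # -> 30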

Another small study, in Grand Forks, North Dakota, confirmed the downside of larger groups with online tutoring. Researchers from Brown University directly compared the math progress of middle school students who received one-to-one tutoring with that of students tutored in groups of three. The study was too small – only 180 students – to get statistically strong results, but the half who were randomly assigned to individual tutoring appeared to gain eight extra percentile points compared with the students assigned to small group tutoring. The researchers estimated that students in the groups of three may have learned only a third as much math as the individually tutored students, and possibly much less. A draft of this paper was posted to the Annenberg website in June 2024.

In surveys, tutors said it was hard to keep all three kids engaged online at once. Students were more frequently distracted and off-task, they said.  Shy students were less likely to speak up and participate.  With one student at a time, tutors said they could move at a faster pace and students “weren’t afraid to ask questions” or “afraid of being wrong.” (On the plus side, tutors said groups of three allowed them to organize group activities or encourage a student to help a peer.)

Behavior problems happen in person too. However, when I have observed in-person small group tutoring in schools, each student is often working independently with the tutor, almost like three simultaneous sessions of one-to-one help. In-person tutors can encourage a student to keep practicing with a silent glance, a smile or a hand signal even as they explain something to another student. Online, each child’s work and mistakes are publicly exposed on the screen to the whole group. Private asides aren’t as easy; some platforms allow the tutor to text a child privately in a chat window, but that takes time. Tutors have told me that many teens don’t like seeing their own faces on screen, but turning the cameras off makes it harder for tutors to sense whether a student is following along or confused.

Matt Kraft, one of the Brown researchers on the Grand Forks study, suggests that online tutoring lessons need bigger changes in order to expand from one-to-one to small groups, and he notes that school staff are needed in the classroom to keep students on task.

School leaders have until March 2026 to spend the remainder of their $190 billion in pandemic recovery funds, but contracts with tutoring vendors must be signed by September 2024. Both options — in person and virtual — involve tradeoffs. New research evidence shows that virtual tutoring can work well, especially when motivated students want the tutoring and log in regularly. But many of the students who are significantly behind grade level and in need of extra help may not be so motivated. Keeping online tutoring small, ideally one-to-one, improves the chances that it will be effective, but that means serving far fewer students and leaving millions of children behind. It’s a tough choice.

This story about online tutoring was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: Asian American students lose more points in an AI essay grading study — but researchers don’t know why
https://hechingerreport.org/proof-points-asian-american-ai-bias/ (Mon, 08 Jul 2024)

When ChatGPT was released to the public in November 2022, advocates and watchdogs warned about the potential for racial bias. The new large language model was created by harvesting 300 billion words from books, articles and online writing, which include racist falsehoods and reflect writers’ implicit biases. Biased training data is likely to generate biased advice, answers and essays. Garbage in, garbage out. 

Researchers are starting to document how AI bias manifests in unexpected ways. Inside the research and development arm of the giant testing organization ETS, which administers the SAT, a pair of investigators pitted man against machine in evaluating more than 13,000 essays written by students in grades 8 to 12. They discovered that the AI model that powers ChatGPT penalized Asian American students more than other races and ethnicities in grading the essays. This was purely a research exercise and these essays and machine scores weren’t used in any of ETS’s assessments. But the organization shared its analysis with me to warn schools and teachers about the potential for racial bias when using ChatGPT or other AI apps in the classroom.

AI and humans scored essays differently by race and ethnicity

“Diff” is the difference between the average score given by humans and GPT-4o in this experiment. “Adj. Diff” adjusts this raw number for the randomness of human ratings. Source: Table from Matt Johnson & Mo Zhang “Using GPT-4o to Score Persuade 2.0 Independent Items” ETS (June 2024 draft)

“Take a little bit of caution and do some evaluation of the scores before presenting them to students,” said Mo Zhang, one of the ETS researchers who conducted the analysis. “There are methods for doing this and you don’t want to take people who specialize in educational measurement out of the equation.”

That might sound self-serving, coming from an employee of a company that specializes in educational measurement. But Zhang’s advice is worth heeding amid the excitement to try new AI technology. There are potential dangers as teachers save time by offloading grading work to a robot.

In ETS’s analysis, Zhang and her colleague Matt Johnson fed 13,121 essays into one of the latest versions of the AI model that powers ChatGPT, called GPT-4 Omni, or simply GPT-4o. (This version was added to ChatGPT in May 2024, but when the researchers conducted this experiment they accessed the latest AI model through a different portal.)

A little background about this large bundle of essays: students across the nation had originally written these essays between 2015 and 2019 as part of state standardized exams or classroom assessments. Their assignment had been to write an argumentative essay, such as “Should students be allowed to use cell phones in school?” The essays were collected to help scientists develop and test automated writing evaluation.

Each of the essays had been graded by expert raters of writing on a 1-to-6 scale, with 6 being the highest score. ETS asked GPT-4o to score them on the same six-point scale using the same scoring guide that the humans used. Neither man nor machine was told the race or ethnicity of the student, but researchers could see students’ demographic information in the datasets that accompany these essays.

GPT-4o marked the essays almost a point lower than the humans did. The average score across the 13,121 essays was 2.8 for GPT-4o and 3.7 for the humans. But Asian Americans were docked an additional quarter point. Human evaluators gave Asian Americans a 4.3, on average, while GPT-4o gave them only a 3.2 – roughly a 1.1-point deduction. By contrast, the gap between human and GPT-4o scores was only about 0.9 points for white, Black and Hispanic students. Imagine an ice cream truck that kept shaving an extra quarter scoop off only the cones of Asian American kids.
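
The underlying comparison is simple to compute. Here is a minimal sketch of the subgroup arithmetic, with made-up rows and assumed column names; the actual ETS analysis also adjusts these raw gaps for the randomness of human ratings, which this sketch does not attempt.

    # Minimal sketch of the human-vs-AI score gap by subgroup. The rows are
    # made up and the column names are assumptions; ETS also adjusts raw gaps
    # for the randomness of human ratings, which this sketch does not do.
    import pandas as pd

    scores = pd.DataFrame({
        "human": [4, 5, 6, 3, 4, 2],
        "gpt4o": [3, 4, 4, 3, 3, 2],
        "group": ["Asian American", "Asian American", "Asian American",
                  "White", "Hispanic", "Black"],
    })
    scores["diff"] = scores["human"] - scores["gpt4o"]  # positive = AI scored lower
    print(scores.groupby("group")["diff"].mean())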

“Clearly, this doesn’t seem fair,” wrote Johnson and Zhang in an unpublished report they shared with me. Though the extra penalty for Asian Americans wasn’t terribly large, they said, it’s substantial enough that it shouldn’t be ignored. 

The researchers don’t know why GPT-4o issued lower grades than humans, and why it gave an extra penalty to Asian Americans. Zhang and Johnson described the AI system as a “huge black box” of algorithms that operate in ways “not fully understood by their own developers.” That inability to explain a student’s grade on a writing assignment makes the systems especially frustrating to use in schools.

This table compares GPT-4o scores with human scores on the same batch of 13,121 student essays, which were scored on a 1-to-6 scale. Numbers highlighted in green show exact score matches between GPT-4o and humans. Unhighlighted numbers show discrepancies. For example, there were 1,221 essays where humans awarded a 5 and GPT awarded 3. Data source: Matt Johnson & Mo Zhang “Using GPT-4o to Score Persuade 2.0 Independent Items” ETS (June 2024 draft)

This one study isn’t proof that AI consistently underrates essays or is biased against Asian Americans. Other versions of AI sometimes produce different results. A separate analysis of essay scoring by researchers from the University of California, Irvine and Arizona State University found that AI essay grades were just as frequently too high as they were too low. That study, which used GPT-3.5, did not scrutinize results by race and ethnicity.

I wondered if AI bias against Asian Americans was somehow connected to high achievement. Just as Asian Americans tend to score high on math and reading tests, Asian Americans, on average, were the strongest writers in this bundle of 13,000 essays. Even with the penalty, Asian Americans still had the highest essay scores, well above those of white, Black, Hispanic, Native American or multi-racial students. 

In both the ETS and UC-ASU essay studies, AI awarded far fewer perfect scores than humans did. For example, in this ETS study, humans awarded 732 perfect 6s, while GPT-4o gave out a grand total of only three. GPT’s stinginess with perfect scores might have affected a lot of Asian Americans who had received 6s from human raters.

ETS’s researchers had asked GPT-4o to score the essays cold, without showing the chatbot any graded examples to calibrate its scores. It’s possible that a few sample essays or small tweaks to the grading instructions, or prompts, given to ChatGPT could reduce or eliminate the bias against Asian Americans. Perhaps the robot would be fairer to Asian Americans if it were explicitly prompted to “give out more perfect 6s.” 
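
If researchers or teachers wanted to test that hypothesis, the usual approach is “few-shot” prompting: include a handful of human-scored sample essays in the prompt before asking for a new score. The sketch below shows the general idea using OpenAI’s Python library; the rubric text, sample essays and model choice are placeholders, and this is not the setup ETS used.

    # A generic few-shot calibration sketch, not ETS's setup. The rubric and
    # sample essays are placeholders; assumes OPENAI_API_KEY is set.
    from openai import OpenAI

    client = OpenAI()
    rubric = "Score the essay from 1 to 6 using this scoring guide: ..."
    examples = [("<sample essay scored by humans>", 6), ("<another sample>", 3)]
    shots = "\n\n".join(f"Essay: {text}\nScore: {score}" for text, score in examples)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": rubric},
            {"role": "user", "content": f"{shots}\n\nEssay: <new student essay>\nScore:"},
        ],
    )
    print(response.choices[0].message.content)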

The ETS researchers told me this wasn’t the first time that they’ve noticed Asian students treated differently by a robo-grader. Older automated essay graders, which used different algorithms, have sometimes done the opposite, giving Asians higher marks than human raters did. For example, an ETS automated scoring system developed more than a decade ago, called e-rater, tended to inflate scores for students from Korea, China, Taiwan and Hong Kong on their essays for the Test of English as a Foreign Language (TOEFL), according to a study published in 2012. That may have been because some Asian students had memorized well-structured paragraphs, while humans easily noticed that the essays were off-topic. (The ETS website says it relies on the e-rater score alone only for practice tests, and uses it in conjunction with human scores for actual exams.)

Asian Americans also garnered higher marks from an automated scoring system created during a coding competition in 2021 and powered by BERT, which had been the most advanced algorithm before the current generation of large language models, such as GPT. Computer scientists put their experimental robo-grader through a series of tests and discovered that it gave higher scores than humans did to Asian Americans’ open-response answers on a reading comprehension test. 

It was also unclear why BERT sometimes treated Asian Americans differently. But it illustrates how important it is to test these systems before we unleash them in schools. Based on educator enthusiasm, however, I fear this train has already left the station. In recent webinars, I’ve seen many teachers post in the chat window that they’re already using ChatGPT, Claude and other AI-powered apps to grade writing. That might be a time saver for teachers, but it could also be harming students. 

This story about AI bias was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: Some of the $190 billion in pandemic money for schools actually paid off
https://hechingerreport.org/proof-points-190-billion-question-partially-answered/ (Mon, 01 Jul 2024)

Reports about schools squandering their $190 billion in federal pandemic recovery money have been troubling. Many districts spent that money on things that had nothing to do with academics, particularly building renovations. Less common, but more eye-popping, were stories about new football fields, swimming pool passes, hotel rooms at Caesars Palace in Las Vegas and even the purchase of an ice cream truck.

So I was surprised that two independent academic analyses released in June 2024 found that some of the money actually trickled down to students and helped them catch up academically.  Though the two studies used different methods, they arrived at strikingly similar numbers for the average growth in math and reading scores during the 2022-23 school year that could be attributed to each dollar of federal aid. 

One of the research teams, which includes Harvard University economist Tom Kane and Stanford University sociologist Sean Reardon, likened the gains to six days of learning in math and three days of learning in reading for every $1,000 in federal pandemic aid per student. Though that gain might seem small, high-poverty districts received an average of $7,700 per student, and those extra “days” of learning for low-income students added up. Still, these neediest children were projected to remain a third of a grade level behind where low-income students stood in 2019, before the pandemic disrupted education.
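
The arithmetic behind that “added up” claim is straightforward. Using the study’s own per-$1,000 estimates and the $7,700 average for high-poverty districts:

    # Days of learning implied by the Harvard-Stanford estimates for a
    # high-poverty district receiving the average $7,700 per student.
    aid_per_student = 7_700
    math_days_per_1000 = 6
    reading_days_per_1000 = 3

    math_days = aid_per_student / 1_000 * math_days_per_1000
    reading_days = aid_per_student / 1_000 * reading_days_per_1000
    print(f"math: {math_days:.0f} days, reading: {reading_days:.0f} days")
    # -> math: 46 days, reading: 23 days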

“Federal funding helped and it helped kids most in need,” wrote Robin Lake, director of the Center on Reinventing Public Education, on X in response to the two studies. Lake was not involved in either report, but has been closely tracking pandemic recovery. “And the spending was worth the gains,” Lake added. “But it will not be enough to do all that is needed.” 

The academic gains per aid dollar were close to what previous researchers had found for increases in school spending. In other words, federal pandemic aid for schools has been just as effective (or ineffective) as other infusions of money for schools. The Harvard-Stanford analysis calculated that the seemingly small academic gains per $1,000 could boost a student’s lifetime earnings by $1,238 – not a dramatic payoff, but not a public policy bust either. And that payoff doesn’t include other societal benefits from higher academic achievement, such as lower rates of arrests and teen motherhood. 

The most interesting nuggets from the two reports, however, were how the academic gains varied wildly across the nation. That’s not only because some schools used the money more effectively than others but also because some schools got much more aid per student.

The poorest districts in the nation, where 80 percent or more of the students live in families whose income is low enough to qualify for the federally funded school lunch program, demonstrated meaningful recovery because they received the most aid. About 6 percent of the 26 million public schoolchildren that the researchers studied are educated in districts this poor. These children had recovered almost half of their pandemic learning losses by the spring of 2023. The very poorest districts, representing 1 percent of the children, were potentially on track for an almost complete recovery in 2024 because they tended to receive the most aid per student. However, these students were far below grade level before the pandemic, so their recovery brings them back to a very low rung.

Some high-poverty school districts received much more aid per student than others. At the top end of the range, students in Detroit received about $26,000 each – $1.3 billion spread among fewer than 49,000 students. One in 10 high-poverty districts received more than $10,700 for each student. An equal number of high-poverty districts received less than $3,700 per student. These surprising differences for places with similar poverty levels occurred because pandemic aid was allocated according to the same byzantine rules that govern federal Title I funding to low-income schools. Those formulas give large minimum grants to small states, and more money to states that spend more per student. 

On the other end of the income spectrum are wealthier districts, where 30 percent or fewer students qualify for the lunch program, representing about a quarter of U.S. children. The Harvard-Stanford researchers expect these students to make an almost complete recovery. That’s not because of federal recovery funds; these districts received less than $1,000 per student, on average. Researchers explained that these students are on track to approach 2019 achievement levels because they didn’t suffer as much learning loss. Wealthier families also had the means to hire tutors or the time to help their children at home.

Middle-income districts, where between 30 percent and 80 percent of students are eligible for the lunch program, were caught in between. Roughly seven out of 10 children in this study fall into this category. Their learning losses were sometimes large, but their pandemic aid wasn’t. They tended to receive between $1,000 and $5,000 per student. Many of these students are still struggling to catch up.

In the second study, researchers Dan Goldhaber of the American Institutes for Research and Grace Falken of the University of Washington estimated that schools around the country, on average, would need an additional $13,000 per student for full recovery in reading and math.  That’s more than Congress appropriated.

There were signs that schools targeted interventions to their neediest students. In school districts that separately reported performance for low-income students, these students tended to post greater recovery per dollar of aid than wealthier students, the Goldhaber-Falken analysis shows.

Impact differed more sharply by race, location and school spending. Districts with larger shares of white students tended to make greater achievement gains per dollar of federal aid than districts with larger shares of Black or Hispanic students. Small towns tended to produce more academic gains per dollar of aid than large cities. And school districts that spend less on education per pupil tended to see more academic gains per dollar of aid than high spenders. The latter makes sense: an extra dollar added to a small budget makes a bigger difference than an extra dollar added to a large budget.

The most frustrating part of both reports is that we have no idea what schools did to help students catch up. Researchers weren’t able to connect the academic gains to tutoring, summer school or any of the other interventions that schools have been trying. Schools still have until September to decide how to spend their remaining pandemic recovery funds, and, unfortunately, these analyses provide zero guidance.

And maybe some of the non-academic things that schools spent money on weren’t so frivolous after all. A draft paper circulated by the National Bureau of Economic Research in January 2024 calculated that school spending on basic infrastructure, such as air conditioning and heating systems, raised test scores. Spending on athletic facilities did not. 

Meanwhile, the final score on pandemic recovery for students is still to come. I’ll be looking out for it.

This story about federal funding for education was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.
