Math A Panel Final Report
I. Introduction
Each member of the Math A Panel is very passionate about the importance of learning
mathematics. Each member of the Panel either has taught, is teaching, or is using
mathematics in his/her professional position on a regular basis. We are all lovers of
mathematics, and we want our children (which we define to be all of New York State's children) to be proficient in mathematics. Each member of this Panel applauds the efforts of the Board of Regents and the Commissioner of Education to provide all children with access to high-level mathematics curricula. We have seen very positive
outcomes of these efforts, and we want to go on record as supporting the continuation
of efforts to provide high quality programs to every child.
In this report, the Panel has been guided by the nine elements of our charge. (See
Appendix A.) We viewed the charge as focusing on these broad areas:
* The Math A standards. What do we expect our students to know in Math A? Are there improvements the Panel can recommend?
* The Math A assessment. How is the Math A Regents exam created and scored? Are there improvements the Panel can recommend?
* The infrastructure. How prepared are New York State's schools to help every child reach the Math A standards? Are there improvements the Panel can recommend?
In our discussions, additional issues were raised that do not "neatly" fit into the above
categories. These are also addressed in this report.
The Panel felt a tremendous weight of responsibility. All of its members are keenly
aware of the Math A graduation requirement. This report is not about Math 4 or Math 8, the tests used to identify children who need assistance. This report is not about Math B, the advanced math test that will be taken by most students heading to college (and certainly by all students contemplating a future in a field requiring a strong mathematics background). This report is about Math A, an exam that must be passed before a student can receive a high school diploma. The estimates with which we were working for our Interim Report were that approximately 2/3 of the students failed the June 2003 Math A exam. (These estimates did not include data from New York City, which were not yet available.) With the adjustment recommended by the Panel, a scale score of 47 became a passing grade of 65. This adjustment holds this year's students to the same standard to which their counterparts were held in June 2002; yet, even after this substantial adjustment, the early estimates were that 1/3 of the students still failed. At our September 19 meeting, we were provided with the final statewide results. They show that, after the rescaling we recommended:
• 45% of the State's children failed at the 65 level (33% at the 55 level);
• 59% of New York City's children failed at the 65 level (47% at the 55 level).
Unless these students pass a Math A exam in the future, they will not earn a high
school diploma, which will render them ineligible for a wide range of jobs in our society,
many of which do not require high level mathematics skills.
The Panel is also concerned that, even after the adjustment we recommended, an
estimated 16% of the 9th graders who took the exam failed at the 65 level (11% at the 55 level). Ninth graders who take this exam are typically considered strong math
students. For one out of six of these students to fail a Regents examination required for
graduation points to a problem that must be solved.
The weight of responsibility this Panel feels is about finding balance. On the one hand,
our schools must ensure that our high school graduates have strong math skills; on the
other hand, failing rates such as the ones we have seen with Math A are, we believe,
unacceptable. The Panel has placed its primary focus on the standards, the
assessment, and the infrastructure. For the sake of our children, we adults need to get
this right.
II. Executive Summary
The Panel supports the Commissioner and the Board of Regents in the quest to raise
standards for all children, and we write this report in the hope of recommending
solutions to the problems the Panel has identified, so that our State may continue on its
path of providing a top quality education for every child. Based on the Panel's perusal
of math graduation exams from other states, it is the Panel's opinion that New York
State has the highest math standards in the nation; our recommendations are intended
to keep it that way.
Before Math A, there were two ways students could meet the math graduation
requirement, either by passing the Course I Regents exam, or by passing the simpler
Regents Competency Test (RCT). Math A is a much more challenging requirement
than Course I; it tests more content and it has more problem solving. It is a challenge to
move students from Course I to Math A. The challenge of moving students from the
RCT to Math A is much greater. Early on, concerns were raised about the extent of this
challenge. An SED report issued in 1998 entitled "Mathematics Standards and
Assessment Review Committee Report" contains the following caution:
Until (1) the Standards are clearly stated and communicated to teachers,
students, parents and other interested parties; (2) proper support systems
are put in place to give ALL students a fair chance to meet the Standards;
and (3) proper support systems are put in place to provide ALL teachers
with opportunities to develop mathematical content knowledge and
pedagogical strategies, it may be unfair and unrealistic to expect the
passing of the Mathematics A exam to be a requirement for a high school
diploma. (p. 4)
This Panel has concluded that the standards are not clear, and that the necessary
support systems for students and teachers are not in place. It is not within this Panel's
charge to discuss graduation requirements; the Panel's work has focused on changes to
Math A to make it more successful.
As noted above, even after a substantial adjustment recommended by this Panel, 45%
of the students failed the June 2003 Math A exam at the 65 level. Such a failure rate on
an adjusted exam points not to a single problem or a few simple problems; it points to a
systemic problem. This Panel focused on identifying the various facets of this systemic
problem, and on making broad recommendations to the Board of Regents, so these
problems can be addressed, and so we can continue moving forward to raise all of our
students to high levels of math competency.
The Panel spent hours examining pages and pages of information, graciously provided
by SED staff. The Panel spent hours speaking with SED staff, who patiently put up with
our questions day after day. The Panel spent hours discussing what we were seeing,
and then trying to reach consensus on our recommendations. The Panel believes it has
identified a number of areas in which changes need to be made, so that we can
continue moving forward on the path to higher standards for all children.
The Panel has identified 16 findings, and has developed a total of 22 recommendations,
many with several parts, for a total of 41 recommendations. These are summarized
below.
The Math A Standards
Finding 1: The Math A standards lack clarity and specificity (p. 15).

Recommendation 1A: Educationally useful standards must be developed in mathematics for each grade, K-8, and for Math A and Math B, that consist of a clear, well-defined set of skills, the mastery of which is demonstrable (p. 19).

Recommendation 1B: SED should establish a mathematics standards committee to rewrite the standards into functional form, and to meet regularly in the future to analyze test results, thus ensuring continuous relevance (p. 19).

Recommendation 1C: SED should develop and disseminate suggested curricula for mathematics instruction for each grade K-8, and for Math A and Math B (p. 19).

Recommendation 1D: To benefit from the extensive research and deliberation of the current Math A Panel, some of the current Panel members should be included in both new committees recommended in this report, i.e., the standards committee and the curriculum development committee (p. 20).

Finding 2: The design concept that the Math A exam should be taken by the typical student after three semesters of instruction has not been successful (p. 20).

Recommendation 2: The standards and curricula should be structured so that the typical student will take the Math A exam after one year of high school mathematics (p. 20).
The Math A Exam
Finding 3: The June 2003 Regents Math A exam was harder than past Math A exams (p. 25).

Recommendation 3A: Establish and maintain narrow statistical targets for difficulty of Parts I, II, III, and IV of the Math A exam forms (p. 28).

Recommendation 3B: Review the Math A item pool (p. 28).

Recommendation 3C: The difficulty of problems in the anchor item set, in the guidance documents provided to teachers, and on the actual tests should be aligned (p. 28).

Recommendation 3D: The weighting of the open-ended items, the number of scale points possible on the open-ended item rubrics, and other aspects of the scoring of open-ended items should be reconsidered (p. 29).

Recommendation 3E: The Math A test should focus on a more limited, more clearly specified set of content standards and indicators (p. 29).

Finding 4: The Math A tests have not been able to maintain a consistent performance standard over time (p. 29).

Recommendation 4A: Alternative equating designs should be considered (p. 32).

Recommendation 4B: Sampling procedures for estimating item performance must be improved (p. 32).

Recommendation 4C: Replace the anchor item set (p. 33).

Recommendation 4D: Revisit performance standards (cut scores) (p. 33).

Finding 5: The New York State Education Department cannot accurately predict performance on the Math A test (p. 33).

Recommendation 5A: SED should implement procedures for predicting the performance of test forms and groups of students on future Math A exams (p. 34).

Recommendation 5B: Policies for field testing and data collection should be revised (p. 34).

Finding 6: Support and oversight for the Math A exam program should be improved (p. 34).

Recommendation 6A: SED should immediately increase in-house content and technical expertise resources by a minimum of one psychometrician and two math content specialists (p. 35).

Recommendation 6B: SED should clarify the responsibilities assigned to its technical advisory committee, and should request this group to provide regular reports, including technical analyses, reactions to proposed changes in test programs, and suggestions for improving State testing programs (p. 35).

Recommendation 6C: SED should increase demands placed on contractors (p. 36).

Recommendation 6D: Internal coordination and documentation should be improved (p. 36).
Infrastructure Issues Related to the Attainment of Math A Standards
Finding 7: Passing rate data for the State as a whole were not available until three months after the exam; no data are collected regarding student performance on individual items, nor even regarding student performance on the four parts of the exam (p. 37).

Recommendation 7: SED should increase its data collection capacity to include item-level data, and should accelerate its data collection timetable (p. 37).

Finding 8: While the most important use of student performance data is to inform instruction, statewide data mining models that would enable local schools and teachers to use these data effectively are not generally available (p. 37).

Recommendation 8: SED should substantially broaden its efforts to assist districts in data collection, and in the use of data to inform instruction (p. 37).

Finding 9: The mathematical background of teachers delivering math instruction varies widely; yet, raising almost three million children to higher levels of math achievement will be impossible without highly skilled teachers (p. 37).

Recommendation 9A: SED and higher education need to continue and to strengthen their partnerships to ensure strong teacher education programs, both pre-service and in-service (p. 37).

Recommendation 9B: The certification requirements for elementary teachers and special education teachers should include a minimum of nine credits of college-level mathematics (see Recommendation 9C), and three credits of teaching techniques in mathematics (p. 37).

Recommendation 9C: Mathematics courses required for certification, both for mathematics teachers and for elementary and special education teachers, should be specific not only in terms of the number of credits required to be taken, but also in terms of the coursework required to be taken, e.g., calculus, number theory, algebraic structures, probability and statistics, etc. (p. 38).

Recommendation 9D: The Panel believes that, for any teacher responsible for teaching mathematics at any level, the 175-hour professional development requirement should include specific mathematics requirements. The Panel's thinking is that:
• teachers who teach mathematics exclusively should be required to take 100 of the 175 hours in the area of mathematics;
• secondary teachers who are certified in, and who teach in, more than one subject area should be required to take 50 of the 175 hours in the area of mathematics;
• teachers who teach mathematics as part of a broad set of teaching responsibilities, e.g., elementary teachers and special education teachers, should be required to take 30 of the 175 hours in the area of mathematics.
Additionally, the range of possible courses that would satisfy these requirements should be clearly specified (p. 38).

Finding 10: The public has very little awareness of Math A, and may have misunderstandings about the goals of Math A (p. 38).

Recommendation 10: Make greater use of SED communications capacity to engage the public in conversations about the importance of strong mathematics skills (p. 38).

Finding 11: There is often a "disconnect" between K-12 and higher education (p. 38).

Recommendation 11: SED should encourage conversations at the local and regional levels between K-12 teachers of mathematics and higher education professors of mathematics, for the purpose of sharing curriculum, and exploring professional development opportunities and other possible collaborations, to bridge the gap between K-12 and higher education (p. 38).

Finding 12: Raising the level of mathematics achievement of all students to high levels must start when children are very young, and must go beyond the school day for school-aged children (p. 39).

Recommendation 12: SED should encourage, through grants and other means, the expansion of mathematics education initiatives beyond K-12, such as the creation of partnerships between schools and libraries, and the greater use of public television and museums (p. 39).
Additional Issues
Scoring Rubrics, and Communication to the Field Regarding Grading
Finding 13: The scoring rubrics do not give credit for a variety of mathematically correct approaches (p. 40).

Recommendation 13A: Develop more generally worded, holistic scoring rubrics which permit credit to be granted for atypical, but mathematically correct, student responses (p. 40).

Recommendation 13B: Rubrics should be designed so students do not lose 33% or 50% of the credit for a minor arithmetic error (p. 40).

Finding 14: There is a serious "disconnect" between the perception of the SED content specialists and the perception of classroom teachers in the field regarding the application of the scoring rubrics (p. 40).

Recommendation 14: On each set of directions for the Math A exam, a statement should be added confirming that the scoring rubrics are a guide and should be applied using professional judgment (p. 40).

Finding 15: There needs to be better communication of SED grading interpretations during the grading process for the Math A exams (p. 41).

Recommendation 15A: SED should continue on its path of setting up a website during Math A Regents exam grading to provide up-to-date clarifications to teachers grading the exam (p. 41).

Recommendation 15B: SED should explore ways of sending up-to-date grading clarifications to the school districts during the grading period following the administration of the exam, as a backup to the website, to ensure the greatest possible consistency of grading across the State (p. 41).
Additional Issues
Calculator Use on the Math A Exam
Finding 16: Allowing the option of using a graphing calculator on the Math A exam provides some students with an advantage on the exam, thus creating an inequitable situation (p. 41).

Recommendation 16: The use of calculators on the Math A Regents exam should be standardized (p. 42).
The January 2004 Exam, and All Math A Exams Until a New One Is Designed
Recommendation 17: Until the standards are rewritten, new curricula are developed, the new course is delivered, and a new Math A Regents is designed and field tested, the Math A Regents exam should be restructured so the exam includes: 30 Part I items, 5 Part II items, 2 Part III items, and 2 Part IV items (p. 43).

Recommendation 18: The exam should be reviewed by a group of practitioners, including math teachers, university mathematicians, and mathematics educators, with representatives from this Panel, prior to the administration of the exam (p. 43).

Recommendation 19: Until new items are developed and properly field tested, the exam items should be scaled in accord with the procedures used for the August rescaling of the June 2003 exam (p. 43).

Recommendation 20: The scaling should not be finalized until after the exam has been administered and after a post-equating procedure has been implemented to ensure the fairness of the test (p. 43).

Recommendation 21: The 55 passing option on the Math A Regents Exam for a local diploma should be continued until after the standards have been clarified, after new curriculum has been developed and disseminated, and after a new exam has been developed and administered for at least one school year (to ensure that it is performing in accord with its design) (p. 44).

Recommendation 22: The math RCT safety net for special education children should be continued until after the standards have been clarified, after new curriculum has been developed and disseminated, and after a new exam has been developed and administered for at least one school year (to ensure that it is performing in accord with its design) (p. 44).
The Panel believes our recommendations, taken together, will successfully address the
problems we have identified in our independent investigation. A suggested timeline for
implementation has been developed and is included in the report. (p. 45).
III. The History of Math A
During the 1990s, discussion ensued about raising the standards for mathematics
education in New York State. Ultimately, a decision was made to phase out Course I,
Course II, and Course III, and to replace this three-year sequence with Math A and Math B. Conceptually, Math A was to include topics from about a year and a half of the Course I, II, III sequence, and Math B the remainder. A major shift was toward more contextual problems, with a greater emphasis on genuine problem solving, i.e., mathematics within a context, where problem-solving strategies can be used. While there was to be a Math A exam and a Math B exam, no curriculum was developed. Rather, schools were informed of the math standards, expressed in seven
"Key Ideas" which, in turn, were subdivided into 103 "Performance Indicators." Schools
were told that they could reach these standards in whatever way they wished but were
advised that students would be assessed on these 103 Performance Indicators. Over
time, schools worked to develop courses to meet the new standards.
The first Math A exam was administered in June 1999. For several years, SED
produced both the old and new exams, and schools could offer either one. The last Course I Regents was administered in January 2002. It is no longer an option.
During this same time period, the Board of Regents made a series of policy decisions
that resulted in high school graduation becoming contingent upon the passing of five
Regents exams, with a math exam being one of those exams. Now, with Course I no
longer available, the exam required for graduation has become Math A. Prior to this
policy change, students could graduate with different types of diplomas. Some students
met the requirement by passing the Math Regents Competency Test (RCT), a fairly
basic test of skills, whereas others met the requirement by passing the Course I
Regents exam.
From the beginning, all knew Math A was a substantial change, more for some students
than for others, but a change for all. In 1998, a group of math experts expressed
concerns about the difficulty level of Math A. As the exams were phased in, concerns
from the field grew about the difficulty level and the wording of problems. When the
June 2003 Math A exam was administered, the concerns became an outcry. Teachers
saw that the test was very difficult. Early anecdotal evidence from the schools pointed
to a very high failure rate. SED responded by requesting data from schools. When the
data confirmed a high failure rate, the Commissioner made the decision to set aside the
test for current 11th and 12th graders, and to permit them to substitute their course
grade for the purpose of the graduation requirement.
Shortly thereafter, this Math A Panel was created by the Board of Regents and the
Commissioner, and asked to respond to a nineelement charge. (See Appendix A.)
The Panel dedicated three full days (and held extensive conversations between
meetings) to the first part of its investigation, which was whether the June 2003 exam
was more difficult than previous exams and, if so, what to recommend as a rescaling to
the Commissioner. The determination was made that the exam was, in fact, more
difficult. In an Interim Report, the Panel recommended rescaling the June 2003 exam
based on the June 2002 results, using 9th grade students as the basis, as the 9th grade groups in both years were similar. The Panel's estimate was that this adjustment would raise scores in the middle of the distribution about ten points. The Commissioner accepted the Panel's recommendation, and directed SED staff to implement the adjustment. Within days, SED generated a new scale for the June 2003 exam; it converted an old 47 to a new 65. According to an SED press release at the time, the estimates of the impact on passing rates were as follows:
9th graders: from 61% to 80% passing
10th graders: from 32% to 64% passing
11th graders: from 28% to 60% passing
12th graders: from 28% to 55% passing
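For readers who wish to see what such a conversion looks like in concrete terms, the sketch below is a purely hypothetical illustration in Python; it is not SED's actual conversion, which was derived by rescaling against the June 2002 results, but it shows one simple mapping under which an old scale score of 47 becomes a new passing score of 65.

    # Hypothetical illustration only: a piecewise-linear stretch of the old 0-100 scale,
    # anchored so that an old 47 maps to the new passing score of 65. SED's actual
    # conversion table was produced through equating and is not reproduced here.

    def adjusted_score(old_score: float) -> float:
        if old_score <= 47:
            return old_score * 65.0 / 47.0                 # stretch 0-47 onto 0-65
        return 65.0 + (old_score - 47.0) * 35.0 / 53.0     # stretch 47-100 onto 65-100

    for s in (30, 47, 55, 85):
        print(s, round(adjusted_score(s), 1))              # e.g., 47 -> 65.0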
The Panel then continued with its work on the remainder of the elements of the charge.
(See Appendix A.) This document is the Panel's final report to the Commissioner and
Board of Regents.
IV. The Development of the Math A Exams
Each Math A Regents exam is the result of a multi-year cycle of test development,
which results in four actual tests being created each year. Three of these tests are for
the expected administrations, and one is held in reserve in case it is needed. (The June
2003 exam was the first exam used from a four-test cycle. The January 2004 exam is
scheduled to be the second of the four exams from the same test development cycle.)
The first Math A test development cycle occurred in 1997 and 1998; this cycle resulted
in the setting of standards levels which are applied to this day.[1]
Each Math A exam has 35 items: 20 multiple-choice and 15 open-ended questions. The test specification calls for point values as follows:
Part I: 2 points each for all 20 multiple-choice items (totaling 40 points)
Part II: 2 points each for 5 of the open-ended items (totaling 10 points)
Part III: 3 points each for 5 of the open-ended items (totaling 15 points)
Part IV: 4 points each for 5 of the open-ended items (totaling 20 points)
Thus, the raw score point range is 0 to 85 points. This is scaled onto a traditional 0-100 range using the equating techniques summarized elsewhere in this report.
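A minimal sketch of the arithmetic in this specification follows (Python is used purely for illustration; the item counts and point values are those listed above, and everything else is assumed).

    # Point values from the Math A test specification summarized above.
    PARTS = {
        "Part I":   {"items": 20, "points_each": 2},  # multiple-choice items
        "Part II":  {"items": 5,  "points_each": 2},  # open-ended items
        "Part III": {"items": 5,  "points_each": 3},  # open-ended items
        "Part IV":  {"items": 5,  "points_each": 4},  # open-ended items
    }

    raw_max = sum(p["items"] * p["points_each"] for p in PARTS.values())
    print(raw_max)  # 85 -- raw scores (0 to 85) are then scaled to 0-100 by equating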
The test development process starts with teachers being invited to Albany to write multiple-choice and open-ended items. Once the items have been written, SED staff and consultants then select items for pretesting. Math A pretesting involves the creation of between 20 and 30 forms, each of which consists of 5 or 6 multiple-choice items and 3 or 4 open-ended items. Schools are sampled with the goal of pretesting each item on 250 representative students statewide.

When the pretest forms are returned, the items are graded, and the results are sent to an outside consultant for analysis. Part of this grading process is called "range-finding." This process is an effort to assign point values to various levels of response to the open-ended items. It involves establishing rules for grading each open-ended item, and it involves selecting student papers which are exemplars of each point value assignment. This is done by classroom teachers under the coordination of an outside contractor, the purpose being to create a guide for the grading process in local schools.
[1] There are many critically important questions about the early development of the Math A test that cannot be answered, because of staff turnover at SED and because some areas are hard to document. The Panel does not know how the initial set of problems used to set the original bookmarks was developed, e.g., what were the directions to the item writers, what were the backgrounds of those writers, and who were the students whose field test efforts on these problems were used by the benchmarking committee to assess the difficulty of these problems? Likewise, the backgrounds of the members of the benchmarking committees are not known. The panelists worry that the item writers, the field test students, and the benchmarking committee members may not have been properly representative of their counterparts statewide.
In order to set the standard for passing and passing with distinction, during the first
cycle of test development (1997-1998), a "bookmarking" standard-setting process
occurred. This involves taking the items, after their relative difficulty has been
determined, and arraying them from easiest to hardest. Then, a large group of math
teachers convenes and holds several discussions for the purpose of determining where
the "cut points" should be for passing (65) and passing with distinction (85). Once these
standards/cut points are set, they are used for all future administrations of the test, until
a new standard setting process occurs. All Math A exams have used the same cut
points, through and including the June 2003 exam.
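The following is a simplified sketch of the bookmark idea described above, intended only to illustrate the mechanics; the difficulty values, the number of items, and the bookmark placements are hypothetical, and the real process relies on panels of teachers, field-tested difficulty estimates, and several rounds of discussion.

    # Illustrative bookmark standard setting: array the items from easiest to hardest by
    # an estimated difficulty, then record where judges place the cuts for passing (65)
    # and passing with distinction (85). All values below are hypothetical.

    item_difficulties = {
        "item_a": -1.2, "item_b": -0.8, "item_c": -0.3, "item_d": 0.1,
        "item_e": 0.4, "item_f": 0.9, "item_g": 1.5, "item_h": 2.0,
    }

    ordered = sorted(item_difficulties, key=item_difficulties.get)  # easiest -> hardest

    passing_bookmark = 5       # judges: a just-passing student should master the first 5 items
    distinction_bookmark = 7   # judges: a student passing with distinction, the first 7

    print("Passing cut falls after:", ordered[passing_bookmark - 1])
    print("Distinction cut falls after:", ordered[distinction_bookmark - 1])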
Also for the first cycle only, items that seem strong, both in terms of measuring the
content and in terms of their item statistics, are selected as "anchor items."[2] These
anchor items are used as the basis for equating all future exams. (For Math A, the
original set of anchor items included 35 items. At some point, these were pared down to
18 items. While SED staff members cannot recall the rationale for doing this, most
speculate that it was to shorten the test so it could be administered in one class period.)
The set of 18 anchor items has been used for several test administrations in a row, up
to and including the present. The June 2003 exam was equated based on these items,
as have been the remaining three exams, including the one scheduled to be
administered in January 2004.
Once the pretest results are obtained, four field test forms are prepared. These forms
are intended to be close approximations of the actual exams that will be given. Each has 35
items and looks like a Math A Regents exam. The items are selected from the pretested
items by SED staff and outside consultants. The selection is based on content
coverage and item statistics from the pretests. (About half the pretested items survive to the field test level.)[3]
[2] Please refer to the section on the Math A exam for definitions and additional discussion of technical issues.

[3] During its work, the Panel learned that, because of the complexity of developing item statistics for open-ended questions, these statistics are not available when the field test forms are created. The selection of the open-ended items for the field test form is based on content coverage and an estimate of difficulty. Although items can be replaced after field testing, the Panel believes that pretest item statistics for open-ended items should be available before the items are selected for the field tests.

Because SED's experience is that schools are more accepting of field tests which last no more than one class period, each of the four full field test forms is divided into three field test "mini-forms," yielding a total of 12 mini-forms. A representative sample of schools is chosen and asked to administer the field test. When the forms are sent to the schools, a 13th form is sent, a form with the 18 anchor items mentioned above. (There has been a mixed practice over the years. In some years, the 18 items were interspersed with the field test items on the same form; in other years, the anchor form was a separate form and was "spiraled," which means it was given to randomly selected students within the same group as the field test forms. The latter practice is the more recent one, and it was the one used for the June 2003 exam.)[4]
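The arithmetic of this field test design is sketched below; the item identifiers and the exact 12/12/11 split are placeholders, since the report does not specify how each 35-item form is divided into thirds.

    # Field test design as described above: four full 35-item forms, each split into three
    # "mini-forms" so that no sitting exceeds one class period, plus a separate 13th form
    # carrying the 18 anchor items.

    full_forms = [[f"form{f}_item{i}" for i in range(1, 36)] for f in range(1, 5)]

    mini_forms = []
    for form in full_forms:
        mini_forms.extend([form[:12], form[12:24], form[24:]])  # 12 + 12 + 11 items

    anchor_form = [f"anchor_item{i}" for i in range(1, 19)]     # the 18 anchor items

    print(len(mini_forms))   # 12 mini-forms in total
    print(len(anchor_form))  # 18 items on the separate, "spiraled" anchor form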
Once the field tests are returned and graded, again using range-finding for the open-ended items, the results are sent to a consultant for item analysis, and the results are reviewed by math teachers and SED staff. This can result in items being modified or replaced by SED staff or outside consultants without input from field mathematics specialists. Four items on the June 2003 Math A exam were replacement items from the item pool: items numbered 14, 15, 30, and 35.[5]

Once the four forms of the exam are finalized, the equating process is applied by the consultants[6] and the 0-85 raw score scale is transformed to the traditional 0-100 scale for each exam.[7]
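Because the Panel's later findings turn on how this equating step behaves, a highly simplified sketch of the common-item idea may help orient readers; this is a linear (mean/sigma) link computed on hypothetical numbers, not the operational IRT-based procedure used for the Regents exams.

    # Simplified common-item (anchor) equating: performance on the shared anchor items is
    # used to place scores from a new form onto the scale of an old form. All data here
    # are hypothetical, and the operational procedure is considerably more sophisticated.

    from statistics import mean, stdev

    old_form_anchor = [11.2, 12.5, 9.8, 13.1, 10.4]   # anchor-item scores, old-form sample
    new_form_anchor = [10.1, 11.0, 8.9, 12.2, 9.5]    # anchor-item scores, new-form sample

    slope = stdev(old_form_anchor) / stdev(new_form_anchor)
    intercept = mean(old_form_anchor) - slope * mean(new_form_anchor)

    def to_old_scale(new_form_score: float) -> float:
        # Express a new-form score on the old form's scale via the linear link.
        return slope * new_form_score + intercept

    print(round(to_old_scale(10.0), 2))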
[4] Although two different techniques have been used to administer the anchor items to the students, it appears to the Panel that both methods are acceptable and should yield similar results. The difference is noted in this report only for the sake of clarity.

[5] The Panel notes that for the June 2003 Math A exam, items were replaced, and the final form was not reviewed by field math teachers. The Panel would have recommended such a step in the development process, but has been advised that this step has already been added, beginning with the August 2003 Regents exams. The Panel applauds SED for this additional step.

[6] The Panel was provided with a very professional-appearing "Equating and Scaling" report from the 2000 field test used to develop the exams administered in 2001. The Panel had requested the report used to equate the June 2003 exam. At its September 10 meeting, the Panel was provided with a draft report dated June 2003, and realized this was a report on the 2001 field test used to develop the exams administered in 2002, not a report on the 2002 field test used to develop the 2003 exams. The Panel has been advised by SED staff that the consultant did provide all of the item analyses required, as well as the scaling tables, but has not yet submitted a formal report. While this does not appear to have a material impact on any of the results, the Panel believes that these equating and scaling reports should be in the hands of SED staff several months before the exams to which they pertain are used to rate schools and students. This way, if the consultant sees a problem with any of the items, there is time for adjustment. To the Panel, it appears that the 2002 field test Equating and Scaling report is now late by over a year.

[7] Decisions regarding the final test depend upon the item statistics provided by the consultant. The item statistics the Panel received for the field test which led to the June 2003 exam (and three exams to be given in the future, including January 2004) had four different sets of item statistics, three of which were crossed out. SED staff had been told by the consultant to use the one set of statistics not crossed out and to ignore the others. The Panel attempted to ascertain if there was any importance to the three sets of statistics crossed out, and SED staff arranged for a telephone conversation with Panel members, SED staff, and representatives of the consulting company. The consulting company was not immediately able to explain what had occurred, nor why it had occurred. This has led to some of our recommendations regarding the technical aspects of test development.
V. Findings and Recommendations

A. The Math A Standards
The Panel's work started with various analyses of the Math A exam. These analyses,
which are presented later in this report, led the Panel to its findings regarding the
standards presented here. Although the analyses of the exam came first in our work,
we present our findings on standards first, as the standards form the foundation for the
exam.
Finding 1: The Math A standards lack clarity and specificity.
Classroom teachers, parents and students do not know what Math A is. The Panel
already mentioned in its Interim Report the failure of the June 2003 examination to
cover trigonometry. Teachers were led to expect trigonometry; the Mathematics
Resource Guide with Core Curriculum states:
Students are still expected to master basic skills of arithmetic, geometry,
algebra, trigonometry, probability, and statistics. The State Education
Department will continue to assess these skills and concepts with tests
that will be given in secure settings, and the results of these tests will be
made public each year (p. 3).
Teachers, trying to prepare their students for the June 2003 exam, read these words,
looked at previous exams, decided based upon this guidance that trigonometry needed
to be taught, and spent weeks helping their students learn this area of mathematics. As
noted in the Math A Panel Interim Report, trigonometry was not assessed by even one
item on the June 2003 Math A Regents. Classroom teachers have come to believe,
with good reason, that they can only guess which topics are important, and hope they
make the right guess as they teach their students.
Not only are the topics unclear, but the breadth and depth of the expected
understanding are unclear. The standards as they are currently written do not easily
translate into classroom practice, and they are confusing to teachers, students, and
parents.
As just one illustration of this, we point to one of the 103 Performance Indicators,
Performance Indicator 5A. This indicator states several expectations, including the
Pythagorean Theorem, but the depth of expected knowledge is not specified. Is the
graduation performance standard a straightforward numeric substitution using the
theorem (which would be a minimal expectation) or is it a deep conceptual
understanding of the theorem and its applications (higher mastery)? Is the expectation
a simple statement of the theorem, an application to a right triangle, an application twice in the same problem, an algebraic use of the theorem, or a proof that a triangle is a right triangle?
What is the expectation? It is not clear.
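To make concrete how far apart those expectations can be, the sketch below contrasts a minimal numeric substitution with a multi-step, in-context application; the numbers are hypothetical and are not taken from any exam item.

    from math import sqrt, hypot

    # Minimal expectation: direct substitution into a^2 + b^2 = c^2 for a labeled right triangle.
    a, b = 3.0, 4.0
    c = sqrt(a ** 2 + b ** 2)            # 5.0

    # Deeper expectation: the same theorem hidden inside a context, where the student must
    # first model the situation (two hypothetical locations on a coordinate grid) and only
    # then recognize the right triangle.
    home_1 = (2.0, 3.0)                  # hypothetical coordinates, in miles
    home_2 = (7.0, 9.0)
    distance = hypot(home_2[0] - home_1[0], home_2[1] - home_1[1])   # sqrt(5^2 + 6^2), about 7.8

    print(c, round(distance, 1))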
To elaborate, the statements of the Key Idea and Performance Indicator which include
the Pythagorean Theorem are as follows:
Key Idea 5: Measurement
Students use measurement in both metric and English to provide a major
link between the abstractions of mathematics and the real world in order to
describe and compare objects and data.
Performance Indicator 5A: Apply formulas to find measures such as length, area, volume, weight, time, and angle in real-world contexts.
Includes:
• Perimeter of polygons and circumference of circles.
• Area of polygons and circles.
• Volume of solids.
• Pythagorean Theorem.
The table below shows how this one Performance Indicator has been tested over the years, and also shows the Assessment Example provided in the guidance document given to teachers.
Math A Regents Exam Questions Mapped by SED to Performance Indicator 5A

Assessment Example for Performance Indicator 5A, from the Mathematics Resource Guide with Core Curriculum: Ms. Brown plans to carpet part of her living room. The living room floor is a square 20 feet by 20 feet. She wants to carpet a quarter circle as shown below. Find, to the nearest square foot, what part of the floor will remain uncarpeted. Show how you arrived at your answer.

June 2002 Exam, Question 31: As seen in the accompanying diagram, a person can travel from New York City to Buffalo by going north 170 miles to Albany and then west 280 miles to Buffalo. (a) If an engineer wants to design a highway to connect New York City directly to Buffalo, at what angle, x, would she need to build the highway? Find the angle to the nearest degree. (b) To the nearest mile, how many miles would be saved by traveling directly from New York City to Buffalo rather than by traveling first to Albany and then to Buffalo?

August 2002 Exam, Question 31: In the accompanying diagram, x represents the length of a ladder that is leaning against a wall of a building, and y represents the distance from the foot of the ladder to the base of the wall. The ladder makes a 60° angle with the ground and reaches a point on the wall 17 feet above the ground. Find the number of feet in x and y.

January 2003 Exam, Question 30: A rectangular garden is going to be planted in a person's rectangular backyard, as shown in the accompanying diagram. Some dimensions of the backyard and the width of the garden are given. Find the area of the garden to the nearest square foot.

June 2003 Exam, Question 30: To get from his high school to his home, Jamal travels 5.0 miles east and then 4.0 miles north. When Sheila goes to her home from the same high school, she travels 8.0 miles east and 2.0 miles south. What is the measure of the shortest distance, to the nearest tenth of a mile, between Jamal's home and Sheila's home? [The use of the accompanying grid is optional.]

Additional questions from these exams mapped by SED to Performance Indicator 5A:

Question 35: Determine the distance between point A(–1,–3) and point B(5,5). Write an equation of the perpendicular bisector of AB. [The use of the accompanying grid is optional.]

Question 34: A straw is placed into a rectangular box that is 3 inches by 4 inches by 8 inches, as shown in the accompanying diagram. If the straw fits exactly into the box diagonally from the bottom left front corner to the top right back corner, how long is the straw, to the nearest tenth of an inch?

Question 2: The accompanying diagram shows a square with side y inside a square with side x. Which expression represents the area of the shaded region? (1) x²  (2) y²  (3) y² – x²  (4) x² – y²
These are all very different problems. Which one is the standard?
The lack of clarity can also be seen in the overlap of Key Ideas and Performance
Indicators. For example, Question 30 of the June 2003 Math A Regents (in the table above), which was mapped by SED to Key Idea 5, Performance Indicator
5A, could also be mapped to Key Idea 5, Performance Indicator 5G. Note, however, the
Assessment Example given to teachers to help them understand the standard:
Performance Indicator 5G: Relate absolute value, distance between two points, and the slope of a line to the coordinate plane.
Includes:
• Absolute value and length of a line segment.
• Midpoint of a segment.
• Equation of a line: point-slope and slope-intercept form.
• Comparison of parallel and perpendicular lines.

Assessment Example 5G from the Mathematics Resource Guide with Core Curriculum: What is the distance between points A (7,3) and B (5,1)?
Clearly, Assessment Example 5G is much more direct and much easier than
Question 30 on the June 2003 Regents exam. The Panel has found such "disconnects"
repeatedly between the types of items provided to teachers as examples, and the types
of items appearing on the actual exams. What is the standard?
Yet another example of this disconnect can be found in the contrast between
Question 29 on the June 2003 exam, and the example given in the SED teacher
guidance document, both shown below. Notice how much more complex the test item
is, when compared with the example provided to teachers as guidance.
Key Idea 6: Uncertainty
Students use ideas of uncertainty to illustrate that mathematics involves more than exactness when dealing with everyday situations.

Performance Indicator 6C: Use the concept of random variable in computing probabilities.
Includes:
• Mutually exclusive and independent events.
• Counting principle.
• Sample space.
• Probability distribution.
• Probability of the complement of an event.

Assessment Example 6C from the Mathematics Resource Guide with Core Curriculum: The graph below shows the hair colors of all the students in a class. What is the probability that a student chosen at random from this class has black hair?

June 2003 Math A Question 29, mapped to Performance Indicator 6C: A certain state is considering changing the arrangement of letters and numbers on its license plates. The two options the state is considering are:
Option 1: three letters followed by a four-digit number with repetition of both letters and digits allowed
Option 2: four letters followed by a three-digit number without repetition of either letters or digits
[Zero may be chosen as the first digit of the number in either option.]
Which option will enable the state to issue more license plates? How many more different license plates will that option yield?
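To illustrate the level of counting that Question 29 demands, the worked sketch below carries out the arithmetic for the two options as stated; the counts follow directly from the problem and are shown only to underscore the contrast with the Assessment Example.

    from math import perm

    # Option 1: three letters, then a four-digit number; repetition allowed throughout.
    option1 = 26 ** 3 * 10 ** 4            # 17,576 * 10,000 = 175,760,000 plates

    # Option 2: four distinct letters, then a three-digit number with distinct digits.
    option2 = perm(26, 4) * perm(10, 3)    # 358,800 * 720 = 258,336,000 plates

    print(option2 > option1)               # True: Option 2 yields more plates
    print(option2 - option1)               # 82,576,000 more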
The Assessment Example and the test item represent very different expectations; which
one is the standard?
In some ways, this situation can be likened to setting a standard that all students should
run fast. We all have a sense of what this means, but in a high stakes environment,
clarity and specificity are essential. Is "run fast" defined as a nine-minute mile or a seven-minute mile? The Panel is convinced that the lack of clarity and specificity of the
standards must be addressed, as the standards are the foundation for all other aspects
of this work.
As the Panel reviewed student performance data from the Math A testing program, it
became very clear that we cannot limit our thinking to the high school math program.
SED data show a very high correlation between not passing the 8th grade math assessment and not passing the Math A Regents examination. Almost one-half of all 8th graders statewide scored a level 1 or level 2 on the Math 8 exam; how can this apparently significant deficit be made up in a matter of months at the high school? Because mathematics is such a sequential subject, any effort to modify Math A must include efforts directed at the lower grades, K-8. Additionally, because Math A leads to
Math B, any efforts regarding Math A must be extended to the upper grades. The effort
to streamline and clarify the standards must extend to the other grades.
The Panel's recommendations concerning this finding are:
Recommendation 1A: Educationally useful standards must be developed in mathematics for each grade, K-8, and for Math A and Math B, that consist of a clear, well-defined set of skills, the mastery of which is demonstrable.

Recommendation 1B: SED should establish a mathematics standards committee to rewrite the standards into functional form, and to meet regularly in the future to analyze test results, thus ensuring continuous relevance.
This committee should include a large cross section of adults including mathematics
teachers, university mathematicians, professors of mathematics education, special
education teachers, parents, and adults who work with mathematics in real work
applications, both in the professions (for example, engineers and accountants) and in
the trades (for example, carpenters and electricians). The Panel envisions that this
group would meet at least once a year to review the exams against the standards, in
order to provide continuity over time.
Recommendation 1C: SED should develop and disseminate suggested curricula for mathematics instruction for each grade K-8, and for Math A and Math B.
The Panel wishes to make it clear that it does not recommend this as a mandated or
required curriculum, but rather as additional guidance to the field. No curriculum, no
matter how strong, can take the place of a gifted classroom teacher. The Panel wishes
not to discourage in any way individual creativity either at the classroom level or the
district level, but, rather, to provide struggling teachers and schools with a suggested
starting point upon which they may build. The Panel also wishes to state that we see
this as a need because of the highly sequential structure of mathematics; this idea does
not necessarily transfer to other subject areas. The Panel envisions a curriculum
development committee of mathematics teachers, and representatives of the
mathematics standards committee.
Recommendation 1D: To benefit from the extensive research and deliberation of the current Math A Panel, some of the current Panel members should be included in both new committees recommended in this report, i.e., the standards committee and the curriculum development committee.
Finding 2: The design concept that the Math A exam should be taken by the typical student after three semesters of instruction has not been successful.
The Panel understands the thinking behind the original design concept that the Math A
exam should be given to the typical student after a year and a half of coursework.
However, the "disconnect" between this model and the academic year has been
problematic. First, we are all aware of the research demonstrating the "drop" which
students experience during the summer, especially weaker students; this most impacts those students who are struggling with Math A, and it leaves the teacher of the third semester before the Math A exam responsible for closing that gap. Another issue
is the rhythm of the school year, a force which cannot be ignored. Students often
experience one teacher in the first and second semesters of instruction leading to the
Math A exam, and another teacher for the third semester. It would seem logical that
teachers should be scheduled so that they remain with one group of students for all
three semesters. However, with course signups, singleton courses, etc., this becomes
a very difficult goal to meet. Additionally, teachers who are leaving or retiring are
encouraged to do so in June, so as not to disrupt student instruction; and new teachers
are hired as of September 1, for the same reason. Yet, with Math A typically being a year-and-a-half course, these very efforts to limit disruption of the continuity of instruction actually cause disruption.
Additionally, those students who complete Math A in a year and a half, and who choose
not to enroll in Math B, need to take an additional year and a half of mathematics.
Schools are left with the problem of "inventing" half-year options to help these students
continue their education. The Panel believes these students would be better served by
taking two full year courses after passing the Math A exam.
It is also noted below in the section of this report on the Math A Exam that the current
configuration is creating a content validity issue for the exams.
Recommendation 2: The standards and curricula should be structured so that the typical student will take the Math A exam after one year of high school mathematics.
The Panel believes that SED, working with the curriculum committee mentioned above,
should redesign Math A into a one-year course, by realigning topics in K-8, by
streamlining topics, and by providing a specific scope and sequence. (The Panel
wishes to make it clear that it does not see one year as a mandated or required length
of the course. Local districts should have the option of providing alternative time
frames for course completion to tailor the course to the needs of the student
population.)
B. Findings and Recommendations Concerning the Math A Exam
Introduction
Math A was designed to raise the standards of mathematical knowledge and problem-solving ability of New York high school graduates. The Panel supports the efforts to
provide access to high level programs for all children, and the efforts to raise math skills
across the State. However, the Panel's work has led it to the conclusion that these new
standards were not well-defined by clearly specified objectives, an adequately structured curriculum, specific courses, or sufficient professional development. Rather, teachers, students, and others were required to make strong inferences about Math A based largely upon its operationalization in the form of the Math A examinations.
The introduction of the Math A test with its higher standards presented an array of
challenges to the New York State Education Department (SED) staff. These challenges
would be daunting under ordinary circumstances, but the difficulties were, we believe,
compounded by staffing inadequacies, and technical constraints imposed by New
York’s Truth in Testing law.
The Commissioner's Panel investigating the June 2003 results was asked to respond to the nine elements of the Commissioner's charge. One subcommittee of the
Panel focused more squarely on technical issues. For example, we looked at whether
Math A exams in general (including the June 2003 exam) have been designed and
implemented in compliance with appropriate professional test standards. We found no
material problems in this area. (See Section 1, below.)
We then proceeded to investigate technical concerns specific to the June 2003 Math A
test. We investigated issues related to item writing, test development, equating,
technical analysis, and oversight of contracted services related to the Math A
examinations. We evaluated the infrastructure that supports the conceptualization,
development, administration, and reporting of test results. It was in these areas that the
Panel found serious inadequacies.
In the following sections, we first briefly address compliance with relevant professional
standards. We then address the technical issue of comparability of examinee groups.
Finally, we turn to problems we identified with the Math A assessment and the
infrastructure supporting it. For each problem, we provide a summary of the evidence
that led us to conclude a problem existed, followed by one or more recommendations
for addressing the problem.
1. Compliance with Appropriate Professional Standards
Element number 1 of our charge from the Commissioner of Education was to answer
the following question:
Did the June 2003 Regents Math A exam measure achievement of the
New York State mathematics standard three as defined through the core
curriculum, consistent with generally accepted standards for assessment? (Refer to the so-called "Joint Standards.")
Many relevant professional standards exist, including the Code of Fair Testing Practices in Education, the Code of Professional Responsibilities in Educational Measurement, and others. The reference in the Commissioner's charge to the "Joint Standards" is a reference to the single most authoritative source of guidelines for appropriate practice in educational testing. That document, which bears the formal title Standards for Educational and Psychological Testing, is the result of the joint efforts of the three leading organizations representing expertise in educational measurement. Those organizations are the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). The Standards are comprehensive in that they contain chapters on every relevant topic that could be addressed when evaluating the quality of a test (e.g., validity, reliability, bias, and so on). The Standards themselves are updated on a recurring basis. The latest edition of the Standards was published in 1999. (One author of this report served on a review committee during the development of the 1999 standards and currently serves on the Joint Committee on Testing Practices of NCME.)
A detailed comparison of the degree to which the Math A examinations are developed, administered, and reported in accordance with the Standards was beyond the scope of our time frame and resources.[8] Instead, we reviewed the Math A assessment program to identify potential, serious violations of the Standards on the most important aspects they were intended to inform. For example, the Standards require that testing programs provide evidence of the validity of examinations. One way to satisfy this requirement would be to demonstrate that each item on a test was linked to an established content standard. On this count, we were presented with evidence that each item on the Math A test was written by teachers with backgrounds in mathematics teaching and curriculum, and by others with editorial skills. Each item in the June 2003 Math A test was reviewed, among other reasons, to ensure alignment with the Math A content standards.

We were also presented with evidence that the June 2003 Math A test comported with appropriate professional standards in other areas. For example, it met requirements for gathering and reporting reliability evidence; statistical and judgmental procedures were in place to screen items for potential differential functioning in various groups (i.e., to ensure items are equally fair to various sex or ethnic groups); contemporary, accepted procedures were followed for establishing the performance standards (i.e., the "cut scores") defining the levels of performance on the test; careful sampling plans for field testing were provided; and so on.
[8] There are 264 individual standards listed in the document. Of those, not all would apply to K-12 testing programs in education, such as the Math A program. Nonetheless, a detailed evaluation of even the relevant standards was not possible.
On the other hand, time permitting, the Panel would have wished to pursue compliance with each specific, relevant Standard in greater depth. For example, we would have liked to investigate more thoroughly the specific qualifications of item writers. We would have liked to obtain information on the specific characteristics of those who set the performance standards (i.e., the cut scores) for the Math A examination in the first place. We would have liked to delve deeper into the Math A item pool to investigate the extent to which item classifications are clear and unambiguous, the extent to which the pool has sufficient content-valid items to support the development of truly equivalent test forms, and so on.[9]
Admittedly, in the Panel's prioritization of elements in the charge, this element did not receive our focused attention until other, more time-critical elements had been addressed. It is our understanding that SED has documentation on many aspects of alignment with the Standards for Educational and Psychological Testing, although we have not yet completed a review of that documentation.

It is our opinion that detailed scrutiny against all of the relevant standards would likely reveal areas in which improvements could be made. This speculation notwithstanding, it is our opinion that the June 2003 examination was developed, administered, and scored essentially in compliance with the applicable specific guidelines of the Standards, as well as within the spirit of that document.
2. Groups of Students Taking the Test
The Panel was asked to answer the following question, which is the third of the
elements presented to us by the Commissioner:
Were groups of students taking the June 2003 Math A exam statistically
similar to or different from those taking previous Regents Math A exams?
This charge was difficult for the Panel to address, and our conclusions are not founded on definitive data. The Math A testing program is precluded from collecting comprehensive, timely data on test takers. And, as we will see in the next section of this report, the statistical procedures that might aid in providing key policy and evaluative information also suffered from inaccuracies resulting in part from State requirements that constrain appropriate test design.
Overall, our evidence on the question of group similarity is somewhat mixed. On the one hand, as we noted in our Interim Report presented to the Commissioner on August 25, 2003, there are anecdotal reports to support the conclusion that “there are some differences between the June, 2002 population and the June, 2003 population in that, this year, more students who are struggling in math took the Math A exam because the Course I exam is no longer an option.” On the other hand, one segment of the population we examined, 9th grade students, has remained reasonably similar. As stated in our first report, this group “has included, and continues to include, only those students who are strong in math and who the teachers feel can challenge this exam at that early stage of their high school career.” It was our comparison of this relatively more stable and homogeneous group’s performance from a sample of 400 school districts that suggested some adjustment of the scaling for the June 2003 Math A examination was in order.

[9] It is our understanding that SED has documentation that demonstrates the level of compliance with each of the relevant standards, produced previously for another purpose. Further, we understand that SED is gathering information regarding current compliance for review by the Panel. This documentation, however, was not obtained in time for a complete analysis and evaluation by the Panel.
3. Problems with Math A Assessment and Infrastructure
The Panel was asked by the Commissioner to address other questions related to the
technical aspects of the Math A assessment program. A subcommittee of the Panel
focused intensively on the following questions:
Element 2: Were there anomalies in the test preparation process that
could account for real or perceived changes in the level of difficulty in the
June 2003 Regents Math A exam in comparison with prior Math A exams?
This includes but is not limited to item writing, pretesting and field testing
(including adequacy of the samples), production scheduling, scaling,
equating, final test assembly, and review of the completed exam.
Element 4: Is the 2003 Regents Math A exam of the same level of
difficulty as prior Regents Math A exams? (That is, in addition to the
equating included in question 2, consider the content, cognitive demand,
and perceived difficulty of the exam.)
Our observations, data, and technical and logical analyses cause us to conclude that
the Math A test has gotten more difficult over time. It is clear that certain psychometric
procedures were not working properly; that relevant field test populations and
performance were unstable and poorly understood; and that SED lacked appropriate
and sufficient infrastructure to forecast, prevent, or respond to these problems.
Finding 3:
The June 2003 Regents Math A exam was harder than past Math A
exams.
Mathematics teachers on the Panel were unanimous in their assessment that the June 2003 Math A test, particularly Parts III and IV, was harder for their students than the previous Math A tests they reviewed. In our Interim Report, we provided evidence that supported the ultimate decision to rescale the June 2003 Math A test.
Specific to the June 2003 exam is the finding that a statistical indicator of the difficulty of test items (called “b-parameters”) was higher in Parts III and IV of the June 2003 test than in the June 2002 test. This statistical observation is confirmed by content experts on the Panel, who judged that the June 2003 items (particularly those in Parts III and IV) were substantially more cognitively and linguistically complex.
We also compared the statistical and/or judgmental difficulty of three groups of items: (i)
the items appearing on the June 2003 Math A exam; (ii) sample items, intended to be
representative of Math A item difficulty, presented in the Mathematics Resource Guide
and the 1998 Math A Test Sampler; and (iii) the set of 18 anchor items, created in 1998,
and used in every subsequent field test as the basis for calibrating Math A items and
equating Math A tests.[10] Content experts and non-content experts on the Panel
concluded that the differences in these groups of items were striking, with the sample
items and anchor items being dramatically less linguistically and conceptually complex
than the comparison items in the June 2003 exam. Because we did not go back to also
examine pretest data for these items, we cannot say whether the increasing difficulty of
test items is more due to a change attributable to the way items are created (i.e., to
changes in item writing practices) or to the way tests are created (i.e., to changes in test
construction practices). There are at least three hypotheses for why the items appearing
on the June 2003 Math A exam were, in real terms, harder than items appearing on
previous tests. First, the increased difficulty may be due to a systematic evolution of
items in the Math A item pool. It is possible that more straightforward items were
selected for use on earlier examinations and that those items that remained in the pool
for inclusion on the June 2003 exam were those implicitly judged to be less than
optimal. A second hypothesis is that item writers for the Math A exam (likely unknowingly) evolved in their item writing practices, writing more straightforward, easier items in the beginning and crafting more complex items as they exhausted their initial ideas for items, as they gained more experience, or as their perception of the level of complexity intended to be tapped by the Math A assessments changed.
Finally, it is possible that a preference for items of increasing complexity (again, likely
unknowingly) affected the decisions of those assembling the June Math A test in their
choice of items.
These hypotheses are, of course, hard to test. However, we believe that the increase in item difficulty can be traced, at least in part, to some chronological constraints of the Math A test development process. We note that items in the June 2003 test were created at least as far back as the fall of 2000 and assembled into test forms in fall 2001. At this time, Math A instruction was just starting, and item writers would likely have been aware of the fact that the first few Math A test administrations had very high pass rates. This situation could have created an expectation that more challenging items would be appropriate in the future, when Math A instruction was more established.

[10] At this point, a few definitions may be helpful. Anchor items are test questions for which the difficulty level of the question is considered to be known or fixed, based on the performance on those items by a reference group. In this case, the difficulty of the anchor items was established by the performance on those items of the first group to take the new Math A test in 1998.

Once the difficulty levels of the anchor items are known, the anchor items are administered along with new/field test items in subsequent years. In each subsequent year, a comparison of student performance on “known” anchor items provides a basis for determining the difficulty level of new/field test items. This process is referred to as calibrating the new items.

Finally, a statistical method called equating is used to determine a level of overall performance on a subsequent test comprising new items such that the standard of performance required to pass is the same for the group taking the current form of the Math A test as it was for groups that took previous forms of the Math A test.
Finally, we observed two inherent design problems that affect the difficulty of the test.
The first involves an aspect of the Math A test itself and centers on the weights assigned to, and the rubric scale values possible on, the open-ended (also called constructed-response) items.[11] As we have noted, the difficulty of the open-ended
items in Parts III and IV of the June 2003 exam was substantially greater than that of problems on previous forms. However, as these items become harder, the effect of the rubrics and scale values used to assign partial credit increases. The current test specifications mandate a small number of scale points possible for the open-ended items. If the total points possible on an open-ended item can only be obtained if a student’s response is error-free, then even the most minor arithmetic error will lead to a loss of 33% of the credit on a 3-point free-response item and a loss of 50% of the credit on a 2-point free-response item. This design problem does not explain the lower performance on the June 2003 Math A exam, but it can lead to unexpected fluctuations in mean performance from year to year.
The second design problem concerns the lack of close alignment of the instruction with
the content assessed on any given Math A exam. This characteristic, which we believe
is a design flaw in the assessment system, operates in the following manner. There are many indicators (103 in all), each with varying levels of difficulty, that form the content standards to be taught for Math A. Many indicators encompass a variety of distinct
problem types and can be tested in a variety of ways at widely variable levels of
complexity. The Math A test, however, is constructed to consist of only 35 items. Our
review indicates that it is not uncommon for some of these items to require mastery of
the same indicator. It is the consensus of the mathematics educators on the Panel that
it is impossible for teachers to cover all possible combinations of indicators, problem
types, and levels of complexity in the preparation of students. Thus, a student’s
probability of success on the Math A exam is related, in part, to the relative emphasis
his or her teachers place on each of the indicators. For example, if a teacher
emphasizes trigonometry, but if no trigonometry indicators are represented on a
particular Math A test (as was the case on the June 2003 test) the student’s skills will be
underestimated. Conversely, if a teacher emphasizes mastery of the Pythagorean
theorem, and if that knowledge is represented on a particular Math A test (as it was in
several items on the June 2003 test), the student’s competence may be overestimated.
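To make the arithmetic of this coverage gap concrete, the short sketch below works through the numbers cited above. It is a rough, back-of-the-envelope illustration only (it assumes the most favorable case, in which no two items test the same indicator), not an analysis the Panel performed.

```python
# Back-of-the-envelope illustration of content underrepresentation.
# Even if no two items on a form tested the same indicator (our review
# found that repeats do occur), a 35-item exam could reach at most:
indicators = 103
items_per_form = 35

max_coverage = items_per_form / indicators
print(f"At most {max_coverage:.0%} of the indicators can appear on one form.")
# -> At most 34% of the indicators can appear on one form.
```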
In situations such as that presented by the Math A assessment system, where there are a large number of indicators, it would be reasonable for teachers to look to sample items provided by the State for guidance. However, as previously noted, the sample items provided as exemplars seriously misrepresent the overall level of complexity and difficulty of items on the June 2003 exam. Teachers who used this resource as a basis for aligning instruction and adjusting their teaching to what they perceived to be the level of challenge of the Math A exam would, through no fault of their own, have erred.

[11] An open-ended item is one for which a student must generate his or her own response, such as an essay or showing the work to arrive at the solution for a problem. This item format differs from a select-response item (such as the multiple-choice format), where the student selects from a fixed set of provided choices. Rubric refers to the scoring key used to evaluate open-ended items. For both item formats, a completely incorrect response would ordinarily receive zero points. However, whereas multiple-choice items have a fixed point value for a correct response, open-ended items are usually evaluated such that a better response earns more points than a weaker response.
The problem created by content underrepresentation in the current Math A assessment
system cannot be overstated. It is an obvious validity concern. Beyond that, the
problem prevents the system from capitalizing on a known phenomenon in assessment:
instructional alignment. Many states have implemented higher standards and required
mastery of more rigorous content. As might be expected, when new standards are
introduced, overall performance is often at lower-than-desirable levels. However, when
the new content standards are clearly specified, when instruction can be focused on the
content standards, when tests can be created that are more fully representative of and
aligned to the content standards, fairly large increases in average student performance
are routinely observed.
Recommendation 3A:
Establish and maintain narrow statistical targets for difficulty
of Parts I, II, III, and IV of the Math A exam forms.
The relative difficulty of the four parts of the Math A exam must be stabilized so that the parts are as homogeneous, equivalent, and stable as possible. The means and range of item difficulties should be consistent across parts and across forms. Having these targets in place will not only result in the reality of statistical stability, but will also promote the perception of fairness: that items on each section of the test are of approximately the same level of challenge.
Recommendation 3B:
Review the Math A item pool.
There exists a pool of field tested items that are available for use on future Math A test
forms. However, the extent to which these items vary in linguistic and conceptual
complexity and indicator coverage is not known. Obviously, if it is decided that the
current Math A content standards and indicators are to be revised, each item in the pool
would need to be reviewed to determine whether it is well aligned to the new content
specifications. However, even if no changes are made to the content standards and
indicators comprising Math A, the entire pool of old items must be reviewed initially and
periodically to determine if item writing practices are inducing drift in complexity or
misalignment.
Recommendation 3C:
The difficulty of problems in the anchor item set, in the guidance documents provided to teachers, and on the actual tests should be aligned.
We noted the serious mismatch in difficulty and complexity among three sets of items:
the set of items used as anchors, the set of items provided to teachers as samples of
the content and complexity of Math A tests, and the set of operational items appearing
on the June 2003 Math A exam. We discuss later in this report the problem introduced
by a mismatch between anchor and scored items; this problem is of a more technical
nature. However, differences in difficulty among any of the three sets of items are of
obvious concern, particularly the extent to which misrepresentation of difficulty and
scope of coverage in the sample item set can lead to misalignment of classroom
instruction.
Recommendation 3D:
The weighting of the open-ended items, the number of scale points possible on the open-ended item rubrics, and other aspects of the scoring of open-ended items should be reconsidered.
Recommendation 3E:
The Math A test should focus on a more limited, more clearly specified set of content standards and indicators.
While the Panel strongly supports the higher standards envisioned by the Regents for
Math A, we believe that the current configuration of Math A content standards and
assessments jeopardizes the attainment of those higher standards. Observations and
recommendations relative to the Math A curriculum presented elsewhere in this report
support this conclusion. Recommendation 1B presented earlier in this report
recommends that a new process be put in place to review and revise the current
standards.
The configuration of Math A coursework is also relevant to the problem of content
underrepresentation on Math A tests that results from the number of indicators that
comprise the current framework. It is important to note that Math A was originally conceived of as a challenging three-semester course. For many students, however, it is taught as a four-semester course. Such a structure may have been thought to be necessary given the larger number of indicators to be addressed. However, it is not possible to adequately assess a large and representative enough sample of indicators in a three-hour, 35-item examination. There are, of course, two possible remedies. Doubling the number of items on the examination would more fully represent the content. However, we judged a single mathematics examination requiring six hours of assessment to be unacceptable from many perspectives: public support, cost, student fatigue, and others. Thus, as recommended elsewhere in this report, we believe that consideration should be given to reducing the number of content standards and indicators, and structuring Math A as a two-semester course.
Finding 4:
The Math A tests have not been able to maintain a consistent performance
standard over time.
Equating is the process by which a standard of performance (i.e., the level of
performance indicated by a cut score) is maintained over time. There are a variety of
designs possible for implementing equating. The equating design used for the Math A
test consists of including blocks of anchor items along with field test forms so that items
in a field test form can be calibrated and the passing standard (i.e., cut score) used for
subsequent operational test forms can be adjusted to ensure comparability with
previous years’ tests. The current equating design is perhaps the best procedure
possible given the constraints imposed by New York law on item release and
constraints imposed by the current practice of voluntary participation in pretesting and
field testing. However, the equating design used is also highly susceptible to the
introduction of fairly large and consequential errors.
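To illustrate how an anchor-based design carries a performance scale forward from one form to the next, the sketch below applies a standard mean-sigma linking to hypothetical b-parameter estimates. Both the numbers and the choice of mean-sigma linking are illustrative assumptions; they are not the actual values or the specific procedure used by SED or its contractor.

```python
import numpy as np

# Hypothetical b-parameters for the anchor items: as fixed on the 1998 base
# scale, and as re-estimated in a later field-test calibration.
anchor_b_base  = np.array([-1.0, -0.4, 0.0, 0.5, 1.1])   # 1998 scale
anchor_b_field = np.array([-0.7, -0.1, 0.3, 0.9, 1.5])   # field-test scale

# Mean-sigma linking: the linear transformation that maps the field-test
# metric onto the base scale, estimated from the anchor items alone.
A = anchor_b_base.std() / anchor_b_field.std()
B = anchor_b_base.mean() - A * anchor_b_field.mean()

# New (non-anchor) field-test items can then be expressed on the 1998 scale,
# which is what allows a passing standard to be carried forward to new forms.
new_b_field = np.array([0.2, 0.8, 1.4])
new_b_base = A * new_b_field + B
print(new_b_base.round(2))
```

The quality of any such linking rests entirely on how well the anchor items behave, which is why the weaknesses described next are consequential.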
A first consequential weakness in the equating design is that the anchor items are
administered along with the field test items under conditions that do not have sufficient
controls in place to assure confidence in the resulting statistical properties of either set of items. For example, the test is given under what are termed “non-motivated”
conditions. There are no consequences for students and no diagnostic information
provided to teachers as a result of their students’ participation in a Math A field test. It is
well known that students do not put forth their best effort under non-motivated conditions. We noted that up to nearly 20% of students simply opted not to answer some of the multiple-choice items administered during field testing; the proportion of students not putting forth their best (or even typical) effort was routinely even worse on more complex, open-ended items requiring a constructed response. Consequently,
statistical estimations regarding how these field test items will perform when they really
“count” are extremely tenuous.
Compounding this problem is the fact that the field test samples can be of woefully
inadequate size and of unknown representativeness. Ideally, a large and representative
sample of students from across the State of New York should participate under
motivated conditions so that only technically sound, fair items appear on subsequent Math A exams. Ordinarily, minimum sample sizes of 1500 students, carefully chosen to proportionally represent important demographic characteristics in the State, would be desirable to obtain stable, useful information regarding each item’s difficulty,
potential bias, and other characteristics. However, because participation in field testing
is both voluntary and of essentially no consequence or benefit to test takers, we
observed that sample sizes as low as 250 were used. To obtain even this many
respondents, it was sometimes the case that the samples were not as representative of
the State as would be optimal. To the extent that item parameters (i.e., the technical
characteristics of items) are misestimated because of small, unrepresentative samples,
the equating of the Math A tests (that is, the ability to ensure that the passing standard
is equivalent from year to year) is jeopardized.
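The effect of sample size on the precision of item statistics can be illustrated with a simple calculation. The sketch below, using an arbitrary example item and the simple proportion-correct statistic, compares sampling error at the two sample sizes mentioned above; it is a rough illustration, not a description of the calibration model actually used.

```python
from math import sqrt

# Hypothetical item answered correctly by 60% of examinees.
p = 0.60

for n in (250, 1500):
    se = sqrt(p * (1 - p) / n)          # standard error of the proportion
    print(f"n = {n:4d}: estimate {p:.2f} +/- {1.96 * se:.3f} (95% margin)")

# With 250 examinees the margin is roughly +/- 0.06; with 1500 it shrinks to
# roughly +/- 0.025, about two and a half times tighter.
```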
A second consequential weakness in the equating design is that the block of anchor
items used to equate current forms of the Math A test consists of the identical block of
items first used to anchor the score scale in 1998. It is the judgment of the content
experts on the Panel that these anchor items most closely resemble items that would be
appropriate for assessing the old Course I. They appear to be uniformly less
linguistically and conceptually complex than, for example, the (non-anchor) items that
comprised the June 2003 test. While item writers producing items for the 1998 Math A
test may have attempted to generate items aligned with what they conceived of as the
new Math A standards, it is clear that the conceptualization, understanding, and
implementation of Math A as it has evolved are dramatically different. Thus, there has
been an increasing disconnect between the knowledge and skills measured by the
anchor items and the knowledge and skills measured by operational items on current
forms.
Previously in this report, we described the practical consequences of this disconnect.
However, there are also serious technical consequences of this disconnect that likely
resulted in an inaccurate equating of the June 2003 Math A test to previous forms and
which, if uncorrected, have the potential to affect future test forms.
Our investigation revealed serious malfunctioning of anchor items which, we suspect, is
likely attributable to the fact that the anchor items tend to reflect one “version” of
Math A, while the other items in Math A forms reflect another “version.” The statistical
process of equating requires homogeneity of the content tested; that is, a single “version” of Math A should be evident in the anchor items and recent field test and operational items. This requirement is sometimes called unidimensionality. To the extent that
instruction in classrooms resembles one version or another and, as a consequence, if
students perform better on one type of item than another, the requirement of
unidimensionality is violated and the statistical process of equating becomes inaccurate.
In the case of the 2003 Math A examination, the inaccuracy likely occurred because of
the following:
• students performed approximately the same, or slightly worse, on the anchor items compared to their performance on the operational items (perhaps due in part to changing instructional practices, content coverage, etc.);
• this differential made the group appear relatively weaker than previous groups, and made the operational items appear relatively easier; and
• these relative differences resulted in the statistical procedure of equating producing a higher raw cut score for the June 2003 exam.
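The sketch below works through this mechanism with hypothetical numbers under a simple mean-shift (linear) view of the equating. It is an illustration of the logic in the bullets above, not a reproduction of the actual equating computation performed for the June 2003 form.

```python
# Hypothetical group means on the common anchor block and on the scored
# (operational) items; the raw cut is also hypothetical.
anchor_mean_2002, oper_mean_2002 = 11.0, 50.0
anchor_mean_2003, oper_mean_2003 = 10.2, 50.0   # same operational mean,
                                                # slightly weaker on anchors
raw_cut_2002 = 47.0

# Judged by the anchors alone, the 2003 group looks this much weaker:
ability_shift = anchor_mean_2003 - anchor_mean_2002          # -0.8

# If the two forms were equally difficult, a group that much weaker would be
# expected to average about this much on the operational items (unit slope
# assumed for simplicity):
expected_oper_mean_2003 = oper_mean_2002 + ability_shift      # 49.2

# They actually averaged 50.0, so the 2003 form is judged easier by:
form_easiness = oper_mean_2003 - expected_oper_mean_2003      # 0.8

# Holding students to the same standard therefore pushes the raw cut up:
raw_cut_2003 = raw_cut_2002 + form_easiness
print(f"Equated raw cut for 2003: {raw_cut_2003:.1f} (was {raw_cut_2002:.0f})")
```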
There is convincing evidence that this hypothesis explains the resulting lower passing
rate initially observed on the June 2003 exam. The Panel requested certain analyses to
shed light on this hypothesis. SED personnel and its contractor, Measured Progress,
provided us with information in graph form that showed the relationship between the
difficulty values (adjusted and unadjusted b-values) of the items in the 2002 operational
form and its anchor items. These relationships appeared to be uniform and a statistical
test of the slopes of the two regression lines would likely be nonsignificant (indicating
similarity between what is measured by the anchor and operational items). However,
the same information provided for the June 2003 test reveals a marked difference in
these relationships, suggesting that the anchor and non-anchor items were, in fact,
measuring somewhat different constructs for the 2003 administration. SED personnel
have indicated that they will perform thorough statistical analyses on these
relationships. We fully expect that the hypothesis stated above will be upheld by such
analyses.
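A minimal version of the kind of analysis described above is sketched below: fit one line to the (unadjusted, adjusted) b-value pairs for the anchor items and another to the pairs for the operational items, then compare the slopes. The data are invented for illustration, and the rough z-statistic shown is only one of several ways such a comparison could be formalized.

```python
import numpy as np

# Invented (unadjusted, adjusted) b-value pairs; not actual SED data.
anchor_unadj = np.array([-1.2, -0.8, -0.5, -0.1, 0.3, 0.6])
anchor_adj   = np.array([-1.0, -0.7, -0.4,  0.0, 0.4, 0.7])
oper_unadj   = np.array([-0.6, -0.2,  0.1,  0.5, 0.9, 1.4])
oper_adj     = np.array([-0.8, -0.1,  0.4,  1.0, 1.6, 2.3])

def slope_and_se(x, y):
    # Fit y = slope * x + intercept; return the slope and its standard error.
    coeffs, cov = np.polyfit(x, y, 1, cov=True)
    return coeffs[0], np.sqrt(cov[0, 0])

s_anchor, se_anchor = slope_and_se(anchor_unadj, anchor_adj)
s_oper, se_oper = slope_and_se(oper_unadj, oper_adj)

# A rough z-statistic for the difference in slopes; values well beyond ~2
# suggest the anchor and operational items are not behaving as if they
# measure the same construct.
z = (s_oper - s_anchor) / np.sqrt(se_anchor**2 + se_oper**2)
print(f"anchor slope {s_anchor:.2f}, operational slope {s_oper:.2f}, z = {z:.1f}")
```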
Finally, there is a secondary statistical problem with the anchor item set. This problem
does not necessarily explain the anomalous results witnessed on the June 2003 Math A
exam, but it has the potential to cause instability in test equating for any administration.
This problem involves the stability in performance of anchor items over time. In the most
common and stable equating procedures, anchor items are administered under
motivated conditions to the population of students as embedded items in an operational
form. Thus, the difficulty levels (i.e., b-parameters) of the anchor items are usually highly stable. In such situations, changes in the b-parameters of an anchor item on the order of .30 in difficulty would routinely cause the item to be excluded from use as an anchor item in equating.[12] Recall, however, that for the Math A tests, the same anchor
items are administered each year in order to calibrate new/field test items. Recall also
the previously mentioned problem of small, potentially nonrepresentative field test
samples. Our investigation revealed large swings in anchor item b-values, with the
magnitude of instability in the range of 1.0 logits (absolute value). Under such
circumstances, any equating procedure would be highly unstable.
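As a simple illustration of the kind of screen described above, the sketch below flags anchor items whose estimated b-values have drifted by more than 0.30 logits from their baselines. The item labels and values are hypothetical, and the 0.30-logit threshold is the rule of thumb cited above rather than an SED specification.

```python
# Hypothetical baseline (1998) and current b-parameter estimates, in logits.
baseline_b = {"A01": -0.80, "A02": -0.35, "A03": 0.10, "A04": 0.55}
current_b  = {"A01": -0.75, "A02":  0.70, "A03": 0.05, "A04": 1.60}

DRIFT_LIMIT = 0.30  # logits; items drifting more than this are suspect

for item, b_then in baseline_b.items():
    drift = abs(current_b[item] - b_then)
    status = "exclude from equating" if drift > DRIFT_LIMIT else "retain"
    print(f"{item}: drift {drift:.2f} logits -> {status}")
```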
Recommendation 4A:
Alternative equating designs should be considered.
Under current law, all operational test items must be released for public scrutiny. As a
consequence of this constraint, New York is precluded from taking advantage of the
most common and preferred equating design, one that is known as an internal anchor design. Under an internal anchor design, a small proportion (approximately 20% of the total number of test items) of scored anchor items, representative of the test specifications as a whole, is embedded into the operational test. The internal anchor design is preferable because it ensures that anchor item information is based on the largest possible number of representative, optimally motivated students, that each item counts toward a student’s score, and that each item provides information used to make the eventual pass/fail decision.
Because an internal anchor design is not permissible under current New York law, an external anchor design may be an appropriate alternative. Using an external anchor
design, a small, representative set of anchor items is still administered, but performance
on the anchor items is excluded when calculating students’ scores. Such a design is,
however, sometimes subject to the criticism that it is inefficient and educationally less
desirable not to include information on anchor item performance when estimating
students’ overall competence.
Recommendation 4B:
Sampling procedures for estimating item performance must
be improved.
Sampling procedures must be revised to ensure that larger, representative, and more optimally motivated samples of students participate in pretesting and field testing of Math A items. Changes in regulations to require mandatory participation of sampled units would be one possibility; education and persuasion would be another. Enhanced auditing and monitoring of sample demographic characteristics and motivation by SED is also recommended to ensure stability of representativeness and comparability with operational data.

[12] It may be helpful to realize that these difficulty levels (b-parameters) ordinarily range from -3.0 to +3.0 on a scale called the “logit scale,” with an average or middle difficulty level of zero and a standard deviation of approximately 1.0. Thus, a change in b-value of .30 logits represents a change of nearly a third of a standard deviation in the item’s difficulty.
Recommendation 4C:
Replace the anchor item set.
As noted above, the anchor item set is no longer representative of the remaining
operational content covered by Math A exams. The equating item set must be
reconfigured using items that represent a “mini-version” of the operational Math A test forms, beginning with the January 2004 administration and for all subsequent forms.
That is, the anchor item set must be representative of the breadth and depth of content
coverage, complexity, and difficulty of currently operational Math A items.
Recommendation 4D:
Revisit performance standards (cut scores).
Because the cut scores for the two levels of performance on the Math A exam were
established under a different conceptualization of Math A, it seems imperative that the
appropriateness of the current cut scores be reviewed and, if appropriate, revised. It is
possible (and technically defensible) to maintain current standards via various methods
(assuming that it is desired to maintain the standards eventually accepted for the June
2003 exam). Options would range from simply maintaining the same standard as was
eventually applied for June 2003 to "affirming" that the standard is correct using a
content expert/judgmental review. However, serious consideration should be given to
revisiting the cut scores if the content standards and indicators remain the same. If the
content standards and indicators change, a new standard setting study would be
essential.
Finding 5:
The New York State Education Department cannot accurately predict
performance on Math A tests.
In educational testing programs, it is often desirable to be able to predict performance
(such as overall pass rates, changes in subgroup performance, etc.) on tests. These
predictions are usually only accomplished for a short-term outcome, such as for the next administration of a test. However, even short-term predictions are useful for
informing policy makers, gauging the resources that may be necessary to provide
remediation or advanced coursework, and other uses. In many educational testing
programs, the technical aspects of testing have been rigorously designed, refined, and
controlled. And though we would wish otherwise in many educational settings,
dramatic, pervasive progress or regress in learning over a short term is rarely observed.
These two characteristics combine to result in changes in overall performance that can
be reasonably accurately predicted.
For the Math A tests, largely as a consequence of our observations related to Finding 4
(see above), SED has been unable to predict performance characteristics of operational
Math A test forms. It would be helpful, for example, if field test data could be used to
better estimate individual item performance, to signal warnings about future operational
test performance, to suggest areas of strength and weakness in curriculum, to yield
estimates of operational form pass rates, and so on.
Recommendation 5A:
SED should implement procedures for predicting the
performance of test forms and groups of students on future Math A exams.
It is likely that some research will first be required to identify statistical approaches that
would be best suited for the context of Math A exams and would yield the most accurate
predictions. However, SED has the advantage that data on past test form characteristics and group performance are already in hand. Retrospective studies can be
conducted to identify promising approaches.
Recommendation 5B:
Policies for field testing and data collection should be
revised.
In a previous recommendation (4B), we suggested changes in field testing data collection to address another issue. The adoption of this recommendation would
also aid in addressing the issue of predictability identified here.
In addition to revision of field testing procedures, policy changes should also be
considered that would permit SED to gather more timely and complete data on
statewide operational test administrations. In order to analyze test information, policy
makers and educators need timely and accurate data, and regulations should be put in
place to assure rapid and efficient data reporting.
Finding 6:
Support and oversight for the Math A exam program should be improved.
It goes without saying that all states are struggling to meet many demands. The
requirements imposed by the No Child Left Behind Act, public pressures for greater
accountability in education, shrinking budgets in a time of economic uncertainty, and
other forces have placed stresses on all corners of the educational system. Important
testing programs, such as the Regents examinations, have not been able to hide from
these pressures, nor would they necessarily be immune to fiscal belt-tightening at a time when sacrifices must be borne by all.
On the other hand, high-quality, high-stakes testing cannot be done “on the cheap.” It is
the strong impression of the Panel that the Math A assessment program has lacked the
support it must have in order to produce with consistency tests that gauge the success
of New York high school students on content they must master in order to gain a
diploma. We sense that there is actually strong psychological support for the work that
has been accomplished to date. Our conversations with the Commissioner and his
associates revealed to us genuine respect for the talent, enthusiasm, and commitment
evident in those who have worked on the Math A program. However, other forms of
support are essential for ensuring the production of tests upon which important
consequences hinge.
The occasion of having to examine the development of Math A exams also afforded the Panel several opportunities to observe internal and external processes. We believe
that there is room for improvement in these processes. For example, in some data
gathering, it was unclear where the primary responsibility rested for an activity; this
made it difficult for the Panel to know where to direct requests for information and likely
made it difficult for SED personnel to rapidly respond to such requests. In other cases,
primary responsibility for an activity was diffused; in such cases, it did not appear that
any single person had “the big picture” required for effective oversight.
Recommendation 6A:
SED should immediately increase in-house content and technical expertise resources by a minimum of one psychometrician and two math content specialists.
The Panel observed that SED leaned heavily on external content experts to guide the
development and support activities for Math A (and other) examinations. The number of
content support positions at SED has been reduced over time. Further, we observed
that, with the exception of the director, there is essentially no internal expertise in psychometrics (that is, in testing). This strikes us as fundamentally inconsistent with the mission and activities of the testing unit. The director of the office does have a high
level of expertise and experience in psychometrics, but the administrative duties of the
director, i.e., oversight, support, and management of personnel and activities for 70
tests annually, greatly dilute this resource. While we are aware that there is a person
assigned as a statistical support person, expertise in statistics differs from expertise in
psychometrics (much like one cannot interchange a dentist for a physician). Additional
support is essential to this effort.
The Panel did not engage in a full analysis to identify the precise level of staffing that
would be appropriate for the assessment activities of SED; such an effort would likely
be useful to ensure economical use of resources. Nonetheless, an elaborate study is
not required to discern that current support levels are insufficient. It is our estimate that
personnel should be increased by at least one person with psychometric expertise and
at least two people with content expertise and experience in mathematics education.
Recommendation 6B:
SED should clarify the responsibilities assigned to its
technical advisory committee, and should request this group to provide regular reports,
including technical analyses, reactions to proposed changes in test programs, and
suggestions for improving State testing programs.
As an aid in the oversight, troubleshooting, review of proposed changes, and initiation
of new ideas for improvement in their testing programs, and other functions, many state
assessment programs rely on technical advisory committees to supplement their
internal resources and expertise. Such committees often consist of four to eight diverse
external experts drawn from academics with expertise in psychometrics, alternate
assessment for LEP or special needs students, directors of assessment from other
states, or similar backgrounds. Such committees usually meet from two to four times
per year to review critical aspects of a state’s testing programs, suggest ways to
respond to technical challenges, identify and recommend ways to avoid potential
problems, assist in developing plans of action, review or offer suggestions for proposed
changes, develop alternative strategies for accomplishing key goals of the assessment
program, and other activities as directed by the leadership of the state assessment
program. New York State does have such a committee. This Panel is not clear as to how often the group has met, what its responsibilities have been, or what its recommendations have been. The Panel believes that clarifying these responsibilities and requiring regular reports in the future would be helpful in addressing technical issues.
Recommendation 6C:
SED should increase demands placed on contractors.
In the course of requesting and gathering information for its investigations, the Panel
had a few occasions on which to observe the activities or results of activities performed
by external contractors. Our observation is that contractor performance is too variable.
For example, one contractor conducted special analyses overnight when an urgent
request was made. That was good. In another case, when the Panel needed information on a Math A exam from last year, we were informed that a contractor had not yet provided the routine, annual documentation more than a year after the test had been given. We believe SED should take a firmer approach to holding all contractors responsible for timely, accurate reports, documentation of all procedures, and responsiveness to data requests and analyses.
Recommendation 6D:
Internal coordination and documentation should be
improved.
As we noted previously, the Panel sometimes observed that roles and responsibilities
related to production of the Math A exam may be too discrete. We believe that SED
should consider reorganization plans that would enable coordination of each testing
program and locate “the big picture” for a project within a single individual. Further,
SED should develop its own, internal “historical annals.” Such documentation would consist of organized, centrally located documentation in which all relevant technical and other related information about a test is maintained. Beyond assisting in times of need, such as was the case for the current Math A investigation, such documentation would also assist SED in times of personnel changes, for training purposes, and for effecting smooth transitions and sharing of information between contractors, among other benefits.
C. Findings and Recommendations Concerning Statewide Infrastructure Issues
Related to the Attainment of Math A Standards
Finding 7:
Passing rate data for the State as a whole were not available until three
months after the exam; no data are collected regarding student performance on
individual items, nor even regarding student performance on the four parts of the exam.
The Panel was surprised at the lack of data concerning the Math A test. As this is being
written in September, total failure rates have just become available. SED cannot
analyze the functioning of its assessments if it does have item level data. The Panel
recognizes that SED is moving toward a more comprehensive program of data
collection, and believes this needs to be an important priority.
Recommendation 7:
SED should increase its data collection capacity to include
item-level data, and should accelerate its data collection timetable.
Data should be collected earlier, and data should be collected at the item level to
determine whether the assessments are functioning in accord with their design.
Finding 8:
While the most important use of student performance data is to inform instruction, statewide data mining models that would enable local schools and teachers to use these data effectively are not generally available.
To practitioners, data are only useful if they are available and can inform instruction. Although the Panel is aware that there are some efforts to assist districts with the effective use of data, these efforts need to be broadly expanded.
Recommendation 8:
SED should substantially broaden its efforts to assist districts in data collection and in the use of data to inform instruction.
Finding 9:
The mathematical background of teachers delivering math instruction varies
widely; yet, raising almost three million children to higher levels of math achievement
will be impossible without highly skilled teachers.
Recommendation 9A:
SED and higher education need to continue and to
strengthen their partnerships to ensure strong teacher education programs, both pre-service and in-service.
Recommendation 9B:
The certification requirements for elementary teachers and
special education teachers should include a minimum of nine credits of college-level mathematics (see Recommendation 9C), and three credits of teaching techniques in mathematics.
Recommendation 9C:
Mathematics courses required for certification, both for
mathematics teachers and elementary and special education teachers, should be
specific not only in terms of number of credits required to be taken, but also in terms of
coursework required to be taken, e.g., calculus, number theory, algebraic structures,
probability and statistics, etc.
Recommendation 9D:
The Panel believes that, for any teacher responsible for
teaching mathematics at any level, the 175-hour professional development requirement
should include specific mathematics requirements. The Panel's thinking is that:
• teachers who teach mathematics exclusively should be required to take 100 of the 175 hours in the area of mathematics;
• secondary teachers who are certified in, and who teach in, more than one subject area should be required to take 50 of the 175 hours in the area of mathematics;
• teachers who teach mathematics as part of a broad set of teaching responsibilities, e.g., elementary teachers and special education teachers, should be required to take 30 of the 175 hours in the area of mathematics.
Additionally, the range of possible courses that would satisfy these requirements should
be clearly specified.
Finding 10.
The public has very little awareness of Math A, and may have
misunderstandings about the goals of Math A.
Several Panel members recall the emphasis SED placed on the importance of increased literacy, and see that as a model that can be applied to mathematics.
Recommendation 10:
Make greater use of SED's communications capacity to engage the public in conversations about the importance of strong mathematics skills.
Finding 11:
There is often a "disconnect" between K-12 and higher education.
A few years ago, SED encouraged local conversations between leaders of K-12 schools
and higher education, i.e., regional meetings involving college presidents and
superintendents of schools. As the Panel has reflected upon the enormity of the task of
raising every child to Math A levels, it would seem advantageous for the gap between
K-12 and higher education, in mathematics, to be bridged. We envision meetings of
local high school math teachers and college math professors to review their programs
and curriculum, and to explore collaborations.
Recommendation 11:
SED should encourage conversations at the local and regional level between K-12 teachers of mathematics and higher education professors of mathematics, for the purpose of sharing curriculum, and exploring professional development opportunities and other possible collaborations, to bridge the gap between K-12 and higher education.
Finding 12:
Raising the level of mathematics achievement of all students to high levels
must start when children are very young, and must go beyond the school day for school-aged children.
We know from brain research that learning is connected with neurological development,
and such development occurs at an early age. In order for children to be proficient in
mathematics at the high school level, they need exposure to good mathematics at a
young age. Typically, in schools, children are exposed to mathematics for
approximately one period, or 45 minutes per day. The Panel believes we need to move
beyond the capacity of public schools, perhaps establishing partnerships with local
public libraries to implement programs for children at a very young age, and also on an
afternoon, weekend, and summer basis during a child's school career. We would
envision that these programs would be designed to help young children become as
excited about ideas of mathematics as they are about reading a new book. We believe
that, for mathematical skills and concepts to be learned, they have to be viewed as being as important as reading skills, and this means bringing other partners to the table.
The Board of Regents is in a unique position, given its broad oversight of educational
functions in New York State. The Panel believes there are opportunities to expand this
effort well beyond the doors of the state's K12 schools. Libraries have programs for
preschool; public television is viewed by even the youngest children; some museums
have science programs and exhibits, which can be expanded to include more
mathematics. To enable all children to reach high standards in mathematics will
require societal and cultural changes which will only occur if all of the forces are aligned
in the same direction. The schools alone will not be able to do this work.
Recommendation 12.
SED should encourage, through grants and other means, the expansion of mathematics education initiatives beyond K-12, such as the creation of
partnerships between schools and libraries, and the greater use of public television and
museums.
D. Findings and Recommendations Concerning Additional Issues
1. Scoring Rubrics, and Communication to the Field Regarding Grading
Finding 13:
The scoring rubrics do not give credit for a variety of mathematically
correct approaches.
While an important goal of Math A is to encourage multiple approaches to solving a
problem, some Math A exam questions force a student to solve a problem by one
particular approach, e.g., item 35 on the June 2003 Math A exam. The June 2003
scoring rubrics also had instances in which only one approach received full credit, or
where more favorable treatment in partial credit was given to one approach over
another. The Panel recognizes that it is very hard to develop a comprehensive grading
rubric that anticipates the credit that should be earned by unexpected approaches,
whether students get the right answer or are on the right track. A more holistic
approach to scoring rubrics may be needed, one perhaps more similar to the rubrics
used in the International Baccalaureate Math program. The Panel believes there must
be room for teachers to apply professional judgment in the grading of student work.
Recommendation 13A:
Develop more generally worded, holistic scoring rubrics
which permit credit to be granted for atypical, but mathematically correct, student
responses.
Recommendation 13B:
Rubrics should be designed so students do not lose 33% or
50% credit for a minor arithmetic error.
Finding 14:
There is a serious "disconnect" between the perception of the SED
content specialists and the perception of field classroom teachers regarding the
application of the scoring rubrics.
At one point, during intense discussion about a particular rubric, SED staffers stated
that the rubrics are general guides for grading; the room became very quiet, and one
classroom teacher stated that the field understanding is that the rubrics are to be
applied with little latitude. Several Panel members joined in that view. They seem to have been led to that thinking by the language in the scoring guide, which states several times that the "specific criteria" are to be applied. The Panel welcomes the concept of flexibility and
recommends this flexibility be clearly communicated to the field.
Recommendation 14:
On each set of directions for the Math A exam, a statement
should be added confirming that the scoring rubrics are a guide and should be applied
using professional judgment.
Finding 15:
There needs to be better communication of SED grading interpretations
during the grading process for the Math A exams.
As the Panel discussed the grading of Math A exams, it became clear that SED staff are
very available to answer questions from the field, and that they also encourage the
flexibility noted above. The problem, though, is that SED does not have the resources
to reach out to every district, and many teachers will not think of calling SED except in
the case of a very serious matter. Thus, it is entirely possible that teachers who call
SED will apply the grading rules differently from those who do not. When the Panel
held this discussion, SED staff advised that they are exploring a website that would be
activated during Regents exam grading that could provide up-to-the-minute responses
to grading questions. Although this does not guarantee that the information will get to
everyone, it is a definite step in the right direction, and the Panel applauds this initiative.
Recommendation 15A:
SED should continue on its path of setting up a website during Math A Regents exam grading to provide up-to-date clarifications to teachers grading the exam.
While the website is a step in the right direction, websites are "pull" technology, i.e., the
user must pull the page up to get the information; and it is possible that there are still
schools without web access. The Panel recommends that thought be given to "push"
technology, whereby the information would be pushed out to every district. Right now,
errata sheets are faxed to districts when there is an actual error on the scoring sheet;
perhaps thought should be given to sending a fax every few hours after the exam (during the school day), up to 48 work hours after the exam, with grading clarifications.
Recommendation 15B:
SED should explore ways of sending up-to-date grading clarifications to the school districts during the grading period following the administration of the exam, as a backup to the website, to ensure the greatest possible consistency of grading across the State.
2. Calculator Use on the Math A Exam
Finding 16:
Allowing the option of using a graphing calculator on the Math A exam
provides some students with an advantage on the exam, thus creating an inequitable
situation.
Students who are able to afford graphing calculators, or who live in school districts that
are able to provide them with a graphing calculator, have a distinct advantage over
other students if they are permitted to use the graphing calculator on a Regents exam.
While the Panel agrees that students should be taught how to use graphing calculators, permitting their optional use on the exam provides an advantage to some students. The
Panel believes testing conditions should be the same for all students.
Recommendation 16:
The use of calculators on the Math A Regents exam should be standardized.
The Panel recommends that, until the State can be sure that every child has access to a
graphing calculator on the Math A exam, the use of these calculators should not be
permitted on the exam.
E. Recommendations Concerning the January 2004 Exam, and All Math A Exams Until a New One Is Designed
The Panel has noted earlier its concern that the January 2004 Math A exam was
created at the same time as, and under the same pretest and field test conditions as,
the June 2003 exam. The Panel is concerned that there is much we do not know about
why the June 2003 exam behaved the way it did, particularly the items in Parts III and
IV. As the Panel sees it, the recommendations contained above in this report represent
a plan for redesigning the mathematics standards and assessments, and this plan will
result in a completely revised Math A exam at the end of the process. The Panel
believes it has a responsibility to make recommendations regarding the Math A exams
in the interim, and presents these recommendations here.
Recommendation 17.
Until the standards are rewritten, new curricula are
developed, the new course is delivered, and a new Math A Regents is designed and
field tested, the Math A Regents exam should be restructured so the exam includes: 30
Part I items, 5 Part II items, 2 Part III items, and 2 Part IV items.
The largest problems the Panel saw with the June 2003 exam were with the items in
Parts III and IV. By reducing the number of items in those parts and increasing the
number of items in Part I (which did not demonstrate the same problematic
performance), the Panel believes that this somewhat modified exam can be an effective
measure of student performance until a new exam is developed based on the rewritten
standards, with the provisos below. Additionally, this recommended configuration
reduces to some extent the concern about curriculum coverage, as it calls for 39 items
rather than the current 35, thus increasing the content coverage of the exam.
Recommendation 18:
The exam should be reviewed by a group of practitioners,
including math teachers, university mathematicians and mathematics educators, with
representatives from this Panel, prior to the administration of the exam.
The Panel understands that SED already instituted such a quality control step for all August 2003 Regents exams, and intends to do so for future Math A exams.
Recommendation 19:
Until new items are developed and properly field tested, the
exam items should be scaled in accord with the procedures used for the August
rescaling of the June 2003 exam.
Recommendation 20.
The scaling should not be finalized until after the exam has been administered and after a post-equating procedure has been implemented to ensure the fairness of the test.
The Panel understands that SED intends to do so for future Math A exams.
If the above measures are put into place, the Math A exam should function somewhat similarly to the June 2003 exam after it was rescaled. The Panel believes that, until the
standards, the curriculum, the assessment, and the infrastructure are in place, students
should be held to the same standard as last June's students, which leads us to these
final recommendations.
Recommendation 21:
The 55 passing option on the Math A Regents Exam for a
local diploma should be continued until after the standards have been clarified, after
new curriculum has been developed and disseminated, and after a new exam has been
developed and administered for at least one school year (to ensure that it is performing
in accord with its design).
Recommendation 22:
The math RCT safety net for special education children
should be continued until after the standards have been clarified, after new curriculum
has been developed and disseminated, and after a new exam has been developed and
administered for at least one school year (to ensure that it is performing in accord with
its design).
F. Suggested Timeline
As the Panel reviewed its thinking with SED representatives, it was suggested that the
Panel draft a timeline that might serve to guide the process. The timeline we suggest is
below.
Test Development
• Immediately (Oct 2003): Create three exams to be administered in Jan 04, June 04, and Aug 04, with the format 30 Part I items (2 points each), 5 Part II items (2 points each), 2 Part III items (3 points each), and 2 Part IV items (4 points each), using the current item pool, aligned with the current core curriculum item sampler, and scaled as used in the August 2003 rescaling of the June 2003 exam, with each complete exam to be reviewed before administration.
• Nov 03 to Feb 04: Create a new item pool (using the Checklist of Writing Items) for tests to be administered in Jan 05, Jun 05, Aug 05, and Jan 06 under the same conditions as above.
• May 04: Field test.
• Feb 05 to Mar 05: New items written, aligned to the retooled standards (and old items reviewed to salvage any that are aligned). Work should be guided by the Checklist of Writing Items.
• May 2005: Field test. Purpose: create three tests for actual administration in Jun 06, Aug 06, and Jan 07.
• Sept 2005 to May 2006: Pretest, field test, and set new performance standards (bookmark).
• June 2006: First administration of the new Math A exam.

Standards
• Immediately: Form the Mathematics Standards Committee.
• Jan 04 to Dec 04: The Mathematics Standards Committee retools the standards.

Curriculum
• Immediately: Examples of high quality K-8, Math A, and Math B curricula selected and disseminated to the field.
• Jan 05 to Jun 05: Curriculum committee writes or chooses exemplar curriculum, aligned to the retooled standards.
• Sept 2005: New one-year Math A course taught across the State.
VI. Summary and Conclusion
The June 2003 Math A exam results clearly point to a need for substantial change.
After the rescaling recommended by this Panel in its Interim Report, 45% of the State's
children failed at 65; 59% of New York City's children failed at 65.
This report stated earlier:
Many states have implemented higher standards and required mastery of
more rigorous content. As might be expected, when new standards are
introduced, overall performance is often at lower-than-desirable levels.
However, when the new content standards are clearly specified, when
instruction can be focused on the content standards, when tests can be
created that are more fully representative of and aligned to the content
standards, fairly large increases in average student performance are
routinely observed (p. 28).
This Panel believes that, if the recommended streamlining and clarification of the
standards occur, and if the Math A course is streamlined to a year-long course (after the K-8 standards are aligned), and if there is greater curriculum guidance to teachers and
districts struggling with this effort, and if the other recommendations in this report are
accepted, many more students will reach high levels of mathematical knowledge and
skills. There will still be some students who, despite enormous efforts by them and their
teachers, will not reach this level. This Panel believes discussion must continue to find
ways of helping these students find success; this is outside the charge of this Panel.
As we have worked, we have reflected that, in our modern society, people often look for
"sound bite" answers to even the most complex problems. We see the Math A situation
as very complex, and we hope our recommendations reflect that sense of complexity.
We also hope our thoughts prove helpful as we move forward.
Raising the level of mathematical skill and knowledge of millions of children is a
daunting challenge, but it is a challenge this Panel agrees must be faced. While taking
on this challenge, we all need to appreciate the enormity of the effort, and we must be
cognizant of the wide variety of children who enter our school doors every day. As our
children are not homogeneous, our solutions for them cannot be homogeneous.
Everything we do must be sensitive to their varying individual needs.
In closing, we once again express our appreciation to the staff at SED. Even as we
were working, SED continued exploring possibilities. The Panel is aware of the
Statewide Math Initiative recently formulated and the Panel believes this is exactly the
type of creative thinking that will move this effort forward.
The members of this Panel have been honored to have been asked to help find a
solution, and we offer our help in any way that might be needed in the future, so that we
adults can get this right for the children we all serve.