Students' course grades consisted of summed scores from three exams (75% of their grade) and so-called in-class quizzes (25%). Course exams consisted of multiple-choice, matching, short problem, and interpretation items (the last often based on computer output or research descriptions that I provided). With the exception of the short problem items, my exams assessed conceptual understanding and so were considered quite difficult, even by students who performed well on them. Unlike the other three kinds of items, the short problem items were similar (but not identical) to assigned homework problems. The students who actually did the assigned work usually performed well on these items. Those students who did not do the assigned work did well if they understood the concepts needed for the items; if not, they performed poorly.

I tried both take-home and in-class exams and quickly settled on in-class tests. Because of my students' full lives, many did not have the time needed to complete a take-home exam in a short time period. Also, I could not determine how to keep some of the students from "sharing" too much on take-home exams.

My exams were power tests. I tried to design the exams so that everyone had ample time to complete them. I officially extended the class period for an extra 30 minutes on exam days, but I gave students as much time as they wanted. The extra time was especially important for students from other countries whose primary languages were not English.

I also used "in-class quizzes" to assess my students' understanding. I assigned three kinds of homework: short conceptual mastery items, conventional problems that required students to "work out" the answers, and computer runs. The answers to the first two types of items were found in the text I used. About four times each semester, I would collect selected parts of this assigned work and grade them. The students were required to hand in their work immediately during the class period when I called for it (hence the name "in-class quizzes"). To receive credit for correct answers, students were required to show their reasoning and their work. Students could work together on their homework as long as they did not directly "copy" another's work. This process allowed me to reward students who were doing their work and correct their misconceptions without overwhelming myself with grading papers.


I did not allow students to use their texts during exams. When I initially tried open-book exams, I found that nervous students tended to spend too much time "flipping" through their texts, as though the exact answers to my items must be somewhere in the text. Students also did not study as much as when the exams were closed book.

After trying a number of things, I settled on allowing students to bring to exams up to five pages of notes that they had created, all notes and examples I had distributed (which included copies of the overheads I used during class as the basis for my presentations and the two concept maps I created that summarized the course), their homework assignments, and calculators. Students spent significant amounts of time creating their five pages of notes. That process served as a good learning and review activity since students had to identify important concepts and their interrelationships to create useful notes. To be useful, all of their materials had to be organized prior to the tests, again forcing students to identify important concepts and their interconnections.


I did not allow the use of computers during exams. Each student brought a calculator that included a square root key. It could have statistical capabilities too; however, since my tests were primarily conceptual, these capabilities were of no help during my exams.
I did not use "real" data sets during my exams. There wasn't enough time during a class period to use data sets as the basis for assessment items.

I tried to use "real" variables from educational news media and relevant educational research as the basis for my items. At least 75% of my items required conceptual understanding for successful completion. I tried to construct distractor options (the wrong answers) to my multiple-choice and matching items to reflect common misunderstandings held by students who incorrectly answered the items. This process allowed me to identify and to try to remediate common misunderstandings.


I do not believe in "grading on the curve." This approach sets grading standards based completely on the group of students who happen to be in the course at that time. I set my own standards and created my assessments to reflect them; I graded on a percent correct basis.

Students knew the percent correct cut-offs that were used to assign grades. Total points from the exams and the in-class quizzes summed together were used to assign final course grades. I did "wiggle" the cut-offs down (never up) as seemed appropriate when I assigned final grades. For example, if a student overall got 85% of the available points, that student was guaranteed an A. However, sometimes that cut-off was lowered to 84% of the available points for a grade of A. This process allowed me to compensate for exams that were too difficult.
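As a rough illustration of the cut-off arithmetic described above, here is a minimal Python sketch. Only the 85% cut-off for an A and the one-point downward adjustment come from the text; the cut-offs for the other grades, the function name `assign_grade`, and the point totals are hypothetical placeholders.

```python
# Hypothetical cut-offs (percent of available points). Only the 85% for an A
# is taken from the text; the others are placeholders for illustration.
CUTOFFS = [("A", 85.0), ("B", 75.0), ("C", 65.0), ("D", 55.0)]


def assign_grade(points_earned, points_available, wiggle=0.0):
    """Return a letter grade from summed exam and quiz points.

    `wiggle` lowers every cut-off by that many percentage points; it is
    never allowed to raise a cut-off.
    """
    if wiggle < 0:
        raise ValueError("cut-offs may be lowered, never raised")
    percent = 100.0 * points_earned / points_available
    for letter, cutoff in CUTOFFS:
        if percent >= cutoff - wiggle:
            return letter
    return "F"


# A student with 84% of the points misses the published A cut-off,
# but earns an A if the cut-off is lowered by one point.
print(assign_grade(252, 300))              # 84% against the published cut-offs
print(assign_grade(252, 300, wiggle=1.0))  # 84% with the A cut-off lowered to 84%
```

Because the adjustment can only lower a cut-off, no student ends up with a lower grade than the published cut-offs guaranteed.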

I also looked at the pattern of grades each student received across all assessments before assigning final grades. Because statistics is a hierarchical discipline, my tests were also hierarchical. That is, items on each successive exam required understanding of concepts from earlier parts of the course. If a student was on a borderline between two grades, I allowed the grades on assessments near the end of the course to determine that student's grade, because this approach gave students multiple opportunities to show their understanding.

Students received one point for each multiple-choice and matching item they correctly answered. The point values for the problem and interpretation items varied depending on what was required. I scored them analytically based on a rubric I created before grading the exams (although I sometimes altered the rubric while grading when students did things that I hadn't anticipated). Students could receive partial credit for these two types of items. If a student made a numerical mistake that meant all subsequent answers were incorrect, I would not give credit for the mistake but would grade the rest of the item using the student's incorrect answer as the basis for subsequent work. I was interested in what students did and did not understand, so I did not allow a simple numerical mistake to invalidate all of the remaining parts of the item.
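One way to picture this rule for carrying an error forward is the sketch below. It assumes, purely for illustration, a three-step short problem (variance, then standard deviation, then standard error) worth one point per step. The item, the values `N` and `SUM_SQ`, the point values, and the tolerance are all hypothetical and are not taken from an actual rubric.

```python
import math

N = 9            # hypothetical sample size
SUM_SQ = 200.0   # hypothetical sum of squared deviations given in the item

# Each step maps the previous step's value to this step's correct value.
STEPS = [
    ("variance", lambda ss: ss / (N - 1)),
    ("standard deviation", lambda var: math.sqrt(var)),
    ("standard error", lambda sd: sd / math.sqrt(N)),
]


def score_with_follow_through(student_answers, points_per_step=1, tol=0.01):
    """Score each step against the value implied by the student's own
    previous answer, so one slip costs credit only where it occurs."""
    earned = 0
    previous = SUM_SQ  # the first step starts from the value given in the item
    for (_, step), answer in zip(STEPS, student_answers):
        expected = step(previous)
        if math.isclose(answer, expected, abs_tol=tol):
            earned += points_per_step
        previous = answer  # later steps are judged from the student's value
    return earned


# The student divides by n instead of n - 1 (variance 22.22 instead of 25)
# but carries out the remaining steps correctly from that value.
print(score_with_follow_through([22.22, 4.714, 1.571]))
```

In this example the student loses the point for the variance step only; the two later steps earn credit because they follow correctly from the student's own value.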

I always did a simple item analysis to examine the quality of my items and to identify common student misunderstandings. I selected two groups of four or five test papers; one group included the highest total test scores (the high achievers) while the other contained the lowest total test scores (the low achievers). I calculated the percentage of students who correctly answered each item for the two groups of papers combined. This number served as an estimate for the overall percentage of students in the entire class who correctly answered each item. I used this percentage to determine the difficulty of each item, and so what percentage of students understood the concepts underlying each item. I also calculated the percentage of students in each of the two groups who correctly answered each item. I then subtracted the percentage of low achievers who correctly answered the item from the percentage of high achievers who did so. My tests were meant to differentiate among students based on their understanding. This percentage difference estimates how well each item succeeds in doing so. When most items differentiated the two groups, the test exhibited reasonable internal consistency (or reliability).
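The two percentages described above can be computed as in the following minimal Python sketch. The function name `item_analysis` and the response data are made up for illustration; the calculation itself follows the description: difficulty is the percent correct in the two groups combined, and the differentiation index is the high-group percentage minus the low-group percentage.

```python
def item_analysis(high_group, low_group):
    """Estimate item difficulty and discrimination from two groups of papers.

    Each group is a list of papers; each paper is a list of 1 (correct) or
    0 (incorrect), one entry per item.
    """
    n_items = len(high_group[0])
    results = []
    for i in range(n_items):
        high_correct = sum(paper[i] for paper in high_group)
        low_correct = sum(paper[i] for paper in low_group)
        pct_high = 100.0 * high_correct / len(high_group)
        pct_low = 100.0 * low_correct / len(low_group)
        # Difficulty: percent correct in the two groups combined.
        difficulty = 100.0 * (high_correct + low_correct) / (
            len(high_group) + len(low_group)
        )
        # Discrimination: percent of high achievers minus percent of low achievers.
        discrimination = pct_high - pct_low
        results.append((difficulty, discrimination))
    return results


# Made-up responses to three items from four high and four low papers.
high = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 0, 0]]
low = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1]]

for number, (difficulty, discrimination) in enumerate(item_analysis(high, low), 1):
    print(f"Item {number}: {difficulty:.1f}% correct, difference {discrimination:+.1f}")
```

In this made-up data, the second item separates the groups well (75% of the high group versus none of the low group), while the third is hard for everyone and does not differentiate at all, which would flag it for a closer look.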

This process also allowed me to identify items that were too difficult (or ambiguous). When I found extremely difficult items, I removed them from the students' test scores and used them as extra credit.
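The effect of this adjustment on a score can be sketched as follows. This is one plausible reading, stated here as an assumption: a removed item no longer counts toward the points available, but points earned on it are still added to the student's total. The function name `adjusted_percent` and the five-item test are hypothetical.

```python
def adjusted_percent(item_points, item_max, removed):
    """Percent score after treating the items in `removed` as extra credit.

    Removed items no longer count toward the points available, but any
    points a student earned on them are still added to the total.
    """
    available = sum(m for i, m in enumerate(item_max) if i not in removed)
    earned = sum(item_points)  # includes points earned on removed items
    return 100.0 * earned / available


# Hypothetical five-item test; the last item (index 4) proved too difficult.
item_max = [1, 1, 1, 1, 1]
student = [1, 1, 1, 0, 0]
print(adjusted_percent(student, item_max, removed=set()))  # all items count
print(adjusted_percent(student, item_max, removed={4}))    # last item is extra credit
```

Under this reading, no student's percentage goes down when an item is removed, and a student who did answer the removed item correctly keeps the benefit.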


As part of the review for the exam, I provided students with example items and a list of the most important concepts covered. We reviewed for about 30 minutes during the class session prior to the exam session. Students were encouraged to ask questions, create their own items, determine what was important, share their five-page exam notes, and do anything else they wanted or that I thought was important, short of actually looking at exam items. Students often came to my office to review work; some worked together to review.
We went over the exam during the class period following the test period. Each student received her/his corrected exam paper for this review. I indicated how many points were received on each item and identified the error(s) on the incorrect items, showing what could have been done to correctly answer those items. The students and I selected the items we discussed. I provided the score distribution. I encouraged students to discuss with the class why they chose the answers they did, although I explicitly indicated that they could not fight with me (or other students), be rude, or monopolize class time. I would determine when the discussion about a particular item was over. Occasionally, students would have valid reasons for selecting or creating what I initially considered to be wrong answers. When this occurred, I would indicate that I wanted to consider the possibility that they were right and would let them know during the next class period. The test review process usually took at least 30 minutes. Students were encouraged to come to my office for further private discussion.

I did not allow students to keep copies of the exams, because some students had access to organized test files while others did not. Instead, I encouraged them to make notes that they could keep. Students also could come to my office to review tests any time they wished.