As mentioned in my response to the "Use of Exams" section, I never administer timed in-class statistics examinations. Rather, my in-class examinations are untimed and use an open-book, open-notes format. I record the time each student takes when he or she turns in the examination form. Subsequently, I compute the correlation between the time taken and each student's score. My assumption is that a significant positive relationship would indicate that some students are not spending sufficient time on each question, to their detriment (i.e., poor test-taking skills as a possible primary cause and poor study skills as a possible secondary cause), whereas a significant negative relationship would indicate that the students taking the most time are most likely to be under-prepared for the examination (i.e., poor study skills as a possible primary cause and poor test-taking skills as a possible secondary cause). However, in 15 years of teaching at the college level, I have never observed a significant relationship.
As noted in my section entitled "Post-Exam Feedback," I spend at least one-half of a class period providing post-examination feedback. (This also allows me to model item analyses.) I provide each student with a scoring key that gives the solutions or model solutions to all items. In particular, I undertake an item analysis based on classical test theory, given the relatively small class sizes (i.e., n < 30). I compute item indices such as item difficulty, item discrimination, and the point-biserial correlation. These indices are compared with the cutpoints provided in the literature (cf. Crocker & Algina, 1986) for deeming items good. For example, I consider point-biserial correlations (obtained by correlating the scores on each open-ended item with overall test scores) that are two or more standard deviations above 0 to be indicative of a good item (Crocker & Algina, 1986, p. 324). For instance, a class size of 20 would yield an approximate cutpoint of .23 for a point-biserial correlation. Also, I use Ebel's (1965) criteria for interpreting discrimination indices, D. (I compute discrimination indices for open-ended responses via Kelley's [1939] recommendations.) Specifically,