2) Terms related to test construction

○ Development of tests

Item

An item refers to a question that has been developed for the test but has not yet been used, and is stored in the item bank.

※ An item that is selected from the item bank and placed in the test is called a test item.
Test item

A test item refers to an item that is selected from the item bank and printed on the test paper.

※ Used item: an item already used to construct a test.
Content specification/content outline

This is the standard for test item development. It specifies the content of items, classified hierarchically as "category levels 1, 2, and 3" (in descending hierarchical order) or as "subject-field-domain-specific domain-item".

Item development (item writing)

It is a procedure in which an item writer develops test items based on the content specification.

※ The term item development is often used to cover the entire procedure, including item writing and review.

Item review

Item review is a procedure whereby related specialists review the developed items to make sure that they can be used in constructing a test.

Item screening

This is a procedure whereby subject experts screen the reviewed items and decide, according to set criteria, which to keep unchanged, which to keep after modification, and which to discard.

Item bank / item pool

The item bank is where completed items that have gone through development, review, and screening are stored. They are kept on cards or in a computer, together with related information on each item, including its characteristics.

Test construction

It is a procedure whereby items are selected from the item bank to construct a test.

Standards of test construction

The standards of test construction are composed of test subjects, category level 3, field (or domain), the number of test items, and score allocation.
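As a rough illustration of how the item bank and the construction standards fit together, the sketch below draws unused items for each subject/field quota and marks them as used. The `BankItem` record, the `construct_test` function and the quota dictionary are hypothetical; score allocation and category level 3 are left out for brevity, and seeded random sampling stands in for whatever selection rule an actual test construction committee applies.

```python
import random
from dataclasses import dataclass


@dataclass
class BankItem:
    # Hypothetical record layout; a real item bank would also hold the item text,
    # options, key, category levels 1-3 and past item statistics.
    item_id: str
    subject: str
    field_name: str
    used: bool = False  # True once the item has been placed in a test ("used item")


def construct_test(bank, standards, seed=0):
    """Draw unused items from the bank to meet each (subject, field) quota.

    standards: dict mapping (subject, field_name) -> number of items required.
    All quotas are checked before anything is selected; selected items are
    then marked as used. Raises ValueError if a quota cannot be met.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    pools = {}
    for (subject, field_name), count in standards.items():
        pool = [it for it in bank
                if it.subject == subject and it.field_name == field_name and not it.used]
        if len(pool) < count:
            raise ValueError(
                f"only {len(pool)} unused items for {subject}/{field_name}; need {count}")
        pools[(subject, field_name)] = pool

    selected = []
    for key, count in standards.items():
        chosen = rng.sample(pools[key], count)
        for it in chosen:
            it.used = True
        selected.extend(chosen)
    return selected
```

For example, with five unused Hematology items in the bank, a quota of 3 returns three distinct items and leaves two unused; a second identical request then fails because only two remain.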

○ Type of test construction

One best answer type (Type A)

The examinee selects the one best answer from the five options given for each item.

□ Choose only one best answer for each test item (← lead-in)

Ex) What is the most common cause of atelectasis after abdominal surgery? (← stem)

  1) Convulsion in bronchiole
  2) Bronchial clogging
  3) Pneumothorax
  4) Pulmonary thromboembolism
  5) Deterioration of respiratory quotient

Multiple true-false type (Type K)

In this type, a stem is followed by four or more answer choices, and the examinee selects the one combination that consists of the correct answers.

Ex) Which of the following are X-linked recessive disorders?

  A. Hemophilia A
  B. Cystic fibrosis
  C. Duchenne muscular dystrophy
  D. Tay-Sachs disease

  1) A, B, and C
  2) A, C
  3) B, D
  4) D
  5) A, B, C, and D

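Scoring a Type K item reduces to matching the set of true statements against the fixed combinations. The sketch below encodes the five combinations of the example above; the function names are hypothetical and the all-or-nothing scoring rule is an assumption, not a stated KHPLEI rule.

```python
# Combinations offered by the Type K example above (option number -> statements).
K_OPTIONS = {
    1: frozenset("ABC"),
    2: frozenset("AC"),
    3: frozenset("BD"),
    4: frozenset("D"),
    5: frozenset("ABCD"),
}


def k_option_for(true_statements):
    """Return the option whose combination equals the set of true statements, or None."""
    target = frozenset(true_statements)
    for number, combo in K_OPTIONS.items():
        if combo == target:
            return number
    return None


def score_k(response, true_statements):
    """Assumed all-or-nothing rule: 1 only if the chosen option is the exact combination."""
    return int(response == k_option_for(true_statements))
```

For the example, hemophilia A and Duchenne muscular dystrophy are the X-linked recessive disorders, so `k_option_for({"A", "C"})` returns option 2, and choosing option 1 earns no credit even though it contains both.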
Extended matching set type (Type R)

It is a type of multiple choice question. A Type R item consists of 1) theme, 2) lead-in, 3) option list, and 4) stems.

A set may use from 4 up to 26 options. Whereas a Type A item has five options that belong to that item alone, a Type R set uses one option list for all items in the set.

Ex) Theme: fatigue

For each item, select the number of the most likely diagnosis from the 14 options.

  1. Acute leukemia
  2. Anemia of chronic disease
  3. Congestive heart failure
  4. Depression
  5. Epstein-Barr virus
  6. Folic acid deficiency
  7. Glucose-6-phosphate dehydrogenase deficiency
  8. Hereditary spherocytosis
  9. Hypothyroidism
  10. Iron deficiency
  11. Lyme disease
  12. Microangiopathic hemolytic anemia
  13. Miliary tuberculosis
  14. Vitamin B12 deficiency
  1. A 19-year-old woman complains of two weeks of fatigue, fever, and sore throat. She has a fever of 38.3°C, cervical lymphadenopathy, and splenomegaly. The leukocyte count is 5,000/mm³ (80% lymphocytes, most of them atypical). Serum aspartate aminotransferase (AST) is 200 U/L, but serum bilirubin and alkaline phosphatase are normal. (Select one.)
  2. A 15-year-old girl complains that she has been bruising easily for the last two weeks and has been feeling severe fatigue with back pain. She has widespread bruising, pallor, and tenderness over the vertebrae and both femurs. Hemoglobin concentration is 7.0 g/dL, the leukocyte count is 2,000/mm³, and the platelet count is 15,000/mm³. (Select one.)
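The defining feature of Type R, one option list shared by every stem in the set, can be captured in a small data structure. The sketch below is hypothetical: the class name, the 1-based keys, and the keys chosen for the two vignettes (Epstein-Barr virus and acute leukemia, our reading of the cases rather than an official key) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExtendedMatchingSet:
    theme: str
    lead_in: str
    options: list[str]  # one option list shared by every stem in the set
    stems: list[str]
    keys: list[int]     # 1-based option number keyed for each stem

    def __post_init__(self):
        if not 4 <= len(self.options) <= 26:
            raise ValueError("a Type R set needs between 4 and 26 options")
        if len(self.keys) != len(self.stems):
            raise ValueError("exactly one key is needed per stem")
        if any(not 1 <= k <= len(self.options) for k in self.keys):
            raise ValueError("every key must point into the shared option list")

    def score(self, responses):
        """Count stems answered correctly; responses are 1-based option numbers."""
        return sum(r == k for r, k in zip(responses, self.keys))


FATIGUE_SET = ExtendedMatchingSet(
    theme="Fatigue",
    lead_in="For each item, select the number of the most likely diagnosis.",
    options=[
        "Acute leukemia", "Anemia of chronic disease", "Congestive heart failure",
        "Depression", "Epstein-Barr virus", "Folic acid deficiency",
        "Glucose-6-phosphate dehydrogenase deficiency", "Hereditary spherocytosis",
        "Hypothyroidism", "Iron deficiency", "Lyme disease",
        "Microangiopathic hemolytic anemia", "Miliary tuberculosis",
        "Vitamin B12 deficiency",
    ],
    stems=[
        "Vignette 1: 19-year-old woman, fever, sore throat, atypical lymphocytes",
        "Vignette 2: 15-year-old girl, easy bruising, bone tenderness, pancytopenia",
    ],
    keys=[5, 1],  # illustrative keys (our reading of the vignettes, not an official key)
)
```

Because the option list lives on the set rather than on each stem, adding a third vignette only requires appending one stem and one key.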

○ Knowledge Level

Recall type

Items of this type can be answered simply by recalling a memorized fact. The recall type may resemble the recognition type, but it asks the examinee to respond to the given content. This type of item tests specialized knowledge such as forms, facts, terms, mechanisms, principles, procedures, sequences, types, classifications, methods, concepts, academic rationales, and theories.

Interpretation type

This type asks the examinee to respond to a new situation based on a complete understanding of acquired knowledge. The examinee must remember certain facts, know the reasons behind them, and then interpret them anew and express that interpretation in another form. Items of this type deal with the procedures required to handle clinical information and data. For example, an item may present data such as a medical history, radiological images, electrocardiograms, and examination results, and ask the examinee to interpret, identify, analyze, and explain them.

Problem-solving type

This refers to an item requiring the ability to solve specific problems using the knowledge the examinee has at his or her disposal. Problem-solving items usually require the examinee not only to interpret the information given in the item but also to interpret the meaning of the listed options. They are usually related to diagnosis, treatment, structuring, and judgment based on clinical data. These are the most comprehensive items, covering recall, understanding, application, analysis, and synthesis, as well as judgment and decision-making ability at each step.

○ Analysis of test construction

Item analysis

Item analysis is the procedure of analyzing the characteristics of each item in an examination to evaluate the quality of the test. It can be divided into qualitative analysis, which concerns content validity, and quantitative analysis of item difficulty, item discrimination, and option responses.

Item difficulty

The KHPLEI uses the percentage of correct answers as an index of how difficult an item is. If all examinees answer an item correctly, the percentage of correct answers is 100%; if none do, it is 0%. Therefore, the closer the percentage is to 100, the easier the item, and the closer it is to 0, the more difficult the item.

Test construction must take item difficulty into account according to the purpose and subjects of the examination. In general, item difficulties should be spread evenly across the range from 10 to 90; that is, a suitable mix of difficult and easy items maintains overall discrimination better than a set of items that are all of similar difficulty.

Depending on the purpose and subjects of the examination, a test may be composed only of easy items or only of difficult items. In general, however, it is desirable for item difficulties to form a normal distribution centered around 50 ~ 60.
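The difficulty index and the 10 to 90 range check can be sketched as follows, assuming responses are scored 1 (correct) or 0 (incorrect) in a matrix with one row per examinee and one column per item; the function names are hypothetical.

```python
def item_difficulty(responses):
    """Percentage of examinees answering the item correctly (0-100).

    responses: sequence of 1 (correct) / 0 (incorrect) for one item.
    Higher values mean an easier item.
    """
    if not responses:
        raise ValueError("no responses")
    return 100.0 * sum(responses) / len(responses)


def difficulty_profile(matrix):
    """Per-item difficulty for a 0/1 matrix (rows = examinees, columns = items),
    plus the share of items whose difficulty lies between 10 and 90."""
    n_items = len(matrix[0])
    p = [item_difficulty([row[j] for row in matrix]) for j in range(n_items)]
    in_range = sum(10 <= v <= 90 for v in p) / n_items
    return p, in_range
```

With four examinees and three items, `difficulty_profile([[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]])` gives difficulties of 100, 50 and 25, so two of the three items fall within the 10 to 90 range; the first item, answered correctly by everyone, cannot discriminate at all.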

Item discrimination

Item discrimination indicates the extent to which an item distinguishes between examinees of different ability. If examinees with high total scores answer the item correctly and those with low total scores answer it incorrectly, the item has discriminating power. The discrimination index can be estimated as the correlation between examinees' scores on the item and their total scores on the test. The closer this index is to 1, the higher the item discrimination.
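When an item is scored 0/1, the correlation described above is the point-biserial correlation, which equals the ordinary Pearson correlation between item scores and total scores. The sketch below computes it directly; the function name is hypothetical, and since it is not stated whether the item itself is removed from the total before correlating, the uncorrected version is shown.

```python
from math import sqrt


def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores.

    For a dichotomous item this equals the point-biserial correlation.
    The total here still includes the item itself (uncorrected), which
    inflates the value, especially on short tests.
    """
    n = len(item_scores)
    mx = sum(item_scores) / n
    my = sum(total_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    sxx = sum((x - mx) ** 2 for x in item_scores)
    syy = sum((y - my) ** 2 for y in total_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when item or total has no variance")
    return sxy / sqrt(sxx * syy)
```

An item answered correctly only by the two highest scorers, for example `item_total_correlation([1, 1, 0, 0], [9, 8, 3, 2])`, yields a value close to 1, while reversing the item scores makes it negative.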

The formula for item discrimination is as follows:

Another method of calculating item discrimination is to compare the number of examinees with high test scores who answered the item correctly with the number of examinees with low test scores who answered it correctly (Johnson, 1951). To form the two groups, examinees may be split at the middle into two groups of equal size, or only the top 27% and the bottom 27% of scorers may be used to estimate item discrimination (Kelley, 1939).

The KHPLEI uses this method of separating examinees into the upper 27% and the lower 27% to measure discrimination. The closer the index is to 1, the higher the discrimination. The KHPLEI calculates the item discrimination index as follows:

According to the above formula, when fewer examinees in the upper group than in the lower group answer the item correctly, the index is negative (-).
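A common form of the upper-lower index is D = (U - L) / n, where U and L are the numbers of examinees in the upper and lower groups who answer correctly and n is the size of each group; it is assumed here, not confirmed, that this matches the KHPLEI formula. The sketch ranks examinees by total score and compares the top and bottom 27%; the function name is hypothetical.

```python
def upper_lower_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = (U - L) / n.

    Examinees are ranked by total score; the top and bottom `fraction`
    (27% by default) form the comparison groups of size n. D is negative
    when the lower group answers correctly more often than the upper group.
    """
    n_examinees = len(total_scores)
    g = max(1, round(n_examinees * fraction))  # size of each group
    # Stable sort: ties at the cut keep their original order (a simplification).
    order = sorted(range(n_examinees), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:g], order[-g:]
    u = sum(item_scores[i] for i in upper)  # correct answers in the upper group
    l = sum(item_scores[i] for i in lower)  # correct answers in the lower group
    return (u - l) / g
```

With ten examinees the groups hold three each; if two of the top three and one of the bottom three answer correctly, D is (2 - 1) / 3, about 0.33, and swapping the pattern gives about -0.33. Operational procedures may treat ties at the 27% cut differently.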

Although there is no absolute standard for evaluating items by this index, Ebel (1965), with reference to the reliability of a test instrument, proposed the following guidelines for evaluating item discrimination.

<Guidelines on the evaluation of item discrimination by Ebel>

Index              Item evaluation
0.40 and above     Good
0.30 ~ 0.39        Okay
0.20 ~ 0.29        Low
0.10 ~ 0.19        Poor
Less than 0.10     None

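Applying these bands in code needs a decision at the boundaries, since the table lists values to two decimals. The sketch below treats each band as a half-open interval (so 0.395 counts as Okay) and treats exactly 0.40 as Good; both choices are assumptions, and the function name is hypothetical.

```python
def ebel_rating(d):
    """Map a discrimination index to Ebel's evaluation bands (half-open intervals)."""
    if d >= 0.40:
        return "Good"
    if d >= 0.30:
        return "Okay"
    if d >= 0.20:
        return "Low"
    if d >= 0.10:
        return "Poor"
    return "None"  # includes negative indices
```

Negative indices, which signal that low scorers outperform high scorers on the item, fall into the lowest band and usually warrant a review of the key.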
Multiple choice response analysis

This shows how frequently examinees choose each option of a multiple-choice item. The analysis is conducted to determine how effective the distracters are and whether the correct answer functions as intended.
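A minimal sketch of this analysis, assuming responses are recorded as the option labels "1" to "5", with anything else (such as a blank) counted as an omission. It tabulates how many examinees chose each option and lists distracters that nobody chose, which are usually regarded as non-functioning; the function names are hypothetical.

```python
from collections import Counter

OPTIONS = ("1", "2", "3", "4", "5")


def option_frequencies(responses, options=OPTIONS):
    """Count examinees choosing each option; any other value (e.g. None) is an omission."""
    counts = Counter(responses)
    table = {opt: counts[opt] for opt in options}
    table["omit"] = sum(c for r, c in counts.items() if r not in options)
    return table


def nonfunctioning_distracters(responses, key, options=OPTIONS):
    """Wrong options (distracters) that no examinee chose."""
    freq = option_frequencies(responses, options)
    return [opt for opt in options if opt != key and freq[opt] == 0]
```

For responses `["2", "2", "1", "2", None, "3", "2"]` with key "2", options 4 and 5 attract no one and would be candidates for revision.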

Reliability

This refers to the degree to which an examination consistently measures what it is intended to measure, without error.

Currently, the most widely used measure of reliability is Cronbach's α, which estimates the reliability of an examination from its internal consistency. The closer Cronbach's α is to 1, the higher the reliability. The KHPLEI uses Cronbach's α to determine reliability. The formula is as follows.
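For reference, the standard definition is α = k/(k - 1) · (1 - Σσᵢ² / σ_X²), where k is the number of items, σᵢ² is the variance of item i, and σ_X² is the variance of the total scores. The sketch below computes it with Python's `statistics.pvariance`; the function name and the examinee-by-item matrix layout are assumptions, and any variance convention gives the same α as long as it is used for both terms.

```python
from statistics import pvariance


def cronbach_alpha(matrix):
    """Cronbach's alpha for a score matrix (rows = examinees, columns = items).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used for both terms.
    """
    k = len(matrix[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in matrix])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For the four-examinee, three-item matrix `[[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]`, the item variances sum to 0.625 and the total-score variance is 1.25, giving α = 1.5 × (1 - 0.5) = 0.75.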