Wang Ruoying, PhD student at ALC, NIE, NTU
This paper discusses GAI’s potential in assessment design and identifies its ethical and practical challenges, limitations, and concerns. It also provides suggestions for teachers and policymakers on responsibly integrating GAI into future assessment practices.
Strengths and Potential of GAI in Assessment Design
Personalized Task Design
Students differ in their backgrounds, personality traits, and learning styles, yet traditional standardized assessments often overlook these differences by providing the same tasks to all. GAI overcomes this by efficiently creating assessment tasks aligned with students’ unique backgrounds, interests, and skill levels, making assessments more engaging and inclusive (Holmes, Bialik, & Fadel, 2019). Bennett (2024) emphasizes that, despite challenges, personalization enabled by GAI is essential for ethical and inclusive education, allowing teachers to efficiently create diverse, meaningful assessments, which was previously difficult with traditional methods alone.
However, the quality and relevance of GAI-generated assessments depend significantly on the input from teachers. We will explore this further later in the paper.
Authentic Assessment Design
Another strength of GAI in assessment design is that it can help generate authentic, performance-based assessments. Educators often find designing real-world, scenario-based tasks challenging due to time constraints and the creativity required. GAI efficiently produces realistic case studies or scenarios that teachers can refine. In my research, GAI-generated case studies were leveraged to help Singaporean Chinese youths develop workplace Mandarin communication skills, such as participating in meetings, providing feedback, managing hierarchical interactions, and handling disagreements professionally. These materials can enhance authenticity, increase student engagement, and align with Hager and Butler’s (1996) “judgmental model”, which emphasizes holistic, context-based evaluation.
Scalability, Efficiency, and Teacher Professional Growth
Now, with the help of GAI, teachers can quickly create tasks across different difficulty levels and learning contexts. Combining their professional judgment with GAI’s capabilities allows teachers to select and refine the most appropriate content, making the assessment design process more efficient and scalable.
In addition, using GAI thoughtfully can also contribute to teachers’ professional growth. Mindful interaction with GAI stimulates teachers to reflect more deeply on their assessment goals, intended learning outcomes, and evaluation criteria, ultimately fostering stronger alignment between assessment and instructional objectives (Biggs, 1996).
In summary, GAI shows great potential to improve assessment design. It helps create more personalized tasks that engage students, generates authentic assessments that reflect real-world situations, and encourages teachers to think more intentionally about designing and delivering assessments. With thoughtful use, GAI can be a powerful tool for making assessments more inclusive, meaningful, and effective.
Gaps, Limitations, and Possible Ethical Issues of GAI
Although GAI has significant potential in assessment design, it also has notable limitations. In this section, I identify and discuss key challenges and concerns.
Potential Bias and Fairness Issues
GAI models like ChatGPT learn from extensive human-generated data and might unintentionally replicate cultural stereotypes or biases (Holmes, Bialik, & Fadel, 2019). Without careful teacher review, these biases could create discomfort or disadvantage for learners (Bennett, 2024). Thus, ensuring fairness and inclusivity still requires active human oversight to screen, adapt, and align GAI-generated content with the learners’ needs and ethical considerations.
Validity and Construct Alignment Problems
A task created by GAI may seem to align with a learning goal but could conceal underlying issues. For example, a math problem that assesses algebra might unintentionally involve complex vocabulary. In such cases, the task could transform into a reading assessment instead of a math test, introducing construct-irrelevant factors (Biggs, 1996). Teachers would be able to identify such issues during meticulous test design, but there is a risk that subtle misalignments might go unnoticed if the teacher assumes the GAI output is valid without verifying it.
Teacher and Student Over-Reliance on GAI
Another concern is that over-reliance on GAI may negatively impact teachers’ and students’ skills, judgment, and agency.
Teachers who rely excessively on GAI to generate assessment tasks without careful curation risk losing their creative input and nuanced understanding of student backgrounds and needs (Swiecki et al., 2022). Likewise, students who depend too heavily on GAI may weaken their problem-solving and critical thinking skills and diminish their learning effectiveness. Such dependence could also foster academic dishonesty if students allow GAI to complete tasks entirely. Therefore, teachers must maintain a balanced approach in GAI usage to ensure it supports rather than replaces essential teaching and learning skills.
Transparency and Explainability Issues
A further challenge is the limited transparency and explainability of GAI decision-making processes, often called the “black box” problem. This opacity makes it difficult for teachers and students to understand the rationale behind its decisions, which may undermine trust in the assessment. Additionally, teachers may struggle to justify GAI-generated assessments, as saying “the GAI decided so” is neither sufficient nor acceptable.
Until better explainability mechanisms are established, transparency will remain a significant concern when adopting GAI in education. Therefore, educators should proactively prompt GAI to clarify and justify its decisions during interactions.
Considerations of Students’ Perceptions and Privacy
Student perceptions and privacy raise further ethical considerations that must be addressed.
First, not all students or educators will readily embrace GAI-generated assessments. Some students may feel uneasy about having an algorithm involved in their learning or assessment. They might worry that GAI-generated tasks are unfamiliar, more complex than human teacher-designed tasks, or not aligned with what they have been taught. Others may feel confused or frustrated by personalized assessments if the criteria are not transparent or consistent. These concerns are valid. Students must perceive assessments as fair and supportive to stay motivated and engaged (Sadler, 1989). Without clear communication and guidance, such discomfort could lead to mistrust or refusal to participate.
The second concern is privacy, as GAI tools often require large amounts of data to generate customized assessments or score student work. These tools may use student information, performance history, or personal data, creating a risk of data exposure: student data could be stored on external servers, potentially accessed by third parties, or used to train GAI models without consent.
In summary, challenges such as potential bias, validity and reliability issues, teacher and student over-reliance, lack of transparency, student resistance, and privacy concerns indicate that effective and ethical use of GAI requires more human intervention. Sadler (2007) warns that when new assessment methods are adopted without carefully considering their pitfalls, they risk “achieving the reverse of what was intended.” In the case of GAI, this could lead to assessments that confuse or alienate students rather than support their learning. These challenges set the stage for the next step in this critique: How can we refine and responsibly integrate GAI to fully realize its benefits while minimizing its risks?
Suggestions for Enhancing GAI Integration in Assessment Design
The overarching principle for enhancing the ethical use of GAI in assessment design is that humans must remain in control of the process, using GAI as a supportive tool rather than a replacement. With that in mind, the following actions can help optimize GAI’s integration.
Active Teacher Curation and Alignment
Instead of accepting GAI outputs at face value, teachers should actively curate the content generated by GAI to ensure its accuracy, appropriateness, and alignment with learning outcomes and instructional activities.
One practical method to effectively implement this curation is to provide relevant context, constraints, and guidelines in the initial GAI prompt. During our student-directed seminar, we demonstrated how we used GAI to design personalized oral assessments for ten English L2 learners. To guide the GAI output, we prompted it with each learner’s IELTS scores, teachers’ observations on their classroom performance, current language proficiency, and personality traits. Furthermore, in the prompt, we established constraints to exclude any writing tasks from the assessment and created guidelines to ensure that each student has equal opportunities to speak and engage. With all this information, these constraints, and these guidelines provided in the prompt, the output was satisfactory. Although our student profiles were fictional, we suggest that, in actual practice, teachers should carefully review the GAI-generated drafts. They should correct inaccuracies and replace unfamiliar material with content students recognize and relate to. This iterative cycle of prompting and refining ensures that the final assessments are accurate, fair, built on learners’ strengths, and tailored to their needs.
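The structured prompting described above can be sketched in code. The following is a minimal, hypothetical illustration, not a specific tool’s API: the profile fields, the sample constraints, and the build_prompt helper are all invented for demonstration, and the resulting string would be pasted or sent to whichever GAI system the teacher uses.

```python
# Hypothetical sketch: assembling a structured GAI prompt from a learner
# profile, task constraints, and fairness guidelines. All field names and
# values are illustrative, not drawn from any real student or tool.

def build_prompt(profile: dict, constraints: list[str], guidelines: list[str]) -> str:
    """Combine learner context, constraints, and guidelines into one prompt."""
    lines = ["Design a personalized oral assessment task for this learner."]
    lines.append("Learner profile:")
    for key, value in profile.items():
        lines.append(f"- {key}: {value}")
    lines.append("Constraints:")
    lines.extend(f"- {c}" for c in constraints)
    lines.append("Guidelines:")
    lines.extend(f"- {g}" for g in guidelines)
    return "\n".join(lines)

prompt = build_prompt(
    profile={
        "IELTS speaking score": "5.5",
        "classroom observation": "hesitant in group discussion",
        "personality": "introverted, prefers structured turn-taking",
    },
    constraints=["Exclude all writing tasks.", "Keep the task under 10 minutes."],
    guidelines=["Give every student equal opportunities to speak and engage."],
)
print(prompt)
```

Keeping the profile, constraints, and guidelines as separate inputs makes the iterative cycle concrete: after reviewing a draft, the teacher edits only the relevant section and regenerates.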
In addition to guiding and refining GAI’s outputs, teachers should equip students with the necessary skills during lessons to ensure the alignment between assessment and instructional activities. To achieve this, teachers should purposefully integrate GAI-created content into daily instruction rather than using it as isolated assessment material. This content can prompt group discussions, role-plays, or formative assessments to gradually build students’ knowledge and cognitive skills. By thoughtfully embedding these elements into regular instructional activities, teachers can reinforce the connection between teaching and assessment and create a more cohesive and meaningful learning experience. During this process, if students seem confused by a GAI-generated task in a class activity, the teacher can swiftly clarify or adjust it, using that observation as feedback to refine future GAI prompts.
Furthermore, teachers can also involve colleagues in reviewing GAI-generated tasks as a peer-review process. A department might establish a routine where another experienced teacher examines the GAI-designed test for bias or misalignment. This collaborative professionalism helps refine the final assessment instrument to meet the same standards as any human teacher-designed test. It also creates opportunities for teachers to share effective prompting techniques. If a particular prompt wording helped produce well-aligned questions, the teacher could share it with colleagues. Such dialogue checks for content flaws and improves the team’s prompting strategies.
In summary, teacher involvement in initial prompt design, item generation, editing, and final approval is essential to ensure the alignment between learning objectives, classroom instruction, and evaluation.
Establishing and Applying Rigorous Evaluation Criteria to GAI-Generated Content
As discussed earlier, teachers should actively curate GAI-generated content to ensure alignment. To support this process, schools and teachers must establish clear evaluation criteria for adopting and reviewing the content and items generated by GAI. These criteria can serve as a quality-control checklist, and the checklist may include the following considerations:
Alignment with Learning Objectives
Researchers have consistently emphasized that the assessment task must align with the specific knowledge or skills based on the curriculum standards (Biggs, 1996). If any part of a GAI-generated question includes content unrelated to the learning objectives or material not previously taught, teachers should revise the question by removing irrelevant components or prompt the GAI to generate a new version based solely on the covered material. Adopting this approach helps ensure that the assessment accurately reflects students’ learning.
Cognitive Complexity
Apart from content alignment, teachers should also identify the level of thinking required in the assessment, such as application, analysis, and evaluation, and ensure it matches what was intended and what students were prepared for. Teachers may consider reviewing GAI-generated items using frameworks such as Bloom’s Taxonomy (Bloom et al., 1956) or Webb’s Depth of Knowledge (Webb, 1997) to confirm appropriate complexity.
Language Clarity
All wording should be clear, unambiguous, and suitable for students’ ages and educational backgrounds. For instance, if a GAI-generated question includes overly formal or complicated language in the instructions, especially for younger learners or those with limited literacy skills, teachers should revise it using more straightforward and accessible language to ensure that all students can understand the task requirements without confusion.
Fairness and Bias
Teachers should ensure that the content is free from cultural or gender bias and stereotyping by reviewing names, contexts, and scenarios to ensure they reflect varied cultural perspectives and that no group is consistently portrayed negatively (Holmes, Bialik, & Fadel, 2019). If an item might advantage or disadvantage certain students for reasons unrelated to the intended learning construct, it should be revised or removed to maintain fairness.
Validity and Authenticity
Teachers should confirm that the task measures the intended skill or knowledge to avoid construct-irrelevant factors. They should also determine whether the task is meaningful and authentic, closely reflecting how the skill would be applied in real-world contexts. Teachers can prompt the GAI to frame assessment questions within realistic scenarios to boost authenticity. Incorporating situations that students might encounter outside the classroom, such as resolving a common workplace issue or engaging in everyday problem-solving, can help students see the direct relevance of the assessment.
Validity and Reliability
If multiple versions of GAI-generated tests are used for different tutorial groups or makeup exams, teachers should ensure all versions are comparable in difficulty and length to maintain fairness. Likewise, if GAI is used for scoring or feedback, its validity and reliability must be verified. Teachers should pilot the tool on sample student responses and compare the GAI’s results with human grading. If there are significant discrepancies, teachers should include detailed rubrics or scoring criteria, provide examples of different-level responses in the prompts, and calibrate the GAI’s evaluations against human-assigned scores to improve validity and reliability.
In conclusion, establishing and applying evaluation criteria to GAI-generated content is essential to ensuring its alignment with learning objectives regarding content and cognitive complexity and maintaining clarity, authenticity, validity, and reliability. This approach will enable teachers to confidently incorporate GAI tools into assessment practices.
Ensuring Transparency and Managing Student Concerns
To address GAI’s opacity and alleviate student concerns, teachers themselves must first understand how GAI makes decisions. For instance, teachers should request GAI to clarify why it generates specific questions or categorizes students in a particular way to ensure that the assessment tasks are valid and aligned with learning goals, especially if the categorization and task assignment were not intentionally and explicitly guided by the teachers’ prompting. Teachers should also be transparent with students about when and how GAI is used. They may explain that some tasks were created with GAI’s assistance but were carefully reviewed and adapted by the teacher. This reassures students that the teacher is still in control and that GAI is a supportive tool used thoughtfully and responsibly. Adopting this approach can help build students’ trust and make them more comfortable with GAI-assisted assessments.
Moreover, teachers can invite students to provide feedback on GAI-designed assessments by conducting a quick survey to identify questions that were unclear, confusing, or misaligned with what they learned. If several students flag a particular GAI-generated question, the teacher can reflect on whether the problem was a flaw in the GAI-generated item or a gap in classroom instruction that needs further attention, and then address the issue accordingly in future assessments or lessons. This approach helps teachers catch problems they may have overlooked and gives students a sense of agency in the assessment process. Consequently, students will feel their voices matter in this new territory.
Additionally, it is essential to address students’ anxiety and resistance empathetically. Teachers may engage in open discussions to clarify misconceptions and reassure students that the teachers will make the final grading decisions, not the GAI. Teachers could also consider demonstrating the creation process of GAI-generated questions, including the initial prompt, the GAI’s output, and subsequent teacher edits. This approach could help demystify the technology and reinforce transparency. If necessary, teachers could offer alternative assessment options to ensure that students feel supported and respected.
By combining professional expertise, transparent communication, and empathetic engagement, teachers can manage concerns around GAI’s opacity, foster student trust, and cultivate a student-centered learning environment.
Establishing Clear Guidelines for GAI Use and Academic Integrity
Clear guidelines for students and teachers are essential to ensure the responsible use of GAI in assessment.
Regarding students’ use of GAI, schools should work with teachers to establish policies that clarify when and how students can use GAI tools, how to credit GAI assistance appropriately, and what distinguishes legitimate use from academic misconduct. For example, a policy might permit students to use GAI to brainstorm ideas or polish the language but forbid using it to write complete answers or assignments. Teachers may also require students to include a short reflection describing any GAI support used, which encourages transparency and discourages over-reliance.
Regarding teachers’ use of GAI, schools may require that, in addition to the teacher, at least one colleague review the GAI-generated test before it is administered to students. It may also be wise to avoid using GAI for high-stakes exams without extensive prior testing of its outputs in low-stakes settings. For instance, teachers may first introduce a GAI-generated question in a practice quiz or homework and observe how students approach it. After refining the question and confirming that it works well, the teacher could include a similar item on a graded test but assign it a lighter weight. Only when students are more familiar and comfortable with GAI-generated tasks and the validity and reliability of the content have been demonstrated can teachers gradually increase its use.
In addition, the guidelines must address data privacy and security concerns. Schools should ensure that any GAI tools employed comply with relevant student data protection laws and that only necessary information is shared or stored. IT staff must review the tool’s privacy policies to confirm that student data is used exclusively for educational purposes and not misused or retained unnecessarily. These measures will help protect student privacy and maintain trust in GAI-assisted assessments.
By developing and following such guidelines, schools and institutions can maintain academic integrity and pedagogical soundness in the GAI age.
Ongoing Training and Support
Professional development and training can empower teachers to utilize GAI tools more confidently and effectively. Such training could focus on enhancing teachers’ technical skills, such as prompt engineering, and pedagogical strategies, like aligning GAI-generated tasks with learning outcomes and identifying potential bias in content. This training can help address one of the primary challenges in GAI integration: the lack of preparedness among educators (Swiecki et al., 2022). Teachers can engage in a workshop where they collaboratively create prompts for a specific topic, test them with a GAI tool, and then collectively assess the outputs, as we did in our student-directed seminar. By experimenting with various wording and constraints in their prompts and observing how those modifications influence the GAI’s responses, teachers can enhance their understanding of how to achieve valuable results. Such exercises also allow teachers to discover GAI’s limitations in a low-stakes environment. Additionally, teachers can share experiences and resources to support one another. Over time, this collaborative approach will help establish a foundation of best practices for integrating GAI into assessment design and a broader educational context.
Conclusion
In this critique, we acknowledged that GAI has the potential to enhance assessment practices by generating personalized tasks and realistic scenarios. It not only boosts student engagement but also improves the efficiency and scalability of assessment design. At the same time, we pointed out ethical and practical challenges that cannot be ignored, including potential biases, validity and reliability issues, risk of over-reliance, and transparency and privacy concerns. We then suggested that humans remain in control of the process to harness GAI’s benefits while mitigating these risks.
Teachers and policymakers should actively guide and oversee the alignment of GAI-generated assessments with learning objectives and instructional designs, apply rigorous evaluation criteria to GAI-designed tasks, ensure transparency with students about GAI’s role, establish clear usage guidelines and academic integrity policies, and enhance AI literacy by pursuing ongoing training.
With these safeguards in place, GAI can enrich assessment design and student learning more ethically and effectively without compromising human expertise.
Reference List
Bennett, R. E. (2024). Personalizing assessment: Dream or nightmare? Educational Measurement: Issues and Practice, 43(4), 119–125. https://doi.org/10.1111/emip.12652
Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32(3), 347–364. https://doi.org/10.1007/BF00138871
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. Longmans, Green.
Hager, P., & Butler, J. (1996). Two Models of Educational Assessment. Assessment & Evaluation in Higher Education, 21(4), 367–378. https://doi.org/10.1080/0260293960210407
Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial intelligence in education: Promises and implications for teaching and learning. Center for Curriculum Redesign.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144. https://doi.org/10.1007/BF00117714
Sadler, D. R. (2007). Perils in the meticulous specification of goals and assessment criteria. Assessment in Education: Principles, Policy & Practice, 14(3), 387–392. https://doi.org/10.1080/09695940701592097
Swiecki, Z., Khosravi, H., Chen, G., Martinez-Maldonado, R., Lodge, J. M., Milligan, S., … & Gašević, D. (2022). Assessment in the age of artificial intelligence. Computers and Education: Artificial Intelligence, 3, 100075.
Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education. Council of Chief State School Officers.