iSQI CT-AI Practice Test - Questions Answers, Page 4

Question 31

Which ONE of the following models BEST describes a way to model defect prediction by looking at the history of bugs in modules by using code quality metrics of modules of historical versions as input?
A. Identifying the relationship between developers and the modules developed by them.
B. Search of similar code based on natural language processing.
C. Clustering of similar code modules to predict based on similarity.
D. Using a classification model to predict the presence of a defect by using code quality metrics as the input data.
Defect prediction models aim to identify parts of the software that are likely to contain defects by analyzing historical data and code quality metrics. The primary goal is to use this predictive information to allocate testing and maintenance resources effectively. Let's break down why option D is the correct choice:
Understanding Classification Models:
Classification models are a type of supervised learning algorithm used to categorize or classify data into predefined classes or labels. In the context of defect prediction, the classification model would classify parts of the code as either 'defective' or 'non-defective' based on the input features.
Input Data - Code Quality Metrics:
The input data for these classification models typically includes various code quality metrics such as cyclomatic complexity, lines of code, number of methods, depth of inheritance, coupling between objects, etc. These metrics help the model learn patterns associated with defects.
Historical Data:
Historical versions of the code along with their defect records provide the labeled data needed for training the classification model. By analyzing this historical data, the model can learn which metrics are indicative of defects.
Why Option D is Correct:
Option D specifies using a classification model to predict the presence of defects by using code quality metrics as input data. This accurately describes the process of defect prediction using historical bug data and quality metrics.
Eliminating Other Options:
A. Identifying the relationship between developers and the modules developed by them: This does not directly involve predicting defects based on code quality metrics and historical data.
B. Search of similar code based on natural language processing: While useful for other purposes, this method does not describe defect prediction using classification models and code metrics.
C. Clustering of similar code modules to predict based on similarity: Clustering is an unsupervised learning technique and does not directly align with the supervised learning approach typically used in defect prediction models.
Reference: 'Using AI for Defect Prediction' (ISTQB CT-AI Syllabus, Section 11.5.1).
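The idea behind option D can be illustrated with a minimal sketch. It assumes a hypothetical history of modules described by two code quality metrics (cyclomatic complexity and lines of code) and labelled as defective or not. A nearest-centroid classifier is used purely as a simple stand-in for "a classification model"; the metric values and module names are invented for illustration.

```python
from statistics import mean

# Hypothetical history: (cyclomatic complexity, lines of code) per module,
# labelled 1 if a defect was later reported against that module, else 0.
HISTORY = [
    ((3, 120), 0),
    ((4, 200), 0),
    ((5, 150), 0),
    ((18, 900), 1),
    ((22, 1100), 1),
    ((15, 750), 1),
]


def fit_centroids(samples):
    """Return the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in {lbl for _, lbl in samples}:
        rows = [features for features, lbl in samples if lbl == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, features):
    """Classify a module by its nearest class centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


centroids = fit_centroids(HISTORY)
# Hypothetical modules from a new release, described by the same two metrics.
new_release = {"parser": (20, 1000), "logger": (4, 180)}
predictions = {name: predict(centroids, m) for name, m in new_release.items()}
print(predictions)
```

The features are left unscaled here to keep the sketch short; a real model would normally scale the metrics and be evaluated on held-out versions.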
Question 32

Which ONE of the following options describes a scenario of A/B testing the LEAST?
A. A comparison of two different websites for the same company to observe from a user acceptance perspective.
B. A comparison of two different offers in a recommendation system to decide on the more effective offer for the same users.
C. A comparison of the performance of an ML system on two different input datasets.
D. A comparison of the performance of two different ML implementations on the same input data.
A/B testing, also known as split testing, is a method used to compare two versions of a product or system to determine which one performs better. It is widely used in web development, marketing, and machine learning to optimize user experiences and model performance. Here's why option C is the least descriptive of an A/B testing scenario:
Understanding A/B Testing:
In A/B testing, two versions (A and B) of a system or feature are tested against each other. The objective is to measure which version performs better based on predefined metrics such as user engagement, conversion rates, or other performance indicators.
Application in Machine Learning:
In ML systems, A/B testing might involve comparing two different models, algorithms, or system configurations on the same set of data to observe which yields better results.
Why Option C is the Least Descriptive:
Option C describes comparing the performance of an ML system on two different input datasets. This scenario focuses on the input data variation rather than the comparison of system versions or features, which is the essence of A/B testing. A/B testing typically involves a controlled experiment with two versions being tested under the same conditions, not different datasets.
Clarifying the Other Options:
A. A comparison of two different websites for the same company to observe from a user acceptance perspective: This is a classic example of A/B testing where two versions of a website are compared.
B. A comparison of two different offers in a recommendation system to decide on the more effective offer for the same users: This is another example of A/B testing in a recommendation system.
D. A comparison of the performance of two different ML implementations on the same input data: This fits the A/B testing model where two implementations are compared under the same conditions.
ISTQB CT-AI Syllabus, Section 9.4, A/B Testing, explains the methodology and application of A/B testing in various contexts.
'Understanding A/B Testing' (ISTQB CT-AI Syllabus).
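As a rough illustration of how two variants might be compared on a predefined metric, the sketch below applies a two-proportion z-test to conversion counts. The choice of this particular test and the counts themselves are assumptions made for the example; the syllabus text quoted above does not prescribe a specific statistic.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: users who accepted offer A versus offer B.
z, p = two_proportion_z_test(success_a=120, total_a=1000, success_b=165, total_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Both variants are exposed to comparable users under the same conditions; only the variant differs, which is what separates this from option C.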
Question 33

Max. Score: 2
AI-enabled medical devices are used nowadays for automating certain parts of the medical diagnostic processes. Since these are life-critical processes, the relevant authorities are considering bringing about suitable certifications for these AI-enabled medical devices. This certification may involve several facets of AI testing (I - V).
I. Autonomy
II. Maintainability
III. Safety
IV. Transparency
V. Side Effects
Which ONE of the following options contains the three MOST required aspects to be satisfied for the above scenario of certification of AI-enabled medical devices?
A. Aspects II, III and IV
B. Aspects I, II, and III
C. Aspects III, IV, and V
D. Aspects I, IV, and V
For AI-enabled medical devices, the most required aspects for certification are safety, transparency, and side effects. Here's why:
Safety (Aspect III): Critical for ensuring that the AI system does not cause harm to patients.
Transparency (Aspect IV): Important for understanding and verifying the decisions made by the AI system.
Side Effects (Aspect V): Necessary to identify and mitigate any unintended consequences of the AI system.
Why Not Other Options:
Autonomy and Maintainability (Aspects I and II): While important, they are secondary to the immediate concerns of safety, transparency, and managing side effects in life-critical processes.
Question 34

Which ONE of the following options represents a technology MOST TYPICALLY used to implement AI?
A. Search engines
B. Procedural programming
C. Case control structures
D. Genetic algorithms
Technology Most Typically Used to Implement AI: Genetic algorithms are a well-known technique used in AI. They are inspired by the process of natural selection and are used to find approximate solutions to optimization and search problems. Unlike search engines, procedural programming, or case control structures, genetic algorithms are specifically designed for evolving solutions and are commonly employed in AI implementations.
Reference: ISTQB_CT-AI_Syllabus_v1.0, Section 1.4 AI Technologies, which identifies different technologies used to implement AI.
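To make the "inspired by natural selection" description concrete, here is a minimal sketch of a genetic algorithm on a toy problem (maximizing the number of 1-bits in a bit string). The problem, the parameter values, and the function name `evolve` are assumptions chosen for illustration only.

```python
import random


def evolve(bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Maximize the number of 1-bits in a bit string (a toy optimization problem)."""
    rng = random.Random(seed)
    fitness = sum  # fitness of an individual = count of its 1-bits
    population = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [population[0][:]]           # elitism: carry the best over unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, bits)        # single-point crossover
            child = mum[:cut] + dad[cut:]
            # mutation: flip each bit with a small probability
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children

    return max(population, key=fitness)


best = evolve()
print(sum(best), "of", len(best), "bits set")
```

Selection, crossover, and mutation are the three steps that mirror natural selection; keeping the best individual unchanged ensures the best fitness never gets worse between generations.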
Question 35

Which ONE of the following characteristics is the least likely to cause safety related issues for an AI system?
A. Non-determinism
B. Robustness
C. High complexity
D. Self-learning
The question asks which characteristic is least likely to cause safety-related issues for an AI system. Let's evaluate each option:
Non-determinism (A): Non-deterministic systems can produce different outcomes even with the same inputs, which can lead to unpredictable behavior and potential safety issues.
Robustness (B): Robustness refers to the ability of the system to handle errors, anomalies, and unexpected inputs gracefully. A robust system is less likely to cause safety issues because it can maintain functionality under varied conditions.
High complexity (C): High complexity in AI systems can lead to difficulties in understanding, predicting, and managing the system's behavior, which can cause safety-related issues.
Self-learning (D): Self-learning systems adapt based on new data, which can lead to unexpected changes in behavior. If not properly monitored and controlled, this can result in safety issues.
ISTQB CT-AI Syllabus Section 2.8 on Safety and AI discusses various factors affecting the safety of AI systems, emphasizing the importance of robustness in maintaining safe operation.
Question 36

A system was developed for screening the X-rays of patients for potential malignancy detection (skin cancer). A workflow system has been developed to screen multiple cancers by using several individually trained ML models chained together in the workflow.
Testing the pipeline could involve multiple kinds of tests (I - III):
I. Pairwise testing of combinations
II. Testing each individual model for accuracy
III. A/B testing of different sequences of models
Which ONE of the following options contains the kinds of tests that would be MOST APPROPRIATE to include in the strategy for optimal detection?
A. Only III
B. I and II
C. I and III
D. Only II
The question asks which combination of tests would be most appropriate to include in the strategy for optimal detection in a workflow system using multiple ML models.
Pairwise testing of combinations (I): This method is useful for testing interactions between different components in the workflow to ensure they work well together, identifying potential issues in the integration.
Testing each individual model for accuracy (II): Ensuring that each model in the workflow performs accurately on its own is crucial before integrating them into a combined workflow.
A/B testing of different sequences of models (III): This involves comparing different sequences to determine which configuration yields the best results. While useful, it is less fundamental than pairwise and individual accuracy testing in the initial stages. The most appropriate combination is therefore I and II.
ISTQB CT-AI Syllabus Section 9.2 on Pairwise Testing and Section 9.3 on Testing ML Models emphasize the importance of testing interactions and individual model accuracy in complex ML workflows.
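Pairwise testing aims to exercise every pair of parameter values at least once without running every full combination. The sketch below checks whether a small test suite achieves that; the pipeline parameters, their values, and the suite are hypothetical and not taken from the question.

```python
from itertools import combinations, product

# Hypothetical pipeline parameters and their possible values.
PARAMETERS = {
    "image_source": ["scanner_a", "scanner_b"],
    "preprocessing": ["none", "denoise"],
    "model_version": ["v1", "v2"],
}


def required_pairs(parameters):
    """Every pair of values drawn from two different parameters."""
    pairs = set()
    for p1, p2 in combinations(parameters, 2):
        for v1, v2 in product(parameters[p1], parameters[p2]):
            pairs.add(((p1, v1), (p2, v2)))
    return pairs


def covered_pairs(parameters, test_cases):
    """Value pairs actually exercised by a list of test cases."""
    pairs = set()
    for case in test_cases:
        for p1, p2 in combinations(parameters, 2):
            pairs.add(((p1, case[p1]), (p2, case[p2])))
    return pairs


# Four test cases instead of the eight needed for exhaustive combination testing.
SUITE = [
    {"image_source": "scanner_a", "preprocessing": "none", "model_version": "v1"},
    {"image_source": "scanner_a", "preprocessing": "denoise", "model_version": "v2"},
    {"image_source": "scanner_b", "preprocessing": "none", "model_version": "v2"},
    {"image_source": "scanner_b", "preprocessing": "denoise", "model_version": "v1"},
]

missing = required_pairs(PARAMETERS) - covered_pairs(PARAMETERS, SUITE)
print("uncovered pairs:", missing)
```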
Question 37

'BioSearch' is creating an AI model used for predicting cancer occurrence via examining X-Ray images. The accuracy of the model in isolation has been found to be good. However, the users of the model started complaining of the poor quality of results, especially its inability to detect real cancer cases, when put into practice in the diagnosis lab, which led to the model being withdrawn from use.
A testing expert was called in to find the deficiencies in the test planning which led to the above scenario.
Which ONE of the following options would you expect to MOST likely be the reason to be discovered by the test expert?
A. A lack of similarity between the training and testing data.
B. The input data has not been tested for quality prior to use for testing.
C. A lack of focus on choosing the right functional-performance metrics.
D. A lack of focus on non-functional requirements testing.
The question asks which deficiency is most likely to be discovered by the test expert given the scenario of poor real-world performance despite good isolated accuracy.
A lack of similarity between the training and testing data (A): This is a common issue in ML where the model performs well on training data but poorly on real-world data due to a lack of representativeness in the training data. This leads to poor generalization to new, unseen data.
The input data has not been tested for quality prior to use for testing (B): While data quality is important, this option is less likely to be the primary reason for the described issue compared to the representativeness of training data.
A lack of focus on choosing the right functional-performance metrics (C): Proper metrics are crucial, but the issue described seems more related to the data mismatch rather than metric selection.
A lack of focus on non-functional requirements testing (D): Non-functional requirements are important, but the scenario specifically mentions issues with detecting real cancer cases, pointing more towards data issues.
ISTQB CT-AI Syllabus Section 4.2 on Training, Validation, and Test Datasets emphasizes the importance of using representative datasets to ensure the model generalizes well to real-world data.
Sample Exam Questions document, Question #40 addresses issues related to data representativeness and model generalization.
Question 38

An ML engineer is trying to determine the correctness of the new open-source implementation 'X' of a supervised regression algorithm. R-Square is one of the functional performance metrics used to determine the quality of the model.
Which ONE of the following would be an APPROPRIATE strategy to achieve this goal?
A. Add 10% of the rows randomly and create another model and compare the R-Square scores of both the models.
B. Train various models by changing the order of input features and verify that the R-Square score of these models vary significantly.
C. Compare the R-Square score of the model obtained using two different implementations that utilize two different programming languages while using the same algorithm and the same training and testing data.
D. Drop 10% of the rows randomly and create another model and compare the R-Square scores of both the models.
A. Add 10% of the rows randomly and create another model and compare the R-Square scores of both the models.
Adding more data to the training set can affect the R-Square score, but it does not directly verify the correctness of the implementation.
B. Train various models by changing the order of input features and verify that the R-Square score of these models vary significantly.
Changing the order of input features should not significantly affect the R-Square score if the implementation is correct, but this approach is more about testing model robustness rather than correctness of the implementation.
C. Compare the R-Square score of the model obtained using two different implementations that utilize two different programming languages while using the same algorithm and the same training and testing data.
This approach directly compares the performance of two implementations of the same algorithm. If both implementations produce similar R-Square scores on the same training and testing data, it suggests that the new implementation 'X' is correct.
D. Drop 10% of the rows randomly and create another model and compare the R-Square scores of both the models.
Dropping data can lead to variations in the R-Square score but does not directly verify the correctness of the implementation.
Therefore, option C is the most appropriate strategy because it directly compares the performance of the new implementation 'X' with another implementation using the same algorithm and datasets, which helps in verifying the correctness of the implementation.
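The comparison in option C can be sketched as follows. Both fitting functions here are hypothetical stand-ins written in the same language, and they solve the same least-squares model by different code paths (a closed-form solution versus gradient descent), so this only approximates the question's setup of two independent implementations of one algorithm. The data and the tolerance are also assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def fit_closed_form(xs, ys):
    """Reference implementation: simple linear regression in closed form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, lr=0.01, steps=20000):
    """Stand-in for the implementation under test: same model, different code path."""
    slope = intercept = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        slope -= lr * 2 / n * sum(e * x for e, x in zip(errors, xs))
        intercept -= lr * 2 / n * sum(errors)
    return slope, intercept


# Hypothetical data; both implementations see the same training and test sets.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
test_x, test_y = [7, 8, 9], [13.8, 16.1, 18.0]


def score(fit):
    slope, intercept = fit(train_x, train_y)
    return r_squared(test_y, [slope * x + intercept for x in test_x])


r2_reference = score(fit_closed_form)
r2_under_test = score(fit_gradient_descent)
agree = abs(r2_reference - r2_under_test) < 1e-6  # tolerance is an assumption
print(r2_reference, r2_under_test, agree)
```

A large gap between the two scores would point to a defect in one of the implementations; agreement raises confidence but does not prove correctness, since both could share the same mistake.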
Question 39

'Splendid Healthcare' has started developing a cancer detection system based on ML. The type of cancer they plan on detecting has a 2% prevalence rate in the population of a particular geography. It is required that the model performs well for both normal and cancer patients.
Which ONE of the following combinations requires MAXIMIZATION?
A. Maximize precision and accuracy
B. Maximize accuracy and recall
C. Maximize recall and precision
D. Maximize specificity number of classes
Prevalence Rate and Model Performance:
The cancer detection system being developed by 'Splendid Healthcare' needs to account for the fact that the type of cancer has a 2% prevalence rate in the population. This indicates that the dataset is highly imbalanced with far fewer positive (cancer) cases compared to negative (normal) cases.
Importance of Recall:
Recall, also known as sensitivity or true positive rate, measures the proportion of actual positive cases that are correctly identified by the model. In medical diagnosis, especially cancer detection, recall is critical because missing a positive case (false negative) could have severe consequences for the patient. Therefore, maximizing recall ensures that most, if not all, cancer cases are detected.
Importance of Precision:
Precision measures the proportion of predicted positive cases that are actually positive. High precision reduces the number of false positives, meaning fewer people will be incorrectly diagnosed with cancer. This is also important to avoid unnecessary anxiety and further invasive testing for those who do not have the disease.
Balancing Recall and Precision:
In scenarios where both false negatives and false positives have significant consequences, it is crucial to balance recall and precision. This balance ensures that the model is not only good at detecting positive cases but also accurate in its predictions, reducing both types of errors.
Accuracy and Specificity:
While accuracy (the proportion of total correct predictions) is important, it can be misleading in imbalanced datasets. In this case, high accuracy could simply result from the model predicting the majority class (normal) correctly. Specificity (true negative rate) is also important, but for a cancer detection system, recall and precision take precedence to ensure positive cases are correctly and accurately identified.
Conclusion:
Therefore, for a cancer detection system with a low prevalence rate, maximizing both recall and precision is crucial to ensure effective and accurate detection of cancer cases.
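The point about accuracy being misleading at 2% prevalence can be checked with a short calculation. The confusion-matrix counts below are hypothetical: one "model" that labels every patient as normal, and one screening model that finds most cancer cases at the cost of some false alarms.

```python
def metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics; undefined ratios are reported as 0.0."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# 1,000 patients at 2% prevalence: 20 with cancer, 980 without.
# A model that predicts "normal" for everyone: 98% accurate, yet finds no cancer.
always_normal = metrics(tp=0, fp=0, fn=20, tn=980)

# Hypothetical screening model: detects 18 of 20 cases, raises 49 false alarms.
screening_model = metrics(tp=18, fp=49, fn=2, tn=931)

for name, m in [("always normal", always_normal), ("screening model", screening_model)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The always-normal model wins on accuracy while its recall is zero, which is why accuracy alone is a poor target here. The screening model's low precision shows the other side of the trade-off that recall and precision together capture.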
Question 40

Which ONE of the following options describes the LEAST LIKELY usage of AI for detection of GUI changes due to changes in test objects?
A. Using a pixel comparison of the GUI before and after the change to check the differences.
B. Using computer vision to compare the GUI before and after the test object changes.
C. Using vision-based detection of the GUI layout changes before and after test object changes.
D. Using an ML-based classifier to flag if changes in GUI are to be flagged for humans.
A. Using a pixel comparison of the GUI before and after the change to check the differences.
Pixel comparison is a traditional method and does not involve AI. It compares images at the pixel level, which can be effective but is not an intelligent approach. It is not considered an AI usage and is the least likely usage of AI for detecting GUI changes.
B. Using computer vision to compare the GUI before and after the test object changes.
Computer vision involves using AI techniques to interpret and process images. It is a likely usage of AI for detecting changes in the GUI.
C. Using vision-based detection of the GUI layout changes before and after test object changes.
Vision-based detection is another AI technique where the layout and structure of the GUI are analyzed to detect changes. This is a typical application of AI.
D. Using an ML-based classifier to flag if changes in GUI are to be flagged for humans.
An ML-based classifier can intelligently determine significant changes and decide if they need human review, which is a sophisticated AI application.
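To show why pixel comparison in option A involves no AI, the sketch below implements it as a plain element-by-element check. Screenshots are modelled as small hypothetical grayscale grids (lists of lists); a real tool would load actual image files.

```python
def pixel_diff(before, after):
    """Return (row, col) coordinates where two same-sized screenshots differ."""
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("screenshots must have the same dimensions")
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (pb, pa) in enumerate(zip(row_b, row_a))
        if pb != pa
    ]


# Hypothetical 3x4 grayscale screenshots: a two-pixel "button" moves one pixel right.
before = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
after = [
    [0, 0, 0, 0],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
]

changed = pixel_diff(before, after)
print(changed)  # [(1, 1), (1, 3)]
```

The comparison reports every differing pixel but cannot tell whether the button merely shifted or something meaningful changed, which is the judgement the AI-based options B to D attempt to make.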