In a demonstration of efficiency, Fisher et al. (2003) had novice raters evaluate simulated graphs using the DC method. Interrater agreement increased from 71% to 95% after training in the DC method. In addition, Lanovaz, Huxley, and Dufour (2017) reported that the DC method provides appropriate control of the Type I error rate with nonsimulated data. In a recent study, Wolfe, Seaman, Drasgow, and Sherlock (2018) assessed agreement between the CDC method and experienced visual analysts on 66 AB tiers drawn from published multiple baseline graphs. Their results show that the mean agreement between the CDC method and experienced visual analysts was 84%, indicating that the CDC method may still produce an erroneous classification in roughly one of every six cases. Another limitation is that the DC and CDC methods do not directly address issues related to the variability and immediacy of behavior change.

To examine generalization to simulated datasets, we used the best models trained and validated on nonsimulated data (i.e., the same models as in Study 1) to make our predictions. Specifically, our analyses examined accuracy, Type I error rate, and power.
Accuracy indicates the extent to which the predictions match the true labels of the datasets (i.e., clear change vs. no clear change). To compute accuracy, we summed the number of agreements between the true labels and the model predictions (i.e., clear change or no clear change) and divided the sum by 20,000. The Type I error rate represents false positives. In single-case design analysis, a false positive occurs when a behavior analyst incorrectly concludes that a graph shows a clear change in the absence of a true change (i.e., when the variations are the result of chance alone). To compute the Type I error rate, we divided the number of disagreements between the true labels and the model predictions on the graphs showing no clear change (according to the true labels) by 10,000. Power represents the extent to which a procedure identifies an effect that truly exists. In our case, power is the proportion of instances in which the model detected a clear change when a true change had occurred. To compute power, our program divided the number of agreements between the true labels and the model predictions on the graphs showing a clear change (according to the true labels) by 10,000.
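The three metrics above can be sketched in a few lines. This is a minimal illustration, not the authors' program: the array names, the random placeholder predictions, and the ordering of the graphs are assumptions; only the metric definitions and the 10,000/10,000 split between no-change and clear-change graphs come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in data (assumption): 10,000 no-change graphs (label 0)
# followed by 10,000 clear-change graphs (label 1), with placeholder
# model predictions in place of a real trained model's output.
true_labels = np.concatenate([np.zeros(10_000, dtype=int),
                              np.ones(10_000, dtype=int)])
predictions = rng.integers(0, 2, size=20_000)

# Accuracy: agreements between predictions and true labels over all 20,000 graphs.
accuracy = np.mean(predictions == true_labels)

# Type I error rate: disagreements on the 10,000 no-change graphs
# (the model predicted a clear change where none occurred), divided by 10,000.
no_change = true_labels == 0
type_i_error = np.sum(predictions[no_change] != true_labels[no_change]) / 10_000

# Power: agreements on the 10,000 clear-change graphs, divided by 10,000.
change = true_labels == 1
power = np.sum(predictions[change] == true_labels[change]) / 10_000

print(accuracy, type_i_error, power)
```

Note that with this 50/50 split the three quantities are linked: accuracy is the average of power and one minus the Type I error rate.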
. . .