What are the differences between TOVA and our QIKtest?  What are the limitations of the TOVA test?  Are Gaussian Norms invalid?
by Siegfried Othmer, Chief Scientist, The EEG Institute

At the top level, the TOVA differs from the Conners in that the latter has variable inter-stimulus intervals versus the fixed intervals of TOVA. We came down on the side of TOVA on this, because we agreed with the TOVA people that invariance of the conditions was a precondition for the monotony that places ADHD children (and all of us!) under challenge.

            So we used the TOVA for some fifteen years, from 1990 to 2005. (TOVA even published some of our results in their newsletter.) We tried to get the TOVA people to resolve some issues, but because they already had an established customer base they were locked into their design. We had no alternative but to take a different path. Our own test emulated the TOVA so closely that we were able to continue using the TOVA norms, which we did for some years.

We have been in the unique position within the field of having a central server collect anonymized CPT data. By 2013 we had accumulated sufficient data to allow new norms to be developed on the basis of these data. This was accomplished on the basis of some 50,000 records, sufficient to allow each age and gender bin to be normed on the basis of 500 subjects. That compares to the TOVA norm, which I believe rests on fewer than 1000 subjects in total for all age and gender groups.

            This yielded population-based norms, which now constitute one of the major distinctions between the QIKtest and the TOVA, and the most critical one. Why population-based norms? Because we have no real alternative! TOVA based its norms on a Gaussian model (the normal curve). But the distributions are not generally normal. In particular, the most critical variables, the discrete errors, are not normally distributed. This is most readily apparent with omission errors, where the modal value for all ages is zero. An error count cannot fall below zero, so a distribution that peaks at zero is necessarily one-sided and cannot be the symmetric bell curve that a Gaussian model assumes.

            So, if parametric analysis is ruled out, one is left with non-parametric statistics as the only viable alternative, and that takes you to population-based norms (percentiles). In order to do that well, however, one needs large numbers in the pool. And by 2013 we were in the fortunate position of having those large numbers available.
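
The contrast between the two kinds of norms can be sketched in a few lines of Python. This is an illustration only: the pool of omission-error counts is made up, the function names are hypothetical, and the mid-rank convention for ties is an assumption rather than a description of how the QIKtest norms are actually computed.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist, mean, stdev


def percentile_rank(pool_sorted, score):
    """Mid-rank percentile (0-100) of `score` within a sorted reference pool.

    Higher values mean more errors than most of the pool.
    """
    below = bisect_left(pool_sorted, score)            # records strictly below
    ties = bisect_right(pool_sorted, score) - below    # records equal to score
    return 100.0 * (below + 0.5 * ties) / len(pool_sorted)


def gaussian_percentile(pool, score):
    """Percentile implied by fitting a normal curve to the same pool."""
    return 100.0 * NormalDist(mean(pool), stdev(pool)).cdf(score)


# Hypothetical omission-error counts for one age/gender bin (NOT real data):
# most subjects make no errors, and a few make many.
pool = sorted([0] * 60 + [1] * 20 + [2] * 10 + [5] * 6 + [12] * 3 + [30] * 1)

for score in (0, 2, 12):
    print(score,
          round(percentile_rank(pool, score), 1),
          round(gaussian_percentile(pool, score), 1))
```

In this made-up pool, a score of zero errors lands at the 30th mid-rank percentile simply because 60 of the 100 records share it. The fitted normal curve, by contrast, places a sizable share of its area below zero, a region where no error count can occur.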

            I might dwell on this issue for just a moment more. It turns out that all the discrete errors are power-law distributed. The Gaussian is the shortest-tailed distribution descriptive of natural phenomena with inherent variability. By contrast, the power-law distribution is the longest-tailed distribution descriptive of natural phenomena. One could even say that it is all tail! It is also called the scale-free distribution, because any part of it looks like any other part. So viewed in that perspective, one could say that the TOVA norming project could not have been more misguided than it was! But we were all Gaussians in those days, and if it had been my job to do the norming at that time, I would have done the same thing…. But now we know better.
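
The "scale-free" property can be made concrete with a short sketch. It assumes a continuous power-law (Pareto) tail with an arbitrarily chosen exponent of 1.5, which is a simplification, since error counts are discrete and the exponents for real CPT data are not given here. The function names are hypothetical.

```python
import math


def power_law_tail(x, x_min=1.0, exponent=1.5):
    """P(X > x) for a continuous power law; the exponent is an assumed value."""
    return (x / x_min) ** (-exponent) if x >= x_min else 1.0


def gaussian_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) for a normal distribution, computed with erfc for accuracy."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


# Each time x doubles, how much of the tail is left?
for x in (1, 2, 3):
    pl_ratio = power_law_tail(2 * x) / power_law_tail(x)
    g_ratio = gaussian_tail(2 * x) / gaussian_tail(x)
    print(f"x={x}: power law keeps {pl_ratio:.3f}, Gaussian keeps {g_ratio:.2e}")
```

Under these assumptions the power law keeps the same fraction of its tail (2 to the power -1.5, about 0.354) at every doubling, which is what "any part looks like any other part" means. The Gaussian fraction collapses toward zero as x grows.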

            What are the clinical implications? With the TOVA, a lot of children find themselves in the tail of the distribution and on that basis are assigned to the stimulant-administration bin. In our assessment, many of them would be scored differently. But this highlights yet another critical distinction. Our testing serves the objective of aiding a neurofeedback strategy. We are not looking to render a digital judgment of whether or not a child meets some diagnostic threshold. We are interested in characterizing the strengths and weaknesses of a particular nervous system.

            In that project, the top-level scores aren’t the dominant issue, because we already know that irrespective of where anyone falls on the performance curve, neurofeedback training is likely to effect an improvement—if there is room for improvement. So our report includes a lot of graphical features that allow the clinician to get a handle on how the particular nervous system at issue reacts to challenges, to changes in conditions, and to extended monotony.

            However, with regard to the top-level scores, there is one more issue that calls for comment. The TOVA analysis does not take separate account of outliers in reaction time. And yet we have shown that outliers are a separate and distinct phenomenon that has to be taken into account (and yes, that should have been published long ago). ADHD children aren’t systematically slow at all. They are more subject to outlier events. And improvements in mean reaction time demonstrated with stimulants or neurofeedback tend to reflect the dropout of outlier events rather than a significant speeding up of the typical response.
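
A small sketch shows how the dropout of outliers alone can move the mean reaction time. The reaction times are invented, the function name is hypothetical, and the cutoff (twice the median) is an illustrative assumption, not the rule used by the QIKtest or the TOVA.

```python
from statistics import mean, median


def summarize_rt(rts_ms, outlier_factor=2.0):
    """Separate typical reaction times from slow outliers.

    The cutoff (outlier_factor x median) is an assumed, illustrative rule.
    """
    cutoff = outlier_factor * median(rts_ms)
    typical = [rt for rt in rts_ms if rt <= cutoff]
    outliers = [rt for rt in rts_ms if rt > cutoff]
    return {
        "mean_all": mean(rts_ms),          # what a single top-level mean reports
        "mean_typical": mean(typical),     # mean with outliers set aside
        "n_outliers": len(outliers),
    }


# Hypothetical reaction times in milliseconds (NOT real data).
before = [400, 410, 390, 405, 395, 400, 1500, 1800]  # two lapses
after = [400, 410, 390, 405, 395, 400, 402, 398]     # lapses gone

print(summarize_rt(before))
print(summarize_rt(after))
```

In this example the overall mean falls from 712.5 ms to 400 ms, yet the mean of the typical responses is 400 ms in both sessions. The whole apparent improvement comes from the two outlier events disappearing.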

            A final reflection for your consideration is that by virtue of our having a central server, improvements in our analysis can be incorporated as we go. Anyone can go back and have their data re-analyzed according to the latest schema. Thus, for example, we recently introduced age dependence of the threshold for anticipatory errors. Further refinements are in store. None of this touches the bulk of the report, which consists of various illustrative presentations of the actual unvarnished data. These remain our ground truth for the clinical work.

            Ideally, the clinical interpretation of the data by the therapist would be followed by a description to the client in which the client recognizes himself or the parents recognize their child. This gets us close to the specific objectives of the training, and helps to set the training agenda. This is not what the TOVA is aiming for, and not the Conners’ test either.