Symbols (a, b) | PTML-MLP, Training Set | PTML-MLP, Test Set | PTML-LDA, Training Set | PTML-LDA, Test Set | PTML-SVM, Training Set | PTML-SVM, Test Set | PTML-RF, Training Set | PTML-RF, Test Set |
---|---|---|---|---|---|---|---|---|
NActive | 1482 | 492 | 1482 | 492 | 1482 | 492 | 1482 | 492 |
CCActive | 1317 | 401 | 925 | 312 | 1230 | 392 | 1243 | 390 |
Sn | 88.87% | 81.50% | 62.42% | 63.41% | 83.00% | 79.67% | 83.87% | 79.27% |
NInactive | 1260 | 418 | 1260 | 418 | 1260 | 418 | 1260 | 418 |
CCInactive | 1050 | 314 | 774 | 251 | 938 | 265 | 1023 | 293 |
Sp | 83.33% | 75.12% | 61.43% | 60.05% | 74.44% | 63.40% | 81.19% | 70.10% |
nMCC | 0.862 | 0.784 | 0.619 | 0.617 | 0.789 | 0.719 | 0.825 | 0.748 |
- (a) NActive: number of chemicals/cases annotated as active; NInactive: number of chemicals/cases annotated as inactive; CCActive: chemicals/cases correctly classified as active; CCInactive: chemicals/cases correctly classified as inactive; Sn: sensitivity; Sp: specificity; nMCC: normalized Matthews correlation coefficient.
- (b) The PTML models shown here are the best ones we found; all PTML models were built with STATISTICA v13.5.0.17. As with the tuned hyperparameters reported for the PTML-MLP model (see the Material and Methods section), those of the alternative PTML models are given here. For PTML-LDA, all the D[TBI]cj descriptors were included, and the prior probabilities for active and inactive were 0.485 and 0.515, respectively. The PTML-SVM model was obtained with SVM classification type 2, a radial basis function kernel, gamma = 0.067, nu = 0.540, number of support vectors = 1623 (1336 bounded), number of iterations = 1000, and stopping error = 0.001. The PTML-RF model used number of trees = 65, subsample proportion = 0.5, minimum number of cases = 40, maximum number of levels = 10, minimum number in the child node = 5, maximum number of nodes = 100, and prior probabilities of 0.475 and 0.525 for active and inactive, respectively.
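The statistics in the table can be recomputed from the raw counts. The following is a minimal sketch, not the authors' STATISTICA workflow. It assumes that Sn = CCActive / NActive, Sp = CCInactive / NInactive, and that nMCC is the usual rescaling of the Matthews correlation coefficient to the interval [0, 1], nMCC = (MCC + 1) / 2; the source does not state that formula explicitly. The function name `classification_metrics` is hypothetical.

```python
from math import sqrt


def classification_metrics(n_active, cc_active, n_inactive, cc_inactive):
    """Return (Sn %, Sp %, nMCC) from the counts listed in the table.

    Assumption: nMCC = (MCC + 1) / 2, which is not spelled out in the source.
    """
    tp = cc_active                 # actives correctly classified
    fn = n_active - cc_active      # actives misclassified as inactive
    tn = cc_inactive               # inactives correctly classified
    fp = n_inactive - cc_inactive  # inactives misclassified as active

    sn = 100.0 * tp / (tp + fn)    # sensitivity, in percent
    sp = 100.0 * tn / (tn + fp)    # specificity, in percent

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    nmcc = (mcc + 1.0) / 2.0       # rescale MCC from [-1, 1] to [0, 1]
    return sn, sp, nmcc


# PTML-MLP, training set (counts taken from the table)
sn, sp, nmcc = classification_metrics(1482, 1317, 1260, 1050)
print(f"Sn = {sn:.2f}%, Sp = {sp:.2f}%, nMCC = {nmcc:.3f}")
# prints: Sn = 88.87%, Sp = 83.33%, nMCC = 0.862
```

Under these assumptions, the PTML-MLP test-set counts (492, 401, 418, 314) give Sn = 81.50%, Sp = 75.12%, and nMCC = 0.784, matching the tabulated values.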