May 2009
Volume 50, Issue 5
Development of Predictive Models of Proliferative Vitreoretinopathy Based on Genetic Variables: The Retina 4 Project
Author Affiliations
  • Jimena Rojas
    From the Department of Ophthalmology, University of Valladolid, IOBA (Institute for Research in Ophthalmobiology), Valladolid, Spain;
  • Itziar Fernandez
    Department of Statistics, Ciber BBN (Centro de Investigación Biomédica en Red–Bioengineering, Biomaterials and Nanomedicine); and
  • J. Carlos Pastor
    From the Department of Ophthalmology, University of Valladolid, IOBA (Institute for Research in Ophthalmobiology), Valladolid, Spain;
  • Maria-Teresa Garcia-Gutierrez
    From the Department of Ophthalmology, University of Valladolid, IOBA (Institute for Research in Ophthalmobiology), Valladolid, Spain;
  • Rosa-Maria Sanabria
    From the Department of Ophthalmology, University of Valladolid, IOBA (Institute for Research in Ophthalmobiology), Valladolid, Spain;
  • Maria Brion
    Complexo Hospitalario,
  • Beatriz Sobrino
    Centro Nacional de Genotipado (CeGen), and
  • Lucia Manzanas
    Department of Ophthalmology, Clinic University Hospital of Valladolid, Valladolid, Spain.
  • Antonio Giraldo
    Department of Ophthalmology, Clinic University Hospital of Valladolid, Valladolid, Spain.
  • Enrique Rodriguez-de la Rua
    From the Department of Ophthalmology, University of Valladolid, IOBA (Institute for Research in Ophthalmobiology), Valladolid, Spain;
  • Angel Carracedo
    Fundacion Galega de Medicina Xenomica, CIBERER (Centro de Investigación Biomédica en Red Enfermedades Raras), University of Santiago de Compostela, Santiago, Spain; and the
Investigative Ophthalmology & Visual Science May 2009, Vol.50, 2384-2390. doi:https://doi.org/10.1167/iovs.08-2670
Abstract

Purpose

Machine learning techniques were used to identify which of 14 models best predicts the genetic risk for development of proliferative vitreoretinopathy (PVR) in patients with primary rhegmatogenous retinal detachment (RD).

Methods

Data from a total of 196 single nucleotide polymorphisms (SNPs) in 30 candidate genes were used. The genotypic profiles of 138 patients who developed PVR after primary rhegmatogenous RD and of 312 RD patients who did not develop PVR were analyzed. Machine learning techniques were used to develop statistical predictive models. Fourteen models were assessed, and their reproducibility was evaluated by an internal cross-validation method.

Results

The three best predictive models were the linear kernel support vector machine (SVM), the radial kernel SVM, and the Random Forest, with accuracies of 78.4%, 70.3%, and 69.3%, respectively. The most accurate, although most complex, algorithm uses 42 SNPs; the radial kernel SVM uses 10; and the simplest uses only two SNPs, which makes it more suitable for routine diagnostic work. The best individual predictive marker was rs2229094 in the tumor necrosis factor locus.

Conclusion

Genetic variables may be useful for predicting the likelihood of PVR development. The predictive capabilities of these models are as good as those observed with clinical approaches. These results require external validation to estimate their true predictive capability and to select the most appropriate models for clinical use.

Proliferative vitreoretinopathy (PVR) is the main cause of failure of retinal detachment (RD) surgery, occurring in 5% to 10% of patients with RD. 1 2 Even with tremendous advances in RD surgery, the incidence of PVR is still similar to that in the early 1980s. 3 4 A promising antiproliferative adjuvant treatment, intraocular 5-fluorouracil combined with low-molecular-weight heparin, 5 is not effective when PVR is already established. 6 In addition, this and other drugs used for PVR prophylaxis are not free of side effects. 7 
The identification of patients at high risk for PVR would allow us to select those for more specific treatments. It would also help us to unravel the molecular basis involved in the process, pointing out new potential therapeutic targets. Most researchers have worked on the identification of risk factors based on the analysis of clinical data or inflammatory mediators obtained from the vitreous cavity. 8 9 10 11 These studies have led to the development of predictive formulas based on clinical variables. However, with specificity and sensitivity values no greater than 80% and 60%, respectively, the results suggest that these factors do not completely explain the risk for PVR development. 
Our group has recently described the potential contribution of genetic components to PVR, suggesting that it is a complex disease in which many environmental, clinical, and genetic variables may interact. 12 To know the true genetic contribution to PVR, we started the Retina 4 project, which includes a candidate gene association study (Rojas MJ, et al. IOVS 2008;49:ARVO E-Abstract 1711) and the development of predictive models. The purpose of this work was to construct prediction models based on genotype data. To the best of our knowledge, this is a novel approach for improving the prediction of PVR. 
Materials and Methods
Patient Information
Data were obtained from a candidate gene association study performed in patients at eight centers in Spain (Rojas MJ et al., IOVS 2008;49:ARVO E-Abstract 1711). Information was collected after specific informed consent had been obtained. Four hundred fifty patients who had undergone primary rhegmatogenous RD surgery met the inclusion criteria. Those in whom PVR of grade C1 or higher 13 developed were included in the PVR case group (n = 138), and those in whom it did not develop after 3 months of follow-up were included in the control group (n = 312). To achieve a stringent phenotype classification, patients with secondary RDs of causes other than a primary rhegma were excluded. The study was conducted in compliance with the guidelines of the Declaration of Helsinki. Informed consent was obtained from all participants. 
Single Nucleotide Polymorphism (SNP) Measurement
The genotype profile came from the following 30 candidate genes known to be in the PVR pathways: CTGF, PDGF, PDGFRα, PI3KCG, EGF, FGF2, MIF, MMP2, MMP7, MCP1, IGF1, IGF2, IGF1R, TNF, TNFR2, TGFβ1, TGFβ2, SMAD3, SMAD7, IFNα, IL1α, IL1β, IL1RN, IL6, IL8, IL10, NFκB1, NFκBIA, NFκBIB, and HGF. 3 14 15 16 A total of 196 common SNPs with minor allelic frequencies >10% were selected for study. Selection was based on the linkage disequilibrium blocks described for Caucasians available in HapMap (www.HapMap.org). These were determined by the tagging method implemented in the Haploview program, 17 with a linkage disequilibrium rate of r² > 0.8. These parameters allowed us to select the SNPs that explain as much of the genetic variation described in each gene as possible. 
Genotyping was performed at the Spanish National Genotyping Center (CeGen) with the SNPlex genotyping system (Applied Biosystems, Foster City, CA). This technology uses oligonucleotide ligation, polymerase chain reaction, and capillary electrophoresis to allow high-throughput SNP typing with high accuracy. SNPs were converted into numeric values according to control population frequencies: homozygous for the major allele, 1; heterozygous, 2; homozygous for the minor allele, 3. 
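The numeric coding above can be sketched as follows (a minimal illustration in Python; the study's own pipeline was run in R, and the helper name is ours):

```python
# Sketch of the genotype-to-numeric coding described above: genotypes are
# coded relative to the major allele in the control population.
# Homozygous major = 1, heterozygous = 2, homozygous minor = 3.

def encode_genotype(genotype: str, major_allele: str) -> int:
    """Encode a two-letter genotype string (e.g. 'AG') as 1, 2, or 3."""
    major_count = genotype.count(major_allele)
    if major_count == 2:
        return 1  # homozygous for the major allele
    if major_count == 1:
        return 2  # heterozygous
    return 3      # homozygous for the minor allele

# Example: at a SNP where 'C' is the major allele in the control population
print(encode_genotype("CC", "C"))  # homozygous major -> 1
print(encode_genotype("CT", "C"))  # heterozygous -> 2
print(encode_genotype("TT", "C"))  # homozygous minor -> 3
```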
Independent Predictive Value
The information gain method 18 was used to obtain the independent predictive value of each SNP and to reduce the degree of uncertainty in the data. The significance level for each SNP was calculated with random permutation tests 19 of 10,000 iterations. Because the number of tests was very high, the significance level was based on the false discovery rate. 20 
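The per-SNP information gain and permutation approach can be sketched as follows (illustrative Python; function names and the toy data are ours, and the study used 10,000 permutations with false-discovery-rate adjustment across all 196 SNPs):

```python
# Hedged sketch: information gain of a single SNP, with a permutation
# p-value obtained by shuffling the case/control labels.
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(genotypes, labels):
    """Reduction in class entropy after splitting on genotype value (1/2/3)."""
    base = entropy(labels)
    n = len(labels)
    cond = 0.0
    for g in set(genotypes):
        subset = [y for x, y in zip(genotypes, labels) if x == g]
        cond += (len(subset) / n) * entropy(subset)
    return base - cond

def permutation_p(genotypes, labels, n_perm=1000, seed=0):
    """Fraction of label permutations with information gain >= observed."""
    rng = random.Random(seed)
    observed = information_gain(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if information_gain(genotypes, shuffled) >= observed:
            hits += 1
    return hits / n_perm

# A perfectly predictive toy SNP: information gain equals the class entropy
print(information_gain([1, 1, 2, 2], [0, 0, 1, 1]))  # 1.0
```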
Predictive Models
Two-class discriminative models for patients with PVR and controls were built with four machine learning algorithms: Naïve Bayes Classifier, 21 Support Vector Machine (SVM), 22 Decision Tree, 23 and Random Forest. 24 In addition, linear, radial, quadratic, and cubic kernels were used with the SVM. 
Naïve Bayes is a simple model that uses the frequencies of different values of each SNP in patients with known PVR case or control states. This information was used to predict the class of a new patient with specified genotype profile but with an unknown case or control state. It assumes that each SNP is independent from every other SNP in a given case or control state. Naïve Bayes is generally used as a first approach to solve a classification problem. 
SVMs extend the idea of a simple linear classifier to more complex classifiers determined by the choice of a “kernel.” SVM constructs hyperplanes in the n-dimensional space of the input data. This optimizes the separation between the two groups: PVR and control. In circumstances in which there is no hyperplane that can separate the two classes, the input space will be modified. The mission of user-selected kernels is to transform the input space into another higher dimension space where it may be possible to solve this problem. 
A Decision Tree is a class discriminator that recursively partitions the data set until each partition consists predominantly of individuals from one class. In the given context, the building of the decision tree starts by finding the single SNP that is most discriminative. Next, the same idea is applied in a recursive way to find the most discriminative SNP for the patients in this part of the tree. Finally, the tree is pruned back to avoid the overfitting model problem. 
The Random Forest model is based on a large number of decision trees in which the candidate SNPs of each tree are randomly sampled. To classify a new patient, each tree gives a classification by “voting” for that class. The forest chooses the classification having the most votes over all the trees in the forest. 
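As a rough illustration of this model family (not the study's actual code, which was written in R), the seven classifiers can be assembled with scikit-learn on a synthetic genotype matrix:

```python
# Illustrative scikit-learn analogue of the algorithms described above.
# The toy genotype matrix (patients x SNPs, coded 1/2/3) and the toy
# outcome rule are synthetic.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(1, 4, size=(100, 20))   # 100 patients x 20 SNPs coded 1/2/3
y = (X[:, 0] > 1).astype(int)            # toy rule: outcome driven by SNP 0

models = {
    "naive_bayes": GaussianNB(),
    "linear_svm": SVC(kernel="linear"),
    "radial_svm": SVC(kernel="rbf"),
    "quadratic_svm": SVC(kernel="poly", degree=2),
    "cubic_svm": SVC(kernel="poly", degree=3),
    "decision_tree": DecisionTreeClassifier(max_depth=3),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, round(model.score(X, y), 2))
```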
Training Set
Each model was trained in two ways. First, the entire data set, herein termed the large data set (312 controls and 138 PVR cases), was run; this required imputation of missing data. Next, a smaller data set, herein termed the small data set (114 controls and 52 cases with complete data), was run. 
The analysis of only those patients with complete data (i.e., the small data set) gives valid inferences under the assumption that missing values are missing completely at random (MCAR). The missing data for variable x are MCAR if the probability of having a missing value for x is unrelated to the values of x itself or to any other variables in the data set. MCAR was evaluated by dividing the data into those with and without missing information. Contingency table analysis was then used to establish that there was no relationship between the presence of missing data and membership in the PVR or control group. 
The missing data in the large data set were imputed by Random Forest. This algorithm provides a measure of the internal structure of the data, the proximity of different data points to one another. Proximities may be used to impute missing data as inputs to traditional multivariate procedures based on distances and covariance matrices. The iterative algorithm to impute missing values starts by replacing them with the most frequent genotype. In each step, the proximity values calculated by Random Forest are used to update the imputation of the missing data. The imputed value is taken from the observation that has the largest proximity to the observation with a missing value. This method maintains accuracy even when a large proportion of the data are missing (http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm). 
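A simplified, single-pass sketch of this proximity-based imputation, assuming scikit-learn's RandomForestClassifier and our own helper name (Breiman's algorithm iterates until the imputed values stabilize):

```python
# One pass of Breiman-style proximity imputation for coded genotypes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def proximity_impute(X, y, missing=-1, random_state=0):
    """Impute entries equal to `missing` in an integer genotype matrix."""
    X = X.copy()
    mask = X == missing
    # Step 1: initialize each missing entry with the column's most frequent genotype
    for j in range(X.shape[1]):
        col = X[~mask[:, j], j]
        values, counts = np.unique(col, return_counts=True)
        X[mask[:, j], j] = values[np.argmax(counts)]
    # Step 2: proximity = fraction of trees in which two samples share a leaf
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state).fit(X, y)
    leaves = rf.apply(X)                               # (n_samples, n_trees) leaf ids
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    np.fill_diagonal(prox, 0.0)                        # ignore self-proximity
    # Step 3: replace each imputed entry with the value from its closest neighbor
    for i, j in zip(*np.where(mask)):
        X[i, j] = X[np.argmax(prox[i]), j]
    return X
```

In the full algorithm, steps 2 and 3 are repeated, with the proximities recomputed from the updated matrix on each pass.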
Internal Validation
The predictive capability of the seven models for the large and small data sets was evaluated by internal validation, according to accuracy, sensitivity or true-positive rate, specificity or true-negative rate, Youden’s J index, and diagnostic odds ratio. A receiver operating characteristic (ROC) curve was also computed. A cross-validation was performed. This tool provides an estimate of how a particular model might perform on a new data set drawn from the same statistical distribution. In n-fold cross-validation, the original data set is divided into n equal-sized subsamples. Of the n subsamples, a single subsample is retained as the validation data for testing the model, and the remaining n − 1 subsamples are used as training data. This procedure is continued until each subsample has been used as test data (n times). The n results are then combined to produce a single estimate. We used the leave-one-out method, an n-fold cross-validation in which n is the number of observations in the original sample; each observation is used once as the validation data. According to the predictive capabilities, the best models were selected. 
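The leave-one-out procedure and the validation metrics listed above can be sketched as follows (illustrative Python with scikit-learn; the study's analysis was performed in R, and the function name is ours):

```python
# Leave-one-out cross-validation plus the internal-validation metrics:
# accuracy, sensitivity, specificity, Youden's J, and diagnostic odds ratio.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def loocv_metrics(model, X, y):
    """Hold out each observation once, then score the pooled predictions."""
    preds = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model.fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    tp = np.sum((preds == 1) & (y == 1))
    tn = np.sum((preds == 0) & (y == 0))
    fp = np.sum((preds == 1) & (y == 0))
    fn = np.sum((preds == 0) & (y == 1))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(y),
        "sensitivity": sens,
        "specificity": spec,
        "youden_j": sens + spec - 1,                    # 0 = useless, 1 = perfect
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn) if fp and fn else float("inf"),
    }
```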
Feature Selection
During the construction of the models, different feature selection strategies were used. For the Naïve Bayes and SVM algorithms, informative SNPs were selected by 20-fold cross-validation. In each subset, a backward sequential search was used: we started with all the SNPs and, at each step, removed the SNP whose exclusion allowed the classifier built with the remaining SNPs to make the greatest number of correct predictions. For the Decision Tree, feature selection was an inherent part of the algorithm. Finally, for Random Forest, we used the iterative process proposed by Díaz-Uriarte et al., 25 based on the progressive elimination of the SNPs with the smallest variable importance measures. Data analysis was performed with R software. 26 
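The backward sequential search for the Naïve Bayes and SVM algorithms can be sketched as follows (illustrative; the study ran the search within 20-fold cross-validation on each training subset, and the stopping rule here is our simplification):

```python
# Backward sequential feature elimination: starting from all SNPs, repeatedly
# drop the SNP whose removal leaves the classifier with the highest
# cross-validated accuracy, stopping when any further removal hurts.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def backward_select(X, y, model, min_features=2, cv=5):
    kept = list(range(X.shape[1]))
    best_score = cross_val_score(model, X[:, kept], y, cv=cv).mean()
    while len(kept) > min_features:
        scores = []
        for f in kept:
            trial = [k for k in kept if k != f]
            scores.append((cross_val_score(model, X[:, trial], y, cv=cv).mean(), f))
        score, worst = max(scores)     # best score achievable by dropping one SNP
        if score < best_score:
            break                      # removing anything now hurts accuracy
        best_score, kept = score, [k for k in kept if k != worst]
    return kept, best_score
```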
Results
Sample Characteristics
Based on the comparative characteristics of the sample population (Table 1), only autonomous community showed a significant association with the PVR case or control groups (P < 0.001). Information about surgical procedures was collected only for the PVR group. Of those patients, 65.2% developed PVR after surgery, usually after just one procedure. Among postsurgical patients with PVR, there was no association between the use of scleral buckles or pars plana vitrectomy (PPV) and the development of PVR (Table 2). 
Predictive Capabilities of Single SNPs
Fourteen SNPs were individually significant (P ≤ 0.05, Table 3 ). With adjustment for multiple hypotheses, only eight of these were significant, of which the TNF locus had the highest level of significance. 
Predictive Capabilities of the Large and Small Data Sets
Among the algorithms, only the Naïve Bayes model had no parameters that differed significantly between the large and small data sets (Table 4). For the linear and radial kernel SVM algorithms, differences between the data sets were statistically significant for all parameters. For the remaining algorithms, the parameters of the small data set were usually greater than those of the large data set, although the differences were not always significant. Specifically, the accuracy of the quadratic kernel SVM for the small data set, 65.1% (95% CI: 57.5–71.9), was significantly greater than that for the large data set (Table 4). The sensitivities of the quadratic kernel SVM and the Random Forest algorithm for the small data set, 53.8% (95% CI: 40.5–66.7) and 65.4% (95% CI: 51.8–76.8), respectively, were significantly greater than those for the large data set (Table 4). Finally, the specificities of the cubic kernel SVM algorithm and the Decision Tree for the small data set, 70.2% (95% CI: 61.2–77.8) and 57% (95% CI: 47.8–65.7), respectively, were significantly greater than those for the large data set (Table 4). After the seven algorithms fitted on each data set were combined, the small data set had the largest area under the ROC curve (Fig. 1), indicating greater reliability in the prediction of PVR development than the large data set. 
Differences among Models Fitted with the Small Data Set
For the small data set, which contained no imputed values, the accuracy obtained with the linear kernel SVM algorithm (78.4%) exceeded the 95% CI of the other algorithms and was therefore significantly greater (Table 4). The most sensitive model was the radial kernel SVM (70.4%, 95% CI: 57.2–80.9), although differences with the linear kernel SVM (69.2%, 95% CI: 55.7–80.1), Decision Tree (67.3%, 95% CI: 53.8–78.5), and Random Forest models (65.4%, 95% CI: 51.8–76.8) were not statistically significant. The linear kernel SVM specificity (83.3%; 95% CI: 74.6–89.5) was significantly greater than that of any of the other models. 
The Youden index is directly related to the quality of the models, with better models having values nearer to 1. For the small data set, the best Youden indexes were produced by the linear kernel SVM, radial kernel SVM, and Random Forest models (0.5, 0.4, and 0.4, respectively; Table 4). These models also showed the best diagnostic odds ratios. Accordingly, the models that best predict the onset of PVR were the linear kernel SVM, with a diagnostic odds ratio of 11.25; the radial kernel SVM, with a diagnostic odds ratio of 5.59; and the Random Forest, with a diagnostic odds ratio of 4.64. These models work with 42, 10, and 2 SNPs, respectively (Table 5). 
Discussion
PVR is the main cause for the failure of retinal detachment surgery. 1 It is also an important pitfall in the development of new therapeutic strategies for other diseases where creation of a retinal detachment is needed. 27 The lack of satisfactory results in the identification of patients at risk of PVR by clinical characteristics 8 9 10 11 justifies our effort to elucidate the genetic components. 
To find the optimal classification system, we have worked with machine learning algorithms. With the use of these tools, we lose the advantage of easily interpreted classifiers as we adapt to the computational complexity of fitting such models. Unfortunately, with complex data sets, accuracy and simplicity are in conflict. One primary characteristic of genomic data sets is high dimensionality along with highly correlated features. For this reason, the application of traditional methods of statistical analysis, such as discriminant analysis and logistic regression, is limited. These methods depend on a fixed underlying model or functional form, and the objective of both techniques is to discover numerical coefficients for predetermined models. In contrast, machine learning algorithms treat the underlying structure of the data as unknown, offering the possibility of extracting hidden complex relationships and correlations. 
In addition, the genomic data set typically has too many features in relation to the number of samples. This limits the utility of traditional methods because the number of parameters to estimate requires an unapproachably large sample size. Although a small sample size also can cause important problems in the application of machine learning algorithms, such as overfitting, the ratio of the number of samples to the number of variables is less than traditional methods. Furthermore, many machine learning tools incorporate methods of variable selection that allow us to reduce the dimensionality of the problem. 
Selection strategies for different features, in this case SNPs, are used with each machine learning algorithm. The objective of choosing SNPs is to reduce the dimensionality, eliminating irrelevant features. This has two consequences: clearer information about the disease and reduced computational, storage, and future sample-processing costs. Keeping in mind that the final purpose is to find good disease classifiers, the best SNP set is the one that minimizes the prediction error of the model. An ideal selection would result from an exhaustive search of all possible SNP subsets. However, this is computationally overwhelming. A simple solution would be to consider a fixed number of the more informative SNPs. However, as SNPs are not independent of one another, such a model would be optimistically biased (i.e., predictions tend toward the more frequent circumstances). The best prediction is obtained from analyzing the SNPs altogether; in this case, in addition to the computational pitfalls, serious problems caused by overfitting would appear. Considering this scenario, we selected a sequential search. 
Our results show that in the large data set, the algorithm offering the best results was the Decision Tree. However, for this model, only the specificity was significantly greater. In general, algorithms using the small data set yielded better results than those obtained from the entire data set. For the small data set, the sensitivity, which reflects the ability to identify patients at high risk of PVR, was remarkable: except for the Naïve Bayes and cubic kernel SVM algorithms, it was always better than for the large data set. 
When proper imputations are performed, larger data sets will usually provide models of greater accuracy. For our data, one possibility is that the missing values were not randomly distributed. However, there was no evidence of nonrandom distribution of missing values across the PVR and control groups. Another possibility is that there was not enough information for the imputation to be proper. Only if both situations simultaneously occurred would our results be invalidated. In that case, missing data that cannot be ignored would force us to work with the larger data set. 
Of the algorithms that we tested, the linear kernel SVM was the best predictive model, although it is quite complex because it works with 42 SNPs. Simpler algorithms that were equally good, as judged by the Youden indexes, were the radial kernel SVM and Random Forest models. They work with 10 and 2 SNPs, respectively. Simplicity is a very important characteristic for the potential application of a predictive model in the clinical setting. 
Any results based on the analysis of the genetic components of any disease would be invalid if the sample were not appropriate. In our population sample, there was an association between PVR and the community in which the patients lived, which suggests that a structure exists in the sample. In a detailed analysis, Castilla y Leon was the autonomous community that introduced this dependence, with a high proportion of cases observed there. One explanation is that, unlike the others, the participating centers located in this autonomous community manage almost all the complex RD cases from Castilla y Leon. 
Focusing on the role of the markers included in each selected model, all those included in the linear kernel SVM have in common only that they are part of the inflammatory response. Of note, 7 of the 10 SNPs identified by the radial kernel SVM model are located in genes that code for cytokines with anti-inflammatory actions: IL10, IL1RN, NFκBIA, NFκBIB, and the TGFβs. The main action of IL10 is to limit the inflammatory response by blocking IFNγ, IL-2, TNFα, and IL-4 production. 28 IL1RN binds the IL-1 receptor, preventing its binding to IL1α and IL1β and neutralizing their action. 29 NFκBIA and NFκBIB are responsible for the inhibition of NFκB, keeping it inactivated in the cytoplasm. Phosphorylation of these proteins and their subsequent degradation is a key step in the activation of NFκB, a central factor in the inflammatory cascade. 30 Finally, TGFβ induces endothelial and epithelial permeability. It also inhibits the expression of macrophage-secreted proteases and induces the production of collagen. In the central nervous system, a tissue functionally and embryologically linked to the retina, TGFβ blocks the action of activated microglia. 31 All these actions contribute to the limitation of acute inflammation. 32 
It is also remarkable that the two SNPs that work best with the Random Forest algorithm are located in the TNFα and IGF1 genes. These two genes code for cytokines that are central in the initiation of the inflammatory response 33 and in PVR development, respectively. IGF1 is one of the critical mediators of Müller cell transdifferentiation, 34 which plays a central role in the development of PVR. 35  
Considering the SNPs as individual predictors, rs2229094, located in the TNF locus, was implicated in the three best models (i.e., the linear kernel SVM, radial kernel SVM, and Random Forest algorithms). This TNF locus SNP was also the one that had the highest value as an individual predictor of the onset of PVR, with by far the lowest adjusted false-discovery rate. This SNP is a nonsynonymous polymorphism, inducing a change from arginine to cysteine (www.ncbi.nlm.nih.gov; National Center for Biotechnology Information, Bethesda, MD). These findings suggest that this marker could be considered a potential causal polymorphism. However, functional studies are necessary to determine its true role in the inflammatory process. 
The missing values potentially limit utilization of the algorithms we have fitted. However, SNPlex technology is well suited for applications requiring the testing of large numbers of SNPs and samples. 36 The data we have worked with do not exceed the range of the conversion rate described for SNPlex genotyping technology. 37 Therefore, we consider that missing values are acceptable and do not invalidate our method. 
In summary, we have developed three predictive models of PVR using genetic variables only. None of them has greater ability than that achieved by clinical predictive formulas 9 11 ; however, these results are equally good, considering that PVR is a complex polygenic disease. Our results suggest that genetic variables could contribute to the identification of patients at risk of PVR. These models have to be validated in an external sample to estimate their true predictive capability. Nevertheless, these are the first predictive models of PVR based on genetic profile. Genetic predictive variables could be added to any of the previously developed clinical formulas 9 11 to improve their predictive capability and thus provide a new tool in the prophylaxis of PVR. 
 
Table 1.
 
General Sample Characteristics
Variable | RD: n, %, valid % | PVR: n, %, valid % | Total | P* (H0: RD vs. PVR independent)
Autonomous community <0.001
 Cataluña 110 79.7 79.7 28 20.3 20.3 138
 Pais Vasco 35 83.3 83.3 7 16.7 16.7 42
 Andalucia 9 64.3 64.3 5 35.7 35.7 14
 Castilla y Leon 60 47.2 47.2 67 52.8 52.8 127
 Valencia 98 76.0 76.0 31 24.0 24.0 129
Family RD history 0.38533
 Yes 33 75.0 75.0 11 25.0 25.0 44
 No 278 68.6 68.6 127 31.4 31.4 405
 Unknown 1 100 0 0 0 1
Aphakic 0.97431
 Yes 99 69.2 69.2 44 30.8 30.8 143
 No 213 69.4 69.4 94 30.6 30.6 307
 Unknown 0 0 0 0 0 0 0
Fellow eye RD 0.85711
 Yes 31 70.5 70.5 13 29.5 29.5 44
 No 280 69.1 69.1 125 30.9 30.9 405
 Unknown 1 100 0 0 0 1
Table 2.
 
Information Collected from the PVR Group
Variable | n | % | Valid % | P* (H0: equal proportions)
Primary PVR 0.00035
 No PVR 312 69.3
 Yes 48 10.7 34.8
 No 90 20.0 65.2
Pneumatic retinopexy <0.00001
 Primary PVR 48 10.7
 No PVR 312 69.3
 Yes 9 2.0 10.0
 No 81 18.0 90.0
Scleral buckle 0.29184
 Primary PVR 48 10.7
 No PVR 312 69.3
 Yes 50 11.1 55.6
 No 40 8.9 44.4
PPV 0.09169
 Primary PVR 48 10.7
 No PVR 312 69.3
 Yes 53 11.8 58.9
 No 37 8.2 41.1
Table 3.
 
Individually Significant SNPs
Gene SNP IG P FDR (Adjusted P)
EGF rs11568943 0.2944 0.0238 0.2142
FGF2 rs9990554 0.0610 0.0036 0.0324
HGF rs5745687 0.3294 0.0135 0.0540
IL1RN rs1688072 0.0877 0.0108 0.0436
IL1RN rs973635 0.1112 0.0109 0.0436
MCP1 rs3760396 0.0799 0.0033 0.0165
MMP2 rs1561220 0.0361 0.0242 0.1936
NFκBIA rs17103274 0.1948 0.0039 0.0234
PDGFRα rs7656613 0.0433 0.0181 0.0905
PI3KCG rs6961244 0.2269 0.0453 0.2676
SMAD3 rs8032802 0.2336 0.0242 0.1936
SMAD7 rs7226855 0.0493 0.0036 0.0216
TGFβ2 rs2796821 0.0425 0.0060 0.0480
TNF rs2857706 0.0321 0.0006 0.0054
Table 4.
 
Algorithm Characteristics
Algorithm SNPs (n) Accuracy (%) Sensitivity (%) Specificity (%) Youden J index Diagnostic Odds Ratio
Large Data Set (with imputed missing values; n = 450)
Naive Bayes 45 56.4 (51.8–61) 44.9 (36.9–53.3) 61.5 (56–66.8) 0.1 1.31 (0.87–1.96)
Linear kernel SVM 35 56 (51.4–60.5) 40.6 (32.7–48.9) 62.8 (57.3–68) 0 1.15 (0.77–1.74)
Radial kernel SVM 21 46 (41.4–50.6) 30.4 (23.4–38.6) 52.9 (47.3–58.4) 0 0.49 (0.32–0.75)
Quadratic kernel SVM 26 53.3 (48.7–57.9) 31.9 (24.7–40.1) 62.8 (57.3–68) 0 0.79 (0.52–1.21)
Cubic kernel SVM 10 60.4 (55.9–64.9) 60.9 (52.5–68.6) 60.3 (54.7–65.5) 0.2 2.36 (1.56–3.55)
Decision Tree 3 66.9 (62.4–71.1) 55.8 (47.5–63.8) 71.8 (66.6–76.5) 0.3 1.98 (1.57–2.49)
Random Forest 3 67.3 (62.9–71.5) 46.8 (38.8–55) 76.7 (71.7–81.1) 0.2 2.9 (1.9–4.42)
Small Data Set (with no imputed missing values; n = 166)
Naive Bayes 54 56 (48.4–63.4) 40.4 (28.2–53.9) 63.2 (54–71.4) 0 1.16 (0.59–2.27)
Linear kernel SVM 42 78.4 (71.1–84.2) 69.2 (55.7–80.1) 83.3 (74.6–89.5) 0.5 11.25 (5.07–24.96)
Radial kernel SVM 10 70.2 (62.9–76.6) 70.4 (57.2–80.9) 70.2 (61.2–77.8) 0.4 5.59 (2.75–11.35)
Quadratic kernel SVM 17 65.1 (57.5–71.9) 53.8 (40.5–66.7) 70.2 (61.2–77.8) 0.2 2.75 (1.39–5.4)
Cubic kernel SVM 62 64.5 (56.9–71.3) 51.9 (38.7–64.9) 70.2 (61.2–77.8) 0.2 2.54 (1.29–5)
Decision Tree 4 60.2 (52.6–67.4) 67.3 (53.8–78.5) 57 (47.8–65.7) 0.2 2.73 (1.37–5.43)
Random Forest 2 69.3 (61.9–75.8) 65.4 (51.8–76.8) 71.1 (62.1–78.6) 0.4 4.64 (2.3–9.34)
Figure 1.
 
Comparison of the predictive capability of both data sets. The larger area under the curve for the small data set (black) indicates greater reliability in the prediction of the development of PVR than for the large, more complete data set (gray).
Table 5.
 
SNPs Utilized by the Selected Algorithms
Gene | SNP | Inclusion (✓ per selected model; Model 1: linear kernel SVM, 42 SNPs; Model 2: radial kernel SVM, 10 SNPs; Model 3: Random Forest, 2 SNPs)
EGF | rs718768 | ✓
EGF | rs9999824 | ✓
FGF2 | rs1476217 | ✓
FGF2 | rs167428 | ✓
FGF2 | rs3804158 | ✓
FGF2 | rs9990554 | ✓
HGF | rs5745687 | ✓
IFNG | rs2098395 | ✓
IGF1 | rs6214 | ✓
IGF1 | rs7136446 | ✓ ✓
IGF2 | rs3213221 | ✓
IGF1R | rs8038015 | ✓
IL10 | rs17015767 | ✓ ✓
IL10 | rs1800871 | ✓
IL10 | rs3024498 | ✓
IL10 | rs4390174 | ✓
IL1β | rs3917368 | ✓
IL1RN | rs1688072 | ✓
IL1RN | rs315946 | ✓
IL1RN | rs973635 | ✓
IL6 | rs11766273 | ✓
MIF | rs4820571 | ✓
MIF | rs738806 | ✓
MMP2 | rs2192853 | ✓
MMP2 | rs243840 | ✓ ✓
MMP2 | rs9928731 | ✓
MMP9 | rs3787268 | ✓
NFκB1 | rs11722146 | ✓
NFκB1 | rs230540 | ✓
NFκB1 | rs997476 | ✓
NFκBIA | rs17103274 | ✓ ✓
NFκBIA | rs3138056 | ✓
NFκBIA | rs3138045 | ✓
NFκBIB | rs3136640 | ✓
NFκBIB | rs3136646 | ✓ ✓
PDGFRα | rs4289498 | ✓
PI3KCG | rs6961244 | ✓
PI3KCG | rs849380 | ✓
SMAD3 | rs3743343 | ✓
TGFβ1 | rs2241715 | ✓ ✓
TGFβ2 | rs2796821 | ✓ ✓
TNF | rs2229094 | ✓ ✓ ✓
TNFR2 | rs1061622 | ✓
TNFR2 | rs1061628 | ✓
TNFR2 | rs3397 | ✓
The authors thank the Spanish National Genotyping Center (CeGen; http://www.cegen.org), funded by Genoma España, for providing the SNP genotyping services; the staffs at the centers involved in the collection of samples for the genotyping data; and Rosa-Maria Corrales, Director of the Laboratory of Molecular Biology of IOBA.
Pastor JC, Rodriguez de la Rua E, Martin F. Proliferative vitreoretinopathy: risk factors and pathobiology. Prog Retin Eye Res. 2002;21:127–144.
Pastor JC, Fernandez I, Rodriguez de la Rua E, et al. Surgical outcomes for primary rhegmatogenous retinal detachments in phakic and pseudophakic patients: The Retina 1 Project, report 2. Br J Ophthalmol. 2008;92:378–382.
Asaria RH, Charteris DG. Proliferative vitreoretinopathy: developments in pathogenesis and treatment. Compr Ophthalmol Update. 2006;7:179–185.
Charteris DG, Sethi CS, Lewis GP, Fisher SK. Proliferative vitreoretinopathy: developments in adjunctive treatment and retinal pathology. Eye. 2002;16:369–374.
Asaria RH, Kon CH, Bunce C, et al. Adjuvant 5-fluorouracil and heparin prevents proliferative vitreoretinopathy: results from a randomized, double-blind, controlled clinical trial. Ophthalmology. 2001;108:1179–1183.
Charteris DG, Aylward GW, Wong D, Groenewald C, Asaria RH, Bunce C; the PVR Study Group. A randomized controlled trial of combined 5-fluorouracil and low-molecular-weight heparin in management of established proliferative vitreoretinopathy. Ophthalmology. 2004;111:2240–2245.
Wickham L, Bunce C, Wong D, McGurn D, Charteris DG. Randomized controlled trial of combined 5-fluorouracil and low-molecular-weight heparin in the management of unselected rhegmatogenous retinal detachments undergoing primary vitrectomy. Ophthalmology. 2007;114:698–704.
Kon CH, Occleston NL, Aylward GW, Khaw PT. Expression of vitreous cytokines in proliferative vitreoretinopathy: a prospective study. Invest Ophthalmol Vis Sci. 1999;40:705–712.
Kon CH, Asaria RH, Occleston NL, Khaw PT, Aylward GW. Risk factors for proliferative vitreoretinopathy after primary vitrectomy: a prospective study. Br J Ophthalmol. 2000;84:506–511.
Asaria RH, Kon CH, Bunce C, et al. How to predict proliferative vitreoretinopathy: a prospective study. Ophthalmology. 2001;108:1184–1186.
Rodriguez de la Rua E, Pastor JC, Aragon J, et al. Interaction between surgical procedure for repairing retinal detachment and clinical risk factors for proliferative vitreoretinopathy. Curr Eye Res. 2005;30:147–153.
Sanabria Ruiz-Colmenares MR, Pastor Jimeno JC, Garrote Adrados JA, Telleria Orriols JJ, Yugueros Fernandez MI. Cytokine gene polymorphisms in retinal detachment patients with and without proliferative vitreoretinopathy: a preliminary study. Acta Ophthalmol Scand. 2006;84:309–313.
Machemer R, Aaberg TM, Freeman HM, Irvine AR, Lean JS, Michels RM. An updated classification of retinal detachment with proliferative vitreoretinopathy. Am J Ophthalmol. 1991;112:159–165.
Wiedemann P. Growth factors in retinal diseases: proliferative vitreoretinopathy, proliferative diabetic retinopathy, and retinal degeneration. Surv Ophthalmol. 1992;36:373–384.
Pastor JC. Proliferative vitreoretinopathy: an overview. Surv Ophthalmol. 1998;43:3–18.
El-Ghrably IA, Dua HS, Orr GM, Fischer D, Tighe PJ. Intravitreal invading cells contribute to vitreal cytokine milieu in proliferative vitreoretinopathy. Br J Ophthalmol. 2001;85:461–470.
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265.
Cover TM, Thomas JA. Entropy, relative entropy and mutual information. In: Elements of Information Theory. New York: John Wiley & Sons; 1991:12–49.
Welch WJ. Construction of permutation tests. J Am Stat Assoc. 1990;85:693–698.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Methodol. 1995;57:289–300.
Duda RO, Hart PE. Bayes decision theory. In: Pattern Classification and Scene Analysis. New York: John Wiley & Sons; 1973:10–43.
Burges CJC. A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc. 1998;2:121–167.
Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. New York: Chapman & Hall; 1984.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Díaz-Uriarte R, Álvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006;7:3.
R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2007. Available at http://www.R-project.org. Accessed April 2007.
Joussen AM, Joeres S, Fawzy N, et al. Autologous translocation of the choroid and retinal pigment epithelium in patients with geographic atrophy. Ophthalmology. 2007;114:551–560.
Pestka S, Krause CD, Sarkar D, Walter MR, Shi Y, Fisher PB. Interleukin-10 and related cytokines and receptors. Annu Rev Immunol. 2004;22:929–979.
Arend WP. Interleukin 1 receptor antagonist: a new member of the interleukin 1 family. J Clin Invest. 1991;88:1445–1451.
Hacker H, Karin M. Regulation and function of IKK and IKK-related kinases. Sci STKE. 2006;357:re13.
Böttner M, Krieglstein K, Unsicker K. The transforming growth factor-betas: structure, signaling, and roles in nervous system development and functions. J Neurochem. 2000;75:2227–2240.
Sheppard D. Transforming growth factor beta: a central modulator of pulmonary and airway inflammation and fibrosis. Proc Am Thorac Soc. 2006;3:413–417.
Goetz FW, Planas JV, MacKenzie S. Tumor necrosis factors. Dev Comp Immunol. 2004;28:487–497.
Guidry C. The role of Muller cells in fibrocontractive retinal disorders. Prog Retin Eye Res. 2005;24:75–86.
Pastor JC, Mendez MC, de la Fuente MA, et al. Intraretinal immunohistochemistry findings in proliferative vitreoretinopathy with retinal shortening. Ophthalmic Res. 2006;38:193–200.
De la Vega FM, Lazaruk KD, Rhodes MD, Wenz MH. Assessment of two flexible and compatible SNP genotyping platforms: TaqMan SNP Genotyping Assays and the SNPlex Genotyping System. Mutat Res. 2005;573:111–135.
Dai Z, Papp AC, Wang D, Hampel H, Sadee W. Genotyping panel for assessing response to cancer chemotherapy. BMC Med Genomics. 2008;1:24.
Table 1.
 
General Sample Characteristics
Category  RD (n / % / Valid %)  PVR (n / % / Valid %)  Total  P* (H0: RD and PVR independent)
Autonomous community <0.001
 Cataluña 110 79.7 79.7 28 20.3 20.3 138
 Pais Vasco 35 83.3 83.3 7 16.7 16.7 42
 Andalucia 9 64.3 64.3 5 35.7 35.7 14
 Castilla y Leon 60 47.2 47.2 67 52.8 52.8 127
 Valencia 98 76.0 76.0 31 24.0 24.0 129
Family RD history 0.38533
 Yes 33 75.0 75.0 11 25.0 25.0 44
 No 278 68.6 68.6 127 31.4 31.4 405
 Unknown 1 100 0 0 0 1
Aphakic 0.97431
 Yes 99 69.2 69.2 44 30.8 30.8 143
 No 213 69.4 69.4 94 30.6 30.6 307
 Unknown 0 0 0 0 0 0 0
Fellow eye RD 0.85711
 Yes 31 70.5 70.5 13 29.5 29.5 44
 No 280 69.1 69.1 125 30.9 30.9 405
 Unknown 1 100 0 0 0 1
Table 2.
 
Information Collected from the PVR Group
Category  n  %  Valid %  P* (H0: no difference in %)
Primary PVR 0.00035
 No PVR 312 69.3
 Yes 48 10.7 34.8
 No 90 20.0 65.2
Pneumatic retinopexy <0.00001
 Primary PVR 48 10.7
 No PVR 312 69.3
 Yes 9 2.0 10.0
 No 81 18.0 90.0
Scleral buckle 0.29184
 Primary PVR 48 10.7
 No PVR 312 69.3
 Yes 50 11.1 55.6
 No 40 8.9 44.4
PPV 0.09169
 Primary PVR 48 10.7
 No PVR 312 69.3
 Yes 53 11.8 58.9
 No 37 8.2 41.1
Table 3.
 
Individually Significant SNPs
Gene SNP IG P FDR (Adjusted P)
EGF rs11568943 0.2944 0.0238 0.2142
FGF2 rs9990554 0.0610 0.0036 0.0324
HGF rs5745687 0.3294 0.0135 0.0540
IL1RN rs1688072 0.0877 0.0108 0.0436
IL1RN rs973635 0.1112 0.0109 0.0436
MCP1 rs3760396 0.0799 0.0033 0.0165
MMP2 rs1561220 0.0361 0.0242 0.1936
NFκBIA rs17103274 0.1948 0.0039 0.0234
PDGFRα rs7656613 0.0433 0.0181 0.0905
PI3KCG rs6961244 0.2269 0.0453 0.2676
SMAD3 rs8032802 0.2336 0.0242 0.1936
SMAD7 rs7226855 0.0493 0.0036 0.0216
TGFβ2 rs2796821 0.0425 0.0060 0.0480
TNF rs2857706 0.0321 0.0006 0.0054
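The FDR column in Table 3 reflects a Benjamini–Hochberg-style adjustment of the raw P values for multiple testing (the procedure the authors cite; Benjamini & Hochberg, 1995). A minimal sketch of that procedure, using illustrative p-values rather than the study's full SNP panel:

```python
# Hedged sketch of Benjamini-Hochberg FDR adjustment. The p-values below
# are illustrative only; the study adjusted over its full set of tested SNPs,
# so these adjusted values will not match Table 3 exactly.

def bh_adjust(pvalues):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # q_(k) = min(q_(k+1), p_(k) * m / k)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvalues[i] * m / rank)
        adjusted[i] = prev
    return adjusted

print(bh_adjust([0.0006, 0.0033, 0.0036, 0.0242]))
```

The key property is that an adjusted value bounds the expected proportion of false positives among the SNPs declared significant at that threshold, which is less conservative than a Bonferroni correction over the same panel.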
Table 4.
 
Algorithm Characteristics
Algorithm SNPs (n) Accuracy (%) Sensitivity (%) Specificity (%) Youden J index Diagnostic Odds Ratio
Large Data Set (with imputed missing values; n = 450)
Naive Bayes 45 56.4 (51.8–61) 44.9 (36.9–53.3) 61.5 (56–66.8) 0.1 1.31 (0.87–1.96)
Linear kernel SVM 35 56 (51.4–60.5) 40.6 (32.7–48.9) 62.8 (57.3–68) 0 1.15 (0.77–1.74)
Radial kernel SVM 21 46 (41.4–50.6) 30.4 (23.4–38.6) 52.9 (47.3–58.4) 0 0.49 (0.32–0.75)
Quadratic kernel SVM 26 53.3 (48.7–57.9) 31.9 (24.7–40.1) 62.8 (57.3–68) 0 0.79 (0.52–1.21)
Cubic kernel SVM 10 60.4 (55.9–64.9) 60.9 (52.5–68.6) 60.3 (54.7–65.5) 0.2 2.36 (1.56–3.55)
Decision Tree 3 66.9 (62.4–71.1) 55.8 (47.5–63.8) 71.8 (66.6–76.5) 0.3 1.98 (1.57–2.49)
Random Forest 3 67.3 (62.9–71.5) 46.8 (38.8–55) 76.7 (71.7–81.1) 0.2 2.9 (1.9–4.42)
Small Data Set (with no imputed missing values; n = 166)
Naive Bayes 54 56 (48.4–63.4) 40.4 (28.2–53.9) 63.2 (54–71.4) 0 1.16 (0.59–2.27)
Linear kernel SVM 42 78.4 (71.1–84.2) 69.2 (55.7–80.1) 83.3 (74.6–89.5) 0.5 11.25 (5.07–24.96)
Radial kernel SVM 10 70.2 (62.9–76.6) 70.4 (57.2–80.9) 70.2 (61.2–77.8) 0.4 5.59 (2.75–11.35)
Quadratic kernel SVM 17 65.1 (57.5–71.9) 53.8 (40.5–66.7) 70.2 (61.2–77.8) 0.2 2.75 (1.39–5.4)
Cubic kernel SVM 62 64.5 (56.9–71.3) 51.9 (38.7–64.9) 70.2 (61.2–77.8) 0.2 2.54 (1.29–5)
Decision Tree 4 60.2 (52.6–67.4) 67.3 (53.8–78.5) 57 (47.8–65.7) 0.2 2.73 (1.37–5.43)
Random Forest 2 69.3 (61.9–75.8) 65.4 (51.8–76.8) 71.1 (62.1–78.6) 0.4 4.64 (2.3–9.34)
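The last two columns of Table 4 are derived from sensitivity and specificity by standard formulas (Youden J = sensitivity + specificity − 1; diagnostic odds ratio = the odds of a positive test in PVR cases over the odds in non-cases). A short sketch, checked against the linear-kernel SVM row of the small data set:

```python
# Sketch of how Table 4's summary statistics follow from sensitivity and
# specificity. Input values are the point estimates from the linear-kernel
# SVM row of the small data set.

def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden J index = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec)."""
    return (sensitivity / (1.0 - sensitivity)) / ((1.0 - specificity) / specificity)

sens, spec = 0.692, 0.833  # linear kernel SVM, small data set
print(round(youden_j(sens, spec), 1))               # matches the reported 0.5
print(round(diagnostic_odds_ratio(sens, spec), 1))  # close to the reported 11.25
```

Both statistics are insensitive to the prevalence of PVR in the sample, which is why they are more informative than raw accuracy for comparing the algorithms.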