TY - JOUR
T1 - Interaction-based feature selection and classification for high-dimensional biological data
AU - Wang, Haitian
AU - Lo, Shaw Hwa
AU - Zheng, Tian
AU - Hu, Inchi
PY - 2012/11
Y1 - 2012/11
AB - Motivation: Epistasis, or gene-gene interaction, has gained increasing attention in studies of complex diseases. Its presence as a ubiquitous component of the genetic architecture of common human diseases has been contemplated. However, the detection of gene-gene interaction is difficult due to combinatorial explosion. Results: We present a novel feature selection method incorporating variable interaction. Three gene expression datasets are analyzed to illustrate our method, although it can also be applied to other types of high-dimensional data. The quality of the selected variables is evaluated in two ways: first by classification error rates, then by functional relevance assessed using biological knowledge. First, we show that classification error rates can be significantly reduced by considering interactions. Second, a sizable portion of the genes identified by our method for breast cancer metastasis overlaps with those reported as disease-associated in the Gene-to-System Breast Cancer (G2SBC) database, and some of them have interesting biological implications. In summary, interaction-based methods may lead to substantial gains in biological insight as well as more accurate prediction.
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000310155300017
UR - https://openalex.org/W2143461268
UR - https://www.scopus.com/pages/publications/84868034825
U2 - 10.1093/bioinformatics/bts531
DO - 10.1093/bioinformatics/bts531
M3 - Journal Article
SN - 1367-4803
VL - 28
SP - 2834
EP - 2842
JO - Bioinformatics
JF - Bioinformatics
IS - 21
ER -