BMC Genomics. 2023 Aug 17;24(1):463.
doi: 10.1186/s12864-023-09571-3.

Aberrant activation of five embryonic stem cell-specific genes robustly predicts a high risk of relapse in breast cancers

Emmanuelle Jacquet et al. BMC Genomics.

Abstract

Background: In breast cancer, as in all cancers, genetic and epigenetic deregulation can result in the out-of-context expression of a set of normally silent tissue-specific genes. In various cancers, the activation of some of these genes gives tumour cells new properties and drives enhanced proliferation and metastatic activity, leading to a poor survival prognosis.

Results: In this work, we undertook a systematic and unbiased analysis of out-of-context activations of a set of tissue-specific genes from testis, placenta and embryonic stem cells that are not expressed in normal breast tissue, as a source of novel prognostic biomarkers. To this end, we applied a strict machine learning framework to transcriptomic data and created a new robust tool, validated in several independent datasets, that identifies patients with a high risk of relapse. This unbiased approach allowed us to identify a panel of five biomarkers, DNMT3B, EXO1, MCM10, CENPF and CENPE, that are robustly and significantly associated with disease-free survival prognosis in breast cancer. Based on these findings, we created a new Gene Expression Classifier (GEC) that stratifies patients. Additionally, using the GEC, we were able to describe the specific molecular portrait of the particularly aggressive tumours, which show characteristics of male germ cells and a particular metabolic gene signature, associated with an enrichment in pro-metastatic and pro-proliferation gene expression.

Conclusions: The GEC is able to reliably identify patients with a high risk of relapse at early stages of the disease. We especially recommend using the GEC tool for patients with the luminal-A molecular subtype of breast cancer, which is generally considered to have a favourable disease-free survival prognosis, in order to detect the fraction of patients with a high risk of relapse.
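The stratification described above can be illustrated with a minimal Python sketch. It counts how many of the five panel genes are activated in a tumour and assigns the tumour to the GEC 0–1 or GEC 2–5 group used in the figure legends. The abstract does not state how activation is called for each gene, so the per-gene thresholds, the expression values and the function names below are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
from typing import Dict

# The five genes of the GEC panel named in the abstract.
GEC_GENES = ["DNMT3B", "EXO1", "MCM10", "CENPF", "CENPE"]


def gec_score(expression: Dict[str, float], thresholds: Dict[str, float]) -> int:
    """Count the panel genes whose expression exceeds their activation threshold.

    Assumption: a gene counts as "activated" when its expression is strictly
    above a per-gene threshold. The source does not specify this rule.
    """
    return sum(1 for gene in GEC_GENES if expression.get(gene, 0.0) > thresholds[gene])


def gec_group(score: int) -> str:
    """Map a GEC score (0 to 5) to the two groups used in the figure legends."""
    if not 0 <= score <= len(GEC_GENES):
        raise ValueError(f"GEC score must be between 0 and {len(GEC_GENES)}")
    return "GEC 0–1" if score <= 1 else "GEC 2–5"


# Hypothetical thresholds and expression values, for illustration only.
HYPOTHETICAL_THRESHOLDS = {gene: 1.0 for gene in GEC_GENES}
sample = {"DNMT3B": 2.3, "EXO1": 0.4, "MCM10": 1.8, "CENPF": 0.9, "CENPE": 0.2}

score = gec_score(sample, HYPOTHETICAL_THRESHOLDS)
print(score, gec_group(score))
```

In this made-up sample, two genes (DNMT3B and MCM10) exceed the hypothetical threshold, so the tumour falls in the GEC 2–5 group.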

Keywords: Breast cancer; Cancer/testis antigens; Ectopic expression; Prognosis biomarkers; Survival analysis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Heatmap showing the percentage of ectopic activations of the 1882 genes specific to testis, placenta and embryonic stem cells in the total TCGA-BRCA dataset and in breast cancer subtypes. Frequent ectopic activations, above the threshold of 10%, are shown on a red colour map. Infrequent ectopic activations, below 10%, are shown on a blue colour map
Fig. 2
Kaplan–Meier survival curves showing disease-free survival probability according to the number of activated genes in the GEC tool for eight breast cancer datasets. A: Training dataset. B–D: Validation datasets. E–H: Test datasets. For each dataset, blue lines show the survival curves for the group of patients in which the corresponding tumours activated 0 or 1 gene in the GEC tool (GEC 0–1). Red lines represent the group of patients in which the tumours activated 2 or more genes (GEC 2–5). The p-values obtained from the logrank test and Cox proportional hazard model as well as the hazard ratios are displayed on the top of each plot. Significance symbols: * for p-value < 0.05, ** for p-value < 0.01, *** for p-value < 0.001
Fig. 3
Results of the GEC tool in molecular subtypes of breast cancer. A: Distribution of breast cancer samples for five pooled datasets (TCGA-BRCA, GSE25066, GSE21653, E-MTAB-365 and Yau-2010) in luminal-A, luminal-B, HER2-enriched and basal-like subtypes according to the number of activated genes in the GEC panel. The bar plots show the percentage of samples for each GEC group (from GEC 0 to GEC 5) in each molecular subtype. B: Same for the groups GEC 0–1 and GEC 2–5. C–F: Kaplan–Meier survival curves showing disease-free survival probability in luminal-A, luminal-B, HER2-enriched and basal-like subtypes, respectively, according to the number of expressed genes in the GEC panel, presented in two groups: GEC 0–1 and GEC 2–5. The p-values obtained from the logrank test and Cox proportional hazard model as well as the hazard ratios are displayed on the top of each plot. Significance symbols: * for p-value < 0.05, ** for p-value < 0.01, *** for p-value < 0.001
Fig. 4
Main GSEA results for transcriptomic profiles of GEC+ versus GEC- tumours in the TCGA-BRCA dataset. A: Kaplan–Meier disease-free survival curves for the group of tumours without GEC ectopic expressions (GEC-) and the group with major GEC ectopic expressions of 4 or 5 genes (GEC+). The displayed p-value corresponds to the logrank test between the GEC- and GEC+ groups. B: Heatmap of the differential expression profiles of GEC+ versus GEC- in TCGA-BRCA. The differentially expressed genes used for the heatmap were selected with an adjusted Mann–Whitney test p-value < 0.05 and abs(ratio) > 1.5. The hierarchical clustering was performed using Euclidean distance with Ward’s linkage for samples and Pearson correlation for genes. C: GSEA plots illustrating the main enrichment/depletion profiles in GEC+ tumours compared to GEC- tumours in the TCGA-BRCA dataset. For all the gene sets, the enrichment or depletion was considered significant with a nominal p-value < 0.05 and FDR < 0.25. The gene sets were selected from the MSigDB database of the Broad Institute (collections C2, C5 or H of MSigDB)
Fig. 5
Gene Set Enrichment Analysis (GSEA) shows consistent molecular signatures of the aggressive GEC+ tumours in several breast cancer datasets. The heatmap represents the normalized enrichment score (NES) obtained from the GSEA in ten breast cancer datasets for different gene sets. Significantly enriched gene sets are shown in red; significantly depleted gene sets are shown in blue. For all the gene sets, the enrichment or depletion was considered significant with a nominal p-value < 0.05 and FDR < 0.25. Grey cells correspond to non-significant results
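The significance rule stated in the legends of Fig. 4 and Fig. 5 (nominal p-value < 0.05 and FDR < 0.25, with the sign of the NES giving the direction) can be written as a short Python sketch. The record type, the function name, the gene set names and all numeric values below are hypothetical and are not taken from the paper. The sketch only shows how one heatmap cell might be labelled as enriched (red), depleted (blue) or non-significant (grey).

```python
from typing import List, NamedTuple


class GseaResult(NamedTuple):
    """One GSEA result for one gene set in one dataset (hypothetical record type)."""
    gene_set: str
    nes: float        # normalized enrichment score
    nominal_p: float  # nominal p-value
    fdr: float        # false discovery rate


def classify(result: GseaResult, p_cutoff: float = 0.05, fdr_cutoff: float = 0.25) -> str:
    """Label a result using the cut-offs given in the figure legends."""
    significant = result.nominal_p < p_cutoff and result.fdr < fdr_cutoff
    if not significant:
        return "not significant"  # grey cell
    # A positive NES indicates enrichment (red), otherwise depletion (blue).
    return "enriched" if result.nes > 0 else "depleted"


# Made-up results, for illustration only.
results: List[GseaResult] = [
    GseaResult("HYPOTHETICAL_SET_A", nes=2.1, nominal_p=0.001, fdr=0.01),
    GseaResult("HYPOTHETICAL_SET_B", nes=-1.8, nominal_p=0.02, fdr=0.20),
    GseaResult("HYPOTHETICAL_SET_C", nes=1.2, nominal_p=0.04, fdr=0.30),
]

labels = [classify(r) for r in results]
for r, label in zip(results, labels):
    print(f"{r.gene_set}: NES={r.nes:+.1f} -> {label}")
```

The third made-up result passes the nominal p-value cut-off but fails the FDR cut-off, so it is labelled non-significant. Both conditions must hold.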
