Genome Report
Microbiol. Biotechnol. Lett. 2023; 51(3): 317-324
https://doi.org/10.48022/mbl.2304.04008
Payal Gupta, Shriya Dube, Payal Priyadarshini, Shanvi Singh, Anasuya Pravallika R, Vijay Lakshmi Srivastava, Abhishek Sengupta, and Priyanka Narad*
Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, UP, 201301, India
Correspondence to :
Priyanka Narad, pnarad@amity.edu
Endometrial receptivity is a complex mechanism of intricate pathways that lead to the shift from the proliferative to the secretory phase. Our goal was to identify high-ranking differentially expressed genes (DEGs) and study the pathways associated with the phenomenon. Raw data were retrieved from six GEO datasets and 705 DEGs were identified through robust rank aggregation after the integration of five datasets. Twenty key genes were identified and further validated in an additional dataset. Supporting evidence from experimental references confirms them as major biomarkers of the shift from the proliferative to the secretory phase.
Keywords: Endometrium, proliferative phase, transcriptomics analysis, network analysis, reproductive health, bioinformatics
The human endometrium undergoes continuous changes throughout the menstrual cycle. During the proliferative phase, hormone levels shift: the concentration of estradiol rises, causing significant changes in the endometrium before ovulation. Estrogen concentration also regulates the receptor which prepares the body for the secretory phase. The importance of identifying the most receptive period, i.e. the window of implantation (WOI), cannot be overstated. In procedures such as IVF, successful results depend upon endometrial receptivity. During the secretory phase of the menstrual cycle, specifically the mid-secretory phase, progesterone levels peak and the endometrium becomes highly receptive, which is key to the success of IVF. To attain a successful receptive phase, the chronological events leading to the WOI are of utmost importance. A well-organized endometrium with appropriate gene expression in the proliferative phase is one such event that can prompt embryo implantation [1].
Embryo implantation is a complex, multi-factorial process, and a deep understanding of the molecular mechanisms underlying the endometrial tissue is imperative. It is essential to combine the information available across different studies and perform a comparative analysis. Previous studies have analysed different samples at different time points across the menstrual cycle. A large number of experimental studies are spread across many resources, and their findings are often repetitive, mostly because of small sample sizes [2]. For this reason, a meta-analysis is useful in principle to reduce the inconclusiveness across samples and to better identify biomarkers/hub genes.
In this study, a meta-analysis and integration of results from five datasets were conducted using the robust rank aggregation (RRA) method. The analysed datasets were compared, and the genes common to all five datasets were selected for RRA analysis. To facilitate the analysis, separate lists of up-regulated (log2FC > 0) and down-regulated (log2FC < 0) genes were created from the common gene list. RRA was then performed on these gene lists, ranking them based on
To identify gene expression data for comparing the proliferative and secretory phases of the endometrium, data collection was performed in two phases. In the first phase, data were acquired from previously published papers identified through text mining. For text mining, the RISmed package in R was used on 29/08/2022. RISmed queries NCBI databases, including PubMed, and returns a list of titles and their associated abstracts for a given search keyword and time period. In our study, “endometrial receptivity proliferative” was used as the search keyword and the time period was set from 2010 to 2022. The literature acquired was further screened on the basis of title and abstract to identify relevant data. Research papers with closed access were removed. The remaining literature was searched in full text for data retrieval.
In the second phase, databases such as Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), European Genome-phenome Archive (EGA) (https://ega-archive.org/), and ArrayExpress (https://www.ebi.ac.uk/biostudies/arrayexpress) were manually searched with keywords such as “Proliferative phase”, “Endometrial receptivity” and “Proliferative endometrium”. The organism parameter was set to “Human” for the database search. The datasets documented in both phases were further screened, and duplicate datasets were removed. Only healthy samples were included in the study; samples associated with endometriosis, endometrial carcinoma, PCOS, and other conditions were removed. In addition, the selected datasets were free from induced gene expression, therapy or drug treatment, mutations, and gene knockdown.
The datasets that met the following inclusion criteria were included in the study: (1) gene expression profiling by array or high throughput sequencing; (2) the sample is endometrium of the proliferative or secretory phase; (3) proliferative and corresponding secretory samples were contained in one experiment; (4) the sample size is at least eight, with at least four patients in each group; (5) raw data were available; (6) the sample files were in “.CEL” format for microarray and “.txt” count file format for RNA-seq.
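The six criteria above can be read as a simple filter over candidate datasets. The following Python sketch is illustrative only: the `Dataset` record and its field names are assumptions introduced here, not part of the original workflow, and criteria (2) and (3) are approximated by requiring samples from both phases in the same record.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical record describing one candidate dataset."""
    accession: str
    assay: str               # "array" or "sequencing"
    n_proliferative: int
    n_secretory: int
    raw_data_available: bool
    file_format: str         # ".CEL" or ".txt"


def meets_inclusion_criteria(ds: Dataset) -> bool:
    """Apply criteria (1)-(6) from the text to one candidate dataset."""
    right_assay = ds.assay in {"array", "sequencing"}                  # (1)
    both_phases = ds.n_proliferative > 0 and ds.n_secretory > 0        # (2), (3)
    # Four per group implies a total of at least eight samples.
    enough_samples = ds.n_proliferative >= 4 and ds.n_secretory >= 4   # (4)
    expected_format = {"array": ".CEL", "sequencing": ".txt"}.get(ds.assay)
    right_format = ds.file_format == expected_format                   # (6)
    return (right_assay and both_phases and enough_samples
            and ds.raw_data_available and right_format)                # (5)


candidates = [
    Dataset("E-MEXP-3111", "array", 4, 4, True, ".CEL"),
    Dataset("HYPOTHETICAL-1", "array", 3, 5, True, ".CEL"),  # too few proliferative samples
]
selected = [ds.accession for ds in candidates if meets_inclusion_criteria(ds)]
```

In this toy run only E-MEXP-3111 passes, since the hypothetical second record has fewer than four proliferative samples.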
By searching the literature and the databases, we identified 104 papers and 41 datasets, respectively. Of the 104 papers, 94 were excluded because the datasets associated with them were not found or were not freely available. Ten datasets were retrieved from the remaining literature and were further screened for unhealthy samples, data format, genomics data, and year of publication of the dataset. The detailed exclusion and inclusion criteria are explained as a flowchart in Fig. 1. After screening, we selected 5 datasets for our analysis of key genes. In addition, GSE86491 was selected for validation of the key genes.
Raw expression data were downloaded and extracted, and signal intensities were normalized using the “gcrma” R package. For RNA-seq analysis, gene count files were downloaded and extracted from the databases. Datasets were annotated using the “hgu133plus2” and “org.Hs.eg.db” R Bioconductor packages, depending on the platform. Probes without any annotation were removed. The values of multiple probe IDs mapping to the same gene were averaged. The code was run in R version 4.0.3.
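The probe-level clean-up described above (dropping unannotated probes, then averaging probes that map to the same gene) can be sketched in a few lines. This is a Python illustration rather than the R code used in the study; the probe IDs, values, and function name are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(expression, probe_to_gene):
    """Average per-sample values of all probes that map to the same gene.

    expression:    {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene symbol}; unannotated probes are absent or None.
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # drop probes without annotation
            continue
        grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}


# Toy input: two probes for one gene, one probe for another, one unannotated.
expression = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 6.0],
    "probe_c": [1.0, 1.0],
    "probe_d": [9.0, 9.0],
}
probe_to_gene = {"probe_a": "TOP2A", "probe_b": "TOP2A", "probe_c": "TTK"}
gene_level = collapse_probes_to_genes(expression, probe_to_gene)
```

Here `probe_d` is discarded because it has no annotation, and the two TOP2A probes are averaged sample by sample.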
Differential gene expression between the proliferative phase and the secretory phase was analysed using the R packages limma and DESeq2 for the microarray and RNA-seq datasets, respectively. The significant genes were identified by applying a threshold of
The robust rank aggregation (RRA) method was employed to integrate the results of 5 multiple-platform datasets. In this study, the genes obtained from different platforms analyzed using the “limma” package and “DESeq2” package were integrated. The genes common in the analyzed 5 datasets were considered for RRA analysis. From the intersected gene list, up-regulated (log2FC > 0) and down-regulated (log2FC < 0) gene lists were created, in order to divide the DEGs for RRA analysis. RRA was performed on the up-regulated and down-regulated gene lists, once they had been ranked on the basis of
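The list preparation that precedes RRA (intersecting the genes across datasets, then splitting them by the sign of log2FC) can be sketched as follows. This Python example covers only those two steps; it does not implement RRA itself, and because the ranking criterion is not stated here, the lists are simply sorted alphabetically as a placeholder. Dataset names, gene names, and values are invented.

```python
def common_genes(results):
    """Genes present in every dataset's differential expression table."""
    gene_sets = [set(table) for table in results.values()]
    return set.intersection(*gene_sets)


def split_by_direction(table, genes):
    """Split the shared genes of one dataset into up and down lists by log2FC sign."""
    # Alphabetical order is a placeholder, not the ranking used in the study.
    up = sorted(g for g in genes if table[g] > 0)
    down = sorted(g for g in genes if table[g] < 0)
    return up, down


results = {  # {dataset: {gene: log2FC}}, toy values
    "dataset_1": {"GENE_A": 1.8, "GENE_B": -0.7, "GENE_C": 0.4},
    "dataset_2": {"GENE_A": 2.1, "GENE_B": -1.2, "GENE_D": 0.9},
}
shared = common_genes(results)
per_dataset = {name: split_by_direction(table, shared)
               for name, table in results.items()}
```

Each dataset then contributes one up-regulated and one down-regulated list over the same shared genes, which is the input shape a rank aggregation step expects.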
The Gene Ontology (GO) describes our knowledge of the biological domain with respect to three aspects: biological process, cellular component, and molecular function. Molecular function refers to the actions and interactions of gene products such as protein or RNA. Cellular components are the locations where molecular functions occur, and biological processes refer to both the small-scale and large-scale operations carried out by gene products, such as DNA repair. In this study, GO analysis was performed along with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using the “clusterProfiler” R package. GO terms and KEGG pathways were visualized with the “ggplot2” R package.
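Enrichment of a GO term or KEGG pathway in a gene list is commonly assessed by over-representation analysis, which rests on a hypergeometric test. Assuming that kind of analysis was run here, the p-value for a single term can be sketched in Python as below; the function and argument names are illustrative, and multiple-testing correction is omitted.

```python
from math import comb


def enrichment_p_value(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe:  number of background genes
    annotated: background genes annotated to the term
    selected:  genes in the list being tested (e.g. the DEGs)
    overlap:   selected genes that are annotated to the term
    """
    max_overlap = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(comb(annotated, i) * comb(universe - annotated, selected - i)
               for i in range(overlap, max_overlap + 1))
    return tail / total


# Toy numbers: 10 background genes, 4 annotated, 3 selected, 2 of them annotated.
p = enrichment_p_value(universe=10, annotated=4, selected=3, overlap=2)
```

For the toy numbers the tail is (6·6 + 4·1)/120 = 1/3, so the term would not be called enriched.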
The robust DEGs were subjected to network analysis: they were uploaded to the STRING database, and the interaction confidence score was set to 0.9 for the constructed network. The network was then exported to Cytoscape (version 3.9.1) for the identification of hub genes. Hub genes play an important role in a biological system, as they are highly connected genes, generally with connectivity greater than 10 in a genetic interaction network. The Cytohubba plug-in was used to screen the potential hub genes. In Cytohubba, the genes were scored according to the DMNC, EPC, MCC, and Degree methods. The scored genes were ranked, and the top 30 genes from each method were used for hub identification. The genes that overlapped across DMNC, EPC, MCC, Degree, and the DEGs were considered hub genes.
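The overlap rule in the last step (a gene must be in the top-ranked set of every scoring method and also be a DEG) is a plain set intersection. The Python sketch below uses invented gene names and scores; in the study the scores come from the Cytohubba plug-in and the cut-off is the top 30 genes per method.

```python
def top_n(scores, n):
    """Names of the n highest-scoring genes."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def hub_genes(method_scores, degs, n=30):
    """Genes in the top n of every scoring method and also in the DEG list."""
    hubs = set(degs)
    for scores in method_scores.values():
        hubs &= top_n(scores, n)
    return hubs


method_scores = {  # toy scores for four genes under each method
    "MCC":    {"G1": 9, "G2": 8, "G3": 1, "G4": 7},
    "DMNC":   {"G1": 0.9, "G2": 0.2, "G3": 0.8, "G4": 0.7},
    "EPC":    {"G1": 5, "G2": 4, "G3": 1, "G4": 6},
    "Degree": {"G1": 12, "G2": 11, "G3": 2, "G4": 10},
}
# n=3 keeps the toy example small; the text uses the top 30.
hubs = hub_genes(method_scores, degs={"G1", "G2", "G4"}, n=3)
```

In the toy data G2 misses the DMNC top three and G3 misses the others, so only G1 and G4 survive the intersection.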
The hub genes obtained were validated using data from an Illumina RNA-seq dataset (GSE86491) with a total of 14 samples: 7 from the proliferative phase and 7 from the secretory phase. This dataset was not used for the identification of the hub genes; it was processed solely to validate them. The raw data were extracted, and the analysis was performed using the “DESeq2” R package. In this dataset, the hub genes were observed to have a significant expression with log2FC > 1.5 and
Five datasets (GSE119209, GSE29981, E-MEXP-3111, GSE132711, GSE158958) met our inclusion criteria and were used in our study [3−7]. Transcriptomics analysis was performed on 35 proliferative and 34 secretory samples. A complete description, including the experimental technique, array type, year, country, and sample details, is provided in Table 1. An independent study, GSE86491, was further selected for validation of the identified hub genes and for elucidation of their mechanisms.
Table 1. Dataset description
Accession ID | Proliferative phase samples | Secretory phase samples | Experimental type | Platform | Year | Country |
---|---|---|---|---|---|---|
GSE119209 | 6 | 5 | Expression profiling by high throughput sequencing | Illumina HiSeq 2500 | 2018 | USA |
GSE29981 | 10 | 10 | Expression profiling by array | Affymetrix Human Genome U133 Plus 2.0 | 2011 | United Kingdom |
E-MEXP-3111 | 4 | 4 | Expression profiling by array | Affymetrix Human Genome U133 Plus 2.0 | 2011 | Canada |
GSE132711 | 10 | 10 | Expression profiling by high throughput sequencing | Illumina HiSeq 2000 | 2019 | USA |
GSE158958 | 5 | 5 | Expression profiling by high throughput sequencing | Illumina NextSeq 500 | 2021 | Italy |
GSE86491 | 7 | 7 | Expression profiling by high throughput sequencing | Illumina HiSeq 2500 | 2016 | Iceland |
With the applied threshold, 4331, 1117, 1743, 4224, and 6119 DEGs were identified in the GSE119209, GSE29981, E-MEXP-3111, GSE132711, and GSE158958 datasets, respectively. Of these, 2232, 759, 1009, 2054, and 2049 were up-regulated and 2099, 358, 734, 2170, and 4070 were down-regulated. The genes found in these datasets, with their statistical values, are provided in Supplementary Fig. S1.
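As a quick consistency check on the counts reported above, the up-regulated and down-regulated numbers should sum to the total for each dataset. A short Python tally, using only the figures given in the text:

```python
deg_counts = {  # dataset: (total, up-regulated, down-regulated), as reported above
    "GSE119209":   (4331, 2232, 2099),
    "GSE29981":    (1117, 759, 358),
    "E-MEXP-3111": (1743, 1009, 734),
    "GSE132711":   (4224, 2054, 2170),
    "GSE158958":   (6119, 2049, 4070),
}

# Collect any dataset whose up + down does not match its reported total.
inconsistent = [name for name, (total, up, down) in deg_counts.items()
                if up + down != total]
```

The list `inconsistent` comes back empty, so the reported per-dataset totals agree with their up/down breakdowns.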
Five datasets, namely GSE119209, GSE29981, E-MEXP-3111, GSE132711, and GSE158958, were combined for the RRA analysis. 705 DEGs were identified at an adjusted
The resultant robust genes were subjected to a gene ontology study using the clusterProfiler package in R. For molecular function, the top five enriched terms were anion transmembrane transporter activity, organic anion transmembrane transporter activity, carboxylic acid transmembrane transporter activity, organic acid transmembrane transporter activity, and monocarboxylic acid transmembrane transporter activity. For biological process, the top five enriched terms were nuclear division, chromosome segregation, mitotic nuclear division, sister chromatid segregation, and mitotic sister chromatid segregation. For cellular component, the top five enriched terms were spindle; chromosomal region; chromosome, centromeric region; spindle pole; and condensed chromosome, centromeric region. We also performed a KEGG pathway analysis. Human T-cell leukemia virus 1 infection, Cell cycle, Oocyte meiosis, Complement and coagulation cascades, and Collecting duct acid secretion were the top five enriched KEGG pathways. To summarize, the ranked DEGs were found in carboxylic acid transmembrane transporter activity, nuclear division, spindle, and Human T-cell leukemia virus 1 infection processes and pathways. The molecular function, biological process, cellular component, and KEGG pathway dot plots of these DEGs are shown in Fig. 2. The gene ontology results are provided in Supplementary file 1.
A protein-protein interaction network was constructed for the robust genes using the STRING database and visualized in Cytoscape. The constructed network comprised 701 nodes and 1089 edges. The disconnected nodes were removed from the network and the remaining 116 genes were visualized. The network of these connected genes is shown in Fig. 3. To analyse this network further and identify the crucial genes, the Cytohubba plug-in was used to identify the hub genes in the generated network. The genes in the network were ranked using the MCC, DMNC, EPC, and Degree methods.
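Removing disconnected nodes amounts to keeping only genes that appear in at least one edge. A minimal Python sketch of that step, with invented node names and a hypothetical function name (the actual filtering was done in Cytoscape):

```python
def connected_nodes(nodes, edges):
    """Return {node: degree} for nodes that take part in at least one edge."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: d for node, d in degree.items() if d > 0}


# Toy network: D has no interaction partner and is dropped.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C")]
kept = connected_nodes(nodes, edges)
```

The degree values returned here are also what a Degree-based ranking would sort on.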
Each method provided a list of the top 30 genes. The overlap of the Cytohubba analyses finally identified 20 hub genes, namely CENPF, CEP55, DLGAP5, KIF20A, KIF23, KIF4A, MELK, NCAPG, NDC80, NEK2, NUF2, NUSAP1, PBK, PRC1, PTTG1, SPAG5, TOP2A, TPX2, TTK, and UBE2C. The intersection is represented by a Venn diagram in Fig. 2 of Supplementary file 2. These hub genes were further analysed using the g:Profiler tool, which showed that they are involved in major pathways such as cell cycle, signaling, DNA damage, kinesins, and the mitotic pathway. A detailed ontology for these genes is provided in the Supplementary Table.
Dataset GSE86491 was used for the validation of the hub genes [8]. Data were pre-processed and analysed using the “DESeq2” R package. A total of 3604 genes were identified, of which 1590 were up-regulated and 2014 were down-regulated, with a threshold of |log2 fold change (FC)| > 1 and
It is vital to understand the underlying mechanism of proliferative phase genes to determine the shift from the proliferative phase to the secretory phase. In our study, we performed a meta-analysis to identify the key genes that are crucial for the regulation of the healthy proliferative phase of the endometrium. Integrated analysis revealed 116 highly connected genes, of which 20 (CENPF, CEP55, DLGAP5, KIF20A, KIF23, KIF4A, MELK, NCAPG, NDC80, NEK2, NUF2, NUSAP1, PBK, PRC1, PTTG1, SPAG5, TOP2A, TPX2, TTK, and UBE2C) were found to be hub genes. These hub genes were also evaluated in a validation dataset by identifying the key genes that intersect in both studies. The potential impact of the identified hub genes can be understood through their roles in cell division and chromosomal stability.
CENPF enables microtubule binding. Abnormalities in CENPF can lead to chromosomal instability, which may increase the risk of reproductive complications such as infertility or stillbirth. CEP55 plays a major role in cytokinesis. Dysregulation of CEP55 can disrupt proper cell division, potentially leading to reproductive problems including failed cytokinesis, and it also promotes tumorigenesis in endometrial cancer by regulating Foxo1 signaling. DLGAP5 is essential for cell cycle regulation. Abnormalities in DLGAP5 may result in aberrant cell division, which can lead to endometrial cancer. Dysregulation of KIF20A can disrupt cytokinesis, potentially leading to failed cell division; it has been linked to endometrial adenocarcinoma and endometriosis. KIF23 is essential for cytokinesis in Rho-mediated signaling. Abnormalities in KIF23 can result in faulty cytokinesis and endometriosis, and the gene has also been reported in endometrial-embryo interactions. KIF4A enables microtubule motor activity. Dysfunctions in KIF4A can lead to defects in chromosome segregation during cell division, which may result in chromosomal abnormalities and have implications for reproductive health, including endometrial cancer. MELK is involved in various processes such as cell cycle regulation, self-renewal of stem cells, apoptosis, and splicing regulation. Dysregulation of MELK can lead to progression of endometrial carcinoma. NCAPG is involved in mitotic chromosome condensation. Abnormalities in NCAPG can lead to disrupted endometrial receptivity. NDC80 enables chromosome segregation and spindle checkpoint activity. Dysregulation of NDC80 can disrupt proper chromosome attachment and segregation, and the gene has also been reported in polycystic ovary syndrome. NEK2 helps in centrosome separation. Dysfunctions in NEK2 can lead to errors in chromosome segregation during cell division, potentially resulting in chromosomal abnormalities and increasing the risk of drug-resistant ovarian cancer. NUF2 may be responsible for Lynch syndrome developing into endometrial cancer. NUSAP1 is involved in cell cycle regulation and has been reported in ovarian cancer. Down-regulation of PBK is responsible for thin endometrium, resulting in a disrupted proliferative phase. PRC1 plays a major role in cytokinesis and is responsible for maternal epigenetic modification, which regulates embryo implantation. PTTG1 is reported to be active in uterine corpus endometrial carcinoma. SPAG5 has been reported in proliferation and invasion in cervical cancer. Abnormal expression of TOP2A affects decidualization and changes the “window of implantation”, leading to recurrent implantation failure. TPX2 increases the risk of developing cancer and is strongly associated with unfavourable outcomes in endometrial cases. TTK is reported to be a prognostic biomarker for endometrial cancer. UBE2C is reported to be significantly highly expressed in endometrial carcinoma.
The supporting references are provided in the Supplementary Table. Any mutation or inactivation of these genes can lead to major consequences and eventually disrupt the endometrium. These genes can be the markers or checkpoints of a healthy endometrium in the proliferative phase.
Understanding key genes in the endometrium can support early diagnosis of infertility-related diseases and the establishment of successful pregnancy. These genes can play a major role in endometrial receptivity, infertility diagnosis and treatment, and endometrial disorders, and they may also inform future therapeutic interventions in personalized medicine. Our results identified genes that are crucial for maintaining a healthy follicular phase of the endometrium. To summarise, the proliferative phase, which constitutes half of a woman’s cycle, is essential because the lining of the uterus forms a new endometrial layer through proliferation of cells/tissues for the fertilized egg to attach to [9]. Our analysis revealed that the identified hub genes are essential for the cell proliferation phase. Any mutation or inactivation of these genes can lead to major consequences and eventually disrupt the endometrium. These genes can be the markers or checkpoints of a healthy endometrium in the proliferative phase and can be targeted in case of dysregulation.
Conceptualization: Payal Gupta, Shriya Dube and Priyanka Narad
Data Curation: Anasuya Pravallika R and Vijay Lakshmi Srivastava.
Writing – Original Draft Preparation: Payal Gupta, Payal Priyadarshini and Shanvi Singh. Revision and critical appraisal: Priyanka Narad and Abhishek Sengupta.
The authors have no financial conflicts of interest to declare.