Microbiology and Biotechnology Letters

Research Article (Full Paper)

Genome Report

Microbiol. Biotechnol. Lett. 2023; 51(3): 317-324

Received: April 26, 2023; Revised: June 20, 2023; Accepted: June 28, 2023

Deciphering Key Genes of Proliferative and Secretory Phase Using Integrated Transcriptomics and Network Analysis

Payal Gupta, Shriya Dube, Payal Priyadarshini, Shanvi Singh, Anasuya Pravallika R, Vijay Lakshmi Srivastava, Abhishek Sengupta, and Priyanka Narad*

Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, UP, 201301, India

Correspondence to: Priyanka Narad

Endometrial receptivity is a complex process in which intricate pathways drive the shift from the proliferative to the secretory phase. Our goal was to identify high-ranking differentially expressed genes (DEGs) and to study the pathways associated with this shift. Raw data were retrieved from six GEO datasets, and 705 DEGs were identified by robust rank aggregation after integrating five of them. Twenty key genes were identified and then re-validated in an additional dataset. Supporting evidence from experimental references confirms them as major biomarkers of the shift from the proliferative to the secretory phase.

Keywords: Endometrium, proliferative phase, transcriptomics analysis, network analysis, reproductive health, bioinformatics

The human endometrium changes continuously throughout the menstrual cycle. During the proliferative phase, hormone levels shift: the rising concentration of estradiol causes significant changes in the endometrium before ovulation. Estrogen concentration also regulates the receptor that prepares the body for the secretory phase. Identifying the most receptive period, i.e. the window of implantation (WOI), is therefore critical. In procedures such as IVF, success depends on endometrial receptivity. During the secretory phase of the menstrual cycle, specifically the mid-secretory phase, progesterone levels peak and the endometrium becomes highly receptive, which is key to the success of IVF. To attain a successful receptive phase, the chronological events leading to the WOI are of utmost importance. A well-organized endometrium with well-regulated gene expression in the proliferative phase is one such event that can promote embryo implantation [1].

Embryo implantation is a complex, multifactorial process, and a deep understanding of the molecular mechanisms underlying the endometrial tissue is imperative. It is essential to combine the information available across different studies and to perform a comparative analysis. Previous studies have analysed different samples at different time points across the menstrual cycle. A large number of experimental studies are spread across many resources, and they often yield repetitive findings, mostly because of small sample sizes [2]. Meta-analysis is therefore useful, in principle, for reducing inconclusiveness across samples and for identifying biomarkers and hub genes more reliably.

In this study, we conducted a meta-analysis and integrated the results from five datasets using the robust rank aggregation (RRA) method. The analysed datasets were compared, and the genes common to all five datasets were selected for RRA analysis. To facilitate the analysis, separate lists of up-regulated (log2FC > 0) and down-regulated (log2FC < 0) genes were created from the common gene list. RRA was then performed on these gene lists, which were ranked by p-value and |log2FC|. We identified 20 hub genes and validated them in an independent dataset representing the same biological states, to confirm that they are significantly differentially expressed between the proliferative and secretory phases. It should be emphasized that over-expression of these genes can cause uncontrolled cell proliferation, ultimately leading to endometrial cancer, whereas under-expression can cause a thin endometrium. Both considerably affect the receptivity of the endometrium. Examining these genes is therefore vital for a successful proliferative phase.

Data collection

To identify gene expression data for comparing the proliferative and secretory phases of the endometrium, data collection was performed in two phases. In the first phase, data were acquired from previously published papers identified through text mining. For text mining, the RISmed package in R was used on 29 August 2022. RISmed queries NCBI databases, including PubMed, and returns a list of titles and their associated abstracts for a given search keyword and time period. In our study, “endometrial receptivity proliferative” was used as the search keyword and the time period was set from 2010 to 2022. The literature retrieved was screened on the basis of title and abstract to identify relevant data. Research papers with closed access were removed. The full text of the remaining literature was searched for data retrieval.
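
The keyword and date-range search described above can be illustrated without R. The following is a minimal sketch, assuming the public NCBI E-utilities `esearch` endpoint is queried directly; it only builds the request URL and does not contact the server. The function name `build_pubmed_search_url` and the `retmax` cap are hypothetical and are not taken from the study, which used RISmed.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(term, first_year, last_year, max_records=200):
    """Return an esearch URL restricted to a publication-date range."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",          # filter on publication date
        "mindate": str(first_year),
        "maxdate": str(last_year),
        "retmax": str(max_records),  # hypothetical cap, not from the study
    }
    return ESEARCH_URL + "?" + urlencode(params)

# Keyword and time period as stated in the text.
url = build_pubmed_search_url("endometrial receptivity proliferative", 2010, 2022)
query = parse_qs(urlparse(url).query)
print(url)
```

Titles and abstracts returned by such a search would still need the manual screening described in the paragraph above.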

In the second phase, databases such as Gene Expression Omnibus (GEO), European Genome-phenome Archive (EGA), and ArrayExpress were manually searched with keywords such as “Proliferative phase”, “Endometrial receptivity”, and “Proliferative endometrium”. The organism parameter was set to “Human” for the database search. The datasets documented in both phases were screened further, and duplicate datasets were removed. Only healthy samples were included in the study; samples associated with endometriosis, endometrial carcinoma, PCOS, and other conditions were removed. In addition, the selected datasets were free from induced gene expression, therapy or drug treatment, mutations, and gene knockdown.

Datasets that met the following inclusion criteria were included in the study: (1) gene expression profiling by array or high-throughput sequencing; (2) the samples were endometrium in the proliferative or secretory phase; (3) proliferative and corresponding secretory samples were contained in one experiment; (4) the sample size was at least eight, with at least four patients in each group; (5) raw data were available; and (6) the sample files were in “.CEL” format for microarray and “.txt” count file format for RNA-seq.
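
The inclusion criteria above can be expressed as a simple filter. This is a minimal sketch, assuming each candidate dataset has been summarised by hand into a small record; the class `CandidateDataset`, its field names, and the three example records are hypothetical and do not correspond to real accessions.

```python
from dataclasses import dataclass

@dataclass
class CandidateDataset:
    accession: str
    assay: str               # "array" or "rna-seq"
    n_proliferative: int
    n_secretory: int
    raw_data_available: bool
    file_format: str         # e.g. ".CEL" or ".txt"

# Criterion 6: expected raw file format for each assay type.
EXPECTED_FORMAT = {"array": ".CEL", "rna-seq": ".txt"}

def meets_inclusion_criteria(ds, min_per_group=4):
    """Apply criteria 1-6; four per group implies a total of at least eight."""
    if ds.assay not in EXPECTED_FORMAT:                          # criterion 1
        return False
    if min(ds.n_proliferative, ds.n_secretory) < min_per_group:  # criteria 2-4
        return False
    if not ds.raw_data_available:                                # criterion 5
        return False
    return ds.file_format == EXPECTED_FORMAT[ds.assay]           # criterion 6

# Hypothetical candidates for illustration only.
candidates = [
    CandidateDataset("EXAMPLE-1", "array", 4, 4, True, ".CEL"),
    CandidateDataset("EXAMPLE-2", "rna-seq", 3, 6, True, ".txt"),   # group too small
    CandidateDataset("EXAMPLE-3", "rna-seq", 5, 5, False, ".txt"),  # no raw data
]
selected = [ds.accession for ds in candidates if meets_inclusion_criteria(ds)]
print(selected)
```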

By searching the literature and the databases, we identified 104 papers and 41 datasets, respectively. Of the 104 papers, 94 were excluded because the associated datasets could not be found or were not freely available. Ten datasets were retrieved from the remaining literature and were further screened for unhealthy samples, data format, genomics data, and year of publication. The detailed exclusion and inclusion criteria are shown as a flowchart in Fig. 1. After screening, we selected five datasets for our analysis of key genes. In addition, GSE86491 was selected for validation of the key genes.

Figure 1. Data collection workflow: The workflow of data acquisition and the exclusion criteria.

Data pre-processing and normalization

Raw expression data were downloaded and extracted. For microarray data, signal intensities were then normalized using the “gcrma” R package. For RNA-seq analysis, gene count files were downloaded and extracted from the databases. Datasets were annotated using the “hgu133plus2” and “” R Bioconductor packages, depending on the platform. Probes without any annotation were removed. The values of multiple probe IDs mapping to the same gene were averaged. The code was run in R version 4.0.3.
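
The probe-level clean-up described above (dropping unannotated probes and averaging probes that map to the same gene) can be sketched as follows. This is an illustration in plain Python rather than the R workflow used in the study; the function name, probe IDs, gene names, and expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average per-sample values of all probes annotated to the same gene."""
    rows_by_gene = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe without annotation is removed
        rows_by_gene[gene].append(values)
    # zip(*rows) walks the samples column by column.
    return {gene: [mean(column) for column in zip(*rows)]
            for gene, rows in rows_by_gene.items()}

# Hypothetical normalized intensities for two samples.
probe_values = {
    "probe_1": [8.0, 6.0],
    "probe_2": [10.0, 8.0],
    "probe_3": [5.0, 5.5],
    "probe_4": [7.0, 7.0],   # has no annotation below
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_values = collapse_probes_to_genes(probe_values, probe_to_gene)
print(gene_values)
```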

Differentially expressed genes (DEGs) analysis

Gene expression between the proliferative and secretory phases was compared using the R packages limma and DESeq2 for the microarray and RNA-seq datasets, respectively. Significant genes were identified by applying thresholds of p-value < 0.05 and |log2 fold change (FC)| > 1. The differentially expressed genes (DEGs) were ranked according to p-value and |log2FC|.
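
The thresholding and ranking step can be sketched as below. The cut-offs are those stated above; the exact tie-breaking rule is not given in the text, so sorting by ascending p-value and then by descending |log2FC| is an assumption. The function name and the toy statistics are hypothetical.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes passing both thresholds, then rank them."""
    kept = [r for r in results
            if r["p_value"] < p_cutoff and abs(r["log2fc"]) > lfc_cutoff]
    # Assumed ordering: smallest p-value first, larger |log2FC| breaks ties.
    return sorted(kept, key=lambda r: (r["p_value"], -abs(r["log2fc"])))

# Hypothetical per-gene statistics, e.g. as exported from limma or DESeq2.
results = [
    {"gene": "GENE_A", "log2fc": 2.4, "p_value": 0.0004},
    {"gene": "GENE_B", "log2fc": -1.8, "p_value": 0.0100},
    {"gene": "GENE_C", "log2fc": 0.6, "p_value": 0.0010},   # fold change too small
    {"gene": "GENE_D", "log2fc": 3.1, "p_value": 0.2000},   # not significant
    {"gene": "GENE_E", "log2fc": -2.2, "p_value": 0.0004},
]

degs = select_degs(results)
up_regulated = [r["gene"] for r in degs if r["log2fc"] > 0]
down_regulated = [r["gene"] for r in degs if r["log2fc"] < 0]
print(up_regulated, down_regulated)
```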

Robust rank aggregation analysis

The robust rank aggregation (RRA) method was employed to integrate the results of the five datasets, which came from multiple platforms. The genes obtained from the different platforms, analysed using the “limma” and “DESeq2” packages, were integrated. Only genes common to all five analysed datasets were considered for RRA analysis. From this intersected gene list, up-regulated (log2FC > 0) and down-regulated (log2FC < 0) gene lists were created in order to divide the DEGs for RRA analysis. The up-regulated and down-regulated gene lists were ranked on the basis of p-value and |log2FC|, and RRA was then performed on them. The DEGs were scored according to the ranked lists and analysed in aggregate using the “Robust Rank Aggregation” R package. The final adjusted p-value in this method reflects the probability that a gene would be ranked as consistently high across the datasets by chance alone. Genes with an adjusted p-value < 0.01 in the RRA analysis were considered strong (robust) DEGs.
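
The idea behind the RRA score can be sketched in a few lines. The following is a simplified illustration, not the R package used in the study: each gene's ranks are normalised to (0, 1], the k-th smallest normalised rank is compared with its expected distribution under random ordering (a beta distribution, evaluated here through the equivalent binomial sum), and the smallest of these p-values is multiplied by the number of lists as a Bonferroni-style correction. The function names and the toy gene lists are hypothetical.

```python
from math import comb

def order_statistic_pvalue(r, k, n):
    """P(k-th smallest of n independent uniform(0,1) ranks <= r)."""
    return sum(comb(n, j) * r**j * (1 - r)**(n - j) for j in range(k, n + 1))

def rho_score(normalized_ranks):
    """Smallest order-statistic p-value, corrected by the number of lists."""
    n = len(normalized_ranks)
    ordered = sorted(normalized_ranks)
    best = min(order_statistic_pvalue(r, k, n)
               for k, r in enumerate(ordered, start=1))
    return min(1.0, best * n)

def aggregate_ranks(ranked_lists):
    """Score the genes present in every list; lower score = more robust."""
    common = set.intersection(*(set(lst) for lst in ranked_lists))
    scores = {}
    for gene in common:
        ranks = [(lst.index(gene) + 1) / len(lst) for lst in ranked_lists]
        scores[gene] = rho_score(ranks)
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))

# Three hypothetical ranked DEG lists (best gene first).
ranked_lists = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
    ["GENE_A", "GENE_C", "GENE_B", "GENE_E", "GENE_D"],
    ["GENE_B", "GENE_A", "GENE_C", "GENE_D", "GENE_E"],
]
scores = aggregate_ranks(ranked_lists)
for gene, score in scores.items():
    print(gene, round(score, 3))
```

With so few genes and lists no score comes close to the 0.01 cut-off; the sketch only shows how consistently top-ranked genes receive the lowest scores.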

Gene set enrichment analysis

The Gene Ontology (GO) describes our knowledge of the biological domain with respect to three aspects: biological process, cellular component, and molecular function. Molecular function refers to the actions and interactions of gene products such as proteins or RNA. Cellular components are the locations where molecular functions are carried out, and biological processes are the small-scale and large-scale operations accomplished by gene products, such as DNA repair. In this study, GO analysis was performed together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using the “clusterProfiler” R package. GO terms and KEGG pathways were visualized with the “ggplot2” R package.
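
Over-representation analysis of the kind described above rests on a hypergeometric test: given a gene universe, a term's gene set, and a list of selected genes, it asks how likely an overlap at least as large as the observed one would be by chance. The sketch below shows that calculation only; the function name and the toy numbers are hypothetical, and multiple-testing adjustment is omitted.

```python
from math import comb

def enrichment_pvalue(n_universe, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution."""
    upper = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Toy numbers: 20 genes in the universe, 5 annotated to the term,
# 4 selected genes, 3 of which carry the annotation.
p = enrichment_pvalue(n_universe=20, n_term=5, n_selected=4, n_overlap=3)
print(round(p, 4))
```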

Gene interaction networks

The robust DEGs were subjected to network analysis. They were uploaded to the STRING database, and the interaction confidence score was set to 0.9 for the constructed network. The network was then exported to Cytoscape (version 3.9.1) for the identification of hub genes. Hub genes play an important role in a biological system because they are highly connected, generally with a connectivity greater than 10 in a genetic interaction network. The cytoHubba plug-in was used to screen potential hub genes. In cytoHubba, the genes were scored with the DMNC, EPC, MCC, and Degree methods. The scored genes were ranked, and the top 30 genes from each method were used for hub identification. Genes that overlapped across DMNC, EPC, MCC, Degree, and the DEGs were considered hub genes.
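
The hub selection described above amounts to intersecting the top-ranked genes from several scoring methods. The sketch below computes only the Degree ranking from an edge list; the MCC, EPC, and DMNC rankings are supplied by hand as stand-ins for cytoHubba output, since those algorithms are not reimplemented here. All node names, edges, and rankings are hypothetical, and a cut-off of 3 replaces the study's top 30 to suit the toy network.

```python
from collections import Counter

def degree_ranking(edges):
    """Rank nodes by number of interaction partners (highest first)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [node for node, _ in
            sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))]

def find_hub_genes(rankings, degs, top_n=30):
    """Genes in the top_n of every ranking that are also robust DEGs."""
    top_sets = [set(ranking[:top_n]) for ranking in rankings.values()]
    return sorted(set.intersection(set(degs), *top_sets))

# Hypothetical interaction network.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "E")]

rankings = {
    "Degree": degree_ranking(edges),
    "MCC": ["B", "A", "D", "C", "E"],    # stand-in for cytoHubba output
    "EPC": ["A", "B", "C", "D", "E"],    # stand-in for cytoHubba output
    "DMNC": ["B", "C", "A", "D", "E"],   # stand-in for cytoHubba output
}
hubs = find_hub_genes(rankings, degs=["A", "B", "C", "D", "E"], top_n=3)
print(hubs)
```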

Hub genes validation

The hub genes were validated using an Illumina RNA-seq dataset (GSE86491) with a total of 14 samples: 7 from the proliferative phase and 7 from the secretory phase. This dataset was not used to identify the hub genes; it was processed solely to validate them. The raw data were extracted, and the analysis was performed using the “DESeq2” R package. In this dataset, the hub genes showed significant differential expression, with log2FC > 1.5 and p-values < 1.56E-06. The objective was to determine independently whether the key genes identified in our integrated analysis were also differentially expressed in a separate validation set.
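
The validation check can be sketched as a lookup of each hub gene in the independent dataset's results. The cut-offs below are the values reported above, used here as defaults; the function name, gene labels, and statistics are hypothetical toy values, not results from GSE86491.

```python
def validate_hub_genes(hub_genes, validation_results,
                       lfc_cutoff=1.5, p_cutoff=1.56e-06):
    """Split hub genes into those confirmed in the validation set and the rest."""
    confirmed, unconfirmed = [], []
    for gene in hub_genes:
        stats = validation_results.get(gene)
        if (stats is not None
                and stats["log2fc"] > lfc_cutoff
                and stats["p_value"] < p_cutoff):
            confirmed.append(gene)
        else:
            unconfirmed.append(gene)
    return confirmed, unconfirmed

# Hypothetical differential-expression statistics from a validation dataset.
validation_results = {
    "GENE_A": {"log2fc": 2.9, "p_value": 3.0e-12},
    "GENE_B": {"log2fc": 1.7, "p_value": 8.0e-08},
    "GENE_C": {"log2fc": 0.9, "p_value": 1.0e-09},   # fold change too small
}
confirmed, unconfirmed = validate_hub_genes(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"], validation_results)
print(confirmed, unconfirmed)
```

A gene missing from the validation results, such as `GENE_D` here, is reported as unconfirmed rather than silently dropped.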

Data description

Five datasets (GSE119209, GSE29981, E-MEXP-3111, GSE132711, GSE158958) met our inclusion criteria and were used in our study [3-7]. Transcriptomics analysis was performed on 35 proliferative and 34 secretory samples. A complete description, including the experimental technique, platform, year, country, and sample details, is provided in Table 1. An independent study, GSE86491, was further selected to validate the identified hub genes and to help elucidate their mechanisms.

Table 1. Dataset description

| Accession ID | Proliferative phase samples | Secretory phase samples | Experimental type | Platform | Year | Country |
| --- | --- | --- | --- | --- | --- | --- |
| GSE119209 | 6 | 5 | Expression profiling by high throughput sequencing | Illumina HiSeq 2500 | 2018 | USA |
| GSE29981 | 10 | 10 | Expression profiling by array | Affymetrix Human Genome U133 Plus 2.0 | 2011 | United Kingdom |
| E-MEXP-3111 | 4 | 4 | Expression profiling by array | Affymetrix Human Genome U133 Plus 2.0 | 2011 | Canada |
| GSE132711 | 10 | 10 | Expression profiling by high throughput sequencing | Illumina HiSeq 2000 | 2019 | USA |
| GSE158958 | 5 | 5 | Expression profiling by high throughput sequencing | Illumina NextSeq 500 | 2021 | Italy |
| GSE86491 | 7 | 7 | Expression profiling by high throughput sequencing | Illumina HiSeq 2500 | 2016 | Iceland |

Differentially expressed genes (DEGs) analysis

With the applied thresholds, 4331, 1117, 1743, 4224, and 6119 DEGs were identified in the GSE119209, GSE29981, E-MEXP-3111, GSE132711, and GSE158958 datasets, respectively. Of these, 2232, 759, 1009, 2054, and 2049 were up-regulated and 2099, 358, 734, 2170, and 4070 were down-regulated. The genes found in these datasets, with their statistical values, are provided in Supplementary Fig. S1.
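
The counts reported above can be checked for internal consistency with a short tally. The numbers are taken directly from the paragraph; the variable names are illustrative only.

```python
# accession: (total DEGs, up-regulated, down-regulated), as reported above
deg_counts = {
    "GSE119209": (4331, 2232, 2099),
    "GSE29981": (1117, 759, 358),
    "E-MEXP-3111": (1743, 1009, 734),
    "GSE132711": (4224, 2054, 2170),
    "GSE158958": (6119, 2049, 4070),
}

# Up-regulated plus down-regulated genes should equal the total per dataset.
consistent = {accession: up + down == total
              for accession, (total, up, down) in deg_counts.items()}

for accession, (total, up, down) in deg_counts.items():
    print(f"{accession}: {up} up + {down} down = {up + down} "
          f"(reported total {total})")
```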

Integrative analysis

The five datasets (GSE119209, GSE29981, E-MEXP-3111, GSE132711, and GSE158958) were combined for the RRA analysis. A total of 705 DEGs were identified at an adjusted p-value cut-off of < 0.01. The top 100 significant genes common to the five datasets were observed to follow a similar trend of expression (Fig. 1, Supplementary file S2). The result of the RRA analysis is also shown in Supplementary file 1.

Gene set enrichment analysis

The resulting robust genes were subjected to a gene ontology study using the clusterProfiler package in R. The top five enriched terms for molecular function were anion transmembrane transporter activity, organic anion transmembrane transporter activity, carboxylic acid transmembrane transporter activity, organic acid transmembrane transporter activity, and monocarboxylic acid transmembrane transporter activity, whereas the top five enriched terms for biological process were nuclear division, chromosome segregation, mitotic nuclear division, sister chromatid segregation, and mitotic sister chromatid segregation. The top five enriched terms for cellular component were spindle; chromosomal region; chromosome, centromeric region; spindle pole; and condensed chromosome, centromeric region. We also performed a KEGG pathway analysis. Human T-cell leukemia virus 1 infection, Cell cycle, Oocyte meiosis, Complement and coagulation cascades, and Collecting duct acid secretion were the top five enriched KEGG pathways. To summarize, the ranked DEGs were enriched in carboxylic acid transmembrane transporter activity, nuclear division, spindle, and Human T-cell leukemia virus 1 infection processes and pathways. The molecular function, biological process, cellular component, and KEGG pathway dot plots of these DEGs are shown in Fig. 2. The gene ontology results are provided in Supplementary file 1.

Figure 2. Gene ontology analysis results: (A) Enriched molecular functions observed for the DEGs, (B) enriched biological processes in which the DEGs were involved, (C) enriched cellular components representing the localization of the DEGs, (D) enriched KEGG pathways in which the DEGs were involved.

PPI Network analysis and hub genes

A protein-protein interaction (PPI) network was constructed for the robust genes using the STRING database and visualized in Cytoscape. The network comprised 701 nodes and 1089 edges. Disconnected nodes were removed, and the remaining 116 genes were visualized. The network of these connected genes is shown in Fig. 3. To analyse this network further and identify the crucial genes, the cytoHubba plug-in was used to identify hub genes in the generated network. The genes in the network were ranked using the MCC, DMNC, EPC, and Degree methods.

Figure 3. Network depicting gene-gene interactions: Network constructed after removing disconnected nodes, shown in a circular layout using Cytoscape. Strong, moderate, and weak interactions are depicted by purple, green, and yellow edge colours, respectively, on the basis of interaction score.

Each method provided a list of its top 30 genes. The overlap of the cytoHubba results finally identified 20 hub genes: CENPF, CEP55, DLGAP5, KIF20A, KIF23, KIF4A, MELK, NCAPG, NDC80, NEK2, NUF2, NUSAP1, PBK, PRC1, PTTG1, SPAG5, TOP2A, TPX2, TTK, and UBE2C. The intersection is represented by a Venn diagram in Fig. 2 of Supplementary file 2. These hub genes were further analysed using the g:Profiler tool, which showed that they are involved in major pathways such as cell cycle, signaling, DNA damage, kinesins, and the mitotic pathway. A detailed ontology for these genes is provided in the Supplementary Table.

Hub genes validation

Dataset GSE86491 was used for the validation of the hub genes [8]. Data were pre-processed and analysed using the “DESeq2” R package. A total of 3604 DEGs were identified, of which 1590 were up-regulated and 2014 were down-regulated, with thresholds of |log2 fold change (FC)| > 1 and p-value < 0.05. The hub genes identified from the integrative and network analysis showed significant differential expression, with log2FC > 1.5 and p-values < 1.56E-06, in this validation dataset. This is consistent with our results and increases confidence in the key genes identified. The validation shows that these genes are prominently present with significant expression values, strengthening their candidacy as markers of healthy endometrium in the proliferative phase. The hub genes with their associated p-values and log2FC values are shown in Fig. 3 of Supplementary file S2.

It is vital to understand the underlying mechanism of proliferative phase genes in order to characterize the shift from the proliferative phase to the secretory phase. In our study, we performed a meta-analysis to identify the key genes that are crucial for regulating a healthy proliferative phase of the endometrium. Integrated analysis revealed 116 highly connected genes, of which 20 (CENPF, CEP55, DLGAP5, KIF20A, KIF23, KIF4A, MELK, NCAPG, NDC80, NEK2, NUF2, NUSAP1, PBK, PRC1, PTTG1, SPAG5, TOP2A, TPX2, TTK, and UBE2C) were found to be hub genes. These hub genes were also evaluated in a validation dataset by identifying the key genes that intersected between both studies. The potential impact of the identified hub genes can be understood through their roles in cell division and chromosomal stability.

CENPF enables microtubule binding. Abnormalities in CENPF can lead to chromosomal instability, which may increase the risk of reproductive complications such as infertility or stillbirth. CEP55 plays a major role in cytokinesis. Dysregulation of CEP55 can disrupt proper cell division, potentially leading to reproductive problems including failed cytokinesis, and it also promotes tumorigenesis in endometrial cancer by regulating Foxo1 signaling. DLGAP5 is essential for cell cycle regulation. Abnormalities in DLGAP5 may result in aberrant cell division, which can lead to endometrial cancer. Dysregulation of KIF20A can disrupt cytokinesis, potentially leading to failed cell division, and it has been linked to endometrial adenocarcinoma and endometriosis. KIF23 is essential for cytokinesis in Rho-mediated signaling. Abnormalities in KIF23 can result in faulty cytokinesis and endometriosis, and the gene has also been reported in endometrial-embryo interactions. KIF4A enables microtubule motor activity. Dysfunctions in KIF4A can lead to defects in chromosome segregation during cell division, which may result in chromosomal abnormalities and have implications for reproductive health, including endometrial cancer.

MELK is involved in various processes such as cell cycle regulation, self-renewal of stem cells, apoptosis, and splicing regulation. Dysregulation of MELK can lead to progression of endometrial carcinoma. NCAPG is involved in mitotic chromosome condensation. Abnormalities in NCAPG can lead to disrupted endometrial receptivity. NDC80 enables chromosome segregation and spindle checkpoint activity. Dysregulation of NDC80 can disrupt proper chromosome attachment and segregation, and it has also been reported in polycystic ovary syndrome. NEK2 helps in centrosome separation. Dysfunctions in NEK2 can lead to errors in chromosome segregation during cell division, potentially resulting in chromosomal abnormalities and increasing the risk of drug-resistant ovarian cancer. NUF2 may be responsible for Lynch syndrome developing into endometrial cancer. NUSAP1 is involved in cell cycle regulation and has been reported in ovarian cancer. Down-regulation of PBK is responsible for thin endometrium, resulting in a disrupted proliferative phase. PRC1 plays a major role in cytokinesis. It is responsible for maternal epigenetic modification, which regulates embryo implantation. PTTG1 is reported to be active in uterine corpus endometrial carcinoma. SPAG5 is reported in proliferation and invasion in cervical cancer. Abnormal expression of TOP2A affects decidualization and changes the “window of implantation”, leading to recurrent implantation failure. TPX2 increases the risk of developing cancer and is strongly associated with unfavourable outcomes in endometrial cases. TTK is reported to be a prognostic biomarker for endometrial cancer. UBE2C is reported to be significantly highly expressed in endometrial carcinoma.

The supporting references are provided in the Supplementary Table. Any mutation or inactivation of these genes can lead to major consequences and eventually disrupt the endometrium. These genes can serve as markers or checkpoints of a healthy endometrium in the proliferative phase.

Understanding the key genes in the endometrium can support early diagnosis of infertility-related diseases and the establishment of a successful pregnancy. These genes can play a major role in endometrial receptivity, infertility diagnosis and treatment, and endometrial disorders, and they may also enable future therapeutic interventions in personalized medicine. The genes identified in our results are crucial for maintaining a healthy follicular phase of the endometrium. To summarise, the proliferative phase, which constitutes half of a woman’s cycle, is essential because the lining of the uterus forms a new endometrial layer through the proliferation of cells and tissues, to which a fertilized egg can attach [9]. Our analysis revealed that the identified hub genes are essential for the cell proliferation phase. Any mutation or inactivation of these genes can lead to major consequences and eventually disrupt the endometrium. These genes can serve as markers or checkpoints of a healthy endometrium in the proliferative phase and can be targeted in case of dysregulation.

Conceptualization: Payal Gupta, Shriya Dube and Priyanka Narad

Data Curation: Anasuya Pravallika R and Vijay Lakshmi Srivastava.

Writing – Original Draft Preparation: Payal Gupta, Payal Priyadarshini and Shanvi Singh.

Revision and critical appraisal: Priyanka Narad and Abhishek Sengupta.

We would like to acknowledge Dr. Ashok K. Chauhan, Founder and President, Amity University, Uttar Pradesh, for providing us with the opportunity to conduct this research. We would also like to thank the Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, for providing us with the necessary resources.
  1. Petracco RG, Kong A, Grechukhina O, Krikun G, Taylor HS. 2012. Global gene expression profiling of proliferative phase endometrium reveals distinct functional subdivisions. Reprod. Sci. 19: 1138-1145.
  2. Yanaihara A, Otsuka Y, Iwasaki S, Aida T, Tachikawa T, Irie T, et al. 2005. Differences in gene expression in the proliferative human endometrium. Fertil. Steril. 83: 1206-1215.
  3. Kelleher AM, Behura SK, Burns GW, Young SL, Demayo FJ, Spencer TE. 2019. Integrative analysis of the forkhead box A2 (FOXA2) cistrome for the human endometrium. FASEB J. 33: 8543-8554.
  4. GEO accession viewer. National Center for Biotechnology Information. [Accessed January 24, 2023].
  5. Altmäe S, Reimand J, Hovatta O, Zhang P, Kere J, Laisk T, et al. 2012. Research resource: Interactome of human embryo implantation: Identification of gene expression pathways, regulation, and integrated regulatory networks. Mol. Endocrinol. 26: 203-217.
  6. Chi R-PA, Wang T, Adams N, Wu S-P, Young SL, Spencer TE, et al. 2019. Human endometrial transcriptome and progesterone receptor cistrome reveal important pathways and epithelial regulators. J. Clin. Endocrinol. Metab. 105: e1419-e1439.
  7. Giacomini E, Scotti GM, Vanni VS, Lazarevic D, Makieva S, Privitera L, et al. 2021. Global transcriptomic changes occur in uterine fluid-derived extracellular vesicles during the endometrial window for embryo implantation. Hum. Reprod. 36: 2249-2274.
  8. Sigurgeirsson B, Åmark H, Jemt A, Ujvari D, Westgren M, Lundeberg J, et al. 2017. Comprehensive RNA sequencing of healthy human endometrium at two time points of the menstrual cycle. Biol. Reprod. 96: 24-33.
  9. Salamonsen LA, Hutchison JC, Gargett CE. 2021. Cyclical endometrial repair and regeneration. Development 148: dev199577.
