IACP FEB RAS (ИАПУ ДВО РАН)

Using Fully Convolutional Network to Locate Transcription Factor Binding Sites Based on DNA Sequence and Conservation Information


2023

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Q1

Journal articles

Vol. 20, no. 3. Pp. 2690-2699.

Zhang Q., Xu Y., Wang S., Wu Y., Ye Y., Yuan C.A., Gribova V., Filaretov V.F., Huang D.S. Using Fully Convolutional Network to Locate Transcription Factor Binding Sites Based on DNA Sequence and Conservation Information // IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2023. Vol. 20, no. 3. Pp. 2690-2699. Q1. DOI: 10.1109/TCBB.2022.3219831.

Transcription factors (TFs) play a part in gene expression. By binding to DNA, TFs can form a complex gene expression regulation system. Identifying the binding regions has therefore become an indispensable step toward understanding the regulatory mechanism of gene expression. Inspired by the great achievements of deep learning (DL) in computer vision and language processing in recent years, many scholars have applied these methods to predict TF binding sites (TFBSs), achieving extraordinary results. However, these methods mainly focus on whether DNA sequences include TFBSs. In this paper, we propose a fully convolutional network (FCN) coupled with a refinement residual block (RRB) and a global average pooling layer (GAPL), namely FCNARRB. Our model can classify binding sequences at the nucleotide level by outputting dense labels for the input data. Experimental results on human ChIP-seq datasets show that the RRB and GAPL structures are very useful for improving model performance. Adding GAPL improves the performance by 9.32% and 7.61% in terms of IoU (Intersection over Union) and PRAUC (area under the precision-recall curve), respectively, and adding RRB improves the performance by 7.40% and 4.64%, respectively. In addition, we find that conservation information can help locate TFBSs.
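
To illustrate the IoU metric named in the abstract, the following is a minimal sketch of how it could be computed over per-nucleotide labels. It is not the authors' implementation: the function name `nucleotide_iou`, the 0.5 decision threshold, and the convention of returning 1.0 when neither sequence contains a binding position are all assumptions made for this example.

```python
from typing import Sequence


def nucleotide_iou(
    pred: Sequence[float],
    truth: Sequence[int],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> float:
    """Intersection over Union between predicted and true binding positions.

    pred  -- per-nucleotide binding scores in [0, 1] (the dense output)
    truth -- per-nucleotide labels, 1 = inside a binding site, 0 = outside
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    pred_bits = [p >= threshold for p in pred]
    truth_bits = [t == 1 for t in truth]

    # Positions marked as binding by both / by at least one of the two.
    intersection = sum(p and t for p, t in zip(pred_bits, truth_bits))
    union = sum(p or t for p, t in zip(pred_bits, truth_bits))

    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Hypothetical 8-nucleotide example: the true site covers positions 2-5,
# the thresholded prediction covers positions 1-4.
example_truth = [0, 0, 1, 1, 1, 1, 0, 0]
example_pred = [0.1, 0.6, 0.9, 0.8, 0.7, 0.2, 0.1, 0.0]
print(nucleotide_iou(example_pred, example_truth))
```

In this example the two masks share three positions and together cover five, so the score is 3/5. A sequence-level classifier would only report that a site is present, whereas a per-nucleotide metric such as this one penalizes a prediction that is shifted by even one position.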

10.1109/TCBB.2022.3219831

https://ieeexplore.ieee.org/abstract/document/9950422