Automatic segmentation of coronary lumen and external elastic membrane in intravascular ultrasound images using 8-layer U-Net (2024)


Biomed Eng Online. 2021; 20: 16.

Published online 2021 Feb 6. doi:10.1186/s12938-021-00852-0

PMCID: PMC7866471

PMID: 33549115

Liang Dong,^#¹ Wenbing Jiang,^#² Wei Lu,^#³ Jun Jiang,¹ Ya Zhao,³ Xiangfen Song,³ Xiaochang Leng,³ Hang Zhao,³ Jian’an Wang,¹ Changling Li,¹ and Jianping Xiang³


Abstract

Background

Intravascular ultrasound (IVUS) is the gold standard for assessing coronary lesions, stenosis, and atherosclerotic plaques. In this paper, a fully automatic approach based on an 8-layer U-Net is developed to segment the coronary artery lumen and the area bounded by the external elastic membrane (EEM), i.e., the EEM cross-sectional area (EEM-CSA). The database comprises single-vendor, single-frequency IVUS data. In particular, the proposed MeshGrid data augmentation, combined with flip and rotation operations, improves model performance without pre- or post-processing of the raw IVUS images.

Results

A mean intersection over union (MIoU) of 0.937 for the lumen and 0.804 for the EEM-CSA was achieved, which exceeded the manual labeling accuracy of the clinician.

Conclusion

The accuracy shown by the proposed method is sufficient for subsequent reconstruction of 3D-IVUS images, which is essential for doctors’ diagnosis in the tissue characterization of coronary artery walls and plaque compositions, qualitatively and quantitatively.

Keywords: IVUS, Coronary, Segmentation, Lumen, EEM, MeshGrid, U-Net

Background

Coronary heart disease is the leading cause of death worldwide [1], and coronary atherosclerosis is its dominant cause. In early atherosclerosis, coronary artery remodeling slows the progression of vascular stenosis as coronary plaque accumulates. Intravascular ultrasound (IVUS) is one of the most effective real-time medical imaging techniques and plays a critical role in the diagnosis and treatment of coronary heart disease.

2D IVUS images acquired serially as an IVUS catheter is pulled back through the coronary artery can be used to evaluate changes in arterial distensibility caused by atherosclerotic plaque. Accurate segmentation of the lumen and the external elastic membrane cross-sectional area (EEM-CSA) in 2D coronary IVUS images has crucial clinical significance, because it supports the assessment of atherosclerotic plaque and its vulnerability through measurements such as lumen diameter, plaque eccentricity, and plaque burden. However, manually delineating the lumen and EEM contours on 2D IVUS images is time-consuming and depends on the doctor's experience. A typical IVUS pullback contains more than 3000 images, so accurate, fast, and fully automatic segmentation of the lumen and EEM-CSA is highly desirable, but it remains challenging because of the relative complexity of IVUS images.

Results

Experiments were carried out for segmenting the lumen and EEM-CSA with four augmentation strategies: No augmentation, Flip–Rotate, MeshGrid, and MeshGrid–Flip–Rotate. The contours predicted with MeshGrid–Flip–Rotate augmentation (3rd row of Fig. 1) agreed more closely with the ground truth (2nd row of Fig. 1) across a range of morphologies than those predicted with No augmentation (4th row of Fig. 1), MeshGrid (5th row of Fig. 1), and Flip–Rotate (6th row of Fig. 1). Taking the 2nd and 3rd columns as examples, the last three rows show noise points wrongly segmented in the background and EEM-CSA regions.


Fig. 1

A $6 \times 5$ image matrix comparing segmentation results for the four data augmentation strategies. The rows show the raw IVUS images used as inputs (1st row), the ground truth images used as outputs (2nd row), and the predictions with MeshGrid–Flip–Rotate (3rd row), No augmentation (4th row), MeshGrid (5th row), and Flip–Rotate (6th row). The columns (1st to 5th) show different IVUS cases, chosen to cover as wide a range of shapes and sizes as possible. The MeshGrid–Flip–Rotate method (3rd row) gives the best segmentation performance, with results very close to the ground truth images

Table 1 quantifies the segmentation results for the four data augmentation strategies. Compared with the other three strategies, MeshGrid–Flip–Rotate gave better segmentation performance for both the lumen and the EEM-CSA, with MIoU of 0.937 and 0.804, respectively. There are two possible reasons why the 8-layer U-Net does not exceed the EEM segmentation result of Ji Yang et al. [10]. On the one hand, the training set is too small to train a powerful segmentation model; on the other hand, the architecture of the 8-layer U-Net could be further optimized.

Table 1

The quantitative performance for the four different data augmentation strategies

Augmentation strategies	MIoU (lumen)	MIoU (EEM-CSA)
No augmentation	0.872	0.703
Flip–Rotate	0.915	0.759
MeshGrid	0.894	0.747
MeshGrid–Flip–Rotate	0.937	0.804
Ji Yang et al.[10]	0.900	0.860


The italic numbers represent the best MIoU for the lumen and the EEM-CSA. Our MeshGrid–Flip–Rotate strategy achieves an MIoU of 0.937 for the lumen, which is better than the other methods, but its MIoU of 0.804 for the EEM-CSA is inferior to the 0.860 of Ji Yang et al.
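As a quick worked reading of Table 1, the gain of each augmentation strategy over the No augmentation baseline can be computed directly from the reported values. The snippet below is only an arithmetic illustration; the dictionary layout and variable names are ours and are not part of the original study.

```python
# MIoU values copied from Table 1 as (lumen, EEM-CSA).
table = {
    "No augmentation": (0.872, 0.703),
    "Flip-Rotate": (0.915, 0.759),
    "MeshGrid": (0.894, 0.747),
    "MeshGrid-Flip-Rotate": (0.937, 0.804),
}

baseline = table["No augmentation"]

# Absolute MIoU gain of each strategy over the baseline, rounded to 3 decimals.
gains = {
    name: (round(lumen - baseline[0], 3), round(eem - baseline[1], 3))
    for name, (lumen, eem) in table.items()
    if name != "No augmentation"
}

for name, (d_lumen, d_eem) in gains.items():
    print(f"{name}: lumen +{d_lumen:.3f}, EEM-CSA +{d_eem:.3f}")
```

On these numbers, the combined strategy gains more than either Flip–Rotate or MeshGrid alone for both structures.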

Discussion

The IVUS images varied significantly, from the intensity gradient at the lumen edge to the contour curvature of plaques. The current dataset was limited to a single vendor and a single brand. Nevertheless, the proposed method provided acceptable segmentation results for both the lumen and the EEM-CSA on the frames in the dataset. On visual comparison, EEM-CSA segmentation performed worse on complex frames and comparably well on simple frames, whereas lumen segmentation was excellent in all cases. Segmenting bifurcation images may be difficult because the vessel definition is ambiguous. IVUS may underestimate the amount of calcified plaque, and images of the vessel wall can be degraded by geometric distortions and by shadows behind calcified plaques; all of these make the model harder to train. The dataset needs to be enriched with more calcified-plaque cases in the future to train a more robust model.

The MeshGrid–Flip–Rotate data augmentation helped improve the segmentation performance and generalized well enough to eliminate outliers (3rd row of Fig. 1). Neither pre-processing nor post-processing steps were necessary. The model trained well on the current dataset and gave an MIoU of 0.937 for lumen predictions, which is better than the literature result listed in Table 1. We believe the gain comes mainly from the MeshGrid–Flip–Rotate data augmentation and from the 8-layer U-Net, which can extract more image features. However, when the testing set deviated far from the training set, for example in frames with serious artifacts, mixed plaques, or branch vessels, the accuracy for the EEM-CSA was relatively low (MIoU of 0.804). This can be improved substantially once more coronary IVUS data of different categories are collected for training.


Fig. 3

The proposed U-Net Architecture with 8 layers. The 8-layer deep U-Net consists of three parts, encoding networks (left in the figure), decoding networks (right in the figure) and skip connection (black arrow in the middle)

From a clinical perspective, expert physicians should provide a clinical threshold for assessing the quality of the method, so that segmentation results on complex and simple frames can be interpreted. For example, a fast pullback through a calcified lesion may cause loss of image features and increased catheter artifacts and calcification shadows in the echogenicity of the lumen and plaque textures, which makes the frames harder to annotate even for experienced physicians. Cardiac cycle motion and coronary vessel pulsation, caused by heart rate variability or arrhythmia, might push the catheter against the vessel and plaque boundaries, which increases artifacts and motion jitter in the IVUS images.

Our future research will extend the current dataset to enhance the robustness and generality of the method presented in this paper. The heterogeneous IVUS dataset should cover different medical centers and different probe frequencies from different vendors. More IVUS image categories from different artery pullback sections and with different characteristics should be considered: plaque, bifurcations, branches, shadow artifacts, stents, catheter artifacts, etc. Each frame will be cross-labeled by three expert physicians according to its category to assess the method, which will make the evaluation more convincing.

Conclusion

In this paper, an 8-layer U-Net with MeshGrid–Flip–Rotate data augmentation is proposed, which is well suited to the coronary IVUS lumen and EEM-CSA segmentation task. The experimental results show its superiority in segmentation accuracy and efficiency. Furthermore, it provides a good starting point for image-based gating to implement 3D-IVUS reconstruction when fused with X-ray projections, which enables fluid and dynamic analysis of plaques and vascular walls of coronary arteries.

Method

In this section, we first introduce the coronary IVUS dataset used for training and testing. Then, the 8-layer deep U-Net architecture that predicts the masks for the lumen and the EEM-CSA of IVUS images is presented. The training details are described, and the metric for evaluating the proposed method is illustrated.

Dataset and augmentation

We use a coronary IVUS dataset from The Second Affiliated Hospital of Zhejiang University School of Medicine. It consists of in vivo coronary artery pullbacks acquired with the iLab IVUS from Boston Scientific Corporation equipped with the 40-MHz OptiCross catheter. It contains IVUS frames from 30 patients, chosen at the end-diastolic cardiac phase, in DICOM format with a resolution of 512 × 512. The dataset is divided into two parts: 567 frames from 24 patients for training and 108 frames from 6 patients for testing. The training set is used for building the deep learning model and the testing set is used to evaluate the model performance.

IVUS images contain the catheter, lumen, endothelium, intima, media, external elastic membrane, adventitia, and atherosclerotic plaque. The external elastic membrane is usually treated as the border between the media and the adventitia. The media appears gray or dark because it contains dense smooth muscle. The adventitia looks similar to the external tissues surrounding the vascular wall. The endothelium and intima are thinner than the lumen and media. Thus, the lumen and EEM-CSA can be manually annotated by experienced physicians as the ground truth for metric evaluation. Each IVUS frame was manually annotated for the lumen and EEM-CSA in the short-axis view by three clinical experts from the Cardiology Department who work daily with this IVUS brand, as shown in Fig. 2. Each expert was blinded to the other two experts' annotations, and each frame was labeled repeatedly by each of the three experts to ensure the correctness and blindness of the annotations. On visual inspection, 92% of the annotated cases showed high consistency.


Fig. 2

Ground truth labeling for the lumen and EEM-CSA. The right image is the raw IVUS image used as training input, the middle image is the annotation mask used as training output, and the left image is a superposition of the right and middle images

The training set comprises 567 frames, which is not large enough to train a CNN model from scratch, so data augmentation is essential for better performance. The augmentation is twofold and performed online. First, the raw coronary IVUS images and the corresponding ground truth are randomly (1) rotated by 90°, 180°, or 270° and (2) flipped up–down or left–right. Second, the MeshGrid is added to the raw image at the pixel level to provide relative location information. Because structures such as the intima and adventitia occupy relatively fixed positions in IVUS images, the MeshGrid can guide the neural network where to look during training.
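The two augmentation steps can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the text does not confirm: the MeshGrid is interpreted as two normalized coordinate channels stacked onto the image, the random choices include "no rotation" and "no flip", and the grid is attached after the geometric transform so that it stays fixed in the image frame. The function names `add_meshgrid`, `flip_rotate`, and `augment` are hypothetical.

```python
import numpy as np


def add_meshgrid(image):
    """Stack normalized row/column coordinate maps onto a 2D image.

    Assumption: "MeshGrid" is read here as two extra coordinate channels;
    the exact encoding used in the study is not specified.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(
        np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w), indexing="ij"
    )
    channels = [image, rows, cols]
    return np.stack([c.astype(np.float32) for c in channels], axis=-1)


def flip_rotate(image, mask, rng):
    """Apply the same random rotation and flip to an image and its mask."""
    k = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    choice = int(rng.integers(0, 3))  # 0: no flip, 1: up-down, 2: left-right
    if choice == 1:
        image, mask = np.flipud(image), np.flipud(mask)
    elif choice == 2:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask


def augment(image, mask, rng):
    """Online augmentation: geometric transform first, then coordinate channels."""
    image, mask = flip_rotate(image, mask, rng)
    return add_meshgrid(image), mask


rng = np.random.default_rng(0)
image = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = (image > 7).astype(np.uint8)
x, y = augment(image, mask, rng)
print(x.shape, y.shape)  # (4, 4, 3) (4, 4)
```

Applying the identical rotation and flip to the image and its mask keeps the labels aligned with the pixels, which is the property any variant of this augmentation has to preserve.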

Network architectures

The U-Net is a type of fully convolutional network [13] and is the most common convolutional network architecture for biomedical image segmentation. It consists of an encoder and a decoder and predicts a segmentation mask at the pixel level rather than an image-level classification. The encoder down-samples the input and extracts higher-level features. The decoder up-samples the encoder output and concatenates the feature maps of the corresponding encoder layer through skip connections. The skip connections relieve the gradient diffusion problem caused by deep layers. The final decoder layer is activated by softmax to produce the class probability map from which the segmentation prediction is recovered.

The encoder part has 9 blocks and each incorporates two repeated operations of 3 × 3 convolution, batch normalization and LeakyReLU activation. The down-sampling operation of 3 × 3 convolution with stride 2 × 2 reduces feature maps by half. The size of the 8th block is 2 × 2 to capture the deeper abstract information. The decoder part has 8 blocks to restore the image dimension. Each up-sampling operation contains a 5 × 5 deconvolution with stride 2. The skip connection concatenates the corresponding feature maps. The last convolution outputs the probability map of mask class prediction by softmax activation. The entire architecture is shown in Fig.3. The parameter initialization of all layers of the model uses the random initialization method.
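The block counts above can be checked with simple arithmetic. The sketch below assumes "same" padding, so that each stride-2 convolution halves the spatial size (rounding up), and it reads the 2 × 2 feature map as the output of the 8th down-sampling step; both are our assumptions rather than details stated in the text.

```python
def encoder_sizes(input_size=512, num_downsamples=8):
    """Spatial size at each encoder block, assuming 'same' padding and stride 2."""
    sizes = [input_size]
    for _ in range(num_downsamples):
        sizes.append((sizes[-1] + 1) // 2)  # ceil(n / 2)
    return sizes


sizes = encoder_sizes()
print(sizes)       # [512, 256, 128, 64, 32, 16, 8, 4, 2]
print(len(sizes))  # 9
```

Under this reading, a 512 × 512 input passes through one full-resolution block and eight down-sampled blocks, which gives nine encoder blocks in total, and the eight up-sampling steps of the decoder restore the original 512 × 512 size.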

Compared with other U-Net variations, our proposed U-Net makes no major structural innovation. We replaced the original 4-layer network with an 8-layer network, which is able to extract deeper image features. The experimental results support this simple deepening design.

Implementation details

The model was trained and evaluated on a Dell PowerEdge T640 server with a Xeon Silver 4114 processor, 128 GB of RAM, and four Nvidia GTX 1080Ti graphics cards. Training took less than 90 min, and inference took 10 ms per image.

We implemented model training with the TensorFlow framework. Because the small number of frames is not enough to train the CNN from scratch, in addition to data augmentation we also employed transfer learning, initializing the weights of the U-Net encoder using VGG16 [14]. The optimizer was Adam [15], which was fast and robust. The weights were initialized randomly and the batch size was set to 16. The initial learning rate was 0.001 with a decay of 0.1 every 2000 iterations. A total of 8000–10000 iterations were run for training. The lumen and EEM-CSA were trained and predicted in one shot with the softmax function as the output activation, which gave each pixel its class probability. The loss function was the sparse softmax cross entropy [16]:

$L (p_{\hat{y}}, p) = - \sum_{j = 1}^{K} p_{{\hat{y}}_{j}} \log (p_{j}),$

$p_{j} = \mathrm{softmax}(x_{j}) = \frac{e^{x_{j}}}{\sum_{k = 1}^{K} e^{x_{k}}},$

with K being the number of classes, $x_{j}$ being the network output for class j before the softmax, $p_{j}$ being the predicted probability of belonging to class j, and $p_{{\hat{y}}_{j}}$ being the true probability.
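The loss and the step decay of the learning rate can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the TensorFlow code used in the study: the true distribution is taken to be one-hot, so the sum over classes reduces to the negative log-probability of the labeled class; the loss is averaged over pixels; and the "decay of 0.1 every 2000 iterations" is read as a staircase multiplication. All function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_softmax_cross_entropy(logits, labels):
    """Mean over pixels of -log p_j, where j is the labeled class.

    logits: float array of shape (..., K); labels: integer array of shape (...).
    """
    p = softmax(logits)
    picked = np.take_along_axis(p, labels[..., None], axis=-1)[..., 0]
    return float(-np.log(picked).mean())


def learning_rate(iteration, base_lr=1e-3, decay=0.1, step=2000):
    """Staircase schedule: multiply by `decay` every `step` iterations (assumed)."""
    return base_lr * decay ** (iteration // step)


# Uniform logits over K = 3 classes give a loss of log(3) for any labels.
logits = np.zeros((2, 2, 3))
labels = np.zeros((2, 2), dtype=np.int64)
loss = sparse_softmax_cross_entropy(logits, labels)
print(loss)
```

Working with integer labels instead of one-hot vectors avoids materializing a K-channel target for every pixel, which is the usual motivation for the "sparse" form of this loss.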

Evaluation criteria

In semantic segmentation, the mean intersection over union (MIoU) is a widely used metric for evaluating a model [17]. We compute the MIoU score between the ground truth and the predicted masks:

$MIoU = \frac{1}{k + 1} \sum_{i = 0}^{k} \frac{p_{ii}}{\sum_{j = 0}^{k} p_{ij} + \sum_{j = 0}^{k} p_{ji} - p_{ii}},$

with k being the number of classes excluding background, and $p_{ij}$ being the number of pixels of class i predicted to class j.
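The formula can be evaluated from a confusion matrix, as in the NumPy sketch below. Here `num_classes` corresponds to k + 1 (the classes plus background), and classes that appear in neither mask are skipped to avoid division by zero, which is our assumption. Because the paper reports MIoU separately for the lumen and the EEM-CSA, the authors may have applied the metric per structure; the function name `mean_iou` is hypothetical.

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes):
    """MIoU from a confusion matrix p, where p[i, j] counts pixels of class i predicted as j."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    p = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(p)
    union = p.sum(axis=1) + p.sum(axis=0) - intersection
    valid = union > 0  # skip classes absent from both masks (assumption)
    return float((intersection[valid] / union[valid]).mean())


y_true = np.array([[0, 0], [1, 1]])
y_pred = np.array([[0, 1], [1, 1]])
# Class 0: IoU = 1 / 2; class 1: IoU = 2 / 3; mean = 7 / 12.
print(mean_iou(y_true, y_pred, num_classes=2))
```

The row sum, column sum, and diagonal of the confusion matrix correspond to the three terms in the denominator of the formula.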

Acknowledgements

The authors thank the Second Affiliated Hospital of Zhejiang University for providing the IVUS datasets.

Abbreviations

3D	Three-dimensional
IVUS	Intravascular ultrasound
EEM-CSA	External elastic membrane cross-sectional area
MIoU	Mean intersection over union

Authors’ contributions

LD: project administration; WJ: project administration; WL: original draft writing; JJ: data curation; YZ: original draft writing; XS: formal analysis and paper revision; XL: data curation; HZ: formal analysis and paper revision; JW: project administration; CL: project administration; JX: project administration. All authors read and approved the final manuscript.

Funding

This study was supported by National Natural Science Foundation of China (No. 81320108003, No. 31371498, Nos. 81100141 and 81570322), Zhejiang Provincial Public Welfare Technology Research Project (No. LGF20H020012), Zhejiang Provincial key research and development plan (No. 2020C03016), the Major projects in Wenzhou of China (No. 2019ZG0107) and Scientific research project of Zhejiang Education Department (Y201330290).

Availability of data and materials

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate

This study protocol was approved by the Second Affiliated Hospital of Zhejiang University. We followed the CONSORT guideline to perform this study.

Consent for publication

Not applicable.

Competing interests

No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Liang Dong and Wenbing Jiang are co-first authors for this paper

References

1. Hay S. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390:10100. doi:10.1016/S0140-6736(17)32130-X.

2. Faraji M, Cheng I, Naudin I, Basu A. Segmentation of arterial walls in intravascular ultrasound cross-sectional images using extremal region selection. Ultrasonics. 2018;84:356–365. doi:10.1016/j.ultras.2017.11.020.

3. Lee JH, Hwang YN, Kim GY, Min KS. Segmentation of the lumen and media-adventitial borders in intravascular ultrasound images using a geometric deformable model. IET Image Proc. 2018;12(10):1881–1891. doi:10.1049/iet-ipr.2017.1143.

4. Sun S, Sonka M, Beichel RR. Graph-based IVUS segmentation with efficient computer-aided refinement. IEEE Trans Med Imaging. 2013;32(8):1536–1549. doi:10.1109/TMI.2013.2260763.

5. Cui H, et al. Fast marching and Runge–Kutta based method for centreline extraction of right coronary artery in human patients. Cardiovasc Eng Technol. 2016;7(2):159–169. doi:10.1007/s13239-016-0263-0.

6. Cui H, Xia Y, Zhang Y, Zhong L. Validation of right coronary artery lumen area from cardiac computed tomography against intravascular ultrasound. Mach Vis Appl. 2018;29(8):1287–1298. doi:10.1007/s00138-018-0978-z.

7. Shen D, Wu G, Suk H-I. Deep learning in medical image analysis. Annu Rev Biomed Eng. 2017;19:42. doi:10.1146/annurev-bioeng-071516-044442.

8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi:10.1038/nature14539.

9. Su S, Gao Z, Zhang H, Qiang L, Li S. Detection of lumen and media-adventitia borders in IVUS images using sparse auto-encoder neural network. In: International symposium on biomedical imaging (ISBI), 2017.

10. Su S, Hu Z, Lin Q, Hau WK, Gao Z, Zhang H. An artificial neural network method for lumen and media-adventitia border detection in IVUS. Comput Med Imaging Graph. 2017;57:29–39. doi:10.1016/j.compmedimag.2016.11.003.

11. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2015, pp. 234–241.

12. Sun S, Pang J, Shi J, Yi S, Ouyang W. Fishnet: a versatile backbone for image, region, and pixel level prediction. In: Advances in neural information processing systems. 2018, pp. 754–764.

13. Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(4):640–651. doi:10.1109/TPAMI.2016.2572683.

14. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.

15. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.

16. Martins A, Astudillo R. From softmax to sparsemax: a sparse model of attention and multi-label classification. In: International conference on machine learning. 2016, pp. 1614–1623.

17. Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D. Image segmentation using deep learning: A survey. arXiv preprint arXiv:2001.05566. 2020.
