{"id":776,"date":"2020-10-29T10:02:00","date_gmt":"2020-10-29T01:02:00","guid":{"rendered":"https:\/\/arithmer.blog\/?p=776"},"modified":"2022-03-08T15:45:04","modified_gmt":"2022-03-08T06:45:04","slug":"instance-segmentation-how-to-for-real-time-predictions","status":"publish","type":"post","link":"https:\/\/arithmer.blog\/blog\/instance-segmentation-how-to-for-real-time-predictions","title":{"rendered":"YOLACT Real-Time Instance Segmentation"},"content":{"rendered":"\n<p class=\"has-small-font-size\">This material was originally circulated as an internal document on October 29, 2020, and has been reworked for the web.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"361\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_01.jpg\" alt=\"\" class=\"wp-image-769\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_01.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_01-300x108.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_01-768x277.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_01-304x110.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<h3 class=\"has-medium-font-size wp-block-heading\" id=\"instance-segmentation\"><strong>\u25a0Instance Segmentation<\/strong><\/h3>\n\n\n\n<p style=\"font-size:16px\">\u2794 <em>\u201cInstance segmentation is the task of detecting and delineating each distinct object of interest appearing in an image\u201d<\/em> \u2014 <a rel=\"noreferrer noopener\" href=\"https:\/\/paperswithcode.com\/task\/instance-segmentation\" 
target=\"_blank\">source<\/a><br><br>\u2794 Sub-task of:<\/p>\n\n\n\n<ul style=\"font-size:16px\"><li>\u201cObject Detection\u201d<\/li><li>\u201cSemantic Segmentation\u201d<\/li><\/ul>\n\n\n\n<p style=\"font-size:16px\">\u2794 Improvements in baselines (R-CNN, FCN) for the \u201cparent\u201d tasks do not automatically apply to the \u201cdaughter\u201d task<br><br>\u2794 Typically combines:<\/p>\n\n\n\n<ul style=\"font-size:16px\"><li>detection of boxes for all objects<\/li><li>segmentation of pixels<\/li><\/ul>\n\n\n\n<h3 class=\"has-medium-font-size wp-block-heading\" id=\"methodology\"><strong>\u25a0Methodology<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"216\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_02.jpg\" alt=\"\" class=\"wp-image-770\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_02.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_02-300x65.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_02-768x166.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_02-304x66.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<p style=\"font-size:16px\">\u2794 Based on the <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1703.06870\" target=\"_blank\"><strong>Mask R-CNN<\/strong><\/a> model:<\/p>\n\n\n\n<ul style=\"font-size:16px\"><li>Approach is \u201cdetect\u201d and THEN \u201csegment\u201d: two steps<\/li><li>A <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1311.2524\" target=\"_blank\">Region-based CNN<\/a> (<a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1506.01497\" target=\"_blank\">Faster R-CNN<\/a>)<br>outputs a class label and a bounding-box offset for each candidate<ul><li>Start with a Region Proposal Network (RPN)<\/li><li>Extract features from each RoI and predict class and 
bbox<\/li><\/ul><\/li><li>Additionally adds a branch to output the pixel mask of the object<ul><li>Uses Fully Convolutional Networks (FCN) sharing weights and maintaining spatial correspondence<\/li><li>Needs alignment between pixels and feature maps (RoIAlign)<\/li><\/ul><\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"465\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_03.jpg\" alt=\"\" class=\"wp-image-771\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_03.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_03-300x140.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_03-768x357.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_03-304x141.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<p style=\"font-size:16px\">\u2794 Based on the <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1904.02689\" target=\"_blank\">YOLACT<\/a> model:<\/p>\n\n\n\n<ul style=\"font-size:16px\"><li>Approach is single-step, like anchor-free object detection (e.g. 
CenterNet), although YOLACT itself still uses anchors<\/li><li>Uses a \u201cglobal mask\u201d instead of separate masks for instances: no loss of quality due to reduced resolution<\/li><li>Performs 2 parallel tasks:<ul><li>Generate prototype \u201cglobal\u201d masks (entire image)<\/li><li>Predict, for each instance, the coefficients of a linear combination of the prototypes (hence the name: You Only Look At CoefficienTs)<\/li><\/ul><\/li><li>Instance masks are constructed by combining prototypes with the learned coefficients in an assembly step (crop to bbox)<\/li><li>Computation cost is constant with respect to the number of instances<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"431\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_04.jpg\" alt=\"\" class=\"wp-image-772\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_04.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_04-300x129.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_04-768x331.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_04-304x131.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"455\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_05.jpg\" alt=\"\" class=\"wp-image-773\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_05.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_05-300x137.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_05-768x349.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_05-304x138.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 
1000px\"><\/figure>\n\n\n\n<h3 class=\"has-medium-font-size wp-block-heading\" id=\"related-work\"><strong>\u25a0Related work<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"284\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_06.jpg\" alt=\"\" class=\"wp-image-774\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_06.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_06-300x85.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_06-768x218.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_06-304x86.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<p style=\"font-size:18px\"><strong>YOLACT (and YOLACT++) are similar to:<\/strong><\/p>\n\n\n\n<ol style=\"font-size:16px\"><li><a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/2001.00309\" target=\"_blank\">BlendMask<\/a> (CVPR20) uses attention maps instead of coefficients<\/li><li><a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1911.06667\" target=\"_blank\">CenterMask<\/a> (CVPR20) based on anchor-free Obj. 
Detection<\/li><li><a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/2003.05664\" target=\"_blank\">CondInst<\/a> removes dependency on bbox in assembly step<\/li><li><a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1912.04488\" target=\"_blank\">SOLO<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/2003.10152\" target=\"_blank\">SOLOv2<\/a> entirely bbox free: predicts instance category directly pixel by pixel<\/li><\/ol>\n\n\n\n<p style=\"font-size:16px\">Exceptional resources on the open-source instance-segmentation toolbox from Adelaide University (on top of <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/facebookresearch\/detectron2\" target=\"_blank\">detectron2<\/a>): <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/aim-uofa\/AdelaiDet\" target=\"_blank\">AdelaiDet<\/a><\/p>\n\n\n\n<p style=\"font-size:18px\"><strong>Mask accuracy details<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"411\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_07.jpg\" alt=\"\" class=\"wp-image-775\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_07.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_07-300x123.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_07-768x316.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_07-304x125.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<h3 class=\"has-medium-font-size wp-block-heading\" id=\"real-time-instance-segmentation\"><strong>\u25a0Real-time instance segmentation<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"378\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_08.jpg\" alt=\"\" class=\"wp-image-765\" 
srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_08.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_08-300x113.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_08-768x290.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_08-304x115.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"389\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_09.jpg\" alt=\"\" class=\"wp-image-766\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_09.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_09-300x117.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_09-768x299.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_09-304x118.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<h3 class=\"has-medium-font-size wp-block-heading\" id=\"more-real-time-instance-segmentation\"><strong>\u25a0More Real-Time instance segmentation<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1000\" height=\"358\" src=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_10.jpg\" alt=\"\" class=\"wp-image-767\" srcset=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_10.jpg 1000w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_10-300x107.jpg 300w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_10-768x275.jpg 768w, https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/NS20201029_10-304x109.jpg 304w\" sizes=\"(max-width: 1000px) 100vw, 1000px\"><\/figure>\n\n\n\n<h3 class=\"has-medium-font-size wp-block-heading\" 
id=\"\u30c0\u30a6\u30f3\u30ed\u30fc\u30c9\"><strong>\u25a0Download<\/strong><\/h3>\n\n\n\n<p style=\"font-size:16px\"><a href=\"https:\/\/arithmer.blog\/wp-content\/uploads\/2022\/02\/13_YOLACT-\u30ea\u30a2\u30eb\u30bf\u30a4\u30e0\u30a4\u30f3\u30b9\u30bf\u30f3\u30b9\u30bb\u30b0\u30e1\u30f3\u30c6\u30fc\u30b7\u30e7\u30f3.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">YOLACT Real-Time Instance Segmentation (PDF)<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This material was originally circulated as an internal document on October 29, 2020, and has been reworked for the web. \u25a0Instance Segmentation \u2794 \u201cInstance segmentation is &#8230; <\/p>\n","protected":false},"author":3,"featured_media":768,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[20,82,80,81,35,24,45,36],"_links":{"self":[{"href":"https:\/\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/posts\/776"}],"collection":[{"href":"https:\/\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/arithmer.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=776"}],"version-history":[{"count":4,"href":"https:\/\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/posts\/776\/revisions"}],"predecessor-version":[{"id":784,"href":"https:\/\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/posts\/776\/revisions\/784"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/
\/arithmer.blog\/index.php?rest_route=\/wp\/v2\/media\/768"}],"wp:attachment":[{"href":"https:\/\/arithmer.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/arithmer.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/arithmer.blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}