{"id":2558,"date":"2015-09-23T11:33:05","date_gmt":"2015-09-23T03:33:05","guid":{"rendered":"http:\/\/blog.hoyo.idv.tw\/?p=2558"},"modified":"2018-02-08T16:28:21","modified_gmt":"2018-02-08T08:28:21","slug":"%e5%9c%a8-centos-6-7-%e4%b8%8a%e5%ae%89%e8%a3%9d-tesseract-ocr","status":"publish","type":"post","link":"https:\/\/blog.hoyo.idv.tw\/?p=2558","title":{"rendered":"Installing Tesseract OCR on CentOS 6.7"},"content":{"rendered":"<h2>Installation<\/h2>\n<p>Reference:\u00a0<a href=\"http:\/\/my.oschina.net\/iceman\/blog\/40771\" target=\"_blank\" rel=\"noopener\">Installing Tesseract-OCR on Centos5.5<\/a>\u00a0(<a href=\"http:\/\/blog.hoyo.idv.tw\/wp-content\/uploads\/2015\/09\/Centos5.5-\u5b89\u88c5Tesseract-OCR-\u96ea\u4eba\u7684\u4e2a\u4eba\u7a7a\u95f4-\u5f00\u6e90\u4e2d\u56fd\u793e\u533a.html\" target=\"_blank\" rel=\"noopener\">local backup<\/a>)<\/p>\n<p>CentOS 5.5 and 6.7 differ quite a bit, but fortunately the software that needs to be installed has barely changed. The packages I ended up installing were:<\/p>\n<ol>\n<li><a href=\"http:\/\/www.leptonica.com\/source\/leptonica-1.69.tar.gz\">leptonica-1.69.tar.gz<\/a><\/li>\n<li><a href=\"https:\/\/code.google.com\/p\/tesseract-ocr\/downloads\/detail?name=tesseract-ocr-3.02.02.tar.gz&amp;can=2&amp;q=\">tesseract-ocr-3.02.02.tar.gz<\/a><\/li>\n<li><a href=\"https:\/\/code.google.com\/p\/tesseract-ocr\/downloads\/detail?name=tesseract-ocr-3.02.eng.tar.gz&amp;can=2&amp;q=\">tesseract-ocr-3.02.eng.tar.gz<\/a><\/li>\n<li><a href=\"https:\/\/code.google.com\/p\/tesseract-ocr\/downloads\/detail?name=tesseract-ocr-3.02.chi_tra.tar.gz&amp;can=2&amp;q=\">tesseract-ocr-3.02.chi_tra.tar.gz<\/a><\/li>\n<\/ol>\n<p><strong>Follow the installation steps exactly<\/strong>: install the dependencies first, then compile, and the installation goes through smoothly.<\/p>\n<pre class=\"nums:false lang:sh decode:true\">yum 
install libjpeg-devel libpng-devel libtiff-devel zlib-devel<\/pre>\n<p>I point this out specifically because I assumed my environment, with a pile of software already installed, could not be missing such basic components, but some were missing after all. = =a<\/p>\n<h2>Usage<\/h2>\n<p>Just type the command:<\/p>\n<pre class=\"lang:default decode:true\">Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]\r\n\r\npagesegmode values are:\r\n0 = Orientation and script detection (OSD) only.\r\n1 = Automatic page segmentation with OSD.\r\n2 = Automatic page segmentation, but no OSD, or OCR\r\n3 = Fully automatic page segmentation, but no OSD. (Default)\r\n4 = Assume a single column of text of variable sizes.\r\n5 = Assume a single uniform block of vertically aligned text.\r\n6 = Assume a single uniform block of text.\r\n7 = Treat the image as a single text line.\r\n8 = Treat the image as a single word.\r\n9 = Treat the image as a single word in a circle.\r\n10 = Treat the image as a single character.\r\n-l lang and\/or -psm pagesegmode must occur before anyconfigfile.\r\n\r\nSingle options:\r\n  -v --version: version info\r\n  --list-langs: list available languages for tesseract engine<\/pre>\n<p>--<\/p>\n<p>tesseract [image to recognize] [output file name] -l [recognition language]<\/p>\n<pre class=\"nums:false lang:default decode:true\">tesseract \/tmp\/phototest.tif \/tmp\/output -l eng<\/pre>\n<p>The .txt extension is appended to the output file name automatically, so this example writes \/tmp\/output.txt.<\/p>\n<p>phototest.tif is the sample image included with Tesseract; you can view it <a href=\"https:\/\/www.drupal.org\/files\/issues\/phototest.jpg\" target=\"_blank\" rel=\"noopener\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\">here<\/a>.<\/p>\n<p>Since the Traditional Chinese language data is installed too, you can of course recognize with it instead:<\/p>\n<pre class=\"nums:false lang:default decode:true\">tesseract phototest.tif output -l chi_tra<\/pre>\n<p>The accuracy is much worse, though. The result:<\/p>\n<pre class=\"nums:false lang:default highlight:0 decode:true\">ThiS iS a |0t of T2 point teXt to teSt the\r\noc\u300c c0de and see if it WorkS 0n a|| typeS\r\nof fi|e f0\u300cmat'\r\n\r\n\u02c9|\u02c9he quick br0Wn do9 jumped oVe\u300c the\r\n|aZy fo)(_ The quick broWn do9 jumped\r\noVer the |aZy f0X_ \u02c9|\u02c9he quick br0Wn do9\r\njumped 0Ver the |aZy f0X_ \u02c9\u300che quick\r\nbr0Wn do9 jumped oVe\u300c the |aZy fo)(_\r\n<\/pre>\n<p>If that is hard to read, compare it with the eng result:<\/p>\n<pre class=\"nums:false lang:default highlight:0 decode:true\">This is a lot of 12 point text to test the\r\nocr code and see if it works on all types\r\nof file format.\r\n\r\nThe quick brown dog jumped over the\r\nlazy fox. The quick brown dog jumped\r\nover the lazy fox. The quick brown dog\r\njumped over the lazy fox. 
The quick\r\nbrown dog jumped over the lazy fox.\r\n<\/pre>\n<p>&nbsp;<\/p>\n<h2>Improving Recognition Accuracy<\/h2>\n<ul>\n<li><a href=\"http:\/\/yy-programer.blogspot.tw\/2012\/08\/training-tesseract-ocr-301.html\" target=\"_blank\" rel=\"noopener\">Training Tesseract OCR 3.0.1<\/a><\/li>\n<li><a href=\"http:\/\/miphol.com\/muse\/2013\/05\/tesseract-ocr.html\" target=\"_blank\" rel=\"noopener\">TESSERACT OCR: Attempts at Chinese Recognition<\/a><\/li>\n<\/ul>\n<p>--<\/p>\n<h2>Portable Use (No Installation)<\/h2>\n<p>The installed files can simply be copied elsewhere and used there. The one problem you will run into is specifying the tessdata path:<\/p>\n<pre class=\"lang:default decode:true\">&gt;tesseract.exe 3.jpg 3 --tessdata-dir .\\tessdata -l chi_tra<\/pre>\n<p>--<\/p>\n<h2>Test Results<\/h2>\n<ul>\n<li>Language data files cannot be shared across different Tesseract versions.<\/li>\n<li>Different language data files give different recognition accuracy.<\/li>\n<li>Both issues can be solved with the portable approach, pointing each copy at its own tessdata directory.<\/li>\n<li>The 4.0 program can use 3.05 language data files.<\/li>\n<\/ul>\n<p>--<\/p>\n<h2>Windows and Training<\/h2>\n<ul>\n<li><a href=\"https:\/\/digi.bib.uni-mannheim.de\/tesseract\/\" target=\"_blank\" rel=\"noopener\">https:\/\/digi.bib.uni-mannheim.de\/tesseract\/<\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>--<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Installation Reference:\u00a0Centos5...<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[26],"tags":[185,205],"_links":{"self":[{"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=\/wp\/v2\/posts\/2558"}],"collection":[{"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2558"}],"version-history":[{"count":10,"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=\/wp\/v2\/posts\/2558\/revisions"}],"predecessor-version":[{"id":4262,"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=\/wp\/v2\/posts\/2558\/revisions\/4262"}],"wp:attachment":[{"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2558"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.hoyo.idv.tw\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}