How to build an OCR with a 100% recognition rate in Python

Views: 44 | Date: 2022-06-18 09:20:08

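Contents: Preface, Tech stack, Approach, Implementation (Reading the image, Binarization, Dilation, Finding contours, Bounding rectangles, Filtering characters, Character segmentation, Building the dataset, Vector search (classification), Generating the result)

Preface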

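Of course, "100%" may be a bit of an exaggeration. But think of the text in a particular game window: the characters always look exactly the same and never change. The set of characters to recognize may also be small. Chinese has several thousand commonly used characters plus all kinds of symbols, and most of them are simply not needed.

The scenario targeted here is deliberately simple. The main points are: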

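Few characters to recognize: a few dozen common characters are enough, for example the 26 letters, the digits, and some Chinese characters.
Uniform background, consistent font: this is not CAPTCHA recognition; the characters to recognize are all clearly visible.
Characters and background are easy to separate: typically, once the image is converted to grayscale, it is white text on a black background or black text on a white background.

Tech stack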

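The main tools used here are Python and OpenCV.

python3, opencv-python

The environment mainly needs the following libraries (scikit-learn is used by the classification step further down):

    pip install opencv-python
    pip install imutils
    pip install matplotlib
    pip install scikit-learn

Approach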

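We start from the grayscale version of the image.

Step 1: Binarization. Convert the grayscale image into one that has only two colors, black and white.

Step 2: Dilation. We will use a contour-finding algorithm to locate the outline of each character and then segment it. That works well for letters and digits, but many Chinese characters are made of separate parts, such as left and right radicals or the three-dot water radical, and these would not be segmented as a single unit. Dilation glues the parts of each Chinese character together.

Step 3: Find contours.

Step 4: Bounding rectangles. What we need for each character is a rectangular box rather than an irregular outline.

Step 5: Filter characters. Punctuation marks, for example, are of no use to me here, so I filter them out by the size of the rectangle.

Step 6: Character segmentation. Crop each character according to its rectangle.

Step 7: Build the dataset. One or two images per class is basically enough.

Step 8: Vector search and result generation. Using the images in the dataset, a vector search returns the recognized label. The recognition results are then sorted by the position at which each piece was cut from the image.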

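Implementation

Reading the image

First, read the image to be recognized.

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
    from matplotlib.colors import NoNorm
    import imutils
    from PIL import Image

    img_file = 'test.png'
    # Flag 0 reads the image as single-channel grayscale
    im = cv2.imread(img_file, 0)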

Displaying it with matplotlib shows the grayscale image that was just read.

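Binarization

Before binarizing, we first look at the distribution of gray values in the image.

Gray values range from 0 to 255, where 0 is black and 255 is white. In this image the background is on the dark side, concentrated around gray values 30 to 40, while the characters are bright, at around 180.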
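The gray-level analysis can be sketched as a simple histogram count. The example below is only an illustration on a synthetic image (the array `im` here is made up to mimic a dark background around 35 to 40 with a bright patch at 180); it is not the code the original figure was produced with.

```python
import numpy as np

# Synthetic stand-in for the grayscale image (an assumption for illustration):
# dark background with values 35 and 40, plus a bright "character" patch at 180.
im = np.full((50, 200), 35, dtype=np.uint8)
im[::2] = 40
im[15:35, 20:60] = 180

# Count how many pixels fall on each of the 256 gray levels.
hist = np.bincount(im.ravel(), minlength=256)

# Gray levels that actually occur in the image.
occupied = np.flatnonzero(hist)
print(occupied)

# The empty range between the dark and bright groups is where a threshold can go.
gap_start, gap_end = int(occupied[1]) + 1, int(occupied[2]) - 1
print(gap_start, gap_end)
```

Any threshold inside the empty range separates background from characters, which is why a round value in the middle is a comfortable choice.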

Here 100 is chosen as the threshold that separates the two.

    thresh = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)[1]

After binarization the image contains only pure black and pure white pixels.
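To make the effect of the threshold concrete, here is a small NumPy sketch of what binary thresholding does: pixels strictly greater than the threshold become the maximum value, everything else becomes 0. The function name `threshold_binary` and the sample array are illustrative only; the article itself uses cv2.threshold.

```python
import numpy as np

def threshold_binary(im, thresh=100, maxval=255):
    """Pixels strictly above `thresh` become `maxval`; all others become 0."""
    return np.where(im > thresh, maxval, 0).astype(np.uint8)

# A tiny hand-made grayscale "image" covering values on both sides of the threshold.
im = np.array([[30, 40, 100],
               [101, 180, 255]], dtype=np.uint8)

binary = threshold_binary(im)
print(binary)
```

Note that a pixel exactly equal to the threshold (100 here) ends up black.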

Dilation

Next, dilate the image vertically. A dilation size has to be chosen; here it is 7.

    # Kernel of 7 rows x 1 column: white pixels grow up and down only
    kernel = np.ones((7, 1), np.uint8)
    dilation = cv2.dilate(thresh, kernel, iterations=1)
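The following NumPy sketch illustrates what a vertical dilation with a 7 x 1 kernel does to a binary image: each output pixel takes the maximum of the 7 pixels stacked vertically around it, so white parts separated by a small vertical gap merge into one blob. The helper `dilate_vertical` is a hypothetical illustration written for this example, not OpenCV's implementation.

```python
import numpy as np

def dilate_vertical(binary, size=7):
    """Grow white pixels up and down using a size x 1 window (size should be odd)."""
    pad = size // 2
    # Pad with black rows so pixels near the top and bottom edges can be handled.
    padded = np.pad(binary, ((pad, pad), (0, 0)), mode="constant", constant_values=0)
    rows = binary.shape[0]
    # Each output pixel is the maximum of the `size` pixels stacked vertically around it.
    shifted = [padded[i:i + rows] for i in range(size)]
    return np.max(shifted, axis=0)

# Two white dots in the same column, separated by a gap of four black rows.
binary = np.zeros((9, 3), dtype=np.uint8)
binary[1, 1] = 255
binary[6, 1] = 255

dilated = dilate_vertical(binary)
print(dilated[:, 1])  # the middle column after dilation
print(dilated[:, 0])  # a neighboring column, untouched because the kernel is 1 wide
```

Because the kernel is only one column wide, parts that are separated horizontally stay separate; only vertical gaps are bridged.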

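Finding contours

Next, call OpenCV to find the contours.

    # Find the outer contours of the dilated image
    cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Normalize the return value across OpenCV versions
    cnts = imutils.grab_contours(cnts)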

Next, we load the original image again and draw the contours on it to see what they look like.

Bounding rectangles

For each contour we can compute its bounding rectangle, which gives an axis-aligned box around every character.

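Filtering characters

Filtering works by filling the inside of a contour with black. The code below fills every contour whose bounding rectangle is less than 15 pixels high.

    for c in cnts:
        x, y, w, h = cv2.boundingRect(c)
        if h < 15:
            # Paint the small region black so it disappears from the binary image
            cv2.fillPoly(thresh, pts=[c], color=(0))

After the fill, the punctuation marks are gone.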

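Character segmentation

Because an image is just a matrix, character segmentation comes down to slicing.

    for c in cnts:
        x, y, w, h = cv2.boundingRect(c)
        if h < 15:
            continue  # skip the regions that were filtered out
        # Rows y..y+h and columns x..x+w hold one character
        cropImg = thresh[y:y+h, x:x+w]
        plt.imshow(cropImg)
        plt.show()

Building the dataset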

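Finally, we create the dataset and label it. This means chaining all of the steps above, saving the segmented images to a folder, and labeling them. Judging from the training code in the next section, labeling means placing each saved image into a subfolder of ocr_datas named after the character it shows.

    import os
    import uuid

    import cv2
    import imutils
    import numpy as np

    def split_letters(im):
        # Binarize
        thresh = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)[1]
        # Dilate vertically
        kernel = np.ones((7, 1), np.uint8)
        dilation = cv2.dilate(thresh, kernel, iterations=1)
        # Find contours
        cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        # Black out regions that are too small
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                cv2.fillPoly(thresh, pts=[c], color=(0))
        # Segment: keep the x position together with each cropped character
        char_list = []
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                continue
            cropImg = thresh[y:y + h, x:x + w]
            char_list.append((x, cropImg))
        return char_list

    # cv2.imwrite does not create missing folders, so make sure the target exists
    os.makedirs('ocr_datas', exist_ok=True)
    for i in range(1, 10):
        im = cv2.imread(f'test{i}.png', 0)
        for ch in split_letters(im):
            print(ch[0])
            filename = f'ocr_datas/{uuid.uuid4()}.png'
            cv2.imwrite(filename, ch[1])

Vector search (classification)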

Vector search is really just a nearest-neighbor search problem, and we can use KNeighborsClassifier from sklearn for it.
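As a rough illustration of what a nearest-neighbor lookup with one neighbor amounts to, the sketch below finds the training vector with the smallest Euclidean distance to the query and returns its label. The function `nearest_label` and the toy vectors are made up for this example; the article relies on KNeighborsClassifier, whose internals and tie-breaking may differ.

```python
import numpy as np

def nearest_label(train_vectors, train_labels, query):
    """Return the label of the training vector closest to `query` (Euclidean distance)."""
    train = np.asarray(train_vectors, dtype=np.float64)
    diff = train - np.asarray(query, dtype=np.float64)
    dists = np.linalg.norm(diff, axis=1)  # one distance per training vector
    return train_labels[int(np.argmin(dists))]

# Toy 4-dimensional "image vectors", one example per class.
train_vectors = [[1, 1, 0, 0],
                 [0, 0, 1, 1]]
train_labels = ['A', 'B']

predicted = nearest_label(train_vectors, train_labels, [0.9, 1.0, 0.1, 0.0])
print(predicted)
```

With fixed fonts and backgrounds, a query character is nearly identical to its stored example, which is why one or two images per class can be enough.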

The training code is as follows:

    import json
    import os
    import pickle

    import cv2
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    max_height = 30
    max_width = 30

    def make_im_template(im):
        # Place the character image at the center of a fixed-size black template
        template = np.zeros((max_height, max_width))
        offset_height = int((max_height - im.shape[0]) / 2)
        offset_width = int((max_width - im.shape[1]) / 2)
        template[offset_height:offset_height + im.shape[0], offset_width:offset_width + im.shape[1]] = im
        return template

    label2index = {}
    index2label = {}
    X = []
    y = []
    index = 0
    # Each subfolder of ocr_datas is one class; the folder name is the label
    for _dir in os.listdir('ocr_datas'):
        new_dir = 'ocr_datas/' + _dir
        if os.path.isdir(new_dir):
            label2index[_dir] = index
            index2label[index] = _dir
            for filename in os.listdir(new_dir):
                if filename.endswith('png'):
                    im = cv2.imread(new_dir + '/' + filename, 0)
                    tpl = make_im_template(im)  # build the fixed-size template
                    tpl = tpl / 255  # normalize to the range 0..1
                    X.append(tpl.reshape(max_height * max_width))
                    y.append(index)
            index += 1

    print(label2index)
    print(index2label)

    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(X, y)

    with open('simple_ocr.pickle', 'wb') as f:
        pickle.dump(model, f)
    with open('simple_index2label.json', 'w') as f:
        json.dump(index2label, f)

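One point worth mentioning is how the image vector is built. The segmented images do not have a fixed height and width, so we first create a fixed-size template (30 x 30 in the code) and place each segmented image at its center. The template is then flattened into a one-dimensional vector, and it can also be normalized.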
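The vector construction can be tried in isolation on a made-up crop. The sketch below follows the same idea as make_im_template, with one addition that is not in the article: an explicit check for crops larger than the template, since such a crop cannot be placed inside it. The name `to_vector` and the constants are assumptions for this example.

```python
import numpy as np

MAX_HEIGHT = 30
MAX_WIDTH = 30

def to_vector(im):
    """Center a binary character crop in a fixed template and flatten it to 1-D."""
    h, w = im.shape
    if h > MAX_HEIGHT or w > MAX_WIDTH:
        # Added guard (not in the article): the crop must fit inside the template.
        raise ValueError(f"crop of size {h}x{w} does not fit in {MAX_HEIGHT}x{MAX_WIDTH}")
    template = np.zeros((MAX_HEIGHT, MAX_WIDTH), dtype=np.float64)
    top = (MAX_HEIGHT - h) // 2
    left = (MAX_WIDTH - w) // 2
    template[top:top + h, left:left + w] = im
    # Normalize 0..255 to 0..1, then flatten to a vector of length 900.
    return (template / 255).reshape(-1)

# A made-up crop: an all-white block 20 pixels high and 10 pixels wide.
crop = np.full((20, 10), 255, dtype=np.uint8)
vec = to_vector(crop)
print(vec.shape, vec.sum())
```

Whatever normalization is used for the training vectors has to be applied to the query vectors as well, otherwise the distances between them are not comparable.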

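Generating the result

Generating the final result follows the same path: segment the image first, convert each piece into a vector, and call the KNeighborsClassifier model to find the best match. That gives the result for a single character. To get the final text we still need to sort the results by the position at which each piece was segmented.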

    import json
    import pickle

    import cv2
    import imutils
    import numpy as np

    with open('simple_ocr.pickle', 'rb') as f:
        model = pickle.load(f)
    # File name must match the one written by the training script
    with open('simple_index2label.json', 'r') as f:
        index2label = json.load(f)

    max_height = 30
    max_width = 30

    def make_im_template(im):
        template = np.zeros((max_height, max_width))
        offset_height = int((max_height - im.shape[0]) / 2)
        offset_width = int((max_width - im.shape[1]) / 2)
        template[offset_height:offset_height + im.shape[0], offset_width:offset_width + im.shape[1]] = im
        # Normalize exactly as in training, then flatten to a 1-D vector
        return (template / 255).reshape(max_height * max_width)

    def split_letters(im):
        # Binarize
        thresh = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)[1]
        # Dilate vertically
        kernel = np.ones((7, 1), np.uint8)
        dilation = cv2.dilate(thresh, kernel, iterations=1)
        # Find contours
        cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        # Black out regions that are too small
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                cv2.fillPoly(thresh, pts=[c], color=(0))
        # Segment: keep the x position together with each cropped character
        char_list = []
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                continue
            cropImg = thresh[y:y + h, x:x + w]
            char_list.append((x, cropImg))
        return char_list

    def ocr_recognize(fname):
        im = cv2.imread(fname, 0)
        char_list = split_letters(im)
        result = []
        for ch in char_list:
            res = model.predict([make_im_template(ch[1])])[0]  # recognize a single character
            # JSON turns the integer keys into strings, hence str(res)
            result.append({'x': ch[0], 'label': index2label[str(res)]})
        # The text is a single line, so sorting by the x coordinate is enough
        result.sort(key=lambda k: k.get('x', 0))
        return ''.join([it['label'] for it in result])

    print(ocr_recognize('test1.png'))
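The final ordering step can be looked at on its own. Contours are not guaranteed to come back in reading order, so each prediction carries the x coordinate of its crop, and the predictions are sorted by it before the labels are joined. The sample data below is invented for illustration, and the sketch assumes a single line of text, as the article does.

```python
# Invented per-character predictions, deliberately out of reading order.
result = [
    {'x': 52, 'label': 'C'},
    {'x': 3, 'label': 'A'},
    {'x': 27, 'label': 'B'},
]

# Sort left to right by the x coordinate of each crop, then join the labels.
result.sort(key=lambda item: item['x'])
text = ''.join(item['label'] for item in result)
print(text)
```

For text spanning several lines, sorting by x alone would interleave the lines, so the rows would have to be grouped by their y coordinate first.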
