Confidence threshold
A scanned image, or especially a photograph, may contain areas of varying quality. Some regions may be dim or blurry, some may contain dirt, spots, scratches, glare, unwanted gradients, and other defects. These factors can interfere with recognition and lead to less accurate OCR results or the wrong order of the recognized text.
The complex structure analysis algorithm allows for very flexible filtering of areas with low confidence. You can choose whether to skip such areas and achieve highly reliable OCR results at the cost of losing some of the content, or to extract the maximum amount of text from an image with slightly lower accuracy. This filtering threshold is controlled through the dsrConfidence recognition setting:
Alias | Behavior |
---|---|
Default | Synonym for Mid. |
Low | Extract most of the text except for very low quality areas. |
LowMid | Extract most of the text except for low quality areas. |
Mid | Skip some blurry or dull areas and image defects. Default behavior that balances recognition accuracy and coverage. |
MidHigh | Prefer more accurate recognition at the cost of losing some content from areas of low confidence. |
High | Skip most of the blurry areas and image defects. |
Ultra | Process only the highest quality areas, skipping all others. |
All | Try to extract all the text from the image, regardless of its quality. May lead to less accurate results and artifacts, but can extract text even from illustrations or logos. |
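
The levels in the table can be read as one scale, from the least aggressive filtering (All) to the most aggressive (Ultra). The following is a minimal Python sketch of that ordering, not the library's API: the ConfidenceLevel enum and the parse_level helper are hypothetical names introduced only for illustration.

```python
from enum import IntEnum


class ConfidenceLevel(IntEnum):
    """Hypothetical model of the table: a higher value means stricter filtering."""
    ALL = 0
    LOW = 1
    LOW_MID = 2
    MID = 3
    MID_HIGH = 4
    HIGH = 5
    ULTRA = 6
    DEFAULT = 3  # same value as MID, so Python treats it as an alias of MID


def parse_level(name=None):
    """Resolve a name from the table; an omitted value falls back to Mid."""
    if name is None:
        return ConfidenceLevel.MID
    # __members__ includes aliases, so "Default" resolves to MID as well.
    by_name = {
        key.replace("_", "").lower(): member
        for key, member in ConfidenceLevel.__members__.items()
    }
    try:
        return by_name[name.lower()]
    except KeyError:
        raise ValueError(f"unknown confidence level: {name!r}") from None


if __name__ == "__main__":
    # Compare two levels by how aggressively they filter.
    print(parse_level("LowMid") < parse_level("MidHigh"))
```

Modeling the levels as ordered integers makes "stricter" and "more permissive" simple comparisons, which is convenient when a script has to step through several values.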
If this parameter is omitted, medium (Mid) filtering is used, providing a good balance between recognition accuracy and the volume of extracted text. As a general rule, tweak dsrConfidence only if you are unhappy with the result:
- If you get unwanted artifacts or prefer maximum recognition accuracy, try more aggressive filtering (Ultra, High or MidHigh). Keep in mind that some content blocks will not be recognized.
- If some blocks of text do not appear in the recognition results, set the value to Low or LowMid.
- To get all the text from the image, regardless of accuracy, set the value to All.
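
One way to apply the advice about missing text is to start from the default and relax the filtering step by step until the expected content appears. The sketch below assumes a caller-supplied recognize function that runs OCR with a given dsrConfidence value and returns the recognized text. That function, the relax_until helper, and the sample results are all hypothetical and stand in for whatever OCR call your application actually makes.

```python
from typing import Callable, Tuple

# Values from the table, ordered from least to most aggressive filtering.
LEVELS = ["All", "Low", "LowMid", "Mid", "MidHigh", "High", "Ultra"]


def relax_until(
    recognize: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    start: str = "Mid",
) -> Tuple[str, str]:
    """Try `start`, then each more permissive level, until the text is acceptable.

    Returns the (level, text) pair of the first acceptable result. If no level
    gives an acceptable result, the pair for "All" is returned.
    """
    index = LEVELS.index(start)
    level, text = start, ""
    # Walk from `start` down to "All", that is, towards less filtering.
    for level in reversed(LEVELS[: index + 1]):
        text = recognize(level)  # hypothetical OCR call supplied by the caller
        if is_acceptable(text):
            break
    return level, text


if __name__ == "__main__":
    # Stand-in results; a real application would call its OCR engine here.
    fake_results = {
        "Mid": "Invoice",
        "LowMid": "Invoice",
        "Low": "Invoice Total: 42",
        "All": "Invoice Total: 42 ~#",
    }
    level, text = relax_until(fake_results.__getitem__, lambda t: "Total" in t)
    print(level, text)
```

Stopping at the first acceptable level keeps the filtering as strict as the image allows, so All, with its risk of artifacts, is used only as a last resort.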
The dsrConfidence parameter applies only to the complex structure analysis and combined structure analysis modes.