Optional: Advanced

Text Recognition and OCR-Options

Make PDF-documents searchable (Text OCR)

If enabled, BarcodeOCR recognizes the text in scanned images and places it as an invisible layer on top of the page. When you search an output document for specific text, the matches are highlighted at the correct position. If you use a document management system or a system that indexes file content and meta information (e.g. Windows Explorer), you can search the content of the documents and easily find the file you are looking for.

If enabled, all input documents are recognized and converted to searchable PDF files. This also applies to files that are saved in the error folder. Enabling this function changes the contents of the pages, which would invalidate the digital signatures of files that are already signed. For this reason, existing PDF/A certifications and digital signatures are removed before conversion.

Optimize algorithms

You can trade quality against speed by moving the slider arrow toward the corresponding side. If you need more precise text recognition, move it to the right.

Only detect the following characters

If you want to restrict which characters may be detected, enter all allowed characters without any delimiter (e.g. ABCDEF).

Ignore the following characters

If you want to exclude characters from detection, enter all disallowed characters without any delimiter (e.g. ABCDEF).

Optimized processing speed through utilization of all processor cores

This results in better utilization of multiple processor cores. When this option is activated, up to 16 processor cores are used automatically. If this option is not checked, multiple processor cores are still used for detection, but processing is less efficient.

If this function is enabled, blank page removal cannot be activated, and when PowerShell scripts (subfolder and downstream application) are called, the text per page is not passed as a parameter.

Convert documents to black-white documents to reduce filesize

Documents scanned in color can be converted to black-and-white scans and compressed with the CCITT Group 4 compression algorithm to keep the resulting file size small.

This option is applied after barcode recognition.

Change the resolution of the source documents 

If you have to scan documents at a high resolution so that all barcodes are recognized, you can use this option to re-render the images at a specific resolution (DPI). You can combine this function with the "Compress documents" option from PDF to obtain even smaller documents.

This option is applied after barcode recognition.

Remove blank pages from the scanned documents

If BarcodeOCR should delete pages that contain little information (blank pages), you can set an information threshold. Once activated, every page is analyzed and a factor is calculated for it. If the factor is below the threshold, the page is removed. If the factor is above the threshold, the page is kept in the output document.

To set an initial factor, you can analyze a scanned PDF page using the option "Determine by sample scan".

When activated, the log contains the factor for each page, so you can tune the threshold until blank pages are removed without deleting important pages by accident.
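The threshold rule can be sketched in a few lines of Python. This is only an illustration: how BarcodeOCR calculates the factor is not described here, so the factors below are made-up numbers of the kind you might copy from the log, and `pages_to_keep` is a hypothetical helper. The text does not say what happens when a factor equals the threshold exactly; the sketch assumes such pages are kept.

```python
def pages_to_keep(factors, threshold):
    """Return the 1-based numbers of pages whose factor is not below the threshold."""
    return [page for page, factor in enumerate(factors, start=1)
            if factor >= threshold]  # assumption: equal to the threshold means "keep"


# Made-up per-page factors, as they might be copied from the log.
logged_factors = [12.4, 0.3, 9.8, 0.1]
print(pages_to_keep(logged_factors, threshold=1.0))
```

Replaying logged factors against a few candidate thresholds in this way is a cheap check that no page with real content would fall below the chosen value.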


Preprocessing the documents

Various filters can improve barcode recognition before the scanned documents are processed. The filters are only applied during processing and they do not change the document.

Each filter is listed with its description and the cases it is helpful with:

Invert
  • Description: switches black and white values
  • Helpful with: documents with low contrast
Despeckle
  • Description: removes speckles and white dots from black barcode areas, which can be produced by a black-and-white scan
  • Helpful with: black-and-white documents with strong compression; 1D barcodes with "dotted lines"
Dilate
  • Description: dilates the colour values
  • Helpful with: deformed barcodes
Erode
  • Description: erodes the colour values
  • Helpful with: deformed barcodes
Sharpen
  • Description: sharpens the image
  • Helpful with: blurry barcodes

Recognizing options

Waiting time before a new file is processed in seconds


Some scanners do not write the file only after all pages have been scanned. Instead, they create temporary files during scanning that contain the pages scanned so far. To prevent BarcodeOCR from starting recognition on documents that are not yet fully scanned, you can use this box to specify a waiting time before processing of a new file begins. The number specified is the time in seconds that must have elapsed since the file was created.
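The waiting rule amounts to a simple age check on the file. The following Python sketch illustrates the idea; it is not BarcodeOCR's implementation, and both function names are hypothetical. Note that `os.path.getctime` returns the creation time on Windows but the time of the last metadata change on Unix-like systems.

```python
import os
import time


def has_waited_long_enough(created_at, wait_seconds, now):
    """True once at least wait_seconds have elapsed since the file was created."""
    return now - created_at >= wait_seconds


def is_ready(path, wait_seconds):
    """Apply the age check to a real file, using the current time."""
    # On Windows, getctime is the creation time of the file.
    return has_waited_long_enough(os.path.getctime(path), wait_seconds, time.time())


# A file created at t=100 s with a 30 s waiting time is ready at t=131 s.
print(has_waited_long_enough(created_at=100.0, wait_seconds=30, now=131.0))
```

A scanner that keeps appending pages to the same file for longer than the waiting time would still be picked up too early, so the value should exceed the longest expected scan duration.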

Cancellation of the processing after an unusually long processing time

BarcodeOCR has a safety mechanism that sorts out files that have been in processing for an unusually long time. This keeps the file queue from being blocked. The default processing time limit is 60 minutes (15 minutes in older versions). If the file was partially processed, the resulting error and output files are not deleted, and the source file is copied from the watcher folder to the error folder and then deleted from the watcher folder.

If you process large documents (page size or resolution) or use filter options, the default limit of 60 minutes might not suffice. If the option "Service restart delay" is enabled, the processing time for each page is set to 90 minutes. If you do not want to limit the processing time, enable the "Do not limit processing time" option.

For option "Document stack not separated" check the complete document for a barcode 

The splitting option "Document stack not separated" only searches the first page for a valid barcode. If no valid barcode is found, the document is moved to the error folder. Enable this option to search the whole document for valid barcodes and rename the output file according to the barcode value.

Fix damaged PDF files

PDF files created by a scanner might not comply with the PDF standard. This option attempts to fix noncompliant PDF files. If the attempt to fix the file fails, the file is named "corrupted file" and moved to the error folder.

The following steps, among others, are applied:

  • Removal of a PDF/A signature, to rule out incorrectly saved PDF/A files
  • Removal of meta information within the file
  • Removal of unnecessary or duplicate information streams within the PDF file
  • Attempt to restore the file structure

Convert PDF-Documents into a specific format version

BarcodeOCR keeps the PDF format version of the input file and uses the same version for the output file. If you need to change this behaviour, you can set the PDF version of the output document to a specific version.

View PDF file as images during processing

If the documents were not scanned by a scanner but exported as a file from an application, barcodes will probably not be recognized unless this function is activated. The option is also required if your scanner already saves the document as a searchable file.

If the document has multiple layers, the scanner embeds clipping masks in the PDF files, or certain optimizations have been applied, enabling this option can also help barcodes to be detected.

Compatibility mode: barcode recognition

This type of detection was used by version 4.x. In this case, a graphic is created from each page which is then analyzed.

This mode needs considerably more time and memory, but it may be required for vectorized PDF files that use a barcode font and are not recognized by the option "View PDF files as images during processing".

In addition, this mode must be switched on for area detection so that the areas can be assigned without errors. If area detection is activated, this mode is activated automatically and must be switched off manually.

Difference between white and black values in the barcode


BarcodeOCR has to determine the threshold between white and black values in the document in order to separate the light areas of the barcode from the dark ones. There are two options for this.

  • Automatically

    • The threshold between white and black values in the document is detected automatically.
  • Iteratively

    • The recognition process initially starts at the threshold specified in the "Level" field (valid values 0 to 255), and attempts recognition with this setting.
    • If this proves unsuccessful, the threshold is reduced by the value in the "Step" field and recognition is tried again. Then, the threshold in "Level" is increased by "Step" and recognition is attempted once more.
    • The value in "Step" is doubled for the next run. The "Count" field indicates the number of runs. The iterative method is particularly useful if grey or coloured paper is involved.
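The order in which the iterative method tries thresholds can be sketched in Python. This is one reading of the description above, not BarcodeOCR's actual code: `candidate_thresholds` is a hypothetical helper, the first attempt at "Level" is assumed not to count toward "Count", and candidates outside the valid range 0 to 255 are assumed to be skipped.

```python
def candidate_thresholds(level, step, count):
    """Yield thresholds in the order the iterative method is described to try them."""
    yield level  # the first attempt uses "Level" unchanged
    for _ in range(count):
        for candidate in (level - step, level + step):
            if 0 <= candidate <= 255:  # assumption: out-of-range values are skipped
                yield candidate
        step *= 2  # "Step" is doubled for the next run


# Level 128, Step 10, Count 3
print(list(candidate_thresholds(level=128, step=10, count=3)))
# prints [128, 118, 138, 108, 148, 88, 168]
```

Because the step doubles on every run, a small "Count" already covers a wide band around "Level", which matches the stated use case of grey or coloured paper.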

Backing up input files

Ensure that your input files are backed up before testing new processing options or prerelease versions of BarcodeOCR. Make sure your documents are processed without faults before disabling the backup of your input files.

The input files are moved to a backup folder before processing, and a random string is attached to the filename, so that a backup of the original document is always at hand after processing. The backup folder is not cleaned automatically.

Barcodes

Barcode contains checksum

If this option is enabled, the contained checksum is removed.

Character set

Determines the character set of the output

Downstream application (per output file)

There are sample files in the folder "Script Samples" in the BarcodeOCR installation folder showing how to customize BarcodeOCR to your needs

BarcodeOCR can launch third-party software after a file has been processed successfully. That way, the file processing capabilities of BarcodeOCR can be extended by third-party applications or scripts.

Create an information file with extended barcode information for each output file

  • An XML file is created for each output file, containing information about the processed document (source file and destination file as well as value, position of the barcode, format, barcode type, etc.)
  • The file is saved in the output folder under the same name as the processed file, with the extension ".xml" appended


Downstream application after each recognized barcode

The application is executed after each successful save to the output or error folder. (This means it may be executed multiple times if the document has multiple pages.)

 

Please bear in mind that the application will run within the security context of the Windows service of BarcodeOCR and, in this case as well, access to the network must be specified with UNC paths. 


The following parameters are available for importing or moving:

Parameter            Description
-result              Result of scan (either error or success)
-configurationNAME   Name of the configuration that was executed
-source              Path to the source file
-destination         Path to the target file
-pageFrom            Start page of target file
-pageTo              Last page of target file
-errorReason         Reason why the file could not be processed (if result=error, otherwise blank)
-detectedBarcode     The barcode that was detected
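As an illustration, a downstream application could read the parameters listed above as follows. This Python sketch rests on assumptions that this page does not confirm: that each parameter arrives on the command line as a `-name value` pair, and that a Python script is registered as the downstream application. The functions `build_parser` and `summarize`, and all paths and values in the example call, are hypothetical. The sample files in the "Script Samples" folder show the actual calling convention.

```python
import argparse


def build_parser():
    """Declare the parameters from the table (assumed to be '-name value' pairs)."""
    parser = argparse.ArgumentParser(description="Hypothetical downstream handler")
    for name in ("result", "configurationNAME", "source", "destination",
                 "errorReason", "detectedBarcode"):
        parser.add_argument("-" + name, default="")
    parser.add_argument("-pageFrom", type=int)
    parser.add_argument("-pageTo", type=int)
    return parser


def summarize(args):
    """Build a one-line summary of the reported result."""
    if args.result == "success":
        return (f"{args.destination}: pages {args.pageFrom}-{args.pageTo}, "
                f"barcode {args.detectedBarcode}")
    return f"{args.source}: failed ({args.errorReason})"


# Hypothetical argument list, as the script might receive it.
example = ["-result", "success", "-configurationNAME", "Invoices",
           "-source", r"C:\Scans\in.pdf", "-destination", r"C:\Out\ER-12345.pdf",
           "-pageFrom", "1", "-pageTo", "3", "-detectedBarcode", "ER-12345"]
print(summarize(build_parser().parse_args(example)))
```

Since the application runs in the security context of the BarcodeOCR Windows service, any log file written by such a script should go to a location that this service account can write to.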


Downstream application (per input file)

The application is launched before the input file is removed and represents the last processing step. At this point, the processed file has already been moved to the error or output folder. The application is launched for each processed file.

 

Please bear in mind that the application will run within the security context of the Windows service of BarcodeOCR and, in this case as well, access to the network must be specified with UNC paths. 


The following parameters are available for importing or moving:

Parameter   Description
-source     Path to the source file
-files      List of saved documents with information about the filename, the barcode value, and whether the file was saved successfully.

The values of a single file are separated by commas (,) and the file entries are separated by semicolons (;).

Signature of a single entry:

Description: "filename.pdf","BarcodeValue","Successfully processed?"

Example: "ER-12345.pdf","ER-12345","True"

Complete example: "ER-12345.pdf","ER-12345","True";"AR-12345.pdf","AR-12345","True";"Duplicate.pdf","","False"
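The format above can be split with a small quote-aware parser. The following Python sketch is an illustration, not part of BarcodeOCR, and `parse_files_parameter` is a hypothetical helper. It assumes that every entry has exactly three quoted values and that the values themselves contain no double quotes.

```python
def parse_files_parameter(value):
    """Split the -files value into (filename, barcode, saved_ok) tuples."""
    entries, fields, current, in_quotes = [], [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes       # quotes delimit a value
        elif ch == "," and not in_quotes:   # comma ends a value
            fields.append("".join(current))
            current = []
        elif ch == ";" and not in_quotes:   # semicolon ends a file entry
            fields.append("".join(current))
            entries.append(fields)
            fields, current = [], []
        else:
            current.append(ch)
    if current or fields:                   # flush the last entry
        fields.append("".join(current))
        entries.append(fields)
    return [(name, barcode, ok == "True") for name, barcode, ok in entries]


files = ('"ER-12345.pdf","ER-12345","True";'
         '"AR-12345.pdf","AR-12345","True";'
         '"Duplicate.pdf","","False"')
for name, barcode, saved_ok in parse_files_parameter(files):
    print(name, repr(barcode), saved_ok)
```

Tracking the quote state, rather than splitting on every comma and semicolon, keeps filenames that contain a comma or semicolon intact.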


The service will be restarted after the last step of the wizard is completed. The configuration will be active afterward.

Starting a PowerShell script (per output file)

There are sample files in the folder "Script Samples" in the BarcodeOCR installation folder showing how to customize BarcodeOCR to your needs

After each processed file, a PowerShell script is started, which receives an extensive data structure with information about the recognition process.

An example script is installed automatically and can be found in the installation folder under "Script Samples" → "External Application - Powershell write all parameters to file.ps1".

The example script writes all transferred information to a text file and shows all possible data fields, for example:

  • the configuration used (name and unique ID)
  • the error reason (if there is no error reason, the file was processed successfully)
  • the barcode used
  • the processing time
  • Storage location of any backups created
  • Target path and any created subfolders
  • Number of pages and page numbers from the original document
  • and for each individual page:
    • page number
    • recognized text (if corresponding option is set)
    • detected barcodes (text, type and position)