Use your PC to Master Japanese and Chinese
Using Optical Character Recognition
At the present time, most text exists or is available only in printed
form stored on paper. Traditional methods of processing this text
involve annotating it (or a copy) with a pencil, or typing it into a
word processor. However, typing is exceedingly slow and error prone, hence
the need for an automatic method of converting printed text to
electronic text. This process is called optical character recognition (11-2),
or OCR. The process involves scanning a page to produce an electronic
image of the page, then processing the image to recognize the text it
contains. The recognized text is then checked against the original image,
errors are corrected, and the text is saved in a standard native file format.
Although full-page black-and-white scanned images are quite large
(2-3 megabytes), the resulting text is only about 1/1000 of that size.
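The size figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming an 8.5 x 11 inch page scanned at 400 dpi and stored uncompressed at one bit per pixel; the page size and the function name bilevel_image_bytes are illustrative assumptions, not values taken from any scanner.

```python
def bilevel_image_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel (black and white) scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row of pixels is padded to a whole number of bytes.
    row_bytes = (width_px + 7) // 8
    return row_bytes * height_px


# Assumed page: 8.5 x 11 inches at 400 dpi.
image_bytes = bilevel_image_bytes(8.5, 11, 400)
# The text estimates the recognized text at about 1/1000 of the image size.
text_bytes = image_bytes // 1000

print(image_bytes)  # 1870000
print(text_bytes)   # 1870
```

Under these assumptions the image is a little under 2 megabytes, and the text estimate of about 1,870 bytes corresponds to roughly 900 characters at two bytes per character.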
Although OCR programs for Roman-alphabet languages are fairly sophisticated,
Asian language OCR is still in its infancy. The technical challenges include
distinguishing among a character set roughly 200 times larger, and the need to
achieve essentially perfect accuracy. Consequently, Asian language OCR
currently involves considerable human interaction to produce acceptable results.
Optimal image contrast and brightness are critical to error-free recognition,
so scanning parameters must be adjusted to yield an optimal image for
recognition. Sometimes a compromise is required. After the recognition process
is run, the correction process checks the recognition results against the
original image, and corrects whatever zoning and character recognition errors
are noticed. The verification and correction process can be time
consuming. If there are a large number of errors caused by poor image quality,
it may be easier to re-scan and reprocess a page than to correct the errors.
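To see why brightness settings matter, consider how a gray-level scan becomes a black and white image. The sketch below models the brightness control as a single threshold, which is an assumption made for illustration and not a description of how any particular scanner or OCR program works; the pixel values in patch are hypothetical.

```python
def binarize(gray_rows, threshold):
    """Map 0-255 gray levels to 1 (ink) or 0 (paper) using one threshold."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


def ink_fraction(bits):
    """Fraction of pixels classified as ink."""
    total = sum(len(row) for row in bits)
    return sum(sum(row) for row in bits) / total


# Hypothetical patch: dark strokes (40), faint gray strokes (150), paper (230).
patch = [
    [230, 40, 40, 230],
    [230, 150, 40, 230],
    [230, 230, 150, 230],
]

for threshold in (100, 180):
    bits = binarize(patch, threshold)
    print(threshold, bits, ink_fraction(bits))
```

With the lower threshold the faint strokes drop out and characters may break apart; with the higher one they are kept, but on a real page smudges and show-through would be kept too. That is the compromise the text refers to.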
To use an Asian language OCR (11-2) program to import scanned text:
- Launch the OCR program. If the OCR program supports acquiring a page from
a scanner, use the File | Acquire or Scan selections to scan the
page. Otherwise, launch the scanner program separately.
- Position the page in the scanner to avoid noticeable image skew
(horizontal lines should be horizontal). A little skew is acceptable. Zoom the
image and set the scan contrast and brightness for the sharpest and clearest
image possible, and set the resolution to 400 dpi or so (higher resolution
costs more processing time and system capacity). Create the final image or
save a bitmap file in a format that the OCR program can open.
- Use the Recognize command to see what the OCR program makes
of the acquired image. Correct incorrect zoning, then correct incorrect
characters. It is essential to fix errors at this stage if you plan to
run the resulting text through a translation or annotation program: small
errors in characters cause big errors in translation as the machine becomes
hopelessly confused.
- Save Japanese files as Shift-JIS (D-7), and Chinese files as BigFive (D-1)
or GuoBiao (D-4). See Importing Native Files (7-4).
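The last step, saving in a native encoding, can be illustrated outside any OCR program. This is a minimal sketch using Python's built-in codecs; the mapping from the format names used above to the codec names shift_jis, big5, and gb2312 is an assumption, and the helper save_native and the sample strings are hypothetical.

```python
# Codec names are Python's; pairing them with the format names is an assumption.
NATIVE_CODECS = {
    "Shift-JIS": "shift_jis",
    "BigFive": "big5",
    "GuoBiao": "gb2312",
}


def save_native(path, text, format_name):
    """Write text to path in the named native encoding."""
    with open(path, "w", encoding=NATIVE_CODECS[format_name]) as handle:
        handle.write(text)


# Hypothetical sample strings, encoded in memory to show the byte counts.
samples = {"Shift-JIS": "日本語", "BigFive": "中文", "GuoBiao": "中文"}
encoded = {name: text.encode(NATIVE_CODECS[name]) for name, text in samples.items()}

for name, data in encoded.items():
    print(name, len(data), "bytes")
```

Each character takes two bytes in all three encodings, but the same Chinese text produces different bytes in BigFive and GuoBiao, which is why a file must later be opened in the code space it was saved in.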
Importing Native Files
You can import scanned files or plain text files from native word processors
directly into Smart Characters, or run them through a translation or annotation
step before importing them. If you add native fonts to Smart Characters, you
can open the files in their native code spaces. See the File Format (3-2)
dialog and Use Other Fonts (8-5).
- Select File | Open | Interpret File As to open a file in a native
code space and review it for accuracy. See Interpreting Native Formats (7-1).
If you have installed the ScAnnotate Automatic Annotator (11-2), select the
Translate | Add Annotations (3-35) command to create an annotated file in the
Combined (4-9) symbol set. If you have installed a full translation program,
select the Translate | Translate Window (3-35) command to launch it and create
a translation of the original document.
- Select File | Open | Convert File From to convert a text file in a
native code space to the Combined symbol set for hand annotation.
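The difference between interpreting a file in its native code space and converting it to another one can be sketched in general terms. The example below is not how Smart Characters works internally: it uses Python's codecs, and it uses UTF-8 as a stand-in target because the Combined symbol set is specific to the product. The function names interpret_as and convert are hypothetical.

```python
def interpret_as(data, encoding):
    """Decode raw bytes in the given encoding; return None if they do not fit."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def convert(data, source_encoding, target_encoding="utf-8"):
    """Re-encode bytes from one code space into another."""
    return data.decode(source_encoding).encode(target_encoding)


# Hypothetical file contents: Japanese text saved as Shift-JIS.
raw = "日本語のテキスト".encode("shift_jis")

print(interpret_as(raw, "shift_jis"))  # the original text
print(interpret_as(raw, "ascii"))      # None: the bytes are not plain ASCII
print(convert(raw, "shift_jis").decode("utf-8"))
```

A wrong choice of code space does not always fail this cleanly. Some mismatched encodings decode without error and produce the wrong characters, which is why the text recommends reviewing the opened file for accuracy.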
Apropos Customer Service home page 617-648-2041
Last Modified: March 23, 1996
Copyright © 1996 Apropos, Inc.