Integrated Technique for Automated Digitization of Raster Maps/Processing

July 1, 2000, Vol. 1, No. 1

INTEGRATED TECHNIQUE FOR AUTOMATED DIGITIZATION OF RASTER MAPS
Serguei Levachkine and Evgueni Polchkov

(continued...)

3 Raster Map Processing

The main goal of raster map processing, the principal stage of automated vectorization of raster maps, is the recognition of cartographic images, i.e. the generation of vector layers and attribute information for electronic maps. From our point of view, the most promising line of software development is the creation of methods, algorithms and programs that focus on locating and identifying specific cartographic objects. Each cartographic image has its own graphical representation parameters, which can be used for automated object recognition on a raster map. The particular attributes depend on the topological class of the object. In traditional GIS, vector map objects are divided into three types: points, arcs and polygons, representing point, linear and area objects respectively. This classification can easily be extended to the analysis of cartographic images in raster maps. Objects are drawn on thematic maps as graphical symbols, which are the same for all objects in a given group. Graphical images carry geometric (location) and attribute (quantitative and qualitative parameters) information, which we combine to form the concept of a cartographic image. The main graphical coding attributes of cartographic images of the three classes are shown in Table 4.

Object type	Element	Graphical representation attributes
Point		Shape, Size, Symbol
Arc		Type, Color, Thickness
Polygon	Area: Fill	Color
Polygon	Area: Crosshatching	Type, Color, Thickness, Angle, Density
Polygon	Area: Crape	Shape, Size, Pattern, Density
Polygon	Outline	Type, Color, Thickness

Table 4. Main attributes used for graphical coding of cartographic images.

The classification of cartographic images is different when the vectorization of raster maps is considered. All objects on a raster map have area, and in this sense they are all polygons. It is not easy to reduce the graphical coding elements of cartographic images to elements that correspond to the geometric categories "point" (a coordinate pair), "line" (a sequence of coordinate pairs) and "polygon" (a closed set of non-intersecting line segments that forms the border of a geometrical figure). Nevertheless, the classification into point, linear and polygonal objects must be preserved, because the extent of a cartographic image in one or two directions (giving a line or a point respectively) can be neglected relative to the extent of the map field. A recognition program that recognizes point objects, for example, does not have to distinguish between a point cartographic image proper and an element of a polygon fill pattern.

Note that there may be other graphical objects involved in recognition of cartographic images, which are nearly always absent from raster maps. Principally, these are letters and digits (toponyms and quantitative and qualitative characteristics of objects). Additionally, there may be other graphical elements in the map (footnotes to lines, insets, etc.). It is thus convenient to use the classification presented in Table 5.

Type of object recognized	Cartographic images and their elements
Point	Symbols of point objects; elements of polygon fill patterns
Arc	Symbols of linear objects; explicit polygon borders; crosshatched lines
Polygon	Symbols of polygonal objects with implicit borders given by: solid fill; crosshatching; pattern fill
Text	Toponyms; altitude marks; road distances; parameter values on the contour lines; tags on geodetic points, hydrometric monitoring posts, etc.
Additional graphics	Text footnotes; guides; tick-marks on contour lines; insets, etc.

Table 5. Cartographic object classification from the point of view of automated vectorization.

An important element of an automated raster map digitization technique is the development of an optimal sequence for cartographic image recognition, which successively eliminates already decoded elements from the raster map field and restores images that were hidden by the eliminated elements. The basic principle of this ordering must be "from simple to complex". Nevertheless, the recognition strategy must allow for the use of information from objects already digitized (whether manually or by an automated system). For example, the point layer of hydrological monitoring posts can be used for recognition of linear elements of the river network. Moreover, the symbols for these posts generally cover parts of the river images, complicating automated identification of the rivers. It follows that hydrological monitoring posts must be vectored before the river network is digitized. Once they are eliminated from the raster map, their locations and attribute data (mainly altitude marks) can be used to aid in recognition of elements of the river network.
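
The ordering constraint in this example can be sketched as a dependency graph. The following Python fragment is only an illustration under our own assumptions: the layer names and the function recognition_order are hypothetical, and a topological sort is one possible way to derive an order, not a method prescribed by the technique itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical layer names. Each layer maps to the set of layers that must be
# vectored (and eliminated from the raster) before it can be recognized.
dependencies = {
    "hydrological_posts": set(),
    "river_network": {"hydrological_posts"},
}

def recognition_order(deps):
    """Return one order in which every layer follows all of its prerequisites."""
    return list(TopologicalSorter(deps).static_order())

order = recognition_order(dependencies)
```

With these two layers, the posts necessarily precede the river network. Further layers and prerequisites would be added to the dictionary in the same way.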

Developing this approach further, we suggest using existing small scale vector maps for recognition of the corresponding cartographic images on large scale maps. A small scale map contains generalized (in a broad sense) information about a considerable proportion of the objects on the corresponding large scale map. As a rule, the generalization involved in decreasing the map scale consists in the simplification of the geometric shape of the object and the elimination of a part of the object. For example, a river is represented by a polygon on a large scale map, whereas on a small scale map it is represented by a line. In general, a given object can be expected to change in topological type when the degree of generalization changes. Even if the topological type of an object is preserved after generalization, several objects on a large scale map may correspond to a single object on a small scale map. Examples of the correspondence between the objects in maps of different scales are presented in Table 6.

Object type (small scale)	Object type (large scale)	Example (small scale)	Example (large scale)
Point	Point	Altitude marks	Altitude marks
Point	Line	Out-of-scale irrigation block	Irrigation channels
Point	Polygon	Out-of-scale populated place	Territory of populated place
Line	Point	Dotted lines	Separate dots
Line	Line	Electric transmission line	Electric transmission line
Line	Polygon	Line of riverbed	River water area
Polygon	Point	Watershed	Separate dots
Polygon	Line	Watershed	Dotted lines
Polygon	Polygon	Bog	Bog region

Table 6. Correspondence of cartographic objects between maps of different scales.

The use of small scale maps solves a difficult problem in automated digitization: the search for objects in the whole raster map field. In this case, a vectored object can be found in the immediate neighborhood of its generalized analogue, and nowhere else.

The search zone for paired point objects can be restricted to a circle whose radius is defined by the ratio between the scale of the map being vectored and the scale of the small scale map used.
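
The text does not give a formula for this radius. As a hedged sketch, one could assume that the positional uncertainty of a point on the small scale map is a fixed tolerance in map millimetres and convert it to ground units with the scale denominator. The function names and the default tolerance of 0.5 mm below are our own assumptions for illustration.

```python
import math

def search_radius_m(small_scale_denominator, tolerance_mm=0.5):
    """Ground radius (metres) for a tolerance given in millimetres on the small scale map.

    Assumption: the tolerance value is hypothetical and would be chosen by the user.
    """
    return tolerance_mm * small_scale_denominator / 1000.0

def candidates_within(center, points, radius):
    """Keep only the candidate points lying inside the search circle."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

# Example: a point taken from a 1:250,000 map, candidates in ground metres.
radius = search_radius_m(250000)
found = candidates_within((0.0, 0.0), [(100.0, 0.0), (0.0, 130.0), (90.0, 90.0)], radius)
```

Only candidates inside the circle are passed on to the recognition step. Everything outside it is ignored.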

We suggest the use of the "caterpillar" algorithm (the name reflects the shape of the construction when the algorithm is drawn) to search for paired linear objects. The caterpillar algorithm constructs a system of line segments perpendicular to the contour of the small scale analogue, each bisected by that contour. The length of each segment can be chosen from the ratio of the scales of the maps used, and the density of the segments from the curvature of the generalized line. The object sought is then located along the segments constructed in this way, which yields the sequence of reference points of the curve sought. The reference points obtained can be joined by straight line segments using the tools of an interactive digitization system, without any intervention by the operator.
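
A minimal sketch of the caterpillar construction is given below, under simplifying assumptions of our own: one segment is built at every vertex of the generalized line (rather than with a curvature-dependent density), the segment half-length is passed in directly, and the raster test is abstracted as a user-supplied predicate is_object. All function names are hypothetical.

```python
import math

def caterpillar_segments(polyline, half_length):
    """Build one segment per vertex, perpendicular to the line and bisected by it."""
    segments = []
    n = len(polyline)
    for i, (x, y) in enumerate(polyline):
        # Estimate the local direction from the neighbouring vertices.
        x0, y0 = polyline[max(i - 1, 0)]
        x1, y1 = polyline[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty)
        if norm == 0.0:
            continue  # degenerate neighbourhood, no direction available
        nx, ny = -ty / norm, tx / norm  # unit normal to the line
        segments.append(((x - half_length * nx, y - half_length * ny),
                         (x + half_length * nx, y + half_length * ny)))
    return segments

def locate_on_segment(segment, is_object, steps=20):
    """Sample the segment and return the hit closest to its midpoint, or None."""
    (ax, ay), (bx, by) = segment
    mx, my = (ax + bx) / 2.0, (ay + by) / 2.0
    hits = []
    for k in range(steps + 1):
        t = k / steps
        p = (ax + t * (bx - ax), ay + t * (by - ay))
        if is_object(p):
            hits.append(p)
    if not hits:
        return None
    return min(hits, key=lambda p: math.hypot(p[0] - mx, p[1] - my))

# Example: the generalized line runs along y = 0, the sought line along y = 1.
generalized = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
segments = caterpillar_segments(generalized, 2.0)
reference_points = [locate_on_segment(s, lambda p: abs(p[1] - 1.0) < 0.11)
                    for s in segments]
```

The resulting reference points, taken in order, form the polyline of the sought object. In a real system the predicate would test raster pixels against the graphical attributes of the line being searched for.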

Automated cartographic image recognition is simplified and its reliability increased by the use of corresponding vector layers of a small scale map for digitization of isolines and other regular systems of linear objects (such as the coordinate grid, or urban blocks with linear or radial planning). But in this case not all lines of a large scale map have small scale analogues. For example, the contour lines on a vectored 1:50,000 map may have a 10 m interval, while on the corresponding 1:250,000 map they have a 50 m interval. In such a case, the contour lines that have counterparts in the generalization (0, 50, 100, etc.) are vectored first by the caterpillar algorithm. Next, the "stair" algorithm (the name again reflects the shape of the construction when the algorithm is drawn) is applied for the recognition of the intermediate contour lines. The stair algorithm constructs a system of curves between each adjacent pair of already vectored contour lines, perpendicular to each of these contour lines. The density of these curves is defined by the curvature of the basic lines, just as in the caterpillar algorithm. The points of the intermediate contour lines sought are then located along the curves constructed in this way. Between two index contour lines, the number of additional lines to be found is well-defined (for example, between two contour lines of 100 and 150 m, four additional contour lines of 110, 120, 130 and 140 m always exist and can be found). Once all the necessary reference points have been found, they can be joined in succession using the same program tools as in the caterpillar algorithm.
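
The count of intermediate contour lines in this example can be checked with a short sketch. The second function is an additional assumption of ours: it replaces the connecting curve by a straight line between paired points on the two index contours and interpolates linearly to obtain initial guesses for the search. It does not reproduce the perpendicular curves of the stair algorithm, and all names are hypothetical.

```python
def intermediate_levels(lower, upper, interval):
    """Contour values strictly between two already vectored index contours."""
    count = round((upper - lower) / interval) - 1
    return [lower + k * interval for k in range(1, count + 1)]

def expected_positions(p_lower, p_upper, lower, upper, interval):
    """Initial guesses for intermediate contour points on a straight connector.

    Assumption: linear interpolation between the paired points; the true
    positions must still be found on the raster near these guesses.
    """
    guesses = []
    for level in intermediate_levels(lower, upper, interval):
        t = (level - lower) / (upper - lower)
        x = p_lower[0] + t * (p_upper[0] - p_lower[0])
        y = p_lower[1] + t * (p_upper[1] - p_lower[1])
        guesses.append((level, (x, y)))
    return guesses

# Example from the text: index contours at 100 m and 150 m, 10 m interval.
levels = intermediate_levels(100, 150, 10)
guesses = expected_positions((0.0, 0.0), (50.0, 0.0), 100, 150, 10)
```

Because the number of intermediate lines is known in advance, a search along a connecting curve that finds fewer or more crossings than expected can be flagged for the operator.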

The sequence of reference points of a vectored linear object can be copied from the layer which contains the corresponding point objects. For example, shoreline structures (hydrometric monitoring posts, bridges, docks etc.) can be used as reference points to digitize the contours of rivers and lakes. The hydrometric monitoring posts are particularly useful here. Their coordinates and attribute data (name of the river or lake and altitude mark) can be used in automated recognition algorithms for the elements of the hydrological network on the raster map. Note that in this case automated digitizing reverses the order of operations compared to traditional techniques. Traditionally, the operator first digitized the hydrological network manually, and then vectored the location points of the shoreline structures using vector editing tools.

In other words, maximal use of already existing information (directly or indirectly related to the vectored objects), adopted as a general principle of automated cartographic image recognition, can increase efficiency and reliability. For example, algorithms that use digital models of a region, and that are based on small scale maps, can be produced for digitization of the hydrological network. If the layers are already vectored, they can be used to generate the sequence of reference points of the curves to be recognized; otherwise these points can be indicated manually as described above. This simplifies automated digitization and increases its reliability.

Summarizing the processing of raster maps, we note that the methods and algorithms used for this process must provide complete, even redundant cartographic image recognition in order to eliminate erroneous recognition of objects, since visual control and correction of the vector layers can be carried out more quickly than manual digitization of missed objects.

To conclude the discussion in this section, we note that the process of automated cartographic image recognition (processing) should, from our point of view, follow the scheme presented in Table 7, where, as before, experts assign scores indicating the degree of possible automation of the various steps.

Operation	Score
Development of strategy for automated digitization of raster maps
Elaboration of sampling matrices of raster maps
Classification of recognized objects	-
Selection of the size, pattern and color fill of basic sampling matrices of raster maps	75
Estimation of statistical weights of separated elements of the cartographic images	75
Recognition of cartographic images
Digitization of objects which have vector analogues	75
Digitization of objects which do not have vector analogues	50
Elimination of superfluous recognized objects	-
Recognition of attribute data of vectored objects
Classification of attribute information carriers	-
Location and identification of attribute information	75
Correction of errors in attribute data recognition	75
Elimination of recognized images from raster map
Restoration of image covered by recognized object	75
Correction of restored image	75

Table 7. The degree of possible automation of the processing operations.


Dirección General de Servicios de Cómputo Académico-UNAM
Ciudad Universitaria, México D.F.