A new method for writer identification based on historical documents

Identifying the writer of a handwritten document has remained an interesting pattern classification problem for document examiners, forensic experts, and paleographers. While mature identification systems have been developed for handwriting in contemporary documents, the problem remains challenging from the viewpoint of historical manuscripts. Design and development of expert systems that can identify the writer of a questioned manuscript or retrieve samples belonging to a given writer can greatly help paleographers in their practice. In this context, the current study exploits the textural information in handwriting to characterize the writer from historical documents. More specifically, we employ oBIF (oriented Basic Image Features) and hinge features and introduce a novel moment-based matching method to compare the feature vectors extracted from writing samples. Classification is based on minimization of a similarity criterion using the proposed moment distance. A comprehensive series of experiments using the International Conference on Document Analysis and Recognition 2017 historical writer identification dataset reported promising results and validated the ideas put forward in this study.


Introduction
Identifying the writer of a handwritten document is an established task in forensic analysis, document examination, paleography, and information retrieval. Along with physical biometric identifiers such as fingerprints, face, and deoxyribonucleic acid, handwriting is considered a special case of behavioral biometrics in human identification [1]. With the recent advancements in pattern classification and machine learning, automatic writer identification systems have matured substantially [2][3][4][5][6][7][8]. These systems aim to capture the visual differences in the handwriting of different individuals. These differences typically include the variations in allographs, the slope of lines, the slant of characters, line spacing, inter- and intra-word spacing, legibility, cursiveness, and so on. These writer-specific attributes are extracted through computational structural or statistical features at paragraph, line, word, or sub-word levels. In recent years, data-driven feature learning using convolutional neural networks has also emerged as a popular choice among researchers for characterizing the writer from handwriting [9][10][11]. Among various features capturing the writing style, textural measures have remained an attractive and effective choice for many researchers. Commonly employed textural features include different variants of local binary patterns [12,13], the hinge features [14], curvature-free cloud of line distribution (COLD) features [15,16], histogram of oriented gradients [17], gray-level co-occurrence matrices [18], and the run-length features [16]. Writer identification using a codebook of graphemes [19,20], small writing strokes [21], and small windows around keypoints in handwriting [22] has also been investigated. Learning features from writing samples using ConvNets has also gained popularity in recent years [23,24].
In contrast to contemporary documents, identifying the hand that produced a manuscript is much more challenging in the case of historical documents [25]. Historical manuscripts often degrade over time and commonly suffer from noise, holes, ripping, tearing, and stains. In most cases, such documents are photographed using high-resolution cameras and are made available for research and development of computational methods. Identifying the scribe of historical documents can also be exploited to estimate the date and/or geographical origin of the manuscript [26]. Furthermore, information on scribes can be employed to match different fragments of the same manuscript and combine them into a complete document [27].
A number of writer identification competitions have been organized in conjunction with the International Conference on Frontiers in Handwriting Recognition [28][29][30] and the International Conference on Document Analysis and Recognition (ICDAR) [31][32][33]. In addition to these competitions targeting writer identification on modern datasets, dedicated competitions on the identification of scribes from historical documents have also been organized in conjunction with ICDAR 2017 [34] and ICDAR 2019 [35]. The winning system [25] of the ICDAR 2017 competition employed oriented Basic Image Features (oBIFs) [36] to characterize the writer. For the 2019 competition, the system based on pathlet and scale-invariant feature transform (SIFT) features outperformed the other submitted systems.
This article targets the problem of writer identification from historical manuscripts, extending our previous findings on this problem [13,25]. While most studies on such problems focus on the feature extraction part and employ standard matching techniques, we introduce a new moment-based distance to compare two writing samples. A combination of oBIF column histograms [25] and hinge features [14] is employed to map writing samples to feature vectors, which are subsequently compared using the proposed distance measure. An experimental study on the publicly available ICDAR 2017 dataset and a comparison with the existing techniques validate the effectiveness of the proposed method. The key highlights of this study are outlined as follows:
-Writer characterization from challenging historical manuscripts using oBIFs and hinge features.
-Introduction of a novel moment-based measure to compute the distance between two feature vectors.
-A comprehensive experimental study on publicly available historical documents.
-Promising performance in terms of writer identification rates outperforming the existing methods.
It is pertinent to mention that data-driven feature extraction using deep learning methods has emerged as a popular method for this problem in recent years [9][10][11]. These methods jointly train the feature extractor and the classifier, typically using different variants of ConvNets. We, on the other hand, employ the standard pattern classification pipeline of feature extraction followed by matching the query document with those in the reference base. A major motivation for this choice is that, although the amount of handwriting available per writer (class) in experimental datasets is sufficient to learn writer-specific features using deep learning methods, the amount of text per writer is fairly limited in most practical applications. In some cases, it could be a single line or only a few words. Standard visual features with conventional processing pipelines are more effective in such situations, which motivates the approach taken in the current research.
The content of this article is organized as follows. In Section 2, we discuss the recent advancements in writer identification with a prime focus on historical manuscripts. Section 3 introduces the textural features employed to characterize the writer along with the proposed moment distance. Experimental study, quantitative performance, and a detailed analysis of the reported results are presented in Section 4. Section 5 concludes the article with a discussion on key findings and insights into open research problems on this subject.

Related works
The problem of writer identification has been thoroughly investigated by the handwriting recognition community. A major factor contributing to this research attention has been the public availability of large handwriting datasets like IAM [37], RIMES [38], CVL [39], KHATT [40], and QUWI [41]. Despite these advancements, identifying the scribe from historical manuscripts remains an open problem, as discussed in Section 1. In recent years, however, several joint projects [42][43][44] between paleographers and researchers in pattern classification have resulted in increased acceptability of computerized solutions by domain experts [45].
A major challenge in the automatic characterization of writers is to identify the set of computational features that are able to capture writer-specific information from the samples under study. These can be a set of pre-defined hand-crafted features or can be learned from data using convolutional neural networks. With respect to historical documents, a number of studies investigate the features primarily employed for modern documents on historical manuscripts [46]. Gattal et al. [25], for instance, captured the textural information in handwriting by combining the oBIFs computed at multiple scales from binarized historical documents. Identification is carried out in the nearest-neighbor framework using a number of distance metrics, and an experimental study on the ICDAR 2017 Historical-WI dataset reported an accuracy of 77.39%. Likewise, Lai et al. [47] proposed pathlet and SIFT features for writer identification in historical documents. Pathlet and unidirectional SIFT features are extracted to capture rich shape (slant and curvature) and structural information (corners and junctions) in the handwriting. The extracted features are then encoded using a newly proposed bagged-vector of locally aggregated descriptors (VLAD) scheme. The method reported state-of-the-art performance on the ICDAR 2017 Historical-WI dataset and achieved the top performance in the ICDAR 2019 historical document reading challenges-image retrieval (HDRC-IR) competition.
Among other methods, Chammas et al. [48] trained a deep convolutional neural network (CNN) on small patches of handwriting extracted around SIFT keypoints. Features learned by the CNN are encoded through multi-VLAD and are normalized using the L2 norm. Classification with an exemplar support vector machine reported an accuracy of 97% on the ICDAR 2019 HDRC-IR dataset. In another study, Christlein et al. [49] used a deep residual CNN to learn effective feature representations using surrogate classes. These classes are obtained by applying clustering on the samples in the training set. Features learned by the CNN are subsequently employed for classification. In an extension of this study, Jordan et al. [50] employed the same features and introduced a re-ranking method to improve the retrieval performance. The re-ranking relies on k-reciprocal nearest neighbor relationships and was shown to significantly improve the performance on the ICDAR 2017 dataset.
In other notable works on historical documents, transfer learning on pretrained CNNs is employed in ref. [51] to identify the writer from images of a twelfth-century Bible. An extension of this study was the evaluation of pretrained CNNs on medieval documents [52]. Similarly, a number of pretrained CNNs are evaluated for multiple tasks, including dating and identification of writing styles from historical manuscripts. In another series of related studies, a detailed analysis of writer identification in historical documents was carried out in ref. [53], and the study was extended to handwriting on papyrus in ref. [54]. To handle the problem of data scarcity, Nasir et al. [55] proposed a two-step fine-tuning by first tuning the weights of a pretrained CNN on modern handwriting images in the IAM dataset and subsequently tuning it on the limited samples of handwriting on papyrus. While the work in ref. [55] extracts features from rectangular windows of handwriting obtained with dense sampling, patches around keypoints in handwriting are considered in ref. [56], resulting in enhanced performance. An overall writer identification rate of 64% is reported in ref. [56], as opposed to 54% in ref. [55], on a challenging set of 50 writing samples from ten different scribes.
A summary of notable studies on writer identification from historical manuscripts, primarily targeting the ICDAR datasets, is presented in Table 1. It can be observed that among hand-crafted features, textural features represent an attractive choice to capture the information on writing style and, hence, identify the writers [25,34]. Among machine-learned features, a common recent trend is to identify keypoints in handwriting (e.g., using SIFT), extract small patches using these keypoints, and employ these patches for feature learning through a CNN. It is also common to encode the extracted features where different variants of VLAD encoding have been investigated [47][48][49]. In our study, since the primary focus is on proposing an effective distance measure, we employ textural measures to extract features and employ a number of distance metrics to validate the superiority of the proposed metric. These details are presented in the next section.

Methods
This section introduces the proposed writer identification technique, which relies on extracting a set of features from the writing samples in the reference base and comparing them with those of the questioned document. An overview of the system is presented in Figure 1, while the feature extraction and the proposed moment-based matching method are discussed in the following subsections.

Feature extraction
Feature extraction is an important step in any image classification task. It maps the given images to points in the feature space so that images of the same class (writer in our case) cluster together. In our study, we have chosen to capture the curvature, contour, and texture information in the handwriting to characterize the writer. The corresponding computational features are oBIF column histograms and hinge features, detailed in the following.

oBIFs column histograms
In one of our recent studies [25], we investigated the effectiveness of oBIFs in identifying writers from historical documents and obtained promising results. We, therefore, employ oBIFs as one of the features to evaluate the performance of the newly proposed matching scheme. oBIFs are an extension of the Basic Image Features (BIFs) introduced in ref. [57]. Their computation involves applying a bank of derivative-of-Gaussian filters at multiple scales (controlled by the scale parameter σ). Each location in the image is assigned to one of seven predefined symmetry classes: dark line on light, light line on dark, dark rotational, light rotational, slope, saddle-like, or flat. BIFs were later extended to include local orientation information where meaningful. The representation can be further enriched by combining the oBIFs at two different scales, producing the oBIF column features. In our study, we investigate two combinations of scale parameters, σ = {2, 4} and σ = {2, 8}, yielding a feature vector of dimension 484. Furthermore, the parameter ε, which determines if a location is to be labeled as flat, is set to a small value of ε = 0.01. Computational details of oBIF columns can be found in our previous work [25].
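The way two per-scale label maps are combined into a column histogram can be illustrated with a minimal sketch. It assumes that each scale provides 22 non-flat oBIF labels per location and that locations labeled flat at either scale are skipped; both assumptions are consistent with the stated dimension of 484 = 22 × 22 but are not spelled out in the text. The function name, the `FLAT` marker, and the toy label maps are hypothetical, and the filter bank that produces the labels is not shown.

```python
from collections import Counter

N_LABELS = 22  # assumption: 22 non-flat oBIF labels per scale, so 22 * 22 = 484 bins
FLAT = -1      # assumption: marker used here for locations labeled "flat"


def obif_column_histogram(labels_a, labels_b, n_labels=N_LABELS):
    """Return the normalized joint histogram of label pairs from two scales."""
    counts = Counter()
    for row_a, row_b in zip(labels_a, labels_b):
        for a, b in zip(row_a, row_b):
            if a == FLAT or b == FLAT:
                continue  # skip locations that are flat at either scale
            counts[a * n_labels + b] += 1
    total = sum(counts.values())
    hist = [0.0] * (n_labels * n_labels)
    for index, count in counts.items():
        hist[index] = count / total
    return hist


# Toy label maps standing in for the oBIF labels computed at two scales.
labels_scale_2 = [[0, 5, FLAT], [21, 5, 3]]
labels_scale_4 = [[1, 5, 7], [21, FLAT, 3]]
hist = obif_column_histogram(labels_scale_2, labels_scale_4)
print(len(hist), sum(hist))
```

Indexing each label pair as `a * n_labels + b` gives every combination of labels across the two scales its own bin, which is what makes the column representation richer than two separate per-scale histograms.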

Hinge features
Among the various contour-based features reported in the literature, hinge [20] and delta-n hinge [14] features, designed to capture the ink-trace curvature, are known to be highly discriminative for different writers. The hinge feature [20] computes the joint probability distribution of the orientations of the two arms of a hinge, considering each pixel on the writing contour as the reference point. Parameters involved in the calculation of the hinge feature include the length of the arm r and the number of (angle) bins in the histogram p. The hinge feature was later extended to the delta-n hinge feature [14] to achieve rotation invariance. It simultaneously considers successive pixels at a fixed Manhattan distance and computes the probability of the angle derivative in both directions. This introduces two additional parameters, the Manhattan distance Δl and the number of derivatives n. In our study, we set p = 40 as suggested in ref. [58], n = 2, and the Manhattan distance Δl = 7. The generated feature vector is standardized to have zero mean and unit variance and is subsequently mapped to the interval [0, 1] by a normalization function V(x), where V(x) represents the normalized version of the feature vector x. Decision on the identity of the writer of a query document is made separately using oBIF column histogram and hinge features, and the individual decisions are subsequently combined to arrive at the final output. Details of classification using the newly proposed moment matching are presented in the following sections.
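The two-stage normalization can be sketched as follows. The standardization step follows the description above, whereas the function that maps the standardized vector to the unit interval is not reproduced in the text; the logistic function used here is purely a stand-in assumption, and the function names and the toy vector are hypothetical.

```python
import math
import statistics


def standardize(vec):
    """Shift and scale a feature vector to zero mean and unit variance."""
    mean = statistics.fmean(vec)
    std = statistics.pstdev(vec)
    if std == 0:
        return [0.0] * len(vec)  # constant vector: nothing to scale
    return [(v - mean) / std for v in vec]


def squash_to_unit_interval(vec):
    """Stand-in for V(x): a logistic squashing (assumption, not from the text)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


# Toy values standing in for a hinge feature vector.
hinge_vector = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(hinge_vector)
v = squash_to_unit_interval(z)
print(z)
print(v)
```

Any monotone mapping onto the unit interval would fit the description equally well; the logistic choice only serves to make the pipeline concrete.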

Moment-based distance
Distance metrics play an important role in comparing feature vectors and, eventually, in performing classification. A number of distance measures have been proposed in the literature to compute the dissimilarity between feature vectors [59]. Commonly employed metrics include the Euclidean distance, city block distance, correlation distance, cosine distance, and Spearman distance. In the current study, in addition to investigating standard metrics, we propose a novel matching method based on a moment distance, which is elaborated below.
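For reference, four of the standard metrics named above can be written in a few lines each. This is a minimal sketch using the usual textbook definitions; the function names and the toy vectors are hypothetical, and the Spearman distance (which requires ranking the vector entries) is omitted for brevity.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def city_block(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def correlation_distance(u, v):
    # Cosine distance between the mean-centered vectors.
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])


query = [1.0, 2.0, 3.0]
reference = [2.0, 4.0, 7.0]
for metric in (euclidean, city_block, cosine_distance, correlation_distance):
    print(metric.__name__, metric(query, reference))
```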
Moments have been widely employed in image analysis, pattern classification, object recognition, and image coding [60]. A feature vector can be considered as a discrete function f(x) with x = 0, 1, …, N, for which the moment of order k is defined. The first moment is the expected value of a random variable, and the second central moment is its variance. Likewise, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The moments about the mean are the means of the deviations from the mean after raising them to integer powers; the kth population moment about the mean is denoted by μ_k.
For matching, we extract the feature vector f_Q from the questioned document and compare it with all the vectors f_Ri in the reference base R. Matching is carried out using the proposed moment-based matching method. The kth moment about an arbitrary origin a is denoted by m′_k. Using equations (8) and (9), the kth moment employed for matching is then defined. In our study, we have (empirically) chosen the values k = 2, 4, 6, resulting in m′_2, m′_4, and m′_6, from which the final distance between two vectors is computed.
The distance between the query feature vector and all those in the reference base is computed using equation (11), and the writer of the query document is identified as the writer of the document in the reference collection that reports the minimum distance (nearest-neighbor framework). Moment distances are computed separately for the two sets of features (oBIF column histograms and hinge features), and the final decision is made by combining the individual decisions. Decisions can be combined using the product (Prod), sum (Sum), average (Avr), or minimum (Min) rules, and based on the findings of our previous study [25], in the present work, we combine the individual decisions using the Min rule.
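One way to make the matching step concrete is sketched below. It is an illustration under stated assumptions, not the formulation of equations (10) and (11): the element-wise differences between the query and a reference vector are treated as the deviations, their moments of orders 2, 4, and 6 are computed about the origin a = 0 as m′_k = (1/N) Σ d_i^k, and the distance is taken as the plain sum of the three moments. The function names and the toy reference base are hypothetical.

```python
def moment_about_origin(values, k, origin=0.0):
    """k-th moment of `values` about `origin`: mean of (v - origin) ** k."""
    return sum((v - origin) ** k for v in values) / len(values)


def moment_distance(f_q, f_r, orders=(2, 4, 6)):
    """Assumed distance: sum of even-order moments of the differences."""
    if len(f_q) != len(f_r):
        raise ValueError("feature vectors must have the same length")
    diff = [q - r for q, r in zip(f_q, f_r)]
    return sum(moment_about_origin(diff, k) for k in orders)


def identify_writer(f_q, reference_base):
    """Nearest-neighbor decision over (writer_id, feature_vector) pairs."""
    best = min(reference_base, key=lambda item: moment_distance(f_q, item[1]))
    return best[0]


# Toy reference base with two writers and one query vector.
reference_base = [
    ("writer_1", [0.1, 0.6, 0.3]),
    ("writer_2", [0.9, 0.05, 0.05]),
]
query = [0.2, 0.5, 0.3]
print(identify_writer(query, reference_base))
```

With even orders, every term of the sum is non-negative and the value is zero only for identical vectors, so the quantity behaves like a dissimilarity; the higher orders penalize a few large deviations more strongly than many small ones.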

Experiments and results
The experimental study of our system is carried out on the ICDAR 2017 Historical Writer Identification dataset [34]. The test set of this competition contains a total of 3,600 historical manuscripts with 720 unique writers, i.e., each writer contributed five pages. For consistency, we employ the same experimental protocol as that of the competition and quantify the performance using Top-1, Top-5, and Top-10 identification rates along with the mean average precision (mAP). Top-k refers to retrieving the best k solutions (writers in our case) against a query document and verifying if there is at least one correct answer in the retrieved hitlist [34].
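The evaluation measures can be sketched as follows, assuming that for every query the remaining documents are ranked by increasing distance and that the query itself is excluded from its ranked list. The function names and the toy rankings are hypothetical; average precision follows the usual retrieval definition and is normalized by the number of relevant documents found in the ranked list.

```python
def top_k_hit(ranked_writers, true_writer, k):
    """True if at least one of the first k retrieved writers is correct."""
    return true_writer in ranked_writers[:k]


def average_precision(ranked_writers, true_writer):
    """Mean of the precision values at each rank holding a correct writer."""
    hits = 0
    precision_sum = 0.0
    for rank, writer in enumerate(ranked_writers, start=1):
        if writer == true_writer:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(queries, ks=(1, 5, 10)):
    """queries: list of (true_writer, ranked_writers) pairs."""
    n = len(queries)
    rates = {k: sum(top_k_hit(r, t, k) for t, r in queries) / n for k in ks}
    mean_ap = sum(average_precision(r, t) for t, r in queries) / n
    return rates, mean_ap


# Two toy queries with their ranked hit lists (nearest first).
queries = [
    ("a", ["a", "b", "a", "c"]),
    ("b", ["c", "b", "a", "b"]),
]
rates, mean_ap = evaluate(queries)
print(rates, mean_ap)
```

In this toy case, the first query is identified at rank 1 while the second is only found at rank 2, so the Top-1 rate is 0.5 and the Top-5 rate is 1.0.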
In the first experiment, we aim to evaluate the performance of well-known handcrafted features on the challenging set of historical manuscripts. Using the competition experimental protocol, we implemented a number of well-known features applied to the writer identification problem on modern manuscripts to select the best subset of features. The investigated features include the oBIF column histogram, delta hinge features, local binary patterns (LBP), LBP column histogram, complete local binary patterns (CLBP), local binary pattern variance (LBPV), run-length features, edge direction and edge hinge features, and COLD features. The results of these experiments are summarized in Table 2. It can be observed in Table 2 that among the investigated features, the oBIF column histograms at σ = {2, 4} and σ = {2, 8} with ε = 0.1 outperform other features on the ICDAR 2017 dataset. Likewise, among other features, delta hinge features (with parameters r = 10 and p = 40) report a Top-1 identification rate of 71%. The performance of other textural measures like LBP, CLBP, and run-length features is relatively lower, reporting mAP values <0.5. Although these features are known to perform well on modern handwriting, lower performance on historical manuscripts reveals that for such challenging scenarios, more robust representations must be employed.
Once the top-performing features are identified, we evaluate the performance of these features using different distance metrics, including the proposed moment-based distance. The corresponding results are summarized in Table 3. It can be seen that by using the proposed moment-based distance, Top-1 identification rates as high as 77.36% and 75% are reported with the oBIF column histogram (f3) and delta hinge feature (f4), respectively. A similar trend can be observed for all the employed measures, i.e., Top-5 and Top-10 identification rates and mAP. Another interesting observation is that across all four investigated features, the moment distance, in general, outperforms other metrics, indicating that the matching generalizes and is not tuned to a specific set of features.
In addition to studying the performance of the features with respect to different distance metrics, we also investigated the combination of decisions of individual features to study how the proposed metric behaves when decisions are combined. Decisions are combined using the minimum of the sum of distances from features (Sum-Min), minimum of the product (Prod-Min), minimum of the average (Avg-Min), and minimum of the minimum (Min-Min) distances. Performance with the combination of decisions is summarized in Table 4, where it can be observed that, in general, the classification performance of the combination scheme based on the minimum of the product (Prod-Min) is relatively better than that of other combination methods. A Top-1 identification rate of 78.75% and an mAP of 58.62% are reported. Among the investigated combinations, the combination of the oBIF column histogram (f3) and delta hinge features (f4) reports the highest identification rates. It should, however, be noted that the objective is not only to find the best combination but also to study the evolution of performance with the newly proposed metric. In general, consistent performance is observed across different combinations, where no single combination significantly surpasses the others, supporting the generalization of the moment distance.
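The four combination schemes can be illustrated with a short sketch. It assumes that, for a given query, each feature provides one distance per reference document and that the per-feature distances are on comparable scales; the function name, the rule table, and the toy distances are hypothetical.

```python
import math

# Each rule merges the per-feature distances of one reference document.
RULES = {
    "Sum-Min": sum,
    "Prod-Min": math.prod,
    "Avg-Min": lambda d: sum(d) / len(d),
    "Min-Min": min,
}


def combine_and_identify(distances_per_feature, writers, rule):
    """Merge distances per reference, then pick the minimum combined value.

    distances_per_feature[f][i] is the distance between the query and
    reference document i computed with feature f.
    """
    combine = RULES[rule]
    combined = [combine(per_ref) for per_ref in zip(*distances_per_feature)]
    best = min(range(len(combined)), key=combined.__getitem__)
    return writers[best], combined[best]


# Toy distances of one query to three reference documents.
writers = ["w1", "w2", "w3"]
obif_distances = [0.40, 0.10, 0.20]
hinge_distances = [0.05, 0.50, 0.20]
for rule in RULES:
    print(rule, combine_and_identify([obif_distances, hinge_distances], writers, rule))
```

In this toy case, the product and minimum rules favor the reference that is very close under a single feature, whereas the sum and average rules favor the reference that is moderately close under both, which shows why the schemes can lead to different decisions.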
In the end, we also provide a performance comparison with methods (hand-crafted features) evaluated on the same dataset and using the same experimental protocol as that of our study. Although higher identification rates are reported in studies employing machine-learned features (using ConvNets), it is important to recall that the objective of this study is not to introduce novel features but to enhance the matching step. To show the effectiveness of the proposed moment-based distance, we therefore investigated two well-known hand-crafted features and studied the evolution of performance as a function of the distance metric. For the same reason, the comparison is restricted to techniques employing hand-crafted features. Table 5 shows that using the proposed moment-based matching, we achieve better performance as compared to those reported in ref. [25] as well as the winning system of the ICDAR 2017 competition [34]. Given the complexity of the problem, the reported identification rates are very promising and support the effectiveness of the employed features and the proposed moment-matching method.

Conclusion
We presented an effective technique for characterizing the writer from historical manuscripts. The technique relies on extracting oBIF column histograms and delta-hinge features from writing samples, and these features are matched using the newly proposed moment-based distance. A number of existing hand-crafted features are evaluated on the ICDAR 2017 Historical Document Writer Identification (Historical-WI) dataset, and the best-performing features are selected. A comprehensive study with different distance metrics and different decision combination schemes is also carried out. The reported results validated the effectiveness of the moment-based matching method in identifying writers from historical manuscripts. In our further study on this subject, we plan to investigate other categories of features and employ a formal feature selection strategy to identify the most discriminative subset of features for this problem. Furthermore, we also plan to extend this study to an unsupervised framework where manuscripts do not have writer labels and need to be grouped into clusters as a function of similarity in the writing style.
Funding information: The authors state no funding involved.

Conflict of interest:
The authors declare no conflict of interest.