Classification of cow behavior patterns using inertial measurement units and a fully convolutional network model

In this study, we aimed to classify 7 cow behavior patterns automatically with an inertial measurement unit (IMU) using a fully convolutional network (FCN) algorithm. Behavioral data of 12 cows were collected by attaching an IMU in a waterproof box on the neck behind the head of each cow. Seven behavior patterns were considered: rub scratching (leg), ruminating-lying, lying, feeding, self-licking, rub scratching (neck), and social licking. To simplify the data and compare classification performance with or without magnetometer data, the 9-axis IMU data were reduced using the square root of the sum of squares to develop 2 datasets. Comparing the classification accuracy of the 3 models using a window size of 64 with 6-axis data and a window size of 128 with both 6-axis and 9-axis data, the best overall accuracy (83.75%) was achieved using the FCN model with a window size of 128 (12.8 s) using all IMU data. This model achieved classification accuracies of 83.2, 96.5, 92.8, 98.1, 82.9, 87.2, and 45.2% for ruminating-lying, lying, feeding, rub scratching (leg), self-licking, rub scratching (neck), and social licking, respectively. Because they consist of varied and intensive movements, the behavior patterns related to skin disease were classified with lower accuracy; better classification of these behavior patterns could be achieved with full IMU data and a larger window size. In the future, additional data types, such as audio and video data, will be incorporated to further enhance performance. In addition, an adaptive sliding window size will be used to improve model performance.


INTRODUCTION
Useful information can be obtained for improving livestock management and animal welfare by observing and analyzing cattle behavior (Vasseur et al., 2012). Traditionally, cattle behavior has been evaluated by human observation; however, this approach is labor and time intensive, sensitive to human error, and requires subjective judgment.
With the advent of sensors and digital technology, automated animal behavior monitoring systems are rapidly evolving, offering the potential for accurate and continuous monitoring of animal behavior. In previous studies, 3-axis accelerometers, inertial measurement units (IMU), or ear tag sensors (Kour et al., 2018; Achour et al., 2022; Chang et al., 2022, respectively) were attached to the legs, back, neck, head, or ears of cattle to monitor their behavior (Rahman et al., 2018; Achour et al., 2019). Behavior classification algorithms based on decision trees (González et al., 2015; Andriamandroso et al., 2017; Arcidiacono et al., 2017), K-nearest neighbors (Benaissa et al., 2019; Balasso et al., 2021), and support vector machines (Shen et al., 2020) have been proposed to classify feeding, ruminating, resting, and standing behavior patterns by collecting triaxial acceleration data of cattle neck movement. Some deep learning algorithms, such as a convolutional neural network (CNN) with a SoftMax classifier (Kasfi et al., 2016) and a recurrent neural network with a long short-term memory model (Peng et al., 2019, 2020), have been used to classify multiple cattle behavior patterns with an attached acceleration sensor or IMU sensor. Recently, a model for classifying 9 cattle behavior patterns, using a sequential deep neural network combined with a time-frequency domain joint data representation, was trained on IMU data (Hosseininoorbin et al., 2021). However, processing speed, the key to real-time animal monitoring, was rarely considered as a main indicator of the classification models mentioned above. The majority of these behavior classification models used all 9 axes of IMU data to improve classification accuracy, but this makes it challenging to achieve real-time behavior classification due to the quantity of data. In this study, we aimed to establish a classification method for multiple cattle behavior patterns with high efficiency and accuracy.
A fully convolutional network (FCN) has the potential to simplify and speed up the learning and inference of networks (Long et al., 2015). The FCN is a special case of the CNN that trains end-to-end with a smaller number of parameters in the network. In addition, this model achieves high accuracy because of its use of a global average pooling layer (Zhou et al., 2016). The class activation map method (Wang et al., 2017) can be used to highlight which parts of the input time series contribute most to a certain classification. The FCN algorithm has been used for the analysis of time series data, such as classifying surgical skills using kinematic data (Ismail Fawaz et al., 2020). As a consequence, the FCN model was suitable for the classification task using the time series cattle movement data in this study.
In this study, cattle behavior patterns were classified using the FCN model. To improve the effectiveness of model performance, IMU data dimensions were reduced during preprocessing. The dimensionality of the input data was reduced by using the square root of the sum of squares instead of individual triaxial acceleration, angular velocity, and magnetic field data. Two classification models using the FCN are proposed to classify multiple cattle behavior patterns: the first uses 3-column data computed from all 9 axes of IMU data, and the second uses 2-column data computed from the 3-axis acceleration and 3-axis angular velocity data. In addition, we compared classification performance between 2 window sizes to determine the effect of different time windows. The classification performance of the 3 models was evaluated using accuracy and F1 score, which is a measure of a test's accuracy.

Location and Animals
Cattle movement data were collected from a group of 12 Junlian Yellow cattle (aged between 3 and 4 yr) that are indigenous to the southern Sichuan mountains. Experiments were conducted at an animal feeding farm in Sichuan, China, from November 28 to December 12, 2020, and from June 26 to July 6, 2021. As shown in Figure 1, the 12 cows were placed in a barn and raised in semi-open single pens. Each cow could only feed from its own trough. Ten of the cows were pregnant, 2 of which had skin ringworms caused by fungal infections. All cows had access to water and were fed according to Chinese Ministry of Agriculture standard NYT/815/2004, which is a beef cattle feeding standard. This standard specifies the requirements of dietary DMI, net energy, and digestible CP, mineral elements, and vitamins for beef cattle. This study was approved by Sichuan Agricultural University Institutional Animal Care and Use Committee (Approval No. 20200057).

Collar IMU System
The sensor and battery were packaged in a waterproof box marked with numbers to identify the corresponding cows. A 9-axis IMU sensor (WT901SDCL, Shenzhen Vite Intelligent Technology Co. Ltd.), shown in Figure 2a, was used, and a lithium polymer battery powered the sensor. A box was attached to each cow's neck with an adjustable collar, keeping the box above the neck, as shown in Figure 2b. This location has been demonstrated to be effective for recording changes in cow behavior (Peng et al., 2019, 2020). The IMU sensor collected accelerometer data at 10 Hz, which were stored on a 16 GB micro-SD card. The battery lasted 10 d with normal use. Four infrared cameras (MVM3150-B11, Shenzhen Weiduan Trading Co. Ltd.) were installed in 4 positions in the barn, from where they could record all the activities of the 12 cows, 24 h/d, as shown in Figure 1. The video footage was used to annotate IMU data and verify the classification performance.

Data Acquisition and Analysis
Data Acquisition. The sensor and video data were downloaded weekly at the same time as the battery was replaced. The camera and IMU times were synchronized.
Behavior Definition. In the current study, ruminating-lying, lying, feeding, rub scratching (leg), self-licking, rub scratching (neck), and social licking were the behavior classes used to develop the classification model. Thirteen behavior patterns were observed and labeled initially based on the synchronized video; ultimately, 7 behavior patterns were defined as the target behavior patterns for this study because of the low frequency and small sample sizes of some behavior patterns. Rub scratching (leg) behavior was included even though it rarely occurred, because it is an important indicator for detection of skin disease. The individual behavior patterns are defined in Table 1 and are based on previous studies (Martiskainen et al., 2009; Arcidiacono et al., 2017; Peng et al., 2019) and personal observations.

CNN and FCN.
A CNN is a feed-forward artificial neural network whose artificial neurons respond to the surrounding units; it exhibits excellent performance for data feature extraction. Generally, the network can be divided into 2 parts: feature extraction and classification. For feature extraction, the CNN consists of n_3 stacked convolutional blocks, each of which corresponds to n_2 convolutional layers (con), an activation function (σ), and 1 pooling layer (δ). The feature extraction part is linked to the classification part, which is made up of n_1 fully connected (FC) layers (denoted λ) and s, the SoftMax (or prediction) layer, which has |Z| output classes. The following formula can be used to describe a typical CNN (Zaid et al., 2020):

s ∘ [λ]^(n_1) ∘ [δ ∘ [σ ∘ con]^(n_2)]^(n_3)(x), [1]

where con is implemented by the following function:

con(x)(t) = Σ_k w_k x(t − k) + b, [2]

where x(t) is the time series, w_k represents the weight of each convolution kernel, and b represents the relevant bias value. The activation function (σ), such as rectified linear units (ReLU), can be expressed as

σ(x) = max(0, x). [3]

We proposed FCN models to overcome the vanishing and exploding gradients that make deep CNN models prone to overfitting. The structure of the FCN is shown in Figure 3. The network consists of 3 convolutional layers as basic blocks, each of which performs a nonlinear transformation of the input time series and is followed by a ReLU activation operation. After the 3 convolution blocks, the extracted features were input to a global average pooling (GAP) layer, which replaced the traditional final FC layer; notably, this reduces the number of weights. We performed classification using a SoftMax classifier, which is an activation function with a number of neurons equal to the number of classes in the dataset. An additional advantage of GAP layers is their natural extension, the class activation map (CAM), which can be used to determine which regions of the data contribute the most to the identification of a particular class.
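As a concrete illustration, the forward pass of such an FCN (stacked convolutional blocks with ReLU, a GAP layer over time, and a SoftMax output) can be sketched in NumPy. The filter counts and kernel sizes below are illustrative assumptions, not the hyperparameters of this study (which are not given in this excerpt), and batch normalization is omitted for brevity.

```python
import numpy as np

def relu(x):
    # sigma(x) = max(0, x)
    return np.maximum(0.0, x)

def conv1d(x, w, b):
    # x: (channels_in, T); w: (channels_out, channels_in, k); b: (channels_out,)
    c_out, c_in, k = w.shape
    T = x.shape[1] - k + 1  # "valid" convolution length
    out = np.zeros((c_out, T))
    for o in range(c_out):
        for t in range(T):
            out[o, t] = np.sum(w[o] * x[:, t:t + k]) + b[o]
    return out

def fcn_forward(x, params):
    # three conv blocks, each convolution followed by ReLU
    for w, b in params["convs"]:
        x = relu(conv1d(x, w, b))
    feats = x.mean(axis=1)             # global average pooling over time
    logits = params["w_out"] @ feats + params["b_out"]
    e = np.exp(logits - logits.max())  # numerically stable SoftMax
    return e / e.sum()                 # probabilities over the behavior classes
```

For a 3-column input window (e.g., A, G, M over 128 samples), `fcn_forward` returns a probability vector over the 7 behavior classes.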
Let F_k(t) represent the output of the last convolutional layer, which is a multivariate time series with k channels, and let w_k^y be the weight corresponding to the output neuron of class y for the k-th filter. Since we use a GAP layer, the input of the final SoftMax function for class y can be defined as

z_y = Σ_k w_k^y Σ_t F_k(t). [4]

Table 1. Definitions of the behavior patterns

Lying: The cow lies on the ground without ruminating.
Feeding: The cow puts its head into the feed pad and starts eating.
Rub scratching (leg): The cow stands and uses its back legs for scratching, using whole-body movement.
Self-licking: The cow turns its neck and licks its own back.
Rub scratching (neck): The cow stretches its neck violently up and down or forward and backward.
Social licking: The cow licks the surface of the head and neck of another cow in the communal space between 2 pens.
Finally, we computed the class activation map that explains the classification as label y (CAM_y) by the following formula:

CAM_y(t) = Σ_k w_k^y F_k(t). [5]

Data Preprocessing and Behavior Classification Using the FCN Model. The data analysis process, from preprocessing to behavior pattern classification, is shown in Figure 3. Preprocessing and data annotation were similar to the methods described by Peng et al. (2019, 2020). In preprocessing, erroneous IMU data caused by occasional data packet loss during the use of a micro-SD card were manually removed. The dataset used in the entire experiment consisted of IMU sensor data and video data collected from the cows. The IMU data were labeled as 1 of the 7 behavior patterns if the data matched the corresponding video activity according to the definitions in Table 1. After preprocessing and data labeling, the IMU data for each behavior pattern were ready for classification and prediction.
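A minimal sketch of the CAM computation: given the last convolutional layer's output F (one channel per filter, over time) and the weights for a class y, the weighted sum over channels highlights which time regions drive that class. The min-max rescaling for visualization is our addition, not something specified in the text.

```python
import numpy as np

def class_activation_map(F, w_y):
    # F: (k, T) output of the last convolutional layer
    # w_y: (k,) weights linking each filter to the output neuron of class y
    cam = w_y @ F  # CAM_y(t) = sum_k w_k^y * F_k(t), shape (T,)
    # rescale to [0, 1] so the map can be overlaid on the input series
    cam = cam - cam.min()
    rng = cam.max()
    return cam / rng if rng > 0 else cam
```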
To improve training efficiency, we dimensionally reduced the 9-axis data using the following formulas:

A = √(a_x² + a_y² + a_z²), [6]
G = √(g_x² + g_y² + g_z²), [7]
M = √(m_x² + m_y² + m_z²), [8]

where A denotes the square root of the sum of the squares of the triaxial acceleration (a_x, a_y, a_z); G denotes the square root of the sum of the squares of the triaxial angular velocity (g_x, g_y, g_z); and M denotes the square root of the sum of the squares of the triaxial magnetic field data (m_x, m_y, m_z). After reducing the dimension of the data, we used A, G, and M as dataset 1 instead of the 9-axis IMU data. In several previous studies, only acceleration and angular velocity were selected for behavior classification; we therefore used dataset 2, which contained only A and G, for comparison with dataset 1.

Figure 3. The structure of the proposed fully convolutional network (FCN), the data analysis process, and FCN classifier construction for cattle behavior, where ReLU denotes rectified linear units; CONV1, CONV2, and CONV3 denote the first, second, and third convolutional layers; A denotes the square root of the sum of the squares of the triaxial acceleration (a_x, a_y, a_z); G denotes the square root of the sum of the squares of the triaxial angular velocity (g_x, g_y, g_z); and M denotes the square root of the sum of the squares of the triaxial magnetic field data (m_x, m_y, m_z).
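The dimensionality reduction described above can be sketched as follows; the column ordering of the raw 9-axis samples (acceleration, then angular velocity, then magnetic field) is an assumption for illustration.

```python
import numpy as np

def reduce_imu(samples):
    # samples: (N, 9) array with columns [ax, ay, az, gx, gy, gz, mx, my, mz]
    A = np.linalg.norm(samples[:, 0:3], axis=1)  # acceleration magnitude
    G = np.linalg.norm(samples[:, 3:6], axis=1)  # angular-velocity magnitude
    M = np.linalg.norm(samples[:, 6:9], axis=1)  # magnetic-field magnitude
    dataset1 = np.column_stack([A, G, M])  # 3-column dataset 1 (with magnetometer)
    dataset2 = np.column_stack([A, G])     # 2-column dataset 2 (without magnetometer)
    return dataset1, dataset2
```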
After preprocessing, data annotation, and dimension reduction, the 2 datasets for each behavior pattern contained 20,000 rows of data, which we used to develop the classification models. The information and relationships between the total sample size, each behavior data row, and the number of axes of 2 window sizes are shown in Table 2.
To compare the performance of classification models with different window sizes and different data sizes from IMU sensors, we sliced dataset 1 into 2 window sizes, 64 (6.4 s) and 128 (12.8 s), and we sliced dataset 2 into window size 128 (12.8 s). Consistent sample sizes for the 7 behavior patterns were selected as the input dataset and, to increase the sample size, the data were sliced with partial overlap: each successive window added 10 rows of new data and retained the data from the previous window. Before inputting all IMU data labeled with the 7 behavior patterns into the classifier, they were mixed and randomly shuffled 20 times to prevent model overfitting, ensuring that batches were more representative of the entire dataset. We allocated 70% of the randomly selected data to training and 30% to testing.
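The windowing and splitting procedure above can be sketched as follows. Assigning each window the annotation of its first row is our assumption (the text only states that windows within one behavior class overlap by all but 10 rows), and the single shuffle with a fixed seed stands in for the repeated shuffling described.

```python
import numpy as np

def make_windows(data, labels, window=128, step=10):
    # slice with partial overlap: each window starts 10 rows after the previous one
    X, y = [], []
    for start in range(0, len(data) - window + 1, step):
        X.append(data[start:start + window])
        y.append(labels[start])  # assumed: window labeled by its first row
    return np.asarray(X), np.asarray(y)

def shuffle_split(X, y, train_frac=0.7, seed=0):
    # shuffle windows, then allocate 70% to training and 30% to testing
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]
```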
We determined the classification accuracy of all behavior patterns using a confusion matrix, which provided a measure of the correct and incorrect classification for each behavior model class. We used accuracy and F1 score to evaluate the performance of the FCN models. The following formulas were used to compute the accuracy and F1 score:

Accuracy = (TP + TN) / (TP + TN + FP + FN),

F1 score = 2TP / (2TP + FP + FN),
where TP is the result of the classification model correctly predicting the positives, TN is the result of the classification model correctly predicting the negatives, FP is the result of the classification model incorrectly predicting the positives, and FN is the result of the classification model incorrectly predicting the negatives.
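The per-class accuracy and F1 score can be computed directly from a confusion matrix, treating each class one-vs-rest, as in this sketch:

```python
import numpy as np

def per_class_metrics(cm):
    # cm[i, j]: number of windows of true class i predicted as class j
    total = cm.sum()
    metrics = {}
    for c in range(cm.shape[0]):
        tp = cm[c, c]                 # correctly predicted positives
        fp = cm[:, c].sum() - tp      # other classes predicted as c
        fn = cm[c, :].sum() - tp      # class c predicted as something else
        tn = total - tp - fp - fn     # everything else
        acc = (tp + tn) / total
        f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
        metrics[c] = (acc, f1)
    return metrics
```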

RESULTS
The classification performance of the 3 FCN models, in terms of accuracy and F1 score for each of the 2 window sizes, is shown in Table 3. The window size 64 with dataset 2, window size 128 with dataset 2, and window size 128 with dataset 1 are denoted below as 64-2, 128-2, and 128-1, respectively.
The 128-1 FCN model had the best performance for both metrics, with a classification accuracy of 83.75%. The classification accuracies for 128-2 (81.83%) and 128-1 (83.75%) were similar but accuracy for 64-2 was lower (69.7%). In general, the 128-1 model yielded the best classification performance among the 3. When comparing the effect of window size, classification performance was better when using window size 128 than window size 64 for the same dataset. However, the model trained with dataset 1 (A, G, and M) achieved higher accuracy with the same window size (128).
The confusion matrix in Figure 4 shows the classification accuracy for each behavior pattern of the 3 models. Among all models, the classification accuracy of lying and feeding was relatively high; for 128-1, the classification accuracies for lying and feeding were >90%. In some cases, these 2 behavior patterns were misclassified as each other; other cases, in all 3 models, were incorrectly classified as ruminating-lying. A small number of feeding behavior events were confused with self-licking due to the similar jaw movements produced by the bite action. The classification performance of ruminating-lying in all 3 models was relatively stable. The most common misclassifications of ruminating-lying were lying or feeding in all 3 models.
The classification accuracy of rub scratching (leg) in the 64-2 model was only 68.9%; 22.8% of instances were classified incorrectly as rub scratching (neck) because of similarities in physical movement. However, the classification performance improved significantly in the 128-2 and 128-1 models, which achieved 95.3 and 98.1%, respectively. In these 2 models, only a few cases of rub scratching (leg) behaviors were incorrectly recognized as rub scratching (neck) or other classes. Across the 64-2, 128-2, and 128-1 models, the classification accuracy of self-licking, rub scratching (neck), and social licking increased gradually. These 3 behavior patterns were often misclassified because of their variable and sudden movements. However, the accuracy of self-licking and rub scratching (neck) reached 82.9 and 87.2%, respectively, in the 128-1 model. Performance for social licking was improved in 128-2 compared with that of the other 2 models; the algorithm found it difficult to separate social licking behaviors from other behavior classes, usually confusing it with self-licking or rub scratching (neck).
The classification accuracy of the 3 models revealed that the ruminating-lying, lying, and feeding behavior patterns were not significantly affected by window size or dataset selection. The reason for this stable classification performance is outlined in Figure 5: the x-axis acceleration waveforms of these 3 behavior patterns changed periodically and regularly, which consistently achieved high classification accuracies for both window sizes in previous research (Peng et al., 2019, 2020). For the other 4 behavior patterns (rub scratching [leg], self-licking, rub scratching [neck], and social licking), irregularly fluctuating waveforms can be observed (Figure 6). The movements of rubbing either the leg or neck are produced by the cow's neck, body, and legs. The waveforms of these behavior patterns are variable because the duration and amplitude of each movement depend on the degree of the cow's itch. Similarly, a large part of the front of the cow's body was used to turn around for the self-licking and social licking behaviors. In summary, the period of a single movement of these 4 behavior patterns was irregular and longer than that of the other 3 behavior patterns. This could explain the better classification performance with a window size of 128, because complete movements can be included within one window (12.8 s).

DISCUSSION
In this study, we trained FCN models with processed IMU data collected from 12 cows. We prepared 2 datasets by computing the square roots of the sums of squares of the acceleration, angular velocity, and magnetometer data to improve classification efficiency. We compared the overall classification accuracy of the 3 models and found that the 128-1 model achieved the best classification performance (83.75%). The model using dataset 1, which included the magnetometer data, achieved the highest accuracy; for the classification of some rotating movements, the magnetometer data added orientation information to the features. In our previous work (Wu et al., 2022), we established a deep residual bidirectional long short-term memory (LSTM) model to classify 6 cattle behavior patterns with 9-axis IMU data. In this study, we used 3-column data obtained by dimensionality reduction of the 9-axis IMU data and classified 7 behavior patterns; the classification performance achieved in this study was almost the same as that of our previous study but with improved running speed and processing efficiency.
The classification accuracy of each behavior pattern was affected by dataset and window size selection. Lying and feeding behaviors were classified with high accuracy by all models, consistent with previous works (Peng et al., 2019, 2020). Ruminating while lying is a behavior pattern with periodically changing movement. However, the classification accuracy of ruminating while lying was 95.3% with a window size of 12.8 s, which was lower than in previous research (Peng et al., 2019). This could be because some of the data were collected from cows in the third trimester; the amplitude of feeding movement during this period was smaller than usual, and it is possible that ruminating (lying) was confused with feeding.
Both rub scratching (leg) and rub scratching (neck) behaviors are whole-body activities that imply head and neck movements. According to video data, the movement of these 2 behavior patterns was flexible and diverse, with different periods and amplitudes. Thus, 1 sample of window size 64 (6.4 s) might not contain a complete rubbing action, which could lead to misclassification. This could also provide an explanation for the misclassification of self-licking, as this behavior can last for a relatively long time. Social licking cases were difficult to classify for all models. As shown in Figure 6, the period and physical movement of social licking can be unpredictable, because this interaction is for communication and cleaning; thus it can lead to incorrect classification as self-licking or rub scratching (neck), which have similar neck movements to social licking.
We also balanced the sample sizes of the 7 behavior patterns to prevent overfitting; consequently, the overall sample sizes were limited, because rub scratching (leg) rarely occurred and its data were difficult to collect. We collected itching data from only 2 cows, which limits the model's applicability to other cattle species and types. To improve the model's performance and generalizability, a larger dataset containing special behaviors, including itching and licking, is necessary for future development of the model. To enhance applicability in practice, we will upgrade the equipment by extending battery life using solar cells. To improve the performance of the behavior classification model, an adaptive sliding window could be used instead of the fixed-size sliding window.

CONCLUSIONS
We classified and compared 7 behavior patterns with FCN models using IMU data and different window sizes and datasets. We achieved the best overall classification performance (83.75%) with the 128-1 model (window size 128 with dataset 1). For particular behavior patterns, window size had little effect on the classification of the 3 periodically changing behavior patterns. For the other 4 behavior patterns, which had diverse periods and amplitudes, the model performed better with the larger window size (128). Additionally, dataset 1, which contained magnetometer data, performed well for behavior patterns that included many rotating movements. During data preprocessing, we simplified the 9-axis data by calculating the square root of the sum of squares of each sensor's 3-axis data. This approach improved the effectiveness of model training. This model is suitable for real-time monitoring and could provide benefits for farm management. In future research, we expect to increase the sample size to improve the model's accuracy and to generalize its applicability.