Abstract-In radiotherapy using 18-fluorodeoxyglucose

positron emission tomography (18F-FDG-PET), the accurate delineation of the

biological tumour volume (BTV) is a crucial step. In this study, new approach

to segment the BTV in F-FDG-PET images is suggested. The technique is based on

the k-means clustering algorithm incorporating automatic optimal cluster number

estimation, using intrinsic positron emission tomography image information. Partitioning

data into a finite number of k homogenous and separate clusters (groups)

without use of prior knowledge is carried out by some unsupervised partitioning

algorithm like the k-means clustering algorithm. To evaluate these resultant

clusters for finding optimal number of clusters, properties such as cluster

density, size, shape and separability are typically examined by some cluster

validation methods. Mainly the aim of clustering analysis is to find the

overall compactness of the clustering solution, for example variance within

cluster should be a minimum and separation between the clusters should be a

maximum.

I.

INTRODUCTION

There

are several approaches to validate the segmentation techniques such as phantom

studies and the macroscopic surgical specimen obtained from histology. The use

of macroscopic samples for validation of segmentation techniques in positron

emission tomography (PET) images is one of the most promising approaches

reported so far in clinical studies, the procedure consists of the comparison

of the tumour volumes defined on the PET data with actual tumour volumes

measured on the macroscopic samples recorded from histology (where PET was

performed prior to surgery). Segmentation using the cluster –based algorithms

is very popular, but the main problem in this case is the determination of the

optimal and desired number of clusters. In this, we have implemented an

approach based on k-means algorithm with an automatic estimation12 of the

optimal number of clusters, based on the maximum intensity ratio in a given

volume of interest (VOI).

II.

METHEDOLOGY

Calculate

the VE for a range of k (2–50

clusters), and the optimal number which corresponds to the minimum of SVEs. This method gives

good results but consumes a significant computation time by performing the

clustering for a large range of cluster values before selecting the optimal

number of clusters. Several approaches have been proposed in the literature to

identify the optimal cluster number to better fit the data, three of them are

used.

Unfortunately,

the results are not promising because they are not adapted to PET image

segmentation. So our goal in this study is to improve the k-means clustering

method, by incorporating an automatic determination of the optimal number of

clusters using a new criterion based on PET image features.

After

analysing the variation of the maximum activity (intensity) of the uptake () in the VOI by scanning all slices, we

conclude for all patients that the maximum intensity value decreases from

middle to frontier slices, and the maximum intensity is often situated almost at

the centre of the BTV. The optimal cluster number has a minimum value at the

centre of the BTV, and increases from central to frontier slices. This

correlation between the optimal number of clusters and the maximum intensity

motivates our choice of the following slice image feature:

Where

is

the maximum activity (intensity) of the uptake in the corresponding slice, is

the maximum activity in all slices that encompasses tumour volume BTV inside

the , and is the difference between the maximum and the

minimum values of (Imax (slice)/Imax (VOI)) in the .

Similarly

to , the new criterion , has a maximum value

for the middle slices and decreases for the frontiers of the BTV. Note that the

r values range from ‘0’ to ‘1’ for all patients.

1.

Modelling: This section is dedicated to finding a

relationship between the optimal number of clusters k, and the new criterion r.

This relationship could be used to determine the optimal cluster number for the

segmentation of new PET images using only the new slice image feature r.

After analysing the variation of k in function of r criterion

(for all patients included in this study), we use two fitting models: an

exponential and a power function given by (a) and (b), respectively,

k

= ? ? e?. r + 1 (a)

k

= a ? + 1 (b)

Where

?, ?, a, b are coefficients of fitting models and r

is the proposed criterion. The fitting accuracy evaluation is based on the R-square

criterion. Note that we added ‘1’ to the original fitting equation to avoid

clustering the image with one cluster for the high values of r.

2.

Generalisation: The aim of this step is to automate the

choice of the optimal cluster number for all patients using one corresponding

relationship function by defining a generalised model for all patients. For

this reason, we have divided the database randomly into two parts of 50% each.

The first part (validation set), contains three patients is used for optimising

the model coefficients and fixing the optimal power and exponential generalised

model. The second part (test set), contains four patients, is used for testing

the accuracy of the fixed optimal model.

According

to the R-square criterion, the optimal exponential and power generalised

function can be rewritten as follows:

k

= 46.52e?5.918 × r

+ 1

k

= 1.683r?1.264 + 1

k

mean alogorithm 3 :

Fig

1. K-mean

clustering algorithm

3.

Optimization by new technique: The

underlying idea behind it is an analysis of the “movement” of objects between

clusters, considered either forward from k to k+1 or backwards from k+1 to k

groups. In other words we find the movement in membership or joint probability

around k groups. The joint probability obtained from adjacent consecutive k

numbers of groups will be used to produce a diagonally dominant probability

matrix for optimal value of k homogenous and separable groups. The maximum

normalised value of the trace as the greatest value for k within the range

tested, will be defined as the optimal value of k clusters for the given

dataset. Formally, we may describe our approach as follows. For a given choice

of k = number of clusters, a given choice of clustering technique U, and a

given choice of V = set of parameters v1…vn used to control the clustering

technique, we first construct a set of clusters (U,V) = {} with i=1..k. Next, we

construct sets of clusters (U,V), and (U,V) using the same

clustering technique. In the work reported here, we will not vary U and V so we

may write these cluster sets more simply as , and . Now these three (, and ) consecutive groups

around k will be used to find the proportion of common objects from Ck to , and , to create a

rectangular proportion matrix of size m x n, where m and n correspond to rows

and columns of proportion matrix . We denote the

proportion of data elements in common between a particular pair of clusters,

say cluster from and cluster from by, which can be

abbreviated to . Similarly, we can

compute, k to create a

rectangular matrix of size n x m where n correspond to columns and m to rows.

Note that in general is not equal to as they have different

cardinality meaning k+1 not equal to k and vice versa. To investigate how much

movement of objects occurs from Ck to Ck+1 and from to we will apply the dot product of matrix for

size m x n ( to ) and n x m rectangular

matrices ( to ) to get the joint

information in square matrix m x m for clusters. Due to the

row sum constant of 1 the resultant square matrix is also known as a row

stochastic matrix 4 or probability and transition matrix 5. The trace of

resultant set of clusters will be

normalised (average of the trace) to determine the set of more stable (optimal

k) clusters as if the

normalised trace is maximum and may change occur in the depending on the

dataset for the range of adjacent set of k values. This matrix will be used to

determine the maximum normalised trace value for determining set of more stable

or consistent clusters , that will indicate set of clusters in are stable and

completely separated from one another. The steps involves in determining the

optimal value of k from the resultant clusters are follows as:

1.

Create the m x n forward proportion matrix from k to k+1

= (1)

2.

Create the n x m backward proportion matrix from k+1 to k

= (2)

Where

= 1,2,3 … . . and = 1,2,3 … . .

The

dot product of (1) by (2) will give us a m x m matrix as in (3) below with the

entries showing the joint probabilities of the forward/back movement of the

objects between the set of clusters from k to k+1 and k+1 to k.

= * (3)

The

new index can be calculated from in (3) as follows:

= (4)

From (4) the normalised maximum trace value

for will indicate the set of stable cluster at k

value. In an extreme situation the normalised trace is equal to 1, that is

where the set of clusters in will keep splitting

until the value of the trace is 1 and it may decrease or increase as we

continue but will always stay less than 1.

III.

CONCLUSION

A new unsupervised cluster-based approach for segmenting the BTV in F-FDG-PET

images is introduced. The system is more reliable and has very less error. It can be improved by technique

used in determining an optimal value of K in K-means

clustering, for which

k-means clustering it uses a method to find an optimal value of k number of

clusters, using the features and variables inherited from datasets. The new

proposed method is based on comparison of movement of objects forward/back from

k to k+1 and k+1 to k set of clusters to find the joint probability, which is

different from the other methods and indexes that are based on the distance.

REFERENCES

1 M. Mehar, K. Matawie and A. Maeder, “Determining an

optimal value of K in K-means clustering,” 2013 IEEE International Conference on

Bioinformatics and Biomedicine, Shanghai, 2013, pp. 51-55.

2 A. Tafsast, M. L. Hadjili, A. Bouakaz and N. Benoudjit,

“Unsupervised cluster-based method for segmenting biological tumour volume

of laryngeal tumours in 18F-FDG-PET images,” in IET

Image Processing, vol. 11, no. 6, pp. 389-396, 6 2017.

3 MacQueen,

J.: ‘Some methods for classification and analysis of multivariate

observations’. Int. Proc. Fifth Berkeley Symp. on Mathematical Statistics and

Probability, 1967, 1, (14), pp. 281–297

4 Johnson,

C.R., Row stochastic matrices similar to doubly stochastic matrices. Linear and

Multilinear Algebra, 1981. 10(2): p. 113-130.

5 Ross, S.M.,

Stochastic processes, 1996, Wiley (New York).