Process Mineralogy Today

A discussion resource for process mineralogy using today's technologies

Working with High-Dimensional Data Part 2: Classification by Cluster Analysis

In part 1 of this introductory series on working with high-dimensional data we determined that cluster analysis is a commonly used method to perform more reliable and scalable classification of large data sets. In the case of high-dimensional data such as geochemical assays or SEM-EDS mineralogy and texture analyses, dimensionality reduction enables visualisation of the raw data and provides a framework to assess the quality of the cluster analysis results. In this article we take a closer look at the K-means clustering method to form a better understanding of its strengths and weaknesses.

 

Continuing with the example from part 1 (chemical assays for four elements from a mine planning feasibility study), figure 1 shows the reduced 2D representation of the original data using multi-dimensional scaling (MDS). Together, the x and y dimensions, which have arbitrary units, describe the relative positioning of the high-dimensional data while retaining assay information from the original four elements. In this example we see that MDS highlights two or three main clusters to the right in the x direction, with the remaining data points scattered almost randomly.

Figure 1: The transformed representation of the original four-dimensional data using MDS with arbitrary dimensions on the x and y axes.
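For readers who want to try the reduction step themselves, here is a minimal sketch using the scikit-learn library in Python. The assay values are synthetic stand-ins (the original data set is not reproduced here), and standardising the four elements before MDS is an assumption made for the sketch, not necessarily the preprocessing used for figure 1.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assay table: 60 samples x 4 elements (hypothetical values).
rng = np.random.default_rng(0)
assays = np.vstack([
    rng.normal(loc=[1.0, 5.0, 0.2, 30.0], scale=[0.1, 0.5, 0.02, 2.0], size=(30, 4)),
    rng.normal(loc=[3.0, 2.0, 0.8, 45.0], scale=[0.3, 0.3, 0.05, 3.0], size=(30, 4)),
])

# Assumption: put the four elements on a comparable scale so that
# no single element dominates the pairwise distances.
scaled = StandardScaler().fit_transform(assays)

# Reduce 4D to 2D while preserving pairwise distances as well as possible.
mds = MDS(n_components=2, random_state=0)
coords = mds.fit_transform(scaled)

print(coords.shape)  # (60, 2)
```

The two columns of `coords` play the role of the arbitrary x and y axes in the figure; only the relative positions of the points carry meaning.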

 

The next step is to develop a classification using cluster analysis. Arguably the most popular clustering approach is the K-means method, which involves six basic steps:

1) Assign a user-defined number of random initial centroids to the data set (‘K’ number of centroids/clusters),

2) For each centroid, identify the set of closest data points and assign them as belonging to that cluster (distances can be computed using a range of metrics; Euclidean distance is probably the most popular),

3) Compute the mean values for each cluster,

4) Assign the newly computed mean as an updated centroid,

5) For each updated centroid, identify the set of closest data points and re-assign them as belonging to that cluster, and

6) Repeat steps 3-5 until convergence, i.e. the cluster centroids no longer change.
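The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation under simple assumptions (Euclidean distance, initial centroids drawn from the data points themselves, an empty cluster keeps its previous centroid); library implementations add refinements such as smarter initialisation. The function name `kmeans` and its parameters are chosen for this sketch only.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means with Euclidean distance; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the random initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Steps 2 and 5: assign every point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Steps 3 and 4: the mean of each cluster becomes its updated centroid.
        # A cluster that lost all its points keeps its previous centroid.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 6: stop once the centroids no longer move.
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, labels

# Usage on two well-separated synthetic groups of 50 points each.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
centroids, labels = kmeans(data, k=2)
```

Note that `k` is an input: the routine never decides how many groups exist, it only partitions the data into the number requested.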

 

K-means is a fast and effective method, and is available in most analytical software packages (e.g. ioGAS for geochemistry). However, the main drawback of the K-means method is that it does not identify true clusters, but rather groups data points into the number of clusters the user specifies, which makes it more of an unsupervised partitioning method. In many cases this is not a problem, but K-means can force samples into groups where they do not belong or where the user doesn’t want them, which is at least something to be aware of. This interactive graph is a good demonstration of how the method works.
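The partitioning behaviour is easy to see on data with no cluster structure at all. In the sketch below (scikit-learn, with synthetic uniformly scattered points as a stand-in), K-means still returns exactly the five groups requested.

```python
import numpy as np
from sklearn.cluster import KMeans

# Uniformly scattered points: by construction there are no real clusters here.
rng = np.random.default_rng(0)
uniform = rng.uniform(0.0, 1.0, size=(300, 2))

# Ask for five clusters anyway.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(uniform)

# Count how many points landed in each of the five requested groups.
sizes = np.bincount(km.labels_, minlength=5)
print(sizes)
```

Every group comes back populated, which is why the result should be read as a partition of the data rather than as evidence that five natural clusters exist.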

 

Figure 2 demonstrates some of the drawbacks I mentioned using our working example. Here I requested seven clusters. Zone A indicates three samples that are grouped into cluster 0, while their chemical profiles indicate they should be left separate from cluster 0 and probably from each other. Zones B and C show similar occurrences where samples are incorrectly included in clusters. The main challenge with K-means is choosing the number of clusters and deciding whether or not divisions between samples, or the inclusion of samples in clusters, are correct. The process requires background knowledge of the data, the project objectives, and a bit of test work. In general it is recommended to go through several iterations of clustering, each time using a different number of requested clusters.

Figure 2: Clustering results using the K-means method and requesting seven clusters from the algorithm (arbitrary dimensions on the x and y axes computed from MDS). The legend shows the number of samples in each cluster and their percentage of the total.
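One way to organise the repeated runs with different numbers of requested clusters is to loop over K and record a summary score for each run. The sketch below uses scikit-learn on synthetic 2D coordinates with three groups; the choice of inertia and silhouette score as summaries is an assumption for illustration, and such scores are only an aid alongside knowledge of the data and the project objectives.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the 2D MDS coordinates: three groups of 40 points.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
coords = np.vstack([rng.normal(c, 0.4, size=(40, 2)) for c in centres])

results = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords)
    # Inertia: total squared distance of points to their centroid (falls as K grows).
    # Silhouette: how well separated the clusters are (higher is better, max 1).
    results[k] = (km.inertia_, silhouette_score(coords, km.labels_))
    print(f"K={k}: inertia={results[k][0]:.1f}, silhouette={results[k][1]:.2f}")

# K with the highest silhouette score; a starting point for inspection, not a verdict.
best_k = max(results, key=lambda k: results[k][1])
```

On real assay data the scores rarely point to a single clean answer, so plotting each run (as in figure 2) and checking the chemistry of borderline samples remains necessary.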

 

In this example we developed a two-step clustering procedure to produce the desired result. The first step is a standard K-means analysis requesting 24 clusters (we arrived at the number 24 after several test runs). The large number of clusters places key divisions between samples; however, it also divides the data into far too many groups. The second step recombines “sub-clusters” (guided by their chemistry) into their appropriate clusters (figure 3). The resulting seven clusters correctly group samples, and leave the “outlier” samples to the far left on the x-axis as individuals.

 

Figure 3: Demonstrating a two-step procedure to correctly identify geochemical groups. Step 1 identifies a large number of clusters and places key cluster boundaries in the correct locations, while step 2 merges the sub-clusters to create the desired geochemical groups (arbitrary dimensions on the x and y axes computed from MDS). The legends show the number of samples in each cluster and their percentage of the total.
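A rough sketch of the two-step idea in Python with scikit-learn follows. The data are synthetic, and the merge in step 2 is automated here with agglomerative clustering of the 24 sub-cluster centroids purely as a stand-in: in the procedure described above the sub-clusters were recombined based on their chemistry, which amounts to writing the `sub_to_group` mapping by hand.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in: 7 geochemical groups, 60 samples each, 4 elements.
rng = np.random.default_rng(0)
group_centres = rng.uniform(0.0, 10.0, size=(7, 4))
assays = np.vstack([rng.normal(c, 0.5, size=(60, 4)) for c in group_centres])

# Step 1: deliberately over-segment so that key boundaries are captured.
fine = KMeans(n_clusters=24, n_init=10, random_state=0).fit(assays)
sub_labels = fine.labels_  # sub-cluster index (0..23) for every sample

# Step 2: merge sub-clusters into final groups.
# Stand-in for the chemistry-guided merge: group the 24 centroids hierarchically.
merge = AgglomerativeClustering(n_clusters=7, linkage="ward").fit(fine.cluster_centers_)
sub_to_group = merge.labels_  # final group index (0..6) for every sub-cluster

# Look up the final group of each sample through its sub-cluster.
final_labels = sub_to_group[sub_labels]
print(np.bincount(final_labels))
```

Because `sub_to_group` is just an array of 24 integers, individual entries can be overridden manually where the chemistry suggests a different grouping, or an outlier sub-cluster can be given its own group.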

 

I am not familiar with the details and flexibility of clustering algorithms in commercial analytical packages, but open-source Python libraries allow easy implementation of this two-step procedure. Specifically, they give us greater control over the K-means method and the results it produces.

 

In part 1 I explained that the chemical assays in this example were performed on screened fractions and therefore partially represent the mineral processing behaviour of the feed material. It follows that the output from the cluster analysis serves as a framework for a geometallurgical domain model. In the next article in this introductory series (part 3) we will explore how the geospatial relationships between the clusters define a behaviour profile and how it assists the operator with critical mine planning decisions.

 

Feel free to provide some feedback on your experiences with K-means clustering and its implementation in commercial analytical packages.




About the Author: Pieter Botha

