A function-based implementation of k-means clustering that maintains data association.

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

K-Means Clustering

A repository documenting the implementation of k-Means clustering in Python. Usage examples can be found in the tests directory.

The thing that makes this k-means clustering module different from others is that it allows the user to specify the number of dimensions to use for the clustering operation.

For example, given some data where each element is of form

# Each element would actually be a Numpy array, but the following uses lists for readability.
[
  [1, 2, 3, 4, 5],
  [4, 6, 7, 8, 2],
  ...
]

specifying ndim=3 will result in only the first three elements of each data point being used for each operation.

This is useful for maintaining data association where it otherwise would be shuffled. An example of this is found in my implementation of image segmentation (segmentation.py) in this same project. Other examples of use could be for maintaining data association in object detection elements. Given some

[xmin, ymin, xmax, ymax, conf, label]  # [bounding box, conf, label]

we may want to cluster the data solely on bounding box information while also maintaining the confidence intervals for each detection for further processing.

How it Works

Specifying the k value results in a dict[int: NDArray] where each NDArray contains the elements within the cluster. The keys of this dict range from 0 to k-1, allowing the key to also be used to index the corresponding cluster centroid from the centroid array.

Here is an example of the use of the cluster function:

import numpy as np
from kmeans import cluster

data = np.random.random((10000, 7))

# Only cluster along first three elements of a given data point.
# Choose initial_means randomly (default)
clusters, centroids = cluster(data, k=4, ndim=3, tolerance=0.001)

for key in clusters:
  cluster = clusters[key]
  centroid = centroids[key]
  print(cluster, centroid, sep="\n")

Features

k-means clustering (no side-effects)
k-means clustering w/ animation
- (2-D & 3-D)
image segmentation via kmeans.segmentation.segment_img function

k-means Animation

Using the view_clustering function

2-D Case (Smallest Tolerance Possible)

kmeans2D_animate.webm

3-D Case (Tolerance = 0.001)

kmeans3D_animate.webm

Image Segmentation

Perform image segmentation based on color groups specified by the user.

Two options:

Averaged Colors

k=4

seg_groups04

k=10

seg_groups10

Random Colors

k=4

seg_rand_groups04_cpy

Developed With

Python (3.12.1)
Numpy (1.26.2)
Matplotlib (3.8.4)

However, no features specific to Python 3.12 were used.

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

1.0.3

Apr 29, 2024

1.0.2

Apr 29, 2024

1.0.1

Apr 16, 2024

1.0.0

Apr 16, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kmeans_tjdwill-1.0.3.tar.gz (81.9 kB view hashes)

Uploaded Apr 29, 2024 Source

Built Distribution

kmeans_tjdwill-1.0.3-py3-none-any.whl (12.3 kB view hashes)

Uploaded Apr 29, 2024 Python 3

Hashes for kmeans_tjdwill-1.0.3.tar.gz

Hashes for kmeans_tjdwill-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`2d0e00aab7de1c5ceeb8bf37a7fc69c8f07e9c5526ca32b1e69b4bfa9e4c27a8`
MD5	`5c90d4ece83de3242f86d6273ec3d56c`
BLAKE2b-256	`fa36c0b82dc68015503402592bc08def84a03ac0694d6bbc83e2e18e01daa0d8`

Hashes for kmeans_tjdwill-1.0.3-py3-none-any.whl

Hashes for kmeans_tjdwill-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0651dc529dbfb35478d45947cea9ab061e30ea994136407322fd037f0023c803`
MD5	`53d2ec758975326c61eecd4ef7acc5a1`
BLAKE2b-256	`3bfcf6882b63ddc6ac6f22e2eb0943585faa85dbb6f5079405f61435049813fc`