Skip to content

cluster Module

The cluster module describes the LabeledCloud and Cluster objects used in Spyral.

Cluster

Representation of trajectory cluster data.

Parameters:

Name Type Description Default
event int

The event number (default = -1)

-1
label int

The label from the clustering algorithm (default = -1)

-1
data ndarray

The PointCloud data for the Cluster (default = empty array)

EMPTY_DATA

Attributes:

Name Type Description
event int

The event number

label int

The cluster label from the algorithm

data ndarray

The point cloud data (trimmed down). Contains position, integrated charge

x_spline BSpline | None

An optional spline on z-x to smooth the cluster.

y_spline BSpline | None

An optional spline on z-y to smooth the cluster.

c_spline BSpline | None

An optional spline on z-charge to smooth the cluster.

Methods:

Name Description
apply_smoothing_splines

Apply smoothing to the underlying cluster data with smoothing splines

create_splines

Create smoothing splines for the x,y,charge dimensions

drop_outliers

Use the scikit-learn LocalOutlierFactor to identify and remove outliers in the Cluster

Source code in src/spyral/core/cluster.py
class Cluster:
    """Representation of trajectory cluster data.

    Parameters
    ----------
    event: int
        The event number (default = -1)
    label: int
        The label from the clustering algorithm (default = -1)
    data: ndarray
        The PointCloud data for the Cluster (default = empty array)

    Attributes
    ----------
    event: int
        The event number
    label: int
        The cluster label from the algorithm
    data: ndarray
        The point cloud data (trimmed down). Contains position, integrated charge
    x_spline: BSpline | None
        An optional spline on z-x to smooth the cluster.
    y_spline: BSpline | None
        An optional spline on z-y to smooth the cluster.
    c_spline: BSpline | None
        An optional spline on z-charge to smooth the cluster.

    Methods
    -------
    apply_smoothing_splines(smoothing=1.0)
        Apply smoothing to the underlying cluster data with smoothing splines
    create_splines(smoothing=1.0)
        Create smoothing splines for the x,y,charge dimensions
    drop_outliers(scale=0.05)
        Use the scikit-learn LocalOutlierFactor to identify and remove outliers in the Cluster
    """

    def __init__(
        self,
        event: int = -1,
        label: int = -1,
        data: np.ndarray = EMPTY_DATA,
    ):
        self.event = event
        self.label = label
        self.data = data
        self.x_spline: BSpline | None = None
        self.y_spline: BSpline | None = None
        self.c_spline: BSpline | None = None

    def drop_outliers(self, scale: float = 0.05) -> np.ndarray:
        """Use scikit-learn LocalOutlierFactor to test the cluster for spatial outliers.

        This helps reduce noise when fitting the data.

        Parameters
        ----------
        scale: float
            Scale factor to be multiplied by the length of the trajectory to get
            the number of neighbors over which to test

        Returns
        -------
        numpy.ndarray
            The indicies of points labeled as outliers
        """
        neighbors = int(scale * len(self.data))  # 0.05 default
        if neighbors < 2:
            neighbors = 2
        test_data = self.data[:, :3].copy()
        neigh = LocalOutlierFactor(n_neighbors=neighbors)
        result = neigh.fit_predict(test_data)
        mask = result > 0
        self.data = self.data[mask]  # label=-1 is an outlier
        return np.flatnonzero(~mask)  # Invert the mask to get outliers

    def create_splines(self, smoothing: float = 1.0) -> None:
        """Create smoothing splines for the x,y,charge dimensions

        Create smoothing splines along the z-coordinate for x, y, and charge.
        The degree of smoothing is controlled by the smoothing parameter. smoothing = 0.0 is
        no smoothing (pure interpolation) and higher values gives a higher degree of smoothing.

        Parameters
        ----------
        smoothing: float
            The smoothing factor (lambda in the scipy notation). Must be a positive float or zero.
        """

        self.x_spline = make_smoothing_spline(
            self.data[:, 2], self.data[:, 0], lam=smoothing
        )
        self.y_spline = make_smoothing_spline(
            self.data[:, 2], self.data[:, 1], lam=smoothing
        )
        self.c_spline = make_smoothing_spline(
            self.data[:, 2], self.data[:, 3], lam=smoothing
        )

    def apply_smoothing_splines(self, smoothing: float = 1.0) -> None:
        """Apply smoothing to the underlying cluster data with smoothing splines

        Apply smoothing splines to the x, y, and charge dimensions as a function of
        z. The degree of smoothing is controlled by the smoothing parameter. If the splines
        are not already created using the create_splines function, they will be created here.

        Note: This function modifies the underlying data in the cluster. This is not a reversible operation.

        Parameters
        ----------
        smoothing: float
            The smoothing factor (lambda in the scipy notation). Must be a positive float or zero.

        """

        if self.x_spline is None or self.y_spline is None or self.c_spline is None:
            self.create_splines(smoothing)

        self.data[:, 0] = self.x_spline(self.data[:, 2])  # type: ignore
        self.data[:, 1] = self.y_spline(self.data[:, 2])  # type: ignore
        self.data[:, 3] = self.c_spline(self.data[:, 2])  # type: ignore

apply_smoothing_splines(smoothing=1.0)

Apply smoothing to the underlying cluster data with smoothing splines

Apply smoothing splines to the x, y, and charge dimensions as a function of z. The degree of smoothing is controlled by the smoothing parameter. If the splines are not already created using the create_splines function, they will be created here.

Note: This function modifies the underlying data in the cluster. This is not a reversible operation.

Parameters:

Name Type Description Default
smoothing float

The smoothing factor (lambda in the scipy notation). Must be a positive float or zero.

1.0
Source code in src/spyral/core/cluster.py
def apply_smoothing_splines(self, smoothing: float = 1.0) -> None:
    """Apply smoothing to the underlying cluster data with smoothing splines

    Apply smoothing splines to the x, y, and charge dimensions as a function of
    z. The degree of smoothing is controlled by the smoothing parameter. If the splines
    are not already created using the create_splines function, they will be created here.

    Note: This function modifies the underlying data in the cluster. This is not a reversible operation.

    Parameters
    ----------
    smoothing: float
        The smoothing factor (lambda in the scipy notation). Must be a positive float or zero.

    """

    if self.x_spline is None or self.y_spline is None or self.c_spline is None:
        self.create_splines(smoothing)

    self.data[:, 0] = self.x_spline(self.data[:, 2])  # type: ignore
    self.data[:, 1] = self.y_spline(self.data[:, 2])  # type: ignore
    self.data[:, 3] = self.c_spline(self.data[:, 2])  # type: ignore

create_splines(smoothing=1.0)

Create smoothing splines for the x,y,charge dimensions

Create smoothing splines along the z-coordinate for x, y, and charge. The degree of smoothing is controlled by the smoothing parameter. smoothing = 0.0 is no smoothing (pure interpolation) and higher values gives a higher degree of smoothing.

Parameters:

Name Type Description Default
smoothing float

The smoothing factor (lambda in the scipy notation). Must be a positive float or zero.

1.0
Source code in src/spyral/core/cluster.py
def create_splines(self, smoothing: float = 1.0) -> None:
    """Create smoothing splines for the x,y,charge dimensions

    Create smoothing splines along the z-coordinate for x, y, and charge.
    The degree of smoothing is controlled by the smoothing parameter. smoothing = 0.0 is
    no smoothing (pure interpolation) and higher values gives a higher degree of smoothing.

    Parameters
    ----------
    smoothing: float
        The smoothing factor (lambda in the scipy notation). Must be a positive float or zero.
    """

    self.x_spline = make_smoothing_spline(
        self.data[:, 2], self.data[:, 0], lam=smoothing
    )
    self.y_spline = make_smoothing_spline(
        self.data[:, 2], self.data[:, 1], lam=smoothing
    )
    self.c_spline = make_smoothing_spline(
        self.data[:, 2], self.data[:, 3], lam=smoothing
    )

drop_outliers(scale=0.05)

Use scikit-learn LocalOutlierFactor to test the cluster for spatial outliers.

This helps reduce noise when fitting the data.

Parameters:

Name Type Description Default
scale float

Scale factor to be multiplied by the length of the trajectory to get the number of neighbors over which to test

0.05

Returns:

Type Description
ndarray

The indicies of points labeled as outliers

Source code in src/spyral/core/cluster.py
def drop_outliers(self, scale: float = 0.05) -> np.ndarray:
    """Use scikit-learn LocalOutlierFactor to test the cluster for spatial outliers.

    This helps reduce noise when fitting the data.

    Parameters
    ----------
    scale: float
        Scale factor to be multiplied by the length of the trajectory to get
        the number of neighbors over which to test

    Returns
    -------
    numpy.ndarray
        The indicies of points labeled as outliers
    """
    neighbors = int(scale * len(self.data))  # 0.05 default
    if neighbors < 2:
        neighbors = 2
    test_data = self.data[:, :3].copy()
    neigh = LocalOutlierFactor(n_neighbors=neighbors)
    result = neigh.fit_predict(test_data)
    mask = result > 0
    self.data = self.data[mask]  # label=-1 is an outlier
    return np.flatnonzero(~mask)  # Invert the mask to get outliers

LabeledCloud dataclass

Utility dataclass just for temporary holding in the clustering algorithims

Attributes:

Name Type Description
label int

The label from the clustering algorithm

point_cloud PointCloud

The cluster data in original point cloud coordinates

parent_indicies ndarray

The incidies of this cluster's data in the original parent point cloud

Source code in src/spyral/core/cluster.py
@dataclass
class LabeledCloud:
    """Utility dataclass just for temporary holding in the clustering algorithims

    Attributes
    ----------
    label: int
        The label from the clustering algorithm
    point_cloud:
        The cluster data in original point cloud coordinates
    parent_indicies:
        The incidies of this cluster's data in the original parent point cloud
    """

    label: int  # default is noise label
    point_cloud: PointCloud
    parent_indicies: np.ndarray

convert_labeled_to_cluster(cloud, params)

Function which takes in a LabeledCloud and ClusterParamters and returns a Cluster

Parameters:

Name Type Description Default
cloud LabeledCloud

The LabeledCloud to convert

required
params ClusterParameters

Configuration parameters for the cluster

required

Returns:

Type Description
tuple[Cluster, ndarray]

A two element tuple containing first the Cluster, and second a list of indicies in the preciding cloud that were labeled as noise.

Source code in src/spyral/core/cluster.py
def convert_labeled_to_cluster(
    cloud: LabeledCloud, params: ClusterParameters
) -> tuple[Cluster, np.ndarray]:
    """Function which takes in a LabeledCloud and ClusterParamters and returns a Cluster

    Parameters
    ----------
    cloud: LabeledCloud
        The LabeledCloud to convert
    params: ClusterParameters
        Configuration parameters for the cluster

    Returns
    -------
    tuple[Cluster, np.ndarray]
        A two element tuple containing first the Cluster,
        and second a list of indicies in the preciding
        cloud that were labeled as noise.
    """
    # Joining can make point cloud unsorted
    sort_point_cloud_in_z(cloud.point_cloud)
    data = np.zeros((len(cloud.point_cloud), 5))
    data[:, :3] = cloud.point_cloud.data[:, :3]  # position
    data[:, 3] = cloud.point_cloud.data[:, 4]  # peak integral
    data[:, 4] = cloud.point_cloud.data[:, 7]  # scale (big or small)
    cluster = Cluster(cloud.point_cloud.event_number, cloud.label, data)
    outliers = cluster.drop_outliers(params.outlier_scale_factor)
    return (cluster, outliers)