array_crop

ArrayCropParams

Bases: PadValueParams

Mixin class containing the array length parameter, together with the inherited pad value parameters, needed for array crop transformers.

getArrayLength

getArrayLength()

Gets the array length parameter.

Returns:

| Type | Description |
| --- | --- |
| int | Array length. |

Source code in src/kamae/spark/transformers/array_crop.py
def getArrayLength(self) -> int:
    """
    Gets the array length parameter.
    :returns: array length.
    """
    return self.getOrDefault(self.arrayLength)

setArrayLength

setArrayLength(value)

Sets the parameter array length to the given value.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | int | Array length. Must be at least 1. | required |

Returns:

| Type | Description |
| --- | --- |
| ArrayCropParams | Instance of class mixed in. |

Source code in src/kamae/spark/transformers/array_crop.py
def setArrayLength(self, value: int) -> "ArrayCropParams":
    """
    Sets the parameter array length to the given value.
    :param value: array length.
    :returns: Instance of class mixed in.
    """
    if value < 1:
        raise ValueError("Array length must be greater than 0.")
    return self._set(arrayLength=value)
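
The setter's contract (reject values below 1, otherwise store the value and return the instance so calls can be chained) can be illustrated without Spark. The sketch below is only a stand-in: `ArrayLengthHolder` is a hypothetical class written for this example, it stores the value in a plain attribute rather than a Spark `Param`, and its default of 128 is borrowed from the `ArrayCropTransformer` signature.

```python
class ArrayLengthHolder:
    """Hypothetical stand-in for a class that mixes in ArrayCropParams."""

    def __init__(self) -> None:
        # Assumption: 128 mirrors the default in the ArrayCropTransformer signature.
        self._array_length = 128

    def setArrayLength(self, value: int) -> "ArrayLengthHolder":
        # Same validation as the documented setter: the length must be at least 1.
        if value < 1:
            raise ValueError("Array length must be greater than 0.")
        self._array_length = value
        # Returning self allows chained calls.
        return self

    def getArrayLength(self) -> int:
        return self._array_length


holder = ArrayLengthHolder().setArrayLength(16)
print(holder.getArrayLength())  # prints 16

try:
    holder.setArrayLength(0)
except ValueError as err:
    print(err)  # prints Array length must be greater than 0.
```

Because validation happens in the setter, an invalid length fails when the parameter is set rather than later during a transform.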

ArrayCropTransformer

ArrayCropTransformer(
    inputCol=None,
    outputCol=None,
    inputDtype=None,
    outputDtype=None,
    layerName=None,
    arrayLength=128,
    padValue=None,
)

Bases: BaseTransformer, SingleInputSingleOutputParams, ArrayCropParams

Transformer that reshapes arrays into consistent shapes by either cropping or padding.

If an array is shorter than the specified length, it is padded with the specified pad value; if it is longer, it is cropped to that length.
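
The crop-or-pad behaviour can be sketched in plain Python. This is a minimal illustration, not kamae's implementation: the helper name `crop_or_pad` is hypothetical, and it assumes that cropping keeps the leading elements and that padding is appended at the end, neither of which is stated on this page.

```python
from typing import List, Union

Element = Union[str, int, float]


def crop_or_pad(
    values: List[Element], array_length: int, pad_value: Element
) -> List[Element]:
    """Return a list of exactly `array_length` elements (hypothetical helper)."""
    if array_length < 1:
        raise ValueError("Array length must be greater than 0.")
    # Assumption: cropping keeps the first `array_length` elements.
    cropped = values[:array_length]
    # Assumption: padding is appended after the existing elements.
    padding = [pad_value] * (array_length - len(cropped))
    return cropped + padding


print(crop_or_pad([1, 2, 3, 4, 5], array_length=3, pad_value=0))  # prints [1, 2, 3]
print(crop_or_pad([1, 2], array_length=4, pad_value=0))  # prints [1, 2, 0, 0]
```

Every output has the same length regardless of the input length, which is what lets downstream layers rely on a fixed shape.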

Initialises the ArrayCropTransformer

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inputCol | Optional[str] | Input column name. | None |
| outputCol | Optional[str] | Output column name. | None |
| inputDtype | Optional[Union[str, int, float]] | Input data type to cast input column(s) to before transforming. | None |
| outputDtype | Optional[Union[str, int, float]] | Output data type to cast the output column to after transforming. | None |
| layerName | Optional[str] | Name of the layer. Used as the name of the Keras layer. | None |
| arrayLength | Optional[int] | The length to crop or pad the arrays to. Defaults to 128. | 128 |
| padValue | Optional[Union[str, int, float]] | The value to pad the arrays with. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| None | None |

Source code in src/kamae/spark/transformers/array_crop.py
@keyword_only
def __init__(
    self,
    inputCol: Optional[str] = None,
    outputCol: Optional[str] = None,
    inputDtype: Optional[Union[str, int, float]] = None,
    outputDtype: Optional[Union[str, int, float]] = None,
    layerName: Optional[str] = None,
    arrayLength: Optional[int] = 128,
    padValue: Optional[Union[str, int, float]] = None,
) -> None:
    """
    Initialises the ArrayCropTransformer
    :param inputCol: Input column name.
    :param outputCol: Output column name.
    :param inputDtype: Input data type to cast input column(s) to before
    transforming.
    :param outputDtype: Output data type to cast the output column to after
    transforming.
    :param layerName: Name of the layer. Used as the name of the Keras layer.
    :param arrayLength: The length to crop or pad the arrays to. Defaults to 128.
    :param padValue: The value to pad the arrays with. Defaults to `None`.
    :returns: None
    """
    super().__init__()
    kwargs = self._input_kwargs
    self.setParams(**kwargs)
    self._pad_type_to_valid_element_types = {
        "int": ["int", "bigint", "smallint"],
        "float": ["float", "double", "decimal(10,0)"],
        "string": ["string"],
        "boolean": ["boolean"],
    }
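
This page does not show how `_pad_type_to_valid_element_types` is consumed. One plausible use, sketched below purely as an assumption, is to check that the Python type of the pad value is compatible with the element type of the input array column. The function names `pad_type_name` and `is_pad_value_compatible` are hypothetical and are not part of kamae.

```python
from typing import Union

# Copied from the constructor above.
PAD_TYPE_TO_VALID_ELEMENT_TYPES = {
    "int": ["int", "bigint", "smallint"],
    "float": ["float", "double", "decimal(10,0)"],
    "string": ["string"],
    "boolean": ["boolean"],
}


def pad_type_name(pad_value: Union[str, int, float, bool]) -> str:
    """Map a Python pad value to a key of the mapping (hypothetical helper)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(pad_value, bool):
        return "boolean"
    if isinstance(pad_value, int):
        return "int"
    if isinstance(pad_value, float):
        return "float"
    if isinstance(pad_value, str):
        return "string"
    raise TypeError(f"Unsupported pad value type: {type(pad_value).__name__}")


def is_pad_value_compatible(
    pad_value: Union[str, int, float, bool], element_type: str
) -> bool:
    """Assumed check: is the array element type valid for this pad value?"""
    return element_type in PAD_TYPE_TO_VALID_ELEMENT_TYPES[pad_type_name(pad_value)]


print(is_pad_value_compatible(0, "bigint"))  # prints True
print(is_pad_value_compatible("unknown", "int"))  # prints False
```

Under this reading, an integer pad value would be accepted for `int`, `bigint`, and `smallint` arrays, while a string pad value would only be accepted for `string` arrays.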

compatible_dtypes property

compatible_dtypes

List of compatible data types for the layer. If the computation can be performed on any data type, return None.

Returns:

| Type | Description |
| --- | --- |
| Optional[List[DataType]] | List of compatible data types for the layer. |

get_keras_layer

get_keras_layer()

Gets the Keras layer that performs the array cropping and padding.

Returns:

| Type | Description |
| --- | --- |
| Layer | Keras layer with name equal to the layerName parameter that performs the array cropping and padding operation. |

Source code in src/kamae/spark/transformers/array_crop.py
def get_keras_layer(self) -> keras.layers.Layer:
    """
    Gets the Keras layer that performs the array cropping and padding.

    :returns: Keras layer with name equal to the layerName parameter
    that performs the array cropping and padding operation.
    """
    return ArrayCropLayer(
        name=self.getLayerName(),
        input_dtype=self.getInputKerasDtype(),
        output_dtype=self.getOutputKerasDtype(),
        array_length=self.getArrayLength(),
        pad_value=self.getPadValue(),
    )