picos.expressions.samples

Implements the Samples class.

Classes

class picos.expressions.samples.Samples(samples=None, forced_original_shape=None, **kwargs)

Bases: object

A collection of data points.

Example

>>> from picos.expressions import Samples
>>> # Load six 2×2 matrices; each is stored as its column-major vectorization.
>>> data = [[[1*i, 3*i],
...          [2*i, 4*i]] for i in range(1, 7)]
>>> S = Samples(data)
>>> S
<Samples: (6 4-dimensional samples)>
>>> [S.num, S.dim, S.original_shape]  # Metadata.
[6, 4, (2, 2)]
>>> S.matrix  # All samples as the columns of one matrix.
<4×6 Real Constant: [4×6]>
>>> print(S.matrix)
[ 1.00e+00  2.00e+00  3.00e+00  4.00e+00  5.00e+00  6.00e+00]
[ 2.00e+00  4.00e+00  6.00e+00  8.00e+00  1.00e+01  1.20e+01]
[ 3.00e+00  6.00e+00  9.00e+00  1.20e+01  1.50e+01  1.80e+01]
[ 4.00e+00  8.00e+00  1.20e+01  1.60e+01  2.00e+01  2.40e+01]
>>> print(S[0].T)  # The first sample (transposed for brevity).
[ 1.00e+00  2.00e+00  3.00e+00  4.00e+00]
>>> print(S.mean.T)  # The sample mean (transposed for brevity).
[ 3.50e+00  7.00e+00  1.05e+01  1.40e+01]
>>> print(S.covariance)  # The sample covariance matrix.
[ 3.50e+00  7.00e+00  1.05e+01  1.40e+01]
[ 7.00e+00  1.40e+01  2.10e+01  2.80e+01]
[ 1.05e+01  2.10e+01  3.15e+01  4.20e+01]
[ 1.40e+01  2.80e+01  4.20e+01  5.60e+01]
>>> print(S.original[0])  # The first sample in its original shape.
[ 1.00e+00  3.00e+00]
[ 2.00e+00  4.00e+00]
>>> U = S.select([0, 2, 4])  # Select a subset of samples by indices.
>>> print(U.matrix)
[ 1.00e+00  3.00e+00  5.00e+00]
[ 2.00e+00  6.00e+00  1.00e+01]
[ 3.00e+00  9.00e+00  1.50e+01]
[ 4.00e+00  1.20e+01  2.00e+01]
>>> T, V = S.partition()  # Split into training and validation samples.
>>> print(T.matrix)
[ 1.00e+00  2.00e+00  3.00e+00]
[ 2.00e+00  4.00e+00  6.00e+00]
[ 3.00e+00  6.00e+00  9.00e+00]
[ 4.00e+00  8.00e+00  1.20e+01]
>>> print(V.matrix)
[ 4.00e+00  5.00e+00  6.00e+00]
[ 8.00e+00  1.00e+01  1.20e+01]
[ 1.20e+01  1.50e+01  1.80e+01]
[ 1.60e+01  2.00e+01  2.40e+01]
__init__(samples, forced_original_shape=None, always_copy=True)

Load a number of data points (samples).

Parameters
  • samples

    Any of the following:

    • A tuple or list of constants, each of which denotes a sample vector. Matrices are vectorized in column-major order, but their original_shape is stored and may be used by PICOS internally.

    • A constant row or column vector whose entries denote scalar samples.

    • A constant matrix whose columns denote the samples.

    • Another Samples instance. If possible, it is returned as is (Samples instances are immutable), otherwise a shallow copy with the necessary modifications is returned instead.

    In any case, constants may be given as constant numeric data values (anything recognized by load_data) or as constant PICOS expressions.

  • forced_original_shape – Overwrites original_shape with the given shape.

  • always_copy (bool) – If this is False, then data that is provided in the form of CVXOPT types is not copied but referenced if possible. This can speed up instance creation but will introduce inconsistencies if the original data is modified afterwards. Note that this argument has no effect if the samples argument is already a Samples instance; in that case, data is never copied.
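To illustrate the column-major vectorization described above without depending on PICOS, here is a minimal pure-Python sketch. The helpers vectorize and devectorize are hypothetical (not part of PICOS); they work on row-major nested lists, flatten a matrix column by column while remembering its shape, and rebuild it from that shape, much like S[0] and S.original[0] in the class example.

```python
def vectorize(matrix):
    """Stack the columns of a row-major nested list into one flat list.

    Hypothetical helper for illustration; not part of the PICOS API.
    Returns the flat vector together with the original (rows, cols) shape.
    """
    rows, cols = len(matrix), len(matrix[0])
    vector = [matrix[r][c] for c in range(cols) for r in range(rows)]
    return vector, (rows, cols)


def devectorize(vector, shape):
    """Rebuild a row-major nested list from a column-major flat list."""
    rows, cols = shape
    if len(vector) != rows * cols:
        raise ValueError("vector length does not match shape")
    return [[vector[c * rows + r] for c in range(cols)] for r in range(rows)]


# The first matrix from the class example (i = 1).
vec, shape = vectorize([[1, 3], [2, 4]])
print(vec, shape)               # [1, 2, 3, 4] (2, 2)
print(devectorize(vec, shape))  # [[1, 3], [2, 4]]
```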

static __new__(cls, samples=None, forced_original_shape=None, **kwargs)

Prepare a Samples instance.

kfold(k)

Split the samples for k-fold cross-validation (without shuffling).

If random shuffling is desired, write S.shuffled().kfold(k) where S is your Samples instance. To make the shuffling reproducible, see shuffled.

Returns list(tuple)

A list of k training set and validation set pairs.

Warning

If the number of samples n is not a multiple of k, then the last n mod k samples will appear in every training set but in no validation set.

Example

>>> from picos.expressions import Samples
>>> n, k = 7, 3
>>> S = Samples(range(n))
>>> for i, (T, V) in enumerate(S.kfold(k)):
...     print("Partition {}:\nT = {}V = {}"
...           .format(i + 1, T.matrix, V.matrix))
Partition 1:
T = [ 2.00e+00  3.00e+00  4.00e+00  5.00e+00  6.00e+00]
V = [ 0.00e+00  1.00e+00]

Partition 2:
T = [ 0.00e+00  1.00e+00  4.00e+00  5.00e+00  6.00e+00]
V = [ 2.00e+00  3.00e+00]

Partition 3:
T = [ 0.00e+00  1.00e+00  2.00e+00  3.00e+00  6.00e+00]
V = [ 4.00e+00  5.00e+00]
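The partitions above are consistent with taking consecutive blocks of n // k samples as validation sets. The following sketch reproduces them with a hypothetical helper, kfold_indices, that works on sample indices only; it is an illustration under that assumption, not the PICOS implementation. With n = 7 and k = 3, index 6 lands in every training set and in no validation set, in line with the warning.

```python
def kfold_indices(n, k):
    """Return k (training, validation) index pairs, without shuffling.

    Hypothetical helper: validation folds are assumed to be consecutive
    blocks of n // k indices, so the last n % k indices end up in every
    training set and in no validation set.
    """
    if not 2 <= k <= n:  # This sketch's own sanity check.
        raise ValueError("need 2 <= k <= n")
    size = n // k
    pairs = []
    for fold in range(k):
        start, stop = fold * size, (fold + 1) * size
        validation = list(range(start, stop))
        training = [j for j in range(n) if not start <= j < stop]
        pairs.append((training, validation))
    return pairs


for i, (T, V) in enumerate(kfold_indices(7, 3)):
    print("Partition {}: T = {}, V = {}".format(i + 1, T, V))
```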
partition(after_or_fraction=0.5)

Split the samples into two parts and return them as a pair of Samples instances.

Parameters

after_or_fraction (int or float) – Either a fraction strictly between zero and one that denotes the relative size of the first partition or an integer that denotes the number of samples to put in the first partition.
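A minimal sketch of how after_or_fraction can be turned into a cut position, assuming that a float fraction is multiplied by the number of samples and rounded down (the rounding rule is an assumption, not something documented here). The helpers split_point and partition are hypothetical and operate on a plain list of sample columns.

```python
def split_point(n, after_or_fraction=0.5):
    """Number of samples that go into the first partition.

    Hypothetical helper; rounding a fraction down with int() is an assumption.
    """
    if isinstance(after_or_fraction, float):
        if not 0.0 < after_or_fraction < 1.0:
            raise ValueError("fraction must lie strictly between 0 and 1")
        return int(after_or_fraction * n)
    if not 0 <= after_or_fraction <= n:  # This sketch's own range check.
        raise ValueError("sample count out of range")
    return after_or_fraction


def partition(columns, after_or_fraction=0.5):
    """Split a list of sample columns into a (first, second) pair."""
    cut = split_point(len(columns), after_or_fraction)
    return columns[:cut], columns[cut:]


first, second = partition(list(range(1, 7)))
print(first, second)  # [1, 2, 3] [4, 5, 6]
```

With the default fraction of 0.5 and six samples, the cut falls after the third sample, as in the class example.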

select(indices)

Return a new Samples instance with only selected samples.

Parameters

indices – The indices of the samples to select.

shuffled(rng=None)

Return a randomly shuffled instance of the samples.

Parameters

rng – A function that generates a random float in [0, 1). If omitted, the default random number generator of random.shuffle is used.

Example

>>> from picos.expressions import Samples
>>> S = Samples(range(6))
>>> print(S.matrix)
[ 0.00e+00  1.00e+00  2.00e+00  3.00e+00  4.00e+00  5.00e+00]
>>> rng = lambda: 0.5  # Fake RNG for reproducibility.
>>> print(S.shuffled(rng).matrix)
[ 0.00e+00  5.00e+00  1.00e+00  4.00e+00  2.00e+00  3.00e+00]
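The order shown above is consistent with the Fisher-Yates loop that random.shuffle historically ran when given a custom float generator as its second argument (Python 3.11 removed that argument from random.shuffle). The following standalone sketch, shuffle_with, is a hypothetical helper rather than PICOS code; it reimplements that loop so that the fake RNG reproduces the same order.

```python
import random


def shuffle_with(items, rng=None):
    """Return a shuffled copy of items using the Fisher-Yates algorithm.

    Hypothetical helper; rng must return floats in [0, 1) and defaults
    to random.random.
    """
    rng = random.random if rng is None else rng
    result = list(items)
    for i in reversed(range(1, len(result))):
        j = int(rng() * (i + 1))  # Pick a position in result[:i + 1].
        result[i], result[j] = result[j], result[i]
    return result


print(shuffle_with(range(6), lambda: 0.5))  # [0, 5, 1, 4, 2, 3]
```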
property covariance

The sample covariance matrix, normalized by n - 1 where n is the number of samples.
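The numbers in the class example are consistent with the unbiased estimator that divides by n - 1: for instance, the top-left entry 3.5 equals 17.5 / 5, where 17.5 is the sum of squared deviations of 1, ..., 6 from their mean 3.5. The following pure-Python sketch, with hypothetical helpers sample_mean and sample_covariance (not PICOS functions), computes the mean and that covariance for the same six vectorized samples.

```python
def sample_mean(columns):
    """Componentwise mean of equally long sample vectors."""
    n = len(columns)
    return [sum(col[d] for col in columns) / n for d in range(len(columns[0]))]


def sample_covariance(columns):
    """Covariance matrix with the unbiased 1 / (n - 1) normalization."""
    n, dim = len(columns), len(columns[0])
    if n < 2:
        raise ValueError("need at least two samples")
    mu = sample_mean(columns)
    return [[sum((col[a] - mu[a]) * (col[b] - mu[b]) for col in columns) / (n - 1)
             for b in range(dim)]
            for a in range(dim)]


# The vectorized samples from the class example: sample i is i * (1, 2, 3, 4).
columns = [[i, 2 * i, 3 * i, 4 * i] for i in range(1, 7)]
print(sample_mean(columns))           # [3.5, 7.0, 10.5, 14.0]
print(sample_covariance(columns)[0])  # [3.5, 7.0, 10.5, 14.0]
```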

property dim

Dimension of each (vectorized) sample.

property matrix

A matrix whose columns are the samples.

property mean

The sample mean as a column vector.

property num

Number of samples.

property original

A tuple containing the samples in their original shape.

property original_shape

Original shape of the samples before vectorization.

property vectors

A tuple containing the samples as column vectors.