class documentation

class DataDict(defaultdict):

Constructor: DataDict(dict_)

A (default) dictionary of the form parameter_value: list_of_measures, e.g. with a float parameter or dimension as key and runtimes as values. This class is used under the hood in the Results class.

A main functionality is the method clean, which joins all entries that have almost equal keys. This makes it possible to use a float parameter as key.

This class provides simple computations on this kind of data, like x, y = .xy_arrays() == sorted(keys), sp1(values).

If the dictionary values are not lists, one may get rather unexpected results or exceptions.

Details: this class allows float values to be used as keys when clean_key and set_clean are used to access the data in the dict. Because the class inherits from defaultdict with list as default value, the syntax:

data = DataDict()
data[first_key] += [first_data_point]

without initialization of the key value works perfectly fine.
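
This behavior comes from collections.defaultdict itself, so it can be tried without DataDict. The following is a minimal sketch that uses a plain defaultdict(list) as a stand-in; the keys and values are made up for illustration.

```python
from collections import defaultdict

# Stand-in for DataDict: missing keys start out as empty lists.
data = defaultdict(list)

# No prior `data[0.5] = []` is needed; the first access creates the list.
data[0.5] += [1.2]
data[0.5] += [1.4]
data[2.0] += [3.1]

runtimes_per_parameter = dict(data)
```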

Caveat: the default comparison uses an absolute tolerance, so small key values close to zero are all considered the same key. Either use a different comparison via the equal keyword parameter, or use 1 / key_value or log(key_value) as key.
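
The caveat can be checked with the default comparison that appears in the method signatures. In the sketch below, relative_equal is a hypothetical replacement written for this illustration (it is not part of the class), and the two key values are made up.

```python
import math

# Default comparison from the method signatures: absolute tolerance 1e-06.
default_equal = lambda x, y: x - 1e-06 < y < x + 1e-06

# Hypothetical alternative (an assumption, not library code): relative tolerance only.
def relative_equal(x, y, rel_tol=1e-6):
    return math.isclose(x, y, rel_tol=rel_tol, abs_tol=0.0)

# Two parameter values that differ by a factor of ten, both close to zero.
small_a, small_b = 1e-8, 1e-7

absolute_says_equal = default_equal(small_a, small_b)    # keys would be merged
relative_says_equal = relative_equal(small_a, small_b)   # keys stay separate

# Comparing log(key_value) with the default comparison also separates them.
log_says_equal = default_equal(math.log(small_a), math.log(small_b))
```

A function like relative_equal could then be passed as the equal keyword parameter, for example to clean.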

TODO: consider numpy.allclose for almost equal comparison?

Method __init__ Use dict(dict_.data or dict_), and dict_.meta_data for initialization.
Method __repr__ Undocumented
Method aggregated WIP return a dict with (agg(data), number_of_samples) per key.
Method argmin slack_index_shift can be +-1, looking to the right/left of argmin
Method clean merge keys which have almost the same value
Method clean_key set similar key values all to be key, return key.
Method get_near get the merged values list of all nearby keys.
Method percentile TODO:review percentile based on bootstrapping
Method samples return a dict with the number of values per key
Method set_clean join all entries with similar key and return the new value, a joined list of all respective values.
Method test return p-value of the mannwhitneyu test
Method tests return p-values of the mannwhitneyu test of entries adjacent with gap
Method update update data lists from a dict of lists (and only a dict)
Method xy_arrays return two arrays ready to be plotted like plot(*xy_arrays).
Instance Variable meta_data Undocumented
Property successes return a class instance with attributes x (i.e. keys), n, nsucc, and rate as arrays.
Method _near_key return a key in self which is equal to key, or key itself if there is none.
def __init__(self, dict_=None):

Use dict(dict_.data or dict_), and dict_.meta_data for initialization.

Details: dict_.meta_data are assigned as a reference.

def __repr__(self):

Undocumented

def aggregated(self, agg=sp1, relative=False, by=None):

WIP return a dict with (agg(data), number_of_samples) per key.

For example, to get the 10th percentile:

.aggregated(lambda x: np.percentile(x, 10))
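
The documented return shape can be sketched on a plain dict of lists. This is only an illustration, not the implementation in the class: aggregated_sketch is a hypothetical helper, statistics.median stands in for the default sp1, and the relative and by parameters are ignored.

```python
from statistics import median

def aggregated_sketch(data, agg=median):
    """Return {key: (agg(values), number_of_samples)} for a dict of lists."""
    return {key: (agg(values), len(values)) for key, values in data.items()}

# Made-up runtimes per parameter value.
runtimes = {1.0: [3.0, 5.0, 4.0], 2.0: [10.0, 12.0]}

summary = aggregated_sketch(runtimes)           # median and sample count per key
fastest = aggregated_sketch(runtimes, agg=min)  # any callable on a list works as agg
```
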
def argmin(self, agg=sp1, slack=1.0, slack_index_shift=+1):

slack_index_shift can be +-1, looking to the right/left of argmin

def clean(self, equal=(lambda x, y: x - 1e-06 < y < x + 1e-06)):

merge keys which have almost the same value
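
The idea of merging almost equal keys can be sketched as follows. merge_close_keys is a hypothetical helper on a plain dict, not the method itself; which of the merged keys survives and the order of the joined values are assumptions of this sketch (it keeps the smallest key).

```python
def merge_close_keys(data, equal=lambda x, y: x - 1e-06 < y < x + 1e-06):
    """Return a new dict in which almost equal keys share one list of values."""
    merged = {}
    for key in sorted(data):
        # Look for an already kept key that compares equal to this one.
        for kept in merged:
            if equal(kept, key):
                merged[kept] = merged[kept] + list(data[key])
                break
        else:
            # No kept key is close enough: start a new entry.
            merged[key] = list(data[key])
    return merged

# 0.1 + 0.2 is 0.30000000000000004 in floating point, not exactly 0.3,
# so the dict below has three distinct keys.
raw = {0.3: [1.0], 0.1 + 0.2: [2.0], 0.5: [3.0]}
cleaned = merge_close_keys(raw)
```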

def clean_key(self, key, equal=(lambda x, y: x - 1e-06 < y < x + 1e-06)):

set similar key values all to be key, return key.

Use method set_clean to access and change the clean-key dictionary value more conveniently.

def get_near(self, key, equal=(lambda x, y: x - 1e-06 < y < x + 1e-06)):

get the merged values list of all nearby keys.

Caveat: the returned value is a new list

See Also
clean, set_clean.
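
What the "new list" caveat implies for the caller can be sketched on a plain dict. get_near_sketch is a hypothetical helper that mimics the documented behavior; the order of the merged values is an assumption.

```python
def get_near_sketch(data, key, equal=lambda x, y: x - 1e-06 < y < x + 1e-06):
    """Return a new list with the values of all keys that compare equal to key."""
    values = []
    for other in sorted(data):
        if equal(key, other):
            values += data[other]  # extends the new list, not a stored one
    return values

data = {0.3: [1.0], 0.1 + 0.2: [2.0], 0.5: [3.0]}
near = get_near_sketch(data, 0.3)
near.append(99.0)  # changes only the returned list; the stored lists are untouched
```
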
def percentile(self, prctile, agg=sp1, samples=100):

TODO:review percentile based on bootstrapping

def samples(self):

return a dict with the number of values per key

def set_clean(self, key):

join all entries with similar key and return the new value, a joined list of all respective values.

This is the same as clean_key, which however returns the new key and not the values.

Example:

data.set_clean(key).append(new_data_point)

# same as
data[data.clean_key(key)] += [new_data_point]

# or more explicit, however with a different order of the data
data[key] += [new_data_point]
data.clean_key(key)  # joins data, however in the "wrong" order

# similar to
data[key] += [new_data_point]
data.clean()  # cleans *all* keys
def test(self, key, key2=None, method='auto'):

return p-value of the mannwhitneyu test

def tests(self, gap=1, method='auto'):

return p-values of the mannwhitneyu test of entries adjacent with gap

def update(self, dict_):

update data lists from a dict of lists (and only a dict)

def xy_arrays(self, agg=sp1, type_=np.asarray):

return two arrays ready to be plotted like plot(*xy_arrays).

The x-array contains the sorted keys, the y-array contains the respectively aggregated values.

For example, to be used like:

plot(*self.xy_arrays())

Parameter agg determines the function to aggregate data values, by default sp1 which is the mean corrected for missing data. To show dispersion, we can use agg=lambda x: np.percentile(x, 10) and ..., 90).
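
The pairing of sorted keys with aggregated values can be sketched without the class. In this illustration xy_arrays_sketch is a hypothetical helper, statistics.mean stands in for sp1 (so missing data is not corrected for), list stands in for np.asarray, and min and max stand in for the percentile functions.

```python
from statistics import mean

def xy_arrays_sketch(data, agg=mean, type_=list):
    """Return (sorted keys, aggregated values in the same order)."""
    keys = sorted(data)
    return type_(keys), type_([agg(data[key]) for key in keys])

# Made-up runtimes per dimension, inserted in unsorted key order.
runtimes = {4.0: [8.0, 10.0], 2.0: [3.0, 5.0], 8.0: [20.0]}

x, y = xy_arrays_sketch(runtimes)                     # central curve
x_low, y_low = xy_arrays_sketch(runtimes, agg=min)    # lower envelope
x_high, y_high = xy_arrays_sketch(runtimes, agg=max)  # upper envelope
```

Each returned pair can be unpacked into a plotting call in the same way as plot(*xy_arrays()).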

meta_data =

Undocumented

@property
successes =

return a class instance with attributes x (i.e. keys), n, nsucc, and rate as arrays.

TODO: consider using cma.utilities.utils.DictClass2 instead of namedtuple?

def _near_key(self, key, equal=(lambda x, y: x - 1e-06 < y < x + 1e-06), exclude=None):

return a key in self which is equal to key, or key itself if there is none.