A (default) dictionary like parameter_value: list_of_measures,
e.g. with a float parameter or dimension as key and runtimes as values. This
class is used under the hood in the Results class.
A main functionality is the method clean, which joins all entries
that have almost equal keys. This allows float parameters to be used
as keys.
This class provides simple computations on this kind of data, like x, y = .xy_arrays(), where x == sorted(keys) and y holds sp1(values) for each key.
If the dictionary values are not lists, one may get rather unexpected results or exceptions.
Details: this class allows float values to be used as keys when
clean_key and set_clean are used to access the data in the
dict. Because the class inherits from defaultdict with list as default value,
the syntax:

    data = DataDict()
    data[first_key] += [first_data_point]

without initialization of the key value works perfectly fine.
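The initialization-free `+=` is plain `collections.defaultdict` behavior. A minimal sketch, using a bare `defaultdict(list)` as a stand-in for `DataDict` (the keys and data points are made up for illustration):

```python
from collections import defaultdict

# Stand-in for DataDict: a plain defaultdict with list as default value.
data = defaultdict(list)

# No prior initialization of the key is needed: the first access
# creates an empty list, which += then extends in place.
data[0.5] += [1.2]
data[0.5] += [0.9]
data[2.0] += [3.4]

print(dict(data))
```

With a regular `dict`, the first `+=` would raise a `KeyError` instead.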
Caveat: distinct small values close to zero are considered to be the
same key. Either use a different comparison via the equal
keyword parameter, or use 1 / key_value or log(key_value) as key.
TODO: consider numpy.allclose for almost equal comparison?
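The caveat can be illustrated with a tolerance-based comparison. The sketch below uses `math.isclose` with an absolute tolerance as a hypothetical stand-in for the comparison (the class's actual default comparison and tolerances are not documented here, so `almost_equal` and its tolerance values are assumptions), and shows how taking `log(key_value)` separates small keys:

```python
import math

def almost_equal(a, b, rel_tol=1e-9, abs_tol=1e-9):
    """Hypothetical comparison; the tolerances are illustrative assumptions."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)

# Two small keys that differ by a factor of 1000 still compare as equal,
# because their absolute difference is below abs_tol.
small_a, small_b = 1e-12, 1e-15
same_raw = almost_equal(small_a, small_b)

# On a log scale, the same two keys are clearly distinct.
same_log = almost_equal(math.log(small_a), math.log(small_b))

print(same_raw, same_log)
```

A comparison function of this shape is the kind of callable one might pass via the equal keyword parameter, though its exact expected signature should be checked against the class itself.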
| Kind | Name | Description |
|---|---|---|
| Method | __init__ | Use dict(dict_.data or dict_), and dict_.meta_data for initialization. |
| Method | __repr__ | Undocumented |
| Method | aggregated | WIP: return a dict with (agg(data), number_of_samples) per key. |
| Method | argmin | slack_index_shift can be +-1, looking to the right/left of argmin |
| Method | clean | merge keys which have almost the same value |
| Method | clean_key | set similar key values all to be key, return key. |
| Method | get | get the merged values list of all nearby keys. |
| Method | percentile | TODO: review percentile based on bootstrapping |
| Method | samples | return a dict with the number of values per key |
| Method | set_clean | join all entries with similar key and return the new value, a joined list of all respective values. |
| Method | test | return p-value of the mannwhitneyu test |
| Method | tests | return p-values of the mannwhitneyu test of entries adjacent with gap |
| Method | update | update data lists from a dict of lists (and only a dict) |
| Method | xy_arrays | return two arrays ready to be plotted like plot(*xy_arrays). |
| Instance Variable | meta | Undocumented |
| Property | successes | return a class instance with attributes x (i.e. keys), n, nsucc, and rate as arrays. |
| Method | _near | return a key in self which is equal to key, and otherwise key itself. |
Use dict(dict_.data or dict_), and dict_.meta_data for
initialization.
Details: dict_.meta_data is assigned by reference, not copied.
WIP: return a dict with (agg(data), number_of_samples) per key.
For example, to get the 10th percentile:
.aggregated(lambda x: np.percentile(x, 10))
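The return shape can be sketched with a standalone function. This is an illustration only, not the method's implementation: the free function `aggregated` and its default `agg=statistics.mean` are assumptions, and the data is made up.

```python
import statistics
from collections import defaultdict

def aggregated(data, agg=statistics.mean):
    """Return {key: (agg(values), number_of_samples)} for a dict of lists.

    Hypothetical stand-in; the default aggregator is an assumption.
    """
    return {key: (agg(values), len(values)) for key, values in data.items()}

data = defaultdict(list)
data[1.0] += [3.0, 1.0, 2.0]
data[2.0] += [10.0, 20.0]

# Any callable that maps a list to a single value works as agg.
result = aggregated(data, agg=min)
print(result)
```

Passing `lambda x: np.percentile(x, 10)` as `agg` would follow the same pattern, given that NumPy is imported as `np`.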
set all keys that are similar to key to be key, return key.
Use the method set_clean to access and change the dictionary value
of the cleaned key more conveniently.
join all entries with similar key and return the new value,
a joined list of all respective values.
This is the same as clean_key, which however returns the new key instead
of the values.
Example:

    data.set_clean(key).extend([new_data_point])  # same as
    data[data.clean_key(key)] += [new_data_point]
    # or more explicitly, however with a different order of the data
    data[key] += [new_data_point]
    data.clean_key(key)  # joins data, however in the "wrong" order
    # similar to
    data[key] += [new_data_point]
    data.clean()  # cleans *all* keys
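How joining under a clean key might work can be sketched with a small stand-in class. `MiniDataDict` is hypothetical and much simpler than `DataDict`; in particular, the use of `math.isclose` with `rel_tol=1e-9` as the key comparison is an assumption made for this illustration.

```python
import math
from collections import defaultdict

class MiniDataDict(defaultdict):
    """Hypothetical, simplified stand-in for DataDict (not the real class)."""

    def __init__(self):
        super().__init__(list)

    def clean_key(self, key):
        """Move the values of all keys close to `key` under `key`, return `key`."""
        for k in list(self):  # iterate over a copy, the dict changes in the loop
            if k != key and math.isclose(k, key, rel_tol=1e-9):
                self[key] += self.pop(k)
        return key

    def set_clean(self, key):
        """Join entries with similar keys and return the joined list."""
        return self[self.clean_key(key)]

data = MiniDataDict()
data[0.1 + 0.2] += [1.0]  # the key is 0.30000000000000004, not 0.3
data[0.3] += [2.0]

# Joins both entries under the key 0.3, then appends to the joined list.
data.set_clean(0.3).append(3.0)
print(dict(data))
```

In this sketch the values already stored under the requested key come first, which is one way the order of the joined data can differ between the variants shown in the example.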
return two arrays ready to be plotted like plot(*xy_arrays).
The x-array contains the sorted keys, the y-array contains the respectively aggregated values.
For example, use it like ``plot(*self.xy_arrays())``.
The parameter agg determines the function used to aggregate the data values, by
default sp1, which is the mean corrected for missing data. To show
dispersion, we can use agg=lambda x: np.percentile(x, 10) and ...,
90).
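The pairing of sorted keys with aggregated values can be sketched as follows. This is an illustration under assumptions: the free function `xy_arrays` returns plain lists rather than arrays, and `statistics.mean` stands in for sp1, so it does not correct for missing data.

```python
import statistics

def xy_arrays(data, agg=statistics.mean):
    """Return (sorted keys, aggregated values per key) for a dict of lists.

    Hypothetical stand-in: plain lists, and mean instead of sp1.
    """
    x = sorted(data)
    y = [agg(data[key]) for key in x]
    return x, y

data = {2.0: [4.0, 6.0], 1.0: [1.0, 3.0]}

x, y = xy_arrays(data)
# A lower and an upper curve to indicate dispersion, here simply min and max.
low = xy_arrays(data, agg=min)[1]
high = xy_arrays(data, agg=max)[1]
print(x, y, low, high)
```

Because both return values are aligned with the sorted keys, the result can be unpacked directly into a plotting call that takes x and y as positional arguments.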
return a class instance with attributes x (i.e. keys), n,
nsucc, and rate as arrays.
TODO: consider using cma.utilities.utils.DictClass2 instead of namedtuple?
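A sketch of the returned structure, built with `collections.namedtuple`. What counts as a success is not stated here, so the predicate `is_success=math.isfinite` is purely an assumption, as are the function name and the use of plain lists instead of arrays.

```python
import math
from collections import namedtuple

Successes = namedtuple("Successes", ["x", "n", "nsucc", "rate"])

def successes(data, is_success=math.isfinite):
    """Return per-key sample counts, success counts and success rates.

    Hypothetical stand-in; the success criterion is an assumption.
    """
    x = sorted(data)
    n = [len(data[k]) for k in x]
    nsucc = [sum(1 for v in data[k] if is_success(v)) for k in x]
    rate = [s / m if m else float("nan") for s, m in zip(nsucc, n)]
    return Successes(x, n, nsucc, rate)

# inf marks an unsuccessful run in this made-up data.
data = {1.0: [0.5, math.inf, 0.7, 0.9], 2.0: [math.inf, 1.5]}
res = successes(data)
print(res)
```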