class documentation

class ResultsPandas:

Constructor: ResultsPandas(filepathname, format)


WIP: ad hoc code for writing (and changing) results in a pandas.DataFrame and to disk, where "results" refers to the end-results of experiment repetitions rather than the traces of single runs.

Caveat: this is an ad hoc implementation; some interfaces may be incomplete, interface details are still volatile and in flux, and some recent changes may be faulty.

Main features, similar to Results, are

  • Intermediate saving of results, which can consequently be loaded from a different shell while the experiment is running.
  • Backup under the current timestamp into a 'backups-name' folder before each save (optional, but on by default).
  • Similar float values can be "equalized" for correct data aggregation, see the check_close_values and equalize_close_values methods. This uses np.isclose which considers 1e-8 to be close to 1e-9 and 1+1e-5 close to 1+1e-6.
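The np.isclose defaults behind this equalization can be checked directly (plain numpy, independent of this class):

```python
import numpy as np

# np.isclose tests |a - b| <= atol + rtol * |b| with defaults atol=1e-8, rtol=1e-5,
# so values differing by an order of magnitude near zero still compare equal:
near_zero = np.isclose(1e-8, 1e-9)         # True: 9e-9 <= 1e-8 + 1e-5 * 1e-9
near_one = np.isclose(1 + 1e-5, 1 + 1e-6)  # True: 9e-6 <= 1e-8 + 1e-5 * (1 + 1e-6)
farther = np.isclose(1e-7, 1e-8)           # False: 9e-8 > 1e-8 + 1e-5 * 1e-8
```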

Compared to Results, this class is useful for storing more information for each run, like the termination condition, a final condition number, constraint violations, or meta parameter information. (In contrast, with Results the workaround for catching the final condition would be to write a nonfinite entry when the target was not reached.)

A guiding code example:

import cma.experimentation

res = cma.experimentation.ResultsPandas('some-name')  # reloads data to append/continue
for dim in dimensions:
    es = cma.CMAEvolutionStrategy(dim * [2], 1, {'verbose': -9})
    # a tracker in case we want to track, say, a minimum over the trace as result
    es.optimize(cma.ff.rosen, callback=my_result_tracker)
    if es.opts.get('verbose') == -9:  # set another verbosity while testing the setup to skip saving
        res.append([es.N, es.popsize,
                    es.result.evaluations,
                    1 if 'ftarget' in es.stop() else 0,
                    repr(es.stop()),
                    es.condition_number,
                    my_result_tracker.my_val_of_interest],
                   columns=['dimension', 'popsize',
                            'evaluations',
                            'targethit',
                            'stopcondition',
                            'conditionnumber',
                            'myvalue'])
        res.save()

# load the data:
res = cma.experimentation.ResultsPandas('some-name')
print(res.summary)  # summary statistics of columns in a dict
res.df  # the pandas data frame

Note: self.df.drop(...) can be used to return a data frame with some entries dropped.

Method __getattr__ delegate attribute access to the DataFrame self.df
Method __init__ load data when filepathname exists.
Method append append a single data row
Method backup backup saved data by making a file copy
Method check_close_values Undocumented
Method column_values a sorted list of the values in the column named column.
Method drop call self.df.drop and reassign result
Method equalize_close_values Undocumented
Method extend extend frame by data which is a sequence of rows
Method load Undocumented
Method print_checks do not use iterrows but itertuples https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
Method reset_index caveat: the original index is lost which may be undesirable
Method save save data, warn when saving takes more than time_s seconds.
Instance Variable backup_dirname Undocumented
Instance Variable df Undocumented
Instance Variable last_backup Undocumented
Instance Variable name Undocumented
Property failure_indices TODO: revise such that we have a boolean failed_setting column
Property summary return number of finite entries, number of different values, and (min, max) for each column.
Method _polish a quick hack
Instance Variable _extension Undocumented
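To illustrate the itertuples-over-iterrows advice referenced by print_checks above: itertuples yields lightweight namedtuples (fast, attribute access) while iterrows constructs a Series per row. A toy frame, not part of this class:

```python
import pandas as pd

df = pd.DataFrame({'dimension': [2, 10], 'evaluations': [150, 900]})

# itertuples(index=False) yields one namedtuple per row
rows = [(row.dimension, row.evaluations) for row in df.itertuples(index=False)]
```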
def __getattr__(self, name):

delegate attribute access to the DataFrame self.df, so attributes of the data frame can be accessed directly from the ResultsPandas instance

def __init__(self, filepathname, format='.csv'):

load data when filepathname exists.

format='.csv' is human readable; '.feather' is much faster, and '.parquet' should work too.
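For intuition on the format trade-off, a plain pandas round-trip sketch: '.csv' goes through text, while '.feather' and '.parquet' (df.to_feather / df.to_parquet, requiring the pyarrow package) are binary and preserve dtypes:

```python
import io
import pandas as pd

df = pd.DataFrame([[2, 150]], columns=['dimension', 'evaluations'])

# '.csv' round-trip: human readable, but values pass through text
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)  # dtypes are re-inferred from the text
```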

def append(self, data, columns, kwargs_df={}, kwargs_concat={}):

append a single data row

def backup(self):

backup saved data by making a file copy

def check_close_values(self, column):

Undocumented

def column_values(self, column, by='dimension'):

a sorted list of the values in the column named column.

def drop(self, *args, **kwargs):

call self.df.drop and reassign result
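Since DataFrame.drop returns a new frame by default, the reassignment is the essential step; a plain pandas sketch of the effect (made-up data):

```python
import pandas as pd

df = pd.DataFrame({'evaluations': [150, 900, 40]})

# drop returns a new frame; reassigning makes the change stick,
# which is presumably what the method wraps
df = df.drop(index=[2])
remaining = list(df['evaluations'])
```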

def equalize_close_values(self, column):

Undocumented

def extend(self, data, columns, kwargs_df={}, kwargs_concat={}):

extend frame by data which is a sequence of rows
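Presumably the rows end up in the frame via pd.DataFrame and pd.concat (which would also be where kwargs_df and kwargs_concat go); a sketch with made-up data, not the actual implementation:

```python
import pandas as pd

existing = pd.DataFrame([[2, 150]], columns=['dimension', 'evaluations'])
data = [[10, 900], [20, 2500]]  # a sequence of rows

new_rows = pd.DataFrame(data, columns=['dimension', 'evaluations'])
# ignore_index=True renumbers the index; the real method may keep it instead
extended = pd.concat([existing, new_rows], ignore_index=True)
```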

def load(self, polish=False):

Undocumented

def reset_index(self):

caveat: the original index is lost which may be undesirable

def save(self, backup=True, time_s=2, **kwargs):

save data, warn when saving takes more than time_s seconds.

kwargs are passed to the saving method of the pandas data frame.

Details: save is in essence just a shortcut for:

self.backup()
self.df.to_feather(self.name + self._extension)  # assuming self._extension == '.feather'

TODO: generalize by passing the DataFrame method name for saving, like .save('to_feather')? Annoyingly, pandas does not add a proper extension by default.
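The generalization mentioned in the TODO could look roughly like the following sketch (save_df, WRITER_EXTENSIONS, and the dispatch are hypothetical, not part of the class):

```python
import os
import pandas as pd

# hypothetical: map the DataFrame writer method name to a file extension,
# since pandas does not add one by default
WRITER_EXTENSIONS = {'to_csv': '.csv', 'to_feather': '.feather', 'to_parquet': '.parquet'}

def save_df(df, name, writer='to_csv', **kwargs):
    """dispatch to getattr(df, writer) and append the matching extension"""
    path = name + WRITER_EXTENSIONS[writer]
    getattr(df, writer)(path, **kwargs)
    return path

path = save_df(pd.DataFrame({'dimension': [2]}), 'tmp-results-demo', 'to_csv', index=False)
```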

backup_dirname =

Undocumented

df =

Undocumented

last_backup =

Undocumented

name =

Undocumented

@property
failure_indices =

TODO: revise such that we have a boolean failed_setting column

@property
summary =

return number of finite entries, number of different values, and (min, max) for each column.
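A sketch of how such a summary could be computed with plain pandas/numpy (illustrating the described output, not the actual implementation):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'evaluations': [150.0, 900.0, np.nan],
                   'targethit': [1, 1, 0]})

summary = {col: (int(np.isfinite(df[col]).sum()),  # number of finite entries
                 int(df[col].nunique()),           # number of different values
                 (df[col].min(), df[col].max()))   # (min, max), NaN skipped
           for col in df.columns}
```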

def _polish(self, column):

a quick hack

_extension =

Undocumented