class documentation

class ResultsPandas:

Constructor: ResultsPandas(filepathname, format)


WIP: ad hoc code for writing (and changing) results in a pandas.DataFrame and to disk, where "results" refers to the end-results of experiment repetitions rather than the traces of single runs.

Caveat: this is an ad hoc implementation; some interfaces may be incomplete, interface details are still volatile and in flux, and some recent changes may be faulty.

Main features, similar to Results, are

  • Intermediate saving of results, which can consequently be loaded from a different shell while the experiment is running.
  • Backup under the current timestamp into a 'backups-name' folder before each save (optional, but on by default).
  • Similar float values can be "equalized" for correct data aggregation, see the check_close_values and equalize_close_values methods. This uses np.isclose which considers 1e-8 to be close to 1e-9 and 1+1e-5 close to 1+1e-6.
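The np.isclose defaults behind this equalization can be checked directly (plain numpy, independent of this class):

```python
import numpy as np

# np.isclose tests |a - b| <= atol + rtol * |b| with defaults atol=1e-8, rtol=1e-5,
# so values differing by an order of magnitude near zero still compare equal:
near_zero = np.isclose(1e-8, 1e-9)         # True: 9e-9 <= 1e-8 + 1e-5 * 1e-9
near_one = np.isclose(1 + 1e-5, 1 + 1e-6)  # True: 9e-6 <= 1e-8 + 1e-5 * (1 + 1e-6)
farther = np.isclose(1e-7, 1e-8)           # False: 9e-8 > 1e-8 + 1e-5 * 1e-8
```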

Compared to Results, this class is useful for storing more information for each run, like the termination condition, a final condition number, constraint violations, or meta parameter information. (In contrast, with Results the workaround for catching the final condition would be to write a nonfinite entry when the target was not reached.)

A guiding code example:

import cma.experimentation

res = cma.experimentation.ResultsPandas('some-name')  # reloads data to append/continue
for dim in dimensions:
    es = cma.CMAEvolutionStrategy(dim * [2], 1, {'verbose': -9})
    # a tracker in case we want to track, say, a minimum over the trace as result
    es.optimize(cma.ff.rosen, callback=my_result_tracker)
    if es.opts.get('verbose') == -9:  # set another verbosity while testing the setup to skip saving
        res.append([es.N, es.popsize,
                    es.result.evaluations,
                    1 if 'ftarget' in es.stop() else 0,
                    repr(es.stop()),
                    es.condition_number,
                    my_result_tracker.my_val_of_interest],
                   columns=['dimension', 'popsize',
                            'evaluations',
                            'targethit',
                            'stopcondition',
                            'conditionnumber',
                            'myvalue'])
        res.save()

# load the data:
res = cma.experimentation.ResultsPandas('some-name')
print(res.summary)  # summary statistics of columns in a dict
res.df  # the pandas data frame

Note: self.df.drop(...) can be used to return a data frame with some entries dropped.

Method __getattr__ delegate attribute access to the DataFrame self.df
Method __init__ load data when filepathname exists.
Method append append a single data row
Method backup backup saved data by making a file copy
Method check_close_values Undocumented
Method column_values a sorted list of the values in the column named column.
Method drop call self.df.drop and reassign result
Method equalize_close_values Undocumented
Method extend extend frame by data which is a sequence of rows
Method load Undocumented
Method print_checks do not use iterrows but itertuples https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
Method reset_index caveat: the original index is lost which may be undesirable
Method save save data, warn when saving takes more than time_s seconds.
Instance Variable backup_dirname Undocumented
Instance Variable df Undocumented
Instance Variable last_backup Undocumented
Instance Variable name Undocumented
Property failure_indices TODO: revise such that we have a boolean failed_setting column
Property summary return number of finite entries, number of different values, and (min, max) for each column.
Method _polish a quick hack
Instance Variable _extension Undocumented
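To illustrate the itertuples-over-iterrows advice referenced by print_checks above: itertuples yields lightweight namedtuples (fast, attribute access) while iterrows constructs a Series per row. A toy frame, not part of this class:

```python
import pandas as pd

df = pd.DataFrame({'dimension': [2, 10], 'evaluations': [150, 900]})

# itertuples(index=False) yields one namedtuple per row
rows = [(row.dimension, row.evaluations) for row in df.itertuples(index=False)]
```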
def __getattr__(self, name):

delegate attribute access to the DataFrame self.df, so attributes of the data frame can be accessed directly from the ResultsPandas instance

def __init__(self, filepathname, format='.csv'):

load data when filepathname exists.

format='.csv' is human readable; '.feather' is much faster, and '.parquet' should work too.
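For intuition on the format trade-off, a plain pandas round-trip sketch: '.csv' goes through text, while '.feather' and '.parquet' (df.to_feather / df.to_parquet, requiring the pyarrow package) are binary and preserve dtypes:

```python
import io
import pandas as pd

df = pd.DataFrame([[2, 150]], columns=['dimension', 'evaluations'])

# '.csv' round-trip: human readable, but values pass through text
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)  # dtypes are re-inferred from the text
```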

def append(self, data, columns, kwargs_df={}, kwargs_concat={}):

append a single data row

def backup(self):

backup saved data by making a file copy

def check_close_values(self, column):

Undocumented

def column_values(self, column, by='dimension'):

a sorted list of the values in the column named column.

def drop(self, *args, **kwargs):

call self.df.drop and reassign result
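Since DataFrame.drop returns a new frame by default, the reassignment is the essential step; a plain pandas sketch of the effect (made-up data):

```python
import pandas as pd

df = pd.DataFrame({'evaluations': [150, 900, 40]})

# drop returns a new frame; reassigning makes the change stick,
# which is presumably what the method wraps
df = df.drop(index=[2])
remaining = list(df['evaluations'])
```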

def equalize_close_values(self, column):

Undocumented

def extend(self, data, columns, kwargs_df={}, kwargs_concat={}):

extend frame by data which is a sequence of rows
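Presumably the rows end up in the frame via pd.DataFrame and pd.concat (which would also be where kwargs_df and kwargs_concat go); a sketch with made-up data, not the actual implementation:

```python
import pandas as pd

existing = pd.DataFrame([[2, 150]], columns=['dimension', 'evaluations'])
data = [[10, 900], [20, 2500]]  # a sequence of rows

new_rows = pd.DataFrame(data, columns=['dimension', 'evaluations'])
# ignore_index=True renumbers the index; the real method may keep it instead
extended = pd.concat([existing, new_rows], ignore_index=True)
```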

def load(self, polish=False):

Undocumented

def reset_index(self):

caveat: the original index is lost which may be undesirable

def save(self, backup=True, time_s=2, **kwargs):

save data, warn when saving takes more than time_s seconds.

kwargs are passed to the saving method of the pandas data frame.

Details: save is in essence just a shortcut for:

self.backup()
self.df.to_feather(self.name + self._extension)  # assuming self._extension == '.feather'

TODO: generalize by passing the DataFrame method name for saving, like .save('to_feather')? Annoyingly, pandas does not add a proper extension by default.
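The generalization mentioned in the TODO could look roughly like the following sketch (save_df, WRITER_EXTENSIONS, and the dispatch are hypothetical, not part of the class):

```python
import os
import pandas as pd

# hypothetical: map the DataFrame writer method name to a file extension,
# since pandas does not add one by default
WRITER_EXTENSIONS = {'to_csv': '.csv', 'to_feather': '.feather', 'to_parquet': '.parquet'}

def save_df(df, name, writer='to_csv', **kwargs):
    """dispatch to getattr(df, writer) and append the matching extension"""
    path = name + WRITER_EXTENSIONS[writer]
    getattr(df, writer)(path, **kwargs)
    return path

path = save_df(pd.DataFrame({'dimension': [2]}), 'tmp-results-demo', 'to_csv', index=False)
```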

backup_dirname =

Undocumented

df =

Undocumented

last_backup =

Undocumented

name =

Undocumented

@property
failure_indices =

TODO: revise such that we have a boolean failed_setting column

@property
summary =

return number of finite entries, number of different values, and (min, max) for each column.
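A sketch of how such a summary could be computed with plain pandas/numpy (illustrating the described output, not the actual implementation):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'evaluations': [150.0, 900.0, np.nan],
                   'targethit': [1, 1, 0]})

summary = {col: (int(np.isfinite(df[col]).sum()),  # number of finite entries
                 int(df[col].nunique()),           # number of different values
                 (df[col].min(), df[col].max()))   # (min, max), NaN skipped
           for col in df.columns}
```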

def _polish(self, column):

a quick hack

_extension =

Undocumented