The following goals are set:
    'taskSubmitted': lambda a, b: maximization(a, b),
    'restarted_lost': lambda a, b: minimization(a, b),
    'restarted_noact': lambda a, b: minimization(a, b),
    'excessive_delay': lambda a, b: minimization(a, b)
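
The maximization and minimization helpers referenced above are not shown in this report. A minimal sketch of what they could look like, assuming they simply compare sample means according to the goal's objective:

    # Hypothetical sketch: the actual maximization/minimization helpers are not
    # part of this report; here they are assumed to compare sample means.
    from statistics import mean

    def maximization(a, b):
        # True when the new sample `a` has a higher mean than the previous `b`.
        return mean(a) > mean(b)

    def minimization(a, b):
        # True when the new sample `a` has a lower mean than the previous `b`.
        return mean(a) < mean(b)
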
For statistical tests the following thresholds are set:
    tiny_sample_size = 15 - the t-test should be used
    small_sample_size = 30 - the z-test can be used
    good_sample_size = 40 - the sample distribution does not need to be normal
    Sources: Larson and Farber; Moore, Notz and Fligner, see "Using the t Procedures"
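
As a sketch only (one assumed reading of these thresholds, not the tool's actual implementation):

    # Hypothetical sketch of how the sample-size thresholds might be applied.
    TINY_SAMPLE_SIZE = 15    # at or below this size only the t-test should be used
    SMALL_SAMPLE_SIZE = 30   # from this size on the z-test can be used
    GOOD_SAMPLE_SIZE = 40    # from this size on normality of the data is not required

    def choose_test(n):
        # One possible reading: t-test for small samples, z-test once the
        # sample reaches small_sample_size.
        return "t-test" if n < SMALL_SAMPLE_SIZE else "z-test"

    def normality_required(n):
        # With a good sample size the test is robust to non-normal data
        # (central limit theorem), so the normality check can be skipped.
        return n < GOOD_SAMPLE_SIZE
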

Testing each goal, comparing the new version against each previous version.

Check whether the new version performs better on each goal according to that goal's optimization objective (maximization or minimization). The data of the new version and of each previous version must be normally distributed, and the two samples must show evidence of a difference, before it can be said whether the new version is better or worse than a given previous version.
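
A minimal sketch of this per-version check, assuming Shapiro-Wilk for the normality test and a two-sample t-test for the difference test (scipy.stats.shapiro and scipy.stats.ttest_ind are the real scipy functions; the surrounding decision logic is an assumption, not the tool's actual code):

    # Hypothetical sketch of a single goal/version comparison using scipy.
    from statistics import mean
    from scipy import stats

    def compare_versions(new, old, better, alpha=0.05):
        # Both samples must look normally distributed (Shapiro-Wilk test:
        # a small p-value rejects normality).
        _, p_new = stats.shapiro(new)
        if p_new < alpha:
            return "The new version does not present a normal distribution!"
        _, p_old = stats.shapiro(old)
        if p_old < alpha:
            return "Data of previous version is not normally distributed."
        # Two-sample t-test: is there evidence of any difference at all?
        _, p_diff = stats.ttest_ind(new, old)
        if p_diff >= alpha:
            return "There is no evidence of difference!"
        # There is a difference; its direction depends on the goal's objective.
        verdict = "BETTER" if better(new, old) else "WORSE"
        return (f"The new version is {verdict} ({mean(new):.1f}) "
                f"than the previous ({mean(old):.1f})")

Here better would be the goal's comparator from the list above, for example maximization for taskSubmitted or minimization for excessive_delay.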

Comparing goal: taskSubmitted...

  taskSubmitted (0.17): New version data is comparable!
  taskSubmitted (0.17 vs 0.16): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.15): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.14): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.12.1): The new version is BETTER (7.7) than the previous (4.9)
  taskSubmitted (0.17 vs 0.13): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.12): The new version is BETTER (7.7) than the previous (4.8)
  taskSubmitted (0.17 vs 0.11): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.10): The new version is BETTER (7.7) than the previous (4.8)
  taskSubmitted (0.17 vs 0.9): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.8): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.7): There is no evidence of difference!
  taskSubmitted (0.17 vs 0.6): The new version is BETTER (7.7) than the previous (5.4)
  taskSubmitted (0.17 vs 0.5): The new version is BETTER (7.7) than the previous (2.6)
  taskSubmitted (0.17 vs 0.4): The new version is BETTER (7.7) than the previous (2.5)
  taskSubmitted (0.17 vs 0.3): The new version is BETTER (7.7) than the previous (2.3)
  taskSubmitted (0.17 vs 0.2): The new version is BETTER (7.7) than the previous (2.0)
  taskSubmitted (0.17 vs 0.1): Data of previous version is not normally distributed.

Comparing goal: restarted_lost...

  restarted_lost (0.17): The new version does not present a normal distribution!

Comparing goal: restarted_noact...

  restarted_noact (0.17): The new version does not present a normal distribution!

Comparing goal: excessive_delay...

  excessive_delay (0.17): New version data is comparable!
  excessive_delay (0.17 vs 0.16): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.15): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.14): There is no evidence of difference!
  excessive_delay (0.17 vs 0.12.1): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.13): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.12): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.11): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.10): Data of previous version is not normally distributed.
  excessive_delay (0.17 vs 0.9): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.8): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.7): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.6): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.5): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.4): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.3): The new version is WORSE (2.0) than the previous (0.0)
  excessive_delay (0.17 vs 0.2): The new version is WORSE (2.0) than the previous (0.0)
  (scipy warning during this comparison: input data for shapiro has range zero; the results may not be accurate)
  excessive_delay (0.17 vs 0.1): The new version is WORSE (2.0) than the previous (0.0)

Developed by Cleber Jorge Amaral.

Sources:

  • Shapiro-Wilk test
  • David S. Moore, William I. Notz, Michael A. Fligner. The Basic Practice of Statistics. 6th Edition. W. H. Freeman (2011)
  • Ron Larson and Betsy Farber. Elementary Statistics: Picturing the World. 5th Edition. Prentice Hall (2012)