Open Data

RECAP Artificial Data Traces

Throughout the course of the project, RECAP has collected a large amount amount of data. In an effort to increase transparency and research speed, RECAP has opted to release real and artificial workloads aiding in the research process throughout the field of cloud computing.

Data on:

  • Infrastructure and Network Management
  • Big Data Analytics
  • Edge/Fog Computing for Smart Cities
  • Virtual Content Delivery Networks

Was provided by the RECAP partners and extended using algorithms for synthetic workload generation such as:

  • Structural Models based Workload Generation
  • Regression-Model-based Workload Generation
  • GAN-based Workload Generation
  • Traffic Propagation based Workload Generation

The work was performed as part of WP5. The objective of the WP5 – Data Collection, Visualization and Analysis of RECAP is to provide the necessary tools for managing and refining the data needed for the rest of the work packages. This includes the collection as well as the generation of data. Within this work package, Task 5.3 Artificial Workload Generation is responsible for the generation of a collection of datasets with artificial workloads, that complement the real data traces collected from industrial partners. Moreover, because publicly available workload data is scarce we provide the data as public data sets. This document released with the data is a companion report to Deliverable D5.3 which is of type “dataset”. The aim of the report is to describe the collection of datasets that constitute D5.3 and the mathematical techniques (structural time series models, generative adversarial networks, and workload based on traffic propagation) by which one can artificially generate and/or augment such datasets. The datasets described include real data traces collected by industrial partners and artificial data traces generated by the use of statistical models and neural networks. Each published data set can be used by the scientific and industrial community as a starting point for the modelling and experimental validation of distributed edge and cloud applications, facilitating the repeatability of the results.

Download

The data is hosted on Zenodo.

Open Access Guidelines

Zenodo is an OpenAire indexed repository for open access publication of scientific work and data. It meets EU regulations in terms of data publications, and is hosted by CERN with a program defined for the next 20+ years.

Citation

Please use the follow BibTeX to cite the data:

@dataset{leznik_mark_2019_3458559,
  author       = {Leznik, Mark and
                  Garcia Leiva, Rafael and
                  Le Duc, Thang and
                  Svorobej, Sergej and
                  Närvä, Linus and
                  Noya Mariño, Manuel and
                  Willis, Peter and
                  Giannoutakis, Konstantinos M. and
                  Loomba, Radhika and
                  Humanes, Héctor and
                  López, Miguel Ángel and
                  Östberg, P-O and
                  Casari, Paolo and
                  Domaschka, Jörg},
  title        = {RECAP Artificial Data Traces},
  month        = oct,
  year         = 2019,
  note         = ,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.3458559},
  url          = {https://doi.org/10.5281/zenodo.3458559}
}