Timeseries: Simple Use Case

Have you seen that timeseries have been added in Scilab 2024.0.0?

Here is a simple use case using COVID data from https://covid.ourworldindata.org.

Read data

The data we want to process is stored in a CSV file.

Retrieve the data with the http_get function:

http_get("https://covid.ourworldindata.org/data/owid-covid-data.csv", "TMPDIR/owid-covid-data.csv", follow=%t);

Then read the file and select the useful columns with the readtimeseries function:

ts = readtimeseries("TMPDIR/owid-covid-data.csv", "VariableNames", ["date", "continent", "location", "new_cases", "new_deaths", "people_fully_vaccinated"])

Transform data as needed

Reshape the data with the pivot function for easier analysis, here summing new cases per continent and per year:

p = pivot(ts, Rows="continent", Columns="date", ColumnsBinMethod="year", Method="sum", DataVariable="new_cases")

Plot

Use the stackedplot function to plot the timeseries:

statFrance = ts(ts.location == "France") // select data corresponding to France
statFrance(:, ["continent", "location"]) = [] // remove columns "continent" and "location"
statSpain = ts(ts.location == "Spain") // select data corresponding to Spain
statSpain(:, ["continent", "location"]) = [] // remove columns "continent" and "location"
stackedplot(statFrance, statSpain, "LegendLabels", ["France", "Spain"])

Don’t hesitate to share your use cases and issues with us.


Hello,
I’ve been evaluating a few functions and properties of the new timeseries features. My test case is a typical .csv file downloaded from the French DSO Enedis for the “Linky” residential power meter, as illustrated below.
The data cover about two years of hourly power consumption records (with some irregular sample times here and there).

Identifiant PRM;Type de donnees;Date de debut;Date de fin;Grandeur physique;Grandeur metier;Etape metier;Unite;Pas en minutes
12345678909876;Courbe de charge;11/01/2022;01/01/2024;Energie active;Consommation;Comptage Brut;W;
Horodate;Valeur
2022-01-11T01:00:00+01:00;278
2022-01-11T02:00:00+01:00;1462
2022-01-11T03:00:00+01:00;1501
2022-01-11T04:00:00+01:00;601
2022-01-11T05:00:00+01:00;344
2022-01-11T06:00:00+01:00;365
2022-01-11T07:00:00+01:00;594
2022-01-11T08:00:00+01:00;518
2022-01-11T09:00:00+01:00;412
2022-01-11T10:00:00+01:00;378
2022-01-11T11:00:00+01:00;520
2022-01-11T12:00:00+01:00;431
2022-01-11T13:00:00+01:00;474
2022-01-11T14:00:00+01:00;418
2022-01-11T15:00:00+01:00;395
2022-01-11T16:00:00+01:00;289
2022-01-11T17:00:00+01:00;264
2022-01-11T18:00:00+01:00;372
2022-01-11T19:00:00+01:00;309
2022-01-11T20:00:00+01:00;404

I first tried to detect the import options automatically (as readtimeseries() does when called without options), but it failed because of the header structure: the file starts with a few irregular introductory lines before the regular data:

dummyfile= "TMPDIR\Enedis_Conso_Heure_20220111-20240101_12345678909876.csv"
opts= detectImportOptions(dummyfile)

It fails:

Warning: an inconsistency was found in the columns. At line 2, 2 columns were found, whereas the previous one had 9.
at line 64 of function detectImportOptions ( C:\Program Files\scilab-2024.0.0\modules\spreadsheet\macros\detectImportOptions.sci line 75 )
csvTextScan: can not read file, error in the column structure
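Since detectImportOptions stops on the irregular header lines, one workaround is to locate the data block yourself before filling the options. The sketch below is a hypothetical helper (findDataLines is not a Scilab function); it assumes every data row starts with an ISO 8601 timestamp such as 2022-01-11T01:00:00 and that the data rows form one contiguous block, and it returns a [first, last] pair in the form used by the datalines field.

```scilab
// Hypothetical helper (not part of Scilab): first and last line numbers of
// the rows that start with an ISO 8601 timestamp, e.g. "2022-01-11T01:00:00".
// Assumption: the data rows form one contiguous block.
function dl = findDataLines(fname)
    txt = mgetl(fname);   // one string per line of the file
    idx = grep(txt, "/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/", "r");
    if isempty(idx) then
        error("findDataLines: no timestamped line found in " + fname);
    end
    dl = [min(idx), max(idx)];
endfunction

// Possible use with a manual options structure:
// opts.datalines = findDataLines(dummyfile);
```

Trailing text rows such as "Du 2021-04-08T00:00:00+02:00 au;..." are not matched because the pattern is anchored at the start of the line.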

Then I created the import options structure manually. Several attempts were required to set the ‘header’ and ‘inputFormat’ fields properly (the help page could perhaps be extended with more exotic test cases here, or header auto-detection could be improved):

opts= struct("variableNames", ["Horodate","Valeur"],"variableTypes", ["datetime","double"],"delimiter", ";","datalines", [4,33448],"header", ["Identifiant PRM;Type de donnees;Date de debut;Date de fin;Grandeur physique;Grandeur metier;Etape metier;Unite;Pas en minutes"
"12345678909876;Courbe de charge;11/01/2022;01/01/2024;Energie active;Consommation;Comptage Brut;W;"
"Horodate;Valeur"],"inputFormat","yyyy-MM-ddTHH:mm:ss+","emptyCol",[])
ts= readtimeseries(dummyfile,opts)

This ran successfully, apart from a current limitation (a warning is properly issued): datetime does not handle UTC offsets and daylight saving time. Everything after the ‘+’ in inputFormat, i.e. the ‘+01:00’ or ‘+02:00’ information, is ignored, so it cannot be used to obtain the true local time. This limitation could also be mentioned in the help page.

ATTENTION : UTC/GMT format is not managed. The result does not take it into account.
ts =

33445x1 timeseries

Horodate            Valeur
___________________ ______

2022-01-11 01:00:00 278
2022-01-11 02:00:00 1462
2022-01-11 03:00:00 1501
...
2023-12-31 23:00:00 144
2023-12-31 23:30:00 126
2024-01-01 00:00:00 152
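Until datetime handles offsets, the ‘+01:00’/‘+02:00’ suffix can still be recovered from the raw timestamp strings (for example read with mgetl) and applied separately. The helper below is hypothetical (utcOffsetHours is not a Scilab function) and assumes fixed-width stamps of the form yyyy-MM-ddTHH:mm:ss±hh:mm, as in the Enedis file.

```scilab
// Hypothetical helper (not part of Scilab): UTC offset, in hours, of
// fixed-width stamps such as "2022-01-11T01:00:00+01:00".
function offH = utcOffsetHours(stamps)
    stamps = stamps(:);                    // work on a column
    sgn = part(stamps, 20);                // "+" or "-"
    hh  = strtod(part(stamps, 21:22));     // offset hours
    mm  = strtod(part(stamps, 24:25));     // offset minutes
    s = ones(size(hh, 1), 1);
    s(sgn == "-") = -1;                    // negative offsets west of UTC
    offH = s .* (hh + mm / 60);
endfunction
```

Subtracting these offsets from the Horodate column (converted to a duration, e.g. with hours(offH)) should then give UTC times; that datetime arithmetic step is an untested assumption to check against the help pages.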

Then I tried the timeseries plotting feature:

stackedplot(ts)

It is fine for getting a basic overview of the whole dataset, but there is no callback that automatically sets the x_ticks with properly formatted datetime labels according to the zoom level, which prevents any practical interpretation of the plot.


So I suggest that simple date and time plotting on the x-axis is still something to work on in the next Scilab releases.
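As a stopgap, static date labels can be set by hand on an ordinary plot. The sketch below assumes the x values are datenum serial day numbers (not the timeseries datetime column itself), and the labels do not update when zooming; dateTicks is a hypothetical name.

```scilab
// Hypothetical helper (not part of Scilab): n evenly spaced tick positions
// between two datenum values, labelled as "yyyy-mm-dd".
function [loc, lab] = dateTicks(xmin, xmax, n)
    loc = linspace(xmin, xmax, n)';        // tick positions (column)
    v = datevec(loc);                      // [year month day hour minute second]
    lab = msprintf("%04d-%02d-%02d\n", v(:, 1), v(:, 2), v(:, 3));
endfunction

// Usage on a plot whose x values are datenum numbers:
// plot(x, y);
// a = gca();
// [loc, lab] = dateTicks(a.data_bounds(1, 1), a.data_bounds(2, 1), 5);
// a.x_ticks = tlist(["ticks", "locations", "labels"], loc, lab);
```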

Thank you for the discussion and the developments already performed!

David

Hello,
I’ve been evaluating readtimeseries features and options on a dummy experimental dataset.
dummyfREADTSmodif.txt (6.5 KB)
dummyfREADTS.txt (6.8 KB)
It seems that readtimeseries and detectImportOptions cannot handle every usual delimiter character, in particular the tab delimiter.
For comparison, I also created similar dummy .csv files (which I’m not allowed to upload to this forum) by replacing the tab delimiter of the .txt files with ‘;’.
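The tab-to-‘;’ conversion described above can be scripted with core string functions. The helper below is hypothetical (tabToSemicolon is not a Scilab function) and assumes no field already contains a ‘;’ character.

```scilab
// Hypothetical helper (not part of Scilab): copy a tab-delimited text file
// to a ';'-delimited file.
// Assumption: no field already contains a ';' character.
function outFile = tabToSemicolon(inFile, outFile)
    txt = mgetl(inFile);                            // read all lines
    mputl(strsubst(txt, ascii(9), ";"), outFile);   // replace every tab by ';'
endfunction
```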

With a true CSV file (‘;’ delimiter), readtimeseries imports the data automatically and creates the right number of variables. However, it keeps the column headers unchanged as variable names even though they contain characters that are invalid in identifiers, which makes the data awkward to use in readable code (e.g. ts.Commande débit active [kg/h] raises an error).
In the dummyfREADTSmodif files I processed the original column headers to remove non-alphanumeric characters, which made the variable names suitable for further use in code:

    fich= "TMPDIR\dummyfREADTS.txt"
    opts= detectImportOptions(fich)    // the tab delimiter is not detected
    opts.delimiter= ascii(9);          // force the tab delimiter
    varN= strsplit(opts.variableNames,ascii(9))   // split the single header string into column names
    for i=1:size(varN,1), varN(i)= part(varN(i),find(isalphanum(varN(i)))); end // remove non-alphanumeric characters from the variable names
    opts.variableNames= varN';
    varT=emptystr(varN')
    varT(1)="datetime"                 // first column: timestamps
    varT(2:$)="double"                 // all other columns: numeric values
    opts.variableTypes=varT
    opts.inputFormat= "dd/MM/yyyy HH:mm:ss"
    ts = readtimeseries(fich,opts);
    ts.Commandedébitactivekgh(4:6) // get some values through the variable's name

I suggest including such features directly in readtimeseries to extend its usefulness.

Thank you,
David

It’s necessary to select the first data line manually, here the fourth.

Hello,

Since the COVID data were updated, the timeseries statFrance and statSpain no longer share the same time base.

Here is the code to give them the same time base:

// statFrance.date and statSpain.date must be the same
dt = statSpain.date;
statFrance = statFrance(dt, :)

// data has too many NaN values and the plot is not clear
// looking at the data, interesting information is available every 7 days
newTimes = [dt(1):caldays(7):dt($)]';
rstatFrance = statFrance(newTimes, :)
rstatSpain = statSpain(newTimes, :)

stackedplot(rstatFrance, rstatSpain, "LegendLabels", ["France", "Spain"])

Hello David,

I think you can use ts("Commande débit active [kg/h]") to get your data, rather than renaming the columns and losing the original names for later use.

Hi Antoine,
Yes, I can use ts("Commande débit active [kg/h]"), but it’s a bit heavy when writing code with equations, for example, compared to ts.Commandedébitactivekgh. That’s why I preferred to convert to more usual variable names and copy the originals into the description field for later use. This might be another example of user-side tuning after a timeseries import.

I also suggest extending the timeseries() help page to mention and showcase that the data vectors can be accessed with the ‘.’ syntax: I believe this is documented for table but not for timeseries. It would also help new users to find there a description and examples of how to insert, extract, add/concatenate and remove data, column-wise and row-wise, with the rules that apply; I spent quite some time finding the appropriate, allowed syntax.
The timeseries help might also focus more on creating a timeseries from an existing set of data vectors, such as the matrix curm, as in:
timeseries(curm(1:5:$,2), curm(1:5:$,5:9), 'VariableNames', ["Datetime", "HorodLBV", "mesFreq", "mestens", "Pe", "Pereac", "cosPhi"], 'SampleRate', 1, 'StartTime', curstart(i)) // undersampling
A timeseries can be initialized with any number of data arguments before ‘VariableNames’, provided that the total number of columns of the (virtually concatenated) data matches the number of strings in VariableNames. The datetime column name counts as one of those columns only if the datetime values are actually given as the first data argument; otherwise, as in the example above with SampleRate and StartTime, there is one more variable name (for the time) than data columns.
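The counting rule above can be illustrated with a small example following the same call pattern as the curm one. A, B and the start date are made-up illustration data; datetime(2024, 1, 1), hours() and SampleRate being expressed in samples per second are assumptions to check against the Scilab 2024 help pages.

```scilab
// Illustrative data (made up): two data columns of 4 samples
A = [1; 2; 3; 4];
B = [10; 20; 30; 40];

// Case 1: time vector given as first argument -> 3 names for 3 columns
t = datetime(2024, 1, 1) + hours((0:3)');
ts1 = timeseries(t, A, B, "VariableNames", ["Time", "A", "B"]);

// Case 2: time built from SampleRate/StartTime -> one extra name for Time
ts2 = timeseries(A, B, "VariableNames", ["Time", "A", "B"], ..
                 "SampleRate", 1, "StartTime", datetime(2024, 1, 1));

// Data columns are then accessible with the '.' syntax
ts1.A
ts2.B
```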

It would also help to give more details and warnings about the Properties fields: what they mean and what one is allowed to do with them. For instance, I wrongly assumed that writing the StartTime property after creating the timeseries would change the Time vector.

I migrated several of my previous data objects to the new timeseries feature and I’m glad I did: it is handy yet extremely powerful. Thank you to all the developers!

David

Hello,
I’ve been evaluating the readtimeseries function on another test case where the import failed because the decimal separator was ‘,’ instead of the usual ‘.’. The only way to import it with readtimeseries is to modify the original text file itself, replacing ‘,’ with ‘.’.

dummycommadecsep.txt (2.8 KB)
I guess more options could be included in the detectImportOptions structure to handle a wider range of situations.
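Until then, the file rewriting described above can be scripted. The helper below is hypothetical (decimalCommaToDot is not a Scilab function) and assumes ‘,’ appears only as the decimal separator, with tab or ‘;’ as the field delimiter; any comma inside a text field would be changed too.

```scilab
// Hypothetical helper (not part of Scilab): copy a text file, replacing the
// decimal comma by a dot so that readtimeseries can parse the numbers.
// Assumption: ',' is used only as decimal separator (delimiter is tab or ';').
function outFile = decimalCommaToDot(inFile, outFile)
    txt = mgetl(inFile);
    mputl(strsubst(txt, ",", "."), outFile);
endfunction
```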

Thanks,

David

Hello David,

Indeed, the detectImportOptions function only handles the field separator, not the decimal separator. This will be handled in the next version.

Adeline

Hi everyone,
I needed another piece of code to build a timeseries from the daily consumption file that can be downloaded from the French electricity distribution operator Enedis. A direct import attempt fails at the detectImportOptions call, so I’m sharing how I finally managed to get the data from this dummy file excerpt (I can’t upload files as a new user):

Identifiant PRM;Type de donnees;Date de debut;Date de fin;Grandeur physique;Grandeur metier;Etape metier;Unite
;Index;07/04/2021;23/03/2024;Energie active;Consommation;Comptage Brut;Wh
Horodate;Type de releve;EAS F1;EAS F2;EAS F3;EAS F4;EAS F5;EAS F6;EAS F7;EAS F8;EAS F9;EAS F10;EAS D1;EAS D2;EAS D3;EAS D4;EAS T
2021-04-08T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934
2021-04-09T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934
2021-04-10T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934
2021-04-11T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934
2021-04-12T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934
2021-04-13T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934
2021-04-14T00:00:00+02:00;Arrêté quotidien;6331934;;;;;;;;;;4559625;1771377;0;932;6331934


2024-03-14T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-15T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-16T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-17T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-18T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-19T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-20T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-21T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-22T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
2024-03-23T23:00:00+01:00;Arrêté quotidien;15371992;;;;;;;;;;6764278;8186010;141343;280361;15371992
Periode;Identifiant calendrier fournisseur;Libelle calendrier fournisseur;Identifiant classe temporelle 1;Libelle classe temporelle 1;Cadran classe temporelle 1;Identifiant classe temporelle 2;Libelle classe temporelle 2;Cadran classe temporelle 2;Identifiant classe temporelle 3;Libelle classe temporelle 3;Cadran classe temporelle 3;Identifiant classe temporelle 4;Libelle classe temporelle 4;Cadran classe temporelle 4;Identifiant classe temporelle 5;Libelle classe temporelle 5;Cadran classe temporelle 5;Identifiant classe temporelle 6;Libelle classe temporelle 6;Cadran classe temporelle 6;Identifiant classe temporelle 7;Libelle classe temporelle 7;Cadran classe temporelle 7;Identifiant classe temporelle 8;Libelle classe temporelle 8;Cadran classe temporelle 8;Identifiant classe temporelle 9;Libelle classe temporelle 9;Cadran classe temporelle 9;Identifiant classe temporelle 10;Libelle classe temporelle 10;Cadran classe temporelle 10;Identifiant calendrier distributeur;Libelle calendrier distributeur;Identifiant classe temporelle distributeur 1;Libelle classe temporelle distributeur 1;Cadran classe temporelle distributeur 1;Identifiant classe temporelle distributeur 2;Libelle classe temporelle distributeur 2;Cadran classe temporelle distributeur 2;Identifiant classe temporelle distributeur 3;Libelle classe temporelle distributeur 3;Cadran classe temporelle distributeur 3;Identifiant classe temporelle distributeur 4;Libelle classe temporelle distributeur 4;Cadran classe temporelle distributeur 4
Du 2021-04-08T00:00:00+02:00 au;FC000010;Base;BASE;Base;EAS F1;;;EAS F2;;;EAS F3;;;EAS F4;;;EAS F5;;;EAS F6;;;EAS F7;;;EAS F8;;;EAS F9;;;EAS F10;DI000003;Avec différenciation temporelle et saisonnière;HCB;Heures Creuses Saison Basse;EAS D1;HPB;Heures Pleines Saison Basse;EAS D2;HCH;Heures Creuses Hiver / Saison Haute;EAS D3;HPH;Heures Pleines Hiver / Saison Haute;EAS D4

And here is the code to read the timeseries, using an ad hoc options structure.
NB:

  • opts.datalines must be set according to the actual number of lines in the file, excluding the 2 text lines at the end of the file.
  • data types (variableTypes): use “string” for columns containing empty data (inspired by another attempt with detectImportOptions)
  • emptyCol: not used? How should it be combined with the data types?
fich= "Enedis_Conso_Jour_20210408-20240322_12345678909876.csv"

//opts=detectImportOptions(fich) // fails
// to be adapted for the daily consumption file extract
opts= struct(  "variableNames", ["Horodate","Type de releve","EAS F1","EAS F2","EAS F3","EAS F4","EAS F5","EAS F6","EAS F7","EAS F8","EAS F9","EAS F10","EAS D1","EAS D2","EAS D3","EAS D4","EAS T"],..
  "variableTypes", ["datetime","string","double","string","string","string","string","string","string","string","string","string","double","double","double","double","double"],..
  "delimiter", ";",..
  "datalines", [4,1085],..
  "header", ["ï»żIdentifiant PRM;Type de donnees;Date de debut;Date de fin;Grandeur physique;Grandeur metier;Etape metier;Unite"
"12345678909876;Index;07/04/2021;23/03/2024;Energie active;Consommation;Comptage Brut;Wh"],..
  "inputFormat", ["yyyy-MM-ddTHH:mm:ss+","","","","","","","","","","","","","","","",""],..
  "emptyCol", [])

TSlnk= readtimeseries(fich,opts)

stackedplot(TSlnk)
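The variableTypes rules from the NB above can be partly automated. The sketch below is a hypothetical heuristic (guessTypes is not a Scilab function, and this is not how detectImportOptions works): a column becomes "datetime" when every cell starts with an ISO 8601 date, "double" when every cell is a complete number, and "string" otherwise, which includes all-empty columns as recommended above.

```scilab
// Hypothetical helper (not part of Scilab): propose a variableTypes entry
// for each column of delimited data lines.
function types = guessTypes(dataLines, delim)
    dataLines = dataLines(:);
    n = size(dataLines, 1);
    nCols = size(strsplit(dataLines(1), delim), "*");
    cells = emptystr(n, nCols);                 // one cell per field
    for i = 1:n
        f = strsplit(dataLines(i), delim);      // fields of line i (column)
        cells(i, 1:size(f, "*")) = f';
    end
    types = emptystr(1, nCols);
    for j = 1:nCols
        col = cells(:, j);
        [v, rest] = strtod(col);                // rest == "" when fully numeric
        if size(grep(col, "/^\d{4}-\d{2}-\d{2}T/", "r"), "*") == n then
            types(j) = "datetime";
        elseif and(col <> "") & and(rest == "") then
            types(j) = "double";
        else
            types(j) = "string";                // text or empty column
        end
    end
endfunction

// Possible use with the options structure above:
// txt = mgetl(fich);
// opts.variableTypes = guessTypes(txt(opts.datalines(1):opts.datalines(2)), ";");
```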