Timeseries: Simple Use Case

Have you seen that timeseries have been added in Scilab 2024.0.0?

Here is a simple use case using COVID data from https://covid.ourworldindata.org.

Read data

The data we want to process is stored in a CSV file.

Retrieve it with the http_get function:

http_get("https://covid.ourworldindata.org/data/owid-covid-data.csv", "TMPDIR/owid-covid-data.csv", follow=%t);

Then read the file and select the useful columns with the readtimeseries function:

ts = readtimeseries("TMPDIR/owid-covid-data.csv", "VariableNames", ["date", "continent", "location", "new_cases", "new_deaths", "people_fully_vaccinated"])

Transform data as needed

Reshape the data with the pivot function, for example to get the total number of new cases per continent and per year:

p = pivot(ts, Rows="continent", Columns="date", ColumnsBinMethod="year", Method="sum", DataVariable="new_cases")
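To make it clearer what this pivot call aggregates, here is a rough equivalent written with basic Scilab functions on plain vectors. It is only an illustration, not how pivot is implemented: the helper name sum_by_group is hypothetical, it assumes the years have already been extracted from the dates, and it skips NaN values, which may differ from how pivot itself treats missing data.

```scilab
// Hypothetical helper, for illustration only: sums values per (row key, column key)
// pair, roughly what pivot(..., Method="sum", DataVariable=...) produces.
function [S, rowKeys, colKeys] = sum_by_group(rowLabels, colLabels, values)
    rowKeys = unique(rowLabels);   // e.g. continents, sorted
    colKeys = unique(colLabels);   // e.g. years, sorted
    S = zeros(size(rowKeys, "*"), size(colKeys, "*"));
    for k = 1:size(values, "*")
        if isnan(values(k)) then
            continue;              // assumption: missing values are ignored
        end
        i = find(rowKeys == rowLabels(k));
        j = find(colKeys == colLabels(k));
        S(i, j) = S(i, j) + values(k);
    end
endfunction
```

For example, sum_by_group(["Europe"; "Asia"; "Europe"], [2020; 2020; 2021], [1; 2; 3]) returns the rows Asia and Europe, the columns 2020 and 2021, and S = [2 0; 1 3].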

Plot

Use the stackedplot function to plot the timeseries:

statFrance = ts(ts.location == "France") // select data corresponding to France
statFrance(:, ["continent", "location"]) = [] // remove columns "continent" and "location"
statSpain = ts(ts.location == "Spain") // select data corresponding to Spain
statSpain(:, ["continent", "location"]) = [] // remove columns "continent" and "location"
stackedplot(statFrance, statSpain, "LegendLabels", ["France", "Spain"])

Don’t hesitate to share your use cases and issues with us.


Hello,
I have been evaluating a few of the new timeseries functions and properties. My test case is a typical .csv file downloaded from the French DSO Enedis for the “Linky” residential power meter, shown below.
It covers about 2 years of hourly power consumption records (with some irregular sample times here and there).

Identifiant PRM;Type de donnees;Date de debut;Date de fin;Grandeur physique;Grandeur metier;Etape metier;Unite;Pas en minutes
12345678909876;Courbe de charge;11/01/2022;01/01/2024;Energie active;Consommation;Comptage Brut;W;
Horodate;Valeur
2022-01-11T01:00:00+01:00;278
2022-01-11T02:00:00+01:00;1462
2022-01-11T03:00:00+01:00;1501
2022-01-11T04:00:00+01:00;601
2022-01-11T05:00:00+01:00;344
2022-01-11T06:00:00+01:00;365
2022-01-11T07:00:00+01:00;594
2022-01-11T08:00:00+01:00;518
2022-01-11T09:00:00+01:00;412
2022-01-11T10:00:00+01:00;378
2022-01-11T11:00:00+01:00;520
2022-01-11T12:00:00+01:00;431
2022-01-11T13:00:00+01:00;474
2022-01-11T14:00:00+01:00;418
2022-01-11T15:00:00+01:00;395
2022-01-11T16:00:00+01:00;289
2022-01-11T17:00:00+01:00;264
2022-01-11T18:00:00+01:00;372
2022-01-11T19:00:00+01:00;309
2022-01-11T20:00:00+01:00;404

I first tried to detect the import options automatically (as readtimeseries() does when called without options), but it failed because of the header structure: the file begins with a few irregular introductory lines before the regular data:

dummyfile= "TMPDIR\Enedis_Conso_Heure_20220111-20240101_12345678909876.csv"
opts= detectImportOptions(dummyfile)

It fails:

Warning: An inconsistency was found in the columns. At line 2, 2 columns were found, whereas the previous one had 9.
at line 64 of function detectImportOptions ( C:\Program Files\scilab-2024.0.0\modules\spreadsheet\macros\detectImportOptions.sci line 75 )
csvTextScan: can not read file, error in the column structure

Then I created the import options structure manually. It took several attempts to set the ‘header’ and ‘inputFormat’ fields properly (I guess the help page could be refined with more exotic test cases here, or the header autodetection could be improved):

// The 3 header lines that precede the data
hdr = ["Identifiant PRM;Type de donnees;Date de debut;Date de fin;Grandeur physique;Grandeur metier;Etape metier;Unite;Pas en minutes"
       "12345678909876;Courbe de charge;11/01/2022;01/01/2024;Energie active;Consommation;Comptage Brut;W;"
       "Horodate;Valeur"];
// Data starts on line 4; the UTC offset after "+" in the timestamps is not parsed
opts = struct("variableNames", ["Horodate", "Valeur"], ..
              "variableTypes", ["datetime", "double"], ..
              "delimiter", ";", ..
              "datalines", [4, 33448], ..
              "header", hdr, ..
              "inputFormat", "yyyy-MM-ddTHH:mm:ss+", ..
              "emptyCol", [])
ts = readtimeseries(dummyfile, opts)

This ran successfully, apart from a current limitation (for which a warning is properly issued): UTC offsets and daylight saving time information are not handled in datetime. Everything after the ‘+’ in inputFormat, i.e. the ‘+01:00’ or ‘+02:00’ offset, is not used to obtain the true local time. This could also be mentioned in the help page.
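Until offsets are supported, one possible workaround is to read the raw timestamp strings yourself and apply the offset before building the time vector. This is a sketch under stated assumptions, not part of the timeseries API: the helper name local_to_utc is hypothetical, it expects strings in exactly the Enedis layout yyyy-MM-ddTHH:mm:ss±hh:mm, and it returns UTC times as datenum serial day numbers rather than datetime values.

```scilab
// Hypothetical helper: converts "yyyy-MM-ddTHH:mm:ss+hh:mm" strings (fixed layout,
// as in the Enedis export) to UTC times expressed as datenum serial day numbers.
function utc = local_to_utc(s)
    s = s(:);                                   // work on a column of strings
    y  = strtod(part(s, 1:4));
    mo = strtod(part(s, 6:7));
    d  = strtod(part(s, 9:10));
    h  = strtod(part(s, 12:13));
    mi = strtod(part(s, 15:16));
    sc = strtod(part(s, 18:19));
    // Offset sign (character 20) and magnitude (hh:mm in characters 21 to 25)
    sgn = ones(size(s, 1), 1);
    sgn(part(s, 20) == "-") = -1;
    offsetHours = sgn .* (strtod(part(s, 21:22)) + strtod(part(s, 24:25)) / 60);
    // UTC = local time - offset; datenum counts in days, hence the /24
    utc = datenum(y, mo, d, h, mi, sc) - offsetHours / 24;
endfunction
```

For instance, "2022-07-11T01:00:00+02:00" maps to 2022-07-10 23:00:00 UTC. Working in UTC also avoids the repeated or missing local hour at each daylight saving time switch.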

ATTENTION : UTC/GMT format is not managed. The result does not take it into account.
ts =

33445x1 timeseries
           Horodate  Valeur
___________________  ______

2022-01-11 01:00:00 278
2022-01-11 02:00:00 1462
2022-01-11 03:00:00 1501
… …
2023-12-31 23:00:00 144
2023-12-31 23:30:00 126
2024-01-01 00:00:00 152

Then I tried the timeseries plotting feature:

stackedplot(ts)

It is fine for getting a basic overview of the whole dataset, but without a callback that automatically sets the x_ticks to properly formatted date and time labels according to the zoom level, the plot cannot really be interpreted in practice.


I therefore think that simple date and time plotting on the x-axis is still an issue to work on for the next Scilab releases.
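As a stopgap, if you plot the values yourself with plot() against datenum serial day numbers, you can compute readable date labels and assign them to the axes' x_ticks. A minimal sketch: the helper name date_ticks is hypothetical, it places n evenly spaced ticks between two datenum values and labels them with the date only, and it does not react to zooming, so the labels have to be recomputed after each zoom.

```scilab
// Hypothetical helper: n evenly spaced tick positions between tmin and tmax
// (datenum serial day numbers) with "yyyy-mm-dd" labels.
function [locs, labels] = date_ticks(tmin, tmax, n)
    locs = linspace(tmin, tmax, n)';          // column of tick positions
    v = datevec(locs);                        // one row [Y M D h m s] per tick
    labels = msprintf("%04d-%02d-%02d\n", v(:, 1), v(:, 2), v(:, 3));
endfunction
```

After plot(t, values) with t in datenum days, the labels can be applied with `[locs, labels] = date_ticks(min(t), max(t), 6); a = gca(); a.x_ticks = tlist(["ticks", "locations", "labels"], locs, labels);`.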

Thank you for the discussion and the developments already performed!

David

Hello,
I have been evaluating readtimeseries features and options on a dummy experimental dataset.
dummyfREADTSmodif.txt (6.5 KB)
dummyfREADTS.txt (6.8 KB)
It seems that readtimeseries and detectImportOptions cannot handle every usual delimiter character, in particular the tab delimiter.
To check this, I also created similar dummy .csv files (which I am not allowed to upload to this forum) by replacing the tab delimiters of the .txt files with ;.
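That conversion can be scripted. This is a minimal sketch with a hypothetical helper name; it simply replaces every tab with ; line by line, so it assumes that no field already contains a ; character.

```scilab
// Hypothetical helper: rewrites a tab-delimited text file as a ;-delimited file.
// Assumes no field contains ";" itself (no quoting is performed).
function tab_to_semicolon(src, dst)
    txt = mgetl(src);                    // one string per line
    txt = strsubst(txt, ascii(9), ";");  // ascii(9) is the tab character
    mputl(txt, dst);
endfunction
```

For example, tab_to_semicolon(TMPDIR + "/dummyfREADTS.txt", TMPDIR + "/dummyfREADTS.csv") writes a ;-delimited copy next to the original.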

With a true CSV file (; delimiter), readtimeseries imports the data automatically and creates the right number of variables. However, it keeps the column headers unchanged as variable names even though they contain characters that are not valid in identifiers, which makes the data hard to use in readable code (e.g. ts.Commande débit active [kg/h] raises an error).
In the dummyfREADTSmodif files, I processed the original column headers to remove non-alphanumeric characters, which made the variable names suitable for further use in code:

    fich= "TMPDIR\dummyfREADTS.txt"
    opts= detectImportOptions(fich)
    // The tab delimiter is not detected, so set it explicitly
    opts.delimiter= ascii(9);
    // The header is returned as a single string containing tabs: split it into names
    varN= strsplit(opts.variableNames,ascii(9))
    for i=1:size(varN,1), varN(i)= part(varN(i),find(isalphanum(varN(i)))); end // remove non-alphanumeric characters from the variable names
    opts.variableNames= varN';
    // First column holds the timestamps, all the others are numeric
    varT=emptystr(varN')
    varT(1)="datetime"
    varT(2:$)="double"
    opts.variableTypes=varT
    opts.inputFormat= "dd/MM/yyyy HH:mm:ss"
    ts = readtimeseries(fich,opts);
    ts.Commandedébitactivekgh(4:6) // get some values through the variable's name

I suggest building such features directly into readtimeseries to make it more useful.
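Until something like this is built in, the renaming step can be wrapped in a small reusable function. This sketch uses a regular expression instead of isalphanum, and the function name make_valid_names is hypothetical. Unlike the loop above, it keeps underscores, drops non-ASCII letters such as é, and prefixes a v when a name would not start with a letter.

```scilab
// Hypothetical helper: turns arbitrary column headers into names usable as ts.Name.
function names = make_valid_names(names)
    for k = 1:size(names, "*")
        // keep only ASCII letters, digits and underscores
        n = strsubst(names(k), "/[^A-Za-z0-9_]/", "", "r");
        // a usable name must start with a letter
        if n == "" | isempty(regexp(n, "/^[A-Za-z]/")) then
            n = "v" + n;
        end
        names(k) = n;
    end
endfunction
```

For example, make_valid_names(["Power [kW]", "2nd sensor", "cos(phi)"]) returns ["PowerkW", "v2ndsensor", "cosphi"].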

Thank you,
David

It is necessary to select the first line of the data manually, here the fourth.
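Instead of hard-coding it, the first (and last) data line can be located by scanning the file for lines that start with a timestamp. A minimal sketch, assuming ISO 8601 timestamps like those in the Enedis file and assuming every line between the first and last match is data; the helper name find_data_lines is hypothetical.

```scilab
// Hypothetical helper: returns the first and last lines that start with an
// ISO 8601 timestamp such as 2022-01-11T01:00:00.
function [first, last] = find_data_lines(filename)
    txt = mgetl(filename);
    idx = grep(txt, "/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}/", "r");  // regex mode
    if isempty(idx) then
        error("find_data_lines: no timestamped line found");
    end
    first = min(idx);
    last = max(idx);
endfunction
```

The result can then go into the options structure, e.g. `[first, last] = find_data_lines(dummyfile); opts.datalines = [first, last];`.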

Hello,

Since the COVID data was updated, the timeseries statFrance and statSpain no longer have the same time base.

Here is the code to give them the same time base:

// statFrance.date and statSpain.date must be the same
dt = statSpain.date;
statFrance = statFrance(dt, :)

// data has too many NaN values and the plot is not clear
// looking at the data, interesting information is available every 7 days
newTimes = [dt(1):caldays(7):dt($)]';
rstatFrance = statFrance(newTimes, :)
rstatSpain = statSpain(newTimes, :)

stackedplot(rstatFrance, rstatSpain, "LegendLabels", ["France", "Spain"])

Hello David,

I think you can use ts("Commande débit active [kg/h]") to get your data, rather than renaming the columns and losing the original names for later use.

Hi Antoine,
Yes, I can use the command ts("Commande débit active [kg/h]"), but it is a bit heavy when writing code with equations, for example, compared to ts.Commandedébitactivekgh. That is why I preferred to convert the headers into more usual variable names and to copy the original ones into the description field for later use. This might be another example of user self-tuning after a timeseries import.

I also suggest extending the timeseries() help page to mention and showcase that the data vectors can be accessed with the '.' syntax: I feel this is mentioned for table but not for timeseries. It would also help new users to describe there, with examples, how to insert/extract, add/concatenate and remove data, column-wise and row-wise, together with the rules for doing so; I spent quite some time finding the appropriate, authorized syntax.
The timeseries help could also focus more on creating a timeseries from an existing set of data vectors, such as my matrix curm:
timeseries(curm(1:5:$,2), curm(1:5:$,5:9), 'VariableNames', ["Datetime" "HorodLBV", "mesFreq", "mestens", "Pe", "Pereac", "cosPhi"], 'SampleRate', 1, 'StartTime', curstart(i)) // undersampling: every 5th row
A timeseries can be initialized with any number of data arguments before ‘VariableNames’. VariableNames always holds one name for the time column followed by one name per data column (counting the columns as if all data arguments were concatenated): when the datetime values are passed as the first data argument, they are themselves one of the columns; otherwise, as in the example above with SampleRate and StartTime, there is one more variable name (for Time) than data columns.
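A small case illustrating this counting rule, as a sketch: it relies on the 'SampleRate' form quoted above and assumes that StartTime can be omitted (defaulting to time zero); the variable names are arbitrary. Two data arguments with 1 and 2 columns give three data columns, so VariableNames needs four names, the first one for the generated time column.

```scilab
x = (1:10)';                   // 10x1 -> one data column
Y = [sin(x), cos(x)];          // 10x2 -> two more data columns
// No time vector is passed, so the first name is for the generated time column:
// 1 (Time) + 3 (data columns) = 4 names.
ts = timeseries(x, Y, "VariableNames", ["Time", "x", "s", "c"], "SampleRate", 1);
ts.c(1:3)                      // same values as cos(x(1:3))
```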

It would also help to give more details and warnings about the Properties fields: what they mean and what one is allowed to do with them. For instance, I wrongly assumed that setting the StartTime property after creating the timeseries would change the Time vector.

I have migrated several of my previous data objects to the new timeseries feature and I am glad I did: it is handy and extremely powerful. Thank you to all the developers!

David