Error handling when reading a file

Hello, I hope you can assist me. I’ve been collecting a substantial amount of data, resulting in a data.txt file containing around 35,000 lines. For example:

09:38:25.184	<FL111 Po0 t1198 t2195 PLTI153 FM250 Er0 HB2 B0 t3225 VS150 I0 AT0 TC32
09:38:25.284	<FL111 Po0 t1198 t2197 PLTI153 FM250 Er0 HB2 B0 t3225 VS150 I0 AT0 TC42
09:38:25.389	<FL111 Po0 t1198 t2198 PLTI153 FM250 Er0 HB2 B0 t3225 VS150 I0 AT0 TC52
09:38:25.473	<FL111 Po0 t1198 t2197 PLTI153 FM250 Er HB2 B0 t3225 VS150 I0 AT0 TC62
09:38:25.583	<FL111 Po0 t1198 t2195 PLTI153 FM250 Er0 HB2 B0 t3225 VS150 I0 AT0 TC72
09:38:25.677	<FL111 Po0 t1197 t2198 PLTI153 FM250 Er0 HB2 B0 t3225 VS150 I0 AT0 TC82

The problem is in the fourth line, where the Er token lacks its number (a bare Er instead of Er0).

I’m attempting to read this data.txt file in Scilab 2023.1 and extract all the values into a matrix for subsequent post-processing.

Here’s the script I’m using:

Steps = csvRead('data.txt', ascii(9), [], "string"); // Read data into an N-by-2 string matrix (timestamp, message)

[u, v] = size(Steps); // u rows; v = 2 columns (timestamp, message)
i = 1;
for j = 1:u    // Loop through the CoolTerm measurement matrix
    if part(Steps(j,v), 1:3) == "<FL" then  // Filter messages starting with <FL
        Time(i,:) = msscanf(Steps(j,1), "%d:%d:%d.%d"); // Read time string into doubles
        Serial(i,:) = msscanf(Steps(j,v), "<FL%d Po%d t1%d t2%d PLTI%d FM%d Er%d HB%d B%d t3%d VS%d I%d AT%d TC%d"); // Move string values into a double matrix
        i = i + 1;
    end
end


However, due to the missing number after Er, msscanf returns fewer values than expected on that line, and I get the error message "Submatrix incorrectly defined." Is there a way to skip such a line and proceed to the next one? Would a try/catch/end approach help?
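For reference, here is roughly the try/catch version I have in mind (just a sketch reusing the loop above; I understand a try block inside a 35,000-iteration loop may be slow):

i = 1;
for j = 1:u
    if part(Steps(j,v), 1:3) == "<FL" then
        try
            Serial(i,:) = msscanf(Steps(j,v), "<FL%d Po%d t1%d t2%d PLTI%d FM%d Er%d HB%d B%d t3%d VS%d I%d AT%d TC%d");
            Time(i,:) = msscanf(Steps(j,1), "%d:%d:%d.%d");
            i = i + 1;
        catch
            // Parsing failed (e.g. on the bare Er token): skip this line
        end
    end
end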

I’ve also experimented with the following code:

filename = 'data.txt';
file = mopen(filename, 'r');
Steps = mgetl(file, -1); // Read the whole file into a column vector of strings
mclose(file);

// Store all measurements in one matrix
for cnt = 1:size(Steps,'*') // Load only consistent data
    try
        param(cnt,:) = tokens(Steps(cnt))'; // Split on whitespace; fails (and is silently skipped) when the token count differs
    end
    // Additional conditions could be added here
end

However, this approach takes a significant amount of time, since Scilab enters the try block for every single line. Is there a more efficient way to achieve this?

Yes. Since the failing line yields a vector with fewer values than expected, I would suggest:

values = msscanf(Steps(j,v), "<FL%d Po%d t1%d t2%d PLTI%d FM%d Er%d HB%d B%d t3%d VS%d I%d AT%d TC%d"); // Move string values into a double matrix
if length(values) == 14 then // Keep only fully parsed lines
    Serial(i,:) = values;
    i = i + 1;
end
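
If the loop itself becomes the bottleneck, another option is to discard the malformed lines before parsing at all. A minimal sketch, assuming a bare Er token (as in your sample) is the only kind of malformation:

raw = mgetl('data.txt');   // All lines as a column of strings
bad = grep(raw, " Er ");   // Lines where Er has no number attached (good lines contain "Er0")
raw(bad) = [];             // Remove them up front, then parse as before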

Thank you very much. It appears to be working, though I'm not certain: I started the loop approximately two hours ago and Scilab is still running. I might need an alternative approach, since processing all 35,000 lines of measurement data takes so long. The data contains temperature measurements collected over one hour at a sampling rate of 10 ms.
Maybe I have to try Python instead.

Hi,

you may try the following approach, working on whole column vectors before parsing them with msscanf:


Steps = csvRead('data.txt', ascii(9), [], "string"); // Read the data example from the first post into a matrix
// Iterate on the columns with the native function msscanf(), using its row-iteration
// feature (first argument -1): only 14 calls to msscanf() in total.
c1 = msscanf(-1, Steps(:,1), "%2d:%2d:%f"); // Decode the timestamps: the 1st column yields three column vectors (H, M, S);
// you may convert them into decimal time directly, or with datenum if you can retrieve the date stamp somewhere else.
// You can also test a specific error code per column. For column 7 (Er), for instance,
// you want the row indices (times) at which an error occurs, thus:
c7errindex = find(Steps(:,7) == "Er"); // Vector of indices of rows with a bare Er; here a single index, 4
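
For instance, a minimal sketch of the decimal-time conversion mentioned above (assuming c1 is the N-by-3 [H M S] matrix produced by the msscanf call):

tsec = c1(:,1)*3600 + c1(:,2)*60 + c1(:,3); // Seconds since midnight, one value per line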

David

I fear that Python by itself won't provide the methodology either.

David is right! The correct approach is the "dual" one: read all the lines in one call, then iterate over the columns once the faulty lines have been removed, thanks to the iteration feature of msscanf.
In that case, however, the separator in the csvRead call should be a space, not a tab. The modified version of David's code below processes a 35,000-line file like yours in less than a second:

Steps = csvRead('data.txt', " ", [], "string");
c7errindex = find(Steps(:,7) == "Er"); // Rows where the Er value is missing
Steps(c7errindex,:) = []; // Remove the faulty lines
Time = msscanf(-1, Steps(:,1), "%2d:%2d:%f"+ascii(9)+"<FL%d"); // 1st column holds timestamp + tab + <FL value
fmt = strsplit("Po%d t1%d t2%d PLTI%d FM%d Er%d HB%d B%d t3%d VS%d I%d AT%d TC%d", " "); // One format per remaining column
Serial = zeros(size(Steps,1), 14);
Serial(:,1) = Time(:,4); // The FL value parsed together with the timestamp
Time = Time(:,1:3);      // Keep only H, M, S
for i = 2:14
    Serial(:,i) = msscanf(-1, Steps(:,i), fmt(i-1));
end

The <FL%d part gets special handling because it has to be parsed out of the first column, which is not split off from the timestamp since the two are separated by a tab rather than a space.
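
To illustrate with a self-contained check (the sample value is taken from the data in the first post):

s = "09:38:25.184" + ascii(9) + "<FL111"; // What the first cell of Steps looks like
v = msscanf(s, "%2d:%2d:%f" + ascii(9) + "<FL%d"); // v = [9 38 25.184 111]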

You guys are awesome! Thank you so much for the effort.

Here's how my measurement looks; I have 35,000 lines of this. However, new kinds of inconsistent lines keep appearing. Now, for example, a space is missing between two values. I'm still struggling.

12:40:57.765 <FL79 Po0 t1177 t2181 PLTI157 FLTI2785 FLI250 CA2800 FM80 Er0 HB0 B0 t3235 VS379 i13 AT0 TC91
12:40:57.765 <FL79 Po0 t1177 t2181 PLTI157 FLTI2785 FLI250 CA2800FM80 Er0 HB0 B0 t3235 VS379 i13 AT0 TC92
12:40:57.765 <FL79 Po0 t1177 t2181 PLTI157 FLTI2785 FLI250CA2800 FM80 Er0 HB0 B0 t3235 VS379 i13 AT0 TC93
12:40:57.765 <FL79 Po0 t1177 t2180 PLTI157 FLTI2785 FLI250 CA2800 FM80 Er0 HB0 B0 t3235 VS379 i13 AT0 TC94
12:40:57.765 <FL79 Po0 t1177 t2174 PLTI157 FLTI2785 FLI250 CA2800 FM80 Er0 HB0 B0 t3235 VS379 i10 AT0 TC95
12:40:57.765 <FL79 Po0 t1177 t2178 PLTI157 FLTI2785 FLI250 CA2800 FM80Er0 HB0 B0 t3235 VS379 i10 AT0 TC96

Sorry, such inconsistent data won't be readable by csvRead.

Thank you! The ideal approach would be error handling that deals gracefully with inconsistent lines. Your earlier post on checking the number of parsed values was really helpful, but it slowed down considerably on 35,000 lines of data.
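
One idea I will try next (only a sketch; it assumes the sole remaining defect is a missing space between a digit and the following letter, as in the samples above) is to repair the raw lines before parsing:

raw = mgetl('data.txt'); // Raw lines as strings
for k = 1:size(raw, '*')
    p = regexp(raw(k), "/\d[A-Za-z]/"); // Start positions where a digit is glued to a letter
    if ~isempty(p) then
        for q = p($:-1:1) // Insert spaces from right to left so earlier positions stay valid
            raw(k) = part(raw(k), 1:q) + " " + part(raw(k), q+1:length(raw(k)));
        end
    end
end
// raw can now be parsed with the column-wise msscanf approach above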