File Stream Data Persist Issue #2284
-
Hello StreamPipes,

The file I upload is attached. I then add the timestamp and choose to persist events. When I create a new data view in the Data Explorer, I find the number of rows is much less than expected. Doing some investigation, I found that the number of rows matches the # Events shown under "Settings - Data Lake". However, checking the Data Lake Metrics, I see 20,480 events consumed, as expected.

It only worked properly once; other than that, it keeps returning a different, inconsistent number of rows each time. Could you please help me get this right, or at least understand this behavior?

Thank you,
-
Hi @AkthemRehab,

thanks for your question. The problem you are facing is due to the fact that the File Stream Adapter is primarily designed for testing purposes. It operates by taking a CSV file and replaying its contents. When there is no "original" timestamp present in the data, the adapter generates a new timestamp for each row as it is read. Because the rows are read in quick succession, many events end up sharing the same timestamp.

In a time series database, the timestamp is used as an index: when two events arrive with an identical timestamp, the later one overwrites the earlier one. As a result, your data lake sink receives the correct number of events, but not all of them end up stored in the database.

To address this, add an 'original' timestamp column to your CSV data. StreamPipes will then use that timestamp, and you should observe the desired outcome.

Does this explanation clarify the issue for you?

Cheers,
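To make the overwriting effect concrete, here is a minimal Python sketch. It is not StreamPipes or data lake code; the "store" is just a dict keyed by timestamp, standing in for a timestamp-indexed time series database, and all names in it are made up for illustration:

```python
import time

def replay_without_original_timestamp(rows):
    """Simulate the adapter stamping each row at read time.

    Replaying a file is fast, so many rows receive the same
    millisecond timestamp.
    """
    return [{"timestamp": int(time.time() * 1000), **row} for row in rows]

def persist(events):
    """Simulate a timestamp-indexed store: identical timestamps overwrite."""
    store = {}
    for event in events:
        store[event["timestamp"]] = event  # same key -> previous event lost
    return store

rows = [{"value": i} for i in range(20480)]

# Case 1: no original timestamp -> timestamps collide, rows are lost.
store = persist(replay_without_original_timestamp(rows))
print(f"consumed: {len(rows)}, persisted: {len(store)}")
# e.g. "consumed: 20480, persisted: 3" (all rows read within a few ms)

# Case 2: each row carries its own 'original' timestamp -> nothing collides.
rows_with_ts = [{"timestamp": 1700000000000 + i * 1000, "value": i}
                for i in range(20480)]
store = persist(rows_with_ts)
print(f"consumed: {len(rows_with_ts)}, persisted: {len(store)}")
# "consumed: 20480, persisted: 20480"
```

This matches what you observed: the metrics count every consumed event, while the row count in the Data Explorer reflects only the distinct timestamps that survived in the store.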
-
No, that's something else. Let's discuss it in the other discussion.