README.md: 7 additions & 2 deletions
@@ -40,7 +40,11 @@ The registry_create_policy is used when the pipeline is started to either resume
 interval defines the minimum time the registry should be saved to the registry file (by default 'data/registry.dat'); this is only needed in case the pipeline dies unexpectedly. During a normal shutdown the registry is also saved.
-During the pipeline start the plugin uses one file to learn how the JSON header and tail look like, they can also be configured manually.
+When registry_local_path is set to a directory, the registry is saved on the logstash server in that directory. The filename is the pipe.id.
+
+With registry_create_policy set to resume and registry_local_path set to a directory where the registry isn't yet created, it should load the registry from the storage account and save it on the local server.
+
+During the pipeline start for the JSON codec, the plugin uses one file to learn what the JSON header and tail look like; they can also be configured manually.
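Putting the registry options above together, a pipeline input might look like this (a sketch; the storage account name, key variable, container, and paths are placeholders, and any option not described above is an assumption):

```
input {
  azure_blob_storage {
    storageaccount => "examplesa"
    access_key => "${AZURE_STORAGE_KEY}"
    container => "insights-logs-networksecuritygroupflowevent"
    registry_create_policy => "resume"
    registry_local_path => "/usr/share/logstash/registry"
    interval => 60
  }
}
```

With this setup, a restart of the pipeline would first look for the registry in the local directory and otherwise fall back to the copy in the storage account.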

## Running the pipeline
The pipeline can be started in several ways.
@@ -91,6 +95,7 @@ The log level of the plugin can be put into DEBUG through
 because debug also makes logstash chatty, there are also debug_timer and debug_until that can be used to print additional information on what the pipeline is doing and how long it takes. debug_until is the number of events until debug is disabled.
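For example, to print the extra information for only the first 100 events (a sketch; the boolean form of debug_timer is an assumption based on the description above):

```
azure_blob_storage {
  # other required options omitted
  debug_timer => true
  debug_until => 100
}
```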
## Other Configuration Examples
For nsgflowlogs, a simple configuration looks like this
lib/logstash/inputs/azure_blob_storage.rb: 96 additions & 47 deletions
@@ -39,6 +39,9 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
 # The default, `data/registry`, contains a Ruby Marshal serialized Hash of the filename, the offset read so far and the file length the last time a file listing was done.
 # The default, `resume`, will load the registry offsets and will start processing files from the offsets.
 # When set to `start_over`, all log files are processed from the beginning.
 # When set to `start_fresh`, it will read log files that are created or appended since the start of the pipeline.
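The Marshal-serialized registry described above can be illustrated with a short sketch (the hash layout shown here is an assumption for illustration, not the plugin's exact structure):

```ruby
# Hypothetical registry shape: filename => { offset read so far, file length at last listing }
registry = {
  'resourceId=NSG1/y=2019/m=01/PT1H.json' => { offset: 2048, length: 4096 }
}

# The plugin keeps this as a Ruby Marshal dump (by default in data/registry.dat)
serialized = Marshal.dump(registry)

# A `resume` start would load it back and continue from the stored offsets
restored = Marshal.load(serialized)
puts restored['resourceId=NSG1/y=2019/m=01/PT1H.json'][:offset]  # prints 2048
```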
@@ -58,6 +61,9 @@ class LogStash::Inputs::AzureBlobStorage < LogStash::Inputs::Base
 # debug_until will, for a maximum number of processed messages, show 3 types of log printouts including processed filenames. This is a lightweight alternative to switching the loglevel from info to debug or even trace.
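The debug_until gate described in this comment boils down to comparing the processed-event counter against the threshold; a minimal sketch (method and variable names are hypothetical):

```ruby
# Return true while fewer than `debug_until` messages have been processed.
def verbose?(processed, debug_until)
  debug_until > processed
end

processed = 0
printed = []
%w[ev1 ev2 ev3 ev4 ev5].each do |event|
  printed << event if verbose?(processed, 3)  # only the first 3 events log extra detail
  processed += 1
end
```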
+					@logger.info("resuming from remote registry #{registry_path}")
+				end
 				break
 			rescue Exception => e
-				@logger.error(@pipe_id+" caught: #{e.message}")
+				@logger.error("caught: #{e.message}")
 				@registry.clear
-				@logger.error(@pipe_id+" loading registry failed for attempt #{counter} of 3")
+				@logger.error("loading registry failed for attempt #{counter} of 3")
 			end
 		end
 	end
 	# read filelist and set offsets to file length to mark all the old files as done
 	if registry_create_policy == "start_fresh"
-		@logger.info(@pipe_id+" starting fresh")
 		@registry = list_blobs(true)
 		save_registry(@registry)
-		@logger.info("writing the registry, it contains #{@registry.size} blobs/files")
+		@logger.info("starting fresh, writing a clean registry containing #{@registry.size} blobs/files")
 	end

 	@is_json = false
@@ -164,27 +187,32 @@ def register
 		if file_tail
 			@tail = file_tail
 		end
-		@logger.info(@pipe_id+" head will be: #{@head} and tail is set to #{@tail}")
+		@logger.info("head will be: #{@head} and tail is set to #{@tail}")
 	end
 end # def register


 def run(queue)
 	newreg = Hash.new
 	filelist = Hash.new
 	worklist = Hash.new
-	# we can abort the loop if stop? becomes true
+	@last = start = Time.now.to_i
+
+	# This is the main loop, it
+	# 1. Lists all the files in the remote storage account that match the path prefix
+	# 2. Filters on path_filters to only include files that match the directory and file glob (**/*.json)
+	# 3. Saves the listed files in a registry of known files and filesizes.
+	# 4. Lists all the files again, compares the registry with the new filelist and puts the delta in a worklist
+	# 5. Processes the worklist and puts all events in the logstash queue.
+	# 6. If there is time left, sleeps to complete the interval. If processing takes more than an interval, saves the registry and continues.
+	# 7. If the stop signal comes, finishes the current file, saves the registry and quits
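The seven steps above can be sketched as a skeleton loop (simplified, with hypothetical names; the real run method tracks much more state):

```ruby
# Poll loop sketch: list files, diff against the registry, process the delta,
# then sleep out the remainder of the interval.
def poll_loop(interval, stop)
  registry = {}                                  # filename => length already processed
  until stop.call
    chrono = Time.now.to_i
    filelist = yield                             # steps 1+2: list and filter remote files
    worklist = filelist.reject { |name, len| registry[name] == len }  # step 4: the delta
    worklist.each { |name, len| registry[name] = len }                # step 5: "process"
    left = interval - (Time.now.to_i - chrono)
    sleep(left) if left > 0                      # step 6: pad out the interval
  end
  registry                                       # step 7: caller saves this on shutdown
end
```

In the plugin itself step 5 pushes decoded events onto the logstash queue and steps 6 and 7 call save_registry; this sketch only marks files as done.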
 	while !stop?
-		chrono = Time.now.to_i
 		# load the registry, compare its offsets to the file list, set offset to 0 for new files, process the whole list and if finished within the interval wait for the next loop
 		@logger.info("list_blobs took #{Time.now.to_i - chrono} sec")
 	end
-	if (@debug_until > @processed) then @logger.info(@pipe_id+" list_blobs took #{Time.now.to_i - chrono} sec") end
 	return files
 end

 # When events were processed after the last registry save, start a thread to update the registry file.
 def save_registry(filelist)
-	# TODO because of threading, processed values and regsaved are not thread safe, they can change as instance variable @!
+	# Because of threading, processed values and regsaved are not thread safe, they can change as instance variable @! Most of the time this is fine because the registry is the last resort, but be careful about corner cases!
 	unless @processed == @regsaved
 		@regsaved = @processed
-		@logger.info(@pipe_id+" processed #{@processed} events, saving #{filelist.size} blobs and offsets to registry #{registry_path}")
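The warning in the comment above can be tamed with a mutex around the processed/regsaved check; a sketch of the save-in-a-background-thread pattern (the class, method names, and the mutex are additions for illustration, not part of the plugin):

```ruby
# Save the registry from a thread so the poll loop never blocks on disk IO,
# but only when new events were processed since the last save.
class RegistrySaver
  def initialize(path)
    @path = path
    @mutex = Mutex.new
    @processed = 0   # events handled so far
    @regsaved = 0    # value of @processed at the last save
  end

  def record(count)
    @mutex.synchronize { @processed += count }
  end

  # Returns the saving thread, or nil when nothing changed since the last save.
  def save(filelist)
    @mutex.synchronize do
      return nil if @processed == @regsaved
      @regsaved = @processed
      Thread.new { File.binwrite(@path, Marshal.dump(filelist)) }
    end
  end
end
```

The mutex closes the gap the comment describes: record and save can no longer interleave between reading @processed and updating @regsaved.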