[Feature][http-Sink] Implementing http batch writes #9292

Merged: 20 commits merged on May 20, 2025

Conversation

ocean-zhc
Contributor

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

hailin0 requested a review from Copilot on May 9, 2025 01:41
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request implements HTTP batch writes for the HTTP sink connector in SeaTunnel. Key changes include:

  • Adding batch processing logic in HttpSinkWriter for both array and object modes
  • Introducing new configuration options (array_mode, batch_size, request_interval_ms, and format) for HTTP sinks
  • Updating both tests and documentation to verify and explain the new batch features

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| seatunnel-connectors-v2/connector-http/connector-http-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/http/sink/HttpSinkBatchWriterTest.java | Added tests for both object and array modes to validate individual and batch processing |
| seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/sink/HttpSinkWriter.java | Enhanced writer logic with batch buffering and configurable HTTP request intervals in array mode |
| seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/sink/HttpSinkFactory.java | Integrated new options into the sink factory for proper instantiation of the sink writer |
| seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/sink/HttpSink.java | Updated to invoke the new writer with batch processing parameters |
| seatunnel-connectors-v2/connector-http/connector-http-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/http/config/HttpSinkOptions.java | Added new options to support batch processing configuration |
| docs/zh/connector-v2/sink/Http.md<br>docs/en/connector-v2/sink/Http.md | Updated documentation to describe the new batch processing features and their configuration |

byte[] serialize = serializationSchema.serialize(row);
jsonRecords.add(new String(serialize));
}
String body = "[" + String.join(",", jsonRecords) + "]";

Copilot AI May 9, 2025

[nitpick] Consider using a dedicated JSON library to construct the JSON array, which can help ensure proper formatting and handle edge cases more robustly.

Contributor Author

Manually constructing JSON arrays in the code does have some potential problems. I have switched to constructing the JSON array with Jackson.

.defaultValue(0)
.withDescription("The interval milliseconds between two HTTP requests");

public static final Option<String> FORMAT =

Copilot AI May 9, 2025

[nitpick] The description for the FORMAT option indicates that only 'json' is supported; consider clarifying the documentation or renaming the option to avoid confusion regarding supported formats.

Contributor Author

ocean-zhc May 9, 2025

The Http.md documentation already states that

| array_mode | Boolean| No | false | Send data as a JSON array when true, or as a single JSON object when false (default) |
| batch_size | Int | No | 1 | The batch size of records to send in one HTTP request. Only works when array_mode is true. |
| request_interval_ms | Int | No | 0 | The interval milliseconds between two HTTP requests, to avoid sending requests too frequently. |
| format | String | No | json | The format of batch data. Currently only "json" is supported, which will send data as JSON array. |
Member

What other formats can we support in the future?

Contributor Author

Currently I only have JSON in mind. If more formats are added later, the code can be changed to branch on an enum.

Member

Please remove the useless config.

Contributor Author

+1, removed the useless config.

Member

The docs should be updated too.

| array_mode | Boolean| No | false | Send data as a JSON array when true, or as a single JSON object when false (default) |
| batch_size | Int | No | 1 | The batch size of records to send in one HTTP request. Only works when array_mode is true. |
| request_interval_ms | Int | No | 0 | The interval milliseconds between two HTTP requests, to avoid sending requests too frequently. |
| format | String | No | json | The format of batch data. Currently only "json" is supported, which will send data as JSON array. |
Member

The docs should be updated too.

Comment on lines 61 to 63
boolean arrayMode = pluginConfig.get(HttpSinkOptions.ARRAY_MODE);
int batchSize = pluginConfig.get(HttpSinkOptions.BATCH_SIZE);
int requestIntervalMs = pluginConfig.get(HttpSinkOptions.REQUEST_INTERVAL_MS);
Member

}

@Override
public void write(SeaTunnelRow element) throws IOException {
if (!arrayMode) {
// Object mode: send each record individually, ignore batch_size setting
Member

Suggested change
// Object mode: send each record individually, ignore batch_size setting

// Object mode: send each record individually, ignore batch_size setting
writeSingleRecord(element);
} else {
// Array mode: batch processing
Member

Suggested change
// Array mode: batch processing

Contributor Author

+1
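
A minimal sketch of the dispatch discussed in this thread, not the merged code: in object mode every record becomes its own request and `batch_size` is ignored, while in array mode records are grouped into requests of at most `batch_size` records. `RequestPlanner` and `plan` are hypothetical names, records are plain strings standing in for `SeaTunnelRow`, and the guard against a non-positive batch size is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the SeaTunnel code base.
final class RequestPlanner {
    private RequestPlanner() {}

    /** Groups records into the payloads that each mode would send as one request. */
    static List<List<String>> plan(List<String> records, boolean arrayMode, int batchSize) {
        // Object mode ignores batch_size: exactly one record per request.
        // Assumption: a non-positive batch size is treated as 1.
        int size = arrayMode ? Math.max(1, batchSize) : 1;
        List<List<String>> requests = new ArrayList<>();
        for (int i = 0; i < records.size(); i += size) {
            int end = Math.min(i + size, records.size());
            requests.add(new ArrayList<>(records.subList(i, end)));
        }
        return requests;
    }
}
```

With five records and `batch_size = 3`, object mode yields five requests and array mode yields two, the second holding the trailing partial batch.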

try {
// only support post web hook
// Send HTTP request
Member

Suggested change
// Send HTTP request

Contributor Author

+1

@@ -76,8 +166,15 @@ public void write(SeaTunnelRow element) throws IOException {

@Override
public void close() throws IOException {
if (arrayMode) {
Member

We should also invoke the flush method when prepareCommit is invoked.

Contributor Author

+1
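
The point of this request can be sketched as follows, assuming a buffer that is flushed in three places: when it reaches the batch size, in `prepareCommit`, and in `close`. Without the last two, a partial batch would sit in memory until more records arrive. `BufferingWriter` and its `Consumer`-based sender are hypothetical stand-ins, not the actual `HttpSinkWriter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the sink writer: the real class works on SeaTunnelRow
// and sends HTTP requests instead of calling a Consumer.
final class BufferingWriter {
    private final int batchSize;
    private final Consumer<List<String>> sender;
    private final List<String> buffer = new ArrayList<>();

    BufferingWriter(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    void write(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Commit hook: push out whatever is buffered instead of waiting for a full batch.
    void prepareCommit() {
        flush();
    }

    // Shutdown hook: the last partial batch must not be dropped.
    void close() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return; // nothing to send, so repeated calls are harmless
        }
        sender.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }
}
```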

if (Objects.nonNull(httpClient)) {
httpClient.close();
}
}

protected HttpClientProvider createHttpClient(HttpParameter httpParameter) {
Member

Suggested change
protected HttpClientProvider createHttpClient(HttpParameter httpParameter) {
@VisibleForTesting
protected HttpClientProvider createHttpClient(HttpParameter httpParameter) {
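
A minimal sketch of why a factory method like this is kept `protected`: a test can subclass the writer and return a fake client, so no real HTTP call is made. `SketchClient` and `SketchWriter` are hypothetical types, not `HttpClientProvider` or `HttpSinkWriter`, and the lazy client creation is an assumption made to keep the example short.

```java
// Hypothetical types for illustration only.
interface SketchClient {
    String post(String body);
}

class SketchWriter {
    private SketchClient client;

    // Protected only so that tests can override it; this is the kind of
    // method the suggestion would annotate with @VisibleForTesting.
    protected SketchClient createClient() {
        return body -> "sent:" + body;
    }

    String write(String record) {
        if (client == null) {
            client = createClient(); // created once, on first use
        }
        return client.post(record);
    }
}
```

A test can then write `new SketchWriter() { @Override protected SketchClient createClient() { ... } }` and assert on what the fake client received.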

return;
}

// Check request interval
Member

Suggested change
// Check request interval

Please do not add useless comments.

Comment on lines 131 to 132
Thread.currentThread().interrupt();
log.warn("Sleep interrupted", e);
Member

We should throw the exception directly. Otherwise the writer would never be closed.

Contributor Author

+1
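
One way to read this request, as a sketch rather than the merged code: when the sleep between requests is interrupted, restore the interrupt flag and rethrow instead of only logging, so the caller stops writing. `RequestThrottle` and its method names are hypothetical, and wrapping the interruption in `IOException` is an assumption based on `write` declaring `throws IOException`.

```java
import java.io.IOException;

// Hypothetical helper illustrating the review comment.
final class RequestThrottle {
    private final long intervalMs;
    private long lastRequestAt; // 0 means no request has been sent yet

    RequestThrottle(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Milliseconds still to wait before the next request may be sent. */
    static long remainingDelay(long intervalMs, long lastRequestAt, long now) {
        if (intervalMs <= 0 || lastRequestAt == 0) {
            return 0;
        }
        return Math.max(0, intervalMs - (now - lastRequestAt));
    }

    void awaitNextSlot() throws IOException {
        long delay = remainingDelay(intervalMs, lastRequestAt, System.currentTimeMillis());
        if (delay > 0) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag visible to callers
                throw new IOException("Interrupted while waiting between HTTP requests", e);
            }
        }
        lastRequestAt = System.currentTimeMillis();
    }
}
```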

corgy-w merged commit 04ee8ac into apache:dev on May 20, 2025
5 checks passed
ocean-zhc deleted the httpsink-batch branch on May 20, 2025 13:45
joexjx pushed a commit to joexjx/seatunnel that referenced this pull request May 21, 2025
Successfully merging this pull request may close these issues.

[Feature][Sink] When the Source is LocalFile and the Sink is HTTP, how to set the batch read size?
3 participants