PLUGIN-1950: Log zero records for Table mode and SQL Statement mode. #75

Open
sahusanket wants to merge 1 commit into develop from PLUGIN-1950_log_zero_records

@sahusanket sahusanket commented May 15, 2026

Add Logging for Empty Tables and Queries in Multi-Table Plugins

Problem

When reading from multiple database tables or executing multiple SQL statements, there is currently no visibility into which tables or queries yielded zero records.

Solution

This PR adds informational logging whenever an ingestion source (table or custom query) produces exactly zero records.

Key Changes

  1. Empty Table Detection in Multi-Table Mode:

    • MultiTableDBInputFormat: Updated getTableSplits to explicitly detect when a table is empty (the bounding query returns NULL, NULL) and, in that case, return exactly one full-table split (1=1).
    • DBTableRecordReader: Added logging to emit "Source table '<fullTableName>' has zero records." when exactly zero rows are read. The message is logged only when the reader is processing a full-table split (1=1), which prevents false alarms from empty partial splits when splitsPerTable > 1.
  2. Improved Logging in SQL Statement Mode:

    • SQLStatementRecordReader: Added logging to emit SQL statement '<id>' ('<query>') has zero records. when exactly zero rows are read.
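
The empty-table handling described above can be sketched roughly as follows. This is a minimal illustration and not the plugin's actual code: the class `ZeroRecordSketch`, the methods `planSplits` and `shouldLogZeroRecords`, the string form of the split conditions, and the numeric range arithmetic are all hypothetical. Only the `1=1` full-table condition and the NULL, NULL bounding result come from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and the numeric split logic are assumptions, not the plugin's API. */
final class ZeroRecordSketch {
  static final String FULL_TABLE = "1=1";

  /** Returns one WHERE condition per split for a table with a numeric split column. */
  public static List<String> planSplits(String column, Long min, Long max, int splitsPerTable) {
    List<String> conditions = new ArrayList<>();
    // The bounding query returns (NULL, NULL) for an empty table: emit one full-table split.
    // A single null bound is treated the same way here to avoid unboxing a null.
    if (min == null || max == null) {
      conditions.add(FULL_TABLE);
      return conditions;
    }
    long range = max - min + 1;
    long step = Math.max(1, (range + splitsPerTable - 1) / splitsPerTable); // ceiling division
    for (long lo = min; lo <= max; lo += step) {
      long hi = Math.min(max, lo + step - 1);
      conditions.add(column + " >= " + lo + " AND " + column + " <= " + hi);
    }
    return conditions;
  }

  /** Only a full-table split that read nothing proves the table itself is empty. */
  public static boolean shouldLogZeroRecords(String splitCondition, long rowsRead) {
    return rowsRead == 0 && FULL_TABLE.equals(splitCondition);
  }
}
```

Under these assumptions, an empty table always yields a single `1=1` split regardless of splitsPerTable, so the zero-record message is logged at most once per table, and a partial split that happens to match no rows stays silent.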

Testing

  1. Verified in CDAP Sandbox that the statement is logged for an empty table with splits = 4.
(Screenshot 2026-05-15 at 5 24 52 PM)
  2. Verified in CDAP Sandbox that the statement is logged in SQL statement mode.
(Screenshot 2026-05-15 at 5 34 57 PM)
  3. Also verified that the log line is not printed when the source contains data.

  4. Verified in the PROD environment as well.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces logging to identify when source tables or SQL statements return zero records, enhancing observability. It also updates MultiTableDBInputFormat to handle cases where split boundaries are null by returning a single default split. Feedback was provided regarding SQLStatementRecordReader to move the full SQL statement logging from INFO to DEBUG level to prevent the exposure of sensitive information and avoid log bloat.

    }
    if (!results.next()) {
      if (pos == 0) {
        LOG.info("SQL statement '{}' ('{}') has zero records.", split.getId(), split.getSqlStatement());
medium

Logging the full SQL statement at INFO level can expose sensitive information (such as PII or credentials in the WHERE clause) in the logs. It can also lead to log bloat for very large queries. Consider logging only the statement ID at INFO level and moving the full query to DEBUG level.

          LOG.info("SQL statement '{}' has zero records.", split.getId());
          LOG.debug("SQL statement '{}' ('{}') has zero records.", split.getId(), split.getSqlStatement());
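
The effect of the suggested split can be illustrated with a small, self-contained sketch. It uses `java.util.logging` from the standard library, with `FINE` standing in for DEBUG, purely so that it runs without dependencies. The plugin's `LOG` appears to be an SLF4J-style logger, and the class `LogLevelSketch` and method `emit` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sketch using java.util.logging; not the plugin's logging setup. */
final class LogLevelSketch {

  /** Logs a zero-record event and returns the messages that passed the logger's threshold. */
  public static List<String> emit(Level threshold, String id, String sql) {
    List<String> captured = new ArrayList<>();
    Logger log = Logger.getAnonymousLogger();
    log.setUseParentHandlers(false); // keep the sketch's output off the console
    log.setLevel(threshold);
    Handler handler = new Handler() {
      @Override public void publish(LogRecord record) { captured.add(record.getMessage()); }
      @Override public void flush() { }
      @Override public void close() { }
    };
    handler.setLevel(Level.ALL);
    log.addHandler(handler);

    // INFO carries only the statement ID; FINE (the DEBUG analogue) adds the query text.
    log.info("SQL statement '" + id + "' has zero records.");
    log.fine("SQL statement '" + id + "' ('" + sql + "') has zero records.");
    return captured;
  }
}
```

With the threshold at `Level.INFO`, only the ID-only message is captured, so the query text stays out of the logs by default. Lowering the threshold to `Level.FINE` makes the full statement available for troubleshooting.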

@sahusanket sahusanket self-assigned this May 15, 2026
@sahusanket sahusanket requested a review from vikasrathee-cs May 15, 2026 12:37