Skip to content

feat(core): add validation rule for the condition syntax check of row-level access controls #1176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented May 2, 2025

Description

Wait #1161
Add the rule to check if the condition of RLAC is valid.

URL

POST /v3/connector/{data_source}/validate/rlac_condition_syntax_is_valid

Sample Payload

{
    "manifest_str": "...",
    "modelName": "orders",
    "requiredProperties": [
        {
            "name": "@SESSION_LEVEL",
            "required": true
        },
        {
            "name": "@SESSION_ROLE",
            "required": true
        },
        {
            "name": "@SESSION_REGION",
            "required": true
        }
    ],
    "condition": "level = @SESSION_LEVEL AND role = @SESSION_ROLE OR region = @SESSION_REGION"
}

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Added validation for row-level access control (RLAC) condition syntax, ensuring all session properties referenced in conditions are properly declared.
  - Introduced new Python and Rust APIs to support RLAC rule validation and model access.
  - Expanded validation endpoints and test coverage for RLAC scenarios.

- **Bug Fixes**
  - Improved error messages for validation failures related to missing session properties in RLAC conditions.

- **Tests**
  - Added comprehensive tests for RLAC rule validation, covering both valid and invalid scenarios.

- **Documentation**
  - Enhanced Python bindings documentation by exposing new classes and functions for RLAC validation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Copy link

coderabbitai bot commented May 2, 2025

Walkthrough

This update introduces row-level access control (RLAC) condition syntax validation to the system. It adds new Rust and Python bindings for RLAC-related types and validation logic, exposes these to Python, and integrates a new validation rule in the server. Associated tests are added at both the Rust and Python levels.

Changes

File(s) Change Summary
ibis-server/app/model/__init__.py, ibis-server/app/model/validator.py Broadened the type annotation for validation parameters and added a new RLAC condition syntax validation rule and method. Updated the validation rules list and method signatures accordingly.
ibis-server/tests/routers/v3/connector/postgres/test_validate.py Added async test for the new RLAC condition syntax validation endpoint, covering both valid and invalid scenarios.
wren-core-base/src/mdl/py_method.rs Added Python bindings for SessionProperty and RowLevelAccessControl structs, and a method to retrieve models by name from the manifest.
wren-core-py/src/lib.rs, wren-core-py/src/manifest.rs, wren-core-py/src/validation.rs Introduced a new validation module and exposed RLAC validation and manifest conversion functions and types to Python. Made to_manifest a Python-callable function.
wren-core-py/tests/test_modeling_core.py Added tests for RLAC rule validation, checking for correct handling of session property requirements.
wren-core/core/src/logical_plan/analyze/access_control.rs Added the validate_rlac_rule function to check RLAC conditions for undeclared session properties, and comprehensive tests for this logic.
wren-core/core/src/logical_plan/analyze/mod.rs Made the access_control module public to allow external access.
wren-core-py/src/errors.rs, wren-core-py/src/extractor.rs Minor refactoring: simplified error message formatting and replaced filtering logic with a helper method. No changes to public interfaces.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API Server
    participant RLAC Validator (Python)
    participant Manifest (Rust)
    participant RLAC Validator (Rust)

    Client->>API Server: POST /validate/rlac_condition_syntax_is_valid (parameters, manifest)
    API Server->>RLAC Validator (Python): validate_rlac_condition_syntax_is_valid(parameters, manifest)
    RLAC Validator (Python)->>Manifest (Rust): to_manifest(manifest_str)
    Manifest (Rust)-->>RLAC Validator (Python): Manifest object
    RLAC Validator (Python)->>Manifest (Rust): get_model(model_name)
    Manifest (Rust)-->>RLAC Validator (Python): Model object
    RLAC Validator (Python)->>RLAC Validator (Rust): validate_rlac_rule(RLAC, Model)
    RLAC Validator (Rust)-->>RLAC Validator (Python): Validation result (Ok/Error)
    RLAC Validator (Python)-->>API Server: Success or ValidationError
    API Server-->>Client: HTTP 204 (success) or 422 (error)
Loading

Suggested reviewers

  • douenergy

Poem

In fields of code where carrots grow,
A rabbit hops with RLAC in tow.
Now session props must match their name,
Or errors gently stake their claim.
With tests and bindings, all in sync—
This patch is fresher than you think!
🥕✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added core ibis dependencies Pull requests that update a dependency file rust Pull requests that update Rust code python Pull requests that update Python code labels May 2, 2025
@goldmedal goldmedal force-pushed the feat/rls-validation branch from 5968f52 to d146a84 Compare May 5, 2025 03:15
@goldmedal goldmedal force-pushed the feat/rls-validation branch from d146a84 to 4c159d6 Compare May 6, 2025 07:11
@goldmedal goldmedal marked this pull request as ready for review May 6, 2025 13:27
@goldmedal goldmedal requested a review from douenergy May 6, 2025 13:27
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
ibis-server/tests/routers/v3/connector/postgres/test_validate.py (1)

124-178: Enhance test coverage for RLAC validation.

The test effectively validates basic success and failure cases for RLAC condition syntax validation, including support for both string and boolean formats for the "required" flag. However, there are opportunities to make the tests more comprehensive:

Consider extending the test coverage to include:

  1. A test case with "required": true or "required": True to verify that required properties are correctly handled
  2. A test with multiple session properties to ensure all are validated correctly
  3. A test with invalid condition syntax (e.g., malformed SQL) to verify syntax validation
async def test_validate_rlac_condition_syntax_is_valid(
    client, manifest_str, connection_info
):
    # Existing test cases...
    
+    # Test with required property set to true
+    response = await client.post(
+        url=f"{base_url}/validate/rlac_condition_syntax_is_valid",
+        json={
+            "connectionInfo": connection_info,
+            "manifestStr": manifest_str,
+            "parameters": {
+                "modelName": "orders",
+                "requiredProperties": [
+                    {"name": "session_order", "required": True},
+                ],
+                "condition": "@session_order = o_orderkey",
+            },
+        },
+    )
+    assert response.status_code == 204
+
+    # Test with multiple session properties
+    response = await client.post(
+        url=f"{base_url}/validate/rlac_condition_syntax_is_valid",
+        json={
+            "connectionInfo": connection_info,
+            "manifestStr": manifest_str,
+            "parameters": {
+                "modelName": "orders",
+                "requiredProperties": [
+                    {"name": "session_order", "required": "false"},
+                    {"name": "session_role", "required": "false"},
+                ],
+                "condition": "@session_order = o_orderkey AND @session_role = 'admin'",
+            },
+        },
+    )
+    assert response.status_code == 204
+
+    # Test with invalid condition syntax
+    response = await client.post(
+        url=f"{base_url}/validate/rlac_condition_syntax_is_valid",
+        json={
+            "connectionInfo": connection_info,
+            "manifestStr": manifest_str,
+            "parameters": {
+                "modelName": "orders",
+                "requiredProperties": [
+                    {"name": "session_order", "required": "false"},
+                ],
+                "condition": "@session_order = = o_orderkey",  # Invalid syntax
+            },
+        },
+    )
+    assert response.status_code == 422
+    assert "Error during planning: Syntax error" in response.text
wren-core-py/tests/test_modeling_core.py (2)

325-336: Good negative test case

This second test case properly validates error handling when a session property is used in the condition but not declared in the required properties list. The error message assertion should be moved outside the pytest.raises context manager block to ensure it's actually evaluated.

 with pytest.raises(Exception) as e:
     validate_rlac_rule(rlac, model)
-     assert (
-         str(e.value)
-         == "Exception: DataFusion error: Error during planning: The session property @session_user is used, but not found in the session properties"
-     )
+
+assert (
+    str(e.value)
+    == "Exception: DataFusion error: Error during planning: The session property @session_user is used, but not found in the session properties"
+)

307-337: Consider adding more test cases

The current tests cover the basic success and failure cases, but could be extended to test other validation scenarios such as:

  1. Invalid condition syntax
  2. Referencing columns that don't exist in the model
  3. Case sensitivity handling for property names

These would provide more comprehensive test coverage of the validation function.

ibis-server/app/model/validator.py (2)

168-176: Consider validating property structure

While the method converts the incoming properties to SessionProperty objects, it doesn't validate the structure of each property object beforehand. This could lead to runtime errors if the incoming data doesn't match the expected format.

Consider adding validation for each property object before conversion:

for prop in required_properties:
    if not isinstance(prop, dict):
        raise ValidationError(f"Property must be an object, got {type(prop)}")
    if "name" not in prop:
        raise ValidationError("Property missing required field 'name'")
    if "required" not in prop:
        raise ValidationError("Property missing required field 'required'")

189-191: Improve error message handling

The current implementation passes the exception object directly to ValidationError, which might result in less informative error messages. Consider extracting and formatting the error message for better clarity.

 try:
     validate_rlac_rule(rlac, model)
 except Exception as e:
-    raise ValidationError(e)
+    error_msg = f"Invalid RLAC condition: {str(e)}"
+    raise ValidationError(error_msg)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0f888d and 4c159d6.

📒 Files selected for processing (12)
  • ibis-server/app/model/__init__.py (1 hunks)
  • ibis-server/app/model/validator.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_validate.py (1 hunks)
  • wren-core-base/src/mdl/py_method.rs (3 hunks)
  • wren-core-py/src/errors.rs (1 hunks)
  • wren-core-py/src/extractor.rs (1 hunks)
  • wren-core-py/src/lib.rs (2 hunks)
  • wren-core-py/src/manifest.rs (1 hunks)
  • wren-core-py/src/validation.rs (1 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (3 hunks)
  • wren-core/core/src/logical_plan/analyze/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
wren-core-py/src/extractor.rs (1)
wren-core/core/src/mdl/mod.rs (2)
  • catalog (266-268)
  • schema (270-272)
ibis-server/tests/routers/v3/connector/postgres/test_validate.py (3)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/postgres/test_dry_plan.py (1)
  • manifest_str (33-34)
ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
  • connection_info (37-44)
wren-core-py/tests/test_modeling_core.py (8)
wren-core-py/src/manifest.rs (2)
  • to_json_base64 (10-14)
  • to_manifest (18-23)
ibis-server/app/mdl/core.py (1)
  • to_json_base64 (17-18)
wren-core-py/src/validation.rs (1)
  • validate_rlac_rule (7-13)
wren-core/core/src/logical_plan/analyze/access_control.rs (3)
  • validate_rlac_rule (82-110)
  • test_validate_rlac_rule (691-756)
  • required_properties (243-260)
ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1)
  • manifest_str (30-31)
wren-core-base/src/mdl/builder.rs (3)
  • model (68-71)
  • model (288-291)
  • condition (298-301)
wren-core-base/src/mdl/py_method.rs (1)
  • get_model (53-61)
wren-core-base/src/mdl/manifest.rs (4)
  • name (256-258)
  • name (289-291)
  • name (300-302)
  • name (306-308)
wren-core/core/src/logical_plan/analyze/access_control.rs (3)
wren-core-py/src/validation.rs (1)
  • validate_rlac_rule (7-13)
wren-core-base/src/mdl/builder.rs (11)
  • model (68-71)
  • model (288-291)
  • condition (298-301)
  • new (44-56)
  • new (103-117)
  • new (196-210)
  • new (277-286)
  • new (313-325)
  • new (362-370)
  • new (392-399)
  • new_required (175-181)
wren-core-py/tests/test_modeling_core.py (1)
  • test_validate_rlac_rule (307-336)
🔇 Additional comments (21)
wren-core-py/src/manifest.rs (1)

16-16: Great addition exposing the function to Python!

Making the to_manifest function accessible from Python via the #[pyfunction] attribute complements the existing to_json_base64 function and provides symmetry in the API. This change enables Python code to easily convert base64-encoded manifest strings to proper Manifest objects, which is essential for the RLAC validation workflow.

wren-core/core/src/logical_plan/analyze/mod.rs (1)

1-1: Good visibility change for the access_control module

Making the access_control module public is appropriate since it contains RLAC validation logic that needs to be accessible from other parts of the codebase, particularly through the Python bindings in wren-core-py.

ibis-server/app/model/__init__.py (1)

360-360: Appropriate type generalization for validation parameters

Changing the parameters type from a specific dict[str, str] to a more general dict is necessary to support the complex nested structures needed for RLAC condition validation, including session properties and condition strings.

wren-core-py/src/extractor.rs (1)

59-60: Code readability improvement

The change from map_or(true, ...) to is_none_or(...) makes the intent of the code clearer while maintaining the same functionality. The filter condition correctly checks if either the table's catalog/schema is None or matches the model's catalog/schema.

wren-core-py/src/errors.rs (1)

54-54: Streamlined error handling. LGTM!

The change removes the unnecessary "DataFusion error: " prefix, simplifying the error message formatting. Since DataFusionError likely already contains contextual information, this avoids duplication in error messages.

wren-core-py/src/validation.rs (1)

1-14: Well-structured Python binding for RLAC validation. LGTM!

The implementation is clean and follows good practices:

  1. Takes references to input parameters
  2. Returns a proper Result type for error propagation
  3. Uses the ? operator for concise error handling
  4. Correctly exposes the function to Python via #[pyfunction]

This thin wrapper around the core Rust validation logic provides a clean interface for Python to validate RLAC rules.

wren-core-py/src/lib.rs (3)

10-10: LGTM: Clean module declaration.

The new validation module is properly declared following the established pattern in this file.


19-22: LGTM: Appropriate class exports for RLAC.

The manifest-related classes needed for RLAC validation are properly exposed to Python.


24-25: LGTM: Properly exposed functions for Python.

Both the manifest parser and RLAC validation functions are correctly exposed to Python, enabling the validation rule implementation.

wren-core-py/tests/test_modeling_core.py (2)

6-13: Good additions to the imports list

The new imports clearly indicate the new functionality being tested: row-level access control validation with RowLevelAccessControl, SessionProperty, to_manifest, and validate_rlac_rule.


307-323: Well-structured test setup

The test correctly initializes the test environment by:

  1. Converting the base64 manifest string to a manifest object
  2. Retrieving the "customer" model
  3. Creating a valid RLAC rule with required properties and condition

The condition correctly references the model column "c_name" and session property "@session_user", matching the properties defined in the manifest.

wren-core/core/src/logical_plan/analyze/access_control.rs (2)

78-110: Well-implemented RLAC validation function

The validate_rlac_rule function is well-designed and follows best practices:

  1. Clear documentation explaining the purpose
  2. Proper use of existing functions (collect_condition) to avoid code duplication
  3. Case-insensitive comparison using to_lowercase() for property names
  4. Clear error messages that specify which properties are missing
  5. Efficient filtering using iterators

The function effectively validates that all session properties referenced in the condition are declared in the required properties list.


690-756: Comprehensive test coverage

The test_validate_rlac_rule function provides excellent test coverage, including:

  1. Valid rules with matching session properties (both single and multiple properties)
  2. Invalid rules with missing session properties
  3. Invalid rules with syntax errors
  4. Invalid rules referencing non-existent columns

Each test case has clear assertions for the expected behavior and error messages.

wren-core-base/src/mdl/py_method.rs (4)

22-22: Appropriate import for new types

The import statement correctly adds RowLevelAccessControl and SessionProperty types from the manifest module, which will be exposed to Python.


53-61: Useful model retrieval method

The get_model method provides a convenient way to retrieve a specific model by name from a manifest. It correctly:

  1. Searches through the models collection
  2. Uses find for efficient lookup
  3. Handles cloning of the Arc-wrapped model
  4. Returns an Option to handle the case where the model is not found

This method is essential for the validation workflow where a specific model needs to be validated.


73-83: Good SessionProperty Python binding

The Python binding for SessionProperty is well-implemented with:

  1. A clear constructor with appropriate default values
  2. Support for both required and optional parameters
  3. Support for optional default expressions

This allows Python code to easily create and manipulate SessionProperty objects.


86-96: Well-defined RowLevelAccessControl Python binding

The Python binding for RowLevelAccessControl is properly implemented with:

  1. A clear constructor with appropriate default values
  2. Support for empty required_properties list by default
  3. All essential fields exposed to Python

This enables Python code to create RLAC objects for validation purposes.

ibis-server/app/model/validator.py (4)

3-8: Appropriate imports for RLAC validation

The imports clearly bring in all the necessary components for RLAC validation:

  1. RowLevelAccessControl and SessionProperty for creating validation objects
  2. to_manifest for converting manifest strings to objects
  3. validate_rlac_rule for performing the validation

These imports support the new validation functionality.


15-15: Good addition to the validation rules list

Adding "rlac_condition_syntax_is_valid" to the rules list makes the new validation rule available to the validation framework.


23-23: Necessary type hint generalization

Changing the type hint from dict[str, str] to dict is appropriate since the RLAC validation needs to handle more complex parameter structures like nested objects in the requiredProperties field.


154-192: Well-implemented RLAC validation method

The _validate_rlac_condition_syntax_is_valid method is well-structured and follows the pattern of other validation methods:

  1. Checks for required parameters with clear error messages
  2. Extracts the necessary values from the parameters
  3. Constructs the required objects (SessionProperty and RowLevelAccessControl)
  4. Converts the manifest string and retrieves the model
  5. Performs the validation with proper error handling

The implementation correctly transforms the API parameters into the objects needed for validation and propagates errors appropriately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant