A data engineer (let’s call him Dan) builds a combined view on top of some preexisting views that apply the same row-level security as the source systems. Keep in mind that everyone on Dan’s team has access to all of the data in the SAP systems, so nothing is ever filtered out for them. He does some testing, thinks it’s done, and announces that the new combined view is available for use. Alice starts using the view, likes what she sees, and builds the results into an interactive visualization. Others start using the visualization… Due to the RLS, Bob’s results are slightly different from Alice’s, but he doesn’t notice. When Lin tries it, nothing shows up. She tells Alice her visualization is broken. Alice says “it looks good to me.” Lin is frustrated…
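To make the failure mode concrete, here is a minimal Python sketch of what happens when an aggregate is built on top of RLS-filtered data. The business units, amounts, and the `ACCESS` mapping are hypothetical stand-ins, not anything from the actual SAP systems, and the filtering function only imitates what a database would do.

```python
# Hypothetical ledger rows: (business_unit, amount). Illustrative values only.
LEDGER = [
    ("north", 1000),
    ("south", 2000),
    ("east", 50),
]

# Hypothetical access grants: which business units each user may see.
ACCESS = {
    "alice": {"north", "south", "east"},  # full access
    "bob": {"north", "south"},            # most, but not all
    "lin": set(),                         # none of these units
}


def rls_rows(user):
    """Return only the rows the user may see, as an RLS-protected view would."""
    allowed = ACCESS.get(user, set())
    return [row for row in LEDGER if row[0] in allowed]


def company_total(user):
    """Aggregate on top of the RLS-filtered rows, like the combined view."""
    return sum(amount for _, amount in rls_rows(user))


for user in ("alice", "bob", "lin"):
    print(user, company_total(user))
```

With this made-up data, Alice sees 3050, Bob sees 3000, and Lin sees 0. Bob's number is plausible enough that nobody questions it, which is exactly why the problem goes unnoticed.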
Clear and Communicated Requirements
How can this situation be avoided? Make sure the requirements of your data engineering project are clear and communicated to all parties. Row-level security should be explicitly addressed in all requirements involving systems like this. The requirements should state either “Results should be the same regardless of the access level of the user” or “Results will differ depending on the access level of the user.”
The requirements in the scenario above were to build a view that aggregates data across business units to give us a company-wide look at financial results. That seems to imply that anyone using the view will get the same results. Don’t make assumptions — spell it out. Say that “Everyone using this view will get the same results, regardless of row-level security.” Maybe Dan didn’t realize this when building the combined view — if he had, he wouldn’t have built the solution by reusing the views with RLS.
Test Before Calling It Done
Once requirements are clear and spelled out, you need to make sure that they’re tested before calling it “done.” The first question is: who’s doing the testing? Obviously, the data engineer should be testing, but it’s likely that they have a blind spot or two. Did they read the requirements? Did they interpret them correctly? Did they check that the relevant software engineering best practices were followed?
For complex data engineering and analytics projects, it’s almost always better to have someone else do some testing, in addition to the data engineer, before calling it “done.” And whoever that person is, they’re going to need different identities or roles set up to simulate all of the users mentioned above: Alice, Bob, and Lin. It’s especially important to test with users who have access to most, but not all, of the data, because partially limited access can lead to slight differences that often get overlooked. You don’t want to be introducing errors like that into your analytics.
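One way to picture this kind of testing is a small harness that runs the same view under several identities and compares the results against the stated requirement. The sketch below is only an illustration: `run_view` is a hypothetical stand-in for querying the view while connected as a given identity, and the data and grants are made up.

```python
# Hypothetical data and grants, for illustration only.
ROWS = [("north", 1000), ("south", 2000), ("east", 50)]
GRANTS = {
    "full_access": {"north", "south", "east"},
    "partial_access": {"north", "south"},  # most, but not all, of the data
    "no_access": set(),
}


def run_view(identity):
    """Stand-in for querying the view while connected as `identity`."""
    allowed = GRANTS[identity]
    return sum(amount for unit, amount in ROWS if unit in allowed)


def check_view(expectation, identities):
    """Compare results across identities against the stated requirement.

    `expectation` is "same" (results must not depend on access level) or
    "differ" (results are expected to vary with access level).
    Returns a list of problems; an empty list means nothing was flagged.
    """
    if expectation not in ("same", "differ"):
        raise ValueError(f"unknown expectation: {expectation!r}")
    results = {identity: run_view(identity) for identity in identities}
    distinct = set(results.values())
    problems = []
    if expectation == "same" and len(distinct) > 1:
        problems.append(f"results depend on access level: {results}")
    elif expectation == "differ" and len(results) > 1 and len(distinct) == 1:
        # Identical results could be a coincidence, so treat this as a
        # prompt to investigate rather than a definite failure.
        problems.append(f"results identical for all identities: {results}")
    return problems


print(check_view("same", list(GRANTS)))
print(check_view("differ", list(GRANTS)))
```

Under the "same" requirement the harness flags the view, because the partial-access identity gets 3000 where the full-access identity gets 3050. Under the "differ" requirement nothing is flagged. Including a partial-access identity matters here: a check that only compared full access against no access would miss the subtle case.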
Okay, let’s say you’ve spelled out the requirements. And you’ve done the initial testing to make sure all those different identities are working correctly. Awesome! Life is good! What could possibly go wrong?
Scenario: Expose Data Users Can Access
Let’s take a look at a different scenario: a view that only exposes data that the user has access to. The requirement: A view that generates a list of general ledger entries across all of the SAP instances. Users will get different results depending on their access. Dan takes that requirement and implements it by reusing the views with RLS: