This repository has been archived by the owner on Feb 24, 2024. It is now read-only.

current_setting can lead to bad performance when used on RLS #609

Open
steve-chavez opened this issue Dec 20, 2022 · 21 comments
Labels: references (Technical reference.)

@steve-chavez (Member) commented Dec 20, 2022

Edit: solution below.

Having:

-- $ postgrest-with-postgresql-15 psql -- on nix-shell

create table chat_messages(
  id text PRIMARY KEY
, content text
, workspace uuid
);

create or replace function app_metadata()
returns jsonb as $$
  select ((current_setting('request.jwt.claims', true)::jsonb)->'app_metadata');
$$ language sql stable;

alter table chat_messages enable row level security;

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);

grant select on chat_messages to postgrest_test_anonymous;

insert into chat_messages(id, content, workspace)
select 
  'id-' || x, 
  'content-' || x, 
  case when x % 2 = 0 
    then 'b080c0f7-8c22-429c-9d1a-ade3e43fb109'::uuid 
    else '2156a8d9-d06f-40cd-adf8-411ae8c0bc76'::uuid 
  end
from generate_series(1, 3000000) x;

Doing a count is fast without RLS:

explain analyze SELECT COUNT(*) FROM chat_messages;
Planning Time: 0.105 ms
Execution Time: 161.549 ms

When RLS is enabled, the count is awfully slow:

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);

SELECT COUNT(*) FROM chat_messages;
  count
---------
 1500000

explain analyze SELECT COUNT(*) FROM chat_messages;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=90821.44..90821.45 rows=1 width=8) (actual time=18597.007..18597.008 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..90795.25 rows=10476 width=0) (actual time=18597.002..18597.002 rows=0 loops=1)
         Filter: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
         Rows Removed by Filter: 3000000
 Planning Time: 0.783 ms
 Execution Time: 18597.062 ms

This happens because:

  • The jsonb cast (::jsonb) is not LEAKPROOF; this means the cast is evaluated for each row of the query, and PostgreSQL doesn't cache this static value.
  • The JSON value inside request.jwt.claims is somewhat lengthy. We have no control over this, as the JWT can come from an external system. A shorter JSON claims value takes less time:
begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"app_metadata":{"workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76"}}', true);

explain analyze SELECT COUNT(*) FROM chat_messages;

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=117937.00..117937.01 rows=1 width=8) (actual time=3449.217..3449.218 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..117937.00 rows=1 width=0) (actual time=3449.213..3449.213 rows=0 loops=1)
         Filter: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
         Rows Removed by Filter: 3000000
 Planning Time: 0.259 ms
 Execution Time: 3449.257 ms

The count is a contrived example (for which there are workarounds), but more realistic queries will lead to the same problem.
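
For the count itself, one well-known workaround is an estimated count taken from the planner instead of a real scan. A minimal sketch in the classic PostgreSQL-wiki style (the function name count_estimate is illustrative, not part of this proposal):

-- Ask the planner for its row estimate instead of scanning the table:
create or replace function count_estimate(query text)
returns bigint as $$
declare
  plan json;
begin
  execute 'explain (format json) ' || query into plan;
  return (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
end;
$$ language plpgsql;

-- usage: select count_estimate('select 1 from chat_messages');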


@dwagin (Contributor) commented Dec 20, 2022 via email

@wolfgangwalther (Member) commented Dec 20, 2022

This is certainly not because of LEAKPROOF or RLS. You can see it easily by manually executing the constructed query:

SELECT COUNT(*) FROM chat_messages WHERE workspace = (app_metadata() ->> 'workspace')::uuid;

This is equally slow - even without RLS.

LEAKPROOF is about pushing conditions through RLS, i.e. when you do a query SELECT ... FROM rls_enabled_table WHERE some_func_used_here(col). In this case some_func_used_here must be defined LEAKPROOF to make it run on the base query directly - and not on the result after applying RLS.
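
As a minimal sketch of what the LEAKPROOF marker looks like (not from this thread; note only superusers may declare it, since it promises the function reveals nothing about its arguments):

-- A leakproof function may be pushed down below the RLS barrier by the planner,
-- e.g. a filter like WHERE starts_with_a(content) could run on the scan itself:
create function starts_with_a(t text) returns boolean as $$
  select t like 'a%';
$$ language sql immutable leakproof;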

But in this case, your function is inside the RLS policy. If you're leaking data there, that's all your own responsibility.

What about an index on the uuid column?

Yes, this is the right solution.
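
With the schema from the first comment, that would presumably be:

create index on chat_messages(workspace);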

@steve-chavez (Member Author) commented Dec 20, 2022

But in this case, your function is inside the RLS policy. If you're leaking data there, that's all your own responsibility.

We encourage using current_setting in RLS in the docs, and that has been there for quite a while. All the HTTP context GUCs can be used inside RLS, and we can't control the wrapper functions users put over them.

Hello Steve, what about an index on the uuid column?

Thanks dwagin, yeah that works. But the thing is, we didn't have this issue with the legacy GUCs that were available before v9.0.

So in the above example, by doing this:

create or replace function workspace()
returns uuid as $$
  select (current_setting('request.jwt.claims.app_metadata.workspace', true))::uuid;
$$ language sql stable;

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = workspace()
);

begin;
set local role postgrest_test_anonymous;
select set_config('request.jwt.claims.app_metadata.workspace', '2156a8d9-d06f-40cd-adf8-411ae8c0bc76', true);
explain analyze SELECT COUNT(*) FROM chat_messages;

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=87937.00..87937.01 rows=1 width=8) (actual time=1252.412..1252.413 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..87937.00 rows=1 width=0) (actual time=1252.409..1252.409 rows=0 loops=1)
         Filter: (workspace = (current_setting('request.jwt.claims.app_metadata.workspace'::text, true))::uuid)
         Rows Removed by Filter: 3000000
 Planning Time: 0.292 ms
 Execution Time: 1252.451 ms

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.
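
To isolate the parsing cost, here's a rough micro-benchmark sketch (my illustration, run inside the same transaction after set_config, with no table involved):

-- Parse the JSON claims once per generated row to see how much of the
-- slowdown is just current_setting + jsonb parsing:
explain analyze
select (current_setting('request.jwt.claims', true)::jsonb
          -> 'app_metadata' ->> 'workspace')::uuid
from generate_series(1, 3000000);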

@steve-chavez (Member Author) commented Dec 20, 2022

My proposal would be to have an allowlist config that brings back the legacy GUCs for certain keys, to avoid the JSON parsing problem. It could be something like:

http-context-shortcuts = "request.jwt.claims.app_metadata.workspace, request.headers.user_agent, request.cookies.session_id"

This seems related to PostgREST/postgrest#1941, maybe the config proposed there could be merged with this one.

@wolfgangwalther (Member)

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.

I still think it's only a problem when you parse it 3 million times, like in your example on a seq scan. But that's not the GUC's fault.

My proposal would be to have an allowlist config that brings back the legacy GUCs for certain keys to avoid the JSON parsing problem.

You can easily do this in a pre-request function. Just set your custom GUC to the extracted value from the JSON.
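
A minimal sketch of that approach (the function would be wired up via PostgREST's db-pre-request config; the GUC name request.workspace is hypothetical):

create or replace function public.pre_request()
returns void as $$
begin
  -- copy the claim into a cheap scalar GUC, local to the transaction
  perform set_config(
    'request.workspace',
    current_setting('request.jwt.claims', true)::jsonb
      -> 'app_metadata' ->> 'workspace',
    true
  );
end;
$$ language plpgsql;

-- RLS can then compare against the scalar directly:
-- using (workspace = current_setting('request.workspace', true)::uuid)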

I'd rather wait for schema variables to finally come to PG - because those will solve this nicely.
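
For reference, the in-progress schema variables patch proposes syntax roughly like the following; it is not part of any released PostgreSQL and the details may change:

-- roughly the proposed syntax (unreleased, subject to change):
create variable request_workspace as uuid;
let request_workspace = '2156a8d9-d06f-40cd-adf8-411ae8c0bc76';
-- an RLS policy could then reference request_workspace directly, with no JSON parsing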

@steve-chavez (Member Author)

You can easily do this in a pre-request function. Just set your custom GUC to the extracted value from the JSON.

Yeah, that would work for avoiding the JSON GUC, but it also comes with a perf decrease because we don't have pipeline mode. Last time I checked, adding a pre-request meant about a 15% reduction in throughput.

I'd rather wait for schema variables to finally come to PG - because those will solve this nicely.

True. Have you reviewed that feature by any chance? I wonder if it will land on pg16 and if indeed we can use it as we intend to.

@steve-chavez (Member Author)

I wonder if it will land on pg16

That's a downside I guess, it will be almost a year for pg16.

I still think it's only a problem when you parse it 3 million times like in your example on a seq scan. But that's not the GUCs fault.

As I mentioned above, the example is contrived. But there are real-world queries on big tables where filtering some millions of rows is legitimate. I do think the JSON GUC is a problem here, since the perf problem doesn't happen with the legacy GUCs.

@dwagin (Contributor) commented Dec 20, 2022

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.

In my environment only x3 (PG 13).

@steve-chavez (Member Author)

Hm, I've tried on pg13 (with nix-shell and postgrest-with-postgresql-13 psql) and I get a x9 perf increase.

@dwagin (Contributor) commented Dec 20, 2022

And it speeds up when the function is PARALLEL RESTRICTED.
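
Presumably this refers to marking the wrapper function, e.g. (a sketch; user functions default to PARALLEL UNSAFE, which disables parallel query for any statement that calls them):

-- PARALLEL RESTRICTED relaxes the default: the function must still run in the
-- parallel group leader, but the rest of the plan may use parallelism:
create or replace function app_metadata()
returns jsonb as $$
  select (current_setting('request.jwt.claims', true)::jsonb)->'app_metadata';
$$ language sql stable parallel restricted;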

@steve-chavez (Member Author)

Circling back on this. How about this config:

# this is the default 
http-context = "request.method, request.path, request.jwt.claims, request.headers, request.cookies"

# we also set `search_path` and `role` for every request, but those cannot be modified as they're core to our security model

GUCs can be removed and shortcuts can be added:

http-context = "request.jwt.claims.app_metadata.workspace, request.headers"

For every sub-level given as request.jwt.claims.<sub-level> or request.headers.<sub-level>, we'd expose the value of that sub-level JSON key as its own GUC.

This will solve this issue and PostgREST/postgrest#1941.

I think it will be useful even if we can do it with pre-request later.

@steve-chavez (Member Author) commented Apr 22, 2023

Just found a plain SQL solution to the above issue: we can force current_setting to be computed only once with a MATERIALIZED CTE. So we change the above policy to:

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);

Now the policy is still correct and the COUNT is fast:

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);

SELECT COUNT(*) FROM chat_messages;
  count
---------
 1500000

explain analyze SELECT COUNT(*) FROM chat_messages;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=241687.00..241687.01 rows=1 width=8) (actual time=775.210..775.211 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..237937.00 rows=1500000 width=0) (actual time=0.145..728.419 rows=1500000 loops=1)
         Filter: (SubPlan 3)
         Rows Removed by Filter: 1500000
         SubPlan 3
           ->  Result  (cost=0.05..0.06 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=3000000)
                 CTE cached
                   ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.054..0.056 rows=1 loops=1)
                 InitPlan 2 (returns $1)
                   ->  CTE Scan on cached  (cost=0.00..0.02 rows=1 width=16) (actual time=0.058..0.060 rows=1 loops=1)
 Planning Time: 0.238 ms
 Execution Time: 775.264 ms

This should be added to the docs.

@tonyxiao

This is epic! Also works wonders for things like postgres realtime too!

@steve-chavez (Member Author)

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);

Note that the above policy with the materialized CTE is not always a win. If the column is indexed, the above policy results in a noticeable slowdown (almost x6 slower):

create index on chat_messages(workspace);

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);
explain analyze SELECT COUNT(*) FROM chat_messages;
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=209086.43..209086.44 rows=1 width=8) (actual time=817.665..817.666 rows=1 loops=1)
   ->  Index Only Scan using chat_messages_workspace_idx on chat_messages  (cost=0.43..205336.43 rows=1500000 width=0) (actual time=0.098..776.944 rows=1500000 loops=1)
         Filter: (SubPlan 2)
         Rows Removed by Filter: 1500000
         Heap Fetches: 0
         SubPlan 2
           ->  CTE Scan on cached  (cost=0.03..0.05 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=3000000)
                 CTE cached
                   ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.055..0.057 rows=1 loops=1)
 Planning Time: 0.240 ms
 Execution Time: 817.720 ms

Compare with the original policy from above, without the materialized CTE, which uses the index condition directly:

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);
explain analyze SELECT COUNT(*) FROM chat_messages;
BEGIN
SET
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35164.45..35164.46 rows=1 width=8) (actual time=124.649..124.650 rows=1 loops=1)
   ->  Index Only Scan using chat_messages_workspace_idx on chat_messages  (cost=0.45..31414.95 rows=1499800 width=0) (actual time=0.082..78.337 rows=1500000 loops=1)
         Index Cond: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
         Heap Fetches: 0
 Planning Time: 0.353 ms
 Execution Time: 124.694 ms
(6 rows)

So maybe it boils down to: always use an index when comparing against current_setting() values; if that's not possible, make sure to cache them in a materialized CTE.

steve-chavez transferred this issue from PostgREST/postgrest on Apr 25, 2023
steve-chavez changed the title from "JSON GUCs lead to bad performance when used on RLS" to "current_setting can lead to bad performance when used on RLS" on Apr 25, 2023
steve-chavez added the "references (Technical reference.)" label on Apr 25, 2023
@wolfgangwalther (Member) commented Apr 26, 2023

There is a better way to force PostgreSQL to execute the app_metadata function only once for the query: make it use an InitPlan.

Instead of this:

create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);

You'll need to wrap the right side in a scalar subquery:

create policy sel on chat_messages for select
using(
  workspace = (select (app_metadata() ->> 'workspace')::uuid)
);

Since the subquery does not depend on the outer query, it results in an InitPlan. This runs it exactly once and then feeds the result into the main query, which can then use an index or not; either way it will be fast.


First policy explained:

Aggregate  (cost=121660.00..121660.01 rows=1 width=8) (actual time=15673.547..15673.548 rows=1 loops=1)
  ->  Seq Scan on chat_messages  (cost=0.00..117937.00 rows=1489200 width=0) (actual time=0.026..15614.352 rows=1500000 loops=1)
        Filter: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
        Rows Removed by Filter: 1500000
Planning Time: 0.281 ms
Execution Time: 15673.592 ms

Second policy explained:

Aggregate  (cost=69187.03..69187.04 rows=1 width=8) (actual time=264.203..264.204 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.028..0.029 rows=1 loops=1)
  ->  Seq Scan on chat_messages  (cost=0.00..65437.00 rows=1500000 width=0) (actual time=0.052..214.629 rows=1500000 loops=1)
        Filter: (workspace = $0)
        Rows Removed by Filter: 1500000
Planning Time: 0.138 ms
Execution Time: 264.229 ms

Second policy + index:

Aggregate  (cost=35168.46..35168.47 rows=1 width=8) (actual time=151.675..151.677 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.027..0.027 rows=1 loops=1)
  ->  Index Only Scan using chat_messages_workspace on chat_messages  (cost=0.43..31418.43 rows=1500000 width=0) (actual time=0.049..98.173 rows=1500000 loops=1)
        Index Cond: (workspace = $0)
        Heap Fetches: 0
Planning Time: 0.144 ms
Execution Time: 151.703 ms

@tonyxiao

Interesting, so it should be the same performance as the materialized CTE but a bit less verbose to write?

@wolfgangwalther (Member)

interesting, so should be the same performance as materialized CTE but a bit less verbose to write?

No, it has better performance than a materialized CTE. You can see why in the plans above: the materialized CTE's subplan is still invoked for every row (loops=3000000), while the InitPlan runs exactly once (loops=1).

@GaryAustin1

I did a quick test in https://github.com/orgs/supabase/discussions/9311#discussioncomment-5732511 and got the same result with either method; I'll do more. It makes sense they would be the same, as both compute a constant value once and compare rows against it.

BUT I believe this method is the one scattered throughout my original table https://github.com/orgs/supabase/discussions/9311#discussioncomment-3815869 which did things like:
[image]

@GaryAustin1

I did just run a 2nd test of the original auth.uid() = user_id policy with 25K rows, and get:

user_id       original   CTE   initplan
not indexed   43         13    3
indexed       .07        13    .13

So the initplan way is faster in this case.
If you have an indexed column it seems to take advantage of it, but it's still slower than the plain indexed-column test.

@wolfgangwalther (Member)

The difference you see between the initplan version and the one without it, for an indexed query, is negligible. In theory, there's a very tiny overhead from running one additional InitPlan node in the query. But doing a few more runs gives me the following results:

Bare query using index, but no initplan:

  • 152ms
  • 123ms
  • 126ms
  • 125ms
  • 122ms

Scalar subquery using index and initplan:

  • 152ms
  • 124ms
  • 129ms
  • 124ms
  • 126ms

There's a lot more noise here than any noticeable difference between the two.

@GaryAustin1 commented Apr 26, 2023

Added some more tests of a realistic two-part RLS policy, with code, here: https://github.com/orgs/supabase/discussions/9311#discussioncomment-5734798
The policy is is_admin() OR auth.uid() = user_id.
100K-row selects; no difference between indexed and not indexed. "offset 90k" means adding limit 10 offset 90000, but only for the admin, as the user gets 1 row.
"wrapped" is the initplan variant; a sketch of it is below.

role         normal   wrapped   CTE
user         1900     10.3      53
admin        1900     18.5      18.5
offset 90k   1900     48        48
Edit... Adding where user_id = 'UUID' to the query takes the time down to 0.2 msec, from 10 and 53, for a user query.
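
For reference, the "wrapped" (initplan) variant of that two-part policy presumably looks like the following sketch (is_admin(), auth.uid(), and the table name come from the linked Supabase discussion, so treat them as assumptions):

-- each call sits in a scalar subquery, so it is evaluated once as an InitPlan:
create policy sel on test_table for select
using (
  (select is_admin()) or (select auth.uid()) = user_id
);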
