Bakken & Baeck logo

Dynamic materialized views in SQLAlchemy | Nico Lutz

How to implement dynamic materialized views in SQLAlchemy

For those who don't know, ORMs are an abstraction that, in this case, map each row of a database to a python class. In such a way that makes it easy to talk to a database by just calling functions on an object instead of directly using e.g. sql. ORMs do have their strengths and weaknesses. They can speed up the development, especially for smaller projects. Furthermore they ease up on domain knowledge on the developer and can enforce data consistency across an app. In general this comes with a cost of slower performance (especially at scale) and fails to compete when introducing complex sql queries.

That being said the following paragraphs will show how one can model dynamic MATERIALIZED VIEWs in the popular python ORM SQLAlchemy. As a database backend we choose postgres.

Imagine the following use case. In a current project we have users, user roles (role A and role B) and a huge table consisting of shared information (MAIN). But only certain user roles can see a certain portion of the MAIN table. We could have opted for a solution where we split the MAIN table into two tables (one for role A and one for role B), but choose another option instead. We are creating a VIEW. A view is a stored query inside the database. It can be accessed as a virtual table but is run every time it is referenced as a query. This gives us the advantage of only having to maintain only one table (MAIN).

Views are not straightforward in SQLAlchemy and require a bit of engineering and understanding of SQLAlchemy internals. Luckily Mike Bayer posted a great recipe on VIEWS, which we can tune to our needs:

import sqlalchemy as sa
from sqlalchemy.ext import compiler
from sqlalchemy.schema import DDLElement
from sqlalchemy.sql import table


class CreateView(DDLElement):
    def __init__(self, name, selectable):
        self.name = name
        self.selectable = selectable

@compiler.compiles(CreateView)
def _create_view(element, compiler, **kw):
    return "CREATE VIEW %s AS %s" % (
        element.name,
        compiler.sql_compiler.process(element.selectable, literal_binds=True),
    )


@compiler.compiles(DropView)
def _drop_view(element, compiler, **kw):
    return "DROP VIEW %s" % (element.name)

With the above in mind it is straightforward to create a VIEW for our queries. But after a couple of tests it becomes immediately clear that the VIEW is now a huge performance bottleneck. The reason is clear - from the postgres docs:

It can be accessed as a virtual table but is run every time is it referenced as a query.

This VIEW gets called every time on our page! Luckily there is a way to materialize the VIEW. In postgres a MATERIALIZED VIEW is a VIEW where the result of the query is persistent in a table. Think of this is a "cache" for your query. All we need to change is the compile function where we add the MATERIALIZED VIEW statement, like so:

@compiler.compiles(CreateMaterializedView)
def compile(element, compiler, **kw):
    return "CREATE MATERIALIZED VIEW IF NOT EXISTS %s AS %s" % (
        element.name,
        compiler.sql_compiler.process(element.selectable, literal_binds=True),
    )

A critical piece of code is missing though. The materialized view will not be updated when the underlying data changes - from the postgres docs.

A materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view's query is stored, so that fresh data can be generated for the materialized view with:

In order for this to happen one must REFRESH the MATERIALIZED VIEW, so we create a separate function that we call every time some critical updates happen in the underlying data.

async def refresh_mat_view(name, concurrently):
    # since session.execute() bypasses autoflush, must manually flush in order
    # to include newly-created/modified objects in the refresh
    _con = "CONCURRENTLY " if concurrently else ""
    async with AsyncSessionFactory() as session:
        await session.execute("REFRESH MATERIALIZED VIEW " + _con + name)

We could stop here - but wait we are missing one thing. In our case we wanted to dynamically generate these views. What we mean with this is that we didn't want to define a class for these MATERIALIZED VIEWs. Based on the user roles we want to have access to the underlying ORM model and create these classes in the code (and don't migrate our database all the time when we create a new MATERIALIZED VIEW). In order to do so, we need to define an abstract SQLAlchemy class that will hold - as a dummy - the row data. As an upside we can also reference our refresh_mat_view in that class to refresh the data when needed. We can even add relationships as @declared_attr.

class MaterializedUserRole(Base):
    # Overall, materialized views in PostgreSQL are a very convenient way to add caching to many applications.

    # this is not actually the Table. Because each user will have their own view (and therefore their own materialized view)
    __abstract__ = True

    __table_args__ = {"extend_existing": True}

    @classmethod
    async def refresh(cls, concurrently=True):
        """Refreshes the current materialized view"""
        await refresh_mat_view(cls.__table__.fullname, concurrently)


    @declared_attr
    def details(self) -> Mapped[List["Details"]]:
        return relationship(
            "Details",
            primaryjoin=f"{self.__name__}.id==foreign(Details.id)",
            uselist=True,
            viewonly=True,
            lazy="subquery",
        )

And now in order to actually populate the python class we use a neat trick. Pythons builtin type has a secret strength. Usually type is called with one argument and determines what kind of object a class is. But when called with three parameters it will return a new object of that type. This provides us with a dynamic form of the class statement and we can create classes on the fly.

from sqlalchemy import Column
from .user import User


attr = ["primary_key", "required", "unique", "nullable"]
columns = {
    str(c).split(".")[-1]: Column(
        c.type, **{at: getattr(c, at) for at in attr if hasattr(c, at)}
    )
    for c in [
        User.id,
        User.name
    ]
}

def get_materializeduserrole_orm_model(userrole: str) -> MaterializedUserRole:

    tablename = "materializeduserrole_%s" % userrole
    class_name = "MaterializedUserRole%s" % userrole

    # the class was already loaded and is therefore in sqlalchemy's registry
    for key, val in Base.registry._class_registry.items():
        if key == class_name:
            return val

    # otherwise use type to populate the class
    Model = type(
        class_name,
        (MaterializedUserRole,),
        {
            "__tablename__": tablename,
            "__table_args__": {"extend_existing": True},
            **columns,
        },
    )

    return Model

A side-effect of this now is that we no longer need to compile the MATERIALIZED VIEW. SQLAlchemy will find the table in the database and add it to its Base.registry (if its already in the database). In our case these MATERIALIZED VIEWs are created in another app and our main backend doesn't really need a way to "create" it.

Note: The lookup in the registry looks not as it is the best option - being a for loop. I am not sure how this can be done differently. My code got inspired by sqlalchemy_utils get_class_by_table function which also iterates through the registry. If someone knows how to do this better - drop me a line.