[sql lab] a better approach at limiting queries (#4947)

* [sql lab] a better approach at limiting queries

Currently there are two mechanisms we use to enforce the row-limiting
constraint, depending on the database engine:
1. use the DB-API's `cursor.fetchmany()`
2. wrap the SQL in a limiting subquery

Method 1 isn't great, as it can cause the database server to keep a
larger-than-required result set in memory while it waits for another
fetch command that we know we will never send.
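For illustration, method 1 can be sketched with Python's built-in `sqlite3` module, which implements the DB-API `fetchmany()` call. The helper `fetch_limited` and the sample table are hypothetical. The point is that the SQL sent to the engine carries no limit at all; the client simply stops reading after `limit` rows.

```python
import sqlite3


def fetch_limited(conn: sqlite3.Connection, sql: str, limit: int) -> list:
    """Method 1 (hypothetical helper): run the SQL unchanged, read only `limit` rows."""
    cursor = conn.cursor()
    cursor.execute(sql)  # the statement sent to the engine has no row limit
    rows = cursor.fetchmany(limit)  # DB-API: fetch at most `limit` rows
    # Remaining rows are never read. On client/server engines the server may
    # already have produced and buffered them, expecting a later fetch.
    cursor.close()
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
print(fetch_limited(conn, "SELECT x FROM t ORDER BY x", 5))  # [(0,), (1,), (2,), (3,), (4,)]
```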

Method 2 has the upside of working with all database engines, whether
they use LIMIT, ROWNUM, TOP or something else, since SQLAlchemy renders
the limit in the syntax of each dialect. The downside is that the query
optimizer may not be able to optimize a subquery as well as an
equivalent query that doesn't use one.
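A simplified sketch of method 2 follows. The helper `wrap_in_limit_subquery` is hypothetical (it is not Superset's `database.wrap_sql_limit`), and it hard-codes the LIMIT keyword for brevity, whereas the approach described above has SQLAlchemy render the outer query so each dialect gets its own limiting syntax.

```python
import sqlite3


def wrap_in_limit_subquery(sql: str, limit: int) -> str:
    """Method 2 (simplified, hypothetical): wrap arbitrary SQL in a limiting subquery.

    Assumption: LIMIT is hard-coded here. A dialect-aware version would let
    SQLAlchemy emit LIMIT, TOP, ROWNUM, etc. as appropriate.
    """
    # A trailing ';' is not valid inside a subquery, so strip it with whitespace.
    inner = sql.strip(" \t\r\n;")
    return f"SELECT * FROM ({inner}) AS inner_qry LIMIT {int(limit)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
wrapped = wrap_in_limit_subquery("SELECT x FROM t WHERE x % 2 = 0;", 3)
print(wrapped)  # SELECT * FROM (SELECT x FROM t WHERE x % 2 = 0) AS inner_qry LIMIT 3
print(len(conn.execute(wrapped).fetchall()))  # 3
```

Because the user's statement is nested untouched inside the outer SELECT, this works for any query shape, at the cost of the extra subquery level the optimizer has to see through.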

Since most modern databases support the LIMIT syntax, this change adds
a regex-based approach that modifies the query to force a LIMIT clause
without a subquery for databases that support this syntax, and falls
back to method 2 for all others.
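The regex approach can be sketched roughly as below, assuming an existing limit only needs to be detected when it is a plain trailing `LIMIT n`. `force_limit` and `_TRAILING_LIMIT` are hypothetical names, not Superset's code, and keeping the user's own limit when it is smaller is a design choice of this sketch; the description above only says that a LIMIT clause is forced.

```python
import re

# Hypothetical pattern: a trailing "LIMIT <n>" at the very end of the
# statement, matched case-insensitively.
_TRAILING_LIMIT = re.compile(r"\s+LIMIT\s+(\d+)$", re.IGNORECASE)


def force_limit(sql: str, limit: int) -> str:
    """Force a LIMIT clause onto `sql` in place, without wrapping it in a subquery.

    Only a plain trailing `LIMIT n` is recognized. Forms such as
    `LIMIT n OFFSET m` are not, and would end up with a second (invalid)
    LIMIT appended, so they would need separate handling.
    """
    # Drop surrounding whitespace and trailing semicolons so the pattern can anchor on $.
    sql = sql.strip(" \t\r\n;")
    match = _TRAILING_LIMIT.search(sql)
    if match:
        # Design choice of this sketch: never loosen a stricter user-supplied limit.
        limit = min(limit, int(match.group(1)))
        sql = sql[: match.start()]
    return f"{sql} LIMIT {int(limit)}"


print(force_limit("SELECT * FROM t LIMIT 10;", 1000))  # SELECT * FROM t LIMIT 10
print(force_limit("select a from t", 100))  # select a from t LIMIT 100
print(force_limit("SELECT * FROM t limit 5000 ;", 1000))  # SELECT * FROM t LIMIT 1000
```

Unlike the subquery wrapper, the rewritten statement stays flat: the optimizer sees the user's query as written plus a single LIMIT clause.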

* Fixing build

* Fix lint

* Added more tests

* Fix tests
Maxime Beauchemin
2018-05-14 14:44:05 -05:00
committed by GitHub
parent 7a4a89b195
commit b839608c32
6 changed files with 145 additions and 71 deletions


@@ -17,7 +17,6 @@ from sqlalchemy.orm import sessionmaker
 from sqlalchemy.pool import NullPool
 from superset import app, dataframe, db, results_backend, security_manager, utils
-from superset.db_engine_specs import LimitMethod
 from superset.models.sql_lab import Query
 from superset.sql_parse import SupersetQuery
 from superset.utils import get_celery_app, QueryStatus
@@ -186,9 +185,8 @@ def execute_sql(
                 query.user_id, start_dttm.strftime('%Y_%m_%d_%H_%M_%S'))
         executed_sql = superset_query.as_create_table(query.tmp_table_name)
         query.select_as_cta_used = True
-    elif (query.limit and superset_query.is_select() and
-            db_engine_spec.limit_method == LimitMethod.WRAP_SQL):
-        executed_sql = database.wrap_sql_limit(executed_sql, query.limit)
+    elif (query.limit and superset_query.is_select()):
+        executed_sql = database.apply_limit_to_sql(executed_sql, query.limit)
         query.limit_used = True
     # Hook to allow environment-specific mutation (usually comments) to the SQL