* Improve database type inference
Python's DBAPI isn't super clear and homogeneous on the
cursor.description specification, and this PR attempts to improve
inferring the datatypes returned in the cursor.
This work started around Presto's TIMESTAMP type being mishandled as
string as the database driver (pyhive) returns it as a string. The work
here fixes this bug and does a better job at inferring MySQL and Presto types.
It also creates a new method in db_engine_specs allowing for other
databases engines to implement and become more precise on type-inference
as needed.
* Fixing tests
* Adressing comments
* Using infer_objects
* Removing faulty line
* Addressing PrestoSpec redundant method comment
* Fix rebase issue
* Fix tests
* [sql lab] a better approach at limiting queries
Currently there are two mechanisms that we use to enforce the row
limiting constraints, depending on the database engine:
1. use dbapi's `cursor.fetchmany()`
2. wrap the SQL into a limiting subquery
Method 1 isn't great as it can result in the database server storing
larger than required result sets in memory expecting another fetch
command while we know we don't need that.
Method 2 has a positive side of working with all database engines,
whether they use LIMIT, ROWNUM, TOP or whatever else since sqlalchemy
does the work as specified for the dialect. On the downside though
the query optimizer might not be able to optimize this as much as an
approach that doesn't use a subquery.
Since most modern DBs use the LIMIT syntax, this adds a regex approach
to modify the query and force a LIMIT clause without using a subquery
for the database that support this syntax and uses method 2 for all
others.
* Fixing build
* Fix lint
* Added more tests
* Fix tests
* Force lowercase column names for Snowflake and Oracle
* Force lowercase column names for Snowflake and Oracle
* Remove lowercasing of DB2 columns
* Remove DB2 lowercasing
* Fix test cases
* use session context manager
* contextlib2 added to requirements.txt
* Fixing error: Import statements are in the wrong order. from contextlib2 import contextmanager should be before import sqlalchemy
* Fixing return inside generator
* fixed C812 missing trailing comma
* E501 line too long
* fixed E127 continuation line over-indented for visual indent
* E722 do not use bare except
* reorganized imports
* added context manager contextlib2.contextmanager
* fixed import ordering
fixes https://github.com/apache/incubator-superset/issues/4926
In rare cases where the query is stopped before it is started, SQL Lab
returns an unexpected string payload instead of a normal dictionary.
This aligns the process to handle the error in a homogeneous fashion.
* move access permissions methods to security manager
* consolidate all security methods into SupersetSecurityManager
* update security method calls
* update calls from tests
* move get_or_create_main_db to utils
* raise if supersetsecuritymanager is not extended
* rename sm to security_manager
* conditional check on datatype of results before converting to df
fix type checking
fix conditional checks
remove trailing whitespace and fix df_data fallback def
actually remove trailing whitespace
generalized type check to check all columns for dict
refactor dict col check
* move df conversion to helper and add unit test
add missing newlines
another missing newline
fix quotes
more quote fixes
* Fix USA's state geojson for 'Country Map' visualization
Turns out the ISO codes were missing from the geojson file, this adds it
and uses human-readable indents.
* using proper ISO codes
* Linting
New linting rules started applying, I'm guessing a new version of
pylint?
* Add "Impersonate user" setting to Datasource
* Add tests
* Use g.user.username for all the sync cases
* use uri.username instead of uri.user
* Small refactoring
* [sqllab] improve Hive support
* Fix "Transport not open" bug
* Getting progress bar to show
* Bump pyhive to 0.4.0
* Getting [Track Job] button to show
* Fix testzz
* Escaping the user's SQL in the explore view
When executing SQL from SQL Lab, we use a lower level API to the
database which doesn't require escaping the SQL. When going through
the explore view, the stack chain leading to the same method may need
escaping depending on how the DBAPI driver is written, and that is the
case for Presto (and perhaps other drivers).
* Using regex to avoid doubling doubles
* upgrade celery to 4.0.2
* using Redis for unit tests (sqla broker not supported in Celery 4)
* Setting Celery's soft_time_limit based on `SQLLAB_ASYNC_TIME_LIMIT_SEC` config
* Better error handling in async tasks
* Better statsd logging in async tasks
* show [pending/running] query status in Results tab
* systematically using sqla NullPool on worker (async) to limit number
of database connections
* sql_lab.py: compress via utils
* utils.py: added zlib_compress and zlib_compress_to_string
* core.py: converted to use zlib_decompress_to_string; renamed uncompress to decompress in utils.py
* utils_tests.py: added test for compress/decompress
* fixed broken utils test; removed redundant code and empty lines from utils.py
* utils.py: corrected docstrings, removed unnecessary 'else'
* removed yet another superfluous else