Commit Graph

101 Commits

Author SHA1 Message Date
Rob DiCiuccio
6537d5ed8c Replace pandas.DataFrame with PyArrow.Table for nullable int typing (#8733)
* Use PyArrow Table for query result serialization

* Cleanup dev comments

* Additional cleanup

* WIP: tests

* Remove explicit dtype logic from db_engine_specs

* Remove obsolete  column property

* SupersetTable column types

* Port SupersetDataFrame methods to SupersetTable

* Add test for nullable boolean columns

* Support datetime values with timezone offsets

* Black formatting

* Pylint

* More linting/formatting

* Resolve issue with timezones not appearing in results

* Types

* Enable running of tests in tests/db_engine_specs

* Resolve application context errors

* Refactor and add tests for pyodbc.Row conversion

* Appease isort, regardless of isort:skip

* Re-enable RESULTS_BACKEND_USE_MSGPACK default based on benchmarks

* Dataframe typing and nits

* Renames to reduce ambiguity
2020-01-03 11:55:39 -05:00
Kim Truong
2d42272e60 Add user agent logs (#8826)
* feat: add user agent logging

* fix: lint

* fix: address feedback

* fix: formatting
2019-12-16 14:28:00 -08:00
Will Barrett
562aeab1aa Fix a bunch of files with pylint disabled (#8743)
* Re-enable pylint for superset/jinja_context.py

* Re-enable pylint for superset/sql_lab.py

* Re-enable pylint for superset/sql_parse.py

* Re-enable pylint for superset/exceptions.py

* Re-enable lint for superset/translations/utils.py

* Re-enable pylint for superset/views/schedules.py

* Re-enable pylint for superset/views/base.py

* Re-enable pylint for superset/views/log/views.py

* Re-enable pylint for superset/views/annotations.py

* black

* PR feedback, pylint, isort fixes

* Black, one more time...

* Move ungrouped-imports to a global disable
2019-12-11 10:14:24 -08:00
Craig Rueda
df2ee5cbcb Adding app context wrapper to Celery tasks (#8653)
* Adding app context wrapper to Celery tasks
2019-11-27 15:06:06 +00:00
Craig Rueda
e490414484 Flask App factory PR #1 (#8418)
* First cut at app factory

* Setting things back to master

* Working with new FLASK_APP

* Still need to refactor Celery

* CLI mostly working

* Working on unit tests

* Moving cli stuff around a bit

* Removing get in config

* Defaulting test config

* Adding flask-testing

* flask-testing casing

* resultsbackend property bug

* Fixing up cli

* Quick fix for KV api

* Working on save slice

* Fixed core_tests

* Fixed utils_tests

* Most tests working - still need to dig into remaining app_context issue in tests

* All tests passing locally - need to update code comments

* Fixing dashboard tests again

* Blacking

* Sorting imports

* linting

* removing envvar mangling

* blacking

* Fixing unit tests

* isorting

* licensing

* fixing mysql tests

* fixing cypress?

* fixing .flaskenv

* fixing test app_ctx

* fixing cypress

* moving manifest processor around

* moving results backend manager around

* Cleaning up __init__ a bit more

* Addressing PR comments

* Addressing PR comments

* Blacking

* Fixes for running celery worker

* Tuning isort

* Blacking
2019-11-20 15:47:06 +00:00
Beto Dealmeida
d66bc5ad90 SIP-23: Persist SQL Lab state in the backend (#8060)
* Squash all commits from VIZ-689

* Fix javascript

* Fix black

* WIP fixing javascript

* Add feature flag SQLLAB_BACKEND_PERSISTENCE

* Use feature flag

* Small fix

* Fix lint

* Fix setQueryEditorSql

* Improve unit tests

* Add unit tests for backend sync

* Rename results to description in table_schema

* Add integration tests

* Fix black

* Migrate query history

* Handle no results backend

* Small improvement

* Address comments

* Store SQL directly instead of reference to query

* Small fixes

* Fix clone tab

* Fix remove query

* Cascade delete

* Cascade deletes

* Fix tab closing

* Small fixes

* Small fix

* Fix error when deleting tab

* Catch 404 when tab is deleted

* Remove tables from state on tab close

* Add index, autoincrement and cascade

* Prevent duplicate table schemas

* Fix mapStateToProps

* Fix lint

* Fix head

* Fix javascript

* Fix mypy

* Fix isort

* Fix javascript

* Fix merge

* Fix heads

* Fix heads

* Fix displayLimit

* Recreate migration script trying to fix heads

* Fix heads
2019-11-14 09:44:57 -08:00
Will Barrett
e4ca44e95f Use config[] not config.get() (#8454)
* Typo fix in CONTRIBUTING.md

* Alter references to config.get('FOO') to use preferred config['FOO']

* Set missing configuration constants in superset/config.py

* Misc. CI fixes

* Add type annotation for FEATURE_FLATGS
2019-10-30 16:19:16 -07:00
Kim Truong
3cba1bd990 feat: add expand_data parameter (#8472)
* feat: add expand_data parameter

* fix: reformat files
2019-10-30 11:45:35 -07:00
John Bodley
9fc37ea9f1 [ci] Deprecate flake8 (#8409)
* [ci] Deprecate flake8

* Addressing @villebro's comments
2019-10-18 14:44:27 -07:00
Erik Ritter
11935ce118 Add commit to attempt to resolve query table lock (#8262) 2019-09-23 14:37:50 -07:00
serenajiang
00257b9433 [sqllab] add retries for stop_query (#8139)
* [sqllab] add retries for stop_query

* use backoff for retries

* address PR comments

* import statement order
2019-09-03 16:37:42 -07:00
Rob DiCiuccio
7595d9e5fd [SQL Lab] Async query results serialization with MessagePack and PyArrow (#8069)
* Add support for msgpack results_backend serialization

* Serialize DataFrame with PyArrow rather than JSON

* Adjust dependencies, de-lint

* Add tests for (de)serialization methods

* Add MessagePack config info to Installation docs

* Enable msgpack/arrow serialization by default

* [Fix] Prevent msgpack serialization on synchronous queries

* Add type annotations
2019-08-27 14:23:40 -07:00
serenajiang
624449816f [logging] add query id to SQL Lab logs (#8104)
* [logging] add query id to logs

* add query_id to hive and presto logging
2019-08-26 10:35:18 -07:00
serenajiang
e6956f84b4 [fix] checks for stopped queries (#8097) 2019-08-22 22:23:44 -07:00
Erik Ritter
fc6a53eda9 [SQL Lab] Add hard time limit fallback for async queries (#7783) 2019-07-01 10:02:37 -07:00
John Bodley
5c58fd1802 [format] Using Black (#7769) 2019-06-25 13:34:48 -07:00
Maxime Beauchemin
4b5931f637 Alternative fix for #7559 (#7575)
* Alternative fix for #7559

Just an idea...

* logging
2019-06-01 09:21:35 -07:00
Kim Truong
d2967340d9 View Presto row and array objects clearly in the data grid (#7625)
* feat: rough check in for Presto rows and arrays

* fix: presto arrays

* fix: return selected and expanded columns

* fix: add helper methods and unit tests

* fix: only allow exploration of selected columns

* fix: address Beto's comments and add more unit tests
2019-05-31 11:25:07 -07:00
michellethomas
52473c5d34 Fix race condition when fetching results in SQL Lab (#7198) (#7242)
* Fix race condition when fetching results in SQL Lab

* Fix lint

(cherry picked from commit ca6a73b028)
2019-04-08 15:08:32 -07:00
Maxime Beauchemin
1dd4d7a587 Apply ASF licenses throughout the code base (#5800)
* Add license headers

* reabased

* lint

* Removing licenses from vendors folder
2019-01-15 15:53:27 -08:00
timifasubaa
9d70c348d3 pass source to db api mutator (#6497) 2019-01-10 17:30:32 -08:00
Maxime Beauchemin
d427db0a8b [SQL Lab] Allow running multiple statements (#6112)
* Allow running multiple statements from SQL Lab

* fix tests

* More tests

* merge heads

* fix heads
2018-12-22 10:28:22 -08:00
Beto Dealmeida
672c470e79 Pass security manager to QUERY_LOGGER (#6548)
* Pass security manager to log_query

* Fix lint
2018-12-18 10:55:58 -08:00
Mahendra M
808622414c [SIP-3] Scheduled email reports for Slices / Dashboards (#5294)
* [scheduled reports] Add support for scheduled reports

* Scheduled email reports for slice and dashboard visualization
  (attachment or inline)
* Scheduled email reports for slice data (CSV attachment on inline table)
* Each schedule has a list of recipients (all of them can receive a single mail,
  or separate mails)
* All outgoing mails can have a mandatory bcc - for audit purposes.
* Each dashboard/slice can have multiple schedules.

In addition, this PR also makes a few minor improvements to the celery
infrastructure.
* Create a common celery app
* Added more celery annotations for the tasks
* Introduced celery beat
* Update docs about concurrency / pools

* [scheduled reports] - Debug mode for scheduled emails

* [scheduled reports] - Ability to send test mails

* [scheduled reports] - Test email functionality - minor improvements

* [scheduled reports] - Rebase with master. Minor fixes

* [scheduled reports] - Add warning messages

* [scheduled reports] - flake8

* [scheduled reports] - fix rebase

* [scheduled reports] - fix rebase

* [scheduled reports] - fix flake8

* [scheduled reports] Rebase in prep for merge

* Fixed alembic tree after rebase
* Updated requirements to latest version of packages (and tested)
* Removed py2 stuff

* [scheduled reports] - fix flake8

* [scheduled reports] - address review comments

* [scheduled reports] - rebase with master
2018-12-10 22:29:29 -08:00
Beto Dealmeida
5168c69a25 Hook for auditing queries (#6484)
* Hook for auditing queries

* Address comments

* Use __name__
2018-12-07 12:19:19 -08:00
Jeffrey Wang
0584e3629f Add separate limit setting for SqlLab (#4941)
* Add separate limit setting for SqlLab

Use separate param for wrap sql

Get query limit from config

unit tests for limit control rendering in sql editor

py unit test

pg tests

Add max rows limit

Remove concept of infinity, always require defined limits

consistency

Assert on validation errors instead of tooltip

fix unit tests

attempt persist state

pr comments and linting

* load configs in via common param

* default to 1k
2018-11-07 15:57:44 -08:00
Maxime Beauchemin
bbfd69a138 [utils.py] gathering/refactoring into a "utils/" folder (#6095)
* [utils] gathering/refactoring into a "utils/" folder

Moving current utils.py into utils/core.py and moving other *util*
modules under this new "utils/" as well.

Following steps include eroding at "utils/core.py" and breaking it down
into smaller modules.

* Improve tests

* Make loading examples in scope for tests

* Remove test class attrs examples_loaded and requires_examples
2018-10-16 17:59:34 -07:00
timifasubaa
46c86672c8 remove utf8 declaration (#6096) 2018-10-15 11:53:24 -07:00
timifasubaa
dd9eeda03e remove future (#6065) 2018-10-13 09:39:04 -07:00
timifasubaa
e12d00ae71 log query fetch time (#6033) 2018-10-04 11:13:32 -07:00
Junda Yang
fdd44ace27 remove duplicated utils (#5851) 2018-09-13 15:24:08 -07:00
timifasubaa
c82cea3c8d fix sqllab logging (#5862) 2018-09-11 17:23:44 -07:00
timifasubaa
9a4bba485a [sqllab]More granular sqllab logging (#5736)
* quote hive column names (#5368)

* create db migration

* use stats_logger timing

* trigger build
2018-09-11 08:52:37 -07:00
Christine Chambers
ae3fb04036 Bug: fixing async syntax for python 3.7 (#5759)
* Bug: fixing async syntax for python 3.7

Rename async to async_ so superset installs for python 3.7.

* Addressing PR comments. Use kwargs instead of explicitly specifying async_ so downstream engines (e.g. PyHive) that supports async can choose to use the async_ in pythonwq3.7 and async in <=python3.6

* addressing additional pr comments
2018-08-28 17:40:45 -07:00
Maxime Beauchemin
be04c98cd3 [sql lab] always use NullPool (#5612)
I think that the only place where we want db connection pooling would be
to talk to the metadata database. SQL Lab should close its connections
and never pool them.
Given that each Gunicorn worker will create its own pool that can lead
to way too many connections opened.

closes https://github.com/apache/incubator-superset/issues/4666
2018-08-14 16:27:13 -07:00
Maxime Beauchemin
9331cf79b5 [sql lab] allow EXPlAIN queries (#5558)
* [sql lab] allow EXPlAIN queries

closes https://github.com/andialbrecht/sqlparse/issues/421

* typo
2018-08-03 15:33:33 -07:00
Maxime Beauchemin
6f87552bc2 [sqllab] fix unexpected keyword argument 'ignore_nan' (#5490) 2018-07-26 15:20:48 -07:00
Maxime Beauchemin
e22fcb9abf [sql lab] fix Hive 'Transport' not open issue (#5494) 2018-07-26 15:18:49 -07:00
John Bodley
7fcc2af68f [sql] Correct SQL parameter formatting (#5178) 2018-07-21 12:01:26 -07:00
timifasubaa
41447e8b3b remove limiting at the display level (#5413) 2018-07-19 17:36:03 -07:00
Maxime Beauchemin
777d876a52 Improve database type inference (#4724)
* Improve database type inference

Python's DBAPI isn't super clear and homogeneous on the
cursor.description specification, and this PR attempts to improve
inferring the datatypes returned in the cursor.

This work started around Presto's TIMESTAMP type being mishandled as
string as the database driver (pyhive) returns it as a string. The work
here fixes this bug and does a better job at inferring MySQL and Presto types.
It also creates a new method in db_engine_specs allowing for other
databases engines to implement and become more precise on type-inference
as needed.

* Fixing tests

* Adressing comments

* Using infer_objects

* Removing faulty line

* Addressing PrestoSpec redundant method comment

* Fix rebase issue

* Fix tests
2018-06-27 21:35:12 -07:00
timifasubaa
93cdf60920 [sqllab] Fix sql lab resolution link (#5216)
* pass link

* update frontend to show link differently

* nits
2018-06-19 20:33:24 -07:00
Timi Fasubaa
a9d7fafd9f add tests 2018-05-30 12:50:27 -07:00
Timi Fasubaa
d38315a307 reuse_regex_logic 2018-05-25 15:07:27 -07:00
Timi Fasubaa
1aced9b562 force limit only when there is no existing limit 2018-05-25 14:54:11 -07:00
Maxime Beauchemin
b839608c32 [sql lab] a better approach at limiting queries (#4947)
* [sql lab] a better approach at limiting queries

Currently there are two mechanisms that we use to enforce the row
limiting constraints, depending on the database engine:
1. use dbapi's `cursor.fetchmany()`
2. wrap the SQL into a limiting subquery

Method 1 isn't great as it can result in the database server storing
larger than required result sets in memory expecting another fetch
command while we know we don't need that.

Method 2 has a positive side of working with all database engines,
whether they use LIMIT, ROWNUM, TOP or whatever else since sqlalchemy
does the work as specified for the dialect. On the downside though
the query optimizer might not be able to optimize this as much as an
approach that doesn't use a subquery.

Since most modern DBs use the LIMIT syntax, this adds a regex approach
to modify the query and force a LIMIT clause without using a subquery
for the database that support this syntax and uses method 2 for all
others.

* Fixing build

* Fix lint

* Added more tests

* Fix tests
2018-05-14 14:44:05 -05:00
Ville Brofeldt
b391676544 Force lowercase column names for Snowflake and Oracle (#4994)
* Force lowercase column names for Snowflake and Oracle

* Force lowercase column names for Snowflake and Oracle

* Remove lowercasing of DB2 columns

* Remove DB2 lowercasing

* Fix test cases
2018-05-14 13:43:13 -05:00
grafke
8591319bde [sql lab] Use context manager for sqllab sessions (#4927)
* use session context manager

* contextlib2 added to requirements.txt

* Fixing error: Import statements are in the wrong order. from contextlib2 import contextmanager should be before import sqlalchemy

* Fixing return inside generator

* fixed C812 missing trailing comma

* E501 line too long

* fixed E127 continuation line over-indented for visual indent

* E722 do not use bare except

* reorganized imports

* added context manager contextlib2.contextmanager

* fixed import ordering
2018-05-10 10:32:31 -07:00
Maxime Beauchemin
415d1c092b [sql lab] handle query stop race condition (#4928)
fixes https://github.com/apache/incubator-superset/issues/4926

In rare cases where the query is stopped before it is started, SQL Lab
returns an unexpected string payload instead of a normal dictionary.

This aligns the process to handle the error in a homogeneous fashion.
2018-05-07 13:49:42 -07:00
John Bodley
d533ce0967 [pylint] prepping for enabling pylint for non-errors (#4884) 2018-04-28 20:08:09 -07:00