Commit Graph

125 Commits

Author SHA1 Message Date
Sumedh Sakdeo
80e777823b Field names in big query can contain only alphanumeric and underscore (#5641)
* Field names in big query can contain only alphanumeric and underscore

* bad quote

* better place for mutating labels

* lint

* bug fix thanks to mistercrunch

* lint

* lint again
2018-08-21 13:45:42 -07:00
Sumedh Sakdeo
0fbda33c68 Handling bigquery dialect when previewing data (#5655)
* Handling bigquery dialect when previewing data

* review comments

* lint
2018-08-20 22:04:22 -07:00
Sumedh Sakdeo
5966a674e5 Explore View Perf Fix (#5637) 2018-08-15 12:27:08 -07:00
Sumedh Sakdeo
c9bd5a6167 Fetch a batch of rows from bigquery (#5632)
* Fetch a batch of rows from bigquery

* unused const

* review comments
2018-08-14 21:44:04 -07:00
Ville Brofeldt
e1f4db8e24 Match viz dataframe column case to form_data fields for Snowflake, Oracle and Redshift (#5487)
* Add function to fix dataframe column case

* Fix broken handle_nulls method

* Add case sensitivity option to dedup

* Refactor function definition and call location

* Remove added blank line

* Move df column rename logic to db_engine_spec

* Remove redundant variable

* Update comments in db_engine_specs

* Tie df adjustment to db_engine_spec class attribute

* Fix dedup error

* Linting

* Check for db_engine_spec attribute prior to adjustment

* Rename case sensitivity flag

* Linting

* Remove function that was moved to db_engine_specs

* Get metrics names from utils

* Remove double import and rename dedup variable
2018-08-03 09:53:56 -07:00
Maxime Beauchemin
fe6846b8db [sql lab] simplify the visualize flow (#5523)
* [sql lab] simplify the visualize flow

The "visualize flow" linking SQL Lab to the "explore view" has never
worked well for people. Here's a list of issues:

* it's not really clear to users that their query is wrapped as a
subquery, and the explore view runs queries on top of it

* lint + fix tests

* Addressing comments
2018-08-02 10:52:38 -07:00
Ville Brofeldt
c1e6c68a3e Add time grain blacklist and addons to config.py (#5380)
* Add interim grains

* Refactor and add blacklist

* Change PT30M to PT0.5H

* Linting

* Linting

* Add time grain addons to config.py and refactor engine spec logic

* Remove redundant import and clean up config.py

* Fix bad rebase

* Implement changes proposed by @betodealmeida

* Revert removal of name from Grain

* Linting
2018-07-30 23:44:30 -07:00
Maxime Beauchemin
cd55998d63 Improve hive/pyhive error message regex (#5502) 2018-07-27 08:31:37 -07:00
Maxime Beauchemin
41286b7545 [sql lab] extract Hive error messages (#5495)
* [sql lab] extract Hive error messages

So pyhive returns an exception object with a stringified thrift error
object. This PR uses a regex to extract the errorMessage portion of that
string.
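
A minimal sketch of the idea, not the code from this PR. The pattern, the
helper name `extract_error_message` and the sample string are assumptions made
up for illustration; the string pyhive actually produces may be shaped
differently.

```python
import re

# Hypothetical pattern: capture the quoted value that follows errorMessage=
ERROR_MESSAGE_RE = re.compile(r'errorMessage="(.*?)(?<!\\)"', re.DOTALL)


def extract_error_message(exc):
    """Return the errorMessage portion of a stringified error, else the full string."""
    text = str(exc)
    match = ERROR_MESSAGE_RE.search(text)
    return match.group(1) if match else text


# Made-up example of a stringified Thrift-style error object
sample = Exception(
    'TExecuteStatementResp(status=TStatus(statusCode=3, '
    'errorMessage="Error while compiling statement: FAILED: '
    'ParseException line 1:5", sqlState="42000"))'
)
print(extract_error_message(sample))
```

Falling back to the full string keeps errors that do not match the pattern
visible instead of swallowing them.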

* Unit test
2018-07-26 15:17:55 -07:00
Ville Brofeldt
a165aec822 Fix broken dedup and remove redundant db_spec logic (#5467)
* Fix broken dedup and remove redundant db_spec logic

* Add test case
2018-07-23 10:41:38 -07:00
John Bodley
7fcc2af68f [sql] Correct SQL parameter formatting (#5178) 2018-07-21 12:01:26 -07:00
George
0d5443e392 Add week granularity for Clickhouse (#5455) 2018-07-21 09:53:21 -07:00
timifasubaa
f8a6e09220 [sqllab] Fix sqllab limit regex issue with sqlparse (#5295)
* include items after limit to the modified query

* use sqlparse
2018-07-16 15:27:30 -07:00
timifasubaa
22b7c2db62 quote hive column names (#5368) 2018-07-13 15:51:16 -07:00
timifasubaa
28ba5a9ddb use schema form field in upload csv (#5303) 2018-07-06 09:46:53 -07:00
aaronbannin
252cba20de impala support for epoch timestamps (#5349) 2018-07-04 19:25:58 -04:00
EvelynTurner
ad9103f5ba [Bug fix] Divide by 1000.000 in epoch_ms_to_dttm() to not lose precision in Presto (#5211)
* Fix how the annotation layer interprets the timestamp string without timezone info; use it as UTC

* [Bug fix] Fixed/Refactored annotation layer code so that non-timeseries annotations are applied based on the updated chart object after adding all data

* Fixed indentation

* Fix the key string value in case series.key is a string

* [Bug fix] Divide by 1000.000 in epoch_ms_to_dttm() to not lose precision in Presto
2018-07-04 19:19:57 -04:00
Minh Mai
059b64dad7 normalize column names for Redshift (#5337) 2018-07-04 17:30:37 -04:00
Maxime Beauchemin
777d876a52 Improve database type inference (#4724)
* Improve database type inference

Python's DBAPI isn't super clear and homogeneous on the
cursor.description specification, and this PR attempts to improve
inferring the datatypes returned in the cursor.

This work started around Presto's TIMESTAMP type being mishandled as
string as the database driver (pyhive) returns it as a string. The work
here fixes this bug and does a better job at inferring MySQL and Presto types.
It also creates a new method in db_engine_specs allowing other
database engines to implement it and become more precise on type inference
as needed.
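
As a rough illustration of the problem, and not the method added by this PR,
the sketch below reads `cursor.description` from Python's built-in sqlite3
driver, which leaves `type_code` as None, and falls back to inspecting a
sample value. The helper `infer_type` and its type names are hypothetical.

```python
import sqlite3
from datetime import datetime


def infer_type(type_code, sample_value):
    """Hypothetical helper: trust the driver's type_code, else inspect a value."""
    if type_code is not None:
        return str(type_code)
    if isinstance(sample_value, bool):
        return "BOOLEAN"
    if isinstance(sample_value, int):
        return "INTEGER"
    if isinstance(sample_value, float):
        return "FLOAT"
    if isinstance(sample_value, str):
        try:
            # A timestamp returned as a string, as described above for pyhive
            datetime.strptime(sample_value, "%Y-%m-%d %H:%M:%S")
            return "TIMESTAMP"
        except ValueError:
            return "STRING"
    return "UNKNOWN"


conn = sqlite3.connect(":memory:")
cursor = conn.execute(
    "SELECT 1 AS n, 1.5 AS x, '2018-06-27 21:35:12' AS ts, 'abc' AS s"
)
row = cursor.fetchone()
# cursor.description holds one 7-item tuple per column: (name, type_code, ...)
types = {
    desc[0]: infer_type(desc[1], value)
    for desc, value in zip(cursor.description, row)
}
print(types)
```

A per-engine hook could replace the `type_code` branch with a lookup that is
specific to each driver, which is the kind of extension point described above.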

* Fixing tests

* Addressing comments

* Using infer_objects

* Removing faulty line

* Addressing PrestoSpec redundant method comment

* Fix rebase issue

* Fix tests
2018-06-27 21:35:12 -07:00
timifasubaa
b0eee129e9 add more precise types to hive table from csv (#5267) 2018-06-25 16:12:01 -07:00
timifasubaa
bd24f854c9 specify hive namespace for tables (#5268) 2018-06-25 12:04:27 -07:00
timifasubaa
0e5293b9be Update db_engine_specs.py (#5264) 2018-06-21 16:01:34 -07:00
Maxime Beauchemin
c89933d870 [sql lab] quote schema and table name (#5195)
fixes https://github.com/apache/incubator-superset/issues/4595
2018-06-18 08:42:08 -07:00
Xiao Hanyu
b71f551493 Optimize presto SQL Lab query performance. (#5132)
By stopping polling once the Presto query has already finished.

When a user queries Presto via SQL Lab, Presto runs the query and can
then return all the data to Superset in one shot.

However, Superset's default implementation polls Presto in order to:

- Get the fancy progress bar
- Get the data back when the query finishes.

However, Superset's polling implementation is flawed.

I profiled a query against a table of about 1 billion rows. Here is some data:

- Total number of rows: 1.02 Billion
- SQL Lab query limit: 1 million
- Output Data: 1.5 GB
- Superset memory consumed: about 10-20 GB
- Time: 7 minutes to finish in Presto, plus an additional 15 minutes for
  Superset to fetch and store the data.

The problem with the default implementation is that even after Presto has
finished the query (7 minutes in the profile above), Superset still does a
lot of wasted polling. In the profile above, Superset sent about 540 polling
requests in total, and at least half of them were unnecessary.

Part of the simplified polling response:

```
{
  "infoUri": "http://10.65.204.39:8000/query.html?20180525_042715_03742_nza9u",
  "id": "20180525_042715_03742_nza9u",
  "nextUri": "http://10.65.204.39:8000/v1/statement/20180525_042715_03742_nza9u/11",
  "stats": {
    "state": "FINISHED",
    "queuedSplits": 21701,
    "progressPercentage": 35.98264191882267,
    "elapsedTimeMillis": 1029,
    "nodes": 116,
    "completedSplits": 15257,
    "scheduled": true,
    "wallTimeMillis": 2571904,
    "peakMemoryBytes": 0,
    "processedBytes": 40825519532,
    "processedRows": 47734066,
    "queuedTimeMillis": 0,
    "queued": false,
    "cpuTimeMillis": 849228,
    "rootStage": {
      "state": "FINISHED",
      "queuedSplits": 0,
      "nodes": 1,
      "totalSplits": 17,
      "processedBytes": 16829644,
      "processedRows": 11495,
      "completedSplits": 17,
      "stageId": "0",
      "done": true,
      "cpuTimeMillis": 69,
      "subStages": [
        {
          "state": "CANCELED",
          "queuedSplits": 21701,
          "nodes": 116,
          "totalSplits": 42384,
          "processedBytes": 40825519532,
          "processedRows": 47734066,
          "completedSplits": 15240,
          "stageId": "1",
          "done": true,
          "cpuTimeMillis": 849159,
          "subStages": [],
          "wallTimeMillis": 2570374,
          "userTimeMillis": 730020,
          "runningSplits": 5443
        }
      ],
      "wallTimeMillis": 1530,
      "userTimeMillis": 50,
      "runningSplits": 0
    },
    "totalSplits": 42401,
    "userTimeMillis": 730070,
    "runningSplits": 5443
  }
}
```

Superset stops polling only when it finds that `nextUri` is no longer
present, but `["stats"]["state"] == "FINISHED"` already means that Presto
has finished the query, so Superset can stop polling and fetch the data
at that point.

After this simple optimization, we get a 2-5x performance boost for
Presto SQL Lab queries.
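
The stop condition can be sketched as follows. This is an illustration under
stated assumptions, not the code changed by this PR: `fetch_until_finished`
and the `poll` callable are hypothetical, and the responses are made-up dicts
that only mimic the `nextUri` and `stats.state` fields shown above.

```python
def fetch_until_finished(poll):
    """Poll until the state is FINISHED or no nextUri remains.

    `poll` is a hypothetical callable returning one response dict per call.
    Returns the number of polls that were needed.
    """
    polls = 0
    while True:
        response = poll()
        polls += 1
        finished = response.get("stats", {}).get("state") == "FINISHED"
        if finished or response.get("nextUri") is None:
            return polls


# Made-up responses: the query is FINISHED on the 2nd poll,
# but nextUri only disappears on the 4th.
responses = iter([
    {"nextUri": "/v1/statement/q/1", "stats": {"state": "RUNNING"}},
    {"nextUri": "/v1/statement/q/2", "stats": {"state": "FINISHED"}},
    {"nextUri": "/v1/statement/q/3", "stats": {"state": "FINISHED"}},
    {"stats": {"state": "FINISHED"}},
])
print(fetch_until_finished(lambda: next(responses)))
```

With a check on `nextUri` alone, the same sequence would need all four polls.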
2018-06-05 08:56:18 -07:00
timifasubaa
cefc206a36 Merge pull request #5023 from timifasubaa/fix_sqllab_commit
[sqllab] force limit queries only when there is no existing limit
2018-05-31 11:12:46 -07:00
Timi Fasubaa
a9d7fafd9f add tests 2018-05-30 12:50:27 -07:00
Beto Dealmeida
6c3e469154 Add more time grains (#5083)
* Add more time grains

* Use FLOOR

* Fix quotes for lint
2018-05-29 12:43:48 -07:00
Maciej Bryński
ae50845843 Proper error handling in Hive Queries (#4428)
* Proper error handling in Hive Queries

* Change quotes

* Trigger checks

* Adding call to parent class

* Small fix

* Fix in method call
2018-05-29 12:42:45 -07:00
Timi Fasubaa
d38315a307 reuse_regex_logic 2018-05-25 15:07:27 -07:00
Maxime Beauchemin
b839608c32 [sql lab] a better approach at limiting queries (#4947)
* [sql lab] a better approach at limiting queries

Currently there are two mechanisms that we use to enforce the row
limiting constraints, depending on the database engine:
1. use dbapi's `cursor.fetchmany()`
2. wrap the SQL into a limiting subquery

Method 1 isn't great as it can result in the database server storing
larger than required result sets in memory expecting another fetch
command while we know we don't need that.

Method 2 has a positive side of working with all database engines,
whether they use LIMIT, ROWNUM, TOP or whatever else since sqlalchemy
does the work as specified for the dialect. On the downside though
the query optimizer might not be able to optimize this as much as an
approach that doesn't use a subquery.

Since most modern DBs use the LIMIT syntax, this adds a regex approach
to modify the query and force a LIMIT clause without using a subquery
for the databases that support this syntax, and uses method 2 for all
others.
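
A minimal sketch of the regex approach, assuming a plain trailing
`LIMIT <n>` clause. The pattern and the helper name `apply_limit` are
illustrative assumptions, not necessarily what this PR ships, and the sketch
does not handle forms such as `LIMIT 5, 10` or `LIMIT ... OFFSET ...`.

```python
import re

# Hypothetical pattern: a trailing LIMIT clause at the very end of the query
LIMIT_RE = re.compile(r"\s+LIMIT\s+\d+\s*$", re.IGNORECASE)


def apply_limit(sql, limit):
    """Strip any trailing LIMIT clause, then append the enforced one."""
    # Drop surrounding whitespace and trailing semicolons first
    sql = sql.strip().rstrip(";").rstrip()
    sql = LIMIT_RE.sub("", sql)
    return "{} LIMIT {}".format(sql, limit)


print(apply_limit("SELECT * FROM t LIMIT 5000;", 100))
print(apply_limit("select * from t", 10))
```

Because the query text is edited directly, no wrapping subquery is needed and
the optimizer sees the statement as the user wrote it.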

* Fixing build

* Fix lint

* Added more tests

* Fix tests
2018-05-14 14:44:05 -05:00
Yongjie Zhao
7a4a89b195 Update Apache Kylin dbengine with supported week/quarter grains (#4965) 2018-05-14 14:11:59 -05:00
Ville Brofeldt
b391676544 Force lowercase column names for Snowflake and Oracle (#4994)
* Force lowercase column names for Snowflake and Oracle

* Remove lowercasing of DB2 columns

* Remove DB2 lowercasing

* Fix test cases
2018-05-14 13:43:13 -05:00
timifasubaa
d87504cb42 Merge pull request #4833 from timifasubaa/help_sqllab_forget_the_past
[sqllab] Help sqllab forget query history
2018-05-07 10:56:39 -07:00
Timi Fasubaa
ab958c67e6 make queries older than 6 hours timeout 2018-05-07 10:14:37 -07:00
Yongjie Zhao
5d6e59aa8a Support Apache Kylin in EngineSpec (#4925)
* Support Apache Kylin in EngineSpec

* Fix flake8
2018-05-03 08:42:43 -07:00
Maxime Beauchemin
5f6a1cea47 Fix typos from linting (#4918)
Caused by https://github.com/apache/incubator-superset/pull/3847

Fixes https://github.com/apache/incubator-superset/issues/4915
2018-05-01 13:34:10 -07:00
Beto Dealmeida
13da5a8742 Fix for week_start_sunday and week_ending_saturday (#4911)
* Handle locked weeks

* Fix spelling

* Fix druid

* Clean unit tests
2018-05-01 13:27:56 -07:00
John Bodley
d533ce0967 [pylint] prepping for enabling pylint for non-errors (#4884) 2018-04-28 20:08:09 -07:00
Ville Brofeldt
fa3da8c888 Implement Snowflake engine with supported time grains (#4882)
* Implement Snowflake engine with supported time grains

* Fix typo in second grain
2018-04-25 14:44:24 -07:00
Riccardo Magliocchetti
3b18fbf9e3 db_engine_specs: use correct sqlite week time grain (#4831)
We want the week of year %W and not the day of week %w
when using week as time grain.

Reference:
https://www.sqlite.org/lang_datefunc.html
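
The difference between the two format codes can be checked with Python's
built-in sqlite3 module. The date below is an arbitrary example chosen for
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2018-04-15 is a Sunday
week_of_year, day_of_week = conn.execute(
    "SELECT strftime('%W', '2018-04-15'), strftime('%w', '2018-04-15')"
).fetchone()
# %W is the week of the year, %w is the day of the week (0 = Sunday)
print(week_of_year, day_of_week)  # 15 0
```

Grouping by `%w` would collapse every Sunday of the year into one bucket,
which is why `%W` is the right code for a weekly time grain.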
2018-04-15 16:15:31 -07:00
timifasubaa
20f46eede5 call next() the right way (#4804) 2018-04-11 13:20:14 -07:00
Beto Dealmeida
426c34ee86 Pass granularity from backend to frontend as ISO duration (#4755)
* Add ISO duration to time grains

* Use ISO duration

* Remove debugging code

* Add module to yarn.lock

* Remove autolint

* Druid granularity as ISO

* Remove dangling comma
2018-04-06 16:19:17 -07:00
Maxime Beauchemin
93ec76f757 [sql lab] reduce the number of metadata calls when loading a table (#4593) 2018-03-15 17:53:34 -07:00
John Bodley
4250e239a2 Merge pull request #4590 from michellethomas/fixing_double_escape_presto
Removing escape_sql so we don't double escape
2018-03-13 12:19:44 -07:00
Hugh A. Miles II
2bc089ef8d Added new exception class and start of better exception/error handling (#4514)
* rebase and linting

* change back

* wip

* fixed broken test

* fix flake8

* fix test
2018-03-11 22:07:51 -07:00
Michelle Thomas
e1af421f0c Removing escape_sql so we don't double escape 2018-03-09 15:37:17 -08:00
Kyle Travis
31a995714d [bug] Fix CSV upload feature for DB with password (#4562)
* Use sqlalchemy_uri_decrypted in create_engine calls

* Update tox mysql uri

* Include mysql charset=utf8 for py2.7 in tox.ini
2018-03-07 17:42:52 -08:00
John Bodley
150768ee30 [presto] Removing patched presto (#4530) 2018-03-05 23:16:02 -08:00
timifasubaa
404e2d552a fixes to csv - hive upload (#4488) 2018-02-27 22:13:06 -08:00
John Bodley
d57a37e341 [flake8] Adding flake8-coding (#4477) 2018-02-25 15:06:11 -08:00