superset2

mirror of https://github.com/apache/superset.git synced 2026-06-03 06:39:25 +00:00

Author	SHA1	Message	Date
Maxime Beauchemin	cc3a625a4b	Use py3's f-strings instead of s.format(**locals()) (#6448 ) Use py3's f-strings instead of s.format(**locals()) In light of the bug reported here https://github.com/apache/incubator-superset/issues/6347, which seems like an odd `.format()` issue in py3, I grepped and replaced all instances of `.format(**locals())` using py3's f-strings * lint * fix tests	2018-12-02 13:50:49 -08:00
Junda Yang	f1cae2ecdd	override get_view_names in PrestoEngineSpec (#6459 ) * override get_view_names in PrestoEngineSpec * add test * flake 8 * flake 8	2018-11-28 15:13:38 -08:00
John Bodley	74f0817bf0	[hive] Fixing where latest partition logic (#6357 )	2018-11-12 10:07:38 -08:00
Junda Yang	c552c125d7	Move metadata cache one layer up (#6153 ) * Update wording * nit update for api endpoint url * move metadata cache one layer up * refactor cache * fix flake8 and DatabaseTablesAsync * nit * remove logging for cache * only fetch for all tables that allows cross schema fetch * default allow_multi_schema_metadata_fetch to False * address comments * remove unused defaultdict * flake 8	2018-10-31 13:23:26 -07:00
Sumedh Sakdeo	71d6ff40d0	partition and clustering bigquery keys (#6212 ) * partition and clustering bigquery keys * flake8	2018-10-29 11:23:21 -07:00
Maxime Beauchemin	bbfd69a138	[utils.py] gathering/refactoring into a "utils/" folder (#6095 ) * [utils] gathering/refactoring into a "utils/" folder Moving current utils.py into utils/core.py and moving other util modules under this new "utils/" as well. Following steps include eroding at "utils/core.py" and breaking it down into smaller modules. * Improve tests * Make loading examples in scope for tests * Remove test class attrs examples_loaded and requires_examples	2018-10-16 17:59:34 -07:00
Junda Yang	177bed3bb6	allow cache and force refresh on table list (#6078 ) * allow cache and force refresh on table list * wording * flake8 * javascript test * address comments * nit	2018-10-16 13:14:45 -07:00
timifasubaa	46c86672c8	remove utf8 declaration (#6096 )	2018-10-15 11:53:24 -07:00
timifasubaa	dd9eeda03e	remove future (#6065 )	2018-10-13 09:39:04 -07:00
Junda Yang	712c1aa767	Allow user to force refresh metadata (#5933 ) * Allow user to force refresh metadata * fix javascript test error * nit * fix styling * allow custom cache timeout configuration on any database * minor improvement * nit * fix test * nit * preserve the old endpoint	2018-10-08 20:25:40 -07:00
John Bodley	1ee08fc216	[select-star] Adding optional schema to view (#6051 )	2018-10-08 10:32:40 -07:00
timifasubaa	00c4c7ec4b	fix csv upload bugs (#5940 )	2018-09-20 10:34:15 -05:00
livinm	83fa7af42a	Enable Teradata (#5870 ) * Enable Teradata New DB engine spec for Teradata: - LimitMethod should be WRAP_SQL since Teradata does not support the "LIMIT" clause (it uses TOP) - Timegrains for Teradata are added * Update formatting to pass flake8 tests	2018-09-13 08:01:25 -07:00
Ville Brofeldt	77fe9ef130	Force quoted column aliases for Oracle-like databases (#5686 ) * Replace dataframe label override logic with table column override * Add mutation to any_date_col * Linting * Add mutation to oracle and redshift * Fine tune how and which labels are mutated * Implement alias quoting logic for oracle-like databases * Fix and align column and metric sqla_col methods * Clean up typos and redundant logic * Move new attribute to old location * Linting * Replace old sqla_col property references with function calls * Remove redundant calls to mutate_column_label * Move duplicated logic to common function * Add db_engine_specs to all sqla_col calls * Add missing mydb * Add note about snowflake-sqlalchemy regression * Make db_engine_spec mandatory in sqla_col * Small refactoring and cleanup * Remove db_engine_spec from get_from_clause call * Make db_engine_spec mandatory in adhoc_metric_to_sa * Remove redundant mutate_expression_label call * Add missing db_engine_specs to adhoc_metric_to_sa * Rename arg label_name to label in get_column_label() * Rename label function and add docstring * Remove redundant db_engine_spec args * Rename col_label to label * Remove get_column_name wrapper and make direct calls to db_engine_spec * Remove unneeded db_engine_specs * Rename sa_ vars to sqla_	2018-09-03 22:49:58 -07:00
Christine Chambers	ae3fb04036	Bug: fixing async syntax for python 3.7 (#5759 ) * Bug: fixing async syntax for python 3.7 Rename async to async_ so superset installs for python 3.7. * Addressing PR comments. Use kwargs instead of explicitly specifying async_ so downstream engines (e.g. PyHive) that support async can choose to use async_ in python 3.7 and async in <=python3.6 * addressing additional pr comments	2018-08-28 17:40:45 -07:00
Sumedh Sakdeo	80e777823b	Field names in big query can contain only alphanumeric and underscore (#5641 ) * Field names in big query can contain only alphanumeric and underscore * bad quote * better place for mutating labels * lint * bug fix thanks to mistercrunch * lint * lint again	2018-08-21 13:45:42 -07:00
Sumedh Sakdeo	0fbda33c68	Handling bigquery dialect when previewing data (#5655 ) * Handling bigquery dialect when previewing data * review comments * lint	2018-08-20 22:04:22 -07:00
Sumedh Sakdeo	5966a674e5	Explore View Perf Fix (#5637 )	2018-08-15 12:27:08 -07:00
Sumedh Sakdeo	c9bd5a6167	Fetch a batch of rows from bigquery (#5632 ) * Fetch a batch of rows from bigquery * unused const * review comments	2018-08-14 21:44:04 -07:00
Ville Brofeldt	e1f4db8e24	Match viz dataframe column case to form_data fields for Snowflake, Oracle and Redshift (#5487 ) * Add function to fix dataframe column case * Fix broken handle_nulls method * Add case sensitivity option to dedup * Refactor function definition and call location * Remove added blank line * Move df column rename logic to db_engine_spec * Remove redundant variable * Update comments in db_engine_specs * Tie df adjustment to db_engine_spec class attribute * Fix dedup error * Linting * Check for db_engine_spec attribute prior to adjustment * Rename case sensitivity flag * Linting * Remove function that was moved to db_engine_specs * Get metrics names from utils * Remove double import and rename dedup variable	2018-08-03 09:53:56 -07:00
Maxime Beauchemin	fe6846b8db	[sql lab] simplify the visualize flow (#5523 ) * [sql lab] simplify the visualize flow The "visualize flow" linking SQL Lab to the "explore view" has never worked so great for people, here's a list of issues: * it's not really clear to users that their query is wrapped as a subquery, and the explore view runs queries on top of it * lint + fix tests * Addressing comments	2018-08-02 10:52:38 -07:00
Ville Brofeldt	c1e6c68a3e	Add time grain blacklist and addons to config.py (#5380 ) * Add interim grains * Refactor and add blacklist * Change PT30M to PT0.5H * Linting * Linting * Add time grain addons to config.py and refactor engine spec logic * Remove redundant import and clean up config.py * Fix bad rebase * Implement changes proposed by @betodealmeida * Revert removal of name from Grain * Linting	2018-07-30 23:44:30 -07:00
Maxime Beauchemin	cd55998d63	Improve hive/pyhive error message regex (#5502 )	2018-07-27 08:31:37 -07:00
Maxime Beauchemin	41286b7545	[sql lab] extract Hive error messages (#5495 ) * [sql lab] extract Hive error messages So pyhive returns an exception object with a stringified thrift error object. This PR uses a regex to extract the errorMessage portion of that string. * Unit test	2018-07-26 15:17:55 -07:00
Ville Brofeldt	a165aec822	Fix broken dedup and remove redundant db_spec logic (#5467 ) * Fix broken dedup and remove redundant db_spec logic * Add test case	2018-07-23 10:41:38 -07:00
John Bodley	7fcc2af68f	[sql] Correct SQL parameter formatting (#5178 )	2018-07-21 12:01:26 -07:00
George	0d5443e392	Add week granularity for Clickhouse (#5455 )	2018-07-21 09:53:21 -07:00
timifasubaa	f8a6e09220	[sqllab] Fix sqllab limit regex issue with sqlparse (#5295 ) * include items after limit to the modified query * use sqlparse	2018-07-16 15:27:30 -07:00
timifasubaa	22b7c2db62	quote hive column names (#5368 )	2018-07-13 15:51:16 -07:00
timifasubaa	28ba5a9ddb	use schema form field in upload csv (#5303 )	2018-07-06 09:46:53 -07:00
aaronbannin	252cba20de	impala support for epoch timestamps (#5349 )	2018-07-04 19:25:58 -04:00
EvelynTurner	ad9103f5ba	[Bug fix] Divide by 1000.000 in epoch_ms_to_dttm() to not lose precision in Presto (#5211 ) * Fix how the annotation layer interprets the timestamp string without timezone info; use it as UTC * [Bug fix] Fixed/Refactored annotation layer code so that non-timeseries annotations are applied based on the updated chart object after adding all data * [Bug fix] Fixed/Refactored annotation layer code so that non-timeseries annotations are applied based on the updated chart object after adding all data * Fixed indentation * Fix the key string value in case series.key is a string * Fix the key string value in case series.key is a string * [Bug fix] Divide by 1000.000 in epoch_ms_to_dttm() to not lose precision in Presto * [Bug fix] Divide by 1000.000 in epoch_ms_to_dttm() to not lose precision in Presto	2018-07-04 19:19:57 -04:00
Minh Mai	059b64dad7	normalize column names for Redshift (#5337 )	2018-07-04 17:30:37 -04:00
Maxime Beauchemin	777d876a52	Improve database type inference (#4724 ) * Improve database type inference Python's DBAPI isn't super clear and homogeneous on the cursor.description specification, and this PR attempts to improve inferring the datatypes returned in the cursor. This work started around Presto's TIMESTAMP type being mishandled as string as the database driver (pyhive) returns it as a string. The work here fixes this bug and does a better job at inferring MySQL and Presto types. It also creates a new method in db_engine_specs allowing for other database engines to implement and become more precise on type-inference as needed. * Fixing tests * Addressing comments * Using infer_objects * Removing faulty line * Addressing PrestoSpec redundant method comment * Fix rebase issue * Fix tests	2018-06-27 21:35:12 -07:00
timifasubaa	b0eee129e9	add more precise types to hive table from csv (#5267 )	2018-06-25 16:12:01 -07:00
timifasubaa	bd24f854c9	specify hive namespace for tables (#5268 )	2018-06-25 12:04:27 -07:00
timifasubaa	0e5293b9be	Update db_engine_specs.py (#5264 )	2018-06-21 16:01:34 -07:00
Maxime Beauchemin	c89933d870	[sql lab] quote schema and table name (#5195 ) fixes https://github.com/apache/incubator-superset/issues/4595	2018-06-18 08:42:08 -07:00
Xiao Hanyu	b71f551493	Optimize presto SQL Lab query performance. (#5132 ) By stopping polling when the presto query has already finished. When a user makes queries to Presto via SQL Lab, presto runs the query and can then return all data back to superset in one shot. However, superset enables polling for presto by default to: - Get the fancy progress bar - Get the data back when the query has finished. However, the polling implementation of superset is not right. I've done a profiling with a table of 1 billion rows, here's some data: - Total number of rows: 1.02 Billion - SQL Lab query limit: 1 million - Output Data: 1.5 GB - Superset memory consumed: about 10-20 GB - Time: 7 minutes to finish in Presto, plus an additional 15 minutes for superset to get and store data. The problem with the default implementation is that, even if presto has finished the query (7 minutes in the above profiling), superset still does lots of wasted polling: in the above profiling, superset sent about 540 polling requests in total, and half of that polling was not necessary. 
Part of the simplified polling response: ``` { "infoUri": "http://10.65.204.39:8000/query.html?20180525_042715_03742_nza9u", "id": "20180525_042715_03742_nza9u", "nextUri": "http://10.65.204.39:8000/v1/statement/20180525_042715_03742_nza9u/11", "stats": { "state": "FINISHED", "queuedSplits": 21701, "progressPercentage": 35.98264191882267, "elapsedTimeMillis": 1029, "nodes": 116, "completedSplits": 15257, "scheduled": true, "wallTimeMillis": 2571904, "peakMemoryBytes": 0, "processedBytes": 40825519532, "processedRows": 47734066, "queuedTimeMillis": 0, "queued": false, "cpuTimeMillis": 849228, "rootStage": { "state": "FINISHED", "queuedSplits": 0, "nodes": 1, "totalSplits": 17, "processedBytes": 16829644, "processedRows": 11495, "completedSplits": 17, "stageId": "0", "done": true, "cpuTimeMillis": 69, "subStages": [ { "state": "CANCELED", "queuedSplits": 21701, "nodes": 116, "totalSplits": 42384, "processedBytes": 40825519532, "processedRows": 47734066, "completedSplits": 15240, "stageId": "1", "done": true, "cpuTimeMillis": 849159, "subStages": [], "wallTimeMillis": 2570374, "userTimeMillis": 730020, "runningSplits": 5443 } ], "wallTimeMillis": 1530, "userTimeMillis": 50, "runningSplits": 0 }, "totalSplits": 42401, "userTimeMillis": 730070, "runningSplits": 5443 } } } ``` Superset terminates the polling when it finds that `nextUri` becomes none, but actually, when `["stats"]["state"] == "FINISHED"`, presto has already finished the query and superset can stop polling and get the data back. After this simple optimization, we get a 2-5x performance boost for Presto SQL Lab queries.	2018-06-05 08:56:18 -07:00
timifasubaa	cefc206a36	Merge pull request #5023 from timifasubaa/fix_sqllab_commit [sqllab] force limit queries only when there is no existing limit	2018-05-31 11:12:46 -07:00
Timi Fasubaa	a9d7fafd9f	add tests	2018-05-30 12:50:27 -07:00
Beto Dealmeida	6c3e469154	Add more time grains (#5083 ) * Add more time grains * Use FLOOR * Fix quotes for lint	2018-05-29 12:43:48 -07:00
Maciej Bryński	ae50845843	Proper error handling in Hive Queries (#4428 ) * Proper error handling in Hive Queries * Change quotes * Trigger checks * Adding call to parent class * Small fix * Fix in method call	2018-05-29 12:42:45 -07:00
Timi Fasubaa	d38315a307	reuse_regex_logic	2018-05-25 15:07:27 -07:00
Maxime Beauchemin	b839608c32	[sql lab] a better approach at limiting queries (#4947 ) * [sql lab] a better approach at limiting queries Currently there are two mechanisms that we use to enforce the row limiting constraints, depending on the database engine: 1. use dbapi's `cursor.fetchmany()` 2. wrap the SQL into a limiting subquery Method 1 isn't great as it can result in the database server storing larger than required result sets in memory expecting another fetch command while we know we don't need that. Method 2 has a positive side of working with all database engines, whether they use LIMIT, ROWNUM, TOP or whatever else since sqlalchemy does the work as specified for the dialect. On the downside though the query optimizer might not be able to optimize this as much as an approach that doesn't use a subquery. Since most modern DBs use the LIMIT syntax, this adds a regex approach to modify the query and force a LIMIT clause without using a subquery for the databases that support this syntax and uses method 2 for all others. * Fixing build * Fix lint * Added more tests * Fix tests	2018-05-14 14:44:05 -05:00
Yongjie Zhao	7a4a89b195	Update Apache Kylin dbengine with supported week/quarter grains (#4965 )	2018-05-14 14:11:59 -05:00
Ville Brofeldt	b391676544	Force lowercase column names for Snowflake and Oracle (#4994 ) * Force lowercase column names for Snowflake and Oracle * Force lowercase column names for Snowflake and Oracle * Remove lowercasing of DB2 columns * Remove DB2 lowercasing * Fix test cases	2018-05-14 13:43:13 -05:00
timifasubaa	d87504cb42	Merge pull request #4833 from timifasubaa/help_sqllab_forget_the_past [sqllab] Help sqllab forget query history	2018-05-07 10:56:39 -07:00
Timi Fasubaa	ab958c67e6	make queries older than 6 hours timeout	2018-05-07 10:14:37 -07:00
Yongjie Zhao	5d6e59aa8a	Support Apache Kylin in EngineSpec (#4925 ) * Support Apache Kylin in EngineSpec * Fix flake8	2018-05-03 08:42:43 -07:00
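
The stop condition described in commit b71f551493 (stop polling Presto once `["stats"]["state"] == "FINISHED"`, rather than waiting for `nextUri` to become none) can be sketched roughly as below. This is an illustration only, not the code from that commit: `should_stop_polling`, `poll_until_done`, and the `fetch_status` callable are hypothetical names, and the response dicts are assumed to have the shape shown in the commit message.

```python
import time
from typing import Callable, Dict, Optional


def should_stop_polling(response: Dict) -> bool:
    """Return True once further polling is unnecessary (hypothetical helper)."""
    # Stop condition the commit message attributes to the old behaviour:
    # there is no `nextUri` left to follow.
    if response.get("nextUri") is None:
        return True
    # Additional condition from the commit message: the query is reported
    # as finished, even though a `nextUri` is still present.
    return response.get("stats", {}).get("state") == "FINISHED"


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> int:
    """Call `fetch_status` until polling can stop; return the number of polls.

    `fetch_status` is an assumed stand-in for whatever returns one polling
    response as a dict; it is not a real driver API.
    """
    polls = 0
    while True:
        response = fetch_status()
        polls += 1
        if should_stop_polling(response):
            return polls
        if max_polls is not None and polls >= max_polls:
            return polls
        time.sleep(interval_seconds)
```

With a sequence of responses where the state turns `FINISHED` while `nextUri` is still set, the loop stops at that response instead of continuing until `nextUri` disappears, which is the saving the commit message describes.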

140 Commits