Improve database type inference (#4724)

* Improve database type inference

Python's DBAPI specification of cursor.description is neither clear
nor applied homogeneously across drivers, and this PR attempts to
improve inference of the datatypes returned in the cursor.

This work started because Presto's TIMESTAMP type was mishandled as a
string, since the database driver (pyhive) returns it as a string. The
work here fixes this bug and does a better job of inferring MySQL and
Presto types. It also creates a new method in db_engine_specs that other
database engines can implement to make type inference more precise
as needed.

* Fixing tests

* Addressing comments

* Using infer_objects

* Removing faulty line

* Addressing PrestoSpec redundant method comment

* Fix rebase issue

* Fix tests
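
To illustrate the cursor.description problem described in the first item above, here is a minimal sketch using the standard-library sqlite3 driver (chosen only because it needs no server; it is not one of the drivers touched by this commit). The DBAPI says each column entry is a 7-item sequence whose second item is a type code, but sqlite3 fills in only the column name, so no type can be read from the cursor at all.

```python
import sqlite3

# In-memory database, so the example needs no external setup.
conn = sqlite3.connect(":memory:")
cur = conn.execute("SELECT 1 AS n, 'a' AS s")

# Each description entry has seven items:
# (name, type_code, display_size, internal_size, precision, scale, null_ok)
description = cur.description
names = [col[0] for col in description]
type_codes = [col[1] for col in description]

print(names)       # column names are always present
print(type_codes)  # sqlite3 leaves the type codes unset (None)
conn.close()
```

Other drivers put integers or strings in the same slot, which is why a per-engine hook for interpreting the type code is useful.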
Maxime Beauchemin
2018-06-27 21:35:12 -07:00
committed by GitHub
parent 04fc1d1089
commit 777d876a52
8 changed files with 224 additions and 117 deletions


@@ -7,7 +7,9 @@ from __future__ import unicode_literals
 import textwrap
 from superset.db_engine_specs import (
-    HiveEngineSpec, MssqlEngineSpec, MySQLEngineSpec)
+    BaseEngineSpec, HiveEngineSpec, MssqlEngineSpec,
+    MySQLEngineSpec, PrestoEngineSpec,
+)
 from superset.models.core import Database
 from .base_tests import SupersetTestCase
@@ -193,3 +195,9 @@ class DbEngineSpecsTestCase(SupersetTestCase):
 FROM
 table LIMIT 1000"""),
 )
+    def test_get_datatype(self):
+        self.assertEquals('STRING', PrestoEngineSpec.get_datatype('string'))
+        self.assertEquals('TINY', MySQLEngineSpec.get_datatype(1))
+        self.assertEquals('VARCHAR', MySQLEngineSpec.get_datatype(15))
+        self.assertEquals('VARCHAR', BaseEngineSpec.get_datatype('VARCHAR'))
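
The test above pins down the behavior expected from get_datatype without showing how it is implemented. The following is a hypothetical sketch that satisfies the same assertions; it is not the code from this commit. The class names ending in `Sketch` and the `TYPE_CODE_MAP` attribute are assumptions made for illustration, and the two integer codes are the only ones taken from the test.

```python
class BaseEngineSpecSketch:
    """Hypothetical stand-in for a base engine spec."""

    @classmethod
    def get_datatype(cls, type_code):
        # Drivers that already return a type name only need normalizing.
        if isinstance(type_code, str) and type_code:
            return type_code.upper()
        return None


class MySQLEngineSpecSketch(BaseEngineSpecSketch):
    """Hypothetical stand-in for a driver that reports integer type codes."""

    # Only the two codes exercised by the test; a real map would be larger.
    TYPE_CODE_MAP = {1: 'TINY', 15: 'VARCHAR'}

    @classmethod
    def get_datatype(cls, type_code):
        # Translate integer codes to names, then reuse the base normalization.
        if isinstance(type_code, int):
            type_code = cls.TYPE_CODE_MAP.get(type_code)
        return super().get_datatype(type_code)


print(BaseEngineSpecSketch.get_datatype('string'))
print(MySQLEngineSpecSketch.get_datatype(1))
print(MySQLEngineSpecSketch.get_datatype(15))
```

Keeping the string normalization in the base class means an engine whose driver already returns type names, as the Presto case in the test suggests, may not need an override at all.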