data_profiling.lib.base module

class data_profiling.lib.base.C(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

BLACK_SQUARE = '■'
CHAR = 'CHAR'
CLASSPATH = 'CLASSPATH'
CLASS_NAME = 'class_name'
CONNECTION_STRING = 'connection_string'
CSV_EXTENSION = '.csv'
DATABASE = 'database'
DATE = 'DATE'
DECIMAL = 'DECIMAL'
EXCEL_EXTENSION = '.xlsx'
FLOAT = 'FLOAT'
JAR = 'jar'
JDBC = 'jdbc'
NUMBER = 'NUMBER'
PORT_NUMBER = 'port_number'
SQL_EXTENSION = '.sql'
VARCHAR = 'VARCHAR'
class data_profiling.lib.base.Config

Bases: object

CONFIG_DIR = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/data-profiling/checkouts/release-1.0.5/config')
PRIMARY_CONFIG_FILE = 'config.yaml'
classmethod get_config(file_name: str = 'config.yaml') dict

Read a configuration file from the configuration file directory.

Parameters:

file_name – file within the configuration directory

Returns:

the configuration corresponding to that file

class data_profiling.lib.base.Database(host_name: str, port_number: int, database_name: str, user_name: str, password: str, auto_commit: bool = False, **kwargs)

Bases: object

Wrapper around the jaydebeapi module.

classmethod execute(sql: str, parameters: list = [], cursor: Cursor = None, is_debug: bool = False) Tuple[Cursor, list]

Wrapper around the Cursor class.

Parameters:
  • sql – the query to be executed

  • parameters – the parameters to fill the placeholders

  • cursor – if provided, this cursor is used; otherwise a new one is created

  • is_debug – if True, log the query but do not execute it

Returns:

a tuple containing:
  1. the cursor with the result set
  2. a list of the column names in the result set, or an empty list if the statement is not a SELECT
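The return contract of execute (a cursor plus a list of column names, empty for non-SELECT statements) can be illustrated with a small stand-in. This is only a sketch: it uses the standard-library sqlite3 module instead of jaydebeapi, and execute_sketch with its explicit connection argument is a hypothetical name, not the library's implementation.

```python
import logging
import sqlite3
from typing import List, Optional, Sequence, Tuple


def execute_sketch(
    connection: sqlite3.Connection,
    sql: str,
    parameters: Sequence = (),
    cursor: Optional[sqlite3.Cursor] = None,
    is_debug: bool = False,
) -> Tuple[Optional[sqlite3.Cursor], List[str]]:
    """Hypothetical stand-in that mirrors the documented return contract."""
    if is_debug:
        # Debug mode: log the query and return without executing it.
        logging.getLogger(__name__).debug(sql)
        return cursor, []
    # Reuse the caller's cursor if one was passed, otherwise create a new one.
    if cursor is None:
        cursor = connection.cursor()
    cursor.execute(sql, parameters)
    # DB-API cursors leave description as None when there is no result set.
    columns = [col[0] for col in cursor.description] if cursor.description else []
    return cursor, columns


connection = sqlite3.connect(":memory:")
execute_sketch(connection, "CREATE TABLE t (id INTEGER, name TEXT)")
_, insert_columns = execute_sketch(connection, "INSERT INTO t VALUES (?, ?)", [1, "a"])
cursor, columns = execute_sketch(connection, "SELECT id, name FROM t")
rows = cursor.fetchall()
```

Returning the column names alongside the cursor lets a caller build headers or dictionaries without inspecting cursor.description itself.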

classmethod fetch_one_row(sql: str, parameters: list = [], default_value=None) list | str | int
Run the given query and fetch the first row.
If the query returns no rows and default_value is not provided, the result depends on the select clause:
If there is only a single element in the select clause, the function returns None.
If there are multiple elements in the select clause, the function returns [None] * the number of elements.
Parameters:
  • sql – the query to be executed

  • parameters – the parameters to fill the placeholders

  • default_value – if the query does not return any rows, return this.

Returns:

a list if the row contains two or more values, otherwise the single value.
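The return rules of fetch_one_row can be sketched as follows. This is an illustration under assumptions, not the library's code: it runs against the standard-library sqlite3 module, fetch_one_row_sketch and its connection argument are hypothetical, and treating default_value=None as "not provided" is a guess based on the signature.

```python
import sqlite3
from typing import Any, Sequence


def fetch_one_row_sketch(
    connection: sqlite3.Connection,
    sql: str,
    parameters: Sequence = (),
    default_value: Any = None,
) -> Any:
    """Hypothetical stand-in for the documented first-row semantics."""
    cursor = connection.execute(sql, parameters)
    row = cursor.fetchone()
    # Number of elements in the select clause.
    column_count = len(cursor.description)
    if row is None:
        # No rows: prefer the caller's default, else None shaped like a row.
        if default_value is not None:
            return default_value
        return None if column_count == 1 else [None] * column_count
    # One column yields a bare value; several columns yield a list.
    return row[0] if column_count == 1 else list(row)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (id INTEGER, name TEXT)")
connection.execute("INSERT INTO t VALUES (?, ?)", [1, "a"])

single = fetch_one_row_sketch(connection, "SELECT id FROM t")
several = fetch_one_row_sketch(connection, "SELECT id, name FROM t")
missing = fetch_one_row_sketch(connection, "SELECT id, name FROM t WHERE id = ?", [99])
```

Shaping the empty result like a row means a caller can unpack `a, b = fetch_one_row(...)` without first checking whether a row was found.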

classmethod get_connection() Connection
class data_profiling.lib.base.Logger(level: [str | int] = None, session: str = None, **kwargs)

Bases: object

classmethod get_logger() Logger
record_factory_factory()

Enables us to display a session identifier with each log message.
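One common way to attach a session identifier to every log message is a custom log record factory from the standard logging module. The sketch below shows that general technique; make_record_factory, the session value, and the format string are hypothetical and may differ from what record_factory_factory actually does.

```python
import logging


def make_record_factory(session: str):
    """Hypothetical helper: wrap the current factory to add a session attribute."""
    previous_factory = logging.getLogRecordFactory()

    def record_factory(*args, **kwargs):
        record = previous_factory(*args, **kwargs)
        # Every record now carries the session id, usable as %(session)s.
        record.session = session
        return record

    return record_factory


logging.setLogRecordFactory(make_record_factory("session-42"))

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s [%(session)s] %(levelname)s %(message)s")
)
logger = logging.getLogger("profiling_sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("profiling started")
```

Because the factory is installed globally, records created by any logger in the process carry the session attribute, so the formatter never encounters a record without it.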

classmethod set_level(level: str) None
data_profiling.lib.base.dedent_sql(s)

Remove leading spaces from all lines of a SQL query. Useful for logging.

Parameters:

s – query

Returns:

cleaned-up version of query
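A minimal sketch of the described behaviour, assuming that "leading spaces" means all leading whitespace on each line and that surrounding blank lines are dropped. dedent_sql_sketch is a hypothetical name; the real function may treat indentation differently.

```python
def dedent_sql_sketch(s: str) -> str:
    """Hypothetical stand-in: strip leading whitespace from every line."""
    # strip() drops the blank first/last lines typical of triple-quoted SQL.
    return "\n".join(line.lstrip() for line in s.strip().splitlines())


query = """
    SELECT id
      FROM t
     WHERE id = 1
"""
cleaned = dedent_sql_sketch(query)
```

This keeps logged queries compact when the SQL is written as an indented triple-quoted string inside Python code.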

data_profiling.lib.base.get_line_count(file_path: str | Path) int

See https://stackoverflow.com/questions/845058/how-to-get-line-count-of-a-large-file-cheaply-in-python
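One approach discussed for this problem is to read the file in binary chunks and count newline bytes, which avoids loading the whole file into memory. The sketch below illustrates that idea; get_line_count_sketch and its chunk_size parameter are hypothetical, and the library may use a different variant.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def get_line_count_sketch(
    file_path: Union[str, Path], chunk_size: int = 1024 * 1024
) -> int:
    """Hypothetical stand-in: count newline bytes chunk by chunk."""
    count = 0
    with open(file_path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count


# Demonstration on a temporary three-line file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    tmp_path = tmp.name

line_count = get_line_count_sketch(tmp_path)
os.remove(tmp_path)
```

This version counts newline characters, so a final line without a trailing newline is not included in the total.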