
Shift-4-Haystack

Helps to implement the Haystack specification

Provider + AWS Athena

This provider extends the DBProvider to manage time-series with AWS Athena.

Set HAYSTACK_PROVIDER=shaystack.providers.athena to use this provider. Set HAYSTACK_DB to the URL of the backend that holds the ontology, and HAYSTACK_TS to the parameters required to run Athena queries.

The format of HAYSTACK_TS is:

athena://shaystack?output_bucket_name=<S3 bucket name>&output_folder_name=<output folder>
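The sketch below shows how the two query parameters of HAYSTACK_TS can be extracted with Python's standard library and combined into the S3 location where Athena writes query results. The function name parse_haystack_ts and the trailing-slash layout of the output location are assumptions for illustration; shaystack's own parsing may differ.

```python
from urllib.parse import urlparse, parse_qs


def parse_haystack_ts(url: str) -> dict:
    """Split an athena:// HAYSTACK_TS URL into its parts (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "athena":
        raise ValueError(f"expected an athena:// URL, got {parsed.scheme!r}")
    # parse_qs returns lists of values; keep the first value of each parameter
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    bucket = params["output_bucket_name"]
    folder = params.get("output_folder_name", "").strip("/")
    # Assumption: results go to s3://<bucket>/<folder>/ (or the bucket root)
    output_location = f"s3://{bucket}/{folder}/" if folder else f"s3://{bucket}/"
    return {
        "host": parsed.netloc,
        "output_bucket_name": bucket,
        "output_folder_name": folder,
        "output_location": output_location,
    }


config = parse_haystack_ts(
    "athena://shaystack?output_bucket_name=my-results&output_folder_name=haystack"
)
print(config["output_location"])
```

The resulting s3:// URI is the kind of value Athena expects as the query result location.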

hisURI structure

The hisURI tag in the ontology contains every element required to query an Athena database and retrieve the time series of a point. Its structure is as follows:

"hisURI": {
            "db_name": "data base name",
            "table_name": "table name",
            "partition_keys": "partition keys key1='value1'/key2='value2'/.../key(n)='value(n)'",
            "hs_type": "type of timeseries (composite: dict or simple: float or string)",
            "hs_value_column": { 
                "column1": "float",
                "column2": "float",
                ...
                "column(n)": "float"
            },
            "hs_date_column": {
                "time": "%Y-%m-%d %H:%M:%S.%f"
            },
            "date_part_keys": {
                "year_col": "year",
                "month_col": "month",
                "day_col": "day"
            }
        }
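To make the role of each hisURI field concrete, the sketch below builds an Athena SELECT from a hisURI and converts one result row into a (timestamp, value) pair. It is a hypothetical illustration, not the provider's actual query: it assumes partition_keys are joined with AND, that the year column named in date_part_keys can be cast to an integer, that result rows arrive as text, and that hs_type is "dict" for a composite series and anything else means a single value column.

```python
from datetime import datetime


def _quote(name: str) -> str:
    # Double quotes delimit identifiers in Athena SELECT statements
    return f'"{name}"'


def build_select(his_uri: dict, start: datetime, end: datetime) -> str:
    """Build an illustrative Athena SELECT for the series described by hisURI."""
    date_col = next(iter(his_uri["hs_date_column"]))
    columns = [date_col, *his_uri["hs_value_column"]]
    # "key1='value1'/key2='value2'" becomes "key1='value1' AND key2='value2'"
    filters = [p for p in his_uri.get("partition_keys", "").split("/") if p]
    # Prune partitions by year through the column named in date_part_keys
    year_col = his_uri["date_part_keys"]["year_col"]
    filters.append(
        f"CAST({_quote(year_col)} AS integer) BETWEEN {start.year} AND {end.year}"
    )
    return (
        f"SELECT {', '.join(_quote(c) for c in columns)} "
        f"FROM {his_uri['db_name']}.{his_uri['table_name']} "
        f"WHERE {' AND '.join(filters)} "
        f"ORDER BY {_quote(date_col)}"
    )


def convert_row(his_uri: dict, row: dict):
    """Turn one result row (column name -> text) into (timestamp, value)."""
    date_col, date_fmt = next(iter(his_uri["hs_date_column"].items()))
    timestamp = datetime.strptime(row[date_col], date_fmt)
    casts = {"float": float, "string": str}
    values = {
        col: casts[kind](row[col]) for col, kind in his_uri["hs_value_column"].items()
    }
    if his_uri["hs_type"] == "dict":
        # Composite series: one entry per value column
        return timestamp, values
    # Simple series: exactly one value column is expected
    (value,) = values.values()
    return timestamp, value


his_uri = {
    "db_name": "iot",
    "table_name": "sensors",
    "partition_keys": "site='paris'/device='d1'",
    "hs_type": "float",
    "hs_value_column": {"temperature": "float"},
    "hs_date_column": {"time": "%Y-%m-%d %H:%M:%S.%f"},
    "date_part_keys": {"year_col": "year", "month_col": "month", "day_col": "day"},
}
print(build_select(his_uri, datetime(2021, 1, 1), datetime(2021, 12, 31)))
print(convert_row(his_uri, {"time": "2021-03-01 12:00:00.000", "temperature": "21.5"}))
```

The database, table, and column names in his_uri are placeholders. The date format in hs_date_column is a Python strptime pattern, which is why it is applied after the rows come back rather than inside the SQL.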

Athena tables can be created with AWS Glue or by running a DDL statement in the Athena query editor.
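For the DDL route, the sketch below generates a CREATE EXTERNAL TABLE statement whose columns line up with a hisURI: a string timestamp column, the value columns with their types, and string partition columns. The helper name create_table_ddl, the Parquet storage format, and the S3 location are assumptions; adapt them to how the data is actually laid out in S3.

```python
def create_table_ddl(db_name: str, table_name: str, date_column: str,
                     value_columns: dict, partition_columns: list,
                     s3_location: str, stored_as: str = "PARQUET") -> str:
    """Return an illustrative CREATE EXTERNAL TABLE statement (hypothetical helper)."""
    # Timestamps are kept as strings, parsed later with the hs_date_column format
    columns = [f"  `{date_column}` string"]
    # "float" and "string" from hs_value_column are also valid Athena DDL types
    columns += [f"  `{name}` {kind}" for name, kind in value_columns.items()]
    # Partition columns must not be repeated in the regular column list
    partitions = ", ".join(f"`{name}` string" for name in partition_columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}.{table_name} (\n"
        + ",\n".join(columns)
        + f"\n)\nPARTITIONED BY ({partitions})\n"
        + f"STORED AS {stored_as}\n"
        + f"LOCATION '{s3_location}'"
    )


print(create_table_ddl(
    "iot", "sensors", "time", {"temperature": "float"},
    ["site", "device", "year", "month", "day"], "s3://my-data/sensors/",
))
```

The printed statement can be pasted into the Athena query editor. If the S3 prefixes follow the Hive key=value layout (for example site=paris/year=2021/), the partitions can then be registered with MSCK REPAIR TABLE iot.sensors.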

Running the Haystack API

$ HAYSTACK_PROVIDER=shaystack.providers.athena \
  HAYSTACK_DB=sample/athena_sample.hayson.json \
  HAYSTACK_TS="athena://shaystack?output_bucket_name=bucket_name&output_folder_name=folder_name" \
  shaystack

Requirements

To use the Athena provider, make sure that the IAM role (or other IAM principal) running it has permission to access both the source data bucket and the query result bucket, as well as the Athena permissions needed to run queries.

Example: the following identity-based permissions policy allows the actions that a user or other IAM principal requires to run queries that use Athena UDF statements:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "lambda:InvokeFunction",
                "athena:GetQueryResults",
                "s3:ListMultipartUploadParts",
                "athena:GetWorkGroup",
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:athena:*:MyAWSAcctId:workgroup/MyAthenaWorkGroup",
                "arn:aws:s3:::MyQueryResultsBucket/*",
                "arn:aws:lambda:*:MyAWSAcctId:function:OneAthenaLambdaFunction",
                "arn:aws:lambda:*:MyAWSAcctId:function:AnotherAthenaLambdaFunction"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "athena:ListWorkGroups",
            "Resource": "*"
        }
    ]
}

Refer to the AWS Athena documentation for more details on the permissions policy requirements.
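When the policy has to be reproduced for several accounts or buckets, it can be generated rather than edited by hand. The sketch below fills the placeholders of the example policy (MyAWSAcctId, MyAthenaWorkGroup, MyQueryResultsBucket, the Lambda function names) and only adds lambda:InvokeFunction when UDF functions are listed. The optional read-only statement for the source data bucket is an assumption derived from the requirement stated above; it is not part of the example policy.

```python
import json
from typing import Optional, Sequence


def athena_policy(account_id: str, workgroup: str, results_bucket: str,
                  source_bucket: Optional[str] = None,
                  lambda_functions: Sequence[str] = ()) -> dict:
    """Fill the placeholders of the example policy (hypothetical helper)."""
    actions = [
        "athena:StartQueryExecution", "athena:GetQueryExecution",
        "athena:GetQueryResults", "athena:StopQueryExecution",
        "athena:GetWorkGroup",
        "s3:GetBucketLocation", "s3:GetObject", "s3:PutObject",
        "s3:ListMultipartUploadParts", "s3:AbortMultipartUpload",
    ]
    resources = [
        f"arn:aws:athena:*:{account_id}:workgroup/{workgroup}",
        f"arn:aws:s3:::{results_bucket}/*",
    ]
    if lambda_functions:  # only needed for queries that call Athena UDFs
        actions.append("lambda:InvokeFunction")
        resources += [f"arn:aws:lambda:*:{account_id}:function:{name}"
                      for name in lambda_functions]
    statements = [
        {"Sid": "AthenaQueries", "Effect": "Allow",
         "Action": actions, "Resource": resources},
        {"Sid": "ListWorkGroups", "Effect": "Allow",
         "Action": "athena:ListWorkGroups", "Resource": "*"},
    ]
    if source_bucket:  # assumption: read-only access to the bucket holding the data
        statements.append({
            "Sid": "ReadSourceData", "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{source_bucket}",
                         f"arn:aws:s3:::{source_bucket}/*"],
        })
    return {"Version": "2012-10-17", "Statement": statements}


policy = athena_policy("123456789012", "primary", "my-results", source_bucket="my-data")
print(json.dumps(policy, indent=4))
```

The printed JSON can be saved to a file and attached to a role, for example with aws iam put-role-policy --role-name <role> --policy-name <name> --policy-document file://policy.json.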

Limitation