Authored By: Gaurav Kumar – (ElogicSquare Analytics)
- Is data governance critical for you?
- Do you want to build a flexible data governance strategy and realize its benefits iteratively as it evolves?
- Do you run big-data-oriented programs that generate vast amounts of data?
- Do you want a governance strategy built around individual programs, or rolled up into some logical grouping?
- Do you want to improve on your existing governance strategy?
- Do you have regulatory or compliance needs?
- Are your overall IT objectives aligned with your business goals?
Data Governance in a Nutshell
When it comes to data governance, I am reminded of the phrase Mark Twain popularized: “Lies, damned lies, and statistics”. On one side, many business leaders are still working out “what to do” and “how to do it”: which data to consider and which tools are available. On the other side, complex regulatory compliance requirements such as HIPAA, SOX and Basel II hang over them like a sword.
Data governance (DG), especially in big data, is sometimes perceived as a lie by some and a damned lie by others. When done properly, however, it not only solves the governance problem but also improves data quality. The simple goal of DG is to govern how data is accessed and used by business initiatives, and how it is defined and managed by the data management infrastructure.
So what have we built?
We have built a multi-tenant healthcare analytics platform on the Hortonworks Big Data Stack. The platform receives messages from multiple devices belonging to multiple tenants (in this case, hospitals). Typical messages come from devices attached to patients in high- or low-acuity areas, from ventilators and from laboratories, along with ADT (admission, discharge, transfer) messages. Our flagship product, ‘LogiCrunch’, processes these messages, predicts each patient’s condition and publishes it in real time to the clinicians of the respective tenant.
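ADT messages are usually HL7 v2 messages, whose first segment (MSH) carries the sending facility and the message type. The sketch below shows one way a multi-tenant platform could tag each incoming message with its tenant before landing it in a tenant-specific data island. It assumes, hypothetically, that the tenant is identified by the Sending Facility field (MSH-4); the `route_hl7` and `island_path` helpers and the `/data/islands/...` layout are illustrative names, not part of LogiCrunch or any Hortonworks API.

```python
from dataclasses import dataclass


@dataclass
class RoutedMessage:
    tenant: str
    message_type: str
    raw: str


def route_hl7(raw: str) -> RoutedMessage:
    """Tag an HL7 v2 message with its tenant and message type from the MSH segment."""
    # HL7 v2 segments are separated by carriage returns; tolerate '\n' as well.
    segments = [s for s in raw.replace("\r\n", "\r").replace("\n", "\r").split("\r") if s]
    if not segments or len(segments[0]) < 4 or not segments[0].startswith("MSH"):
        raise ValueError("message does not start with an MSH segment")
    msh = segments[0]
    field_sep = msh[3]  # MSH-1: the field separator, normally '|'
    fields = msh.split(field_sep)
    # fields[0] is 'MSH' and fields[1] is MSH-2, so MSH-n sits at fields[n - 1] for n >= 2.
    if len(fields) < 9:
        raise ValueError("MSH segment is too short")
    # Hypothetical assumption: the tenant is the Sending Facility (MSH-4).
    return RoutedMessage(tenant=fields[3], message_type=fields[8], raw=raw)


def island_path(msg: RoutedMessage) -> str:
    """Hypothetical layout: one directory per tenant, then one per message family."""
    family = msg.message_type.split("^")[0].lower()  # 'ADT^A01' -> 'adt'
    return f"/data/islands/{msg.tenant}/{family}"
```

For a message whose MSH segment reads `MSH|^~\&|MONITOR|HOSP_A|LOGICRUNCH|PLATFORM|20240101120000||ADT^A01|MSG0001|P|2.5`, `route_hl7` would return tenant `HOSP_A` and type `ADT^A01`, and `island_path` would map it to `/data/islands/HOSP_A/adt`.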
Listed below are the key governance activities, organized by phase:
- Standardized stream-based Lake
- Standardized feature-specific Ponds
- Tenant-specific data islands
- Who can access what
- Data maintenance procedures
- Periodic automated model evaluation procedures
- Standardized Tenant Authentication and Authorization Strategy
- Reference Data Management
- Technical and Business Metadata Management
- Enterprise Master Patient Index
- Raw Data (Lake) Persistence
- Dynamically build metadata in real time
- Tag-based policy enforcement
- Standardized Tenant Feed-Specific Data Carpentry
- Data Masking
- Outlier Detection
- Product-specific Rules
- Tenant-based Policies
- Metadata-based data cataloging and lineage discovery
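Several of the activities above (who can access what, tenant-based policies, tag-based policy enforcement and data masking) fit together naturally: columns carry classification tags, and policies decide per tag and role whether a value is shown in clear or masked. In the Hortonworks stack this is usually split between Apache Atlas (classifications) and Apache Ranger (tag-based access and masking policies). The pure-Python sketch below only mimics the idea and does not use either product's API; the tag names, roles and masking rule are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical column classifications, standing in for tags a metadata
# catalog such as Apache Atlas might attach to columns.
COLUMN_TAGS: Dict[str, Set[str]] = {
    "tenant_id": set(),
    "patient_name": {"PII"},
    "mrn": {"PII"},
    "heart_rate": {"VITAL"},
}

# Hypothetical tag-based policy: roles allowed to see each tag in clear.
UNMASKED_ROLES: Dict[str, Set[str]] = {
    "PII": {"clinician"},
    "VITAL": {"clinician", "analyst"},
}


def mask_value(value: str) -> str:
    """Replace all but the last two characters with '*' (short values fully)."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def enforce(row: Dict[str, str], role: str, tenant: str) -> Dict[str, str]:
    """Apply tenant isolation, then tag-based masking, to a single row."""
    # Tenant-based policy: users only see rows that belong to their own tenant.
    if row.get("tenant_id") != tenant:
        raise PermissionError("row belongs to a different tenant")
    result: Dict[str, str] = {}
    for column, value in row.items():
        tags: Optional[Set[str]] = COLUMN_TAGS.get(column)
        if tags is None:
            # Deny by default: columns missing from the catalog are masked.
            result[column] = mask_value(value)
        elif all(role in UNMASKED_ROLES.get(tag, set()) for tag in tags):
            # Clear text only if the role is allowed for every tag on the column.
            result[column] = value
        else:
            result[column] = mask_value(value)
    return result
```

With these hypothetical policies, an analyst reading a row from their own tenant would see `heart_rate` in clear but `patient_name` and `mrn` masked, while a clinician would see all three.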
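The list does not say which outlier detection method the platform uses. As one illustrative choice (not necessarily the platform's), the sketch below flags device readings whose modified z-score, computed from the median absolute deviation (MAD), exceeds 3.5, a commonly used cutoff. Median-based statistics are less distorted by the very outliers they are meant to find than the mean and standard deviation are.

```python
from statistics import median
from typing import List


def mad_outliers(values: List[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds `threshold`."""
    if not values:
        return []
    med = median(values)
    # Median absolute deviation from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so the score is undefined;
        # flag every value that differs from the median instead.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]
```

For example, on heart-rate readings `[72, 75, 71, 74, 73, 250, 76]` the median is 74 and the MAD is 2, so only the reading of 250 (index 5) is flagged.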
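An Enterprise Master Patient Index links the local record numbers that different tenants assign to the same person. Production EMPIs typically rely on probabilistic matching over many demographic fields; the toy sketch below uses a much simpler deterministic rule (normalized first name, last name and date of birth) purely to show the cross-referencing idea. The class name, the `EMPI-` ID format and the matching rule are all hypothetical.

```python
import itertools
from typing import Dict, Optional, Tuple

MatchKey = Tuple[str, str, str]


class MasterPatientIndex:
    """Toy deterministic EMPI: same normalized name and DOB -> same enterprise ID."""

    def __init__(self) -> None:
        self._by_key: Dict[MatchKey, str] = {}
        self._by_local: Dict[Tuple[str, str], str] = {}
        self._counter = itertools.count(1)

    @staticmethod
    def _key(first: str, last: str, dob: str) -> MatchKey:
        # Normalize case and surrounding whitespace before matching.
        return (first.strip().lower(), last.strip().lower(), dob.strip())

    def register(self, tenant: str, local_mrn: str, first: str, last: str, dob: str) -> str:
        """Return the enterprise ID for this patient, creating one if needed."""
        key = self._key(first, last, dob)
        eid = self._by_key.get(key)
        if eid is None:
            eid = f"EMPI-{next(self._counter):06d}"
            self._by_key[key] = eid
        # Cross-reference the tenant's local MRN to the enterprise ID.
        self._by_local[(tenant, local_mrn)] = eid
        return eid

    def lookup(self, tenant: str, local_mrn: str) -> Optional[str]:
        """Find the enterprise ID for a tenant's local MRN, if registered."""
        return self._by_local.get((tenant, local_mrn))
```

Registering "Jane Doe" from one hospital and " jane DOE" with the same date of birth from another would yield the same enterprise ID, so downstream analytics can treat both feeds as one patient.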
Measure and Monitor:
- Ensure Regulatory Compliance
- Conformance with Policies, Standards and Data Principles
- Data Governance KPIs
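Governance KPIs are easiest to compare when they are computed the same way for every tenant. As a hypothetical example, the sketch below measures completeness: the fraction of records in which every required field is present and non-empty, both overall and per tenant. The field names in `REQUIRED_FIELDS` are assumptions, not the platform's schema.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence

# Hypothetical required fields for an incoming record.
REQUIRED_FIELDS: Sequence[str] = ("mrn", "tenant_id", "timestamp")


def is_complete(record: Mapping[str, object], required: Sequence[str] = REQUIRED_FIELDS) -> bool:
    """True if every required field is present and neither None nor empty."""
    return all(record.get(f) not in (None, "") for f in required)


def completeness_kpi(records: List[Mapping[str, object]]) -> float:
    """Fraction of records that are complete."""
    if not records:
        raise ValueError("completeness is undefined for an empty batch")
    return sum(is_complete(r) for r in records) / len(records)


def completeness_by_tenant(records: List[Mapping[str, object]]) -> Dict[str, float]:
    """Completeness KPI per tenant; records without a tenant go to 'UNKNOWN'."""
    groups: Dict[str, List[Mapping[str, object]]] = defaultdict(list)
    for r in records:
        groups[str(r.get("tenant_id") or "UNKNOWN")].append(r)
    return {tenant: completeness_kpi(rs) for tenant, rs in groups.items()}
```

Tracking such a number per tenant over time makes it easier to spot a feed whose quality is degrading.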
Tools Used to Power the Platform:
Hortonworks Big Data Governance Stack
- Apache Falcon
- Apache Atlas
- Apache Ranger
- Apache Knox
- Apache Solr
OpenLDAP for authentication, integrated with Knox (with trust established between the tenants and the platform)
PostgreSQL (authorization at the web layer)
Elasticsearch (web access logs)
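With authentication handled upstream (OpenLDAP through Knox), the web layer still has to decide what an authenticated user may do. One common pattern is a user-role and role-permission lookup scoped by tenant. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server; the table and column names are hypothetical. With PostgreSQL, the same query could be issued through a driver such as psycopg2, which uses `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical authorization schema: users hold roles within a tenant,
# and roles grant actions on web resources.
SCHEMA = """
CREATE TABLE user_roles (username TEXT, tenant_id TEXT, role TEXT);
CREATE TABLE role_permissions (role TEXT, resource TEXT, action TEXT);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative grants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO user_roles VALUES (?, ?, ?)", [
        ("alice", "HOSP_A", "clinician"),
        ("bob", "HOSP_B", "analyst"),
    ])
    conn.executemany("INSERT INTO role_permissions VALUES (?, ?, ?)", [
        ("clinician", "patient_dashboard", "read"),
        ("analyst", "aggregate_reports", "read"),
    ])
    return conn


def is_authorized(conn: sqlite3.Connection, username: str, tenant_id: str,
                  resource: str, action: str) -> bool:
    """True if the user holds, within this tenant, a role granting the action."""
    row = conn.execute(
        """
        SELECT 1
        FROM user_roles ur
        JOIN role_permissions rp ON rp.role = ur.role
        WHERE ur.username = ? AND ur.tenant_id = ?
          AND rp.resource = ? AND rp.action = ?
        LIMIT 1
        """,
        (username, tenant_id, resource, action),
    ).fetchone()
    return row is not None
```

Because the tenant is part of the lookup, a clinician authorized in one hospital gains no access to another hospital's dashboards, which matches the tenant isolation the platform aims for.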