Connect and model complex distributed data sets to build repositories, such as data warehouses and data lakes, using appropriate technologies.
Manage data-related work across a range of contexts: small to large data sets; structured, unstructured, or streaming data; extraction, transformation, curation, and modelling; building data pipelines; selecting the right tools; and writing SQL, Java, or Scala code.
• Create and maintain an optimal data pipeline architecture
• Assemble large, complex data sets that meet functional and non-functional business requirements
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
• Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'big data' technologies
• Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics
• Work with stakeholders, including management, domain leads, and teams, to assist with data-related technical issues and support their data infrastructure needs
• Keep data secure
• Create data tools for analytics and data science team members
• Work with data and analytics experts to strive for greater functionality in data systems
Minimum of 6 years' hands-on experience with a strong data background
Solid development skills in Java, Scala, and SQL
Clear hands-on mastery of big data and database systems: the Hadoop ecosystem; cloud technologies (e.g. AWS, Azure, Google Cloud); in-memory database systems (e.g. SAP HANA, Hazelcast); traditional RDBMSs (e.g. Teradata, SQL Server, Oracle); and NoSQL databases (e.g. Cassandra, MongoDB, DynamoDB)
Practical knowledge of data extraction and transformation tools: traditional ETL tools (e.g. Informatica, Alteryx) as well as more recent big data tools