Data Platform Engineer / Cloud Architect

REMOTE

Web3 Builders is a Smart Contract/NFT Trust and Security Startup (Stealth) 

As a Data Platform Engineer / Cloud Architect, you'll be responsible for the data platform architecture, infrastructure, and core data services on which Web3Trust products are built, making it easy to access, process, and transform data for analytics. The ideal candidate possesses a broad skill set, including proficiency with Python, SQL, data pipelines, data modeling, and distributed systems, as well as excellent communication skills.

You will work closely with Data Scientists and Product Managers to understand requirements and build data infrastructure that includes, but is not limited to, data streams, a data lake, and a data warehouse, as well as the core data services that manage ingestion, storage optimization, processing, and consumption of data, all while maintaining the highest data quality, observability, and reliability.

This is a fantastic role for a skilled platform builder who wants to jump into an extremely fast-cycle deployment world at the forefront of blockchain tech. Your energy and enthusiasm to work and grow alongside our high-trajectory team are essential!

What You'll Be Doing

  • Develop and maintain scalable batch and real-time data streaming systems with which data producers and consumers can integrate seamlessly

  • Develop and maintain core data services which process streams and write to various storage solutions optimized for analytics

  • Develop and maintain APIs for serving data at scale in both real-time and batch manner

  • Implement the three pillars of observability (logs, metrics, and tracing) in the data platform to make debugging a pleasure, not a pain :)

  • Ensure data quality throughout the data platform by automating quality checks

  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

  • Collaborate with Data Integration Engineers and Data Scientists to improve data collection methodologies, data pipelines, data quality, and data accessibility

  • Write unit/integration tests, contribute to engineering wiki, and document work.

  • Perform the data analysis required to troubleshoot and resolve data-related issues.

  • Work closely with all business and product teams to develop a strategy for the long-term data platform architecture.

  • Collaborate with data scientists to develop and maintain infrastructure and processes that support the full ML lifecycle, from development through deployment and monitoring of models in production

What You Bring

  • 5+ years of experience in a Data Engineer / Platform Engineer role, plus a B.S. or M.S. degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field

  • 5+ years of experience in Python, with working knowledge of building fault-tolerant, real-time, and batch data processing pipelines

  • 3+ years of experience building systems in the cloud (AWS, GCP, or Azure), with competency in core computing and big data services (e.g., EC2, Lambda, S3, EMR, RDS, Redshift, and Athena in AWS)

  • Strong experience architecting reliable, scalable, and observable distributed ‘big data’ systems

  • Strong experience processing data in structured and unstructured formats, e.g. XML, EDI, JSON, CSV, Parquet, ORC, and Avro

  • Strong experience with message queuing, stream processing, and highly scalable ‘big data’ stores

  • Strong experience working with data streaming platforms such as Kafka, Pulsar, GCP Pub/Sub, or AWS Kinesis

  • Experience developing, managing, and deploying containerized services

  • Experience working with big data processing frameworks such as Spark, Dask, Flink, or Presto/Trino

  • A winning, can-do attitude: takes responsibility, is friendly and approachable, is ready to tackle challenges individually, and, last but not least, is an excellent team player

  • Experience with production-grade MLOps (scalable model deployment, versioning, and monitoring)

Bonus points

  • Familiarity with Blockchains and their data structures

  • Familiarity with 3rd party APIs for extracting data from various blockchains (Infura, EtherScan, BlockChair, etc.)

  • A data science background with automation experience

  • Familiarity with the MLOps toolchain, including solutions for feature stores, data versioning, model versioning, model serving, model monitoring, and data drift detection

  • Experience with data pipeline and workflow orchestration tools such as Airflow, Prefect, or Dagster