About this role
<div> <h2>Junior Data Engineer</h2> <h3>Responsibilities</h3> <h4>1. Data Pipeline Development & Maintenance</h4> <ul> <li>Participate in the development and optimization of both batch and real-time ETL/ELT processes.</li> <li>Support the reliable extraction, transformation, cleansing, and loading of data from various business systems into the data warehouse and data lake.</li> </ul> <h4>2. Data Modeling & Data Asset Development</h4> <ul> <li>Assist in dimensional modeling for enterprise data warehouses under the guidance of senior team members.</li> <li>Contribute to the design and development of fact tables, dimension tables, and aggregate tables.</li> <li>Maintain standardized data dictionaries, technical documentation, and data governance artifacts.</li> </ul> <h4>3. Data Quality Assurance</h4> <ul> <li>Support the implementation of data quality monitoring frameworks.</li> <li>Assist in defining and maintaining validation rules for data completeness, accuracy, consistency, and timeliness.</li> <li>Work with the team to identify, analyze, and resolve data quality issues, following each issue through to closure.</li> </ul> <h4>4. Data Platform Technology Implementation</h4> <ul> <li>Learn, evaluate, and help implement modern data platform technologies such as Spark, Flink, Kafka, Airflow, and Snowflake.</li> <li>Help build CI/CD pipelines and automated testing for the data platform, and promote engineering best practices in data platform development.</li> </ul> <h4>5. 
Cross-functional Collaboration & Delivery</h4> <ul> <li>Work closely with Product, Data Analytics, and Backend Engineering teams to review requirements and take part in technical design discussions.</li> <li>Translate complex business requirements into scalable and maintainable data solutions.</li> <li>Provide support for day-to-day platform operations, troubleshooting, and incident resolution.</li> </ul> <hr> <h2>Qualifications</h2> <h3>Education</h3> <ul> <li>Class of 2026 graduate with a Bachelor’s degree or higher from an accredited domestic or international university.</li> <li>Preferred majors include Computer Science, Software Engineering, Mathematics, Statistics, Data Science, or related disciplines.</li> </ul> <h3>Programming & Database Fundamentals</h3> <ul> <li>Solid foundation in computer science principles and database concepts.</li> <li>Proficiency in SQL for data querying and manipulation.</li> <li>Strong proficiency in at least one mainstream programming language (Python, Java, or Scala).</li> <li>Good coding habits, sound software engineering practices, and solid object-oriented programming skills.</li> </ul> <h3>Big Data Technology Knowledge</h3> <ul> <li>Strong interest in big data technologies and data engineering.</li> <li>Familiarity with the core concepts and use cases of Hadoop, Spark, Flink, Kafka, and related ecosystem components.</li> <li>Experience with cloud platforms (AWS, GCP, Azure) and/or cloud data warehouses (Snowflake, BigQuery) is a plus.</li> </ul> <h3>Data Engineering Fundamentals</h3> <ul> <li>Understanding of data warehousing concepts, including dimensional modeling, star schema, and snowflake schema design.</li> <li>Familiarity with Linux operating systems, common shell commands, and Git version control.</li> <li>Knowledge of Docker containerization and CI/CD concepts is a plus.</li> </ul> <h3>Soft Skills</h3> <ul> <li>Excellent analytical thinking and problem-solving abilities.</li> <li>Strong communication 
skills with the ability to explain technical concepts clearly to non-technical stakeholders.</li> <li>Demonstrates accountability, ownership, and a proactive mindset.</li> <li>Ability to work effectively in a fast-paced environment and manage multiple priorities simultaneously.</li> </ul> <h3>Preferred Qualifications</h3> <ul> <li>Previous internship experience in Data Engineering, Data Platform, Data Warehouse, or Big Data-related roles.</li> <li>Contributions to open-source projects.</li> <li>Outstanding achievements in data-related competitions, hackathons, or technical challenges.</li> </ul> </div>