Senior Data Engineer (Remote)
vidIQ helps YouTube creators and brands generate more views and subscribers while saving time. With over 1 million weekly active users, we are the #1 Chrome extension for YouTube creators. Our clients include Red Bull, BuzzFeed, PBS, TMZ, and the BBC, as well as hundreds of thousands of the largest YouTube creators in the world. We're backed by top Silicon Valley investors, including Scott Banister and Mark Cuban. vidIQ is profitable, with a fully remote team of over 25 employees and growing.
Role & Responsibilities
vidIQ is seeking a highly motivated Senior Data Engineer with 5+ years of hands-on data engineering experience to join our growing team. The ideal candidate is a go-getter who can work independently. In this role, you will oversee data partitioning, ETL pipeline development, data compaction, and AWS optimization.
You must be highly collaborative and a self-starter who can thrive in a fast-paced environment. Strong communication skills are essential, because you will advise the back-end team on where and how to implement data integration and persistence. You will also report to management on the volumes of data we are gathering, and explain to both the team and management where the data can be accessed and how to use it.
You’ll be a good fit for this role if the following are true:
You love building things. You like new challenges and strive to ship new features to customers on a regular basis.
You love to learn. You enjoy keeping up with the latest trends. If a project uses a tool that’s new to you, you dive into the docs and tutorials to figure it out.
You act like an owner. When bugs appear, you document and fix them. When projects are too complex, you work with others to refine the scope until it’s something you believe can be built in a reasonable amount of time and maintained in the long run.
You care about code quality. You believe simple is better and strive to write code that is easy to read and maintain. You consider edge cases and write tests to handle them. When you come across legacy code that is difficult to understand, you add comments or refactor it to make it easier for the next person.
You understand balance. Great products must balance performance, customer value, code quality, dependencies, and so on. You know how to consider all of these concerns while keeping your focus on shipping things.
You over-communicate by default. If a project is off-track, you bring it up proactively and suggest ways to simplify and get things going. You proactively share status updates without being asked and strive to keep things as honest and transparent as possible.
Requirements
- 5+ years of experience using Python for internal data pipelines (moving data within our AWS account)
- Experience with Scala (including FS2 and http4s) for external data pipelines (moving data from outside sources into our AWS account)
- Additional experience with DynamoDB, Lambda, Athena, S3, and AWS Glue
- Familiarity with Spark (Scala only at the moment) preferred
- Hands-on experience with data workflow orchestration (Airflow)