Big Data Developer (Media Measurement)
Our client is a company pioneering the future of cross-platform media measurement, arming organizations with the insights they need to make decisions with confidence. Central to this aim are its people, who work together to simplify the complex on behalf of clients and partners.
It is a trusted partner for planning, transacting and evaluating media across platforms. With a data footprint that combines digital, linear TV, over-the-top and theatrical viewership intelligence with advanced audience insights, its platform allows media buyers and sellers to quantify multiscreen audience behavior and make business decisions with confidence.
You’ll be responsible for building the next-generation data delivery platform. The API drives business growth by providing access to television ratings metrics for a broad range of clients and products, including industry-leading ad agencies and national television networks. As a member of this fast-moving team you’ll have a large impact on the evolution and adoption of the API as well as on the success of the business. It’s worth mentioning that the company processes and stores dozens of petabytes of data collected from the web, and its current infrastructure handles 15 billion requests per day.
The service consists of two applications connected by a queue. The first is the API web service, which accepts user requests (with JSON payloads), validates them and pushes them to a job queue. The second is the data engine, implemented as a long-running Spark application deployed on an EMR cluster. The data engine pulls job requests from the queue, builds Spark jobs from the JSON payloads and runs those jobs on the cluster. When a job is done, the result is pushed back to the web service via the queue and returned to the client. It is important to note that all the data the engine uses is already preprocessed, pre-aggregated and stored as Parquet files, so the only thing the engine does is run client queries against this data.

As a result, the work mostly involves the Spark engine and the API (we use Akka here), performance testing and optimization (again, both API and engine), messaging (we are currently moving from SQS to a custom Redis-based implementation) and DevOps (Jenkins, Docker). Scrum is our development methodology. Tickets are not assigned strictly; you can always pick whichever one from the backlog interests you.
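To make the request flow concrete, here is a minimal sketch of the two-application pattern described above. The real system is a Scala/Akka web service and a Spark engine on EMR talking over SQS/Redis; this simulation runs both sides in one Python process with in-memory queues, and all names and payload fields are hypothetical.

```python
import json
import queue

# Hypothetical in-process stand-ins for the production message queues
# (SQS today, a Redis-based implementation going forward).
job_queue = queue.Queue()     # API web service -> data engine
result_queue = queue.Queue()  # data engine -> API web service

def api_submit(payload):
    """API side: validate a client request and enqueue it as a JSON job."""
    if "metric" not in payload or "network" not in payload:
        raise ValueError("request must name a metric and a network")
    job_queue.put(json.dumps(payload))  # JSON payload travels over the queue

def engine_step(rows):
    """Engine side: pull one job and run its query against preloaded rows.

    In production this builds a Spark job over pre-aggregated Parquet data;
    here we just filter and sum in-memory rows to show the shape of the flow.
    """
    job = json.loads(job_queue.get())
    total = sum(r[job["metric"]] for r in rows if r["network"] == job["network"])
    result_queue.put(json.dumps({"network": job["network"], job["metric"]: total}))

# Pre-aggregated "Parquet" data, simulated as plain dicts.
rows = [
    {"network": "ABC", "ratings": 3.2},
    {"network": "ABC", "ratings": 1.8},
    {"network": "NBC", "ratings": 2.5},
]

api_submit({"metric": "ratings", "network": "ABC"})
engine_step(rows)
result = json.loads(result_queue.get())
print(result)  # {'network': 'ABC', 'ratings': 5.0}
```

The key design point the sketch preserves: the web service never runs queries itself, it only validates and enqueues, so the compute-heavy engine can scale independently on the cluster.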
What You’ll Do
- Work within an agile team to develop new endpoints and enhancements to the API;
- Recommend and implement creative solutions for improving query response times for large data sets;
- Increase scalability and maintainability to support rapid usage growth;
- Collaborate openly with stakeholders and clients to continuously improve the product and increase adoption;
What You’ll Need
- Experience in the design and development of web-based APIs;
- Our API is written in Scala, so experience with functional languages like Scala or Haskell is preferred (but not required);
- We are building everything in the cloud, so experience building, deploying and managing applications in AWS is preferred;
- Experience using Apache Spark is preferred;
- Strong SQL skills are nice to have;
- Strong communication skills (written and verbal) along with a track record of success delivering large software projects;
- Demonstrated knowledge of commonly used software engineering concepts, practices, and procedures;