Senior Data Engineer / Real Estate Investment Platform

Location: New York, NY
Date Posted: 07-10-2017
An innovative software startup building a technology-driven trading platform for real estate investment, powered by machine learning and data science, is seeking a Senior Data Engineer to help build out the team and platform. We are quantifying and automating every step of the investment process, from discovery to due diligence to asset management, through a data aggregation engine. We currently use a combination of Tableau, Mapbox, and D3. The platform opens up the commercial class to investors beyond large institutions. This is a very well-funded startup with respected investors such as Andreessen Horowitz and Khosla Ventures. The engineering team is world class, with individuals from Google as well as a who's who of the NY pre-IPO startup world.

Responsibilities

  • Work closely with fellow data and software engineers to scale our data processing platform to handle a wide variety of structured and unstructured data sources
  • Design, implement, and scale the end-to-end pipelines powering our data-driven investment models
  • Feed processed data into visualization tools to generate market- and asset-level insights
  • Leverage machine learning techniques to increase the performance of our existing models
  • Work on novel ways to visualize and present data to business stakeholders

Requirements

  • BS, MS, or PhD in computer science, engineering, or another related field
  • 5+ years of experience with data-intensive backend programming
  • Understanding of experimental design and the ability to build for measurement and interpretation of results
  • Ability to write clean, well-structured, production-quality code in Python, Java, Scala, or similar
  • Experience building and deploying large-scale ETL pipelines, from data ingestion and processing to storage and validation
  • Experience with SQL and NoSQL data stores (Postgres, Redshift, Redis, Vertica, Cassandra, etc.)
  • Experience scraping websites with structured and unstructured data

Bonus requirements

  • Large-scale processing frameworks (Hadoop, Spark, Pig, HBase)
  • pandas, NumPy, SciPy, scikit-learn
  • Experience building machine learning pipelines at large scale

Modern, service-oriented stack, continuously integrated and deployed (Jenkins, Ansible) on AWS: React + Redux, Immutable.js, Stylus, Node, Koa, Python, Django, with a combination of SQL (Postgres) and NoSQL (Redis) data stores. We believe in using the best tool for the job, so we are always refining our toolset.
