Master Thesis
RAW Labs is a rapidly expanding Swiss enterprise data technology company, spun out of École Polytechnique Fédérale de Lausanne (EPFL) by Prof. Anastasia Ailamaki and a team of highly successful engineers and scientists from organizations including CERN, Cisco and Salesforce.
At RAW Labs we have developed innovative technologies to interrogate massive quantities of data in different formats, held in a variety of data stores across an enterprise infrastructure and in the Cloud. Building on this core technology, RAW Labs has created a cloud-based Data Sharing platform for creating and maintaining APIs. The RAW platform enables our customers to exploit all forms of data to create curated Data Products via our DataOps infrastructure, and to share data securely in hours, not days. Enterprises use the RAW Labs platform to drive ML/AI, business intelligence and data analytics applications without having to build and maintain complex data engineering infrastructures.
RAW Labs is funded by a group of highly sophisticated and experienced technology investors and is advised by technology luminaries including Prof. Martin Odersky (creator of Scala), Prof. Mike Franklin (co-creator of Spark) and Dr. Alon Halevy (from Facebook’s AI team).
Where you’ll be
Our R&D team is based in two development centers: one in Lausanne, Switzerland, and the other in Athens, Greece. Successful applicants will work in or near either office, with remote working available by agreement. Note that access to one of the offices will be required for face-to-face meetings with your project supervisor.
About the Role
We are seeking a number of highly talented, innovation-driven Interns to support our research and further differentiate the RAW platform.
Each Intern will be assigned to one of several projects (see below), based on preference, availability, suitability and business priorities.
As an Intern you will be assigned a technical supervisor, who will typically be a senior engineer from our staff. Whilst you are working with us, you will be fully integrated into the core engineering team and hence will get to see how a commercial technology company develops its product. And, of course, if you excel during the Internship period, there will be opportunities to join us permanently as we grow.
First and foremost, we are looking for candidates with a passion for new technology, an inquisitive mind, a self-starting approach and a can-do attitude. You may be working on problems that have not yet been solved, or have been only partially solved. We are looking for:
- Master’s candidates only, in Computer Science or a related field.
- Commercial experience is a benefit, not a prerequisite.
- Evidence of an innovative project you have undertaken in the software and/or data space.
- Great oral and written communication skills, preferably in English, but we have French, Greek and Portuguese speakers too.
For technical skills, we can tailor the project to the successful candidate’s experience. Here are some of the technologies we use currently:
- Scala and/or Java and/or Kotlin, and SQL.
- Distributed / big data development, especially with Spark
- Development of Visual Studio Code Extensions
- Cloud service provider stacks, e.g., AWS
- Benchmarking and profiling tools, e.g., JMH, Apache JMeter
- Container technologies such as Kubernetes and/or Docker
- CI/CD tools, e.g., Jenkins and Artifactory, and DevOps tooling, e.g., Terraform, Docker Compose and Ansible
- Security frameworks/libraries/providers, e.g., Auth0
Please indicate in your application which of the following projects interest you (you may select more than one):
1. GraalVM/Truffle code generator (Expected duration: 6m-12m)
For Whom: Students interested in gaining hands-on experience with compilers and database architecture. Duration: Flexible, but a minimum of 3 months; ideally 6 to 12 months.
RAW’s query engine implements just-in-time code generation techniques, and additionally supports multiple experimental code emitters. The goal of this internship is to contribute to an experimental Truffle-based code generator for the RAW query engine.
This project requires knowledge of Java and basic knowledge of compiler design (e.g., a university course on compilers). Knowledge of database engine design is desirable. Experience with GraalVM and Truffle in particular, or a strong desire to learn them, is a big plus, so please mention it.
We provide learning material, so above all this project requires a willingness to learn, implement and experiment.
2. AI/ML for Schema Discovery
For Whom: Students interested in gaining hands-on experience in the development of practical AI/ML models. Duration: Flexible; expect 2 to 6 months.
A major feature of RAW’s query engine is the ability to read “raw” data (CSV, XML, JSON, etc.) with complex structures directly from source, without any prior processing. Among other things, this requires a sophisticated schema detection framework, for which we have developed a schema inferrer.
The goal of this project is to advance the use of AI/ML techniques in the inferrer so that it detects the structure and schema of typical datasets in a performant manner, e.g., character encoding, headers, delimiters, data types, and formats of dates, numbers, nulls, etc. It is an opportunity to develop novel techniques, which may lay the foundation for future research-level work.
This project requires previous experience in building AI/ML models. No specific toolchain is required. Above all, this project requires a deep interest in developing and experimenting with new techniques, as well as in evaluating them carefully.