Data Scientist Interview Offline Exercise

Instructions

  1. Before you begin working on the exercise, be sure to go over the Data Explanation section below. We would advise reading the Additional Notes section as well.
  2. The Jupyter notebook exercise.ipynb contains all the questions (5 in total) for the exercise. Use that same notebook as the primary space for your answers. The data for answering the questions is in the touches.sqlite3 file. Make sure not to commit your answers to this repo; instead, copy the notebook and the sqlite3 file into a different folder, write your answers in the copied Jupyter notebook, and then send the notebook to the recruiter or upload it to Greenhouse (you do not need to send back or upload the sqlite3 file).
  3. You’re welcome to use whatever Python libraries and packages you love. You’re also welcome to write your own Python libraries and modules and import them into the notebook - just make sure to send that extra code in as well.
  4. If you're not confident in your Python skills, write the answers separately in your favorite language and send that code as well. Even if you choose this option, use the Jupyter notebook to explain your work and document where to find that code. If your work involves charts and graphs, send them too and document in the notebook where they are and what they represent. If your work involves SQL queries, you can simply write the query in a markdown cell in the notebook to show us what you did.
  5. Within the Jupyter notebook, make sure to use markdown cells and comments liberally to explain what you're doing and why. We care more about how you answer a question than about what the correct answer is. We also love good communicators!
  6. The 5 questions build up from easy to hard. Please spend the amount of time on them that you feel is reasonable. We estimate the first four questions will take 1-2 hours to solve in total. The last question could take anywhere between half an hour and several hours, depending on how deep you want to go.
  7. IMPORTANT: If you're applying for an Associate Data Scientist position, you only need to complete questions 1, 2 and 3. If you're applying for a Senior Data Scientist position, not completing all of the questions will put you at a disadvantage for the next steps.
  8. PRO TIP: Throughout the exercise, using smarter and more efficient SQL queries than just "SELECT * FROM <table>;" will put you at an advantage among candidates (see the sketch right after this list).
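
For instance, here is a minimal sketch in Python, assuming only the touches table and its status column described in the Data Explanation below (double-check the names against the actual schema):

```python
import sqlite3
import pandas as pd

# Aggregating in SQL pulls far less data into pandas than "SELECT *"
# followed by a groupby. Table/column names come from the Data
# Explanation section; verify them against the real schema first.
conn = sqlite3.connect("touches.sqlite3")
touch_counts = pd.read_sql(
    "SELECT status, COUNT(*) AS n_touches FROM touches GROUP BY status;",
    conn,
)
print(touch_counts)
conn.close()
```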

Data Explanation

If you're a data scientist, you have almost certainly seen the oft-cited Venn diagram with a circle for Math/Stat, another for Programming/Algorithms and a third for Business Domain. At SalesLoft, we believe you cannot be a good data scientist if you do not make a meaningful effort to understand the domain. This section explains a bit of that business domain, particularly as it pertains to this exercise.

Sellers use our platform to get in touch with their prospective customers. They do that by executing a series of touches (like emails, calls, etc.) to contact people working at the companies they're targeting. Sometimes the people they are contacting go cold and do not respond at all - they do not reply to emails or pick up phone calls. Sometimes, even if they do respond, they are not the right people and/or do not have leverage. Yet others are interested in the sales pitch they hear and want to move forward. When that happens, an opportunity gets created, and the sellers begin understanding the prospective customer's pain points, business space, etc., and evaluating whether they fit an ideal customer profile. Simultaneously, they also negotiate the various terms of the contract. When the customer finally signs the contract, the opportunity ends in a "Closed Won" stage; if the deal dies, the opportunity ends in "Closed Lost" (e.g., negotiations failed or the company went with a competitor) or "Disqualified" (e.g., the company did not fit the ideal customer profile).

In the SQLite database we have provided (touches.sqlite3), you will find four tables - companies, persons, touches and opportunities. Every table has an id column that is referenced by other tables (for example, the id column in companies is linked to the company_id column in persons); a short join sketch follows the list below. You can pretend that this is data from a hypothetical sales team using SalesLoft.

  • the companies table records all the prospective customer companies that this team has targeted or plans to target in the future.
  • the persons table records information about people working at those companies.
  • the touches table records information about the various types of touches scheduled to be executed against the people the team is targeting, and when each contact was scheduled. When the status column in the table has the value 'completed', it means the touch was executed (e.g., the email was sent or the call was made).
  • the opportunities table records all the opportunities that this hypothetical sales team has been able to create. When the latest_stage column is 'Closed Won', a deal was successfully signed with that specific company. When latest_stage is 'Closed Lost' or 'Disqualified', the opportunity was ultimately unsuccessful. All other values of latest_stage indicate that the opportunity is still in flight and being worked on.
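
As a rough illustration of how these tables might be stitched together - the persons.company_id link is stated above, but the touches.person_id column is an assumption you should verify (e.g., with PRAGMA table_info(touches)) - counting completed touches per company could look like this:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("touches.sqlite3")

# companies.id -> persons.company_id is documented above; the
# touches.person_id foreign key is an assumption to confirm against
# the actual schema before relying on this query.
completed_per_company = pd.read_sql(
    """
    SELECT c.id AS company_id,
           COUNT(t.id) AS completed_touches
    FROM companies AS c
    JOIN persons AS p ON p.company_id = c.id
    JOIN touches AS t ON t.person_id = p.id
    WHERE t.status = 'completed'
    GROUP BY c.id;
    """,
    conn,
)
print(completed_per_company.head())
conn.close()
```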

Additional Notes

The data we are giving you is in the touches.sqlite3 file. If you are unfamiliar with this file format, check out the following:

  • You can use an online viewer such as SQLite Viewer, or a Mac app such as Sqlite Browser, to view the tables in the database, search them, etc.
  • Some database management tools, such as TablePlus, will happily open that file for you and let you query its tables. This is not super necessary, though, as you can query it through Python.
  • If you code in Python but have never used SQLite before, check out Python's built-in sqlite3 interface for SQLite databases. You can safely use it with the pandas read_sql function, as sketched below.
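
A minimal sketch of that workflow, using only the standard library and pandas (the persons table name comes from the Data Explanation above; sqlite_master is SQLite's built-in schema catalog):

```python
import sqlite3
import pandas as pd

# Open the provided database and list its tables via sqlite_master,
# then peek at a few rows of one table through pandas.
conn = sqlite3.connect("touches.sqlite3")
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table';", conn
)
print(tables)

sample = pd.read_sql("SELECT * FROM persons LIMIT 5;", conn)
print(sample)
conn.close()
```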