Starting Your Project

Things to think about when deciding on project feasibility

"When selecting a project, the most important consideration is data availability." 1 If you don't have enough quality data ... good luck :)

Tip: Consider that data augmentation can alleviate both the need for more manual labelling and also protect you from problems with out-of-domain data (e.g. when unexpected image types arise in the data when the model is being used in production) by synthetically creating more data likely to be seen that may not be in your dataset as is.
Note: "... iterate from end to end in your project; don’t spend months fine-tuning your model, or polishing the perfect GUI, or labeling the perfect dataset"
2

This is good advice for any software project ...fail early and fail often. If you don't, you're likely to only uncover critical problems much later than you would have before, and even worse, you're likely to not produce anything at all! In the world of deep learning there are a number of tools, that while helpful, can really get you so bogged down that you never deploy something usable (e.g., experiment tracking tools, hyperparameter optimization libraries, etc...).

Also, remember that getting something in production is a different task from winning a kaggle competition, where the later may require use of some of those aforementioned tools and the ensembling of dozens of models. For production, something better than human is often good enough to get out there and through refactoring, improve.


The Drivetrain Approach 3

Four Steps

Step 1: Define your objective(s)

It's amazing how in my 20+ years as a developer, how rare it is that a customer is able to clearly define what they want! In my experience, more than not, it is the developers that end up defining the goals. Not having a clear objective is likely to waste time, energy, and money to produce something that won't even see the light of day. You can't gauge the completion or quality of any software project without clear objective(s).

Ex.1: Show most relevant search results.
Ex.2: Drive additional sales by recommending to customers items to purchase they otherwise wouldn't

Step 2: What actions can you take to achieve those objective(s)?

What things can make your goals a reality. Pretty simple.

Ex.1: Ranking the search results will help show the most relevants ones first.
Ex.2: Ranking the recommendations will help.

Step 3: What data is needed to take those actions?

If you don't have the data, you'll need to get it ... because the data pulls the levers which get you closer to your objective(s).

Ex.1: Seeing what how pages linked to other pages.
Ex.2: Collecting data on what customers purchased, what was recommended, and what they did with that info.

Step 4: Build models

Only once you have the data and know what actions you want to be able to take based on the information within it, do you being modeling ... first, defining what models you can even build with that data and second, what data you need to collect for models you can't.

Ex.1: A model that takes the page relation data and predicts a ranking given a query.
Ex.2: Two models that predict the purchasing proabilities conditional on seeing or not seeing a recommendation.


1. "Chaper 2: From Model to Production". In The Fastbook p.58

2. Ibid.

3. Ibid. pp.64-65