It is easier to ignore or move a problem around than it is to solve it 



1. It has to work

While this sounds like a no-brainer, I am amazed how many people, new and experienced, get carried away by fancy-sounding new names, or by the fact that something came out of DeepMind or OpenAI or Stanford/MIT/what have you. Participating in the Real World leaves no room for ideology or specific research agendas. If your fancy model does not work with the Real World's datasets, environments, and resource constraints, the Real World will mercilessly reject it. There are many results on arXiv that only work on a handful of datasets, or only work on a bajillion GPUs that only Google's infrastructure can support. Do the community a favor and stop publishing those as general results. It has to work. Not just as “kosher” science in your paper but also in others’ situations. It is for the same reason that we don’t think of doing anything in Computer Vision without ConvNets today, or that we readily use Attention with sequence models. It has to work.

Conjecture: So many people, especially folks new to ML, get carried away with fancy model names and can’t wait to try them, write blog posts about them, and so on. I think this is like someone newly learning to write: they think using big words will make their writing better, but experience will teach them otherwise.

2. No matter how hard you push and no matter what the priority, you can’t increase the speed of light
Cache hierarchies must be respected, network overheads will throw a wrench in your distributed training, there is only so much you can cram into a vector, and so on.
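
To make the network-overhead point concrete, here is a back-of-the-envelope sketch. It assumes data-parallel training that synchronizes gradients with a ring all-reduce, and it counts bandwidth only (no latency, no overlap with compute). The function name and every number in it are hypothetical, chosen for illustration.

```python
def ring_allreduce_seconds(num_params, num_workers, bandwidth_gbps,
                           bytes_per_param=4):
    """Bandwidth-only estimate of one gradient all-reduce, in seconds.

    Assumes a ring all-reduce, where each worker sends (and receives)
    2 * (N - 1) / N times the gradient payload. Latency is ignored,
    so real numbers will be worse.
    """
    if num_workers < 2:
        return 0.0  # a single worker has nothing to synchronize
    payload_bytes = num_params * bytes_per_param
    traffic_bytes = 2 * (num_workers - 1) / num_workers * payload_bytes
    bandwidth_bytes_per_s = bandwidth_gbps * 1e9 / 8  # Gbit/s -> bytes/s
    return traffic_bytes / bandwidth_bytes_per_s


# Hypothetical setup: 100M float32 parameters, 8 workers, 10 Gbit/s links.
sync_s = ring_allreduce_seconds(100_000_000, 8, 10.0)
print(f"{sync_s:.2f} s per step spent moving gradients")
```

With these made-up numbers the estimate comes out to roughly half a second per step, and pushing harder does not shrink it. Only a smaller payload, fewer synchronizations, or a faster link will.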

3. With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea
A sufficiently motivated graduate student or a large hyperparameter sweep at a massive datacenter can find a set of hyperparameters that makes a crazy complicated model work well or even produce outstanding results, but no one in the Real World ever ships models that are so hard to tune. A dirty secret I found while helping companies with their ML teams back when I was running Joostware: most did not know or care about hyperparameter tuning.
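
One rough way to put a number on "hard to tune" is to ask what fraction of a sweep lands near the best trial. The sketch below is only an illustration: the metric, the `tolerance` parameter, and the scores are all hypothetical, not something taken from a real sweep.

```python
def tuning_robustness(scores, tolerance=0.02):
    """Fraction of sweep trials whose score is within `tolerance` of the best.

    A value near 1.0 suggests almost any reasonable setting works;
    a value near 0.0 suggests the good result is a needle in a haystack.
    """
    if not scores:
        raise ValueError("need at least one trial score")
    best = max(scores)
    near_best = sum(1 for s in scores if best - s <= tolerance)
    return near_best / len(scores)


# Hypothetical validation accuracies from two random-search sweeps.
simple_model = [0.90, 0.91, 0.905, 0.85, 0.91, 0.908]
fancy_model = [0.93, 0.52, 0.61, 0.48, 0.70, 0.55]

print(f"simple: {tuning_robustness(simple_model):.2f}")
print(f"fancy:  {tuning_robustness(fancy_model):.2f}")
```

In this made-up example the fancy model has the best single trial, but only one configuration in six gets anywhere near it. That is the flying pig.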

4. Some things in life can never be fully appreciated nor understood unless experienced firsthand
Some things in machine learning can never be fully understood by someone who neither builds production ML models nor maintains them. No amount of courseware, MOOCs, or Kaggling will prepare you for that. There is no substitute for deploying a model, observing user interactions with the model, dealing with code/model rot, and so on.

5. It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases, this is a bad idea
End-to-end learning sounds like a good idea on paper, but for most deployment scenarios, pipelined architectures that are optimized piecewise are here to stay. That doesn’t mean we will not have end-to-end systems at all (speech recognition and machine translation have decent production-worthy end-to-end solutions), but for most situations having observable paths for debugging will trump other options.
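
As a minimal sketch of what an "observable path" can look like, the toy pipeline below keeps every intermediate output so a bad prediction can be traced to the stage that caused it. The stage functions and the `run_pipeline` helper are hypothetical stand-ins, not a real production design.

```python
def run_pipeline(stages, x):
    """Run (name, function) stages in order and record each output."""
    trace = []
    for name, fn in stages:
        x = fn(x)
        trace.append((name, x))  # keep the intermediate result for debugging
    return x, trace


# Toy stages for illustration only.
def tokenize(text):
    return text.lower().split()


def remove_stopwords(tokens):
    return [t for t in tokens if t not in {"the", "a", "is"}]


def classify(tokens):
    return "positive" if "great" in tokens else "negative"


stages = [
    ("tokenize", tokenize),
    ("remove_stopwords", remove_stopwords),
    ("classify", classify),
]

label, trace = run_pipeline(stages, "The movie is GREAT")
for name, output in trace:
    print(f"{name}: {output}")
```

When the final label is wrong, the trace shows whether tokenization, filtering, or the classifier is to blame. A single end-to-end model does not offer that for free.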

6. It is easier to ignore or move a problem around than it is to solve it
For example, in speech, acoustic modeling is hard, but you can let your network figure out those details on the way to solving a different problem (say speech recognition). In NLP, parsing is hard to get right. But thankfully, for 99% of Real World tasks, we can get by without parsing. In Vision, don’t solve a segmentation problem first if all you need is a classifier. The list is endless.

Corollary: Don’t solve a problem unless you absolutely have to.





Page last modified: 2024-07-06
