The third episode of A4N — the Artificial Neural Network News Network podcast — is out (listen on my website, on Apple Podcasts, Spotify, Google Podcasts, or YouTube). In this episode, our guest host Kirill Eremenko joins us to discuss SuperDataScience, his thriving data-science education business, and Vince introduces us to machine learning projects being applied to understand -- and preserve -- marine life in the oceans.
Our special guest today is Kirill Eremenko. Kirill is a Russian-born Australian, and Founder and CEO of SuperDataScience, an online educational portal for Data Scientists. Their mission is to “Make The Complex Simple,” and become the biggest learning portal for Data Science enthusiasts. Ever. He is also the Co-Founder of BlueLifeAI, Founder of the DataScienceGO conference, and hosts his own podcast, the SuperDataScience Podcast!
Part I: Scaling a Global Data Business with Kirill Eremenko
4:25 OmniFocus
4:36 SuperDataScience
5:25 Udemy
8:05 Deep Work
8:20 The Great CEO Within
17:57 DataScienceGO Virtual
22:10 Fake Ad for “TPML”
Part II: 23:00 Saving the Oceans with Machine Learning
24:30 A.I. Is Helping Scientists Understand an Ocean’s Worth of Data
30:18 Some like it hot - visual guidance for preference prediction
31:15 Merantix
32:10 Wirewax
33:05 Scale
36:00 VGGish
41:18 Jon Krohn’s LinkedIn
41:28 Twitter @JonKrohnLearns
41:49 jon@jonkrohn.com
41:52 email newsletter at jonkrohn.com
Transcript
Jon Krohn 0:05
Welcome to A4N, the Artificial Neural Network News Network, the show about the latest developments in artificial intelligence, machine learning and data science, where we both introduce technical aspects of these advances, as well as discuss their social implications. In today's episode, we'll be discussing creating successful data-science-focused businesses with the extraordinary Kirill Eremenko.
Kirill Eremenko 0:31
Please, that was too strong, but thank you. Thank you. Nice to meet everybody.
Jon Krohn 0:39
As well as talking about breaking news on how machine learning is being deployed to understand the world's oceans as well as to clean the oceans up. My name is Jon Krohn, I have a Canadian accent and I always use the word data as a plural term. So that's how you can identify me.
Vincent Petaccio 0:57
And I'm Vince Petaccio, the one with glasses.
Jon Krohn 1:01
They are really thick frames. Alright, so let's get started. We're deeply honored to have our special guest Kirill with us today. So I found out just before the program–this is mind blowing to me–I assumed that Kirill was on people's podcasts all the time, and that this was just like another day, another podcast for him. And in one respect, that's true, because Kirill has his own podcast, his SuperDataScience Podcast, and we'll talk about SuperDataScience a whole bunch in this program. And with SuperDataScience, Kirill has recorded 363 of his own podcast episodes. And on 180 of those he's had a guest. However, this is Kirill's second time ever as a guest on a podcast and the last time was four years ago in 2016. So what an honor to have you here. Tell us, where are you joining us from today and how are things going for you under the Coronavirus lockdown?
Kirill Eremenko 2:00
Thank you, Jon. Thank you very much. I want to preface this by saying that the honor is mine. I'm actually holding your book, Deep Learning Illustrated by Jon Krohn, right in my hands now. Funny enough, it arrived in my post the same day as–or the next day following when we recorded the podcast show, the SuperDataScience show. And yeah, so very exciting, looks like a really cool book with great illustrations. And of course, Vince, very lovely to meet you as well. You are a climate advocate, so I'm looking forward to discussing things today about oceans and things like that.
Vincent Petaccio 2:38
Yeah!
Kirill Eremenko 2:39
To answer your question, Jon, I'm calling from the UK so during this Coronavirus pandemic, I was fortunate to be with my girlfriend at the time when the lockdowns took place and we're a bit west of London about two hours West. We're staying in a small little village. So you know, just going through this pandemic together, and I'm quite grate–I'm very grateful for that. And hopefully things will settle down soon, so I can go back to Australia where I usually am and see my family there.
Jon Krohn 3:14
Yeah, we can tell from your strong Australian accent– (laughter)
Kirill Eremenko 3:17
No, no. Um, like my accent is a mix because I was born in Russia. I grew up in Africa, and I live in Australia now, so it's quite hard to tell, like for people where the accent is from sometimes.
Jon Krohn 3:31
And you're fortunate that your company, SuperDataScience is completely distributed, right? You guys are like–everyone is–work separately all the time, right, so this isn't a big disruption?
Kirill Eremenko 3:42
Yeah, no, it's a–it's totally like we adapted to this very quickly. We have 15 people on the team fully distributed. We cover 10 different time zones, eight different countries. And yeah, fully distributed. And luckily, as well, our students can study from anywhere. So we are continuing to provide services and products to our students and supporting them in this difficult time. And I think it's, it's a good time to learn, right? Like if you can't go anywhere, if you're sitting at home, and you've always wanted to study data science, this is the time to jump into it. I myself, I'm taking several courses, like I was just learning how to be more productive with this tool called OmniFocus. And I'm really enjoying learning online.
Jon Krohn 4:27
Tell us about OmniFocus in a second. But I also just want to make the audience aware of SuperDataScience and kind of what you guys do. So the SuperDataScience brand that you created–five years ago, roughly?
Kirill Eremenko 4:40
Mmm, yeah, I think around 2015, yeah.
Jon Krohn 4:43
Initially, I think, it was you recording data science videos. I think you did all the initial ones. Is that right?
Kirill Eremenko 4:51
Yeah, I think I did, like maybe 12, or a bit more courses, myself.
Jon Krohn 4:56
So that initial connection that you created, and then subsequent videos, your data science educational video tutorials, they range from data visualization, all the way through to deep learning. And SuperDataScience now has 1.1 million students, growing quickly at the time of recording.
Kirill Eremenko 5:13
Thank you. (laughter) Just to clarify, so people are not misled–we have, yeah, one, one point, over 1.1 million students through Udemy, where you can buy courses individually. So if you're interested in–just, like, trying out a specific topic in data science, whether it's visualization or deep learning, or machine learning, or I don't know, like, soft skills or whatever else in data science, you can pick this individual course and just purchase it on Udemy. Or you're after a specific topic, like you might be interested in BERT and how to use that framework for natural language processing. But for people who are interested in going further, we have a membership in SuperDataScience, where people can sign up and get access to all the courses, additional workshops and more in order to progress their career further, like learning paths and stuff like that. And that's where people who want to advance their career further go. And of course, the bulk, the majority of students that we have, over one million, is on Udemy. But then from there, we have a pathway through SuperDataScience as well.
Jon Krohn 6:20
You know, I checked that platform out. I checked the SuperDataScience platform out after we spoke. After I was on your podcast, I took your recommendation to sign up for the SuperDataScience newsletter, and then went to the platform. And it's very well set out there. You can access the platform for free, and I think there's quite a bit of content in there that is free. So, I definitely do recommend doing that. And then most of the courses, like the courses that people could be getting separately in Udemy, you can then pay in the SuperDataScience platform to access those specific programs, right?
Kirill Eremenko 6:53
Yes, yeah, absolutely. Yeah. So anybody indeed can check it out. We have trial memberships, we have some free content, SuperDataScience, and we also provide absolutely free content like the podcast is free. The newsletter that you mentioned, the Data Science Insider is also free. So in whatever ways we can we support the data science community.
Jon Krohn 7:13
Nice. Yeah, I love it. And then, so you said you were studying something specifically. So, you're studying right now, you said, "Omni–"?
Kirill Eremenko 7:21
OmniFocus. So after starting the business–five years ago or a bit more now–and growing and scaling it, I've had less and less time for data science and more and more time for requirements to run the business and like there's a lot involved in building a team creating a culture, setting up processes, having difficult conversations, hiring, firing, strategy, vision, products, values, like all these things combined, and I found myself getting more and more like overwhelmed. And so I've been, well, over the past five years one of my personal journeys has been discovering productivity and reading books like the “Productivity Project” by Chris Bailey, which actually came out in 2015, fantastic book. Now I'm reading “Deep Work” by Cal Newport, another fantastic book.
Jon Krohn 8:11
We talked about that on the last episode.
Kirill Eremenko 8:14
Yeah, yeah
Jon Krohn 8:16
Or, on the episode that I recorded with you.
Kirill Eremenko 8:17
It's a great book and productivity is important. And recently I was reading a book, “The Great CEO Within” by Matt Mochary, who's a CEO coach in the Silicon Valley. There's a real cool, really cool episode with him on the "This Week in Startups" podcast or "TWIST" by Jason Calacanis, I think, and so they've got a really cool episode with Matt Mochary there. And anyway, so I found out about this book, I was reading it, and there he briefly mentioned this tool, OmniFocus. And I thought, oh, well, you know, I'll give it a go. Anything that can increase my productivity would be cool. Found this tool. It's amazing. It, like, blows all other tools–I was using Evernote before–blows all other tools completely out of the water. Like you can put in, like, any email I get that I can't action right away, I just forward it on to OmniFocus and archive it, so now my inbox is almost always free. I have, like, an inbox in OmniFocus. And then I allocate them to projects and so on. And I got to do a quick shout out to the course. The course I'm taking is by Peter Akkies, how to basically be more productive with OmniFocus 3. Basically if you look at Peter Akkies on YouTube, you'll find his videos, and that course has really been a life changer for me. So yeah, highly recommend: if somebody wants a tool for productivity, OmniFocus is the best tool I found, and I did quite a lot of research in this space.
Jon Krohn 9:44
Amazing. I will be checking that out today. It sounds like I would be wasting time if I didn't. One more question here that might have a bit of a long answer, but I'm really interested to hear it. So prior to Kirill and me being introduced–and this isn't me flattering you, because I had this opinion before I'd ever met you–is that SuperDataScience is the biggest educational brand in data science. The number of students that are impacted by SuperDataScience. I have not found any other program that comes close. Tell us about your journey. You know, now you're studying focus tools, and you're managing a data science company, and I think this is a journey that a lot of people in machine learning will experience or already have experienced among listeners. So tell us about that, that experience of going from being a data scientist and then growing a successful business and that gradual transition towards management where you find yourself today.
Kirill Eremenko 10:53
Oh, thank you. That's a very good question. I tend to ask it myself on my podcast quite a lot. Yeah, I'd probably preface it with, or start the answer by saying that you gotta do what you love. Like, the goal shouldn't be to build a business or make a ton of money or, I don't know, or even impact a million people. If you start out with that goal, you're, like, really setting yourself up for failure in my view. And for me, I just found–I was exploring quite a lot. I was working at Deloitte and then I was in the industry as a data scientist in a superannuation company in Australia, which is basically a pension fund. And I was exploring quite a lot: what do I want to do with my life, what am I actually going to enjoy and love? And I found, through, like, a long story, but I found teaching. By taking an online course and then trying it out, basically trying to teach a course myself, I found that I really love it and that this is something I want to do. So I started, just started doing that. And I was really good at it. Well, from my perspective, I was really good at it! And–
Jon Krohn 12:08
I think we can quantitatively–we have quantitative evidence that indicates that you're pretty good at it.
Kirill Eremenko 12:17
Thanks. Yeah, but like at the time, people were giving me great comments and feedback saying, "Hey, I learned something from you." And I enjoyed that, and looking back on my life, this is advice I have for people when people ask me, how do I find out what I'm passionate about? It only later struck me that I've actually been teaching all my life. Like when I was at uni, I was teaching these extracurricular classes, like through a remote school, then I was always, like, helping my brothers learn. I was teaching when I was employed in companies. It wasn't ever, like, crystallized that, like, I'm an educator, but it's always been something that I enjoyed, explaining things to people, and so just circled back round and I ended up doing something that had been with me all my life. And only accidentally, I finally figured out that that was my passion. And I started just doing that more and more and more, and at some point, I realized that I can continue doing it as a hobby, and there's nothing wrong with that. Or, if I want to scale, I'll have to treat it as a business. Like I won't be able to scale a hobby. And there's a very big, like, massive difference between a hobby and scaling a business. So I went and I did a course. I signed up for a course in Santa Barbara. So I was living–well, I still do live–in Australia at the time and this course was in Santa Barbara, like a course on how to scale a business basically. And in–
Jon Krohn 13:47
In Santa Barbara, California.
Kirill Eremenko 13:48
California, so I had to fly to Santa Barbara. It required some commitment for me. I had to go there four times in a year for, for like two days each time, just like–
Jon Krohn 13:58
How long is that flight?
Kirill Eremenko 13:58
13 hours? 13 hours.
Vincent Petaccio 14:00
Wow.
Kirill Eremenko 14:01
For two days, go back and, like, learn how to, how to scale the business. But one thing that I learned, and I'm happy to share on the spot, because one thing that I learned on the first day of that course, like, paid off. I knew at the moment I learned, I knew that this is going to pay for the tuition and this is going to pay for all the flights and all the trouble. Like, this one concept I've learned is going to completely cover–is worth it, makes the whole thing worth it. Do you want to know what it is?
Jon and Vince 14:29
Ehhhhhh.
Jon Krohn 14:30
I don't know... (laughter)
Yes, please what is it?!
Kirill Eremenko 14:35
Okay, all right. All right. So the concept is, if you want to scale a business, then as your business scales, whatever task that scales in terms of time with your business has to be delegated. That's it. Basically, if you're–example, if you're spending two hours replying to customer questions a week, which is not much, if you want to 10x your business, well, that's going to be 20 hours a week, which is also kind of manageable. But, if you want to 100x your business, it's going to be 200 hours a week. And that's impossible. So right away, you know, even at the onset, as soon as you decide it's no longer a hobby, now it's going to be a business, you need to start thinking, what is going to scale, what tasks are going to scale in terms of amount of time required as I scale my business. And whatever comes up, like you could just sit down, write a whole list, whatever comes up on that list, you need to start delegating. You need to start hiring people, hiring freelancers, hiring contractors, building a team, installing a culture, and that's a whole separate, you know, how you do that, that's the “how”. But the “what” is that you need to delegate, otherwise you'll be overwhelmed and you won't be able to scale your business. You won't be able to scale your impact.
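Editor's note: the arithmetic in Kirill's example can be sketched in a few lines of Python. This is only an illustration, assuming the time a task takes grows linearly with the size of the business (which is what the two-hour example implies). The function names and the 168-hour ceiling (7 days of 24 hours) are our own additions, not something from the episode.

```python
# A minimal sketch of the "delegate whatever scales with the business" rule.
# Assumption (ours): task time grows linearly with business size.

HOURS_PER_WEEK = 7 * 24  # 168 hours exist in a week, before sleep or anything else


def weekly_hours(base_hours: float, growth_factor: float) -> float:
    """Hours per week a task needs once the business is growth_factor times larger."""
    return base_hours * growth_factor


def must_delegate(base_hours: float, growth_factor: float,
                  available_hours: float = HOURS_PER_WEEK) -> bool:
    """True when one person cannot physically cover the task any more."""
    return weekly_hours(base_hours, growth_factor) > available_hours


if __name__ == "__main__":
    # Kirill's example: two hours a week answering customer questions.
    for factor in (1, 10, 100):
        hours = weekly_hours(2, factor)
        verdict = "delegate" if must_delegate(2, factor) else "still possible"
        print(f"{factor}x business: {hours:.0f} h/week ({verdict})")
```

In practice the ceiling would be far lower than 168 hours, so passing a smaller `available_hours` (say, 10) flags tasks for delegation much earlier, which is the point Kirill makes about planning at the onset.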
Jon Krohn 15:49
Nice, that is really good advice. I'm someone who–I–like, I'm slightly starting to get comfortable with this stuff. I've been lucky at my job at untapt, where I've been for the last five years, in working with Vince, and Grant and Andrew, who are all co-hosts of the show. It's very nice to have people that I know, I can ask them to do something and they're going to do it really well. It's something I've struggled with. It's very hard for me trusting, you know, to delegate something to other people. The result of that is a constant feeling of being overwhelmed. Like so–I–that is amazing advice. And yeah, I really appreciate that. So, ummm...so what's your dream state? Do you have a vision today of what you're driving toward maybe with SuperDataScience or just more generally in your life?
Kirill Eremenko 16:44
That's a great question. With SuperDataScience, we have a mission, which is to make the complex simple. That's just like a mission doesn't ever change. That's something that we're always going to keep doing. Regardless of how big we get how small we get, whatever the circumstances are that's what we do. We take complex topics and break them all down into simple tutorials or simple ways to understand these topics. And the vision behind, or the vision that right now we have for, in order for us to keep fulfilling on that mission is to become the number one place for people to get started in their career in data science and to continue progressing that career. So we want to create a community of data scientists who can learn and grow and take courses and progress their careers, and at the same time, help each other out. And that's why we're supplying all these different avenues or channels of media. Whether it's a podcast, video tutorials, blog posts, workshops, webinars. We have a real event, DataScienceGO we have a virtual conference, which we're launching in June this year, DataScienceGO Virtual, Udemy courses, and so on and where people can get all this content and also network with each other. So that's, that's the vision. And yeah, we're getting we're on our way, and we're doing our best and just recently, like, what was it? Today's...? Yeah, the day before yesterday, we had our monthly team meeting where we all get together. It's quite hard to orchestrate because of the different time zones, but nevertheless. We invited four of our students to this meeting to give us you know, tell us their stories because we have people in our development team, we have people in our media team, we have people in the marketing team, who often don't really interact with students and they don't know the story so we ask them to share their stories. 
And it was very touching, was very impactful on how these four people from four different walks of life have interacted with our products and have thereby changed their careers or really progressed and become influencers or are starting out into the space of data science. So from feedback like that, we can see that we're on the right track.
Jon Krohn 19:05
Yeah, that's super cool. And I'm not surprised. Because I've seen in your platform, there's kind of, there's four main educational tracks. I'm going to try to remember them. There's like data analyst. There's data scientist, machine learning engineer. And, is it, it's like a, like a business like a machine learning manager kind of track?
Kirill Eremenko 19:24
Yeah, yeah, Data Science manager.
Jon Krohn 19:26
Data science manager. That's right. That's cool. It's great to think about it kind of in those ways. And there's lots of content in there that I'm looking forward to digging into at some point. Hopefully soon, although now it's gonna have to be after OmniFocus.
Kirill Eremenko 19:45
Thanks, Jon, thanks. Yeah. And looking forward to hopefully, working together. Like, as I mentioned, the DataScienceGO Virtual conference. Well, you're going to be a speaker there, so it's gonna be fun.
Jon Krohn 20:00
Yup!
Kirill Eremenko 20:00
I'm excited about amplifying each other's impact on the data science community in ways like that.
Jon Krohn 20:07
Yeah, that's gonna be–I think it's gonna be an interesting talk. It isn't–it isn't one I've given as kind of a standalone hour talk before. So, I do–so I have this kind of, depending on how I do it, five to 10 hours of content on applying deep learning to natural language problems, and what I'm going to do in the hour at DataScienceGO Virtual, is I'm going to kind of summarize the best parts of that 10 hours down to, specifically kind of looking at what kinds of model architectures perform well with, you know, a particular data set size and a particular problem that you're solving. And then we can compare those different model architectures. You know, if we use a dense net, a convolutional net, a recurrent neural network, if we have multiple parallel streams of processing, kind of how do, how do these different models compare against each other? And what are the metrics that we should be using to evaluate their performance? I'm really looking forward to giving that talk.
Kirill Eremenko 21:08
Fantastic. Can't wait, can't wait. It sounds like you have–you're the person to give that talk. With your experience in deep learning it's going to be a really good one.
Jon Krohn 21:17
Nice, all right, well, ummm...Vince, do you have any questions? I didn't let you talk at all. Do you have anything, any–
Kirill Eremenko 21:23
Is he still there?
Vincent Petaccio 21:25
I'm still here. I'm just taking it in, I'm enjoying it. No, I think we've pretty much covered it all. It's great to hear about everything you're doing Kirill. It's really great to see the impact you're having on the data science community and helping, you know, younger, more junior practitioners grow their careers and experience more and, you know, really fulfill their maximum potential. So it's great to see you. Thank you for that.
Kirill Eremenko 21:45
Thank you, Vince.
Jon Krohn 21:47
Wonderful. Yeah, so this is the Artificial Neural Network News Network. Up to this point, we've just been getting to know our guest Kirill very well, and it's been a delight to do that. Coming up next, we are going to get into the news portion of the show. We are going to be talking about new machine learning techniques that are being used to understand life in the ocean and protect it. More on that coming up in a second.
Vincent Petaccio 22:17
In a world where toilet paper can be bought for an ounce of gold, is there any place where I can find what I need? Empty shelves, stores with no toilet paper, empty rolls in the bathroom. What can I do?
And that's when I found TPML! With TPML, you can analyze social media posts to find where toilet paper is in stock and not just that, real user reviews can tell you where the highest quality toilet paper can be found to suit your needs!
Jon Krohn 23:04
All right, I hope you enjoyed that message from our sponsor TPML. Up next on the program–it was so hard not to laugh during filming of that. So, we are now going to discuss a news piece. This was inspired by something that I came across in the New York Times, an article called "A.I. Is Helping Scientists Understand an Ocean's Worth of Data." So, in this story, they talked about machine learning being used to identify and classify whale sounds from 180,000 hours of ocean sound recordings. And this is important because endangered species like right whales are migrating north due to warming oceans, and they are dying in unexpectedly large chunks. So, in the last couple of years since 2017, 30 of them–which doesn't sound like a lot, but that's actually seven and a half percent of all of the around 400 right whales that there are on the planet–so 30 of them, 7.5% of them, have died in the last couple years. And they usually live very long. So this is alarming. So this was a project mentioned in the article that Google was helping create the algorithms for, for detecting these right whale sounds. However, it's kind of a broader effort. There's also the Charles Stark Draper Laboratory and the New England Aquarium, who are collaborating to develop machine learning algorithms that predict where whales and other animals are, using combinations of data across satellites, sonar, radar, human reports, ocean currents and so on. The idea being that if we can predict where whales are, we can then adapt shipping routes, we can adapt fishing programs, so that they are less harmful to these endangered species. Vince, do you want to tell us more about this?
Vincent Petaccio 24:56
Yeah, absolutely. So just to give a little bit of a contextual background to this, the ocean is really critical in helping us to kind of slow down the rate and the pace and also decrease the severity of climate change. The ocean is a major sink for carbon in the atmosphere. In fact, the ocean has absorbed so much carbon since the Industrial Revolution, that if it weren't for the ocean, our atmosphere is predicted to have risen by over 30 degrees Celsius in temperature. So that, yeah, as opposed to the one degree, which we have already seen since the Industrial Revolution. Which is itself alarming and causing a lot of trouble for humanity and other species around the world. So you can see that the ocean is really important for helping us control and stabilize the climate and the ecosystems of this planet. And so with that in mind, it's really important that we do everything that we can to kind of reduce humanity's impact on the ocean so that the ocean can continue helping us. So right now one of the things that's concerning about the ocean is that we see a whole bunch of different changes happening as a result of climate change. So, water temperature is an obvious one. As the global temperatures increase, that increases the temperature of the water, and that's what gave rise to the migrations of these North Atlantic right whales from the Gulf of Maine into the Gulf of "Lawrence" or Lawrence up in Canada.
Jon Krohn 26:23
Yeah, yeah yeah, Lawrence.
Vincent Petaccio 26:23
Yeah. And so, uh–
Jon Krohn 26:25
In Canada, we just call him Larry.
Vincent Petaccio 26:28
(laughter) Ok, the Gulf of Larry, Larry's gulf.
Jon Krohn 26:31
He's a really nice guy.
Vincent Petaccio 26:33
He's got a great stroke. He's, he's great with the golf. (laughter) But in addition to rising water temperatures, one thing that we also see happening is the change in temperature of the water can affect the currents of the water, the movement of the water throughout the ocean. And in addition, the melting of ice in the ocean can actually change the salinity of the water because you're basically melting freshwater ice into saltwater. And between the salinity changes and the temperature changes, you get even more drastic shifts of ocean currents, which can have broader implications for weather above the surface of the ocean, for the movement and the habitats for certain types of animals, even where certain animals breed, and give birth can be drastically changed by this or totally destabilized. So it's really important that we do everything that we can to protect the oceans and the wildlife within them. And so in this particular story that you mentioned, Jon, it's Dr. Ann Allen at NOAA, the National Oceanic and Atmospheric Administration here in the United States. She had–she found herself with over 180,000 hours of underwater recordings. And this is actually a major, not issue, but challenge or really an opportunity with oceanic research. There's an enormous amount of data because we can record data all day and all night, all over the ocean. I mean, the ocean is the majority of the surface of the earth, and so there's a large amount of space where we can collect a huge amount of data. But once we actually have the data, it's hard to know what to do with it to have the greatest impact. So Dr. Allen had 180,000 hours of underwater recordings, and she was studying the occurrence of whales in a bunch of different Pacific Islands. And she said, "Okay, how can I actually use this enormous amount of information to serve my research goals?" 
And so as you mentioned, she worked with Google to try to build a model that could help her identify humpback whale songs in those recordings. So they–she took that 180,000 hours of underwater recordings, and found 10 hours of it that she labeled to indicate whether or not there were humpback whale recordings within that, and worked with Google, who kind of modified a YouTube model that they use, which–I think all we really know about it at this point is that it's a neural network, which they use to identify certain sounds in YouTube video content. And they were able to build a model with her that lets her very rapidly analyze data from these underwater recordings so that she doesn't have to trawl through hundreds of thousands of hours of recordings to find and track these humpback whales. So it's pretty exciting stuff that they're working on. And I think that it really bodes well for the future of this type of oceanic research.
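Editor's note: to make the workflow Vince describes a little more concrete, here is a rough Python sketch of the general idea of scanning a long recording in fixed-size windows and flagging the windows that score above a threshold. It is not the model built by Google and NOAA, which the episode only describes as a neural network. The scorer below is a deliberately simple stand-in (the share of spectral energy inside a frequency band), and every name, band limit and threshold is a hypothetical choice of ours.

```python
import numpy as np


def window_starts(n_samples: int, window: int, hop: int) -> list:
    """Start index of every full window that fits in the recording."""
    if n_samples < window:
        return []
    return list(range(0, n_samples - window + 1, hop))


def band_energy_score(clip: np.ndarray, sample_rate: int,
                      low_hz: float, high_hz: float) -> float:
    """Placeholder scorer: fraction of spectral energy inside [low_hz, high_hz].

    A trained classifier (for example a neural network) would go here instead.
    """
    spectrum = np.abs(np.fft.rfft(clip)) ** 2
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    total = spectrum.sum()
    return float(spectrum[in_band].sum() / total) if total > 0 else 0.0


def flag_windows(audio: np.ndarray, sample_rate: int,
                 window_s: float = 1.0, hop_s: float = 0.5,
                 low_hz: float = 100.0, high_hz: float = 1000.0,
                 threshold: float = 0.5) -> list:
    """Return (start_time_in_seconds, score) for every window at or above threshold."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    flagged = []
    for start in window_starts(len(audio), window, hop):
        score = band_energy_score(audio[start:start + window],
                                  sample_rate, low_hz, high_hz)
        if score >= threshold:
            flagged.append((start / sample_rate, score))
    return flagged
```

The design point is the one from the story: a person labels a small sample (10 hours here), a model learns to score short clips, and the scan over the remaining recordings then runs unattended, returning only the timestamps worth listening to.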
Jon Krohn 29:29
Nice! This reminds me of a couple of different companies that friends of mine work at that do similar kinds of things outside of the environmental space. So a guy who did–who was studying–he was an undergrad at Oxford while I was doing my PhD there, and we were both on the Oxford Entrepreneurs committee together. His name is Rasmus, and Rasmus has become incredibly successful in the AI space out of Berlin. So, he did a PhD at ETH Zürich, a technical university in Switzerland. And during that time–you guys might remember this. As part of his doctoral dissertation, he came up with a convolutional neural network inside a web UI that predicts how attractive you are. So you can upload a photo and then it kind of tells you, how–it could assign a bunch of things, it was like, it could guess your age, but the primary training dataset–I don't think he ever said where, which website it was, but using a–somebody gave him access to a dating website database. And so you could, you have this logistic regression model predicting a–you know, so you have an input of someone's profile photo from their dating profile, and then how likely they are to be contacted in the platform. Anyway, I digress. Maybe you guys aren't aware of that. I feel there was a big splash a few years ago. Anyway. He now runs this company Merantix. And one of the things that they do is they work with autonomous driving companies. And autonomous vehicles create absolutely crazy amounts of data per second. It's like the amount of data that they create per second is like more than the data we had stored on Earth two decades ago. It's like, wild.
Vincent Petaccio 31:30
Wow.
Jon Krohn 31:31
Don't quote me on that number, but it's something–I'm not far off on those things. And so his company builds tools. It sounds a lot like what we're describing here in this environmental use case, where you have this huge amount, 180,000 hours of ocean sound recordings, and you want to be able to identify where in there there are whale sounds, so you can study those or you can build, you know, maps predicting where the whales are going to be. So they do this with autonomous driving to be able to, so that you can look for something, some specific kind of elements amongst the huge amount of data generated by an autonomous vehicle. And I just want to give one other example, which is this company, Wirewax. So a friend of mine here in New York, Dana LaGattuta, she does client partnerships at Wirewax. And so they're the same kind of idea. Wirewax is this British company that allows companies like Disney to upload all of their video catalog. All of their B-roll, all of their ESPN video footage. And then if they want to be able to look something up, like "man catches baseball", they can type that in as a, as a query. And then using this Wirewax tool, it will go and look over all the video footage that Disney has and identify clips that seem relevant to that query. Anyway, so this is an interesting–
Vincent Petaccio 32:51
That's amazing.
Jon Krohn 32:52
It's a cool application that–all these applications, it's cool what we can do with deep learning and AI. Kirill, I don't know if we haven't been letting you talk, or–
Kirill Eremenko 33:00
That's an interesting observation. There's more and more companies that are popping up like that. For instance, a recent one is Scale.com. Super expensive domain this company managed to get, and it's run by a very young guy. I forgot his name. I heard him on another podcast. And what they do–
Jon Krohn 33:20
Benedict Scale.
Kirill Eremenko 33:22
Huh?
Jon Krohn 33:22
His name is Benedict Scale!
Kirill Eremenko 33:22
No way.
Vincent Petaccio 33:25
Ah, the Benedict Scale!
Jon Krohn 33:25
He's a famous sportsballer!
Kirill Eremenko 33:32
He decided to get this website? (laughter). Anyway, so, Alexander Wang is his name and he is probably like 22, 23 years old or something like that. And they collect, basically, their mission is high quality training and validation data for AI applications. And similar to what you said, self-driving cars. Before you can use that data for training, you need to make sure–You know, like, you need to label it, you need to label all the people like, you know, like, where's the person in that? In the video that the car has captured, where's the human? Where's the signpost? Where's the traffic sign, traffic light, whatever else. And so they do all of that. And they are working with OpenAI, Nvidia, Lyft, Samsung, SAP, Toyota, Airbnb. I think on the podcast, he even mentioned that they're working with Tesla as well, but I'm not sure about that one. Waymo as well. So all the big companies, they need this service and they have all this data, but before you can train on it, you need to label it. So it's like machine learning before machine learning.
Jon Krohn 34:40
Yeah, yes.
Vincent Petaccio 34:42
Yeah–
Jon Krohn 34:42
Go ahead, Vince.
Vincent Petaccio 34:42
It is really an enormous amount of opportunity when you have all of this data coming in. And it's interesting because, particularly with oceans, we have, as I mentioned, huge amounts of data. But some of this data goes back really, really far in time. One example that comes to mind immediately is that, back during the Cold War, the United States actually installed a huge array of underwater microphones which are incredibly sensitive all around the United States, in the oceans, and in some other places in the world as well, with the goal of identifying and tracking submarines during the Cold War, with the idea that if you could find them, you could track them in the case of like a nuclear strike that was impending. So those have been running since the Cold War, and they're still running today, many of them, and that gives us decades of data that, with the appropriate tools, we can really extract a lot of valuable ecological information from. It's interesting bring–an interesting parallel that just came to mind for me, particularly in the environmental space, is this model that is kind of humorously named VGGish, (laughter) which was published last year in 2019, by actually Huawei Cloud and the Rainforest Connection, which is an organization that focuses on, well, rainforests. And this model is actually a convolutional neural network, which can be used to analyze sound recordings from smartphones in the Amazon rainforest with the intent of identifying human activity. And the goal here is to deploy an application that can be run on any kind of consumer hardware that can create this kind of ad hoc, democratized network of audio recordings to identify deforestation, in order to kind of help track it and identify where it's happening illegitimately, or to stop illegal logging and foresting.
So there's just a huge amount of opportunity for kind of leveraging enormous quantities of data around the world to help us kind of beat back the specter of climate change if we just take the opportunity to get creative and think about how we can best use it. So it's really exciting.
Kirill Eremenko 37:02
Could we go back to the whole whales thing? I wanted to find out, like, what is–what are your guys' opinion on the fact that it took Google nine months to get this scientist a working model? I mean, like, from your experience in deep learning, is that, is that maybe a little bit too long?
Vincent Petaccio 37:20
Yeah, I think it's an interesting question. It's funny, I was reading an article yesterday, that one of my, one of the people in my network shared that talked about, "What's the real duration of a real life machine learning project?" And the conclusion was that the ideas that we have of how long a machine learning project takes, those ideas are skewed towards the short end by the fact that a lot of machine learning projects kind of fizzle out before they actually reach completion, and that in reality, it can take nine months to a year and a half for a typical machine learning project to go from kind of idea to deployment. And yeah, it is interesting that it took Google nine months to go from, you know, an existing model to one that was kind of fine tuned, I guess or modified for this use case. But at the same time, you know, it seems like a pretty complex question about, you know, how much of the time was spent labeling 180,000 hours of audio recordings and–
Jon Krohn 38:18
Yeah, I think they only labeled a subset, but even that would be a huge – I can't remember the exact numbers, but –
Kirill Eremenko 38:24
10 hours, no?
Jon Krohn 38:26
10 hours?
Vincent Petaccio 38:27
Yeah, they ended up with 10 hours of training data, but, you know, who knows if they had to spend, you know, four or five times that to actually find the positive cases in their data set, you know?
Jon Krohn 38:35
Oh, yeah, of course.
Kirill Eremenko 38:38
Okay, good example. Yeah, machine learning projects take, usually what I use, I've recommended the ratio you gotta multiply by five. Whatever you predict, you’ve got to multiply by five.
Jon Krohn 38:53
That sounds safe. Yeah, that sounds about right. It might not even be safe. It's so...I'm always super optimistic about how quickly things can be done. But there's–with machine learning models, you run into complexity all over the place. Vince has a knack, though, for somehow, when I predict that a certain task will take a certain amount of time, and typically–it's an optimistic estimate–Vince somehow does it in half the time, so I don't know what goes on with him.
Vincent Petaccio 39:20
You got to start by setting the bar low. And then just, you know, you set yourself up to always overachieve!
Jon Krohn 39:29
Yeah, I thought he was really terrible at his job. And then did okay.
Vincent Petaccio 39:35
Yep! Always exceeding expectations!
39:37
(laughter)
Jon Krohn 39:39
There you go. All right. So all kinds of cool applications of machine learning to understanding what's happening in the oceans. And on top of that, there are things like ocean cleanup robots that use machine learning to detect plastic and clean it up. So there's an example of that at UC San Diego–University of California San Diego–has a cool robot that they've been testing called Fred.
Jon Krohn 40:00
That's it for the content in today's show! Kirill, thank you so much for being on the program. Is there anything coming up soon that you'd like our audience members to know about?
Kirill Eremenko 40:09
Oh, please join us for DataScienceGO Virtual. It’s free, it’s a conference, a virtual conference. If you’re stuck at home, want to learn a bit more about data science, we’ve got two days. First day is going to be for beginners, and there’s going to be some workshops and some talks, and the second day is for advanced practitioners. Lots of cool talks and workshops including yours, Jon. So make sure to check us out, you can find it at datasciencego.com.
Jon Krohn 40:36
Nice! Thank you very much, Kirill. Thank you for your fascinating discussion of your journey and the advice that you provided for people with data science related businesses as well as other kinds of businesses in general. And Vince, thank you as always for your wonderful coverage of this topic, one that I’m sure is dear to your heart.
Jon Krohn 40:58
So please do like us, subscribe to us, follow this podcast. We’re on Apple Podcasts, Spotify, Google Podcasts, and YouTube. Once the Coronavirus lockdown is over we will go back to having live video footage of the program for you to enjoy on YouTube, so look out for that. Please do reach out to us. LinkedIn, I think, is a favorite for many of us. You can find me on LinkedIn, just mention that you’re a listener on the program. Is that good for you too, Vince and Kirill?
Vincent Petaccio 41:28
Yeah, absolutely.
Kirill Eremenko 41:28
Yup, absolutely.
Jon Krohn 41:30
And then I’m on Twitter, @JonKrohnLearns. Although Twitter – yeah, we’ve talked about this on the show – it isn’t super active for data scientists.
Kirill Eremenko 41:39
Same.
Jon Krohn 41:41
Do you have a handle you want to share, Kirill?
Kirill Eremenko 41:44
Not really.
(laughter)
Jon Krohn 41:46
There ya go! Feel free to send me an email. I’m jon@jonkrohn.com. You can also sign up for my email newsletter at jonkrohn.com. That’s my number one recommendation for staying up to date on any of the content I’m releasing. I have a brand new series called “Machine Learning Foundations” that I’m starting to roll out. End of May 2020 is the first class, and so I have 8 three-and-a-half hour classes running the end of May through to the beginning of September covering everything you need to know to be a great data scientist in terms of foundations. So, Linear Algebra, Calculus, Probability, Statistics, Data Structures, Algorithms, and Optimization. All these kinds of topics are covered over that. I’m starting in the O’Reilly learning platform as live trainings, however I will also be rolling out videos as quickly as I can into my YouTube channel, so look out for that stuff.
Jon Krohn 42:41
Alright, many thanks to Sangbin and Maria for editing the show. Of course again to our guest Kirill Eremenko, and to you listener, if you decide to be the person who makes our theme song eventually. Alright, and then your theme song –
Kirill Eremenko 42:54
Don’t forget Vince! And thanks to Vince!
(laughter)
Jon Krohn 42:58
I already thanked Vince again at the end! He doesn’t get thanks because he’s a co-host, he’s always here. Hey, Vince! Thanks to you, huh?
(laughter)
Jon Krohn 43:08
Thanks a lot, Kirill!
Kirill Eremenko 43:11
Thank you very much.
Vincent Petaccio 43:12
Thanks, Kirill, it was a pleasure!