published Jul 29, 2015
A few months ago we removed the “Email Your Readers” feature, which let Leanpub authors email their readers separately from the emails they can send when publishing a new version of a book. We built the separate “Email Your Readers” feature in the first place because we know how important email can be for marketing to people who have already shown interest in you and what you’re doing. However, we realized the feature had a structural problem, and in this post we want to explain why we removed it, and why we’ve built a better way to contact your readers by email for marketing purposes.
(tl;dr version: The old feature was opt-out and our new MailChimp integration is opt-in.)
We took a look at all the “Email Your Readers” emails that had been sent in the previous year or so. It turned out most of these emails actually contained information about updates to the book, which really should have been sent using the book updates email feature, since it automatically includes links to the new version and was built for that specific purpose.
The rest of the emails to readers were mostly marketing messages about new books being released by the author or for other products from the author. The majority of these were well-received by readers.
However, a significant minority were seen as spammy and resulted in both angry emails from Leanpub readers and in emails from Leanpub being flagged as spam. The former is bad enough, but the latter is a huge problem for us, as it’s important that Leanpub is able to get emails into your readers’ inboxes to let them know when you’ve updated your books.
We understand the huge value of marketing through email lists, and we believe authors who are serious about promoting their own books should have their own mailing lists. We’re not trying to get between authors and their readers. We just want your marketing mailing list to be opt-in, unlike the book update list, which is basically the only thing we think should be opt-out, since people who buy Leanpub books are assumed to be buying them in part because they can be updated. (To be clear, we never give actual email addresses to authors unless readers have opted in; our email feature hides reader email addresses from authors.)
So, given the importance of marketing email lists and the importance of only using opt-out emails for book updates, what’s the better feature?
To turn on our MailChimp integration, click on “Mailing Lists => Reader List Settings” in the author app sidebar for your book. There will be a blue “Authorize MailChimp” button, which will allow you to connect your MailChimp account to Leanpub. (Of course, you’ll have to sign up for MailChimp first if you don’t have an account.)
Once this is set up, any new readers of your book will be given the option to sign up for your mailing list. You can also import the readers who have already opted in by using “Mailing Lists => Collected Emails”.
In the future we plan to add a lot more community stuff in Leanpub itself, enabling authors to offer coupons for new books to readers of their other books and facilitate discovery of related books, etc. Basically, we want to help authors to promote their books in a way that doesn’t seem like spam from the author or from Leanpub. This will only really work once we have an improved mobile app, better web experience, etc. But the point is that you don’t need to wait until future features are built: you can use our MailChimp integration today.
Peter, Scott & Len
published Jul 13, 2015
Today Len Epp has a new title at Leanpub, one that is long overdue:
Although Len wasn’t at Leanpub on day one, he has been instrumental to its growth since joining in 2012. When Len joined, Leanpub was doing about $1,000 a month in sales; we’re now doing well over $100,000 a month and growing.
Sometimes when we’re arguing about a new feature or direction for Leanpub, and Len thinks I think he’s losing, he likes to troll me with this chart:
My troll response is that I’m so good at forecasting that I knew this would happen, and that we’d need to bring him on to help cope with it. More importantly, the next 100x of growth will be harder than the previous 100x was, and it goes without saying Scott and I are thrilled that Len became an integral part of the team early on.
Incidentally, Len has a long history of arguing with me: we met in grade 9, in high school, where we enjoyed doing cool things like designing board games and reading existentialist literature. Eventually, we headed in opposite directions but remained friends: I went west to Victoria, British Columbia to do Computer Science and Psychology at the University of Victoria, and went on to the Bay Area, where I did an internship at SLAC before spending 8 years as a software developer for Silicon Valley startups. Len went east, doing a D. Phil. in English at Balliol College, Oxford, and then becoming an investment banker in London, before coming back to Canada, co-founding a non-profit in Montreal and then joining Leanpub. Len recently moved from Montreal to Victoria to be with the rest of the team and focus on taking Leanpub to the next stage in its development.
Leanpub is still a bootstrapped startup where everyone does a bit of everything, and Scott, Len and I each manage client projects for Ruboss in addition to our work on Leanpub. As our resident corporate finance and literary type, Len’s interests complement my focus on product and Scott’s focus on tech. Functionally, Scott and I consider Len to be a cofounder, so we’re now making it official.
Congratulations, Len! The next phase of Leanpub is going to be the most exciting yet, and you’re going to be crucial to it.
published Jun 19, 2015
W. Jason Gilmore is a Columbus, Ohio-based software developer, writer and consultant. He is the author of eight books on web development, and has published over 300 articles in publications like JSMag and Linux Magazine. He has published three books on Leanpub, the most recent being Easy E-Commerce Using Laravel and Stripe.
This interview was recorded on May 28, 2015.
W. Jason Gilmore
Jason is a Columbus, Ohio-based software developer, writer and consultant. His recent projects include a Linux-powered autonomous environmental monitoring buoy, and an e-commerce analytics application for a globally recognized publisher. Jason is the author of eight books on web development, including the best-selling “Beginning PHP and MySQL, Fourth Edition”, “Easy PHP Websites with the Zend Framework, Second Edition”, and three Leanpub books, “Easy Active Record for Rails Developers,” “Easy Laravel 5”, and most recently, “Easy E-Commerce Using Laravel and Stripe,” which he wrote along with his co-author Eric Barnes. Jason has also published over 300 articles in publications such as developer.com, JSMag, and Linux Magazine, and he has instructed hundreds of students in the United States and Europe.
In this interview, we’re going to talk about Jason’s professional interests, his books, his experiences using Leanpub, and ways we could improve Leanpub for him and other authors. So thank you, Jason, for being on the Lean Publishing Podcast.
Jason Gilmore: Thank you for having me, it’s quite a pleasure and an honor.
E: I’d like to start maybe with a biographical question, going way back. Can you tell us how you first became interested in being a software developer?
G: Sure. I think I’ve spent the majority of my life sitting in front of a computer of varying types. By the time I went to college, it was really a foregone conclusion that I would wind up in software, just out of a pure interest in the topic that has again really been a lifelong interest. At the time I didn’t know where I would end up in the software world, but I just knew I had to find a place somewhere within the industry. And I’ve had a lot of fun ever since.
E: So did you study computer science?
G: I did, I studied computer science at the Ohio State University based right here in Columbus, Ohio, and upon graduation, have since spent the majority of the last 17 years or so working for a wide variety of companies and clientele. I’ve spent the majority of my career working as an independent contractor. And I’ve worked with universities, with startups, small mom-and-pop type businesses, as well as a lot of other larger organizations. And I’ve always found it very interesting, because no matter whether it’s a mom-and-pop organization or a large company, there’s always a new and unique angle to the project you’re working on. So I’ve really enjoyed every minute of it.
E: I imagine working freelance, or maybe “independently” might be a better word, must have a lot of advantages in terms of control over your time and the projects that you’re working on?
G: Absolutely. Especially I think - and this happens to a lot of more experienced developers - over time you begin to have the luxury of choosing more interesting projects. And that has certainly been the case in recent years. I’ve had the opportunity to work on a variety of e-commerce projects, both on the sales side and the analytic side. I’ve worked on some data warehousing projects. And as you mentioned in my bio, I had this really, really fascinating opportunity to work on a Linux-powered environmental monitoring device, which was way out of my general scope of knowledge, so it was quite a challenging project and a lot of fun. So, like I said, it’s great to work in this capacity, because there really always seems to be something interesting going on.
E: It sounds really interesting to have such a wide array of things to work on, and never knowing what’s around the next bend. Perhaps this is related to that, but at what point did you start writing about your work?
G: Interestingly enough, my writing career, if you will, started almost at the very same time that I landed my first professional programming gig. Making a very long story short, I was living overseas at the time, and I was working on a project for a company in Italy, and it just so happened that it was PHP based. At the time of course, PHP - this was 1998 I think - PHP was of course just a few short years old at that time. So I was scouring the web for learning resources, and happened upon a website that is still around today, called devshed.com. I went to the homepage - I remember this like it was yesterday.
I went to the home page, and there was an ad, a banner on the front page that said, “Devshed is looking for authors.” And I thought, “Wow, that sounds like it might be interesting.” So I emailed the individual running the site. And it was a gentleman named Randy Cosby. I sent along a couple article ideas. It was for both PHP- and MySQL-related topics. As I mentioned, they were looking for authors. So Randy was excited that somebody had inquired I guess. And next thing I know, I was writing regular articles for the website. I really enjoyed that, and interestingly, that began what I commonly refer to as my two careers in the software industry. One - hands-on doing development and consulting. And the other, a much more, let’s call it academic career, in which I was at least attempting to educate my fellow developers in some fashion.
And that project led to - now I’m dating myself, this was a long time ago - some early tutorials for the very new mysql.com at the time. Again, they were very happy to receive material, I guess. I wrote for zend.com, for O’Reilly Net when they were actively publishing tutorials, Web Review, and a wide variety of publications that were really popular at the time, I guess. And a lot of those have gone away, but what that did was give me experience, right? Prior to that I had really no writing experience, other than that gained at college, writing the occasional essay.
So this was a great way to formulate my thoughts, and I guess prove to myself that I understood whatever that topic was. And again, back then it was primarily PHP and MySQL. And over the years again, that led to writing for Linux Magazine, for TechTarget - so a wide variety of publications. And it gave me the experience to formulate these thoughts and try and communicate a usually highly technical topic in an understandable way. And I continued doing that very regularly. Certainly on a monthly basis, all the way up through graduating from college when I returned to the United States.
I finished my degree, and I’m not sure, it was two weeks after I had graduated that I received an email from a gentleman named Gary Cornell, who was a founder of Apress, a computer book publisher. And Gary said in the email - I also remember this like it was yesterday - “I really like your PHP articles. Do you want to write a book?” And I had never in a million years considered this idea of writing a book. Again, I had just graduated, and I was doing some contract work, even at that time. I responded and said, “Sure, let’s do this,” and ended up writing a book that was called, “A Programmer’s Introduction to PHP 4.0.” And that published in January of 2001, if I’m not mistaken. I effectively dropped everything and wrote that book over the course of about four and a half months. And I mean, I just - I dropped everything and just wrote full time. Because I was so excited about this interesting opportunity. And that book published, and it did moderately well. But what that did was really give me the writing bug, and really motivated me to devote an increasing amount of time to writing. And that led to the next book, which published in 2004. And the next in ‘05. And here we are, this is hard to believe - 14 years later, and I just published the ninth book through Leanpub.
Along the way, I stepped further into the publishing industry. I spent several years working as Apress’s open source editorial director, acquiring books that fell into the open source genre of all types - Linux, MySQL, Postgres - you name it. Pretty much anything exciting that was open source-related. I worked with authors to publish their books - that was a lot of fun as well. But here we are again 14, 15 years later and I’m still writing away and still having a great time doing it.
E: That’s a really great story. Obviously I have quite a few questions about your opinions about publishing and the industry, and where it’s going and your experience. But before I get to those, I’ve just got a couple of other questions I’d like to ask about. I know that you’re a co-founder of the annual CodeMash Conference, which is the largest multi-day developer event in the Midwest. I was just wondering if you could explain a little bit about what CodeMash is, and why you helped found it?
G: Sure, CodeMash is an annual event that is held here in Northern Ohio, in a town called Sandusky, Ohio, which is also home to Cedar Point, the large amusement park, just for a point of reference. And CodeMash was started, I guess, 11 years ago - give or take - by myself, a gentleman named Brian Prince, and another gentleman named Jim Holmes. CodeMash came out of a series of conversations that a group of us had about the Midwest and the lack of at the time - now this has very much changed - the lack of tech conferences. Given our three respective professions, it just so happened that we traveled on a regular basis to - well really all over the country, but to the east and west coast primarily to attend tech conferences.
I was at the time working for Apress as an editor, and that meant going out and meeting prospective authors all of the time. And one of the best places to do that logically is at conferences. So I would attend all of the conferences, and always marveled over how they were always in California, or always in Portland or Seattle or New York or Boston. And this, despite there being a very large technical community in the Midwest. And we just really concluded that the Midwest, and Ohio in particular, wasn’t getting its due in that regard, and had the crazy idea - trust me, very crazy idea - that we would change that and start a conference.
And that conference became CodeMash. And CodeMash is, I think, a very unique event in that, well, it’s held in Sandusky, Ohio, which was not a tech hub for starters. But it also happens to be held in a place called the Kalahari Water Park and Convention Center. The Kalahari is the largest indoor water park in the United States, among other things. Well, we hold this in January, and Sandusky, being in the northern part of Ohio, is a very cold place, very snowy. So we have this interesting scenario where the weather outside is usually absolutely horrible, but inside, because the Kalahari is like a small city, it’s a very large complex, there’s a large group of attendees wearing shorts and sandals, and it’s almost like a tropical environment. It’s really kind of funny. And so we have the water park, and there’s all of the things that you might imagine would come with that water park. There’s a large arcade, there’s all sorts of forms of entertainment. But, at the same time, the conference, which is now four days in duration, hosts - I’ve lost track, somewhere north of 250 sessions over four days.
E: Oh wow.
G: Half day sessions, which are four hours, full day sessions which are eight hours. And then something like 200 one-hour sessions on the Thursday and Friday. We also have something called KidsMash, which is a companion kids’ conference. So because the Kalahari’s a fun place for families to attend, the kids have the opportunity if they’d like to attend KidsMash, and learn about robotics and learn about programming and technology in general. So it’s certainly earned a reputation as being a very family friendly conference. And that’s something that you certainly do not see at other conference venues.
Over the years we’ve hosted a number of different bands. So we’ve had a number of rock bands, if you will, play on Thursday night. We have a water park party, and there’s a game room that goes on I think 18 hours a day, where you can play Settlers of Catan and all kinds of other interesting games. So a lot of fun, a lot of networking goes on. The event has grown to somewhere north of 2,000 attendees. So great networking, a great opportunity to learn.
And over the years, even as the conference has grown, we’ve stayed true to the really central idea that it would be low-cost. If you go to the west coast or the east coast, because the cost of everything is much higher - I mean the cost to rent a conference center in Portland is just astronomical, right? Because it’s so much higher, you wind up paying a pretty hefty price for a conference ticket. CodeMash, for all four days is just a couple of hundred dollars.
E: Wow, that’s amazing.
G: We have a very special hotel rate, yes absolutely. And the conference is a non-profit, a 501(c)(3). The committee has certainly stayed true to that tenet of providing this low cost, high quality event. And just to clarify, I say “we” because of course, this goes back more than a decade, and I’m still in many ways very wedded to the conference. But I, and one of the other co-founders, are - we call it “retired” as of last year. So we are consulting members to the board, if you will. Which means we complain about whatever, and are generally useless. But it’s a great event, and if you’re - I was going to say anywhere in the Midwest, consider attending. But the reality is, if you’re anywhere in the United States, I would suggest giving it a look. We have people from California, Florida, Georgia, Texas. People come from overseas every year - from England, from Germany. So it’s grown to be quite a sizable event, and I strongly recommend checking it out.
E: Well thanks for that, it sounds like a really great thing to have created, and very innovative in the best sense of the term.
Just switching gears a little bit. I have a question about something I read in the introduction to Easy Laravel 5, your Leanpub book. You focused on PHP for much of your career, as you’ve been saying in this interview. But then you wandered away from PHP for a while, before coming back to it via Laravel. Can you explain this professional journey, and why you were losing interest more or less in PHP, and why you came back to it?
G: Yeah, absolutely. Back in - let’s say 2008, 2009, I think the sentiment was shared by a number of other programmers at the time. It just felt like maybe the PHP language was being passed by, in a sense, by other emerging technologies. And certainly this is maybe a general characteristic fault of many developers - myself included, that you’re always looking for the next new interesting, shiny object to learn about. And there was quite a bit going on at that time. Node had just come out, in ‘09. I think Angular had just come out. Rails of course was a very, very hot topic at the time. And it just seemed like there was a lot going on in these other areas, that wasn’t necessarily going on in PHP. It almost sounds like an accusation, when in reality, maybe I was part of the problem, right? But being an open source project, if I had felt that, and others had felt that - maybe we should have attempted to inject some excitement into the community. But at any rate, there were a number of individuals who did do that, right? And I think the sentiment was widely held at the time.
And so I’d stepped away. Not stepped away, I just had gotten caught up in doing a lot of Rails work at the time, frankly. But nonetheless, having been involved with PHP for so long over the years, of course I just kept an eye on what was going on. And out of nowhere, along comes - for instance, Composer, the package manager, which was and still is a huge improvement to the overall workings of PHP application development. And then of course, a number of frameworks really started to get popular. And we saw a lot of micro frameworks come out of nowhere. Laravel came along. And all of a sudden, there was - PHP went from being an area that, although still in very widespread usage - I mean, there’s no debate about that, it was just not a particularly exciting place to be in, and suddenly it became an incredibly hot topic again. And really it was Laravel in particular that captured my attention. And here we are. I’ve since written two books about the topic. But it was really interesting and I think exciting to see this community that has been so prominent for so long, really - it has a second wind right now, or maybe third wind. But I think there’s really no more exciting time than now to be a PHP developer. Which is pretty cool considering the language is, I guess technically, 20 years old, right?
And you can’t really say that about a lot of programming languages. It’s just a very exciting time for PHP developers, and I’m glad to be part of it.
E: Speaking about your books, I noticed that sometimes you involve the idea of getting the person who’s reading the book involved in the development of a real world project. I found the Arcade Nomad project to be fascinating in particular. After I read it, I lost some time doing retro gaming, specifically Shinobi. I’d forgotten that if you touched someone or something, you just die.
G: That’s right.
E: No health points or anything like that, you’re just gone.
G: I have a Shinobi arcade game here in my house. It’s one of my favorite video games of all time.
E: Oh yeah, it’s fantastic. Before I move on, I have a question related to that. Can you give a description about what the Arcade Nomad project is, and what inspired it?
G: Sure. I am a very big believer in including companion projects in the book - so, basing the book around some sort of thematic project that would certainly be a viable, real world application that somebody might want to build, and maybe turn into a business. And at the same time, I’m very adamant about only writing about technologies and topics that interest me. Because I think if you’re truly into the topic, and interested in the topic, a better book is going to come out of it. And being a child of the 80s, I spent a heck of a lot of time playing arcade games inside malls and gas stations and grocery stores, and you name it. Of course as the years have passed, those arcade games have become very few and far between. At least here in Ohio, there are almost no arcades left.
So I thought it would be interesting to basically build a companion project that would allow fellow interested aficionados of arcade games to add arcade games that they find out in the wild - maybe if they’re at a gas station or a laundromat or whatever. They could add the game to the site and help fellow arcade gamers find those as they’re passing by. That became the theme project for Easy Active Record for Rails Developers. I had a lot of fun building it. Unfortunately, I don’t travel anywhere near as much as I used to, so I don’t exactly contribute to the site. But yeah, you really want a project that captures the attention or the fascination of the reader, and can hold their attention throughout the course of the book. Whether the reader is 20 years old or 50 years old, I think pretty much - I’m stereotyping a bit here - but pretty much everybody in tech can relate to video games at one level or another.
E: On that note, I actually noticed an interesting element to your latest book, Easy E-Commerce using Laravel and Stripe. There’s sort of a gaming element built into it, where there’s this fictitious lawn care company named, “We Do Lawns.”
G: That’s right.
E: And there’s this villain, who’s actually a boss, whom you set up as the surly company owner, Todd McDew. You set up this narrative where you’re actually doing work for this company owner and trying to sort of satisfy him at the end of each chapter…
G: That’s right. “Easy E-Commerce using Laravel and Stripe” was without a doubt the book that I had the most fun writing. And that was with a co-author, Eric Barnes, who’s also the founder of the very popular Laravel News website and newsletter. I had an absolute blast writing this book with Eric. The whole concept of the “We Do Lawns” fictional company and its owner, Todd McDew, and its assistant Patti Oregano, came about during the course of a very early conversation that Eric and I had about the book. We knew we wanted to base it around a companion project. I’m pretty sure it was Eric that came up with the idea of the landscaping company, and I think the name as well. And so of course we bought the domain, because of course that’s the first thing you do when you think of any project, right? Rush out and buy the domain before somebody else snaps it up.
And that just very quickly evolved into this story within a story, in which we had the company owner, Todd McDew - he’s a very gruff individual who knows he wants the website, but he doesn’t like tech. And so he begrudgingly basically works with you, the reader, to build the site. So, even though it’s of course a little melodramatic, we wanted to inject this sense of realism into the book, where not only are you learning about Laravel and Stripe, but you’re additionally getting some sense of the tension that you might encounter when you’re working with a client who might not be entirely rational. He might be making odd requests for certain features and things like that. So we just had fun with it. And I think the book is much better because of it. We have some really - I hope - funny dialogue at the end of each chapter, in which you’re interacting with Mr. McDew, and you’re telling him about the features, and he’s asking questions, very valid questions - and offering his opinion.
Not all tech books need to be boring and dry and very academic in nature. I mean the reality is, you can write a book that people enjoy reading, and have fun reading, and hopefully learn something from it that ultimately enriches their careers. I know that’s the type of book I want to write, and I have a hard time believing I’ll ever write a book that does not include some sort of companion project for those very reasons.
E: On that subject, and the subject of your writing and your experience with publishing and the kind of books you want to write - I wanted to ask you, given you’ve had experience writing a successful print book, and your work with Apress - I’d like to know why you decided to switch to Leanpub?
G: How many hours do we have? My interaction with Apress, as I mentioned, goes back 15 years now, and I’m still great friends with many of their employees. Apress is very much a part of me in many, many ways. And you’re right, I did write a very successful book for Apress, that has been in print for 11 years next month - and that’s very hard to believe. The book is called, “Beginning PHP and MySQL,” it’s in its fourth edition. And that was published in June of 2004. So we’re fast approaching its 11th year in print. By any measure, that’s a rare, an extraordinarily lucky outcome or result, to have a book that’s been in print that long. I’ve had a royalty stream for 11 years because of that book, and it’s worked out great. I can’t even imagine it could’ve worked out any better, right?
So logically - believe me, I’ve been asked this question many times - logically, why would you not continue working in that fashion? And the answer is, I think, complicated, but it really boils down to several key factors, first and foremost being control. I have complete control over how I write the book, how I go about writing the book, what my schedule is going to be for the book, how I want to market the book. Do I want to sell it just electronically, or do I want to do a print book? Which I have done, I self-published, “Easy PHP Websites with the Zend Framework” back in 2009 and 2010. And I published that as a real-deal print book, through a real, very famous computer book printer up in Michigan, called Malloy, and went through the process of publishing it in a very traditional fashion. I’ve since moved away from that because the electronic format has become, without a doubt, the predominant way to purchase computer books these days. But certainly control is a big part of it. I control all of those factors now, and I very much like that.
But let’s, I mean, let’s not beat around the bush here. It also comes down to the revenue side. And Apress almost becomes irrelevant when we move on to the revenue side. Because all traditional book publishers generally structure their compensation agreements, their royalty agreements, in the very same way: typically, an author gets an advance, which could be anywhere between - let’s call it three and six or seven thousand dollars these days - and there’s also the royalty stream that comes from it, which typically, this does vary a bit, but typically starts at 10, 12 percent. And what a lot of first time authors do not understand is that that comes from the net sale of the book. So if a book has a price tag of, let’s call it $50, it’s sold into a chain - Amazon, what have you - at a discount, because logically, the retailer needs to make money, so they’re not buying it at list. They’re buying it at discount. And I’m just throwing out a number here, just to keep this straight, let’s say it’s 50%. So a $50 book is sold in at $25. That is the number from which royalties are calculated. And again, just using the number 10% to keep this simple: a $25 sell in gives the author a $2.50 per unit royalty.
And there’s a million different other factors that come into play in regards to that money. That money of course is paid out a quarter in arrears. So you’re looking at 6 months before you see that money. But of course, the advance needs to be recouped, which makes perfect sense. But also a certain percentage of those royalties are held, often up to a year and a half, because of potential returns, which these days is really kind of irrelevant considering almost everything is electronic, or print on demand. So there’s a lot of revenue related factors that come into play as well.
Now granted, this is not - and I want to be very clear about this - this is not to imply that a publisher is taking you behind the woodshed in terms of making money, because they have editors, they have marketing, they have printing, they have production, they have offices. They have all of this infrastructure that self-publishers logically do not have. And that’s a great benefit to a lot of first-time authors. Because, well, you get an editor, and editors are expensive. You get a marketing team, you get all of these extra things, right? But that plays a big part in determining, on the monetary side, what you wind up earning. And I invite everybody to pull out their spreadsheet and do some math to figure out how many copies you have to sell - again, using that $2.50 per book number - to make a decent amount of money. You have to sell a lot. Especially when you take into consideration the hundreds, if not thousands, of hours that you’ve put into writing the book. And that’s a big deal.
Now on the flip side of that, working through a service such as Leanpub: obviously Leanpub has its expenses. But Leanpub’s only taking 10% of the sale, meaning the author gets 90%, right? [Editor’s note: Leanpub pays authors a royalty of 90% minus 50 cents on every sale.] It completely, quite literally, given the example that we’re using, turns that revenue agreement upside down. The author, the content creator, earns the majority of the money derived from the sale. And again, I invite everybody to run those numbers. They’re completely different. Now, you do not have a marketing team, you do not have editors. There are a lot of things that you do not have. But because that revenue model was flipped on its head, you only have to sell a fraction of the number of books that you would otherwise have to sell through a publisher to earn the same amount of money.
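Taking up Jason’s invitation to run the numbers, here is a minimal Python sketch comparing per-copy earnings under the two models. It assumes the interview’s illustrative figures: a $50 list price, a 50% retail discount, a 10% royalty on the net price, and Leanpub’s 90% minus 50 cents per sale. It also assumes the Leanpub copy sells at the full $50, and it uses a hypothetical $10,000 earnings target; advances, reserves held against returns, and taxes are ignored.

```python
import math

def traditional_royalty(list_price, retail_discount=0.50, royalty_rate=0.10):
    """Per-copy royalty when the rate applies to the net (discounted) sale price."""
    net_price = list_price * (1 - retail_discount)
    return net_price * royalty_rate

def leanpub_royalty(price, rate=0.90, fee=0.50):
    """Per-copy royalty of 90% of the price minus 50 cents, per the editor's note."""
    return price * rate - fee

def copies_needed(target, per_copy):
    """Smallest whole number of copies whose royalties reach the target."""
    return math.ceil(target / per_copy)

TARGET = 10_000  # hypothetical earnings goal in dollars, chosen only for illustration

for label, per_copy in (("traditional", traditional_royalty(50)),
                        ("Leanpub", leanpub_royalty(50))):
    print(f"{label}: ${per_copy:.2f} per copy, "
          f"{copies_needed(TARGET, per_copy):,} copies to earn ${TARGET:,}")
```

Under these assumptions the traditional route earns $2.50 a copy and needs 4,000 sales to reach the target, while the Leanpub route earns $44.50 a copy and needs 225, which is the “fraction of the number of books” Jason describes.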
I maybe went on a bit of a tangent there, but I think it’s very important that people who are weighing whether to use a traditional publisher or a self-publishing service such as Leanpub understand that dynamic. Although any publisher you ask will be perfectly clear and explain that very matter to you, I still think it’s lost on a lot of first-time authors.
So, there’s the control side of things, and there’s the revenue side of things. And of course, there’s the customer side of things. When working through a publisher, I have no idea who bought my book, unless they email me. I’ve no idea. Using a publishing service such as Leanpub, of course, I can email those readers through the Leanpub interface, which I do on a regular basis, so I can interact with many, many more readers than I would otherwise be able to working through a traditional publisher. That’s an aspect of book writing that I absolutely love. It’s really great to find out what other people are working on, and how they’re using my book to accomplish that.
Maybe jumping back to the revenue and control side of things again, another great advantage of self-publishing is the flexibility. Take the Easy Laravel book, for instance: I sell the book. I also sell the book plus three and a half hours of video. I sell the book, plus the videos, plus consulting: if you would like to consult over Skype, maybe I could look at your code, maybe I could explain some of the more complex aspects of Laravel or PHP in general. There’s that option as well. Because of the nature of the beast when it comes to traditional publishing, those options really aren’t readily available. You just can’t do that stuff so easily.
E: Thank you very much for that. I really appreciate all the detail. I think you explained everything very well. On that note, when it comes to doing things that couldn’t be done before, particularly in relation to royalties, that’s one of our big hopes: we’re hoping to unlock the creation and successful publication of books that previously would never have been written. Let’s say you’re writing, especially in the technical space - let’s say you’re a specialist, and there are only 1,000 other people out there in the entire world who are sophisticated enough in the area to be interested in reading a book. Conventionally, what publisher is going to take you up on that?
G: That’s right.
E: How much of a commitment are they going to really make to it? And how valuable is it for you as an author to do that? Well now, with something like Leanpub, which pays a 90% royalty minus 50 cents per transaction, if you can reach 500 of those 1,000 people and sell the book at, say, $20 or even $50 per book, you’re getting a 90% royalty, and suddenly it actually becomes worth it.
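The niche-audience arithmetic can be checked the same way. This small Python sketch assumes each of the 500 readers buys exactly one copy and applies the 90%-minus-50-cents royalty to each sale; for contrast, it also applies the traditional figures Jason used earlier (a 50% retail discount and a 10% royalty on the net price) at the same $50 list price.

```python
def leanpub_earnings(readers, price):
    """Total royalties if each reader buys one copy: 90% of price minus $0.50 per sale."""
    return readers * (price * 0.90 - 0.50)

def traditional_earnings(readers, list_price, discount=0.50, rate=0.10):
    """Total royalties when a 10% rate applies to the discounted (net) price."""
    return readers * list_price * (1 - discount) * rate

READERS = 500  # half of the hypothetical 1,000-person specialist audience

for price in (20, 50):
    print(f"Leanpub, {READERS} readers at ${price}: "
          f"${leanpub_earnings(READERS, price):,.2f}")
print(f"Traditional, {READERS} readers at $50: "
      f"${traditional_earnings(READERS, 50):,.2f}")
```

On these assumptions, 500 sales earn $8,750 at $20 or $22,250 at $50 through Leanpub, versus $1,250 at $50 under the traditional model.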
E: And so there’s a whole new space opened up for anyone, especially specialists, to actually write a book, and make enough to get a car or whatever it is you need. Or pay some tuition for your kids, or something like that. It’s a whole new way of making publishing valuable to authors.
G: That’s right, you hit the nail on the head. If the perceived audience were 500 to 1,000 members, that book just would not be signed. And logically, this makes perfect sense. A traditional publisher’s just not going to look at that, because it would in turn be a hard sell for them to get it into the retail chains, right? I mean, what retail chain is going to want to put a pre-order in on a book that has such a small perceived audience? Whereas with Leanpub, if an author can reach those 500 or 1,000 members, that is a fantastic outcome for all involved.
E: Yeah, I completely agree. Another interesting thing unlocked by this approach, including self-publishing and the in-progress publishing you can do with Leanpub, has to do with the technology space, where things can move really fast. Maybe a lot of people had their first or second book published conventionally. But the subject of one of the first interviews we did for this podcast series told us that, between starting a book and having it appear in bookstores, he had a child.
E: And the world moves on from when you start writing to when something gets finished.
G: That’s right.
E: And on the timeline of a conventional publisher, you were a thought leader when you started writing it, but no one knew that, and by the time your book comes out, it looks like just conventional stuff.
G: And this is a very good point. I can draw upon very recent experience with the Easy Laravel 5 book. That book was published on February 4th or 5th of this year, and coincidentally, I tallied it up yesterday: I have released 105 updates to that book since February 4th. Bug fixes, new chapters, improvements to sections, you name it. 105 updates, okay? If that book had gone through a traditional publisher, then once that February 4th date hit and the book was released, unless there was a really egregious error, a show-stopper, you’d be looking at a minimum of 6 months for updates to occur. That’s just the way it works. There’s a process that is necessarily lengthy to do those sorts of updates. Now, contrast that with what I’ve done here over the last 3 months: I released 105 updates to readers. And it’s so easy through Leanpub. I manage everything in GitHub, and write the book in Markdown.
And maybe a reader emails me and says, “Hey, this changed.” This just happened last week: “This changed in Laravel, you need to update this path.” No problem. I open up Sublime, make the change, commit the change, go into the Leanpub interface, and publish a new version. If it’s a big change, I tell the readers. If it’s not, I just do the update and don’t bother anybody. It’s great. In terms of writing environments - and I think I’ve used them all over the years, whether it’s Word or DocBook or LaTeX, you name it - this is the perfect writing environment. I can write the book in my code editor, which also happens to be Sublime. I can use the very same tools that I use every day for other work: Git, namely GitHub, to manage the book. And then, maybe most importantly of all, I can use the Leanpub production mechanism. I press the magic button, and the book is formatted.
With my first self-published book, in 2009 and 2010, I went out - and this is how insanely stupid I am sometimes - and bought InDesign, spent $700 or whatever it was, and laid out the book myself to printer specifications, because that book was printed as well as released in electronic format. I just had a horrible time managing all of that, and making the book look good. I used DocBook in a subsequent book, and that was great too, but still quite a process, a chore, right? That’s not my strength. I’m not a production editor. I want to write. I don’t want to make the book look good. I mean, I want it to look good, but I don’t want to invest my time in doing it. And Leanpub does that flawlessly. The book looks professional, it looks great. I could go on and on about how great it is.
E: Well thanks, thanks very much. We really appreciate hearing that. Customer development is a very important philosophy at Leanpub - you know, the Steve Blank kind of philosophy. It’s been through interacting with people, with authors directly, that we’ve managed to build the Leanpub engine, as it were, into something that hopefully has a more or less automagical book creation button now.
G: That’s right.
E: And again, thank you very much for your kind words. I actually have a very kind of “working author” question for you. I noticed that all your Leanpub books have the minimum and suggested prices set to the same amount. And for anyone listening, on Leanpub, authors actually set a suggested price for their book, which is presented to customers on the book’s landing page. But they can also set a minimum price that is lower than that, and Leanpub customers can actually take a slider and slide that down to the minimum price if they want to. Or they can slide it up, and even pay more than the suggested price. So, Jason, I was wondering what led to your decision to make the minimum and suggested prices the same on all your books?
G: This is going to be a very anticlimactic answer. I’ve never given it that much thought. I mean, it’s definitely a cool feature of Leanpub. And you see, I’m always scouring the Leanpub catalog and looking at the different books, because it interests me more than anything else, and I see that a lot of authors do set those differently. I just, I don’t know. I just haven’t given it that much thought at that level of detail. Basically, my approach to pricing is really simplistic, and really underscores my general lack of expertise in that side of publishing. I start out by setting a price that I think is reasonable, fair, and reachable by the average reader. Not something extraordinarily expensive; it tends to fall around $30, more or less.
And then I will slightly tune it, which is another great aspect of self-publishing. I will start to tune it in the weeks that follow. If the book is not selling as many copies as I had maybe thought, then logically I’ll drop the price a couple of dollars. And over the weeks, I basically try to find the perfect balance for the book price. But I just haven’t given the whole minimum pricing concept much more thought than that. I’m just trying to find, again, a fair and reasonable price. And I leave it at that, and put the majority of my time and effort into continuing to improve the book as quickly as possible following that initial publication.
E: Thanks for that. I think that’s a great answer. It touches on a lot of things, including changing the price over time, but also where your energy and attention are best directed when you’re working on a project. We’ve had some authors who really obsess over pricing: what’s the psychological impact of having a minimum price that differs from the suggested price? That doesn’t necessarily mean they end up doing it one way or the other. And other people are like, “You know what? Pricing is not something that I want to spend a lot of time thinking about.” Obviously people will adjust it and maybe have sales, like a coupon on Twitter every once in a while. But it’s really interesting how sometimes you can optimize your processes by sticking to the things you really care about, and not trying to be someone you’re not and get involved in a type of customer engagement that just isn’t something you’re really interested in.
G: To the audience, I’m sitting here listening to this, and really smiling ear to ear, because that in a nutshell describes the transition that I have made as a self-publisher over the last five or six years. Back in 2009 when I published the Zend Framework book, as I mentioned earlier, I was doing the production myself. I designed the original cover myself, and if you’ve ever seen anything I designed, I am horrible at it. I had an e-commerce site set up through wjgilmore.com, so I was managing the sales myself. Well, I ask everybody: how much time did I have left to write books? Not much, because I was dealing with all of this other stuff that was not my core competency. And over time, I learned some hard lessons about what my core competencies are: writing books, and hopefully talking to a lot of customers or potential customers about those books. Sticking to those competencies, rather than trying to be a control freak and manage everything else, right?
I published the Active Record book on Leanpub in August, and the eight months or so since have, without a doubt, been the most productive of my career from a writing perspective. There is no question. Let’s see, I’ve written over 500 pages of published material, and have at least another 500 pages in development right now. So, without a doubt, not dealing with production, not dealing with e-commerce, not dealing with cover design has freed up the time to do that. There’s a very important lesson to be learned there, because it’s easy to get caught up in pricing and fret over that, and easy to get caught up in all this other stuff that is not part of - not only what you’re good at, but what you enjoy doing. I never enjoyed that stuff, right? I mean, I guess the e-commerce stuff has always been fun, but I just never liked the other stuff. I like writing, and that’s all I want to do.
E: It’s interesting: in the context of technical publishing, that reminds me of an old colleague of mine, an English professor. When his first book of poetry was published, I talked to him about it, and he said to me, “Of all the things I had to go through to get my book published, almost none of them had anything to do with writing poetry.”
G: That’s right.
E: So yeah, it’s true across genres, because of the way the industry has always worked.
G: To be frank though, even in self-publishing, the last 20% of the time spent on any book is by far the most stressful and the most time-consuming, precisely because you have a million little details to wrap up. I’d love to be the fly buzzing around every would-be author’s room, because I believe it’s during that time that most book projects wind up dying. The author becomes so tired and frustrated with himself and with the process that there are probably countless books that have never been published simply because that last 20% of the project is so rough. So by removing all of these other very important tasks from my stack, I can power through that 20% much, much more effectively than before.
E: On that note actually, I have one last question for you.
E: Is there anything on Leanpub that you would like us to improve? Or something that you think is missing, or that would just make your workflow even better?
G: I would say the book cover. The dedicated book pages are great, I mean, in that they serve the very obvious purpose of telling would-be customers about the book’s contents. I think those could use some work.
G: And again, this is coming from - I just told you I have zero design acumen. So I would not be the person to do that. But I think there would be some interesting opportunities to give authors the ability to add even more content to the book, and maybe organize it a little better.
G: One of the reasons that I have companion websites for each book - easyactiverecord.com, easylaravelbook.com, easyecommercebook.com - is precisely because I can provide even more information to readers there, even though maintaining them can be tedious.
Now at the same time, I appreciate very much there being a single page for the book, the admin interface. You go in, you can use HTML source if you want. In some cases you can use Markdown. It’s very streamlined, you can input the information, hit update and be done with it.
So there are two sides to that coin, right? On the one side it’s very streamlined and great. On the other side, I just wonder if there would be the opportunity to add a little bit of additional information. Other than that, I very much enjoy the service. As your colleagues know, I email hello@leanpub probably once every two weeks saying, “Did you know this is broken?” or whatever. Little minor things. And I think, “Boy, these guys must really get irritated when they see…”
E: Oh no, no. Quite the opposite. Listening to people is some of the most important work we do. And I’m not saying that in a precious way. It really is. One of the wonderful things about working with authors is that they like to write, and they have long attention spans. So they’re sort of the perfect kind of customer to have for customer development.
E: They think hard about what they’re doing and are investing a lot of time and thought in what they’re doing. Sometimes their books are very important to them professionally, intellectually and also personally. And so any feedback we get is very good for us, and we really appreciate it. So I just wanted to say thanks for that feedback about how we might be able to improve those initial pages in the book, and also for what you were saying about your websites. I noticed in particular with easyactiverecord.com, you have a “what readers are saying” section. Something like that where people can put testimonials, would probably be really useful -
G: And those sell books. They have a very profound impact on book sales. Yeah, I mean stuff like that, right? Stuff to really polish the edges of the page. Because fundamentally the important material is there, right? You guys definitely nailed that. But polishing the edges and adding those - again, optional features. They don’t have to be mandatory, but... Now that the gears are turning, it would be interesting to receive a little bit more in terms of customer demographics. For instance, country or state. Little details like that help if I want to do a Facebook campaign, which I’m running right now, or a Twitter advertising campaign, because you can target individuals very precisely and hopefully put your book, your advertisement, in front of them. Having that sort of information would help - and I understand why Leanpub doesn’t divulge it, unless the reader allows it.
G: I totally understand that, why Leanpub doesn’t divulge all of the customer details. But having some general information - state, country, city. Or any other optional information - profile information that the customer, again, would like to divulge. That would be pretty useful.
E: Thanks for that, that’s a really good suggestion. We do have some Google Analytics support, I think, that you can use for tracking people who come to your web page or people who convert.
G: I do use that, yeah.
E: But as you say, for us there’s always this trade-off. We know the kind of author who’s into it wants as much information as possible, and so do publishers who use Leanpub. But it’s very important to us to protect our readers, so that information that’s important to them isn’t being released. Basically, we don’t want people to have to opt out. That’s just not very Leanpub.
G: Totally agree with you there. I mean, people can opt in to share email addresses with authors and things like that. For example, the feature you were describing earlier, where you can actually email your readers: that’s done through a Leanpub form, so that they don’t see your email address, and you don’t see theirs.
E: That’s right, and there’s a certain type of author for whom that is a deal breaker. They’re like, “I want my list.”
E: And they want as much information as they can get about the customers. That’s always a trade-off for us, and we’ve got a pretty straightforward position on it. But it’s always something that we can work to improve, and maybe even just be clearer about.
G: Yeah, and my answer to that is a simple one, because I’ve had this very conversation with other Leanpub authors and customers, including authors who are adamant about collecting more customer information. I get that side of it as well. I understand that. Well, there are other services out there where you can do that. If that is what you need, then go there. Or build a list through a newsletter, right? I mean, there are other ways in which I do exactly that. I manage my email list through, in my case, MailChimp, and use Leanpub for a significant part of my sales. Leanpub strikes, in my opinion, an appropriate balance in that regard, because I put myself in the position of the reader. If I’m the reader, do I necessarily want to divulge my details to a potential author? I don’t know - maybe. But as the customer, I appreciate having that choice.
E: Well thanks very much for that! I think our time is just about up. Thanks very much for being on the Lean Publishing Podcast, and for being a Leanpub author!
G: Oh it’s my pleasure, and thank you very much for your time.
This interview has been edited for conciseness and clarity.
A Game of Features: Killing the Affiliate Program, Advertising Program, Offers & Services, Outdated Documentation and Leanpub Enterprise
published Jun 16, 2015
Today we’re announcing the ending of a number of Leanpub features and experiments.
The following features are being killed today:
- The Affiliate Program
- The Advertising Program
- The Offers & Services Feature
We’ve also killed the Leanpub Enterprise experiment. If you didn’t notice, you’re not alone. Oh yeah, and we’re going to be removing a bunch of documentation that is about 3 years old. We’ll be replacing it with updated documentation, and (once we’ve done everything we plan) new videos, etc.
At Leanpub we value experimenting and learning. We’re called Leanpub after all.
So, we try a lot of stuff, either of our own initiative or in responding to customer and potential customer feedback. Sometimes we learn what works or doesn’t work, and other times we learn how we feel about a market or a process.
Recently we’ve learned a few things:
- We don’t like doing enterprise sales. (Goodbye Leanpub Enterprise.)
- We don’t like policing an affiliate program.
- We don’t like the inherent opacity involved in the affiliate program or the advertising program.
Leanpub is based on a culture of mutual respect, transparency and trust between us, authors and readers.
Fundamentally, an affiliate program breaks this. On a book purchase page in the normal case, we show two sliders: what the reader pays, and what the author earns. If there’s a cause, we show three sliders: reader pays, author earns and cause gets.
If there’s an affiliate involved, earning 50% of the sticker price, we found ourselves not wanting to show that as a slider.
But obviously we’re not going to lie to readers, so doing any kind of garbage like burying the affiliate fee in what the author earns and then having some legalese about “author earns” meaning “what the author either earned or would have earned if 50% of the sticker price wasn’t going to the affiliate” is just stupid. And it’s even worse if there’s a cause involved.
So, we learned that we’re uncomfortable with the affiliate program.
Then, couple this with the fact that the affiliate program both attracts fraudulent and scummy affiliates and has essentially been a failure for most authors, and the decision to kill it is really easy.
While we’re at it, we’re killing the advertising program, by which Leanpub runs ads and acts as an affiliate. The advertising program suffers from the same issues around transparency and sliders, and we couldn’t find a way to run ads which were revenue neutral. (In my ideal world, I could have spent $1000 on Twitter ads and earned at least $1000 in profit to cover their cost. We achieved 10% or 20% of that in some cases, but nothing approximating 100%. And since we’re currently bootstrapped, we’re spending our own money, not a VC’s.)
Last, we’re removing the offers and services feature, since it’s distracting and represents a bunch of meetings on our part for not much benefit for our authors. What we want to be doing is building features that make it easy for interested parties to contact Leanpub authors, instead of us having some sort of facilitated intermediary chaperone garbage feature to play that role. And, just like with the former iBooks and Kindle feature, we only like earning money when we actually deliver value.
I’d like to emphasize that we are more committed than ever to the core Leanpub workflow.
Things like minimum & suggested price, causes, in-progress publishing, Markdown & Markua, bundles, packages, etc. are not going anywhere. The “more wood behind fewer arrows” cliché is also good advice. We have more focus and clarity around our vision than ever, and with this focus, we are improving these core workflows.
The Markua spec and support is coming along great, including the rewritten (and open source) in-browser editor coming soon.
We recently added the ability to write in GitHub but get the output in Dropbox, which is probably the best way to write on Leanpub in a team. (It’s the “mullet” feature: business on the front, party on the back. You see this feature as a “Send Output to Dropbox” checkbox if you go to https://leanpub.com/YOURBOOK/writing_settings and select “Using Git and GitHub” or “Using Git and BitBucket”.)
We’re also enhancing the multiple copy support in packages, so that an author will be able to say that a package is for n copies, and then a purchaser gets n redeem tokens (& links) which can be easily shared with friends and colleagues, etc.
Finally, we have a much larger vision which I’m not going to talk about publicly yet, but which is really exciting. I’ll just say that we are aware of all the things that suck with Leanpub, and we have an interesting way forward…
Thanks for being Leanpub authors,
published Jun 11, 2015
Roger Peng is an Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and co-founder of the Johns Hopkins Data Science Specialization on Coursera and the Simply Statistics blog, where he writes about statistics for the general public.
This interview was recorded on May 27, 2015.
Len Epp: Hi, I’m Len Epp from Leanpub. And in this Lean Publishing Podcast interview, I’ll be interviewing Dr. Roger Peng. Roger is an Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He is also a co-founder of the Johns Hopkins Data Science Specialization on Coursera, which has enrolled over 1.5 million students, and he’s a co-founder of the popular Simply Statistics blog, where he writes about statistics for the general public. Roger’s research interests include the study of air pollution and health risk assessment, and statistical methods for environmental data. He is also a leader in the area of methods and standards for reproducible research, and is the reproducible research editor for the journal Biostatistics.
In addition to being the author of more than a dozen software packages implementing statistical methods for environmental studies, he has also given workshops, tutorials and short courses in statistical computing and data analysis. Roger recently published his first Leanpub book, “R Programming for Data Science”, which uses material developed as part of the Johns Hopkins Data Science Specialization. The book is available for free, with a suggested price of $15, and already has over 17,000 readers. The book can also be bought along with lecture videos and datasets.
In this interview, we’re going to talk about Roger’s professional interests, his book, his experiences using Leanpub, and ways we can improve Leanpub for him and other authors. So, thank you Roger for sitting through that introduction and for being on the Lean Publishing Podcast.
Roger: No problem, thanks for having me. And I just want to warn you that my building is right next to the hospital here, so you may hear the occasional siren.
E: That’s okay, it’ll just give some background color.
E: So I’d like to start with a couple of biographical questions. Can you tell us how you first became interested in statistics generally, and in biostatistics specifically?
P: That’s a great question. It’s kind of a weird path that I took. I studied math in college, and I think that’s how a lot of people get involved in statistics initially. My math requirements included a course in statistics, so I took one - I think it was probability theory. And I really enjoyed it, so I kept going down that road and taking more and more statistics classes, and it ended up being kind of like a minor in that area. And so I just naturally thought about applying to graduate school. My older brother had gone to graduate school, so I figured that was the right thing to do. And it was kind of funny - I graduated from college in 1999, and basically it was the dot-com craziness. Everyone was going to the software companies, and here I was going to apply to graduate school. So I kind of bucked the trend there.
Anyway, so I applied to graduate school in statistics, because I thought that’s what I wanted to do. Seemed like a fun field. So I went to UCLA and got my PhD there. I didn’t originally learn any biostatistics per se; I wasn’t really working in the biomedical sciences. When I was looking for a job, my adviser’s grad school roommate was a professor at Johns Hopkins. I had no intention of really applying there, because I didn’t really think I was doing biostatistics. But my adviser said, “My old roommate’s there, he says he really loves the place, so you should check it out.” So I applied and I interviewed, and I really liked the people there. And I thought, “Okay, well even if I’m not specialized in this topic area, it seems like a great environment, a great institution.” So I got the job here and I came.
So it was kind of weird, because it wasn’t necessarily directly what my training was. But I think for me, a lot of decisions I make, in terms of what to do or where to go, are based on, what people are involved in it? Are there good people involved? And if I like being with them, then that’s the bottom line for me.
E: It’s interesting that you bring up the startup world. I mean, that’s how a lot of decisions are made in startup-land as well, right? It’s like, we’ve got lots of options and lots of ideas, but which startup should we work with? The people that you’re going to be involved with are often a driving factor there.
P: Yeah, because I think things always change, and the people need to be able to deal with it. And you’ve got to make sure that you’re with the right people when things go wrong.
E: I have a specific question about the work that you’re doing now. Your website says you’re working on environmental biostatistics, and on how air pollution and climate change affect human health. Can you tell us a little about how you use statistics to study those effects?
P: There are a couple of areas. My biggest area is probably looking at outdoor air pollution and population health. This work directly informs national-level regulation of air pollution standards. What we do is look at data from the US, where the Environmental Protection Agency monitors air pollution all across the country, in all the major cities. The idea is to see how changing levels of air pollution are related to different population health metrics. So we might, say, look at the number of people who were hospitalized for a heart attack on a given day, or the number of people who were hospitalized for a respiratory infection - something we think is linked to air pollution exposure. So we have these very long time series of daily pollutant levels, and from day to day, things go up and down.
So, you would imagine that if pollution is linked to health, that as pollution’s going up and down, the various health metrics also should be going up and down. But the problem is that teasing out that signal is really hard. Because it’s not the kind of signal that - air pollution’s not the kind of thing that knocks you over as soon as you walk outside, right? Well at least not in the United States, right? And so, there are all kinds of other competing factors that are a risk for your health. Teasing out the signal that air pollution contributes to either morbidity or mortality risk is really where statistical models are needed.
Back in the old days, in the 1940s and 1950s, when pollution was just out of control, you didn’t need fancy statistical models to see that it was affecting people’s health. You just had to go out on the street and see people having problems. But now that pollution levels are lower, those kinds of problems aren’t so obvious. Nevertheless, we still see pretty strong associations between changes in pollution levels and various health outcomes.
E: I imagine it must be even more complex when you factor in climate change?
P: Yeah, climate change affects how we think about things in the longer term, right? There are different time scales on which you could think about air pollution problems. One of them is the day-to-day level. Another is how things change over time: are things improving as air pollution levels go down? Climate change can affect that in a variety of ways. One is by affecting the weather, which interacts with air pollution levels. The other is that, as we implement policies to deal with climate change, those policies have a direct effect on air pollution levels too. For example, if we deal with climate change by shutting down some power plants, that will also directly affect the levels of pollution. So there are lots of interactions between the different things there. Statistical modeling is useful for integrating all the different kinds of data that you come across, whether it’s climate data, air pollution data or health data. And it’s also useful for teasing out these small signals that we have to detect.
E: And are you always working with a national dataset, or do you focus on a specific region, or say, urban versus rural, or something like that?
P: My work focuses on national-level studies. On the pollution side, we get data from the entire US EPA monitoring network. We get health data from really large administrative claims databases like Medicare and Medicaid, which are large national insurance programs. So we can look at insurance records, see every time someone was hospitalized, mark that down, and then see if it’s related to changes in air pollution levels from the monitoring network.
E: That’s really interesting. This reminds me of stories in the media in the last year or so about Paris taking half the cars off the streets, so that you could only drive if your license plate ended in an even number, or an odd number, to cut down on car pollution. As you were saying, pollution levels have generally gone down in the States over the last couple of decades - do you see a problem like that emerging in an American city in the next 10 or 15 years?
P: Any problem like what, sorry?
E: Like there is in Paris.
P: If you look across the nation, things have gotten much better over the last few decades. But there are still cities that have very high levels of pollution and still have problems. For example, at the Atlanta Summer Olympics - I think it was in 1996 - they implemented a scheme like that for traffic control, I think just because they envisioned lots of people coming. And there are cities here in the United States that are still above the regulatory limits and need to improve their levels. So although conditions are generally much better, it’s not a solved problem yet.
E: Switching gears slightly to talk about data science more generally: I looked at the Johns Hopkins Data Science Lab website, and I’m going to read a quote and ask you to explain a little bit about what it means. It says, “The revolution in measurement, and the resulting deluge of data has made data science the most important field of study in the world today.” Can you explain a little bit about why data science is so important, for people who might not be familiar with it?
P: Sure yeah. So I think, if you look around - if you just look around yourself, everything that you look at is essentially generating data. And if it’s not generating data itself, we have some device that can collect data from it. So everywhere you go in the world, and in your life today, there is information that’s being generated, kind of spewing out into the world. And a lot of it we can’t collect - there’s just too much. But we can collect more and more of it as time goes on, because of improvements in technology and in computing power.
And if you looked back many years, say 100 years, the biggest issue was collecting the data, because it was very expensive, and you had to be very careful not to waste resources collecting data. Then once you got the data - assuming you did it right - the analysis was pretty straightforward, because there were maybe 10 data points. But now we have the reverse situation, where collecting the data is very routine, in fact almost too routine sometimes. The data is just happening. It’s being collected whether we like it or not. Look at web server logs: the data’s just being collected. And so now the analysis has actually become much more complicated, and much more difficult to do, because of the volume and the complexity and the heterogeneity of all the data that’s being generated automatically.
So the difficulties and the skills that are required have really flipped. Before, you had to be really careful about optimizing your study design and making sure that you weren’t wasting things. You still have to worry about that, but now data analysis skills are really necessary. There are a lot of fields that didn’t emphasize the data analysis part, and they’re realizing now, “Oh, actually we’ve really got to train people in this area,” whereas in the past they did not have to. So that’s why I say that data science is, in many ways, taking over every area of science or business or whatever. Because everywhere there’s data, the skills to analyze those data are becoming increasingly valuable.
E: I imagine that all this data is unlocking new areas of study. For example, in the past, people weren’t all wearing Fitbits and counting their steps every day. So now that things are being tracked that weren’t being tracked before, it must open up new areas for study.
P: Yeah, there are new areas of study being created. Things like wearable computing didn’t exist a couple of years ago. There are just new kinds of data that we’ve never seen before, and so we don’t even know how to characterize them. If you’re wearing an accelerometer, how do we even get any information from that data? How do we know what you’re doing from the accelerometer readings? That’s where statisticians like me, and many other people that I work with, earn our money, because we have to figure out ways to look at the data, to characterize it, and to understand what’s happening.
E: Okay, I’m not sure if my next question is related to that. But you do say on your website that you have a special interest in reproducible research. I was wondering if you could explain a little bit about what reproducible research means, and why it’s so important to you and to your field?
P: Reproducible research is like the scientific analogue of open source software. The basic idea is that, for work to be reproducible - and there’s sometimes a lot of confusion about the terminology - you want your data and your software code to be available for others to look at. So I can take your data, and I can take the software that you used to analyze it, and I can reproduce the numbers that you’ve published, or the graphs or plots that you made, or whatever the result was. It’s not that I’m going to redo your whole study, I’m just going to take your data and produce what you produced.
And the funny thing is that this was not really that important many years ago, because the data were so small, and there was no software, right? There was nothing to provide. If you really wanted to know whether an experiment was valid or not, you would just do it again yourself, right? But now the problem is that a lot of studies are so big, and involve such large quantities of data, that - like I said before - the collection of the data’s not really the challenging part. It’s really the analysis, how people analyze the data to come to a conclusion about nature or whatever it is they’re studying. That’s really where all the difficulty is.
So we really need to know those details, but the problem is that the publishing infrastructure is not really designed to let people know what they are. The bottom-line way to know what those details are is to see the code and the data. But again, because a lot of this happened very quickly, the infrastructure isn’t there yet for giving people access to data and to software code, and a lot of that has to be built. So I was interested early on in getting this idea across to people, and convincing them that it had to happen in order for science to move forward.
E: That’s really interesting. I know that in the popular science media there have been articles over the last year or so about how the interests of scientists aren’t necessarily aligned with reproducing research. You get the headline or the promotion or the patent for doing the original research. And so articles get published, and I guess only a very low proportion of experiments or studies are actually reproduced at all.
P: Yeah, yeah.
E: Is that true in your specialty as well? That there’s less of an incentive to work on reproducing someone’s results than to do your own original work?
P: Yeah, I mean I think that’s a general phenomenon, and a general aspect of our culture. The emphasis is on discovery, and analyzing a published data set and coming to the same results is not what might be considered discovery, right? On the other hand, analyzing someone else’s data set and finding out that something was wrong, well, that is discovery, right? So there are some people interested in doing that. But in general, just reproducing another finding is difficult. It can be hard, for example, to get funding for it, or to get it published in a high-profile journal.
However, when it comes to something that’s really high impact, something that’s really interesting to that subfield or field, it will get reproduced. If it’s something that’s really surprising, or something that could have an impact on the entire field, people are going to want to know whether it’s true or not. And the only way to determine whether it’s true is to have multiple people do the same experiment independently.
If no one cares about it, then it’s kind of hard to justify reproducing it. There are a lot of scientific publications out there, and it’s not like every single one of them is going to be reproduced. It’s just not physically possible.
E: Right, okay.
P: But there are many examples in the recent popular press of things that were either faked or weren’t reproduced. And you realize that that’s kind of how the system is supposed to work. There’s this example with stem cells, I think. It was a very surprising result, right? So what happened? Immediately ten labs went to reproduce it, and they couldn’t do it. None of them could. So they knew it was wrong. So there is an interest in reproducing things, but it’s probably weighted more toward the things that are surprising or really exciting.
E: Fair enough. Speaking of original research, I read in the preface to your book that your first experience with R - and I’ll ask you a question about that in just a minute - involved an analysis of word frequencies in classic texts like Shakespeare and Milton, to see if you could identify authorship based on word frequency. And I was wondering if you could explain a little bit about how you got into that, and what your results were, if you can remember that far back?
P: To be honest, I can’t remember how I got into it. I needed a senior project when I was in college, and I think my advisor pointed me to a paper published in the 1960s by two statisticians who had analyzed the Federalist Papers, because there was some controversy over who wrote which of them. They did a statistical analysis, and I adopted the same approach that they took. The question was, “Can you identify certain written works based on the rate at which they use what are called function words?” Words you don’t really think about, like “the”, “and”, “he”, “she”. You probably don’t spend a lot of thought on how frequently you’re going to use those words. And so the idea is that their frequency reflects your natural personal style.
So the analysis involved taking texts that we downloaded from Project Gutenberg, and using a Perl script to divide the text into words and count how often each author used a certain subset of these function words. Then - I don’t know how technical you want to get - we used a basic linear discriminant analysis to see if we could separate one author from another. And it was pretty straightforward actually. It was surprising how well the authors separated. Granted, we picked a group of people who were pretty different from each other. But you could see that authors who wrote in the same time period were closer to each other than authors who were writing in very different time periods. So there was a kind of logic to what we found - it did seem that a lot of the books or plays from these famous writers were identifiable from how frequently they used these kinds of meaningless words.
E: And did you have - I’m just curious if you had any kind of response from people on the humanities end of things?
P: Well I would be surprised if you met anyone who’s actually read that paper. I mean it was published in a statistics journal, so I’m not sure how often they’re pulling that off the shelf.
E: Well it’s interesting, because there are some pretty big controversies about authorship, specifically around Shakespeare, right? Like did Shakespeare write the plays? Or was it a group of people, or was it in fact someone else? And so I was just curious when I read about your experience with that - if anyone had kind of gone, “Aha, here’s another tool for me to make my argument that it was actually not Shakespeare who wrote the plays,” or something like that.
P: No, I have not gotten any emails or inquiries along those lines. But I think you’re right. Authorship is always a very interesting topic to people, and not just in literature, but in many different areas. It is interesting to think about how you would characterize something numerically - for example, a piece of music or whatever - and then be able to attribute it to one of two different people. But I’ve not been enlisted in that.
E: Yeah, it’s a fascinating space. I mean, because there are often biographical things that people will try to pull out of an author’s writings. Like, were they hiding something? And people will, I mean, I know this from my experience in the humanities, that sometimes people will try to tease those things out. But it’s been quite a while since I’ve heard of anyone trying to do a statistical analysis of word use like that.
Anyway, moving on to talk about your book more specifically. You explain in the book that the R programming language has become the de facto programming language for data science. Can you explain a little bit about what R is, and why that’s happened?
P: Yeah sure, I’ll try. R is a language that was started in the early 90’s. It was created by two statisticians from New Zealand. Originally they wanted to create a statistical language that was free, and that could run on very lightweight computers - I think they were using old Macintoshes - and they wanted to use it to teach statistics. That was their goal. They didn’t have any grand aspirations at that time, I think. But, one of the issues - so, back then, open source software was still in some sense controversial. There were really no statistical packages of any quality that were free or open source. And so, you had to pay a lot of money to use these statistical packages to analyze any data. Unless you were at a big company or at a university, you didn’t really have access to this kind of stuff.
And so they put R up on the web in the late ’90s. And it was really one of the first open source statistical packages out there that you could use to do serious data analysis. I found it just because I didn’t really want to pay the money for all these expensive packages, and I started using it pretty early on actually. It’s a language that’s in some sense a clone of an earlier language called S+, which was one of the ones that cost a lot of money. A lot of people, including myself, had been trained on S+, and so it was easy to go over to R. It had a similar syntax, things like that. So that’s how I started out using it.
And I think very quickly, as many successful open source projects I think experience, a big community developed around it - all over the world, in Europe, the US and Latin America, Central America. A lot of people gathered around it, I think initially because it was free, and eventually I think because the community itself becomes a reason to use the package. People started building add-on packages that you could load up, and it became this thing where all of a sudden R was better at some things than a lot of the commercial packages. And I think from then there was no turning back.
I got involved somewhat early in the history of the language, and saw it develop and become popular. And now it’s actually hard for me to comprehend sometimes how popular it’s become. I always thought it would be this niche academic thing. But now it’s in business everywhere. There are companies built around selling R and consulting in R, and there are a lot of data science companies using it for analysis. Its capabilities have really expanded in so many different directions. One of the things that makes it great is that it’s very customizable. Anyone can implement a procedure that they want to use to analyze their data. It’s a very flexible and powerful programming language, and it has a great community behind it, an enormous community now, to support people and to learn new things.
E: Speaking of the size of the community, I noticed from the description of your book that the Coursera course it’s partially based on has had 1.5 million people participate in it. I’m wondering if you could explain a little bit about that, or if I’ve got it slightly wrong?
P: The 1.5 million is not for the one course. We have a sequence of courses that Coursera calls a specialization, and the sequence has nine courses. That’s our Data Science Specialization. It follows the pipeline from how you get data, to how you clean it, to how you analyze it, and then how you make data products. R Programming is just one of the nine courses. It’s a month-long course that runs every month, and it typically gets on the order of 40,000 to 45,000 students enrolled per month.
The other eight courses are not all quite as popular as that. All nine courses run every month, and across them we get about 170,000 people enrolled per month. So we have a lot of students. There’s obviously a lot of interest in this area, and the R Programming class is one of the more popular ones in the sequence.
E: Do you know if your students are coming from all around the world? Are they concentrated in certain areas?
P: They are coming from all over for sure. Less than half come from the US. Something like 30 to 40% come from the US, and then the rest come from all over. We get a lot of people from China, Brazil, India, the UK. So it’s kind of all over the map, yeah.
E: That’s amazing. Are they mostly people who are studying in a university somewhere, who get directed towards one of the nine courses, or towards the entire specialization?
P: The people who are studying in university are a big group, but they’re not the majority. A lot of the people that we get are working full time and are looking to upgrade their skills: to learn something new, and maybe to change careers or change positions in whatever they’re doing. I think our sequence is structured nicely for those kinds of people, because each course is a month long and runs very frequently, so you can take it whenever you’re ready. There are a lot of working professionals.
E: I imagine that the courses are taught mostly by video?
P: Yes, we have lecture videos. And then, depending on the nature of the course, we may have quizzes. My course has programming assignments that you have to complete, and they are graded by a unit testing framework. Some of the courses have projects where you have to do a data analysis, or create some software. So yeah, we have all kinds of things like that.
E: Is having an accompanying book a conventional thing for a Coursera course? Or is that something that you and your colleagues are innovators on?
P: That’s a good question. It’s hard to say what’s conventional for a phenomenon that’s like three years old.
E: Very good point.
P: But I don’t know, I don’t think it’s that common, unless you’re someone who already had a textbook. But we felt like it was a natural thing to do. And I think we would have done it sooner had we learned about Leanpub sooner. We were looking, but we couldn’t quite find the right mechanism. We didn’t want to use a regular publisher, because of the nature of the courses that we teach: they’re very low cost, and you can take them for free, so they’re hopefully accessible to as many people as possible. We wanted to layer on something like a textbook using a similar kind of model, and a traditional publisher really was not the way to do that.
E: No, no, fair enough! Do you use the ability to update your book quite a bit? Is that something you do once every couple of weeks?
P: No, well - initially I did update it a little bit. But the courses are fairly mature at this point, so they don’t change that much. And I wanted the book to match the course material somewhat closely, not exactly. My plan is that the books will evolve. So maybe not on a very frequent basis, but on a regular basis things will be added to the course, things will be added to the book, and so some things will be updated.
E: And what were some of the reasons you didn’t want to go with a traditional publisher?
P: Well, I’ve done it before. I have another book through a traditional academic publisher. The bottom line is that they don’t really hit the right audiences, in my opinion. And from an author’s point of view, you’re going to have to charge a lot of money to make it worth your while. It’s also a very slow process, because the publisher really doesn’t do anything for you. You have to do all the formatting, everything.
E: Oh really, you have to do the formatting as well?
P: Oh yeah, I mean for an academic book, unless you’re writing something that’s guaranteed to be a bestseller, you have to do everything. They do a little bit of marketing for you, and then they go to the conferences and stand at the exhibit booth for you. But there’s not much else that happens. And they do a little editing. So it’s a lot of work to go through, and then to have to sell it at such a high price. The number of people who are going to see this book is very limited from the get-go. I had that experience already with one of my other books. And so I was looking for something different, something that we could price low but still make it worth our while. And I think Leanpub just kind of hit all those points. And in addition, I think the authoring process I found really attractive.
E: Oh great.
P: Writing in Markdown, but still being able to do all the mathematics and the code and everything. It was just, it hit the right kind of balance I think.
E: So you were familiar with Markdown before you came to Leanpub?
P: Yeah, in fact we teach it in one of our courses.
E: Oh, great. Just talking shop a little bit - can you tell us how you found out about Leanpub? Was it just kind of searching around for a publishing platform?
P: Yeah, I actually heard about it from my colleague Brian Caffo, who teaches the specialization with me. He’s one of these guys whose brain is kind of connected to the Internet, so he’s always aware of the latest things. I think he found it, and he published a book called “Statistical Inference.” And he just raved about it, to the point where I said, “Okay, if I don’t do this myself, I’m just going to have to keep listening to him talk about it.” So I signed up and started the book. And once I got going, I realized - I don’t know, it kind of feels like Leanpub has hit every pain point that I had about publishing, simultaneously. It’s like maybe you guys were living in my bedroom or something, figured out every problem that I had with the publishing process, and just solved it. So it was just a weird coincidence, I think.
E: Well that’s very nice to say, and I’m very glad to hear that. I mean, Leanpub’s been around for a couple of years already, and customer development has been really important to us. So a lot of what you’re seeing in Leanpub is other people like you who’ve been kicking the tires for quite some time, and giving us feedback. And it is one of the pleasures of working with people who are doing something serious and sustained, like writing a book - is that, they like to give you feedback, and they like to write. And they like to analyze things. So I’m really glad to hear that, because if you find something in Leanpub that you’re like, “Oh my God, I can’t believe that was there, but that’s exactly what I needed,” that’s probably because someone just like you was there at some point when it didn’t exist, and was like, “You know what would be really great, would be if we had this.”
On that note actually, I would like to know if there’s anything you think that we could do to improve? Or if there was anything you saw that was missing? If you could have your one wish feature built for you, what would that be?
P: There probably is something, but I can’t - it’s one of those things where like, when someone asks you, you don’t remember. Right now it’s really quite good for me. And I think actually, it’s quite good for academic publishing. If you’re writing, if you’re a different kind of writer, I don’t know how good it is for you. But for people like me, who are doing academic publishing, I think it’s just the right tool and it’s just the right model for that style. Unfortunately I don’t have my wish list in front of me.
E: That’s okay, that’s okay. If you ever think of anything, please get in touch.
P: Yeah, but I think - I really am serious though when I say that you really hit all the major points. And so I think you’re at least 90% of the way there, so there’s another 10%, we’ll figure it out.
E: Well thanks very much for that. I do have just one more question, about academic publishing specifically. It’s something we’ve been thinking about for quite some time now. One of the big issues in academic publishing is that if people are on the tenure track but don’t have tenure yet, that’s kind of the most important promotion point. And there are often very specific, even calculated, methods for saying what a publication in a certain journal is worth. I don’t know how much it’s like this in the States, but it’s definitely like this in the UK. They have this thing called the “Research Assessment Exercise,” which actually quantifies your contribution to the field. Often this is based on rankings of journals or university presses, for example. So getting a monograph published with Oxford University Press or something like that is worth more than getting one published somewhere else. I’m curious what you think about that when it comes to the future of academic publishing. Do you think this is something that’s going to change? For example, if you published an academic book on Leanpub, it’s hard to know how it would fit in with that kind of ranking, where people are looking for quantified professional development.
P: Yeah, I think that’s a short-term issue. People today may have a problem with it, because we’re in transition. Ebooks are still kind of new. But I think in a couple of years, it won’t even come up. And the idea that you’re self-publishing, in a way, is not such a big deal, because with books in particular, the publishing process is not like when you’re writing a journal article, which is peer reviewed. With books, there are peer reviewers, but you have much more control, and it’s much more your thing. So it’s much more of a personal statement when you write a book, I think, than when you write a journal article, which is a research article.
I think with books, what it comes down to is not so much, “Oh, is this publisher good or not?” It’s more that it’s a really big commitment of time to write a good book. If you’re a junior professor looking to get promoted, you’re going to think, “Oh, what’s the trade-off here? I could spend this amount of time to write this book, or I could spend the same amount of time to write two research articles.” Because there’s a huge commitment of time, and there’s a trade-off: “I’ve got to do one or the other, I can’t do both.”
And I think one thing that’s nice about something like Leanpub, and a lot of these other tools out there, is that it really decreases the amount of time that’s not spent directly producing content. Time is the most important resource. If you can minimize the amount of time spent doing things that are really not that important, like emailing back and forth with the publisher or whatever, and just focus on writing content and writing your book, I think that is a major plus. Again, that’s one of the beauties of Markdown, right? You’re just focused on writing the content. And that’s what I tell people now too. The tools have developed such that you don’t have to waste time figuring out how to format things correctly, or how to produce things. You just focus on writing. And that’s the kind of thing I would worry about most in terms of the time trade-off for writing a book versus not writing a book. I think it’s not so much an issue of, “Oh, should I go with this publisher or that publisher?”
E: Thanks very much for that. I really appreciate you giving us your time today. Unless you have any questions for me, I’d just like to say thanks for being on the Lean Publishing Podcast, and for being a Leanpub author.
P: Well thanks for having me, I’m really enjoying it.
This interview has been edited for conciseness and clarity.