published Nov 26, 2015
This year, Black Friday is on November 27, and many Leanpub authors have decided to offer discounts on their books. Below you will find a list of these deals, with links. Please note some of these deals won’t be live until tomorrow, and some only last for one day!
(A note to our non-North American customers: In the United States every year there is a tradition called “Black Friday” where stores across the country offer discounts on many of their products. It’s sort of the unofficial beginning of the holiday shopping season.)
Have fun checking out these great deals and discovering new books and authors!
If you’re interested in discovering even more deals, follow Leanpub on Twitter, as we expect even more authors will be announcing deals throughout the day and over the weekend.
Please note that there’s also a follow-on tradition called “Cyber Monday”, which is essentially a second Black Friday, but focused on sales for online purchases rather than in-store purchases. This distinction seems a little outdated to me (what with Amazon and all), but in any case, it’s another fun opportunity for readers to buy new books they might not otherwise have tried, and for authors to reach out and find new readers who are looking for special deals.
So, some Leanpub authors will surely be promoting deals on Cyber Monday as well. Please check out our Twitter feed then for even more deals!
Update! [Friday 3:08 PM PST] Joseph Benharosh’s book The essentials of Object Oriented PHP is now on sale!
Author: Joseph Benharosh
Book: The essentials of Object Oriented PHP
Update! [Friday 1:00 PM PST] Kristopher Wilson’s book The Clean Architecture in PHP is now on sale!
Author: Kristopher Wilson
Book: The Clean Architecture in PHP
Update! [Friday 11:56 AM PST] Chris “The Grumpy Programmer” Hartjes has put his books on sale for 50% off for 96 hours! Below you’ll find a link to his website and links to his books on Leanpub.
Chris’s website: http://grumpy-learning.com
Author: Chris Hartjes
Book: The Grumpy Little Book Of Hack
Author: Chris Hartjes
Book: The Grumpy Programmer’s Guide To Building Testable Applications In PHP
Author: Chris Hartjes
Book: The Grumpy Programmer’s PHPUnit Cookbook
List of Black Friday Deals
Author: Adam Tornhill
Book or Bundle: Patterns in C
Author: Alan Clark, Terry Peterson, and Steve Pitt
Book or Bundle: Simply Manage
Deal: Half price.
Author: Antonio Santiago
Book or Bundle: The book of OpenLayers 3
Deal: $10 discount
Author: Ash Furrow
Book or Bundle: Functional Reactive Programming on iOS
Deal: Half price.
Author: Ash Furrow
Book or Bundle: Your First Swift App
Deal: Half price.
Author: Bill Edstrom
Book or Bundle: Guide to Tracktion 6
Deal: Price lowered to $9.99.
Author: Christian Grobmeier
Book or Bundle: The Zen Programmer
Author: Christian Grobmeier and Lucas Videla
Book or Bundle: El Programador Zen
Author: Christian Kvalheim
Book or Bundle: The Little Mongo DB Schema Design Book
Deal: $8.99 (down from $14.99)
Author: Christopher Berg
Book or Bundle: The Virtuoso Teacher
Author: Christopher Berg
Book or Bundle: Mental Strategies to Improve Sight-Reading, Memorization, and Performance
Deal: $1.99 from now through Cyber Monday.
Author: Chuck Heintzelman
Book or Bundle: Laravel 5.1 Beauty
Deal: 60% off.
Author: Daniel Root
Book or Bundle: Trello Dojo
Deal: $5.00. Effortlessly improve all your home and work projects with Trello Dojo.
Author: Daniel Schmitz and Daniel Pedrinha Georgii
Book or Bundle: React - Beginner’s Guide
Deal: 60% off (only $2.00)
Author: Daniel Schmitz and Daniel Pedrinha Georgii
Book or Bundle: Laravel and AngularJS
Deal: 80% off (only $6.00)
Author: Devon Steiger
Book or Bundle: Archetypes State of Play
Author: Devon Steiger
Book or Bundle: Archetypes State of Chaos
Author: Devon Steiger
Book or Bundle: The Peace Speaker
Deal: $0.99 plus book launch at 12am EST.
Author: Diomidis Spinellis
Book or Bundle: The Elements of Computing Style
Deal: $3.95 sale price, regular price $14.95.
Author: Dirk Haun
Book or Bundle: Präsentieren für Geeks
Deal: $1.95 minimum price for the entire weekend.
Author: Duncan Dickinson
Book or Bundle: The Groovy 2 Tutorial
Deal: 50% off.
Author: Joseph Little
Book or Bundle: Agile Release Planning - My practical methods
Deal: $3.00 off for the day.
Author: Kevin Karplus
Book or Bundle: Applied Electronics for Bioengineers
Deal: Minimum price lowered from $3.00 to $2.50 for Nov 26–Nov 30.
Author: Luc P. Beaudoin
Book or Bundle: Cognitive Productivity
Deal: $3.19 instead of $15.99 - that’s an 80% discount!
Author: Lukasz Wrobel
Book or Bundle: Memoirs of a Software Team Leader
Deal: 30% off.
Author: Michael Müller
Book or Bundle: Web Development with Java and JSF
Deal: $9.99. 5 copy package $39.99, classroom package $99.99.
Author: Michael Müller
Book or Bundle: Java Lambdas und (parallel) Streams
Author: Nicolas Fränkel
Book or Bundle: Integration Testing from the Trenches
Author: Paul M. Jones
Book or Bundle: Modernizing Legacy Applications In PHP
Deal: 50% off for 10 days starting on November 26.
Author: Paul M. Jones
Book or Bundle: Solving The N+1 Problem In PHP
Deal: 50% off for 10 days starting on November 26.
Author: Rachel Panckhurst
Book or Bundle: Fait maison
Deal: Discount TBD.
Author: Rob Aley
Book or Bundle: PHP Beyond the web
Deal: 75% off promo ($6.24 instead of $24.99).
Author: Ryphna St-John
Book or Bundle: Ryphna’s Notebook
Deal: FREE until December 1st!
Author: Ryphna St-John
Book or Bundle: The Call Of The Doll
Deal: FREE until December 1st!
Author: Sam Atkinson
Book or Bundle: Java Interview Bootcamp
Deal: 50% off on packages ($10.00 to $5.00 for the basic, $19.00 down to $10.00 for the premium).
Author: Thomas Morton
Book or Bundle: Write Copy RIGHT
Author: Utkan Uluçay
Book or Bundle: Paradigmanın Neresindesiniz?
Author: W. Jason Gilmore
Book or Bundle: Easy Laravel 5
Deal: 20% off (from $31.00 to $24.75)
Author: W. Jason Gilmore and Eric L. Barnes
Book or Bundle: Easy E-Commerce Using Laravel and Stripe
Deal: $10 off.
Author: Yilmaz Guleryuz
Book or Bundle: YOGURT for All!
Deal: Bundle discount.
published Nov 24, 2015
Brian Caffo is a professor in the Department of Biostatistics at Johns Hopkins University’s Bloomberg School of Public Health. In this interview he talks with Leanpub cofounder Len Epp about how he first became interested in biostatistics, why it’s such an important and growing field, and about his research interests and initiatives.
This interview was recorded on July 21, 2015.
Len Epp: Hi, I’m Len Epp from Leanpub, and in this Lean Publishing Podcast, I’ll be interviewing Brian Caffo. Brian is a professor at the Department of Biostatistics at the Bloomberg School of Public Health at Johns Hopkins University, and director of the graduate program at JHU Biostatistics.
Brian works in the fields of computational statistics and neuroinformatics, and is a co-founder of the SMART Working Group at Johns Hopkins Biostatistics, which specializes in medical, and especially neurological, imaging and biosignals, such as polysomnography and wearable computing. In 2011, he was among the recipients of the Presidential Early Career Award for Scientists and Engineers, and was the first statistician to receive such an award. Brian has also received the Bloomberg School of Public Health Golden Apple and AMTRA Teaching Awards.
Brian is the author of two Leanpub books, Statistical inference for data science and Regression Models for Data Science in R [note: since we conducted this interview, Brian has published two more books]. Each book offers a brief but rigorous treatment of statistical inference and regression models respectively, and is intended for practicing data scientists. Both books are companions to classes offered as part of the Data Science Specialization on Coursera, a ten-course program offered by Brian and his colleagues at Johns Hopkins, Jeff Leek and Roger Peng.
In this interview, we’re going to talk about Brian’s professional interests - his books, his experiences using Leanpub, and ways we can improve Leanpub for him and other authors.
So thank you, Brian, for being on the Lean Publishing Podcast.
Brian Caffo: Thank you. It’s great to talk with you and meet you.
E: Thanks. So, I usually like to start these interviews by asking people for their origin story, to learn how they got to where they are in their careers, and how they developed their interests. So I’m wondering, specifically, how you first became interested in biostatistics and why you decided to pursue a career in academia?
C: Well, the long version of this story is, I was actually a swimmer in college. I wasn’t a terribly good swimmer, but I was on a big team. I was at the University of Florida, which has a great swimming program. And I was an art major at the time, and I was spending so much time training that I didn’t have a ton of time to actually put into being an art major - which is a surprisingly difficult major, especially in terms of time. And I had an aptitude for mathematics. So I kept taking math classes, maybe a little bit lower level than I needed to, but then just kept incrementally doing it to fill out my hours, so that I could have some classes to be able to manage swimming and trying to do the art major.
Finally, after doing this for long enough, I talked to a guidance counselor, and they said, “You know, we typically don’t get too many art majors who are taking differential equations, and linear algebra, and these sorts of subjects.” And she said, “You seem to be actually doing better in those than you are doing in your art classes. You can always do it in your spare time.” So, from there I switched over to become a math major. And from mathematics, I spent some time working with the Children’s Oncology Group, which was then centered in Gainesville, at the University of Florida, where I was at.
From there I just really fell in love with working with data, and the kind of computing and mathematics that goes along with statistics. Just staying in academics for me, was in a lot of ways a no-brainer. I really loved the things that I was doing, and I loved the kind of research that I was doing. I was very fortunate to get a position here at Hopkins, where I have such amazing access to great data, great researchers, great medical research. So that’s the long version of my origin story.
E: Okay, thanks - that’s very good. It’s an interesting path. I was wondering if you could explain some of the reasons you co-founded the SMART Working Group, and what the purpose of the group is?
C: Yeah, so originally this was co-founded with a collaborator here, Ciprian Crainiceanu. We had a lot of similar interests, in terms of how we approach modeling and statistics. We were getting a lot of people coming to us, talking to us about some new kind of measurement that they were collecting. In my case, it was mostly brain imaging measurements. You might think that there’s only a couple of ways that you can measure the brain, and that idea couldn’t possibly be more wrong. Just even with one type of scanner, a magnetic resonance imaging scanner, there are so many different ways you can tweak an MRI scanner to give you different kinds of signals in the brain, that you can barely count them.
At any rate, both in terms of brain imaging, but also things like sleep studies and polysomnograms, as well as other kinds of wearable computing things, we were constantly getting people coming to us and talking to us about, “How do I analyze this kind of data?” Especially because we’re here at a school of public health, we’re here at a medical institution, people wanted to relate these measurements to disease. They wanted to create preventions and prognoses. Being at a school of public health, people wanted to relate it to large populations. And they didn’t know how to do it. It involves a lot of computing, mathematics, and statistics. And so we noticed a common thread of some biological signals, or biosignals, and we founded the group out of it. Initially it was a “group” with air quotes around it, and then after enough faculty joined in, and enough students joined in, we started having alumni from the group, and postdocs and things like that. It’s now become a rather large entity. When we have a full group meeting - which we don’t have that often anymore because it’s gotten so unwieldy - maybe 35 or 40 people will show up.
E: Wow, that’s great. Can you explain a little bit about what a biosignal is, and maybe give an example of one?
C: A biosignal, basically, is any biological or medical signal that is used to create a diagnosis, or to create a measurement that is then used for research purposes. That’s a very broad definition. An important class of biosignals that we don’t really delve into too much in our work is the field of computational genomics and high throughput bioinformatics. So we don’t do too much of that. There’s a lot of different interesting kinds of measurements that go on there, but we broadly classify it. We did it that way because around here, and around Hopkins, there were large developed bioinformatics groups, because there was so much excitement over the sequencing of the human genome. But a lot of these other technological revolutions were getting left behind, in terms of analysis skills. So we sort of lumped them all together. And so yeah, I agree that things like biosignals are kind of a vague term. But now that we’ve gotten big enough, we maybe need to make them more precise, and define ourselves a little better.
E: Oh, no, fair enough. I was just wondering. So, for example it’s like something people might be familiar with, like rapid eye movement? Does that count as a biosignal? Or like measuring eye movements?
C: Yeah, so there you’re talking about in sleep. Sleep is typically measured - if you get a rigorous sleep study, that’s called a polysomnogram. The collection of biosignals that they would collect in that case would be an electroencephalogram - they’d put electrodes on your head; that collects brain activity. They have things like a myogram, that they would put for example on your chest - that would detect breathing, and that’s detecting motion a little bit. They might have some motion sensor that they’re putting on your leg for restless leg syndrome. They might have something that measures oxygen that they would put on your finger, and they might put an EKG on. In most of the sleep studies we were looking at, they were very interested in cardiac outcomes associated with different kinds of sleep disorders, and so they might have an EKG on.
So, specifically when you talk about REM, what you’re talking about is the collection of measurements. For a diagnostic sleep study, they have things that are going to measure things about your breathing. REM is a specific classification of a sleep state. And that arises from subsets of these biosignals, that they use to then classify different kinds of deep sleep - different stages of sleep, including REM. That’s an important aspect of the measurement. Usually they have to take them and pass them through a human to get the staging like that.
As an example, we have several papers on analyzing the percent of the time that you spend in REM over the course of an evening; they call things like that “sleep architecture”. So we often think about how, if you’re tired for several days straight, you might think you can catch up on sleep or something like that. That relates to things like sleep efficiency, and sleep architecture, and the sorts of things that we get out of those signals. So, going from those signals, to these measurements, and relating them to diseases in the population is really what we try to do.
E: And you’ve done work with signals that come directly from the brain, where you’ve implanted electrodes directly on the brain, as well. Is that correct?
C: Yeah, the one kind of measurement like that that I’ve dealt with is called electrocorticography. That’s something they do for people with pretty severe epilepsy, where different medications and other kinds of treatments have failed, and they’re left with nothing, other than brain surgery. They’d saw off the top of their head, and as a measure to help inform the surgery, they’d place this electrode sheet directly on the cortex. Then, usually there’s some time between the early parts of the surgery, and then the actual brain surgery part, so people often let some amount of experimentation go on - in terms of maybe playing sounds, or having them do things, and then recording the brain activity while that’s going on.
So yeah, in some rare cases you can actually collect human measurements from otherwise healthy humans who have some severe disorder, like epilepsy, where the measurements are directly implanted on the brain. I don’t personally do this, but there’s also a lot of people around here who study mice, and monkeys, and other things where they actually implant the electrodes. So these aren’t implanted, they just rest on top of the cortex. There are other people who do things - there’s a fascinating field called machine-brain interface, where people are implanting electrodes in monkey brains, and they’re getting the monkeys to feed themselves with a robotic arm - just the robotic arm being controlled with the electrodes directly implanted in the brain. There’s tons of neat areas where there’s actually a direct implantation of the electrodes, but that only occurs in these more invasive things that people do on animals. I almost exclusively work on human data.
E: I’ve read also that the SMART Working Group uses brain imaging for prediction, and I was wondering if you could explain a little bit about what brain imaging is, and how it can be used to predict behaviors?
C: So, the kind of brain imaging that I work on is called functional magnetic resonance imaging. In that, you don’t get a static image, you get a dynamic image that represents, hopefully, localized brain activity over time. There’s a lot of different ways that people might use both this kind of measurement, and other kinds of brain measurements for prediction.
As an example, a colleague that I’m working with right now wants to use brain activity as measured by fMRI, plus some structural measurements in people who are in comas, to try and predict when they’ll come out of it, or if they’ll come out of it and the prognosis. So that’s an example of using brain imaging as a biomarker to predict some outcome.
A colleague of mine in the SMART Working Group - someone we managed to successfully recruit to the university - an extremely well-known fMRI researcher, Martin Lindquist, works on actually trying to predict what’s going on in your head, with the information from the scanner, at that moment. In particular, he works on pain. They have people in the scanner, and they actually deliver pain to them by a hot plate or something that’s resting on their wrist. It actually stings a little bit, and he tries to predict how hot the plate was, just based exactly on the brain signal and things like that.
That has implications for trying to understand how we can get a better measurement of pain, right? When people just say, “Oh something hurts,” a physician doesn’t know what that means, but if they can calibrate it…. So, he’s working toward the idea of actual prediction of pain. That’s another way that you could use these kinds of measurements for prediction - and there’s quite a few.
I tend to more focus on the public health-y type prediction-type things, where we try to predict whether or not a person has a disease; whether or not they’ll come out of the coma is another example - these sorts of things, where the image is just a collection, a part of the measurement. A really big one that everyone’s working on right now is trying to predict who will get Alzheimer’s disease, the reason being that, if you can detect Alzheimer’s disease early, then you have a much better chance of being able to develop an effective treatment.
E: Generally on the subject of gathering data, I was wondering if you had any comments about the impact that wearable computing is going to have on public health generally, and perhaps on your field specifically?
C: It’s going to be huge; it’s going to be huge. The SMART Group does a lot of wearable computing work. Personally, I don’t do too much. But my colleague Ciprian who co-founded the group with me, along with some others in the group, have really dived into wearable computing in a big way.
There’s so many different types. When you think of wearable computing, people tend to think of things like Fitbit and stuff like that. But for research purposes, there’s a million different kinds of sensors and measurements that people can take, that are now small enough and portable enough, and it is just amazing.
I think it’s going to revolutionize public health, in terms of our ability to get accurate measurements for lots of different things. The key bottleneck is having enough people who know how to analyze this stuff, where our ability to collect data is just so vastly outstripping our ability to analyze it. So we see that actually the bigger problem is not the development of the sensors and stuff like that, because lots of smart people are working on that, and they’re developing great stuff; but then so much data gets produced, and the real bottleneck now is people to analyze the data. So I would say to anyone who’s an aspiring young machine learner, or computational statistician, or biostatistician - or anything like that, it’s a great field to get into.
E: I was going to ask on that subject - generally speaking, statistics seems to be sort of enjoying a cultural moment. With the popularity of sports and election statistics, most commonly associated in North America with people like Nate Silver, I was wondering if you think that this specifically is going to inspire more people to get into things like data science? And if data literacy generally will improve, as we go forward - say if we’re all wearing Fitbits, or things like that?
C: Well, I think this cultural revolution for sure is helping. Moneyball, the book, is a great example - yeah like Nate Silver, that cultural revolution is great, and is really going to help out our field in this closely-related field.
But I think the bigger impetus for drawing people into the field is the demand for jobs in the field. I think the fact that there are so - it’s one of the relatively few sectors where there’s an enormous amount of job growth, and that there’s way more demand than supply. There’s no apparent harvesting that’s happening. There’s no leveling off that seems to be eventually the case. And new data oriented companies seem to be popping up every day, and giant companies - all your Googles, and Facebooks, and Twitters, and everyone, they’re ostensibly data companies at some level.
And so, I think the major draw for people into this field will be the fact that it is going to be one of the principal jobs of the future. Which, when I got into it, when I was a lowly art major trying to figure out what to do - there weren’t all the different kinds of options. It’s interesting now, we see our students - the amount of options that our students have now is truly remarkable. Some of them go into finance, some of them go into technology and move off to Silicon Valley - and some of them stay in academics. Some of them do biostatistics, some of them go to mathematics. The number of options they have now is remarkable.
E: I read on your website that you have a particular interest in large scale open access education. And I know, of course, that you’ve been successful with the data specialization course on Coursera. I was wondering what inspired your interest in open access education, and what plans you might have going forward?
C: Well, Leanpub fits really, really well into our vision, and I’ll get to that in a second.
So, initially it was really just kind of fortuitous. I had wanted to flip my classroom, which is the process where students watch videos of the lecture at home. During the class period, they actually get more of my time and the TA’s time, actually doing problems. There’s a lot of work so far that’s showing that that’s a more effective way to teach people, and that the old sage on the stage lecture model is not the optimal way to do things.
So, when I contacted some people to do some recording in our school, they mentioned that we had just struck a deal with this open access open education company called Coursera and asked whether I’d like to be one of the people on the launch. And so I agreed. I was really enthusiastic about it, and I happily went and talked to Roger and Jeff, who are my two colleagues here, who were very interested in it as well. I think my class was okay; their classes were just blockbusters.
From then on, our interest was really piqued, and for a variety of reasons, one being, this idea of delivering low cost or free education is very appealing to people in academics. So, I think that the books, Leanpub in particular, has really helped us in terms of really fitting into that model. Our Coursera model for all of our courses is: everything’s free. The lecture notes are all posted on GitHub, and you can see the full development process.
The videos are all free, both on Coursera, and you can get them off of YouTube from most of us as well. And it just makes sense too, that the textbook - if there is a textbook that existed for these classes - that that should also have a free option, or a variable - or something that, some new way for doing the pricing so that it conformed to this new model. And it conforms great, Leanpub for a textbook, especially - especially because we can give the students edition updates, and things like that, and a lot of things that people would complain about in university textbooks, just get all solved all at once.
So that fit actually pretty well with our open education mission. I don’t know specifically how I got interested in it, other than the series of events. It kind of always fell well within my personal ethic. And I think the same with Roger and Jeff. I would also mention that our school was a very early pioneer for open education. Well before Coursera, and before Khan Academy, and things like that - there was MIT OpenCourseWare, and our school was a participant in MIT OpenCourseWare. I was in on some of those meetings where they were deciding to do it, and I was super enthusiastic about it at that time, as well. This was quite a while ago. So I think the School of Public Health here has really been on the vanguard of open education, and being part of that culture kind of seeps in as well.
E: It’s really interesting, I know there are some voices out there that respond relatively conservatively to the idea of open education. And in particular, they’ll invoke the possibility that it might be a competitor or a threat to conventional university education, and I was wondering what your response might be to that criticism?
C: For sure, well it’s possible it might be a threat to certain financial models, for certain departments, for certain topics. But by and large, the question is whether or not these are coming up with new markets, or they’re poaching existing markets. I think the vast majority of the ways in which the students take these classes are new markets. They might be university students - but, a student might sign up for a machine learning class that they may not have taken at their university otherwise. I think the majority of the student engagement at Coursera, edX, Udemy, these other sites, is probably new users.
But for sure there is a certain amount of poaching that also has to occur. That student that was going to take that elective class elects to just take it on Coursera, or something like that. I’m sure to some extent that is happening a little bit. But it’s not all that dissimilar from the correspondence courses, and other historic attempts over time to broaden access to education in different ways for people who have different circumstances.
I think a lot of that focus on how online education is going to disrupt universities is a laser-beamed focus on 18 to 22 year olds in undergraduate education. But the people who take these classes exist well into their work life, well beyond their university life. And so I think it’s a lot more complex than that. I think there is a certain amount of disruption that’s occurring because of it. But I think that it’s - in a lot of ways, good disruption.
I think that one way which the kind of disruption that you’re talking about might occur, is any place that really has a revenue model where they use large introductory classes as a revenue generating component - with adjuncts to generate the revenue that they use for other things. If they don’t have a very diverse kind of financial model, that might get disrupted. But even that, I don’t see that much. People like in-person classes, so I just see how much – I do think most of it is new users and new content. But I am a university professor at some level and dependent on brick-and-mortar learning. Not on some level, on exactly every level.
E: On the subject of generating revenue and textbooks in academia, and also journal articles, I was wondering how you see academic publishing evolving say over the next few years, and if there’s going to be a shift perhaps along the lines of what’s happening towards open access education with video tutorials and things like that?
C: I feel a little bit more confident that something like Leanpub is going to disrupt traditional book publishing, than I am about what will happen with traditional universities with respect to online teaching. Because our experience with Leanpub has just shown that if you have your own channels to get your book out there, then it is really an ideal circumstance.
If you’re willing to publish something purely as an ebook, or mostly as an ebook, then you can do all sorts of interesting things with embedded video. In my stat inference book, not all, but most of the homework problems have links to YouTube videos that actually give the solutions, with me working them out; I have a little tablet here that I hand write out the solutions on as I record myself doing them. That kind of disruption seems pretty inevitable to me at this point.
For journal publishing, I’m less and less certain about it. I don’t know a lot about the journal publishing business. I guess I don’t really know much about the book publishing business either. From my experience as an author, I can’t even imagine contacting a traditional publisher at this point. But for journal publishing, that’s - I mean I still submit my best articles to what I think are the best journals that’ll take them. And I don’t know about disruption… There’s a lot of discussion in academics right now about disruption of academic publishing, and there’s a lot of new models that are coming out. I must say many of them are really impressive - Frontiers is an example of one that I’ve worked with, that I think is quite impressive. And PLOS is another example that’s quite impressive. But it’s less clear in my mind how that will shake out.
E: I know that in both of your books on Leanpub, you mention that people are invited to send you errata with pull requests on GitHub and things like that, and I was wondering if engaging directly with people who’ve already bought your book is important to you? And if there’s anything we could do at Leanpub to help you engage better with your readers online?
C: I like the fact that if I decide to add an extra section or chapter or new set of problems or something like that, that Leanpub allows me to contact everyone who’s bought the book, and tell them. So, sometimes I’ll republish the book, and I’ll send out an email to everyone that just says, “I’m republishing the book, this is my minor stuff, don’t bother reloading, re-downloading and putting on your devices for this.” But then sometimes I’ll put a whole new section or chapter in, and I’ll be like, “This is probably worth re-downloading if you’re still actively engaged in the book.”
So the ability to contact people, I think, is quite useful. And people who have bought the inference book might want to get the regression one. So the ability to email them and suggest that is great. So yeah, I think that’s a nice aspect of having direct access to the customer.
And the GitHub integration… So, the students submit pull requests on GitHub, and then I’ll correct that error. So it’s not like they’re emailing me errata, they’re submitting it as a pull request, and I’ll accept the pull requests. I have one of the switches on GitHub set up so that whenever I check something in, it automatically re-compiles the book on Leanpub. So they’ll submit a pull request; when I accept the pull request and push the repository, it automatically gets recompiled on Leanpub.
Then I have to republish the book, which I’ll only do every now and then, when there’s a big enough set of changes. But that’s actually really great, because it’s not like I’m getting a ton of emails. I’m getting errata and working through them on GitHub, which is fantastic.
E: And specifically with respect to you emailing readers, we actually do a kind of, a sort of double blind, where we don’t reveal the reader’s email address to the author, or the author’s email address to the reader.
C: That’s right, yeah.
E: We’re kind of a middleman. Do you see that as a good thing, or does it bother you that you can’t see the emails - the actual email addresses unless they opt into that?
C: I think it’s sort of irrelevant to me. I mean if I had a list of emails, I wouldn’t do anything with it. If I actually had to manage the emails, that would be kind of an annoyance. So it’s kind of useful. But then also, from the consumer side, it seems like a nice protection of their identity and their information. So that seems pretty reasonable. And I don’t have any need for their email otherwise. So yeah, it seems like the right way to do it.
E: Okay, great. I was wondering if there’s anything specific that you think we could improve in your workflow, or for the way you like to write, and to contact readers, for example, if there’s anything specifically that’s come up?
C: Well so, for the data science specialization, we actually wrote all the lecture notes in Markdown. So the ability to convert the lecture notes into kind of a… well, I think of the stuff I’m doing as more kind of like lecture text, right? So it’s sort of like lecture text, it’s highly connected to the class, right? It’s sort of a standalone book, but it’s mostly connected to the class. But the ability to kind of create that kind of entity in Markdown - because with Leanpub, the authoring is in Markdown - it was very seamless. And that helped a lot.
One instance where I think things could be improved is if you could author directly in LaTeX. I noticed that Leanpub apparently takes the Markdown and then probably runs Pandoc to create the LaTeX file, which is then - I’m pretty confident, from what I’m seeing in the log file - compiled with standard LaTeX. So if there’s some way you could actually author in LaTeX, for the math, science, computer science and stats crowd - LaTeX is the lingua franca of that community. People already know it. That’s one thing.
The other thing would be access to the log files after it gets compiled. That would be very useful. Because when it comes in an email - that’s just a timing thing.
C: I think another useful thing would be an offline compiler, or something close enough that you could compile it offline. So a collection of Pandoc commands or something like that, that said, “Oh yeah, take this, do this to your Markdown file” - and it would approximate pretty well whether or not you’ll get an error, so that you can kind of write and rapidly debug.
I’ve noticed the only time I ever get an error while compiling is with equations. That’s it. So for certain subjects, that’s no problem. But for the inference book, that part was a little hard. Fortunately I knew the math was already typeset without errors in LaTeX to begin with. So that was helpful. But for writing a de novo book, something that would give you a quicker ability to debug errors in the math typesetting would be helpful.
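One low-tech approximation of the offline check Brian is asking for - purely a hypothetical sketch, not anything Leanpub ships - is a script that flags unbalanced math delimiters in a Markdown file before you upload it, since mismatched dollar signs are the usual cause of the equation errors he describes:

```python
import re

def check_math_delimiters(text):
    """Report lines with an odd number of $ delimiters (ignoring \\$),
    plus a file-level check that $$ display blocks pair up."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        dollars = re.sub(r'\\\$', '', line).count('$')  # drop escaped \$ first
        if dollars % 2 != 0:
            problems.append((lineno, 'odd number of $ delimiters'))
    if len(re.findall(r'\$\$', re.sub(r'\\\$', '', text))) % 2 != 0:
        problems.append((0, 'unbalanced $$ display-math blocks'))
    return problems

sample = "The mean is $\\bar{X}$.\n\nBroken: the variance is $\\sigma^2.\n"
print(check_math_delimiters(sample))  # -> [(3, 'odd number of $ delimiters')]
```

This only catches delimiter mismatches, not LaTeX syntax errors inside the math, but that is often enough to avoid a round trip through the full compile.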
E: Okay well thanks very much for that. We’re actually going to be working on an author app, and hopefully a lot of that will be incorporated into it when we do that. Because we want to make it easier for people to work on their own, on their own machines.
C: Sure, and the other thing - I’ve never tried it, but another solution in my particular case would just be to convert it to EPUB myself, and then upload it as an EPUB file.
E: Oh okay.
C: I noticed you guys just accept an EPUB file by itself.
E: Yes that’s correct.
C: Yeah, so I could author it in LaTeX, Pandoc it to an EPUB file, and then just submit the EPUB file. I’ve never tried that solution.
E: I just have one last question. Are you working on any other books right now that you plan on publishing?
C: I don’t know how often you look in people’s accounts, but I’ve got about seven queued up. I teach three classes as part of the specialization. I’ve done two of them - inference and regression. I’ve got to finish up regression; I think on the Leanpub slider it says maybe 70%, and I’d like to get it up to 100%. And then, after I’ve finished with regression, I teach a third class in the specialization called “Developing Data Products.” I’d like to finish that one. Because it’s a computer-oriented book, it will be really ideal for the Leanpub authoring setting. And then after that, I have two Coursera classes coming out that I hope to create books for.
My new strategy is to write the Leanpub book, then record the videos, and then release the class. The reason is that the Leanpub book then almost serves as a script for the videos and then for the class. So we’re thinking of it that way in terms of the workflow.
E: That’s just really interesting. I’m so glad to hear that you’re using Leanpub that way. I mean it’s one of the things we always hoped it would be used for.
I’d just like to say, before we go, thank you very much Brian for being on the Lean Publishing Podcast and for being a Leanpub author.
C: Thank you. We’ve had a blast. It’s been really fun working with the platform, and I really - I’m a big proselytizer for Leanpub at this point. Our experience has been great, so thank you guys for creating such a great product.
E: Thanks very much.
This interview has been edited for conciseness and clarity.
published Nov 09, 2015
Paul M. Jones is an internationally recognized PHP expert. Paul is the author of the Leanpub books Modernizing Legacy Applications in PHP and Solving the N+1 Problem in PHP. In this interview, Leanpub co-founder Len Epp talks with Paul about his career, the origins of his interest in PHP, and his experience self-publishing technical books.
This interview was recorded on July 13, 2015.
Paul M. Jones
Len Epp: Hi I’m Len Epp from Leanpub, and in this Lean Publishing podcast, I’ll be interviewing Paul M. Jones. Based in Burns, Tennessee, Paul M. Jones is an internationally recognized PHP expert who has worked for organizations in many different sectors, including medical, non-profit, educational and military organizations. He is the lead developer of the Solar PHP framework, and lead on the Aura for PHP project, and was a founding contributor to the Zend Framework. Paul is a regular speaker at technical conferences worldwide, and blogs at paul-m-jones.com. In a previous career, he was an operations intelligence specialist for the US Air Force.
Paul is the author of the Leanpub books Modernizing Legacy Applications in PHP and Solving the N+1 Problem in PHP. In Modernizing Legacy Applications in PHP, Paul explains how to get messy legacy PHP code in order, using a series of specific steps to turn it into an organized, modern, testable application. In Solving the N+1 Problem in PHP, Paul explains what the problem is and how to discover and solve it in your app using PHP.
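For readers unfamiliar with the term, the N+1 problem is easy to see in a few lines of code. This illustrative sketch uses Python’s built-in sqlite3 rather than PHP, just to stay self-contained; the shape of the problem is the same in any language:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Paul'), (2, 'Brian');
    INSERT INTO books VALUES (1, 1, 'MLAPHP'), (2, 1, 'N+1'), (3, 2, 'Stats');
""")

# The N+1 shape: 1 query for the parent rows, then one more query per parent.
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall(); queries += 1
for author_id, name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?", (author_id,)).fetchall()
    queries += 1
print(queries)  # 3 queries for 2 authors: N+1

# The fix: fetch everything in a single JOIN (or one IN (...) query).
rows = conn.execute("""
    SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id
""").fetchall()
print(len(rows))  # 3 rows from 1 query
```

Collapsing the per-row queries into one JOIN or IN query turns N+1 database round trips into one or two, which is the core move the book formalizes.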
In this interview, we’re going to talk about Paul’s professional interests, his books, his experiences using Leanpub, and ways we can improve Leanpub for him and for other authors. So thank you Paul for being on the Lean Publishing Podcast.
Paul M. Jones: Thank you for having me.
E: I usually like to start these interviews by asking people for their origin stories, so I was wondering if you could tell me how you first became interested in programming?
J: This is actually one of my favourite topics. A lot of people go through life and never find their true calling. I got my true calling very early in life. When I was just shy of 13, my dad brought home a TI-99/4A Texas Instruments computer. If anyone out there knows what one of those is, you know exactly how long ago that was. I sat down with that thing for a couple of weeks, and by the end of that time I knew what I was going to be doing for the rest of my life. I started out programming in TI-BASIC, when you had to save the program to a tape cassette recorder. And then you had to play it back to load it back into memory. So that’s how I got started, and it’s just been - it’s been a wonderful experience since then, learning how to program, learning the craft and the skill that goes along with it, and all the communities that go along with it as well.
E: Was your father a programmer himself?
J: No, my father, first he was an Army chaplain, then he was a Methodist minister. And then he became a certified financial planner, which is sort of a minor version of a stockbroker. So he himself was not especially technical. I ended up having to help him out a lot with the various computers that we ended up buying for his businesses. I remember we had an Apple III for a long time. And I was in charge of helping him set up a lot of VisiCalc stuff to begin with.
J: For good or bad, I still have to help him out with a lot of computers.
E: It’s interesting, a lot of people I’ve spoken to, their first introduction to computers is playing games. But it sounds like you just dove right in and started messing around with the machine?
J: Of course there were video games at the time, but they were stand-up, coin-operated things. You did not download the latest version of whatever it was to your iPhone. If you wanted to play a game on your home computer, most of the time what you had to do was go buy a magazine - a physical paper magazine - look for a code listing in that magazine, type it in by hand into the computer you were working on, and then save it. And then try to figure out how to debug it from there.
E: Do you remember what the first game was that you played on that computer?
J: Oh wow, that’s a great question. I remember one of the first ones was a text adventure-style game. Again, I don’t remember specifically what it was. I think that was one that worked on both the TI and on an Apple II that I had access to at school. And then after that of course, you ended up buying games. But we were not super wealthy back then, and games were pretty expensive - comparatively speaking. So if you were going to buy a game, you either had to shell out 20 or 30 dollars for it, and wait for it to come in the mail and then put it in. So it was easier to spend the time than it was to spend the money.
E: And how did you end up with a focus on PHP in your career? Can you explain a little bit about your path to that specialty?
J: Yeah, when I was in the Air Force, I did some programming there as well. I worked with databases a lot, worked with FoxPro. And then around 1994, we got a copy of the original Mosaic browser, which we used on internal classified networks. And of course to program for that, you would have to write up pages using plain old HTML. So I started doing that, and realized I liked it. When I got out of the Air Force, I continued working with web pages, that kind of thing, and trying to attach databases to them.
One of the first database systems that I worked with, attaching it to the web, was FileMaker Pro - again, that should give you an idea of how long ago this was - using, I think they called it CDML, Claris Dynamic Markup Language. That was fun. I did that for one of the colleges that I was working at at the time. And then I heard about this thing called MySQL, which was a real SQL database system - and this language called PHP that you could use to interact with it and then generate a webpage. And that sounded interesting, so I started on that. That was in 1999, and I’ve just kind of kept on doing it since then.
E: You mentioned that you were coding when you were in the military as well. I saw from your bio that you’ve worked for a number of different types of organizations, and I think people might be interested in knowing if there’s any sort of stark difference between, say, coding in a military organization and coding for a - say a company or a non-profit. I guess it would depend what sector they’re in?
J: Yeah exactly. And my programming work in the military was sort of secondary to the work that I was doing. My primary work was as a - it’s called an operations intelligence specialist, as an enlisted guy. But they put me in charge of training other people who were in the organization. And then to keep track of a lot of that, a lot of that training - I ended up with, like I said, FoxPro and a couple of other things. So the programming for that was, again, secondary to the job. It was in support of that job. Programming there did not strike me as especially different than programming for any other client on the outside. But then that’s because it was a - sort of a small, local client compared to the rest of the military. It was just from my organization.
So I have not found any dramatic differences other than the need to make sure that whatever you did in the military stayed in the military. There was no open source, anything like that. So everything you did had to be kept under wraps. Whereas out here, in the civilian world, you’ve got all this open source stuff, which is fantastic to work with. And you can share both your solutions and your problems with everyone else, and have everyone else look at what you’re doing. To tell you what you’re doing wrong, and maybe help other people figure out what they’re doing wrong. So there was that difference.
E: That’s really interesting. There’s obviously been quite a bit of press lately about US personnel hacks and things like that. Do you have an opinion in general about the security of data in the US systems?
J: Well, we’ve all heard of Murphy’s Law - anything that can go wrong, will. There are military variations on that law. One of them is, “Always remember that your weapon was built by the lowest bidder.” Unfortunately the same thing is true for government data programs - anything that’s related to data is built by the lowest bidder. So I think this is an example of where that comes back to bite you really fast.
E: That’s really interesting. When the ObamaCare website fiasco unfolded, I had this joke that winning a government contract requires an entirely different set of skills from actually making a website or doing anything. So I always thought of the problem in terms of just procurement. But that’s a pretty straightforward criticism, when you say that the lowest bidder wins.
J: Yeah, that’s pretty much it. In fact when you talk about ObamaCare, it doesn’t matter what your political affiliation is, how you felt about it at all. As a project management case study, it is fascinating. It’s almost a list of everything you could possibly do wrong, all combined into one project. There’s a guy named Arnold Kling, who is an economist. He’s blogged about that in the past as well, if you’re interested in an economist’s point of view, from a guy who’s not only an economist, but was also a programmer and built websites during the boom. He’s got a series of very interesting criticisms about it. So does Megan McArdle. I think she works for Bloomberg now, and is also a technical person. She wrote a lot about the ObamaCare fiasco, again from a production point of view. All really good stuff.
E: And do you think it’s possible that - to put it broadly - the government has learned a lesson from that for the future? Or are procurement processes just too difficult to change?
J: It’s my guess that the inertia is not in favour of things getting better. None of the incentives are right. With government, again, this is an economics thing. When you are taking someone else’s money to spend money on something that you yourself are not personally vested in, then you’re not going to be careful with how much money you take. And you’re not necessarily going to be careful how you spend the money. And you’re not necessarily going to hold anyone, or any of the right people accountable for how it gets spent. And that’s regardless of where you stand politically, it’s just a set of economic incentives.
E: Moving back to PHP, I know that it’s gone sort of up and down in terms of reputation in the last ten years or so. What’s your take on that, and how things have changed just in the last couple of years?
J: Depending on how you felt about PHP ten years ago, it had nowhere to go but up in terms of security. There were a lot of security flaws in the language itself. But the primary problem was not the language, in my opinion. The primary problem was that people who were drawn to PHP were not necessarily professional programmers. They were not people who had security concerns at heart; they just wanted to get something done quick. Get it on the web. Because either it was going to make them money, or they needed to do it for their own internal organizations. So they weren’t necessarily thinking about security concerns right off the bat, because they weren’t targets.
And then a couple of years later, suddenly everyone was a target. With cross-site scripting exploits or SQL injections, they can take over your machine. That’s the point at which the community started standing up and paying attention to those kinds of things. You started seeing things like the filter extension put in place - I think it was Pierre-Alain Joye who did that - which was a great boon. But in addition to that, there was widespread education, if by no other means than by word of mouth among developers, that these kinds of security flaws existed and you needed to watch out for them. So I think security concerns have been addressed by the language, but they’ve been primarily addressed by the better education of the people working in the language.
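To make the SQL-injection point concrete: the difference between interpolating user input into a query and binding it as a parameter is one line. This is an illustrative sketch in Python’s sqlite3 (the interview’s context is PHP, where PDO prepared statements and the filter extension play the equivalent role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

payload = "nobody' OR '1'='1"  # classic injection payload

# Unsafe: string interpolation lets the payload rewrite the query,
# so the WHERE clause becomes ... name = 'nobody' OR '1'='1'.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % payload
).fetchall()
print(unsafe)  # [('alice',)] -- the injection matched every row

# Safe: a bound parameter treats the payload as a plain string value.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()
print(safe)  # [] -- no user is literally named that
```

The word-of-mouth education Paul describes was largely about making the second form a reflex.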
E: Okay great, thanks. You’re the lead developer for something called the Solar PHP framework. Can you explain a little bit about how you got into that and what it is?
J: Solar is actually an older one at this point. Way back when, in 2002 or 2003, I became more aware of a project called PEAR, that’s P-E-A-R, a collection of libraries. And I wrote a small, what I called the foundation, not a framework, because back then the word framework was a bad word to use in PHP communities. So I called it a foundation, and it was a collection of libraries from PEAR, to do all the basic things that you needed to do, like database connection, authentication, caching, logging - stuff like that.
I realized when putting this together from those other libraries, that none of the libraries really worked the same way; even though they were all part of the same project, they were all from different authors under one banner. So they didn’t look the same way, they didn’t feel the same way, you didn’t call them the same way. So, with that in mind, I decided that I wanted to start another project, where those individual libraries would all look and feel the same way. This was right around the time that PHP 5 came out. So I figured it’d be a good time to make a break with that older system, and start putting together a new set of libraries.
Solar originally stood for Simple Object Library and Application Repository. But it ended up being this monolithic thing, where you downloaded everything all at once, and used all of it whether you wanted to or not. That turned out to be the standard for framework projects at the time. So that was the origin of it, that’s how it got started. A lot of people got interested in that and started contributing as well. It was nowhere near as popular as, say, the Zend Framework or Symfony 1 at the time. But it did have what I would call a small but committed community, and I learned a ton from writing it, and I learned a ton from the people who provided patches and helped to work on it as well.
E: And the Aura for PHP project - that’s a newer thing that you’re working on?
J: That’s exactly right. Aura is essentially Solar version 2. One of the problems that we had with Solar, first of all, was the name. It’s S-O-L-A-R. Some time thereafter, the Apache S-O-L-R - Solr - project came out, and people started confusing it with that. So when Solar the PHP project went 1.0, we started discussing how we were going to do the 2.0 project. The first thing we decided was that we needed to change the name. So we settled on Aura as sort of a pun on the name: Au is gold, a sun symbol, and Ra is a sun god, so Aura. And we decided that the new project should adhere more closely to the original ideas that we had when we started - that is, a series of individual libraries, not something that’s made to be delivered as a monolithic framework. So our primary goal was to take the Solar stuff and split it off into individual, independent, decoupled components, where they would be independent not only from a particular framework, but also independent of each other.
So those were the driving principles behind starting Aura as version 2 of Solar. I think it’s worked out pretty well. We moved away from a universal service locator to using dependency injection and providing a dependency injection container that you could use if you wanted to, and then the various versions thereafter of the individual libraries, we’ve been able to split those into even smaller components. For example, the Aura SQL component used to be a database connection, a connection manager, an SQL query builder and a gateway and mapper system. We’ve actually split that out so that in version 2, the SQL query builder is its own thing. It’ll work with any database connection at all. You don’t even need one, you can just build the queries at random if you feel like it. And the SQL connection portion just does that. The mapper and gateway then split off in their own components as well. So it’s been a story of reducing the size of the individual packages so they can be recombined in any way you like.
E: Great, thanks, that’s really clear. I was wondering what motivated you to write Modernizing Legacy Applications in PHP on Leanpub?
J: So the first motivation was, I wrote a talk called “It was like that when I got here,” subtitled “Steps toward modernizing legacy codebases.” The motivation for that talk was that I had been in several organizations where we had these really old codebases - they were tough to work with. And over the course of several years, over several different organizations, I came up with a list of steps and notes for myself on how to reorganize these codebases and make them easier to work with, so that we could add features more quickly, fix bugs more easily and isolate things more easily.
So, the talk came out of a generalized version of a story that I heard over and over again, where you go into an organization and you look at the codebase, and it’s horrible. You ask the people who are already there, “How did it get to be this bad?” And they look at you and they say, “I don’t know how it got to be this bad. It was like that when I got here, and we’ve just dealt with it since then.” Of course that means a lot of suffering in our daily lives. That’s a lot of pain and anguish. And you end up having a relationship with the codebase that is sort of adversarial. When you walk into work, you’re always kind of scared: “What’s going to break today after I try to fix something?” You spend a lot of hours late at night trying to make sure that things are going to work, because if you touch one piece of code in one place, then something else somewhere else breaks. And it’s not really your fault, but you’re the programmer - you’re supposed to know what’s going on.
So after writing that talk and going through some of the initial steps of what I had done to modernize these legacy codebases in my own work, I gave the talk the first time, and it was well received. I gave the talk a second time, it was well received. The third time I gave the talk, some people came up to me who had attended the first one, and they said, “Yeah we’ve done it all, we’ve done everything you said, what’s next? Because it’s not really where it needs to be yet.” Well, I had this huge list of notes already, and it turned out that the timing was right for me to sit down and take all of those notes and compile them into hopefully a good “how to” instruction manual on exactly how to follow these steps, follow these principles, and through a series of baby steps, end up on the other side with a codebase that has gone from the spaghetti mess to something that is auto-loaded, dependency-injected, unit-tested, layer-separated and front-controlled. Based on the feedback, I think it’s been a success in those terms.
E: It’s great to hear about things developing from talks like that - getting feedback from people that they like it, and then wanting to share it with a wider audience who can read it whenever they want.
E: I was wondering, in the introduction to the book, you define legacy applications, but you make some comments specifically about the nature of PHP legacy applications. I was wondering if you could maybe say a little bit about that?
J: Sure. First of all, the word legacy carries a lot of baggage with it. Normally when we think of legacy code, we think of something that is merely old. It’s five or ten years old, or was developed according to old principles, or it was merely “there before I got here.” And so by definition, every programmer who comes onto a job looks at all the code that was already there as legacy - no matter how good it might actually be. But in the work that led me to write the book, I discovered certain patterns cropping up over and over again in how these applications had been constructed. And some of the points in those patterns included things like globals being there, or there being evidence of having attempted to rewrite it using a framework more than once.
So you walk in, and you’ll see a codebase where you can see evidence that they’ve tried to apply one framework, and it never really got finished, and then another programmer came in later and tried to apply another framework, and that one never really got finished either. And so you’ve got this codebase with a mess of idioms in it from different systems. Or there is a poor separation of concerns; this is especially true in PHP, where you’ve got a page script that sits in the document root, with a lot of includes in it to execute logic. And those includes, when you combine them all, end up combining the concerns of the model and the view and the controller all into the same scope, all sharing each other’s variables. So the typical PHP application in those cases ends up looking like what I call an “include-oriented” system, rather than class-oriented or object-oriented, or even procedural. It’s include-oriented, and it’s page-based, because you browse to these pages that are sitting directly in the document root, and each page is responsible for setting itself up and tearing itself down. Those and other factors are the terms that I use to determine whether or not something is a legacy application in PHP land.
E: And you talk about how the particular way that people come to PHP in the first place has a specific impact on legacy applications in PHP.
J: That’s exactly right. One of the great things about PHP is that you do not need to be a professional programmer in order to use it. If you’ve got a business idea, or if you’re just working for your organization, you’re not necessarily a technical person to begin with. You hear about this language, PHP. You know that you can type in a few lines of code and get them actually running on a server. And that’s fantastic. It allows you to make money very quickly. So the great thing about PHP is that pretty much anyone can use it.
But the great thing about PHP is also the terrible thing about PHP. And that is that anyone can use it, whether they are professional or not. So the people who write these programs, even if they are junior developers who will end up being professional programmers - PHP allows you to do a lot of stuff that maybe you shouldn’t be doing in the first place, like combining concerns, that kind of thing. But it lets you get up and running so that you can get something productive on the web. And then you turn away and you go to do your next thing. Unfortunately, what’s left behind is something that’s kind of a security and maintenance headache, because you weren’t necessarily thinking in advance about good architecture. You weren’t thinking about whether you’d be able to test it automatically or not. Testing, who needs that? I can look at the page, and I can see that it’s working - clearly it’s alright. And that’s great, until your first SQL injection hits, or until your first cross-site scripting problem hits. And then you’re left with this horrible mess. Reading a book called The Inmates Are Running the Asylum, by Alan Cooper, I found what I think is the right analogy for this. When you look at these codebases, it’s like looking at a dancing bear. When you look at a dancing bear, you are not thinking that maybe its pirouette is off balance, or that its plié is not a full extension. You’re just amazed the thing dances at all in the first place. Looking at these codebases is a lot like that. You wonder how they ever worked.
E: That’s a really good image. You also talk about how, when people are faced with the situation where they’ve got this terrible legacy code, that often they have a desire to rewrite the whole thing - which I guess in the context is understandable. And you also write specifically about developers who have the desire to become one’s own customer. I was wondering if you could explain this? I mean I think it’s clear, but I was wondering if you could explain a little bit about that - that image of wanting to be your own customer, and why that’s a problem?
J: So every developer, first of all - no developer ever looks at anyone else’s code and says, “This code is fine the way it is, we’re going to leave it alone.” The first instinct of every developer when they look at someone else’s program, no matter how good it is, is, “This is not the way I would’ve done it, it needs a rewrite.” So because we have that first initial instinct, every time, we need to be suspicious of that. It turns out that that instinct is very self-serving in a lot of ways. We’re not necessarily looking at it to rewrite it, to make things serve the customer better - or for the program to be better. We’re doing it because we want it for ourselves. And if we want it for ourselves, that means that we get to be the customer then - “I get to determine what the needs are for it. And it should be done the way I want it to be done.” So we use ourselves as a reference point, instead of using some external concern as a reference point.
Treating one’s self as the customer includes things like, “Well, I want to use this new framework X, because that’s the newest, hottest thing. So clearly it should be rewritten in that.” Or, “The way this has been put together is not the way I would have put it together myself, so I am going to be the determinant of what is right in this case - regardless of what any external concern is - and do it the way I would have liked,” rather than looking at it and saying, “Well, it’s serving its purpose pretty well. Other people are paying for it. Maybe we should mostly leave this alone, and only tinker with it around the edges,” or, “Make sure that it keeps working the same way, but just improve the quality of what’s going on under the hood.” The neat thing about being your own customer is that, if you’ll pardon the phrasing, it feels very sexy to do a complete rewrite, because then you get to do it “the right way” - the way it should have been done.
The problem is, that makes you very optimistic as to how well it’s going to go. The joke that I make about rewrites is: You’re going to estimate that it’s going to take a certain amount of time. Of course everyone’s got their favourite estimation techniques - normally it’s, pick some estimate, and then double it - and that’s how long it’ll really take. So you’ve got some buffer. But when it comes to a rewrite, it’s not enough to just double it. You also have to convert it to the next higher unit. So that if you think it’s going to be a 6-week rewrite, it’s really maybe about 12 months. That’s because there’s so much going on in the system that your optimism blinds you to. And then when you’re most of the way - when you’re 12 weeks into what was going to be a 6-week rewrite, you realize how long it’s really going to be. You start cutting corners, you start taking shortcuts again. And you end up with a system that’s just as bad as the old one, just in different ways.
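The double-it-then-promote-the-unit joke above can be sketched in a few lines of illustrative Python (the function and table names here are invented for the example, not from the book):

```python
# Tongue-in-cheek "rewrite estimate" heuristic: double the number,
# then promote it to the next larger time unit.
NEXT_UNIT = {"days": "weeks", "weeks": "months", "months": "years"}

def rewrite_estimate(amount, unit):
    """Return the 'real' rewrite estimate: doubled, in the next unit up."""
    return amount * 2, NEXT_UNIT.get(unit, unit)

print(rewrite_estimate(6, "weeks"))  # a "6 week" rewrite becomes (12, 'months')
```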
E: And I think you mentioned Netscape as an example in your book. I think that - you say it took them 3 years or something to do their rewrite?
J: Something like that. I don’t recall. I grabbed that one from Joel Spolsky.
J: The idea is that Netscape looked at their codebase and said, “We need to completely redo this.” And maybe that was true. But they went out of business during the rewrite. They’re Mozilla now. They had to completely change everything about their business in order to make that happen. And if you’re in a position where you can afford to do that, maybe a rewrite is for you. But most people that I know do not have either the money or the time to actually go through with that kind of thing. So I hope that this book, if nothing else, will save people from thinking that a rewrite is going to save them. Nuking it from orbit feels great, and refactoring feels a lot like work, and so we don’t want to do that. But it turns out that refactoring, even though it feels like work, has the benefit of actually working. Whereas a rewrite, most of the time, is just going to blow you up.
E: Recently, I think it might have even been just over this past weekend, you held an online boot camp, walking people through the modernizing process. Can you explain the motivation for setting up that boot camp? How you set it up and how it went in the end?
J: I was invited to speak at the Zend conference - I don’t remember if it was last year or the year before - where they wanted a tutorial session on basically working through the book. So I put it together, and it ended up being something like 380 slides’ worth of information for a three-hour slot. I presented it to a full room; I was very happy about it. Unfortunately, I had to cut out about a third of the information, and I had to skip one entire chapter, because it’s just too much to go over at once. So I figured that if I could present it online to a committed audience over the course of a weekend, rather than trying to shove everything into one three-hour session, that would be a better way of delivering the information. And it turns out to have been true. We worked through essentially the same set of information, but I was able to spend a lot more time showing examples and working through - basically, doing limited code examples while doing it. And people could ask questions while I was doing it. It took the full eight hours to get through, so I really don’t remember quite how I got through that many slides in three hours previously. But it was very well received.
E: And do you think it’s something you’ll be doing again?
J: It is something that I would love to do again. This was a good first run, but I saw some mistakes and flaws in how I did the presentation, and there were some things missing, so I had to sort of hem and haw and add those in on the fly. I expect the second version of it to have a lot more to it, and to go a little more smoothly. Again, it was something I very much enjoyed, and I think the people that attended liked it as well.
E: Great, fantastic. I have a question about your second Leanpub book, which is called Solving the N+1 Problem in PHP. I was wondering if you could just explain what that problem is, and why it’s important?
J: Yeah, so the N+1 problem essentially is a “number of queries” problem when you’re putting together your objects. It’s not something that applies just to PHP; this will happen in any language. In fact, when I first encountered it, it was happening as a series of stored procedures in Postgres. The idea is this. If you’ve got, say, a blog post or a series of blog posts, and you want to get all the comments for all those blog posts - generally what happens is developers will first get the list of blog posts. So they have to get a list of 10 posts. And then they loop through that list to get the comments on each post, and attach the comments to the post objects. What happens in that case is you end up making 11 queries: 1 to get the original set of 10 posts, and then 1 query for each of the 10 posts that are in that collection. So you end up with a total of 11 queries. Maybe that by itself is not such a big performance problem, but when you’re putting together a collection of, say, 20,000 objects with five relations each - you end up with 100,001 queries, and a webpage that doesn’t load for three hours. So that’s a pretty serious performance problem. And it doesn’t even need to be in terms of 10,000 or 20,000 objects. It could be in terms of 20,000 requests being made against your system. If you can reduce the number of queries coming out of an application by an order of magnitude or more, then as a scalability concern you need a lot fewer machines in order to scale up.
So that’s the basis of the N+1 problem, where the 1 is the initial query, and the N is the multiple number of queries that have to happen to populate those other objects. The book is about how to recognize that that’s going on, why it happens in the first place, and how to solve it - and then some automated ways of solving it after you’ve figured out how to do it by hand.
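The blog-posts-and-comments scenario above can be simulated in a few lines of Python. This is a sketch with an invented in-memory “database” and invented query helpers, not code from the book; each `query_*` call stands in for one round trip to a real database:

```python
# Hypothetical in-memory "database": 10 posts, each with 2 comments.
posts = [{"id": i} for i in range(1, 11)]
comments = {i: [f"comment-{i}-{j}" for j in range(2)] for i in range(1, 11)}
query_count = 0

def query_posts():
    global query_count
    query_count += 1
    return [dict(p) for p in posts]

def query_comments_for(post_id):
    global query_count
    query_count += 1
    return comments[post_id]

def query_comments_in(post_ids):
    # One IN()-style query fetching comments for many posts at once.
    global query_count
    query_count += 1
    return {pid: comments[pid] for pid in post_ids}

# N+1: one query for the posts, then one query per post for its comments.
query_count = 0
loaded = query_posts()
for post in loaded:
    post["comments"] = query_comments_for(post["id"])
naive_queries = query_count  # 1 + 10 = 11

# One common fix: load the posts, then fetch all comments in a single query.
query_count = 0
loaded = query_posts()
by_post = query_comments_in([p["id"] for p in loaded])
for post in loaded:
    post["comments"] = by_post[post["id"]]
fixed_queries = query_count  # 2

print(naive_queries, fixed_queries)
```

With 10 posts the naive loop issues 11 queries while the batched version issues 2, and the gap grows linearly with N - which is exactly why the pattern becomes a scalability problem at 20,000 objects.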
E: Okay great, thanks, that’s really clear. I was wondering - just switching gears a little bit to your process publishing the books and things like that - I was wondering if you could talk a little bit about how you found out about Leanpub in the first place, and why you chose to use us to publish your book?
J: So Leanpub was highly recommended by a colleague of mine, a guy named Chris Hartjes, the Grumpy Programmer. He had published once through a traditional publisher, and then again through Leanpub - and had nothing but praise to sing for Leanpub. A joke that I have made, and I think he’s heard this before, is that I figured if Chris Hartjes could write a book on Leanpub, then by God so could I. And that’s how I came to Leanpub in the first place.
E: Okay thanks. If you have more to say, please go ahead.
J: No, no if you’ve got more specific questions, I can answer them.
E: Sure, I actually do have a couple. But before I get a little technical, I would like to ask what your opinion is about the book market and how it’s evolving now? Maybe even specifically the computer book market. Do you see more authors doing what you and Chris chose to do, to self-publish programming books? And are there market- and technology-based reasons for doing that? Or is it something with the publishing industry itself that’s driving those choices?
J: I’m going to take the easy way out and say there’s a series of different concerns all playing back and forth with each other on that one. One of the things that people in general like, is to feel like they have been chosen. So when a publisher comes to you and says, “We have seen you, we like you, we want you to publish a book with us,” it has that aspect of feeling like you’ve been chosen by someone else. It makes you feel good about yourself. So that right there is a very powerful driver that favours the traditional market. Because you get that sense of other people having recognized you. It’s a status symbol if nothing else. So that’s one driver in favour of traditional publishing.
The primary driver that I see in favour of individual publishing or self-publishing is that you get to keep more money. And frankly I am a cheap little man, with a mercenary heart, and I like that idea. It also means that I get to be in control of - for good or bad, in control of everything about the process. I get to choose the art, I get to choose the font. I get to choose everything about the book. I get to choose the process for writing. I get to choose in all but the broadest general sense how it’s going to be published. So for people who have very - I’m not going to call them people who are control freaks, I’m going to call them people who are control enthusiasts - for the control enthusiast, self-publishing is a wonderful, wonderful thing.
E: I’m not sure if this was important to you, but one thing in particular with technology books is that having control over timing seems to be really important to people, especially if they’re talking about something that’s meant to solve a problem that people have right now. I think in particular it’s hard for people who think in a technical way to say, “I’ve got the solution, it’s out there, to solve the problems that exist right now. But now I need to subject myself to an arbitrary process. That means it’s going to take a year or two for my solution to get out there.” It seems like there’s just a certain type of person that finds that to be an unbearable situation.
J: I completely agree, and as a follow on to that, it may be that your solution, when written down, is only 30 or 40 or 50 pages. It’s going to be tough to find a publisher who’s going to find that a profitable concern to follow. Whereas with individual publishing, you can do small one-off pieces that address as narrow or as broad a topic as you like. But if it’s small, it’s probably a relatively narrow topic, but being highly focused, you can spend all your time working on that one thing - get it out, get it out of your brain and have it published. And that’s a fantastic thing, in fact that’s the N+1 book that we talked about before. I think that’s a pretty good example of that kind of thing. It didn’t really fit in the modernizing book, even though we reference it. But it’s not something that would be a full-sized book on its own either. So publishing it as, I think it’s like 60 pages, something like that, publishing it as its own stand-alone thing turned out to be really - I mean, it’s really nice to be able to do that. You just can’t do that with a traditional publisher. Not at a profit anyway.
E: And did you publish either of your books in-progress? That is, publish the first version before all the chapters had been -
J: I did, and that was another real benefit of self-publishing, and specifically of the model that Leanpub presents. I was able to write, I think, the first three chapters, which frankly were the hardest three chapters to write, with one exception, and then put them out there on Leanpub to say, “Hey, this is here, if you’re interested come and get it. If you like, please give me feedback so that I can fix typos and address other concerns.” The feedback was phenomenal, just right off the bat, and it wasn’t even finished yet. So, first of all, being able to publish it in installments that way was super helpful to me as an author. But it was also super helpful to me as a perfectionist. Because you get a lot of feedback from a lot of people who are paying attention, who will all find some small detail that you never noticed, and you can put it in and have it go out on the next iteration. It’s fantastic.
E: And how did you receive feedback? Did you explicitly encourage it in your book or on your landing page?
J: So again, one of the wonderful things about the Leanpub process was, as you publish, as you release iterations, you are able to send an email to everyone who has already bought it and say, “Here’s the next version. If you notice problems, here’s how to contact me.” So people would send me emails - in fact I think it was only by email at the time - or someone would say on Twitter, “Hey, I noticed on page X that you’ve got this right here, and that’s not quite right. You should probably say this instead.” That communication mechanism, just from hitting the publish button, was fantastic. In addition to that, there were several people who knew me already, or knew my email address, and so could buy the book or download a sample. And they gave me feedback just prior to -
E: That’s really interesting, I have a question related to that. So, just for anyone listening, the way our emailing readers feature works in Leanpub is that when someone publishes a new version of their book, they have the option to send a message to all readers, and that message goes to you by email. But you don’t actually see the author’s email address, and the author doesn’t see your email address either. So, we facilitate communication without revealing email addresses to people, although it is possible for you to share your email address with the author. My question to you, Paul, is: did you feel that it was a loss that you didn’t have email addresses for all your readers? Or did the sort of - not exactly double blind - but the sort of blind communication system that we have work just fine for you?
J: It worked just fine for me. But that was mostly because when people would email me, or when they would contact me through the Leanpub system, they recognized that it was essentially a double blind system. So, if they felt like giving their email addresses they would, and if they didn’t feel like it, they wouldn’t. But in every case, I can’t think of a single case where this wasn’t true. Everyone wanted to provide their email address as part of that communication. So I did not feel like I was missing out on anything. If nothing else, it felt more valid that they were not required to put an email address of some sort, that they wanted to actually have this communication back and forth. And that was very satisfying, if that makes sense?
E: Okay thanks, that’s really clear and really good feedback for us, to know that that works that way. I was wondering, as you were saying about control enthusiasts, a lot of authors who use Leanpub are very opinionated about their writing and publishing tools. I was wondering, from your perspective, if there’s anything that stood out that you think we could improve?
J: That’s a really good question. I can’t think of anything offhand in terms of the tooling that goes on. I just use a text editor and I work in Markdown. I use Markdown for everything personally anyway. So that was not such a big deal to me. Having Markua, the Markdown superset, was also very nice. If there was one thing that I think could be improved in terms of things on Leanpub’s side, and I think I’ve mentioned this to Scott and one of the other guys at Leanpub, it is that it would be nice to be able to publish the sample separately from the main book. So that if you want to make a change to the sample or add or remove things from it, it would be very nice to do that independently. I am aware that the backend production process for those two things is probably closely tied together, and so decoupling them from each other might be a very difficult thing to do. But even so, as an author and as the person doing the publishing, I personally would find that pretty useful. Having said that, that is the only thing that I’ve run into that’s been even mildly inconvenient. And everything else about the Leanpub publishing process has been very straightforward and very easy to use.
E: Okay thanks very much for that. That’s the way Leanpub works right now - if you want to update your sample book, you have to hit the “publish a new version” button - which, if you haven’t changed your core book, won’t change anything. But it’s an important - there is a distinction obviously between updating your sample and updating your book, whether you do them both at the same time or not. And it certainly would make - there would be a sort of logic to having them be separate processes. Perhaps one day we’ll do that, but right now, yeah, they’re pretty closely intertwined, those two, the generation of those two things.
The only question I have left is, are you planning on writing another book, and is there one in the pipeline right now?
J: There’s nothing in the pipeline right now. I do have some general vague ethereal ideas about what I would like to do next. One of them is to write a book about action domain responder, which is a refinement of the model-view-controller pattern that applies more specifically to web applications than it does to other kinds of applications. That would likely be a very small book.
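The division of labour in Action-Domain-Responder can be sketched in a few lines of Python. This is a hypothetical, minimal illustration - the class and method names are invented for the example, not taken from the planned book or any real framework. The Action accepts the request, the Domain holds the business logic, and the Responder builds the response:

```python
class Domain:
    """Business logic: plain data in, plain data out."""
    def greet(self, name):
        return {"greeting": f"Hello, {name}!"}

class Responder:
    """Turns a domain payload into a (simplified) HTTP response."""
    def respond(self, payload):
        return {"status": 200, "body": payload["greeting"]}

class Action:
    """Glue: one action per route, with no presentation logic of its own."""
    def __init__(self, domain, responder):
        self.domain = domain
        self.responder = responder

    def __call__(self, request):
        payload = self.domain.greet(request["name"])
        return self.responder.respond(payload)

action = Action(Domain(), Responder())
response = action({"name": "Leanpub"})
print(response)  # {'status': 200, 'body': 'Hello, Leanpub!'}
```

The point of the refinement is that response-building, which a typical MVC controller does implicitly, becomes the Responder's explicit job, leaving the Action as thin glue.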
Another one that I’ve got in mind is, there is a culture of what are called “preppers,” people who like to be ready for emergencies, that kind of thing. I have considered the idea of writing a book specific to my experiences in putting together my own little stash of stuff, and how I went about doing it in a step-by-step way, so that it didn’t have to be this gigantic expense all at once. It could be this small step-by-step thing, sort of like how the modernizing process works for applications.
E: That’s really interesting to connect those two things, that would be great to see that.
Okay, well, Paul, I’d just like to say thank you very much for your time and for being on the Lean Publishing podcast - and for being a Leanpub author.
J: Thank you very much for having me.
This interview has been edited for conciseness and clarity.
published Oct 09, 2015
At Leanpub we often hear from people who would like to know how they should go about buying multiple copies of books, usually for classes or for company teams or even corporate libraries.
Currently, there are two ways to buy multiple copies of Leanpub books:
You make a single purchase for a fair price, and you distribute it yourself.
You make a multiple-copy purchase, and people download the book from Leanpub.
Either process is fine with us! It basically boils down to whether you know each recipient’s email address, and whether you want them downloading the book from you or from Leanpub.
Here’s how each of these options works in detail:
1. You make a single purchase for a fair price, and you distribute it yourself.
Using this option, we ask that you simply buy the ebook once for the book’s suggested price, multiplied by the number of copies you intend to distribute, and share the ebook files with the intended readers yourself. Of course, there is no DRM on any Leanpub book. So, to buy six copies of a book with a suggested price of $10, you’d pay $60 (6 * $10 suggested price) for one copy, and distribute it using your own process.
2. You make a multiple-copy purchase, and people download the book from Leanpub.
We’ve built a feature that lets you buy multiple copies and assign each copy to a unique reader, who will either create a new Leanpub account, or add the book to their existing Leanpub library. This means they will always have access to the latest version of the book, which is nice since so many Leanpub books are published in-progress or improved over time with updates.
Here are the steps for making a multiple-copy purchase:
1) Add the ebook to your shopping cart.
2) Click the Edit button (which looks like a pencil writing on some paper) to edit the item in your cart.
3) Select the number of copies you would like to buy.
4) Complete your purchase as usual.
5) On the Thank You page, you will see the following message: “You have purchased multiple copies of this book. Click here to manage your download tokens.” Click that link, and you will see a nice form where you can send a copy of the book to each of your intended readers.
published Oct 01, 2015
We’ve recently made an improvement to our royalty payment policy.
Previously, we only paid out royalties if an author, publisher or cause had earned over $40. This was for administrative convenience, but now we’re changing the policy so that we will make payments to any account that has earned more than $1.00.
As usual, payments will be made on or near the first day of every month. Because of our 100% Happiness Guarantee two-click refund policy, royalties are held for 45 days. Once the royalty owing to you from each individual purchase has passed the 45 day threshold, it will be added to the amount that will be paid to you at the beginning of the next month.