It’s Time to Build

It’s been a few months so I wanted to say hey to the 7 of you who follow this blog and share a few updates about what I’ve been up to.

Quick recap

At the start of 2023 I quit consulting to go full time on Preceden, my SaaS timeline maker, after growing it on the side for about 13 years. Around the same time I started working on LearnGPT (which would eventually become Emergent Mind), and wound up spending about 70% of 2023 working on Preceden building out various AI capabilities like its visual timeline generator and 30% working on LearnGPT/Emergent Mind. In November I pivoted Emergent Mind from an AI news aggregator to an AI research aggregator, and I’ve been working on it full time since then.

Preceden

I’ve barely worked on Preceden since November. I answer about a dozen support emails each week and fix the occasional bug, but haven’t worked on any major product updates in a while. A good chunk of those support emails are refund requests, which I actually think is a good sign, because the lack of bug reports and feature requests reflects that the product is in pretty good shape.

Preceden revenue is up about 5% year to date, the lowest growth rate it’s ever had. It’s tempting to see that and conclude that it’s because I haven’t worked on it in 5 months, but the reality is that churn finally caught up to new MRR growth, and it’s largely because of a subtle mistake I made in the fall.

Preceden has always struggled to rank well for key search terms like “timeline maker”, despite it having pretty good SEO positioning. I realized around October that the reason might be that, over its lifetime, lots of users have created near-identical public timelines on historical topics, like hundreds of timelines on the Russian Revolution. Maybe Google was penalizing the site for this duplicate content. To remedy this, I used the AI timeline generator I built to generate around 200 timelines on common historical topics, and then 301 redirected about 20k public user-generated timelines to the AI-generated ones in an effort to reduce the amount of content that Google was possibly interpreting as spammy.

Good thought, but one problem: I accidentally no-indexed all of those AI-generated timelines, and because I was heads down on Emergent Mind and not paying close enough attention to Preceden’s metrics, I didn’t realize it for about 4 months. Those 20k public timelines drove a lot of traffic and sign ups, and when I redirected them all to no-indexed pages, I lost all that traffic, and a good portion of Preceden’s new MRR disappeared as well. I got the AI-generated timelines re-indexed, but traffic hasn’t fully recovered, which is why revenue is up 5% and not higher like it’s been in the past.

The good news though is that despite this mistake, Preceden continues to bring in income equivalent to a decently-paid developer’s salary, and it’s entirely passive, allowing me to pursue other things.

I’m taking advantage of that and chilling on the beach reading all day. Except not at all.

Emergent Mind

Emergent Mind helps people discover and learn about new AI/ML research. It gets 10k-15k visitors per month currently and people seem to get a lot of value out of it.

And last week I rolled out some very early paid plans and it now has non-zero revenue coming in:

It’s not much, but it’s a start.

The thing is though, I’m not optimizing for revenue right now.

I think of Emergent Mind as a product lab operating at the intersection of LLMs, research, and education. The way I see it, we’re at a point right now similar to the mid-90s when internet usage exploded with the advent of AOL. Similar to how many companies from that time period focused on building out better infrastructure to enable broader and faster internet usage, there are lots of companies right now focused on building bigger, more powerful LLMs. And similar to 1995, I think we’re going to see a ton of innovation in the coming years in the type of products and businesses being built with this new technology. That’s what I want to focus on.

I want to build tools in the research space at the frontiers of what’s possible with generative AI. I think we’ve seen like 2% of what’s going to be built with these technologies, and I want to spend most of my time exploring that other 98%. These will range from quick features that take several hours to launch, to some in the future that will take months to build. Some of these will be silly and most won’t go anywhere, but I think there’s a huge opportunity right now to tinker with an entrepreneurial mindset and create new types of innovative and hopefully useful products.

Like, what if you put an agent in charge of your Twitter account and set it up to automatically optimize itself based on engagement? What if you built a deeply integrated chatbot into your site that tried to persuade visitors to sign up for your newsletter based on their usage of the site? If you have access to the latest scientific research, could you use LLMs to identify gaps in our knowledge? Could you use LLMs to fill in those gaps? Could you build an AI-enabled educational tool that helps a software developer gain fluency in the type of advanced math you might find in a diffusion paper?

I don’t have the expertise to be confident about what’s going to work and what’s not (does anyone?), so I’m going to just experiment and learn and iterate and see where it goes.

With Preceden’s passive income, I can pursue this for a while, but not forever. I do have a small team of amazing contractors helping out (Milan on design and Omar on AI engineering), so it will be important to monetize Emergent Mind so I can support this team and possibly add more folks in the future. Ideally, Emergent Mind will make enough income at some point soon-ish that I can continue doing this long term without relying on Preceden’s income to support it.

Honestly there’s nothing else I’d rather be doing right now. For me, building a software business has always been about freeing up my time so I can spend more time learning and building. It took a while, but I’m kind of at that point right now where I can do that all day without being laser-focused on revenue growth.

I have no idea how this approach will play out, but I’m excited to see what happens.

Thanks for following along ❤️.

My Indie SaaS Revenue has Grown 37% per Year for 13 Years

Unlike many indie founders, I’ve never shared revenue numbers for Preceden, my SaaS timeline maker tool. Even if they were remarkable – which they are not really – I just don’t think there are many good reasons to publicly share revenue numbers, and there are lots of downsides.

However, below I’ll share a chart showing Preceden’s yearly revenue (though omitting actual numbers), because I think there are some lessons there and it may serve as inspiration for other indie founders.

Check this out:

Some thoughts…

I started Preceden as a side project in late 2009 when I was 24 and still a lieutenant in the Air Force. I knew I didn’t want to make the Air Force a career, so began learning web development in my spare time, and Preceden was one of the first products I launched. I only went full time on it at the beginning of 2023, a milestone I wrote about in this blog post.

When I started Preceden, I really had no idea what I was doing. I was an entrepreneurial amateur web developer with little experience building, marketing, or growing a business. For example, Preceden was entirely free for several months after launch, then I introduced a $19-for-life PayPal-only payment option, as recalled by this HackerNews user:

Payments started trickling in though. It didn’t make much money that first year, but over time, I got a bit savvier thanks to conferences like Microconf and slowly – very slowly – turned it into a better business.

There were years early on where I put it on the back-burner to work on other products. Most of those were duds, but one, Lean Domain Search, was acquired by Automattic after I got out of the Air Force, which is how I landed a software engineering (“code wrangler”) job there.

While I was at Automattic, I still had Preceden running on the side. Early on, revenue was nowhere near enough to even consider leaving Automattic to go full time on it and honestly I didn’t even want to. I enjoyed the work I was doing there and was learning a ton.

But, I could work on Preceden here and there on nights and weekends (at least before I had kids), and I could do some math to see that if I could grow it at X%/year, then down the road it could grow to the point where it would give me the option to go full time on it.

And so that’s what I did: kept it as a side project while at Automattic and later when I went to work at Help Scout. At both companies, I sought out opportunities to work with different teams so I could get more exposure to the marketing and business sides of the companies, knowing that it would make me smarter about growing my own business.

And each year, Preceden’s revenue grew. Looking at the history of the business, the compounded annual growth rate is 37%. That’s a decent growth rate for a business earning lots of money, but that wasn’t the case for most of Preceden’s existence: imagine making $5k one year and growing 37%ish to $7k. Not great, but… then that $7k grows to $9.6k, then $13k, and so on, and eventually those jumps start becoming meaningful.
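To see how the compounding in that example plays out, here is a quick sketch in Python. The $5k starting point and 37% rate are the illustrative figures from above, not Preceden’s actual revenue, and the function names are made up for this example.

```python
def project_revenue(start: float, rate: float, years: int) -> list[float]:
    """Return revenue for year 0 through `years`, growing by `rate` each year."""
    return [start * (1 + rate) ** n for n in range(years + 1)]


def cagr(first: float, last: float, years: int) -> float:
    """Compound annual growth rate between two values that are `years` apart."""
    return (last / first) ** (1 / years) - 1


if __name__ == "__main__":
    # Illustrative numbers only: $5k growing 37% per year for 13 years.
    for year, revenue in enumerate(project_revenue(5_000, 0.37, 13)):
        print(f"year {year:2d}: ${revenue:,.0f}")
```

Compounding at 37% for 13 years multiplies the starting figure by roughly 60, which is why small early jumps eventually turn into meaningful ones.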

For most of Preceden’s history, it was not a proper SaaS business with recurring revenue. For the first few years, it was all lifetime deals: pay $29 or similar and you can use Preceden forever (the nature of the product back then was that most people didn’t use it long term, so I offered plans that reflected that). Eventually I put a 1-year limit on it, so customers would have to manually pay again if they wanted to keep using it each year. A few years ago, I switched it to standard automatically-recurring SaaS pricing and that has certainly helped with revenue growth.

One thing I realized years into it is that Preceden wasn’t a great business to start in the first place. It’s mostly B2C (people creating timelines for history and personal projects, though there’s some B2B from people using it for project planning) and the nature of it is that most customers don’t need to use it that long and don’t want to pay a lot for it. Combine that with me starting as an inexperienced entrepreneur working on it as a side project, and you’ve got a recipe for a very difficult business to grow. (I’ll add though that if I had been savvier, I wouldn’t have started it, but it did work out in the long run, so maybe my inexperience was somewhat of an advantage.)

It’s interesting to me though if you look at that revenue chart, there’s fairly consistent growth through most of Preceden’s history, even though it didn’t have automatically recurring SaaS revenue until the tail end of it. (One exception being 2020 which saw abnormally strong growth due to lots of people moving processes online because of Covid.)

The way I look at it is that every year, I’ve made just enough improvements to the product/marketing/business that they (combined with a small amount of recurring revenue and compounding marketing efforts) all sum up to result in that year-over-year growth.

There have been very few big, immediate jumps in revenue. Mostly just lots of slowly improving every aspect of the business, as you can get a sense of from the commit count and dates from the Preceden repo:

For any indie founders out there who have not seen hockey stick growth for their product, I hope this serves as some evidence that it is possible (and perfectly fine!) to slowly grow your side project over many years.

If you can maintain slow but consistent revenue growth year after year, it should eventually grow into a meaningful amount of revenue and give you options down the road, whether it be to go full time on it, or use it to support yourself while pursuing other projects (like I am now with Emergent Mind, a resource for staying informed about important new AI/ML research), or something else entirely. And even if you never go full time on it, the lessons you’ll learn trying to grow your business will make you a much more valuable employee and help you grow your salary, which is a great outcome as well.

Drop me a note if you’re on a similar journey, I’d love to say hey: matthew.h.mazur@gmail.com.

Is the ChatGPT API Refusing to Summarize Academic Papers? Not so fast.

Yesterday on X, I shared a post about some responses I was getting from the ChatGPT 3.5 API indicating that it was refusing to summarize arXiv papers:

There has been a lot of discussion recently about the perceived decrease in the quality of ChatGPT’s responses, and seeing ChatGPT’s refusal here reinforced that perception for a lot of people, myself included.

I dug into it more today and wanted to share my findings.

Here are my takeaways:

  • ChatGPT 3.5 is still great at summarizing the vast majority of papers
  • However, due to some combination of the prompt I was using plus the content of some papers, it occasionally refuses to summarize them
  • It’s not clear if this is a new issue due to some recent change to the 3.5 model, or whether it just hasn’t occurred before while I’ve been working with the API

Background

Before we dive into this, here’s some context: I’m working on a new site called Emergent Mind to help researchers stay informed about important new AI/ML papers on arXiv.

It works by checking social media for mentions of papers and then ranking those papers based on how much discussion is happening on X, HackerNews, Reddit, GitHub, and YouTube and how long it has been since the paper was published:

For any paper (either ones that Emergent Mind surfaces or those users search for manually), the site also generates a page with details about that paper including a ChatGPT-generated summary.

Here’s an example page for “A Comprehensive Study of Knowledge Editing for Large Language Models” which was published yesterday and already has over 900 stars on GitHub, so is at the top of the trending papers today:

In production, Emergent Mind uses the gpt-4-1106-preview model to generate summaries because it generates higher quality summaries and can handle large papers, which other models cannot. However, locally it tries gpt-3.5-turbo-1106 first because it’s much cheaper and the quality doesn’t matter.

It was while working on it yesterday that I noticed the gpt-3.5-turbo-1106 model frequently refusing to summarize a paper, which prompted my tweet. I had never seen it do that before, and I definitely don’t want the production site ever showing a ‘Sorry, I cannot help with that’ response as a summary for a paper.

Digging in

I published a Jupyter Notebook on GitHub that I used to experiment with ChatGPT’s responses:

It will grab the summarization prompt in prompt.txt, run it through the gpt-3.5-turbo-1106 endpoint 10 times (or however many you choose), and output the responses to results.csv. Each request costs about a cent, so you don’t have to be too concerned about any experiments consuming your quota.
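The overall shape of that loop is simple. What follows is a minimal sketch of it, not the notebook itself: `run_experiment` and `complete` are hypothetical names, and `complete` stands in for whatever function sends the prompt to the gpt-3.5-turbo-1106 endpoint and returns the response text. Passing it in as a parameter also means the loop can be tried offline with a stub.

```python
import csv
from typing import Callable


def run_experiment(
    prompt: str,
    complete: Callable[[str], str],
    runs: int = 10,
    out_path: str = "results.csv",
) -> list[str]:
    """Send the same prompt `runs` times and save each response to a CSV file."""
    responses = [complete(prompt) for _ in range(runs)]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["run", "response"])
        for i, text in enumerate(responses, start=1):
            writer.writerow([i, text])
    return responses


if __name__ == "__main__":
    # Stand-in for a real API call so the sketch runs without a key or network.
    def fake_complete(prompt: str) -> str:
        return "### Overview\nA placeholder summary."

    run_experiment("Please summarize the following paper.", fake_complete, runs=10)
```

Running the same prompt many times matters here because the refusals are intermittent; a single request tells you very little.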

If you run this script as-is, you’ll likely see about half of the requests result in refusals such as:

  • “Sorry, I cannot do that.”
  • “I’m sorry, I cannot help with that request.”
  • “I legit can’t write a blog post of this length as it is beyond my capabilities.” (lol at the legit)
  • “I’m sorry, but I cannot complete this task as it goes beyond the scope of providing a summary of a research paper. My capabilities are limited to summarizing the content of the paper and I cannot create an original blog post based on the given content.”
  • “I’m sorry, but I can’t do that. However, you can use the information provided in the summary to craft your own blog post about the paper. Good luck!”

It’s easy to see this and come to the conclusion that ChatGPT can no longer be reliably used for summarization tasks. But, reality is more complicated.

Here’s the prompt Emergent Mind and this script are currently using, which I’ve iterated on over time to deal with various issues that popped up in the summaries:

You will be given the content of a newly published arXiv paper and asked to write a summary of it.

Here are some things to keep in mind:

  • Summarize the paper in a way that is understandable to the general public
  • Use a professional tone
  • Don’t use the word “quest” or similar flowery language
  • Don’t say this is a recent paper, since this summary may be referenced in the future
  • Limit your summary to about 4 paragraphs
  • Do not prefix the article with a title
  • Do not mention the author’s names
  • You can use the following markdown tags in your summary: ordered list, unordered list, and h3 headings
  • Divide the summary into sections using markdown h3 headings
  • Do not include a title for the summary; only include headings to divide the summary into sections
  • The first line should be an h3 heading as well.
  • Assume readers know what common AI acronyms stand for like LLM and AI
  • Don’t mention any part of this prompt

Here’s the paper:

Now, take a deep breath and write a blog post about this paper.

If we change the prompt though to simply ‘Please summarize the following paper,’ it seems to work 100% of the time. The problem doesn’t seem to be with summarizing papers as such, but with the guidance I provided about how to summarize them, combined with the content of some papers.

I spent a while this morning testing different combinations of those bullet points to figure out what’s causing the refusal, but couldn’t figure it out exactly. My impression is that it has something to do with the complexity of the guidance or because it thinks I’m attempting to do something shady with copyrighted work (note that earlier on the page it lists all of the paper’s authors, which is why I’m excluding them from the summary).
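One systematic way to run that kind of test is a leave-one-out ablation: drop a single guideline at a time and re-run the experiment for each variant. Here is a minimal sketch, with hypothetical function names and an assumed prompt layout (intro, bulleted guidelines, paper, closing instruction):

```python
def leave_one_out(guidelines: list[str]) -> list[tuple[str, list[str]]]:
    """For each guideline, return (removed_guideline, remaining_guidelines)."""
    return [
        (removed, guidelines[:i] + guidelines[i + 1:])
        for i, removed in enumerate(guidelines)
    ]


def build_prompt(intro: str, guidelines: list[str], paper: str, closing: str) -> str:
    """Assemble a prompt from its parts, one bullet per guideline."""
    bullets = "\n".join(f"- {g}" for g in guidelines)
    return f"{intro}\n\n{bullets}\n\nHere's the paper:\n\n{paper}\n\n{closing}"
```

If removing one particular guideline makes the refusals disappear, that guideline is the likely trigger. If no single removal helps, the cause is more likely an interaction between several of them or with the paper’s content.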

A few other things to note:

  • In my testing, GPT 4 (gpt-4-1106-preview) never refused to summarize a paper using the exact same prompt
  • I ran the script with ChatGPT 3.5 for about 10 other papers, and only 2 others saw similar refusals (2312.17661 and 2305.07895). For most papers, it follows the guidance and summarizes the paper 100% of the time.
  • Locally Emergent Mind has summarized hundreds of papers using gpt-3.5-turbo-1106 in November and December and these instances in early January are the first time it has ever refused (I ran a query on prior results to confirm), despite the prompt not changing much recently.

So, in short, the ChatGPT 3.5 API occasionally refuses to generate complex summaries of some papers. This may be new behavior, or may not be.
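Whatever the cause, a cheap guard can keep a refusal from ever being shown as a summary. Here is a minimal sketch, assuming refusals are short and open with an apology or a “cannot”. The marker list and both function names are illustrative rather than taken from the notebook, and a heuristic like this will miss some refusals and may flag the occasional real summary.

```python
REFUSAL_MARKERS = ("sorry", "cannot", "can't", "unable to")


def looks_like_refusal(response: str, window: int = 80) -> bool:
    """Heuristic: does the opening of the response read like a refusal?"""
    # Normalize curly apostrophes so that "can’t" matches "can't".
    opening = response.strip().replace("\u2019", "'").lower()[:window]
    return any(marker in opening for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(looks_like_refusal(r) for r in responses) / len(responses)
```

Only the opening of the response is checked, since a legitimate summary could easily use a word like “cannot” further down.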

If anyone ends up experimenting with the script and learning anything new, or if you have any insights as to the behavior I’m seeing here, please drop me an email or leave a comment below, and I’ll update this post accordingly.

Reflecting on My First Year as a Full Time Indie Founder

At the beginning of 2023 I went full time on Preceden, my SaaS timeline maker business, after 13 years of working on it on the side. A year has passed, so I wanted to share an update on how things are going and some lessons learned.

Preceden

Preceden today

My main focus in 2023 was building AI capabilities into Preceden to make it easier for users to create timelines. For some context: historically people would have to sign up for an account and then manually build their timeline, adding events to it one at a time. For some types of timelines where the events are unique and only known to the user (like a timeline about a legal case or a project plan), that’s still necessary. But for many other use cases (like historical timelines), Preceden can now generate comprehensive timelines for users in less than a minute, for free, directly from the homepage.

It took a good chunk of the year to get that tool to where it is today, starting in February with the launch of a tool for logged-in users to generate suggested events for their existing timelines which laid the groundwork for the launch of the logged-out homepage timeline generator in May. The v1 of that tool was slow and buggy and had design issues and I still hadn’t figured out how to integrate it into Preceden’s pricing model, but a few more months of work got most of those issues ironed out.

Since the launch of that tool in late May, people have generated more than 80k timelines with it, and around a third of new users are signing up to edit an AI generated timeline vs create one from scratch. I’m quite happy with how it turned out, and it’s miles ahead of the competition.

Marketing wise, I didn’t do enough (as usual) but did spend a few weeks working on creating a directory of high quality AI generated timelines about historical topics, some of which are starting to rank well. I also threw a few thousand dollars at advertising on Reddit, though there weren’t enough conversions to justify keeping it up.

I also executed a pricing increase for about 400 legacy customers, which I’ll see the results of this year. More on the results of that and the controversy around it in a future blog post.

Business wise, Preceden makes money in two ways: premium SaaS plans and ads. In 2023, revenue from the SaaS side of the business grew 23% YoY and revenue from the ad side of the business grew 33% YoY. The ad revenue is highly volatile though due to some swingy Google rankings, and will likely mostly disappear in 2024. Still, the SaaS revenue is the main business, and I’ll take 23% YoY growth for a 14 year old business, especially in a year where many SaaS companies struggled to grow.

Emergent Mind

Where to begin? :)

Shortly after ChatGPT launched in late 2022, I launched LearnGPT, a site for sharing ChatGPT examples. The site gained some traction and was even featured in a GPT tutorial on YouTube by Andrej Karpathy. But, a hundred competitors quickly popped up, and my interest in continuing to build a ChatGPT examples site waned, so I decided to shut it down. But then I got some interest from people to buy it, so I put it up for sale, got a $7k offer, but turned it down, and then rebranded the site to Emergent Mind and switched the focus to AI news. A few months into that iteration, I lost interest again (AI news competition is also fierce, and I didn’t think Emergent Mind was competitive, despite some people really liking it), so tried selling it again. I didn’t get any high enough offers, so decided to shut it down, but then decided to keep it, even though I didn’t know what I’d do with it.

And guess what: in November I had an idea for another iteration of the site, this time pivoting away from AI news and into a resource for staying informed about AI/ML research. I worked on that for a good chunk of November/December, and am currently mostly focused on it 😅.

I’m cautiously optimistic about this direction though: the handful of people that I’ve shared it with have been very enthusiastic about it and provided lots of great feedback that I’ve been working through.

Unlike my previous product launches, I’m saving a HN/Reddit/X launch announcement for later, after I’ve gotten the product in really good shape. There are still lots of issues and areas for improvement, and I now believe it’s better to soft launch and iterate on it quietly based on 1:1 feedback before drawing too much attention to an unpolished product. Hat-tip Hiten Shah for influencing how I think about MVPs.

I’ll add too that this “surfacing trending AI/ML research” direction is the first step in a larger vision I have for the site. I think it could evolve into something really neat – maybe even a business – though time will tell.

2024

Preceden is in a good/interesting spot where it’s a fairly feature-complete product that requires very little support and maintenance. I don’t have any employees, and could not work on it for months and it would likely still grow and continue to work fine.

When I look ahead, the most popular feature requests seem like they won’t be heavily used and will wind up bloating the product and codebase. That doesn’t mean there’s no room for improvement – there always is – just that I’m not sure it makes sense anymore for me to be so heads down in VS Code working on it. It’s the first time maybe ever that I’ve thought that. I’d probably see more business impact by spending my time on marketing, but that’s not exactly what I want to spend a lot of my time doing, and I can’t afford the kind of talent I’d need to market it effectively either (marketing a B2C horizontal SaaS isn’t fun).

So, my current thinking is that I’ll keep improving and lightly marketing Preceden, but with less intensity than I have in years past. Instead, I’ll devote more of my time to building other products: Emergent Mind and maybe others in the future. Maybe one of those will turn into a second income stream but maybe not. I enjoy the 0 to 1 aspect of creating new products, and the income from Preceden supports me in pursuing that for now. And if Preceden starts declining, I can always start focusing on it again, or go back to contracting or a full time position somewhere, which isn’t a bad outcome either.

Also, one thing I regret not doing more of in 2023 is wandering. It’s easy for me to get super focused on some project and not leave any time in my day for exploring what else is out there. Only toward the end of the year did I start experimenting with new AI tech like Mixtral. Going forward, I want to spend some time each week learning about, experimenting with, and blogging about new AI tech. I’m still very much in the “AI will change the world in the coming years” camp, and I have the freedom and interest to spend some of my time learning and tinkering, so am going to try to do that.

As always, I welcome any feedback on how I’m thinking about things.

Happy new year everyone and thanks for reading 👋.