Search My Site XML Feed

Posted: Thu, 17 Jul 2025 00:00:00 +0000

Today marks the 5th anniversary of the launch of searchmysite.net, and it is also a year since the last blog entry (which was the Four year retrospective), so now seems a good time to summarise progress.

The original objective of searchmysite.net was the ambitious “search just the good stuff” (it even indexed wikipedia until that was stopped to cut costs), then it was changed to “search non-commercial sites”, although few people seemed to know what “non-commercial sites” were so it was changed to “search personal and independent sites”. However, in order to give the site more focus, and in line with the Unix philosophy of trying to “do one thing and do it well”, independent sites are now being removed. I’m really sorry to see some of my favourite independent sites go, but searching all good personal sites is hopefully a much more achievable goal for a spare-time project than searching all good personal and independent sites.

Continue reading at the publisher's website.


Posted: Wed, 17 Jul 2024 00:00:00 +0000

Today is the 4 year anniversary of searchmysite.net (the open source search engine for the indieweb / small web / digital gardens, which heavily downranks pages with adverts, and aims to pay running costs via a listing fee and search-as-a-service), so that seems a good point for a quick update on its current status.

Highlights from the past year

The biggest and most exciting news from the past year was of course the redesign and first major open source contributor in Aug 2023. More in the dedicated blog post.

Continue reading at the publisher's website.


Posted: Sat, 16 Dec 2023 00:00:00 +0000

Milestones

- 17 Jul 2020: the first site submitted by a real user and indexed.
- 12 Mar 2022: the 1000th site, around 20 months later.
- 15 Dec 2023: the 2000th site, roughly 21 months after the 1000th.
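
As a quick check of the gaps between those milestones, here is a minimal Python sketch. It assumes an average month length of 30.44 days, which is my own approximation rather than anything from the post, and the variable names are purely illustrative.

```python
from datetime import date

# Assumption: an average month is about 30.44 days (365.25 / 12).
AVERAGE_DAYS_PER_MONTH = 30.44


def months_between(start: date, end: date) -> float:
    """Return the approximate number of months between two dates."""
    return (end - start).days / AVERAGE_DAYS_PER_MONTH


# Milestone dates as listed above.
first_site = date(2020, 7, 17)
site_1000 = date(2022, 3, 12)
site_2000 = date(2023, 12, 15)

gap_first_to_1000 = months_between(first_site, site_1000)
gap_1000_to_2000 = months_between(site_1000, site_2000)

print(f"First site to 1000th: {gap_first_to_1000:.1f} months")
print(f"1000th to 2000th: {gap_1000_to_2000:.1f} months")
```

Both gaps land close to the "around 20 months" and "roughly 21 months" quoted above, so the second thousand sites arrived at much the same pace as the first.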

The 2,000th site indexed by searchmysite.net is…

paulrobichaux.com. Many thanks for submitting!

How many sites could searchmysite.net contain?

I ran a poll on Mastodon asking How many people in the world actively maintain a personal website? The most popular answer was 80,000, i.e. roughly 1 out of every 100,000 people in the world.

Continue reading at the publisher's website.


Posted: Sat, 19 Aug 2023 00:00:00 +0000

Some exciting news, first hinted at in the Three year retrospective: the searchmysite.net redesign has been launched, thanks to its first major open source contributor Lucas Gramajo.

First major open source contributor

Just to clarify, there have been other open source contributors to searchmysite.net before. The first was Binyamin, who contributed the favicon on 23 Oct 2020. And after searchmysite.net was fully open sourced in Dec 2020, there have been other contributions in the form of bug reports, discussion etc. Plus of course searchmysite.net is entirely based on other open source projects like Apache Solr, so without all the contributions to those great projects, this project wouldn’t exist.

Continue reading at the publisher's website.


Posted: Mon, 17 Jul 2023 00:00:00 +0000

Today is the 3 year anniversary of searchmysite.net (the open source search engine for the indieweb / small web / digital gardens, which heavily downranks pages with adverts, and aims to pay running costs via a listing fee and search-as-a-service), so that seems like a good opportunity for a quick recap of progress so far and hints at what is likely ahead.

Progress so far

Highlights from the past 3 years

- It was fully open sourced in Dec 2020.
- I’ve had good feedback from real users, so I know searchmysite.net is still helping some real people find real content that is difficult to find elsewhere, and even that the search as a service is proving useful.
- It now has nearly 1,800 sites listed (although if 0.01% of the world’s population has an actively updated personal web site, that still means searchmysite.net has just 0.2% of actively updated personal web sites).
- The English language wikipedia was added to the index, although later removed for cost reasons, but that does show the system can handle 10s of millions of documents, in addition to 100s of thousands of (admittedly mostly spam bot) searches a day.
- Every results view now has an RSS output, so you can e.g. subscribe to search queries for your favourite topics or even make an RSS feed for a site which doesn’t already have one.
- A number of alternative search engines have since been launched, both commercial ones with big funding, and independent ones like search.marginalia.nu, so that does suggest there is real interest in new approaches to search.
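
The 0.2% coverage estimate above can be reproduced with a few lines of Python. This is only a sketch: the world population of 8 billion is my own rough assumption (the post does not give a figure), and the constant names are illustrative.

```python
# Assumption: world population of roughly 8 billion (not stated in the post).
WORLD_POPULATION = 8_000_000_000

# "0.01% of the world's population" is 1 person in 10,000.
PEOPLE_PER_SITE_OWNER = 10_000

# "Nearly 1,800 sites listed", rounded for the estimate.
LISTED_SITES = 1_800

estimated_sites = WORLD_POPULATION // PEOPLE_PER_SITE_OWNER
coverage = LISTED_SITES / estimated_sites

print(f"Estimated actively updated personal sites: {estimated_sites:,}")
print(f"Share listed on searchmysite.net: {coverage:.2%}")
```

Under those assumptions the estimate is 800,000 personal sites, of which 1,800 is a little over 0.2%.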

Usage levels

Stats have been fairly static, averaging around 10 real users a day for most of the time, although for the past 2.5 weeks there have been at least 30 users a day, which is good (let’s hope that continues):

Continue reading at the publisher's website.


Posted: Sat, 10 Dec 2022 00:00:00 +0000

Running costs

This is a short post comparing previous running costs on AWS and current running costs on Hetzner. Previous posts with details of running costs are searchmysite.net: The delicate matter of the bill from Jan 2021, and the Escalating running costs section in the searchmysite.net retrospective and future plans from Jan 2022.

AWS

Month            Cost (USD)   Cost (GBP)
July 2020        $9.04        £7.30
August 2020      $20.47       £15.56
September 2020   $28.95       £21.49
October 2020     $41.09       £31.72
November 2020    $44.64       £34.42
December 2020    $47.13       £35.36
January 2021     $47.35       £34.69
February 2021    $44.43       £32.43
March 2021       $47.94       £34.43
April 2021       $47.70       £34.60
May 2021         $49.11       £35.31
June 2021        $48.54       £34.26
July 2021        $49.54       £35.86
August 2021      $53.78       £38.51
September 2021   $53.22       £38.67
October 2021     $65.47       £48.34
November 2021    $75.85       £55.41
December 2021    $77.00       £57.85
January 2022     $77.00       £57.13
February 2022    $78.27       £58.04
March 2022       $52.01       £38.81
April 2022       $3.46        £2.63
May 2022         $0.00        £0.00

Hetzner

Month            Cost (EUR)   Cost (GBP)
March 2022       €6.77        £5.64
April 2022       €6.97        £5.86
May 2022         €6.97        £5.85
June 2022        €8.36        £7.44
July 2022        €8.36        £7.21
August 2022      €8.36        £7.46
September 2022   €8.36        £7.53
October 2022     €8.36        £7.53
November 2022    €8.36        £7.44

Notes

- Servers are of similar specifications: AWS was a t3.medium EC2 instance with 2 vCPUs and 4GB RAM, while Hetzner is a CPX21 instance also with 2 vCPUs and 4GB RAM.
- There have not been stability or performance issues with either provider.
- An instance with 2 vCPUs and 4GB RAM has been fine for indexing nearly 6,500,000 pages, and handling up to 166K searches a day from spam bots and up to 1.7K searches a day from real users.
- There was a big monthly increase in running costs starting in October 2021 when I began indexing Wikipedia, and ending in March 2022 when I stopped indexing Wikipedia, largely due to additional disk space requirements.
- Excluding the period when Wikipedia was indexed (for a fair comparison), AWS was around £35 a month while Hetzner is around £7 a month, so Hetzner is around 80% cheaper for an equivalent service.
- In addition, AWS costs were unpredictable and increased significantly over time as usage increased, while Hetzner costs have been much more stable. This makes it easier to budget, and perhaps more importantly makes it feel less like I am paying to service the spam bots.
- There were two months (March and April 2022) when I was paying for two providers, because (i) I simply switched off the AWS servers, in case the migration failed and I needed to switch back, but was charged almost the full amount for the switched off servers, and (ii) when I decided the migration was complete and irreversible, I thought I had deleted absolutely everything on AWS, but kept on finding I’d missed things like backups and the Elastic IP address, so it took nearly 2 months to get the AWS bills to zero.
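
The "around £35", "around £7" and "around 80% cheaper" figures in the notes can be sanity-checked against the GBP columns of the two tables. The sketch below is only one way to do it: the choice of November 2020 to September 2021 as the AWS steady-state period (after the initial ramp-up and before Wikipedia indexing began) is my own assumption, not something the notes specify.

```python
from statistics import mean

# AWS monthly costs in GBP, November 2020 to September 2021.
# Assumption: this window is treated as "steady state" for the comparison.
aws_gbp = [34.42, 35.36, 34.69, 32.43, 34.43, 34.60,
           35.31, 34.26, 35.86, 38.51, 38.67]

# Hetzner monthly costs in GBP, March 2022 to November 2022.
hetzner_gbp = [5.64, 5.86, 5.85, 7.44, 7.21, 7.46, 7.53, 7.53, 7.44]

aws_average = mean(aws_gbp)
hetzner_average = mean(hetzner_gbp)

# Fraction of the AWS cost saved by moving to Hetzner.
saving = 1 - hetzner_average / aws_average

print(f"AWS average: £{aws_average:.2f} a month")
print(f"Hetzner average: £{hetzner_average:.2f} a month")
print(f"Saving: {saving:.0%}")
```

Picking a different AWS window shifts the average by a pound or two, but the saving stays close to the 80% quoted above.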

Income

Month            Income (GBP)
February 2021    £11.45
March 2021       £11.45
April 2021       £22.90
May 2021         £0.00
June 2021        £0.00
July 2021        £0.00
August 2021      £0.00
September 2021   £0.00
October 2021     £0.00
November 2021    £11.45
December 2021    £11.45
January 2022     £11.63
February 2022    £23.26
March 2022       £0.00
April 2022

Continue reading at the publisher's website.


Posted: Sat, 29 Oct 2022 00:00:00 +0000

A quick summary of the new web feed for all search results

All search results pages (including Newest Pages and Browse Sites) now have a web feed icon in the top middle next to the results count (in between the Filters and Sort by). Clicking this takes you to an OpenSearch Atom format web feed for that query.

This allows you to, for example:

- Subscribe to new posts about your favourite topics. For example, if you are interested in seeing new posts about Stable Diffusion, search for “stable diffusion” (with double-quotes), change Sort by to “Published date (newest first)”, (optionally) set filters like Language and In web feed, then copy and paste the web feed link into your feed reader.
- Create feeds for sites which don’t provide feeds. For example, use Browse Sites to get to the site you want (use e.g. Sort by Domain if necessary), click the Domain link to return all results from that domain, set Filters and Sort by if necessary, and use the web feed link.
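
If you would rather process one of these feeds in a script than in a feed reader, the sketch below shows one way to pull titles and links out of an Atom document using only the Python standard library. The sample feed is made up for illustration and is not real searchmysite.net output; in practice you would fetch the web feed link copied from the results page. The function name `parse_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, needed for element lookups.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Assumption: a hand-written sample feed, standing in for a real results feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example search results</title>
  <entry>
    <title>First example post</title>
    <link href="https://example.com/first"/>
    <updated>2022-10-02T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Second example post</title>
    <link href="https://example.com/second"/>
    <updated>2022-10-01T00:00:00Z</updated>
  </entry>
</feed>"""


def parse_entries(feed_xml: str) -> list[dict]:
    """Extract title, link and updated date from each Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "link": link.get("href") if link is not None else None,
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return entries


entries = parse_entries(SAMPLE_FEED)
for item in entries:
    print(item["updated"], item["title"], item["link"])
```

A real feed may carry extra OpenSearch elements alongside the Atom ones; this sketch simply ignores anything it does not look up.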

Note that this functionality:

Continue reading at the publisher's website.


Posted: Sun, 18 Sep 2022 00:00:00 +0000

This is a quick post summarising the simplified site listing workflow and search as a service improvements.

Why the site listing workflow needed simplifying

The old site listing workflow was surprisingly complicated, with a number of different routes through the process, and the ability to restart and take a different route at a later date. Unfortunately, there were a number of issues, for example:

- Unexpected combinations of routes leading to unusual bugs like “A site submitted via Quick Add but awaiting approval, then submitted again via Verified Add, won’t be indexed until moderator approval”.
- Difficulties adding new features like “Search as a service: Free trial mode”.
- Users not entirely sure whether they should use the “Quick Add”, “Verified Add (IndieAuth)” or “Verified Add (DCV)” option.

The new site listing workflow

All submissions in the new workflow start from the same Add Site page, and the listing types have been renamed to “Basic” and “Full” (plus the new “Free Trial”), which is hopefully clearer. The second step for the Full listing asks for “Login and domain ownership validation method”, which again is hopefully clearer than the existing “Domain Control Validation” or “IndieAuth” options.

Continue reading at the publisher's website.


Posted: Sun, 12 Jun 2022 00:00:00 +0000

This post is to provide an update on the automated SEO searches issue described in my last post Almost all searches on my independent search engine are now from SEO spam bots. It references the discussion on Hacker News (HN) from Mon 16 May.

Traffic and system performance and stability

In terms of traffic, there were 18,034 visitors to blog.searchmysite.net on Mon 16 May and 2,699 visitors to searchmysite.net, pretty much all of which came as a result of the HN post:

Continue reading at the publisher's website.


Posted: Sat, 14 May 2022 00:00:00 +0000

Introduction

searchmysite.net was launched nearly 2 years ago to help people discover all the great original content on personal and independent websites which is so hard to find now that the major search engines have become swamped by SEO spam. It employed a number of novel techniques to try to avoid the same fate, such as having a community-driven curation and moderation layer, heavily downranking pages containing adverts to reduce the incentive for SEO spam, and aiming to pay the running costs from its search as a service rather than by selling user data to advertisers.

Continue reading at the publisher's website.


Posted: Sat, 12 Mar 2022 00:00:00 +0000

The 1,000th site indexed by searchmysite.net is…

weeklymusings.net run by scottnesbitt.online:

It was submitted and indexed on Monday 7 March 2022. (The 1,001st site indexed was uh.edu/engines.)

It was part of a flurry of new submissions, thanks largely to the mention at hubme.it/search-my-site.

Note that it is the 1,000th site indexed rather than submitted. Many more sites have been submitted than are currently indexed, because some (currently 357) have been submitted but not approved for indexing, and some (currently 43, i.e. around 4-5%) have been approved but have subsequently had indexing disabled due to the indexing errors detailed in Some of the challenges of building an internet search.

Continue reading at the publisher's website.


Posted: Sat, 05 Mar 2022 00:00:00 +0000

Priorities for 2022

As detailed in the searchmysite.net retrospective and future plans, one of the priorities for 2022 was moving to a cheaper hosting provider (another was to spend less time writing blog entries, hence this being a brief post).

Cheaper hosting

I had hoped to use fosshost.org, but unfortunately they’re not taking new projects. Of the paid alternatives, I chose hetzner.com, which appeared to be around a quarter of the cost of the previous hosting and got reasonable mentions in places like Hacker News.

Continue reading at the publisher's website.


Posted: Sat, 08 Jan 2022 00:00:00 +0000

Introduction

It has been around 1.5 years since I launched searchmysite.net as a side-project to try and address the problems with the current commercial internet search offerings, and I reckon I’ve now spent around 650 hours working on it. The 2021 year end seems a good opportunity for a retrospective of what has gone well, what has gone neither well nor not so well, and what has not gone so well. And based on that, some thoughts on where it should go in 2022. Given the nature of the project, this will be an open and honest account rather than a Silicon Valley style “fake it ’til you make it” piece.

Continue reading at the publisher's website.


Posted: Sat, 30 Oct 2021 00:00:00 +0000

So I’ve finally managed to index Wikipedia, or at least the 6,392,807 English language pages.

Some of the benefits this brings to searchmysite.net:

- It turns it into a much more useful search engine for day-to-day usage. Many of my internet searches in the past have simply ended with clicks to Wikipedia, so now when I’m performing that sort of search I can use searchmysite.net to get the Wikipedia link and see if there are any other personal or independent sites which have anything interesting to say on the topic. It could still benefit from users submitting more good quality personal and independent websites for indexing, and some other changes such as extending its relevancy tuning, but it is definitely showing promise.
- It shows that the system can handle nearly 6,500,000 documents, even on a single relatively low spec server. As an aside, this is nearly a quarter of the size of the first Google index in 1998.
- The mechanism for allowing the indexing process to differ on a site-by-site basis opens up the possibility of implementing additional custom indexing processes for other sites. Maybe, being open source, people could even contribute their own in future.

BTW, if someone wants to try out the Wikipedia import, they can simply spin up a searchmysite instance using the 8 commands listed in the README.md, and then run import.sh via docker exec -it src_indexing_1 /usr/src/app/bulkimport/wikipedia/import.sh.

Continue reading at the publisher's website.


Posted: Sat, 23 Oct 2021 00:00:00 +0000

One of the objectives of searchmysite.net - indexing just good quality content

It is a truth universally acknowledged that most of the modern internet is rubbish. Even Google appears to index trillions of pages but only saves around “hundreds of billions” in their search index, suggesting even they chuck most of it out.

The approach searchmysite.net takes is to try to index just “the good stuff”, rather than to try to index the whole internet with all of its garbage, and better quality content in the index should be a factor which helps improve results quality. There are a number of techniques it uses to try to achieve this, primarily:

Continue reading at the publisher's website.


Posted: Fri, 11 Jun 2021 00:00:00 +0000

Introduction

This is just a quick update on progress. Since the last post on 30 Jan 2021 I have:

- Been checking the new submissions on a daily basis and approving/rejecting accordingly.
- Been checking and responding to emails.
- Made a couple of minor maintenance releases, primarily bug fixes - details at https://github.com/searchmysite/searchmysite.net/releases.

Stability

I’m also pleased to report that there have been no outages in the first 6 months of this year. There have been two occasions where the indexing got stuck, i.e. new submissions could not be indexed and existing sites could not be reindexed, but users could search previously indexed sites so it was not really a visible outage as such:

Continue reading at the publisher's website.


Posted: Sat, 30 Jan 2021 00:00:00 +0000

Introduction

searchmysite.net is an open source search engine and search as a service for personal and independent websites. The first users began submitting sites on 17 July 2020, and it has been growing steadily since then. It is a bootstrapped side-project, so currently receives no external funding. Furthermore, it does not plan to fund itself with advertising, unlike pretty much every other search engine.

This post contains a quick review of current and expected future running costs, along with a summary of the plan to pay these running costs. The future estimates aren’t necessarily very accurate, and there isn’t much detail on possible cost optimisations, because the main focus is still on building the system, increasing adoption, and of course testing whether the idea of a search engine sustained by anything other than advertising can actually work.

Continue reading at the publisher's website.


Posted: Thu, 17 Dec 2020 00:00:00 +0000

searchmysite.net is an open source search engine and search as a service for personal and independent websites, which has a unique approach to advertising to try to tackle spam. I mentioned this approach to advertising in my last post. While this attracted a lot of positivity, it also unfortunately got some negativity, so I thought I’d write a quick post to clarify my position on advertising and search engines, hopefully in positive terms.

Continue reading at the publisher's website.


Posted: Sat, 12 Dec 2020 00:00:00 +0000

Introduction

I used the quote “talk is cheap, show me the code” in the introduction to my first post on searchmysite.net. Well, here it is: https://github.com/searchmysite/searchmysite.net/.

Why aren’t other search engines open source?

Pretty much every other search engine treats their inner workings as a closely guarded secret. This is to stop the spammers figuring out how to game the system and increase the ranking of their results to earn a greater share of the advertising revenue. However, this isn’t a concern for searchmysite.net, because its operating model is designed to both keep spam out and to remove the financial incentive for spam in the first place:

Continue reading at the publisher's website.


Posted: Sat, 05 Dec 2020 00:00:00 +0000

Introduction to relevancy tuning for searchmysite.net

This post contains details of the most recent round of relevancy tuning for searchmysite.net. I’ve decided to dedicate a whole post to the subject, given how important but under-appreciated the topic is.

It is surprisingly difficult to find much good information about relevancy tuning on the internet, unless it is hidden away on some hard-to-find personal websites somewhere. For most of the big sites the scoring algorithm is opaque, perhaps to try to retain an advantage in the game of cat and mouse with the Search Engine Optimisation practitioners. This of course isn’t a concern for searchmysite.net, with its model designed to both keep spam out and remove the financial incentive for spam.

Continue reading at the publisher's website.


Posted: Sun, 29 Nov 2020 00:00:00 +0000

Introduction

Keen-eyed observers may have noticed that the Analytics section on the Privacy Policy has recently been updated. I thought it would be worth a short post with further information.

The original web analytics solution

Hopefully it goes without saying that some form of web analytics is useful even for a privacy aware site like searchmysite.net, because you really need to know how a site is being used when looking at certain issues and enhancements, scaling the infrastructure etc.

Continue reading at the publisher's website.


Posted: Sun, 22 Nov 2020 00:00:00 +0000

Introduction

In my last major post, searchmysite.net update: Seeding and scaling from 25 Sept 2020, I concluded “I’ll ease off on enhancements, and try to focus on adoption for a while”. So how has that gone? Well, I had a nice burst of activity between 16 and 19 Oct, with 215 sites submitted in 4 days, which was great and led to some really useful feedback, including the first site to use the searchmysite.net API to power their search page. However, the higher levels of usage did expose some issues, which I’ll summarise here.

Continue reading at the publisher's website.


Posted: Sat, 21 Nov 2020 00:00:00 +0000

Welcome

Welcome to the first post on searchmysite.net’s new blog. Well the first one first posted on this blog - there are actually some earlier posts which I’ve copied over from my personal site where they were first published:

- searchmysite.net: Building a simple search for non-commercial websites (18 Jul 2020)
- searchmysite.net update: Seeding and scaling (25 Sep 2020)
- Adding a simple search page to my personal website with searchmysite.net (9 Oct 2020)

I decided to split the blogs out because there’s quite a bit I want to write on both, and searchmysite.net does seem to be starting to get a life of its own now.

Continue reading at the publisher's website.


Posted: Fri, 09 Oct 2020 00:00:00 +0000

Introduction

This post shows how I added a simple search page to my personal website with searchmysite.net. You can click on it and try it out via the Search link at the top of any page on my site. Note that it is very simple.

I know I don’t really need search functionality given that my personal website only currently has a few posts, but I wanted to test the process out and demonstrate it.

Continue reading at the publisher's website.


Posted: Fri, 25 Sep 2020 00:00:00 +0000

Introduction

It has been just over 2 months since I launched https://searchmysite.net. I’ve had some good feedback from the IndieWeb community in that time, and made some key changes as a result, so thought it was time for an update. You may still want to refer to searchmysite.net: Building a simple search for non-commercial websites for the original overview.

The main changes can be summarised as seeding and scaling:

- “Pre-loading” the search index with several hundred personal websites, so the results set will be much richer from the outset.
- Allowing site submissions from non-verified site owners, so the search index can grow more quickly.
- Improving the relevancy tuning, so that it can continue returning good results from a much larger and potentially more noisy search index.
- Redesigning the Search and Browse pages, to facilitate navigation of considerably more sites.
- Upgrading the servers, to allow it to cope with the larger indexing load and index size.

I’ve also replaced references to “the non-commercial web” and “independent websites” with references to “personal websites” to try to give the site a clearer focus. I still like the idea of listing small independent websites beyond just personal websites, e.g. the independent B&B sites I mention in London to Orkney, and most of the NC500, in an electric car, but I should get the personal websites listing working really well first, and of course see if there is any interest in extending.

Continue reading at the publisher's website.


Posted: Sat, 18 Jul 2020 00:00:00 +0000

Introduction

I’ve written previously about what went wrong with the internet and how to fix it, and one of the ideas I mentioned was a new model for search. Given “talk is cheap, show me the code”, I decided to implement it. Okay, it wasn’t quite that easy, but here it is: https://searchmysite.net.

The key features are that it:

- Contains only sites submitted by verified site owners, as a form of quality control.
- Contains no adverts, and downranks results containing adverts to discourage “Search Engine Optimisation”, clickbait content etc. (note that there is a model for sustaining the service long term without having to resort to advertising).
- Features a very high degree of privacy (no persistent cookies, only one session cookie in the Add My Site and Manage My Site sections, no code downloaded from third parties, etc.)
- Has an API for site owners to e.g. inspect their data and add a search box to their own sites.
- Has filters for site owners to customise their indexing process.

To quickly recap the idea, it has been inspired by the growing interest in the noncommercial web and the reaction against the over-commercialisation of the internet and the problems that brings. On forums like Hacker News, for example, there have been a lot of comments about how hard it is to find all the fun and interesting content from personal websites and blogs nowadays and how the advertising funded search model is broken.

Continue reading at the publisher's website.



