Contains the process I used to convert my blog from the proprietary Blogger platform to the FOSS static site generator Hexo. #WIP

Latest commit: ffe95d12c4 by adnan360, "Add info on fixing HTML artifacts" (3 years ago)


Blogger to Hexo conversion

WARNING: The instructions here are still a work in progress (incomplete and untested), so they may change in the future. If you follow this process, you do so at your own risk, and you should take backups first. Do NOT apply these instructions to any production/live site. I am not responsible for any damage.

I created this repo to explain how I converted my Blogger blog to Hexo, a more open, parseable and manageable static site generator.

Step 1: Export data from Blogger

  • First, I logged in to the Blogger dashboard and opened my blog.
  • Navigated to Settings -> Other
  • Under Import & backup, clicked Backup content then Save to computer.

This gave me a blog-dd-mm-yyyy.xml file. Ref
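Before importing, it can help to know how many posts and comments the export holds, so the numbers can be compared with what WordPress and Hexo report later. A minimal Python sketch, assuming the export is an Atom feed in which each entry carries a category with scheme http://schemas.google.com/g/2005#kind and a term ending in #post, #comment and so on (check your own file); count_blogger_entries is a hypothetical helper, not part of any tool used here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed scheme Blogger uses to mark what kind of entry (post, comment, ...) it is.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"


def count_blogger_entries(xml_data):
    """Count entries in a Blogger export, keyed by kind ('post', 'comment', ...).

    xml_data may be str or bytes; reading the file in binary mode lets the
    parser honour the encoding declared in the file.
    """
    root = ET.fromstring(xml_data)
    counts = Counter()
    for entry in root.iter(ATOM + "entry"):
        for category in entry.findall(ATOM + "category"):
            if category.get("scheme") == KIND_SCHEME:
                # term looks like ".../blogger/2008/kind#post"; keep the part after '#'
                counts[category.get("term", "").rsplit("#", 1)[-1]] += 1
    return counts
```

For example, `count_blogger_entries(open("blog-dd-mm-yyyy.xml", "rb").read())["post"]`. The post count may include drafts, so it need not match the number of migrated posts exactly.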

Step 2: Import into WordPress

  • Set up a WordPress installation as usual (a blank, up-to-date install is recommended)
  • Trash the sample pages and posts that WordPress creates, then delete them permanently from Trash (otherwise they will be included in the export we do later)
  • From WP Admin, go to Tools -> Import
  • Click Install Now under Blogger importer
  • When it completes, click Run Importer under Blogger importer
  • Choose the .xml file you exported earlier, click Upload file and import, and follow the on-screen instructions to complete the import

Step 3: Change Categories to tags

Blogger does not have categories or tags; it has "labels". The Blogger importer in WP therefore turns labels into categories. Hexo, however, does not handle this well: it nests the categories inside one another and makes a mess of the interface.

This is because Hexo, by default, treats a list of categories as a hierarchy. For example, the front matter of a post with multiple categories is converted into something like:

categories:
  - Movies
  - Cars
  - Food

Hexo sees this as a hierarchy, Food inside Cars inside Movies, which is weird. There is also a syntax for grouping categories:

categories:
  - [Movies, Cars, Food]

This does not help either; the result stays the same, with each category nested inside the previous one.
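To make the nesting concrete, here is a simplified Python model of how Hexo reads the categories front matter. It is not Hexo's code; it assumes the behaviour described in the Hexo docs: a flat list is one nested path, written parent first, and when the list contains nested lists, each element is a separate path. hexo_category_paths is a hypothetical name.

```python
def hexo_category_paths(categories):
    """Simplified model of how Hexo turns `categories` front matter into paths.

    Assumption (from the Hexo docs): a flat list is ONE hierarchy, parent first;
    if any element is itself a list, every element is a separate hierarchy.
    """
    if any(isinstance(item, list) for item in categories):
        groups = [item if isinstance(item, list) else [item] for item in categories]
    else:
        groups = [categories]
    # Render each hierarchy as "parent > child > grandchild".
    return [" > ".join(group) for group in groups]


flat = ["Movies", "Cars", "Food"]        # the first example above
grouped = [["Movies", "Cars", "Food"]]   # the grouping example above
print(hexo_category_paths(flat))     # ['Movies > Cars > Food']
print(hexo_category_paths(grouped))  # ['Movies > Cars > Food']
```

Both forms from the examples above produce the same single nested path, which is why grouping them in one bracket does not change anything.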

So it may be better to convert those categories into tags. After all, categories and tags do the same job: you click one and get a page listing the posts under it. Tags also stay semantically closer to Blogger's labels. I went this way, but if you'd rather keep the categories as they are, that's fine too.

If you'd like to do the same, convert the categories into tags:

  • From WP Admin, go to Tools -> Import
  • Click Install Now under Categories and Tags Converter
  • When it completes, click Run Importer under Categories and Tags Converter
  • Click Categories to Tags, click Check All and run the conversion.

Step 3b: Import external images into WordPress

Blogger handles images awkwardly. It inserts markup like this:

<a href="https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s1600/01.json-wallpaper-response.png" style="margin-left: 1em; margin-right: 1em;"><img data-original-height="470" data-original-width="836" src="https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s320/01.json-wallpaper-response.png" width="320" height="179" border="0"></a>

This is a mess, and Blogger's long CDN links don't help either.

Note that the <a> links to the original, full-size image, while the <img> shows a smaller one (if you chose to show a shrunk version in the post body from Blogger's WYSIWYG editor instead of the full-size one). Notice the .../s1600/... in the a href and the s320 in the img src; these segments probably describe the image dimensions.
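If the size segment really does encode the image size (the note above is only a guess), upgrading a thumbnail URL to the full-size one is a plain string substitution. A Python sketch under that assumption; it only rewrites a /s&lt;digits&gt;/ segment that sits directly before the file name and leaves any other URL shape untouched. full_size_url is a hypothetical name.

```python
import re

# A size segment such as /s320/ or /s1600/ directly before the file name
# (the lookahead requires that no further "/" follows).
SIZE_SEGMENT = re.compile(r"/s\d+/(?=[^/]+$)")


def full_size_url(url, size="s1600"):
    """Swap the size segment of a Blogger image URL for `size` (assumed meaning)."""
    return SIZE_SEGMENT.sub("/" + size + "/", url, count=1)
```

Applied to the img src in the example above, this returns the same URL as the a href.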

We don't want the images to be served from Blogger: if Blogger shuts down in the future, the images will go with it. It is better to keep our images together with our blog's source code.

There is a plugin called Auto Upload Images that automatically uploads external images into WordPress. If we run it on this markup as is, the smaller image will be uploaded to WP, because the plugin doesn't know how Blogger inserts images: it only looks at the <img src and imports the smaller image. So we have to replace the whole snippet above with something as simple as this:

<img src="https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s1600/01.json-wallpaper-response.png" />

Here we have taken the full-size image URL and put it in the <img src, so the plugin will pick up the full-size image (if we decide to use the plugin).

There is no need to wrap the image in an <a>: Hexo automatically shows the bigger image in a lightbox when it is clicked, at least in the default theme. So we can get rid of it.
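The replacement described above can be sketched as a regular expression. This is a Python illustration, not the PHP plugin in this repo; it assumes Blogger always writes an <a href="FULL"> directly around a single <img>, and it only touches links to bp.blogspot.com or googleusercontent.com hosts (an assumption) so that images linking to ordinary pages keep their anchors. simplify_blogger_images is a hypothetical name.

```python
import re

# <a href="FULL" ...><img ... src="SMALL" ...></a>, limited to Blogger-hosted
# links (assumed hosts) so anchors around images that point elsewhere are kept.
BLOGGER_LINKED_IMG = re.compile(
    r'<a\s[^>]*href="(?P<full>https?://[^"]*(?:bp\.blogspot\.com|googleusercontent\.com)/[^"]+)"[^>]*>'
    r'\s*<img\s[^>]*src="[^"]+"[^>]*>\s*</a>',
    re.IGNORECASE,
)


def simplify_blogger_images(html):
    """Replace each linked thumbnail with a bare <img> of the linked full-size image."""
    return BLOGGER_LINKED_IMG.sub(lambda m: '<img src="' + m.group("full") + '" />', html)
```

Run on the Blogger snippet above, it should produce exactly the simplified <img> line shown earlier.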

Doing this by hand for every image is not practical. Luckily, I have written a plugin for it.

There is another issue. After the import, the post slugs differ from those on Blogger: the WordPress Blogger importer derives slugs from the post title, which is not always the same as the slug you set on your Blogger post. I have included a fix for this in the plugin as well.

So, the plugin does 3 things:

  1. replaces the image HTML with a plain <img> pointing to the highest-resolution version (explained above);
  2. updates each slug to match the one on Blogger;
  3. replaces Blogger's read more anchor with WordPress's <!--more--> tag.
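Items 2 and 3 boil down to small string transformations. Here is a rough Python sketch of both; it is not the plugin's code (that is PHP in wp-blogger-import-fix). It assumes the original Blogger URL of each post is still available (for example, kept by the importer alongside the post) and that Blogger marks the jump break with an empty <a name="more"></a> anchor; blogger_slug and replace_read_more are hypothetical names.

```python
import posixpath
import re
from urllib.parse import urlparse

# Blogger's jump break; assumed to be an empty named anchor <a name="more"></a>.
READ_MORE = re.compile(r"""<a\s+name\s*=\s*["']more["']\s*>\s*</a>""", re.IGNORECASE)


def blogger_slug(permalink):
    """Slug from a Blogger post URL: '.../2019/03/my-post.html' -> 'my-post'."""
    name = posixpath.basename(urlparse(permalink).path)  # query string is ignored
    return name[: -len(".html")] if name.endswith(".html") else name


def replace_read_more(html):
    """Turn the first Blogger jump break into WordPress's <!--more--> tag."""
    return READ_MORE.sub("<!--more-->", html, count=1)
```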

To use the import fix plugin:

  • Copy the wp-blogger-import-fix folder into wp-content/plugins
  • Make sure you have increased max_execution_time in your php.ini, then restart the Apache service. You may have to guess how much you need based on how many posts you have; I set it to 120.
  • From wp-admin, activate the blogger-import-fix plugin. It will start converting the posts automatically.

The WordPress importer automatically imports images into wp-content/uploads and updates the URLs in the post body. If some images fail to import (which happens), they will be picked up by the conversion script we use later. If you'd rather upload those images into WordPress, see wp-auto-upload-images.md.

Step 4: Convert into Hexo

  • Export your WordPress blog with Tools -> Export
  • Choose All content, then click Download Export File.

This will give you a yourblogname.WordPress.yyyy-mm-dd.xml file. The Hexo migration plugin will later parse this file and create posts from it.
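Because trashed sample content can end up in this export, it may be worth listing what the file actually contains before migrating. A Python sketch, assuming the usual WordPress export layout (an RSS file whose <item> elements carry <title>, <wp:post_type> and <wp:status>); it matches tags by their local name so it does not depend on the version in the wp namespace URL. wxr_items is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """'{http://wordpress.org/export/1.2/}post_type' -> 'post_type'."""
    return tag.rsplit("}", 1)[-1]


def wxr_items(xml_data):
    """Return (post_type, status, title) for every <item> in a WordPress export."""
    root = ET.fromstring(xml_data)
    rows = []
    for item in root.iter("item"):
        # Direct children only; repeated tags (e.g. several postmeta) overwrite earlier ones.
        fields = {local_name(child.tag): (child.text or "").strip() for child in item}
        rows.append((fields.get("post_type"), fields.get("status"), fields.get("title")))
    return rows
```

Filtering the result for post_type "post" or "page" with a status other than "publish" is a quick way to spot leftovers such as the default sample post.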

  • Create your Hexo site as usual (if not already done):
npm install -g hexo-cli
hexo init hexotest
cd hexotest
npm install
  • Then install the migrator plugin. Before that, note that v2.0.0 of this plugin does not work on Node 13.0.x to 13.7.x; you will need at least 13.8.0. v12 may also work, but I haven't tested it. If your distro's package manager cannot install those versions, you may try nvm (disclaimer: I haven't tried it myself yet). Version 1.0.0 works fine on 13.0.x to 13.7.x, but has limited capabilities.
npm install hexo-migrator-wordpress --save
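To check the installed Node.js version against that note before running npm, something like this Python sketch could be used. It only encodes the version ranges stated above: everything from 13.8.0 up is treated as supported (how I read "at least 13.8.0"), 13.0.x to 13.7.x as unsupported, and anything else (such as v12) as untested. migrator_v2_supported is a hypothetical name.

```python
import re


def migrator_v2_supported(node_version):
    """Apply the hexo-migrator-wordpress v2.0.0 note to a `node --version` string.

    Returns True for >= 13.8.0, False for 13.0.x-13.7.x, None when untested.
    """
    match = re.match(r"v?(\d+)\.(\d+)", node_version.strip())
    if match is None:
        raise ValueError("unrecognised version: " + node_version)
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (13, 8):
        return True
    if major == 13:
        return False  # 13.0.x - 13.7.x
    return None  # e.g. v12: not covered by the note
```

The version string can come from `subprocess.run(["node", "--version"], capture_output=True, text=True).stdout`.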

I set post_asset_folder to true on my _config.yml:

post_asset_folder: true

This creates a folder for each post to hold its assets (like images). I think this is better than having one images folder and dumping all images into it indiscriminately.

Then I started the migration process: (Doc)

hexo migrate wordpress /path/to/yourblogname.WordPress.yyyy-mm-dd.xml

It imported posts one by one. At the end it said something like:

...
INFO  115 posts migrated.

Running hexo generate and then hexo serve should make your site available at http://localhost:4000.

Now we bring the images into Hexo. There is a working script in the hexo-auto-upload-images folder of this repo. Copy the auto-upload-images.js file into your <hexo root>/scripts folder, make sure post_asset_folder: true is set in your _config.yml, and install a relative path plugin with npm install hexo-asset-link --save. Then run hexo generate. It goes through all the .md files, downloads the linked images into each post's folder and replaces the old image URLs with the downloaded copies.
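The core of that step is rewriting remote image links to files in the post's asset folder. The actual script is JavaScript (auto-upload-images.js); the following is only a Python sketch of the link-rewriting part. It assumes posts live in source/_posts, that each post's asset folder is named after the post file (which is what post_asset_folder does), and that a bare file name is an acceptable link target; the real script may write links differently. It downloads nothing and instead returns the (url, destination) pairs to fetch. localize_images is a hypothetical name.

```python
import posixpath
import re
from urllib.parse import urlparse

# Markdown ![alt](http...) or HTML <img ... src="http...">; group 1 or 2 holds the URL.
REMOTE_IMAGE = re.compile(
    r'!\[[^\]]*\]\((https?://[^)\s]+)\)'
    r'|<img\s[^>]*?src="(https?://[^"]+)"'
)


def localize_images(markdown, post_name, posts_dir="source/_posts"):
    """Point remote images at files in the post's asset folder.

    Returns (new_markdown, downloads), where downloads lists (url, destination)
    pairs still to be fetched. Assumes file names are unique within a post.
    """
    downloads = []

    def to_local(match):
        url = match.group(1) or match.group(2)
        filename = posixpath.basename(urlparse(url).path)
        pair = (url, posixpath.join(posts_dir, post_name, filename))
        if pair not in downloads:
            downloads.append(pair)
        # Replace only the URL inside the matched link, keeping alt text/attributes.
        return match.group(0).replace(url, filename)

    return REMOTE_IMAGE.sub(to_local, markdown), downloads
```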

Step 5: Converting comments into Hexo

Don't forget the comments! Fortunately, the Blogger export file contained the comments as well, so the WP importer imported them and attached them to their posts. The good news is that the comments are now in our own WP database, on our own servers. Now comes the task of getting them into Hexo.

I looked into commenting solutions for Hexo. They include Disqus, barely known third-party services that could shut down without notice, and even some that use GitHub issues as comments! Disqus is closed source, so I would prefer not to use it. GitHub is closed source too (at least at the time of writing), and issues are not meant to be used as comments, so no!

As a better balance between openness and being able to manage the service myself, I chose HashOver. I went with v2.0 despite it being under active development, because it has database support. v1 uses flat files and requires file permissions to be set to 0777, which I'm not a fan of.

  • Download v2 (a.k.a. hashover-next) from: https://github.com/jacobwb/hashover-next
  • Do the setup using the docs from here. HashOver Next also supports flat-file formats such as XML and JSON, but you'll have to use MySQL, because the comment import script we run later requires it.
  • Add this to your page head; you can change the theme name if you want (I added it in themes/landscape/layout/_partial/head.ejs): <% if (page.path !== 'index.html'){ %> <link rel="stylesheet" type="text/css" href="http://domain.com/hashover/themes/default/comments.css"> <% } %>
  • Add this somewhere in the body (I added it in themes/landscape/layout/_partial/article.ejs): <% if (!index && post.comments){ %> <div id="hashover"></div> <% } %>
  • Add this to head or at the end of body (I placed it in themes/landscape/layout/_partial/footer.ejs): <% if (page.path !== 'index.html'){ %> <script type="text/javascript" src="http://domain.com/hashover/comments.php"></script> <% } %>
  • If HashOver does not show up, you may need to apply a workaround.
  • Also, the HashOver docs are not clear about this, but setting which domains may access the comments (e.g. GitLab Pages) is crucial. So you'll need to edit hashover/config/settings.json to something like the snippet below.

If the file doesn't exist, log in to yourdomain.tld/hashover/admin with the admin credentials you entered in backend/classes/secrets.php, go to Settings and click Save (no need to change anything).

I had my Hexo site running at http://localhost:4000, so I removed http:// from the URL and put it in the allowed-domains value. I also added my gitlab.io domain, since I plan to deploy the site on GitLab Pages (although I would remove the localhost:4000 entry on a real production install). Also, as part of the transition, I want to show HashOver comments on my archived Blogger blog, so I added it as well, without .com so that it also passes on country-specific domains such as xyz.blogspot.cn. Something like this:

...
	"minifies-javascript": false,
	"minify-level": 1,
	"allowed-domains": ["localhost:4000", "mysite.gitlab.io", "myblogsubdomain.blogspot"]
}
  • Also, make sure to set data-format to sql in the file.
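If you prefer to script that edit, here is a Python sketch that sets the two keys mentioned above. It assumes settings.json is a flat JSON object using the key names shown (allowed-domains, data-format) and that sql is the expected value; back the file up first. update_hashover_settings and patch_settings_file are hypothetical helpers.

```python
import json


def update_hashover_settings(settings, allowed_domains, data_format="sql"):
    """Return a copy of the settings dict with allowed domains and data format set."""
    updated = dict(settings)  # leave the caller's dict untouched
    updated["allowed-domains"] = list(allowed_domains)
    updated["data-format"] = data_format
    return updated


def patch_settings_file(path, allowed_domains):
    """Read settings.json, update it and write it back (tab-indented)."""
    with open(path, encoding="utf-8") as f:
        settings = json.load(f)
    settings = update_hashover_settings(settings, allowed_domains)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f, indent="\t")
        f.write("\n")


# Example (domains as in the snippet above):
# patch_settings_file("hashover/config/settings.json",
#                     ["localhost:4000", "mysite.gitlab.io", "myblogsubdomain.blogspot"])
```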

If the tables are ever not created, you can run the SQL query in hashover.sql in this repo.

Normally, posting a test comment creates the tables, so running that SQL should not be necessary.

But we also have to import all the comments from WP. There is a PHP script for this called wp2hashover.

Clone the repo or download the project. In the wp2hashover folder, copy config.example.php to config.php and change your settings there. The settings are self-explanatory. For the "thread_syntax" values, take your permalink value from _config.yml, then (a) delete the trailing / and (b) replace the remaining /s with -.

For example, if you have this on _config.yml:

permalink: :year/:month/:day/:title/

then use this on config.php:

$hashover_thread_syntax_posts=':year-:month-:day-:title';
$hashover_thread_syntax_pages=':title';

Also set these so the URLs in the page-info table are accurate:

$hashover_url_syntax_posts=':year/:month/:day/:title';
$hashover_url_syntax_pages=':title';
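The conversion rule above is mechanical, so it fits in a few lines. A Python sketch (hypothetical helper names) that derives both values from the Hexo permalink and, for illustration, fills in the placeholders for a made-up post dated 2019-03-21 with the slug my-post, assuming the placeholders are substituted literally. Note that a permalink with a leading / would need that stripped too.

```python
def thread_syntax(permalink):
    """':year/:month/:day/:title/' -> ':year-:month-:day-:title'"""
    return permalink.rstrip("/").replace("/", "-")


def url_syntax(permalink):
    """':year/:month/:day/:title/' -> ':year/:month/:day/:title'"""
    return permalink.rstrip("/")


def fill(syntax, **values):
    """Substitute :name placeholders literally, e.g. :year -> 2019."""
    for name, value in values.items():
        syntax = syntax.replace(":" + name, str(value))
    return syntax


permalink = ":year/:month/:day/:title/"
post = {"year": "2019", "month": "03", "day": "21", "title": "my-post"}
print(thread_syntax(permalink))                # :year-:month-:day-:title
print(fill(thread_syntax(permalink), **post))  # 2019-03-21-my-post
print(fill(url_syntax(permalink), **post))     # 2019/03/21/my-post
```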

Now put it somewhere on a PHP-enabled server and open wp2hashover.php in your browser. It should automatically import all the comments from your WordPress database into the HashOver database. Check the output on the page. If it shows an error, either try to find the cause or open an issue on the wp2hashover project.

I had rel="nofollow ugc" showing up in comments containing URLs, so I ran this SQL query in phpMyAdmin:

UPDATE comments SET body = REPLACE(body, ' rel="nofollow ugc"', '') WHERE INSTR(body, ' rel="nofollow ugc"') > 0;

It didn't completely fix it though. This might be due to some uncommon formatting/parsing issue in HashOver.
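The cause is unknown. If the leftovers differ from the exact string in the query (for example in quoting, spacing, token order or HTML escaping; this is only a guess), a more forgiving pattern applied to the comment bodies might catch more of them. A Python sketch under that assumption; it only transforms a string, so reading and writing the comments table is left out, and it should be tried on a copy of the data first. strip_rel_nofollow_ugc is a hypothetical name.

```python
import re

# rel="nofollow ugc" with either quote style or an HTML-escaped &quot;, extra
# spaces, or the tokens swapped. Assumption: leftovers only differ in such details.
REL_NOFOLLOW_UGC = re.compile(
    r"""\s+rel\s*=\s*("|'|&quot;)\s*(?:nofollow\s+ugc|ugc\s+nofollow)\s*\1""",
    re.IGNORECASE,
)


def strip_rel_nofollow_ugc(body):
    """Remove a 'nofollow ugc' rel attribute from one comment body."""
    return REL_NOFOLLOW_UGC.sub("", body)
```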
