Contains the process I used to convert my blog from the proprietary Blogger platform to the Hexo FOSS static site generator #WIP
Latest commit: adnan360, ffe95d12c4, "Add info on fixing HTML artifacts", 3 years ago
WARNING: The instructions provided here are still a work in progress (incomplete and untested), so they may change in the future. If you follow this process, do so at your own risk and take backups first. Do NOT apply these instructions to any production/live site. I am not responsible for any damage.
I created this repo to explain how I converted my Blogger blog to Hexo, a more open, parseable and manageable static site generator.
This gave me a `blog-dd-mm-yyyy.xml` file.
Choose the `.xml` file you exported earlier, click Upload file and import, and continue with the on-screen instructions to complete the import.
Blogger does not have Categories or Tags. It has "Labels". So the Blogger importer in WP decided to turn labels into categories. Hexo, however, does not cope well with this strategy. It places the categories inside one another and makes a mess of the interface.
This is because Hexo by default sees multiple categories as "hierarchical categories" for some reason. For example, the front matter of posts with multiple categories is converted into something like:
categories:
- Movies
- Cars
- Food
Hexo sees them as a hierarchy (Food inside Cars inside Movies), which is weird. There is a thing called category grouping:
categories:
- [Movies, Cars, Food]
This does not help and it stays like before, putting category inside category.
So maybe it is better to convert those categories into tags. After all, categories and tags do the same job: you click on one, and a list of the posts under it is shown on a page. Tags also stay semantically in line with labels. I went this way. Maybe you'd like to keep it as is. That's fine too.
If you'd like to go my way, then convert categories into tags:
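The same idea can also be applied after migration, directly on the Markdown files Hexo works with. The following is only a minimal sketch of that alternative, not the WordPress-side steps: the function name `categoriesToTags` is made up for this example, and it assumes the front matter ends at a line containing only `---` and that the post has no `tags:` key of its own yet.

```javascript
// Sketch: rename the `categories:` key to `tags:` inside a post's front matter.
// Assumption: front matter ends at a line containing only `---`, and the post
// does not already have a `tags:` key (otherwise the two lists need merging).
function categoriesToTags(markdown) {
  const lines = markdown.split('\n');
  // Skip an optional opening `---`, then look for the closing one.
  const start = lines[0].trim() === '---' ? 1 : 0;
  const end = lines.findIndex((line, i) => i >= start && line.trim() === '---');
  if (end === -1) return markdown; // no front matter delimiter: leave as is

  for (let i = start; i < end; i++) {
    // Only touch the top-level key, never the post body.
    lines[i] = lines[i].replace(/^categories:/, 'tags:');
  }
  return lines.join('\n');
}

const post = [
  '---',
  'title: Example post',
  'categories:',
  '  - Movies',
  '  - Cars',
  '---',
  'categories: this body line is left alone',
].join('\n');

console.log(categoriesToTags(post));
```

Because Hexo tags are flat, the nesting problem does not come up once the key is renamed.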
Blogger has some issues with how it handles images. It puts something like this:
<a href="https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s1600/01.json-wallpaper-response.png" style="margin-left: 1em; margin-right: 1em;"><img data-original-height="470" data-original-width="836" src="https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s320/01.json-wallpaper-response.png" width="320" height="179" border="0"></a>
This looks seriously messed up. Plus, Blogger's long CDN links don't help either.
One thing to note is that the `<a>` has the original big image and the `<img>` has the smaller image (if you've chosen, in Blogger's WYSIWYG editor, to show a shrunk version in the post body instead of the full-size one). Notice the `.../s1600/...` on the `a href` and `s320` on the `img src`. It probably describes the dimensions.
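As a small illustration of that size segment, the sketch below swaps it for another value. The function name `withBloggerSize` is made up for this example, and it assumes the segment is always `s` followed by digits, as in the URLs above. Blogger may use other forms, which this sketch does not handle.

```javascript
// Sketch: swap the size segment (e.g. /s320/) of a Blogger image URL.
// Assumption: the segment is "s" followed by digits, as in the example above.
function withBloggerSize(url, size) {
  return url.replace(/\/s\d+\//, `/s${size}/`);
}

const small =
  'https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/' +
  'oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s320/01.json-wallpaper-response.png';

// Prints the same URL with /s1600/ in place of /s320/.
console.log(withBloggerSize(small, 1600));
```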
We don't want the images to come from Blogger, because if Blogger shuts down in the future, those images will die with it. It is better to keep our images with our blog source code.
There is a plugin called Auto Upload Images that automatically uploads external images into WordPress. If we run the plugin on this mess of code, the smaller image will be uploaded to WP, because the plugin doesn't know how Blogger inserts images. It only looks at `<img src` and imports the smaller images. So we'll have to replace the whole code above with something as simple as this:
<img src="https://1.bp.blogspot.com/-CJNSp3KUFR8/XJMEpQ2bb0I/AAAAAAAACho/oDw643NqRU47i0HUQ1H_ryPFEjFCL1NrgCLcBGAs/s1600/01.json-wallpaper-response.png" />
We have taken the full-size image URL and put it in the `<img src`. Now the plugin will see the full-size image (if we decide to use it).
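To show the shape of this replacement, here is a minimal JavaScript sketch. It is not the code of the plugin mentioned in this README (that one is a WordPress plugin); the function name `unwrapBloggerImages` and the `example.com` URLs are made up, and it only handles an `<a>` that directly wraps a single `<img>` and links to a png, jpg or gif file.

```javascript
// Sketch: replace <a href="FULL"><img src="SMALL" ...></a> with <img src="FULL" />.
// Assumption: the <a> wraps exactly one <img> and its href ends in an image
// extension. Ordinary text links are left alone.
function unwrapBloggerImages(html) {
  const pattern =
    /<a\b[^>]*\bhref="([^"]+\.(?:png|jpe?g|gif))"[^>]*>\s*<img\b[^>]*>\s*<\/a>/gi;
  return html.replace(pattern, '<img src="$1" />');
}

const messy =
  '<p>Before</p>' +
  '<a href="https://example.com/s1600/pic.png" style="margin-left: 1em;">' +
  '<img src="https://example.com/s320/pic.png" width="320" border="0"></a>' +
  '<a href="https://example.com/page.html">a normal link</a>';

console.log(unwrapBloggerImages(messy));
```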
Don't worry about putting an `<a>` around the image. Hexo will automatically show a lightbox with the bigger image when it is clicked, at least in the default theme. So we can get rid of it.
Do you want to do this for all images yourself? I think not. Luckily, I have written a plugin for this.
There is another issue. You will find that the slugs of the posts after import are different from what they were on Blogger. The WordPress Blogger Importer sets the slugs according to the post title, which may not always be the same as the slug you set on your Blogger blog post. There is a fix, which I have included in the plugin as well.
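For illustration only, the original slug can be read back from a post's Blogger URL. The sketch below is not how the plugin does it; the function name `bloggerSlug` and the blog address are made up, and it assumes the usual Blogger permalink shape `/YYYY/MM/<slug>.html`.

```javascript
// Sketch: pull the slug out of a Blogger post URL.
// Assumption: permalinks look like /YYYY/MM/<slug>.html
function bloggerSlug(postUrl) {
  const { pathname } = new URL(postUrl);
  const match = pathname.match(/^\/\d{4}\/\d{2}\/([^/]+)\.html$/);
  return match ? match[1] : null; // null when the URL has another shape
}

console.log(bloggerSlug('https://myblog.blogspot.com/2019/03/json-wallpaper.html'));
// json-wallpaper
```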
So, the plugin does 3 things:
`<!--more-->` for WordPress.
To use the import fix plugin:
Copy the `wp-blogger-import-fix` folder into `wp-content/plugins`
`120` set.
From `wp-admin`, activate the `blogger-import-fix` plugin. It will start converting code automatically.
The WordPress importer automatically imports images into `wp-content/uploads` and changes the URLs in the post body. If, for some reason, some images did not import (which happens), they will be imported by the conversion script we'll use later. If you want to upload those images into WordPress, you can check out `wp-auto-upload-images.md`.
The Hexo migration plugin will later parse this file and create posts based on it. This will give you a `yourblogname.WordPress.yyyy-mm-dd.xml` file.
npm install -g hexo-cli
hexo init hexotest
cd hexotest
npm install
`13.0.x - 13.7.x`. You will need at least 13.8.0 for it to work. v12 may also work, but that is not tested. If your distro's package manager cannot install those versions, you may try nvm. (Disclaimer: I haven't tried it myself yet.) This issue affects v2.0.0 of the package only. 1.0.0 works fine on the 13.0.x - 13.7.x versions, but has limited capabilities.
npm install hexo-migrator-wordpress --save
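To check quickly whether the installed Node.js falls in the 13.0.x - 13.7.x range mentioned above, a small script like the following can be run with `node`. It is only a sketch; the function name `isAffectedNodeVersion` is made up for this example.

```javascript
// Sketch: report whether a Node.js version is in the 13.0.x - 13.7.x range.
function isAffectedNodeVersion(version) {
  // Accept both "13.7.0" and "v13.7.0".
  const [major, minor] = version.replace(/^v/, '').split('.').map(Number);
  return major === 13 && minor <= 7;
}

// process.versions.node holds the running version without the leading "v".
console.log(isAffectedNodeVersion(process.versions.node));
```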
I set `post_asset_folder` to `true` in my `_config.yml`:
post_asset_folder: true
This will create a folder for each post, where its assets (like images) can be put. I think this is better than having one `images` folder and dumping all the images into it inconsiderately.
Then I started the migration process:
hexo migrate wordpress /path/to/yourblogname.WordPress.yyyy-mm-dd.xml
It imported posts one by one. At the end it said something like:
...
INFO 115 posts migrated.
A `hexo generate` then `hexo serve` should make your site available at http://localhost:4000
So now, we will upload the images into Hexo. I have a functioning script in the `hexo-auto-upload-images` folder in this repo. You can copy the `auto-upload-images.js` file into your `<hexo root>/scripts` folder. Make sure you have `post_asset_folder: true` set in your `_config.yml` and install a relative path plugin with `npm install hexo-asset-link --save`. Then run `hexo generate`. It will go through all the .md files, download the linked images into the folder for each post and replace the old image URLs with the paths of the downloaded images.
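The text-rewriting half of that process can be sketched as below. This is not the `auto-upload-images.js` script from this repo, just a simplified illustration: the function name `localizeImages` and the `example.com` URLs are made up, downloading is left out, and using the last path segment as the local file name is an assumption (two images with the same file name would collide).

```javascript
// Sketch: find remote images in a post's Markdown and point them at local
// file names. Only the text rewriting is shown; nothing is downloaded here.
function localizeImages(markdown) {
  const downloads = [];
  // Matches ![alt](http...) and <img ... src="http..."> style references.
  const pattern = /(!\[[^\]]*\]\(|<img\b[^>]*\bsrc=")(https?:\/\/[^\s)"]+)/g;
  const rewritten = markdown.replace(pattern, (whole, prefix, url) => {
    // Assumption: the last path segment is a usable local file name.
    const fileName = new URL(url).pathname.split('/').pop();
    downloads.push({ url, fileName });
    return prefix + fileName;
  });
  return { rewritten, downloads };
}

const body =
  '![shot](https://example.com/s1600/a.png) and ' +
  '<img src="https://example.com/b.jpg" />';

const result = localizeImages(body);
console.log(result.rewritten);
console.log(result.downloads);
```

The `downloads` list is what a real script would then fetch into the post's asset folder.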
Don't forget the comments! Fortunately, the Blogger export file had the comments as well. This let the WP importer import the comments and attach them to the posts. So the good news is that we have the comments in our own WP database and on our own servers. Now comes the task of getting them into Hexo.
I have looked into commenting solutions for Hexo. They include Disqus, barely known third-party services that could shut down without notice at any time, and even some that use GitHub issues as comments! Disqus is closed source, so I would prefer not to use it. GitHub is closed source (at least at the time of writing) and issues are not supposed to be used as comments, so no!
As a better balance between being open and being able to manage the service myself, I chose HashOver. I chose to use v2.0, despite it being in active development, because it has database support. v1 uses flat files and requires file permissions to be set to 0777, which I'm not a fan of.
(`themes/landscape/layout/_partial/head.ejs`):
<% if (page.path !== 'index.html'){ %>
<link rel="stylesheet" type="text/css" href="http://domain.com/hashover/themes/default/comments.css">
<% } %>
(`themes/landscape/layout/_partial/article.ejs`):
<% if (!index && post.comments){ %>
<div id="hashover"></div>
<% } %>
(`themes/landscape/layout/_partial/footer.ejs`):
<% if (page.path !== 'index.html'){ %>
<script type="text/javascript" src="http://domain.com/hashover/comments.php"></script>
<% } %>
`hashover/config/settings.json` to something like this below.
If the file doesn't exist, log in to `yourdomain.tld/hashover/admin` with the admin credentials you entered in `backend/classes/secrets.php`, go to Settings and hit Save (no need to change anything).
I had my Hexo site running at `http://localhost:4000`. So, I removed `http://` from the URL and put it in the `allowed-domains` value. I also added `gitlab.io` since I plan to deploy my site on GitLab Pages. (Although I would remove the `localhost:4000` entry from a real production install.) Also, as part of the transition, I want to show HashOver comments on my archived Blogger blog, so I added it as well. I left out the `.com` so that it also matches country-specific domains such as `xyz.blogspot.cn`. Something like this:
...
"minifies-javascript": false,
"minify-level": 1,
"allowed-domains": ["localhost:4000", "mysite.gitlab.io", "myblogsubdomain.blogspot"]
}
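The "remove `http://` from the URL" step can be written as a tiny helper. This is only a sketch: the function name `toAllowedDomain` is made up, and it says nothing about how HashOver itself matches the entries. The Blogger entry without its `.com` still has to be shortened by hand.

```javascript
// Sketch: turn a site URL into an "allowed-domains" style entry by dropping
// the scheme (http:// or https://) and any path, keeping host and port.
function toAllowedDomain(siteUrl) {
  return new URL(siteUrl).host;
}

console.log(toAllowedDomain('http://localhost:4000'));     // localhost:4000
console.log(toAllowedDomain('https://mysite.gitlab.io/')); // mysite.gitlab.io
```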
`data-format` to `sql` in the file.
If you ever see that the tables are not being created, you can use the SQL query available in `hashover.sql` in this repo.
Generally, posting a test comment should create the tables, and running the above SQL should not be necessary.
But we'll also have to import all the comments from WP. We have a PHP script called wp2hashover.
Clone the repo or download the project. Copy `config.example.php` to `config.php` in wp2hashover and change your settings. All the settings are self-explanatory. For the "thread_syntax" values, use your `permalink` value from `_config.yml` and (a) replace the `/`s with `-` and (b) delete the last `/` from the end.
For example, if you have this in `_config.yml`:
permalink: :year/:month/:day/:title/
then use this in `config.php`:
$hashover_thread_syntax_posts=':year-:month-:day-:title';
$hashover_thread_syntax_pages=':title';
Also set these to accurately reflect the URL in the `page-info` table:
$hashover_url_syntax_posts=':year/:month/:day/:title';
$hashover_url_syntax_pages=':title';
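The two transformations above are simple enough to sketch in a few lines of JavaScript, in case the permalink setting differs from the example. The function names `toUrlSyntax` and `toThreadSyntax` are made up for this illustration; the results still have to be pasted into `config.php` by hand.

```javascript
// Sketch: derive the wp2hashover syntax values from the Hexo permalink setting.
function toUrlSyntax(permalink) {
  return permalink.replace(/\/$/, ''); // (b) delete the last "/" from the end
}

function toThreadSyntax(permalink) {
  return toUrlSyntax(permalink).replace(/\//g, '-'); // (a) replace "/" with "-"
}

const permalink = ':year/:month/:day/:title/';
console.log(toThreadSyntax(permalink)); // :year-:month-:day-:title
console.log(toUrlSyntax(permalink));    // :year/:month/:day/:title
```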
Now put it somewhere on your PHP-enabled server and open `wp2hashover.php` in a browser. It should automatically import all the comments from your WordPress DB into the HashOver DB. Look at the output on the page. If it shows an error, either try to find what may be causing the issue or open an issue on the wp2hashover project.
I had some `rel="nofollow ugc"` showing up on comments with URLs. So I ran this SQL query in phpMyAdmin:
UPDATE comments SET body = REPLACE(body, ' rel="nofollow ugc"', '') WHERE INSTR(body, ' rel="nofollow ugc"') > 0;
It didn't completely fix it though. This might be due to some uncommon formatting/parsing issue in HashOver.