Home
mandy edited this page 4 years ago

Mandy

Mandy is a bot which scrapes media links from various Lost Media wikis and reuploads them to the Internet Archive before they die.

To contact, email mandy@firemail.cc or create an issue on the most relevant repo. Legal matters? Read Copyright first.

Want this information in a blog post with pictures instead? It's more up to date.
My Quest to Stop Rare Media Being Lost Again

Contents:

  1. Why was this made?
  2. Which links are archived?
  3. Where can I find a list of archived links?
  4. Where is the source code?
  5. How does this work? Where can I find documentation?
  6. How do I run it?
  7. Can you archive [website] as well?
  8. Who is the angry blonde girl? Why is she there?
  9. Copyright

Why was this made?

Too many dead links.

There are currently two main Lost Media communities known to the bot's creator: the Lost Media Archive wikia and the Lost Media Wiki. Unfortunately, both sites rely heavily on off-site video hosts such as YouTube and Vimeo. YouTube is well known for deleting videos and entire accounts, which means it is not a safe place to archive media you don't want to lose.

This bot was made in an attempt to automate the preservation of the poorly-archived media that is referenced on these lost media sites.

Which links are archived?

See Wiki: Which links are archived?

Where can I find a list of archived links?

The success list in the list repo keeps track of all the links which have been archived successfully.

The other lists keep track of links which have been recognised but not successfully archived. See Wiki: Which links are archived? for reasons why this may happen.

Also note that this list will not include links which were already archived. Those can usually be found on archive.org.

Where is the source code?

This repo.

Can I fix it?

Feel free to fix or fork.

How does this work? Where can I find documentation?

Mandy is mostly made up of Python and shell scripts. A modified version of TubeUp (which uses youtube-dl and InternetArchive) is used for downloading and reuploading media. BeautifulSoup 4 is sometimes used for parsing RSS feeds. wget is sometimes used for downloading necessary feeds and wikia pages, for checking archive.org for existing copies and for fetching channel IDs from YouTube.

See Wiki: Documentation.
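
As a rough illustration of the flow described above, together with the list bookkeeping from the "Where can I find a list of archived links?" section, here is a minimal Python sketch. It is not the bot's actual code: `process_link`, the `already_archived` and `archive` callables, and the list names `success` and `failed` are all hypothetical stand-ins. In the real bot those roles are played by the wget check against archive.org and by the modified TubeUp.

```python
from typing import Callable, Dict, List


def process_link(
    url: str,
    already_archived: Callable[[str], bool],
    archive: Callable[[str], None],
    lists: Dict[str, List[str]],
) -> str:
    """Handle one recognised link and record the outcome.

    already_archived: hypothetical check for an existing archive.org copy.
    archive: hypothetical download-and-reupload step; raises on failure.
    lists: in-memory stand-in for the lists kept in the list repo.
    """
    # Links that already have a copy are skipped and not added to any list.
    if already_archived(url):
        return "skipped"
    try:
        archive(url)
    except Exception:
        # Recognised but not successfully archived.
        lists.setdefault("failed", []).append(url)
        return "failed"
    lists.setdefault("success", []).append(url)
    return "success"
```

Passing the two steps in as callables keeps the sketch independent of any particular downloader, so the bookkeeping can be tried out without touching the network.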

How do I run it?

Unless the last commit to the list repo is more than 15 days old or you have contacted me, you shouldn't (unless you replace all the parsers so that they target different websites).

This bot only needs to be run in one place at a time. Unless something has gone wrong, I am running this bot continually.

Something may happen in the future that leaves me unable to keep running this bot. In that case, see Wiki: Documentation.

Can you archive [website] as well?

If I have time, probably. (now I have a job and stuff...)

This bot relies on fetching a page of recent activity from a website and looking through it for links. For example, the Lost Media Archive wikia (like all other similar wikia sites) has an RSS feed page which lists the last 40-50 changes to the site (now the wikia Recent Changes page is used, but the RSS feed scraper can still be found in old commits). By reading this page every few minutes, the bot can find all the new links added to the site (except under certain circumstances).

If a similar 'recent activity' page can be found on the website in question, a parser should not be hard to implement.
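
To give an idea of what such a parser involves, here is a minimal sketch in Python. It is an illustration only, not one of the bot's own parsers: it assumes an RSS-style feed with `item` and `description` elements, uses the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, and the `MEDIA_LINK` pattern, `links_in_feed` and `new_links` are hypothetical names made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Set

# Hypothetical pattern: only a few YouTube and Vimeo URL shapes are matched.
MEDIA_LINK = re.compile(
    r"https?://(?:www\.)?"
    r"(?:youtube\.com/watch\?v=[\w-]+|youtu\.be/[\w-]+|vimeo\.com/\d+)"
)


def links_in_feed(feed_xml: str) -> List[str]:
    """Return the media links found in the feed's item descriptions, in order."""
    root = ET.fromstring(feed_xml)
    found: List[str] = []
    for item in root.iter("item"):
        description = item.findtext("description") or ""
        for match in MEDIA_LINK.finditer(description):
            url = match.group(0)
            if url not in found:
                found.append(url)
    return found


def new_links(feed_xml: str, seen: Set[str]) -> List[str]:
    """Return links not seen on earlier polls and remember them in `seen`."""
    fresh = [url for url in links_in_feed(feed_xml) if url not in seen]
    seen.update(fresh)
    return fresh
```

Calling `new_links` on each freshly fetched copy of the feed, with the same `seen` set every time, yields only the links added since the previous poll. Anything that scrolls off the feed between two polls is missed.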

Who is the angry blonde girl? Why is she there?

The name of this bot is taken from Mandy, one of the main characters in the cartoon The Grim Adventures of Billy & Mandy.

In the show, she becomes the boss of the Grim Reaper (a traditional personification of death). This parallels the mission of this bot, which is to prevent YouTube from killing rare media.

The character Mandy was created primarily by Maxwell Atoms. I apologise in advance to him and the show's animators for the use of their works without permission. The account avatar is cropped and modified directly from season 3, episode 1 ("Daddy's Little Spider"); certain other images are original (albeit partially traced and derivative of Atoms' one-liners). These representations do not purport to be official in any way, nor is any relevant creator likely to be impacted by or even made aware of this project.

Copyright

"This bot reuploaded content I own/etc. I would like my content taken down."

My mistake. I can help fix it.

Ask politely at mandy@firemail.cc and I will most likely respect your rightful decision and take down your work. This is far quicker and easier than going through the archive.org takedown backlog.

This way, I can also blacklist your content to prevent future reuploads.

"This bot reuploaded content I own/etc. I want to sue you personally for money."

My country is continuously listed in the USTR list of US trading partners "that do not adequately or effectively protect and enforce intellectual property rights".

See the previous header if you would like your material taken down.

"I am Maxwell Atoms. What is this?"

oh shi-