A port of n0madic/twitter-scraper to Node.js. https://the-convocation.github.io/twitter-scraper/

twitter-scraper

A port of n0madic/twitter-scraper to Node.js.

Twitter's API is annoying to work with and has lots of limitations. Luckily, their frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No tokens needed. No restrictions. Extremely fast.

You can use this library to get the text of any user's Tweets trivially.

Known limitations:

  • Search operations require logging in with a real user account via scraper.login().
  • Twitter's frontend API does in fact have rate limits (#11)

Installation

This package requires Node.js v16.0.0 or greater.

NPM:

npm install @the-convocation/twitter-scraper

Yarn:

yarn add @the-convocation/twitter-scraper

TypeScript types have been bundled with the distribution.

Usage

Most use cases are exactly the same as in n0madic/twitter-scraper. Channel iterators have been translated into AsyncGenerator instances, and can be consumed with the corresponding for await (const x of y) { ... } syntax.
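
As a rough illustration of the consumption pattern, the sketch below collects the first few items from any async generator. The names `collect` and `fakeTweets` are hypothetical and not part of the library; `fakeTweets` stands in for a scraper method that returns an AsyncGenerator, so the snippet runs without network access.

```typescript
// Hypothetical stand-in for a scraper method that returns an AsyncGenerator.
async function* fakeTweets(): AsyncGenerator<{ text: string }> {
  yield { text: "first" };
  yield { text: "second" };
  yield { text: "third" };
}

// Hypothetical helper: collects at most `limit` items, then stops iterating.
async function collect<T>(
  source: AsyncIterable<T>,
  limit: number,
): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) return items;
  for await (const item of source) {
    items.push(item);
    // Breaking out of the loop also closes the underlying generator.
    if (items.length >= limit) break;
  }
  return items;
}
```

Stopping early with `break` is the usual way to avoid requesting more pages than needed when the generator fetches lazily.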

Browser usage

This package directly invokes the Twitter API, which does not have permissive CORS headers. With the default settings, requests will fail unless you disable CORS checks, which is not advised. Instead, applications must provide a CORS proxy and configure it in the Scraper options.

Proxies (and other request mutations) can be configured with the request interceptor transform:

const scraper = new Scraper({
  transform: {
    request(input: RequestInfo | URL, init?: RequestInit) {
      // The arguments here are the same as the parameters to fetch(), and
      // are kept as-is for flexibility of both the library and applications.
      if (input instanceof URL) {
        const proxy = "https://corsproxy.io/?" +
          encodeURIComponent(input.toString());
        return [proxy, init];
      } else if (typeof input === "string") {
        const proxy = "https://corsproxy.io/?" + encodeURIComponent(input);
        return [proxy, init];
      } else {
        // Request objects are not handled in this example
        throw new Error("Unexpected request input type");
      }
    },
  },
});

corsproxy.io is a public CORS proxy that works correctly with this package.

The public CORS proxy corsproxy.org does not work at the time of writing (at least not using their recommended integration on the front page).
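
The URL rewriting in the interceptor can be factored into a small helper. This is a minimal sketch, assuming the proxy takes the target URL as an encoded query string, as in the corsproxy.io example. The names `PROXY_PREFIX`, `inputToUrl`, and `proxied` are hypothetical. For Request-like inputs only the URL is carried over, so any method, headers, or body set on the Request would still need to be copied into `init` separately.

```typescript
// Assumption: the proxy accepts the target URL as an encoded query string.
const PROXY_PREFIX = "https://corsproxy.io/?";

// The three kinds of input that fetch() accepts, reduced to what is needed.
type FetchInput = string | URL | { url: string };

// Hypothetical helper: extracts the URL string from any fetch() input.
function inputToUrl(input: FetchInput): string {
  if (typeof input === "string") return input;
  if (input instanceof URL) return input.toString();
  return input.url; // Request-like objects expose the URL as `.url`
}

// Hypothetical helper: builds the proxied URL for a fetch() input.
function proxied(input: FetchInput): string {
  return PROXY_PREFIX + encodeURIComponent(inputToUrl(input));
}
```

Inside `transform.request`, the body would then reduce to `return [proxied(input), init];`.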

Next.js 13.x example:

"use client";

import { Scraper, Tweet } from "@the-convocation/twitter-scraper";
import { useEffect, useMemo, useState } from "react";

export default function Home() {
  const scraper = useMemo(
    () =>
      new Scraper({
        transform: {
          request(input: RequestInfo | URL, init?: RequestInit) {
            if (input instanceof URL) {
              const proxy = "https://corsproxy.io/?" +
                encodeURIComponent(input.toString());
              return [proxy, init];
            } else if (typeof input === "string") {
              const proxy = "https://corsproxy.io/?" +
                encodeURIComponent(input);
              return [proxy, init];
            } else {
              throw new Error("Unexpected request input type");
            }
          },
        },
      }),
    [],
  );
  const [tweet, setTweet] = useState<Tweet | null>(null);

  useEffect(() => {
    async function getTweet() {
      const latestTweet = await scraper.getLatestTweet("twitter");
      if (latestTweet) {
        setTweet(latestTweet);
      }
    }

    getTweet();
  }, [scraper]);

  return (
    <main className="flex min-h-screen flex-col items-center justify-between p-24">
      {tweet?.text}
    </main>
  );
}

Edge runtimes

This package currently uses cross-fetch as a portable fetch. Edge runtimes such as Cloudflare Workers sometimes have fetch functions that behave differently from the web standard, so you may need to override the fetch function the scraper uses. If so, a custom fetch can be provided in the options:

const scraper = new Scraper({
  fetch: fetch,
});

Note that this does not change the arguments passed to the function, or the expected return type. If the custom fetch function produces runtime errors related to incorrect types, be sure to wrap it in a shim (not currently supported directly by interceptors):

const scraper = new Scraper({
  fetch: (input, init) => {
    // Transform input and init into your function's expected types...
    return fetch(input, init)
      .then((res) => {
        // Transform res into a web-compliant response...
        return res;
      });
  },
});
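
To make the shim idea concrete, here is a minimal sketch for an imagined runtime whose fetch only accepts string URLs and resolves to a plain object. The `RawResult` shape, the `RawFetch` type, and the `shimFetch` function are all hypothetical; a real runtime will differ, so treat this as a pattern rather than a drop-in fix.

```typescript
// Hypothetical shape returned by a non-standard runtime fetch.
interface RawResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical non-standard fetch: string URLs only, plain-object result.
type RawFetch = (url: string, init?: RequestInit) => Promise<RawResult>;

// Wraps `rawFetch` so that it accepts fetch()-style arguments and resolves
// to a standard Response.
function shimFetch(rawFetch: RawFetch) {
  return async (
    input: string | URL | { url: string },
    init?: RequestInit,
  ): Promise<Response> => {
    // Normalize the input to a plain URL string.
    const url =
      typeof input === "string"
        ? input
        : input instanceof URL
          ? input.href
          : input.url;
    const raw = await rawFetch(url, init);
    // Rebuild a web-standard Response from the plain result.
    return new Response(raw.body, {
      status: raw.status,
      headers: raw.headers,
    });
  };
}
```

The wrapped function could then be passed as the `fetch` option, for example `new Scraper({ fetch: shimFetch(runtimeFetch) })`, where `runtimeFetch` is whatever the runtime provides.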

Rate limiting

The Twitter API heavily rate-limits clients, requiring that the scraper has its own rate-limit handling to behave predictably when rate-limiting occurs. By default, the scraper uses a rate-limiting strategy that waits for the current rate-limiting period to expire before resuming requests.

In some cases, this has been known to take a very long time (up to 13 minutes).

You may want to change how rate-limiting events are handled, potentially by pooling scrapers logged in to different accounts (refer to #116 for how to do this yourself). The rate-limit handling strategy can be configured by passing a custom implementation to the rateLimitStrategy option in the scraper constructor:

import {
  Scraper,
  RateLimitEvent,
  RateLimitStrategy,
} from "@the-convocation/twitter-scraper";

class CustomRateLimitStrategy implements RateLimitStrategy {
  async onRateLimit(event: RateLimitEvent): Promise<void> {
    // your own logic...
  }
}

const scraper = new Scraper({
  rateLimitStrategy: new CustomRateLimitStrategy(),
});

More information on this interface can be found on the RateLimitStrategy page in the documentation. The library provides two pre-written implementations to choose from:

  • WaitingRateLimitStrategy: The default, which waits for the limit to expire.
  • ErrorRateLimitStrategy: A strategy that throws if any rate-limit event occurs.
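
A custom strategy can sit between these two. The sketch below pauses for a fixed delay on each rate-limit event and gives up after a set number of events. It assumes the scraper retries the request once onRateLimit resolves and aborts when it throws, which is what the two built-in strategies suggest. `BoundedWaitStrategy` and the local `RateLimitStrategyLike` and `RateLimitEventLike` types are hypothetical stand-ins so the example runs without the library; in real code the class would implement the library's RateLimitStrategy instead.

```typescript
// Local stand-ins for the library's types so the sketch is self-contained.
// Assumption: the event's fields are not needed for this strategy.
type RateLimitEventLike = unknown;

interface RateLimitStrategyLike {
  onRateLimit(event: RateLimitEventLike): Promise<void>;
}

// Hypothetical strategy: wait `delayMs` on each rate-limit event, and throw
// once more than `maxEvents` events have been seen.
class BoundedWaitStrategy implements RateLimitStrategyLike {
  private seen = 0;

  constructor(
    private readonly delayMs: number,
    private readonly maxEvents: number,
  ) {}

  async onRateLimit(_event: RateLimitEventLike): Promise<void> {
    this.seen += 1;
    if (this.seen > this.maxEvents) {
      throw new Error(`Rate limited ${this.seen} times; giving up`);
    }
    // Pause before letting the caller continue.
    await new Promise<void>((resolve) => setTimeout(resolve, this.delayMs));
  }
}
```

Bounding the number of waits keeps a long-running job from stalling indefinitely while still absorbing occasional rate-limit responses.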

Contributing

Setup

This project currently requires Node 18.x for development and uses Yarn for package management. Corepack is configured for this project, so you don't need to install a particular package manager version manually.

The project supports Node 16.x at runtime, but requires Node 18.x to run its build tools.

Just run corepack enable to turn on the shims, then run yarn to install the dependencies.

Basic scripts

  • yarn build: Builds the project into the dist folder
  • yarn test: Runs the package tests (see Testing first)

Run yarn help for general yarn usage information.

Testing

This package includes unit tests for all major functionality. Given the speed at which Twitter's private API changes, failing tests are to be expected.

yarn test

Before running tests, you should configure environment variables for authentication.

TWITTER_USERNAME=    # Account username
TWITTER_PASSWORD=    # Account password
TWITTER_EMAIL=       # Account email
TWITTER_COOKIES=     # JSON-serialized array of cookies of an authenticated session
PROXY_URL=           # HTTP(s) proxy for requests (optional)
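
Since TWITTER_COOKIES must hold a JSON-serialized array, a quick sanity check before running the tests can catch a malformed value early. This is only a sketch; `parseCookiesEnv` is a hypothetical helper and not part of the package or its test suite.

```typescript
// Hypothetical helper: parses a TWITTER_COOKIES value and checks its shape.
function parseCookiesEnv(value: string | undefined): unknown[] {
  if (!value) return []; // variable unset or empty: no cookies
  // JSON.parse throws a SyntaxError if the value is not valid JSON.
  const parsed: unknown = JSON.parse(value);
  if (!Array.isArray(parsed)) {
    throw new Error("TWITTER_COOKIES must be a JSON-serialized array");
  }
  return parsed;
}
```

In Node, it could be called as `parseCookiesEnv(process.env.TWITTER_COOKIES)`.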

Commit message format

We use Conventional Commits, and enforce this with pre-commit checks. Please refer to the Git history for real examples of the commit message format.