Global News Publishers Actor

Ultimate News Scraper - Rise of the Phoenix

News / Media

What it does

Extract real-time and historical article data from a large catalog of global news publishers with category targeting and clean JSON output.

Best for

News monitoring
Market and competitor intelligence
SEO and content research
Publisher coverage tracking

Fields

Site name
Country
Region
Language
Article title
Author
Article body

Inputs

Websites to scrape
Categories to scrape
Execution mode
Historic cutoff date
Max items per site

README

Ultimate News Scraper - Rise of the Phoenix technical notes

Ultimate News Scraper - Rise of the Phoenix is an Apify Actor for real-time and historical article extraction across 800+ global publishers. It supports category targeting, proxy configuration, and fallback crawling using Scrapling, PyDoll, and Selenium. The workflow is useful when a team needs structured news article data for monitoring, research, market intelligence, SEO analysis, or internal reporting.

Use Cases

News monitoring
Market and competitor intelligence
SEO and content research
Publisher coverage tracking
Historical article collection

Data Fields

Site name
Country
Region
Language
Article title
Author
Article body
Tags
Published date
Article URL
URL hash
Main image URL
SEO description
Scraped at timestamp
Scraping tool
Execution mode
Category URL
Source HTML language
Cutoff filtered flag
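
Because some fields may be missing for specific publishers, it can help to normalize every record to the same set of keys before loading it anywhere. The sketch below assumes snake_case key names that mirror the field list above; those names are hypothetical and should be checked against the Actor's actual output schema.

```python
from typing import Any, Dict

# Hypothetical key names, one per field in the list above.
# The Actor's real output keys may differ.
FIELD_NAMES = (
    "site_name", "country", "region", "language",
    "title", "author", "body", "tags",
    "published_date", "url", "url_hash", "main_image_url",
    "seo_description", "scraped_at", "scraping_tool",
    "execution_mode", "category_url", "source_html_language",
    "cutoff_filtered",
)


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a record with every expected key present.

    Missing fields become None and unexpected keys are dropped,
    so downstream tables always see the same columns.
    """
    return {name: raw.get(name) for name in FIELD_NAMES}
```

A record that only carries a title would come back with the title filled in and the other 18 keys set to None.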

Inputs

Websites to scrape
Categories to scrape
Execution mode
Historic cutoff date
Max items per site
No article limit
Proxy configuration
Manual site category filters
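
The inputs above could be assembled into a run input roughly as follows. This is a minimal sketch: every key name (websites, executionMode, and so on), the default of 50 items, and the shape of the manual filters are assumptions for illustration, not the Actor's confirmed input schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_run_input(
    websites: List[str],
    categories: List[str],
    execution_mode: str = "current",
    historic_cutoff_date: Optional[str] = None,
    max_items_per_site: int = 50,
    no_article_limit: bool = False,
    proxy_configuration: Optional[Dict[str, Any]] = None,
    manual_site_category_filters: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Build a run input dict. All key names here are hypothetical."""
    run_input: Dict[str, Any] = {
        # An empty list is meant to select the broader active catalog.
        "websites": list(websites),
        "categories": list(categories),
        "executionMode": execution_mode,
        "maxItemsPerSite": max_items_per_site,
        "noArticleLimit": no_article_limit,
    }
    # Optional inputs are left out entirely when not provided.
    if historic_cutoff_date is not None:
        run_input["historicCutoffDate"] = historic_cutoff_date
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    if manual_site_category_filters is not None:
        run_input["manualSiteCategoryFilters"] = manual_site_category_filters
    return run_input


if __name__ == "__main__":
    example = build_run_input(
        websites=["example-news.com"],  # placeholder publisher
        categories=["business"],
        execution_mode="historic",
        historic_cutoff_date="2024-01-01",
    )
    print(json.dumps(example, indent=2))
```

Keeping the input in one function makes it easier to review a run configuration before it is submitted.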

Workflow

Selected news publishers or categories
Actor run with current or historic mode
Clean article JSON dataset
Delivery to spreadsheet, database, API, or alerts
Monitoring report or intelligence workflow

Delivery

CSV
Excel
Google Sheets
API
Database
Airtable
Notion
Slack
CRM
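
As one illustration of the simplest delivery target, the sketch below writes a spreadsheet-friendly CSV summary using Python's standard csv module. The column names are hypothetical, and the article body is deliberately left out so long text does not end up in spreadsheet cells.

```python
import csv
from typing import Any, Dict, Iterable, Sequence, TextIO

# Hypothetical summary columns; adjust to the real output keys.
SUMMARY_COLUMNS = ("site_name", "title", "author", "published_date", "url", "tags")


def write_summary_csv(
    records: Iterable[Dict[str, Any]],
    handle: TextIO,
    columns: Sequence[str] = SUMMARY_COLUMNS,
) -> int:
    """Write one CSV row per record and return the number of rows written."""
    writer = csv.DictWriter(handle, fieldnames=list(columns))
    writer.writeheader()
    count = 0
    for record in records:
        row = {}
        for column in columns:
            value = record.get(column)
            if value is None:
                value = ""  # missing fields become empty cells
            elif isinstance(value, (list, tuple)):
                value = "; ".join(str(item) for item in value)  # e.g. tags
            row[column] = value
        writer.writerow(row)
        count += 1
    return count
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, for example open("articles.csv", "w", newline="", encoding="utf-8").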

Limitations

Coverage depends on the Actor's active publisher catalog and each publisher's site structure.
Some article fields may be unavailable on specific publishers.
Historic collection depends on available article archives and the configured cutoff date.
Proxy and fallback crawling can improve coverage but do not guarantee every target article will be available.
The workflow is reviewed before setup.

Setup Notes

Choose specific publishers, categories, or the broader active catalog depending on the monitoring goal.
Use current mode for fresh coverage and historic mode with a cutoff date for archive collection.
Configure proxy and fallback options when publisher coverage or reliability needs extra review.
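
A small pre-flight check can catch a mismatched mode and cutoff date before a run starts. The sketch below assumes the mode values are the strings "current" and "historic", that historic runs always need a cutoff, and that the cutoff is an ISO date (YYYY-MM-DD); none of these details are confirmed by the Actor's documentation.

```python
from datetime import date
from typing import Optional


def check_mode(
    execution_mode: str,
    cutoff: Optional[str],
    today: Optional[date] = None,
) -> Optional[date]:
    """Validate the mode/cutoff pair and return the parsed cutoff, if any."""
    if execution_mode not in ("current", "historic"):  # assumed mode names
        raise ValueError(f"unknown execution mode: {execution_mode!r}")
    if execution_mode == "current":
        return None  # the cutoff is not used for fresh coverage
    if not cutoff:
        raise ValueError("historic mode needs a cutoff date")
    parsed = date.fromisoformat(cutoff)  # raises ValueError on bad input
    if parsed > (today or date.today()):
        raise ValueError("cutoff date is in the future")
    return parsed
```

For example, check_mode("historic", "2024-01-15") returns date(2024, 1, 15), while check_mode("historic", None) raises ValueError.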

Output Handling

Keep source metadata, article title, author, body, publication date, URL, category URL, and scraping mode together.
Use URL hash and article URL for deduplication across current and historic runs.
Route long article bodies to storage that can handle full text before sending summaries to spreadsheets or alerts.
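
Deduplication across current and historic runs could look like the sketch below. It treats two records as the same article if they share either a URL hash or an article URL; the key names url_hash and url are hypothetical stand-ins for the corresponding output fields.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def dedupe_articles(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first record seen for each URL hash or article URL."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Dict[str, Any]] = []
    for record in records:
        keys = set()
        if record.get("url_hash"):
            keys.add(("hash", record["url_hash"]))
        if record.get("url"):
            keys.add(("url", record["url"]))
        if keys & seen:
            continue  # already kept an article with this hash or URL
        seen |= keys
        # Records with neither key cannot be matched, so they are kept.
        unique.append(record)
    return unique
```

Passing the current-run records first means the freshest copy of an article wins when the same URL also shows up in a historic run.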

Quality Checks

Review publisher coverage before relying on the workflow for monitoring.
Check that cutoff filtering behaves as expected in historic mode.
Sample articles across multiple publishers to confirm body text, dates, and metadata quality.
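
The checks above can be partly automated. The sketch below computes, per publisher, the share of records with a non-empty value for a few fields, and lists records published before a given cutoff so they can be reviewed by hand. Key names are hypothetical, and published dates are assumed to start with an ISO date (YYYY-MM-DD).

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, Iterable, List, Sequence


def fill_rates(
    records: Iterable[Dict[str, Any]],
    fields: Sequence[str] = ("body", "published_date", "author"),
) -> Dict[str, Dict[str, float]]:
    """Per-site fraction of records with a non-empty value for each field."""
    totals: Dict[str, int] = defaultdict(int)
    filled: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for record in records:
        site = record.get("site_name") or "unknown"
        totals[site] += 1
        for field in fields:
            if record.get(field):
                filled[site][field] += 1
    return {
        site: {field: filled[site][field] / totals[site] for field in fields}
        for site in totals
    }


def published_before(records: Iterable[Dict[str, Any]], cutoff: date) -> List[Dict[str, Any]]:
    """Records whose published date is earlier than the cutoff.

    Records with a missing or unparseable date are skipped here and
    should be sampled separately.
    """
    early = []
    for record in records:
        try:
            published = date.fromisoformat(str(record.get("published_date"))[:10])
        except ValueError:
            continue
        if published < cutoff:
            early.append(record)
    return early
```

A low fill rate for body or published_date on one publisher is a signal to sample that publisher's articles manually before relying on it for monitoring.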

FAQ

Can this Actor collect both current and historic news?

Yes. The Actor supports current and historic execution modes, with a historic cutoff date available for older article collection.

Can I choose specific publishers?

Yes. The input schema supports selecting one or more websites from the active catalog, or leaving the selection empty to use the broader active catalog.

Can I target specific categories?

Yes. The Actor supports category targeting and manual site category filters for more controlled collection.

What kind of output does it produce?

The output schema includes source metadata, article title, author, body text, tags, publication date, URL fields, image URL, SEO description, scrape timestamp, scraping tool, execution mode, category URL, and cutoff filtering status.

Can The Scrape Lab deliver the data to my tools?

Yes. The Scrape Lab can clean and route the Actor output to CSV, Excel, Google Sheets, APIs, databases, Airtable, Notion, Slack, CRMs, or reporting workflows.

Is every publisher guaranteed to work?

No. Availability depends on publisher structure, public data access, archive availability, and technical complexity. Every workflow is reviewed before setup.

