
DataExtracter Help


DataExtracter helps you quickly extract data from any web page. All you need to do is:

  • Find the selectors (jQuery selectors) for the target data
  • Call Extractor methods in the extension's background page console, as introduced below.

Where is the extension background page console?

Go to chrome://extensions/ and click the background page link of the extension.

In the window that opens, find Console and type your scripts.

Quick Start

Extract current page

new Extractor().task(".list-item", ["a.title", "p.content"]).start();

Extract multiple pages (1-10, interval 1)

new Extractor().task(".list-item", ["a.title", "p.content"],"http://sample.com/?pn=${page}", 1, 10, 1).start();

Extract multiple urls (list)

new Extractor().task(".list-item", ["a.title", "p.content"],["http://sample.com/abc","http://sample.com/xyz"]).start();

Extract specified pages (1,3,5)

new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start();
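To make the paged forms above concrete, here is a minimal sketch of how a url template and a page range or page list could turn into the urls a task visits. It is not DataExtracter's own code: expandPages and pageRange are hypothetical helpers, and treating the `to` page as inclusive is an assumption based on the "1-10" wording.

```javascript
// Hypothetical helpers for illustration only, not part of DataExtracter.

// Assumption: `to` is inclusive, matching the "1-10" wording above.
function pageRange(from, to, interval) {
    if (interval <= 0) throw new RangeError("interval must be positive");
    const pages = [];
    for (let page = from; page <= to; page += interval) pages.push(page);
    return pages;
}

// Replace every literal "${page}" placeholder with each page number.
function expandPages(urlTemplate, pages) {
    return pages.map((page) => urlTemplate.split("${page}").join(String(page)));
}

// Plain quotes, not backticks, so "${page}" stays a literal placeholder.
const template = "http://sample.com/?pn=${page}";
console.log(expandPages(template, pageRange(1, 10, 1)));
console.log(expandPages(template, [1, 3, 5]));
```

Note that the template must be written in regular quotes. Inside backticks, JavaScript would try to interpolate `${page}` itself before the string ever reaches task().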

Extractor.task() Signatures:

// a task extracting data from current page
task(itemsSelector:string, fieldSelectors:string[])
// a task extracting data from a range of pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, from:number, to:number, interval:number)
// a task extracting data from specified pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, pages:number[])
// a task extracting data from a list of urls
task(itemsSelector:string, fieldSelectors:string[], urls:string[])
// a task extracting data from urls which were extracted by the previous task
task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult)

Advanced Usage:

Stop tasks

The only way to stop tasks before they finish is to close the tab that runs them.

Extract attributes.

e.g.: link text and target (use 'selector@attribute')

new Extractor().task('.list-item', ['a.title', 'a.title@href']).start();
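As an illustration of the 'selector@attribute' convention, the sketch below splits a field string into its selector and attribute parts. parseFieldSelector is a hypothetical helper written for this example, not the extension's actual parser, and splitting at the last "@" is an assumption.

```javascript
// Hypothetical helper for illustration only, not DataExtracter's parser.
function parseFieldSelector(field) {
    // Assumption: the attribute name follows the last "@" in the field.
    const at = field.lastIndexOf("@");
    // No "@": the field refers to the element's text.
    if (at === -1) return { selector: field, attribute: null };
    return { selector: field.slice(0, at), attribute: field.slice(at + 1) };
}

console.log(parseFieldSelector("a.title"));      // text of a.title
console.log(parseFieldSelector("a.title@href")); // href attribute of a.title
```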

Use task chain.

e.g.: collect links from http://sample.com/abc, then extract data from each link

new Extractor()
    .task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();

Save result of any task

For an Extractor e with multiple tasks (a chain):

e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();

You will be asked to save the final result when it finishes.

You may also want to save the result of a task other than the final one:

// save the result of first task
// that is, a list of urls
e.save(1)

In case you want to save the final result again, use:

e.save()

Restart tasks

If a later task fails, you don't need to restart all tasks.

Here we have 2 tasks:

e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();

Suppose the second task fails; we can restart and continue from task 2:

e.restart(2);
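The idea behind restarting from a given task can be sketched with a toy model. This is not how DataExtracter is implemented: ToyChain and its fields are hypothetical, and the only things taken from this help page are the 1-based task numbers and the fact that a later task continues from the result of the task before it.

```javascript
// Toy model of a task chain; every name here is hypothetical.
class ToyChain {
    constructor(tasks) {
        this.tasks = tasks;  // each task maps the previous result to a new one
        this.results = [];   // results[i] holds the output of task i + 1
    }

    // Run from the 1-based task number `from`, reusing earlier saved results.
    restart(from = 1) {
        for (let n = from; n <= this.tasks.length; n++) {
            const input = n === 1 ? undefined : this.results[n - 2];
            this.results[n - 1] = this.tasks[n - 1](input);
        }
        return this.results[this.tasks.length - 1];
    }
}

let firstTaskRuns = 0;
const chain = new ToyChain([
    () => { firstTaskRuns++; return ["http://sample.com/1", "http://sample.com/2"]; },
    (urls) => urls.map((url) => url + "#extracted"),
]);

chain.restart();   // runs task 1, then task 2
chain.restart(2);  // reruns only task 2 with the saved result of task 1
console.log(firstTaskRuns); // 1
```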

If you'd like to restart all tasks, use:

e.start();
// or
e.restart();