# DataExtracter Help
----------------------------

DataExtracter helps you quickly extract data from any web page.

All you need to do is:

- Find the selectors (jQuery selectors) for the target data.
- Type scripts in the console of the `extension background page`, as introduced below.

## Quick Start

Extract the current page

```js
new Extractor().task(".list-item", ["a.title", "p.content"]).start();
```

Extract multiple pages (1-10, interval 1)

```js
new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", 1, 10, 1).start();
```

Extract multiple URLs (list)

```js
new Extractor().task(".list-item", ["a.title", "p.content"], ["http://sample.com/abc", "http://sample.com/xyz"]).start();
```

Extract specified pages (1, 3, 5)

```js
new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start();
```
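To make the page-range and page-list forms concrete, here is a minimal sketch of how a URL template such as `http://sample.com/?pn=${page}` could be expanded into a list of URLs. `fillTemplate`, `expandRange` and `expandPages` are hypothetical helpers written for illustration, not part of DataExtracter. The sketch assumes that `${page}` is substituted literally and that the upper bound is inclusive, as "1-10" above suggests.

```js
// Hypothetical helpers, not DataExtracter's API: they only illustrate
// how "${page}" in a URL template might be substituted.

// Replace every "${page}" placeholder with the given page number.
function fillTemplate(urlTemplate, page) {
    return urlTemplate.split("${page}").join(String(page));
}

// Expand a page range (assumption: `to` is inclusive).
function expandRange(urlTemplate, from, to, interval) {
    if (interval <= 0) throw new RangeError("interval must be positive");
    const urls = [];
    for (let page = from; page <= to; page += interval) {
        urls.push(fillTemplate(urlTemplate, page));
    }
    return urls;
}

// Expand an explicit list of pages.
function expandPages(urlTemplate, pages) {
    return pages.map((page) => fillTemplate(urlTemplate, page));
}

console.log(expandRange("http://sample.com/?pn=${page}", 1, 10, 1));
console.log(expandPages("http://sample.com/?pn=${page}", [1, 3, 5]));
```

Note that the template is written in plain quotes. Inside a backtick template literal, JavaScript itself would try to interpolate `${page}` before the string ever reaches `task()`.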

## Extractor.task() Signatures

```ts
// a task extracting data from the current page
task(itemsSelector:string, fieldSelectors:string[])
// a task extracting data from a range of pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, from:number, to:number, interval:number)
// a task extracting data from a list of pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, pages:number[])
// a task extracting data from a list of urls
task(itemsSelector:string, fieldSelectors:string[], urls:string[])
// a task extracting data from the urls extracted by the previous task
task(itemsSelector:string, fieldSelectors:string[], urls:ExtractResult)
```

## Advanced Usage

### Stop Tasks

Tasks wait for their target elements to appear, because some elements are loaded asynchronously.

But if you type wrong selectors, the task waits forever for elements that don't exist.

The only way to stop a task before it finishes is `Closing the host tab`.

### Extract Attributes

e.g.: link text and target (use 'selector@attribute')

```js
new Extractor().task('.list-item', ['a.title', 'a.title@href']).start();
```
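As an illustration of the `selector@attribute` notation, the sketch below splits a field selector into its selector part and its attribute part. `parseFieldSelector` is a hypothetical helper, not DataExtracter's own parser, and it assumes that a field without `@` refers to the element's text.

```js
// Hypothetical helper, not DataExtracter's API: splits a field selector
// written as 'selector@attribute' into its two parts.
function parseFieldSelector(fieldSelector) {
    const at = fieldSelector.lastIndexOf("@");
    // No "@": assume the field is the element's text.
    if (at === -1) {
        return { selector: fieldSelector, attribute: null };
    }
    return {
        selector: fieldSelector.slice(0, at),
        attribute: fieldSelector.slice(at + 1),
    };
}

console.log(parseFieldSelector("a.title"));
console.log(parseFieldSelector("a.title@href"));
```

Splitting at the last `@` is a simplification: a selector that itself contains `@` (for example inside a quoted attribute value) would need a more careful parser.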

### Use Task Chain

e.g.: Collect links from `http://sample.com/abc`, then extract data from each link

```js
new Extractor()
    .task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();
```
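The hand-off between the two tasks can be pictured as turning the first task's rows into a URL list for the second. The sketch below is only a model of that idea: the row layout (one array of strings per matched item), the de-duplication, and the resolving of relative links are all assumptions, and `urlsFromResult` is a hypothetical name. The real `ExtractResult` may be shaped differently.

```js
// Hypothetical model of a task result: one row per matched item,
// one string per field selector. The real ExtractResult may differ.
function urlsFromResult(rows, baseUrl) {
    const seen = new Set();
    const urls = [];
    for (const row of rows) {
        for (const value of row) {
            if (!value) continue; // skip empty fields
            // Resolve relative links against the page they came from.
            const url = new URL(value, baseUrl).href;
            if (!seen.has(url)) {
                seen.add(url);
                urls.push(url);
            }
        }
    }
    return urls;
}

// Sample rows as the first task might produce them with ['.item a@href'].
const firstTaskRows = [["/item/1"], ["/item/2"], ["/item/1"], [""]];
console.log(urlsFromResult(firstTaskRows, "http://sample.com/abc"));
```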

### Save Result of Any Task

For an Extractor `e` with multiple tasks (a chain):

```js
e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();
```

You will be asked to save the final result when it finishes.

In case you want to save it again, use:

```js
e.save()
```

You may want to save the result of another task rather than the final one:

```js
// save the result of the first task
// in the example above, that is a list of urls
e.save(1)
```

### Restart Tasks

If a later task fails, you don't need to restart all tasks.

Here we have 2 tasks:

```js
e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();
```

Suppose the second task fails. We can restart and continue from task 2:

```js
e.restart(2);
```

If you'd like to restart all tasks, use:

```js
e.start();
// or
e.restart();
```
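
One way to picture why `restart(2)` does not need to repeat task 1: each finished task leaves a saved result behind, and a restarted task simply reads the result of the task before it. The `ChainModel` class below is a hypothetical, synchronous model of that idea, not DataExtracter's implementation. It assumes 1-based task numbers, as in `e.save(1)` and `e.restart(2)`.

```js
// Hypothetical, synchronous model of a task chain. It is not
// DataExtracter's implementation; it only illustrates why restarting
// from task n can reuse the saved result of task n - 1.
class ChainModel {
    constructor(tasks) {
        this.tasks = tasks;   // functions: (previousResult) => result
        this.results = [];    // results[i] belongs to task i + 1
    }

    // Run from the given 1-based task number to the end of the chain.
    restart(from = 1) {
        if (from < 1 || from > this.tasks.length) {
            throw new RangeError("no such task: " + from);
        }
        if (from > 1 && this.results.length < from - 1) {
            throw new Error("task " + (from - 1) + " has no saved result");
        }
        // Drop the results of the tasks that are about to run again.
        this.results.length = from - 1;
        for (let i = from - 1; i < this.tasks.length; i++) {
            const previous = i === 0 ? undefined : this.results[i - 1];
            this.results.push(this.tasks[i](previous));
        }
        return this.results[this.results.length - 1];
    }

    start() {
        return this.restart(1);
    }
}

let firstTaskRuns = 0;
const chain = new ChainModel([
    () => { firstTaskRuns++; return ["http://sample.com/1", "http://sample.com/2"]; },
    (urls) => urls.map((url) => url + "#extracted"),
]);
chain.start();
chain.restart(2); // only the second task runs again
console.log(firstTaskRuns, chain.results);
```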