User Configuration: File Type Tabs
Auto Collection Configuration

Each media type supported by Image Surfer Pro has an associated Auto Collection configuration. The Processing Web Pages and Directed Search sections of this manual describe how webpages are processed, but they are not written from the perspective of each media type. The topics they cover are large, and some details of how each media type is collected may need clarification. This section focuses on each media type: how and when it is collected, and why you might want specific configurations for these User Preferences.

What Is Auto Collection

As detailed in Processing Web Pages, there are three steps to processing a web page: Extraction, Assimilation, and Results Display. Here we focus on the Assimilation process.

Assimilation also consists of three steps: determining what will be assimilated, forming trees, and merging trees. Other sections of this manual cover tree formation and how trees merge. Here we detail how it is determined what data is assimilated, specifically how it is decided whether data extracted from a processed webpage is automatically assimilated.

Auto Collection is defined as the designation of specific data extracted from a webpage to be automatically processed for assimilation. In most cases you will have an idea of what type of media links you want to collect from the webpages you are surfing. If you enable the Auto Collect configurations for this type of media, the desired data will be assimilated after extraction completes. However, you may want to simply turn all Auto Collection off, in which case you will be prompted to choose what data is assimilated from the sets of extracted data.

Warning: Any data set not configured to collect automatically will be discarded if data is extracted for a data set which is configured to collect automatically!
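
The rule in the warning above can be sketched in a few lines of Python. This is only an illustration of the behavior described in this section, not Image Surfer Pro's actual code; the function name `choose_assimilation` and the dictionary shapes are assumptions made for the example.

```python
def choose_assimilation(extracted, auto_collect):
    """Decide what happens to each extracted data set (illustrative sketch).

    extracted    -- hypothetical mapping: data set name -> list of extracted URLs
    auto_collect -- hypothetical mapping: data set name -> True if Auto Collection is on
    Returns a tuple (assimilated, discarded, prompt_user).
    """
    # Data sets that are enabled for Auto Collection and actually contain data.
    assimilated = {name: urls for name, urls in extracted.items()
                   if urls and auto_collect.get(name, False)}
    if assimilated:
        # Something was collected automatically, so every other
        # non-empty data set is discarded without a prompt.
        discarded = {name: urls for name, urls in extracted.items()
                     if urls and name not in assimilated}
        return assimilated, discarded, False
    # Nothing was collected automatically: prompt the user if any data exists.
    return {}, {}, any(extracted.values())


extracted = {"direct_images": ["a.jpg"], "videos": ["b.mp4"], "frames": []}
assimilated, discarded, prompt_user = choose_assimilation(
    extracted, {"direct_images": True, "videos": False})
```

In this example the direct image links are assimilated and the video link is discarded, which is exactly the situation the warning describes.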

Image Auto Collection

It has been said that "an image is worth a thousand words," and this concept has never been more clearly realized than it is on the Internet. Nearly every webpage you visit has multiple images on it. Images are sometimes the entire focus of a webpage and sometimes they are the accents on the page. They play the role of buttons, decoration, and even advertisements. In most cases, the images you wish to collect will seem obvious to you while you ignore thousands of other images on the very same pages.

Because images are a static medium (with the possible exception of a few file formats such as GIF), images are often used as the source of a hyperlink. Web browsers also treat images in a special way: if a URL links directly to an image, most browsers will simply display the image as if it were a webpage. In many cases the entire purpose of a webpage is to provide links to individual images. In almost every case these are the images you wish to collect, because they will not be buttons, decoration, or ads.

Image Surfer Pro gives you Auto Collection configurations for images broken into two groups:
  • Direct Image Links: Hyperlinks found on the processed page which link directly to image files. These images are not themselves visible on the page.

  • Embedded Image Links: Images visible on the processed page.
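
The distinction between the two groups can be illustrated with a short Python sketch using only the standard library. This is not how Image Surfer Pro is implemented; the class name `ImageLinkSorter`, the helper `sort_image_links`, and the list of image extensions are all assumptions chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed list of image extensions; the manual does not enumerate them.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


class ImageLinkSorter(HTMLParser):
    """Sorts image URLs on a page into direct and embedded links."""

    def __init__(self):
        super().__init__()
        self.direct = []    # hyperlinks that point straight at an image file
        self.embedded = []  # images displayed on the page itself

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            # Compare the URL path only, so query strings are ignored.
            if urlparse(href).path.lower().endswith(IMAGE_EXTENSIONS):
                self.direct.append(href)
        elif tag == "img":
            src = attrs.get("src")
            if src:
                self.embedded.append(src)


def sort_image_links(html):
    sorter = ImageLinkSorter()
    sorter.feed(html)
    return sorter.direct, sorter.embedded


page = ('<a href="http://example.com/full/1.jpg">'
        '<img src="http://example.com/thumb/1.jpg"></a>'
        '<a href="about.html">About</a>')
direct, embedded = sort_image_links(page)
```

Here the thumbnail is an embedded image link, while the full-size image it points to is a direct image link. The link to `about.html` falls into neither group.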

{Automatically collect direct image links}

In almost every case you will want to enable this configuration. The original purpose of Image Surfer Pro was to collect quality links to quality images which could be shared between users. Direct image links found on a processed page are typically the focus of the page and the entire reason you were surfing the page to start with.

You may wish to turn this configuration off if you are processing pages which have mixed video and image content (such as a mixed media Free Hosted Gallery), and you want to collect only the video media.

{Automatically collect embedded image links}

In almost every case you will want to set this to Directed Search. With this setting, embedded images are automatically assimilated only when they are found on specific pages during a directed search. In other processing they appear only as an option for assimilation, and only if no other data set was automatically collected.

Unlike all the other Auto Collection configurations, this configuration affects both extraction and assimilation.

Detail of the portion of the User Configuration Images Tab dealing with the collection of embedded image URLs:

Setting | Extracted | Assimilated
Always | From every page processed | Always automatic
Directed | Pages where not many direct image or video references are found (typically directed search 3rd-level pages or any directly processed webpage that isn't an FHG) | Automatic only after a directed search
Never | Pages where not many direct image or video references are found (typically directed search 3rd-level pages or any directly processed webpage that isn't an FHG) | Never automatic

User prompt if nothing is automatically assimilated.
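
Read as a decision rule, the settings above might be sketched as follows. This is a hedged illustration rather than the program's real logic: the function name `embedded_image_behavior` is hypothetical, and the boolean `few_direct_refs` stands in for "not many direct image or video references found," a threshold this manual does not quantify.

```python
def embedded_image_behavior(setting, few_direct_refs, directed_search):
    """Return (extracted, auto_assimilated) for embedded images on one page.

    setting         -- "Always", "Directed", or "Never"
    few_direct_refs -- assumed flag: the page has few direct image/video references
    directed_search -- True if the page is being processed by a directed search
    """
    if setting not in ("Always", "Directed", "Never"):
        raise ValueError("unknown setting: " + repr(setting))

    # Extraction: every page for Always, otherwise only pages with
    # few direct image or video references.
    extracted = True if setting == "Always" else few_direct_refs

    # Assimilation: automatic for Always, automatic only during a
    # directed search for Directed, never automatic for Never.
    if setting == "Always":
        auto_assimilated = True
    elif setting == "Directed":
        auto_assimilated = extracted and directed_search
    else:
        auto_assimilated = False
    return extracted, auto_assimilated
```

For instance, `embedded_image_behavior("Directed", True, False)` reports that the embedded images are extracted but not automatically assimilated, so they would only be offered as a choice if nothing else was collected.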

{Min embedded image file Kbytes for auto collection}

If you have set {Automatically collect embedded image links} to Directed Search, you will probably want to set this slider bar configuration to Always. If, on the other hand, you have chosen to Always collect embedded images, you will probably want to increase the minimum file size based on the types of images you wish to avoid. Images used for decoration and buttons are typically small in file size. Advertisements are larger. Finding the optimal setting will take some experimentation with the types of sites you typically process for embedded images.

All embedded images assimilated are subject to comparison with this configuration, regardless of whether they are collected automatically or by choice after no data was selected for automatic collection.
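
As a rough illustration of the size check, the sketch below filters a list of embedded images by a minimum file size. The function name `filter_by_min_kbytes` is hypothetical, and the example assumes that one Kbyte is 1024 bytes and that an image exactly at the threshold is kept; the manual does not state either detail.

```python
def filter_by_min_kbytes(images, min_kbytes):
    """Keep only the images that meet the minimum file size.

    images     -- list of (url, size_in_bytes) pairs (assumed shape)
    min_kbytes -- minimum size in Kbytes; 0 keeps every image
    """
    # Assumption: 1 Kbyte = 1024 bytes, threshold is inclusive.
    min_bytes = min_kbytes * 1024
    return [url for url, size in images if size >= min_bytes]


candidates = [("button.gif", 2048), ("banner.jpg", 30720), ("photo.jpg", 204800)]
kept = filter_by_min_kbytes(candidates, 50)
```

With a 50 Kbyte minimum, only `photo.jpg` survives; the small button and the mid-sized banner are both dropped.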

Video Auto Collection

As with images, if the page references a video file, that link is probably what you wish to collect. Unlike images, if a video is present on a page it is typically the focus of the page and at least one of the reasons the page exists. Videos are rarely used as advertisements and, because they are interactive, are not used as the source of a hyperlink. Image Surfer Pro will extract any embedded or directly linked video URL.

However, not everything which appears to be a video on a webpage is in fact a video. Internet Explorer supports a multitude of image file formats and several video file formats (mp4, flv, wmv, mov, mpg, mpeg, avi, asf). The {Automatically collect video information found} configuration applies to all of these video file formats but does not apply to Shockwave Flash frames. Make sure you understand how videos are presented on webpages.

{Automatically collect video information found}

Unless you do not wish to collect video file links in your fusker collection, there is very little reason not to enable this configuration.

Frame Auto Collection

Frames are active embedded objects on webpages. They have a multitude of uses. Two very typical uses are to display either a video stream or an active advertisement. Unlike actual video or image files, they may or may not be a direct reference to a source file.

Shockwave Flash file information is extracted by Image Surfer Pro in much the same way video file information is. However, these files are not a static video file format. They include what is essentially executable code which uses the Shockwave Flash add-on to your browser as a user interface, and as such they are highly risky.

Not all frames use Shockwave Flash. The HTML tags <iframe>, <object>, and <embed> all allow for other types of active content and can easily be used to display video. It would not be reasonable for Image Surfer Pro to claim support for video content if it did not support video presented in this way, as nearly every top tier tube site, such as YouTube, presents its videos in this way. Data sources associated with these tags which are neither recognized static video files nor Shockwave Flash files will be extracted as Raw Frame data. Because this type of encoding essentially creates an open interactive pipe to the hosting server, it is highly risky.
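
The three-way split described above (static video file, Shockwave Flash, Raw Frame) can be sketched as a simple classification by file extension. This is an illustrative approximation only: the function name `classify_frame_source` is hypothetical, the video extensions are the ones listed in the Video Auto Collection section, and treating `.swf` as the marker for Shockwave Flash is an assumption about how such files are recognized.

```python
from urllib.parse import urlparse

# Video formats named in the Video Auto Collection section of this manual.
VIDEO_EXTENSIONS = (".mp4", ".flv", ".wmv", ".mov", ".mpg", ".mpeg", ".avi", ".asf")


def classify_frame_source(url):
    """Classify the source URL of an <iframe>, <object>, or <embed> tag."""
    # Look only at the path so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    if path.endswith(VIDEO_EXTENSIONS):
        return "video"
    if path.endswith(".swf"):  # assumed marker for Shockwave Flash
        return "shockwave_flash"
    # Anything else is neither a recognized video nor Flash.
    return "raw_frame"


kinds = [classify_frame_source(u) for u in (
    "http://example.com/media/clip.MP4?start=10",
    "http://example.com/player.swf",
    "https://www.youtube.com/embed/abc123",
)]
```

The last URL has no recognizable file extension, so under these assumptions it would be treated as Raw Frame data.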

{Automatically collect frame information found}

This configuration applies to both Shockwave Flash and Raw Frame data extracted from any processed webpage.

It is highly recommended that frame collection be limited to either selection from an ISP Form or use of the URL Capture Bar. We therefore recommend leaving this configuration disabled.

Page Auto Collection

There is no Auto Collection for pages. The page type is intended to be a catch-all to make sure that any web location can be collected and used, but there is no way to tell one general web link from another and say which would be of interest. Auto collection of non-media links would essentially collect every link found on a processed web page, while the vast majority of these links would be of little to no interest to the collector. Think of the favorites or bookmarks within your browser: they represent the web locations you have some interest in returning to, yet they are a tiny fraction of the web locations you come across in a normal day of surfing.

Page objects have a very real and valid use within Image Surfer Pro, but you need to designate each page you wish to have in your collection individually.



Differences in Free and Full Versions

There are no differences in the capabilities of the Free and Registered versions of Image Surfer Pro related to setting your user preferences. The primary difference between the Free and Registered versions of the software is their ability to build fusker collections. There are constraints on the Process Page button and the ISP Forms icon of the Image Surfer Pro toolbar, and on the URL Capture Bar, all of which use the configurations discussed here. The Free version of the software is primarily a viewer for the fusker collection files.