Image Surfer Pro Toolbar

Processing Webpages
The Process Page Button

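For most users, this button is the primary way to add new files to a fusker collection. Adding a URL to a fusker collection consists of the following steps:
  • Extracting relevant data from the displayed webpage
  • Assimilating the desired content into the fusker collection
  • Displaying the results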

How Image Surfer Pro processes data from a webpage depends on what is currently displayed in the IE window. Image Surfer Pro treats the content displayed in the IE window as one of three types of webpage:
  • A directly displayed image
  • An ISP Form
  • A generic webpage

NOTE: The Process Page button on the Image Surfer Pro toolbar can also be used to initiate a Directed Search when processing an ISP Form. This section of the user's manual covers only how data is collected from the currently displayed webpage; Directed Search is covered in the ISP Form section of the user's manual.

Extracting Relevant Data

The process of extraction involves not only determining what data from the webpage might be relevant but also assigning an Image Surfer Pro data type to each URL. In most cases the assignment is straightforward (any MP4 file is a video, for example), but depending on the processing being done the assignment may become more complex.

Direct Image Display

Internet Explorer displays image files directly, just as it displays webpages, so it can sometimes be difficult to tell whether you are seeing a webpage or an image. Extraction in this case is simple: the URL of the displayed image is treated as an ISP Image.

ISP Forms

The Process Page button on the Image Surfer Pro toolbar checks the header information of the webpage to determine whether it is an Image Surfer Pro generated form. Each selection box on an ISP Form is processed from the top of the page to the bottom. Each Add box which has been checked adds a URL to a list for assimilation.

In most cases the wording of the selection box makes it clear which data type the URL will be assigned. However, the data type assigned to all generic (i.e. non-image, non-video, non-frame) links is determined by the drop down list at the top of the table where the link is displayed.

Cut out from an ISP Form which shows the add all select boxes and data type drop down menu at the top of the Image and Text tables

The setting of the drop down box at the top of the table will be used for all of the generic Add Page To Collection selection boxes within the table but will not be used if the link is known to be a video or image. If you wish to add links as different types from within the same table, select Unknown. Selecting Unknown as the data type will then prompt you individually during assimilation for each of the selected links from the table.
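The rule for the drop-down list can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Image Surfer Pro's own code; the function names and the "Page" drop-down value used in the example are assumptions.

```python
from typing import Optional

UNKNOWN = "Unknown"  # drop-down choice that defers the decision to the user


def form_link_type(known_type: Optional[str], table_default: str) -> str:
    """Return the data type for one checked Add box on an ISP Form.

    known_type    -- "Image" or "Video" when the form already knows what the
                     link is, otherwise None for a generic link.
    table_default -- current value of the drop-down list above the table
                     (hypothetical values such as "Page" or "Unknown").
    """
    if known_type is not None:
        return known_type      # the drop-down is ignored for known media
    return table_default       # generic links take the table-wide setting


def needs_prompt(data_type: str) -> bool:
    """True when the user must be asked about this link during assimilation."""
    return data_type == UNKNOWN
```

With the drop-down set to Unknown, every generic link selected in that table reports needs_prompt(...) as True, while known images and videos are unaffected.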

Generic Webpages

If the page displayed is neither an image nor an ISP Form, data extraction becomes much more complex. Image Surfer Pro breaks apart the HTML code of the webpage and generates the following lists of possibly relevant data.

Video Files
File references with the following extensions are assigned the Video data type:
.mp4 .flv .wmv .mov .mpg .mpeg .avi .asf

In addition, HTML5 <video> tags are specifically processed for any source identified as MP4. Any such source is held as a video even if the URL file extension is not recognized.

Frame Data
File references with the following extension are assigned the Frame data type:
.swf

In addition, the data source of <iframe>, <object>, and <embed> tags will be identified as a Frame source if the URL file extension is not recognized as either a video or Shockwave Flash frame.

Image Files
File references with the following extensions are assigned the Image data type:
.jpg .jpe .jpeg .gif .bmp .tif .tiff .pic .pct .pict .pcx .pxr .png

In addition, any URL found in the src attribute of an <img> tag will be held as an image even if the URL extension is not recognized.

References to image files are kept in two assimilation lists:
  • Embedded (visible) images on the page
  • Direct Images referenced by a hyperlink on the webpage

Page Reference
The URL of the processed webpage is kept as a possible Page object for assimilation.
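The extraction rules above can be approximated with Python's standard html.parser module. This is a minimal sketch under stated assumptions, not the parser Image Surfer Pro actually uses: the Extractor class and its attribute names are hypothetical, only <a href> links are treated as "file references", and an MP4 source is recognized by a type="video/mp4" attribute on a <source> tag inside <video>.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit

VIDEO_EXT = {".mp4", ".flv", ".wmv", ".mov", ".mpg", ".mpeg", ".avi", ".asf"}
FRAME_EXT = {".swf"}
IMAGE_EXT = {".jpg", ".jpe", ".jpeg", ".gif", ".bmp", ".tif", ".tiff",
             ".pic", ".pct", ".pict", ".pcx", ".pxr", ".png"}


def type_from_extension(url):
    """Return "Video", "Frame", "Image", or None for an unrecognized extension."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    if ext in VIDEO_EXT:
        return "Video"
    if ext in FRAME_EXT:
        return "Frame"
    if ext in IMAGE_EXT:
        return "Image"
    return None


class Extractor(HTMLParser):
    """Collects the video, frame, and two image lists from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.videos, self.frames = [], []
        self.embedded_images, self.direct_images = [], []
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "video":
            self._in_video = True
        elif tag == "img" and a.get("src"):
            # <img src> is always an image, whatever its extension.
            self.embedded_images.append(a["src"])
        elif tag == "a" and a.get("href"):
            kind = type_from_extension(a["href"])
            target = {"Video": self.videos, "Frame": self.frames,
                      "Image": self.direct_images}.get(kind)
            if target is not None:
                target.append(a["href"])
        elif tag in ("iframe", "embed", "object"):
            src = a.get("data") if tag == "object" else a.get("src")
            if src:
                # Anything that is not a recognized video is held as a Frame.
                is_video = type_from_extension(src) == "Video"
                (self.videos if is_video else self.frames).append(src)
        elif tag == "source" and self._in_video and a.get("src"):
            # MP4 sources count as video even without a recognized extension.
            if (a.get("type") or "").lower().startswith("video/mp4"):
                self.videos.append(a["src"])

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False
```

Feeding a page's HTML to Extractor().feed(...) fills the four lists in the order the references appear in the HTML, which is also the order later used within each assimilation group.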

Assimilating Desired Content

Not all of the extracted content is always assimilated into the fusker collection, and how the data is assimilated again depends on the source of the data. The assimilation process can be summarized by the following steps:

What Data Will Be Assimilated

All data received from the processing of a directly displayed image or an ISP Form will be assimilated because you expressly selected the data for assimilation. Selected images are not subject to the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.

When a general webpage is processed your User Preferences are used to determine what data will automatically be assimilated.

Videos
Automatically assimilated based on the {Automatically collect video information found} configuration on the Videos Tab.

Frames
Automatically assimilated based on the {Automatically collect frame information found} configuration on the Frames Tab.

Images
  • Directly Referenced Images: Images not visible on the webpage but directly referenced by hyperlinks on the webpage are automatically assimilated based on the {Automatically collect direct image links} configuration on the Images Tab.
  • Embedded (visible) Images: Images visible on the webpage will be assimilated if the {Automatically collect embedded image links} configuration on the Images Tab is set to Always and they are larger than the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.

Detail of the portion of the User Configuration Images Tab dealing with the size of Embedded Image urls

Pages
The webpage URL is never automatically assimilated.
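The selection rules for a general webpage can be summarized as a small filter. The sketch below is illustrative only: the Prefs field names, the default values, and the "Never" setting used in testing are assumptions standing in for the real User Preferences, and embedded images are represented as (url, size in KBytes) pairs.

```python
from dataclasses import dataclass


@dataclass
class Prefs:
    # Hypothetical stand-ins for the User Preferences named above.
    collect_videos: bool = True            # {Automatically collect video information found}
    collect_frames: bool = True            # {Automatically collect frame information found}
    collect_direct_images: bool = True     # {Automatically collect direct image links}
    collect_embedded_images: str = "Always"  # {Automatically collect embedded image links}
    min_embedded_kbytes: int = 20          # {Min embedded image file KBytes for auto collection}


def auto_assimilated(extracted, prefs):
    """Return only the groups (and URLs) that would be added automatically."""
    chosen = {}
    if prefs.collect_videos:
        chosen["Videos"] = list(extracted.get("Videos", []))
    if prefs.collect_frames:
        chosen["Frames"] = list(extracted.get("Frames", []))
    if prefs.collect_direct_images:
        chosen["Direct Image References"] = list(
            extracted.get("Direct Image References", []))
    if prefs.collect_embedded_images == "Always":
        chosen["Embedded Images"] = [
            url for url, kbytes in extracted.get("Embedded Images", [])
            if kbytes > prefs.min_embedded_kbytes]
    # The page URL is deliberately left out: it is never added automatically.
    return {group: urls for group, urls in chosen.items() if urls}
```

An empty result corresponds to the case where nothing is added automatically and the user is asked which sets of extracted data to add.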

Because assimilation is dependent upon your User Preferences and the information extracted from a page, it is possible to process a webpage and have no data automatically added to your fusker collection. When this happens, Image Surfer Pro will inform you what information it did extract from the webpage and let you choose which sets of data are added.

If no data was automatically added to the fusker collection, this dialog allows the user to decide what extracted data they want added to the fusker collection. You may select as many of the different sets as you like when presented with the choice. Options where no relevant data was extracted will be disabled.

If the {Automatically collect embedded image links} setting on the User Preferences Images Tab was not set to Always, it is possible that Embedded Image information was extracted but not automatically added.

Even after embedded images are selected as a group for assimilation, each of the images is validated against the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.

The original webpage option will always be available even if no other option is.


Forming Trees

Each extracted URL that is chosen for assimilation is made into a single branch ISP Tree consisting of a blank collection segment, a domain access segment, potentially several directory segments, and a file segment of the type determined during extraction.

For example, consider the file reference:
http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/sarah_michelle_gellar_1.jpg

Closeup of the fusker collection view with this single image

This image reference converted to an ISP Tree would have 7 segments under a collection segment.

The domain access segment (http:) represents the protocol used to access the reference. This segment is determined when the URL information is extracted by either the Process Page button on the Image Surfer Pro toolbar or the URL Capture Bar.

Below the domain access segment are 5 directory segments. The topmost of these segments is called the Domain or Root Directory segment. These segments indicate where on the web this file is stored. In this case the file is stored at Rex Wallpapers under the directory images/wallpapers/celebs/sarah.

The final segment is always a file segment (sarah_michelle_gellar_1.jpg). Extraction will have defined the URL as an image because the file extension (.jpg) was recognized as the JPEG image format.
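The breakdown of this URL into segments can be reproduced with Python's urllib.parse. The sketch below is only an illustration of the segment layout described above; the function name and the (kind, text) tuple representation are assumptions, and the blank collection segment at the top of the tree is not modeled.

```python
from urllib.parse import urlsplit


def url_to_segments(url, file_type="Image"):
    """Split a URL into (kind, text) segments: access, directories, file."""
    parts = urlsplit(url)
    path_parts = [p for p in parts.path.split("/") if p]
    if not path_parts:
        raise ValueError("this sketch expects a URL that ends in a file name")
    *directories, file_name = path_parts
    segments = [("Domain Access", parts.scheme + ":"),
                ("Directory", parts.netloc)]        # Domain / Root Directory
    segments += [("Directory", d) for d in directories]
    segments.append((file_type, file_name))          # always the last segment
    return segments


example = ("http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/"
           "sarah_michelle_gellar_1.jpg")
for kind, text in url_to_segments(example):
    print(f"{kind:14} {text}")
```

For the example URL this yields one domain access segment, five directory segments (the domain plus four sub-directories), and one Image file segment, for the 7 segments noted above.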

The extraction process identifies media files as simply Image, Video, or Frame, but when a tree is formed from the URL, the exact form of the final file segment will be more precise:

Extraction type, followed by the file segment types it can produce:

Image
  • Image: Any recognized image extension, as well as URLs found as an <img> "src" value.

Video
  • MP4 Video: Any URL with an MP4 extension or found as the MP4 identified source in a <video> tag.
  • Flash Video: Any URL with an FLV extension.
  • Windows Media: Any URL with an extension which would be recognized by Windows Media Player, specifically one of the following: .wmv .mov .mpg .mpeg .avi .asf

Frame
  • Shockwave Flash: Any URL with an SWF extension.
  • Raw Frame: The data source of <iframe>, <object>, and <embed> tags which was not recognized as one of the other media types.

Page
  • Web Pages: Any URL; typically reserved for normal pages where HTML would be used, such as links from ISP Form tables. These links will only be added to your fusker collection after asking how you wish to treat an unknown URL type.
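The refinement from extraction type to file segment type can be written as a simple lookup on the file extension. This is a sketch of the mapping listed above, not the product's implementation; the function name is hypothetical, and a Video with no recognized extension is assumed to have come from an MP4 <video> source, since that is the only such case the extraction rules describe.

```python
import posixpath
from urllib.parse import urlsplit

WINDOWS_MEDIA_EXT = {".wmv", ".mov", ".mpg", ".mpeg", ".avi", ".asf"}


def segment_type(extraction_type, url):
    """Map an extraction type plus URL to the more precise file segment type."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    if extraction_type == "Image":
        return "Image"
    if extraction_type == "Video":
        if ext == ".flv":
            return "Flash Video"
        if ext in WINDOWS_MEDIA_EXT:
            return "Windows Media"
        return "MP4 Video"   # .mp4, or an MP4 <video> source of any extension
    if extraction_type == "Frame":
        return "Shockwave Flash" if ext == ".swf" else "Raw Frame"
    if extraction_type == "Page":
        return "Web Pages"
    raise ValueError(f"unknown extraction type: {extraction_type!r}")
```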

If the assimilated URL was found by processing a directly displayed image, the file segment of the single branch ISP Tree would have Auto Ranging applied to it. URLs extracted from any other type of page will not be auto ranged.

Tree Merging

Once the single branch ISP Tree is formed it is merged with your existing fusker collection. The merger process can be a bit complicated but the results are usually intuitive. New directory path information will appear as a new branch in the fusker tree. List and numeric fusks are not automatically generated or expanded at the directory level.

The merger process will equalize the roll up level of the new fusker tree to match that of the branches it is compared to in the existing fusker tree. This can help propagate your structural preferences automatically as you build your fusker collection. In addition the user preference {Force split segments based on string size} will be used when the new tree branch is merged with an existing tree branch which has Split Directories.

At the file segment level the setting of the {Auto combine individual files into fusked files} configuration for each file type governs how file segments are merged. If it is enabled, the new file segment will combine information with the first file segment of the same type found within the same path to form an optimally fusked file segment. This includes any information added by the auto ranging done when the single branch ISP Tree was formed. The Optimization Process is applied after combining the information such that duplicate references are removed and the final segment may be either list or numerically fusked.
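To give a feel for what combining file segments means, the sketch below merges a set of file names from one directory into a single fusked name. It is a simplified illustration, not the Optimization Process itself: the bracket notation name_[1-3].jpg for a numeric fusk and the brace notation {a,b} for a list fusk are assumed notations chosen for the example, and the rule that a numeric fusk requires a contiguous range is likewise an assumption.

```python
import os
import re


def _split_numbered(name):
    """Return (prefix, digits, extension) if the name ends in a number."""
    stem, ext = os.path.splitext(name)
    m = re.match(r"^(.*?)(\d+)$", stem)
    return (m.group(1), m.group(2), ext) if m else None


def optimize_file_segment(file_names):
    """Combine file names into one fusked name (numeric if possible, else list)."""
    names = sorted(set(file_names))          # duplicate references are removed
    if len(names) == 1:
        return names[0]
    parts = [_split_numbered(n) for n in names]
    if all(parts):
        prefixes = {p[0] for p in parts}
        suffixes = {p[2] for p in parts}
        tokens = sorted((p[1] for p in parts), key=int)
        values = [int(t) for t in tokens]
        contiguous = values == list(range(values[0], values[-1] + 1))
        zero_padded = any(len(t) > 1 and t[0] == "0" for t in tokens)
        same_width = len({len(t) for t in tokens}) == 1
        if (len(prefixes) == 1 and len(suffixes) == 1 and contiguous
                and (same_width or not zero_padded)):
            return f"{prefixes.pop()}[{tokens[0]}-{tokens[-1]}]{suffixes.pop()}"
    return "{" + ",".join(names) + "}"       # fall back to a list fusk
```

In this sketch, three files numbered 1 to 3 with the same prefix and extension collapse into one numeric fusk, while unrelated names in the same directory are kept together as a list.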

The user preference {Auto combine individual video files into fusked videos} applies to all extracted video types individually. For example, MP4 Videos will only merge with other MP4 Videos, Windows Media videos with other Windows Media videos, and Flash Videos with other Flash Videos. However there is only a single user preference associated with auto merging for videos. Similarly Shockwave Flash frames will only merge with other Shockwave Flash frames and Raw Frames with other Raw Frames, but there is only one preference associated with auto merging Frames, {Auto combine individual frames into fusked frames}. Images of all file types can be merged into a single Image segment.
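The relationship between segment types and the shared preferences can be expressed as a lookup table. The following is an assumption-labeled sketch: the FAMILY table and the may_auto_combine function are hypothetical names, and the auto_combine dictionary stands in for the four {Auto combine ...} preferences.

```python
# Each precise segment type belongs to one preference "family".
FAMILY = {
    "Image": "Images",
    "MP4 Video": "Videos",
    "Flash Video": "Videos",
    "Windows Media": "Videos",
    "Shockwave Flash": "Frames",
    "Raw Frame": "Frames",
    "Web Pages": "Pages",
}


def may_auto_combine(new_type, existing_type, auto_combine):
    """True if a new file segment may merge with an existing one.

    auto_combine maps a family name ("Images", "Videos", "Frames", "Pages")
    to the state of its Auto combine preference.
    """
    same_type = new_type == existing_type        # e.g. MP4 only with MP4
    return same_type and auto_combine.get(FAMILY[new_type], False)
```

A single Videos preference therefore enables merging for all three video types, but never across them.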

Sample interactive task progress window shown by Image Surfer Pro during the data assimilation processing

The single branch ISP Trees are merged into the fusker collection in groups. Each group may contain several trees, in the order they were found in the HTML of the processed page. The status of the group mergers is shown in an interactive task progress window similar to this one. The assimilation of any group may be stopped without affecting the assimilation of other groups.

The groups themselves are added in the following order:
  • Page
  • Embedded Images
  • Direct Image References
  • Frames
  • Videos
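The group ordering and the ability to stop one group independently can be sketched as a simple loop. The names below (GROUP_ORDER, assimilate, the merge and stop_requested callbacks) are hypothetical and only illustrate the behavior described above.

```python
GROUP_ORDER = ["Page", "Embedded Images", "Direct Image References",
               "Frames", "Videos"]


def assimilate(groups, merge, stop_requested=lambda group: False):
    """Merge each group's URLs in the fixed group order.

    groups         -- dict mapping group name to URLs in HTML order
    merge          -- callback that merges one URL into the collection
    stop_requested -- callback; returning True abandons the rest of that
                      group only, later groups are still processed
    """
    merged = []
    for group in GROUP_ORDER:
        for url in groups.get(group, []):
            if stop_requested(group):
                break
            merge(url)
            merged.append((group, url))
    return merged
```

Stopping the Frames group in this sketch leaves the Videos group untouched, matching the statement that stopping one group does not affect the others.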

Displaying Results

As the single branch ISP Trees are merged into your fusker collection, the selection is constantly moved to the last merged file segment. When the last URL from the last group has been assimilated into the fusker collection, a specific view of the last segment is generated. The resulting webpage looks similar to an Expanded view of the file segment, except the entire path of the last URL is maintained and used to choose the correct iteration step at each fusked segment in the path. This should guarantee the content shown includes the last file assimilated.

The visualization may or may not include all of the information added. Information could be added which had different domain access protocols, different domains, different directory paths, or just different file types.

Related User Preferences:

Images of the User Preferences dialog: General tab; Processing tab (Auto Range Configuration highlighted); Processing tab; Views tab; Images, Videos, and Frames tabs (Collection and Optimization highlighted); Pages tab (Optimization highlighted)


Processing Tab: Auto Range
The six inputs in the Auto Ranging Configuration block of the Processing tab all play a significant role in how images displayed directly in the browser are added to the fusker collection. They determine whether or not Auto Range Fusking is performed, how large the numerical range of files will be, and where the range starts relative to the image file being processed.

Processing Tab: Structure Propagation
When incoming directory or file segments are compared to a split directory segment in the current fusker collection, if the URL text of the split directory matches, the incoming segment will be split during the matching process. However, if the text does not match, the user configuration {Force split segments based on string size} is used. The incoming segment will remain unsplit if the configuration is not enabled and will split after the same number of characters as the existing segment if it is enabled.
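One possible reading of this rule is sketched below. It is an interpretation, not a description of the actual matching code: it assumes that "the URL text matches" means the incoming segment begins with the text of the first piece of the existing split segment, and the function and parameter names are hypothetical.

```python
def split_incoming(incoming, existing_head, force_split):
    """Return the pieces of an incoming segment after comparison.

    incoming      -- text of the incoming directory or file segment
    existing_head -- text of the first piece of the existing split segment
    force_split   -- state of {Force split segments based on string size}
    """
    cut = len(existing_head)
    if len(incoming) <= cut:
        return [incoming]                        # nothing left to split off
    if incoming.startswith(existing_head):
        return [incoming[:cut], incoming[cut:]]  # text matches: always split
    if force_split:
        return [incoming[:cut], incoming[cut:]]  # same character count
    return [incoming]                            # no match, not forced
```

For an existing segment split as "gallery" + "01", an incoming "gallery02" splits the same way, while an incoming "photos03" stays whole unless the preference is enabled, in which case it is cut after the same seven characters.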

Images Tab: Image Collection
There are three controls for image collection. The first, {Automatically collect direct image links}, deals with images that are not visible on the webpage but which were directly referenced by hyperlinks on the processed page. The other two, {Automatically collect embedded image links} and {Min embedded image file KBytes for auto collection}, are applied to images visible on the processed page. They determine whether or not the images are extracted and automatically collected, and the minimum file size required for these images to be added to the fusker collection.

Images Tab: Auto Optimize
If the {Auto combine individual Images into fusked images} configuration is checked, any images being added to the same directory in the fusker collection will be grouped into a fusked file. The form of the fusked file will be optimized and may be either a list or numeric fusk.

Videos Tab: Video Collection
The {Automatically collect video information found} configuration determines whether or not video file references found during the Directed Search are automatically added to the fusker collection. If selected, all references to MP4, Flash, and Windows Media video files will be added to the fusker collection automatically.

Videos Tab: Auto Optimize
If the {Auto combine individual Videos into fusked Videos} configuration is checked, any videos being added to the same directory in the fusker collection will be grouped into a fusked file with other videos of the same type (e.g. FLV files group together but MP4 files do not group with FLV files). The form of the fusked file will be optimized and may be either a list or numeric fusk.

Frames Tab: Frame Collection
The {Automatically collect frame information found} configuration determines whether or not Shockwave Flash (SWF) file references and raw data found in <iframe>, <embed>, and <object> tags during the Directed Search are automatically added to the fusker collection.

Frames Tab: Auto Optimize
If the {Auto combine individual Frames into fusked Frames} configuration is checked, any frame information being added to the same directory in the fusker collection will be grouped into a fusked file with other frames of the same type (either SWF or Raw). The form of the fusked file will be optimized and may be either a list or numeric fusk.

Pages Tab: Auto Optimize
If the {Auto combine individual Pages into fusked Pages} configuration is checked, any page URL being added to the same directory in the fusker collection will be grouped into a fusked file with other pages. The form of the fusked file will be optimized and may be either a list or numeric fusk.

Differences in Free and Full Versions

Screen capture of free version limitation dialog

Processing Image Galleries:
In the Free Version of Image Surfer Pro, the Process Page button on the Image Surfer Pro toolbar will only process a directly displayed image file. It will not process an ISP Form or general webpage. If the displayed page is not a direct image file, you will be given the option of adding the webpage to the fusker collection as a page URL.

Use Limitation:
The Free Version of Image Surfer Pro will only allow you to use the Process Webpage button a limited number of times per browsing session. The primary use of the free version of Image Surfer Pro is to visualize existing fusker collection files and provide a limited feel for the ability to modify those fusker collections. Building extensive fusker collections with the free version will be difficult.

Screen Capture Examples

Sample screen capture after using the Process Page button

Examples of using the button are separated into several pages: