| Title | Source | Target | Purpose |
|---|---|---|---|
| Quotes01 | www | database | Scrape stock quotes off the web |
| Tables01 | www | database | Scrape HTML tables off the web |
| FundRace | www | database | Scrape political contribution data |
| EMail-Basic | email | database | Load all messages from folders |
| EMail-FromName | email | database | Identify messages with specified sender names |
| EMail-FromMail | email | database | Identify messages with specified sender email addresses |
| EMail-WordPairs | email | HTML | Identify messages with word pairs near each other |
| EMail-NearTextRTF | email | RTF | Identify messages with word pairs near each other |
| AmazonOrders | email | database | Parse Amazon order (sold/ship now) messages |
| AmazonRefunds | email | database | Parse Amazon refund messages |
| eBay-EndOfAuction | email | database | Parse eBay end-of-auction messages |
| PayPal-UGotCash | email | database | Parse PayPal *You've Got Cash!* messages |
| Bouncebacks | email | database | Parse undeliverable email notification messages |
| Removes | email | database | Parse mail list removal requests |
| CountKeywords | text | text | Count words / keywords in a group of text files |
| LineCount | text | text | Count the number of lines in a group of text files |
| HTMLLines | text | HTML | Replace newline characters with HTML <BR> tags |
| RTFLines | text | RTF | Replace newline characters with RTF 'line' tokens |
| NewLine-CRLF-LF | text | text | Replace line feeds with carriage return - line feeds |
| NewLine-LF-CRLF | text | text | Replace carriage return - line feeds with line feeds |
| TextSearch | text | text | Search for text in files |
| TextSearchWide | text | text | Search for text in Unicode / 16-bit text |
| WholeWordsOnly | text | text | Search for text, whole-words only |
| CSV-DB | CSV text | database | Import comma-separated data to database |
| ParseXML | XML | database | Parse XML files into a database |
| PADScan | XML | database | Parse PAD files into a database |
| RTF2HTML | RTF | HTML | Generate web pages from RTF (Rich Text Format) |
| SiteMapGen | HTML | HTML | Generate a site map from existing web pages |
| HTMLUnicodeMapsToC | HTML | C files | Convert ISO8859-to-Unicode maps into C include files |
| CFunctions | C files | text | Extract function headers from C/C++ source |
| SearchAndReplace | files | files | Locate and replace data in files |
| SwapBytes | files | files | Swap bytes in files |
| FilterJunk | files | files | Extract printable characters from files |
See also the following topics :
Extracts function definition headers from C and C++ source files. Even works on MFC source.
Contact WWWGrab.com for more information.
Counts words and keywords in a group of HTML files.
User must configure the "Keywords" string set.
Included in distribution package.
Computes the line count of a group of text files.
See the tutorial for more information.
Included in distribution package.
Imports CSV (comma-separated values) data into database table "csvdata". Transmits fields 1-13 of the input to fields A-M in the database. Ignores the first line of input, which is assumed to contain layout information. Can easily be adapted to import a different number of fields.
Contact WWWGrab.com for more information.
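The field mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Set Machine sample itself: the function name `import_csv` is hypothetical, and SQLite stands in for the ODBC database so that the sketch is self-contained.

```python
import csv
import io
import sqlite3
import string

# Columns A..M, matching fields 1-13 of the input.
COLUMNS = list(string.ascii_uppercase[:13])

def import_csv(conn, text):
    """Load CSV text into table csvdata (hypothetical helper).

    The first line is skipped because it is assumed to hold layout
    information. Returns the number of rows inserted.
    """
    cols = ", ".join(COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS csvdata ({cols})")
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # ignore the layout line
    count = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Assumption: short rows are padded with NULL, extra fields dropped.
        values = (row + [None] * len(COLUMNS))[:len(COLUMNS)]
        conn.execute(f"INSERT INTO csvdata ({cols}) VALUES ({marks})", values)
        count += 1
    return count
```

Changing the length of `COLUMNS` is all that is needed to import a different number of fields.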
Loads all messages from the selected folder into database table "message". Loads basic message items: Sender name/email, recipient name/email, subject, date, the entire text body, and the source message store / folder names.
Contact WWWGrab.com for more information.
Loads messages with selected sender names into database table "message". Modify the "BeginsWith" and "EndsWith" string sets to filter on sender names that begin with and end with the desired text.
Contact WWWGrab.com for more information.
Loads messages with selected sender email addresses into database table "message". Modify the "BeginsWith" and "EndsWith" string sets to filter on sender email addresses that begin with and end with the desired text.
Contact WWWGrab.com for more information.
Version of EMail-SearchWordPairs.pxd that outputs RTF text.
Contact WWWGrab.com for more information.
Identifies messages containing any of the text entries listed in the SearchText string set.
Included in distribution package.
Identifies messages in which word pairs appear near each other and creates an HTML file listing them. Check the screen shot.
Included in distribution package.
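A proximity test of this kind might look like the following Python sketch. The text above does not say how "near" is measured, so the function name `words_near` and the `max_gap` parameter (distance counted in words) are assumptions made for illustration.

```python
import re

def words_near(text, first, second, max_gap=5):
    """Return True if `first` and `second` occur within `max_gap` words.

    Hypothetical helper: matching is case-insensitive and the distance
    is counted in whole words, which is an assumption.
    """
    words = [w.lower() for w in re.findall(r"\w+", text)]
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        i != j and abs(i - j) <= max_gap
        for i in pos_first
        for j in pos_second
    )
```

A message body would be passed as `text`, once for each word pair of interest.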
Sample parser for generated emails. Parses messages in the selected folders and transmits selected information to the database.
This sample parses eBay "end of auction" messages and loads a table called "eauction".
See the "Extracting data from online correspondence" topic.
Included in distribution package.
Searches for one or more text strings in the input files.
Mimics the output of grep.
Check the screen shot.
User must configure the "text to find" string set.
See the tutorial for more information.
Included in distribution package.
Searches for and replaces one or more patterns in the input files.
User must configure the "new text" string set.
See the tutorial for more information.
Included in distribution package.
Extracts printable characters (ASCII 30-126) from the input and discards everything else.
Included in distribution package.
Generates a site map (web page) from web pages (HTML files) in a directory.
Extracts the TITLE and description META HTML tags for each page.
Used to generate this site's map.
Included in distribution package.
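For readers who want to see the idea outside Set Machine, the extraction of TITLE and description META values can be sketched with Python's standard `html.parser` module. The names `PageInfo` and `site_map` are hypothetical, and the output layout (a plain HTML list) is an assumption, not the format the sample produces.

```python
from html import escape
from html.parser import HTMLParser

class PageInfo(HTMLParser):
    """Collects the TITLE text and the description META content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def site_map(pages):
    """pages maps file name -> HTML text; returns an HTML list of links."""
    items = []
    for name in sorted(pages):
        info = PageInfo()
        info.feed(pages[name])
        title = info.title.strip() or name  # fall back to the file name
        items.append(
            f'<li><a href="{name}">{escape(title)}</a> '
            f'{escape(info.description)}</li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Reading the files of a directory into the `pages` mapping is left out to keep the sketch short.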
Replaces newline characters with HTML <BR> tags.
Can be used as a post-processor to preserve newlines when converting to HTML.
Included in distribution package.
Converts ISO8859-to-Unicode maps to C include files. Reads the HTML files, filters out the hexadecimal ISO8859-to-Unicode mapping values and formats them so that they can be included and compiled in a C program. The input maps can be found at : ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859.
Contact WWWGrab.com for more information.
Fixes line feed characters: replaces line feeds (0x0A) with carriage return / line feeds (0x0D0A).
Included in distribution package.
Replaces carriage return / line feeds (0x0D0A) with line feeds (0x0A).
Included in distribution package.
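The two newline conversions above amount to simple byte substitutions. A minimal Python sketch follows; the function names are hypothetical, and leaving existing carriage return / line feed pairs untouched in the first direction is an assumption, since the text does not say how the sample treats them.

```python
import re

def lf_to_crlf(data: bytes) -> bytes:
    """Replace bare line feeds (0x0A) with CR LF (0x0D0A).

    Assumption: a line feed already preceded by a carriage return is
    left alone, so existing CR LF pairs are not doubled.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

def crlf_to_lf(data: bytes) -> bytes:
    """Replace CR LF pairs (0x0D0A) with a single line feed (0x0A)."""
    return data.replace(b"\r\n", b"\n")
```

Working on bytes rather than text avoids any platform newline translation when the files are read and written in binary mode.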
Parses a PAD (Portable Application Description) XML file and loads selected values into a database table.
Contact WWWGrab.com for more information.
XML parsing sample. See the XML Parsing topic for more information.
Generates web content (HTML) from RTF (rich text format) files. Does a decent job of converting the old Set Machine RTF help file to HTML. May require modification for other RTF files! Builds links and a separate index file too.
Contact WWWGrab.com for more information.
Replaces newline characters with RTF "\line" tokens.
Contact WWWGrab.com for more information.
Swaps consecutive bytes in a group of files, converting 16-bit values from big-endian to little-endian and vice versa.
This application of Set Machine is trivial (and a little silly), but it works. It is also inefficient, because Set Machine examines the value of every byte, which this task does not require.
Note that it handles the leftover byte if, for some reason, this transformation is performed on a file with an odd byte count. Processing the odd last byte allows for round-trip conversions that return the input files to their starting states.
Contact WWWGrab.com for more information.
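The byte swap and its round-trip property can be illustrated with a short Python sketch. The function name `swap_bytes` is hypothetical, and keeping the odd last byte in place unchanged is an assumption about what "handles the leftover byte" means.

```python
def swap_bytes(data: bytes) -> bytes:
    """Swap each pair of consecutive bytes.

    Assumption: when the length is odd, the final byte has no partner
    and is copied through unchanged, so applying the function twice
    returns the original data.
    """
    out = bytearray(data)
    even = len(out) - (len(out) % 2)  # length of the pairable prefix
    # Exchange the bytes at even and odd offsets within that prefix.
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)
```

Unlike the Set Machine sample, this version never inspects byte values; it only moves bytes by position.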
Performs wide-string (Unicode) text search.
Contact WWWGrab.com for more information.
Performs a "whole-word-only" search for a string.
Contact WWWGrab.com for more information.
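As an illustration of what "whole-word-only" means, the following Python sketch accepts a match only when it is not directly preceded or followed by a word character. The function name `find_whole_word` is hypothetical, and treating letters, digits and underscore as word characters is an assumption; the sample may draw word boundaries differently.

```python
import re

def find_whole_word(word, text):
    """Return the start offsets of whole-word occurrences of `word`.

    A match counts only if the characters on either side (if any)
    are not word characters.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return [m.start() for m in pattern.finditer(text)]
```

With this rule, a search for "cat" matches "cat" and the "cat" in "cat's", but not "concat" or "category".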
Set Machine can be configured to transform many legacy data formats into databases, XML, or other formats.
Set Machine has been used to :
Contact WWWGrab.com for more information.
Set Machine can be configured to perform a practically infinite variety of transformation / parsing / filtering tasks, on both stored emails and files. A number of capabilities not currently included :
... will be implemented in future releases.
Set Machine has been thoroughly tested on Windows XP and Windows Vista. Preliminary checks indicate that it functions correctly on other 32-bit Windows systems.
View the PAD file: setmachine.htm (XML version: setmachine.xml)
32-bit Windows, Pentium processor, 5MB available hard disk space. WWWGrab requires an Internet connection. The WWWGrab samples require Microsoft Access.
To use the database output actions, a DBMS with a suitable ODBC driver is required. Most DBMSs available on MS Windows platforms meet this requirement.
To read messages (emails), Set Machine requires MAPI (Messaging Application Programming Interface) to be installed. Most Windows platforms with installed email clients (e.g. MS Outlook) meet this requirement.
Pressing Ctrl-D activates debug mode; pressing Ctrl-D again deactivates it. Activating debug mode calls the extension DLL with UserIndex = 99 :
The Development Tools sample PXX.CPP responds to UserIndex 99 by producing a dialog box that shows recognized patterns and other information. Build your own extension DLL with the Set Machine Development Tools ...
| File | Description | Notes |
|---|---|---|
| smc.exe | The command line version of Set Machine | |
| setmachine.dll | Set Machine library | |
| smlib.h | setmachine.dll function interface definition | |
| pxx.h | Set Machine User-Defined Function (UDF) interface definition | |
| ipx.h | IPX2 interface definition | (required by smlib.h) |
| ixxinbuf.h | IXXINBUF (input buffer) interface definition | (required by pxx.h) |
| xxdefs.h | Basic #defines, typedefs, etcetera | (required by ixxinbuf.h) |
| pxx.cpp | Sample UDF implementation | |
| smc.cpp | Source code for smc.exe | (illustrates use of IPX2 interface to setmachine.dll) |
| smclient.cpp | Sample setmachine.dll command-line client | |
| pxdb.dll | Database interface DLL | (required by setmachine.dll) |
| fsel.dll | Message system interface DLL | (required by setmachine.dll) |
| ReadMe.txt | Description of the Development Tools files | |
SetMachine.DLL exports two functions :
SetMachine.DLL requires prior installation of SetMachine.EXE. SetMachine.DLL also requires registration after the 30-day evaluation period.
SMC.EXE is the command-line version of Set Machine. It accepts a single command-line argument, the .PXD file to run, and calls SetMachine.DLL.