[C]Duplicate line finder
01-09-2010, 02:13 AM (This post was last modified: 01-09-2010 04:06 PM by rubix.)
Post: #1
Hey everyone, I haven't posted anything in a while, so I thought I'd contribute a little something.

My mom wanted a program that would scan through a text file and find duplicate lines, so I decided to whip this up. Naturally I can't expect my mom to be able to use a command prompt, so it has a GUI. Nothing special, but someone might learn something.

A couple of notes:
The CheckForDuplicates() function assumes that no line is longer than MAX_CHARS_PER_LINE, which is 500 (fgets keeps one of those for the terminating null, so a longer line gets read in pieces). It will stay this way until I find an efficient way to work out the line length on the fly. It could easily be made to allow a line as long as the whole file, but that just didn't seem like the right path. I'm hoping someone has a good idea on how to go about this. Anyway, the MAX_CHARS_PER_LINE define at the top of the function controls the limit, so it's easy to increase.
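One possible way around the fixed limit, as a minimal sketch rather than a drop-in fix: keep calling fgets into a heap buffer and double the buffer whenever it fills up before a newline shows up. The helper name read_line and the starting size of 128 are assumptions made for this illustration; neither is part of the posted program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the posted program): reads one whole line
 * of any length from fp into a heap buffer that the caller must free().
 * Returns NULL at end of file or if memory runs out. */
static char *read_line(FILE *fp)
{
    size_t cap = 128, len = 0;          /* 128 is an arbitrary starting size */
    char *line = (char *)malloc(cap);
    if (!line) return NULL;

    /* fgets fills whatever space is left; keep going until '\n' or EOF */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;                /* complete line, newline included */

        /* no newline yet: double the buffer and read the rest of the line */
        char *grown = (char *)realloc(line, cap * 2);
        if (!grown) { free(line); return NULL; }
        line = grown;
        cap *= 2;
    }

    if (len > 0) return line;           /* last line without a trailing newline */
    free(line);
    return NULL;
}
```

Each call hands back a freshly allocated line, so the two nested fgets loops in CheckForDuplicates() could call something like this and free the result after comparing, at the cost of one allocation per line read.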

I feel I commented a lot, almost to the point where it gets hard to read the code. So I'm posting the uncommented code, and there is a commented file in the zip.

[Image: screenshotjm.th.jpg]

code: written in C

main.cpp
#include "def.h"
 
int WINAPI WinMain (HINSTANCE hThisInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR lpszArgument,
                     int nCmdShow)
{
    HWND hwnd;
    MSG messages;
    WNDCLASSEX wincl;
 
    CoInitialize(NULL);
 
    wincl.hInstance = hThisInstance;
    wincl.lpszClassName = szClassName;
    wincl.lpfnWndProc = WindowProcedure;
    wincl.style = CS_DBLCLKS;
    wincl.cbSize = sizeof (WNDCLASSEX);
 
    wincl.hIcon = LoadIcon (NULL, IDI_APPLICATION);
    wincl.hIconSm = LoadIcon (NULL, IDI_APPLICATION);
    wincl.hCursor = LoadCursor (NULL, IDC_ARROW);
    wincl.lpszMenuName = NULL;
    wincl.cbClsExtra = 0;
    wincl.cbWndExtra = 0;
    wincl.hbrBackground = (HBRUSH) COLOR_BACKGROUND;
 
    if (!RegisterClassEx (&wincl))
        return 0;
 
 
    hwnd = CreateWindowEx (
           0,
           szClassName,
           "File Checker",
           WS_OVERLAPPEDWINDOW ^ WS_MAXIMIZEBOX,
           CW_USEDEFAULT,
           CW_USEDEFAULT,
           650,
           450,
           HWND_DESKTOP,
           NULL,
           hThisInstance,
           NULL
           );
 
    ShowWindow (hwnd, nCmdShow);
 
    while (GetMessage (&messages, NULL, 0, 0))
    {
        TranslateMessage(&messages);
        DispatchMessage(&messages);
    }
 
    return messages.wParam;
}
 
 
LRESULT CALLBACK WindowProcedure (HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    char FilePathBuf[MAX_PATH];
    switch (message)
    {
        case WM_CREATE:
            CreateWindowEx(0, "Edit", "", WS_CHILD | WS_VISIBLE | ES_MULTILINE | ES_AUTOVSCROLL | ES_AUTOHSCROLL | WS_VSCROLL | WS_HSCROLL | ES_READONLY,
                          10, 10, 300, 200, hwnd, (HMENU)LEFT_EDIT, 0, NULL);
            CreateWindowEx(0, "Edit", "", WS_CHILD | WS_VISIBLE | ES_MULTILINE | ES_AUTOVSCROLL | ES_AUTOHSCROLL | WS_HSCROLL | ES_READONLY,
                          320, 10, 300, 200, hwnd, (HMENU)RIGHT_EDIT, 0, NULL);
            CreateWindowEx(0, "button", "Browse", WS_VISIBLE | WS_CHILD,
                          320, 250, 135, 20, hwnd, (HMENU)BROWSE_BUTT, 0, NULL);
            CreateWindowEx(0, "button", "Check it!", WS_VISIBLE | WS_CHILD,
                          300 - 135, 250, 135, 20, hwnd, (HMENU)CHECK_BUTT, 0, NULL);
            CreateWindowEx(0, "button", "Clear", WS_VISIBLE | WS_CHILD,
                          300 - 135, 250 - 30, 135, 20, hwnd, (HMENU)CLEAR_BUTT, 0, NULL);
            break;
        case WM_COMMAND:
            switch(LOWORD(wParam))
            {
                case BROWSE_BUTT:
                {   //braces give the declarations their own scope, so later case labels don't jump past them
                    char filebuf[MAX_PATH];
                    LPITEMIDLIST lpItemIDList;
                    BROWSEINFO myInfo;
 
                    myInfo.hwndOwner = hwnd;
                    myInfo.pidlRoot = NULL;
                    myInfo.pszDisplayName = filebuf;
                    myInfo.lpszTitle = "Choose a text file(.txt)";
                    myInfo.ulFlags = BIF_USENEWUI | BIF_BROWSEINCLUDEFILES;
                    myInfo.lpfn = NULL;
                    myInfo.lParam = (LPARAM)NULL;//WM_MYPAR;
                    myInfo.iImage = 0;
                    if ((lpItemIDList = ::SHBrowseForFolder(&myInfo)) != NULL) {
                        if(SHGetPathFromIDList(lpItemIDList, FilePathBuf))
                            SetDlgItemText(hwnd, RIGHT_EDIT, FilePathBuf);
                        else SetDlgItemText(hwnd, RIGHT_EDIT, "Failed to retrieve path of file");
                        CoTaskMemFree(lpItemIDList); //the returned ID list is ours to free
                    }
                    break;
                }
 
                case CHECK_BUTT:
                {   //braces scope the declarations, so "case CLEAR_BUTT" doesn't jump past an initialization
                    LARGE_INTEGER size;
                    HANDLE h;
 
                    GetDlgItemText(hwnd, RIGHT_EDIT, FilePathBuf, MAX_PATH);
                    char *ext = strrchr(FilePathBuf, '.');
                    //!ext gets evaluated first, so we don't use a NULL pointer
                    if(!ext || strncmp(ext, ".txt", 4) != 0) {
                        MessageBox(hwnd, "Invalid file\r\nNeeds to be a valid text file(.txt)", "Error", MB_OK);
                        break;
                    }
                    h = CreateFile(FilePathBuf, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
                    if(h == INVALID_HANDLE_VALUE) {
                        MessageBox(hwnd, "Failed to open file", "Error", MB_OK);
                        break;
                    }
                    if(GetFileSizeEx(h, &size) == 0) {
                        MessageBox(hwnd, "Failed to retrieve file size", "Error", MB_OK);
                        CloseHandle(h);
                        break;
                    }
                    CloseHandle(h);
                    //+1 so an empty file doesn't turn into malloc(0)
                    char *DispBuf = (char *)malloc((size_t)size.QuadPart + 1);
                    if(!DispBuf) {
                        MessageBox(hwnd, "Failed to allocate enough memory", "Error", MB_OK);
                        break;
                    }
                    int flag = CheckForDuplicates(FilePathBuf, &DispBuf);
                    if(flag == ERR_FILE)      SetDlgItemText(hwnd, LEFT_EDIT, "File error\r\n");
                    else if(flag == ERR_SEEK) SetDlgItemText(hwnd, LEFT_EDIT, "Failed to find position in file\r\n");
                    else if(flag == ERR_MEMO) SetDlgItemText(hwnd, LEFT_EDIT, "Error allocating enough memory\r\n");
                    else if(flag >= 0) {
                        HWND ledit = GetDlgItem(hwnd, LEFT_EDIT);
                        SetDlgItemInt(hwnd, LEFT_EDIT, flag, FALSE);
                        int len = GetWindowTextLength(ledit);
                        SendDlgItemMessage(hwnd, LEFT_EDIT, EM_SETSEL, (WPARAM)len, (LPARAM)len);
                        SendDlgItemMessage(hwnd, LEFT_EDIT, EM_REPLACESEL, (WPARAM)TRUE, (LPARAM)" Duplicates found...\r\n\r\n");
 
                        if(flag > 0) { //don't attempt to print duplicates if there are none...
                            len = GetWindowTextLength(ledit);
                            SendDlgItemMessage(hwnd, LEFT_EDIT, EM_SETSEL, (WPARAM)len, (LPARAM)len);
                            SendDlgItemMessage(hwnd, LEFT_EDIT, EM_REPLACESEL, (WPARAM)TRUE, (LPARAM)DispBuf);
                        }
                    }
 
                    free(DispBuf);
                    break;
                }
                case CLEAR_BUTT:
                    SetDlgItemText(hwnd, LEFT_EDIT, "\0");
                    break;
            }
            break;
        case WM_NCHITTEST:
        {   //report the sizing borders as caption so the window can't be resized by dragging them
            UINT i, ret = DefWindowProc(hwnd, message, wParam, lParam);
            static UINT bad[] = {
                 HTLEFT, HTRIGHT, HTTOP, HTBOTTOM, HTSIZE, HTTOPLEFT, HTTOPRIGHT, HTBOTTOMLEFT, HTBOTTOMRIGHT
            };
            for(i = 0; i < sizeof(bad) / sizeof(*bad); i++)
                if(ret == bad[i]) return HTCAPTION;
            return ret;
        }
        case WM_DESTROY:
            CoUninitialize();
            PostQuitMessage (0);
            break;
        default:
            return DefWindowProc (hwnd, message, wParam, lParam);
    }
 
    return 0;
}
 
//-1,   ERR_FILE - file error
//-2,   ERR_SEEK - file seeking failure
//-3,   ERR_MEMO - memory allocation failure
//>= 0           - success
///TODO: make it so the second filename reflects on the first, and puts it in the same directory.
///NOTE: may need to remove null char from filename before attempting to use it in fopen()
int CheckForDuplicates(char *filename, char **outbuf)
{
#define MAX_CHARS_PER_LINE 500
 
    int flag = 0;
 
    int *IsDup = (int *)malloc(sizeof(int));
    if(!IsDup) return ERR_MEMO;
 
    BOOL pass;
    signed int x = 0, y = 0, copied = 0;
    long pos = 0;
    char *buf = NULL;
    FILE_WORDS fw = {NULL, NULL};
    FILE *file = fopen(filename, "r");
    if(file == NULL) {free(IsDup); return ERR_FILE; }
 
    char *dup_fname = (char *)calloc(strlen(filename) + strlen("_No_Duplicates") + 1, sizeof(char));
    if(!dup_fname) {free(IsDup); fclose(file); return ERR_MEMO;}
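    strncpy(dup_fname, filename, strlen(filename) - 4); //remove ".txt"
    //sizeof(dup_fname) is only the size of a pointer, so bound the append by the space actually left in the buffer
    strncat(dup_fname, "_No_Duplicates.txt", strlen(filename) + strlen("_No_Duplicates") - strlen(dup_fname));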
 
    FILE *dup_file = fopen(dup_fname, "w+");
    if(dup_file == NULL) { free(IsDup); free(dup_fname); fclose(file); return ERR_FILE; }
 
    buf = (char *)malloc(MAX_CHARS_PER_LINE);
    if(!buf) {flag = ERR_MEMO; buf = NULL; goto cleanup;}
    x = 0;
    while(fgets(buf, MAX_CHARS_PER_LINE, file) != NULL) {
        x++;
        fw.word1 = (char *)realloc(fw.word1, strlen(buf) + 1);
        if(!fw.word1) {flag = ERR_MEMO; fw.word1 = NULL; goto cleanup;}
        else strncpy(fw.word1, buf, strlen(buf) + 1);
 
        pos = ftell(file);
        if(pos == -1L) {flag = ERR_SEEK; goto cleanup;}
        y = x;
        while(fgets(buf, MAX_CHARS_PER_LINE, file) != NULL) {
            fw.word2 = (char *)realloc(fw.word2, strlen(buf) + 1);
            if(!fw.word2) {flag = ERR_MEMO; fw.word2 = NULL; goto cleanup;}
            else strncpy(fw.word2, buf, strlen(buf) + 1);
 
            if(strncmp(fw.word1, fw.word2, strlen(fw.word1) > strlen(fw.word2) ? strlen(fw.word2) : strlen(fw.word1)) == 0) {
                //sizeof(outbuf) is only the size of a pointer and can't bound strncat,
                //so grow *outbuf to hold the text so far + this line + "\r\n" + the null
                size_t used = (copied == 0) ? 0 : strlen(*outbuf);
                char *grown = (char *)realloc(*outbuf, used + strlen(fw.word1) + 3);
                if(!grown) {flag = ERR_MEMO; goto cleanup;}
                *outbuf = grown;
                strcpy(*outbuf + used, fw.word1);
                strcat(*outbuf, "\r\n");
 
                copied++;
                IsDup = (int *)realloc(IsDup, sizeof(int) * copied);
                if(!IsDup) {flag = ERR_MEMO; IsDup = NULL; goto cleanup;}
                else IsDup[copied - 1] = y;
            }
            y++;
        }
        fseek(file, pos, SEEK_SET);
        fw.word1[0] = '\0';
        if(fw.word2) fw.word2[0] = '\0'; //word2 is still NULL if the inner loop never ran (one-line file)
        buf[0] = '\0';
    }
 
    if(feof(file)) flag = copied;
    else flag = ERR_FILE;
 
    rewind(file);
    for(x = 0; fgets(buf, MAX_CHARS_PER_LINE, file) != NULL; x++) {
        pass = TRUE; //keep the line unless its index was recorded as a duplicate
        for(y = 0; y < copied; y++) {
            if(x == IsDup[y]) {
                pass = FALSE;
                break;
            }
        }
        if(pass) fprintf(dup_file, "%s", buf);
    }
 
cleanup:
    if(dup_fname != NULL) free(dup_fname);
    if(buf != NULL) free(buf);
    if(fw.word1 != NULL) free(fw.word1);
    if(fw.word2 != NULL) free(fw.word2);
    if(IsDup != NULL) free(IsDup);
    if(file) fclose(file);
    if(dup_file) fclose(dup_file);
 
#undef MAX_CHARS_PER_LINE
    return flag;
}



def.h
#ifndef DEF_H_INCLUDED
#define DEF_H_INCLUDED
 
#define _WIN32_WINNT 0x0500
 
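#include <windows.h>
#include <shlobj.h>
#include <stdio.h>
#include <stdlib.h> //malloc, calloc, realloc, free
#include <string.h> //strlen, strncpy, strncat, strncmp, strrchr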
 
#define LEFT_EDIT   0x01
#define RIGHT_EDIT  0x02
#define BROWSE_BUTT 0x03
#define CHECK_BUTT  0x04
#define CLEAR_BUTT  0x05
 
#define ERR_FILE -1
#define ERR_SEEK -2
#define ERR_MEMO -3
 
LRESULT CALLBACK WindowProcedure (HWND, UINT, WPARAM, LPARAM);
int CheckForDuplicates(char *, char **);
 
char szClassName[] = "FileChecker";
typedef struct _FILE_WORDS {
 
    char *word1;
    char *word2;
 
}FILE_WORDS;
 
#endif // DEF_H_INCLUDED



Code::Blocks project and executable attached. The zip has a commented and an uncommented copy of the source. As usual, any criticism or fixes that help me improve are welcome.

edit: forgot to post the zip.


Attached File(s)
.zip  Duplicate_Finder_By_rubix.zip (Size: 179.88 KB / Downloads: 15)

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
01-09-2010, 04:05 PM (This post was last modified: 01-09-2010 04:11 PM by rubix.)
Post: #2
RE: [C]Duplicate line finder
Hmm, seems I left something in there.
In main - copy (2).cpp, line 97:
Code:
SetDlgItemText(hwnd, RIGHT_EDIT, FilePathBuf);
should be erased. I've edited the first post to reflect that.

edit: Also, the strncmp on line 218 is kind of stupid; there's no need to check which length is longer, lol. I added that early on and forgot to remove it. That's what you get when you haven't coded in a while. It will work, however.
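To illustrate the difference, here is a small sketch comparing the two approaches. Comparing only the first min(len1, len2) characters means a line that is a prefix of another counts as a match, which with fgets mostly comes up when the last line of the file has no trailing newline. The helper names same_prefix and same_line are made up for this illustration and are not in the posted code.

```c
#include <string.h>

/* Hypothetical helpers for illustration; the names are not from the posted code. */

/* The comparison as posted: only the first min(len1, len2) characters count. */
static int same_prefix(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    return strncmp(a, b, la < lb ? la : lb) == 0;
}

/* Whole-string comparison: the lengths have to match as well. */
static int same_line(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With these, a final line "foo" (no newline) and an earlier line "foobar\n" are reported as the same by same_prefix but not by same_line, while "foo\n" and "foobar\n" differ either way because the newline takes part in the comparison.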

01-09-2010, 05:26 PM
Post: #3
RE: [C]Duplicate line finder
Nice. Now, is there an option to actually show the duplicate lines, and possibly a "weak duplicate" option to detect lines that are similar but not exactly the same?

☤ ☢ Software Engineer - Director of Inferno ☢ ☤
01-09-2010, 07:33 PM
Post: #4
RE: [C]Duplicate line finder
Yeah, it automatically displays any lines it finds to be duplicates in the left edit control in the picture. There's no 'weak duplicate' option, but the more I think about that the more it seems like something I should definitely add. Do you mean so that for instance it would match these two lines:
Code:
this is an example.
thsi is an example.
so that it catches tiny spelling mistakes?
Seems like it may be a treat to code; I would have to watch out for intentional things as well. For instance, it might be hard to tell if this was meant to be a separate line or a duplicate:
Code:
this
hits
Perhaps I could prompt the user whenever it finds a 'weak duplicate'? I'm interested to hear everyone else's ideas on how this could be done.
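One common way to put a number on "similar but not the same" is an edit distance that also counts a swap of two neighbouring characters as a single change. The following is a minimal sketch of that idea, not something from the posted program: the function names and the "at most 1 change per 10 characters" cutoff are assumptions picked for the example. With that cutoff, the two "example" lines above are 1 swap apart over 19 characters and count as a weak duplicate, while "this" and "hits" are 2 changes apart over 4 characters and do not.

```c
#include <stdlib.h>
#include <string.h>

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical helper: optimal string alignment distance, i.e. the number of
 * single-character inserts, deletes, substitutions and adjacent swaps needed
 * to turn a into b. Returns -1 if memory runs out. */
static int edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j, w = lb + 1;
    int *d = (int *)malloc((la + 1) * w * sizeof(int));
    int result;
    if (!d) return -1;

    for (i = 0; i <= la; i++) d[i * w] = (int)i;   /* delete all of a */
    for (j = 0; j <= lb; j++) d[j] = (int)j;       /* insert all of b */

    for (i = 1; i <= la; i++) {
        for (j = 1; j <= lb; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            int best = min3(d[(i - 1) * w + j] + 1,            /* delete */
                            d[i * w + (j - 1)] + 1,            /* insert */
                            d[(i - 1) * w + (j - 1)] + cost);  /* substitute */
            /* adjacent swap, e.g. "is" <-> "si" */
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                int swap = d[(i - 2) * w + (j - 2)] + 1;
                if (swap < best) best = swap;
            }
            d[i * w + j] = best;
        }
    }
    result = d[la * w + lb];
    free(d);
    return result;
}

/* Assumed rule of thumb: two lines are weak duplicates when they differ, but
 * by at most 1 change per 10 characters of the longer line. */
static int is_weak_duplicate(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t longer = la > lb ? la : lb;
    int dist = edit_distance(a, b);
    return dist > 0 && (size_t)dist * 10 <= longer;
}
```

A relative cutoff like this still cannot tell intent, so prompting the user for borderline cases remains a reasonable complement to it.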

Anyway if I get the urge to code I may fix a couple things I noticed and add an option to display duplicates or not. Thanks for the input.

01-09-2010, 08:37 PM
Post: #5
RE: [C]Duplicate line finder
It should operate much like the Find function of Firefox, except maybe with catching spelling mistakes and such, but I guess that is hard to do.

01-10-2010, 10:21 AM
Post: #6
RE: [C]Duplicate line finder
(01-09-2010 07:33 PM)rubix Wrote:  I'm interested to hear everyone else's ideas on how this could be done.

Well since you mentioned it... I think a pretty cool library could be made out of this.

I would start by revamping your searching method to use Regular Expressions. RegEx is FAST... REAL FAST! Although, they are a bit cryptic to read.

Next I would search for characters around the pressed key. For example, if I wanted to search for "The key is blue and green...", a positive would result from "The keu is vlue adn green,,,". There might already be a library like this, but I'll post more details on how to do it later. I've done plenty of file parsing in my life.
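As a rough sketch of the "characters around the pressed key" idea, under some loud assumptions: a US QWERTY layout, keys treated as a simple grid (the real rows are staggered), and only same-length strings compared. The names ROWS, find_key, keys_adjacent and sloppy_match are invented for this example and do not come from any existing library.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: US QWERTY, modelled as a plain 4 x 10 grid. */
static const char *ROWS[4] = { "1234567890", "qwertyuiop", "asdfghjkl;", "zxcvbnm,./" };

/* Looks up the grid position of a key; returns 0 if the key is not in the grid. */
static int find_key(char c, int *row, int *col)
{
    int r;
    c = (char)tolower((unsigned char)c);
    if (c == '\0') return 0;
    for (r = 0; r < 4; r++) {
        const char *p = strchr(ROWS[r], c);
        if (p) { *row = r; *col = (int)(p - ROWS[r]); return 1; }
    }
    return 0;
}

/* Two keys count as neighbours when they are at most one row and one column apart. */
static int keys_adjacent(char a, char b)
{
    int ra, ca, rb, cb;
    if (!find_key(a, &ra, &ca) || !find_key(b, &rb, &cb)) return 0;
    return abs(ra - rb) <= 1 && abs(ca - cb) <= 1;
}

/* Returns 1 if typed could be a sloppy typing of wanted: same length, and every
 * position either matches, is part of a swapped pair, or hits a neighbouring key. */
static int sloppy_match(const char *wanted, const char *typed)
{
    size_t i, n = strlen(wanted);
    if (strlen(typed) != n) return 0;
    for (i = 0; i < n; i++) {
        if (wanted[i] == typed[i]) continue;
        if (i + 1 < n && wanted[i] == typed[i + 1] && wanted[i + 1] == typed[i]) {
            i++;                        /* swapped pair such as "nd" -> "dn" */
            continue;
        }
        if (keys_adjacent(wanted[i], typed[i])) continue;
        return 0;
    }
    return 1;
}
```

Under these assumptions the example above matches: y/u, b/v and ./, are neighbours on the grid, and "adn" is "and" with one swapped pair. Dropped or extra characters are not handled here and would need something like an edit distance on top.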

☤Legalize It☤