CS 412: Spring ’24
Introduction To Data Mining
Assignment 5
(Due Monday, April 29, 23:59)
• The homework is due on Monday, April 29, 2024, at 23:59. Note that this is a hard deadline. We are
using Gradescope for all homework assignments. In case you haven’t already, make sure to join this
course on Gradescope using the code shared on Canvas. Contact the TAs if you face any technical
difficulties while submitting the assignment. Please do NOT email a copy of your solution. We will
NOT accept late submissions (without a reasonable justification).
• Please use Campuswire if you have questions about the homework. Make sure to appropriately tag your
post. Also, scroll through previous posts to make sure that your query was not answered previously.
In case you are sending us an email regarding this Assignment, start the subject with “CS 412 Spring
’24 HW5:” and include all TAs and the Instructor (Jeffrey, Xinyu, Kowshika, Sayar, Ruby).
• Please write your code entirely by yourself. All programming needs to be in Python 3.
• The homework will be graded using Gradescope. You will be able to submit your code as many times
as you want.
• The grade generated by the autograder upon submission will be your final grade for this assignment.
There are no post deadline tests.
• Do NOT add any third-party libraries in your code. Built-in Python libraries are allowed.
• To submit on Gradescope, upload a Python file named homework5.py. A
Python file named homework5.py containing starter code is available on Canvas.
• You are provided with two sample test cases on Canvas; you can try debugging your code with minsup
values of 2 or 3 with the given sample inputs. On Gradescope, your code will be evaluated on these
sample test cases as well as additional test cases. You will get autograder feedback for the sample test
cases but not for the other hidden test cases.
• Late submission policy: there will be a 24-hour grace period without any grade reduction, i.e., Gradescope will accept late submissions until Tuesday, April 30, 2024, at 23:59. Unfortunately, we will
NOT accept late submissions past the grace period (without a reasonable justification).
Problem Description
The focus of the programming assignment is to implement a frequent itemset mining algorithm based
on the Apriori method with pruning. Given a transaction database TDB and a minimum support threshold minsup, the algorithm should simulate the Apriori method with pruning, returning all the candidate
itemsets and the frequent itemsets at each scan of the algorithm.
We will test your code on relatively small transaction databases (maximum 15 transactions of length 10).
Please make sure the runtime of your code does not exceed 10 seconds for such small databases.
You will not get any credit if your code does not work.
Input Format: The input will be a plain text file with a transaction database, with each line corresponding
to a transaction composed of a string of letters. Each letter in a transaction corresponds to an item. For
example, the transaction database Test-1.txt is as follows:
ACD
BCE
ABCE
BE
Your code will take two inputs:
1. Path to a plain text file pointing to the transaction database; and
2. An integer, the minimum support.
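As a minimal sketch of the input handling described above, the snippet below turns the lines of a transaction file into a list of item sets. The helper names parse_transactions and read_transactions are hypothetical and not part of the starter code; representing each transaction as a Python set is also an assumption made for convenient subset tests.

```python
def parse_transactions(lines):
    """Turn an iterable of text lines into a list of item sets."""
    transactions = []
    for line in lines:
        items = line.strip()
        if items:  # skip blank lines, such as one at the end of the file
            transactions.append(set(items))
    return transactions


def read_transactions(path):
    """Read the transaction database stored at `path` (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_transactions(handle)
```

For the Test-1.txt contents shown above, parse_transactions would yield four sets, one per transaction.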
Output Format: Your code will implement a function called apriori based on the Apriori algorithm with pruning. It will return a 3-level nested dictionary.
Figure 1: Simulation of Test-1.txt
Figure 1 shows the simulation of the Apriori algorithm with pruning for an example. The expected
output (3-level nested dictionary to be returned from the apriori function of your code) is shown in Figure
2.
Output dictionary structure
Let’s consider the 3 levels of the dictionary as outer, middle, and inner levels. The keys of the outer
level will denote the scans (or iterations) of the algorithm. For example, in Figure 1, the algorithm terminates after 3 scans and so in the dictionary of Figure 2, we have 3 elements in the outer dictionary, where the
keys of these 3 elements are integers 1, 2, and 3 denoting the first, second and third scans of the algorithm,
respectively. The scan numbers must start from 1 and should be of integer data type.
The value for each scan number (i.e., each key in the outer layer) is a dictionary; these are the middle layer dictionaries. In Figure 1, the algorithm generates the candidate itemsets and the frequent itemsets in each scan.
So each middle dictionary will have two elements - the key c denoting the candidate itemsets and the key f
denoting the frequent itemsets. The data type of keys c and f should be string.
The values for the keys c and f will be dictionaries, denoting the candidate itemsets and the frequent itemsets
of the corresponding scan. The keys of these dictionaries will be of string data type denoting the itemsets.
The values will be of integer data type denoting the support of the associated itemset.
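To make the three levels concrete, here is a sketch of the dictionary for Test-1.txt. The supports were counted by hand from the four transactions, assuming minsup = 2 (an assumption; the figures remain the authoritative reference). The function has_expected_shape is a hypothetical checker, not part of the starter code.

```python
# Hand-derived from ACD, BCE, ABCE, BE under the assumption minsup = 2.
expected = {
    1: {"c": {"A": 2, "B": 3, "C": 3, "D": 1, "E": 3},
        "f": {"A": 2, "B": 3, "C": 3, "E": 3}},
    2: {"c": {"AB": 1, "AC": 2, "AE": 1, "BC": 2, "BE": 3, "CE": 2},
        "f": {"AC": 2, "BC": 2, "BE": 3, "CE": 2}},
    3: {"c": {"BCE": 2},
        "f": {"BCE": 2}},
}


def has_expected_shape(result):
    """Return True if `result` follows the outer/middle/inner layout."""
    if not all(isinstance(scan, int) for scan in result):
        return False  # outer keys must be integers
    if set(result) != set(range(1, len(result) + 1)):
        return False  # scan numbers must be 1, 2, 3, ...
    for middle in result.values():
        if set(middle) != {"c", "f"}:
            return False  # middle keys are the strings "c" and "f"
        for inner in middle.values():
            for itemset, support in inner.items():
                if not isinstance(itemset, str) or not isinstance(support, int):
                    return False
                if itemset != "".join(sorted(itemset)):
                    return False  # letters must be alphabetically sorted
    return True
```

Calling has_expected_shape(expected) returns True, while a dictionary with string scan numbers such as {"1": ...} would be rejected.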
Figure 2: Expected output for Test-1.txt
Notes
1. Pruning: While creating the candidate itemsets at every scan, you are supposed to apply pruning.
For example, in Figure 1, at the 2nd scan, merging AC and BC can generate the candidate ABC for
the 3rd scan, but as a subset AB of ABC is absent in the frequent set F2, ABC is pruned and not
included in the candidate set C3. Accordingly, ABC is absent from the corresponding inner dictionary
of Figure 2.
2. Sorting: The letters in each key string of the inner dictionaries must be in alphabetical order.
For example, BCE must not appear as any of BEC, CBE, CEB, ECB, EBC.
3. Filename: The submitted file should be named homework5.py, otherwise Gradescope will generate an
error.
4. Terminating: If the set of frequent itemsets of a scan contains only one itemset, the algorithm will terminate
and no further scan will be done. For example, in Figure 1, F3 has only one itemset BCE, so the 4th
scan was not performed.
Also, if the candidate set of a scan is empty, that scan will be discarded and won’t be included in
the output. For example, let’s assume for some input, the frequent itemsets F2 obtained at 2nd scan
are AC, BC. So the candidate itemsets C3 for the 3rd scan will be empty (ABC won’t be in C3 as AB
is absent in F2 and so ABC will be pruned). In this case, the output will not include the 3rd scan as
both C3 and F3 are empty.
5. Error: If you get an error from the autograder that says the code could not be executed properly and
suggests contacting the course staff, please first check carefully if your code is running into an infinite
loop. An infinite loop is the most likely cause of this error.
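The pruning rule in note 1 can be sketched as follows. This is one possible reading, not the reference solution: it merges any two frequent k-itemsets whose union has k+1 items (as in the AC and BC example) and keeps the union only if every k-item subset is itself frequent. The name generate_candidates is hypothetical, and the sketch assumes all input strings have the same length and alphabetically sorted letters.

```python
from itertools import combinations


def generate_candidates(frequent):
    """Join frequent k-itemsets into (k+1)-candidates, then prune."""
    frequent = set(frequent)
    candidates = set()
    for a, b in combinations(sorted(frequent), 2):
        union = "".join(sorted(set(a) | set(b)))
        if len(union) != len(a) + 1:
            continue  # the pair does not merge into a (k+1)-itemset
        # Pruning: every k-item subset of the union must be frequent.
        subsets = ("".join(s) for s in combinations(union, len(a)))
        if all(s in frequent for s in subsets):
            candidates.add(union)
    return sorted(candidates)
```

With F2 = AC, BC, BE, CE this yields only BCE, since ABC and ACE each have an infrequent subset (AB and AE). With F2 = AC, BC it yields an empty list, which corresponds to the discarded third scan in note 4.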
What you have to submit
You need to submit a Python file named homework5.py. Starter code is posted on Canvas. Implement
the code to compute the required output. You can add as many functions to your code as you need. Your
code should be implemented in Python 3. Do NOT add any third-party packages to your code; you can
use Python’s built-in packages.
Your code must include a function named apriori which takes the following two inputs:
1. Transaction database (filename in the starter code): path to a plain text file with the transaction database
as shown in the example above. Each line will have a transaction. Note that there will be an empty
line at the end of the file.
2. Minimum support (minsup in the starter code): an integer indicating the minimum support for the
frequent itemset mining.
A call to the function will be like:
apriori("hw5 sample input 1.txt", 2)
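For orientation, the scan loop behind such a call might look like the sketch below. It is an illustration under stated assumptions, not the starter code or the graded solution: the helpers _support, _next_candidates, and mine are hypothetical names, an itemset is treated as frequent when its support is at least minsup, and a scan whose frequent set turns out empty is still recorded before the loop stops (the handout does not cover that case).

```python
from itertools import combinations


def _support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def _next_candidates(frequent):
    """Union-join frequent k-itemsets and prune by the Apriori property."""
    k = len(frequent[0])
    joined = {"".join(sorted(set(a) | set(b)))
              for a, b in combinations(frequent, 2)}
    return sorted(
        c for c in joined
        if len(c) == k + 1
        and all("".join(s) in frequent for s in combinations(c, k))
    )


def mine(transactions, minsup):
    """Run the scans on a list of item sets; return the nested dictionary."""
    result = {}
    candidates = sorted(set().union(*transactions))  # C1: every single item
    scan = 1
    while candidates:  # an empty candidate set means the scan is discarded
        c = {i: _support(i, transactions) for i in candidates}
        f = {i: count for i, count in c.items() if count >= minsup}
        result[scan] = {"c": c, "f": f}
        if len(f) <= 1:  # one (or no) frequent itemset ends the algorithm
            break
        candidates = _next_candidates(list(f))
        scan += 1
    return result


def apriori(filename, minsup):
    """Read the transaction file, skipping blank lines, and mine it."""
    with open(filename, "r", encoding="utf-8") as handle:
        transactions = [set(line.strip()) for line in handle if line.strip()]
    return mine(transactions, minsup)
```

Keeping the file reading in apriori and the scans in mine lets the loop be exercised on in-memory data, for example mine([set("ACD"), set("BCE"), set("ABCE"), set("BE")], 2).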
Additional Guidelines
The assignment requires you both to understand algorithms for frequent itemset mining, in particular Apriori
with pruning, and to implement the algorithm in Python. Here are some guidelines to
consider for the homework:
• Please start early. It is less likely you will be able to do a satisfactory job if you start late.
• It is a good idea to make early progress on the assignment, so you can assess how long it will take: (a)
start working on the assignment as soon as it is posted. Within the first week, you should have a sense
of the parts that will be easier and parts that will need extra effort from you; (b) Solve an example
(partly) by hand as a warm-up to get comfortable with the steps that you will have to code. For the
warm-up, you can use the two sample test cases provided on Canvas named hw5 sample input 1.txt and
hw5 sample input 2.txt.